Yesterday I found the paper Dropout Inference in Bayesian Neural Networks with Alpha-divergences by Yingzhen Li and Yarin Gal, in which they address one of the shortcomings of the approach presented in the blog post. The method I showed above is based on variational Bayesian inference, which has a tendency to under-fit the posterior: it underestimates the uncertainty, so the results look more optimistic (confident) than they should be. To address this, they propose a modified loss function to train the neural network with.
In the attached notebook I tried to implement their loss function. It took a bit of tinkering, but I think it works adequately. I haven't given much thought yet to calibrating the network and the training parameters, which is definitely an important next step.
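I won't reproduce the whole notebook here, but the core of the modified objective boils down to something like the sketch below: a minimal PyTorch version of the Monte Carlo α-divergence energy (roughly the training objective in Li and Gal's paper), under a couple of simplifications I'm making for brevity. The function name alpha_divergence_loss and the Gaussian likelihood with precision tau are my own choices, and the dropout/weight-decay regularisation term of the full objective is left out.

```python
import math
import torch

def alpha_divergence_loss(y_true, y_pred_samples, alpha=0.5, tau=1.0):
    """Monte Carlo alpha-divergence objective (a sketch, not the notebook code).

    y_true:          tensor of shape (N, D)
    y_pred_samples:  tensor of shape (K, N, D), from K stochastic forward
                     passes with dropout kept active at training time
    """
    K = y_pred_samples.shape[0]
    # Per-sample Gaussian log-likelihood (up to an additive constant),
    # with observation precision tau.
    sq_err = ((y_pred_samples - y_true.unsqueeze(0)) ** 2).sum(dim=-1)    # (K, N)
    log_lik = -0.5 * tau * sq_err                                         # (K, N)
    # log of the average over the K samples of exp(alpha * log-likelihood)
    log_mean_exp = torch.logsumexp(alpha * log_lik, dim=0) - math.log(K)  # (N,)
    # Sum over data points and rescale by -1/alpha; as alpha -> 0 this tends
    # to the standard (variational) expected log-likelihood term.
    return -(1.0 / alpha) * log_mean_exp.sum()
```

In practice you would add the usual L2/weight-decay term (the dropout approximation to the KL regulariser) on top of this, and feed it the K dropout forward passes of your network.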
Edit:
For those of you who are interested in understanding what the alpha parameter does in the modified loss function, it might be instructive to look at figure 2 in the paper Black-Box α-Divergence Minimization by Hernández-Lobato et al.
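For reference, the α-divergence in question (written here in Amari's parameterisation, which is the one the BB-α paper uses) is shown below; the limiting cases noted in the comments are essentially what that figure illustrates, with α → 0 giving the usual mode-seeking variational fit and α → 1 a mass-covering, expectation-propagation-style fit.

```latex
% Amari's alpha-divergence between the true posterior p and the approximation q:
D_\alpha\!\left[p \,\|\, q\right]
  = \frac{1}{\alpha(1-\alpha)}
    \left(1 - \int p(\theta)^{\alpha}\, q(\theta)^{1-\alpha}\, \mathrm{d}\theta\right)
% alpha -> 0 recovers KL(q || p) (standard variational Bayes, mode-seeking),
% alpha -> 1 recovers KL(p || q) (expectation propagation, mass-covering).
```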
Attachments: