Which is the most difficult machine learning algorithm? Usually, what works well for one industry may not work well for the next. When it comes to applying ML algorithms, there are a lot of things that can go wrong — it could be a poor model fit or an incorrect application of a method that could lead to incorrect inference. Also, not understanding the mathematics behind the methods can lead to disasters. In machine learning setting, anything Bayesian has been termed as “challenging” to implement from scratch. For example, a data scientist from Shopify pegged Bayesian Nonparametrics or a combination of Bayesian inference and neural networks difficult to implement.

### Bayesian Inference Described As The Best Approach For Modelling Uncertainty

In machine learning, the Bayesian inference is known for its powerful set of tools for modelling any random variable — the value of a regression parameter, a demographic statistic and even business performance indicators. Dubbed as the best approach to modelling uncertainty, this method is very useful when there is limited data, model overfit and has to use data as evidence that certain facts are more likely than others.

According to a thesis on Variational Algorithms for Approximate Bayesian Inference from the University of Buffalo, the Bayesian framework for machine learning allows incorporation of prior knowledge in a structured way and helps in avoiding the overfitting problems. However, the computational work required is often difficult.

For example, there are times when it is difficult to know or understand the aspects of the data which are relevant for prediction and what data should be considered noise. Even if one finds the parameters to fit the data, any predictions made from the best-fit model would not render the best results or reflect the right trends.

According to another research paper, Variational Bayesian (VB) inference is defined as a tool for machine learning of probabilistic models which is highly accurate than traditional point estimates (least squares, maximum likelihood, maximum a posteriori) but still very fast compared to sampling methods. Variational Bayesian has been pegged useful with latent variable models where the number of unknown variables are more which makes point estimates overfit on one hand, and sampling methods very slow on the other. It also factors in the uncertainty of the unknown variables by estimating a probability distribution for them.

Data scientist Stefano Cosentino observed in a post that the Bayesian approach leans more towards the distributions associated with each parameter. For instance, he writes that the two parameters depicted below, as shown by the Gaussian curves after a trained Bayesian network has converged. Hence the Bayesian approach, where the parameters are unknown quantities can be considered as random variables. University of Buffalo’s paper defines the Bayesian approach to uncertainty, which treats all uncertain quantities as random variables and uses the laws of probability to manipulate those uncertain quantities. Hence, the right Bayesian approach integrates over all uncertain quantities rather than optimise them, states the paper.

### Challenges In Variational Bayesian

**Complicated**: ML practitioners state that one needs to spend a lot of time understanding and designing an inference.

**Hard To Automate**: According to another practitioner, one needs to pick distributions in VB which can approximate with other unknown distribution P. But the distributions need to be appropriate. The computer scientist breaks down the problem as P(X<0)=0, then Q(X<0) should be zero for every Q in the class. P imposes constraints on the possible Q’s that would make sense as approximations.

**Heavy Computation**: Variational inference is suited to large data sets and scenarios where we want to quickly explore many models. However, it also requires heavy computation.

**Not The Most Optimal Results**: Forums are abuzz that VB is the best method to approximate a difficult-to-compute probability density, p, through optimisation. The best way to do this is through the mean-field variational family and find q by coordinate ascent, but this doesn’t guarantee the most optimal member q∈Q either.

**Not Enough Literature Out There**: According to a section of researchers, there isn’t much literature, books or guides available on VB.

Despite its disadvantages, the Bayesian models are very useful, especially in areas such as speech recognition, image restoration and industrial applications such as detecting errors on industrial gas turbines.

Try deep learning using MATLAB