Neural networks are networks of nodes that process data and extract valuable insights, inspired by the biological neurons of the brain. They were introduced with the goal of recreating, or at least approximating, the capabilities of a human brain. A neural network predicts by learning the correlation between input and output.
What Is A Probabilistic Graphical Model?
A Probabilistic Graphical Model, or PGM, is an amalgamation of classical probabilistic models and graph theory.
How Are Probabilistic Graphical Models And Neural Networks Related?
Both PGMs and NNs are data-driven frameworks, and both can solve prediction problems on their own. The essential difference between them lies in how they use the data to predict an outcome.
One of the major drawbacks of an NN is that there is uncertainty in its predictions which the model itself does not represent. A PGM makes that uncertainty explicit and helps to build models that are more faithful to reality; in this respect, the PGM is the more capable of the two.
A PGM can be restricted to providing the functionality of an NN, and, conversely, an NN can be exploited to obtain the kind of information a PGM would provide.
We are going to use a popular use case found on the internet, asking a girl out for a date, to illustrate the difference.
John Doe likes a girl he knows through N mutual acquaintances and wants to ask her out on a date. He has low self-esteem and wants to ask some subset of their mutual friends to introduce him to her. However, a referral from someone she mistrusts may only have a weak positive influence, while a referral from someone she dislikes might even have a negative impact.
Now suppose n of his friends have already had the same idea and have tried this before (assume, for the purposes of this argument, that the young lady in question is unbearably desirable). So he has in his possession a set of n observations, each of which he can compactly represent as an N-bit vector [1, 0, 1, 0, …], where 1 means that friend gave a boost and 0 means that friend was not contacted.
John needs to find the set of friends that he should ask to boost him, i.e. the best vector. Note that this need not be one of the n vectors already tried.
Using a Neural Network: The input layer would have N nodes that receive the approval inputs, and the output layer would be the girl’s response in binary. We can throw in a hidden layer of M neurons, each with N connections back to the input layer, which encode various quanta of him-friend-girl dynamics that, when pooled together through M forward connections to the output layer, determine her interest in him.
When we train this neural network on the n vectors we have, we learn real-valued weights on the N×M input-to-hidden connections and the M hidden-to-output connections. Intuitively, combinations of inputs that predict the output well (say, her 3 closest friends) will be repeatedly reinforced across the n observations, and so will the weight of the mid-layer neuron receiving inputs from precisely that clique. However, there will be no easy way of knowing which of the M mid-layer neurons contains the 3-closest-friends information. The NN will function as a Delphic oracle: you can ask it about the fate of individual vectors, but not for the reasons behind its prediction.
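The architecture just described can be sketched in a few lines of NumPy. Everything numeric here is an illustrative assumption, not data from the story: N = 5 friends, M = 4 hidden neurons, n = 8 past attempts, and a made-up toy rule ("at least 2 of her 3 closest friends approve") standing in for the unknown ground truth.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, n = 5, 4, 8  # friends, hidden neurons, past observations (toy values)

X = rng.integers(0, 2, size=(n, N)).astype(float)  # n observed N-bit approval vectors
y = (X[:, :3].sum(axis=1) >= 2).astype(float)      # toy rule: 2 of her 3 closest friends boost

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.5, size=(N, M))  # N*M input-to-hidden weights
W2 = rng.normal(scale=0.5, size=M)       # M hidden-to-output weights

lr = 0.5
for _ in range(2000):
    H = sigmoid(X @ W1)      # hidden activations
    p = sigmoid(H @ W2)      # predicted probability of a "yes"
    g = (p - y) / n          # cross-entropy gradient at the output
    # Backpropagate: update W1 using the pre-update W2, then update W2.
    W1 -= lr * X.T @ (np.outer(g, W2) * H * (1 - H))
    W2 -= lr * H.T @ g

# The oracle will score any candidate vector, but nothing in W1/W2
# labels which hidden neuron holds the "3 closest friends" clique:
query = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
print(sigmoid(sigmoid(query @ W1) @ W2))
```

After training, the network answers "what would this vector yield?" but inspecting W1 and W2 gives no readable explanation, which is exactly the Delphic-oracle point above.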
Using a PGM: We could instead treat this as a problem of Bayesian reasoning, where we potentially receive approval observations from N nodes, which lead to the formation of an impression (a hidden random variable), which in turn causes date acceptance (an observable event). In this setup we can estimate the likelihoods p(approval from friend i | impression) from the data, and from them compute the posterior p(impression | vector of all approvals) using Bayes’ theorem.
Going from p(approval i | impression) to p(all approvals | impression), though, is hard. Machine learners usually assume the approvals are conditionally independent given the impression, i.e. p(all approvals | impression) = the product over i of p(approval i | impression). This is simple to compute, but gives up on the possibility of modelling non-trivial correlations between inputs. For example, if hearing good things from A, B and C together, or from C and F, but not from A and C together, impresses the girl (assume that her social life is extremely rich), such effects won’t show up in these ‘naive’ Bayesian predictions.
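Under that conditional-independence assumption, the posterior is a few lines of arithmetic, and, unlike the neural network, every intermediate quantity is readable. The prior and the per-friend likelihood numbers below are invented toy values chosen only to make the example concrete.

```python
import numpy as np

# Toy parameters (illustrative, N = 5 friends):
prior = 0.3  # p(impressed) before hearing any referral
p_approve_given_impressed = np.array([0.9, 0.8, 0.85, 0.5, 0.4])
p_approve_given_not      = np.array([0.3, 0.4, 0.35, 0.5, 0.6])

def posterior_impressed(approvals):
    """Naive-Bayes posterior p(impressed | approval vector).

    Assumes approvals are conditionally independent given the impression,
    so the joint likelihood is a simple product of per-friend terms."""
    a = np.asarray(approvals)
    like1 = np.prod(np.where(a == 1, p_approve_given_impressed,
                             1 - p_approve_given_impressed))
    like0 = np.prod(np.where(a == 1, p_approve_given_not,
                             1 - p_approve_given_not))
    # Bayes' theorem, normalising over the two impression states:
    return like1 * prior / (like1 * prior + like0 * (1 - prior))

print(posterior_impressed([1, 1, 1, 0, 0]))  # her 3 closest friends vouch
print(posterior_impressed([0, 0, 0, 0, 1]))  # only the friend she mistrusts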
Summary: The neural network lacks the reason behind a prediction it makes, whereas a probabilistic graphical model exposes the evidence supporting its prediction. PGMs will give you detailed intermediate estimates of how likely the individual inputs are to generate the effect.
Both neural networks and probabilistic graphical models can describe the correlation between the dependent and independent factors of any problem in which a number of features result in a specific outcome, and both can be used to learn the underlying function.