
Want To Simplify Neural Networks? Transform Them Into Decision Trees


The applications of deep learning have grown so vast that the field has made great headway into areas requiring heavy computing. The elements that make up DL models, artificial neural networks (ANNs), can be said to be the driving force behind the subject.

Neural networks can perform tasks efficiently, especially those related to classification. The problem, however, lies in understanding them: it is hard to follow how a neural network arrives at its output, and it is not yet fully ascertained why it performs classification so well. This article elaborates on a research paper by noted computer scientists Geoffrey Hinton and Nicholas Frosst, which goes in-depth on how a neural net can be turned into a soft decision tree.

Interpretation Through Decision Trees

In the paper Distilling a Neural Network Into a Soft Decision Tree, Hinton and Frosst explain why decision trees, although easier to interpret than neural nets, tend to generalise poorly on their own: “Unlike the hidden units in a neural net, a typical node at the lower levels of a decision tree is only used by a very small fraction of the training data so the lower parts of the decision tree tend to overfit unless the size of the training set is exponentially large compared with the depth of the tree.”

Hinton and Frosst therefore use the neural network itself to train a decision tree: when large amounts of data are available, the poor statistical efficiency of decision trees is offset by training them on the neural net's predictions. An important point here is that 'soft' decision trees are used. In a soft decision tree, an input is not routed down a single hard path; instead, at every inner node it takes each branch with a certain probability, and every leaf (terminal node) holds a learned probability distribution over the classes. All leaves then contribute to the final decision in proportion to the probability of reaching them: for example, a leaf the input reaches with probability 0.7 carries far more weight in the prediction than one it reaches with probability 0.1.

Mixture Of Experts And Gradient Descent

To capture what the neural net has learned, the decision tree model in Hinton and Frosst's study is trained with gradient descent and borrows from a machine learning technique called 'mixture of experts', which divides up the problem space among specialised components. Each inner node is assigned a learned filter (a set of weights) and a bias that probabilistically route an input left or right, while each leaf learns a distribution over the classes. Altogether, this setup forms a soft decision tree model, and the leaf with the highest path probability provides the output.
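To make this concrete, here is a minimal NumPy sketch of one forward pass through a depth-2 soft decision tree. The shapes, initialisation and variable names are illustrative assumptions, not the authors' code; the point is simply that inner nodes route the input probabilistically with a learned filter and bias, and leaves hold class distributions.

```python
# A toy forward pass through a depth-2 soft decision tree in NumPy.
# Shapes, initialisation and names are illustrative, not the authors' code.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features, n_classes = 784, 10          # e.g. flattened MNIST images, 10 digits

# Three inner nodes (root plus its two children), each with a learned filter and bias.
W = rng.normal(scale=0.01, size=(3, n_features))
b = np.zeros(3)
# Four leaves, each holding logits that define a distribution over the classes.
leaf_logits = rng.normal(size=(4, n_classes))

def forward(x):
    p = sigmoid(W @ x + b)               # probability of branching right at each inner node
    path_prob = np.array([
        (1 - p[0]) * (1 - p[1]),         # leaf 0: left at root, left at node 1
        (1 - p[0]) * p[1],               # leaf 1: left at root, right at node 1
        p[0] * (1 - p[2]),               # leaf 2: right at root, left at node 2
        p[0] * p[2],                     # leaf 3: right at root, right at node 2
    ])
    leaf_dists = np.exp(leaf_logits) / np.exp(leaf_logits).sum(axis=1, keepdims=True)
    # Output the class distribution of the leaf the input is most likely to reach.
    return leaf_dists[np.argmax(path_prob)], path_prob

probs, path_prob = forward(rng.normal(size=n_features))
print("leaf path probabilities:", path_prob.round(3))
print("predicted class distribution:", probs.round(3))
```

In a real model the filters, biases and leaf logits would all be learned jointly by gradient descent rather than drawn at random as in this toy example.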

In order to avoid getting stuck in poor solutions while training the model, a penalty term is also included. The penalty is computed from the probability distribution across each inner node's branches, and it encourages every node to send inputs to both of its subtrees roughly equally. A hyperparameter controls the strength of the penalty, and its contribution is decayed as the tree is descended from parent to children nodes.
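The sketch below illustrates such a balancing penalty under stated assumptions; the array names and the exact weighting are hypothetical, but the idea is to average each inner node's routing probability over a batch, penalise averages far from 0.5, and decay the penalty with the node's depth.

```python
# A rough sketch of a subtree-balancing penalty, assuming we already have, for a batch,
# the probability of reaching each inner node and the routing probability at that node.
import numpy as np

def balance_penalty(path_to_node, p_right, depths, lam=0.1):
    # alpha: batch average of branching right at each node, weighted by how likely
    # an input is to reach that node in the first place.
    alpha = (path_to_node * p_right).sum(axis=0) / (path_to_node.sum(axis=0) + 1e-8)
    alpha = np.clip(alpha, 1e-6, 1 - 1e-6)
    # Penalise alphas far from 0.5 so both subtrees of every node get used;
    # lam sets the overall strength and the contribution decays with depth.
    per_node = -0.5 * (np.log(alpha) + np.log(1.0 - alpha))
    return float((lam * 2.0 ** (-depths) * per_node).sum())

# Toy batch: 5 examples, 3 inner nodes (a depth-2 tree).
rng = np.random.default_rng(1)
path_to_node = rng.uniform(size=(5, 3))   # probability of reaching each inner node
p_right = rng.uniform(size=(5, 3))        # probability of branching right there
depths = np.array([0, 1, 1])              # root at depth 0, its children at depth 1
print(balance_penalty(path_to_node, p_right, depths))
```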

Even though the decision tree models fared worse than neural nets in testing, the goal of making a neural net comprehensible is not lost.

Training On MNIST Dataset

When trained on the MNIST dataset, soft decision trees show less overfitting than neural nets, though they take a slight hit on accuracy. A neural net with two convolutional hidden layers reaches a test accuracy of about 99 percent, while a soft decision tree trained directly on the data is about 94 percent accurate. When the soft decision tree is instead trained on that neural net's predictions, as described earlier, accuracy rises to about 96 percent.
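A minimal sketch of what such distillation-style training might optimise is shown below. The array names are hypothetical and the objective is simplified, but the idea is that each leaf's class distribution is fit to the neural net's soft predictions, weighted by the probability of reaching that leaf.

```python
# A simplified distillation-style objective: the tree's leaves are fit to the neural
# network's soft predictions rather than to hard labels. Names are illustrative.
import numpy as np

def distillation_loss(path_prob, leaf_dists, teacher_probs, eps=1e-8):
    # path_prob:     (batch, n_leaves)      probability of reaching each leaf
    # leaf_dists:    (n_leaves, n_classes)  each leaf's class distribution
    # teacher_probs: (batch, n_classes)     the trained neural net's soft targets
    # Cross entropy between the teacher's distribution and every leaf's distribution,
    # weighted by how likely each example is to end up in that leaf.
    ce = -teacher_probs @ np.log(leaf_dists + eps).T      # (batch, n_leaves)
    return float((path_prob * ce).sum(axis=1).mean())

rng = np.random.default_rng(2)
batch, n_leaves, n_classes = 4, 4, 10
path_prob = rng.dirichlet(np.ones(n_leaves), size=batch)
leaf_dists = rng.dirichlet(np.ones(n_classes), size=n_leaves)
teacher_probs = rng.dirichlet(np.ones(n_classes), size=batch)
print(distillation_loss(path_prob, leaf_dists, teacher_probs))
```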

Visualisation of a soft decision tree trained on MNIST. Inner nodes show the learned filters, while the terminal nodes show probability distributions over the classes. (Image courtesy: Geoffrey Hinton and Nicholas Frosst)

What do soft decision tree models simplify? Decision trees rely on a sequence of explicit decisions rather than the hierarchical features learned by neural nets. After a layer or two in these networks, it becomes quite difficult to explain why the network behaves the way it does. Many studies have come up with possible explanations, but they fail to back them up with significant evidence.

Hinton and Frosst's study brings a fresh approach to explaining neural networks: the network's knowledge is expressed in the form of a decision tree, which lowers the level of abstraction involved in following how an input is processed. If the network has very large layers, the decision tree structure can also help with overfitting, in addition to making the large network easier to comprehend. Ultimately, it is always important to check why certain models perform well or poorly for specific tasks or use cases.
