One of the major tasks of machine learning algorithms is to construct a fair model from a dataset. The process of generating models from data is called learning or training and the learned model can be called as hypothesis or learner. The learning algorithms which construct a set of classifiers and then classify new data points by taking a choice of their predictions are known as Ensemble methods.
It has been discovered that ensembles are often much more accurate than the individual classifiers which make them up. The ensemble methods, also known as committee-based learning or learning multiple classifier systems train multiple hypotheses to solve the same problem. One of the most common examples of ensemble modelling is the random forest trees where a number of decision trees are used to predict outcomes.
An ensemble contains a number of hypothesis or learners which are usually generated from training data with the help of a base learning algorithm. Most ensemble methods use a single base learning algorithm to produce homogenous base learners or homogenous ensembles and there are also some other methods which use multiple learning algorithms and thus produce heterogenous ensembles. Ensemble methods are well known for their ability to boost weak learners.
Why Use Ensemble Methods
The learning algorithms which output only a single hypothesis tends to suffer from basically three issues. These issues are the statistical problem, the computational problem and the representation problem which can be partly overcome by applying ensemble methods.
The learning algorithm which suffers from the statistical problem is said to have high variance. The algorithm which exhibits the computational problem is sometimes described as having computational variance and the learning algorithm which suffers from the representational problem is said to have a high bias. These three fundamental issues can be said as the three important ways in which existing learning algorithms fail. The ensemble methods promise of reducing both the bias and the variance of these three shortcomings of the standard learning algorithm.
Some of the commonly used Ensemble techniques are discussed below
Bagging or Bootstrap Aggregation is a powerful, effective and simple ensemble method. The method uses multiple versions of a training set by using the bootstrap, i.e. sampling with replacement and t it can be used with any type of model for classification or regression. Bagging is only effective when using unstable (i.e. a small change in the training set can cause a significant change in the model) non-linear models.
Boosting is a meta-algorithm which can be viewed as a model averaging method. It is the most widely used ensemble method and one of the most powerful learning ideas. This method was originally designed for classification but it can also be profitably extended to regression. The original boosting algorithm combined three weak learners to generate a strong learner.
Stacking is concerned with combining multiple classifiers generated by using different learning algorithms on a single dataset which consists of pairs of feature vectors and their classifications. This technique consists of basically two phases, in the first phase, a set of base-level classifiers is generated and in the second phase, a meta-level classifier is learned which combines the outputs of the base-level classifiers.
Applications Of Ensemble Methods
- Ensemble methods can be used as overall diagnostic procedures for a more conventional model building. The larger the difference in fit quality between one of the stronger ensemble methods and a conventional statistical model, the more information that the conventional model is probably missing.
- Ensemble methods can be used to evaluate the relationships between explanatory variables and the response in conventional statistical models. Predictors or basis functions overlooked in a conventional model may surface with an ensemble approach.
- With the help of the ensemble method, the selection process could be better captured and the probability of membership in each treatment group estimated with less bias.
- One could use ensemble methods to implement the covariance adjustments inherent in multiple regression and related procedures. One would “residualized” the response and the predictors of interest with ensemble methods.
- Zhou, Zhi-Hua. Ensemble methods: foundations and algorithms. Chapman and Hall/CRC, 2012.
- Dietterich, Thomas G. “Ensemble learning.” The handbook of brain theory and neural networks 2 (2002): 110-125.
- Berk, Richard A. “An introduction to ensemble methods for data analysis.” Sociological methods & research 34.3 (2006): 263-295.
- Sewell, Martin. “Ensemble learning.” RN 11.02 (2008).