Deep learning neural networks are highly capable models that can learn to solve complex prediction problems. Yet they come with a built-in weakness: training is stochastic, so the same model trained on the same machine with the same dataset may not reach the same accuracy from one run to the next. Although the model may produce a good prediction every time, this variance in prediction is a real drawback.

Ensemble learning combines multiple algorithms, or the same algorithm trained multiple times, to solve the same problem, which helps to reduce the variance of the predictions. Model averaging is an ensemble learning technique that reduces this variance in neural networks.

We followed Jason Brownlee’s tutorial to implement model averaging on a neural network, and made a few changes to the code from his blog.

Following this tutorial will require you to have:

- Basic knowledge of Python
- An understanding of neural networks

### Model Averaging

Model averaging belongs to the family of ensemble learning techniques. It trains multiple models on the same problem and combines their predictions, with each model contributing equally, to produce a more reliable and consistent prediction.
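To make the idea concrete, here is a minimal sketch of averaging class probabilities in NumPy. The three "models" and all of their probabilities are made-up numbers used only for illustration; they do not come from the network trained in this tutorial.

```python
import numpy as np

# Hypothetical softmax outputs of three models for the same two samples
# (rows = samples, columns = classes); the numbers are made up.
member_probs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],   # model 1
    [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]],   # model 2
    [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2]],   # model 3
])

# Average over the model axis, then pick the most probable class per sample.
averaged = member_probs.mean(axis=0)
ensemble_labels = averaged.argmax(axis=1)

# Each model's individual prediction, for comparison.
individual_labels = member_probs.argmax(axis=2)

print(averaged)
print(ensemble_labels)
print(individual_labels)
```

In this toy case model 2 disagrees with the other two on both samples, and the averaged prediction follows the majority. Summing the probabilities instead of averaging them gives the same argmax, because the two differ only by a constant factor.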

#### Model Averaging on a Multi-Class Classification Problem

First, we will create a sample dataset for a multiclass classification problem using the make_blobs() function from sklearn.datasets.

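```python
from sklearn.datasets import make_blobs

# 500 two-feature samples drawn around 3 cluster centers
X, y = make_blobs(n_samples=500, centers=3, n_features=2, cluster_std=2, random_state=2)
```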

The above function returns 500 samples, each with 2 features (the independent variables) and a categorical label (the dependent variable). The points are scattered around 3 centers with a standard deviation of 2, so each sample falls into one of 3 classes.
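For intuition about what such a dataset looks like internally, the following is a rough NumPy approximation of Gaussian blobs. It is not scikit-learn's implementation and will not reproduce the same values; the function name `make_blobs_sketch`, the [-10, 10] box for the centers, and the round-robin label assignment are assumptions made for this sketch.

```python
import numpy as np

def make_blobs_sketch(n_samples=500, centers=3, n_features=2, cluster_std=2.0, seed=2):
    """Rough NumPy approximation of isotropic Gaussian blobs (not sklearn's code)."""
    rng = np.random.default_rng(seed)
    # One random center per class, drawn from a box (assumed here to be [-10, 10]).
    center_points = rng.uniform(-10.0, 10.0, size=(centers, n_features))
    # Spread the samples as evenly as possible over the classes.
    labels = np.arange(n_samples) % centers
    # Each sample is its class center plus Gaussian noise with the given std.
    noise = rng.normal(0.0, cluster_std, size=(n_samples, n_features))
    features = center_points[labels] + noise
    return features, labels

X_demo, y_demo = make_blobs_sketch()
print(X_demo.shape, y_demo.shape)   # (500, 2) (500,)
print(np.bincount(y_demo))          # samples per class: [167 167 166]
```

A larger `cluster_std` makes the classes overlap more, which makes the classification problem harder.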

Visualizing the dataset:

```python
from matplotlib import pyplot
import pandas as pd

# Put the two features and the class label into one DataFrame
df = pd.DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'black', 2:'yellow'}
fig, ax = pyplot.subplots()
# Draw one scatter series per class so each gets its own color and legend entry
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()
```

Output: a scatter plot of the samples, with the three classes drawn in red, black, and yellow.

#### The Multi-Layer Perceptron Model

Now that we have our dataset, we will measure how much the accuracy of the same model varies when it is trained repeatedly on the same dataset on the same machine.

This is a multi-class classification problem, so the model uses the softmax function on the output layer to predict which of the 3 classes a point falls into. The first step is therefore to one-hot encode the categorical label, which is the dependent variable here.

```python
from keras.utils import to_categorical

# One-hot encode the integer class labels (0, 1, 2)
y = to_categorical(y)
```
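The sketch below illustrates the two ideas from this step in plain NumPy: one-hot encoding of integer labels and the softmax function. The helper names `one_hot` and `softmax` and the sample values are assumptions for this illustration; they are not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return np.eye(num_classes)[labels]

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

labels = np.array([0, 2, 1])
encoded = one_hot(labels, num_classes=3)
print(encoded)

# Made-up raw scores for two samples; each output row sums to 1.
probs = softmax(np.array([[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]))
print(probs.sum(axis=1))
print(probs.argmax(axis=1))
```

Because softmax outputs one probability per class, the one-hot target and the prediction have the same shape, which is what the categorical cross-entropy loss expects.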

Now we will create the training and test sets. We will split the dataset so that 30% goes to the training set and 70% goes to the test set.

```python
from sklearn.model_selection import train_test_split

# test_size=0.7 keeps 30% of the samples for training
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.7, random_state=1)
```
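As a rough picture of what a shuffled split does, here is a NumPy-only sketch. The function `shuffled_split` is hypothetical and will not select the same rows as scikit-learn's `train_test_split`; it only shows the shuffle-then-cut idea and the resulting set sizes.

```python
import numpy as np

def shuffled_split(features, targets, test_fraction=0.7, seed=1):
    """Shuffle row indices once, then cut them into a test part and a train part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_test = int(round(test_fraction * len(features)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return features[train_idx], features[test_idx], targets[train_idx], targets[test_idx]

# Stand-in data with the same size as the tutorial's dataset.
demo_X = np.arange(1000).reshape(500, 2)
demo_y = np.arange(500) % 3
train_X, test_X, train_y, test_y = shuffled_split(demo_X, demo_y)
print(len(train_X), len(test_X))  # 150 training rows, 350 test rows
```

With 500 samples, a 30/70 split leaves only 150 rows for training, which keeps each model small and quick to fit and leaves a large test set for measuring accuracy.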

Now let’s create our neural network model.

We will create a neural network with 2 input nodes, one hidden layer with 20 nodes and ReLU activation, and an output layer with 3 nodes and softmax activation. The model will be compiled with the ‘adam’ optimizer and the categorical cross-entropy loss.

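```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# Hidden layer: 20 nodes fed by the 2 input features
model.add(Dense(20, input_dim=2, activation='relu'))
# Output layer: one node per class
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit() returns a History object with the per-epoch training and validation metrics
history = model.fit(X_train, Y_train, validation_data=(X_test, Y_test), epochs=100)
```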
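To show what this architecture computes, the following NumPy sketch runs a forward pass through a network of the same shape (2 inputs, 20 ReLU hidden nodes, 3 softmax outputs) and counts its parameters. The weights are random stand-ins rather than trained values, and the names `W1`, `b1`, `W2`, `b2`, and `forward` are assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised weights standing in for a trained network.
W1, b1 = rng.normal(size=(2, 20)), np.zeros(20)   # input (2) -> hidden (20)
W2, b2 = rng.normal(size=(20, 3)), np.zeros(3)    # hidden (20) -> output (3)

def forward(batch):
    """Forward pass: ReLU hidden layer, then softmax over the 3 classes."""
    hidden = np.maximum(0.0, batch @ W1 + b1)
    logits = hidden @ W2 + b2
    exps = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exps / exps.sum(axis=1, keepdims=True)

probs = forward(rng.normal(size=(5, 2)))
n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape)   # (5, 3): one probability per class for each of 5 samples
print(n_params)      # 2*20 + 20 + 20*3 + 3 = 123
```

The network has only 123 trainable parameters, so fitting it many times for an ensemble is inexpensive.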

After fitting the model on the training set, we will now evaluate its performance and compare the accuracy on the training and test sets.

```python
# evaluate() returns the loss first, then the metrics passed to compile()
_, train_acc = model.evaluate(X_train, Y_train, verbose=0)
_, test_acc = model.evaluate(X_test, Y_test, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
```
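For one-hot targets, classification accuracy is the fraction of samples whose most probable predicted class matches the true class. A minimal NumPy sketch of that calculation follows; the helper `accuracy_from_probs` and the sample values are assumptions for illustration and are not Keras code.

```python
import numpy as np

def accuracy_from_probs(probabilities, one_hot_targets):
    """Fraction of rows whose most probable class matches the one-hot target."""
    predicted = probabilities.argmax(axis=1)
    actual = one_hot_targets.argmax(axis=1)
    return float((predicted == actual).mean())

# Made-up predictions for four samples; the last one is misclassified.
demo_probs = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.7, 0.1],
                       [0.1, 0.2, 0.7],
                       [0.5, 0.4, 0.1]])
demo_targets = np.eye(3)[[0, 1, 2, 1]]
print(accuracy_from_probs(demo_probs, demo_targets))  # 0.75
```

A training accuracy far above the test accuracy would suggest overfitting, which is why the two are printed side by side.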

##### Variance in MLP

To see the variance, we just need to fit the model defined above on the same dataset, on the same machine, multiple times. To simplify fitting and evaluating the model a specific number of times, we will create a function.

```python
from statistics import mean, stdev

def evaluate_model(trainX, trainy, testX, testy):
    # Build a fresh, randomly initialized copy of the model defined above
    model = Sequential()
    model.add(Dense(20, input_dim=2, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(trainX, trainy, epochs=100, verbose=0)
    # Return only the test accuracy
    _, test_acc = model.evaluate(testX, testy, verbose=0)
    return test_acc

# Train and evaluate the same architecture 10 times
n_repeats = 10
scores = list()
for _ in range(n_repeats):
    score = evaluate_model(X_train, Y_train, X_test, Y_test)
    print('> %.3f' % score)
    scores.append(score)

# Summarize the spread of the test accuracy across runs
print('Scores Mean: %.3f, Standard Deviation: %.3f' % (mean(scores), stdev(scores)))
```

Output: the ten test accuracies, one per run, followed by their mean and standard deviation.
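One reason the runs differ is that every new model starts from different random weights. Keras `Dense` layers use Glorot uniform initialization by default. The sketch below imitates that scheme in NumPy to show how the starting point depends on the random seed; the function `glorot_uniform` here is an illustrative stand-in, not the Keras implementation.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed):
    """Sample weights uniformly from [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.default_rng(seed).uniform(-limit, limit, size=(fan_in, fan_out))

# Two "runs" with different seeds start from different hidden-layer weights...
run_a = glorot_uniform(2, 20, seed=1)
run_b = glorot_uniform(2, 20, seed=2)
print(np.allclose(run_a, run_b))            # False

# ...while reusing a seed reproduces the same starting point.
run_a_again = glorot_uniform(2, 20, seed=1)
print(np.array_equal(run_a, run_a_again))   # True
```

Different starting weights, together with the random order in which training batches are drawn, lead the optimizer to slightly different solutions, and that is the variance an ensemble averages out.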

### Model Averaging Ensemble

Now that we have understood how model averaging works, we will implement it on our classification problem. But we still do not know how many ensemble members give the best score. Hence we will perform a sensitivity analysis: we fit 20 models, evaluate ensembles of 1 to 20 members, and look for the ensemble size beyond which the accuracy stops improving.

```python
from sklearn.datasets import make_blobs
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
import numpy
from numpy import array
from numpy import argmax
from sklearn.metrics import accuracy_score
from matplotlib import pyplot
from sklearn.model_selection import train_test_split

# fit model on dataset
def fit_model(trainX, trainy):
    model = Sequential()
    model.add(Dense(20, input_dim=2, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(trainX, trainy, epochs=100, verbose=0)
    return model

# make an ensemble prediction for multi-class classification
def ensemble_predictions(members, testX):
    # make predictions
    yhats = [model.predict(testX) for model in members]
    yhats = array(yhats)
    # sum across ensemble members
    summed = numpy.sum(yhats, axis=0)
    # argmax across classes
    result = argmax(summed, axis=1)
    return result

# evaluate a specific number of members in an ensemble
def evaluate_n_members(members, n_members, testX, testy):
    # select a subset of members
    subset = members[:n_members]
    print(len(subset))
    # make prediction
    yhat = ensemble_predictions(subset, testX)
    # calculate accuracy
    return accuracy_score(testy, yhat)

X, y = make_blobs(n_samples=500, centers=3, n_features=2, cluster_std=2, random_state=2)
# same 30% train / 70% test split as before
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.7, random_state=1)
# only the training labels are one-hot encoded; accuracy_score compares integer labels
Y_train = to_categorical(Y_train)

# fit all models
n_members = 20
members = [fit_model(X_train, Y_train) for _ in range(n_members)]

# evaluate different numbers of ensembles
scores = list()
for i in range(1, n_members+1):
    score = evaluate_n_members(members, i, X_test, Y_test)
    print('> %.3f' % score)
    scores.append(score)

# mean accuracy over all ensemble sizes
print("Average Accuracy Score : ", numpy.mean(scores))

# plot score vs number of ensemble members
x_axis = [i for i in range(1, n_members+1)]
pyplot.plot(x_axis, scores)
pyplot.show()
```

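Output: the accuracy for each ensemble size, the average accuracy, and a line plot of accuracy against the number of ensemble members.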

We can see that the accuracy settles around its average at roughly 13 members and then fluctuates within a narrow range around it. Hence we will choose 13 as the optimum number of members.
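Reading the stable point off a plot is a judgment call. As one possible way to make it repeatable, the sketch below computes the accuracy for every ensemble size from predictions that were made once, then picks the smallest size after which the curve stays within a tolerance of its remaining average. Both helpers (`accuracy_by_ensemble_size`, `first_stable_size`), the tolerance value, and the sample probabilities are assumptions for illustration; this is not the procedure used to arrive at 13 above.

```python
import numpy as np

def accuracy_by_ensemble_size(member_probs, true_labels):
    """Accuracy of the first k members combined, for k = 1..n_members.

    member_probs has shape (n_members, n_samples, n_classes).
    """
    # Running sum over the member axis: entry k-1 holds the summed
    # probabilities of the first k members.
    running = np.cumsum(member_probs, axis=0)
    predicted = running.argmax(axis=2)               # (n_members, n_samples)
    return (predicted == true_labels).mean(axis=1)   # one accuracy per size

def first_stable_size(scores, tolerance=0.005):
    """Smallest ensemble size from which every later score stays within
    `tolerance` of the mean of the scores from that size onward."""
    for start in range(len(scores)):
        tail = scores[start:]
        tail_mean = sum(tail) / len(tail)
        if all(abs(s - tail_mean) <= tolerance for s in tail):
            return start + 1  # sizes are 1-based
    return len(scores)

# Made-up probabilities: 3 members, 2 samples, 2 classes.
demo_probs = np.array([
    [[0.9, 0.1], [0.6, 0.4]],   # member 1 gets the second sample wrong
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.7, 0.3], [0.2, 0.8]],
])
demo_labels = np.array([0, 1])

curve = accuracy_by_ensemble_size(demo_probs, demo_labels)
print(curve)                       # accuracy for 1, 2 and 3 members
print(first_stable_size(curve))    # 2
```

Because the running sum reuses each member's predictions, every model only needs to predict once, however many ensemble sizes are evaluated.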

Now we can update the code to use an ensemble of 13 models.

```python
from sklearn.datasets import make_blobs
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense
import numpy
from numpy import array
from numpy import argmax
from numpy import mean
from numpy import std
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# fit model on dataset
def fit_model(trainX, trainy):
    # define model
    model = Sequential()
    model.add(Dense(20, input_dim=2, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    # fit model
    model.fit(trainX, trainy, epochs=100, verbose=0)
    return model

# make an ensemble prediction for multi-class classification
def ensemble_predictions(members, testX):
    # make predictions
    yhats = [model.predict(testX) for model in members]
    yhats = array(yhats)
    # sum across ensemble members
    summed = numpy.sum(yhats, axis=0)
    # argmax across classes
    result = argmax(summed, axis=1)
    return result

# evaluate ensemble model
def evaluate_members(members, testX, testy):
    # make prediction
    yhat = ensemble_predictions(members, testX)
    # calculate accuracy
    return accuracy_score(testy, yhat)

X, y = make_blobs(n_samples=500, centers=3, n_features=2, cluster_std=2, random_state=2)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.7, random_state=1)
Y_train = to_categorical(Y_train)

# repeated evaluation
n_repeats = 10
n_members = 13  # ensemble size chosen from the sensitivity analysis
scores = list()
for _ in range(n_repeats):
    # fit all models
    members = [fit_model(X_train, Y_train) for _ in range(n_members)]
    # evaluate ensemble
    score = evaluate_members(members, X_test, Y_test)
    print('> %.3f' % score)
    scores.append(score)

# summarize the distribution of scores
print('Scores Mean: %.3f, Standard Deviation: %.3f' % (mean(scores), std(scores)))
```

Output: the ten ensemble accuracies, followed by their mean and standard deviation.
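One caveat when comparing this standard deviation with the single-model one: the single-model experiment used `statistics.stdev`, which computes the sample standard deviation, while this listing uses `numpy.std`, which by default computes the population standard deviation. The sketch below, using made-up scores, shows the difference and how to make the two agree.

```python
import statistics
import numpy as np

# Made-up accuracies from five repeated runs.
demo_scores = [0.80, 0.82, 0.81, 0.79, 0.83]

sample_sd = statistics.stdev(demo_scores)    # divides by n - 1
population_sd = np.std(demo_scores)          # divides by n (ddof=0)
matched_sd = np.std(demo_scores, ddof=1)     # divides by n - 1, like stdev

print(round(sample_sd, 4), round(population_sd, 4), round(matched_sd, 4))
```

With only ten repeats, the population formula gives a noticeably smaller number, so passing `ddof=1` to `numpy.std` (or using `statistics.stdev` in both places) keeps the comparison fair.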