This page describes ensembles of neural networks and their implementation in ALGLIB. Before reading it, you should look through the paper on the general principles of data analysis methods. That paper contains information that is important for every algorithm in this section and was moved to a separate page to avoid duplication.
A neural network ensemble is a set of neural network models that makes a decision by averaging the results of the individual models. Depending on how the ensemble is built, it addresses one of two problems: the tendency of the basic neural network architecture to underfit (the boosting meta-algorithm), or its tendency to overfit (the bagging meta-algorithm and other algorithms).
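The averaging step can be illustrated with a minimal Python sketch. It does not use ALGLIB: the function name `ensemble_predict` and the three stand-in "networks" are hypothetical and only show how the outputs of individual models are combined.

```python
from typing import Callable, List, Sequence

# A "model" here is any callable mapping an input vector to an output vector.
Model = Callable[[Sequence[float]], List[float]]


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> List[float]:
    """Average the output vectors of all member models for one input x."""
    if not models:
        raise ValueError("ensemble must contain at least one model")
    outputs = [m(x) for m in models]
    n_out = len(outputs[0])
    # Component-wise mean over the members.
    return [sum(o[i] for o in outputs) / len(outputs) for i in range(n_out)]


# Three hypothetical members standing in for trained networks.
# Each one is biased in a different direction around the "true" value x[0].
members = [
    lambda x: [x[0] + 0.5],
    lambda x: [x[0] - 0.25],
    lambda x: [x[0] - 0.25],
]

prediction = ensemble_predict(members, [2.0])
print(prediction)
```

In this toy case the individual errors cancel out in the average, which is the effect an ensemble relies on.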
Two ensemble-designing algorithms are implemented in the current version of the ALGLIB package, namely bagged neural networks and early stopping ensembles. Both algorithms make use of averaging to overcome the neural network's tendency to overfit. Boosting algorithms are not implemented yet.
The neural network ensemble subroutines are similar to the subroutines used to operate on individual neural networks (mlpbase and mlptrain modules). Operations with ensemble models are performed in three stages:
There are two restrictions placed by ALGLIB on the neural network ensembles:
The bagging meta-algorithm (short for "bootstrap aggregating") generates K new training sets by sampling examples from the original training set, uniformly and with replacement, and trains K neural networks on these sets. The records that do not get into the j-th set are used as a test set for the j-th neural network. This algorithm is described more fully in Wikipedia.
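The sampling scheme described above can be sketched in plain Python. This is not ALGLIB code: the names `bootstrap_split` and `make_bags` are hypothetical, and the sketch assumes that each new training set has the same size as the original one, which the text does not state explicitly.

```python
import random
from typing import List, Tuple


def bootstrap_split(n_records: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Return (in_bag, out_of_bag) record indices for one ensemble member.

    in_bag is drawn uniformly with replacement (assumed size: n_records);
    out_of_bag holds the records that were never drawn and can serve as a
    test set for this member.
    """
    in_bag = [rng.randrange(n_records) for _ in range(n_records)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_records) if i not in chosen]
    return in_bag, out_of_bag


def make_bags(n_records: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Generate index sets for K networks from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_split(n_records, rng) for _ in range(k)]


bags = make_bags(n_records=20, k=5)
for j, (train_idx, test_idx) in enumerate(bags):
    print(f"network {j}: {len(set(train_idx))} distinct training records, "
          f"{len(test_idx)} test records")
```

Each network would then be trained on the records listed in `train_idx` and evaluated on those in `test_idx`.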
Two subroutines can be used for training an ensemble: MLPEBaggingLM and MLPEBaggingLBFGS. The first one uses a modified Levenberg-Marquardt algorithm to train individual networks, while the second uses the L-BFGS algorithm.
The main advantage of the algorithm is that it produces an internal estimate of the generalization error as a by-product, similar to the cross-validation estimate. The main disadvantage is its high computational cost, which is comparable with that of cross-validation, while the generalization error is no better than that of a sufficiently regularized individual neural network. Some time can be saved because averaging permits less stringent stopping criteria for the training of individual networks. However, on the whole, this algorithm is not much better than the "individual neural network + regularization + cross-validation" bundle.
Early stopping is a well-known way to deal with the overfitting of a neural network model. The training set is separated into two parts: one is used for training, while the other is used for validation. A neural network with a redundant number of neurons in the hidden layer is used (e.g., a network with N inputs, M outputs and one hidden layer containing 30-100 neurons). The network's redundancy is essential for the algorithm's success: the network must be highly flexible for early stopping to be effective. Training is stopped when the error on the validation set starts growing (hence the name "early stopping").
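The stopping rule can be sketched as a generic training loop in Python. This is an illustration, not ALGLIB's implementation: `train_with_early_stopping`, `train_step`, `validation_error` and the `patience` parameter are hypothetical names, and the "network" in the demo is a single scalar weight whose training optimum (3.0) deliberately differs from its validation optimum (2.0).

```python
from typing import Callable, Tuple, TypeVar

W = TypeVar("W")


def train_with_early_stopping(
    weights: W,
    train_step: Callable[[W], W],
    validation_error: Callable[[W], float],
    max_epochs: int = 1000,
    patience: int = 1,
) -> Tuple[W, float, int]:
    """Train until the validation error stops improving.

    Returns (best_weights, best_validation_error, epochs_run).
    patience=1 stops at the first epoch where the validation error does
    not decrease; larger values tolerate temporary increases (assumption,
    not taken from the text).
    """
    best_w, best_err = weights, validation_error(weights)
    bad_epochs = 0
    epochs_run = 0
    for epochs_run in range(1, max_epochs + 1):
        weights = train_step(weights)          # one pass over the training part
        err = validation_error(weights)        # error on the held-out part
        if err < best_err:
            best_w, best_err, bad_epochs = weights, err, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break                          # validation error started growing
    return best_w, best_err, epochs_run


# Toy demo: each step moves the weight halfway towards the training optimum 3.0,
# while the validation error is minimal at 2.0.
best_w, best_err, epochs = train_with_early_stopping(
    weights=0.0,
    train_step=lambda w: w + 0.5 * (3.0 - w),
    validation_error=lambda w: (w - 2.0) ** 2,
)
print(best_w, best_err, epochs)
```

The loop returns the weights with the lowest validation error seen so far rather than the final weights, so the epoch at which the error began to grow is discarded.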
Such neural networks are characterized by low bias but high variance. This means that an individual neural network trained using early stopping has too high an error, but averaging several such networks (ten is a good value) substantially decreases the error. Experimental results show that the generalization error of an early stopping ensemble is comparable with that of an individual neural network of optimal architecture trained by a traditional algorithm. However, an individual neural network needs long and complex tuning (searching through combinations of architecture and regularization parameter), while an ensemble of early stopping networks does not need tuning at all.
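The effect of averaging on variance can be made concrete with a standard identity. It is an idealization that is not taken from this page: it assumes the K networks' prediction errors $e_1, \dots, e_K$ have zero mean, equal variance $\sigma^2$ and equal pairwise correlation $\rho$.

```latex
\operatorname{Var}\!\left(\frac{1}{K}\sum_{i=1}^{K} e_i\right)
  = \frac{1}{K^2}\left(K\sigma^2 + K(K-1)\rho\sigma^2\right)
  = \rho\,\sigma^2 + \frac{(1-\rho)\,\sigma^2}{K}
```

Under these assumptions, ten uncorrelated networks ($\rho = 0$, $K = 10$) would reduce the variance tenfold, while strongly correlated networks gain much less from averaging. Bias is not reduced by averaging, which is why low-bias members are needed.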
Thus, early stopping neural network ensembles are characterized by the following advantages:
The algorithm's disadvantages:
The training set format is described in the paper that is recommended at the top of the page. That paper also deals with such problems as missing values and nominal variable encoding. It should be noted that the dataset format depends on which problem - regression or classification - the network solves.