Neural network ensembles
This page describes neural network ensembles and their implementation in ALGLIB. Before reading it, please look through the paper on the general principles of data analysis methods. That paper contains important information that applies to every algorithm in this section and was moved to a separate page to avoid duplication.
A neural network ensemble is a set of neural network models that makes a decision by averaging the results of the individual models. Depending on how the ensemble is built, it addresses one of two problems: the tendency of the basic neural network architecture to underfit (the boosting meta-algorithm), or its tendency to overfit (the bagging meta-algorithm and other algorithms).
Two ensemble-designing algorithms are implemented in the current version of the ALGLIB package, namely bagged neural networks and early stopping ensembles. Both algorithms make use of averaging to overcome the neural network's tendency to overfit. Boosting algorithms are not implemented yet.
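The averaging step itself is simple. The sketch below is library-independent and does not use the ALGLIB API: `ensemble_predict` and the toy models are hypothetical names introduced only for illustration, and each "network" is assumed to be any callable that maps an input vector to an output vector.

```python
from typing import Callable, List, Sequence

# Assumption: a "model" is any callable mapping an input vector to an output vector.
Model = Callable[[Sequence[float]], List[float]]


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> List[float]:
    """Average the output vectors of all member models for one input x."""
    if not models:
        raise ValueError("ensemble must contain at least one model")
    outputs = [m(x) for m in models]
    n_out = len(outputs[0])
    return [sum(o[j] for o in outputs) / len(outputs) for j in range(n_out)]


# Three toy stand-ins for trained networks: each is off by a different amount.
toy_models = [
    lambda x: [x[0] + 0.25],
    lambda x: [x[0] - 0.25],
    lambda x: [x[0]],
]

averaged = ensemble_predict(toy_models, [1.0])
print(averaged)
```

In this toy case the individual errors have opposite signs, so they cancel in the average. Real networks cancel each other's errors only partially.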
Using Neural Network Ensembles
The neural network ensemble subroutines are similar to the subroutines used to operate on individual neural networks (mlpbase and mlptrain modules). Operations with ensemble models are performed in three stages:
- Selection of a basic neural network architecture and of the number of networks in the ensemble, followed by initialization with one of the MLPECreateXX subroutines. The interface of these subroutines almost fully duplicates that of the corresponding subroutines of the mlpbase module. The ensemble is initialized with random values and needs training.
- Training using one of the algorithms set forth below.
- Using the trained ensemble (mapping inputs to outputs, serialization, etc.). The subroutines performing these operations are almost identical to the analogous subroutines of the mlpbase module.
ALGLIB places two restrictions on neural network ensembles:
- The size of an ensemble is fixed at initialization. Dynamically growing ensembles are not supported.
- The ensemble is uniform, that is, all neural networks in the ensemble have the same architecture.
Bagged neural networks
The bagging meta-algorithm (short for "bootstrap aggregating") generates K new training sets by sampling examples from the original training set, uniformly and with replacement, and trains K neural networks on these training sets. The records that do not get into the j-th set are used as a test set for the j-th neural network. The algorithm is described more fully in Wikipedia.
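The sampling step can be sketched as follows. This is an illustration, not ALGLIB code: `bootstrap_split` and `make_bags` are hypothetical helper names, and the sketch assumes each bootstrap sample has the same size as the original training set, which the text above does not specify.

```python
import random
from typing import List, Tuple


def bootstrap_split(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_samples row indices uniformly with replacement.

    Returns (in_bag, out_of_bag): the drawn indices (duplicates possible)
    and the indices that were never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def make_bags(n_samples: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Build K (training indices, test indices) pairs, one per network."""
    rng = random.Random(seed)
    return [bootstrap_split(n_samples, rng) for _ in range(k)]


bags = make_bags(n_samples=20, k=5)
for j, (in_bag, oob) in enumerate(bags):
    print(f"network {j}: {len(set(in_bag))} distinct training rows, "
          f"{len(oob)} out-of-bag rows")
```

Under this assumption, roughly 1/e (about 37%) of the records are left out of each bootstrap sample when the training set is large, so every network gets a test set of useful size.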
Two subroutines can be used for training an ensemble: MLPEBaggingLM and MLPEBaggingLBFGS. The first one uses a modified Levenberg-Marquardt algorithm to train individual networks, while the second uses the L-BFGS algorithm.
The main advantage of the algorithm is that it produces an internal generalization error estimate as a by-product of training, similar to the cross-validation estimate. The main disadvantage is the high computational cost, comparable with that of cross-validation, while the generalization error is no better than that of a sufficiently regularized individual neural network. Some time can be saved because averaging permits less stringent stopping criteria for the individual network training algorithm. On the whole, however, this algorithm is not much better than the "individual neural network + regularization + cross-validation" bundle.
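One common way to form such an internal estimate is the out-of-bag error: each record is predicted only by the networks that did not see it during training. The sketch below shows the idea under stated assumptions. It is not necessarily how ALGLIB computes its estimate, `oob_rmse` is a hypothetical name, and models are assumed to be callables returning a single number.

```python
from typing import Callable, Sequence, Set


def oob_rmse(
    models: Sequence[Callable[[float], float]],
    oob_sets: Sequence[Set[int]],
    xs: Sequence[float],
    ys: Sequence[float],
) -> float:
    """RMS error where each row is predicted only by models that did not train on it."""
    sq_err, counted = 0.0, 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [m(x) for m, oob in zip(models, oob_sets) if i in oob]
        if not preds:
            continue  # row was used by every model; no out-of-bag prediction exists
        avg = sum(preds) / len(preds)
        sq_err += (avg - y) ** 2
        counted += 1
    if counted == 0:
        raise ValueError("no row is out-of-bag for any model")
    return (sq_err / counted) ** 0.5


# Toy data: the target is y = 2x; one model is exact, the other is biased by +1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]
toy_models = [lambda x: 2.0 * x, lambda x: 2.0 * x + 1.0]
toy_oob_sets = [{0, 1}, {1, 2}]  # rows each model did NOT train on

estimate = oob_rmse(toy_models, toy_oob_sets, xs, ys)
print(estimate)
```

Row 3 is skipped here because both toy models trained on it. With more networks in the ensemble, such rows become rare.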
Early stopping ensembles
Early stopping is a well-known way to deal with overfitting of a neural network model. The training set is split into two parts: one is used for training, the other for validation. A neural network with a redundant number of neurons in the hidden layer is used (e.g., a network with N inputs, M outputs and one hidden layer containing 30-100 neurons). The network's redundancy is essential for the algorithm's success: the network must be highly flexible for early stopping to be effective. Training is stopped when the error on the validation set starts growing (hence the name "early stopping").
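The stopping rule can be sketched as a generic training loop. This is a minimal illustration, not ALGLIB code. The names `early_stopping`, `train_epoch`, `val_error` and the `patience` parameter are assumptions introduced here; `patience=1` corresponds to stopping as soon as the validation error fails to improve.

```python
from typing import Any, Callable, Tuple


def early_stopping(
    train_epoch: Callable[[], Any],
    val_error: Callable[[Any], float],
    max_epochs: int = 1000,
    patience: int = 1,
) -> Tuple[Any, float]:
    """Train until the validation error stops improving.

    train_epoch() runs one epoch and returns a snapshot of the weights.
    val_error(weights) returns the error on the held-out validation part.
    Returns the best weights seen and their validation error.
    """
    best_weights, best_err = None, float("inf")
    bad_epochs = 0
    for _ in range(max_epochs):
        weights = train_epoch()
        err = val_error(weights)
        if err < best_err:
            best_weights, best_err = weights, err
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation error started growing
    return best_weights, best_err


# Toy stand-in for training: the "weights" are just the epoch counter, and the
# validation error falls until epoch 4 and rises afterwards.
state = {"epoch": 0}


def toy_train_epoch() -> int:
    state["epoch"] += 1
    return state["epoch"]


def toy_val_error(w: int) -> float:
    return (w - 4) ** 2 + 1.0


best_w, best_e = early_stopping(toy_train_epoch, toy_val_error)
print(best_w, best_e)
```

Returning the best snapshot instead of the last one means that the epoch that triggered the stop does not degrade the result.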
Such neural networks are characterized by low bias but high variance. This means that an individual neural network trained with early stopping has too high an error, but averaging several such networks (ten is a good value) substantially decreases the error. Experimental results show that the generalization error of an early stopping ensemble is comparable with that of an individual neural network of optimal architecture trained by a traditional algorithm. However, an individual neural network needs long and complex tuning (searching through combinations of architecture and regularization parameter), while an ensemble of early stopping networks does not need tuning at all.
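Why averaging helps low-bias, high-variance models can be seen from a standard identity. It is stated here under idealized assumptions that are not part of the original text: at a fixed input x, the K network outputs f_k(x) have the same mean, the same variance σ², and the same pairwise correlation ρ.

```latex
\bar{f}(x) = \frac{1}{K}\sum_{k=1}^{K} f_k(x), \qquad
\mathbb{E}\,\bar{f}(x) = \mathbb{E}\,f_k(x), \qquad
\operatorname{Var}\bar{f}(x) = \rho\,\sigma^2 + \frac{1-\rho}{K}\,\sigma^2 .
```

Averaging leaves the bias unchanged and shrinks only the variance. For K = 10 and nearly uncorrelated networks the variance drops to about a tenth of that of a single network. Networks trained on the same data are correlated in practice, and the term ρσ² limits how much can be gained by enlarging the ensemble.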
Thus, early stopping neural network ensembles are characterized by the following advantages:
- In most cases, the ensemble has a smaller generalization error than a traditional neural network of optimum architecture.
- The early stopping ensemble is less prone to overfitting than a traditional neural network.
- Ordinary neural networks require optimum architecture selection - searching through different networks with one or two hidden layers and through different values of the regularization parameter. The early stopping ensemble does not require complex tuning: you just need to take a network with one hidden layer and with an excessive number of neurons in it (e.g., 30 to 100). Training does not require human intervention.
The algorithm's disadvantages:
- On some rare problems, the ensemble error is greater than the error of a traditional neural network: the ensemble underfits. The reason for this is not yet clearly understood.
- Ensemble training is several times slower than training a traditional neural network. For example, an ensemble of ten early stopping networks may train 2 to 3 times slower than a single network (trained with L-BFGS).
Open issues:
- The role of regularization in training early stopping networks is still not certain. Even an unregularized ensemble is usually trained quite well. However, higher regularization may prove advantageous in dealing with noisy tasks.
- The optimum size of the hidden layer needs clarification. The hidden layer must be oversized; otherwise, the early stopping algorithm will be ineffective. However, it is still an open question whether there is a limit beyond which further oversizing increases the generalization error.
- Whether an early stopping ensemble is tolerant of noise in the training set needs to be examined.
Training Set Format
The training set format is described in the paper that is recommended at the top of the page. That paper also deals with such problems as missing values and nominal variable encoding. It should be noted that the dataset format depends on which problem - regression or classification - the network solves.
This article is intended for personal use only.