Capacity control in linear classifiers for pattern recognition.

I. Guyon, V. Vapnik, B. Boser, L. Bottou, and S.A. Solla.
In 11th International Conference on Pattern Recognition, volume II, pages 385-388.
1992

Achieving good performance in statistical pattern recognition requires matching the capacity of the classifier to the amount of training data. If the classifier has too many adjustable parameters (large capacity), it is likely to learn the training data without difficulty, but will probably not generalize properly to patterns that do not belong to the training set. Conversely, if the capacity of the classifier is not large enough, it might not be able to learn the task at all. In between, there is an optimal classifier capacity which ensures the best expected generalization for a given amount of training data.

The method of Structural Risk Minimization (SRM) refers to tuning the capacity of the classifier to the available amount of training data. In this paper, we illustrate the method of SRM with several examples of algorithms. We present experiments which confirm theoretical predictions of performance improvement in application to handwritten digit recognition.
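
The idea of matching capacity to the amount of training data can be sketched in a few lines. The example below is a hedged illustration and not one of the algorithms from the paper: it assumes a least-squares linear classifier whose capacity is controlled by a weight-decay parameter `lam`, and it picks `lam` by held-out validation error as a practical stand-in for the guaranteed-risk bound used in SRM. All function names, the synthetic data, and the grid of `lam` values are hypothetical.

```python
import numpy as np

def fit_ridge_classifier(X, y, lam):
    """Least-squares linear classifier with weight decay lam; labels are in {-1, +1}."""
    d = X.shape[1]
    # Larger lam shrinks the weights more, which lowers the effective capacity.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def error_rate(w, X, y):
    """Fraction of patterns whose predicted sign differs from the label."""
    return float(np.mean(np.sign(X @ w) != y))

def select_capacity(X_train, y_train, X_val, y_val, lams):
    """Fit one classifier per lam and keep the one with the lowest validation error.

    Assumption: validation error replaces the theoretical risk bound of SRM.
    Ties are broken in favour of the larger lam (the lower-capacity classifier).
    """
    results = []
    for lam in lams:
        w = fit_ridge_classifier(X_train, y_train, lam)
        results.append((error_rate(w, X_val, y_val), lam, w))
    best_err, best_lam, best_w = min(results, key=lambda r: (r[0], -r[1]))
    curve = [(lam, err) for err, lam, _ in results]
    return best_lam, best_w, curve

def make_data(rng, n, d=50, informative=5):
    """Hypothetical synthetic task: only the first few inputs carry signal."""
    X = rng.normal(size=(n, d))
    w_true = np.zeros(d)
    w_true[:informative] = 1.0
    y = np.sign(X @ w_true + 0.5 * rng.normal(size=n))
    y[y == 0] = 1.0  # guard against an exact zero, which has no class
    return X, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_tr, y_tr = make_data(rng, 40)    # fewer training patterns than inputs
    X_va, y_va = make_data(rng, 200)
    best_lam, w, curve = select_capacity(X_tr, y_tr, X_va, y_va,
                                         [1e-3, 1e-1, 1e1, 1e3])
    for lam, err in curve:
        print(f"lam={lam:g}  validation error={err:.3f}")
    print("selected lam:", best_lam)
```

With fewer training patterns than input dimensions, the smallest `lam` fits the training set almost exactly, so the validation error is the only signal that separates the candidate capacities in this sketch.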
