Structural risk minimization for character recognition.

I. Guyon, V. Vapnik, B. Boser, L. Bottou, and S.A. Solla.
In J. E. Moody et al., editors, Advances in Neural Information Processing Systems 4 (NIPS 91), pages 471–479, San Mateo, CA, Morgan Kaufmann.

Generalization properties of learning systems are influenced by several factors, including (1) the properties of the input space, (2) the nature and structure of the classifier, and (3) the learning algorithm. The notion of Structural Risk Minimization provides a theoretical framework that accounts for various methods of tuning the capacity of the system based on these three factors. It is shown that methods as apparently different as Principal Component Analysis, Optimal Brain Damage, and Weight Decay (acting on factors (1), (2), and (3), respectively) in fact share a common mechanism. It is also shown that synergistic effects can be obtained by combining techniques based on the three factors, such as (1) smoothing, (2) second-order units, and (3) regularization. Our experimental results indicate that the generalization performance of the system can be accurately predicted when the definition of capacity incorporates all three factors (1), (2), and (3), rather than factor (2) alone.
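A minimal Python sketch of the two ideas in the abstract follows. It is written under stated assumptions and is not the paper's own derivation. `srm_select` performs Structural Risk Minimization over a nested structure S_1 ⊂ S_2 ⊂ ... by minimizing empirical risk plus a capacity term. The capacity term is assumed here to be the textbook Vapnik bound with VC dimension h, sample size n and confidence 1 - eta, and the paper's exact capacity estimate may differ. `shrinkage` illustrates one standard way to see a shared mechanism for a linear least-squares model. Along each eigendirection of X^T X (eigenvalue d), weight decay with coefficient gamma scales the solution by d/(d + gamma), while keeping the top-k principal components scales it by 1 or 0. Optimal Brain Damage, which prunes weights by a second-order saliency, is not modeled. All function names and the numbers in the usage lines are hypothetical.

```python
import math

def vc_confidence(h, n, eta=0.05):
    """Capacity term of the classic Vapnik bound (assumed form): with
    probability >= 1 - eta, R <= R_emp + sqrt((h*(ln(2n/h)+1) - ln(eta/4)) / n)."""
    if not 0 < h < n:
        raise ValueError("bound is only meaningful for 0 < h < n")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def srm_select(structure, n, eta=0.05):
    """Structural Risk Minimization: among nested model classes, given as a
    list of (label, empirical_risk, vc_dim), return (guaranteed_risk, label)
    for the class with the smallest guaranteed risk."""
    scored = [(r_emp + vc_confidence(h, n, eta), label)
              for label, r_emp, h in structure]
    return min(scored)

def shrinkage(eigenvalues, gamma=None, k=None):
    """Factor applied to the least-squares weights along each eigendirection
    of X^T X: weight decay (gamma) gives d / (d + gamma); hard PCA truncation
    (keep the k largest eigenvalues) gives 1.0 or 0.0."""
    if gamma is not None:
        return [d / (d + gamma) for d in eigenvalues]
    if k is None:
        raise ValueError("give either gamma (weight decay) or k (PCA)")
    order = sorted(range(len(eigenvalues)), key=lambda i: -eigenvalues[i])
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(eigenvalues))]

if __name__ == "__main__":
    # Hypothetical nested structure: larger h fits the training set better.
    structure = [("h=10", 0.20, 10), ("h=50", 0.08, 50),
                 ("h=200", 0.05, 200), ("h=1000", 0.01, 1000)]
    print(srm_select(structure, n=10000))  # picks "h=50" with these made-up numbers
    print(shrinkage([4.0, 1.0, 0.25], gamma=1.0))  # soft, graded shrinkage
    print(shrinkage([4.0, 1.0, 0.25], k=2))        # hard cut-off
```

The two `shrinkage` modes differ only in how they weight the same eigendirections. Weight decay attenuates low-variance directions gradually, while PCA discards them outright. Both reduce the effective capacity that `srm_select` trades against training error.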
