Isabelle Guyon, Hans-Marcus Bitter, Zulfikar Ahmed, Michael Brown, and Jonathan Heller.
In Proceedings of the BISC FLINT-CIBI 2003 Workshop, Berkeley, Dec. 2003.
We address problems of classification in which the number of input
components (variables, features) is very large compared to the number of
training samples. In this setting, it is often desirable to perform feature
selection to reduce the number of inputs, whether for efficiency, for performance,
or to gain understanding of the data and the classifiers. We compare a
number of methods on mass-spectrometric data of human protein sera from
asymptomatic patients and prostate cancer patients. We show empirical evidence
that, despite the high risk of overfitting, non-linear methods can
outperform linear methods, both in performance and in the number of features selected.