Asa Ben-Hur and Isabelle Guyon.
In Methods in Molecular Biology, M.J. Brownstein and A. Khodursky (eds.), Humana Press, pp. 159-182.
The emergence of cluster structure depends on several choices: data representation
and normalization, the similarity measure, and the clustering algorithm. In this
chapter we extend the stability-based validation of cluster structure and propose stability
as a figure of merit for comparing clustering solutions, which helps in making
these choices. We use this framework to demonstrate the ability of Principal Component
Analysis (PCA) to extract features relevant to the cluster structure. We use stability as a
tool for simultaneously choosing the number of principal components and the number of
clusters; we compare the performance of different similarity measures and normalization
schemes. The approach is demonstrated through a case study of yeast gene expression data
from Eisen et al.
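
The idea of stability as a figure of merit can be illustrated with a minimal sketch. It is not the chapter's implementation and rests on several assumptions: k-means with farthest-first initialization stands in for the clustering algorithm, the Jaccard coefficient over co-clustered pairs is used to compare two labelings, pairs of subsamples cover 80% of the data, and the toy data come from a hypothetical `make_blobs` helper. The PCA projection step is omitted; in a fuller version the points would first be projected onto the leading principal components and the score computed for each combination of component count and cluster count.

```python
import random
from itertools import combinations


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_iter=20):
    """Lloyd's k-means with farthest-first initialization (an assumed stand-in
    for the clustering algorithm). Returns one cluster label per point."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def jaccard(labels_a, labels_b):
    """Jaccard coefficient over point pairs co-clustered in either labeling."""
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        n11 += same_a and same_b
        n10 += same_a and not same_b
        n01 += same_b and not same_a
    denom = n11 + n10 + n01
    return n11 / denom if denom else 1.0


def stability(points, k, n_pairs=20, frac=0.8, seed=0):
    """Mean similarity between clusterings of pairs of random subsamples,
    compared on the points the two subsamples share."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(n_pairs):
        idx_a = rng.sample(range(n), m)
        idx_b = rng.sample(range(n), m)
        lab_a = dict(zip(idx_a, kmeans([points[i] for i in idx_a], k, rng)))
        lab_b = dict(zip(idx_b, kmeans([points[i] for i in idx_b], k, rng)))
        common = sorted(set(idx_a) & set(idx_b))
        scores.append(jaccard([lab_a[i] for i in common], [lab_b[i] for i in common]))
    return sum(scores) / len(scores)


def make_blobs(seed=0, per_blob=30):
    """Hypothetical toy data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(per_blob)]


if __name__ == "__main__":
    data = make_blobs()
    for k in range(2, 6):
        print(k, round(stability(data, k), 3))
```

A number of clusters whose score stays close to 1 across subsamples is a candidate for the structure present in the data; scores that drop for other values indicate partitions that depend on which points happened to be sampled.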