Model Selection for Gaussian Processes
Chris Williams
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh
In this talk I
will discuss the model selection problem from a Bayesian viewpoint. I will
describe model selection for Gaussian process predictors, based on (i) the
marginal likelihood, and (ii) the leave-one-out log predictive probability
(a cross-validated relative of the marginal likelihood). Both of these criteria
are continuous functions of the hyperparameters and can be optimized using
standard gradient-based optimizers. These methods are useful for setting kernel
parameters and for model comparison, but note that model comparison is essentially
an open-ended problem: when is a model good enough? MacKay (2003) asked of
kernel methods, "did we throw the baby out with the bathwater?",
i.e. what happened to the hidden feature representations that are developed
in neural networks? We argue that these "hidden" properties of the problem
are to be found in
more sophisticated kernel functions.
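As a concrete illustration of criterion (i), the following sketch fits the hyperparameters of a Gaussian process regressor by maximizing the log marginal likelihood with a gradient-based optimizer. It uses scikit-learn on a toy 1-D regression problem; the particular kernel (RBF plus noise) and data are illustrative assumptions, not taken from the talk.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# Toy data: noisy samples from a sine function
rng = np.random.RandomState(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(40)

# Kernel with three hyperparameters: signal variance, length-scale, noise level
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

# fit() maximizes the log marginal likelihood over the (log-)hyperparameters
# using a gradient-based optimizer (L-BFGS-B by default)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=3, random_state=0)
gp.fit(X, y)

print("Optimized kernel:", gp.kernel_)
print("Log marginal likelihood at optimum:",
      gp.log_marginal_likelihood(gp.kernel_.theta))
```

Because the marginal likelihood is a smooth function of the hyperparameters, its gradient is available in closed form and the optimizer typically converges in a few dozen iterations; the optimized log marginal likelihood is at least as high as at the initial hyperparameter values.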