Model Selection for Gaussian Processes

Chris Williams
Institute for Adaptive and Neural Computation
School of Informatics
University of Edinburgh 

In this talk I will discuss the model selection problem from a Bayesian viewpoint. I will describe model selection for Gaussian process predictors based on (i) the marginal likelihood, and (ii) the leave-one-out log predictive probability (a cross-validated relative of the marginal likelihood). Both criteria are continuous functions of the hyperparameters and can be optimized using standard gradient-based optimizers. These methods are useful for setting kernel parameters and for model comparison, although model comparison is essentially an open-ended problem: when is a model good enough?

MacKay (2003) asked of kernel methods, "did we throw the baby out with the bathwater?", i.e. what happened to the hidden feature representations that are developed in neural networks? We argue that these "hidden" properties of the problem are to be found in more sophisticated kernel functions.
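By way of illustration (this sketch is not from the talk), criterion (i) can be computed and optimized in a few lines of numpy/scipy. The squared-exponential kernel and the three hyperparameters (lengthscale, signal variance, noise variance) are illustrative assumptions; scipy's default BFGS here approximates the gradient by finite differences, whereas in practice one would supply the analytic gradient of the log marginal likelihood.

    import numpy as np
    from scipy.optimize import minimize

    def rbf_kernel(X1, X2, lengthscale, signal_var):
        # Squared-exponential (RBF) kernel matrix (an illustrative choice).
        sqdist = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
                  - 2 * X1 @ X2.T)
        return signal_var * np.exp(-0.5 * sqdist / lengthscale**2)

    def neg_log_marginal_likelihood(log_theta, X, y):
        # Hyperparameters are optimized in log space to keep them positive.
        lengthscale, signal_var, noise_var = np.exp(log_theta)
        n = X.shape[0]
        K = rbf_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(n)
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        # -log p(y|X) = 1/2 y^T K^{-1} y + 1/2 log|K| + (n/2) log(2*pi),
        # where log|K| = 2 * sum(log diag(L)).
        return (0.5 * y @ alpha
                + np.sum(np.log(np.diag(L)))
                + 0.5 * n * np.log(2 * np.pi))

    # Toy data: noisy samples of a sine function.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, (30, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

    res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y))
    print("optimized (lengthscale, signal var, noise var):", np.exp(res.x))

Criterion (ii) can be evaluated at essentially no extra cost from the same Cholesky factor: the leave-one-out predictive means and variances follow from the diagonal of K^{-1} (see Rasmussen and Williams, Gaussian Processes for Machine Learning, ch. 5).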