4th and 5th lecture: Kernels, PCA, and kernel PCA
PCA: Principal Component Analysis is a linear technique for reducing the
dimensionality of data. The main idea is to find the directions in the (high-dimensional)
space along which the data varies most and to ignore all other directions.
We discuss two different ways to derive PCA: as the projection that minimizes the
squared reconstruction error, and as the projection that maximizes the variance
of the projected data.
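To make this concrete, here is a minimal NumPy sketch of PCA via the
eigendecomposition of the sample covariance matrix (an illustration only, not
the course demo code; the function name pca and its interface are made up for
this sketch):

    import numpy as np

    def pca(X, k):
        # Project the rows of the (n, d) data matrix X onto the k
        # directions of highest variance; returns the (n, k) projected
        # data and the (d, k) matrix of principal directions.
        Xc = X - X.mean(axis=0)                 # PCA assumes centered data
        cov = Xc.T @ Xc / (len(Xc) - 1)         # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues ascending
        directions = eigvecs[:, ::-1][:, :k]    # top-k eigenvectors
        return Xc @ directions, directions

Both derivations lead to exactly these directions: the top-k eigenvectors of
the covariance matrix simultaneously minimize the squared reconstruction error
and maximize the projected variance.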
Literature on PCA: Classical PCA is covered in many statistics books:
- A complete book on PCA is Jolliffe: Principal Component Analysis. Springer,
2002.
- Chapter 8 in Mardia, Kent, Bibby: Multivariate Analysis. Academic Press,
1979. A classic.
Kernels: very convenient similarity functions that implicitly come with an
embedding of the data into a high-dimensional feature space.
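As an illustration, here is a sketch of one widely used kernel, the Gaussian
(RBF) kernel; the Gram matrix it produces is all that kernel algorithms need
from the data (the function name and the bandwidth parameter gamma are choices
made for this sketch):

    import numpy as np

    def rbf_kernel(X, Y, gamma=1.0):
        # Gram matrix K with K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2),
        # computed via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
        sq_dists = (np.sum(X**2, axis=1)[:, None]
                    + np.sum(Y**2, axis=1)[None, :]
                    - 2.0 * X @ Y.T)
        return np.exp(-gamma * np.maximum(sq_dists, 0.0))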
Kernel PCA: combines the kernel trick with PCA.
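A minimal sketch of that combination, assuming the (n, n) Gram matrix K has
already been computed (for example with the rbf_kernel sketch above): PCA in
the implicit feature space reduces to an eigendecomposition of the centered
Gram matrix.

    import numpy as np

    def kernel_pca(K, k):
        # Returns an (n, k) matrix whose row i holds the projections of
        # training point i onto the top-k principal components in feature space.
        n = len(K)
        one = np.ones((n, n)) / n
        Kc = K - one @ K - K @ one + one @ K @ one  # centering in feature space
        eigvals, eigvecs = np.linalg.eigh(Kc)       # ascending order
        eigvals = eigvals[::-1][:k]
        eigvecs = eigvecs[:, ::-1][:, :k]
        # For a unit-norm eigenvector alpha of Kc with eigenvalue lambda,
        # the projection of point i is sqrt(lambda) * alpha[i].
        return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))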
Literature on kernel PCA:
- Chapter 14.2 of Schölkopf and Smola: Learning with Kernels. MIT Press,
2002.
- Chapter 6.2 of Shawe-Taylor and Cristianini: Kernel Methods for Pattern
Analysis. Cambridge University Press, 2004.
- The original article: B. Schölkopf, A. Smola, and K.-R. Müller. Kernel
Principal Component Analysis. In B. Schölkopf, C. J. C. Burges, and A. J.
Smola, editors, Advances in Kernel Methods -- Support Vector Learning, pages
327-352. MIT Press, Cambridge, MA, 1999.
Demos:
- PCA first demo: this demo shows the projections
produced by a simple PCA application, step by step. Call it for example
with demo_pca(3,2).
- PCA second demo: shows how PCA
can be applied to handwritten digits; call it for example with demo_pca_usps(300).
For this demo you also need the USPS (US Postal Service) handwritten digits
data set.
- Kernel PCA demo: shows how
kernel PCA works.