
Exercises * on Principal Component Analysis

Laurenz Wiskott
Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, EU

4 February 2017

© 2016, 2017 Laurenz Wiskott (homepage https://www.ini.rub.de/people/wiskott/). This work (except for all figures from other sources, if present) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Figures from other sources have their own copyright, which is generally indicated. Do not distribute parts of these lecture notes showing figures with non-free copyrights (here usually figures I have the rights to publish but you don't, like my own published figures). Several of my exercises (not necessarily on this topic) were inspired by papers and textbooks by other authors. Unfortunately, I did not document that well, because initially I did not intend to make the exercises publicly available, and now I cannot trace it back anymore. So I cannot give as much credit as I would like to. The concrete versions of the exercises are certainly my own work, though. In cases where I reuse an exercise in different variants, references may be wrong for technical reasons.

* These exercises complement my corresponding lecture notes, available at https://www.ini.rub.de/people/wiskott/Teaching/Material/, where you can also find other teaching material such as programming exercises. The table of contents of the lecture notes is reproduced here to give an orientation as to when the exercises can reasonably be solved. For the best learning effect I recommend that you first seriously try to solve the exercises yourself before looking into the solutions.

Contents

1 Intuition
1.1 Problem statement
1.1.1 Exercise: Second moment from mean and variance
1.1.2 Exercise: Second moment of a uniform distribution
1.2 Projection and reconstruction error
1.2.1 Exercise: Projection by an inner product is orthogonal
1.2.2 Exercise: Error function
1.3 Reconstruction error and variance
1.4 Covariance matrix
1.4.1 Exercise: Relation among the elements of a second moment matrix
1.4.2 Exercise: From data distribution to second-moment matrix
1.4.3 Exercise: From data distribution to second-moment matrix
1.4.4 Exercise: From second-moment matrix to data
1.4.5 Exercise: Data distributions with and without mean

1.5 Covariance matrix and higher order structure
1.6 PCA by diagonalizing the covariance matrix
2 Formalism
2.1 Definition of the PCA-optimization problem
2.2 Matrix V^T: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system
2.3 Matrix V: Mapping from low-dimensional new coordinate system to subspace in old coordinate system
2.3.1 Exercise: Norm of a vector
2.4 Matrix (V^T V): Identity mapping within new coordinate system
2.5 Matrix (V V^T): Projection from high- to low-dimensional (sub)space within old coordinate system
2.6 Variance
2.7 Reconstruction error
2.8 Covariance matrix
2.8.1 Exercise: Second-moment matrices are positive semi-definite
2.8.2 Exercise: Covariance matrix from mean and second-moment matrix
2.9 Eigenvalue equation of the covariance matrix
2.9.1 Exercise: Eigenvectors of a symmetric matrix are orthogonal
2.10 Total variance of the data x
2.11 Diagonalizing the covariance matrix
2.12 Variance of y for a diagonalized covariance matrix
2.13 Constraints of matrix V
2.14 Finding the optimal subspace
2.15 Interpretation of the result
2.15.1 Exercise: Moments of a data distribution: Simple example
2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors
2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors
2.15.4 Exercise: Dimensionality reduction
2.16 PCA Algorithm
2.17 Intuition of the Results
2.18 Whitening or sphering
2.18.1 Exercise: Sphered data is uncorrelated
2.19 Singular value decomposition +

3 Application
3.1 Face processing
4 Acknowledgment

1 Intuition

1.1 Problem statement

1.1.1 Exercise: Second moment from mean and variance

How are mean $m$, variance $v$, and second moment $s$ related to each other? In other words, if the mean and variance of a one-dimensional distribution were given, how could you compute the corresponding second moment?

Hint: Assume $x$ to be the data values and $\bar{x}$ their mean. Then play around with the corresponding expressions for the mean $\bar{x} = \langle x \rangle$, the variance $\langle (x - \bar{x})^2 \rangle$, and the second moment $\langle x^2 \rangle$.

1.1.2 Exercise: Second moment of a uniform distribution

Calculate the second moment of a uniform, i.e. flat, distribution on $[-1, +1]$. This is a distribution where every value between $-1$ and $+1$ is equally likely and all other values are impossible.

1.2 Projection and reconstruction error

1.2.1 Exercise: Projection by an inner product is orthogonal

1. We have defined the projected vector $x_\parallel$ by

   $x_\parallel = v v^T x$   (1)

   where $x$ is the data point and $v$ is the unit vector along the principal axis of the projection. Show that the difference vector

   $x_\perp = x - x_\parallel$   (2)

   between the data point and the projected data point is orthogonal to $v$.

2. Give a reason why the orthogonality of the two vectors is useful.

1.2.2 Exercise: Error function

Why should the reconstruction error $E$ be defined as the mean of the squared difference between the original and the reconstructed data vectors, and not simply as the mean of the difference or the mean of the absolute difference?
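For a quick numerical cross-check of the quantities appearing in these exercises, a minimal NumPy sketch along the following lines can be used (this is an added illustration, not part of the exercises; the data and the vector are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exercise 1.1.2: Monte-Carlo estimate of the second moment of a uniform
# distribution on [-1, +1]; compare it with your analytic result.
u = rng.uniform(-1.0, 1.0, size=100_000)
print("estimated second moment:", (u ** 2).mean())

# Exercise 1.2.1: the difference vector x_perp = x - x_par is numerically
# orthogonal to the unit vector v used for the projection x_par = v v^T x.
X = rng.normal(size=(1000, 3))          # data points x as rows
v = np.array([1.0, 2.0, 2.0]) / 3.0     # a unit vector
X_par = np.outer(X @ v, v)              # projected vectors
X_perp = X - X_par                      # difference vectors
print("max |<x_perp, v>|:", np.abs(X_perp @ v).max())
```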

1.3 Reconstruction error and variance

1.4 Covariance matrix

1.4.1 Exercise: Relation among the elements of a second moment matrix

For a set of data vectors $x^\mu$, $\mu = 1, \ldots, M$, the second moment matrix $C$ is defined as $C_{ij} := \langle x_i^\mu x_j^\mu \rangle_\mu$. What are the upper and lower limits of $C_{ij}$ if $C_{ii}$ and $C_{jj}$ are known?

Hint: Consider $\langle x_i^\mu x_j^\mu \rangle_\mu = \frac{1}{M} \sum_\mu x_i^\mu x_j^\mu$ as the scalar product of two vectors.

1.4.2 Exercise: From data distribution to second-moment matrix

Give an estimate of the second moment matrix for the following data distributions.

(Figure: three example data distributions (a), (b), (c) in the $x_1$-$x_2$ plane. CC BY-SA 4.0)

1.4.3 Exercise: From data distribution to second-moment matrix

Give an estimate of the second moment matrix for the following data distributions.

(Figure: three example data distributions (a), (b), (c) in the $x_1$-$x_2$ plane. CC BY-SA 4.0)

1.4.4 Exercise: From second-moment matrix to data

Draw a data distribution qualitatively consistent with each of the following second-moment matrices $C$.

(a) $C = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$   (b) $C = \begin{pmatrix} 1 & 0 \\ 0 & 0.5 \end{pmatrix}$   (c) $C = \begin{pmatrix} \cdot & \cdot \\ \cdot & \cdot \end{pmatrix}$
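Numerically, the second-moment matrix of a point cloud can be estimated directly from the samples, which is a convenient way to check estimates like those asked for above. A small NumPy sketch (an added illustration; the data distribution is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary correlated 2D example distribution, data points x^mu as rows.
A = np.array([[1.0, 0.0],
              [0.8, 0.6]])
X = rng.normal(size=(5000, 2)) @ A.T

# Second-moment matrix C_ij = <x_i^mu x_j^mu>_mu estimated from the samples.
C = (X.T @ X) / X.shape[0]
print(np.round(C, 2))
```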

1.4.5 Exercise: Data distributions with and without mean

1. Define a procedure by which you can turn any mean-free data distribution into a distribution with finite (non-zero) mean but identical second-moment matrix. (Are there exceptions?)

2. Conversely, define a procedure by which you can turn any data distribution with finite mean into a distribution with zero mean but identical second-moment matrix. (Are there exceptions?)

Hint: Think about what happens if you flip a point $x^\mu$ at the origin, i.e. if you replace $x^\mu$ by $-x^\mu$ in the data set.

1.5 Covariance matrix and higher order structure

1.6 PCA by diagonalizing the covariance matrix

2 Formalism

2.1 Definition of the PCA-optimization problem

2.2 Matrix V^T: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system

2.3 Matrix V: Mapping from low-dimensional new coordinate system to subspace in old coordinate system

2.3.1 Exercise: Norm of a vector

Let $b_i$, $i = 1, \ldots, N$, be an orthonormal basis. Then we have $(b_i, b_j) = \delta_{ij}$ and

   $v = \sum_{i=1}^{N} v_i b_i$ with $v_i := (v, b_i)$ $\forall v$.   (1)

Show that

   $\|v\|^2 = \sum_{i=1}^{N} v_i^2$.   (2)
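The identity of Exercise 2.3.1 is easy to verify numerically for a concrete orthonormal basis; the sketch below (an added illustration, assuming NumPy) builds one from the QR decomposition of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# The columns of Q form a random orthonormal basis b_1, ..., b_N.
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))

v = rng.normal(size=N)        # an arbitrary vector
coeffs = Q.T @ v              # expansion coefficients v_i = (v, b_i)
print(np.isclose(np.sum(coeffs ** 2), v @ v))   # sum_i v_i^2 == ||v||^2
```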

2.4 Matrix (V^T V): Identity mapping within new coordinate system

2.5 Matrix (V V^T): Projection from high- to low-dimensional (sub)space within old coordinate system

2.6 Variance

2.7 Reconstruction error

2.8 Covariance matrix

2.8.1 Exercise: Second-moment matrices are positive semi-definite

Show that a second-moment matrix $C := \langle x^\mu (x^\mu)^T \rangle_\mu$ is always positive semi-definite, i.e. for each vector $v$ we find $v^T C v \ge 0$. For which vectors $v$ does $v^T C v = 0$ hold?

2.8.2 Exercise: Covariance matrix from mean and second-moment matrix

Given some data $x^\mu$, $\mu = 1, \ldots, M$, with mean

   $\bar{x} := \langle x \rangle = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$   (1)

and second-moment matrix

   $C := \langle x x^T \rangle = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$.   (2)

Calculate the covariance matrix

   $\Sigma := \langle (x - \bar{x})(x - \bar{x})^T \rangle$.   (3)

First derive a general formula and then calculate it for the concrete values given.

2.9 Eigenvalue equation of the covariance matrix

2.9.1 Exercise: Eigenvectors of a symmetric matrix are orthogonal

Prove that the eigenvectors of a symmetric matrix are orthogonal if their eigenvalues are different. Proceed as follows:

1. Let $A$ be a symmetric $N$-dimensional matrix, i.e. $A = A^T$. Show first that $(v, Aw) = (Av, w)$ for any vectors $v, w \in \mathbb{R}^N$, with $(\cdot, \cdot)$ indicating the Euclidean inner product.

2. Let $\{a_i\}$ be the eigenvectors of the matrix $A$ with the eigenvalues $\lambda_i$. Show with the help of part one that $(a_i, a_j) = 0$ if $\lambda_i \ne \lambda_j$. Hint: $\lambda_i (a_i, a_j) = \ldots$
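The orthogonality statement of Exercise 2.9.1 can be observed numerically with any symmetric matrix; the following sketch (an added illustration, assuming NumPy) uses np.linalg.eigh, which is intended for symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric matrix A = A^T.
B = rng.normal(size=(4, 4))
A = B + B.T

eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigenvectors are the columns

# The eigenvectors form an orthonormal set, so V^T V is (numerically) the
# identity matrix.
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(4)))
```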

2.10 Total variance of the data x

2.11 Diagonalizing the covariance matrix

2.12 Variance of y for a diagonalized covariance matrix

2.13 Constraints of matrix V

2.14 Finding the optimal subspace

2.15 Interpretation of the result

2.15.1 Exercise: Moments of a data distribution: Simple example

Given a data distribution $x^\mu$ with

   $x^1 = \begin{pmatrix} \cdot \\ 3 \end{pmatrix}$, $x^2 = \begin{pmatrix} \cdot \\ \cdot \end{pmatrix}$, $x^3 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$.   (1)

1. Calculate the mean $\bar{x} = \langle x^\mu \rangle_\mu$ and the second-moment matrix $C = \langle x^\mu x^{\mu T} \rangle_\mu$.

2. Determine the normalized eigenvectors $c_1$ and $c_2$ of $C$ and the corresponding eigenvalues.

   Hint: Look at the data distribution and guess the eigenvectors on the basis of the symmetry of the distribution. Then insert the guessed eigenvectors into the eigenvalue equation, verify that they are eigenvectors, and calculate the eigenvalues. Otherwise you have to go the hard way via the characteristic polynomial.

3. Determine the first and second moment of

   $y^\mu = c_\alpha^T x^\mu$,   (2)

   i.e. $\langle y^\mu \rangle_\mu$ and $\langle (y^\mu)^2 \rangle_\mu$, for $\alpha \in \{1, 2\}$.

   Hint: You don't have to compute the projected data. There is a simpler way.

2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors

Give an estimate of the second-moment matrix for the following data distributions by first guessing the eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.

(Figure: three example data distributions (a), (b), (c) in the $x_1$-$x_2$ plane. CC BY-SA 4.0)
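A useful numerical companion to this part: the second moments of data projected onto the eigenvectors of the second-moment matrix equal the corresponding eigenvalues, $\langle (c_\alpha^T x)^2 \rangle_\mu = c_\alpha^T C c_\alpha = \lambda_\alpha$. A minimal NumPy sketch (an added illustration; the data are an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example data, points x^mu as rows.
X = rng.normal(size=(2000, 2)) @ np.array([[2.0, 0.0],
                                           [1.0, 0.5]]).T

C = (X.T @ X) / X.shape[0]                   # second-moment matrix <x x^T>
eigenvalues, eigenvectors = np.linalg.eigh(C)

Y = X @ eigenvectors                         # data projected onto c_1, c_2
print(np.round((Y ** 2).mean(axis=0), 3))    # empirical second moments
print(np.round(eigenvalues, 3))              # eigenvalues of C
```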

2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors

Give an estimate of the second-moment matrix for the following data distributions by first guessing the eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.

(Figure: three example data distributions (a), (b), (c) in the $x_1$-$x_2$ plane. CC BY-SA 4.0)

2.15.4 Exercise: Dimensionality reduction

Given some data in $\mathbb{R}^3$ with the corresponding $3 \times 3$ second-moment matrix $C$ with eigenvectors $c_\alpha$ and eigenvalues $\lambda_\alpha$, where $\lambda_1 = 3$, $\lambda_2 = 1$, and $\lambda_3 = 0.2$.

1. Define a matrix $A \in \mathbb{R}^{2 \times 3}$ that maps the data into a two-dimensional space while preserving as much variance as possible.

2. Define a matrix $B \in \mathbb{R}^{3 \times 2}$ that places the reduced data back into $\mathbb{R}^3$ with minimal reconstruction error. How large is the reconstruction error?

3. Prove that $AB$ is an identity matrix. Why would one expect that intuitively?

4. Prove that $BA$ is a projection matrix but not the identity matrix.

2.16 PCA Algorithm

2.17 Intuition of the Results

2.18 Whitening or sphering

2.18.1 Exercise: Sphered data is uncorrelated

Prove that sphered zero-mean data $\hat{x}$ projected onto two orthogonal vectors $n_1$ and $n_2$ is uncorrelated.

Hint: The correlation coefficient for two scalar data sets $y_1$ and $y_2$ with means $\bar{y}_i := \langle y_i \rangle$ is defined as

   $c := \dfrac{\langle (y_1 - \bar{y}_1)(y_2 - \bar{y}_2) \rangle}{\sqrt{\langle (y_1 - \bar{y}_1)^2 \rangle \, \langle (y_2 - \bar{y}_2)^2 \rangle}}$.   (1)
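Whitening (Section 2.18) can also be illustrated numerically: after sphering, the covariance matrix of the data is the identity, and projections onto any two orthogonal directions come out uncorrelated. A minimal NumPy sketch (an added illustration; the data and the vectors $n_1$, $n_2$ are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary zero-mean example data, points as rows.
X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 0.0],
                                           [1.2, 0.4]]).T
X = X - X.mean(axis=0)

# Sphering via the eigendecomposition of the covariance matrix.
Sigma = (X.T @ X) / X.shape[0]
lam, U = np.linalg.eigh(Sigma)
X_hat = (X @ U) / np.sqrt(lam)          # sphered data, covariance ~ identity

# Projections onto two orthogonal unit vectors n_1 and n_2.
n1 = np.array([0.6, 0.8])
n2 = np.array([-0.8, 0.6])
y1, y2 = X_hat @ n1, X_hat @ n2
print(np.round(np.corrcoef(y1, y2)[0, 1], 6))   # close to 0: uncorrelated
```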

2.19 Singular value decomposition +

3 Application

3.1 Face processing

4 Acknowledgment
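As a pointer for Section 2.19: for zero-mean data the principal components can equivalently be obtained from a singular value decomposition of the data matrix, with the squared singular values (divided by the number of samples) playing the role of the eigenvalues of the covariance matrix. A minimal NumPy sketch (an added illustration with arbitrary example data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean example data, M samples of dimension 3 as rows.
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))
X = X - X.mean(axis=0)
M = X.shape[0]

# PCA via the covariance matrix ...
C = (X.T @ X) / M
eigval, eigvec = np.linalg.eigh(C)

# ... and via the singular value decomposition of the data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Squared singular values divided by M equal the eigenvalues of C.
print(np.allclose(np.sort(s ** 2 / M), np.sort(eigval)))
```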