
Exercises * on Principal Component Analysis

Laurenz Wiskott
Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, EU

4 February 2017

© 2016, 2017 Laurenz Wiskott (homepage https://www.ini.rub.de/people/wiskott/). This work (except for all figures from other sources, if present) is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Figures from other sources have their own copyright, which is generally indicated. Do not distribute parts of these lecture notes showing figures with non-free copyrights (here usually figures I have the rights to publish but you don't, like my own published figures). Several of my exercises (not necessarily on this topic) were inspired by papers and textbooks by other authors. Unfortunately, I did not document that well, because initially I did not intend to make the exercises publicly available, and now I cannot trace it back anymore. So I cannot give as much credit as I would like to. The concrete versions of the exercises are certainly my own work, though. In cases where I reuse an exercise in different variants, references may be wrong for technical reasons.

* These exercises complement my corresponding lecture notes, available at https://www.ini.rub.de/people/wiskott/Teaching/Material/, where you can also find other teaching material such as programming exercises. The table of contents of the lecture notes is reproduced here to give an orientation as to when the exercises can reasonably be solved. For the best learning effect I recommend that you first seriously try to solve the exercises yourself before looking into the solutions.

Contents

1 Intuition
1.1 Problem statement
1.1.1 Exercise: Second moment from mean and variance
1.1.2 Exercise: Second moment of a uniform distribution
1.2 Projection and reconstruction error
1.2.1 Exercise: Projection by an inner product is orthogonal
1.2.2 Exercise: Error function
1.3 Reconstruction error and variance
1.4 Covariance matrix
1.4.1 Exercise: Relation among the elements of a second moment matrix
1.4.2 Exercise: From data distribution to second-moment matrix
1.4.3 Exercise: From data distribution to second-moment matrix
1.4.4 Exercise: From second-moment matrix to data
1.4.5 Exercise: Data distributions with and without mean

1.5 Covariance matrix and higher order structure
1.6 PCA by diagonalizing the covariance matrix
2 Formalism
2.1 Definition of the PCA-optimization problem
2.2 Matrix V^T: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system
2.3 Matrix V: Mapping from low-dimensional new coordinate system to subspace in old coordinate system
2.3.1 Exercise: Norm of a vector
2.4 Matrix (V^T V): Identity mapping within new coordinate system
2.5 Matrix (V V^T): Projection from high- to low-dimensional (sub)space within old coordinate system
2.6 Variance
2.7 Reconstruction error
2.8 Covariance matrix
2.8.1 Exercise: Second-moment matrices are positive semi-definite
2.8.2 Exercise: Covariance matrix from mean and second-moment matrix
2.9 Eigenvalue equation of the covariance matrix
2.9.1 Exercise: Eigenvectors of a symmetric matrix are orthogonal
2.10 Total variance of the data x
2.11 Diagonalizing the covariance matrix
2.12 Variance of y for a diagonalized covariance matrix
2.13 Constraints of matrix V
2.14 Finding the optimal subspace
2.15 Interpretation of the result
2.15.1 Exercise: Moments of a data distribution: Simple example
2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors
2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors
2.15.4 Exercise: Dimensionality reduction
2.16 PCA Algorithm
2.17 Intuition of the Results
2.18 Whitening or sphering
2.18.1 Exercise: Sphered data is uncorrelated
2.19 Singular value decomposition +

3 Application
3.1 Face processing
4 Acknowledgment

1 Intuition

1.1 Problem statement

1.1.1 Exercise: Second moment from mean and variance

How are mean $m$, variance $v$, and second moment $s$ related to each other? In other words, if the mean and variance of a one-dimensional distribution were given, how could you compute the corresponding second moment?

Hint: Assume $x$ to be the data values and $\bar{x}$ their mean. Then play around with the corresponding expressions for the mean $\bar{x} = \langle x \rangle$, the variance $\langle (x - \bar{x})^2 \rangle$, and the second moment $\langle x^2 \rangle$.

1.1.2 Exercise: Second moment of a uniform distribution

Calculate the second moment of a uniform, i.e. flat, distribution on $[-1, +1]$. This is a distribution where every value between $-1$ and $+1$ is equally likely and all other values are impossible.

1.2 Projection and reconstruction error

1.2.1 Exercise: Projection by an inner product is orthogonal

1. We have defined the projected vector $x_\parallel$ by

   $x_\parallel = v v^T x$   (1)

   where $x$ is the data point and $v$ is the unit vector along the principal axis of the projection. Show that the difference vector

   $x_\perp = x - x_\parallel$   (2)

   between the data point and the projected data point is orthogonal to $v$.

2. Give a reason why the orthogonality of the two vectors is useful.

1.2.2 Exercise: Error function

Why should the reconstruction error $E$ be defined as the mean of the squared difference between the original and the reconstructed data vectors, and not simply as the mean of the difference or the mean of the absolute difference?
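For a quick numerical cross-check of the quantities appearing in these exercises, a minimal NumPy sketch along the following lines can be used (this is an added illustration, not part of the exercises; the data and the vector are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exercise 1.1.2: Monte-Carlo estimate of the second moment of a uniform
# distribution on [-1, +1]; compare it with your analytic result.
u = rng.uniform(-1.0, 1.0, size=100_000)
print("estimated second moment:", (u ** 2).mean())

# Exercise 1.2.1: the difference vector x_perp = x - x_par is numerically
# orthogonal to the unit vector v used for the projection x_par = v v^T x.
X = rng.normal(size=(1000, 3))          # data points x as rows
v = np.array([1.0, 2.0, 2.0]) / 3.0     # a unit vector
X_par = np.outer(X @ v, v)              # projected vectors
X_perp = X - X_par                      # difference vectors
print("max |<x_perp, v>|:", np.abs(X_perp @ v).max())
```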

1.3 Reconstruction error and variance

1.4 Covariance matrix

1.4.1 Exercise: Relation among the elements of a second moment matrix

For a set of data vectors $x^\mu$, $\mu = 1, \ldots, M$, the second moment matrix $C$ is defined as $C_{ij} := \langle x_i^\mu x_j^\mu \rangle_\mu$. What are the upper and lower limits of $C_{ij}$ if $C_{ii}$ and $C_{jj}$ are known?

Hint: Consider $\langle x_i^\mu x_j^\mu \rangle_\mu = \frac{1}{M} \sum_\mu x_i^\mu x_j^\mu$ as the scalar product of two vectors.

1.4.2 Exercise: From data distribution to second-moment matrix

Give an estimate of the second moment matrix for the following data distributions.

(Figure: three example data distributions (a), (b), (c) in the $x_1$-$x_2$ plane. CC BY-SA 4.0)

1.4.3 Exercise: From data distribution to second-moment matrix

Give an estimate of the second moment matrix for the following data distributions.

(Figure: three example data distributions (a), (b), (c) in the $x_1$-$x_2$ plane. CC BY-SA 4.0)

1.4.4 Exercise: From second-moment matrix to data

Draw a data distribution qualitatively consistent with each of the following second-moment matrices $C$.

(a) $C = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$   (b) $C = \begin{pmatrix} 1 & 0 \\ 0 & 0.5 \end{pmatrix}$   (c) $C = \begin{pmatrix} \cdot & \cdot \\ \cdot & \cdot \end{pmatrix}$
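Numerically, the second-moment matrix of a point cloud can be estimated directly from the samples, which is a convenient way to check estimates like those asked for above. A small NumPy sketch (an added illustration; the data distribution is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary correlated 2D example distribution, data points x^mu as rows.
A = np.array([[1.0, 0.0],
              [0.8, 0.6]])
X = rng.normal(size=(5000, 2)) @ A.T

# Second-moment matrix C_ij = <x_i^mu x_j^mu>_mu estimated from the samples.
C = (X.T @ X) / X.shape[0]
print(np.round(C, 2))
```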

1.4.5 Exercise: Data distributions with and without mean

1. Define a procedure by which you can turn any mean-free data distribution into a distribution with finite (non-zero) mean but identical second-moment matrix. (Are there exceptions?)

2. Conversely, define a procedure by which you can turn any data distribution with finite mean into a distribution with zero mean but identical second-moment matrix. (Are there exceptions?)

Hint: Think about what happens if you flip a point $x^\mu$ at the origin, i.e. if you replace $x^\mu$ by $-x^\mu$ in the data set.

1.5 Covariance matrix and higher order structure

1.6 PCA by diagonalizing the covariance matrix

2 Formalism

2.1 Definition of the PCA-optimization problem

2.2 Matrix V^T: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system

2.3 Matrix V: Mapping from low-dimensional new coordinate system to subspace in old coordinate system

2.3.1 Exercise: Norm of a vector

Let $b_i$, $i = 1, \ldots, N$, be an orthonormal basis. Then we have $(b_i, b_j) = \delta_{ij}$ and

   $v = \sum_{i=1}^{N} v_i b_i$ with $v_i := (v, b_i)$ $\forall v$.   (1)

Show that

   $\|v\|^2 = \sum_{i=1}^{N} v_i^2$.   (2)
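The identity of Exercise 2.3.1 is easy to verify numerically for a concrete orthonormal basis; the sketch below (an added illustration, assuming NumPy) builds one from the QR decomposition of a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5

# The columns of Q form a random orthonormal basis b_1, ..., b_N.
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))

v = rng.normal(size=N)        # an arbitrary vector
coeffs = Q.T @ v              # expansion coefficients v_i = (v, b_i)
print(np.isclose(np.sum(coeffs ** 2), v @ v))   # sum_i v_i^2 == ||v||^2
```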

2.4 Matrix (V^T V): Identity mapping within new coordinate system

2.5 Matrix (V V^T): Projection from high- to low-dimensional (sub)space within old coordinate system

2.6 Variance

2.7 Reconstruction error

2.8 Covariance matrix

2.8.1 Exercise: Second-moment matrices are positive semi-definite

Show that a second-moment matrix $C := \langle x^\mu (x^\mu)^T \rangle_\mu$ is always positive semi-definite, i.e. for each vector $v$ we find $v^T C v \ge 0$. For which vectors $v$ does $v^T C v = 0$ hold?

2.8.2 Exercise: Covariance matrix from mean and second-moment matrix

Given some data $x^\mu$, $\mu = 1, \ldots, M$, with mean

   $\bar{x} := \langle x \rangle = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$   (1)

and second-moment matrix

   $C := \langle x x^T \rangle = \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}$.   (2)

Calculate the covariance matrix

   $\Sigma := \langle (x - \bar{x})(x - \bar{x})^T \rangle$.   (3)

First derive a general formula and then calculate it for the concrete values given.

2.9 Eigenvalue equation of the covariance matrix

2.9.1 Exercise: Eigenvectors of a symmetric matrix are orthogonal

Prove that the eigenvectors of a symmetric matrix are orthogonal if their eigenvalues are different. Proceed as follows:

1. Let $A$ be a symmetric $N$-dimensional matrix, i.e. $A = A^T$. Show first that $(v, Aw) = (Av, w)$ for any vectors $v, w \in \mathbb{R}^N$, with $(\cdot, \cdot)$ indicating the Euclidean inner product.

2. Let $\{a_i\}$ be the eigenvectors of the matrix $A$ with the eigenvalues $\lambda_i$. Show with the help of part one that $(a_i, a_j) = 0$ if $\lambda_i \ne \lambda_j$. Hint: $\lambda_i (a_i, a_j) = \ldots$
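The orthogonality statement of Exercise 2.9.1 can be observed numerically with any symmetric matrix; the following sketch (an added illustration, assuming NumPy) uses np.linalg.eigh, which is intended for symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric matrix A = A^T.
B = rng.normal(size=(4, 4))
A = B + B.T

eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigenvectors are the columns

# The eigenvectors form an orthonormal set, so V^T V is (numerically) the
# identity matrix.
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(4)))
```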

2.10 Total variance of the data x

2.11 Diagonalizing the covariance matrix

2.12 Variance of y for a diagonalized covariance matrix

2.13 Constraints of matrix V

2.14 Finding the optimal subspace

2.15 Interpretation of the result

2.15.1 Exercise: Moments of a data distribution: Simple example

Given a data distribution $x^\mu$ with

   $x^1 = \begin{pmatrix} \cdot \\ 3 \end{pmatrix}$, $x^2 = \begin{pmatrix} \cdot \\ \cdot \end{pmatrix}$, $x^3 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$.   (1)

1. Calculate the mean $\bar{x} = \langle x^\mu \rangle_\mu$ and the second-moment matrix $C = \langle x^\mu x^{\mu T} \rangle_\mu$.

2. Determine the normalized eigenvectors $c_1$ and $c_2$ of $C$ and the corresponding eigenvalues.

   Hint: Look at the data distribution and guess the eigenvectors on the basis of the symmetry of the distribution. Then insert the guessed eigenvectors into the eigenvalue equation, verify that they are eigenvectors, and calculate the eigenvalues. Otherwise you have to go the hard way via the characteristic polynomial.

3. Determine the first and second moment of

   $y^\mu = c_\alpha^T x^\mu$,   (2)

   i.e. $\langle y^\mu \rangle_\mu$ and $\langle (y^\mu)^2 \rangle_\mu$, for $\alpha \in \{1, 2\}$.

   Hint: You don't have to compute the projected data. There is a simpler way.

2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors

Give an estimate of the second-moment matrix for the following data distributions by first guessing the eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.

(Figure: three example data distributions (a), (b), (c) in the $x_1$-$x_2$ plane. CC BY-SA 4.0)
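A useful numerical companion to this part: the second moments of data projected onto the eigenvectors of the second-moment matrix equal the corresponding eigenvalues, $\langle (c_\alpha^T x)^2 \rangle_\mu = c_\alpha^T C c_\alpha = \lambda_\alpha$. A minimal NumPy sketch (an added illustration; the data are an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example data, points x^mu as rows.
X = rng.normal(size=(2000, 2)) @ np.array([[2.0, 0.0],
                                           [1.0, 0.5]]).T

C = (X.T @ X) / X.shape[0]                   # second-moment matrix <x x^T>
eigenvalues, eigenvectors = np.linalg.eigh(C)

Y = X @ eigenvectors                         # data projected onto c_1, c_2
print(np.round((Y ** 2).mean(axis=0), 3))    # empirical second moments
print(np.round(eigenvalues, 3))              # eigenvalues of C
```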

2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors

Give an estimate of the second-moment matrix for the following data distributions by first guessing the eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.

(Figure: three example data distributions (a), (b), (c) in the $x_1$-$x_2$ plane. CC BY-SA 4.0)

2.15.4 Exercise: Dimensionality reduction

Given some data in $\mathbb{R}^3$ with the corresponding $3 \times 3$ second-moment matrix $C$ with eigenvectors $c_\alpha$ and eigenvalues $\lambda_\alpha$, where $\lambda_1 = 3$, $\lambda_2 = 1$, and $\lambda_3 = 0.2$.

1. Define a matrix $A \in \mathbb{R}^{2 \times 3}$ that maps the data into a two-dimensional space while preserving as much variance as possible.

2. Define a matrix $B \in \mathbb{R}^{3 \times 2}$ that places the reduced data back into $\mathbb{R}^3$ with minimal reconstruction error. How large is the reconstruction error?

3. Prove that $AB$ is an identity matrix. Why would one expect that intuitively?

4. Prove that $BA$ is a projection matrix but not the identity matrix.

2.16 PCA Algorithm

2.17 Intuition of the Results

2.18 Whitening or sphering

2.18.1 Exercise: Sphered data is uncorrelated

Prove that sphered zero-mean data $\hat{x}$ projected onto two orthogonal vectors $n_1$ and $n_2$ is uncorrelated.

Hint: The correlation coefficient for two scalar data sets $y_1$ and $y_2$ with means $\bar{y}_i := \langle y_i \rangle$ is defined as

   $c := \dfrac{\langle (y_1 - \bar{y}_1)(y_2 - \bar{y}_2) \rangle}{\sqrt{\langle (y_1 - \bar{y}_1)^2 \rangle \, \langle (y_2 - \bar{y}_2)^2 \rangle}}$.   (1)
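Whitening (Section 2.18) can also be illustrated numerically: after sphering, the covariance matrix of the data is the identity, and projections onto any two orthogonal directions come out uncorrelated. A minimal NumPy sketch (an added illustration; the data and the vectors $n_1$, $n_2$ are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary zero-mean example data, points as rows.
X = rng.normal(size=(5000, 2)) @ np.array([[2.0, 0.0],
                                           [1.2, 0.4]]).T
X = X - X.mean(axis=0)

# Sphering via the eigendecomposition of the covariance matrix.
Sigma = (X.T @ X) / X.shape[0]
lam, U = np.linalg.eigh(Sigma)
X_hat = (X @ U) / np.sqrt(lam)          # sphered data, covariance ~ identity

# Projections onto two orthogonal unit vectors n_1 and n_2.
n1 = np.array([0.6, 0.8])
n2 = np.array([-0.8, 0.6])
y1, y2 = X_hat @ n1, X_hat @ n2
print(np.round(np.corrcoef(y1, y2)[0, 1], 6))   # close to 0: uncorrelated
```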

2.19 Singular value decomposition +

3 Application

3.1 Face processing

4 Acknowledgment
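As a pointer for Section 2.19: for zero-mean data the principal components can equivalently be obtained from a singular value decomposition of the data matrix, with the squared singular values (divided by the number of samples) playing the role of the eigenvalues of the covariance matrix. A minimal NumPy sketch (an added illustration with arbitrary example data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean example data, M samples of dimension 3 as rows.
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))
X = X - X.mean(axis=0)
M = X.shape[0]

# PCA via the covariance matrix ...
C = (X.T @ X) / M
eigval, eigvec = np.linalg.eigh(C)

# ... and via the singular value decomposition of the data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Squared singular values divided by M equal the eigenvalues of C.
print(np.allclose(np.sort(s ** 2 / M), np.sort(eigval)))
```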