
Principal Component Analysis
CS5240 Theoretical Foundations in Multimedia
Leow Wee Kheng
Department of Computer Science, School of Computing
National University of Singapore

Motivation
How wide is the widest part of NGC 1300?

How thick is the thickest part of NGC 4594? Use principal component analysis.

Maximum Variance Estimate
Consider a set of points $x_i$, $i = 1, \ldots, n$, in an $m$-dimensional space such that their mean $\mu = 0$, i.e., the centroid is at the origin.
[Figure: data points $x_i$, their projections $y_i$ onto a line $l$ through the origin with unit direction $q$, and perpendicular distances $d_i$.]
We want to find a line $l$ through the origin that maximizes the variance of the projections $y_i$ of the points $x_i$ on $l$.

Let $q$ denote the unit vector along line $l$. Then, the projection $y_i$ of $x_i$ on $l$ is
$$y_i = x_i^\top q. \quad (1)$$
The mean squared projection, which is the variance $V$ over all points, is
$$V = \frac{1}{n} \sum_{i=1}^{n} y_i^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i^\top q)^2 \ge 0. \quad (2)$$
Expanding Eq. 2 gives
$$V = \frac{1}{n} \sum_{i=1}^{n} (x_i^\top q)^\top (x_i^\top q) = q^\top \left[ \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top \right] q. \quad (3)$$
The middle factor is the covariance matrix $C$ of the data points:
$$C = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top. \quad (4)$$
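
As a quick numerical illustration of Eqs. 1-4, here is a minimal sketch (assuming NumPy; the random zero-mean data and all variable names are illustrative, not part of the lecture) showing that the variance of the projections equals $q^\top C q$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))       # 100 points x_i in 3-D, one per row
X = X - X.mean(axis=0)              # centre the data so that mu = 0

C = (X.T @ X) / len(X)              # covariance matrix, Eq. 4

q = np.array([1.0, 2.0, 0.5])
q = q / np.linalg.norm(q)           # unit vector along the line l

y = X @ q                           # projections y_i = x_i^T q, Eq. 1
V = np.mean(y ** 2)                 # variance of the projections, Eq. 2

print(np.isclose(V, q @ C @ q))     # Eq. 3: V = q^T C q  -> True
```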

We want to find a unit vector $q$ that maximizes the variance $V$, i.e.,
$$\text{maximize } V = q^\top C q \quad \text{subject to } \|q\| = 1. \quad (5)$$
This is a constrained optimization problem. Use the Lagrange multiplier method: combine $V$ and the constraint using a Lagrange multiplier $\lambda$:
$$\text{maximize } V = q^\top C q - \lambda (q^\top q - 1). \quad (6)$$

Lagrange Multiplier Method
The Lagrange multiplier method is a method for solving constrained optimization. Consider this problem:
$$\text{maximize } f(x) \quad \text{subject to } g(x) = c. \quad (7)$$
The method introduces a Lagrange multiplier $\lambda$ to combine $f(x)$ and $g(x)$:
$$L(x, \lambda) = f(x) + \lambda (g(x) - c). \quad (8)$$
The sign of $\lambda$ can be positive or negative. Then, solve for the stationary point of $L(x, \lambda)$:
$$\frac{\partial L(x, \lambda)}{\partial x} = 0. \quad (9)$$

If $x_0$ is a solution of the original problem, then there is a $\lambda_0$ such that $(x_0, \lambda_0)$ is a stationary point of $L$. Not all stationary points yield solutions of the original problem. The same method applies to minimization problems. Multiple constraints can be combined by adding multiple terms:
$$\text{minimize } f(x) \quad \text{subject to } g_1(x) = c_1, \ g_2(x) = c_2 \quad (10)$$
is solved with
$$L(x) = f(x) + \lambda_1 (g_1(x) - c_1) + \lambda_2 (g_2(x) - c_2). \quad (11)$$
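
As a small worked example of the method (a sketch assuming SymPy; the objective $f = x + 2y$ and constraint $x^2 + y^2 = 1$ are invented for illustration):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x + 2*y                            # objective f(x, y)
g = x**2 + y**2 - 1                    # constraint g(x, y) = c, written as g = 0

L = f + lam * g                        # Lagrangian, Eq. 8
stationary = sp.solve([sp.diff(L, x), sp.diff(L, y), g], [x, y, lam], dict=True)
print(stationary)                      # two stationary points: the constrained maximum and minimum
```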

Now, we differentiate $V$ with respect to $q$ and set it to 0:
$$\frac{\partial V}{\partial q} = 2 q^\top C - 2 \lambda q^\top = 0. \quad (12)$$
Rearranging the terms gives
$$q^\top C = \lambda q^\top. \quad (13)$$
Since the covariance matrix $C$ is symmetric (homework), $C^\top = C$. So, transposing both sides of Eq. 13 gives
$$C q = \lambda q. \quad (14)$$
Eq. 14 is called an eigenvector equation: $q$ is the eigenvector and $\lambda$ is the eigenvalue. Thus, the eigenvector $q$ of $C$ gives the line that maximizes the variance $V$.
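
A numerical check of this result (a sketch assuming NumPy; the anisotropic test data is invented): the leading eigenvector of $C$ attains variance $\lambda_1$, and no other unit direction does better.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])   # stretched point cloud
X = X - X.mean(axis=0)
C = (X.T @ X) / len(X)

eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
q1 = eigvecs[:, -1]                      # eigenvector with the largest eigenvalue

print(np.isclose(q1 @ C @ q1, eigvals[-1]))          # variance along q1 is lambda_1

dirs = rng.normal(size=(1000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # random unit directions
variances = np.einsum('ij,jk,ik->i', dirs, C, dirs)  # q^T C q for each direction
print(np.all(variances <= eigvals[-1] + 1e-12))      # none exceeds lambda_1
```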

The perpendicular distance $d_i$ of $x_i$ from the line $l$ is
$$d_i = \| x_i - y_i q \| = \| x_i - (x_i^\top q)\, q \|. \quad (15)$$
[Figure: the same data points $x_i$, their projections $y_i$ on the line $l$ along $q$, and the perpendicular distances $d_i$.]

The squared distance is
$$d_i^2 = \| x_i - (x_i^\top q) q \|^2 = (x_i - (x_i^\top q) q)^\top (x_i - (x_i^\top q) q) = x_i^\top x_i - x_i^\top (x_i^\top q) q - (x_i^\top q) q^\top x_i + (x_i^\top q) q^\top (x_i^\top q) q = x_i^\top x_i - (x_i^\top q)^2.$$
Averaging over all $i$ gives
$$D = \frac{1}{n} \sum_{i=1}^{n} d_i^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^\top x_i - V. \quad (16)$$
So, maximizing the variance $V$ means minimizing the mean squared distance $D$ to the data points $x_i$. The eigenvalue $\lambda = V$, the variance (homework); so $\lambda \ge 0$.

Name $q$ as the first eigenvector $q_1$. The component of $x_i$ orthogonal to $q_1$ is
$$x_i' = x_i - (x_i^\top q_1)\, q_1.$$
Repeat the previous method on the $x_i'$ in the $(m-1)$-D subspace orthogonal to $q_1$ to get $q_2$. Then, repeat to get $q_3, \ldots, q_m$.
[Figure: the data points and the subspace through the origin orthogonal to $q_1$.]

Eigendecomposition
In general, an $m \times m$ covariance matrix $C$ has $m$ eigenvectors:
$$C q_j = \lambda_j q_j, \quad j = 1, \ldots, m. \quad (17)$$
Transpose both sides of the equations to get
$$q_j^\top C = \lambda_j q_j^\top, \quad j = 1, \ldots, m, \quad (18)$$
which are row matrices. Stack the row matrices into a column to get
$$\begin{bmatrix} q_1^\top \\ \vdots \\ q_m^\top \end{bmatrix} C = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_m \end{bmatrix} \begin{bmatrix} q_1^\top \\ \vdots \\ q_m^\top \end{bmatrix}. \quad (19)$$
Denote the matrices $Q$ and $\Lambda$ as
$$Q = [\, q_1 \ \cdots \ q_m \,], \quad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m). \quad (20)$$

Then, Eq. 19 becomes
$$Q^\top C = \Lambda Q^\top \quad (21)$$
or
$$C = Q \Lambda Q^\top. \quad (22)$$
This is the matrix equation for eigendecomposition. The eigenvectors are orthonormal:
$$q_j^\top q_j = 1, \quad q_j^\top q_k = 0 \ \text{for } k \ne j. \quad (23)$$
So, the eigenmatrix $Q$ is orthogonal:
$$Q^\top Q = Q Q^\top = I. \quad (24)$$
The eigenvalues are arranged in sorted order, $\lambda_j \ge \lambda_{j+1}$.
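
A quick check of Eqs. 21-24 (a sketch assuming NumPy; the small random covariance matrix is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.normal(size=(5, 50))
C = (D @ D.T) / D.shape[1]            # a small symmetric covariance matrix

lam, Q = np.linalg.eigh(C)            # eigenvalues ascending, eigenvectors in columns
lam, Q = lam[::-1], Q[:, ::-1]        # sort so that lambda_j >= lambda_{j+1}

print(np.allclose(C, Q @ np.diag(lam) @ Q.T))   # Eq. 22: C = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(5)))          # Eq. 24: Q is orthogonal
```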

In general, the mean of the data points $x_i$ is not at the origin. In this case, we subtract $\mu$ from each $x_i$. Now, we can transform $x_i$ into a new vector $y_i$:
$$y_i = Q^\top (x_i - \mu) = \begin{bmatrix} q_1^\top (x_i - \mu) \\ \vdots \\ q_m^\top (x_i - \mu) \end{bmatrix} = \sum_{j=1}^{m} (x_i - \mu)^\top q_j \, q_j, \quad (25)$$
where
$$y_{ij} = (x_i - \mu)^\top q_j \quad (26)$$
is the projection of $x_i - \mu$ on $q_j$. The original $x_i$ can be recovered from $y_i$:
$$x_i = Q y_i + \mu. \quad (27)$$
Caution! $x_i \ne y_i + \mu$: $x_i$ is in the original input space, but $y_i$ is in the eigenspace.

[Figure slide in the Eigendecomposition section.]

PCA Algorithm 1
Let $x_i$ denote $m$-dimensional vectors (data points), $i = 1, \ldots, n$.
1. Compute the mean vector of the data points:
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i. \quad (28)$$
2. Compute the covariance matrix:
$$C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^\top. \quad (29)$$
3. Perform eigendecomposition of $C$:
$$C = Q \Lambda Q^\top. \quad (30)$$
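
A direct NumPy sketch of Algorithm 1 (the function name and the convention that rows of `X` are the data points are my own choices, not from the slides):

```python
import numpy as np

def pca_algorithm1(X):
    """PCA by eigendecomposition of the covariance matrix.

    X: (n, m) array whose rows are the data points x_i.
    Returns (mu, eigenvalues, Q) with eigenvalues in descending order
    and the corresponding eigenvectors in the columns of Q.
    """
    mu = X.mean(axis=0)                      # Eq. 28
    D = X - mu
    C = (D.T @ D) / len(X)                   # Eq. 29
    lam, Q = np.linalg.eigh(C)               # Eq. 30 (eigh, since C is symmetric)
    order = np.argsort(lam)[::-1]            # sort lambda_1 >= ... >= lambda_m
    return mu, lam[order], Q[:, order]

# Usage: map a point into the eigenspace (Eq. 25) and recover it (Eq. 27).
X = np.random.default_rng(3).normal(size=(100, 4))
mu, lam, Q = pca_algorithm1(X)
y0 = Q.T @ (X[0] - mu)
print(np.allclose(Q @ y0 + mu, X[0]))        # x_i = Q y_i + mu
```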

Some books and papers use the sample covariance
$$C = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^\top. \quad (31)$$
Eq. 31 differs from Eq. 29 by only a constant factor.

Properties of PCA
- The mean $\mu_y$ over all $y_i$ is 0 (homework).
- The variance $\sigma_j^2$ along $q_j$ is $\lambda_j$ (homework).
- Since $\lambda_1 \ge \cdots \ge \lambda_m \ge 0$, we have $\sigma_1 \ge \cdots \ge \sigma_m \ge 0$.
- $q_1$ gives the orientation of the largest variance.
- $q_2$ gives the orientation of the largest variance orthogonal to $q_1$ (2nd largest variance).
- $q_j$ gives the orientation of the largest variance orthogonal to $q_1, \ldots, q_{j-1}$ ($j$-th largest variance).
- $q_m$ is orthogonal to all other eigenvectors (least variance).

[Figure: Data Points]

[Figure: Centroid at Origin]

[Figure: Principal Components]

[Figure: Another Example]

[Figure: Centroid at Origin]

[Figure: Principal Components]

Notes:
- The covariance matrix $C$ is an $m \times m$ matrix.
- Eigendecomposition of a very large matrix is inefficient.
- Example: a $256 \times 256$ colour image has $256 \times 256 \times 3 = 196608$ values, so the number of dimensions is $m = 196608$ and $C$ has size $196608 \times 196608$!!
- The number of images $n$ is usually $\ll m$, e.g., 1000.

PCA Algorithm 2
1. Compute the mean $\mu$ of the data points $x_i$:
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
2. Form an $m \times n$ matrix $A$, with $n \ll m$:
$$A = [\, (x_1 - \mu) \ \cdots \ (x_n - \mu) \,]. \quad (32)$$
3. Compute $A^\top A$, which is just an $n \times n$ matrix.
4. Apply eigendecomposition on $A^\top A$:
$$(A^\top A)\, q_j = \lambda_j q_j. \quad (33)$$
5. Pre-multiply Eq. 33 by $A$, giving
$$A A^\top A q_j = A \lambda_j q_j. \quad (34)$$

6. Recover the eigenvectors and eigenvalues. Since $A A^\top = nC$ (homework), Eq. 34 becomes
$$nC (A q_j) = \lambda_j (A q_j), \quad (35)$$
i.e.,
$$C (A q_j) = \frac{\lambda_j}{n} (A q_j). \quad (36)$$
Therefore, the eigenvectors of $C$ are the $A q_j$ (normalized to unit length), and the eigenvalues of $C$ are $\lambda_j / n$.
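
A sketch of Algorithm 2 for the $n \ll m$ case (assuming NumPy; dropping the near-zero eigenvalues before normalizing $A q_j$ is an implementation detail of mine, not spelled out on the slide):

```python
import numpy as np

def pca_algorithm2(X):
    """PCA via the small n x n matrix A^T A (useful when n << m)."""
    n, m = X.shape
    mu = X.mean(axis=0)
    A = (X - mu).T                              # m x n matrix, Eq. 32
    lam, Qs = np.linalg.eigh(A.T @ A)           # Eq. 33: n x n eigenproblem
    order = np.argsort(lam)[::-1]
    lam, Qs = lam[order], Qs[:, order]
    keep = lam > 1e-12 * lam.max()              # discard (near-)zero eigenvalues
    V = A @ Qs[:, keep]                         # eigenvectors of C are A q_j, Eq. 36
    V = V / np.linalg.norm(V, axis=0)           # normalize to unit length
    return mu, lam[keep] / n, V                 # eigenvalues of C are lambda_j / n

# With n = 20 points in m = 500 dimensions, only a 20 x 20 eigenproblem is solved.
X = np.random.default_rng(4).normal(size=(20, 500))
mu, lam, V = pca_algorithm2(X)
print(lam.shape, V.shape)                       # n - 1 = 19 non-zero eigenvalues; V is 500 x 19
```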

PCA Algorithm 3
1. Compute the mean $\mu$ of the data points $x_i$:
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
2. Form an $m \times n$ matrix $A$:
$$A = [\, (x_1 - \mu) \ \cdots \ (x_n - \mu) \,].$$
3. Apply singular value decomposition (SVD) on $A$:
$$A = U \Sigma V^\top.$$
4. Recover the eigenvectors and eigenvalues (page 31).

Singular Value Decomposition
Singular value decomposition (SVD) decomposes a matrix $A$ into
$$A = U \Sigma V^\top. \quad (37)$$
- The column vectors of $U$ are the left singular vectors $u_j$, and the $u_j$ are orthonormal.
- The column vectors of $V$ are the right singular vectors $v_j$, and the $v_j$ are orthonormal.
- $\Sigma$ is diagonal and contains the singular values $s_j$.
- The rank of $A$ equals the number of non-zero singular values.

Notice that
$$A A^\top = U \Sigma V^\top (U \Sigma V^\top)^\top = U \Sigma V^\top V \Sigma U^\top = U \Sigma^2 U^\top,$$
and the eigendecomposition of $A A^\top$ is $A A^\top = Q \Lambda Q^\top$. So, the eigenvectors of $A A^\top$ are the $u_j$, and the eigenvalues of $A A^\top$ are the $s_j^2$.
On the other hand,
$$A^\top A = (U \Sigma V^\top)^\top U \Sigma V^\top = V \Sigma U^\top U \Sigma V^\top = V \Sigma^2 V^\top.$$
Compare with the eigendecomposition of $A^\top A$: $A^\top A = Q \Lambda Q^\top$. So, the eigenvectors of $A^\top A$ are the $v_j$, and the eigenvalues of $A^\top A$ are the $s_j^2$.

4. Recover the eigenvectors and eigenvalues. Compare $A A^\top = nC$ with the eigendecomposition of $C$:
$$A A^\top = U \Sigma^2 U^\top, \quad C = Q \Lambda Q^\top.$$
Therefore, the eigenvectors $q_j$ of $C$ are the $u_j$ in $U$, and the eigenvalues $\lambda_j$ of $C$ are $s_j^2 / n$, for $s_j$ in $\Sigma$.
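
A sketch of Algorithm 3 (assuming NumPy; `np.linalg.svd` returns $U$, the singular values, and $V^\top$; the function name and data layout are mine):

```python
import numpy as np

def pca_algorithm3(X):
    """PCA via SVD of the centred data matrix A (columns are x_i - mu)."""
    n, m = X.shape
    mu = X.mean(axis=0)
    A = (X - mu).T                                 # m x n matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return mu, s**2 / n, U                         # lambda_j = s_j^2 / n, q_j = u_j

# n = 50 points in m = 300 dimensions: no 300 x 300 covariance matrix is ever formed.
X = np.random.default_rng(5).normal(size=(50, 300))
mu, lam, Q = pca_algorithm3(X)
print(lam[:3], Q.shape)                            # leading eigenvalues; Q is 300 x 50
```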

Application Examples
Identify the 1st and 2nd major axes of these objects.
[Figure: example object shapes.]

Apply PCA for plane fitting in 3-D. Compute the PCA of the points. Then:
- mean of the points = a known point on the plane
- 3rd eigenvector = normal of the plane
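
A small sketch of PCA plane fitting (assuming NumPy; the synthetic noisy plane and all names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
# Noisy samples near the plane z = 0.5x - 0.2y + 1.
xy = rng.uniform(-1.0, 1.0, size=(200, 2))
z = 0.5 * xy[:, 0] - 0.2 * xy[:, 1] + 1.0 + 0.01 * rng.normal(size=200)
P = np.column_stack([xy, z])

mu = P.mean(axis=0)                    # a known point on the fitted plane
D = P - mu
C = (D.T @ D) / len(P)
eigvals, Q = np.linalg.eigh(C)         # eigenvalues in ascending order
normal = Q[:, 0]                       # smallest-variance direction = plane normal

print(mu)
print(normal / normal[2])              # approximately (-0.5, 0.2, 1), up to scale
```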

Dimensionality Reduction
If a point represents a $256 \times 256$ colour image, then the dimensionality is $256 \times 256 \times 3 = 196608$!! But not all 196608 eigenvalues are non-zero.

Case 1: Number of data vectors $n \le m$, the number of dimensions.
- If the data vectors are all independent, then the number of non-zero eigenvalues (eigenvectors) is $n - 1$, i.e., the rank of the covariance matrix is $n - 1$. Why not $n$?
- If the data vectors are not independent, then the rank of the covariance matrix is $< n - 1$.
Case 2: Number of data vectors $n > m$.
- If there are $m$ or more independent data vectors, then the rank of the covariance matrix is $m$.
- If fewer than $m$ data vectors are independent, what is the rank of the covariance matrix?
In practice, it is often possible to reduce the dimensionality.

The eigenmatrix $Q$ is
$$Q = [\, q_1 \ \cdots \ q_m \,]. \quad (38)$$
PCA maps a data point $x$ to a vector $y$ in the eigenspace as
$$y = Q^\top (x - \mu) = \sum_{j=1}^{m} (x - \mu)^\top q_j \, q_j, \quad (39)$$
which has $m$ dimensions. Pick the $l$ eigenvectors with the largest eigenvalues to form the truncated eigenmatrix $\hat{Q}$:
$$\hat{Q} = [\, q_1 \ \cdots \ q_l \,], \quad (40)$$
which spans a subspace of the eigenspace. Then, $\hat{Q}$ maps $x$ to $\hat{y}$, an estimate of $y$:
$$\hat{y} = \hat{Q}^\top (x - \mu) = \sum_{j=1}^{l} (x - \mu)^\top q_j \, q_j, \quad (41)$$
which has $l < m$ dimensions.

[Diagram: $x$ is mapped by $Q^\top$ to $y = (y_1, \ldots, y_m)^\top$; keeping only $(y_1, \ldots, y_l)^\top$ gives $\hat{y}$ (dimensionality reduction); $\hat{Q}$ maps $\hat{y}$ back to the estimate $\hat{x}$.]
The difference between $y$ and $\hat{y}$ is
$$y - \hat{y} = \sum_{j=l+1}^{m} (x - \mu)^\top q_j \, q_j. \quad (42)$$

Beware:
$$y = Q^\top (x - \mu) \quad \text{and, therefore,} \quad x = Q y + \mu.$$
But, whereas
$$\hat{y} = \hat{Q}^\top (x - \mu), \qquad \hat{x} = \hat{Q} \hat{y} + \mu \ne x. \quad (43)$$
Why?
With $n$ data points $x_i$, the sum-squared error $E$ between $x_i$ and its estimate $\hat{x}_i$ is
$$E = \sum_{i=1}^{n} \| x_i - \hat{x}_i \|^2. \quad (44)$$
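
A sketch of Eqs. 39-44 (assuming NumPy; the 5-D synthetic data and the choice $l = 2$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5)) @ np.diag([4.0, 2.0, 1.0, 0.2, 0.1])
mu = X.mean(axis=0)
D = X - mu
lam, Q = np.linalg.eigh(D.T @ D / len(X))
lam, Q = lam[::-1], Q[:, ::-1]          # descending eigenvalues

l = 2
Q_hat = Q[:, :l]                        # truncated eigenmatrix, Eq. 40

Y_hat = D @ Q_hat                       # l-D representations y_hat, Eq. 41
X_hat = Y_hat @ Q_hat.T + mu            # back-projection, Eq. 43: generally != X

E = np.sum((X - X_hat) ** 2)            # sum-squared error, Eq. 44
print(np.isclose(E, len(X) * lam[l:].sum()))   # E = n * (sum of discarded eigenvalues)
```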

When all $m$ dimensions are kept, the total variance is
$$\sum_{j=1}^{m} \sigma_j^2 = \sum_{j=1}^{m} \lambda_j. \quad (45)$$
When $l$ dimensions are used, the ratio $\bar{R}$ of unaccounted variance is
$$\bar{R}(l) = \frac{\sum_{j=l+1}^{m} \sigma_j^2}{\sum_{j=1}^{m} \sigma_j^2} = \frac{\sum_{j=l+1}^{m} \lambda_j}{\sum_{j=1}^{m} \lambda_j}. \quad (46)$$

A sample plot of $\bar{R}$ vs. the number of dimensions used:
[Plot: $\bar{R}$ falls from 1 towards 0 as the number of dimensions increases from 0 to $m$, with "not enough" and "enough" regions on either side of $l$.]
How to choose an appropriate $l$? Choose $l$ such that using more than $l$ dimensions does not reduce $\bar{R}$ significantly:
$$\bar{R}(l) - \bar{R}(l+1) < \epsilon. \quad (47)$$

Alternatively, compute the ratio $R$ of accounted variance:
$$R(l) = \frac{\sum_{j=1}^{l} \sigma_j^2}{\sum_{j=1}^{m} \sigma_j^2} = \frac{\sum_{j=1}^{l} \lambda_j}{\sum_{j=1}^{m} \lambda_j}. \quad (48)$$
$l$ is large enough if $R(l+1) - R(l) < \epsilon$.
[Plot: $R$ rises from 0 towards 1 as the number of dimensions increases from 0 to $m$, with "not enough" and "enough" regions on either side of $l$.]
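
A sketch of the stopping rule based on Eq. 48 (assuming NumPy; the helper name `choose_l`, the tolerance, and the example eigenvalues are mine):

```python
import numpy as np

def choose_l(lam, eps=0.01):
    """Smallest l for which adding one more dimension increases the
    accounted-variance ratio R(l) (Eq. 48) by less than eps."""
    lam = np.asarray(lam, dtype=float)
    R = np.cumsum(lam) / lam.sum()        # R(1), R(2), ..., R(m)
    gains = np.diff(R)                    # R(l+1) - R(l)
    small = np.flatnonzero(gains < eps)
    return int(small[0]) + 1 if small.size else len(lam)

print(choose_l([5.0, 3.0, 1.0, 0.05, 0.01]))   # -> 3
```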

Summary
- Eigendecomposition of the covariance matrix = PCA.
- In practice, PCA is computed using SVD.
- PCA maximizes the variances along the eigenvectors.
- Eigenvalues = variances along the eigenvectors.
- The best-fitting plane computed by PCA minimizes the distance to the points.
- PCA can be used for dimensionality reduction.

Probing Questions
- When applying PCA to plane fitting, how do you check whether the points really lie close to the plane?
- If you apply PCA to a set of points on a curved surface, where do you expect the eigenvectors to point?
- In dimensionality reduction, the $m$-D vector $y$ is reduced to an $l$-D vector $\hat{y}$. Since $l \ne m$, vector subtraction is undefined. But subtraction of Eqs. 39 and 41 gives
$$y - \hat{y} = \sum_{j=l+1}^{m} (x - \mu)^\top q_j \, q_j.$$
How is this possible? Is there a contradiction?
- Show that when $n \le m$, there are at most $n - 1$ eigenvectors.

Homework I
1. Describe the essence of PCA in one sentence.
2. Show that the covariance matrix $C$ of a set of data points $x_i$, $i = 1, \ldots, n$, is symmetric about its diagonal.
3. Show that the eigenvalue $\lambda$ of the covariance matrix $C$ equals the variance $V$ as defined in Eq. 2.
4. Show that the mean $\mu_y$ over all $y_i$ in the eigenspace is 0.
5. Show that the variance $\sigma_j^2$ along eigenvector $q_j$ is $\lambda_j$.
6. Show that $A A^\top = nC$.
7. Derive the difference $Q y - \hat{Q} \hat{y}$ in dimensionality reduction.

Homework II
8. Suppose $n \le m$, and the truncated eigenmatrix $\hat{Q}$ contains all the eigenvectors with non-zero eigenvalues, $\hat{Q} = [\, q_1 \ \cdots \ q_{n-1} \,]$. Now, map an input vector $x$ to $y$ and $\hat{y}$ respectively by the complete $Q$ (Eq. 39) and the truncated $\hat{Q}$ (Eq. 41). What is the error $y - \hat{y}$? What is the result of mapping $\hat{y}$ back to the input space by $\hat{Q}$ (Eq. 43)?
9. Q3 of the AY2015/16 Final Evaluation.
