Principal Component Analysis
Yuanzhen Shao
MA 26500
Data as points in $\mathbb{R}^n$

Assume that we have a collection of data in $\mathbb{R}^n$:
$$S = \left\{\, X_1 = \begin{pmatrix} x_{11} \\ x_{12} \\ \vdots \\ x_{1n} \end{pmatrix},\; X_2 = \begin{pmatrix} x_{21} \\ x_{22} \\ \vdots \\ x_{2n} \end{pmatrix},\; \dots,\; X_m = \begin{pmatrix} x_{m1} \\ x_{m2} \\ \vdots \\ x_{mn} \end{pmatrix} \right\}.$$

Figure: A collection of data in $\mathbb{R}^n$
Data as points in $\mathbb{R}^n$

Without loss of generality, we may assume that the mean of this collection of data is zero:
$$\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Otherwise, we just translate these data to have mean $0$ by looking at
$$S = \{\, X_1 - \bar{X},\; X_2 - \bar{X},\; \dots,\; X_m - \bar{X} \,\}.$$

Figure: A collection of data in $\mathbb{R}^n$
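As an aside (not part of the original slides), here is a minimal NumPy sketch of this centering step; the values in the array X are made up purely for illustration.

    import numpy as np

    # m = 5 data points in R^3, one per row (made-up values)
    X = np.array([[2.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 2.0, 1.0],
                  [3.0, 1.0, 2.0],
                  [4.0, 1.0, 1.0]])

    mean = X.mean(axis=0)           # the sample mean, a vector in R^3
    X_centered = X - mean           # translate so the mean becomes 0
    print(X_centered.mean(axis=0))  # ~ [0, 0, 0] up to floating-point error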
Variance of Data

Question 1: How can we determine a subspace that $S$ is close to?

Some examples: http://setosa.io/ev/principal-component-analysis/

Question 2: How can we determine the direction representing the largest variance of $S$?

Recall that if $v \in \mathbb{R}^n$ is a unit vector, then the orthogonal projection of $X_i$ onto the direction given by $v$, i.e. onto $\mathrm{span}\{v\}$, is
$$\mathrm{Proj}_v X_i = (X_i, v)\, v = c_i v,$$
that is, $c_i$ is the coordinate of $X_i$ in the direction of $v$.

Figure: Orthogonal projection
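The projection formula can be checked numerically. The following sketch (an addition, with made-up vectors) computes $c_i$ and $\mathrm{Proj}_v X_i$ and verifies that the residual is orthogonal to $v$.

    import numpy as np

    v = np.array([1.0, 2.0, 2.0])
    v = v / np.linalg.norm(v)      # make v a unit vector

    X_i = np.array([3.0, 0.0, 4.0])
    c_i = X_i @ v                  # coordinate (X_i, v) along v
    proj = c_i * v                 # orthogonal projection of X_i onto span{v}

    # the residual X_i - proj is orthogonal to v
    print(np.isclose((X_i - proj) @ v, 0.0))  # True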
Variance of Data

Answer: the direction representing the largest variance of $S$ is given by the unit vector $v$ that maximizes
$$\sum_{i=1}^{m} c_i^2 = \sum_{i=1}^{m} (X_i, v)^2$$
among all unit vectors $v$.
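To make the objective concrete, here is an illustrative sketch (not from the slides; the synthetic data and the helper variance_along are assumptions for the example) that evaluates $\sum_i (X_i, v)^2$ for two candidate unit directions. The data are deliberately stretched along the $x$-axis, so that direction scores much higher.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
    X = X - X.mean(axis=0)         # centered data, stretched along the x-axis

    def variance_along(v):
        v = v / np.linalg.norm(v)
        c = X @ v                  # coordinates c_i = (X_i, v)
        return np.sum(c**2)        # the quantity being maximized

    print(variance_along(np.array([1.0, 0.0])))  # large: x is the stretched axis
    print(variance_along(np.array([0.0, 1.0])))  # small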
Optimization Problem

Question 3: How can we find such a $v$ for this optimization problem?

Let
$$A_{m \times n} = \begin{pmatrix} X_1^T \\ X_2^T \\ \vdots \\ X_m^T \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}.$$

Recall that $(X_i, v) = X_i^T v$. So
$$Av = \begin{pmatrix} X_1^T v \\ X_2^T v \\ \vdots \\ X_m^T v \end{pmatrix} = \begin{pmatrix} (X_1, v) \\ (X_2, v) \\ \vdots \\ (X_m, v) \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix}.$$

Therefore,
$$\sum_{i=1}^{m} c_i^2 = (Av, Av) = v^T A^T A v = v^T \underbrace{C}_{=\,A^T A}\, v.$$
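The identity $\sum_i c_i^2 = v^T C v$ is easy to verify numerically; the following sketch (an illustrative addition, with random data) does exactly that.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(6, 3))    # rows are the data vectors X_i^T
    A = A - A.mean(axis=0)         # center so the mean is 0

    v = rng.normal(size=3)
    v = v / np.linalg.norm(v)      # a unit vector

    C = A.T @ A                    # the n x n matrix C = A^T A
    c = A @ v                      # c_i = (X_i, v)
    print(np.isclose(np.sum(c**2), v @ C @ v))  # True: sum c_i^2 = v^T C v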
Optimization Problem

$C$ is an $n \times n$ symmetric matrix (consider why), and thus is diagonalizable.

Assume that $C$ has eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Moreover, $C$ has an orthonormal basis of eigenvectors $v_1, v_2, \dots, v_n$ such that $C v_i = \lambda_i v_i$.

In particular,
$$v_i^T C v_i = \lambda_i v_i^T v_i = \lambda_i \|v_i\|^2 = \lambda_i.$$

Claim: $v_1$ represents the direction of the largest variance of $S$.
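In NumPy, one would compute this orthonormal eigenbasis with numpy.linalg.eigh, which is designed for symmetric matrices. A minimal sketch (an addition; the random data are illustrative only):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(50, 4))
    A = A - A.mean(axis=0)
    C = A.T @ A                            # symmetric, so eigh applies

    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder so lambda_1 >= ... >= lambda_n
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    v1 = eigvecs[:, 0]                     # claimed direction of largest variance
    print(np.isclose(v1 @ C @ v1, eigvals[0]))  # True: v1^T C v1 = lambda_1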
Optimization Problem

Proof. If $u = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$ is a unit vector in $\mathbb{R}^n$, i.e.
$$1 = \|u\|^2 = a_1^2 + a_2^2 + \cdots + a_n^2 \quad \text{(consider why?)},$$
then
$$u^T C u = (a_1 v_1 + a_2 v_2 + \cdots + a_n v_n)^T (\lambda_1 a_1 v_1 + \lambda_2 a_2 v_2 + \cdots + \lambda_n a_n v_n),$$
and since the $v_i$ are orthonormal,
$$u^T C u = \lambda_1 a_1^2 + \lambda_2 a_2^2 + \cdots + \lambda_n a_n^2 \leq \lambda_1 a_1^2 + \lambda_1 a_2^2 + \cdots + \lambda_1 a_n^2 = \lambda_1 \underbrace{(a_1^2 + a_2^2 + \cdots + a_n^2)}_{=1} = \lambda_1.$$
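A quick numerical sanity check of this bound (an addition, using random data): sampling many unit vectors $u$, the value $u^T C u$ never exceeds $\lambda_1$.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(30, 3))
    A = A - A.mean(axis=0)
    C = A.T @ A
    lam1 = np.linalg.eigvalsh(C)[-1]       # largest eigenvalue lambda_1

    for _ in range(1000):
        u = rng.normal(size=3)
        u = u / np.linalg.norm(u)          # a random unit vector
        assert u @ C @ u <= lam1 + 1e-9    # the bound proved above

    print("u^T C u <= lambda_1 held for all sampled unit vectors")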
Optimization Problem

Thus, $v_1$ = the direction of the largest variance of $S$.

Similarly, we can show that

$v_2$ = the direction of the largest variance of $S$ in $\mathrm{span}\{v_1\}^{\perp}$,

$v_3$ = the direction of the largest variance of $S$ in $\mathrm{span}\{v_1, v_2\}^{\perp}$,

etc.

In the end, we can simply drop the directions corresponding to very small eigenvalues of $C$.
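Putting the pieces together, here is a minimal dimension-reduction sketch (an addition; the helper pca_reduce and its data are assumptions for illustration) that keeps only the top-$k$ eigenvector directions and drops the rest.

    import numpy as np

    def pca_reduce(X, k):
        """Project data X (m x n, rows X_i^T) onto its top-k principal
        directions, dropping the small-eigenvalue directions."""
        Xc = X - X.mean(axis=0)                          # center the data
        C = Xc.T @ Xc
        eigvals, eigvecs = np.linalg.eigh(C)
        top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # n x k basis of W
        return Xc @ top                                  # coordinates in R^k

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 5))
    Y = pca_reduce(X, 2)
    print(Y.shape)  # (200, 2)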
PCA in digital images: an example by Václav Hlaváč

Let us consider a $321 \times 261$ image. Such an image can be considered as a vector in $\mathbb{R}^n$ with $n = 321 \cdot 261 = 83781$.
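In code, turning an image into such a vector is a single reshape; the sketch below (an addition, with random pixel values standing in for the real image) illustrates it.

    import numpy as np

    # a hypothetical 321 x 261 grayscale image (random values for the sketch)
    img = np.random.default_rng(5).random((321, 261))
    x = img.reshape(-1)            # flatten into a vector in R^n
    print(x.shape)                 # (83781,) = (321 * 261,)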
What if we have 32 instances of images?
PCA in digital images: an example by Václav Hlaváč

Using the PCA method, we can determine a four-dimensional subspace $W$ in $\mathbb{R}^n$ such that all 32 images are close to $W$.

We can find four basis vectors for $W$, which can be displayed as images.

We can then reconstruct all 32 images using linear combinations of these four basis images; e.g., one image is reconstructed with the coefficients $q_1 = 0.078$, $q_2 = 0.062$, $q_3 = 0.182$, $q_4 = 0.179$.
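Here is a sketch of this reconstruction step (an addition; random arrays stand in for Hlaváč's 32 images, so the coefficients will not match the $q_j$ above). Note one practical swap: with $m = 32 \ll n = 83781$, forming the $n \times n$ matrix $C$ is infeasible, so the top eigenvectors of $C = A^T A$ are obtained from the thin SVD of the centered data instead; the rows of Vt below are exactly those eigenvectors.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 321 * 261
    A = rng.random((32, n))                 # stand-in for 32 flattened images

    mean = A.mean(axis=0)
    Ac = A - mean
    # thin SVD of the centered data; rows of Vt are eigenvectors of C = Ac^T Ac
    _, _, Vt = np.linalg.svd(Ac, full_matrices=False)
    W = Vt[:4]                              # 4 basis "images", each in R^n

    q = Ac[0] @ W.T                         # coordinates (q_1,...,q_4) of image 0
    recon = mean + q @ W                    # rank-4 reconstruction of image 0
    print(q)                                # analogous to q_1,...,q_4 on the slide
    print(np.linalg.norm(A[0] - recon))     # reconstruction error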
Figure: Reconstruction fidelity, 4 components
References

Václav Hlaváč, Principal Component Analysis: Application to images.

http://setosa.io/ev/principal-component-analysis/

http://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/