More Linear Algebra. Edps/Soc 584, Psych 594. Carolyn J. Anderson


More Linear Algebra
Edps/Soc 584, Psych 594
Carolyn J. Anderson
Department of Educational Psychology
University of Illinois at Urbana-Champaign
© Board of Trustees, University of Illinois
Spring 2017

Overview
- Eigensystems: decomposition of a square matrix
- Singular Value Decomposition: decomposition of a rectangular matrix
- Maximization of quadratic forms
Reading: Johnson & Wichern, pages 60-66, 73-75, 77-81

Eigensystems
Let A be a $p \times p$ square matrix. The scalars $\lambda_1, \lambda_2, \ldots, \lambda_p$ that satisfy the polynomial equation $|A - \lambda I| = 0$ are called eigenvalues (or characteristic roots) of the matrix A. The equation $|A - \lambda I| = 0$ is called the characteristic equation.
Example:
$$A = \begin{pmatrix} 1 & -5 \\ -5 & 1 \end{pmatrix}, \qquad |A - \lambda I| = \begin{vmatrix} (1-\lambda) & -5 \\ -5 & (1-\lambda) \end{vmatrix} = 0$$
$$(1-\lambda)^2 - (-5)(-5) = 0 \;\Longrightarrow\; \lambda^2 - 2\lambda - 24 = 0 \;\Longrightarrow\; (\lambda - 6)(\lambda + 4) = 0$$
so $\lambda_1 = 6$ and $\lambda_2 = -4$.
Quadratic formula: the roots of $ax^2 + bx + c = 0$ are $x = (-b \pm \sqrt{b^2 - 4ac})/(2a)$.
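As a quick check, here is a minimal numpy sketch of this computation (the course demonstrations use SAS/IML, but the arithmetic is the same); it assumes the off-diagonal entries of A are −5, as written above.

```python
import numpy as np

# 2x2 example from the slide (off-diagonals taken as -5 in this reconstruction)
A = np.array([[ 1.0, -5.0],
              [-5.0,  1.0]])

# Roots of the characteristic equation |A - lambda*I| = 0, largest first
eigenvalues = np.linalg.eigvalsh(A)[::-1]
print(eigenvalues)                          # [ 6. -4.]

# Same roots from the quadratic formula for lambda^2 - 2*lambda - 24 = 0
a, b, c = 1.0, -2.0, -24.0
roots = ((-b + np.sqrt(b**2 - 4*a*c)) / (2*a),
         (-b - np.sqrt(b**2 - 4*a*c)) / (2*a))
print(roots)                                # (6.0, -4.0)
```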

Eigenvectors
A square matrix A has eigenvalue $\lambda$ with a corresponding eigenvector $x \ne 0$ if
$$Ax = \lambda x \qquad\text{or equivalently}\qquad (A - \lambda I)x = 0.$$
We usually normalize x so that it has length 1:
$$e = \frac{x}{L_x} = \frac{x}{\sqrt{x'x}}$$
e is also an eigenvector of A, because $Ae = \lambda e$ is the same statement as $A(L_x e) = \lambda(L_x e)$, i.e., $Ax = \lambda x$; and $e'e = 1$.
Any multiple of x is an eigenvector associated with $\lambda$: all that matters is the direction of x, not its length.

Example: Eigenvectors continued
$$A = \begin{pmatrix} 1 & -5 \\ -5 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & -5 \\ -5 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
$$x_1 - 5x_2 = \lambda x_1 \qquad\qquad -5x_1 + x_2 = \lambda x_2$$
So we have 2 equations and 3 unknowns ($x_1$, $x_2$ and $\lambda$). Set $\lambda = 6$; now there are 2 equations with 2 unknowns:
$$x_1 - 5x_2 = 6x_1 \qquad\qquad -5x_1 + x_2 = 6x_2$$
Both equations give $x_1 = -x_2$, so (normalizing to length 1)
$$x = e = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}$$
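A minimal numpy sketch of this solution: any vector with $x_1 = -x_2$ satisfies $Ax = 6x$, and normalizing it gives e.

```python
import numpy as np

A = np.array([[ 1.0, -5.0],
              [-5.0,  1.0]])

# Any x with x1 = -x2 solves (A - 6I)x = 0; pick x = (1, -1)'
x = np.array([1.0, -1.0])
print(A @ x, 6 * x)                 # both are [ 6. -6.]

# Normalize to unit length to get the eigenvector e
e = x / np.sqrt(x @ x)
print(e)                            # ~[ 0.707 -0.707]  i.e. (1/sqrt(2), -1/sqrt(2))'
print(A @ e - 6 * e)                # ~[0. 0.]  e is still an eigenvector for lambda = 6
```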

Symmetric Matrices
Now let A be a $(p \times p)$ symmetric matrix. Then A has p pairs of eigenvalues and eigenvectors
$$\lambda_1, e_1;\ \lambda_2, e_2;\ \ldots;\ \lambda_p, e_p.$$
- The eigenvectors are chosen to have length 1: $e_1'e_1 = e_2'e_2 = \cdots = e_p'e_p = 1$.
- The eigenvectors are also chosen to be mutually orthogonal (perpendicular): $e_i \perp e_k$, that is, $e_i'e_k = 0$ for all $i \ne k$.
- The eigenvectors are all unique if no two eigenvalues are equal.
- Typically the eigenvalues are ordered from largest to smallest.

Little Example continued
$$A = \begin{pmatrix} 1 & -5 \\ -5 & 1 \end{pmatrix}, \qquad \lambda_1 = 6,\ \lambda_2 = -4$$
$$e_1 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$$
Note that $e_1'e_2 = 0$ and $L_{e_1} = L_{e_2} = 1$.

Spectral Decomposition of A
The spectral decomposition of A, where A is $(p \times p)$ symmetric:
$$A = \lambda_1\underbrace{e_1e_1'}_{p\times p} + \lambda_2\underbrace{e_2e_2'}_{p\times p} + \cdots + \lambda_p\underbrace{e_pe_p'}_{p\times p}$$
where $e_i'e_i = 1$ for all i and $e_i'e_j = 0$ for all $i \ne j$. Matrix A is decomposed into p $(p \times p)$ component matrices. If A is also positive definite, all p eigenvalues are positive.
For the little example:
$$A = \begin{pmatrix} 1 & -5 \\ -5 & 1 \end{pmatrix}, \quad \lambda_1 = 6,\ e_1 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}, \quad \lambda_2 = -4,\ e_2 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$$
$$\lambda_1 e_1e_1' + \lambda_2 e_2e_2' = 6\begin{pmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{pmatrix} - 4\begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} = \begin{pmatrix} 1 & -5 \\ -5 & 1 \end{pmatrix} = A$$

A Bigger Example
$$A = \begin{pmatrix} 13 & -4 & 2 \\ -4 & 13 & -2 \\ 2 & -2 & 10 \end{pmatrix}, \qquad \lambda_1 = \lambda_2 = 9,\ \lambda_3 = 18$$
$$e_1 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 1/\sqrt{18} \\ -1/\sqrt{18} \\ -4/\sqrt{18} \end{pmatrix}, \qquad e_3 = \begin{pmatrix} 2/3 \\ -2/3 \\ 1/3 \end{pmatrix}$$
- Note that since $\lambda_1 = \lambda_2$, the labeling of $e_1$ and $e_2$ is arbitrary.
- The lengths: $e_1'e_1 = e_2'e_2 = e_3'e_3 = 1$.
- Orthogonality: $e_1'e_2 = e_1'e_3 = e_2'e_3 = 0$.
- Decomposition: $A = 9e_1e_1' + 9e_2e_2' + 18e_3e_3'$.

Decomposition of the (3 × 3) Example
$$A = 9\begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix} + 9\begin{pmatrix} 1/18 & -1/18 & -4/18 \\ -1/18 & 1/18 & 4/18 \\ -4/18 & 4/18 & 16/18 \end{pmatrix} + 18\begin{pmatrix} 4/9 & -4/9 & 2/9 \\ -4/9 & 4/9 & -2/9 \\ 2/9 & -2/9 & 1/9 \end{pmatrix}$$
$$= \begin{pmatrix} 9/2 & 9/2 & 0 \\ 9/2 & 9/2 & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 9/18 & -9/18 & -36/18 \\ -9/18 & 9/18 & 36/18 \\ -36/18 & 36/18 & 144/18 \end{pmatrix} + \begin{pmatrix} 72/9 & -72/9 & 36/9 \\ -72/9 & 72/9 & -36/9 \\ 36/9 & -36/9 & 18/9 \end{pmatrix}$$
$$= \frac{1}{18}\begin{pmatrix} 234 & -72 & 36 \\ -72 & 234 & -36 \\ 36 & -36 & 180 \end{pmatrix} = \begin{pmatrix} 13 & -4 & 2 \\ -4 & 13 & -2 \\ 2 & -2 & 10 \end{pmatrix} = A$$
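A minimal numpy sketch that rebuilds A from the three weighted outer products; the signs of $e_2$ and $e_3$ are as written above, and any orthonormal basis of the $\lambda = 9$ eigenspace would serve equally well.

```python
import numpy as np

A = np.array([[13.0, -4.0,  2.0],
              [-4.0, 13.0, -2.0],
              [ 2.0, -2.0, 10.0]])

lam = np.array([9.0, 9.0, 18.0])
E = np.column_stack([
    np.array([1,  1,  0]) / np.sqrt(2),     # e1
    np.array([1, -1, -4]) / np.sqrt(18),    # e2
    np.array([2, -2,  1]) / 3.0,            # e3
])

# Sum of the weighted outer products lambda_i * e_i e_i'
A_rebuilt = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(3))
print(np.allclose(A_rebuilt, A))            # True
```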

Recall: Quadratic Forms
A quadratic form is $x'Ax$ for $x$ $(p \times 1)$ and A $(p \times p)$ symmetric. The terms of $x'Ax$ are squares of the $x_i$ (i.e., $x_i^2$) and cross-products of $x_i$ and $x_k$ (i.e., $x_i x_k$):
$$x'Ax = \sum_{i=1}^{p}\sum_{k=1}^{p} a_{ik}x_ix_k$$
e.g.,
$$(x_1, x_2)\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \big((a_{11}x_1 + a_{21}x_2),\ (a_{12}x_1 + a_{22}x_2)\big)\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
$$= a_{11}x_1^2 + a_{21}x_1x_2 + a_{12}x_1x_2 + a_{22}x_2^2 = \sum_{i=1}^{2}\sum_{k=1}^{2}a_{ik}x_ix_k$$
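A minimal numpy sketch of the equivalence between the matrix form $x'Ax$ and the double sum, using the 2 × 2 matrix from the little example and an arbitrary x.

```python
import numpy as np

A = np.array([[ 1.0, -5.0],
              [-5.0,  1.0]])
x = np.array([1.0, 2.0])            # arbitrary vector, chosen only for illustration

# Matrix form of the quadratic form
print(x @ A @ x)                    # -15.0

# Same value written out as the double sum over squares and cross-products
total = sum(A[i, k] * x[i] * x[k] for i in range(2) for k in range(2))
print(total)                        # -15.0 = 1 - 10 - 10 + 4  (A is clearly not positive definite)
```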

Eigenvalues and Definiteness
- If $x'Ax > 0$ for all $x \ne 0$, matrix A is positive definite.
- If $x'Ax \ge 0$ for all $x \ne 0$, matrix A is non-negative definite.
Important:
- All eigenvalues of A $> 0$ $\iff$ A is positive definite.
- All eigenvalues of A $\ge 0$ $\iff$ A is non-negative definite.
Implication: if A is positive definite, then the diagonal elements of A must be positive. If $x = (0, \ldots, 1, \ldots, 0)'$ with the 1 in the i-th position, then $x'Ax = a_{ii}x_i^2 = a_{ii} > 0$.

More on the Spectral Decomposition
When A $(p \times p)$ is symmetric and positive definite (i.e., the diagonals of A are all $> 0$ and $\lambda_i > 0$ for all i), we can write the spectral decomposition of A as the sum of weighted vector products,
$$A_{p\times p} = \sum_{i=1}^{p}\lambda_ie_ie_i'$$
In matrix form this is $A = P\Lambda P'$, where
$$\Lambda_{p\times p} = \mathrm{diag}(\lambda_i) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{pmatrix} \qquad\text{and}\qquad P_{p\times p} = (e_1, e_2, \ldots, e_p).$$

Showing that A = PΛP'
$$A_{p\times p} = P_{p\times p}\Lambda_{p\times p}P'_{p\times p} = (e_1, e_2, \ldots, e_p)\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{pmatrix}\begin{pmatrix} e_1' \\ e_2' \\ \vdots \\ e_p' \end{pmatrix}$$
$$= (\lambda_1e_1, \lambda_2e_2, \ldots, \lambda_pe_p)\begin{pmatrix} e_1' \\ e_2' \\ \vdots \\ e_p' \end{pmatrix} = \sum_{i=1}^{p}\lambda_ie_ie_i'$$

More about P
Since the length of each $e_i$ equals 1 (i.e., $e_i'e_i = 1$), and $e_i$ and $e_k$ are orthogonal for all $i \ne k$ (i.e., $e_i'e_k = 0$),
$$P'P = \begin{pmatrix} e_1' \\ e_2' \\ \vdots \\ e_p' \end{pmatrix}(e_1, e_2, \ldots, e_p) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = I = PP'$$
P is an orthogonal matrix.

Rank r Decompositions
If A is non-negative definite (positive semi-definite) with rank $r < p$, then
$$\lambda_i > 0 \text{ for } i = 1, \ldots, r \qquad\text{and}\qquad \lambda_i = 0 \text{ for } i = r+1, \ldots, p,$$
so $A_{p\times p} = P_{p\times r}\Lambda_{r\times r}P'_{r\times p}$.
If A is positive definite or positive semi-definite, we sometimes want to approximate A by a rank-r decomposition, with r less than the rank of A,
$$B = \lambda_1e_1e_1' + \cdots + \lambda_re_re_r'$$
This decomposition minimizes the loss function
$$\sum_{i=1}^{p}\sum_{k=1}^{p}(a_{ik} - b_{ik})^2 = \lambda_{r+1}^2 + \lambda_{r+2}^2 + \cdots + \lambda_p^2$$
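A minimal numpy sketch of this rank-r approximation and its loss, using the 3 × 3 matrix from the bigger example; keeping only the $\lambda = 18$ term leaves a loss of $9^2 + 9^2 = 162$.

```python
import numpy as np

A = np.array([[13.0, -4.0,  2.0],
              [-4.0, 13.0, -2.0],
              [ 2.0, -2.0, 10.0]])

lam, E = np.linalg.eigh(A)          # ascending eigenvalues, orthonormal eigenvectors
lam, E = lam[::-1], E[:, ::-1]      # reorder largest to smallest

r = 1                               # keep only the leading term
B = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(r))

loss = np.sum((A - B) ** 2)
print(loss, np.sum(lam[r:] ** 2))   # both 162.0 = 9^2 + 9^2
```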

Inverse of A
If A is positive definite, the inverse of A equals
$$A^{-1} = P\Lambda^{-1}P'$$
where
$$\Lambda^{-1} = \mathrm{diag}(1/\lambda_i) = \begin{pmatrix} 1/\lambda_1 & 0 & \cdots & 0 \\ 0 & 1/\lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\lambda_p \end{pmatrix}$$
Why:
$$AA^{-1} = (P\Lambda P')(P\Lambda^{-1}P') = P\Lambda\underbrace{P'P}_{I}\Lambda^{-1}P' = P\underbrace{\Lambda\Lambda^{-1}}_{I}P' = PP' = I$$
What does $A^{-1}A$ equal?
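A minimal numpy check that $P\Lambda^{-1}P'$ really is the inverse, again using the 3 × 3 example matrix.

```python
import numpy as np

A = np.array([[13.0, -4.0,  2.0],
              [-4.0, 13.0, -2.0],
              [ 2.0, -2.0, 10.0]])

lam, P = np.linalg.eigh(A)                     # A = P diag(lam) P'
A_inv = P @ np.diag(1.0 / lam) @ P.T           # P Lambda^{-1} P'

print(np.allclose(A_inv, np.linalg.inv(A)))    # True
print(np.allclose(A @ A_inv, np.eye(3)))       # True: A A^{-1} = I
print(np.allclose(A_inv @ A, np.eye(3)))       # True: A^{-1} A = I as well
```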

Square Root Matrix
If A is symmetric and positive definite, the square root matrix of A is
$$A^{1/2} = \sum_{i=1}^{p}\sqrt{\lambda_i}\,e_ie_i' = P\Lambda^{1/2}P'$$
Common mistake: $A^{1/2} \ne \{\sqrt{a_{ij}}\}$ (it is not the element-wise square root).
Properties of $A^{1/2}$:
- $(A^{1/2})' = A^{1/2}$, since $A^{1/2}$ is symmetric.
- $A^{1/2}A^{1/2} = A$.
- $(A^{1/2})^{-1} = \sum_{i=1}^{p}(1/\sqrt{\lambda_i})e_ie_i' = P\Lambda^{-1/2}P' = A^{-1/2}$.
- $A^{1/2}A^{-1/2} = A^{-1/2}A^{1/2} = I$.
- $A^{-1/2}A^{-1/2} = A^{-1}$.
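A minimal numpy sketch of the square root matrix and its properties for the 3 × 3 example; the last line checks that it is not the element-wise square root.

```python
import numpy as np

A = np.array([[13.0, -4.0,  2.0],
              [-4.0, 13.0, -2.0],
              [ 2.0, -2.0, 10.0]])

lam, P = np.linalg.eigh(A)
A_half = P @ np.diag(np.sqrt(lam)) @ P.T              # A^{1/2} = P Lambda^{1/2} P'
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T    # A^{-1/2}

print(np.allclose(A_half @ A_half, A))                         # A^{1/2} A^{1/2} = A
print(np.allclose(A_half, A_half.T))                           # A^{1/2} is symmetric
print(np.allclose(A_half @ A_neg_half, np.eye(3)))             # A^{1/2} A^{-1/2} = I
print(np.allclose(A_neg_half @ A_neg_half, np.linalg.inv(A)))  # A^{-1/2} A^{-1/2} = A^{-1}

# NOT the same as element-wise square roots (the "common mistake")
print(np.allclose(A_half, np.sqrt(np.abs(A))))                 # False
```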

Determinant, Trace and Eigenvalues
$$|A| = \prod_{i=1}^{p}\lambda_i = \lambda_1\lambda_2\cdots\lambda_p$$
Implication: a positive definite matrix has $|A| > 0$, because $\lambda_i > 0$ for all $i = 1, \ldots, p$.
$$\mathrm{trace}(A) = \sum_{i=1}^{p}a_{ii} = \sum_{i=1}^{p}\lambda_i$$
Now let's consider what's true for $\Sigma$ and S.

Numerical Example
We'll use the psychological test data from Rencher (2002), who got it from Beall (1945), to illustrate these properties: 32 males and 32 females had measures on four psychological tests. The tests were
- $x_1$ = pictorial inconsistencies
- $x_2$ = paper form board
- $x_3$ = tool recognition
- $x_4$ = vocabulary
$$S = \begin{pmatrix} 10.387897 & 7.7926587 & 15.298115 & 5.3740079 \\ 7.7926587 & 16.657738 & 13.706845 & 6.1755952 \\ 15.298115 & 13.706845 & 57.057292 & 15.932044 \\ 5.3740079 & 6.1755952 & 15.932044 & 22.133929 \end{pmatrix}$$
Note that the total sample variance = trace(S) = 106.23686 and the generalized sample variance = det(S) = 65980.199.

Numerical Example continued
The eigenvalues of S are
$$\Lambda = \begin{pmatrix} 72.717 & 0 & 0 & 0 \\ 0 & 16.111 & 0 & 0 \\ 0 & 0 & 13.114 & 0 \\ 0 & 0 & 0 & 4.295 \end{pmatrix}$$
and the eigenvectors are
$$P = (e_1, e_2, e_3, e_4) = \begin{pmatrix} 0.274 & 0.002 & 0.327 & 0.904 \\ 0.284 & 0.185 & 0.854 & 0.394 \\ 0.856 & 0.409 & 0.271 & 0.163 \\ 0.333 & 0.8936 & 0.300 & 0.009 \end{pmatrix}$$
Note that (for example)
$$e_1'e_1 = (0.274^2 + 0.284^2 + 0.856^2 + 0.333^2) = 1 = L^2_{e_1} = L_{e_1}$$
and $e_1'e_2 = 0$.

Example: Eigenvalues of S
Sum of the eigenvalues:
$$\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 72.717 + 16.111 + 13.114 + 4.295 = 106.237 = \mathrm{trace}(S) = \text{total sample variance}$$
Product of the eigenvalues:
$$\prod_{i=1}^{4}\lambda_i = 72.717 \times 16.111 \times 13.114 \times 4.295 = 65986.76 = \det(S) = \text{GSV}$$
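The same checks in a minimal numpy sketch, using the sample covariance matrix S given above; small differences from the slide's figures are rounding.

```python
import numpy as np

S = np.array([[10.387897,  7.7926587, 15.298115,  5.3740079],
              [ 7.7926587, 16.657738, 13.706845,  6.1755952],
              [15.298115,  13.706845, 57.057292, 15.932044 ],
              [ 5.3740079,  6.1755952, 15.932044, 22.133929 ]])

lam = np.linalg.eigvalsh(S)[::-1]     # eigenvalues, largest first
print(lam)                            # approx [72.72, 16.11, 13.11, 4.29]

print(np.trace(S), lam.sum())         # total sample variance: both approx 106.24
print(np.linalg.det(S), lam.prod())   # generalized sample variance: both approx 6.6e4
```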

Properties of Covariance Matrices
$\Sigma_{p\times p}$ and $S_{p\times p}$ are the symmetric population and sample covariance matrices, respectively. Most of the following holds for both.
Eigenvalues and eigenvectors: S has p pairs of eigenvalues and eigenvectors
$$\lambda_1, e_1;\ \lambda_2, e_2;\ \ldots;\ \lambda_p, e_p$$
- The $\lambda_i$'s are the roots of the characteristic equation $|S - \lambda I| = 0$.
- The eigenvectors are the solutions of the equation $Se_i = \lambda_ie_i$.

Properties of Covariance Matrices (continued)
- Since any multiple of $e_i$ solves $Se_i = \lambda_ie_i$, we (usually) set the length of $e_i$ to 1 (i.e., $L^2_{e_i} = e_i'e_i = 1$).
- The eigenvectors are orthogonal: $e_i'e_k = 0$ for all $i \ne k$.
- Convention for ordering the eigenvalues: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$.
- Since S (and $\Sigma$) are symmetric, the eigenvalues are real numbers.

More about Covariance Matrices
Spectral decomposition:
$$S = \lambda_1e_1e_1' + \lambda_2e_2e_2' + \cdots + \lambda_pe_pe_p' = P\Lambda P'$$
where $P_{p\times p} = (e_1, e_2, \ldots, e_p)$ and $\Lambda_{p\times p} = \mathrm{diag}(\lambda_i)$.
$P'P = \{e_i'e_k\} = PP' = I$, which implies that $P' = P^{-1}$.
Implications for quadratic forms:
- If $x'Sx > 0$ for all $x \ne 0$, then S is positive definite and $\lambda_i > 0$ for all i.
- If $x'Sx \ge 0$ for all $x \ne 0$, then S is non-negative (positive semi-) definite and $\lambda_i \ge 0$ for all i.
The inverse of S (if S is non-singular, i.e., $\lambda_i > 0$ for all i) is
$$S^{-1} = P\Lambda^{-1}P' = P\,\mathrm{diag}(1/\lambda_i)\,P'$$

Numerical Example & Spectral Decomposition
$$S = P\Lambda P' = \begin{pmatrix} 0.274 & 0.002 & 0.327 & 0.904 \\ 0.284 & 0.185 & 0.854 & 0.394 \\ 0.856 & 0.409 & 0.271 & 0.163 \\ 0.333 & 0.8936 & 0.300 & 0.009 \end{pmatrix}\begin{pmatrix} 72.717 & 0 & 0 & 0 \\ 0 & 16.111 & 0 & 0 \\ 0 & 0 & 13.114 & 0 \\ 0 & 0 & 0 & 4.294 \end{pmatrix}\begin{pmatrix} 0.274 & 0.284 & 0.856 & 0.333 \\ 0.002 & 0.185 & 0.409 & 0.8936 \\ 0.327 & 0.854 & 0.271 & 0.300 \\ 0.904 & 0.394 & 0.163 & 0.009 \end{pmatrix}$$
Do the SAS/IML demonstration of this and of $S^{-1} = P\Lambda^{-1}P'$.

and Even More about Covariance Matrices
- If $\{\lambda_i, e_i;\ i = 1, \ldots, p\}$ are the eigenvalue-eigenvector pairs of $\Sigma$ and $\Sigma$ is non-singular, then $\{1/\lambda_i, e_i;\ i = 1, \ldots, p\}$ are the pairs for $\Sigma^{-1}$. That is, $\Sigma$ and $\Sigma^{-1}$ have the same eigenvectors, and their eigenvalues are the inverses of each other.
- $|S| = \lambda_1\lambda_2\cdots\lambda_p = \prod_{i=1}^{p}\lambda_i$. This is the generalized sample variance (GSV).
- $\sum_{i=1}^{p}s_{ii} = \mathrm{trace}(S) = \mathrm{tr}(S) = \sum_{i=1}^{p}\lambda_i$. This is the total sample variance.
- If $\lambda_p$, the smallest eigenvalue, is greater than 0, then $|S| > 0$.
- If S is singular, then at least one eigenvalue equals 0.

The Rank of S (and $\Sigma$)
Definition of rank: the rank of S = the number of linearly independent rows (columns) = the number of non-zero eigenvalues.
If $S_{p\times p}$ is of full rank (i.e., rank = p, meaning p linearly independent rows/columns), then
- $\lambda_p > 0$
- S is positive definite
- $|S| > 0$
- $S^{-1}$ exists; S is non-singular.

Singular Value Decomposition
Given a matrix $A_{n\times p}$, the Singular Value Decomposition (SVD) of A is
$$A_{n\times p} = P_{n\times r}\Delta_{r\times r}Q'_{r\times p}$$
where
- The r columns of $P = (p_1, p_2, \ldots, p_r)$ are orthonormal: $p_i'p_i = 1$ and $p_i'p_k = 0$ for $i \ne k$; that is, $P'P = I_r$.
- The r columns of $Q = (q_1, q_2, \ldots, q_r)$ are orthonormal: $q_i'q_i = 1$ and $q_i'q_k = 0$ for $i \ne k$; that is, $Q'Q = I_r$.
- $\Delta$ is a diagonal matrix with ordered positive values $\delta_1 \ge \delta_2 \ge \cdots \ge \delta_r$.
- r is the rank of A, and $r \le \min(n, p)$.
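A minimal numpy sketch of the SVD and the orthonormality of the singular vectors; note that numpy's `svd` returns Q' (which it calls `vh`) rather than Q, and the 6 × 4 matrix here is an arbitrary stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))                      # an arbitrary 6 x 4 matrix

P, d, Qt = np.linalg.svd(A, full_matrices=False) # A = P diag(d) Q'
print(P.shape, d.shape, Qt.shape)                # (6, 4) (4,) (4, 4)

print(np.allclose(P @ np.diag(d) @ Qt, A))       # reconstruction works
print(np.allclose(P.T @ P, np.eye(4)))           # left singular vectors:  P'P = I
print(np.allclose(Qt @ Qt.T, np.eye(4)))         # right singular vectors: Q'Q = I
print(d)                                         # singular values, largest to smallest
```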

Singular Value Decomposition (continued)
$$A_{n\times p} = P_{n\times r}\Delta_{r\times r}Q'_{r\times p}$$
Terminology:
- The columns of P are the left singular vectors.
- The columns of Q are the right singular vectors.
- The elements of $\Delta$ are the singular values.

Relationship between Eigensystems and SVD
To show this, let $X_{n\times p}$ have rank p, with SVD $X_{n\times p} = P_{n\times p}\Delta_{p\times p}Q'_{p\times p}$. The product $X'_{p\times n}X_{n\times p}$ is a square, symmetric matrix:
$$X'X = (P\Delta Q')'(P\Delta Q') = Q\Delta\underbrace{P'P}_{I}\Delta Q' = \underbrace{Q}_{\text{vectors}}\underbrace{\Delta^2}_{\text{values}}\underbrace{Q'}_{\text{vectors}}$$
So the right singular vectors of X are the eigenvectors of $X'X$, and the squared singular values are its eigenvalues. If A (e.g., $X'X$) is square and symmetric, the SVD gives the same result as the eigenvalue/eigenvector decomposition.
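A minimal numpy sketch of this relationship, using a random full-column-rank X as a stand-in: the eigenvalues of X'X are the squared singular values, and the eigenvectors match the right singular vectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))                 # full column rank with probability 1

# SVD of X
P, d, Qt = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of the square symmetric matrix X'X
lam, V = np.linalg.eigh(X.T @ X)
lam = lam[::-1]                              # largest first, to match the SVD ordering

print(np.allclose(lam, d ** 2))              # eigenvalues of X'X = squared singular values
# Eigenvectors of X'X match the right singular vectors up to sign
print(np.allclose(np.abs(V[:, ::-1]), np.abs(Qt.T)))
```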

Lower Rank SVD
Sometimes we want to summarize or approximate the basic structure of a matrix. In particular, let $A_{n\times p} = P_{n\times r}\Delta_{r\times r}Q'_{r\times p}$, and take
$$B_{n\times p} = P_{n\times r^*}\Delta_{r^*\times r^*}Q'_{r^*\times p} \qquad\text{where } r^* < r \text{ (r = rank of A)}.$$
This lower rank decomposition minimizes the loss function
$$\sum_{j=1}^{n}\sum_{i=1}^{p}(a_{ji} - b_{ji})^2 = \delta^2_{r^*+1} + \cdots + \delta^2_r$$
This result, the least squares approximation of one matrix by another of lower rank, is known as the Eckart-Young theorem. See Eckart, C. & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211-218.
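A minimal numpy sketch of the Eckart-Young result: the squared-error loss of a truncated SVD equals the sum of the discarded squared singular values. The matrix is random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 5))

P, d, Qt = np.linalg.svd(A, full_matrices=False)

r_star = 2                                            # keep the two largest singular values
B = P[:, :r_star] @ np.diag(d[:r_star]) @ Qt[:r_star, :]

loss = np.sum((A - B) ** 2)
print(loss, np.sum(d[r_star:] ** 2))                  # equal: loss = sum of discarded delta^2
```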

So What Can I Do with an SVD?
- Biplot: lower rank representation of a data matrix.
- Correspondence analysis: lower rank representation of the relationship between two categorical variables.
- Multiple correspondence analysis: lower rank representations of the relationship between multiple categorical variables.
- Multidimensional scaling.
- Reduce the number of parameters in a complex model.
- And many other scaling and data analytic methods.
We'll examine what a biplot can give us. Consider the psychological test data: the rank of the (mean centered) data matrix is 4, so
$$X_c = (X - 1\bar{x}') = P_{64\times 4}\Delta_{4\times 4}Q'_{4\times 4} = \underbrace{(P\Delta)}_{\text{cases}}\underbrace{Q'}_{\text{variables}}$$

Biplot Example: Singular Values
 i    δ_i        δ_i^2      percent   cum. sum    cum. percent
 1    67.685    4581.197     68.45    4581.197      68.45
 2    31.859    1014.964     15.16    5596.161      83.61
 3    28.744     826.204     12.35    6422.365      95.96
 4    16.449     270.557      4.04    6692.922     100.00
where percent = $(\delta_i^2/6692.922)\times 100\%$, cum. sum = $\sum_{k=1}^{i}\delta_k^2$, and cumulative percent = $(\sum_{k=1}^{i}\delta_k^2/6692.922)\times 100\%$.
If we take a rank 2 decomposition, $B = \sum_{l=1}^{2}\delta_lp_lq_l' = \{\delta_1p_{j1}q_{i1} + \delta_2p_{j2}q_{i2}\} = \{b_{ji}\}$, and the value of the loss function is
$$\text{loss} = \sum_{j=1}^{n}\sum_{i=1}^{4}(x_{c,ji} - b_{ji})^2 = 826.204 + 270.557 = 1096.761$$
Only $(1096/6692)\times 100\% = 16.39\%$ of the information in the data matrix is lost (loosely speaking).
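These percentages can be reproduced from the singular values alone; a minimal numpy sketch (rounded slightly differently than on the slide):

```python
import numpy as np

d = np.array([67.685, 31.859, 28.744, 16.449])     # singular values from the slide

d2 = d ** 2
total = d2.sum()                                    # close to the slide's 6692.922

percent = 100 * d2 / total                          # percent of "information" per dimension
cum_percent = 100 * np.cumsum(d2) / total           # cumulative percent

print(np.round(percent, 1))                         # roughly [68.4 15.2 12.3  4.0]
print(np.round(cum_percent, 1))                     # roughly [68.4 83.6 96.0 100.]

# Loss of the rank-2 approximation: sum of the two smallest squared singular values
print(d2[2:].sum())                                 # roughly 1096.8
print(100 * d2[2:].sum() / total)                   # roughly 16.4 percent "lost"
```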

Biplot Example: Singular Vectors
Left singular vectors $P_{64\times 4}$ (first rows shown):
  p1      p2      p3      p4
 0.002   0.248   0.139   0.029
 0.157   0.026   0.098   0.056
 0.092   0.077   0.091   0.001
 0.198   0.041   0.079   0.120
 0.111   0.118   0.031   0.233
 0.073   0.054   0.166   0.140
 0.045   0.073   0.081   0.051
 0.046   0.068   0.304   0.173
 0.042   0.299   0.257   0.098
 etc.
Right singular vectors $Q_{4\times 4}$:
  q1      q2      q3      q4
 0.274   0.001   0.326   0.904
 0.284   0.184   0.854   0.394
 0.856   0.408   0.271   0.162
 0.333   0.893   0.300   0.009

Biplot: Representing Cases
First, let's look at the rank 2 solution/approximation:
$$\underbrace{X_c}_{(64\times 4)} \approx \underbrace{P}_{(64\times 2)}\underbrace{\Delta}_{(2\times 2)}\underbrace{Q'}_{(2\times 4)}$$
For our rank 2 solution, to represent subjects or cases, we plot the rows of the product $P_{64\times 2}\Delta_{2\times 2}$ as points in a 2-dimensional space.
Let $q_{il}$ = the value in the i-th row of $q_l$. Post-multiplying both sides of $X_c = P\Delta Q'$ by Q gives
$$P\Delta = X_{c,(64\times 4)}Q_{(4\times 4)} = \begin{pmatrix} \sum_{i=1}^{4}q_{i1}x_{c,1i} & \sum_{i=1}^{4}q_{i2}x_{c,1i} & \sum_{i=1}^{4}q_{i3}x_{c,1i} & \sum_{i=1}^{4}q_{i4}x_{c,1i} \\ \sum_{i=1}^{4}q_{i1}x_{c,2i} & \sum_{i=1}^{4}q_{i2}x_{c,2i} & \sum_{i=1}^{4}q_{i3}x_{c,2i} & \sum_{i=1}^{4}q_{i4}x_{c,2i} \\ \vdots & \vdots & \vdots & \vdots \\ \sum_{i=1}^{4}q_{i1}x_{c,64i} & \sum_{i=1}^{4}q_{i2}x_{c,64i} & \sum_{i=1}^{4}q_{i3}x_{c,64i} & \sum_{i=1}^{4}q_{i4}x_{c,64i} \end{pmatrix}$$

Biplot: Representing Cases & Variables
For cases, what we plot are linear combinations of the (mean centered) data matrix. For example, for subject one we plot the point
$$(p_{11}\delta_1,\ p_{12}\delta_2) = ((-0.002)(67.685),\ (-0.248)(31.859)) = (-0.135,\ -7.901).$$
To represent variables, we plot the rows of $Q_{4\times 2}$ as vectors in the 2-dimensional space. For example, for variable one we plot (0.274, 0.001).
For the plot, I actually plotted the variable vectors multiplied by 30 for cosmetic purposes; it doesn't affect the interpretation.
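A minimal numpy sketch of how these biplot coordinates are computed; a random 64 × 4 matrix stands in for the psychological test data, which is not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the 64 x 4 psychological test data matrix
rng = np.random.default_rng(3)
X = rng.normal(size=(64, 4))

Xc = X - X.mean(axis=0)                        # column-centered data
P, d, Qt = np.linalg.svd(Xc, full_matrices=False)

# Rank-2 biplot coordinates
case_points = P[:, :2] * d[:2]                 # rows of P*Delta: one point per case
variable_vectors = Qt[:2, :].T                 # first two columns of Q: one vector per variable

print(case_points.shape, variable_vectors.shape)      # (64, 2) (4, 2)

# The case coordinates are the same linear combinations Xc Q (first two columns)
print(np.allclose(case_points, (Xc @ Qt.T)[:, :2]))   # True
```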

The Graph & Foreshadowing of Things to Come

Maximization of Quadratic Forms for Points on the Unit Sphere
In multivariate analyses, we have different goals and purposes, hence different criteria to maximize (or minimize). Let $B_{p\times p}$ be a positive definite matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ and eigenvectors $e_1, e_2, \ldots, e_p$.
- Maximization: $\max_{x \ne 0} \dfrac{x'Bx}{x'x} = \lambda_1$, attained when $x = e_1$.
- Minimization: $\min_{x \ne 0} \dfrac{x'Bx}{x'x} = \lambda_p$, attained when $x = e_p$.
- Maximization under an orthogonality constraint: $\max_{x \perp e_1, \ldots, e_k} \dfrac{x'Bx}{x'x} = \lambda_{k+1}$, attained when $x = e_{k+1}$.
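A minimal numpy check of these results, using the 3 × 3 positive definite matrix from the bigger example (eigenvalues 18, 9, 9).

```python
import numpy as np

B = np.array([[13.0, -4.0,  2.0],
              [-4.0, 13.0, -2.0],
              [ 2.0, -2.0, 10.0]])      # positive definite: eigenvalues 18, 9, 9

lam, E = np.linalg.eigh(B)
lam, E = lam[::-1], E[:, ::-1]          # lambda_1 >= ... >= lambda_p, eigenvectors as columns

def rayleigh(x):
    return (x @ B @ x) / (x @ x)

# The maximum over all nonzero x is lambda_1, attained at x = e_1
print(rayleigh(E[:, 0]), lam[0])        # both 18.0

# The minimum is lambda_p, attained at x = e_p
print(rayleigh(E[:, -1]), lam[-1])      # both 9.0

# Random x never beats lambda_1
xs = np.random.default_rng(4).normal(size=(1000, 3))
print(np.max([rayleigh(x) for x in xs]) <= lam[0] + 1e-9)   # True
```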

Overview of the Rest of the Semester
See the pages on the web-site.