ICS 6N Computational Linear Algebra
Symmetric Matrices and Orthogonal Diagonalization

Xiaohui Xie
University of California, Irvine
xhx@uci.edu
Symmetric matrices

An $n \times n$ matrix $A$ is symmetric if $A^T = A$. Componentwise: $A$ is symmetric if

$$a_{ij} = a_{ji} \quad \text{for } i, j = 1, 2, \ldots, n$$
Matrix Diagonalization

A matrix $A$ is diagonalizable if there exist an invertible matrix $P$ and a diagonal matrix $\Lambda$ such that

$$A = P \Lambda P^{-1}$$

If $A$ can be diagonalized, then $A^k = P \Lambda^k P^{-1}$.

Not all matrices can be diagonalized. An $n \times n$ matrix is diagonalizable if and only if it has $n$ linearly independent eigenvectors.

Some special cases:
1. If an $n \times n$ matrix $A$ has $n$ distinct eigenvalues, then it is diagonalizable.
2. If $A$ is symmetric, then it is diagonalizable.
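The identity $A^k = P \Lambda^k P^{-1}$ can be checked numerically. A minimal sketch with NumPy, using a made-up $2 \times 2$ matrix (not one from the slides):

```python
import numpy as np

# Sketch: computing A^k via diagonalization A = P Lambda P^{-1}.
# The matrix below is an illustrative example, not from the slides.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, P = np.linalg.eig(A)          # columns of P are eigenvectors
k = 5
# A^k = P Lambda^k P^{-1}; Lambda^k is cheap because Lambda is diagonal
A_k = P @ np.diag(eigvals**k) @ np.linalg.inv(P)

# Compare against direct repeated multiplication
assert np.allclose(A_k, np.linalg.matrix_power(A, k))
```

The payoff is that raising a diagonal matrix to a power only requires raising its diagonal entries to that power.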
Diagonalization of symmetric matrices

Example: diagonalize the matrix

$$A = \begin{bmatrix} 6 & -2 & -1 \\ -2 & 6 & -1 \\ -1 & -1 & 5 \end{bmatrix}$$

The characteristic equation of $A$ is

$$0 = -\lambda^3 + 17\lambda^2 - 90\lambda + 144 = -(\lambda - 8)(\lambda - 6)(\lambda - 3)$$

so we have three distinct eigenvalues $\lambda_1 = 8$, $\lambda_2 = 6$, $\lambda_3 = 3$. Find the corresponding eigenvectors:

$$v_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

Note that $v_1^T v_2 = 0$, $v_1^T v_3 = 0$, $v_2^T v_3 = 0$, i.e., the eigenvectors are mutually orthogonal.
Diagonalization of symmetric matrices

Example (continued): further normalize the eigenvectors to unit vectors:

$$u_1 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -1/\sqrt{6} \\ -1/\sqrt{6} \\ 2/\sqrt{6} \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}$$

Let

$$P = \begin{bmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}, \quad D = \begin{bmatrix} 8 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

Then $A = P D P^T$, since $P$ is an orthogonal matrix ($P^{-1} = P^T$).
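The factorization from this example can be verified numerically. A quick check, assuming the eigenvectors and eigenvalues reconstructed above:

```python
import numpy as np

# Numerical check of the worked example: A = P D P^T with orthogonal P.
A = np.array([[ 6, -2, -1],
              [-2,  6, -1],
              [-1, -1,  5]], dtype=float)

# Orthonormal eigenvectors (columns of P), eigenvalues D = diag(8, 6, 3)
P = np.column_stack([
    np.array([-1.0,  1.0, 0.0]) / np.sqrt(2),
    np.array([-1.0, -1.0, 2.0]) / np.sqrt(6),
    np.array([ 1.0,  1.0, 1.0]) / np.sqrt(3),
])
D = np.diag([8.0, 6.0, 3.0])

assert np.allclose(P.T @ P, np.eye(3))   # P is orthogonal: P^{-1} = P^T
assert np.allclose(P @ D @ P.T, A)       # A = P D P^T
```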
Spectral theorem

If $A$ is an $n \times n$ symmetric matrix:

1. All eigenvalues of $A$ are real.
2. $A$ has exactly $n$ real eigenvalues, counting multiplicity. But this does not mean they are distinct.
3. For each eigenvalue $\lambda$, the geometric multiplicity $\dim(\mathrm{Nul}(A - \lambda I))$ equals the algebraic multiplicity of $\lambda$.
4. The eigenspaces are mutually orthogonal: if $\lambda_1 \ne \lambda_2$ are two distinct eigenvalues, then their corresponding eigenvectors $v_1$, $v_2$ are orthogonal.
Proof

1. Let $\lambda$ be an eigenvalue of $A$ with corresponding eigenvector $x$, so $Ax = \lambda x$; taking complex conjugates (since $A$ is real), $A\bar{x} = \bar{\lambda}\bar{x}$. Then

$$\lambda\, \bar{x}^T x = \bar{x}^T (A x) = (A \bar{x})^T x = \bar{\lambda}\, \bar{x}^T x$$

Since $\bar{x}^T x = \|x\|^2 > 0$, we get $\lambda = \bar{\lambda}$, so $\lambda$ is real.

2. Let $x_1$ and $x_2$ be eigenvectors corresponding to two distinct eigenvalues $\lambda_1$ and $\lambda_2$. Then

$$\lambda_2\, x_1^T x_2 = x_1^T A x_2 = (x_1^T A x_2)^T = x_2^T A^T x_1 = x_2^T A x_1 = \lambda_1\, x_2^T x_1 = \lambda_1\, x_1^T x_2$$

so $(\lambda_1 - \lambda_2)(x_1^T x_2) = 0$. Since $\lambda_1 \ne \lambda_2$, $x_1^T x_2 = 0$, so they are orthogonal.
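The two claims just proved can be illustrated numerically on a randomly generated symmetric matrix (an assumed test case, not from the slides):

```python
import numpy as np

# Illustration of the proof's claims on a random symmetric matrix:
# eigenvalues are real, and eigenvectors can be chosen orthonormal.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                    # symmetrize: A^T = A

# eigh is NumPy's solver specialized for symmetric matrices;
# it returns real eigenvalues and orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(A)

assert np.all(np.isreal(eigvals))                   # all eigenvalues real
assert np.allclose(eigvecs.T @ eigvecs, np.eye(4))  # eigenvectors orthonormal
```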
Orthogonal diagonalization

If an $n \times n$ matrix $A$ is symmetric, its eigenvectors $v_1, \ldots, v_n$ can be chosen to be orthonormal:

- If $A$ has $n$ distinct eigenvalues, then the $n$ eigenvectors are automatically orthogonal; normalize these vectors to make them orthonormal.
- If an eigenvalue $\lambda$ has multiplicity greater than 1, find an orthonormal basis of the corresponding eigenspace $\mathrm{Nul}(A - \lambda I)$, and use the vectors in this basis as eigenvectors.

In this case, $P = [\, v_1 \; v_2 \; \ldots \; v_n \,]$ is an orthogonal matrix, that is, $P^{-1} = P^T$, and $A$ can be orthogonally diagonalized:

$$A = P \Lambda P^T$$
Orthogonal diagonalization: an example

Orthogonally diagonalize the matrix

$$A = \begin{bmatrix} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 3 \end{bmatrix}$$

Characteristic equation:

$$0 = -\lambda^3 + 12\lambda^2 - 21\lambda - 98 = -(\lambda - 7)^2 (\lambda + 2)$$

Produce bases for the eigenspaces by solving linear equations:

$$\lambda = 7: \; v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \; v_2 = \begin{bmatrix} -1/2 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = -2: \; v_3 = \begin{bmatrix} -1 \\ -1/2 \\ 1 \end{bmatrix}$$

Apply Gram-Schmidt to produce an orthogonal basis for the eigenspace of $\lambda = 7$.
Orthogonal diagonalization: an example (continued)

Recall the eigenspace bases: for $\lambda = 7$, $v_1 = (1, 0, 1)^T$ and $v_2 = (-1/2, 1, 0)^T$; for $\lambda = -2$, $v_3 = (-1, -1/2, 1)^T$. Apply Gram-Schmidt to produce an orthogonal basis. The component of $v_2$ orthogonal to $v_1$ is

$$z_2 = v_2 - \frac{v_2 \cdot v_1}{v_1 \cdot v_1} v_1 = \begin{bmatrix} -1/4 \\ 1 \\ 1/4 \end{bmatrix}$$

Normalize $v_1$ and $z_2$:

$$u_1 = \begin{bmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{bmatrix}, \quad u_2 = \begin{bmatrix} -1/\sqrt{18} \\ 4/\sqrt{18} \\ 1/\sqrt{18} \end{bmatrix}$$

Normalize $v_3$ to obtain $u_3$. Then $A = P D P^T$, where $P = [u_1, u_2, u_3]$ and $D = \mathrm{diag}(7, 7, -2)$.
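The Gram-Schmidt step and the final factorization can be verified with a short script, using the eigenvectors reconstructed in this example:

```python
import numpy as np

# Gram-Schmidt step from the example: z2 = v2 - proj_{v1}(v2)
v1 = np.array([1.0, 0.0, 1.0])          # eigenvector for lambda = 7
v2 = np.array([-0.5, 1.0, 0.0])         # second eigenvector for lambda = 7

z2 = v2 - (v2 @ v1) / (v1 @ v1) * v1    # component of v2 orthogonal to v1
assert np.allclose(z2, [-0.25, 1.0, 0.25])
assert np.isclose(z2 @ v1, 0.0)         # z2 is orthogonal to v1

# Assemble P from the normalized vectors and confirm A = P D P^T
A = np.array([[ 3, -2, 4],
              [-2,  6, 2],
              [ 4,  2, 3]], dtype=float)
u1 = v1 / np.linalg.norm(v1)
u2 = z2 / np.linalg.norm(z2)
v3 = np.array([-1.0, -0.5, 1.0])        # eigenvector for lambda = -2
u3 = v3 / np.linalg.norm(v3)
P = np.column_stack([u1, u2, u3])
D = np.diag([7.0, 7.0, -2.0])

assert np.allclose(P @ D @ P.T, A)
```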
Application 1: Quadratic Forms

Any quadratic function of $x$ can be expressed in the form

$$Q(x) = x^T A x$$

where $x$ is a vector in $\mathbb{R}^n$ and $A$ is an $n \times n$ symmetric matrix. More explicitly,

$$x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j$$
Example

For example, $Q(x) = 2x_1^2 + 3x_2^2 + 4x_3^2 + 5x_2 x_3 + 6x_1 x_2$ can be written as a quadratic form with matrix

$$A = \begin{bmatrix} 2 & 3 & 0 \\ 3 & 3 & 5/2 \\ 0 & 5/2 & 4 \end{bmatrix}$$
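The construction splits each cross term $c\, x_i x_j$ as $c/2$ into the symmetric entries $a_{ij}$ and $a_{ji}$. A quick numerical check of this example:

```python
import numpy as np

# Check that x^T A x reproduces Q(x) = 2x1^2 + 3x2^2 + 4x3^2 + 5x2x3 + 6x1x2.
# Each cross term c*xi*xj contributes c/2 to entries a_ij and a_ji.
A = np.array([[2.0, 3.0, 0.0],
              [3.0, 3.0, 2.5],
              [0.0, 2.5, 4.0]])

def Q(x):
    x1, x2, x3 = x
    return 2*x1**2 + 3*x2**2 + 4*x3**2 + 5*x2*x3 + 6*x1*x2

x = np.array([1.0, -2.0, 3.0])          # arbitrary test point
assert np.isclose(x @ A @ x, Q(x))      # both give the same value
```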
Optimizing quadratic functions

Consider the following optimization problem:

$$\max \; Q(x) = 2x_1^2 + 3x_2^2 + 4x_3^2 \quad \text{subject to} \quad \|x\| = 1$$
Optimizing quadratic functions

Consider the following optimization problem (without cross-product terms):

$$\max \; Q(x) = 2x_1^2 + 3x_2^2 + 4x_3^2 \quad \text{subject to} \quad \|x\| = 1$$

Solution: Since $2x_1^2 \le 4x_1^2$ and $3x_2^2 \le 4x_2^2$, we have

$$Q(x) \le 4x_1^2 + 4x_2^2 + 4x_3^2 = 4\|x\|^2 = 4$$

In addition, we can choose $x_1 = 0$, $x_2 = 0$, $x_3 = 1$ to attain the maximum.
Optimizing quadratic functions

A more general problem:

$$\max \; Q(x) = x^T A x \quad \text{subject to} \quad \|x\| = 1$$
Optimizing quadratic functions

A more general problem: $\max \; Q(x) = x^T A x$ subject to $\|x\| = 1$.

Solution: Use $A = P \Lambda P^T$ to transform the problem into an easier form:

$$Q(x) = x^T P \Lambda P^T x = (P^T x)^T \Lambda (P^T x)$$

Change variables with $y = P^T x$; since $P$ is orthogonal, $\|y\| = \|x\|$. The problem becomes

$$\max \; Q(y) = y^T \Lambda y = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2 \quad \text{subject to} \quad \|y\| = 1$$

Therefore:
- $\max \; x^T A x$ subject to $\|x\| = 1$ is $\lambda_{\max}(A)$
- $\min \; x^T A x$ subject to $\|x\| = 1$ is $\lambda_{\min}(A)$
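This eigenvalue characterization can be sanity-checked numerically: for a symmetric matrix, no unit vector should beat the top eigenvector. A sketch on a randomly generated symmetric matrix (an assumed test case):

```python
import numpy as np

# For symmetric A, max of x^T A x over unit vectors is lambda_max,
# attained at the corresponding unit eigenvector.
rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2                      # random symmetric test matrix

eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
lam_max = eigvals[-1]
u = eigvecs[:, -1]                     # unit eigenvector for lambda_max

assert np.isclose(u @ A @ u, lam_max)  # the maximum is attained at u

# No random unit vector exceeds lambda_max
for _ in range(100):
    x = rng.standard_normal(3)
    x /= np.linalg.norm(x)
    assert x @ A @ x <= lam_max + 1e-9
```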
Optimizing quadratic functions: example

$$\max \; Q(x) = x_1^2 - 8 x_1 x_2 - 5 x_2^2 \quad \text{subject to} \quad \|x\| = 1$$
Optimizing quadratic functions: example

Solution: The matrix of the quadratic form is

$$A = \begin{bmatrix} 1 & -4 \\ -4 & -5 \end{bmatrix}$$

Orthogonally diagonalize $A$:

$$P = \begin{bmatrix} 2/\sqrt{5} & 1/\sqrt{5} \\ -1/\sqrt{5} & 2/\sqrt{5} \end{bmatrix}, \quad D = \begin{bmatrix} 3 & 0 \\ 0 & -7 \end{bmatrix}$$

Change variables from $x$ to $y = P^T x$, and rewrite the objective function:

$$x_1^2 - 8 x_1 x_2 - 5 x_2^2 = x^T A x = (Py)^T A (Py) = y^T D y = 3 y_1^2 - 7 y_2^2$$

Hence $\max Q(x)$ over $\|x\| = 1$ is 3.
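A quick check of this example's eigenvalues and maximum, using the matrix reconstructed above:

```python
import numpy as np

# Q(x) = x1^2 - 8 x1 x2 - 5 x2^2 has matrix A = [[1, -4], [-4, -5]];
# eigenvalues are 3 and -7, so max over unit vectors is 3.
A = np.array([[ 1.0, -4.0],
              [-4.0, -5.0]])

eigvals, eigvecs = np.linalg.eigh(A)   # ascending order: [-7, 3]
assert np.allclose(eigvals, [-7.0, 3.0])

u = eigvecs[:, -1]                     # unit eigenvector for lambda = 3
assert np.isclose(u @ A @ u, 3.0)      # the maximum value is attained at u
```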
Application 2: Principal Component Analysis (PCA)

Problem: Given a set of data points $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ in $\mathbb{R}^n$, find the axis along which the data points have maximal variance.

Assume the data are centered at the origin. If not, subtract the mean from each data point.
Application 2: Principal Component Analysis (PCA)

Problem (continued): Use a unit vector $u$ in $\mathbb{R}^n$ to denote the direction of the axis. Project each data point onto $u$ to obtain $\{y^{(1)}, y^{(2)}, \ldots, y^{(m)}\}$, where $y^{(i)} = u^T x^{(i)}$. The variance of the projected points is

$$\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} (y^{(i)})^2 = \frac{1}{m} \sum_{i=1}^{m} u^T x^{(i)} (x^{(i)})^T u = u^T X u$$

where the matrix

$$X = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} (x^{(i)})^T$$

is called the covariance matrix.
Application 2: Principal Component Analysis (PCA)

Reformulate the problem as a quadratic optimization problem:

$$\max \; u^T X u \quad \text{subject to} \quad \|u\| = 1$$

where $X = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} (x^{(i)})^T$ is the covariance matrix.

Solution: $u$ is the eigenvector corresponding to the largest eigenvalue of $X$. The resulting projections $y^{(i)}$ are called the first principal component.
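The whole PCA recipe, centering, building the covariance matrix, and taking the top eigenvector, can be sketched end to end. A minimal example on synthetic 2-D data stretched along an assumed known direction:

```python
import numpy as np

# Sketch of PCA via eigendecomposition of the covariance matrix,
# on synthetic data with most variance along the direction (1, 1)/sqrt(2).
rng = np.random.default_rng(2)
m = 1000
data = rng.standard_normal((m, 2)) @ np.diag([3.0, 0.5])  # std 3 vs 0.5
rot = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2)    # rotation by 45 deg
pts = data @ rot.T                                        # rotated data points

pts -= pts.mean(axis=0)                 # center the data at the origin
cov = (pts.T @ pts) / m                 # covariance matrix X = (1/m) sum x x^T

eigvals, eigvecs = np.linalg.eigh(cov)
u = eigvecs[:, -1]                      # first principal axis: top eigenvector

# u should align with (1, 1)/sqrt(2), up to sign
target = np.array([1.0, 1.0]) / np.sqrt(2)
assert abs(abs(u @ target) - 1.0) < 0.05

# Variance of the projections equals the top eigenvalue
y = pts @ u
assert np.isclose(y @ y / m, eigvals[-1])
```

In practice one would use a library routine (e.g. an SVD-based PCA), but the eigendecomposition above is exactly the formulation on this slide.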