Lecture 5: Singular value decomposition
Weinan E (1,2) and Tiejun Li (2)
(1) Department of Mathematics, Princeton University, weinan@princeton.edu
(2) School of Mathematical Sciences, Peking University, tieli@pku.edu.cn, No.1 Science Building, 1575
Outline Review and applications QR for symmetric matrix Numerical SVD
Singular value decomposition

Theorem (Singular value decomposition). Let A ∈ R^{m×n}. Then there exist orthogonal matrices U ∈ R^{m×m}, V ∈ R^{n×n} (U^T U = I, V^T V = I) and Σ ∈ R^{m×n} such that

    A = U Σ V^T,   Σ = diag(σ_1, ..., σ_r, 0, ..., 0) ∈ R^{m×n},

where r is the rank of A and the σ_i > 0 are called the singular values of A.

It is straightforward that

    A^T A = V Σ^T Σ V^T = V diag(σ_1², ..., σ_r², 0, ..., 0) V^T,

i.e. the singular values satisfy σ_i = √(λ_i(A^T A)). Similarly, σ_i = √(λ_i(A A^T)).
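As a quick numerical check (a minimal numpy sketch, not part of the original lecture), both the factorization A = U Σ V^T and the relation σ_i² = λ_i(A^T A) can be verified directly:

```python
import numpy as np

# Verify A = U @ Sigma @ V^T and sigma_i = sqrt(lambda_i(A^T A))
# on a small random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(A)          # s holds the singular values, descending
# Eigenvalues of A^T A, sorted in decreasing order to match s
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

# Rebuild A from its factors: U (4x4), Sigma (4x3), V^T (3x3)
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)
A_rebuilt = U @ Sigma @ Vt
```

Here `np.linalg.svd` returns V^T (not V), which is why the reconstruction uses `Vt` directly.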
About singular values

Finding the orthogonal matrices U and V is equivalent to finding the eigenvectors of the matrices A^T A and A A^T.

If A is symmetric with eigendecomposition A = Q D Q^T, D = diag(λ_1, ..., λ_r, 0, ..., 0), then the singular values are σ_i = |λ_i|, and U, V can be read off from Q (flipping the sign of the columns corresponding to negative eigenvalues).

The 2-norm of a matrix:  ‖A‖_2 = √(λ_max(A^T A)) = σ_max.

The 2-condition number:  Cond_2(A) = ‖A‖_2 ‖A^{-1}‖_2 = σ_max / σ_min.
Generalized inverse of a matrix

In general, if A is square but singular, A^{-1} does not exist! And if A ∈ R^{m×n} with m ≠ n, there is no definition of A^{-1} at all. Using the SVD A = U Σ V^T, we define the Moore-Penrose generalized inverse of A as

    A^+ = V diag(σ_1^{-1}, ..., σ_r^{-1}, 0, ..., 0) U^T ∈ R^{n×m}

for an arbitrary matrix A!
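The formula above can be sketched directly in numpy (a minimal illustration; `np.linalg.pinv` computes the same object internally via the SVD):

```python
import numpy as np

# Form A+ = V diag(1/sigma_1, ..., 1/sigma_r, 0, ...) U^T from the SVD
# and compare against numpy's built-in pseudoinverse.
A = np.array([[1., 0., 0.],
              [0., 2., 0.]])          # rank 2, shape (2, 3)

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))            # numerical rank
Sigma_plus = np.zeros((3, 2))         # note the transposed shape (n x m)
Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])
A_plus = Vt.T @ Sigma_plus @ U.T
```

Only singular values above a small tolerance are inverted; this is exactly how the pseudoinverse stays well defined for rank-deficient A.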
Least square problems

Least square problem 1: Ax = b may have more than one solution. If it has more than one solution, we wish to pick the one with the smallest ‖x‖_2, i.e. to find x ∈ S = {x : Ax = b} such that

    ‖x‖_2 = min_{z ∈ S} ‖z‖_2.

Least square problem 2: if Ax = b has no solution, we wish to pick the x that solves the minimization problem

    min_x ‖Ax − b‖_2.

In either case the solution is given by the generalized inverse: x = A^+ b.
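A minimal numpy sketch of problem 2: for an overdetermined system, x = A^+ b coincides with the minimizer of ‖Ax − b‖_2 returned by numpy's least-squares solver.

```python
import numpy as np

# Overdetermined case: A is 4x2, so Ax = b generally has no exact solution.
A = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
b = np.array([1., 2., 2., 4.])

x = np.linalg.pinv(A) @ b                       # x = A+ b
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]    # minimizer of ||Ax - b||_2
```

The same `pinv` formula also handles problem 1 (underdetermined systems), where it picks the minimum-norm solution.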
Multivariate linear regression

Formulation: suppose we have a list of experimental data for a multivariate function Y = f(x_1, x_2, ..., x_m). Keeping only the zeroth- and first-order terms, we approximate Y as

    Y = β_0 + β_1 x_1 + ⋯ + β_m x_m.

The problem is how to recover the β_i from the data. Naively, consider the linear system

    Y_i = β_0 + β_1 x_{i1} + ⋯ + β_m x_{im},   i = 1, ..., n.

It may have no solution or infinitely many solutions. So it is reduced to the least square problem for Xβ = Y.
Multivariate linear regression

We have

    X = [ 1  x_11  x_12  ⋯  x_1m ]        β = [ β_0 ]
        [ 1  x_21  x_22  ⋯  x_2m ]            [ β_1 ]
        [ ⋮   ⋮     ⋮         ⋮  ]            [  ⋮  ]
        [ 1  x_n1  x_n2  ⋯  x_nm ],           [ β_m ].

Least square solution: β = X^+ Y.
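The regression recipe above can be sketched in a few lines of numpy (illustrative data only; noiseless so that the recovered coefficients are exact):

```python
import numpy as np

# Fit Y = beta0 + beta1*x1 + beta2*x2 by forming the design matrix X
# with a leading column of ones and taking beta = X+ Y.
rng = np.random.default_rng(1)
features = rng.standard_normal((50, 2))          # n = 50 samples, m = 2
beta_true = np.array([0.5, 2.0, -1.0])

X = np.column_stack([np.ones(50), features])     # rows [1, x_i1, x_i2]
Y = X @ beta_true                                # noiseless data for the check

beta = np.linalg.pinv(X) @ Y
```

With noisy data the same formula returns the least-squares estimate instead of the exact coefficients.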
Principal component analysis (PCA)

Objective: for a multi-component problem, is it possible to capture very few but very important characteristics in order to reduce the scale or dimension of the problem?

Answer: Yes! PCA can do this job!
Principal component analysis (PCA)

Suppose we have experimental data on n characteristics of t units for a biological species, collected from experiments or investigations into a matrix

    Y = [ y_11  y_12  ⋯  y_1n ]
        [ y_21  y_22  ⋯  y_2n ]
        [  ⋮     ⋮         ⋮  ]
        [ y_t1  y_t2  ⋯  y_tn ].

Objective: intuitively, PCA is to find vectors a_i = (a_1i, a_2i, ..., a_ni)^T (i = 1, ..., n) such that the combinations

    F_i = a_1i y_1 + a_2i y_2 + ⋯ + a_ni y_n,   i = 1, ..., n,

are perpendicular to each other, and then to pick out the large components among the ‖F_i‖_2. The analysis of the a_i will give the main components of the problem.
Principal component analysis (PCA)

A geometrical interpretation of PCA for 2D coordinates analysis: the first principal axis is the direction onto which the projections of the data points have maximal spread. A mathematically rigorous interpretation (projection maximization):

    max_{‖a‖_2 = 1} Σ_{i=1}^{t} (x_i^T a)² = max_{‖a‖_2 = 1} a^T X^T X a,

where x_i^T is the i-th row of X. The Courant-Fischer theorem then gives PCA: the maximizer is the eigenvector of X^T X belonging to the largest eigenvalue.
Principal component analysis (PCA)

Step 1: non-dimensionalization.

Calculate the mean:  ȳ_j = (1/t) Σ_{k=1}^{t} y_kj,   j = 1, 2, ..., n.

Calculate the variance:  d_j = Σ_{k=1}^{t} (y_kj − ȳ_j)²,   j = 1, 2, ..., n.

Transformation:  x_ij = (y_ij − ȳ_j) / √(d_j),   i = 1, 2, ..., t;  j = 1, 2, ..., n.

Non-dimensionalization is used to eliminate the effect of the choice of units.
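Step 1 can be sketched in numpy as follows (a minimal illustration; after the transformation each column has zero mean and unit sum of squares, regardless of the original units):

```python
import numpy as np

# Center each column and divide by the square root of its sum of squared
# deviations d_j, matching the non-dimensionalization step above.
rng = np.random.default_rng(2)
# Columns with wildly different scales, mimicking different units
Y = rng.standard_normal((10, 4)) * np.array([1., 10., 100., 1000.])

ybar = Y.mean(axis=0)                       # column means
d = np.sum((Y - ybar) ** 2, axis=0)         # column sums of squared deviations
X = (Y - ybar) / np.sqrt(d)                 # non-dimensionalized data
```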
Principal component analysis (PCA)

Step 2: define the principal component vectors as

    F_i = a_1i x_1 + a_2i x_2 + ⋯ + a_ni x_n,   i = 1, ..., n,

where x_i = (x_1i, x_2i, ..., x_ti)^T is the i-th column of X. In order for the vectors to be mutually orthogonal, we need F_i^T F_j = 0 for i ≠ j, i.e.

    F_i^T F_j = (a_1i, a_2i, ..., a_ni) X^T X (a_1j, a_2j, ..., a_nj)^T = 0.
Principal component analysis (PCA)

Step 3: there exists an orthogonal matrix A such that A^T X^T X A = diag(λ_1, ..., λ_n) with λ_k ≥ 0 (k = 1, ..., n). For i ≠ j, the vectors a_i, a_j in the i-th and j-th columns of A then satisfy the orthogonality condition, and

    ‖F_i‖² = λ_i.

Step 4: take the eigenvectors a_i corresponding to the m largest eigenvalues (λ_1 > λ_2 > ⋯ > λ_m > ⋯) and form the linear combinations

    F_i = a_1i x_1 + a_2i x_2 + ⋯ + a_ni x_n,   i = 1, 2, ..., m.

These are the first m principal component vectors.
PCA and SVD

If X has the SVD X = U Σ V^T, then

    V^T X^T X V = Σ^T Σ,

so we may take A = V. Finding the first m principal component vectors is therefore equivalent to finding the m largest singular values of X and the corresponding right singular vectors.
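This equivalence is easy to check numerically (a minimal numpy sketch): the principal components F_i = X a_i obtained from the right singular vectors are mutually orthogonal with ‖F_i‖² = σ_i².

```python
import numpy as np

# PCA via SVD: the right singular vectors of X are the eigenvectors of
# X^T X, and the component norms are the singular values.
rng = np.random.default_rng(3)
X = rng.standard_normal((20, 5))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
F = X @ Vt.T          # columns F[:, i] are the principal component vectors
```

Since X V = U Σ, the Gram matrix F^T F is exactly diag(σ_1², ..., σ_n²), confirming both the orthogonality and the norm identity at once.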
Tri-diagonalization of symmetric matrix

First transform the symmetric matrix A into a tridiagonal matrix

    T = [ α_1  β_1               ]
        [ β_1  α_2   ⋱           ]
        [      ⋱     ⋱   β_{n−1} ]
        [          β_{n−1}  α_n  ]

by a sequence of Householder transformations. The transformation procedure is the same as that for the upper Hessenberg form, with an extra symmetry argument.
Tri-diagonalization of symmetric matrix

The approach is to apply Householder transformations to A column by column. Write

    A = [ a_11  a_12  ⋯  a_1n ]
        [ a_21  a_22  ⋯  a_2n ]
        [  ⋮     ⋮         ⋮  ]
        [ a_n1  a_n2  ⋯  a_nn ].

Suitably choose a Householder matrix H̃_1 acting on the last n−1 entries such that

    H_1 (a_11, a_21, a_31, ..., a_n1)^T = (a_11, ã_21, 0, ..., 0)^T,   H_1 = [ 1   0  ]
                                                                             [ 0  H̃_1 ].
Tri-diagonalization of symmetric matrix

Now we have

    A_1 = H_1 A H_1 = [ a_11  ã_12  0  ⋯  0 ]
                      [ ã_21   *    *  ⋯  * ]
                      [  0     *    *  ⋯  * ]
                      [  ⋮     ⋮    ⋮     ⋮ ]
                      [  0     *    *  ⋯  * ]

by the symmetry of A and A_1: zeroing the first column below the subdiagonal also zeros the first row. The next steps are the same as for the upper Hessenberg form. Finally we obtain the tridiagonal matrix T, and T has the same eigenvalues as A.
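The column-by-column reduction can be sketched in numpy as follows (a minimal, dense implementation for clarity; production codes apply the reflections without forming H explicitly):

```python
import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form by Householder
    similarity transforms, preserving the eigenvalues."""
    A = A.copy().astype(float)
    n = A.shape[0]
    for k in range(n - 2):
        x = A[k + 1:, k]                        # entries to compress
        v = x.copy()
        # Sign choice avoids cancellation when forming the reflector
        v[0] += (np.sign(x[0]) if x[0] != 0 else 1.0) * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv == 0:
            continue                            # column already zeroed
        v /= nv
        H = np.eye(n)                           # H = diag(I_k+1, Htilde)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(v, v)
        A = H @ A @ H                           # symmetric similarity transform
    return A

A = np.array([[4., 1., 2., 2.],
              [1., 2., 0., 1.],
              [2., 0., 3., 2.],
              [2., 1., 2., 1.]])
T = tridiagonalize(A)
```

Because each step is a similarity transform by a symmetric orthogonal H, the result T is symmetric tridiagonal with the same spectrum as A.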
Implicit shifted QR for symmetric tridiagonal matrix

Now we have a symmetric tridiagonal T with diagonal entries α_i (i = 1, ..., n) and off-diagonal entries β_i (i = 1, ..., n−1). One shifted QR step is

    T − µI = QR,   T̂ = RQ + µI.

In fact T̂ = Q^T T Q. If we can find Q and T̂ directly, we do not need the intermediate factorization, since

    Q^T T Q = Q^T (QR + µI) Q = RQ + µI = T̂.
Implicit shifted QR for symmetric tridiagonal matrix

Find a Givens matrix G_1 = G(1, 2; θ_1) such that

    [ c  s ]^T [ α_1 − µ ]   [ * ]
    [ −s c ]   [  β_1    ] = [ 0 ].

Define T_1 = G_1^T T G_1. Then T_1 is tridiagonal except for a single nonzero "bulge" entry * in positions (3, 1) and (1, 3). We should zero out this bulge; that only needs multiplication by another Givens matrix G_2.
Implicit shifted QR for symmetric tridiagonal matrix

We can find a Givens matrix G_2 = G(2, 3; θ_2) that zeros out the bulge. Define

    T_2 = G_2^T G_1^T T G_1 G_2.

Then T_2 is again tridiagonal except for a bulge, now pushed down to positions (4, 2) and (2, 4). We should zero out this bulge again, which needs yet another Givens multiplication.
Implicit shifted QR for symmetric tridiagonal matrix

Sequentially the bulge is chased down the matrix: in T_{n−2} it sits in the bottom corner positions (n, n−2) and (n−2, n). After the last Givens rotation we finally obtain the tridiagonal matrix

    T̂ = G_{n−1}^T ⋯ G_1^T T G_1 ⋯ G_{n−1}.
Implicit shifted QR for symmetric tridiagonal matrix

Iterate on T̂ to obtain the next QR step! In general the shift is chosen as the famous Wilkinson shift: take the trailing 2×2 submatrix of T,

    S = [ α_{n−1}  β_{n−1} ]
        [ β_{n−1}   α_n    ],

and choose µ as the eigenvalue of S which is closer to α_n:

    µ = α_n + δ − sign(δ) √(δ² + β_{n−1}²),   δ = (α_{n−1} − α_n)/2.

The convergence will be very fast with this shift.
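A minimal numpy sketch of the shift formula, checked against a direct eigenvalue computation of the trailing 2×2 block (the function name is illustrative):

```python
import numpy as np

def wilkinson_shift(alpha, beta):
    """Wilkinson shift for a symmetric tridiagonal matrix with diagonal
    alpha and off-diagonal beta: the eigenvalue of the trailing 2x2
    block closest to alpha[-1]."""
    an, an1, bn1 = alpha[-1], alpha[-2], beta[-1]
    delta = (an1 - an) / 2.0
    sgn = np.sign(delta) if delta != 0 else 1.0
    return an + delta - sgn * np.hypot(delta, bn1)

alpha = np.array([4., 3., 1.])       # diagonal entries of T
beta = np.array([1., 0.5])           # off-diagonal entries of T
mu = wilkinson_shift(alpha, beta)

# Reference: eigenvalues of the trailing 2x2 block S
S = np.array([[alpha[-2], beta[-1]], [beta[-1], alpha[-1]]])
eigs = np.linalg.eigvalsh(S)
closest = eigs[np.argmin(np.abs(eigs - alpha[-1]))]
```

Using `np.hypot(delta, bn1)` instead of `sqrt(delta**2 + bn1**2)` avoids overflow for large entries.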
Implicit QR method for singular value computation

First transform A into an upper bidiagonal matrix B,

    B = [ d_1  f_2                ]
        [      d_2   ⋱            ]
        [            ⋱    f_n     ]
        [                 d_n     ],

by a sequence of Householder transformations:

    A → (U_1: eliminate the first column) → (V_1: eliminate the first row) → ⋯ → (U_n: eliminate the n-th column),

so that

    U_n ⋯ U_1 A V_1 ⋯ V_{n−1} = [ B ]
                                [ 0 ].

A has the same singular values as B.
Implicit shifted QR method for singular value computation

Basic idea: implicitly apply the shifted QR method to the symmetric tridiagonal matrix B^T B, but without ever forming it.

Steps: determine the shift µ; this is equivalent to the shift step for B^T B. Wilkinson shift: set µ to be the eigenvalue of the trailing 2×2 block of B^T B,

    [ d_{n−1}² + f_{n−1}²   d_{n−1} f_n   ]
    [ d_{n−1} f_n           d_n² + f_n²   ],

closer to d_n² + f_n², to make the convergence faster.

Find a Givens matrix G_1 = G(1, 2; θ) such that

    [ c  s ]^T [ d_1² − µ ]   [ * ]
    [ −s c ]   [ d_1 f_2  ] = [ 0 ],

and compute B G_1. This is equivalent to applying the G_1 step to B^T B.
Implicit shifted QR method for singular value computation

The product B G_1 is bidiagonal except for one bulge entry * in position (2, 1), which we should zero out. We want to find P_2 and G_2 such that P_2 (B G_1) G_2 is bidiagonal and G_2 e_1 = e_1. This is equivalent to applying the G_2 step to G_1^T B^T B G_1.
Implicit shifted QR method for singular value computation

It is not difficult to find such P_2 and G_2 by Givens transformations; P_2 B G_1 G_2 is then bidiagonal except for a bulge in position (3, 2), which we should zero out next: we want to find P_3 and G_3 such that P_3 P_2 B G_1 G_2 G_3 is bidiagonal and G_3 e_i = e_i, i = 1, 2. These steps are repeated, chasing the bulge, until the matrix becomes bidiagonal again. This is equivalent to the G_i steps for the symmetric tridiagonal matrix B^T B.
Implicit shifted QR method for singular value computation

Finally we obtain the bidiagonal matrix

    P_{n−1} ⋯ P_2 B G_1 ⋯ G_{n−1}.

Iterate until the off-diagonal entries converge to 0; the diagonal entries then converge to the singular values!
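The correctness of the whole scheme rests on the identity that the iteration exploits: the singular values of the bidiagonal B are the square roots of the eigenvalues of the tridiagonal B^T B. A minimal numpy check (using library routines rather than the implicit iteration itself):

```python
import numpy as np

# For an upper bidiagonal B, sigma_i(B)^2 = lambda_i(B^T B); the implicit
# QR iteration computes this without ever forming B^T B.
d = np.array([3., 2., 1.])            # diagonal entries d_i
f = np.array([0.5, 0.4])              # superdiagonal entries f_i
B = np.diag(d) + np.diag(f, 1)

sv = np.linalg.svd(B, compute_uv=False)               # descending
lam = np.sort(np.linalg.eigvalsh(B.T @ B))[::-1]      # descending
```

Avoiding the explicit product B^T B matters in practice: squaring the matrix squares its condition number, so small singular values would be computed far less accurately.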