Lecture 5 Singular value decomposition


Lecture 5 Singular value decomposition. Weinan E (1,2) and Tiejun Li (2). (1) Department of Mathematics, Princeton University, weinan@princeton.edu. (2) School of Mathematical Sciences, Peking University, tieli@pku.edu.cn. No.1 Science Building, 1575

Outline: Review and applications; QR for symmetric matrices; Numerical SVD

Singular value decomposition Theorem (Singular value decomposition) Let A ∈ R^{m×n}. Then there exist orthogonal matrices U ∈ R^{m×m} and V ∈ R^{n×n} (U^T U = I, V^T V = I) and Σ ∈ R^{m×n} such that A = UΣV^T, where Σ = diag(σ_1, ..., σ_r, 0, ..., 0) ∈ R^{m×n}, r is the rank of A, and the σ_i > 0 are called the singular values of A. It is straightforward that A^T A = V Σ^T Σ V^T = V diag(σ_1^2, ..., σ_r^2, 0, ..., 0) V^T, i.e. the singular values satisfy σ_i = √λ_i(A^T A). Similarly, σ_i = √λ_i(AA^T).
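
As a quick numerical check of the theorem, the sketch below (assuming NumPy is available; the matrix A and its size are arbitrary choices for illustration) computes the SVD and verifies that A = UΣV^T and that the singular values are the square roots of the eigenvalues of A^T A.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # arbitrary example matrix

# Full SVD: U (m x m), s (singular values), Vt (n x n), with A = U @ Sigma @ Vt
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))    # A = U Sigma V^T

# Singular values are the square roots of the eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)    # returned in ascending order
print(np.allclose(np.sqrt(eigvals[::-1]), s))
```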

About singular values Finding the orthogonal matrices U and V amounts to finding the eigenvectors of AA^T and A^T A, respectively. If A is symmetric with eigenvalues λ_1, ..., λ_r (and 0), then its singular values are σ_i = |λ_i|; in particular, if A is also positive semidefinite, then Σ = D = diag(λ_1, ..., λ_r, 0, ..., 0) and one may take V = U. The 2-norm of a matrix: ‖A‖_2 = √λ_max(A^T A) = σ_max. The 2-condition number: Cond_2(A) = ‖A‖_2 ‖A^{-1}‖_2 = σ_max/σ_min.
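
A minimal sketch (assuming NumPy; the test matrix is an arbitrary illustration) confirming that the 2-norm and the 2-condition number can be read off the singular values:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))          # arbitrary square (almost surely invertible) matrix

s = np.linalg.svd(A, compute_uv=False)   # singular values, in descending order
print(np.isclose(np.linalg.norm(A, 2), s[0]))          # ||A||_2 = sigma_max
print(np.isclose(np.linalg.cond(A, 2), s[0] / s[-1]))  # Cond_2(A) = sigma_max / sigma_min
```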

Generalized inverse of a matrix In general, if A is singular, A^{-1} does not exist, and if A ∈ R^{m×n} with m ≠ n, then A^{-1} is not even defined. For an arbitrary matrix A = UΣV^T we define the Moore–Penrose generalized inverse as A^+ = V diag(σ_1^{-1}, ..., σ_r^{-1}, 0, ..., 0) U^T, where diag(σ_1^{-1}, ..., σ_r^{-1}, 0, ..., 0) ∈ R^{n×m}.
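
The following sketch (assuming NumPy; the rank-deficient matrix is a made-up example) builds A^+ from the SVD as above and checks it against np.linalg.pinv:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])             # rank-deficient example (row 2 = 2 * row 1)

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_inv = np.array([1.0 / x if x > tol else 0.0 for x in s])   # invert only the nonzero singular values

A_plus = Vt.T @ np.diag(s_inv) @ U.T     # A^+ = V diag(1/sigma_1, ..., 1/sigma_r, 0, ...) U^T
print(np.allclose(A_plus, np.linalg.pinv(A)))
```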

Least square problems Least squares problem 1: Ax = b may have more than one solution. If so, we wish to pick the one with the smallest ‖x‖_2, i.e. to find x ∈ S = {x : Ax = b} such that ‖x‖_2 = min_{y ∈ S} ‖y‖_2. Least squares problem 2: if Ax = b has no solution, we wish to pick the solution of the minimization problem min_x ‖Ax − b‖_2. In either case the solution is given by the generalized inverse: x = A^+ b.
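
A short sketch (assuming NumPy; the overdetermined system is an arbitrary example) showing that x = A^+ b reproduces the least-squares solution returned by np.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))          # overdetermined system: more equations than unknowns
b = rng.standard_normal(6)

x_pinv = np.linalg.pinv(A) @ b           # x = A^+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_pinv, x_lstsq))
```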

Multivariate linear regression Formulation Suppose we have a list of experimental data for a multivariate function Y = f(x_1, x_2, ..., x_m). Keeping only the zeroth- and first-order terms, we approximate Y as Y = β_0 + β_1 x_1 + ··· + β_m x_m. The problem is how to recover the β_i from the data. Naively, consider the linear system Y_i = β_0 + β_1 x_{i1} + ··· + β_m x_{im}, i = 1, ..., n. It may have no solution or infinitely many solutions, so it is reduced to the least squares problem for Xβ = Y.

Multivariate linear regression We have

X = [ 1  x_11  x_12  ···  x_1m ]
    [ 1  x_21  x_22  ···  x_2m ]
    [ ⋮   ⋮     ⋮          ⋮   ]
    [ 1  x_n1  x_n2  ···  x_nm ],    β = (β_0, β_1, ..., β_m)^T.

Least squares solution: β = X^+ Y.
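
A minimal sketch of this recipe (assuming NumPy; the synthetic data and coefficients are made up for illustration): build the design matrix X with a leading column of ones and solve β = X^+ Y.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 50, 2
x = rng.standard_normal((n, m))                      # observed inputs x_{i1}, ..., x_{im}
beta_true = np.array([1.0, 2.0, -0.5])               # made-up coefficients (beta_0, beta_1, beta_2)
Y = beta_true[0] + x @ beta_true[1:] + 0.01 * rng.standard_normal(n)   # noisy observations

X = np.column_stack([np.ones(n), x])                 # design matrix with intercept column
beta = np.linalg.pinv(X) @ Y                         # least squares solution beta = X^+ Y
print(beta)                                          # should be close to beta_true
```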

Principal component analysis (PCA) Objective: for a problem with many components, is it possible to capture a few, but the most important, features in order to reduce the scale or dimension of the problem? Answer: yes, this is exactly what PCA does.

Principal component analysis (PCA) PCA Suppose we have experimental data on n characteristics measured on t units of a biological species, arranged into a matrix

Y = [ y_11  y_12  ···  y_1n ]
    [ y_21  y_22  ···  y_2n ]
    [  ⋮     ⋮           ⋮  ]
    [ y_t1  y_t2  ···  y_tn ].

Objective: intuitively, PCA seeks vectors a_i = (a_{1i}, a_{2i}, ..., a_{ni})^T, i = 1, ..., n, such that the combinations F_i = a_{1i} y_1 + a_{2i} y_2 + ··· + a_{ni} y_n (where y_j denotes the j-th column of Y) are mutually orthogonal, and then keeps only the components with the largest ‖F_i‖_2. The analysis of the a_i gives the main components of the problem.

Principal component analysis (PCA) A geometrical interpretation of PCA: finding the best-aligned coordinate axes for a 2D scatter of data points. A mathematically rigorous interpretation (projection maximization): max_{‖a‖_2 = 1} Σ_{i=1}^N (x_i^T a)^2 = max_{‖a‖_2 = 1} a^T X^T X a, where x_i^T is the i-th row of X. By the Courant–Fischer theorem the maximizer is the eigenvector of X^T X corresponding to its largest eigenvalue, which gives PCA.

Principal component analysis (PCA) Step 1: non-dimensionalization. Calculate the mean ȳ_j = (1/t) Σ_{k=1}^t y_kj, j = 1, 2, ..., n. Calculate the variance d_j = Σ_{k=1}^t (y_kj − ȳ_j)^2, j = 1, 2, ..., n. Transformation: x_ij = (y_ij − ȳ_j)/√d_j, i = 1, 2, ..., t; j = 1, 2, ..., n. Non-dimensionalization is used to eliminate the effect of the choice of units.
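
A small sketch of Step 1 (assuming NumPy; the raw data matrix is an arbitrary example) that standardizes each column of Y as described:

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.standard_normal((100, 5)) * np.array([1., 10., 100., 0.1, 5.]) + 3.0   # columns with very different scales

ybar = Y.mean(axis=0)                          # column means
d = ((Y - ybar) ** 2).sum(axis=0)              # sum of squared deviations per column
X = (Y - ybar) / np.sqrt(d)                    # non-dimensionalized data

print(np.allclose(X.sum(axis=0), 0.0))         # each column has zero mean
print(np.allclose((X ** 2).sum(axis=0), 1.0))  # and unit sum of squares
```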

Principal component analysis (PCA) Step 2: Define the principal component vectors as F_i = a_{1i} x_1 + a_{2i} x_2 + ··· + a_{ni} x_n, i = 1, ..., n, where x_i = (x_{1i}, x_{2i}, ..., x_{ti})^T is the i-th column of X. For the component vectors to be mutually orthogonal we require F_i^T F_j = 0 for i ≠ j, i.e. F_i^T F_j = (a_{1i}, a_{2i}, ..., a_{ni}) X^T X (a_{1j}, a_{2j}, ..., a_{nj})^T = a_i^T X^T X a_j = 0.

Principal component analysis (PCA) Step 3: There exists an orthogonal matrix A such that A^T X^T X A = diag(λ_1, ..., λ_n) with λ_k ≥ 0, k = 1, ..., n. Then for i ≠ j the vectors a_i, a_j in the i-th and j-th columns of A satisfy the orthogonality condition above, and ‖F_i‖_2^2 = a_i^T X^T X a_i = λ_i. Step 4: Take the eigenvectors a_i corresponding to the m largest eigenvalues (λ_1 ≥ λ_2 ≥ ··· ≥ λ_m ≥ ···) and form the linear combinations F_i = a_{1i} x_1 + a_{2i} x_2 + ··· + a_{ni} x_n, i = 1, 2, ..., m. These are the first m principal component vectors; a sketch of these steps is given below.
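
The following sketch (assuming NumPy and a standardized data matrix X as produced in Step 1; the choice m = 2 is arbitrary) carries out Steps 2–4 by an eigendecomposition of X^T X:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 5))                # stand-in for the standardized data from Step 1
m = 2                                            # number of principal components to keep

lam, A = np.linalg.eigh(X.T @ X)                 # A^T X^T X A = diag(lambda), ascending order
lam, A = lam[::-1], A[:, ::-1]                   # reorder so lambda_1 >= lambda_2 >= ...

F = X @ A[:, :m]                                 # first m principal component vectors F_i = X a_i
print(np.allclose(F.T @ F, np.diag(lam[:m])))    # the F_i are orthogonal with ||F_i||^2 = lambda_i
```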

PCA and SVD If X has the SVD X = UΣV^T, then we can take A = V, since V^T X^T X V = Σ^T Σ is diagonal. Hence finding the first m principal component vectors is equivalent to finding the m largest singular values of X and the corresponding right singular vectors.
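
A short check of this equivalence (assuming NumPy; same synthetic data as in the previous sketch), comparing the eigenvector route with the SVD route. Eigenvectors are only determined up to sign, so the comparison is made up to sign:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 5))

lam, A = np.linalg.eigh(X.T @ X)                 # eigen-route: columns of A
lam, A = lam[::-1], A[:, ::-1]

U, s, Vt = np.linalg.svd(X, full_matrices=False) # SVD route: right singular vectors are rows of Vt
V = Vt.T

print(np.allclose(s**2, lam))                    # sigma_i^2 = lambda_i
# each right singular vector matches the corresponding eigenvector up to sign
print(all(np.allclose(V[:, i], A[:, i]) or np.allclose(V[:, i], -A[:, i]) for i in range(5)))
```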

Outline: Review and applications; QR for symmetric matrices; Numerical SVD

Tri-diagonalization of symmetric matrix First transform the symmetric matrix A into a tridiagonal matrix

T = [ α_1  β_1                 ]
    [ β_1  α_2   ⋱             ]
    [      ⋱     ⋱     β_{n−1} ]
    [           β_{n−1}   α_n  ]

by a sequence of Householder transformations. The procedure is the same as the reduction to upper Hessenberg form, using the symmetry of A.

Tri-diagonalization of symmetric matrix The approach is to apply Householder transformations to A column by column. Write

A = [ a_11  a_12  ···  a_1n ]
    [ a_21  a_22  ···  a_2n ]
    [  ⋮     ⋮          ⋮   ]
    [ a_n1  a_n2  ···  a_nn ].

Choose a Householder matrix H̃_1 ∈ R^{(n−1)×(n−1)} acting on (a_21, a_31, ..., a_n1)^T and set H_1 = [ 1  0 ; 0  H̃_1 ], so that

H_1 (a_11, a_21, a_31, ..., a_n1)^T = (a_11, *, 0, ..., 0)^T,   with * = ±‖(a_21, ..., a_n1)‖_2.

Tri-diagonalization of symmetric matrix Now we have

A_1 = H_1 A H_1 = [ a_11  *    0  ···  0 ]
                  [ *     *    *  ···  * ]
                  [ 0     *    *  ···  * ]
                  [ ⋮     ⋮    ⋮       ⋮ ]
                  [ 0     *    *  ···  * ],

where, by the symmetry of A and A_1, both the first column and the first row are zeroed out beyond their second entries. The next steps are the same as for the reduction to upper Hessenberg form. Finally we obtain a tridiagonal matrix T, and T has the same eigenvalues as A.
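
As a numerical illustration (assuming NumPy and SciPy; the symmetric test matrix is arbitrary), scipy.linalg.hessenberg applies such Householder transformations; for a symmetric input the result is numerically tridiagonal and the eigenvalues are preserved:

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(6)
M = rng.standard_normal((6, 6))
A = M + M.T                                        # symmetric test matrix

T = hessenberg(A)                                  # T = Q^T A Q, upper Hessenberg
band = np.abs(np.arange(6)[:, None] - np.arange(6)) <= 1
print(np.max(np.abs(T[~band])) < 1e-12)            # entries outside the tridiagonal band are ~0
print(np.allclose(np.linalg.eigvalsh(T), np.linalg.eigvalsh(A)))   # same eigenvalues as A
```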

Implicit shifted QR for symmetric tridiagonal matrix Now we have a symmetric tridiagonal T with diagonal entries α_i (i = 1, ..., n) and off-diagonal entries β_i (i = 1, ..., n−1). One shifted QR step is T − µI = QR, T̂ = RQ + µI. In fact T̂ = Q^T T Q, since Q^T T Q = Q^T (QR + µI) Q = RQ + µI = T̂. So if we can find Q and T̂ directly, we do not need the intermediate steps.
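
For intuition, the sketch below (assuming NumPy; the tridiagonal matrix and the shift are arbitrary illustrations) performs one explicit shifted QR step and checks that T̂ = Q^T T Q, so T̂ is again symmetric tridiagonal (up to rounding) with the same eigenvalues. The implicit variant described next produces the same T̂ without forming Q and R explicitly.

```python
import numpy as np

alpha = np.array([4.0, 3.0, 2.0, 1.0])             # diagonal entries
beta = np.array([1.0, 0.5, 0.2])                   # off-diagonal entries
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

mu = alpha[-1]                                     # a simple shift for illustration
Q, R = np.linalg.qr(T - mu * np.eye(4))            # T - mu*I = Q R
T_hat = R @ Q + mu * np.eye(4)                     # T_hat = R Q + mu*I

print(np.allclose(T_hat, Q.T @ T @ Q))             # T_hat = Q^T T Q
print(np.allclose(np.linalg.eigvalsh(T_hat), np.linalg.eigvalsh(T)))   # eigenvalues preserved
```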

Implicit shifted QR for symmetric tridiagonal matrix Find a Givens matrix G_1 = G(1, 2; θ_1) such that

[ c  s ]^T [ α_1 − µ ]   [ * ]
[ −s c ]   [  β_1    ] = [ 0 ].

Define T_1 = G_1^T T G_1. Then T_1 is tridiagonal except for one extra nonzero entry * (a "bulge") in positions (3, 1) and (1, 3). We should zero out this term *, which only needs another Givens matrix G_2.

Implicit shifted QR for symmetric tridiagonal matrix We can find a Givens matrix G_2 = G(2, 3; θ_2) that zeroes out the bulge *. Define T_2 = G_2^T G_1^T T G_1 G_2. Then T_2 is again tridiagonal except for a new bulge *, now in positions (4, 2) and (2, 4). We should zero out this term * again, which needs yet another Givens matrix.

Implicit shifted QR for symmetric tridiagonal matrix Proceeding sequentially, the bulge is chased down the matrix: T_{n−2} = G_{n−2}^T ··· G_1^T T G_1 ··· G_{n−2} has its bulge in the bottom corner positions, and one final rotation G_{n−1} removes it. Finally we obtain the symmetric tridiagonal matrix T̂ = (G_1 ··· G_{n−1})^T T (G_1 ··· G_{n−1}), which by the implicit Q theorem coincides (up to signs) with the T̂ of the shifted QR step.

Implicit shifted QR for symmetric tridiagonal matrix Iterate on T̂ to obtain the next QR step. In general the shift is chosen as the famous Wilkinson shift: consider the trailing 2 × 2 submatrix of T,

S = [ α_{n−1}  β_{n−1} ]
    [ β_{n−1}  α_n     ],

and choose µ to be the eigenvalue of S that is closer to α_n, namely µ = α_n + δ − sign(δ) √(δ^2 + β_{n−1}^2) with δ = (α_{n−1} − α_n)/2. The convergence is very fast with this shift.
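
A tiny check of this formula (assuming NumPy; the numbers are arbitrary): the Wilkinson shift equals the eigenvalue of the trailing 2 × 2 block closest to α_n.

```python
import numpy as np

alpha_nm1, alpha_n, beta_nm1 = 3.0, 1.0, 0.4       # arbitrary trailing entries of T

delta = (alpha_nm1 - alpha_n) / 2.0
mu = alpha_n + delta - np.sign(delta) * np.hypot(delta, beta_nm1)   # Wilkinson shift

S = np.array([[alpha_nm1, beta_nm1],
              [beta_nm1, alpha_n]])
eigs = np.linalg.eigvalsh(S)
closest = eigs[np.argmin(np.abs(eigs - alpha_n))]  # eigenvalue of S closest to alpha_n
print(np.isclose(mu, closest))
```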

Outline: Review and applications; QR for symmetric matrices; Numerical SVD

Implicit QR method for singular value computation First transform A into an upper bidiagonal matrix

B = [ d_1  f_2             ]
    [      d_2   ⋱         ]
    [            ⋱    f_n  ]
    [                 d_n  ]

by a sequence of Householder transformations: U_1 (from the left) eliminates the first column below the diagonal, V_1 (from the right) eliminates the first row beyond the superdiagonal, and so on, until U_n eliminates the n-th column. Now we have

U_n ··· U_1 A V_1 ··· V_{n−1} = [ B ]
                                [ 0 ],

and A has the same singular values as B.

Implicit shifted QR method for singular value computation Basic idea: implicitly apply the shifted QR method to the symmetric tridiagonal matrix B^T B, but without forming it. Steps: Determine the shift µ; this is equivalent to the shift step for B^T B. Wilkinson shift: set µ to be the eigenvalue of the trailing 2 × 2 block of B^T B,

[ d_{n−1}^2 + f_{n−1}^2   d_{n−1} f_n   ]
[ d_{n−1} f_n             d_n^2 + f_n^2 ],

that is closer to d_n^2 + f_n^2, to make the convergence faster. Find a Givens matrix G_1 = G(1, 2; θ) such that

[ c  s ]^T [ d_1^2 − µ ]   [ * ]
[ −s c ]   [ d_1 f_2   ] = [ 0 ],

and compute BG_1. This is equivalent to applying the G_1 step for B^T B.

Implicit shifted QR method for singular value computation Now BG_1 is upper bidiagonal except for one extra nonzero entry * (a bulge) in position (2, 1), so we should zero out this term. We want to find P_2 and G_2 such that P_2 (B G_1) G_2 is bidiagonal and G_2 e_1 = e_1. This is equivalent to applying the G_2 step for G_1^T B^T B G_1.

Implicit shifted QR method for singular value computation It is not difficult to find P_2 and G_2 by Givens transformations, and P_2 B G_1 G_2 is upper bidiagonal except for a new bulge *, now in position (3, 2), so we should zero out this term. We want to find P_3 and G_3 such that P_3 P_2 B G_1 G_2 G_3 is bidiagonal and G_3 e_i = e_i, i = 1, 2. These steps are repeated until the bulge is chased out and the matrix is bidiagonal again. This is equivalent to the sequence of G_i steps for the symmetric tridiagonal matrix B^T B.

Implicit shifted QR method for singular value computation Finally, P_{n−1} ··· P_2 B G_1 ··· G_{n−1} is again upper bidiagonal; this completes one implicit QR step. Iterate until the off-diagonal entries converge to 0; the diagonal entries then converge to the singular values.