Numerical Methods I: Singular Value Decomposition

Numerical Methods I: Singular Value Decomposition
Aleksandar Donev, Courant Institute, NYU
donev@courant.nyu.edu
MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014
October 9th, 2014

Outline

1. Review of Linear Algebra: SVD
2. Computing the SVD
3. Principal Component Analysis (PCA)
4. Conclusions

Review of Linear Algebra: SVD
Formal definition of the SVD

Every matrix has a singular value decomposition
$$A = U \Sigma V^{\star} = \sum_{i=1}^{p} \sigma_i u_i v_i^{\star}, \qquad [m \times n] = [m \times m]\,[m \times n]\,[n \times n],$$
where $U$ and $V$ are unitary matrices whose columns are the left singular vectors $u_i$ and the right singular vectors $v_i$, and $\Sigma = \mathrm{Diag}\{\sigma_1, \sigma_2, \dots, \sigma_p\}$ is a diagonal matrix with real nonnegative diagonal entries, the singular values of the matrix,
$$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0,$$
and $p = \min(m, n)$ is the maximum possible rank of the matrix.
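
As a quick numerical check of the definition, here is a minimal MATLAB sketch on a small random matrix (the matrix itself is an assumed example, not from the lecture):

>> A = randn(5, 3);                      % example matrix with m = 5, n = 3
>> [U, S, V] = svd(A);                   % full SVD: U is 5x5, S is 5x3, V is 3x3
>> norm(A - U*S*V', 'fro')               % ~1e-15, so A = U*S*V' to roundoff
>> p = min(size(A));
>> B = zeros(size(A));
>> for i = 1:p, B = B + S(i,i) * U(:,i) * V(:,i)'; end
>> norm(A - B, 'fro')                    % the sum of rank-1 terms recovers the same A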

Review of Linear Algebra: SVD
Comparison to the eigenvalue decomposition

Recall the eigenvector decomposition for diagonalizable matrices, $AX = X\Lambda$.
The singular value decomposition can be written similarly to the eigenvector one,
$$AV = U\Sigma, \qquad A^{\star}U = V\Sigma,$$
and they both diagonalize $A$, but there are some important differences:
1. The SVD exists for any matrix, not just diagonalizable ones.
2. The SVD uses different vectors on the left and the right (different bases for the domain and image of the linear mapping represented by $A$).
3. The SVD always uses orthonormal bases (unitary matrices), not just for unitarily diagonalizable matrices.

Review of Linear Algebra: SVD
Relation to Hermitian matrices

For Hermitian (symmetric) matrices, $X = \pm U = \pm V$ and $\Sigma = \Lambda$, so there is no fundamental difference between the SVD and the eigenvalue decomposition.
The squared singular values are eigenvalues of the normal matrix:
$$\sigma_i(A) = \sqrt{\lambda_i(A A^{\star})} = \sqrt{\lambda_i(A^{\star} A)},$$
since
$$A^{\star} A = (V \Sigma U^{\star})(U \Sigma V^{\star}) = V \Sigma^2 V^{\star}$$
is a similarity transformation. Similarly, the singular vectors are the corresponding eigenvectors up to a sign.
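
A small numerical illustration of this relation (the test matrix is an assumption for illustration):

>> A = randn(4, 3);
>> sv = svd(A);                          % singular values of A
>> ev = sort(eig(A'*A), 'descend');      % eigenvalues of the normal matrix A'*A
>> max(abs(sv.^2 - ev))                  % ~1e-14: sigma_i^2 = lambda_i(A'*A)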

Review of Linear Algebra: SVD
Rank-revealing properties

Assume the rank of the matrix is $r$, that is, the dimension of the range of $A$ is $r$ and the dimension of the null space of $A$ is $n - r$ (recall the fundamental theorem of linear algebra).
The SVD is a rank-revealing matrix factorization because only $r$ of the singular values are nonzero, $\sigma_{r+1} = \dots = \sigma_p = 0$.
The left singular vectors $\{u_1, \dots, u_r\}$ form an orthonormal basis for the range (column space, or image) of $A$.
The right singular vectors $\{v_{r+1}, \dots, v_n\}$ form an orthonormal basis for the null space (kernel) of $A$.
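
For example, the range and null-space bases can be read off the singular vectors; a sketch, using an assumed rank-3 test matrix:

>> A = randn(6, 3) * randn(3, 5);        % 6x5 matrix of rank (at most) 3
>> [U, S, V] = svd(A);
>> r = sum(diag(S) > 1e-12);             % numerical rank, here r = 3
>> range_basis = U(:, 1:r);              % orthonormal basis for the range of A
>> null_basis  = V(:, r+1:end);          % orthonormal basis for the null space of A
>> norm(A * null_basis)                  % ~1e-14: A maps its null space to zero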

Review of Linear Algebra: SVD
The matrix pseudo-inverse

For square non-singular systems, $x = A^{-1} b$.
Can we generalize the matrix inverse to non-square or rank-deficient matrices?
Yes: the matrix pseudo-inverse (Moore-Penrose inverse)
$$A^{\dagger} = V \Sigma^{\dagger} U^{\star}, \qquad \text{where } \Sigma^{\dagger} = \mathrm{Diag}\{\sigma_1^{-1}, \sigma_2^{-1}, \dots, \sigma_r^{-1}, 0, \dots, 0\}.$$
In numerical computations, very small singular values should be considered to be zero (see homework).
Theorem: The least-squares solution to over- or under-determined linear systems $Ax = b$ is $x = A^{\dagger} b$.
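
A sketch of forming the pseudo-inverse from the SVD, treating small singular values as zero (the example system and the cutoff are assumptions, in the spirit of the homework):

>> A = randn(8, 4);  b = randn(8, 1);    % an assumed over-determined system
>> [U, S, V] = svd(A);
>> s = diag(S);
>> tol = max(size(A)) * eps(s(1));       % assumed cutoff for "numerically zero"
>> r = sum(s > tol);
>> Apinv = V(:, 1:r) * diag(1 ./ s(1:r)) * U(:, 1:r)';   % A^+ = V * Sigma^+ * U'
>> norm(Apinv*b - A\b)                   % matches the least-squares solution
>> norm(Apinv - pinv(A))                 % and matches MATLAB's pinv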

Review of Linear Algebra: SVD
Proof of Least-Squares (1)-(3)

[Derivation presented on the slides; not captured in this transcription.]

Computing the SVD
Sensitivity (conditioning) of the SVD

Since unitary transformations preserve the 2-norm,
$$\|\delta \Sigma\|_2 \le \|\delta A\|_2.$$
The SVD computation is always perfectly well-conditioned!
However, this refers to absolute errors: the relative error of small singular values will be large.
The power of the SVD lies in the fact that it always exists and can be computed stably... but it is expensive to compute.
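
The absolute-versus-relative distinction can be seen in a small perturbation experiment (the matrix and the perturbation size are assumed for illustration):

>> A = diag([1, 1e-8]);                  % one O(1) and one tiny singular value
>> E = 1e-10 * randn(2);                 % perturbation with ||E||_2 of order 1e-10
>> s = svd(A);  sp = svd(A + E);
>> abs(sp - s)                           % absolute changes are bounded by ||E||_2
>> abs(sp - s) ./ s                      % relative change of the tiny sigma is large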

Computing the SVD

The SVD can be computed by performing an eigenvalue computation for the normal matrix $A^{\star} A$ (a positive semi-definite matrix).
This squares the condition number for small singular values and is not numerically stable.
Instead, one can compute the eigenvalue decomposition of the symmetric indefinite $(m + n) \times (m + n)$ block matrix
$$H = \begin{bmatrix} 0 & A^{\star} \\ A & 0 \end{bmatrix}.$$
The cost of the calculation is $O(m n^2)$, of the same order as an eigenvalue calculation, but in practice the SVD is more expensive, at least for well-conditioned cases.
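
As an illustration of the block-matrix idea (a sketch only, not the production algorithm), the nonzero eigenvalues of $H$ come in plus/minus pairs of the singular values:

>> A = randn(5, 3);                      % m = 5, n = 3
>> H = [zeros(3), A'; A, zeros(5)];      % symmetric (m+n) x (m+n) block matrix
>> ev = sort(eig(H), 'descend');
>> [ev(1:3), svd(A)]                     % the positive eigenvalues equal the sigma_i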

Computing the SVD
Reduced SVD

The full (standard) SVD
$$A = U \Sigma V^{\star} = \sum_{i=1}^{p} \sigma_i u_i v_i^{\star}, \qquad [m \times n] = [m \times m]\,[m \times n]\,[n \times n],$$
is in practice often computed in reduced (economy) SVD form, where $\Sigma$ is $[p \times p]$:
$$[m \times n] = [m \times n]\,[n \times n]\,[n \times n] \quad \text{for } m > n,$$
$$[m \times n] = [m \times m]\,[m \times m]\,[m \times n] \quad \text{for } n > m.$$
This contains the same information as the full SVD but can be cheaper to compute if $m \gg n$ or $m \ll n$.
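
A minimal check of the two forms in MATLAB (the sizes are assumed, with m much larger than n):

>> A = randn(1000, 50);                  % tall matrix, m >> n
>> [U, S, V]    = svd(A);                % full:    U is 1000x1000, S is 1000x50
>> [Ue, Se, Ve] = svd(A, 'econ');        % economy: Ue is 1000x50,  Se is 50x50
>> norm(A - Ue*Se*Ve', 'fro')            % same reconstruction, less storage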

Computing the SVD
In MATLAB

[U, S, V] = svd(A) computes the full SVD, using a QR-like method.
[U, S, V] = svd(A, 'econ') computes the economy SVD.
For rank-deficient or under-determined systems the backslash operator (mldivide) gives a basic solution. Basic means that x has at most r non-zeros (not unique).
The least-squares solution can be computed using svd or pinv (pseudo-inverse, see homework).
A rank-q approximation can be computed efficiently for sparse matrices using [U, S, V] = svds(A, q).
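
For instance, on a rank-deficient over-determined system the basic and least-squares solutions give the same residual but different norms (the tiny system is an assumed example; backslash will warn about rank deficiency here):

>> A = [1 1; 2 2; 3 3];  b = [1; 2; 4];  % rank-1, over-determined system
>> x_basic = A \ b;                      % basic solution: at most r nonzeros
>> x_ls    = pinv(A) * b;                % minimum-norm least-squares solution
>> [norm(A*x_basic - b), norm(A*x_ls - b)]   % identical residuals
>> [norm(x_basic), norm(x_ls)]           % pinv returns the smaller-norm solution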

Principal Component Analysis (PCA)
Low-rank approximations

The SVD is a decomposition into rank-1 outer-product matrices:
$$A = U \Sigma V^{\star} = \sum_{i=1}^{r} \sigma_i u_i v_i^{\star} = \sum_{i=1}^{r} A_i.$$
The rank-1 components $A_i$ are called principal components, the most important ones corresponding to the larger $\sigma_i$.
Ignoring all singular values/vectors except the first $q$, we get a low-rank approximation:
$$A \approx \hat{A}_q = U_q \Sigma_q V_q^{\star} = \sum_{i=1}^{q} \sigma_i u_i v_i^{\star}.$$
Theorem: This is the best rank-$q$ approximation in both the Euclidean (2-norm) and Frobenius norms, and
$$\|A - \hat{A}_q\|_2 = \sigma_{q+1}.$$
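
The error formula can be verified directly (a sketch; the test matrix and the choice q = 5 are assumptions):

>> A = randn(30, 20);
>> [U, S, V] = svd(A);
>> q = 5;
>> Aq = U(:, 1:q) * S(1:q, 1:q) * V(:, 1:q)';   % best rank-q approximation
>> [norm(A - Aq, 2), S(q+1, q+1)]               % 2-norm error equals sigma_{q+1}
>> [norm(A - Aq, 'fro'), norm(diag(S(q+1:end, q+1:end)))]  % Frobenius error too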

Principal Component Analysis (PCA)
Applications of SVD/PCA

Statistical analysis (e.g., DNA microarray analysis, clustering).
Data compression (e.g., image compression, explained next).
Feature extraction, e.g., face or character recognition (see Eigenfaces on Wikipedia).
Latent semantic indexing for context-sensitive searching (see Wikipedia).
Noise reduction (e.g., weather prediction).
One example concerning language analysis is given in the homework.

Principal Component Analysis (PCA)
Image compression

>> A = rgb2gray(imread('basket.jpg'));
>> imshow(A);
>> [U, S, V] = svd(double(A));
>> r = 25;                               % rank-r approximation
>> Acomp = U(:, 1:r) * S(1:r, 1:r) * (V(:, 1:r))';
>> imshow(uint8(Acomp));

Principal Component Analysis (PCA)
Compressing an image of a basket

We used only 25 out of the 400 singular values to construct a rank-25 approximation.
[Original and compressed images shown on the slide.]

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a term used for low-rank approximations in the statistical analysis of data.
Consider having $m$ empirical data points or observations (e.g., daily reports) of $n$ variables (e.g., stock prices), and put them in a data matrix $A$ of size $[m \times n]$.
Assume that each of the variables has zero mean, that is, the empirical mean has been subtracted out. It is also useful to choose the units of each variable (normalization) so that the variance is unity.
We would like to find an orthogonal transformation of the original variables that accounts for as much of the variability of the data as possible.
Specifically, the first principal component is the direction along which the variance of the data is largest.
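
The preprocessing described above amounts to centering (and optionally scaling) the columns of the data matrix; a sketch, with random numbers standing in for real observations (the column-wise subtraction relies on MATLAB's implicit expansion):

>> A = randn(200, 5) * randn(5);         % 200 observations of 5 correlated variables
>> A = A - mean(A);                      % subtract the empirical mean of each column
>> A = A ./ std(A);                      % optional: normalize each variable to unit variance
>> mean(A)                               % ~0 in every column now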

Principal Component Analysis (PCA)
PCA and Variance

[Illustrative figure on the slide; not captured in this transcription.]

Principal Component Analysis (PCA)
PCA and SVD

The covariance matrix of the data tells how correlated different pairs of variables are:
$$C = A^T A, \qquad [n \times n].$$
The eigenvector corresponding to the largest eigenvalue of $C$ gives the direction (line) that minimizes the sum of squares of the distances from the points to the line, or, equivalently, maximizes the variance of the data projected onto that line.
The SVD of the data matrix is $A = U \Sigma V^{\star}$.
The eigenvectors of $C$ are in fact the columns of $V$, and the eigenvalues of $C$ are the squares of the singular values:
$$C = A^T A = V \Sigma (U^{\star} U) \Sigma V^{\star} = V \Sigma^2 V^{\star}.$$
Note: the singular values are necessarily real since $C$ is positive semi-definite.
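
The equivalence can be checked numerically (a sketch; the centered data matrix is the assumed example from the previous step):

>> A = randn(200, 5) * randn(5);  A = A - mean(A);   % centered data matrix
>> C = A' * A;                           % (unnormalized) covariance matrix
>> [W, L] = eig(C);  L = diag(L);
>> [L, idx] = sort(L, 'descend');  W = W(:, idx);
>> [U, S, V] = svd(A, 'econ');
>> max(abs(L - diag(S).^2))              % eigenvalues of C are the squared sigma_i
>> max(max(abs(abs(W) - abs(V))))        % columns of W match columns of V up to sign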

Principal Component Analysis (PCA)
Clustering analysis

Given a new data point $x$, we can project it onto the basis formed by the principal component directions:
$$V y = x \quad \Rightarrow \quad y = V^{-1} x = V^{\star} x,$$
which simply amounts to taking the dot products $y_i = v_i^{\star} x$.
The first few $y_i$'s often provide a good reduced-dimensionality representation that captures most of the variance in the data.
This is very useful for data clustering and analysis, as a tool to understand empirical data.
The PCA/SVD is a linear transformation and it cannot capture nonlinearities.
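
A sketch of this projection for dimensionality reduction (the data, the new point, and k = 2 are assumptions; with row-vector observations the projection is x*V rather than V'*x):

>> A = randn(200, 10) * randn(10);  A = A - mean(A); % centered data (assumed example)
>> [U, S, V] = svd(A, 'econ');
>> k = 2;                                % keep the first k principal directions
>> Y = A * V(:, 1:k);                    % reduced representation of all the data
>> x = randn(1, 10);                     % a new observation (as a row vector)
>> y = x * V(:, 1:k)                     % its coordinates: y_i = x * v_i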

Conclusions/Summary

The singular value decomposition (SVD) is an alternative to the eigenvalue decomposition that is better for rank-deficient and ill-conditioned matrices in general.
Computing the SVD is always numerically stable for any matrix, but it is typically more expensive than other decompositions.
The SVD can be used to compute low-rank approximations to a matrix via principal component analysis (PCA).
PCA has many practical applications, and the matrices that appear are usually large and sparse.