Linear Algebra Methods for Data Mining


Saara Hyvönen, Spring 2007
The Singular Value Decomposition (SVD), continued
Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki

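The Singular Value Decomposition

Any $m \times n$ matrix $A$, with $m \ge n$, can be factorized as

$$A = U \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^T,$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal, and $\Sigma \in \mathbb{R}^{n \times n}$ is diagonal:

$$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n), \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0.$$

"Skinny" version: $A = U_1 \Sigma V^T$, with $U_1 \in \mathbb{R}^{m \times n}$.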
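The two forms of the factorization can be compared directly in Matlab. The following is a minimal sketch: the 3-by-2 matrix is made up for the illustration, and `svd(A, 'econ')` is used for the skinny version.

```matlab
% Full vs. "skinny" SVD of a small m x n matrix with m >= n.
A = [1 2; 3 4; 5 6];              % made-up example, m = 3, n = 2

[U, S, V] = svd(A);               % full: U is 3x3, S is 3x2 = [Sigma; 0], V is 2x2
[U1, S1, V1] = svd(A, 'econ');    % skinny: U1 is 3x2, S1 = Sigma is 2x2

sigma = diag(S1);                 % singular values in decreasing order
errFull = norm(U*S*V' - A);       % reconstruction errors, both at rounding level
errSkinny = norm(U1*S1*V1' - A);
```

The skinny version drops the columns of U that only multiply the zero block, which saves storage when m is much larger than n.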

Matrix approximation

Theorem. Let $U_k = (u_1\ u_2\ \ldots\ u_k)$, $V_k = (v_1\ v_2\ \ldots\ v_k)$ and $\Sigma_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k)$, and define $A_k = U_k \Sigma_k V_k^T$. Then

$$\min_{\mathrm{rank}(B) \le k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}.$$

That is, the best approximation of rank $k$ for the matrix $A$ (in the 2-norm) is $A_k = U_k \Sigma_k V_k^T$.
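The statement of the theorem is easy to check numerically. A small sketch, with a made-up 4-by-3 matrix and k = 2 chosen only for illustration:

```matlab
% Check numerically that ||A - A_k||_2 = sigma_(k+1) for the truncated SVD.
A = [1 2 3; 4 5 6; 7 8 10; 2 0 1];        % made-up 4x3 example
k = 2;

[U, S, V] = svd(A, 'econ');
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % A_k = U_k * Sigma_k * V_k'

approxErr = norm(A - Ak, 2);              % 2-norm of the approximation error
sigmaNext = S(k+1, k+1);                  % sigma_(k+1)
```

Up to rounding errors, `approxErr` and `sigmaNext` agree.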

Consequences

The best rank one approximation of $A$ is $A_1 = U_1 \Sigma_1 V_1^T = \sigma_1 u_1 v_1^T$.

Assume $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_j > \sigma_{j+1} = \sigma_{j+2} = \cdots = \sigma_n = 0$. Then

$$\min_{\mathrm{rank}(B) \le j} \|A - B\|_2 = \|A - A_j\|_2 = \sigma_{j+1} = 0.$$

So the rank of $A$ is the number of nonzero singular values of $A$.

Perturbation theory

Theorem. If $A$ and $A + E$ are in $\mathbb{R}^{m \times n}$ with $m \ge n$, then for $k = 1, \ldots, n$

$$|\sigma_k(A + E) - \sigma_k(A)| \le \sigma_1(E) = \|E\|_2.$$

Proof. Omitted.

Think of $E$ as added noise.
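A quick numerical check of the bound, as a sketch; both the matrix and the perturbation below are made up for the illustration:

```matlab
% Check |sigma_k(A+E) - sigma_k(A)| <= ||E||_2 for every k.
A = [1 2; 3 4; 5 6];                 % made-up matrix
E = 1e-3 * [1 -1; 2 0; 0 1];         % made-up small perturbation ("noise")

shift = abs(svd(A + E) - svd(A));    % change in each singular value
bound = norm(E, 2);                  % sigma_1(E) = ||E||_2
```

No entry of `shift` exceeds `bound`: small noise can move each singular value only a little.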

Example: low rank matrix plus noise

[Figure: singular values plotted against index.]

Example: low rank matrix plus noise

Assume $A$ is a low rank matrix plus noise: $A = A_0 + N$, where $\|N\| \ll \|A_0\|$.

The correct rank can be estimated by looking at the singular values: when choosing a good $k$, look for gaps in the singular values!

When $N$ is small, the number of larger singular values is often referred to as the numerical rank of $A$.

The noise can be removed by estimating the numerical rank $k$ from the singular values, and approximating $A$ by the truncated SVD $U_k \Sigma_k V_k^T$.

[Figure: logarithm of the singular values plotted against index.]

In the figure, there is a gap between the 11th and 12th singular values. Estimate the numerical rank to be 11. So to remove noise, replace $A$ by $A_k = U_k \Sigma_k V_k^T$, where $k = 11$.
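The same procedure can be sketched in Matlab on synthetic data. Everything below is an assumption made for the illustration: the sizes, the true rank 3, the singular values 10, 8, 6, and the deterministic "noise" matrix. The gap is located as the largest drop between neighbouring singular values, which is one simple way to automate the visual inspection.

```matlab
% Low rank matrix plus noise: estimate the numerical rank from the gap in the
% singular values and remove the noise with a truncated SVD.
m = 20; n = 10; r = 3;                      % made-up sizes and true rank
[Q1, R1] = qr(cos((1:m)' * (1:r)), 0);      % Q1: m x r with orthonormal columns
[Q2, R2] = qr(sin((1:n)' * (1:r)), 0);      % Q2: n x r with orthonormal columns
A0 = Q1 * diag([10 8 6]) * Q2';             % exact rank-3 matrix
N  = 1e-3 * sin((1:m)' * (1:n));            % small deterministic "noise"
A  = A0 + N;

[U, S, V] = svd(A, 'econ');
s = diag(S);
[gap, k] = max(-diff(s));                   % largest drop between neighbours
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';     % truncated SVD, noise removed
```

Here the largest drop occurs after the third singular value, so `k` is 3, and `Ak` differs from the noise-free `A0` by at most twice the norm of the noise.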

Eigenvalue decomposition vs. SVD

For symmetric $A$ the singular value decomposition is closely related to the eigendecomposition $A = U \Lambda U^T$, where the columns of $U$ are the eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues.

The computation of both the eigendecomposition and the SVD follows the same pattern.

Computation of the eigenvalue decomposition

1. Use Givens rotations to transform $A$ into tridiagonal form.
2. Use QR iteration with Wilkinson shift $\mu$ to transform the tridiagonal form to diagonal form. Repeat until converged:
   (i) $QR = T_k - \mu_k I$
   (ii) $T_{k+1} = RQ + \mu_k I$

The shift parameter $\mu_k$ is the eigenvalue of the $2 \times 2$ submatrix in the lower right corner of $T_k$ that is closest to the element $(T_k)_{n,n}$ in the lower right corner.

Once an eigenvalue is found, forget it, reduce (deflate) the problem, and go back to step 2.
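The two steps can be sketched in a few lines of Matlab. This is a simplified illustration, not how library routines do it: the reduction to tridiagonal form uses the built-in `hess` instead of explicit Givens rotations, each step calls `qr` on the whole active block, and the symmetric test matrix, the tolerance and the iteration cap are all made up for the example.

```matlab
% Shifted QR iteration with Wilkinson shift and deflation (symmetric case).
A = [4 1 2 0; 1 3 0 1; 2 0 2 1; 0 1 1 1];     % made-up symmetric matrix
T = hess(A);                                  % step 1: tridiagonal form
n = size(T, 1);
lambda = zeros(n, 1);

for m = n:-1:2                                % active block is T(1:m,1:m)
    iter = 0;
    while abs(T(m,m-1)) > 1e-14*(abs(T(m,m)) + abs(T(m-1,m-1))) && iter < 500
        % Wilkinson shift: eigenvalue of the trailing 2x2 block closest to T(m,m)
        a = T(m-1,m-1); b = T(m,m-1); c = T(m,m);
        d = (a - c)/2;
        s = sign(d); if s == 0, s = 1; end
        mu = c - s*b^2/(abs(d) + sqrt(d^2 + b^2));
        [Q, R] = qr(T(1:m,1:m) - mu*eye(m));  % (i)  QR = T_k - mu_k*I
        T(1:m,1:m) = R*Q + mu*eye(m);         % (ii) T_(k+1) = RQ + mu_k*I
        iter = iter + 1;
    end
    lambda(m) = T(m,m);                       % eigenvalue found: deflate
end
lambda(1) = T(1,1);
```

Sorting `lambda` and comparing with `eig(A)` shows agreement to rounding accuracy. Typically only a few iterations per eigenvalue are needed, because the shift makes the last off-diagonal element shrink very quickly.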

Example

[Matlab output: the matrix A, and A after the first step (reduction to tridiagonal form with Givens rotations).]

[Matlab output: intermediate results of 4 QR iteration steps.]

[Matlab output: eig(Aorig).]

Flop counts

Eigenvalues only: about $4n^3/3$.

Accumulation of the orthogonal transformations to compute the matrix of eigenvectors: about $9n^3$ more.

How about computing the SVD?

We now know how to compute the eigendecomposition. Couldn't we use this to compute the eigenvalues of $A^T A$ to get the singular values?

Well, yes... and NO. This is not the way to do it. Forming $A^T A$ can lead to loss of information.
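A classical small example shows what "loss of information" means in floating point arithmetic. The matrix and the value of `delta` below are chosen for the illustration; the exact singular values of this matrix are sqrt(2 + delta^2) and delta.

```matlab
% Forming A'*A can destroy information that the SVD of A still sees.
delta = 1e-9;                       % made-up small parameter
A = [1 1; delta 0; 0 delta];        % two nearly parallel columns

s = svd(A);                         % s(2) is close to delta
C = A' * A;                         % exact diagonal entries would be 1 + delta^2
lostInfo = isequal(C, [1 1; 1 1]);  % delta^2 = 1e-18 is rounded away in doubles
sFromC = sqrt(abs(eig(C)));         % computed from C, which no longer depends on delta
```

In double precision `C` is exactly the all-ones matrix, so nothing computed from it can recover the small singular value, while `svd(A)` returns it accurately.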

Computing the SVD

1. Use Householder transformations to transform $A$ into bidiagonal form $B$. Now $B^T B$ is tridiagonal.
2. Use QR iteration with Wilkinson shift $\mu$ to transform the bidiagonal $B$ to diagonal form (implicitly, without forming $B^T B$ explicitly!).

Can be computed in about $6mn^2 + 20n^3$ flops. Efficiently implemented everywhere.

Sparse matrices

In many applications only a small number of the entries are nonzero (e.g. term-document matrices).

Iterative methods are frequently used in solving sparse problems. This is because, for example, the transformation to tridiagonal form would destroy sparsity, which leads to excessive storage requirements. Also, the computational complexity might be prohibitively high when the dimension of the data matrix is very large.


Facts about SVD

Equivalent forms of SVD:

$$A^T A v_j = \sigma_j^2 v_j, \qquad A A^T u_j = \sigma_j^2 u_j,$$

where $u_j$ and $v_j$ are the columns of $U$ and $V$, respectively.

Let $U_k$, $V_k$ and $\Sigma_k$ be the matrices with the $k$ first singular vectors and values, and define $A_k = U_k \Sigma_k V_k^T$. Then

$$\min_{\mathrm{rank}(B) \le k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}.$$

Principal components analysis

Idea: look for a direction such that the data projected onto it has maximal variance. When found, continue by seeking the next direction, which is orthogonal to this one (i.e. uncorrelated), and which explains as much of the remaining variance in the data as possible.

Ergo: we are seeking linear combinations of the original variables. If we are lucky, we can find a few such linear combinations, or directions, or (principal) components, which describe the data fairly accurately.

The aim is to capture the intrinsic variability in the data.

[Figure: scatter plot of data points with the directions of the 1st and 2nd principal components.]

Example: Atmospheric data

Data: 1500 days, and for each day we have the mean and the standard deviation of around 30 measured variables (temperature, wind speed and direction, rainfall, UV-A radiation, concentration of CO2, etc.).

Therefore, our data matrix is $1500 \times 60$. Visualizing things in a 60-dimensional space is challenging! Instead, do PCA, and project the days onto the plane defined by the first two principal components.

[Figure: days projected onto the plane defined by the first two principal components, colored per month. Axes: 1st principal component, 2nd principal component.]

Example: spatial data analysis

Data: 9000 dialect words, 500 counties. Word-county matrix $A$:

$$A(i, j) = \begin{cases} 1 & \text{if word } i \text{ appears in county } j, \\ 0 & \text{otherwise.} \end{cases}$$

Apply PCA to this.

Results obtained by PCA

Data points: words; variables: counties.

Each principal component tells which counties explain the most significant part of the variation left in the data. The first principal component is essentially just the number of words in each county! After this, the geographical structure of the principal components is apparent.

Note: PCA knows nothing of the geography of the counties.


PCA = SVD

Let $A$ be an $n \times m$ data matrix in which the rows represent the cases: each row is a data vector, each column represents a variable. (Note: usually the roles of rows and columns are the other way around!)

$A$ is centered: the estimated mean is subtracted from each column, so each column has zero mean.

Let $w$ be the $m \times 1$ column vector of (unknown) projection weights that result in the largest variance when the data $A$ is projected along $w$. Require $w^T w = 1$.

The projection of a data vector $a$ onto $w$ is $w^T a = \sum_{j=1}^{m} a_j w_j$. The projection of the data along $w$ is $Aw$.

Variance:

$$\sigma_w^2 = (Aw)^T (Aw) = w^T A^T A w = w^T C w,$$

where $C = A^T A$ is the covariance matrix of the data, up to a constant scaling factor ($A$ is centered!).

Task: maximize the variance subject to the constraint $w^T w = 1$.

Optimization problem: maximize

$$f = w^T C w - \lambda (w^T w - 1),$$

where $\lambda$ is the Lagrange multiplier.

Differentiating $f = w^T C w - \lambda (w^T w - 1)$ with respect to $w$ yields

$$\frac{\partial f}{\partial w} = 2Cw - 2\lambda w = 0.$$

Eigenvalue equation: $Cw = \lambda w$, where $C = A^T A$.

Solution: singular values and singular vectors of $A$! More precisely: the first principal component of $A$ is exactly the first right singular vector $v_1$ of $A$.

The solution of our optimization problem is given by $w = v_1$, $\lambda = \sigma_1^2$, where $\sigma_1$ and $v_1$ are the first singular value and the corresponding right singular vector of $A$, and $Cw = \lambda w$.

Our interest was to maximize the variance, which is given by

$$w^T C w = w^T \lambda w = \sigma_1^2 w^T w = \sigma_1^2,$$

so the singular value tells about the variance in the direction of the principal component.
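The result can be checked numerically. The sketch below uses a made-up 5-by-2 data set, centers it, and compares the value of w'Cw along the first right singular vector with the values along a grid of other unit vectors; the grid of angles is only an illustration device.

```matlab
% First principal component = first right singular vector of the centered data.
A = [1 2; 3 1; 2 4; 5 3; 4 5];                  % made-up data, rows = cases
Ac = A - repmat(mean(A, 1), size(A, 1), 1);     % center each column

[U, S, V] = svd(Ac, 'econ');
w = V(:, 1);                                    % w = v_1
sigma1 = S(1, 1);

varAlongW = (Ac*w)' * (Ac*w);                   % w'*C*w with C = Ac'*Ac
residual = norm(Ac'*Ac*w - sigma1^2*w);         % checks C*w = lambda*w

theta = linspace(0, pi, 181);                   % other candidate directions
W = [cos(theta); sin(theta)];                   % unit vectors as columns
varAll = sum((Ac*W).^2, 1);                     % w'*C*w for each candidate
```

`varAlongW` equals `sigma1^2` up to rounding, and no entry of `varAll` is larger.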

What next?

Once the first principal component is found, we continue in the same fashion to look for the next one, which is orthogonal to (all) the principal component(s) already found.

The solutions are the right singular vectors $v_k$ of $A$, and the variance in each direction is given by the square of the corresponding singular value, $\sigma_k^2$.

How not to compute the PCA:

In the literature one frequently runs across PCA algorithms which start by computing the covariance matrix $C = A^T A$ of the centered data matrix $A$, and then compute the eigenvalues of this.

But we already know that this is a bad idea!

The condition number of $A^T A$ is the square of that of $A$, and thus much larger. Loss of information.

For a sparse $A$, $C = A^T A$ is no longer sparse!

How to compute the PCA:

Data matrix $A$, rows = data points, columns = variables (attributes, parameters).

1. Center the data by subtracting the mean of each column.
2. Compute the SVD of the centered matrix $\hat{A}$ (or the $k$ first singular values and vectors): $\hat{A} = U \Sigma V^T$.
3. The principal components are the columns of $V$, and the coordinates of the data in the basis defined by the principal components are $U\Sigma$.

Matlab code for PCA

% Data matrix A: columns = variables, rows = data points.
% Matlab function for computing the first k principal components of A.
function [pc, score] = pca(A, k)
[rows, cols] = size(A);
Ameans = repmat(mean(A, 1), rows, 1);  % matrix whose rows are the column means
A = A - Ameans;                        % centering data
[U, S, V] = svds(A, k);                % k is the number of pc:s desired
pc = V;
score = U * S;
% now the original A is approximately score*pc' + Ameans

Note on Matlab

PCA is coded in the Statistics Toolbox in Matlab, BUT... DO NOT USE IT!!

Why? We have so few Statistics Toolbox licenses that we run out of them frequently! Better not to waste scarce resources on this.

A vs $A^T$

Let $A$ be a centered data matrix, and $A = U \Sigma V^T$. The principal components of $A$ are the columns of $V$, and the coordinates in the basis defined by the principal components are $U\Sigma$.

If $A = U \Sigma V^T$, then $A^T = V \Sigma^T U^T$. So aren't the principal components of $A^T$ given by $U$ and the coordinates by $V \Sigma^T$?

So the PCs of $A$ are the new coordinates of $A^T$ and vice versa, modulo a multiplication by the diagonal matrix $\Sigma$ (or its inverse)?

No.

And why not?

Because of the centering of the data! In general, the transpose of a centered matrix is not the same as the centered transpose:

$$(A - \mathrm{meansofcolumns}(A))^T \ne A^T - \mathrm{meansofcolumns}(A^T).$$

In practice these are related in special cases.
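A tiny example makes the difference concrete. The 3-by-2 matrix and the helper `center` are made up for the illustration:

```matlab
% Centering and transposing do not commute in general.
center = @(M) M - repmat(mean(M, 1), size(M, 1), 1);   % subtract column means

A = [1 2; 3 5; 4 9];        % made-up data matrix
X1 = center(A)';            % transpose of the centered matrix
X2 = center(A');            % centered transpose
difference = norm(X1 - X2); % clearly nonzero for this A
```

`X1` has rows that sum to zero, `X2` has columns that sum to zero, and the two matrices are different.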

Singular values tell about variance

The variance in the direction of the $k$th principal component is given by the square of the corresponding singular value: $\sigma_k^2$.

Singular values can be used to estimate how many principal components to keep.

Rule of thumb: keep enough to explain 85% of the variation, i.e. choose the smallest $k$ for which

$$\frac{\sum_{j=1}^{k} \sigma_j^2}{\sum_{j=1}^{n} \sigma_j^2} \ge 0.85.$$
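In Matlab the rule of thumb takes two lines. The singular values below are made up for the illustration:

```matlab
% Choose the number of principal components that explains 85% of the variation.
s = [5 3 2 1 0.5];                       % made-up singular values
explained = cumsum(s.^2) / sum(s.^2);    % fraction explained by the first k
k = find(explained >= 0.85, 1);          % smallest k reaching 85%
```

Here the first component explains about 64% and the first two about 87%, so `k` is 2.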

Why talk about PCA? Why not just stick to SVD?

Singular vectors = principal components.

Centering is central

SVD will give vectors that go through the origin.

Centering makes sure that the origin is in the middle of the data set.
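The effect is easy to see on a toy data set. In the sketch below (all numbers made up), the points lie on a line in the direction (1,-1) through the point (10,10), far from the origin:

```matlab
% Effect of centering on the first right singular vector.
t = (-2:2)';
A = [10 + t, 10 - t];                          % made-up points, mean is (10,10)

[U, S, V] = svd(A, 'econ');                    % SVD without centering
Ac = A - repmat(mean(A, 1), size(A, 1), 1);    % centered data
[Uc, Sc, Vc] = svd(Ac, 'econ');                % SVD after centering

toMean  = [1; 1] / sqrt(2);                    % direction from origin to the mean
alongData = [1; -1] / sqrt(2);                 % direction in which the data varies
alignRaw = abs(V(:,1)' * toMean);              % 1 means parallel
alignCentered = abs(Vc(:,1)' * alongData);     % 1 means parallel
```

Without centering, the first singular vector points from the origin towards the mean of the data. After centering, it points along the direction in which the data actually varies.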


Summary: PCA

PCA is SVD done on centered data.

PCA looks for a direction such that the data projected onto it has maximal variance. When found, PCA continues by seeking the next direction, which is orthogonal to all the previously found directions, and which explains as much of the remaining variance in the data as possible.

Principal components are uncorrelated.

PCA is useful for: data exploration, visualizing data, compressing data, outlier detection, ratio rules.



More information

FSAN/ELEG815: Statistical Learning

FSAN/ELEG815: Statistical Learning : Statistical Learning Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware 3. Eigen Analysis, SVD and PCA Outline of the Course 1. Review of Probability 2. Stationary

More information

Dimensionality Reduction

Dimensionality Reduction Dimensionality Reduction Le Song Machine Learning I CSE 674, Fall 23 Unsupervised learning Learning from raw (unlabeled, unannotated, etc) data, as opposed to supervised data where a classification of

More information

BlockMatrixComputations and the Singular Value Decomposition. ATaleofTwoIdeas

BlockMatrixComputations and the Singular Value Decomposition. ATaleofTwoIdeas BlockMatrixComputations and the Singular Value Decomposition ATaleofTwoIdeas Charles F. Van Loan Department of Computer Science Cornell University Supported in part by the NSF contract CCR-9901988. Block

More information


EIGENVALUE PROBLEMS. EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS EIGENVALUE PROBLEMS p. 1/4 EIGENVALUE PROBLEMS p. 2/4 Eigenvalues and eigenvectors Let A C n n. Suppose Ax = λx, x 0, then x is a (right) eigenvector of A, corresponding to the eigenvalue

More information

Block Bidiagonal Decomposition and Least Squares Problems

Block Bidiagonal Decomposition and Least Squares Problems Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition

More information

Chapter XII: Data Pre and Post Processing

Chapter XII: Data Pre and Post Processing Chapter XII: Data Pre and Post Processing Information Retrieval & Data Mining Universität des Saarlandes, Saarbrücken Winter Semester 2013/14 XII.1 4-1 Chapter XII: Data Pre and Post Processing 1. Data

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

EE731 Lecture Notes: Matrix Computations for Signal Processing

EE731 Lecture Notes: Matrix Computations for Signal Processing EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University October 17, 005 Lecture 3 3 he Singular Value Decomposition

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition (Com S 477/577 Notes Yan-Bin Jia Sep, 7 Introduction Now comes a highlight of linear algebra. Any real m n matrix can be factored as A = UΣV T where U is an m m orthogonal

More information

Least Squares. Tom Lyche. October 26, Centre of Mathematics for Applications, Department of Informatics, University of Oslo

Least Squares. Tom Lyche. October 26, Centre of Mathematics for Applications, Department of Informatics, University of Oslo Least Squares Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 26, 2010 Linear system Linear system Ax = b, A C m,n, b C m, x C n. under-determined

More information


PRINCIPAL COMPONENT ANALYSIS PRINCIPAL COMPONENT ANALYSIS 1 INTRODUCTION One of the main problems inherent in statistics with more than two variables is the issue of visualising or interpreting data. Fortunately, quite often the problem

More information


PRINCIPAL COMPONENTS ANALYSIS 121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves

More information

Singular Value Decomposition and Principal Component Analysis (PCA) I

Singular Value Decomposition and Principal Component Analysis (PCA) I Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression

More information

Linear Algebra. Session 12

Linear Algebra. Session 12 Linear Algebra. Session 12 Dr. Marco A Roque Sol 08/01/2017 Example 12.1 Find the constant function that is the least squares fit to the following data x 0 1 2 3 f(x) 1 0 1 2 Solution c = 1 c = 0 f (x)

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26 Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1

More information

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition AM 205: lecture 8 Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition QR Factorization A matrix A R m n, m n, can be factorized

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

Eigenvalues and diagonalization

Eigenvalues and diagonalization Eigenvalues and diagonalization Patrick Breheny November 15 Patrick Breheny BST 764: Applied Statistical Modeling 1/20 Introduction The next topic in our course, principal components analysis, revolves

More information

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Combinations of features Given a data matrix X n p with p fairly large, it can

More information

Linear Algebra and Matrices

Linear Algebra and Matrices Linear Algebra and Matrices 4 Overview In this chapter we studying true matrix operations, not element operations as was done in earlier chapters. Working with MAT- LAB functions should now be fairly routine.

More information

Data Preprocessing Tasks

Data Preprocessing Tasks Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can

More information

Lecture 12 Eigenvalue Problem. Review of Eigenvalues Some properties Power method Shift method Inverse power method Deflation QR Method

Lecture 12 Eigenvalue Problem. Review of Eigenvalues Some properties Power method Shift method Inverse power method Deflation QR Method Lecture Eigenvalue Problem Review of Eigenvalues Some properties Power method Shift method Inverse power method Deflation QR Method Eigenvalue Eigenvalue ( A I) If det( A I) (trivial solution) To obtain

More information


CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6 CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6 GENE H GOLUB Issues with Floating-point Arithmetic We conclude our discussion of floating-point arithmetic by highlighting two issues that frequently

More information

Derivation of the Kalman Filter

Derivation of the Kalman Filter Derivation of the Kalman Filter Kai Borre Danish GPS Center, Denmark Block Matrix Identities The key formulas give the inverse of a 2 by 2 block matrix, assuming T is invertible: T U 1 L M. (1) V W N P

More information

Linear Algebra Primer

Linear Algebra Primer Linear Algebra Primer David Doria Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................

More information

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =

Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x = Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.

More information

Linear Algebra & Geometry why is linear algebra useful in computer vision?

Linear Algebra & Geometry why is linear algebra useful in computer vision? Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia

More information

Main matrix factorizations

Main matrix factorizations Main matrix factorizations A P L U P permutation matrix, L lower triangular, U upper triangular Key use: Solve square linear system Ax b. A Q R Q unitary, R upper triangular Key use: Solve square or overdetrmined

More information

Principal Component Analysis. Applied Multivariate Statistics Spring 2012

Principal Component Analysis. Applied Multivariate Statistics Spring 2012 Principal Component Analysis Applied Multivariate Statistics Spring 2012 Overview Intuition Four definitions Practical examples Mathematical example Case study 2 PCA: Goals Goal 1: Dimension reduction

More information

PCA and admixture models

PCA and admixture models PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated.

Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated. Math 504, Homework 5 Computation of eigenvalues and singular values Recall that your solutions to these questions will not be collected or evaluated 1 Find the eigenvalues and the associated eigenspaces

More information

2.3. Clustering or vector quantization 57

2.3. Clustering or vector quantization 57 Multivariate Statistics non-negative matrix factorisation and sparse dictionary learning The PCA decomposition is by construction optimal solution to argmin A R n q,h R q p X AH 2 2 under constraint :

More information

8. the singular value decomposition

8. the singular value decomposition 8. the singular value decomposition cmda 3606; mark embree version of 19 February 2017 The singular value decomposition (SVD) is among the most important and widely applicable matrix factorizations. It

More information

MATH 829: Introduction to Data Mining and Analysis Principal component analysis

MATH 829: Introduction to Data Mining and Analysis Principal component analysis 1/11 MATH 829: Introduction to Data Mining and Analysis Principal component analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware April 4, 2016 Motivation 2/11 High-dimensional

More information

Singular Value Decompsition

Singular Value Decompsition Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost

More information


LECTURE 16: PCA AND SVD Instructor: Sael Lee CS549 Computational Biology LECTURE 16: PCA AND SVD Resource: PCA Slide by Iyad Batal Chapter 12 of PRML Shlens, J. (2003). A tutorial on principal component analysis. CONTENT Principal

More information

The Singular Value Decomposition and Least Squares Problems

The Singular Value Decomposition and Least Squares Problems The Singular Value Decomposition and Least Squares Problems Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 27, 2009 Applications of SVD solving

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Singular Value Decomposition

Singular Value Decomposition Singular Value Decomposition CS 205A: Mathematical Methods for Robotics, Vision, and Graphics Doug James (and Justin Solomon) CS 205A: Mathematical Methods Singular Value Decomposition 1 / 35 Understanding

More information