# Lecture 7 Spectral methods


CSE 291: Unsupervised learning, Spring 2008

## 7.1 Linear algebra review

### 7.1.1 Eigenvalues and eigenvectors

Definition 1. A $d \times d$ matrix $M$ has eigenvalue $\lambda$ if there is a $d$-dimensional vector $u \neq 0$ for which $Mu = \lambda u$. This $u$ is the eigenvector corresponding to $\lambda$.

In other words, the linear transformation $M$ maps vector $u$ into the same direction. It is interesting that any linear transformation necessarily has directional fixed points of this kind. The following chain of implications helps in understanding this:

$$
\begin{aligned}
\lambda \text{ is an eigenvalue of } M
&\iff \text{there exists } u \neq 0 \text{ with } Mu = \lambda u \\
&\iff \text{there exists } u \neq 0 \text{ with } (M - \lambda I)u = 0 \\
&\iff (M - \lambda I) \text{ is singular (that is, not invertible)} \\
&\iff \det(M - \lambda I) = 0.
\end{aligned}
$$

Now, $\det(M - \lambda I)$ is a polynomial of degree $d$ in $\lambda$. As such it has $d$ roots (although some of them might be complex). This explains the existence of eigenvalues.

A case of great interest is when $M$ is real-valued and symmetric, because then the eigenvalues are real.

Theorem 2. Let $M$ be any real symmetric $d \times d$ matrix. Then:

1. $M$ has $d$ real eigenvalues $\lambda_1, \ldots, \lambda_d$ (not necessarily distinct).
2. There is a set of $d$ corresponding eigenvectors $u_1, \ldots, u_d$ that constitute an orthonormal basis of $\mathbb{R}^d$, that is, $u_i \cdot u_j = \delta_{ij}$ for all $i, j$.

### 7.1.2 Spectral decomposition

The spectral decomposition recasts a matrix in terms of its eigenvalues and eigenvectors. This representation turns out to be enormously useful.

Theorem 3. Let $M$ be a real symmetric $d \times d$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_d$ and corresponding orthonormal eigenvectors $u_1, \ldots, u_d$. Then:

1. $$
M = \underbrace{\begin{pmatrix} u_1 & u_2 & \cdots & u_d \end{pmatrix}}_{\text{call this } Q}
\underbrace{\begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_d \end{pmatrix}}_{\Lambda}
\underbrace{\begin{pmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_d^T \end{pmatrix}}_{Q^T}
$$

2. $$M = \sum_{i=1}^{d} \lambda_i u_i u_i^T.$$

Proof. A general proof strategy is to observe that $M$ represents a linear transformation $x \mapsto Mx$ on $\mathbb{R}^d$, and as such, is completely determined by its behavior on any set of $d$ linearly independent vectors. For instance, $\{u_1, \ldots, u_d\}$ are linearly independent, so any $d \times d$ matrix $N$ that satisfies $Nu_i = Mu_i$ (for all $i$) is necessarily identical to $M$.

Let's start by verifying (1). For practice, we'll do this two different ways.

Method One: For any $i$, we have

$$Q \Lambda Q^T u_i = Q \Lambda e_i = Q \lambda_i e_i = \lambda_i Q e_i = \lambda_i u_i = M u_i.$$

Thus $Q \Lambda Q^T = M$.

Method Two: Since the $u_i$ are orthonormal, we have $Q^T Q = I$. Thus $Q$ is invertible, with $Q^{-1} = Q^T$; whereupon $Q Q^T = I$. For any $i$,

$$Q^T M Q e_i = Q^T M u_i = Q^T \lambda_i u_i = \lambda_i Q^T u_i = \lambda_i e_i = \Lambda e_i.$$

Thus $\Lambda = Q^T M Q$, which implies $M = Q \Lambda Q^T$.

Now for (2). Again we use the same proof strategy. For any $j$,

$$\Big( \sum_i \lambda_i u_i u_i^T \Big) u_j = \lambda_j u_j = M u_j.$$

Hence $M = \sum_i \lambda_i u_i u_i^T$.

### 7.1.3 Positive semidefinite matrices

We now introduce an important subclass of real symmetric matrices.

Definition 4. A real symmetric $d \times d$ matrix $M$ is positive semidefinite (denoted $M \succeq 0$) if $z^T M z \geq 0$ for all $z \in \mathbb{R}^d$. It is positive definite (denoted $M \succ 0$) if $z^T M z > 0$ for all nonzero $z \in \mathbb{R}^d$.

Example 5. Consider any random vector $X \in \mathbb{R}^d$, and let $\mu = \mathbb{E}X$ and $S = \mathbb{E}[(X - \mu)(X - \mu)^T]$ denote its mean and covariance, respectively. Then $S \succeq 0$ because for any $z \in \mathbb{R}^d$,

$$z^T S z = z^T \mathbb{E}[(X - \mu)(X - \mu)^T] z = \mathbb{E}[(z^T (X - \mu))((X - \mu)^T z)] = \mathbb{E}[(z \cdot (X - \mu))^2] \geq 0.$$

Positive (semi)definiteness is easily characterized in terms of eigenvalues.

Theorem 6. Let $M$ be a real symmetric $d \times d$ matrix. Then:

1. $M$ is positive semidefinite iff all its eigenvalues $\lambda_i \geq 0$.
2. $M$ is positive definite iff all its eigenvalues $\lambda_i > 0$.

Proof. Let's prove (1) (the second is similar). Let $\lambda_1, \ldots, \lambda_d$ be the eigenvalues of $M$, with corresponding eigenvectors $u_1, \ldots, u_d$.

First, suppose $M \succeq 0$. Then for all $i$, $\lambda_i = u_i^T M u_i \geq 0$.

Conversely, suppose that all the $\lambda_i \geq 0$. Then for any $z \in \mathbb{R}^d$, we have

$$z^T M z = z^T \Big( \sum_i \lambda_i u_i u_i^T \Big) z = \sum_i \lambda_i (z \cdot u_i)^2 \geq 0.$$
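The two forms of the spectral decomposition in Theorem 3, and the eigenvalue test of Theorem 6, can be checked numerically. The following is a minimal sketch assuming NumPy is available; the matrix `M` is a made-up example and is not taken from the lecture.

```python
import numpy as np

# Hypothetical example matrix (not from the lecture): any real symmetric matrix works.
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

# np.linalg.eigh is intended for symmetric matrices: it returns real eigenvalues
# in ascending order and orthonormal eigenvectors as the columns of Q.
eigenvalues, Q = np.linalg.eigh(M)
Lambda = np.diag(eigenvalues)

# Theorem 3, form 1: M = Q Lambda Q^T.
M_from_factors = Q @ Lambda @ Q.T

# Theorem 3, form 2: M = sum_i lambda_i u_i u_i^T (rows of Q.T are the u_i).
M_from_outer = sum(lam * np.outer(u, u) for lam, u in zip(eigenvalues, Q.T))

# Theorem 6: M is positive semidefinite iff every eigenvalue is >= 0
# (a small tolerance absorbs floating-point rounding).
is_psd = bool(np.all(eigenvalues >= -1e-12))
```

Both reconstructions should agree with `M` up to rounding, and `Q.T @ Q` should be close to the identity, which is the orthonormality used in Method Two of the proof.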

### 7.1.4 The Rayleigh quotient

One of the reasons why eigenvalues are so useful is that they constitute the optimal solution of a very basic quadratic optimization problem.

Theorem 7. Let $M$ be a real symmetric $d \times d$ matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$, and corresponding eigenvectors $u_1, \ldots, u_d$. Then:

$$
\begin{aligned}
\max_{\|z\| = 1} z^T M z &= \max_{z \neq 0} \frac{z^T M z}{z^T z} = \lambda_1 \\
\min_{\|z\| = 1} z^T M z &= \min_{z \neq 0} \frac{z^T M z}{z^T z} = \lambda_d
\end{aligned}
$$

and these are realized at $z = u_1$ and $z = u_d$, respectively.

Proof. Denote the spectral decomposition by $M = Q \Lambda Q^T$. Then:

$$
\begin{aligned}
\max_{z \neq 0} \frac{z^T M z}{z^T z}
&= \max_{z \neq 0} \frac{z^T Q \Lambda Q^T z}{z^T Q Q^T z} && \text{(since } Q Q^T = I\text{)} \\
&= \max_{y \neq 0} \frac{y^T \Lambda y}{y^T y} && \text{(writing } y = Q^T z\text{)} \\
&= \max_{y \neq 0} \frac{\lambda_1 y_1^2 + \cdots + \lambda_d y_d^2}{y_1^2 + \cdots + y_d^2} \\
&\leq \lambda_1,
\end{aligned}
$$

where equality is attained in the last step when $y = e_1$, that is, $z = Q e_1 = u_1$. The argument for the minimum is identical.

Example 8. Suppose random vector $X \in \mathbb{R}^d$ has mean $\mu$ and covariance matrix $M$. Then $z^T M z$ represents the variance of $X$ in direction $z$:

$$\mathrm{var}(z^T X) = \mathbb{E}[(z^T (X - \mu))^2] = \mathbb{E}[z^T (X - \mu)(X - \mu)^T z] = z^T M z.$$

Theorem 7 tells us that the direction of maximum variance is $u_1$, and that of minimum variance is $u_d$.

Continuing with this example, suppose that we are interested in the $k$-dimensional subspace (of $\mathbb{R}^d$) that has the most variance. How can this be formalized?

To start with, we will think of a linear projection from $\mathbb{R}^d$ to $\mathbb{R}^k$ as a function $x \mapsto P^T x$, where $P^T$ is a $k \times d$ matrix with $P^T P = I_k$. The last condition simply says that the rows of the projection matrix are orthonormal.

When a random vector $X \in \mathbb{R}^d$ is subjected to such a projection, the resulting $k$-dimensional vector has covariance matrix

$$\mathrm{cov}(P^T X) = \mathbb{E}[P^T (X - \mu)(X - \mu)^T P] = P^T M P.$$

Often we want to summarize the variance by just a single number rather than an entire matrix; in such cases, we typically use the trace of this matrix, and we write $\mathrm{var}(P^T X) = \mathrm{tr}(P^T M P)$. This is also equal to $\mathbb{E}\|P^T X - P^T \mu\|^2$. With this terminology established, we can now determine the projection $P^T$ that maximizes this variance.

Theorem 9. Let $M$ be a real symmetric $d \times d$ matrix as in Theorem 7. Pick any $k \leq d$.

$$
\begin{aligned}
\max_{P \in \mathbb{R}^{d \times k},\; P^T P = I} \mathrm{tr}(P^T M P) &= \lambda_1 + \cdots + \lambda_k \\
\min_{P \in \mathbb{R}^{d \times k},\; P^T P = I} \mathrm{tr}(P^T M P) &= \lambda_{d-k+1} + \cdots + \lambda_d.
\end{aligned}
$$

These are realized when the columns of $P$ span the $k$-dimensional subspace spanned by $\{u_1, \ldots, u_k\}$ and $\{u_{d-k+1}, \ldots, u_d\}$, respectively.

Proof. We will prove the result for the maximum; the other case is symmetric. Let $p_1, \ldots, p_k$ denote the columns of $P$. Then

$$\mathrm{tr}(P^T M P) = \sum_{i=1}^{k} p_i^T M p_i = \sum_{i=1}^{k} p_i^T \Big( \sum_{j=1}^{d} \lambda_j u_j u_j^T \Big) p_i = \sum_{j=1}^{d} \lambda_j \sum_{i=1}^{k} (p_i \cdot u_j)^2.$$

We will show that this quantity is at most $\lambda_1 + \cdots + \lambda_k$. To this end, let $z_j$ denote $\sum_{i=1}^{k} (p_i \cdot u_j)^2$; clearly it is nonnegative. We will show that $\sum_j z_j = k$ and that each $z_j \leq 1$; the desired bound is then immediate.

First,

$$\sum_{j=1}^{d} z_j = \sum_{i=1}^{k} \sum_{j=1}^{d} (p_i \cdot u_j)^2 = \sum_{i=1}^{k} \sum_{j=1}^{d} p_i^T u_j u_j^T p_i = \sum_{i=1}^{k} p_i^T Q Q^T p_i = \sum_{i=1}^{k} \|p_i\|^2 = k.$$

To upper-bound an individual $z_j$, start by extending the $k$ orthonormal vectors $p_1, \ldots, p_k$ to a full orthonormal basis $p_1, \ldots, p_d$ of $\mathbb{R}^d$. Then

$$z_j = \sum_{i=1}^{k} (p_i \cdot u_j)^2 \leq \sum_{i=1}^{d} (p_i \cdot u_j)^2 = \sum_{i=1}^{d} u_j^T p_i p_i^T u_j = \|u_j\|^2 = 1.$$

It then follows that

$$\mathrm{tr}(P^T M P) = \sum_{j=1}^{d} \lambda_j z_j \leq \lambda_1 + \cdots + \lambda_k,$$

and equality holds when $p_1, \ldots, p_k$ span the same space as $u_1, \ldots, u_k$.

## 7.2 Principal component analysis

Let $X \in \mathbb{R}^d$ be a random vector. We wish to find the single direction that captures as much as possible of the variance of $X$. Formally: we want $p \in \mathbb{R}^d$ (the direction) such that $\|p\| = 1$, so as to maximize $\mathrm{var}(p^T X)$.

Theorem 10. The solution to this optimization problem is to make $p$ the principal eigenvector of $\mathrm{cov}(X)$.

Proof. Denote $\mu = \mathbb{E}X$ and $S = \mathrm{cov}(X) = \mathbb{E}[(X - \mu)(X - \mu)^T]$. For any $p \in \mathbb{R}^d$, the projection $p^T X$ has mean $\mathbb{E}[p^T X] = p^T \mu$ and variance

$$\mathrm{var}(p^T X) = \mathbb{E}[(p^T X - p^T \mu)^2] = \mathbb{E}[p^T (X - \mu)(X - \mu)^T p] = p^T S p.$$

By Theorem 7, this is maximized (over all unit-length $p$) when $p$ is the principal eigenvector of $S$.

Likewise, the $k$-dimensional subspace that captures as much as possible of the variance of $X$ is simply the subspace spanned by the top $k$ eigenvectors of $\mathrm{cov}(X)$; call these $u_1, \ldots, u_k$.

Projection onto these eigenvectors is called principal component analysis (PCA). It can be used to reduce the dimension of the data from $d$ to $k$. Here are the steps:

1. Compute the mean $\mu$ and covariance matrix $S$ of the data $X$.
2. Compute the top $k$ eigenvectors $u_1, \ldots, u_k$ of $S$.
3. Project $X \mapsto P^T X$, where $P^T$ is the $k \times d$ matrix whose rows are $u_1, \ldots, u_k$.
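The three PCA steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than a reference implementation: it assumes NumPy, stores one data point per row, and uses the empirical mean and covariance in place of the expectations. The function name `pca_project` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X (n points, d features) onto the top-k eigenvectors of the covariance."""
    n = X.shape[0]
    # Step 1: empirical mean and covariance of the data.
    mu = X.mean(axis=0)
    centered = X - mu
    S = centered.T @ centered / n
    # Step 2: top-k eigenvectors of S (eigh returns eigenvalues in ascending order).
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1][:k]
    P = eigenvectors[:, order]            # d x k, columns u_1, ..., u_k
    # Step 3: map each point x to P^T x (row-wise this is X @ P).
    return X @ P, P, eigenvalues[order]

# Hypothetical data: three coordinates with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.1])
Z, P, top_eigenvalues = pca_project(X, k=2)
```

On this data the first principal direction should be close to the first coordinate axis, and the total variance of the projected points, $\mathrm{tr}(P^T S P)$, should equal the sum of the two largest eigenvalues, as Theorem 9 predicts.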

### 7.2.1 The best approximating affine subspace

We've seen one optimality property of PCA. Here's another: it is the $k$-dimensional affine subspace that best approximates $X$, in the sense that the expected squared distance from $X$ to the subspace is minimized.

Let's formalize the problem. A $k$-dimensional affine subspace is given by a displacement $p_o \in \mathbb{R}^d$ and a set of (orthonormal) basis vectors $p_1, \ldots, p_k \in \mathbb{R}^d$. The subspace itself is then $\{p_o + \alpha_1 p_1 + \cdots + \alpha_k p_k : \alpha_i \in \mathbb{R}\}$.

[Figure: an affine subspace displaced from the origin by $p_o$ and spanned by $p_1$ and $p_2$.]

The projection of $X \in \mathbb{R}^d$ onto this subspace is $P^T X + p_o$, where $P^T$ is the $k \times d$ matrix whose rows are $p_1, \ldots, p_k$. (In this subsection $P^T X$ is identified with the corresponding point $P P^T X$ of $\mathbb{R}^d$, so that it can be added to $p_o$.) Thus, the expected squared distance from $X$ to this subspace is $\mathbb{E}\|X - (P^T X + p_o)\|^2$. We wish to find the subspace for which this is minimized.

Theorem 11. Let $\mu$ and $S$ denote the mean and covariance of $X$, respectively. The solution of this optimization problem is to choose $p_1, \ldots, p_k$ to be the top $k$ eigenvectors of $S$ and to set $p_o = (I - P^T)\mu$.

Proof. Fix any matrix $P$; the choice of $p_o$ that minimizes $\mathbb{E}\|X - (P^T X + p_o)\|^2$ is (by calculus) $p_o = \mathbb{E}[X - P^T X] = (I - P^T)\mu$.

Now let's optimize $P$. Our cost function is

$$\mathbb{E}\|X - (P^T X + p_o)\|^2 = \mathbb{E}\|(I - P^T)(X - \mu)\|^2 = \mathbb{E}\|X - \mu\|^2 - \mathbb{E}\|P^T (X - \mu)\|^2,$$

where the second step is simply an invocation of the Pythagorean theorem.

[Figure: right triangle formed by the origin, $X - \mu$, and its projection $P^T (X - \mu)$.]

Therefore, we need to maximize $\mathbb{E}\|P^T (X - \mu)\|^2 = \mathrm{var}(P^T X)$, and we've already seen how to do this in Theorem 10 and the ensuing discussion.
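The Pythagorean step in the proof of Theorem 11 says that the expected squared distance to the subspace equals $\mathrm{tr}(S) - \mathrm{tr}(P^T S P)$. The sketch below checks this on synthetic data and compares the PCA subspace with an arbitrary one. It assumes NumPy and empirical averages in place of expectations; the data, the helper `mean_sq_distance`, and the random comparison subspace are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 1000 points in R^4 with unequal spreads and a nonzero mean.
X = rng.normal(size=(1000, 4)) @ np.diag([2.0, 1.5, 0.5, 0.2]) + np.array([1.0, -2.0, 0.5, 3.0])
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False, bias=True)

def mean_sq_distance(P):
    """Average squared distance from the rows of X to the affine subspace mu + span(columns of P)."""
    centered = X - mu
    # With p_o = (I - P P^T) mu, the residual of x is (I - P P^T)(x - mu).
    residual = centered - centered @ P @ P.T
    return np.mean(np.sum(residual ** 2, axis=1))

k = 2
eigenvalues, eigenvectors = np.linalg.eigh(S)      # ascending eigenvalues
P_pca = eigenvectors[:, ::-1][:, :k]               # top-k eigenvectors as columns

# An arbitrary k-dimensional subspace for comparison (orthonormal columns via QR).
P_rand, _ = np.linalg.qr(rng.normal(size=(4, k)))

cost_pca = mean_sq_distance(P_pca)
cost_rand = mean_sq_distance(P_rand)
```

For the PCA subspace the cost reduces to the sum of the $d - k$ smallest eigenvalues of $S$, and no other choice of orthonormal $P$ can do better.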

### 7.2.2 The projection that best preserves interpoint distances

Suppose we want to find the $k$-dimensional projection that minimizes the expected distortion in interpoint distances. More precisely, we want to find the $k \times d$ projection matrix $P^T$ (with $P^T P = I_k$) such that, for i.i.d. random vectors $X$ and $Y$, the expected squared distortion

$$\mathbb{E}\big[\|X - Y\|^2 - \|P^T X - P^T Y\|^2\big]$$

is minimized (of course, the term in brackets is always nonnegative, since a projection cannot increase distances).

Theorem 12. The solution is to make the rows of $P^T$ the top $k$ eigenvectors of $\mathrm{cov}(X)$.

Proof. This time we want to maximize

$$\mathbb{E}\|P^T X - P^T Y\|^2 = 2\,\mathbb{E}\|P^T X - P^T \mu\|^2 = 2\,\mathrm{var}(P^T X),$$

and once again we're back to our original problem.

This is emphatically not the same as finding the linear transformation (that is, not necessarily a projection) $P^T$ for which $\mathbb{E}\big|\|X - Y\|^2 - \|P^T X - P^T Y\|^2\big|$ is minimized. The random projection method that we saw earlier falls in this latter camp, because it consists of a projection followed by a scaling by $\sqrt{d/k}$.

### 7.2.3 A prelude to k-means clustering

Suppose that for random vector $X \in \mathbb{R}^d$, the optimal $k$-means centers are $\mu_1^*, \ldots, \mu_k^*$, with cost

$$\mathrm{opt} = \mathbb{E}\|X - (\text{nearest } \mu_i^*)\|^2.$$

If instead, we project $X$ into the $k$-dimensional PCA subspace, and find the best $k$ centers $\mu_1, \ldots, \mu_k$ in that subspace, how bad can these centers be?

Theorem 13. $\mathrm{cost}(\mu_1, \ldots, \mu_k) \leq 2 \cdot \mathrm{opt}$.

Proof. Without loss of generality $\mathbb{E}X = 0$ and the PCA mapping is $X \mapsto P^T X$. Since $\mu_1, \ldots, \mu_k$ are the best centers for $P^T X$, it follows that

$$\mathbb{E}\|P^T X - (\text{nearest } \mu_i)\|^2 \leq \mathbb{E}\|P^T (X - (\text{nearest } \mu_i^*))\|^2 \leq \mathbb{E}\|X - (\text{nearest } \mu_i^*)\|^2 = \mathrm{opt}.$$

Let $X \mapsto A^T X$ denote projection onto the subspace spanned by $\mu_1^*, \ldots, \mu_k^*$. From our earlier result on approximating affine subspaces, we know that $\mathbb{E}\|X - P^T X\|^2 \leq \mathbb{E}\|X - A^T X\|^2$. Thus

$$
\begin{aligned}
\mathrm{cost}(\mu_1, \ldots, \mu_k)
&= \mathbb{E}\|X - (\text{nearest } \mu_i)\|^2 \\
&= \mathbb{E}\|P^T X - (\text{nearest } \mu_i)\|^2 + \mathbb{E}\|X - P^T X\|^2 && \text{(Pythagorean theorem)} \\
&\leq \mathrm{opt} + \mathbb{E}\|X - A^T X\|^2 \\
&\leq \mathrm{opt} + \mathbb{E}\|X - (\text{nearest } \mu_i^*)\|^2 = 2 \cdot \mathrm{opt}.
\end{aligned}
$$
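Returning to the identity used in the proof of Theorem 12, $\mathbb{E}\|P^T X - P^T Y\|^2 = 2\,\mathrm{var}(P^T X)$: it has an exact finite-sample analogue when $X$ and $Y$ are drawn independently and uniformly from the same data set. The sketch below checks it, along with the nonnegativity of the distortion. It assumes NumPy; the synthetic data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 200 points in R^5 with decreasing spread per coordinate.
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
k = 2

S = np.cov(X, rowvar=False, bias=True)
eigenvalues, eigenvectors = np.linalg.eigh(S)   # ascending eigenvalues
P = eigenvectors[:, ::-1][:, :k]                # top-k eigenvectors as columns
Z = X @ P                                       # each row is P^T x

# Squared distances between all ordered pairs, before and after projection.
sq_dist_original = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
sq_dist_projected = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)

# Left side: average squared distance between projected pairs.
mean_pair_sq_dist = sq_dist_projected.mean()
# Right side: twice the total variance tr(P^T S P) of the projected data.
total_variance = np.trace(np.cov(Z, rowvar=False, bias=True))

# Per-pair distortion; it should never be negative for a projection.
distortion = sq_dist_original - sq_dist_projected
```

Averaging over all ordered pairs (including a point paired with itself) corresponds to sampling $X$ and $Y$ independently from the empirical distribution, which is why the two sides agree up to rounding rather than only approximately.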
## 7.3 Singular value decomposition

For any real symmetric $d \times d$ matrix $M$, we can find its eigenvalues $\lambda_1 \geq \cdots \geq \lambda_d$ and corresponding orthonormal eigenvectors $u_1, \ldots, u_d$, and write

$$M = \sum_{i=1}^{d} \lambda_i u_i u_i^T.$$

The best rank-$k$ approximation to $M$ is

$$M_k = \sum_{i=1}^{k} \lambda_i u_i u_i^T,$$

in the sense that this minimizes $\|M - M_k\|_F^2$ over all rank-$k$ matrices, provided the eigenvalues are ordered by decreasing magnitude (which is automatic when $M$ is positive semidefinite). (Here $\|\cdot\|_F$ denotes Frobenius norm; it is the same as $L_2$ norm if you imagine the matrix rearranged into a very long vector.)

In many applications, $M_k$ is an adequate approximation of $M$ even for fairly small values of $k$. And it is conveniently compact, of size $O(kd)$.

But what if we are dealing with a matrix $M$ that is not square; say it is $m \times n$ with $m \leq n$:

[Figure: a wide rectangular matrix $M$ with $m$ rows and $n$ columns.]

To find a compact approximation in such cases, we look at $M^T M$ or $M M^T$, which are square. Eigendecompositions of these matrices lead to a good representation of $M$.

### 7.3.1 The relationship between $M M^T$ and $M^T M$

Lemma 14. $M^T M$ and $M M^T$ are symmetric positive semidefinite matrices.

Proof. We'll do $M^T M$; the other is similar. First off, it is symmetric:

$$(M^T M)_{ij} = \sum_k (M^T)_{ik} M_{kj} = \sum_k M_{ki} M_{kj} = \sum_k (M^T)_{jk} M_{ki} = (M^T M)_{ji}.$$

Next, $M^T M \succeq 0$ since for any $z \in \mathbb{R}^n$, we have $z^T M^T M z = \|Mz\|^2 \geq 0$.

Which one should we use, $M^T M$ or $M M^T$? Well, they are of different sizes, $n \times n$ and $m \times m$ respectively.

[Figure: $M^T M$ drawn as an $n \times n$ square next to the smaller $m \times m$ square $M M^T$.]

Ideally, we'd prefer to deal with the smaller of the two, $M M^T$, especially since eigenvalue computations are expensive. Fortunately, it turns out the two matrices have the same (non-zero) eigenvalues!

Lemma 15. If $\lambda$ is an eigenvalue of $M^T M$ with eigenvector $u$, then either:

(i) $\lambda$ is an eigenvalue of $M M^T$ with eigenvector $Mu$, or

(ii) $\lambda = 0$ and $Mu = 0$.

Proof. Say $\lambda \neq 0$; we'll prove that condition (i) holds. First of all, $M^T M u = \lambda u \neq 0$, so certainly $Mu \neq 0$. It is an eigenvector of $M M^T$ with eigenvalue $\lambda$, since

$$M M^T (Mu) = M (M^T M u) = M (\lambda u) = \lambda (Mu).$$

Next, suppose $\lambda = 0$; we'll establish condition (ii). Notice that

$$\|Mu\|^2 = u^T M^T M u = u^T (M^T M u) = u^T (\lambda u) = 0.$$

Thus it must be the case that $Mu = 0$.

### 7.3.2 A spectral decomposition for rectangular matrices

Let's summarize the consequences of Lemma 15. We have two square matrices, a large one ($M^T M$) of size $n \times n$ and a smaller one ($M M^T$) of size $m \times m$. Let the eigenvalues of the large matrix be $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, with corresponding orthonormal eigenvectors $u_1, \ldots, u_n$. From the lemma, we know that at most $m$ of the eigenvalues are nonzero.

The smaller matrix $M M^T$ has eigenvalues $\lambda_1, \ldots, \lambda_m$, and corresponding orthonormal eigenvectors $v_1, \ldots, v_m$. The lemma suggests that $v_i = M u_i$; this is certainly a valid set of eigenvectors, but they are not necessarily normalized to unit length. So instead we set

$$v_i = \frac{M u_i}{\|M u_i\|} = \frac{M u_i}{\sqrt{u_i^T M^T M u_i}} = \frac{M u_i}{\sqrt{\lambda_i}}.$$

This finally gives us the singular value decomposition, a spectral decomposition for general matrices.

Theorem 16. Let $M$ be a rectangular $m \times n$ matrix with $m \leq n$. Define $\lambda_i, u_i, v_i$ as above. Then

$$
M = \underbrace{\begin{pmatrix} v_1 & v_2 & \cdots & v_m \end{pmatrix}}_{Q_1,\ \text{size } m \times m}
\underbrace{\begin{pmatrix} \sqrt{\lambda_1} & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & \sqrt{\lambda_m} & 0 & \cdots & 0 \end{pmatrix}}_{\Sigma,\ \text{size } m \times n}
\underbrace{\begin{pmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{pmatrix}}_{Q_2^T,\ \text{size } n \times n}.
$$

Proof. We will check that $\Sigma = Q_1^T M Q_2$. By our proof strategy from Theorem 3, it is enough to verify that both sides have the same effect on $e_i$ for all $1 \leq i \leq n$. For any such $i$,

$$
Q_1^T M Q_2 e_i = Q_1^T M u_i =
\begin{cases} Q_1^T \sqrt{\lambda_i}\, v_i & \text{if } i \leq m \\ 0 & \text{if } i > m \end{cases}
=
\begin{cases} \sqrt{\lambda_i}\, e_i & \text{if } i \leq m \\ 0 & \text{if } i > m \end{cases}
= \Sigma e_i.
$$

The alternative form of the singular value decomposition is

$$M = \sum_{i=1}^{m} \sqrt{\lambda_i}\, v_i u_i^T,$$

which immediately yields a rank-$k$ approximation

$$M_k = \sum_{i=1}^{k} \sqrt{\lambda_i}\, v_i u_i^T.$$

As in the square case, $M_k$ is the rank-$k$ matrix that minimizes $\|M - M_k\|_F^2$.
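The construction in Theorem 16 can be followed step by step in code: eigendecompose $M^T M$, keep the $m$ leading eigenpairs, and set $v_i = M u_i / \sqrt{\lambda_i}$. The sketch below assumes NumPy and a randomly generated full-rank matrix (so that $\lambda_1, \ldots, \lambda_m > 0$); the sizes and variable names are hypothetical. It is meant to mirror the lecture's derivation and is not how numerical libraries compute the SVD in practice.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 5                          # hypothetical sizes with m <= n
M = rng.normal(size=(m, n))          # full rank with probability 1

# Eigendecomposition of the large matrix M^T M (n x n), sorted in decreasing order.
lam, U = np.linalg.eigh(M.T @ M)
lam, U = lam[::-1], U[:, ::-1]

# Keep the m leading pairs; the remaining eigenvalues are zero up to rounding.
lam_m, U_m = lam[:m], U[:, :m]
sigma = np.sqrt(lam_m)

# v_i = M u_i / sqrt(lambda_i), stored as the columns of V (column i is divided by sigma[i]).
V = (M @ U_m) / sigma

# Alternative form of the SVD: M = sum_i sqrt(lambda_i) v_i u_i^T.
M_rebuilt = V @ np.diag(sigma) @ U_m.T

# Rank-k truncation and its squared Frobenius error.
k = 2
M_k = V[:, :k] @ np.diag(sigma[:k]) @ U_m[:, :k].T
frobenius_error_sq = np.linalg.norm(M - M_k, "fro") ** 2
```

The values in `sigma` should match the singular values returned by `np.linalg.svd(M, compute_uv=False)`, the eigenvalues of `M @ M.T` should coincide with `lam_m` as Lemma 15 states, and the squared Frobenius error of `M_k` should equal the sum of the discarded eigenvalues `lam_m[k:]`.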

### Symmetric matrices and dot products

Symmetric matrices and dot products Proposition An n n matrix A is symmetric iff, for all x, y in R n, (Ax) y = x (Ay). Proof. If A is symmetric, then (Ax) y = x T A T y = x T Ay = x (Ay). If equality

### 2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

### Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition

Linear Algebra review Powers of a diagonalizable matrix Spectral decomposition Prof. Tesler Math 283 Fall 2016 Also see the separate version of this with Matlab and R commands. Prof. Tesler Diagonalizing

### Math 215 HW #9 Solutions

Math 5 HW #9 Solutions. Problem 4.4.. If A is a 5 by 5 matrix with all a ij, then det A. Volumes or the big formula or pivots should give some upper bound on the determinant. Answer: Let v i be the ith

### Principal Component Analysis

Principal Component Analysis CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Principal

### 18.06SC Final Exam Solutions

18.06SC Final Exam Solutions 1 (4+7=11 pts.) Suppose A is 3 by 4, and Ax = 0 has exactly 2 special solutions: 1 2 x 1 = 1 and x 2 = 1 1 0 0 1 (a) Remembering that A is 3 by 4, find its row reduced echelon

### Symmetric Matrices and Eigendecomposition

Symmetric Matrices and Eigendecomposition Robert M. Freund January, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 2 1 Symmetric Matrices and Convexity of Quadratic Functions

### 10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

### Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

### MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP)

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T KEMP) Definition 01 If T (x) = Ax is a linear transformation from R n to R m then Nul (T ) = {x R n : T (x) = 0} = Nul (A) Ran (T ) = {Ax R m : x R n } = {b R m

### B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

### Linear Algebra & Geometry why is linear algebra useful in computer vision?

Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia

### Recall the convention that, for us, all vectors are column vectors.

Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists

### LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach Dr. Guangliang Chen February 9, 2016 Outline Introduction Review of linear algebra Matrix SVD PCA Motivation The digits

### Linear Algebra for Machine Learning. Sargur N. Srihari

Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it

### Least Squares Optimization

Least Squares Optimization The following is a brief review of least squares optimization and constrained optimization techniques. I assume the reader is familiar with basic linear algebra, including the

### Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

### Def. The euclidian distance between two points x = (x 1,...,x p ) t and y = (y 1,...,y p ) t in the p-dimensional space R p is defined as

MAHALANOBIS DISTANCE Def. The euclidian distance between two points x = (x 1,...,x p ) t and y = (y 1,...,y p ) t in the p-dimensional space R p is defined as d E (x, y) = (x 1 y 1 ) 2 + +(x p y p ) 2

### Basic Concepts in Matrix Algebra

Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1

### Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

### PRINCIPAL COMPONENTS ANALYSIS

121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves

### Lecture 1 Review: Linear models have the form (in matrix notation) Y = Xβ + ε,

2. REVIEW OF LINEAR ALGEBRA 1 Lecture 1 Review: Linear models have the form (in matrix notation) Y = Xβ + ε, where Y n 1 response vector and X n p is the model matrix (or design matrix ) with one row for

### CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

### Lecture 3: Review of Linear Algebra

ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,

### Lecture 3: Review of Linear Algebra

ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,

### Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

### Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Dimension reduction, PCA & eigenanalysis Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Combinations of features Given a data matrix X n p with p fairly large, it can

### AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

### Statistics for Applications. Chapter 9: Principal Component Analysis (PCA) 1/16

Statistics for Applications Chapter 9: Principal Component Analysis (PCA) 1/16 Multivariate statistics and review of linear algebra (1) Let X be a d-dimensional random vector and X 1,..., X n be n independent

### Eigenvectors and Hermitian Operators

7 71 Eigenvalues and Eigenvectors Basic Definitions Let L be a linear operator on some given vector space V A scalar λ and a nonzero vector v are referred to, respectively, as an eigenvalue and corresponding

### Chapter 3. Matrices. 3.1 Matrices

40 Chapter 3 Matrices 3.1 Matrices Definition 3.1 Matrix) A matrix A is a rectangular array of m n real numbers {a ij } written as a 11 a 12 a 1n a 21 a 22 a 2n A =.... a m1 a m2 a mn The array has m rows

### Singular Value Decomposition

Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

### Review of Linear Algebra Definitions, Change of Basis, Trace, Spectral Theorem

Review of Linear Algebra Definitions, Change of Basis, Trace, Spectral Theorem Steven J. Miller June 19, 2004 Abstract Matrices can be thought of as rectangular (often square) arrays of numbers, or as

### There are two things that are particularly nice about the first basis

Orthogonality and the Gram-Schmidt Process In Chapter 4, we spent a great deal of time studying the problem of finding a basis for a vector space We know that a basis for a vector space can potentially

### Problem Set 1. Homeworks will graded based on content and clarity. Please show your work clearly for full credit.

CSE 151: Introduction to Machine Learning Winter 2017 Problem Set 1 Instructor: Kamalika Chaudhuri Due on: Jan 28 Instructions This is a 40 point homework Homeworks will graded based on content and clarity

### MLCC 2015 Dimensionality Reduction and PCA

MLCC 2015 Dimensionality Reduction and PCA Lorenzo Rosasco UNIGE-MIT-IIT June 25, 2015 Outline PCA & Reconstruction PCA and Maximum Variance PCA and Associated Eigenproblem Beyond the First Principal Component

### The QR Decomposition

The QR Decomposition We have seen one major decomposition of a matrix which is A = LU (and its variants) or more generally PA = LU for a permutation matrix P. This was valid for a square matrix and aided

### Unsupervised dimensionality reduction

Unsupervised dimensionality reduction Guillaume Obozinski Ecole des Ponts - ParisTech SOCN course 2014 Guillaume Obozinski Unsupervised dimensionality reduction 1/30 Outline 1 PCA 2 Kernel PCA 3 Multidimensional

### December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis

.. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make

### Chapters 5 & 6: Theory Review: Solutions Math 308 F Spring 2015

Chapters 5 & 6: Theory Review: Solutions Math 308 F Spring 205. If A is a 3 3 triangular matrix, explain why det(a) is equal to the product of entries on the diagonal. If A is a lower triangular or diagonal

### Lecture 13 Spectral Graph Algorithms

COMS 995-3: Advanced Algorithms March 6, 7 Lecture 3 Spectral Graph Algorithms Instructor: Alex Andoni Scribe: Srikar Varadaraj Introduction Today s topics: Finish proof from last lecture Example of random

### Numerical Methods I Singular Value Decomposition

Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)

### Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

### DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary

### Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

### Math 240 Calculus III

Generalized Calculus III Summer 2015, Session II Thursday, July 23, 2015 Agenda 1. 2. 3. 4. Motivation Defective matrices cannot be diagonalized because they do not possess enough eigenvectors to make

### Least Squares Optimization

Least Squares Optimization The following is a brief review of least squares optimization and constrained optimization techniques. Broadly, these techniques can be used in data analysis and visualization

### EE731 Lecture Notes: Matrix Computations for Signal Processing

EE731 Lecture Notes: Matrix Computations for Signal Processing James P. Reilly c Department of Electrical and Computer Engineering McMaster University October 17, 005 Lecture 3 3 he Singular Value Decomposition

### The Eigenvalue Problem: Perturbation Theory

Jim Lambers MAT 610 Summer Session 2009-10 Lecture 13 Notes These notes correspond to Sections 7.2 and 8.1 in the text. The Eigenvalue Problem: Perturbation Theory The Unsymmetric Eigenvalue Problem Just

### Computation. For QDA we need to calculate: Lets first consider the case that

Computation For QDA we need to calculate: δ (x) = 1 2 log( Σ ) 1 2 (x µ ) Σ 1 (x µ ) + log(π ) Lets first consider the case that Σ = I,. This is the case where each distribution is spherical, around the

### Introduction to Matrix Algebra

Introduction to Matrix Algebra August 18, 2010 1 Vectors 1.1 Notations A p-dimensional vector is p numbers put together. Written as x 1 x =. x p. When p = 1, this represents a point in the line. When p

### 2. Review of Linear Algebra

2. Review of Linear Algebra ECE 83, Spring 217 In this course we will represent signals as vectors and operators (e.g., filters, transforms, etc) as matrices. This lecture reviews basic concepts from linear

### COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare

COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of

### MATH 304 Linear Algebra Lecture 20: The Gram-Schmidt process (continued). Eigenvalues and eigenvectors.

MATH 304 Linear Algebra Lecture 20: The Gram-Schmidt process (continued). Eigenvalues and eigenvectors. Orthogonal sets Let V be a vector space with an inner product. Definition. Nonzero vectors v 1,v

### The Spectral Theorem for normal linear maps

MAT067 University of California, Davis Winter 2007 The Spectral Theorem for normal linear maps Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (March 14, 2007) In this section we come back to the question

### ICS 6N Computational Linear Algebra Symmetric Matrices and Orthogonal Diagonalization

ICS 6N Computational Linear Algebra Symmetric Matrices and Orthogonal Diagonalization Xiaohui Xie University of California, Irvine xhx@uci.edu Xiaohui Xie (UCI) ICS 6N 1 / 21 Symmetric matrices An n n

### MATH 304 Linear Algebra Lecture 34: Review for Test 2.

MATH 304 Linear Algebra Lecture 34: Review for Test 2. Topics for Test 2 Linear transformations (Leon 4.1 4.3) Matrix transformations Matrix of a linear mapping Similar matrices Orthogonality (Leon 5.1

### MATH 829: Introduction to Data Mining and Analysis Principal component analysis

1/11 MATH 829: Introduction to Data Mining and Analysis Principal component analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware April 4, 2016 Motivation 2/11 High-dimensional

### Maths for Signals and Systems Linear Algebra in Engineering

Maths for Signals and Systems Linear Algebra in Engineering Lectures 13 15, Tuesday 8 th and Friday 11 th November 016 DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE

### Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

### Eigenvalues, Eigenvectors, and an Intro to PCA

Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.

### Proposition 42. Let M be an m n matrix. Then (32) N (M M)=N (M) (33) R(MM )=R(M)

RODICA D. COSTIN. Singular Value Decomposition.1. Rectangular matrices. For rectangular matrices M the notions of eigenvalue/vector cannot be defined. However, the products MM and/or M M (which are square,

### Linear Algebra Massoud Malek

CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

### Math Linear Algebra Final Exam Review Sheet

Math 15-1 Linear Algebra Final Exam Review Sheet Vector Operations Vector addition is a component-wise operation. Two vectors v and w may be added together as long as they contain the same number n of

### Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

### Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory: A Concise Introduction. Jiongmin Yong. Lecture Notes, October 11, 2017. Optimization

### Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

### Image Registration Lecture 2: Vectors and Matrices

Image Registration Lecture 2: Vectors and Matrices Prof. Charlene Tsai Lecture Overview Vectors Matrices Basics Orthogonal matrices Singular Value Decomposition (SVD) 2 1 Preliminary Comments Some of this

### Linear algebra and applications to graphs Part 1

Linear algebra and applications to graphs Part 1 Written up by Mikhail Belkin and Moon Duchin Instructor: Laszlo Babai June 17, 2001 1 Basic Linear Algebra Exercise 1.1 Let V and W be linear subspaces

### 1 Principal component analysis and dimensional reduction

Linear Algebra Working Group :: Day 3 Note: All vector spaces will be finite-dimensional vector spaces over the field R. 1 Principal component analysis and dimensional reduction Definition 1.1. Given an

### (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: $\dim N(A) \ge 1$, since $\operatorname{rank}(A) \le 3$. Ax =

(5 points) (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? $\dim N(A) \ge 1$, since $\operatorname{rank}(A) \le 3$. (b) If we also know that Ax = has no solution, what do we know about the rank of A? C(A)

### [POLS 8500] Review of Linear Algebra, Probability and Information Theory

[POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming

### 5.) For each of the given sets of vectors, determine whether or not the set spans R 3. Give reasons for your answers.

Linear Algebra - Test File - Spring Test # For problems - consider the following system of equations. x + y - z = x + y + 4z = x + y + 6z =.) Solve the system without using your calculator..) Find the

### The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

### 1 9/5 Matrices, vectors, and their applications

1 9/5 Matrices, vectors, and their applications Algebra: study of objects and operations on them. Linear algebra: object: matrices and vectors. operations: addition, multiplication etc. Algorithms/Geometric

### Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer

Universität Potsdam, Institut für Informatik, Lehrstuhl Maschinelles Lernen. PCA. Tobias Scheffer. Overview: Principal Component Analysis (PCA), Kernel-PCA, Fisher Linear Discriminant Analysis, t-SNE. PCA: Motivation

### Linear Algebra: Characteristic Value Problem

Linear Algebra: Characteristic Value Problem. The Characteristic Value Problem. Let $\mathbb{R}$ be the set of real numbers and $\mathbb{C}$ be the set of complex numbers. Given an $n \times n$ real matrix $A$, does there exist a number

### Quadratic forms. Here. Thus symmetric matrices are diagonalizable, and the diagonalization can be performed by means of an orthogonal matrix.

Quadratic forms. 1. Symmetric matrices. An $n \times n$ matrix $A = (a_{ij})_{i,j=1}^{n}$ with entries in $\mathbb{R}$ is called symmetric if $A = A^T$, that is, if $a_{ij} = a_{ji}$ for all $1 \le i, j \le n$. We denote by $S_n(\mathbb{R})$ the set of all $n \times n$ symmetric

### The Multivariate Gaussian Distribution

The Multivariate Gaussian Distribution. Chuong B. Do. October, 8. A vector-valued random variable $X = [X_1 \cdots X_n]^T$ is said to have a multivariate normal (or Gaussian) distribution with mean $\mu \in \mathbb{R}^n$ and covariance

### Linear Algebra- Final Exam Review

Linear Algebra - Final Exam Review. Let $A$ be invertible. Show that, if $v_1, v_2, v_3$ are linearly independent vectors, so are $Av_1, Av_2, Av_3$. NOTE: It should be clear from your answer that you know the definition.

### LECTURE 16: PCA AND SVD

Instructor: Sael Lee CS549 Computational Biology LECTURE 16: PCA AND SVD Resource: PCA Slide by Iyad Batal Chapter 12 of PRML Shlens, J. (2003). A tutorial on principal component analysis. CONTENT Principal

### Principal Component Analysis

Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

### Eigenvalues and Eigenvectors

Chapter 1 Eigenvalues and Eigenvectors Among problems in numerical linear algebra, the determination of the eigenvalues and eigenvectors of matrices is second in importance only to the solution of linear

### Linear Algebra and Eigenproblems

Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details

### L3: Review of linear algebra and MATLAB

L3: Review of linear algebra and MATLAB Vector and matrix notation Vectors Matrices Vector spaces Linear transformations Eigenvalues and eigenvectors MATLAB primer CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna

### Guaranteed Learning of Latent Variable Models through Spectral and Tensor Methods

Guaranteed Learning of Latent Variable Models through Spectral and Tensor Methods Anima Anandkumar U.C. Irvine Application 1: Clustering Basic operation of grouping data points. Hypothesis: each data point

### Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed

### Chap. 3. Controlled Systems, Controllability

Chap. 3. Controlled Systems, Controllability. 1. Controllability of Linear Systems. 1.1. Kalman's Criterion. Consider the linear system $\dot{x} = Ax + Bu$ where $x \in \mathbb{R}^n$ is the state vector and $u \in \mathbb{R}^m$ is the input vector. A :

### Lecture 9: SVD, Low Rank Approximation

CSE 521: Design and Analysis of Algorithms I, Spring 2016. Lecture 9: SVD, Low Rank Approximation. Lecturer: Shayan Oveis Gharan. April 25th. Scribe: Koosha Khalvati. Disclaimer: These notes have not been subjected

### 33A Linear Algebra and Applications: Practice Final Exam - Solutions

33A Linear Algebra and Applications: Practice Final Exam - Solutions. Question: Consider a plane $V$ in $\mathbb{R}^3$ with a basis given by v = and v =. Suppose x, y are both in V. (a) [3 points] If [ ] B =, find. (b)

### Positive Definite Matrix

1/29 Chia-Ping Chen Professor Department of Computer Science and Engineering National Sun Yat-sen University Linear Algebra Positive Definite, Negative Definite, Indefinite 2/29 Pure Quadratic Function

### Definition (T -invariant subspace) Example. Example

Eigenvalues, Eigenvectors, Similarity, and Diagonalization. We now turn our attention to linear transformations of the form $T : V \to V$. To better understand the effect of $T$ on the vector space $V$, we begin

### 1 Vectors. Notes for Bindel, Spring 2017 Numerical Analysis (CS 4220)

Notes for 2017-01-30 Most of mathematics is best learned by doing. Linear algebra is no exception. You have had a previous class in which you learned the basics of linear algebra, and you will have plenty

### BASIC ALGORITHMS IN LINEAR ALGEBRA. Matrices and Applications of Gaussian Elimination.

BASIC ALGORITHMS IN LINEAR ALGEBRA. STEVEN DALE CUTKOSKY. Matrices and Applications of Gaussian Elimination. Systems of Equations. Suppose that $A$ is an $n \times n$ matrix with coefficients in a field $F$, and x = (x,,

### Topic 2 Quiz 2. choice C implies B and B implies C. correct-choice C implies B, but B does not imply C

Topic 1 Quiz 1 text A reduced row-echelon form of a 3 by 4 matrix can have how many leading one s? choice must have 3 choice may have 1, 2, or 3 correct-choice may have 0, 1, 2, or 3 choice may have 0,

### ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA

ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA Kent State University Department of Mathematical Sciences Compiled and Maintained by Donald L. White Version: August 29, 2017 CONTENTS LINEAR ALGEBRA AND

### Figure 1: Examples of univariate Gaussian pdfs $N(x; \mu, \sigma^2)$.

Figure 1: Examples of univariate Gaussian pdfs $N(x; \mu, \sigma^2)$. The Gaussian distribution. Probably the most-important distribution in all of statistics

### Math 443 Differential Geometry Spring Handout 3: Bilinear and Quadratic Forms This handout should be read just before Chapter 4 of the textbook.

Math 443 Differential Geometry Spring 2013 Handout 3: Bilinear and Quadratic Forms This handout should be read just before Chapter 4 of the textbook. Endomorphisms of a Vector Space This handout discusses