Lecture 7 Spectral methods


CSE 291: Unsupervised learning — Spring 2008

7.1 Linear algebra review

7.1.1 Eigenvalues and eigenvectors

Definition 1. A d × d matrix M has eigenvalue λ if there is a d-dimensional vector u ≠ 0 for which Mu = λu. This u is the eigenvector corresponding to λ.

In other words, the linear transformation M maps vector u into the same direction. It is interesting that any linear transformation necessarily has directional fixed points of this kind. The following chain of implications helps in understanding this:

λ is an eigenvalue of M
⟺ there exists u ≠ 0 with Mu = λu
⟺ there exists u ≠ 0 with (M − λI)u = 0
⟺ (M − λI) is singular (that is, not invertible)
⟺ det(M − λI) = 0.

Now, det(M − λI) is a polynomial of degree d in λ. As such it has d roots (although some of them might be complex). This explains the existence of eigenvalues.

A case of great interest is when M is real-valued and symmetric, because then the eigenvalues are real.

Theorem 2. Let M be any real symmetric d × d matrix. Then:
1. M has d real eigenvalues λ_1, ..., λ_d (not necessarily distinct).
2. There is a set of d corresponding eigenvectors u_1, ..., u_d that constitute an orthonormal basis of R^d, that is, u_i · u_j = δ_ij for all i, j.

7.1.2 Spectral decomposition

The spectral decomposition recasts a matrix in terms of its eigenvalues and eigenvectors. This representation turns out to be enormously useful.

Theorem 3. Let M be a real symmetric d × d matrix with eigenvalues λ_1, ..., λ_d and corresponding orthonormal eigenvectors u_1, ..., u_d. Then:

1. M = QΛQ^T, where Q = [u_1 u_2 ⋯ u_d] is the d × d matrix whose columns are the eigenvectors, and Λ = diag(λ_1, ..., λ_d).
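Theorem 3 is easy to check numerically. A minimal sketch using numpy (not part of the original notes; `numpy.linalg.eigh` is the standard routine for symmetric eigenproblems):

```python
import numpy as np

# Build a random real symmetric matrix M.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = (A + A.T) / 2

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors
# for a symmetric matrix, exactly as promised by Theorem 2.
eigvals, Q = np.linalg.eigh(M)
Lam = np.diag(eigvals)

# Theorem 3, part 1: M = Q Lam Q^T.
assert np.allclose(M, Q @ Lam @ Q.T)

# The eigenvectors are orthonormal: Q^T Q = I.
assert np.allclose(Q.T @ Q, np.eye(4))
```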
2. M = Σ_{i=1}^d λ_i u_i u_i^T.

Proof. A general proof strategy is to observe that M represents a linear transformation x ↦ Mx on R^d, and as such is completely determined by its behavior on any set of d linearly independent vectors. For instance, {u_1, ..., u_d} are linearly independent, so any d × d matrix N that satisfies Nu_i = Mu_i (for all i) is necessarily identical to M.

Let's start by verifying (1). For practice, we'll do this two different ways.

Method One: For any i, we have QΛQ^T u_i = QΛe_i = Qλ_i e_i = λ_i Qe_i = λ_i u_i = Mu_i. Thus QΛQ^T = M.

Method Two: Since the u_i are orthonormal, we have Q^T Q = I. Thus Q is invertible, with Q^{−1} = Q^T; whereupon QQ^T = I. For any i,

Q^T MQe_i = Q^T Mu_i = Q^T λ_i u_i = λ_i Q^T u_i = λ_i e_i = Λe_i.

Thus Λ = Q^T MQ, which implies M = QΛQ^T.

Now for (2). Again we use the same proof strategy. For any j,

(Σ_i λ_i u_i u_i^T) u_j = λ_j u_j = Mu_j.

Hence M = Σ_i λ_i u_i u_i^T.

7.1.3 Positive semidefinite matrices

We now introduce an important subclass of real symmetric matrices.

Definition 4. A real symmetric d × d matrix M is positive semidefinite (denoted M ⪰ 0) if z^T Mz ≥ 0 for all z ∈ R^d. It is positive definite (denoted M ≻ 0) if z^T Mz > 0 for all nonzero z ∈ R^d.

Example 5. Consider any random vector X ∈ R^d, and let µ = EX and S = E[(X − µ)(X − µ)^T] denote its mean and covariance, respectively. Then S ⪰ 0, because for any z ∈ R^d,

z^T Sz = z^T E[(X − µ)(X − µ)^T] z = E[(z^T(X − µ))((X − µ)^T z)] = E[(z · (X − µ))^2] ≥ 0.

Positive (semi)definiteness is easily characterized in terms of eigenvalues.

Theorem 6. Let M be a real symmetric d × d matrix. Then:
1. M is positive semidefinite iff all its eigenvalues λ_i ≥ 0.
2. M is positive definite iff all its eigenvalues λ_i > 0.

Proof. Let's prove (1) (the second is similar). Let λ_1, ..., λ_d be the eigenvalues of M, with corresponding eigenvectors u_1, ..., u_d. First, suppose M ⪰ 0. Then for all i, λ_i = u_i^T Mu_i ≥ 0. Conversely, suppose that all the λ_i ≥ 0. Then for any z ∈ R^d, we have

z^T Mz = z^T (Σ_i λ_i u_i u_i^T) z = Σ_i λ_i (z · u_i)^2 ≥ 0.
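Example 5 and Theorem 6 can be seen together in a small numerical experiment (illustrative only, not from the notes): an empirical covariance matrix is positive semidefinite, so all of its eigenvalues are nonnegative.

```python
import numpy as np

# 500 samples of a random vector in R^3.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
S = np.cov(X, rowvar=False)        # 3 x 3 empirical covariance matrix

# Theorem 6: S is PSD, so every eigenvalue is >= 0 (up to roundoff).
eigvals = np.linalg.eigvalsh(S)
assert np.all(eigvals >= -1e-10)

# Definition 4 directly: z^T S z >= 0 for an arbitrary z.
z = rng.standard_normal(3)
assert z @ S @ z >= -1e-10
```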
7.1.4 The Rayleigh quotient

One of the reasons why eigenvalues are so useful is that they constitute the optimal solution of a very basic quadratic optimization problem.

Theorem 7. Let M be a real symmetric d × d matrix with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d, and corresponding eigenvectors u_1, ..., u_d. Then:

max_{‖z‖=1} z^T Mz = max_{z≠0} (z^T Mz)/(z^T z) = λ_1,
min_{‖z‖=1} z^T Mz = min_{z≠0} (z^T Mz)/(z^T z) = λ_d,

and these are realized at z = u_1 and z = u_d, respectively.

Proof. Denote the spectral decomposition by M = QΛQ^T. Then:

max_{z≠0} (z^T Mz)/(z^T z)
= max_{z≠0} (z^T QΛQ^T z)/(z^T QQ^T z)   (since QQ^T = I)
= max_{y≠0} (y^T Λy)/(y^T y)   (writing y = Q^T z)
= max_{y≠0} (λ_1 y_1^2 + ... + λ_d y_d^2)/(y_1^2 + ... + y_d^2)
≤ λ_1,

where equality is attained in the last step when y = e_1, that is, z = Qe_1 = u_1. The argument for the minimum is identical.

Example 8. Suppose random vector X ∈ R^d has mean µ and covariance matrix M. Then z^T Mz represents the variance of X in direction z:

var(z^T X) = E[(z^T(X − µ))^2] = E[z^T(X − µ)(X − µ)^T z] = z^T Mz.

Theorem 7 tells us that the direction of maximum variance is u_1, and that of minimum variance is u_d.

Continuing with this example, suppose that we are interested in the k-dimensional subspace (of R^d) that has the most variance. How can this be formalized? To start with, we will think of a linear projection from R^d to R^k as a function x ↦ P^T x, where P^T is a k × d matrix with P^T P = I_k. The last condition simply says that the rows of the projection matrix are orthonormal. When a random vector X ∈ R^d is subjected to such a projection, the resulting k-dimensional vector has covariance matrix

cov(P^T X) = E[P^T(X − µ)(X − µ)^T P] = P^T MP.

Often we want to summarize the variance by just a single number rather than an entire matrix; in such cases, we typically use the trace of this matrix, and we write var(P^T X) = tr(P^T MP). This is also equal to E‖P^T X − P^T µ‖^2. With this terminology established, we can now determine the projection P^T that maximizes this variance.

Theorem 9.
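The bounds of Theorem 7 can be probed empirically; a sketch (illustrative, not from the notes) that checks the Rayleigh quotient of many random unit vectors against λ_1 and λ_d:

```python
import numpy as np

# A random real symmetric matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
M = (A + A.T) / 2

eigvals = np.linalg.eigvalsh(M)    # ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

# Theorem 7: for every unit vector z, lam_d <= z^T M z <= lam_1.
for _ in range(1000):
    z = rng.standard_normal(5)
    z /= np.linalg.norm(z)
    r = z @ M @ z                  # Rayleigh quotient at ||z|| = 1
    assert lam_min - 1e-9 <= r <= lam_max + 1e-9
```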
Let M be a real symmetric d × d matrix as in Theorem 7. Pick any k ≤ d. Then:

max_{P ∈ R^{d×k}, P^T P = I} tr(P^T MP) = λ_1 + ... + λ_k,
min_{P ∈ R^{d×k}, P^T P = I} tr(P^T MP) = λ_{d−k+1} + ... + λ_d.
These are realized when the columns of P span the k-dimensional subspace spanned by {u_1, ..., u_k} and {u_{d−k+1}, ..., u_d}, respectively.

Proof. We will prove the result for the maximum; the other case is symmetric. Let p_1, ..., p_k denote the columns of P. Then

tr(P^T MP) = Σ_{i=1}^k p_i^T Mp_i = Σ_{i=1}^k p_i^T (Σ_{j=1}^d λ_j u_j u_j^T) p_i = Σ_{j=1}^d λ_j Σ_{i=1}^k (p_i · u_j)^2.

We will show that this quantity is at most λ_1 + ... + λ_k. To this end, let z_j denote Σ_{i=1}^k (p_i · u_j)^2; clearly it is nonnegative. We will show that Σ_j z_j = k and that each z_j ≤ 1; the desired bound is then immediate.

First,

Σ_{j=1}^d z_j = Σ_{j=1}^d Σ_{i=1}^k (p_i · u_j)^2 = Σ_{i=1}^k Σ_{j=1}^d p_i^T u_j u_j^T p_i = Σ_{i=1}^k p_i^T QQ^T p_i = Σ_{i=1}^k ‖p_i‖^2 = k.

To upper-bound an individual z_j, start by extending the k orthonormal vectors p_1, ..., p_k to a full orthonormal basis p_1, ..., p_d of R^d. Then

z_j = Σ_{i=1}^k (p_i · u_j)^2 ≤ Σ_{i=1}^d (p_i · u_j)^2 = Σ_{i=1}^d u_j^T p_i p_i^T u_j = ‖u_j‖^2 = 1.

It then follows that

tr(P^T MP) = Σ_{j=1}^d λ_j z_j ≤ λ_1 + ... + λ_k,

and equality holds when p_1, ..., p_k span the same space as u_1, ..., u_k.

7.2 Principal component analysis

Let X ∈ R^d be a random vector. We wish to find the single direction that captures as much as possible of the variance of X. Formally: we want p ∈ R^d (the direction) with ‖p‖ = 1, chosen so as to maximize var(p^T X).

Theorem 10. The solution to this optimization problem is to make p the principal eigenvector of cov(X).

Proof. Denote µ = EX and S = cov(X) = E[(X − µ)(X − µ)^T]. For any p ∈ R^d, the projection p^T X has mean E[p^T X] = p^T µ and variance

var(p^T X) = E[(p^T X − p^T µ)^2] = E[p^T(X − µ)(X − µ)^T p] = p^T Sp.

By Theorem 7, this is maximized (over all unit-length p) when p is the principal eigenvector of S.

Likewise, the k-dimensional subspace that captures as much as possible of the variance of X is simply the subspace spanned by the top k eigenvectors of cov(X); call these u_1, ..., u_k. Projection onto these eigenvectors is called principal component analysis (PCA). It can be used to reduce the dimension of the data from d to k.
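Theorem 9 also checks out numerically. A sketch (illustrative, not from the notes): with P built from the top-k eigenvectors, tr(P^T MP) equals λ_1 + ... + λ_k, and no random orthonormal P does better.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
M = (A + A.T) / 2
k = 2

eigvals, Q = np.linalg.eigh(M)         # eigenvalues in ascending order
top_sum = eigvals[-k:].sum()           # lambda_1 + ... + lambda_k

# The maximizer: columns of P are the top-k eigenvectors.
P_best = Q[:, -k:]
assert np.isclose(np.trace(P_best.T @ M @ P_best), top_sum)

# Random d x k matrices with orthonormal columns (via QR) never exceed it.
for _ in range(200):
    P, _ = np.linalg.qr(rng.standard_normal((6, k)))
    assert np.trace(P.T @ M @ P) <= top_sum + 1e-9
```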
Here are the steps:
1. Compute the mean µ and covariance matrix S of the data X.
2. Compute the top k eigenvectors u_1, ..., u_k of S.
3. Project X ↦ P^T X, where P^T is the k × d matrix whose rows are u_1, ..., u_k.
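The three steps above can be sketched directly in numpy. This is an illustrative implementation, not code from the notes; the function name is made up, and it centers the data before projecting (the usual convention, matching the affine view in the next section).

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of data matrix X (one sample per row) to R^k."""
    mu = X.mean(axis=0)                 # step 1: mean ...
    S = np.cov(X, rowvar=False)         # ... and covariance
    eigvals, Q = np.linalg.eigh(S)      # step 2: eigh gives ascending order,
    P_T = Q[:, -k:][:, ::-1].T          # so take the LAST k columns as rows
    return (X - mu) @ P_T.T             # step 3: project centered data

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 5))
Y = pca_project(X, k=2)
assert Y.shape == (200, 2)
```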
7.2.1 The best approximating affine subspace

We've seen one optimality property of PCA. Here's another: it gives the k-dimensional affine subspace that best approximates X, in the sense that the expected squared distance from X to the subspace is minimized.

Let's formalize the problem. A k-dimensional affine subspace is given by a displacement p_0 ∈ R^d and a set of orthonormal basis vectors p_1, ..., p_k ∈ R^d. The subspace itself is then {p_0 + α_1 p_1 + ... + α_k p_k : α_i ∈ R}.

[Figure: an affine subspace through displacement p_0, spanned by basis vectors p_1, p_2, offset from the origin.]

The projection of X ∈ R^d onto this subspace is P^T X + p_0, where P^T is the k × d matrix whose rows are p_1, ..., p_k. Thus, the expected squared distance from X to this subspace is E‖X − (P^T X + p_0)‖^2. We wish to find the subspace for which this is minimized.

Theorem 11. Let µ and S denote the mean and covariance of X, respectively. The solution of this optimization problem is to choose p_1, ..., p_k to be the top k eigenvectors of S and to set p_0 = (I − P^T)µ.

Proof. Fix any matrix P; the choice of p_0 that minimizes E‖X − (P^T X + p_0)‖^2 is (by calculus) p_0 = E[X − P^T X] = (I − P^T)µ. Now let's optimize P. Our cost function is

E‖X − (P^T X + p_0)‖^2 = E‖(I − P^T)(X − µ)‖^2 = E‖X − µ‖^2 − E‖P^T(X − µ)‖^2,

where the second step is simply an invocation of the Pythagorean theorem.

[Figure: the right triangle formed by X − µ, its projection P^T(X − µ), and the residual.]

Therefore, we need to maximize E‖P^T(X − µ)‖^2 = var(P^T X), and we've already seen how to do this in Theorem 10 and the ensuing discussion.
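Theorem 11 in miniature, as a numerical sketch (illustrative, not from the notes): on sample data, the affine subspace through the mean spanned by the top-k eigenvectors reconstructs the points at least as well as one spanned by a random orthonormal basis.

```python
import numpy as np

rng = np.random.default_rng(5)
# Anisotropic data so the choice of subspace matters.
X = rng.standard_normal((300, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)
k = 2

_, Q = np.linalg.eigh(S)
P = Q[:, -k:]                          # top-k eigenvectors as columns

def recon_error(B):
    # Mean squared distance from each point to the affine subspace
    # through mu spanned by the (orthonormal) columns of B.
    Z = X - mu
    return np.mean(np.sum((Z - Z @ B @ B.T) ** 2, axis=1))

# A random orthonormal basis, for comparison.
R, _ = np.linalg.qr(rng.standard_normal((4, k)))
assert recon_error(P) <= recon_error(R) + 1e-9
```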
7.2.2 The projection that best preserves interpoint distances

Suppose we want to find the k-dimensional projection that minimizes the expected distortion in interpoint distances. More precisely, we want the k × d projection matrix P^T (with P^T P = I_k) such that, for i.i.d. random vectors X and Y, the expected squared distortion

E[‖X − Y‖^2 − ‖P^T X − P^T Y‖^2]

is minimized (of course, the term in brackets is always nonnegative, since a projection can only shrink distances).

Theorem 12. The solution is to make the rows of P^T the top k eigenvectors of cov(X).

Proof. This time we want to maximize E‖P^T X − P^T Y‖^2 = 2 E‖P^T X − P^T µ‖^2 = 2 var(P^T X), and once again we're back to our original problem.

This is emphatically not the same as finding the linear transformation (that is, not necessarily a projection) P^T for which E[‖X − Y‖^2 − ‖P^T X − P^T Y‖^2] is minimized. The random projection method that we saw earlier falls in this latter camp, because it consists of a projection followed by a scaling by √(d/k).

7.2.3 A prelude to k-means clustering

Suppose that for random vector X ∈ R^d, the optimal k-means centers are µ_1, ..., µ_k, with cost

opt = E‖X − (nearest µ_i)‖^2.

If instead we project X into the k-dimensional PCA subspace and find the best k centers µ'_1, ..., µ'_k in that subspace, how bad can these centers be?

Theorem 13. cost(µ'_1, ..., µ'_k) ≤ 2·opt.

Proof. Without loss of generality EX = 0 and the PCA mapping is X ↦ P^T X. Since µ'_1, ..., µ'_k are the best centers for P^T X, it follows that

E‖P^T X − (nearest µ'_i)‖^2 ≤ E‖P^T(X − (nearest µ_i))‖^2 ≤ E‖X − (nearest µ_i)‖^2 = opt.

Let X ↦ A^T X denote projection onto the subspace spanned by µ_1, ..., µ_k. From our earlier result on approximating affine subspaces, we know that E‖X − P^T X‖^2 ≤ E‖X − A^T X‖^2. Thus

cost(µ'_1, ..., µ'_k) = E‖X − (nearest µ'_i)‖^2
= E‖P^T X − (nearest µ'_i)‖^2 + E‖X − P^T X‖^2   (Pythagorean theorem)
≤ opt + E‖X − A^T X‖^2
≤ opt + E‖X − (nearest µ_i)‖^2 = 2·opt.
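Theorem 12 can be illustrated on sample data (a sketch, not from the notes): the average squared distortion over all pairs is nonnegative for any projection, and the PCA projection incurs no more of it than a random one.

```python
import numpy as np

rng = np.random.default_rng(6)
# Anisotropic data in R^5 so the projections differ noticeably.
X = rng.standard_normal((150, 5)) @ np.diag([3.0, 2.0, 1.0, 0.3, 0.1])
k = 2

S = np.cov(X, rowvar=False)
_, Q = np.linalg.eigh(S)
P_pca = Q[:, -k:]                              # top-k eigenvectors, columns
P_rand, _ = np.linalg.qr(rng.standard_normal((5, k)))

def avg_distortion(P):
    # Average of ||x - y||^2 - ||P^T x - P^T y||^2 over all sample pairs.
    total, n = 0.0, len(X)
    for i in range(n):
        d_full = np.sum((X - X[i]) ** 2, axis=1)
        d_proj = np.sum(((X - X[i]) @ P) ** 2, axis=1)
        total += np.sum(d_full - d_proj)
    return total / (n * n)

assert avg_distortion(P_pca) >= -1e-9          # projections only shrink distances
assert avg_distortion(P_pca) <= avg_distortion(P_rand) + 1e-9
```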
7.3 Singular value decomposition

For any real symmetric d × d matrix M, we can find its eigenvalues λ_1 ≥ ... ≥ λ_d and corresponding orthonormal eigenvectors u_1, ..., u_d, and write

M = Σ_{i=1}^d λ_i u_i u_i^T.
The best rank-k approximation to M is

M_k = Σ_{i=1}^k λ_i u_i u_i^T,

in the sense that this minimizes ‖M − M_k‖_F^2 over all rank-k matrices. (Here ‖·‖_F denotes the Frobenius norm; it is the same as the L_2 norm if you imagine the matrix rearranged into a very long vector.) In many applications, M_k is an adequate approximation of M even for fairly small values of k. And it is conveniently compact, of size O(kd).

But what if we are dealing with a matrix M that is not square; say it is m × n with m ≤ n? To find a compact approximation in such cases, we look at M^T M or MM^T, which are square. Eigendecompositions of these matrices lead to a good representation of M.

7.3.1 The relationship between MM^T and M^T M

Lemma 14. M^T M and MM^T are symmetric positive semidefinite matrices.

Proof. We'll do M^T M; the other is similar. First off, it is symmetric:

(M^T M)_ij = Σ_k (M^T)_ik M_kj = Σ_k M_ki M_kj = Σ_k (M^T)_jk M_ki = (M^T M)_ji.

Next, M^T M ⪰ 0, since for any z ∈ R^n we have z^T M^T Mz = ‖Mz‖^2 ≥ 0.

Which one should we use, M^T M or MM^T? Well, they are of different sizes: n × n and m × m, respectively. Ideally, we'd prefer to deal with the smaller of the two, MM^T, especially since eigenvalue computations are expensive. Fortunately, it turns out the two matrices have the same (nonzero) eigenvalues!

Lemma 15. If λ is an eigenvalue of M^T M with eigenvector u, then either: (i) λ is an eigenvalue of MM^T with eigenvector Mu, or (ii) λ = 0 and Mu = 0.
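Lemma 15 has a quick numerical illustration (a sketch, not from the notes): for a 3 × 5 matrix, the 5 × 5 matrix M^T M carries two extra zero eigenvalues, and its remaining eigenvalues match those of the 3 × 3 matrix MM^T.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 5
M = rng.standard_normal((m, n))

ev_small = np.sort(np.linalg.eigvalsh(M @ M.T))   # m eigenvalues of MM^T
ev_big = np.sort(np.linalg.eigvalsh(M.T @ M))     # n eigenvalues of M^T M

# The n x n matrix has (at least) n - m zero eigenvalues ...
assert np.allclose(ev_big[: n - m], 0.0, atol=1e-9)
# ... and its remaining m eigenvalues match those of the m x m matrix.
assert np.allclose(ev_big[n - m :], ev_small)
```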
Proof. Say λ ≠ 0; we'll prove that condition (i) holds. First of all, M^T Mu = λu ≠ 0, so certainly Mu ≠ 0. It is an eigenvector of MM^T with eigenvalue λ, since

MM^T(Mu) = M(M^T Mu) = M(λu) = λ(Mu).

Next, suppose λ = 0; we'll establish condition (ii). Notice that

‖Mu‖^2 = u^T M^T Mu = u^T(M^T Mu) = u^T(λu) = 0.

Thus it must be the case that Mu = 0.

7.3.2 A spectral decomposition for rectangular matrices

Let's summarize the consequences of Lemma 15. We have two square matrices, a large one (M^T M) of size n × n and a smaller one (MM^T) of size m × m. Let the eigenvalues of the large matrix be λ_1 ≥ λ_2 ≥ ... ≥ λ_n, with corresponding orthonormal eigenvectors u_1, ..., u_n. From the lemma, we know that at most m of the eigenvalues are nonzero. The smaller matrix MM^T has eigenvalues λ_1, ..., λ_m, and corresponding orthonormal eigenvectors v_1, ..., v_m.

The lemma suggests that v_i = Mu_i; this is certainly a valid set of eigenvectors, but they are not necessarily normalized to unit length. So instead we set

v_i = Mu_i / ‖Mu_i‖ = Mu_i / √(u_i^T M^T Mu_i) = Mu_i / √λ_i.

This finally gives us the singular value decomposition, a spectral decomposition for general matrices.

Theorem 16. Let M be a rectangular m × n matrix with m ≤ n. Define λ_i, u_i, v_i as above. Then M = Q_1 Σ Q_2^T, where Q_1 = [v_1 v_2 ⋯ v_m] is of size m × m; Σ is the m × n matrix with √λ_1, ..., √λ_m on its diagonal and zeros elsewhere; and Q_2 = [u_1 u_2 ⋯ u_n] is of size n × n.

Proof. We will check that Σ = Q_1^T MQ_2. By our proof strategy from Theorem 3, it is enough to verify that both sides have the same effect on e_i for all 1 ≤ i ≤ n. For any such i,

Q_1^T MQ_2 e_i = Q_1^T Mu_i = { Q_1^T √λ_i v_i = √λ_i e_i  if i ≤ m;  0  if i > m } = Σe_i.

The alternative form of the singular value decomposition is

M = Σ_{i=1}^m √λ_i v_i u_i^T,

which immediately yields a rank-k approximation

M_k = Σ_{i=1}^k √λ_i v_i u_i^T.

As in the square case, M_k is the rank-k matrix that minimizes ‖M − M_k‖_F^2.
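The whole construction of Theorem 16 is what `numpy.linalg.svd` computes. A closing sketch (illustrative, not from the notes) connecting the pieces: the singular values are the square roots of the eigenvalues of MM^T, and truncating the sum gives the rank-k approximation M_k.

```python
import numpy as np

rng = np.random.default_rng(8)
M = rng.standard_normal((3, 5))        # m = 3, n = 5, so m <= n

# full_matrices=False gives the compact form: U is 3x3, Vt is 3x5.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Theorem 16: M = Q1 Sigma Q2^T.
assert np.allclose(M, U @ np.diag(s) @ Vt)

# Singular values = sqrt of the eigenvalues of M M^T (descending order).
ev = np.sort(np.linalg.eigvalsh(M @ M.T))[::-1]
assert np.allclose(s ** 2, ev)

# Rank-k approximation: keep only the top-k singular triples.
k = 2
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(M_k) == k
```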
. (5 points) (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? dim N(A), since rank(a) 3. (b) If we also know that Ax = has no solution, what do we know about the rank of A? C(A)
More information[POLS 8500] Review of Linear Algebra, Probability and Information Theory
[POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming
More information5.) For each of the given sets of vectors, determine whether or not the set spans R 3. Give reasons for your answers.
Linear Algebra  Test File  Spring Test # For problems  consider the following system of equations. x + y  z = x + y + 4z = x + y + 6z =.) Solve the system without using your calculator..) Find the
More informationThe University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.
The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational
More information1 9/5 Matrices, vectors, and their applications
1 9/5 Matrices, vectors, and their applications Algebra: study of objects and operations on them. Linear algebra: object: matrices and vectors. operations: addition, multiplication etc. Algorithms/Geometric
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer Overview Principal Component Analysis (PCA) KernelPCA Fisher Linear Discriminant Analysis tsne 2 PCA: Motivation
More informationLinear Algebra: Characteristic Value Problem
Linear Algebra: Characteristic Value Problem . The Characteristic Value Problem Let < be the set of real numbers and { be the set of complex numbers. Given an n n real matrix A; does there exist a number
More informationQuadratic forms. Here. Thus symmetric matrices are diagonalizable, and the diagonalization can be performed by means of an orthogonal matrix.
Quadratic forms 1. Symmetric matrices An n n matrix (a ij ) n ij=1 with entries on R is called symmetric if A T, that is, if a ij = a ji for all 1 i, j n. We denote by S n (R) the set of all n n symmetric
More informationThe Multivariate Gaussian Distribution
The Multivariate Gaussian Distribution Chuong B. Do October, 8 A vectorvalued random variable X = T X X n is said to have a multivariate normal or Gaussian) distribution with mean µ R n and covariance
More informationLinear Algebra Final Exam Review
Linear Algebra Final Exam Review. Let A be invertible. Show that, if v, v, v 3 are linearly independent vectors, so are Av, Av, Av 3. NOTE: It should be clear from your answer that you know the definition.
More informationLECTURE 16: PCA AND SVD
Instructor: Sael Lee CS549 Computational Biology LECTURE 16: PCA AND SVD Resource: PCA Slide by Iyad Batal Chapter 12 of PRML Shlens, J. (2003). A tutorial on principal component analysis. CONTENT Principal
More informationPrincipal Component Analysis
Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand
More informationEigenvalues and Eigenvectors
Chapter 1 Eigenvalues and Eigenvectors Among problems in numerical linear algebra, the determination of the eigenvalues and eigenvectors of matrices is second in importance only to the solution of linear
More informationLinear Algebra and Eigenproblems
Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details
More informationL3: Review of linear algebra and MATLAB
L3: Review of linear algebra and MATLAB Vector and matrix notation Vectors Matrices Vector spaces Linear transformations Eigenvalues and eigenvectors MATLAB primer CSCE 666 Pattern Analysis Ricardo GutierrezOsuna
More informationGuaranteed Learning of Latent Variable Models through Spectral and Tensor Methods
Guaranteed Learning of Latent Variable Models through Spectral and Tensor Methods Anima Anandkumar U.C. Irvine Application 1: Clustering Basic operation of grouping data points. Hypothesis: each data point
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 111 Recap  Curse of dimensionality Assume 5000 points uniformly distributed
More informationChap. 3. Controlled Systems, Controllability
Chap. 3. Controlled Systems, Controllability 1. Controllability of Linear Systems 1.1. Kalman s Criterion Consider the linear system ẋ = Ax + Bu where x R n : state vector and u R m : input vector. A :
More informationLecture 9: SVD, Low Rank Approximation
CSE 521: Design and Analysis of Algorithms I Spring 2016 Lecture 9: SVD, Low Rank Approimation Lecturer: Shayan Oveis Gharan April 25th Scribe: Koosha Khalvati Disclaimer: hese notes have not been subjected
More information33A Linear Algebra and Applications: Practice Final Exam  Solutions
33A Linear Algebra and Applications: Practice Final Eam  Solutions Question Consider a plane V in R 3 with a basis given by v = and v =. Suppose, y are both in V. (a) [3 points] If [ ] B =, find. (b)
More informationPositive Definite Matrix
1/29 ChiaPing Chen Professor Department of Computer Science and Engineering National Sun Yatsen University Linear Algebra Positive Definite, Negative Definite, Indefinite 2/29 Pure Quadratic Function
More informationDefinition (T invariant subspace) Example. Example
Eigenvalues, Eigenvectors, Similarity, and Diagonalization We now turn our attention to linear transformations of the form T : V V. To better understand the effect of T on the vector space V, we begin
More information1 Vectors. Notes for Bindel, Spring 2017 Numerical Analysis (CS 4220)
Notes for 20170130 Most of mathematics is best learned by doing. Linear algebra is no exception. You have had a previous class in which you learned the basics of linear algebra, and you will have plenty
More informationBASIC ALGORITHMS IN LINEAR ALGEBRA. Matrices and Applications of Gaussian Elimination. A 2 x. A T m x. A 1 x A T 1. A m x
BASIC ALGORITHMS IN LINEAR ALGEBRA STEVEN DALE CUTKOSKY Matrices and Applications of Gaussian Elimination Systems of Equations Suppose that A is an n n matrix with coefficents in a field F, and x = (x,,
More informationTopic 2 Quiz 2. choice C implies B and B implies C. correctchoice C implies B, but B does not imply C
Topic 1 Quiz 1 text A reduced rowechelon form of a 3 by 4 matrix can have how many leading one s? choice must have 3 choice may have 1, 2, or 3 correctchoice may have 0, 1, 2, or 3 choice may have 0,
More informationALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA
ALGEBRA QUALIFYING EXAM PROBLEMS LINEAR ALGEBRA Kent State University Department of Mathematical Sciences Compiled and Maintained by Donald L. White Version: August 29, 2017 CONTENTS LINEAR ALGEBRA AND
More informationx. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).
.8.6 µ =, σ = 1 µ = 1, σ = 1 / µ =, σ =.. 3 1 1 3 x Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ ). The Gaussian distribution Probably the mostimportant distribution in all of statistics
More informationMath 443 Differential Geometry Spring Handout 3: Bilinear and Quadratic Forms This handout should be read just before Chapter 4 of the textbook.
Math 443 Differential Geometry Spring 2013 Handout 3: Bilinear and Quadratic Forms This handout should be read just before Chapter 4 of the textbook. Endomorphisms of a Vector Space This handout discusses
More informationLinear Algebra Primer
Linear Algebra Primer David Doria daviddoria@gmail.com Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................
More information18.S34 linear algebra problems (2007)
18.S34 linear algebra problems (2007) Useful ideas for evaluating determinants 1. Row reduction, expanding by minors, or combinations thereof; sometimes these are useful in combination with an induction
More information