Fourier PCA. Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech)

1 Fourier PCA Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech)

2 Introduction 1. Describe a learning problem. 2. Develop an efficient tensor decomposition.

3 Independent component analysis We see independent samples $x = As$: $s \in \mathbb{R}^m$ is a random vector with independent coordinates. The variables $s_i$ are not Gaussian. $A \in \mathbb{R}^{n \times m}$ is a fixed matrix of full row rank. Each column $A_i \in \mathbb{R}^n$ has unit norm. The goal is to compute $A$.

4 ICA: start with independent random vector s

5 ICA: independent samples

6 ICA: but under the map A

7 ICA: goal is to recover A from samples only.

8 Applications Matrix A gives n linear measurements of m random variables. General dimensionality reduction tool in statistics and machine learning [HTF01]. Gained traction in deep belief networks [LKNN12]. Blind source separation and deconvolution in signal processing [HKO01]. More practically: finance [KO96], biology [VSJHO00] and MRI [KOHFY10].

9 Status Jutten and Herault (1991) formalised this problem. It was first studied in our community by [FJK96]. Provably good algorithms: [AGMS12] and [AGHKT12]. Many algorithms have been proposed in the signal processing literature [HKO01].

10 Standard approaches: PCA. Define a contrast function whose optima are the columns $A_j$. The second moment $E\big((u^T x)^2\big)$ is usual PCA. This only succeeds when all the eigenvalues of the covariance matrix are different. But any distribution can be put into isotropic position, after which the second moment carries no information about $A$.
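
To make the failure mode concrete, here is a minimal numpy sketch (not from the talk; the sources, mixing matrix, and sample size are made-up assumptions) of PCA as the second-moment contrast. After whitening, i.e. placing the data in isotropic position, the covariance is the identity and retains no information about the directions of $A$.

```python
import numpy as np

# Minimal sketch: PCA as the second-moment contrast E[(u^T x)^2], i.e. the
# eigenvectors of the empirical covariance.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 10000))              # independent non-Gaussian sources
A = np.array([[2.0, 0.5], [0.3, 1.0]])        # hypothetical mixing matrix
x = A @ s                                     # observed samples (one per column)

eigvals, eigvecs = np.linalg.eigh(np.cov(x))  # PCA directions

# Whitening puts the data in isotropic position: the covariance becomes the
# identity, so the second moment no longer carries information about A.
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
print(np.round(np.cov(W @ x), 2))             # ~ identity matrix
```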

11 Standard approaches: fourth moments. Define a contrast function whose optima are the columns $A_j$. Fourth moment $E\big((u^T x)^4\big)$. Tensor decomposition:
$$T = \sum_{j=1}^m \lambda_j \, A_j \otimes A_j \otimes A_j \otimes A_j.$$
In case the $A_j$ are orthonormal, they are the local optima of
$$T(v, v, v, v) = \sum_{i,j,k,l} T_{ijkl}\, v_i v_j v_k v_l \quad \text{where } \|v\| = 1.$$
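
As a small illustration (assumed setup, not code from the talk), the empirical fourth-moment tensor and the degree-4 contrast $T(v,v,v,v)$ can be evaluated directly with `einsum`:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.laplace(size=(10000, 3))                 # placeholder data, one sample per row

# Empirical fourth-moment tensor T_{ijkl} = E[x_i x_j x_k x_l].
T = np.einsum('ni,nj,nk,nl->ijkl', x, x, x, x) / x.shape[0]

# Degree-4 contrast T(v, v, v, v) on the unit sphere.
v = rng.normal(size=3)
v /= np.linalg.norm(v)
print(np.einsum('ijkl,i,j,k,l->', T, v, v, v, v))
```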

12 Standard assumptions for $x = As$. All algorithms require: 1. $A$ is full rank and $n \times n$: as many measurements as underlying variables. 2. Each $s_i$ differs from a Gaussian in the fourth moment: $E(s_i) = 0$, $E(s_i^2) = 1$, $E(s_i^4) \neq 3$. Note: this precludes the underdetermined case when $A \in \mathbb{R}^{n \times m}$ is fat.

13 Our results. We require neither of the standard assumptions. Underdetermined: $A \in \mathbb{R}^{n \times m}$ is a fat matrix with $n \ll m$. Any moment: $E(s_i^r) \neq E(z^r)$ where $z \sim N(0, 1)$.
Theorem (Informal). Let $x = As$ be an underdetermined ICA model. Let $d \in 2\mathbb{N}$ be such that $\sigma_m\big(\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]_{i=1}^m\big) > 0$. Suppose that for each $s_i$, one of its first $k$ cumulants satisfies $|\mathrm{cum}_{k_i}(s_i)| \geq \Delta$. Then one can recover the columns of $A$ up to $\epsilon$ accuracy in polynomial time.

14 Underdetermined ICA: start with a distribution over $\mathbb{R}^m$

15 Underdetermined ICA: independent samples s

16 Underdetermined ICA: A first rotates/scales

17 Underdetermined ICA: then A projects down to $\mathbb{R}^n$

18 Fully determined ICA: a nice algorithm.
1. (Fourier weights) Pick a random vector $u$ from $N(0, \sigma^2 I_n)$. For every $x$, compute its Fourier weight
$$w(x) = \frac{e^{iu^T x}}{\sum_{x \in S} e^{iu^T x}}.$$
2. (Reweighted covariance) Compute the mean and covariance of the points $x$ reweighted by $w(x)$:
$$\mu_u = \frac{1}{|S|}\sum_{x \in S} w(x)\, x \quad \text{and} \quad \Sigma_u = \frac{1}{|S|}\sum_{x \in S} w(x)\,(x - \mu_u)(x - \mu_u)^T.$$
3. Compute the eigenvectors $V$ of $\Sigma_u$.
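
A minimal numpy sketch of these three steps is below. The setup is assumed rather than taken from the slides: Laplace sources with unit variance, an orthonormal mixing matrix (so that the eigenvectors of the reweighted covariance line up with the columns of $A$ without a separate whitening step), and an arbitrary sample size and $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 200000
S = rng.laplace(size=(N, n)) / np.sqrt(2)      # independent unit-variance sources
A, _ = np.linalg.qr(rng.normal(size=(n, n)))   # hypothetical orthonormal mixing matrix
X = S @ A.T                                    # samples x = A s, one per row

# 1. Fourier weights for a random direction u ~ N(0, sigma^2 I_n).
sigma = 1.0
u = rng.normal(scale=sigma, size=n)
w = np.exp(1j * X @ u)
w /= w.sum()

# 2. Reweighted (complex) mean and covariance.
mu = w @ X
C = X - mu
Sigma_u = (w[:, None] * C).T @ C

# 3. Eigenvectors of Sigma_u estimate the columns of A (up to order and phase).
_, V = np.linalg.eig(Sigma_u)
print(np.round(np.abs(A.T @ V), 2))            # ~ a permutation matrix
```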

19 Why does this work?
1. The Fourier differentiation/multiplication relationship: $\widehat{f'}(u) = (2\pi i u)\hat{f}(u)$.
2. Actually we consider the log of the Fourier transform:
$$D^2 \log E\big(\exp(iu^T x)\big) = A\,\mathrm{diag}\big(g_j(A_j^T u)\big)\,A^T,$$
where $g_j(t)$ is the second derivative of $\log\big(E(\exp(it s_j))\big)$.

20 Technical overview. Fundamental analytic tools: the second characteristic function $\psi(u) = \log E\big(\exp(iu^T x)\big)$. Estimate the order-$d$ derivative tensor field $D^d\psi$ from samples. Evaluate $D^d\psi$ at two randomly chosen $u, v \in \mathbb{R}^n$ to give two tensors $T_u$ and $T_v$. Perform a tensor decomposition on $T_u$ and $T_v$ to obtain $A$.

21 First derivative. Easy case $A = I_n$:
$$\psi(u) = \log E\big(\exp(iu^T x)\big) = \log E\big(\exp(iu^T s)\big).$$
Thus:
$$\frac{\partial \psi}{\partial u_1} = \frac{E\big(i s_1 \exp(iu^T s)\big)}{E\big(\exp(iu^T s)\big)} = \frac{E\big(i s_1 \exp(iu_1 s_1)\big)\prod_{j=2}^n E\big(\exp(iu_j s_j)\big)}{\prod_{j=1}^n E\big(\exp(iu_j s_j)\big)} = \frac{E\big(i s_1 \exp(iu_1 s_1)\big)}{E\big(\exp(iu_1 s_1)\big)}.$$

22 Second derivative. Easy case $A = I_n$:
1. Differentiating via the quotient rule:
$$\frac{\partial^2 \psi}{\partial u_1^2} = \frac{E\big((i s_1)^2 \exp(iu_1 s_1)\big)\,E\big(\exp(iu_1 s_1)\big) - E\big(i s_1 \exp(iu_1 s_1)\big)^2}{E\big(\exp(iu_1 s_1)\big)^2}.$$
2. Differentiating a constant:
$$\frac{\partial^2 \psi}{\partial u_1 \partial u_2} = 0.$$

23 General derivatives. Key point: taking one derivative isolates each variable $u_i$. The second derivative is a diagonal matrix. Subsequent derivatives are diagonal tensors: only the $(i, \ldots, i)$ entries are nonzero. NB: the order-$d$ derivative is represented by an $n \times \cdots \times n$ tensor. There is one such tensor per point in $\mathbb{R}^n$.

24 Basis change: second derivative. When $A \neq I_n$, we have to work much harder:
$$D^2\psi(u) = A\,\mathrm{diag}\big(g_j(A_j^T u)\big)\,A^T,$$
where $g_j : \mathbb{R} \to \mathbb{C}$ is given by
$$g_j(v) = \frac{\partial^2}{\partial v^2}\log\big(E(\exp(iv s_j))\big).$$

25 Basis change: general derivative. When $A \neq I_n$, we have to work much harder:
$$D^d\psi(u) = \sum_{j=1}^m g_j(A_j^T u)\,(A_j \otimes \cdots \otimes A_j),$$
where $g_j : \mathbb{R} \to \mathbb{C}$ is given by
$$g_j(v) = \frac{\partial^d}{\partial v^d}\log\big(E(\exp(iv s_j))\big).$$
Evaluating the derivative at different points $u$ gives us tensors with a shared decomposition!

26 Single tensor decomposition is NP-hard. Forget the derivatives now. Take $\lambda_j \in \mathbb{C}$ and $A_j \in \mathbb{R}^n$:
$$T = \sum_{j=1}^m \lambda_j\, A_j \otimes \cdots \otimes A_j.$$
When can we recover the vectors $A_j$? When is this computationally tractable?

27 Known results. When $d = 2$: the usual eigenvalue decomposition
$$M = \sum_{j=1}^n \lambda_j\, A_j \otimes A_j.$$
When $d \geq 3$ and the $A_j$ are linearly independent, a tensor power iteration suffices [AGHKT12]:
$$T = \sum_{j=1}^m \lambda_j\, A_j^{\otimes d}.$$
This necessarily implies $m \leq n$. For unique recovery, we require all the eigenvalues to be distinct.
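
For reference, a short sketch of tensor power iteration in the orthogonally decomposable, $d = 3$ case; the components, weights, and iteration count are illustrative assumptions, not taken from [AGHKT12].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))     # orthonormal components A_j
lam = np.array([3.0, 2.0, 1.5, 1.0])             # hypothetical positive weights
T = np.einsum('j,ij,kj,lj->ikl', lam, Q, Q, Q)   # T = sum_j lam_j A_j (x) A_j (x) A_j

v = rng.normal(size=n)
v /= np.linalg.norm(v)
for _ in range(50):                              # power iteration: v <- T(I, v, v) / ||.||
    v = np.einsum('ikl,k,l->i', T, v, v)
    v /= np.linalg.norm(v)

print(np.round(Q.T @ v, 3))                      # ~ +/- a standard basis vector:
                                                 # v has converged to one of the A_j
```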

28 Generalising the problem. What about two equations instead of one?
$$T_\mu = \sum_{j=1}^m \mu_j\, A_j^{\otimes d}, \qquad T_\lambda = \sum_{j=1}^m \lambda_j\, A_j^{\otimes d}.$$
Our technique will flatten the tensors:
$$M_\mu = \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]\,\mathrm{diag}(\mu_j)\,\big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^T.$$

29 Algorithm. Input: two tensors $T_\mu$ and $T_\lambda$ flattened to $M_\mu$ and $M_\lambda$:
1. Compute $W$, the right singular vectors of $M_\mu$.
2. Form the matrix $M = (W^T M_\mu W)(W^T M_\lambda W)^{-1}$.
3. Eigenvector decomposition $M = PDP^{-1}$.
4. For each column $P_i$, let $v_i \in \mathbb{C}^n$ be the best rank-1 approximation to $P_i$ packed back into a tensor.
5. For each $v_i$, output $\mathrm{Re}(e^{i\theta} v_i)/\|\mathrm{Re}(e^{i\theta} v_i)\|$ where $\theta = \mathrm{argmax}_{\theta \in [0, 2\pi]} \|\mathrm{Re}(e^{i\theta} v_i)\|$.
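
A numpy sketch of steps 1-3 on synthetic flattened matrices is below. The factors, weights, and dimensions are made-up assumptions, and the rank-1 repacking and phase-alignment steps 4-5 are omitted; the point is only that the eigenvectors of $(W^T M_\mu W)(W^T M_\lambda W)^{-1}$, mapped back through $W$, recover the shared factors up to order and scale.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 9, 4                                      # flattened dimension n^{d/2}, number of components
B = rng.normal(size=(N, m))                      # hypothetical factors B_j = vec(A_j^{(x) d/2})
mu = np.array([1.0, 2.0, -1.5, 0.7])
lam = np.array([0.9, -1.2, 2.0, 1.4])            # ratios mu_j / lam_j are distinct and nonzero
M_mu = B @ np.diag(mu) @ B.T
M_lam = B @ np.diag(lam) @ B.T

# 1. W = top-m right singular vectors of M_mu (an orthonormal basis of range(B)).
W = np.linalg.svd(M_mu)[2][:m].T

# 2-3. Whitened ratio and its eigendecomposition M = P D P^{-1}.
M = (W.T @ M_mu @ W) @ np.linalg.inv(W.T @ M_lam @ W)
D, P = np.linalg.eig(M)

# The columns of W @ P match the columns of B up to order and scale.
est = W @ P
corr = (B / np.linalg.norm(B, axis=0)).T @ (est / np.linalg.norm(est, axis=0))
print(np.round(np.abs(corr), 2))                 # ~ a permutation matrix
```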

30 Theorem. Theorem (Tensor decomposition). Let $T_\mu, T_\lambda \in \mathbb{R}^{n \times \cdots \times n}$ be order-$d$ tensors such that $d \in 2\mathbb{N}$ and
$$T_\mu = \sum_{j=1}^m \mu_j\, A_j^{\otimes d}, \qquad T_\lambda = \sum_{j=1}^m \lambda_j\, A_j^{\otimes d},$$
where the vectors $\mathrm{vec}\big(A_j^{\otimes d/2}\big)$ are linearly independent, $\mu_i/\lambda_i \neq 0$, and $|\mu_i/\lambda_i - \mu_j/\lambda_j| > 0$ for all $i \neq j$. Then the vectors $A_j$ can be estimated to any desired accuracy in polynomial time.

31 Analysis. Let's pretend $M_\mu$ and $M_\lambda$ are full rank:
$$M_\mu M_\lambda^{-1} = \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]\,\mathrm{diag}(\mu_j)\,\big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^T \Big(\big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^T\Big)^{-1}\mathrm{diag}(\lambda_j)^{-1}\big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^{-1}$$
$$= \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]\,\mathrm{diag}(\mu_j/\lambda_j)\,\big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^{-1}.$$
The eigenvectors are flattened tensors of the form $\mathrm{vec}\big(A_j^{\otimes d/2}\big)$.

32 Diagonalisability for non-normal matrices. When can we write $A = PDP^{-1}$? Require all eigenvectors to be linearly independent ($P$ invertible). Equivalently, the minimal polynomial of $A$ has non-degenerate roots.

33 Diagonalisability for non-normal matrices. When can we write $A = PDP^{-1}$? Require all eigenvectors to be linearly independent ($P$ invertible). Equivalently, the minimal polynomial of $A$ has non-degenerate roots. Sufficient condition: all eigenvalues are non-degenerate (distinct).

34 Perturbed spectra for non-normal matrices. More complicated than for normal matrices. Normal: $|\lambda_i(A + E) - \lambda_i(A)| \leq \|E\|$. Non-normal: either the Bauer-Fike theorem, $|\lambda_i(A + E) - \lambda_j(A)| \leq \kappa(P)\|E\|$ for some $j$, or we must assume $A + E$ is already diagonalizable. Neither of these suffices.

35 Generalised Weyl inequality. Lemma. Let $A \in \mathbb{C}^{n \times n}$ be a diagonalizable matrix such that $A = P\,\mathrm{diag}(\lambda_i)\,P^{-1}$. Let $E \in \mathbb{C}^{n \times n}$ be a matrix such that $|\lambda_i(A) - \lambda_j(A)| \geq 3\kappa(P)\|E\|$ for all $i \neq j$. Then there exists a permutation $\pi : [n] \to [n]$ such that $|\lambda_i(A + E) - \lambda_{\pi(i)}(A)| \leq \kappa(P)\|E\|$. Proof. Via a homotopy argument (like the strong Gershgorin theorem).

36 Robust analysis. Proof sketch:
1. Apply the generalised Weyl inequality to bound the eigenvalue perturbations; hence the perturbed matrix is diagonalisable.
2. Apply the Ipsen-Eisenstat theorem (a generalised Davis-Kahan $\sin(\theta)$ theorem).
3. This implies that the output eigenvectors are close to $\mathrm{vec}\big(A_j^{\otimes d/2}\big)$.
4. Apply tensor power iteration to extract approximate $A_j$.
5. Show that the best real projection of each approximate $A_j$ is close to the true column.

37 Underdetermined ICA algorithm. $x = As$ where $A$ is a fat matrix.
1. Pick two independent random vectors $u, v \sim N(0, \sigma^2 I_n)$.
2. Form the $d$-th derivative tensors at $u$ and $v$: $T_u$ and $T_v$.
3. Run the tensor decomposition on the pair $(T_u, T_v)$.

38 Estimating from samples.
$$[D^4\psi_u]_{i_1, i_2, i_3, i_4} = \frac{1}{\phi(u)^4}\Big[\phi(u)^3\, E\big((ix_{i_1})(ix_{i_2})(ix_{i_3})(ix_{i_4})\exp(iu^T x)\big) - \phi(u)^2\, E\big((ix_{i_2})(ix_{i_3})(ix_{i_4})\exp(iu^T x)\big)\, E\big((ix_{i_1})\exp(iu^T x)\big)$$
$$- \phi(u)^2\, E\big((ix_{i_2})(ix_{i_3})\exp(iu^T x)\big)\, E\big((ix_{i_1})(ix_{i_4})\exp(iu^T x)\big) - \phi(u)^2\, E\big((ix_{i_2})(ix_{i_4})\exp(iu^T x)\big)\, E\big((ix_{i_1})(ix_{i_3})\exp(iu^T x)\big) + \cdots\Big]$$
At most $2^{d-1}(d-1)!$ terms. Each one is easy to estimate empirically!
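
For intuition, here is the analogous $d = 2$ estimator as a hedged sketch (the helper name and the synthetic data are assumptions): the empirical Hessian of $\psi(u) = \log E(\exp(iu^T x))$ is $(\phi_{jk}\phi - \phi_j\phi_k)/\phi^2$, and conjugating it by $A^{-1}$ should leave an approximately diagonal matrix, matching the structure on slide 24.

```python
import numpy as np

def hessian_of_log_cf(X, u):
    """Empirical D^2 psi(u); hypothetical helper for the d = 2 case only."""
    e = np.exp(1j * X @ u)                                   # e^{iu^T x} per sample
    phi = e.mean()                                           # phi(u)
    d1 = (1j * X * e[:, None]).mean(axis=0)                  # phi_j = E[(i x_j) e^{iu^T x}]
    d2 = np.einsum('nj,nk->jk', 1j * X, 1j * X * e[:, None]) / X.shape[0]
    return (d2 * phi - np.outer(d1, d1)) / phi**2

rng = np.random.default_rng(0)
S = rng.laplace(size=(400000, 3)) / np.sqrt(2)               # unit-variance sources
A = rng.normal(size=(3, 3))
A /= np.linalg.norm(A, axis=0)                               # unit-norm columns, as in the model
X = S @ A.T
u = 0.5 * rng.normal(size=3)                                 # small ||u|| keeps |phi(u)| away from 0

H = hessian_of_log_cf(X, u)
# Structure check: A^{-1} H A^{-T} should be close to diag(g_j(A_j^T u)),
# up to sampling noise in the off-diagonal entries.
Ainv = np.linalg.inv(A)
print(np.round(Ainv @ H @ Ainv.T, 2))
```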

39 Theorem. Theorem. Fix $n, m \in \mathbb{N}$ such that $n \leq m$. Let $x \in \mathbb{R}^n$ be given by an underdetermined ICA model $x = As$. Let $d \in \mathbb{N}$ be such that $\sigma_m\big(\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]_{i=1}^m\big) > 0$. Suppose that for each $s_i$, one of its cumulants with $d < k_i \leq k$ satisfies $|\mathrm{cum}_{k_i}(s_i)| \geq \Delta$ and $E\big(|s_i|^k\big) \leq M$. Then one can recover the columns of $A$ up to $\epsilon$ accuracy in time and sample complexity
$$\mathrm{poly}\Big(n^{d+k},\; m^{k^2},\; M^k,\; 1/\Delta^k,\; 1/\sigma_m\big(\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]\big)^k,\; 1/\epsilon\Big).$$

40 Analysis. Recall our matrices were:
$$M_u = \big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]\,\mathrm{diag}\big(g_j(A_j^T u)\big)\,\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]^T,$$
where
$$g_j(v) = \frac{\partial^d}{\partial v^d}\log\big(E(\exp(iv s_j))\big).$$
We need to show that the ratios $g_j(A_j^T u)/g_j(A_j^T v)$ are well-spaced.

41 Truncation. Taylor series of the second characteristic function:
$$g_i(u) = \sum_{l=d}^{k_i} \mathrm{cum}_l(s_i)\,\frac{(iu)^{l-d}}{(l-d)!} + R_t\,\frac{(iu)^{k_i-d+1}}{(k_i-d+1)!}.$$
Finite-degree polynomials are anti-concentrated. The tail error is small because of the existence of higher moments (in fact one suffices).

42 Tail error. $g_j$ is the $d$-th derivative of $\log\big(E(\exp(iu^T s))\big)$. For the characteristic function, $|\phi^{(d)}(u)| \leq E\big(|x|^d\big)$. Count the number of terms after iterating the quotient rule $d$ times.

43 Polynomial anti-concentration. Lemma. Let $p(x)$ be a degree-$d$ monic polynomial over $\mathbb{R}$. Let $x \sim N(0, \sigma^2)$; then for any $t \in \mathbb{R}$ we have
$$\Pr\big(|p(x) - t| \leq \epsilon\big) \leq \frac{4 d\, \epsilon^{1/d}}{\sigma\sqrt{2\pi}}.$$

44 Polynomial anti-concentration. Proof.
1. For a fixed interval, a scaled Chebyshev polynomial has the smallest $\ell_\infty$ norm (of order $1/2^d$ when the interval is $[-1, 1]$).
2. Since $p$ has degree $d$, there are at most $d-1$ changes of sign, hence only $d-1$ intervals where $p(x)$ is close to any $t$.
3. Applying the first fact, each such interval has length at most $\epsilon^{1/d}$ (up to constants), and the Gaussian density is at most $1/(\sigma\sqrt{2\pi})$.
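
A quick Monte-Carlo illustration of the lemma (the polynomial, threshold $t$, and sample size are arbitrary choices, not from the talk): the empirical band probability stays below the stated bound $4d\epsilon^{1/d}/(\sigma\sqrt{2\pi})$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, d, t = 1.0, 3, 0.3
x = rng.normal(scale=sigma, size=1_000_000)
p = x**3 - 2 * x + 0.5                           # monic polynomial of degree d = 3
for eps in (1e-1, 1e-2, 1e-3):
    empirical = np.mean(np.abs(p - t) <= eps)    # estimate of Pr(|p(x) - t| <= eps)
    bound = 4 * d * eps**(1 / d) / (sigma * np.sqrt(2 * np.pi))
    print(f"eps={eps:g}  empirical={empirical:.5f}  bound={bound:.3f}")
```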

45 Polynomial anti-concentration

46 Eigenvalue spacings. Want to bound
$$\Pr\left(\left|\frac{g_i(A_i^T u)}{g_i(A_i^T v)} - \frac{g_j(A_j^T u)}{g_j(A_j^T v)}\right| \leq \epsilon\right).$$
Condition on the value of $A_j^T u$ and write $s = g_j(A_j^T u)$ for the resulting fixed numerator. Then, splitting $g_i = p_i + \epsilon_i$ into its truncated Taylor polynomial and tail (slide 41):
$$\left|\frac{g_i(A_i^T u)}{g_i(A_i^T v)} - \frac{s}{g_j(A_j^T v)}\right| = \left|\frac{p_i(A_i^T u)}{g_i(A_i^T v)} + \frac{\epsilon_i}{g_i(A_i^T v)} - \frac{s}{g_j(A_j^T v)}\right| \geq \left|\frac{p_i(A_i^T u)}{g_i(A_i^T v)} - \frac{s}{g_j(A_j^T v)}\right| - \left|\frac{\epsilon_i}{g_i(A_i^T v)}\right|.$$
Once we've conditioned on $A_j^T u$, we can pretend $A_i^T u$ is also a Gaussian (of highly reduced variance).

47 Eigenvalue spacings. Decompose
$$A_i^T u = \langle A_i, A_j\rangle\, A_j^T u + r^T u, \qquad r \perp A_j.$$
The variance of the remaining randomness is proportional to $\|r\|^2$, which is bounded below in terms of $\sigma_m\big(\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]\big)$. We conclude by union bounding with the event that the denominators are not too large, and then over all pairs $i, j$.

48 Extensions. We can remove Gaussian noise when $x = As + \eta$ and $\eta \sim N(\mu, \Sigma)$. Gaussian mixtures (when $x \sim \sum_{i=1}^n w_i N(\mu_i, \Sigma_i)$), in the spherical covariance setting. (The Gaussian noise removal applies here too.)

49 Open problems What is the relationship between our method and kernel PCA? Independent subspaces. Gaussian mixtures: underdetermined and generalized covariance case.

50 Fin Questions?
