Fourier PCA. Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech)
1 Fourier PCA Navin Goyal (MSR India), Santosh Vempala (Georgia Tech) and Ying Xiao (Georgia Tech)
2 Introduction 1. Describe a learning problem. 2. Develop an efficient tensor decomposition.
3 Independent component analysis. See independent samples $x = As$: $s \in \mathbb{R}^m$ is a random vector with independent coordinates; the variables $s_i$ are not Gaussian; $A \in \mathbb{R}^{n \times m}$ is a fixed matrix of full row rank; each column $A_i \in \mathbb{R}^n$ has unit norm. The goal is to compute $A$.
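A minimal sketch (not the authors' code) of drawing samples from this model; the source distribution, dimensions and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 5, 5, 10_000                       # dimensions and sample count (illustrative)

# Independent non-Gaussian sources: uniform on [-sqrt(3), sqrt(3)] has zero mean,
# unit variance and fourth moment 9/5 != 3, so it differs from a Gaussian.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(m, N))

# A fixed mixing matrix with unit-norm columns.
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)

X = A @ S                                    # observed samples; the goal is to recover A from X alone
```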
4 ICA: start with independent random vector s
5 ICA: independent samples
6 ICA: but under the map A
7 ICA: goal is to recover A from samples only.
8 Applications Matrix A gives n linear measurements of m random variables. General dimensionality reduction tool in statistics and machine learning [HTF01]. Gained traction in deep belief networks [LKNN12]. Blind source separation and deconvolution in signal processing [HKO01]. More practically: finance [KO96], biology [VSJHO00] and MRI [KOHFY10].
9 Status. Jutten and Herault (1991) formalised this problem. Studied in our community first by [FJK96]. Provably good algorithms: [AGMS12] and [AGHKT12]. Many algorithms proposed in the signal processing literature [HKO01].
10 Standard approaches: PCA. Define a contrast function whose optima are the $A_j$. The second moment $E\big((u^T x)^2\big)$ gives the usual PCA. This only succeeds when all the eigenvalues of the covariance matrix are different. Any distribution can be put into isotropic position.
11 Standard approaches: fourth moments. Define a contrast function whose optima are the $A_j$. Fourth moment: $E\big((u^T x)^4\big)$. Tensor decomposition:
$$T = \sum_{j=1}^{m} \lambda_j\, A_j \otimes A_j \otimes A_j \otimes A_j$$
In case the $A_j$ are orthonormal, they are the local optima of
$$T(v, v, v, v) = \sum_{i,j,k,l} T_{ijkl}\, v_i v_j v_k v_l \quad \text{where } \|v\| = 1.$$
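As a concrete illustration (my own, not from the slides), the fourth-moment contrast along a direction $v$ can be estimated directly from samples; `X` is assumed to hold one observation per column.

```python
import numpy as np

def fourth_moment_contrast(X, v):
    """Empirical estimate of T(v, v, v, v) = E[(v^T x)^4] for a unit vector v."""
    proj = v @ X                 # v^T x for every sample (columns of X)
    return np.mean(proj ** 4)

# When the components are orthonormal, local optima of this contrast over the
# unit sphere sit at (or near) the columns A_j.
```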
12 Standard assumptions for $x = As$. All algorithms require: 1. $A$ is a full-rank $n \times n$ matrix: as many measurements as underlying variables. 2. Each $s_i$ differs from a Gaussian in the fourth moment: $E(s_i) = 0$, $E(s_i^2) = 1$, $E(s_i^4) \neq 3$. Note: this precludes the underdetermined case where $A \in \mathbb{R}^{n \times m}$ is fat.
13 Our results. We require neither standard assumption. Underdetermined: $A \in \mathbb{R}^{n \times m}$ is a fat matrix where $n \ll m$. Any moment: $E(s_i^r) \neq E(z^r)$ for some $r$, where $z \sim N(0, 1)$.
Theorem (Informal). Let $x = As$ be an underdetermined ICA model. Let $d \in 2\mathbb{N}$ be such that $\sigma_m\big(\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]_{i=1}^{m}\big) > 0$. Suppose that for each $s_i$, one of its first $k$ cumulants $\mathrm{cum}_{k_i}(s_i)$ is bounded away from zero. Then one can recover the columns of $A$ up to $\epsilon$ accuracy in polynomial time.
14 Underdetermined ICA: start with a distribution over $\mathbb{R}^m$
15 Underdetermined ICA: independent samples s
16 Underdetermined ICA: A first rotates/scales
17 Underdetermined ICA: then A projects down to $\mathbb{R}^n$
18 Fully determined ICA: a nice algorithm.
1. (Fourier weights) Pick a random vector $u$ from $N(0, \sigma^2 I_n)$. For every $x$, compute its Fourier weight
$$w(x) = \frac{e^{i u^T x}}{\sum_{x \in S} e^{i u^T x}}.$$
2. (Reweighted covariance) Compute the covariance matrix of the points $x$ reweighted by $w(x)$:
$$\mu_u = \frac{1}{|S|} \sum_{x \in S} w(x)\, x \quad \text{and} \quad \Sigma_u = \frac{1}{|S|} \sum_{x \in S} w(x)\, (x - \mu_u)(x - \mu_u)^T.$$
3. Compute the eigenvectors $V$ of $\Sigma_u$.
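A minimal sketch of these three steps (my own, assuming samples sit in the columns of an array `X`); it drops the constant $1/|S|$ factors, which do not affect the eigenvectors, and is not the authors' reference implementation.

```python
import numpy as np

def fourier_pca(X, sigma=1.0, seed=0):
    """Eigenvectors of the Fourier-reweighted covariance of the columns of X."""
    n, N = X.shape
    u = np.random.default_rng(seed).normal(0.0, sigma, size=n)

    # 1. Fourier weights w(x) = exp(i u^T x), normalised over the sample.
    phases = np.exp(1j * (u @ X))              # one complex phase per sample
    w = phases / phases.sum()

    # 2. Reweighted (complex) mean and covariance.
    mu = X @ w                                  # sum_x w(x) x
    Xc = X - mu[:, None]
    Sigma = (Xc * w) @ Xc.T                     # sum_x w(x) (x - mu)(x - mu)^T

    # 3. Eigenvectors of the reweighted covariance.
    _, V = np.linalg.eig(Sigma)
    return V

# For data in (near-)isotropic position, the columns of V approximate the
# mixing directions up to sign/phase and permutation.
```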
19 Why does this work?
1. The Fourier differentiation/multiplication relationship: $\widehat{f'}(u) = (2\pi i u)\,\hat{f}(u)$.
2. Actually we consider the log of the Fourier transform:
$$D^2 \log E\big(\exp(i u^T x)\big) = A\, \mathrm{diag}\big(g_j(A_j^T u)\big)\, A^T$$
where $g_j(t)$ is the second derivative of $\log\big(E(\exp(i t s_j))\big)$.
20 Technical overview. Fundamental analytic tools: the second characteristic function $\psi(u) = \log E\big(\exp(i u^T x)\big)$. Estimate the order-$d$ derivative tensor field $D^d \psi$ from samples. Evaluate $D^d \psi$ at two randomly chosen $u, v \in \mathbb{R}^n$ to give two tensors $T_u$ and $T_v$. Perform a tensor decomposition on $T_u$ and $T_v$ to obtain $A$.
21 First derivative. Easy case $A = I_n$:
$$\psi(u) = \log E\big(\exp(i u^T x)\big) = \log E\big(\exp(i u^T s)\big)$$
Thus:
$$\frac{\partial \psi}{\partial u_1} = \frac{1}{E\big(\exp(i u^T s)\big)}\, E\big(i s_1 \exp(i u^T s)\big) = \frac{E\big(i s_1 \exp(i u_1 s_1)\big) \prod_{j=2}^{n} E\big(\exp(i u_j s_j)\big)}{\prod_{j=1}^{n} E\big(\exp(i u_j s_j)\big)} = \frac{E\big(i s_1 \exp(i u_1 s_1)\big)}{E\big(\exp(i u_1 s_1)\big)}$$
22 Second derivative. Easy case $A = I_n$:
1. Differentiating via the quotient rule:
$$\frac{\partial^2 \psi}{\partial u_1^2} = \frac{E\big((i s_1)^2 \exp(i u_1 s_1)\big)}{E\big(\exp(i u_1 s_1)\big)} - \frac{E\big(i s_1 \exp(i u_1 s_1)\big)^2}{E\big(\exp(i u_1 s_1)\big)^2}$$
2. Differentiating a constant:
$$\frac{\partial^2 \psi}{\partial u_1 \partial u_2} = 0$$
23 General derivatives. Key point: taking one derivative isolates each variable $u_i$. The second derivative is a diagonal matrix. Subsequent derivatives are diagonal tensors: only the $(i, \ldots, i)$ terms are nonzero. NB: higher derivatives are represented by $n \times \cdots \times n$ tensors; there is one such tensor per point in $\mathbb{R}^n$.
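A quick numerical sanity check (my own illustration, not from the talk): for independent coordinates, a finite-difference estimate of the mixed second partial of $\psi$ on an empirical characteristic function is close to zero, while a pure second partial is not.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 200_000))   # two independent coordinates

def psi(u):
    """Empirical second characteristic function log E[exp(i u^T s)]."""
    return np.log(np.mean(np.exp(1j * (u @ S))))

h, u0 = 1e-2, np.array([0.3, -0.7])
e1, e2 = np.array([h, 0.0]), np.array([0.0, h])

mixed = (psi(u0 + e1 + e2) - psi(u0 + e1) - psi(u0 + e2) + psi(u0)) / h**2
pure  = (psi(u0 + 2 * e1) - 2 * psi(u0 + e1) + psi(u0)) / h**2

print(abs(mixed), abs(pure))   # |mixed| is small (shrinks with more samples); |pure| is order 1
```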
24 Basis change: second derivative. When $A \neq I_n$, we have to work much harder:
$$D^2 \psi_u = A\, \mathrm{diag}\big(g_j(A_j^T u)\big)\, A^T$$
where $g_j : \mathbb{R} \to \mathbb{C}$ is given by
$$g_j(v) = \frac{\partial^2}{\partial v^2} \log\big(E(\exp(i v s_j))\big).$$
25 Basis change: general derivative. When $A \neq I_n$, we have to work much harder:
$$D^d \psi_u = \sum_{j=1}^{m} g_j(A_j^T u)\, (A_j \otimes \cdots \otimes A_j)$$
where $g_j : \mathbb{R} \to \mathbb{C}$ is given by
$$g_j(v) = \frac{\partial^d}{\partial v^d} \log\big(E(\exp(i v s_j))\big).$$
Evaluating the derivative at different points $u$ gives us tensors with shared decompositions!
26 Single tensor decomposition is NP-hard. Forget the derivatives for now. Take $\lambda_j \in \mathbb{C}$ and $A_j \in \mathbb{R}^n$:
$$T = \sum_{j=1}^{m} \lambda_j\, A_j \otimes \cdots \otimes A_j.$$
When can we recover the vectors $A_j$? When is this computationally tractable?
27 Known results. When $d = 2$, the usual eigenvalue decomposition:
$$M = \sum_{j=1}^{n} \lambda_j\, A_j A_j^T$$
When $d \geq 3$ and the $A_j$ are linearly independent, a tensor power iteration suffices [AGHKT12]:
$$T = \sum_{j=1}^{m} \lambda_j\, A_j \otimes \cdots \otimes A_j$$
This necessarily implies $m \leq n$. For unique recovery, we require all the eigenvalues to be different.
28 Generalising the problem. What about two equations instead of one?
$$T_\mu = \sum_{j=1}^{m} \mu_j\, A_j^{\otimes d} \qquad T_\lambda = \sum_{j=1}^{m} \lambda_j\, A_j^{\otimes d}$$
Our technique will flatten the tensors:
$$M_\mu = \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]\; \mathrm{diag}(\mu_j)\; \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^T$$
29 Algorithm. Input: two tensors $T_\mu$ and $T_\lambda$ flattened to $M_\mu$ and $M_\lambda$:
1. Compute $W$, the right singular vectors of $M_\mu$.
2. Form the matrix $M = (W^T M_\mu W)(W^T M_\lambda W)^{-1}$.
3. Compute the eigenvector decomposition $M = PDP^{-1}$.
4. For each column $P_i$, let $v_i \in \mathbb{C}^n$ be the best rank-1 approximation to $P_i$ packed back into a tensor.
5. For each $v_i$, output $\mathrm{Re}\big(e^{i\theta} v_i\big) / \big\|\mathrm{Re}\big(e^{i\theta} v_i\big)\big\|$ where $\theta = \mathrm{argmax}_{\theta \in [0, 2\pi]} \big\|\mathrm{Re}\big(e^{i\theta} v_i\big)\big\|$.
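A minimal sketch of the core diagonalisation step, specialised to the simplest case $d = 2$ with square full-rank flattenings, where $M_\mu = A\,\mathrm{diag}(\mu)\,A^T$ and $M_\lambda = A\,\mathrm{diag}(\lambda)\,A^T$; the $W$ projection and the rank-1/phase steps above are skipped, and all names are my own.

```python
import numpy as np

def decompose_pair(M_mu, M_lam):
    """Recover the common factors of M_mu = A diag(mu) A^T and M_lam = A diag(lam) A^T."""
    # M_mu @ inv(M_lam) = A diag(mu/lam) A^{-1}, so its eigenvectors are the
    # columns of A (up to scale), provided the ratios mu_j/lam_j are distinct.
    M = M_mu @ np.linalg.inv(M_lam)
    ratios, P = np.linalg.eig(M)
    return ratios, P / np.linalg.norm(P, axis=0)

# Demo on synthetic data.
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
mu, lam = rng.standard_normal(n), rng.standard_normal(n)
M_mu  = A @ np.diag(mu)  @ A.T
M_lam = A @ np.diag(lam) @ A.T
ratios, A_hat = decompose_pair(M_mu, M_lam)
# Columns of A_hat match the columns of A up to permutation, sign/phase and scale.
```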
30 Theorem. Theorem (Tensor decomposition). Let $T_\mu, T_\lambda$ be order-$d$ tensors over $\mathbb{R}^n$ with $d \in 2\mathbb{N}$ such that
$$T_\mu = \sum_{j=1}^{m} \mu_j\, A_j^{\otimes d} \qquad T_\lambda = \sum_{j=1}^{m} \lambda_j\, A_j^{\otimes d}$$
where the vectors $\mathrm{vec}\big(A_j^{\otimes d/2}\big)$ are linearly independent, $\mu_i / \lambda_i \neq 0$, and $\big|\mu_i/\lambda_i - \mu_j/\lambda_j\big| > 0$ for all $i \neq j$. Then the vectors $A_j$ can be estimated to any desired accuracy in polynomial time.
31 Analysis. Let's pretend $M_\mu$ and $M_\lambda$ are full rank:
$$M_\mu M_\lambda^{-1} = \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]\, \mathrm{diag}(\mu_j)\, \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^T \Big(\big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^T\Big)^{-1} \mathrm{diag}(\lambda_j)^{-1}\, \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^{-1}$$
$$= \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]\, \mathrm{diag}(\mu_j / \lambda_j)\, \big[\mathrm{vec}\big(A_j^{\otimes d/2}\big)\big]^{-1}$$
The eigenvectors are flattened tensors of the form $\mathrm{vec}\big(A_j^{\otimes d/2}\big)$.
32 Diagonalisability for non-normal matrices. When can we write $A = PDP^{-1}$? Require all eigenvectors to be independent ($P$ invertible); equivalently, the minimal polynomial of $A$ has non-degenerate roots.
33 Diagonalisability for non-normal matrices. When can we write $A = PDP^{-1}$? Require all eigenvectors to be independent ($P$ invertible); equivalently, the minimal polynomial of $A$ has non-degenerate roots. Sufficient condition: all roots of the characteristic polynomial are non-degenerate (distinct eigenvalues).
34 Perturbed spectra for non-normal matrices. More complicated than for normal matrices. Normal: $|\lambda_i(A + E) - \lambda_i(A)| \leq \|E\|$. Non-normal: either the Bauer–Fike theorem, $|\lambda_i(A + E) - \lambda_j(A)| \leq \kappa(P)\,\|E\|$ for some $j$, or we must assume $A + E$ is already diagonalizable. Neither of these suffices.
35 Generalised Weyl inequality. Lemma. Let $A \in \mathbb{C}^{n \times n}$ be a diagonalizable matrix such that $A = P\,\mathrm{diag}(\lambda_i)\,P^{-1}$. Let $E \in \mathbb{C}^{n \times n}$ be a matrix such that $|\lambda_i(A) - \lambda_j(A)| \geq 3\,\kappa(P)\,\|E\|$ for all $i \neq j$. Then there exists a permutation $\pi : [n] \to [n]$ such that
$$|\lambda_i(A + E) - \lambda_{\pi(i)}(A)| \leq \kappa(P)\,\|E\|.$$
Proof: via a homotopy argument (like the strong Gershgorin theorem).
36 Robust analysis. Proof sketch:
1. Apply the generalised Weyl inequality to bound the eigenvalue spacings; hence the perturbed matrix is diagonalisable.
2. Apply the Ipsen–Eisenstat theorem (a generalised Davis–Kahan $\sin(\theta)$ theorem).
3. This implies that the output eigenvectors are close to $\mathrm{vec}\big(A_j^{\otimes d/2}\big)$.
4. Apply tensor power iteration to extract the approximate $A_j$.
5. Show that the best real projection of the approximate $A_j$ is close to the true one.
37 Underdetermined ICA algorithm. $x = As$ where $A$ is a fat matrix.
1. Pick two independent random vectors $u, v \sim N(0, \sigma^2 I_n)$.
2. Form the $d$-th derivative tensors at $u$ and $v$: $T_u$ and $T_v$.
3. Run the tensor decomposition on the pair $(T_u, T_v)$.
38 Estimating from samples.
$$[D^4 \psi_u]_{i_1, i_2, i_3, i_4} = \frac{1}{\phi(u)^4} \Big[ \phi(u)^3\, E\big((i x_{i_1})(i x_{i_2})(i x_{i_3})(i x_{i_4}) \exp(i u^T x)\big) - \phi(u)^2\, E\big((i x_{i_2})(i x_{i_3})(i x_{i_4}) \exp(i u^T x)\big)\, E\big((i x_{i_1}) \exp(i u^T x)\big)$$
$$- \phi(u)^2\, E\big((i x_{i_2})(i x_{i_3}) \exp(i u^T x)\big)\, E\big((i x_{i_1})(i x_{i_4}) \exp(i u^T x)\big) - \phi(u)^2\, E\big((i x_{i_2})(i x_{i_4}) \exp(i u^T x)\big)\, E\big((i x_{i_1})(i x_{i_3}) \exp(i u^T x)\big) + \cdots \Big]$$
At most $2^{d-1}(d-1)!$ terms. Each one is easy to estimate empirically!
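The same kind of plug-in estimator is easiest to see for $d = 2$. Below is a minimal sketch (my own, with hypothetical names) of the empirical estimate of $D^2\psi_u$ from samples; the $d = 4$ case expands analogously term by term.

```python
import numpy as np

def d2_psi(X, u):
    """Empirical D^2 psi(u) for psi(u) = log E[exp(i u^T x)], samples in columns of X."""
    N = X.shape[1]
    phase = np.exp(1j * (u @ X))                    # e^{i u^T x} for each sample
    phi = phase.mean()                              # empirical characteristic function
    iX = 1j * X
    g1 = (iX * phase).mean(axis=1)                  # E[(i x) e^{i u^T x}]
    g2 = (iX * phase) @ iX.T / N                    # E[(i x)(i x)^T e^{i u^T x}]
    return g2 / phi - np.outer(g1, g1) / phi**2     # quotient rule for D^2 log(phi)

# Evaluating d2_psi at two random points u and v yields a pair of matrices with the
# shared A diag(.) A^T structure, which can be fed to a pair-diagonalisation step
# like the sketch given after the algorithm on slide 29.
```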
39 Theorem. Theorem. Fix $n, m \in \mathbb{N}$ such that $n \leq m$. Let $x \in \mathbb{R}^n$ be given by an underdetermined ICA model $x = As$. Let $d \in 2\mathbb{N}$ be such that $\sigma_m\big(\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]_{i=1}^{m}\big) > 0$. Suppose that for each $s_i$, one of its cumulants with $d < k_i \leq k$ is bounded away from zero, $|\mathrm{cum}_{k_i}(s_i)| \geq \Delta$, and $E\big(|s_i|^k\big) \leq M$. Then one can recover the columns of $A$ up to $\epsilon$ accuracy in time and sample complexity
$$\mathrm{poly}\Big(n^{d+k},\ m^{k^2},\ M^k,\ 1/\Delta^k,\ 1/\sigma_m\big(\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]\big)^k,\ 1/\epsilon\Big).$$
40 Analysis. Recall our matrices were:
$$M_u = \big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]\; \mathrm{diag}\big(g_j(A_j^T u)\big)\; \big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]^T$$
where
$$g_j(v) = \frac{\partial^d}{\partial v^d} \log\big(E(\exp(i v s_j))\big).$$
We need to show that the ratios $g_j(A_j^T u) / g_j(A_j^T v)$ are well-spaced.
41 Truncation. Taylor series of the second characteristic function:
$$g_i(u) = \sum_{l=d}^{k_i} \mathrm{cum}_l(s_i)\, \frac{(iu)^{l-d}}{(l-d)!} + R_t\, \frac{(iu)^{k_i - d + 1}}{(k_i - d + 1)!}.$$
Finite-degree polynomials are anti-concentrated. The tail error is small because of the existence of higher moments (in fact one suffices).
42 Tail error. $g_j$ is the $d$-th derivative of $\log\big(E(\exp(i u^T s))\big)$. For the characteristic function, $|\phi^{(d)}(u)| \leq E\big(|x|^d\big)$. Count the number of terms after iterating the quotient rule $d$ times.
43 Polynomial anti-concentration. Lemma. Let $p(x)$ be a degree-$d$ monic polynomial over $\mathbb{R}$ and let $x \sim N(0, \sigma^2)$. Then for any $t \in \mathbb{R}$ we have
$$\Pr\big(|p(x) - t| \leq \epsilon\big) \leq \frac{4 d\, \epsilon^{1/d}}{\sigma \sqrt{2\pi}}.$$
44 Polynomial anti-concentration. Proof.
1. For a fixed interval, a scaled Chebyshev polynomial has the smallest $\ell_\infty$ norm (of order $1/2^d$ when the interval is $[-1, 1]$).
2. Since $p$ has degree $d$, there are at most $d - 1$ changes of sign, hence only $d - 1$ intervals where $p(x)$ is close to any $t$.
3. Applying the first fact, each such interval has length at most $\epsilon^{1/d}$, and the Gaussian density is at most $1/(\sigma\sqrt{2\pi})$.
45 Polynomial anti-concentration
46 Eigenvalue spacings. Want to bound
$$\Pr\left( \left| \frac{g_i(A_i^T u)}{g_i(A_i^T v)} - \frac{g_j(A_j^T u)}{g_j(A_j^T v)} \right| \leq \epsilon \right).$$
Condition on a value of $A_j^T u = s$. Writing $g_i = p_i + \epsilon_i$ (polynomial truncation plus tail, as on the previous slides):
$$\left| \frac{g_i(A_i^T u)}{g_i(A_i^T v)} - \frac{g_j(s)}{g_j(A_j^T v)} \right| = \left| \frac{p_i(A_i^T u)}{g_i(A_i^T v)} + \frac{\epsilon_i}{g_i(A_i^T v)} - \frac{g_j(s)}{g_j(A_j^T v)} \right| \geq \left| \frac{p_i(A_i^T u)}{g_i(A_i^T v)} - \frac{g_j(s)}{g_j(A_j^T v)} \right| - \left| \frac{\epsilon_i}{g_i(A_i^T v)} \right|.$$
Once we've conditioned on $A_j^T u$, we can pretend $A_i^T u$ is also a Gaussian (of highly reduced variance).
47 Eigenvalue spacings. Decompose
$$A_i^T u = \langle A_i, A_j \rangle\, A_j^T u + r^T u$$
where $r$ is orthogonal to $A_j$. The variance of the remaining randomness is $\|r\|^2 \sigma^2$, which is controlled via $\sigma_m\big(\big[\mathrm{vec}\big(A_i^{\otimes d/2}\big)\big]\big)$. We conclude by union bounding with the event that the denominators are not too large, and then over all pairs $i, j$.
48 Extensions. Can remove Gaussian noise when $x = As + \eta$ and $\eta \sim N(\mu, \Sigma)$. Gaussian mixtures (when $x \sim \sum_{i=1}^{n} w_i\, N(\mu_i, \Sigma_i)$), in the spherical covariance setting. (Gaussian noise applies here too.)
49 Open problems What is the relationship between our method and kernel PCA? Independent subspaces. Gaussian mixtures: underdetermined and generalized covariance case.
50 Fin Questions?