Matrix Approximations


1 Matrix Approximations. Sanjiv Kumar, Google Research, NY. EECS-6898, Columbia University - Fall 2010.

2 Latent Semantic Indexing (LSI). Given n documents with words from a vocabulary of size m, discover hidden topics and represent documents semantically. Process: form an m x n term-document matrix A with $A_{ij}$ = # of occurrences of term i in the j-th document; the top k left singular vectors represent topics; transform an input query into the k-dim semantic space; match it against the semantically transformed database items. n ~ O(1B), m ~ O(100K): a large sparse/dense matrix.

3 Latent Semantic Indexing (LSI). Given n documents with words from a vocabulary of size m, discover hidden topics and represent documents semantically. Document representation and retrieval: factorize the m x n term-document matrix as $A \approx U_k \Sigma_k V_k^\top$.

4 Latent Semantic Indexing (LSI). Given n documents with words from a vocabulary of size m, discover hidden topics and represent documents semantically. Document representation and retrieval: with $A \approx U_k \Sigma_k V_k^\top$ and $d_j$ the j-th column of A, each document is represented in the semantic space as $\hat{d}_j = \Sigma_k^{-1} U_k^\top d_j$. Given a query q, compute $\hat{q} = \Sigma_k^{-1} U_k^\top q$ and find the nearest neighbors in the database using $\mathrm{sim}(\hat{q}, \hat{d}_j)$.
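
A minimal numpy sketch of this LSI pipeline, assuming a dense term-document matrix and cosine similarity for retrieval; the function names (lsi_fit, lsi_query) and toy sizes are illustrative, not from the slides:

```python
import numpy as np

def lsi_fit(A, k):
    """Rank-k LSI on an m x n term-document matrix A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    # semantic representation of every document: d_hat_j = Sigma_k^{-1} U_k^T d_j
    D_hat = np.diag(1.0 / sk) @ Uk.T @ A          # k x n
    return Uk, sk, D_hat

def lsi_query(q, Uk, sk, D_hat):
    """Map a query into the k-dim semantic space and rank documents by cosine similarity."""
    q_hat = np.diag(1.0 / sk) @ Uk.T @ q
    sims = (D_hat.T @ q_hat) / (np.linalg.norm(D_hat, axis=0) * np.linalg.norm(q_hat) + 1e-12)
    return np.argsort(-sims)

A = np.random.rand(500, 200)                      # toy: 500 terms, 200 documents
Uk, sk, D_hat = lsi_fit(A, k=10)
ranking = lsi_query(A[:, 0], Uk, sk, D_hat)       # query = term vector of document 0
```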

5 Principal Component Analysis (PCA). Given n vectors, find the linear reconstruction with minimum mean squared error. Process: form the centered data matrix (i.e., subtract the mean from each vector) of n vectors in d dims; get the top k left singular vectors (equivalent to the eigenvectors of the covariance matrix); project the input data onto those singular vectors. n ~ O(1B), d ~ O(100K): a large dense matrix.

6 Kernel Methods. Commonly used for nonlinear extensions in classification, regression, dimensionality reduction, etc. Kernel matrix: most kernel methods boil down to manipulating the kernel matrix $K_{ij} = k(x_i, x_j)$, a symmetric positive semi-definite (SPSD) matrix of size n x n. One needs its top/bottom eigenvectors (e.g., kernel PCA) or a low-rank approximation (SVM). n ~ O(1B): a very large dense matrix.

7 Spectral Clustering. Graph-based clustering method; makes fewer assumptions on the data distribution than a parametric model, e.g., a GMM. Luxburg [2]

8 Spectral Clustering. Graph-based clustering method; makes fewer assumptions on the data distribution than a parametric model, e.g., a GMM. Key idea: find a (low) k-dim embedding such that neighbors remain close: $\hat{Y} = \arg\min_Y \sum_{i,j} W_{ij}\,\|y_i - y_j\|^2 = \arg\min_Y \mathrm{tr}(Y^\top L Y)$, where $L = D - W$. Solution: the bottom k eigenvectors of L (ignoring the trivial constant one), assuming the graph is connected.

9 Spectral Clustering. Graph-based clustering method; makes fewer assumptions on the data distribution than a parametric model, e.g., a GMM. Procedure: compute the weight matrix W, either dense, $W_{ij} = \exp(-\|x_i - x_j\|^2/\sigma^2)$, or sparse, $W_{ij} = \exp(-\|x_i - x_j\|^2/\sigma^2)$ if $i \sim j$ and 0 otherwise. Compute the normalized Laplacian, $L = I - D^{-1/2} W D^{-1/2}$ (symmetric) or $L = I - D^{-1} W$ (non-symmetric), where $D_{ii} = \sum_j W_{ij}$. Do a k-means clustering in the optimal reduced dims, $D^{-1/2} U_k$ or $U_k$, the bottom k eigenvectors of L (ignoring the trivial one). n ~ O(1B), d ~ O(100K): a large dense or sparse matrix.
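
A minimal sketch of this procedure with numpy/scipy, assuming a dense Gaussian affinity and the symmetric normalized Laplacian; the function name, sigma, and the toy data are illustrative only:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, n_clusters, sigma=1.0):
    """Dense-affinity spectral clustering following the slide's recipe (sketch)."""
    W = np.exp(-cdist(X, X, 'sqeuclidean') / sigma**2)    # dense weight matrix
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt       # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)                   # ascending eigenvalues
    Y = eigvecs[:, 1:n_clusters + 1]                       # bottom eigenvectors, skip the trivial one
    _, labels = kmeans2(Y, n_clusters, minit='++')         # k-means in the reduced dims
    return labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
print(spectral_clustering(X, n_clusters=2, sigma=2.0))
```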

10 Two scenarios. Dense matrices: 1. Number of elements $O(n^2)$. 2. Matrix-vector products $O(n^2)$. 3. The spectrum usually falls exponentially for most real-world data, so most energy is contained in the top few eigenvalues and a low-rank approximation gives reasonable results. 4. Commonly arise in kernel methods, regression, manifold learning, second-order optimization (Hessian). Sparse matrices: 1. Number of elements O(n). 2. Very fast matrix-vector products, O(n). 3. The spectrum is usually flatter (energy falls off more slowly). 4. Commonly arise in spectral clustering, manifold learning, semi-supervised learning.

11 Low-rank Approximation. Given an m x n matrix A, find a rank-k approximation with $k < \min(m, n)$: $A_{m\times n} \approx B_{m\times k}\, C_{k\times n}$.

12 Low-rank Approximation. Given an m x n matrix A, find a rank-k approximation with $k < \min(m, n)$: $A_{m\times n} \approx B_{m\times k}\, C_{k\times n}$. Advantages: commonly used to discover structure in data, e.g., Latent Semantic Indexing (LSI) for discovering topics in documents, or recommendation systems for predicting user preferences. Storage reduces to only $O((m+n)k)$ instead of $O(mn)$, and a matrix-vector product requires only $O((m+n)k)$ instead of $O(mn)$.

13 Singular Value Decomposition (SVD). $A_{[m\times n]} = U_{[m\times n]}\, \Sigma_{[n\times n]}\, V^\top_{[n\times n]}$.

14 Singular Value Decomposition (SVD). WLOG suppose m > n: $A_{[m\times n]} = U_{[m\times n]}\, \Sigma_{[n\times n]}\, V^\top_{[n\times n]}$. Left singular vectors: $U^\top U = I$; the columns of U are the eigenvectors of $AA^\top$.

15 Singular Value Decomposition (SVD). WLOG suppose m > n: $A = U\Sigma V^\top$ as above. Left singular vectors: $U^\top U = I$; the columns of U are the eigenvectors of $AA^\top$. Right singular vectors: $V^\top V = I$; the columns of V are the eigenvectors of $A^\top A$.

16 Singular Value Decomposition (SVD). As above, $A = U\Sigma V^\top$ with $U^\top U = I$ (eigenvectors of $AA^\top$) and $V^\top V = I$ (eigenvectors of $A^\top A$). The singular values satisfy $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$, $\sigma_1 \ge \ldots \ge \sigma_n \ge 0$, and the $\sigma_i^2$ are the eigenvalues of $A^\top A$ or $AA^\top$.

17 Low-rank Approximation. Frobenius norm: $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{\mathrm{tr}(A^\top A)} = \sqrt{\mathrm{tr}(AA^\top)} = \sqrt{\sum_i \sigma_i^2}$. Spectral norm: $\|A\|_2 = \max_{x \neq 0} \|Ax\|_2/\|x\|_2 = \sigma_1$. They satisfy $\|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2$.
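
A quick numpy check of these norm identities on a toy matrix (purely illustrative):

```python
import numpy as np

A = np.random.rand(6, 4)
s = np.linalg.svd(A, compute_uv=False)

fro = np.linalg.norm(A, 'fro')          # sqrt of the sum of squared entries = sqrt(tr(A^T A))
spec = np.linalg.norm(A, 2)             # largest singular value

assert np.isclose(fro, np.sqrt((s**2).sum()))
assert np.isclose(spec, s[0])
assert spec <= fro <= np.sqrt(A.shape[1]) * spec   # ||A||_2 <= ||A||_F <= sqrt(n) ||A||_2
```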

18 Low-rank Approximation. With the Frobenius and spectral norms defined above, the best rank-k approximation in either norm is the truncated SVD: $A_k = \arg\min_{D:\ \mathrm{rank}(D)=k} \|A - D\|_{F,2}$, with $A_k = U_{m\times k}\, \Sigma_{k\times k}\, V^\top_{k\times n}$.

19 Low-rank Approximation. $A_k = \arg\min_{D:\ \mathrm{rank}(D)=k} \|A - D\|_{F,2} = U_{m\times k}\, \Sigma_{k\times k}\, V^\top_{k\times n}$. Removing the small entries of $\Sigma$ has a noise-removal interpretation, e.g., PCA. Computational complexity: full SVD $O(mn\min\{m, n\})$; rank-k SVD $O(mnk)$.
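
A short numpy sketch of the Eckart-Young fact above: truncating the SVD at rank k gives the best rank-k approximation, with Frobenius error determined by the discarded singular values (function name and sizes are illustrative):

```python
import numpy as np

def best_rank_k(A, k):
    """Rank-k SVD truncation: the minimizer of ||A - D|| over all rank-k matrices D."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(100, 60)
A5 = best_rank_k(A, 5)
# Frobenius error equals the square root of the sum of the discarded squared singular values
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A - A5, 'fro'), np.sqrt((s[5:]**2).sum()))
```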

20 Matrix Projection. $A_k = U_k \Sigma_k V_k^\top = U_k U_k^\top A = A V_k V_k^\top$.

21 Matrix Projection. $A_k = U_k \Sigma_k V_k^\top = U_k U_k^\top A = A V_k V_k^\top$: the projection of A onto the space spanned by the columns of $U_k$.

22 Matrix Projection. $A_k = U_k \Sigma_k V_k^\top = U_k U_k^\top A = A V_k V_k^\top$: the projection of A onto the space spanned by the columns of $U_k$. For an approximate decomposition $\tilde{U}_k, \tilde{\Sigma}_k, \tilde{V}_k$, however, the low-rank approximation from the SVD, $\tilde{U}_k \tilde{\Sigma}_k \tilde{V}_k^\top$, and the matrix projection, $\tilde{U}_k \tilde{U}_k^\top A$, give different results!
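
A small numpy illustration: with exact singular vectors the truncated SVD and the projection coincide, while a projection built from an approximate basis behaves differently (the column-subsample basis below is just one convenient stand-in for an approximate decomposition):

```python
import numpy as np

A = np.random.rand(80, 50)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 5

A_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # rank-k SVD truncation
A_proj = U[:, :k] @ U[:, :k].T @ A                # projection onto span of the exact U_k
assert np.allclose(A_svd, A_proj)                 # identical when the singular vectors are exact

# with an approximate basis Q (here from a random column subsample), Q Q^T A is
# still a valid rank-k projection but no longer matches the optimal truncated SVD
Q, _ = np.linalg.qr(A[:, np.random.choice(50, k, replace=False)])
print(np.linalg.norm(A - Q @ Q.T @ A, 'fro'), np.linalg.norm(A - A_svd, 'fro'))
```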

23 Power Method. To compute the largest eigenvalue/eigenvector of a matrix A. Assume A is a symmetric matrix, i.e., it is diagonalizable: $A = U \Lambda U^\top$ with $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$.

24 Power Method. To compute the largest eigenvalue/eigenvector of a symmetric (diagonalizable) matrix $A = U \Lambda U^\top$, $\lambda_1 \ge \ldots \ge \lambda_n$. Power iteration: 1. Start with a random vector $q_0 \in \mathbb{R}^n$. 2. For $k = 1, 2, \ldots, K$: compute $z_k = A q_{k-1}$; normalize, $q_k = z_k / \|z_k\|$; and compute the eigenvalue estimate $\lambda^{(k)} = q_k^\top A q_k$. This is just a repeated matrix-vector product.
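
A minimal numpy sketch of power iteration as described above; the helper name, iteration count, and toy matrix are illustrative:

```python
import numpy as np

def power_method(A, n_iters=200, seed=0):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric matrix A."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(A.shape[0])
    q /= np.linalg.norm(q)
    for _ in range(n_iters):
        z = A @ q                      # repeated matrix-vector product
        q = z / np.linalg.norm(z)
    lam = q @ A @ q                    # Rayleigh-quotient estimate of lambda_1
    return lam, q

M = np.random.rand(50, 50); A = (M + M.T) / 2      # toy symmetric matrix
lam, q = power_method(A)
assert np.isclose(lam, np.linalg.eigvalsh(A)[-1], atol=1e-6)
```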

25 Convergence properties. Suppose $q_0 = a_1 u_1 + a_2 u_2 + \ldots + a_n u_n$; this is always possible since the columns of U make a basis of $\mathbb{R}^n$. Then $A^k q_0 = a_1 \lambda_1^k u_1 + a_2 \lambda_2^k u_2 + \ldots + a_n \lambda_n^k u_n$, since $A u_i = \lambda_i u_i \Rightarrow A^k u_i = \lambda_i^k u_i$.

26 Convergence properties. With $q_0 = a_1 u_1 + \ldots + a_n u_n$ as above, $A^k q_0 = a_1 \lambda_1^k \big[u_1 + \sum_{j \ge 2} \frac{a_j}{a_1} \big(\frac{\lambda_j}{\lambda_1}\big)^k u_j\big]$, so $\mathrm{dist}(\mathrm{span}(q_k), \mathrm{span}(u_1)) = O(|\lambda_2/\lambda_1|^k)$ if $a_1 \neq 0$.

27 Convergence properties. Since $\mathrm{dist}(\mathrm{span}(q_k), \mathrm{span}(u_1)) = O(|\lambda_2/\lambda_1|^k)$ if $a_1 \neq 0$, a bigger eigen-gap leads to faster convergence, and $a_1 \neq 0$ is easy to satisfy by random initialization. The method can be generalized to multiple eigenvectors simultaneously by orthogonal iteration via QR-decomposition!
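
A hedged sketch of orthogonal (subspace) iteration for a symmetric PSD matrix, assuming the top-k eigenvalues are also the largest in magnitude; names, sizes, and the iteration count are illustrative:

```python
import numpy as np

def orthogonal_iteration(A, k, n_iters=200, seed=0):
    """Power method generalized to the top-k eigenvectors via one QR step per iteration."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(n_iters):
        Q, _ = np.linalg.qr(A @ Q)             # multiply by A, then re-orthonormalize
    return np.diag(Q.T @ A @ Q), Q             # Rayleigh-quotient eigenvalue estimates, basis

M = np.random.rand(100, 30); A = M @ M.T       # toy PSD matrix
vals, Q = orthogonal_iteration(A, k=3)
print(np.sort(vals), np.linalg.eigvalsh(A)[-3:])
```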

28 PageRank. Which document or link is more popular, independent of the query? [Wikipedia] Random walk over pages: if one randomly clicks the links, what is the probability that one ends up at page B?

29 PageRank. Goal: the stationary probability distribution over pages. Adjacency matrix $A_{[n\times n]}$: $A_{ij} = 1$ if $i \to j$, 0 otherwise. Transition matrix $T_{[n\times n]}$: $T_{ij} = p(j \mid i)$ if $i \to j$, 0 otherwise. The transition matrix is stochastic, i.e., its rows sum to 1.

30 PageRank. With the row-stochastic transition matrix T defined above, the stationary probability distribution is given by the eigenvector corresponding to the largest eigenvalue (i.e., 1) of $T^\top$: $T^\top u = u$.

31 PageRank. The stationary distribution $T^\top u = u$ is found by power iteration, $u_{k+1} = \hat{T}^\top u_k$, starting from some $u_0$. Damping is used to make sure the graph is connected: $\hat{T} = \alpha T + (1 - \alpha)(1/n)\, \mathbf{1}\mathbf{1}^\top$, where $\mathbf{1}$ is the n-vector of 1s.
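
A small numpy sketch of PageRank by power iteration with damping, following the slide's formulation; the dangling-page handling and the toy 3-page graph are assumptions for illustration:

```python
import numpy as np

def pagerank(A, alpha=0.85, n_iters=100):
    """PageRank by power iteration on the damped transition matrix (sketch).

    A is a binary adjacency matrix with A[i, j] = 1 if page i links to page j.
    """
    n = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1                       # avoid division by zero for dangling pages
    T = A / out_deg                                 # row-stochastic transition matrix
    T_hat = alpha * T + (1 - alpha) / n             # damping: teleport uniformly with prob 1-alpha
    u = np.full(n, 1.0 / n)
    for _ in range(n_iters):
        u = T_hat.T @ u                             # u_{k+1} = T_hat^T u_k
        u /= u.sum()
    return u

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
print(pagerank(A))
```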

32 Lanczos Method. A technique to solve large sparse symmetric eigenproblems. Useful when only the top or bottom few eigenvalues/vectors are needed. Generates a sequence of tridiagonal matrices whose extremal eigenvalues are progressively better estimates of the desired values.

33 Lanczos Method. A technique to solve large sparse symmetric eigenproblems, useful when only the top or bottom few eigenvalues/vectors are needed; it generates a sequence of tridiagonal matrices whose extremal eigenvalues are progressively better estimates of the desired values. Krylov subspaces: $\mathcal{K}(A, q_1, k) = \mathrm{span}\{q_1, A q_1, \ldots, A^{k-1} q_1\}$.

34 Lanczos Method. Krylov subspaces: $\mathcal{K}(A, q_1, k) = \mathrm{span}\{q_1, A q_1, \ldots, A^{k-1} q_1\}$. Key idea: successively generate a new orthonormal vector $q_{k+1}$ such that $\mathrm{span}\{q_1, q_2, \ldots, q_{k+1}\} = \mathrm{span}\{q_1, A q_1, \ldots, A^k q_1\}$. How to find $\{q_1, q_2, \ldots, q_{k+1}\}$?

35 Lanczos Method. Why tridiagonalization? For any symmetric matrix A: $Q^\top A Q = T$, where Q is an n x n orthogonal matrix and T is tridiagonal, with diagonal entries $\alpha_1, \ldots, \alpha_n$ and off-diagonal entries $\beta_1, \ldots, \beta_{n-1}$.

36 Lanczos Method. Why tridiagonalization? For any symmetric matrix A: $Q^\top A Q = T$ with Q orthogonal and T tridiagonal (diagonal $\alpha_i$, off-diagonal $\beta_i$), so $AQ = QT$. Writing $Q = [q_1\ q_2\ \ldots\ q_n]$ and letting $e_k$ denote the k-th standard basis vector, $q_k = Q e_k$.

37 Lanczos Method. With $Q^\top A Q = T$ ($AQ = QT$, $q_k = Q e_k$), the Krylov matrix satisfies $K(A, q_1, n) = [q_1\ A q_1\ \ldots\ A^{n-1} q_1] = Q\, [e_1\ T e_1\ \ldots\ T^{n-1} e_1]$. The columns of Q form an orthogonal basis for the Krylov matrix!

38 Lanczos Method. Construct partial Q and partial T iteratively from $AQ = QT$. Focusing on the k-th column of $AQ$: $A q_k = \beta_{k-1} q_{k-1} + \alpha_k q_k + \beta_k q_{k+1}$, so $\beta_k q_{k+1} = (A - \alpha_k I) q_k - \beta_{k-1} q_{k-1} \equiv r_k$ and $q_{k+1} = r_k / \beta_k$ if $\beta_k = \|r_k\| \neq 0$.

39 Lanczos Method. From $A q_k = \beta_{k-1} q_{k-1} + \alpha_k q_k + \beta_k q_{k+1}$, premultiplying by $q_k^\top$ and using the orthonormality of the q's gives $\alpha_k = q_k^\top A q_k$. Lanczos iteration: set $r_0 = q_1$, $\beta_0 = 1$, $q_0 = 0$; for $k = 1, 2, \ldots$ while $\beta_{k-1} \neq 0$: $q_k = r_{k-1} / \beta_{k-1}$, $\alpha_k = q_k^\top A q_k$, $r_k = (A - \alpha_k I) q_k - \beta_{k-1} q_{k-1}$, $\beta_k = \|r_k\|$. Each step needs only one matrix-vector product.
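
A compact numpy sketch of the Lanczos iteration above, without the reorthogonalization that a practical implementation would add; names and the toy comparison are illustrative:

```python
import numpy as np

def lanczos(A, k, seed=0):
    """k steps of the Lanczos iteration for a symmetric matrix A (no reorthogonalization)."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    r = rng.standard_normal(n)
    beta_prev = np.linalg.norm(r)
    q_prev = np.zeros(n)
    for j in range(k):
        q = r / beta_prev
        Q[:, j] = q
        alpha[j] = q @ (A @ q)                     # alpha_j = q_j^T A q_j
        r = A @ q - alpha[j] * q - beta_prev * q_prev
        beta_prev = np.linalg.norm(r)
        q_prev = q
        if beta_prev == 0:                         # Krylov space exhausted
            break
        if j < k - 1:
            beta[j] = beta_prev
    T = np.diag(alpha) + np.diag(beta[:k-1], 1) + np.diag(beta[:k-1], -1)
    return Q, T

# extremal eigenvalues of the small tridiagonal T approximate those of A
M = np.random.rand(300, 300); A = (M + M.T) / 2
Q, T = lanczos(A, k=30)
print(np.linalg.eigvalsh(T)[-1], np.linalg.eigvalsh(A)[-1])
```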

40 Lanczos Method. When to stop? Ideally when k equals the rank of the Krylov matrix; that is usually too long, so a threshold is used on the residual. After k steps, $A Q_k = Q_k T_k + r_k e_k^\top$. Decompose $T_k = S_k \Theta_k S_k^\top$ with $\Theta_k = \mathrm{diag}(\theta_1, \ldots, \theta_k)$.

41 Lanczos Method. With $A Q_k = Q_k T_k + r_k e_k^\top$ and $T_k = S_k \Theta_k S_k^\top$, $\Theta_k = \mathrm{diag}(\theta_1, \ldots, \theta_k)$: the columns of $Q_k S_k$ are the estimated eigenvectors of A, and the $\theta_i$ are the estimated eigenvalues of A; the eigendecomposition of the small tridiagonal $T_k$ is cheap.

42 Lanczos Method. Estimated eigenvalues $\theta_i$ and eigenvectors $Q_k S_k$ as above. Accuracy: $\min_{\mu \in \lambda(A)} |\theta_i - \mu| \le \beta_k\, |s_{ki}|$, where $s_{ki}$ is the last component of the i-th column of $S_k$. Convergence (Kaniel-Paige theory): $\lambda_1 \ge \theta_1 \ge \lambda_1 - (\lambda_1 - \lambda_n)\,\tan^2(\phi_1)\,/\,\big(c_{k-1}(1 + 2\rho_1)\big)^2$, where $\cos(\phi_1) = |q_1^\top u_1|$, $u_1$ is the top eigenvector of A, $c_{k-1}$ is the Chebyshev polynomial of degree k-1, and $\rho_1 = (\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_n)$.

43 Lanczos Method. By the accuracy and Kaniel-Paige convergence bounds above, Lanczos iterations converge faster than power iteration! Practical implementation: avoid rounding errors (loss of orthogonality among the $q_k$).

44 Arnoldi's Method. A technique to solve large sparse asymmetric eigenproblems. Generates a sequence of Hessenberg matrices whose extremal eigenvalues are progressively better estimates. $Q^\top A Q = H$ with H upper Hessenberg, so $AQ = QH$ and $A q_k = \sum_{i=1}^{k+1} h_{ik} q_i$, where $h_{ik} = q_i^\top A q_k$; the residual is $r_k = A q_k - \sum_{i=1}^{k} h_{ik} q_i$.

45 Arnoldi's Method. With $r_k = A q_k - \sum_{i=1}^{k} h_{ik} q_i$ and $h_{ik} = q_i^\top A q_k$: if $h_{k+1,k} = \|r_k\| \neq 0$, set $q_{k+1} = r_k / h_{k+1,k}$. After k steps, $A Q_k = Q_k H_k + r_k e_k^\top$. For $H_k y = \lambda y$ ($\lambda$ a Ritz value, $Q_k y$ a Ritz vector), the residual satisfies $\|(A - \lambda I) Q_k y\| = |e_k^\top y|\, \|r_k\|$. In practice, applied with multiple restarts!
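
A minimal numpy sketch of the Arnoldi iteration for a nonsymmetric matrix, with Ritz values taken from the square part of H; restarts are omitted, and the names and sizes are illustrative:

```python
import numpy as np

def arnoldi(A, k, seed=0):
    """k steps of the Arnoldi iteration for a (possibly nonsymmetric) matrix A."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))                       # upper Hessenberg
    q = rng.standard_normal(n)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):
        r = A @ Q[:, j]
        for i in range(j + 1):                     # orthogonalize against previous q_i
            H[i, j] = Q[:, i] @ r
            r -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(r)
        if H[j + 1, j] == 0:                       # invariant subspace found
            return Q[:, :j + 1], H[:j + 1, :j + 1]
        Q[:, j + 1] = r / H[j + 1, j]
    return Q, H

# Ritz values: eigenvalues of the square part H[:k, :k]
A = np.random.rand(200, 200)
Q, H = arnoldi(A, k=40)
ritz = np.linalg.eigvals(H[:40, :40])
print(sorted(ritz.real)[-1], sorted(np.linalg.eigvals(A).real)[-1])
```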

46 Randomized SVD and low-rank approx. Basic problem: given an m x n matrix A and an integer $l = k + p$ (p is oversampling), find an m x l orthonormal matrix Q such that $A \approx Q Q^\top A$. The columns of Q form an orthonormal basis for the (approximate) range of A, and $l \ll m, n$.

47 Randomized SVD and low-rank approx. Given an m x l orthonormal Q with $A \approx Q Q^\top A$, how to do SVD? 1. Form a small (l x n) matrix $B = Q^\top A$ -- $O(mnl)$. 2. Compute the SVD of B: $B = \hat{U} \Sigma V^\top$ -- $O(nl^2)$. 3. Set $U = Q \hat{U}$ -- $O(ml^2)$. Of course, the best Q would be $Q = U_l$, but that is expensive! How to build an approximate Q?

48 Randomized method. Basic problem: given an m x n matrix A and an integer l, find an m x l orthonormal matrix Q such that $A \approx Q Q^\top A$. Procedure: 1. Draw l random vectors $w_1, w_2, \ldots, w_l \in \mathbb{R}^n$ from P(w). 2. Form the projected sample vectors $y_1 = A w_1, \ldots, y_l = A w_l \in \mathbb{R}^m$. 3. Form orthonormal vectors $q_1, q_2, \ldots, q_l \in \mathbb{R}^m$ such that $\mathrm{span}(q_1, \ldots, q_l) = \mathrm{span}(y_1, \ldots, y_l)$ -- Gram-Schmidt or QR.

49 Randomized method. Matrix version of the procedure: 1. Construct a random matrix $\Omega_l$ of size n x l -- $O(nl)$. 2. Form the m x l sample matrix $Y_l = A \Omega_l$ -- $O(mnl)$, the most expensive step; use a randomized FFT for $O(mn \log(l))$. 3. Form an m x l orthonormal matrix $Q_l$ such that $Y_l = Q_l Q_l^\top Y_l$ -- $O(ml^2)$.
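
A short numpy sketch combining the range finder above with the small-SVD step of slide 47, using a plain Gaussian test matrix rather than a randomized FFT; the function name, oversampling p, and toy sizes are illustrative:

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Randomized SVD sketch: range finder with oversampling, then SVD of the small B = Q^T A."""
    m, n = A.shape
    l = k + p                                      # target rank plus oversampling
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, l))            # random test matrix, O(nl)
    Y = A @ Omega                                  # sample matrix, O(mnl)
    Q, _ = np.linalg.qr(Y)                         # orthonormal basis for the approximate range of A
    B = Q.T @ A                                    # small l x n matrix
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                                  # lift back to the original space
    return U[:, :k], s[:k], Vt[:k, :]

A = np.random.rand(2000, 500)
U, s, Vt = randomized_svd(A, k=20)
print(np.linalg.norm(A - U @ np.diag(s) @ Vt, 2), np.linalg.svd(A, compute_uv=False)[20])
```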

50 Randomized method. With $Q_l$ constructed as above (random $\Omega_l$, sample matrix $Y_l = A \Omega_l$, orthonormalization), the error in the approximation satisfies $\|A - Q_l Q_l^\top A\|_2 \le \big[1 + 11 \sqrt{k+p}\, \sqrt{\min(m, n)}\big]\, \sigma_{k+1}$ with probability $1 - 6 p^{-p}$.

51 Matrices with slowly decaying spectrum. Basic idea: incorporate Krylov-space type power iteration. $B = (A A^\top)^q A$ has the same singular vectors as A but faster-decaying singular values ($\sigma_i(B) = \sigma_i(A)^{2q+1}$).

52 Matrices with slowly decaying spectrum. Basic idea: work with $B = (A A^\top)^q A$, which has the same singular vectors as A but faster-decaying singular values. Algorithm: 1. Construct a random matrix $\Omega_l$ of size n x l. 2. Form the m x l sample matrix $Z = (A A^\top)^q A\, \Omega_l$. 3. Form an m x l orthonormal matrix $Q_l$ from Z using partial QR.
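
A hedged numpy sketch of this power-iteration variant, re-orthonormalizing between multiplications for numerical stability (a common stabilization, not spelled out on the slide); names and sizes are illustrative:

```python
import numpy as np

def randomized_range_finder_power(A, l, q=2, seed=0):
    """Range finder for A using q steps of power iteration with intermediate QR factorizations."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], l))
    Z = A @ Omega
    Q, _ = np.linalg.qr(Z)
    for _ in range(q):                             # effectively Z = (A A^T)^q A Omega, stabilized
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    return Q

A = np.random.rand(1000, 400) @ np.random.rand(400, 400)   # toy matrix with slowly decaying spectrum
Q = randomized_range_finder_power(A, l=30, q=2)
print(np.linalg.norm(A - Q @ Q.T @ A, 2))
```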

53 Matrices with slowly decaying spectrum. With $Q_l$ built from $Z = (A A^\top)^q A\, \Omega_l$, the error in the approximation satisfies $\|A - Q_l Q_l^\top A\|_2 \le \big[1 + 11 \sqrt{k+p}\, \sqrt{\min(m, n)}\big]^{1/(2q+1)}\, \sigma_{k+1}$ with probability $1 - 6 p^{-p}$.

54 Randomized method: Example. Eigenfaces: m = 754 face images, each represented by a vector of n pixels.

55 Randomized method. Advantages: 1. Much faster than standard methods: $O(mn \log k)$ instead of $O(mnk)$. 2. Needs to look at the data only a very few times (sometimes a single pass!). 3. Easy parallel implementation. 4. The error in approximation can be made arbitrarily small with very high probability (failure probability even $10^{-20}$) inexpensively. 5. Simple to code.

56 Randomized vs Sparse Methods. Randomized methods build the approximate range of A by multiplying A with fresh random vectors: $y_1 = A w_1, \ldots, y_l = A w_l$. Sparse methods build the approximate range of A by multiplying recursively with the outcome of the previous stage, starting from a random vector: $y_1 = A w_1, \ldots, y_l = A y_{l-1}$ (if A is square). Advantages of the randomized method: 1. Similar computational complexity. 2. Can be parallelized. 3. Provides better approximations even for small spectral gaps. 4. Can learn from very few (sometimes just one) passes over the data.

57 References.
1. Deerwester, S., et al., Improving Information Retrieval with Latent Semantic Indexing, Proceedings of the 51st Annual Meeting of the American Society for Information Science, 1988.
2. Ulrike von Luxburg, A Tutorial on Spectral Clustering, Tech Report TR-149, Max Planck Institute.
3. Golub and Van Loan, Matrix Computations, Johns Hopkins Press.
4. Sergey Brin and Larry Page (1998), "The Anatomy of a Large-Scale Hypertextual Web Search Engine", Proceedings of the 7th International Conference on World Wide Web (WWW), Brisbane, Australia.
5. A. Frieze, R. Kannan and S. Vempala, Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations, Proceedings of the Foundations of Computer Science, 1998.
6. P. Drineas, R. Kannan and M. Mahoney, Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication, Tech Report.
7. Halko, Martinsson, and Tropp, Finding Structure with Randomness: Stochastic Algorithms for Constructing Approximate Matrix Decompositions, ACM Report, 2009.
