The Nyström Extension and Spectral Methods in Learning

1. The Nyström Extension and Spectral Methods in Learning
New bounds and algorithms for high-dimensional data sets
Patrick J. Wolfe (joint work with Mohamed-Ali Belabbas)
Statistics and Information Sciences Laboratory, School of Engineering and Applied Sciences, and Department of Statistics, Harvard University
Future Directions in High-Dimensional Data Analysis, Newton Institute Workshop, Cambridge University, 26 June 2008

2. Outline
1. Introduction: spectral methods and statistical learning; approximating a positive semi-definite kernel; discriminating between data and information
2. Nyström Approximation and Multi-Index Selection: the Nyström extension as an approximation method; randomized multi-index selection by weighted sampling; deterministic multi-index selection by sorting
3. Numerical Results and Algorithmic Implementation: approximate sampling; low-rank kernel approximation; methods for nonlinear embeddings

5. Spectral Methods in Learning: the discrepancy between data and information
- What role do spectral methods play in statistical learning?
- Goal: get relevant information about very large data sets in very high-dimensional spaces (image segmentation, low-dimensional embeddings, ...).
- What is the relevant information contained in the data set?
- Spectral methods reduce this question to finding a low-rank approximation to a symmetric, positive semi-definite (SPSD) kernel, equivalently a quadratic form or positive kernel.
- They can be quite effective, and see wide use:
  - Older methods: principal components analysis (1901), multidimensional scaling (1958), ...
  - Newer methods: Isomap, Laplacian eigenmaps, Hessian eigenmaps, diffusion maps, ...

6. Application of Low-Rank Approximations to Learning: inner and outer characteristics of the point cloud
Let {x_1, ..., x_n} be a collection of data points in R^m. Spectral methods can be classified according to whether they rely on:
- Outer characteristics of the point cloud (PCA, discriminants). Here we work directly in the ambient space; these require spectral analysis of a positive-definite kernel of dimension m, the extrinsic dimensionality of the data.
- Inner characteristics of the point cloud (MDS, extensions). Embedding requires the spectral analysis of a kernel of dimension n, the cardinality of the point cloud.
In either case, the spectral analysis task typically consists of finding a rank-k approximation to a symmetric, positive semi-definite matrix; a small sketch of the dimension distinction follows.
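To make the m-versus-n distinction concrete, here is a minimal numpy sketch (not from the talk; the random point cloud and all variable names are illustrative assumptions) contrasting the m x m covariance kernel analyzed by outer methods with the n x n Gram kernel analyzed by inner methods.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 20                      # n points in ambient dimension m (hypothetical)
X = rng.standard_normal((n, m))     # one data point per row
Xc = X - X.mean(axis=0)             # center the point cloud

# Outer characteristics (PCA-style): an m x m covariance kernel.
C = Xc.T @ Xc / n                   # shape (m, m): the extrinsic dimensionality
print(C.shape)

# Inner characteristics (MDS-style): an n x n Gram kernel.
G = Xc @ Xc.T                       # shape (n, n): the cardinality of the cloud
print(G.shape)

# Both kernels are SPSD and share their nonzero eigenvalues
# (up to the 1/n scaling used above); only their sizes differ.
evC = np.linalg.eigvalsh(C)[::-1]
evG = np.linalg.eigvalsh(G)[::-1] / n
print(np.allclose(evC[:m], evG[:m]))
```

The two kernels carry the same spectral information, but the cost of their spectral analysis scales with m in the outer case and with n in the inner case.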

7. How to Approximate a Positive Kernel, in Theory? Finding a low-rank approximation is easy...
- An SPSD matrix Q can be written in spectral coordinates as Q = U Λ U^T, where U is orthogonal and Λ = diag(λ_1, ..., λ_n) is diagonal.
- The λ_i are the eigenvalues of Q, ordered so that λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0, and the u_i are the corresponding eigenvectors.
- For any unitarily invariant matrix norm, we have that
    argmin_{Q̃ : rank(Q̃) = k} ‖Q − Q̃‖ = U Λ_k U^T =: Q_k,
  where Λ_k = diag(λ_1, λ_2, ..., λ_k, 0, ..., 0).
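As a concrete illustration, the following numpy sketch (an illustrative addition, not code from the talk) forms the optimal rank-k approximant by truncating the eigendecomposition and checks the resulting Frobenius error against the discarded eigenvalues.

```python
import numpy as np

def best_rank_k(Q, k):
    """Optimal rank-k approximation of an SPSD matrix Q (for any unitarily
    invariant norm): keep the k largest eigenvalues and their eigenvectors."""
    lam, U = np.linalg.eigh(Q)          # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]      # reorder so that lam[0] >= lam[1] >= ...
    return (U[:, :k] * lam[:k]) @ U[:, :k].T

# Small demonstration on a random SPSD kernel.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 100))
Q = A @ A.T
k = 10
Qk = best_rank_k(Q, k)
lam = np.sort(np.linalg.eigvalsh(Q))[::-1]
# The Frobenius error of the optimal rank-k approximant is the root of the sum
# of the squared discarded eigenvalues; the trace-norm error is their plain sum.
print(np.linalg.norm(Q - Qk, "fro"), np.sqrt(np.sum(lam[k:] ** 2)))
```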

8. How to Approximate a Positive Kernel, in Practice? Finding a low-rank approximation is hard!
- Changing to spectral coordinates is done using the singular value decomposition of Q, which requires O(n^3) operations.
- On a Pentium IV 3 GHz desktop PC with 1 GB RAM and 512 KB cache, extrapolating to n = 10^6, factoring Q takes more than 4 months.
- As n increases, Q also quickly becomes too large to be stored in memory.

9. Approximating Very Large Kernels: how to discriminate between data and information?
- This presents a practical problem for large data sets!
- A commonly used trick is to sparsify the kernel: fix ε > 0 and, whenever Q_ij ≤ ε, set Q_ij = 0. Questions: how do we choose ε, and how accurate is the result?
- An alternative approach is to discard some of the data. How can we construct a low-rank approximation using just some of the data? The Nyström extension provides an answer.
- The idea behind the Nyström extension is as follows: write Q = X^T X, so that Q is a Gram matrix for column vectors x_1, ..., x_n. Choose a subset J of the vectors x_i, together with their correlations with all the other vectors, to find a low-rank approximation Q̃.

10. A Provably Good Low-Rank Approximation: our main result on approximating quadratic forms
- How should we choose J, with |J| = k, so as to minimize ‖Q − Q̃‖? This is equivalent to asking how to choose the most informative part of our data set, "most informative" being conditioned on our reconstruction scheme.
- There are (n choose k) such subsets, so there is no hope of enumerating them.
- Instead, we define a distribution on k-subsets by p_{Q,k}(J) ∝ det(Q_{J,J}), for principal submatrices Q_{J,J}.
- Let Q have eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0. Our main result will be to show that, for any unitarily invariant norm,
    E‖Q − Q̃‖ ≤ (k + 1)(λ_{k+1} + λ_{k+2} + ... + λ_n).

12. The Nyström Extension: simplify the problem
- Historically, the Nyström extension was introduced to obtain numerical solutions to integral equations.
- Let k : [0,1] × [0,1] → R be an SPD kernel and let (f_i, λ_i), i ∈ N, denote its pairs of eigenfunctions and eigenvalues:
    ∫_0^1 k(x, y) f_i(y) dy = λ_i f_i(x),   i ∈ N.
- The Nyström extension approximates the eigenvectors of k(x, y) based on a discretization of the interval [0,1].
- Define the M + 1 points x_m by x_m = x_{m-1} + 1/M with x_0 = 0, so that the x_m are evenly spaced along the interval [0,1].
- Form the Gram matrix K_{mn} := k(x_m, x_n) and decompose it.

13. The Nyström Extension: extend the solution
- We now solve a finite-dimensional problem:
    (1/(M+1)) Σ_n K_{mn} v_i(n) = λ_i^v v_i(m),   i = 0, 1, ..., M,
  where (v_i, λ_i^v) are the eigenvector-eigenvalue pairs associated with K.
- What do we do with these eigenvectors? We extend them to give an estimate f̂_i of the i-th eigenfunction as follows:
    f̂_i(x) = (1/((M+1) λ_i^v)) Σ_m k(x, x_m) v_i(m).
- In essence: use only partial information about the kernel to solve a simpler eigenvalue problem, and then extend the solution using complete knowledge of the kernel.
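The following numpy sketch (an illustrative addition; the Gaussian kernel and its bandwidth are assumptions, not taken from the slides) carries out exactly this recipe on [0,1]: solve the discrete eigenproblem for K/(M+1), then extend each eigenvector to an eigenfunction estimate.

```python
import numpy as np

# Nystrom extension for an integral-operator eigenproblem, sketched with a
# Gaussian kernel on [0,1] (the kernel choice is an illustrative assumption).
def kern(x, y, sigma=0.2):
    return np.exp(-((x - y) ** 2) / (2 * sigma ** 2))

M = 50
x = np.arange(M + 1) / M                      # M+1 evenly spaced points on [0,1]
K = kern(x[:, None], x[None, :])              # Gram matrix K_{mn} = k(x_m, x_n)

lam_v, V = np.linalg.eigh(K / (M + 1))        # discrete eigenpairs (ascending)
lam_v, V = lam_v[::-1], V[:, ::-1]            # largest eigenvalue first

def f_hat(i, xs):
    """Nystrom estimate of the i-th eigenfunction, evaluated at new points xs."""
    return kern(xs[:, None], x[None, :]) @ V[:, i] / ((M + 1) * lam_v[i])

# At the original sample points the extension reproduces the discrete eigenvector,
# a quick sanity check on the 1/((M+1) * lambda) scaling in the formula above.
print(np.allclose(f_hat(0, x), V[:, 0]))
```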

14. The Nyström Extension: in finite dimensions
- How do we translate this to a finite-dimensional setting? We approximate k eigenvectors of an SPD matrix Q by decomposing and extending a k × k principal submatrix of Q.
- We partition Q as follows:
    Q = [ Q_J   Y
          Y^T   Z ],
  with Q_J ∈ R^{k×k}; we say that this partition corresponds to the multi-index J = {1, 2, ..., k}.
- Define the spectral decompositions Q = U Λ U^T and Q_J = U_J Λ_J U_J^T.

15. The Nyström Extension: the approximation error
- The Nyström extension then provides an approximation for k eigenvectors in U as
    Ũ := [ U_J
           Y^T U_J Λ_J^{-1} ],   where Q_J = U_J Λ_J U_J^T.
- In turn, the approximations Ũ ≈ U and Λ_J ≈ Λ may be composed to yield an approximation Q̃ ≈ Q according to
    Q̃ := Ũ Λ_J Ũ^T = [ Q_J   Y
                        Y^T   Y^T Q_J^{-1} Y ].
- The resultant approximation error is ‖Q − Q̃‖ = ‖Z − Y^T Q_J^{-1} Y‖, the norm of the Schur complement of Q_J in Q.
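A compact way to compute this approximation in practice uses the equivalent outer-product form Q̃ = Q_{:,J} Q_J^{-1} Q_{J,:}. The sketch below (illustrative, not from the talk, and using a pseudoinverse in place of Q_J^{-1} for numerical safety) forms Q̃ for a fixed multi-index J and confirms that the error equals the Schur complement of Q_J in Q.

```python
import numpy as np

def nystrom(Q, J):
    """Nystrom approximation of an SPSD matrix Q from the multi-index J:
    Q_tilde = Q[:, J] @ pinv(Q[J, J]) @ Q[J, :]."""
    J = np.asarray(J)
    C = Q[:, J]                       # columns indexed by J (contains Q_J and Y)
    W = Q[np.ix_(J, J)]               # principal submatrix Q_J
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 30))
Q = A @ A.T                           # a rank-30 SPSD kernel
J = list(range(10))
Qt = nystrom(Q, J)

# The only nonzero error block is the Schur complement of Q_J in Q.
Jc = np.setdiff1d(np.arange(Q.shape[0]), J)
S = Q[np.ix_(Jc, Jc)] - Q[np.ix_(Jc, J)] @ np.linalg.pinv(Q[np.ix_(J, J)]) @ Q[np.ix_(J, Jc)]
print(np.allclose(np.linalg.norm(Q - Qt, "fro"), np.linalg.norm(S, "fro")))
```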

16. The Nyström Extension: another point of view
- What problem does the Nyström extension solve? Our answer requires the partial order on the cone of SPSD matrices: write Q ⪰ 0 if Q is SPSD, and Q̃ ⪯ Q if Q − Q̃ ⪰ 0.
- The Nyström extension Q̃ is the minimal (with respect to ⪯) positive-definite projection of Q onto the range of Q_J.
- Is there a more statistical interpretation? Think linear models:
  - Recall that any Q ⪰ 0 admits a Gram representation Q = X^T X.
  - Imagine selecting the first n − 1 columns of X and regressing the final column x_n onto these.
  - Our error term is then nothing more than the residual sum of squares. It is zero if and only if x_n lies in the span of x_1, ..., x_{n-1}, implying singular Q and perfect reconstruction; a small numerical check follows.
  - More generally, the Schur complement S_C(Q_J) is recognizable as the covariance of the best linear unbiased estimator.
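Here is a small numerical check of the regression interpretation (an illustrative addition, not from the talk): regressing the final column of X onto the others leaves a residual sum of squares equal to the 1 x 1 Schur complement of Q_J in Q for J = {1, ..., n-1}.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 8
X = rng.standard_normal((m, n))       # columns x_1, ..., x_n
Q = X.T @ X                           # Gram matrix

# Regress x_n onto x_1, ..., x_{n-1}; the RSS is the 1x1 Schur complement.
XJ, xn = X[:, :-1], X[:, -1]
beta, *_ = np.linalg.lstsq(XJ, xn, rcond=None)
rss = np.sum((xn - XJ @ beta) ** 2)

QJ = Q[:-1, :-1]
y = Q[:-1, -1]
schur = Q[-1, -1] - y @ np.linalg.solve(QJ, y)
print(np.allclose(rss, schur))
```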

17. Adjusting Computational Load vs. Approximation Error: from eigenanalysis to partitioning
- Back to our kernel approximation problem: on what fronts do we gain by using the Nyström extension?
- What is required is the spectral analysis of a kernel of size k ≪ n, a gain in both space and time complexity.
- But we have introduced another problem: how to partition Q? In other words, we have shifted the computational load from eigenanalysis to the determination of a good partition.
- The latter problem is more amenable to approximation; we give two algorithms to solve it, along with error bounds.

18. The Nyström Extension: a combinatorial problem
We now introduce the problem formally, with notation:
- J, K ⊆ {1, ..., n} are multi-indices of respective cardinalities k and l, containing pairwise distinct elements of {1, ..., n}.
- We write J = {j_1, ..., j_k} and K = {k_1, ..., k_l}, and denote by J̄ the complement of J in {1, ..., n}.
- Write Q_{J,K} for the k × l matrix whose (p, q)-th entry is given by (Q_{J,K})_{pq} = Q_{j_p k_q}, and abbreviate Q_J for Q_{J,J}.
- The partitioning problem is equivalent to selecting a multi-index J such that the error
    ‖Q − Q̃‖ = ‖Q_{J̄} − Q_{J̄,J} Q_J^{-1} Q_{J,J̄}‖ = ‖S_C(Q_J)‖
  is minimized for some unitarily invariant norm.

19. The Nyström Method and Exact Reconstruction: recovery when rank(Q_J) = rank(Q) = k
- When does the Nyström method admit exact reconstruction?
- If we take for J the entire set {1, 2, ..., n}, then the Nyström extension yields Q̃ = Q trivially.
- If Q is of rank k < n, then there exist J with |J| = k such that the Nyström method yields exact reconstruction; these J are precisely those for which rank(Q_J) = rank(Q) = k.
- Intuition: express Q as a Gram matrix whose entries comprise the inner products of n vectors in R^k. Knowing the correlations of these n vectors with a subset of k linearly independent vectors allows us to recover them. The information contained in Q_J and in the correlations with the remaining vectors is thus sufficient to reconstruct Q, and the Nyström extension performs the reconstruction.
- To verify this, we introduce our first lemma.

20. Verifying the Perfect Reconstruction Property: characterizing Schur complements as ratios of determinants
Lemma (Crabtree-Haynsworth). Let Q_J be a nonsingular principal submatrix of some SPSD matrix Q. The Schur complement of Q_J in Q is given element-wise by
    (S_C(Q_J))_{ij} = det(Q_{J∪{i}, J∪{j}}) / det(Q_J).   (1)
- This implies that for J such that rank(Q_J) = rank(Q) = k,
    S_C(Q_J) = Q_{J̄} − Q_{J̄,J} Q_J^{-1} Q_{J,J̄} = 0.
- Indeed, if rank(Q) = k = |J|, then (1) implies that diag(S_C(Q_J)) = 0; since Q ⪰ 0 implies S_C(Q_J) ⪰ 0 for any multi-index J, we may conclude that S_C(Q_J) is identically zero.
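The lemma is easy to check numerically; the sketch below (illustrative, not from the talk) compares each entry of the Schur complement against the corresponding ratio of determinants for a random positive-definite Q.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 7, 3
A = rng.standard_normal((n, n))
Q = A @ A.T                                   # generically positive definite

J = list(range(k))
Jc = [i for i in range(n) if i not in J]
# Schur complement of Q_J in Q, computed directly.
S = Q[np.ix_(Jc, Jc)] - Q[np.ix_(Jc, J)] @ np.linalg.solve(Q[np.ix_(J, J)], Q[np.ix_(J, Jc)])

# Crabtree-Haynsworth: each entry is a ratio of determinants.
detQJ = np.linalg.det(Q[np.ix_(J, J)])
ok = True
for a, i in enumerate(Jc):
    for b, j in enumerate(Jc):
        rows, cols = J + [i], J + [j]
        ratio = np.linalg.det(Q[np.ix_(rows, cols)]) / detQJ
        ok &= np.isclose(S[a, b], ratio)
print(ok)
```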

21. A Connection to Compound Matrices: characterizing Schur complements as ratios of determinants
- By Crabtree-Haynsworth we may write
    tr(S_C(Q_J)) = det(Q_J)^{-1} Σ_{i∉J} det(Q_{J∪{i}}).
- This is in fact the trace norm of S_C(Q_J); it is unitarily invariant and dominates all unitarily invariant matrix norms.
- Controlling the trace norm ‖S_C(Q_J)‖_tr = ‖Q − Q̃‖_tr is hence key to bounding the Nyström approximation error.
- Eigenvalue majorization results tell us about worst-case behavior but little else. Ideally we would like a relative-error approximation that is invariant to rotation.
- Recalling the optimal approximation via the SVD, the best possible bound would be multiplicative in the minimal trace-norm error Σ_{i=k+1}^n λ_i(Q).

22. A Connection to Compound Matrices: characterizing Schur complements as ratios of determinants
- Crabtree-Haynsworth provides an entry-wise characterization of the Schur complement in terms of ratios of determinants.
- To exploit this, we require the notion of exterior algebras by way of compound matrices. Specifically, the k-th compound matrix Q^(k) of Q is comprised of all determinants det(Q_{J,K}) of k-subsets J, K.
- The following important result allows us to express tr(Q^(k)) as a sum of k-fold products of eigenvalues of Q:
Theorem (Cauchy-Binet). Let Q admit the spectral decomposition Q = U Λ U^T. Then, if Q^(k) denotes the k-th compound matrix of Q, we have that
    (U Λ U^T)^(k) = U^(k) Λ^(k) (U^T)^(k).
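Since tr(Q^(k)) is the sum of all principal k-minors of Q, Cauchy-Binet says it equals the sum of all k-fold products of eigenvalues. The brute-force sketch below (illustrative, not from the talk, and feasible only for small n) verifies this identity.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n, k = 6, 3
A = rng.standard_normal((n, n))
Q = A @ A.T

# tr(Q^(k)) = sum of all principal k-minors of Q ...
tr_compound = sum(np.linalg.det(Q[np.ix_(J, J)]) for J in combinations(range(n), k))

# ... which, by Cauchy-Binet, equals the sum of all k-fold products of eigenvalues.
lam = np.linalg.eigvalsh(Q)
e_k = sum(np.prod([lam[i] for i in S]) for S in combinations(range(n), k))

print(np.isclose(tr_compound, e_k))
```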

23. Randomized Multi-Index Selection by Weighted Sampling: statement of the main result
- Since Q ⪰ 0, we may use it to induce a probability distribution on the set of all J with |J| = k as follows: p_{Q,k}(J) ∝ det(Q_J).
- Thus, we may first sample J ~ p_{Q,k}(J), and then perform the Nyström extension on the chosen multi-index:
Theorem (Randomized Multi-Index Selection). Let Q ⪰ 0 have eigenvalues λ_1 ≥ ... ≥ λ_n ≥ 0, and let Q̃ be the Nyström approximation to Q corresponding to J, with J ~ p_{Q,k}(J) ∝ det(Q_J). Then, for any unitarily invariant matrix norm,
    E‖Q − Q̃‖ ≤ (k + 1) Σ_{i=k+1}^n λ_i.   (2)
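For small n the distribution p_{Q,k} can be enumerated exactly, which gives a direct check of the theorem. The sketch below (illustrative, not from the talk) computes the exact expected trace-norm error under determinantal sampling and compares it with the bound (2).

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
n, k = 8, 3
A = rng.standard_normal((n, n))
Q = A @ A.T

def schur_trace(Q, J):
    """Trace-norm error of the Nystrom approximation with multi-index J
    (the error matrix is PSD, so its trace norm is just its trace)."""
    J = list(J)
    Jc = [i for i in range(Q.shape[0]) if i not in J]
    S = Q[np.ix_(Jc, Jc)] - Q[np.ix_(Jc, J)] @ np.linalg.solve(Q[np.ix_(J, J)], Q[np.ix_(J, Jc)])
    return np.trace(S)

subsets = list(combinations(range(n), k))
weights = np.array([np.linalg.det(Q[np.ix_(J, J)]) for J in subsets])
probs = weights / weights.sum()                       # p_{Q,k}(J) proportional to det(Q_J)

expected_err = sum(p * schur_trace(Q, J) for p, J in zip(probs, subsets))
lam = np.sort(np.linalg.eigvalsh(Q))[::-1]
bound = (k + 1) * lam[k:].sum()
print(expected_err <= bound + 1e-9, expected_err, bound)
```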

24. Proof of the Randomized Multi-Index Result I: randomized algorithm for multi-index selection
Proof. We seek to bound
    E‖Q − Q̃‖_tr = (1 / tr(Q^(k))) Σ_{J : |J| = k} det(Q_J) ‖S_C(Q_J)‖_tr.
The Crabtree-Haynsworth Lemma yields
    tr(S_C(Q_J)) = Σ_{i∉J} det(Q_{J∪{i}}) / det(Q_J),
and thus
    E‖Q − Q̃‖_tr = (1 / tr(Q^(k))) Σ_{J : |J| = k} Σ_{i∉J} det(Q_{J∪{i}}).   (3)

25. Proof of the Randomized Multi-Index Result II: randomized algorithm for multi-index selection
Proof (continued). Every multi-index of cardinality k + 1 appears exactly k + 1 times in the double sum of (3) above, whence
    E‖Q − Q̃‖_tr = ((k + 1) / tr(Q^(k))) Σ_{J : |J| = k+1} det(Q_J) = (k + 1) tr(Q^(k+1)) / tr(Q^(k)).   (4)
By Cauchy-Binet, the sum of the principal (k+1)-minors of Q can be expressed as the sum of (k+1)-fold products of its ordered eigenvalues, and so, factoring, we obtain
    E‖Q − Q̃‖_tr = (k + 1) tr(Q^(k+1)) / tr(Q^(k)) ≤ (k + 1) (Σ_{i=k+1}^n λ_i(Q)) tr(Q^(k)) / tr(Q^(k)).

26. Deterministic Low-Rank Kernel Approximation: deterministic multi-index selection by sorting
- We now present a low-complexity deterministic multi-index selection algorithm and provide a bound on its worst-case error.
- Let J contain the indices of the k largest diagonal elements of Q, and then implement the Nyström extension. Then we have:
Theorem (Deterministic Multi-Index Selection). Let Q be a positive-definite kernel, let J contain the indices of its k largest diagonal elements, and let Q̃ be the corresponding Nyström approximation. Then, for any unitarily invariant matrix norm,
    ‖Q − Q̃‖ ≤ Σ_{i∉J} Q_ii.   (5)
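A sketch of the sorting-based selection rule follows (illustrative, not from the talk; a pseudoinverse is used in place of Q_J^{-1} for numerical safety), together with a check of the diagonal-sum bound (5) in the trace norm.

```python
import numpy as np

def nystrom_by_sorting(Q, k):
    """Deterministic multi-index selection: take the indices of the k largest
    diagonal entries, then form the Nystrom approximation."""
    J = np.argsort(np.diag(Q))[::-1][:k]
    C = Q[:, J]
    return C @ np.linalg.pinv(Q[np.ix_(J, J)]) @ C.T, J

rng = np.random.default_rng(7)
A = rng.standard_normal((300, 40))
Q = A @ A.T
k = 15
Qt, J = nystrom_by_sorting(Q, k)

err_tr = np.trace(Q - Qt)                               # trace-norm error (error matrix is PSD)
bound = np.sort(np.diag(Q))[: Q.shape[0] - k].sum()     # sum of the discarded diagonal entries
print(err_tr <= bound + 1e-6, err_tr, bound)
```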

27. Proof of the Deterministic Multi-Index Result I: deterministic algorithm for multi-index selection
- We have sacrificed some power to obtain gains in the deterministic nature of the result and in computational efficiency:
    ‖Q − Q̃‖ ≤ Σ_{i=k+1}^n Q_ii  (sorting, with the diagonal entries ordered decreasingly)
  versus
    E‖Q − Q̃‖ ≤ (k + 1) Σ_{i=k+1}^n λ_i  (sampling).
- The proof of this theorem is straightforward once we have the following generalization of the Hadamard inequality:
Lemma (Fischer's Lemma). If Q is a positive-definite matrix and Q_J a nonsingular principal submatrix, then det(Q_{J∪{i}}) ≤ det(Q_J) Q_ii.

28. Proof of the Deterministic Multi-Index Result II: deterministic algorithm for multi-index selection
Proof of the Theorem. We have from our earlier proof that ‖Q − Q̃‖ ≤ tr(S_C(Q_J)); applying Crabtree-Haynsworth in turn gives
    ‖Q − Q̃‖ ≤ (1 / det(Q_J)) Σ_{i∉J} det(Q_{J∪{i}}),
after which Fischer's Lemma yields ‖Q − Q̃‖ ≤ Σ_{i∉J} Q_ii.
- Though we cannot say more without formally adopting a measure on the cone of SPSD matrices, we have observed that this simple algorithm appears to work well in a variety of practical settings.
- Other stepwise-greedy and sampling approaches to multi-index selection are also available through the notion of pinchings.

29. Remarks and Discussion: key points
- Determinantal sampling exploits the full power of Cauchy-Binet and Crabtree-Haynsworth, yielding a relative-error bound that depends only on the spectrum of Q.
- Assigning zero mass to singular subsets is key; we pay O(k^3) for this.
- Maximizing det(Q_J) is not in general superior, and is NP-hard.
- Tridiagonal restrictions of the subsets Q_J interpolate between our two results in a natural way, both in terms of bounds and of algorithmic complexity; Fischer's Lemma is an extreme example. In general, you pay more to get more.
- Majorization results can be used to obtain quick estimates of the eigenvalues of Q, and hence of the best or expected error of the Nyström approximation.
- If the spectrum of Q does not exhibit fast decay, one should be cautious in approximating it!

30. Remarks and Discussion: comparison to known results
- Drineas et al. (2005) proposed to choose row/column subsets by sampling indices independently and with replacement, in proportion to the elements of {Q_ii^2}_{i=1}^n, and showed only an additive-error bound of the form
    E‖Q − Q̃‖_F ≤ ‖Q − Q_k‖_F + ε Σ_{i=1}^n Q_ii^2.
- Our randomized approach yields a relative-error bound; its algorithmic complexity is O(k^3 + (n − k) k^2).
- Our deterministic approach offers an improvement if tr(Q) ≪ n; its complexity is O(k^3 + (n − k) k^2 + n log k).
- There are connections to the recently introduced notion of volume sampling in theoretical computer science.

32. Implementation of the Sampling Scheme: sampling from p_{Q,k}
- Sampling directly from p_{Q,k}(J) ∝ det(Q_J) is infeasible; simulation methods provide an appealing alternative.
- We employed the Metropolis algorithm to simulate an ergodic Markov chain admitting p_{Q,k}(J) as its equilibrium distribution.
- The proposal distribution is straightforward: exchange one index from J with one index from J̄, uniformly at random. Variants such as mixture kernels are of course possible; we made no attempt to optimize this choice, as its performance in practice was observed to be satisfactory.

33. A Metropolis Algorithm for Sampling from p_{Q,k}: implementation of the sampling scheme
Implementation of the Metropolis sampler is straightforward and intuitive:
- Begin with data X = {x_1, x_2, ..., x_n} ⊂ R^m
- Initialize (in any desired manner) a multi-index J^(0) of cardinality k
- Compute the sub-kernel Q^(0) from X and J^(0)
- After T iterations, return J ~ p_{Q,k} (approximately)

INPUT: data X, 0 ≤ k ≤ n, T > 0, k × k sub-kernel Q^(0) with indices J^(0)
OUTPUT: sampled k-multi-index J
for t = 1 to T do
    pick s ∈ {1, 2, ..., k} uniformly at random, and let j'_s denote the s-th element of J^(t-1)
    pick j_s ∈ {1, 2, ..., n} \ J^(t-1) at random
    Q' ← UpdateKernel(Q^(t-1), X, s, j_s)
    with probability min(1, det(Q') / det(Q^(t-1))) do
        Q^(t) ← Q';   J^(t) ← (J^(t-1) \ {j'_s}) ∪ {j_s}
    otherwise
        Q^(t) ← Q^(t-1);   J^(t) ← J^(t-1)
    end do
end for
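Below is a runnable Python version of this sampler (an illustrative sketch, not the authors' forthcoming code). For simplicity it recomputes det(Q_J) from scratch at each proposal rather than performing the incremental update suggested by UpdateKernel, and all function and variable names are assumptions.

```python
import numpy as np

def metropolis_pqk(Q, k, T, seed=None):
    """Metropolis sampler targeting p_{Q,k}(J) proportional to det(Q_J).
    Each step exchanges one index of J with one index outside J and accepts
    the move with probability min(1, det(Q_J') / det(Q_J))."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    J = list(rng.choice(n, size=k, replace=False))          # arbitrary initialization
    log_det = np.linalg.slogdet(Q[np.ix_(J, J)])[1]
    for _ in range(T):
        s = rng.integers(k)                                  # position in J to swap out
        outside = np.setdiff1d(np.arange(n), J)
        j_new = rng.choice(outside)                          # candidate index to swap in
        J_prop = J.copy()
        J_prop[s] = j_new
        log_det_prop = np.linalg.slogdet(Q[np.ix_(J_prop, J_prop)])[1]
        if np.log(rng.random()) < log_det_prop - log_det:    # accept with prob min(1, ratio)
            J, log_det = J_prop, log_det_prop
    return sorted(J)

# Example: sample a multi-index for a random SPSD kernel; the result can then
# be handed to the Nystrom extension of the earlier slides.
rng = np.random.default_rng(8)
A = rng.standard_normal((200, 200))
Q = A @ A.T
print(metropolis_pqk(Q, k=10, T=2000, seed=0))
```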

34. Simulation Results: experimental setup
- First, we compare the different randomized algorithms for multi-index selection with one another, and with the method of Drineas et al. (2005).
- We use three different settings for approximation-error evaluation: principal components, diffusion maps, and Laplacian eigenmaps.
- We draw kernels at random from ensembles relevant to the test setting, and then average (though the results do not imply a measure on the input space).
- For each kernel drawn, we further average over many runs of the randomized algorithm.

35. Approximate Principal Components Analysis: randomized multi-index selection
[Figure: relative error (dB) versus approximant rank, for uniform sampling, sampling proportional to G_ii^2, and sampling from p_{G,k}.]
We drew SPD matrices of rank 12 from a Wishart ensemble, and show the error of several algorithms used to perform a low-rank approximation (outputs averaged over 250 trials).

36. Approximate Diffusion Maps: randomized multi-index selection
[Figure: relative error (dB) versus approximant rank, for uniform sampling, sampling proportional to G_ii^2, sampling from p_{G,k}, and the optimal rank-k approximant G_k.]
We sample 500 points uniformly on a circle, and use the diffusion maps algorithm to define an appropriate kernel for embedding. We measured the resultant approximation error, averaged over 100 data sets and over 100 trials per set.

37. Approximate Diffusion Maps: deterministic multi-index selection
[Figure: observed cumulative probability of relative error, for uniform sampling, sampling proportional to G_ii^2, and sampling from p_{G,k}.]
At left, we plot the distribution of approximation error for fixed rank k = 8. The leftmost vertical line marks the optimum; the rightmost vertical line marks the deterministic algorithm. The worst-case error bound of our deterministic algorithm can be clearly seen.

38. Laplacian Eigenmaps Example: embedding a massive dataset
We used the Laplacian eigenmaps algorithm to embed the "fishbowl" data set.
[Figure: a point realization of the "fishbowl" data set, together with embeddings obtained by sampling uniformly and by sampling from p_{G,k}.]

39. Summary: Nyström & Spectral Methods in Learning (work supported by NSF-DMS and DARPA)
- Two alternative strategies for the approximate spectral decomposition of large kernels were presented: randomized multi-index selection (sampling) and deterministic multi-index selection (sorting).
- Simulation studies demonstrated applicability to machine learning tasks, with measurable improvements in performance: low-rank kernel approximation and methods for nonlinear embeddings.
- Related techniques, coupled with the Nyström method, can be applied to approximate the product of two matrices; see the earlier Newton Web Seminar of 03 June 2008, and the SISL website: Matlab and R code forthcoming!
