Risi Kondor, The University of Chicago. Nedelina Teneva, UChicago. Vikas K. Garg, TTI-C, MIT


1 Risi Kondor, The University of Chicago; Nedelina Teneva, UChicago; Vikas K. Garg, TTI-C, MIT

2 Risi Kondor, The University of Chicago; Nedelina Teneva, UChicago; Vikas K. Garg, TTI-C, MIT

3 $\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\} \;\longrightarrow\; f \colon x \mapsto y$ 3/51

4 $f^* = \operatorname{argmin}_{f \in \mathcal{F}} \left[ \frac{1}{m} \sum_{i=1}^{m} \ell(f(x_i), y_i) + \lambda \|f\|_2^2 \right]$
A general framework (regularized risk minimization) that encompasses:
- kernel methods, including Gaussian processes and SVMs, etc.
- most of Bayesian statistics
- boosting, etc.
[Vapnik & Chervonenkis, 1971] 4/51
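
A minimal sketch (not from the slides) of this framework with squared loss and a linear model, i.e. ridge regression; the data `X`, `y` and the value of `lam` are made up.

```python
# Regularized risk minimization with squared loss and an l2 penalty (ridge
# regression), solved in closed form; all data below is synthetic.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                 # m = 100 examples, n = 5 features
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(100)
lam = 0.1

# f(x) = <w, x>;  argmin_w (1/m)||Xw - y||^2 + lam ||w||^2 has a closed form:
m, n = X.shape
w = np.linalg.solve(X.T @ X / m + lam * np.eye(n), X.T @ y / m)
print(w)
```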

5 5 5/51

6 Data: $\{(x_i, y_i)\}_{i=1}^m$ → Features: $\{(\phi_1(x_i), \dots, \phi_n(x_i), y_i)\}_{i=1}^m$ → $f : x \mapsto y$ via optimization 6/51

7 $f^* = \operatorname{argmin}_{f \in \mathcal{F}} \left[ \frac{1}{m} \sum_{i=1}^{m} \ell(f(x_i), y_i) + \lambda \|f\|_1 \right]$
The $\ell_1$ norm induces sparsity. Crucially, minimizing $\|f\|_1$ is almost as good as minimizing $\|f\|_0$. This led to an explosion of activity related to the Lasso [Tibshirani, 1996], Basis Pursuit [Donoho, 1998], the Elastic Net [Zou & Hastie, 2005], and Compressed Sensing [Donoho, 2004; Candès et al., 2005; Osher]. However, there is no longer a Representer Theorem. 7/51
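
A minimal sketch of the $\ell_1$-regularized counterpart solved by iterative soft thresholding (ISTA), one standard Lasso solver; this is illustrative only, and all data and parameters are made up.

```python
# Lasso: (1/m)||Xw - y||^2 + lam ||w||_1, minimized by proximal gradient (ISTA).
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]                      # sparse ground truth
y = X @ w_true + 0.05 * rng.standard_normal(100)
lam, step = 0.1, 1.0 / np.linalg.norm(X, 2) ** 2   # conservative step size

w = np.zeros(20)
for _ in range(500):
    g = X.T @ (X @ w - y) / len(y)                 # gradient of the smooth part
    w = w - step * g
    w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)   # soft threshold
print(np.round(w, 2))                              # most coordinates are exactly zero
```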

8 Data: $\{(x_i, y_i)\}_{i=1}^m$ → Features: $\{(\phi_1(x_i), \dots, \phi_n(x_i), y_i)\}_{i=1}^m$ → $f : x \mapsto y$ via optimization 8/51

9 Black-box machine learning ignores the structure of the data itself:
- the algorithmic part is separated from the statistical part
- $O(n^3)$ complexity is totally unrealistic in today's world
- no obvious parallelism
- it ignores the most important ideas of the Applied Math of the last 30 years, such as multiresolution analysis, fast multipole methods and multigrid
(The relationship of deep learning to all this is not clear yet.) 9/51

10

11 Data → Features → Multiresolution (e.g., Scattering) → $f : x \mapsto y$ 11/51

12 MMF is a multilevel factorization of the data similarity (kernel) matrix $A$ of the form
$A \approx Q_1^\top Q_2^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_2 Q_1$,
where
- $P$ is just a permutation matrix to reorder the rows and columns of $A$,
- each $Q_\ell$ is an orthogonal matrix with special, sparse structure,
- outside its $[Q_\ell]_{1:\delta_{\ell-1},\,1:\delta_{\ell-1}}$ block, each $Q_\ell$ is just the identity, for some fixed sequence $n \ge \delta_1 \ge \dots \ge \delta_L$,
- $H$ is close to diagonal (core-diagonal).
A hierarchical, multipole-type description of the data. In factorized form, $A$ is much easier to deal with. 12/51
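
A minimal sketch of why the factorized form is easier to deal with: a matrix-vector product with $A = Q_1^\top \cdots Q_L^\top H Q_L \cdots Q_1$ only touches the sparse factors. The rotation list and core matrix below are made-up stand-ins, not factors produced by an actual MMF computation.

```python
# Fast matvec through the factored form: each Q_l is stored as a Givens
# rotation (i, j, theta) acting on two coordinates, H is a sparse core matrix.
import numpy as np
import scipy.sparse as sp

n = 8
rotations = [(0, 1, 0.3), (2, 3, 1.1), (0, 2, -0.7)]    # (i, j, angle), one per level
H = sp.diags(np.arange(1.0, n + 1))                      # stand-in core matrix

def apply_Q(x, i, j, theta, transpose=False):
    """Apply the rotation acting only on coordinates (i, j) to the vector x."""
    c, s = np.cos(theta), np.sin(theta)
    if transpose:
        s = -s
    y = x.copy()
    y[i] = c * x[i] + s * x[j]
    y[j] = -s * x[i] + c * x[j]
    return y

def mmf_matvec(x):
    for (i, j, t) in rotations:                  # x -> Q_L ... Q_1 x
        x = apply_Q(x, i, j, t)
    x = H @ x                                    # core
    for (i, j, t) in reversed(rotations):        # x -> Q_1^T ... Q_L^T x
        x = apply_Q(x, i, j, t, transpose=True)
    return x

print(mmf_matvec(np.ones(n)))
```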

13 The eigendecomposition (PCA) of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is $A = Q^\top D Q$ with $Q$ orthogonal and $D$ diagonal. The MMF factorization is $A \approx Q_1^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_1$.
The eigendecomposition always exists, is essentially unique, and truncating it after $k$ terms gives the optimal approximation in the Frobenius and nuclear norms. But it is expensive to compute, $O(n^3)$, the eigenvectors are dense, and it does not take advantage of hierarchical structure. 13/51
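
For contrast, a minimal sketch of the truncated eigendecomposition on a made-up symmetric matrix: keeping the $k$ eigenpairs of largest magnitude gives the optimal rank-$k$ approximation mentioned above, at $O(n^3)$ cost and with dense eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((50, 50))
A = (B + B.T) / 2                                # symmetric test matrix
k = 10

vals, vecs = np.linalg.eigh(A)                   # A = vecs @ diag(vals) @ vecs.T
top = np.argsort(np.abs(vals))[-k:]              # k eigenvalues largest in magnitude
A_k = vecs[:, top] @ np.diag(vals[top]) @ vecs[:, top].T
print(np.linalg.norm(A - A_k, "fro"))            # optimal rank-k Frobenius error
```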

14 In general, multiresolution analysis on a space $X$ is a filtration $L(X) = V_0 \supseteq V_1 \supseteq V_2 \supseteq \dots$ where $V_\ell = V_{\ell+1} \oplus W_{\ell+1}$, each $V_\ell$ has orthonormal basis $\{\phi^\ell_m\}_m$, and each $W_\ell$ has orthonormal basis $\{\psi^\ell_m\}_m$. The spaces are chosen so that as $\ell$ increases, $V_\ell$ contains functions that are increasingly smooth w.r.t. some self-adjoint operator $T : L(X) \to L(X)$. 14/51

15 The central dogma of harmonic analysis is that the structure of the space of functions on a set $X$ can shed light on the structure of $X$ itself ($G \leftrightarrow L(G)$). The interplay between geometry of sets, function spaces on sets, and operators on sets is classical in Harmonic Analysis [Coifman & Maggioni, 2006]. Multiresolution analysis is an attractive paradigm for ML because
- data naturally clusters into (soft) hierarchies,
- the resulting data structures can form the basis of fast algorithms. 15/51

16 But how does a matrix induce multiresolution??? 16 16/51

17

18 Mallat [1989] defined multiresolution on $\mathbb{R}$ by the following axioms:
1. $\bigcap_\ell V_\ell = \{0\}$,
2. $\bigcup_\ell V_\ell$ is dense in $L_2(\mathbb{R})$,
3. if $f \in V_\ell$ then $f'(x) = f(x - 2^\ell m)$ is also in $V_\ell$ for any $m \in \mathbb{Z}$,
4. if $f \in V_\ell$, then $f'(x) = f(2x)$ is in $V_{\ell-1}$,
which imply the existence of a mother wavelet $\psi$ and a father wavelet $\phi$ s.t. $\psi^\ell_m(x) = 2^{-\ell/2}\, \psi(2^{-\ell}x - m)$ and $\phi^\ell_m(x) = 2^{-\ell/2}\, \phi(2^{-\ell}x - m)$. 18/51
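
The Haar system is the standard concrete instance of these axioms. A minimal sketch of one level of the Haar transform (father/scaling and mother/wavelet coefficients) on a made-up signal:

```python
import numpy as np

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0])
smooth = (x[0::2] + x[1::2]) / np.sqrt(2)     # <x, phi_m>: local averages
detail = (x[0::2] - x[1::2]) / np.sqrt(2)     # <x, psi_m>: local differences

# The transform is orthogonal, so it inverts exactly:
x_rec = np.empty_like(x)
x_rec[0::2] = (smooth + detail) / np.sqrt(2)
x_rec[1::2] = (smooth - detail) / np.sqrt(2)
assert np.allclose(x, x_rec)
print(smooth, detail)
```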

19 Which of the ideas from classical multiresolution still make sense?
- Recursively split $L(X)$ into smoother and rougher parts
- Basis functions should be localized in space & frequency
- Each $\Phi^\ell \xrightarrow{\,Q_\ell\,} \Phi^{\ell+1} \cup \Psi^{\ell+1}$ transform is orthogonal and sparse
- Each $\psi^\ell_m$ is derived by translating $\psi^\ell$: MAYBE
- Each $\psi^\ell$ is derived by scaling $\psi$: ???
19/51

20
1. The sequence $L(X) = V_0 \supseteq V_1 \supseteq V_2 \supseteq \dots$ is a filtration of $\mathbb{R}^n$ in terms of smoothness with respect to $T$, in the sense that $\mu_\ell = \inf_{f \in V_\ell \setminus \{0\}} \langle f, Tf\rangle / \langle f, f\rangle$ increases at a given rate.
2. The wavelets are localized, in the sense that $\inf_{x \in X}\sup_{y \in X} |\psi^\ell_m(y)|\, d(x, y)^\alpha$ increases no faster than a certain rate.
3. Letting $Q_\ell$ be the matrix expressing $\Phi^\ell \cup \Psi^\ell$ in the previous basis $\Phi^{\ell-1}$, i.e., $\phi^\ell_m = \sum_{i=1}^{\dim(V_{\ell-1})} [Q_\ell]_{m,i}\, \phi^{\ell-1}_i$ and $\psi^\ell_m = \sum_{i=1}^{\dim(V_{\ell-1})} [Q_\ell]_{m+\dim(V_\ell),\,i}\, \phi^{\ell-1}_i$, each $Q_\ell$ orthogonal transform is sparse, guaranteeing the existence of a fast wavelet transform. ($\Phi^0$ is taken to be the standard basis, $\phi^0_m = e_m$.)
20/51

21

22 If $|X| = n$ is finite, representing $T$ by a symmetric matrix $A \in \mathbb{R}^{n \times n}$, each basis transform $V_\ell \to V_{\ell+1} \oplus W_{\ell+1}$ is like applying a rotation matrix,
$A \;\mapsto\; Q_1 A Q_1^\top \;\mapsto\; Q_2 Q_1 A Q_1^\top Q_2^\top \;\mapsto\; \dots,$
and then fixing a subset of the coordinates as wavelets. In addition, $Q_1, \dots, Q_L$ must obey sparsity constraints. Multiresolution analysis $\Longleftrightarrow$ multilevel matrix factorization. 22/51

23 Given a symmetric matrix $A \in \mathbb{R}^{n \times n}$, a class $\mathcal{Q}$ of sparse rotations, and a sequence $n \ge \delta_1 \ge \dots \ge \delta_L$, a multiresolution matrix factorization (MMF) of $A$ is
$A = Q_1^\top Q_2^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_2 Q_1,$
where each rotation $Q_\ell \in \mathcal{Q}$ satisfies $[Q_\ell]_{[n]\setminus S_\ell,\,[n]\setminus S_\ell} = I_{n-\delta_{\ell-1}}$ for some nested sequence of sets $[n] = S_1 \supseteq S_2 \supseteq \dots \supseteq S_{L+1}$ with $|S_\ell| = \delta_{\ell-1}$, and $H$ is $S_{L+1}$-core-diagonal. If this factorization is exact, we say that $A$ is fully multiresolution factorizable (over $\mathcal{Q}$ with $\delta_1, \dots, \delta_L$): a generalization of rank. 23/51

24 It is critical that the $Q_\ell$ be very simple and local rotations. Two choices:
1. Jacobi MMFs: $Q = I_{n-k} \oplus_{(i_1,\dots,i_k)} O$ for some $O \in SO(k)$; for $k = 2$ this is just a Givens rotation.
2. Parallel MMFs: $Q = \oplus_{(i^1_1,\dots,i^1_{k_1})} O_1 \;\oplus_{(i^2_1,\dots,i^2_{k_2})} O_2 \;\cdots\; \oplus_{(i^m_1,\dots,i^m_{k_m})} O_m$ for some $O_1, \dots, O_m \in SO(k)$.
24/51
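
A minimal sketch of the two rotation classes as explicit matrices: a single embedded Givens rotation for the Jacobi case, and several disjoint 2x2 blocks applied simultaneously for the parallel case. The index sets and angles are made up.

```python
import numpy as np

def embedded_rotation(n, blocks):
    """blocks: list of ((i, j), theta); returns the n x n compound rotation."""
    Q = np.eye(n)
    for (i, j), theta in blocks:
        c, s = np.cos(theta), np.sin(theta)
        Q[i, i], Q[j, j] = c, c
        Q[i, j], Q[j, i] = -s, s
    return Q

Q_jacobi = embedded_rotation(6, [((1, 4), 0.6)])                   # Givens rotation
Q_parallel = embedded_rotation(6, [((0, 3), 0.2), ((1, 4), -0.9), ((2, 5), 1.3)])
for Q in (Q_jacobi, Q_parallel):
    assert np.allclose(Q @ Q.T, np.eye(6))       # both are orthogonal, and sparse
```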

25 Given $A$, ideally we would like to solve
$\min_{[n] \supseteq S_1 \supseteq \dots \supseteq S_L;\;\, H \in \mathcal{H}_n^{S_{L+1}};\;\, Q_1, \dots, Q_L \in \mathcal{Q}} \; \| A - Q_1^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_1 \|^2_{\mathrm{Frob}}$
for a given class $\mathcal{Q}$ of local rotations and dimensions $\delta_1 \ge \delta_2 \ge \dots \ge \delta_L$.
- In general, this optimization problem is combinatorially hard.
- It is easy to approximate in a greedy way (level by level).
- To solve the combinatorial part of the problem (at each level), use a deterministic strategy or a randomized strategy (a toy sketch of the latter follows below).
25/51
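
A toy sketch of the greedy, level-by-level approach with the randomized strategy and $k = 2$. This is not the authors' implementation: it pairs a random active row with its most correlated partner and uses the classical Jacobi angle (which zeroes the chosen off-diagonal entry) as a stand-in for minimizing the exact per-level error.

```python
import numpy as np

def jacobi_mmf(A, L, rng=np.random.default_rng(0)):
    A = A.copy()
    n = A.shape[0]
    active = list(range(n))
    rotations, wavelets = [], []
    for _ in range(L):
        i = rng.choice(active)                               # randomized strategy
        others = [p for p in active if p != i]
        j = max(others, key=lambda p: abs(A[i] @ A[p]))      # most correlated row
        theta = 0.5 * np.arctan2(2 * A[i, j], A[j, j] - A[i, i])  # zeroes A[i, j]
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(n)
        G[i, i] = G[j, j] = c
        G[i, j], G[j, i] = -s, s
        A = G @ A @ G.T                                      # A_l = Q_l A_{l-1} Q_l^T
        rotations.append((i, j, theta))
        wavelets.append(j)
        active.remove(j)                                     # j retires as a wavelet
    return rotations, wavelets, A                            # A now plays the role of H

B = np.random.default_rng(1).standard_normal((8, 8))
rots, wavs, H = jacobi_mmf(B @ B.T, L=5)
print(wavs)
```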

26 If $Q_\ell = I_{n-k} \oplus_I O$ with $I = (i_1, \dots, i_k)$ and $J_\ell = \{i_k\}$, then the contribution of level $\ell$ to the MMF approximation error (in Frobenius norm) is
$E_\ell = \mathcal{E}_I^O = 2 \sum_{p=1}^{k-1} \big[ O [A_{\ell-1}]_{I,I}\, O^\top \big]^2_{k,p} + 2 \big[ O B O^\top \big]_{k,k},$
where $B = [A_{\ell-1}]_{I, S_\ell} \big([A_{\ell-1}]_{I, S_\ell}\big)^\top$. In the special case of $k = 2$ and $I_\ell = (i, j)$,
$E_\ell = \mathcal{E}^O_{(i,j)} = 2 \big[ O [A_{\ell-1}]_{(i,j),(i,j)}\, O^\top \big]^2_{2,1} + 2 \big[ O B O^\top \big]_{2,2}$
with $B = [A_{\ell-1}]_{(i,j), S_\ell} \big([A_{\ell-1}]_{(i,j), S_\ell}\big)^\top$. 26/51

27 Let $A \in \mathbb{R}^{2 \times 2}$ be diagonal, $B \in \mathbb{R}^{2 \times 2}$ symmetric, and $O = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}$. Set $a = (A_{1,1} - A_{2,2})^2/4$, $b = B_{1,2}$, $c = (B_{2,2} - B_{1,1})/2$, $e = \sqrt{b^2 + c^2}$, $\theta = 2\alpha$ and $\omega = \arctan(c/b)$. If $\alpha$ minimizes $([O A O^\top]_{2,1})^2 + [O B O^\top]_{2,2}$, then $\theta$ satisfies $(a/e)\sin(2\theta) + \sin(\theta + \omega + \pi/2) = 0$. 27/51
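
A minimal numerical check of this stationarity condition on made-up 2x2 inputs: find roots of the displayed equation and compare against brute-force minimization of the objective over $\alpha$ (brentq and the grid sizes are just convenient choices).

```python
import numpy as np
from scipy.optimize import brentq

A = np.diag([3.0, 1.0])
B = np.array([[2.0, 0.7], [0.7, 0.5]])
a = (A[0, 0] - A[1, 1]) ** 2 / 4
b, c = B[0, 1], (B[1, 1] - B[0, 0]) / 2
e, w = np.hypot(b, c), np.arctan(c / b)

def g(theta):                                    # the stationarity condition above
    return (a / e) * np.sin(2 * theta) + np.sin(theta + w + np.pi / 2)

def J(alpha):                                    # the objective from the slide
    O = np.array([[np.cos(alpha), -np.sin(alpha)], [np.sin(alpha), np.cos(alpha)]])
    return (O @ A @ O.T)[1, 0] ** 2 + (O @ B @ O.T)[1, 1]

grid = np.linspace(-np.pi, np.pi, 2000)
roots = [brentq(g, t0, t1) for t0, t1 in zip(grid[:-1], grid[1:]) if g(t0) * g(t1) < 0]
best_alpha = min((t / 2 for t in roots), key=J)  # theta = 2 * alpha
print(J(best_alpha), min(J(al) for al in np.linspace(-np.pi, np.pi, 100001)))
```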

28 If $Q_\ell$ is a compound rotation of the form $Q_\ell = \oplus_{I_1} O_1 \,\cdots\, \oplus_{I_m} O_m$ for some partition $I_1 \cup \dots \cup I_m$ of $[n]$ with $k_1, \dots, k_m \le k$, and some sequence of orthogonal matrices $O_1, \dots, O_m$, then level $\ell$'s contribution to the MMF error obeys
$E_\ell \le 2 \sum_{j=1}^{m} \Big[ \sum_{p=1}^{k_j - 1} \big[ O_j [A_{\ell-1}]_{I_j, I_j}\, O_j^\top \big]^2_{k_j, p} + \big[ O_j B_j O_j^\top \big]_{k_j, k_j} \Big], \qquad (1)$
where $B_j = [A_{\ell-1}]_{I_j,\, S_{\ell-1}\setminus I_j} \big([A_{\ell-1}]_{I_j,\, S_{\ell-1}\setminus I_j}\big)^\top$.
For compression tasks, parallel MMFs are generally preferable to Jacobi MMFs because
- unrelated parts of the matrix are processed independently, in parallel,
- they give more compact factorizations,
- Jacobi MMFs can exhibit cascades.
The sets $I_1, \dots, I_m$ can be found by a randomized strategy or by exact matching ($O(n^3)$ time). 28/51

29 The sequence in which MMF (with $k \ge 3$) eliminates dimensions induces a (soft) hierarchical clustering amongst the dimensions (a mixture of trees). Connection to hierarchical clustering. 29/51

30
1. Find a (hierarchically) sparse basis for $A$
2. Hierarchically cluster data
3. Find community structure
4. Generate hierarchical graphs
5. Compress graphs & matrices
6. Provide a basis for sparse approximations such as the LASSO
7. Provide a basis for fast numerics (NLA, multigrid, etc.)
30/51

31 Diffusion wavelets also start with the matrix representation of a smoothing operator (the diffusion operator) and compress it in multiple stages. However, at each stage the wavelets are constructed from the columns of $A$ itself by a rank-revealing QR type process:
$A \approx Q_1 R_1, \qquad A^2 \approx Q_1 \underbrace{R_1 R_1^\top}_{Q_2 R_2} Q_1^\top, \qquad A^4 \approx Q_1 Q_2 \underbrace{R_2 R_2^\top}_{Q_3 R_3} Q_2^\top Q_1^\top, \;\dots$
Very strong theoretical foundations, but the sparsity (locality) of the $Q_\ell$ matrices is hard to control. [Coifman & Maggioni, 2006] 31/51
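
A minimal sketch in the spirit of one diffusion-wavelet compression stage (not Coifman & Maggioni's actual construction): take a rank-revealing, column-pivoted QR of a made-up diffusion operator $A$, keep the numerically significant columns as the next-level scaling functions, and represent $A^2$ on that basis.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(3)
G = rng.standard_normal((60, 8))
A = G @ G.T / 8                                   # low-rank-ish smoothing operator
Q1, R1, piv = qr(A, pivoting=True)                # A[:, piv] = Q1 @ R1
r = np.sum(np.abs(np.diag(R1)) > 1e-8 * abs(R1[0, 0]))   # numerical rank
Q1 = Q1[:, :r]                                    # columns kept for the next level
A2 = Q1.T @ (A @ A) @ Q1                          # A^2 represented on the kept basis
print(A.shape, "->", A2.shape)
```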

32 Treelets are a special case of Jacobi MMF, $Q_3 Q_2 Q_1 A Q_1^\top Q_2^\top Q_3^\top \cdots$, but
- restricted to Givens rotations ($k = 2$), so they only recover a single tree,
- each $Q_i$ is chosen to eliminate the maximal off-diagonal entry, rather than to minimize the overall error,
- not intended as a factorization method: $A$ is regarded as a covariance matrix, leading to a probabilistic analysis.
[Lee, Nadler & Wasserman, 2008] 32/51

33 Multigrid methods solve systems of PDEs by shuttling back and forth between grids/meshes at different levels of resolution [Brandt, 1973; Livne & Brandt, 2010]. Fast multipole methods evaluate a kernel (such as the Gaussian kernel) between a large number of particles by aggregating them at different levels [Greengard & Rokhlin, 1987]. $\mathcal{H}$-matrices [Hackbusch, 1999], $\mathcal{H}^2$-matrices [Börm, 2007] and Hierarchically Semi-Separable matrices [Chandrasekaran et al., 2005] iteratively decompose a matrix into blocks, with low-rank structure in each of the blocks. 33/51

34 Related methods: Eigendecomposition (PCA), Singular Value Decomposition, Laplacian eigenmaps, etc., Rank-Revealing QR. [Johnson & Lindenstrauss, 1986; Halko, Martinsson & Tropp, 2011; Ailon & Chazelle, 2006; Sarlós, 2006; Le, Sarlós & Smola, 2013; Jolliffe, 1972; Drineas et al., 2009; Deshpande et al., 2006; Boutsidis et al., 2009; Coifman & Maggioni, 2006; Lee, Nadler & Wasserman, 2008; Drineas & Mahoney, 2006; Koutis, Miller & Peng, 2010; Benczúr & Karger, 1996; Spielman & Teng, 2004; Achlioptas & McSherry, 2001; Williams & Seeger, 2001; Fowlkes et al., 2004; Drineas & Mahoney, 2005; Kumar et al.] 34/51

35 In classical wavelet transforms one proves that if $f$ is $\alpha$-Hölder, i.e., $|f(x) - f(y)| \le c_H\, d(x, y)^\alpha$ for all $x, y \in X$, then the wavelet coefficients $|\langle f, \psi^\ell_m \rangle|$ decay at a rate governed by $\alpha + \beta$. Results of this type generally hold for spaces of homogeneous type, in which $\mathrm{Vol}(B(x, 2r)) \le c_{\mathrm{hom}}\, \mathrm{Vol}(B(x, r))$ for all $x \in X$, $r > 0$. The natural notion of distance between rows in MMF is $d(i, j) = \langle A_{i,:}, A_{j,:} \rangle^{-1}$. 35/51

36 We say that a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is $\Lambda$-rank homogeneous up to order $K$ if, for any $S \subseteq [n]$ of size at most $K$, letting $Q = A_{S,:} A_{:,S}$, setting $D$ to be the diagonal matrix with $D_{i,i} = \|Q_{i,:}\|_1$, and $\overline{Q} = D^{-1/2} Q D^{-1/2}$, the eigenvalues $\lambda_1, \dots, \lambda_{|S|}$ of $\overline{Q}$ satisfy $\Lambda < \lambda_i < 1 - \Lambda$, and furthermore $c_T^{-1} \le D_{i,i} \le c_T$ for some constant $c_T$.
Intuitively: different rows are neither too parallel nor totally orthogonal. A generalization of the restricted isometry property from compressed sensing [Candès & Tao, 2005]. 36/51

37 Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix that is $\Lambda$-rank homogeneous up to order $K$ and has an MMF factorization $A = U_1^\top \cdots U_L^\top \, H \, U_L \cdots U_1$. Assume $\psi^\ell_m$ is a wavelet in this factorization arising from row $i$ of $A_{\ell-1}$, supported on a set $S$ of size $K' \le K$, and that $\|H_{i,:}\|_2 \le \epsilon$. Then if $f : [n] \to \mathbb{R}$ is $(c_H, 1/2)$-Hölder with respect to $d(i, j) = \langle A_{i,:}, A_{j,:} \rangle^{-1}$, then $|\langle f, \psi^\ell_m \rangle| \le c_T\, c_H\, c_\Lambda\, \epsilon^{1/2} K'$, with $c_\Lambda = 4/(1 - (1 - 2\Lambda)^2)$. 37/51

38

39 Frobenius norm error on the Zachary Karate Club graph (left) and a matrix of genetic relationship between 50 individuals from [Crossett, 2013] (right). 39/51

40 Frobenius norm error of the MMF and Nyström methods on a vs a (Kronecker product) matrix 40 40/51

41 Frobenius norm error of the MMF and Nyström methods on large network datasets 41 41/51

42
- MMF is a new type of matrix factorization mirroring multiresolution analysis; a generalization of rank.
- MMF exploits hierarchical structure, but does not enforce a single hierarchy.
- Empirical evidence suggests that MMF is a good model for real data.
- Finding MMF factorizations is a fundamentally local and parallelizable process; $O(n \log n)$ algorithms should be within reach.
- Once in MMF form, a range of matrix computations become faster.
- MMF has strong ties to: diffusion wavelets, treelets, multiscale SVD, structured matrices, algebraic multigrid, and fast multipole methods.
42/51
