Risi Kondor (The University of Chicago), Nedelina Teneva (UChicago), Vikas K. Garg (TTI-C, MIT)
Given training data $\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, learn a function $f \colon x \mapsto y$.
$$f = \operatorname*{argmin}_{f \in \mathcal{F}} \left[ \frac{1}{m} \sum_{i=1}^{m} \ell(f(x_i), y_i) + \lambda \|f\|_2^2 \right]$$

A general framework (regularized risk minimization) that encompasses:
- Kernel methods, including Gaussian processes, SVMs, etc.
- Most of Bayesian statistics
- Boosting, etc.

[Vapnik & Chervonenkis, 1971]
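As a concrete instance of this framework, a minimal numpy sketch of regularized risk minimization with squared loss and an $\ell_2$ penalty (ridge regression); the data and variable names are illustrative, not from the talk:

```python
import numpy as np

# Ridge regression: squared loss l(f(x), y) = (f(x) - y)^2, l2 penalty.
rng = np.random.default_rng(0)
m, n = 50, 5                      # m samples, n features
X = rng.standard_normal((m, n))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(m)

lam = 0.1
# Closed-form minimizer of (1/m) ||Xw - y||^2 + lam ||w||^2:
# setting the gradient to zero gives (X^T X / m + lam I) w = X^T y / m.
w_hat = np.linalg.solve(X.T @ X / m + lam * np.eye(n), X.T @ y / m)
print(np.round(w_hat, 2))
```

With squared loss the minimizer is available in closed form; for other losses the same objective is minimized numerically.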
Data $\{(x_i, y_i)\}_{i=1}^m$ → Features $\{(\phi_1(x_i), \dots, \phi_n(x_i), y_i)\}_{i=1}^m$ → $f \colon x \mapsto y$ by optimization.
$$f = \operatorname*{argmin}_{f \in \mathcal{F}} \left[ \frac{1}{m} \sum_{i=1}^{m} \ell(f(x_i), y_i) + \lambda \|f\|_1 \right]$$

The $\ell_1$ norm induces sparsity. Crucially, minimizing $\|f\|_1$ is almost as good as minimizing $\|f\|_0$. This led to an explosion of activity: the Lasso [Tibshirani, 1996], Basis Pursuit [Donoho, 1998], the Elastic Net [Zou & Hastie, 2005], Compressed Sensing [Donoho, 2004; Candès et al., 2005; Osher]. However, there is no longer a Representer Theorem.
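The $\ell_1$-regularized problem above can be solved by iterative soft-thresholding; a minimal sketch of ISTA (a standard generic solver, not an algorithm from the slides), on illustrative synthetic data:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, n_iter=2000):
    """Minimize (1/2)||Xw - y||^2 + lam * ||w||_1 by proximal gradient steps."""
    n = X.shape[1]
    w = np.zeros(n)
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L, L = Lipschitz const of grad
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)             # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
w_true = np.zeros(10)
w_true[[0, 3]] = [2.0, -1.5]                 # sparse ground truth
y = X @ w_true
w_hat = ista(X, y, lam=0.5)
print(np.round(w_hat, 2))                    # the l1 penalty shrinks most coordinates
```

The shrinkage step is exactly where sparsity enters: coordinates whose gradient pull is below the threshold are set exactly to zero.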
Black-box machine learning ignores the structure of the data itself:
- The algorithmic part is separated from the statistical part
- $O(n^3)$ complexity is totally unrealistic in today's world
- No obvious parallelism
- It ignores the most important ideas of the applied mathematics of the last 30 years, such as multiresolution analysis, fast multipole methods, and multigrid

(The relationship of deep learning to all this is not clear yet.)
Data → Features → Multiresolution (e.g., scattering) → $f \colon x \mapsto y$
MMF is a multilevel factorization of the data similarity (kernel) matrix $A$, of the form

$$PAP^\top \approx Q_1^\top Q_2^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_2 Q_1,$$

where
- $P$ is just a permutation matrix that reorders the rows and columns of $A$,
- each $Q_l$ is an orthogonal matrix with special sparse structure: outside its $[Q_l]_{1:\delta_{l-1},\,1:\delta_{l-1}}$ block, each $Q_l$ is just the identity, for some fixed sequence $n \geq \delta_1 \geq \dots \geq \delta_L$,
- $H$ is core-diagonal.

This gives a hierarchical, multipole-type description of the data. In factorized form, $A$ is much easier to deal with.
The eigendecomposition (PCA) of a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is

$$A = Q^\top D Q,$$

with $Q$ orthogonal and $D$ diagonal. The MMF factorization is instead

$$PAP^\top \approx Q_1^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_1.$$

The eigendecomposition always exists, is essentially unique, and truncating it after $k$ terms gives the optimal approximation in the Frobenius and nuclear norms. But it is expensive to compute ($O(n^3)$), the eigenvectors are dense, and it does not take advantage of hierarchical structure.
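A minimal numpy sketch of the truncation property just stated, on a small random symmetric test matrix (the Eckart-Young optimality in Frobenius norm):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                          # symmetric test matrix

evals, evecs = np.linalg.eigh(A)           # A = evecs @ diag(evals) @ evecs.T
k = 3
top = np.argsort(np.abs(evals))[::-1][:k]  # indices of k largest |eigenvalues|
A_k = evecs[:, top] @ np.diag(evals[top]) @ evecs[:, top].T

# The Frobenius error equals the l2 norm of the discarded eigenvalues;
# no rank-k matrix can do better in Frobenius norm.
err = np.linalg.norm(A - A_k, "fro")
print(err)
```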
In general, a multiresolution analysis on a space $X$ is a filtration $L(X) = V_0 \supseteq V_1 \supseteq V_2 \supseteq \dots$ where $V_l = V_{l+1} \oplus W_{l+1}$, and
- each $V_l$ has orthonormal basis $\{\phi^l_m\}_m$,
- each $W_l$ has orthonormal basis $\{\psi^l_m\}_m$.

The spaces are chosen so that, as $l$ increases, $V_l$ contains functions that are increasingly smooth with respect to some self-adjoint operator $T \colon L(X) \to L(X)$.
The central dogma of harmonic analysis is that the structure of the space of functions on a set $X$ can shed light on the structure of $X$ itself ($G \leftrightarrow L(G)$). The interplay between the geometry of sets, function spaces on sets, and operators on sets is classical in harmonic analysis [Coifman & Maggioni, 2006]. Multiresolution analysis is an attractive paradigm for ML because
- data naturally clusters into (soft) hierarchies,
- the resulting data structures can form the basis of fast algorithms.
But how does a matrix induce multiresolution?
Mallat [1989] defined multiresolution on $\mathbb{R}$ by the following axioms:
1. $\bigcap_l V_l = \{0\}$,
2. $\bigcup_l V_l$ is dense in $L^2(\mathbb{R})$,
3. if $f \in V_l$, then $f'(x) = f(x - 2^l m)$ is also in $V_l$ for any $m \in \mathbb{Z}$,
4. if $f \in V_l$, then $f'(x) = f(2x)$ is in $V_{l-1}$,

which imply the existence of a mother wavelet $\psi$ and a father wavelet $\phi$ such that $\psi^l_m(x) = 2^{-l/2}\,\psi(2^{-l}x - m)$ and $\phi^l_m(x) = 2^{-l/2}\,\phi(2^{-l}x - m)$.
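The simplest concrete realization of these axioms is the Haar wavelet; a minimal sketch of one level of the transform, splitting $V_{l-1} = V_l \oplus W_l$ into (orthonormal) averages and differences:

```python
import numpy as np

def haar_level(f):
    """Split f (even length) into scaling (phi) and wavelet (psi) coefficients."""
    f = np.asarray(f, dtype=float)
    avg = (f[0::2] + f[1::2]) / np.sqrt(2)   # smoother part, spans V_l
    dif = (f[0::2] - f[1::2]) / np.sqrt(2)   # detail part, spans W_l
    return avg, dif

def haar_inverse(avg, dif):
    """Exact inverse of haar_level (the transform is orthogonal)."""
    f = np.empty(2 * len(avg))
    f[0::2] = (avg + dif) / np.sqrt(2)
    f[1::2] = (avg - dif) / np.sqrt(2)
    return f

f = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 7.0, 1.0])
avg, dif = haar_level(f)
# Orthogonality: norms are preserved and the split inverts exactly.
assert np.isclose(np.linalg.norm(f)**2,
                  np.linalg.norm(avg)**2 + np.linalg.norm(dif)**2)
assert np.allclose(haar_inverse(avg, dif), f)
print(avg, dif)
```

Iterating `haar_level` on `avg` descends the filtration $V_0 \supseteq V_1 \supseteq V_2 \supseteq \dots$.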
Which of the ideas from classical multiresolution still make sense?
- Recursively split $L(X)$ into smoother and rougher parts
- Basis functions should be localized in space & frequency
- Each $\Phi^l \xrightarrow{Q_l} (\Phi^{l+1}, \Psi^{l+1})$ transform is orthogonal and sparse
- Each $\psi^l_m$ is derived by translating $\psi^l$: MAYBE
- Each $\psi^l$ is derived by scaling $\psi$: ???
1. The sequence $L(X) = V_0 \supseteq V_1 \supseteq V_2 \supseteq \dots$ is a filtration of $\mathbb{R}^n$ in terms of smoothness with respect to $T$, in the sense that $\mu_l = \inf_{f \in V_l \setminus \{0\}} \langle f, Tf \rangle / \langle f, f \rangle$ increases at a given rate.
2. The wavelets are localized, in the sense that $\inf_{x \in X} \sup_{y \in X} |\psi^l_m(y)|\, d(x, y)^\alpha$ increases no faster than a certain rate.
3. Letting $Q_l$ be the matrix expressing $\Phi^l \cup \Psi^l$ in the previous basis $\Phi^{l-1}$, i.e.,
$$\phi^l_m = \sum_{i=1}^{\dim(V_{l-1})} [Q_l]_{m,i}\, \phi^{l-1}_i, \qquad \psi^l_m = \sum_{i=1}^{\dim(V_{l-1})} [Q_l]_{m+\dim(V_l),\,i}\, \phi^{l-1}_i,$$
each orthogonal transform $Q_l$ is sparse, guaranteeing the existence of a fast wavelet transform. ($\Phi^0$ is taken to be the standard basis, $\phi^0_m = e_m$.)
If $|X| = n$ is finite, representing $T$ by a symmetric matrix $A \in \mathbb{R}^{n \times n}$, each basis transform $V_l \to V_{l+1} \oplus W_{l+1}$ is like applying a rotation matrix,
$$A \mapsto Q_1 A Q_1^\top \mapsto Q_2 Q_1 A Q_1^\top Q_2^\top \mapsto \dots,$$
and then fixing a subset of the coordinates as wavelets. In addition, $Q_1, \dots, Q_L$ must obey sparsity constraints.

Multiresolution analysis $\Longleftrightarrow$ multilevel matrix factorization.
Given a symmetric matrix $A \in \mathbb{R}^{n \times n}$, a class $\mathcal{Q}$ of sparse rotations, and a sequence $n \geq \delta_1 \geq \dots \geq \delta_L$, a multiresolution matrix factorization (MMF) of $A$ is

$$A = Q_1^\top Q_2^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_2 Q_1,$$

where each rotation $Q_l \in \mathcal{Q}$ satisfies $[Q_l]_{[n] \setminus S_l,\, [n] \setminus S_l} = I_{n - \delta_{l-1}}$ for some nested sequence of sets $[n] = S_1 \supseteq S_2 \supseteq \dots \supseteq S_{L+1}$ with $|S_l| = \delta_{l-1}$, and $H$ is $S_{L+1}$-core-diagonal. If this factorization is exact, we say that $A$ is fully multiresolution factorizable (over $\mathcal{Q}$ with $\delta_1, \dots, \delta_L$): a generalization of rank.
It is critical that the $Q_l$ be very simple, local rotations. Two choices:
1. Elementary rotations of order $k$ (Jacobi MMFs): $Q = I_{n-k} \oplus_{(i_1, \dots, i_k)} O$ for some $O \in SO(k)$; for $k = 2$ this is just a Givens rotation.
2. Compound rotations of order $k$ (parallel MMFs): $Q = \oplus_{(i^1_1, \dots, i^1_{k_1})} O_1 \oplus_{(i^2_1, \dots, i^2_{k_2})} O_2 \oplus \dots \oplus_{(i^m_1, \dots, i^m_{k_m})} O_m$ for some $O_1, \dots, O_m \in SO(k)$.
Given $A$, ideally we would like to solve

$$\operatorname*{minimize}_{\substack{[n] \supseteq S_1 \supseteq \dots \supseteq S_{L+1};\; H \in \mathcal{H}^n_{S_{L+1}};\\ Q_1, \dots, Q_L \in \mathcal{Q}}} \big\| A - Q_1^\top \cdots Q_L^\top \, H \, Q_L \cdots Q_1 \big\|^2_{\mathrm{Frob}}$$

for a given class $\mathcal{Q}$ of local rotations and dimensions $\delta_1 \geq \delta_2 \geq \dots \geq \delta_L$.
- In general, this optimization problem is combinatorially hard.
- It is easy to approximate in a greedy way (level by level).
- To solve the combinatorial part of the problem (at each level), use a deterministic strategy or a randomized strategy.
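A minimal sketch of the greedy, level-by-level idea with Givens rotations ($k = 2$). Two simplifications here are assumptions of this sketch, not the algorithm from the talk: the pair $(i, j)$ is chosen by largest off-diagonal magnitude, and the angle is the classical Jacobi angle that zeroes $A_{ij}$ rather than the exact error-minimizing choice:

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n rotation acting in the (i, j) plane."""
    Q = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    Q[i, i] = Q[j, j] = c
    Q[i, j], Q[j, i] = -s, s
    return Q

def greedy_mmf(A, L):
    """Return rotations Q_1..Q_L, the rotated matrix H, and the active set S."""
    n = A.shape[0]
    S = list(range(n))                 # active ("scaling") coordinates
    rotations = []
    H = A.copy()
    for _ in range(L):
        # pick the active pair with the largest off-diagonal magnitude
        i, j = max(((a, b) for a in S for b in S if a < b),
                   key=lambda p: abs(H[p[0], p[1]]))
        theta = 0.5 * np.arctan2(2 * H[i, j], H[j, j] - H[i, i])
        Q = givens(n, i, j, theta)
        H = Q @ H @ Q.T                # rotates H[i, j] to zero
        rotations.append(Q)
        S.remove(j)                    # retire coordinate j as a wavelet
    return rotations, H, S

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T
rotations, H, S = greedy_mmf(A, L=4)

# With H kept in full, A = Q_1^T ... Q_L^T H Q_L ... Q_1 exactly:
A_rec = H.copy()
for Q in reversed(rotations):
    A_rec = Q.T @ A_rec @ Q
assert np.allclose(A_rec, A)

# Compression: keep only the core block of H on the active set S plus the
# diagonal (a core-diagonal H), then measure the Frobenius error.
H_core = np.diag(np.diag(H))
H_core[np.ix_(S, S)] = H[np.ix_(S, S)]
A_approx = H_core.copy()
for Q in reversed(rotations):
    A_approx = Q.T @ A_approx @ Q
print(np.linalg.norm(A - A_approx, "fro"))
```

Because the $Q_l$ are orthogonal, the approximation error equals the Frobenius norm of the entries of $H$ that are discarded when it is made core-diagonal.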
If $Q_l = I_{n-k} \oplus_I O$ with $I = (i_1, \dots, i_k)$ and $J_l = \{i_k\}$, then the contribution of level $l$ to the MMF approximation error (in Frobenius norm) is

$$E_l = E^O_I = 2 \sum_{p=1}^{k-1} \big[ O [A_{l-1}]_{I,I} O^\top \big]^2_{k,p} + 2 \big[ O B O^\top \big]_{k,k},$$

where $B = [A_{l-1}]_{I, S_l} \big( [A_{l-1}]_{I, S_l} \big)^\top$. In the special case of $k = 2$ and $I_l = (i, j)$,

$$E_l = E^O_{(i,j)} = 2 \big[ O [A_{l-1}]_{(i,j),(i,j)} O^\top \big]^2_{2,1} + 2 \big[ O B O^\top \big]_{2,2}$$

with $B = [A_{l-1}]_{(i,j), S_l} \big( [A_{l-1}]_{(i,j), S_l} \big)^\top$.
Let $A \in \mathbb{R}^{2 \times 2}$ be diagonal, $B \in \mathbb{R}^{2 \times 2}$ symmetric, and
$$O = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}.$$
Set $a = (A_{1,1} - A_{2,2})^2/4$, $b = B_{1,2}$, $c = (B_{2,2} - B_{1,1})/2$, $e = \sqrt{b^2 + c^2}$, $\theta = 2\alpha$, and $\omega = \arctan(c/b)$. Then if $\alpha$ minimizes $([O A O^\top]_{2,1})^2 + [O B O^\top]_{2,2}$, $\theta$ satisfies
$$(a/e)\,\sin(2\theta) + \sin(\theta + \omega + \pi/2) = 0.$$
If $Q_l$ is a compound rotation of the form $Q_l = \oplus_{I_1} O_1 \oplus \dots \oplus_{I_m} O_m$ for some partition $I_1 \cup \dots \cup I_m$ of $[n]$ with $|I_1|, \dots, |I_m| \leq k$, and some sequence of orthogonal matrices $O_1, \dots, O_m$, then level $l$'s contribution to the MMF error obeys

$$E_l \leq 2 \sum_{j=1}^{m} \left[ \sum_{p=1}^{k_j - 1} \big[ O_j [A_{l-1}]_{I_j, I_j} O_j^\top \big]^2_{k_j, p} + \big[ O_j B_j O_j^\top \big]_{k_j, k_j} \right], \qquad (1)$$

where $B_j = [A_{l-1}]_{I_j, S_{l-1} \setminus I_j} \big( [A_{l-1}]_{I_j, S_{l-1} \setminus I_j} \big)^\top$.

For compression tasks, parallel MMFs are generally preferable to Jacobi MMFs because
- unrelated parts of the matrix are processed independently, in parallel,
- they give more compact factorizations,
- Jacobi MMFs can exhibit cascades.

The sets $I_1, \dots, I_m$ can be found by a randomized strategy or by exact matching ($O(n^3)$ time).
The sequence in which MMF (with $k \geq 3$) eliminates dimensions induces a (soft) hierarchical clustering amongst the dimensions (a mixture of trees): a connection to hierarchical clustering.
1. Find a (hierarchically) sparse basis for $A$
2. Hierarchically cluster data
3. Find community structure
4. Generate hierarchical graphs
5. Compress graphs & matrices
6. Provide a basis for sparse approximations such as the Lasso
7. Provide a basis for fast numerics (NLA, multigrid, etc.)
Diffusion wavelets also start with the matrix representation of a smoothing operator (the diffusion operator) and compress it in multiple stages. However, at each stage the wavelets are constructed from the columns of $A$ itself, by a rank-revealing QR-type process:

$$A \approx Q_1 R_1, \qquad A^2 \approx Q_1 \underbrace{R_1 R_1^\top}_{Q_2 R_2} Q_1^\top, \qquad A^4 \approx Q_1 Q_2 \underbrace{R_2 R_2^\top}_{Q_3 R_3} Q_2^\top Q_1^\top, \qquad \dots$$

Very strong theoretical foundations, but the sparsity (locality) of the $Q_l$ matrices is hard to control. [Coifman & Maggioni, 2006]
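The uncompressed algebra behind this repeated-squaring chain can be checked directly; a minimal sketch using plain (not rank-revealing, untruncated) QR factorizations, so every identity holds up to round-off:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                      # symmetric "diffusion" operator

Q1, R1 = np.linalg.qr(A)               # A = Q1 R1
# Since A is symmetric, A^2 = A A^T = Q1 (R1 R1^T) Q1^T.
assert np.allclose(A @ A, Q1 @ (R1 @ R1.T) @ Q1.T)

Q2, R2 = np.linalg.qr(R1 @ R1.T)       # R1 R1^T = Q2 R2
# Squaring again: A^4 = Q1 Q2 (R2 R2^T) Q2^T Q1^T.
assert np.allclose(np.linalg.matrix_power(A, 4),
                   Q1 @ Q2 @ (R2 @ R2.T) @ Q2.T @ Q1.T)
print("squaring chain verified")
```

In actual diffusion wavelets the QR steps are rank-revealing and truncated, which is where the compression (and the approximation error) comes from.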
Treelets are a special case of Jacobi MMF, $Q_3 Q_2 Q_1 A Q_1^\top Q_2^\top Q_3^\top \dots$, but:
- restricted to Givens rotations ($k = 2$), so they only recover a single tree;
- each $Q_i$ is chosen to eliminate the maximal off-diagonal entry, rather than to minimize overall error, so they are not intended as a factorization method;
- $A$ is regarded as a covariance matrix, leading to a probabilistic analysis.

[Lee, Nadler & Wasserman, 2008]
- Multigrid methods solve systems of PDEs by shuttling back and forth between grids/meshes at different levels of resolution [Brandt, 1973; Livne & Brandt, 2010].
- Fast multipole methods evaluate a kernel (such as the Gaussian kernel) between a large number of particles by aggregating them at different levels [Greengard & Rokhlin, 1987].
- $\mathcal{H}$ matrices [Hackbusch, 1999], $\mathcal{H}^2$ matrices [Börm, 2007], and Hierarchically Semi-Separable matrices [Chandrasekaran et al., 2005] iteratively decompose a matrix into blocks, with low-rank structure in each of the blocks.
Eigendecomposition (PCA); Singular Value Decomposition; Laplacian eigenmaps, etc.; Rank-Revealing QR; Johnson & Lindenstrauss, 1986; Halko, Martinsson & Tropp, 2011; Ailon & Chazelle, 2006; Sarlós, 2006; Le, Sarlós & Smola, 2013; Jolliffe, 1972; Drineas et al., 2009; Deshpande et al., 2006; Boutsidis et al., 2009; Coifman & Maggioni, 2006; Lee, Nadler & Wasserman, 2008; Drineas & Mahoney, 2006; Koutis, Miller & Peng, 2010; Benczúr & Karger, 1996; Spielman & Teng, 2004; Achlioptas & McSherry, 2001; Williams & Seeger, 2011; Fowlkes et al., 2004; Drineas & Mahoney, 2005; Kumar et al.
In classical wavelet transforms one proves that if $f$ is $\alpha$-Hölder, i.e.,
$$|f(x) - f(y)| \leq c_H\, d(x, y)^\alpha \quad \forall x, y \in X,$$
then the wavelet coefficients decay at a certain rate, e.g., $|\langle f, \psi^l_m \rangle| \leq c\, 2^{-l(\alpha + \beta)}$. Results of this type generally hold for spaces of homogeneous type, in which
$$\mathrm{Vol}(B(x, 2r)) \leq c_{\mathrm{hom}}\, \mathrm{Vol}(B(x, r)) \quad \forall x \in X,\ r > 0.$$
The natural notion of distance between rows in MMF is $d(i, j) = \langle A_{i,:}, A_{j,:} \rangle^{-1}$.
We say that a symmetric matrix $A \in \mathbb{R}^{n \times n}$ is $\Lambda$-rank homogeneous up to order $K$ if, for any $S \subseteq [n]$ of size at most $K$, letting $Q = A_{S,:} A_{:,S}$, setting $D$ to be the diagonal matrix with $D_{i,i} = \|Q_{i,:}\|_1$, and $\overline{Q} = D^{-1/2} Q D^{-1/2}$, the eigenvalues $\lambda_1, \dots, \lambda_{|S|}$ of $\overline{Q}$ satisfy $\Lambda < \lambda_i < 1 - \Lambda$, and furthermore $c_T^{-1} \leq D_{i,i} \leq c_T$ for some constant $c_T$. Intuitively: different rows are neither too parallel nor totally orthogonal. This is a generalization of the restricted isometry property from compressed sensing [Candès & Tao, 2005].
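A minimal sketch computing the quantities in this definition for one candidate set $S$; the matrix and the set are illustrative, and the code only reports the normalized spectrum so the eigenvalue condition can be inspected numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T                                  # symmetric test matrix

S = [0, 2, 5]                                # one candidate set, |S| <= K
Q = A[S, :] @ A[:, S]                        # Q = A_{S,:} A_{:,S} (PSD here)
d = np.abs(Q).sum(axis=1)                    # D_ii = ||Q_{i,:}||_1
Dm = np.diag(1.0 / np.sqrt(d))
Q_norm = Dm @ Q @ Dm                         # D^{-1/2} Q D^{-1/2}
lam = np.linalg.eigvalsh(Q_norm)
# Rank homogeneity asks these eigenvalues to stay in a Lambda-dependent band.
print(lam)
```

With this $\ell_1$ row normalization the spectrum of $\overline{Q}$ always lies in $[0, 1]$ for a PSD $A$, so the definition is asking that it stay away from both ends of that interval.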
Let $A \in \mathbb{R}^{n \times n}$ be a symmetric matrix that is $\Lambda$-rank homogeneous up to order $K$ and has an MMF factorization $A = U_1^\top \cdots U_L^\top \, H \, U_L \cdots U_1$. Assume $\psi^l_m$ is a wavelet in this factorization arising from row $i$ of $A_{l-1}$, supported on a set $S$ of size $K' \leq K$, and that $\|H_{i,:}\|_2 \leq \epsilon$. Then if $f \colon [n] \to \mathbb{R}$ is $(c_H, 1/2)$-Hölder with respect to $d(i, j) = \langle A_{i,:}, A_{j,:} \rangle^{-1}$, then
$$|\langle f, \psi^l_m \rangle| \leq c_T\, c_H\, c_\Lambda\, \epsilon^{1/2} K, \qquad c_\Lambda = \frac{4}{1 - (1 - 2\Lambda)^2}.$$
Frobenius norm error on the Zachary Karate Club graph (left) and a matrix of genetic relationships between 50 individuals from [Crossett, 2013] (right).
Frobenius norm error of the MMF and Nyström methods on Kronecker product matrices.
Frobenius norm error of the MMF and Nyström methods on large network datasets.
- MMF is a new type of matrix factorization, mirroring multiresolution analysis: a generalization of rank.
- MMF exploits hierarchical structure, but does not enforce a single hierarchy.
- Empirical evidence suggests that MMF is a good model for real data.
- Finding MMF factorizations is a fundamentally local and parallelizable process: $O(n \log n)$ algorithms should be within reach.
- Once in MMF form, a range of matrix computations become faster.
- MMF has strong ties to diffusion wavelets, treelets, multiscale SVD, structured matrices, algebraic multigrid, and fast multipole methods.
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationLinear Algebra for Machine Learning. Sargur N. Srihari
Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it
More informationGeometric Component Analysis and its Applications to Data Analysis
Geometric Component Analysis and its Applications to Data Analysis Amit Bermanis 1, Moshe Salhov 2, and Amir Averbuch 2 1 Department of Mathematics, University of Toronto, Canada 2 School of Computer Science,
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationFast Approximate Matrix Multiplication by Solving Linear Systems
Electronic Colloquium on Computational Complexity, Report No. 117 (2014) Fast Approximate Matrix Multiplication by Solving Linear Systems Shiva Manne 1 and Manjish Pal 2 1 Birla Institute of Technology,
More informationGraphs, Geometry and Semi-supervised Learning
Graphs, Geometry and Semi-supervised Learning Mikhail Belkin The Ohio State University, Dept of Computer Science and Engineering and Dept of Statistics Collaborators: Partha Niyogi, Vikas Sindhwani In
More informationConnection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis
Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal
More informationLECTURE NOTE #11 PROF. ALAN YUILLE
LECTURE NOTE #11 PROF. ALAN YUILLE 1. NonLinear Dimension Reduction Spectral Methods. The basic idea is to assume that the data lies on a manifold/surface in D-dimensional space, see figure (1) Perform
More informationCS 143 Linear Algebra Review
CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see
More informationMethods for estimating the diagonal of matrix. functions. functions. Jesse Laeuchli Andreas Stathopoulos CSC 2016
Jesse Laeuchli Andreas Stathopoulos CSC 216 1 / 24 The Problem Given a large A, and a function f, find diag(f(a)),or trace(f(a)) = n i=1 f(a) ii = n i=1 f(λ i) Some important examples of f f (A) = A 1
More informationMAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing
MAT 585: Johnson-Lindenstrauss, Group testing, and Compressed Sensing Afonso S. Bandeira April 9, 2015 1 The Johnson-Lindenstrauss Lemma Suppose one has n points, X = {x 1,..., x n }, in R d with d very
More informationMachine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling
Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University
More informationMultiscale bi-harmonic Analysis of Digital Data Bases and Earth moving distances.
Multiscale bi-harmonic Analysis of Digital Data Bases and Earth moving distances. R. Coifman, Department of Mathematics, program of Applied Mathematics Yale University Joint work with M. Gavish and W.
More informationMultiscale Manifold Learning
Multiscale Manifold Learning Chang Wang IBM T J Watson Research Lab Kitchawan Rd Yorktown Heights, New York 598 wangchan@usibmcom Sridhar Mahadevan Computer Science Department University of Massachusetts
More informationStructured matrix factorizations. Example: Eigenfaces
Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix
More informationLecture Notes 5: Multiresolution Analysis
Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and
More informationOptimisation Combinatoire et Convexe.
Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix
More information1 Matrix notation and preliminaries from spectral graph theory
Graph clustering (or community detection or graph partitioning) is one of the most studied problems in network analysis. One reason for this is that there are a variety of ways to define a cluster or community.
More informationSparsity in Underdetermined Systems
Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationComputational tractability of machine learning algorithms for tall fat data
Computational tractability of machine learning algorithms for tall fat data Getting good enough solutions as fast as possible Vikas Chandrakant Raykar vikas@cs.umd.edu University of Maryland, CollegePark
More informationRandomized algorithms for matrix computations and analysis of high dimensional data
PCMI Summer Session, The Mathematics of Data Midway, Utah July, 2016 Randomized algorithms for matrix computations and analysis of high dimensional data Gunnar Martinsson The University of Colorado at
More information