Sketching for Large-Scale Learning of Mixture Models


1 Sketching for Large-Scale Learning of Mixture Models. Nicolas Keriven, Université Rennes 1, Inria Rennes Bretagne-Atlantique. Advisor: Rémi Gribonval

2 Outline Introduction Practical Approach Results Theoretical analysis Conclusion and outlooks

7 Paths to Compressive Learning
Goal : compute parameters from a large database (PCA, classification, regression, density estimation...).
Idea : compress the database beforehand.
Large (tall) database : see e.g. [Calderbank 2009]. Large (fat) «big data» database : see e.g. [Cormode 2011]. 1/28

11 Inspiration
Sketch : contains particular information about the database, maintained online [Cormode 2011].
Learning : knowledge about the underlying probability distribution.
By Compressive Sensing : recover a «low-dimensional» object from few linear measurements (ex : sparse vector, low-rank matrix...).
Sketch = measurements of the underlying probability distribution. 2/28
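In formulas (a minimal restatement; the notation below is ours, not fixed by the slides): the sketch is a vector of generalized moments of the data distribution, i.e. a linear measurement of the distribution itself, estimated by a single averaging pass over the database.

```latex
% Sketching operator applied to a probability distribution p on R^n,
% with feature map Phi(x) = (exp(i w_j^T x))_{j=1..m}:
\begin{gather*}
\mathcal{A}(p) = \mathbb{E}_{x \sim p}\,\Phi(x)
  = \Big(\mathbb{E}_{x \sim p}\, e^{\mathrm{i}\,\omega_j^\top x}\Big)_{j=1,\dots,m},
\\
\hat{z} = \frac{1}{N}\sum_{i=1}^{N} \Phi(x_i) \;\approx\; \mathcal{A}(p).
\end{gather*}
```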

16 Generalized Moments / (Generalized) Compressive Sensing / Practical Compressive Learning
Compressive Sensing : (random) projections. Online, distributed. Robustness of the learning algorithm? 3/28

19 Mixture Model Estimation
Usual Compressive Sensing : sparsity.
Generalized Compressive Sensing (or...) : infinite, continuous dictionary! 4/28
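Stated side by side (our notation): a K-sparse vector combines K atoms from a finite dictionary, whereas a K-component mixture combines K atoms p_theta from a continuously parameterized family, which is what «infinite, continuous dictionary» refers to.

```latex
% Usual Compressive Sensing: x is a combination of K atoms of a finite dictionary.
% Generalized version: p is a combination of K atoms p_theta from a continuous family.
\begin{gather*}
y = \Phi x, \qquad x = \sum_{k=1}^{K} \alpha_k\, e_{i_k},
\\
\hat{z} \approx \mathcal{A}(p), \qquad
p = \sum_{k=1}^{K} \alpha_k\, p_{\theta_k}, \quad \theta_k \in \Theta .
\end{gather*}
```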

21 Outline
Practical Approach (Sections 2 & 3) : greedy algorithm inspired by Compressive Sensing; application to K-means and GMM with diagonal covariance.
Theoretical Analysis (Section 4) : information-preservation guarantee; infinite-dimensional Compressive Sensing. 5/28

22 Outline Introduction Practical Approach Results Theoretical analysis Conclusion and outlooks

31 Cost function / Algorithm
Usual Compressive Sensing : «ideal» decoding scheme, NP-complete. Two approaches: convex relaxation or greedy. See [Foucart 2013].
Sketched mixture estimation : ideal decoding scheme (Section 4), highly non-convex. Two approaches: convex relaxation [Bunea 2010] or greedy (proposed). 6/28
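A sketch of the decoding problem targeted here (our notation, consistent with the formulas above; the exact constraints on the weights depend on the model): fit a K-component mixture whose sketch matches the empirical sketch.

```latex
% Non-convex decoding cost minimized (approximately) by the proposed greedy algorithm.
\min_{\substack{\alpha_1,\dots,\alpha_K \ge 0 \\ \theta_1,\dots,\theta_K \in \Theta}}
\Big\| \hat{z} - \sum_{k=1}^{K} \alpha_k\, \mathcal{A}\big(p_{\theta_k}\big) \Big\|_2
```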

34 Proposed algorithm
Orthogonal Matching Pursuit (OMP) [Mallat 1993, Pati 1993] :
1. Add the atom most correlated with the residual.
2. Perform Least-Squares.
3. Repeat until the desired sparsity.
OMP with Replacement (OMPR) [Jain 2011], similar to CoSaMP [Needell 2008] or Subspace Pursuit [Dai 2009] :
1. Add the atom most correlated with the residual.
2. Perform Hard-Thresholding (if necessary).
3. Perform Least-Squares.
4. Repeat twice the desired sparsity.
Compressive Learning OMPR (CLOMPR, proposed); we cannot just add a component :
1. Add the atom most correlated with the residual, found by gradient descent.
2. Perform Hard-Thresholding.
3. Perform Non-Negative Least-Squares.
4. Perform gradient descent on all parameters, initialized with the current ones.
5. Repeat twice the desired sparsity. 7/28
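To make the greedy template concrete, here is a minimal, self-contained OMP for standard sparse recovery (NumPy). It is plain OMP, not CLOMPR itself: CLOMPR replaces the finite dictionary by a gradient-based search over continuous atoms, hard-thresholding, and non-negative least squares.

```python
import numpy as np

def omp(Phi, y, K):
    """Orthogonal Matching Pursuit: recover a K-sparse x with y ~= Phi @ x."""
    _, d = Phi.shape
    support, residual = [], y.copy()
    x = np.zeros(d)
    for _ in range(K):
        # 1. Add the atom most correlated with the current residual.
        correlations = np.abs(Phi.T @ residual)
        correlations[support] = -np.inf          # do not pick an atom twice
        support.append(int(np.argmax(correlations)))
        # 2. Least-squares fit on the current support.
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x = np.zeros(d)
        x[support] = coeffs
        # 3. Update the residual and repeat until the desired sparsity.
        residual = y - Phi @ x
    return x

# Small usage example on a synthetic 3-sparse signal.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[3, 17, 60]] = [1.0, -2.0, 0.5]
x_hat = omp(Phi, Phi @ x_true, K=3)
print(np.allclose(x_hat, x_true, atol=1e-6))
```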

40 CLOMPR : illustration (schematic). Goal : a 3-GMM.
Starting from an intermediary support : 1 : add an atom with gradient descent; 2 : Hard Thresholding; 3 : Non-Negative Least-Squares; 4 : gradient descent on all parameters; this yields the final support.
How to define the sketching operator? 8/28

46 Choice of sketch
To implement CLOMPR, the sketch of a component and its gradient must have a closed-form expression w.r.t. the parameters.
The distributions considered are spatially localized : need incoherent sampling -> Fourier sampling. Closed-form for many models! (including alpha-stable...)
Random Fourier sampling [Candes 2006]; Random Fourier features [Rahimi 2007].
How to define the frequency distribution? 9/28
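A minimal sketch of both ingredients (NumPy; the function names and the Gaussian frequency scale below are illustrative assumptions, not the toolbox's API or the heuristic of the next slide): the empirical sketch of a dataset via random Fourier features, and the closed-form sketch of a Gaussian component via its characteristic function.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200                                # data dimension, sketch size
Omega = rng.standard_normal((m, n)) / 2.0    # frequencies (illustrative scale)

def empirical_sketch(X, Omega):
    """Empirical sketch: average of complex exponentials over the database."""
    return np.exp(1j * X @ Omega.T).mean(axis=0)          # shape (m,)

def gaussian_component_sketch(mu, diag_cov, Omega):
    """Closed-form sketch of N(mu, diag(diag_cov)): its characteristic function
    evaluated at the frequencies Omega."""
    quad = np.sum(Omega**2 * diag_cov, axis=1)            # w^T Sigma w for each row
    return np.exp(1j * Omega @ mu - 0.5 * quad)

# Check on data drawn from a single Gaussian: both sketches should be close.
mu, var = np.zeros(n), 0.3 * np.ones(n)
X = mu + np.sqrt(var) * rng.standard_normal((100_000, n))
print(np.linalg.norm(empirical_sketch(X, Omega) - gaussian_component_sketch(mu, var, Omega)))
```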

50 Designing the frequency distribution
Must adjust the «scale» of the distribution :
Adjust by hand : not that difficult, the method is quite robust.
Cross-validation : can be very long! Used in practice [Sutherland 2015].
Automatic (proposed) : partial pre-processing, heuristic based on GMM-like distributions. 10/28

51 Summary. Given a database :
1. Design the sketching operator : partial pre-processing to choose the scale, then draw the frequencies.
2. Compute the sketch : online, distributed, GPU.
3. Derive the mixture model with CLOMPR. 11/28
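Since the empirical sketch is a plain average, step 2 streams and parallelizes trivially; a minimal illustration (NumPy, hypothetical helper names):

```python
import numpy as np

def partial_sketch(X_chunk, Omega):
    """Sum (not mean) of features over one chunk, plus the chunk size."""
    return np.exp(1j * X_chunk @ Omega.T).sum(axis=0), len(X_chunk)

def merge(partials):
    """Merge partial sketches computed on different chunks / machines / stream passes."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

rng = np.random.default_rng(1)
Omega = rng.standard_normal((100, 5))
X = rng.standard_normal((10_000, 5))
chunks = np.array_split(X, 8)                 # e.g. 8 machines, or 8 batches of a stream
z = merge([partial_sketch(c, Omega) for c in chunks])
print(np.allclose(z, np.exp(1j * X @ Omega.T).mean(axis=0)))
```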

52 Outline Introduction Practical Approach Results Theoretical analysis Conclusion and outlooks

56 Application : K-means, GMM
K-means : classic approach : Lloyd-Max algorithm [Lloyd 1982] (Matlab's kmeans); compressive approach : model the clustered distribution as a noisy mixture of Diracs.
GMM with diagonal covariance : classic approach : EM algorithm [Dempster 1977] (VLFeat's gmm); compressive approach : GMM model. 12/28
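For the K-means case, the mixture-of-Diracs model and its Fourier sketch have a simple closed form (our notation): each centroid contributes one complex exponential per frequency.

```latex
% Mixture of Diracs at the centroids, and its closed-form Fourier sketch.
\begin{gather*}
p_\theta = \sum_{k=1}^{K} \alpha_k\, \delta_{c_k},
\qquad
\big[\mathcal{A}(p_\theta)\big]_j = \sum_{k=1}^{K} \alpha_k\, e^{\mathrm{i}\,\omega_j^\top c_k},
\quad j = 1,\dots,m.
\end{gather*}
```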

58 Large-scale results
K-means (n=5, K=10) : comparison with Matlab's kmeans. GMMs with diagonal covariance (n=5, K=5) : comparison with VLFeat's gmm.
Faster and more memory-efficient on large databases. The number of measurements does not depend on N. 13/28

61 Application : spectral clustering
K-means (n=10, K=10, m=1000); mean and variance over 50 experiments. Spectral clustering for classification [Uw 2001], augmented MNIST database [Loosli 2007].
CLOMPR performs better and is more stable with a large database. 14/28

62 Application : speaker recognition
GMM (n=12, K=64), with a variant of CLOMPR that is faster at large K. Classical method for speaker recognition [Reynolds 2000] (for proof of concept), NIST 2005 database, MFCCs.
Also performs better on a large database. 15/28

63 Outline Introduction Practical Approach Results Theoretical analysis Conclusion and outlooks

66 Information-preservation guarantees
A guarantee for CLOMPR itself? Difficult! (non-convex, random...). Instead, study the ideal decoder that solves the cost function.
Robustness to using the empirical sketch instead of the true one? Robustness to the data distribution not being exactly a mixture model? Guarantees in terms of usual learning cost functions?
K-means : sum of distances to the closest centroid. GMMs : negative log-likelihood. 16/28
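The two learning risks in formulas (standard definitions, with squared distances for K-means, matching the SSE metric used in the experiments):

```latex
% K-means: expected squared distance to the closest centroid (SSE in expectation).
% GMM: negative log-likelihood of the fitted mixture under the data distribution p*.
\begin{gather*}
\mathcal{R}_{\mathrm{KM}}(c_1,\dots,c_K)
  = \mathbb{E}_{x \sim p^*} \,\min_{1 \le k \le K} \| x - c_k \|_2^2,
\\
\mathcal{R}_{\mathrm{GMM}}(\theta)
  = -\,\mathbb{E}_{x \sim p^*} \log p_\theta(x),
\qquad
p_\theta = \sum_{k=1}^{K} \alpha_k\, \mathcal{N}(\mu_k, \Sigma_k).
\end{gather*}
```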

69 K-means : result
Goal : minimize the expected risk.
Hypotheses : separation of the centroids, bounded domain. Reweighted Fourier features (needed for the theory, no effect in practice).
If the sketch is large enough : with high probability, the excess risk is controlled. 17/28

72 GMMs with known covariance : result
Goal : minimize the expected risk.
Hypotheses : large enough separation of the means, bounded domain. Fourier features.
If the sketch is large enough : with high probability, the excess risk is controlled, up to the L1 distance from p* to the set of (separated) GMMs. 18/28

73 GMM trade-off : trade-off between the separation of the means and the size of the sketch. 19/28

75 GMM with unknown diagonal covariance
Can the guarantee be related to the learning cost function? Efficiency of the «compressive» approach?
Nevertheless : a guarantee holds when the data distribution is exactly a GMM. 20/28

79 Number of measurements
[Plots : relative SSE for K-means; relative log-likelihood for GMMs with known covariance and with diagonal covariance, versus the number of measurements.]
In theory, at least... Empirically? 21/28

82 Sketch of proof : principle
Goal : existence of an instance-optimal decoder, equivalent to the Lower Restricted Isometry Property (LRIP) [Bourrier 2014]. Ex : quantization error. 22/28
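Schematically (our notation; the constants and the model-set norm are those of [Bourrier 2014] and are left implicit here): the LRIP lower-bounds the sketch distance by a distance between distributions in the model set, and implies that an ideal decoder is robust to noise and to model mismatch.

```latex
% LRIP over the model set S (e.g. K-component mixtures), and the resulting
% instance-optimality of an ideal decoder Delta (constants and norms left implicit).
\begin{gather*}
\| p - q \| \le C\, \big\| \mathcal{A}(p) - \mathcal{A}(q) \big\|_2
  \qquad \text{for all } p, q \in \mathfrak{S},
\\
\big\| p^* - \Delta\big(\mathcal{A}(p^*) + e\big) \big\|
  \le C_1\, d\big(p^*, \mathfrak{S}\big) + C_2\, \| e \|_2 .
\end{gather*}
```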

87 Proving the LRIP
1 : Prove a non-uniform LRIP : kernel mean embedding [Smola 2007], Random (Fourier) Features [Rahimi 2007], concentration (Hoeffding, Bernstein, chaining).
2 : Use ε-coverings to extend it to a uniform LRIP : covering the basic set is finite-dimensional, easy! (Ex : quantization error [Boufounos 2016]); covering the normalized secant set is infinite-dimensional, difficult! 23/28
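The connection behind step 1, stated informally (our notation): as the number of frequencies grows, the normalized sketch distance concentrates around a Maximum Mean Discrepancy for the shift-invariant kernel induced by the frequency distribution [Smola 2007, Rahimi 2007].

```latex
% Frequency distribution Lambda induces a shift-invariant kernel; the (normalized)
% sketch distance concentrates around the corresponding MMD between distributions.
\begin{gather*}
\kappa(x, y) = \mathbb{E}_{\omega \sim \Lambda}\, e^{\mathrm{i}\,\omega^\top (x - y)},
\qquad
\psi_p(\omega) := \mathbb{E}_{x \sim p}\, e^{\mathrm{i}\,\omega^\top x},
\\
\tfrac{1}{m}\,\big\| \mathcal{A}(p) - \mathcal{A}(q) \big\|_2^2
  \;\xrightarrow[m \to \infty]{}\;
  \mathbb{E}_{\omega \sim \Lambda}\big| \psi_p(\omega) - \psi_q(\omega) \big|^2
  = \mathrm{MMD}_\kappa(p, q)^2 .
\end{gather*}
```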

90 Results : sufficient conditions
The model has finite covering numbers : bad! Ex : GMMs with unknown covariance.
Mixtures of sufficiently separated distributions with smooth features : + guarantees w.r.t. a smooth risk. Ex : mixture of Diracs (K-means) with «smooth» Random Features.
«Smoother» Random Features : + guarantees w.r.t. the risk. Ex : mixtures of Diracs (K-means), GMMs with known covariance. 24/28

91 Outline Introduction Practical Approach Results Theoretical analysis Conclusion and outlooks

96 Conclusions. Contributions :
Greedy algorithm for large-scale mixture learning from random moments.
Efficient heuristic to design the sketching operator as Fourier sampling.
Application to mixtures of Diracs and GMMs.
Evaluation on synthetic and real data.
Information-preservation guarantees using infinite-dimensional Compressive Sensing. 25/28

97 The SketchMLbox (sketchml.gforge.inria.fr)
Mixture of Diracs («K-means»), GMMs with known covariance, GMMs with unknown diagonal covariance.
Soon : alpha-stable, Gaussian Locally Linear Mapping [Deleforge 2014].
Optimized for user-defined... 26/28

103 Outlooks : algorithmic guarantees
Algorithmic guarantees? Non-convex cost function, randomized algorithm.
Locally convex? Basin of attraction? [Jacques 2016, Candes 2016...] Reached by CLOMPR under reasonable hypotheses? Stopping condition?
Recent result : locally block convex.
[Figure : cost function for a 3-GMM in dimension 1 at positions {x,y,z} = {-3,0,3}.] 27/28

114 Outlooks : extension of the methods
1. Bridge the observed gap between theory and practice? It does not come from the ε-coverings; improve the concentration inequalities?
2. Extend the framework to other tasks? Recent paper submitted to AISTATS : PCA. Other existing uses of Fourier sketches, e.g. classification [Sutherland 2015]. Other kernel methods (algorithmic? theoretical?).
3. Extension to multi-layer sketches? (Neural networks...) May be adapted to e.g. GMMs with unknown covariance. The equivalence between LRIP and instance optimality is still valid for non-linear operators! However, CLOMPR and the current sufficient conditions are no longer valid. 28/28

115 Thank you!
Keriven, Bourrier, Gribonval, Pérez. Sketching for Large-Scale Learning of Mixture Models. ICASSP 2016.
Keriven, Bourrier, Gribonval, Pérez. Sketching for Large-Scale Learning of Mixture Models (extended version). Submitted to Information and Inference, arXiv:
Keriven, Tremblay, Gribonval, Traonmilin. Compressive K-means. ICASSP 2017.
Gribonval, Blanchard, Keriven, Traonmilin. Random Moments for Sketched Statistical Learning. Submitted to AISTATS 2017, extended version soon.

116 Appendix : CLOMPR
