Rémi Gribonval Inria Rennes - Bretagne Atlantique.


1 Rémi Gribonval Inria Rennes - Bretagne Atlantique remi.gribonval@inria.fr

2 Contributors & Collaborators Anthony Bourrier Nicolas Keriven Yann Traonmilin Gilles Puy Gilles Blanchard Mike Davies Tomer Peleg Patrick Perez 2

3 Agenda From Compressive Sensing to Compressive Learning? Compressive Clustering / Compressive GMM Information-preserving projections & sketches Conclusion 3

4 Machine Learning Available data: training collection of feature vectors = point cloud Goals: infer parameters to achieve a certain task; generalization to future samples with the same probability distribution Examples: PCA → principal subspace; Clustering → centroids; Dictionary learning → dictionary; Classification → classifier parameters (e.g. support vectors) 4

5 Challenging dimensions Point cloud = large matrix of feature vectors 5

6 Challenging dimensions Point cloud = large matrix of feature vectors x 1 5

7 Challenging dimensions Point cloud = large matrix of feature vectors x 1 x 2 5

8 Challenging dimensions Point cloud = large matrix of feature vectors x 1 x 2 x N 5

9 Challenging dimensions Point cloud = large matrix of feature vectors x 1 x 2 x N High feature dimension n Large collection size N 5

10 Challenging dimensions Point cloud = large matrix of feature vectors x 1 x 2 x N High feature dimension n Large collection size N Challenge: compress before learning? 5

11 Compressive Machine Learning? Point cloud = large matrix of feature vectors x 1 x 2 x N M Y = M y 1 y 2 y N 6

12 Compressive Machine Learning? Point cloud = large matrix of feature vectors x 1 x 2 x N M Y = M y 1 y 2 y N 6

13 Compressive Machine Learning? Point cloud = large matrix of feature vectors x 1 x 2 x N Reduce feature dimension [Calderbank & al 2009, Reboredo & al 2013] (Random) feature projection Exploits / needs a low-dimensional feature model Y = M y 1 y 2 y N 6
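
For intuition, the feature-projection route on this slide can be sketched in a few lines of numpy. This is only an illustration, assuming a Gaussian random matrix M and arbitrary sizes; it is not the specific operators of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, N = 128, 16, 1000          # feature dim, reduced dim, collection size (illustrative)
X = rng.standard_normal((n, N))  # point cloud: one column per feature vector

# Random Gaussian projection M (one common choice; the slide does not fix it)
M = rng.standard_normal((m, n)) / np.sqrt(m)

Y = M @ X                        # reduced features y_i = M x_i, collection size unchanged
print(X.shape, "->", Y.shape)    # (128, 1000) -> (16, 1000)
```

Note that the number of columns N is untouched by this operation, which is exactly the limitation raised on the next slides.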

14 Challenges of large collections Feature projection: limited impact Y = M 7

15 Challenges of large collections Feature projection: limited impact Y = M Big Data Challenge: compress collection size 7

16 Compressive Machine Learning? Point cloud = empirical probability distribution 8

17 Compressive Machine Learning? Point cloud = empirical probability distribution Reduce collection dimension: coresets, see e.g. [Agarwal & al 2003, Feldman 2010]; sketching & hashing, see e.g. [Thaper & al 2002, Cormode & al 2005] 8

18 Compressive Machine Learning? Point cloud = empirical probability distribution M Sketching operator: nonlinear in the feature vectors, linear in their probability distribution Reduce collection dimension: coresets, see e.g. [Agarwal & al 2003, Feldman 2010]; sketching & hashing, see e.g. [Thaper & al 2002, Cormode & al 2005] z ∈ R^m 8

20 Example: Compressive Clustering N = 1; n = 2 Sketching operator M, sketch z ∈ R^m, m = 6 Recovery algorithm → estimated centroids vs. ground truth [Figure: left, example of data and its sketch for n = 2; right, reconstruction quality (Hellinger distance) vs. sketch size m for n = 1] 9

21 Computational impact of sketching [Figures: computation time (s) and memory (bytes) vs. collection size N, for several values of K; compressive methods (sketching without distributed computing + CLOMP / CLOMPR / BS CGMM / CHS) store only the sketch and frequencies, while EM stores the full data] Ph.D. A. Bourrier & N. Keriven

22 The Sketch Trick Data distribution p(x) 11

23 The Sketch Trick Data distribution p(x) Machine Learning: method of moments, compressive learning (Probability space p → linear projection M → Sketch space z) Signal Processing: inverse problems, compressive sensing (Signal space x → linear projection M → Observation space y) 11

24 The Sketch Trick Data distribution p(x) Sketch: z_ℓ = ∫ h_ℓ(x) p(x) dx = E[h_ℓ(X)] ≈ (1/N) Σ_{i=1}^N h_ℓ(x_i) Machine Learning: method of moments, compressive learning (Probability space p → linear projection M → Sketch space z) Signal Processing: inverse problems, compressive sensing (Signal space x → linear projection M → Observation space y) 11

25 The Sketch Trick Data distribution p(x) Sketch: z_ℓ = ∫ h_ℓ(x) p(x) dx = E[h_ℓ(X)] ≈ (1/N) Σ_{i=1}^N h_ℓ(x_i) nonlinear in the feature vectors, linear in the distribution p(x), finite-dimensional Mean Map Embedding, cf Smola & al 2007, Sriperumbudur & al 2010 Machine Learning: method of moments, compressive learning (Probability space p → linear projection M → Sketch space z) Signal Processing: inverse problems, compressive sensing (Signal space x → linear projection M → Observation space y) 11
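
To make the sketching formula concrete, here is a minimal numpy implementation of the empirical sketch for a generic bank of feature functions h_ℓ, together with a check of the property highlighted on this slide: the sketch is nonlinear in the samples but linear in the empirical distribution. The particular feature functions (low-order moments) and names are illustrative assumptions, not the choices made in the talk.

```python
import numpy as np

def sketch(X, feature_maps):
    """Empirical sketch z_l = (1/N) sum_i h_l(x_i) for a list of feature maps h_l.

    X: (N, n) array, one feature vector per row.
    feature_maps: list of callables, each mapping an (N, n) array to an (N,) array.
    """
    return np.array([h(X).mean(axis=0) for h in feature_maps])

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))

# Illustrative choice of moments as feature functions (not the talk's choice):
hs = [lambda X: X[:, 0], lambda X: X[:, 1], lambda X: (X ** 2).sum(axis=1)]
z = sketch(X, hs)

# Linearity in the empirical distribution: the sketch of a 50/50 mixture of two
# sub-collections equals the average of their sketches.
X1, X2 = X[:500], X[500:]
assert np.allclose(z, 0.5 * (sketch(X1, hs) + sketch(X2, hs)))
print(z)
```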

26 The Sketch Trick Information preservation? Data distribution p(x) Sketch: z_ℓ = ∫ h_ℓ(x) p(x) dx = E[h_ℓ(X)] ≈ (1/N) Σ_{i=1}^N h_ℓ(x_i) nonlinear in the feature vectors, linear in the distribution p(x), finite-dimensional Mean Map Embedding, cf Smola & al 2007, Sriperumbudur & al 2010 Machine Learning: method of moments, compressive learning (Probability space p → linear projection M → Sketch space z) Signal Processing: inverse problems, compressive sensing (Signal space x → linear projection M → Observation space y) 11

27 The Sketch Trick Dimension reduction? Data distribution p(x) Sketch: z_ℓ = ∫ h_ℓ(x) p(x) dx = E[h_ℓ(X)] ≈ (1/N) Σ_{i=1}^N h_ℓ(x_i) nonlinear in the feature vectors, linear in the distribution p(x), finite-dimensional Mean Map Embedding, cf Smola & al 2007, Sriperumbudur & al 2010 Machine Learning: method of moments, compressive learning (Probability space p → linear projection M → Sketch space z) Signal Processing: inverse problems, compressive sensing (Signal space x → linear projection M → Observation space y) 12

28 Compressive Learning (Heuristic) Examples

29 Compressive Machine Learning Point cloud = empirical probability distribution M Sketching operator, z ∈ R^m Reduce collection dimension ~ sketching: z_ℓ = (1/N) Σ_{i=1}^N h_ℓ(x_i), 1 ≤ ℓ ≤ m Choosing an information preserving sketch? 14

30 Example: Compressive Clustering Goal: find k centroids Standard approach = K-means Sketching approach: p(x) is spatially localized → need incoherent sampling → choose Fourier sampling: sample the characteristic function, z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i} ~ pooled Random Fourier Features, cf Rahimi & Recht 2007 Choose sampling frequencies ω_ℓ ∈ R^n Keriven & al, Sketching for Large-Scale Learning of Mixture Models (ICASSP 2016) 15
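
A small numpy illustration of the sampled empirical characteristic function (pooled random Fourier features) on synthetic clusters. Drawing the frequencies i.i.d. Gaussian with a fixed scale is just a simple heuristic for the example; the cited papers choose the frequency distribution more carefully, and all sizes below are illustrative.

```python
import numpy as np

def characteristic_sketch(X, W):
    """Sampled empirical characteristic function:
    z_l = (1/N) sum_i exp(j w_l^T x_i), one frequency w_l per row of W."""
    return np.exp(1j * X @ W.T).mean(axis=0)   # shape (m,)

rng = np.random.default_rng(0)

# Toy data: N points in R^n drawn around k centroids
n, N, k, m = 2, 10_000, 3, 60
centroids = rng.uniform(-5, 5, size=(k, n))
labels = rng.integers(k, size=N)
X = centroids[labels] + 0.3 * rng.standard_normal((N, n))

# Frequencies drawn i.i.d. Gaussian (a simple heuristic; the scale matters in practice)
W = rng.standard_normal((m, n))

z = characteristic_sketch(X, W)   # m complex numbers summarize the whole point cloud
print(z.shape, z[:3])
```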

34 Example: Compressive Clustering N = 1; n = 2 M = sampled characteristic function, sketch z ∈ R^m, m = 6 16

35 Example: Compressive Clustering Goal: find k centroids Density model = GMM with variance = identity: p ≈ Σ_{k=1}^K α_k p_k (ground truth) N = 1; n = 2 M = sampled characteristic function, sketch z ∈ R^m, m = 6 z = Mp ≈ Σ_{k=1}^K α_k M p_k 16

36 Example: Compressive Clustering Goal: find k centroids Density model = GMM with variance = identity: p ≈ Σ_{k=1}^K α_k p_k (ground truth) Recovery algorithm = decoder → estimated centroids N = 1; n = 2 M = sampled characteristic function, sketch z ∈ R^m, m = 6 z = Mp ≈ Σ_{k=1}^K α_k M p_k 16
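
The linear model z = Mp ≈ Σ_k α_k M p_k can be checked numerically, because for a Gaussian component with identity covariance the sketch has the closed form (M p_k)_ℓ = exp(j ω_ℓ^T μ_k - ‖ω_ℓ‖²/2), i.e. the characteristic function of N(μ_k, I). The snippet below compares the empirical sketch of GMM samples with this analytic sketch; the dimensions and frequency scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, N, m = 2, 3, 50_000, 60

mu = rng.uniform(-5, 5, size=(k, n))          # centroids
alpha = np.full(k, 1.0 / k)                    # mixture weights
W = 0.5 * rng.standard_normal((m, n))          # sampling frequencies (arbitrary scale)

# Empirical sketch of data drawn from the GMM with identity covariance
labels = rng.choice(k, size=N, p=alpha)
X = mu[labels] + rng.standard_normal((N, n))
z_emp = np.exp(1j * X @ W.T).mean(axis=0)

# Analytic sketch: the characteristic function of N(mu_k, I) at frequency w is
# exp(j w^T mu_k - ||w||^2 / 2), so z = sum_k alpha_k * (M p_k)
phases = np.exp(1j * mu @ W.T)                              # (k, m)
gauss = np.exp(-0.5 * np.sum(W ** 2, axis=1))               # (m,)
z_model = (alpha[:, None] * phases).sum(axis=0) * gauss

print(np.max(np.abs(z_emp - z_model)))   # small for large N
```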

37 Example: Compressive Clustering Goal: find k centroids Density model = GMM with variance = identity: p ≈ Σ_{k=1}^K α_k p_k (ground truth) Recovery algorithm = decoder: Compressive Learning OMP, similar to OMP with Replacement, Subspace Pursuit & CoSaMP → estimated centroids N = 1; n = 2 M = sampled characteristic function, sketch z ∈ R^m, m = 6 z = Mp ≈ Σ_{k=1}^K α_k M p_k 16
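
For intuition only, here is a toy OMP-flavoured decoder that greedily adds the candidate centroid whose normalised atom M p_θ is most correlated with the residual sketch, then re-fits the weights by least squares. It is deliberately not the Compressive Learning OMP / CLOMPR of the references, which search atoms continuously with gradient steps and include replacement and global refinement; the finite candidate set and all sizes are assumptions made for this sketch.

```python
import numpy as np

def atom(theta, W):
    """Normalised sketch of a unit-variance Gaussian centred at theta (atom M p_theta)."""
    a = np.exp(1j * W @ theta) * np.exp(-0.5 * np.sum(W ** 2, axis=1))
    return a / np.linalg.norm(a)

def toy_greedy_decoder(z, W, candidates, k):
    """OMP-flavoured decoder over a finite candidate set of centroids.

    Not the CLOMP/CLOMPR algorithms of the talk; this only illustrates the greedy
    'pick the atom most correlated with the residual' structure."""
    support, residual, coef = [], z.copy(), None
    for _ in range(k):
        scores = [np.abs(np.vdot(atom(c, W), residual)) for c in candidates]
        support.append(candidates[int(np.argmax(scores))])
        A = np.stack([atom(c, W) for c in support], axis=1)   # m x |support|
        coef, *_ = np.linalg.lstsq(A, z, rcond=None)          # least-squares weights
        residual = z - A @ coef
    return np.array(support), coef

# Tiny usage example on synthetic 2-D clusters
rng = np.random.default_rng(2)
k, n, N, m = 3, 2, 20_000, 50
mu = rng.uniform(-5, 5, size=(k, n))
X = mu[rng.integers(k, size=N)] + 0.3 * rng.standard_normal((N, n))
W = 0.7 * rng.standard_normal((m, n))
z = np.exp(1j * X @ W.T).mean(axis=0)

# Candidates = a subsample of the data points themselves (a simplification)
centroids, weights = toy_greedy_decoder(z, W, candidates=list(X[:500]), k=k)
print(np.round(centroids, 2))   # selected candidates, expected near the true centroids
```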

38 Example: Compressive GMM Goal: find k centroids Density model = GMM with variance = diagonal: p ≈ Σ_{k=1}^K α_k p_k (ground truth) Recovery algorithm = decoder: Compressive Hierarchical Splitting (CHS) = extension to general GMM → estimated centroids N = 1; n = 2 M = sampled characteristic function, sketch z ∈ R^m, m = 6 z = Mp ≈ Σ_{k=1}^K α_k M p_k 17

39 Application: Speaker Verification Claimed speaker identity Alice Bob speech utterance 18

40 Application: Speaker Verification Alice ~ 5 Gbytes ~ 1 hours of speech NIST SRE 2005 database EM or CHS UBM = Universal Background Model speech utterance 19

41 Application: Speaker Verification Alice EM or CHS UBM = Universal Background Model Speaker specific model speech utterance 2

42 Application: Speaker Verification Alice EM or CHS speech utterance 21

43 Application: Speaker Verification Results (DET-curves) ~ 5 Gbytes ~ 1 hours of speech MFCC coefficients x_i ∈ R^12 N = 3 22

44 Application: Speaker Verification Results (DET-curves) ~ 5 Gbytes ~ 1 hours of speech MFCC coefficients x_i ∈ R^12 N = 3 After silence detection N = 6 22

45 Application: Speaker Verification Results (DET-curves) ~ 5 Gbytes ~ 1 hours of speech MFCC coefficients x_i ∈ R^12 N = 3 After silence detection N = 6 Maximum size manageable by EM N = 3 22

46 Application: Speaker Verification Results (DET-curves) ~ 5 Gbytes ~ 1 hours of speech CHS MFCC coefficients x_i ∈ R^12 N = 3 After silence detection N = 6 Maximum size manageable by EM N = 3 for EM for CHS 23

47 Application: Speaker Verification Results (DET-curves) ~ 5 Gbytes ~ 1 hours of speech CHS MFCC coefficients x_i ∈ R^12 N = 3 After silence detection N = 6 for CHS Maximum size manageable by EM N = 3 for EM 24

48 Application: Speaker Verification Results (DET-curves) ~ 5 Gbytes ~ 1 hours of speech CHS m= fold compression 25

49 Application: Speaker Verification Results (DET-curves) ~ 5 Gbytes ~ 1 hours of speech CHS m= fold compression m= fold compression 25

50 Application: Speaker Verification Results (DET-curves) ~ 5 Gbytes ~ 1 hours of speech CHS m= fold compression m= fold compression m= fold compression exploit whole collection improved performance 25

51 Computational Efficiency

52 Computational Aspects Sketching = empirical characteristic function: z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i} 27

54 Computational Aspects Sketching = empirical characteristic function: z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i} Compute h(W x_i) entrywise, with h(·) = e^{j(·)} and W the m × n matrix of frequencies 27

55 Computational Aspects Sketching = empirical characteristic function: z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i} Compute h(W x_i) entrywise, with h(·) = e^{j(·)}, then average over i to obtain z 27

56 Computational Aspects Sketching = empirical characteristic function: z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i} = average of h(W x_i), with h(·) = e^{j(·)} ~ One-layer random neural net; DNN ~ hierarchical sketching? see also [Bruna & al 2013, Giryes & al 2015] 27

57 Computational Aspects Sketching = empirical characteristic function: z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i} = average of h(W x_i), with h(·) = e^{j(·)} ~ One-layer random neural net; DNN ~ hierarchical sketching? see also [Bruna & al 2013, Giryes & al 2015] Privacy-preserving: sketch and forget 28

58 Computational Aspects Sketching = empirical characteristic function: z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i}, computed as the average of h(W x_i) Streaming algorithms: one pass; online update 29
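
The streaming property follows directly from the fact that the sketch is an average: it can be updated one sample at a time in O(m) memory without ever storing the data. A minimal sketch of such an online update (the class name and interface are hypothetical, not from the talk):

```python
import numpy as np

class StreamingSketch:
    """Running empirical characteristic-function sketch, updated one sample at a
    time; the data itself is never stored by the sketch."""

    def __init__(self, W):
        self.W = W                                    # (m, n) frequencies
        self.z = np.zeros(W.shape[0], dtype=complex)  # running sum
        self.count = 0

    def update(self, x):
        self.z += np.exp(1j * self.W @ x)
        self.count += 1

    @property
    def sketch(self):
        return self.z / max(self.count, 1)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 2))
X = rng.standard_normal((10_000, 2))

s = StreamingSketch(W)
for x in X:                       # one pass over the stream, O(m) memory
    s.update(x)

# Same result as the batch computation
z_batch = np.exp(1j * X @ W.T).mean(axis=0)
assert np.allclose(s.sketch, z_batch)
```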

60 Computational Aspects Sketching = empirical characteristic function: z_ℓ = (1/N) Σ_{i=1}^N e^{j ω_ℓ^T x_i}, computed as the average of h(W x_i) Distributed computing: decentralized (HADOOP) / parallel (GPU) 30
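
Distributed computation works the same way: each machine sketches its own shard and only (sum, count) pairs need to be exchanged and merged. Below is a minimal single-process sketch of this map/reduce pattern, not an actual HADOOP or GPU implementation; function names and sizes are assumptions.

```python
import numpy as np

def shard_sketch(X_shard, W):
    """Sketch of one data shard, returned as (sum, count) so shards merge exactly."""
    return np.exp(1j * X_shard @ W.T).sum(axis=0), len(X_shard)

def merge(partials):
    """Combine per-shard (sum, count) pairs into the global sketch."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 2))
X = rng.standard_normal((12_000, 2))

# Pretend the collection lives on 4 machines; each only sees its own shard
partials = [shard_sketch(shard, W) for shard in np.array_split(X, 4)]
z = merge(partials)

assert np.allclose(z, np.exp(1j * X @ W.T).mean(axis=0))
```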

61 Summary: Compressive GMM Dimension reduction Efficiency [Figures: computation time (s) and memory (bytes) vs. collection size N for several values of K; compressive methods (sketching + CLOMP / CLOMPR / BS CGMM / CHS) use only the sketch and frequencies, EM uses the full data] Information preservation?

62 Information preserving projections (for compressive learning)

63 Stable recovery Signal space R^n Model set Σ = signals of interest Ex: set of k-sparse vectors Σ_k = {x ∈ R^n : ‖x‖_0 ≤ k} x → Linear projection M → y Observation space R^m, m ≪ n 33

64 Stable recovery Signal space R^n Model set Σ = signals of interest Ex: set of k-sparse vectors Σ_k = {x ∈ R^n : ‖x‖_0 ≤ k} Ideal goal: build a decoder Δ (recovery algorithm) with the guarantee that ‖x - Δ(Mx + e)‖ ≤ C‖e‖, ∀x ∈ Σ (instance optimality [Cohen & al 2009]) x → Linear projection M → y Observation space R^m, m ≪ n 33

65 Stable recovery Signal space R^n Model set Σ = signals of interest Ex: set of k-sparse vectors Σ_k = {x ∈ R^n : ‖x‖_0 ≤ k} Ideal goal: build a decoder Δ (recovery algorithm) with the guarantee that ‖x - Δ(Mx + e)‖ ≤ C‖e‖, ∀x ∈ Σ (instance optimality [Cohen & al 2009]) x → Linear projection M → y Observation space R^m, m ≪ n Are there such decoders? 33

66 Stable recovery of k-sparse vectors Typical decoders: L1 minimization, Δ(y) := arg min_{x: Mx=y} ‖x‖_1 (LASSO [Tibshirani 1994], Basis Pursuit [Chen & al 1999]); Greedy algorithms: (Orthogonal) Matching Pursuit [Mallat & Zhang 1993], Iterative Hard Thresholding (IHT) [Blumensath & Davies 2009] Guarantees: assume the Restricted Isometry Property (RIP) [Candès & al 2004]: 1 - δ ≤ ‖Mz‖²₂ / ‖z‖²₂ ≤ 1 + δ whenever ‖z‖_0 ≤ 2k Then: exact recovery, stability to noise, robustness to model error 34
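
As a concrete instance of the L1-minimization decoder Δ(y) = arg min_{Mx=y} ‖x‖_1, here is a minimal Basis Pursuit solver using the standard linear-programming reformulation x = u - v with u, v ≥ 0 (it relies on scipy's linprog). It is a didactic sketch with illustrative sizes, not the solvers used in the cited works.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(M, y):
    """Decoder Delta(y) = argmin ||x||_1 s.t. Mx = y, via the LP reformulation
    x = u - v with u, v >= 0 (a sketch, not a production solver)."""
    m, n = M.shape
    c = np.ones(2 * n)                       # minimize sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([M, -M])                # M(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(0)
n, m, k = 64, 32, 4
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
M = rng.standard_normal((m, n)) / np.sqrt(m)   # random matrix: RIP w.h.p. for small k
y = M @ x_true

x_hat = basis_pursuit(M, y)
print(np.linalg.norm(x_hat - x_true))          # close to 0 when recovery succeeds
```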

67 Stable recovery Signal space R^n Model set Σ = signals of interest Low-dimensional model: sparse x → Linear projection M → y Observation space R^m, m ≪ n 35

68 Stable recovery Signal space R^n Model set Σ = signals of interest Low-dimensional model: sparse; sparse in a dictionary D x → Linear projection M → y Observation space R^m, m ≪ n 36

69 Stable recovery Signal space R^n Model set Σ = signals of interest Low-dimensional model: sparse; sparse in a dictionary D; co-sparse in an analysis operator A (total variation, physics-driven sparse models) x → Linear projection M → y Observation space R^m, m ≪ n 37

70 Stable recovery Signal space R^n Model set Σ = signals of interest Low-dimensional model: sparse; sparse in a dictionary D; co-sparse in an analysis operator A (total variation, physics-driven sparse models); low-rank matrix or tensor (matrix completion, phase retrieval, blind sensor calibration) x → Linear projection M → y Observation space R^m, m ≪ n 38

71 Stable recovery Signal space R^n Model set Σ = signals of interest Low-dimensional model: sparse; sparse in a dictionary D; co-sparse in an analysis operator A (total variation, physics-driven sparse models); low-rank matrix or tensor (matrix completion, phase retrieval, blind sensor calibration); manifold / union of manifolds (detection, estimation, localization, mapping); matrix with sparse inverse (Gaussian graphical models); given point cloud (database indexing) x → Linear projection M → y Observation space R^m, m ≪ n 39

72 Stable recovery Vector space H Signal space R^n Model set Σ = signals of interest Low-dimensional model: sparse; sparse in a dictionary D; co-sparse in an analysis operator A (total variation, physics-driven sparse models); low-rank matrix or tensor (matrix completion, phase retrieval, blind sensor calibration); manifold / union of manifolds (detection, estimation, localization, mapping); matrix with sparse inverse (Gaussian graphical models); given point cloud (database indexing); Gaussian Mixture Model x → Linear projection M → y Observation space R^m, m ≪ n 40

73 General stable recovery Vector space H Signal space R^n Model set Σ = signals of interest Low-dimensional model: arbitrary set Σ ⊂ H Ideal goal: build a decoder Δ (recovery algorithm) with the guarantee that ‖x - Δ(Mx + e)‖ ≤ C‖e‖, ∀x ∈ Σ (instance optimality [Cohen & al 2009]) x → Linear projection M → y Observation space R^m, m ≪ n Are there such decoders? Literature on embeddings, cf monograph Robinson 2011; Hurewicz & Wallman 1941, Falconer 1985, Hunt & Kaloshin 1999 41

74 Stable recovery from arbitrary model sets Theorem 1: the RIP is necessary. Definition: (general) Restricted Isometry Property (RIP) on the secant set Σ - Σ := {x - x', x, x' ∈ Σ}: α ≤ ‖Mz‖ / ‖z‖ ≤ β when z ∈ Σ - Σ; up to renormalization, α = √(1-δ), β = √(1+δ). M satisfies the RIP as soon as there exists an instance optimal decoder. Theorem 2: the RIP is sufficient. The RIP implies the existence of a decoder Δ with performance guarantees: ‖x - Δ(Mx + e)‖ ≤ C(δ)‖e‖ + C'(δ) d(x, Σ), ∀x, where d(x, Σ) is the distance to the model set ([Cohen & al 2009] for Σ_k, [Bourrier & al 2014] for an arbitrary model set). Exact recovery, stable to noise, bonus: robust to model error. 42
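
The general RIP above can at least be probed numerically: the snippet below samples random elements of the secant set Σ_k - Σ_k (i.e. 2k-sparse differences) and records the observed range of ‖Mz‖/‖z‖. This only gives an optimistic Monte Carlo estimate, since the true RIP constants are worst-case quantities (and NP-hard to compute in general); all sizes are illustrative.

```python
import numpy as np

def secant_rip_range(M, k, trials=5_000, rng=None):
    """Monte Carlo probe of ||Mz|| / ||z|| over random 2k-sparse secant directions.
    A random probe only lower/upper bounds the worst-case RIP constants optimistically."""
    if rng is None:
        rng = np.random.default_rng(0)
    m, n = M.shape
    lo, hi = np.inf, 0.0
    for _ in range(trials):
        z = np.zeros(n)
        supp = rng.choice(n, size=2 * k, replace=False)
        z[supp] = rng.standard_normal(2 * k)
        r = np.linalg.norm(M @ z) / np.linalg.norm(z)
        lo, hi = min(lo, r), max(hi, r)
    return lo, hi

rng = np.random.default_rng(0)
n, m, k = 256, 80, 5
M = rng.standard_normal((m, n)) / np.sqrt(m)
alpha, beta = secant_rip_range(M, k)
print(alpha, beta)   # both close to 1 when m is large enough relative to k log n
```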

77 Information recovery: decoders

78 Ideal decoder Instance optimality guarantee under the RIP: ‖x - Δ(Mx + e)‖ ≤ C(δ)‖e‖ + C'(δ) d(x, Σ) «Ideal & Universal» instance optimal decoder: Δ(y) := arg min_{x ∈ H} C(δ)‖Mx - y‖ + C'(δ) d(x, Σ) Adaptations when the minimum may not be achieved (e.g., infinite dimension) + Noise-blind: no knowledge of the noise level needed - Not very practical 44

79 Greedy decoders Projected Landweber ~ Iterative Hard Thresholding Gradient descent: x^{t+1/2} = x^t - μ M^T(M x^t - y) Projection: x^{t+1} = prox_Σ(x^{t+1/2}) Definition: projection onto the model (~ denoising), prox_Σ(y) = arg min_{x ∈ Σ} ‖y - x‖_2 NB: can be NP-hard! Theorem [Blumensath 2011]: PLA is instance optimal assuming the RIP on Σ - Σ with constants β² ≤ 1/μ < 1.5 α² 45
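
A minimal numpy sketch of the Projected Landweber / IHT scheme above for the k-sparse model, where the projection prox_Σ is simply hard thresholding (keep the k largest entries in magnitude). The step-size rule and problem sizes are illustrative assumptions.

```python
import numpy as np

def hard_threshold(x, k):
    """prox onto the k-sparse model: keep the k largest entries in magnitude."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def iht(y, M, k, mu=None, n_iter=300):
    """Projected Landweber / IHT: gradient step on ||Mx - y||^2, then projection
    onto the k-sparse model set (a minimal sketch of the scheme on the slide)."""
    m, n = M.shape
    if mu is None:
        mu = 1.0 / np.linalg.norm(M, 2) ** 2     # conservative step size <= 1/||M||^2
    x = np.zeros(n)
    for _ in range(n_iter):
        x = hard_threshold(x - mu * M.T @ (M @ x - y), k)
    return x

rng = np.random.default_rng(0)
n, m, k = 128, 64, 4
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
M = rng.standard_normal((m, n)) / np.sqrt(m)
y = M @ x_true

print(np.linalg.norm(iht(y, M, k) - x_true))     # typically small when the RIP holds
```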

80 Decoding by (convex?) optimization RIP guarantees for generalized Basis Pursuit? General optimization framework: Δ(y) := arg min_{x ∈ H} f(x) s.t. ‖Mx - y‖ ≤ ε Examples: sparsity, f(x) = ‖x‖_1, and low-rank, f(x) = ‖x‖_*: sharp RIP bound δ < 1/√2 [Cai & Zhang 2014]; group sparsity, f(x) = ‖x‖_{1,2} [Ayaz, Dirksen, Rauhut 2014]; sparsity in J levels, f(x) = ‖x‖_1, with a bound depending on J and K_max/K_min [Bastounis & Hansen 2015]

81 Decoding by (convex?) optimization RIP guarantees for generalized Basis Pursuit? General optimization framework: Δ(y) := arg min_{x ∈ H} f(x) s.t. ‖Mx - y‖ ≤ ε One RIP to rule them all: given a cone (or Union of Subspaces) model set Σ and a regularizer f(·), define the RIP constant δ_Σ(f) = inf_{z ∈ T_f(Σ)} δ_Σ(z), where δ_Σ(z) = sup_{x ∈ Σ} δ_Σ(x, z), with closed-form expressions of δ_Σ^UoS(x, z) and δ_Σ^cone(x, z) in terms of Re⟨x, z⟩, ‖x‖_H and ‖x + z‖_Σ. Theorem: M satisfying the RIP with constant < δ_Σ(f) on vectors from Σ - Σ guarantees robust and stable uniform recovery. [Traonmilin & G. 2015] 47

82 Ideal decoder (ter): (non)convex optimization RIP guarantees for generalized Basis Pursuit? General optimization framework: Δ(y) := arg min_{x ∈ H} f(x) s.t. ‖Mx - y‖ ≤ ε One RIP to rule them all: given a cone (or Union of Subspaces) model set Σ and regularizer f(·), bounds on the RIP constants δ_Σ(f) [Traonmilin & G. 2015]: sparsity, f(x) = ‖x‖_1: 1/√2; low-rank, f(x) = ‖x‖_*: 1/√2; group sparsity, f(x) = ‖x‖_{1,2}: 1/√2; group sparsity in J levels, f(x) = Σ_{j=1}^J ‖x_j‖_{1,2} / √K_j: a bound involving √J and K_max/K_min; permutation matrices, Birkhoff polytope norm: 2/3 48

83 From RIP analysis to regularizer design? Model set: partition the indices into J blocks I_j, with (group) sparsity K_j per block: x = Σ_j x_j, supp(x_j) ⊂ I_j, ‖x_j‖_0 ≤ K_j Compared regularizers: ‖x‖_1 = Σ_j ‖x_j‖_1 and the weighted norm f_w(x) = Σ_{j=1}^J ‖x_j‖_1 / √K_j, with κ := √(K_max/K_min) [Figure: admissible RIP constant δ vs. κ, comparing our result with Bastounis et al. and Ayaz et al., for both ‖·‖_1 and f_w] 49

84 Conclusion

85 Projections & Learning Signal Processing, compressive sensing: Signal space x → (linear projection M) → Observation space y; reduce dimension of data items; compressive sensing = random projections of data items Machine Learning, compressive learning: Probability space p → (linear projection M) → Sketch space z; reduce size of collection; compressive learning with sketches = random projections of collections, nonlinear in the feature vectors, linear in their probability distribution 51

86 Summary Challenge: compress before learning? Compressive clustering & Compressive GMM: Bourrier, G., Perez, Compressive Gaussian Mixture Estimation, ICASSP 2013; Keriven & G., Compressive Gaussian Mixture Estimation by Orthogonal Matching Pursuit with Replacement, SPARS 2015, Cambridge, United Kingdom; Keriven & al, Sketching for Large-Scale Learning of Mixture Models, ICASSP 2016, upcoming long version with proofs Information preservation? Unified framework covering projections & sketches; notion of Instance Optimal Decoders = uniform guarantees; fundamental role of the general Restricted Isometry Property: Bourrier & al, Fundamental performance limits for ideal decoders in high-dimensional linear inverse problems, IEEE Transactions on Information Theory, 2014

87 Recent / ongoing work / challenges Decoders? RIP-based guarantees for general (convex & nonconvex) regularizers: Traonmilin & G., Stable recovery of low-dimensional cones in Hilbert spaces - One RIP to rule them all, arXiv preprint Sharpness? Guide the design of the regularizer (cf atomic norms): maximize δ_Σ(f) s.t. f ∈ F Dimension reduction? Goal: given a model set Σ ⊂ H, exhibit M : H → R^m satisfying the RIP Sufficient target dimension: m ≳ covering dimension of the normalized secant set, using random sub-Gaussian linear forms [Dirksen 2014]: Puy, Davies & G., Recipes for stable linear embeddings from Hilbert spaces to R^m, HAL preprint, see also [Puy, Davies & G., EUSIPCO 2015]. This dimension is not necessary: better measure of the dimension of Σ? 53

88 Recent / ongoing work / challenges Compressive Spectral Clustering: random projections + graph signal processing = faster spectral clustering, from O(k² N) down to O(k² log² k + N(log N + k)): Tremblay & al, Accelerated Spectral Clustering using Graph Filtering of Random Signals, ICASSP 2016; proofs: Tremblay, Puy, G. & Vandergheynst, Compressive Spectral Clustering, arXiv preprint. Example with the Amazon graph (10^6 edges): 5 times speedup (3 hours instead of 15 hours for k = 5 classes) General Compressive Learning? RIP for sketches in an RKHS applied to compressive GMM (upcoming, Keriven, Bourrier, Perez & G.); compressive statistical learning: intrinsic dimension of PCA, k-means, Dictionary Learning, and related learning tasks (work in progress, Blanchard & G.) 54

89 THANKS
