Sparse Estimation and Dictionary Learning


1 Sparse Estimation and Dictionary Learning (for Biostatistics?)
Julien Mairal, Biostatistics Seminar, UC Berkeley

2 What is this talk about?
Why sparsity, what for, and how?
Feature learning / clustering / sparse PCA;
machine learning: selecting relevant features;
signal and image processing: restoration, reconstruction;
biostatistics: you tell me.

3 Part I: Sparse Estimation

4 Sparse Linear Models: Machine Learning Point of View
Let $(y_i, x_i)_{i=1}^n$ be a training set, where the vectors $x_i$ are in $\mathbb{R}^p$ and are called features. The scalars $y_i$ are in $\{-1,+1\}$ for binary classification problems and in $\mathbb{R}$ for regression problems.
We assume there is a relation $y \approx w^\top x$, and solve
$$\min_{w \in \mathbb{R}^p} \underbrace{\frac{1}{n}\sum_{i=1}^n \ell(y_i, w^\top x_i)}_{\text{empirical risk}} + \underbrace{\lambda\,\psi(w)}_{\text{regularization}}.$$

5 Sparse Linear Models: Machine Learning Point of View
A few examples:
Ridge regression: $\min_{w \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \frac{1}{2}(y_i - w^\top x_i)^2 + \lambda\|w\|_2^2$.
Linear SVM: $\min_{w \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \max(0, 1 - y_i\, w^\top x_i) + \lambda\|w\|_2^2$.
Logistic regression: $\min_{w \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \log\big(1 + e^{-y_i\, w^\top x_i}\big) + \lambda\|w\|_2^2$.

6 Sparse Linear Models: Machine Learning Point of View
The same examples as above (ridge regression, linear SVM, logistic regression), all penalized with the squared l2-norm $\lambda\|w\|_2^2$.
The squared l2-norm induces smoothness in $w$. When one knows in advance that $w$ should be sparse, one should use a sparsity-inducing regularization such as the l1-norm [Chen et al., 1999, Tibshirani, 1996].
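To see the sparsity effect concretely, here is a minimal illustration (not part of the slides) on synthetic data, assuming scikit-learn is available; the l1-penalized estimator sets most coefficients exactly to zero, while the squared-l2 penalty only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 100, 30
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 relevant features
y = X @ w_true + 0.1 * rng.standard_normal(n)

ridge = Ridge(alpha=1.0).fit(X, y)      # squared l2 penalty: shrinks, rarely zeroes
lasso = Lasso(alpha=0.1).fit(X, y)      # l1 penalty: zeroes out irrelevant coefficients

print("nonzeros (ridge):", np.sum(np.abs(ridge.coef_) > 1e-8))
print("nonzeros (lasso):", np.sum(np.abs(lasso.coef_) > 1e-8))
```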

7 Sparse Linear Models: Signal Processing Point of View
Let $y$ in $\mathbb{R}^n$ be a signal. Let $X = [x^1,\dots,x^p] \in \mathbb{R}^{n\times p}$ be a set of normalized basis vectors; we call it a dictionary.
$X$ is adapted to $y$ if it can represent it with a few basis vectors, that is, if there exists a sparse vector $w$ in $\mathbb{R}^p$ such that $y \approx Xw$. We call $w$ the sparse code.
$$\underbrace{y}_{y \in \mathbb{R}^n} \;\approx\; \underbrace{\big[\,x^1 \; x^2 \; \cdots \; x^p\,\big]}_{X \in \mathbb{R}^{n\times p}} \underbrace{\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_p \end{bmatrix}}_{w \in \mathbb{R}^p,\ \text{sparse}}$$

8 The Sparse Decomposition Problem
$$\min_{w \in \mathbb{R}^p} \underbrace{\tfrac{1}{2}\|y - Xw\|_2^2}_{\text{data-fitting term}} + \underbrace{\lambda\,\psi(w)}_{\text{sparsity-inducing regularization}}$$
$\psi$ induces sparsity in $w$. It can be
the l0 pseudo-norm, $\|w\|_0 = \#\{i \ \text{s.t.}\ w_i \neq 0\}$ (NP-hard);
the l1-norm, $\|w\|_1 = \sum_{i=1}^p |w_i|$ (convex); ...
This is a selection problem. When $\psi$ is the l1-norm, the problem is called the Lasso [Tibshirani, 1996] or basis pursuit [Chen et al., 1999].

9 Why does the l1-norm induce sparsity?
Example: quadratic problem in 1D:
$$\min_{w \in \mathbb{R}} \tfrac{1}{2}(u - w)^2 + \lambda|w|.$$
Piecewise quadratic function with a kink at zero.
Derivatives at $0^+$: $g_+ = -u + \lambda$, and at $0^-$: $g_- = -u - \lambda$.
Optimality conditions: $w$ is optimal iff
$|w| > 0$ and $-(u - w) + \lambda\,\mathrm{sign}(w) = 0$, or
$w = 0$, $g_+ \ge 0$ and $g_- \le 0$.
The solution is the soft-thresholding operator: $w^\star = \mathrm{sign}(u)(|u| - \lambda)_+$.
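This operator is a one-liner; a small NumPy sketch (not from the slides), reused as the building block of the proximal and coordinate-descent methods later in the deck:

```python
import numpy as np

def soft_threshold(u, lam):
    """Soft-thresholding: elementwise solution of min_w 0.5*(u - w)^2 + lam*|w|."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

# Example: values with magnitude below lam are set exactly to zero
print(soft_threshold(np.array([-3.0, -0.5, 0.2, 2.0]), lam=1.0))   # [-2. -0.  0.  1.]
```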

10 Why does the l1-norm induce sparsity?
(a) Soft-thresholding operator: $w^\star = \mathrm{sign}(u)(|u| - \lambda)_+$, solution of $\min_w \tfrac{1}{2}(u - w)^2 + \lambda|w|$.
(b) Hard-thresholding operator: $w^\star = u\,\mathbf{1}_{|u| \ge \sqrt{2\lambda}}$, solution of $\min_w \tfrac{1}{2}(u - w)^2 + \lambda\mathbf{1}_{|w| > 0}$.

11 Why does the l1-norm induce sparsity?
Comparison with l2-regularization in 1D:
$\psi(w) = w^2$: $\psi'(w) = 2w$;
$\psi(w) = |w|$: $\psi'_-(w) = -1$, $\psi'_+(w) = 1$.
The gradient of the l2-penalty vanishes when $w$ gets close to 0. On its differentiable part, the gradient of the l1-norm has constant magnitude.

12 Why does the l1-norm induce sparsity? Physical illustration: a spring attached to a point at height $y_0$, at rest ($E_1 = 0$).

13 Why does the l1-norm induce sparsity? Physical illustration: the spring energy is $E_1 = \tfrac{k_1}{2}(y_0 - y)^2$. Pulling with a second spring adds $E_2 = \tfrac{k_2}{2}y^2$ (the l2 case); hanging a weight adds $E_2 = mgy$ (the l1 case).

14 Why does the l1-norm induce sparsity? Physical illustration: with the second spring the equilibrium displacement is nonzero, whereas with the weight the equilibrium reaches exactly $y = 0$.

15 Regularizing with the l1-norm (figure: the l1-ball in 2D, coordinates $w_1$, $w_2$). The projection onto a convex set is biased towards singularities.

16 Regularizing with the l2-norm (figure: the l2-ball in 2D). The l2-norm is isotropic.

17 Regularizing with the l∞-norm (figure: the l∞-ball in 2D). The l∞-norm encourages $|w_1| = |w_2|$.

18 In 3D (figure, copyright G. Obozinski).

19 What about more complicated norms? (figure, copyright G. Obozinski)

20 What about more complicated norms? (figure, copyright G. Obozinski)

21 Examples of sparsity-inducing penalties
Exploiting concave functions with a kink at zero: $\psi(w) = \sum_{i=1}^p \phi(|w_i|)$.
lq pseudo-norm, with $0 < q < 1$: $\psi(w) = \sum_{i=1}^p |w_i|^q$;
log penalty: $\psi(w) = \sum_{i=1}^p \log(|w_i| + \varepsilon)$;
$\phi$ is any concave function of $|w|$ with a kink at zero (figure: $\phi(|w|)$ versus $w$).

22 Examples of sparsity-inducing penalties
Figure: open balls in 2D corresponding to several lq-norms and pseudo-norms: (c) the l0.5-ball, (d) the l1-ball, (e) the l2-ball.

23 Examples of sparsity-inducing penalties
The l1-l2 norm (group Lasso): $\sum_{g \in \mathcal{G}} \|w_g\|_2 = \sum_{g \in \mathcal{G}} \big(\sum_{j \in g} w_j^2\big)^{1/2}$, with $\mathcal{G}$ a partition of $\{1,\dots,p\}$; it selects groups of variables [Yuan and Lin, 2006].
The fused Lasso [Tibshirani et al., 2005] or total variation [Rudin et al., 1992]: $\psi(w) = \sum_{j=1}^{p-1} |w_{j+1} - w_j|$.
Extensions (out of the scope of this talk): hierarchical norms [Zhao et al., 2009]; structured sparsity [Jenatton et al., 2009, Jacob et al., 2009, Huang et al., 2009, Baraniuk et al., 2010].

24 Part II: Dictionary Learning and Matrix Factorization

25 Matrix Factorization and Clustering
Let us cluster some training vectors $y^1,\dots,y^m$ into $p$ clusters using K-means:
$$\min_{(x^j)_{j=1}^p,\ (l_i)_{i=1}^m} \sum_{i=1}^m \|y^i - x^{l_i}\|_2^2.$$
It can be equivalently formulated as
$$\min_{X \in \mathbb{R}^{n\times p},\ W \in \{0,1\}^{p\times m}} \sum_{i=1}^m \|y^i - Xw^i\|_2^2 \quad \text{s.t.}\quad w^i \ge 0 \ \text{ and } \ \sum_{j=1}^p w^i_j = 1,$$
which is a matrix factorization problem:
$$\min_{X \in \mathbb{R}^{n\times p},\ W \in \{0,1\}^{p\times m}} \|Y - XW\|_F^2 \quad \text{s.t.}\quad W \ge 0 \ \text{ and } \ \sum_{j=1}^p w^i_j = 1.$$

26 Matrix Factorization and Clustering
Hard clustering:
$$\min_{X \in \mathbb{R}^{n\times p},\ W \in \{0,1\}^{p\times m}} \|Y - XW\|_F^2 \quad \text{s.t.}\quad W \ge 0 \ \text{ and } \ \sum_{j=1}^p w^i_j = 1;$$
$X = [x^1,\dots,x^p]$ are the centroids of the $p$ clusters.
Soft clustering:
$$\min_{X \in \mathbb{R}^{n\times p},\ W \in \mathbb{R}^{p\times m}} \|Y - XW\|_F^2 \quad \text{s.t.}\quad W \ge 0 \ \text{ and } \ \sum_{j=1}^p w^i_j = 1;$$
$X = [x^1,\dots,x^p]$ are the centroids of the $p$ clusters.
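To make the alternating structure explicit, here is a small NumPy sketch (not from the slides; the helper name is hypothetical) of hard-clustering K-means written in the matrix-factorization form above: with X fixed, each column of W picks the nearest centroid; with W fixed, each centroid is the mean of its assigned vectors.

```python
import numpy as np

def kmeans_matrix_factorization(Y, p, n_iter=50, seed=0):
    """Hard clustering: min ||Y - X W||_F^2 with W in {0,1}, columns summing to 1."""
    n, m = Y.shape
    rng = np.random.default_rng(seed)
    X = Y[:, rng.choice(m, size=p, replace=False)].astype(float)  # init centroids from data
    for _ in range(n_iter):
        # W-step: assign each y^i to its closest centroid
        dists = ((Y[:, None, :] - X[:, :, None]) ** 2).sum(axis=0)  # p x m
        labels = dists.argmin(axis=0)
        # X-step: recompute each centroid as the mean of its cluster
        for j in range(p):
            members = Y[:, labels == j]
            if members.shape[1] > 0:
                X[:, j] = members.mean(axis=1)
    W = np.zeros((p, m))
    W[labels, np.arange(m)] = 1.0
    return X, W
```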

27 Other Matrix Factorization Problems
PCA:
$$\min_{X \in \mathbb{R}^{m\times p},\ W \in \mathbb{R}^{p\times n}} \tfrac{1}{2}\|Y - XW\|_F^2 \quad \text{s.t.}\quad X^\top X = I \ \text{ and } \ WW^\top \text{ is diagonal}.$$
$X = [x^1,\dots,x^p]$ are the principal components.

28 Other Matrix Factorization Problems
Non-negative matrix factorization [Lee and Seung, 2001]:
$$\min_{X \in \mathbb{R}^{m\times p},\ W \in \mathbb{R}^{p\times n}} \tfrac{1}{2}\|Y - XW\|_F^2 \quad \text{s.t.}\quad W \ge 0 \ \text{ and } \ X \ge 0.$$
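One standard way to attack this problem is Lee and Seung's multiplicative updates; the slides do not prescribe an algorithm, so the NumPy sketch below is only an illustration under that choice:

```python
import numpy as np

def nmf_multiplicative(Y, p, n_iter=200, eps=1e-10, seed=0):
    """Minimize ||Y - X W||_F^2 s.t. X >= 0, W >= 0, via multiplicative updates."""
    m, n = Y.shape
    rng = np.random.default_rng(seed)
    X = rng.random((m, p)) + eps
    W = rng.random((p, n)) + eps
    for _ in range(n_iter):
        W *= (X.T @ Y) / (X.T @ X @ W + eps)   # update the coefficients
        X *= (Y @ W.T) / (X @ W @ W.T + eps)   # update the non-negative factors
    return X, W
```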

29 Dictionary Learning and Matrix Factorization
[Olshausen and Field, 1997]
$$\min_{X \in \mathcal{X},\ W \in \mathbb{R}^{p\times n}} \sum_{i=1}^n \tfrac{1}{2}\|y^i - Xw^i\|_2^2 + \lambda\|w^i\|_1,$$
where $\mathcal{X}$ is a constraint set for the dictionary (typically columns of unit l2-norm). This is again a matrix factorization problem:
$$\min_{X \in \mathcal{X},\ W \in \mathbb{R}^{p\times n}} \tfrac{1}{2}\|Y - XW\|_F^2 + \lambda\|W\|_1.$$
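A minimal sketch of the usual alternate minimization for this objective, assuming plain NumPy, ISTA for the sparse-coding step, and a least-squares (MOD-style) dictionary update with column renormalization; this is a generic batch scheme, not the online algorithm of [Mairal et al., 2010]:

```python
import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def dictionary_learning(Y, p, lam=0.1, n_outer=20, n_ista=50, seed=0):
    """min_{X,W} 0.5*||Y - XW||_F^2 + lam*||W||_1, columns of X kept at unit l2-norm."""
    m, n = Y.shape
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, p))
    X /= np.linalg.norm(X, axis=0, keepdims=True)
    W = np.zeros((p, n))
    for _ in range(n_outer):
        # Sparse-coding step: ISTA on all columns of W at once
        L = np.linalg.norm(X, 2) ** 2               # Lipschitz constant of the gradient
        for _ in range(n_ista):
            W = soft_threshold(W - X.T @ (X @ W - Y) / L, lam / L)
        # Dictionary update step: least squares, then renormalize the columns
        X = Y @ W.T @ np.linalg.pinv(W @ W.T)
        X /= np.maximum(np.linalg.norm(X, axis=0, keepdims=True), 1e-10)
    return X, W
```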

30 Why have a unified point of view?
$$\min_{X \in \mathcal{X},\ W \in \mathcal{W}} \tfrac{1}{2}\|Y - XW\|_F^2 + \lambda\,\psi(W).$$
Same framework for NMF, sparse PCA, dictionary learning, clustering, topic modelling;
one can play with various constraints/penalties on $W$ (coefficients) and on $X$ (loadings, dictionary, centroids);
same algorithms (no need to reinvent the wheel): alternate minimization, online learning [Mairal et al., 2010].

31 Advertisement: the SPAMS toolbox (open-source)
C++ interfaced with Matlab, R, Python.
Proximal gradient methods for l0, l1, elastic-net, fused-Lasso, group-Lasso, tree group-Lasso, tree-l0, sparse group-Lasso, overlapping group-Lasso... for square, logistic, and multi-class logistic loss functions.
Handles sparse matrices, provides duality gaps.
Fast implementations of OMP and LARS (homotopy).
Dictionary learning and matrix factorization (NMF, sparse PCA).
Coordinate descent and block coordinate descent algorithms.
Fast projections onto some convex sets.
Try it!

32 Part III: A Few Image Processing Stories

33 The Image Denoising Problem
$$\underbrace{y}_{\text{measurements}} = \underbrace{x_{\text{orig}}}_{\text{original image}} + \underbrace{w}_{\text{noise}}$$

34 Sparse representations for image restoration
$$\underbrace{y}_{\text{measurements}} = \underbrace{x_{\text{orig}}}_{\text{original image}} + \underbrace{w}_{\text{noise}}$$
Energy minimization problem (MAP estimation):
$$E(x) = \underbrace{\tfrac{1}{2}\|y - x\|_2^2}_{\text{relation to measurements}} + \underbrace{\psi(x)}_{\text{image model } (-\log \text{ prior})}$$
Some classical priors: smoothness $\lambda\|\mathcal{L}x\|_2^2$; total variation $\lambda\|\nabla x\|_1$; MRF priors; ...

35 Sparse representations for image restoration
Designed dictionaries [Haar, 1910], [Zweig, Morlet, Grossman 70s], [Meyer, Mallat, Daubechies, Coifman, Donoho, Candès 80s-today] (see [Mallat, 1999]): wavelets, curvelets, wedgelets, bandlets, ...-lets.
Learned dictionaries of patches [Olshausen and Field, 1997], [Engan et al., 1999], [Lewicki and Sejnowski, 2000], [Aharon et al., 2006], [Roth and Black, 2005], [Lee et al., 2007]:
$$\min_{w^i,\ X \in \mathcal{C}} \sum_i \underbrace{\tfrac{1}{2}\|y^i - Xw^i\|_2^2}_{\text{reconstruction}} + \underbrace{\lambda\,\psi(w^i)}_{\text{sparsity}},$$
with $\psi(w) = \|w\|_0$ (l0 pseudo-norm) or $\psi(w) = \|w\|_1$ (l1-norm).

36 Sparse representations for image restoration
Solving the denoising problem [Elad and Aharon, 2006]:
Extract all overlapping 8x8 patches $y^i$.
Solve a matrix factorization problem:
$$\min_{w^i,\ X \in \mathcal{C}} \sum_{i=1}^n \underbrace{\tfrac{1}{2}\|y^i - Xw^i\|_2^2}_{\text{reconstruction}} + \underbrace{\lambda\,\psi(w^i)}_{\text{sparsity}},$$
with $n > 100{,}000$.
Average the reconstructions of the overlapping patches.
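A rough NumPy sketch of this pipeline (the helper names are hypothetical; the dictionary X is assumed already learned, and the sparse codes are computed here with ISTA rather than the OMP used in the K-SVD work):

```python
import numpy as np

def extract_patches(img, s=8):
    """All overlapping s x s patches of a 2D image, flattened into the columns of a matrix."""
    H, W = img.shape
    cols = [img[i:i + s, j:j + s].ravel()
            for i in range(H - s + 1) for j in range(W - s + 1)]
    return np.array(cols, dtype=float).T            # (s*s) x n_patches

def denoise(img, X, lam=0.05, s=8, n_ista=30):
    """Sparse-code every patch over the dictionary X, then average overlapping reconstructions."""
    Y = extract_patches(img, s)
    L = np.linalg.norm(X, 2) ** 2
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_ista):                         # ISTA on all patches at once
        G = W - X.T @ (X @ W - Y) / L               # gradient step
        W = np.sign(G) * np.maximum(np.abs(G) - lam / L, 0.0)   # soft-thresholding step
    R = X @ W                                       # reconstructed patches
    out = np.zeros(img.shape)
    counts = np.zeros(img.shape)
    k = 0
    for i in range(img.shape[0] - s + 1):
        for j in range(img.shape[1] - s + 1):
            out[i:i + s, j:j + s] += R[:, k].reshape(s, s)
            counts[i:i + s, j:j + s] += 1
            k += 1
    return out / counts
```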

37 Sparse representations for image restoration. K-SVD [Elad and Aharon, 2006]. Figure: dictionary trained on a noisy version of the image boat.

38 Sparse representations for image restoration. Grayscale vs color image patches (figure).

39 Sparse representations for image restoration. [Mairal, Sapiro, and Elad, 2008b] (figure).

40 Sparse representations for image restoration. Inpainting, [Mairal, Elad, and Sapiro, 2008a] (figure).

41 Sparse representations for image restoration. Inpainting, [Mairal, Elad, and Sapiro, 2008a] (figure).

42 Sparse representations for video restoration
Key ideas for video processing [Protter and Elad, 2009]:
using a 3D dictionary;
processing many frames at the same time;
dictionary propagation.

43-47 Sparse representations for image restoration. Inpainting, [Mairal, Sapiro, and Elad, 2008b]. Figures: inpainting results.

48-52 Sparse representations for image restoration. Color video denoising, [Mairal, Sapiro, and Elad, 2008b]. Figures: denoising results, σ = 25.

53-58 Digital Zooming [Couzinie-Devy, 2010]. Figures: original images, bicubic interpolation, and the proposed method (two examples).

59-68 Inverse half-toning. Figures: original and reconstructed images (five examples).

69 Conclusion
We have seen that many formulations are related to sparse regularized matrix factorization problems: PCA, sparse PCA, clustering, NMF, dictionary learning;
we have seen success stories in image processing, computer vision, and neuroscience;
there exists efficient software for Matlab/R/Python.

70 References I
M. Aharon, M. Elad, and A. M. Bruckstein. The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representations. IEEE Transactions on Signal Processing, 54(11), 2006.
R. G. Baraniuk, V. Cevher, M. Duarte, and C. Hegde. Model-based compressive sensing. IEEE Transactions on Information Theory, 2010.
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 2009.
E. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, volume 3, 2006.
S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33-61, 1999.

71 References II
M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 2006.
K. Engan, S. O. Aase, and J. H. Husøy. Frame based signal compression using method of optimal directions (MOD). In Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, volume 4, 1999.
A. Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69, 1910.
J. Huang, Z. Zhang, and D. Metaxas. Learning with structured sparsity. In Proceedings of the International Conference on Machine Learning (ICML), 2009.
L. Jacob, G. Obozinski, and J.-P. Vert. Group Lasso with overlap and graph Lasso. In Proceedings of the International Conference on Machine Learning (ICML), 2009.

72 References III
R. Jenatton, J.-Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Technical report, arXiv preprint, 2009.
D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems, 2001.
H. Lee, A. Battle, R. Raina, and A. Y. Ng. Efficient sparse coding algorithms. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems, volume 19. MIT Press, Cambridge, MA, 2007.
M. S. Lewicki and T. J. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2), 2000.
J. Mairal, M. Elad, and G. Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53-69, January 2008a.

73 References IV
J. Mairal, G. Sapiro, and M. Elad. Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Modeling and Simulation, 7(1), April 2008b.
J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 2010.
S. Mallat. A Wavelet Tour of Signal Processing, Second Edition. Academic Press, New York, 1999.
S. Mallat and Z. Zhang. Matching pursuit in a time-frequency dictionary. IEEE Transactions on Signal Processing, 41(12), 1993.
Y. Nesterov. A method for solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Dokl., 27, 1983.
Y. Nesterov. Gradient methods for minimizing composite objective function. Technical report, CORE, 2007.

74 References V
B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37, 1997.
M. Protter and M. Elad. Image sequence denoising via sparse and redundant representations. IEEE Transactions on Image Processing, 18(1):27-36, 2009.
S. Roth and M. J. Black. Fields of experts: A framework for learning image priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4), 1992.
R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1), 1996.

75 References VI
R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society, Series B, 67(1):91-108, 2005.
M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49-67, 2006.
P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A), 2009.

76 One short slide on compressed sensing
Important message: sparse coding is not compressed sensing.
Compressed sensing is a theory [see Candès, 2006] saying that a sparse signal can be recovered from a few linear measurements under some conditions.
Signal acquisition: $Z^\top y$, where $Z \in \mathbb{R}^{m\times s}$ is a sensing matrix with $s \ll m$.
Signal decoding: $\min_{w \in \mathbb{R}^p} \|w\|_1$ s.t. $Z^\top y = Z^\top X w$,
with extensions to approximately sparse signals and noisy measurements.
Remark: the dictionaries we are using in this lecture do not satisfy the recovery assumptions of compressed sensing.

77 Greedy Algorithms
Several equivalent non-convex and NP-hard problems:
$$\min_{w \in \mathbb{R}^p} \underbrace{\tfrac{1}{2}\|y - Xw\|_2^2}_{\text{residual } r} + \lambda\underbrace{\|w\|_0}_{\text{regularization}},$$
$$\min_{w \in \mathbb{R}^p} \|y - Xw\|_2^2 \quad \text{s.t.}\quad \|w\|_0 \le L,$$
$$\min_{w \in \mathbb{R}^p} \|w\|_0 \quad \text{s.t.}\quad \|y - Xw\|_2^2 \le \varepsilon.$$
The solution is often approximated with a greedy algorithm.
Signal processing: Matching Pursuit [Mallat and Zhang, 1993], Orthogonal Matching Pursuit [?].
Statistics: L2-boosting, forward selection.

78-87 Matching Pursuit: illustration in $\mathbb{R}^3$ with dictionary elements $x^1, x^2, x^3$. At each step the residual $r$ is correlated with the dictionary atoms, the most correlated atom is selected, and the corresponding coefficient is updated; $w$ evolves from $(0,0,0)$ to $(0,0,0.75)$, then $(0,0.24,0.75)$, then $(0,0.24,0.65)$, while the residual shrinks.

88 Matching Pursuit
$$\min_{w \in \mathbb{R}^p} \underbrace{\|y - Xw\|_2^2}_{r} \quad \text{s.t.}\quad \|w\|_0 \le L$$
1: $w \leftarrow 0$
2: $r \leftarrow y$ (residual)
3: while $\|w\|_0 < L$ do
4:   Select the predictor with maximum correlation with the residual: $\hat{\imath} \leftarrow \arg\max_{i=1,\dots,p} |x^{i\top} r|$
5:   Update the residual and the coefficients: $w_{\hat{\imath}} \leftarrow w_{\hat{\imath}} + x^{\hat{\imath}\top} r$; $\quad r \leftarrow r - (x^{\hat{\imath}\top} r)\, x^{\hat{\imath}}$
6: end while
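A direct NumPy translation (not from the slides; it runs a fixed number of L iterations instead of testing the support size, and assumes unit-norm columns):

```python
import numpy as np

def matching_pursuit(y, X, L):
    """Greedy approximation of min ||y - Xw||_2^2 s.t. ||w||_0 <= L.
    Columns of X are assumed to have unit l2-norm."""
    p = X.shape[1]
    w = np.zeros(p)
    r = y.astype(float).copy()           # residual
    for _ in range(L):
        corr = X.T @ r                    # correlations with the residual
        i = np.argmax(np.abs(corr))       # most correlated atom
        w[i] += corr[i]                   # coefficient update
        r -= corr[i] * X[:, i]            # residual update
    return w
```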

89-91 Orthogonal Matching Pursuit: illustration in $\mathbb{R}^3$. Starting from $w = (0,0,0)$ and $J = \emptyset$, the active set grows to $J = \{3\}$ with $w = (0,0,0.75)$, then $J = \{3,2\}$ with $w = (0,0.29,0.63)$; at each step $y$ is orthogonally re-projected onto the span of the selected atoms.

92 Orthogonal Matching Pursuit
$$\min_{w \in \mathbb{R}^p} \|y - Xw\|_2^2 \quad \text{s.t.}\quad \|w\|_0 \le L$$
1: $J \leftarrow \emptyset$
2: for iter = 1,...,L do
3:   Select the predictor which most reduces the objective: $\hat{\imath} \leftarrow \arg\min_{i \in J^C} \big\{ \min_{w'} \|y - X_{J\cup\{i\}} w'\|_2^2 \big\}$
4:   Update the active set: $J \leftarrow J \cup \{\hat{\imath}\}$
5:   Update the residual (orthogonal projection): $r \leftarrow (I - X_J (X_J^\top X_J)^{-1} X_J^\top)\, y$
6:   Update the coefficients: $w_J \leftarrow (X_J^\top X_J)^{-1} X_J^\top y$
7: end for
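A compact NumPy version (not from the slides), using the common correlation-based selection rule rather than the exact objective-reduction rule of line 3, and a least-squares solve in place of the explicit projection formula:

```python
import numpy as np

def omp(y, X, L):
    """Orthogonal Matching Pursuit: min ||y - Xw||_2^2 s.t. ||w||_0 <= L."""
    p = X.shape[1]
    w = np.zeros(p)
    J = []                                        # active set
    r = y.astype(float).copy()
    for _ in range(L):
        corr = np.abs(X.T @ r)
        corr[J] = -np.inf                         # never reselect an active atom
        J.append(int(np.argmax(corr)))
        XJ = X[:, J]
        wJ, *_ = np.linalg.lstsq(XJ, y, rcond=None)   # orthogonal projection onto span(XJ)
        r = y - XJ @ wJ
    w[J] = wJ
    return w
```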

93 Orthogonal Matching Pursuit
The keys to a good implementation:
if available, use the Gram matrix $G = X^\top X$;
maintain the computation of $X^\top r$ for each signal;
maintain a Cholesky decomposition of $(X_J^\top X_J)^{-1}$ for each signal.
The total complexity for decomposing $n$ $L$-sparse signals of size $m$ with a dictionary of size $p$ is
$$\underbrace{O(p^2 m)}_{\text{Gram matrix}} + \underbrace{O(nL^3)}_{\text{Cholesky}} + \underbrace{O(n(pm + pL^2))}_{X^\top r} = O(np(m + L^2)).$$
It is also possible to use the matrix inversion lemma instead of a Cholesky decomposition (same complexity, but less numerical stability).

94 Coordinate Descent for the Lasso
Coordinate descent + nonsmooth objective: WARNING, not convergent in general.
Here, the problem is equivalent to a convex smooth optimization problem with separable constraints:
$$\min_{w_+,\, w_- \ge 0} \tfrac{1}{2}\|y - X(w_+ - w_-)\|_2^2 + \lambda\,\mathbf{1}^\top w_+ + \lambda\,\mathbf{1}^\top w_-.$$
For this specific problem, coordinate descent is convergent.
Assuming the columns of $X$ have unit l2-norm, updating coordinate $i$:
$$w_i \leftarrow \arg\min_{w \in \mathbb{R}} \tfrac{1}{2}\big\|\underbrace{y - \textstyle\sum_{j\neq i} w_j x^j}_{r} - w\, x^i\big\|_2^2 + \lambda|w| = \mathrm{sign}(x^{i\top} r)\,(|x^{i\top} r| - \lambda)_+.$$
Soft-thresholding!
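A minimal NumPy implementation of these updates (not from the slides), keeping the residual up to date so that each coordinate update is a single soft-threshold:

```python
import numpy as np

def lasso_coordinate_descent(y, X, lam, n_epochs=100):
    """Coordinate descent for min_w 0.5*||y - Xw||_2^2 + lam*||w||_1.
    Columns of X are assumed to have unit l2-norm, so each 1D update is a soft-threshold."""
    n, p = X.shape
    w = np.zeros(p)
    r = y.astype(float).copy()                       # residual y - Xw, kept up to date
    for _ in range(n_epochs):
        for i in range(p):
            r += w[i] * X[:, i]                      # remove coordinate i from the residual
            u = X[:, i] @ r                          # 1D least-squares solution
            w[i] = np.sign(u) * max(abs(u) - lam, 0.0)   # soft-thresholding
            r -= w[i] * X[:, i]                      # put coordinate i back
    return w
```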

95 First-order/proximal methods
$$\min_{w \in \mathbb{R}^p} f(w) + \lambda\,\psi(w)$$
$f$ is strictly convex and differentiable with a Lipschitz gradient.
Generalizes the idea of gradient descent:
$$w^{k+1} \leftarrow \arg\min_{w \in \mathbb{R}^p} \underbrace{f(w^k) + \nabla f(w^k)^\top (w - w^k)}_{\text{linear approximation}} + \underbrace{\tfrac{L}{2}\|w - w^k\|_2^2}_{\text{quadratic term}} + \lambda\,\psi(w)$$
$$= \arg\min_{w \in \mathbb{R}^p} \tfrac{1}{2}\Big\|w - \Big(w^k - \tfrac{1}{L}\nabla f(w^k)\Big)\Big\|_2^2 + \tfrac{\lambda}{L}\psi(w).$$
When $\lambda = 0$, $w^{k+1} \leftarrow w^k - \tfrac{1}{L}\nabla f(w^k)$: this is equivalent to a classical gradient descent step.

96 First-order/proximal methods
They require solving efficiently the proximal operator
$$\min_{w \in \mathbb{R}^p} \tfrac{1}{2}\|u - w\|_2^2 + \lambda\,\psi(w).$$
For the l1-norm, this amounts to a soft-thresholding: $w^\star_i = \mathrm{sign}(u_i)(|u_i| - \lambda)_+$.
There exist accelerated versions based on Nesterov's optimal first-order method (gradient method with extrapolation) [Beck and Teboulle, 2009, Nesterov, 2007, 1983], suited for large-scale experiments.
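The resulting proximal-gradient (ISTA) iteration for the Lasso, as a short NumPy sketch (not from the slides; the accelerated FISTA variant adds an extrapolation step on top of this):

```python
import numpy as np

def ista(y, X, lam, n_iter=200):
    """Proximal gradient (ISTA) for the Lasso: min_w 0.5*||y - Xw||_2^2 + lam*||w||_1."""
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the gradient of f
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        g = X.T @ (X @ w - y)                  # gradient of the smooth part
        u = w - g / L                          # gradient step
        w = np.sign(u) * np.maximum(np.abs(u) - lam / L, 0.0)   # proximal (soft-thresholding) step
    return w
```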

97 Optimization for Grouped Sparsity
The proximal operator:
$$\min_{w \in \mathbb{R}^p} \tfrac{1}{2}\|u - w\|_2^2 + \lambda\sum_{g \in \mathcal{G}} \|w_g\|_q$$
For $q = 2$: $w^\star_g = \frac{u_g}{\|u_g\|_2}\,(\|u_g\|_2 - \lambda)_+$, for all $g$ in $\mathcal{G}$.
For $q = \infty$: $w^\star_g = u_g - \Pi_{\|\cdot\|_1 \le \lambda}[u_g]$, for all $g$ in $\mathcal{G}$.
These formulas generalize soft-thresholding to groups of variables. They are used in block-coordinate descent and proximal algorithms.
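A sketch of the q = 2 case (block soft-thresholding) in NumPy, assuming the groups are given as a partition of the coordinates (example data chosen for illustration only):

```python
import numpy as np

def prox_group_l2(u, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (block soft-thresholding).
    `groups` is a list of index arrays forming a partition of the coordinates."""
    w = np.zeros_like(u)
    for g in groups:
        norm_g = np.linalg.norm(u[g])
        if norm_g > lam:
            w[g] = (1.0 - lam / norm_g) * u[g]   # shrink the whole group
        # else: the whole group is set exactly to zero
    return w

# Example: coordinates {0,1,2} and {3,4} as two groups
u = np.array([0.5, -0.2, 0.1, 3.0, -4.0])
print(prox_group_l2(u, [np.array([0, 1, 2]), np.array([3, 4])], lam=1.0))
```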

98 Smoothing Techniques: Reweighted l2
Let us start from something simple: $a^2 - 2ab + b^2 \ge 0$, and thus
$$a \le \tfrac{1}{2}\Big(\frac{a^2}{b} + b\Big), \quad \text{with equality iff } a = b.$$
Then
$$\|w\|_1 = \min_{\eta_j \ge 0} \tfrac{1}{2}\sum_{j=1}^p \Big(\frac{w_j^2}{\eta_j} + \eta_j\Big).$$
The formulation becomes
$$\min_{w,\ \eta_j \ge \varepsilon} \tfrac{1}{2}\|y - Xw\|_2^2 + \frac{\lambda}{2}\sum_{j=1}^p \Big(\frac{w_j^2}{\eta_j} + \eta_j\Big).$$

99 DC (Difference of Convex) Programming
Remember? Concave functions with a kink at zero: $\psi(w) = \sum_{i=1}^p \phi(|w_i|)$.
lq pseudo-norm, with $0 < q < 1$: $\psi(w) = \sum_{i=1}^p |w_i|^q$;
log penalty: $\psi(w) = \sum_{i=1}^p \log(|w_i| + \varepsilon)$;
$\phi$ is any concave function of $|w|$ with a kink at zero (figure: $\phi(|w|)$ versus $w$).

100 Figure: illustration of the DC-programming approach. The non-convex penalty $\phi(|w|) = \log(|w| + \varepsilon)$ is upper-bounded near the current iterate $w^k$ by the linear surrogate $\phi'(|w^k|)\,|w| + C$, so that $f(w) + \phi(|w|)$ is majorized by the convex function $f(w) + \phi'(|w^k|)\,|w| + C$.

101 DC (Difference of Convex) Programming
$$\min_{w \in \mathbb{R}^p} f(w) + \lambda\sum_{i=1}^p \phi(|w_i|).$$
This problem is non-convex: $f$ is convex, and $\phi$ is concave on $\mathbb{R}_+$.
If $w^k$ is the current estimate at iteration $k$, the algorithm solves
$$w^{k+1} \leftarrow \arg\min_{w \in \mathbb{R}^p} \Big[ f(w) + \lambda\sum_{i=1}^p \phi'(|w_i^k|)\,|w_i| \Big],$$
which is a reweighted-l1 problem.
Warning: this does not solve the non-convex problem, it only provides a stationary point.
In practice, each iteration sets small coefficients to zero. After 2-3 iterations, the result does not change much.
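A small NumPy sketch of this reweighted-l1 scheme for the log penalty (an illustration under stated assumptions: ISTA is used as the inner weighted-Lasso solver, with a fixed number of iterations):

```python
import numpy as np

def reweighted_l1(y, X, lam, eps=0.1, n_outer=3, n_ista=200):
    """DC programming for the log penalty sum_i log(|w_i| + eps):
    each outer iteration solves a weighted Lasso with weights phi'(|w_i^k|) = 1/(|w_i^k| + eps)."""
    p = X.shape[1]
    w = np.zeros(p)
    Lip = np.linalg.norm(X, 2) ** 2
    for _ in range(n_outer):
        weights = 1.0 / (np.abs(w) + eps)          # phi'(|w_i^k|) for the log penalty
        v = w.copy()
        for _ in range(n_ista):                    # ISTA on the weighted-l1 problem
            u = v - X.T @ (X @ v - y) / Lip
            v = np.sign(u) * np.maximum(np.abs(u) - lam * weights / Lip, 0.0)
        w = v
    return w
```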
