Recent Advances in Structured Sparse Models

1 Recent Advances in Structured Sparse Models
Julien Mairal
Willow group - INRIA - ENS - Paris
LEAR seminar, Grenoble, September 21, 2010

2 Acknowledgements
Francis Bach, Rodolphe Jenatton, Guillaume Obozinski

3 What this talk is about
Sparse linear models: not only sparse, but also structured!
Solving challenging optimization problems.
Developing new applications of sparse models in computer vision and machine learning.
Related publications:
[1] J. Mairal, R. Jenatton, G. Obozinski and F. Bach. Network Flow Algorithms for Structured Sparsity. NIPS, 2010.
[2] R. Jenatton, J. Mairal, G. Obozinski and F. Bach. Proximal Methods for Hierarchical Sparse Coding. arXiv preprint, 2010.
[3] R. Jenatton, J. Mairal, G. Obozinski and F. Bach. Proximal Methods for Sparse Hierarchical Dictionary Learning. ICML, 2010.

4 Sparse Linear Models: Machine Learning Point of View
Let $(y_i, x_i)_{i=1}^n$ be a training set, where the vectors $x_i$ are in $\mathbb{R}^p$ and are called features. The scalars $y_i$ are in
$\{-1,+1\}$ for binary classification problems,
$\{1,\ldots,N\}$ for multiclass classification problems,
$\mathbb{R}$ for regression problems.
In a linear model, one assumes a relation $y \approx w^\top x$, and solves
$$\min_{w \in \mathbb{R}^p} \underbrace{\frac{1}{n}\sum_{i=1}^n \ell(y_i, w^\top x_i)}_{\text{data-fitting}} + \underbrace{\lambda\,\Omega(w)}_{\text{regularization}}.$$

5 Sparse Linear Models: Machine Learning Point of View
A few examples:
Ridge regression: $\min_{w \in \mathbb{R}^p} \frac{1}{2n}\sum_{i=1}^n (y_i - w^\top x_i)^2 + \lambda\|w\|_2^2$.
Linear SVM: $\min_{w \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \max(0,\, 1 - y_i\, w^\top x_i) + \lambda\|w\|_2^2$.
Logistic regression: $\min_{w \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \log\big(1 + e^{-y_i\, w^\top x_i}\big) + \lambda\|w\|_2^2$.
The squared $\ell_2$-norm induces smoothness in $w$. When one knows in advance that $w$ should be sparse, one should use a sparsity-inducing regularization such as the $\ell_1$-norm [Chen et al., 1999, Tibshirani, 1996].
The purpose of this talk is to add further a-priori knowledge to the regularization.
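To make the contrast concrete, here is a minimal sketch comparing the two penalties on synthetic data; it assumes scikit-learn and NumPy are available, and is an illustration rather than an experiment from the talk:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
n, p = 200, 50
X = rng.randn(n, p)
w_true = np.zeros(p)
w_true[:5] = 2.0                       # only 5 informative features
y = np.sign(X @ w_true + 0.1 * rng.randn(n))

# Squared l2 penalty: coefficients shrink but generally stay nonzero.
ridge = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
# l1 penalty: most coefficients are driven exactly to zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)

print("nonzeros with l2:", np.sum(ridge.coef_ != 0))   # typically p
print("nonzeros with l1:", np.sum(lasso.coef_ != 0))   # close to 5
```

The $\ell_2$-penalized model keeps essentially all coefficients nonzero, while the $\ell_1$-penalized one recovers a small support.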

6 Sparse Linear Models: Signal Processing Point of View
Let $y$ in $\mathbb{R}^n$ be a signal. Let $D = [d_1,\ldots,d_p] \in \mathbb{R}^{n\times p}$ be a set of normalized basis vectors. We call it a dictionary.
$D$ is adapted to $y$ if it can represent it with a few basis vectors, that is, if there exists a sparse vector $w$ in $\mathbb{R}^p$ such that $y \approx Dw$. We call $w$ the sparse code.
$$\underbrace{y}_{y \in \mathbb{R}^n} \approx \underbrace{\begin{bmatrix} d_1 & d_2 & \cdots & d_p \end{bmatrix}}_{D \in \mathbb{R}^{n\times p}} \underbrace{\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_p \end{bmatrix}}_{w \in \mathbb{R}^p,\ \text{sparse}}$$

7 Sparse Linear Models: the Lasso
Signal processing: $D$ is a dictionary in $\mathbb{R}^{n\times p}$,
$$\min_{w \in \mathbb{R}^p} \frac{1}{2}\|y - Dw\|_2^2 + \lambda\|w\|_1.$$
Machine learning:
$$\min_{w \in \mathbb{R}^p} \frac{1}{2n}\sum_{i=1}^n (y_i - x_i^\top w)^2 + \lambda\|w\|_1 \;=\; \min_{w \in \mathbb{R}^p} \frac{1}{2n}\|y - X^\top w\|_2^2 + \lambda\|w\|_1,$$
with $X = [x_1,\ldots,x_n]$ and $y = [y_1,\ldots,y_n]^\top$.
A useful tool in signal processing, machine learning, statistics, neuroscience, ... as long as one wishes to select features.

8 Why does the $\ell_1$-norm induce sparsity? Analysis of the norms in 1D
$$\Omega(w) = w^2,\quad \Omega'(w) = 2w; \qquad \Omega(w) = |w|,\quad \Omega'_-(w) = -1,\ \Omega'_+(w) = 1.$$
The gradient of the $\ell_2$-norm vanishes when $w$ gets close to 0. On its differentiable part, the norm of the gradient of the $\ell_1$-norm is constant.

9 Why does the $\ell_1$-norm induce sparsity? Geometric explanation
$$\min_{w \in \mathbb{R}^p} \frac{1}{2}\|y - Dw\|_2^2 + \lambda\|w\|_1 \quad\Longleftrightarrow\quad \min_{w \in \mathbb{R}^p} \|y - Dw\|_2^2 \ \ \text{s.t.}\ \ \|w\|_1 \leq T.$$
[Figure: geometric illustration comparing the $\ell_1$- and $\ell_2$-balls.]

10 Other Sparsity-Inducing Norms
$$\min_{w \in \mathbb{R}^p} \underbrace{f(w)}_{\text{data fitting term}} + \lambda \underbrace{\Omega(w)}_{\text{sparsity-inducing norm}}$$
The most popular choice for $\Omega$: the $\ell_1$-norm, $\|w\|_1 = \sum_{j=1}^p |w_j|$. However, the $\ell_1$-norm encodes poor information, just cardinality!
Another popular choice for $\Omega$: the $\ell_1$-$\ell_q$ norm [Yuan and Lin, 2006], with $q = 2$ or $q = \infty$:
$$\sum_{g \in \mathcal{G}} \|w_g\|_q,$$
with $\mathcal{G}$ a partition of $\{1,\ldots,p\}$.
The $\ell_1$-$\ell_q$ norm sets to zero groups of non-overlapping variables (as opposed to single variables for the $\ell_1$-norm).

11 Sparsity-Inducing Norms
$$\Omega(w) = \sum_{g \in \mathcal{G}} \|w_g\|_q$$
Applications of group sparsity:
Selecting groups of features instead of individual variables.
Multi-task learning.
Multiple kernel learning.
Drawbacks:
Requires a partition of the features.
Encodes fixed/static information.
What happens when the groups overlap? [Jenatton et al., 2009]
Inside the groups, the $\ell_2$-norm (or $\ell_\infty$) does not promote sparsity.
Variables belonging to the same groups are encouraged to be set to zero together.
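For a partition $\mathcal{G}$, the proximal operator of $\lambda\Omega$ decomposes over the groups, and for $q = 2$ each block undergoes a group-wise soft-thresholding; a minimal NumPy sketch (the group definitions are illustrative, not from the talk):

```python
import numpy as np

def prox_group_lasso(u, groups, lam):
    """Prox of lam * sum_g ||w_g||_2 for non-overlapping groups.

    Each block is set to zero entirely when ||u_g||_2 <= lam,
    otherwise rescaled by (1 - lam / ||u_g||_2).
    """
    w = np.zeros_like(u)
    for g in groups:
        norm = np.linalg.norm(u[g])
        if norm > lam:
            w[g] = (1.0 - lam / norm) * u[g]
    return w

u = np.array([0.5, -0.2, 3.0, -4.0, 0.1])
groups = [[0, 1], [2, 3], [4]]          # a partition of {0,...,4}
print(prox_group_lasso(u, groups, lam=1.0))
# the first and last groups vanish entirely; the middle one is shrunk
```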

12 Examples of set of groups $\mathcal{G}$ (1/3) [Jenatton et al., 2009]
Selection of contiguous patterns on a sequence, $p = 6$.
$\mathcal{G}$ is the set of blue groups.
Any union of blue groups set to zero leads to the selection of a contiguous pattern.
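One concrete way to build such a $\mathcal{G}$ is to take all prefixes and suffixes of the sequence, so that any union of groups zeroes out a prefix and a suffix and leaves an interval; a small sketch of this standard construction (the 0-indexing is mine):

```python
from itertools import combinations

p = 6
# One valid family G: all prefixes {0,...,k} and all suffixes {k,...,p-1}.
G = [set(range(k + 1)) for k in range(p)] + [set(range(k, p)) for k in range(p)]

# Zeroing any union of groups leaves a contiguous (possibly empty) support.
for r in (1, 2, 3):
    for gs in combinations(G, r):
        zeros = set().union(*gs)
        support = sorted(set(range(p)) - zeros)
        # contiguity: the span of the support equals its cardinality
        assert not support or support[-1] - support[0] + 1 == len(support)
print("all unions of up to 3 groups leave contiguous supports")
```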

13 Examples of set of groups $\mathcal{G}$ (2/3) [Jenatton et al., 2009]
Selection of rectangles on a 2-D grid, $p = 25$.
$\mathcal{G}$ is the set of blue/green groups (their complements are not displayed).
Any union of blue/green groups set to zero leads to the selection of a rectangle.

14 Examples of set of groups $\mathcal{G}$ (3/3) [Jenatton et al., 2009]
Selection of diamond-shaped patterns on a 2-D grid, $p = 25$.
It is possible to extend such settings to 3-D spaces, or to more complex topologies.

15 Hierarchical Norms [Zhao et al., 2009, Bach, 2009]
A node can be active only if its ancestors are active.
The selected patterns are rooted subtrees.

16 Group Lasso + Sparsity [Sprechmann et al., 2010]

17 Application 1: Wavelet denoising with hierarchical norms

18 Wavelet denoising with hierarchical norms [Jenatton, Mairal, Obozinski, and Bach, 2010b]
Classical wavelet denoising [Donoho and Johnstone, 1995]:
$$\min_{w \in \mathbb{R}^p} \frac{1}{2}\|x - Dw\|_2^2 + \lambda\|w\|_1.$$
When $D$ is orthogonal, the solution is obtained via soft-thresholding.
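A minimal sketch of this classical pipeline, assuming PyWavelets and NumPy are available (the wavelet, noise level, and threshold are illustrative choices, not the talk's benchmark settings):

```python
import numpy as np
import pywt

rng = np.random.RandomState(0)
clean = np.cumsum(rng.randn(256))        # a smooth-ish 1-D signal
noisy = clean + 0.5 * rng.randn(256)

# With an orthogonal wavelet transform, the Lasso reduces to
# soft-thresholding each detail coefficient independently.
coeffs = pywt.wavedec(noisy, "db4", level=4)
lam = 1.0
denoised_coeffs = [coeffs[0]] + [
    pywt.threshold(c, lam, mode="soft") for c in coeffs[1:]
]
denoised = pywt.waverec(denoised_coeffs, "db4")

print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
print("denoised MSE:", np.mean((denoised[:256] - clean) ** 2))
```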

19 Wavelet denoising with hierarchical norms [Jenatton, Mairal, Obozinski, and Bach, 2010b]
Wavelet denoising with a hierarchical norm: add the a-priori knowledge that the coefficients are embedded in a tree.
(a) Barb., $\sigma = 50$, $\ell_1$. (b) Barb., $\sigma = 50$, tree.

20 Wavelet denoising with hierarchical norms [Jenatton, Mairal, Obozinski, and Bach, 2010b]
Benchmark on a database of 12 standard images:
[Table: PSNR and IPSNR with Haar wavelets at several noise levels $\sigma$, comparing $\ell_0$, $\ell_1$, $\Omega_{\ell_2}$, and $\Omega_{\ell_\infty}$; the numerical entries were not recoverable.]

21 Application 2: Hierarchical Dictionary Learning

22 Hierarchical Dictionary Learning [Olshausen and Field, 1997, Elad and Aharon, 2006, Mairal et al., 2010a]
We now consider a sequence $\{y^i\}_{i=1}^m$ of signals in $\mathbb{R}^n$:
$$\min_{W \in \mathbb{R}^{p\times m},\, D \in \mathcal{C}} \sum_{i=1}^m \frac{1}{2}\|y^i - Dw^i\|_2^2 + \lambda\|w^i\|_1.$$
This can be rewritten as a matrix factorization problem:
$$\min_{W \in \mathbb{R}^{p\times m},\, D \in \mathcal{C}} \frac{1}{2}\|Y - DW\|_F^2 + \lambda\|W\|_{1,1}.$$
[Jenatton, Mairal, Obozinski, and Bach, 2010a]: what about replacing the $\ell_1$-norm by a hierarchical norm?
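A minimal sketch of the alternating scheme behind such formulations, assuming NumPy (a plain batch alternation with ISTA for the codes and a normalized least-squares dictionary update; this is not the online algorithm of Mairal et al., 2010a):

```python
import numpy as np

def soft_threshold(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def dictionary_learning(Y, p, lam, n_iter=50, n_ista=30):
    n, m = Y.shape
    rng = np.random.RandomState(0)
    D = rng.randn(n, p)
    D /= np.linalg.norm(D, axis=0)          # unit-norm columns (the set C)
    W = np.zeros((p, m))
    for _ in range(n_iter):
        # Sparse coding: ISTA on (1/2)||Y - DW||_F^2 + lam * ||W||_{1,1}.
        L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
        for _ in range(n_ista):
            grad = D.T @ (D @ W - Y)
            W = soft_threshold(W - grad / L, lam / L)
        # Dictionary update: least squares, then project columns onto C.
        D = Y @ W.T @ np.linalg.pinv(W @ W.T)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
        # (for simplicity, unused atoms are not re-initialized)
    return D, W

Y = np.random.RandomState(1).randn(10, 100)
D, W = dictionary_learning(Y, p=20, lam=0.1)
print("fraction of nonzeros in W:", np.mean(W != 0))
```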

23 Hierarchical Dictionary Learning
Dictionaries learned with the $\ell_1$-norm.

24 Hierarchical Dictionary Learning [Jenatton, Mairal, Obozinski, and Bach, 2010a]

25 Application to patch reconstruction [Jenatton, Mairal, Obozinski, and Bach, 2010a]
Reconstruction of natural image patches:
Remove randomly subsampled pixels.
Reconstruct with matrix factorization and structured sparsity.
[Table: mean reconstruction error for 50% to 90% missing pixels, flat ($\ell_1$) vs. tree regularization; only the 50% column is recoverable: flat 19.3, tree 18.6.]

26 Hierarchical Topic Models for text corpora [Jenatton, Mairal, Obozinski, and Bach, 2010a]
Each document is modeled through word counts.
Low-rank matrix factorization of the word-document matrix.
Probabilistic topic models such as latent Dirichlet allocation [Blei et al., 2003].
Organise the topics in a tree: previously approached using non-parametric Bayesian methods (hierarchical Chinese restaurant process and nested Dirichlet process) [Blei et al., 2010].
Can we achieve similar performance with a simple matrix factorization formulation?

27 Tree of Topics

28 Classification based on topics
Comparison on predicting newsgroup article subjects: 20 Newsgroups articles (1425 documents).
[Figure: classification accuracy (%) vs. number of topics for PCA + SVM, NMF + SVM, LDA + SVM, SpDL + SVM, and SpHDL + SVM.]

29 Application 3: Background Subtraction

30 Background Subtraction
Given a video sequence, how can we remove foreground objects?
[Figure: sample frames from video sequences 1 and 2.]

31 Background Subtraction
Model each frame as
$$\underbrace{x}_{\text{frame}} \approx \underbrace{Dw}_{\text{linear combination of background frames}} + \underbrace{e}_{\text{error term}},$$
solved by
$$\min_{w \in \mathbb{R}^p,\, e \in \mathbb{R}^m} \frac{1}{2}\|x - Dw - e\|_2^2 + \lambda_1\|w\|_1 + \lambda_2\,\Omega(e).$$
Same idea as Wright et al. [2009] for robust face recognition, where $\Omega = \ell_1$. We use overlapping groups on 3x3 pixel neighborhoods to add spatial consistency. See also Cevher et al. [2008] for structured sparsity + background subtraction.
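A minimal sketch with $\Omega = \ell_1$ (the robust-face-recognition variant; the talk's overlapping-group $\Omega$ would replace the soft-threshold on $e$), assuming NumPy, where the columns of $D$ are past frames:

```python
import numpy as np

def soft_threshold(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def background_subtraction(x, D, lam1, lam2, n_iter=200):
    """ISTA on (1/2)||x - Dw - e||^2 + lam1*||w||_1 + lam2*||e||_1.

    The smooth part is jointly quadratic in (w, e); its gradient has
    Lipschitz constant ||[D, I]||_2^2 = ||D||_2^2 + 1.
    """
    m, p = D.shape
    w, e = np.zeros(p), np.zeros(m)
    L = np.linalg.norm(D, 2) ** 2 + 1.0
    for _ in range(n_iter):
        r = D @ w + e - x                   # residual at the current iterate
        # the prox is separable in (w, e); both steps use the same gradient
        w = soft_threshold(w - (D.T @ r) / L, lam1 / L)
        e = soft_threshold(e - r / L, lam2 / L)
    return D @ w, e                         # estimated background, foreground

rng = np.random.RandomState(0)
D = rng.rand(100, 5)                        # 5 past frames of 100 pixels
x = D @ np.array([0.5, 0.5, 0.0, 0.0, 0.0])
x[:10] += 1.0                               # a small synthetic foreground blob
bg, fg = background_subtraction(x, D, lam1=0.01, lam2=0.1)
print("foreground support:", np.nonzero(fg)[0])
```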

32 Background Subtraction
(a) input; (b) estimated background; (c) foreground, $\ell_1$; (d) foreground, $\ell_1$ + structure; (e) other example.

33 Background Subtraction
(a) input; (b) estimated background; (c) foreground, $\ell_1$; (d) foreground, $\ell_1$ + structure; (e) other example.

34 How do we optimize all that?

35 First-order/proximal methods
$$\min_{w \in \mathbb{R}^p} f(w) + \lambda\,\Omega(w)$$
$f$ is strictly convex and differentiable with a Lipschitz gradient.
Generalizes the idea of gradient descent:
$$w^{k+1} \leftarrow \arg\min_{w \in \mathbb{R}^p} \underbrace{f(w^k) + \nabla f(w^k)^\top (w - w^k)}_{\text{linear approximation}} + \underbrace{\frac{L}{2}\|w - w^k\|_2^2}_{\text{quadratic term}} + \lambda\,\Omega(w)$$
$$= \arg\min_{w \in \mathbb{R}^p} \frac{1}{2}\Big\|w - \Big(w^k - \frac{1}{L}\nabla f(w^k)\Big)\Big\|_2^2 + \frac{\lambda}{L}\Omega(w).$$
When $\lambda = 0$, $w^{k+1} \leftarrow w^k - \frac{1}{L}\nabla f(w^k)$: this is equivalent to a classical gradient descent step.
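A minimal generic sketch of this iteration in NumPy, assuming the Lipschitz constant $L$ is known; the Lasso instantiation is illustrative:

```python
import numpy as np

def proximal_gradient(grad_f, prox, w0, L, n_iter=300):
    """Proximal gradient descent: w <- prox(w - grad_f(w)/L, 1/L)."""
    w = w0.copy()
    for _ in range(n_iter):
        w = prox(w - grad_f(w) / L, 1.0 / L)
    return w

# Instantiation for the Lasso: f(w) = (1/2)||y - Dw||^2, Omega = ||.||_1.
rng = np.random.RandomState(0)
D = rng.randn(100, 200)
w_true = np.zeros(200)
w_true[:10] = 1.0
y = D @ w_true + 0.01 * rng.randn(100)
lam = 0.5
L = np.linalg.norm(D, 2) ** 2               # Lipschitz constant of grad f

w_hat = proximal_gradient(
    grad_f=lambda w: D.T @ (D @ w - y),
    prox=lambda u, t: np.sign(u) * np.maximum(np.abs(u) - lam * t, 0.0),
    w0=np.zeros(200), L=L)
print("nonzeros:", np.sum(w_hat != 0))
```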

36 First-order/proximal methods
They require efficiently solving the proximal operator:
$$\min_{w \in \mathbb{R}^p} \frac{1}{2}\|u - w\|_2^2 + \lambda\,\Omega(w).$$
For the $\ell_1$-norm, this amounts to a soft-thresholding:
$$w_i^\star = \operatorname{sign}(u_i)\,(|u_i| - \lambda)_+.$$
There exist accelerated versions based on Nesterov's optimal first-order method (gradient method with extrapolation) [Beck and Teboulle, 2009, Nesterov, 2007, 1983], suited for large-scale experiments.
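The accelerated variant adds an extrapolation step in the style of FISTA [Beck and Teboulle, 2009]; a minimal sketch reusing the grad_f/prox conventions of the previous snippet:

```python
import numpy as np

def fista(grad_f, prox, w0, L, n_iter=300):
    """Proximal gradient with Nesterov extrapolation (FISTA)."""
    w, z, t = w0.copy(), w0.copy(), 1.0
    for _ in range(n_iter):
        w_next = prox(z - grad_f(z) / L, 1.0 / L)          # prox step at z
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_next + ((t - 1.0) / t_next) * (w_next - w)   # extrapolation
        w, t = w_next, t_next
    return w
```

Dropping the extrapolation (setting z = w_next and t = 1) recovers the plain proximal gradient iteration above.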

37 Tree-structured groups
Proposition [Jenatton, Mairal, Obozinski, and Bach, 2010a]
If $\mathcal{G}$ is a tree-structured set of groups, i.e., $\forall g, h \in \mathcal{G}$, $g \cap h = \emptyset$ or $g \subseteq h$ or $h \subseteq g$, then for $q = 2$ or $q = \infty$, define $\mathrm{Prox}^g$ and $\mathrm{Prox}_\Omega$ as
$$\mathrm{Prox}^g:\ u \mapsto \arg\min_{w \in \mathbb{R}^p} \frac{1}{2}\|u - w\|_2^2 + \lambda\|w_g\|_q,$$
$$\mathrm{Prox}_\Omega:\ u \mapsto \arg\min_{w \in \mathbb{R}^p} \frac{1}{2}\|u - w\|_2^2 + \lambda \sum_{g \in \mathcal{G}} \|w_g\|_q.$$
If the groups are sorted from the leaves to the root, then $\mathrm{Prox}_\Omega = \mathrm{Prox}^{g_m} \circ \cdots \circ \mathrm{Prox}^{g_1}$.
Tree-structured regularization: efficient linear-time algorithm.
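A minimal sketch of this composition for $q = 2$, with an illustrative tree of groups (each group is a node plus its descendants; the block prox is the group soft-thresholding from before, applied from the leaves up to the root):

```python
import numpy as np

def prox_block(u, g, lam):
    """Prox of lam * ||w_g||_2, acting only on the coordinates in g."""
    w = u.copy()
    norm = np.linalg.norm(u[g])
    w[g] = 0.0 if norm <= lam else (1.0 - lam / norm) * u[g]
    return w

# Tree-structured groups over 5 variables:
# leaves {3} and {4}, internal node {2,3,4}, root {0,1,2,3,4}.
groups_leaves_to_root = [[3], [4], [2, 3, 4], [0, 1, 2, 3, 4]]

def prox_tree(u, lam):
    w = u.copy()
    for g in groups_leaves_to_root:          # leaves first, root last
        w = prox_block(w, g, lam)
    return w

u = np.array([2.0, -1.0, 0.5, 0.3, -0.2])
print(prox_tree(u, lam=0.4))
```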

38 General Overlapping Groups for $q = \infty$
Dual formulation [Jenatton, Mairal, Obozinski, and Bach, 2010a]
The solutions $w^\star$ and $\xi^\star$ of the following optimization problems
$$\min_{w \in \mathbb{R}^p} \frac{1}{2}\|u - w\|_2^2 + \lambda \sum_{g \in \mathcal{G}} \|w_g\|_\infty, \qquad \text{(Primal)}$$
$$\min_{\xi \in \mathbb{R}^{p\times|\mathcal{G}|}} \frac{1}{2}\Big\|u - \sum_{g \in \mathcal{G}} \xi^g\Big\|_2^2 \ \ \text{s.t.}\ \ \forall g \in \mathcal{G},\ \|\xi^g\|_1 \leq \lambda \ \text{and}\ \xi^g_j = 0 \ \text{if}\ j \notin g, \qquad \text{(Dual)}$$
satisfy
$$w^\star = u - \sum_{g \in \mathcal{G}} \xi^{g\star}. \qquad \text{(Primal-dual relation)}$$
The dual formulation has more variables, but no overlapping constraints.
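One can sanity-check this relation numerically; a minimal sketch assuming CVXPY is installed (the overlapping groups below are an arbitrary example):

```python
import numpy as np
import cvxpy as cp

rng = np.random.RandomState(0)
p, lam = 4, 0.3
u = rng.randn(p)
groups = [[0, 1, 2], [1, 2, 3]]             # overlapping groups

# Selection matrices embedding R^{|g|} into R^p (support of xi^g).
embed = []
for g in groups:
    E = np.zeros((p, len(g)))
    E[g, np.arange(len(g))] = 1.0
    embed.append(E)

# Primal: (1/2)||u - w||_2^2 + lam * sum_g ||w_g||_inf.
w = cp.Variable(p)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u - w)
                       + lam * sum(cp.norm(E.T @ w, "inf") for E in embed))
           ).solve()

# Dual: one xi^g per group, supported on g, with ||xi^g||_1 <= lam.
xis = [cp.Variable(len(g)) for g in groups]
s = sum(E @ xi for E, xi in zip(embed, xis))
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u - s)),
           [cp.norm(xi, 1) <= lam for xi in xis]).solve()

w_from_dual = u - sum(E @ xi.value for E, xi in zip(embed, xis))
print("primal-dual gap:", np.linalg.norm(w.value - w_from_dual))
```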

39 General Overlapping Groups for $q = \infty$ [Mairal, Jenatton, Obozinski, and Bach, 2010b]
First step: flip the signs of $u$. The dual is then equivalent to a quadratic min-cost flow problem:
$$\min_{\xi \in \mathbb{R}^{p\times|\mathcal{G}|}_+} \frac{1}{2}\Big\|u - \sum_{g \in \mathcal{G}} \xi^g\Big\|_2^2 \ \ \text{s.t.}\ \ \forall g \in \mathcal{G},\ \sum_{j \in g} \xi^g_j \leq \lambda \ \text{and}\ \xi^g_j = 0 \ \text{if}\ j \notin g.$$

40 General Overlapping Groups for $q = \infty$
Example: $\mathcal{G} = \{g = \{1,\ldots,p\}\}$:
$$\min_{\xi^g \in \mathbb{R}^p_+} \frac{1}{2}\|u - \xi^g\|_2^2 \ \ \text{s.t.}\ \ \sum_{j=1}^p \xi^g_j \leq \lambda.$$
[Figure: flow network from a source $s$ through a group node $g$ (capacity constraint $\xi^g_1 + \xi^g_2 + \xi^g_3 \leq \lambda$) to variable nodes $u_1, u_2, u_3$, and on to the sink $t$ with arcs $(\xi_j, c_j)$. Caption: $\mathcal{G} = \{g = \{1,2,3\}\}$, $\forall j,\ c_j = \frac{1}{2}(u_j - \xi_j)^2$.]
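In this single-group case the flow problem is just the Euclidean projection of $|u|$ onto the simplex-constrained $\ell_1$-ball of radius $\lambda$, which yields the prox of $\lambda\|\cdot\|_\infty$ via $w = u - \xi$; a minimal NumPy sketch (the sort-based projection is the standard algorithm, not the flow solver of the paper):

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v >= 0 onto {x >= 0 : sum(x) <= radius}."""
    if v.sum() <= radius:
        return v.copy()
    mu = np.sort(v)[::-1]
    cum = np.cumsum(mu)
    rho = np.nonzero(mu * np.arange(1, len(v) + 1) > cum - radius)[0][-1]
    theta = (cum[rho] - radius) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def prox_linf(u, lam):
    """Prox of lam * ||.||_inf: w = u - xi with xi the projection of |u|."""
    xi = project_l1_ball(np.abs(u), lam)
    return u - np.sign(u) * xi

u = np.array([3.0, -1.0, 0.5])
print(prox_linf(u, lam=1.5))   # the largest entries are clipped toward a common level
```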

41 General Overlapping Groups for $q = \infty$
Example with two overlapping groups:
$$\min_{\xi \in \mathbb{R}^{p\times|\mathcal{G}|}_+} \frac{1}{2}\Big\|u - \sum_{g \in \mathcal{G}} \xi^g\Big\|_2^2 \ \ \text{s.t.}\ \ \forall g \in \mathcal{G},\ \sum_{j \in g} \xi^g_j \leq \lambda \ \text{and}\ \xi^g_j = 0 \ \text{if}\ j \notin g.$$
[Figure: flow network from a source $s$ through group nodes $g$ ($\xi^g_1 + \xi^g_2 \leq \lambda$) and $h$ ($\xi^h_2 + \xi^h_3 \leq \lambda$) to variable nodes $u_1, u_2, u_3$, and on to the sink $t$ with arcs $(\xi_j, c_j)$. Caption: $\mathcal{G} = \{g = \{1,2\},\, h = \{2,3\}\}$, $\forall j,\ c_j = \frac{1}{2}(u_j - \xi_j)^2$.]

42 General Overlapping Groups for $q = \infty$ [Mairal, Jenatton, Obozinski, and Bach, 2010b]
Main ideas of the algorithm: divide and conquer.
1 Solve a relaxed problem in linear time.
2 Test the feasibility of the solution for the non-relaxed problem with a max-flow.
3 If the solution is feasible, it is optimal; stop the algorithm.
4 If not, find a minimum cut and remove the arcs along the cut.
5 Recursively process each part of the graph.
The algorithm is guaranteed to converge to the solution. See the paper for more details.

43 Conclusions
We have developed efficient, large-scale algorithmic tools for solving structured sparse decomposition problems.
These tools are related to network flow optimization.
The hierarchical case can be solved at the same cost as $\ell_1$.
There are preliminary applications in computer vision; there should be more!

44 References I
F. Bach. High-dimensional non-linear variable selection through hierarchical kernel learning. Technical report, arXiv preprint, 2009.
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
D. Blei, T. Griffiths, and M. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57(2):1-30, 2010.
V. Cevher, M. F. Duarte, C. Hegde, and R. G. Baraniuk. Sparse signal recovery using Markov random fields. In Advances in Neural Information Processing Systems, 2008.
S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33-61, 1999.
D. L. Donoho and I. M. Johnstone. Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90(432):1200-1224, 1995.

45 References II
M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), December 2006.
R. Jenatton, J.-Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Technical report, arXiv preprint, 2009.
R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In Proceedings of the International Conference on Machine Learning (ICML), 2010a.
R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. Technical report, arXiv preprint, 2010b. Submitted.
J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 2010a.
J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Network flow algorithms for structured sparsity. In Advances in Neural Information Processing Systems, 2010b.
Y. Nesterov. A method for solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Dokl., 27:372-376, 1983.
Y. Nesterov. Gradient methods for minimizing composite objective function. Technical report, CORE, 2007.

46 References III
B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311-3325, 1997.
P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar. Collaborative hierarchical sparse modeling. Technical report, arXiv preprint, 2010.
R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210-227, 2009.
M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49-67, 2006.
P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A), 2009.
