Parcimonie en apprentissage statistique (Sparsity in statistical learning)


1 Parcimonie en apprentissage statistique
Guillaume Obozinski
Ecole des Ponts - ParisTech
Journée Parcimonie, Fédération Charles Hermite, 23 Juin 2014

2 Classical supervised learning setup (ERM)
Data: $(x_1, y_1), \dots, (x_i, y_i), \dots, (x_n, y_n)$; $f \in \mathcal{H}$ is the function to learn.
Loss function: $\ell : (y, a) \mapsto \ell(y, a)$, e.g. $\ell(y, a) = \frac{1}{2}(y - a)^2$, the logistic loss, the hinge loss, etc.
Empirical Risk Minimization:
$$\min_{f \in \mathcal{H}} \underbrace{\frac{1}{n} \sum_{i=1}^n \ell(f(x_i), y_i)}_{\text{Empirical Risk}} + \underbrace{\lambda \|f\|_{\mathcal{H}}^2}_{\text{Regularization}}$$
$\mathcal{H}$ is typically an RKHS and $\lambda$ is the regularization coefficient.
$\lambda$ controls the complexity of the function that we are willing to learn for a given amount of data.

3 Learning linear functions
Restricting to linear functions $f_w : x \mapsto w^\top x$:
$$\min_{w \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n \ell(w^\top x_i, y_i) + \frac{\lambda}{2} \|w\|_2^2$$
For the square loss this is ridge regression.
Issue: the number of features $p$ is typically large compared to the amount of data.
Sparsity provides an alternative to regularization: reducing the number of features entering the model yields
another way to control model complexity,
more interpretable models (very important in biomedical applications),
computationally efficient algorithms.
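For concreteness, here is a minimal NumPy sketch of the ridge-regression problem above in its closed form; the synthetic data and the value of $\lambda$ are illustrative choices of mine, and the $n\lambda$ factor comes from the $\frac{1}{n}$ empirical-risk convention used on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                                   # fewer observations than features
w_true = np.zeros(p); w_true[:4] = [2.0, -3.0, 1.5, 4.0]
X = rng.standard_normal((n, p))
y = X @ w_true + 0.5 * rng.standard_normal(n)

lam = 0.1
# Minimizer of (1/(2n))||y - Xw||^2 + (lam/2)||w||^2:  (X'X + n*lam*I) w = X'y
w_ridge = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)
print("nonzero coefficients:", int(np.sum(np.abs(w_ridge) > 1e-3)))   # dense: no sparsity
```

The estimate is dense even though only four coefficients are truly active, which is the motivation for the sparse approaches that follow.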

4 A sparse signal
$y \in \mathbb{R}^n$ is the signal, $X \in \mathbb{R}^{n \times p}$ is some overcomplete basis, and $w$ is the sparse representation of the signal.
Find $w$ sparse such that $y = Xw$. Classical signal processing formulation of the problem:
$$\min_w \|w\|_0 \quad \text{s.t.} \quad y = Xw.$$
Problem: there is noise... and noise is not sparse. This leads to
$$\min_w \|w\|_0 \ \ \text{s.t.}\ \ \|y - Xw\|_2 \le \epsilon, \qquad \min_w \|y - Xw\|_2^2 \ \ \text{s.t.}\ \ \|w\|_0 \le k, \qquad \min_w \frac{1}{2}\|y - Xw\|_2^2 + \lambda \|w\|_0.$$
These problems are NP-hard.

5 Approaches
Greedy methods: Matching Pursuit (MP), Orthogonal Matching Pursuit (OMP), Least-squares OMP, CoSaMP.
Relaxation methods: Lasso / Basis Pursuit, Dantzig Selector.
Bayesian methods: Spike and Slab priors (ARD), Empirical Bayes.
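To make the greedy family concrete, here is a minimal Orthogonal Matching Pursuit sketch in NumPy; it is a generic textbook version on synthetic data, not an implementation from the talk.

```python
import numpy as np

def omp(X, y, k):
    """Greedy selection of k atoms: pick the column most correlated with the
    residual, then refit by least squares on the current support."""
    n, p = X.shape
    support, residual = [], y.copy()
    w = np.zeros(p)
    for _ in range(k):
        j = int(np.argmax(np.abs(X.T @ residual)))   # most correlated atom
        if j not in support:
            support.append(j)
        w_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        w = np.zeros(p); w[support] = w_s
        residual = y - X @ w
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 256))
w_true = np.zeros(256); w_true[[3, 50, 120, 200]] = [1.0, -2.0, 1.5, 0.7]
y = X @ w_true + 0.05 * rng.standard_normal(64)
w_hat = omp(X, y, k=4)
print(sorted(np.flatnonzero(np.abs(w_hat) > 1e-6)))   # typically recovers {3, 50, 120, 200}
```

Matching Pursuit differs only in that it updates a single coefficient per iteration instead of refitting by least squares on the whole support.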

6 A convex relaxation...
Empirical risk: for $w \in \mathbb{R}^p$, $L(w) = \frac{1}{2n} \sum_{i=1}^n (y_i - x_i^\top w)^2$.
Support of the model: $\mathrm{Supp}(w) = \{i \mid w_i \neq 0\}$, so that $|\mathrm{Supp}(w)| = \sum_i 1_{\{w_i \neq 0\}} = \|w\|_0$.
Penalization for variable selection: $\min_{w \in \mathbb{R}^d} L(w) + \lambda\, |\mathrm{Supp}(w)|$.
Lasso: $\min_{w \in \mathbb{R}^d} L(w) + \lambda \|w\|_1$.
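A quick way to see the relaxation at work is to fit the Lasso on a synthetic sparse regression problem. The sketch below uses scikit-learn, an illustrative choice of mine rather than a tool referenced on this slide; its Lasso objective $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha\|w\|_1$ matches $L(w) + \lambda\|w\|_1$ above with $\alpha = \lambda$.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 64, 256
w_true = np.zeros(p); w_true[[3, 50, 120, 200]] = [1.0, -2.0, 1.5, 0.7]
X = rng.standard_normal((n, p))
y = X @ w_true + 0.1 * rng.standard_normal(n)

lasso = Lasso(alpha=0.05).fit(X, y)
# the estimated support concentrates on the truly active variables
print("estimated support:", sorted(np.flatnonzero(np.abs(lasso.coef_) > 1e-6)))
```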

7 Related formulations through convex programs
Basis Pursuit: $\min_w \|w\|_1$ s.t. $y = Xw$
Basis Pursuit (noisy setting): $\min_w \|w\|_1$ s.t. $\|y - Xw\|_2 \le \eta$
Dantzig Selector: $\min_w \|w\|_1$ s.t. $\|X^\top (y - Xw)\|_\infty \le \lambda$
Remarks:
The minima are not necessarily unique.
The Dantzig Selector is a linear program.
The optimality conditions for the Lasso require $\|X^\top (y - Xw)\|_\infty \le \lambda$.

8 Optimization algorithms
Generic approaches: for the Lasso, interior point methods; subgradient descent.
Efficient first-order methods: coordinate descent methods, proximal methods, reweighted $\ell_2$ methods (especially for structured sparse methods).
Active set methods: for the Lasso, the LARS algorithm; in general, meta-algorithms to combine with the methods above.
See e.g. Bach et al. (2012).
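As an illustration of the proximal methods mentioned above, here is a bare-bones ISTA (proximal gradient) loop for the Lasso objective $\frac{1}{2n}\|y - Xw\|_2^2 + \lambda\|w\|_1$; the fixed step size and iteration count are simple default choices, not tuned settings from the talk.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """Proximal gradient for (1/(2n))||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)      # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n                  # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)   # prox of step * lam * ||.||_1
    return w

# usage on data (X, y) built as in the earlier examples: w_hat = ista_lasso(X, y, lam=0.05)
```

Accelerated (FISTA) and backtracking variants follow the same pattern, with a momentum term or an adaptive step size.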

9 Software: the SPAMS toolbox
Toolbox developed by Julien Mairal.
C++ interfaced with Matlab, R, Python.
Proximal gradient methods for $\ell_0$, $\ell_1$, elastic net, fused-Lasso, group-Lasso and more, for the square, logistic and multi-class logistic loss functions.
Handles sparse matrices.
Fast implementations of OMP and LARS.
Dictionary learning and matrix factorization (NMF, sparse PCA).

10 Correlation, Stability and the Elastic Net
In the presence of strong correlations between the variables, the Lasso chooses arbitrarily among them, which causes stability issues.
Elastic Net (Zou and Hastie, 2005):
$$\min_w \frac{1}{2n} \|y - Xw\|_2^2 + \lambda \|w\|_1 + \mu \|w\|_2^2$$
This makes the optimization problem strongly convex: there is always a unique solution, and many algorithms converge faster.
It selects correlated variables together. Two intuitions/views:
by decorrelating the variables (up to rescaling of $\mu$):
$$\min_w \frac{1}{2n} \big( w^\top (X^\top X + \mu I) w - 2 y^\top Xw + y^\top y \big) + \lambda \|w\|_1$$
by smoothing: for a pair of very correlated variables $x_1, x_2$, the $\ell_2$ term encourages $w_1 \approx w_2$.
Better behavior with heavily correlated variables.
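The grouping effect is easy to observe numerically: in the sketch below two nearly identical relevant columns are created and the Lasso and elastic net coefficients on them are compared. scikit-learn and all parameter values are illustrative choices of mine; sklearn's ElasticNet parametrizes the penalty through alpha and l1_ratio rather than the $(\lambda, \mu)$ pair above.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 100
z = rng.standard_normal(n)
# two nearly identical relevant variables plus irrelevant noise features
X = np.column_stack([z + 0.01 * rng.standard_normal(n),
                     z + 0.01 * rng.standard_normal(n),
                     rng.standard_normal((n, 8))])
y = X[:, 0] + X[:, 1] + 0.1 * rng.standard_normal(n)

print("lasso:      ", np.round(Lasso(alpha=0.1).fit(X, y).coef_[:2], 2))
print("elastic net:", np.round(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_[:2], 2))
# The Lasso tends to put most of the weight on one of the two correlated
# columns; the elastic net spreads it more evenly across both.
```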

11 Comparing the Lasso and other strategies for linear regression
Compare:
1. Ridge regression: $\min_{w \in \mathbb{R}^p} \frac{1}{2} \|y - Xw\|_2^2 + \frac{\lambda}{2} \|w\|_2^2$
2. Lasso: $\min_{w \in \mathbb{R}^p} \frac{1}{2} \|y - Xw\|_2^2 + \lambda \|w\|_1$
3. OMP/FS: $\min_{w \in \mathbb{R}^p} \frac{1}{2} \|y - Xw\|_2^2 + \lambda \|w\|_0$
Each method builds a path of solutions from $0$ to the ordinary least-squares solution.
Regularization parameters are selected on the test set.

12 Simulation results
i.i.d. Gaussian design matrix, $k = 4$, $n = 64$, $p \in [2, 256]$, SNR = 1.
Note the stability to non-sparsity and the variability.
[Figure: mean square error as a function of $\log_2(p)$ for the L1, L2 and greedy methods (plus an oracle); left panel: sparse setting, right panel: rotated (non-sparse) setting.]

13 Advantages and drawbacks of $\ell_1$ vs $\ell_0$ penalization
Advantages:
The solution $\alpha(x)$ is a continuous (and differentiable on the support) function of the data $x$.
The $\ell_1$-norm is more robust to violations of the sparsity assumption.
It controls the influence of spuriously introduced variables (like $\ell_0 + \ell_2$).
The convex formulation leads to principled algorithms that generalize well to new situations, and to natural theoretical analyses.
Drawbacks:
It introduces an estimation bias, which leads to the selection of too many variables if ignored.
Some of the $\ell_0$ algorithms are simpler.

14 Group sparse models

15 From $\ell_1$-regularization...
$$\min_{w \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n (y^{(i)} - w^\top x^{(i)})^2 + \lambda \|w\|_1 \quad \text{with} \quad \|w\|_1 = \sum_{j=1}^p |w_j|.$$

16 ...to penalization with grouped variables
Assume that $\{1, \dots, p\}$ is partitioned into $m$ groups $G_1, \dots, G_m$, and write $w = (w_{G_1}, \dots, w_{G_m})$ and $x = (x_{G_1}, \dots, x_{G_m})$:
$$\min_{w \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n \ell(w^\top x^{(i)}, y^{(i)}) + \lambda \sum_{j=1}^m \|w_{G_j}\|_2$$
Group Lasso (Yuan and Lin, 2007):
$$\min_{w \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n (y^{(i)} - w^\top x^{(i)})^2 + \lambda \sum_{j=1}^m \|w_{G_j}\|_2$$
The $\ell_1/\ell_2$ norm: $\Omega(w) := \sum_{G \in \mathcal{G}} \|w_G\|_2$.
Unit ball in $\mathbb{R}^3$ for the groups $\{1,2\}$ and $\{3\}$: $\|(w_1, w_2)\|_2 + |w_3| \le 1$.
Some entire groups are set to $0$; there are no zeros within selected groups.
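The computational primitive behind this penalty is its proximal operator, block soft-thresholding, which shrinks each group radially and zeroes it entirely when its norm falls below the threshold. A small sketch (group structure, threshold and test vector are made up for illustration):

```python
import numpy as np

def prox_group_l2(w, groups, t):
    """Proximal operator of t * sum_G ||w_G||_2 (block soft-thresholding)."""
    out = w.copy()
    for g in groups:
        norm_g = np.linalg.norm(w[g])
        out[g] = 0.0 if norm_g <= t else (1.0 - t / norm_g) * w[g]
    return out

w = np.array([0.5, -0.2, 3.0, 0.1, -2.5])
groups = [[0, 1], [2], [3, 4]]          # a partition of the coordinates
print(prox_group_l2(w, groups, t=1.0))
# The first group is zeroed out entirely; the other groups are only shrunk.
```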

17 $\ell_1/\ell_q$-regularization
One can also consider the $\ell_1/\ell_\infty$-norm, which has more non-differentiabilities.
Applications:
Groups of nominal variables (dummy binary variables).
Learning sums of polynomial functions, $f(x) = f_1(x_1) + \dots + f_p(x_p)$:
$$\min_w \frac{1}{n} \sum_{i=1}^n \Big( y^{(i)} - \sum_{j,k} w_{jk} \big(x_j^{(i)}\big)^k \Big)^2 + \lambda \sum_{j=1}^p \big\| (w_{j1}, \dots, w_{jK}) \big\|_2$$
with $j$ indexing variables, $i$ observations, and $k$ the degree of the monomial.

18 Algorithms for $\ell_1/\ell_2$-regularization
$$\min_{w \in \mathbb{R}^p} \underbrace{\frac{1}{n} \sum_{i=1}^n \ell(w^\top x^{(i)}, y^{(i)})}_{f(w)} + \lambda \underbrace{\sum_{j=1}^m \|w_{G_j}\|_2}_{\Omega(w)}$$
Reweighted $\ell_2$ algorithms, proximal methods, blockwise coordinate descent.
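To illustrate the reweighted $\ell_2$ idea for the square loss, the sketch below alternates a weighted ridge solve with a closed-form update of the group weights, using the bound $\|w_G\|_2 \le \frac{1}{2}(\|w_G\|_2^2/\eta_G + \eta_G)$, tight at $\eta_G = \|w_G\|_2$; the smoothing constant eps, which avoids division by zero, is my own addition.

```python
import numpy as np

def reweighted_l2_group_lasso(X, y, groups, lam, n_iter=50, eps=1e-6):
    """min_w (1/(2n))||y - Xw||^2 + lam * sum_G ||w_G||_2 via reweighted ridge."""
    n, p = X.shape
    eta = np.ones(p)                       # per-coordinate weights, shared within each group
    for _ in range(n_iter):
        # weighted ridge step: (X'X/n + lam * diag(1/eta)) w = X'y/n
        w = np.linalg.solve(X.T @ X / n + lam * np.diag(1.0 / eta), X.T @ y / n)
        # closed-form weight update: eta_G = ||w_G||_2 for every coordinate in G
        for g in groups:
            eta[g] = max(np.linalg.norm(w[g]), eps)
    return w
```

With groups of size one this reduces to a classical reweighted ridge scheme for the Lasso.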

19 Sparsity in function space and multiple kernel learning

20 Introducing a feature map
Feature map $\phi : x \mapsto \phi(x)$: maps the input data to a richer, possibly more explicit feature space, typically high-dimensional or possibly infinite-dimensional.
$$\min_w \sum_{i=1}^n \ell(w^\top \phi(x_i), y_i) + \frac{\lambda}{2} \|w\|_2^2.$$

21 Changing the dot product
Let $x = (x_1, x_2) \in \mathbb{R}^2$ and $\phi(x) = (x_1, x_2, x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$. Then
$$\langle \phi(x), \phi(y) \rangle = x_1 y_1 + x_2 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2 + 2 x_1 x_2 y_1 y_2 = x_1 y_1 + x_2 y_2 + (x_1 y_1)^2 + (x_2 y_2)^2 + 2 (x_1 y_1)(x_2 y_2) = \langle x, y \rangle + \langle x, y \rangle^2.$$
For $w = (0, 0, 1, 1, 0)$: $w^\top \phi(x) - 1 \ge 0 \iff \|x\|_2 \ge 1$.
Linear separators in $\mathbb{R}^5$ correspond to conic separators in $\mathbb{R}^2$.
Let $x = (x_1, \dots, x_p) \in \mathbb{R}^p$ and $\phi(x) = (x_1, \dots, x_p, x_1^2, \dots, x_p^2, \sqrt{2}\, x_1 x_2, \dots, \sqrt{2}\, x_i x_j, \dots, \sqrt{2}\, x_{p-1} x_p)$.
We still have $\langle \phi(x), \phi(y) \rangle = \langle x, y \rangle + \langle x, y \rangle^2$, but the explicit mapping is too expensive to compute: $\phi(x) \in \mathbb{R}^{p + p(p+1)/2}$.
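This identity is easy to check numerically; the sketch below compares the explicit degree-2 feature map with the kernel evaluation (the test points are arbitrary).

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for x in R^2, as on the slide."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, np.sqrt(2) * x1 * x2])

x = np.array([0.3, -1.2])
y = np.array([2.0, 0.5])
lhs = phi(x) @ phi(y)
rhs = x @ y + (x @ y) ** 2
print(lhs, rhs, np.isclose(lhs, rhs))   # the two dot products agree
```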

22 Duality for regularized empirical risk minimization
Define $\psi_i : u \mapsto \ell(u, y_i)$. Let $\Omega$ be a norm and consider
$$\min_w \frac{1}{n} \sum_{i=1}^n \ell(w^\top x_i, y_i) + \frac{\lambda}{2} \Omega(w)^2$$
$$= \min_{u, w}\ \frac{1}{n} \sum_i \psi_i(u_i) + \frac{\lambda}{2} \Omega(w)^2 \quad \text{s.t.}\ \forall i,\ u_i = w^\top x_i$$
$$= \min_{u, w}\ \frac{1}{n} \sum_i \psi_i(u_i) + \frac{\lambda}{2} \Omega(w)^2 \quad \text{s.t.}\ u = Xw$$
$$= \min_{u, w} \max_\alpha\ \frac{1}{n} \sum_i \psi_i(u_i) + \frac{\lambda}{2} \Omega(w)^2 + \lambda\, \alpha^\top (u - Xw)$$
$$= \max_\alpha \min_{u, w}\ \frac{1}{n} \sum_i \big[ \psi_i(u_i) + (n\lambda\alpha_i)\, u_i \big] + \Big[ \frac{\lambda}{2} \Omega(w)^2 - \lambda\, w^\top (X^\top \alpha) \Big]$$
$$= \max_\alpha\ -\frac{1}{n} \sum_i \psi_i^*(-n\lambda\alpha_i) - \frac{\lambda}{2}\, \Omega^*(X^\top \alpha)^2,$$
where $\psi_i^*$ denotes the Fenchel conjugate of $\psi_i$ and $\Omega^*$ the dual norm of $\Omega$.

23 Representer property and kernelized version
Consider the special case $\Omega = \|\cdot\|_2$:
$$\min_w \frac{1}{n} \sum_{i=1}^n \ell(w^\top x_i, y_i) + \frac{\lambda}{2} \|w\|_2^2 \;=\; \max_\alpha\ -\frac{1}{n} \sum_i \psi_i^*(-n\lambda\alpha_i) - \frac{\lambda}{2} \|X^\top \alpha\|_2^2 \;=\; \max_\alpha\ -\frac{1}{n} \sum_i \psi_i^*(-n\lambda\alpha_i) - \frac{\lambda}{2}\, \alpha^\top K \alpha$$
with $K = XX^\top$, and with the relation between optimal solutions $w^\star = X^\top \alpha^\star = \sum_i \alpha_i^\star x_i$.
So if we replace $x_i$ with $\phi(x_i)$, we have $w^\star = \sum_{i=1}^n \alpha_i^\star \phi(x_i)$, and
$$f(x) = \langle w^\star, \phi(x) \rangle = \sum_i \alpha_i^\star \langle \phi(x_i), \phi(x) \rangle = \sum_i \alpha_i^\star K(x_i, x).$$
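For the square loss this dual is a quadratic in $\alpha$ and, with the $\frac{1}{n}$ and $\frac{\lambda}{2}$ scalings used above, has the closed form $\alpha^\star = (K + n\lambda I)^{-1} y$, i.e. kernel ridge regression. A minimal sketch with a Gaussian kernel (kernel choice, bandwidth and data are mine):

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)

lam = 1e-2
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + len(y) * lam * np.eye(len(y)), y)   # alpha* = (K + n*lam*I)^{-1} y

X_test = np.linspace(-3, 3, 5)[:, None]
f_test = gaussian_kernel(X_test, X) @ alpha     # f(x) = sum_i alpha_i K(x_i, x)
print(np.round(f_test, 2))
```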

24 Regularization for multiple features
$$x \;\longmapsto\; \big(\Phi_1(x), \dots, \Phi_j(x), \dots, \Phi_p(x)\big) \;\longmapsto\; w_1^\top \Phi_1(x) + \dots + w_p^\top \Phi_p(x)$$
Concatenating feature spaces is equivalent to summing kernels:
$$\|w\|_2^2 = \sum_{j=1}^p \|w_j\|_2^2 \qquad \text{and} \qquad K = \sum_{j=1}^p K_j.$$
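This equivalence is just the bilinearity of the dot product, as the following toy check illustrates (feature dimensions are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
Phi1 = rng.standard_normal((10, 3))    # feature block 1 for 10 points
Phi2 = rng.standard_normal((10, 7))    # feature block 2 for the same points

Phi = np.hstack([Phi1, Phi2])          # concatenated feature map
K = Phi @ Phi.T                        # kernel of the concatenation
K_sum = Phi1 @ Phi1.T + Phi2 @ Phi2.T  # sum of the blockwise kernels
print(np.allclose(K, K_sum))           # True: concatenation <=> summing kernels
```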

25 General kernel learning (Lanckriet et al., 2004; Bach et al., 2005; Micchelli and Pontil, 2005)
$$G(K) = \min_{w \in \mathcal{F}} \sum_{i=1}^n \ell(y_i, w^\top \Phi(x_i)) + \frac{\lambda}{2} \|w\|_2^2 = \max_{\alpha \in \mathbb{R}^n}\ -\sum_i \ell_i^*(-\lambda\alpha_i) - \frac{\lambda}{2}\, \alpha^\top K \alpha$$
is a convex function of the kernel matrix $K$.
Learning a convex combination of kernels: given $K_1, \dots, K_p$, consider the formulation
$$\min_\eta\ G\Big(\sum_j \eta_j K_j\Big) \quad \text{s.t.} \quad \sum_j \eta_j = 1,\ \eta_j \ge 0.$$
The simplex constraints, like the $\ell_1$-norm, induce sparsity.

26 Multiple kernel learning and $\ell_1/\ell_2$
Block $\ell_1$-norm problem:
$$\min_{w_1, \dots, w_p} \sum_{i=1}^n \ell\big(y_i,\ w_1^\top \Phi_1(x_i) + \dots + w_p^\top \Phi_p(x_i)\big) + \frac{\lambda}{2} \big( \|w_1\|_2 + \dots + \|w_p\|_2 \big)^2$$
Proposition: block $\ell_1$-norm regularization is equivalent to minimizing with respect to $\eta$ the optimal value $G\big(\sum_{j=1}^p \eta_j K_j\big)$:
the (sparse) weights $\eta$ are obtained from the optimality conditions,
the dual parameters $\alpha$ are optimal for $K = \sum_{j=1}^p \eta_j K_j$.
This gives a single optimization problem for learning both $\eta$ and $\alpha$.

27 Proof of equivalence
$$\min_{w_1, \dots, w_p}\ \sum_i \ell\Big(y_i, \sum_{j=1}^p w_j^\top \Phi_j(x_i)\Big) + \lambda \Big( \sum_{j=1}^p \|w_j\|_2 \Big)^2$$
$$= \min_{\sum_j \eta_j = 1,\ \eta \ge 0}\ \min_{w_1, \dots, w_p}\ \sum_i \ell\Big(y_i, \sum_{j=1}^p w_j^\top \Phi_j(x_i)\Big) + \lambda \sum_{j=1}^p \|w_j\|_2^2 / \eta_j$$
$$= \min_{\sum_j \eta_j = 1}\ \min_{\tilde w_1, \dots, \tilde w_p}\ \sum_i \ell\Big(y_i, \sum_{j=1}^p \eta_j^{1/2}\, \tilde w_j^\top \Phi_j(x_i)\Big) + \lambda \sum_{j=1}^p \|\tilde w_j\|_2^2 \qquad \text{with } \tilde w_j = w_j\, \eta_j^{-1/2}$$
$$= \min_{\sum_j \eta_j = 1}\ \min_w\ \sum_i \ell\big(y_i, w^\top \Psi_\eta(x_i)\big) + \lambda \|w\|_2^2 \qquad \text{with } \Psi_\eta(x) = \big(\eta_1^{1/2} \Phi_1(x), \dots, \eta_p^{1/2} \Phi_p(x)\big)$$
The first equality uses the variational identity $\big(\sum_j \|w_j\|_2\big)^2 = \min_{\eta \ge 0,\ \sum_j \eta_j = 1} \sum_j \|w_j\|_2^2 / \eta_j$, attained at $\eta_j \propto \|w_j\|_2$.
We have $\Psi_\eta(x)^\top \Psi_\eta(x') = \sum_{j=1}^p \eta_j K_j(x, x')$ with $\sum_{j=1}^p \eta_j = 1$.
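The variational identity used in the first step can be checked numerically; the brute-force grid below is only a sanity check on three made-up group norms.

```python
import numpy as np

rng = np.random.default_rng(0)
norms = np.array([np.linalg.norm(rng.standard_normal(4)) for _ in range(3)])  # ||w_j||_2

lhs = norms.sum() ** 2                           # (sum_j ||w_j||)^2

# minimize sum_j ||w_j||^2 / eta_j over the simplex by brute force on a grid
grid = np.linspace(1e-3, 1.0, 200)
best = np.inf
for a in grid:
    for b in grid:
        c = 1.0 - a - b
        if c <= 0:
            continue
        best = min(best, norms[0]**2 / a + norms[1]**2 / b + norms[2]**2 / c)

print(lhs, best)                                 # agree up to the grid resolution
print(norms / norms.sum())                       # the minimizing eta_j are proportional to ||w_j||
```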

28 Dictionary Learning

29 Dictionary learning for image denoising (Elad and Aharon, 2006)
$$\underbrace{x}_{\text{measurements}} = \underbrace{x_0}_{\text{original image}} + \underbrace{\varepsilon}_{\text{noise}}$$

30 Sparse PCA / Dictionary Learning
Both solve a factorization $X = D\alpha$.
Sparse PCA: sparse dictionary $D$, e.g. for microarray data (Witten et al., 2009; Bach et al., 2008).
Dictionary Learning: sparse decomposition $\alpha$ on an overcomplete dictionary, e.g. for natural images (Elad and Aharon, 2006).

31 K-SVD (Aharon et al., 2006)
Formulation:
$$\min_{D, \alpha}\ \frac{1}{2} \sum_i \|x_i - D\alpha_i\|^2 \quad \text{s.t.} \quad \|\alpha_i\|_0 \le k$$
Idea:
1. Decompose the signals on the current dictionary using a greedy algorithm.
2. Fix the obtained supports.
3. For all $j$ in a sequence, update the $j$th atom and the decomposition coefficients on this atom to optimize the fit on the set of signals that use this atom.
4. Iterate until the supports no longer change.

32 K-SVD algorithm
Algorithm 1: K-SVD
repeat
  for $i = 1$ to $n$ do
    Find $\alpha_i$ using a greedy algorithm (e.g. OMP)
  end for
  for $j = 1$ to $K$ do
    $I_j \leftarrow \{i \mid \alpha_{ji} \neq 0\}$
    $E^{(j)} \leftarrow X_{:, I_j} - \sum_{k \neq j} d^{(k)} \alpha_{k, I_j}$
    Solve (using the Lanczos algorithm): $(d^{(j)}, \alpha_{j, I_j}) \leftarrow \arg\min_{d, \alpha} \|E^{(j)} - d\alpha\|^2$ s.t. $\|d\|_2 = 1$
  end for
until none of the $I_j$ change anymore.
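Below is a compact NumPy/scikit-learn sketch of this loop: sparse coding by OMP, then a rank-1 SVD of the restricted residual for each atom. It is my own minimal rendering of the pseudo-code above, not the reference implementation, and the parameters are illustrative.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(X, n_atoms, k, n_iter=10, seed=0):
    """X: (p, n) signal matrix. Returns dictionary D (p, n_atoms) and codes A (n_atoms, n)."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    D = rng.standard_normal((p, n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        A = orthogonal_mp(D, X, n_nonzero_coefs=k)       # sparse coding step (OMP)
        for j in range(n_atoms):
            I = np.flatnonzero(A[j])                     # signals that use atom j
            if I.size == 0:
                continue
            # residual on those signals, with atom j's contribution removed
            E = X[:, I] - D @ A[:, I] + np.outer(D[:, j], A[j, I])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)   # rank-1 refit of (atom, coefficients)
            D[:, j] = U[:, 0]
            A[j, I] = s[0] * Vt[0]
    return D, A
```

The SVD of the restricted residual plays the role of the Lanczos step in the pseudo-code: only the leading singular pair is actually needed.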

33 K-SVD: heuristics
Replace atoms that are rarely used by the datapoints that are least well explained by the current dictionary.
Remove atoms that are too correlated with other existing atoms and replace them by the least well explained datapoints.

34 $\ell_1$ formulation for Dictionary Learning
$$\min_{A \in \mathbb{R}^{k \times n},\ D \in \mathbb{R}^{p \times k}}\ \sum_{i=1}^n \Big( \frac{1}{2} \|x_i - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \Big) \quad \text{s.t.} \quad \forall j,\ \|d_j\|_2 \le 1.$$
In both cases there is no orthogonality constraint.
The problem is not jointly convex, but it is convex in each $d_j$ and in each $\alpha_i$.
Classical optimization alternates between $D$ and $\alpha$.

35 Block-coordinate descent for DL (Lee et al., 2007; Witten et al., 2009)
$$\min_{U, V}\ \|X - UV^\top\|_F^2 + \lambda \sum_j \|v_j\|_1 \quad \text{s.t.} \quad \|u_j\|_2 \le 1$$
Denote $X^{(j)} = X - \sum_{j' \neq j} u_{j'} v_{j'}^\top$.
Minimizing w.r.t. $u_j$: $\min_{u_j} \|X^{(j)} - u_j v_j^\top\|_F^2$ s.t. $\|u_j\|_2 \le 1$, solved by $u_j \leftarrow X^{(j)} v_j / \|X^{(j)} v_j\|_2$.
Minimizing w.r.t. $v_j$: $\min_{v_j} \|X^{(j)} - u_j v_j^\top\|_F^2 + \lambda \|v_j\|_1$, solved by soft-thresholding.
This requires no matrix inversion, can take advantage of efficient algorithms for the Lasso, and can use warm starts and active sets.
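A direct transcription of these two updates in NumPy might look as follows; it is a sketch under the convention above (squared Frobenius data-fit without a 1/2 factor, which makes the soft-threshold level $\lambda/2$), with random initialization and a fixed number of sweeps.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def dl_block_coordinate_descent(X, r, lam, n_iter=50, seed=0):
    """min_{U,V} ||X - U V^T||_F^2 + lam * sum_j ||v_j||_1  s.t. ||u_j||_2 <= 1."""
    rng = np.random.default_rng(seed)
    p, n = X.shape
    U = rng.standard_normal((p, r)); U /= np.linalg.norm(U, axis=0)
    V = np.zeros((n, r))
    for _ in range(n_iter):
        for j in range(r):
            Xj = X - U @ V.T + np.outer(U[:, j], V[:, j])   # residual without component j
            uj = Xj @ V[:, j]                               # update u_j: X^(j) v_j / ||X^(j) v_j||
            norm = np.linalg.norm(uj)
            if norm > 0:
                U[:, j] = uj / norm
            # update v_j: coordinate-wise soft-thresholding at lam/2 (since ||u_j|| = 1)
            V[:, j] = soft_threshold(Xj.T @ U[:, j], lam / 2.0)
    return U, V
```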

36 Dictionary learning for image denoising
Extract all overlapping $8 \times 8$ patches $x_i \in \mathbb{R}^{64}$ and form the matrix $X = [x_1, \dots, x_n] \in \mathbb{R}^{64 \times n}$.
Solve the matrix factorization problem
$$\min_{A \in \mathbb{R}^{k \times n},\ D \in \mathbb{R}^{p \times k}}\ \sum_{i=1}^n \Big( \frac{1}{2} \|x_i - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \Big) \quad \text{s.t.} \quad \forall j,\ \|d_j\|_2 \le 1,$$
where $\alpha_i$ is sparse and $D$ is the dictionary.
Each patch is decomposed as $x_i \approx D\alpha_i$, and the reconstructions $D\alpha_i$ of the overlapping patches are averaged to reconstruct a full-sized image.
The number of patches $n$ is large (equal to the number of pixels), so one uses stochastic optimization / online learning (Mairal et al., 2009a), which can handle potentially infinite datasets and can adapt to dynamic training sets.
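A rough end-to-end sketch of this pipeline using scikit-learn's patch utilities and online dictionary learning follows; these tools, the random stand-in image and all parameter values are my own illustrative choices, and the experiments in the talk rely on the SPAMS toolbox instead.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
clean = rng.random((64, 64))                      # stand-in for an image
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

patches = extract_patches_2d(noisy, (8, 8))       # all overlapping 8x8 patches
X = patches.reshape(len(patches), -1)
mean = X.mean(axis=1, keepdims=True)
X = X - mean                                      # work on centered patches

dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0).fit(X)
codes = dico.transform(X)                                   # sparse codes alpha_i
denoised_patches = (codes @ dico.components_ + mean).reshape(patches.shape)
denoised = reconstruct_from_patches_2d(denoised_patches, noisy.shape)  # average overlapping patches
```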

37 Denoising result (Mairal et al., 2009b)

38 Dictionary Learning for Image Inpainting
Example from Mairal et al. (2008).

39 What does the dictionary V look like?

40 Hierarchical sparsity and some applications
Joint work with Rodolphe Jenatton, Julien Mairal and Francis Bach.

41 Hierarchical Norms (Zhao et al., 2009; Bach, 2008; Jenatton, Mairal, Obozinski and Bach, 2010a)
A dictionary element is selected only after its ancestors.
The structure is on the codes $\alpha$ (not on individual dictionary elements $d_i$).
Hierarchical penalization:
$$\Omega(\alpha) = \sum_{g \in \mathcal{G}} \|\alpha_g\|_2$$
where the groups $g \in \mathcal{G}$ are the sets of descendants of the nodes of a tree.
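A toy sketch of how the groups are built from a tree and how the norm is evaluated; the tree and the coefficients are made up for illustration.

```python
import numpy as np

# A small tree on 5 dictionary elements, given by parent pointers (None = root).
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1}

def descendants(node):
    """Group associated with `node`: the node together with all its descendants."""
    group = [node]
    for child in [c for c, p in parent.items() if p == node]:
        group += descendants(child)
    return group

groups = [descendants(v) for v in parent]          # one group per node of the tree

def hierarchical_norm(alpha):
    return sum(np.linalg.norm(alpha[g]) for g in groups)

alpha = np.array([1.0, 0.5, 0.0, 0.2, 0.0])        # support {0, 1, 3}: a rooted subtree
print(hierarchical_norm(alpha))
# Every group is a node together with its whole subtree, so the penalty can only
# zero out full subtrees: supports are rooted subtrees, i.e. a dictionary
# element is selected only after its ancestors.
```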

42 Hierarchical dictionary for image patches

43 References I
Aharon, M., Elad, M., and Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11).
Bach, F. (2008). Exploring large feature spaces with hierarchical multiple kernel learning. In Advances in Neural Information Processing Systems.
Bach, F., Jenatton, R., Mairal, J., and Obozinski, G. (2012). Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1).
Bach, F., Mairal, J., and Ponce, J. (2008). Convex sparse matrix factorizations. Technical report, arXiv.
Elad, M. and Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12).
Lee, H., Battle, A., Raina, R., and Ng, A. (2007). Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems (NIPS).
Mairal, J., Bach, F., Ponce, J., and Sapiro, G. (2009a). Online dictionary learning for sparse coding. In International Conference on Machine Learning (ICML).
Mairal, J., Bach, F., Ponce, J., Sapiro, G., and Zisserman, A. (2009b). Non-local sparse models for image restoration. In International Conference on Computer Vision (ICCV).
Mairal, J., Sapiro, G., and Elad, M. (2008). Learning multiscale sparse representations for image and video restoration. SIAM Multiscale Modeling and Simulation, 7(1).

44 References II
Witten, D., Tibshirani, R., and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3).
Yuan, M. and Lin, Y. (2007). On the non-negative garrotte estimator. Journal of the Royal Statistical Society, Series B, 69(2).
Zhao, P., Rocha, G., and Yu, B. (2009). Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A).
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 67(2).
