The lasso: an $\ell_1$ constraint in variable selection

1 The lasso algorithm is due to Tibshirani, who realised the possibility for variable selection of imposing an $\ell_1$-norm bound constraint on the variables in least squares models and then tuning the model estimation calculation using this bound. Considerable interest has been generated in this procedure by the discovery by Osborne, Presnell, and Turlach that the complete solution trajectory parametrised by this bound can be calculated very efficiently (the homotopy algorithm). This has resulted in the study both of the selection problem for different objective and constraint choices and of applications to such areas as data compression and the generation of sparse solutions of very under-determined systems. One class of generalisation is to piecewise linear systems; one example is quantile regression. In this case the selection problem can be formulated as a linear program and post-optimality procedures used to generate the solution trajectory. Our original continuation idea also extends, in an interesting two-phase procedure which has significant computational advantages over the LP approach. However, it is significantly less effective than the original homotopy algorithm for least squares objectives. The underlying problem is easier to state than to resolve. In contrast to the smooth objective case, a relatively efficient descent algorithm is available for fixed values of the constraint bound. This is joint work with Berwin Turlach.

2 Outline
Introduction; LSQ Descent; LSQ Homotopy; $\ell_1$ Descent; $\ell_1$ Homotopy; Results; Other; References.

3 Original formulation
Start with the linear model
$$r = y - X\beta,\qquad X : \mathbb{R}^p \to \mathbb{R}^n,\qquad \operatorname{rank} X = \min(p, n).$$
Problem: select a small subset of the columns of $X$ so that $\|r\|_2$ is small in an appropriate sense. Applications:
1. Exploratory data analysis ($y$ a signal observed in the presence of noise). Here the case of most interest corresponds to $p < n$.
2. Economising the representation of a sampled signal in a manner compatible with adequate image reconstruction. Here the case of interest corresponds to $p \gg n$. The aim is data compression.

6 Tibshirani
Add the $\ell_1$ constraint:
$$\min_\beta \tfrac{1}{2}\|r\|_2^2\qquad\text{subject to}\quad \|\beta\|_1 \le \kappa.$$
This can be written as a QP by introducing slack variables and positivity constraints,
$$\beta_i = u_i - v_i,\quad u_i, v_i \ge 0,\quad i = 1, 2, \ldots, p,\qquad \|\beta\|_1 = \sum_{i=1}^p (u_i + v_i).$$
Osborne, Presnell, and Turlach treat the constraint directly.
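To make the QP splitting concrete, here is a minimal numpy/scipy sketch that solves the constrained problem in the $(u, v)$ variables with a general-purpose SLSQP solver; the random design, response and the bound $\kappa$ are placeholder choices, and this is only a reference formulation, not the homotopy algorithm discussed below.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, kappa = 50, 8, 1.5                    # toy sizes and l1 bound (arbitrary)
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

def objective(z):
    # z = (u, v) with beta = u - v and u, v >= 0
    u, v = z[:p], z[p:]
    r = y - X @ (u - v)
    return 0.5 * r @ r

def gradient(z):
    u, v = z[:p], z[p:]
    g = -X.T @ (y - X @ (u - v))            # gradient w.r.t. beta
    return np.concatenate([g, -g])          # chain rule for (u, v)

constraints = [{"type": "ineq",             # kappa - sum(u + v) >= 0
                "fun": lambda z: kappa - z.sum(),
                "jac": lambda z: -np.ones_like(z)}]
bounds = [(0, None)] * (2 * p)              # positivity of u and v
res = minimize(objective, np.zeros(2 * p), jac=gradient,
               bounds=bounds, constraints=constraints, method="SLSQP")
beta = res.x[:p] - res.x[p:]
print("||beta||_1 =", np.abs(beta).sum(), " objective =", res.fun)
```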

7 The lasso in variable selection

8 Necessary conditions
Let $\mu$ be the Lagrange multiplier for the $\ell_1$ constraint. Then
$$r^T X = \mu u^T,\qquad \mu \ge 0,\qquad u \in \partial\|\beta\|_1,\qquad \mu = \frac{r^T X\beta}{\|\beta\|_1}.$$
Note $\mu = 0$ if $\kappa \ge \|\beta_{LS}\|_1$. Introduce an index set $\psi$ pointing to the nonzero components of $\beta$ (the currently selected variables) and a permutation matrix $Q_\psi$ which collects together these nonzero components. Then
$$\beta = Q_\psi^T\begin{bmatrix}\beta_\psi \\ 0\end{bmatrix},\qquad u = Q_\psi^T\begin{bmatrix}\theta_\psi \\ u_2\end{bmatrix} \in \partial\|\beta\|_1,\qquad (\theta_\psi)_j = \operatorname{sgn}(\beta_{\psi(j)}),$$
$$-1 \le (u_2)_k \le 1,\ k \in \psi^c,\qquad \psi \cup \psi^c = \{1, 2, \ldots, p\},\qquad u^T\beta = \|\beta\|_1,\qquad \|u\|_\infty = 1.$$
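These conditions are easy to check numerically in the special case of orthonormal columns, where the penalised (Lagrangian) form of the lasso has the closed-form soft-thresholding solution. The sketch below, with an invented orthonormal design and an arbitrary value of $\mu$, verifies that $X^T r = \mu u$ with $u$ equal to $\operatorname{sgn}\beta_i$ on the selected set and bounded by one elsewhere, and that $\mu = r^T X\beta/\|\beta\|_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, mu = 40, 6, 0.8                              # toy sizes, illustrative multiplier
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal columns
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.0, 0.0]) + 0.05 * rng.standard_normal(n)

c = X.T @ y
beta = np.sign(c) * np.maximum(np.abs(c) - mu, 0.0)   # soft thresholding
r = y - X @ beta
g = X.T @ r                                        # should be mu * u, u in the subdifferential

active = beta != 0
print("selected variables:", np.where(active)[0])
print("g = mu*sgn(beta) on the selected set:",
      np.allclose(g[active], mu * np.sign(beta[active])))
print("|g| <= mu off the selected set:", np.all(np.abs(g[~active]) <= mu + 1e-10))
print("mu recovered from r^T X beta / ||beta||_1:", (r @ X @ beta) / np.abs(beta).sum())
```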

9 Matrix factorization
Partial orthogonal factorization helps to simplify the calculations:
$$X Q_\psi^T = S\begin{bmatrix} U_1 & U_{12} \\ 0 & B\end{bmatrix},\qquad S^T y = \begin{bmatrix} c_1 \\ c_2\end{bmatrix}.$$
The necessary conditions become
$$\begin{bmatrix} U_1^T & 0 \\ U_{12}^T & B^T\end{bmatrix}\left\{\begin{bmatrix} c_1 \\ c_2\end{bmatrix} - \begin{bmatrix} U_1 & U_{12} \\ 0 & B\end{bmatrix}\begin{bmatrix}\beta_\psi \\ 0\end{bmatrix}\right\} = \mu\begin{bmatrix}\theta_\psi \\ u_2\end{bmatrix}.$$
Solving gives
$$U_1\beta_\psi = c_1 - \mu w_\psi,\quad w_\psi = U_1^{-T}\theta_\psi,\qquad \mu u_2 = B^T c_2 + \mu U_{12}^T w_\psi,\qquad \kappa = w_\psi^T c_1 - \mu\|w_\psi\|_2^2.$$
Note the linear relation between $\kappa$ and $\mu$, and the condition $-1 \le (u_2)_i \le 1$.
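A small sketch of these relations, assuming the selected set $\psi$ and the signs $\theta_\psi$ are simply taken as given (it does not check the remaining optimality conditions): it forms the partial orthogonal factorization from a full QR of the selected columns, computes $w_\psi$, and evaluates the affine map between $\kappa$ and $\mu$ together with the resulting $\beta_\psi$. All concrete data here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 5
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

psi = np.array([0, 2, 3])                  # assumed selected variables
theta = np.array([1.0, -1.0, 1.0])         # assumed signs of beta_psi
m = len(psi)

S, R = np.linalg.qr(X[:, psi], mode="complete")   # X[:, psi] = S [U_1; 0]
U1 = R[:m, :]
c1 = (S.T @ y)[:m]

w = np.linalg.solve(U1.T, theta)           # w_psi = U_1^{-T} theta_psi
kappa = 0.7                                # illustrative bound
mu = (w @ c1 - kappa) / (w @ w)            # affine relation between kappa and mu
beta_psi = np.linalg.solve(U1, c1 - mu * w)

print("mu =", mu)
print("theta^T beta_psi =", theta @ beta_psi, " (should equal kappa =", kappa, ")")
print("d mu / d kappa =", -1.0 / (w @ w))
```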

10 Descent direction
The key feature is explicit treatment of the $\ell_1$ constraint, by insisting that $\theta_\psi$ continues to define the norm constraint. Start with feasible $\beta$ such that $\theta_\psi^T\beta_\psi = \kappa$. Find $\beta_1 = \beta_\psi + h$ by solving
$$\min_{h:\ \theta_\psi^T h = 0}\ \|r(\beta_\psi + h)\|_2^2.$$
The Kuhn-Tucker conditions give
$$X_1^T r(\beta_\psi + h) = \mu\theta_\psi,\qquad \theta_\psi^T(\beta_\psi + h) = \kappa.$$
$h \ne 0$ is a descent direction: a small enough displacement in the direction $h$ retains feasibility and reduces the objective, since
$$\nabla\left\{\tfrac{1}{2}\|r\|_2^2\right\}h = -h^T X_1^T r = -h^T\left(X_1^T X_1 h + \mu\theta_\psi\right) = -h^T X_1^T X_1 h < 0,\qquad h \ne 0.$$
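A minimal sketch of this descent step, assuming the selected columns $X_1$, a feasible $\beta_\psi$ with $\theta_\psi^T\beta_\psi = \kappa$ and the signs $\theta_\psi$ are given: it solves the equality-constrained least squares problem through its KKT system and checks that the step $h$ stays on the constraint and reduces the objective. The data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, kappa = 30, 4, 2.0
X1 = rng.standard_normal((n, m))
y = rng.standard_normal(n)

theta = np.array([1.0, 1.0, -1.0, 1.0])
beta = theta * (kappa / m)                  # feasible start: theta^T beta = kappa

# KKT system for min ||y - X1 b||^2 subject to theta^T b = kappa:
#   [X1^T X1  theta] [ b ]   [X1^T y]
#   [theta^T     0 ] [mu ] = [kappa ]
K = np.block([[X1.T @ X1, theta[:, None]],
              [theta[None, :], np.zeros((1, 1))]])
rhs = np.concatenate([X1.T @ y, [kappa]])
sol = np.linalg.solve(K, rhs)
b_new, mu = sol[:m], sol[m]
h = b_new - beta

print("theta^T h =", theta @ h)             # ~ 0: the step stays on the constraint
print("old objective:", 0.5 * np.sum((y - X1 @ beta) ** 2))
print("new objective:", 0.5 * np.sum((y - X1 @ b_new) ** 2))   # smaller when h != 0
```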

11 Optimality test
By construction $\beta_1 = \beta_\psi + h$ satisfies the first necessary condition. If $\beta_1$ is feasible then say it is sign feasible. Now $u_2$ can be tested to see if the second necessary conditions are satisfied. If not, then there is an $s$ with $|(u_2)_s| > 1$. This condition triggers variable addition as follows.
1. Select an infeasible multiplier, say $(u_2)_s$, subject to the constraint that $\|B_s\|_2$ is not too small.
2. Update:
$$\psi \leftarrow \psi \cup \{s\},\qquad \beta_\psi \leftarrow \begin{bmatrix}\beta_\psi \\ \beta_s = 0\end{bmatrix},\qquad \theta_\psi \leftarrow \begin{bmatrix}\theta_\psi \\ \theta_s\end{bmatrix},\quad \theta_s = \operatorname{sgn}((u_2)_s).$$
It can be shown that this choice of $\theta_\psi$ ensures $\operatorname{sgn} h_\psi = \theta_\psi$ in the next descent step.

12 Otherwise
If the incremented $\beta$ is not sign feasible, then move to the first new zero of $\beta$ in the direction defined by $h \ne 0$, i.e. to the smallest $\gamma$ with
$$\beta_{\psi(k)} = (\beta_\psi)_k + \gamma h_k = 0,\qquad 0 < \gamma < 1.$$
Now there are two possibilities.
1. Set $\theta_k \leftarrow -\theta_k$ and recompute $h$. If the new $h$ gives a descent direction consistent with the updated sign feasibility requirement, then continue. This step is relatively cheap.
2. Else reset $\psi \leftarrow \psi \setminus \{\psi(k)\}$, reset $\beta_\psi$, $\theta_\psi$, downdate the factorization, and recompute $h$. This is the backtrack step that derails greedy algorithms.

13 Piecewise linear solution trajectory
Need a key result: the minimum of a positive definite quadratic form subject to a bound on the $\ell_1$ norm of the variables is stable, in the sense that small perturbations in the data lead to small perturbations in the minimum. Start with the necessary conditions
$$U_1\beta_\psi = c_1 - \mu w_\psi,\qquad \mu u_2 = B^T c_2 + \mu U_{12}^T w_\psi,\qquad \mu = \frac{w_\psi^T c_1 - \kappa}{w_\psi^T w_\psi}.$$
If at the initial $\kappa$ both $\mu\|u_2\|_\infty < \mu$ and $|\beta_{\psi(i)}| > 0$, $i = 1, 2, \ldots, |\psi|$, then differentiating the necessary conditions gives
$$\frac{d\mu}{d\kappa} = -\frac{1}{w_\psi^T w_\psi},\qquad U_1\frac{d\beta_\psi}{d\kappa} = \frac{1}{w_\psi^T w_\psi}\,w_\psi,\qquad \frac{d(\mu u_2)}{d\kappa} = -\frac{1}{w_\psi^T w_\psi}\,U_{12}^T w_\psi.$$

14 Solution trajectory
The right hand side of these ODEs is independent of $\kappa$, so the solution trajectory is piecewise linear. This means it is a simple and effective computation to follow the solution trajectory until the basic assumptions break down! The continuity guaranteed by the perturbation result now shows how to restart at the breakpoints. This observation is the basis for the homotopy algorithm of Osborne, Presnell, and Turlach. It proves to be remarkably efficient, computing the entire solution trajectory in little more than the cost of solving the unconstrained problem and returning significant additional information. It links to the standard least squares solution algorithm based on orthogonal factorization by using standard stepwise updating techniques.

15 Breakpoints
The solution process breaks down at values of $\kappa$ for which either $\beta_{\psi(j)} = 0$ or $(u_2)_j = \pm 1$.
If $\beta_{\psi(i)} = 0$ then $\psi^c \leftarrow \psi^c \cup \{\psi(i)\}$. The corresponding component of $u_2$ is $\theta_i$; it must move into the interior of $[-1, 1]$ from its bound as $\kappa$ increases in order to preserve solution continuity. This step deletes a variable from the selection.
If $(u_2)_j = \pm 1$ then $\psi \leftarrow \psi \cup \{\psi^c(j)\}$. The corresponding component of $\beta$ must move away from $0$ as $\kappa$ increases. The rule $\theta_\psi = \operatorname{sgn}(u_2)_j$ applies as in the descent algorithm. This step adds a variable to the solution.

16 Properties
The number of piecewise linear pieces in the homotopy trajectory is finite: if $\psi$ repeats at $\kappa_1, \kappa_2$ with $\kappa_1 < \kappa_2$, then it holds for all $\kappa$ in between by linearity. $\|r\|_2^2$ is monotone decreasing as $\kappa < \kappa_{LS}$ increases:
$$\frac{1}{2}\frac{d\|r\|_2^2}{d\kappa} = -r^T X\frac{d\beta}{d\kappa} = -\mu u^T\frac{d\beta}{d\kappa} = -\mu\,\theta_\psi^T\,\frac{1}{\|w_\psi\|_2^2}U_1^{-1}w_\psi = -\mu < 0.$$
To start, note that if a unique maximum of $|X_i^T y|$ occurs when $i = s$, then the optimal solution for $\kappa$ small enough is
$$\psi = \{s\},\qquad \mu = |X_s^T y| - \kappa\|X_s\|_2^2,\qquad \beta_s = \theta_s\kappa,\quad \theta_s = \operatorname{sgn}(X_s^T y).$$
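The starting point translates directly into code. The sketch below (a fragment, not the full homotopy) picks the variable $s$ with the largest $|X_s^T y|$, follows the first linear segment $\beta_s = \theta_s\kappa$, $\mu = |X_s^T y| - \kappa\|X_s\|_2^2$, and locates the first breakpoint, the smallest $\kappa > 0$ at which another column attains $|X_j^T r| = \mu$ and so would enter the selection. The data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 6
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)

c = X.T @ y
s = int(np.argmax(np.abs(c)))               # first selected variable
theta_s = np.sign(c[s])
norm2_s = X[:, s] @ X[:, s]

# On the first segment X_j^T r(kappa) = c_j - kappa * theta_s * X_j^T X_s and
# mu(kappa) = |c_s| - kappa * ||X_s||^2; the segment ends at the smallest
# kappa > 0 with |X_j^T r(kappa)| = mu(kappa) for some j != s.
candidates = []
for j in range(p):
    if j == s:
        continue
    a = theta_s * (X[:, j] @ X[:, s])
    for sign in (+1.0, -1.0):
        denom = norm2_s - sign * a
        numer = np.abs(c[s]) - sign * c[j]
        if denom > 1e-12 and numer > 1e-12:
            candidates.append((numer / denom, j))

kappa_break, j_enter = min(candidates)
print("first variable:", s, " breakpoint kappa:", kappa_break,
      " entering variable:", j_enter)

beta = np.zeros(p)
beta[s] = theta_s * kappa_break             # sanity check at the breakpoint
r = y - X @ beta
print("|X_j^T r| =", abs(X[:, j_enter] @ r),
      " mu =", np.abs(c[s]) - kappa_break * norm2_s)
```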

17 Extensions
It turns out that what is important is that the objective should be strictly convex, have degree no more than 2, and have a continuous first derivative. Cases considered include: piecewise quadratics with $C^1$ smoothness; variable selection using the Huber M-estimator, which gives a $C^1$ combination of quadratic and linear pieces; quadratic spline approximation of log likelihood functions, where $L = \sum_i L(r_i(\beta))$ has been considered. Sorting out the pieces adds an extra level of complexity, as breakpoints occur when pieces change.
More general constraints. For example, the signed rank objective $\sum_{i=1}^p w_i|\beta|_{\tau(i)}$, where $w_i \ge 0$ and $\tau(\cdot)$ ranks the variables in increasing order of magnitude. Turlach et al. consider simultaneous selection of a common set of predictor variables for several objectives.

18 $\ell_1$ objective
A number of applications which involve polyhedral objectives and lasso-like constraints have been considered. Perhaps variable selection in quantile regression has received most attention; the $\ell_1$ lasso corresponds to the quantile parameter set to $0.5$:
$$\min_\beta \|r\|_1,\qquad \|\beta\|_1 \le \kappa.$$
Need the Lagrangian form with multiplier $\lambda$,
$$\mathcal{L}(\beta, \lambda) = \|r\|_1 + \lambda\left\{\|\beta\|_1 - \kappa\right\},$$
which is convex if $\lambda \ge 0$. The necessary conditions give
$$0 \in \partial_\beta\mathcal{L}(\beta, \lambda) = \partial_\beta\|r\|_1 + \lambda\,\partial_\beta\|\beta\|_1.$$
This is the condition for the minimum of the $\ell_1$ minimization problem ($\lambda$ fixed)
$$\min_\beta\left\{\|r\|_1 + \lambda\|\beta\|_1\right\}.$$
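For a fixed $\lambda$ the Lagrangian form is a linear program. The sketch below sets it up with scipy's linprog by splitting $\beta$ and the residual into positive and negative parts, minimising $\sum(s^+ + s^-) + \lambda\sum(b^+ + b^-)$ subject to $X(b^+ - b^-) + s^+ - s^- = y$. It illustrates the plain LP formulation being contrasted with the homotopy here, not the two-phase algorithm itself; the data and $\lambda$ are placeholders.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(5)
n, p, lam = 40, 6, 0.5
X = rng.standard_normal((n, p))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)

# variables z = [b_plus (p), b_minus (p), s_plus (n), s_minus (n)], all >= 0
c = np.concatenate([lam * np.ones(2 * p), np.ones(2 * n)])
A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])     # X beta + s_plus - s_minus = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")

beta = res.x[:p] - res.x[p:2 * p]
r = y - X @ beta
print("||r||_1 + lam*||beta||_1 =", np.abs(r).sum() + lam * np.abs(beta).sum())
print("linprog optimal value    =", res.fun)
print("beta =", np.round(beta, 3))
```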

19 When LP won't do!
Basic LP: no line search possibility.
$\ell_1$: structurally different; line search is important.

20 Residual zeros are non-smooth points for $\ell_1$
To follow the zeros set $\sigma = \{i : r_i = 0\}$, $\psi = \{i : \beta_i \ne 0\}$. Define the set complements by $\sigma \cup \sigma^c = \{1, 2, \ldots, n\}$, $\psi \cup \psi^c = \{1, 2, \ldots, p\}$, and permutation matrices $P_\sigma : \mathbb{R}^n \to \mathbb{R}^n$, $Q_\psi : \mathbb{R}^p \to \mathbb{R}^p$ by
$$P_\sigma r = \begin{bmatrix} r_1 \\ r_2\end{bmatrix},\quad (r_1)_i = r_{\sigma^c(i)} \ne 0,\ i = 1, \ldots, n - |\sigma|,\quad (r_2)_i = r_{\sigma(i)} = 0,\ i = 1, \ldots, |\sigma|,$$
$$Q_\psi\beta = \begin{bmatrix}\beta_1 \\ \beta_2\end{bmatrix},\quad (\beta_1)_i = \beta_{\psi(i)} \ne 0,\ i = 1, \ldots, |\psi|,\quad (\beta_2)_i = \beta_{\psi^c(i)} = 0,\ i = 1, \ldots, p - |\psi|,$$
$$P_\sigma X Q_\psi^T = \begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22}\end{bmatrix},\qquad P_\sigma y = \begin{bmatrix} y_1 \\ y_2\end{bmatrix}.$$

21 Necessary conditions
Have subdifferential components for the permuted system,
$$\begin{bmatrix}\theta_\sigma^T & v_\sigma^T\end{bmatrix} \in \partial\|P_\sigma r\|_1,\qquad \begin{bmatrix}\theta_\psi^T & u_\psi^T\end{bmatrix} \in \partial\|Q_\psi\beta\|_1.$$
These permit the necessary conditions to be written
$$\begin{bmatrix}\theta_\sigma^T & v_\sigma^T\end{bmatrix}\begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22}\end{bmatrix} = \lambda\begin{bmatrix}\theta_\psi^T & u_\psi^T\end{bmatrix},\qquad \lambda \ge 0,$$
$$-1 \le v_i \le 1,\ i = 1, \ldots, |\sigma|,\qquad -1 \le u_i \le 1,\ i = 1, \ldots, |\psi^c|,$$
$$\theta_\sigma^T r_1 = \begin{bmatrix}\theta_\sigma^T & v_\sigma^T\end{bmatrix}P_\sigma r = \|r\|_1,\qquad \theta_\psi^T\beta_1 = \begin{bmatrix}\theta_\psi^T & u_\psi^T\end{bmatrix}Q_\psi\beta = \|\beta\|_1 = \kappa.$$

22 Structure of the homotopy The new feature of the extension of the continuation algorithm to the non-smooth case is that it involves two distinct phases. The first uses essentially the constrained form of the problem which involves κ explicitly but not the Lagrange multiplier λ, while the second uses the Lagrangian form which involves the multiplier explicitly but not the constraint bound.

23 Varying $\kappa$: first homotopy phase
Start with $\kappa = \bar\kappa > 0$, $\bar\kappa$ in an open interval, with $\beta$ determined by the conditions $r_i = 0$, $i \in \sigma$, $\|\beta\|_1 = \kappa$. This gives the conditions
$$|\sigma| = |\psi| - 1,\qquad \theta_\psi^T\beta_1 = \kappa,\qquad X_{21}\beta_1 = y_2.$$
Note $X_{21}$ has full row rank $|\sigma|$. Differentiating gives
$$\theta_\psi^T\frac{d\beta_1}{d\kappa} = 1,\qquad X_{21}\frac{d\beta_1}{d\kappa} = 0,\qquad \frac{d\beta_1}{d\kappa} = \begin{bmatrix}\theta_\psi^T \\ X_{21}\end{bmatrix}^{-1}e_1.$$
So $d\beta/d\kappa$ is constant in a neighbourhood of $\bar\kappa$. It follows that $\beta$ is piecewise linear on intervals of increase of $\kappa$.
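A sketch of the $\kappa$-phase direction computation, with the index sets $\sigma$ (zero residuals) and $\psi$ (nonzero coefficients), $|\sigma| = |\psi| - 1$, and the signs $\theta_\psi$ simply assumed known (no optimality or multiplier conditions are checked): it forms the block $X_{21}$ from the partition of slide 20 and solves the square system for $d\beta_1/d\kappa$. The concrete indices and signs are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 10, 4
X = rng.standard_normal((n, p))

psi = np.array([0, 2, 3])              # assumed nonzero-coefficient variables
sigma = np.array([1, 5])               # assumed zero-residual observations, |sigma| = |psi| - 1
theta = np.array([1.0, -1.0, 1.0])     # assumed signs of beta_1

X21 = X[np.ix_(sigma, psi)]            # block of the permuted design

# [ theta^T ]                    [ 1 ]
# [  X21    ] dbeta_1/dkappa  =  [ 0 ]
M = np.vstack([theta[None, :], X21])
e1 = np.zeros(len(psi))
e1[0] = 1.0
dbeta1 = np.linalg.solve(M, e1)

print("dbeta_1/dkappa =", dbeta1)
print("theta^T dbeta_1/dkappa =", theta @ dbeta1)   # = 1
print("X21 dbeta_1/dkappa =", X21 @ dbeta1)         # = 0: residuals in sigma stay zero
```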

24 More from the necessary conditions
These give
$$\theta_\sigma^T X_{11} + v_\sigma^T X_{21} = \lambda\theta_\psi^T.$$
Differentiating,
$$\frac{dv_\sigma^T}{d\kappa}X_{21} = \frac{d\lambda}{d\kappa}\theta_\psi^T.$$
Post-multiplying by $d\beta_1/d\kappa$ gives $d\lambda/d\kappa = 0$. Similar arguments give $dv_\sigma/d\kappa = 0$ and $du_\psi/d\kappa = 0$. Thus $\lambda$, $v_\sigma$, $u_\psi$ are constant on intervals of increase of $\kappa$.

25 Properties
$d\beta/d\kappa$ is a descent direction for minimizing $\|r\|_1$:
$$\|r\|_1'\!\left(\beta;\frac{d\beta}{d\kappa}\right) = \sup_{z\in\partial\|r\|_1}\left(-z^T X\frac{d\beta}{d\kappa}\right) = -\theta_\sigma^T X_{11}\frac{d\beta_1}{d\kappa} = -\lambda\theta_\psi^T\frac{d\beta_1}{d\kappa} = -\lambda\theta_\psi^T\begin{bmatrix}\theta_\psi^T \\ X_{21}\end{bmatrix}^{-1}e_1 = -\lambda < 0.$$
There are two possibilities for terminating the $\kappa$ step:
1. A new zero residual occurs, corresponding to row $\sigma^c(k)$ of $X_{11}$. Actions: $\sigma^c(k) \to \sigma(1)$, $v_1(\lambda_0) = \operatorname{sgn}(r_{\sigma^c(k)})$.
2. $(\beta_1)_j = 0$. Actions: $\psi(j) \to \psi^c(1)$, $u_1 = \operatorname{sgn}((\beta_1)_j(\bar\kappa))$.
The sign conditions are necessary to preserve optimality.

26 Varying $\lambda$: second homotopy phase
Have made the $\kappa$-step $\kappa_0 \le \kappa \le \kappa_1$. Update possibilities:
1. $(r_1(\kappa_1))_k = 0$:
$$X_{21} \leftarrow \begin{bmatrix}(X_{11})_k \\ X_{21}\end{bmatrix},\qquad y_2 \leftarrow \begin{bmatrix}(y_1)_k \\ y_2\end{bmatrix}.$$
2. $(\beta_1(\kappa_1))_j = 0$. Action: remove column $j$ from $X_{21}$.
Now $X_{21}$ is full rank with $|\sigma| = |\psi|$, and $X_{21}\beta_1 = y_2$ fixes both $\beta_1(\kappa_1)$ and $\kappa_1$.

27 Governing DE for the $\lambda$ step
Differentiating the necessary conditions gives
$$\begin{bmatrix} 0 & \dfrac{dv_\sigma^T}{d\lambda}\end{bmatrix}\begin{bmatrix} X_{11} & X_{12} \\ X_{21} & X_{22}\end{bmatrix} = \begin{bmatrix}\theta_\psi^T & \dfrac{d(\lambda u_\psi^T)}{d\lambda}\end{bmatrix}.$$
Thus
$$\frac{dv_\sigma^T}{d\lambda}X_{21} = \theta_\psi^T,\qquad \frac{d(\lambda u_\psi^T)}{d\lambda} = \frac{dv_\sigma^T}{d\lambda}X_{22}.$$
It follows that $dv_\sigma/d\lambda$ and $d(\lambda u_\psi)/d\lambda$ are constant.

28 Reducing $\lambda$
The necessary conditions continue to hold as $\lambda$ is reduced while $\kappa = \kappa_1$. Two cases determine how to terminate this phase.
1. A component of $u_\psi$ is first to reach a bound (say $u_q = e_q^T u_\psi$). Then
(a) $\psi \leftarrow \psi \cup \{\psi^c(q)\}$;
(b) the increase-$\kappa$ phase recommences;
(c) the corresponding component of $\beta$ moves away from $0$ with the sign of the bound.
2. A component of $v_\sigma$ is first to reach a bound. Then
(a) remove the corresponding index from $\sigma$: $\sigma \leftarrow \sigma \setminus \{\sigma(q)\}$;
(b) commence the next $\kappa$ phase;
(c) $r_q$ moves from zero with the sign of $v_q$.

29 Results: LSQ homotopy
Table: step counts for the homotopy algorithm with the least squares objective (columns $p$, $n$, XA, XD; data sets Hald, Iowa, diabetes, housing).
Here XA steps add a variable, $|\psi| \to |\psi| + 1$, while XD steps delete a variable, $|\psi| \to |\psi| - 1$. Variable addition is much the most common action. This explains the observed efficiency. Tibshirani noted that addition is the only action when the columns of the design are orthogonal.

30 Results: $\ell_1$ homotopy
Table: step counts for the homotopy algorithm with the $\ell_1$ objective (columns $p$, $n$, SASD, SAXA, XDXA, XDSD; data sets Hald, Iowa, diabetes, housing).
The new feature here is that residual sign changes trigger points of non-differentiability. SA and SD indicate addition and deletion of entries in $\sigma$. This is where the extra work is being done, as $r$ adapts to the required sign structure. Double entries (e.g. SA followed by SD) reflect the two phases at each step of the computation.

31 SASD breakdown
This table shows that consecutive SASD phases need not complete an $\ell_1$ minimisation in the subspace defined by the non-zero $\beta$ components.
Table: example of a backtrack step, diabetes data (columns $\kappa$, variables at 0, subspace, SASD steps).

32 Variable trajectories
Figure: the homotopy algorithm illustrated on the diabetes data, plotting the coefficients $\beta_i(\kappa)$ against $\kappa$. The left panel shows the complete homotopy; the numbers on the right of this panel label the solution components. The right panel is a magnification of the initial part of the homotopy, illustrating the large number of SASD steps that are taken.

33 $\ell_1$ descent calculations
Table: $\ell_1$ descent calculations on the diabetes data (columns $\lambda$, $\ell_1$ iterations, solution zeros, variables selected).
Random initialisation is used. Ten steps is the minimum needed for each $\lambda$. The total number of iterations is 283. The descent algorithm used a secant-based line search.

34 Two-class classification problem
The idea is: given training data $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, 1\}$, find a rule so that, given a new $x$, a class from $\{-1, 1\}$ can be assigned. Use the $\ell_1$-norm SVM:
$$\min_{\beta_0, \beta}\ \sum_{i=1}^n\left[1 - y_i\left(\beta_0 + \sum_{j=1}^p\beta_j h_j(x_i)\right)\right]_+\qquad\text{subject to}\quad \|\beta\|_1 \le \kappa.$$
The basic algorithm applies with very minor modifications to take account of the unconstrained variable $\beta_0$. The fitted model is
$$\hat f(x) = \hat\beta_0 + \sum_{j=1}^p\hat\beta_j h_j(x),$$
and the class assignment is given by $\operatorname{sgn}\hat f(x)$.
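The $\ell_1$-norm SVM is itself an LP, and a minimal linprog sketch is given below. It uses identity basis functions $h_j(x) = x_j$ and hinge-loss slack variables; the simulated two-class data and the bound $\kappa$ are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)
n, p, kappa = 60, 4, 2.0
Xdat = rng.standard_normal((n, p))
ylab = np.sign(Xdat @ np.array([1.5, -1.0, 0.0, 0.0]) + 0.3 * rng.standard_normal(n))

# variables z = [beta0 (free), b_plus (p), b_minus (p), xi (n)]
c = np.concatenate([[0.0], np.zeros(2 * p), np.ones(n)])      # minimise sum of hinge slacks
# hinge constraints: 1 - y_i (beta0 + x_i^T beta) <= xi_i
#   ->  -y_i*beta0 - y_i*x_i^T (b_plus - b_minus) - xi_i <= -1
A_hinge = np.hstack([-ylab[:, None],
                     -ylab[:, None] * Xdat,
                      ylab[:, None] * Xdat,
                     -np.eye(n)])
A_l1 = np.concatenate([[0.0], np.ones(2 * p), np.zeros(n)])[None, :]   # sum(b+ + b-) <= kappa
A_ub = np.vstack([A_hinge, A_l1])
b_ub = np.concatenate([-np.ones(n), [kappa]])
bounds = [(None, None)] + [(0, None)] * (2 * p + n)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
beta0 = res.x[0]
beta = res.x[1:1 + p] - res.x[1 + p:1 + 2 * p]
pred = np.sign(beta0 + Xdat @ beta)
print("beta =", np.round(beta, 3), " training accuracy =", np.mean(pred == ylab))
```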

35 Dantzig selector [3]
Variable selection when $p \gg n$ and a local approximate orthogonality condition, called the uniform uncertainty principle, holds. The basic form is
$$\min_\beta \|\beta\|_1,\qquad \|X^T r\|_\infty \le (1 + t^{-1})\,\sigma\sqrt{2\log p},$$
where $\sigma$ is the noise standard deviation and $t > 0$ is a parameter whose choice affects the level of confidence in the results. This is equivalent to a problem of the form
$$\min_\beta \|X^T r\|_\infty,\qquad \|\beta\|_1 \le \kappa,$$
which has similar necessary conditions with $\lambda \to 1/\lambda$, so it fits the lasso framework. It is also trying to make the standard least squares criterion small, and normal errors are assumed.
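The Dantzig selector is likewise a linear program; a minimal linprog sketch with a user-chosen tolerance delta, standing in for $(1 + t^{-1})\sigma\sqrt{2\log p}$, is given below. The data and delta are placeholder choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(8)
n, p, delta = 30, 8, 1.0                 # delta stands in for (1 + 1/t) sigma sqrt(2 log p)
X = rng.standard_normal((n, p))
y = X @ np.concatenate([[3.0, -2.0], np.zeros(p - 2)]) + 0.1 * rng.standard_normal(n)

# variables z = [b_plus (p), b_minus (p)] >= 0; minimise ||beta||_1 = sum(z)
G = X.T @ X
c = np.ones(2 * p)
A_ub = np.vstack([np.hstack([ G, -G]),   #  X^T X beta <=  X^T y + delta
                  np.hstack([-G,  G])])  # -X^T X beta <= -X^T y + delta
b_ub = np.concatenate([X.T @ y + delta, delta - X.T @ y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")

beta = res.x[:p] - res.x[p:]
print("||X^T r||_inf =", np.max(np.abs(X.T @ (y - X @ beta))), " (<= delta =", delta, ")")
print("beta =", np.round(beta, 3))
```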

36 Another LP caution
[3] suggest that an LP be used to implement the Dantzig selector. There are two ways to pose the max-norm approximation problem as an LP.
Descent:
$$\min_{h,\beta}\ h\qquad\text{subject to}\quad -he \le X\beta - y \le he.$$
Ascent:
$$\max\ \begin{bmatrix} y^T & -y^T\end{bmatrix}u,\qquad u \ge 0,\qquad \begin{bmatrix} e^T & e^T \\ X^T & -X^T\end{bmatrix}u = e_1.$$
The ascent algorithm is identical to the first algorithm of Remes. It performs well with systematic data ($p$-step second order convergence), while the descent algorithm is $O(n^2)$.
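The two formulations can be checked against one another numerically. The sketch below solves both the descent (primal) and ascent (dual) forms of the $\ell_\infty$ approximation problem with linprog and prints the matching optimal values; it only illustrates the pair of LP formulations, not the Remes exchange algorithm or its convergence behaviour, and the data are placeholders.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(9)
n, p = 25, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Descent form: min h subject to -h e <= X beta - y <= h e
#   variables z = [beta (p, free), h (>= 0)]
c_d = np.concatenate([np.zeros(p), [1.0]])
A_ub = np.vstack([np.hstack([ X, -np.ones((n, 1))]),    #   X beta - y  <= h e
                  np.hstack([-X, -np.ones((n, 1))])])   # -(X beta - y) <= h e
b_ub = np.concatenate([y, -y])
primal = linprog(c_d, A_ub=A_ub, b_ub=b_ub,
                 bounds=[(None, None)] * p + [(0, None)], method="highs")

# Ascent form: max [y^T -y^T] u subject to u >= 0,
#   [X^T -X^T] u = 0 and [e^T e^T] u = 1
c_a = -np.concatenate([y, -y])                          # linprog minimises
A_eq = np.vstack([np.hstack([X.T, -X.T]),
                  np.ones((1, 2 * n))])
b_eq = np.concatenate([np.zeros(p), [1.0]])
dual = linprog(c_a, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")

print("descent optimum (min h):", primal.fun)
print("ascent optimum         :", -dual.fun)            # equal by LP duality
```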

37 References: quadratic objective
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
M. R. Osborne, B. Presnell, and B. A. Turlach. A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20:389-403, 2000.
B. A. Turlach, W. N. Venables, and S. J. Wright. Simultaneous variable selection. Technometrics, 47(3):349-363, 2005.
S. Rosset and J. Zhu. Piecewise linear regularised solution paths. Annals of Statistics, 35(3):1012-1030, 2007.

38 References: piecewise linear objective
M. R. Osborne. Simplicial Algorithms for Minimizing Polyhedral Functions. Cambridge University Press, 2001.
J. Zhu, T. Hastie, S. Rosset, and R. Tibshirani. $\ell_1$-norm support vector machines. Advances in Neural Information Processing Systems, 16:49-56.
E. Candes and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics, 35(6):2313-2351, 2007.
