LARS/lasso
Statistics 312/612, Fall 2010

Homework 9 stepped you through the lasso modification of the LARS algorithm, based on the papers by Efron, Hastie, Johnstone, and Tibshirani (2004) (= the LARS paper) and by Rosset and Zhu (2007). I made a few small changes to the algorithm in the LARS paper. I used the diabetes data set in R (the data used as an illustration in the LARS paper) to test my modification:

    library(lars)
    data(diabetes)                                   # load the data set
    LL = lars(diabetes$x, diabetes$y, type = "lasso")
    plot(LL, xvar = "step")                          # one variation on the usual plot

[Figure: output of plot(LL, xvar = "step") -- LASSO standardized coefficients plotted against step number.]

My modified algorithm getlasso() produces the same sequence of coefficients as the R function lars() when run on the diabetes data set:

    J = getlasso(diabetes$y, diabetes$x)
    round(LL$beta[-1,] - t(J$output["coef.end",,]), 6)   # gives all zeros

I am fairly confident that my changes lead to the same sequences of lasso fits. I find my plots, which are slightly different from those produced by the lars library, more helpful for understanding the algorithm:

    show2(J)    # my plot

[Figure: "Modified lasso algorithm for diabetes data" -- the coefficient paths (panel labelled "bj's") and the correlation paths (panel labelled "Cj's"), with the diabetes predictors labelled as they enter the active set. For higher resolution, see the pdf files attached to this handout.]

1 The Lasso problem

The problem is: given an $n \times 1$ vector $y$ and an $n \times p$ matrix $X$, find the $\hat b(\lambda)$ that minimizes
$$L_\lambda(b) = \|y - Xb\|^2 + 2\lambda \sum_j |b_j|$$
for each $\lambda$. [The extra factor of 2 eliminates many factors of 2 in what follows.] The columns of $X$ will be assumed to be standardized to have zero means and $\|X_j\| = 1$. I also seem to need linear independence of various subsets of columns of $X$, which would be awkward if $p$ were larger than $n$.

Remark. I thought the vector $y$ was also supposed to have a zero mean. That is not the case for the diabetes data set in R. It does not seem to be needed for the algorithm to work.

I will consider only the "one at a time" case (page 417 of the LARS paper), for which the active set of predictors $X_j$ changes only by either addition or deletion of a single predictor.
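Here is a minimal R sketch of the setup just described (illustrative code only, not from lasso.r; the names standardize and L.lambda are made up): a helper that standardizes the columns of $X$ as assumed above, and a direct evaluation of the criterion $L_\lambda(b)$.

    # Standardize the columns of X to zero means and unit Euclidean norm,
    # as assumed for the lasso problem above.
    standardize <- function(X) {
      X <- scale(X, center = TRUE, scale = FALSE)   # zero column means
      sweep(X, 2, sqrt(colSums(X^2)), "/")          # unit column lengths
    }

    # The penalized criterion L_lambda(b) = ||y - X b||^2 + 2 lambda sum_j |b_j|.
    L.lambda <- function(b, lambda, y, X) {
      sum((y - X %*% b)^2) + 2 * lambda * sum(abs(b))
    }

Applied to diabetes$x this should change the columns very little if, as I believe, they are already standardized in the lars package.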

2 Directional derivative

<1> The function $L_\lambda$ has derivative at $b$ in the direction $u$ defined by
$$L'_\lambda(b, u) = \lim_{t \downarrow 0} \frac{L_\lambda(b + tu) - L_\lambda(b)}{t} = 2\sum_j \left[\lambda D(b_j, u_j) - (y - Xb)'X_j u_j\right]$$
where
$$D(b_j, u_j) = u_j\{b_j > 0\} - u_j\{b_j < 0\} + |u_j|\{b_j = 0\}.$$
By convexity, a vector $b$ minimizes $L_\lambda$ if and only if $L'_\lambda(b, u) \ge 0$ for every $u$. Equivalently, for every $j$ and every $u_j$ the $j$th summand in <1> must be nonnegative. [Consider $u$ vectors with only one nonzero component to establish this equivalence.] That is, $b$ minimizes $L_\lambda$ if and only if
$$\lambda D(b_j, u_j) \ge (y - Xb)'X_j u_j \qquad \text{for every } j \text{ and every } u_j.$$
When $b_j \ne 0$ the inequalities for $u_j = \pm 1$ imply an equality; for $b_j = 0$ we get only the inequality. Thus $b$ minimizes $L_\lambda$ if and only if
$$\lambda = X_j'R \ \text{ if } b_j > 0, \qquad \lambda = -X_j'R \ \text{ if } b_j < 0, \qquad \lambda \ge |X_j'R| \ \text{ if } b_j = 0, \qquad \text{where } R := y - Xb.$$

The LARS/lasso algorithm recursively calculates a sequence of breakpoints $\infty > \lambda_1 > \lambda_2 > \dots$, with $\hat b(\lambda)$ linear in $\lambda$ on each interval $\lambda_{k+1} \le \lambda \le \lambda_k$. Define the residual vector and correlations
$$R(\lambda) := y - X\hat b(\lambda) \qquad \text{and} \qquad C_j(\lambda) := X_j'R(\lambda).$$

Remark. To get a true correlation we would have to divide by $\|R(\lambda)\|$, which would complicate the constraints.

<2> The algorithm will ensure that
$$\lambda = C_j(\lambda) \quad \text{if } \hat b_j(\lambda) > 0 \qquad (\text{constraint } +)$$
$$\lambda = -C_j(\lambda) \quad \text{if } \hat b_j(\lambda) < 0 \qquad (\text{constraint } -)$$
$$\lambda \ge |C_j(\lambda)| \quad \text{if } \hat b_j(\lambda) = 0 \qquad (\text{constraint } 0)$$
That is, for the minimizing $\hat b(\lambda)$ each point $(\lambda, C_j(\lambda))$ needs to stay inside the region $\mathcal{R} = \{(\lambda, c) \in \mathbb{R}^+ \times \mathbb{R} : |c| \le \lambda\}$, moving along the top boundary ($c = \lambda$) when $\hat b_j(\lambda) > 0$ (constraint $+$), along the lower boundary ($c = -\lambda$) when $\hat b_j(\lambda) < 0$ (constraint $-$), and being anywhere in $\mathcal{R}$ when $\hat b_j(\lambda) = 0$ (constraint $0$).
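A direct R check of the conditions in <2>, for a given coefficient vector $b$ and value of $\lambda$ (again illustrative code, not from lasso.r, with an arbitrary tolerance):

    # Check the conditions <2> for a given coefficient vector b and penalty lambda.
    check.constraints <- function(b, lambda, y, X, tol = 1e-8) {
      C <- drop(crossprod(X, y - X %*% b))             # C_j = X_j'(y - X b)
      c(plus  = all(abs(C[b > 0] - lambda) < tol),     # constraint (+): C_j =  lambda
        minus = all(abs(C[b < 0] + lambda) < tol),     # constraint (-): C_j = -lambda
        zero  = all(abs(C[b == 0]) <= lambda + tol))   # constraint (0): |C_j| <= lambda
    }

It can be applied, for example, to the coefficients returned at each breakpoint by getlasso(), or to the lars() output after matching that function's scaling conventions.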

3 The algorithm

The solution $\hat b(\lambda)$ is continuous in $\lambda$ and linear on intervals defined by change points $\infty = \lambda_0 > \lambda_1 > \lambda_2 > \dots$. The construction proceeds in steps, starting with large $\lambda$ and working towards $\lambda = 0$. The vector of fitted values $\hat f(\lambda) = X\hat b(\lambda)$ is also piecewise linear. Within each interval $(\lambda_{k+1}, \lambda_k)$ only the active subset $A = A_k = \{j : \hat b_j(\lambda) \ne 0\}$ of the coefficients changes; the inactive coefficients stay fixed at zero.

3.1 Some illuminating special cases

It helped me to work explicitly through the first few steps before thinking about the equations that define a general step in the algorithm.

Start with $A = \emptyset$ and $\hat b(\lambda) = 0$ for $\lambda \ge \lambda_1 := \max_j |X_j'y|$. Constraint (0) is satisfied on $[\lambda_1, \infty)$.

Step 1. Constraint (0) would be violated if we kept $\hat b(\lambda)$ equal to zero for $\lambda < \lambda_1$, because we would have $\max_j |C_j(\lambda)| > \lambda$. The $\hat b(\lambda)$ must move away from zero as $\lambda$ decreases below $\lambda_1$. We must have $|C_j(\lambda_1)| = \lambda_1$ for at least one $j$. For convenience of exposition, suppose $C_1(\lambda_1) = \lambda_1 > |C_j(\lambda_1)|$ for all $j \ge 2$. The active set now becomes $A = \{1\}$.

For $\lambda_2 \le \lambda < \lambda_1$, with $\lambda_2$ to be specified soon, keep $\hat b_j(\lambda) = 0$ for $j \ge 2$ but let $\hat b_1(\lambda) = 0 + v_1(\lambda_1 - \lambda)$ for some constant $v_1$. To maintain the equalities
$$\lambda = C_1(\lambda) = X_1'\left(y - X_1\hat b_1(\lambda)\right) = C_1(\lambda_1) - X_1'X_1\, v_1(\lambda_1 - \lambda) = \lambda_1 - v_1(\lambda_1 - \lambda)$$
we need $v_1 = 1$. This choice also ensures that $\hat b_1(\lambda) > 0$ for a while, so that $(+)$ is the relevant constraint for $\hat b_1$.

For $\lambda < \lambda_1$, with $v_1 = 1$ we have
$$R(\lambda) = y - X_1(\lambda_1 - \lambda) \qquad \text{and} \qquad C_j(\lambda) = C_j(\lambda_1) - a_j(\lambda_1 - \lambda) \quad \text{where } a_j := X_j'X_1.$$
Notice that $|a_j| < 1$ unless $X_j = \pm X_1$. Also, as long as $\max_{j \ge 2} |C_j(\lambda)| \le \lambda$ the other $\hat b_j$'s still satisfy constraint (0).
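As a concrete check (illustrative code only, using the diabetes data loaded earlier), $\lambda_1$, the first variable to enter, and the step-1 slopes $a_j$ can be computed directly; for the diabetes data the first variable should turn out to be bmi, as in the plots.

    # lambda_1 = max_j |X_j' y| and the first variable to enter the active set,
    # for the (already standardized) diabetes predictors loaded above.
    X  <- diabetes$x
    y  <- diabetes$y
    C0 <- drop(crossprod(X, y))        # C_j(lambda_1), since b-hat = 0 there
    lambda1 <- max(abs(C0))
    j1 <- which.max(abs(C0))           # first active variable (should be bmi)
    s1 <- sign(C0[j1])                 # the exposition above takes this sign to be +1

    # During step 1:  b-hat_{j1}(lambda) = s1 * (lambda_1 - lambda)  and
    # C_j(lambda) = C_j(lambda_1) - a_j * (lambda_1 - lambda),  a_j = s1 * X_j' X_{j1}.
    a <- s1 * drop(crossprod(X, X[, j1]))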

We need to end the first step at $\lambda_2$, the largest $\lambda$ less than $\lambda_1$ for which $\max_{j \ge 2} |C_j(\lambda)| = \lambda$. Solve $C_j(\lambda) = \pm\lambda$ for each fixed $j \ge 2$:
$$\lambda = \lambda_1 - (\lambda_1 - \lambda) = C_j(\lambda_1) - a_j(\lambda_1 - \lambda)$$
$$-\lambda = -\lambda_1 + (\lambda_1 - \lambda) = C_j(\lambda_1) - a_j(\lambda_1 - \lambda)$$
if and only if
$$\lambda_1 - \lambda = \left(\lambda_1 - C_j(\lambda_1)\right)/(1 - a_j)$$
$$\lambda_1 - \lambda = \left(\lambda_1 + C_j(\lambda_1)\right)/(1 + a_j)$$

<3> Both right-hand sides are strictly positive. Thus $\lambda_2 = \lambda_1 - \delta$ where
$$\delta := \min_{j \ge 2} \min\left(\frac{\lambda_1 - C_j(\lambda_1)}{1 - a_j},\ \frac{\lambda_1 + C_j(\lambda_1)}{1 + a_j}\right).$$

Second step. We have $C_1(\lambda_2) = \lambda_2 = \max_{j \ge 2} |C_j(\lambda_2)|$, by construction. For convenience of exposition, suppose $|C_2(\lambda_2)| = \lambda_2 > |C_j(\lambda_2)|$ for all $j \ge 3$. The active set now becomes $A = \{1, 2\}$. To emphasize a subtle point it helps to consider separately two cases. Write $s_2$ for $\mathrm{sign}(C_2(\lambda_2))$, so that $C_2(\lambda_2) = s_2\lambda_2$.

case $s_2 = +1$: For $\lambda_3 \le \lambda < \lambda_2$ and a new $v_1$ and $v_2$ (note the recycling of notation), define
$$\hat b_1(\lambda) = \hat b_1(\lambda_2) + (\lambda_2 - \lambda)v_1, \qquad \hat b_2(\lambda) = 0 + (\lambda_2 - \lambda)v_2,$$
with all other $\hat b_j$'s still zero. Write $Z$ for $[X_1, X_2]$. The new $C_j$'s become
$$C_j(\lambda) = X_j'\left(y - X_1\hat b_1(\lambda) - X_2\hat b_2(\lambda)\right) = C_j(\lambda_2) - (\lambda_2 - \lambda)X_j'Zv \qquad \text{where } v = (v_1, v_2)'.$$
We keep $C_1(\lambda) = C_2(\lambda) = \lambda$ if we choose $v$ to make $X_1'Zv = 1 = X_2'Zv$. That is, we need $v = (Z'Z)^{-1}\mathbf{1}$ with $\mathbf{1} = (1, 1)'$. Of course we must assume that $X_1$ and $X_2$ are linearly independent for $Z'Z$ to have an inverse.

case $s_2 = -1$: For $\lambda_3 \le \lambda < \lambda_2$ and a new $v_1$ and $v_2$, define
$$\hat b_1(\lambda) = \hat b_1(\lambda_2) + (\lambda_2 - \lambda)v_1, \qquad \hat b_2(\lambda) = -(\lambda_2 - \lambda)v_2 \quad \text{(note the change of sign)},$$
with all other $\hat b_j$'s still zero. Write $Z$ for $[X_1, -X_2]$. The tricky business with the signs ensures that
$$X\hat b(\lambda) - X\hat b(\lambda_2) = X_1(\lambda_2 - \lambda)v_1 - X_2(\lambda_2 - \lambda)v_2 = (\lambda_2 - \lambda)Zv.$$
The new $C_j$'s become
$$C_j(\lambda) = X_j'\left(y - X_1\hat b_1(\lambda) - X_2\hat b_2(\lambda)\right) = C_j(\lambda_2) - (\lambda_2 - \lambda)X_j'Zv.$$
We keep $C_1(\lambda) = -C_2(\lambda) = \lambda$ if we choose $v$ to make $X_1'Zv = 1 = -X_2'Zv$. That is, again we need $v = (Z'Z)^{-1}\mathbf{1}$.

Remark. If we had assumed $C_1(\lambda_1) = -\lambda_1$ then $X_1$ would be replaced by $-X_1$ in the $Z$ matrix, that is, $Z = [s_1X_1, s_2X_2]$ with $s_1 = \mathrm{sign}(C_1(\lambda_1))$ and $s_2 = \mathrm{sign}(C_2(\lambda_2))$.

Because $\hat b_1(\lambda_2) > 0$, the correlation $C_1(\lambda)$ stays on the correct boundary for the $(+)$ constraint. If $s_2 = +1$ we need $v_2 > 0$ to keep $\hat b_2(\lambda) > 0$ and $C_2(\lambda) = \lambda$, satisfying $(+)$. If $s_2 = -1$ we also need $v_2 > 0$ to keep $\hat b_2(\lambda) < 0$ and $C_2(\lambda) = -\lambda$, satisfying $(-)$. That is, in both cases we need $v_2 > 0$.

Why do we get a strictly positive $v_2$? Write $\rho$ for $s_2X_2'X_1$. As the $Z'Z$ matrix is nonsingular we must have $|\rho| < 1$, so that
$$v = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = (1 - \rho^2)^{-1}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$
and $v_2 = v_1 = (1 - \rho)/(1 - \rho^2) > 0$.

If no further $C_j(\lambda)$'s were to hit the $\pm\lambda$ boundary, step 2 could continue all the way to $\lambda = 0$. More typically, we would need to create a new active set at the largest $\lambda_3$ strictly smaller than $\lambda_2$ for which $\max_{j \ge 3} |C_j(\lambda)| = \lambda$. For the general step there is another possible event that would require a change to the active set: one of the $\hat b_j(\lambda)$'s in the active set might hit zero, threatening to change sign and leave the corresponding $C_j(\lambda)$ on the wrong boundary.

I could pursue these special cases further, but it is better to start again for the generic step in the algorithm.
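A short R sketch of the two-variable step just described (illustrative only; the name two.step.direction is made up): form $Z$ from the sign-flipped active columns, solve for $v = (Z'Z)^{-1}\mathbf{1}$, and confirm numerically that $v_2 = (1 - \rho)/(1 - \rho^2)$.

    # Direction vector for a two-variable active set {j1, j2} with signs s1, s2.
    two.step.direction <- function(X, j1, s1, j2, s2) {
      Z   <- cbind(s1 * X[, j1], s2 * X[, j2])      # sign-flipped active columns
      v   <- solve(crossprod(Z), c(1, 1))           # v = (Z'Z)^{-1} 1
      rho <- s1 * s2 * sum(X[, j1] * X[, j2])       # the rho of the argument above
      list(v = v, check = (1 - rho) / (1 - rho^2))  # check should equal v[1] = v[2]
    }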

3.2 The general algorithm

Once again, start with $A = \emptyset$ and $\hat b(\lambda) = 0$ for $\lambda \ge \lambda_1 := \max_j |X_j'y|$. Constraint (0) is satisfied on $[\lambda_1, \infty)$. At each $\lambda_k$ a new active set $A_k$ is defined. During the $k$th step the parameter $\lambda$ decreases from $\lambda_k$ to $\lambda_{k+1}$. For all $j$'s in the active set $A_k$, the coefficients $\hat b_j(\lambda)$ change linearly and the $C_j(\lambda)$'s move along one of the boundaries of the feasible region: $C_j(\lambda) = \lambda$ if $\hat b_j(\lambda) > 0$ and $C_j(\lambda) = -\lambda$ if $\hat b_j(\lambda) < 0$. For each inactive $j$ the coefficient $\hat b_j(\lambda)$ remains zero throughout $[\lambda_{k+1}, \lambda_k]$.

Step $k$ ends when either an inactive $C_j(\lambda)$ hits a $\pm\lambda$ boundary or an active $\hat b_j(\lambda)$ becomes zero: $\lambda_{k+1}$ is defined as the largest $\lambda$ less than $\lambda_k$ for which either of these conditions holds:

(i) $\max_{j \notin A_k} |C_j(\lambda)| = \lambda$. In that case add the new $j \in A_k^c$ for which $|C_j(\lambda_{k+1})| = \lambda_{k+1}$ to the active set, then proceed to step $k+1$.

(ii) $\hat b_j(\lambda) = 0$ for some $j \in A_k$. In that case, remove $j$ from the active set, then proceed to step $k+1$.

For the diabetes data, alternative (ii) caused the behavior shown below for $1.3 \le \lambda \le 2.2$.

[Figure: "Modified lasso algorithm for diabetes data" -- the coefficient paths ("bj's") and correlation paths ("Cj's"), illustrating a variable being dropped from the active set.]

I will show how the $C_j(\lambda)$'s and the $\hat b_j(\lambda)$'s can be chosen so that the conditions <2> are always satisfied.

At the start of step $k$ (with $\lambda = \lambda_k$), define $Z = [s_jX_j : j \in A_k]$, where $s_j := \mathrm{sign}(C_j(\lambda))$. That is, $Z$ has a column for each active predictor $X_j$, with the sign flipped if the corresponding $C_j(\lambda)$ is moving along the $-\lambda$ boundary. Define $v := (Z'Z)^{-1}\mathbf{1}$. For $\lambda_{k+1} \le \lambda \le \lambda_k$ the active coefficients change linearly,

<4> $\hat b_j(\lambda) = \hat b_j(\lambda_k) + (\lambda_k - \lambda)v_js_j$ for $j \in A_k$.

The fitted vector,
$$\hat f(\lambda) := X\hat b(\lambda) = \hat f(\lambda_k) + \sum_{j \in A_k} X_j\left(\hat b_j(\lambda) - \hat b_j(\lambda_k)\right) = \hat f(\lambda_k) + (\lambda_k - \lambda)Zv,$$
the residuals,
$$R(\lambda) := y - \hat f(\lambda) = R(\lambda_k) - (\lambda_k - \lambda)Zv,$$
and the correlations,

<5> $C_j(\lambda) := X_j'R(\lambda) = C_j(\lambda_k) - (\lambda_k - \lambda)a_j$ where $a_j := X_j'Zv$,

also change linearly. In particular, for $j \in A_k$,
$$C_j(\lambda) = s_j\lambda_k - (\lambda_k - \lambda)s_jZ_j'Zv = s_j\lambda$$
because $Z_j'Z(Z'Z)^{-1}\mathbf{1} = 1$. The active $C_j(\lambda)$'s move along one of the boundaries $\pm\lambda$ of $\mathcal{R}$.

Remember that I am assuming the "one at a time" condition: the active set changes only by addition of one new predictor in case (i) or by dropping one predictor in case (ii). Suppose index $\alpha$ enters the active set via (i) when $\lambda$ equals $\lambda_{k+1}$ then leaves it via (ii) when $\lambda$ equals $\lambda_{l+1}$. (If $\alpha$ never leaves the active set put $\lambda_{l+1}$ equal to $0$.) Throughout the interval $(\lambda_{l+1}, \lambda_{k+1})$ both $\mathrm{sign}(C_\alpha(\lambda))$ and $\mathrm{sign}(\hat b_\alpha(\lambda))$ stay constant. To ensure that all constraints are satisfied it is enough to prove:

(a) For $\lambda$ only slightly smaller than $\lambda_{k+1}$, both $C_\alpha(\lambda)$ and $\hat b_\alpha(\lambda)$ have the same sign.

(b) For $\lambda$ only slightly smaller than $\lambda_{l+1}$, constraint (0) is satisfied, that is, $|C_\alpha(\lambda)| \le \lambda$.
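Before verifying (a) and (b), here is a minimal R sketch of one generic step as just described (illustrative only, not the code in lasso.r; ties and the updating of the active set are left to the caller): it builds $Z$, $v$ and the $a_j$'s, then finds $\lambda_{k+1}$ as the largest $\lambda < \lambda_k$ at which either an inactive $|C_j(\lambda)|$ reaches $\lambda$ (case (i)) or an active coefficient reaches zero (case (ii)).

    # One generic step: from (lambda_k, b_k) with active set 'act', find lambda_{k+1}
    # and the coefficients there.  The signs s_j are read off from C_j(lambda_k).
    lasso.step <- function(X, y, lambda.k, b.k, act) {
      C <- drop(crossprod(X, y - X %*% b.k))            # C_j(lambda_k)
      s <- sign(C[act])
      Z <- sweep(X[, act, drop = FALSE], 2, s, "*")     # Z = [s_j X_j : j in A_k]
      v <- solve(crossprod(Z), rep(1, length(act)))     # v = (Z'Z)^{-1} 1
      a <- drop(crossprod(X, Z %*% v))                  # a_j = X_j' Z v, as in <5>

      # (i): decreases in lambda at which an inactive C_j would hit +lambda or -lambda
      inact <- setdiff(seq_len(ncol(X)), act)
      d.in  <- c((lambda.k - C[inact]) / (1 - a[inact]),
                 (lambda.k + C[inact]) / (1 + a[inact]))
      d.in  <- d.in[d.in > 1e-12]                       # keep only future crossings
      # (ii): decreases in lambda at which an active coefficient hits zero, from <4>
      d.act <- -b.k[act] / (v * s)
      d.act <- d.act[d.act > 1e-12]

      delta  <- min(c(d.in, d.act))
      b.next <- b.k
      b.next[act] <- b.k[act] + delta * v * s           # the update <4>
      list(lambda.next = lambda.k - delta, b.next = b.next)
    }

Repeated calls, starting from $(\lambda_1, \hat b = 0)$ with the single index found in the step-1 sketch as the active set, should trace out the breakpoints; this sketch ignores ties and the bookkeeping needed to decide which index to add or drop after each call.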

<6> The analyses are similar for the two cases. They both depend on a neat formula for inversion of symmetric block matrices. Suppose $A$ is an $m \times m$ nonsingular, symmetric matrix and $d$ is an $m \times 1$ vector for which $\kappa := 1 - d'A^{-1}d \ne 0$. Then
$$\begin{pmatrix} A & d \\ d' & 1 \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + ww'/\kappa & -w/\kappa \\ -w'/\kappa & 1/\kappa \end{pmatrix} \qquad \text{where } w := A^{-1}d.$$
This formula captures the ideas involved in Lemma 4 of the LARS paper.

For simplicity of notation, I will act in both cases (a) and (b) as if the $\alpha$ is larger than all the other $j$'s that were in the active set. (More formally, I could replace $X$ in what follows by $XE$ for a suitable permutation matrix $E$.) The old and new $Z$ matrices then have the block form shown in <6>.

Case (a): $\alpha$ enters the active set at $\lambda_{k+1}$. As for the analysis that led to the expression in <3>, the solutions for $\pm\lambda = C_j(\lambda)$ with $j \in A_k^c$ are given by
$$(\lambda_k - \lambda)(1 - a_j) = \lambda_k - C_j(\lambda_k) \qquad \text{to get } \lambda = C_j(\lambda),$$
$$(\lambda_k - \lambda)(1 + a_j) = \lambda_k + C_j(\lambda_k) \qquad \text{to get } -\lambda = C_j(\lambda).$$
Index $\alpha$ is brought into the active set because $|C_\alpha(\lambda_{k+1})| = \lambda_{k+1}$. Thus
$$(\lambda_k - \lambda_{k+1})(1 - s_\alpha a_\alpha) = \lambda_k - s_\alpha C_\alpha(\lambda_k) \qquad \text{where } s_\alpha := \mathrm{sign}(C_\alpha(\lambda_{k+1})).$$
Note that $|C_\alpha(\lambda_k)| < \lambda_k$ because $\alpha$ was not active during step $k$. It follows that the right-hand side of the last equality is strictly positive, which implies

<7> $1 - s_\alpha a_\alpha > 0$.

Throughout a small neighborhood of $\lambda_{k+1}$ the sign $s_\alpha$ of $C_\alpha(\lambda)$ stays the same. Continue to write $Z$ for the active matrix $[s_jX_j : j \in A_k]$ for $\lambda$ slightly larger than $\lambda_{k+1}$ and denote by $\widetilde Z = [s_jX_j : j \in A_{k+1}] = [Z, s_\alpha X_\alpha]$ the new active matrix for $\lambda$ slightly smaller than $\lambda_{k+1}$. Then
$$\widetilde Z'\widetilde Z = \begin{pmatrix} Z'Z & d \\ d' & 1 \end{pmatrix} \qquad \text{where } d := s_\alpha Z'X_\alpha.$$
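(A quick numeric check of the block-inverse formula <6>, as an illustrative aside; any symmetric nonsingular $A$ and vector $d$ with $\kappa \ne 0$ will do.)

    # Numerical check of the block-inverse formula <6>.
    set.seed(1)
    m <- 4
    A <- crossprod(matrix(rnorm(m * m), m, m))   # symmetric, nonsingular
    d <- rnorm(m)
    w <- solve(A, d)                             # w = A^{-1} d
    kappa <- 1 - sum(d * w)                      # kappa = 1 - d' A^{-1} d
    M    <- rbind(cbind(A, d), c(d, 1))          # the block matrix
    Minv <- rbind(cbind(solve(A) + tcrossprod(w) / kappa, -w / kappa),
                  c(-w / kappa, 1 / kappa))
    max(abs(M %*% Minv - diag(m + 1)))           # should be numerically zero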

Notice that
$$1 - \kappa = d'A^{-1}d = X_\alpha'Z(Z'Z)^{-1}Z'X_\alpha = \|HX_\alpha\|^2,$$

<8> where $H = Z(Z'Z)^{-1}Z'$ is the matrix that projects vectors orthogonally onto $\mathrm{span}(Z)$. If $X_\alpha \notin \mathrm{span}(Z)$ then $\|HX_\alpha\| < \|X_\alpha\| = 1$, so that $\kappa > 0$. From <6>,
$$(\widetilde Z'\widetilde Z)^{-1} = \begin{pmatrix} (Z'Z)^{-1} + ww'/\kappa & -w/\kappa \\ -w'/\kappa & 1/\kappa \end{pmatrix} \qquad \text{where } w := s_\alpha(Z'Z)^{-1}Z'X_\alpha.$$
The $\alpha$th coordinate of the new $\widetilde v = (\widetilde Z'\widetilde Z)^{-1}\mathbf{1}$ equals $\widetilde v_\alpha = \kappa^{-1}(1 - w'\mathbf{1})$, which is strictly positive because
$$1 - w'\mathbf{1} = 1 - s_\alpha X_\alpha'Z(Z'Z)^{-1}\mathbf{1} = 1 - s_\alpha X_\alpha'Zv = 1 - s_\alpha a_\alpha \quad \text{by <5>,} \quad > 0 \quad \text{by <7>}.$$
By <4>, the new $\hat b_\alpha(\lambda) = (\lambda_{k+1} - \lambda)\widetilde v_\alpha s_\alpha$ has the same sign, $s_\alpha$, as $C_\alpha(\lambda)$.

Case (b): $\alpha$ leaves the active set at $\lambda = \lambda_{l+1}$. The roles of $Z = [s_jX_j : 1 \le j < \alpha]$ and $\widetilde Z = [Z, s_\alpha X_\alpha]$ are now reversed. For $\lambda$ slightly larger than $\lambda_{l+1}$ the active matrix is $\widetilde Z$, and both $C_\alpha(\lambda)$ and $\hat b_\alpha(\lambda)$ have sign $s_\alpha$. Index $\alpha$ leaves the active set because $\hat b_\alpha(\lambda_{l+1}) = 0$. Thus
$$0 = \hat b_\alpha(\lambda_l) + (\lambda_l - \lambda_{l+1})\widetilde v_\alpha s_\alpha \qquad \text{where } s_\alpha := \mathrm{sign}(C_\alpha(\lambda_l)) \text{ and } \widetilde v = (\widetilde Z'\widetilde Z)^{-1}\mathbf{1}.$$
The active $\hat b_\alpha(\lambda_l)$ also had sign $s_\alpha$. Consequently, we must have

<9> $\widetilde v_\alpha < 0$.

For $\lambda$ slightly smaller than $\lambda_{l+1}$ the active matrix is $Z$, and, by <5>,
$$s_\alpha C_\alpha(\lambda) = \lambda_{l+1} - (\lambda_{l+1} - \lambda)s_\alpha a_\alpha = \lambda + (\lambda_{l+1} - \lambda)(1 - s_\alpha a_\alpha) \qquad \text{where } a_\alpha := X_\alpha'Zv.$$

We also know that $0 > \kappa\widetilde v_\alpha = 1 - w'\mathbf{1} = 1 - s_\alpha a_\alpha$. Thus $s_\alpha C_\alpha(\lambda) < \lambda$, and hence $|C_\alpha(\lambda)| < \lambda$, for $\lambda$ slightly less than $\lambda_{l+1}$. The new $C_\alpha(\lambda)$ satisfies constraint (0) as it heads off towards the other boundary.

Remark. Clearly there is some sort of duality accounting for the similarities in the arguments for cases (a) and (b), with <6> as the shared mechanism. If we think of $\lambda$ as time, case (b) is a time reversal of case (a). For case (a) the defining condition ($C_\alpha$ hits the boundary) at $\lambda_{k+1}$ gives $1 - s_\alpha a_\alpha > 0$, which implies $\widetilde v_\alpha > 0$. For case (b), the defining condition ($\hat b_\alpha$ hits zero) at $\lambda_{l+1}$ gives $\widetilde v_\alpha < 0$, which implies $1 - s_\alpha a_\alpha < 0$. Is there some clever way to handle both cases by a duality argument? Maybe I should read more of the LARS paper.

4 My getlasso() function

I wrote the R code in the file lasso.r to help me understand the algorithm described in the LARS paper. The function is not particularly elegant or efficient. I wrote it in a way that made it easy to examine the output from each step. If verbose=TRUE, lots of messages get written to the console and the function pauses after each step. I also used some calls to browser() while tracking down some annoying bugs related to case (b).

d.p. 1 December 2010

References

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani (2004). Least angle regression. The Annals of Statistics 32(2).

Rosset, S. and J. Zhu (2007). Piecewise linear regularized solution paths. The Annals of Statistics 35(3).

[Attached figures: seven full-page plots, each titled "Modified lasso algorithm for diabetes data", showing the coefficient paths (panel labelled "bj's") and the correlation paths (panel labelled "Cj's") at successive stages of the algorithm, with the diabetes predictors labelled as they enter or leave the active set.]
