Statistics 312/612, fall 2010
Homework # 9
Due: Monday 29 November

The questions on this sheet all refer to (a slight modification of) the lasso modification of the LARS algorithm, based on the papers by Efron, Hastie, Johnstone, and Tibshirani (2004) and by Rosset and Zhu (2007). I have made a few small changes to the algorithm described in the first paper. I hope I haven't broken anything. The homework problems are embedded in an explanation of how and why the algorithm works. Look for the numbers [1], [2], ... in the left margin and the text enclosed in a box.

1 The Lasso problem

The problem is: given an $n \times 1$ vector $y$ and an $n \times p$ matrix $X$, find the $b(\lambda)$ that minimizes

    $L_\lambda(b) = \|y - Xb\|^2 + 2\lambda \sum_j |b_j|$

for each $\lambda \ge 0$. [The extra factor of 2 eliminates many factors of 2 in what follows.] The columns of $X$ will be assumed to be standardized to have zero means and $\|X_j\| = 1$.

Remark. I thought the vector $y$ was also supposed to have a zero mean. That is not the case for the diabetes data set in R, the data set used as an illustration in the LARS paper (Efron, Hastie, Johnstone, and Tibshirani 2004), where the LARS procedure was first analyzed.

The solution will turn out to be piecewise linear and continuous in $\lambda$. To test my modification I also used the diabetes data set:

    library(lars)
    data(diabetes)    # load the data set
    # ?diabetes for a description
    dd = diabetes
    LL = lars(diabetes$x, diabetes$y, type="lasso")
    plot(LL, xvar="step")    # the usual plot

My modified algorithm gets the same coefficients as lars() for this data set.
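To make the definition concrete, here is a minimal R sketch of the objective (the name lasso_objective is mine, not part of the lars package):

    # L_lambda(b) = ||y - Xb||^2 + 2 * lambda * sum_j |b_j|
    lasso_objective <- function(b, y, X, lambda) {
      r <- y - X %*% b                        # residual vector
      sum(r^2) + 2 * lambda * sum(abs(b))
    }

For a fixed $\lambda$, the path value $b(\lambda)$ should give a smaller value of lasso_objective than any nearby coefficient vector.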

Remark. For the sake of exposition, the discussion below assumes that the predictors enter the active set in a way slightly different from what happens for diabetes. I will consider only the "one at a time" case (page 417 of the LARS paper), for which the active set of predictors $X_j$ changes only by either addition or deletion of a single predictor.

2 Directional derivative

The function $L_\lambda$ has derivative at $b$ in the direction $u$ defined by

<1>    $L'_\lambda(b, u) = \lim_{t \downarrow 0} \frac{L_\lambda(b + tu) - L_\lambda(b)}{t} = 2 \sum_j \big( \lambda f(b_j, u_j) - (y - Xb)'X_j u_j \big)$

where $f(b_j, u_j) = u_j\{b_j > 0\} - u_j\{b_j < 0\} + |u_j|\{b_j = 0\}$.

By convexity, a vector $b$ minimizes $L_\lambda$ if and only if $L'_\lambda(b, u) \ge 0$ for every $u$. Equivalently, for every $j$ and every $u_j$ the $j$th summand in <1> must be nonnegative. [Consider $u$ vectors with only one nonzero component to establish this equivalence.] That is, $b$ minimizes $L_\lambda$ if and only if

    $\lambda f(b_j, u_j) \ge (y - Xb)'X_j u_j$    for every $j$ and every $u_j$.

When $b_j \ne 0$ the inequalities for $u_j = \pm 1$ imply an equality; for $b_j = 0$ we get only the inequality. Thus $b$ minimizes $L_\lambda$ if and only if

    $\lambda = X_j'R$ if $b_j > 0$
    $\lambda = -X_j'R$ if $b_j < 0$        where $R := y - Xb$.
    $\lambda \ge |X_j'R|$ if $b_j = 0$

The LARS/lasso algorithm recursively calculates a sequence of breakpoints $\infty > \lambda_1 > \lambda_2 > \cdots$ with $b(\lambda)$ linear on each interval $\lambda_{k+1} \le \lambda \le \lambda_k$. Define the residual vector and correlations

    $R(\lambda) := y - X b(\lambda)$    and    $C_j(\lambda) := X_j' R(\lambda)$.

Remark. To get a true correlation we would have to divide by $\|R(\lambda)\|$, which would complicate the constraints.
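These three conditions are easy to check numerically for a candidate minimizer. A sketch (the function name and tolerance are my own choices):

    check_lasso_optimality <- function(b, y, X, lambda, tol = 1e-6) {
      C <- drop(crossprod(X, y - X %*% b))     # C_j = X_j' R with R = y - Xb
      all(abs(C[b > 0] - lambda) < tol) &&     # lambda =  C_j when b_j > 0
        all(abs(C[b < 0] + lambda) < tol) &&   # lambda = -C_j when b_j < 0
        all(abs(C[b == 0]) <= lambda + tol)    # lambda >= |C_j| when b_j = 0
    }

Run at a breakpoint of the lars() path, it gives a quick test that a fitted coefficient vector really satisfies the stationarity conditions.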

<2> The algorithm will ensure that

    $\lambda = C_j(\lambda)$ if $b_j(\lambda) > 0$    (constraint +)
    $\lambda = -C_j(\lambda)$ if $b_j(\lambda) < 0$    (constraint -)
    $\lambda \ge |C_j(\lambda)|$ if $b_j(\lambda) = 0$    (constraint 0)

That is, for the minimizing $b(\lambda)$ each $(\lambda, C_j(\lambda))$ needs to stay inside the region

    $\mathcal{R} = \{(\lambda, c) \in \mathbb{R}^+ \times \mathbb{R} : |c| \le \lambda\}$,

moving along the top boundary ($c = \lambda$) when $b_j(\lambda) > 0$ (constraint +), along the lower boundary ($c = -\lambda$) when $b_j(\lambda) < 0$ (constraint -), and being anywhere in $\mathcal{R}$ when $b_j(\lambda) = 0$ (constraint 0).

[Figure: "Modified lasso algorithm for diabetes data" -- two panels tracing the $b_j$'s and the $C_j$'s against $\lambda$ for the ten predictors (age, sex, bmi, map, tc, ldl, hdl, tch, ltg, glu). See the file lassofull.pdf in the Handouts directory for higher resolution.]

3 The algorithm

The solution $b(\lambda)$ is to be constructed in a sequence of steps, starting with large $\lambda$ and working towards $\lambda = 0$.

First step. Define $\lambda_1 = \max_j |X_j'y|$. For $\lambda \ge \lambda_1$ take $b(\lambda) = 0$, so that $|C_j(\lambda)| \le \lambda_1$. Constraint 0 would be violated if we kept $b(\lambda)$ equal to zero for $\lambda < \lambda_1$; the $b(\lambda)$ must move away from zero as $\lambda$ decreases below $\lambda_1$. We must have $|C_j(\lambda_1)| = \lambda_1$ for at least one $j$. For convenience of exposition, suppose $C_1(\lambda_1) = \lambda_1 > |C_j(\lambda_1)|$ for all $j \ge 2$. The active set is now $A = \{1\}$.
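In R the first step is a few lines. A sketch, with x and y standardized as in Section 1 (the variable names are mine):

    C0 <- drop(crossprod(x, y))       # C_j(lambda) at b = 0 is X_j' y
    lambda1 <- max(abs(C0))           # the first breakpoint lambda_1
    active <- which.max(abs(C0))      # the predictor achieving |C_j| = lambda_1
    s <- sign(C0[active])             # its sign (+1 in the exposition above)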

For $\lambda_2 \le \lambda < \lambda_1$, with $\lambda_2$ to be specified soon, keep $b_j = 0$ for $j \ge 2$ but let $b_1(\lambda) = 0 + v_1(\lambda_1 - \lambda)$ for some constant $v_1$. We need $C_1(\lambda) = \lambda$ for a while, which forces $v_1 = 1$. That choice gives

    $R(\lambda) = y - X_1(\lambda_1 - \lambda)$    and    $C_j(\lambda) = C_j(\lambda_1) - X_j'X_1(\lambda_1 - \lambda)$.

Notice that $b_1(\lambda) > 0$, so that (constraint +) is the relevant constraint for $b_1$. Let $\lambda_2$ be the largest $\lambda$ less than $\lambda_1$ for which $\max_{j \ge 2} |C_j(\lambda)| = \lambda$.

*[1] Find the value $\lambda_2$. That is, express $\lambda_2$ as a function of the $C_j$'s. Hint: Consider Efron, Hastie, Johnstone, and Tibshirani (2004, equation 2.13), with appropriate changes of notation.

Second step. We have $C_1(\lambda_2) = \lambda_2$, by construction. For convenience of exposition, suppose $C_2(\lambda_2) = \lambda_2 > |C_j(\lambda_2)|$ for all $j \ge 3$. The active set is now $A = \{1, 2\}$. For $\lambda_3 \le \lambda < \lambda_2$ choose a new $v_1$ and a $v_2$, then define

    $b_1(\lambda) = b_1(\lambda_2) + (\lambda_2 - \lambda)v_1$
    $b_2(\lambda) = 0 + (\lambda_2 - \lambda)v_2$

with all other $b_j$'s still zero. Write $Z$ for $[X_1, X_2]$. The new $C_j$'s become

    $C_j(\lambda) = X_j'(y - X_1 b_1(\lambda) - X_2 b_2(\lambda)) = C_j(\lambda_2) - (\lambda_2 - \lambda)X_j'Zv$    where $v = (v_1, v_2)'$.

*[2] Show that $C_1(\lambda) = C_2(\lambda) = \lambda$ if we choose $v$ to make $Z'Zv = (1, 1)'$. That is, $v = (Z'Z)^{-1}\mathbf{1}$ with $\mathbf{1} = (1, 1)'$. Of course we must assume that $X_1$ and $X_2$ are linearly independent for $Z'Z$ to have an inverse.

*[3] Show that $v_1$ and $v_2$ are both positive, so that (constraint +) is the relevant constraint for both $b_1$ and $b_2$. Let $\lambda_3$ be the largest $\lambda$ less than $\lambda_2$ for which $\max_{j \ge 3} |C_j(\lambda)| = \lambda$.
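The direction in [2] is a two-line computation in R. A sketch, assuming the active indices have already been collected (Z and v are my names):

    Z <- x[, active, drop = FALSE]               # here active = c(1, 2)
    v <- solve(crossprod(Z), rep(1, ncol(Z)))    # solves Z'Z v = 1, i.e. v = (Z'Z)^{-1} 1

The solve() form avoids computing the inverse explicitly; it fails, as it should, when $X_1$ and $X_2$ are linearly dependent.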

Third step. We have $C_1(\lambda_3) = C_2(\lambda_3) = \lambda_3$ and $\max_{j \ge 3} |C_j(\lambda_3)| = \lambda_3$. Consider now what happens if the last equality holds because a $C_j$ has hit the lower boundary of the region $\mathcal{R}$: suppose $C_3(\lambda_3) = -\lambda_3$ and $|C_j(\lambda_3)| < \lambda_3$ for all $j \ge 4$. The active set then becomes $A = \{1, 2, 3\}$. For $\lambda_4 \le \lambda < \lambda_3$ choose a new $v_1$, $v_2$ and a $v_3$ to make

    $b_1(\lambda) = b_1(\lambda_3) + (\lambda_3 - \lambda)v_1$
    $b_2(\lambda) = b_2(\lambda_3) + (\lambda_3 - \lambda)v_2$
    $b_3(\lambda) = -(\lambda_3 - \lambda)v_3$

with all other $b_j$'s still zero. More concisely, with $s_j = \mathrm{sign}(C_j(\lambda_3))$,

    $b_j(\lambda) = b_j(\lambda_3) + (\lambda_3 - \lambda)v_j s_j$    for $j \in A$.

For $\lambda_3 - \lambda$ small enough, we still have $b_1(\lambda) > 0$ and $b_2(\lambda) > 0$, keeping (constraint +) as the relevant constraint for $b_1$ and $b_2$. To ensure that $b_3(\lambda) < 0$ we need $v_3 > 0$. Define $Z = [X_1, X_2, -X_3]$.

*[4] For $\lambda \le \lambda_3$ show that

    $R(\lambda) - R(\lambda_3) = -\sum_{j \in A} Z_j v_j (\lambda_3 - \lambda)$

so that

    $C_j(\lambda) = C_j(\lambda_3) - (\lambda_3 - \lambda)X_j'Zv$.

Show that $v = (Z'Z)^{-1}\mathbf{1}$ keeps $b_1$, $b_2$, and $b_3$ on the correct boundaries.

[5] How should we choose $\lambda_4$? Is it possible for some active $b_j$ to switch sign? Prove that $v_3 > 0$. I am not yet clear on why this must be true. Efron, Hastie, Johnstone, and Tibshirani (2004, Lemma 4) seems to contain the relevant argument.

The general step. Just before $\lambda = \lambda_{k+1}$ suppose the active set is $A$, so that $|C_j(\lambda)| = \lambda$ for each $j$ in $A$. For each $j$ in $A$ let $s_j = \mathrm{sign}(C_j(\lambda_k))$. For $\lambda_{k+1} \le \lambda \le \lambda_k$ define

    $b_j(\lambda) = b_j(\lambda_k) + (\lambda_k - \lambda)v_j s_j$    for $j \in A$.

*[6] Define $Z$ as the matrix with $j$th column equal to $s_j X_j$ for each $j \in A$. For $\lambda_{k+1} \le \lambda \le \lambda_k$ show that

    $C_j(\lambda) = C_j(\lambda_k) - (\lambda_k - \lambda)X_j'Zv$

so that the choice $v = (Z'Z)^{-1}\mathbf{1}$ keeps $C_j(\lambda) = s_j \lambda$ for $j \in A$.
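In R, the bookkeeping for the general step might look like this (a sketch; the names sA, Z, v, a are mine, continuing from the earlier fragments, with C holding the current $C_j(\lambda_k)$'s):

    sA <- sign(C[active])                               # s_j for j in A
    Z <- sweep(x[, active, drop = FALSE], 2, sA, `*`)   # jth column of Z is s_j * X_j
    v <- solve(crossprod(Z), rep(1, length(active)))    # v = (Z'Z)^{-1} 1
    a <- drop(crossprod(x, Z %*% v))                    # a_j = X_j' Z v for every j
    # on the segment: b_j(lambda) = b_j(lambda_k) + (lambda_k - lambda) * s_j * v_j, j in A
    # and             C_j(lambda) = C_j(lambda_k) - (lambda_k - lambda) * a_j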

Define $\lambda_{k+1}$ to be the largest $\lambda$ less than $\lambda_k$ for which either of these conditions holds:

(i) $\max_{j \notin A} |C_j(\lambda)| = \lambda$. In that case add the new $j \in A^c$ for which $|C_j(\lambda_{k+1})| = \lambda_{k+1}$ to the active set, then carry out another general step.

(ii) $b_j(\lambda) = 0$ for some $j \in A$. In that case, remove $j$ from the active set, then carry out another general step.

For diabetes, the second alternative caused the behavior shown below for $1.3 \le \lambda \le 2.2$.

[Figure: "Modified lasso algorithm for diabetes data" -- detail of the $b_j$ and $C_j$ paths for $1.3 \le \lambda \le 2.2$, where hdl drops out of the active set. See lassoending.pdf for higher resolution.]

*[7] Find $\lambda_{k+1}$. That is, write out what an R program would have to calculate (a rough sketch of the bookkeeping appears after problem [9] below). Note that this step is slightly tricky. If predictor $j$ is removed from the active set because $b_j = 0$ then the corresponding $C_j$ will lie on the boundary of $\mathcal{R}$. You will need to interpret (i) with care to avoid $\lambda_{k+1} = \lambda_k$.

[8] Show that the coefficient $b_j(\lambda)$, if $j$ enters the active set, has the correct sign. Again Efron, Hastie, Johnstone, and Tibshirani (2004, Lemma 4) seems to contain the relevant argument.

[9] If you think you understand the modified algorithm, as described above, try to submit your solution to the next problem to me by Friday 3 December. Write an R program (a script or a function) that implements the modified algorithm. Test it out on the diabetes data set.
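For orientation on [7]: the candidate step lengths come from solving, for each $j$, the linear equations that put $C_j$ or $b_j$ on a boundary. A sketch only, not a complete solution (in particular it ignores the care needed for a just-removed predictor); it continues the variable names from the earlier fragments, with lambdak and b holding the current breakpoint and coefficient vector:

    eps <- 1e-10                               # guards against lambda_{k+1} = lambda_k
    inactive <- setdiff(seq_len(ncol(x)), active)
    # steps d = lambda_k - lambda at which an inactive C_j reaches +lambda or -lambda
    d_up <- (lambdak - C[inactive]) / (1 - a[inactive])
    d_down <- (lambdak + C[inactive]) / (1 + a[inactive])
    d_enter <- min(c(Inf, d_up[d_up > eps], d_down[d_down > eps]))   # condition (i)
    # steps at which an active coefficient would cross zero
    d_zero <- -b[active] / (sA * v)
    d_drop <- min(c(Inf, d_zero[d_zero > eps]))                      # condition (ii)
    lambda_next <- lambdak - min(d_enter, d_drop)

Whichever of d_enter and d_drop is smaller decides whether a predictor is added or removed before the next general step.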

References

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani (2004). Least angle regression. The Annals of Statistics 32(2), 407-499.

Rosset, S. and J. Zhu (2007). Piecewise linear regularized solution paths. The Annals of Statistics 35(3), 1012-1030.
