The problem is: given an n 1 vector y and an n p matrix X, find the b(λ) that minimizes
|
|
- Ursula Brooks
- 5 years ago
- Views:
Transcription
1 LARS/lasso 1 Statistics 312/612, fall 21 Homework # 9 Due: Monday 29 November The questions on this sheet all refer to (a slight modification of) the lasso modification of the LARS algorithm, based on the papers by Efron, Hastie, Johnstone, and Tibshirani (24) and by Rosset and Zhu (27). I have made a few small changes to the algorithm described in the first paper. I hope I haven t broken anything. The homework problems are embedded into an explanation of how and why the algorithm works. Look for the numbers [1], [2],... in the left margin and the text enclosed in a box. 1 The Lasso problem The problem is: given an n 1 vector y and an n p matrix X, find the b(λ) that minimizes L λ (b) = y Xb 2 + 2λ b j for each λ. [The extra factor of 2 eliminates many factors of 2 in what follows.] The columns of X will be assumed to be standardized to have zero means and X j = 1. Remark. I thought the vector y was also supposed to have a zero mean. That is not the case for the diabetes data set in R, the data set used as an illustration in the LARS paper (Efron, Hastie, Johnstone, and Tibshirani 24) where the LARS procedure was first analyzed. The solution will turn out to be piecewise linear and continuous in λ. To test my modification I also used the diabetes data set: library(lars) data(diabetes) # load the data set #?diabetes for description dd = diabetes LL = lars(diabetes$x,diabetes$y,type="lasso") plot(ll,xvar="step") # the usual plot My modified algorithm gets the same coefficients as lars() for this data set.
2 LARS/lasso 2 Remark. For the sake of exposition, the discussion below assumes that the predictors enter the active set in a way slightly different from what happens for diabetes. I will consider only the one at a time case (page 417 of the LARS paper), for which the active set of predictors X j changes only by either addition or deletion of a single predictor. 2 Directional derivative The function L λ has derivative at b in the direction u defined by <1> L L λ (b + tu) L λ (b) λ (b, u) = lim t t = 2 j λf(b j, u j ) (y Xb) X j u j where f(b j, u j ) = u j {b j > } u j {b j < } + u j {b j = }. By convexity, a vector b minimizes L λ if and only if L λ (b, u) for every u. Equivalently, for every j and every u j the jth summand in <1> must be nonnegative. [Consider u vectors with only one nonzero component to establish this equivalence.] That is, b minimizes L λ if and only if λf(b j, u j ) (y Xb) X j u j for every j and every u j When b j the inequalities for u j = ±1 imply an equality; for b j = we get only the inequality. Thus b minimizes L λ if and only if λ = X j R if b j > λ = X j R if b j < where R := y Xb λ X j R if b j = The LARS/lasso algorithm recursively calculates a sequence of breakpoints > λ 1 > λ 2 > with b(λ) linear for each interval λ k+1 λ λ k. Define residual vector and correlations R(λ) := y X b(λ) and C j (λ) := X jr(λ). Remark. To get a true correlation we would have to divide by R(λ), which would complicate the constraints.
3 LARS/lasso 3 <2> The algorithm will ensure that λ = C j (λ) if b j (λ) > (constraint ) λ = C j (λ) if b j (λ) < (constraint ) λ C j (λ) if b j (λ) = (constraint ) That is, for the minimizing b(λ) each (λ, C j (λ)) needs to stay inside the region R = {(λ, c) R + R : c λ}, moving along the top bounday (c = λ) when b j (λ) > (constraint ), along the lower boundary (c = λ) when b j (λ) < (constraint ), and being anywhere in R when b j (λ) = (constraint ). Modified lasso algorithm for diabetes data ltg bmi, ldl map hdl, tch age, glu bj's sex tc bmi, ltg map, tch, glu age, tc, ldl sex Cj's hdl [See the file lassofull.pdf in the Handouts directory for higher resolution.] 3 The algorithm The solution b(λ) is to be constructed in a sequence of steps, starting with large λ and working towards λ =. First step. Define λ 1 = max X j y. For λ λ 1 take b(λ) =, so that C j (λ) λ 1. Constraint would be violated if we kept b(λ) equal to zero for λ < λ 1 ; the b(λ) must move away from zero as λ decreases below λ 1. We must have C j (λ 1 ) = λ 1 for at least one j. For convenience of exposition, suppose C 1 (λ 1 ) = λ 1 > C j (λ 1 ) for all j 2. The active set is now A = {1}.
4 LARS/lasso 4 For λ 2 λ < λ 1, with λ 2 to be specified soon, keep b j = for j 2 but let b 1 (λ) = + v 1 (λ 1 λ) for some constant v 1. We need b 1 (λ) = λ for a while, which forces v 1 = 1. That choice gives R(λ) = y X 1 (λ 1 λ) and C j (λ) = C j (λ 1 ) X jx 1 (λ 1 λ). Notice that b 1 (λ) > so that is the relevant constraint for b 1. Let λ 2 be the largest λ less than λ 1 for which max j 2 C j (λ) = λ. *[1] Find the value λ 2. That is, express λ 2 as a function of C j s. Hint: Consider Efron, Hastie, Johnstone, and Tibshirani (24, equation 2.13), with appropriate changes of notation. Second step. We have C 1 (λ 2 ) = λ 2, by construction. For convenience of exposition, suppose C 2 (λ 2 ) = λ 2 > C j (λ 2 ) for all j 3. The active set is now A = {1, 2}. For λ 3 λ < λ 2 choose a new v 1 and a v 2 then define b 1 (λ) = b 1 (λ 2 ) + (λ 2 λ)v 1 b 2 (λ) = + (λ 2 λ)v 2 with all other b j s still zero. Write Z for [X 1, X 2 ]. The new C j s become C j (λ) = X j (y X 1 b 1 (λ) X 2 b 2 (λ)) = C j (λ 2 ) (λ 2 λ)x jzv where v = (v 1, v 2 ). *[2] Show that C 1 (λ) = C 2 (λ) = λ if we choose v to make Z Zv = (1, 1). That is, v = (Z Z) 1 1 with 1 = (1, 1). Of course we must assume that X 1 and X 2 are linearly independent for Z Z to have an inverse. *[3] Show that v 1 and v 2 are both positive, so that is the relevant constraint for both b 1 and b 2. Let λ 3 be the largest λ less than λ 2 for which max j 3 C j (λ) = λ. Third step. We have C 1 (λ 3 ) = C 2 (λ 3 ) = λ 3 and max j 3 C j (λ 3 ) = λ 3. Consider now what happens if the last equality holds because a C j has hit the lower boundary of the
5 LARS/lasso 5 region R: suppose C 3 (λ 3 ) = λ 3 > max j 4 C j (λ 3 ). The active set then becomes A = {1, 2, 3}. For λ 4 λ < λ 3 choose a new v 1, v 2 and a v 3 to make b 1 (λ) = b 1 (λ 3 ) + (λ 3 λ)v 1 b 2 (λ) = b 2 (λ 3 ) + (λ 3 λ)v 2 b 3 (λ) = (λ 3 λ)v 3 with all other b j s still zero. More concisely, with s j = sign(c j (λ 3 )), b j (λ) = b j (λ 3 ) + (λ 3 λ)v j s j for j A. For λ 3 λ small enough, we still have b 1 (λ) > and b 2 (λ) >, keeping as the relevant constraint for b 1 and b 2. To ensure that b 3 (λ) < we need v 3 >. Define Z = [X 1, X 2, X 3 ]. For λ λ 3 show that *[4] [5] so that R(λ) R(λ k ) = j A Z jv j (λ k λ) C j (λ) = C j (λ 3 ) (λ 3 λ)x jzv Show that v = (Z Z) 1 = 1 keeps b 1, b 2, and b 3 on the correct boundaries. How should we choose λ 4? Is it possible for some active b j to switch sign? Prove that v 3 >. I am not yet clear on why this must be true. Efron, Hastie, Johnstone, and Tibshirani (24, Lemma 4) seems to contain the relevant argument. The general step. Just before λ = λ k+1 suppose the active set is A, so that C j (λ) = λ for each j in A. For each j in A let s j = sign(c j (λ k )). For λ k+1 λ λ k define b j (λ) = b j (λ k ) + (λ k λ)v j s j for j A *[6] Define Z as the matrix with jth column equal to s j X j for each j A. λ k+1 λ λ k show that C j (λ) = C j (λ k ) (λ k λ)x jzv For so that the choice = (Z Z) 1 1 keeps C j (λ) = s j λ for j A. Define λ k+1 to be the largest λ less than λ k for which either of these conditions holds:
6 LARS/lasso 6 (i) max j / A C j (λ) = λ. In that case add the new j A c for which C j (λ k+1 ) = λ k+1 to the active set, then carry out another general step. (ii) b j (λ) = for some j A. In that case, remove j from the active set, then carry out another general step. For diabetes, the second alternative caused the behavior shown below [see lassoending.pdf for higher resolution] for 1.3 λ 2.2. Modified lasso algorithm for diabetes data ltg bmi, ldl map hdl, tch age, glu bj's sex tc bmi, map, ldl, tch, ltg, glu -5.1 Cj's age, sex, tc, hdl *[7] Find λ k+1. That is, write out what an R program would have to calculate. Note that this step is slightly tricky. If predictor j is removed from the active set because b j = then the corresponding C j will lie on the boundary of R. You will need to interpret (i) with care to avoid λ k+1 = λ k. [8] [9] Show that the coefficient b j (λ), if j enters the active set, has the correct sign. Again Efron, Hastie, Johnstone, and Tibshirani (24, Lemma 4) seems to contain the relevant argument. If you think you understand the modified algorithm, as described above, try to submit your solution to the next problem to me by Friday 3 December. Write an R program (a script or a function) that implements the modified algorithm. Test it out on the diabetes data set.
7 LARS/lasso 7 References Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani (24). Least angle regression. The Annals of Statistics 32 (2), pp Rosset, S. and J. Zhu (27). Piecewise linear regularized solution paths. Annals of Statistics 35 (3),
LASSO. Step. My modified algorithm getlasso() produces the same sequence of coefficients as the R function lars() when run on the diabetes data set:
lars/lasso 1 Homework 9 stepped you through the lasso modification of the LARS algorithm, based on the papers by Efron, Hastie, Johnstone, and Tibshirani (24) (= the LARS paper) and by Rosset and Zhu (27).
More informationAdaptive Lasso for correlated predictors
Adaptive Lasso for correlated predictors Keith Knight Department of Statistics University of Toronto e-mail: keith@utstat.toronto.edu This research was supported by NSERC of Canada. OUTLINE 1. Introduction
More informationRegularization: Ridge Regression and the LASSO
Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression
More informationLecture 17 Intro to Lasso Regression
Lecture 17 Intro to Lasso Regression 11 November 2015 Taylor B. Arnold Yale Statistics STAT 312/612 Notes problem set 5 posted; due today Goals for today introduction to lasso regression the subdifferential
More informationSaharon Rosset 1 and Ji Zhu 2
Aust. N. Z. J. Stat. 46(3), 2004, 505 510 CORRECTED PROOF OF THE RESULT OF A PREDICTION ERROR PROPERTY OF THE LASSO ESTIMATOR AND ITS GENERALIZATION BY HUANG (2003) Saharon Rosset 1 and Ji Zhu 2 IBM T.J.
More informationRobust methods and model selection. Garth Tarr September 2015
Robust methods and model selection Garth Tarr September 2015 Outline 1. The past: robust statistics 2. The present: model selection 3. The future: protein data, meat science, joint modelling, data visualisation
More informationFrequentist Accuracy of Bayesian Estimates
Frequentist Accuracy of Bayesian Estimates Bradley Efron Stanford University Bayesian Inference Parameter: µ Ω Observed data: x Prior: π(µ) Probability distributions: Parameter of interest: { fµ (x), µ
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationRegularization Paths
December 2005 Trevor Hastie, Stanford Statistics 1 Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Saharon Rosset, Ji Zhu, Hui Zhou, Rob Tibshirani and
More informationMachine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression
Machine Learning and Computational Statistics, Spring 2017 Homework 2: Lasso Regression Due: Monday, February 13, 2017, at 10pm (Submit via Gradescope) Instructions: Your answers to the questions below,
More informationRegularization Path Algorithms for Detecting Gene Interactions
Regularization Path Algorithms for Detecting Gene Interactions Mee Young Park Trevor Hastie July 16, 2006 Abstract In this study, we consider several regularization path algorithms with grouped variable
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationMS-C1620 Statistical inference
MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents
More informationLeast Angle Regression, Forward Stagewise and the Lasso
January 2005 Rob Tibshirani, Stanford 1 Least Angle Regression, Forward Stagewise and the Lasso Brad Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani Stanford University Annals of Statistics,
More informationLASSO Review, Fused LASSO, Parallel LASSO Solvers
Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable
More informationTECHNICAL REPORT NO. 1091r. A Note on the Lasso and Related Procedures in Model Selection
DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1091r April 2004, Revised December 2004 A Note on the Lasso and Related Procedures in Model
More informationHomework 6. Due: 10am Thursday 11/30/17
Homework 6 Due: 10am Thursday 11/30/17 1. Hinge loss vs. logistic loss. In class we defined hinge loss l hinge (x, y; w) = (1 yw T x) + and logistic loss l logistic (x, y; w) = log(1 + exp ( yw T x ) ).
More informationPathwise coordinate optimization
Stanford University 1 Pathwise coordinate optimization Jerome Friedman, Trevor Hastie, Holger Hoefling, Robert Tibshirani Stanford University Acknowledgements: Thanks to Stephen Boyd, Michael Saunders,
More informationOn Algorithms for Solving Least Squares Problems under an L 1 Penalty or an L 1 Constraint
On Algorithms for Solving Least Squares Problems under an L 1 Penalty or an L 1 Constraint B.A. Turlach School of Mathematics and Statistics (M19) The University of Western Australia 35 Stirling Highway,
More informationChapter 1 Linear Equations. 1.1 Systems of Linear Equations
Chapter Linear Equations. Systems of Linear Equations A linear equation in the n variables x, x 2,..., x n is one that can be expressed in the form a x + a 2 x 2 + + a n x n = b where a, a 2,..., a n and
More informationThe lars Package. R topics documented: May 17, Version Date Title Least Angle Regression, Lasso and Forward Stagewise
The lars Package May 17, 2007 Version 0.9-7 Date 2007-05-16 Title Least Angle Regression, Lasso and Forward Stagewise Author Trevor Hastie and Brad Efron
More informationRegularization Paths. Theme
June 00 Trevor Hastie, Stanford Statistics June 00 Trevor Hastie, Stanford Statistics Theme Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Mee-Young Park,
More informationHigh dimensional thresholded regression and shrinkage effect
J. R. Statist. Soc. B (014) 76, Part 3, pp. 67 649 High dimensional thresholded regression and shrinkage effect Zemin Zheng, Yingying Fan and Jinchi Lv University of Southern California, Los Angeles, USA
More informationFrequentist Accuracy of Bayesian Estimates
Frequentist Accuracy of Bayesian Estimates Bradley Efron Stanford University RSS Journal Webinar Objective Bayesian Inference Probability family F = {f µ (x), µ Ω} Parameter of interest: θ = t(µ) Prior
More informationSolving with Absolute Value
Solving with Absolute Value Who knew two little lines could cause so much trouble? Ask someone to solve the equation 3x 2 = 7 and they ll say No problem! Add just two little lines, and ask them to solve
More informationA robust hybrid of lasso and ridge regression
A robust hybrid of lasso and ridge regression Art B. Owen Stanford University October 2006 Abstract Ridge regression and the lasso are regularized versions of least squares regression using L 2 and L 1
More informationMath 61CM - Solutions to homework 6
Math 61CM - Solutions to homework 6 Cédric De Groote November 5 th, 2018 Problem 1: (i) Give an example of a metric space X such that not all Cauchy sequences in X are convergent. (ii) Let X be a metric
More informationLEAST ANGLE REGRESSION 469
LEAST ANGLE REGRESSION 469 Specifically for the Lasso, one alternative strategy for logistic regression is to use a quadratic approximation for the log-likelihood. Consider the Bayesian version of Lasso
More informationExploratory quantile regression with many covariates: An application to adverse birth outcomes
Exploratory quantile regression with many covariates: An application to adverse birth outcomes June 3, 2011 eappendix 30 Percent of Total 20 10 0 0 1000 2000 3000 4000 5000 Birth weights efigure 1: Histogram
More information1.10 Continuity Brian E. Veitch
1.10 Continuity Definition 1.5. A function is continuous at x = a if 1. f(a) exists 2. lim x a f(x) exists 3. lim x a f(x) = f(a) If any of these conditions fail, f is discontinuous. Note: From algebra
More informationPhysics 8 Wednesday, November 18, 2015
Physics 8 Wednesday, November 18, 2015 Remember HW10 due Friday. For this week s homework help, I will show up Wednesday, DRL 2C4, 4pm-6pm (when/where you normally find Camilla), and Camilla will show
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10
COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem
More informationVote. Vote on timing for night section: Option 1 (what we have now) Option 2. Lecture, 6:10-7:50 25 minute dinner break Tutorial, 8:15-9
Vote Vote on timing for night section: Option 1 (what we have now) Lecture, 6:10-7:50 25 minute dinner break Tutorial, 8:15-9 Option 2 Lecture, 6:10-7 10 minute break Lecture, 7:10-8 10 minute break Tutorial,
More informationONE DIMENSIONAL TRIANGULAR FIN EXPERIMENT. Current Technical Advisor: Dr. Kelly O. Homan Original Technical Advisor: Dr. D.C. Look
ONE DIMENSIONAL TRIANGULAR FIN EXPERIMENT Current Technical Advisor: Dr. Kelly O. Homan Original Technical Advisor: Dr. D.C. Look (Updated on 8 th December 2009, Original version: 3 rd November 2000) 1
More informationOrdinary Least Squares Linear Regression
Ordinary Least Squares Linear Regression Ryan P. Adams COS 324 Elements of Machine Learning Princeton University Linear regression is one of the simplest and most fundamental modeling ideas in statistics
More informationModel Selection and Estimation in Regression with Grouped Variables 1
DEPARTMENT OF STATISTICS University of Wisconsin 1210 West Dayton St. Madison, WI 53706 TECHNICAL REPORT NO. 1095 November 9, 2004 Model Selection and Estimation in Regression with Grouped Variables 1
More informationMATH10001 Mathematical Workshop Difference Equations part 2 Non-linear difference equations
MATH10001 Mathematical Workshop Difference Equations part 2 Non-linear difference equations In a linear difference equation, the equation contains a sum of multiples of the elements in the sequence {y
More informationGeneralized Elastic Net Regression
Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationLasso applications: regularisation and homotopy
Lasso applications: regularisation and homotopy M.R. Osborne 1 mailto:mike.osborne@anu.edu.au 1 Mathematical Sciences Institute, Australian National University Abstract Ti suggested the use of an l 1 norm
More informationAn Introduction to Graphical Lasso
An Introduction to Graphical Lasso Bo Chang Graphical Models Reading Group May 15, 2015 Bo Chang (UBC) Graphical Lasso May 15, 2015 1 / 16 Undirected Graphical Models An undirected graph, each vertex represents
More informationMATH 829: Introduction to Data Mining and Analysis Least angle regression
1/14 MATH 829: Introduction to Data Mining and Analysis Least angle regression Dominique Guillot Departments of Mathematical Sciences University of Delaware February 29, 2016 Least angle regression (LARS)
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationSOLUTIONS TO ADDITIONAL EXERCISES FOR II.1 AND II.2
SOLUTIONS TO ADDITIONAL EXERCISES FOR II.1 AND II.2 Here are the solutions to the additional exercises in betsepexercises.pdf. B1. Let y and z be distinct points of L; we claim that x, y and z are not
More informationNonlinear Support Vector Machines through Iterative Majorization and I-Splines
Nonlinear Support Vector Machines through Iterative Majorization and I-Splines P.J.F. Groenen G. Nalbantov J.C. Bioch July 9, 26 Econometric Institute Report EI 26-25 Abstract To minimize the primal support
More informationFunctional Analysis MATH and MATH M6202
Functional Analysis MATH 36202 and MATH M6202 1 Inner Product Spaces and Normed Spaces Inner Product Spaces Functional analysis involves studying vector spaces where we additionally have the notion of
More informationMATH20411 PDEs and Vector Calculus B
MATH2411 PDEs and Vector Calculus B Dr Stefan Güttel Acknowledgement The lecture notes and other course materials are based on notes provided by Dr Catherine Powell. SECTION 1: Introctory Material MATH2411
More informationResearch Methods in Mathematics Homework 4 solutions
Research Methods in Mathematics Homework 4 solutions T. PERUTZ (1) Solution. (a) Since x 2 = 2, we have (p/q) 2 = 2, so p 2 = 2q 2. By definition, an integer is even if it is twice another integer. Since
More informationspikeslab: Prediction and Variable Selection Using Spike and Slab Regression
68 CONTRIBUTED RESEARCH ARTICLES spikeslab: Prediction and Variable Selection Using Spike and Slab Regression by Hemant Ishwaran, Udaya B. Kogalur and J. Sunil Rao Abstract Weighted generalized ridge regression
More informationAP Calculus AB Summer Assignment
AP Calculus AB Summer Assignment Name: When you come back to school, it is my epectation that you will have this packet completed. You will be way behind at the beginning of the year if you haven t attempted
More informationA Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013)
A Survey of L 1 Regression Vidaurre, Bielza and Larranaga (2013) Céline Cunen, 20/10/2014 Outline of article 1.Introduction 2.The Lasso for Linear Regression a) Notation and Main Concepts b) Statistical
More informationModeling the Motion of a Projectile in Air
In this lab, you will do the following: Modeling the Motion of a Projectile in Air analyze the motion of an object fired from a cannon using two different fundamental physics principles: the momentum principle
More informationMATH10001 Mathematical Workshop Graph Fitting Project part 2
MATH10001 Mathematical Workshop Graph Fitting Project part 2 Polynomial models Modelling a set of data with a polynomial curve can be convenient because polynomial functions are particularly easy to differentiate
More informationConvex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)
ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality
More informationHomework 2 due Friday, October 11 at 5 PM.
Complex Riemann Surfaces Monday, October 07, 2013 2:01 PM Homework 2 due Friday, October 11 at 5 PM. How should one choose the specific branches to define the Riemann surface? It is a subjective choice,
More informationMarkov Chains and Related Matters
Markov Chains and Related Matters 2 :9 3 4 : The four nodes are called states. The numbers on the arrows are called transition probabilities. For example if we are in state, there is a probability of going
More informationFarkas Lemma, Dual Simplex and Sensitivity Analysis
Summer 2011 Optimization I Lecture 10 Farkas Lemma, Dual Simplex and Sensitivity Analysis 1 Farkas Lemma Theorem 1. Let A R m n, b R m. Then exactly one of the following two alternatives is true: (i) x
More informationProject One: C Bump functions
Project One: C Bump functions James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University November 2, 2018 Outline 1 2 The Project Let s recall what the
More informationDATA MINING AND MACHINE LEARNING
DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems
More informationDesigning Information Devices and Systems I Fall 2015 Anant Sahai, Ali Niknejad Final Exam. Exam location: RSF Fieldhouse, Back Left, last SID 6, 8, 9
EECS 16A Designing Information Devices and Systems I Fall 2015 Anant Sahai, Ali Niknejad Final Exam Exam location: RSF Fieldhouse, Back Left, last SID 6, 8, 9 PRINT your student ID: PRINT AND SIGN your
More informationMS&E 318 (CME 338) Large-Scale Numerical Optimization. A Lasso Solver
Stanford University, Dept of Management Science and Engineering MS&E 318 (CME 338) Large-Scale Numerical Optimization Instructor: Michael Saunders Spring 2011 Final Project Due Friday June 10 A Lasso Solver
More informationDesigning Information Devices and Systems I Spring 2016 Elad Alon, Babak Ayazifar Homework 11
EECS 6A Designing Information Devices and Systems I Spring 206 Elad Alon, Babak Ayazifar Homework This homework is due April 9, 206, at Noon.. Homework process and study group Who else did you work with
More informationSparse Algorithms are not Stable: A No-free-lunch Theorem
Sparse Algorithms are not Stable: A No-free-lunch Theorem Huan Xu Shie Mannor Constantine Caramanis Abstract We consider two widely used notions in machine learning, namely: sparsity algorithmic stability.
More informationThe lasso an l 1 constraint in variable selection
The lasso algorithm is due to Tibshirani who realised the possibility for variable selection of imposing an l 1 norm bound constraint on the variables in least squares models and then tuning the model
More information10725/36725 Optimization Homework 4
10725/36725 Optimization Homework 4 Due November 27, 2012 at beginning of class Instructions: There are four questions in this assignment. Please submit your homework as (up to) 4 separate sets of pages
More informationBasics of Proofs. 1 The Basics. 2 Proof Strategies. 2.1 Understand What s Going On
Basics of Proofs The Putnam is a proof based exam and will expect you to write proofs in your solutions Similarly, Math 96 will also require you to write proofs in your homework solutions If you ve seen
More informationReview Solutions, Exam 2, Operations Research
Review Solutions, Exam 2, Operations Research 1. Prove the weak duality theorem: For any x feasible for the primal and y feasible for the dual, then... HINT: Consider the quantity y T Ax. SOLUTION: To
More informationMath 33A Discussion Notes
Math 33A Discussion Notes Jean-Michel Maldague October 21, 2017 Week 3 Incomplete! Will update soon. - A function T : R k R n is called injective, or one-to-one, if each input gets a unique output. The
More informationMATH 225 Summer 2005 Linear Algebra II Solutions to Assignment 1 Due: Wednesday July 13, 2005
MATH 225 Summer 25 Linear Algebra II Solutions to Assignment 1 Due: Wednesday July 13, 25 Department of Mathematical and Statistical Sciences University of Alberta Question 1. [p 224. #2] The set of all
More informationObservations Homework Checkpoint quizzes Chapter assessments (Possibly Projects) Blocks of Algebra
September The Building Blocks of Algebra Rates, Patterns and Problem Solving Variables and Expressions The Commutative and Associative Properties The Distributive Property Equivalent Expressions Seeing
More informationContents. 4.5 The(Primal)SimplexMethod NumericalExamplesoftheSimplexMethod
Contents 4 The Simplex Method for Solving LPs 149 4.1 Transformations to be Carried Out On an LP Model Before Applying the Simplex Method On It... 151 4.2 Definitions of Various Types of Basic Vectors
More informationSmoothly Clipped Absolute Deviation (SCAD) for Correlated Variables
Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)
More informationLinear Programming: Simplex
Linear Programming: Simplex Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Linear Programming: Simplex IMA, August 2016
More informationA Quick Introduction to Row Reduction
A Quick Introduction to Row Reduction Gaussian Elimination Suppose we are asked to solve the system of equations 4x + 5x 2 + 6x 3 = 7 6x + 7x 2 + 8x 3 = 9. That is, we want to find all values of x, x 2
More informationSelf-adaptive Lasso and its Bayesian Estimation
Self-adaptive Lasso and its Bayesian Estimation Jian Kang 1 and Jian Guo 2 1. Department of Biostatistics, University of Michigan 2. Department of Statistics, University of Michigan Abstract In this paper,
More informationA significance test for the lasso
1 Gold medal address, SSC 2013 Joint work with Richard Lockhart (SFU), Jonathan Taylor (Stanford), and Ryan Tibshirani (Carnegie-Mellon Univ.) Reaping the benefits of LARS: A special thanks to Brad Efron,
More informationIntroduce complex-valued transcendental functions via the complex exponential defined as:
Complex exponential and logarithm Monday, September 30, 2013 1:59 PM Homework 2 posted, due October 11. Introduce complex-valued transcendental functions via the complex exponential defined as: where We'll
More informationMath Lecture 4 Limit Laws
Math 1060 Lecture 4 Limit Laws Outline Summary of last lecture Limit laws Motivation Limits of constants and the identity function Limits of sums and differences Limits of products Limits of polynomials
More informationA note on the group lasso and a sparse group lasso
A note on the group lasso and a sparse group lasso arxiv:1001.0736v1 [math.st] 5 Jan 2010 Jerome Friedman Trevor Hastie and Robert Tibshirani January 5, 2010 Abstract We consider the group lasso penalty
More informationManual of Logical Style (fresh version 2018)
Manual of Logical Style (fresh version 2018) Randall Holmes 9/5/2018 1 Introduction This is a fresh version of a document I have been working on with my classes at various levels for years. The idea that
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationarxiv: v1 [math.st] 2 May 2007
arxiv:0705.0269v1 [math.st] 2 May 2007 Electronic Journal of Statistics Vol. 1 (2007) 1 29 ISSN: 1935-7524 DOI: 10.1214/07-EJS004 Forward stagewise regression and the monotone lasso Trevor Hastie Departments
More informationSMT 2013 Power Round Solutions February 2, 2013
Introduction This Power Round is an exploration of numerical semigroups, mathematical structures which appear very naturally out of answers to simple questions. For example, suppose McDonald s sells Chicken
More informationSparse representation classification and positive L1 minimization
Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng
More informationHOMEWORK 4: SVMS AND KERNELS
HOMEWORK 4: SVMS AND KERNELS CMU 060: MACHINE LEARNING (FALL 206) OUT: Sep. 26, 206 DUE: 5:30 pm, Oct. 05, 206 TAs: Simon Shaolei Du, Tianshu Ren, Hsiao-Yu Fish Tung Instructions Homework Submission: Submit
More informationarxiv: v3 [stat.me] 4 Jan 2012
Efficient algorithm to select tuning parameters in sparse regression modeling with regularization Kei Hirose 1, Shohei Tateishi 2 and Sadanori Konishi 3 1 Division of Mathematical Science, Graduate School
More informationBoosted Lasso and Reverse Boosting
Boosted Lasso and Reverse Boosting Peng Zhao University of California, Berkeley, USA. Bin Yu University of California, Berkeley, USA. Summary. This paper introduces the concept of backward step in contrast
More informationSparsity in Underdetermined Systems
Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2
More informationHomework 4 due on Thursday, December 15 at 5 PM (hard deadline).
Large-Time Behavior for Continuous-Time Markov Chains Friday, December 02, 2011 10:58 AM Homework 4 due on Thursday, December 15 at 5 PM (hard deadline). How are formulas for large-time behavior of discrete-time
More informationMachine Learning & Data Mining Caltech CS/CNS/EE 155 Set 2 January 12 th, 2018
Policies Due 9 PM, January 9 th, via Moodle. You are free to collaborate on all of the problems, subject to the collaboration policy stated in the syllabus. You should submit all code used in the homework.
More informationElementary Linear Algebra Review for Exam 3 Exam is Friday, December 11th from 1:15-3:15
Elementary Linear Algebra Review for Exam 3 Exam is Friday, December th from :5-3:5 The exam will cover sections: 6., 6.2, 7. 7.4, and the class notes on dynamical systems. You absolutely must be able
More informationPreptests 59 Answers and Explanations (By Ivy Global) Section 1 Analytical Reasoning
Preptests 59 Answers and Explanations (By ) Section 1 Analytical Reasoning Questions 1 5 Since L occupies its own floor, the remaining two must have H in the upper and I in the lower. P and T also need
More informationMAT2342 : Introduction to Applied Linear Algebra Mike Newman, fall Projections. introduction
MAT4 : Introduction to Applied Linear Algebra Mike Newman fall 7 9. Projections introduction One reason to consider projections is to understand approximate solutions to linear systems. A common example
More informationPre Algebra, Unit 1: Variables, Expression, and Integers
Syllabus Objectives (1.1) Students will evaluate variable and numerical expressions using the order of operations. (1.2) Students will compare integers. (1.3) Students will order integers (1.4) Students
More informationMAS221 Analysis Semester Chapter 2 problems
MAS221 Analysis Semester 1 2018-19 Chapter 2 problems 20. Consider the sequence (a n ), with general term a n = 1 + 3. Can you n guess the limit l of this sequence? (a) Verify that your guess is plausible
More informationSolution to Proof Questions from September 1st
Solution to Proof Questions from September 1st Olena Bormashenko September 4, 2011 What is a proof? A proof is an airtight logical argument that proves a certain statement in general. In a sense, it s
More information642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004
642:550, Summer 2004, Supplement 6 The Perron-Frobenius Theorem. Summer 2004 Introduction Square matrices whose entries are all nonnegative have special properties. This was mentioned briefly in Section
More informationThe residual again. The residual is our method of judging how good a potential solution x! of a system A x = b actually is. We compute. r = b - A x!
The residual again The residual is our method of judging how good a potential solution x! of a system A x = b actually is. We compute r = b - A x! which gives us a measure of how good or bad x! is as a
More informationPost-selection inference with an application to internal inference
Post-selection inference with an application to internal inference Robert Tibshirani, Stanford University November 23, 2015 Seattle Symposium in Biostatistics, 2015 Joint work with Sam Gross, Will Fithian,
More informationThe lasso, persistence, and cross-validation
The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University
More information