MATH 829: Introduction to Data Mining and Analysis Least angle regression
MATH 829: Introduction to Data Mining and Analysis
Least angle regression

Dominique Guillot
Department of Mathematical Sciences, University of Delaware
February 29, 2016
Least angle regression (LARS)

Recall the forward stagewise approach to linear regression:

1. Start with the intercept equal to ȳ, centered predictors, and all coefficients initially set to 0.
2. At each step, identify the variable most correlated with the current residual.
3. Compute the simple linear regression coefficient of the residual on this chosen variable, and add it to the current coefficient for that variable.
4. Continue until none of the variables have any correlation with the residuals.

This is a greedy approach. However, the solution often looks similar to the lasso solution. Is there a connection between the two methods?
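The stagewise procedure above can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the course; the function name `forward_stagewise` and the stopping tolerance are my own choices, and the predictors are assumed centered and (approximately) standardized so that inner products play the role of correlations.

```python
import numpy as np

def forward_stagewise(X, y, n_steps=2000):
    """Forward stagewise linear regression on centered data.

    X : (n, p) matrix of centered predictors.
    y : centered response.
    Each step regresses the residual on the most-correlated predictor
    and adds the resulting simple-regression coefficient.
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                       # residual, starts at the centered y
    for _ in range(n_steps):
        corr = X.T @ r                 # inner products with the residual
        j = np.argmax(np.abs(corr))    # most correlated variable
        if np.abs(corr[j]) < 1e-10:    # no remaining correlation: stop
            break
        # simple linear regression coefficient of r on x_j
        delta = corr[j] / (X[:, j] @ X[:, j])
        beta[j] += delta
        r -= delta * X[:, j]
    return beta
```

With the full simple-regression step taken at each iteration, this is a greedy coordinate descent on the squared loss, so on a well-conditioned problem it converges to the least-squares solution.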
Forward stagewise vs lasso

Example: prostate cancer data (see ESL). [Figure from Efron et al., 2003.]
LARS

Least angle regression (LARS) is similar to forward stagewise, but only enters "as much" of a predictor as it deserves (see ESL, Algorithm 3.2).
LARS (cont.)

Let A_k be the current active set, and let β_{A_k} be the coefficient vector at step k. Let r_k = y - X_{A_k} β_{A_k} denote the residual at step k. Then, at step k, we move the coefficients in the direction

δ_k = (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T r_k,

i.e., β_{A_k}(α) = β_{A_k} + α δ_k.
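The direction δ_k is just the least-squares fit of the current residual on the active predictors, so it can be computed without forming the normal equations explicitly. A minimal sketch (the helper name `lars_direction` is hypothetical):

```python
import numpy as np

def lars_direction(X_active, r):
    """delta_k = (X_A^T X_A)^{-1} X_A^T r_k, computed as the
    least-squares coefficients of the residual on the active predictors."""
    delta, *_ = np.linalg.lstsq(X_active, r, rcond=None)
    return delta
```

Using `lstsq` rather than inverting X_A^T X_A is the usual numerically stable choice when the active predictors are nearly collinear.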
Analysis of LARS

How does the correlation between the predictors and the residuals evolve along

δ_k = (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T r_k,    β_{A_k}(α) = β_{A_k} + α δ_k?

It remains the same for all predictors, and decreases monotonically. Indeed, suppose each predictor in a linear regression problem has equal correlation (in absolute value) with the response:

(1/n) ⟨x_j, y⟩ = λ,    j = 1, ..., p.

(Recall that we assume the predictors have been standardized.) Let β̂ be the least-squares coefficients of y on X, and let u(α) = α X β̂ for α ∈ [0, 1].
Analysis of LARS (cont.)

We have (1/n) ⟨x_j, y⟩ = λ and u(α) = α X β̂. Now,

((1/n) ⟨x_j, y - u(α)⟩)_{j=1}^p = (1/n) X^T (y - u(α))
                                = (1/n) X^T (y - α X (X^T X)^{-1} X^T y)
                                = (1/n) (1 - α) X^T y
                                = (1 - α) λ 1_p.

Therefore, the correlation between x_j and the residuals y - u(α) decreases linearly to 0. In LARS, the parameter α is increased until a new variable becomes equally correlated with the residuals y - u(α). The new variable is then added to the model, and a new direction is computed.
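The key identity X^T (y - u(α)) = (1 - α) X^T y follows from the normal equations X^T X β̂ = X^T y, and it holds for any full-rank design, not only under the equal-correlation hypothesis. A small numerical sanity check with simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# least-squares fit and the path u(alpha) = alpha * X beta_hat
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

def resid_corr(alpha):
    """(1/n) X^T (y - u(alpha)): inner products of predictors with the residual."""
    u = alpha * X @ beta_hat
    return X.T @ (y - u) / n

# the inner products shrink linearly: equal to (1 - alpha) * (1/n) X^T y
checks = [np.allclose(resid_corr(a), (1 - a) * X.T @ y / n)
          for a in (0.0, 0.3, 0.7, 1.0)]
```

All four checks evaluate to True, confirming that every predictor's correlation with the residual is scaled by the same factor (1 - α).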
Analysis of LARS (cont.)

Example: Ĉ_k denotes the current maximal correlation. [Figure from Efron et al., 2003.]
Equiangular vector

Why "least angle" regression? Recall: β_{A_k}(α) = β_{A_k} + α δ_k. Thus,

ŷ(α) = X_{A_k} β_{A_k}(α) = X_{A_k} β_{A_k} + α X_{A_k} δ_k.

It is not hard to check that u_k := X_{A_k} δ_k makes equal angles with the predictors in A_k. Indeed,

X_{A_k}^T u_k = X_{A_k}^T X_{A_k} δ_k = X_{A_k}^T X_{A_k} (X_{A_k}^T X_{A_k})^{-1} X_{A_k}^T r_k = X_{A_k}^T r_k.

The entries of the vector X_{A_k}^T r_k are all the same, since the predictors in A_k all have the same correlation with the residual r_k (by construction). Conclusion: u_k makes equal angles with the predictors in A_k.

Problem: in general, given v_1, ..., v_k ∈ R^n, how do we find a vector that makes equal angles with v_1, ..., v_k? When is this possible?
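A quick numerical illustration of the computation above: construct a residual whose inner products with a set of unit-norm predictors are all equal, and check that u_k = X_{A_k} δ_k then makes equal angles with them. The setup below is synthetic (random predictors, an arbitrary common inner product c = 0.5):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 60, 3
XA = rng.standard_normal((n, k))
XA /= np.linalg.norm(XA, axis=0)       # standardized (unit-norm) predictors

# build a residual with the same inner product c against every active predictor
c = 0.5
r = XA @ np.linalg.solve(XA.T @ XA, c * np.ones(k))

delta = np.linalg.solve(XA.T @ XA, XA.T @ r)   # LARS direction delta_k
u = XA @ delta                                  # u_k in response space

# since the columns have unit norm, these are the cosines of the angles
cosines = XA.T @ u / np.linalg.norm(u)
```

The vector `cosines` has identical entries, exactly as the identity X_{A_k}^T u_k = X_{A_k}^T r_k predicts.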
LARS and Lasso

LARS is closely related to stepwise regression. There is also a connection to the lasso. [Figure from ESL.] In that figure, the lasso coefficient profiles are almost identical to those of LARS in the left panel, and differ for the first time when the blue coefficient passes back through zero.
LARS and Lasso (cont.)

The previous observation suggests a modification of LARS (ESL, Algorithm 3.2a).

Theorem: The modified LARS (lasso) algorithm (Algorithm 3.2a) yields the solution of the lasso problem if variables appear/disappear one at a time.

See Efron et al., "Least angle regression", The Annals of Statistics, 2004.

Note: the theorem explains the piecewise linear nature of the lasso paths.
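One way to see the piecewise linearity concretely is the orthonormal-design special case, where the lasso solution is soft-thresholding of the least-squares coefficients. This is a standard fact used here purely as an illustration; the snippet below is not Algorithm 3.2a itself, and all numbers are arbitrary:

```python
import numpy as np

def soft_threshold(z, t):
    """Coordinatewise lasso solution when X has orthonormal columns."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
X = Q[:, :3]                           # orthonormal columns: X^T X = I
y = X @ np.array([3.0, -2.0, 0.5])

b_ols = X.T @ y                        # least-squares coefficients here
lambdas = np.linspace(0.0, 4.0, 9)
path = np.array([soft_threshold(b_ols, lam) for lam in lambdas])
```

Between the "kinks" at |b_ols_j|, each coordinate of the path is an affine function of the penalty level, and it is exactly zero beyond its kink; this is the piecewise linear behavior the theorem establishes in general.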
Consistency of the Lasso

Recall: we proved before that the least-squares estimator is consistent, i.e., β̂_n → β as the sample size n goes to infinity (under some assumptions). We now study analogous results for the lasso.

Assumptions:
- X_1, ..., X_p are (possibly dependent) random variables.
- |X_j| ≤ M almost surely for some M > 0 (j = 1, ..., p).
- Y = Σ_{j=1}^p β*_j X_j + ε for some (unknown) constants β*_j.
- ε ~ N(0, σ²) is independent of the X_j (σ² unknown).
- A sparsity assumption (specified later).
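These assumptions are easy to instantiate in a simulation. The concrete values of M, σ, and the sparse β* below are arbitrary choices for illustration, and uniform predictors are just one convenient way to get the boundedness |X_j| ≤ M:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, M, sigma = 500, 10, 1.0, 0.5
beta_star = np.zeros(p)
beta_star[:3] = [2.0, -1.0, 0.5]             # sparse true coefficients

X = rng.uniform(-M, M, size=(n, p))          # |X_j| <= M almost surely
eps = rng.normal(0.0, sigma, size=n)         # Gaussian noise, independent of X
Y = X @ beta_star + eps                      # the assumed linear model
```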
Consistency of the Lasso (cont.)

We are given n iid observations Z_i = (Y_i, X_{i,1}, ..., X_{i,p}) of (Y, X_1, ..., X_p). Our goal is to recover β*_1, ..., β*_p as accurately as possible.

Let

Ŷ = Σ_{j=1}^p β*_j X_j,

the best predictor of Y if the true coefficients were known. Given coefficients β̃_1, ..., β̃_p, let

Ỹ = Σ_{j=1}^p β̃_j X_j.

Define the mean squared prediction error by MSPE(β̃) = E(Ŷ - Ỹ)². We will provide a bound on MSPE(β̃) when β̃ is the lasso solution.
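Since Ŷ - Ỹ = Σ_j (β*_j - β̃_j) X_j, the MSPE can be estimated by Monte Carlo and compared against the closed form (β* - β̃)^T E[X X^T] (β* - β̃). A sketch with independent Uniform(-1, 1) predictors, for which E[X X^T] = (1/3) I (the coefficient values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
p = 3
beta_star = np.array([1.0, 0.0, -2.0])       # "true" coefficients
beta_tilde = np.array([0.8, 0.1, -1.5])      # some candidate coefficients

# Monte Carlo estimate of MSPE = E[(Yhat - Ytilde)^2]
Xs = rng.uniform(-1, 1, size=(200_000, p))   # bounded predictors, |X_j| <= 1
mc = np.mean((Xs @ (beta_star - beta_tilde)) ** 2)

# closed form: d^T Cov(X) d with Cov(X) = (1/3) I for Uniform(-1, 1)
d = beta_star - beta_tilde
exact = d @ d / 3.0
```

With 200,000 draws the Monte Carlo estimate agrees with the exact value (here 0.1) to well within one percent.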
Consistency of the Lasso (cont.)

Given K > 0, let β̂^K = (β̂_1^K, ..., β̂_p^K) be the minimizer of

Σ_{i=1}^n (Y_i - β_1 X_{i,1} - ... - β_p X_{i,p})²

under the constraint Σ_{i=1}^p |β_i| ≤ K. (This problem is equivalent to the lasso.)

Theorem: Under the previous assumptions, and assuming Σ_{j=1}^p |β*_j| ≤ K for some K > 0 (the sparsity assumption), we have

MSPE(β̂^K) ≤ 2KMσ √(2 log(2p)/n) + 8K²M² √(2 log(2p²)/n).

See Chatterjee, "Assumptionless consistency of the Lasso", preprint, 2013.
MS&E 226 In-Class Midterm Examination Solutions Small Data October 20, 2015 PROBLEM 1. Alice uses ordinary least squares to fit a linear regression model on a dataset containing outcome data Y and covariates
More informationarxiv: v1 [math.st] 2 May 2007
arxiv:0705.0269v1 [math.st] 2 May 2007 Electronic Journal of Statistics Vol. 1 (2007) 1 29 ISSN: 1935-7524 DOI: 10.1214/07-EJS004 Forward stagewise regression and the monotone lasso Trevor Hastie Departments
More informationOn High-Dimensional Cross-Validation
On High-Dimensional Cross-Validation BY WEI-CHENG HSIAO Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan hsiaowc@stat.sinica.edu.tw 5 WEI-YING
More informationCOMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION
COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for
More informationStatistical View of Least Squares
May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More informationECON 616: Lecture Two: Deterministic Trends, Nonstationary Processes
ECON 616: Lecture Two: Deterministic Trends, Nonstationary Processes ED HERBST September 11, 2017 Background Hamilton, chapters 15-16 Trends vs Cycles A commond decomposition of macroeconomic time series
More informationMultiple linear regression S6
Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple
More informationMATH 829: Introduction to Data Mining and Analysis Logistic regression
1/11 MATH 829: Introduction to Data Mining and Analysis Logistic regression Dominique Guillot Departments of Mathematical Sciences University of Delaware March 7, 2016 Logistic regression 2/11 Suppose
More informationThe lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding
Patrick Breheny February 15 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/24 Introduction Last week, we introduced penalized regression and discussed ridge regression, in which the penalty
More informationLecture 7: Modeling Krak(en)
Lecture 7: Modeling Krak(en) Variable selection Last In both time cases, we saw we two are left selection with selection problems problem -- How -- do How we pick do we either pick either subset the of
More informationMIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications. Class 08: Sparsity Based Regularization. Lorenzo Rosasco
MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications Class 08: Sparsity Based Regularization Lorenzo Rosasco Learning algorithms so far ERM + explicit l 2 penalty 1 min w R d n n l(y
More informationSummary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club
Summary and discussion of: Exact Post-selection Inference for Forward Stepwise and Least Angle Regression Statistics Journal Club 36-825 1 Introduction Jisu Kim and Veeranjaneyulu Sadhanala In this report
More informationLeast Squares Regression
Least Squares Regression Chemical Engineering 2450 - Numerical Methods Given N data points x i, y i, i 1 N, and a function that we wish to fit to these data points, fx, we define S as the sum of the squared
More informationOslo Class 6 Sparsity based regularization
RegML2017@SIMULA Oslo Class 6 Sparsity based regularization Lorenzo Rosasco UNIGE-MIT-IIT May 4, 2017 Learning from data Possible only under assumptions regularization min Ê(w) + λr(w) w Smoothness Sparsity
More informationScatter plot of data from the study. Linear Regression
1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationMATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models
1/13 MATH 829: Introduction to Data Mining and Analysis Graphical Models II - Gaussian Graphical Models Dominique Guillot Departments of Mathematical Sciences University of Delaware May 4, 2016 Recall
More informationRidge and Lasso Regression
enote 8 1 enote 8 Ridge and Lasso Regression enote 8 INDHOLD 2 Indhold 8 Ridge and Lasso Regression 1 8.1 Reading material................................. 2 8.2 Presentation material...............................
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationCOMS 4771 Lecture Fixed-design linear regression 2. Ridge and principal components regression 3. Sparse regression and Lasso
COMS 477 Lecture 6. Fixed-design linear regression 2. Ridge and principal components regression 3. Sparse regression and Lasso / 2 Fixed-design linear regression Fixed-design linear regression A simplified
More informationAPPROXIMATING THE COMPLEXITY MEASURE OF. Levent Tuncel. November 10, C&O Research Report: 98{51. Abstract
APPROXIMATING THE COMPLEXITY MEASURE OF VAVASIS-YE ALGORITHM IS NP-HARD Levent Tuncel November 0, 998 C&O Research Report: 98{5 Abstract Given an m n integer matrix A of full row rank, we consider the
More informationAssignment 3. Introduction to Machine Learning Prof. B. Ravindran
Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively
More informationS-PLUS and R package for Least Angle Regression
S-PLUS and R package for Least Angle Regression Tim Hesterberg, Chris Fraley Insightful Corp. Abstract Least Angle Regression is a promising technique for variable selection applications, offering a nice
More informationA New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives
A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives Paul Grigas May 25, 2016 1 Boosting Algorithms in Linear Regression Boosting [6, 9, 12, 15, 16] is an extremely
More informationx 1 + 4x 2 = 5, 7x 1 + 5x 2 + 2x 3 4,
LUNDS TEKNISKA HÖGSKOLA MATEMATIK LÖSNINGAR LINJÄR OCH KOMBINATORISK OPTIMERING 2018-03-16 1. a) The rst thing to do is to rewrite the problem so that the right hand side of all constraints are positive.
More informationRegression and Statistical Inference
Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationLEAST ANGLE REGRESSION 469
LEAST ANGLE REGRESSION 469 Specifically for the Lasso, one alternative strategy for logistic regression is to use a quadratic approximation for the log-likelihood. Consider the Bayesian version of Lasso
More informationAMS 7 Correlation and Regression Lecture 8
AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation
More informationWithin Group Variable Selection through the Exclusive Lasso
Within Group Variable Selection through the Exclusive Lasso arxiv:1505.07517v1 [stat.me] 28 May 2015 Frederick Campbell Department of Statistics, Rice University and Genevera Allen Department of Statistics,
More information