A Short Introduction to the Lasso Methodology
|
|
- Rachel Smith
- 5 years ago
- Views:
Transcription
1 A Short Introduction to the Lasso Methodology Michael Gutmann sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology March 9, 2016 Michael Gutmann Short Introduction to the Lasso 1 / 24
2 Goals of the lecture Lasso Least Absolute Shrinkage and Selection Operator Goal: After the lecture, to understand what these words mean Shrinkage: The lasso shrinks / regularizes the least squares regression coefficients (like ridge regression). Selection: The lasso also performs variable selection (unlike ridge regression). Least absolute: Shrinkage and selection are achieved by penalizing the absolute values of the regression coefficients. Michael Gutmann Short Introduction to the Lasso 2 / 24
3 Linear regression 5 4 Data: {(x 1, y 1 ),..., (x n, y n )} n observations of pairs (x i, y i ) x i R p : covariates yi R: response Response Data Regression lines Covariate Assumption: linear relation between covariates and response y i = x i1 β x ip β p + e i (1) = x i β + e i (2) where e i is the residual Goal: Determine the coefficients β = (β 1,..., β p ) Michael Gutmann Short Introduction to the Lasso 3 / 24
4 Least squares Minimize the residual sum of squares (RSS) In vector notation, with we have y = n n ( 2 RSS(β) = ei 2 = y i x i β) (3) i=1 i=1 y 1. y n X = x 1. x n = x x 1p... x n1... x np (4) RSS(β) = y Xβ 2 2 (5) Michael Gutmann Short Introduction to the Lasso 4 / 24
5 Least squares Closed form solution if p p matrix X X is invertible ˆβ o = argmin RSS(β) (6) β = (X X) 1 X y (7) 5 4 Prediction given a test covariate vector x (8) ŷ = x ˆβ o Response Data -3 Regression lines Least squares solution Covariate Michael Gutmann Short Introduction to the Lasso 5 / 24
6 Ridge regression If X X is not invertible, regularized inverse can be taken ˆβ r = (X X + λi p ) 1 X y (9) where I p is the p p identity matrix and λ 0 the regularization parameter. This is ridge regression, ˆβ r is minimizing J r (β) p J r (β) = y Xβ λ βj 2 (10) j=1 As λ increases, ˆβ r shrinks to zero ( shrinkage ). Michael Gutmann Short Introduction to the Lasso 6 / 24
7 Benefits of ridge regression ˆβ r = (X X + λi p) 1 X y Regularization / shrinkage is useful even if X X is invertible. Reason: it can improve prediction accuracy Example: n = 50 observations, p = 10 covariates Orthonormal matrix X: X X = I p ˆβ r = 1 1+λ X y = 1 ˆβ o Least squares 56 1+λ Mean squared prediction error Penalty λ (log 10) Ridge regression Michael Gutmann Short Introduction to the Lasso 7 / 24
8 Limits of ridge regression ˆβ r = 1 1+λ ˆβ o Data were artificially generated with β = (3, 2, 1, 0,..., 0) The vector is sparse: only 3/10 nonzero terms Ridge regression cannot recover sparse β. Ridge regression performs shrinkage but not variable selection. Variable selection: Some ˆβ j are set to zero; covariates are omitted from Regression coefficients the fitted model Penalty λ (log 10) Michael Gutmann Short Introduction to the Lasso 8 / 24
9 Some practical aspects Choice of λ: via cross-validation Ridge solution ˆβ r depends on the scale of the covariates. Center so that n i=1 y i = n i=1 x ij = 0 Re-scale so that n i=1 x ij 2 = 1 Assume that the data were preprocessed in this manner. Michael Gutmann Short Introduction to the Lasso 9 / 24
10 Importance of variable selection It reduces the complexity of the models. The models become easier to interpret. It makes prediction cheaper: only covariates with nonzero ˆβ j need to be measured. ŷ = x 1 ˆβ x 1000 ˆβ 1000 ŷ = x 1 ˆβ1 + x 2 ˆβ2 + x 3 ˆβ3 Michael Gutmann Short Introduction to the Lasso 10 / 24
11 Lasso regression Lasso regression consists in minimizing J L (β), p J L (β) = y Xβ λ β j (11) j=1 Similar to the cost function J r (β) for ridge regression, p J r (β) = y Xβ λ βj 2 (12) j=1 λ 0 is the regularization (shrinkage) parameter. Penalty: sum of absolute values instead of sum of squares Difference seems minor but it results in a very different behavior: it enables shrinkage and selection of covariates. Michael Gutmann Short Introduction to the Lasso 11 / 24
12 Shrinkage and variable selection with the lasso The lasso generally lacks an analytical solution. Closed form solution when X X = I p ˆβ o λ ˆβ j L 2 if ˆβ o λ 2 = 0 if ˆβ o ( λ 2, λ 2 ) ˆβ o + λ 2 if ˆβ o λ 2 (13) shrinkage 1 shrinkage Lasso coefficient variable selection Ridge coefficient Least squares coefficient Least squares coefficient Michael Gutmann Short Introduction to the Lasso 12 / 24
13 Back to the example Data were artificially generated with β = (3, 2, 1, 0,..., 0) The vector is sparse: only 3/10 nonzero terms Lasso regression combines shrinkage and variable selection. Mean squared prediction error Lasso regression Least squares Regression coefficients Penalty λ (log 10) Penalty λ (log 10) Michael Gutmann Short Introduction to the Lasso 13 / 24
14 Proof Assume X X = I p We want to show that the shrinkage and selection operator ˆβ o λ ˆβ j L 2 if ˆβ o λ 2 = 0 if ˆβ o ( λ 2, λ 2 ) (14) ˆβ o + λ 2 if ˆβ o λ 2 minimizes J L (β), p J L (β) = y Xβ λ β j (15) j=1 Michael Gutmann Short Introduction to the Lasso 14 / 24
15 Proof p J L (β) = y Xβ λ β j (16) j=1 p = (y Xβ) (y Xβ) + λ β j (17) j=1 p = y y y Xβ β X y + β X Xβ + λ β j (18) j=1 p = y y 2β X y + β X}{{ X} β + λ β j (19) I p j=1 p = y y 2β X y +β β + λ β }{{} j (20) ˆβ o j=1 =r Michael Gutmann Short Introduction to the Lasso 15 / 24
16 Proof p J L (β) = y y 2β r + β β + λ β j (21) j=1 p p p = y y 2 β j r j + βj 2 + λ β j (22) j=1 j=1 j=1 p ( ) = y y + 2β j r j + βj 2 + λ β j (23) j=1 }{{} f j (β j ) p = constant + f j (β j ) (24) j=1 For X X = I p, the optimization problem decomposes into p independent problems. Minimizing each f j (β j ) separately will minimize J L (β). Michael Gutmann Short Introduction to the Lasso 16 / 24
17 Proof Drop the subscripts for a moment and consider a single f only. Problem: derivative at zero not defined f (β) = β 2 2rβ + λ β (25) 6 5 λ = 1 λ = 2 λ = 3 4 f (for r=1) β Michael Gutmann Short Introduction to the Lasso 17 / 24
18 Proof Approach: Make a smooth approximation β h ɛ (β) { ɛ h ɛ (β) = ɛ β2 if β ( ɛ, ɛ) β otherwise (26) Do all the work with ɛ > 0 and, at the end, take the limit ɛ absolute value approximation β Michael Gutmann Short Introduction to the Lasso 18 / 24
19 Proof Using h ɛ (β) instead of β gives f (β) = β 2 2rβ + λh ɛ (β) (27) The derivative of f (β) is f (β) = 2β 2r + λh ɛ(β) (28) if β ɛ h ɛ(β) β = ɛ if β ( ɛ, ɛ) 1 if β ɛ β Michael Gutmann Short Introduction to the Lasso 19 / 24
20 Proof Setting the derivative of f (β) to zero gives the condition 2β 2r + λh ɛ(β) = 0 (29) β + λ 2 h ɛ(β) = r (30) The left-hand side is a piecewise linear, monotonically increasing function g ɛ (β): β is uniquely determined by r β + λ 2 if β ɛ g ɛ (β) = β(1 + λ 2ɛ ) if β ( ɛ, ɛ) β λ 2 if β ɛ β Michael Gutmann Short Introduction to the Lasso 20 / 24
21 Proof There are three cases 1. r ɛ + λ 2 β + λ! = r β = r λ r ( ɛ λ, ɛ + ) λ 2 2 β 2ɛ + λ 2ɛ! = r β = 2ɛr 2ɛ + λ Case 1 Case 2 3. r ɛ λ 2 β λ 2! = r β = r + λ 2 Case β Michael Gutmann Short Introduction to the Lasso 21 / 24
22 Proof There are three cases 1. r ɛ + λ 2 β + λ! = r β = r λ r ( ɛ λ, ɛ + ) λ 2 2 β 2ɛ + λ 2ɛ! = r β = 2ɛr 2ɛ + λ Case 1 Case 2 3. r ɛ λ 2 Case β β λ 2! = r β = r + λ 2 r λ 2 if r ɛ + λ 2 Hence β = 2ɛ 2ɛ+λ r if r ( ɛ λ 2, ɛ + λ 2 ) r + λ 2 if r ɛ λ 2 Michael Gutmann Short Introduction to the Lasso 22 / 24
23 Proof Taking the limit ɛ 0 gives r λ 2 if r λ 2 ˆβ = 0 if r ( λ 2, λ 2 ) r + λ 2 if r λ 2 With the subscripts, and r j = ˆβ j o, we have ˆβ j o λ 2 if ˆβ j o λ 2 ˆβ j = 0 if ˆβ j o ( λ 2, λ 2 ) ˆβ j o + λ 2 if ˆβ j o λ 2 (31) (32) which is the lasso solution ˆβ L j. Michael Gutmann Short Introduction to the Lasso 23 / 24
24 Summary Lasso Least Absolute Shrinkage and Selection Operator Method to regularize linear regression (like ridge regression) Regularization / shrinkage can improve prediction accuracy. Method to perform covariate selection (unlike ridge regression) Covariate selection reduces the complexity of fitted models; makes them easier to interpret. Combination of shrinkage and selection is achieved by penalizing the absolute values of the regression coefficients. Michael Gutmann Short Introduction to the Lasso 24 / 24
25 Appendix
26 Constrained optimization point of view Ridge regression: min y β Xβ 2 2 p subject to t j=1 β 2 j Lasso regression: min y β Xβ 2 2 p subject to β j t j=1 RSS increases Iso-contours of the RSS Constraint set (Based on figures from chapter 6 of Introduction to Statistical Learning) Michael Gutmann Short Introduction to the Lasso 2 / 2
Dimension Reduction Methods
Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationLecture 14: Shrinkage
Lecture 14: Shrinkage Reading: Section 6.2 STATS 202: Data mining and analysis October 27, 2017 1 / 19 Shrinkage methods The idea is to perform a linear regression, while regularizing or shrinking the
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationESL Chap3. Some extensions of lasso
ESL Chap3 Some extensions of lasso 1 Outline Consistency of lasso for model selection Adaptive lasso Elastic net Group lasso 2 Consistency of lasso for model selection A number of authors have studied
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationSTAT 462-Computational Data Analysis
STAT 462-Computational Data Analysis Chapter 5- Part 2 Nasser Sadeghkhani a.sadeghkhani@queensu.ca October 2017 1 / 27 Outline Shrinkage Methods 1. Ridge Regression 2. Lasso Dimension Reduction Methods
More informationRidge Regression. Flachs, Munkholt og Skotte. May 4, 2009
Ridge Regression Flachs, Munkholt og Skotte May 4, 2009 As in usual regression we consider a pair of random variables (X, Y ) with values in R p R and assume that for some (β 0, β) R +p it holds that E(Y
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized
More informationLecture 5: A step back
Lecture 5: A step back Last time Last time we talked about a practical application of the shrinkage idea, introducing James-Stein estimation and its extension We saw our first connection between shrinkage
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationThe prediction of house price
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationThe lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding
Patrick Breheny February 15 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/24 Introduction Last week, we introduced penalized regression and discussed ridge regression, in which the penalty
More informationDay 4: Shrinkage Estimators
Day 4: Shrinkage Estimators Kenneth Benoit Data Mining and Statistical Learning March 9, 2015 n versus p (aka k) Classical regression framework: n > p. Without this inequality, the OLS coefficients have
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationA Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013)
A Survey of L 1 Regression Vidaurre, Bielza and Larranaga (2013) Céline Cunen, 20/10/2014 Outline of article 1.Introduction 2.The Lasso for Linear Regression a) Notation and Main Concepts b) Statistical
More informationHomework 1: Solutions
Homework 1: Solutions Statistics 413 Fall 2017 Data Analysis: Note: All data analysis results are provided by Michael Rodgers 1. Baseball Data: (a) What are the most important features for predicting players
More informationMS-C1620 Statistical inference
MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents
More informationData Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.
TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 12 1 / 36 Tip + Paper Tip: As a statistician the results
More informationStatistics 203: Introduction to Regression and Analysis of Variance Penalized models
Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance
More informationLinear regression methods
Linear regression methods Most of our intuition about statistical methods stem from linear regression. For observations i = 1,..., n, the model is Y i = p X ij β j + ε i, j=1 where Y i is the response
More informationLinear model selection and regularization
Linear model selection and regularization Problems with linear regression with least square 1. Prediction Accuracy: linear regression has low bias but suffer from high variance, especially when n p. It
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1
MA 575 Linear Models: Cedric E Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1 1 Within-group Correlation Let us recall the simple two-level hierarchical
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationarxiv: v3 [stat.ml] 14 Apr 2016
arxiv:1307.0048v3 [stat.ml] 14 Apr 2016 Simple one-pass algorithm for penalized linear regression with cross-validation on MapReduce Kun Yang April 15, 2016 Abstract In this paper, we propose a one-pass
More informationShrinkage Methods: Ridge and Lasso
Shrinkage Methods: Ridge and Lasso Jonathan Hersh 1 Chapman University, Argyros School of Business hersh@chapman.edu February 27, 2019 J.Hersh (Chapman) Ridge & Lasso February 27, 2019 1 / 43 1 Intro and
More informationConsistent high-dimensional Bayesian variable selection via penalized credible regions
Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationUNIVERSITETET I OSLO
UNIVERSITETET I OSLO Det matematisk-naturvitenskapelige fakultet Examination in: STK4030 Modern data analysis - FASIT Day of examination: Friday 13. Desember 2013. Examination hours: 14.30 18.30. This
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationSpatial Lasso with Application to GIS Model Selection. F. Jay Breidt Colorado State University
Spatial Lasso with Application to GIS Model Selection F. Jay Breidt Colorado State University with Hsin-Cheng Huang, Nan-Jung Hsu, and Dave Theobald September 25 The work reported here was developed under
More information. a m1 a mn. a 1 a 2 a = a n
Biostat 140655, 2008: Matrix Algebra Review 1 Definition: An m n matrix, A m n, is a rectangular array of real numbers with m rows and n columns Element in the i th row and the j th column is denoted by
More informationRegularization: Ridge Regression and the LASSO
Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression
More informationStability and the elastic net
Stability and the elastic net Patrick Breheny March 28 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/32 Introduction Elastic Net Our last several lectures have concentrated on methods for
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationBANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1
BANA 7046 Data Mining I Lecture 2. Linear Regression, Model Assessment, and Cross-validation 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013)
More information5.2 Expounding on the Admissibility of Shrinkage Estimators
STAT 383C: Statistical Modeling I Fall 2015 Lecture 5 September 15 Lecturer: Purnamrita Sarkar Scribe: Ryan O Donnell Disclaimer: These scribe notes have been slightly proofread and may have typos etc
More informationLinear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,
Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,
More informationLinear Models in Machine Learning
CS540 Intro to AI Linear Models in Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu We briefly go over two linear models frequently used in machine learning: linear regression for, well, regression,
More informationPrediction & Feature Selection in GLM
Tarigan Statistical Consulting & Coaching statistical-coaching.ch Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL Hands-on Data Analysis
More informationRegression #2. Econ 671. Purdue University. Justin L. Tobias (Purdue) Regression #2 1 / 24
Regression #2 Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #2 1 / 24 Estimation In this lecture, we address estimation of the linear regression model. There are many objective functions
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More informationNonparametric Regression. Badr Missaoui
Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y
More informationGeneralized Elastic Net Regression
Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1
More informationChapter 3. Linear Models for Regression
Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More informationLecture 11: Regression Methods I (Linear Regression)
Lecture 11: Regression Methods I (Linear Regression) Fall, 2017 1 / 40 Outline Linear Model Introduction 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear
More informationCS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS
CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Problems
More informationCOMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017
COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We
More informationLinear Regression (1/1/17)
STA613/CBB540: Statistical methods in computational biology Linear Regression (1/1/17) Lecturer: Barbara Engelhardt Scribe: Ethan Hada 1. Linear regression 1.1. Linear regression basics. Linear regression
More informationContents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects
Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:
More informationHigh-dimensional regression modeling
High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making
More informationBi-level feature selection with applications to genetic association
Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may
More informationMATH 829: Introduction to Data Mining and Analysis Least angle regression
1/14 MATH 829: Introduction to Data Mining and Analysis Least angle regression Dominique Guillot Departments of Mathematical Sciences University of Delaware February 29, 2016 Least angle regression (LARS)
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from
More informationUVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 6: Linear Regression Model with RegularizaEons. Dr. Yanjun Qi. University of Virginia
UVA CS 6316/4501 Fall 2016 Machine Learning Lecture 6: Linear Regression Model with RegularizaEons Dr. Yanjun Qi University of Virginia Department of Computer Science 1 Where are we? è Five major secgons
More informationWithin Group Variable Selection through the Exclusive Lasso
Within Group Variable Selection through the Exclusive Lasso arxiv:1505.07517v1 [stat.me] 28 May 2015 Frederick Campbell Department of Statistics, Rice University and Genevera Allen Department of Statistics,
More informationRidge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation
Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking
More informationLogistic Regression with the Nonnegative Garrote
Logistic Regression with the Nonnegative Garrote Enes Makalic Daniel F. Schmidt Centre for MEGA Epidemiology The University of Melbourne 24th Australasian Joint Conference on Artificial Intelligence 2011
More information25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large
More informationChapter 2: simple regression model
Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.
More informationLecture 19 Multiple (Linear) Regression
Lecture 19 Multiple (Linear) Regression Thais Paiva STA 111 - Summer 2013 Term II August 1, 2013 1 / 30 Thais Paiva STA 111 - Summer 2013 Term II Lecture 19, 08/01/2013 Lecture Plan 1 Multiple regression
More informationSmoothly Clipped Absolute Deviation (SCAD) for Correlated Variables
Smoothly Clipped Absolute Deviation (SCAD) for Correlated Variables LIB-MA, FSSM Cadi Ayyad University (Morocco) COMPSTAT 2010 Paris, August 22-27, 2010 Motivations Fan and Li (2001), Zou and Li (2008)
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationMachine Learning. Regularization and Feature Selection. Fabio Vandin November 13, 2017
Machine Learning Regularization and Feature Selection Fabio Vandin November 13, 2017 1 Learning Model A: learning algorithm for a machine learning task S: m i.i.d. pairs z i = (x i, y i ), i = 1,..., m,
More informationMachine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart
Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural
More informationWeighted Least Squares
Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w
More informationOWL to the rescue of LASSO
OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,
More informationThe MNet Estimator. Patrick Breheny. Department of Biostatistics Department of Statistics University of Kentucky. August 2, 2010
Department of Biostatistics Department of Statistics University of Kentucky August 2, 2010 Joint work with Jian Huang, Shuangge Ma, and Cun-Hui Zhang Penalized regression methods Penalized methods have
More informationCompressed Sensing in Cancer Biology? (A Work in Progress)
Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University
More informationLecture 11: Regression Methods I (Linear Regression)
Lecture 11: Regression Methods I (Linear Regression) 1 / 43 Outline 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear Regression Ordinary Least Squares Statistical
More informationORIE 4741: Learning with Big Messy Data. Regularization
ORIE 4741: Learning with Big Messy Data Regularization Professor Udell Operations Research and Information Engineering Cornell October 26, 2017 1 / 24 Regularized empirical risk minimization choose model
More informationNon-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets
Non-linear Supervised High Frequency Trading Strategies with Applications in US Equity Markets Nan Zhou, Wen Cheng, Ph.D. Associate, Quantitative Research, J.P. Morgan nan.zhou@jpmorgan.com The 4th Annual
More information6. Regularized linear regression
Foundations of Machine Learning École Centrale Paris Fall 2015 6. Regularized linear regression Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr
More informationMachine Learning. Regression basics. Marc Toussaint University of Stuttgart Summer 2015
Machine Learning Regression basics Linear regression, non-linear features (polynomial, RBFs, piece-wise), regularization, cross validation, Ridge/Lasso, kernel trick Marc Toussaint University of Stuttgart
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne
More informationRegression.
Regression www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts linear regression RMSE, MAE, and R-square logistic regression convex functions and sets
More informationTheoretical results for lasso, MCP, and SCAD
Theoretical results for lasso, MCP, and SCAD Patrick Breheny March 2 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/23 Introduction There is an enormous body of literature concerning theoretical
More information18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013
18.S096 Problem Set 3 Fall 013 Regression Analysis Due Date: 10/8/013 he Projection( Hat ) Matrix and Case Influence/Leverage Recall the setup for a linear regression model y = Xβ + ɛ where y and ɛ are
More informationSimultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR
Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR Howard D. Bondell and Brian J. Reich Department of Statistics, North Carolina State University,
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationAnalysis of Greedy Algorithms
Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm
More informationVariable Selection under Measurement Error: Comparing the Performance of Subset Selection and Shrinkage Methods
Variable Selection under Measurement Error: Comparing the Performance of Subset Selection and Shrinkage Methods Ellen Sasahara Bachelor s Thesis Supervisor: Prof. Dr. Thomas Augustin Department of Statistics
More informationGI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil
GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection
More informationParameter Norm Penalties. Sargur N. Srihari
Parameter Norm Penalties Sargur N. srihari@cedar.buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationLinear Model Selection and Regularization
Linear Model Selection and Regularization Chapter 6 October 18, 2016 Chapter 6 October 18, 2016 1 / 80 1 Subset selection 2 Shrinkage methods 3 Dimension reduction methods (using derived inputs) 4 High
More informationProbabilistic machine learning group, Aalto University Bayesian theory and methods, approximative integration, model
Aki Vehtari, Aalto University, Finland Probabilistic machine learning group, Aalto University http://research.cs.aalto.fi/pml/ Bayesian theory and methods, approximative integration, model assessment and
More informationModeling Real Estate Data using Quantile Regression
Modeling Real Estate Data using Semiparametric Quantile Regression Department of Statistics University of Innsbruck September 9th, 2011 Overview 1 Application: 2 3 4 Hedonic regression data for house prices
More informationMachine Learning for Economists: Part 4 Shrinkage and Sparsity
Machine Learning for Economists: Part 4 Shrinkage and Sparsity Michal Andrle International Monetary Fund Washington, D.C., October, 2018 Disclaimer #1: The views expressed herein are those of the authors
More informationRegularization Paths
December 2005 Trevor Hastie, Stanford Statistics 1 Regularization Paths Trevor Hastie Stanford University drawing on collaborations with Brad Efron, Saharon Rosset, Ji Zhu, Hui Zhou, Rob Tibshirani and
More information