Focused fine-tuning of ridge regression

1 Focused fine-tuning of ridge regression

Kristoffer Hellton
Department of Mathematics, University of Oslo

May 9, 2016

2 Penalized regression

The least-squares (LS) estimate β̂_LS = (XᵀX)⁻¹XᵀY is sensitive to random errors, or not unique, when XᵀX becomes close to or exactly singular, for instance when p > n. Ridge regression addresses this problem by penalizing the residual sum of squares,

  β̂_ridge = argmin_β Σᵢ₌₁ⁿ (yᵢ − xᵢᵀβ)² + λ Σⱼ₌₁ᵖ βⱼ²,

introducing a tuning parameter λ. The solution, and its relation to the LS estimate (p < n):

  β̂ = (XᵀX + λIₚ)⁻¹XᵀY = (XᵀX + λIₚ)⁻¹XᵀX β̂_LS.
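For reference, the closed form is a single linear solve; a minimal numpy sketch (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Ridge solution (X'X + lam*I_p)^{-1} X'y, via a linear solve
    rather than an explicit matrix inverse."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Solving the linear system instead of forming the inverse is the standard numerically stable choice.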

3 How to choose the tuning parameter?

Many different procedures exist:
- K-fold cross-validation (Hastie et al., 2009)
- Allen's PRESS statistic, i.e. leave-one-out CV (Allen, 1974)
- generalized cross-validation (Golub et al., 1979)
- bootstrap (Delaney and Chatterjee, 1986)
- small-sample corrected AIC (Hurvich and Tsai, 1989)
- marginal maximum likelihood

Already Golub et al. (1979) provide references to more than 15 procedures. The current unchallenged favorite: 10-fold cross-validation.

4 Thought experiment

A medical study has measured covariates and observed outcomes for a group of patients. A new patient enters the doctor's office, and we wish to predict the outcome for his/her specific set of covariates, x₀, as well as possible.

Cross-validation would select λ to be optimal for the overall distribution of covariates, but not for the specific x₀. Instead, frame the specific prediction µ = x₀ᵀβ as a focus parameter:
1. find the bias and variance of µ̂,
2. minimize the estimated mean squared error (MSE) of µ̂ as a function of λ to find λ̂.

6 Given the standard fixed-design regression model

  yᵢ = xᵢᵀβ + εᵢ,  εᵢ ∼ N(0, σ²),

and the focus µ₀ = E y₀ = x₀ᵀβ, the MSE of µ̂ is a function of λ and the model parameters:

  MSE(µ̂; λ, x₀, β, σ²) = { x₀ᵀ((XᵀX + λIₚ)⁻¹XᵀX − Iₚ)β }² + σ² x₀ᵀ(XᵀX + λIₚ)⁻¹XᵀX(XᵀX + λIₚ)⁻¹x₀.

For each covariate vector x₀, the MSE curve as a function of λ has a different minimum.

[Figure: root MSE as a function of the tuning parameter λ, one curve per x₀.]
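The formula can be evaluated directly; a small numpy sketch under the same fixed-design model (names are mine):

```python
import numpy as np

def mse_mu(lam, x0, X, beta, sigma2):
    """Exact MSE of mu_hat = x0' beta_ridge(lam): squared bias plus variance."""
    p = X.shape[1]
    XtX = X.T @ X
    v = np.linalg.solve(XtX + lam * np.eye(p), x0)   # (X'X + lam*I)^{-1} x0
    bias = v @ XtX @ beta - x0 @ beta                # x0'((X'X+lam*I)^{-1}X'X - I) beta
    var = sigma2 * v @ XtX @ v                       # sigma^2 x0'(..)^{-1} X'X (..)^{-1} x0
    return bias**2 + var
```

Sweeping lam over a grid for two different x₀ vectors reproduces the point of the figure: the curves bottom out at different λ.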

9 The minimizer of the theoretical MSE defines the oracle tuning

  λ_best = argmin_λ MSE(µ̂; λ, x₀).

But an estimator of λ_best requires pilot estimates of β and σ². Our proposed estimator uses the LS estimates (for p < n):

  λ̂_best = argmin_λ M̂SE(µ̂; λ, x₀)
         = argmin_λ { V̂ar(µ̂; λ, x₀) + b̂ias²(µ̂; λ, x₀) }
         = argmin_λ { V̂ar + max{ (b̂ias)² − V̂ar[b̂ias], 0 } },

or a simplified version without the bias correction, written out as

  λ̂_best = argmin_λ { (x₀ᵀ((XᵀX + λIₚ)⁻¹XᵀX − Iₚ)β̂)² + σ² x₀ᵀ(XᵀX + λIₚ)⁻¹XᵀX(XᵀX + λIₚ)⁻¹x₀ }.
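A sketch of the plug-in estimator with the bias correction, assuming p < n so the LS pilot exists; the grid search, the names, and the residual-variance estimate of σ² are my choices, not prescribed by the talk:

```python
import numpy as np

def focused_lambda(x0, X, y, grid=None):
    """Focused tuning: minimize the estimated MSE of x0'beta_ridge(lam)
    over a lambda grid, using LS pilot estimates and the bias correction."""
    n, p = X.shape
    if grid is None:
        grid = np.logspace(-4, 6, 400)
    XtX = X.T @ X
    XtX_inv = np.linalg.inv(XtX)
    beta_ls = XtX_inv @ (X.T @ y)
    resid = y - X @ beta_ls
    sigma2 = resid @ resid / (n - p)                 # pilot noise estimate

    def mse_hat(lam):
        v = np.linalg.solve(XtX + lam * np.eye(p), x0)
        a = XtX @ v - x0                             # contrast defining the bias
        bias_hat = a @ beta_ls                       # plug-in bias estimate
        var_mu = sigma2 * v @ XtX @ v                # Var(mu_hat)
        var_bias = sigma2 * a @ XtX_inv @ a          # Var(bias_hat), for the correction
        return var_mu + max(bias_hat**2 - var_bias, 0.0)

    return min(grid, key=mse_hat)
```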

10 In one dimension

Suppose p = 1. Then β̂ = M/(M + λ) β̂_LS with M = Σᵢ₌₁ⁿ xᵢ², and the mean squared error is

  MSE(µ̂; λ) = x₀² β² (λ/(M + λ))² + x₀² σ² M (1/(M + λ))²,

which is minimized at λ_best = σ²/β², so the minimizer does not depend on x₀! Our proposed estimator loses its focus; the case p ≥ 2 is more interesting.

13 Orthogonal case

For general p, assume σ² to be known and the design matrix to be orthogonal with equal Mⱼ:

  XᵀX = diag(M₁, ..., Mₚ) = M Iₚ.

Then the estimated bias based on the LS pilot estimate is

  b̂ias = (λ/(M + λ)) x₀ᵀβ̂ ∼ N( (λ/(M + λ)) x₀ᵀβ, σ²λ² x₀ᵀx₀ / (M(M + λ)²) ),

giving the estimated mean squared error as

  M̂SE(µ̂; λ) = ( (x₀ᵀβ̂)² − σ² x₀ᵀx₀/M )₊ (λ/(M + λ))² + σ² (x₀ᵀx₀/M) (M/(M + λ))²,

where (·)₊ = max(·, 0).

14 Focused tuning estimate

The minimizer, our focused tuning estimator, is

  λ̂_best = σ² M x₀ᵀx₀ / ( M(x₀ᵀβ̂)² − σ² x₀ᵀx₀ )₊,

resulting in the focused prediction

  ŷ(λ̂_best) = 0                                             if |x₀ᵀβ̂| ≤ σ‖x₀‖/√M,
  ŷ(λ̂_best) = [( M(x₀ᵀβ̂)² − σ² x₀ᵀx₀ ) / ( M(x₀ᵀβ̂)² )] x₀ᵀβ̂   if |x₀ᵀβ̂| > σ‖x₀‖/√M,

when σ² is known and XᵀX = diag(M₁, ..., Mₚ) = M Iₚ.
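In code, the orthogonal-case rule is a hard threshold followed by shrinkage; a sketch assuming XᵀX = M·Iₚ and known σ² (names are mine):

```python
import numpy as np

def focused_prediction_orthogonal(x0, M, beta_ls, sigma2):
    """Closed-form focused prediction when X'X = M*I_p and sigma^2 is known:
    predict 0 for weak signals, otherwise shrink the LS prediction."""
    mu_hat = x0 @ beta_ls                     # LS prediction x0' beta_hat
    signal = M * mu_hat**2 - sigma2 * (x0 @ x0)
    if signal <= 0:                           # |x0'beta_hat| <= sigma*||x0||/sqrt(M)
        return 0.0
    return signal / (M * mu_hat**2) * mu_hat  # shrinkage factor in (0, 1)
```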

16 Simplified focused tuning estimate

In the orthogonal case, the simplified mean squared error without the bias correction is

  M̂SE(µ̂; λ) = (x₀ᵀβ̂)² (λ/(M + λ))² + σ² (x₀ᵀx₀/M) (M/(M + λ))²,

with minimizer the simplified focused tuning

  λ̂_best = σ² x₀ᵀx₀ / (x₀ᵀβ̂)².

The prediction has a geometric interpretation in terms of α, the angle between the vectors x₀ and β̂, and R² = ‖x₀‖²/‖β̂‖²:

  ŷ(λ̂_best) = [ M(x₀ᵀβ̂)² / ( M(x₀ᵀβ̂)² + σ² x₀ᵀx₀ ) ] x₀ᵀβ̂ = [ M cos²α / ( M cos²α + σ² R²/‖x₀‖² ) ] x₀ᵀβ̂.
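The simplified rule never predicts exactly zero; it shrinks smoothly instead. A companion sketch under the same assumptions (inputs as numpy arrays, as above):

```python
def simplified_focused_prediction(x0, M, beta_ls, sigma2):
    """Simplified focused prediction: smooth shrinkage, no hard threshold."""
    mu_hat = x0 @ beta_ls
    return M * mu_hat**2 / (M * mu_hat**2 + sigma2 * (x0 @ x0)) * mu_hat
```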

17 Prediction risk: p = 1

Risk as a function of β with the original and the simplified focused tuning:

[Figure: risk for the original FIC tuning (black) and the adjusted/simplified version (red), as a function of the regression parameter β.]

Both do better than least squares (scaled to 1) for small β. The original is better than the simplified version for β close to zero, but worse for medium β. NB: for p = 1 the risk does not depend on x₀.

18 Prediction risk: p = 2

For p ≥ 2, the prediction risk risk(x₀ᵀβ̂(λ̂_best)) will vary with x₀. For the simplified tuning, the risk surface as a function of β (for fixed x₀) shows a trench orthogonal to x₀.

[Figure: risk surface as a function of β, two views.]

19 Prediction risk: p = 2

For fixed β, the risk surface for the simplified tuning as a function of x₀ (scaled by the least-squares risk):

[Figure: risk surface for the focused tuning as a function of x₀ (variables 1 and 2), relative to the LS risk.]

Around the line in x₀-space orthogonal to β, the risk of the focused-tuning ridge is smaller than the LS risk.
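Such risk surfaces have no closed form once λ̂ is data-driven, but they can be approximated by simulation; a Monte Carlo sketch, where `tuner` is a hypothetical hook for any rule mapping (x₀, X, y) to a λ value (focused, simplified, or CV):

```python
import numpy as np

def mc_risk(x0, X, beta, sigma2, tuner, reps=2000, seed=0):
    """Monte Carlo prediction risk of x0'beta_ridge(lambda_hat) when
    lambda_hat = tuner(x0, X, y) is chosen from the data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu0 = x0 @ beta                                   # true focus x0'beta
    sq_err = np.empty(reps)
    for r in range(reps):
        y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
        lam = tuner(x0, X, y)
        b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        sq_err[r] = (x0 @ b - mu0) ** 2
    return sq_err.mean()
```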

20 Cross-validation

K-fold cross-validation has emerged as a standard for selecting tuning parameters. We will study n-fold, or leave-one-out, cross-validation,

  λ̂_CV = argmin_λ Σᵢ₌₁ⁿ ( yᵢ − xᵢᵀβ̂₍₋ᵢ₎,λ )²,

where the ith observation is removed from the model fitting when predicting yᵢ. For ridge regression, the leave-one-out cross-validation criterion simplifies to a weighted sum of squared residuals,

  CV(λ) = Σᵢ₌₁ⁿ ( (yᵢ − xᵢᵀβ̂_λ) / (1 − Hᵢᵢ,λ) )²,  where H = X(XᵀX + λIₚ)⁻¹Xᵀ,

and the weights quantify the distance of the covariates from the centroid (Golub et al., 1979).
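This shortcut makes leave-one-out CV as cheap as one fit per λ; a numpy sketch of the criterion and its grid minimizer (names are illustrative):

```python
import numpy as np

def ridge_loocv(lam, X, y):
    """Golub et al. (1979) shortcut: LOOCV for ridge from one full-data fit."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # hat matrix H_lambda
    resid = y - H @ y                                        # full-data residuals
    return np.sum((resid / (1.0 - np.diag(H))) ** 2)

def cv_lambda(X, y, grid=np.logspace(-4, 6, 400)):
    """lambda_hat_CV: grid minimizer of the LOOCV criterion."""
    return min(grid, key=lambda lam: ridge_loocv(lam, X, y))
```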

21 Risk for cross-validation

The prediction risk (for p = 1) as a function of β shows that the focused tuning, λ̂_best, gives smaller risk than cross-validation, λ̂_CV, for small β.

[Figure: risk functions for the focused MSE tuning (black) and CV (red), as functions of the regression parameter β.]

22 For a given data set (Xᵢ, yᵢ), i = 1, ..., n, the cross-validation tuning parameter estimate λ̂_CV depends on the interplay between the residuals and the position of the covariates:
- Large residuals around the covariate centroid result in a smaller than average λ̂_CV.
- Large residuals for the covariate outliers result in a larger than average λ̂_CV.

27 To compare prediction error as a function of x₀: fix X, simulate four different outcome vectors {yᵢ}, i = 1, ..., 40, and color the difference according to whether focused tuning or cross-validation predicts better. Thick line: x₀ᵀβ = 0.

[Figure: four panels showing the difference in minimal MSE between the focused and the CV prediction, over the (variable 1, variable 2) plane of x₀.]

28 The pudding

Averaging over 2000 simulated outcome vectors {yᵢ}, i = 1, ..., 40, the focused approach proves better than CV along the line x₀ᵀβ = 0.

[Figure: average difference in minimal MSE between the focused and the CV prediction, over the (variable 1, variable 2) plane of x₀.]

29 The future

Our goal is to be able to use this approach in a high-dimensional situation, p ≫ n.

A motivating example: gene expression and weight change. Cashion et al. (2013) measured gene expression profiles in adipose (fat) tissue taken from kidney transplant recipients to study the association between gene activity and change in body weight:
- p = … genes
- n = 25 patients

Can the prediction-focused tuning estimate gain something compared to cross-validation in this high-dimensional setting?

30 To illustrate the procedure, we simulate outcomes based on the gene expression data,

  y = Xβ + ε,  βⱼ = 0.01,  εᵢ ∼ N(0, 1),

and compare the cross-validation and the focused (oracle) prediction.

[Figure: root prediction error, focused tuning parameter (red) versus CV (blue).]

In the case of known β, 60% of the predictions improved. But a real example needs a new pilot: ridge? Principal component regression?

31 Concluding remarks

As shown, the focused tuning estimator gives lower prediction error than cross-validation for certain x₀.

Future work:
- controlling overfitting
- finding the best pilot estimate in high dimensions: two-step ridge regression with cross-validation? PCR?
- exploiting the low dimensionality of the projection x₀ᵀβ as p grows
- exploring further foci: βᵀβ, individual βᵢ, P[ŷ₀ > y_thres] = α

32 Thank you!
