Focused fine-tuning of ridge regression
1 Focused fine-tuning of ridge regression
Kristoffer Hellton, Department of Mathematics, University of Oslo
May 9, 2016
2 Penalized regression
The least-squares (LS) estimate $\hat\beta = (X^TX)^{-1}X^TY$ is sensitive to random errors, or not unique, when $X^TX$ becomes close to or exactly singular, for instance when $p > n$. Ridge regression addresses this problem by penalizing the residual sum of squares,
$$\hat\beta_{\text{ridge}} = \arg\min_\beta \sum_{i=1}^n \left(y_i - x_i^T\beta\right)^2 + \lambda \sum_{j=1}^p \beta_j^2,$$
introducing a tuning parameter $\lambda$. The solution and its relation to the LS estimate (for $p < n$):
$$\hat\beta_{\text{ridge}} = (X^TX + \lambda I_p)^{-1}X^TY = (X^TX + \lambda I_p)^{-1}X^TX\,\hat\beta.$$
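As a minimal numerical sketch (not from the slides), the ridge solution can be computed directly from this formula; the data and names below are illustrative:

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Ridge estimate (X'X + lambda I_p)^{-1} X'y for a given penalty lam."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Toy illustration with simulated data (hypothetical values).
rng = np.random.default_rng(0)
n, p = 40, 5
X = rng.standard_normal((n, p))
y = X @ np.full(p, 0.5) + rng.standard_normal(n)
print(ridge_estimate(X, y, lam=1.0))
```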
3 How to choose the tuning parameter?
Many different procedures:
- K-fold cross-validation, Hastie et al. (2009)
- Allen's PRESS statistic, leave-one-out CV, Allen (1974)
- Generalized cross-validation, Golub et al. (1979)
- Bootstrap, Delaney and Chatterjee (1986)
- Small-sample corrected AIC, Hurvich and Tsai (1989)
- Marginal maximum likelihood
The current unchallenged favorite: 10-fold cross-validation. (Already Golub et al. (1979) provide references to more than 15 procedures.)
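A minimal sketch of the standard K-fold procedure over a $\lambda$ grid (the helper and grid names are mine, not from the talk):

```python
import numpy as np

def ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_lambda(X, y, lambdas, K=10, seed=0):
    """Return the lambda in `lambdas` minimizing K-fold CV prediction error."""
    n = len(y)
    fold = np.random.default_rng(seed).permutation(n) % K  # fold assignments
    def cv_error(lam):
        err = 0.0
        for k in range(K):
            beta = ridge(X[fold != k], y[fold != k], lam)
            err += np.sum((y[fold == k] - X[fold == k] @ beta) ** 2)
        return err / n
    return min(lambdas, key=cv_error)
```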
4 Thought experiment
A medical study has measured covariates and observed outcomes for a group of patients. A new patient enters the doctor's office, and we wish to predict the outcome for his or her specific set of covariates, $x_0$, as well as possible. Cross-validation would select $\lambda$ to be optimal for the overall distribution of covariates, but not for the specific $x_0$. Instead, frame the specific prediction $\mu = x_0^T\beta$ as a focus parameter:
1. find the bias and variance of $\hat\mu$,
2. minimize the estimated mean squared error (MSE) of $\hat\mu$ as a function of $\lambda$ to find $\hat\lambda$.
5-6 Given the standard fixed-design regression model
$$y_i = x_i^T\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2),$$
and the focus $\mu_0 = \mathrm{E}\,y_0 = x_0^T\beta$, the MSE of $\hat\mu$ is a function of $\lambda$ and the model parameters:
$$\mathrm{MSE}(\hat\mu; \lambda, x_0, \beta, \sigma^2) = \left\{x_0^T\left((X^TX + \lambda I_p)^{-1}X^TX - I_p\right)\beta\right\}^2 + \sigma^2\, x_0^T(X^TX + \lambda I_p)^{-1}X^TX(X^TX + \lambda I_p)^{-1}x_0.$$
[Figure: root MSE as a function of the tuning parameter $\lambda$.]
For each covariate set $x_0$, the MSE curves, as functions of $\lambda$, have different minima.
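A short sketch (illustrative naming, not from the slides) that evaluates this MSE on a $\lambda$ grid and locates its minimum for a given $x_0$, assuming the true $\beta$ and $\sigma^2$ are known:

```python
import numpy as np

def mse_focus(lam, X, x0, beta, sigma2):
    """Theoretical MSE of the focused prediction x0' beta-hat(lambda)."""
    p = X.shape[1]
    A = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ X)  # ridge shrinkage matrix
    bias = x0 @ (A - np.eye(p)) @ beta
    var = sigma2 * x0 @ A @ np.linalg.solve(X.T @ X + lam * np.eye(p), x0)
    return bias ** 2 + var

def oracle_lambda(X, x0, beta, sigma2, lambdas):
    """Grid minimizer of the theoretical MSE: the oracle tuning for this x0."""
    return min(lambdas, key=lambda lam: mse_focus(lam, X, x0, beta, sigma2))
```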
7-9 The minimizer of the theoretical MSE defines the oracle tuning
$$\lambda_{\text{best}} = \arg\min_\lambda \mathrm{MSE}(\hat\mu; \lambda, x_0).$$
But an estimator of $\lambda_{\text{best}}$ requires pilot estimates of $\beta$ and $\sigma^2$. Our proposed estimator uses the LS estimates (for $p < n$):
$$\hat\lambda_{\text{best}} = \arg\min_\lambda \widehat{\mathrm{MSE}}(\hat\mu; \lambda, x_0) = \arg\min_\lambda \left\{\widehat{\mathrm{Var}}(\hat\mu; \lambda, x_0) + \widehat{\mathrm{bias}}^2(\hat\mu; \lambda, x_0)\right\}$$
$$= \arg\min_\lambda \left\{\widehat{\mathrm{Var}} + \max\left\{(\widehat{\mathrm{bias}})^2 - \widehat{\mathrm{Var}}(\widehat{\mathrm{bias}}),\, 0\right\}\right\},$$
or a simplified version without the bias correction, written out as
$$\hat\lambda_{\text{best}} = \arg\min_\lambda \left\{\left(x_0^T\left((X^TX + \lambda I_p)^{-1}X^TX - I_p\right)\hat\beta\right)^2 + \hat\sigma^2\, x_0^T(X^TX + \lambda I_p)^{-1}X^TX(X^TX + \lambda I_p)^{-1}x_0\right\}.$$
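A plug-in sketch of the simplified version, mirroring the oracle sketch above but with LS pilot estimates in place of the unknown $\beta$ and $\sigma^2$ (assumes $p < n$; names are mine):

```python
import numpy as np

def focused_lambda(X, y, x0, lambdas):
    """Simplified focused tuning: minimize the estimated MSE of x0' beta-hat(lambda),
    plugging in least-squares pilot estimates of beta and sigma^2 (requires p < n)."""
    n, p = X.shape
    beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
    sigma2 = np.sum((y - X @ beta_ls) ** 2) / (n - p)  # unbiased noise estimate
    def mse_hat(lam):
        A = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ X)
        bias = x0 @ (A - np.eye(p)) @ beta_ls
        var = sigma2 * x0 @ A @ np.linalg.solve(X.T @ X + lam * np.eye(p), x0)
        return bias ** 2 + var
    return min(lambdas, key=mse_hat)
```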
10 In one dimension
Suppose $p = 1$; then
$$\hat\beta_{\text{ridge}} = \frac{M}{M + \lambda}\,\hat\beta, \qquad M = \sum_{i=1}^n x_i^2,$$
and the mean squared error is given by
$$\mathrm{MSE}(\hat\mu; \lambda) = x_0^2\left(\frac{\lambda}{M + \lambda}\right)^2\beta^2 + x_0^2\,\sigma^2 M\left(\frac{1}{M + \lambda}\right)^2,$$
such that the minimizer $\lambda_{\text{best}}$ does not depend on $x_0$ (setting the derivative to zero gives $\lambda_{\text{best}} = \sigma^2/\beta^2$, as $x_0^2$ factors out). The proposed estimator loses its focus; the case $p \ge 2$ is more interesting.
11-13 Orthogonal case
For general $p$, assume $\sigma^2$ to be known and the design matrix to be orthogonal with equal $M_j$:
$$X^TX = \mathrm{diag}(M_1, \ldots, M_p) = M I_p.$$
Then the estimated bias based on the LS pilot estimate is
$$\widehat{\mathrm{bias}} = -\frac{\lambda}{M + \lambda}\, x_0^T\hat\beta \sim N\!\left(-\frac{\lambda}{M + \lambda}\, x_0^T\beta,\; \frac{\sigma^2\lambda^2\, x_0^Tx_0}{M(M + \lambda)^2}\right),$$
giving the estimated mean squared error as
$$\widehat{\mathrm{MSE}}(\hat\mu; \lambda) = \left((x_0^T\hat\beta)^2 - \frac{\sigma^2 x_0^Tx_0}{M}\right)_+\left(\frac{\lambda}{M + \lambda}\right)^2 + \frac{\sigma^2 x_0^Tx_0}{M}\left(\frac{M}{M + \lambda}\right)^2,$$
where $(\cdot)_+ = \max(\cdot, 0)$.
14 Focused tuning estimate
The minimizer, our focused tuning estimator, is
$$\hat\lambda_{\text{best}} = \frac{\sigma^2 M\, x_0^Tx_0}{\left(M(x_0^T\hat\beta)^2 - \sigma^2 x_0^Tx_0\right)_+},$$
resulting in the focused prediction
$$\hat y(\hat\lambda_{\text{best}}) = \begin{cases} 0 & \text{if } |x_0^T\hat\beta| \le \sigma\|x_0\|/\sqrt{M}, \\[4pt] \dfrac{M(x_0^T\hat\beta)^2 - \sigma^2 x_0^Tx_0}{M(x_0^T\hat\beta)^2}\; x_0^T\hat\beta & \text{if } |x_0^T\hat\beta| > \sigma\|x_0\|/\sqrt{M}, \end{cases}$$
when $\sigma^2$ is known and $X^TX = \mathrm{diag}(M_1, \ldots, M_p) = M I_p$.
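A direct sketch of this closed-form rule (my naming; assumes an exactly orthogonal design and known $\sigma^2$):

```python
import numpy as np

def focused_prediction_orthogonal(X, y, x0, sigma2):
    """Closed-form focused prediction when X'X = M I_p and sigma^2 is known.
    Predicts exactly 0 when |x0' beta_ls| <= sigma * ||x0|| / sqrt(M),
    and otherwise shrinks the LS prediction x0' beta_ls toward zero."""
    M = (X.T @ X)[0, 0]        # assumes X'X = M * I_p exactly
    beta_ls = (X.T @ y) / M    # LS estimate under orthogonality
    mu_ls = x0 @ beta_ls
    noise = sigma2 * (x0 @ x0)
    if M * mu_ls ** 2 <= noise:
        return 0.0
    return (M * mu_ls ** 2 - noise) / (M * mu_ls ** 2) * mu_ls
```

Note the thresholding structure: the LS prediction is kept (shrunken) only when it exceeds its own noise level, and is set to zero otherwise.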
15-16 Simplified focused tuning estimate
In the orthogonal case, the simplified mean squared error without the bias correction is given by
$$\widehat{\mathrm{MSE}}(\hat\mu; \lambda) = (x_0^T\hat\beta)^2\left(\frac{\lambda}{M + \lambda}\right)^2 + \frac{\sigma^2 x_0^Tx_0}{M}\left(\frac{M}{M + \lambda}\right)^2,$$
with its minimizer, the simplified focused tuning,
$$\hat\lambda_{\text{best}} = \sigma^2\, \frac{x_0^Tx_0}{(x_0^T\hat\beta)^2}.$$
The prediction has a geometric interpretation in terms of $\alpha$, the angle between the vectors $x_0$ and $\hat\beta$, and $R^2 = \|x_0\|^2/\|\hat\beta\|^2$:
$$\hat y(\hat\lambda_{\text{best}}) = \frac{M(x_0^T\hat\beta)^2}{M(x_0^T\hat\beta)^2 + \sigma^2 x_0^Tx_0}\; x_0^T\hat\beta = \frac{M\cos^2\alpha}{M\cos^2\alpha + \sigma^2 R^2}\; x_0^T\hat\beta.$$
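The simplified rule is a one-liner; a small sketch (names are mine) that also reflects the geometry: $\hat\lambda$ blows up, and the prediction is shrunk hard, when $x_0$ is nearly orthogonal to $\hat\beta$ ($\cos\alpha \approx 0$):

```python
import numpy as np

def simplified_focused_lambda(x0, beta_ls, sigma2):
    """Simplified focused tuning: sigma^2 * x0'x0 / (x0' beta_ls)^2.
    Large when x0 is nearly orthogonal to beta_ls (weak focused signal)."""
    return sigma2 * (x0 @ x0) / (x0 @ beta_ls) ** 2
```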
17 Prediction risk: p = 1
Risk as a function of $\beta$ with the original and simplified focused tuning:
[Figure: risk for the original focused tuning (black) and the adjusted/simplified version (red), as a function of the regression parameter $\beta$.]
Both do better than least squares (scaled to 1) for small $\beta$. The original is better than the simplified version for $\beta$ close to zero, but worse for medium $\beta$. NB: for $p = 1$ the risk does not depend on $x_0$.
18 Prediction risk: p = 2
For $p \ge 2$, the prediction risk $\mathrm{risk}(x_0^T\hat\beta(\hat\lambda_{\text{best}}))$ will vary with $x_0$. For the simplified tuning, the risk surface as a function of $\beta$ (for fixed $x_0$) shows a trench orthogonal to $x_0$:
[Figure: risk surface as a function of $\beta$.]
19 Prediction risk: p = 2
For fixed $\beta$, the risk surface for the simplified tuning as a function of $x_0$ (scaled by the least-squares risk):
[Figure: risk surface for the focused tuning as a function of $x_0$.]
Around the line in $x_0$ space orthogonal to $\beta$, the risk of the focused-tuning ridge is smaller than the LS risk.
20 Cross-validation
K-fold cross-validation has emerged as a standard for selecting tuning parameters. We will study $n$-fold, or leave-one-out, cross-validation,
$$\hat\lambda_{CV} = \arg\min_\lambda \sum_{i=1}^n \left(y_i - x_i^T\hat\beta_{-i,\lambda}\right)^2,$$
where the $i$th observation is removed from the model fitting when predicting $y_i$. For ridge regression, the leave-one-out cross-validation criterion simplifies to a weighted sum of squared residuals,
$$CV(\lambda) = \sum_{i=1}^n \left(\frac{y_i - x_i^T\hat\beta_\lambda}{1 - H_{ii,\lambda}}\right)^2,$$
where $H_\lambda = X(X^TX + \lambda I_p)^{-1}X^T$; the weights quantify the distance of the covariates from the centroid (Golub et al., 1979).
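A sketch of this shortcut (illustrative naming): one hat matrix per $\lambda$ yields all $n$ leave-one-out residuals at once, avoiding $n$ separate refits:

```python
import numpy as np

def loocv_ridge(X, y, lambdas):
    """Pick lambda by exact leave-one-out CV for ridge, via the identity
    CV(lam) = sum_i ((y_i - x_i' beta_lam) / (1 - H_ii))^2,
    with H = X (X'X + lam I)^{-1} X'."""
    p = X.shape[1]
    def cv(lam):
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
        resid = y - H @ y
        return np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return min(lambdas, key=cv)
```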
21 Risk for cross-validation
The prediction risk (for $p = 1$) as a function of $\beta$ shows that the focused tuning, $\hat\lambda_{\text{best}}$, gives smaller risk than cross-validation, $\hat\lambda_{CV}$, for small $\beta$:
[Figure: risk for the focused tuning (black) and cross-validation (red), as a function of the regression parameter $\beta$.]
22 For a given data set $(x_i, y_i)$, $i = 1, \ldots, n$, the cross-validation tuning parameter estimate $\hat\lambda_{CV}$ depends on the interplay between the residuals and the positions of the covariates:
- Large residuals around the covariate centroid result in a smaller than average $\hat\lambda_{CV}$.
- Large residuals for the covariate outliers result in a larger than average $\hat\lambda_{CV}$.
23-27 To compare prediction error as a function of $x_0$: fix $X$, simulate four different outcome sets $\{y_i\}$, $i = 1, \ldots, 40$, and color the difference according to whether focused tuning or cross-validation is better. Thick line: $x_0^T\beta = 0$.
[Figure: four panels showing the difference between the minimal-MSE (focused) and CV predictions over the $x_0$ plane.]
28 The pudding
When averaging over 2000 simulated outcome sets $\{y_i\}$, $i = 1, \ldots, 40$, the focused approach proves better than CV along the line $x_0^T\beta = 0$.
[Figure: average difference between the minimal-MSE (focused) and CV predictions.]
29 The future
Our goal is to be able to use this approach in a high-dimensional situation, $p \gg n$. A motivating example: gene expression and weight change. Cashion et al. (2013) measured gene expression profiles in adipose (fat) tissue taken from kidney transplant recipients to study the association between gene activity and change in body weight:
- p = … genes
- n = 25 patients
Can the prediction-focused tuning estimate gain something compared to cross-validation in this high-dimensional setting?
30 To illustrate the procedure, we simulate outcomes based on the gene expression data,
$$y_i = x_i^T\beta + \varepsilon_i, \qquad \beta_j = 0.01, \quad \varepsilon_i \sim N(0, 1),$$
and compare the cross-validation and focused (oracle) predictions:
[Figure: root prediction error for the focused tuning parameter (red) and CV (blue).]
In the case of known $\beta$, 60% of the predictions improved. But a real example needs a new pilot: ridge? Principal component regression?
31 Concluding remarks
As shown, the focused tuning estimator gives lower prediction error than cross-validation for certain $x_0$. Future work:
- controlling overfitting;
- finding the best pilot estimate in high dimensions: two-step ridge regression with cross-validation? PCR?
- exploiting the low-dimensionality of the projection $x_0^T\beta$ as $p$ grows;
- exploring further foci: $\beta^T\beta$, individual $\beta_i$, $P[\hat y_0 > y_{\text{thres}}] = \alpha$.
32 Thank you!