Chapter 3: Other Issues in Multiple regression (Part 1)

Suzan Black
5 years ago

1 Model (variable) selection

The difficulty with model selection: for $p$ predictors, there are $2^p$ different candidate models. When we have many predictors (with many possible interactions), it can be difficult to find a good model. Model selection tries to simplify this task.

Suppose we have $P$ predictors $X_1, \dots, X_P$, but the true model depends only on a subset of $X_1, \dots, X_P$. In other words, in the model

$$Y = \beta_0 + \beta_1 X_1 + \dots + \beta_P X_P + \varepsilon$$

some of the coefficients are zero. We need to find the predictors with nonzero coefficients. We call the set of predictors with nonzero coefficients the best subset, and the predictors in the best subset the important variables.

Criteria: statistical tests; some indices of the model; predictability. (Note the distinction between predictive and explanatory research.)

Example 1.1 (Surgical Unit example)
$X_1$: blood clotting score; $X_2$: prognostic index; $X_3$: enzyme function test score; $X_4$: liver function test score; $X_5$: age in years; $X_6$: indicator of gender (0 = male, 1 = female); $X_7, X_8$: indicators for alcohol use; $Y$: survival time.

If we consider only the first 4 predictors, we have the following calculations for the possible models.

Table columns: variables selected, $p$, $SSE_p$, $R^2$, $R_a^2$, $C_p$, AIC, SBC (BIC), PRESS (CV).
Table rows, one per candidate model (the numerical entries are missing in the source): None; X1; X2; X3; X4; X1, X2; X1, X3; X1, X4; X2, X3; X2, X4; X3, X4; X1, X2, X3; X1, X2, X4; X1, X3, X4; X2, X3, X4; X1, X2, X3, X4.

Here $p$ is the number of coefficients included in the model.

2 $R^2$ and $R_a^2$ Criterion

1. $R^2$: can be used to compare models with the same number of parameters/coefficients.
2. $R_a^2$: can be used to compare models with different numbers of parameters/coefficients. We choose the model with the largest $R_a^2$.

3 Mallows' $C_p$ Criterion

Suppose we select $p$ predictors, $p \le P$, and fit a model with the selected predictors. Denote its SSE by $SSE_p$. The criterion is

$$C_p = \frac{SSE_p}{MSE(X_1, \dots, X_P)} - (n - 2p)$$

where $p$ is the number of coefficients, including the intercept (if there is one).

Criterion: we seek subsets of X for which (1) the $C_p$ value is small and (2) the $C_p$ value is near $p$.
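The criteria above can be tabulated for every subset with a short R sketch. The data here are simulated purely for illustration (they are not the surgical unit data), and the variable names and the true coefficients are assumptions; $C_p$ is computed as $SSE_p / MSE(X_1, \dots, X_P) - (n - 2p)$.

```r
# Simulated data (an assumption for illustration; NOT the surgical unit data).
set.seed(1)
n <- 60
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
dat <- cbind(X, Y = 1 + 2 * X$X1 - 1.5 * X$X2 + rnorm(n))  # X3, X4 unimportant

preds <- names(X)
P <- length(preds)

# MSE of the full model with all P predictors.
full <- lm(Y ~ ., data = dat)
mse_full <- sum(resid(full)^2) / full$df.residual

# Enumerate all 2^P subsets through the binary digits of 0, ..., 2^P - 1.
rows <- lapply(0:(2^P - 1), function(code) {
  keep <- preds[(code %/% 2^(0:(P - 1))) %% 2 == 1]
  rhs <- if (length(keep) == 0) "1" else paste(keep, collapse = " + ")
  fit <- lm(as.formula(paste("Y ~", rhs)), data = dat)
  p <- length(coef(fit))            # number of coefficients, incl. intercept
  sse <- sum(resid(fit)^2)
  s <- summary(fit)
  data.frame(model = rhs, p = p, SSE = sse,
             R2 = s$r.squared, R2adj = s$adj.r.squared,
             Cp = sse / mse_full - (n - 2 * p))
})
tab <- do.call(rbind, rows)

# Models sorted by Cp; good candidates have small Cp that is close to p.
print(tab[order(tab$Cp), ], digits = 4)
```

By construction the full model always has $C_p = p$ exactly, so $C_p$ is informative only for the proper subsets.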

If a selected model includes all the important variables (possibly together with some unimportant variables), the model is still correct. Then we have

$$E\{SSE_p\} = (n - p)\sigma^2.$$

On the other hand,

$$E\{MSE(X_1, \dots, X_P)\} = \sigma^2.$$

Roughly speaking, we have

$$C_p \approx (n - p) - (n - 2p) = p.$$

Question: are the estimators still unbiased?

If a selected model does not include all the important variables, the model is wrong. Then

$$SSE_p \gg SSE_P, \qquad C_p \gg (n - p) - (n - 2p) = p.$$

Question: are the estimators still unbiased?

4 Akaike's information criterion (AIC)

We cannot use SSE alone for the selection: as $p$ increases, $SSE_p$ decreases. AIC tries to balance the number of parameters against $SSE_p$:

$$AIC_p = \log\left(\frac{SSE_p}{n}\right) + \frac{2p}{n}$$

or

$$AIC_p = n \log\left(\frac{SSE_p}{n}\right) + 2p.$$
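As a quick check of the AIC formula, the following R sketch computes it by hand, assuming the form $AIC_p = n \log(SSE_p / n) + 2p$, and compares it with the value returned by `extractAIC()`, the function that `step()` relies on. The data and variable names are simulated assumptions.

```r
# Simulated data (an assumption for illustration only).
set.seed(2)
n <- 50
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$Y <- 1 + 2 * dat$X1 + rnorm(n)

fit <- lm(Y ~ X1 + X2, data = dat)
sse <- sum(resid(fit)^2)          # SSE_p
p <- length(coef(fit))            # number of coefficients, incl. intercept

aic_hand <- n * log(sse / n) + 2 * p
aic_r <- extractAIC(fit, k = 2)   # c(equivalent degrees of freedom, AIC)

print(c(by_hand = aic_hand, extractAIC = aic_r[2]))
```

Note that `AIC(fit)` reports a different number, because it is based on the full Gaussian log-likelihood; for a fixed sample the difference is a constant that does not depend on which predictors are in the model, so the ranking of models is the same.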

5 Schwarz Bayesian criterion (BIC or SBC)

Theoretically, it has been found that AIC does not give the right number of variables. Schwarz proposed the BIC:

$$BIC_p = \log\left(\frac{SSE_p}{n}\right) + \frac{\log(n)\, p}{n}$$

or

$$BIC_p = n \log\left(\frac{SSE_p}{n}\right) + \log(n)\, p.$$

BIC gives a bigger penalty to the number of parameters than AIC does, since $\log n > 2$ whenever $n \ge 8$.

6 Prediction sum of squares (PRESS) or cross-validation criterion (CV)

A better model should give better predictions. Most of the time, however, we do not have new data on which to check the predictions. A simple remedy is to partition the data into two parts: a training set and a prediction set (or validation set). The training set is used to estimate the model and the prediction set is used to check its predictability.

In the simplest case, the prediction set consists of one sample, and each sample takes this role in turn. There are many possible partitions; using all of them is the idea of cross-validation (CV). The idea was proposed by M. Stone (1974).

If we use 1 observation for validation and the other $n - 1$ for model estimation, it is leave-one-observation-out cross-validation.

If we use $m$ observations for validation and the other $n - m$ for model estimation, it is leave-$m$-observations-out cross-validation.

We need to select variables from $X_1, \dots, X_p$ to be included in the model. There are many candidate models. For example,

model 1: $Y = a_0 + a_1 X_1 + \varepsilon$
model 2: $Y = b_0 + b_1 X_1 + b_2 X_4 + \varepsilon$
model 3: $Y = c_0 + c_1 X_2 + \varepsilon$

Suppose we have $n$ samples. For each $i = 1, \dots, n$, we use the data $(Y_1, X_1), \dots, (Y_{i-1}, X_{i-1}), (Y_{i+1}, X_{i+1}), \dots, (Y_n, X_n)$, where $X_i = (X_{i1}, \dots, X_{ip})$, to estimate the models. The estimated models are, say (the superscript $i$ indicates that observation $i$ was left out),

model 1: $Y = \hat{a}_0^{i} + \hat{a}_1^{i} X_1$
model 2: $Y = \hat{b}_0^{i} + \hat{b}_1^{i} X_1 + \hat{b}_2^{i} X_4$
model 3: $Y = \hat{c}_0^{i} + \hat{c}_1^{i} X_2$

The prediction errors for $(Y_i, X_i)$ are respectively

model 1: $err_1(i) = \{Y_i - \hat{a}_0^{i} - \hat{a}_1^{i} X_{i,1}\}^2$
model 2: $err_2(i) = \{Y_i - \hat{b}_0^{i} - \hat{b}_1^{i} X_{i,1} - \hat{b}_2^{i} X_{i,4}\}^2$
model 3: $err_3(i) = \{Y_i - \hat{c}_0^{i} - \hat{c}_1^{i} X_{i,2}\}^2$

The overall prediction errors (also called cross-validation values) are respectively

model 1: $CV_1 = n^{-1} \sum_{i=1}^{n} err_1(i)$
model 2: $CV_2 = n^{-1} \sum_{i=1}^{n} err_2(i)$
model 3: $CV_3 = n^{-1} \sum_{i=1}^{n} err_3(i)$

The model with the smallest CV value is the model we prefer.
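The leave-one-out procedure above can be sketched in R as follows. The data are simulated for illustration only, and the helper name `loo_cv` is hypothetical (it is not a function from any package); the three formulas mirror models 1 to 3.

```r
# Simulated data (an assumption for illustration only).
set.seed(3)
n <- 40
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X4 = rnorm(n))
dat$Y <- 1 + 2 * dat$X1 + 0.5 * dat$X4 + rnorm(n)

# loo_cv: hypothetical helper returning the leave-one-out CV value of a model.
loo_cv <- function(formula, data) {
  n <- nrow(data)
  response <- all.vars(formula)[1]          # name of the response variable
  err <- numeric(n)
  for (i in seq_len(n)) {
    fit <- lm(formula, data = data[-i, ])   # estimate without observation i
    pred <- predict(fit, newdata = data[i, , drop = FALSE])
    err[i] <- (data[[response]][i] - pred)^2   # err(i)
  }
  mean(err)                                 # CV = (1/n) * sum of err(i)
}

models <- list(model1 = Y ~ X1, model2 = Y ~ X1 + X4, model3 = Y ~ X2)
cv <- sapply(models, loo_cv, data = dat)
print(cv)
print(names(which.min(cv)))   # the preferred model
```

For linear models fitted by least squares, the same CV value can be obtained without refitting, from the ordinary residuals $e_i$ and leverages $h_{ii}$, as the mean of $\{e_i / (1 - h_{ii})\}^2$; PRESS is $n$ times this value.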

Example 6.1 For the same data as above, our candidate models are

model 0: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$
model 1: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$
model 2: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_5 X_5 + \varepsilon$
model 3: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$
model 4: $Y = \beta_0 + \beta_1 X_1 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$
model 5: $Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$

The CV values CV(model 0), ..., CV(model 5) are computed for these models (the numerical values are missing in the source). Model 1 has the smallest CV value, so model 1 is selected (and variable $X_5$ is deleted).

K-fold cross-validation

In K-fold cross-validation, the original sample is partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining $K - 1$ subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used.

7 Searching for the best subset

Forward selection: start with no variables in the model, try the variables one by one, and include a variable if it is statistically significant or increases the predictability.

Backward elimination: start with all candidate variables, test them one by one for statistical significance, and delete any variable that is not significant or whose removal increases the predictability.

Stepwise: a combination of the above, testing at each stage for variables to be included or excluded.

8 R code

    step(object, direction = c("both", "backward", "forward"), steps = 1000, k = 2)

where k can be any positive value; k = 2 (the default) gives AIC and k = log(n) gives BIC (SBC).

Example 8.1 For the first example above with the data, the selected model variables are

Based on AIC: X1 + X2 + X3 + X5 + X6 + X8
or
Based on BIC: X1 + X2 + X3 + X8
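A minimal sketch of how `step()` might be called with the AIC and BIC penalties is given below. The data are simulated (they are not the surgical unit data), so the selected variables are only illustrative; `trace = 0` is used to suppress the step-by-step printout.

```r
# Simulated data (an assumption for illustration; NOT the surgical unit data).
set.seed(4)
n <- 100
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n),
                  X4 = rnorm(n), X5 = rnorm(n))
dat$Y <- 1 + 2 * dat$X1 - 1.5 * dat$X2 + rnorm(n)  # only X1, X2 are important

# Start from the model with all candidate variables.
full <- lm(Y ~ ., data = dat)

# k = 2 gives AIC; k = log(n) gives BIC (SBC).
sel_aic <- step(full, direction = "both", k = 2, trace = 0)
sel_bic <- step(full, direction = "both", k = log(n), trace = 0)

print(formula(sel_aic))
print(formula(sel_bic))
```

Because no `scope` argument is supplied, the search can only drop terms from the starting model (and re-add ones it dropped earlier); to run forward selection from the intercept-only model, a `scope` giving the largest model to consider would have to be passed as well.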
