Adela Casey

Chapter 8 (continued)

Section 8.2: Interaction models

An interaction model includes one or more cross-product terms. Example with two predictors:

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} + \epsilon_i.$$

How does the mean change when we increase $x_1$ by one unit?

- At $x_{i1}$: $E(Y_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2}$
- At $x_{i1} + 1$: $E(Y_i) = \beta_0 + \beta_1 (x_{i1} + 1) + \beta_2 x_{i2} + \beta_{12} (x_{i1} + 1) x_{i2}$
- Difference: $\beta_1 + \beta_{12} x_{i2}$

How the mean changes depends on the value of the other variable. Plots can show what's happening...
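To check the difference formula numerically, the sketch below evaluates $E(Y)$ at $x_1 = 4$ and $x_1 = 5$ for two values of $x_2$. The coefficient values are hypothetical, chosen only for illustration, and `mean_y` is an illustrative helper, not a function from the notes.

```r
# E(Y) for the two-predictor interaction model
mean_y <- function(x1, x2, b) {
  b[["b0"]] + b[["b1"]] * x1 + b[["b2"]] * x2 + b[["b12"]] * x1 * x2
}

# Hypothetical coefficients, chosen only for illustration
b <- c(b0 = 1, b1 = 2, b2 = 0.5, b12 = 1.5)

# Increase x1 by one unit (from 4 to 5) at two different values of x2
x2_vals <- c(1, 3)
change <- mean_y(5, x2_vals, b) - mean_y(4, x2_vals, b)
change                                # 3.5 6.5
b[["b1"]] + b[["b12"]] * x2_vals      # 3.5 6.5, i.e. beta1 + beta12 * x2
```

The change in the mean is different at $x_2 = 1$ and $x_2 = 3$. If `b12` were set to 0, both changes would equal `b1`.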

[Figure: three panels plotting $E(Y)$ against $x_1$, one line per fixed value of $x_2$ (e.g., $x_2 = 1$, $x_2 = 3$). All panels have $\beta_1 > 0$ and $\beta_2 > 0$, with $\beta_{12} = 0$, $\beta_{12} > 0$, and $\beta_{12} < 0$, respectively.]

Parallel lines indicate no interaction between $x_1$ and $x_2$; non-parallel lines indicate an interaction.

Including all pairwise (or higher-order) interactions complicates things tremendously, so we need to pare them down via t-tests and/or F-tests. The book suggests fitting the additive model, then plotting the residuals $e_i$ against each two-way interaction term; if there's a pattern, you could include that interaction in the model.

In my personal experience, scientists will often have an idea of which variables might interact in their effect on the response, i.e., there's already some intuition there on their part. This can be helpful.

Bodyfat example.
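The book's residual check can be sketched as follows. The data are hypothetical and noise-free on a 5 x 5 grid, built with a known $x_1 x_2$ interaction so the pattern is easy to see. Correlation stands in for "look at the plot"; in practice you would plot `e` against the product term.

```r
# Hypothetical, noise-free data on a 5 x 5 grid with a true x1:x2 interaction
grid <- expand.grid(x1 = 0:4, x2 = 0:4)
grid$y <- 1 + grid$x1 + grid$x2 + 2 * grid$x1 * grid$x2

# Fit the additive (first-order) model and keep its residuals
additive <- lm(y ~ x1 + x2, data = grid)
e <- resid(additive)

# Residuals are uncorrelated with the predictors already in the model ...
cor(e, grid$x1)
# ... but show a clear pattern against the cross-product x1 * x2
x1x2 <- grid$x1 * grid$x2
cor(e, x1x2)
# plot(x1x2, e) shows the same pattern graphically

# The pattern suggests adding the interaction term
with_int <- lm(y ~ x1 * x2, data = grid)
coef(with_int)   # recovers 1, 1, 1, 2 up to rounding
```

With real, noisy data the pattern is weaker, and the decision to keep the term would rest on the t-test or F-test mentioned above.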

Chapter 9: Model Building

Model building (pp. …)

Designed experiments are typically easy: the experimenter picked the variables ahead of time!

With confirmatory observational studies, the goal is to determine whether (or how) the response is related to one or more pre-specified explanatory variables. There is no need to weed them out.

Exploratory observational studies are done when we have little previous knowledge of exactly which predictors are related to the response. We need to separate the useful predictors from the useless ones. We may have a list of potentially useful predictors; variable selection can help us screen out the useless ones and build a good predictive model.

Section 9.2: Surgical unit example

First steps often involve plots:
- plots to indicate the correct functional form of the predictors and/or response;
- plots to indicate possible interactions;
- (maybe) exploration of correlation among the predictors.

A first-order model is often a good starting point. Once a reasonable set of potential predictors is identified, formal model selection begins. If the number of predictors is large, say $k \geq 10$, we can use (automated) stepwise procedures to reduce the number of variables (and models) under consideration.
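Here is one way to explore correlation among predictors in R. The surgical unit data are not reproduced in these notes, so this sketch uses R's built-in `mtcars` data as a stand-in. The 0.7 cutoff is an arbitrary choice for illustration.

```r
# Hypothetical candidate predictors taken from R's built-in mtcars data
preds <- mtcars[, c("wt", "hp", "qsec", "drat")]
r <- round(cor(preds), 2)   # pairwise correlations among the predictors
r
# pairs(preds) would draw the matching scatterplot matrix

# Pairs with |r| > 0.7 (an arbitrary cutoff) deserve a closer look
high <- which(abs(r) > 0.7 & upper.tri(r), arr.ind = TRUE)
data.frame(x = rownames(r)[high[, "row"]], y = colnames(r)[high[, "col"]],
           r = r[high])
```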

Section 9.4: Automated variable search (pp. …)

Forward stepwise regression (pp. …)

We start with $k$ potential predictors $x_1, \ldots, x_k$. We add and delete predictors one at a time until all predictors in the model are significant at some preset level. Let $\alpha_e$ be the significance level for adding (entering) variables, and $\alpha_r$ the significance level for removing them. Steps:

1. Regress $Y$ on $x_1$ only, $Y$ on $x_2$ only, up to $Y$ on $x_k$ only. In each case, look at the p-value for testing that the slope is zero. Pick the $x$ variable with the smallest p-value to include in the model.

2. Fit all possible 2-predictor models (in general, $j$-predictor models) that include the initially chosen $x$, along with each remaining $x$ variable in turn. Pick the new $x$ variable with the smallest p-value for testing that its slope equals zero in the model that already contains the first one chosen, as long as that p-value is $< \alpha_e$. Maybe nothing is added.
3. Remove the $x$ variable with the largest p-value, as long as that p-value is $> \alpha_r$. Maybe nothing is removed.
4. Repeat steps 2 and 3 until no $x$ variables can be added or removed.

Note: We should choose $\alpha_e < \alpha_r$; in the book example, $\alpha_e = 0.1$ and $\alpha_r =$ …
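The steps above can be sketched in R with t-test p-values taken from `summary(lm(...))`. This is an illustration, not the book's or SAS's implementation. `stepwise_p` is a hypothetical helper name. The default $\alpha_e = 0.10$ follows the book example. The default $\alpha_r = 0.15$ is an assumption, since the book's value is cut off in these notes. The sketch also assumes numeric predictors with syntactic names and no exactly collinear predictors.

```r
# p-value-driven forward stepwise regression (a sketch of steps 1-4 above).
# data: data frame; response: name of Y; candidates: names of x_1, ..., x_k.
# alpha_r = 0.15 is an assumed default, not a value taken from the book.
stepwise_p <- function(data, response, candidates,
                       alpha_e = 0.10, alpha_r = 0.15, max_iter = 50) {
  # p-value of the t-test for H0: slope of `term` = 0 in the model with `terms`
  slope_p <- function(terms, term) {
    fit <- lm(reformulate(terms, response = response), data = data)
    summary(fit)$coefficients[term, "Pr(>|t|)"]
  }
  selected <- character(0)
  for (iter in seq_len(max_iter)) {
    changed <- FALSE
    # Steps 1-2: try each remaining x alongside those already selected
    remaining <- setdiff(candidates, selected)
    if (length(remaining) > 0) {
      p_add <- vapply(remaining, function(x) slope_p(c(selected, x), x), numeric(1))
      if (min(p_add) < alpha_e) {
        selected <- c(selected, remaining[which.min(p_add)])
        changed <- TRUE
      }
    }
    # Step 3: drop the selected x with the largest p-value if it exceeds alpha_r
    if (length(selected) > 0) {
      p_drop <- vapply(selected, function(x) slope_p(selected, x), numeric(1))
      if (max(p_drop) > alpha_r) {
        selected <- selected[-which.max(p_drop)]
        changed <- TRUE
      }
    }
    # Step 4: stop once nothing can be added or removed
    if (!changed) break
  }
  selected
}

# Usage on simulated data: x1 and x2 matter, x3 and x4 are pure noise
set.seed(1)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 3 + 2 * sim$x1 - 1.5 * sim$x2 + rnorm(n)
chosen <- stepwise_p(sim, "y", c("x1", "x2", "x3", "x4"))
chosen
```

Step 1 here also requires the p-value to be below $\alpha_e$, so the procedure can end with no predictors. The `max_iter` cap guards against cycling. R's built-in `step()` performs backward, forward, or both-direction searches, but it ranks models by AIC rather than by p-values.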

Forward selection, backward elimination, and backward stepwise regression are similar procedures (see p. …). Stepwise procedures work well. If you're going to choose between forward selection and backward elimination, I'd choose the latter.

Why backward elimination rather than forward selection? A linear regression example:

x1  x2   y
 2   1  10
 3   2  17
 4   5  48
 1   2  27
 5   6  55

This also illustrates why scatterplot matrices can be of limited use! R code: x1=c(2,3,4,1,5); x2=c(1,2,5,2,6); y=c(10,17,48,27,55).
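Fitting the example data shows what the marginal view hides. Looked at one at a time, as in a scatterplot or step 1 of forward selection, $x_1$ appears weakly and positively related to $y$, and its slope p-value is above 0.10. In the joint model that backward elimination starts from, $x_1$ has a negative coefficient, and the two predictors together reproduce $y$ exactly.

```r
x1 <- c(2, 3, 4, 1, 5)
x2 <- c(1, 2, 5, 2, 6)
y  <- c(10, 17, 48, 27, 55)

# One-predictor fits: the view forward selection starts from
coef(lm(y ~ x1))   # slope of x1 alone is positive (9.4)
coef(lm(y ~ x2))

# Two-predictor fit: the view backward elimination starts from
joint <- lm(y ~ x1 + x2)
coef(joint)                # intercept 8, x1 -5, x2 12 (up to rounding): x1's sign flips
max(abs(resid(joint)))     # essentially 0: y = 8 - 5 x1 + 12 x2 exactly
```

Because the joint fit is exact, `summary(joint)` warns that its standard errors and p-values are unreliable. Only the coefficients are worth reading here.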

Model selection

Section 9.3 (pp. …)

Once we reduce the set of potential predictors to a reasonable number, we can examine all possible models and choose the best according to some criterion.

1. Choose the model with the largest adjusted $R^2$. For a candidate model with $p$ parameters (i.e., $k = p - 1$ predictors),
$$R^2_a = 1 - \frac{n-1}{n-p}\,\frac{SSE_p}{SSTO}.$$
Since $R^2_a = 1 - MSE_p / [SSTO/(n-1)]$ and $SSTO$ does not depend on the model, this is equivalent to choosing the model with the smallest $MSE_p$. Unlike regular $R^2$, $R^2_a$ may decrease if irrelevant variables are added. Thus $R^2_a$ penalizes a model for being too complex.
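The sketch below computes $R^2_a$ straight from the formula and checks it against R's `summary.lm`. It also confirms that $R^2_a$ ranks models the same way as $MSE_p$. The two candidate models on R's built-in `mtcars` data are stand-ins, not the notes' example, and `adj_r2` is a hypothetical helper.

```r
# Adjusted R^2 computed directly from the formula above
adj_r2 <- function(fit) {
  y <- model.response(model.frame(fit))
  n <- length(y)
  p <- length(coef(fit))                 # parameters, including the intercept
  sse <- sum(resid(fit)^2)
  ssto <- sum((y - mean(y))^2)
  1 - (n - 1) / (n - p) * sse / ssto
}

# Two hypothetical candidate models on R's built-in mtcars data
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)
c(small = adj_r2(small), large = adj_r2(large))
summary(large)$adj.r.squared          # matches adj_r2(large)

# Same ranking by MSE_p = SSE_p / (n - p): larger R^2_a <=> smaller MSE_p
c(small = sigma(small)^2, large = sigma(large)^2)
```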

2. Choose the model with the smallest Akaike Information Criterion (AIC). For the normal error model,
$$AIC_p = n \log(SSE_p) - n \log(n) + 2p.$$
Here $n \log(SSE_p) - n \log(n)$ equals $-2 \log L(\hat\beta, \hat\sigma^2)$ from the normal model, up to an additive constant that is the same for every candidate model. The term $2p$ is the penalty for adding predictors. Like $R^2_a$, AIC favors models with small $SSE_p$ but penalizes models with too many parameters $p$.
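As a numeric check of the AIC formula, the sketch below computes it by hand and compares it with two base R functions. `extractAIC()` on an `lm` fit uses exactly this form. `AIC()` uses the full normal log-likelihood and so differs by a constant that does not change the ranking. The `mtcars` models are stand-ins and `aic_p` is a hypothetical helper.

```r
# AIC in the form used above: n log(SSE_p) - n log(n) + 2p
aic_p <- function(fit) {
  n <- length(resid(fit))
  p <- length(coef(fit))
  n * log(sum(resid(fit)^2)) - n * log(n) + 2 * p
}

small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)
c(small = aic_p(small), large = aic_p(large))

# extractAIC() for lm fits returns c(p, n log(SSE_p / n) + 2p): the same value
extractAIC(large)
# AIC() uses the full normal log-likelihood, so it differs by a constant
# (n + n log(2 pi) + 2) that is identical for every model on the same data
c(small = AIC(small) - aic_p(small), large = AIC(large) - aic_p(large))
```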

3. Choose the model with the smallest Schwarz Bayesian Criterion (SBC), also known as the Bayesian Information Criterion:
$$BIC_p = n \log(SSE_p) - n \log(n) + p \log(n).$$
BIC is similar to AIC, but for $n \geq 8$ the BIC penalty term is more severe, since $\log(n) > 2$ exactly when $n \geq 8$. It chooses the model that best predicts the observed data according to asymptotic criteria.
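The BIC formula can be checked the same way. Base R's `extractAIC()` gives BIC in exactly this form when its penalty argument is set to `k = log(n)`. The last line shows where $\log(n)$ overtakes AIC's per-parameter penalty of 2. The `mtcars` model is a stand-in and `bic_p` is a hypothetical helper.

```r
# BIC (SBC) in the form used above: n log(SSE_p) - n log(n) + p log(n)
bic_p <- function(fit) {
  n <- length(resid(fit))
  p <- length(coef(fit))
  n * log(sum(resid(fit)^2)) - n * log(n) + p * log(n)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
n <- nrow(mtcars)
bic_p(fit)
extractAIC(fit, k = log(n))[2]    # same value: extractAIC with penalty log(n)

# Per-parameter penalty: AIC uses 2, BIC uses log(n)
log(7:8)          # about 1.946 and 2.079: log(n) exceeds 2 from n = 8 onward
```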

4. Choose the model using Mallows' $C_p$:
$$C_p = \frac{SSE_p}{MSE(x_1, \ldots, x_{P-1})} - n + 2p.$$
$C_p$ measures the bias in the candidate regression model relative to the full model containing all candidate predictors. The full model (with $P$ parameters) is chosen to provide an unbiased estimate $\hat\sigma^2 = MSE(x_1, \ldots, x_{P-1})$, so the predictors must be in the correct form and important interactions must be included. If the candidate model is unbiased, i.e., $E(\hat Y_i) = \mu_i$, the mean response for the $i$th observation, then $E(C_p) \approx p$ (pp. …). Goals: (a) choose candidate models with $p < P$ for which $C_p$ is relatively small, and (b) choose a candidate model for which $C_p$ is close to $p$, the number of parameters in the candidate model.
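The sketch below computes $C_p$ from the formula and prints it next to each model's $p$. A useful sanity check: for the full model itself, $SSE_P / MSE_P = n - P$, so $C_P = P$ exactly. The full model on R's built-in `mtcars` data is a hypothetical stand-in, and `mallows_cp` is an illustrative helper.

```r
# Mallows' C_p: SSE_p / MSE(full) - n + 2p
mallows_cp <- function(fit, full) {
  n <- length(resid(fit))
  p <- length(coef(fit))
  mse_full <- sum(resid(full)^2) / df.residual(full)
  sum(resid(fit)^2) / mse_full - n + 2 * p
}

# Hypothetical full model (P = 5 parameters) on R's built-in mtcars data
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
cands <- list(wt = lm(mpg ~ wt, data = mtcars),
              wt_hp = lm(mpg ~ wt + hp, data = mtcars),
              full = full)
# Compare each C_p with its own p; the full model always gives C_p = P exactly
sapply(cands, function(f) c(p = length(coef(f)), Cp = mallows_cp(f, full)))
```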

Criteria 1 to 4 ($R^2_a$, AIC, BIC, and $C_p$) will often give different best models. The ultimate goal is to find a model that balances:
- a good fit to the data;
- low bias;
- parsimony.

All else being equal, the simpler model is often easier to interpret and work with.

SAS example: surgical unit data.

Next: model validation & diagnostics...
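Here is an R sketch of an all-subsets search scored by the four criteria. It is not a reproduction of the SAS surgical unit analysis: the four-predictor pool from R's built-in `mtcars` data is a hypothetical stand-in. The sketch reports the best model under each criterion, using smallest $C_p$ for criterion 4. The printed table lets you compare each $C_p$ with its $p$.

```r
# Hypothetical candidate pool from R's built-in mtcars data
preds <- c("wt", "hp", "qsec", "drat")
resp <- "mpg"
n <- nrow(mtcars)
full <- lm(reformulate(preds, response = resp), data = mtcars)
mse_full <- deviance(full) / df.residual(full)

# Every non-empty subset of the candidate predictors (2^4 - 1 = 15 models)
subsets <- unlist(lapply(seq_along(preds),
                         function(m) combn(preds, m, simplify = FALSE)),
                  recursive = FALSE)

results <- do.call(rbind, lapply(subsets, function(s) {
  fit <- lm(reformulate(s, response = resp), data = mtcars)
  p <- length(coef(fit))
  sse <- deviance(fit)                  # SSE_p for an lm fit
  data.frame(model = paste(s, collapse = "+"), p = p,
             adjR2 = summary(fit)$adj.r.squared,
             AIC = n * log(sse) - n * log(n) + 2 * p,
             BIC = n * log(sse) - n * log(n) + p * log(n),
             Cp = sse / mse_full - n + 2 * p,
             stringsAsFactors = FALSE)
}))
results

# Best model under each criterion; these need not agree
c(adjR2 = results$model[which.max(results$adjR2)],
  AIC = results$model[which.min(results$AIC)],
  BIC = results$model[which.min(results$BIC)],
  Cp = results$model[which.min(results$Cp)])
```

Enumerating every subset is only practical for a modest pool, which is why the stepwise screening step comes first when $k$ is large.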
