Lecture 19 Multiple (Linear) Regression


1 Lecture 19 Multiple (Linear) Regression
Thais Paiva, STA Summer 2013 Term II, Lecture 19, 08/01/2013

2 Lecture Plan
1. Multiple regression
2. OLS estimates of β and α
3. Interpretation

3 Linear regression
A study on depression:
The response variable is Depression, the score on a self-report depression inventory.
Predictors:
- Simplicity is the score that indicates a subject's need to see the world in black and white.
- Fatalism is the score that indicates the belief in the ability to control one's own destiny.
Depression is thought to be related to simplicity and fatalism.

4 Linear regression
[Table: one row per patient, with columns Patient, Depression, Simplicity, Fatalism]

5 Depression data
[Figure: Depression plotted against Simplicity and Fatalism]

6 Depression data
[Figure: Depression plotted against Simplicity and Fatalism]

7 Depression data - residuals
[Figure: Depression plotted against Simplicity and Fatalism, with residuals]

8 Assumptions for multiple linear regression
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$
Just as with simple linear regression, the following have to hold:
1. Constant variance (also called homoscedasticity): $V(\varepsilon_i) = \sigma^2$ for all $i = 1, \ldots, n$, for some $\sigma^2$
2. Linearity
3. Independence: $\varepsilon_i \perp \varepsilon_j$ for all $i, j = 1, \ldots, n$, $i \neq j$

9 Interpretation of the β's
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$
- $\beta_j$ is the average effect on $Y$ of increasing $X_j$ by one unit, with all $X_k$, $k \neq j$, held constant
- This is sometimes referred to as the effect of $X_j$ after controlling for the $X_k$, $k \neq j$
- So $\beta_{\text{simplicity}}$ is the average effect of simplicity on depression after controlling for fatalism

10 Always plot residuals
[Figure: residuals $\hat\varepsilon$ plotted against simplicity and against fatalism]

11 Histogram of residuals
[Figure: histogram of the residuals $\hat\varepsilon$; vertical axis: Frequency]

12 OLS estimates of α, β_1, ..., β_p
(This is only really reasonable to write down if p = 2)
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$
$$\hat\beta_1 = \frac{s_Y \,(r_{X_1 Y} - r_{X_1 X_2}\, r_{X_2 Y})}{s_{X_1}\,(1 - r^2_{X_1 X_2})}$$
$$\hat\beta_2 = \frac{s_Y \,(r_{X_2 Y} - r_{X_1 X_2}\, r_{X_1 Y})}{s_{X_2}\,(1 - r^2_{X_1 X_2})}$$
$$\hat\alpha = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2,$$
where
$$r_{AB} = \frac{\sum_{i=1}^n (A_i - \bar A)(B_i - \bar B)}{\sqrt{\sum_{i=1}^n (A_i - \bar A)^2 \, \sum_{i=1}^n (B_i - \bar B)^2}} \quad \text{for some } A \text{ and } B$$
and
$$s_A^2 = \frac{1}{n-1}\sum_{i=1}^n (A_i - \bar A)^2 \quad \text{for some } A$$
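The p = 2 formulas can be checked numerically. The sketch below is illustrative only: the helper names (`mean`, `sd`, `corr`, `ols_two_predictors`) and the data are made up for this example and do not come from the lecture. The response is generated without noise from y = 1 + 2·x1 − 3·x2, so the estimates should recover those three numbers.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation s_A, using the n - 1 divisor."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def corr(a, b):
    """Sample correlation r_AB."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def ols_two_predictors(y, x1, x2):
    """Return (alpha_hat, beta1_hat, beta2_hat) from the p = 2 formulas."""
    r1y, r2y, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    b1 = sd(y) * (r1y - r12 * r2y) / (sd(x1) * (1 - r12 ** 2))
    b2 = sd(y) * (r2y - r12 * r1y) / (sd(x2) * (1 - r12 ** 2))
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2

# Made-up data: y is an exact linear function of x1 and x2 (no noise).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1 + 2 * u - 3 * v for u, v in zip(x1, x2)]

alpha_hat, beta1_hat, beta2_hat = ols_two_predictors(y, x1, x2)
print(alpha_hat, beta1_hat, beta2_hat)
```

Note that both slope formulas divide by 1 − r²_{X1X2}, so they break down when the two predictors are perfectly correlated.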

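13 It is easier if you know matrix algebra
$$Y = X\beta + \varepsilon,$$
where
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$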

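14 It is easier if you know matrix algebra
It turns out that the error sum of squares can be written as
$$\hat\varepsilon^T \hat\varepsilon = (Y - X\beta)^T (Y - X\beta)$$
Differentiating with respect to $\beta$ and setting the result to zero:
$$\frac{\partial\, \hat\varepsilon^T \hat\varepsilon}{\partial \beta} = -2X^T (Y - X\beta) \overset{\text{set}}{=} 0$$
$$X^T Y - X^T X \hat\beta = 0$$
$$X^T Y = X^T X \hat\beta$$
$$(X^T X)^{-1} X^T Y = \hat\beta$$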

15 It is easier if you know matrix algebra
A couple of things are clear from
$$\hat\beta = (X^T X)^{-1} X^T Y$$
1. $\hat\beta$ is linear in $Y$
2. $\hat\beta$ is easy to compute if we have a computer
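Both points can be illustrated with a few lines of NumPy. This is a minimal sketch on simulated data; the sample size, coefficients, and variable names are assumptions made for the example. It solves the normal equations X^T X β = X^T Y directly rather than forming the inverse, which is usually the more stable choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Design matrix: a column of ones for the intercept, then p simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
true_beta = np.array([1.0, 2.0, -1.0, 0.5])  # (alpha, beta_1, ..., beta_p), made up
Y = X @ true_beta + rng.normal(scale=0.3, size=n)

# Solve the normal equations X^T X beta = X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Linearity in Y: beta_hat = H Y with H = (X^T X)^{-1} X^T,
# so the estimate for Y + Y2 is the sum of the two separate estimates.
H = np.linalg.solve(X.T @ X, X.T)
Y2 = rng.normal(size=n)
beta_sum = H @ (Y + Y2)

print(beta_hat)
```

In practice a least-squares routine such as `np.linalg.lstsq` gives the same estimate without forming X^T X explicitly.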

16 The coefficient of determination
Similarly to simple linear regression,
$$r^2 = \frac{ESS}{TSS}$$
and
$$TSS = ESS + RSS,$$
where
$$TSS = \sum_{i=1}^n (Y_i - \bar Y)^2, \quad ESS = \sum_{i=1}^n (\hat Y_i - \bar Y)^2, \quad RSS = \sum_{i=1}^n (Y_i - \hat Y_i)^2$$
SS: Sum of Squares. T: Total. E: Explained. R: Residual.

17 $s^2$ and degrees of freedom
Similarly to simple linear regression,
$$s^2 = \frac{1}{n-p-1} \sum_{i=1}^n (Y_i - \hat Y_i)^2 = \frac{RSS}{n-p-1}$$
Note the $n - p - 1$ degrees of freedom. Why? We had to estimate $p + 1$ regression parameters.
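The sums of squares, r², and s² are straightforward to compute once the fitted values are available. The following sketch uses simulated data (all numbers and names are assumptions for illustration) and checks the decomposition TSS = ESS + RSS, which holds when the model includes an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2

# Simulated design matrix with intercept and p predictors (made-up coefficients).
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat

TSS = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
ESS = np.sum((Y_hat - Y.mean()) ** 2)  # explained sum of squares
RSS = np.sum((Y - Y_hat) ** 2)         # residual sum of squares

r2 = ESS / TSS
s2 = RSS / (n - p - 1)                 # n - p - 1 degrees of freedom

print(r2, s2)
```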

18 Hypothesis tests for β_j
Suppose we are interested in testing
$$H_0: \beta_j = 0 \qquad H_A: \beta_j \neq 0$$
(or the one-sided version).
Assuming p = 2 (tractable, but more complicated for p > 2), define
$$s^2_{\hat\beta_1} = \frac{s^2}{(n-1)\, s^2_{X_1}\, (1 - r^2_{X_1 X_2})}$$
and similarly for $s^2_{\hat\beta_2}$. Then (even for p > 2),
$$t_{\hat\beta_j} = \frac{\hat\beta_j - \beta_j}{s_{\hat\beta_j}} \sim t_{n-p-1}$$

19 Hypothesis tests for β_j
Notice that
$$s^2_{\hat\beta_1} = \frac{s^2}{(n-1)\, s^2_{X_1}\, (1 - r^2_{X_1 X_2})}$$
depends on $r^2_{X_1 X_2}$, which depends on $X_2$.
So the test for $\beta_1$ depends on the other predictor variables.
What is the interpretation of this test then?
Assuming that the other $\beta_k \neq 0$, $k \neq j$, can we reject the hypothesis that $\beta_j = 0$?
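As a rough numerical check of the p = 2 standard-error formula, the sketch below compares it with the general matrix expression s²[(X^T X)^{-1}]_{jj}. That matrix expression is a standard result and is not stated on the slides; the data and variable names are also made up for the example. The t statistic is computed under H_0: β_1 = 0 and would be compared with a t distribution on n − p − 1 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 2

# Two correlated predictors; the true coefficient on x2 is zero (made-up setup).
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)
Y = 1.0 + 0.8 * x1 + 0.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
resid = Y - X @ beta_hat
s2 = resid @ resid / (n - p - 1)

# Formula from the slide (p = 2 only).
r12 = np.corrcoef(x1, x2)[0, 1]
var_b1_slide = s2 / ((n - 1) * np.var(x1, ddof=1) * (1 - r12 ** 2))

# General matrix formula (assumption: standard result, not from the slides).
var_b1_matrix = s2 * XtX_inv[1, 1]

# t statistic for H_0: beta_1 = 0.
t_b1 = beta_hat[1] / np.sqrt(var_b1_slide)

print(var_b1_slide, var_b1_matrix, t_b1)
```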

20 Scatterplot matrix
[Figure: scatterplot matrix of depression, simplicity, and fatalism]

21 3D Scatterplot and plane
[Figure: 3D scatterplot of Depression against Simplicity and Fatalism, with the fitted plane]

22 Tests for β_simplicity and β_fatalism
With both predictors in the model:
- β_simplicity: [value missing], t = [value missing], p-value = [value missing]
- β_fatalism: [value missing], t = [value missing], p-value = [value missing]
But what if we take fatalism out of the model? Then we get
- β_simplicity: [value missing], t = [value missing], p-value = [value missing]
Why?

23 Scatterplot matrix
[Figure: scatterplot matrix of depression, simplicity, and fatalism]
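One way to see why the estimate for one predictor changes when a correlated predictor is dropped is the following in-sample identity: the slope from the reduced model equals the full-model slope plus the dropped predictor's coefficient times the slope of the dropped predictor regressed on the retained one. The sketch below checks this on simulated data. It does not use the depression data, and the names `x1`, `x2`, and `ols` are hypothetical stand-ins.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for design matrix X and response y."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 80

# x1 and x2 are correlated; y depends mostly on x2 (made-up coefficients).
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)
y = 0.2 * x1 + 1.0 * x2 + rng.normal(size=n)
ones = np.ones(n)

full = ols(np.column_stack([ones, x1, x2]), y)   # (alpha, b1, b2)
reduced = ols(np.column_stack([ones, x1]), y)    # (alpha, b1) with x2 dropped
delta = ols(np.column_stack([ones, x1]), x2)[1]  # slope of x2 regressed on x1

print(full[1], reduced[1], full[1] + full[2] * delta)
```

When the predictors are correlated (delta is not zero), the reduced-model slope absorbs part of the dropped predictor's effect, so both the estimate and its test can change.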

24 1907 Romanian Peasant Rebellion
From Wikipedia: The Romanian Peasants' Revolt took place in March 1907 in Moldavia and it quickly spread, reaching Wallachia.
- Y = Intensity of the rebellion, by county
- X_1 = Commercialization of agriculture
- X_2 = Traditionalism
- X_3 = Strength of middle peasantry
- X_4 = Inequality of land tenure

25 Scatterplot matrix
[Figure: scatterplot matrix of intensity, commerce, tradition, midpeasant, and inequality]

26 Peasant Rebellion results
With all the predictors in the model:
Coefficients (columns: Estimate, Std. Error, t value, Pr(>|t|); most numeric values missing):
(Intercept)   *
commerce      ***  (Pr(>|t|) of order e-05)
tradition
midpeasant
inequality

27 Peasant Rebellion results
Without commerce, tradition becomes significant at α = 0.05:
Coefficients (columns: Estimate, Std. Error, t value, Pr(>|t|); numeric values missing):
(Intercept)   *
tradition     *
midpeasant
inequality

28 Caveats
1. Be careful interpreting the coefficients! Multiple regression is usually applied to observational data
2. Do not think of the sign of a coefficient as special: it can change as other covariates are added to or removed from the model
3. Similarly, tests about any covariate are only meaningful in the context of the other covariates in the model
4. Always make sure a linear model is appropriate for all predictors!
5. Always check residuals for heteroscedasticity and normality

29 Caveats
- In particular, be careful when the predictors are highly correlated
- In this situation the coefficient estimates may change erratically in response to small changes in the model or the data
- This phenomenon is called multicollinearity
- Because of that, the correlation matrix of the predictors is also something to look at (and report) in the analysis
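A quick way to look at the correlation of the predictors is to compute their correlation matrix. The sketch below does this for three simulated predictors (the data and names are assumptions for illustration). It also evaluates the factor 1 / (1 − r²) that appears in the p = 2 variance formula for the coefficient estimate, which grows rapidly as the correlation between two predictors approaches 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# x1 and x2 share a common component z, so they are highly correlated;
# x3 is generated independently of both.
z = rng.normal(size=n)
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)

predictors = np.column_stack([x1, x2, x3])
corr_matrix = np.corrcoef(predictors, rowvar=False)  # 3 x 3 correlation matrix

r12 = corr_matrix[0, 1]
r13 = corr_matrix[0, 2]

# Variance inflation implied by the p = 2 formula for a pair of predictors.
inflation_12 = 1.0 / (1.0 - r12 ** 2)
inflation_13 = 1.0 / (1.0 - r13 ** 2)

print(np.round(corr_matrix, 3))
print(inflation_12, inflation_13)
```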

30 Summary
1. Multiple linear regression fits the best hyperplane to the data
2. We can test hypotheses about any of the β_j's
3. Be careful about interpretation
4. The correlation of the predictors is also important because of multicollinearity
