Lecture 19: Multiple (Linear) Regression
Thais Paiva
STA 111 - Summer 2013 Term II
August 1, 2013
Lecture Plan
1. Multiple regression
2. OLS estimates of β and α
3. Interpretation
Linear regression

A study on depression:
- The response variable is Depression, the score on a self-report depression inventory.
- Predictors:
  - Simplicity is a score that indicates a subject's need to see the world in black and white.
  - Fatalism is a score that indicates the belief in one's ability to control one's own destiny.
- Depression is thought to be related to simplicity and fatalism.
Linear regression

Patient   Depression   Simplicity   Fatalism
   1         0.42         0.76        0.11
   2         0.52         0.73        1.00
   3         0.71         0.62        0.04
   4         0.66         0.84        0.42
   5         0.54         0.48        0.81
   6         0.34         0.41        1.23
   7         0.42         0.85        0.30
   8         1.08         1.50        1.20
   9         0.36         0.31        0.66
  10         0.92         1.41        0.85
  11         0.33         0.43        0.42
  12         0.41         0.53        0.07
  13         0.83         1.17        0.30
  14         0.65         0.42        1.09
  15         0.80         0.76        1.13
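As a minimal sketch (not part of the original slides; variable names are chosen here for illustration), the model for these 15 patients can be fit in R as follows:

```r
# Depression data for the 15 patients shown above
depression <- c(0.42, 0.52, 0.71, 0.66, 0.54, 0.34, 0.42, 1.08,
                0.36, 0.92, 0.33, 0.41, 0.83, 0.65, 0.80)
simplicity <- c(0.76, 0.73, 0.62, 0.84, 0.48, 0.41, 0.85, 1.50,
                0.31, 1.41, 0.43, 0.53, 1.17, 0.42, 0.76)
fatalism   <- c(0.11, 1.00, 0.04, 0.42, 0.81, 1.23, 0.30, 1.20,
                0.66, 0.85, 0.42, 0.07, 0.30, 1.09, 1.13)

# Fit the two-predictor model and inspect estimates, t tests, and R^2
fit <- lm(depression ~ simplicity + fatalism)
summary(fit)
```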
Depression data

[Figure: 3D scatterplot of Depression against Simplicity and Fatalism]
Depression data - residuals

[Figure: 3D scatterplot of Depression against Simplicity and Fatalism, showing the residuals from the fitted plane]
Assumptions for multiple linear regression

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$

Just as with simple linear regression, the following have to hold:
1. Constant variance (also called homoscedasticity): $V(\varepsilon_i) = \sigma^2$ for all $i = 1, \ldots, n$, for some $\sigma^2$.
2. Linearity.
3. Independence: $\varepsilon_i \perp \varepsilon_j$ for all $i, j = 1, \ldots, n$, $i \neq j$.
Interpretation of the β's

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$

- $\beta_j$ is the average effect on $Y$ of increasing $X_j$ by one unit, with all $X_{k \neq j}$ held constant.
- This is sometimes referred to as the effect of $X_j$ after controlling for the $X_{k \neq j}$.
- So $\beta_{\text{simplicity}}$ is the average effect of simplicity on depression after controlling for fatalism.
Always plot residuals

[Figure: residual plots of $\hat\varepsilon$ against simplicity and against fatalism]
Histogram of residuals

[Figure: histogram of the residuals $\hat\varepsilon$]
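A sketch of how these diagnostic plots can be produced (reusing the `fit` object from the earlier example; not the original slides' code):

```r
# Residual diagnostics: residuals vs. each predictor, plus a histogram
res <- resid(fit)
par(mfrow = c(1, 3))
plot(simplicity, res, ylab = "residuals"); abline(h = 0, lty = 2)
plot(fatalism, res, ylab = "residuals"); abline(h = 0, lty = 2)
hist(res, main = "Histogram of residuals", xlab = "residuals")
```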
OLS estimates of α, β₁, ..., β_p

(This is only really reasonable to write down if $p = 2$.)

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

$$\hat\beta_1 = \frac{s_Y\,(r_{X_1 Y} - r_{X_1 X_2}\, r_{X_2 Y})}{s_{X_1}\,(1 - r_{X_1 X_2}^2)}, \qquad \hat\beta_2 = \frac{s_Y\,(r_{X_2 Y} - r_{X_1 X_2}\, r_{X_1 Y})}{s_{X_2}\,(1 - r_{X_1 X_2}^2)}$$

$$\hat\alpha = \bar{Y} - \hat\beta_1 \bar{X}_1 - \hat\beta_2 \bar{X}_2,$$

where

$$r_{AB} = \frac{\sum_{i=1}^n (A_i - \bar{A})(B_i - \bar{B})}{\sqrt{\sum_{i=1}^n (A_i - \bar{A})^2 \sum_{i=1}^n (B_i - \bar{B})^2}} \quad \text{for some } A \text{ and } B,$$

and

$$s_A^2 = \frac{1}{n-1}\sum_{i=1}^n (A_i - \bar{A})^2 \quad \text{for some } A.$$
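As a sketch, the $p = 2$ closed-form formulas above can be checked numerically in R (reusing the depression vectors defined earlier):

```r
# Closed-form OLS estimates for p = 2, built from sample correlations
r_x1y  <- cor(simplicity, depression)
r_x2y  <- cor(fatalism,   depression)
r_x1x2 <- cor(simplicity, fatalism)

beta1_hat <- sd(depression) * (r_x1y - r_x1x2 * r_x2y) /
             (sd(simplicity) * (1 - r_x1x2^2))
beta2_hat <- sd(depression) * (r_x2y - r_x1x2 * r_x1y) /
             (sd(fatalism) * (1 - r_x1x2^2))
alpha_hat <- mean(depression) - beta1_hat * mean(simplicity) -
             beta2_hat * mean(fatalism)

c(alpha_hat, beta1_hat, beta2_hat)  # should match coef(fit)
```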
It is easier if you know matrix algebra

$$Y = X\beta + \varepsilon,$$

where

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \beta = \begin{pmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$
It is easier if you know matrix algebra

It turns out that the error sum of squares can be written as

$$\varepsilon^T \varepsilon = (Y - X\beta)^T (Y - X\beta).$$

Differentiating with respect to $\beta$ and setting the result to zero,

$$\frac{\partial\, \varepsilon^T \varepsilon}{\partial \beta} = -2 X^T (Y - X\beta) \overset{\text{set}}{=} 0$$
$$X^T Y = X^T X \hat\beta$$
$$\hat\beta = (X^T X)^{-1} X^T Y$$
It is easier if you know matrix algebra

$$\hat\beta = (X^T X)^{-1} X^T Y$$

A couple of things are clear:
1. $\hat\beta$ is linear in $Y$.
2. $\hat\beta$ is easy to compute if we have a computer.
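A sketch of the matrix computation in R (reusing the depression vectors from before):

```r
# beta-hat from the matrix formula
X <- cbind(1, simplicity, fatalism)  # n x (p+1) design matrix
Y <- depression

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y
beta_hat                                       # (alpha, beta_1, beta_2)
coef(lm(depression ~ simplicity + fatalism))   # same numbers from lm()
```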
The coefficient of determination

Similarly to simple linear regression,

$$r^2 = \frac{ESS}{TSS} \quad \text{and} \quad TSS = ESS + RSS,$$

where

$$TSS = \sum_{i=1}^n (Y_i - \bar{Y})^2, \quad ESS = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2, \quad RSS = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2.$$

SS: Sum of Squares. T: Total. E: Explained. R: Residual.
s² and degrees of freedom

Similarly to simple linear regression,

$$s^2 = \frac{1}{n - p - 1} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \frac{RSS}{n - p - 1}$$

Note the $n - p - 1$ degrees of freedom. Why? We had to estimate $p + 1$ regression parameters.
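A sketch of the sum-of-squares decomposition and the variance estimate by hand (reusing `fit` and the depression vectors from earlier):

```r
# Decompose the sums of squares and estimate sigma^2
y_hat <- fitted(fit)
n <- length(depression); p <- 2

TSS <- sum((depression - mean(depression))^2)
ESS <- sum((y_hat - mean(depression))^2)
RSS <- sum((depression - y_hat)^2)

TSS - (ESS + RSS)   # ~0: verifies TSS = ESS + RSS
ESS / TSS           # r^2, matches summary(fit)$r.squared
RSS / (n - p - 1)   # s^2, the square of summary(fit)$sigma
```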
Hypothesis tests for β_j

Suppose we are interested in testing

$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_A: \beta_j \neq 0$$

(or the one-sided version).

Assuming $p = 2$ (tractable, but more complicated for $p > 2$), define

$$s_{\hat\beta_1}^2 = \frac{s^2}{(n-1)\, s_{X_1}^2 (1 - r_{X_1 X_2}^2)}$$

and similarly for $s_{\hat\beta_2}^2$. Then (even for $p > 2$),

$$t_{\hat\beta_j} = \frac{\hat\beta_j - \beta_j}{s_{\hat\beta_j}} \sim t_{n-p-1}$$
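A sketch of this test computed by hand for $\beta_1$ (simplicity), reusing `fit`, `n`, `p`, and `RSS` from the earlier snippets:

```r
# t test of H0: beta_1 = 0 using the p = 2 standard-error formula above
s2  <- RSS / (n - p - 1)
r12 <- cor(simplicity, fatalism)

se_beta1 <- sqrt(s2 / ((n - 1) * var(simplicity) * (1 - r12^2)))
t_beta1  <- coef(fit)["simplicity"] / se_beta1      # beta_j = 0 under H0
p_value  <- 2 * pt(-abs(t_beta1), df = n - p - 1)   # two-sided p-value

c(t_beta1, p_value)  # should match the simplicity row of summary(fit)
```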
Hypothesis tests for β_j

Notice that

$$s_{\hat\beta_1}^2 = \frac{s^2}{(n-1)\, s_{X_1}^2 (1 - r_{X_1 X_2}^2)}$$

depends on $r_{X_1 X_2}^2$, which depends on $X_2$. So the test for $\beta_1$ depends on the other predictor variables.

What is the interpretation of this test, then? Assuming that the other $\beta_{k \neq j} \neq 0$, can we reject the hypothesis that $\beta_j = 0$?
Scatterplot matrix

[Figure: scatterplot matrix of depression, simplicity, and fatalism]
3D Scatterplot and plane

[Figure: 3D scatterplot of Depression against Simplicity and Fatalism with the fitted regression plane]
Tests for β_simplicity and β_fatalism

$\beta_{\text{simplicity}}$: $t = 3.649$, p-value $= 0.0005$
$\beta_{\text{fatalism}}$: $t = 3.829$, p-value $= 0.0003$

But what if we take fatalism out of the model? Then we get

$\beta_{\text{simplicity}}$: $t = 4.175$, p-value $= 2 \times 10^{-8}$

Why?
Scatterplot matrix

[Figure: the same scatterplot matrix of depression, simplicity, and fatalism. Simplicity and fatalism are correlated with each other, so dropping fatalism lets simplicity absorb some of its explanatory power, strengthening its test statistic.]
1907 Romanian Peasant Rebellion

From Wikipedia: The Romanian Peasants' Revolt took place in March 1907 in Moldavia and it quickly spread, reaching Wallachia.

Y = intensity of the rebellion, by county
X₁ = commercialization of agriculture
X₂ = traditionalism
X₃ = strength of middle peasantry
X₄ = inequality of land tenure
Scatterplot matrix

[Figure: scatterplot matrix of intensity, commerce, tradition, midpeasant, and inequality]
Peasant Rebellion results

With all the predictors in the model:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -12.32796    5.74640  -2.145   0.0418 *
commerce      0.10055    0.02144   4.690 8.33e-05 ***
tradition     0.10578    0.06161   1.717   0.0984 .
midpeasant    0.09333    0.07466   1.250   0.2229
inequality    0.42198    3.11171   0.136   0.8932
Peasant Rebellion results

Without commerce, tradition becomes significant at α = 0.05:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -20.03497    7.40287  -2.706   0.0119 *
tradition     0.19705    0.07859   2.507   0.0187 *
midpeasant    0.03480    0.09897   0.352   0.7279
inequality    5.12172    3.96053   1.293   0.2073
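A sketch of the R calls that could produce output like the two tables above; the data frame name `rebellion` and its column names are assumptions for illustration, not the original analysis code:

```r
# Full model and the model with commerce removed
full    <- lm(intensity ~ commerce + tradition + midpeasant + inequality,
              data = rebellion)
reduced <- lm(intensity ~ tradition + midpeasant + inequality,
              data = rebellion)

summary(full)     # commerce significant; tradition is not (p = 0.0984)
summary(reduced)  # drop commerce and tradition becomes significant
```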
Caveats

1. Be careful interpreting the coefficients! Multiple regression is usually applied to observational data.
2. Do not think of the sign of a coefficient as special: it can actually change as other covariates are added to or removed from the model.
3. Similarly, tests about any covariate are only meaningful in the context of the other covariates in the model.
4. Always make sure a linear model is appropriate for all predictors!
5. Always check residuals for heteroscedasticity and normality.
Caveats

In particular, a special case you should be careful about is when the predictors are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data. This phenomenon is called multicollinearity. Because of that, the correlation matrix of the predictors is also something to look at (and report) in the analysis.
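A one-line sketch of that screening step in R, reusing the depression-example vectors from earlier:

```r
# Correlation matrix of the predictors; off-diagonal values near +/-1
# are a warning sign of multicollinearity
cor(cbind(simplicity, fatalism))
```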
Summary

1. Multiple linear regression fits the best hyperplane to the data.
2. We can test hypotheses about any of the β_j's.
3. Be careful about interpretation.
4. The correlation of the predictors is also important because of multicollinearity.