Gov 2000: 9. Regression with Two Independent Variables
1 Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell (Harvard University) Fall 1 / 62
2 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics with Two Covariates 5. OLS Assumptions with Two Covariates 6. Omitted Variable Bias 7. Goodness of Fit & Multicollinearity 2 / 62
3 Where are we? Where are we going? [Figure: number of covariates in our regressions over time, showing last week] 3 / 62
4 Where are we? Where are we going? [Figure: number of covariates in our regressions, last week and this week] 4 / 62
5 Where are we? Where are we going? [Figure: number of covariates in our regressions, last week, this week, and next week] 5 / 62
6 1/ Why Add Variables to a Regression? 6 / 62
7 [Figure] 7 / 62
8 Berkeley gender bias Graduate admissions data from Berkeley, 1973 Acceptance rates: Men: 8442 applicants, 44% admission rate Women: 4321 applicants, 35% admission rate Evidence of discrimination toward women in admissions? This is a marginal relationship. What about the conditional relationship within departments? 8 / 62
9 Berkeley gender bias, II Within departments:

Dept | Men applied | Men admitted | Women applied | Women admitted
A    |             |           %  |               |            %
B    |             |           %  |           25  |          68%
C    |             |           %  |               |            %
D    |             |           %  |               |            %
E    |             |           %  |               |            %
F    |        373  |           6% |          341  |           7%

Within departments, women do somewhat better than men! Women apply to more challenging departments. The marginal relationship (admissions and gender) differs from the conditional relationship given a third variable (department). 9 / 62
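The reversal on this slide can be reproduced with a tiny made-up example. The department names and counts below are hypothetical, not the actual Berkeley figures: within every department women are admitted at a higher rate, yet the pooled (marginal) rate is lower for women, because women apply disproportionately to the selective department.

```python
depts = {
    # dept: (men_applied, men_admitted, women_applied, women_admitted)
    "easy": (80, 48, 20, 14),   # men 60%, women 70%
    "hard": (20, 4, 80, 24),    # men 20%, women 30%
}

# Women do better within each department...
for name, (ma, mx, wa, wx) in depts.items():
    assert wx / wa > mx / ma

# ...but the marginal rates point the other way.
men_rate = sum(v[1] for v in depts.values()) / sum(v[0] for v in depts.values())
women_rate = sum(v[3] for v in depts.values()) / sum(v[2] for v in depts.values())
print(men_rate, women_rate)  # 0.52 0.38: men look favored marginally
```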
10 Simpson's paradox [Figure: scatterplot of Y against X with two groups of points, Z = 0 and Z = 1] Overall a positive relationship between Y i and X i. 10 / 62
11 Simpson's paradox [Figure: the same scatterplot, with the within-group relationships drawn] Overall a positive relationship between Y i and X i. But within levels of Z i, the opposite. 11 / 62
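A minimal numeric sketch of the paradox in regression form, with invented points chosen so the within-group slopes are negative while the pooled slope is positive:

```python
def slope(x, y):
    """OLS slope of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

x0, y0 = [0, 1, 2], [0, -1, -2]   # Z = 0 group: within-group slope is -1
x1, y1 = [3, 4, 5], [3, 2, 1]     # Z = 1 group: within-group slope is -1
overall = slope(x0 + x1, y0 + y1)  # pooled slope is positive (about 0.54)
print(slope(x0, y0), slope(x1, y1), overall)
```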
12 Basic idea Old goal: estimate the mean of Y as a function of some independent variable, X: E[Y i | X i ]. For continuous X's, we modeled the CEF/regression function with a line: Y i = β 0 + β 1 X i + u i New goal: estimate the relationship of two variables, Y i and X i, conditional on a third variable, Z i : Y i = β 0 + β 1 X i + β 2 Z i + u i The β's are the population parameters we want to estimate. 12 / 62
13 Why control for another variable? Descriptive: get a sense for the relationships in the data. Conditional on the number of steps I've taken, do higher activity levels correlate with less weight? Predictive: we can usually make better predictions about the dependent variable with more information on independent variables. Causal: block potential confounding, which is when X doesn't cause Y, but only appears to because a third variable Z causally affects both of them. 13 / 62
14 Plan of attack 1. Interpretation with a binary Z i 2. Interpretation with a continuous Z i 3. Mechanics of OLS with 2 covariates 4. OLS assumptions with 2 covariates: Omitted variable bias Multicollinearity 14 / 62
15 What we won't cover in lecture 1. The OLS formulas for 2 covariates 2. Proofs 3. The second covariate being a function of the first: Z i = X i ^2 4. Hypothesis tests/confidence intervals (almost exactly the same as before) 15 / 62
16 2/ Adding a Binary Covariate 16 / 62
17 Example Non-African countries Log GDP per capita African countries Strength of Property Rights 17 / 62
18 Basics Ye olde model: E[Y i | X i ] = α 0 + α 1 X i (α 0, α 1 ) are the bivariate intercept/slope, e i is the bivariate error. Concern: AJR might be picking up an African effect : African countries might have low incomes and weak property rights. Condition on a country being in Africa or not to remove this: E[Y i | X i , Z i ] = β 0 + β 1 X i + β 2 Z i Z i = 1 to indicate that i is an African country, Z i = 0 to indicate that i is a non-African country. Effects are now within Africa or within non-Africa, not between. 18 / 62
19 AJR model

ajr.mod <- lm(logpgp95 ~ avexpr + africa, data = ajr)
summary(ajr.mod)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              <2e-16 ***
## avexpr                                   <2e-16 ***
## africa                                     e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: on 108 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared: 0.708, Adjusted R-squared:
## F-statistic: 131 on 2 and 108 DF, p-value: <2e-16

19 / 62
20 Two lines, one regression How can we interpret this model? Plug in two possible values for Z i and rearrange When Z i = 0: When Z i = 1: Y i = β 0 + β 1 X i + β 2 Z i = β 0 + β 1 X i + β 2 0 = β 0 + β 1 X i Y i = β 0 + β 1 X i + β 2 Z i = β 0 + β 1 X i + β 2 1 = ( β 0 + β 2 ) + β 1 X i Two different intercepts, same slope 20 / 62
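The two-lines-one-regression logic can be checked mechanically. The coefficient values below are made up for illustration (the slide's numeric estimates did not survive transcription): the gap between the two lines is always β 2, and the slope in X is β 1 in both groups.

```python
# Illustrative coefficients, not the AJR estimates
b0, b1, b2 = 5.5, 0.5, -0.75

def yhat(x, z):
    """Predicted Y from the single regression with a binary Z."""
    return b0 + b1 * x + b2 * z

for x in [0, 3, 7]:
    # Vertical gap between the Z = 1 and Z = 0 lines is b2 at every x
    assert yhat(x, 1) - yhat(x, 0) == b2
    # Slope in x is b1 in both groups
    assert yhat(x + 1, 0) - yhat(x, 0) == b1
    assert yhat(x + 1, 1) - yhat(x, 1) == b1
```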
21 Interpretation of the coefficients

                               Intercept for X i   Slope for X i
Non-African country (Z i = 0)  β 0                 β 1
African country (Z i = 1)      β 0 + β 2           β 1

In this example, we have: Ŷ i = β̂ 0 + β̂ 1 X i + β̂ 2 Z i β̂ 0 : the average log income for a non-African country (Z i = 0) with property rights measured at 0. β̂ 1 : a one-unit increase in property rights is associated with a β̂ 1 increase in average log income, comparing two African countries (or two non-African countries). β̂ 2 : the average difference in log income per capita between African and non-African countries, conditional on property rights. 21 / 62
22 General interpretation of the coefficients Y i = β 0 + β 1 X i + β 2 Z i β 0 : average value of Y i when both X i and Z i are equal to 0 β 1 : A 1-unit increase in X i is associated with a β 1 -unit change in Y i for units with the same value of Z i β 2 : average difference in Y i between Z i = 1 group and Z i = 0 group for units with the same value of X i 22 / 62
23 Adding a binary variable, visually [Figure: log GDP per capita vs. strength of property rights, with the fitted line for non-African countries; intercept β̂ 0, slope β̂ 1] 23 / 62
24 Adding a binary variable, visually [Figure: both fitted lines; the African-country line has intercept β̂ 0 + β̂ 2 and the same slope β̂ 1] 24 / 62
25 Marginal vs conditional [Figure: log GDP per capita vs. strength of property rights, comparing the marginal fit to the two conditional fits] 25 / 62
26 3/ Adding a Continuous Covariate 26 / 62
27 Adding a continuous variable Ye olde model: E[Y i | X i ] = α 0 + α 1 X i New concern: geography is confounding the effect: geography might affect political institutions, and geography might affect average incomes (through diseases like malaria). Condition on Z i : mean temperature in country i (continuous): E[Y i | X i , Z i ] = β 0 + β 1 X i + β 2 Z i 27 / 62
28 AJR model, revisited

ajr.mod2 <- lm(logpgp95 ~ avexpr + meantemp, data = ajr)
summary(ajr.mod2)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                e-12 ***
## avexpr                                     e-08 ***
## meantemp                                        **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: on 57 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.615, Adjusted R-squared:
## F-statistic: 45.6 on 2 and 57 DF, p-value: 1.48e

28 / 62
29 Interpretation with a continuous Z

             Intercept for X i   Slope for X i
Z i = 0°C    β 0                 β 1
Z i = 21°C   β 0 + β 2 · 21      β 1
Z i = 24°C   β 0 + β 2 · 24      β 1
Z i = 26°C   β 0 + β 2 · 26      β 1

In this example we have: Ŷ i = β̂ 0 + β̂ 1 X i − 0.06 Z i β̂ 0 : the average log income for a country with property rights measured at 0 and a mean temperature of 0. β̂ 1 : a one-unit increase in property rights is associated with a β̂ 1 change in average log income, conditional on a country's mean temperature. β̂ 2 : a one-degree increase in mean temperature is associated with a 0.06 decrease in average log income, conditional on strength of property rights. 29 / 62
30 General interpretation Y i = β 0 + β 1 X i + β 2 Z i The coefficient β 1 measures how the predicted outcome varies in X i for units with the same value of Z i. The coefficient β 2 measures how the predicted outcome varies in Z i for units with the same value of X i. 30 / 62
31 4/ OLS Mechanics with Two Covariates 31 / 62
32 Fitted values and residuals Where do we get our hats? β̂ 0 , β̂ 1 , β̂ 2 Fitted values for i = 1, , n: Ŷ i = Ê[Y i | X i , Z i ] = β̂ 0 + β̂ 1 X i + β̂ 2 Z i Residuals for i = 1, , n: û i = Y i − Ŷ i Minimize the sum of the squared residuals, just like before: ( β̂ 0 , β̂ 1 , β̂ 2 ) = arg min over (b 0 , b 1 , b 2 ) of Σ_{i=1}^n (Y i − b 0 − b 1 X i − b 2 Z i )² We'll derive closed-form estimators with arbitrary covariates next week. 32 / 62
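The minimization itself can be illustrated by brute force. The dataset and grid below are invented for illustration: on noise-free data generated as Y = 1 + 2X + 3Z, a coarse grid search over (b 0, b 1, b 2) bottoms out (SSR = 0) at exactly the generating values.

```python
xs = [0, 1, 0, 1, 2]
zs = [0, 0, 1, 1, 1]
ys = [1 + 2 * x + 3 * z for x, z in zip(xs, zs)]  # [1, 3, 4, 6, 8]

def ssr(b0, b1, b2):
    """Sum of squared residuals at candidate coefficients."""
    return sum((y - b0 - b1 * x - b2 * z) ** 2
               for x, z, y in zip(xs, zs, ys))

grid = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
best = min((ssr(a, b, c), a, b, c)
           for a in grid for b in grid for c in grid)
print(best)  # (0.0, 1.0, 2.0, 3.0): zero SSR at the generating coefficients
```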
33 OLS estimator recipe using two steps No explicit OLS formulas this week, but a recipe instead. Partialling out OLS recipe: 1. Run a regression of X i on Z i : X̂ i = Ê[X i | Z i ] = δ̂ 0 + δ̂ 1 Z i 2. Calculate the residuals from this regression: r̂ xz,i = X i − X̂ i 3. Run a simple regression of Y i on the residuals r̂ xz,i : Ŷ i = α̂ 0 + α̂ 1 r̂ xz,i The estimate α̂ 1 will be equivalent to β̂ 1 from the big regression: Y i = β 0 + β 1 X i + β 2 Z i 33 / 62
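The recipe can be sketched in a few lines of plain Python. The data are made up; the benchmark is the standard closed-form two-regressor OLS slope, and the partialling-out slope matches it exactly.

```python
x = [1, 2, 3, 4, 5]
z = [2, 1, 4, 3, 5]
y = [2, 3, 7, 6, 9]
n = len(x)

def s(a, b):
    """Sum of centered cross-products."""
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# Steps 1-2: regress X on Z, keep the residuals
d1 = s(x, z) / s(z, z)
d0 = sum(x) / n - d1 * sum(z) / n
r = [xi - d0 - d1 * zi for xi, zi in zip(x, z)]

# Step 3: simple regression of Y on the residuals
# (residuals have mean zero, so the slope is sum(r*y) / sum(r*r))
beta1_fwl = sum(ri * yi for ri, yi in zip(r, y)) / sum(ri * ri for ri in r)

# Closed-form slope on X from the two-covariate regression
beta1 = ((s(z, z) * s(x, y) - s(x, z) * s(z, y))
         / (s(x, x) * s(z, z) - s(x, z) ** 2))
assert abs(beta1_fwl - beta1) < 1e-9
```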
34 First regression Regress X i on Z i :

## when missing data exist, we need na.action in order
## to place residuals or fitted values back into the data
ajr.first <- lm(avexpr ~ meantemp, data = ajr, na.action = na.exclude)
summary(ajr.first)

## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             < 2e-16 ***
## meantemp                                        ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.32 on 58 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared: 0.241, Adjusted R-squared:
## F-statistic: 18.4 on 1 and 58 DF, p-value:

34 / 62
35 Regression of log income on the residuals Save the residuals:

## store the residuals
ajr$avexpr.res <- residuals(ajr.first)

Now we compare the estimated slopes:

coef(lm(logpgp95 ~ avexpr.res, data = ajr))
## (Intercept)  avexpr.res

coef(lm(logpgp95 ~ avexpr + meantemp, data = ajr))
## (Intercept)      avexpr    meantemp

35 / 62
36 Residual/partial regression plot We can plot the conditional relationship between property rights and income given temperature: [Figure: log GDP per capita against Residuals(Property Rights ~ Mean Temperature)] 36 / 62
37 5/ OLS Assumptions with Two Covariates 37 / 62
38 OLS assumptions for unbiasedness Simple regression assumptions for unbiasedness/consistency of OLS: 1. Linearity 2. Random/iid sample 3. Variation in X i 4. Zero conditional mean error: E[u i | X i ] = 0 Small modification to these with 2 covariates: 1. Linearity: Y i = β 0 + β 1 X i + β 2 Z i + u i 2. Random/iid sample 3. No perfect collinearity 4. Zero conditional mean error (both X i and Z i unrelated to u i ): E[u i | X i , Z i ] = 0 38 / 62
39 New assumption Assumption 3: No perfect collinearity. (1) No independent variable is constant in the sample, and (2) there are no exact linear relationships among the independent variables. Two components: 1. Both X i and Z i have to vary. 2. Z i cannot be a deterministic, linear function of X i . Part 2 rules out anything of the form: Z i = a + bX i What's the correlation between Z i and X i in that case? Exactly 1 (or −1 when b is negative)! 39 / 62
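A small numeric check (invented data) of what an exact linear relationship Z i = a + bX i does: the denominator of the two-covariate OLS slope formula is exactly zero, so separate slopes cannot be computed, and the X-Z correlation is exactly 1.

```python
import math

x = [1, 2, 3, 4]
z = [3 + 2 * xi for xi in x]  # Z = 3 + 2X: perfectly collinear
n = len(x)

def s(a, b):
    """Sum of centered cross-products."""
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# Denominator of the two-covariate slope estimator: S_xx * S_zz - S_xz^2
denom = s(x, x) * s(z, z) - s(x, z) ** 2
corr = s(x, z) / math.sqrt(s(x, x) * s(z, z))
print(denom, corr)  # 0.0 1.0
```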
40 Perfect collinearity example Simple example: X i = 1 if a country is not in Africa and 0 otherwise; Z i = 1 if a country is in Africa and 0 otherwise. But clearly we have the following: Z i = 1 − X i These two variables are perfectly collinear. What about the following: X i = property rights, Z i = X i ^2. Do we have to worry about perfect collinearity here? No! While Z i is a deterministic function of X i , it is a nonlinear function of X i . 40 / 62
41 R and perfect collinearity R, Stata, et al. will drop one of the variables if there is perfect collinearity:

ajr$nonafrica <- 1 - ajr$africa
summary(lm(logpgp95 ~ africa + nonafrica, data = ajr))

## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             < 2e-16 ***
## africa                                     e-14 ***
## nonafrica         NA         NA      NA      NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: on 146 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared: 0.323, Adjusted R-squared:
## F-statistic: 69.7 on 1 and 146 DF, p-value: 4.87e-14

41 / 62
42 6/ Omitted Variable Bias 42 / 62
43 Unbiasedness revisited Long regression: Y i = β 0 + β 1 X i + β 2 Z i + u i Under Assumptions 1-4, OLS is unbiased for β 0 , β 1 , β 2 . What happens if we ignore Z i and just run the simple linear regression with X i alone? Short regression: Y i = α 0 + α 1 X i + u i OLS estimates from the short regression: ( α̂ 0 , α̂ 1 ) Question: will E[ α̂ 1 ] = β 1 ? If not, what will the difference be? 43 / 62
44 Deriving the short regression How can we relate α 1 to β 1 ? The short regression will be unbiased for the CEF of Y i given just X i . Write the short CEF in terms of the long regression model: E[Y i | X i ] = E[β 0 + β 1 X i + β 2 Z i + u i | X i ] = β 0 + β 1 X i + β 2 E[Z i | X i ] + E[u i | X i ] By Assumption 4, X i is unrelated to the long-regression error, so E[u i | X i ] = 0: E[Y i | X i ] = β 0 + β 1 X i + β 2 E[Z i | X i ] 44 / 62
45 Deriving the short regression E[Y i | X i ] = β 0 + β 1 X i + β 2 E[Z i | X i ] Let E[Z i | X i ] = γ 0 + γ 1 X i be the (population) CEF from a regression of Z i on X i . Then we can write the short CEF as: E[Y i | X i ] = β 0 + β 1 X i + β 2 (γ 0 + γ 1 X i ) = (β 0 + β 2 γ 0 ) + (β 1 + β 2 γ 1 )X i = α 0 + α 1 X i Under these assumptions, the short-regression OLS is unbiased for α 1 : E[ α̂ 1 ] = α 1 = β 1 + β 2 γ 1 45 / 62
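The identity α 1 = β 1 + β 2 γ 1 can be verified numerically. The coefficients and data below are invented, constructed so that the sample regression of Z on X recovers γ 1 exactly (the extra piece v sums to zero and is uncorrelated with X in the sample):

```python
b0, b1, b2 = 0.5, 1.0, 3.0   # long-regression coefficients (invented)
g0, g1 = 1.0, 2.0            # CEF of Z given X (invented)

x = [1, 2, 3, 4]
v = [1, -1, -1, 1]           # sums to 0 and is orthogonal to x
z = [g0 + g1 * xi + vi for xi, vi in zip(x, v)]
y = [b0 + b1 * xi + b2 * zi for xi, zi in zip(x, z)]

def slope(a, b):
    """OLS slope of b on a."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return (sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
            / sum((ai - ma) ** 2 for ai in a))

assert slope(x, z) == g1              # regression of Z on X gives gamma1
assert slope(x, y) == b1 + b2 * g1    # short slope = 1 + 3*2 = 7
```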
46 Omitted variable bias Omitted variable bias: the bias of the short-regression estimator for the long-regression coefficient when we omit Z i : Bias( α̂ 1 ) = E[ α̂ 1 ] − β 1 = β 2 γ 1 In other words, the omitted variable bias is: ( effect of Z i on Y i ) × ( effect of X i on Z i ), that is, (effect of omitted on outcome) × (effect of included on omitted). 46 / 62
47 Omitted variable bias, summary Remember that by OLS, the slope from regressing Z i on X i (the effect of X i on Z i ) is: γ 1 = cov(Z i , X i ) / var(X i ) We can summarize the direction of the bias like so:

           cov(X i , Z i ) > 0   cov(X i , Z i ) < 0   cov(X i , Z i ) = 0
β 2 > 0    Positive bias         Negative bias         No bias
β 2 < 0    Negative bias         Positive bias         No bias
β 2 = 0    No bias               No bias               No bias

Very relevant when Z i is unobserved for some reason! 47 / 62
48 Including irrelevant variables What if we do the opposite and include an irrelevant variable? What would it mean for Z i to be an irrelevant variable? Y i = β 0 + β 1 X i + 0 × Z i + u i So in this case, the true value of β 2 = 0. But under Assumptions 1-4, OLS is unbiased for all the parameters: E[ β̂ 0 ] = β 0 , E[ β̂ 1 ] = β 1 , E[ β̂ 2 ] = 0 Including an irrelevant variable will, however, increase the standard errors for β̂ 1 . 48 / 62
49 7/ Goodness of Fit & Multicollinearity 49 / 62
50 Prediction error How do we judge how well a regression fits the data? How much does X i help us predict Y i ? Prediction errors without X i : the best prediction is the mean, Ȳ. The prediction error is called the total sum of squares (SS tot ): SS tot = Σ_{i=1}^n (Y i − Ȳ)² Prediction errors with X i : the best predictions are the fitted values, Ŷ i . The prediction error is the sum of the squared residuals, SS res : SS res = Σ_{i=1}^n (Y i − Ŷ i )² 50 / 62
51 Total SS vs SSR [Figure: log GDP per capita vs. strength of property rights, showing the total prediction errors around the mean Ȳ] 51 / 62
52 Total SS vs SSR [Figure: the same plot, showing the residuals around the fitted values Ŷ i ] 52 / 62
53 R-squared Regression will always improve in-sample fit: SS tot ≥ SS res . How much better does using X i do? The coefficient of determination, or R²: R² = (SS tot − SS res ) / SS tot = 1 − SS res / SS tot R² = the fraction of the total prediction error eliminated by conditioning on X i . Common interpretation: R² is the fraction of the variation in Y i that is explained by X i . R² = 0 means no linear relationship; R² = 1 implies a perfect linear fit. 53 / 62
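R² computed straight from its definition, on a small invented dataset:

```python
x = [1, 2, 3, 4]
y = [2, 4, 5, 9]
n = len(x)

# Fit the simple regression
mx, my = sum(x) / n, sum(y) / n
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b0 = my - b1 * mx
fitted = [b0 + b1 * xi for xi in x]

# SS_res from the fitted values, SS_tot from the mean
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - my) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # 0.931
```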
54 Sampling variance for bivariate regression Under simple linear regression and homoskedasticity, the sampling variance of the slope was: V[ β̂ 1 | X] = σ²_u / Σ_{i=1}^n (X i − X̄)² = σ²_u / ((n − 1) S²_X) Factors affecting the standard errors: the error variance σ²_u (a higher conditional variance of Y i leads to bigger SEs); the sample variance of X i , S²_X (lower variation in X i leads to bigger SEs); the sample size n (a higher sample size leads to lower SEs). 54 / 62
55 Sampling variation with 2 covariates Regression with an additional independent variable: V[ β̂ 1 | X i , Z i ] = σ²_u / ((1 − R²_1)(n − 1) S²_X) Here, R²_1 is the R² from the regression of X i on Z i : X̂ i = δ̂ 0 + δ̂ 1 Z i Factors now affecting the standard errors: the error variance σ²_u ; the sample variance of X i , S²_X ; the sample size n; and the strength of the (linear) relationship between X i and Z i (stronger relationships mean a higher R²_1 and thus bigger SEs). 55 / 62
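The role of the (1 − R²_1) term can be seen by treating it as a variance inflation factor: holding σ²_u, n, and S²_X fixed, it is the ratio of V[β̂ 1] with collinearity to V[β̂ 1] without.

```python
def var_inflation(r2_1):
    """Multiplier on V[beta1-hat] induced by R-squared of X on Z."""
    return 1 / (1 - r2_1)

for r2_1 in [0.0, 0.5, 0.9, 0.99]:
    print(r2_1, var_inflation(r2_1))
# Standard errors scale with the square root: R2_1 = 0.99
# multiplies the SE of beta1-hat by about 10.
```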
56 Multicollinearity Definition: Multicollinearity is high, but not perfect, correlation between two independent variables in a regression. With multicollinearity, we'll have R²_1 ≈ 1, but not exactly 1. The stronger the relationship between X i and Z i , the closer R²_1 will be to 1, and the higher the SEs will be: V[ β̂ 1 | X i , Z i ] = σ²_u / ((1 − R²_1)(n − 1) S²_X) Given the symmetry, it will also increase var( β̂ 2 ) as well. 56 / 62
57 Intuition for multicollinearity Remember the OLS recipe: r̂ xz,i are the residuals from the regression of X i on Z i , and β̂ 1 comes from the regression of Y i on r̂ xz,i . Estimated coefficient: β̂ 1 = ĉov( r̂ xz,i , Y i ) / V̂( r̂ xz,i ) When Z i and X i have a strong relationship, the residuals will have low variation: we explain away a lot of the variation in X i through Z i . 57 / 62
58 Multicollinearity, visualized [Figure: X plotted against Z under a weak X-Z relationship and a strong X-Z relationship] 58 / 62
59 Multicollinearity, visualized [Figure: Y plotted against Residuals(X ~ Z) for the weak and strong X-Z cases; the residuals have much less spread in the strong case] 59 / 62
61 Effects of multicollinearity No effect on the bias of OLS; it only increases the standard errors. It is really just a sample-size problem: if X i and Z i are extremely highly correlated, you're going to need a much bigger sample to accurately differentiate between their effects. 61 / 62
62 Conclusion In this brave new world with 2 independent variables: 1. The β's have slightly different interpretations. 2. OLS still minimizes the sum of the squared residuals. 3. Adding or omitting variables in a regression can affect the bias and the variance of OLS. Remainder of class: 1. Regression in its most general glory (matrices) 2. How to diagnose and fix violations of the OLS assumptions 62 / 62
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationEmpirical Application of Simple Regression (Chapter 2)
Empirical Application of Simple Regression (Chapter 2) 1. The data file is House Data, which can be downloaded from my webpage. 2. Use stata menu File Import Excel Spreadsheet to read the data. Don t forget
More informationFinal Exam. Economics 835: Econometrics. Fall 2010
Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each
More informationEconometrics Problem Set 4
Econometrics Problem Set 4 WISE, Xiamen University Spring 2016-17 Conceptual Questions 1. This question refers to the estimated regressions in shown in Table 1 computed using data for 1988 from the CPS.
More informationThe Simple Regression Model. Simple Regression Model 1
The Simple Regression Model Simple Regression Model 1 Simple regression model: Objectives Given the model: - where y is earnings and x years of education - Or y is sales and x is spending in advertising
More informationChapter 2 The Simple Linear Regression Model: Specification and Estimation
Chapter The Simple Linear Regression Model: Specification and Estimation Page 1 Chapter Contents.1 An Economic Model. An Econometric Model.3 Estimating the Regression Parameters.4 Assessing the Least Squares
More informationECON Introductory Econometrics. Lecture 17: Experiments
ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.
More informationFinal Exam - Solutions
Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your
More informationLecture 16 - Correlation and Regression
Lecture 16 - Correlation and Regression Statistics 102 Colin Rundel April 1, 2013 Modeling numerical variables Modeling numerical variables So far we have worked with single numerical and categorical variables,
More informationECON Introductory Econometrics. Lecture 13: Internal and external validity
ECON4150 - Introductory Econometrics Lecture 13: Internal and external validity Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 9 Lecture outline 2 Definitions of internal and external
More informationThe returns to schooling, ability bias, and regression
The returns to schooling, ability bias, and regression Jörn-Steffen Pischke LSE October 4, 2016 Pischke (LSE) Griliches 1977 October 4, 2016 1 / 44 Counterfactual outcomes Scholing for individual i is
More informationEstadística II Chapter 4: Simple linear regression
Estadística II Chapter 4: Simple linear regression Chapter 4. Simple linear regression Contents Objectives of the analysis. Model specification. Least Square Estimators (LSE): construction and properties
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More informationPath Analysis. PRE 906: Structural Equation Modeling Lecture #5 February 18, PRE 906, SEM: Lecture 5 - Path Analysis
Path Analysis PRE 906: Structural Equation Modeling Lecture #5 February 18, 2015 PRE 906, SEM: Lecture 5 - Path Analysis Key Questions for Today s Lecture What distinguishes path models from multivariate
More informationLecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1
Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor
More informationSimple Regression Model (Assumptions)
Simple Regression Model (Assumptions) Lecture 18 Reading: Sections 18.1, 18., Logarithms in Regression Analysis with Asiaphoria, 19.6 19.8 (Optional: Normal probability plot pp. 607-8) 1 Height son, inches
More informationA Re-Introduction to General Linear Models (GLM)
A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationGov Multiple Random Variables
Gov 2000-4. Multiple Random Variables Matthew Blackwell September 29, 2015 Where are we? Where are we going? We described a formal way to talk about uncertain outcomes, probability. We ve talked about
More informationThe Classical Linear Regression Model
The Classical Linear Regression Model ME104: Linear Regression Analysis Kenneth Benoit August 14, 2012 CLRM: Basic Assumptions 1. Specification: Relationship between X and Y in the population is linear:
More informationImportant note: Transcripts are not substitutes for textbook assignments. 1
In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance
More informationEssential of Simple regression
Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship
More informationLab 07 Introduction to Econometrics
Lab 07 Introduction to Econometrics Learning outcomes for this lab: Introduce the different typologies of data and the econometric models that can be used Understand the rationale behind econometrics Understand
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More informationIntroduction to Econometrics. Review of Probability & Statistics
1 Introduction to Econometrics Review of Probability & Statistics Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com Introduction 2 What is Econometrics? Econometrics consists of the application of mathematical
More informationLinear Regression. Junhui Qian. October 27, 2014
Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency
More informationMultiple Linear Regression. Chapter 12
13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.
More informationFNCE 926 Empirical Methods in CF
FNCE 926 Empirical Methods in CF Lecture 2 Linear Regression II Professor Todd Gormley Today's Agenda n Quick review n Finish discussion of linear regression q Hypothesis testing n n Standard errors Robustness,
More informationECON3150/4150 Spring 2015
ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2
More informationMultiple Regression Theory 2006 Samuel L. Baker
MULTIPLE REGRESSION THEORY 1 Multiple Regression Theory 2006 Samuel L. Baker Multiple regression is regression with two or more independent variables on the right-hand side of the equation. Use multiple
More informationMaking sense of Econometrics: Basics
Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/
More informationLECTURE 15: SIMPLE LINEAR REGRESSION I
David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).
More informationSimple, Marginal, and Interaction Effects in General Linear Models
Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means
More informationMultiple Regression Analysis
Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,
More informationLogistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015
Logistic regression: Why we often can do what we think we can do Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 1 Introduction Introduction - In 2010 Carina Mood published an overview article
More informationIntroduction to Econometrics. Multiple Regression
Introduction to Econometrics The statistical analysis of economic (and related) data STATS301 Multiple Regression Titulaire: Christopher Bruffaerts Assistant: Lorenzo Ricci 1 OLS estimate of the TS/STR
More informationMotivation for multiple regression
Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope
More informationSimple Regression Model. January 24, 2011
Simple Regression Model January 24, 2011 Outline Descriptive Analysis Causal Estimation Forecasting Regression Model We are actually going to derive the linear regression model in 3 very different ways
More informationReview of probability and statistics 1 / 31
Review of probability and statistics 1 / 31 2 / 31 Why? This chapter follows Stock and Watson (all graphs are from Stock and Watson). You may as well refer to the appendix in Wooldridge or any other introduction
More informationRewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35
Rewrap ECON 4135 November 18, 2011 () Rewrap ECON 4135 November 18, 2011 1 / 35 What should you now know? 1 What is econometrics? 2 Fundamental regression analysis 1 Bivariate regression 2 Multivariate
More informationChapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression
BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between
More information