I 1. 1 Introduction 2. 2 Examples Example: growth, GDP, and schooling California test scores Chandra et al. (2008)...

Size: px

Start display at page:

Download "I 1. 1 Introduction 2. 2 Examples Example: growth, GDP, and schooling California test scores Chandra et al. (2008)..."

Brianna Howard
5 years ago
Views:

1 Part I Contents I 1 1 Introduction 2 2 Examples Example: growth, GDP, and schooling California test scores Chandra et al. (2008) Interpretation 7 4 OLS estimation 10 5 Partitioned regression Example: California test scores Example: Bronnenberg, Dhar, and Dubé (2009) 15 References Wooldridge (2013) chapter 3 Stock and Watson (2009) chapter 6 (parts of 7,8,9, & 18 also relevant) Angrist and Pischke (2014) chapter 2 Bierens (2010) Baltagi (2002) chapter 4 Linton (2017) Part III (more advanced) 1

2 1 Introduction Why we need a multiple regression model There are many factors affecting the outcome variable y. If we want to estimate the marginal effect of one of the factors (regressors), we need to control for other factors. Suppose that we are interested in the effect of x 1 on y, but y is affected by both x 1 and x 2 : y i = β 0 + β 1 x 1,i + β 2 x 2,i + ε i (1) Why we need a multiple regression model Assume: MLR.1 (linear model) MLR.2 (independence) {(x 1,i, x 2,i, y i )} n is independent random sample MLR.3 (rank condition) no multicollinearity MLR.4 (exogeneity) E[ε X] = 0 Suppose we regress y only against x 1 : ˆβ s 1 = (x 1,i x 1 )y i (x 1,i x 1 ) 2 What is E[ˆβ s 1 ]? Why we need a multiple regression model What is E[ˆβ s 1 ]? ˆβ s 1 = (x 1,i x 1 )(β 0 + β 1 x 1,i + β 2 x 2,i + ε i ) (x 1,i x 1 ) 2 =β 1 + β (x 1,i x 1 )x n 2,i 2 n (x 1,i x 1 ) 2 + (x 1,i x 1 )ε i (x 1,i x 1 ) 2 E[ˆβ s 1 x s] =β 1 + β 2 (x 1,i x 1 )x 2,i (x 1,i x 1 ) 2 =β 1 + β 2 Ĉov(x 1, x 2 ) Var(x 1 ) 2

3 ˆβ 1 s biased unless Cov(x 1, x 2 ) = 0 or β 2 = 0 We will call the regression of y on only x 1 the short regression, and the regression of y on x 1 and x 2 the long regression. The short regression coefficient on x 1, β s 1 captures how as x 1 varies there is both a Ĉov(x direct impact on y, β 1, and an indirect impact through how x 2 varies with x 1, β 1,x 2 ) 2. When we look Var(x 1 ) at the long regression, we eliminate this indirect channel, effectively holding x 2 constant. This is a very important concept. In multivariate regression, the interpretation of the slope coefficients is that they are the effect of each x variable holding all others constant. For example, when there are two regressors, β 1 is the effect of x 1 holding x 2 constant. Omitted variable bias When true model is long regression but we only estimate the short regression, then y i = β 0 + β 1 x 1,i + β 2 x 2,i + ε i y i = β 0 + β s 1 x 1,i + v i }{{} =β 2 x 2,i +ε i E[ˆβ s 1 ] =β 1 + β 2 Ĉov(x 1, x 2 ) Var(x 1 ) If x 1 and x 2 related, we can no longer say that E[v i x 1,i ] = 0 When x 1 changes, x 2 changes as well, which contaminates estimation of the effect of x 1 on y As a result, the short regression estimate, ˆβ 1 s, is biased Omitted variable bias is a useful way for thinking about how a violation of the exogeneity assumption might affect our estimates. For example, consider a regression of wages on years of eduction. Let y = log wage, x = years of education. The regression model is y i = β 0 + β 1 x i + ε i, where we want β 1 to be the causal effect of education on wages. It is likely that there is some unobserved ability, say a i that effects both wages and education. Since ability effects wages, a i is part of ε i. Thus, it seems unlikely that E[ε i x i ] = 0, so OLS is likely biased. If we think about omitted variables bias, we can be more specific about the bias. Decompose ε as ε i = β 2 a i + u i 3

where u i represents all unobservables that affect wages and are uncorrelated with ability. Then we have Omitted variables bias tells us that y i = β 0 + β 1 x i + β 2 a i + u i.

4 where u i represents all unobservables that affect wages and are uncorrelated with ability. Then we have Omitted variables bias tells us that y i = β 0 + β 1 x i + β 2 a i + u i. E[ˆβ 1 s ] =β Ĉov(x, a) 1 + β 2. Var(x) We expect ability to be positively related to wages (β 2 > 0) and education (Cov(x, a) > 0), thus we expect the bivariate regression of log wages on education to produce an upward biased estimate of the causal effect of education on wages, i.e. E[ˆβ s 1 ] =β 1 + β 2 Ĉov(x, a) Var(x) > β 1. 2 Examples 2.1 Example: growth, GDP, and schooling Example: growth, GDP, and schooling 4

5 Model 1 Model 2 Model 3 (Intercept) (0.378) (0.418) (0.389) rgdp (0.095) (0.146) yearsschool (0.089) (0.144) R Adj. R Num. obs RMSE p < 0.001, p < 0.01, p < 0.05 Table 1: Growth and GDP and education in 1960 Example: growth, GDP, and schooling 2.2 California test scores Test scores and student teacher ratios Data from Stock and Watson (2009) School average test score and student teacher ratios Code Test scores and student teacher ratios 2.3 Chandra et al. (2008) 0 Code 5

6 β (9.47) (60.72) (7.45) (5.77) student teacher (0.48) (6.19) (0.35) (0.28) ( student ) 2 teacher 0.11 (0.16) Income English learners (0.09) (0.07) 0.49 (0.03) Num. obs *** p < 0.01, ** p < 0.05, * p < 0.1 Table 2: Test scores and student teacher ratios Chandra et al. (2008) Does Watching Sex on Television Predict Teen Pregnancy? Long regression: preg = teen pregnancy preg = β 0 + β 1 sext V + β 2 totalt V + ε sext V = hours of television with sexual content totalt V = hour of television Short (bivariate) regression: preg = β s 0 + βs 1 sext V + v Results: Short: ˆβ s 1 Var(sexT V ) = 0.18 Long: ˆβ 1 Var(sexT V ) = 0.30 Also report that What is β 2? ˆγ 1 Var(sexT V ) = 0.29 totalt V = γ 0 + γ 1 sext V + e From omitted variable bias formula, we know E[ˆβ s 1 ] =β 1 + β 2 Ĉov(sexT V, totalt V ) Var(sexT V ) }{{} =ˆγ 1 =β 1 + β 2ˆγ 1 β 2 = β 1 E[ˆβ s 1 ] ˆγ 1 6

7 so, ˆβ 2 = ˆβ 1 ˆβ 1 s ˆγ 1 = (ˆβ 1 ˆβ 1 s) Var(sexT V ) ˆγ 1 Var(sexT V ) = 0.29 = Interpretation Multiple linear regression model y i = β 0 + β 1 x 1,i + β 2 x 2,i + + β k x k,i + ε i As in bivariate regression, let s assume: MLR.1 (linear model) MLR.2 (independence) {(x 1,i, x 2,i, y i )} n is independent random sample MLR.3 (rank condition) no multicollinearity MLR.4 (exogeneity) E[ε x 1,i, x 2,i,..., x k,i ] = 0 Interpretation of coefficients β j is partial (marginal) effect of x j on y β j is partial (marginal) effect of E[y x 1,..., x k ] β j = y i x j,i β j = E[y x 1,..., x k ] x j 7

8 Other regressors are held constant β 1 is the ceteris paribus effect of x 1,i on y i y i = β 0 + β 1 x 1,i + β 2 x 2,i + + β k x k,i + ε i Changing more than one regressor simultaneously Sometimes it does not make sense to change x j without also changing x k Example: age-earnings profile log wage i = β 0 + β 1 age i + β 2 age 2 i + ε i Cannot age while holding age 2 constant Instead should report marginal effect or average marginal effect log wage i agei =a = β 1 + 2β 2 a age i log wage i age i = β 1 + 2β 2 age Changing more than one regressor simultaneously Example: Chandra et al. (2008) effect of exposure to sexual content on television on teen pregnancy preg = β 0 + β 1 sext V + β 2 totalt V + ε sext V = hours of television with sexual content totalt V = hour of television Should we care about β 1, β 2, or some combination? 8

9 9

10 4 OLS estimation OLS estimation (ˆβ 0,..., ˆβ k ) = arg min b 0,...,b k (y i b 0 b 1 x 1,i b k x k,i ) 2 10

11 First order conditions: ) (y i ˆβ 0 ˆβ 1 x 1,i ˆβ k x k,i or with ˆε i = (y i ˆβ 0 ˆβ 1 x 1,i ˆβ ) k x k,i. (y i ˆβ 0 ˆβ 1 x 1,i ˆβ ) k x k,i (y i ˆβ 0 ˆβ 1 x 1,i ˆβ k x k,i ), = 0 x 1,i = 0. =. x k,i = 0 ˆε i =0 ˆε i x j,i =0 for j = 1, 2,..., k We choose ˆβ 0,..., ˆβ k so that residuals and regressors are uncorrelated (orthogonal) First order conditions are a system of linear equations in ˆβ 0,..., ˆβ k The first order conditions are a system of linear equations in ˆβ 0,..., ˆβ k. It is possible to write down an explicit algebraic expression for the ˆβ j, but it would be quite complicated and not very useful. 1 5 Partitioned regression Although a completely explicit solution to the first order conditions is complicated, it is possible to write a recursive formula for ˆβ j. This approach is called partitioned regression or partialling out. It will be useful in proofs and for thinking about bias in multiple regression. Partitioned regression A useful representation of ˆβ j (e.g. j = 1) Regress x 1,i on other regressors where x 1,i is the OLS residual x 1,i = ˆγ 0 + ˆγ 2 x 2,i + + ˆγ k x k,i + x 1,i 1 If you have taken linear algebra and are comfortable working with matrices, then you can write the OLS coefficients in a not too complicated way. (If you are not familiar with matrices, then you can safely ignore this footnote). Let X be an n (k + 1) matrix where the first column are all ones and the ith, jth for entry for j > 1 is x j 1,i. Let Y be an n 1 vector consisting of the y i, and let ˆβ be (k + 1) 1. Then the first order conditions can be written as X (Y X ˆβ) = 0. Solving for ˆβ we have ˆβ = (X X) 1 (X Y ). 11

12 Then ˆβ 1 = x 1,iy i x2 1,i Note that 1 n x 1,i = 0, so ˆβ 1 = slope coefficient from regressing y on x 1 Partitioned regression gives a recursive formula for ˆβ 1 in that it involves the residuals from the regression of x 1,i on the other x s. This is itself a multiple regression. However, it is a multiple regression with one less independent variable. We could apply partitioned regression again to calculate the ˆγ s. At some point the first step of partitioned regression would just involve a single independent variable and we d be done. For example, with k = 3, we can say ˆβ 1 = x 1,iy i x2 1,i where x 1,i are the residuals from x 1,i = ˆγ 0 + ˆγ 2 x 2,i + ˆγ 3 x 3,i + x 1,i. Now, using partitioned regression again, we can say that ˆγ 2 = x 2,i x 1,i x 2 2,i where x 2,i are residuals from and x 2,i = ˆδ 0 + ˆδ 3 x 3,i + x 2,i ˆδ 3 = (x 3,i x 3 )y i (x 3,i x 3 ) 2. In practice, we rarely need to write down or work with all the steps back in the recursion, but it is important to understand that partitioned regression completely describes how to calculate the ˆβ j. To show that partitioned regression works, we need to show that it gives the same ˆβ 1 as the OLS first order conditions. Proof outline Substitute y i = ˆβ 0 + ˆβ 1 x 1,i + + ˆβ k x k,i + ˆε i into x 1,i y i x 2 1,i Use the following facts to simplify: 1. x 1,i = 0 2. x 1,ix j,i = 0 for j = 2,..., k 3. x 1,ix 1,i = x2 1,i 4. x 1,iˆε i = 0 See handout version of slides for details 12

13 Proof. To begin with let s assume the 4 facts are true. If we substitute y i = ˆβ 0 + ˆβ 1 x 1,i + + ˆβ k x k,i + ˆε i into the partitioned regression formula we have ) n x 1,i (ˆβ 0 + ˆβ 1 x 1,i + + ˆβ k x k,i + ˆε i x 1,iy i x2 1,i = x2 1,i distributing and rearranging =ˆβ x 1,i 0 + ˆβ x 1,ix 1,i 1 + ˆβ x 1,ix 2,i ˆβ x 1,ix k,i k + x2 1,i x2 1,i x2 1,i x2 1,i =ˆβ 1 using the 4 facts Now, we just need to show the 4 facts. The x 1,i are residuals from The first order conditions for this regression are This is exactly facts 1 and 2. Fact 3 comes from x 1,i = ˆγ 0 + ˆγ 2 x 2,i + + ˆγ k x k,i + x 1,i. x 1,i =0 x 1,i x 2,i =0. =. x 1,i x k,i =0. ) x 1,i x 1,i = x 1,i (ˆγ0 + ˆγ 2 x 2,i + + ˆγ k x k,i + x 1,i =ˆγ 0 x 1,i + ˆγ 2 x 1,i x 2,i + + ˆγ k using facts 1 and 2 = x 1,i 2. Finally, since ˆε i are residuals from regressing y on the x s, we have x 1,i x k,i + x 1,i x 1,i x 1,iˆε i x2 1,i 0 = ˆε i = ˆε i x 1,i = = ˆε i x k,i. 13

14 Therefore, ( ) x 1,iˆε i = x1,i ˆγ 0 ˆγ 2 x 2,i ˆγ k x k,i ˆεi = x 1,iˆε i ˆγ 0 ˆε i ˆγ 2 ˆε i x 2,i ˆγ k =0, ˆε i x k,i which shows fact 4. Partialling out ˆβ 1 = x 1,iy i x2 1,i 1. Regress x 1 against the other controls (regressors) and keep the residuals x 1, which is the part of x 1 that is uncorrelated with the other regressors 2. Regress y against x 1 ˆβ 1 measures the effect of x 1 after the effects of x 2,..., x k have been partialled out x 1,i is the part of x 1,i that is uncorrelated with the other regressors. This is another way of seeing that multiple regression gives the relationship between x 1 and y holding the other x s constant. The multiple regression slope coefficient, ˆβ 1 = x 1,iy i, x2 1,i looks at the covariance of y with the part of x 1,i that is uncorrelated with the other x s. In contrast, the short regression coefficient ˆβ s 1 = (x 1,i x 1 )y i (x 1,i x 1 ) 2, looks at the covariance of y with all of x 1,i, some of which may be related to the other x s. 5.1 Example: California test scores Example: California test scores and student teacher ratios 1 ## long r e g r e s s i o n 2 summary ( lm ( t e s t s c r ~ s t r + avginc + calw _ pct + meal_ pct + e l _ pct, data = ca ) ) 3 ## p a r t i a l l i n g out 4 t i l d e. s t r < r e s i d u a l s ( lm ( s t r ~ avginc + calw _ pct + meal_ pct + e l _ pct, 5 data = ca ) ) 6 mean( t i l d e. s t r ) ## should be 0 7 cov ( t i l d e. str, ca $ avginc ) ## should be 0 8 sum( t i l d e. s t r * ca $ s t r ) 14

15 9 sum( t i l d e. s t r ^2) ## should equal previous l i n e ## should equal long r e g r e s s i o n c o e f f i c i e n t 12 cov ( t i l d e. str, ca $ t e s t s c r ) / var ( t i l d e. s t r ) 13 sum( t i l d e. s t r * ca $ t e s t s c r ) / sum( t i l d e. s t r ^2) ## = long r e g r e s s i o n c o e f f i c i e n t, but wrong standard e r r o r 16 summary ( lm ( ca $ t e s t s c r ~ t i l d e. s t r ) ) 6 Example: Bronnenberg, Dhar, and Dubé (2009) Example: Bronnenberg, Dhar, and Dubé (2009) Looks at market shares of brands of consumer packaged goods (CPG) across markets and time CPG = beer, coffee, ketchup, etc. Market shares from AC Nielsen scanner data Results This type of data has been used very frequently in IO during the last decade AC Nielsen distributes bar code scanners to a sample of consumers, consumers record every purchase by scanning bar codes 4-week intervals, June 1992-May 1995 Market shares variable across geographic markets, but persistent over time within each market Geographic market shares strongly correlated with first mover advantage * e.g. Miller (founded in Milwaukee) most popular beer in Milwaukee, Budweiser (founded in St. Louis) most popular beer in St. Louis 15

16 16

17 share im = β 0 + β 1 branda }{{ im } +β 2 earlyentry }{{ im } +ε im =1 if good i is brand A =1 if good i entered m first How are results in the three columns for beer related? ˆβ s 1 = (x 1,i x 1 )y i (x 1,i x 1 ) 2 17

18 Substitute y i = ˆβ 0 + ˆβ 1 x 1,i + ˆβ 2 x 2,i + ˆε i ˆβ s 1 = (x 1,i x 1 )(ˆβ 0 + ˆβ 1 x 1,i + ˆβ 2 x 2,i + ˆε i ) (x 1,i x 1 ) 2 = ˆβ 0 =0 ({}}{ ) ( x 1,i x 1 +ˆβ n 1 (x ) 1,i x 1 )x 1,i (x 1,i x 1 ) ˆβ 2 ( (x 1,i x 1 )x 2,i ) (x 1,i x 1 ) =0FOC =0FOC {( }}{ ) ({}} ){ x 1,iˆε i x 1 ˆε 1,i (x 1,i x 1 ) 2 ˆβ s 1 =ˆβ 1 + ˆβ 2 Ĉov(x 1, x 2 ) Var(x 1 ) =ˆβ 1 + ˆβ 2ˆγ 2 1 where ˆγ 2 1 is coefficient on x 1 in a regression of x 2 on x 1 Short (ˆβ 1 s) equals long (ˆβ 1 ) plus the effect of omitted (ˆβ 2 ) times the regression of the omitted on the included (ˆγ 1 2 ) (Angrist and Pischke, 2009) Similarly, ˆβ s 2 = ˆβ 2 + ˆβ 1 ˆγ 1 2 }{{} = Ĉov(x 1,x 2 ) Var(x 1 ) This formula is nearly identical to omitted variables bias. Omitted variables bias is about the E[ˆβ s 1 ] and involves population regression coefficients. It is that E[ˆβs 2 X] = β 2 + β 1 Ĉov(x 1, x 2 ) Var(x 1 ). We just the similar fact that ˆβ s 2 = ˆβ 2 + ˆβ 1 ˆγ 1 2 }{{} = Ĉov(x 1,x 2 ) Var(x 1 ). The former follows from the later by taking expectations conditional on X and assuming our usual four regression assumptions so that E[ˆβ j X] = β j. 18

19 In this example, ˆβ s bud = 0.118, ˆβ s entry = 0.134, ˆβ bud = 0.020, and ˆβ entry = 0.117, so Ĉov(bud, entry) Var(entry) Ĉov(bud, entry) Var(bud) ˆβ entry s ˆβ entry = ˆβ bud = = = ˆβ bud s ˆβ bud ˆβ entry = = References References Angrist, J.D. and J.S. Pischke Mostly harmless econometrics: An empiricist s companion. Princeton University Press. Angrist, Joshua D and Jörn-Steffen Pischke Mastering Metrics: The Path from Cause to Effect. Princeton University Press. Baltagi, BH Econometrics. Springer, New York. URL serialssolutions.com/?sid=sersol&ss_jc=tc &title=econometrics. Bierens, Herman J Multivariate Linear Regression. URL LINREG3.PDF. Bronnenberg, B.J., S.K. Dhar, and J.P.H. Dubé Brand history, geography, and the persistence of brand shares. Journal of Political Economy 117 (1): URL / Chandra, A., S.C. Martino, R.L. Collins, M.N. Elliott, S.H. Berry, D.E. Kanouse, and A. Miu Does watching sex on television predict teen pregnancy? Findings from a national longitudinal survey of youth. Pediatrics 122 (5): URL 122/5/1047.short. Linton, Oliver B Probability, Statistics and Econometrics. Academic Press. URL title=probability%2c%20statistics%20and%20econometrics. Stock, J.H. and M.W. Watson Introduction to Econometrics, 2/E. Addison-Wesley. Wooldridge, J.M Introductory econometrics: A modern approach. South-Western. 19

Introductory Econometrics

Introductory Econometrics Based on the textbook by Wooldridge: : A Modern Approach Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 16, 2013 Outline Introduction Simple