The returns to schooling, ability bias, and regression

Size: px

Start display at page:

Download "The returns to schooling, ability bias, and regression"

Hugo Allison
6 years ago
Views:

1 The returns to schooling, ability bias, and regression Jörn-Steffen Pischke LSE October 4, 2016 Pischke (LSE) Griliches 1977 October 4, / 44

2 Counterfactual outcomes Scholing for individual i is described by a binary random variable, D i = {0, 1} say D i = 1 denotes completing high school, and D i = 0 denotes dropping out. The outcome of interest, log earnings is denoted by y i. Potential outcomes: What would have happened to someone who completes if they had dropped out and vice versa. Hence, for everybody there are two potential outcomes = { Y1i if D i = 1 Y 0i if D i = 0. Pischke (LSE) Griliches 1977 October 4, / 44

3 Potential and observed outcomes The effect of completing school for individual i is Y 1i Y 0i. The observed outcome, Y i, can be written in terms of potential outcomes as { Y1i if D Y i = i = 1 Y 0i if D i = 0 = Y 0i + (Y 1i Y 0i )D i. We only observe either Y 1i or Y 0i for a single individual. Pischke (LSE) Griliches 1977 October 4, / 44

4 The selection problem We can write the observed difference in earnings for the treated and untreated as the sum of two terms: E [Y i D i = 1] E [Y i D i = 0] }{{} Observed diff in earnings cond. on schooling = E [Y 1i D i = 1] E [Y 0i D i = 0] = E [Y 1i D i = 1] E [Y 0i D i = 1] }{{} average treatment effect on the treated +E [Y 0i D i = 1] E [Y 0i D i = 0] }{{} ability or selection bias Pischke (LSE) Griliches 1977 October 4, / 44

5 Selection on observables The key to regression is that conditional on observable variables, e.g. ability, counterfactual outcomes are mean independent of schooling: Then: E [Y 0i A i, D i ] = E [Y 0i A i ] (1) E [Y i A i, D i = 1] E [Y i A i, D i = 0] }{{} Observed difference in average log earnings = E [Y 1i A i, D i = 1] E [Y 0i A i, D i = 0] = E [Y 1i A i, D i = 1] E [Y 0i A i, D i = 1] }{{} if (1) holds get to switch D i = E [Y 1i Y 0i A i, D i = 1] }{{} average treatment effect on the treated in groups defined by A i Pischke (LSE) Griliches 1977 October 4, / 44

6 Selection on observables leads to matching We have just estimated group specific treatment effects defined by the variable A i : E [Y i A i = a, D i = 1] E [Y i A i = a, D i = 0] = E [Y 1i Y 0i A i = a, D i = 1] ρ a Do this for all values the variable A i takes on. To get from these group conditional averages to the overall average use the rules for conditional probability E [Y 1i Y 0i D i = 1] = E [Y 1i Y 0i A i = a, D i = 1] Pr (A i = a D i = 1) a = ρ a Pr (A i = a D i = 1) a Pischke (LSE) Griliches 1977 October 4, / 44

7 From matching to regression How to get from this to a regression is easiest to see in a constant effects setting. So let ρ a = ρ and write Using Y 0i = α + η i Y 1i = Y 0i + ρ Y i = Y 0i + (Y 1i Y 0i )D i = E (Y 0i ) + (Y 1i Y 0i ) D }{{}}{{} i + [Y 0i E (Y 0i )] }{{} α ρ η i = α + ρd i + η i Pischke (LSE) Griliches 1977 October 4, / 44

8 Still on the road to regression Using the CIA (1) Y i = α + ρd i + η i (2) E [Y 0i A i, D i ] = E [Y 0i A i ] E [η i A i, D i ] = E [η i A i ] we see that η i is a function of A i, which is related to D i. So (2) is not a regression yet. Approximate E [η i A i ] by a linear function Substitute in (2): E [η i A i ] = γa i η i = γa i + e i E [e i A i ] = 0 Y i = α + ρd i + γa i + e i which is a regression (i.e. E [e i A i, D i ] = 0). Pischke (LSE) Griliches 1977 October 4, / 44

9 Regression anatomy A key tool in understanding multivariate regression the regression anatomy formula (or Frisch-Waugh-Lowell theorem). Consider a generic bivariate regression: Then Y i = α + β 1 X 1i + β 2 X 2i + e i β 1 = Cov(Y i, X 1i ) Var( X 1i ) where X 1i is the residual from a regression of X 1i on X 2i : X 1i = π 0 + π 1 X 2i + X 1i. Pischke (LSE) Griliches 1977 October 4, / 44

10 The regression anatomy formula derived Substitute the expression for the long regression into Cov(Y i, X 1i ) Var( X 1i ) = Cov(α + β 1 X 1i + β 2 X 2i + e i, X 1i ) Var( X 1i ) =Var ( X 1i ) =0 =0 {}}{{}}{{}}{ = β 1Cov(X 1i, X 1i ) + β 2 Cov(X 2i, X 1i ) + Cov(e i, X 1i ) Var( X 1i ) = β 1 Var( X 1i ) Var( X 1i ) = β 1. In words: the coeffi cient from a bivariate regression of Y i on X 1i is numerically the same as the long regression coeffi cient on X 1i from a regression of Y i on X 1i and X 2i. Pischke (LSE) Griliches 1977 October 4, / 44

11 Alternative versions of regression anatomy Regression anatomy can also be written as β 1 = Cov(Y i, X 1i ) Var( X 1i ) = Cov(Ỹ i, X 1i ) Var( X 1i ) i.e. also partialling out X 2i from the dependent variable. It is not enough to partial out the covariate from the dependent variable alone Cov(Ỹ i, X 1i ) Var(X 1i ) = Cov(Ỹ i, X 1i ) Var( X 1i ) Var( X 1i ) Var(X 1i ) = β 1 (you can partial out X 2i from Y i but you must partial it out from X 1i ). Regression anatomy works for multiple covariates just as well. Pischke (LSE) Griliches 1977 October 4, / 44

12 Ability bias and the returns to schooling We would like to run the long regression Y i = α + ρs i + γa i + e i where Y i is log earnings, S i is schooling and A i is ability. If we don t have a measure of ability we can only run the short regression What do we get? Y i = α s + ρ s S i + e s i. Pischke (LSE) Griliches 1977 October 4, / 44

13 The omitted variables bias formula The relationship between the long and short regression coeffi cients is given by the omitted variables bias (OVB) formula ρ s = Cov(Y i, S i ) Var(S i ) = ρ + γδ AS where δ AS = Cov(A i,s i ) Var(S i ) is the regeression coeffi cient from a regression of A i (the omitted variable) on S i (the included variable). Exercise: Derive the OVB formula (works just like for regression anatonmy). The OVB formula is a mechanical relationship between two regressions: it holds regardless of the causal interpretation of any of the coeffcients. Pischke (LSE) Griliches 1977 October 4, / 44

14 Griliches (1977) regressions The conventional wisdom is Cov(A i, S i ) > 0, so returns to schooling estimates will be biased up. Short regression estimates using the NLS Long regression estimates Y i = const (0.003) S i + experience Y i = const (0.003) S i (0.0005) IQ i + experience The results are consistent with the conventional wisdom. Pischke (LSE) Griliches 1977 October 4, / 44

15 Classical measurement error Measurement error leads to bias. mismeasured. Many economic variables are Look at a generic example and start with a simple bivariate regression We don t observe X i where Y i = α + βx i + e i. but X i Xi = X i + m i Cov(Xi, m i ) = 0 Cov(e i, m i ) = 0. This is called classical measurement error. Pischke (LSE) Griliches 1977 October 4, / 44

16 Attenuation from classical measurement error The bivariate regression coeffi cient we estimate is β = Cov(Y i, X i ) Var(X i ) = Cov(α + βx i + e i, Xi + m i ) Var(Xi + m i ) = Var(Xi ) β Var(Xi ) + Var(m i ) = βλ. We see that β is biased towards zero by an attenuation factor λ = Var(X i ) Var(X i ) + Var(m i ) which is the variance in the signal divided by the variance in the signal plus noise. Pischke (LSE) Griliches 1977 October 4, / 44

17 How measurement error works Y X True data Mismeasured data Pischke (LSE) Griliches 1977 October 4, / 44

18 Measurement error in the returns to schooling Think of Y i as log earnings, and X i as schooling. experience for the moment. Ignore age or Ashenfelter and Krueger (1994) find λ = 0.9 for schooling. This means if the true return to schooling is 0.1, we would expect an estimate of Pischke (LSE) Griliches 1977 October 4, / 44

19 Measurement error with two regressors The bivariate regression is not particularly interesting, since we typically want to use regression to control for other factors. Consider Y i = α + β 1 X 1i + β 2 X 2i + e i. and only X 1i is subject to classical measurement error, i.e. Cov(X 1i, m i ) = Cov(X 2i, m i ) = 0. Starting from X 1i = X 1i + m i and using the classical measurement error assumptions we have X 1i = X 1i + m i. Pischke (LSE) Griliches 1977 October 4, / 44

20 Measurement error with two regressors The estimator of β 1 is now (using regression anatomy): ( ) β 1 = Cov Y i, X 1i ( ) Var X 1i ( ) = Cov α + β 1 X1i + β 2 X 2i + e i, X 1i + m i ( ) Var X 1i ( ) Var X 1i = β 1 ( ) = β 1 λ Var X 1i + Var (m i ) It is easy to see that λ λ whenever X 1i and X 2i are correlated. Fact Adding correlated regressors makes attenuation bias from measurement error worse. Pischke (LSE) Griliches 1977 October 4, / 44

21 Comparing the short and long regression The short regression (on just X 1i ) coeffi cient is β 1,short = λβ 1 + β 2 δ X2 X 1 = λ ( β 1 + β 2 δ X2 X 1 ) where the estimate of β 1 is biased both because of attenuation due to measurement error, and because of omitted variables bias (the part β 2 δ X2 X 1 where δ X2 X 1 is the coeffi cient from a regression of X 2i on X 1i ). The coeffi cient from the long regression is β 1,long = β 1 λ and since but λ < λ β 1,short β 1,long. Pischke (LSE) Griliches 1977 October 4, / 44

22 Comparing the short and long regression Notice that it is more diffi cult to compare the bias from the short regression and the long regression now. λ < λ implies that the attenuation bias goes up when another regressor is entered which is correlated with X 1i. There is less attenuation in the short regression but there is also OVB now. Not clear what the net effect is. Pischke (LSE) Griliches 1977 October 4, / 44

23 Measurement error in the control What about the coeffi cient β 2? Even when there is no measurement error in X 2i, the estimate of β 2 will be biased: β 2 = β 1 δ X2 X 1 Note that the bias will be larger the larger the measurement error the correlation between X 1i and X 2i The intuition is that ( ) 1 λ + β 2. β 1 is attenuated, and hence does not reflect the full effect of X 1i β 2 will capture part of the effect of X 1i through the correlation with X 2i Pischke (LSE) Griliches 1977 October 4, / 44

24 Measurement error in the returns to schooling controlling for ability We want to run the regression y i = α + ρs i + γa i + e i where Si is schooling and A i is ability. Suppose we only have a mismeasured version of schooling, S i (so S i takes on the role of X 1i before, and A i takes on the role of X 2i ). Then the short regression will give and the long regression ρ short = λρ + γδ AS ρ long = λρ If ability bias is upwards (δ AS > 0) it is not possible to say a priori which estimate will be closer to ρ. Pischke (LSE) Griliches 1977 October 4, / 44

25 Putting some numbers on the Griliches example Pick some numbers for the regression and set y i = 0.1S i λ = A i + e i σ S = 3, σ A = 15, σ AS = Then and δ AS = σ AS σ 2 S = = 2.5 ρ short = λρ + γδ AS = = Pischke (LSE) Griliches 1977 October 4, / 44

26 What about the long regression? We first need which is λ = Var ( S i ) Var ( S i ) + Var (m i ) = λ R2 AS 1 R 2 AS R 2 AS = λ = Then the long regression coeffi cient is ( ) 2 ( ) σas = = 0.25 σ S σ A = ρ long = λρ = = so the short regression coeffi cient is too large and the long regression coeffi cient is too small. Pischke (LSE) Griliches 1977 October 4, / 44

27 Measurement error in ability Now suppose years of schooling is measured perfectly but instead of Ai we only have mismeasured ability A i (so S i takes on the role of X 2i before, and A i takes on the role of X 1i ). Then ) ρ = γδ AS (1 λ + ρ. If ability bias is upwards (δ AS > 0) then the returns to schooling will be biased up but by less than in the short regression. Controlling for A i is better than controlling for nothing, but not as good as controlling for true ability A i. Pischke (LSE) Griliches 1977 October 4, / 44

28 Instrumental variables solve the measurement error problem Suppose you have an instrument Z i, correlated with the signal Xi uncorrelated with the error m i. In the bivariate regression you get and β IV = Cov(Y i, Z i ) Cov(X i, Z i ) = Cov(α + βx i + e i, Z i ) Cov(Xi = βcov(x i, Z i ) + m i, Z i ) Cov(Xi, Z i ) = β In the multivariate regression you get for similar reasons: β 1,IV = β 1 β 2,IV = β 2 (notice that only the mismeasured X 1i is instrumented) Pischke (LSE) Griliches 1977 October 4, / 44

29 More Griliches (1977) regressions Recall the Griliches long regression estimates y i = const (0.003) S i (0.0005) IQ i + experience Instrumenting IQ with results from the Knowledge of the World of Work test he gets y i = const (0.004) S i (0.0009) IQ i + experience This is still consistent with two stories: Pischke (LSE) Griliches 1977 October 4, / 44

30 More Griliches (1977) regressions Recall the Griliches long regression estimates y i = const (0.003) S i (0.0005) IQ i + experience Instrumenting IQ with results from the Knowledge of the World of Work test he gets y i = const (0.004) S i (0.0009) IQ i + experience This is still consistent with two stories: there is upward ability bias in the bivariate return to schooling and measurement error in IQ but not schooling (in this case the estimate of is correct) Pischke (LSE) Griliches 1977 October 4, / 44

31 More Griliches (1977) regressions Recall the Griliches long regression estimates y i = const (0.003) S i (0.0005) IQ i + experience Instrumenting IQ with results from the Knowledge of the World of Work test he gets y i = const (0.004) S i (0.0009) IQ i + experience This is still consistent with two stories: there is upward ability bias in the bivariate return to schooling and measurement error in IQ but not schooling (in this case the estimate of is correct) there is measurement error in both schooling and ability (in this case the true coeffi cient could be anything). Pischke (LSE) Griliches 1977 October 4, / 44

32 What to control for? In the quest for identifying causal effects, which variables belong on the right hand side of a regression equation? Yes: Variables determining the treatment and correlated with the outcome (e.g. ability). in general these variables will be fixed characteristics or pre-determined by the time of treatment (e.g. schooling) but remember the warnings on measurement error just discussed: the kitchen sink is no panacea Yes: Variables uncorrelated with the treatment but correlated with the outcome these variables may help reducing standard errors No: Variables which are outcomes of the treatment itself. bad controls. These are Pischke (LSE) Griliches 1977 October 4, / 44

33 Bad control Some researchers regressing earnings on schooling (and experience) include controls for occupation. Does this make sense? Clearly we can think of schooling affecting the access to higher level occupations, e.g. you need a Ph.D. to become a college professor. This gives rise to a two equation system Y i = α + ρs i + γo i + e i O i = λ 0 + λ 1 S i + u i You could think about these as a simultaneous equations system. Occupation O i is an endogenous variable. As a result, you could not necessarily estimate the first equation by OLS. Occupation is a bad control. Pischke (LSE) Griliches 1977 October 4, / 44

34 Bad control example 1 Pischke (LSE) Griliches 1977 October 4, / 44

35 Bad control example 2 Pischke (LSE) Griliches 1977 October 4, / 44

36 Proxy control Sometimes we control for a variable in the best of intentions. our regression of schooling on earnings Suppose Y i = α + ρs i + γa i + e i has a causal interpretation conditional on ability. Instead of ability we only have a test score taken at age 18, call it A li for late ability. The problem is that schooling will already have influenced the late ability (some students will have dropped out by 18). Suppose A li = π 0 + π 1 S i + π 2 A i i.e. that age 18 test scores are influenced by both schooling and true ability. Pischke (LSE) Griliches 1977 October 4, / 44

37 What do we get with proxy control? Substituting for A i in our regression above we get ( y i = α γ π ) ( 0 + ρ γ π ) 1 S i + γ A li + e i. π 2 π 2 π 2 If ρ > 0 γ > 0 π 1 > 0, π 2 > 0 then ( ρ γ π ) 1 < ρ π 2 and we will estimate a return to schooling that is too small. Pischke (LSE) Griliches 1977 October 4, / 44

38 Are our results robust? Suppose you have an identification strategy, say Y i = α + ρs i + γ 1 A i + e i. You have another variable, X i, which could potentially be an additional confounder. What do you do with it? Is ρ l = ρ? Y i = α l + ρ l S i + γ 1l A i + γ 2 X i + e l i Is δ = 0? X i = δ 0 + δs i + δ 1 A i + u i Pischke (LSE) Griliches 1977 October 4, / 44

39 A tale of two tests Y i = α + ρs i + γ 1 A i + e i Y i = α l + ρ l S i + γ 1l A i + γ 2 X i + e l i X i = δ 0 + δs i + δ 1 A i + u i The test ρ l = ρ is a coeffi cient comparison test. generalized Hausman test. The test δ = 0 is a balancing test. How are they related? Through the OVB formula: Formally a ρ ρ l = γ 2 δ. Under the maintained assumption γ 2 = 0, the two tests test the same null hypothesis δ = 0. Pischke (LSE) Griliches 1977 October 4, / 44

40 Measurement error in controls again Remember ( ) ρ l = γ 2 δ 1 λ + ρ γ 2 = γ 2 λ. Classical measurement error in X i biases ρ l towards ρ, and γ 2 towards 0. ρ l closer to ρ: power of the coeffi cient comparison test directly affected. Variation from measurement error ends up in ei l, raising standard errors and reducing power There is no bias in the estimate of δ Variation from measurement error ends up in u i, raising standard errors and reducing power Pischke (LSE) Griliches 1977 October 4, / 44

41 Theoretical Power Functions Homoskedastic, plain standard errors Rejection probability d Balancing test, θ=0 (baseline) Balancing test, θ=.7 Balancing test, θ=.85 CC test, θ=0 (baseline) CC test, θ=.7 CC test, θ=.85 Pischke (LSE) Griliches 1977 October 4, / 44

42 Simulated Rejection Rates Heteroskedasticity, robust standard errors Rejection probability d Balancing test, baseline Balancing test, θ=0, robust Balancing test, θ=.85, robust CC test, baseline CC test, θ=0, robust CC test, θ=.85, robust Pischke (LSE) Griliches 1977 October 4, / 44

43 Simulated Rejection Rates Mean reverting measurement error, robust standard errors Rejection probability d Balancing test, baseline CC test, baseline Balancing test, σ 2 µ=.75 CC test, σ 2 µ=.75 Balancing test, σ 2 µ=2.25 CC test, σ 2 µ=2.25 Pischke (LSE) Griliches 1977 October 4, / 44

44 Baseline Regressions for Returns to Schooling Mother's years of Library card Log hourly earnings education at age 14 (1) (2) (3) (6) (7) Years of education Mother's years of education Library card at age (0.0040) (0.0042) (0.0040) (0.0300) (0.0040) (0.0029) (0.0183) p-values Coefficient comparison test Balancing test Pischke (LSE) Griliches 1977 October 4, / 44

45 Returns to Schooling controlling for KWW score Mother's years of Library card Log hourly earnings education at age 14 (1) (2) (3) (6) (7) Years of education KWW score Mother's years of education Library card at age (0.0059) (0.0060) (0.0059) (0.0422) (0.0059) (0.0015) (0.0016) (0.0016) (0.0107) (0.0016) (0.0037) (0.0215) Body height in inches p-values Coefficient comparison test Balancing test Pischke (LSE) Griliches 1977 October 4, / 44

46 Returns to Schooling instrumenting the KWW score Mother's years of Library card Log hourly earnings education at age 14 (1) (2) (3) (6) (7) Years of education KWW score instrumented by IQ Mother's years of education Library card at age (0.0139) (0.0139) (0.0138) (0.0952) (0.0134) (0.0063) (0.0063) (0.0063) (0.0422) (0.0060) (0.0039) (0.0245) Body height in inches p-values Coefficient comparison test Balancing test Pischke (LSE) Griliches 1977 October 4, / 44

Poorly Measured Confounders are More Useful on the Left Than on the Right

Poorly Measured Confounders are More Useful on the Left Than on the Right Jörn-Steffen Pischke LSE Hannes Schwandt University of Zürich April 2016 Abstract Researchers frequently test identifying assumptions