CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M.

Size: px

Start display at page:

Download "CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M."

Constance Jacobs
6 years ago
Views:

1 CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. Linear Model with Additive Heterogeneity 3. Linear Model with Correlated Random Slopes 4. A Modeling Approach for Nonlinear Models 5. Estimating Average Partial Effects 6. Application to Probit/Fractional Response 1

2 1. Introduction In linear model with additive heterogeneity, unbalanced panels cause no serious issues provided certain assumptions about selection hold. FE is more robust than RE in the sense that with RE selection must be assumed uncorrelated with heterogeneity as well as with idiosyncratic shocks. FE allows arbitrary correlation between selection and c i. 2

3 For nonlinear models, RE easily adapts to unbalanced panels, but the selection assumptions are strong. Conditional MLE allows unbalanced panels when it applies (logit, Poisson). Treating the c i as parameters to estimate so called fixed effects does not require a balanced panel (but still has an incidental parameters problem with small T). 3

4 Drawback to correlated RE approach compared with CMLE or FE: Not clear how to allow for unbalanced panels when we want heterogeneity to be correlated with covariates and selection. Want to be able to handle nonlinear CRE models. Without specifically modeling the selection rule say, using a Heckman-type approach all methods assume selection is independent of shocks. We can test this assumption. 4

5 2. Linear Model with Additive Heterogeneity We still draw a random sample from the cross section, with units indexed by i. But now we may not observe a complete set of time series observations. Model this situation using a sequence of selection indicators, s i1,...,s it, where s it 1 if time period t for unit i can be used in estimation. Usually this means that we observe all elements of x it, y it. 5

6 We only use an i, t pair when a full set of data is observed, as happens when software is used for RE and FE on unbalanced panels. So we focus on complete-case estimators without imputation. We still write the population model for a random draw i as y it x it c i u it, t 1,...,T, where x it can generally include a fully set of time dummies, or other aggregate time variables. 6

7 Along with a rank condition, a sufficient (but not necessary) condition for consistency of FE on the unbalanced panel is E u it x i, c i, s i 0, t 1,...,T, where x i x i1, x i2,...,x it and s i s i1, s i2,...,s it. Allows selection in any time period to be correlated with x i, c i, but selection in all time periods must be unrelated to the idiosyncratic shocks. 7

8 The time-demeaned data now uses different time periods for different i. Let T 1 ÿ it y it T i r 1 T 1 ẍ it x it T i r 1 s ir y ir y it ȳ i s ir x ir x it x i T where T i r 1 s ir is random. 8

9 The FE estimator is then N N 1 i 1 T t 1 N N 1 i 1 1 s it ẍ it ẍ it T t 1 1 s it ẍ it ẍ it Consistency (fixed T, N ) follows if N N 1 i 1 T t 1 N N 1 i 1 s it ẍ it ÿ it T t 1 s it ẍ it u it rank T t 1 E s it ẍ it ẍ it K E s it ẍ it u it 0, t 1,...,T. 9

10 In the balanced case we know that if we estimate the equation y it x it x i v it by pooled OLS or RE we get the FE estimate for. Conveniently, the same result holds for unbalanced case, with a caveat: the time averages are defined only for the complete-case observations. 10

11 In other words, estimate the equation y it x it x i v it by pooled OLS using the s it 1 observations. The coefficient vector is identical to the fixed effects. 11

12 Unlike in the balanced case, any aggregate time variables, including time dummies, should be part of x it, and their time averages must be included in x i to get the FE estimates on the other variables. Wooldridge (2010, working paper) shows a more general result. Recall that the RE estimator can be obtained from a pooled OLS regression. On an unbalanced panel, it is where y it i ȳ i on x it i x i, 1 i x i, 1 i z i if s it 1, varies because T i varies. i 1 u 2 / u 2 T i c 2 1/2 12

13 Algebraic Fact: Let be the vector (K 1 of coefficients on x it i x i in the POLS regression above. Then FE, the fixed effects estimate on the unbalanced panel. Note that z i can contain any time-constant variables, including functions of T i, or interactions of the form T i x i, or allow a different set of coefficients on x i for each different T i. The coefficients on x i may changed widely across T i, yet the estimate on x it is still the FE estimate. 13

14 The algebraic equivalence still justifies the robust, regression-based Hausman test. Write a model with time-constant variables z i as where z i includes a constant. Use the Mundlak equation and estimate this by RE. y it x it z i c i u it, t 1,...,T, y it x it x i z i a i u it 14

15 The regression based Hausman test is just a (robust) Wald test of H 0 : 0 after RE estimation of the augmented equation. Remember that FE (and RE) assume that selection in any time period is not correlated with u it. Might worry that a shock today affects being in the sample in the future. Wooldridge (2010, MIT Press): Using FE on the unbalanced panel, estimate the equation y it x it s i,t 1 c i u it, t 1,...,T 1 and test H 0 : 0 using a robust test. 15

16 If we use RE we can (and should) test if c i is correlated with functions of T i. For example, define dummies d ir 1 T i r, r 1,...,T 1 and add these to the usual RE estimation. Significance of d i d i1,...,d i,t 1 casts serious doubt on the usual RE analysis. FE does not care. 16

17 We can jointly test the assumption that E c i x i1,...,x it, s i1,...,s it E c i by estimating the equation y it x it x i d i v it by RE using the unbalanced panel and jointly testing H 0 : 0, 0. 17

18 3. Linear Model with Correlated Random Slopes Now consider a model with all slopes heterogeneous: E y it x i, a i, b i a i x it b i, so, in the population, x it : t 1,...,T is strictly exogenous conditional on a i, b i. Write a i c i, b i d i and y it x it c i x it d i u it where E u it x i, a i, b i E u it x i, c i, d i 0 for all t. 18

19 Selection may be related to x i, a i, b i but not the idiosyncratic shocks: E y it x i, a i, b i, s i E y it x i, a i, b i or E u it x i, a i, b i, s i 0, t 1,...,T. With enough time periods, we could obtain â i, b i ( fixed effects ) and then average these. But need T i K 1. 19

20 Unfortunately, if selection is correlated with b i, there are no intuitive robustness results for the usual FE estimator. How might we use a CRE approach? 20

21 Multiply through by selection: s it y it s it s it x it s it c i s it x it d i s it u it Because we only use observations with s it 1, we handle the heterogeneity by conditioning on the entire history of selection and the values of the covariates if selected. That is, s it, s it x it : t 1,...,T. If s it 0 the observation is not used; if s it 1, the observation is used, and we observe x it. 21

22 If we condition on the larger information set x i1, s i1, x i2, s i2,..., x it, s it and heterogeneity depends only on the covariates, we would be left with an equation that is not estimable unless the covariates are always observed. 22

23 Write h i h it : t 1,...,T s it, s it x it : t 1,...,T. Then, extending Mundlak (1978) and Chamberlain (1982), we work with E s it y it h i s it s it x it s it E c i h i s it x it E d i h i and then make assumptions concerning E c i h i and E d i h i. Alternatively, we could eliminate c i using the within transformation, and focus just on E d i h i. 23

24 A simple approach is to model the expectations as exchangeable functions of h it : t 1,...,T extending the balanced case considered by Altonji and Matzkin (2005). From the model with constant slopes, natural to start with w i T i, x i. Fairly natural extension of Mundlak (1978). 24

25 A flexible specification with g ir 1 T i r : where T E c i T i, x i r 1 T r g ir r r 1 T T E d i T i, x i g ir r r r 1 r 1 g ir x i r r r P T i r E 1 T i r r E x i T i r g ir x i r I K r, 25

26 As a practical matter, the formulation above is identical to running separate regressions for each T i : y it on 1, x it, x i, x i r x it,fors it 1 where r N r 1 N i 1 1 T i r x i and N r is the number of observations with T i r. The coefficient on x it, r, is the APE given T i r. We can average these across r to obtain the overall APE. 26

27 Notice that we lose the T i 1 observations just as in a standard fixed effects estimation. We could use grouping based on T i so that the APEs include the entire population. 27

28 In any formulation, including the basic FE estimation, have a simple test for dynamic selection bias that is, for the null E y it x i, a i, b i, s i E y it x i, a i, b i. Assumes that our model for E c i, d i s it, s it x it : t 1,...,T is correctly specified. Then, no other functions of s it, s it x it : t 1,...,T should appear in E s i y it h i. 28

29 Assuming we have some T i with T i 3, add as extra regressors at time t the variables s i,t 1, s i,t 1 x i,t 1. We can compute a fully robust (to serial correlation and heteroskedasticity) Wald test of joint significance. Produces a joint test of ignorable selection and strictly exogenous covariates. 29

30 4. A Modeling Approach for Nonlinear Models Interested in the conditional distribution D y it x it, c i, or maybe just an expectation. Assume strictly exogenous covariates conditional on c i and ignorable selection: D y it x i, c i, s i D y it x it, c i, t 1,...,T. 30

31 Focus on methods that only exploit assumptions about the marginal distributions, D y it x it, c i, not joint distributions. Average partial effects are generally identified without restricting conditional dependence. Let g t y t x it, c i ; be a parametric density conditional on x it, c i. As with the linear model, specify models for D c i s it, s it x it : t 1,...,T. Let w i be a vector of known functions of s it, s it x it : t 1,...,T that act as sufficient statistics, so that D c i s it, s it x it : t 1,...,T D c i w i 31

32 Because D y it x it, c i, s it 1 D y it x it, c i and we have a density for the latter, we can obtain the density given x it, w i, s it 1 as f t y t x it, w i ;, R M g t y t x it, c; h c w i ; dc where h c w i ; is a parametric density for D c i w i and M is the dimension of c i. The partial log-likelihood function for the entire sample is N i 1 T t 1 s it log f t y it x it, w i ;,. Need a robust sandwich estimator for serial correlation (and maybe other misspecifications). 32

33 Same basic arguments carry through if we focus on estimating conditional means. We start with E y it x it, c i m t x it, c i and obtain E y it x it, w i by integrating m t x it, c i with respect to the density of c i given w i (which, again, is valid for the mean when s it 1). Or, we just assert a model for E y it x it, w i. Can then use a host of quasi-log likelihoods on the selected sample, including the Bernoulli for fractional responses, the gamma for nonnegative (continuous) responses, and the Poisson for nonnegative (count) responses. 33

34 5. Estimating Average Partial Effects In most nonlinear models, the parameters appearing in f t y t x t, c; provide only part of the story for the effect of x t on y t. Using pooled methods we often cannot fully identify or the distribution of c i. Even in the unbalanced case we can use the Blundell and Powell (2003) approach provided a vector w i suitably proxies for correlation between c i and s it, s it x it : t 1,...,T. 34

35 Let q t x t, w; denote the mean associated with f t y t x t, w;. Then ASF x t E wi q t x t, w i ;, that is, we can obtain the ASF by averaging out the observed vector of sufficient statistics, w i, from E y it x t, w i, s it 1 rather than averaging out c i from E y it x t, c i. Consistent estimation is then easy: N ASF x t N 1 i 1 q t x t, w i ; Panel bootstrap is still justified with unbalanced panel. 35

36 Tricky to estimate an overall average partial effect. With a balanced panel, it is natural to average APE tj x it, w i across the distribution of x it, w i, and then possibly across t, too. But if selection s it depends on x it, averaging across the selected sample does not consistently estimate E xit,w i APE tj x it, w i. Presumably we still have an idea of useful values to plug in for x t. 36

37 6. Application to Probit/Fractional Response The probit model, either for true binary responses or fractional responses, is natural for applying the previous approach. The model is P y it 1 x i, c i P y it 1 x it, c i x it c i, t 1,...,T where x it can include time dummies or other aggregate time variables. Once we specify P y it 1 x i, c i and assume that selection is conditionally ignorable for all t, that is, P y it 1 x i, c i, s i P y it 1 x i, c i, all that is left is to specify a model for D c i w i for suitably chosen functions w i of s it, s it x it : t 1,...,T. 37

38 A specification linear in x i but with intercept and slopes different for each T i is T E c i w i r 1 T r 1 T i r r 1 1 T i r x i r At a minimum, should let the variance of c i change with T i : Var c i w i exp T 1 r 1 1 T i r r 38

39 If we also maintain that D c i w i is normal, then we obtain the following response probability for s it 1 (with normalization that does not affect estimation of APEs): P y it 1 x it, w i, s it 1 x T it r 1 T r g ir r 1 g ir x i r 1/2 g ir r T exp r 2 so that the denominator is unity when all r are zero. (Recall g ir 1 T i r.) No difficulty in adding g ir x i for r 1,...,T to the variance function. 39

40 Computational bonus: Estimable by so-called heteroskedastic probit software, where the explanatory variables at time t are 1, x it, g i1,...,g it, g i1 x i,...,g it x i and the explanatory variables in the variance are simply the dummy variables g i2,...,g it, or also add g i1 x i,...,g it x i. 40

41 The APEs are easy to obtain from the estimated ASF: N ASF x t N 1 i 1 x T t r 1 T rg ir r 1 T exp r 2 g ir r g ir x i r 1/2 where the coefficients with ^ are from the pooled heteroskedastic probit estimation. The functions of T i, x i are averaged out, leaving the result a function of x t. If, say, x tj is continuous, its APE is estimated as APE tj x t j N N 1 i 1 T x t r 1 T rg ir r 1 T exp r 2 g ir r g ir x i r 1/2. 41

42 The above procedure applies, without change, if y it is a fractional response; that is, 0 y it 1. The original model is E y it x it, c i x it c i and APEs are on the mean response. Could allow to change with each T i (losing T i 1). Then, just estimate an APE for each T i 2 and average the results. Just a separate probit with explanatory variables 1, x it, x i. Or, allow heteroskedasticity as a function of x i, with ASF T ASF x t N 1 r 2 N i 1 g ir x t r r x i r exp x i r/2. 42

43 Same ideas apply to ordered probit and even multinomial logit, as well as Tobit and count data models. With count data, Poisson FE estimator has very nice robustness properties, including for sample selection E y it x i, c i, s i E y it x it, c i exp x it c i is sufficient but only for the scalar, additive heterogeneity case. 43

44 Scope for CRE approach in more complicated models, such as E y it x i, c i exp a i x it b i. CMLE approach makes no sense with all coefficients heterogeneous (and it is not clear a CMLE exists, anyway). FE approach can be implemented with lots of large T i T i K 1, but how well? CRE approach is available but with restrictions on D c i s it, s it x it : t 1,...,T. 44

45 EXAMPLE: Effects of Spending on School Test Pass Rates. use meap94_98. tab year 1992 school yr Freq. Percent Cum , , , , , Total 7, egen tobs sum(1), by(schid). tab tobs number of time periods Freq. Percent Cum , , , Total 7, gen tobs3 tobs 3. gen tobs4 tobs 4 45

46 . egen lavgrexpb mean(lavgrexp), by(schid). egen lunchb mean(lunch), by(schid). egen lenrolb mean(lenrol), by(schid). egen y95b mean(y95), by(schid). egen y96b mean(y96), by(schid). egen y97b mean(y97), by(schid). egen y98b mean(y98), by(schid). gen tobs4 tobs 4. gen tobs3 tobs 3. gen tobs3_lavgrexp tobs3*lavgrexp. gen tobs4_lavgrexp tobs4*lavgrexp 46

47 . xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98, fe cluster(schid) Fixed-effects (within) regression Number of obs 7150 Group variable: schid Number of groups 1683 R-sq: within Obs per group: min 3 between avg 4.2 overall max 5 F(7,1682) corr(u_i, Xb) Prob F (Std. Err. adjusted for 1683 clusters in schid) Robust math4 Coef. Std. Err. t P t [95% Conf. Interval] lavgrexp lunch lenrol y y y y _cons sigma_u sigma_e rho (fraction of variance due to u_i)

48 . xtreg math4 lavgrexp tobs3_lavgrexp tobs4_lavgrexp lunch lenrol y95 y96 y97 y98, fe cluster(schid) Fixed-effects (within) regression Number of obs 7150 Group variable: schid Number of groups 1683 R-sq: within Obs per group: min 3 between avg 4.2 overall max 5 F(9,1682) corr(u_i, Xb) Prob F (Std. Err. adjusted for 1683 clusters in schid) Robust math4 Coef. Std. Err. t P t [95% Conf. Interval] lavgrexp tobs3_lavgrexp tobs4_lavgrexp lunch lenrol y y y y _cons sigma_u sigma_e rho (fraction of variance due to u_i)

49 . test tobs3_lavgrexp tobs4_lavgrexp ( 1) tobs3_lavgrexp 0 ( 2) tobs4_lavgrexp 0 F( 2, 1682) 2.67 Prob F

50 . * Now fractional response.. replace math4 math4/100 (7150 real changes made). replace lunch lunch/100 (7146 real changes made).. capture program drop frac_het.. program frac_het 1. version args llf xb zg 3. quietly replace llf $ML_y1*log(normal( xb *exp(- zg ))) /// (1 - $ML_y1)*log(1 - normal( xb *exp(- zg ))) 4. end.. ml model lf frac_het (math4 lavgrexp lunch lenrol y95 y96 y97 y98 lavgrexpb /// lunchb lenrolb y95b y96b y97b y98b tobs3 tobs4) (tobs3 tobs4, nocons), /// vce(cluster distid). ml max 50

51 Number of obs 7150 Wald chi2(16) Log pseudolikelihood Prob chi (Std. Err. adjusted for 467 clusters in distid) Robust math4 Coef. Std. Err. z P z [95% Conf. Interval] eq1 lavgrexp lunch lenrol y y y y lavgrexpb lunchb lenrolb y95b y96b y97b y98b tobs tobs _cons eq2 tobs tobs

52 . glm math4 lavgrexp lunch lenrol y95 y96 y97 y98 lavgrexpb lunchb lenrolb y95b y96b y97b y98b tobs3 tobs4, fam(bin) link(probit) cluster(schid) note: math4 has noninteger values Generalized linear models No. of obs 7150 Optimization : ML Residual df 7133 (Std. Err. adjusted for 1683 clusters in schid) Robust math4 Coef. Std. Err. z P z [95% Conf. Interval] lavgrexp lunch lenrol y y y y lavgrexpb lunchb lenrolb y95b y96b y97b y98b tobs tobs _cons

53 . margins, dydx(lavgrexp) Average marginal effects Number of obs 7150 Model VCE : Robust Expression : Predicted mean math4, predict() dy/dx w.r.t. : lavgrexp Delta-method dy/dx Std. Err. z P z [95% Conf. Interval] lavgrexp

Jeffrey M. Wooldridge Michigan State University

Fractional Response Models with Endogenous Explanatory Variables and Heterogeneity Jeffrey M. Wooldridge Michigan State University 1. Introduction 2. Fractional Probit with Heteroskedasticity 3. Fractional