L8. LINEAR PANEL DATA MODELS UNDER STRICT AND WEAK EXOGENEITY. 1. A Structural Linear Panel Model

Size: px

Start display at page:

Download "L8. LINEAR PANEL DATA MODELS UNDER STRICT AND WEAK EXOGENEITY. 1. A Structural Linear Panel Model"

Willis McDowell
5 years ago
Views:

1 Victor Chernozhukov and Ivan Fernandez-Val Econometrics. Spring Massachusetts Institute o Technology: MIT OpenCourseWare, License: Creative Commons BY-NC-SA L8. LINEAR PANEL DATA MODELS UNDER STRICT AND WEAK EXOGENEITY VICTOR CHERNOZHUKOV AND IV AN FERN ANDEZ-VAL Abstract. We discuss basic examples o linear panel data models and their estimation via the ixed eects, dierencing, and correlated random eects approaches. 1. A Structural Linear Panel Model 1.1. The Setting. Here we consider the linear structural equations model (SEM) Y it = a itα + W i + D itβ + E it =: a i + Xitγ + E it, E it (X it, a i ), (1.1) where i = 1,..., n and t = 1,..., T. Here Y it is the outcome or an observational unit i at time t, D it is a vector o variables o interest or treatments, whose predictive eect α we would like to estimate, W it is a vector o covariates or controls including a constant, X it simply stacks together D it and W it, and E it is an error term normalized to have zero mean or each unit. We shall assume that the vectors Z i := {(Y it, Xit) } T t=1, that collect all the variables or the observational unit i, are i.i.d. across i. We note that this assumption does allow or arbitrary dependence o data or unit i across t, subject to other conditions speciied below. In our analysis the temporal dimension T will be small and the cross-sectional dimension n will be large. Accordingly, we shall derive ormal asymptotic results under the large n, ixed T asymptotics, were n and T is ixed. This type o scenario is oten called the short panel. The orthogonality condition stated will be strengthened below to various assumptions, which permit application o common estimation methods or perorming inerence on the target parameter α. The random variable a i is the unobserved individual eect. It can be correlated to X it, and so we can not omit it without introducing omitted variable bias that leads to 1

2 2 VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL inconsistent estimates o the parameter o interest α. We can give context to this point by thinking o the case where a i is the unobserved individual s innate ability, Y it is wage, D it is education, and W it are other characteristics o a person i at time t. Clearly omission o a i rom the model would lead to an omitted variable bias and inconsistent estimation o the target parameter α or the usual reasons that we discussed in L2. Figure 1 illustrates the omitted variable bias problem in the linear panel model. An important point to make here is the ollowing. Suppose D it is randomly assigned conditional on a i and W it, then α estimates a causal parameter the average treatment eect. This is merely one o many suicient conditions or causal interpretability o α. An example o another condition is the assumption o parallel trends underlying the dierence-in-dierence approach, as described below. The individual eect a i is not observed and can not be identiied or consistently estimated rom short panels. This act is sometimes reerred to as the incidental parameter problem, a term that was coined by Neyman and Scott [4]. However, the act that we have panel data allows or the remarkable possibility to accommodate unobserved individual eects a i in the analysis explicitly. In act, below we design identiication and estimation strategies or the target parameter α that bypass the identiication and consistent estimation o a i. Beore we continue, it is worth pointing out that W it could contain a T -dimensional vector o indicators or time periods as a subvector, namely the vector Q t = (0, 0,..., 1,..., 0) with 1 in the t-th position. In this case the model is said to include time eects The Dierence-in-Dierence Method. A very important special case o the problem that we consider is the dierence-in-dierence approach to identiication. Here where Y it = a i + αd it + W it β + E it, (1.2) D it is the indicator o unit i receiving the treatment at time t; W it are various other controls, or example, time dummies in the simplest case;

3 L8 3 Y_it within pooled X_it Figure 1. The igure illustrated the inconsistency o the pooled least square estimator, which omits the individual eects. The igure plots the data rom the linear panel data model in which a i is related to X it. The pooled estimator, which treats a i as part o the error, estimates the projection o Y it on X it, but this is dierent then estimating target structural unction which characterizes the projection o Y it on a i and X it. As a result the pooled least squares is inconsistent or the target unction, as illustrated in the igure. The igure also demonstrates the within or ixed eect estimator, which does consistently estimate the target unction. and α has the interpretation o the treatment eect. Suppose there are two groups: the treated and control. The units in the treated group receive the treatment at time t 0, and the units in the control group do not receive the treatment at any time. Then, or the units

4 4 VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL in the treated group E[Y is Treated] = E[a i + W is β + E is Treated], s < t 0, E[Y it Treated] = α + E[a i + W it β + E it Treated], t t 0. It is tempting to take the dierence: E[Y it Treated] E[Y is Treated] = α + E[(W it W is ) β + E it E is Treated], Change(s,t) in an eort to identiy the treatment eect α, but the trend term Change(s, t) does not necessarily vanish. For the units in the control group: I we take the dierence, E[Y is Control] = E[a i + W is β + E is Control], s < t 0, E[Y it Control] = E[a i + W it β + E it Control], t t 0. E[Y it Control] E[Y is Control] = E[(W it W is ) β + E it E is Control]. Under the assumption o the parallel trends: Change(s, t) = Change (s, t), Change ' (s,t) which says that treatment and control groups experienced parallel trends apart rom the treatment eect, we can take dierence-o-the-dierence to identiy the treatment eect: E[Y it Y is Treated] E[Y it Y is Control] = α. Thus under the assumption o parallel trends, we can identiy the treatment eect α. The structural linear panel data ormulation allows us to incorporate the dierence-indierence approach as a special case. Under the assumption o parallel trends and other conditions speciied below, we shall be able to identiy and estimate α. Note that the assumption o parallel trends is more plausible when the treatment D it is randomly assigned, but it also allows or non-random assignment even conditional on covariates. 1 For example, i Y it is income and D it is participation in a training program, it may be that D it = 1 is assigned to those who had low income Y st in the pre-treatment period s < t 0. In this case, we are still able to identiy α under the assumption o the parallel trends or high and low-income groups in the absence o treatment. The plausibility o the assumption really depends on the context. It could be tested by seeing i group-speciic trends enter the panel regression model as statistically insigniicant. 1 O course, this is not the irst time this happens in this course. Recall Wright s instrumental variables model.

5 L Basic Identiication and Estimation Strategies Under Strict Exogeneity There are our methods or identiication and estimation that we shall explore, which vary in terms o strength o assumptions needed or identiication o α: 1. within-estimation or ixed eect estimation, 2. irst-dierencing, 3. correlated random eects, 4. pooled estimation or random eects. The names listed above are conventional names given to the procedures, though some names are potentially misleading, as is usually the case Key approach I: within-groups or ixed eects approach. This approach eliminates the individual eects a i by individual demeaning. Deine the operator D acting on doubly indexed random variables V it as: 1 T DV it = V it V it. T t=1 Apply this operator to the linear SEM (1.1) to obtain: DY it = (DX it ) γ + DE it. This has eliminated the individual eects. It is now very tempting to identiy the parameter γ as a projection coeicient and estimate it by least squares, but our assumptions are not suicient or this purpose. In order to proceed in this way we need to have the orthogonality condition: T 1 T E[DX it DE it ] = 0. (2.1) T t=1 This condition states that the demeaned error terms are uncorrelated with the demeaned covariates, ater averaging over t. With this assumption γ is equal to the projection coeicient o DY it on DX it, where we aggregate across t = 1,..., T. The condition (2.1) is implied by the so-called strict exogeneity condition: T s=1, E it (X, a i ), X i := (X is ) (2.2) i which states that the structural errors E it are orthogonal to all lags and leads o X it conditional on a i. This is potentially a restrictive condition, and does not hold with lagged dependent variables appearing as X it. Note that it is hard to think o situations where (2.2) does not hold but (2.1) does or the causal parameter γ, so eectively we are imposing (2.2) when interpreting results as having causal meaning.

6 6 VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL The least squares estimation using the demeaned equation is called within-groups or ixed eects estimator. It can be seen as an exactly identiied GMM estimator with the score unction T g(z i, γ) = 1 (DY it DX it γ)dx it. (2.3) T t=1 Because o the exact identiication a simpliied variance ormula applies. It turns out that this approach is numerically equivalent to the the so-called least squares dummy variable (LSDV) estimator that applies OLS to the model: Y it = Q iπ + Ditα + W it β + u it, where Q i is a n-dimensional vector o indicators or observational units with a 1 in the i-th position and 0 s otherwise, namely Q i = (0, 0,..., 1,..., 0). The elements o this vector are called ixed eects. Note that here the GMM ormulation takes care o the clustering problem temporal dependence o data on observational unit i across time by aggregating the data on unit i into one score. We can also use the panel bootstrap which would be to simply bootstrap the independent observational units i or the scores in (2.3). The strict exogeneity condition (2.2) contains over identiying restrictions, and we can set up a GMM estimator with the score unction: g (Z i, γ) = {(DY it DX it γ)x i } T t=1, X i = (X it ) T t=1. (2.4) Using this ormulation we can obtain an eicient estimator or the causal parameter γ. Moreover, we can use the J-test to check the statistical validity o (2.2). Typically the approach outlined in this paragraph is ignored in empirical work, but this is not a good practice. Note that here too we are taking care o the clustering problem by aggregating the data on unit i into one score. We can construct an alternative GMM estimator based on a subset o linear combinations o the moment conditions implied by strict exogeneity using the score unction: ğ(z i, γ) = {(DY it DX it γ)dx it } T t=1. (2.5) This estimator might have better inite sample properties than the estimator based on (2.4) as it is based on a smaller number o inormative moment conditions, T dim X it instead o T 2 dim X it. The J-test in this case can be interpreted as a test o time homogeneity o the parameter γ in the FE approach Key approach II: irst dierencing. This approach eliminates the individual eects a i by taking dierences across time. Speciically, deine the dierencing operator Δ acting

7 L8 7 on doubly indexed random variables V it by creating the dierence ΔV it = V it V it 1. Apply this operator to both sides o (1.1) to obtain: ΔY it = ΔX it γ + ΔE it, (2.6) since Δa i = 0. Thus dierencing eliminates the individual eect. It is tempting to apply least squares to this equation to estimate γ, but the assumptions made so ar do not guarantee that 1 T EΔE it ΔX it = 0, (2.7) T 1 t=2 which is necessary or γ to be a projection coeicient. So the researchers impose this condition implicitly when perorming the irst dierent estimation. This condition states that the innovation in the error terms is uncorrelated with the innovation in the covariates, ater averaging over t. With this assumption γ is equal to the projection coeicient o ΔY it on ΔX it, where we aggregate across t = 2,..., T. The assumption (2.7) is implied the strict exogeneity assumption (2.2) and it is hard to think o situations where strict exogeneity does not hold but (2.7) does or the causal parameter γ. The assumption can serve as a basis o completely standard least squares estimation method with the i.i.d. data {Z i } i=1 n. We can ormulate that as an exactly identiied GMM problem with the score unction: g(z i, γ) = 1 T T 1 t=2 (Δy it ΔX γ)δx it, (2.8) it so that GMM theory applies here, since {Z i } n i=1 are i.i.d. Note that the GMM ormulation automatically takes care o the clustering problem namely the act that data or unit i could be dependent across t by simply aggregating the scores within the cluster i. Note that we also can apply bootstrap or inerence by simply bootstrapping observation units, or, equivalently bootstrapping the scores in (2.8). Note however that the strict exogeneity (2.2) contains a lot o over-identiying inormation and could serve as a basis or a GMM method with the score unction g (Z i, γ) = {(Δy it ΔX it γ)x i } T t=2. (2.9) This estimator will be more eicient than the previous one, and we can also use the J- statistic to test the validity o (2.7) (as well as validity o (2.2)). Ordinarily J-testing is not done in empirical work (which is bad practice), and one wonders i the results o many well-known studies will pass such a test. Again, the GMM theory applies here. Note that this ormulation also automatically takes care o the clustering problem.

8 8 VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL As in the FE approach, we can construct an alternative GMM estimator based on a subset o linear combinations o the moment conditions implied by strict exogeneity using the score unction: T ğ(z i, γ) = {(ΔY it ΔX it γ)δx it } t=2. (2.10) This estimator might have better inite sample properties than the estimator based on (2.9) as it is based on a smaller number o inormative moment conditions, (T 1) dim X it instead o (T 1)T dim X it. The J-test in this case can be interpreted as a test o time homogeneity o all the model parameters in the FD approach Other Approaches: Correlated Random Eects. In this approach, instead o trying to eliminate the individual eect a i, we model its dependence on the covariates: Then 1 T a i = X i λ + v i, v i X i, X i = X it. T t=1 Y = X i λ + X it itγ + u it, u it = v i + E it. Under the strict exogeneity assumption (2.2), the parameters γ and λ can be identiied as projection coeicients, and we can estimate the model by least squares. This corresponds to the GMM approach with the score unction: T g(z i, λ, γ) = 1 (Y it X i λ X it γ)x it, X it = (X i, X it ). T t=1 It turns out that this approach is numerically equivalent to the within-groups estimator in the linear model. Still it is conceptually useul to have this approach discussed here. As in the previous cases, there is a lot o over-identiying inormation and we can use the score unction: g (Z i, λ, γ) = {(Y it X i λ X it γ)x i } T t=1 to set up the eicient GMM estimator and use the J-test to examine the validity o the model Other Approaches: Pooled or Random Eects Approach. This approach is by ar the most restrictive. You still need to know about it, since people use this terminology quite oten and you need to understand what they mean by it. In this approach we simply think that a i is uncorrelated with X it and treat it as a part o the error term: Y = X it itγ + v it, v it X it, v it = a i + E it. This means we can identiy α as a projection coeicient and estimate it by least squares o Y it on X it. The error terms v it are correlated across t, which is a orm o clustering. We

9 L8 9 can easily take care o this dependence by using the score unction: T g(z i, γ) = 1 (Y it X it γ)x it. (2.11) T t= Which approach to use? The pooled estimator is the worst in terms o strength o assumptions. The minimal assumption (2.1) o the ixed eects approach and is neither stronger nor weaker than the minimal assumption (2.7) o the irst dierencing approach. It is always a good idea to test the underlying assumptions. None o the assumptions made are suitable to the case where the strict exogeneity does not hold. Strict exogeneity requires the leads and lags o X it to be uncorrelated with E it. This assumption is violated or example when X it contains lagged dependent variables. The irst dierence approach makes the slightly milder assumption o uncorrelated dierences, but it is still unsuitable, or example, when lags o Y it appear as X it. Example 1 (Lagged Dependent Variable). The last point requires elaboration. I the model is Y it = a i + β Y i(t 1) +E it, where E it has zero mean conditional on a i and is independent across t. Then, strict exogeneity clearly ails because E it is not orthogonal to X i(t+1) = Y it conditional on a i : X it 2 E[Y it E it a i ] = E[E it a i ] = 0. Also the uncorrelated dierences assumption ails because ΔE it is not orthogonal to ΔX it : EΔX it ΔE it = E(Y i(t 1) Y i(t 2) )(E it E i(t 1) ) = EE 2 = 0. i(t 1) 3. Getting More Sophisticated: Identiication and Estimation under Weak Exogeneity Once you understand how things work or the key approaches and how they easily it in the GMM ramework, you can easily modiy them to best suit your empirical situations. Here is one example o how to do this, and it comes as a pleasant bonus or mastering the preceding section as well as the GMM material. In this example, we will ocus on the dierencing approach, but we could also consider de-meaning where one uses uture values o the dependent variable to de-mean it and uses the past values o the regressors to demean them.

10 10 VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL 3.1. A Model with Pre-Determined Regressors. Consider the linear SEM (1.1) with predetermined regressors, namely E it (X i t, a i ), X t := (X it, X i(t 1),..., X i1 ). i This is a weak exogeneity assumption. Note that this ormulation can accommodate lagged dependent variables as part o X it i E it is not serially correlated. We can adapt the irst dierences approach to this situation. Note that ΔE it X t 1. i This means we can identiy γ rom the moment equations: t 1 E(ΔY it ΔX it γ)x i = 0, t = 2,..., T. (3.1) The estimation and inerence can be done using GMM with the score unction g(z i, γ) = {(ΔY it ΔX it γ)x t 1 i } T t=2. (3.2) This estimation method subsumes as a special case the Arellano-Bond [3] s estimation approach or models with a lagged dependent variable. As in the strictly exogenous case, we can construct an alternative GMM estimator based on a subset o linear combinations o the moment conditions implied by weak exogeneity using the score unction: ğ(z i, γ) = {(ΔY it ΔX it γ)x i(t 1) } T t=2. (3.3) This estimator might have better inite sample properties than the estimator based on (3.2) as it is based on a smaller number o inormative moment conditions. The J-test in this case can be interpreted as a test o time homogeneity o all the model parameters. We can urther reduce the number conditions by averaging over t resulting on the score unction 1 T g (Z i, γ) = (ΔY it ΔX it γ)x i(t 1), T 1 t=2 which just-identities the parameter γ. This method subsumes as a special case the Anderson- Hsiao [2] estimator or models with a lagged dependent variable A Model with Pre-Determined Instruments. As a urther generalization we consider the linear structural equations model (SEM) Y it = a i + D it α + W it β + E it =: a i + X it γ + E it, where i = 1,..., n and t = 1,..., T, and where we have pre-determined instruments: t E it (M i, a i ), Mi t := (M it, M i(t 1),...). (3.4) where M it are technical instruments that contain W it, but don t contain D it. For example, we can think about modeling aggregate demand equations across dierent markets i and

11 L8 11 across dierent time periods, where D it is the price. The instruments M it could consist o various supply shiters as well as controls W it. Then (3.4) implies ΔE it Mi t 1. (3.5) This means we can identiy γ rom the moment equations: t 1 E(ΔY it ΔX it γ)m i = 0. (3.6) The estimation and inerence can be done using GMM with the score unction g(z i, γ) = {(ΔY it ΔX it γ)m t 1 i } T t=2. (3.7) As beore the GMM ormulation takes care o clustering by simply deining the scores appropriately. 4. The Eects o Expending on Academic Perormance We illustrate the methods o Section 2 with an empirical application on the eect o school resources per student on test pass rates ollowing Papke (2005) [5]. We use data rom annual Michigan School Reports (MSR) on 550 elementary schools in Michigan or the period rom 1992 through This period covers 1994, when Michigan carried out a school inance reorm to equalize spending across K-12 schools. The purpose o this reorm is to oer equal educational opportunities and to improve student perormance. The outcome variable Y it is the percent o students passing a ourth-grade math test rom the Michigan Educational Assessment Program in school i at year t. The explanatory variables X it include the logarithm o per student spending, logarithm o per student spending in the previous year, the raction o the students receiving ree lunch, and the logarithm o school enrollment. Table 1 gives descriptive statistics or the variables used in the analysis. Table 1. Descriptive Statistics Mean SD Math Pass Rate Expenditure 5,496 1,151 Lunch Enrollment 3,044 8,153 We estimate the model (1.1) using several approaches: (1) Pooled: pooled approach that assumes that X it is orthogonal to the school unobserved eect a i. (2) FD: irst dierencing approach based on the moment conditions (2.7).

12 12 VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL (3) GMM-FD1: irst dierencing approach based on two-step GMM with the score unction (2.10). The irst-step uses the FD estimator to construct the optimal weighting matrix. (4) GMM-FD2: irst dierencing approach based on two-step GMM with the score unction (2.9), which exploits all the restrictions implied by strict exogeneity. The irst-step uses the FD estimator to construct the optimal weighting matrix. (5) FE: ixed eects approach based on the moment conditions (2.1). (6) GMM-FE1: ixed eects approach based on two-step GMM with score unction (2.5). The irst-step uses the FE estimator to construct the optimal weighting matrix. (7) GMM-FE2: ixed eects approach based on two-step GMM with score unction (2.4), which exploits all the restrictions implied by strict exogeneity. The irst-step uses the FE estimator to construct the optimal weighting matrix. For each approach we compute estimates, analytical standard errors clustered at the school level, and bootstrap standard errors based on resampling schools with replacement. We also report the results o the J-test or the overidentiying restrictions o the GMM-FD1, GMM-FD2, GMM-FE1 and GMM-FE2 approaches. Table 2 presents the results. All the approaches yield positive and signiicant eects o increasing expenditure per student the previous year on current math score passing rates. The delay on the eect is due to the timing o the tests and spending variables: the test are administered early in the second semester, whereas the spending is the allocation or the entire school year. According to the analytical standard errors, using all the strict exogeneity restrictions substantially increases the precision o the estimates. However, we should be cautious with this result as analytical standard errors might provide a poor approximation to the variability o two-step GMM estimators in the presence o many (potentially weak) moment conditions. 2 Bootstrap provides standard errors that are more stable across the dierent approaches. Time homogeneity o all the model parameters cannot be rejected at the 5% level, although only marginally or the GMM-FD1. The strict exogeneity overidentiying restrictions are rejected at the 5% level with both GMM-FD2 and GMM-FE2. 5. Causal Eect o Democracy on Economic Growth We illustrate the methods o Section 3 with an application to the causal eect o democracy on economic growth based on Acemoglu, Naidu, Restrepo and Robinson (2014) [1]. We use a balanced panel o 147 countries over the period rom 1987 through 2009 extracted rom the data set used in [1]. The outcome variable Y it is the logarithm o GDP per capita in 2000 USD as measured by the World Bank or country i at year t. The treatment variable o interest D it is a democracy indicator constructed in [1], which combines inormation 2 [6] derived a correction or analytical standard errors o two-step GMM estimators.

13 log(rexpp) 0.53 (2.51) [2.49] L1.log(rexpp) 9.05 (2.79) [2.81] log(enrol) 0.59 (0.41) [0.40] L8 13 Table 2. Eect o Expenditure per Student on Math Scores Pooled FD GMM-FD1 GMM-FD2 FE GMM-FE1 GMM-FE (4.93) (2.99) (1.30) (2.79) (2.09) (1.31) [4.65] [3.43] [3.39] [2.74] [2.51] [2.61] (5.12) [5.10] 2.14 (1.64) [1.59] 7.94 (2.77) [3.69] 1.84 (1.02) [1.34] 9.87 (1.12) [4.26] 1.42 (0.42) [1.32] 7.00 (4.24) [4.20] 0.25 (0.95) [0.95] 9.44 (2.47) [3.42] 0.31 (0.75) [0.96] 7.63 (1.03) [3.87] 0.05 (0.41) [0.93] lunch (0.03) [0.03] (0.17) [0.15] (0.12) [0.16] (0.04) [0.12] (0.13) [0.12] (0.10) [0.11] (0.04) [0.11] J-test p-val d.o Note 1: All the speciications include time eects. Note 2: Clustered standard errors at the school level in parentheses. Note 3: Bootstrap standard errors in brackets based on 500 replication. rom several sources including Freedom House and Polity IV. This indicator captures a bundle o institutions that characterize electoral democracies such as ree and competitive elections, checks on executive power, an inclusive political process that permits various groups o society to be represented politically, and expansion o civil rights. Table 3 reports some descriptive statistics o the variables used in the analysis. The unconditional eect o democracy on GDP is 134% in this period. Table 3. Descriptive Statistics Mean SD Dem = 1 Dem = 0 Democracy Log(GDP) Number Obs. 3,381 3,381 2,099 1,282

14 14 VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL We control or unobserved country eects and rich dynamics o GDP using the linear panel model pt Y it = a i + αd it + β j Y i(t j) + E it, i = 1,..., n, t = p + 1,..., T, where we assume that j=1 E it (X i t, a i ), X i t = (D it,..., D i1, Y i(t 1),..., Y i1 ). (5.1) This assumption implies that democracy and past GDP are orthogonal to contemporaneous and uture GDP shocks, and that the error E it is serially uncorrelated once we include suiciently many lags o GDP. 3 Following the preerred speciication in [1], we include our lags (p = 4). Table 4. Eect o Democracy on Economic Growth Democracy ( 100) L1.log(gdp) 1.33 (0.05) [0.05] L2.log(gdp) (0.06) [0.07] L3.log(gdp) (0.05) [0.05] L4.log(gdp) (0.03) [0.03] Predictive Causal Pooled FD FE GMM-FD1 GMM-FD (0.26) (1.20) (0.65) (2.77) (1.70) [0.26] [1.20] [0.66] [2.75] [1.61] 0.33 (0.05) [0.06] 0.17 (0.04) [0.04] 0.06 (0.02) [0.02] (0.04) [0.04] 1.15 (0.05) [0.05] (0.06) [0.06] (0.04) [0.04] (0.02) [0.03] 1.30 (0.14) [0.15] (0.28) [0.22] (0.28) [0.23] 0.18 (0.14) [0.13] 1.00 (0.07) [0.07] (0.06) [0.07] (0.04) [0.04] (0.03) [0.03] J-test p-value d.o Note 1: All the speciications include time eects. Note 2: Clustered standard errors at the country level in parentheses. Note 3: Bootstrap standard errors in brackets based on 500 replications. 3 All the variables are net o time eects.

15 L8 15 Table 4 presents the results. The estimates based on the weak exogeneity condition (5.1) are reported in the columns labelled as GMM-FD1 and GMM-FD2. 4 GMM-FD1 applies two-step GMM with the score unction (3.3) with X it = (D it, Y i(t 1),..., Y i(t 4) ), γ = (α, β 1,..., β 4 ), (5.2) which uses only a subset o the moment conditions in (5.1). GMM-FD2 applies two-step GMM with the score unction (3.2), which uses all the moment conditions in (5.1), with the same X it and γ as in (5.2). In both cases we test the overidentiying restrictions using the J-test. The rest o the columns report estimates o predictive eects based on the pooled, irst dierencing and ixed eects approaches o Section 2 with the score unctions (2.11), (2.8) and (2.3), respectively. None o these approaches is consistent or the causal parameters identiied by (5.1) under large n ixed T asymptotics. 5 For each method, we report analytical standard errors clustered at the country level and bootstrap standard errors based on resampling countries without replacement. The GMM-FD2 approach inds that a transition to democracy increases economic growth by almost 4% in the irst year, and about 20% in the long run. 6 The J-test does not reject the weak exogeneity overidentiying restrictions in (5.1). The GMM-FD1 approach yields short run eects similar to FD-GMM2, but more than double run eects o 46.7% and clearly reject the overidentiying restrictions in (3.3). This does raise the concern about the statistical validity o the model and warrants urther investigation. The discrepancy in the conclusion o the J-test might be due to lack o power in the GMM-FD2 due to simultaneous testing o many restrictions, most o them (possibly) weak moment conditions. As expected, the FE predictive estimates are closer to their causal counterparts than the pooled and FD estimates because T is large, T = 19 ater using the irst 4 periods as initial conditions. Appendix A. Problems (1) Implement standard panel data estimators (ixed eects, irst dierences, or Arellano- Bond approach) or the irst or second empirical example. These are implemented in standard sotware such as Stata or R. Very briely explain the assumptions you need to make or these estimators to be consistent or causal eects, and why these assumptions may or may not hold in these examples. Very briely explain how you are taking care o clustering. (2) (Optional Bonus Problem.) Implement the GMM-FE1 and GMM-FE2 estimators or the irst empirical example and GMM-FD1 or the second empirical example. Report results o J-tests. Explain very briely what you are doing. Extra plus or 4 We obtain the estimates with the command pgmm o the package plm in R. 5 The ixed eects estimator is consistent under asymptotic sequences where n, T because it has asymptotic bias o order O(T 1 ). 6 The long-run eect is calculated as α/(1 to democracy. p j=1 βj ), and corresponds to the eect o a permanent transition

16 16 VICTOR CHERNOZHUKOV AND IVAN FERNANDEZ-VAL inding problems with the posted R-code or the empirical results reported in the lecture note. Reerences [1] Daron Acemoglu, Suresh Naidu, Pascual Restrepo, and James A. Robinson. Democracy Does Cause Growth. NBER Working Papers 20004, National Bureau o Economic Research, Inc, March [2] T. W. Anderson and Cheng Hsiao. Formulation and estimation o dynamic models using panel data. J. Econometrics, 18(1):47 82, [3] Manuel Arellano and Stephen Bond. Some tests o speciication or panel data: Monte carlo evidence and an application to employment equations. The Review o Economic Studies, 58(2): , [4] J. Neyman and E. L. Scott. Consistent estimates based on partially consistent observations. Econometrica, 16(1):1 32, [5] Leslie E. Papke. The eects o spending on test pass rates: evidence rom Michigan. Journal o Public Economics, 89(5-6): , June [6] Frank Windmeijer. A inite sample correction or the variance o linear eicient twostep gmm estimators. Journal o Econometrics, 126(1):25 51, 2005.

17 MIT OpenCourseWare Econometrics Spring 2017 For inormation about citing these materials or our Terms o Use, visit:

arxiv: v1 [econ.em] 12 Jan 2019

arxiv: v1 [econ.em] 12 Jan 2019 Mastering Panel Metrics: Causal Impact of Democracy on Growth By Shuowen Chen and Victor Chernozhukov and Iván Fernández-Val arxiv:1901.03821v1 [econ.em] 12 Jan 2019 The relationship between democracy