Measurement errors and simultaneity

Chapter 9
Measurement errors and simultaneity

Erik Biørn and Jayalakshmi Krishnakumar

Chapter in: THE ECONOMETRICS OF PANEL DATA: Fundamentals and recent developments, ed. by L. Mátyás and P. Sevestre. Springer Science, forthcoming.

9.1 Introduction

This chapter is concerned with the problem of endogeneity of certain explanatory variables in a regression equation. There are two potential sources of endogeneity in a panel data model with individual and time specific effects: (i) correlation between explanatory variables and specific effects (when treated as random) and (ii) correlation between explanatory variables and the residual/idiosyncratic error term. The first case was extensively dealt with in Chapter 4 of this book and hence we will not go into it here. In this chapter we are more concerned with a non-zero correlation between the explanatory variables and the overall error consisting of both the specific effect and the genuine disturbance term. One might call it double endogeneity as opposed to the single endogeneity in the former situation.

In this chapter we consider two major causes of this double endogeneity encountered in practical situations. One of them is the presence of measurement errors in the explanatory variables. This will be the object of study of Section 9.2. Another major source is the simultaneity problem that arises when the regression equation is one of several structural equations of a simultaneous model and hence contains current endogenous explanatory variables. Section 9.3 will look into this problem in detail and Section 9.4 concludes the chapter.

9.2 Measurement errors and panel data

A familiar and notable property of Ordinary Least Squares (OLS) when there are random measurement errors (errors-in-variables, EIV) in the regressors is that the coefficient estimators are inconsistent. In the one regressor case (or the multiple regressor case with uncorrelated regressors), under standard assumptions, the slope coefficient estimator is biased towards zero, a property often denoted as attenuation. More seriously, unless some extraneous information is available, e.g., the existence of valid parameter restrictions or valid instruments for the error-ridden regressors, slope coefficients cannot (in general) be identified from standard data [see Fuller (1987, section 1.1.3)] (footnote 1). This lack of identification in EIV models, however, relates to uni-dimensional data, i.e., pure (single or repeated) cross-sections or pure time-series. If the variables are observed as panel data, exhibiting two-dimensional variation, it may be possible to handle the EIV identification problem and estimate slope coefficients consistently without extraneous information, provided that the distributions of the latent regressors and the measurement errors satisfy certain weak conditions.

Briefly, the reason why the existence of variables observed along two dimensions makes the EIV identification problem easier to solve is partly (i) the repeated measurement property of panel data, so that the measurement error problem can be reduced by taking averages, which, in turn, may show sufficient variation to permit consistent estimation, and partly (ii) the larger set of other linear data transformations available for estimation. Such transformations, involving several individuals or several periods, may be needed to take account of uni-dimensional nuisance variables like unobserved individual or period specific heterogeneity, which are potentially correlated with the regressor.

Our focus is on the estimation of linear, static regression equations from balanced panel
data with additive, random measurement errors in the regressors by means of methods utilizing instrumental variables (IVs). The panel data available to an econometrician are frequently from individuals, firms, or other kinds of micro units, where not only observation errors in the narrow sense, but also departures between theoretical variable definitions and their observable counterparts in a wider sense may be present. From the panel data literature which disregards the EIV problem we know that the effect of, for example, additive (fixed or random) individual heterogeneity within a linear model can be eliminated by deducting individual means, taking differences over periods, etc. [see Baltagi (2001, Chapter 2) and Hsiao (2003, Section 1.1)]. Such transformations, however, may magnify the variation in the measurement error component of the observations relative to the variation in the true structural component, i.e., they may increase the noise/signal ratio. Hence, data transformations intended to solve the unobserved heterogeneity problem in estimating slope coefficients may aggravate the EIV problem. Several familiar estimators for panel data models, including the fixed effects within-group and between-group estimators, and the random effects Generalised Least Squares (GLS) estimators, will then be inconsistent, the bias depending, inter alia, on the way in which the number of individuals and/or periods tends to infinity and on the heterogeneity of the measurement error process; see Griliches and Hausman (1986) and Biørn (1992, 1996). Such inconsistency problems will not be dealt with here. Neither will we consider the idea of constructing consistent estimators by combining two or more inconsistent ones with different probability limits. Several examples are given in Griliches and Hausman (1986), Biørn (1996), and Wansbeek and Meijer (2000, section 6.9).

Footnote 1: Identification under non-normality of the true regressor is possible, by utilizing moments of the distribution of the observable variables of order higher than the second [see Reiersøl (1950)]. Even under non-identification, bounds on the parameters can be established from the distribution of the observable variables [see Fuller (1987, p. 11)]. These bounds may be wide or narrow, depending on the covariance structure of the variables; see Klepper and Leamer (1984), Bekker et al. (1987), and Erickson (1993).

The procedures to be considered in this section have two basic characteristics: First, a mixture of level and difference variables is involved. Second, the orthogonality conditions derived from the EIV structure, involving levels and differences over one or more than one period, are not all essential; some are redundant. Our estimation procedures are of two kinds:

(A) Transform the equation to differences and estimate it by IV or GMM, using as IVs level values of the regressors and/or regressands for other periods.

(B) Keep the equation in level form and estimate it by IV or GMM, using as IVs differenced values of the regressors and/or regressands for other periods.

In both cases, the differencing serves to eliminate individual heterogeneity, which is a potential nuisance since it may be correlated with the latent regressor vector. These procedures resemble, to some extent, procedures
for autoregressive (AR) models for panel data without measurement errors (mostly AR(1) equations with individual heterogeneity and often with exogenous regressors added) discussed, inter alia, by Anderson and Hsiao (1981, 1982), Holtz-Eakin et al. (1988), Arellano and Bond (1991), Arellano and Bover (1995), Ahn and Schmidt (1995), Blundell and Bond (1998), and Sevestre and Trognon (2007). If the distribution of the latent regressor vector is not time invariant and the second order moments of the measurement errors and disturbances are structured to some extent, a large number of consistent IV estimators of the coefficient of the latent regressor vector exist. Their consistency is robust to potential correlation between the individual heterogeneity and the latent regressor. Serial correlation or non-stationarity of the latent regressor is favourable from the point of view of identification and estimability of the coefficient vector.

The literature dealing specifically with panel data with measurement errors is not large. The (A) procedures above extend and modify procedures described in Griliches and Hausman (1986), which is the seminal article on measurement errors in panel data, at least in econometrics. Extensions are discussed in Wansbeek and Koning (1991), Biørn (1992, 1996, 2000, 2003), Biørn and Klette (1998, 1999), and Wansbeek (2001). Paterno et al. (1996) consider Maximum Likelihood analysis of panel data with measurement errors; their approach is not related to the (A) and (B) procedures to be discussed here.

9.2.1 Model and orthogonality conditions

Consider a panel data set with $N$ ($\geq 2$) individuals observed in $T$ ($\geq 2$) periods and a relationship between $y$ (observable scalar) and a $(K \times 1)$ vector $\xi$ (latent),

\[ y_{it} = c + \xi_{it}'\beta + \alpha_i + u_{it}, \qquad i = 1,\dots,N;\ t = 1,\dots,T, \qquad (9.2.1) \]

where $(y_{it}, \xi_{it})$ is the value of $(y, \xi)$ for individual $i$ in period $t$, $c$ is a scalar constant, $\beta$ is a $(K \times 1)$ vector, $\alpha_i$ is a zero (marginal) mean individual effect, which we consider as random and potentially correlated with $\xi_{it}$, and $u_{it}$ is a zero mean disturbance, which may also contain a measurement error in $y_{it}$. We observe

\[ x_{it} = \xi_{it} + v_{it}, \qquad i = 1,\dots,N;\ t = 1,\dots,T, \qquad (9.2.2) \]

where $v_{it}$ is a zero mean $(K \times 1)$ vector of measurement errors. Hence,

\[ y_{it} = c + x_{it}'\beta + \epsilon_{it}, \qquad \epsilon_{it} = \alpha_i + u_{it} - v_{it}'\beta. \qquad (9.2.3) \]

We can eliminate $\alpha_i$ from (9.2.3) by taking arbitrary backward differences $\Delta y_{it\theta} = y_{it} - y_{i\theta}$, $\Delta x_{it\theta} = x_{it} - x_{i\theta}$, etc., giving

\[ \Delta y_{it\theta} = \Delta x_{it\theta}'\beta + \Delta\epsilon_{it\theta}, \qquad \Delta\epsilon_{it\theta} = \Delta u_{it\theta} - \Delta v_{it\theta}'\beta. \qquad (9.2.4) \]

We assume that $(\xi_{it}, u_{it}, v_{it}, \alpha_i)$ are independent across individuals [which excludes random period specific components in $(\xi_{it}, u_{it}, v_{it})$], and make the following basic orthogonality assumptions:

Assumption (A): $E(v_{it}u_{i\theta}) = E(\xi_{it}u_{i\theta}) = E(\alpha_i v_{it}) = 0_{K1}$, $E(\xi_{i\theta} \otimes v_{it}') = 0_{KK}$, $E(\alpha_i u_{it}) = 0$, $i = 1,\dots,N$; $t, \theta = 1,\dots,T$,

where $0_{mn}$ denotes the $(m \times n)$ zero matrix and $\otimes$ is the Kronecker product operator. Regarding the temporal structure of the measurement errors and disturbances, we assume either that

Assumption (B1): $E(v_{it}v_{i\theta}') = 0_{KK}$, $|t - \theta| > \tau$,
Assumption (C1): $E(u_{it}u_{i\theta}) = 0$, $|t - \theta| > \tau$,

where $\tau$ is a non-negative integer, indicating the order of the serial correlation, or

Assumption (B2): $E(v_{it}v_{i\theta}')$ is invariant to $t, \theta$, $t \neq \theta$,
Assumption (C2): $E(u_{it}u_{i\theta})$ is invariant to $t, \theta$, $t \neq \theta$,

which allows for time invariance of the autocovariances. The latter will, for instance, be satisfied if the measurement errors and the disturbances have individual specific components, say $v_{it} = v_{1i} + v_{2it}$, $u_{it} = u_{1i} + u_{2it}$, where $v_{1i}$, $v_{2it}$, $u_{1i}$, and $u_{2it}$ are independent IID processes. The final set of assumptions relates to the distribution of the latent regressor vector $\xi_{it}$:

Assumption (D1): $E(\xi_{it})$ is invariant to $t$,
Assumption (D2): $E(\alpha_i \xi_{it})$ is invariant to $t$,
Assumption (E): $\mathrm{rank}(E[\xi_{ip}(\Delta\xi_{it\theta})']) = K$ for some $p, t, \theta$ different.

Assumptions (D1) and (D2) hold when $\xi_{it}$ is stationary for all $i$ [(D1) alone imposing mean stationarity]. Assumption (E) requires that $\xi_{it}$ is not IID and has some form of autocorrelation or non-stationarity. It excludes, for example, the case where $\xi_{it}$ has an individual specific component, so that $\xi_{it} = \xi_{1i} + \xi_{2it}$, where $\xi_{1i}$ and $\xi_{2it}$ are independent (vector) IID processes.

Assumptions (A)-(E) do not impose much structure on the first and second order moments of the $u_{it}$'s, $v_{it}$'s, $\xi_{it}$'s and $\alpha_i$'s. This has both its pros and cons. It is possible to structure this distribution more strongly, for instance assuming homoskedasticity and normality of $u_{it}$, $v_{it}$, and $\alpha_i$, and normality of $\xi_{it}$. Exploiting this stronger structure, e.g., by taking a LISREL or LIML approach, we might obtain more efficient (but potentially less robust) estimators by operating on the full covariance matrix of the $y_{it}$'s and the $x_{it}$'s rather than eliminating the $\alpha_i$'s by differencing. Other extensions are elaborated in a later section.

9.2.2 Identification and the structure of the second order moments

The distribution of $(\xi_{it}, u_{it}, v_{it}, \alpha_i)$ must satisfy some conditions to make identification of $\beta$ possible. The nature of these conditions can be illustrated as follows. Assume, for simplicity, that this distribution is the same for all individuals and that (A) holds, and let

\[ C(\xi_{it}, \xi_{i\theta}) = \Sigma^{\xi\xi}_{t\theta}, \quad E(v_{it}v_{i\theta}') = \Sigma^{vv}_{t\theta}, \quad E(\xi_{it}\alpha_i) = \Sigma^{\xi\alpha}_{t}, \quad E(\alpha_i^2) = \sigma^{\alpha\alpha}, \quad E(u_{it}u_{i\theta}) = \sigma^{uu}_{t\theta}, \]

for $i = 1,\dots,N$; $t, \theta = 1,\dots,T$, where $C$ denotes the covariance matrix operator. It then follows from (9.2.1) and (9.2.2) that the second order moments of the observable variables can be expressed as

\[ \begin{aligned} C(x_{it}, x_{i\theta}) &= \Sigma^{\xi\xi}_{t\theta} + \Sigma^{vv}_{t\theta}, \\ C(x_{it}, y_{i\theta}) &= \Sigma^{\xi\xi}_{t\theta}\beta + \Sigma^{\xi\alpha}_{t}, \\ C(y_{it}, y_{i\theta}) &= \beta'\Sigma^{\xi\xi}_{t\theta}\beta + (\Sigma^{\xi\alpha}_{t})'\beta + \beta'\Sigma^{\xi\alpha}_{\theta} + \sigma^{uu}_{t\theta} + \sigma^{\alpha\alpha}, \end{aligned} \qquad i = 1,\dots,N;\ t, \theta = 1,\dots,T. \qquad (9.2.5) \]

The identifiability of $\beta$ from second order moments in general depends on whether or not knowledge of $C(x_{it}, x_{i\theta})$, $C(x_{it}, y_{i\theta})$, and $C(y_{it}, y_{i\theta})$ for all available $t$ and $\theta$ is sufficient for obtaining a unique solution for $\beta$ from (9.2.5), given the restrictions imposed on the $\Sigma^{\xi\xi}_{t\theta}$'s, $\Sigma^{\xi\alpha}_{t}$'s, $\sigma^{uu}_{t\theta}$'s, and $\sigma^{\alpha\alpha}$. The answer, in general, depends on $T$ and $K$. With no further information, the number of elements in $C(x_{it}, x_{i\theta})$ and $C(y_{it}, y_{i\theta})$ (all of which can be estimated consistently from corresponding sample moments under weak conditions) equals the number of unknown elements in $\Sigma^{vv}_{t\theta}$ and $\sigma^{uu}_{t\theta}$, which is $\tfrac{1}{2}KT(KT+1)$ and $\tfrac{1}{2}T(T+1)$, respectively. Then $\sigma^{\alpha\alpha}$ cannot be identified, and $C(x_{it}, y_{i\theta})$ contains the only additional information available for identifying $\beta$, $\Sigma^{\xi\xi}_{t\theta}$, and $\Sigma^{\xi\alpha}_{t}$, given the restrictions imposed on the latter two matrices.

Consider two extreme cases. First, if $T = 1$, i.e., if we only have cross-section data, and no additional restrictions are imposed, there is an identification problem for any $K$. Second, if $T > 2$ and $\xi_{it} \sim \mathrm{IID}(\mu_\xi, \Sigma_{\xi\xi})$, $v_{it} \sim \mathrm{IID}(0_{1,K}, \Sigma_{vv})$, $u_{it} \sim \mathrm{IID}(0, \sigma_{uu})$, $\alpha_i \sim \mathrm{IID}(0, \sigma_{\alpha\alpha})$, we also have lack of identification in general. We get an essentially similar conclusion when the autocovariances of $\xi_{it}$ are time invariant and it is IID across $i$. From (9.2.5) we then get

\[ \begin{aligned} C(x_{it}, x_{i\theta}) &= \delta_{t\theta}(\Sigma_{\xi\xi} + \Sigma_{vv}), \\ C(x_{it}, y_{i\theta}) &= \delta_{t\theta}\Sigma_{\xi\xi}\beta, \\ C(y_{it}, y_{i\theta}) &= \delta_{t\theta}(\beta'\Sigma_{\xi\xi}\beta + \sigma_{uu}) + \sigma_{\alpha\alpha}, \end{aligned} \qquad (9.2.6) \]

where $\delta_{t\theta} = 1$ for $t = \theta$ and $= 0$ for $t \neq \theta$, and so we are essentially in the same situation with regard to identifiability of $\beta$ as when $T = 1$. The cross-period equations ($t \neq \theta$) then serve no other purpose than identification of $\sigma_{\alpha\alpha}$, and whether $T = 1$ or $T > 1$ realizations of $C(x_{it}, x_{it})$, $C(x_{it}, y_{it})$, and $C(y_{it}, y_{it})$ are available in (9.2.6) is immaterial to the identifiability of $\beta$, $\Sigma_{\xi\xi}$, $\Sigma_{vv}$, and $\sigma_{uu}$. In intermediate situations, identification may be ensured when $T \geq 2$. These examples illustrate that in order to ensure identification of the slope coefficient vector from panel data, there should not be too much structure on the second order moments of the latent exogenous regressors along the time dimension, and not too little structure on the second order moments of the errors and disturbances along the time dimension.

9.2.3 Moment conditions

A substantial number of (linear and non-linear) moment conditions involving $y_{it}$, $x_{it}$, and $\epsilon_{it}$ can be derived from Assumptions (A)-(E). Since (9.2.1)-(9.2.3) and Assumption (A) imply

\[ \begin{aligned} E(x_{it}x_{i\theta}') &= E(\xi_{it}\xi_{i\theta}') + E(v_{it}v_{i\theta}'), \\ E(x_{it}y_{i\theta}) &= E(\xi_{it}\xi_{i\theta}')\beta + E[\xi_{it}(\alpha_i + c)], \\ E(y_{it}y_{i\theta}) &= c^2 + E(\alpha_i^2) + \beta'E(\xi_{it}\xi_{i\theta}')\beta + \beta'E[\xi_{it}(\alpha_i + c)] + E[(\alpha_i + c)\xi_{i\theta}']\beta + E(u_{it}u_{i\theta}), \\ E(x_{it}\epsilon_{i\theta}) &= E(\xi_{it}\alpha_i) - E(v_{it}v_{i\theta}')\beta, \\ E(y_{it}\epsilon_{i\theta}) &= \beta'E(\xi_{it}\alpha_i) + E(\alpha_i^2) + E(u_{it}u_{i\theta}), \end{aligned} \]

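we can derive moment equations involving observable variables in levels and differences:

\[ \begin{aligned} E[x_{ip}(\Delta x_{it\theta})'] &= E[\xi_{ip}(\Delta\xi_{it\theta})'] + E[v_{ip}(\Delta v_{it\theta})'], && (9.2.7) \\ E[x_{ip}(\Delta y_{it\theta})] &= E[\xi_{ip}(\Delta\xi_{it\theta})']\beta, && (9.2.8) \\ E[(\Delta x_{ipq})y_{it}] &= E[(\Delta\xi_{ipq})\xi_{it}']\beta + E[(\Delta\xi_{ipq})(\alpha_i + c)], && (9.2.9) \end{aligned} \]

as well as moment equations involving observable variables and errors/disturbances:

\[ \begin{aligned} E[x_{ip}(\Delta\epsilon_{it\theta})] &= -E[v_{ip}(\Delta v_{it\theta})']\beta, && (9.2.10) \\ E[y_{ip}(\Delta\epsilon_{it\theta})] &= E[u_{ip}(\Delta u_{it\theta})], && (9.2.11) \\ E[(\Delta x_{ipq})\epsilon_{it}] &= E[(\Delta\xi_{ipq})\alpha_i] - E[(\Delta v_{ipq})v_{it}']\beta, && (9.2.12) \\ E[(\Delta y_{ipq})\epsilon_{it}] &= \beta'E[(\Delta\xi_{ipq})\alpha_i] + E[(\Delta u_{ipq})u_{it}], \qquad t, \theta, p, q = 1,\dots,T. && (9.2.13) \end{aligned} \]

Not all of the equations in (9.2.7)-(9.2.13), whose number is substantial even for small $T$, are, of course, independent. Depending on which (B), (C), and (D) assumptions are valid, some terms on the right hand side of (9.2.9)-(9.2.13) will vanish. Precisely, if $T > 2$, then (9.2.3), (9.2.5), and (9.2.10)-(9.2.13) imply the following moment conditions, or orthogonality conditions (OC), on the observable variables and the errors and disturbances:

(B2), or (B1) with $|t-p|, |\theta-p| > \tau$, $t \neq \theta$ $\Longrightarrow$
\[ E[x_{ip}(\Delta\epsilon_{it\theta})] = E[x_{ip}(\Delta y_{it\theta})] - E[x_{ip}(\Delta x_{it\theta})']\beta = 0_{K1}. \qquad (9.2.14) \]

(C2), or (C1) with $|t-p|, |\theta-p| > \tau$, $t \neq \theta$ $\Longrightarrow$
\[ E[y_{ip}(\Delta\epsilon_{it\theta})] = E[y_{ip}(\Delta y_{it\theta})] - E[y_{ip}(\Delta x_{it\theta})']\beta = 0. \qquad (9.2.15) \]

(D1), (D2) and (B2), or (B1) with $|t-p|, |t-q| > \tau$, $p \neq q$ $\Longrightarrow$
\[ E[(\Delta x_{ipq})\epsilon_{it}] = E[(\Delta x_{ipq})y_{it}] - E[(\Delta x_{ipq})x_{it}']\beta = 0_{K1}. \qquad (9.2.16) \]

(D1), (D2), and (C2), or (C1) with $|t-p|, |t-q| > \tau$, $p \neq q$ $\Longrightarrow$
\[ E[(\Delta y_{ipq})\epsilon_{it}] = E[(\Delta y_{ipq})y_{it}] - E[(\Delta y_{ipq})x_{it}']\beta = 0. \qquad (9.2.17) \]

The treatment of the intercept term $c$ in constructing (9.2.16) and (9.2.17) needs a comment. When the mean stationarity assumption (D1) holds, using IVs in differences annihilates $c$ in the moment equations, since then $E(\Delta x_{ipq}) = 0_{K1}$ and $E(\Delta y_{ipq}) = 0$. If, however, we relax (D1), which is unlikely to hold in many practical situations, we get

\[ \begin{aligned} E[(\Delta x_{ipq})\epsilon_{it}] &= E[(\Delta x_{ipq})y_{it}] - E[\Delta x_{ipq}]c - E[(\Delta x_{ipq})x_{it}']\beta = 0_{K1}, \\ E[(\Delta y_{ipq})\epsilon_{it}] &= E[(\Delta y_{ipq})y_{it}] - E[\Delta y_{ipq}]c - E[(\Delta y_{ipq})x_{it}']\beta = 0. \end{aligned} \]

Using $E(\epsilon_{it}) = E(y_{it}) - c - E(x_{it}')\beta = 0$ to eliminate $c$ leads to the following modifications of (9.2.16) and (9.2.17):

(D1), (D2) and (B2), or (B1) with $|t-p|, |t-q| > \tau$, $p \neq q$ $\Longrightarrow$
\[ E[(\Delta x_{ipq})\epsilon_{it}] = E[(\Delta x_{ipq})(y_{it} - E(y_{it}))] - E[(\Delta x_{ipq})(x_{it} - E(x_{it}))']\beta = 0_{K1}. \]

(D1), (D2), and (C2), or (C1) with $|t-p|, |t-q| > \tau$, $p \neq q$ $\Longrightarrow$
\[ E[(\Delta y_{ipq})\epsilon_{it}] = E[(\Delta y_{ipq})(y_{it} - E(y_{it}))] - E[(\Delta y_{ipq})(x_{it} - E(x_{it}))']\beta = 0. \]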
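The orthogonality conditions (9.2.14) and the mean-adjusted version of (9.2.16) can be checked numerically. The following is only a sketch under assumptions that are not part of the text: $K = 1$, $T = 3$, Gaussian shocks, white-noise measurement errors (so that (B1) holds with $\tau = 0$), and an AR(1) component in the latent regressor with coefficient `rho`. All parameter values and variable names are hypothetical, and period-specific sample means are used as stand-ins for $E(y_{it})$ and $E(x_{it})$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta, c, rho = 200_000, 3, 1.0, 2.0, 0.5   # illustrative values only

alpha = rng.normal(size=N)                        # individual effect alpha_i
w = np.empty((N, T))                              # stationary AR(1) part of xi_it
w[:, 0] = rng.normal(size=N)
for t in range(1, T):
    w[:, t] = rho * w[:, t - 1] + np.sqrt(1 - rho**2) * rng.normal(size=N)
xi = 0.5 * alpha[:, None] + w                     # latent regressor, correlated with alpha_i
v = rng.normal(size=(N, T))                       # white-noise measurement error
u = rng.normal(size=(N, T))                       # white-noise disturbance
x = xi + v                                        # observed regressor, as in (9.2.2)
y = c + beta * xi + alpha[:, None] + u            # as in (9.2.1)

# Equation differenced between periods 3 and 2, level instrument from period 1.
dx = x[:, 2] - x[:, 1]
dy = y[:, 2] - y[:, 1]
z = x[:, 0]
moment_at_truth = np.mean(z * (dy - beta * dx))   # sample analogue of (9.2.14) at the true beta
beta_ols_diff = (dx @ dy) / (dx @ dx)             # OLS on differences (attenuated)
beta_iv_diff = (z @ dy) / (z @ dx)                # simple IV ratio based on (9.2.14)

# Level equation for period 3, instrument x_i2 - x_i1, variables centred by period means.
dz = x[:, 1] - x[:, 0]
y3c = y[:, 2] - y[:, 2].mean()
x3c = x[:, 2] - x[:, 2].mean()
beta_iv_level = (dz @ y3c) / (dz @ x3c)           # simple IV ratio based on (9.2.16)

print(moment_at_truth, beta_ols_diff, beta_iv_diff, beta_iv_level)
```

Under these settings the OLS slope on differences should be pulled towards roughly one third of $\beta$ (signal variance 1 against total variance 3 in $\Delta x$), while both IV ratios should be close to $\beta$ up to sampling noise.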

To implement these modified OCs in the GMM procedures to be described below for the level equation, we could replace $E(y_{it})$ and $E(x_{it})$ by corresponding global or period specific sample means.

The conditions in (9.2.14)-(9.2.17) are not all independent. Some are redundant, since they can be derived as linear combinations of other conditions (footnote 2). We confine attention to (9.2.14) and (9.2.16), since (9.2.15) and (9.2.17) can be treated similarly. When $\tau = 0$, the total number of OCs in both (9.2.14) and (9.2.16) is $\tfrac{1}{2}KT(T-1)(T-2)$. Below, we prove that

(a) When (B2) and (C2), or (B1) and (C1) with $\tau = 0$, are satisfied, all OCs in (9.2.14) can be constructed from all admissible OCs relating to equations differenced over one period and a subset of OCs relating to differences over two periods. When (B1) and (C1) are satisfied with an arbitrary $\tau$, all OCs in (9.2.14) can be constructed from all admissible OCs relating to equations differenced over one period and a subset of OCs relating to differences over $2(\tau+1)$ periods.

(b) When (B2) and (C2), or (B1) and (C1) with $\tau = 0$, are satisfied, all OCs in (9.2.16) can be constructed from all admissible OCs relating to IVs differenced over one period and a subset of IVs differenced over two periods. When (B1) and (C1) are satisfied with an arbitrary $\tau$, all OCs in (9.2.16) can be constructed from all admissible OCs relating to IVs differenced over one period and a subset of IVs differenced over $2(\tau+1)$ periods.

Footnote 2: This redundancy problem is discussed in Biørn (2000). Essential and redundant moment conditions in AR models for panel data are discussed in Ahn and Schmidt (1995), Arellano and Bover (1995), and Blundell and Bond (1995). A general treatment of redundancy of moment conditions in GMM estimation is found in Breusch et al. (1999).

We denote the non-redundant conditions defined by (a) and (b) as essential OCs. Since (9.2.14) and (9.2.16) are symmetric, we prove only (a) and derive (b) by way of analogy. Since $x_{ip}\Delta\epsilon_{it\theta} = x_{ip}\big(\sum_{j=\theta+1}^{t}\Delta\epsilon_{ij,j-1}\big)$, we see that if (hypothetically) all $p = 1,\dots,T$ combined with all $t > \theta$ would have given admissible OCs, (9.2.14) for differences over $2, 3, \dots, T-1$ periods could have been constructed from the conditions relating to one-period differences only. However, since $(t, \theta) = (p, p-1), (p+1, p)$ are inadmissible, and [when (B2) holds] $(t, \theta) = (p+1, p-1)$ is admissible, we have to distinguish between the cases where $p$ is strictly outside and strictly inside the interval $(\theta, t)$. From the identities

\[ \begin{aligned} x_{ip}\Delta\epsilon_{it\theta} &= x_{ip}\Big(\sum_{j=\theta+1}^{t}\Delta\epsilon_{ij,j-1}\Big) && \text{for } p = 1,\dots,\theta-1,\ t+1,\dots,T, \\ x_{ip}\Delta\epsilon_{it\theta} &= x_{ip}\Big(\sum_{j=\theta+1}^{p-1}\Delta\epsilon_{ij,j-1} + \Delta\epsilon_{i,p+1,p-1} + \sum_{j=p+2}^{t}\Delta\epsilon_{ij,j-1}\Big) && \text{for } p = \theta+1,\dots,t-1, \end{aligned} \]

when taking expectations, we then obtain

Proposition 1:

A. When (B2) and (C2) are satisfied, then

(a) $E[x_{ip}(\Delta\epsilon_{it,t-1})] = 0_{K1}$ for $p = 1,\dots,t-2,\ t+1,\dots,T$; $t = 2,\dots,T$ are $K(T-1)(T-2)$ essential OCs for equations differenced over one period.
(b) $E[x_{it}(\Delta\epsilon_{i,t+1,t-1})] = 0_{K1}$ for $t = 2,\dots,T-1$ are $K(T-2)$ essential OCs for equations differenced over two periods.
(c) The other OCs are redundant: among the $\tfrac{1}{2}KT(T-1)(T-2)$ conditions in (9.2.14), only a fraction $2/(T-1)$ are essential.

B. When (B1) and (C1) are satisfied for an arbitrary $\tau$, then
(a) $E[x_{ip}(\Delta\epsilon_{it,t-1})] = 0_{K1}$ for $p = 1,\dots,t-\tau-2,\ t+\tau+1,\dots,T$; $t = 2,\dots,T$ are essential OCs for equations in one-period differences.
(b) $E[x_{it}(\Delta\epsilon_{i,t+\tau+1,t-\tau-1})] = 0_{K1}$ for $t = \tau+2,\dots,T-\tau-1$ are essential OCs for equations in $2(\tau+1)$ period differences.
(c) The other OCs in (9.2.14) are redundant.

Symmetrically, from (9.2.16) we have

Proposition 2:

A. When (B2) and (C2) are satisfied, then
(a) $E[(\Delta x_{ip,p-1})\epsilon_{it}] = 0_{K1}$ for $t = 1,\dots,p-2,\ p+1,\dots,T$; $p = 2,\dots,T$ are $K(T-1)(T-2)$ essential OCs for equations in levels, with IVs differenced over one period.
(b) $E[(\Delta x_{i,t+1,t-1})\epsilon_{it}] = 0_{K1}$ for $t = 2,\dots,T-1$ are $K(T-2)$ essential OCs for equations in levels, with IVs differenced over two periods.
(c) The other OCs are redundant: among the $\tfrac{1}{2}KT(T-1)(T-2)$ conditions in (9.2.16), only a fraction $2/(T-1)$ are essential.

B. When (B1) and (C1) are satisfied for an arbitrary $\tau$, then
(a) $E[(\Delta x_{ip,p-1})\epsilon_{it}] = 0_{K1}$ for $t = 1,\dots,p-\tau-2,\ p+\tau+1,\dots,T$; $p = 2,\dots,T$ are essential OCs for equations in levels, with IVs differenced over one period.
(b) $E[(\Delta x_{i,t+\tau+1,t-\tau-1})\epsilon_{it}] = 0_{K1}$ for $t = \tau+2,\dots,T-\tau-1$ are essential OCs for equations in levels, with IVs differenced over $2(\tau+1)$ periods.
(c) The other OCs in (9.2.16) are redundant.

These propositions can be (trivially) modified to include also the essential and redundant OCs in the $y$s or the $\Delta y$s, given in (9.2.15) and (9.2.17).

9.2.4 Estimators constructed from period means

Several consistent estimators of $\beta$ can be constructed from differenced period means. These estimators exploit the repeated measurement property of panel data, while the differencing removes the latent heterogeneity. From (9.2.3) we obtain

\[ \begin{aligned} \Delta_s \bar{y}_{t} &= \Delta_s \bar{x}_{t}'\beta + \Delta_s \bar{\epsilon}_{t}, && s = 1,\dots,T-1;\ t = s+1,\dots,T, && (9.2.18) \\ (\bar{y}_{t} - \bar{y}) &= (\bar{x}_{t} - \bar{x})'\beta + (\bar{\epsilon}_{t} - \bar{\epsilon}), && t = 1,\dots,T, && (9.2.19) \end{aligned} \]

where $\bar{y}_{t} = \frac{1}{N}\sum_i y_{it}$, $\bar{y} = \frac{1}{NT}\sum_i\sum_t y_{it}$, $\bar{x}_{t} = \frac{1}{N}\sum_i x_{it}$, $\bar{x} = \frac{1}{NT}\sum_i\sum_t x_{it}$, etc., and $\Delta_s$ denotes differencing over $s$ periods. When (A) is satisfied, the (weak) law of large numbers implies, under weak conditions [confer McCabe and Tremayne (1993, section 3.5)] (footnote 3), that $\mathrm{plim}(\bar{\epsilon}_{t}) = 0$, $\mathrm{plim}(\bar{x}_{t} - \bar{\xi}_{t}) = 0_{K1}$, so that $\mathrm{plim}[\bar{x}_{t}\bar{\epsilon}_{t}] = 0_{K1}$ even if $\mathrm{plim}[\frac{1}{N}\sum_{i=1}^{N} x_{it}\epsilon_{it}] \neq 0_{K1}$. From (9.2.18) and (9.2.19) we therefore get

\[ \begin{aligned} \mathrm{plim}[(\Delta_s \bar{x}_{t})(\Delta_s \bar{y}_{t})] &= \mathrm{plim}[(\Delta_s \bar{x}_{t})(\Delta_s \bar{x}_{t})']\beta, && (9.2.20) \\ \mathrm{plim}[(\bar{x}_{t} - \bar{x})(\bar{y}_{t} - \bar{y})] &= \mathrm{plim}[(\bar{x}_{t} - \bar{x})(\bar{x}_{t} - \bar{x})']\beta. && (9.2.21) \end{aligned} \]

Footnote 3: Throughout, plim denotes probability limits when $N$ goes to infinity and $T$ is finite.

Hence, provided that $E[(\Delta_s \bar{\xi}_{t})(\Delta_s \bar{\xi}_{t})']$ and $E[(\bar{\xi}_{t} - \bar{\xi})(\bar{\xi}_{t} - \bar{\xi})']$ have rank $K$, which is ensured by Assumption (E), consistent estimators of $\beta$ can be obtained by applying OLS on (9.2.18) and (9.2.19), which give, respectively,

\[ \widehat{\beta}_{\Delta s} = \Big[\sum_{t=s+1}^{T}(\Delta_s \bar{x}_{t})(\Delta_s \bar{x}_{t})'\Big]^{-1}\Big[\sum_{t=s+1}^{T}(\Delta_s \bar{x}_{t})(\Delta_s \bar{y}_{t})\Big], \qquad s = 1,\dots,T-1, \qquad (9.2.22) \]

\[ \widehat{\beta}_{BP} = \Big[\sum_{t=1}^{T}(\bar{x}_{t} - \bar{x})(\bar{x}_{t} - \bar{x})'\Big]^{-1}\Big[\sum_{t=1}^{T}(\bar{x}_{t} - \bar{x})(\bar{y}_{t} - \bar{y})\Big]. \qquad (9.2.23) \]

The latter is the between period (BP) estimator. The consistency of these estimators simply relies on the fact that averages of a large number of repeated measurements of an error-ridden variable give, under weak conditions, an error-free measure of the true average at the limit, provided that this average shows variation along the remaining dimension, i.e., across periods. The latter property is ensured by Assumption (E). Shalabh (2003) also discusses consistent coefficient estimation in measurement error models with replicated observations. A major problem with these estimators is their low potential efficiency, as none of them exploits the between individual variation in the data, which often is the main source of variation.

Basic to these conclusions is the assumption that the measurement error has no period specific component, which, roughly speaking, means that it is equally difficult to measure $\xi$ correctly in all periods. If such a component is present, it will not vanish when taking plims of period means, i.e., $\mathrm{plim}(\bar{v}_{t})$ will no longer be zero, (9.2.20) and (9.2.21) will no longer hold, and so $\widehat{\beta}_{\Delta s}$ and $\widehat{\beta}_{BP}$ will be inconsistent.

9.2.5 GMM estimation and testing in the general case

We first consider the GMM principle in general, without reference to panel data and measurement error situations.

Footnote 4: We here, unlike in the preceding sections, let the column number denote the regressor and the row number the observation. Following this convention, we can express the following IV and GMM estimators in the more common format when going from vector to matrix notation.

Assume that we want to estimate the $(K \times 1)$ coefficient vector $\beta$ in the equation (footnote 4)

\[ y = x\beta + \epsilon, \qquad (9.2.24) \]

where $y$ and $\epsilon$ are scalars and $x$ is a $(1 \times K)$ regressor vector. There exists an instrument vector $z$, of dimension $(1 \times G)$, for $x$ ($G \geq K$), satisfying the OCs

\[ E(z'\epsilon) = E[z'(y - x\beta)] = 0_{G1}. \qquad (9.2.25) \]

We have $n$ observations on $(y, x, z)$, denoted as $(y_j, x_j, z_j)$, $j = 1,\dots,n$, and define the vector valued $(G \times 1)$ function of corresponding empirical means,

\[ g_n(y, x, z; \beta) = \frac{1}{n}\sum_{j=1}^{n} z_j'(y_j - x_j\beta). \qquad (9.2.26) \]

It may be considered the empirical counterpart to $E[z'(y - x\beta)]$ based on the sample. The essence of GMM is to choose as an estimator for $\beta$ the value which brings the value of $g_n(y, x, z; \beta)$ as close to its theoretical counterpart, $0_{G1}$, as possible. If $G = K$, an exact solution to $g_n(y, x, z; \beta) = 0_{G1}$ exists and is the simple IV estimator

\[ \beta^{*}_{IV} = \Big[\sum_j z_j'x_j\Big]^{-1}\Big[\sum_j z_j'y_j\Big]. \]

If $G > K$, which is the most common situation, GMM solves the estimation problem by minimizing a distance measure represented by a quadratic form in $g_n(y, x, z; \beta)$ for a suitably chosen positive definite $(G \times G)$ weighting matrix $W_n$, i.e.,

\[ \beta^{*}_{GMM} = \operatorname{argmin}_{\beta}\,[g_n(y, x, z; \beta)'\,W_n\,g_n(y, x, z; \beta)]. \qquad (9.2.27) \]

All estimators obtained in this way are consistent. A choice which leads to an asymptotically efficient estimator of $\beta$ is to set this weighting matrix equal (or proportional) to the inverse of (an estimate of) the (asymptotic) covariance matrix of $\frac{1}{n}\sum_{j=1}^{n} z_j'\epsilon_j$; see, e.g., Davidson and MacKinnon (1993, Theorem 17.3) and Harris and Mátyás (1999, section 1.3.3).

If $\epsilon$ is serially uncorrelated and homoskedastic, with variance $\sigma_\epsilon^2$, the appropriate choice is simply $W_n = [n^{-2}\sigma_\epsilon^2\sum_{j=1}^{n} z_j'z_j]^{-1}$. The estimator obtained from (9.2.27) is then

\[ \widehat{\beta}_{GMM} = \Big[\Big(\sum_j x_j'z_j\Big)\Big(\sum_j z_j'z_j\Big)^{-1}\Big(\sum_j z_j'x_j\Big)\Big]^{-1}\Big[\Big(\sum_j x_j'z_j\Big)\Big(\sum_j z_j'z_j\Big)^{-1}\Big(\sum_j z_j'y_j\Big)\Big], \qquad (9.2.28) \]

which is the standard Two-Stage Least Squares (2SLS) estimator. If $\epsilon_j$ has an unspecified heteroskedasticity or has a more or less strictly specified autocorrelation, we can reformulate the OCs in an appropriate way, as will be exemplified below. Both of these properties are essential for the application of GMM to panel data. To operationalize the latter method in the presence of unknown heteroskedasticity, we first construct consistent residuals $\widehat{\epsilon}_j$, usually from (9.2.28), which we consider as a first step GMM estimator, and estimate $W_n$ by $\widehat{W}_n = [n^{-2}\sum_j z_j'\widehat{\epsilon}_j^{\,2}z_j]^{-1}$; see White (1984, sections IV.3 and VI.2). Inserting this into (9.2.27) gives

\[ \widetilde{\beta}_{GMM} = \Big[\Big(\sum_j x_j'z_j\Big)\Big(\sum_j z_j'\widehat{\epsilon}_j^{\,2}z_j\Big)^{-1}\Big(\sum_j z_j'x_j\Big)\Big]^{-1}\Big[\Big(\sum_j x_j'z_j\Big)\Big(\sum_j z_j'\widehat{\epsilon}_j^{\,2}z_j\Big)^{-1}\Big(\sum_j z_j'y_j\Big)\Big]. \qquad (9.2.29) \]
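The two estimators (9.2.28) and (9.2.29) can be sketched in a few lines of NumPy. This is an illustration only: the function name `two_step_gmm`, the simulated data, and all parameter values are assumptions made for the example, with observations stacked along the rows of `X` and `Z`.

```python
import numpy as np

def two_step_gmm(y, X, Z):
    """First-step 2SLS as in (9.2.28), then second-step GMM as in (9.2.29)."""
    ZX = Z.T @ X                                  # sum_j z_j' x_j, shape (G, K)
    Zy = Z.T @ y                                  # sum_j z_j' y_j, shape (G,)
    W1 = np.linalg.inv(Z.T @ Z)                   # weighting proportional to (sum_j z_j' z_j)^{-1}
    beta_hat = np.linalg.solve(ZX.T @ W1 @ ZX, ZX.T @ W1 @ Zy)
    resid = y - X @ beta_hat                      # first-step residuals
    S = (Z * resid[:, None] ** 2).T @ Z           # sum_j z_j' resid_j^2 z_j
    W2 = np.linalg.inv(S)
    beta_tilde = np.linalg.solve(ZX.T @ W2 @ ZX, ZX.T @ W2 @ Zy)
    return beta_hat, beta_tilde

# Hypothetical data: one endogenous regressor, three instruments,
# heteroskedastic error with E(z' eps) = 0.
rng = np.random.default_rng(1)
n, beta_true = 50_000, np.array([2.0])
Z = rng.normal(size=(n, 3))
h = rng.normal(size=n)                            # shock shared by x and eps
x = Z @ np.array([1.0, 0.5, 0.5]) + h + rng.normal(size=n)
eps = h * (0.5 + np.abs(Z[:, 0]))
X = x[:, None]
y = X @ beta_true + eps

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # inconsistent here, for comparison
beta_hat, beta_tilde = two_step_gmm(y, X, Z)
print(beta_ols, beta_hat, beta_tilde)
```

The scale factors $n^{-2}$ and $\sigma_\epsilon^2$ in $W_n$ are omitted because they cancel in both estimators. In this design OLS should be visibly biased upwards, while both GMM estimates should be close to the value used in the simulation.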

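This second step GMM estimator is in a sense an optimal GMM estimator in the presence of unspecified error/disturbance heteroskedasticity.

The validity of the orthogonality condition (9.2.25) can be tested by the Sargan-Hansen statistic [confer Hansen (1982), Newey (1985), and Arellano and Bond (1991)], corresponding to the asymptotically efficient estimator $\widetilde{\beta}_{GMM}$:

\[ J = \Big[\sum_j \widehat{\epsilon}_j z_j\Big]\Big[\sum_j z_j'\widehat{\epsilon}_j^{\,2}z_j\Big]^{-1}\Big[\sum_j z_j'\widehat{\epsilon}_j\Big]. \]

Under the null, $J$ is asymptotically distributed as $\chi^2$ with a number of degrees of freedom equal to the number of overidentifying restrictions, i.e., the number of orthogonality conditions less the number of coefficients estimated under the null.

The procedures for estimating standard errors of $\widehat{\beta}_{GMM}$ and $\widetilde{\beta}_{GMM}$ can be explained as follows. Express (9.2.24) and (9.2.25) as

\[ y = X\beta + \epsilon, \quad E(\epsilon) = 0, \quad E(Z'\epsilon) = 0, \quad E(\epsilon\epsilon') = \Omega, \]

where $y$, $X$, $Z$, and $\epsilon$ correspond to $y$, $x$, $z$ and $\epsilon$, and the $n$ observations are placed along the rows. The two generic GMM estimators (9.2.28) and (9.2.29) have the form

\[ \begin{aligned} \widehat{\beta} &= [X'P_Z X]^{-1}[X'P_Z y], & P_Z &= Z(Z'Z)^{-1}Z', \\ \widetilde{\beta} &= [X'P_Z(\Omega)X]^{-1}[X'P_Z(\Omega)y], & P_Z(\Omega) &= Z(Z'\Omega Z)^{-1}Z'. \end{aligned} \]

Let the residual vector obtained from the former be $\widehat{\epsilon} = y - X\widehat{\beta}$ and

\[ S_{XZ} = S_{ZX}' = \frac{X'Z}{n}, \quad S_{ZZ} = \frac{Z'Z}{n}, \quad S_{\epsilon Z} = S_{Z\epsilon}' = \frac{\epsilon'Z}{n}, \quad S_{Z\Omega Z} = \frac{Z'\Omega Z}{n}, \quad S_{Z\epsilon\epsilon Z} = \frac{Z'\epsilon\epsilon'Z}{n}, \quad S_{Z\hat{\epsilon}\hat{\epsilon}Z} = \frac{Z'\widehat{\epsilon}\widehat{\epsilon}'Z}{n}. \]

Inserting for $y$ in the expressions for the two estimators gives

\[ \begin{aligned} \sqrt{n}(\widehat{\beta} - \beta) &= \sqrt{n}[X'P_Z X]^{-1}[X'P_Z\epsilon] = [S_{XZ}S_{ZZ}^{-1}S_{ZX}]^{-1}\Big[S_{XZ}S_{ZZ}^{-1}\frac{Z'\epsilon}{\sqrt{n}}\Big], \\ \sqrt{n}(\widetilde{\beta} - \beta) &= \sqrt{n}[X'P_Z(\Omega)X]^{-1}[X'P_Z(\Omega)\epsilon] = [S_{XZ}S_{Z\Omega Z}^{-1}S_{ZX}]^{-1}\Big[S_{XZ}S_{Z\Omega Z}^{-1}\frac{Z'\epsilon}{\sqrt{n}}\Big], \end{aligned} \]

and hence

\[ \begin{aligned} n(\widehat{\beta} - \beta)(\widehat{\beta} - \beta)' &= [S_{XZ}S_{ZZ}^{-1}S_{ZX}]^{-1}[S_{XZ}S_{ZZ}^{-1}S_{Z\epsilon\epsilon Z}S_{ZZ}^{-1}S_{ZX}][S_{XZ}S_{ZZ}^{-1}S_{ZX}]^{-1}, \\ n(\widetilde{\beta} - \beta)(\widetilde{\beta} - \beta)' &= [S_{XZ}S_{Z\Omega Z}^{-1}S_{ZX}]^{-1}[S_{XZ}S_{Z\Omega Z}^{-1}S_{Z\epsilon\epsilon Z}S_{Z\Omega Z}^{-1}S_{ZX}][S_{XZ}S_{Z\Omega Z}^{-1}S_{ZX}]^{-1}. \end{aligned} \]

The asymptotic covariance matrices of $\sqrt{n}\widehat{\beta}$ and $\sqrt{n}\widetilde{\beta}$ can then, under suitable regularity conditions, be written as [see Bowden and Turkington (1984, pp. 26, 69)]

\[ \begin{aligned} \mathrm{aV}(\sqrt{n}\widehat{\beta}) &= \lim E[n(\widehat{\beta} - \beta)(\widehat{\beta} - \beta)'] = \mathrm{plim}[n(\widehat{\beta} - \beta)(\widehat{\beta} - \beta)'], \\ \mathrm{aV}(\sqrt{n}\widetilde{\beta}) &= \lim E[n(\widetilde{\beta} - \beta)(\widetilde{\beta} - \beta)'] = \mathrm{plim}[n(\widetilde{\beta} - \beta)(\widetilde{\beta} - \beta)']. \end{aligned} \]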
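The Sargan-Hansen test described earlier in this section can be sketched as follows. The function name `sargan_hansen` and the simulated data are hypothetical. The sketch also makes one convention explicit as an assumption: the moments are evaluated at the second-step estimate, while the weighting matrix is built from first-step (2SLS) residuals, which is one common way of computing the statistic.

```python
import numpy as np

def sargan_hansen(y, X, Z):
    """Return (J, degrees of freedom) for the moment conditions E(z' eps) = 0."""
    ZX, Zy = Z.T @ X, Z.T @ y
    W1 = np.linalg.inv(Z.T @ Z)
    b1 = np.linalg.solve(ZX.T @ W1 @ ZX, ZX.T @ W1 @ Zy)        # first step (2SLS)
    e1 = y - X @ b1
    S_inv = np.linalg.inv((Z * e1[:, None] ** 2).T @ Z)         # [sum_j z_j' e_j^2 z_j]^{-1}
    b2 = np.linalg.solve(ZX.T @ S_inv @ ZX, ZX.T @ S_inv @ Zy)  # second step
    m = Z.T @ (y - X @ b2)                                      # sum_j z_j' e_j at the second step
    return float(m @ S_inv @ m), Z.shape[1] - X.shape[1]

rng = np.random.default_rng(2)
n = 20_000
Z = rng.normal(size=(n, 3))
h = rng.normal(size=n)                                          # makes x endogenous
X = (Z.sum(axis=1) + h + rng.normal(size=n))[:, None]
y_valid = 2.0 * X[:, 0] + h                                     # all three instruments valid
y_invalid = y_valid + 0.5 * Z[:, 2]                             # third instrument enters the error

J_valid, df = sargan_hansen(y_valid, X, Z)
J_invalid, _ = sargan_hansen(y_invalid, X, Z)
print(df, J_valid, J_invalid)
```

With valid instruments `J_valid` should behave like a draw from a $\chi^2$ distribution with `df` degrees of freedom, whereas `J_invalid` should be far out in the tail.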

Since $S_{Z\epsilon\epsilon Z}$ and $S_{Z\Omega Z}$ coincide asymptotically, we get, letting bars denote plims,

\[ \begin{aligned} \mathrm{aV}(\sqrt{n}\widehat{\beta}) &= [\bar{S}_{XZ}\bar{S}_{ZZ}^{-1}\bar{S}_{ZX}]^{-1}[\bar{S}_{XZ}\bar{S}_{ZZ}^{-1}\bar{S}_{Z\Omega Z}\bar{S}_{ZZ}^{-1}\bar{S}_{ZX}][\bar{S}_{XZ}\bar{S}_{ZZ}^{-1}\bar{S}_{ZX}]^{-1}, \\ \mathrm{aV}(\sqrt{n}\widetilde{\beta}) &= [\bar{S}_{XZ}\bar{S}_{Z\Omega Z}^{-1}\bar{S}_{ZX}]^{-1}. \end{aligned} \]

Replacing the plims $\bar{S}_{XZ}$, $\bar{S}_{ZX}$, $\bar{S}_{ZZ}$ and $\bar{S}_{Z\Omega Z}$ by their sample counterparts, $S_{XZ}$, $S_{ZX}$, $S_{ZZ}$ and $S_{Z\hat{\epsilon}\hat{\epsilon}Z}$, and dividing by $n$, we get the following estimators of the asymptotic covariance matrices of $\widehat{\beta}$ and $\widetilde{\beta}$:

\[ \begin{aligned} \widehat{V}(\widehat{\beta}) &= \frac{1}{n}[S_{XZ}S_{ZZ}^{-1}S_{ZX}]^{-1}[S_{XZ}S_{ZZ}^{-1}S_{Z\hat{\epsilon}\hat{\epsilon}Z}S_{ZZ}^{-1}S_{ZX}][S_{XZ}S_{ZZ}^{-1}S_{ZX}]^{-1} = [X'P_Z X]^{-1}[X'P_Z\widehat{\epsilon}\widehat{\epsilon}'P_Z X][X'P_Z X]^{-1}, \\ \widehat{V}(\widetilde{\beta}) &= \frac{1}{n}[S_{XZ}S_{Z\hat{\epsilon}\hat{\epsilon}Z}^{-1}S_{ZX}]^{-1} = [X'Z(Z'\widehat{\epsilon}\widehat{\epsilon}'Z)^{-1}Z'X]^{-1} = [X'P_Z(\widehat{\epsilon}\widehat{\epsilon}')X]^{-1}. \end{aligned} \]

These are the generic expressions for estimating variances and covariances of the GMM estimators (9.2.28) and (9.2.29). When calculating $\widetilde{\beta}$ in practice, we replace $P_Z(\Omega)$ by $P_Z(\widehat{\epsilon}\widehat{\epsilon}') = Z(Z'\widehat{\epsilon}\widehat{\epsilon}'Z)^{-1}Z'$ [see White (1982, 1984)].

9.2.6 Estimation by GMM, combining differences and levels

Following this general description of the GMM, we can construct estimators of $\beta$ by replacing the expectations in (9.2.14)-(9.2.17) by sample means taken over $i$ and minimizing their distances from the zero vector. There are several ways in which this idea can be operationalized. We can

(i) Estimate equations in differences, with instruments in levels, using (9.2.14) and/or (9.2.15) for (a) one $(t, \theta)$ and one $p$, (b) one $(t, \theta)$ and several $p$, or (c) several $(t, \theta)$ and several $p$ jointly.

(ii) Estimate equations in levels, with instruments in differences, using (9.2.16) and/or (9.2.17) for (a) one $t$ and one $(p, q)$, (b) one $t$ and several $(p, q)$, or (c) several $t$ and several $(p, q)$ jointly.

In cases (ia) and (iia), we obtain an empirical distance equal to the zero vector, so no minimization is needed. This corresponds, formally, to the situation with exact identification (exactly as many OCs as needed) in classical IV estimation. In cases (ib), (ic), (iib), and (iic), we have, in a formal sense, overidentification (more than the necessary number of OCs), and therefore construct compromise estimators by minimizing appropriate quadratic forms in the corresponding empirical distances. We now consider cases (a), (b), and (c) for the differenced equation and the level equation.

(a) Simple period specific IV estimators

Equation in differences, IVs in levels. The sample mean counterpart to (9.2.14) and (9.2.15) for one $(t, \theta, p)$ gives the estimator

\[ \widehat{\beta}_{p(t\theta)} = \Big[\sum_{i=1}^{N} z_{ip}(\Delta x_{it\theta})'\Big]^{-1}\Big[\sum_{i=1}^{N} z_{ip}(\Delta y_{it\theta})\Big], \qquad (9.2.30) \]

where $z_{ip} = x_{ip}$ or equal to $x_{ip}$ with one element replaced by $y_{ip}$.

Equation in levels, IVs in differences. The sample mean counterpart to (9.2.16) and (9.2.17) for one $(t, p, q)$ gives the estimator

$$\hat\beta_{(pq)t} = \Big[\sum_{i=1}^{N} (\Delta z_{ipq})' x_{it}\Big]^{-1}\Big[\sum_{i=1}^{N} (\Delta z_{ipq})' y_{it}\Big], \tag{9.2.31}$$

where $\Delta z_{ipq} = \Delta x_{ipq}$ or equal to $\Delta x_{ipq}$ with one element replaced by $\Delta y_{ipq}$. Using (9.2.14)-(9.2.17) we note that

• When $z_{ip} = x_{ip}$ $(p \neq \theta, t)$ and $\Delta z_{ipq} = \Delta x_{ipq}$ $(t \neq p, q)$, Assumption (B2) is necessary for consistency of $\hat\beta_{p(t\theta)}$ and $\hat\beta_{(pq)t}$. If $y_{ip}$ is included in $z_{ip}$ $(p \neq \theta, t)$, and $\Delta y_{ipq}$ is included in $\Delta z_{ipq}$ $(t \neq p, q)$, Assumption (C2) is also necessary for consistency of $\hat\beta_{p(t\theta)}$ and $\hat\beta_{(pq)t}$.

• Assumptions (D1) and (D2) are necessary for consistency of $\hat\beta_{(pq)t}$, but they are not necessary for consistency of $\hat\beta_{p(t\theta)}$.

Since the correlation between the regressors and the instruments, say between $z_{ip}$ and $\Delta x_{it\theta}$, may be low, (9.2.30) and (9.2.31) may suffer from the weak instrument problem, discussed in Nelson and Startz (1990), Davidson and MacKinnon (1993, pp ), and Staiger and Stock (1997). The following estimators may be an answer to this problem.

(b) Period specific GMM estimators

We next consider estimation of $\beta$ in (9.2.4) for one pair of periods $(t, \theta)$, utilizing as IVs for $\Delta x_{it\theta}$ all admissible $x_{ip}$'s, and estimation of $\beta$ in (9.2.3), for one period $(t)$, utilizing as IVs for $x_{it}$ all admissible $\Delta x_{ipq}$'s. To formalize this, we define the selection and differencing matrices

$P_{t\theta}$ = $((T-2) \times T)$ matrix obtained by deleting from the $T$-dimensional identity matrix rows $t$ and $\theta$,

$$D_t = \begin{pmatrix} d_{21} \\ \vdots \\ d_{t-1,t-2} \\ d_{t+1,t-1} \\ d_{t+2,t+1} \\ \vdots \\ d_{T,T-1} \end{pmatrix}, \qquad t, \theta = 1, \dots, T,$$

where $d_{t\theta}$ is the $(1 \times T)$ vector with element $t$ equal to $1$, element $\theta$ equal to $-1$ and zero otherwise, so that $D_t$ is the one-period $[(T-2) \times T]$ differencing matrix, except that $d_{t,t-1}$ and $d_{t+1,t}$ are replaced by their sum, $d_{t+1,t-1}$.⁵

⁵ The two-period difference is effective only for $t = 2, \dots, T-1$.

We use the notation $y_i = (y_{i1}, \dots, y_{iT})'$, $X_i = (x_{i1}', \dots, x_{iT}')'$, $y_{i(t\theta)} = P_{t\theta}\,y_i$, $X_{i(t\theta)} = P_{t\theta}\,X_i$, $x_{i(t\theta)} = \operatorname{vec}(X_{i(t\theta)})'$, $\Delta y_{i(t)} = D_t\,y_i$, $\Delta X_{i(t)} = D_t\,X_i$, $\Delta x_{i(t)} = \operatorname{vec}(\Delta X_{i(t)})'$, etc. Here $X_{i(t\theta)}$ denotes the $[(T-2) \times K]$ matrix of $x$ levels obtained by deleting rows $t$ and $\theta$ from $X_i$, and $\Delta X_{i(t)}$ denotes the $[(T-2) \times K]$ matrix of $x$ differences obtained

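by stacking all one-period differences between rows of $X_i$ not including period $t$ and the single two-period difference between the rows for periods $t+1$ and $t-1$. The vectors $y_{i(t\theta)}$ and $\Delta y_{i(t)}$ are constructed from $y_i$ in a similar way. Stacking $y_{i(t\theta)}'$, $\Delta y_{i(t)}'$, $x_{i(t\theta)}$, and $\Delta x_{i(t)}$ by individuals, we get

$$Y_{(t\theta)} = \begin{pmatrix} y_{1(t\theta)}' \\ \vdots \\ y_{N(t\theta)}' \end{pmatrix}, \quad \Delta Y_{(t)} = \begin{pmatrix} \Delta y_{1(t)}' \\ \vdots \\ \Delta y_{N(t)}' \end{pmatrix}, \quad X_{(t\theta)} = \begin{pmatrix} x_{1(t\theta)} \\ \vdots \\ x_{N(t\theta)} \end{pmatrix}, \quad \Delta X_{(t)} = \begin{pmatrix} \Delta x_{1(t)} \\ \vdots \\ \Delta x_{N(t)} \end{pmatrix},$$

which have dimensions $(N \times (T-2))$, $(N \times (T-2))$, $(N \times (T-2)K)$, and $(N \times (T-2)K)$, respectively. These four matrices contain the IVs to be considered below.

Equation in differences, IVs in levels. Write (9.2.4) as $\Delta y_{t\theta} = \Delta X_{t\theta}\,\beta + \Delta\epsilon_{t\theta}$, where $\Delta y_{t\theta} = (\Delta y_{1t\theta}, \dots, \Delta y_{Nt\theta})'$, $\Delta X_{t\theta} = (\Delta x_{1t\theta}', \dots, \Delta x_{Nt\theta}')'$, etc. Using $X_{(t\theta)}$ as IV matrix for $\Delta X_{t\theta}$, we obtain the following estimator of $\beta$, specific to period $(t, \theta)$ differences and utilizing all admissible $x$ level IVs,

$$\begin{aligned}
\hat\beta_{x(t\theta)} &= \Big[(\Delta X_{t\theta})' X_{(t\theta)} \big(X_{(t\theta)}' X_{(t\theta)}\big)^{-1} X_{(t\theta)}' (\Delta X_{t\theta})\Big]^{-1} \Big[(\Delta X_{t\theta})' X_{(t\theta)} \big(X_{(t\theta)}' X_{(t\theta)}\big)^{-1} X_{(t\theta)}' (\Delta y_{t\theta})\Big] \\
&= \Big[\big[\textstyle\sum_i (\Delta x_{it\theta})' x_{i(t\theta)}\big]\big[\sum_i x_{i(t\theta)}' x_{i(t\theta)}\big]^{-1}\big[\sum_i x_{i(t\theta)}' (\Delta x_{it\theta})\big]\Big]^{-1} \\
&\quad\times \Big[\big[\textstyle\sum_i (\Delta x_{it\theta})' x_{i(t\theta)}\big]\big[\sum_i x_{i(t\theta)}' x_{i(t\theta)}\big]^{-1}\big[\sum_i x_{i(t\theta)}' (\Delta y_{it\theta})\big]\Big].
\end{aligned} \tag{9.2.32}$$

It exists if $X_{(t\theta)}' X_{(t\theta)}$ has rank $(T-2)K$, which requires $N \ge (T-2)K$. This GMM estimator, which exemplifies (9.2.28), minimizes the quadratic form:

$$\big(N^{-1} X_{(t\theta)}' \Delta\epsilon_{t\theta}\big)' \big(N^{-2} X_{(t\theta)}' X_{(t\theta)}\big)^{-1} \big(N^{-1} X_{(t\theta)}' \Delta\epsilon_{t\theta}\big).$$

The weight matrix $(N^{-2} X_{(t\theta)}' X_{(t\theta)})^{-1}$ is proportional to the inverse of the (asymptotic) covariance matrix of $N^{-1} X_{(t\theta)}' \Delta\epsilon_{t\theta}$ when $\Delta\epsilon_{it\theta}$ is IID across $i$, possibly with a variance depending on $(t, \theta)$. The consistency of $\hat\beta_{x(t\theta)}$ relies on Assumptions (B2) and (E). Interesting modifications of $\hat\beta_{x(t\theta)}$ are:

(1) If $\operatorname{var}(\Delta\epsilon_{it\theta}) = \omega_{it\theta}$ varies with $i$ and is known, we can increase the efficiency of (9.2.32) by replacing $x_{i(t\theta)}' x_{i(t\theta)}$ by $x_{i(t\theta)}'\,\omega_{it\theta}\,x_{i(t\theta)}$, which gives an asymptotically optimal GMM estimator.⁶ Estimation of $\sum_i x_{i(t\theta)}'\,\omega_{it\theta}\,x_{i(t\theta)}$ for unknown $\omega_{it\theta}$ proceeds as in (9.2.29).

⁶ For a more general treatment of asymptotic efficiency in estimation with moment conditions, see Chamberlain (1987) and Newey and McFadden (1994).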

(2) Instead of using $X_{(t\theta)}$ as IV matrix for $\Delta X_{t\theta}$, as in (9.2.32), we may use $(X_{(t\theta)}\ \ Y_{(t\theta)})$.

Equation in levels, IVs in differences. Write (9.2.3) as $y_t = c\,e_N + X_t\,\beta + \epsilon_t$, where $e_N$ is the $N$-vector of ones, $y_t = (y_{1t}, \dots, y_{Nt})'$, $X_t = (x_{1t}', \dots, x_{Nt}')'$, etc. Using $\Delta X_{(t)}$ as IV matrix for $X_t$, we get the following estimator of $\beta$, specific to period $t$ levels, utilizing all admissible $x$ difference IVs,

$$\begin{aligned}
\hat\beta_{x(t)} &= \Big[X_t' (\Delta X_{(t)}) \big((\Delta X_{(t)})' (\Delta X_{(t)})\big)^{-1} (\Delta X_{(t)})' X_t\Big]^{-1} \Big[X_t' (\Delta X_{(t)}) \big((\Delta X_{(t)})' (\Delta X_{(t)})\big)^{-1} (\Delta X_{(t)})' y_t\Big] \\
&= \Big[\big[\textstyle\sum_i x_{it}' (\Delta x_{i(t)})\big]\big[\sum_i (\Delta x_{i(t)})' (\Delta x_{i(t)})\big]^{-1}\big[\sum_i (\Delta x_{i(t)})' x_{it}\big]\Big]^{-1} \\
&\quad\times \Big[\big[\textstyle\sum_i x_{it}' (\Delta x_{i(t)})\big]\big[\sum_i (\Delta x_{i(t)})' (\Delta x_{i(t)})\big]^{-1}\big[\sum_i (\Delta x_{i(t)})' y_{it}\big]\Big].
\end{aligned} \tag{9.2.33}$$

It exists if $(\Delta X_{(t)})' (\Delta X_{(t)})$ has rank $(T-2)K$, which again requires $N \ge (T-2)K$. This GMM estimator, which also exemplifies (9.2.28), minimizes the quadratic form:

$$\big(N^{-1} (\Delta X_{(t)})' \epsilon_t\big)' \big[N^{-2} (\Delta X_{(t)})' (\Delta X_{(t)})\big]^{-1} \big(N^{-1} (\Delta X_{(t)})' \epsilon_t\big).$$

The weight matrix $[N^{-2} (\Delta X_{(t)})' (\Delta X_{(t)})]^{-1}$ is proportional to the inverse of the (asymptotic) covariance matrix of $N^{-1} (\Delta X_{(t)})' \epsilon_t$ when $\epsilon_{it}$ is IID across $i$, possibly with a variance depending on $t$. The consistency of $\hat\beta_{x(t)}$ relies on (B3), (D1), (D2), and the validity of (E3) for all $(p, q)$. Interesting modifications of $\hat\beta_{x(t)}$ are:

(1) If $\operatorname{var}(\epsilon_{it}) = \omega_{it}$ varies with $i$ and is known, we can increase the efficiency of (9.2.33) by replacing $(\Delta x_{i(t)})' (\Delta x_{i(t)})$ by $(\Delta x_{i(t)})'\,\omega_{it}\,(\Delta x_{i(t)})$, which gives an asymptotically optimal GMM estimator. Estimation of $\sum_i (\Delta x_{i(t)})'\,\omega_{it}\,(\Delta x_{i(t)})$ for unknown $\omega_{it}$ proceeds as in (9.2.29).

(2) Instead of using $\Delta X_{(t)}$ as IV matrix for $X_t$, as in (9.2.33), we may use $(\Delta X_{(t)}\ \ \Delta Y_{(t)})$.

If we replace assumptions (B2) and (C2) by (B1) or (C1) with arbitrary $\tau$, we must ensure that the IVs have a lead or lag of at least $\tau + 1$ periods to the regressor, to get clear of the $\tau$ period memory of the MA($\tau$) process. Formally, we then replace $P_{t\theta}$ and $D_t$ by⁷

⁷ The dimension of these matrices depends in general on $\tau$.

$P_{t\theta(\tau)}$ = matrix obtained by deleting from the $T$-dimensional identity matrix rows $\theta-\tau, \dots, \theta+\tau$ and $t-\tau, \dots, t+\tau$,

$$D_{t(\tau)} = \begin{pmatrix} d_{21} \\ \vdots \\ d_{t-\tau-1,\,t-\tau-2} \\ d_{t+\tau+1,\,t-\tau-1} \\ d_{t+\tau+2,\,t+\tau+1} \\ \vdots \\ d_{T,T-1} \end{pmatrix}, \qquad t, \theta = 1, \dots, T,$$

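and otherwise proceed as above.

(c) Composite GMM estimators

We finally consider GMM estimation of $\beta$ when we combine all essential OCs delimited by Propositions 1 and 2. We here assume that either (B1) and (C1) with $\tau = 0$ or (B2) and (C2) are satisfied. If $\tau > 0$, we can proceed as above, but must ensure that the variables in the IV matrix have a lead or lag of at least $\tau + 1$ periods to the regressor, to get clear of the $\tau$ period memory of the MA($\tau$) process, confer Part B of Propositions 1 and 2.

Equation in differences, IVs in levels. Consider (9.2.5) for all $\theta = t-1$ and all $\theta = t-2$. These $(T-1) + (T-2)$ equations stacked for individual $i$ read

$$\begin{pmatrix} \Delta y_{i21} \\ \Delta y_{i32} \\ \vdots \\ \Delta y_{i,T,T-1} \\ \Delta y_{i31} \\ \Delta y_{i42} \\ \vdots \\ \Delta y_{i,T,T-2} \end{pmatrix} = \begin{pmatrix} \Delta x_{i21} \\ \Delta x_{i32} \\ \vdots \\ \Delta x_{i,T,T-1} \\ \Delta x_{i31} \\ \Delta x_{i42} \\ \vdots \\ \Delta x_{i,T,T-2} \end{pmatrix} \beta + \begin{pmatrix} \Delta\epsilon_{i21} \\ \Delta\epsilon_{i32} \\ \vdots \\ \Delta\epsilon_{i,T,T-1} \\ \Delta\epsilon_{i31} \\ \Delta\epsilon_{i42} \\ \vdots \\ \Delta\epsilon_{i,T,T-2} \end{pmatrix}, \tag{9.2.34}$$

or, compactly, $\Delta y_i = (\Delta X_i)\beta + \Delta\epsilon_i$. The IV matrix, according to Proposition 1, is the $((2T-3) \times KT(T-2))$ matrix⁸

$$Z_i = \begin{pmatrix}
x_{i(21)} & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & x_{i(32)} & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & x_{i(T,T-1)} & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & x_{i2} & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & x_{i,T-1}
\end{pmatrix}. \tag{9.2.35}$$

Let $\Delta y = [(\Delta y_1)', \dots, (\Delta y_N)']'$, $\Delta\epsilon = [(\Delta\epsilon_1)', \dots, (\Delta\epsilon_N)']'$, $\Delta X = [(\Delta X_1)', \dots, (\Delta X_N)']'$, $Z = [Z_1', \dots, Z_N']'$. The GMM estimator corresponding to $E[Z_i'(\Delta\epsilon_i)] = 0_{T(T-2)K,1}$, which minimizes $[N^{-1}(\Delta\epsilon)'Z](N^{-2}V)^{-1}[N^{-1}Z'(\Delta\epsilon)]$ for $V = Z'Z$, can be written as

$$\begin{aligned}
\hat\beta_{Dx} &= \big[(\Delta X)'Z(Z'Z)^{-1}Z'(\Delta X)\big]^{-1}\big[(\Delta X)'Z(Z'Z)^{-1}Z'(\Delta y)\big] \\
&= \Big[\big[\textstyle\sum_i (\Delta X_i)'Z_i\big]\big[\sum_i Z_i'Z_i\big]^{-1}\big[\sum_i Z_i'(\Delta X_i)\big]\Big]^{-1} \Big[\big[\textstyle\sum_i (\Delta X_i)'Z_i\big]\big[\sum_i Z_i'Z_i\big]^{-1}\big[\sum_i Z_i'(\Delta y_i)\big]\Big].
\end{aligned} \tag{9.2.36}$$

⁸ Formally, we here use different IVs for the $(T-1) + (T-2)$ different equations in (9.2.4), with $\beta$ as a common slope coefficient.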
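The construction of the block-diagonal IV matrix and the estimator (9.2.36) can be sketched numerically. The following is a minimal illustration in Python with NumPy, assuming a single regressor ($K = 1$), white-noise measurement error, and a latent regressor that follows an AR(1) process around the individual effect. The simulation design, the parameter values and the function names `simulate_panel` and `gmm_diff_levels` are assumptions made for this sketch only and do not come from the chapter.

```python
import numpy as np

def simulate_panel(N, T, beta, rng):
    """Hypothetical design: y = c + beta*xi + alpha + u, x = xi + v (K = 1)."""
    alpha = rng.normal(size=N)                      # individual effect
    xi = np.empty((N, T))                           # latent regressor, AR(1) around alpha
    xi[:, 0] = alpha + 2.0 * rng.normal(size=N)
    for t in range(1, T):
        xi[:, t] = 0.5 * alpha + 0.5 * xi[:, t - 1] + 2.0 * rng.normal(size=N)
    x = xi + rng.normal(size=(N, T))                # observed regressor, white-noise error
    y = 1.0 + beta * xi + alpha[:, None] + 0.5 * rng.normal(size=(N, T))
    return y, x

def gmm_diff_levels(y, x):
    """One-step GMM in the spirit of (9.2.36): differenced equations, level IVs."""
    N, T = x.shape
    # (t, theta) pairs: all one-period, then all two-period differences (0-based)
    pairs = [(t, t - 1) for t in range(1, T)] + [(t, t - 2) for t in range(2, T)]
    # essential IVs: every other period for one-period differences,
    # the single intermediate period for two-period differences
    ivs = [[p for p in range(T) if p not in (t, s)] if t - s == 1 else [t - 1]
           for t, s in pairs]
    L = sum(len(p) for p in ivs)                    # T(T-2) columns when K = 1
    Z = np.zeros((N, len(pairs), L))                # block-diagonal Z_i for every i
    col = 0
    for e, p in enumerate(ivs):
        Z[:, e, col:col + len(p)] = x[:, p]
        col += len(p)
    t_idx = [t for t, _ in pairs]
    s_idx = [s for _, s in pairs]
    dx = x[:, t_idx] - x[:, s_idx]                  # stacked differenced regressor
    dy = y[:, t_idx] - y[:, s_idx]                  # stacked differenced regressand
    ZX = np.einsum("nel,ne->l", Z, dx)              # sum_i Z_i'(dX_i)
    Zy = np.einsum("nel,ne->l", Z, dy)              # sum_i Z_i'(dy_i)
    ZZ = np.einsum("nel,nem->lm", Z, Z)             # sum_i Z_i'Z_i
    return float(ZX @ np.linalg.solve(ZZ, Zy) / (ZX @ np.linalg.solve(ZZ, ZX)))

rng = np.random.default_rng(0)
y, x = simulate_panel(N=5000, T=4, beta=1.0, rng=rng)
dx, dy = np.diff(x, axis=1).ravel(), np.diff(y, axis=1).ravel()
b_ols_fd = float(dx @ dy / (dx @ dx))               # OLS on first differences
b_gmm = gmm_diff_levels(y, x)
print(f"OLS on differences: {b_ols_fd:.3f}, GMM: {b_gmm:.3f}")
```

Under this design, OLS on first differences removes the individual effect but remains attenuated by the measurement error, whereas the GMM estimate should lie close to the true slope used in the simulation. The instruments are relevant here only because the latent regressor is autocorrelated, which is one way to see why the weak instrument concern raised above matters in practice.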

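It is possible to include not only the essential OCs, but also the redundant OCs when constructing this GMM estimator. The singularity of $Z'Z$ when including all OCs, due to the linear dependence between the redundant and the essential OCs, may be treated by replacing standard inverses in the estimation formulae by generalised (Moore-Penrose) inverses. The resulting estimator is $\hat\beta_{Dx}$, which is shown formally in Biørn and Klette (1998).

If $\Delta\epsilon$ has a non-scalar covariance matrix, a more efficient GMM estimator is obtained for $V = V_{Z(\Delta\epsilon)} = E[Z'(\Delta\epsilon)(\Delta\epsilon)'Z]$, which gives

$$\tilde\beta_{Dx} = \big[(\Delta X)'Z\,V_{Z(\Delta\epsilon)}^{-1}\,Z'(\Delta X)\big]^{-1}\big[(\Delta X)'Z\,V_{Z(\Delta\epsilon)}^{-1}\,Z'(\Delta y)\big]. \tag{9.2.37}$$

We can estimate $\frac{1}{N}V_{Z(\Delta\epsilon)}$ consistently from the residuals obtained from (9.2.36), $\widehat{\Delta\epsilon}_i = \Delta y_i - (\Delta X_i)\hat\beta_{Dx}$, by means of [see White (1984, sections IV.3 and VI.2) and (1986, section 3)]

$$\frac{\widehat{V}_{Z(\Delta\epsilon)}}{N} = \frac{1}{N}\sum_{i=1}^{N} Z_i'(\widehat{\Delta\epsilon}_i)(\widehat{\Delta\epsilon}_i)'Z_i. \tag{9.2.38}$$

Inserting (9.2.38) in (9.2.37), we get the asymptotically optimal (feasible) GMM estimator⁹

$$\begin{aligned}
\tilde\beta_{Dx} &= \Big[\big[\textstyle\sum_i (\Delta X_i)'Z_i\big]\big[\sum_i Z_i'\,\widehat{\Delta\epsilon}_i\,\widehat{\Delta\epsilon}_i'\,Z_i\big]^{-1}\big[\sum_i Z_i'(\Delta X_i)\big]\Big]^{-1} \\
&\quad\times \Big[\big[\textstyle\sum_i (\Delta X_i)'Z_i\big]\big[\sum_i Z_i'\,\widehat{\Delta\epsilon}_i\,\widehat{\Delta\epsilon}_i'\,Z_i\big]^{-1}\big[\sum_i Z_i'(\Delta y_i)\big]\Big].
\end{aligned} \tag{9.2.39}$$

⁹ It is possible to include the redundant OCs also when constructing this GMM estimator. Using (Moore-Penrose) inverses, the estimator remains the same.

These estimators can be modified by extending in (9.2.35) all $x_{i(t,t-1)}$ to $(x_{i(t,t-1)}\ \ y_{i(t,t-1)}')$ and all $x_{it}$ to $(x_{it}\ \ y_{it})$, which also exploit the OCs in the $y$s.

Equation in levels, IVs in differences. Consider next the $T$ stacked level equations for individual $i$ [confer (9.2.3)]

$$\begin{pmatrix} y_{i1} \\ \vdots \\ y_{iT} \end{pmatrix} = \begin{pmatrix} c \\ \vdots \\ c \end{pmatrix} + \begin{pmatrix} x_{i1} \\ \vdots \\ x_{iT} \end{pmatrix} \beta + \begin{pmatrix} \epsilon_{i1} \\ \vdots \\ \epsilon_{iT} \end{pmatrix}, \tag{9.2.40}$$

or, compactly, $y_i = e_T c + X_i \beta + \epsilon_i$. The IV matrix, according to Proposition 2, is the $(T \times T(T-2)K)$ matrix¹⁰

$$\Delta Z_i = \begin{pmatrix} \Delta x_{i(1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \Delta x_{i(T)} \end{pmatrix}. \tag{9.2.41}$$

¹⁰ Again, we formally use different IVs for different equations, considering (9.2.40) as $T$ different equations with $\beta$ as a common slope coefficient.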
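The two-step logic of (9.2.36)-(9.2.39) (a first step with weight matrix $Z'Z$, residuals from that step, and a second step weighted by the residual-based estimate of $V$) applies equally to the level equation with differenced IVs. A minimal, generic sketch in Python with NumPy follows. The function name `gmm_two_step`, the array layout and the simulated heteroskedastic data are assumptions made for illustration; the arrays `Z`, `X` and `y` play the roles of $Z_i$, $\Delta X_i$ and $\Delta y_i$ (or $\Delta Z_i$, $X_i$ and $y_i$) stacked over individuals.

```python
import numpy as np

def gmm_two_step(Z, X, y):
    """Two-step GMM for stacked equations.

    Z: (N, E, L) instruments, X: (N, E, K) regressors, y: (N, E) regressand,
    where E is the number of stacked equations per individual.
    """
    ZX = np.einsum("nel,nek->lk", Z, X)             # sum_i Z_i'X_i
    Zy = np.einsum("nel,ne->l", Z, y)               # sum_i Z_i'y_i
    ZZ = np.einsum("nel,nem->lm", Z, Z)             # sum_i Z_i'Z_i

    def solve(V):
        # [ZX' V^{-1} ZX]^{-1} [ZX' V^{-1} Zy] for a given weight matrix V^{-1}
        A = ZX.T @ np.linalg.solve(V, ZX)
        return np.linalg.solve(A, ZX.T @ np.linalg.solve(V, Zy))

    b1 = solve(ZZ)                                  # step one, weight (Z'Z)^{-1}
    resid = y - np.einsum("nek,k->ne", X, b1)       # step-one residuals
    g = np.einsum("nel,ne->nl", Z, resid)           # Z_i' e_i for each individual
    V_hat = g.T @ g                                 # sum_i Z_i' e_i e_i' Z_i
    b2 = solve(V_hat)                               # step two, robust weight matrix
    cov_b2 = np.linalg.inv(ZX.T @ np.linalg.solve(V_hat, ZX))
    return b1, b2, cov_b2

# Hypothetical data: one endogenous regressor, three instruments, two equations
rng = np.random.default_rng(1)
N, E, L = 2000, 2, 3
Z = rng.normal(size=(N, E, L))
common = rng.normal(size=(N, E))                    # source of endogeneity
X = (Z.sum(axis=2) + common + rng.normal(size=(N, E)))[:, :, None]
scale = 1.0 + np.abs(Z[:, :, 0])                    # heteroskedastic disturbances
y = 2.0 * X[:, :, 0] + scale * (common + rng.normal(size=(N, E)))

b1, b2, cov_b2 = gmm_two_step(Z, X, y)
print(b1, b2, np.sqrt(np.diag(cov_b2)))
```

The common factor $1/N$ in (9.2.38) cancels in the estimator, so the sketch works with unscaled sums. With exactly as many instruments as regressors the weight matrix drops out and the two steps coincide, which mirrors the remark on exact identification in cases (ia) and (iia).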

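Let $y = [y_1', \dots, y_N']'$, $\epsilon = [\epsilon_1', \dots, \epsilon_N']'$, $X = [X_1', \dots, X_N']'$, $\Delta Z = [(\Delta Z_1)', \dots, (\Delta Z_N)']'$. The GMM estimator corresponding to $E[(\Delta Z_i)'\epsilon_i] = 0_{T(T-2)K,1}$, which minimizes $[N^{-1}\epsilon'(\Delta Z)](N^{-2}V)^{-1}[N^{-1}(\Delta Z)'\epsilon]$ for $V = (\Delta Z)'(\Delta Z)$, can be written as

$$\begin{aligned}
\hat\beta_{Lx} &= \big[X'(\Delta Z)[(\Delta Z)'(\Delta Z)]^{-1}(\Delta Z)'X\big]^{-1}\big[X'(\Delta Z)[(\Delta Z)'(\Delta Z)]^{-1}(\Delta Z)'y\big] \\
&= \Big[\big[\textstyle\sum_i X_i'(\Delta Z_i)\big]\big[\sum_i (\Delta Z_i)'(\Delta Z_i)\big]^{-1}\big[\sum_i (\Delta Z_i)'X_i\big]\Big]^{-1} \Big[\big[\textstyle\sum_i X_i'(\Delta Z_i)\big]\big[\sum_i (\Delta Z_i)'(\Delta Z_i)\big]^{-1}\big[\sum_i (\Delta Z_i)'y_i\big]\Big].
\end{aligned} \tag{9.2.42}$$

If $\epsilon$ has a non-scalar covariance matrix, a more efficient GMM estimator is obtained for $V = V_{(\Delta Z)\epsilon} = E[(\Delta Z)'\epsilon\epsilon'(\Delta Z)]$, which gives

$$\tilde\beta_{Lx} = \big[X'(\Delta Z)\,V_{(\Delta Z)\epsilon}^{-1}\,(\Delta Z)'X\big]^{-1}\big[X'(\Delta Z)\,V_{(\Delta Z)\epsilon}^{-1}\,(\Delta Z)'y\big]. \tag{9.2.43}$$

We can estimate $\frac{1}{N}V_{(\Delta Z)\epsilon}$ consistently from the residuals obtained from (9.2.42), by

$$\frac{\widehat{V}_{(\Delta Z)\epsilon}}{N} = \frac{1}{N}\sum_{i=1}^{N} (\Delta Z_i)'\,\hat\epsilon_i\,\hat\epsilon_i'\,(\Delta Z_i). \tag{9.2.44}$$

Inserting (9.2.44) in (9.2.43), we get the asymptotically optimal GMM estimator

$$\begin{aligned}
\tilde\beta_{Lx} &= \Big[\big[\textstyle\sum_i X_i'(\Delta Z_i)\big]\big[\sum_i (\Delta Z_i)'\,\hat\epsilon_i\,\hat\epsilon_i'\,(\Delta Z_i)\big]^{-1}\big[\sum_i (\Delta Z_i)'X_i\big]\Big]^{-1} \\
&\quad\times \Big[\big[\textstyle\sum_i X_i'(\Delta Z_i)\big]\big[\sum_i (\Delta Z_i)'\,\hat\epsilon_i\,\hat\epsilon_i'\,(\Delta Z_i)\big]^{-1}\big[\sum_i (\Delta Z_i)'y_i\big]\Big].
\end{aligned} \tag{9.2.45}$$

These estimators can be modified by extending all $\Delta x_{i(t)}$ to $(\Delta x_{i(t)}\ \ \Delta y_{i(t)}')$ in (9.2.41), which also exploit the OCs in the $y$s.

Other moment estimators, which will not be discussed specifically in the present EIV context, are considered for situations with predetermined IVs in Ziliak (1997), with the purpose of reducing the finite sample bias of asymptotically optimal GMM estimators.

9.2.7 Extensions. Modifications

All the methods presented so far rely on differencing as a way of eliminating the individual effects, either in the equation or in the instruments. This is convenient for the case where the individual heterogeneity has an unspecified correlation with the latent regressor vector and for the fixed effects case. Other ways of eliminating this effect in such situations are discussed in Wansbeek (2001). Their essence is to stack the matrix of covariances between the regressand and the regressors and eliminate these nuisance parameters by suitable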

projections. Exploiting a possible structure, suggested by our theory, on the covariance matrix of the $\xi_{it}$'s and $\alpha_i$ across individuals and periods, may lead to further extensions. Additional exploitable structure may be found in the covariance matrix of the $y_{it}$'s. The latter will, however, lead to moment restrictions that are quadratic in the coefficient vector $\beta$. Under non-normality, higher order moments may also, in principle, be exploited to improve efficiency, but again at the cost of a mathematically less tractable problem.

In a random effects situation, with zero correlation between $\xi_{it}$ and $\alpha_i$, and hence between $x_{it}$ and $\alpha_i$, differencing or projecting out the $\alpha_i$'s will not be efficient, since these procedures do not exploit this zero correlation. The GLS estimator, which would have been the minimum variance linear unbiased estimator in the absence of measurement errors, will no longer, in general, be consistent [see Biørn (1996, Section 10.4.3)], so it has to be modified.

Finally, if the equation contains strongly exogenous regressors in addition to the error-contaminated ones, further moment conditions exist, which can lead to improved small sample efficiency of the GMM estimators. An improvement of small sample efficiency may also be obtained by replacing IV or GMM by LIML estimation; see Wansbeek and Meijer (2000, Section 6.6).

9.2.8 Concluding remarks

Above we have demonstrated that several rather simple GMM estimators exist which can handle jointly the heterogeneity problem and the measurement error problem in panel data. These problems may be intractable when only pure (single or repeated) cross section data or pure time series data are available. Estimators using either equations in differences with level values as instruments, or equations in levels with differenced values as instruments, are useful. In both cases, the differences may be taken over one period or more.

Even for the static model considered here, instruments constructed from the regressors ($x$s) as well as from the regressands ($y$s) may be of interest. GMM estimators combining both instrument sets in an optimal way are usually more precise than those using either of them. Although a substantial number of orthogonality conditions constructed from differences taken over two periods or more are redundant, adding the essential two-period difference orthogonality conditions to the one-period conditions in the GMM algorithm may significantly affect the result [confer the examples in Biørn (2000)].

Using levels as instruments for differences, or vice versa, as a general estimation strategy within a GMM framework may, however, raise problems related to weak instruments. Finding operational ways of identifying such instruments among those utilizing essential orthogonality conditions, in order to reduce their potential damage with respect to inefficiency, is a challenge for future research.

9.3 Simultaneity and panel data

Simultaneous equation models (SEM), or structural models as they are also sometimes called, have been around in the economic literature for a long time, dating back to the period when the Econometric Society itself was formed. In spite of this long history, their relevance in modelling economic phenomena has not diminished; if anything, it has grown over time with the realisation that there is a high degree of interdependence among the different variables involved in the explanation of any socio-economic phenomenon. The theory of simultaneous equations has become a must in any econometrics course, whatever its level. This is due to the fact that any researcher needs to be made attentive to the potential endogenous regressor problem, be it in a single equation model or in a system of equations, and this is precisely the problem that the SEM theory deals with.

At this stage it may be useful to distinguish between interdependent systems, i.e. simultaneous equations, and what are called systems of regression equations or seemingly unrelated regressions (SUR), in which there are no endogenous variables on the right hand side but non-zero correlations are assumed between error terms of different equations. We will see later in the section that the reduced form of a SEM is a special case of SUR.

In a panel data setting, in addition to the simultaneous nature of the model, which invariably leads to non-zero correlation between the right hand side variables and the residual disturbance term, there is also the possibility of the same variables being correlated with the specific effects. However, unlike in the correlated regressors case of Chapter 4, eliminating the specific effect alone does not solve the problem here and we need a more comprehensive approach to tackle it. We will develop generalizations of the two stage least squares (2SLS) and three stage least squares (3SLS) methods that are available in the classical SEM case. These generalizations can also be presented in a GMM framework, giving the corresponding optimal estimation in this context.

The most commonly encountered panel data SEM is the SEM with error component (EC) structure. Thus a major part of this chapter will be devoted to this extension and all its variants. Other generalizations will be briefly discussed at the end.

9.3.1 SEM with EC

The Model

This model proposes to account for the temporal and cross sectional heterogeneity of panel data by means of an error components structure in the structural equations of a simultaneous equation system. In other words, the specific effects associated with pooled data are incorporated in an additive manner in the random element of each equation.

Let us consider a complete linear system of $M$ equations in $M$ current endogenous variables and $K$ exogenous variables. We do not consider the presence of lagged endogenous variables in the system. The reader is referred to the separate chapter of this book dealing with dynamic panel data models for treatment of such cases.

By a complete system, we mean that there are as many equations as there are endogenous variables and hence the system can be solved to obtain the reduced form. Further, we also assume that the data set is balanced, i.e. observations are available for all the variables for all the units at all dates. Once again, the case of unbalanced panel data sets is dealt with in a separate chapter of the book.

We write the $m$-th structural equation of the system as follows:¹¹

$$y_{it}\gamma_m + x_{it}\beta_m + u_{mit} = 0, \qquad m = 1, \dots, M, \tag{9.3.1}$$

where $y_{it}$ is the $(1 \times M)$ vector of observations on all the $M$ endogenous variables for the $i$-th individual at the $t$-th time period; $x_{it}$ is the $(1 \times K)$ vector of observations on all the $K$ exogenous variables for the $i$-th individual at the $t$-th time period; $\gamma_m$ and $\beta_m$ are respectively the coefficient vectors of $y_{it}$ and $x_{it}$; and $u_{mit}$ is the disturbance term of the $m$-th equation for the $i$-th individual and the $t$-th time period. More explicitly,

$$y_{it} = [y_{1it} \cdots y_{Mit}]; \quad x_{it} = [x_{1it} \cdots x_{Kit}]; \quad \gamma_m = [\gamma_{1m} \cdots \gamma_{Mm}]'; \quad \beta_m = [\beta_{1m} \cdots \beta_{Km}]'.$$

¹¹ Note that the constant term is included in the $\beta$ vector, contrary to the introductory chapters, and hence $x_{it}$ contains 1 as its first element.

By piling up all the observations in the following way:

$$Y = \begin{pmatrix} y_{11} \\ \vdots \\ y_{1T} \\ \vdots \\ y_{NT} \end{pmatrix}; \quad X = \begin{pmatrix} x_{11} \\ \vdots \\ x_{1T} \\ \vdots \\ x_{NT} \end{pmatrix}; \quad u_m = \begin{pmatrix} u_{m11} \\ \vdots \\ u_{m1T} \\ \vdots \\ u_{mNT} \end{pmatrix},$$

equation (9.3.1) can be written as:

$$Y\gamma_m + X\beta_m + u_m = 0, \qquad m = 1, \dots, M. \tag{9.3.2}$$

Defining $\Gamma = [\gamma_1 \cdots \gamma_M]$; $B = [\beta_1 \cdots \beta_M]$; $U = [u_1 \cdots u_M]$, we can write the whole system of $M$ equations as:

$$Y\Gamma + XB + U = 0. \tag{9.3.3}$$

Before turning to the error structure, we add that the elements of $\Gamma$ and $B$ satisfy certain a priori restrictions, crucial for identification, in particular the normalisation rule ($\gamma_{ii} = -1$) and the exclusion restrictions (some elements of $\Gamma$ and $B$ are identically zero).
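The statement that a complete system can be solved for its reduced form can be made concrete: post-multiplying $Y\Gamma + XB + U = 0$ by $\Gamma^{-1}$ gives $Y = X\Pi + V$ with $\Pi = -B\Gamma^{-1}$ and $V = -U\Gamma^{-1}$. The sketch below, in Python with NumPy, checks this on simulated data. The coefficient values, the dimensions and the choice of normalising the diagonal of $\Gamma$ to $-1$ are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, M, K = 50, 4, 2, 3                            # hypothetical dimensions

# Structural coefficients (hypothetical values); column m belongs to equation m.
Gamma = np.array([[-1.0, 0.4],
                  [0.3, -1.0]])                     # diagonal normalised to -1 (assumed)
B = np.array([[1.0, 2.0],                           # constant terms (first column of X is 1)
              [0.5, 0.0],                           # a zero is an exclusion restriction
              [0.0, -0.7]])

# Stacked observations: NT rows, ordered by individual and then by period.
X = np.column_stack([np.ones(N * T), rng.normal(size=(N * T, K - 1))])
U = rng.normal(size=(N * T, M))

# Solve Y Gamma + X B + U = 0 for Y; this requires Gamma to be nonsingular.
Gamma_inv = np.linalg.inv(Gamma)
Pi = -B @ Gamma_inv                                 # reduced form coefficients
V = -U @ Gamma_inv                                  # reduced form disturbances
Y = X @ Pi + V

# The generated Y satisfies the structural system up to rounding error.
print(np.abs(Y @ Gamma + X @ B + U).max())
```

Each column of $V$ mixes the disturbances of all structural equations, so the reduced form disturbances are correlated across equations even when the structural ones are not. This is why the reduced form behaves like a system of seemingly unrelated regressions.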
