Lecture 8: Panel Data
Economics 8379, George Washington University
Instructor: Prof. Ben Williams
Introduction
This lecture will discuss some common panel data methods and problems:
- Random effects vs. fixed effects vs. alternatives
- IV in panel data
- Lagged dependent variables
- Binary outcome panel data models
The linear panel model
Basic model and assumptions:
$$y_{it} = \beta' x_{it} + \eta_i + \nu_{it}$$
A1: $E(\nu_{i1}, \ldots, \nu_{iT} \mid x_{i1}, \ldots, x_{iT}, \eta_i) = 0$
A2: $\mathrm{Var}(\nu_{i1}, \ldots, \nu_{iT} \mid x_{i1}, \ldots, x_{iT}, \eta_i) = \sigma^2 I_T$
These assumptions can be replaced by weaker but harder-to-interpret assumptions.
Differencing and within variation
Some notation first: $y_i = (y_{i1}, \ldots, y_{iT})'$, $x_i = (x_{i1}, \ldots, x_{iT})'$, $\nu_i = (\nu_{i1}, \ldots, \nu_{iT})'$.
The basic idea you've seen before: differencing removes $\eta_i$, leaving $\Delta y_{it} = \beta' \Delta x_{it} + \Delta\nu_{it}$ with $E(\Delta\nu_{it} \mid x_i) = 0$.
In matrix notation, $D y_i = D x_i \beta + D \nu_i$, where $D$ is the $(T-1) \times T$ first-difference operator.
Differencing and within variation
The fixed effects regression is not
$$\left( \sum_{i=1}^n x_i' D' D x_i \right)^{-1} \sum_{i=1}^n x_i' D' D y_i,$$
though this first-differences estimator would be consistent under assumption A1.
Because $\mathrm{Var}(D\nu_i \mid x_i) = \sigma^2 D D'$, the GLS estimator is more efficient:
$$\hat\beta_{fe} := \left( \sum_{i=1}^n x_i' D' (D D')^{-1} D x_i \right)^{-1} \sum_{i=1}^n x_i' D' (D D')^{-1} D y_i$$
Differencing and within variation
But $Q = D'(DD')^{-1}D$ is idempotent and equal to $I_T - \iota\iota'/T$. This is the within-group operator.
The fixed effects estimator is based on within variation. The fixed effects estimator is equivalent to OLS including entity dummies.
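The identity above can be checked numerically. The following sketch (our own, not from the slides) builds the first-difference operator $D$ and verifies that $D'(DD')^{-1}D$ equals the within-group (demeaning) operator and is idempotent:

```python
import numpy as np

T = 5
# First-difference operator D: row t has -1 in column t, +1 in column t+1.
D = np.zeros((T - 1, T))
for t in range(T - 1):
    D[t, t], D[t, t + 1] = -1.0, 1.0

Q = D.T @ np.linalg.inv(D @ D.T) @ D          # D'(DD')^{-1}D
within = np.eye(T) - np.ones((T, T)) / T      # I_T - ii'/T

assert np.allclose(Q, within)                 # Q is the within-group operator
assert np.allclose(Q @ Q, Q)                  # Q is idempotent
```

This works because $Q$ is the projection onto the row space of $D$, which is exactly the orthogonal complement of the constant vector $\iota$.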
Differencing and within variation
Properties of the fixed effects (or within-group) estimator:
- For fixed $T$, $\hat\beta_{fe}$ is unbiased and optimal, and as $n \to \infty$ it is consistent and asymptotically normal.
- Estimates of $\eta_i$ are unbiased but consistent only if $T \to \infty$.
- If $T \to \infty$ then $\hat\beta_{fe}$ is consistent even if $n$ is fixed.
Differencing and within variation
Robust standard errors:
- If A2 does not hold, the usual OLS standard error formula applied to the transformed data is inconsistent.
- If $T$ is fixed and $n$ is large, the clustered (on entity) standard error formula provides a HAC estimator.
- If $T$ is large and $n$ is fixed, a Newey-West type standard error estimator is required for consistency under serial correlation.
Differencing and within variation
Under serial correlation in $\nu_{it}$, the fixed effects estimator is not optimal.
Let $\tilde\nu_i = D\nu_i$. Generally, if $\mathrm{Var}(\tilde\nu_i \mid x_i) = \Omega(x_i)$ then the GLS estimator is
$$\left( \sum_{i=1}^n x_i' D' \Omega(x_i)^{-1} D x_i \right)^{-1} \sum_{i=1}^n x_i' D' \Omega(x_i)^{-1} D y_i$$
In the special case where $\mathrm{Var}(\tilde\nu_i \mid x_i) = \Omega$, replace $\Omega(x_i)^{-1}$ with $\hat\Omega^{-1}$, where
$$\hat\Omega = n^{-1} \sum_{i=1}^n \hat{\tilde\nu}_i \hat{\tilde\nu}_i',$$
to get a feasible GLS estimator.
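The fixed effects estimator itself is just OLS on demeaned data. A simulation sketch (the data generating process is our own illustration, not from the slides) showing that the within estimator recovers $\beta$ even when $x_{it}$ is correlated with $\eta_i$, while pooled OLS is biased:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, beta = 500, 4, 2.0

eta = rng.normal(size=(n, 1))                  # unobserved fixed effect
x = rng.normal(size=(n, T)) + eta              # x correlated with eta
y = beta * x + eta + rng.normal(size=(n, T))

xw = x - x.mean(axis=1, keepdims=True)         # within (demeaned) data
yw = y - y.mean(axis=1, keepdims=True)
beta_fe = (xw * yw).sum() / (xw ** 2).sum()    # FE estimator

pooled = (x * y).sum() / (x ** 2).sum()        # pooled OLS, biased here
assert abs(beta_fe - beta) < 0.1
assert pooled - beta > 0.2                     # upward bias from eta
```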
Random effects
The pooled OLS estimator is
$$\hat\beta_{pooled} = \left( \sum_{i=1}^n x_i' x_i \right)^{-1} \sum_{i=1}^n x_i' y_i$$
It is unbiased and consistent only under the assumption that $E(\eta_i \mid x_{it}) = 0$.
Under assumption A2 and $\mathrm{Var}(\eta_i \mid x_i) = \sigma^2_\eta$,
$$\mathrm{Var}(\eta_i \iota + \nu_i \mid x_i) = \sigma^2_\eta \iota\iota' + \sigma^2 I_T$$
Random effects
The GLS estimator is then
$$\hat\beta_{GLS} = \left( \sum_{i=1}^n x_i' V^{-1} x_i \right)^{-1} \sum_{i=1}^n x_i' V^{-1} y_i$$
where $V^{-1} = \sigma^{-2}\left( I_T - \sigma^2_\eta \iota\iota' / (\sigma^2 + T\sigma^2_\eta) \right)$. This is the random effects estimator.
When $T \to \infty$, this becomes the fixed effects estimator. More generally, letting $\psi = \sigma^2/(\sigma^2 + T\sigma^2_\eta)$: as $\psi \to 0$ we get fixed effects, and as $\psi \to 1$ we get pooled OLS.
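The RE/GLS estimator can be computed by "quasi-demeaning": run OLS on $y_{it} - \theta \bar y_i$ and $x_{it} - \theta \bar x_i$ with $\theta = 1 - \sqrt{\sigma^2/(\sigma^2 + T\sigma^2_\eta)}$, so $\theta \to 1$ gives FE and $\theta \to 0$ gives pooled OLS. A simulation sketch (the DGP and variable names are our own):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, beta = 2000, 5, 1.5
sig2, sig2_eta = 1.0, 2.0                      # Var(nu), Var(eta)

eta = rng.normal(scale=np.sqrt(sig2_eta), size=(n, 1))
x = rng.normal(size=(n, T))                    # RE assumption: x indep of eta
y = beta * x + eta + rng.normal(scale=np.sqrt(sig2), size=(n, T))

theta = 1 - np.sqrt(sig2 / (sig2 + T * sig2_eta))
xq = x - theta * x.mean(axis=1, keepdims=True) # quasi-demeaned data
yq = y - theta * y.mean(axis=1, keepdims=True)
beta_re = (xq * yq).sum() / (xq ** 2).sum()
assert abs(beta_re - beta) < 0.05
```

In practice $\sigma^2$ and $\sigma^2_\eta$ are unknown and must be estimated in a first stage, which is the feasible GLS procedure described on the next slide.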
Random effects
Feasible GLS: estimate $\psi$ in a first stage to get an estimate $\hat V$. There are several ways to estimate $\psi$. This is what xtreg ..., re in Stata does.
An alternative is the maximum likelihood estimator, which estimates $\beta$, $\sigma^2$, and $\sigma^2_\eta$ simultaneously. The usual MLE assumes $\eta_i \sim N(0, \sigma^2_\eta)$, though different distributions can be used.
Random effects vs fixed effects
The primary difference between the two is that random effects assumes $\eta_i$ is uncorrelated with $x_{it}$. The idea of fixed (non-random) versus random effects is not the real distinction.
Mundlak (1978) showed that the fixed effects estimator is equivalent to a random effects type (GLS) estimator of the model where $\eta_i = a'\bar x_i + \omega_i$ with $\omega_i$ independent of $x_i$.
Not true in nonlinear models!
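One face of the Mundlak result is purely algebraic and easy to verify: pooled OLS of $y_{it}$ on $x_{it}$ and the group mean $\bar x_i$ returns exactly the within (FE) coefficient on $x_{it}$. A numerical check (our own sketch; the equality holds for any data, by the Frisch-Waugh-Lovell theorem):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 200, 4
x = rng.normal(size=(n, T))
y = rng.normal(size=(n, T))                    # any data: equality is algebraic

# FE (within) estimator
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
beta_fe = (xw * yw).sum() / (xw ** 2).sum()

# Pooled OLS on [x, group mean of x, intercept]
xbar = np.repeat(x.mean(axis=1, keepdims=True), T, axis=1)
X = np.column_stack([x.ravel(), xbar.ravel(), np.ones(n * T)])
coef = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
assert np.allclose(coef[0], beta_fe)           # exact equality
```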
Measurement error
Motivating example: Bover and Watson (2000) consider a simplified version of the model from Arellano (2003).
Conditional money demand equation:
- $y_{it}$ denotes cash holdings (real money balances) of firm $i$ in year $t$
- $x_{it}$ denotes sales
- $\eta_i = \log(a_i)$, where $a_i$ denotes a firm's financial sophistication
Measurement error
Suppose $x_{it} = x^*_{it} + \varepsilon_{it}$ and the true regressor values $x^*_{it}$ are unobserved. Fixed effects can exacerbate measurement error bias:
- The measurement error bias in the FE estimator when $T = 2$ is $-\beta\left(1 - \frac{1}{1+\lambda}\right)$, where $\lambda = \mathrm{Var}(\Delta\varepsilon_{it})/\mathrm{Var}(\Delta x^*_{it})$.
- If $\varepsilon_{it}$ and $x^*_{it}$ are both iid then this attenuation bias is identical to the cross-sectional bias.
- If $\varepsilon_{it}$ is iid but $x^*_{it}$ is positively serially correlated then the bias is larger than in the cross-section.
- When $T > 2$, $\varepsilon_{it}$ is iid, and $x^*_{it}$ is positively serially correlated, Griliches and Hausman (1986) show that the bias of the fixed effects estimator lies between the bias of pooled OLS and that of OLS in first differences.
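The $T=2$ attenuation result can be illustrated by simulation (our own sketch): with $T = 2$ the FE estimator equals OLS in first differences, and its probability limit is $\beta/(1+\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta = 200_000, 1.0
lam = 0.5                                      # Var(noise) / Var(true regressor)

xstar = rng.normal(size=(n, 2))                # true regressor, iid over t
eps = rng.normal(scale=np.sqrt(lam), size=(n, 2))
x = xstar + eps                                # observed with classical error
eta = rng.normal(size=(n, 1))
y = beta * xstar + eta + rng.normal(size=(n, 2))

dx = np.diff(x, axis=1).ravel()                # first differences (T = 2)
dy = np.diff(y, axis=1).ravel()
beta_fd = (dx * dy).sum() / (dx ** 2).sum()

# With iid xstar and eps, lambda = Var(d eps)/Var(d xstar) = 0.5,
# so plim of the FD/FE estimator is beta / (1 + lambda) = 2/3.
assert abs(beta_fd - beta / (1 + lam)) < 0.02
```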
Measurement error
Panel IV can be a solution to the measurement error problem when $\varepsilon_{it}$ is not serially correlated but $x^*_{it}$ is.
If $\eta_i$ is independent (random effects / pooled OLS model) then
$$E(x_{is}(y_{it} - \beta' x_{it})) = 0 \quad \text{for } s \neq t$$
Measurement error
If $\eta_i$ is correlated with $x_{it}$, one solution is to take first differences and use the moment conditions
$$E(x_{is}(\Delta y_{it} - \beta' \Delta x_{it})) = 0 \quad \text{for } s = 1, \ldots, t-2, t+1, \ldots, T$$
This requires $T \geq 3$. Also, the rank condition should be considered carefully:
- What if $x^*_{it}$ is white noise?
- What if $x^*_{it}$ is a random walk?
- What if $x^*_{it} = \alpha_i + \xi_{it}$?
With larger $T$, there is a tradeoff between allowing serial correlation in $\varepsilon_{it}$ and needing serial correlation in $x^*_{it}$.
Measurement error Table from Bover and Watson (2000):
Measurement error The relationship among the pooled OLS, FE, and first difference estimators is consistent with measurement error in sales. Column (4) is GMM on first differences using other time periods as instruments. The Sargan test here is also marginally suggestive of measurement error. Columns (5) and (6) seem to correct for measurement error and are consistent with the expectation that pooled OLS should be downward biased.
Panel IV
Start with the RE/pooled model where $y_{it} = \beta' x_{it} + u_{it}$.
Moment conditions: $E(z_i' u_i) = 0$, where $z_i$ is a $T \times r$ matrix of instruments.
Given an $r \times r$ weighting matrix $W_N$, the panel GMM estimator is
$$\hat\beta_{PGMM} = \left( \left( \sum_{i=1}^n x_i' z_i \right) W_N \left( \sum_{i=1}^n z_i' x_i \right) \right)^{-1} \left( \sum_{i=1}^n x_i' z_i \right) W_N \left( \sum_{i=1}^n z_i' y_i \right)$$
Panel IV
The weighting matrix doesn't matter if $r = \dim(\beta)$, so that the model is just identified.
The one-step GMM estimator (which is just 2SLS) uses $W_N = \left( \sum_{i=1}^n z_i' z_i \right)^{-1}$. As we've seen, this is optimal under homoskedasticity.
The two-step GMM estimator calculates a robust estimate $\hat S$ of the asymptotic variance of $n^{-1/2} \sum_{i=1}^n z_i' u_i$ and then sets $W_N = \hat S^{-1}$.
Panel IV
Moment conditions: pooled OLS is equivalent to taking $z_i = x_i$. Note that the moment conditions are then of the form
$$E\left( \sum_{t=1}^T x_{it} u_{it} \right) = 0$$
The period-by-period moment conditions $E(z_{it} u_{it}) = 0$ can be enforced by defining $Z_i$ as the $T \dim(z_{it}) \times T$ block diagonal matrix with $z_{i1}, \ldots, z_{iT}$ on the diagonal.
You get even more moment conditions under weak exogeneity ($T(T+1)r/2$) or strong exogeneity ($T^2 r$). Beware of the weak/many instruments finite sample bias!
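The block-diagonal construction can be made concrete. A sketch (our own, for a single unit $i$) showing that stacking $z_{i1}, \ldots, z_{iT}$ on the diagonal of $Z_i$ turns $Z_i u_i$ into the full set of period-by-period moments $z_{it} u_{it}$, rather than the single summed moment used by pooled OLS:

```python
import numpy as np

rng = np.random.default_rng(4)
T, r = 3, 2
z = [rng.normal(size=r) for _ in range(T)]     # z_i1, ..., z_iT, each r x 1

# Block-diagonal (T*r) x T matrix: block t of column t carries z_it.
Z = np.zeros((T * r, T))
for t in range(T):
    Z[t * r:(t + 1) * r, t] = z[t]

u = rng.normal(size=T)                         # residual vector u_i
moments = Z @ u                                # stacks z_it * u_it for each t
expected = np.concatenate([z[t] * u[t] for t in range(T)])
assert np.allclose(moments, expected)          # T*r separate moment conditions
```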
Panel IV with fixed effects
Suppose $u_{it} = \eta_i + \nu_{it}$. The GMM estimators take the same form as before, except applied to $\tilde y_{it} = \beta' \tilde x_{it} + \tilde\nu_{it}$, where $\tilde y_{it}$, $\tilde x_{it}$, and $\tilde\nu_{it}$ represent some differencing transformation.
The available moment conditions depend on what type of differencing is used.
Random vs. fixed: the relevant distinction now is whether we want to assume that $E(Z_i' u_i) = 0$ or $E(Z_i' \nu_i) = 0$; the former requires any fixed effect $\eta_i$ to be uncorrelated with $Z_i$, though not necessarily with $X_i$.
Under the random effects assumption it is again possible to take advantage of the error component structure to improve efficiency.
Strong vs weak exogeneity
Assumption A1 for the fixed effects estimator was that $E(\nu_{it} \mid x_i, \eta_i) = 0$. This is strong exogeneity.
This implies no feedback: $x_{it}$ cannot be informed by $\nu_{is}$ for $s < t$, so it rules out lagged dependent variables, $x_{it} = y_{i(t-1)}$.
We will focus on the lagged dependent variable problem and consider two basic solutions:
- figure out which past values can be used as instruments
- fixed effects is OK for large $T$
AR model with fixed effects
Consider as a simple example the autoregressive model:
$$y_{it} = \alpha y_{i(t-1)} + \eta_i + \nu_{it}$$
B1: $E(\nu_{it} \mid y_i^{t-1}, \eta_i) = 0$
B2: $E(\nu_{it}^2 \mid y_i^{t-1}, \eta_i) = \sigma^2$
B3 (mean stationarity): $E(y_{i0} \mid \eta_i) = \eta_i/(1-\alpha)$
B4 (covariance stationarity): $\mathrm{Var}(y_{i0} \mid \eta_i) = \sigma^2/(1-\alpha^2)$
The fixed effects estimator has a bias equal to $-(1+\alpha)/2$ when $T = 2$, and approximately $-(1+\alpha)/T$ for large $T$. This is called the Nickell bias, due to the pioneering work of Nickell (1981).
AR model with fixed effects
Without assumptions B3 and B4 the bias is more complicated. E.g., if $T = 2$ and $\sigma^2_\eta/\mathrm{Var}(\nu_{i1})$ is large then the bias is very small.
What if $T$ is large but of the same order of magnitude as $n$? Formally, if $n/T \to c > 0$ then
$$\sqrt{nT}(\hat\alpha_{fe} - \alpha) \to N\left( -\sqrt{c}(1+\alpha),\ 1-\alpha^2 \right)$$
For moderate values of $T$, a bias-corrected estimator is
$$\hat\alpha_{fe,bc} = \hat\alpha_{fe} + \frac{1 + \hat\alpha_{fe}}{T}$$
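A simulation sketch (our own; the DGP follows B1-B4 above) showing the downward Nickell bias of the FE estimator in the AR(1) panel and the simple bias correction:

```python
import numpy as np

rng = np.random.default_rng(5)
n, T, alpha = 20_000, 10, 0.5

eta = rng.normal(size=n)
y = np.empty((n, T + 1))
# Stationary initial condition (B3, B4)
y[:, 0] = eta / (1 - alpha) + rng.normal(size=n) / np.sqrt(1 - alpha ** 2)
for t in range(T):
    y[:, t + 1] = alpha * y[:, t] + eta + rng.normal(size=n)

lag, cur = y[:, :-1], y[:, 1:]                 # y_{t-1} and y_t
lw = lag - lag.mean(axis=1, keepdims=True)     # within transformation
cw = cur - cur.mean(axis=1, keepdims=True)
a_fe = (lw * cw).sum() / (lw ** 2).sum()

assert a_fe < alpha - 0.05                     # downward Nickell bias
a_bc = a_fe + (1 + a_fe) / T                   # bias-corrected estimator
assert abs(a_bc - alpha) < abs(a_fe - alpha)   # correction reduces the bias
```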
IV solution
Anderson and Hsiao (1981, 1982) suggested an IV estimator that uses $y_{i(t-2)}$ or $\Delta y_{i(t-2)}$ as an instrument for $\Delta y_{i(t-1)}$, requiring $T \geq 3$ or $T \geq 4$, respectively.
There are potentially many more moment conditions under assumption B1:
$$E\left(y_i^{t-2}(\Delta y_{it} - \alpha \Delta y_{i(t-1)})\right) = 0, \quad t = 2, \ldots, T$$
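A simulation sketch (our own) of the Anderson-Hsiao idea: first-difference the AR(1) model to remove $\eta_i$, then instrument $\Delta y_{i(t-1)}$ with the level $y_{i(t-2)}$, which is uncorrelated with $\Delta\nu_{it}$ under B1:

```python
import numpy as np

rng = np.random.default_rng(6)
n, T, alpha = 50_000, 6, 0.5

eta = rng.normal(size=n)
y = np.empty((n, T))
y[:, 0] = eta / (1 - alpha) + rng.normal(size=n) / np.sqrt(1 - alpha ** 2)
for t in range(1, T):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=n)

# dy_t = alpha * dy_{t-1} + dv_t, instrumenting dy_{t-1} with y_{t-2}
dy = np.diff(y, axis=1)                        # columns dy_1, ..., dy_{T-1}
dep = dy[:, 1:].ravel()                        # dy_2, ..., dy_{T-1}
reg = dy[:, :-1].ravel()                       # dy_1, ..., dy_{T-2}
inst = y[:, :-2].ravel()                       # y_0, ..., y_{T-3}

a_iv = (inst * dep).sum() / (inst * reg).sum() # just-identified IV
assert abs(a_iv - alpha) < 0.05
```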
IV solution
Holtz-Eakin, Newey, and Rosen (1988) and Arellano and Bond (1991) suggest a GMM estimator that uses all $(T-1)T/2$ moment conditions.
The Arellano-Bond estimator uses a one-step weighting matrix, optimal under homoskedasticity, that accounts for the serial correlation induced by differencing:
$$\hat V = \sum_{i=1}^n Z_i' D D' Z_i, \qquad W_N = \hat V^{-1}$$
There is, however, a bias when $n \approx T$ that is proportional to $1/n$.
IV solution
Advice:
- When $T$ is larger than $n$, use FE.
- When $n$ is larger than $T$, use Arellano-Bond.
- When $n$ is similar in magnitude to $T$, use bias correction or a limited number of instruments/moments.
With lagged dependent variables and other regressors
Suppose $y_{it} = \alpha y_{i(t-1)} + \beta' x_{it} + \eta_i + \nu_{it}$.
- Additional lags of $x_{it}$ can be used as instruments for $y_{i(t-1)}$; this helps when $\nu_{it}$ is serially correlated.
- There is a tradeoff between exogeneity assumptions on $x_{it}$ and serial correlation in $\nu_{it}$, especially for small $T$.
- Again, fixed effects will be consistent, and sometimes preferred, for large $T$.
Extensions
- Arellano and Bover (1995) and Blundell and Bond (1998) suggest also using differences to instrument for levels.
- Use additional moments for time-invariant regressors.
- Use second-order moments (Ahn and Schmidt, 1995).
- Interactive fixed effects (Ahn, Lee, and Schmidt, 2001), etc.
- Use the covariance structure to improve efficiency.
Binary choice with endogeneity
Linear probability model: the usual 2SLS formula treats the second stage as a linear probability model.
This produces what is sometimes called the Local Average Treatment Effect (LATE): an average of the treatment effect, $Y_1 - Y_0$, over a certain subset of the population. More on this in a few weeks.
Binary choice with endogeneity
Random utility model: a random utility / threshold crossing / linear index model is
$$y_i = 1(\beta_0 + \beta_1 x_i + u_i \geq 0)$$
In this model, the treatment effect is given by
$$1(\beta_0 + \beta_1 + u_i \geq 0) - 1(\beta_0 + u_i \geq 0)$$
and the ATE is
$$\Pr(\beta_0 + \beta_1 + u_i \geq 0) - \Pr(\beta_0 + u_i \geq 0)$$
Sometimes we want to estimate treatment effects, but sometimes we are interested in the structural parameters, $\beta_0$ and $\beta_1$.
Binary choice with endogeneity
Triangular model with probit second stage. The two equations are
$$x_i = \gamma_0 + \gamma' Z_i + \sigma_\nu \nu_i$$
$$y_i = 1(\beta_0 + \beta_1 x_i + u_i \geq 0)$$
where
$$(u_i, \nu_i) \sim N\left( 0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$$
This can be estimated via maximum likelihood.
Binary choice with endogeneity
Triangular model with probit second stage, continued. This can be estimated in Stata using ivprobit:
- syntax similar to ivreg2
- additional controls can be allowed
- the first option is allowed; output includes a test of exogeneity, $H_0: \rho = 0$
The biprobit command is for the case where $x_i$ is also binary.
Binary choice with endogeneity
Triangular model with probit second stage, continued. This model imposes some strong restrictions:
- normality
- homoskedasticity
- full independence of $Z_i$
A two-step control function approach is available that can be computationally simpler (option twostep in Stata).
You will get misleading estimates from a probit that simply plugs in predicted values from the first stage.
Wooldridge has some useful notes that explain these issues in more detail. A semiparametric model demonstrates where identification comes from.
Binary choice with endogeneity
Semiparametric triangular model: without specifying the error distributions,
- no assumption of normality
- instead, assume $u_i \perp x_i \mid \nu_i$
Under this assumption:
- estimate $\hat\nu_i$ from the first stage
- then estimate nonparametrically $\Pr(y_i = 1 \mid x_i = x, \hat\nu_i = \nu) = F(\beta_0 + \beta_1 x, \nu)$
- this can be used to estimate $\beta$ and/or average marginal effects
Static binary choice panel model
Static model:
$$\Pr(y_i \mid x_i, \eta_i) = \prod_{t=1}^T F(\beta' x_{it} + \eta_i)$$
The log likelihood function is
$$l(\beta, \{\eta_i\}) = \sum_{i=1}^n \sum_{t=1}^T \log\left(F(\beta' x_{it} + \eta_i)\right)$$
The log of the integrated likelihood function is
$$l(\beta) = \sum_{i=1}^n \log\left( \int \prod_{t=1}^T F(\beta' x_{it} + \eta_i) f_{\eta|x}(\eta_i \mid x_i) \, d\eta_i \right)$$
Static model
Random effects models are based on the integrated likelihood.
- Random effects probit/logit assume that $f_{\eta|x} = f_\eta$, a similar assumption to RE in linear models.
- This is similar to a pooled probit/logit estimator but potentially more efficient by accounting for cross-equation dependence.
- The Mundlak/Chamberlain approach: $\eta_i = a'\bar x_i + \omega_i$, or $\eta_i$ is some other function of $x_i$. This is implemented using the integrated likelihood with $f_{\eta|x}(\eta_i \mid x_i) = f_\omega(\eta_i - a'\bar x_i)$, which reduces to $\int \prod_{t=1}^T F(\beta' x_{it} + a'\bar x_i + \omega_i) f_\omega(\omega_i) \, d\omega_i$.
- Cannot identify $\beta_k$ if $x_{itk}$ is time invariant.
Static model
Fixed effects models are based on the full likelihood, $l(\beta, \{\eta_i\})$, treating the $\eta_i$ as separate parameters.
- This introduces the incidental parameters problem (Neyman and Scott, 1948).
- The fixed effects estimator is biased for fixed $T$, but is consistent as $T \to \infty$.
- If $T$ and $n$ are of similar magnitude, or $T$ is smaller, then FE doesn't work.
- When $T$ and $n$ are of similar magnitude, bias corrections have been suggested (see work of Fernandez-Val and others).
Static model
Conditional logit: in the logit model, when $T = 2$,
$$\Pr(y_{i1} = 0, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1, x_i) = \frac{\exp(x_{i2}'\beta)}{\exp(x_{i1}'\beta) + \exp(x_{i2}'\beta)}$$
This conditional likelihood does not depend on $\eta_i$. The conditional likelihood estimator is implemented in Stata via clogit — not logit with i.caseid!
For larger $T$, condition on $\sum_{t=1}^T y_{it}$. This approach works for dynamic logit and multinomial logit models as well.
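A simulation sketch (our own) of the conditional logit idea for $T = 2$: among "switchers" ($y_{i1} + y_{i2} = 1$), the fixed effect cancels and $\Pr(y_{i2} = 1)$ is logistic in $\Delta x_i = x_{i2} - x_{i1}$, so $\beta$ can be estimated without estimating any $\eta_i$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta = 100_000, 1.0

eta = rng.normal(size=n)
x = rng.normal(size=(n, 2)) + eta[:, None]     # x correlated with eta
p = 1 / (1 + np.exp(-(beta * x + eta[:, None])))
y = (rng.uniform(size=(n, 2)) < p).astype(float)

sw = y.sum(axis=1) == 1                        # keep switchers only
dx, y2 = x[sw, 1] - x[sw, 0], y[sw, 1]

# Maximize the conditional log likelihood over a grid of b values
# (a crude optimizer, for illustration only).
grid = np.linspace(0.5, 1.5, 201)
ll = [np.sum(y2 * (b * dx) - np.log1p(np.exp(b * dx))) for b in grid]
b_hat = grid[int(np.argmax(ll))]
assert abs(b_hat - beta) < 0.1                 # recovers beta despite eta
```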
Dynamic model
A dynamic model:
$$\Pr(y_i \mid x_i, \eta_i) = \prod_{t=1}^T \Pr(y_{it} \mid y_{i,t-1}, x_{it}, \eta_i)$$
- Dynamic logit, fixed effects
- Random effects
- The initial conditions problem
- Predetermined $x_{it}$ is difficult to model
Linear probability model
In practice a linear probability model is often used. That is, $y_{it} = \beta' x_{it} + \eta_i + \nu_{it}$, despite the fact that $y_{it}$ is binary.
This allows for fixed effects, various types of endogeneity, the Arellano and Bond GMM estimator, etc.
Basic panel commands
Stata has a suite of panel commands such as xtreg. But you can use cross-sectional tools to do most of this, sometimes with some difficulty.
I won't go over the basics, but some commands you should be familiar with: xtset, reshape, egen, xtreg, xtsum, xttab, xttrans.
Panel regression in Stata
- xtreg can be used to estimate pooled OLS, FE, and RE.
- The first differences estimator can be estimated using reg D.(yvar xvar1 xvar2), options.
- Standard errors robust to heteroskedasticity and serial correlation: vce(cluster id) or vce(robust) for xtreg.
- Both of these can handle unbalanced panels as well.
- Other FGLS estimators can be implemented manually or, in some cases, using xtreg, pa, xtgls, or xtgee.
- To compare FE and RE, use hausman to conduct the specification test.
Panel IV estimation in Stata ivregress and ivreg2 can be used for most panel IV regressions if the data is appropriately transformed and the instrument vector is appropriately defined. This is useful sometimes because it gives more transparent control over what moment conditions are used. The xtivreg command, however, can be easier to work with.
Panel IV estimation in Stata
xtivreg:
- options fe, fd, re allow different transformations of the data
- this command uses what Cameron and Trivedi call the summation moment conditions
- it differences the instruments as well
Panel IV estimation in Stata
Lagged dependent variables:
- one solution is to work with the first differences model, using ivregress or xtivreg
- xtabond can be used to implement versions of the Arellano-Bond (1991) estimator that incorporate other instruments
Panel IV estimation in Stata
xtabond:
- this only works if you want to include a lagged dependent variable
- you can flexibly specify how many lags to include
- you can flexibly specify which regressors are exogenous, endogenous, or predetermined
- we will see some examples of the syntax in a few minutes
Other panel IV commands in Stata
- xtabond2 is a user-written command that has a few additional features
- the xtdpdsys command implements Arellano and Bover (1995) and Blundell and Bond (1998)
- xthtaylor implements an alternative panel IV estimator (Hausman-Taylor)