Lecture 10
Department of Economics, Stanford University, 2011
Random Effects Estimator: $\beta_R$

$$y_{it} = x_{it}'\beta + u_{it}, \qquad u_{it} = \alpha_i + \varepsilon_{it}, \qquad i = 1,\dots,N,\ t = 1,\dots,T$$

with $E(\alpha_i \mid x_i) = E(\varepsilon_{it} \mid x_i) = 0$, $E\alpha_i^2 = \sigma_\alpha^2$, $E\varepsilon_{it}^2 = \sigma_\varepsilon^2$, $E\varepsilon_{it}\varepsilon_{is} = 0$ for $t \neq s$, and $E\alpha_i\varepsilon_{it} = 0$ for all $t$.

In vector notation, with $y = (y_{11},\dots,y_{1T},\dots,y_{N1},\dots,y_{NT})'$ ($NT \times 1$), $\alpha = (\alpha_1,\dots,\alpha_N)'$ ($N \times 1$), $\varepsilon = (\varepsilon_{11},\dots,\varepsilon_{1T},\dots,\varepsilon_{N1},\dots,\varepsilon_{NT})'$, and $l = (1,\dots,1)'$ ($T \times 1$),

$$y = X\beta + u, \qquad u = (\alpha \otimes l) + \varepsilon.$$

$\beta_R$ is essentially GLS.
Note that
$$E uu' = E(\alpha\otimes l + \varepsilon)(\alpha\otimes l + \varepsilon)' = E\left[(\alpha\alpha')\otimes(ll') + \varepsilon\varepsilon'\right] = \sigma_\alpha^2\, I_N \otimes ll' + \sigma_\varepsilon^2\, I_N \otimes I_T = \sigma_\varepsilon^2\,\Omega$$
where
$$\Omega = I_N \otimes \left[I_T + \frac{\sigma_\alpha^2}{\sigma_\varepsilon^2}\, ll'\right] = I_N \otimes \left[I_T + T\frac{\sigma_\alpha^2}{\sigma_\varepsilon^2}\, P_T\right], \qquad P_T = \frac{1}{T}\, ll'.$$
To get $\Omega^{-1}$, use
$$[C + xy']^{-1} = C^{-1} - \frac{1}{1 + y'C^{-1}x}\, C^{-1}xy'C^{-1},$$
where $C$ is $m \times m$ and $x$, $y$ are $m \times 1$. (This identity is also useful for updating OLS when an additional observation comes in.)
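As a quick numerical sanity check of this matrix identity, here is a minimal numpy sketch with arbitrary, made-up $C$, $x$, $y$:

```python
# Numerical check of the Sherman-Morrison identity [C + xy']^{-1}.
# C, x, y are arbitrary illustrative inputs, not from the lecture.
import numpy as np

rng = np.random.default_rng(0)
m = 4
C = np.eye(m) + 0.1 * rng.standard_normal((m, m))  # any invertible m x m matrix
x = rng.standard_normal((m, 1))
y = rng.standard_normal((m, 1))

Cinv = np.linalg.inv(C)
direct = np.linalg.inv(C + x @ y.T)
formula = Cinv - (Cinv @ x @ y.T @ Cinv) / (1.0 + (y.T @ Cinv @ x).item())

print(np.allclose(direct, formula))  # True
```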
Then
$$\left(I_T + \frac{\sigma_\alpha^2}{\sigma_\varepsilon^2}\, ll'\right)^{-1} = I_T - \frac{\sigma_\alpha^2/\sigma_\varepsilon^2}{1 + T\sigma_\alpha^2/\sigma_\varepsilon^2}\, ll' = I_T - \frac{T\sigma_\alpha^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}\,\frac{ll'}{T} = I_T - (1-\theta^2)\, P_T = Q_T + \theta^2 P_T$$
where
$$\theta^2 = \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2} \qquad\text{and}\qquad Q_T = I_T - P_T.$$
So $\Omega^{-1} = I_N \otimes (Q_T + \theta^2 P_T)$, and
$$\beta_R = \left(X'\Omega^{-1}X\right)^{-1}\left(X'\Omega^{-1}y\right) = \left(X'\left[I_N \otimes (Q_T + \theta^2 P_T)\right]X\right)^{-1}\left(X'\left[I_N \otimes (Q_T + \theta^2 P_T)\right]y\right).$$
Obviously
$$\operatorname{Var}(\beta_R) = \sigma_\varepsilon^2\left(X'\Omega^{-1}X\right)^{-1} = \sigma_\varepsilon^2\left(X'\left[I_N \otimes (Q_T + \theta^2 P_T)\right]X\right)^{-1}.$$
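A minimal numpy sketch of $\beta_R$, assuming a balanced panel stacked by individual (all $T$ rows of unit $i$ contiguous) and, for now, a known $\theta^2$; the simulated data and all variable names are illustrative:

```python
# Random-effects (GLS) estimator via Omega^{-1} = I_N kron (Q_T + theta^2 P_T).
import numpy as np

rng = np.random.default_rng(1)
N, T, k = 200, 5, 2
sigma_a, sigma_e = 1.0, 0.5

X = rng.standard_normal((N * T, k))
alpha = np.repeat(sigma_a * rng.standard_normal(N), T)  # alpha_i, constant within i
eps = sigma_e * rng.standard_normal(N * T)
beta = np.array([1.0, -2.0])
y = X @ beta + alpha + eps

P_T = np.full((T, T), 1.0 / T)            # averaging projection l l' / T
Q_T = np.eye(T) - P_T                     # demeaning projection
theta2 = sigma_e**2 / (sigma_e**2 + T * sigma_a**2)
Omega_inv = np.kron(np.eye(N), Q_T + theta2 * P_T)

beta_R = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(beta_R)   # close to (1, -2)
```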
Fixed Effects (Within) Estimator: $\beta_W$

Use within-group variation only: demean the data first, then run OLS.
$$Q_T y_i = Q_T X_i \beta + Q_T u_i.$$
Note that $Q_T u_i = Q_T \varepsilon_i$ (since $Q_T l = 0$).
$$\beta_W = \left(X'(I_N \otimes Q_T)X\right)^{-1}\left(X'(I_N \otimes Q_T)y\right) = \left(\sum_i X_i' Q_T X_i\right)^{-1}\left(\sum_i X_i' Q_T y_i\right)$$
$$\operatorname{Var}(\beta_W) = \sigma_\varepsilon^2\left(X'(I_N \otimes Q_T)X\right)^{-1} = \sigma_\varepsilon^2\left(\sum_i X_i' Q_T X_i\right)^{-1}$$
$\beta_W$ cannot estimate coefficients on time-invariant regressors. The coefficients on time-varying regressors are consistent whether or not $\alpha_i$ is correlated with $X_i$.
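A sketch of $\beta_W$ as group-by-group demeaning followed by OLS, on simulated data (all names illustrative):

```python
# Within estimator: apply Q_T block by block via per-group demeaning.
import numpy as np

rng = np.random.default_rng(2)
N, T, k = 200, 5, 2
X = rng.standard_normal((N * T, k))
alpha = np.repeat(rng.standard_normal(N), T)
y = X @ np.array([1.0, -2.0]) + alpha + 0.5 * rng.standard_normal(N * T)

Xg = X.reshape(N, T, k)
yg = y.reshape(N, T)
Xd = (Xg - Xg.mean(axis=1, keepdims=True)).reshape(N * T, k)  # Q_T X_i, stacked
yd = (yg - yg.mean(axis=1, keepdims=True)).reshape(N * T)     # Q_T y_i, stacked

beta_W = np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)
print(beta_W)   # close to (1, -2); alpha_i has been demeaned away
```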
Between Estimator: $\beta_B$

Use only the variation in group means (between-group variation).
$$P_T y_i = P_T X_i \beta + P_T u_i, \qquad \text{note that } P_T u_i = \alpha_i l + P_T \varepsilon_i.$$
$$\beta_B = \left(X'(I_N \otimes P_T)X\right)^{-1}\left(X'(I_N \otimes P_T)y\right) = \left(\sum_i X_i' P_T X_i\right)^{-1}\left(\sum_i X_i' P_T y_i\right) = \left(\sum_{i=1}^N \bar{x}_i \bar{x}_i'\right)^{-1}\left(\sum_{i=1}^N \bar{x}_i \bar{y}_i\right)$$
$$\operatorname{Var}(\beta_B) = T\sigma_{\bar u}^2\left(X'(I_N \otimes P_T)X\right)^{-1} = T\left(\sigma_\alpha^2 + \frac{\sigma_\varepsilon^2}{T}\right)\left(X'(I_N \otimes P_T)X\right)^{-1}$$
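The between estimator is then just OLS on the group means; a sketch on the same kind of simulated design (names illustrative):

```python
# Between estimator: OLS of group means of y on group means of X.
import numpy as np

rng = np.random.default_rng(3)
N, T, k = 200, 5, 2
X = rng.standard_normal((N * T, k))
alpha = np.repeat(rng.standard_normal(N), T)
y = X @ np.array([1.0, -2.0]) + alpha + 0.5 * rng.standard_normal(N * T)

Xbar = X.reshape(N, T, k).mean(axis=1)    # x-bar_i, shape (N, k)
ybar = y.reshape(N, T).mean(axis=1)       # y-bar_i, shape (N,)
beta_B = np.linalg.solve(Xbar.T @ Xbar, Xbar.T @ ybar)
print(beta_B)   # consistent here because alpha_i is independent of X
```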
The random effects estimator is a matrix-weighted combination of $\beta_W$ and $\beta_B$:
$$\begin{aligned}
\beta_R &= \left(X'\left[I \otimes (Q_T + \theta^2 P_T)\right]X\right)^{-1}\left[X'(I \otimes Q_T)y + \theta^2 X'(I \otimes P_T)y\right] \\
&= \left(X'\left[I \otimes (Q_T + \theta^2 P_T)\right]X\right)^{-1} X'(I \otimes Q_T)X \left(X'(I \otimes Q_T)X\right)^{-1} X'(I \otimes Q_T)y \\
&\qquad + \left(X'\left[I \otimes (Q_T + \theta^2 P_T)\right]X\right)^{-1}\theta^2\, X'(I \otimes P_T)X \left(X'(I \otimes P_T)X\right)^{-1} X'(I \otimes P_T)y \\
&= \Delta\,\beta_W + (I - \Delta)\,\beta_B
\end{aligned}$$
for
$$\Delta = \left(X'\left[I \otimes (Q_T + \theta^2 P_T)\right]X\right)^{-1} X'(I \otimes Q_T)X, \qquad I - \Delta = \left(X'\left[I \otimes (Q_T + \theta^2 P_T)\right]X\right)^{-1}\theta^2\, X'(I \otimes P_T)X.$$
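A numerical check of this matrix-weighted-average identity; since it is purely algebraic, it holds for any $y$ (sketch with a made-up design):

```python
# Verify beta_R = Delta beta_W + (I - Delta) beta_B on arbitrary data.
import numpy as np

rng = np.random.default_rng(4)
N, T, k, theta2 = 50, 4, 2, 0.3
X = rng.standard_normal((N * T, k))
y = rng.standard_normal(N * T)            # any y works: the identity is algebraic

P = np.kron(np.eye(N), np.full((T, T), 1.0 / T))   # I kron P_T
Q = np.eye(N * T) - P                              # I kron Q_T
M = np.linalg.inv(X.T @ (Q + theta2 * P) @ X)

beta_W = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)
beta_B = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
beta_R = M @ (X.T @ Q @ y + theta2 * X.T @ P @ y)
Delta = M @ (X.T @ Q @ X)

print(np.allclose(beta_R, Delta @ beta_W + (np.eye(k) - Delta) @ beta_B))  # True
```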
You need neither $\sigma_\alpha^2$ nor $\sigma_\varepsilon^2$ to compute $\beta_W$ or $\beta_B$. The only place where $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are needed, for $\beta_R$, is in calculating $\theta$.
$$\hat\sigma_\varepsilon^2 = \frac{1}{N(T-1)}\sum_{i=1}^N \left\| Q_T y_i - Q_T X_i \beta_W \right\|^2$$
consistently estimates $\sigma_\varepsilon^2$, and
$$\hat\sigma_{\bar u}^2 = \frac{1}{N}\sum_{i=1}^N \left(\bar y_i - \bar x_i'\beta_B\right)^2$$
consistently estimates $\sigma_{\bar u}^2 = \sigma_\alpha^2 + \frac{1}{T}\sigma_\varepsilon^2$.

Reminder: if you first-difference the data to get rid of the fixed effect $\alpha_i$ and then run LS, you do NOT get the same result as $\beta_W$, except in the simple case $T = 2$. But if you run GLS on the differenced equation, it coincides with $\beta_W$.
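A feasible-GLS sketch putting these pieces together: estimate $\hat\sigma_\varepsilon^2$ from within residuals and $\hat\sigma_{\bar u}^2$ from between residuals, then form $\hat\theta^2 = \hat\sigma_\varepsilon^2 / (T\hat\sigma_{\bar u}^2)$, which equals $\sigma_\varepsilon^2/(\sigma_\varepsilon^2 + T\sigma_\alpha^2)$ in the limit (simulated data; names illustrative):

```python
# Feasible GLS variance components: sigma_eps^2 and sigma_ubar^2, then theta^2.
import numpy as np

rng = np.random.default_rng(5)
N, T, k = 500, 4, 2
beta = np.array([1.0, -2.0])
X = rng.standard_normal((N * T, k))
y = (X @ beta
     + np.repeat(rng.standard_normal(N), T)      # alpha_i with sigma_a = 1
     + 0.5 * rng.standard_normal(N * T))         # eps_it with sigma_e = 0.5

Xg, yg = X.reshape(N, T, k), y.reshape(N, T)
Xd = (Xg - Xg.mean(1, keepdims=True)).reshape(-1, k)   # within-transformed
yd = (yg - yg.mean(1, keepdims=True)).ravel()
Xb, yb = Xg.mean(1), yg.mean(1)                         # group means

beta_W = np.linalg.solve(Xd.T @ Xd, Xd.T @ yd)
beta_B = np.linalg.solve(Xb.T @ Xb, Xb.T @ yb)

sig_e2 = np.sum((yd - Xd @ beta_W) ** 2) / (N * (T - 1))   # ~ 0.25
sig_ub2 = np.mean((yb - Xb @ beta_B) ** 2)                 # ~ sigma_a^2 + sigma_e^2/T
theta2 = sig_e2 / (T * sig_ub2)                            # feasible theta^2
print(sig_e2, sig_ub2, theta2)
```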
Time-invariant Regressors

Consider
$$y_{it} = x_{it}'\beta + z_i'\gamma + \alpha_i + \varepsilon_{it}.$$
$\gamma$ can only be identified from between-group variation.

Case 1: $z_i$ uncorrelated with $\alpha_i$.
$$\bar y_i = \bar x_i'\beta + z_i'\gamma + \alpha_i + \bar\varepsilon_i = \bar x_i'\beta_W + z_i'\gamma + \alpha_i + \bar\varepsilon_i + \bar x_i'(\beta - \beta_W) = \bar x_i'\beta_W + z_i'\gamma + \alpha_i + \hat\varepsilon_i$$
for $\hat\varepsilon_i = \bar\varepsilon_i + \bar x_i'(\beta - \beta_W)$. Then just estimate $\gamma$ by a second-step LS regression:
$$\hat\gamma = \left(\sum_{i=1}^n z_i z_i'\right)^{-1}\left(\sum_{i=1}^n z_i\left(\bar y_i - \bar x_i'\beta_W\right)\right) = \gamma + \left(\sum_{i=1}^n z_i z_i'\right)^{-1}\left(\sum_{i=1}^n z_i\left(\alpha_i + \bar\varepsilon_i + \bar x_i'(\beta - \beta_W)\right)\right) \xrightarrow{p} \gamma$$
since $\beta_W \xrightarrow{p} \beta$.
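A sketch of this two-step procedure with scalar $x$ and $z$ (simulated, exogenous $z_i$; names illustrative):

```python
# Two-step gamma: within estimator for beta, then regress the group-mean
# residual y-bar_i - x-bar_i * beta_W on z_i.
import numpy as np

rng = np.random.default_rng(6)
N, T = 500, 4
beta, gamma = 1.0, 2.0
x = rng.standard_normal((N, T))
z = rng.standard_normal(N)                       # time-invariant, exogenous
alpha = rng.standard_normal(N)                   # uncorrelated with z
y = beta * x + (gamma * z + alpha)[:, None] + 0.3 * rng.standard_normal((N, T))

# within estimator for beta (scalar-regressor case)
xd = x - x.mean(1, keepdims=True)
yd = y - y.mean(1, keepdims=True)
beta_W = (xd * yd).sum() / (xd ** 2).sum()

resid = y.mean(1) - x.mean(1) * beta_W           # y-bar_i - x-bar_i * beta_W
gamma_hat = (z * resid).sum() / (z ** 2).sum()
print(beta_W, gamma_hat)                         # near (1, 2)
```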
Case 2: some $z_i$ endogenous. Consider:
- $x_{1it}$: $k_1$ of them, exogenous
- $z_{1i}$: $g_1$ of them, exogenous
- $x_{2it}$: $k_2$ of them, endogenous
- $z_{2i}$: $g_2$ of them, endogenous

Need at least $k_1 \geq g_2$ in order to estimate $\gamma$. Use $A = (X_{1it} : Z_{1i})$ to instrument the equation
$$\bar y_i - \bar x_i'\beta_W = z_i'\gamma + \left(\alpha_i + \bar\varepsilon_i + \bar x_i'(\beta - \beta_W)\right)$$
and define $\hat\gamma_{IV}$ as
$$\hat\gamma_{IV} = \left(Z' P_A Z\right)^{-1}\left(Z' P_A \left(\bar y - \bar X \beta_W\right)\right)$$
where $P_A$ is the projection matrix onto the column space of $A$.
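A sketch of the just-identified case $k_1 = g_2 = 1$ (no $z_{1i}$ or $x_{2it}$), where the group mean $\bar x_{1i}$ of the exogenous time-varying regressor instruments the endogenous $z_{2i}$; the design is constructed for illustration only:

```python
# IV for gamma when z_2 is correlated with alpha_i, instrumented by x1-bar_i.
import numpy as np

rng = np.random.default_rng(7)
N, T = 2000, 4
alpha = rng.standard_normal(N)
x1 = rng.standard_normal((N, T))                 # exogenous, time-varying (k1 = 1)
x1bar = x1.mean(1)
z2 = x1bar + alpha + rng.standard_normal(N)      # endogenous: built to load on alpha
beta, gamma = 1.0, 2.0
y = beta * x1 + (gamma * z2 + alpha)[:, None] + 0.3 * rng.standard_normal((N, T))

# within estimate of beta
xd = x1 - x1.mean(1, keepdims=True)
yd = y - y.mean(1, keepdims=True)
beta_W = (xd * yd).sum() / (xd ** 2).sum()

# just-identified IV of the between residual on z2, instrument x1bar
resid = y.mean(1) - x1bar * beta_W
gamma_iv = (x1bar * resid).sum() / (x1bar * z2).sum()
print(gamma_iv)   # near 2; plain LS of resid on z2 would be biased upward
```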
Recall the equation in the entire sample:
$$y_{it} = x_{it}'\beta + z_i'\gamma + \alpha_i + \varepsilon_{it}$$
This gives a consistent but inefficient estimate of $\gamma$; the inefficiency comes from:

(1) Not making use of all instruments. An efficient set of instruments is given by
$$A = (QX_1,\ QX_2,\ PX_1,\ Z_1), \qquad Q = I_N \otimes Q_T,\quad P = I_N \otimes P_T.$$
The reason $QX_2$ can be used as an instrument is that within-group variation is always uncorrelated with $\alpha_i$:
$$E X_2' Q(\alpha \otimes l) = E X_2' \cdot 0 = 0.$$
Note also that $X_1$ is used as two sets of IVs: $QX_1$ instruments $X_1$ while $PX_1$ instruments $Z$.
In contrast to the standard simultaneous equations model, where you need EXCLUDED exogenous variables to instrument INCLUDED endogenous variables, here you use INCLUDED exogenous variables to instrument INCLUDED endogenous variables. This is the special feature of the time invariance of the fixed effect $\alpha_i$.

(2) Ignoring heteroscedasticity. Recall that
$$\operatorname{Var}(\alpha \otimes l + \varepsilon) = \sigma_\varepsilon^2\left(I_N \otimes \left(I_T + \frac{1-\theta^2}{\theta^2}\, P_T\right)\right),$$
where you can estimate $\sigma_\varepsilon^2$ and $\theta^2$ using the consistent $\hat\gamma_{IV}$. So the efficient IV estimator described in Hausman and Taylor (1981) applies IV with instruments $A = (QX_1, QX_2, PX_1, Z_1)$ to the transformed equation:
$$\hat\Omega^{-1/2} y = \hat\Omega^{-1/2} X\beta + \hat\Omega^{-1/2} Z\gamma + \hat\Omega^{-1/2}(\alpha \otimes l + \varepsilon)$$
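A side remark on the transform (a reasoning step not spelled out above): since $Q_T$ and $P_T$ are orthogonal projections with $Q_T P_T = 0$, we have $(Q_T + \theta P_T)^2 = Q_T + \theta^2 P_T$, so $\Omega^{-1/2} = I_N \otimes (Q_T + \theta P_T)$; elementwise this is quasi-demeaning, $y_{it} - (1-\theta)\bar y_i$. A quick numerical check:

```python
# Check that (Q_T + theta P_T)^2 = Q_T + theta^2 P_T, so it is Omega^{-1/2}
# blockwise. theta^2 value is arbitrary for the check.
import numpy as np

T, theta2 = 5, 0.3
theta = np.sqrt(theta2)
P = np.full((T, T), 1.0 / T)
Q = np.eye(T) - P
half = Q + theta * P
print(np.allclose(half @ half, Q + theta2 * P))   # True
```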
Lagged Dependent Variables

Recall
$$y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}$$
where it is assumed that $E(\varepsilon_i \mid x_i) = 0$, i.e., $E\varepsilon_{it} x_{is} = 0$ for all $t, s$. For $\beta_W$ to be consistent, it is crucially important that none of the lagged $y_{it}$'s appear in $x_{it}$. Adding lagged INdependent variables is never a problem:
$$y_{it} = x_{it}'\beta_1 + x_{it-1}'\beta_2 + \alpha_i + \varepsilon_{it}$$
The problem comes once a single lagged $y_{it}$ appears on the right-hand side:
$$y_{it} = x_{it}'\beta_1 + y_{it-1}\beta_2 + \alpha_i + \varepsilon_{it}$$
$$y_{it-1} = x_{it-1}'\beta_1 + y_{it-2}\beta_2 + \alpha_i + \varepsilon_{it-1}$$
Now $\beta_W$ is no longer consistent, since even after you difference out $\alpha_i$:
$$y_{it} - y_{it-1} = (x_{it} - x_{it-1})'\beta_1 + (y_{it-1} - y_{it-2})\beta_2 + \varepsilon_{it} - \varepsilon_{it-1},$$
LS or GLS on this differenced equation cannot be consistent:
$$E\left((y_{it-1} - y_{it-2})(\varepsilon_{it} - \varepsilon_{it-1})\right) \neq 0 \qquad\text{because}\qquad E y_{it-1}\varepsilon_{it-1} \neq 0.$$
But since $E y_{it-2}(\varepsilon_{it} - \varepsilon_{it-1}) = 0$, you can use $y_{it-2}$ to instrument $y_{it-1} - y_{it-2}$. The more lagged $y$'s you add to the regressors, the more lagged $y$'s you will need as instruments. In other words, the panel needs to be long enough relative to the number of lagged $y$'s in the regression, no matter how big the cross section is.
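A sketch of this IV fix on a simulated pure AR(1) panel, no $x$'s, in the spirit of the argument above (design and names illustrative):

```python
# Difference out alpha_i, then instrument (y_{it-1} - y_{it-2}) with y_{it-2};
# contrast with inconsistent LS on the differenced equation.
import numpy as np

rng = np.random.default_rng(8)
N, T, b = 5000, 6, 0.5
alpha = rng.standard_normal(N)
y = np.zeros((N, T + 1))
y[:, 0] = alpha / (1 - b) + rng.standard_normal(N)   # start near the AR(1) mean
for t in range(1, T + 1):
    y[:, t] = b * y[:, t - 1] + alpha + rng.standard_normal(N)

# stack differenced observations for t = 2,...,T
dy   = (y[:, 2:] - y[:, 1:-1]).ravel()        # y_it - y_{it-1}
dy_1 = (y[:, 1:-1] - y[:, :-2]).ravel()       # y_{it-1} - y_{it-2}
z    = y[:, :-2].ravel()                      # instrument y_{it-2}

b_iv = (z * dy).sum() / (z * dy_1).sum()      # consistent
b_ls = (dy_1 * dy).sum() / (dy_1 ** 2).sum()  # inconsistent
print(b_iv, b_ls)                             # b_iv near 0.5, b_ls biased
```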
This assumes there is no serial correlation in the $\varepsilon_{it}$'s. If $\varepsilon_{it}$ is MA(1), $\varepsilon_{it} = \rho u_{it-1} + u_{it}$, then
$$E y_{it-2}\varepsilon_{it-1} = E y_{it-2}\left(\rho u_{it-2} + u_{it-1}\right) = \rho\, E y_{it-2} u_{it-2} \neq 0,$$
so $y_{it-3}$ instead of $y_{it-2}$ must be used to instrument
$$y_{it} - y_{it-1} = (x_{it} - x_{it-1})'\beta_1 + (y_{it-1} - y_{it-2})\beta_2 + \varepsilon_{it} - \varepsilon_{it-1}.$$
If $\varepsilon_{it}$ is AR($p$), then there is nothing you can do, unless you exclude some $x_i$ (e.g., lead and lagged values) from the regressors so that they can be used as instruments. So a single $y_{it-1}$ as a regressor is enough to bring up all these problems created by serial correlation in the $\varepsilon_{it}$'s, while lagged INdependent variables $x_{it}$ alone don't.
Incidental Parameters

Consider estimating the coefficient on the lagged dependent variable:
$$y_{i1} = \beta y_{i0} + \alpha_i + \varepsilon_{i1}$$
$$y_{i2} = \beta y_{i1} + \alpha_i + \varepsilon_{i2}$$
If $T$ is fixed, the MLE for $\beta$ is not consistent. Also, we cannot estimate the nuisance parameters $\alpha_i$, $i = 1,\dots,n$, consistently. Assuming $\varepsilon_{it} \sim N(0,1)$, the log-likelihood is
$$\text{Const} - \frac{1}{2}\sum_{i=1}^n\left[(y_{i1} - \beta y_{i0} - \alpha_i)^2 + (y_{i2} - \beta y_{i1} - \alpha_i)^2\right]$$
First concentrate out $\alpha_i$ by just taking the first-order condition:
$$\hat\alpha_i = \frac{y_{i1} + y_{i2}}{2} - \hat\beta\,\frac{y_{i1} + y_{i0}}{2}$$
Substituting this back into the concentrated likelihood and simplifying, it is, up to a constant and a constant of proportionality,
$$\sum_{i=1}^n\left(y_{i2} - y_{i1} - \hat\beta\,(y_{i1} - y_{i0})\right)^2$$
This is just regressing $y_{i2} - y_{i1}$ on $y_{i1} - y_{i0}$, which is exactly $\beta_W$, and we know it is inconsistent in the presence of the lagged dependent variables $y_{i1}$ and $y_{i0}$.
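A small simulation illustrating the point: with $T = 2$ the concentrated MLE coincides with $\beta_W$, whose bias does not shrink as $N$ grows (design and names illustrative):

```python
# Incidental-parameters problem: the concentrated MLE / within estimator of
# beta in the T = 2 dynamic panel stays biased no matter how large N is.
import numpy as np

rng = np.random.default_rng(9)
b = 0.5
for N in (100, 10_000, 1_000_000):
    alpha = rng.standard_normal(N)
    y0 = rng.standard_normal(N)
    y1 = b * y0 + alpha + rng.standard_normal(N)
    y2 = b * y1 + alpha + rng.standard_normal(N)
    # regress (y2 - y1) on (y1 - y0): the concentrated MLE derived above
    b_hat = ((y1 - y0) * (y2 - y1)).sum() / ((y1 - y0) ** 2).sum()
    print(N, b_hat)   # does not approach 0.5 as N grows
```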