Econ 582 Fixed Effects Estimation of Panel Data Eric Zivot May 28, 2012 Panel Data Framework = x 0 β + = 1 (individuals); =1 (time periods) y 1 = X β ( ) ( 1) + ε Main question: Is x uncorrelated with? 1 If yes, then we have a SUR type model with common coefficients 2 If no, then we have a multi-equation system with common coefficients and endogenous regressors We need to use an estimation procedure to deal with the endogeneity In the panel set-up, under certain assumptions, we can deal with the endogeneity without using instruments using the so-called fixed effects (FE) estimator
Error Components Assumption = + = unobserved fixed effect [x ] = 0 [ε ε 0 ] = Σ The fixed effect component (which is actually an unobserved random variable) captures unobserved heterogeneity across individuals that is fixed over time With the error components assumption, the RE and FE models are defined as follows: RE model: [x ]=0 FE model: [x ] 6= 0 Example: Panel wage equation 69 = + 69 + + 69 + + 69 80 = + 80 + + 80 + + 80 [ ] 6= 0 Here captures unobserved ability that is correlated with Itisassumed that ability does not vary over time Note: The existence of guarantees across time error correlation: [ 69 80 ] = [ ³ + 69 ³ + 80 ] = [ 2 ]+ [ 80 ]+ [ 69 ]+ [ 69 80 ] 6= 0
Remark: With panel data, as we saw in the last lecture, the endogeneity due to unobserved heterogeneity (ie, [x ] 6= 0) can be eliminated without the use of instruments To see this, consider the difference in log-wages over time: 80 69 =( )+ ( 80 69 ) + ( 80 69 )+( )+ ³ 80 69 = ( 80 69 )+ ( 80 69 )+ ³ 80 69 Hence, we can consistently estimate and by using the first differenced data! Fixed Effects Estimation Key insight: With panel data, β can be consistently estimated without using instruments There are 3 equivalent approaches 1 Within group estimator 2 Least squares dummy variable estimator 3 First difference estimator
Within group estimator To illustrate the within group estimator consider the simplified panel regression with a single regressor = + + [ ] 6= 0 [ ] = 0 Trick to remove fixed effect : First, for each average over time = + + = 1 = 1 = 1 Second, form the transformed regression or = ( )+( )+ = + Now, stack by observation for =1 giving the giant regression ỹ 1 x 1 η 1 = + ỹ x η or ỹ = x + 1 1 η 1
The within-group FE estimator is pooled OLS on the transformed regression (stacked by observation) ˆ = ( x 0 x) 1 x 0 ỹ = x 0 x x 0 ỹ Remarks 1 If x does not vary with (eg x = x ) then x = 0 and we cannot estimate β 2 Must be careful computing the degrees of freedom for the FE estimator There are total observations and parameters in β so it appreas that there are degrees of freedom However, you lose 1 degree of freedom for each fixed effect eliminated So the actual degrees of freedom are = ( 1)
Matrix Algebra Derivation of Within Group Fixed Effects Estimator Consider the general model (assume all variables vary with and ) = x 0 β + + Stack the observations for =1 giving Define y = X 1 β ( ) ( 1) 1 (1 1)( 1) + + η 1 Q = I 1 (1 0 1 ) 1 1 0 = I P P = 1 (1 0 1 ) 1 1 0 = 1 1 1 0 Note P 1 = 1 Q 1 = 0 P y = 1 (1 0 1 ) 1 1 0 y = 1 Q y = (I P )y = y 1 = ỹ
The transformed error components model is then Q y = Q X β + Q 1 + Q η ỹ = X β + η The giant regression (stacked by observation) is ỹ 1 = β + ỹ or ỹ = X 1 X 1 X β ( )( 1) η 1 η + η 1 Note: Unless [ηη 0 ]=σ 2 I ˆβ = ˆβ is not efficient The FE estimator is again pooled OLS on the transformed system ˆβ = ³ X 0 X 1 X 0 ỹ = X 0 X X 0 ỹ = (Q X ) 0 Q X = X 0 Q X X 0 Q y since Q is idempotent (Q X ) 0 Q y
Least Squares Dummy Variable Model Consider the general model Stack the observations over giving = x 0 β + + y = X β + 1 + η 1 Now create the giant regression or y 1 y 1 y = X = X 1 X ( )( 1) D = I 1 β β + + D 1 0 0 0 0 0 0 1 α + ( )( 1) 1 + η 1 η η = Xβ +(I 1 ) α + η ( 1) Aside: Partitioned Regression Consider the partitioned regression equation y = X 1 β 1 + X 2 β 2 + ε 1 1 1 1 2 2 2 The LS estimators for β 1 and β 2 can be expressed as where ˆβ 1 = (X 0 1 Q 2X 1 ) 1 X 0 1 Q 2y Q 2 = I P X2 ˆβ 2 = (X 0 2 Q 1X 2 ) 1 X 0 2 Q 1y Q 1 = I P X1 P X1 = X 1 (X 0 1 X 1) 1 X 0 1 P X 2 = X 2 (X 0 2 X 2) 1 X 0 2
Result: The FE estimator is the partitioned OLS estimator of β in the giant regression ˆβ = (X 0 Q X) 1 X 0 Q y Q = I P P = D(D 0 D) 1 D 0 D = I 1 Now, Therefore, P = (I 1 ) h (I 1 ) 0 (I 1 ) i 1 (I 1 ) 0 = (I 1 )[I 1 0 1 ] 1 (I 1 ) 0 = (I 1 )[I ³ 1 0 1 1] ³ I 1 0 = I P Q = I P = I (I P ) = I Q As a result ˆβ = (X 0 Q X) 1 X 0 Q y = (X 0 (I Q ) X) 1 X 0 (I Q ) y = X 0 X X 0 ỹ which is exactly the result we got when we transformed the model by subtracting off group means That is, the LSDV estimator of β in the FE model is numerically identical to the Within estimator of
Recovering Estimates of α With the LSDV approach, it is straightforward to deduce estimates for By examining the normal equations for α in the Giant Regression, one can deduce that Comments: ˆ = x 0 ˆβ = 1 x = 1 For short panels (small )ˆ is inconsistent ( fixed and ) FE as a First Difference Estimator Results: When =2 pooled OLS on the first differenced model is numerically identical to the LSDV and Within estimators of β When 2 pooled OLS on the first differenced model is not numerically the same as the LSDV and Within estimators of β It is consistent, but generally less efficient that the LSDV and Within estimators When 2 and [η η 0 ]= 2 I (no serial correlation), then pooled GLS on the first differenced model is numerically the same as the LSDV andwithinestimators
Equivalence of First Differenced (FD) Estimator and LSDV and Within Estimators Consider the error components model = x 0 β + + [η η 0 ] = 2 I Note: The spherical error assumption is made so that GLS estimation is possible Take 1st differences over : 2 = 2 1 = x 0 2 β + 2 = ( 1) = x 0 β + Notice that differencing removes the fixed effect In matrix form the model is or C 0 y = C 0 X β + C 0 η ( 1) 1 ŷ = ˆX β + ˆη where Note that C 0 ( 1) = 1 1 0 0 0 1 1 0 0 0 0 1 1 [ˆη ˆη 0 ]= [C0 η η 0 C]=C0 [η η 0 ]C = 2 C 0 C
The transformed Giant Regression is ŷ 1 = ŷ or Notice that [ˆηˆη 0 ] = ( 1) ( 1) ŷ = ˆX ( 1) 1 ˆX 1 ˆX β + β (( 1) ) ( 1) ˆη 1 ˆη 2 C 0 C 0 0 0 2 C 0 C 0 0 0 2 C 0 C + ˆη ( 1) 1 = 2 h I (C 0 C) i Pooled GLS on the transformed regression gives ˆβ = h ˆX 0 ³ I (C 0 C) 1 ˆX i 1 ˆX 0 ³ I (C 0 C) 1 ŷ = X 0 C(C 0 C) 1 C 0 X X 0 C(C 0 C) 1 C 0 y = = = X 0 P X X 0 Q X X 0 X X 0 ỹ X 0 P y X 0 Q y
Now: ˆX 0 ³ I (C 0 C) 1 ˆX = h ˆX 0 1 ˆX 0 i (C 0 C) 1 0 0 0 0 0 0 (C 0 C) 1 = ˆX 0 (C0 C) 1 ˆX = = (C 0 X ) 0 (C 0 C) 1 C 0 X X 0 C(C 0 C) 1 C 0 X = ˆX 1 ˆX X 0 P X P = C(C 0 C) 1 C 0 The result P = C(C 0 C) 1 C 0 = Q = I 1 (1 0 1 ) 1 1 0 follows from the result that 1 1 0 0 1 0 C 0 0 1 1 0 1 = 1 = 0 0 0 0 1 1 1 0 That is, P = C(C 0 C) 1 C 0 is an idempotent matrix satisfying P 1 = 0 It projects onto the space orthogonal to 1 andthisisexactlywhatq does: Q 1 = 0 Hence, when [η η 0 ]= 2 I pooled GLS on the FD model is numerically equivalent to the LSDV and Within estimators of β
Panel Robust Statistical Inference = x 0 β + + = 1 ; =1 Assumptions: 1 y = X β + ε 2 ε = 1 + η (error components) 3 {y X } is iid (over but not ) 4 [ ] 6= 0(Endogeneity) 5 [ ]=0for =1 2 Comments It is reasonable to assume independence over (ie, cross-sectional independence due to random sampling at given ) Errors are generally serially correlated over for a given ( is autocorrelated) and heteroskedastic over (cross-sectional heteroskedasticity) OLS standard errors are typically downward biased due to serial correlation Panel robust standard errors correct for serial correlation and heteroskedasticity
FE Panel Data Estimators in Common Notation The Within (LSDV) and FD estimators of β in = x 0 β + + = 1 ; =1 are pooled OLS estimators in a transformed model = x 0 β + Within Estimator: = x = x x FD Estimator: = 1 x = x x 1 1 In matrix notation, we have ỹ = X β + η =1 1 and the giant regression is Pooled OLS on the giant regression is ˆβ = ³ X 0 X 1 X 0 ỹ = ỹ = X β + η 1 1 X 0 X X 0 ỹ =
Asymptotic Distribution Theory (Advanced) Using y = X β + η we have ˆβ = X 0 X X 0 = X 1 0 X = β + X 0 X ³ X β + η X 0 X β + X 0 η X 0 X X 0 η It follows that ³ˆβ β = 1 X 0 X 1 X X 0 η The Law of Large Numbers (LLN) gives 1 X 0 X [ X 0 X ] 1 and the Central Limit Theorem (CLT) gives 1 X X 0 η (0 [ X 0 η η 0 X ])
Hence, by Slutsky s theorem ³ˆβ β = 1 1 X 0 X 1 X X 0 [ X 0 X ] 1 (0 [ X 0 η η 0 X ]) (0 avar(ˆβ )) where Therefore, avar(ˆβ )= [ X 0 X ] 1 [ X 0 η η 0 X ] [ X 0 X ] 1 ˆβ (β 1 avar(ˆβ d )) Remark: The sandwich form of avar(ˆβ ) suggests that it is an inefficient estimator in general Panel Robust Inference Under serial correlation and heteroskedasticity, it can be shown (Econ 583) 1 X 0 X [ X 0 X 1 ] X 0 η η 0 X [ X 0 η η 0 X ] η = ỹ X ˆβ Then a consistent estimate for avar(ˆβ ) is avar(ˆβ d ) = 1 1 X 0 X 1 X 0 X X 0 η η 0 X 1
Remarks The formula for avar(ˆβ d ) is the panel robust covariance estimate It is not the same as the usual White correction for heteroskedasticity in a pooled OLS regression The White correction does not account for serial correlation It is also not the Newey-West correction for heteroskedasticity and autocorrelation The formula for avar(ˆβ d ) is also known as the cluster robust covariance estimate when the clustering variable is (each individual is a cluster) Special Case: When η is Spherical [η η 0 ]= 2 I For the Within estimator, we have η = Q η so that [ η η 0 ]= [Q η η 0 Q ]= 2 Q This implies that there is no serial correlation (correlation across time) and no cross-sectional heteroskedasticity [ X 0 η η 0 X ]= [ X 0 [ η η 0 ]] X ]= 2 [ X 0 X ] and avar(ˆβ )= 2 [ X 0 X ] 1
Then a consistent estimate for avar(ˆβ ) is avar(ˆβ d ) = ˆ 2 ˆ 2 = 1 1 X 0 X η = (ỹ X ˆβ ) η 0 η Remark: In this case the Within estimator is an efficient estimator