Economics 582 Random Effects Estimation

Eric Zivot
May 29, 2013

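Random Effects Model

$$y_{it} = x_{it}'\beta + \alpha_i + \eta_{it}$$
$$E[x_{it}\eta_{it}] = 0 \quad \text{(no endogeneity)}$$
$$E[\alpha_i x_{it}] = \ldots$$

Hence, the model can be re-written as

$$y_{it} = \mu + x_{it}'\beta + \alpha_i + \eta_{it}$$
$$E[x_{it}\eta_{it}] = 0, \qquad E[\alpha_i x_{it}] = 0$$

The RE framework is a SUR model with common coefficients; efficient estimation depends on the covariance structure of $\alpha_i + \eta_{it}$.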

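The Homoskedastic Equi-correlation Framework

A common covariance structure in the RE model is based on the assumptions

$$\alpha_i \sim \text{iid}\,(0, \sigma_\alpha^2), \qquad \eta_{it} \sim \text{iid}\,(0, \sigma_\eta^2)$$

Let $\varepsilon_{it} = \alpha_i + \eta_{it}$. Then

$$\text{cov}(\varepsilon_{it}, \varepsilon_{is}) = \begin{cases} \sigma_\alpha^2 & \text{for } t \neq s \\ \sigma_\alpha^2 + \sigma_\eta^2 & \text{for } t = s \end{cases}$$

Note,

$$\text{cor}(\varepsilon_{it}, \varepsilon_{is}) = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\eta^2} \ \text{for } t \neq s, \qquad \text{cov}(\varepsilon_{it}, \varepsilon_{js}) = 0 \ \text{for } i \neq j$$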

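In matrix notation we have

$$\underset{(T \times 1)}{y_i} = W_i\delta + \varepsilon_i, \qquad W_i = [1_T \;\; X_i], \qquad \delta = (\mu, \beta')', \qquad \varepsilon_i = 1_T\alpha_i + \eta_i$$

Then $E[\varepsilon_i\varepsilon_i'] = \Omega$ is $T \times T$ with

$$E[\varepsilon_i\varepsilon_i'] = E[(1_T\alpha_i + \eta_i)(1_T\alpha_i + \eta_i)'] = E[\alpha_i^2]\,1_T 1_T' + E[\eta_i\eta_i'] = \sigma_\alpha^2\, 1_T 1_T' + E[\eta_i\eta_i']$$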

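Now

$$E[\eta_i\eta_i'] = \begin{pmatrix} \sigma_\eta^2 & & \\ & \ddots & \\ & & \sigma_\eta^2 \end{pmatrix} = \sigma_\eta^2 I_T$$

Hence,

$$E[\varepsilon_i\varepsilon_i'] = \sigma_\alpha^2\, 1_T 1_T' + \sigma_\eta^2 I_T = \Omega = \begin{pmatrix} \sigma_\alpha^2 + \sigma_\eta^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\eta^2 & \cdots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_\eta^2 \end{pmatrix}$$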
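As a quick numerical check of this structure, the sketch below builds $\Omega$ for illustrative values $T = 3$, $\sigma_\alpha^2 = 2$, $\sigma_\eta^2 = 1$. The function name `equicorrelated_cov` and the numbers are assumptions made for the example, not part of the lecture.

```python
import numpy as np

def equicorrelated_cov(T, sigma2_alpha, sigma2_eta):
    """Return Omega = sigma2_alpha * 1 1' + sigma2_eta * I_T (hypothetical helper)."""
    ones = np.ones((T, 1))
    return sigma2_alpha * (ones @ ones.T) + sigma2_eta * np.eye(T)

# Illustrative values only (assumed, not taken from the slides).
omega = equicorrelated_cov(T=3, sigma2_alpha=2.0, sigma2_eta=1.0)

# Implied within-individual correlation between any two distinct periods.
rho = omega[0, 1] / omega[0, 0]

print(omega)
print(rho)
```

The diagonal holds $\sigma_\alpha^2 + \sigma_\eta^2$ and every off-diagonal entry holds $\sigma_\alpha^2$, so the correlation between two periods does not depend on how far apart they are.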

Remarks

Because $\Omega \neq \sigma^2 I_T$, pooled OLS is not an efficient estimator. The efficient estimator is pooled GLS, which is also the SUR estimator.

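Efficient Estimation of the RE model

Using $\Omega^{-1} = \Omega^{-1/2}\,\Omega^{-1/2\prime}$, the transformed model is

$$\Omega^{-1/2}y_i = \Omega^{-1/2}W_i\delta + \Omega^{-1/2}\varepsilon_i$$

or

$$\tilde y_i = \tilde W_i\delta + \tilde\varepsilon_i, \qquad E[\tilde\varepsilon_i\tilde\varepsilon_i'] = I_T$$

Hence, the GLS (SUR) estimator is the pooled OLS estimator applied to the transformed model:

$$\hat\delta_{GLS} = \left(\sum_{i=1}^N \tilde W_i'\tilde W_i\right)^{-1}\sum_{i=1}^N \tilde W_i'\tilde y_i = \left(\sum_{i=1}^N W_i'\Omega^{-1}W_i\right)^{-1}\sum_{i=1}^N W_i'\Omega^{-1}y_i$$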
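The equality of the two forms of the GLS estimator can be checked on simulated data. The sketch below is an illustration under assumed values (sample sizes, variances and coefficients are made up), and it uses the symmetric inverse square root of $\Omega$ obtained from an eigendecomposition, which is one valid choice of $\Omega^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 4, 2                      # assumed panel dimensions
sigma2_alpha, sigma2_eta = 1.5, 1.0     # assumed (known) variance components
delta_true = np.array([1.0, 0.5, -0.25])

omega = sigma2_alpha * np.ones((T, T)) + sigma2_eta * np.eye(T)
omega_inv = np.linalg.inv(omega)

# Symmetric inverse square root: Omega^{-1/2} = V diag(l^{-1/2}) V'
eigval, eigvec = np.linalg.eigh(omega)
omega_inv_half = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T

A = np.zeros((K + 1, K + 1))
b = np.zeros(K + 1)
W_tilde_blocks, y_tilde_blocks = [], []
for i in range(N):
    X_i = rng.normal(size=(T, K))
    W_i = np.column_stack([np.ones(T), X_i])
    eps_i = (rng.normal(scale=np.sqrt(sigma2_alpha))
             + rng.normal(scale=np.sqrt(sigma2_eta), size=T))
    y_i = W_i @ delta_true + eps_i
    # Accumulate the direct GLS sums.
    A += W_i.T @ omega_inv @ W_i
    b += W_i.T @ omega_inv @ y_i
    # Store the transformed data for pooled OLS.
    W_tilde_blocks.append(omega_inv_half @ W_i)
    y_tilde_blocks.append(omega_inv_half @ y_i)

delta_gls = np.linalg.solve(A, b)
W_tilde = np.vstack(W_tilde_blocks)
y_tilde = np.concatenate(y_tilde_blocks)
delta_pols_transformed, *_ = np.linalg.lstsq(W_tilde, y_tilde, rcond=None)

print(delta_gls)
print(delta_pols_transformed)
```

Both printed vectors should agree up to floating-point error, because $\tilde W_i'\tilde W_i = W_i'\Omega^{-1}W_i$ and $\tilde W_i'\tilde y_i = W_i'\Omega^{-1}y_i$.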

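Feasible GLS estimation

Because $\sigma_\alpha^2$ and $\sigma_\eta^2$ are typically unknown, GLS is not feasible. The feasible GLS estimator is

$$\hat\delta_{FGLS} = \left(\sum_{i=1}^N W_i'\hat\Omega^{-1}W_i\right)^{-1}\sum_{i=1}^N W_i'\hat\Omega^{-1}y_i$$

where

$$\hat\Omega = \hat\sigma_\alpha^2\, 1_T 1_T' + \hat\sigma_\eta^2 I_T, \qquad \hat\sigma_\alpha^2 \xrightarrow{p} \sigma_\alpha^2 \ \text{ and } \ \hat\sigma_\eta^2 \xrightarrow{p} \sigma_\eta^2$$

Q: How to consistently estimate $\sigma_\alpha^2$ and $\sigma_\eta^2$?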

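Consistent Estimation of $\sigma_\alpha^2$ and $\sigma_\eta^2$

There are several ways to consistently estimate $\sigma_\alpha^2$ and $\sigma_\eta^2$. The most common estimators are as follows. Estimation of $\sigma_\eta^2$ is based on the FE Within estimator from the regression

$$\tilde y_{it} = \tilde x_{it}'\beta + \tilde\eta_{it}, \qquad \tilde y_{it} = y_{it} - \bar y_i, \quad \tilde x_{it} = x_{it} - \bar x_i, \quad \tilde\eta_{it} = \eta_{it} - \bar\eta_i$$

Then, with $K$ the number of regressors in $x_{it}$ and $\hat{\tilde\eta}_{it}$ the Within residuals,

$$\hat\sigma_\eta^2 = \frac{1}{N(T-1) - K}\sum_{i=1}^N\sum_{t=1}^T \hat{\tilde\eta}_{it}^{\,2}$$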

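Estimation of $\sigma_\alpha^2$ is based on the Between regression

$$\bar y_i = \bar w_i'\delta + \alpha_i + \bar\eta_i, \qquad i = 1, \ldots, N$$

Then

$$\text{var}(\alpha_i + \bar\eta_i) = \sigma_\alpha^2 + \frac{\sigma_\eta^2}{T}$$

so that, with $\hat\delta_B$ the Between estimator,

$$\hat\sigma_\alpha^2 = \frac{1}{N - (K+1)}\sum_{i=1}^N\left(\bar y_i - \bar w_i'\hat\delta_B\right)^2 - \frac{1}{T}\hat\sigma_\eta^2$$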
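The two variance-component estimators can be sketched on a simulated balanced panel. Everything specific here is an assumption for illustration: the data-generating values, the single regressor, and the degrees-of-freedom corrections $N(T-1) - K$ and $N - (K+1)$.

```python
import numpy as np

rng = np.random.default_rng(42)
N, T, K = 2000, 5, 1                    # assumed panel dimensions
mu, beta = 1.0, np.array([0.5])         # assumed true coefficients
sigma2_alpha, sigma2_eta = 2.0, 1.0     # assumed true variance components

x = rng.normal(size=(N, T, K))
alpha = rng.normal(scale=np.sqrt(sigma2_alpha), size=(N, 1))
eta = rng.normal(scale=np.sqrt(sigma2_eta), size=(N, T))
y = mu + x @ beta + alpha + eta         # shape (N, T)

# Within (FE) regression on individual-demeaned data.
y_bar = y.mean(axis=1, keepdims=True)   # (N, 1)
x_bar = x.mean(axis=1, keepdims=True)   # (N, 1, K)
y_w = (y - y_bar).reshape(-1)
x_w = (x - x_bar).reshape(-1, K)
beta_w, *_ = np.linalg.lstsq(x_w, y_w, rcond=None)
resid_w = y_w - x_w @ beta_w
s2_eta = resid_w @ resid_w / (N * (T - 1) - K)

# Between regression on individual means (with intercept).
w_bar = np.column_stack([np.ones(N), x_bar.reshape(N, K)])
delta_b, *_ = np.linalg.lstsq(w_bar, y_bar.ravel(), rcond=None)
resid_b = y_bar.ravel() - w_bar @ delta_b
s2_alpha = resid_b @ resid_b / (N - (K + 1)) - s2_eta / T

print(s2_eta, s2_alpha)
```

With a large $N$ the two estimates should land close to the assumed values 1 and 2. In small samples the estimate of $\sigma_\alpha^2$ can come out negative, since it is a difference of two estimated quantities.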

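Alternative Form of RE Estimator

Result: The FGLS estimator

$$\hat\delta_{FGLS} = \left(\sum_{i=1}^N W_i'\hat\Omega^{-1}W_i\right)^{-1}\sum_{i=1}^N W_i'\hat\Omega^{-1}y_i$$

can be computed as the pooled OLS estimator from the transformed model

$$y_{it} - \hat\lambda\bar y_i = (w_{it} - \hat\lambda\bar w_i)'\delta + v_{it}, \qquad v_{it} = (1 - \hat\lambda)\alpha_i + (\eta_{it} - \hat\lambda\bar\eta_i)$$

where $\hat\lambda$ is a consistent estimator of

$$\lambda = 1 - \frac{\sigma_\eta}{\sqrt{T\sigma_\alpha^2 + \sigma_\eta^2}}$$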
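The equivalence between the matrix form and the quasi-demeaned pooled OLS form is an algebraic identity, so it can be verified with any plugged-in variance values. The sketch below uses made-up values in place of $\hat\sigma_\alpha^2$ and $\hat\sigma_\eta^2$; all names and numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 40, 5, 2                      # assumed panel dimensions
s2_alpha, s2_eta = 0.8, 1.2             # stand-ins for the estimated variances

omega_hat_inv = np.linalg.inv(s2_alpha * np.ones((T, T)) + s2_eta * np.eye(T))
lam = 1.0 - np.sqrt(s2_eta / (T * s2_alpha + s2_eta))

# Simulated data: W has shape (N, T, K+1) with a leading column of ones.
X = rng.normal(size=(N, T, K))
W = np.concatenate([np.ones((N, T, 1)), X], axis=2)
y = (W @ np.array([1.0, 0.5, -0.3])
     + rng.normal(size=(N, 1))          # individual effect
     + rng.normal(size=(N, T)))         # idiosyncratic error

# Matrix form of the (F)GLS estimator.
A = sum(W[i].T @ omega_hat_inv @ W[i] for i in range(N))
b = sum(W[i].T @ omega_hat_inv @ y[i] for i in range(N))
delta_matrix_form = np.linalg.solve(A, b)

# Pooled OLS on quasi-demeaned data: subtract lam times the individual mean.
W_star = (W - lam * W.mean(axis=1, keepdims=True)).reshape(N * T, K + 1)
y_star = (y - lam * y.mean(axis=1, keepdims=True)).reshape(N * T)
delta_quasi_demeaned, *_ = np.linalg.lstsq(W_star, y_star, rcond=None)

print(delta_matrix_form)
print(delta_quasi_demeaned)
```

The two coefficient vectors agree up to floating-point error because $\hat\sigma_\eta\hat\Omega^{-1/2} = I_T - \hat\lambda\, 1_T 1_T'/T$, so premultiplying by $\hat\Omega^{-1/2}$ is the same, up to scale, as subtracting $\hat\lambda$ times the individual mean.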

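Remarks:

If $\hat\lambda = 0$ then $\hat\delta_{RE}$ = pooled OLS.

If $\hat\lambda = 1$ then $\hat\beta_{RE} = \hat\beta_{W} = \hat\beta_{FE}$.

$\hat\lambda \to 1$ as $T \to \infty$.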
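The limiting cases can be seen by evaluating $\lambda$ directly. The helper `re_lambda` below is a hypothetical name, and the variance values are illustrative assumptions.

```python
import math

def re_lambda(T, sigma2_alpha, sigma2_eta):
    """lambda = 1 - sigma_eta / sqrt(T * sigma2_alpha + sigma2_eta) (hypothetical helper)."""
    return 1.0 - math.sqrt(sigma2_eta / (T * sigma2_alpha + sigma2_eta))

# No individual effect variance: lambda = 0, so RE collapses to pooled OLS.
lam_no_effect = re_lambda(T=10, sigma2_alpha=0.0, sigma2_eta=1.0)

# lambda increases toward 1 (the Within/FE case) as T grows.
lam_by_T = [re_lambda(T, sigma2_alpha=1.0, sigma2_eta=1.0) for T in (2, 10, 100, 10**6)]

print(lam_no_effect)
print(lam_by_T)
```

The same function shows that $\lambda$ also approaches 1 when $\sigma_\alpha^2$ is large relative to $\sigma_\eta^2$ for fixed $T$.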

Linear Panel Example: Hours and Wages (Cameron and Trivedi, ch. 21)

Does labor supply respond to wages? For people working, the result is ambiguous due to offsetting substitution and income effects.

Data on $N = 532$ males for each of the $T = 10$ years from 1979 to 1988 (5320 total observations).

Simple linear panel regression:

$$\ln \text{hours}_{it} = \beta \ln \text{wage}_{it} + \alpha_i + \eta_{it}$$

where $\eta_{it}$ is independent over $i$ but not over $t$.

Table 1: Comparison of Panel Data Estimators

                           POLS     Between   Within    First Diff   RE
Intercept                  7.442    7.483                            7.346
Slope $\beta$              .083     .067      .168      .109         .119
Panel Robust SE($\beta$)   (.030)   (.024)    (.085)    (.084)       (.051)
Default (iid) SE($\beta$)  [.009]   [.020]    [.019]    [.021]       [.014]
$R^2$                      .015     .021      .016      .008         .014
RMSE                       .181     .161      .283      .232         .233
$\hat\lambda$              0                  1                      .585
$N$                        5320     532       5320      4788         5320

Comments

The Within (FE) estimate of .168 is much higher than the POLS estimate of .083. The First Difference (FE) estimate of .109 is also higher than POLS. The RE estimate of .119 lies between the Between and Within (FE) estimates. The POLS default SE of .009 is much smaller than the panel robust SE of .030. There is an apparent efficiency loss (higher SEs) from using the FE estimators.

Autocorrelations of Pooled OLS Residuals

      79    80    81    82    83    84    85    86    87    88
79    1
80    .33   1
81    .44   .4    1
82    .3    .31   .57   1
83    .21   .23   .37   .47   1
84    .2    .23   .32   .34   .64   1
85    .24   .32   .41   .35   .39   .58   1
86    .2    .19   .28   .25   .31   .35   .4    1
87    .2    .32   .33   .29   .31   .34   .39   .35   1
88    .16   .25   .3    .26   .21   .25   .34   .55   .53   1

The autocorrelations resemble the pattern of the equi-correlation RE model.

Hausman Specification Test for FE vs. RE

$$y_{it} = x_{it}'\beta + \alpha_i + \eta_{it}$$

The hypotheses to be tested are

$$H_0: E[x_{it}\alpha_i] = 0 \quad \text{(RE estimation)}$$
$$H_1: E[x_{it}\alpha_i] \neq 0 \quad \text{(FE estimation)}$$

Hausman and Taylor (1981, Ecta) considered a test statistic based on

$$\hat q = \hat\beta_{FE} - \hat\beta_{RE}$$

in the context of maximum likelihood estimation under the equi-correlation RE framework.

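Intuition:

Under $H_0$ both $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are consistent (but $\hat\beta_{RE}$ is efficient), so that $\hat q \xrightarrow{p} 0$.

Under $H_1$, $\hat\beta_{RE}$ is not consistent but $\hat\beta_{FE}$ is consistent, so that $\hat q \not\xrightarrow{p} 0$.

Therefore, consider the test statistic

$$H = \hat q'\left(\widehat{\text{avar}}(\hat q)\right)^{-1}\hat q$$

If $H$ is big then reject $H_0$; otherwise do not reject $H_0$.

Q1: What is $\widehat{\text{avar}}(\hat q)$?

Q2: What is the asymptotic distribution of $H$?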

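Hausman Test Principle

In general, consider two estimators $\hat\theta$ and $\tilde\theta$ of $\theta \in \mathbb{R}^q$ such that

$$H_0: \hat\theta - \tilde\theta \xrightarrow{p} 0 \ \text{ and } \ \sqrt{N}\left(\hat\theta - \tilde\theta\right) \xrightarrow{d} N(0, V_H)$$
$$H_1: \hat\theta - \tilde\theta \not\xrightarrow{p} 0$$

The Hausman statistic is

$$H = \left(\hat\theta - \tilde\theta\right)'\left(N^{-1}\hat V_H\right)^{-1}\left(\hat\theta - \tilde\theta\right)$$

and under $H_0$, $H \xrightarrow{d} \chi^2(q)$.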

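What is $V_H$?

$$N^{-1}V_H = \text{avar}(\hat\theta - \tilde\theta) = \text{avar}(\hat\theta) + \text{avar}(\tilde\theta) - 2\,\text{acov}(\hat\theta, \tilde\theta)$$

Result (Hausman, 1978). If $\hat\theta$ is efficient under $H_0$ then

$$\text{acov}(\hat\theta, \tilde\theta) = \text{avar}(\hat\theta) \quad \text{and} \quad \text{avar}(\hat\theta - \tilde\theta) = \text{avar}(\tilde\theta) - \text{avar}(\hat\theta)$$

Hence,

$$H = \left(\hat\theta - \tilde\theta\right)'\left(\widehat{\text{avar}}(\tilde\theta) - \widehat{\text{avar}}(\hat\theta)\right)^{-1}\left(\hat\theta - \tilde\theta\right)$$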

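Comments

The Hausman test only requires $\widehat{\text{avar}}(\tilde\theta)$ and $\widehat{\text{avar}}(\hat\theta)$, and not $\text{acov}(\hat\theta, \tilde\theta)$.

It is possible for $\widehat{\text{avar}}(\tilde\theta) - \widehat{\text{avar}}(\hat\theta)$ to be negative definite in finite samples.

It is possible for $\widehat{\text{avar}}(\tilde\theta) - \widehat{\text{avar}}(\hat\theta)$ to be less than full rank in finite samples.

The Hausman test can often be computed using an auxiliary regression.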
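A minimal sketch of the statistic, including a check for the finite-sample problems just listed, might look as follows. The function name `hausman_statistic`, its return convention, and the numbers in the example are all assumptions for illustration; returning `nan` when the variance difference is not positive definite is one possible design choice, not a standard rule.

```python
import math
import numpy as np

def hausman_statistic(theta_tilde, theta_hat, avar_tilde, avar_hat):
    """Return (H, q, is_pd) for the efficient estimator theta_hat (hypothetical helper).

    H is nan when avar_tilde - avar_hat is not positive definite.
    """
    diff = np.atleast_1d(np.asarray(theta_hat, dtype=float)
                         - np.asarray(theta_tilde, dtype=float))
    v_diff = np.atleast_2d(np.asarray(avar_tilde, dtype=float)
                           - np.asarray(avar_hat, dtype=float))
    # Positive definiteness check via eigenvalues of the symmetric difference.
    is_pd = bool(np.all(np.linalg.eigvalsh(v_diff) > 0))
    if not is_pd:
        return math.nan, diff.size, False
    stat = float(diff @ np.linalg.solve(v_diff, diff))
    return stat, diff.size, True

# Made-up two-parameter example.
H, q, ok = hausman_statistic(
    theta_tilde=[1.0, 2.0],
    theta_hat=[0.9, 2.1],
    avar_tilde=np.diag([0.02, 0.03]),
    avar_hat=np.diag([0.01, 0.01]),
)
print(H, q, ok)

# Swapping the variance matrices makes the difference negative definite.
H_bad, _, ok_bad = hausman_statistic([1.0, 2.0], [0.9, 2.1],
                                     np.diag([0.01, 0.01]), np.diag([0.02, 0.03]))
print(H_bad, ok_bad)
```

Under $H_0$ the value `H` would be compared with a $\chi^2(q)$ critical value.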

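Hausman Test for FE vs. RE

Under the RE model based on the assumptions

$$\alpha_i \sim \text{iid}\,(0, \sigma_\alpha^2), \qquad \eta_{it} \sim \text{iid}\,(0, \sigma_\eta^2)$$

the GLS estimator is asymptotically equivalent to the MLE under normality and is efficient. Hence, for $\hat q = \hat\beta_{FE} - \hat\beta_{RE}$,

$$H = \hat q'\left(\widehat{\text{avar}}(\hat\beta_{FE}) - \widehat{\text{avar}}(\hat\beta_{RE})\right)^{-1}\hat q \xrightarrow{d} \chi^2(K)$$

where $K$ is the dimension of $\beta$.

Comment: The statistic is not robust to serial correlation and heteroskedasticity.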

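Hausman Statistic Based on Auxiliary Regression

An asymptotically equivalent version of the Hausman test can be computed as the Wald test of $\gamma = 0$ in the auxiliary regression

$$y_{it} - \hat\lambda\bar y_i = (w_{it} - \hat\lambda\bar w_i)'\delta + \gamma'(x_{it} - \bar x_i) + v_{it}$$

$$w_{it} = (1, x_{it}')', \qquad \hat\lambda = 1 - \frac{\hat\sigma_\eta}{\sqrt{T\hat\sigma_\alpha^2 + \hat\sigma_\eta^2}}$$

Why? Because the RE estimator is obtained when $\gamma = 0$ is imposed on the auxiliary regression.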

Comment

The auxiliary regression approach avoids the problems of $\hat q$ having a variance that is less than full rank or negative definite.

The Wald test of $\gamma = 0$ can be made robust to serial correlation and heteroskedasticity using panel robust covariance estimates.

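Testing FE vs. RE in Hours and Wages Panel Regression

$$\ln \text{hours}_{it} = \beta \ln \text{wage}_{it} + \alpha_i + \eta_{it}$$

Using the original form of the Hausman test, we have

$$H = \frac{(\hat\beta_{FE} - \hat\beta_{RE})^2}{\widehat{\text{avar}}(\hat\beta_{FE}) - \widehat{\text{avar}}(\hat\beta_{RE})} = \frac{(.168 - .119)^2}{(.019^2 - .014^2)} = 14 > \chi^2_{.95}(1) = 3.84$$

so we reject $H_0: E[\ln \text{wage}_{it}\,\alpha_i] = 0$ at the 5% level.

Comment: The test may not be reliable because of serial correlation and heteroskedasticity.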
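The arithmetic can be reproduced from the rounded table entries. This is only a sketch: plugging in the three-decimal values gives roughly 14.6 rather than the reported 14, presumably because the reported statistic was computed from unrounded estimates. Variable names are assumptions for illustration.

```python
# Rounded estimates and default (iid) standard errors from the comparison table.
beta_fe, beta_re = 0.168, 0.119
se_fe, se_re = 0.019, 0.014

q_hat = beta_fe - beta_re                 # difference of the two estimates
var_q = se_fe ** 2 - se_re ** 2           # avar(FE) - avar(RE), valid if RE is efficient
hausman = q_hat ** 2 / var_q

critical_value = 3.84                     # chi-square(1) 95% quantile quoted in the text
reject_h0 = hausman > critical_value

print(round(hausman, 2), reject_h0)
```

Either way the statistic is far above the 5% critical value, so the conclusion of the test does not depend on the rounding.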

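The auxiliary regression approach based on the regression

$$y_{it} - \hat\lambda\bar y_i = (w_{it} - \hat\lambda\bar w_i)'\delta + \gamma(x_{it} - \bar x_i) + v_{it}$$

$$y_{it} = \ln \text{hours}_{it}, \qquad w_{it} = (1, \ln \text{wage}_{it})', \qquad x_{it} = \ln \text{wage}_{it}$$

with panel robust standard errors gives the statistic

$$t^2_{\gamma = 0} = (1.28)^2 = 1.65 < \chi^2_{.95}(1) = 3.84$$

so we cannot reject $H_0: E[\ln \text{wage}_{it}\,\alpha_i] = 0$ at the 5% level.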
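The comparison with the critical value can be checked with the standard library alone, using the fact that a $\chi^2(1)$ variable is the square of a standard normal, so its 95% quantile equals the squared 97.5% normal quantile. This is a sketch using the rounded t-ratio 1.28, which squares to about 1.64; the reported 1.65 presumably comes from the unrounded t-ratio.

```python
from statistics import NormalDist

t_gamma = 1.28                                   # robust t-ratio on gamma (rounded)
wald = t_gamma ** 2                              # Wald statistic for one restriction

# chi-square(1) 95% quantile = (standard normal 97.5% quantile)^2
critical_value = NormalDist().inv_cdf(0.975) ** 2
reject_h0 = wald > critical_value

print(round(wald, 2), round(critical_value, 2), reject_h0)
```

With the robust covariance the statistic falls well below the critical value, in contrast to the original Hausman form, which rejected.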