PANEL DATA: RANDOM AND FIXED EFFECTS MODELS
Professor Menelaos Karanasos, December 2011
PANEL DATA

Notation. $y_{it}$ is the value of the dependent variable for cross-section unit $i$ at time $t$, where $i = 1, \ldots, n$ and $t = 1, \ldots, T$. $X_{it}^j$ is the value of the $j$th explanatory variable for unit $i$ at time $t$, where $j = 1, \ldots, K$. That is, there are $K$ explanatory variables, and the superscript $j$ denotes an explanatory variable.

BALANCED PANELS

We restrict our discussion to estimation with balanced panels. That is, we have the same number of observations on each cross-section unit, so the total number of observations is $nT$. When $n = 1$ and $T$ is large we have the familiar time-series case. Likewise, when $T = 1$ and $n$ is large we have cross-section data. Panel data estimation refers to cases where $n > 1$ and $T > 1$.
POOLED ESTIMATOR

The most common way of organizing the data is by decision units. Thus, let
$$
\underset{T \times 1}{y_i} = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}, \qquad
\underset{T \times 1}{\varepsilon_i} = \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix}, \qquad
\underset{T \times K}{X_i} = \begin{bmatrix} X_{i1}^1 & X_{i1}^2 & \cdots & X_{i1}^K \\ X_{i2}^1 & X_{i2}^2 & \cdots & X_{i2}^K \\ \vdots & \vdots & & \vdots \\ X_{iT}^1 & X_{iT}^2 & \cdots & X_{iT}^K \end{bmatrix}.
$$
That is, $X_i$ is a $T \times K$ matrix.
Often the data are stacked to form
$$
\underset{nT \times 1}{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad
\underset{nT \times K}{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}, \qquad
\underset{nT \times 1}{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix},
$$
where $y$ and $\varepsilon$ are $nT \times 1$ vectors and $X$ is an $nT \times K$ matrix.
MATRIX FORM

The standard model can be expressed as
$$
\underset{nT \times 1}{y} = \underset{nT \times K}{X}\,\underset{K \times 1}{b} + \underset{nT \times 1}{\varepsilon}, \tag{1}
$$
where
$$
\underset{K \times 1}{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}.
$$
The models we discuss next are all variants of the standard linear model given by equation (1). The simplest estimation method is to assume that $\varepsilon_{it} \sim iid(0, \sigma^2)$ for all $i$ and $t$. Estimation of this model is straightforward using OLS. By assuming each observation is iid, however, we have essentially ignored the panel structure of the data.
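As a minimal illustration of equation (1), the following sketch (not part of the original notes; the simulated design and all names are assumptions) stacks a balanced panel and computes the pooled OLS estimator that ignores the panel structure:

```python
import numpy as np

# Illustrative sketch: pooled OLS on a simulated balanced panel
# with n units, T periods, K regressors (values are assumptions).
rng = np.random.default_rng(0)
n, T, K = 50, 5, 2
b_true = np.array([1.0, -2.0])

X = rng.normal(size=(n * T, K))      # stacked regressor matrix, nT x K
eps = rng.normal(size=n * T)         # iid errors: panel structure ignored
y = X @ b_true + eps

# Pooled OLS: beta = (X'X)^{-1} X'y
b_pooled = np.linalg.solve(X.T @ X, X.T @ y)
print(b_pooled)
```

With iid errors this is consistent; the later slides show what changes once the error contains an individual effect.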
TWO EXTENSIONS

Our starting point is the following model:
$$
y_{it} = \underset{1 \times K}{x_{it}'}\,\underset{K \times 1}{b} + \varepsilon_{it}.
$$
That is, $x_{it}'$ is the $t$th row of the $X_i$ matrix, or the $it$th row of the $X$ matrix. We go one step further and specify the following error structure for the disturbance term:
$$
\varepsilon_{it} = a_i + \eta_{it}, \tag{2}
$$
where we assume that $\eta_{it}$ is uncorrelated with $x_{it}$. The first term of the decomposition, $a_i$, is called the individual effect.
ERROR STRUCTURE

In this formulation our ignorance has two parts. The first part varies across individuals (cross-section units) but is constant across time; this part may or may not be correlated with the explanatory variables. The second part varies unsystematically (i.e., independently) across time and individuals.

RANDOM AND FIXED EFFECTS

This formulation is the simplest way of capturing the notion that two observations from the same individual (unit) will be more like each other than observations from two different individuals. A large proportion of empirical applications involve one of the following assumptions about the individual effect:

1. Random effects model: $a_i$ is uncorrelated with $x_{it}$.
2. Fixed effects model: $a_i$ is correlated with $x_{it}$.
THE RANDOM EFFECTS MODEL

It is important to stress that the substantive assumption distinguishing this model from the fixed effects model is that the time-invariant person-specific effect $a_i$ is uncorrelated with $x_{it}$. This orthogonality condition, along with our assumption about $\eta_{it}$, is sufficient for OLS to be asymptotically unbiased. However, there are two problems:

1. OLS will produce consistent estimates of $b$, but the standard errors will be understated.
2. OLS is not efficient compared to a feasible GLS procedure.
THE ERROR

It will be helpful to be a bit more explicit about the precise nature of the error. Recall that
$$
\varepsilon_{it} = a_i + \eta_{it}, \qquad
\underset{T \times 1}{\varepsilon_i} = \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix}, \qquad
\underset{nT \times 1}{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.
$$
Thus, we can write
$$
\underset{T \times 1}{\eta_i} = \begin{bmatrix} \eta_{i1} \\ \eta_{i2} \\ \vdots \\ \eta_{iT} \end{bmatrix}, \qquad
\underset{nT \times 1}{\eta} = \begin{bmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_n \end{bmatrix}.
$$
Moreover, we have
$$
E(\eta_i) = 0, \quad E(\eta_i \eta_i') = \sigma_\eta^2 I_T, \quad E(\eta) = 0, \quad E(\eta \eta') = \sigma_\eta^2 I_{nT}.
$$
THE $a_i$

Further,
$$
\underset{T \times 1}{a_i} = \begin{bmatrix} a_i \\ a_i \\ \vdots \\ a_i \end{bmatrix} = a_i \iota, \qquad
\underset{nT \times 1}{a} = \begin{bmatrix} a_1 \iota \\ a_2 \iota \\ \vdots \\ a_n \iota \end{bmatrix},
$$
where $\iota$ is a $T \times 1$ vector of ones, and
$$
E(a_i) = 0, \quad E(a_i a_j) = 0 \ \text{for } i \neq j, \quad E(a_i^2) = \sigma_a^2, \quad E(a_i \eta_{jt}) = 0.
$$
THE Σ MATRIX

Given these assumptions we can write the covariance matrix of the disturbance term for each individual cross-section unit:
$$
\underset{T \times T}{\Sigma} = E(\varepsilon_i \varepsilon_i') = E(a_i^2\,\iota\iota') + E(\eta_i \eta_i') \tag{3}
$$
$$
= \sigma_a^2\,\underset{T \times T}{\iota\iota'} + \sigma_\eta^2 I_T
= \sigma_a^2 \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}
+ \sigma_\eta^2 \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix}
= \begin{bmatrix} \sigma_a^2 + \sigma_\eta^2 & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & \sigma_a^2 + \sigma_\eta^2 & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & \sigma_a^2 + \sigma_\eta^2 \end{bmatrix}.
$$
THE Ω MATRIX

When the data are organized as in equation (1), the covariance matrix of the error term for all the observations in the stacked model can be written as
$$
\underset{nT \times nT}{\Omega} = E(\varepsilon\varepsilon') = I_n \otimes \underset{T \times T}{\Sigma}
= \begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{bmatrix}.
$$
The block diagonality of $\Omega$ makes finding an inverse simpler, and we can focus on finding the inverse of $\Sigma$. It is straightforward but tedious to show that
$$
\Sigma^{-1/2} = \frac{1}{\sigma_\eta}\Big[ I_T - \frac{(1-\theta)}{T}\,\underset{T \times T}{\iota\iota'} \Big], \tag{4}
$$
where
$$
\theta = \sqrt{\frac{\sigma_\eta^2}{T\sigma_a^2 + \sigma_\eta^2}}
$$
is the unknown quantity that must be estimated. Feasible GLS estimators require that we get estimates of the variances $\sigma_a^2$ and $\sigma_\eta^2$ in $\theta$.
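The closed form in (4) can be verified numerically. The sketch below (the variance values are illustrative assumptions) builds $\Sigma$, applies the formula, and checks that $\Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = I_T$:

```python
import numpy as np

# Numerical check of the closed form for Sigma^{-1/2};
# sigma_a^2 = 2, sigma_eta^2 = 1 are assumed illustrative values.
T = 4
sigma_a2, sigma_eta2 = 2.0, 1.0
iota = np.ones((T, 1))

# Sigma = sigma_a^2 * ii' + sigma_eta^2 * I_T, as in equation (3)
Sigma = sigma_eta2 * np.eye(T) + sigma_a2 * (iota @ iota.T)

theta = np.sqrt(sigma_eta2 / (T * sigma_a2 + sigma_eta2))
Sigma_half_inv = (np.eye(T) - (1 - theta) / T * (iota @ iota.T)) / np.sqrt(sigma_eta2)

# Sigma^{-1/2} Sigma Sigma^{-1/2} should equal the identity
assert np.allclose(Sigma_half_inv @ Sigma @ Sigma_half_inv, np.eye(T))
```

The check works because $\iota$ is an eigenvector of both matrices, so the formula only has to match eigenvalues on $\iota$ and on its orthogonal complement.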
RANDOM EFFECTS AS A COMBINATION OF WITHIN AND BETWEEN ESTIMATORS

We consider two estimators that are consistent but not efficient relative to GLS. The first one is quite intuitive: convert all the data into individual-specific averages and perform OLS on this collapsed data set. In particular,
$$
\underset{T \times 1}{y_i} = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}
\;\Rightarrow\;
\underset{1 \times 1}{\bar{y}_i} = \frac{1}{T}\sum_{t=1}^{T} y_{it},
$$
so that we have one observation for each section $i$. Notice that
$$
\bar{y}_i \iota = \begin{bmatrix} \bar{y}_i \\ \bar{y}_i \\ \vdots \\ \bar{y}_i \end{bmatrix}.
$$
Moreover,
$$
\underset{T \times K}{X_i} = \begin{bmatrix} X_{i1}^1 & X_{i1}^2 & \cdots & X_{i1}^K \\ X_{i2}^1 & X_{i2}^2 & \cdots & X_{i2}^K \\ \vdots & \vdots & & \vdots \\ X_{iT}^1 & X_{iT}^2 & \cdots & X_{iT}^K \end{bmatrix}
\;\Rightarrow\;
\underset{1 \times K}{\bar{x}_i'} = \big[\, \bar{X}_i^1 \;\; \bar{X}_i^2 \;\; \cdots \;\; \bar{X}_i^K \,\big],
$$
where $\bar{X}_i^j = \frac{1}{T}\sum_{t=1}^{T} X_{it}^j$. Recall that we also denote the $t$th row of the $X_i$ matrix by $x_{it}' = [\, X_{it}^1 \;\; X_{it}^2 \;\; \cdots \;\; X_{it}^K \,]$. Specifically, we perform OLS on the following equation:
$$
\underset{1 \times 1}{\bar{y}_i} = \underset{1 \times K}{\bar{x}_i'}\,\underset{K \times 1}{b} + \text{error}, \qquad i = 1, \ldots, n. \tag{5}
$$
Thus, the between-group estimator can be expressed as
$$
\beta_B = \Big[ \sum_{i=1}^{n} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \Big]^{-1} \sum_{i=1}^{n} (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y}),
$$
where $\bar{x} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}$ and $\bar{y} = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T} y_{it}$. The estimator $\beta_B$ is called the between-group estimator because it ignores variation within the group.
DUMMY VARIABLES

To put this expression in matrix terms, stack the data as before and define a new $nT \times n$ matrix $D$, which is merely the matrix of $n$ dummy variables corresponding to each cross-section unit. That is,
$$
\underset{nT \times n}{D} = \big[\, \underset{nT \times 1}{D_1} \;\; \underset{nT \times 1}{D_2} \;\; \cdots \;\; \underset{nT \times 1}{D_n} \,\big],
$$
where $D_i = 1$ for observations $(i-1)T + 1, \ldots, iT$, and zero otherwise. Next define
$$
\underset{nT \times nT}{P_D} = D(D'D)^{-1}D',
$$
a symmetric and idempotent matrix, that is, $P_D' = P_D$ and $P_D^2 = P_D$. Premultiplying by this matrix transforms the data into the means described in equation (5).
PREDICTED VALUES (MEANS)

This is because the predicted value of $y$ from a regression on nothing but the individual dummies is merely
$$
\underset{nT \times 1}{P_D y} = \begin{bmatrix} \bar{y}_1 \iota \\ \vdots \\ \bar{y}_n \iota \end{bmatrix} = \hat{y},
$$
where $\iota$ is the $T \times 1$ unit vector (see Matrices Appendix C). In other words, we run the following OLS regression: $P_D y = P_D X b + \text{error}$. The $\beta$ estimated this way is called the between estimator and is given by
$$
\beta_B = (X'P_D'P_D X)^{-1} X'P_D'P_D y = (X'P_D^2 X)^{-1} X'P_D^2 y = (X'P_D X)^{-1} X'P_D y,
$$
since $P_D' = P_D$ and $P_D^2 = P_D$.
2SLS ESTIMATION

The between estimator is consistent (though not efficient) when OLS on the pooled sample is consistent. This estimator corresponds to 2SLS using the person dummies as instruments. That is, regress $X$ on $D$ and get the predicted values:
$$
\underset{nT \times K}{\hat{X}} = \underset{nT \times n}{D}\,\underset{n \times K}{\hat\gamma} = D(D'D)^{-1}D'X = P_D X
$$
(see Matrices Appendix C). Then regress $y$ on $\hat{X}$:
$$
\beta_B = (\hat{X}'\hat{X})^{-1}\hat{X}'y = [(P_D X)'P_D X]^{-1}(P_D X)'y = (X'P_D'P_D X)^{-1}X'P_D'y = (X'P_D X)^{-1}X'P_D y.
$$
WITHIN ESTIMATOR

We can also use the information thrown away by the between estimator. Define
$$
M_D = I_{nT} - \overbrace{D(D'D)^{-1}D'}^{P_D},
$$
which is also a symmetric idempotent matrix. If we premultiply the data by $M_D$ and compute OLS on the transformed data, we can derive the following within estimator:
$$
\beta_W = [(M_D X)'M_D X]^{-1}(M_D X)'M_D y = (X'M_D'M_D X)^{-1}X'M_D'M_D y = (X'M_D X)^{-1}X'M_D y. \tag{6}
$$
A. The matrix $M_D$ can be interpreted as a residual-maker matrix. Premultiplying by this matrix transforms the data into residuals from auxiliary regressions of all the variables on a complete set of individual-specific constants (see also Matrices Appendix B):
RESIDUALS (DEVIATIONS FROM PERSON-SPECIFIC MEANS)

i) We regress $y$ and $X$ on $D$ and save the residuals $\tilde{y} = M_D y$ and $\tilde{X} = M_D X$:
$$
\tilde{y} = y - \hat{y} = y - D\hat\gamma = y - \underbrace{D(D'D)^{-1}D'}_{P_D}\,y = [I_{nT} - P_D]\,y = M_D y.
$$
ii) We regress $\tilde{y} = M_D y$ on $\tilde{X} = M_D X$. Since the predicted value from regressing $y$ on $D$, that is $\hat{y} = D\hat\gamma = D(D'D)^{-1}D'y$, is merely the individual-specific mean (see the analysis above), the residuals $\tilde{y} = y - \hat{y}$ are merely deviations from person-specific means.
Similarly, the residuals
$$
X - \hat{X} = X - P_D X = [I_{nT} - \overbrace{D(D'D)^{-1}D'}^{P_D}]X = \tilde{X} = M_D X
$$
are merely deviations from person-specific means.

B. Specifically, equation (6) is equivalent to performing OLS on the following equation:
$$
y_{it} - \bar{y}_i = \underset{1 \times K}{(x_{it} - \bar{x}_i)'}\,\underset{K \times 1}{b} + \text{error}, \qquad i = 1, \ldots, n, \quad t = 1, \ldots, T.
$$
If the assumptions underlying the random effects model are correct, the within estimator is also a consistent estimator, but it is not efficient. This defect is clear since we have included $n$ unnecessary extra variables.
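Point B can be sketched directly: demean each unit's data over time and run OLS on the deviations. In the illustrative simulation below (an assumption, not from the notes), the individual effect is deliberately made correlated with the regressors, so the within estimator stays consistent where pooled OLS would be biased:

```python
import numpy as np

# Sketch: within estimator as OLS on deviations from person-specific means
rng = np.random.default_rng(2)
n, T, K = 100, 4, 2
b_true = np.array([1.0, -2.0])

a = rng.normal(size=(n, 1))                   # individual effects a_i
# make regressors correlated with a_i, so pooled OLS would be biased
X = rng.normal(size=(n, T, K)) + a[..., None]
y = X @ b_true + a + rng.normal(size=(n, T))

Xw = X - X.mean(axis=1, keepdims=True)        # x_it - x_bar_i
yw = y - y.mean(axis=1, keepdims=True)        # y_it - y_bar_i

Xs, ys = Xw.reshape(-1, K), yw.reshape(-1)    # stack to nT rows
beta_W = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
print(beta_W)                                 # consistent despite Cov(x_it, a_i) != 0
```

The demeaning plays the role of $M_D$ without ever forming the $nT \times nT$ matrix.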
Since $\bar{y}_i = \bar{x}_i'b + a_i$, we can estimate $a_i$ as
$$
\hat\alpha_i = \bar{y}_i - \bar{x}_i'\beta.
$$
C. The within estimator can also be expressed as
$$
\beta_W = \Big[ \sum_{i=1}^{n}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \Big]^{-1} \sum_{i=1}^{n}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i).
$$
Its variance-covariance matrix is given by
$$
V(\beta_W) = \sigma_\eta^2 \Big[ \sum_{i=1}^{n}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \Big]^{-1}.
$$
D. The within estimator is merely the estimator that would result from running OLS on the data including a set of dummy variables. Let $a' = [\, a_1 \;\; a_2 \;\; \cdots \;\; a_n \,]$. Then
$$
y = Xb + Da + \eta, \qquad \beta_W = (X'M_D X)^{-1}X'M_D y
$$
(see Matrices Appendix B). It is called the within estimator because it uses only the variation within each cross-section unit. Its variance-covariance matrix is given by
$$
V(\beta_W) = \sigma_\eta^2 (X'M_D X)^{-1}.
$$
E. Alternatively, we can write
$$
\underset{T \times 1}{y_i} = \underset{T \times K}{X_i}\,b + a_i\,\underset{T \times 1}{\iota} + \eta_i, \qquad i = 1, \ldots, n.
$$
We premultiply the above $i$th equation by the $T \times T$ idempotent transformation matrix
$$
Q = I_T - \frac{1}{T}\iota\iota'
$$
to sweep out the individual effect $a_i$, so that individual observations are measured as deviations from individual means (over time):
$$
Qy_i = QX_i b + Q\iota\, a_i + Q\eta_i = QX_i b + Q\eta_i.
$$
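A quick numerical check of the two properties used above: $Q$ demeans a unit's time series, and $Q\iota = 0$ annihilates the constant individual effect (the example vector is an illustrative assumption):

```python
import numpy as np

# Check that Q = I_T - (1/T) ii' demeans over time and kills a_i * i
T = 5
iota = np.ones((T, 1))
Q = np.eye(T) - (iota @ iota.T) / T

y_i = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # one unit's series, mean 7
assert np.allclose(Q @ y_i, y_i - 7.0)       # deviations from the unit mean
assert np.allclose(Q @ iota, 0.0)            # Q sweeps out the individual effect
```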
Applying the OLS procedure to the above equation, we have
$$
\beta_W = \Big[ \sum_{i=1}^{n} X_i'QX_i \Big]^{-1} \sum_{i=1}^{n} X_i'Qy_i.
$$
Its variance-covariance matrix is given by
$$
V(\beta_W) = \sigma_\eta^2 \Big[ \sum_{i=1}^{n} X_i'QX_i \Big]^{-1}.
$$
Notice that the pooled OLS estimator is just a weighted sum of the between and within estimators: $\beta = (X'X)^{-1}X'y$. Since
$$
M_D = I_{nT} - D(D'D)^{-1}D' = I_{nT} - P_D \;\Rightarrow\; I_{nT} = M_D + P_D,
$$
we have
$$
\beta = (X'X)^{-1}X'\overbrace{(M_D + P_D)}^{I_{nT}}\,y = (X'X)^{-1}\{ X'M_D y + X'P_D y \}.
$$
Next, premultiplying $X'M_D y$ by $I_K = (X'M_D X)(X'M_D X)^{-1}$ and $X'P_D y$ by $I_K = (X'P_D X)(X'P_D X)^{-1}$ gives
$$
\beta = (X'X)^{-1}\big\{ (X'M_D X)\underbrace{(X'M_D X)^{-1}X'M_D y}_{\beta_W} + (X'P_D X)\underbrace{(X'P_D X)^{-1}X'P_D y}_{\beta_B} \big\}
= (X'X)^{-1}\{ (X'M_D X)\beta_W + (X'P_D X)\beta_B \}.
$$
Notice that the sum of the two weights is equal to $I_K$:
$$
(X'X)^{-1}\{ (X'M_D X) + (X'P_D X) \} = (X'X)^{-1}\{ X'(M_D + P_D)X \} = (X'X)^{-1}X'X = I_K.
$$
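This decomposition is an exact algebraic identity, so it can be confirmed on arbitrary data. The sketch below (simulated numbers are an assumption) builds $P_D$ and $M_D$ explicitly and checks that the weighted combination reproduces pooled OLS to machine precision:

```python
import numpy as np

# Check: pooled OLS = (X'X)^{-1} { (X'M_D X) beta_W + (X'P_D X) beta_B }
rng = np.random.default_rng(3)
n, T, K = 30, 4, 2
X = rng.normal(size=(n * T, K))
y = rng.normal(size=n * T)

D = np.kron(np.eye(n), np.ones((T, 1)))     # nT x n dummy matrix
P = D @ np.linalg.solve(D.T @ D, D.T)       # P_D: projection onto unit means
M = np.eye(n * T) - P                       # M_D: residual maker

beta_W = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)
beta_B = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
beta_pooled = np.linalg.solve(X.T @ X, X.T @ y)

combo = np.linalg.solve(X.T @ X, (X.T @ M @ X) @ beta_W + (X.T @ P @ X) @ beta_B)
assert np.allclose(combo, beta_pooled)
```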
In the random effects case, OLS on the pooled data fails to use the information in the error covariance structure that results from using repeated observations on the same cross-section units. The problem with the pooled estimator is that it weights all observations equally. This treatment is not generally optimal, because an additional observation on a person already in the data set is unlikely to add as much information as an additional observation from a new (independent) individual.
Next we have to compute the necessary quantities for feasible GLS. Standard ANOVA suggests the following estimators:
$$
\hat\sigma_\eta^2 = \frac{\hat\varepsilon_W'\hat\varepsilon_W}{nT - n - K}, \qquad
\hat\sigma_B^2 = \frac{\hat\varepsilon_B'\hat\varepsilon_B}{n - K}, \qquad
\hat\sigma_a^2 = \hat\sigma_B^2 - \frac{\hat\sigma_\eta^2}{T},
$$
where $\hat\varepsilon_W$ are the residuals from the within regression and $\hat\varepsilon_B$ are the residuals from the between regression. These can then be used to construct $\hat\theta$. These estimators are asymptotically unbiased estimates of the relevant variances.
THE RANDOM EFFECTS ESTIMATOR

A simple procedure to compute the random effects estimator is as follows:

1. Compute the between and within estimators.
2. Use the residuals to calculate the appropriate variance terms.
3. Calculate $\hat\theta$.
4. Run OLS on the transformed variables $y^*$ and $x^*$:
$$
y_{it}^* = y_{it} - (1 - \hat\theta)\bar{y}_i, \tag{7}
$$
$$
x_{it}^* = x_{it} - (1 - \hat\theta)\bar{x}_i. \tag{8}
$$
The transformation is intuitively appealing. When there is no uncorrelated person-specific component of variance ($\sigma_a^2 = 0$), $\theta = 1$ and the random effects estimator reduces to the pooled OLS estimator. When $\hat\theta = 0$ the random effects estimator reduces to the within estimator.
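The four steps above can be sketched end to end. The simulated design and variance values below are illustrative assumptions; the code follows the slides' recipe: within and between regressions, ANOVA variance components, $\hat\theta$, then OLS on the quasi-demeaned data:

```python
import numpy as np

# Sketch of the feasible GLS (random effects) procedure on simulated data
rng = np.random.default_rng(4)
n, T, K = 200, 5, 2
b_true = np.array([1.0, -2.0])

X = rng.normal(size=(n, T, K))
a = rng.normal(size=(n, 1)) * np.sqrt(2.0)    # sigma_a^2 = 2, uncorrelated with X
y = X @ b_true + a + rng.normal(size=(n, T))  # sigma_eta^2 = 1

# Step 1: within and between estimators
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(-1, K)
yw = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
beta_W = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)

xb, yb = X.mean(axis=1), y.mean(axis=1)
xbd, ybd = xb - xb.mean(axis=0), yb - yb.mean()
beta_B = np.linalg.solve(xbd.T @ xbd, xbd.T @ ybd)

# Step 2: variance components from the residuals
s2_eta = np.sum((yw - Xw @ beta_W) ** 2) / (n * T - n - K)
s2_B = np.sum((ybd - xbd @ beta_B) ** 2) / (n - K)  # estimates sigma_a^2 + sigma_eta^2 / T
s2_a = s2_B - s2_eta / T

# Steps 3-4: theta_hat, then OLS on the quasi-demeaned variables (7)-(8)
theta = np.sqrt(s2_eta / (T * s2_a + s2_eta))
Xs = (X - (1 - theta) * X.mean(axis=1, keepdims=True)).reshape(-1, K)
ys = (y - (1 - theta) * y.mean(axis=1, keepdims=True)).reshape(-1)
beta_RE = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
print(beta_RE)
```

In small samples $\hat\sigma_a^2$ can come out negative; a common practical fix (an aside, not from the notes) is to truncate it at zero.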
To get efficient estimates of the parameters we have to use the GLS method:
$$
\beta_{GLS} = \Big[ \sum_{i=1}^{n} X_i'\Sigma^{-1}X_i \Big]^{-1} \sum_{i=1}^{n} X_i'\Sigma^{-1}y_i,
$$
where $\Sigma$ is given by (3).
We can also obtain the following:
$$
\beta_{GLS} = \Big[ \frac{1}{T}\sum_{i=1}^{n} X_i'QX_i + \theta^2 \sum_{i=1}^{n} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \Big]^{-1}
\Big[ \frac{1}{T}\sum_{i=1}^{n} X_i'Qy_i + \theta^2 \sum_{i=1}^{n} (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y}) \Big]
$$
$$
= \Delta\,\beta_B + (I_K - \Delta)\,\beta_W, \tag{9}
$$
where $\theta^2$ has been defined in (4) and
$$
\Delta = \theta^2 T \Big[ \sum_{i=1}^{n} X_i'QX_i + \theta^2 T \sum_{i=1}^{n} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \Big]^{-1} \sum_{i=1}^{n} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'.
$$
The GLS estimator (9) is a matrix-weighted average of the between-group and within-group estimators. If $\theta \to 0$, then $\Delta \to 0$, and GLS becomes the within estimator. If $\theta \to 1$, then GLS becomes the pooled OLS estimator.
Moreover, the variance-covariance matrix of $\beta_{GLS}$ is
$$
V(\beta_{GLS}) = \sigma_\eta^2 \Big[ \sum_{i=1}^{n} X_i'QX_i + T\theta^2 \sum_{i=1}^{n} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \Big]^{-1}.
$$
THE FIXED EFFECTS MODEL

The fixed effects model starts from the presumption that $Cov(x_{it}, a_i) \neq 0$. Thus we must estimate the model conditionally on the presence of the fixed effects. That is, we must rewrite the model as
$$
y_{it} = x_{it}'b + a_i + \eta_{it}, \qquad i = 1, \ldots, n, \tag{10}
$$
where the $a_i$ are treated as unknown parameters to be estimated. In the typical case, $T$ is small and $n$ is large.
Although we cannot estimate $a_i$ consistently, we can estimate the remaining parameters consistently. To do so we need only run the regression
$$
y = Xb + Da + \eta, \tag{11}
$$
where $a' = [\, a_1 \;\; a_2 \;\; \cdots \;\; a_n \,]$ and $D = I_n \otimes \iota_T$ is, as before, the set of $n$ dummy variables.
The above equation is just the same as running a regression of each of our variables $y$ and $X$ on this set of dummies, and then running the regression of the $y$ residuals on the $X$ residuals. The matrix that produces such residuals is the familiar $M_D = I_{nT} - D(D'D)^{-1}D' = I_{nT} - P_D$. We can run OLS on the transformed variables $\tilde{y} = M_D y$ and $\tilde{X} = M_D X$ to get
$$
\beta_W = (X'M_D X)^{-1}X'M_D y.
$$
This is merely the within estimator we derived before. The within estimator is only one possible fixed effects estimator: any transformation that rids us of the fixed effect will produce a fixed effects estimator.
Deviations from means purge the data of the fixed effects by removing the means of these variables across individual cross-section units. That is, averaging the observations of group $i$ over time gives
$$
\bar{y}_i = \bar{x}_i'b + a_i + \bar\eta_i. \tag{12}
$$
We can difference Eqs. (10) and (12) to yield
$$
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'b + (\eta_{it} - \bar\eta_i).
$$
In many applications, the easiest way to implement a fixed effects estimator with conventional software is to include a different dummy variable for each individual unit. This method is called the least-squares dummy variable (LSDV) method, as in Eq. (11).
If $n$ is very large, however, it may be computationally prohibitive to compute coefficients for each cross-section unit. In that case, another way to implement a fixed effects estimator is as follows:

1. Transform all the variables by subtracting person-specific means.
2. Run OLS on the transformed variables.

Moreover, the standard errors need to be corrected. The correct standard errors are given by
$$
V(\beta_W) = \sigma_\eta^2 (X'M_D X)^{-1}.
$$
This result is almost exactly the same output one would get from the two-step procedure defined earlier.
The correct way to estimate $\sigma_\eta^2$ is
$$
\hat\sigma_\eta^2 = \frac{\hat\varepsilon_W'\hat\varepsilon_W}{nT - n - K}.
$$
The fact that the fixed effects estimator can be interpreted as a simple OLS regression on mean-differenced variables explains why this estimator is often called a within-group estimator: it uses only the variation within an individual's set of observations.
A WU-HAUSMAN TEST

The random and fixed effects estimators have different properties depending on the correlation between $a_i$ and the regressors. Specifically:

1. If the effects are uncorrelated with the explanatory variables, the random effects estimator is consistent and efficient; the fixed effects estimator is consistent but not efficient.
2. If the effects are correlated with the explanatory variables, the fixed effects estimator is consistent and efficient, but the random effects estimator is now inconsistent.

One can perform a Hausman test, defined as
$$
H = (\beta_{RE} - \beta_{FE})'[V(\beta_{FE}) - V(\beta_{RE})]^{-1}(\beta_{RE} - \beta_{FE}).
$$
(The difference $V(\beta_{FE}) - V(\beta_{RE})$ is positive semidefinite under the null, since the fixed effects estimator is the less efficient of the two.) The Hausman test statistic is distributed asymptotically as $\chi^2$ with $K$ degrees of freedom under the null hypothesis that the random effects estimator is correct.
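A Hausman-style comparison can be sketched on simulated data in which $a_i$ is correlated with $x_{it}$, so the two estimators diverge. As a simplification (an assumption of this sketch, not the notes' procedure), pooled OLS stands in for the random effects estimator; it is inconsistent here for the same reason, which makes the contrast visible:

```python
import numpy as np

# Hausman-style quadratic form; pooled OLS is a crude stand-in for RE
rng = np.random.default_rng(5)
n, T, K = 200, 5, 2
b_true = np.array([1.0, -2.0])

a = rng.normal(size=(n, 1))
X = rng.normal(size=(n, T, K)) + a[..., None]  # Cov(x_it, a_i) != 0
y = X @ b_true + a + rng.normal(size=(n, T))

# Fixed effects (within) estimator and its variance
Xw = (X - X.mean(axis=1, keepdims=True)).reshape(-1, K)
yw = (y - y.mean(axis=1, keepdims=True)).reshape(-1)
beta_FE = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)
s2_eta = np.sum((yw - Xw @ beta_FE) ** 2) / (n * T - n - K)
V_FE = s2_eta * np.linalg.inv(Xw.T @ Xw)

# Stand-in for the RE estimator: pooled OLS (inconsistent in this design)
Xs, ys = X.reshape(-1, K), y.reshape(-1)
beta_P = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
V_P = s2_eta * np.linalg.inv(Xs.T @ Xs)

d = beta_P - beta_FE
H = d @ np.linalg.solve(V_FE - V_P, d)  # ~ chi2(K) under the null
print(H)                                # large here => reject the random effects model
```

Since the effects are correlated with the regressors by construction, $H$ comes out far above the $\chi^2(2)$ critical values.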
An alternative method is to perform a simple auxiliary regression. Let $y_{it}^*$ and $x_{it}^*$ be the data transformed for the random effects model as in Eqs. (7) and (8), and recall that the $\tilde{x}$ variables are the data transformed for the fixed effects regression. The Hausman test can then be computed by means of a simple F test on $d$ in the following auxiliary regression:
$$
y^* = X^*b + \tilde{X}d + \text{error}.
$$
The hypothesis being tested is whether the omission of fixed effects in the random effects model has any effect on the consistency of the random effects estimates.
MATRICES APPENDIX A

A matrix may be partitioned into a set of submatrices by indicating subgroups of rows and/or columns. For example,
$$
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.
$$
The inverse of a partitioned matrix may also be expressed in partitioned form:
$$
A^{-1} = \begin{bmatrix} B_{11} & -B_{11}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}B_{11} & A_{22}^{-1} + A_{22}^{-1}A_{21}B_{11}A_{12}A_{22}^{-1} \end{bmatrix}, \tag{13}
$$
where $B_{11} = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}$, or, alternatively,
$$
A^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}B_{22}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}B_{22} \\ -B_{22}A_{21}A_{11}^{-1} & B_{22} \end{bmatrix},
$$
where $B_{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$.
Consider partitioning the matrix $X$ as $X = [\, X_1 \;\; X_2 \,]$. Then
$$
X'X = \begin{bmatrix} X_1' \\ X_2' \end{bmatrix}[\, X_1 \;\; X_2 \,]
= \begin{bmatrix} \overbrace{X_1'X_1}^{A_{11}} & \overbrace{X_1'X_2}^{A_{12}} \\ \underbrace{X_2'X_1}_{A_{21}} & \underbrace{X_2'X_2}_{A_{22}} \end{bmatrix}.
$$
The inverse of the above matrix is given by equation (13), where
$$
B_{11} = (X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1)^{-1} = \big(X_1'[I - X_2(X_2'X_2)^{-1}X_2']X_1\big)^{-1} = (X_1'M_2X_1)^{-1},
$$
with $M_2 = I - X_2(X_2'X_2)^{-1}X_2'$; that is, $M_2$ is a symmetric idempotent matrix. Similarly, $B_{22} = (X_2'M_1X_2)^{-1}$, where $M_1 = I - X_1(X_1'X_1)^{-1}X_1'$.
Moreover, $B_{11}\underbrace{A_{12}A_{22}^{-1}}_{X_1'X_2(X_2'X_2)^{-1}}$ is given by $B_{11}X_1'X_2(X_2'X_2)^{-1}$. In addition, $X'y$ in partitioned form is given by
$$
X'y = \begin{bmatrix} X_1' \\ X_2' \end{bmatrix} y = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}.
$$
Thus the OLS estimates from the regression of $y$ on $X$ in partitioned form are given by
$$
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}
= \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1} \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix}.
$$
Taking the first row we have
$$
\beta_1 = B_{11}X_1'y - B_{11}\underbrace{X_1'X_2(X_2'X_2)^{-1}}_{A_{12}A_{22}^{-1}}X_2'y
= \underbrace{B_{11}}_{(X_1'M_2X_1)^{-1}} X_1'\underbrace{\big(I - X_2(X_2'X_2)^{-1}X_2'\big)}_{M_2}\,y
= (X_1'M_2X_1)^{-1}X_1'M_2y.
$$
Similarly, we can show that $\beta_2 = (X_2'M_1X_2)^{-1}X_2'M_1y$.
Rewrite $\beta_1 = (X_1'M_2X_1)^{-1}X_1'M_2y$. Regressing $y$ and $X_1$ on $X_2$ yields a vector of residuals, $M_2y$, and a matrix of residuals, $M_2X_1$. Regressing the former on the latter gives the $\beta_1$ coefficient vector.
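This partitioned-regression (Frisch-Waugh) result is exact, so it can be verified numerically on arbitrary data (the simulated sizes below are illustrative assumptions):

```python
import numpy as np

# Check: X1 coefficients from the full regression equal OLS of M2*y on M2*X1
rng = np.random.default_rng(6)
N, K1, K2 = 60, 2, 3
X1 = rng.normal(size=(N, K1))
X2 = rng.normal(size=(N, K2))
y = rng.normal(size=N)

X = np.hstack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

M2 = np.eye(N) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)  # residual maker for X2
X1t, yt = M2 @ X1, M2 @ y
beta_1 = np.linalg.solve(X1t.T @ X1t, X1t.T @ yt)

assert np.allclose(beta_1, beta_full[:K1])
```

The within estimator in the main text is exactly this construction with $X_2 = D$, the dummy matrix.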
MATRICES APPENDIX B

Define $M_i = I - X_i(X_i'X_i)^{-1}X_i'$, $i = 1, 2$. Premultiplication of any vector by $M_i$ gives the residuals from the regression of that vector on $X_i$. Consider the following regression:
$$
y = X_i b + \varepsilon.
$$
The estimator of $b$ is given by $\beta = (X_i'X_i)^{-1}X_i'y$. The predicted value of $y$ is given by
$$
\hat{y} = X_i\beta = X_i(X_i'X_i)^{-1}X_i'y = P_iy.
$$
The vector of residuals is given by
$$
\hat\varepsilon = y - \hat{y} = y - X_i(X_i'X_i)^{-1}X_i'y = [I - X_i(X_i'X_i)^{-1}X_i']y = [I - P_i]y = M_iy.
$$
MATRICES APPENDIX C

Assume for simplicity, and without loss of generality, that we have only two sections, that is, $n = 2$. Define the $2T \times 2$ matrix $D$, which is merely the matrix of 2 dummy variables corresponding to each cross-section unit. In partitioned form we have
$$
\underset{2T \times 2}{D} = \begin{bmatrix} \underset{T \times 1}{\iota} & \underset{T \times 1}{0} \\ \underset{T \times 1}{0} & \underset{T \times 1}{\iota} \end{bmatrix}
\quad \text{or} \quad
\underset{2 \times 2T}{D'} = \begin{bmatrix} \underset{1 \times T}{\iota'} & \underset{1 \times T}{0'} \\ \underset{1 \times T}{0'} & \underset{1 \times T}{\iota'} \end{bmatrix},
$$
where $\iota$ is a $T \times 1$ unit vector.
Then
$$
\underset{2 \times 2}{D'D} = \begin{bmatrix} \iota'\iota & 0 \\ 0 & \iota'\iota \end{bmatrix} = \begin{bmatrix} T & 0 \\ 0 & T \end{bmatrix} = TI \;\Rightarrow\; (D'D)^{-1} = (1/T)I.
$$
Moreover,
$$
\underset{2T \times 2T}{DD'} = \begin{bmatrix} \underset{T \times T}{\iota\iota'} & \underset{T \times T}{0} \\ 0 & \iota\iota' \end{bmatrix},
$$
where $\iota\iota'$ is a $T \times T$ matrix with unit elements.
Thus
$$
D(D'D)^{-1}D'y = \frac{1}{T}DD'y
= \frac{1}{T}\begin{bmatrix} \underset{T \times T}{\iota\iota'} & 0 \\ 0 & \iota\iota' \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \frac{1}{T}\begin{bmatrix} \big(\sum_{t=1}^{T} y_{1t}\big)\iota \\ \big(\sum_{t=1}^{T} y_{2t}\big)\iota \end{bmatrix}
= \begin{bmatrix} \bar{y}_1\iota \\ \bar{y}_2\iota \end{bmatrix}.
$$
In general, if we have $n$ cross-sections, then $D$ is an $nT \times n$ matrix:
$$
\underset{nT \times n}{D} = \begin{bmatrix} \iota & 0 & \cdots & 0 \\ 0 & \iota & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \iota \end{bmatrix}.
$$
It can be shown that
$$
\underset{nT \times 1}{D(D'D)^{-1}D'y} = \begin{bmatrix} \bar{y}_1\iota \\ \bar{y}_2\iota \\ \vdots \\ \bar{y}_n\iota \end{bmatrix} = \hat{y}.
$$
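The general claim above is easy to confirm numerically: forming $P_D = D(D'D)^{-1}D'$ and applying it to a stacked vector replaces each observation by its unit mean (the small sizes below are illustrative assumptions):

```python
import numpy as np

# Check that P_D = D (D'D)^{-1} D' replaces each observation by its unit mean
n, T = 3, 4
rng = np.random.default_rng(7)
y = rng.normal(size=n * T)               # stacked unit-major, as in the slides

D = np.kron(np.eye(n), np.ones((T, 1)))  # nT x n dummy matrix
P = D @ np.linalg.solve(D.T @ D, D.T)

means = y.reshape(n, T).mean(axis=1)     # y_bar_i for each unit
assert np.allclose(P @ y, np.repeat(means, T))
```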
Similarly,
$$
\underset{nT \times n}{D}(D'D)^{-1}D'\underset{nT \times K}{X} = \begin{bmatrix} \hat{X}_1 \\ \hat{X}_2 \\ \vdots \\ \hat{X}_n \end{bmatrix} = \hat{X},
$$
where
$$
\hat{X}_i = \big[\, \bar{X}_i^1\iota \;\; \bar{X}_i^2\iota \;\; \cdots \;\; \bar{X}_i^K\iota \,\big],
$$
with $\bar{X}_i^j = \sum_{t=1}^{T} X_{it}^j / T$.