Modelling Multi-dimensional Panel Data: A Random Effects Approach


Modelling Multi-dimensional Panel Data: A Random Effects Approach

Laszlo Balazsi    Badi H. Baltagi    Laszlo Matyas    Daria Pus

Preliminary version, please do not quote. February 5, 2016

Abstract

Over the last two decades or so, the use of panel data has become standard in many areas of economic analysis. The available model formulations have become more complex, and the estimation and hypothesis-testing methods more sophisticated. The interaction between economics and econometrics has resulted in a huge publication output, immensely deepening and widening our knowledge and understanding of both. Traditional panel data, by nature, are two-dimensional. Lately, however, as part of the big data revolution, there has been a rapid emergence of three-, four- and even higher dimensional panel data sets. These have started to be used to study the flow of goods, capital, and services, but also other economic phenomena that can be better understood in higher dimensions. Oddly, applications rushed ahead of theory in this field. This paper aims at partly bridging this gap by collecting and analysing the empirically most relevant three-way error components models, some already presented in the literature, some new. First, optimal (F)GLS estimators are presented for the textbook-style complete data case, paying special attention to asymptotics; then these estimators are generalized to address incomplete panels as well. Extensions are also worked out, in particular for mixed Fixed-Random Effects models. It is shown how to extend these procedures to higher dimensions, and hypothesis testing is also briefly discussed.

Presenting author: Central European University, Department of Economics, Budapest 1051, Nador u., Hungary; Balazsi_Laszlo@phd.ceu.edu
Affiliations: Syracuse University; Central European University; University of Texas at Austin

Key words: panel data, multidimensional panel data, incomplete data, random effects, mixed effects, error components model, trade model, gravity model.

JEL classification: C1, C2, C4, F17, F47.

1 Introduction

The disturbances of an econometric model in principle include all factors influencing the behaviour of the dependent variable which cannot be explicitly specified. In a statistical sense, this means all terms about which we do not have enough information. In this paper we deal with multi-dimensional panel data models in which the individual and/or time specific factors, and the possible interaction effects between them, are considered as unobserved heterogeneity; as such they are represented by random variables and form part of the composite disturbance terms. From a more practical point of view, unlike the fixed effects approach, this random effects approach has the advantage that the number of parameters to take into account does not increase with the sample size. It also makes possible the identification of parameters associated with time- and/or individual-invariant variables (see Hornok (2011)). Historically, multi-dimensional random effects (or error components) models can be traced back to the variance component analysis literature (see Rao and Kleffe (1980), or the seminal results of Laird and Ware (1982) and Leeuw and Kreft (1986)) and are related to the multi-level models well known in statistics (see, for example, Scott et al. (2013), Luke (2004), Goldstein (1995), and Bryk and Raudenbush (1992)). We, however, assume fixed slope parameters for the regressors (rather than a composition of fixed and random elements), and zero means for the random components. This paper follows the spirit of the analysis of two-way panels by Baltagi et al. (2008): in Section 2 we introduce the most frequently used models in a three-dimensional (3D) panel data setup and deal with the Feasible GLS estimation of these models, while Section 3 analyses the behaviour of this estimator for incomplete/unbalanced data.
Section 4 generalizes the presented models to four and higher dimensional data sets, and extends the random effects approach toward a mixed effects framework. Finally, Section 5 deals with some testing issues and Section 6 concludes.

2 Different Model Specifications

In this section we present the most relevant three-dimensional model formulations, paying special attention to the different interaction effects. The models we encounter have empirical relevance and correspond to some fixed effects model formulations known from the literature (see, for example, Baltagi et al. (2003), Egger and Pfaffermayr (2003), Baldwin and Taglioni (2006), and Baier and Bergstrand (2007)). The general form of these random effects (or error components) models can be cast as

    y = Xβ + u,    (1)

where y and X are respectively the vector and matrix of observations of the dependent and explanatory variables, β is the vector of unknown (slope) parameters, and we want to exploit the structure embedded in the random disturbance terms u. As is well known from the Gauss-Markov theorem, the Generalized Least Squares (GLS) estimator is BLUE for β. To make it operational, in principle, we have to perform three steps. First, using the specific structure of u, we have to derive the variance-covariance matrix of model (1), E(uu') = Ω; then, preferably using spectral decomposition, we have to derive its inverse. This is important, as multi-dimensional data often tend to be very large, leading to some Ω matrices of extreme size. And finally, we need to estimate the unknown variance components of Ω to arrive at the well-known Feasible GLS (FGLS) formulation.

2.1 Various Heterogeneity Formulations

The most general model formulation in a three-dimensional setup, encompassing all pairwise random effects, is

    y_ijt = x'_ijt β + µ_ij + υ_it + ζ_jt + ε_ijt,    (2)

where i = 1, ..., N₁, j = 1, ..., N₂, and t = 1, ..., T. Note that y_ijt, x_ijt, and u_ijt = µ_ij + υ_it + ζ_jt + ε_ijt are elements of the (N₁N₂T × 1), (N₁N₂T × K), and (N₁N₂T × 1) sized vectors and matrix y, X, and u respectively, of the

general formulation (1), and β is the (K × 1) vector of parameters. We assume the random effects to be pairwise uncorrelated, with E(µ_ij) = 0, E(υ_it) = 0, E(ζ_jt) = 0, and further,

    E(µ_ij µ_i'j') = σ²_µ if i = i' and j = j', and 0 otherwise,
    E(υ_it υ_i't') = σ²_υ if i = i' and t = t', and 0 otherwise,
    E(ζ_jt ζ_j't') = σ²_ζ if j = j' and t = t', and 0 otherwise.

The covariance matrix of such an error components structure is simply

    Ω = E(uu') = σ²_µ (I_{N₁N₂} ⊗ J_T) + σ²_υ (I_{N₁} ⊗ J_{N₂} ⊗ I_T) + σ²_ζ (J_{N₁} ⊗ I_{N₂T}) + σ²_ε I_{N₁N₂T},    (3)

where I_N and J_N are the identity matrix and the square matrix of ones respectively, with the size in the index. All other relevant model specifications are obtained by applying some restrictions on the random effects structure; that is, all covariance structures are nested into that of model (2). The model which only uses individual-time-varying effects reads as

    y_ijt = x'_ijt β + υ_it + ζ_jt + ε_ijt,    (4)

together with the appropriate assumptions listed for model (2). Now

    Ω = σ²_υ (I_{N₁} ⊗ J_{N₂} ⊗ I_T) + σ²_ζ (J_{N₁} ⊗ I_{N₂} ⊗ I_T) + σ²_ε I_{N₁N₂T}.    (5)

A further restriction of the above model is

    y_ijt = x'_ijt β + ζ_jt + ε_ijt,    (6)

which in fact is a generalization of the approach used in multi-level modelling,

see, for example, Ebbes et al. (2004) or Hubler (2006). The covariance matrix now is

    Ω = σ²_ζ (J_{N₁} ⊗ I_{N₂T}) + σ²_ε I_{N₁N₂T}.    (7)

Another restriction of model (2) is to keep the pair-wise random effects and restrict the individual-time-varying terms. Specifically, model

    y_ijt = x'_ijt β + µ_ij + λ_t + ε_ijt    (8)

incorporates both time and individual-pair random effects. We assume, as before, that E(λ_t) = 0, and that

    E(λ_t λ_t') = σ²_λ if t = t', and 0 otherwise.

Now

    Ω = σ²_µ (I_{N₁N₂} ⊗ J_T) + σ²_λ (J_{N₁N₂} ⊗ I_T) + σ²_ε I_{N₁N₂T}.    (9)

A restriction of the above model, when we assume that µ_ij = υ_i + ζ_j, is

    y_ijt = x'_ijt β + υ_i + ζ_j + λ_t + ε_ijt    (10)

with the usual assumptions E(υ_i) = E(ζ_j) = E(λ_t) = 0, and

    E(υ_i υ_i') = σ²_υ if i = i', and 0 otherwise,
    E(ζ_j ζ_j') = σ²_ζ if j = j', and 0 otherwise,
    E(λ_t λ_t') = σ²_λ if t = t', and 0 otherwise.

The symmetric counterpart of model (6), with υ_it random effects, could also be listed here; however, as it has the exact same properties as model (6), we treat the two models together. This model has in fact been introduced in Matyas (1998), and before that, in Ghosh (1976).
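These Kronecker-product covariance structures are straightforward to verify numerically. A minimal numpy sketch (the helper names and variance values are ours, purely illustrative) builds Ω of equation (3), with observations stacked in lexicographic (i, j, t) order:

```python
import numpy as np

def kron(*mats):
    """Chained Kronecker product."""
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def omega_model2(N1, N2, T, s2_mu, s2_up, s2_ze, s2_eps):
    """Covariance matrix (3) of the all-encompassing three-way model:
    s2_mu (I_{N1 N2} x J_T) + s2_up (I_{N1} x J_{N2} x I_T)
    + s2_ze (J_{N1} x I_{N2 T}) + s2_eps I_{N1 N2 T}."""
    I = np.eye
    J = lambda n: np.ones((n, n))
    return (s2_mu * kron(I(N1 * N2), J(T))
            + s2_up * kron(I(N1), J(N2), I(T))
            + s2_ze * kron(J(N1), I(N2 * T))
            + s2_eps * I(N1 * N2 * T))

Omega = omega_model2(3, 4, 5, 1.0, 0.5, 0.25, 2.0)
assert Omega.shape == (60, 60)
# each diagonal element is the sum of the four variance components
assert np.allclose(np.diag(Omega), 1.0 + 0.5 + 0.25 + 2.0)
```

The covariance matrices of the restricted models of this section are obtained by simply dropping (or merging) the corresponding terms.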

Its covariance structure is

    Ω = σ²_υ (I_{N₁} ⊗ J_{N₂T}) + σ²_ζ (J_{N₁} ⊗ I_{N₂} ⊗ J_T) + σ²_λ (J_{N₁N₂} ⊗ I_T) + σ²_ε I_{N₁N₂T}.    (11)

Lastly, the simplest model is

    y_ijt = x'_ijt β + µ_ij + ε_ijt    (12)

with

    Ω = σ²_µ (I_{N₁N₂} ⊗ J_T) + σ²_ε I_{N₁N₂T}.    (13)

Note that model (12) can in fact be considered as a straight panel data model, where the individuals are now the (ij) pairs (so essentially it does not take into account the three-dimensional nature of the data).

2.2 Spectral Decomposition of the Covariance Matrices

To estimate the above models, the inverse of Ω is needed, a matrix of size (N₁N₂T × N₁N₂T). For even moderately large samples, this can be practically unfeasible without further elaboration. The common practice is to use the spectral decomposition of Ω, which in turn gives the inverse as a function of fairly standard matrices (see Wansbeek and Kapteyn (1982)). We derive the algebra for model (2); Ω⁻¹ for all other models can be derived likewise, so we only present the final results. First, consider a simple rewriting of the identity matrix, I_N = Q_N + J̄_N, where Q_N = I_N − J̄_N, with J̄_N = (1/N) J_N. Now Ω becomes

    Ω = T σ²_µ ((Q_{N₁} + J̄_{N₁}) ⊗ (Q_{N₂} + J̄_{N₂}) ⊗ J̄_T)
      + N₂ σ²_υ ((Q_{N₁} + J̄_{N₁}) ⊗ J̄_{N₂} ⊗ (Q_T + J̄_T))
      + N₁ σ²_ζ (J̄_{N₁} ⊗ (Q_{N₂} + J̄_{N₂}) ⊗ (Q_T + J̄_T))
      + σ²_ε ((Q_{N₁} + J̄_{N₁}) ⊗ (Q_{N₂} + J̄_{N₂}) ⊗ (Q_T + J̄_T)).
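The usefulness of this rewriting rests on Q_N and J̄_N being idempotent and mutually orthogonal projections, properties that Kronecker products built from them inherit. A quick numerical check (a sketch with arbitrary small dimensions):

```python
import numpy as np

def kron(*mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

N1, N2, T = 3, 4, 2
Jb = lambda n: np.ones((n, n)) / n     # J-bar_N = (1/N) J_N, the averaging matrix
Q = lambda n: np.eye(n) - Jb(n)        # Q_N = I_N - J-bar_N, the demeaning matrix

# I_N = Q_N + J-bar_N, both idempotent and mutually orthogonal
assert np.allclose(Q(N1) + Jb(N1), np.eye(N1))
assert np.allclose(Q(N1) @ Q(N1), Q(N1))
assert np.allclose(Q(N1) @ Jb(N1), np.zeros((N1, N1)))

# Kronecker products of Q's and J-bar's inherit both properties
B_ij = kron(Q(N1), Q(N2), Jb(T))       # captures variation between i and j
B_it = kron(Q(N1), Jb(N2), Q(T))       # captures variation between i and t
assert np.allclose(B_ij @ B_ij, B_ij)                    # idempotent
assert np.allclose(B_ij @ B_it, np.zeros_like(B_ij))     # mutually orthogonal
```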

If we unfold the brackets, the terms we get are in fact the between-group variations of each possible group in three-dimensional data. For example, the building block B_ij· = (Q_{N₁} ⊗ Q_{N₂} ⊗ J̄_T) captures the variation between i and j. All other B matrices are defined in a similar manner: the indices in the subscript indicate the variation with respect to which it is captured. The two extremes, B_ijt and B_···, are thus B_ijt = (Q_{N₁} ⊗ Q_{N₂} ⊗ Q_T) and B_··· = (J̄_{N₁} ⊗ J̄_{N₂} ⊗ J̄_T). Notice that the covariance matrix of every three-way error components model can be represented by these B building blocks. For model (2), this means

    Ω = σ²_ε B_ijt + (σ²_ε + Tσ²_µ) B_ij· + (σ²_ε + N₂σ²_υ) B_i·t + (σ²_ε + N₁σ²_ζ) B_·jt
      + (σ²_ε + Tσ²_µ + N₂σ²_υ) B_i·· + (σ²_ε + Tσ²_µ + N₁σ²_ζ) B_·j·
      + (σ²_ε + N₂σ²_υ + N₁σ²_ζ) B_··t + (σ²_ε + Tσ²_µ + N₂σ²_υ + N₁σ²_ζ) B_···.    (14)

Also notice that all B matrices are idempotent and mutually orthogonal by construction (as Q_N J̄_N = 0, likewise with N₁, N₂ and T), so

    Ω⁻¹ = (1/σ²_ε) B_ijt + 1/(σ²_ε + Tσ²_µ) B_ij· + 1/(σ²_ε + N₂σ²_υ) B_i·t + 1/(σ²_ε + N₁σ²_ζ) B_·jt
        + 1/(σ²_ε + Tσ²_µ + N₂σ²_υ) B_i·· + 1/(σ²_ε + Tσ²_µ + N₁σ²_ζ) B_·j·
        + 1/(σ²_ε + N₂σ²_υ + N₁σ²_ζ) B_··t + 1/(σ²_ε + Tσ²_µ + N₂σ²_υ + N₁σ²_ζ) B_···.

This means that we can get the inverse of a covariance matrix at virtually no computational cost, as a function of some standard B matrices. After some simplification, we get

    σ_ε Ω^{-1/2} = I_{N₁N₂T} − (1 − θ₁)(J̄_{N₁} ⊗ I_{N₂T}) − (1 − θ₂)(I_{N₁} ⊗ J̄_{N₂} ⊗ I_T) − (1 − θ₃)(I_{N₁N₂} ⊗ J̄_T)
        + (1 − θ₁ − θ₂ + θ₄)(J̄_{N₁} ⊗ J̄_{N₂} ⊗ I_T) + (1 − θ₁ − θ₃ + θ₅)(J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T)
        + (1 − θ₂ − θ₃ + θ₆)(I_{N₁} ⊗ J̄_{N₂} ⊗ J̄_T) − (1 − θ₁ − θ₂ − θ₃ + θ₄ + θ₅ + θ₆ − θ₇) J̄_{N₁N₂T},    (15)

with

    θ₁ = σ_ε/(σ²_ε + N₁σ²_ζ)^{1/2},    θ₂ = σ_ε/(σ²_ε + N₂σ²_υ)^{1/2},    θ₃ = σ_ε/(σ²_ε + Tσ²_µ)^{1/2},
    θ₄ = σ_ε/(σ²_ε + N₂σ²_υ + N₁σ²_ζ)^{1/2},    θ₅ = σ_ε/(σ²_ε + Tσ²_µ + N₁σ²_ζ)^{1/2},
    θ₆ = σ_ε/(σ²_ε + Tσ²_µ + N₂σ²_υ)^{1/2},    θ₇ = σ_ε/(σ²_ε + Tσ²_µ + N₂σ²_υ + N₁σ²_ζ)^{1/2}.

The good thing is that we can fully get rid of the matrix notation, following Fuller and Battese (1973), as σ_ε Ω^{-1/2} y can be written up in scalar form as well. This transformation can be represented by its typical element

    ỹ_ijt = y_ijt − (1 − θ₁) ȳ_·jt − (1 − θ₂) ȳ_i·t − (1 − θ₃) ȳ_ij· + (1 − θ₁ − θ₂ + θ₄) ȳ_··t
          + (1 − θ₁ − θ₃ + θ₅) ȳ_·j· + (1 − θ₂ − θ₃ + θ₆) ȳ_i·· − (1 − θ₁ − θ₂ − θ₃ + θ₄ + θ₅ + θ₆ − θ₇) ȳ_···,

where, following the standard ANOVA notation, a bar over a variable means that the mean of the variable was taken with respect to the missing indices. By using OLS on these transformed variables, we get back the GLS estimator. For the other models, the job is essentially the same. For model (4),

    σ_ε Ω^{-1/2} = I_{N₁N₂T} − (1 − θ₈)(I_{N₁} ⊗ J̄_{N₂} ⊗ I_T) − (1 − θ₉)(J̄_{N₁} ⊗ I_{N₂T}) + (1 − θ₈ − θ₉ + θ₁₀)(J̄_{N₁} ⊗ J̄_{N₂} ⊗ I_T),

and so σ_ε Ω^{-1/2} y in scalar form reads, with a typical element,

    ỹ_ijt = y_ijt − (1 − θ₈) ȳ_i·t − (1 − θ₉) ȳ_·jt + (1 − θ₈ − θ₉ + θ₁₀) ȳ_··t,

with

    θ₈ = σ_ε/(N₂σ²_υ + σ²_ε)^{1/2},    θ₉ = σ_ε/(N₁σ²_ζ + σ²_ε)^{1/2},    θ₁₀ = σ_ε/(N₂σ²_υ + N₁σ²_ζ + σ²_ε)^{1/2}.
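The θ-weighted transformation can be checked against a brute-force matrix inverse square root. The sketch below (hypothetical variance values; eigendecomposition is used only as the benchmark) builds Ω for the all-encompassing model and compares the two routes:

```python
import numpy as np

def kron(*mats):
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

N1, N2, T = 3, 4, 5
s2_mu, s2_up, s2_ze, s2_eps = 1.0, 0.5, 0.25, 2.0

I = np.eye
J = lambda n: np.ones((n, n))
Jb = lambda n: J(n) / n                  # averaging matrix J-bar

# Covariance matrix (3) of the all-encompassing model
Omega = (s2_mu * kron(I(N1 * N2), J(T))
         + s2_up * kron(I(N1), J(N2), I(T))
         + s2_ze * kron(J(N1), I(N2 * T))
         + s2_eps * I(N1 * N2 * T))

# theta weights: sigma_eps over the square root of each eigenvalue
th = lambda lam: np.sqrt(s2_eps / lam)
t1 = th(s2_eps + N1 * s2_ze)
t2 = th(s2_eps + N2 * s2_up)
t3 = th(s2_eps + T * s2_mu)
t4 = th(s2_eps + N2 * s2_up + N1 * s2_ze)
t5 = th(s2_eps + T * s2_mu + N1 * s2_ze)
t6 = th(s2_eps + T * s2_mu + N2 * s2_up)
t7 = th(s2_eps + T * s2_mu + N2 * s2_up + N1 * s2_ze)

transform = (I(N1 * N2 * T)
             - (1 - t1) * kron(Jb(N1), I(N2 * T))
             - (1 - t2) * kron(I(N1), Jb(N2), I(T))
             - (1 - t3) * kron(I(N1 * N2), Jb(T))
             + (1 - t1 - t2 + t4) * kron(Jb(N1), Jb(N2), I(T))
             + (1 - t1 - t3 + t5) * kron(Jb(N1), I(N2), Jb(T))
             + (1 - t2 - t3 + t6) * kron(I(N1), Jb(N2), Jb(T))
             - (1 - t1 - t2 - t3 + t4 + t5 + t6 - t7) * kron(Jb(N1), Jb(N2), Jb(T)))

# Benchmark: inverse square root via eigendecomposition
w, V = np.linalg.eigh(Omega)
inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
assert np.allclose(transform, np.sqrt(s2_eps) * inv_sqrt)
```

Applied to a stacked data vector, `transform @ y` yields exactly the scalar transformation given above, observation by observation.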

For model (6), the inverse of the covariance matrix is even simpler,

    σ_ε Ω^{-1/2} = I_{N₁N₂T} − (1 − θ₁₁)(J̄_{N₁} ⊗ I_{N₂T}),

so σ_ε Ω^{-1/2} y defines the scalar transformation

    ỹ_ijt = y_ijt − (1 − θ₁₁) ȳ_·jt,    with    θ₁₁ = σ_ε/(σ²_ε + N₁σ²_ζ)^{1/2}.

For model (8), it is

    σ_ε Ω^{-1/2} = I_{N₁N₂T} − (1 − θ₁₂)(I_{N₁N₂} ⊗ J̄_T) − (1 − θ₁₃)(J̄_{N₁N₂} ⊗ I_T) + (1 − θ₁₂ − θ₁₃ + θ₁₄) J̄_{N₁N₂T},

so σ_ε Ω^{-1/2} y in scalar form is

    ỹ_ijt = y_ijt − (1 − θ₁₂) ȳ_ij· − (1 − θ₁₃) ȳ_··t + (1 − θ₁₂ − θ₁₃ + θ₁₄) ȳ_···,

with

    θ₁₂ = σ_ε/(σ²_ε + Tσ²_µ)^{1/2},    θ₁₃ = σ_ε/(σ²_ε + N₁N₂σ²_λ)^{1/2},    θ₁₄ = σ_ε/(σ²_ε + Tσ²_µ + N₁N₂σ²_λ)^{1/2}.

The spectral decomposition of model (10), which was in fact proposed by Baltagi (1987), is

    σ_ε Ω^{-1/2} = I_{N₁N₂T} − (1 − θ₁₅)(I_{N₁} ⊗ J̄_{N₂T}) − (1 − θ₁₆)(J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T) − (1 − θ₁₇)(J̄_{N₁N₂} ⊗ I_T)
        + (2 − θ₁₅ − θ₁₆ − θ₁₇ + θ₁₈) J̄_{N₁N₂T},

with

    θ₁₅ = σ_ε/(N₂Tσ²_υ + σ²_ε)^{1/2},    θ₁₆ = σ_ε/(N₁Tσ²_ζ + σ²_ε)^{1/2},
    θ₁₇ = σ_ε/(N₁N₂σ²_λ + σ²_ε)^{1/2},    θ₁₈ = σ_ε/(N₂Tσ²_υ + N₁Tσ²_ζ + N₁N₂σ²_λ + σ²_ε)^{1/2}.

With the covariance matrix in hand, σ_ε Ω^{-1/2} y translates into

    ỹ_ijt = y_ijt − (1 − θ₁₅) ȳ_i·· − (1 − θ₁₆) ȳ_·j· − (1 − θ₁₇) ȳ_··t + (2 − θ₁₅ − θ₁₆ − θ₁₇ + θ₁₈) ȳ_···,

where the θ's are as defined above. For model (12), the inversion gives

    σ_ε Ω^{-1/2} = I_{N₁N₂T} − (1 − θ₁₉)(I_{N₁N₂} ⊗ J̄_T),

and so σ_ε Ω^{-1/2} y can be written up in scalar form, represented by a typical element

    ỹ_ijt = y_ijt − (1 − θ₁₉) ȳ_ij·,    with    θ₁₉ = σ_ε/(Tσ²_µ + σ²_ε)^{1/2}.

Table 1 summarizes the key elements in each model's inverse covariance matrix in the finite case.

Table 1: Structure of the Ω matrices

                                  (2)   (4)   (6)   (8)   (10)  (12)
    I_{N₁N₂T}                      +     +     +     +     +     +
    (I_{N₁N₂} ⊗ J̄_T)              +                 +           +
    (I_{N₁} ⊗ J̄_{N₂} ⊗ I_T)       +     +
    (J̄_{N₁} ⊗ I_{N₂T})            +     +     +
    (I_{N₁} ⊗ J̄_{N₂T})            +                       +
    (J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T)      +                       +
    (J̄_{N₁N₂} ⊗ I_T)              +     +           +     +
    J̄_{N₁N₂T}                     +                 +     +

A '+' sign in a column indicates which building element is part of the given model's Ω^{-1/2}. If the '+'-s in the column of a given model A cover those of another model B, model B is nested into model A. It can be seen, for example, that all models are in fact nested into (2), or that model (12) is nested into model (8). When the number of observations grows in one or more dimensions, it can be interesting to find the limits of the θ_k weights. It is easy to see that if all N₁, N₂, T → ∞, all θ_k (k = 1, ..., 19) in fact go to zero. That is, if the data grows in all directions, the GLS estimator (and in turn the FGLS) is identical to the Within estimator. Hence, for example for model (2), in the limit, σ_ε Ω^{-1/2} is simply given by

    lim_{N₁,N₂,T→∞} σ_ε Ω^{-1/2} = I_{N₁N₂T} − (J̄_{N₁} ⊗ I_{N₂T}) − (I_{N₁} ⊗ J̄_{N₂} ⊗ I_T) − (I_{N₁N₂} ⊗ J̄_T)
        + (J̄_{N₁} ⊗ J̄_{N₂} ⊗ I_T) + (J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T) + (I_{N₁} ⊗ J̄_{N₂} ⊗ J̄_T) − J̄_{N₁N₂T},

which is the projection matrix of the Within estimator. Table 2 collects the asymptotic conditions under which each model's (F)GLS estimator converges to a Within estimator.

Table 2: Asymptotic conditions when the model's FGLS converges to a Within estimator

    Model    Condition
    (2)      N₁, N₂, T → ∞
    (4)      N₁, N₂ → ∞
    (6)      N₁ → ∞
    (8)      (N₁, T → ∞) or (N₂, T → ∞)
    (10)     (N₁, N₂ → ∞) or (N₁, T → ∞) or (N₂, T → ∞)
    (12)     T → ∞

2.3 FGLS Estimation

To make the FGLS estimator operational, we need estimators for the variance components. Let us start again with model (2); for the other models, the job is essentially the same. Using the assumption that the error components are

pairwise uncorrelated,

    E[u²_ijt] = E[(µ_ij + υ_it + ζ_jt + ε_ijt)²] = E[µ²_ij] + E[υ²_it] + E[ζ²_jt] + E[ε²_ijt] = σ²_µ + σ²_υ + σ²_ζ + σ²_ε.

By introducing different Within transformations, and so projecting the error components into different subspaces of the original three-dimensional space, we can derive further identifying equations. The appropriate Within transformation for model (2) (see for details Balazsi et al. (2015)) is

    ũ_ijt = u_ijt − ū_·jt − ū_i·t − ū_ij· + ū_··t + ū_·j· + ū_i·· − ū_···.    (16)

Note that this transformation corresponds to the projection matrix

    M = I_{N₁N₂T} − (I_{N₁N₂} ⊗ J̄_T) − (I_{N₁} ⊗ J̄_{N₂} ⊗ I_T) − (J̄_{N₁} ⊗ I_{N₂T})
      + (I_{N₁} ⊗ J̄_{N₂T}) + (J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T) + (J̄_{N₁} ⊗ J̄_{N₂} ⊗ I_T) − J̄_{N₁N₂T},

with which u has to be pre-multiplied. Transforming u_ijt in this way wipes out µ_ij, υ_it, and ζ_jt, and gives, with i = 1, ..., N₁ and j = 1, ..., N₂,

    E[ũ²_ijt] = E[ε̃²_ijt] = E[(ε_ijt − ε̄_·jt − ε̄_i·t − ε̄_ij· + ε̄_··t + ε̄_·j· + ε̄_i·· − ε̄_···)²]
              = ((N₁−1)(N₂−1)(T−1)/(N₁N₂T)) σ²_ε,

where (N₁−1)(N₂−1)(T−1)/(N₁N₂T) is the rank/order ratio of M; likewise for all other subsequent transformations. Further, transforming u_ijt according to

    ũᵃ_ijt = u_ijt − ū_·jt − ū_i·t + ū_··t,

or with the underlying matrix

    Mᵃ = I_{N₁N₂T} − (J̄_{N₁} ⊗ I_{N₂T}) − (I_{N₁} ⊗ J̄_{N₂} ⊗ I_T) + (J̄_{N₁} ⊗ J̄_{N₂} ⊗ I_T),

eliminates υ_it + ζ_jt, and gives

    E[(ũᵃ_ijt)²] = E[(µ̃ᵃ_ij + ε̃ᵃ_ijt)²] = E[(µ̃ᵃ_ij)²] + E[(ε̃ᵃ_ijt)²] = ((N₁−1)(N₂−1)/(N₁N₂)) (σ²_µ + σ²_ε).
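These moment conditions can be illustrated by simulation: generate the error components directly, apply the Within-type transformations above (together with the analogous Mᵇ and Mᶜ transformations presented below), and back out the variances. A sketch with hypothetical variance values, applied for simplicity to the true disturbances rather than to OLS residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
N1, N2, T = 30, 30, 30
s2_mu, s2_up, s2_ze, s2_eps = 1.0, 0.5, 0.25, 2.0

# error components of the all-encompassing model, array indexed [i, j, t]
mu = rng.normal(0, np.sqrt(s2_mu), (N1, N2, 1))
up = rng.normal(0, np.sqrt(s2_up), (N1, 1, T))
ze = rng.normal(0, np.sqrt(s2_ze), (1, N2, T))
eps = rng.normal(0, np.sqrt(s2_eps), (N1, N2, T))
u = mu + up + ze + eps

m = lambda a, ax: a.mean(axis=ax, keepdims=True)   # partial means via broadcasting

# the full Within transformation (wipes out all three effects) ...
ut = (u - m(u, 0) - m(u, 1) - m(u, 2)
        + m(u, (0, 1)) + m(u, (0, 2)) + m(u, (1, 2)) - u.mean())
# ... and the three partial transformations keeping one effect each
ua = u - m(u, 0) - m(u, 1) + m(u, (0, 1))   # keeps mu_ij
ub = u - m(u, 2) - m(u, 0) + m(u, (0, 2))   # keeps upsilon_it
uc = u - m(u, 2) - m(u, 1) + m(u, (1, 2))   # keeps zeta_jt

s2_eps_hat = (ut ** 2).sum() / ((N1 - 1) * (N2 - 1) * (T - 1))
s2_mu_hat = (ua ** 2).sum() / ((N1 - 1) * (N2 - 1) * T) - s2_eps_hat
s2_up_hat = (ub ** 2).sum() / ((N1 - 1) * N2 * (T - 1)) - s2_eps_hat
s2_ze_hat = (uc ** 2).sum() / (N1 * (N2 - 1) * (T - 1)) - s2_eps_hat
# the four hats should be close to (2.0, 1.0, 0.5, 0.25) up to sampling noise
```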

Transforming according to

    ũᵇ_ijt = u_ijt − ū_ij· − ū_·jt + ū_·j·,    or    Mᵇ = I_{N₁N₂T} − (I_{N₁N₂} ⊗ J̄_T) − (J̄_{N₁} ⊗ I_{N₂T}) + (J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T),

eliminates µ_ij + ζ_jt, and gives

    E[(ũᵇ_ijt)²] = E[(υ̃ᵇ_it + ε̃ᵇ_ijt)²] = E[(υ̃ᵇ_it)²] + E[(ε̃ᵇ_ijt)²] = ((N₁−1)(T−1)/(N₁T)) (σ²_υ + σ²_ε).

Finally, using

    ũᶜ_ijt = u_ijt − ū_ij· − ū_i·t + ū_i··,    or    Mᶜ = I_{N₁N₂T} − (I_{N₁N₂} ⊗ J̄_T) − (I_{N₁} ⊗ J̄_{N₂} ⊗ I_T) + (I_{N₁} ⊗ J̄_{N₂T}),

wipes out µ_ij and υ_it, and gives

    E[(ũᶜ_ijt)²] = E[(ζ̃ᶜ_jt + ε̃ᶜ_ijt)²] = E[(ζ̃ᶜ_jt)²] + E[(ε̃ᶜ_ijt)²] = ((N₂−1)(T−1)/(N₂T)) (σ²_ζ + σ²_ε).

Putting the four identifying equations together gives a solvable system of four equations. Let û_ijt be the residual from the OLS estimation of y = Xβ + u. With this notation, the estimators for the variance components are

    σ̂²_ε = 1/((N₁−1)(N₂−1)(T−1)) Σ_ijt û²_ijt
    σ̂²_µ = 1/((N₁−1)(N₂−1)T) Σ_ijt (ûᵃ_ijt)² − σ̂²_ε
    σ̂²_υ = 1/((N₁−1)N₂(T−1)) Σ_ijt (ûᵇ_ijt)² − σ̂²_ε
    σ̂²_ζ = 1/(N₁(N₂−1)(T−1)) Σ_ijt (ûᶜ_ijt)² − σ̂²_ε,

where, obviously, û_ijt, ûᵃ_ijt, ûᵇ_ijt, and ûᶜ_ijt are the residuals transformed according to M, Mᵃ, Mᵇ, and Mᶜ respectively. Note, however, that the FGLS estimator of model (2) is only consistent if the data grows in at least two dimensions; that is, any two of N₁, N₂, T → ∞ have to hold. This is because σ²_µ (the variance of µ_ij) cannot be estimated consistently when only T → ∞, nor σ²_υ when only N₂ → ∞, and so on. For the consistency of the FGLS we need all variance components to be estimated consistently, something which holds only if the

data grows in at least two dimensions. Table 3 collects the conditions needed for consistency for all models considered. So what if, for example, the data is such that N₁ is large, but N₂ and T are small (as, for example, in the case of employee-firm data with an extensive number of workers, but with few hiring firms observed annually)? This would mean that σ²_µ and σ²_υ are estimated consistently, unlike σ²_ζ. In such cases, it makes more sense to assume ζ_jt to be fixed instead of random (while still assuming the randomness of µ_ij and υ_it), arriving at the so-called mixed effects models, explored in Section 4. We can estimate the variance components of the other models in a similar way. As the algebra is essentially the same, we only present the main results. For model (4),

    E[ũ²_ijt] = ((N₁−1)(N₂−1)/(N₁N₂)) σ²_ε,    E[(ũᵃ_ijt)²] = ((N₁−1)/N₁)(σ²_υ + σ²_ε),    E[(ũᵇ_ijt)²] = ((N₂−1)/N₂)(σ²_ζ + σ²_ε),

now with ũ_ijt = u_ijt − ū_·jt − ū_i·t + ū_··t, ũᵃ_ijt = u_ijt − ū_·jt, and ũᵇ_ijt = u_ijt − ū_i·t, which correspond to the projection matrices

    M = I_{N₁N₂T} − (J̄_{N₁} ⊗ I_{N₂T}) − (I_{N₁} ⊗ J̄_{N₂} ⊗ I_T) + (J̄_{N₁} ⊗ J̄_{N₂} ⊗ I_T)
    Mᵃ = I_{N₁N₂T} − (J̄_{N₁} ⊗ I_{N₂T})
    Mᵇ = I_{N₁N₂T} − (I_{N₁} ⊗ J̄_{N₂} ⊗ I_T)

respectively. The estimators for the variance components then are

    σ̂²_ε = 1/((N₁−1)(N₂−1)T) Σ_ijt û²_ijt,
    σ̂²_υ = 1/((N₁−1)N₂T) Σ_ijt (ûᵃ_ijt)² − σ̂²_ε,
    σ̂²_ζ = 1/(N₁(N₂−1)T) Σ_ijt (ûᵇ_ijt)² − σ̂²_ε,

where again û_ijt, ûᵃ_ijt and ûᵇ_ijt are obtained by transforming the OLS residuals according to M, Mᵃ, and Mᵇ respectively. For model (6), as E[u²_ijt] = σ²_ζ + σ²_ε and E[ũ²_ijt] = ((N₁−1)/N₁) σ²_ε, with now ũ_ijt = u_ijt − ū_·jt (or with M = I_{N₁N₂T} − (J̄_{N₁} ⊗ I_{N₂T})), the appropriate

estimators are simply

    σ̂²_ε = 1/((N₁−1)N₂T) Σ_ijt û²_ijt,    and    σ̂²_ζ = 1/(N₁N₂T) Σ_ijt û²_ijt − σ̂²_ε,

where in the first expression the residuals are transformed according to M, and in the second they are untransformed. For model (8),

    E[ũ²_ijt] = ((N₁N₂−1)(T−1)/(N₁N₂T)) σ²_ε,    E[(ũᵃ_ijt)²] = ((N₁N₂−1)/(N₁N₂))(σ²_µ + σ²_ε),    E[(ũᵇ_ijt)²] = ((T−1)/T)(σ²_λ + σ²_ε),

with ũ_ijt = u_ijt − ū_··t − ū_ij· + ū_···, ũᵃ_ijt = u_ijt − ū_··t, and ũᵇ_ijt = u_ijt − ū_ij·, which correspond to

    M = I_{N₁N₂T} − (J̄_{N₁N₂} ⊗ I_T) − (I_{N₁N₂} ⊗ J̄_T) + J̄_{N₁N₂T}
    Mᵃ = I_{N₁N₂T} − (J̄_{N₁N₂} ⊗ I_T)
    Mᵇ = I_{N₁N₂T} − (I_{N₁N₂} ⊗ J̄_T)

respectively. The estimators for the variance components are

    σ̂²_ε = 1/((N₁N₂−1)(T−1)) Σ_ijt û²_ijt,
    σ̂²_µ = 1/((N₁N₂−1)T) Σ_ijt (ûᵃ_ijt)² − σ̂²_ε,    and
    σ̂²_λ = 1/(N₁N₂(T−1)) Σ_ijt (ûᵇ_ijt)² − σ̂²_ε.

For model (10), as

    E[ũ²_ijt] = ((N₁N₂T − N₁ − N₂ − T + 2)/(N₁N₂T)) σ²_ε,
    E[(ũᵃ_ijt)²] = ((N₁−1)/N₁) σ²_υ + ((N₁N₂T − N₂ − T + 1)/(N₁N₂T)) σ²_ε,
    E[(ũᵇ_ijt)²] = ((N₂−1)/N₂) σ²_ζ + ((N₁N₂T − N₁ − T + 1)/(N₁N₂T)) σ²_ε,
    E[(ũᶜ_ijt)²] = ((T−1)/T) σ²_λ + ((N₁N₂T − N₁ − N₂ + 1)/(N₁N₂T)) σ²_ε,

with ũ_ijt = u_ijt − ū_··t − ū_·j· − ū_i·· + 2ū_···, ũᵃ_ijt = u_ijt − ū_··t − ū_·j· + ū_···, ũᵇ_ijt = u_ijt − ū_··t − ū_i·· + ū_···, and ũᶜ_ijt = u_ijt − ū_i·· − ū_·j· + ū_···, which all correspond to the projection matrices

    M = I_{N₁N₂T} − (J̄_{N₁N₂} ⊗ I_T) − (J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T) − (I_{N₁} ⊗ J̄_{N₂T}) + 2 J̄_{N₁N₂T}
    Mᵃ = I_{N₁N₂T} − (J̄_{N₁N₂} ⊗ I_T) − (J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T) + J̄_{N₁N₂T}
    Mᵇ = I_{N₁N₂T} − (J̄_{N₁N₂} ⊗ I_T) − (I_{N₁} ⊗ J̄_{N₂T}) + J̄_{N₁N₂T}
    Mᶜ = I_{N₁N₂T} − (J̄_{N₁} ⊗ I_{N₂} ⊗ J̄_T) − (I_{N₁} ⊗ J̄_{N₂T}) + J̄_{N₁N₂T}

respectively. The estimators for the variance components are

    σ̂²_ε = 1/(N₁N₂T − N₁ − N₂ − T + 2) Σ_ijt û²_ijt,
    σ̂²_υ = 1/(N₂T(N₁−1)) [Σ_ijt (ûᵃ_ijt)² − (N₁N₂T − N₂ − T + 1) σ̂²_ε],
    σ̂²_ζ = 1/(N₁T(N₂−1)) [Σ_ijt (ûᵇ_ijt)² − (N₁N₂T − N₁ − T + 1) σ̂²_ε],
    σ̂²_λ = 1/(N₁N₂(T−1)) [Σ_ijt (ûᶜ_ijt)² − (N₁N₂T − N₁ − N₂ + 1) σ̂²_ε].

Lastly, for model (12) we get

    E[u²_ijt] = σ²_µ + σ²_ε,    and    E[ũ²_ijt] = ((T−1)/T) σ²_ε,

with ũ_ijt = u_ijt − ū_ij· (which is the same as a general element of Mu with M = I_{N₁N₂T} − (I_{N₁N₂} ⊗ J̄_T)). With this, the estimators are

    σ̂²_ε = 1/(N₁N₂(T−1)) Σ_ijt û²_ijt,    and    σ̂²_µ = 1/(N₁N₂T) Σ_ijt û²_ijt − σ̂²_ε,

where the residuals in the first expression are transformed according to M, and untransformed in the second. Standard errors are computed accordingly, using Var(β̂_FGLS) = (X' Ω̂⁻¹ X)⁻¹. In the limiting cases, the usual normalization factors are needed to obtain finite variances. If, for example, N₁ and T are growing, √(N₁T)(β̂_FGLS − β) has a normal distribution with zero mean and variance Q⁻¹_XΩX, where Q_XΩX = plim_{N₁,T→∞} X' Ω̂⁻¹ X/(N₁T) is assumed to be a finite, positive definite matrix. This holds model-wide. We have no such luck, however, with the OLS estimator. The issue is best illustrated with model (12). It can be shown, just as with the usual 2D panel models, that Var(β̂_OLS) = (X'X)⁻¹ X' Ω̂ X (X'X)⁻¹ (with Ω̂ being model-specific, but let us assume for now that it corresponds to (13)). In the asymptotic

Table 3: Sample conditions for the consistency of the FGLS estimator

    Model    Consistency requirements
    (2)      (N₁, N₂ → ∞) or (N₁, T → ∞) or (N₂, T → ∞)
    (4)      (T → ∞) or (N₁, N₂ → ∞)
    (6)      (N₁ → ∞) or (T → ∞)
    (8)      (N₁, T → ∞) or (N₂, T → ∞)
    (10)     (N₁, N₂, T → ∞)
    (12)     (N₁ → ∞) or (N₂ → ∞)

case, when N₁, N₂ → ∞, √(N₁N₂)(β̂_OLS − β) has a normal distribution with finite variance, but this variance grows without bound (at rate O(T)) once T → ∞. That is, an extra 1/√T normalization factor has to be added to regain a normal distribution with bounded variance. Table 4 collects the normalization factors needed for a finite Var(β̂_OLS) for the different models considered. As it is uncommon to normalize with such expressions, some insights into the normalizations are given in the Appendix.

Table 4: Normalization factors for the finiteness of Var(β̂_OLS)

The rows of the table are the asymptotic regimes (N₁ → ∞; N₂ → ∞; T → ∞; N₁, N₂ → ∞; N₁, T → ∞; N₂, T → ∞; N₁, N₂, T → ∞), its columns the models (2), (4), (6), (8), (10), and (12), and each cell gives the corresponding normalization factor. A denotes the sample size which grows at the highest rate (N₁, N₂, or T), and A₁, A₂ the two sample sizes which grow at the highest rates.

Another interesting aspect is revealed by comparing Tables 2 and 3, that is, the consistency requirements for the estimation of the variance components (Table 3) and the asymptotic results when the FGLS converges to the Within estimator (Table 2). As can be seen from Table 5, for all models the FGLS is consistent if all N₁, N₂, T go to infinity, but in these cases the (F)GLS estimator converges to the Within one. This is problematic, as some parameters, previously estimable, suddenly become unidentified. In such cases, we have to rely on the OLS estimates rather than the FGLS. This is generally the case whenever a '+' sign is found in Table 5, most significantly for models (8) and (10). For them, the FGLS is only consistent when it is in fact the Within estimator, leading to likely severe identification issues. The best case scenarios are indicated with a '✓' sign, where the respective asymptotics are already enough for the consistency of the FGLS, but do not yet cause identification problems. Lastly, blank spaces are left in the table if, under the given asymptotic, the

FGLS is not consistent. In such cases we can again rely on the consistency of the OLS, but its standard errors are inconsistent, just as with the FGLS.

Table 5: Asymptotic results when the OLS should be used

                   (2)   (4)   (6)   (8)   (10)  (12)
    N₁                         +                  ✓
    N₂                                            ✓
    T                    ✓     ✓
    N₁, N₂         ✓     +     +                  ✓
    N₁, T          ✓     ✓     +     +            +
    N₂, T          ✓     ✓     ✓     +            +
    N₁, N₂, T      +     +     +     +     +      +

A '✓' sign indicates that the model is estimated consistently with FGLS, a '+' sign indicates that OLS should be used as some parameters are not identified, and a box is left blank if the model cannot be estimated consistently (under the respective asymptotics).

3 Unbalanced Data

3.1 Structure of the Covariance Matrices

Our analysis has so far concentrated on balanced panels. We know, however, that real-life datasets usually have some kind of incompleteness embedded. This can be more visible in the case of higher dimensional panels, where the number of missing observations can be substantial. As known from the analysis of the standard two-way error components models, in this case the estimators of the variance components, and in turn those of the slope parameters, are inconsistent, and further, the spectral decomposition of Ω is inapplicable. Next, we present the covariance matrices of the different models in an incomplete data framework, show a feasible way to invert them, and then propose a method to estimate the variance components in this general setup. In our modelling framework, incompleteness means that for any (ij) pair

of individuals, t ∈ T_ij, where the T_ij index-set is a subset of the general {1, ..., T} index-set of the time periods spanned by the data. Further, let |T_ij| denote the cardinality of T_ij, i.e., the number of its elements. Note that for complete (balanced) data, T_ij = {1, ..., T} and |T_ij| = T for all (ij). We also assume that for each t there is at least one (ij) pair, for each i there is at least one (jt) pair, and for each j there is at least one (it) pair observed. This assumption is almost natural, as it simply requires individuals or time periods with no underlying observation to be dropped from the dataset. As the structure of the data is now quite complex, we need to introduce a few new notations and definitions along the way. Formally, let us call n_it, n_jt, n_i, n_j, and n_t the total number of observations for a given (it) or (jt) pair, and for given individuals i, j, and time t, respectively. Further, let us call ñ_ij, ñ_it, ñ_jt the total number of distinct (ij), (it), and (jt) pairs present in the data. Remember that in the balanced case, ñ_ij = N₁N₂, ñ_it = N₁T, and ñ_jt = N₂T. It would make sense to define ñ_i, ñ_j, and ñ_t similarly; however, we can assume, without loss of generality, that there are still N₁ i-individuals, N₂ j-individuals, and T total time periods in the data (of course, with holes in it). For the all-encompassing model (2), the u_ijt can be stacked into the vector u. Remember that in the complete case it is

    u = (I_{N₁} ⊗ I_{N₂} ⊗ ι_T) µ + (I_{N₁} ⊗ ι_{N₂} ⊗ I_T) υ + (ι_{N₁} ⊗ I_{N₂} ⊗ I_T) ζ + I_{N₁N₂T} ε
      = D₁ µ + D₂ υ + D₃ ζ + ε,

with µ, υ, ζ, ε being the stacked vectors of the µ_ij, υ_it, ζ_jt, and ε_ijt, of respective lengths N₁N₂, N₁T, N₂T, and N₁N₂T, and ι the column vector of ones with the size in the index. The covariance matrix can then be represented by

    E(uu') = Ω = D₁D₁' σ²_µ + D₂D₂' σ²_υ + D₃D₃' σ²_ζ + I σ²_ε,

which is identical to (3). However, in the case of missing data, we have to modify the underlying D_k dummy matrices to reflect the unbalanced nature of the data.
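One direct way to obtain such incompleteness-adjusted dummy matrices is to build them from the list of observed (i, j, t) triplets. A sketch (the helper names are ours); as a sanity check, with complete data the D_k reproduce the Kronecker forms above:

```python
import numpy as np
from itertools import product

def dummies(obs):
    """Dummy matrices D1, D2, D3 from a list of observed (i, j, t) triplets,
    kept in the row order of the stacked disturbance vector u."""
    def dummy(keys):
        col = {g: c for c, g in enumerate(sorted(set(keys)))}
        D = np.zeros((len(keys), len(col)))
        for r, k in enumerate(keys):
            D[r, col[k]] = 1.0
        return D
    D1 = dummy([(i, j) for i, j, t in obs])   # picks mu_ij
    D2 = dummy([(i, t) for i, j, t in obs])   # picks upsilon_it
    D3 = dummy([(j, t) for i, j, t in obs])   # picks zeta_jt
    return D1, D2, D3

# Complete data: the D_k give back the Kronecker forms of the balanced case
N1, N2, T = 2, 2, 6
obs = list(product(range(N1), range(N2), range(T)))
D1, D2, D3 = dummies(obs)
assert np.allclose(D1 @ D1.T, np.kron(np.eye(N1 * N2), np.ones((T, T))))
assert np.allclose(D2 @ D2.T,
                   np.kron(np.kron(np.eye(N1), np.ones((N2, N2))), np.eye(T)))
assert np.allclose(D3 @ D3.T, np.kron(np.ones((N1, N1)), np.eye(N2 * T)))
```

With missing triplets simply dropped from `obs`, the same helper yields the incompleteness-adjusted dummies defined next.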
For every (ij) pair, let V_ij denote the (|T_ij| × T) matrix which we obtain from the (T × T) identity matrix by deleting the rows corresponding

to missing observations.³ With this, the incomplete-data D_k dummies are

    D₁ = diag{V₁₁ ι_T, V₁₂ ι_T, ..., V_{N₁N₂} ι_T}    of size (Σ_ij |T_ij| × ñ_ij),
    D₂ = diag{(V₁₁', ..., V_{1N₂}')', ..., (V_{N₁1}', ..., V_{N₁N₂}')'}    of size (Σ_ij |T_ij| × ñ_it),
    D₃ = (diag{V₁₁, ..., V_{1N₂}}', ..., diag{V_{N₁1}, ..., V_{N₁N₂}}')'    of size (Σ_ij |T_ij| × ñ_jt).

These can then be used to construct the covariance matrix as

    Ω = E(uu') = I_{Σ_ij |T_ij|} σ²_ε + D₁D₁' σ²_µ + D₂D₂' σ²_υ + D₃D₃' σ²_ζ

of size (Σ_ij |T_ij| × Σ_ij |T_ij|). If the data is complete, the above covariance structure in fact gives back (3). The job is the same for the other models. For models (4) and (6), u = D₂υ + D₃ζ + ε and u = D₃ζ + ε respectively, with the incompleteness-adjusted D₂ and D₃ defined above, giving in turn

    Ω = I_{Σ_ij |T_ij|} σ²_ε + D₂D₂' σ²_υ + D₃D₃' σ²_ζ

for model (4), and

    Ω = I_{Σ_ij |T_ij|} σ²_ε + D₃D₃' σ²_ζ

for model (6). Again, if the panel were in fact complete, we would get back (5) and (7). The incomplete-data covariance matrix of model (8) is

    Ω = I_{Σ_ij |T_ij|} σ²_ε + D₁D₁' σ²_µ + D₄D₄' σ²_λ,    with    D₄ = (V₁₁', V₁₂', ..., V_{N₁N₂}')'    of size (Σ_ij |T_ij| × T).

³ If, for example, t = 1, 4, 10 are missing for some (ij), we delete rows 1, 4, and 10 from I_T to get V_ij.
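With the D_k in hand, the incomplete-data Ω is immediate; and since it is σ²_ε I plus low-rank pieces, its inverse can be accumulated with one Woodbury step per random effect, never inverting anything larger than the number of distinct (ij), (it), or (jt) groups. A numerical sketch (hypothetical variances; the no-self-trade pattern of a gravity data set serves as the example of incompleteness):

```python
import numpy as np
from itertools import product

s2_mu, s2_up, s2_ze, s2_eps = 1.0, 0.5, 0.25, 2.0

# unbalanced design: drop the "self-trade" (i == j) cells
N, T = 4, 3
obs = [(i, j, t) for i, j, t in product(range(N), range(N), range(T)) if i != j]

def dummy(keys):
    col = {g: c for c, g in enumerate(sorted(set(keys)))}
    D = np.zeros((len(keys), len(col)))
    for r, k in enumerate(keys):
        D[r, col[k]] = 1.0
    return D

D1 = dummy([(i, j) for i, j, t in obs])
D2 = dummy([(i, t) for i, j, t in obs])
D3 = dummy([(j, t) for i, j, t in obs])

n = len(obs)
Omega = (s2_eps * np.eye(n) + s2_mu * D1 @ D1.T
         + s2_up * D2 @ D2.T + s2_ze * D3 @ D3.T)

# three sequential Woodbury steps, one per random effect
Ra = D1.T @ D1 + (s2_eps / s2_mu) * np.eye(D1.shape[1])
Pa = np.eye(n) - D1 @ np.linalg.inv(Ra) @ D1.T
Rb = D2.T @ Pa @ D2 + (s2_eps / s2_up) * np.eye(D2.shape[1])
Pb = Pa - Pa @ D2 @ np.linalg.inv(Rb) @ D2.T @ Pa
Rc = D3.T @ Pb @ D3 + (s2_eps / s2_ze) * np.eye(D3.shape[1])
inv_scaled = Pb - Pb @ D3 @ np.linalg.inv(Rc) @ D3.T @ Pb   # = s2_eps * Omega^{-1}

assert np.allclose(inv_scaled, s2_eps * np.linalg.inv(Omega))
```

Only the R matrices (here of orders 12, 12, and 12, versus 36 for Ω itself) are ever inverted explicitly.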

The covariance matrix for model (10) is

    Ω = I_{Σ_ij |T_ij|} σ²_ε + D₅D₅' σ²_υ + D₆D₆' σ²_ζ + D₄D₄' σ²_λ,

where

    D₅ = diag{((V₁₁ι_T)', ..., (V_{1N₂}ι_T)')', ..., ((V_{N₁1}ι_T)', ..., (V_{N₁N₂}ι_T)')'}
    D₆ = (diag{V₁₁ι_T, ..., V_{1N₂}ι_T}', ..., diag{V_{N₁1}ι_T, ..., V_{N₁N₂}ι_T}')'

of sizes (Σ_ij |T_ij| × N₁) and (Σ_ij |T_ij| × N₂). Lastly, for model (12) we simply get

    Ω = I_{Σ_ij |T_ij|} σ²_ε + D₁D₁' σ²_µ.

An important practical difficulty is that the spectral decompositions of the covariance matrices introduced in Section 2 are no longer valid, so the inversion of Ω for very large data sets can be forbidding. To get around this problem, let us construct the quasi-spectral decomposition of the incomplete-data covariance matrices, which is simply done by leaving out the missing observations from the appropriate B. Specifically, let us call B* the incompleteness-adjusted version of any B, which we get by removing the rows and columns corresponding to the missing observations. For example, the quasi-spectral decomposition (14) for the all-encompassing model reads as

    Ω* = σ²_ε B*_ijt + (σ²_ε + Tσ²_µ) B*_ij· + (σ²_ε + N₂σ²_υ) B*_i·t + (σ²_ε + N₁σ²_ζ) B*_·jt
       + (σ²_ε + Tσ²_µ + N₂σ²_υ) B*_i·· + (σ²_ε + Tσ²_µ + N₁σ²_ζ) B*_·j·
       + (σ²_ε + N₂σ²_υ + N₁σ²_ζ) B*_··t + (σ²_ε + Tσ²_µ + N₂σ²_υ + N₁σ²_ζ) B*_···,

where now all B* have number of rows equal to Σ_ij |T_ij|. Of course, this is not a correct spectral decomposition of Ω, but it helps to define the following conjecture.⁴ Namely, when the number of missing observations relative to the total number of observations is small, the inverse based on the quasi-spectral decomposition, Ω*⁻¹, approximates Ω⁻¹ arbitrarily well. More precisely, if [N₁N₂T − Σ_ij |T_ij|]/[N₁N₂T] → 0, then (Ω*⁻¹ − Ω⁻¹) → 0. This means that in large data sets, where the number of missing observations is

⁴ This can be demonstrated by simulation.

small relative to the total number of observations, Ω*⁻¹ can safely be used in the GLS estimator instead of Ω⁻¹. Let us give an example. Multi-dimensional panel data are often used to deal with trade (gravity) models. In these cases, when country i trades with country j, there are no (ii) (or (jj)) observations; there is no self-trade. With N₁ = N₂ = N, the total number of observations is then N²T − NT, with NT being the number of missing observations due to no self-trade. Given that [N²T − (N²T − NT)]/[N²T] = 1/N → 0 as the sample size increases, the quasi-spectral decomposition can be used in large data sets.

3.2 The Inverse of the Covariance Matrices

The solution proposed above, however, suffers from two potential drawbacks. First, the inverse, though reached at very low cost, may not be accurate enough; and second, when the holes in the data are substantial, this method cannot be used. These reasons spur us to derive the analytically correct inverse of the covariance matrices at the lowest possible cost. To do so, we have to reach back to the comprehensive incomplete-data analysis carried out by Baltagi and Chang (1994), and later Baltagi et al. (2002) for one- and two-way error component models and Baltagi et al. (2001) for nested three-way models; also, we have to generalize the results of Wansbeek and Kapteyn (1989) (in a slightly different manner, though, than seen in Davis (2002)). This leads us, for model (2), to

    σ²_ε Ω⁻¹ = P_b − P_b D₃ (R_c)⁻¹ D₃' P_b,    (17)

where P_b and R_c are obtained in steps:

    R_c = D₃' P_b D₃ + (σ²_ε/σ²_ζ) I,    P_b = P_a − P_a D₂ (R_b)⁻¹ D₂' P_a,
    R_b = D₂' P_a D₂ + (σ²_ε/σ²_υ) I,    P_a = I − D₁ (R_a)⁻¹ D₁',
    R_a = D₁' D₁ + (σ²_ε/σ²_µ) I,

and where D₁, D₂, D₃ are the incompleteness-adjusted dummy variable matrices, used to construct the P and R matrices sequentially: first construct R_a to get P_a, then construct R_b to get P_b, and finally construct R_c to complete (17). The proof of (17) can be found in the Appendix. Note that to get the inverse,

we have to invert matrices of size at most $\min\{N_1 T;\, N_2 T;\, N_1 N_2\}$. The quasi-scalar form of (17) (which corresponds to the incomplete-data version of transformation (15)) is

$\tilde y_{ijt} = y_{ijt} - \Big(1 - \frac{\sigma_\varepsilon}{\sqrt{T_{ij}\sigma^2_\mu + \sigma^2_\varepsilon}}\Big)\frac{1}{T_{ij}}\sum_t y_{ijt} - \omega^a_{ijt} - \omega^b_{ijt},$

with $\omega^a_{ijt} = \chi^a_{ijt}\psi^a$ and $\omega^b_{ijt} = \chi^b_{ijt}\psi^b$, where $\chi^a_{ijt}$ is the row corresponding to observation $(ijt)$ of $P_a D_2$, $\psi^a$ is the column vector $(R_b)^{-1} D_2' P_a y$, $\chi^b_{ijt}$ is the row of $P_b D_3$ corresponding to observation $(ijt)$, and finally, $\psi^b$ is the column vector $(R_c)^{-1} D_3' P_b y$.

For the other models the job is essentially the same, only the number of steps in obtaining the inverse is smaller (as the number of different random effects decreases). For model (4) it is, with appropriately redefined $P$ and $R$ matrices,

$\sigma^2_\varepsilon \Omega^{-1} = P_a - P_a D_3 (R_b)^{-1} D_3' P_a, \qquad (18)$

where now

$R_b = D_3' P_a D_3 + \frac{\sigma^2_\varepsilon}{\sigma^2_\zeta} I, \qquad P_a = I - D_2 (R_a)^{-1} D_2', \qquad R_a = D_2' D_2 + \frac{\sigma^2_\varepsilon}{\sigma^2_\upsilon} I,$

with the largest matrix to be inverted now of size $\min\{N_1 T;\, N_2 T\}$. For model (6) it is even simpler:

$\sigma^2_\varepsilon \Omega^{-1} = I - D_3 (R_a)^{-1} D_3' \quad\text{with}\quad R_a = D_3' D_3 + \frac{\sigma^2_\varepsilon}{\sigma^2_\zeta} I, \qquad (19)$

defining the scalar transformation

$\tilde y_{ijt} = y_{ijt} - \Big(1 - \frac{\sigma_\varepsilon}{\sqrt{n_{jt}\sigma^2_\zeta + \sigma^2_\varepsilon}}\Big)\frac{1}{n_{jt}}\sum_i y_{ijt},$

with $n_{jt}$ being the number of observations for a given $(jt)$ pair. For model

(8), the inverse is

$\sigma^2_\varepsilon \Omega^{-1} = P_a - P_a D_4 (R_b)^{-1} D_4' P_a \qquad (20)$

where

$R_b = D_4' P_a D_4 + \frac{\sigma^2_\varepsilon}{\sigma^2_\lambda} I, \qquad P_a = I - D_1 (R_a)^{-1} D_1', \qquad R_a = D_1' D_1 + \frac{\sigma^2_\varepsilon}{\sigma^2_\mu} I,$

and we have to invert a $\min\{N_1 N_2;\, T\}$-sized matrix. For model (10), the inverse is again the result of a three-step procedure:

$\sigma^2_\varepsilon \Omega^{-1} = P_b - P_b D_4 (R_c)^{-1} D_4' P_b, \qquad (21)$

where

$R_c = D_4' P_b D_4 + \frac{\sigma^2_\varepsilon}{\sigma^2_\lambda} I, \qquad P_b = P_a - P_a D_6 (R_b)^{-1} D_6' P_a,$
$R_b = D_6' P_a D_6 + \frac{\sigma^2_\varepsilon}{\sigma^2_\zeta} I, \qquad P_a = I - D_5 (R_a)^{-1} D_5', \qquad R_a = D_5' D_5 + \frac{\sigma^2_\varepsilon}{\sigma^2_\upsilon} I$

(inverting matrices of size at most $\min\{N_1;\, N_2;\, T\}$), and finally, the inverse for the simplest model is

$\sigma^2_\varepsilon \Omega^{-1} = I - D_1 (R_a)^{-1} D_1' \quad\text{with}\quad R_a = D_1' D_1 + \frac{\sigma^2_\varepsilon}{\sigma^2_\mu} I, \qquad (22)$

defining the scalar transformation

$\tilde y_{ijt} = y_{ijt} - \Big(1 - \frac{\sigma_\varepsilon}{\sqrt{T_{ij}\sigma^2_\mu + \sigma^2_\varepsilon}}\Big)\frac{1}{T_{ij}}\sum_t y_{ijt}$

on a typical $y_{ijt}$ variable.

3.3 Estimation of the Variance Components

Let us proceed to the estimation of the variance components. The estimators used for complete data are no longer applicable here: for example, transformation (16) does not eliminate $\mu_{ij}$, $\upsilon_{it}$, and $\zeta_{jt}$ from the composite disturbance term $u_{ijt} = \mu_{ij} + \upsilon_{it} + \zeta_{jt} + \varepsilon_{ijt}$ when the data is incomplete. This

problem can be tackled in two ways. We can derive incompleteness-robust alternatives to (16), i.e., transformations which clear the non-idiosyncratic random effects from $u_{ijt}$ in the case of incomplete data (see Balazsi et al. (2015)). The problem is that most of these transformations involve the manipulation of large matrices, resulting in a heavy computational burden. To avoid this, we propose simple linear transformations which, on the one hand, are robust to incomplete data and, on the other, identify the variance components. Let us see how this works for the all-encompassing model. As before,

$\mathrm{E}[u^2_{ijt}] = \sigma^2_\mu + \sigma^2_\upsilon + \sigma^2_\zeta + \sigma^2_\varepsilon, \qquad (23)$

but now let us define

$\tilde u^a_{ijt} = u_{ijt} - \frac{1}{T_{ij}}\sum_t u_{ijt}, \qquad \tilde u^b_{ijt} = u_{ijt} - \frac{1}{n_{it}}\sum_j u_{ijt}, \qquad \tilde u^c_{ijt} = u_{ijt} - \frac{1}{n_{jt}}\sum_i u_{ijt}.$

It can be seen that

$\mathrm{E}[(\tilde u^a_{ijt})^2] = \frac{T_{ij}-1}{T_{ij}}(\sigma^2_\upsilon + \sigma^2_\zeta + \sigma^2_\varepsilon), \qquad \mathrm{E}[(\tilde u^b_{ijt})^2] = \frac{n_{it}-1}{n_{it}}(\sigma^2_\mu + \sigma^2_\zeta + \sigma^2_\varepsilon),$
$\mathrm{E}[(\tilde u^c_{ijt})^2] = \frac{n_{jt}-1}{n_{jt}}(\sigma^2_\mu + \sigma^2_\upsilon + \sigma^2_\varepsilon). \qquad (24)$

Combining (23) with (24) identifies all four variance components. The appropriate estimators are then

$\hat\sigma^2_\mu = \frac{1}{\sum_{ij} T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_{ij}}\sum_{ij}\frac{T_{ij}}{T_{ij}-1}\sum_t(\hat u^a_{ijt})^2$
$\hat\sigma^2_\upsilon = \frac{1}{\sum_{ij} T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_{it}}\sum_{it}\frac{n_{it}}{n_{it}-1}\sum_j(\hat u^b_{ijt})^2$
$\hat\sigma^2_\zeta = \frac{1}{\sum_{ij} T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_{jt}}\sum_{jt}\frac{n_{jt}}{n_{jt}-1}\sum_i(\hat u^c_{ijt})^2$
$\hat\sigma^2_\varepsilon = \frac{1}{\sum_{ij} T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \hat\sigma^2_\mu - \hat\sigma^2_\upsilon - \hat\sigma^2_\zeta, \qquad (25)$

where $\hat u_{ijt}$ are the OLS residuals, $\hat u^k_{ijt}$ $(k = a, b, c)$ are their transformations, and $\tilde n_{ij}$, $\tilde n_{it}$, and $\tilde n_{jt}$ denote the total number of observations over the $(ij)$, $(it)$, and $(jt)$ cells, respectively. The estimation strategy of the variance components is exactly the same for all the other models. Let us keep for now the definitions of $\tilde u^b_{ijt}$ and $\tilde u^c_{ijt}$.
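The moment conditions above translate almost directly into code. The following Python sketch (function and column names are ours, not the paper's) computes the four variance component estimates of (25) from OLS residuals stored in a long-format table; cells with a single observation carry no within-cell information and are left out of the corresponding average, which is what the $\tilde n$ notation accounts for.

```python
import numpy as np
import pandas as pd

def variance_components(df):
    """Moment-based variance component estimates for the three-way model
    u_ijt = mu_ij + upsilon_it + zeta_jt + eps_ijt on incomplete data.
    df must have columns 'i', 'j', 't' and 'uhat' (OLS residuals)."""
    # E[u^2] = s2_mu + s2_up + s2_ze + s2_eps
    tot = (df["uhat"] ** 2).mean()

    def others(keys):
        # Demean within the cells defined by `keys`; the rescaled squared
        # deviations estimate the sum of the variances of the components
        # *not* fixed within such a cell.
        g = df.groupby(keys)["uhat"]
        size = g.transform("size")
        dev = df["uhat"] - g.transform("mean")
        keep = size > 1                         # singleton cells identify nothing
        ratio = size[keep] / (size[keep] - 1)   # undo the (m-1)/m attenuation
        return (ratio * dev[keep] ** 2).sum() / keep.sum()

    s2_mu = tot - others(["i", "j"])   # (ij)-demeaning removes mu_ij
    s2_up = tot - others(["i", "t"])   # (it)-demeaning removes upsilon_it
    s2_ze = tot - others(["j", "t"])   # (jt)-demeaning removes zeta_jt
    s2_eps = tot - s2_mu - s2_up - s2_ze
    return s2_mu, s2_up, s2_ze, s2_eps
```

On simulated disturbances with unit component variances and a panel with holes, the four estimates come back close to one; the same grouping logic, with the keys redefined, carries over to the other models below.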

For model (4), with $u_{ijt} = \upsilon_{it} + \zeta_{jt} + \varepsilon_{ijt}$, the estimators read as

$\hat\sigma^2_\upsilon = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_{it}}\sum_{it}\frac{n_{it}}{n_{it}-1}\sum_j(\hat u^b_{ijt})^2$
$\hat\sigma^2_\zeta = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_{jt}}\sum_{jt}\frac{n_{jt}}{n_{jt}-1}\sum_i(\hat u^c_{ijt})^2 \qquad (26)$
$\hat\sigma^2_\varepsilon = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \hat\sigma^2_\upsilon - \hat\sigma^2_\zeta,$

whereas for model (6), with $u_{ijt} = \zeta_{jt} + \varepsilon_{ijt}$, they are

$\hat\sigma^2_\zeta = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_{jt}}\sum_{jt}\frac{n_{jt}}{n_{jt}-1}\sum_i(\hat u^c_{ijt})^2$
$\hat\sigma^2_\varepsilon = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \hat\sigma^2_\zeta. \qquad (27)$

Note that these latter two sets of estimators can be obtained from (25) by assuming $\hat\sigma^2_\mu = 0$ for model (4), and $\hat\sigma^2_\mu = \hat\sigma^2_\upsilon = 0$ for model (6). For model (8), let us redefine the $\tilde u^k_{ijt}$-s as

$\tilde u^a_{ijt} = u_{ijt} - \frac{1}{T_{ij}}\sum_t u_{ijt}, \qquad \tilde u^b_{ijt} = u_{ijt} - \frac{1}{n_t}\sum_{ij} u_{ijt},$

with $n_t$ being the number of individual pairs observed at time $t$. With $u_{ijt} = \mu_{ij} + \lambda_t + \varepsilon_{ijt}$,

$\mathrm{E}[(\tilde u^a_{ijt})^2] = \frac{T_{ij}-1}{T_{ij}}(\sigma^2_\lambda + \sigma^2_\varepsilon), \qquad \mathrm{E}[(\tilde u^b_{ijt})^2] = \frac{n_t-1}{n_t}(\sigma^2_\mu + \sigma^2_\varepsilon), \qquad \mathrm{E}[u^2_{ijt}] = \sigma^2_\mu + \sigma^2_\lambda + \sigma^2_\varepsilon.$

From this set of identifying equations, the estimators are simply

$\hat\sigma^2_\mu = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_{ij}}\sum_{ij}\frac{T_{ij}}{T_{ij}-1}\sum_t(\hat u^a_{ijt})^2$
$\hat\sigma^2_\lambda = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_t}\sum_t\frac{n_t}{n_t-1}\sum_{ij}(\hat u^b_{ijt})^2 \qquad (28)$
$\hat\sigma^2_\varepsilon = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \hat\sigma^2_\mu - \hat\sigma^2_\lambda.$

For the simplest model, with $u_{ijt} = \mu_{ij} + \varepsilon_{ijt}$, keeping the definition of $\tilde u^a_{ijt}$,

$\hat\sigma^2_\mu = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_{ij}}\sum_{ij}\frac{T_{ij}}{T_{ij}-1}\sum_t(\hat u^a_{ijt})^2 \qquad (29)$
$\hat\sigma^2_\varepsilon = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \hat\sigma^2_\mu.$
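Before moving on, the step-wise inverses of Section 3.2 are easy to check numerically. The sketch below is our own illustration, not code from the paper: it builds incompleteness-adjusted dummy matrices for a small panel with holes and runs the three Woodbury-type steps behind (17), so that the result can be compared against a brute-force inverse of $\Omega$ while only dummy-sized matrices are ever factored.

```python
import numpy as np

def dummies(obs, N1, N2, T):
    """Incompleteness-adjusted dummies: one row per kept (i, j, t) observation,
    columns for the (ij), (it) and (jt) effects respectively."""
    D1 = np.zeros((len(obs), N1 * N2))   # mu_ij
    D2 = np.zeros((len(obs), N1 * T))    # upsilon_it
    D3 = np.zeros((len(obs), N2 * T))    # zeta_jt
    for r, (i, j, t) in enumerate(obs):
        D1[r, i * N2 + j] = 1.0
        D2[r, i * T + t] = 1.0
        D3[r, j * T + t] = 1.0
    return D1, D2, D3

def stepwise_inverse(D1, D2, D3, s2_mu, s2_up, s2_ze, s2_eps):
    """Return sigma_eps^2 * Omega^{-1} for
    Omega = s2_eps*I + s2_mu*D1D1' + s2_up*D2D2' + s2_ze*D3D3',
    computed in three small-matrix (Woodbury) steps as in (17)."""
    n = D1.shape[0]
    Ra = D1.T @ D1 + (s2_eps / s2_mu) * np.eye(D1.shape[1])
    Pa = np.eye(n) - D1 @ np.linalg.solve(Ra, D1.T)
    Rb = D2.T @ Pa @ D2 + (s2_eps / s2_up) * np.eye(D2.shape[1])
    Pb = Pa - Pa @ D2 @ np.linalg.solve(Rb, D2.T @ Pa)
    Rc = D3.T @ Pb @ D3 + (s2_eps / s2_ze) * np.eye(D3.shape[1])
    return Pb - Pb @ D3 @ np.linalg.solve(Rc, D3.T @ Pb)
```

On, say, a $4 \times 4 \times 3$ panel with the self-pairs dropped, `stepwise_inverse(...) / s2_eps` agrees with `np.linalg.inv` of the assembled $\Omega$ to machine precision, even though only $R_a$, $R_b$, $R_c$ are ever inverted.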

Finally, for model (10), as now $u_{ijt} = \upsilon_i + \zeta_j + \lambda_t + \varepsilon_{ijt}$, using

$\tilde u^a_{ijt} = u_{ijt} - \frac{1}{n_i}\sum_{jt} u_{ijt}, \qquad \tilde u^b_{ijt} = u_{ijt} - \frac{1}{n_j}\sum_{it} u_{ijt}, \qquad \tilde u^c_{ijt} = u_{ijt} - \frac{1}{n_t}\sum_{ij} u_{ijt},$

with $n_i$ and $n_j$ being the number of observations for individuals $i$ and $j$, respectively, the identifying equations are

$\mathrm{E}[(\tilde u^a_{ijt})^2] = \frac{n_i-1}{n_i}(\sigma^2_\zeta + \sigma^2_\lambda + \sigma^2_\varepsilon), \qquad \mathrm{E}[(\tilde u^b_{ijt})^2] = \frac{n_j-1}{n_j}(\sigma^2_\upsilon + \sigma^2_\lambda + \sigma^2_\varepsilon),$
$\mathrm{E}[(\tilde u^c_{ijt})^2] = \frac{n_t-1}{n_t}(\sigma^2_\upsilon + \sigma^2_\zeta + \sigma^2_\varepsilon), \qquad \mathrm{E}[u^2_{ijt}] = \sigma^2_\upsilon + \sigma^2_\zeta + \sigma^2_\lambda + \sigma^2_\varepsilon,$

in turn leading to

$\hat\sigma^2_\upsilon = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_i}\sum_i\frac{n_i}{n_i-1}\sum_{jt}(\hat u^a_{ijt})^2$
$\hat\sigma^2_\zeta = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_j}\sum_j\frac{n_j}{n_j-1}\sum_{it}(\hat u^b_{ijt})^2$
$\hat\sigma^2_\lambda = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \frac{1}{\tilde n_t}\sum_t\frac{n_t}{n_t-1}\sum_{ij}(\hat u^c_{ijt})^2 \qquad (30)$
$\hat\sigma^2_\varepsilon = \frac{1}{\sum_{ij}T_{ij}}\sum_{ijt}\hat u^2_{ijt} - \hat\sigma^2_\upsilon - \hat\sigma^2_\zeta - \hat\sigma^2_\lambda.$

4 Extensions

So far we have seen how to formulate and estimate three-way error components models. However, it is more and more typical to have data sets which require an even higher-dimensional approach. As the number of feasible model formulations grows exponentially with the dimensions, there is no point in attempting to collect all of them. Rather, we take the 4D representation of the all-encompassing model and show how the extension to higher dimensions can be carried out.

4.1 4D and beyond

The baseline 4D model we use reads as, with $i = 1, \ldots, N_1$, $j = 1, \ldots, N_2$, $s = 1, \ldots, N_3$, and $t = 1, \ldots, T$,

$y_{ijst} = x'_{ijst}\beta + \mu_{ijs} + \upsilon_{ist} + \zeta_{jst} + \lambda_{ijt} + \varepsilon_{ijst} = x'_{ijst}\beta + u_{ijst}, \qquad (31)$

where we keep assuming that $u$ (and its components individually) has zero mean, the components are pairwise uncorrelated, and further,

$\mathrm{E}(\mu_{ijs}\mu_{i'j's'}) = \sigma^2_\mu$ if $i = i'$, $j = j'$, and $s = s'$, and $0$ otherwise;
$\mathrm{E}(\upsilon_{ist}\upsilon_{i's't'}) = \sigma^2_\upsilon$ if $i = i'$, $s = s'$, and $t = t'$, and $0$ otherwise;
$\mathrm{E}(\zeta_{jst}\zeta_{j's't'}) = \sigma^2_\zeta$ if $j = j'$, $s = s'$, and $t = t'$, and $0$ otherwise;
$\mathrm{E}(\lambda_{ijt}\lambda_{i'j't'}) = \sigma^2_\lambda$ if $i = i'$, $j = j'$, and $t = t'$, and $0$ otherwise.

The covariance matrix of such an error components formulation is

$\Omega = \mathrm{E}(uu') = \sigma^2_\mu(I_{N_1N_2N_3} \otimes J_T) + \sigma^2_\upsilon(I_{N_1} \otimes J_{N_2} \otimes I_{N_3T})$
$\qquad + \sigma^2_\zeta(J_{N_1} \otimes I_{N_2N_3T}) + \sigma^2_\lambda(I_{N_1N_2} \otimes J_{N_3} \otimes I_T) + \sigma^2_\varepsilon I_{N_1N_2N_3T}. \qquad (32)$

Its inverse can simply be calculated following the method developed in Section 2, and the estimation of the variance components can also be derived as in Section 3; see Appendix 3 for details. The estimation procedure is not too difficult in the incomplete case either, at least not theoretically. Taking care of the unbalanced nature of the data in four-dimensional panels has nevertheless a growing importance, as the likelihood of having missing and/or incomplete data increases dramatically in higher dimensions. Conveniently, we keep assuming that our data is such that, for each $(ijs)$ individual, $t \in T_{ijs}$, where $T_{ijs}$ is a subset of the index set $\{1, \ldots, T\}$; that is, we have $T_{ijs}$ observations for each $(ijs)$ triplet. First, let us write up the covariance matrix of (31) as

$\Omega = \mathrm{E}(uu') = \sigma^2_\varepsilon I + \sigma^2_\mu D_1D_1' + \sigma^2_\upsilon D_2D_2' + \sigma^2_\zeta D_3D_3' + \sigma^2_\lambda D_4D_4', \qquad (33)$

where, in the complete case,

$D_1 = (I_{N_1N_2N_3} \otimes \iota_T), \quad D_2 = (I_{N_1} \otimes \iota_{N_2} \otimes I_{N_3T}), \quad D_3 = (\iota_{N_1} \otimes I_{N_2N_3T}), \quad D_4 = (I_{N_1N_2} \otimes \iota_{N_3} \otimes I_T),$

all being $(N_1N_2N_3T \times N_1N_2N_3)$, $(N_1N_2N_3T \times N_1N_3T)$, $(N_1N_2N_3T \times N_2N_3T)$, and $(N_1N_2N_3T \times N_1N_2T)$ sized matrices respectively, but now we delete from each $D_k$ the rows corresponding to the missing observations to reflect the unbalanced nature of the data. The inverse of such a covariance formulation can be reached in steps; that is, one has to derive

$\sigma^2_\varepsilon \Omega^{-1} = P_c - P_c D_4 (R_d)^{-1} D_4' P_c \qquad (34)$

where $P_c$ and $R_d$ are obtained in the following steps:

$R_d = D_4' P_c D_4 + \frac{\sigma^2_\varepsilon}{\sigma^2_\lambda} I, \qquad P_c = P_b - P_b D_3 (R_c)^{-1} D_3' P_b,$
$R_c = D_3' P_b D_3 + \frac{\sigma^2_\varepsilon}{\sigma^2_\zeta} I, \qquad P_b = P_a - P_a D_2 (R_b)^{-1} D_2' P_a,$
$R_b = D_2' P_a D_2 + \frac{\sigma^2_\varepsilon}{\sigma^2_\upsilon} I, \qquad P_a = I - D_1 (R_a)^{-1} D_1', \qquad R_a = D_1' D_1 + \frac{\sigma^2_\varepsilon}{\sigma^2_\mu} I.$

Even though the calculation above alleviates some of the dimensionality curse (the higher the dimension of the panel, the larger the matrices we have to work with), to perform the inverse we still have to manipulate potentially large matrices. The last step in finishing the FGLS estimation of the incomplete 4D models is to estimate the variance components. Fortunately, this is not too difficult; however, due to the size of the formulas, the results are presented in Appendix 3.

4.2 Mixed FE-RE Models

As briefly mentioned in Section 3, when one of the indices is small, it makes more sense to treat the effects depending on that index as fixed. As an illustration, consider an employee $i$ - employer $j$ - time $t$-type dataset, where we usually have a very large set of $i$, but relatively few $j$ and $t$. All this means

that the all-encompassing model can now be rewritten as

$y_{ijt} = x'_{ijt}\beta + \alpha_{jt} + \mu_{ij} + \upsilon_{it} + \varepsilon_{ijt}, \qquad (35)$

or, similarly,

$y = X\beta + D_1\alpha + D_2\mu + D_3\upsilon + \varepsilon = X\beta + D_1\alpha + u,$

with $D_1 = (\iota_{N_1} \otimes I_{N_2T})$, $D_2 = (I_{N_1N_2} \otimes \iota_T)$, and $D_3 = (I_{N_1} \otimes \iota_{N_2} \otimes I_T)$. We assume that $\alpha_{jt}$ enters the model as a fixed dummy, and that $u_{ijt} = \mu_{ij} + \upsilon_{it} + \varepsilon_{ijt}$ remains the random component. To estimate such a model specification, keeping an eye on optimality, we have to follow a two-step procedure. First, to get rid of the fixed effects, we define a projection orthogonal to $\alpha_{jt}$; then, on the transformed model, we perform FGLS. The resulting estimator is analytically not too complicated, and although the parameters of restricted $x_{jt}$-type regressors cannot be estimated from (35), $\hat\beta_{\text{Mixed}}$ is identified and consistent for the rest of the variables. This is a substantial improvement over the FGLS estimation of the all-encompassing model when $N_2$ and $T$ are both small, as in such cases, as shown in Section 3, the inconsistency of the variance component estimators carries over to the model parameters. The projection needed to eliminate $\alpha_{jt}$ is

$M_{D_1} = I - D_1(D_1'D_1)^{-1}D_1', \quad\text{or in scalar form,}\quad \tilde y_{ijt} = y_{ijt} - \bar y_{.jt}. \qquad (36)$

Notice that the resulting transformed (35),

$\tilde y_{ijt} = \tilde x'_{ijt}\beta + \tilde u_{ijt}, \qquad (37)$

is now a simple error components model, with a slightly less trivial random effects structure embedded in $\tilde u_{ijt}$. In fact,

$\tilde\Omega = \mathrm{E}(\tilde u\tilde u') = \mathrm{E}(M_{D_1}uu'M_{D_1}) = M_{D_1}D_2D_2'M_{D_1}\sigma^2_\mu + M_{D_1}D_3D_3'M_{D_1}\sigma^2_\upsilon + M_{D_1}\sigma^2_\varepsilon$
$\qquad = \big((I_{N_1} - \bar J_{N_1}) \otimes I_{N_2} \otimes \bar J_T\big)T\sigma^2_\mu + \big((I_{N_1} - \bar J_{N_1}) \otimes \bar J_{N_2} \otimes I_T\big)N_2\sigma^2_\upsilon + \big((I_{N_1} - \bar J_{N_1}) \otimes I_{N_2T}\big)\sigma^2_\varepsilon,$

while its inverse can be derived using the trick introduced in Section 2 (using the substitution $I_N = Q_N + \bar J_N$), giving

$\sigma^2_\varepsilon\tilde\Omega^{-} = \big[I_{N_1N_2T} - (\bar J_{N_1} \otimes I_{N_2T})\big] - (1-\theta_1)\big[(I_{N_1} \otimes \bar J_{N_2} \otimes I_T) - (\bar J_{N_1N_2} \otimes I_T)\big]$
$\qquad - (1-\theta_2)\big[(I_{N_1N_2} \otimes \bar J_T) - (\bar J_{N_1} \otimes I_{N_2} \otimes \bar J_T)\big] + (1-\theta_1-\theta_2+\theta_3)\big[(I_{N_1} \otimes \bar J_{N_2T}) - \bar J_{N_1N_2T}\big]$

with

$\theta_1 = \frac{\sigma^2_\varepsilon}{N_2\sigma^2_\upsilon + \sigma^2_\varepsilon}, \qquad \theta_2 = \frac{\sigma^2_\varepsilon}{T\sigma^2_\mu + \sigma^2_\varepsilon}, \qquad \theta_3 = \frac{\sigma^2_\varepsilon}{N_2\sigma^2_\upsilon + T\sigma^2_\mu + \sigma^2_\varepsilon}.$

After all, the mixed effects estimation of (35) is identical to the FGLS estimation of (37). The only step remaining is to estimate the variance components. In particular,

$\hat\sigma^2_\varepsilon = \frac{1}{(N_1-1)(N_2-1)(T-1)}\sum_{ijt}(\hat u^a_{ijt})^2$
$\hat\sigma^2_\mu = \frac{1}{(N_1-1)(N_2-1)T}\sum_{ijt}(\hat u^b_{ijt})^2 - \hat\sigma^2_\varepsilon$
$\hat\sigma^2_\upsilon = \frac{1}{(N_1-1)N_2(T-1)}\sum_{ijt}(\hat u^c_{ijt})^2 - \hat\sigma^2_\varepsilon, \qquad (38)$

where $\hat u_{ijt}$ is the OLS residual, and now

$\tilde u^a_{ijt} = u_{ijt} - \bar u_{.jt} - \bar u_{i.t} - \bar u_{ij.} + \bar u_{..t} + \bar u_{.j.} + \bar u_{i..} - \bar u_{...},$
$\tilde u^b_{ijt} = u_{ijt} - \bar u_{.jt} - \bar u_{i.t} + \bar u_{..t}, \quad\text{and}\quad \tilde u^c_{ijt} = u_{ijt} - \bar u_{.jt} - \bar u_{ij.} + \bar u_{.j.}. \qquad (39)$

The next question is to what extent the above algorithm has to be modified for unbalanced data. First, transformation (36) is successful in eliminating $\alpha_{jt}$ from model (35) in this case as well. Second, the resulting transformed covariance matrix now cannot be represented by Kronecker products; instead, to invert it, we have to rely on the tricks derived in Section 3. The estimation of the variance components is done by first adjusting the transformations $\tilde u^a_{ijt}$, $\tilde u^b_{ijt}$ in (39) to incomplete data, and for $\tilde u^c_{ijt}$,

$\tilde u^c = M^{(1)}u - M^{(1)}\tilde D_3\big(\tilde D_3'M^{(1)}\tilde D_3\big)^{-1}\tilde D_3'M^{(1)}u,$

where $u$ contains the stacked disturbances (with elements $u_{ijt}$), $\tilde u^c$ is its transformed counterpart, $M^{(1)} = I - \tilde D_2(\tilde D_2'\tilde D_2)^{-1}\tilde D_2'$, and $\tilde D_2$ and $\tilde D_3$ are obtained

from $D_2 = (I_{N_1N_2} \otimes \iota_T)$ and $D_3 = (\iota_{N_1} \otimes I_{N_2T})$ by leaving out the rows corresponding to missing observations. Finally, we have to set the proper sample sizes in (38).

5 Testing

In this section we show, for the all-encompassing model, how to test for the different components of the unobserved heterogeneity; more specifically, how to test the nullity of the variance of some random components against the alternative that the given variance is positive. We have to be careful, however, about what we assume about the rest of the variances: testing $H_0: \sigma^2_\mu = 0$ against $H_A: \sigma^2_\mu > 0$ may implicitly assume that $\sigma^2_\upsilon = \sigma^2_\zeta = 0$, and so on. In what follows, we collect some null and alternative hypotheses (the variances after the bar are the maintained assumptions; in line with Table 6, hypotheses $d$ and $e$ test $\sigma^2_\mu$ and $\sigma^2_\upsilon$ jointly), and present the mechanism to test them:

$H_0^a: \sigma^2_\mu = 0 \mid \sigma^2_\upsilon > 0,\ \sigma^2_\zeta > 0; \qquad H_A^a: \sigma^2_\mu > 0 \mid \sigma^2_\upsilon > 0,\ \sigma^2_\zeta > 0$
$H_0^b: \sigma^2_\mu = 0 \mid \sigma^2_\upsilon = 0,\ \sigma^2_\zeta > 0; \qquad H_A^b: \sigma^2_\mu > 0 \mid \sigma^2_\upsilon = 0,\ \sigma^2_\zeta > 0$
$H_0^c: \sigma^2_\mu = 0 \mid \sigma^2_\upsilon = 0,\ \sigma^2_\zeta = 0; \qquad H_A^c: \sigma^2_\mu > 0 \mid \sigma^2_\upsilon = 0,\ \sigma^2_\zeta = 0$
$H_0^d: \sigma^2_\mu = \sigma^2_\upsilon = 0 \mid \sigma^2_\zeta > 0; \qquad H_A^d: \sigma^2_\mu > 0 \text{ or } \sigma^2_\upsilon > 0 \mid \sigma^2_\zeta > 0$
$H_0^e: \sigma^2_\mu = \sigma^2_\upsilon = 0 \mid \sigma^2_\zeta = 0; \qquad H_A^e: \sigma^2_\mu > 0 \text{ or } \sigma^2_\upsilon > 0 \mid \sigma^2_\zeta = 0$

To test these hypotheses, we invoke the ANOVA $F$-test and adjust it to our purposes. In its general form, as derived in Baltagi et al. (1992),

$F = \frac{y'M_{Z_1}D(D'M_{Z_1}D)^{-}D'M_{Z_1}y/(p-r)}{y'M_{Z_2}y/(N_1N_2T - k - p + r)}, \qquad (40)$

where both $M_{Z_1}$ and $M_{Z_2}$ are orthogonal projectors, and the degrees of freedom are calculated from $p$, $r$, and $k$. Table 6 captures each specific matrix and constant for all the hypotheses listed above. Although (40) suffices theoretically, let us not forget that in order to perform the test we have to invert $(D'M_{Z_1}D)$, a matrix as large as the data. Instead, to avoid this computational burden, we can elaborate on (40), and

find out what the respective projection matrices do to the data:

$F = \frac{F_1/(p-r)}{F_2/(N_1N_2T - k - p + r)},$

where

$F_1 = (\tilde y - \tilde X\hat\beta_{OLS})'\big(I + \tilde X(\tilde{\tilde X}'\tilde{\tilde X})^{-1}\tilde X'\big)(\tilde y - \tilde X\hat\beta_{OLS}), \qquad \hat\beta_{OLS} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y,$

and

$F_2 = (\tilde{\tilde y} - \tilde{\tilde X}\hat\beta_w)'(\tilde{\tilde y} - \tilde{\tilde X}\hat\beta_w), \qquad \hat\beta_w = (\tilde{\tilde X}'\tilde{\tilde X})^{-1}\tilde{\tilde X}'\tilde{\tilde y},$

with the tildes on top denoting the two different transformations. For $H_0^a$, for example, these are

$\tilde{\tilde y}_{ijt} = y_{ijt} - \bar y_{.jt} - \bar y_{i.t} - \bar y_{ij.} + \bar y_{..t} + \bar y_{.j.} + \bar y_{i..} - \bar y_{...} \qquad (41)$

(which is the optimal Within transformation of the all-encompassing model), and

$\tilde y_{ijt} = y_{ijt} - \bar y_{.jt} - \bar y_{i.t} + \bar y_{..t}. \qquad (42)$

Table 6: Specific functional forms of the ANOVA $F$-test

Hypothesis | $Z_1$ | $D$ | $Z_2$ | $p$ | $r$ | $k$
$H^a$ | $(X, D_2, D_3)$ | $(I_{N_1N_2} \otimes J_T)$ | $(X, D_1, D_2, D_3)$ | $N_1N_2$ | $N_1(T-1)+N_2(T-1)+T$ | $k$
$H^b$ | $(X, D_3)$ | $(I_{N_1N_2} \otimes J_T)$ | $(X, D_1, D_3)$ | $N_1N_2$ | $N_2(T-1)$ | $k$
$H^c$ | $X$ | $(I_{N_1N_2} \otimes J_T)$ | $(X, D_1)$ | $N_1N_2$ | — | $k$
$H^d$ | $(X, D_3)$ | $(I_{N_1N_2} \otimes J_T,\ I_{N_1} \otimes J_{N_2} \otimes I_T)$ | $(X, D_1, D_2, D_3)$ | $N_1N_2+N_1T$ | — | $k$
$H^e$ | $X$ | $(I_{N_1N_2} \otimes J_T,\ I_{N_1} \otimes J_{N_2} \otimes I_T)$ | $(X, D_1, D_2)$ | $N_1N_2+N_1T$ | — | $k$

where, as defined, $M_Z = I - Z(Z'Z)^{-1}Z'$, $D_1 = (I_{N_1N_2} \otimes \iota_T)$, $D_2 = (I_{N_1} \otimes \iota_{N_2} \otimes I_T)$, and $D_3 = (\iota_{N_1} \otimes I_{N_2T})$.

To get an insight into the specific formula, notice that we actually compare two models: one where all sources of variation are cleared (the denominator of (40)), and one where all variation is cleared but that coming from $\mu_{ij}$ (the numerator of (40)). This is because both under the null and the alternative we assume that $\sigma^2_\upsilon > 0$ and $\sigma^2_\zeta > 0$; that is, they are irrelevant from our point of view, and we can eliminate both $\upsilon_{it}$ and $\zeta_{jt}$ with an orthogonal projection. Further, under the alternative, $\sigma^2_\mu > 0$ also holds,

so we eliminate $\mu_{ij}$ as well, but save it under the null. The numerator and the denominator of (40) are then compared, and if the ratio is sufficiently close to 1, we cannot reject the nullity of $\sigma^2_\mu$.

Not much changes when the underlying data is incomplete. In principle, the orthogonal projections $M_{Z_1}$ and $M_{Z_2}$ now cannot be represented as linear transformations of the data, only in semi-scalar form, with the inclusion of some matrix operations. Once we have the incompleteness-robust $\tilde y$, $\tilde{\tilde y}$ (and similarly $\tilde X$, $\tilde{\tilde X}$) variables, the $F$ statistic is obtained as in (40), with the properly computed degrees of freedom.

6 Conclusion

For large data sets, when observations can be considered as samples from an underlying population, random effects specifications seem to be better suited to deal with multi-dimensional data sets. FGLS estimators for three-way error components models are almost as easily obtained as for traditional 2D panel models; however, the resulting asymptotic requirements for their consistency are more peculiar. In fact, the data can now grow in three directions, and only some of the asymptotic cases are sufficient for consistency. Interestingly, for some error components specifications, consistency implies the convergence of the FGLS estimator to the Within estimator. This is important because, under Within estimation, the parameters of some fixed regressors are unidentified, and this carries over to the FGLS estimation of those parameters as well. To solve this, we have shown that a simple OLS can be sufficient to get the full set of parameter estimates (of course, at the price of inefficiency) wherever this identification problem persists. The main results of the paper are also extended to treat incomplete data and higher dimensions.

References

Baier, Scott L. and Jeffrey H. Bergstrand, Do Free Trade Agreements Actually Increase Members' International Trade?, Journal of International

Economics, 2007, 71.

Balazsi, Laszlo, Laszlo Matyas, and Tom Wansbeek, The Estimation of Multi-dimensional Fixed Effects Panel Data Models, forthcoming in Econometric Reviews, 2015.

Baldwin, Richard and Daria Taglioni, Gravity for Dummies and Dummies for the Gravity Equations, NBER Working Paper 12516, 2006.

Baltagi, Badi H., On estimating from a more general time-series cum cross-section data structure, The American Economist, 1987, 31.

Baltagi, Badi H. and Young-Jae Chang, Incomplete panels: A comparative study of alternative estimators for the unbalanced one-way error component regression model, Journal of Econometrics, 1994, 62.

Baltagi, Badi H., Laszlo Matyas, and Patrick Sevestre, Error Components Models, Springer Verlag, 2008.

Baltagi, Badi H., Peter Egger, and Michael Pfaffermayr, A Generalized Design for Bilateral Trade Flow Models, Economics Letters, 2003, 80.

Baltagi, Badi H., Seuck H. Song, and Byoung C. Jung, The unbalanced nested error component regression model, Journal of Econometrics, 2001, 101.

Baltagi, Badi H., Seuck H. Song, and Byoung C. Jung, A comparative study of alternative estimators for the unbalanced two-way error component regression model, Econometrics Journal, 2002, 5.

Baltagi, Badi H., Young-Jae Chang, and Qi Li, Monte Carlo results on several new and existing tests for the error component model, Journal of Econometrics, 1992, 54.

Bryk, Anthony S. and Stephen W. Raudenbush, Hierarchical Linear Models: Applications and Data Analysis Methods (First Edition), Newbury Park, CA: Sage Publications, 1992.


More information

EC327: Advanced Econometrics, Spring 2007

EC327: Advanced Econometrics, Spring 2007 EC327: Advanced Econometrics, Spring 2007 Wooldridge, Introductory Econometrics (3rd ed, 2006) Chapter 14: Advanced panel data methods Fixed effects estimators We discussed the first difference (FD) model

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

A COMPARISON OF HETEROSCEDASTICITY ROBUST STANDARD ERRORS AND NONPARAMETRIC GENERALIZED LEAST SQUARES

A COMPARISON OF HETEROSCEDASTICITY ROBUST STANDARD ERRORS AND NONPARAMETRIC GENERALIZED LEAST SQUARES A COMPARISON OF HETEROSCEDASTICITY ROBUST STANDARD ERRORS AND NONPARAMETRIC GENERALIZED LEAST SQUARES MICHAEL O HARA AND CHRISTOPHER F. PARMETER Abstract. This paper presents a Monte Carlo comparison of

More information

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 4 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 23 Recommended Reading For the today Serial correlation and heteroskedasticity in

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

Panel Threshold Regression Models with Endogenous Threshold Variables

Panel Threshold Regression Models with Endogenous Threshold Variables Panel Threshold Regression Models with Endogenous Threshold Variables Chien-Ho Wang National Taipei University Eric S. Lin National Tsing Hua University This Version: June 29, 2010 Abstract This paper

More information

CLUSTER EFFECTS AND SIMULTANEITY IN MULTILEVEL MODELS

CLUSTER EFFECTS AND SIMULTANEITY IN MULTILEVEL MODELS HEALTH ECONOMICS, VOL. 6: 439 443 (1997) HEALTH ECONOMICS LETTERS CLUSTER EFFECTS AND SIMULTANEITY IN MULTILEVEL MODELS RICHARD BLUNDELL 1 AND FRANK WINDMEIJER 2 * 1 Department of Economics, University

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Econ 582 Fixed Effects Estimation of Panel Data

Econ 582 Fixed Effects Estimation of Panel Data Econ 582 Fixed Effects Estimation of Panel Data Eric Zivot May 28, 2012 Panel Data Framework = x 0 β + = 1 (individuals); =1 (time periods) y 1 = X β ( ) ( 1) + ε Main question: Is x uncorrelated with?

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

The exact bias of S 2 in linear panel regressions with spatial autocorrelation SFB 823. Discussion Paper. Christoph Hanck, Walter Krämer

The exact bias of S 2 in linear panel regressions with spatial autocorrelation SFB 823. Discussion Paper. Christoph Hanck, Walter Krämer SFB 83 The exact bias of S in linear panel regressions with spatial autocorrelation Discussion Paper Christoph Hanck, Walter Krämer Nr. 8/00 The exact bias of S in linear panel regressions with spatial

More information

Exogeneity tests and weak identification

Exogeneity tests and weak identification Cireq, Cirano, Départ. Sc. Economiques Université de Montréal Jean-Marie Dufour Cireq, Cirano, William Dow Professor of Economics Department of Economics Mcgill University June 20, 2008 Main Contributions

More information

Econometric Methods for Panel Data Part I

Econometric Methods for Panel Data Part I Econometric Methods for Panel Data Part I Robert M. Kunst University of Vienna February 011 1 Introduction The word panel is derived from Dutch and originally describes a rectangular board. In econometrics,

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley Zellner s Seemingly Unrelated Regressions Model James L. Powell Department of Economics University of California, Berkeley Overview The seemingly unrelated regressions (SUR) model, proposed by Zellner,

More information

Chapter 2: simple regression model

Chapter 2: simple regression model Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.

More information

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors

Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Gravity Models, PPML Estimation and the Bias of the Robust Standard Errors Michael Pfaffermayr August 23, 2018 Abstract In gravity models with exporter and importer dummies the robust standard errors of

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

Specification testing in panel data models estimated by fixed effects with instrumental variables

Specification testing in panel data models estimated by fixed effects with instrumental variables Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions

More information

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43 Panel Data March 2, 212 () Applied Economoetrics: Topic March 2, 212 1 / 43 Overview Many economic applications involve panel data. Panel data has both cross-sectional and time series aspects. Regression

More information

Improving GMM efficiency in dynamic models for panel data with mean stationarity

Improving GMM efficiency in dynamic models for panel data with mean stationarity Working Paper Series Department of Economics University of Verona Improving GMM efficiency in dynamic models for panel data with mean stationarity Giorgio Calzolari, Laura Magazzini WP Number: 12 July

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

Simple Linear Regression: The Model

Simple Linear Regression: The Model Simple Linear Regression: The Model task: quantifying the effect of change X in X on Y, with some constant β 1 : Y = β 1 X, linear relationship between X and Y, however, relationship subject to a random

More information

Econometric Methods. Prediction / Violation of A-Assumptions. Burcu Erdogan. Universität Trier WS 2011/2012

Econometric Methods. Prediction / Violation of A-Assumptions. Burcu Erdogan. Universität Trier WS 2011/2012 Econometric Methods Prediction / Violation of A-Assumptions Burcu Erdogan Universität Trier WS 2011/2012 (Universität Trier) Econometric Methods 30.11.2011 1 / 42 Moving on to... 1 Prediction 2 Violation

More information

1 Appendix A: Matrix Algebra

1 Appendix A: Matrix Algebra Appendix A: Matrix Algebra. Definitions Matrix A =[ ]=[A] Symmetric matrix: = for all and Diagonal matrix: 6=0if = but =0if 6= Scalar matrix: the diagonal matrix of = Identity matrix: the scalar matrix

More information

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) 1 2 Model Consider a system of two regressions y 1 = β 1 y 2 + u 1 (1) y 2 = β 2 y 1 + u 2 (2) This is a simultaneous equation model

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid Applied Economics Panel Data Department of Economics Universidad Carlos III de Madrid See also Wooldridge (chapter 13), and Stock and Watson (chapter 10) 1 / 38 Panel Data vs Repeated Cross-sections In

More information

Heteroskedasticity. We now consider the implications of relaxing the assumption that the conditional

Heteroskedasticity. We now consider the implications of relaxing the assumption that the conditional Heteroskedasticity We now consider the implications of relaxing the assumption that the conditional variance V (u i x i ) = σ 2 is common to all observations i = 1,..., In many applications, we may suspect

More information

Chapter 11 GMM: General Formulas and Application

Chapter 11 GMM: General Formulas and Application Chapter 11 GMM: General Formulas and Application Main Content General GMM Formulas esting Moments Standard Errors of Anything by Delta Method Using GMM for Regressions Prespecified weighting Matrices and

More information

Seemingly Unrelated Regressions in Panel Models. Presented by Catherine Keppel, Michael Anreiter and Michael Greinecker

Seemingly Unrelated Regressions in Panel Models. Presented by Catherine Keppel, Michael Anreiter and Michael Greinecker Seemingly Unrelated Regressions in Panel Models Presented by Catherine Keppel, Michael Anreiter and Michael Greinecker Arnold Zellner An Efficient Method of Estimating Seemingly Unrelated Regressions and

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

Graduate Econometrics Lecture 4: Heteroskedasticity

Graduate Econometrics Lecture 4: Heteroskedasticity Graduate Econometrics Lecture 4: Heteroskedasticity Department of Economics University of Gothenburg November 30, 2014 1/43 and Autocorrelation Consequences for OLS Estimator Begin from the linear model

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Heteroskedasticity-Robust Inference in Finite Samples

Heteroskedasticity-Robust Inference in Finite Samples Heteroskedasticity-Robust Inference in Finite Samples Jerry Hausman and Christopher Palmer Massachusetts Institute of Technology December 011 Abstract Since the advent of heteroskedasticity-robust standard

More information

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation 1 Outline. 1. Motivation 2. SUR model 3. Simultaneous equations 4. Estimation 2 Motivation. In this chapter, we will study simultaneous systems of econometric equations. Systems of simultaneous equations

More information

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of

Wooldridge, Introductory Econometrics, 2d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of Wooldridge, Introductory Econometrics, d ed. Chapter 8: Heteroskedasticity In laying out the standard regression model, we made the assumption of homoskedasticity of the regression error term: that its

More information

1 Introduction to Generalized Least Squares

1 Introduction to Generalized Least Squares ECONOMICS 7344, Spring 2017 Bent E. Sørensen April 12, 2017 1 Introduction to Generalized Least Squares Consider the model Y = Xβ + ɛ, where the N K matrix of regressors X is fixed, independent of the

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Motivation for multiple regression

Motivation for multiple regression Motivation for multiple regression 1. Simple regression puts all factors other than X in u, and treats them as unobserved. Effectively the simple regression does not account for other factors. 2. The slope

More information

ECON 4551 Econometrics II Memorial University of Newfoundland. Panel Data Models. Adapted from Vera Tabakova s notes

ECON 4551 Econometrics II Memorial University of Newfoundland. Panel Data Models. Adapted from Vera Tabakova s notes ECON 4551 Econometrics II Memorial University of Newfoundland Panel Data Models Adapted from Vera Tabakova s notes 15.1 Grunfeld s Investment Data 15.2 Sets of Regression Equations 15.3 Seemingly Unrelated

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Final Exam. Economics 835: Econometrics. Fall 2010

Final Exam. Economics 835: Econometrics. Fall 2010 Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each

More information

SUR Estimation of Error Components Models With AR(1) Disturbances and Unobserved Endogenous Effects

SUR Estimation of Error Components Models With AR(1) Disturbances and Unobserved Endogenous Effects SUR Estimation of Error Components Models With AR(1) Disturbances and Unobserved Endogenous Effects Peter Egger November 27, 2001 Abstract Thispaperfocussesontheestimationoferrorcomponentsmodels in the

More information

Linear Model Under General Variance

Linear Model Under General Variance Linear Model Under General Variance We have a sample of T random variables y 1, y 2,, y T, satisfying the linear model Y = X β + e, where Y = (y 1,, y T )' is a (T 1) vector of random variables, X = (T

More information

Instrumental Variables, Simultaneous and Systems of Equations

Instrumental Variables, Simultaneous and Systems of Equations Chapter 6 Instrumental Variables, Simultaneous and Systems of Equations 61 Instrumental variables In the linear regression model y i = x iβ + ε i (61) we have been assuming that bf x i and ε i are uncorrelated

More information

Topic 7: Heteroskedasticity

Topic 7: Heteroskedasticity Topic 7: Heteroskedasticity Advanced Econometrics (I Dong Chen School of Economics, Peking University Introduction If the disturbance variance is not constant across observations, the regression is heteroskedastic

More information

Heteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

Heteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Heteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Introduction For pedagogical reasons, OLS is presented initially under strong simplifying assumptions. One of these is homoskedastic errors,

More information

Introduction to Econometrics. Heteroskedasticity

Introduction to Econometrics. Heteroskedasticity Introduction to Econometrics Introduction Heteroskedasticity When the variance of the errors changes across segments of the population, where the segments are determined by different values for the explanatory

More information

Econometrics Master in Business and Quantitative Methods

Econometrics Master in Business and Quantitative Methods Econometrics Master in Business and Quantitative Methods Helena Veiga Universidad Carlos III de Madrid Models with discrete dependent variables and applications of panel data methods in all fields of economics

More information

Prediction in a Generalized Spatial Panel Data Model with Serial Correlation

Prediction in a Generalized Spatial Panel Data Model with Serial Correlation Prediction in a Generalized Spatial Panel Data Model with Serial Correlation Badi H. Baltagi, Long Liu February 4, 05 Abstract This paper considers the generalized spatial panel data model with serial

More information

Hypothesis Testing for Var-Cov Components

Hypothesis Testing for Var-Cov Components Hypothesis Testing for Var-Cov Components When the specification of coefficients as fixed, random or non-randomly varying is considered, a null hypothesis of the form is considered, where Additional output

More information

Cross Sectional Time Series: The Normal Model and Panel Corrected Standard Errors

Cross Sectional Time Series: The Normal Model and Panel Corrected Standard Errors Cross Sectional Time Series: The Normal Model and Panel Corrected Standard Errors Paul Johnson 5th April 2004 The Beck & Katz (APSR 1995) is extremely widely cited and in case you deal

More information

the error term could vary over the observations, in ways that are related

the error term could vary over the observations, in ways that are related Heteroskedasticity We now consider the implications of relaxing the assumption that the conditional variance Var(u i x i ) = σ 2 is common to all observations i = 1,..., n In many applications, we may

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory

More information