No 19/2016 Panel data estimators and aggregation Erik Biørn


MEMORANDUM

No 19/2016

Panel data estimators and aggregation

Erik Biørn

ISSN:

Department of Economics, University of Oslo

This series is published by the University of Oslo, Department of Economics, P. O. Box 1095 Blindern, OSLO, Norway. Telephone: Fax: Internet: econdep@econ.uio.no

In co-operation with The Frisch Centre for Economic Research, Gaustadalleén, OSLO, Norway. Telephone: Fax: Internet: frisch@frisch.uio.no

Last 10 Memoranda

No 18/16 Olav Bjerkholt: Wassily Leontief and the discovery of the input-output approach
No 17/16 Øystein Kravdal: New Evidence about effects of reproductive variables on child mortality in sub-Saharan Africa
No 16/16 Moti Michaeli and Daniel Spiro: The dynamics of revolutions
No 15/16 Geir B. Asheim, Mark Voorneveld and Jörgen W. Weibull: Epistemically robust strategy subsets
No 14/16 Torbjørn Hanson: Estimating output mix effectiveness: A scenario approach
No 13/16 Halvor Mehlum and Kalle Moene: Unequal power and the dynamics of rivalry
No 12/16 Halvor Mehlum: Another model of sales. Price discrimination in a horizontally differentiated duopoly market
No 11/16 Vladimir W. Krivonozhko, Finn R. Førsund and Andrey V. Lychev: Smoothing the frontier in the DEA models
No 10/16 Finn R. Førsund: Pollution Modelling and Multiple-Output Production Theory*
No 09/16 Frikk Nesje and Geir B. Asheim: Intergenerational altruism: A solution to the climate problem?*

Previous issues of the memo-series are available in a PDF format at:

Panel Data Estimators and Aggregation

Erik Biørn

Department of Economics, University of Oslo, P.O. Box 1095 Blindern, 0317 Oslo, Norway

Abstract: For a panel data regression equation with two-way unobserved heterogeneity, individual-specific and period-specific, within-individual and within-period estimators, which can be given Ordinary Least Squares (OLS) or Instrumental Variables (IV) interpretations, are considered. A class of estimators, defined as linear aggregates of these estimators, is introduced. Nine aggregate estimators, including between, within, and Generalized Least Squares (GLS), are special cases. Other estimators are shown to be more robust to simultaneity and measurement error bias than the standard aggregate estimators and more efficient than the disaggregate estimators. Empirical illustrations relating to manufacturing productivity are given.

Keywords: Panel data. Aggregation. IV estimation. Robustness. Method of moments. Factor productivity

JEL classification: C13, C23, C43.

1 Introduction

A primary reason for the substantial growth in the use of panel data during the last decades is the opportunity they give for identifying and controlling for unobserved heterogeneity which may disturb coefficient estimation. It is well known that (i) the potential nuisance created by fixed (additive) individual heterogeneity in OLS estimation can be eliminated by measuring all variables from their individual means or taking individual differences over time, (ii) the potential nuisance created by fixed (additive) time-specific heterogeneity in OLS estimation can be eliminated by measuring all variables from their time-specific means or taking time-specific differences over individuals, and (iii) efficient estimation in the presence of suitably structured random individual- or time-specific heterogeneity can be performed by (Feasible) Generalized Least Squares.

Less attention has been given to the fact that such aggregate estimators can be constructed from disaggregate building-blocks. Approaching estimation in this way is illuminating because regression coefficients can be estimated consistently from parts of a panel data set in numerous ways and because the disaggregate estimators differ in their robustness to bias. By combining an increasing number of individual-specific or period-specific estimators, an increasing part of the observations can be included until, at the limit, the full data set is used. Such approaches are interesting both because several familiar estimators (within, between, generalized least squares etc.) for panel data models can be interpreted as linear combinations of elementary estimators, and because other estimators suggest themselves along the way.

The paper proceeds as follows. Section 2 describes the model and its transformations. Section 3 defines disaggregate within estimators, each interpretable as a micro OLS (Ordinary Least Squares) or IV (Instrumental Variables) estimator. Section 4 defines an estimator class by an arbitrary weighting of the latter, while in Section 5 nine estimators, including three within, two between, three Generalized Least Squares (GLS), and one standard OLS estimator, are shown to be members of this class. The general estimator is shown also to contain members which are more robust to violation of the standard assumptions in random coefficient models. Both a standard regression framework and situations with simultaneity (correlation between individual effects, period effects, and/or disturbances on the one hand and the regressor vector on the other) and situations with measurement errors in the regressor vector are considered. Among the latter estimators we select estimators which are more robust to simultaneity and measurement errors and more efficient than the disaggregate estimators. Finally, Section 6 contains an empirical illustration of robustness and efficiency loss, relating to manufacturing productivity.

2 Model, notation, and transformations

A linear regression model relating $y$ to the $(1 \times K)$-vector $x$, with observations from $N$ individuals and $T$ periods, is

$$y_{it} = k + x_{it}\beta + \epsilon_{it}, \quad \epsilon_{it} = \alpha_i + \gamma_t + u_{it},$$
$$(u_{it}|X) \sim \mathrm{IID}(0, \sigma^2), \quad (\alpha_i|X) \sim \mathrm{IID}(0, \sigma_\alpha^2), \quad (\gamma_t|X) \sim \mathrm{IID}(0, \sigma_\gamma^2), \quad u_{it} \perp \alpha_j \perp \gamma_s,$$
$$i, j = 1, \ldots, N; \; t, s = 1, \ldots, T, \qquad (1)$$

where $y_{it}$ and $x_{it} = (x_{1it}, \ldots, x_{Kit})$ are the values of $y$ and $x$ for individual $i$ in period $t$, $\beta = (\beta_1, \ldots, \beta_K)'$ is the coefficient vector, $\alpha_i$ and $\gamma_t$ are random individual-specific and period-specific effects (which may alternatively be interpreted as fixed, see Section 5), $u_{it}$ is a disturbance, and $k$ is an intercept. At the moment, we make the above standard assumptions for two-way random effects models, which imply

$$E(\epsilon_{it}|X) = 0, \quad E(\epsilon_{it}\epsilon_{js}|X) = \delta_{ij}\sigma_\alpha^2 + \delta_{ts}\sigma_\gamma^2 + \delta_{ij}\delta_{ts}\sigma^2, \quad i, j = 1, \ldots, N, \; t, s = 1, \ldots, T, \qquad (2)$$

where $\delta_{ij} = 1$ for $i = j$ and $= 0$ for $i \neq j$, $\delta_{ts} = 1$ for $t = s$ and $= 0$ for $t \neq s$, and $X$ is the $(NT \times K)$ matrix containing all $x_{it}$s. Let individual-specific and period-specific vectors and matrices be

$$y_{i\cdot} = (y_{i1}, \ldots, y_{iT})', \quad X_{i\cdot} = (x_{i1}', \ldots, x_{iT}')', \quad y_{\cdot t} = (y_{1t}, \ldots, y_{Nt})', \quad X_{\cdot t} = (x_{1t}', \ldots, x_{Nt}')',$$

stacked into

$$y = (y_{1\cdot}', \ldots, y_{N\cdot}')', \quad X = (X_{1\cdot}', \ldots, X_{N\cdot}')', \quad y_* = (y_{\cdot 1}', \ldots, y_{\cdot T}')', \quad X_* = (X_{\cdot 1}', \ldots, X_{\cdot T}')',$$

and let $e_H$ be the $(H \times 1)$ vector of ones, $I_H$ the $H$-dimensional identity matrix, $A_H = e_H e_H'/H$, $B_H = I_H - A_H$, $\alpha = (\alpha_1, \ldots, \alpha_N)'$, and $\gamma = (\gamma_1, \ldots, \gamma_T)'$. Alternative ways of writing the regression equation are

$$y_{i\cdot} = e_T k + X_{i\cdot}\beta + \epsilon_{i\cdot}, \quad \epsilon_{i\cdot} = e_T\alpha_i + \gamma + u_{i\cdot}, \quad i = 1, \ldots, N, \qquad (3)$$
$$y_{\cdot t} = e_N k + X_{\cdot t}\beta + \epsilon_{\cdot t}, \quad \epsilon_{\cdot t} = \alpha + e_N\gamma_t + u_{\cdot t}, \quad t = 1, \ldots, T, \qquad (4)$$

implying

$$y_{i\cdot} - e_T\bar{y} = (X_{i\cdot} - e_T\bar{x})\beta + (\epsilon_{i\cdot} - e_T\bar{\epsilon}), \quad \epsilon_{i\cdot} - e_T\bar{\epsilon} = e_T(\alpha_i - \bar{\alpha}) + B_T\gamma + (u_{i\cdot} - e_T\bar{u}), \qquad (5)$$
$$y_{\cdot t} - e_N\bar{y} = (X_{\cdot t} - e_N\bar{x})\beta + (\epsilon_{\cdot t} - e_N\bar{\epsilon}), \quad \epsilon_{\cdot t} - e_N\bar{\epsilon} = B_N\alpha + e_N(\gamma_t - \bar{\gamma}) + (u_{\cdot t} - e_N\bar{u}), \qquad (6)$$

where $\epsilon_{i\cdot}$, $u_{i\cdot}$, $\epsilon_{\cdot t}$, $u_{\cdot t}$ are defined in a similar way as $y_{i\cdot}$ and $y_{\cdot t}$, $\bar{\alpha} = \sum_i \alpha_i/N$, $\bar{\gamma} = \sum_t \gamma_t/T$, and $\bar{x}$, $\bar{y}$, $\bar{\epsilon}$, $\bar{u}$ are the global means over all $NT$ observations. Premultiplying (3) by $B_T$, (5) by $A_T$, (4) by $B_N$ and (6) by $A_N$ gives, respectively,

$$B_T y_{i\cdot} = B_T X_{i\cdot}\beta + B_T\epsilon_{i\cdot}, \quad A_T(y_{i\cdot} - e_T\bar{y}) = A_T(X_{i\cdot} - e_T\bar{x})\beta + A_T(\epsilon_{i\cdot} - e_T\bar{\epsilon}), \qquad (7)$$
$$B_N y_{\cdot t} = B_N X_{\cdot t}\beta + B_N\epsilon_{\cdot t}, \quad A_N(y_{\cdot t} - e_N\bar{y}) = A_N(X_{\cdot t} - e_N\bar{x})\beta + A_N(\epsilon_{\cdot t} - e_N\bar{\epsilon}). \qquad (8)$$

Symbolizing by $W$, $V$, $B$, and $C$ matrices containing within-individual, within-period, between-individual, and between-period (co)variation, respectively, individual-specific and period-specific cross-product matrices emerge as

$$W_{XXij} = X_{i\cdot}' B_T X_{j\cdot} = \sum_{t=1}^T (x_{it} - \bar{x}_{i\cdot})'(x_{jt} - \bar{x}_{j\cdot}), \quad W_{X\gamma i} = X_{i\cdot}' B_T\gamma = \sum_{t=1}^T (x_{it} - \bar{x}_{i\cdot})'(\gamma_t - \bar{\gamma}), \quad i, j = 1, \ldots, N, \qquad (9)$$
$$V_{XXts} = X_{\cdot t}' B_N X_{\cdot s} = \sum_{i=1}^N (x_{it} - \bar{x}_{\cdot t})'(x_{is} - \bar{x}_{\cdot s}), \quad V_{X\alpha t} = X_{\cdot t}' B_N\alpha = \sum_{i=1}^N (x_{it} - \bar{x}_{\cdot t})'(\alpha_i - \bar{\alpha}), \quad t, s = 1, \ldots, T, \qquad (10)$$

$$B_{XXii} = (X_{i\cdot} - e_T\bar{x})' A_T (X_{i\cdot} - e_T\bar{x}) = T(\bar{x}_{i\cdot} - \bar{x})'(\bar{x}_{i\cdot} - \bar{x}), \quad B_{X\alpha ii} = (X_{i\cdot} - e_T\bar{x})' e_T(\alpha_i - \bar{\alpha}) = T(\bar{x}_{i\cdot} - \bar{x})'(\alpha_i - \bar{\alpha}), \quad i = 1, \ldots, N, \qquad (11)$$
$$C_{XXtt} = (X_{\cdot t} - e_N\bar{x})' A_N (X_{\cdot t} - e_N\bar{x}) = N(\bar{x}_{\cdot t} - \bar{x})'(\bar{x}_{\cdot t} - \bar{x}), \quad C_{X\gamma tt} = (X_{\cdot t} - e_N\bar{x})' e_N(\gamma_t - \bar{\gamma}) = N(\bar{x}_{\cdot t} - \bar{x})'(\gamma_t - \bar{\gamma}), \quad t = 1, \ldots, T, \qquad (12)$$

etc., where $\bar{x}_{i\cdot} = (e_T'/T)X_{i\cdot}$, $\bar{x}_{\cdot t} = (e_N'/N)X_{\cdot t}$, $\bar{x} = (e_{NT}'/(NT))X = (e_{NT}'/(NT))X_*$. We have: $W_{XXij}$, of full rank $K$ if $x_{it}$ contains no individual-specific variables, is the $(K \times K)$ matrix of within-individual covariation in the $x$s of individuals $i$ and $j$, while $V_{XXts}$, of full rank $K$ if $x_{it}$ contains no period-specific variables, is the $(K \times K)$ matrix of within-period covariation in the $x$s of periods $t$ and $s$. $B_{XXii}$ and $C_{XXtt}$, of rank 1, are the matrices of between-individual cross-products and between-period cross-products of the $x$s of individual $i$ and period $t$, respectively. $W_{X\gamma i}$ and $V_{X\alpha t}$ are the vectors of, respectively, the within-covariation of the $x$s of individual $i$ and the period-specific effects, and the within-covariation of the $x$s of period $t$ and the individual-specific effects.

Premultiplying the two equations in (7), the first written for individual $j$, by, respectively, $X_{i\cdot}' B_T$ and $(X_{i\cdot} - e_T\bar{x})' A_T$, and the two equations in (8), the first written for period $s$, by, respectively, $X_{\cdot t}' B_N$ and $(X_{\cdot t} - e_N\bar{x})' A_N$, gives

$$W_{XYij} = W_{XXij}\beta + W_{X\epsilon ij}, \quad W_{X\epsilon ij} = W_{X\gamma i} + W_{XUij}, \quad i, j = 1, \ldots, N, \qquad (13)$$
$$B_{XYii} = B_{XXii}\beta + B_{X\epsilon ii}, \quad B_{X\epsilon ii} = B_{X\alpha ii} + B_{XUii}, \quad i = 1, \ldots, N, \qquad (14)$$
$$V_{XYts} = V_{XXts}\beta + V_{X\epsilon ts}, \quad V_{X\epsilon ts} = V_{X\alpha t} + V_{XUts}, \quad t, s = 1, \ldots, T, \qquad (15)$$
$$C_{XYtt} = C_{XXtt}\beta + C_{X\epsilon tt}, \quad C_{X\epsilon tt} = C_{X\gamma tt} + C_{XUtt}, \quad t = 1, \ldots, T. \qquad (16)$$

3 Base estimators

Since $E(\epsilon_{it}|X) = 0$ implies $E(W_{X\epsilon ij}|X) = E(V_{X\epsilon ts}|X) = 0$, (13) and (15) motivate $N^2$ individual-specific and $T^2$ period-specific estimators of $\beta$, to be denoted as base estimators, or disaggregate estimators:

$$\hat{\beta}_{Wij} = W_{XXij}^{-1} W_{XYij} = (X_{i\cdot}' B_T X_{j\cdot})^{-1}(X_{i\cdot}' B_T y_{j\cdot}), \quad i, j = 1, \ldots, N, \qquad (17)$$
$$\hat{\beta}_{Vts} = V_{XXts}^{-1} V_{XYts} = (X_{\cdot t}' B_N X_{\cdot s})^{-1}(X_{\cdot t}' B_N y_{\cdot s}), \quad t, s = 1, \ldots, T, \qquad (18)$$

so that $\hat{\beta}_{Wii}$ is the OLS estimator based on the time series from individual $i$; $\hat{\beta}_{Wij}$, for $j \neq i$, is the IV estimator which instruments the within variation of individual $j$, $B_T X_{j\cdot}$, by the within variation of individual $i$, $B_T X_{i\cdot}$; $\hat{\beta}_{Vtt}$ is the OLS estimator based on the cross-section from period $t$; $\hat{\beta}_{Vts}$, for $s \neq t$, is the IV estimator which instruments the within variation of period $s$, $B_N X_{\cdot s}$, by the within variation of period $t$, $B_N X_{\cdot t}$.^1

^1 One-regressor versions of these estimators, in a measurement error context, are considered in Biørn (2017, Section 7.2.2).

If individual-specific variables occur, so that $W_{XXij}$ contains one or more zero rows and columns, their coefficient estimates cannot be obtained from (17), but estimators for the other coefficients can be solved from $W_{XXij}\hat{\beta}_{Wij} = W_{XYij}$. Likewise, if period-specific variables occur, so that $V_{XXts}$ contains one or more zero rows and columns, their coefficient estimates cannot be obtained from (18), but estimators for the other coefficients can be solved from $V_{XXts}\hat{\beta}_{Vts} = V_{XYts}$.

Since inserting for $W_{XYij}$ and $V_{XYts}$ from (13) and (15) in (17) and (18) gives

$$\hat{\beta}_{Wij} - \beta = W_{XXij}^{-1} W_{X\epsilon ij} = W_{XXij}^{-1}(W_{X\gamma i} + W_{XUij}), \quad i, j = 1, \ldots, N, \qquad (19)$$
$$\hat{\beta}_{Vts} - \beta = V_{XXts}^{-1} V_{X\epsilon ts} = V_{XXts}^{-1}(V_{X\alpha t} + V_{XUts}), \quad t, s = 1, \ldots, T, \qquad (20)$$

and (1) implies

$$E(W_{XUij}|X) = E(W_{X\gamma i}|X) = 0_{K,1}, \quad i, j = 1, \ldots, N, \qquad (21)$$
$$E(V_{XUts}|X) = E(V_{X\alpha t}|X) = 0_{K,1}, \quad t, s = 1, \ldots, T, \qquad (22)$$

$\hat{\beta}_{Wij}$ and $\hat{\beta}_{Vts}$ are unbiased. Also, $\hat{\beta}_{Wij}$ is $T$-consistent since $\mathrm{plim}(W_{X\epsilon ij}/T) = 0_{K,1}$, provided that $\mathrm{plim}(W_{XXij}/T)$ is non-singular, and $\hat{\beta}_{Vts}$ is $N$-consistent since $\mathrm{plim}(V_{X\epsilon ts}/N) = 0_{K,1}$, provided that $\mathrm{plim}(V_{XXts}/N)$ is non-singular.

Some estimators may be consistent under weaker conditions than (1). The following robustness results hold:

Since (19) does not contain $\alpha$, $\hat{\beta}_{Wij}$ is $T$-consistent if $\alpha_i$ is fixed and unstructured or correlated with $\bar{x}_{i\cdot}$. If $\gamma_t$ is correlated with $\bar{x}_{\cdot t}$, consistency fails.

Symmetrically, since (20) does not contain $\gamma$, $\hat{\beta}_{Vts}$ is $N$-consistent if $\gamma_t$ is fixed and unstructured or correlated with $\bar{x}_{\cdot t}$. If $\alpha_i$ is correlated with $\bar{x}_{i\cdot}$, consistency fails.

Endogeneity of or random measurement error in $x_{it}$ usually violates $E(u_{it}|X) = 0$ and gives $E(x_{it}' u_{it}) \neq 0_{K,1}$, $\mathrm{plim}(W_{XUii}/T) \neq 0_{K,1}$ and $\mathrm{plim}(V_{XUtt}/N) \neq 0_{K,1}$, making the OLS estimators $\hat{\beta}_{Wii}$ and $\hat{\beta}_{Vtt}$ inconsistent, while the IV estimators $\hat{\beta}_{Wij}$ ($j \neq i$) and $\hat{\beta}_{Vts}$ ($s \neq t$) remain $T$-consistent and $N$-consistent, respectively.

In Appendix A it is shown that when (2) holds the matrices of covariances for the base estimators are

$$C(\hat{\beta}_{Wij}, \hat{\beta}_{Wkl}|X) = (\sigma_\gamma^2 + \delta_{jl}\sigma^2) W_{XXij}^{-1} W_{XXik} W_{XXlk}^{-1}, \quad i, j, k, l = 1, \ldots, N, \qquad (23)$$
$$C(\hat{\beta}_{Vts}, \hat{\beta}_{Vpq}|X) = (\sigma_\alpha^2 + \delta_{sq}\sigma^2) V_{XXts}^{-1} V_{XXtp} V_{XXqp}^{-1}, \quad t, s, p, q = 1, \ldots, T. \qquad (24)$$

For $(k, l) = (i, j)$ and $(p, q) = (t, s)$, the variance-covariance matrices emerge as

$$V(\hat{\beta}_{Wij}|X) = (\sigma_\gamma^2 + \sigma^2) W_{XXij}^{-1} W_{XXii} W_{XXji}^{-1}, \quad i, j = 1, \ldots, N, \qquad (25)$$
$$V(\hat{\beta}_{Vts}|X) = (\sigma_\alpha^2 + \sigma^2) V_{XXts}^{-1} V_{XXtt} V_{XXst}^{-1}, \quad t, s = 1, \ldots, T, \qquad (26)$$

from which it follows that $\hat{\beta}_{Wjj}$ and $\hat{\beta}_{Vss}$ are always superior to $\hat{\beta}_{Wij}$ ($j \neq i$) and $\hat{\beta}_{Vts}$ ($s \neq t$), respectively, as $V(\hat{\beta}_{Wij}|X) - V(\hat{\beta}_{Wjj}|X)$ ($i \neq j$) and $V(\hat{\beta}_{Vts}|X) - V(\hat{\beta}_{Vss}|X)$ ($t \neq s$) are positive definite. We have:

$$V(\hat{\beta}_{Wij}|X) - V(\hat{\beta}_{Wjj}|X) = (\sigma_\gamma^2 + \sigma^2)(W_{XXij}^{-1} W_{XXii} W_{XXji}^{-1} - W_{XXjj}^{-1}) \equiv (\sigma_\gamma^2 + \sigma^2)(A_{WXij}^{-1} A_{WXji}^{-1} - I_K) W_{XXjj}^{-1},$$
$$V(\hat{\beta}_{Vts}|X) - V(\hat{\beta}_{Vss}|X) = (\sigma_\alpha^2 + \sigma^2)(V_{XXts}^{-1} V_{XXtt} V_{XXst}^{-1} - V_{XXss}^{-1}) \equiv (\sigma_\alpha^2 + \sigma^2)(A_{VXts}^{-1} A_{VXst}^{-1} - I_K) V_{XXss}^{-1},$$

where

$$A_{WXij} = W_{XXii}^{-1} W_{XXij}, \quad A_{VXts} = V_{XXtt}^{-1} V_{XXts}.$$

The latter are the matrices of regression coefficients when regressing the $j$-specific block of $X$, $X_{j\cdot}$, on the $i$-specific block, $X_{i\cdot}$, and when regressing the $s$-specific block of $X_*$, $X_{\cdot s}$, on the $t$-specific block, $X_{\cdot t}$, respectively, and $(A_{WXij}^{-1} A_{WXji}^{-1} - I_K)$, $j \neq i$, and $(A_{VXts}^{-1} A_{VXst}^{-1} - I_K)$, $s \neq t$, are positive definite when all regressors are two-dimensional.

The structure is transparent in the one-regressor case ($K = 1$), (23)–(26) reducing to

$$C(\hat{\beta}_{Wij}, \hat{\beta}_{Wkl}|X) = (\sigma_\gamma^2 + \delta_{jl}\sigma^2)\frac{W_{XXik}}{W_{XXij} W_{XXkl}}, \quad V(\hat{\beta}_{Wij}|X) = (\sigma_\gamma^2 + \sigma^2)\frac{W_{XXii}}{W_{XXij}^2}, \qquad (27)$$
$$C(\hat{\beta}_{Vts}, \hat{\beta}_{Vpq}|X) = (\sigma_\alpha^2 + \delta_{sq}\sigma^2)\frac{V_{XXtp}}{V_{XXts} V_{XXpq}}, \quad V(\hat{\beta}_{Vts}|X) = (\sigma_\alpha^2 + \sigma^2)\frac{V_{XXtt}}{V_{XXts}^2}, \qquad (28)$$

where $W_{XXik}$, $\hat{\beta}_{Wij}$, etc. are now scalars, the counterparts to the matrices and vectors above. The coefficients of correlation between two arbitrary individual-specific and between two arbitrary period-specific base estimators can therefore be written as, respectively,

$$\rho(\hat{\beta}_{Wij}, \hat{\beta}_{Wkl}|X) \equiv \frac{C(\hat{\beta}_{Wij}, \hat{\beta}_{Wkl}|X)}{[V(\hat{\beta}_{Wij}|X) V(\hat{\beta}_{Wkl}|X)]^{1/2}} = \frac{\sigma_\gamma^2 + \delta_{jl}\sigma^2}{\sigma_\gamma^2 + \sigma^2} \frac{W_{XXik}}{(W_{XXii} W_{XXkk})^{1/2}} = \rho(\epsilon_{jt}, \epsilon_{lt}) R_{WXik}, \qquad (29)$$
$$\rho(\hat{\beta}_{Vts}, \hat{\beta}_{Vpq}|X) \equiv \frac{C(\hat{\beta}_{Vts}, \hat{\beta}_{Vpq}|X)}{[V(\hat{\beta}_{Vts}|X) V(\hat{\beta}_{Vpq}|X)]^{1/2}} = \frac{\sigma_\alpha^2 + \delta_{sq}\sigma^2}{\sigma_\alpha^2 + \sigma^2} \frac{V_{XXtp}}{(V_{XXtt} V_{XXpp})^{1/2}} = \rho(\epsilon_{is}, \epsilon_{iq}) R_{VXtp}, \qquad (30)$$

where $R_{WXik} = W_{XXik}/(W_{XXii} W_{XXkk})^{1/2}$ is the empirical coefficient of correlation between the $x$s of individuals $i$ and $k$; $R_{VXtp} = V_{XXtp}/(V_{XXtt} V_{XXpp})^{1/2}$ is the coefficient of correlation between the $x$s in periods $t$ and $p$; $\rho(\epsilon_{jt}, \epsilon_{lt}) = (\sigma_\gamma^2 + \delta_{jl}\sigma^2)/(\sigma_\gamma^2 + \sigma^2)$; and $\rho(\epsilon_{is}, \epsilon_{iq}) = (\sigma_\alpha^2 + \delta_{sq}\sigma^2)/(\sigma_\alpha^2 + \sigma^2)$.

Therefore, considering (3) as an $N$-equation model with one equation per individual and common coefficient, $\rho(\hat{\beta}_{Wij}, \hat{\beta}_{Wkl}|X)$ emerges as the product of the coefficient of correlation between two $\epsilon$s from individuals (equations) $j$ and $l$ in the same period, and the coefficient of correlation between the regressor (instrument) for individuals (equations) $i$ and $k$. Likewise, considering (4) as a $T$-equation model with one equation per period and common coefficient, $\rho(\hat{\beta}_{Vts}, \hat{\beta}_{Vpq}|X)$ emerges as the product of the coefficient of correlation between two $\epsilon$s from periods (equations) $s$ and $q$ for the same individual, and the coefficient of correlation between the values of the regressor (instrument) in periods $t$ and $p$. Hence, $\rho(\hat{\beta}_{Wij}, \hat{\beta}_{Wkl}|X)$ has one equation-specific component ($j$ vs. $l$) and one instrument-specific component ($i$ vs. $k$), while $\rho(\hat{\beta}_{Vts}, \hat{\beta}_{Vpq}|X)$ has one equation-specific component ($s$ vs. $q$) and one instrument-specific component ($t$ vs. $p$). For $j = l$ (same equation/individual) and $i = k$ (same IV), (29) gives, respectively,

$$\rho(\hat{\beta}_{Wij}, \hat{\beta}_{Wkj}|X) = R_{WXik}, \quad \rho(\hat{\beta}_{Wij}, \hat{\beta}_{Wil}|X) = \frac{\sigma_\gamma^2}{\sigma_\gamma^2 + \sigma^2}, \quad i \neq k, \; j \neq l,$$

and for $s = q$ (same equation/period) and $t = p$ (same IV), (30) gives, respectively,

$$\rho(\hat{\beta}_{Vts}, \hat{\beta}_{Vps}|X) = R_{VXtp}, \quad \rho(\hat{\beta}_{Vts}, \hat{\beta}_{Vtq}|X) = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma^2}, \quad t \neq p, \; s \neq q.$$

From (27) and (28) it follows that the inefficiency when instrumenting the (within) variation of individual $j$ by the (within) variation of individual $i$ relative to performing OLS on the observations from individual $j$, and when instrumenting the (within) variation of period $s$ by the (within) variation of period $t$ relative to performing OLS on the observations from period $s$, can be expressed simply as, respectively,

$$\frac{V(\hat{\beta}_{Wij}|X)}{V(\hat{\beta}_{Wjj}|X)} = \frac{1}{A_{WXij} A_{WXji}} = \frac{1}{R_{WXij}^2}, \quad \frac{V(\hat{\beta}_{Vts}|X)}{V(\hat{\beta}_{Vss}|X)} = \frac{1}{A_{VXts} A_{VXst}} = \frac{1}{R_{VXts}^2}.$$

Hence, $R_{WXij}^2$ and $R_{VXts}^2$ measure the efficiency loss when using estimators that are robust to inconsistency caused by simultaneity or random measurement error in the regressor, respectively, (i) in a relationship for individual $j$ using as IV observations from another individual, $i$, relative to using OLS, and (ii) in a relationship for period $s$ using as IV observations from another period, $t$, relative to using OLS.

4 A class of moment estimators

Since each base estimator $\hat{\beta}_{Wij}$ and $\hat{\beta}_{Vts}$ uses only a minor part of the panel data set, they are rarely real competitors to estimators utilizing the complete data set when (1) is valid. And even if correlation between $x_{it}$ and $u_{it}$, between $\bar{x}_{i\cdot}$ and $\alpha_i$, or between $\bar{x}_{\cdot t}$ and $\gamma_t$ is allowed for, consistent aggregate estimators which are more efficient than any of the IV estimators $\hat{\beta}_{Wij}$ ($j \neq i$) and $\hat{\beta}_{Vts}$ ($s \neq t$) may exist. Yet, the insight provided by examining the base estimators is useful when constructing composite estimators of $\beta$, of which they can serve as building-blocks.
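The base estimators (17)–(18), their robustness to the "other" type of heterogeneity, and the one-regressor efficiency ratio $1/R_{WXij}^2$ can be illustrated with a small NumPy sketch. It is not taken from the paper: the array layout (`X` of shape N×T×K, `y` of shape N×T), the function names `centering`, `beta_W`, `beta_V`, and the simulated noise-free data are all assumptions made for the illustration.

```python
import numpy as np

def centering(H):
    """B_H = I_H - e_H e_H'/H: subtracts the mean of an H-vector."""
    return np.eye(H) - np.ones((H, H)) / H

def beta_W(X, y, i, j):
    """Eq. (17): within variation of individual i instruments individual j."""
    B_T = centering(X.shape[1])
    W_xx = X[i].T @ B_T @ X[j]        # W_XXij = X_i' B_T X_j, (K x K)
    W_xy = X[i].T @ B_T @ y[j]        # W_XYij = X_i' B_T y_j, (K,)
    return np.linalg.solve(W_xx, W_xy)

def beta_V(X, y, t, s):
    """Eq. (18): within variation of period t instruments period s."""
    B_N = centering(X.shape[0])
    V_xx = X[:, t].T @ B_N @ X[:, s]  # V_XXts = X_t' B_N X_s
    V_xy = X[:, t].T @ B_N @ y[:, s]  # V_XYts = X_t' B_N y_s
    return np.linalg.solve(V_xx, V_xy)

# Hypothetical, noise-free data (u_it = 0), so that estimates are exact.
rng = np.random.default_rng(0)
N, T, K = 6, 8, 2
beta = np.array([0.5, -1.0])
X = rng.normal(size=(N, T, K))
alpha = rng.normal(size=N)            # individual effects
gamma = rng.normal(size=T)            # period effects

y_alpha = 2.0 + X @ beta + alpha[:, None]   # individual effects only
y_gamma = 2.0 + X @ beta + gamma[None, :]   # period effects only

b_W01 = beta_W(X, y_alpha, 0, 1)   # alpha_i is swept out by B_T
b_V01 = beta_V(X, y_gamma, 0, 1)   # gamma_t is swept out by B_N

# One-regressor case: variance ratio V(beta_W01)/V(beta_W11) = 1/R^2, eq. (27).
x = X[:, :, 0]
xw = x - x.mean(axis=1, keepdims=True)
W = xw @ xw.T                         # W[i, j] is the scalar W_XXij
ratio = W[0, 0] * W[1, 1] / W[0, 1] ** 2
R2 = np.corrcoef(x[0], x[1])[0, 1] ** 2
```

With no disturbance, `b_W01` and `b_V01` both reproduce `beta` up to rounding, whatever values `alpha` and `gamma` take, and `ratio` coincides with `1 / R2`. A weakly correlated pair of individuals therefore gives a large variance ratio, which is the weak-instrument problem described above.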
In order to explore this, we define a class of estimators of $\beta$ by weighting the individual-specific or period-specific (co)variation in $X$ and $y$. Let $\phi = (\phi_{ts})$ be a $(T \times T)$ matrix and $\psi = (\psi_{ij})$ an $(N \times N)$ matrix of (positive, zero or negative) weights and define a general moment estimator as

$$b = b(\phi, \psi) = \Big(\sum_{t=1}^T\sum_{s=1}^T \phi_{ts} V_{XXts} + \sum_{i=1}^N\sum_{j=1}^N \psi_{ij} W_{XXij}\Big)^{-1}\Big(\sum_{t=1}^T\sum_{s=1}^T \phi_{ts} V_{XYts} + \sum_{i=1}^N\sum_{j=1}^N \psi_{ij} W_{XYij}\Big)$$
$$\equiv \Big(\sum_{t=1}^T\sum_{s=1}^T \phi_{ts} V_{XXts} + \sum_{i=1}^N\sum_{j=1}^N \psi_{ij} W_{XXij}\Big)^{-1}\Big(\sum_{t=1}^T\sum_{s=1}^T \phi_{ts} V_{XXts}\hat{\beta}_{Vts} + \sum_{i=1}^N\sum_{j=1}^N \psi_{ij} W_{XXij}\hat{\beta}_{Wij}\Big), \qquad (31)$$

or, in simplified notation,

$$b = \sum_{t=1}^T\sum_{s=1}^T G_{Vts}\hat{\beta}_{Vts} + \sum_{i=1}^N\sum_{j=1}^N G_{Wij}\hat{\beta}_{Wij}, \qquad (32)$$

involving weighting matrices $G_{Vts}$, $G_{Wij}$, with $\sum_t\sum_s G_{Vts} + \sum_i\sum_j G_{Wij} = I_K$, given by

$$G_{Vts} = Q^{-1}\phi_{ts} V_{XXts}, \; t, s = 1, \ldots, T, \quad G_{Wij} = Q^{-1}\psi_{ij} W_{XXij}, \; i, j = 1, \ldots, N,$$
$$Q = Q(\phi, \psi) = \sum_{t=1}^T\sum_{s=1}^T \phi_{ts} V_{XXts} + \sum_{i=1}^N\sum_{j=1}^N \psi_{ij} W_{XXij}. \qquad (33)$$

When (1) holds, $b$ is unbiased for any $\phi$ and $\psi$. In Appendix B it is shown that its variance-covariance matrix is^2

$$V(b|X) = Q^{-1} P (Q^{-1})' = Q(\phi, \psi)^{-1} P(\phi, \psi, \sigma^2, \sigma_\alpha^2, \sigma_\gamma^2)(Q(\phi, \psi)^{-1})', \qquad (34)$$

where

$$P = P(\phi, \psi, \sigma^2, \sigma_\alpha^2, \sigma_\gamma^2) = \sigma^2(S_V + S_W + S_{VW}) + \sigma_\alpha^2 Z_V + \sigma_\gamma^2 Z_W, \qquad (35)$$

with

$$S_V = S_V(\phi) = \sum_{t=1}^T\sum_{p=1}^T V_{XXtp}\Big(\sum_{s=1}^T \phi_{ts}\phi_{ps}\Big),$$
$$S_W = S_W(\psi) = \sum_{i=1}^N\sum_{k=1}^N W_{XXik}\Big(\sum_{j=1}^N \psi_{ij}\psi_{kj}\Big),$$
$$S_{VW} = S_{VW}(\phi, \psi) = \sum_{t=1}^T\sum_{s=1}^T\sum_{i=1}^N\sum_{j=1}^N \phi_{ts}\psi_{ij}(x_{is} - \bar{x}_{i\cdot})'(x_{jt} - \bar{x}_{\cdot t}), \qquad (36)$$
$$Z_V = Z_V(\phi) = \sum_{t=1}^T\sum_{p=1}^T V_{XXtp}\Big(\sum_{s=1}^T \phi_{ts}\Big)\Big(\sum_{r=1}^T \phi_{pr}\Big),$$
$$Z_W = Z_W(\psi) = \sum_{i=1}^N\sum_{k=1}^N W_{XXik}\Big(\sum_{j=1}^N \psi_{ij}\Big)\Big(\sum_{l=1}^N \psi_{kl}\Big).$$

^2 This specializes to the formula in Biørn (1994, Appendix A) when $K = 1$, $\sigma_\gamma^2 = 0$.

If either $\phi_{ts} = \phi$ for all $t, s$ or $\psi_{ij} = \psi$ for all $i, j$, then $S_{VW} = 0$, while $Z_V = 0$ if $\sum_{s=1}^T \phi_{ts} = 0$ for all $t$, and $Z_W = 0$ if $\sum_{j=1}^N \psi_{ij} = 0$ for all $i$. The standard estimators in fixed and random effects models satisfy at least one of these restrictions, which will be shown below. From (34)–(36), $V(b|X)$ can be estimated consistently for any chosen weighting matrices $\phi$ and $\psi$ when consistent estimators of the variances $\sigma^2$, $\sigma_\alpha^2$, and $\sigma_\gamma^2$ are available.

5 Selected aggregate estimators

The estimator $b$ contains several familiar estimators for fixed effects models. We first describe the weighting system $(\phi, \psi)$ for six such estimators and other, less familiar estimators whose consistency is more robust to violation of the basic assumptions.^3

^3 The results below generalize those in Biørn (1994, section 3), where only one regressor is included ($K = 1$) and period-specific effects are disregarded ($\gamma_t = 0$).

Let the matrices of overall within-individual and within-period (co)variation be

$$W_{XX} = \sum_{i=1}^N W_{XXii} = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_{i\cdot})'(x_{it} - \bar{x}_{i\cdot}), \qquad (37)$$
$$V_{XX} = \sum_{t=1}^T V_{XXtt} = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_{\cdot t})'(x_{it} - \bar{x}_{\cdot t}), \qquad (38)$$

etc. The corresponding overall between-individual and between-period (co)variation are

$$B_{XX} = \sum_{i=1}^N B_{XXii} = T\sum_{i=1}^N (\bar{x}_{i\cdot} - \bar{x})'(\bar{x}_{i\cdot} - \bar{x}) = (1/T)\sum_{t=1}^T\sum_{s=1}^T V_{XXts}, \qquad (39)$$
$$C_{XX} = \sum_{t=1}^T C_{XXtt} = N\sum_{t=1}^T (\bar{x}_{\cdot t} - \bar{x})'(\bar{x}_{\cdot t} - \bar{x}) = (1/N)\sum_{i=1}^N\sum_{j=1}^N W_{XXij}, \qquad (40)$$

etc., where the last equalities are shown in Appendix C. The matrix of overall (co)variation and its decomposition is

$$G_{XX} = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x})'(x_{it} - \bar{x}) = W_{XX} + B_{XX} = V_{XX} + C_{XX}$$
$$\equiv \sum_{i=1}^N W_{XXii} + (1/T)\sum_{t=1}^T\sum_{s=1}^T V_{XXts} \equiv \sum_{t=1}^T V_{XXtt} + (1/N)\sum_{i=1}^N\sum_{j=1}^N W_{XXij}. \qquad (41)$$

Finally, the matrix of residual (co)variation, i.e., the (co)variation which remains when all (co)variation between individuals and between periods is eliminated, the combined within-individual-and-period (co)variation, is

$$R_{XX} = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x})'(x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}) = G_{XX} - B_{XX} - C_{XX}$$
$$\equiv \sum_{i=1}^N \Big(W_{XXii} - (1/N)\sum_{j=1}^N W_{XXij}\Big) \equiv \sum_{t=1}^T \Big(V_{XXtt} - (1/T)\sum_{s=1}^T V_{XXts}\Big). \qquad (42)$$

We notice that $G_{XX}$ and $R_{XX}$ can be expressed in terms of the $W_{XXij}$s and the $V_{XXts}$s in two ways.

Combining the decompositions exemplified in (37)–(40) with (17)–(18), we can now express the familiar within-individual, within-period, between-individual, and between-period estimators of $\beta$ as the following aggregates:

$$\hat{\beta}_W = W_{XX}^{-1} W_{XY} = \Big(\sum_{i=1}^N W_{XXii}\Big)^{-1}\Big(\sum_{i=1}^N W_{XXii}\hat{\beta}_{Wii}\Big), \qquad (43)$$
$$\hat{\beta}_V = V_{XX}^{-1} V_{XY} = \Big(\sum_{t=1}^T V_{XXtt}\Big)^{-1}\Big(\sum_{t=1}^T V_{XXtt}\hat{\beta}_{Vtt}\Big), \qquad (44)$$
$$\hat{\beta}_B = B_{XX}^{-1} B_{XY} = \Big(\sum_{t=1}^T\sum_{s=1}^T V_{XXts}\Big)^{-1}\Big(\sum_{t=1}^T\sum_{s=1}^T V_{XXts}\hat{\beta}_{Vts}\Big), \qquad (45)$$
$$\hat{\beta}_C = C_{XX}^{-1} C_{XY} = \Big(\sum_{i=1}^N\sum_{j=1}^N W_{XXij}\Big)^{-1}\Big(\sum_{i=1}^N\sum_{j=1}^N W_{XXij}\hat{\beta}_{Wij}\Big). \qquad (46)$$

We know that $\hat{\beta}_W$ and $\hat{\beta}_V$ are the MVLUE (Minimum Variance Linear Unbiased Estimator) in the cases with only fixed individual-specific and with only fixed period-specific effects, respectively, and that $\hat{\beta}_B$ and $\hat{\beta}_C$ are obtained by OLS estimation of equations in individual-specific and in period-specific means, respectively. Among these, $\hat{\beta}_W$ and $\hat{\beta}_C$ utilize the (co)variation across periods and disregard the (co)variation across individuals, while $\hat{\beta}_V$ and $\hat{\beta}_B$ utilize the (co)variation across individuals and disregard the (co)variation across periods. Hence, $\hat{\beta}_W$ and $\hat{\beta}_C$ may be said to relate to time-series analysis and $\hat{\beta}_V$ and $\hat{\beta}_B$ to cross-section analysis.

Reconsider, with this in mind, the global (standard OLS) (G) and the residual (R) estimators. Both can be written as aggregates, as either

$$\hat{\beta}_G = G_{XX}^{-1} G_{XY} \equiv (B_{XX} + C_{XX} + R_{XX})^{-1}(B_{XY} + C_{XY} + R_{XY})$$
$$= \Big(\sum_{i=1}^N W_{XXii} + (1/T)\sum_{t=1}^T\sum_{s=1}^T V_{XXts}\Big)^{-1}\Big(\sum_{i=1}^N W_{XXii}\hat{\beta}_{Wii} + (1/T)\sum_{t=1}^T\sum_{s=1}^T V_{XXts}\hat{\beta}_{Vts}\Big), \qquad (47)$$
$$\hat{\beta}_R = R_{XX}^{-1} R_{XY} = \Big[\sum_{i=1}^N \Big(W_{XXii} - (1/N)\sum_{j=1}^N W_{XXij}\Big)\Big]^{-1}\Big[\sum_{i=1}^N \Big(W_{XXii}\hat{\beta}_{Wii} - (1/N)\sum_{j=1}^N W_{XXij}\hat{\beta}_{Wij}\Big)\Big], \qquad (48)$$

or

$$\hat{\beta}_G = \Big(\sum_{t=1}^T V_{XXtt} + (1/N)\sum_{i=1}^N\sum_{j=1}^N W_{XXij}\Big)^{-1}\Big(\sum_{t=1}^T V_{XXtt}\hat{\beta}_{Vtt} + (1/N)\sum_{i=1}^N\sum_{j=1}^N W_{XXij}\hat{\beta}_{Wij}\Big), \qquad (49)$$
$$\hat{\beta}_R = \Big[\sum_{t=1}^T \Big(V_{XXtt} - (1/T)\sum_{s=1}^T V_{XXts}\Big)\Big]^{-1}\Big[\sum_{t=1}^T \Big(V_{XXtt}\hat{\beta}_{Vtt} - (1/T)\sum_{s=1}^T V_{XXts}\hat{\beta}_{Vts}\Big)\Big], \qquad (50)$$

which follow from (17)–(18) and (41)–(42). While $\hat{\beta}_G$ is the MVLUE of $\beta$ in the absence of individual or period-specific heterogeneity, $\hat{\beta}_R$ has this property when all $\alpha_i$s and $\gamma_t$s are interpreted as unknown constants (both fixed individual and period-specific effects).^4

Briefly, (43)–(50) show that all the six standard aggregate estimators for fixed effects models belong to the class (31) and can be interpreted as follows:

The within-individual estimator $\hat{\beta}_W$ and the between-period estimator $\hat{\beta}_C$ are matrix weighted averages of the individual-specific estimators $\hat{\beta}_{Wij}$, the former utilizing only the $N$ individual-specific OLS estimators, the latter also the $N(N-1)$ individual-specific IV estimators.

The within-period estimator $\hat{\beta}_V$ and the between-individual estimator $\hat{\beta}_B$ are matrix weighted averages of the period-specific estimators $\hat{\beta}_{Vts}$, the former utilizing only the $T$ period-specific OLS estimators, the latter also the $T(T-1)$ period-specific IV estimators.

The residual estimator $\hat{\beta}_R$ is a matrix weighted average of either all the $N^2$ individual-specific estimators or all the $T^2$ period-specific estimators.

The global OLS estimator $\hat{\beta}_G$ is a matrix weighted average of either (a) all the $N$ individual-specific OLS estimators, all the $T$ period-specific OLS estimators, and all the $T(T-1)$ period-specific within-period IV estimators, or (b) all the $T$ period-specific OLS estimators, all the $N$ individual-specific OLS estimators, and all the $N(N-1)$ individual-specific within-individual IV estimators.

Table 1, panel A summarizes the weights. Compactly,

$$\hat{\beta}_R = b(B_T, 0_{N,N}) = b(0_{T,T}, B_N), \quad \hat{\beta}_B = b(A_T, 0_{N,N}), \quad \hat{\beta}_C = b(0_{T,T}, A_N),$$
$$\hat{\beta}_W = b(B_T, A_N) = b(0_{T,T}, I_N), \quad \hat{\beta}_V = b(A_T, B_N) = b(I_T, 0_{N,N}), \quad \hat{\beta}_G = b(I_T, A_N) = b(A_T, I_N).$$

For the total, residual, and within estimators the weights occur in two versions.
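The compact identities above can be checked numerically. The sketch below is an illustration only: the function name `b_moment`, the array layout (`X` of shape N×T×K, `y` of shape N×T) and the simulated data are assumptions, and the class (31) is implemented by summing the weighted cross-product matrices directly rather than by forming every base estimator.

```python
import numpy as np

def A(H):
    """A_H = e_H e_H'/H (averaging matrix)."""
    return np.ones((H, H)) / H

def B(H):
    """B_H = I_H - A_H (centering matrix)."""
    return np.eye(H) - A(H)

def b_moment(X, y, phi, psi):
    """General moment estimator b(phi, psi) of eq. (31)."""
    Xw = X - X.mean(axis=1, keepdims=True)   # x_it - xbar_i. (within individual)
    yw = y - y.mean(axis=1, keepdims=True)
    Xv = X - X.mean(axis=0, keepdims=True)   # x_it - xbar_.t (within period)
    yv = y - y.mean(axis=0, keepdims=True)
    # Q = sum_ts phi_ts V_XXts + sum_ij psi_ij W_XXij
    Q = (np.einsum('ts,itk,isl->kl', phi, Xv, Xv)
         + np.einsum('ij,itk,jtl->kl', psi, Xw, Xw))
    # q = sum_ts phi_ts V_XYts + sum_ij psi_ij W_XYij
    q = (np.einsum('ts,itk,is->k', phi, Xv, yv)
         + np.einsum('ij,itk,jt->k', psi, Xw, yw))
    return np.linalg.solve(Q, q)

def ols_on_deviations(dX, dy):
    """OLS through the origin on already-transformed data."""
    dX = dX.reshape(-1, dX.shape[-1])
    dy = dy.reshape(-1)
    return np.linalg.solve(dX.T @ dX, dX.T @ dy)

# Hypothetical two-way error-components data.
rng = np.random.default_rng(1)
N, T, K = 5, 4, 2
X = rng.normal(size=(N, T, K))
y = (1.0 + X @ np.array([0.8, -0.3]) + rng.normal(size=(N, 1))
     + rng.normal(size=(1, T)) + rng.normal(size=(N, T)))

xi, yi = X.mean(axis=1, keepdims=True), y.mean(axis=1, keepdims=True)
xt, yt = X.mean(axis=0, keepdims=True), y.mean(axis=0, keepdims=True)
xg, yg = X.mean(axis=(0, 1), keepdims=True), y.mean()

b_G = ols_on_deviations(X - xg, y - yg)                       # global OLS
b_W = ols_on_deviations(X - xi, y - yi)                       # within individual
b_B = ols_on_deviations(xi - xg, yi - yg)                     # between individual
b_R = ols_on_deviations(X - xi - xt + xg, y - yi - yt + yg)   # residual

Z_T, Z_N = np.zeros((T, T)), np.zeros((N, N))
```

Under these assumptions `b_moment(X, y, np.eye(T), A(N))` and `b_moment(X, y, A(T), np.eye(N))` both return `b_G`, `b_moment(X, y, B(T), Z_N)` and `b_moment(X, y, Z_T, B(N))` both return `b_R`, and similarly for the within and between estimators, in line with the two versions of the weights noted above.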
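We obtain their variance-covariance matrices when the random effects specification (1) is valid by inserting the weights in Table 1, panel A, into (34)–(36), using (37)–(42). The results are summarized in panel B. Compactly,

$$V(\hat{\beta}_R|X) = \sigma^2 R_{XX}^{-1}, \quad V(\hat{\beta}_B|X) = (\sigma^2 + T\sigma_\alpha^2) B_{XX}^{-1}, \quad V(\hat{\beta}_C|X) = (\sigma^2 + N\sigma_\gamma^2) C_{XX}^{-1},$$
$$V(\hat{\beta}_W|X) = (R_{XX} + C_{XX})^{-1}[\sigma^2 R_{XX} + (\sigma^2 + N\sigma_\gamma^2) C_{XX}](R_{XX} + C_{XX})^{-1},$$
$$V(\hat{\beta}_V|X) = (R_{XX} + B_{XX})^{-1}[\sigma^2 R_{XX} + (\sigma^2 + T\sigma_\alpha^2) B_{XX}](R_{XX} + B_{XX})^{-1},$$
$$V(\hat{\beta}_G|X) = G_{XX}^{-1}[\sigma^2 R_{XX} + (\sigma^2 + T\sigma_\alpha^2) B_{XX} + (\sigma^2 + N\sigma_\gamma^2) C_{XX}] G_{XX}^{-1}.$$

^4 Equations (43)–(46), (48) and (50) generalize one-regressor counterparts in Biørn (2017, Section 7.2.3).

Table 1: The General Moment Estimator (31)

A: Weights $\phi_{ts}$ and $\psi_{ij}$ for selected aggregate estimators

| Estimator | $\phi_{tt}$ | $\phi_{ts}$, $s \neq t$ | $\psi_{ii}$ | $\psi_{ij}$, $j \neq i$ | $\phi$ | $\psi$ |
|---|---|---|---|---|---|---|
| $\hat{\beta}_R$ | $1 - 1/T$ | $-1/T$ | $0$ | $0$ | $B_T$ | $0_{N,N}$ |
| $\hat{\beta}_R$ | $0$ | $0$ | $1 - 1/N$ | $-1/N$ | $0_{T,T}$ | $B_N$ |
| $\hat{\beta}_B$ | $1/T$ | $1/T$ | $0$ | $0$ | $A_T$ | $0_{N,N}$ |
| $\hat{\beta}_C$ | $0$ | $0$ | $1/N$ | $1/N$ | $0_{T,T}$ | $A_N$ |
| $\hat{\beta}_W$ | $1 - 1/T$ | $-1/T$ | $1/N$ | $1/N$ | $B_T$ | $A_N$ |
| $\hat{\beta}_W$ | $0$ | $0$ | $1$ | $0$ | $0_{T,T}$ | $I_N$ |
| $\hat{\beta}_V$ | $1/T$ | $1/T$ | $1 - 1/N$ | $-1/N$ | $A_T$ | $B_N$ |
| $\hat{\beta}_V$ | $1$ | $0$ | $0$ | $0$ | $I_T$ | $0_{N,N}$ |
| $\hat{\beta}_G$ | $1$ | $0$ | $1/N$ | $1/N$ | $I_T$ | $A_N$ |
| $\hat{\beta}_G$ | $1/T$ | $1/T$ | $1$ | $0$ | $A_T$ | $I_N$ |

B: Covariance matrices: values of $S_V + S_W$, $Z_V$, $Z_W$, $Q$ ($S_{VW} = 0$)

| Estimator | $S_V + S_W$ | $Z_V$ | $Z_W$ | $Q$ |
|---|---|---|---|---|
| $\hat{\beta}_R$ | $R_{XX}$ | $0$ | $0$ | $R_{XX}$ |
| $\hat{\beta}_B$ | $B_{XX}$ | $T B_{XX}$ | $0$ | $B_{XX}$ |
| $\hat{\beta}_C$ | $C_{XX}$ | $0$ | $N C_{XX}$ | $C_{XX}$ |
| $\hat{\beta}_W$ | $C_{XX} + R_{XX}$ | $0$ | $N C_{XX}$ | $C_{XX} + R_{XX}$ |
| $\hat{\beta}_V$ | $B_{XX} + R_{XX}$ | $T B_{XX}$ | $0$ | $B_{XX} + R_{XX}$ |
| $\hat{\beta}_G$ | $G_{XX}$ | $T B_{XX}$ | $N C_{XX}$ | $G_{XX}$ |

Next reconsider the GLS estimator of $\beta$, which is the MVLUE in (1). Consider first

$$\hat{\beta} = \hat{\beta}(\mu_B, \mu_C, \mu_R) = (\mu_B B_{XX} + \mu_C C_{XX} + \mu_R R_{XX})^{-1}(\mu_B B_{XY} + \mu_C C_{XY} + \mu_R R_{XY}), \qquad (51)$$

where $(\mu_B, \mu_C, \mu_R)$ are scalar constants. Using the decompositions exemplified by (39), (40), and (42), it can be expressed in the (31) format as either

$$\hat{\beta} = \Big[\mu_B \sum_{t=1}^T\sum_{s=1}^T V_{XXts}/T + \mu_R \sum_{i=1}^N W_{XXii} + (\mu_C - \mu_R)\sum_{i=1}^N\sum_{j=1}^N W_{XXij}/N\Big]^{-1}\Big[\mu_B \sum_{t=1}^T\sum_{s=1}^T V_{XYts}/T + \mu_R \sum_{i=1}^N W_{XYii} + (\mu_C - \mu_R)\sum_{i=1}^N\sum_{j=1}^N W_{XYij}/N\Big],$$

or

$$\hat{\beta} = \Big[\mu_C \sum_{i=1}^N\sum_{j=1}^N W_{XXij}/N + \mu_R \sum_{t=1}^T V_{XXtt} + (\mu_B - \mu_R)\sum_{t=1}^T\sum_{s=1}^T V_{XXts}/T\Big]^{-1}\Big[\mu_C \sum_{i=1}^N\sum_{j=1}^N W_{XYij}/N + \mu_R \sum_{t=1}^T V_{XYtt} + (\mu_B - \mu_R)\sum_{t=1}^T\sum_{s=1}^T V_{XYts}/T\Big];$$

compactly

$$\hat{\beta} = b(\mu_B A_T, \mu_C A_N + \mu_R B_N) \equiv b(\mu_B A_T + \mu_R B_T, \mu_C A_N). \qquad (52)$$

As shown by Fuller and Battese (1973, 1974), the two-way random effects GLS estimator of $\beta$ in Model (1), for known $(\sigma^2, \sigma_\alpha^2, \sigma_\gamma^2)$, its MVLUE, can be written as

$$\hat{\beta}_{GLS} = \hat{\beta}(\lambda_B, \lambda_C, 1) = (\lambda_B B_{XX} + \lambda_C C_{XX} + R_{XX})^{-1}(\lambda_B B_{XY} + \lambda_C C_{XY} + R_{XY})$$
$$\equiv \Big[\frac{R_{XX}}{\sigma^2} + \frac{B_{XX}}{\sigma^2 + T\sigma_\alpha^2} + \frac{C_{XX}}{\sigma^2 + N\sigma_\gamma^2}\Big]^{-1}\Big[\frac{R_{XY}}{\sigma^2} + \frac{B_{XY}}{\sigma^2 + T\sigma_\alpha^2} + \frac{C_{XY}}{\sigma^2 + N\sigma_\gamma^2}\Big], \qquad (53)$$

where

$$\lambda_B = \frac{\sigma^2}{\sigma^2 + T\sigma_\alpha^2}, \quad \lambda_C = \frac{\sigma^2}{\sigma^2 + N\sigma_\gamma^2}.$$

The corresponding estimators when, respectively, only random individual effects occur ($\gamma_t = \sigma_\gamma^2 = 0$) and only random period effects occur ($\alpha_i = \sigma_\alpha^2 = 0$) are

$$\hat{\beta}_{GLS(\alpha)} = \hat{\beta}(\lambda_B, 1, 1) = (\lambda_B B_{XX} + C_{XX} + R_{XX})^{-1}(\lambda_B B_{XY} + C_{XY} + R_{XY}),$$
$$\hat{\beta}_{GLS(\gamma)} = \hat{\beta}(1, \lambda_C, 1) = (B_{XX} + \lambda_C C_{XX} + R_{XX})^{-1}(B_{XY} + \lambda_C C_{XY} + R_{XY}).$$

Their weights, as functions of $\lambda_B$ or $\lambda_C$, are given in Table 2, panel A, compactly:

$$\hat{\beta}_{GLS} = b(B_T + \lambda_B A_T, \lambda_C A_N) \equiv b(\lambda_B A_T, B_N + \lambda_C A_N),$$
$$\hat{\beta}_{GLS(\alpha)} = b(B_T + \lambda_B A_T, A_N) \equiv b(\lambda_B A_T, I_N),$$
$$\hat{\beta}_{GLS(\gamma)} = b(I_T, \lambda_C A_N) \equiv b(A_T, B_N + \lambda_C A_N),$$

with variance-covariance matrices, see Appendix D,

$$V(\hat{\beta}_{GLS}|X) = \sigma^2[R_{XX} + \lambda_B B_{XX} + \lambda_C C_{XX}]^{-1} = \Big[\frac{R_{XX}}{\sigma^2} + \frac{B_{XX}}{\sigma^2 + T\sigma_\alpha^2} + \frac{C_{XX}}{\sigma^2 + N\sigma_\gamma^2}\Big]^{-1},$$
$$V(\hat{\beta}_{GLS(\alpha)}|X) = [R_{XX} + \lambda_B B_{XX} + C_{XX}]^{-1}[\sigma^2 R_{XX} + \lambda_B^2(\sigma^2 + T\sigma_\alpha^2) B_{XX} + (\sigma^2 + N\sigma_\gamma^2) C_{XX}][R_{XX} + \lambda_B B_{XX} + C_{XX}]^{-1},$$
$$V(\hat{\beta}_{GLS(\gamma)}|X) = [R_{XX} + B_{XX} + \lambda_C C_{XX}]^{-1}[\sigma^2 R_{XX} + (\sigma^2 + T\sigma_\alpha^2) B_{XX} + \lambda_C^2(\sigma^2 + N\sigma_\gamma^2) C_{XX}][R_{XX} + B_{XX} + \lambda_C C_{XX}]^{-1}.$$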
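As a numerical cross-check of (51) and (53), the following NumPy sketch compares $\hat{\beta}(\lambda_B, \lambda_C, 1)$ with GLS computed directly from the $NT \times NT$ disturbance covariance matrix implied by (2). The function names `bcr_moments` and `beta_mu`, the array layout (`X` of shape N×T×K, observations stacked individual by individual), and the variance values are assumptions chosen for the illustration.

```python
import numpy as np

def bcr_moments(X, y):
    """Between-individual (B), between-period (C), residual (R) moments."""
    K = X.shape[-1]
    xi, yi = X.mean(axis=1, keepdims=True), y.mean(axis=1, keepdims=True)
    xt, yt = X.mean(axis=0, keepdims=True), y.mean(axis=0, keepdims=True)
    xg, yg = X.mean(axis=(0, 1), keepdims=True), y.mean()
    dev = {"B": (xi - xg, yi - yg),
           "C": (xt - xg, yt - yg),
           "R": (X - xi - xt + xg, y - yi - yt + yg)}
    out = {}
    for name, (dx, dy) in dev.items():
        # Repeat means over the panel so that sums run over all (i, t).
        dx = np.broadcast_to(dx, X.shape).reshape(-1, K)
        dy = np.broadcast_to(dy, y.shape).reshape(-1)
        out[name] = (dx.T @ dx, dx.T @ dy)
    return out

def beta_mu(m, mu_B, mu_C, mu_R):
    """Estimator (51) for scalar weights (mu_B, mu_C, mu_R)."""
    XX = mu_B * m["B"][0] + mu_C * m["C"][0] + mu_R * m["R"][0]
    XY = mu_B * m["B"][1] + mu_C * m["C"][1] + mu_R * m["R"][1]
    return np.linalg.solve(XX, XY)

# Hypothetical data and (assumed known) variance components.
rng = np.random.default_rng(2)
N, T, K = 6, 5, 2
s2, s2a, s2g = 1.0, 2.0, 0.5
X = rng.normal(size=(N, T, K))
y = (1.0 + X @ np.array([0.8, -0.3])
     + np.sqrt(s2a) * rng.normal(size=(N, 1))
     + np.sqrt(s2g) * rng.normal(size=(1, T))
     + np.sqrt(s2) * rng.normal(size=(N, T)))

m = bcr_moments(X, y)
lam_B = s2 / (s2 + T * s2a)
lam_C = s2 / (s2 + N * s2g)
b_gls = beta_mu(m, lam_B, lam_C, 1.0)          # eq. (53)

# Direct GLS with intercept, using the covariance matrix from (2).
Omega = (s2 * np.eye(N * T)
         + s2a * np.kron(np.eye(N), np.ones((T, T)))
         + s2g * np.kron(np.ones((N, N)), np.eye(T)))
Z = np.column_stack([np.ones(N * T), X.reshape(N * T, K)])
Oi_Z = np.linalg.solve(Omega, Z)               # Omega^{-1} Z
b_direct = np.linalg.solve(Z.T @ Oi_Z, Oi_Z.T @ y.reshape(-1))[1:]

b_ols = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)[0][1:]
```

Here `b_gls` agrees with `b_direct`, and `beta_mu(m, 1, 1, 1)` reproduces pooled OLS, so that GLS differs from OLS only through the down-weighting of the between-individual and between-period moments by $\lambda_B$ and $\lambda_C$. In practice the variance components would have to be estimated.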

15 If the one-way random effects model is valid, i.e., if σγ 2 = 0 or σα 2 = 0, respectively, the latter two are simplified to [ RXX + C V( β GLS(α) X) = XX σ 2 + B ] 1 XX σ 2 + σα 2, [ RXX + B V( β GLS(γ) X) = XX σ 2 + C ] 1 XX σ 2 + σγ 2. An interesting issue is robustness of the members of the class b(ϕ, ψ) to violation of the assumptions in Model (1). From conclusions in Section 3 it follows that: [1] If x it contains an IID measurement error vector, which becomes part of u it, then (i) all estimators satisfying ϕ tt = 0, ϕ ts 0 for some s t, and all ψ ij = 0, are -consistent, and (ii) all estimators satisfying ψ ii = 0, ψ ij 0 for some j i, and all ϕ ts = 0, are -consistent. [2] If endogeneity of some variables in x it leads to E(x it u it ) 0 K,1, while E(x it u js ) = 0 K,1 for (j, s) (i, t), similar consistency results hold. able 2: he General Moment Estimator (31) For Random Effects Models A: Weights ϕ ts and ψ ij ϕ tt ϕ ts, s t ψ ii ψ ij, j i ϕ ψ β GLS 1 1 λ B 1 λ B λ B λ B β GLS β GLS(α) 1 1 λ B 1 λ B λ B λ B β GLS(α) β GLS(γ) 1 0 β GLS(γ) 1 1 λ C 1 1 λ C 1 λ C 1 λ C 1 B + λ B A λ B A B + λ B A λ C A B + λ C A A 1 0 λ B A I λ C 1 1 λc λ C 1 λc I A λ C A B + λ C A B: Covariance matrices: values of S V +S W, Z V, Z W, Q (Z V W = 0) S V + S W Z V Z W Q β GLS λ 2 BB XX +λ 2 CC XX +R XX λ 2 B B XX λ 2 CC XX λ B B XX +λ C C XX +R XX β GLS(α) λ 2 BB XX +C XX +R XX λ 2 B B XX C XX λ B B XX +C XX +R XX β GLS(γ) B XX +λ 2 CC XX +R XX B XX λ 2 CC XX B XX +λ C C XX +R XX 6 Illustration: Factor productivity In this, final section, we illustrate some of the above results for a model with a single regressor (K = 1), relating to factor productivity. he data are from successive annual orwegian manufacturing censuses, collected by Statistics orway, for the sector Manufacture of textiles (ISIC 32), with = 215 firms observed in the years , i.e., = 8. 
he y it s and x it s are, respectively, the log of the material input and the log of gross production, both measured as values at constant prices, so that the (scalar) coefficient β can be interpreted as the input elasticity of materials with respect to output. he OLS estimate of β obtained from the = 1720 observations is β G = From the residuals, ϵ it and their between-individual, between-period, and residual sum of squares, B ϵ ϵ = ( ϵ i ϵ) 2, C ϵ ϵ = t=1 ( ϵ t ϵ) 2, R ϵ ϵ = t=1 ( ϵ it ϵ i ϵ t+ ϵ) 2, 12

16 we obtain AOVA type estimates: σ 2 α + σ2 = B ϵ ϵ ( 1), σ2 γ + σ2 = C ϵ ϵ ( 1), σ2 = R ϵ ϵ ( 1)( 1), confer Searle, Casella, and McCulloch (1992, section 4.7.iii), which give [ ] σ α 2 1 = B ϵ ϵ R ϵ ϵ = , ( 1) 1 [ ] σ γ 2 1 = C ϵ ϵ R ϵ ϵ = , ( 1) 1 σ 2 = , σ 2 ϵ = σ 2 α + σ 2 γ + σ 2 = he corresponding shares representing individual heterogeneity, period heterogeneity, and residual variation are σ α/ σ 2 ϵ 2 = , σ γ/ σ 2 ϵ 2 = , and σ 2 / σ ϵ 2 = , while B Y Y /G Y Y = , C Y Y /G Y Y = , R Y Y /G Y Y = for log-material-input and B XX /G XX = , C XX /G XX = , and R XX /G XX = for logoutput. ot surprisingly, the between firm variation by far dominates. We have selected =10 firms randomly from the 215 in the full sample and included the =8 observations from each of them. All results refer to this subsample of =80 observations, except that the variance components are estimated from the full sample. he firm-specific estimates of the input elasticity of materials β W ij are given in able 3 (upper panel), the OLS estimates on the main diagonal, varying from 0.09 (firm 2) to 1.54 (firm 7), and the IV estimates in the off-diagonal positions, standard errors, obtained from (25), are given in the lower panel. Even the OLS estimates have low precision. he corresponding within-firm coefficients of correlation of log-output, R W Xij, given in able A3, panel A, show considerable variation, are often low, indicating that log-output for other firms are weak IVs for own log-output. he weights of the firm-specific OLS estimates (able 3) in the overall within-firm estimate, β W, which is (standard error ), are reported in able A1, panel A. he estimate for firm 1 by far dominates (weight 38 per cent). he weights of the firmspecific IV/OLS estimates (able 3) in the overall between-year estimate β C, which is (standard error ), are reported in able A1, panel B. he estimate for (i, j) = (1, 1) by far dominates (weight 15 per cent). 
Some off-diagonal weights are negative, reflecting negative correlation between the log-output of the relevant firms (Table A3, panel A).

The year-specific estimates $\widehat{\beta}_{Vts}$ for the $T = 8$ years are given in Table 4 (upper panel), with the OLS estimates on the main diagonal, varying between 1.21 (cross section from year 1989) and 1.64 (cross section from year 1985), and the IV estimates in the off-diagonal positions. All of the $T^2 = 64$ estimates exceed one, with standard errors, from (26), given in the lower panel. Overall, the precision is much higher than for the firm-specific estimates. The corresponding across-year correlations of log-output, $R_{VXts}$, given in Table A3, panel B, show far less variation than the corresponding across-firm correlations. This indicates that the log-outputs of other years are strong instruments for a year's own log-output, cf. (26) and (28). The weights of the year-specific OLS estimates (Table 4) in the within-year estimate, $\widehat{\beta}_V$, which is […] (standard error […]), are reported in Table A2, panel A. The weights vary from 20 per cent (for 1984) to 8 per cent (for 1990). The weights of all the period-specific IV/OLS estimates (Table 4) in the overall between-firm estimate $\widehat{\beta}_B$, which is […] (standard error […]), are reported in Table A2, panel B. Again, the weights vary less than those for the firm-specific estimates and all weights are positive.

The residual estimate, the OLS estimate, and the GLS estimate (with standard error in parenthesis) are, respectively, $\widehat{\beta}_R =$ […] (0.0875), $\widehat{\beta}_G =$ […] (0.1646), and $\widehat{\beta}_{GLS} =$ […] (0.0717). The latter two are known to be weighted averages of $\widehat{\beta}_B$, $\widehat{\beta}_C$, and $\widehat{\beta}_R$, which agrees with the numerical values $\widehat{\beta}_B =$ […], $\widehat{\beta}_C =$ […], and $\widehat{\beta}_R =$ […].

Since all the aggregate estimators considered have either all $\phi_{tt} \neq 0$ or all $\psi_{ii} \neq 0$, they are inconsistent in cases of endogeneity of or measurement errors in the regressor, confer the end of Section 5. Modifying the between-firm estimator $\widehat{\beta}_B$ by replacing $\phi_{ts} = 1/T$ for all $(t, s)$ by 0 for $s = t$ and $1/T$ for $s \neq t$ (confer Table 1), we get a modified between-firm estimate of […]. This estimator is $N$-consistent, and the estimate is slightly larger than the (less robust) between-firm estimate $\widehat{\beta}_B =$ […]. Symmetrically, modifying the between-year estimator $\widehat{\beta}_C$ by replacing $\psi_{ij} = 1/N$ for all $(i, j)$ by 0 for $j = i$ and $1/N$ for $j \neq i$ (confer Table 1), we get a modified between-year estimate of […], which is $T$-consistent and is substantially smaller than the (less robust) between-year estimate $\widehat{\beta}_C =$ […]. On the other hand, if all assumptions of Model (1) hold, the modified between-firm estimator is somewhat less efficient than $\widehat{\beta}_B$ (standard error […] against […]), and the modified between-year estimator is markedly less efficient than $\widehat{\beta}_C$ (standard error […] against […]). That is, the efficiency loss when eliminating the disaggregate OLS estimates from the aggregate estimator to improve robustness may be substantial.

Table 3: Firm-specific Estimates of Materials Output Elasticity: $\widehat{\beta}_{Wij}$. Within deviation of firm $i$ used as IV for within deviation of firm $j$. Upper panel: estimates by $(i, j)$. Lower panel: standard errors by $(i, j)$.
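The step from the between-firm estimator to its modified version, described in this section, can be illustrated numerically for a scalar regressor. The following is a minimal sketch under stated assumptions, not the paper's own code: the data are simulated, the function names `v_moments` and `aggregate_from_v` are hypothetical, and the aggregate is assumed to take the ratio form $\sum_t\sum_s \phi_{ts}V_{XYts}\big/\sum_t\sum_s \phi_{ts}V_{XXts}$ with $V_{XZts} = \sum_i (x_{it}-\bar{x}_{\cdot t})(z_{is}-\bar{z}_{\cdot s})$.

```python
import numpy as np

def v_moments(x, z):
    """T x T matrix whose (t, s) element is sum_i (x_it - xbar_.t)(z_is - zbar_.s).

    x, z: arrays of shape (N, T), firms in rows and years in columns.
    """
    xd = x - x.mean(axis=0, keepdims=True)  # deviations from year means
    zd = z - z.mean(axis=0, keepdims=True)
    return xd.T @ zd

def aggregate_from_v(x, y, phi):
    """Aggregate of the year-specific OLS/IV estimates with weight matrix phi (T x T).

    Assumed ratio form; scalar regressor only.
    """
    return np.sum(phi * v_moments(x, y)) / np.sum(phi * v_moments(x, x))

# Simulated panel (illustrative values, not the paper's data).
rng = np.random.default_rng(0)
N, T, beta = 10, 8, 1.2
alpha = rng.normal(size=(N, 1))                      # firm effects
gamma = rng.normal(size=(1, T))                      # year effects
x = rng.normal(size=(N, T)) + rng.normal(size=(N, 1))
y = beta * x + alpha + gamma + 0.1 * rng.normal(size=(N, T))

phi_between = np.full((T, T), 1.0 / T)               # phi_ts = 1/T for all (t, s)
phi_modified = phi_between - np.eye(T) / T           # 0 for s = t, 1/T for s != t

b_between = aggregate_from_v(x, y, phi_between)
b_modified = aggregate_from_v(x, y, phi_modified)

# Cross-check: the between-firm estimate computed directly from firm means.
xm, ym = x.mean(axis=1), y.mean(axis=1)
b_means = np.sum((xm - xm.mean()) * (ym - ym.mean())) / np.sum((xm - xm.mean()) ** 2)
```

With $\phi_{ts} = 1/T$ the ratio reproduces the regression on firm means, which is what `b_means` checks. Setting the diagonal of the weight matrix to zero drops the year-specific OLS terms $V_{XXtt}$ and $V_{XYtt}$ and keeps only the cross-year IV terms.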

Table 4: Year-specific Estimates of Materials Output Elasticity: $\widehat{\beta}_{Vts}$. Within deviation of year $t$ used as IV for within deviation of year $s$. Upper panel: estimates by $(t, s)$. Lower panel: standard errors by $(t, s)$.

References

Biørn, E. (1994): Moment Estimators and the Estimation of Marginal Budget Shares from Household Panel Data. Structural Change and Economic Dynamics 5, […].

Biørn, E. (2017): Econometrics of Panel Data: Methods and Applications. Oxford: Oxford University Press.

Fuller, W.A., and Battese, G.E. (1973): Transformations for Estimation of Linear Models with Nested Error Structure. Journal of the American Statistical Association 68, […].

Fuller, W.A., and Battese, G.E. (1974): Estimation of Linear Models with Crossed-Error Structure. Journal of Econometrics 2, […].

Searle, S.R., Casella, G., and McCulloch, C.E. (1992): Variance Components. New York: Wiley.

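Appendices and Appendix Tables

A: The covariance matrices of the base estimators: In order to derive the variance-covariance matrices of $\widehat{\beta}_{Wij}$ and $\widehat{\beta}_{Vts}$ when Model (1) is valid, we first need expressions for the variance-covariance matrices of $W_{XUij}$, $V_{XUts}$, $W_{X\gamma i}$, and $V_{X\alpha t}$. Since

$$E(\alpha\alpha'|X) = \sigma_\alpha^2 I_N,\qquad E(\gamma\gamma'|X) = \sigma_\gamma^2 I_T,$$

$$E(u_{j\cdot}u_{l\cdot}'|X) = \delta_{jl}\sigma^2 I_T,\qquad E(u_{\cdot s}u_{\cdot q}'|X) = \delta_{sq}\sigma^2 I_N,\qquad E(u_{j\cdot}u_{\cdot q}'|X) = \sigma^2 i_{Tq}i_{Nj}',$$

$$j, l = 1,\dots,N,\qquad s, q = 1,\dots,T,$$

where $i_{Hj}$ denotes the $j$th column of $I_H$, we get, after some algebra,

$$E(W_{XUij}W_{XUkl}'|X) = \delta_{jl}\sigma^2 W_{XXik},\qquad E(W_{X\gamma i}W_{X\gamma k}'|X) = \sigma_\gamma^2 W_{XXik},\qquad E(W_{X\epsilon ij}W_{X\epsilon kl}'|X) = (\sigma_\gamma^2+\delta_{jl}\sigma^2)W_{XXik}, \tag{a.1}$$

$$E(V_{XUts}V_{XUpq}'|X) = \delta_{sq}\sigma^2 V_{XXtp},\qquad E(V_{X\alpha t}V_{X\alpha p}'|X) = \sigma_\alpha^2 V_{XXtp},\qquad E(V_{X\epsilon ts}V_{X\epsilon pq}'|X) = (\sigma_\alpha^2+\delta_{sq}\sigma^2)V_{XXtp}, \tag{a.2}$$

$$\left.\begin{array}{l} E(W_{XUij}V_{XUpq}'|X) \\ E(W_{X\epsilon ij}V_{X\epsilon pq}'|X) \end{array}\right\} = \sigma^2 (x_{iq}-\bar{x}_{i\cdot})'(x_{jp}-\bar{x}_{\cdot p}), \tag{a.3}$$

$$i, j, k, l = 1,\dots,N,\qquad t, s, p, q = 1,\dots,T.$$

Combining (a.1)-(a.3) with (19)-(20), it follows that the matrices of covariances between the individual-specific and between the period-specific base estimators, respectively, can be expressed as

$$C(\widehat{\beta}_{Wij}, \widehat{\beta}_{Wkl}|X) = (\sigma_\gamma^2+\delta_{jl}\sigma^2)\,W_{XXij}^{-1}W_{XXik}W_{XXlk}^{-1},\qquad i, j, k, l = 1,\dots,N, \tag{a.4}$$

$$C(\widehat{\beta}_{Vts}, \widehat{\beta}_{Vpq}|X) = (\sigma_\alpha^2+\delta_{sq}\sigma^2)\,V_{XXts}^{-1}V_{XXtp}V_{XXqp}^{-1},\qquad t, s, p, q = 1,\dots,T. \tag{a.5}$$

B: The covariance matrix of $b$: Inserting for $W_{XYij}$ and $V_{XYts}$ from (13) and (15) in (31), using (33), we find

$$b-\beta = Q^{-1}\Big[\sum_{t=1}^{T}\sum_{s=1}^{T}\phi_{ts}V_{X\epsilon ts}+\sum_{i=1}^{N}\sum_{j=1}^{N}\psi_{ij}W_{X\epsilon ij}\Big] = Q^{-1}\Big[\sum_{t=1}^{T}\sum_{s=1}^{T}\phi_{ts}V_{XUts}+\sum_{i=1}^{N}\sum_{j=1}^{N}\psi_{ij}W_{XUij}+\sum_{t=1}^{T}\Big(\sum_{s=1}^{T}\phi_{ts}\Big)V_{X\alpha t}+\sum_{i=1}^{N}\Big(\sum_{j=1}^{N}\psi_{ij}\Big)W_{X\gamma i}\Big].$$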
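Combining this equation with (19), (20), and (a.1)-(a.3), we find that $b$ is an unbiased estimator of $\beta$ for any $\phi$ and $\psi$ and has variance-covariance matrix

$$V(b|X) = Q^{-1}P(Q^{-1})' = Q(\phi,\psi)^{-1}P(\phi,\psi,\sigma^2,\sigma_\alpha^2,\sigma_\gamma^2)(Q(\phi,\psi)^{-1})', \tag{b.1}$$

where

$$P = P(\phi,\psi,\sigma^2,\sigma_\alpha^2,\sigma_\gamma^2) = \sigma^2(S_V+S_W+S_{VW})+\sigma_\alpha^2 Z_V+\sigma_\gamma^2 Z_W, \tag{b.2}$$

$$\begin{aligned}
S_V &= S_V(\phi) = \sum_{t=1}^{T}\sum_{p=1}^{T}V_{XXtp}\Big(\sum_{s=1}^{T}\phi_{ts}\phi_{ps}\Big),\\
S_W &= S_W(\psi) = \sum_{i=1}^{N}\sum_{k=1}^{N}W_{XXik}\Big(\sum_{j=1}^{N}\psi_{ij}\psi_{kj}\Big),\\
S_{VW} &= S_{VW}(\phi,\psi) = \sum_{t=1}^{T}\sum_{s=1}^{T}\sum_{i=1}^{N}\sum_{j=1}^{N}\phi_{ts}\psi_{ij}(x_{is}-\bar{x}_{i\cdot})'(x_{jt}-\bar{x}_{\cdot t}),\\
Z_V &= Z_V(\phi) = \sum_{t=1}^{T}\sum_{p=1}^{T}V_{XXtp}\Big(\sum_{s=1}^{T}\phi_{ts}\Big)\Big(\sum_{r=1}^{T}\phi_{pr}\Big),\\
Z_W &= Z_W(\psi) = \sum_{i=1}^{N}\sum_{k=1}^{N}W_{XXik}\Big(\sum_{j=1}^{N}\psi_{ij}\Big)\Big(\sum_{l=1}^{N}\psi_{kl}\Big).
\end{aligned} \tag{b.3}$$

C: Proof of (39)-(40): Since $\bar{x}_{i\cdot}-\bar{x} = \sum_{t=1}^{T}(x_{it}-\bar{x}_{\cdot t})/T$, $\bar{x}_{\cdot t}-\bar{x} = \sum_{i=1}^{N}(x_{it}-\bar{x}_{i\cdot})/N$, etc., and

$$\sum_{i=1}^{N}(X_{i\cdot}-\bar{X})'A_T(X_{i\cdot}-\bar{X}) = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}X_{\cdot t}'B_N X_{\cdot s},\qquad \sum_{t=1}^{T}(X_{\cdot t}-\bar{X})'A_N(X_{\cdot t}-\bar{X}) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}X_{i\cdot}'B_T X_{j\cdot}$$

hold identically, (11) and (12) can be rewritten as

$$B_{XXii} = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}(x_{it}-\bar{x}_{\cdot t})'(x_{is}-\bar{x}_{\cdot s}),\qquad B_{X\alpha ii} = \sum_{t=1}^{T}(x_{it}-\bar{x}_{\cdot t})'(\alpha_i-\bar{\alpha}), \tag{c.1}$$

$$C_{XXtt} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}(x_{it}-\bar{x}_{i\cdot})'(x_{jt}-\bar{x}_{j\cdot}),\qquad C_{X\gamma tt} = \sum_{i=1}^{N}(x_{it}-\bar{x}_{i\cdot})'(\gamma_t-\bar{\gamma}), \tag{c.2}$$

$$i = 1,\dots,N,\qquad t = 1,\dots,T,$$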

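and the following identities hold:

$$\sum_{i=1}^{N}B_{XXii} = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}V_{XXts},\qquad \sum_{t=1}^{T}C_{XXtt} = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}W_{XXij}. \tag{c.3}$$

Similarly,

$$\sum_{i=1}^{N}B_{X\alpha ii} = \sum_{t=1}^{T}V_{X\alpha t},\qquad \sum_{t=1}^{T}C_{X\gamma tt} = \sum_{i=1}^{N}W_{X\gamma i}.$$

The overall between-individual and overall between-period (co)variation can then be written as

$$B_{XX} = \sum_{i=1}^{N}B_{XXii} = T\sum_{i=1}^{N}(\bar{x}_{i\cdot}-\bar{x})'(\bar{x}_{i\cdot}-\bar{x}) = \frac{1}{T}\sum_{t=1}^{T}\sum_{s=1}^{T}V_{XXts}, \tag{c.4}$$

$$C_{XX} = \sum_{t=1}^{T}C_{XXtt} = N\sum_{t=1}^{T}(\bar{x}_{\cdot t}-\bar{x})'(\bar{x}_{\cdot t}-\bar{x}) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}W_{XXij}. \tag{c.5}$$

D: The covariance matrix of $\widehat{\beta}_{GLS}$: Recalling (45), (46), (48), and (53), the GLS weights in the variance-covariance matrix can be obtained from Table 2, panel A, by adding $\lambda_B$ times the weights in row 1, $\lambda_C$ times the weights in row 2, and the weights in row 3 (or 4). Expressions for the variance-covariance matrix of $\widehat{\beta}_{GLS}$ can be derived by inserting the weights in Table 2, panel A, rows 1 or 2, into (34)-(36). The result is given in Table 2, panel B, row 1. In deriving $V(\widehat{\beta}_{GLS}|X)$, we use $\sum_{s=1}^{T}\phi_{ts} = \lambda_B$, $\sum_{j=1}^{N}\psi_{ij} = \lambda_C$, so that, using (36), we have

$$\sum_{s=1}^{T}\phi_{ts}\phi_{ps} = \delta_{tp}-\frac{1-\lambda_B^2}{T},\qquad \sum_{j=1}^{N}\psi_{ij}\psi_{kj} = \delta_{ik}-\frac{1-\lambda_C^2}{N},\qquad t, p = 1,\dots,T,\quad i, k = 1,\dots,N,$$

$$Z_V = \lambda_B^2\sum_{t=1}^{T}\sum_{p=1}^{T}V_{XXtp} = \lambda_B^2\,T\,B_{XX},\qquad Z_W = \lambda_C^2\sum_{i=1}^{N}\sum_{k=1}^{N}W_{XXik} = \lambda_C^2\,N\,C_{XX},$$

which are the expressions given in Table 2, panel B, columns 2 and 3. Obviously, $S_{VW} = 0$. From (36), in combination with the weights in Table 2, rows 1 and 2, we get

$$S_V+S_W = V_{XX}-(1-\lambda_B^2)B_{XX}+\lambda_C^2 C_{XX} = \lambda_B^2 B_{XX}+W_{XX}-(1-\lambda_C^2)C_{XX},$$

$$Q = V_{XX}-(1-\lambda_B)B_{XX}+\lambda_C C_{XX} = \lambda_B B_{XX}+W_{XX}-(1-\lambda_C)C_{XX},$$

which, since $V_{XX}-B_{XX} = W_{XX}-C_{XX} = R_{XX}$, can be simplified to

$$S_V+S_W = R_{XX}+\lambda_B^2 B_{XX}+\lambda_C^2 C_{XX},\qquad Q = R_{XX}+\lambda_B B_{XX}+\lambda_C C_{XX}.$$

These are the expressions given in Table 2, panel B, columns 1 and 4. Finally, since $\sigma^2(S_V+S_W)+\sigma_\alpha^2 Z_V+\sigma_\gamma^2 Z_W = \sigma^2[R_{XX}+\lambda_B B_{XX}+\lambda_C C_{XX}]$, the covariance matrix of $\widehat{\beta}_{GLS}$ can be written as

$$V(\widehat{\beta}_{GLS}|X) = \sigma^2[R_{XX}+\lambda_B B_{XX}+\lambda_C C_{XX}]^{-1} = \left[\frac{R_{XX}}{\sigma^2}+\frac{B_{XX}}{\sigma^2+T\sigma_\alpha^2}+\frac{C_{XX}}{\sigma^2+N\sigma_\gamma^2}\right]^{-1}. \tag{d.1}$$

The covariance matrices of the one-way GLS estimators $\widehat{\beta}_{GLS(\alpha)}$ and $\widehat{\beta}_{GLS(\gamma)}$ when the two-way effects model is valid, obtained from Table 2, panel B, rows 2 and 3, are

$$V(\widehat{\beta}_{GLS(\alpha)}|X) = [R_{XX}+\lambda_B B_{XX}+C_{XX}]^{-1}\,[\sigma^2 R_{XX}+\lambda_B^2(\sigma^2+T\sigma_\alpha^2)B_{XX}+(\sigma^2+N\sigma_\gamma^2)C_{XX}]\,[R_{XX}+\lambda_B B_{XX}+C_{XX}]^{-1}, \tag{d.2}$$

$$V(\widehat{\beta}_{GLS(\gamma)}|X) = [R_{XX}+B_{XX}+\lambda_C C_{XX}]^{-1}\,[\sigma^2 R_{XX}+(\sigma^2+T\sigma_\alpha^2)B_{XX}+\lambda_C^2(\sigma^2+N\sigma_\gamma^2)C_{XX}]\,[R_{XX}+B_{XX}+\lambda_C C_{XX}]^{-1}, \tag{d.3}$$

which for the one-way random effects models ($\sigma_\gamma^2 = 0$ and $\sigma_\alpha^2 = 0$, respectively) are simplified to

$$V(\widehat{\beta}_{GLS(\alpha)}|X) = \left[\frac{R_{XX}+C_{XX}}{\sigma^2}+\frac{B_{XX}}{\sigma^2+T\sigma_\alpha^2}\right]^{-1}, \tag{d.4}$$

$$V(\widehat{\beta}_{GLS(\gamma)}|X) = \left[\frac{R_{XX}+B_{XX}}{\sigma^2}+\frac{C_{XX}}{\sigma^2+N\sigma_\gamma^2}\right]^{-1}. \tag{d.5}$$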
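The identities (c.4)-(c.5), the decomposition $V_{XX}-B_{XX} = W_{XX}-C_{XX} = R_{XX}$, and the collapse of the sandwich form $Q^{-1}PQ^{-1}$ into (d.1) can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the paper: the regressors are simulated, the variance values are arbitrary, and $\lambda_B = \sigma^2/(\sigma^2+T\sigma_\alpha^2)$ and $\lambda_C = \sigma^2/(\sigma^2+N\sigma_\gamma^2)$ are assumed definitions, chosen because they are the ones consistent with (d.1).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 6, 5, 2
x = rng.normal(size=(N, T, K))              # x[i, t] is the 1 x K regressor row x_it

x_i = x.mean(axis=1, keepdims=True)         # firm means, shape (N, 1, K)
x_t = x.mean(axis=0, keepdims=True)         # year means, shape (1, T, K)
x_bar = x.mean(axis=(0, 1), keepdims=True)  # global mean, shape (1, 1, K)

# Base moment matrices: V[t, s] = V_XXts and W[i, j] = W_XXij, each K x K.
dv, dw = x - x_t, x - x_i
V = np.einsum("itk,isl->tskl", dv, dv)
W = np.einsum("itk,jtl->ijkl", dw, dw)

# Between-firm, between-year, and residual (co)variation.
bi = (x_i - x_bar)[:, 0, :]                 # shape (N, K)
ct = (x_t - x_bar)[0]                       # shape (T, K)
B = T * (bi.T @ bi)
C = N * (ct.T @ ct)
r = x - x_i - x_t + x_bar
R = np.einsum("itk,itl->kl", r, r)

V_XX = np.einsum("ttkl->kl", V)             # sum of the diagonal blocks V_XXtt
W_XX = np.einsum("iikl->kl", W)             # sum of the diagonal blocks W_XXii

# GLS covariance: sandwich form versus the closed form (d.1).
s2, s2_a, s2_g = 1.0, 0.5, 0.25             # illustrative variance components
lam_B = s2 / (s2 + T * s2_a)                # assumed definition of lambda_B
lam_C = s2 / (s2 + N * s2_g)                # assumed definition of lambda_C
Q = R + lam_B * B + lam_C * C
P = (s2 * (R + lam_B**2 * B + lam_C**2 * C)
     + s2_a * lam_B**2 * T * B + s2_g * lam_C**2 * N * C)
Q_inv = np.linalg.inv(Q)
sandwich = Q_inv @ P @ Q_inv.T
closed_form = np.linalg.inv(R / s2 + B / (s2 + T * s2_a) + C / (s2 + N * s2_g))
```

The arrays `sandwich` and `closed_form` should agree up to rounding, as should `B` with `V.sum(axis=(0, 1)) / T` and `C` with `W.sum(axis=(0, 1)) / N`.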

Table A1: Weights of $\widehat{\beta}_{Wij}$ in aggregate estimates. $N = 10$, $T = 8$.
A. Weights of $\widehat{\beta}_{Wii}$ in $\widehat{\beta}_W$, per cent, by $i$. Average = 10 per cent.
B. Weights of $\widehat{\beta}_{Wij}$ in $\widehat{\beta}_C$, per cent, by $(i, j)$. Average = 1 per cent.

Table A2: Weights of $\widehat{\beta}_{Vts}$ in aggregate estimates. $N = 10$, $T = 8$.
A. Weights of $\widehat{\beta}_{Vtt}$ in $\widehat{\beta}_V$, per cent, by $t$. Average = 12.5 per cent.
B. Weights of $\widehat{\beta}_{Vts}$ in $\widehat{\beta}_B$, per cent, by $(t, s)$. Average = 1.56 per cent.

Table A3: Coefficients of Correlation, Log-Output. $N = 10$, $T = 8$.
A. Within Firm, $R_{WXij}$, by $(i, j)$.
B. Within Year, $R_{VXts}$, by $(t, s)$.
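For a scalar regressor, weights of the kind tabulated in Table A1 can be sketched as shares of the within-firm cross moments $W_{XXij}$. This is a hedged illustration, not the paper's computation: the data are simulated, and the weight definitions $W_{XXii}/\sum_k W_{XXkk}$ and $W_{XXij}/\sum_k\sum_l W_{XXkl}$ are assumptions, chosen because they reproduce the stated averages of $1/N$ and $1/N^2$ and allow negative off-diagonal weights.

```python
import numpy as np

# Simulated panel (illustrative values, not the paper's data).
rng = np.random.default_rng(2)
N, T, beta = 10, 8, 1.2
x = rng.normal(size=(N, T))
y = beta * x + rng.normal(size=(1, T)) + 0.1 * rng.normal(size=(N, T))

xd = x - x.mean(axis=1, keepdims=True)      # deviations from firm means
yd = y - y.mean(axis=1, keepdims=True)
W_xx = xd @ xd.T                            # (i, j) element: W_XXij
W_xy = xd @ yd.T                            # (i, j) element: W_XYij

beta_W_ij = W_xy / W_xx                     # firm-specific OLS (diagonal) and IV estimates

# Assumed weight definitions (shares of the cross moments).
w_within = np.diag(W_xx) / np.trace(W_xx)   # weights of beta_Wii in the within-firm estimate
w_between_year = W_xx / W_xx.sum()          # weights of beta_Wij in the between-year estimate

beta_W = np.sum(w_within * np.diag(beta_W_ij))
beta_C = np.sum(w_between_year * beta_W_ij)

# Cross-check: the between-year estimate computed directly from year means.
xm, ym = x.mean(axis=0), y.mean(axis=0)
beta_C_means = np.sum((xm - xm.mean()) * (ym - ym.mean())) / np.sum((xm - xm.mean()) ** 2)
```

Multiplying the weights by 100 gives per cent figures comparable in form to those in the table. Off-diagonal entries of `w_between_year` are negative whenever the within-firm deviations of two firms are negatively correlated.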
