Dynamic Panel Data Workshop. Yongcheol Shin, University of York University of Melbourne


Dynamic Panel Data Workshop
Yongcheol Shin, University of York
University of Melbourne, June 2014


Contents

1 Introduction
    Models For Pooled Time Series
    Classical regression model
    Cross-sectional heteroskedasticity: heterogeneous variances
    Cross-sectional Heteroskedasticity: Correlation across groups
    Autocorrelation (but not correlation across individuals)

2 Models for Longitudinal Data
    Fixed Effects Estimator
    Random Effects Estimator
    Fixed Effects or Random Effects
    Hausman Test
    Random Effects Correlated with Regressors
    Alternative IV Estimators
    Hausman and Taylor (1981) IV Estimator
    Further Generalization
    Extension to two-way error components model
    The fixed effects model
    The random effects model

3 Dynamic Panels
    Dynamic Panels with Fixed T
    The Anderson and Hsiao (1981) First-difference IV Estimation
    The Arellano and Bond (1991) IV-GMM Estimator
    The Arellano and Bover (1995) Study
    Further Readings
    Dynamic Panels When Both N and T are Large
    The Mean Group Estimator
    Pooled mean group estimation in dynamic heterogeneous panels

    Estimation and Inference in Panels with Nonstationary Variables

4 Threshold Regression Models in Dynamic Panels
    Introduction
    Regime Switching Models
    Structural break models
    Smooth Transition Autoregressive (STAR) Models
    Markov-Switching Autoregressive (MS-AR) Models
    Linearity Tests for TAR/STAR Specification
    Nonlinear Unit Root Tests in Regime Switching Models
    Unit Root Tests in Two-regime TAR Framework
    Unit Root Tests in Three-regime TAR Framework
    Unit Root Tests in ESTAR Framework (Kapetanios, Shin and Snell, 2003)
    Nonlinear Error Correction Models
    Asymmetric TAR NEC Models
    Asymmetric STR NEC Models
    MS NEC Models
    Panel Threshold Regression Models
    Model
    Estimation
    Inference
    Multiple thresholds
    Investment and financing constraints
    Threshold Autoregressive Models in Dynamic Panels
    Model
    FD-GMM Estimator
    System-GMM Estimator
    Estimation of and Testing for Threshold Effects
    Asymmetric capital structure adjustments: New evidence from dynamic panel threshold models
    Dynamic Panels with Threshold Effect and Endogeneity
    The Model
    Estimation
    Asymptotic Distributions
    Testing
    Monte Carlo Experiments
    Empirical Applications
    Bootstrap-based Bias Corrected Within Estimation of Threshold Regression Models in Dynamic Panels
    Model
    Bootstrap-based Bias Corrected Within Estimator
    Estimation of and Testing for Threshold Effects
    Empirical Application: To be filled

    Further Issues

5 Cross Sectionally Correlated Panels
    Overview on Cross-section Dependence
    Representations of CSD
    Weak and strong CSD
    The correlated common effect estimator
    Uses of factor models
    Factor models
    Uses
    Estimation Methods
    Calculating Principal Components
    Static Models
    Dynamic Models
    Issues in using PCs
    Factor Augmented VARS, FAVARs
    Estimation of Cross Sectionally Dependent Panels
    SURE
    Time effects/demeaning
    Including Means, the CCE estimator
    PANIC
    Residual Principal components
    Interactive fixed effects
    Further remarks
    Panel Gravity Models in the Presence of Cross Section Dependence
    Overview on the Euro's Trade Effects
    Extended HT estimation
    MSS (2013) extension
    A Nonlinear Panel Data Model of Cross-Sectional Dependence
    Model
    Special cases
    Cross-sectional dependence and factor models
    General suggestions on the empirical applications including the herding
    Modelling Technical Efficiency in Cross Sectionally Dependent Stochastic Frontier Panels
    The model
    Econometric estimation
    Further Issues

References


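Chapter 1

Introduction

Panel data are repeated time series observations on the same set of cross-section units. They pool cross-section and time series data, with $N$ cross-section individuals, $i = 1, 2, \dots, N$, and $T$ time periods, $t = 1, 2, \dots, T$. The regression model is written as

$$y_{it} = \beta_1 x_{it1} + \beta_2 x_{it2} + \cdots + \beta_k x_{itk} + \varepsilon_{it}, \qquad i = 1, 2, \dots, N, \; t = 1, 2, \dots, T, \tag{1.1}$$

where $y_{it}$ is the value of the dependent variable for cross-section unit $i$ at time $t$, and $x_{itj}$ is the value of the $j$th explanatory variable for unit $i$ at time $t$, for $j = 1, \dots, k$. Let $x_{it} = (x_{it1}, \dots, x_{itk})$ be a $1 \times k$ vector of regressors (including a constant), and $\beta = (\beta_1, \dots, \beta_k)'$ a $k \times 1$ vector of parameters. Then (1.1) can be written compactly as

$$y_{it} = x_{it}\beta + \varepsilon_{it}, \qquad i = 1, 2, \dots, N, \; t = 1, 2, \dots, T. \tag{1.2}$$

More compactly,

$$y_i = x_i\beta + \varepsilon_i, \qquad i = 1, 2, \dots, N, \tag{1.3}$$

where

$$\underset{T \times 1}{y_i} = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}, \qquad \underset{T \times k}{x_i} = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iT} \end{bmatrix}, \qquad \underset{T \times 1}{\varepsilon_i} = \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix},$$

and finally

$$y = X\beta + \varepsilon, \tag{1.4}$$

where

$$\underset{NT \times 1}{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \qquad \underset{NT \times k}{X} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \qquad \underset{NT \times 1}{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix}.$$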

Motivation for use of panel data: The analysis of panel data is one of the most active literatures in econometrics. See Hsiao (2003) and Baltagi (2008).

First, we can obtain efficiency gains from using more observations, e.g. a budget study, where $y$ is consumption of some good and $x$ = (prices, income); prices vary over time and (real) income varies over individuals.

Second, we can control the bias of the estimation, e.g. in labor economics, an earnings equation where $y$ is the wage and $x$ contains education, age, etc. Here the time-invariant unobservables, or individual effects, can be related to individual ability or intelligence.

Sources and types of panel data:

The Panel Study of Income Dynamics (PSID), collected by the Institute for Social Research at the University of Michigan (since 1968): information about economic status such as income, job, marital status and so on.

The Survey of Income and Program Participation (SIPP, US Department of Commerce) covers shorter time periods.

The study by Card (1992) on the effects of the minimum wage law on employment: information collected by US state on youth employment, unemployment rates, average wages and other factors.

Macro panels, such as the international data set obtained from version 5.5 of the Penn World Tables collected by Summers and Heston.

1.1 Models For Pooled Time Series

Here $N$ is relatively small, and $T$ is large enough to run separate regressions for each individual, but combining individuals may yield better (more efficient) estimates. Define the $NT \times NT$ covariance matrix

$$V = \mathrm{Cov}(\varepsilon) = E(\varepsilon\varepsilon') = E\begin{bmatrix} \varepsilon_1\varepsilon_1' & \varepsilon_1\varepsilon_2' & \cdots & \varepsilon_1\varepsilon_N' \\ \varepsilon_2\varepsilon_1' & \varepsilon_2\varepsilon_2' & \cdots & \varepsilon_2\varepsilon_N' \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_N\varepsilon_1' & \varepsilon_N\varepsilon_2' & \cdots & \varepsilon_N\varepsilon_N' \end{bmatrix} = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1N} \\ v_{21} & v_{22} & \cdots & v_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ v_{N1} & v_{N2} & \cdots & v_{NN} \end{bmatrix},$$

where the dimension of each block is $T \times T$. Assume that $\varepsilon \sim N(0, V)$ and that the $X$'s are exogenous. Then, we consider the two basic estimators:

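Ordinary least squares (OLS):

$$\hat\beta_{OLS} = (X'X)^{-1}X'y,$$

which is unbiased but inefficient; that is,

$$\mathrm{Cov}(\hat\beta_{OLS}) = (X'X)^{-1}X'VX(X'X)^{-1}.$$

Generalised least squares (GLS):

$$\hat\beta_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}y,$$

which is unbiased and efficient; that is,

$$\mathrm{Cov}(\hat\beta_{GLS}) = (X'V^{-1}X)^{-1}.$$

Exercise 1.1.1 Show that $\mathrm{Cov}(\hat\beta_{OLS}) \ge \mathrm{Cov}(\hat\beta_{GLS})$, in the sense that the difference is positive semidefinite.

We have different models depending on the specification of $V$.

1.1.1 Classical regression model

We have ideal conditions, such that the $\varepsilon_{it}$'s are iid $N(0, \sigma^2)$. Then

$$V = \sigma^2 I_{NT}, \qquad \text{GLS} = \text{OLS},$$

where $I_{NT}$ is an $NT \times NT$ identity matrix.

1.1.2 Cross-sectional heteroskedasticity: heterogeneous variances

We have ideal conditions except that

$$\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2 \ne \mathrm{Var}(\varepsilon_{jt}) = \sigma_j^2, \qquad \text{for } i \ne j,$$

that is, the error variance is allowed to vary across individuals. Then

$$V = \begin{bmatrix} \sigma_1^2 I_T & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_N^2 I_T \end{bmatrix}.$$

Therefore, the GLS estimator becomes the weighted least squares estimator given by

$$\hat\beta_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}y = \left( \sum_{i=1}^{N} \sigma_i^{-2} X_i'X_i \right)^{-1} \left( \sum_{i=1}^{N} \sigma_i^{-2} X_i'y_i \right),$$

where $X_i$ denotes the $T \times k$ regressor matrix of unit $i$ (written $x_i$ in (1.3)).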
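The covariance formulas above can be checked numerically. The following is a minimal sketch on simulated data, not part of the original notes: the dimensions, the variances in `sigma2` and the coefficient vector are illustrative assumptions. It builds $V$ for the groupwise heteroskedastic case, verifies that $\mathrm{Cov}(\hat\beta_{OLS}) - \mathrm{Cov}(\hat\beta_{GLS})$ has no negative eigenvalues, and confirms that matrix GLS equals the weighted sum over units.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 4, 30, 3                       # hypothetical panel dimensions
X = np.column_stack([np.ones(N * T), rng.normal(size=(N * T, k - 1))])

# Groupwise heteroskedasticity: V = blockdiag(sigma_i^2 I_T)
sigma2 = np.array([0.5, 1.0, 2.0, 4.0])  # illustrative unit variances
V = np.kron(np.diag(sigma2), np.eye(T))
V_inv = np.kron(np.diag(1.0 / sigma2), np.eye(T))

# Covariance matrices of OLS (sandwich form) and GLS
XtX_inv = np.linalg.inv(X.T @ X)
cov_ols = XtX_inv @ X.T @ V @ X @ XtX_inv
cov_gls = np.linalg.inv(X.T @ V_inv @ X)

# Cov(OLS) - Cov(GLS) should be positive semidefinite
eigvals = np.linalg.eigvalsh(cov_ols - cov_gls)

# GLS in matrix form versus the weighted sum over units
beta_true = np.array([1.0, 2.0, -1.0])   # illustrative coefficients
y = X @ beta_true + rng.normal(size=N * T) * np.sqrt(np.repeat(sigma2, T))
beta_gls = cov_gls @ X.T @ V_inv @ y

A = sum(X[i * T:(i + 1) * T].T @ X[i * T:(i + 1) * T] / sigma2[i] for i in range(N))
b = sum(X[i * T:(i + 1) * T].T @ y[i * T:(i + 1) * T] / sigma2[i] for i in range(N))
beta_wls = np.linalg.solve(A, b)

print("smallest eigenvalue of Cov(OLS) - Cov(GLS):", eigvals.min())
print("max |GLS - WLS|:", np.abs(beta_gls - beta_wls).max())
```

This is only a numerical illustration for one design; it does not replace the proof asked for in the exercise.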

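Noting that the $\sigma_i^2$'s are not observable, the feasible GLS (FGLS) estimator is obtained by

$$\hat\beta_{FGLS} = \left( \sum_{i=1}^{N} s_i^{-2} X_i'X_i \right)^{-1} \left( \sum_{i=1}^{N} s_i^{-2} X_i'y_i \right),$$

where

$$s_i^2 = \frac{1}{T-k} \left( y_i - X_i\hat\beta_{GLS} \right)' \left( y_i - X_i\hat\beta_{GLS} \right).$$

In practice the residuals are computed from a preliminary consistent estimate such as OLS. The FGLS estimator is consistent and asymptotically efficient (as $T \to \infty$ with $N$ fixed).

Example 1 Greene (1997)

$$I_{it} = \beta_1 + \beta_2 F_{it} + \beta_3 C_{it} + \varepsilon_{it},$$

where $N = 5$, $T = 20$, $I_{it}$ is gross investment, $F_{it}$ the market value of the firm and $C_{it}$ the value of plant and equipment. The reported standard errors are:

OLS: (0.011) on $F$ and (0.043) on $C$, with $R^2 = 0.78$.
FGLS: (0.0074) on $F$ and (0.030) on $C$.

1.1.3 Cross-sectional Heteroskedasticity: Correlation across groups

Now, there is correlation across individuals at the same time:

$$\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{js}) = \begin{cases} 0 & \text{for } t \ne s, \\ \sigma_{ij} & \text{for } t = s. \end{cases}$$

Then

$$V = \begin{bmatrix} \sigma_{11} I_T & \sigma_{12} I_T & \cdots & \sigma_{1N} I_T \\ \sigma_{21} I_T & \sigma_{22} I_T & \cdots & \sigma_{2N} I_T \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{N1} I_T & \sigma_{N2} I_T & \cdots & \sigma_{NN} I_T \end{bmatrix} = \Sigma \otimes I_T,$$

where the dimension of $\Sigma$ is $N \times N$. This is the same error structure as in the seemingly unrelated regressions model. FGLS can be obtained as before: replace the unknown $\sigma_{ij}$ in $V$ by

$$s_{ij} = \frac{1}{T-k} \left( y_i - X_i\hat\beta_{GLS} \right)' \left( y_j - X_j\hat\beta_{GLS} \right),$$

and define $\hat V$ as $V$ with $s_{ij}$ in place of $\sigma_{ij}$. Then

$$\hat\beta_{FGLS} = \left( X'\hat V^{-1}X \right)^{-1} \left( X'\hat V^{-1}y \right).$$

Again, the FGLS estimator is consistent and asymptotically normal as $T \to \infty$ with $N$ fixed.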
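The two-step FGLS estimator for unit-specific variances (Section 1.1.2) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the notes' own code: the function name `fgls_groupwise` is hypothetical, the data are assumed to be stacked unit by unit with $T$ consecutive rows per unit, and the first-step residuals are taken from pooled OLS.

```python
import numpy as np

def fgls_groupwise(y, X, N, T):
    """Two-step FGLS with unit-specific error variances.

    Assumes rows are stacked by unit (T consecutive rows per unit) and
    uses pooled OLS residuals in the first step.
    """
    k = X.shape[1]
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = (y - X @ beta_ols).reshape(N, T)
    s2 = (resid ** 2).sum(axis=1) / (T - k)     # s_i^2 for each unit
    w = np.repeat(1.0 / s2, T)                  # weight 1/s_i^2 per observation
    XtWX = X.T @ (X * w[:, None])               # sum_i s_i^{-2} X_i'X_i
    XtWy = X.T @ (w * y)                        # sum_i s_i^{-2} X_i'y_i
    beta_fgls = np.linalg.solve(XtWX, XtWy)
    return beta_fgls, s2, np.linalg.inv(XtWX)

# Simulated example with illustrative parameter values
rng = np.random.default_rng(0)
N, T = 5, 20
X = np.column_stack([np.ones(N * T), rng.normal(size=(N * T, 2))])
sigma = np.repeat(np.array([0.5, 1.0, 2.0, 3.0, 4.0]), T)
y = X @ np.array([1.0, 0.5, -0.3]) + sigma * rng.normal(size=N * T)

beta_fgls, s2, cov_fgls = fgls_groupwise(y, X, N, T)
print("FGLS coefficients:", beta_fgls)
print("estimated unit variances:", s2)
```

The same estimate results from OLS on data rescaled by $1/s_i$, which is often the more convenient route in practice.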

Example 2 Greene example continued: for FGLS, the reported standard errors are (0.0068) on $F$ and (0.027) on $C$.

For testing that the off-diagonal elements of $\Sigma$ are zero, that is, that there is no correlation across groups, we use the following LM statistic developed by Breusch and Pagan (1980):

$$LM = T \sum_{i=2}^{N} \sum_{j=1}^{i-1} r_{ij}^2,$$

where $r_{ij}^2$ is the squared $ij$th residual correlation coefficient. Under the (joint) null of $\sigma_{ij} = 0$ for $i, j = 1, 2, \dots, N$ and $i \ne j$, as $T \to \infty$,

$$LM \sim \chi^2_{N(N-1)/2}.$$

Example 3 Greene example continued: Using the residuals based on the FGLS estimates given above we find $LM = 51.32$, which is far greater than the 95% critical value of $\chi^2_{10}$. Hence we may conclude that the simple heteroskedastic model is not general enough for the investment data.

1.1.4 Autocorrelation (but not correlation across individuals)

See Greene (1997, Section 15.2.3).
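The Breusch and Pagan (1980) LM statistic of Section 1.1.3 is simple to compute from an $N \times T$ array of residuals. The sketch below is illustrative only: the function name `breusch_pagan_lm` is hypothetical, the residuals are simulated, and the correlations are formed from uncentered residual cross-products, in line with the definition of $s_{ij}$ (the scaling by $T - k$ cancels in the ratio).

```python
import numpy as np

def breusch_pagan_lm(resid):
    """LM = T * sum_{i>j} r_ij^2 for an (N, T) array of residuals."""
    N, T = resid.shape
    S = resid @ resid.T / T                  # cross-products s_ij (up to scaling)
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                   # residual correlations r_ij
    lower = np.tril_indices(N, k=-1)         # pairs with i > j
    lm = T * np.sum(R[lower] ** 2)
    df = N * (N - 1) // 2                    # chi-square degrees of freedom
    return lm, df

# Simulated residuals with no cross-unit correlation (illustrative)
rng = np.random.default_rng(0)
resid = rng.normal(size=(5, 20))
lm, df = breusch_pagan_lm(resid)
print("LM =", lm, "with", df, "degrees of freedom")
```

With $N = 5$ the statistic has 10 degrees of freedom, as in the Greene example.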


Chapter 2

Models for Longitudinal Data

Here we have large $N$ but small $T$; hence we use asymptotic theory as $N \to \infty$ with $T$ fixed.

Example 4 Panel Study of Income Dynamics (PSID), $N = 5000$, $T = 9$.

In principle the methods of the previous chapter could be applied, but this is problematic because only a few time periods are available. In this case the techniques have focused on cross-sectional variation, or heterogeneity. The basic assumption is that a time-invariant individual effect becomes part of the error process; that is, we consider the following error components-based panel,

$$y_{it} = x_{it}\beta + \varepsilon_{it}, \qquad i = 1, 2, \dots, N; \; t = 1, 2, \dots, T, \tag{2.1}$$

and $\varepsilon_{it}$ is decomposed as

$$\varepsilon_{it} = \alpha_i + u_{it}, \tag{2.2}$$

where the $\alpha_i$'s are called individual effects. Here we assume:

The $u_{it}$'s are $N(0, \sigma_u^2)$.

The $u_{it}$ are uncorrelated with $x_{js}$ for all $i, t, j, s$, i.e., the $x$'s are exogenous with respect to $u$.

But assumptions about $\alpha_i$ vary.

Example 5 We have observations for $T$ time periods on $N$ countries. We want to estimate the spillover effect of foreign technology on domestic firm productivity in manufacturing. An error components model describing output in each region with a standard Cobb-Douglas production function is given by

$$q_{it} = \beta_1 k_{it} + \beta_2 l_{it} + \beta_3 m_{it} + \beta_4 f_{it} + \alpha_i + u_{it},$$

where $q_{it}$ is the log of output of domestic firms, $k_{it}$ the log of capital, $l_{it}$ the log of labor, $m_{it}$ the log of materials, and $f_{it}$ a measure of the influence of foreign firms. A positive spillover is indicated by $\beta_4 > 0$.

Why should we allow for the unobserved individual effects $\alpha_i$? One reason is that an observed positive relationship between output in a region and foreign influence, controlling only for capital, labor and materials, might simply reflect the fact that foreign firms tend to settle in areas that lend themselves to higher productivity; there may be no spillover effect at all. By adding $\alpha_i$ we allow $f_{it}$ to be correlated with features of the region, embodied in $\alpha_i$, that are related to higher productivity. This solution is an improvement over not allowing for $\alpha_i$.

Example 6 Let $y_{it}$ be net migration into city $i$ at time $t$. We would like to see whether taxes, housing prices, educational quality and other factors influence population flows. There are certain features of cities, for example geographical characteristics or reputation, that could be difficult to model but are essentially constant over short periods of time. Because the unobservables influence $y_{it}$ and might also be related to local policy and economic variables, it is important to control for them. One model would be

$$y_{it} = \beta_1 x_{it} + \beta_2 h_{it} + \beta_3 e_{it} + \beta_4 c_{it} + \alpha_i + u_{it},$$

where $x$ are tax rates, $h$ housing prices, $e$ educational quality and $c$ crime rates. Now the $\alpha_i$ capture all time-constant (unobservable) differences between cities that might affect migration. Thus, the above regression allows us to estimate

$$E(y_{it} \mid x_{it}, h_{it}, e_{it}, c_{it}, \alpha_i),$$

which makes it clear that we are controlling for unobserved city effects when estimating, for example, the effects of tax policy on net migration.

There are two main approaches to dealing with the $\alpha_i$'s: fixed effects and random effects.

2.1 Fixed Effects Estimator

Here we treat $\alpha_i$ as fixed, but remember that we do not assume $\alpha_i$ to be uncorrelated with $x_{it}$. This implies that differences across cross-section units can be captured in differences in the constant terms. Notice that the regression of $y_{it}$ on $x_{it}$ only is biased if $\alpha_i$ is correlated with $x_{it}$. We want to avoid this bias by using fixed effects estimation. Defining

$$\underset{T \times 1}{e_T} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}, \qquad \underset{N \times 1}{\alpha} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{bmatrix}, \qquad \underset{NT \times N}{D} = \begin{bmatrix} e_T & 0 & \cdots & 0 \\ 0 & e_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e_T \end{bmatrix} = I_N \otimes e_T,$$

and the stacked $NT \times 1$ vector of individual effects

$$\underset{NT \times 1}{\boldsymbol{\alpha}} = (\alpha_1, \dots, \alpha_1, \alpha_2, \dots, \alpha_2, \dots, \alpha_N, \dots, \alpha_N)' = \begin{bmatrix} \alpha_1 e_T \\ \alpha_2 e_T \\ \vdots \\ \alpha_N e_T \end{bmatrix} = D\alpha = \alpha \otimes e_T,$$

then we have (in $T$ observations for each individual)

$$y_i = x_i\beta + \alpha_i e_T + u_i, \tag{2.3}$$

or (in $NT$ observations)

$$y = X\beta + \boldsymbol{\alpha} + u = X\beta + D\alpha + u. \tag{2.4}$$

Notice that the bias in the regression of $y$ on $X$ only is due to the omission of $D$ in (2.4). The solution is simply to regress $y$ on $(X, D)$. This is sometimes called the least squares dummy variable (LSDV) model; that is,

$\hat\beta$ = coefficients of $X$ in the regression of $y$ on $(X, D)$.

But there is a computational problem for large $N$, since the dimension of $D$ is very big ($N$ columns). So we need an alternative formulation. Define the $NT \times NT$ matrices $P$ and $Q$ by

$$P = D(D'D)^{-1}D', \qquad Q = I_{NT} - P = I_{NT} - D(D'D)^{-1}D',$$

then a standard result from least squares algebra says:

$$\hat\beta = (X'QX)^{-1}(X'Qy).$$

Digression on algebra: Notice that $e_T'e_T = T$ and that $e_Te_T'$ is the $T \times T$ matrix of ones.

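Defining

$$\bar P = \frac{1}{T} e_Te_T' = \begin{bmatrix} \frac{1}{T} & \frac{1}{T} & \cdots & \frac{1}{T} \\ \frac{1}{T} & \frac{1}{T} & \cdots & \frac{1}{T} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{T} & \frac{1}{T} & \cdots & \frac{1}{T} \end{bmatrix},$$

then this matrix creates means for any $T \times 1$ vector $c = [c_1, \dots, c_T]'$:

$$\bar P c = (\bar c, \bar c, \dots, \bar c)' = \bar c\, e_T, \qquad \bar c = \frac{1}{T} \sum_{t=1}^{T} c_t.$$

Next, define

$$\bar Q = I_T - \frac{1}{T} e_Te_T' = \begin{bmatrix} 1 - \frac{1}{T} & -\frac{1}{T} & \cdots & -\frac{1}{T} \\ -\frac{1}{T} & 1 - \frac{1}{T} & \cdots & -\frac{1}{T} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{T} & -\frac{1}{T} & \cdots & 1 - \frac{1}{T} \end{bmatrix},$$

which makes deviations from means; that is,

$$\bar Q c = (c_1 - \bar c, \; c_2 - \bar c, \; \dots, \; c_T - \bar c)'.$$

These matrices are relevant because

$$P = I_N \otimes \bar P, \qquad Q = I_N \otimes \bar Q,$$

which are idempotent and symmetric,

$$PP = P, \quad QQ = Q, \quad P' = P, \quad Q' = Q,$$

and orthogonal to each other,

$$PQ = 0.$$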
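The algebra of $P$ and $Q$ can be verified on a small simulated panel. The sketch below is an illustration with assumed dimensions and parameter values, not part of the original notes. It builds $D$, $P$ and $Q$ from Kronecker products, checks the stated properties, and confirms that $(X'QX)^{-1}X'Qy$ reproduces the coefficients on $X$ from the dummy variable regression of $y$ on $(X, D)$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, k = 5, 4, 2                          # hypothetical panel dimensions

e_T = np.ones((T, 1))
P_bar = e_T @ e_T.T / T                    # T x T averaging matrix
Q_bar = np.eye(T) - P_bar                  # T x T demeaning matrix
P = np.kron(np.eye(N), P_bar)              # NT x NT
Q = np.kron(np.eye(N), Q_bar)              # NT x NT
D = np.kron(np.eye(N), e_T)                # NT x N matrix of unit dummies

# Simulated data with individual effects (illustrative values)
X = rng.normal(size=(N * T, k))
alpha = rng.normal(size=N)
y = X @ np.array([1.0, -0.5]) + D @ alpha + rng.normal(size=N * T)

# Estimator based on Q (no dummies needed)
beta_within = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)

# LSDV: regress y on (X, D) and keep the coefficients on X
coef_lsdv = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0]
beta_lsdv = coef_lsdv[:k]

print("idempotent:", np.allclose(P @ P, P), np.allclose(Q @ Q, Q))
print("orthogonal:", np.allclose(P @ Q, 0))
print("Q-based and LSDV coefficients agree:", np.allclose(beta_within, beta_lsdv))
```

For large $N$ one would not form the $NT \times NT$ matrices explicitly; subtracting unit means from each variable gives the same result at far lower cost.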

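The matrices $P$ and $Q$ produce individual means and deviations from individual means:

$$Py = (\bar y_1, \dots, \bar y_1, \; \bar y_2, \dots, \bar y_2, \; \dots, \; \bar y_N, \dots, \bar y_N)', \qquad Qy = (y_{11} - \bar y_1, \dots, y_{1T} - \bar y_1, \; y_{21} - \bar y_2, \dots, y_{2T} - \bar y_2, \; \dots, \; y_{N1} - \bar y_N, \dots, y_{NT} - \bar y_N)', \tag{2.5}$$

where $\bar y_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$, and similarly for $\bar x_i$ and $\bar u_i$. In particular, note that

$$Qy = Q(X\beta + \boldsymbol{\alpha} + u) = QX\beta + Qu,$$

because $Q\boldsymbol{\alpha} = 0$. Therefore, taking deviations from (individual) means removes time-invariant unobservables. Multiplication by $Q$ (taking deviations from individual means) is often called the within transformation.

Within estimator: This can be obtained by any one of the following equivalent procedures:

1 Regression of $Qy$ on $QX$.

2 Regression of $y_{it} - \bar y_i$ on $x_{it} - \bar x_i$, $i = 1, \dots, N$, $t = 1, \dots, T$, where $\bar y_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$ and $\bar x_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$.

3 OLS estimation with dummy variables (LSDV).

The within estimator is obtained by

$$\hat\beta_W = (X'QX)^{-1}X'Qy.$$

Footnote 1. Using the (double) summation notation, we have

$$\hat\beta_W = \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)'(x_{it} - \bar x_i) \right\}^{-1} \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)'(y_{it} - \bar y_i) \right\}.$$

Here we should bear in mind that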

1 We cannot include any time-invariant regressors.

2 If you estimate by within-estimation and still want $\hat\alpha_i$, then $\hat\alpha_i = \bar y_i - \bar x_i\hat\beta_W$.

3 Statistical properties of $\hat\beta_W$: it is unbiased, consistent (as $N$ or $T \to \infty$) and asymptotically normal,

$$\sqrt{NT}\,(\hat\beta_W - \beta) \to N\left( 0, \; \sigma_u^2 \lim_{NT \to \infty} \left( \frac{1}{NT} X'QX \right)^{-1} \right).$$

The usual inference procedures, such as $t$ and Wald tests, can be used.

Footnote 2. A consistent estimate of $\sigma_u^2$ is obtained as the within residual sum of squares divided by $N(T-1)$, that is,

$$\hat\sigma_u^2 = \frac{1}{N(T-1)} \sum_{i=1}^{N} \sum_{t=1}^{T} \left\{ (y_{it} - \bar y_i) - (x_{it} - \bar x_i)\hat\beta_W \right\}^2.$$

It is also possible to apply the usual degrees of freedom correction, in which case $k$ is subtracted from the denominator.

In sum, fixed effects estimation removes the potential bias caused by time-invariant unobservables by means of the within transformation. The cost is that only variation over time (not between individuals) is used in estimating $\beta$, which may result in imprecise estimates. Essentially the fixed effects model concentrates on differences within individuals; it explains to what extent $y_{it}$ differs from $\bar y_i$ and does not explain why $\bar y_i$ is different from $\bar y_j$. It is important to realize that the $\beta$'s are identified (or consistently estimated) only through the within variation of the data.

2.2 Random Effects Estimator

We consider the same basic model, (2.1)-(2.2), but now assume:

The $u_{it}$'s are iid $N(0, \sigma_u^2)$.

The $\alpha_i$'s are iid $N(0, \sigma_\alpha^2)$.

The $\alpha_i$'s are uncorrelated with $u_{jt}$ for all $i, j, t$, that is, $E[\alpha_i u_{jt}] = 0$ for all $i, j, t$.

$\alpha_i$ and $u_{it}$ are uncorrelated with $x_{js}$ for all $i, j, s, t$ (so $x$ is exogenous with respect to $\alpha$ and $u$).

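This approach would be appropriate if we believed that the sampled cross-sectional units were drawn from a large population. First, the OLS estimator is unbiased or consistent, because $x$ is assumed to be exogenous with respect to $\varepsilon = \alpha + u$, but it is inefficient. This model is suitable for the case of pooling, not of bias reduction as in the fixed effects model. In general, the GLS estimator will be more efficient. For GLS estimation we need a more detailed expression for $\mathrm{Cov}(\varepsilon) = V$. First, consider

$$\mathrm{Cov}(\varepsilon_i) = E\begin{bmatrix} \varepsilon_{i1}^2 & \varepsilon_{i1}\varepsilon_{i2} & \cdots & \varepsilon_{i1}\varepsilon_{iT} \\ \varepsilon_{i2}\varepsilon_{i1} & \varepsilon_{i2}^2 & \cdots & \varepsilon_{i2}\varepsilon_{iT} \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_{iT}\varepsilon_{i1} & \varepsilon_{iT}\varepsilon_{i2} & \cdots & \varepsilon_{iT}^2 \end{bmatrix}.$$

Under the assumptions given above it is easily seen that

$$E(\varepsilon_{it}^2) = E(\alpha_i^2 + u_{it}^2 + 2\alpha_i u_{it}) = \sigma_\alpha^2 + \sigma_u^2,$$

$$E(\varepsilon_{it}\varepsilon_{is}) = E(\alpha_i + u_{it})(\alpha_i + u_{is}) = \sigma_\alpha^2, \qquad t \ne s.$$

Hence, for all $i$,

$$\mathrm{Cov}(\varepsilon_i) = \sigma_\alpha^2 e_Te_T' + \sigma_u^2 I_T = \begin{bmatrix} \sigma_\alpha^2 + \sigma_u^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_u^2 & \cdots & \sigma_\alpha^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 + \sigma_u^2 \end{bmatrix} \equiv \underset{T \times T}{\Omega}.$$

Therefore, the $NT \times NT$ matrix $V$ can be written as

$$V = \mathrm{Cov}(\varepsilon) = \begin{bmatrix} \Omega & 0 & \cdots & 0 \\ 0 & \Omega & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega \end{bmatrix} = I_N \otimes \Omega, \tag{2.6}$$

where $\otimes$ is the Kronecker product. Now, recall

$$\bar P = \frac{1}{T} e_Te_T', \qquad \bar Q = I_T - \frac{1}{T} e_Te_T', \qquad P = I_N \otimes \bar P, \qquad Q = I_N \otimes \bar Q.$$

Then $\Omega$ can be rewritten as

$$\Omega = (T\sigma_\alpha^2 + \sigma_u^2)\, \frac{1}{T} e_Te_T' + \sigma_u^2 \left( I_T - \frac{1}{T} e_Te_T' \right) = (T\sigma_\alpha^2 + \sigma_u^2)\, \bar P + \sigma_u^2\, \bar Q.$$

Next,

$$V = I_N \otimes \Omega = (T\sigma_\alpha^2 + \sigma_u^2)\, P + \sigma_u^2\, Q, \tag{2.7}$$

where we used $P = I_N \otimes \bar P$ and $Q = I_N \otimes \bar Q$.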

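Digression on derivation of the inverse of $V$: For GLS estimation we need to find

$$V^{-1} = I_N \otimes \Omega^{-1} \qquad \text{or} \qquad V^{-1/2} = I_N \otimes \Omega^{-1/2}.$$

Using the special nature of $P$ and $Q$, it can be shown that

$$V^{-1} = \frac{1}{T\sigma_\alpha^2 + \sigma_u^2}\, P + \frac{1}{\sigma_u^2}\, Q, \tag{2.8}$$

Footnote 3. Similarly, the inverse of $\Omega$ can be obtained as $\Omega^{-1} = \frac{1}{T\sigma_\alpha^2 + \sigma_u^2}\, \bar P + \frac{1}{\sigma_u^2}\, \bar Q$.

and then

$$V^{-1/2} = \frac{1}{\sqrt{T\sigma_\alpha^2 + \sigma_u^2}}\, P + \frac{1}{\sigma_u}\, Q = \frac{1}{\sigma_u} \left\{ \sqrt{\frac{\sigma_u^2}{T\sigma_\alpha^2 + \sigma_u^2}}\; P + Q \right\} = \frac{1}{\sigma_u} \left\{ I_{NT} - \left( 1 - \sqrt{\frac{\sigma_u^2}{T\sigma_\alpha^2 + \sigma_u^2}} \right) P \right\}.$$

Define

$$\theta = 1 - \sqrt{\frac{\sigma_u^2}{T\sigma_\alpha^2 + \sigma_u^2}}, \tag{2.9a}$$

then

$$V^{-1/2} = \frac{1}{\sigma_u} (I_{NT} - \theta P). \tag{2.10}$$

Notice that the GLS estimator is obtained by

$$\hat\beta_{GLS} = (X'V^{-1}X)^{-1}(X'V^{-1}y) = \left[ (V^{-1/2}X)'(V^{-1/2}X) \right]^{-1} (V^{-1/2}X)'(V^{-1/2}y),$$

and therefore $\hat\beta_{GLS}$ is obtained from the regression of $V^{-1/2}y$ on $V^{-1/2}X$. In fact, this is equivalent to OLS estimation after $\theta$-differencing, since

$$V^{-1/2}y = \frac{1}{\sigma_u} (I_{NT} - \theta P)\, y = \frac{1}{\sigma_u} (y - \theta Py),$$

or more precisely

$$\left( V^{-1/2}y \right)_{it} = \frac{1}{\sigma_u} (y_{it} - \theta \bar y_i),$$

and likewise

$$\left( V^{-1/2}X \right)_{it} = \frac{1}{\sigma_u} (x_{it} - \theta \bar x_i),$$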

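where $\bar y_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$ and $\bar x_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$. So the GLS estimator is obtained from the regression

$$(y_{it} - \theta \bar y_i) = (x_{it} - \theta \bar x_i)\beta + \varepsilon_{it}^*,$$

where $\varepsilon_{it}^* = \varepsilon_{it} - \theta \bar\varepsilon_i$ (the proportionality constant $1/\sigma_u$ being cancelled out). In other words, a fixed proportion $\theta$ of the individual means is subtracted from the data to obtain this transformed model. We note in passing that

1 We can include time-invariant or individual-specific variables. They also get multiplied by $I_{NT} - \theta P$.

2 If $\theta = 0$, then the GLS estimator is equivalent to OLS. But this would occur only if $\sigma_\alpha^2 = 0$.

3 The GLS estimator is consistent as $N \to \infty$ (with $T$ fixed, or with $T \to \infty$ such that $N/T \to$ constant). It is also asymptotically efficient relative to the within estimator, with

$$\mathrm{Cov}(\hat\beta_{GLS}) = (X'V^{-1}X)^{-1} = \left\{ X' \left( \frac{1}{T\sigma_\alpha^2 + \sigma_u^2}\, P + \frac{1}{\sigma_u^2}\, Q \right) X \right\}^{-1}.$$

4 The efficiency difference tends to 0 as $T \to \infty$ and $\theta \to 1$. (When $\theta = 1$, $\hat\beta_{GLS} = \hat\beta_W$.)

Between estimator

We formulate the model in terms of the individual means,

$$\bar{\mathbf{y}}_i = \bar{\mathbf{x}}_i\beta + \bar{\boldsymbol{\varepsilon}}_i, \qquad i = 1, \dots, N, \tag{2.11}$$

where

$$\underset{T \times 1}{\bar{\mathbf{y}}_i} = \bar y_i\, e_T, \qquad \underset{T \times k}{\bar{\mathbf{x}}_i} = \bar x_i \otimes e_T, \qquad \underset{T \times 1}{\bar{\boldsymbol{\varepsilon}}_i} = (\alpha_i + \bar u_i)\, e_T, \qquad \bar y_i = T^{-1} \sum_{t=1}^{T} y_{it}, \qquad \bar x_i = T^{-1} \sum_{t=1}^{T} x_{it}.$$

Recalling that the matrix $P$ creates means for any conformable vector, we write (2.11) in matrix form as

$$Py = PX\beta + P\varepsilon. \tag{2.12}$$

The OLS estimator in the above regression gives the between estimator,

$$\hat\beta_B = (X'PX)^{-1}X'Py, \tag{2.13}$$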

which is also unbiased and consistent as $N \to \infty$ under the assumption that $\bar x_i$ is uncorrelated with $\alpha_i$ (such that $E(\bar x_i'\bar\varepsilon_i) = 0$), and with

$$\mathrm{Var}(\hat\beta_B) = (T\sigma_\alpha^2 + \sigma_u^2)(X'PX)^{-1}.$$

The between estimator ignores any information within individuals. The formula in (2.13) can be simplified as follows. Notice from (2.5) that $Py$ and $PX$ can be written as

$$Py = \bar y \otimes e_T, \qquad PX = \bar X \otimes e_T,$$

where

$$\underset{N \times 1}{\bar y} = \begin{bmatrix} \bar y_1 \\ \vdots \\ \bar y_N \end{bmatrix}, \qquad \underset{N \times k}{\bar X} = \begin{bmatrix} \bar x_1 \\ \vdots \\ \bar x_N \end{bmatrix},$$

and thus we have

$$\hat\beta_B = \left\{ (\bar X \otimes e_T)'(\bar X \otimes e_T) \right\}^{-1} (\bar X \otimes e_T)'(\bar y \otimes e_T) \tag{2.14}$$

$$= \left( \bar X'\bar X \otimes e_T'e_T \right)^{-1} \left( \bar X'\bar y \otimes e_T'e_T \right) = \left( \bar X'\bar X\, T \right)^{-1} \left( \bar X'\bar y\, T \right) = (\bar X'\bar X)^{-1}\, T^{-1}\, T\, \bar X'\bar y = (\bar X'\bar X)^{-1}\bar X'\bar y,$$

which is equivalent to the OLS estimator obtained from the following cross-sectional regression:

$$\bar y_i = \bar x_i\beta + \bar\varepsilon_i, \qquad i = 1, \dots, N. \tag{2.15}$$

The GLS estimator can be shown to be a weighted average of the within estimator $\hat\beta_W$ and the between estimator $\hat\beta_B$,

$$\hat\beta_{GLS} = F\hat\beta_W + (I_k - F)\hat\beta_B,$$

where

$$F = \left( X'QX + \lambda X'PX \right)^{-1} X'QX, \qquad \lambda = (1 - \theta)^2.$$

This clearly shows that the efficiency gain of GLS relative to the within estimator comes from the use of between (across individuals) variation. The GLS estimator is the optimal combination of the within and the between estimators. Therefore, it is more efficient than either. There are some polar cases to consider:

1 If $\lambda = 1$, GLS is equivalent to OLS. This would occur only if $\sigma_\alpha^2 = 0$. Thus, the OLS estimator is also a linear combination of the within and the between estimators, but an inefficient one.

2 If $\lambda = 0$, GLS is equivalent to the within estimator. There are two possibilities. The first is $\sigma_u^2 = 0$, in which case all variation across individuals would be due to the $\alpha_i$'s, which would be equivalent to the dummy variables used in the fixed effects model. The question of whether they were fixed or random would be moot. The other case is $T \to \infty$.
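The three routes to the GLS estimator described above can be compared numerically. The following sketch uses simulated data and treats the variance components as known; the dimensions and parameter values are illustrative assumptions. It computes GLS directly from $V^{-1}$ in (2.8), by OLS after $\theta$-differencing, and as the matrix-weighted average of the within and between estimators.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, k = 6, 5, 2                            # hypothetical panel dimensions
sigma2_a, sigma2_u = 2.0, 1.0                # variance components, assumed known

P = np.kron(np.eye(N), np.ones((T, T)) / T)  # individual means
Q = np.eye(N * T) - P                        # deviations from individual means

X = rng.normal(size=(N * T, k))
alpha = np.repeat(rng.normal(scale=np.sqrt(sigma2_a), size=N), T)
u = rng.normal(scale=np.sqrt(sigma2_u), size=N * T)
y = X @ np.array([1.0, -0.5]) + alpha + u

# (a) Direct GLS using V^{-1} from (2.8)
V = (T * sigma2_a + sigma2_u) * P + sigma2_u * Q
V_inv = P / (T * sigma2_a + sigma2_u) + Q / sigma2_u
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

# (b) OLS after theta-differencing
theta = 1.0 - np.sqrt(sigma2_u / (T * sigma2_a + sigma2_u))
y_star = y - theta * (P @ y)
X_star = X - theta * (P @ X)
beta_theta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# (c) Matrix-weighted average of within and between estimators
beta_w = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)
beta_b = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)
lam = (1.0 - theta) ** 2
F = np.linalg.solve(X.T @ Q @ X + lam * (X.T @ P @ X), X.T @ Q @ X)
beta_avg = F @ beta_w + (np.eye(k) - F) @ beta_b

print(beta_gls, beta_theta, beta_avg)
```

All three vectors coincide up to rounding, which illustrates why $\theta$-differencing is the usual computational route: it never requires an $NT \times NT$ matrix.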

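Feasible GLS estimator

We need to find a consistent (as $N \to \infty$) estimator of

$$\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_\alpha^2}},$$

and then of $V$ (see (2.7)). The estimator of $\sigma_u^2$ is easily obtained from the within residuals (see (2.4)); it is denoted $\hat\sigma_u^2$ and estimated by

$$\hat\sigma_u^2 = \frac{1}{N(T-1) - k} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat u_{it}^2, \tag{2.16}$$

with the within residuals given by

$$\hat u_{it} = (y_{it} - \bar y_i) - (x_{it} - \bar x_i)\hat\beta_W.$$

From the between regression (2.15) we find

$$\sigma_B^2 = E(\bar\varepsilon_i^2) = E(\alpha_i^2 + \bar u_i^2 + 2\alpha_i \bar u_i) = \sigma_\alpha^2 + \frac{1}{T}\sigma_u^2.$$

Hence, a consistent estimator of $\sigma_\alpha^2$ is obtained by

$$\hat\sigma_\alpha^2 = \hat\sigma_B^2 - \frac{1}{T}\hat\sigma_u^2,$$

where $\hat\sigma_B^2$ is the consistent estimator of $\sigma_B^2 = \sigma_\alpha^2 + \frac{1}{T}\sigma_u^2$ obtained by

$$\hat\sigma_B^2 = \frac{1}{N - k} \sum_{i=1}^{N} \hat{\bar\varepsilon}_i^{\,2}, \tag{2.17}$$

with the between residuals given by

$$\hat{\bar\varepsilon}_i = \bar y_i - \bar x_i\hat\beta_B.$$

Footnote 4. It is also possible to apply a degrees of freedom correction in computing $\hat\sigma_{u,W}^2$ and $\hat\sigma_B^2$.

Now, writing $\hat\sigma_{u,W}^2$ for the within-based estimate $\hat\sigma_u^2$ in (2.16), we have

$$\hat\theta = 1 - \sqrt{\frac{\hat\sigma_{u,W}^2}{\hat\sigma_{u,W}^2 + T\hat\sigma_\alpha^2}},$$

and the resulting feasible GLS estimator is the random effects estimator,

$$\hat\beta_{RE} = (X'\hat V^{-1}X)^{-1}(X'\hat V^{-1}y), \tag{2.18}$$

where

$$\hat V^{-1} = \frac{1}{\hat\sigma_{u,W}^2 + T\hat\sigma_\alpha^2}\, P + \frac{1}{\hat\sigma_{u,W}^2}\, Q.$$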
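The feasible GLS steps above can be put together in a short function. This is a sketch under stated assumptions rather than a reference implementation: the name `random_effects_fgls` is hypothetical, the panel is balanced and stacked unit by unit, $X$ contains no time-invariant columns (so the within regression is well defined), and the final step uses $\hat\theta$-differencing, which is algebraically the same as (2.18).

```python
import numpy as np

def random_effects_fgls(y, X, N, T):
    """Feasible GLS (random effects) for a balanced panel stacked by unit."""
    k = X.shape[1]
    Y = y.reshape(N, T)
    Xa = X.reshape(N, T, k)
    y_bar = Y.mean(axis=1)                    # individual means of y, shape (N,)
    x_bar = Xa.mean(axis=1)                   # individual means of x, shape (N, k)

    # Within regression and sigma_u^2 as in (2.16)
    y_w = (Y - y_bar[:, None]).ravel()
    X_w = (Xa - x_bar[:, None, :]).reshape(N * T, k)
    beta_w = np.linalg.lstsq(X_w, y_w, rcond=None)[0]
    s2_u = np.sum((y_w - X_w @ beta_w) ** 2) / (N * (T - 1) - k)

    # Between regression and sigma_B^2 as in (2.17)
    beta_b = np.linalg.lstsq(x_bar, y_bar, rcond=None)[0]
    s2_B = np.sum((y_bar - x_bar @ beta_b) ** 2) / (N - k)

    s2_a = s2_B - s2_u / T                    # may be negative in finite samples
    theta = 1.0 - np.sqrt(s2_u / (s2_u + T * s2_a))

    # OLS on theta-differenced data
    y_star = (Y - theta * y_bar[:, None]).ravel()
    X_star = (Xa - theta * x_bar[:, None, :]).reshape(N * T, k)
    beta_re = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
    return {"beta_re": beta_re, "theta": theta, "sigma2_u": s2_u,
            "sigma2_alpha": s2_a, "sigma2_B": s2_B}

# Simulated example with illustrative parameter values
rng = np.random.default_rng(3)
N, T, k = 50, 5, 2
X = rng.normal(size=(N * T, k))
alpha = np.repeat(rng.normal(scale=1.5, size=N), T)
y = X @ np.array([1.0, -0.5]) + alpha + rng.normal(size=N * T)

res = random_effects_fgls(y, X, N, T)
print(res)
```

The function reports $\hat\sigma_\alpha^2$ as computed, without truncating it at zero, so a negative value remains visible to the user.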

One thing to note is that the implied estimate of $\sigma_\alpha^2$ may be negative in finite samples. Such a negative finding calls the specification of the model into question. See Greene, Section 16.4.3b.

None of the desirable properties of the random effects estimator relies on $T \to \infty$, although it can be shown that some consistency results follow for $T$ increasing. On the other hand, in this case the fixed effects estimator does rely on $T$ increasing for consistency. See Nickell (1981).

2.3 Fixed Effects or Random Effects

Whether to treat the individual effects $\alpha_i$ as fixed or random is not an easy question to answer. The most common view is that the discussion should not be about the true nature of $\alpha_i$. The appropriate interpretation is that the fixed effects approach is conditional on the values of $\alpha_i$. This makes sense if the individuals in the sample are one of a kind and cannot be viewed as a random draw from some underlying population. This is probably most appropriate when $i$ denotes (large) companies or industries.

In contrast, the random effects approach is not conditional on the individual $\alpha_i$'s but integrates them out. In this case we are usually not interested in the particular value of some individual's $\alpha_i$; we just focus on arbitrary individuals that have certain characteristics. The random effects approach allows one to make inference with respect to population characteristics.

One way of formalizing this is to note that the random effects model states that

$$E(y_{it} \mid x_{it}) = x_{it}\beta,$$

while the fixed effects model estimates

$$E(y_{it} \mid x_{it}, \alpha_i) = x_{it}\beta + \alpha_i.$$

The $\beta$'s in these two conditional expectations are the same only if $E(\alpha_i \mid x_{it}) = 0$.

However, even if we are interested in the larger population of individuals and a random effects framework seems appropriate, the fixed effects estimator may be preferred, since it is likely that $x_{it}$ and $\alpha_i$ are correlated, in which case the random effects approach, ignoring this correlation, leads to inconsistent estimators. This problem of correlation can be handled only by using the fixed effects approach.

2.3.1 Hausman Test

The general specification test suggested by Hausman (1978) can be used to test the null hypothesis

$H_0$: $x_{it}$ and $\alpha_i$ are uncorrelated,

against the alternative hypothesis

$H_1$: $x_{it}$ and $\alpha_i$ are correlated.

This test is based on the idea that the fixed effects estimator is consistent under both the null and the alternative, while the random effects estimator is consistent only under the null, where it is also efficient. Let us consider the difference between $\hat\beta_W$ and $\hat\beta_{RE}$. To evaluate the significance of this difference we need to find its covariance matrix. Under the null the two estimates should not differ significantly, and it can also be shown under the null that

$$\mathrm{Var}(\hat\beta_W - \hat\beta_{RE}) = \mathrm{Var}(\hat\beta_W) + \mathrm{Var}(\hat\beta_{RE}) - \mathrm{Cov}(\hat\beta_W, \hat\beta_{RE}) - \mathrm{Cov}(\hat\beta_W, \hat\beta_{RE})' = \mathrm{Var}(\hat\beta_W) - \mathrm{Var}(\hat\beta_{RE}), \tag{2.19}$$

where we used Hausman's essential result that

$$\mathrm{Cov}\left( (\hat\beta_W - \hat\beta_{RE}), \hat\beta_{RE} \right) = \mathrm{Cov}(\hat\beta_W, \hat\beta_{RE}) - \mathrm{Var}(\hat\beta_{RE}) = 0,$$

or

$$\mathrm{Cov}(\hat\beta_W, \hat\beta_{RE}) = \mathrm{Var}(\hat\beta_{RE}).$$

Consequently, the Hausman test statistic is defined as

$$h = (\hat\beta_W - \hat\beta_{RE})' \left\{ \widehat{\mathrm{Var}}(\hat\beta_W) - \widehat{\mathrm{Var}}(\hat\beta_{RE}) \right\}^{-1} (\hat\beta_W - \hat\beta_{RE}), \tag{2.20}$$

where $\widehat{\mathrm{Var}}(\hat\beta_W)$ and $\widehat{\mathrm{Var}}(\hat\beta_{RE})$ denote the estimates of $\mathrm{Var}(\hat\beta_W)$ and $\mathrm{Var}(\hat\beta_{RE})$. Under the null,

$$h \sim \chi^2(k),$$

where $k$ is the number of parameters. The Hausman test thus tests whether the fixed effects and random effects estimators are significantly different. It is also possible to test for a subset of the parameters in $\beta$.

2.3.2 Random Effects Correlated with Regressors

The essential difference between fixed and random effects is whether or not the individual effects are correlated with the regressors. We now consider the case where the random effects are correlated with the regressors. Mundlak (1978) argued that the dichotomy between fixed effects and random effects models disappears if we make the assumption that the $\alpha_i$ depend on the mean values of $x_i$, an assumption he regards as reasonable in many problems. As before, consider the error components model,

$$y_i = x_i\beta + \alpha_i e_T + u_i, \tag{2.21}$$

but now assume

$$\alpha_i = \bar x_i\pi + w_i,$$

where $w_i$ has the same properties that $\alpha_i$ was assumed to have; that is,

1 The $w_i$'s are iid $N(0, \sigma_w^2)$.

2 The $w_i$'s are uncorrelated with $u_{jt}$ for all $i, j, t$; $E[w_i u_{jt}] = 0$ for all $i, j, t$.

3 The $w_i$'s are uncorrelated with $x_{jt}$ for all $i, j, t$; $E[w_i x_{jt}] = 0$ for all $i, j, t$.

Then, we rewrite (2.21) as

$$y_i = x_i\beta + (\bar x_i\pi + w_i)\, e_T + u_i, \tag{2.22}$$

which can be written in matrix form as

$$y = X\beta + PX\pi + w + u = \begin{bmatrix} X & PX \end{bmatrix} \begin{bmatrix} \beta \\ \pi \end{bmatrix} + w + u,$$

where we used $\boldsymbol{\alpha} = (PX)\pi + w$, with $w$ the stacked $NT \times 1$ vector of the $w_i$'s, and $\mathrm{Cov}(w + u) = V$ (see (2.6)). Carrying out the GLS estimation, then

$$\begin{bmatrix} \hat\beta_{GLS} \\ \hat\pi_{GLS} \end{bmatrix} = \left\{ \begin{bmatrix} X & PX \end{bmatrix}' V^{-1} \begin{bmatrix} X & PX \end{bmatrix} \right\}^{-1} \begin{bmatrix} X & PX \end{bmatrix}' V^{-1} y. \tag{2.23}$$

After some algebra, it can be shown that

$$\hat\beta_{GLS} = \hat\beta_W = (X'QX)^{-1}X'Qy, \tag{2.24}$$

$$\hat\pi_{GLS} = \hat\beta_B - \hat\beta_W = (X'PX)^{-1}X'Py - (X'QX)^{-1}X'Qy, \tag{2.25}$$

with

$$\mathrm{Var}(\hat\pi_{GLS}) = \mathrm{Var}(\hat\beta_B) + \mathrm{Var}(\hat\beta_W) = (T\sigma_w^2 + \sigma_u^2)(X'PX)^{-1} + \sigma_u^2 (X'QX)^{-1}.$$

Footnote 5. The same result can be derived alternatively. Applying the GLS transformation described earlier to (2.22), we have

$$y_{it} - \theta \bar y_i = (x_{it} - \theta \bar x_i)\beta + (\bar x_i - \theta \bar x_i)\pi + v_{it} = (x_{it} - \bar x_i)\beta + \bar x_i\delta + v_{it},$$

where $\delta = (1 - \theta)(\pi + \beta)$. Using that $\bar x_i$ is orthogonal to $x_{it} - \bar x_i$, we get the result.

This shows that for the linear regression model, fixed effects is effectively the same as random effects correlated with all regressors. The test of $\pi = 0$ can be based on the following statistic:

$$\hat\pi_{GLS}' \left[ \widehat{\mathrm{Var}}(\hat\pi_{GLS}) \right]^{-1} \hat\pi_{GLS} \xrightarrow{d} \chi^2_k \quad \text{under } H_0.$$
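Results (2.24) and (2.25) can be illustrated on simulated data. The sketch below is an illustration only: the dimensions, the values of $\beta$ and $\pi$, and the variance components used to form $V^{-1}$ are assumptions chosen for the example. It runs GLS on the augmented regressor matrix $[X, PX]$ and compares the coefficients with the within estimator and with the difference between the between and within estimators.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, k = 8, 4, 2                            # hypothetical panel dimensions
sigma2_w, sigma2_u = 0.25, 1.0               # variance components, assumed known

P = np.kron(np.eye(N), np.ones((T, T)) / T)
Q = np.eye(N * T) - P

# Regressors with a unit-specific component, so individual means vary
X = rng.normal(size=(N * T, k)) + np.repeat(rng.normal(size=(N, k)), T, axis=0)
x_bar = (P @ X)[::T]                         # N x k matrix of individual means

# Individual effects depend on the means of x: alpha_i = x_bar_i pi + w_i
pi = np.array([0.8, -0.3])
alpha = x_bar @ pi + rng.normal(scale=np.sqrt(sigma2_w), size=N)
y = X @ np.array([1.0, -0.5]) + np.repeat(alpha, T) + rng.normal(size=N * T)

# GLS on the augmented regression y = X beta + PX pi + w + u, as in (2.23)
Z = np.hstack([X, P @ X])
V_inv = P / (T * sigma2_w + sigma2_u) + Q / sigma2_u
coef = np.linalg.solve(Z.T @ V_inv @ Z, Z.T @ V_inv @ y)
beta_gls, pi_gls = coef[:k], coef[k:]

# Within and between estimators
beta_w = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)
beta_b = np.linalg.solve(X.T @ P @ X, X.T @ P @ y)

print("beta_GLS == beta_W:", np.allclose(beta_gls, beta_w))
print("pi_GLS == beta_B - beta_W:", np.allclose(pi_gls, beta_b - beta_w))
```

The equalities hold exactly (up to rounding) for any positive values of the variance components, because $QX$ and $PX$ are orthogonal in the metric $V^{-1}$.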

2.4 Alternative IV Estimators

The fixed effects estimator eliminates anything that is time-invariant from the model, which might be a high price to pay for allowing the $x$ variables to be correlated with the individual-specific heterogeneity $\alpha_i$. For example, we may be interested in the effect of time-invariant variables like gender or schooling on a person's wage. In this section we show that there is no need to restrict attention to the fixed and the random effects estimators only, as it is possible to derive instrumental variables estimators that can be considered an intermediate case between the fixed and random effects approaches.

We now show that the fixed effects estimator is a special case of an IV estimator. Notice that the fixed effects estimator can be written as

$$\hat\beta_W = \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)'(x_{it} - \bar x_i) \right\}^{-1} \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)'(y_{it} - \bar y_i) \right\}$$

$$= \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)'x_{it} \right\}^{-1} \left\{ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar x_i)'y_{it} \right\} = \hat\beta_{IV},$$

which shows that $\hat\beta_W$ has the interpretation of an IV estimator in

$$y_{it} = x_{it}\beta + \alpha_i + u_{it}, \qquad i = 1, 2, \dots, N; \; t = 1, 2, \dots, T,$$

where $x_{it}$ is instrumented by $x_{it} - \bar x_i$. Notice that by construction $E[(x_{it} - \bar x_i)'\alpha_i] = 0$, so that the IV estimator is consistent provided that $E[(x_{it} - \bar x_i)'u_{it}] = 0$, which is satisfied by our assumption of strict exogeneity of $x_{it}$.

This route may allow us to estimate the effect of time-invariant variables in a general context. To describe this approach, consider the following model:

$$y_{it} = x_{1,it}\beta_1 + x_{2,it}\beta_2 + z_{1,i}\gamma_1 + z_{2,i}\gamma_2 + \alpha_i + u_{it}, \qquad i = 1, 2, \dots, N; \; t = 1, 2, \dots, T, \tag{2.26}$$

where we have four different groups of variables; the $x$'s vary over both time periods and cross-section units, but the $z$'s vary only over cross-section units and are time-invariant. In addition we assume that the $1 \times k_1$ vector $x_{1,it}$ and the $1 \times g_1$ vector $z_{1,i}$ are uncorrelated with $\alpha_i$:

$$\operatorname*{plim}_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \bar x_{1,i}'\alpha_i = 0, \qquad \operatorname*{plim}_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} z_{1,i}'\alpha_i = 0,$$

whereas the $1 \times k_2$ vector $x_{2,it}$ and the $1 \times g_2$ vector $z_{2,i}$ are correlated with $\alpha_i$ ($k_1 + k_2 = k$ and $g_1 + g_2 = g$). Under these assumptions, fixed effects estimation still provides consistent estimators for $\beta_1$ and $\beta_2$, but does not identify $\gamma_1$ and $\gamma_2$, since the time-invariant variables $z_{1,i}$ and $z_{2,i}$ are wiped out by the within transformation.

2.4.1 Hausman and Taylor (1981) IV Estimator

Hausman and Taylor (1981) suggest estimating (2.26) by IV using the following variables as instruments: $x_{1,it}$ for $x_{1,it}$, $x_{2,it} - \bar x_{2,i}$ for $x_{2,it}$, $z_{1,i}$ for $z_{1,i}$, and $\bar x_{1,i}$ for $z_{2,i}$. That is, the uncorrelated variables $x_{1,it}$ and $z_{1,i}$ trivially serve as their own instruments, the $x_{2,it}$'s are instrumented by their deviations from individual means as in fixed effects estimation, and finally $z_{2,i}$ is instrumented by the individual average of $x_{1,it}$. Identification requires that the number of variables in $x_{1,it}$ is at least as large as the number in $z_{2,i}$ ($k_1 \ge g_2$). The resulting estimator, the Hausman-Taylor estimator, also allows us to estimate $\gamma_1$ and $\gamma_2$ consistently.

If some of the time-invariant variables are believed to be correlated with $\alpha_i$, we require that sufficient time-varying variables that are not correlated with $\alpha_i$ be included as instruments. In particular, the advantage of the Hausman and Taylor approach is that one does not have to use external instruments; the instruments can be obtained within the model.

There are two versions of the Hausman-Taylor estimator, called HT-IV and HT-GLS, respectively.

HT-IV estimator (consistent but less efficient)

We rewrite (2.26) in matrix form,

$$y = X_1\beta_1 + X_2\beta_2 + Z_1\gamma_1 + Z_2\gamma_2 + \boldsymbol{\alpha} + u = X\beta + Z\gamma + \boldsymbol{\alpha} + u, \tag{2.27}$$

where $X = (X_1, X_2)$, $Z = (Z_1, Z_2)$, $\beta = (\beta_1', \beta_2')'$ and $\gamma = (\gamma_1', \gamma_2')'$. We first estimate $\beta$ by $\hat\beta_W$ (the within estimator), which is still consistent, and consider the following averaged within residuals in matrix form:

$$P\left( y - X\hat\beta_W \right) = Z\gamma + \left\{ \boldsymbol{\alpha} + Pu + PX(\beta - \hat\beta_W) \right\}, \tag{2.28}$$

where $PZ = Z$ and $P\boldsymbol{\alpha} = \boldsymbol{\alpha}$. Applying 2SLS to (2.28) with $(X_1, Z_1)$ as instruments, we obtain the consistent estimate of $\gamma$ by

$$\hat\gamma_W = \left\{ Z'P_{(X_1, Z_1)}Z \right\}^{-1} \left\{ Z'P_{(X_1, Z_1)} \left( y - X\hat\beta_W \right) \right\},$$

where $P_{(X_1,Z_1)} = (X_1, Z_1)\{(X_1, Z_1)'(X_1, Z_1)\}^{-1}(X_1, Z_1)'$ is the orthogonal projection operator onto the column space of $(X_1, Z_1)$. In sum, $\hat\gamma_W$ exists and is consistent as $N \to \infty$ if $k_1 \ge g_2$. We need at least as many instruments $[X_1, Z_1]$ as regressors $[Z_1, Z_2]$, so we need at least as many $X_1$ variables as $Z_2$ variables.

HT-GLS estimator (consistent and efficient)

Transform (2.27) by $V^{-1/2}$ as above ($\theta$-differencing),

$$V^{-1/2}y = V^{-1/2}X\beta + V^{-1/2}Z\gamma + V^{-1/2}(\alpha + u), \tag{2.29}$$

then apply 2SLS using the instruments $(QX, Z_1, PX_1) = (QX_1, QX_2, Z_1, PX_1)$ to obtain $\hat\beta_{GLS}$ and $\hat\gamma_{GLS}$, where $QX$ are the deviations from individual means of all time-varying variables, $PX_1$ are the individual means of the time-varying variables not correlated with the effects, and $Z_1$ are the time-invariant variables not correlated with the effects. The condition for existence of the estimator now becomes $k_1 + k_2 + g_1 + k_1$ (number of instruments) $\ge k_1 + k_2 + g_1 + g_2$ (number of regressors), or $k_1 \ge g_2$, which is the same as for the simple estimator.

Summary of HT estimator

1. $k_1 < g_2$ (under-identification): $\hat\beta_W = \hat\beta_{GLS}$, but $\hat\gamma_W$ and $\hat\gamma_{GLS}$ do not exist.
2. $k_1 = g_2$ (just-identification): $\hat\beta_W = \hat\beta_{GLS}$ and $\hat\gamma_W = \hat\gamma_{GLS}$.
3. $k_1 > g_2$ (over-identification): $\hat\beta_{GLS}$ and $\hat\gamma_{GLS}$ are more efficient than $\hat\beta_W$ and $\hat\gamma_W$.

An empirical application: estimating returns to schooling. See Hausman and Taylor (1981).

2.4.2 Further Generalization

Amemiya and MaCurdy (1986) and Breusch, Mizon and Schmidt (1989) consider the same random effects model in which the effects are correlated with some but not all regressors, and suggest a larger set of instruments to improve upon the efficiency of the Hausman and Taylor estimator. A basic question is how many explanatory variables, or linear combinations of them, must be uncorrelated with the effects in order to improve on the within estimator and to include time-invariant regressors.

Amemiya and MaCurdy (1986) suggest the use of the time-invariant instruments $x_{1,i1} - \bar{x}_{1,i}, \dots, x_{1,iT} - \bar{x}_{1,i}$ for $z_{2,i}$. This requires that $E[(x_{1,it} - \bar{x}_{1,i})\alpha_i] = 0$ for each $t$, which makes sense if the correlation between $x_{1,it}$ and $\alpha_i$ is due to a time-invariant component in $x_{1,it}$, such that $E[x_{1,it}\alpha_i]$ does not depend on $t$. Breusch, Mizon and Schmidt (1989) summarise this literature and suggest $x_{2,i1} - \bar{x}_{2,i}, \dots, x_{2,iT} - \bar{x}_{2,i}$ as additional instruments for $z_{2,i}$.

2.5 Extension to two-way error components model

Consider the linear regression model with large $N$ and large $T$,

$$y_{it} = x_{it}\beta + \alpha_i + \lambda_t + u_{it}, \tag{2.30}$$

where $\alpha_i$ denotes the unobservable individual effect and $\lambda_t$ denotes the unobservable time effect.

2.5.1 The fixed effects model

Regress $y$ on [$X$, individual dummies, time dummies], which is equivalent to the within transformation. Define the following means:

$$\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, \qquad \bar{y}_t = \frac{1}{N}\sum_{i=1}^{N} y_{it}, \qquad \bar{y} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}.$$

Then, carry out the within transformation:

$$y_{it}^w = y_{it} - \bar{y}_i - \bar{y}_t + \bar{y}, \qquad x_{it}^w = x_{it} - \bar{x}_i - \bar{x}_t + \bar{x}.$$

The within estimator is then obtained from the regression of $y_{it}^w$ on $x_{it}^w$. Notice that this transformation removes anything that does not vary over time (e.g., $\alpha_i$) and anything that does not vary over individuals (e.g., $\lambda_t$). The within estimator is unbiased, but consistency depends on ...

2.5.2 The random effects model

Treat $\alpha_i$ and $\lambda_t$ as iid draws from $N(0, \sigma_\alpha^2)$ and $N(0, \sigma_\lambda^2)$, not correlated with $X$. Then the OLS estimator is unbiased and consistent as $N$ and $T \to \infty$. The GLS estimator is more efficient. For details see Baltagi (2008).
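The two-way within transformation of Section 2.5.1 can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions that are not in the notes: a balanced panel stored as an (N, T) array, a single regressor, and hypothetical function names `two_way_within` and `two_way_within_slope`. The simulated data omit $u_{it}$ so that the slope is recovered exactly even though the regressor is correlated with $\alpha_i$.

```python
import numpy as np

def two_way_within(a):
    """Two-way within transformation a_it - a_i. - a_.t + a_.. of an (N, T) array."""
    a = np.asarray(a, dtype=float)
    row_mean = a.mean(axis=1, keepdims=True)   # individual means (over t)
    col_mean = a.mean(axis=0, keepdims=True)   # time means (over i)
    return a - row_mean - col_mean + a.mean()

def two_way_within_slope(y, x):
    """Within estimator for a single regressor: regress y^w on x^w."""
    yw, xw = two_way_within(y), two_way_within(x)
    return float((xw * yw).sum() / (xw ** 2).sum())

# Illustrative data (assumed design): effects enter y, and x is correlated with alpha.
rng = np.random.default_rng(0)
N, T, beta = 50, 8, 1.5
alpha = rng.normal(size=(N, 1))        # individual effects
lam = rng.normal(size=(1, T))          # time effects
x = rng.normal(size=(N, T)) + alpha    # regressor correlated with alpha_i
y_exact = beta * x + alpha + lam       # no idiosyncratic noise, for an exact check
beta_hat = two_way_within_slope(y_exact, x)
effects_removed = two_way_within(alpha + lam)
```

Here `effects_removed` is numerically zero, which is the sense in which the transformation wipes out both $\alpha_i$ and $\lambda_t$.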

Chapter 3 Dynamic Panels

3.1 Dynamic Panels with Fixed T

Consider a dynamic panel with a lagged dependent variable as a regressor,

$$y_{it} = \phi y_{i,t-1} + x_{it}'\beta + \varepsilon_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T, \tag{3.1}$$

where we assume an error component specification, $\varepsilon_{it} = \alpha_i + u_{it}$, with $u_{it} \sim iid\,N(0, \sigma_u^2)$. The motivation here is to distinguish the true dependence of $y$ on lagged $y$ from spurious correlation due to unobserved heterogeneity (e.g., earnings across generations). We also assume for simplicity that the $x_{it}$ are not correlated with $\alpha_i$, and that $x_{it}$ is uncorrelated with $u_{it}$ for all $t = 1, \dots, T$. Note, however, that $\alpha_i$ is still correlated with $y_{i,t-1}$, since $y_{i,t-1}$ contains $\alpha_i$. Hence, $y_{i,t-1}$ is correlated with $\varepsilon_{it}$. This renders the OLS estimator biased and inconsistent even if the $u_{it}$ are not serially correlated, and the same holds for the GLS estimator. Fixed effects would seem to be an obvious possibility, but the within estimator is inconsistent as $N \to \infty$ for fixed $T$. To illustrate this problem, consider a simple autoregressive model, [1]

$$y_{it} = \phi y_{i,t-1} + \alpha_i + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T, \tag{3.2}$$

where we assume $|\phi| < 1$.

[1] Extension to a dynamic panel with exogenous variables would be straightforward, and in this case we would obtain exactly the same result, at least asymptotically.

The fixed effects estimator is given by

$$\hat\phi_W = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar{y}_{i,-1})(y_{it} - \bar{y}_i)}{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar{y}_{i,-1})^2}, \tag{3.3}$$

where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1}$, and we assume that $y_{i0}$ exists. Clearly, the within-transformed regressor $(y_{i,t-1} - \bar{y}_{i,-1})$ is uncorrelated with $\alpha_i$, but it is still correlated with $u_{it}$: the mean $\bar{y}_{i,-1}$ averages over $y_{it}, \dots, y_{i,T-1}$, which depend on $u_{it}$, so that $\mathrm{Cov}(y_{i,t-1} - \bar{y}_{i,-1}, u_{it})$ is of order $-\sigma_u^2/T$. Hence the within estimator is biased and inconsistent as $N \to \infty$ for fixed $T$. To analyze this, we substitute (3.2) into (3.3) and get

$$\hat\phi_W = \phi + \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar{y}_{i,-1})(u_{it} - \bar{u}_i)}{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar{y}_{i,-1})^2}, \tag{3.4}$$

where $\bar{u}_i = \frac{1}{T}\sum_{t=1}^{T} u_{it}$. In fact, the fixed effects estimator has a bias of order $O(T^{-1})$, and in particular, Nickell (1981) has shown that [2]

$$\operatorname*{plim}_{N \to \infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{i,t-1} - \bar{y}_{i,-1})(u_{it} - \bar{u}_i) = -\frac{\sigma_u^2}{T^2} \cdot \frac{(T-1) - T\phi + \phi^T}{(1-\phi)^2}. \tag{3.5}$$

Therefore, for a typical panel where $N$ is large and $T$ is fixed, the fixed effects estimator is biased and inconsistent. [3] Only if $T \to \infty$ will the fixed effects estimator be consistent for the dynamic error components model.

[2] See p. 328 in Verbeek (2010) for the actual magnitude of the bias for fixed $T$ and different values of $\phi$. See also Phillips and Sul (2003).
[3] The same inconsistency problem occurs with the random effects GLS estimator.

3.1.1 The Anderson and Hsiao (1981) First-difference IV Estimation

Fortunately, there is a relatively easy way to fix the inconsistency problem. An alternative transformation that wipes out the individual effects, yet does not create the above problem in dynamic panels, is the first difference transformation. Take first differences of (3.2) to get rid of $\alpha_i$ and obtain

$$\Delta y_{it} = \phi \Delta y_{i,t-1} + \Delta u_{it}, \quad t = 2, \dots, T, \tag{3.6}$$

where we note that $\Delta u_{it}$ is an MA(1) process with a unit root. The OLS estimator obtained from (3.6) will be inconsistent, since $\Delta y_{i,t-1}$ and $\Delta u_{it}$, or more precisely $y_{i,t-1}$ and $u_{i,t-1}$, are by definition correlated. Anderson and Hsiao suggested $\Delta y_{i,t-2}$ or $y_{i,t-2}$ as an instrument for $\Delta y_{i,t-1}$. These instruments will not be correlated with $\Delta u_{it}$ as long as the $u_{it}$ are not serially correlated. For example, when using $y_{i,t-2}$ as an instrument for $\Delta y_{i,t-1}$, we obtain the following (consistent) IV estimator: [4]

$$\hat\phi_{IV} = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} y_{i,t-2}\,\Delta y_{it}}{\sum_{i=1}^{N}\sum_{t=2}^{T} y_{i,t-2}\,\Delta y_{i,t-1}}. \tag{3.7}$$

[4] A necessary condition for consistency is that $\operatorname*{plim}_{N \to \infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} y_{i,t-2}\,\Delta u_{it} = 0$.

Since $\Delta u_{it} = u_{it} - u_{i,t-1}$, any $y_{i,s}$ with $s \le t-2$ can be used as a legitimate instrument.

3.1.2 The Arellano and Bond (1991) IV-GMM Estimator

It is well known that imposing more moment conditions increases the efficiency of the estimators, provided the additional moment conditions are valid. Arellano and Bond (1991) show that the list of instruments can be extended by exploiting additional moment conditions and letting their number vary with $t$. In particular, they argue that additional instruments can be obtained if one utilizes the orthogonality conditions that exist between lagged values of $y_{it}$ and $\Delta u_{it}$. Consider the simple autoregressive panel (3.2) and its first-difference version (3.6). For $t = 3$, we observe (note that here $t = 2, \dots$, so the observations start from $y_{i1}$, not $y_{i0}$)

$$\Delta y_{i3} = \phi \Delta y_{i2} + \Delta u_{i3},$$

and thus $y_{i1}$ is a valid instrument, since it is highly correlated with $\Delta y_{i2}$ but not correlated with $\Delta u_{i3}$. When $t = 4$,

$$\Delta y_{i4} = \phi \Delta y_{i3} + \Delta u_{i4},$$

and $y_{i2}$ as well as $y_{i1}$ are valid instruments for $\Delta y_{i3}$, since neither is correlated with $\Delta u_{i4}$. Continuing in this fashion, for period $T$ the set of valid instruments becomes $(y_{i1}, y_{i2}, \dots, y_{i,T-2})$. Define the $(T-2) \times (T-2)(T-1)/2$ matrix

$$W_i = \begin{pmatrix} (y_{i1}) & 0 & \cdots & 0 \\ 0 & (y_{i1}, y_{i2}) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & (y_{i1}, y_{i2}, \dots, y_{i,T-2}) \end{pmatrix} \tag{3.8}$$

as the matrix of instruments, where each row contains the instruments that are valid for a given period. Consequently, the set of all moment conditions can be written concisely as

$$E(W_i'\Delta u_i) = 0,$$

where $\Delta u_i = (\Delta u_{i3}, \dots, \Delta u_{iT})'$, or alternatively,

$$E(W_i'(\Delta y_i - \phi \Delta y_{i,-1})) = 0,$$

where $\Delta y_i = (\Delta y_{i3}, \dots, \Delta y_{iT})'$ and $\Delta y_{i,-1} = (\Delta y_{i2}, \dots, \Delta y_{i,T-1})'$ are $(T-2)$-vectors. The total number of moment conditions adds up to $(T-2)(T-1)/2$. Next, define the $N(T-2) \times (T-2)(T-1)/2$ matrix of instruments by $W = (W_1', \dots, W_N')'$ and rewrite (3.6) in matrix form as

$$\Delta y = \phi \Delta y_{-1} + \Delta u, \tag{3.9}$$

where $\Delta y = (\Delta y_1', \dots, \Delta y_N')'$, $\Delta y_{-1} = (\Delta y_{1,-1}', \dots, \Delta y_{N,-1}')'$ and $\Delta u = (\Delta u_1', \dots, \Delta u_N')'$ are each $N(T-2) \times 1$. Pre-multiplying (3.9) by $W'$,

$$W'\Delta y = \phi W'\Delta y_{-1} + W'\Delta u. \tag{3.10}$$

The estimator suggested by Arellano and Bond is the GLS estimator applied to (3.10); that is,

$$\hat\phi_{GLS} = \{\Delta y_{-1}'WV^{-1}W'\Delta y_{-1}\}^{-1}\{\Delta y_{-1}'WV^{-1}W'\Delta y\}, \tag{3.11}$$

where $V = \mathrm{Var}(W'\Delta u)$. They propose two feasible GLS estimators. First, under the assumption that $u_{it}$ is iid over both $i$ and $t$, it is easily seen that

$$E(\Delta u \Delta u') = \sigma_u^2 (I_N \otimes G),$$

where $E(\Delta u_i \Delta u_i') = \sigma_u^2 G$ with $\Delta u_i = (\Delta u_{i3}, \dots, \Delta u_{iT})'$, and $G$ is the $(T-2) \times (T-2)$ matrix given by

$$G = \begin{pmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix}.$$

Then, $V = \mathrm{Var}(W'\Delta u) = \sigma_u^2 W'(I_N \otimes G)W$. Therefore, we obtain the one-step Arellano and Bond (GMM) estimator,

$$\hat\phi_{FGLS,1} = \{\Delta y_{-1}'W[W'(I_N \otimes G)W]^{-1}W'\Delta y_{-1}\}^{-1}\{\Delta y_{-1}'W[W'(I_N \otimes G)W]^{-1}W'\Delta y\}. \tag{3.12}$$

Since $G$ is a fixed matrix, the optimal GMM estimator can be computed in one step if the $u_{it}$ are assumed to be homoskedastic and to exhibit no autocorrelation. In general, the GMM approach does not impose that $u_{it}$ is iid over both cross-section units and time periods. In this case $V$ or $V^{-1}$ can be estimated without imposing these restrictions. [5] We now replace

$$W'(I_N \otimes G)W = \sum_{i=1}^{N} W_i'GW_i \quad \text{by} \quad V_N = \sum_{i=1}^{N} W_i'\Delta u_i \Delta u_i'W_i.$$

Since $\Delta u_i$ is unobservable, we obtain the two-step Arellano and Bond GMM estimator,

$$\hat\phi_{FGLS,2} = \{\Delta y_{-1}'W\hat{V}_N^{-1}W'\Delta y_{-1}\}^{-1}\{\Delta y_{-1}'W\hat{V}_N^{-1}W'\Delta y\}, \tag{3.13}$$

where

$$\Delta\hat{u}_i = \Delta y_i - \hat\phi_{FGLS,1}\Delta y_{i,-1} \quad \text{and} \quad \hat{V}_N = \sum_{i=1}^{N} W_i'\Delta\hat{u}_i \Delta\hat{u}_i'W_i.$$

This GMM estimator requires no knowledge of the initial conditions or of the distributions of $u_i$ and $\alpha_i$. In general, the GMM estimator for $\phi$ is asymptotically normal, with its covariance given by

$$\mathrm{Var}(\hat\phi_{FGLS,2}) = \{\Delta y_{-1}'W\hat{V}_N^{-1}W'\Delta y_{-1}\}^{-1}.$$

[5] The absence of autocorrelation is necessary for the validity of the moment conditions.
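The estimators (3.3), (3.7) and (3.12) can be compared on simulated data. The sketch below is a minimal illustration under assumptions that are not in the notes: $\alpha_i$ and $u_{it}$ are drawn as independent standard normals, the process starts near its stationary distribution after a burn-in, and the function names are hypothetical. The first column of the array is treated as the earliest available observation, so the instrument rows follow the pattern of (3.8).

```python
import numpy as np

def simulate_ar1_panel(N, T, phi, seed=0, burn=50):
    """Simulate y_it = phi*y_{i,t-1} + alpha_i + u_it; returns an (N, T+1) array."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=N)             # individual effects (assumed N(0,1))
    y = alpha / (1.0 - phi)                # start each unit at its long-run mean
    cols = []
    for t in range(burn + T + 1):
        y = phi * y + alpha + rng.normal(size=N)   # u_it assumed N(0,1)
        if t >= burn:
            cols.append(y)
    return np.column_stack(cols)

def within_estimator(y):
    """Fixed effects (within) estimator of phi, as in (3.3)."""
    lag = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
    cur = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
    return float((lag * cur).sum() / (lag ** 2).sum())

def anderson_hsiao(y):
    """IV estimator (3.7): the level y_{i,t-2} instruments Delta y_{i,t-1}."""
    dy = np.diff(y, axis=1)
    inst = y[:, :-2]
    return float((inst * dy[:, 1:]).sum() / (inst * dy[:, :-1]).sum())

def arellano_bond_one_step(y):
    """One-step GMM estimator (3.12) with block-diagonal instruments as in (3.8)."""
    N = y.shape[0]
    dy = np.diff(y, axis=1)
    dep, lag = dy[:, 1:], dy[:, :-1]       # Delta y_it and Delta y_{i,t-1}
    n_eq = dep.shape[1]                    # number of differenced equations
    n_inst = n_eq * (n_eq + 1) // 2        # total number of moment conditions
    G = 2 * np.eye(n_eq) - np.eye(n_eq, k=1) - np.eye(n_eq, k=-1)
    A = np.zeros((n_inst, n_inst))         # sum_i W_i' G W_i
    w_lag = np.zeros(n_inst)               # W' Delta y_{-1}
    w_dep = np.zeros(n_inst)               # W' Delta y
    for i in range(N):
        W_i = np.zeros((n_eq, n_inst))
        col = 0
        for r in range(n_eq):              # row r uses the first r+1 levels
            W_i[r, col:col + r + 1] = y[i, :r + 1]
            col += r + 1
        A += W_i.T @ G @ W_i
        w_lag += W_i.T @ lag[i]
        w_dep += W_i.T @ dep[i]
    a_lag = np.linalg.solve(A, w_lag)      # A^{-1} W' Delta y_{-1}
    return float((a_lag @ w_dep) / (a_lag @ w_lag))

y = simulate_ar1_panel(N=2000, T=6, phi=0.5, seed=0)
phi_within = within_estimator(y)
phi_ah = anderson_hsiao(y)
phi_ab = arellano_bond_one_step(y)
print(phi_within, phi_ah, phi_ab)
```

With $T = 6$ and $\phi = 0.5$, formula (3.5) implies a large downward bias for the within estimator, whereas the two instrumental variable estimators should land close to the true value in a sample of this size.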

GMM estimator in models with exogenous variables

Now we consider a more general dynamic panel model,

$$y_{it} = \phi y_{i,t-1} + x_{it}'\beta + \alpha_i + u_{it}, \quad i = 1, \dots, N, \; t = 1, \dots, T, \tag{3.14}$$

where $x_{it}$ is the $k \times 1$ vector of regressors. Its first-difference version becomes

$$\Delta y_{it} = \phi \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta u_{it}. \tag{3.15}$$

Suppose that the $k$-dimensional regressors $x_{it}$ are strictly exogenous, such that $E(x_{it}u_{is}) = 0$ for all $t, s = 1, \dots, T$, and assume that all $x_{it}$ are not correlated with the individual effects $\alpha_i$. Then all $x_{it}$ are valid instruments for (3.15), in which case the $1 \times kT$ vector $x_i' = (x_{i1}', x_{i2}', \dots, x_{iT}')$ should be added to each diagonal element of $W_i$ in (3.8); that is, we have the $(T-2) \times [(T-2)(T-1)/2 + kT(T-2)]$ instrument matrix

$$W_i = \begin{pmatrix} (y_{i1}, x_i') & 0 & \cdots & 0 \\ 0 & (y_{i1}, y_{i2}, x_i') & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & (y_{i1}, y_{i2}, \dots, y_{i,T-2}, x_i') \end{pmatrix}. \tag{3.16}$$

Writing (3.15) in matrix form and premultiplying it by $W'$, we obtain

$$W'\Delta y = \phi W'\Delta y_{-1} + W'\Delta X\beta + W'\Delta u, \tag{3.17}$$

where $\Delta y$, $\Delta y_{-1}$ and $\Delta u$ are defined just after equation (3.9), and $\Delta X$ is the stacked $N(T-2) \times k$ matrix of observations on $\Delta x_{it}'$. The two-step GLS estimator can then be obtained by

$$\begin{pmatrix} \hat\phi_{FGLS,2} \\ \hat\beta_{FGLS,2} \end{pmatrix} = (z'W\hat{V}_N^{-1}W'z)^{-1}(z'W\hat{V}_N^{-1}W'\Delta y), \tag{3.18}$$

where $W = (W_1', \dots, W_N')'$, $z = (\Delta y_{-1}, \Delta X)$, and $\hat{V}_N$ is estimated similarly as in (3.13).

Next, if the $x_{it}$ are not strictly exogenous but predetermined, such that $E(x_{it}u_{is}) \ne 0$ for $s < t$ and zero otherwise (still assuming that all $x_{it}$ are not correlated with $\alpha_i$), then only $(x_{i1}, x_{i2}, \dots, x_{i,s-1})$ are valid instruments for (3.15) at period $s$. Thus, we get the $(T-2) \times [(T-2)(T-1)/2 + k(T-2)(T+1)/2]$ matrix of instruments,

$$W_i = \begin{pmatrix} (y_{i1}, x_{i1}', x_{i2}') & 0 & \cdots & 0 \\ 0 & (y_{i1}, y_{i2}, x_{i1}', x_{i2}', x_{i3}') & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & (y_{i1}, \dots, y_{i,T-2}, x_{i1}', \dots, x_{i,T-1}') \end{pmatrix}, \tag{3.19}$$

and the two-step GLS estimator of $(\phi, \beta')'$ can be obtained by (3.18), but with the choice of $W_i$ given by (3.19).

In empirical studies a combination of both exogenous and predetermined regressors may occur rather than the two extreme cases, and one can adjust the matrix of instruments $W$ accordingly. In the case where only a subset of the $x_{it}$ are correlated with $\alpha_i$, we can also extend the Hausman-Taylor estimation procedure. See Baltagi (2008, Section 8.2).

3.1.3 The Arellano and Bover (1995) Study

Arellano and Bover (1995) develop a unifying IV-GMM framework for dynamic panel data models, which also includes the Hausman and Taylor type estimator as a special case. Consider the following static panel:

$$y_{it} = x_{it}'\beta + z_i'\gamma + \varepsilon_{it}, \quad i = 1, 2, \dots, N; \; t = 1, 2, \dots, T, \tag{3.20}$$

where $\beta$ is $k \times 1$, $\gamma$ is $g \times 1$ and $\varepsilon_{it} = \alpha_i + u_{it}$. In vector form,

$$y_i = X_i\beta + Z_i\gamma + \varepsilon_i = S_i\delta + \varepsilon_i, \tag{3.21}$$

where $y_i = (y_{i1}, \dots, y_{iT})'$ is $T \times 1$, $X_i = (x_{i1}, \dots, x_{iT})'$ is $T \times k$, $Z_i = e_T z_i'$ is $T \times g$, $e_T = (1, \dots, 1)'$ is $T \times 1$, $S_i = [X_i, Z_i]$ is $T \times (k+g)$, $\delta = (\beta', \gamma')'$ is $(k+g) \times 1$, and

$$\varepsilon_i = \alpha_i e_T + u_i,$$

with $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{iT})'$ and $u_i = (u_{i1}, \dots, u_{iT})'$ both $T \times 1$. Arellano and Bover transform (3.21) using the following nonsingular $T \times T$ matrix transformation:

$$H = \begin{pmatrix} C \\ e_T'/T \end{pmatrix},$$

where $C$ is any $(T-1) \times T$ matrix of rank $(T-1)$ such that $Ce_T = 0$. For example, $C$ could be the first $(T-1)$ rows of the within-group operator (see the definition of $Q$) or the first difference operator. Premultiplying $\varepsilon_i$ by $H$, we obtain the transformed disturbances,

$$\varepsilon_i^{+} = H\varepsilon_i = \begin{pmatrix} C\varepsilon_i \\ \bar\varepsilon_i \end{pmatrix}. \tag{3.22}$$

This class of transformations performs a decomposition between within-group and between-group variation, which is helpful for implementing the moment conditions implied by the model. Notice that the first $(T-1)$ transformed disturbances are free of the individual effects $\alpha_i$ by construction. Hence, all exogenous variables are valid instruments for the first $(T-1)$ equations of the transformed system. Define the $1 \times (kT+g)$ vector $w_i' = (x_{i1}', \dots, x_{iT}', z_i')$, and let $m_i$ denote the vector of the subset of variables in $w_i$ assumed to be uncorrelated with $\alpha_i$, such that $\dim(m_i) = m \ge \dim(\gamma) = g$. Then a valid instrument matrix, of dimension $T \times [(T-1)(kT+g) + m]$, becomes

$$W_i = \begin{pmatrix} w_i' & & & 0 \\ & \ddots & & \\ & & w_i' & \\ 0 & & & m_i' \end{pmatrix}, \tag{3.23}$$

and the moment conditions are given by

$$E(W_i'H\varepsilon_i) = 0. \tag{3.24}$$

Write (3.21) in matrix form,

$$y = S\delta + \varepsilon, \tag{3.25}$$

where $S = (S_1', \dots, S_N')'$ is $NT \times (k+g)$ and $\varepsilon = (\varepsilon_1', \dots, \varepsilon_N')'$ is $NT \times 1$. Defining the matrix of instruments $W = (W_1', \dots, W_N')'$, which has $NT$ rows, and premultiplying (3.25) by $W'\bar{H}$, where $\bar{H} = I_N \otimes H$ is an $NT \times NT$ matrix, we obtain the following complete transformed system:

$$W'\bar{H}y = W'\bar{H}S\delta + W'\bar{H}\varepsilon. \tag{3.26}$$
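The properties of the transformation $H$ are easy to check numerically. The sketch below is a minimal illustration that takes $C$ to be the first difference operator (one of the two choices mentioned above); the function names, the value of $T$ and the simulated disturbances are assumptions made for illustration only.

```python
import numpy as np

def first_difference_operator(T):
    """(T-1) x T matrix C with rows (..., -1, 1, ...), so that C @ e_T = 0."""
    C = np.zeros((T - 1, T))
    for r in range(T - 1):
        C[r, r] = -1.0
        C[r, r + 1] = 1.0
    return C

def arellano_bover_H(T):
    """Stack C on top of the averaging row e_T'/T to form the T x T matrix H."""
    C = first_difference_operator(T)
    return np.vstack([C, np.ones((1, T)) / T])

T = 5
H = arellano_bover_H(T)

rng = np.random.default_rng(1)
alpha_i = 3.0                       # individual effect (illustrative value)
u_i = rng.normal(size=T)            # idiosyncratic disturbances
eps_i = alpha_i + u_i               # eps_i = alpha_i * e_T + u_i
eps_plus = H @ eps_i                # transformed disturbances, as in (3.22)

rank_H = np.linalg.matrix_rank(H)             # H should be nonsingular
free_of_alpha = eps_plus[:-1] - np.diff(u_i)  # first T-1 rows equal C u_i
```

The first $T-1$ elements of `eps_plus` depend only on $u_i$, while the last element is the individual mean $\bar\varepsilon_i$, which still contains $\alpha_i$. This is why the full instrument vector $w_i$ is used for the first $T-1$ equations and only $m_i$ for the last one.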

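The Arellano-Bover optimal GMM estimator of $\delta$ based on the moment condition (3.24) is the GLS estimator applied to (3.26), given by

$$\hat\delta_{GLS} = (S'\bar{H}'WV^{-1}W'\bar{H}S)^{-1}(S'\bar{H}'WV^{-1}W'\bar{H}y), \tag{3.27}$$

where $\bar{H} = I_N \otimes H$ and

$$\mathrm{Var}(\varepsilon) = I_N \otimes E(\varepsilon_i\varepsilon_i') = I_N \otimes \Omega, \qquad V = \mathrm{Var}(W'\bar{H}\varepsilon) = W'\bar{H}(I_N \otimes \Omega)\bar{H}'W = W'(I_N \otimes H\Omega H')W.$$

The feasible GLS estimator is obtained by replacing $H\Omega H'$ or $V$ by a consistent estimator. First, the unrestricted estimator of $H\Omega H'$ takes the form

$$\frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i^{+}\hat\varepsilon_i^{+\prime},$$

where $\hat\varepsilon_i^{+} = H\hat\varepsilon_i$ are the transformed residuals based on consistent preliminary estimates, in which case we have

$$\hat\delta_{FGLS} = (S'\bar{H}'W\hat{V}^{-1}W'\bar{H}S)^{-1}(S'\bar{H}'W\hat{V}^{-1}W'\bar{H}y), \tag{3.28}$$

where

$$\hat{V} = W'\left(I_N \otimes \frac{1}{N}\sum_{i=1}^{N}\hat\varepsilon_i^{+}\hat\varepsilon_i^{+\prime}\right)W.$$

Alternatively, we consider a restricted estimate of $\Omega$,

$$\tilde\Omega = \tilde\sigma_\alpha^2 e_T e_T' + \tilde\sigma_u^2 I_T,$$

where $\tilde\sigma_\alpha^2$ and $\tilde\sigma_u^2$ are consistent estimators of $\sigma_\alpha^2$ and $\sigma_u^2$ (recall the random effects model). Thus, we have

$$\tilde\delta_{FGLS} = (S'\bar{H}'W\tilde{V}^{-1}W'\bar{H}S)^{-1}(S'\bar{H}'W\tilde{V}^{-1}W'\bar{H}y), \tag{3.29}$$

where $\tilde{V} = W'(I_N \otimes H\tilde\Omega H')W$.

Consider the Hausman and Taylor model again,

$$y_{it} = x_{1,it}\beta_1 + x_{2,it}\beta_2 + z_{1,i}\gamma_1 + z_{2,i}\gamma_2 + \alpha_i + u_{it}, \quad i = 1, 2, \dots, N; \; t = 1, 2, \dots, T, \tag{3.30}$$

where the $1 \times k_1$ vector $x_{1,it}$ and the $1 \times g_1$ vector $z_{1,i}$ are uncorrelated with $\alpha_i$, but the $1 \times k_2$ vector $x_{2,it}$ and the $1 \times g_2$ vector $z_{2,i}$ are correlated with $\alpha_i$ ($k_1 + k_2 = k$ and $g_1 + g_2 = g$). Using the Arellano-Bover transformation, it is easily seen that $m_i$ includes the variables $x_{1,it}$ and $z_{1,i}$, namely $m_i = (x_{1,i1}, \dots, x_{1,iT}, z_{1,i})$.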
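The role of the exogenous variables $x_{1,it}$ and $z_{1,i}$ in identifying $\gamma$ can be illustrated on simulated data. The sketch below does not implement the Arellano-Bover GMM estimator; it uses the simpler HT-IV recipe of Section 2.4.1 (within estimation of $\beta$, then 2SLS on the averaged residuals with instruments $(X_1, Z_1)$). The data-generating design, with $k_1 = k_2 = g_1 = g_2 = 1$ and all true coefficients equal to one, and the function names are assumptions made for illustration.

```python
import numpy as np

def simulate_ht_panel(N, T, seed=0):
    """Hypothetical Hausman-Taylor design; every true coefficient equals 1."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=N)                        # individual effect
    m = rng.normal(size=N)                            # component shared by x1 and z2
    x1 = m[:, None] + rng.normal(size=(N, T))         # uncorrelated with alpha
    x2 = alpha[:, None] + rng.normal(size=(N, T))     # correlated with alpha
    z1 = rng.normal(size=N)                           # uncorrelated with alpha
    z2 = m + alpha + rng.normal(size=N)               # correlated with alpha
    u = rng.normal(size=(N, T))
    y = x1 + x2 + (z1 + z2 + alpha)[:, None] + u
    return y, x1, x2, z1, z2

def ht_iv(y, x1, x2, z1, z2):
    """HT-IV: within estimate of beta, then 2SLS for gamma as in (2.28)."""
    N, T = y.shape

    def demean(a):
        return a - a.mean(axis=1, keepdims=True)

    # Step 1: within estimator of beta (time-invariant terms drop out).
    Xw = np.column_stack([demean(x1).ravel(), demean(x2).ravel()])
    beta_w = np.linalg.lstsq(Xw, demean(y).ravel(), rcond=None)[0]

    # Step 2: individual means of y - X beta_w, stacked as an NT-vector.
    d = np.repeat((y - beta_w[0] * x1 - beta_w[1] * x2).mean(axis=1), T)
    Z = np.column_stack([np.repeat(z1, T), np.repeat(z2, T)])
    A = np.column_stack([x1.ravel(), np.repeat(z1, T)])   # instruments (X1, Z1)

    # 2SLS: project Z on the instruments, then regress d on the projection.
    Z_hat = A @ np.linalg.solve(A.T @ A, A.T @ Z)
    gamma_w = np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ d)
    return beta_w, gamma_w

y, x1, x2, z1, z2 = simulate_ht_panel(N=3000, T=5, seed=0)
beta_w, gamma_w = ht_iv(y, x1, x2, z1, z2)
print(beta_w, gamma_w)
```

In this design $k_1 = g_2 = 1$, so $\gamma$ is just identified: $z_{1,i}$ instruments itself and $x_{1,it}$ serves as the instrument for $z_{2,i}$ through the shared component `m`.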
