Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE


Introduction

Panel data
The term panel data refers to data sets with repeated observations over time (or other dimensions) for a given cross-section of units. Units can be persons, households, firms, countries, ... Panel data are different from repeated cross-sections, where each period samples different units.
Main advantages of panel data:
- Controlling for permanent unobserved heterogeneity.
- Identifying dynamic responses and error components.

Cigarette Consumption
We will use the same example throughout the chapter. Consider the estimation of the demand for cigarettes:
$$\ln C_{it} = \beta_0 + \beta_1 \ln P_{it} + \beta_2 \ln Y_{it} + \eta_i + v_{it},$$
where:
- $\ln C_{it}$ is log consumption of cigarettes,
- $\ln P_{it}$ is log price of the cigarettes,
- $\ln Y_{it}$ is log income of the individual,
- $\eta_i + v_{it}$ is unobserved.

Static Models

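General notation
We consider the following model:
$$y_{it} = x_{it}'\beta + (\eta_i + v_{it}),$$
where $y_{it}$ and $x_{it}$ are observed, and $\eta_i + v_{it}$ is unobserved. Let $\{y_{it}, x_{it}\}_{i=1,\dots,N}^{t=1,\dots,T}$ be our sample. We define:
$$y_i = \begin{pmatrix} y_{i1} \\ \vdots \\ y_{iT} \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad X_i = \begin{pmatrix} x_{i1}' \\ \vdots \\ x_{iT}' \end{pmatrix}, \quad X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix},$$
$$\boldsymbol{\eta}_i = \eta_i \iota_T, \quad \boldsymbol{\eta} = \begin{pmatrix} \boldsymbol{\eta}_1 \\ \vdots \\ \boldsymbol{\eta}_N \end{pmatrix}, \quad v_i = \begin{pmatrix} v_{i1} \\ \vdots \\ v_{iT} \end{pmatrix}, \quad \text{and} \quad v = \begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix},$$
where $\iota_T$ is a size $T$ vector of ones. Hence, we can use the following compact notation:
$$y_i = X_i\beta + (\boldsymbol{\eta}_i + v_i), \quad \text{and} \quad y = X\beta + (\boldsymbol{\eta} + v).$$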

General assumptions for static models
For static models, we assume:
- Fixed effects: $E[x_{it}\eta_i] \neq 0$, or random effects: $E[x_{it}\eta_i] = 0$.
- Strict exogeneity: $E[x_{it}v_{is}] = 0 \;\; \forall s, t$. This assumption rules out effects of past $v_{is}$ on current $x_{it}$ (e.g. $x_{it}$ cannot include lagged dependent variables).
- Error components: $E[\eta_i] = E[v_{it}] = E[\eta_i v_{it}] = 0$.
- Serially uncorrelated shocks: $E[v_{it}v_{is}] = 0 \;\; \forall s \neq t$.
- Homoskedasticity and i.i.d. errors: $\eta_i \sim iid(0, \sigma_\eta^2)$ and $v_{it} \sim iid(0, \sigma_v^2)$, which does not affect any crucial result, but simplifies some derivations.

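Pooled OLS
A simple approach: define $u \equiv \boldsymbol{\eta} + v$ (the stacked composite error, with elements $u_{it} = \eta_i + v_{it}$) and estimate $\beta$ by OLS:
$$\hat\beta_{OLS} = (X'X)^{-1}X'y.$$
The properties of $\hat\beta_{OLS}$ depend on $E[x_{it}\eta_i]$, as $E[x_{it}v_{it}] = 0$:
- If $E[x_{it}\eta_i] = 0 \Rightarrow E[x_j u_j] = 0$ (random effects):
  - $\hat\beta_{OLS}$ is consistent as $N \to \infty$, or $T \to \infty$, or both.
  - It is efficient only if $\sigma_\eta^2 = 0$.
- If $E[x_{it}\eta_i] \neq 0 \Rightarrow E[x_j u_j] \neq 0$ (fixed effects):
  - $\hat\beta_{OLS}$ is inconsistent as $N \to \infty$, or $T \to \infty$, or both.
  - Cross-section results are also inconsistent (but the panel helps in constructing a consistent alternative).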

Pooled OLS in Cigarette Demand
In our previous example, we would redefine our regression as:
$$\ln C_j = \beta_0 + \beta_1 \ln P_j + \beta_2 \ln Y_j + u_j,$$
where the subindex $j$ emphasizes that each observation $it$ is treated as one independent observation.

                          OLS
ln Prices ($\beta_1$)    -0.083
                         (0.015)
ln Income ($\beta_2$)    -0.032
                         (0.006)
(standard errors in parentheses)

Potential problems: preferences, education, parental smoking behavior, ...

The fixed effects model. Within groups estimation.

Within groups estimator
Write the model in deviations from individual means, $\tilde y_{it} \equiv y_{it} - \bar y_i$, where $\bar y_i \equiv T^{-1}\sum_{t=1}^{T} y_{it}$:
$$\tilde y_{it} = (x_{it} - \bar x_i)'\beta + (\eta_i - \eta_i) + (v_{it} - \bar v_i) = \tilde x_{it}'\beta + \tilde v_{it}.$$
Given the previous assumptions, $E[\tilde x_{it}\tilde v_{it}] = 0$. Therefore, OLS on the transformed model:
$$\hat\beta_{WG} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y,$$
is a consistent estimator whether $E[x_{it}\eta_i] \neq 0$ or $E[x_{it}\eta_i] = 0$.
Strict exogeneity is a crucial assumption (see next slide).
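
A minimal simulation sketch of the within groups transformation, compared with pooled OLS when the regressor is correlated with the individual effect. The design (sample sizes, the 0.8 loading of the regressor on the effect, a single regressor, and no intercept because every simulated variable has mean zero) is an illustrative assumption, not taken from the cigarette data.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 2000, 5, 1.0                  # hypothetical design, not from the slides

eta = rng.normal(size=(N, 1))              # individual effects eta_i
x = 0.8 * eta + rng.normal(size=(N, T))    # regressor correlated with eta_i
v = rng.normal(size=(N, T))                # idiosyncratic shocks v_it
y = beta * x + eta + v

# Pooled OLS (no intercept: all variables have mean zero by construction)
b_ols = (x * y).sum() / (x ** 2).sum()

# Within groups: subtract individual means, then OLS on the deviations
x_til = x - x.mean(axis=1, keepdims=True)
y_til = y - y.mean(axis=1, keepdims=True)
b_wg = (x_til * y_til).sum() / (x_til ** 2).sum()

print(f"pooled OLS: {b_ols:.3f}   within groups: {b_wg:.3f}   true beta: {beta}")
```

In this design pooled OLS should drift well above the true coefficient, while the within groups estimate should stay close to it.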

The role of strict exogeneity
In the case where $N \to \infty$ and $T$ is fixed, consistency depends on strict exogeneity. To see it, recall that:
$$\tilde x_{it} = x_{it} - \frac{1}{T}(x_{i1} + \dots + x_{iT}) \quad \text{and} \quad \tilde v_{it} = v_{it} - \frac{1}{T}(v_{i1} + \dots + v_{iT}).$$
Therefore, $E[\tilde x_{it}\tilde v_{it}] = 0$ requires $E[x_{it}v_{is}] = 0 \;\; \forall s, t$, unless $T \to \infty$.
This has motivated the development of dynamic panel data models, to relax this assumption.
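
As a worked step, expanding the product of the two deviations (written here for a scalar regressor, which is a simplifying assumption) shows where the leads and lags enter:

$$E[\tilde x_{it}\tilde v_{it}] = E[x_{it}v_{it}] - \frac{1}{T}\sum_{s=1}^{T}E[x_{it}v_{is}] - \frac{1}{T}\sum_{s=1}^{T}E[x_{is}v_{it}] + \frac{1}{T^2}\sum_{s=1}^{T}\sum_{r=1}^{T}E[x_{is}v_{ir}].$$

Every cross moment $E[x_{is}v_{ir}]$ with $s \neq r$ appears with a weight of order $1/T$, so contemporaneous exogeneity alone does not set the expression to zero when $T$ is fixed.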

Pros and cons of within groups estimator
Advantage: consistent whether $E[x_{it}\eta_i] \neq 0$ or $E[x_{it}\eta_i] = 0$.
Limitations:
- Not efficient:
  - When $N \to \infty$ but $T$ is fixed, it is less efficient than e.g. $\hat\beta_{GLS}$ if $E[x_{it}\eta_i] = 0$.
  - If $E[x_{it}\eta_i] \neq 0$, it is only efficient when all regressors are correlated with $\eta_i$.
- It does not identify coefficients on time-invariant regressors, and identification comes through switchers (units whose regressors change over time).

Within Groups in Cigarette Demand
In our example:

                          OLS        WG
ln Prices ($\beta_1$)    -0.083     -0.292
                         (0.015)    (0.023)
ln Income ($\beta_2$)    -0.032      0.107
                         (0.006)    (0.019)

Least Squares Dummy Variables
The Within Groups estimator can also be obtained by including a set of $N$ individual dummy variables:
$$y_{it} = x_{it}'\beta + \eta_1 D_{1i} + \dots + \eta_N D_{Ni} + v_{it},$$
where $D_{hi} = 1\{h = i\}$ (e.g. $D_{1i}$ takes the value of 1 for the observations on individual 1 and 0 for all other observations).
OLS estimation of this model gives estimates that are numerically equivalent to WG (that is why $\hat\beta_{WG}$ is also known as $\hat\beta_{LSDV}$).
This gives intuition on why WG is not very efficient if there is only limited time-series variation (degrees of freedom are $NT - K - N = N(T-1) - K$).
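
The numerical equivalence can be checked on a small simulated panel. This is a sketch under assumed dimensions and coefficients (all hypothetical); the dummy matrix is built with a Kronecker product, which relies on observations being sorted by individual and then by period.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 30, 4, 2                                    # hypothetical dimensions
eta = rng.normal(size=N)
X = rng.normal(size=(N, T, K)) + eta[:, None, None]   # regressors correlated with eta_i
y = X @ np.array([0.5, -1.0]) + eta[:, None] + rng.normal(size=(N, T))

# Within groups: OLS on deviations from individual means
X_til = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
y_til = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
b_wg = np.linalg.lstsq(X_til, y_til, rcond=None)[0]

# LSDV: OLS on the regressors plus N individual dummies D_1, ..., D_N
D = np.kron(np.eye(N), np.ones((T, 1)))               # NT x N, rows ordered (i, t)
Z = np.hstack([X.reshape(N * T, K), D])
coef = np.linalg.lstsq(Z, y.reshape(N * T), rcond=None)[0]
b_lsdv = coef[:K]                                     # the remaining N entries estimate eta_i

dof = N * T - K - N                                   # degrees of freedom of the LSDV regression
print(b_wg, b_lsdv, dof)
```

The two slope vectors should agree up to floating-point error, and the degrees of freedom equal $N(T-1) - K$.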

LSDV in Cigarette Demand
We can generate individual dummies and estimate by OLS, to check that it delivers the same results:

                                        OLS        WG         LSDV
ln Prices ($\beta_1$)                  -0.083     -0.292     -0.292
                                       (0.015)    (0.023)    (0.023)
ln Income ($\beta_2$)                  -0.032      0.107      0.107
                                       (0.006)    (0.019)    (0.019)
Individual 1 ($\beta_0 + \eta_1$)                             2.804
                                                             (0.288)
Individual 2 ($\beta_0 + \eta_2$)                             3.455
                                                             (0.398)
Individual 3 ($\beta_0 + \eta_3$)                             2.891
                                                             (0.416)
Individual 4 ($\beta_0 + \eta_4$)                             2.908
                                                             (0.384)
Individual 5 ($\beta_0 + \eta_5$)                             3.490
                                                             (0.433)
Individual 6 ($\beta_0 + \eta_6$)                             2.092
                                                             (0.325)
Individual 7 ($\beta_0 + \eta_7$)                             1.769
                                                             (0.393)
...                                                           ...

First-Differenced Least Squares
Another transformation we can consider is first differences:
$$\Delta y_{it} = \Delta x_{it}'\beta + \Delta v_{it}, \quad \text{for } i = 1, \dots, N; \; t = 2, \dots, T,$$
where $\Delta y_{it} = y_{it} - y_{it-1}$.
- It takes out time-invariant individual effects ($\Delta\eta_i = \eta_i - \eta_i = 0$), so OLS on the differenced model is consistent.
- Consistency requires $E[\Delta x_{it}\Delta v_{it}] = 0$, which is implied by, but weaker than, strict exogeneity.
- WG is more efficient than FDLS under classical assumptions.
- FDLS is more efficient if $v_{it}$ is a random walk ($\Delta v_{it} = \varepsilon_{it} \sim iid(0, \sigma_\varepsilon^2)$).
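
A short sketch of the first-difference estimator on simulated data (hypothetical design with one regressor and no intercept). As a side check, it also uses the fact that with only two periods the within groups and first-difference estimates coincide exactly, since each deviation from the individual mean is then plus or minus half the first difference.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta = 2000, 5, 1.0                  # hypothetical design
eta = rng.normal(size=(N, 1))
x = 0.8 * eta + rng.normal(size=(N, T))    # regressor correlated with eta_i
y = beta * x + eta + rng.normal(size=(N, T))

def fd(x, y):
    """OLS on first differences (scalar regressor, no intercept)."""
    dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)
    return (dx * dy).sum() / (dx ** 2).sum()

def wg(x, y):
    """OLS on deviations from individual means."""
    xt = x - x.mean(axis=1, keepdims=True)
    yt = y - y.mean(axis=1, keepdims=True)
    return (xt * yt).sum() / (xt ** 2).sum()

b_fd = fd(x, y)
# Keep only the first two periods: WG and FD are then numerically identical
b_fd_T2, b_wg_T2 = fd(x[:, :2], y[:, :2]), wg(x[:, :2], y[:, :2])
print(b_fd, b_fd_T2, b_wg_T2)
```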

FDLS in Cigarette Demand
To get FDLS we generate first differences and estimate by OLS:

                                        OLS        WG         LSDV       FDLS
ln Prices ($\beta_1$)                  -0.083     -0.292     -0.292     -0.413
                                       (0.015)    (0.023)    (0.023)    (0.035)
ln Income ($\beta_2$)                  -0.032      0.107      0.107      0.178
                                       (0.006)    (0.019)    (0.019)    (0.055)
Individual 1 ($\beta_0 + \eta_1$)                             2.804
                                                             (0.288)
Individual 2 ($\beta_0 + \eta_2$)                             3.455
                                                             (0.398)
Individual 3 ($\beta_0 + \eta_3$)                             2.891
                                                             (0.416)
Individual 4 ($\beta_0 + \eta_4$)                             2.908
                                                             (0.384)
Individual 5 ($\beta_0 + \eta_5$)                             3.490
                                                             (0.433)
Individual 6 ($\beta_0 + \eta_6$)                             2.092
                                                             (0.325)
Individual 7 ($\beta_0 + \eta_7$)                             1.769

The random effects model. Error components.

Uncorrelated effects
Now we assume uncorrelated or random effects: $E[x_{it}\eta_i] = 0$.
In this case, OLS is consistent, but not efficient.
The inefficiency arises from the serial correlation induced by $\eta_i$. For $s \neq t$:
$$E[u_{it}u_{is}] = E[(\eta_i + v_{it})(\eta_i + v_{is})] = E[\eta_i^2] = \sigma_\eta^2.$$
The variance of the unobservables (under classical assumptions) is:
$$E[u_{it}^2] = E[\eta_i^2] + E[v_{it}^2] = \sigma_\eta^2 + \sigma_v^2.$$

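Error structure
Therefore, the variance-covariance matrix of the unobservables is:
$$E[u_i u_i'] = \begin{pmatrix} \sigma_\eta^2 + \sigma_v^2 & \sigma_\eta^2 & \dots & \sigma_\eta^2 \\ \sigma_\eta^2 & \sigma_\eta^2 + \sigma_v^2 & \dots & \sigma_\eta^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\eta^2 & \sigma_\eta^2 & \dots & \sigma_\eta^2 + \sigma_v^2 \end{pmatrix} = \Omega_i,$$
whose dimensions are $T \times T$, and $E[u_i u_h'] = 0 \;\; \forall i \neq h$, or:
$$E[uu'] = \begin{pmatrix} \Omega_1 & 0 & \dots & 0 \\ 0 & \Omega_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \Omega_N \end{pmatrix} = \Omega,$$
which is block-diagonal with dimension $NT \times NT$.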

Generalized Least Squares
Under the classical assumptions, the GLS (Balestra-Nerlove) estimator is consistent and efficient if $E[x_{it}\eta_i] = 0$:
$$\hat\beta_{GLS} = \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}y.$$
If $E[x_{it}\eta_i] \neq 0$, GLS is inconsistent as $N \to \infty$ and $T$ is fixed.
This estimator is infeasible because we do not know $\sigma_\eta^2$ and $\sigma_v^2$.

Theta-differencing
$\hat\beta_{GLS}$ is equivalent to OLS on the theta-differenced model:
$$y_{it}^* = x_{it}^{*\prime}\beta + u_{it}^*,$$
where:
$$y_{it}^* = y_{it} - (1 - \theta)\bar y_i,$$
and:
$$\theta^2 = \frac{\sigma_v^2}{\sigma_v^2 + T\sigma_\eta^2}.$$
Consistency relies on $E[x_{it}\eta_i] = 0$ (as $\eta_i$ is not eliminated).
Two special cases:
- If $\sigma_\eta^2 = 0$ (i.e. no individual effect), then $\theta = 1$ and OLS is efficient.
- If $T \to \infty$, then $\theta \to 0$, and $y_{it}^* \to \tilde y_{it} = y_{it} - \bar y_i$: WG is efficient.
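
The equivalence between GLS and OLS on theta-differenced data can be verified numerically. The sketch below treats the variance components as known and uses made-up values for them and for the coefficients; it builds the block-diagonal $\Omega$ explicitly, which is only practical for small $N$ and $T$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K = 50, 4, 2
s2_eta, s2_v = 2.0, 1.0                      # variance components, treated as known here

X = rng.normal(size=(N * T, K))              # rows ordered (i, t)
eta = np.repeat(rng.normal(scale=np.sqrt(s2_eta), size=N), T)
y = X @ np.array([1.0, -0.5]) + eta + rng.normal(scale=np.sqrt(s2_v), size=N * T)

# GLS with the block-diagonal error covariance Omega = I_N (x) Omega_i
iota = np.ones((T, 1))
Omega_i = s2_v * np.eye(T) + s2_eta * (iota @ iota.T)
Omega_inv = np.kron(np.eye(N), np.linalg.inv(Omega_i))
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# OLS on theta-differenced data: a*_it = a_it - (1 - theta) * abar_i
theta = np.sqrt(s2_v / (s2_v + T * s2_eta))

def theta_diff(a):
    a = a.reshape(N, T, -1)
    return (a - (1 - theta) * a.mean(axis=1, keepdims=True)).reshape(N * T, -1)

X_star, y_star = theta_diff(X), theta_diff(y).ravel()
b_theta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
print(b_gls, b_theta)
```

Setting `s2_eta = 0.0` gives `theta = 1`, so the transformation leaves the data unchanged and the estimator collapses to pooled OLS.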

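Feasible GLS
$\hat\beta_{GLS}$ is infeasible because we do not know $\sigma_\eta^2$ and $\sigma_v^2$.
A consistent estimator of $\sigma_v^2$ is provided by the WG residuals:
$$\hat{\tilde v}_{it} \equiv \tilde y_{it} - \tilde x_{it}'\hat\beta_{WG} \quad \Rightarrow \quad \hat\sigma_v^2 = \frac{\hat{\tilde v}'\hat{\tilde v}}{N(T-1) - K}.$$
Then, a consistent estimator of $\sigma_\eta^2$ is given by the residuals of the between groups (BG) regression:
$$\bar y_i = \bar x_i'\beta + \eta_i + \bar v_i, \quad i = 1, \dots, N \quad \Rightarrow \quad \hat\beta_{BG},$$
whose error $\bar u_i = \eta_i + \bar v_i$ has variance $\sigma_{\bar u}^2 = \sigma_\eta^2 + \frac{1}{T}\sigma_v^2$:
$$\hat{\bar u}_i \equiv \bar y_i - \bar x_i'\hat\beta_{BG} \quad \Rightarrow \quad \hat\sigma_{\bar u}^2 = \frac{\hat{\bar u}'\hat{\bar u}}{N - K} \quad \Rightarrow \quad \hat\sigma_\eta^2 = \hat\sigma_{\bar u}^2 - \frac{1}{T}\hat\sigma_v^2.$$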

Feasible GLS in Cigarette Demand
In our example, if we now estimate $\hat\beta_{FGLS}$, we get:

                          OLS        WG         FGLS
ln Prices ($\beta_1$)    -0.083     -0.292     -0.122
                         (0.015)    (0.023)    (0.014)
ln Income ($\beta_2$)    -0.032      0.107     -0.012
                         (0.006)    (0.019)    (0.004)

Testing for correlated individual effects.

Testing for correlated effects (Hausman test)
$\hat\beta_{WG}$ is consistent regardless of $E[x_{it}\eta_i]$ being equal to zero or not.
$\hat\beta_{FGLS}$ is consistent only if $E[x_{it}\eta_i] = 0$.
$\Rightarrow$ We can test whether both estimates are similar!
The Hausman test does exactly this comparison:
$$h = \hat q'[\operatorname{avar}(\hat q)]^{-1}\hat q \overset{a}{\sim} \chi^2(K),$$
under the null hypothesis $E[x_{it}\eta_i] = 0$, where:
$$\hat q = \hat\beta_{WG} - \hat\beta_{FGLS},$$
and:
$$\operatorname{avar}(\hat q) = \operatorname{avar}(\hat\beta_{WG}) - \operatorname{avar}(\hat\beta_{FGLS}).$$
Requires classical assumptions (FGLS has to be more efficient than WG).
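
The feasible GLS steps and the Hausman comparison can be sketched together on simulated data. Everything about the design is an assumption made for illustration: one regressor ($K = 1$), no intercept (all simulated variables have mean zero), classical-assumption variance formulas, and a hypothetical parameter `rho` controlling the correlation between the regressor and the individual effect. The truncation of $\hat\sigma_\eta^2$ at zero is a common practical safeguard, not something stated in the slides.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K = 1000, 5, 1
rho = 0.8            # hypothetical loading of x on eta; 0.0 would give random effects

eta = rng.normal(size=(N, 1))
x = rho * eta + rng.normal(size=(N, T))
y = 1.0 * x + eta + rng.normal(size=(N, T))

def ols(a, b):
    """Scalar OLS slope without intercept, plus the sum of squared regressors."""
    sxx = (a * a).sum()
    return (a * b).sum() / sxx, sxx

# Within groups estimate and the implied estimate of sigma_v^2
xbar, ybar = x.mean(axis=1, keepdims=True), y.mean(axis=1, keepdims=True)
b_wg, sxx_wg = ols(x - xbar, y - ybar)
res_wg = (y - ybar) - b_wg * (x - xbar)
s2_v = (res_wg ** 2).sum() / (N * (T - 1) - K)

# Between groups regression on individual means gives sigma_ubar^2, then sigma_eta^2
b_bg, _ = ols(xbar, ybar)
res_bg = ybar - b_bg * xbar
s2_ubar = (res_bg ** 2).sum() / (N - K)
s2_eta = max(s2_ubar - s2_v / T, 0.0)      # truncated at zero as a safeguard

# Feasible GLS: OLS on theta-differenced data with the estimated theta
theta = np.sqrt(s2_v / (s2_v + T * s2_eta))
b_fgls, sxx_fgls = ols(x - (1 - theta) * xbar, y - (1 - theta) * ybar)

# Hausman statistic, using classical-assumption variances of both estimators
var_wg, var_fgls = s2_v / sxx_wg, s2_v / sxx_fgls
h = (b_wg - b_fgls) ** 2 / (var_wg - var_fgls)
print(f"WG {b_wg:.3f}  FGLS {b_fgls:.3f}  Hausman h = {h:.1f}")
```

With correlated effects the statistic should lie far above 3.84, the 5% critical value of a $\chi^2(1)$, so the null of uncorrelated effects is rejected in this design.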

Hausman test in Cigarette Demand
Output from software packages often includes the Hausman test. In our example:

                 Statistic    P-value
Hausman test     24.661       0.000

Dynamic Models

Autoregressive models with individual effects

Autoregressive panel data model
We consider the following model:
$$y_{it} = \alpha y_{it-1} + \eta_i + v_{it}, \quad |\alpha| < 1.$$
Other regressors can be included, but the main results are unaffected.
We assume:
- Error components: $E[\eta_i] = E[v_{it}] = E[\eta_i v_{it}] = 0$.
- Serially uncorrelated shocks: $E[v_{it}v_{is}] = 0 \;\; \forall s \neq t$.
- Predetermined initial condition: $E[y_{i0}v_{it}] = 0 \;\; \forall t = 1, \dots, T$.

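Properties of pooled OLS and WG estimators
Even assuming $E[y_{it-1}v_{it}] = 0$, OLS still delivers:
$$\operatorname*{plim}_{N \to \infty} \hat\alpha_{OLS} > \alpha,$$
because
$$E[y_{it-1}\eta_i] = \sigma_\eta^2\left(\frac{1 - \alpha^{t-1}}{1 - \alpha}\right) + \alpha^{t-1}E[y_{i0}\eta_i] > 0.$$
Doing the within groups transformation we see that:
$$\operatorname*{plim}_{N \to \infty} \hat\alpha_{WG} < \alpha,$$
because $E[\tilde y_{it-1}\tilde v_{it}] = -A\sigma_v^2 < 0$, where:
$$A = \frac{(1 - \alpha)\left(1 + T\left(1 - \alpha^{t-1} - \alpha^{T-t}\right)\right) + \alpha\left(1 - \alpha^{T-1}\right)}{T^2(1 - \alpha)^2}.$$
The WG bias vanishes as $T \to \infty$ (but the bias is not small even with $T = 15$).
Supposedly consistent estimators that give $\hat\alpha \gg \hat\alpha_{OLS}$ or $\hat\alpha \ll \hat\alpha_{WG}$ should be viewed with suspicion.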
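
A small simulation sketch of the two bias directions. The design is hypothetical: $\alpha = 0.5$, unit variances, and a stationary initial condition (an extra assumption that the slides only introduce later, for System GMM). No intercept is included because the simulated effects have mean zero.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, alpha = 5000, 6, 0.5                     # hypothetical design

eta = rng.normal(size=N)
y = np.empty((N, T + 1))                       # columns hold y_i0, ..., y_iT
# Stationary start (assumption): mean eta/(1-alpha), variance 1/(1-alpha^2)
y[:, 0] = eta / (1 - alpha) + rng.normal(size=N) / np.sqrt(1 - alpha ** 2)
for t in range(1, T + 1):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)

y_lag, y_cur = y[:, :-1], y[:, 1:]             # y_{it-1} and y_it for t = 1, ..., T

# Pooled OLS: the lagged dependent variable is positively correlated with eta_i
a_ols = (y_lag * y_cur).sum() / (y_lag ** 2).sum()

# Within groups: demeaning makes the lag correlated with the demeaned shock
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
a_wg = (yl * yc).sum() / (yl ** 2).sum()

print(f"OLS {a_ols:.3f} > alpha = {alpha} > WG {a_wg:.3f}")
```

The two estimates should bracket the true $\alpha$, which is why they are useful as bounds when judging other estimators.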

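Anderson and Hsiao
Consider the model in first differences:
$$\Delta y_{it} = \alpha\Delta y_{it-1} + \Delta v_{it}.$$
OLS in first differences is inconsistent: $E[\Delta y_{it-1}\Delta v_{it}] = -\sigma_v^2 < 0$.
However, $\Delta y_{it-2}$ or $y_{it-2}$ are valid instruments for $\Delta y_{it-1}$:
- Relevance: $E[\Delta y_{it-2}\Delta y_{it-1}] \neq 0$, $E[y_{it-2}\Delta y_{it-1}] \neq 0$.
- Orthogonality: $E[\Delta y_{it-2}\Delta v_{it}] = E[y_{it-2}\Delta v_{it}] = 0$.
Anderson and Hsiao (1981) proposed the following 2SLS estimator:
$$\hat\alpha_{AH} = \left(\widehat{\Delta y}_{-1}'\widehat{\Delta y}_{-1}\right)^{-1}\widehat{\Delta y}_{-1}'\Delta y,$$
where:
$$\widehat{\Delta y}_{-1} = Z(Z'Z)^{-1}Z'\Delta y_{-1},$$
and $Z$ can be $\Delta y_{-2}$ or $y_{-2}$.
Requires a minimum of three periods ($T = 2$ and $y_{i0}$). Only efficient if $T = 2$.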
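
A sketch of the Anderson-Hsiao idea with $y_{it-2}$ as the single instrument, in which case 2SLS reduces to a simple ratio of cross moments. The simulation design ($\alpha = 0.5$, unit variances, stationary start) is again an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, alpha = 5000, 6, 0.5                     # hypothetical design

eta = rng.normal(size=N)
y = np.empty((N, T + 1))                       # columns hold y_i0, ..., y_iT
y[:, 0] = eta / (1 - alpha) + rng.normal(size=N) / np.sqrt(1 - alpha ** 2)
for t in range(1, T + 1):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)

dy = np.diff(y, axis=1)                        # columns: Delta y_1, ..., Delta y_T
dep = dy[:, 1:]                                # Delta y_t,     t = 2, ..., T
lag = dy[:, :-1]                               # Delta y_{t-1}, t = 2, ..., T
z = y[:, :-2]                                  # y_{t-2},       t = 2, ..., T

# OLS in first differences: inconsistent because Delta y_{t-1} contains v_{t-1}
a_fd = (lag * dep).sum() / (lag ** 2).sum()

# Anderson-Hsiao with one instrument: IV ratio (equals 2SLS when just identified)
a_ah = (z * dep).sum() / (z * lag).sum()

print(f"FD-OLS {a_fd:.3f}   Anderson-Hsiao {a_ah:.3f}   true alpha {alpha}")
```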

AR(1) Cigarette Demand (No Covariates)
In our example, we redefine the model to be an AR(1) process (for now without regressors):
$$\ln C_{it} = \alpha \ln C_{it-1} + \eta_i + v_{it}.$$
The Anderson-Hsiao results (together with OLS and WG) are:

                                   OLS        WG         Anderson-Hsiao
Lagged cons. ($\ln C_{it-1}$)      0.982      0.884      1.395
                                  (0.003)    (0.061)    (0.090)

Differenced GMM estimation.

GMM in 3 slides (I): the setup
GMM finds parameter estimates that come as close as possible to satisfying orthogonality conditions (or moment conditions) in the sample.
E.g., consider $K$ regressors $x_i$ and $L$ instruments $z_i$:
$$u_i = y_i - f(x_i, \beta), \quad z_i = g(x_i).$$
The model specifies $L$ moment conditions:
$$E[z_i u_i] = 0.$$
Sample analogue:
$$b_N(\beta) = \frac{1}{N}\sum_{i=1}^{N} z_i u_i(\beta).$$

GMM in 3 slides (II): the estimation
Two possible cases:
- $L = K$ (just identified): $b_N(\hat\beta_{GMM}) = 0$.
- $L > K$ (overidentified): $\hat\beta_{GMM} = \arg\min_\beta \; b_N(\beta)'W_N b_N(\beta)$.
$W_N$ is a positive definite weighting matrix.
Optimal (efficient) GMM uses $\left(\frac{1}{N}\sum_{i=1}^{N}E[z_i u_i u_i' z_i']\right)^{-1}$, the inverse of the variance-covariance matrix of the moment conditions, as weighting matrix.

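GMM in 3 slides (III): asymptotic properties
$\hat\beta_{GMM}$ is a consistent estimator of $\beta$.
It is asymptotically normal, with the following variance:
$$\operatorname{avar}(\hat\beta_{GMM}) = (D'WD)^{-1}D'WS_0WD(D'WD)^{-1},$$
where:
$$D \equiv \operatorname*{plim}_{N \to \infty}\frac{\partial b_N(\beta)}{\partial\beta'}, \quad W \equiv \operatorname*{plim}_{N \to \infty} W_N, \quad S_0 \equiv \frac{1}{N}\sum_{i=1}^{N}E[z_i u_i u_i' z_i'].$$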

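Moment conditions
Given previous assumptions, several moment conditions are available:
$$\begin{array}{lll}
\text{Equation} & \text{Instruments} & \text{Orthogonality condition} \\
\Delta y_{i2} = \alpha\Delta y_{i1} + \Delta v_{i2} & y_{i0} & E[\Delta v_{i2}\, y_{i0}] = 0 \\
\Delta y_{i3} = \alpha\Delta y_{i2} + \Delta v_{i3} & y_{i0}, y_{i1} & E[\Delta v_{i3}\,(y_{i0}, y_{i1})'] = 0 \\
\Delta y_{i4} = \alpha\Delta y_{i3} + \Delta v_{i4} & y_{i0}, y_{i1}, y_{i2} & E[\Delta v_{i4}\,(y_{i0}, y_{i1}, y_{i2})'] = 0 \\
\quad\vdots & \quad\vdots & \quad\vdots \\
\Delta y_{iT} = \alpha\Delta y_{iT-1} + \Delta v_{iT} & y_{i0}, y_{i1}, y_{i2}, \dots, y_{iT-2} & E[\Delta v_{iT}\,(y_{i0}, y_{i1}, y_{i2}, \dots, y_{iT-2})'] = 0
\end{array}$$
We end up with $(T-1)T/2$ moment conditions.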

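Moment conditions in matrix form
We can write these moment conditions as $E[Z_i'\Delta v_i] = 0$, where:
$$Z_i = \begin{pmatrix} y_{i0} & 0 & 0 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & y_{i0} & y_{i1} & 0 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \dots & y_{i0} & y_{i1} & \dots & y_{iT-2} \end{pmatrix} \quad \text{and} \quad \Delta v_i = \begin{pmatrix} \Delta v_{i2} \\ \Delta v_{i3} \\ \vdots \\ \Delta v_{iT} \end{pmatrix},$$
and the sample analogue is:
$$b_N(\alpha) = \frac{1}{N}\sum_{i=1}^{N} Z_i'\Delta v_i(\alpha).$$
Hence, the GMM estimator (proposed by Arellano and Bond, 1991) is:
$$\hat\alpha_{GMM} = \arg\min_\alpha \left(\frac{1}{N}\sum_{i=1}^{N}\Delta v_i(\alpha)'Z_i\right) W_N \left(\frac{1}{N}\sum_{i=1}^{N}Z_i'\Delta v_i(\alpha)\right) = \left(\Delta y_{-1}'ZW_NZ'\Delta y_{-1}\right)^{-1}\Delta y_{-1}'ZW_NZ'\Delta y.$$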

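Optimal weighting matrix
The optimal weighting matrix (efficient GMM) is:
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N}E[Z_i'\Delta v_i\Delta v_i'Z_i]\right)^{-1}.$$
The sample analogue is obtained in a two-step procedure:
$$\hat W_N = \left(\frac{1}{N}\sum_{i=1}^{N}\left[Z_i'\widehat{\Delta v}_i(\hat\alpha)\,\widehat{\Delta v}_i(\hat\alpha)'Z_i\right]\right)^{-1}.$$
Windmeijer (2005) proposes a finite sample correction of the variance that accounts for $\alpha$ being estimated.
The most common one-step (and first-step) matrix uses the structure of $E[\Delta v_i\Delta v_i']$:
$$E[\Delta v_i\Delta v_i'] = \sigma_v^2\begin{pmatrix} 2 & -1 & 0 & 0 & \dots & 0 \\ -1 & 2 & -1 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \dots & 2 \end{pmatrix}.$$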
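
A compact sketch of one-step and two-step first-differenced GMM for the AR(1) model, building the instrument matrix $Z_i$ row by row. The simulated design (values of $N$, $T$, $\alpha$, unit variances, stationary start) is hypothetical, and the code is not meant to reproduce any particular software package; the Windmeijer correction for standard errors is not included.

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, alpha = 5000, 5, 0.5                     # hypothetical design

eta = rng.normal(size=N)
y = np.empty((N, T + 1))                       # columns hold y_i0, ..., y_iT
y[:, 0] = eta / (1 - alpha) + rng.normal(size=N) / np.sqrt(1 - alpha ** 2)
for t in range(1, T + 1):
    y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=N)

dy = np.diff(y, axis=1)
dep, lag = dy[:, 1:], dy[:, :-1]               # Delta y_t and Delta y_{t-1}, t = 2..T

# Instrument matrix: the row for equation t uses y_i0, ..., y_i,t-2
L = T * (T - 1) // 2
Z = np.zeros((N, T - 1, L))
col = 0
for r in range(T - 1):                         # row r is the equation for t = r + 2
    Z[:, r, col:col + r + 1] = y[:, :r + 1]
    col += r + 1

Zx = np.einsum('irl,ir->l', Z, lag)            # sum_i Z_i' Delta y_{-1,i}
Zy = np.einsum('irl,ir->l', Z, dep)            # sum_i Z_i' Delta y_i

def gmm(W):
    """Linear GMM estimate of alpha for a given weighting matrix W."""
    return (Zx @ W @ Zy) / (Zx @ W @ Zx)

# One-step: weighting based on the tridiagonal structure of E[dv dv']
H = 2 * np.eye(T - 1) - np.eye(T - 1, k=1) - np.eye(T - 1, k=-1)
W1 = np.linalg.inv(np.einsum('irl,rs,ism->lm', Z, H, Z) / N)
a1 = gmm(W1)

# Two-step: weighting based on first-step residuals
res = dep - a1 * lag
Zu = np.einsum('irl,ir->il', Z, res)           # N x L matrix with rows Z_i' dv_i
W2 = np.linalg.inv(Zu.T @ Zu / N)
a2 = gmm(W2)

print(f"one-step {a1:.3f}   two-step {a2:.3f}   true alpha {alpha}")
```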

GMM in Cigarette Demand
GMM results in our example are:

                                 Coefficient    Standard Error
Least Squares (OLS)              0.982          (0.003)
Within Groups (WG)               0.884          (0.061)
Anderson-Hsiao                   1.395          (0.090)
One-step GMM                     1.023          (0.104)
Two-step GMM                     0.994          (0.040)
Two-step GMM small sample        0.994          (0.121)

Potential limitations of Arellano-Bond
Weak instruments:
- When $\alpha \to 1$, the relevance of the instruments decreases.
- Instruments are still valid, but have poor small sample properties.
- Monte Carlo evidence shows that with $\alpha > 0.8$, the estimator behaves poorly unless huge samples are available.
- There are alternatives in the literature.
Overfitting:
- Too many instruments if $T$ is large relative to $N$.
- We might want to restrict the number of instruments used.
It is always good practice to check robustness to different combinations of instruments.

Dynamic linear model
Once we include regressors, the model is:
$$y_{it} = \alpha y_{it-1} + x_{it}'\beta + \eta_i + v_{it}, \quad |\alpha| < 1.$$
We maintain the previous assumptions: error components, serially uncorrelated shocks, and predetermined initial conditions.
Therefore, moment conditions of the kind:
$$E[y_{it-s}\Delta v_{it}] = 0, \quad t = 2, \dots, T, \; s \geq 2,$$
are still valid.

Assumptions on regressors
Different assumptions regarding $x_{it}$ will enable different additional orthogonality conditions:
- $x_{it}$ can be correlated or uncorrelated with $\eta_i$.
- $x_{it}$ can be endogenous, predetermined, or strictly exogenous with respect to $v_{it}$.
For instance, if assumptions are analogous to those for $y_{it-1}$, we may use $x_{it-1}$ (and longer lags) as instruments:
$$Z_i = \begin{pmatrix} y_{i0} & x_{i0} & x_{i1} & \dots & 0 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \dots & y_{i0} & \dots & y_{iT-2} & x_{i0} & \dots & x_{iT-1} \end{pmatrix}.$$

Dynamic Cigarette Demand with Regressors
In our example, we rewrite the demand equation as follows:
$$\ln C_{it} = \alpha \ln C_{it-1} + \beta_1 \ln P_{it} + \beta_2 \ln Y_{it} + \eta_i + v_{it}.$$
Results are:

                          OLS        WG         GMM
Lagged dep ($\alpha$)     0.947      0.528      0.495
                         (0.011)    (0.064)    (0.127)
ln Prices ($\beta_1$)     0.010     -0.501     -0.607
                         (0.006)    (0.098)    (0.143)
ln Income ($\beta_2$)     0.049      0.369      0.338
                         (0.011)    (0.044)    (0.051)

System GMM estimation

Additional orthogonality conditions
Recall our $(T-1)T/2$ moment conditions:
$$E[y_{it-s}\Delta v_{it}] = 0, \quad t = 2, \dots, T; \; s \geq 2.$$
System GMM (Arellano and Bover, 1995) uses the assumption $E[y_{i0} \mid \eta_i] = \frac{\eta_i}{1 - \alpha}$, which implies:
$$E[\Delta y_{it}\eta_i] = 0 \quad \forall t,$$
or, alternatively:
$$E[\Delta y_{it-s}u_{it}] = 0, \quad u_{it} = \eta_i + v_{it}, \quad s = 1, \dots, T-1.$$

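The System GMM estimator
Analogously to the first-differenced GMM, the estimator is based on $E[(Z_i^*)'u_i^*] = 0$:
$$\hat\alpha_{Sys\,GMM} = \left((X^*)'Z^*W_N(Z^*)'X^*\right)^{-1}(X^*)'Z^*W_N(Z^*)'y^*,$$
where:
$$Z_i^* = \begin{pmatrix} Z_i & 0 \;\dots\; 0 \\ 0 & \Delta y_{i1} \;\dots\; \Delta y_{iT-1} \end{pmatrix}, \quad u_i^* = \begin{pmatrix} \Delta v_i \\ \eta_i + v_{iT} \end{pmatrix}, \quad X_i^* = \begin{pmatrix} \Delta y_{-1,i} \\ y_{iT-1} \end{pmatrix}, \quad \text{and} \quad y_i^* = \begin{pmatrix} \Delta y_i \\ y_{iT} \end{pmatrix}.$$
This estimator is more efficient, as it uses additional moment conditions.
It reduces small sample bias, especially when $\alpha \to 1$.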

System-GMM in Cigarette Demand
GMM results in our example are:

                                       Coefficient    Standard Error
Least Squares (OLS)                    0.982          (0.003)
Within Groups (WG)                     0.884          (0.061)
Anderson-Hsiao                         1.395          (0.090)
One-step GMM                           1.023          (0.104)
Two-step GMM                           0.994          (0.040)
Two-step GMM small sample              0.994          (0.121)
One-step System-GMM                    0.926          (0.023)
Two-step System-GMM small sample       0.911          (0.032)

Specification tests

Specification tests
There are several aspects relevant for the validity of the estimation that can be tested formally:
- Orthogonality conditions: are the sample moments small enough not to reject that they are zero (overidentifying restrictions)?
- Validity of a subset of more restrictive assumptions (difference Sargan test, Hausman test).
- Serial correlation in the data: vital for the validity of the instruments (Arellano-Bond test).

Sargan/Hansen overidentifying restrictions test
The null hypothesis is that the orthogonality conditions are satisfied (i.e. moments are equal to zero).
The test can only be implemented if the problem is overidentified (otherwise the sample moments are exactly zero by construction).
The test is:
$$S = NJ_N(\hat\beta) = N\left(\frac{1}{N}\sum_{i=1}^{N}\hat u_i'Z_i\right)\left(\frac{1}{N}\sum_{i=1}^{N}Z_i'\tilde u_i\tilde u_i'Z_i\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N}Z_i'\hat u_i\right),$$
where $\tilde u$ are predicted residuals from the first step and $\hat u$ are those predicted from the second step, and:
$$S \overset{a}{\sim} \chi^2(L - K).$$
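
To illustrate the mechanics of the statistic, the sketch below uses a plain cross-sectional linear IV model with three instruments and one regressor rather than the panel setup, which is a simplification chosen to keep the code short. The function name `two_step_gmm_sargan` and the whole data-generating process are hypothetical.

```python
import numpy as np

def two_step_gmm_sargan(y, x, Z):
    """Two-step GMM for y = b*x + u with instruments Z (N x L); returns (b, S)."""
    N = len(y)
    Zx, Zy = Z.T @ x / N, Z.T @ y / N

    def estimate(W):
        return (Zx @ W @ Zy) / (Zx @ W @ Zx)

    b1 = estimate(np.linalg.inv(Z.T @ Z / N))    # first step (2SLS weighting)
    Zu1 = Z * (y - b1 * x)[:, None]              # rows z_i * first-step residual
    W2 = np.linalg.inv(Zu1.T @ Zu1 / N)          # weighting from first-step residuals
    b2 = estimate(W2)
    g = Z.T @ (y - b2 * x) / N                   # sample moments at second-step residuals
    return b2, N * g @ W2 @ g

rng = np.random.default_rng(8)
N = 5000
Z = rng.normal(size=(N, 3))
u = rng.normal(size=N)
x = Z @ np.array([1.0, 0.5, 0.5]) + 0.5 * u + rng.normal(size=N)   # endogenous regressor
y = 2.0 * x + u

b_valid, S_valid = two_step_gmm_sargan(y, x, Z)

Z_bad = Z.copy()
Z_bad[:, 2] += 0.5 * u                           # third instrument now correlated with u
b_bad, S_bad = two_step_gmm_sargan(y, x, Z_bad)

print(f"valid instruments: S = {S_valid:.2f}   one invalid instrument: S = {S_bad:.2f}")
```

Both statistics are compared with a $\chi^2(L - K) = \chi^2(2)$ distribution, whose 5% and 1% critical values are 5.99 and 9.21. With one invalid instrument the statistic should exceed them by a wide margin.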

Testing stronger assumptions
Some of the assumptions that we make are stronger than others.
If the problem is overidentified, we can test whether results change if we include or exclude the orthogonality conditions generated by them.
If they hold, the extra conditions increase efficiency; if not, the estimator is inconsistent!
Two ways of testing it:
- Overidentifying restrictions (the difference in Sargan statistics should be close to zero: $\Delta S = S - S^A \overset{a}{\sim} \chi^2(L - L^A)$).
- Differences in coefficients (Hausman test: $\hat q = \hat\delta_{GMM}^A - \hat\delta_{GMM}$).

Direct test for serial correlation
The test was proposed by Arellano and Bond (1991).
It tests for the presence of second-order autocorrelation in the first-differenced residuals.
If differences in residuals are second-order correlated, some instruments would not be valid!
The test is:
$$m_2 = \frac{\widehat{\Delta v}_{-2}'\,\widehat{\Delta v}_*}{se} \overset{a}{\sim} N(0, 1),$$
where $\widehat{\Delta v}_{-2}$ is the second lagged residual in differences, and $\widehat{\Delta v}_*$ is the part of the vector of contemporaneous first differences for the periods that overlap with the second lagged vector.
Values close to zero do not reject the hypothesis of no serial correlation.