Panel data

Repeated observations on the same cross-section of individual units.

Important advantages relative to pure cross-section data:
- possible to control for some unobserved heterogeneity
- possible to model dynamics

Examples
- individual earnings
- household expenditures
- firm investment
- sector productivity
- regional migration
- country income per capita, or growth rates

Dimensions of the panel are important for the asymptotic properties of different estimators.
- Large $N$, small $T$ is often found in microeconomic data.
- Longer $T$ is more common with aggregate data.

Semi-asymptotic results let one dimension become large with the other held fixed. Our emphasis will be on the case where $N \to \infty$ with $T$ fixed, which is more relevant for microeconometric applications.

Reliance on any asymptotic results is hazardous if neither $N$ nor $T$ is large.

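Static linear model

$$y_{it} = x_{it}\beta + w_i\gamma + (\eta_i + v_{it}) \quad \text{for } i = 1,\dots,N \text{ and } t = 1,\dots,T$$

$$\underset{1 \times K}{x_{it}} = (x_{1it},\dots,x_{Kit}), \quad \underset{K \times 1}{\beta} = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}, \quad \underset{1 \times G}{w_i} = (w_{1i},\dots,w_{Gi}), \quad \underset{G \times 1}{\gamma} = \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_G \end{pmatrix}$$

$y_{it}$, $\eta_i$, and $v_{it}$ are scalars.

Observed: $y_{it}$, $x_{it}$, $w_i$. Unobserved: $\eta_i$, $v_{it}$.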

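Stack the $T$ observations for each individual

$$y_i = X_i\beta + W_i\gamma + (\eta_i j_T + v_i) \quad \text{for } i = 1,\dots,N$$

$$\underset{T \times 1}{y_i} = \begin{pmatrix} y_{i1} \\ \vdots \\ y_{iT} \end{pmatrix}, \quad \underset{T \times K}{X_i} = \begin{pmatrix} x_{1i1} & \cdots & x_{Ki1} \\ \vdots & & \vdots \\ x_{1iT} & \cdots & x_{KiT} \end{pmatrix}, \quad \underset{T \times G}{W_i} = \begin{pmatrix} w_{1i} & \cdots & w_{Gi} \\ \vdots & & \vdots \\ w_{1i} & \cdots & w_{Gi} \end{pmatrix}$$

$$\underset{T \times 1}{\eta_i j_T} = \begin{pmatrix} \eta_i \\ \vdots \\ \eta_i \end{pmatrix}, \quad \underset{T \times 1}{v_i} = \begin{pmatrix} v_{i1} \\ \vdots \\ v_{iT} \end{pmatrix}$$

$j_T$ is a $T \times 1$ column vector with each element equal to one.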

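Then stack over the $N$ individuals

$$y = X\beta + W\gamma + (\eta + v)$$

$$\underset{NT \times 1}{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad \underset{NT \times K}{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad \underset{NT \times G}{W} = \begin{pmatrix} W_1 \\ \vdots \\ W_N \end{pmatrix}, \quad \underset{NT \times 1}{\eta} = \begin{pmatrix} \eta_1 j_T \\ \vdots \\ \eta_N j_T \end{pmatrix}, \quad \underset{NT \times 1}{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix}$$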
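The stacking can be illustrated with a small NumPy sketch. The dimensions, parameter values and array names below are illustrative assumptions, not part of the original notes; the only point is the individual-major ordering of the stacked rows.

```python
import numpy as np

# Illustrative dimensions and parameter values (assumptions for this sketch)
N, T, K, G = 3, 2, 2, 1
rng = np.random.default_rng(0)
x = rng.normal(size=(N, T, K))      # x[i, t] is the 1 x K row vector x_it
w = rng.normal(size=(N, G))         # w[i] is the 1 x G row vector w_i
eta = rng.normal(size=N)            # individual effects eta_i
v = rng.normal(size=(N, T))         # shocks v_it
beta = np.array([1.0, 2.0])
gamma = np.array([0.5])

j_T = np.ones((T, 1))                      # T x 1 vector of ones
X = x.reshape(N * T, K)                    # NT x K, blocks X_1, ..., X_N
W = np.kron(w, j_T)                        # NT x G, row w_i repeated T times
eta_stacked = np.kron(eta, j_T.ravel())    # NT vector (eta_1 j_T, ..., eta_N j_T)
v_stacked = v.reshape(N * T)               # NT vector (v_1, ..., v_N)

# Stacked model y = X beta + W gamma + (eta + v)
y = X @ beta + W @ gamma + (eta_stacked + v_stacked)
```

Reshaping `y` back to `(N, T)` recovers the observation-level equation for each $i$ and $t$.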

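Special case: no time-invariant explanatory variables ($G = 0$)

$$y = X\beta + (\eta + v)$$

$$\underset{NT \times 1}{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad \underset{NT \times K}{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad \underset{NT \times 1}{\eta} = \begin{pmatrix} \eta_1 j_T \\ \vdots \\ \eta_N j_T \end{pmatrix}, \quad \underset{NT \times 1}{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix}$$

Two important assumptions that we maintain throughout: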

$$y_i = X_i\beta + (\eta_i j_T + v_i)$$

Cross-sectional independence: observations on $(y_i, X_i)$ are independent over $i = 1,\dots,N$.

Slope parameter homogeneity: the parameters in $\beta$ are common to all $i = 1,\dots,N$.

The form of unobserved heterogeneity that we address relates to the individual-specific intercept terms ($\eta_i$) in our linear model relating $y_{it}$ to $x_{it}$. These are known as "fixed effects" or "random effects", depending on whether they are assumed to be correlated or uncorrelated, respectively, with the explanatory variables in $x_{it}$.

$$y = X\beta + (\eta + v) = X\beta + u, \qquad u = \eta + v, \quad u_{it} = \eta_i + v_{it}$$

Ordinary least squares

$$\hat\beta_{OLS} = (X'X)^{-1}X'y$$

Assumption ($x_{it}$ predetermined)

$$E[x_{it}v_{it}] = 0$$

Properties of $\hat\beta_{OLS}$ then depend on $E[x_{it}\eta_i]$.

Assumption (uncorrelated individual effects, or "random effects")

$$E[x_{it}\eta_i] = 0$$

Then $\hat\beta_{OLS}$ is a consistent estimator of $\beta$ as $N \to \infty$ or as $T \to \infty$ (or both).

$E[x_{it}v_{it}] = 0$ and $E[x_{it}\eta_i] = 0 \;\Rightarrow\; E[x_{it}u_{it}] = 0$.

OLS would be consistent under these assumptions for a single cross-section. The panel dimension is thus not critical here for consistency.

OLS is not efficient in the panel setting, unless $\sigma_\eta^2 = var(\eta_i) = 0$ (and $v_{it} \sim iid(0, \sigma_v^2)$).

Assumption (correlated individual effects, or "fixed effects")

$$E[x_{it}\eta_i] \neq 0$$

Then $\hat\beta_{OLS}$ is an inconsistent estimator of $\beta$ as $N \to \infty$ or as $T \to \infty$ (or both).

$E[x_{it}v_{it}] = 0$ and $E[x_{it}\eta_i] \neq 0 \;\Rightarrow\; E[x_{it}u_{it}] \neq 0$.

OLS using the panel (pooled OLS) is subject to the same kind of omitted variable bias as OLS in a single cross-section. Repeated observations do not change this, but they do allow us to transform the model in order to construct consistent estimators.
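A small simulation can illustrate the contrast between the two cases. This is a sketch under assumed design choices that are not in the original notes: a single regressor ($K = 1$), standard normal draws, and a hypothetical parameter `rho` that sets $x_{it} = \rho\,\eta_i + e_{it}$, so that `rho = 0` gives uncorrelated effects and `rho = 1` gives correlated effects. In this design the probability limit of pooled OLS is $\beta + \rho/(\rho^2 + 1)$.

```python
import numpy as np

def simulate_panel(N, T, beta, rho, seed=0):
    """Simulate y_it = beta * x_it + eta_i + v_it with x_it = rho * eta_i + e_it.

    All draws are standard normal (an assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=(N, 1))                # individual effects
    x = rho * eta + rng.normal(size=(N, T))      # regressor, correlated with eta if rho != 0
    v = rng.normal(size=(N, T))                  # idiosyncratic shocks
    y = beta * x + eta + v
    return y, x

def pooled_ols(y, x):
    """Pooled OLS slope without intercept (all variables have mean zero here)."""
    xs, ys = x.ravel(), y.ravel()
    return (xs @ ys) / (xs @ xs)

b_uncorr = pooled_ols(*simulate_panel(5000, 4, beta=1.0, rho=0.0))
b_corr = pooled_ols(*simulate_panel(5000, 4, beta=1.0, rho=1.0))
```

With uncorrelated effects the estimate should be close to the true value of 1; with `rho = 1` it should be close to 1.5, and adding more individuals does not remove the gap.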

Panel data is most useful when we suspect that cross-section regression results would be biased, due to (relevant and correlated) omitted variables. This is particularly so if it is plausible that the important omitted variables are time-invariant (or vary little over the sample period), while the explanatory variables of interest and the dependent variable vary over time.

Examples: Do high investment countries tend to have higher per capita income because investment raises income, or because factors like good governance or favorable geography raise both investment and income? Do high R&D firms tend to have higher TFP because R&D raises TFP, or because good managers both invest in R&D and (independently) achieve high TFP?

Classical panel data estimators

Assumption (strict exogeneity)

$$E[x_{it}v_{is}] = 0 \quad \text{for all } s, t$$

This assumption is crucial for asymptotic properties in the case where $N \to \infty$ with $T$ fixed, although not in the case where $T \to \infty$.

Strict exogeneity rules out feedback from past $v_{is}$ shocks to current $x_{it}$. Hence it rules out lagged dependent variables.

Assumption (error components)

$$E[\eta_i] = E[v_{it}] = E[\eta_i v_{it}] = 0$$

Assumption (serially uncorrelated shocks)

$$E[v_{it}v_{is}] = 0 \quad \text{for } s \neq t$$

Assumption (homoskedasticity)

$$E[\eta_i^2] = \sigma_\eta^2, \qquad E[v_{it}^2] = \sigma_v^2$$

For the case of uncorrelated individual effects, the inefficiency of pooled OLS reflects the serial correlation in $u_{it} = \eta_i + v_{it}$ due to the presence of the time-invariant individual effects ($\eta_i$).

$$u_{it} = \eta_i + v_{it}, \qquad u_{i,t-1} = \eta_i + v_{i,t-1}$$

Under the classical assumptions

$$E[u_{it}u_{i,t-1}] = E[\eta_i^2] = \sigma_\eta^2$$

and

$$E[u_{it}^2] = E[\eta_i^2] + E[v_{it}^2] = \sigma_\eta^2 + \sigma_v^2$$

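So

$$E[u_i u_i'] = \begin{pmatrix} \sigma_\eta^2 + \sigma_v^2 & \sigma_\eta^2 & \cdots & \sigma_\eta^2 \\ \sigma_\eta^2 & \sigma_\eta^2 + \sigma_v^2 & \cdots & \sigma_\eta^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_\eta^2 & \sigma_\eta^2 & \cdots & \sigma_\eta^2 + \sigma_v^2 \end{pmatrix} = \Omega_i \quad (T \times T)$$

And

$$E[uu'] = \begin{pmatrix} \Omega_i & 0 & \cdots & 0 \\ 0 & \Omega_i & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Omega_i \end{pmatrix} = \Omega \quad (NT \times NT)$$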
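The error covariance structure can be built directly, which makes the block-diagonal form explicit. This is a minimal sketch; the function names `omega_i` and `omega` and the numerical values are illustrative assumptions.

```python
import numpy as np

def omega_i(T, sigma2_eta, sigma2_v):
    """T x T covariance matrix of u_i = eta_i * j_T + v_i under the classical assumptions."""
    j = np.ones((T, 1))
    # sigma2_v on the diagonal only, sigma2_eta in every cell
    return sigma2_v * np.eye(T) + sigma2_eta * (j @ j.T)

def omega(N, T, sigma2_eta, sigma2_v):
    """NT x NT block-diagonal covariance matrix of the stacked error vector u."""
    return np.kron(np.eye(N), omega_i(T, sigma2_eta, sigma2_v))

Om_i = omega_i(3, 0.5, 1.0)   # diagonal 1.5, off-diagonal 0.5
Om = omega(2, 3, 0.5, 1.0)    # two 3 x 3 blocks on the diagonal, zeros elsewhere
```

The zero off-diagonal blocks reflect cross-sectional independence; the constant off-diagonal elements within each block reflect the time-invariant $\eta_i$.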

Generalised Least Squares

Under the classical assumptions, the GLS (or "random effects") estimator is consistent and efficient if $E[x_{it}\eta_i] = 0$

$$\hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$$

NB. Consistency requires all the explanatory variables to be uncorrelated with the individual effects. If $E[x_{it}\eta_i] \neq 0$, $\hat\beta_{GLS}$ is inconsistent as $N \to \infty$ with $T$ fixed.

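$\hat\beta_{GLS}$ can be obtained using OLS on the transformed model

$$y_{it}^* = x_{it}^*\beta + u_{it}^*$$

where

$$y_{it}^* = y_{it} - (1 - \theta)\bar{y}_i, \qquad \theta^2 = \frac{\sigma_v^2}{\sigma_v^2 + T\sigma_\eta^2}, \qquad \bar{y}_i = \frac{1}{T}\sum_{s=1}^{T} y_{is}$$

with $x_{it}^*$ and $u_{it}^*$ defined in the same way. This transformation is known as "theta-differencing".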
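The equivalence between GLS and OLS on theta-differenced data can be checked numerically. The sketch below treats $\sigma_\eta^2$ and $\sigma_v^2$ as known and uses simulated data; the sample sizes, parameter values and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 4, 2
sigma2_eta, sigma2_v = 0.5, 1.0          # treated as known in this sketch

X = rng.normal(size=(N, T, K))
eta = rng.normal(scale=np.sqrt(sigma2_eta), size=(N, 1))
v = rng.normal(scale=np.sqrt(sigma2_v), size=(N, T))
beta = np.array([1.0, -0.5])
y = X @ beta + eta + v                   # shape (N, T)

# Direct GLS using the NT x NT block-diagonal Omega
Omega_i = sigma2_v * np.eye(T) + sigma2_eta * np.ones((T, T))
Omega_inv = np.kron(np.eye(N), np.linalg.inv(Omega_i))
Xs, ys = X.reshape(N * T, K), y.reshape(N * T)
b_gls = np.linalg.solve(Xs.T @ Omega_inv @ Xs, Xs.T @ Omega_inv @ ys)

# OLS on theta-differenced data
theta = np.sqrt(sigma2_v / (sigma2_v + T * sigma2_eta))
y_star = (y - (1 - theta) * y.mean(axis=1, keepdims=True)).reshape(N * T)
X_star = (X - (1 - theta) * X.mean(axis=1, keepdims=True)).reshape(N * T, K)
b_theta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
```

The two coefficient vectors agree up to floating-point error, because theta-differencing is proportional to premultiplying each block by $\Omega_i^{-1/2}$. The second route avoids forming and inverting the $NT \times NT$ matrix.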

Feasible GLS uses consistent estimates of $\sigma_\eta^2$ and $\sigma_v^2$ to obtain a consistent estimate of $\theta$. These can be obtained using residuals from the Within Groups and Between Groups estimators (to be discussed below).

Feasible GLS is asymptotically equivalent to true GLS for this model. Hence feasible GLS is asymptotically efficient under the classical assumptions, when $E[x_{it}\eta_i] = 0$.

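$$y_{it}^* = y_{it} - (1 - \theta)\bar{y}_i, \qquad \theta^2 = \frac{\sigma_v^2}{\sigma_v^2 + T\sigma_\eta^2}$$

For $\sigma_\eta^2 = 0$, $\theta = 1$ and $y_{it}^* = y_{it}$. This is the special case where OLS is efficient.

As $T \to \infty$, $\theta \to 0$ and $y_{it}^* \to y_{it} - \bar{y}_i$. In this case GLS coincides with the simpler Within Groups estimator (discussed below), and estimation of $\theta$ becomes redundant.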
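The two limiting cases can be seen by evaluating $\theta$ directly. The helper name `theta` and the numerical values are illustrative assumptions.

```python
import math

def theta(sigma2_v, sigma2_eta, T):
    """Positive root of theta^2 = sigma2_v / (sigma2_v + T * sigma2_eta)."""
    return math.sqrt(sigma2_v / (sigma2_v + T * sigma2_eta))

theta_no_effects = theta(1.0, 0.0, 5)      # sigma2_eta = 0, so theta = 1 (pooled OLS)
theta_short = theta(1.0, 1.0, 3)           # sqrt(1/4) = 0.5
theta_long = theta(1.0, 1.0, 1_000_000)    # close to 0 (Within Groups)
```

For a given variance ratio, $\theta$ falls monotonically in $T$, so GLS moves from pooled OLS towards Within Groups as the time dimension grows.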

Within Groups

Within transformation

$$\tilde{y}_{it} = y_{it} - \bar{y}_i, \qquad \bar{y}_i = \frac{1}{T}\sum_{s=1}^{T} y_{is}$$

Key property

$$\bar{\eta}_i = \eta_i \quad \text{so that} \quad \tilde{\eta}_i = \eta_i - \bar{\eta}_i = 0$$

This is an example of a transformation that eliminates time-invariant variables.

Notice that theta-differencing does not eliminate the time-invariant individual effects ($\eta_i$) from the error term for $\theta \neq 0$, since $\eta_i^* = \eta_i - (1 - \theta)\eta_i = \theta\eta_i$. Hence we require $E[x_{it}\eta_i] = 0$ for GLS to be consistent.

Transformed model

$$\tilde{y}_{it} = \tilde{x}_{it}\beta + \tilde{v}_{it}$$

The Within Groups (or "fixed effects") estimator is OLS on this transformed model

$$\hat\beta_{WG} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y}$$

Under the classical assumptions, $\hat\beta_{WG}$ is consistent, both for $E[x_{it}\eta_i] = 0$ and for $E[x_{it}\eta_i] \neq 0$. Since the time-invariant individual effects ($\eta_i$) are eliminated from the error term by the within transformation, we do not require $E[x_{it}\eta_i] = 0$ for Within Groups to be consistent.
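The Within Groups estimator can be sketched in a few lines. The function names, the single-regressor design and the simulated data (with $x_{it}$ deliberately correlated with $\eta_i$) are assumptions made for illustration.

```python
import numpy as np

def within(a):
    """Subtract individual time means; a has shape (N, T) or (N, T, K)."""
    return a - a.mean(axis=1, keepdims=True)

def within_groups(y, X):
    """OLS on within-transformed data; y is (N, T), X is (N, T, K)."""
    N, T, K = X.shape
    Xt = within(X).reshape(N * T, K)
    yt = within(y).reshape(N * T)
    return np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)

# Simulated example with correlated individual effects (illustrative values)
rng = np.random.default_rng(1)
N, T = 2000, 4
eta = rng.normal(size=(N, 1))
X = (eta + rng.normal(size=(N, T)))[:, :, None]   # x_it = eta_i + e_it
y = 2.0 * X[:, :, 0] + eta + rng.normal(size=(N, T))

b_wg = within_groups(y, X)
eta_tilde = within(np.repeat(eta, T, axis=1))     # within-transformed eta_i
```

Here `eta_tilde` is identically zero, and `b_wg` should be close to the true coefficient of 2 even though the regressor is correlated with the individual effect.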

The Within Groups estimator is thus consistent in the case where some or all of the explanatory variables are correlated with this unobserved heterogeneity. In some contexts this is a key advantage, relative to cross-section OLS, pooled OLS or GLS. This illustrates how we can construct consistent estimates using panel data in settings where cross-section OLS would be subject to omitted variables bias.

But note that this advantage comes at a price.

As $N \to \infty$ with $T$ fixed, $\hat\beta_{WG}$ is less efficient than $\hat\beta_{GLS}$ in the case where $E[x_{it}\eta_i] = 0$. $\hat\beta_{WG}$ is only efficient (under classical assumptions) in the special case where all the explanatory variables are correlated with $\eta_i$.

Moreover, any observed time-invariant explanatory variables are also eliminated by the transformation, so the Within Groups estimator does not identify the $\gamma$ parameters in the more general model

$$y_{it} = x_{it}\beta + w_i\gamma + (\eta_i + v_{it})$$

This illustrates that repeated observations (i.e. panel data) are most useful when the variables of interest change over time, and less useful when the variables of interest remain constant over time.

For example, panel data is less successful in controlling for unobserved ability if we want to estimate the effect of schooling on earnings, since years of schooling remain constant for most people once they leave full-time education and join the labour force.

More generally, the Within Groups parameter estimates are likely to be imprecise if there is only limited time-series ("within") variation.

The Within Groups estimator of $\beta$ can also be obtained by including a set of $N$ dummy variables, one for each individual

$$y_{it} = \eta_1 D_{1i} + \dots + \eta_N D_{Ni} + x_{it}\beta + v_{it}$$

and using OLS on this model ($D_{1i} = 1$ for the observations on individual 1, and zero otherwise).

Hence Within Groups is also called Least Squares Dummy Variables (LSDV).
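The numerical equivalence of the dummy-variable regression and the within regression can be verified on a small simulated panel. The dimensions, coefficient values and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 3, 2
X = rng.normal(size=(N, T, K))
eta = rng.normal(size=(N, 1))
y = X @ np.array([1.0, 2.0]) + eta + rng.normal(size=(N, T))

# LSDV: regress y on N individual dummies and the K regressors
D = np.kron(np.eye(N), np.ones((T, 1)))            # NT x N dummy matrix
Z = np.hstack([D, X.reshape(N * T, K)])
coef, *_ = np.linalg.lstsq(Z, y.reshape(N * T), rcond=None)
eta_hat, b_lsdv = coef[:N], coef[N:]               # dummy coefficients, slopes

# Within Groups: OLS on deviations from individual means
Xt = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
yt = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
b_wg, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
```

The slope estimates coincide. The within form is usually preferred computationally when $N$ is large, since it avoids a regression with $N + K$ columns.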

Note that, in the case where $N \to \infty$ with $T$ fixed, consistency depends on the strict exogeneity assumption.

$$\tilde{x}_{it} = x_{it} - \frac{1}{T}(x_{i1} + \dots + x_{iT}), \qquad \tilde{v}_{it} = v_{it} - \frac{1}{T}(v_{i1} + \dots + v_{iT})$$

Hence $E[\tilde{x}_{it}\tilde{v}_{it}] = 0$ requires $E[x_{it}v_{is}] = 0$ for all $s, t$, unless $T \to \infty$.

This motivated the development of alternative estimators for dynamic panel data models that are consistent as $N \to \infty$ for fixed $T$, in the presence of (e.g.) lagged dependent variables.
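Writing out the expectation term by term shows where the cross-period moments enter. This expansion follows directly from the definitions of the within-transformed variables; the summation indices $s$ and $r$ are introduced here for illustration.

```latex
\begin{aligned}
E[\tilde{x}_{it}\tilde{v}_{it}]
&= E\big[(x_{it} - \bar{x}_i)(v_{it} - \bar{v}_i)\big] \\
&= E[x_{it}v_{it}]
 - \frac{1}{T}\sum_{s=1}^{T} E[x_{it}v_{is}]
 - \frac{1}{T}\sum_{s=1}^{T} E[x_{is}v_{it}]
 + \frac{1}{T^2}\sum_{s=1}^{T}\sum_{r=1}^{T} E[x_{is}v_{ir}]
\end{aligned}
```

The first term is zero if $x_{it}$ is merely predetermined, but the remaining terms involve $E[x_{is}v_{ir}]$ for $s \neq r$, which strict exogeneity sets to zero. With $T$ fixed these terms carry weights of order $1/T$ and do not disappear as $N$ grows.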

Other estimators

Between Groups

Between Groups is OLS on the cross-section equation

$$\bar{y}_i = \bar{x}_i\beta + (\eta_i + \bar{v}_i) \quad \text{for } i = 1,\dots,N$$

Consistency requires $E[x_{it}\eta_i] = 0$.

Between Groups is not efficient. It is only used to obtain an estimate of $\sigma_\eta^2$ when implementing feasible GLS.

First-differenced OLS

OLS on the first-differenced equations

$$\Delta y_{it} = \Delta x_{it}\beta + \Delta v_{it} \quad \text{for } i = 1,\dots,N \text{ and } t = 2,\dots,T$$

where $\Delta y_{it} = y_{it} - y_{i,t-1}$.

First-differencing is another transformation that eliminates the time-invariant individual effects ($\Delta\eta_i = \eta_i - \eta_i = 0$).

Consistency requires $E[\Delta x_{it}\Delta v_{it}] = 0$. This is implied by (but weaker than) strict exogeneity.
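First-differenced OLS is straightforward to sketch with `np.diff`. The function name, the single-regressor design and the simulated data (with $x_{it}$ correlated with $\eta_i$) are illustrative assumptions.

```python
import numpy as np

def first_difference_ols(y, X):
    """OLS on first-differenced data; y is (N, T), X is (N, T, K)."""
    dy = np.diff(y, axis=1)                 # (N, T-1): y_it - y_i,t-1 for t = 2..T
    dX = np.diff(X, axis=1)                 # (N, T-1, K)
    N, Tm1, K = dX.shape
    dXs, dys = dX.reshape(N * Tm1, K), dy.reshape(N * Tm1)
    return np.linalg.solve(dXs.T @ dXs, dXs.T @ dys)

# Simulated example with correlated individual effects (illustrative values)
rng = np.random.default_rng(3)
N, T = 2000, 4
eta = rng.normal(size=(N, 1))
X = (eta + rng.normal(size=(N, T)))[:, :, None]   # x_it = eta_i + e_it
y = 2.0 * X[:, :, 0] + eta + rng.normal(size=(N, T))

b_fd = first_difference_ols(y, X)
```

The estimate should be close to the true coefficient of 2. One observation per individual is lost to differencing, so the regression uses $N(T-1)$ rows. With $T = 2$, first-differenced OLS and Within Groups give identical estimates.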

Within Groups is more efficient than first-differenced OLS under the classical assumptions, i.e. $v_{it} \sim iid(0, \sigma_v^2)$, serially uncorrelated and homoskedastic.

First-differenced OLS is more efficient if $v_{it}$ is a random walk, i.e. $v_{it} = v_{i,t-1} + \varepsilon_{it}$ with $\varepsilon_{it} \sim iid(0, \sigma_\varepsilon^2)$, so that $\Delta v_{it} = \varepsilon_{it}$ is serially uncorrelated.

First-differenced OLS (but not Within Groups) would also be consistent as $N \to \infty$ with $T$ fixed in cases where we suspect feedback from second or longer lags of $v_{is}$ onto $x_{it}$, but not from the first lag ($v_{i,t-1}$) onto $x_{it}$, i.e. where any feedback takes two or more periods to influence $x_{it}$.

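Calculating the feasible GLS estimator

$$y_{it}^* = x_{it}^*\beta + u_{it}^*, \qquad y_{it}^* = y_{it} - (1 - \theta)\bar{y}_i, \qquad \theta^2 = \frac{\sigma_v^2}{\sigma_v^2 + T\sigma_\eta^2}$$

$\sigma_v^2$ can be estimated consistently using the Within Groups residuals

$$\hat{\tilde{v}}_{it} = \tilde{y}_{it} - \tilde{x}_{it}\hat\beta_{WG}, \qquad \hat\sigma_v^2 = \frac{\hat{\tilde{v}}'\hat{\tilde{v}}}{N(T-1) - K}$$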

Notice that although we have $NT$ observations and $K$ parameters, we have only $N(T-1) - K$ degrees of freedom for the Within Groups estimator.
- We estimate $N$ parameters when we estimate the individual means ($\bar{y}_i$ for $i = 1,\dots,N$) used to construct the within transformation
- or, equivalently, when we estimate coefficients on the $N$ individual dummy variables in the Least Squares Dummy Variables representation.
- This is also relevant when we estimate the asymptotic variance of the Within Groups estimator using

$$\widehat{avar}(\hat\beta_{WG}) = \hat\sigma_v^2(\tilde{X}'\tilde{X})^{-1}$$

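$\sigma_\eta^2$ can then be estimated consistently using the Between Groups residuals

$$\hat{u}_i = \widehat{(\eta_i + \bar{v}_i)} = \bar{y}_i - \bar{x}_i\hat\beta_{BG}$$

$$\hat\sigma_u^2 = \widehat{\left(\sigma_\eta^2 + \tfrac{1}{T}\sigma_v^2\right)} = \frac{\hat{u}'\hat{u}}{N - K}$$

and then

$$\hat\sigma_\eta^2 = \hat\sigma_u^2 - \frac{1}{T}\hat\sigma_v^2$$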
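The variance-component steps can be collected into one function. This is a sketch under stated assumptions: the function name and simulated data are illustrative, no intercept is included because all simulated variables have mean zero, and truncating a negative $\hat\sigma_\eta^2$ at zero is a practical choice made here rather than something taken from the notes.

```python
import numpy as np

def variance_components(y, X):
    """Estimate sigma2_v, sigma2_eta and theta; y is (N, T), X is (N, T, K)."""
    N, T, K = X.shape
    ybar, Xbar = y.mean(axis=1), X.mean(axis=1)

    # Within Groups residuals give sigma2_v, with N(T-1) - K degrees of freedom
    Xt = (X - Xbar[:, None, :]).reshape(N * T, K)
    yt = (y - ybar[:, None]).reshape(N * T)
    b_wg = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
    e_w = yt - Xt @ b_wg
    s2_v = e_w @ e_w / (N * (T - 1) - K)

    # Between Groups residuals give sigma2_u = sigma2_eta + sigma2_v / T
    b_bg = np.linalg.solve(Xbar.T @ Xbar, Xbar.T @ ybar)
    e_b = ybar - Xbar @ b_bg
    s2_u = e_b @ e_b / (N - K)

    # Truncation at zero is an assumption of this sketch (guards small samples)
    s2_eta = max(s2_u - s2_v / T, 0.0)
    theta = np.sqrt(s2_v / (s2_v + T * s2_eta))
    return s2_v, s2_eta, theta

# Simulated example with uncorrelated effects (illustrative values)
rng = np.random.default_rng(4)
N, T, K = 3000, 5, 2
X = rng.normal(size=(N, T, K))
eta = rng.normal(scale=np.sqrt(0.5), size=(N, 1))
y = X @ np.array([1.0, -1.0]) + eta + rng.normal(size=(N, T))

s2_v_hat, s2_eta_hat, theta_hat = variance_components(y, X)
```

With true values $\sigma_v^2 = 1$ and $\sigma_\eta^2 = 0.5$, the estimates should be close to 1 and 0.5, and `theta_hat` can then be used to theta-difference the data for feasible GLS.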

Testing for correlated individual effects

With fixed $T$, it is useful to test whether some of the included explanatory variables are correlated with the unobserved individual effects.
- $\hat\beta_{WG}$ is consistent whether or not the individual effects are correlated with the included regressors.
- $\hat\beta_{GLS}$ (and $\hat\beta_{BG}$) is consistent only if the individual effects are uncorrelated with all the included regressors; it is biased and inconsistent otherwise.

The estimates should be similar if $\eta_i$ is uncorrelated with all the included regressors, but different if $\eta_i$ is correlated with any of the included regressors.

Hausman test

$$\hat{q} = \hat\beta_{WG} - \hat\beta_{GLS}$$

$$h = \hat{q}'[\widehat{avar}(\hat{q})]^{-1}\hat{q} \;\overset{a}{\sim}\; \chi^2(K)$$

under the null hypothesis that $E[x_{it}\eta_i] = 0$, where

$$avar(\hat{q}) = avar(\hat\beta_{WG}) - avar(\hat\beta_{GLS})$$

An equivalent test can be based on $\hat{q} = \hat\beta_{WG} - \hat\beta_{BG}$.

These tests require the classical assumptions, under which the FGLS estimator is efficient relative to the Within estimator under the null. Versions robust to heteroskedasticity are now available.
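The classical form of the test can be sketched as follows. The function names, the single-regressor simulation with a hypothetical correlation parameter `rho`, the omission of an intercept (all simulated variables have mean zero) and the truncation of $\hat\sigma_\eta^2$ at zero are all assumptions of this sketch. The same $\hat\sigma_v^2$ is used in both variance estimates, which keeps their difference positive semi-definite.

```python
import numpy as np

def hausman(y, X):
    """Hausman statistic comparing Within Groups with feasible GLS.

    y is (N, T), X is (N, T, K). Returns (h, K), where K is the
    degrees of freedom of the chi-squared reference distribution.
    """
    N, T, K = X.shape
    ybar, Xbar = y.mean(axis=1), X.mean(axis=1)

    # Within Groups estimate and sigma2_v
    Xt = (X - Xbar[:, None, :]).reshape(N * T, K)
    yt = (y - ybar[:, None]).reshape(N * T)
    XtXt_inv = np.linalg.inv(Xt.T @ Xt)
    b_wg = XtXt_inv @ Xt.T @ yt
    e_w = yt - Xt @ b_wg
    s2_v = e_w @ e_w / (N * (T - 1) - K)

    # Between Groups residuals give sigma2_eta (truncated at zero: an assumption)
    b_bg = np.linalg.solve(Xbar.T @ Xbar, Xbar.T @ ybar)
    e_b = ybar - Xbar @ b_bg
    s2_eta = max(e_b @ e_b / (N - K) - s2_v / T, 0.0)

    # Feasible GLS via theta-differencing
    theta = np.sqrt(s2_v / (s2_v + T * s2_eta))
    Xs = (X - (1 - theta) * Xbar[:, None, :]).reshape(N * T, K)
    ys = (y - (1 - theta) * ybar[:, None]).reshape(N * T)
    XsXs_inv = np.linalg.inv(Xs.T @ Xs)
    b_gls = XsXs_inv @ Xs.T @ ys

    # h = q' [avar(WG) - avar(GLS)]^{-1} q
    q = b_wg - b_gls
    avar_q = s2_v * (XtXt_inv - XsXs_inv)
    return float(q @ np.linalg.solve(avar_q, q)), K

def simulate(N, T, rho, seed):
    """y_it = x_it + eta_i + v_it with x_it = rho * eta_i + e_it (K = 1)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=(N, 1))
    x = rho * eta + rng.normal(size=(N, T))
    y = x + eta + rng.normal(size=(N, T))
    return y, x[:, :, None]

h_null, K = hausman(*simulate(1000, 4, rho=0.0, seed=0))   # uncorrelated effects
h_alt, _ = hausman(*simulate(1000, 4, rho=1.0, seed=0))    # correlated effects
```

With $K = 1$ the 5% critical value of $\chi^2(1)$ is about 3.84. In the correlated design `h_alt` should exceed it by a wide margin; `h_null` is a single draw from a distribution that is approximately $\chi^2(1)$.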