1 Motivation

Next, we discuss econometric methods that can be used to estimate panel data models. Panel data is a repeated observation of the same cross section. Panel data is highly desirable when it is available:

1. Increased precision: additional observations and moment conditions
2. Unobserved heterogeneity: random effects or fixed effects
3. Individual-level dynamics
The text presents a large number of estimators. We shall focus on the most commonly used estimators for empirical work. Alternative moment conditions will be discussed in detail. Derivation of standard errors receives limited discussion since it is fairly standard.
2 The Basic Model

Here $i = 1,\ldots,N$ indexes an individual and $t = 1,\ldots,T$ indexes time; $y_{it}$ and $x_{it}$ are the endogenous and exogenous variables. In most work in applied micro, N is large and T is small, so asymptotic theory is conducted as $N \to \infty$. The term $\alpha_i$ reflects unobserved heterogeneity. Assume linearity and write

$y_{it} = \alpha_i + x_{it}'\beta + u_{it}$    (1)
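As a toy illustration, equation (1) can be simulated directly. All parameter values, sample sizes, and distributions below are invented for illustration (the model itself does not specify them):

```python
import numpy as np

rng = np.random.default_rng(0)

N, T = 500, 4          # many individuals, few periods (large-N asymptotics)
beta = 1.5             # illustrative slope on a single regressor

alpha = rng.normal(0.0, 1.0, size=N)      # unobserved heterogeneity alpha_i
x = rng.normal(0.0, 1.0, size=(N, T))     # regressor x_it
u = rng.normal(0.0, 0.5, size=(N, T))     # idiosyncratic error u_it

# y_it = alpha_i + x_it * beta + u_it, i.e. equation (1)
y = alpha[:, None] + x * beta + u
```

The N-by-T array layout (individuals in rows, periods in columns) is reused in the estimation sketches below.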
2.1 Remark about large T

Different approaches are required for large T. We would want to be much more careful about the time series properties of the error terms, and a different approach to establishing asymptotic theory is needed. Calculation of standard errors would also differ. The discussion of serial correlation in Differences in Differences shows that it is important here as well.
2.2 Random Effects and Fixed Effects

How should we model $\alpha_i$? There are two common approaches. The first is to assume that $\alpha_i$ is iid across agents $i$ and independent of all the other variables in our model. Frequently, $\alpha_i \sim N(\alpha, \sigma_\alpha^2)$. This is the random effects model. The second is to assume that $\alpha_i$ is a fixed and unknown parameter. The fixed effects model is clearly more general.
Most applied economists prefer fixed effects to random effects for that reason. However, there are some tradeoffs:

1. If some component of $x_{it}$ is not time varying, we cannot separately identify its coefficient from $\alpha_i$. Random effects still allows identification.
2. In fixed effects estimators, we can only identify $\beta$; $\alpha_i$ is treated as a nuisance parameter and not estimated. We only learn a subset of the parameters. Hence, we cannot simulate the model, for example.
3. Fixed effects estimators rely on within variation, i.e. changes in $y_{it}$ and $x_{it}$, for identification. This destroys variation and may lead to less precise estimates.
4. In nonlinear models, sometimes random effects are possible when fixed effects are not.
3 Exogeneity assumptions

We need to make assumptions about $E[u_{it} \mid x_{i,1},\ldots,x_{i,t}]$. More generally, we can make assumptions about $E[u_{it} \mid z_{i,1},\ldots,z_{i,t}]$, where $z_{it}$ is a vector of instruments and $z_{it} = x_{it}$ is one possible set of instruments. Carefully justifying exogeneity assumptions is crucial in applied work. Interpreting $u_{it}$ and using economic theory/intuition to justify moment conditions is one common approach. The following steps are commonly (but not always) used:
1. A first order condition is used to form (1).
2. We interpret $\alpha_i$ and $u_{it}$ given the available data and the economic theory: $\alpha_i$ captures omitted variables that are persistent, while $u_{it}$ captures transitory omitted variables, measurement error, etc.
3. We must justify, often using theory, why $u_{it}$ satisfies our moment condition.

We shall describe alternative identification strategies at a high level and then give examples.

3.1 Contemporaneous Exogeneity

$E[u_{it} z_{it}] = 0$ for all $t = 1,\ldots,T$
If $z_{it}$ is an $r \times 1$ vector, there are $rT$ moment conditions. In a cross sectional setting, we only have $r$ moment conditions; here the number increases with the number of observations per individual.

3.2 Weak Exogeneity

$E[u_{it} z_{is}] = 0$ for $s \le t$

Here the error terms are assumed to be uncorrelated with all current and lagged values of the instruments. This type of condition can sometimes be justified from a stochastic Euler equation.
Starting with an Euler equation, we may be able to isolate the dependent variable on the lhs and exogenous variables on the rhs in a linear fashion using appropriate functional forms. Even if we cannot make them linear, we can approximate many functions using higher order terms in various basis functions. The error term is an expectational error. Expectational errors must be mean zero conditional on information at time $t$. Intuitively, this means that prior values of $z_{is}$ cannot be used to forecast the expectational error. We shall give examples. This is a stronger assumption, but it yields many more moment conditions.
3.3 Strong Exogeneity

Here we assume that $E[u_{it} z_{is}] = 0$ for $s = 1,\ldots,T$. That is, our error term is uncorrelated with past, current, and future values of the instrument. This is a stronger assumption: for example, future values of the instrument might otherwise allow us to predict the expectational error. However, in our auction model example, discussed below, the error term is private information. Economic theory sometimes implies that independent private information is uncorrelated with even future information.
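The three assumptions differ in how many moment conditions they deliver. A small sketch (the helper `moment_pairs` is ours, not from the text) that enumerates the pairs $(t,s)$ for which $E[u_{it} z_{is}] = 0$, taking a scalar instrument ($r = 1$):

```python
# Enumerate the moment conditions E[u_it z_is] = 0 implied by each
# exogeneity assumption, for T periods and a scalar instrument (r = 1).
def moment_pairs(T, kind):
    """Return the (t, s) pairs with E[u_it z_is] = 0 under `kind`."""
    if kind == "contemporaneous":
        return [(t, t) for t in range(1, T + 1)]
    if kind == "weak":
        return [(t, s) for t in range(1, T + 1) for s in range(1, t + 1)]
    if kind == "strong":
        return [(t, s) for t in range(1, T + 1) for s in range(1, T + 1)]
    raise ValueError(kind)

T = 4
counts = {k: len(moment_pairs(T, k))
          for k in ("contemporaneous", "weak", "strong")}
# contemporaneous gives T, weak gives T(T+1)/2, strong gives T^2
```

With a vector instrument each count is multiplied by r.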
4 Estimation

A large number of estimators are discussed in the text. As a practical matter, maximum likelihood is commonly used for random effects models. This allows us to simulate the model. Misspecification of the parametric form of the likelihood is viewed as a less problematic assumption than the independence assumption in random effects.

4.1 Within Estimation

Assume that $z_{it} = x_{it}$ and that strong exogeneity holds, i.e. $E[u_{it} \mid x_{i,1},\ldots,x_{i,T}] = 0$.
The within estimator subtracts $\bar{y}_i$ and $\bar{x}_i$, the sample means of $y_i$ and $x_i$ averaging over $t$ (notation differs from the text due to incompatible fonts). We can then write:

$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$

Strong exogeneity implies that $x_{i,1},\ldots,x_{i,T}$ are valid instruments for the composite error term $(\varepsilon_{it} - \bar{\varepsilon}_i)$. Abstracting from problems with weak instruments, one would probably use all of $x_{i,1},\ldots,x_{i,T}$ as instruments. Note that weak exogeneity would invalidate this choice of instruments.
The estimator is then just GMM using the moment conditions implied by strong exogeneity (discussed below). Note that we are not estimating $\alpha_i$; it is a nuisance parameter. However, without $\alpha_i$ we are unable to make statements about the expected value of $y_i$ for different values of $x_i$. We can still make statements that only require us to know $\beta$: marginal effects of $x_i$ are identified, but not levels.
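A minimal sketch of the within transformation on simulated data. All values are invented, and the simple scalar-regressor OLS formula on the demeaned data stands in for the full GMM estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 400, 5, 2.0

alpha = rng.normal(size=N)
# Make x correlated with alpha_i so pooled OLS would be biased
x = alpha[:, None] + rng.normal(size=(N, T))
y = alpha[:, None] + beta * x + rng.normal(scale=0.3, size=(N, T))

# Within transformation: subtract individual means over t
y_w = y - y.mean(axis=1, keepdims=True)
x_w = x - x.mean(axis=1, keepdims=True)

# OLS on the demeaned data recovers beta; alpha_i has dropped out
beta_hat = (x_w * y_w).sum() / (x_w ** 2).sum()
```

Because x is built to be correlated with alpha_i, pooled OLS of y on x would be biased upward here, while the within estimate is consistent.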
4.2 First Differences

A second transformation that can be used is first differencing the data. This yields the equation:

$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})$

Assume that weak exogeneity holds; then we can use the lagged values of $x$ as valid instruments. Note that there are fewer instruments (possibly less efficiency) but less restrictive identifying assumptions. The text discusses GMM estimation of linear panel models in detail.
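First differencing can be sketched the same way. In this made-up example x is strictly exogenous by construction, so OLS on the differenced data suffices; under only weak exogeneity one would instead instrument with lagged x as described above:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta = 400, 5, 2.0

alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, T))
y = alpha[:, None] + beta * x + rng.normal(scale=0.3, size=(N, T))

# First differences: alpha_i cancels out of y_it - y_i,t-1
dy = np.diff(y, axis=1)
dx = np.diff(x, axis=1)

# Scalar OLS on the differenced data (valid here because x is
# strictly exogenous in this simulation)
beta_fd = (dx * dy).sum() / (dx ** 2).sum()
```

Note the differenced errors follow an MA(1) process, so standard errors (not computed here) must account for that serial correlation.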
4.3 Linear Panel Data GMM

The basic idea is that we use our exogeneity assumptions to define instruments. Because of linearity, many of the estimators can be computed in closed form. Asymptotic theory and (robust) standard errors are standard applications of methods covered earlier in the course. We need to worry about both heteroskedasticity and serial correlation when computing standard errors. In general, ignoring serial correlation and heteroskedasticity will inflate t-stats.
Panel data standard errors are coded in many stats packages. However, you need to make sure that you understand how the standard errors are being computed in your program. If it is doing the wrong thing, you could be forced to write an embarrassing retraction/comment after someone discovers your mistakes.
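As a sketch of what a package should be doing, the cluster-robust ("sandwich") variance, clustered by individual, can be computed by hand and compared with the naive iid formula. The data-generating process below is invented so that both the regressor and the error are serially correlated within $i$, which is when clustering matters:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 5

# Individual components make x and e serially correlated within i
a = rng.normal(size=(N, 1))
x = 0.8 * a + rng.normal(size=(N, T))
e = 0.8 * rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
y = 1.0 * x + e

X = x.reshape(-1, 1)            # stack (i, t) observations; i varies slowly
Y = y.reshape(-1)
b = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

# Cluster-robust variance: sum outer products of per-cluster scores
meat = np.zeros((1, 1))
for i in range(N):
    Xi = x[i][:, None]
    ui = resid[i * T:(i + 1) * T][:, None]
    s = Xi.T @ ui
    meat += s @ s.T
V_cluster = XtX_inv @ meat @ XtX_inv

# Naive iid variance, which ignores the within-i correlation
V_iid = XtX_inv * (resid @ resid) / (N * T - 1)
```

In this design the naive formula understates the variance, which is exactly the inflated-t-stat problem described above.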
5 Dynamic models

In many settings, we might expect the rhs variables in our panel data model to be a function of choices in earlier periods. Therefore, it is desirable to drop the assumption of exogenous $x_{it}$ in our model. The estimators discussed above are typically biased in these settings. The data generating process is:

$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{i,t}$
In this model, people are concerned about identifying the difference between state dependence and unobserved heterogeneity. This might be difficult: $y_{it}$ and $y_{i,t-1}$ can be correlated both because $\gamma \ne 0$ and because $\alpha_i \ne 0$. A poor choice of estimators, instruments, or identification strategy could lead to the conclusion that $\gamma \ne 0$ when in fact it is zero.

5.1 Estimators

Let's first difference our model:
$y_{it} - y_{i,t-1} = \gamma (y_{i,t-1} - y_{i,t-2}) + (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{i,t} - \varepsilon_{i,t-1})$

The estimators discussed in the previous sections are biased in our dynamic model. Neither contemporaneous, weak, nor strong exogeneity ensures that ols generates consistent estimates: $\mathrm{cov}(\varepsilon_{i,t} - \varepsilon_{i,t-1},\, y_{i,t-1} - y_{i,t-2}) \ne 0$ in general. Suppose that there is no serial correlation in $\varepsilon_{i,t}$. Then $y_{i,t-2}$ is a valid instrument for $(y_{i,t-1} - y_{i,t-2})$.
$y_{i,t-2}$ is correlated with future values of $y_i$ because of $\gamma$; however, $y_{i,t-2}$ does not depend on future realizations of the error term (the future does not cause the past in many models!). More generally, we could let $\varepsilon_{i,t}$ depend on a moving average of past $\varepsilon$'s. If the moving average depends on 4 periods, then we could use the 5th lag $y_{i,t-5}$ as a valid instrument. This is formalized in the Arellano-Bond estimator. Once again, it is just GMM using moment conditions from IV.
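A minimal sketch of the Arellano-Bond idea with a single instrument (no x's; all simulation values invented): OLS on the differenced equation is biased, while IV using $y_{i,t-2}$ recovers $\gamma$ when $\varepsilon_{i,t}$ is serially uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, gamma = 5000, 6, 0.5

alpha = rng.normal(size=N)
eps = rng.normal(size=(N, T))
y = np.zeros((N, T))
y[:, 0] = alpha + eps[:, 0]
for t in range(1, T):
    y[:, t] = gamma * y[:, t - 1] + alpha + eps[:, t]

# First-difference the model. Delta eps_it is correlated with
# Delta y_i,t-1 (both contain eps_i,t-1), so OLS on differences is
# biased; y_i,t-2 is uncorrelated with Delta eps_it and is a valid IV.
dy = y[:, 2:] - y[:, 1:-1]          # Delta y_it for t = 2,...,T-1
dy_lag = y[:, 1:-1] - y[:, :-2]     # Delta y_i,t-1
z = y[:, :-2]                       # instrument y_i,t-2

gamma_iv = (z * dy).sum() / (z * dy_lag).sum()
gamma_ols = (dy_lag * dy).sum() / (dy_lag ** 2).sum()
```

The OLS-on-differences estimate is biased downward (the shared $\varepsilon_{i,t-1}$ term enters with opposite signs), while the IV estimate is close to the true $\gamma = 0.5$.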
6 Remarks

Panel data gives us different identifying assumptions. Unobserved heterogeneity and dynamics can be accommodated. Random effects: independence assumptions and MLE. Fixed effects: use within variation/differences to identify parameters; much of the variation in the data is destroyed in many applications. Nonlinear models can also use panel techniques.
Fixed effects are possible if we can rewrite the model in a way that cancels out the fixed effect. For example, fixed effects in a logit model are possible by using a conditional likelihood (important example). Otherwise, random effects may be possible through appropriate simulation methods. Mixture models (chapter 18) are a flexible way of accommodating unobserved heterogeneity in some nonlinear models.
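For the conditional logit example with $T = 2$, conditioning on $y_{i1} + y_{i2} = 1$ ("switchers") cancels the fixed effect: $P(y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) = \Lambda(\beta(x_{i2} - x_{i1}))$, where $\Lambda$ is the logistic cdf. A sketch on simulated data (all values invented) with a hand-rolled Newton step for the conditional likelihood:

```python
import numpy as np

rng = np.random.default_rng(5)
N, beta = 20000, 1.0

alpha = rng.normal(size=N)                 # fixed effects
x = rng.normal(size=(N, 2))
p = 1.0 / (1.0 + np.exp(-(alpha[:, None] + beta * x)))
y = (rng.uniform(size=(N, 2)) < p).astype(float)

# Keep only switchers; alpha_i cancels in the conditional likelihood
sw = y.sum(axis=1) == 1
dx = x[sw, 1] - x[sw, 0]
d = y[sw, 1]

# Newton's method on the (concave) conditional log-likelihood
b = 0.0
for _ in range(25):
    q = 1.0 / (1.0 + np.exp(-b * dx))
    grad = (dx * (d - q)).sum()
    hess = -(dx ** 2 * q * (1 - q)).sum()
    b -= grad / hess
```

Non-switchers ($y_{i1} = y_{i2}$) contribute nothing: their conditional likelihood does not depend on $\beta$.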
7 Example: Competitive Bidding

Consider a first price sealed bid auction, such as contractors bidding for bridge/highway jobs. The dependent variable is firm $i$'s bid. The control variables are a set of project characteristics. Following the theory of Bayes-Nash equilibrium, assume that costs can be written as:

$c_{i,t} = x_{i,t}'\beta + \xi_i + \xi_t + \eta_{i,t}$

where $c_{i,t}$ is the cost for firm $i$ in project $t$ and $x_{i,t}$ are observed cost controls (e.g. distance to project, engineering cost estimate).
Here $\xi_i$ is a firm $i$ fixed effect, $\xi_t$ is a project $t$ fixed effect, and $\eta_{i,t}$ is an independent shock to costs. Let $Q(b_{i,t} \mid x_{i,t}, \xi)$ be the probability that a bid of $b_{i,t}$ wins given the info that is publicly observed by firms. Let $\hat{Q}(b_{i,t} \mid x_{i,t}, \xi)$ be an estimate of this object and $\hat{q}(b_{i,t} \mid x_{i,t}, \xi)$ an estimate of the associated density. For instance, we could specify a distribution for $Q$ and use MLE conditioning on $x$ and $\xi$.
Then the firm's profit max problem is to choose $b_{i,t}$ to maximize

$(b_{i,t} - c_{i,t})\, Q(b_{i,t} \mid x_{i,t}, \xi)$

The FOC for profit maximization is:

$Q(b_{i,t} \mid x_{i,t}, \xi) - (b_{i,t} - c_{i,t})\, q(b_{i,t} \mid x_{i,t}, \xi) = 0$

Algebra implies that

$b_{i,t} = c_{i,t} + \frac{Q(b_{i,t} \mid x_{i,t}, \xi)}{q(b_{i,t} \mid x_{i,t}, \xi)} = x_{i,t}'\beta + \xi_i + \xi_t + \frac{\hat{Q}(b_{i,t} \mid x_{i,t}, \xi)}{\hat{q}(b_{i,t} \mid x_{i,t}, \xi)} + \eta_{i,t}$

In this model, $\eta_{i,t}$ is a shock to private information. Weak exogeneity means that you cannot use past bids, etc. to predict $\eta_{i,t}$.
This seems sensible. Strong exogeneity means that future bids cannot be used to infer $\eta_{i,t}$. This is stronger, but is also consistent with many theories. In our data, we have a large number of observations per firm; hence, we use firm level dummies for $\xi_i$. Our nuisance parameter is $\xi_t$. Estimates show that ols is biased and the panel data estimates have better t-stats.
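The FOC can be checked numerically in a toy case. Suppose the win probability is $Q(b) = 1 - b$ on $[0, 1]$ (a made-up example, e.g. the best rival bid is uniform); then $q(b) = 1$ and the profit-maximizing bid should satisfy the markup equation $b^* = c + Q(b^*)/q(b^*)$:

```python
import numpy as np

# Toy check of the bidding FOC with Q(b) = 1 - b on [0, 1], q(b) = 1.
# Expected profit (b - c) * Q(b) is maximized at b* = (1 + c) / 2.
c = 0.4
b_grid = np.linspace(c, 1.0, 100001)
profit = (b_grid - c) * (1.0 - b_grid)
b_star = b_grid[np.argmax(profit)]

markup = (1.0 - b_star) / 1.0       # Q(b*) / q(b*)
# The markup equation b* = c + Q(b*) / q(b*) holds at the optimum
```

Here $b^* = 0.7$: the bid equals cost plus the markup $Q/q = 0.3$, matching the algebra above.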
8 Hedonics

Next we consider a hedonic home price regression. These regressions are commonly used in environmental and public economics to measure the valuation of non-market amenities. For example, the value of cleaning up a superfund site could be measured by comparing the prices of homes next to superfund sites with the prices of homes that are not near superfund sites. Unobserved heterogeneity: proximity to a toxic waste site is probably correlated with other bad things. OLS regressions on cleaning up superfund sites suggest that clean up is bad because of omitted variables.
This is a problem for measuring the costs/benefits of superfund clean up policy. Suppose we observe repeat sales of a home $j$ over multiple time periods ($t = 1, 2, \ldots, T$):

$\log(p_{j,1}) = \alpha_{0,1} + \alpha_{1,1} x_{j,1} + \alpha_{\xi,1} \xi_{j,1}$    (2)
$\vdots$
$\log(p_{j,T}) = \alpha_{0,T} + \alpha_{1,T} x_{j,T} + \alpha_{\xi,T} \xi_{j,T}$

The omitted product attribute evolves according to a first order Markov process, that is

$\xi_{j,t'} = \alpha(t, t') \xi_{j,t} + \eta(j, t, t')$    (3)

The housing market is informationally efficient if

$E\left[\log(p_{j,t'}) - \log(p_{j,t}) - E\left[\log(p_{j,t'}) - \log(p_{j,t}) \mid I_t\right]\right] = 0,$

where $I_t$ denotes the information available to the buyer at time $t$.
Informational efficiency implies that $E[\eta(j, t, t') \mid I_t] = 0$. The price hedonic at time $t'$ can be written as

$\log(p_{j,t'}) = \alpha_{0,t'} + \alpha_{1,t'} x_{j,t',1} + \alpha_{\xi,t'} \xi_{j,t'}$

$= \alpha_{0,t'} + \alpha_{1,t'} x_{j,t',1} + \alpha_{\xi,t'} \alpha(t,t') \left( \frac{\log(p_{j,t}) - \alpha_{0,t} - \alpha_{1,t} x_{j,t,1}}{\alpha_{\xi,t}} \right) + \alpha_{\xi,t'} \eta(j,t,t')$

$= \left( \alpha_{0,t'} - \frac{\alpha_{\xi,t'}}{\alpha_{\xi,t}} \alpha(t,t') \alpha_{0,t} \right) + \frac{\alpha_{\xi,t'}}{\alpha_{\xi,t}} \alpha(t,t') \log(p_{j,t}) - \frac{\alpha_{\xi,t'} \alpha_{1,t}}{\alpha_{\xi,t}} \alpha(t,t') x_{j,t,1} + \alpha_{1,t'} x_{j,t',1} + \alpha_{\xi,t'} \eta(j,t,t')$

In the above, we substitute $\frac{\log(p_{j,t}) - \alpha_{0,t} - \alpha_{1,t} x_{j,t,1}}{\alpha_{\xi,t}}$ for $\xi_{j,t}$. Letting $t(j)$ denote the date of home $j$'s previous sale, this gives the estimating equation

$\log(p_{j,t'}) = \sum_{s=t_1}^{t'-1} \beta_0(s) 1\{s = t(j)\} + \sum_{s=t_1}^{t'-1} \beta_1(s) 1\{s = t(j)\} \log(p_{j,t(j)}) + \sum_{s=t_1}^{t'-1} \beta_2(s) 1\{s = t(j)\} x_{j,t(j),1} + \beta_3 x_{j,t'} + \varepsilon_{j,t'}$

where $\varepsilon_{j,t'} = \alpha_{\xi,t'} \eta(j, t, t')$ is the random evolution in the omitted attribute. This satisfies weak exogeneity and can be estimated using panel data. There is a huge difference in the estimates.
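A simplified repeat-sales regression can be sketched by holding the coefficients fixed across previous-sale dates $s$ (the estimating equation above lets them vary with $s$); all numerical values below are invented:

```python
import numpy as np

rng = np.random.default_rng(6)
J = 5000

# Simplified repeat-sales setup with constant coefficients:
# log p_now = b1 * log p_prev + b2 * x_prev + b3 * x_now + eps
b1, b2, b3 = 0.9, -0.5, 1.2          # illustrative values

logp_prev = rng.normal(size=J)       # log price at the previous sale
x_prev = rng.normal(size=J)          # attribute at the previous sale
x_now = rng.normal(size=J)           # attribute at the current sale
eps = rng.normal(scale=0.1, size=J)  # innovation in the omitted attribute

logp_now = b1 * logp_prev + b2 * x_prev + b3 * x_now + eps

# OLS of the current log price on the previous price and attributes
X = np.column_stack([logp_prev, x_prev, x_now])
coef, *_ = np.linalg.lstsq(X, logp_now, rcond=None)
```

Because the innovation $\eta$ is unpredictable from the previous sale (weak exogeneity), OLS on the transformed equation is consistent even though the omitted attribute $\xi$ is never observed.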