

1 Motivation

Next, we discuss econometric methods that can be used to estimate panel data models. Panel data consist of repeated observations on the same cross section. Panel data are highly desirable when available, for three reasons:

1. Increased precision: additional observations and moment conditions.
2. Unobserved heterogeneity: random effects or fixed effects.
3. Individual-level dynamics.

The text presents a large number of estimators. We shall focus on the estimators most commonly used in empirical work. Alternative moment conditions will be discussed in detail. The derivation of standard errors receives only limited discussion, since it is fairly standard.

2 The Basic Model

$i = 1,\dots,N$ indexes individuals and $t = 1,\dots,T$ indexes time. $y_{it}$ and $x_{it}$ are the endogenous and exogenous variables. In most work in applied micro, $N$ is large and $T$ is small. Asymptotic theory is conducted using $N \to \infty$. $\alpha_i$ reflects unobserved heterogeneity. Assume linearity and write

$$y_{it} = \alpha_i + x_{it}'\beta + u_{it} \tag{1}$$

2.1 Remark about large T

Different approaches are required for large $T$. We would want to be much more careful about the time series properties of the error terms. A different approach to establishing the asymptotic theory is needed, and the calculation of standard errors would differ. The discussion of serial correlation in differences-in-differences shows that it is important here as well.

2.2 Random Effects and Fixed Effects

How should we model $\alpha_i$? There are two common approaches. The first is to assume that $\alpha_i$ is iid across agents $i$ and independent of all the other variables in our model. Frequently, $\alpha_i \sim N(\alpha, \sigma_\alpha)$. This is the random effects model. The second is to assume that $\alpha_i$ is a fixed and unknown parameter. The fixed effects model is clearly more general.

Most applied economists prefer fixed effects to random effects for that reason. However, there are some tradeoffs:

1. If some component of $x_{it}$ is not time varying, we cannot separately identify its coefficient from $\alpha_i$. Random effects still allows identification.
2. In fixed effects estimators, we can only identify $\beta$; $\alpha_i$ is treated as a nuisance parameter and not estimated. We only learn a subset of the parameters. Hence we cannot, for example, simulate the model.
3. Fixed effects estimators rely on within variation, that is, changes in $y_{it}$ and $x_{it}$ over time, for identification. This discards the variation between individuals and may lead to less precise estimates.

4. In nonlinear models, random effects are sometimes feasible when fixed effects are not.

3 Exogeneity assumptions

We need to make assumptions about $E[u_{it} \mid x_{i,1},\dots,x_{i,T}]$. More generally, we can make assumptions about $E[u_{it} \mid z_{i,1},\dots,z_{i,T}]$, where $z_{it}$ is a vector of instruments and $z_{it} = x_{it}$ is one possible set of instruments. Carefully justifying exogeneity assumptions is crucial in applied work. Interpreting $u_{it}$ and using economic theory/intuition to justify moment conditions is one common approach. The following steps are commonly (but not always) used:

1. A first order condition is used to form (1).
2. We interpret $\alpha_i$ and $u_{it}$ given the available data and the economic theory: $\alpha_i$ are omitted variables that are persistent, and $u_{it}$ are omitted variables, measurement error, etc.
3. We must justify, often using theory, why $u_{it}$ satisfies our moment condition.

We shall describe alternative identification strategies at a high level and then give examples.

3.1 Contemporaneous Exogeneity

$$E[u_{it} z_{it}] = 0 \quad \text{for all } t = 1,\dots,T$$

If $z_{it}$ is an $r \times 1$ vector, there are $rT$ moment conditions. In a cross-sectional setting, we only have $r$ moment conditions. Here the number increases because of the number of observations per individual.

3.2 Weak Exogeneity

$$E[u_{it} z_{is}] = 0 \quad \text{for } s \le t$$

Here the error terms are assumed to be uncorrelated with the current and all lagged values of the instruments. This type of condition can sometimes be justified from a stochastic Euler equation.

Starting with an Euler equation, we may be able to isolate the dependent variable on the lhs and exogenous variables on the rhs in a linear fashion using appropriate functional forms. Even if we cannot make them linear, we can approximate many functions using higher order terms in various basis functions. The error term is then an expectational error. Expectational errors must be mean zero conditional on information at time $t$. Intuitively, this means that prior values $z_{is}$ cannot be used to forecast the expectational error. We shall give examples. Weak exogeneity is a stronger assumption than contemporaneous exogeneity, but it yields many more moment conditions.

3.3 Strong Exogeneity

Here we assume that

$$E[u_{it} z_{is}] = 0 \quad \text{for } s = 1,\dots,T$$

That is, our error term is uncorrelated with past, current and future values of the instrument. This is a stronger assumption. For example, future values of the instrument may allow us to predict the expectational error, in which case it fails. However, in our auction model example, discussed below, the error term is private information. Economic theory sometimes implies that independent private information is uncorrelated even with future information.
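To make the comparison across the three assumptions concrete, the count of moment conditions can be worked out as follows. This is a sketch that assumes $z_{is}$ is $r \times 1$ in every period and that each admissible pair $(s,t)$ contributes $r$ moments; the numerical case $r = 2$, $T = 3$ is purely illustrative.

```latex
\begin{aligned}
\text{Contemporaneous: } & E[u_{it} z_{it}] = 0,\; t = 1,\dots,T
  && \Rightarrow\; rT \text{ moments} \\
\text{Weak: } & E[u_{it} z_{is}] = 0,\; s \le t
  && \Rightarrow\; r\,\frac{T(T+1)}{2} \text{ moments} \\
\text{Strong: } & E[u_{it} z_{is}] = 0,\; s = 1,\dots,T
  && \Rightarrow\; rT^{2} \text{ moments} \\[4pt]
\text{With } r = 2,\ T = 3: \quad & 6,\quad 12,\quad 18 .
\end{aligned}
```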

4 Estimation

A large number of estimators are discussed in the text. As a practical matter, maximum likelihood is commonly used for random effects models. This allows us to simulate the model. Misspecification of the parametric form of the likelihood is viewed as less problematic than the independence assumption in random effects.

4.1 Within Estimation

Assume that $z_{it} = x_{it}$ and that strong exogeneity holds, i.e. $E[u_{it} \mid x_{i,1},\dots,x_{i,T}] = 0$.

The within estimator subtracts $\bar{y}_i$ and $\bar{x}_i$, the sample means of $y_{it}$ and $x_{it}$ averaging over $t$ (the notation differs from the text). Writing $\varepsilon_{it}$ for the error term in (1), we can then write:

$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

Strong exogeneity implies that $x_{i,1},\dots,x_{i,T}$ are valid instruments for the composite error term $(\varepsilon_{it} - \bar{\varepsilon}_i)$. Abstracting from problems with weak instruments, one would probably use all of the $x_{i,1},\dots,x_{i,T}$ as instruments. Note that if only weak exogeneity holds, this choice of instruments is invalid, because $\bar{\varepsilon}_i$ contains errors from every period.

The estimator is then just GMM using the moment conditions implied by strong exogeneity (discussed below). Note that we are not estimating $\alpha_i$; it is a nuisance parameter. Without $\alpha_i$, we are unable to make statements about the expected value of $y_i$ for different values of $x_i$. We can, however, make statements that only require us to know $\beta$. Thus marginal effects of $x_i$ for individuals are identified, but not the levels.
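The within transformation can be illustrated with a small simulation. The sketch below is not from the text: the data generating process, the scalar regressor, and the function names `simulate_panel`, `pooled_ols` and `within_estimator` are assumptions made for illustration. It builds a panel in which $\alpha_i$ is correlated with $x_{it}$, so pooled OLS is biased while demeaning by individual recovers $\beta$.

```python
import numpy as np

def simulate_panel(n=2000, t=5, beta=1.0, seed=0):
    """Hypothetical DGP: y_it = alpha_i + beta * x_it + eps_it, with x_it correlated with alpha_i."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)
    x = alpha[:, None] + rng.normal(size=(n, t))   # regressor loads on the fixed effect
    y = alpha[:, None] + beta * x + rng.normal(size=(n, t))
    return y, x

def pooled_ols(y, x):
    """Slope from OLS on the pooled data (with an intercept); ignores alpha_i."""
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / (xc ** 2).sum()

def within_estimator(y, x):
    """Slope from OLS after subtracting individual-specific time means."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd ** 2).sum()

y, x = simulate_panel()
b_pooled = pooled_ols(y, x)
b_within = within_estimator(y, x)
print(f"pooled OLS: {b_pooled:.3f}, within: {b_within:.3f}, truth: 1.000")
```

In this design the pooled estimate is pulled away from the true slope by the correlation between $x_{it}$ and $\alpha_i$, while the within estimate stays close to it. Note that the sketch recovers only $\beta$, not the $\alpha_i$.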

4.2 First Differences

A second transformation that can be used is first differencing the data. This yields the equation:

$$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})$$

Assume that weak exogeneity holds; then values of $x$ dated $t-1$ and earlier are valid instruments, since they are uncorrelated with both $\varepsilon_{it}$ and $\varepsilon_{i,t-1}$. Note that there are fewer instruments (possibly less efficiency) but less restrictive identifying assumptions. The text discusses GMM estimation of linear panel models in detail.
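A minimal sketch of first differencing under weak exogeneity follows. Everything in it is an illustrative assumption rather than the text's procedure: the regressor is scalar, it responds to last period's shock (so strong exogeneity fails but weak exogeneity holds), and a single instrument $x_{i,t-1}$ is used in a just-identified IV rather than full GMM.

```python
import numpy as np

def simulate_feedback_panel(n=10000, t=6, beta=1.0, burn=20, seed=0):
    """Hypothetical DGP in which x_it reacts to the lagged shock eps_{i,t-1}."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)
    eps = rng.normal(size=(n, t + burn))
    v = rng.normal(size=(n, t + burn))
    x = np.zeros((n, t + burn))
    for s in range(1, t + burn):
        x[:, s] = 0.5 * x[:, s - 1] + alpha + 0.8 * eps[:, s - 1] + v[:, s]
    y = alpha[:, None] + beta * x + eps
    return y[:, burn:], x[:, burn:]          # drop the burn-in periods

def first_difference_estimates(y, x):
    """Return (OLS on first differences, IV on first differences using x_{i,t-1})."""
    dy = np.diff(y, axis=1)                  # y_it - y_{i,t-1}
    dx = np.diff(x, axis=1)                  # x_it - x_{i,t-1}
    z = x[:, :-1]                            # x_{i,t-1}, aligned with each difference
    b_ols = (dx * dy).sum() / (dx ** 2).sum()
    b_iv = (z * dy).sum() / (z * dx).sum()
    return b_ols, b_iv

y, x = simulate_feedback_panel()
b_fd_ols, b_fd_iv = first_difference_estimates(y, x)
print(f"FD-OLS: {b_fd_ols:.3f}, FD-IV: {b_fd_iv:.3f}, truth: 1.000")
```

In this design OLS on the differenced equation is biased because $x_{it} - x_{i,t-1}$ is correlated with $\varepsilon_{i,t-1}$, whereas the lagged level remains a valid instrument.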

4.3 Linear Panel Data GMM

The basic idea is that we use our exogeneity assumptions to define instruments. Because of linearity, many of the estimators can be computed in closed form. Asymptotic theory and (robust) standard errors are standard applications of methods covered earlier in the course. We need to worry about both heteroskedasticity and serial correlation when computing standard errors. In general, ignoring serial correlation and heteroskedasticity will inflate t-stats.

Panel data standard errors are coded in many stats packages. However, you need to make sure that you understand how the standard errors are being computed in your program. If it is doing the wrong thing, you could be forced to write an embarrassing retraction/comment after someone discovers your mistakes.
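One way to see what a package might be doing is to compute the standard errors by hand. The sketch below is an illustration under stated assumptions, not the text's formula: the regressor is scalar, both the regressor and the error are AR(1) within an individual, the helper names are hypothetical, and no small-sample correction is applied to the cluster-robust variance. It compares the conventional standard error of the within estimator with one clustered by individual.

```python
import numpy as np

def ar1_panel(rng, n, t, rho):
    """Draw n independent stationary AR(1) series of length t."""
    out = np.empty((n, t))
    out[:, 0] = rng.normal(size=n) / np.sqrt(1.0 - rho ** 2)
    for s in range(1, t):
        out[:, s] = rho * out[:, s - 1] + rng.normal(size=n)
    return out

def within_with_se(y, x):
    """Within estimate with conventional and cluster-robust (by individual) standard errors."""
    n, t = y.shape
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    sxx = (xd ** 2).sum()
    beta = (xd * yd).sum() / sxx
    resid = yd - beta * xd
    # Conventional: assumes homoskedastic, serially uncorrelated errors.
    sigma2 = (resid ** 2).sum() / (n * (t - 1) - 1)
    se_conventional = np.sqrt(sigma2 / sxx)
    # Cluster-robust: sum the score within each individual before squaring.
    scores = (xd * resid).sum(axis=1)
    se_cluster = np.sqrt((scores ** 2).sum()) / sxx
    return beta, se_conventional, se_cluster

rng = np.random.default_rng(0)
n, t = 500, 10
alpha = rng.normal(size=n)
x = alpha[:, None] + ar1_panel(rng, n, t, rho=0.8)
y = alpha[:, None] + 1.0 * x + ar1_panel(rng, n, t, rho=0.8)
beta_hat, se_conv, se_clu = within_with_se(y, x)
print(f"beta: {beta_hat:.3f}, conventional se: {se_conv:.4f}, clustered se: {se_clu:.4f}")
```

With persistent errors and regressors, the conventional standard error understates the sampling variability, which is the sense in which ignoring serial correlation inflates t-stats.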

5 Dynamic models

In many settings, we might expect the rhs variables in our panel data model to be a function of choices in earlier periods. Therefore, it is desirable to drop the assumption of exogenous $x_{it}$ in our model. The estimators discussed above are typically biased in these settings. The data generating process is:

$$y_{it} = \gamma y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{i,t}$$

In this model, people are concerned about distinguishing state dependence from unobserved heterogeneity. This might be difficult: $y_{it}$ and $y_{i,t-1}$ can be correlated because $\gamma \neq 0$ and because $\alpha_i \neq 0$. A poor choice of estimators, instruments or identification strategy could lead to the conclusion that $\gamma \neq 0$ when in fact $\gamma = 0$.

5.1 Estimators

Let's first difference our model:

$$y_{it} - y_{i,t-1} = \gamma\,(y_{i,t-1} - y_{i,t-2}) + (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{i,t} - \varepsilon_{i,t-1})$$

The estimators discussed in the previous sections are biased in our dynamic model. Neither contemporaneous, weak, nor strong exogeneity ensures that OLS generates consistent estimates, because in general

$$\operatorname{cov}\big(\varepsilon_{i,t} - \varepsilon_{i,t-1},\; y_{i,t-1} - y_{i,t-2}\big) \neq 0 .$$

Suppose that there is no serial correlation in $\varepsilon_{i,t}$. Then $y_{i,t-2}$ is a valid instrument for $(y_{i,t-1} - y_{i,t-2})$.

$y_{i,t-2}$ is correlated with future values of $y_i$ because of $\gamma$. However, $y_{i,t-2}$ does not depend on future realizations of the error term (the future does not cause the past in many models!). More generally, we could let $\varepsilon_{i,t}$ depend on a moving average of past $\varepsilon$'s. If the moving average depends on 4 periods, then we could use the 5th lag, $y_{i,t-5}$, as a valid instrument. This is formalized in the Arellano-Bond estimator. Once again, it is just GMM using moment conditions from IV.
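The role of $y_{i,t-2}$ as an instrument can be checked in a simulation. The sketch below makes several simplifying assumptions that are not in the text: there are no $x$ regressors, $\varepsilon_{i,t}$ is serially uncorrelated, and a single instrument is used in a just-identified IV (the Anderson-Hsiao form) rather than the full set of Arellano-Bond moments. Function names are hypothetical.

```python
import numpy as np

def simulate_dynamic_panel(n=20000, t=6, gamma=0.5, burn=50, seed=0):
    """Hypothetical DGP: y_it = gamma * y_{i,t-1} + alpha_i + eps_it, eps iid."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=n)
    eps = rng.normal(size=(n, t + burn))
    y = np.zeros((n, t + burn))
    for s in range(1, t + burn):
        y[:, s] = gamma * y[:, s - 1] + alpha + eps[:, s]
    return y[:, burn:]                      # drop the burn-in periods

def first_difference_gamma(y):
    """Return (OLS, IV) estimates of gamma from the first-differenced equation."""
    dy = np.diff(y, axis=1)
    lhs = dy[:, 1:]                         # y_it - y_{i,t-1}
    rhs = dy[:, :-1]                        # y_{i,t-1} - y_{i,t-2}
    z = y[:, :-2]                           # instrument: y_{i,t-2}
    g_ols = (rhs * lhs).sum() / (rhs ** 2).sum()
    g_iv = (z * lhs).sum() / (z * rhs).sum()
    return g_ols, g_iv

y = simulate_dynamic_panel()
gamma_ols, gamma_iv = first_difference_gamma(y)
print(f"FD-OLS: {gamma_ols:.3f}, FD-IV with y_(t-2): {gamma_iv:.3f}, truth: 0.500")
```

In this design OLS on the differenced equation is badly biased because $y_{i,t-1}$ contains $\varepsilon_{i,t-1}$, while the IV estimate is close to the true $\gamma$.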

6 Remarks

Panel data gives us different identifying assumptions. Unobserved heterogeneity and dynamics can be accommodated. Random effects: independence assumptions and MLE. Fixed effects: use within variation/differences to identify parameters. Much of the variation in the data is destroyed in many applications. Nonlinear models can use panel techniques.

Fixed effects is possible if we can rewrite the model in a way that cancels out the fixed effect. For example, fixed effects in a logit model are possible by using a conditional likelihood (an important example). Otherwise, random effects may be possible through appropriate simulation methods. Mixture models (chapter 18) are a flexible way of accommodating unobserved heterogeneity in some nonlinear models.
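The logit case can be sketched for $T = 2$. The notation below is an assumption made for illustration: $\Lambda$ denotes the logistic cdf and the outcomes are taken to be independent over $t$ given $x_i$ and $\alpha_i$. Conditioning on $y_{i1} + y_{i2} = 1$ removes $\alpha_i$ from the likelihood.

```latex
\begin{aligned}
P(y_{it} = 1 \mid x_i, \alpha_i) &= \Lambda(\alpha_i + x_{it}'\beta)
  = \frac{e^{\alpha_i + x_{it}'\beta}}{1 + e^{\alpha_i + x_{it}'\beta}}, \qquad t = 1, 2, \\[4pt]
P(y_{i1} = 0,\, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1,\, x_i, \alpha_i)
  &= \frac{e^{\alpha_i + x_{i2}'\beta}}{e^{\alpha_i + x_{i1}'\beta} + e^{\alpha_i + x_{i2}'\beta}}
  = \frac{e^{(x_{i2} - x_{i1})'\beta}}{1 + e^{(x_{i2} - x_{i1})'\beta}} .
\end{aligned}
```

The last expression does not depend on $\alpha_i$, so $\beta$ can be estimated from individuals whose outcome changes between the two periods.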

7 Example: Competitive Bidding

Consider a first-price sealed-bid auction, such as contractors bidding for bridge/highway jobs. The dependent variable is firm $i$'s bid. The control variables are a set of project characteristics. Following the theory of Bayes-Nash equilibrium, assume that costs can be written as:

$$c_{i,t} = x_{i,t}'\beta + \xi_i + \xi_t + \eta_{i,t}$$

$c_{i,t}$: cost for firm $i$ in project $t$
$x_{i,t}$: observed cost controls (e.g. distance to project, engineering cost estimate)

$\xi_i$: firm $i$ fixed effect
$\xi_t$: project $t$ fixed effect
$\eta_{i,t}$: independent shock to costs

Let $Q(b_{i,t} \mid x_{i,t}, \xi)$ be the probability that a bid of $b_{i,t}$ wins given the information that is publicly observed by firms. Let $\hat{Q}(b_{i,t} \mid x_{i,t}, \xi)$ be an estimate of this object and $\hat{q}(b_{i,t} \mid x_{i,t}, \xi)$ an estimate of the associated density. For instance, we could specify a distribution for $Q$ and use MLE, conditioning on $x$ and $\xi$.

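Then the firm's profit maximization problem is:

$$\max_{b_{i,t}} \; (b_{i,t} - c_{i,t})\, Q(b_{i,t} \mid x_{i,t}, \xi)$$

The FOC for profit maximization is:

$$Q(b_{i,t} \mid x_{i,t}, \xi) - (b_{i,t} - c_{i,t})\, q(b_{i,t} \mid x_{i,t}, \xi) = 0$$

where $q = -\partial Q / \partial b$ is the associated density (in a procurement auction the probability of winning falls as the bid rises). Algebra implies that

$$b_{i,t} = c_{i,t} + \frac{Q(b_{i,t} \mid x_{i,t}, \xi)}{q(b_{i,t} \mid x_{i,t}, \xi)} = x_{i,t}'\beta + \xi_i + \xi_t + \frac{\hat{Q}(b_{i,t} \mid x_{i,t}, \xi)}{\hat{q}(b_{i,t} \mid x_{i,t}, \xi)} + \eta_{i,t}$$

In this model, $\eta_{i,t}$ is a shock that is private information. Weak exogeneity means that you can't use past bids, etc. to predict $\eta_{i,t}$.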
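The first-order condition can be inverted to recover the cost implied by an observed bid, $c = b - Q(b)/q(b)$, where $q$ is the density associated with $Q$. The sketch below illustrates this inversion only. The exponential form for the distribution of the lowest rival bid, the rate parameter and all function names are hypothetical choices made so the arithmetic is easy to check; in practice $Q$ and $q$ would be replaced by the estimates $\hat{Q}$ and $\hat{q}$.

```python
import math

RATE = 0.5  # hypothetical rate of the exponential distribution of the lowest rival bid

def win_probability(bid):
    """Q(b): probability that the lowest rival bid exceeds b (low bid wins)."""
    return math.exp(-RATE * bid)

def win_density(bid):
    """q(b): density of the lowest rival bid at b, equal to -dQ/db."""
    return RATE * math.exp(-RATE * bid)

def implied_cost(bid, prob=win_probability, dens=win_density):
    """Invert the first-order condition: c = b - Q(b) / q(b)."""
    return bid - prob(bid) / dens(bid)

cost = implied_cost(10.0)
print(f"bid 10.0 implies cost {cost:.2f} and markup {10.0 - cost:.2f}")
```

With an exponential distribution the markup $Q/q$ is constant at $1/\text{RATE}$, which makes this a convenient check but not a realistic model of bidding.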

This seems sensible. Strong exogeneity means that future bids cannot be used to infer $\eta_{i,t}$ either. This is stronger, but is also consistent with many theories. In our data, we have a large number of observations per firm. Hence, we use firm-level dummies for $\xi_i$. Our nuisance parameter is $\xi_t$. Estimates show that OLS is biased and that the panel data estimates have better t-stats.

8 Hedonics

Next we consider a hedonic home price regression. These regressions are commonly used in environmental and public economics to measure the valuation of non-market amenities. For example, the value of cleaning up a Superfund site could be measured by comparing home prices next to Superfund sites with home prices that are not near Superfund sites. Unobserved heterogeneity is a concern: proximity to a toxic waste site is probably correlated with other bad things. Because of omitted variables, OLS regressions on cleaning up Superfund sites suggest that clean-up is bad.

This is a problem for measuring the costs and benefits of Superfund clean-up policy. We observe repeat sales of a home $j$ over multiple time periods ($t = 1, 2, \dots, T$):

$$
\begin{aligned}
\log(p_{j,1}) &= \alpha_{0,1} + \alpha_{1,1} x_{j,1} + \alpha_{\xi,1} \xi_{j,1} \\
&\;\;\vdots \\
\log(p_{j,T}) &= \alpha_{0,T} + \alpha_{1,T} x_{j,T} + \alpha_{\xi,T} \xi_{j,T}
\end{aligned}
\tag{2}
$$

The omitted product attribute evolves according to a first order Markov process, that is

$$\xi_{j,t'} = \alpha(t,t')\,\xi_{j,t} + \eta(j,t,t'). \tag{3}$$

The housing market is informationally efficient if

$$E\Big[\log(p_{j,t'}) - \log(p_{j,t}) - E\big[\log(p_{j,t'}) - \log(p_{j,t})\big] \,\Big|\, I_t\Big] = 0,$$

where $I_t$ denotes the information available to the buyer at time $t$.

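Informational efficiency implies that $E[\eta(j,t,t') \mid I_t] = 0$. The price hedonic at time $t'$ can be written as

$$
\begin{aligned}
\log(p_{j,t'}) &= \alpha_{0,t'} + \alpha_{1,t'} x_{j,t',1} + \alpha_{\xi,t'} \xi_{j,t'} \\
&= \alpha_{0,t'} + \alpha_{1,t'} x_{j,t',1} + \alpha_{\xi,t'}\,\alpha(t,t') \left( \frac{\log(p_{j,t}) - \alpha_{0,t} - \alpha_{1,t} x_{j,t,1}}{\alpha_{\xi,t}} \right) + \alpha_{\xi,t'}\,\eta(j,t,t') \\
&= \left( \alpha_{0,t'} - \frac{\alpha_{\xi,t'}}{\alpha_{\xi,t}}\,\alpha(t,t')\,\alpha_{0,t} \right) + \left( \frac{\alpha_{\xi,t'}}{\alpha_{\xi,t}}\,\alpha(t,t') \right) \log(p_{j,t}) \\
&\quad - \left( \frac{\alpha_{\xi,t'}\,\alpha_{1,t}}{\alpha_{\xi,t}}\,\alpha(t,t') \right) x_{j,t,1} + \alpha_{1,t'} x_{j,t',1} + \alpha_{\xi,t'}\,\eta(j,t,t').
\end{aligned}
$$

In the above, we substitute

$$\frac{\log(p_{j,t}) - \alpha_{0,t} - \alpha_{1,t} x_{j,t,1}}{\alpha_{\xi,t}}$$

for $\xi_{j,t}$. This gives the estimating equation

$$
\begin{aligned}
\log(p_{j,t'}) &= \sum_{s=t_1}^{t'-1} \beta_0(s)\,1\{s = t(j)\} + \sum_{s=t_1}^{t'-1} \beta_1(s)\,1\{s = t(j)\}\,\log(p_{j,t(j)}) \\
&\quad + \sum_{s=t_1}^{t'-1} \beta_2(s)\,1\{s = t(j)\}\,x_{j,t(j),1} + \beta_3 x_{j,t'} + \varepsilon_{j,t'},
\end{aligned}
$$

where $\varepsilon_{j,t'} = \alpha_{\xi,t'}\,\eta(j,t,t')$ is the random evolution in the omitted attribute. This satisfies weak exogeneity and can be estimated using panel data. There is a huge difference in the estimates.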
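The logic of the estimating equation can be checked on simulated repeat sales. The sketch below rests on assumptions that are not in the text: there are only two sale dates (so the sums over $s$ collapse to a single term), one observed attribute, and made-up parameter values. The function names are hypothetical. It compares a cross-sectional regression of $\log(p_{j,t'})$ on $x_{j,t'}$ with the regression that also controls for the previous sale price and previous attribute.

```python
import numpy as np

def simulate_repeat_sales(n=20000, seed=0):
    """Hypothetical two-sale DGP with an omitted attribute xi that follows a Markov process."""
    rng = np.random.default_rng(seed)
    xi_t = rng.normal(size=n)                       # omitted attribute at first sale
    x_t = 0.8 * xi_t + rng.normal(size=n)           # observed attribute, correlated with xi
    x_tp = x_t + rng.normal(size=n)                 # observed attribute at second sale
    eta = 0.5 * rng.normal(size=n)                  # innovation, unpredictable at time t
    xi_tp = 0.9 * xi_t + eta                        # equation (3) with alpha(t, t') = 0.9
    logp_t = 1.0 + 0.5 * x_t + 1.0 * xi_t
    logp_tp = 1.2 + 0.5 * x_tp + 1.0 * xi_tp        # true coefficient on x at t' is 0.5
    return logp_t, logp_tp, x_t, x_tp

def ols(y, *regressors):
    """OLS coefficients with an intercept in position 0."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

logp_t, logp_tp, x_t, x_tp = simulate_repeat_sales()
b_cross_section = ols(logp_tp, x_tp)[1]
b_repeat_sales = ols(logp_tp, logp_t, x_t, x_tp)[3]
print(f"cross-section: {b_cross_section:.3f}, repeat sales: {b_repeat_sales:.3f}, truth: 0.500")
```

In this design the previous price and attribute absorb the omitted attribute at the first sale, so the remaining error is the innovation $\eta$, which is uncorrelated with the regressors, and the coefficient on $x_{j,t'}$ is recovered.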