Applied Quantitative Methods II


Applied Quantitative Methods II, Lecture 10: Panel Data
Klára Kalíšková
AQM II - Lecture 10, VŠE, SS 2016/17

Outline
1. Introduction
2. Pooled OLS
3. First differences
4. Fixed effects
5. Random effects
6. FE vs RE
7. Conclusion

Introduction
Topic of today: panel data
Intuition: we observe the SAME units over time or in the same environment.
Units: individuals, households, factories, firms, municipalities, states or countries, indexed i = 1, ..., N
Repeated over:
- usually time periods t (years, quarters, weeks, days)
- units within clusters (siblings within a family, firms within an industry, workers within a firm)

Introduction
Repeated cross-section: survey at several points in time ("rounds") using a different sample each round
Panel: survey at several points in time ("waves") using the same individuals
Rotational panel: survey at several points in time, where part of the sample consists of the same individuals and part is new
Examples:
- Panel Study of Income Dynamics (PSID, USA)
- German Socio-Economic Panel (GSOEP, Germany)
- Linked employer-employee data from TREXIMA (CR)
Time series vs. panel, length and sample size:
- Time series: N small (mostly N = 1), T large (T → ∞)
- Panel surveys: N large, T small (N → ∞)

Individual effects in panel data
[Figure: scatter plot of pooled observations, then the same points grouped by individual]
Positive relationship? No! There are different intercepts for different people, and within each person the relationship is negative.

Individual dynamics
Panel data allow for the study of individual dynamics.

Introduction
Main advantage: we observe the SAME individual, so we can control for unobserved characteristics that do not change over time.
Estimation issues:
- We cannot assume that the observations are independently distributed across time, which undermines the validity of the standard OLS approach.
- E.g. a person's wages in 1990 and 1991 are very likely to be correlated.
- This leads to more complicated methods.

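Structure of panel data
Panel data have at least two dimensions:
$$y_{it} = x_{it}'\beta_i + \varepsilon_{it}, \quad i = 1, \dots, N, \quad t = 1, \dots, T$$
or, stacking all observations,
$$\underbrace{y}_{NT \times 1} = \underbrace{\begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_N \end{pmatrix}}_{NT \times kN} \underbrace{\begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_N \end{pmatrix}}_{kN \times 1} + \varepsilon$$
T is not necessarily a time dimension: e.g. families and family members; schools, classrooms, and students.
We cannot estimate $\beta_{it}$ with OLS.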

Basic setup
The basic framework for panel data analysis is the following linear regression model:
$$y_{i,t} = x_{i,t}'\beta + z_i'\alpha + v_t'\gamma + \varepsilon_{i,t}$$
- $x_{i,t}$ is a vector of individual characteristics that change with time (e.g. employment status, income).
- $z_i$ contains the fixed or individual effects; it consists of time-constant characteristics:
  1. observed (sex, ethnicity, etc.)
  2. unobserved (family-specific characteristics, individual heterogeneity in skills or preferences, etc.)
- $v_t$ is a time-varying factor common to all units (trend).

Types of models
We will consider 4 basic cases:
1. Pooled regression
2. Random effects
3. Random coefficients/parameters
4. Fixed effects


Pooled OLS regression (POLS)
We can just run an OLS regression on the whole dataset (pretending it is just N × T cross-sectional observations):
$$y_{i,t} = \alpha + x_{i,t}'\beta + \underbrace{u_i + \varepsilon_{i,t}}_{=\omega_{i,t}}$$
where the intercept is $\alpha = E[z_i'\alpha]$ and $u_i = z_i'\alpha - E[z_i'\alpha]$.
In this case, we can put the observed time-invariant characteristics and the observed time trend into $x_{i,t}$, so that $u_i$ only includes unobserved time-invariant characteristics.
Is this consistent and efficient? If so, when?
Assumptions:
- no correlation between $x_{i,t}$ and $\omega_{i,t}$, i.e. $E(x_{i,t} u_i) = 0$ and $E(x_{i,t} \varepsilon_{i,t}) = 0$
- no heteroskedasticity or serial correlation

Example
Goal: estimate the effect of crime on house prices
$$HousePrice_{i,t} = \beta_0 + \beta_1 crime_{i,t} + X_{i,t}'\delta + v_t'\gamma + \eta_{i,t}$$
- $v_t$: time dummies (time trend)
- $X_{i,t}$: observed characteristics of cities, both time-variant and time-invariant (geography, demography, average education, age, ...)
OLS on the pooled sample: the error term is $\eta_{i,t} = z_i'\alpha + \varepsilon_{i,t}$
- $z_i$: city-specific unobserved characteristics, constant over time
- $\varepsilon_{i,t}$: idiosyncratic error, uncorrelated with the crime rate
The error $\eta_{i,t}$ is likely to be correlated with crime => endogeneity => OLS is biased and inconsistent.

Pooled regression bias
If we ignore existing fixed effects (correlation of individual heterogeneity with other included variables):
[Figure: pooled regression line compared with the unit-specific regression lines]

Between estimator
Another option: use unit averages over time
$$\bar{y}_i = \alpha + \bar{x}_i'\beta + \bar{\varepsilon}_i$$
- This uses only the variation between cross-sectional units (no within variation).
- It is not efficient.
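The pooled and between regressions can be compared on a toy panel. The sketch below is only an illustration: the data are hypothetical and noise-free, the helper names `ols_slope` and `unit_means` are made up for this example, and the unit intercepts are deliberately constructed to rise with x. The true within-unit slope is -1, yet both estimators return a positive slope.

```python
# Hypothetical, noise-free panel: 3 units observed in 3 periods.
# Within each unit the true slope is -1, but units with larger x also have
# a larger unit-specific intercept (an unobserved fixed effect).

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def unit_means(values, units):
    """Average of `values` over time, one number per unit."""
    means = []
    for i in sorted(set(units)):
        own = [v for v, u in zip(values, units) if u == i]
        means.append(sum(own) / len(own))
    return means

units, xs, ys = [], [], []
for i in range(3):          # unit index
    for t in range(3):      # time index
        x = 10 * i + t
        y = 30 * i - 1.0 * x   # intercept 30*i, true slope -1
        units.append(i)
        xs.append(x)
        ys.append(y)

pooled_slope = ols_slope(xs, ys)                       # ignores the panel structure
between_slope = ols_slope(unit_means(xs, units),       # uses unit averages only
                          unit_means(ys, units))

print("pooled slope :", pooled_slope)
print("between slope:", between_slope)
```

Both slopes are driven by the differences between units, which is exactly the variation contaminated by the unit-specific intercepts.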


First difference estimator
Another way to get rid of endogenous fixed effects (unobserved time-invariant characteristics).
Run a regression on differences (changes over time):
$$y_{i,t} - y_{i,t-1} = (x_{i,t} - x_{i,t-1})'\beta + \varepsilon_{i,t} - \varepsilon_{i,t-1}$$
$$\Delta y_{i,t} = \Delta x_{i,t}'\beta + \Delta\varepsilon_{i,t}$$
- Consistent estimator of $\beta$: the unobserved time-invariant characteristics $u_i$ are differenced out!
- However: less efficient than other methods.

Example
Goal: estimate the effect of crime on house prices, in first differences:
$$HP_{i,t} - HP_{i,t-1} = \beta_0 + \beta_1 (cri_{i,t} - cri_{i,t-1}) + \beta_2 (v_t - v_{t-1}) + z_i - z_i + \varepsilon_{i,t} - \varepsilon_{i,t-1}$$
$$\Delta HP_{i,t} = \beta_0 + \beta_1 \Delta cri_{i,t} + \beta_2 \Delta v_t + \Delta\varepsilon_{i,t}$$
- The unobserved heterogeneity $z_i$ disappears!
- If $cov(\Delta cri_{i,t}, \Delta\varepsilon_{i,t}) = 0$ and there is no heteroskedasticity and no serial correlation, the estimator is consistent.
Cons:
1. We need some variance in crime both across time and across cities.
2. There may be large variation in the levels of crime but low variation in its first differences => larger standard errors => lower efficiency.
3. We cannot estimate the impact of time-invariant factors (e.g. geography); this is a problem only if we are interested in them.
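The differencing step can be sketched in a few lines. This is a minimal illustration on hypothetical, noise-free data (the helper names `first_differences` and `ols_slope` and all numbers are assumptions made for the example): each unit has its own time-constant effect, and regressing the differenced outcome on the differenced regressor recovers the slope of 2 used to build the data.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def first_differences(values, units):
    """v[i,t] - v[i,t-1] within each unit; rows must be sorted by unit, then time."""
    return [values[k] - values[k - 1]
            for k in range(1, len(values))
            if units[k] == units[k - 1]]   # never difference across two units

units, xs, ys = [], [], []
for i in range(3):            # unit index
    for t in range(4):        # time index
        x = 5 * i + t ** 2
        y = 10 * i + 2.0 * x  # time-constant effect 10*i, true slope 2
        units.append(i)
        xs.append(x)
        ys.append(y)

dx = first_differences(xs, units)
dy = first_differences(ys, units)
fd_slope = ols_slope(dx, dy)

print("observations after differencing:", len(dx))  # one period per unit is lost
print("first-difference slope:", fd_slope)
```

Note that differencing costs one observation per unit, which is one source of the efficiency loss mentioned above.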


Fixed effects model (FE)
$$y_{i,t} = \alpha + x_{i,t}'\beta + \underbrace{u_i + \varepsilon_{i,t}}_{=\omega_{i,t}}$$
In this case we have $Corr[x_{i,t}, \omega_{i,t}] \neq 0$.
1. We can use individual fixed effects as dummies (individual-specific intercepts):
$$y_{i,t} = x_{i,t}'\beta + z_i'\alpha + \varepsilon_{i,t}$$
Sometimes this means too many dummies.
2. It can be proven that this is the same as using deviations from unit means:
$$y_{i,t} - \bar{y}_i = (x_{i,t} - \bar{x}_i)'\beta + \varepsilon_{i,t} - \bar{\varepsilon}_i$$
(We can also use time fixed effects, but be careful about degrees of freedom.)

Example - de-meaning
Goal: estimate the effect of crime on house prices
Estimate using fixed effects: we subtract the unit means.
Mean of HP: $\overline{HP}_i = \frac{1}{T} \sum_t HP_{i,t}$
$$HP_{i,t} - \overline{HP}_i = \beta_0 + \beta_1 (cri_{i,t} - \overline{cri}_i) + \beta_2 (une_{i,t} - \overline{une}_i) + z_i - \bar{z}_i + trend + \varepsilon_{i,t} - \bar{\varepsilon}_i$$
- The mean of $z_i$ is $z_i$ (no time variation) => we get rid of the unobserved heterogeneity.
- If $cov(x_{i,t}, \varepsilon_{i,t}) = 0$, we have a consistent estimator.
- Con: this removes anything time-constant. That was the goal, but now we cannot evaluate the effect of any time-constant variable.
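The de-meaning step can be sketched as follows, again on hypothetical noise-free data with made-up helper names (`demean_by_unit`, `ols_slope`). The time-constant term z is built to be correlated with x, so a regression in levels would be misleading, while the regression on deviations from unit means recovers the slope of -1 used to generate the data. The de-meaned z is identically zero, which is why its effect can no longer be estimated.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def demean_by_unit(values, units):
    """Subtract from every observation the time average of its own unit."""
    totals, counts = {}, {}
    for v, u in zip(values, units):
        totals[u] = totals.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - totals[u] / counts[u] for v, u in zip(values, units)]

units, xs, zs, ys = [], [], [], []
for i in range(3):             # unit index
    for t in range(3):         # time index
        x = 10.0 * i + t ** 2
        z = 30.0 * i           # unobserved, time-constant, correlated with x
        y = z - 1.0 * x        # true slope on x is -1
        units.append(i)
        xs.append(x)
        zs.append(z)
        ys.append(y)

x_within = demean_by_unit(xs, units)
y_within = demean_by_unit(ys, units)
z_within = demean_by_unit(zs, units)   # all zeros: z has no time variation

within_slope = ols_slope(x_within, y_within)
print("within (FE) slope:", within_slope)
print("largest de-meaned z:", max(abs(d) for d in z_within))
```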

Example - set of dummies
Goal: estimate the effect of crime on house prices
$$HP_{i,t} = \beta_0 + \beta_1 cri_{i,t} + \beta_2 une_{i,t} + z_i + trend + \varepsilon_{i,t}$$
Add a dummy for each group (cities here, possibly also years):
$$HP_{i,t} = \beta_0 + \beta_1 cri_{i,t} + \beta_2 une_{i,t} + trend + \mu_1 city_1 + \dots + \mu_{N-1} city_{N-1} + \varepsilon_{i,t}$$
- If $cov(x_{i,t}, \varepsilon_{i,t}) = 0$, with no serial correlation and homoskedasticity, this is equivalent to FE.
- Why do this? We can capture the time-constant influences $z_i$.
- Why not? A large number of dummies makes estimation tedious.

What about R-squared?
$$HP_{i,t} - \overline{HP}_i = \beta_0 + \beta_1 (crime_{i,t} - \overline{crime}_i) + \beta_2 (une_{i,t} - \overline{une}_i) + \varepsilon_{i,t} - \bar{\varepsilon}_i$$
- If we have an $R^2$ of 0.65, what does that mean?
- How well can our model explain the variation in house prices across time? Across cities? The variation across time is more important.
- We have two: the within-$R^2$ and the between-$R^2$.
- When using only dummies, the dummies may inflate $R^2$ (use the adjusted $R^2$).


Motivation
$$HP_{i,t} = \beta_0 + \beta_1 crime_{i,t} + \beta_2 unemp_{i,t} + \underbrace{z_i + \varepsilon_{i,t}}_{=\omega_{i,t}}$$
- So far we assumed that the unobserved heterogeneity may be correlated with the independent variables: $cov(z_i, x_{i,t}) \neq 0$. Then OLS suffers from endogeneity => use FE or FD.
- But what if the correlation is 0? Then we may have an easier way of estimation. This is the case:
  - when we think we control for all factors that are important in the determination of y, or
  - if the effect of the unobserved heterogeneity is very small.
- Can we then use pooled OLS? With pooled OLS the errors are serially correlated, because the same $z_i$ enters every period of unit i.
- We need to estimate the structure of the correlation -> random effects.

Random effects
$$HP_{i,t} = \beta_0 + \beta_1 crime_{i,t} + \beta_2 unemp_{i,t} + \underbrace{z_i + \varepsilon_{i,t}}_{=\omega_{i,t}}$$
- So, we assume that $Cov(x_{i,t}, z_i) = 0$.
- Random effects (RE) corrects for the presence of serial correlation.
- We subtract $\lambda$ times the unit means (quasi-de-meaning):
$$HP_{i,t} - \lambda \overline{HP}_i = \beta_0 (1 - \lambda) + \beta_1 (cri_{i,t} - \lambda \overline{cri}_i) + \beta_2 (une_{i,t} - \lambda \overline{une}_i) + \eta_{i,t} - \lambda \bar{\eta}_i$$
- If $\lambda = 0$, then RE = pooled OLS.
- If $\lambda = 1$, then RE = FE.

Random effects HP i,t λhp i = β 0 (1 λ)+β 1 (cri i,t λcri i )+β 1 (une i,t λune i )+η it λη i what is lambda? λ = 1 (σ 2 u/(σ 2 u + T σ 2 z )) when does it go to 0? when T σ 2 z = 0, that is variance of unobserved heterogeneity is 0, we use OLS When is it 1? when T σ 2 z =, that is very large, then we have FE RE procedure (Stata does that automatically): 1 estimate lambda 2 transform system and estimate using OLS it is a feasible generalizes least-square technique Klára Kaĺıšková AQM II - Lecture 10 VŠE, SS 2016/17 27 / 38

Example HP i,t λhp i = β 0 (1 λ)+β 1 (cri i,t λcri i )+β 1 (une i,t λune i )+η it λη i Assumptions of RE: Cov(x i,t, z i ) = 0 random sample in cross-section strict exogenous errors E(u i x i ) = 0 & E(ε i,t x i ) = 0, If assumptions hold, then our RE estimator converges to true population value Klára Kaĺıšková AQM II - Lecture 10 VŠE, SS 2016/17 28 / 38


Using RE & FE
Using RE rather than FE:
Pros:
- smaller standard errors than FE (more efficient)
- the effects of time-constant variables can be estimated!
Cons:
- $Cov(x_{i,t}, z_i) = 0$ almost never holds
- we need many control variables, which are often hard to get
- if $Cov(x_{i,t}, z_i) = 0$ does not hold, then RE is inconsistent
- also, we do not estimate the unobserved heterogeneity
How to find out which one to use? A test between FE and RE: the Hausman test.

Comparison of FE and RE
Hausman test to compare:

        FE            RE
H_0     consistent    consistent, efficient
H_A     consistent    inconsistent

$H_0: Cov(x_{i,t}, z_i) = 0$ <=> can we use RE?
- do not reject => use RE
- reject => use FE
Basically, it compares the estimated $\beta$s from FE and RE:
- if they are the same, use RE (it is more efficient)
- if they are different, use FE
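For a single coefficient the comparison reduces to a simple ratio. The sketch below is an assumption-laden illustration, not the output of any package: it uses the scalar form of the Hausman statistic, $H = (\hat\beta_{FE} - \hat\beta_{RE})^2 / (Var(\hat\beta_{FE}) - Var(\hat\beta_{RE}))$, compared with the 5% critical value of a chi-square distribution with 1 degree of freedom (3.841). The function names and all numerical inputs are hypothetical.

```python
CHI2_1DF_5PCT = 3.841  # 5% critical value, chi-square with 1 degree of freedom

def hausman_one_coef(b_fe, b_re, var_fe, var_re):
    """Scalar Hausman statistic for one coefficient.

    Under H_0 RE is efficient, so var_fe should exceed var_re.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    return (b_fe - b_re) ** 2 / diff_var

def choose_estimator(stat, critical=CHI2_1DF_5PCT):
    """Reject H_0 (use FE) when the statistic exceeds the critical value."""
    return "FE" if stat > critical else "RE"

# Hypothetical estimates: the two slopes differ a lot, so H_0 is rejected
stat_far = hausman_one_coef(b_fe=-1.0, b_re=-0.5, var_fe=0.04, var_re=0.01)
# Hypothetical estimates: the two slopes are nearly identical, so H_0 is not rejected
stat_close = hausman_one_coef(b_fe=-1.0, b_re=-0.98, var_fe=0.04, var_re=0.01)

print(stat_far, choose_estimator(stat_far))
print(stat_close, choose_estimator(stat_close))
```

With several coefficients the statistic becomes a quadratic form in the vector of differences, with degrees of freedom equal to the number of coefficients compared.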

Example: Impact of enterprise zones on employment
Source: data file EZUNEM (Wooldridge).
- 22 cities in Indiana, from 1980 to 1988.
- Six enterprise zones (ez) were created in 1984, and 4 more in 1985.
- uclms is the number of unemployment claims filed during the year.

Example: Impact of enterprise zones on employment
[Table: estimation results, not reproduced in the transcription]
[Output: fixed effects using dummies, not reproduced in the transcription]
[Output: fixed effects using demeaning, not reproduced in the transcription]


Conclusion
1. Repeated observations allow the isolation of time-constant unobserved differences.
2. We can study the dynamics of economic processes.
3. Some economic phenomena and outcomes of treatment are inherently longitudinal (e.g. unemployment levels).
To summarize, with panel data:
1. We can obtain more precise estimates of the effect of interest.
2. If we suspect that endogeneity comes from some unobserved characteristic that does not change with time, we have an additional way to solve it.
Panel data can be used with models you have already covered: FE probit/logit, difference-in-differences, 2SLS, etc.

Types of panels
Panels can be balanced or unbalanced:
1. Balanced: we observe each unit in every time period.
2. Unbalanced: different units appear in different years; the disappearance of units is called attrition.
Always determine the reason for attrition: is there any underlying process? (E.g. we do not observe a wage because the person is no longer employed, which is not random.)
Solutions:
- Should I limit the sample to a balanced panel? NO, this lowers efficiency.
- Imputation, e.g. if only some variables are missing.
- In case of attrition bias, use a sample selection bias correction.
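Checking whether a panel is balanced amounts to verifying that every unit appears in every period. A minimal sketch, assuming the data are available as a list of hypothetical (unit, period) pairs; the function names `is_balanced` and `missing_cells` are made up for this example.

```python
def is_balanced(observations):
    """True if every unit is observed in every period that appears in the data."""
    observed = set(observations)
    units = {u for u, _ in observed}
    periods = {t for _, t in observed}
    return observed == {(u, t) for u in units for t in periods}

def missing_cells(observations):
    """Sorted list of (unit, period) combinations absent from the data."""
    observed = set(observations)
    units = {u for u, _ in observed}
    periods = {t for _, t in observed}
    return sorted((u, t) for u in units for t in periods if (u, t) not in observed)

# Hypothetical panels: unit 2 drops out after 2000 in the second one (attrition)
balanced_panel = [(1, 2000), (1, 2001), (2, 2000), (2, 2001)]
unbalanced_panel = [(1, 2000), (1, 2001), (2, 2000)]

print(is_balanced(balanced_panel), missing_cells(balanced_panel))
print(is_balanced(unbalanced_panel), missing_cells(unbalanced_panel))
```

Listing the missing cells is only a first step; whether the gaps are random still has to be judged from the reason for attrition.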