Fixed Effects Models for Panel Data. December 1, 2014

Similar documents
Ordinary Least Squares Regression

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Applied Quantitative Methods II

Lecture 10: Panel Data

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

EC327: Advanced Econometrics, Spring 2007

Applied Microeconometrics (L5): Panel Data-Basics

Instrumental Variables and the Problem of Endogeneity

Topic 10: Panel Data Analysis

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook)

Econometrics of Panel Data

Short T Panels - Review

Econometrics of Panel Data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

10 Panel Data. Andrius Buteikis,

Econ 582 Fixed Effects Estimation of Panel Data

Econometrics of Panel Data

Econometrics of Panel Data

Dealing With Endogeneity

α version (only brief introduction so far)

ECON 4551 Econometrics II Memorial University of Newfoundland. Panel Data Models. Adapted from Vera Tabakova s notes

Final Exam. Economics 835: Econometrics. Fall 2010

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Advanced Econometrics

Non-linear panel data modeling

Multiple Equation GMM with Common Coefficients: Panel Data

Chapter 15 Panel Data Models. Pooling Time-Series and Cross-Section Data

PS 271B: Quantitative Methods II Lecture Notes

The Simple Linear Regression Model

Linear Panel Data Models

Multiple Regression Analysis: Heteroskedasticity

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

PANEL DATA RANDOM AND FIXED EFFECTS MODEL. Professor Menelaos Karanasos. December Panel Data (Institute) PANEL DATA December / 1

Formal Statement of Simple Linear Regression Model

Nonstationary Panels

y it = α i + β 0 ix it + ε it (0.1) The panel data estimators for the linear model are all standard, either the application of OLS or GLS.

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Instrumental Variables, Simultaneous and Systems of Equations

Please discuss each of the 3 problems on a separate sheet of paper, not just on a separate page!

Intermediate Econometrics

1 Motivation for Instrumental Variable (IV) Regression

Capital humain, développement et migrations: approche macroéconomique (Empirical Analysis - Static Part)

Economics 582 Random Effects Estimation

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Econometrics - 30C00200

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22

Specification testing in panel data models estimated by fixed effects with instrumental variables

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Weighted Least Squares

Gov 2000: 13. Panel Data and Clustering

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Panel Data: Linear Models

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

Advanced Quantitative Methods: panel data

Specification Tests in Unbalanced Panels with Endogeneity.

18.S096 Problem Set 3 Fall 2013 Regression Analysis Due Date: 10/8/2013

Heteroskedasticity. Part VII. Heteroskedasticity

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

Goals. PSCI6000 Maximum Likelihood Estimation Multiple Response Model 1. Multinomial Dependent Variable. Random Utility Model

Questions and Answers on Heteroskedasticity, Autocorrelation and Generalized Least Squares

Testing Random Effects in Two-Way Spatial Panel Data Models

Panel data methods for policy analysis

Simultaneous Equations with Error Components. Mike Bronner Marko Ledic Anja Breitwieser

MULTILEVEL MODELS WHERE THE RANDOM EFFECTS ARE CORRELATED WITH THE FIXED PREDICTORS

splm: econometric analysis of spatial panel data

Econometrics of Panel Data

Introduction to Panel Data Analysis

Heteroskedasticity. y i = β 0 + β 1 x 1i + β 2 x 2i β k x ki + e i. where E(e i. ) σ 2, non-constant variance.

Econometrics. Week 4. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Introduction to Estimation Methods for Time Series models. Lecture 1

Panel Data Model (January 9, 2018)

Lecture 4: Linear panel models

Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13)

Test of hypotheses with panel data

A Meta-Analysis of the Urban Wage Premium

Empirical Economic Research, Part II

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

Econometric Analysis of Cross Section and Panel Data

Problem Set 10: Panel Data

Basic Linear Model. Chapters 4 and 4: Part II. Basic Linear Model

Chapter 8 Heteroskedasticity

Mediation Analysis: OLS vs. SUR vs. 3SLS Note by Hubert Gatignon July 7, 2013, updated November 15, 2013

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

WISE International Masters

Linear models and their mathematical foundations: Simple linear regression

Applied Econometrics Lecture 1

08 Endogenous Right-Hand-Side Variables. Andrius Buteikis,

1 Estimation of Persistent Dynamic Panel Data. Motivation

Homoskedasticity. Var (u X) = σ 2. (23)

Lecture 4: Heteroskedasticity

Answer all questions from part I. Answer two question from part II.a, and one question from part II.b.

Limited Dependent Variables and Panel Data

6/10/15. Session 2: The two- way error component model. Session 2: The two- way error component model. Two- way error component regression model.

Introductory Econometrics

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

Transcription:

Fixed Effects Models for Panel Data December 1, 2014

Notation Use the same setup as before, with the linear model Y it = X it β + c i + ɛ it (1) where X it is a 1 K + 1 vector of independent variables. Here we make our usual assumptions : Assumption 1: E[ɛ it X i1,..., X it, c i ] = 0 Assumption 2: E[ɛ i ɛ i ] = σ2 I T

What about E(X c)? In the fixed effects model, we do not have to make assumptions about whether unobserved heterogeneity is correlated with our independent variables. So it can handle this case: y OLS Estimated Relationship True Relationship for individual red True Relationship for individual blue True Relationship for individual green x i1 x 1

Method 1: The Dummy Variable Estimator Add an N 1 dimensional time invariant vector, called Z to the regression model, where Z looks like this for observation N = 1 Z 1 = [ 1... 0 ] (2) Using this approach, we can write the estimating equation as Y it = X it β + Z i c + ɛ it (3)

Just an OLS Estimator min c,b S(b) = (Y Xb Zc) (Y Xb Zc) (4) To see this consider X = [], and only the fixed effects dummies are included. In that case, T Y it ĉ i = (5) T t=1 Having variance σ2 T. Note this does not approach zero as N

Lots of c i s to estimate and not consistent The variance of ɛ, σ 2 is where 1 NT (K + 1) N ɛˆ dv ɛˆ dv (6) ˆɛ dv = (Y Xb Zc) (7) Besides the consistency problem, this estimator requires the identification of (N 1) + K + 1 parameters (and inverting an (N 1) K + 1) square matrix.

Method 2: The Demeaning Estimator To overcome the consistency problems with the dummy variable estimator, most statistical packages employ the Demeaning Estimator. This estimator does not fit a constant for each cross section unit N but importantly, it does not merely put the unobserved heterogeneity effect c i into the error term. To proceed, define Ȳ i = 1 T T Y it and X i = 1 T t=1 T X it (8) t=1 and writing the deviations from the means for X and Y as Ÿ it = Y it Ȳ i and Ẍ it = X it X i (9)

The Demeaning Estimator, cont. We can recover the demeaning estimator (b d ) for β by OLS on the demeaned data as b d = (Ẍ Ẍ) 1 Ẍ Ÿ (10) Sometimes also referred to as the With-In Estimator. The Between estimator uses only variation between cross section units. It is recovered by the regression Ȳ i = X i β + c i + u i

Standard errors from the Simple OLS approach needs a little tweaking to be correct While the regression estimates are unbiased, since E( ɛ X i ) = 0, the standard variance covariance matrix from simple OLS is not correct. To see this, recall that and E[ ɛ it ] =E[(ɛ it ɛ i ) 2 ] = E[ɛ 2 it 2ɛ it ɛ i + ɛ 2 i ] (11) =σ 2 + σ 2 /T 2σ 2 /T (12) =σ 2 (1 1/T ) (13) E[ ɛ it ɛ is ] =E[(ɛ it ɛ i )(ɛ is ɛ i )] (14) =0 σ 2 /T σ 2 /T + σ 2 /T (15) = σ 2 /T (16)

Demeaning Estimator Parameter Variance Covariance Matrix Var(b d x) =E[(b d b)(b d b) x] (17) =E[(Ẍ Ẍ ) 1 Ẍ Ÿ β)(ẍ Ẍ ) 1 Ẍ Ÿ β) ] (18) =E[(Ẍ Ẍ ) 1 Ẍ (Ẍ β + ɛ) β)(ẍ Ẍ ) 1 Ẍ (Ẍ β + ɛ) β) ] (19) =σ 2 ue[(ẍ Ẍ ) 1 ] (20) ˆσ 2 u = (Ÿ Ẍ β) (Ÿ Ẍ β) N(T 1) (K + 1) (21)

Method 3: First Differences Recall the first differencing approach we discussed back in the introduction. Suppose that for each individual, we have a panel of two periods (t = 1, 2). Apply a differencing approach for each individual i to rid the model of c i, since y i =β (x i2 x i1 ) + (c i c i ) + (ɛ i2 ɛ i1 ) (22) = x i β + ɛ i (23)

Then, we have another way of estimating the model that Rids the model of the c i But does not put them in the error term The estimating equation becomes: Having estimates equal to y = xβ + ɛ (24) b fd = ( x x) 1 x y (25) These are all equivalent methods for T = 2- that is, you will recover the same β estimates.

3 ways to estimate, which one to use? If T = 2, these will all yield exactly equivalent results for the parameters and the variance/covariance matrices. Demeaning is preferred in most cases since it does not require the estimation of N constants

Consider the following small dataset on N=2 and T=3 Person Year Wage Education Experience Training 1 2000 12 12 5 0 1 2001 15 12 6 1 1 2002 15 12 7 1 2 2000 25 16 0 0 2 2001 27 16 1 0 2 2002 30 16 0 1 For each of the 3 methods: Dummy Variable Estimator, Demeaning Estimator, and the First Differences Estimator construct the matrix of independent variables for the model.

Testing for Endogeneity Steps: Partition your matrix of explanatory variable for each individual as x i = [x i1 x i2 ]. Note that x i1 is the subset of exogenous independent variables and x i2 is the the potentially endogenous explanatory variable having an instrument z i2. Run the relevancy test regression x i2t = x i1t a + z i2t b + u it, t = 2,..T (26) and recover û it As in the endogeneity chapter, y it = x it β + û it δ + ψ (27) And test for H 0 : δ = 0. Rejecting H 0 is a strong signal of an endogeneity problem.

OLS versus RE This test relies on our finding that lim Ω = 1 σc 2 0 σɛ 2 I NT NT (28) Restricting σ 2 c = 0 means the model can t fit the data as well, so the test stata does (the Breusch Pagan LM test) checks to see how much our predictive power is degraded due to the restriction. I am being intentionally vague since this is a Maximum Likelihood-based test (have not covered yet). This tests H 0 : σ 2 c = 0 Pooled OLS appropriate H 1 : σ 2 c 0 Random Effects Appropriate

Fixed Versus Random Effects The critical difference between the random and fixed effects approaches is whether c i is correlated with x it. If it is, then the random effects approach, which simply puts c i in the error term will lead to biased estimates relative to the fixed effects estimator. As in the Chapter on endogeneity, we need to test to see how different the two estimated β vectors are and to do this, we use the Hausman test.

Hausman Test [ H = (b RE b FE ) VAR(b ˆ RE ) VAR(b ˆ 1 FE )] (bre b FE ) (29) is distributed with χ 2 M, and M are the degrees of freedom in the model. This tests H 0 : E[x c = 0] E[b re ] = E[b fe ] = β RE H 1 : E[x c 0] so E[b re ] E[b fe ] = β FE

When to use Pooled OLS, RE, FE Pooled OLS Estimator: When the unobserved heterogeneity is not present. All assembly line workers receive the same training, are more or less of the same background, and only workers of a certain skill level are retained. Therefore, productivity rates only vary randomly across people. Random Effects: Sequential measurement of astronomical phenomena. We observe a star day in a day out and make every known correction to the measurement of its position. That position may have measurement error and the error may contain a piece that varies day in and day out and the other piece may shift the error in a systematic way across observation.

When to use Pooled OLS, RE, FE Fixed Effects: Almost all examples from economics are a fixed effects story. Aid received by recipient countries probably depend on unobserved factors specific but invariant through time, that are also correlated with other recipient country factors. Unobserved factors on aid allocation by donors related to quality of aid implementation, and this is correlated with democratic institutions.