Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Similar documents
Econometric Analysis of Cross Section and Panel Data

1 Estimation of Persistent Dynamic Panel Data. Motivation

Quaderni di Dipartimento. Estimation Methods in Panel Data Models with Observed and Unobserved Components: a Monte Carlo Study

Dealing With Endogeneity

Short T Panels - Review

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Panel Threshold Regression Models with Endogenous Threshold Variables

Quantile Regression Estimation of a Model with Interactive Effects

Exogenous Treatment and Endogenous Factors: Vanishing of Omitted Variable Bias on the Interaction Term

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Improving GMM efficiency in dynamic models for panel data with mean stationarity

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Longitudinal Data?

Fixed Effects Models for Panel Data. December 1, 2014

1 Motivation for Instrumental Variable (IV) Regression

Applied Microeconometrics (L5): Panel Data-Basics

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

IV Quantile Regression for Group-level Treatments, with an Application to the Distributional Effects of Trade

Non-linear panel data modeling

Unconditional Quantile Regression for Panel Data with Exogenous or Endogenous Regressors

Testing for Slope Heterogeneity Bias in Panel Data Models

Christopher Dougherty London School of Economics and Political Science

THE AUSTRALIAN NATIONAL UNIVERSITY. Second Semester Final Examination November, Econometrics II: Econometric Modelling (EMET 2008/6008)

When Should We Use Linear Fixed Effects Regression Models for Causal Inference with Panel Data?

Multiple Linear Regression CIVL 7012/8012

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Linear dynamic panel data models

Asymptotic Distribution of Factor Augmented Estimators for Panel Regression

WORKING P A P E R. Unconditional Quantile Regression for Panel Data with Exogenous or Endogenous Regressors DAVID POWELL WR

Applied Economics. Panel Data. Department of Economics Universidad Carlos III de Madrid

Applied Health Economics (for B.Sc.)

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

Bootstrap Approach to Comparison of Alternative Methods of Parameter Estimation of a Simultaneous Equation Model

Econometrics of Panel Data

Nonstationary Panels

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

Semiparametric Estimation of a Sample Selection Model in the Presence of Endogeneity

Basic econometrics. Tutorial 3. Dipl.Kfm. Johannes Metzler

Economics 308: Econometrics Professor Moody

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Chapter 2. Dynamic panel data models

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Econometrics of causal inference. Throughout, we consider the simplest case of a linear outcome equation, and homogeneous


Testing Random Effects in Two-Way Spatial Panel Data Models

Efficiency of repeated-cross-section estimators in fixed-effects models

Advanced Econometrics

Bootstrapping Heteroskedasticity Consistent Covariance Matrix Estimator

Introduction to causal identification. Nidhiya Menon IGC Summer School, New Delhi, July 2015

Applied Quantitative Methods II

Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies

Econometrics of Panel Data

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Econometrics Summary Algebraic and Statistical Preliminaries

Ec1123 Section 7 Instrumental Variables

Revisiting the Synthetic Control Estimator

Outline. Overview of Issues. Spatial Regression. Luc Anselin

Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models

Dynamic Panel Data Models

WORKING P A P E R. Unconditional Quantile Treatment Effects in the Presence of Covariates DAVID POWELL WR-816. December 2010

Econ 582 Fixed Effects Estimation of Panel Data

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Handout 12. Endogeneity & Simultaneous Equation Models

ECON Introductory Econometrics. Lecture 17: Experiments

ECO375 Tutorial 8 Instrumental Variables

Econometrics. Week 6. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

W-BASED VS LATENT VARIABLES SPATIAL AUTOREGRESSIVE MODELS: EVIDENCE FROM MONTE CARLO SIMULATIONS

A Course on Advanced Econometrics

Econ 510 B. Brown Spring 2014 Final Exam Answers

Specification Tests in Unbalanced Panels with Endogeneity.

Estimating Heterogeneous Coefficients in Panel Data Models with Endogenous Regressors and Common Factors

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Estimation of Dynamic Panel Data Models with Sample Selection

Lab 07 Introduction to Econometrics

Lecture 9: Panel Data Model (Chapter 14, Wooldridge Textbook)

Dynamic Panel Data Models

Quantitative Economics for the Evaluation of the European Policy

The Estimation of Simultaneous Equation Models under Conditional Heteroscedasticity


Beyond the Target Customer: Social Effects of CRM Campaigns

Econ 1123: Section 5. Review. Internal Validity. Panel Data. Clustered SE. STATA help for Problem Set 5. Econ 1123: Section 5.

System GMM estimation of Empirical Growth Models

A Course in Applied Econometrics Lecture 4: Linear Panel Data Models, II. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Alternative Approaches to Estimation and Inference in Large Multifactor Panels: Small Sample Results with an Application to Modelling of Asset Returns

WISE International Masters

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

A better way to bootstrap pairs

Revisiting the Synthetic Control Estimator

Econometrics Review questions for exam

Has the Family Planning Policy Improved the Quality of the Chinese New. Generation? Yingyao Hu University of Texas at Austin

Introduction to Panel Data Analysis

Week 2: Pooling Cross Section across Time (Wooldridge Chapter 13)

Empirical approaches in public economics

Instrumental Variables

PhD/MA Econometrics Examination January 2012 PART A

Introduction to Eco n o m et rics

AGEC 661 Note Fourteen

Transcription:

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Matthew Harding and Carlos Lamarche January 12, 2011 Abstract We propose a method for estimating the slope parameter in an interactive effects panel data model with endogenous loadings and factors, and endogenous regressors. JEL: C23, C33 Keywords: Panel data; Instrumental variables; Interactive fixed effects. 1. Introduction Panel data models allow us to control for unobserved individual heterogeneity. Recently, attempts have been made to relax the traditional assumption of unique time invariant individual effects using multiple interactive effects (Bai, 2009; Pesaran, 2006). The natural extension to the standard panel data model with N cross-sectional units and T time periods, y it = β x it +ǫ it, imposes a multi-factor error structure on the error term ǫ it = λ i F t +u it, where λ i is an r 1 vector of factor loadings and F t corresponds to the r common factors, and where both λ i and F t are unobserved. The classical individual effects model can be obtained by setting F t and r equal to one. Note however that both standard differencing and time trends methods are inconsistent in this setting since they will not remove the unobserved effects. We are grateful to Jonathan Temple for comments and suggestions. We also thank Cecilia Rouse for providing the MPCP data. Corresponding author: Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305; Phone: (650) 723-4116; Fax: (650) 725-5702; Email: mch@stanford.edu Department of Economics, University of Oklahoma, 729 Elm Avenue, Norman, OK 73019; Phone: (405) 325-5857; Email: lamarche@ou.edu

2 Bai (2009) considers the estimation of the interactive effects model in large N and T panel data settings and treats both λ i and F t as fixed-effects parameters to be estimated, thus allowing for potential correlation between the unobservable interactive effects and the regressors x it. The finite sample performance of this method is relatively poor in micro-panels with small T. Pesaran (2006) estimates the model using cross-sectional averages of the data to proxy for the unobserved effects. The method relies on the assumption that only the latent factors F t are allowed to be correlated with the regressors and the method is inconsistent if this assumption is violated. Moreover, Bai (2009) and Pesaran (2006) assume that the covariates x and the error term u are stochastically independent. We show in this paper that the approach of Pesaran (2006) can be easily modified to accommodate instrumental variables estimation. Moreover, instrumental variables estimation is also shown to be an easy fix for the case where both factors and loadings are correlated with the regressors. 2. Model and Method This paper considers the following model for i = 1... N and t = 1... T, (2.1) (2.2) (2.3) y it = β x it + γ d t + ǫ it, ǫ it = λ if t + u it, x it = Π w it + G d t + Λ if t + A λ i + v it. We relax the assumptions of Bai and Pesaran by allowing that some of the components of v are stochastically dependent on u. In this model x it and d t denote the individual and common observed regressors, while F t denotes the unobserved common regressors with individual loadings λ i. It is assumed that we additionally observe a vector of instruments w it which satisfies the appropriate identification conditions. The dimensions of the model are such that it is conformal to a design with k 1 regressors x it, k 2 common time trends d t, r factors F t and m k 1 instruments w it. In what follows we consider for simplicity the case of k 1 = m. Define z it = (y it, x it ). Then, (2.4) z it = C 1w it + C 2d t + C 3iF t + C 4λ i + ξ it, where ξ it = (β v it +u it,v it ) and C 1 = ((Πβ), Π ), C 2 = ((Gβ+γ), G ), C 3i = ((Λ i β+λ i ), Λ i ) and C 4 = ((Aβ), A ).

3 Now consider cross-sectional averages, (2.5) z t = C 1 w t + C 2 d t + C 3 F t + C 4 λ + ξ t. Hence, (2.6) F t = ( C 3 C 3 ) 1 C3 ( z t C 1 w t C 2d t C 4 λ ξ t ) Under the regularity conditions of Pesaran (2006), as N, for all t we have ξ t 0 and C 3 C 3 (constant). Hence, we can proxy for F t by ( w t, d t, z it, 1 ). Notice, however that the required proxy λ is not observed and consistent estimation is not possible unless either A = 0 or λ = 0. Furthermore, a second source of endogeneity is present as long as E(u it v it,k ) 0. The analysis indicates that if valid instruments w it are present performing an instrumental variable regression on equation 2.1 augmented by ( w t, d t, z it ) will lead to consistent estimates of β. The instruments address both the endogeneity of the regressors x it due to the correlation between u it and v it and the correlation of the factor loadings with the regressors. In practice, the model can be estimated consistently using instrumental variables as follows. First, augment equation 2.1 using the cross-sectional means ȳ t and x t. Notice, that this introduces the additional endogenous variable x t. Second, estimate the augmented equation using instrumental variables [d t. w t.1] as the first stage regressors. Inferential procedures could be implemented by accommodating the variance formulas introduced in Pesaran (2006) or by considering the bootstrap, as in Section 4. Once the coefficients on the observables have been estimated consistently, each model can be transformed into a traditional factor model and the factors and factor loadings consistently estimated using maximum likelihood or PCA. If the number of factors is unknown, it can also be estimated consistently using the eigenvalue method of Harding and Nair (2009) even in the presence of factor dynamics. 3. Monte Carlo We generate the dependent variable considering a design similar to Bai (2009) and Pesaran (2006): y it x it F jt η jt = β 0 + β 1 x it + γd t + λ i F t + u it = π 0 + π 1 w it + gd t + lι F t + aλ if t + v it = ρ f F jt 1 + η jt = ρ η η jt 1 + e jt

4 for j = {1,2},... t = 49,... 0,... T in the last two equations. The random variables are d t N(0,1), λ i1,λ i2 N(1,0.2), (u it,v it ) N(0,Ω), and e and w are Gaussian random variables. The parameters are assumed to be: β 0 = π 0 = l = 2, β 1 = γ = π 1 = g = 1, ρ f = 0.90, ρ η = 0.25, and Ω 11 = Ω 22 = 1. We consider three designs: (I) a = 0 and Ω 12 = Ω 21 = 0; (II) a = 2 and Ω 12 = Ω 21 = 0; (III) a = 2 and Ω 12 = Ω 21 = 0.5. Statistics N T Estimators OLS FE 2SLS IVFE ICCE CCE IVCCE Monte Carlo Design I Bias 50 5 0.2554 0.3147-0.0122-0.0210-0.0003 0.0004-0.0010 RMSE 50 5 0.2829 0.3355 0.1569 0.1762 0.0283 0.0873 0.1150 Bias 100 5 0.2509 0.3143-0.0019-0.0050 0.0002-0.0041 0.0027 RMSE 100 5 0.2764 0.3331 0.0995 0.0941 0.0205 0.0650 0.0851 Monte Carlo Design II Bias 50 5 0.2952 0.2360-0.0683-0.0106-0.0003 0.3318 0.0017 RMSE 50 5 0.3010 0.2387 1.9130 0.9921 0.0283 0.3447 0.2953 Bias 100 5 0.2939 0.2360-0.0243-0.0292 0.0002 0.3282 0.0053 RMSE 100 5 0.2995 0.2378 0.1923 0.5537 0.0205 0.3411 0.2144 Monte Carlo Design III Bias 50 5 0.3419 0.2714-0.0690-0.0329 0.2032 0.4434-0.0032 RMSE 50 5 0.3464 0.2725 1.3988 0.7868 0.2134 0.4449 0.2994 Bias 100 5 0.3416 0.2716-0.0288-0.0400 0.2042 0.4430-0.0010 RMSE 100 5 0.3458 0.2724 0.1638 0.7272 0.2130 0.4443 0.2091 Table 1. Bias and root MSE of panel data regression estimators. Results are based on 1000 replications. In Table 1 we compare the performance of several estimators for different samples sizes. We consider the ordinary least squares estimator (OLS), the within estimator (FE), the two-stage least

5 squares estimator (2SLS), the instrumental variables estimator applied to the within transformation (IVFE), the infeasible common correlated effect estimator (ICCE) which observes λ i F t, the common correlated effects estimator (CCE) which proxies for the unobserved factors using cross-sectional averages and the instrumental variables common correlated effects estimator (IVCCE) proposed in this paper, which applies instrumental variables estimation to the regression equation augmented by the cross-sectional averages of the observables. The first MC design corresponds to a situation with interactive effects, where the regressors are exogenous and the factor loadings are not correlated with the regressors. As we would expect IV, IVFE, ICCE, CCE and IVCCE are all unbiased. The second MC design corresponds to a model where the regressors are exogenous but the factor loadings are correlated with the regressors. In this case we notice that CCE is biased since it does not fully proxy for the latent factors. IVCCE however automatically corrects this problem. This model could have been estimated using the concentrated least squares approach of Bai (2009), however due to the small T dimension common in micro-econometric applications, the IVCCE estimator has better mean squared error properties than an approach involving PCA which produces large mean-squared errors (Online Appendix to Bai, 2009). The final MC design allows for both endogenous regressors and factor loadings correlated with the regressors. In this case the CCE estimator is severely biased while the IVCCE estimator continues to perform very well. Notice that the ICCE estimator is also severely biased due to the endogenous regressors and shows that knowing the true factors is not as important as correcting for endogeneity when it is present. 4. Empirical Application We consider data from the Milwaukee Parental Choice program (MPCP), the same data as Rouse (1998). Consider a model with interactive fixed effects: (4.1) T it = δc it + β x it + λ if t + u it where T measures educational attainment, c is actual attendance to choice school, x is a vector of exogenous variables that includes grade level of the test, gender, income, application lotteries, indicators for years from application to the program, and a dummy variable for whether the test was imputed. The variable w is an indicator variable for whether the students was randomly selected to

6 attend choice schools. We allow for possible correlation between c and (λ,f,u), and by definition, w u. Treatment Estimators Variable OLS FE CCE 2SLS IVFE IVCCE Enrolled in choice 0.699-0.772 0.720 3.530 4.276 3.585 school (1.175) (1.264) (1.176) (2.071) (2.265) (2.067) Table 2. Estimates of the causal effect of choice schools on math test scores. Table 2 presents results. The OLS, FE, and CCE estimates are likely to be biased because the treatment variable is suspected to be endogenous. Additionally the standard within transformation does not get rid of the individual specific effect, and thus E((c it c i )λ i (F t F)) 0 if we consider for simplicity the case of one factor. 2SLS and IVFE might be biased if there are interactive fixed effects also. The instrument could be correlated with the multifactor error term: E((w it w i )λ i (F t F)) 0 in the IVFE case and E(w it λ i F t ) 0 in the IV case. These conditions might be interpreted as suggesting that the selection to the program could be related to parents motivation and time factors like the availability of seats in the schools. The IVCCE suggests that the program benefited students on mathematics, yet the gain is less than that implied by standard IVFE methods. References Bai, J. (2009): Panel Data Models with Interactive Fixed Effects, Econometrica, 77(4), 1229 1279. Harding, M., and K. K. Nair (2009): Estimating the Number of Factors and Lags in High-Dimensional Dynamic Factor Models, mimeo. Pesaran, M. H. (2006): Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure, Econometrica, 74(4), 967 1012. Rouse, C. (1998): Private School Vouchers and Student Achievement: An Evaluation of the Milwaukee Parental Choice Program, Quarterly Journal of Economics, pp. 553 602.