Spurious Significance of Treatment Effects in Overfitted Fixed Effect Models Albrecht Ritschl 1 LSE and CEPR. March 2009


1 Introduction

Evaluating subsample means across groups and time periods is common in panel studies that evaluate the treatment effects of training programs, labor market policies, currency unions, etc. Comparison of means between treated and non-treated groups may occur along the time axis (fixed effects, FE), the cross section (pooled OLS, IV), or in a combination of the two (difference in differences, DiD; see Ashenfelter (1978), Ashenfelter and Card (1985)), depending on the choice of identifying assumptions about selectivity and common trends (see Heckman, LaLonde, and Smith (1999)).

Despite their widespread use in evaluation studies, FE and DiD estimators have acquired a reputation for generating spuriously low standard errors on the estimated treatment effect. Bertrand, Duflo, and Mullainathan (2004) survey empirical applications of the DiD estimator and find that much of this phenomenon can be attributed to autocorrelation. In a simulated dataset, they show that many standard methods for dealing with autocorrelation yield downward biased standard errors on the treatment effect coefficient.

The following note argues that spurious significance of treatment effects in panels may also occur in the absence of autocorrelation. This phenomenon arises in overfitted FE and DiD modeling of within-group comparisons. Overfitting in such models occurs if observation-specific individual fixed effects (IFE) are specified, although the comparison would be identified by group-specific fixed effects. In evaluation studies, identifying the average treatment effect on the treated through a within-group estimator would require a group fixed effect on the treated (TFE); see e.g. Angrist and Pischke (2009). Yet specifying an overfitted regression with IFE instead may seem innocuous to the applied researcher, as the coefficient estimates on the treatment effect under both fixed effect specifications are identical.
Moreover, standard software packages provide easy-to-use options for individual fixed effects, making an overfitted specification seem attractive. However, while the estimated treatment effect under IFE and TFE is the same, its estimated standard error is not. Overfitting through IFE leads to spurious precision of the estimated treatment effect coefficient. The resulting bias is related to the reduction in the residual sum of squares induced by employing IFE instead of TFE. Under ideal conditions where all other regressors are uncorrelated with the treatment and the fixed effects, this relation is strictly proportional.

The rest of this note is structured as follows. The next section provides the setup. Section 3 presents the result. Section 4 concludes.

Footnote 1: Financial support from Deutsche Forschungsgemeinschaft under SFB649 "Economic Risk" at Humboldt University Berlin is gratefully acknowledged.

2 A Minimal and an Overfitted Setup

Consider a data panel with $n$ observation units in the cross section and $T$ time periods. In this panel, denote by $Y_{nT \times 1}$ the dependent variable. $Z$ is a matrix of characteristics of interest, as well as any time fixed effects, while $X$ includes the regression constant and/or a suitably chosen matrix of either individual or group fixed effects. A policy treatment is applied to some observation units $y_i$ during treatment period $\tau = \{s, \ldots, s+\tau\}$. Treatment during period $t \in \tau$ is indicated by an $(n \times 1)$-vector of dummy variables $\mathbf{d}_t$, which are equal to one if unit $i$ is under treatment at time $t$, and zero otherwise. $d = \operatorname{tr}(\mathbf{d}_t \mathbf{d}_t') < n$ is the number of observation units $i$ in the treatment group. Accordingly, $n - d$ is the number of observation units in the non-treated group.

A standard linear panel model of this treatment effect problem is:

$$Y = (X \;\; D)\beta + Z\gamma + v \qquad (1)$$

where $v \sim N(0, \sigma_v^2)$ and where $D = (0 \ldots \mathbf{d}_s \ldots \mathbf{d}_{s+\tau} \ldots 0)'$ is a dummy vector capturing the policy treatment in $\tau$ periods. Fixed effects estimation of models like (1) is a popular (yet problematic) attempt to ensure the exogeneity of $D$ with respect to the disturbance term $v$.

To focus on the essentials, consider an ideal regression in which any characteristics included in $Z$ are orthogonal to the fixed effects $X$ and the treatment dummy $D$. Define the detrended variable $y \equiv M_z Y$ with $M_z = I - Z(Z'Z)^{-1}Z'$, where the influence on $Y$ of any such characteristics, as well as of any time fixed effects included in $Z$, has been removed (see footnote 2). As $M_z (X \;\; D) = (X \;\; D)$ if $Z'(X \;\; D) = 0$, the model becomes a Least Squares Dummy Variables (LSDV) regression on the fixed effect terms and the treatment dummy only:

$$y = (X \;\; D)\beta + u \qquad (2)$$

where $X$ is a suitably chosen matrix of fixed effects, $D = (0 \ldots \mathbf{d}_s \ldots \mathbf{d}_{s+\tau} \ldots 0)'$ is a dummy vector capturing the policy treatment in $\tau$ periods, and $u \sim N(0, \sigma_u^2)$.
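The step from (1) to (2) relies on the detrending matrix $M_z$ leaving the fixed effects and the treatment dummy untouched when $Z$ is orthogonal to them. The following is a minimal numerical sketch of that step, assuming NumPy; the matrix `W` (a stand-in for the fixed effects and treatment dummy), the sample size and the coefficient values are illustrative assumptions and do not come from the note.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 40                                   # number of stacked observations (nT), illustrative

W = rng.normal(size=(N, 3))              # hypothetical stand-in for (X D)
Z_raw = rng.normal(size=(N, 2))
# Force Z'W = 0 by removing from Z_raw its projection on the columns of W.
Z = Z_raw - W @ np.linalg.solve(W.T @ W, W.T @ Z_raw)

# Annihilator matrix M_z = I - Z (Z'Z)^{-1} Z'
M_z = np.eye(N) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

# Simulated outcome with the structure of model (1); coefficients are made up.
beta = np.array([1.0, -2.0, 0.5])
gamma = np.array([0.3, 0.7])
Y = W @ beta + Z @ gamma + rng.normal(size=N)

# Regression of Y on (W, Z) versus regression of the detrended y = M_z Y on W alone.
coef_full = np.linalg.lstsq(np.hstack([W, Z]), Y, rcond=None)[0][:3]
coef_detrended = np.linalg.lstsq(W, M_z @ Y, rcond=None)[0]

print(np.allclose(M_z @ W, W))                 # does M_z leave W unchanged?
print(np.allclose(coef_full, coef_detrended))  # do both regressions agree on W?
```

Under the orthogonality assumption, both checks should agree up to floating-point error; with correlated `Z` and `W` the first check fails, which is the case the note sets aside here.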
Under individual fixed effects, $X$ consists of $T$ stacked $(n \times n)$ identity matrices:

$$X_I = \begin{pmatrix} I_{n \times n} \\ \vdots \\ I_{n \times n} \end{pmatrix}_{nT \times n}$$

Under the alternative assumption of a group fixed effect on the treated, matrix $X$ takes the form:

$$X_G = \begin{pmatrix} \mathbf{1} & \mathbf{d} \\ \vdots & \vdots \\ \mathbf{1} & \mathbf{d} \end{pmatrix}_{nT \times 2}$$

where $\mathbf{1}$ is an $(n \times 1)$ vector of ones and $\mathbf{d}$ is the $(n \times 1)$ indicator of the $d$ units in the treatment group (so that $\mathbf{d}_t = \mathbf{d}$ in treatment periods). Note that the column dimension of $X_G$ is 2 as opposed to $n$ in $X_I$.

Footnote 2: Time fixed effects would be orthogonal to $X$. Their inclusion in $Z$ makes the FE and DiD estimators in $y$ identical.

LSDV estimation of (2) under the two different fixed effect specifications yields:

$$\hat\beta_I = ([X_I \;\; D]'[X_I \;\; D])^{-1}[X_I \;\; D]'y, \qquad \hat\Omega_{\hat\beta,I} = \hat\sigma^2_{u,I}\,([X_I \;\; D]'[X_I \;\; D])^{-1} \qquad (3)$$

$$\hat\beta_G = ([X_G \;\; D]'[X_G \;\; D])^{-1}[X_G \;\; D]'y, \qquad \hat\Omega_{\hat\beta,G} = \hat\sigma^2_{u,G}\,([X_G \;\; D]'[X_G \;\; D])^{-1} \qquad (4)$$

Let $b_I$ be the $(n+1)$th (i.e., last) element of $\hat\beta_I$, and $b_G$ be the 3rd (i.e., last) element of $\hat\beta_G$. $b_I$ and $b_G$ are the coefficients on the treatment dummy under Individual Fixed Effects (I) and the Group Fixed Effect on the Treated (G), respectively. Likewise, let $\hat\sigma^2(b_I) = \hat\Omega_{\hat\beta,I,(n+1,n+1)}$ and $\hat\sigma^2(b_G) = \hat\Omega_{\hat\beta,G,(3,3)}$ be the estimated variances of these coefficients, with $S^I_{n+1,n+1} = ([X_I \;\; D]'[X_I \;\; D])^{-1}_{n+1,n+1}$ and $S^G_{3,3} = ([X_G \;\; D]'[X_G \;\; D])^{-1}_{3,3}$ as the pertaining elements of the matrix inverses in (3) and (4), respectively.

3 Spurious Significance under Overfitting

Consider a treatment effect model as in eq. (2), in which the endogenous variable has been detrended from any time effects, and in which any further characteristics are orthogonal to the fixed effects and treatment dummy, and have been eliminated as well. Estimation under the alternatives of Individual Fixed Effects (IFE) and Fixed Effects on the Treated (TFE) as in (3) and (4) yields identical coefficient estimates on the treatment effect. However, the estimated variances of these coefficients differ, owing to the presence of unnecessary dummy variables in the IFE specification that artificially increase the fit of the regression. This is expressed in the following

Proposition. In a treatment effect model as in eq. (2), the estimated variance of the treatment effect coefficient is downward biased under IFE relative to TFE. The bias is equal to the ratio of the estimated residual variances under IFE and TFE:

$$\frac{\hat\sigma^2(b_I)}{\hat\sigma^2(b_G)} = \frac{\hat\sigma^2_{u,I}}{\hat\sigma^2_{u,G}}$$

Proof. It suffices to show that $S^I_{n+1,n+1} = S^G_{3,3}$, i.e. that the last elements on the main diagonal of the inverted product sum matrices in eqs. (3) and (4) are identical. By elementary operations, $X_I'X_I = T\,I_{n \times n}$.
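Hence, under Individual Fixed Effects:

$$[X_I \;\; D]'[X_I \;\; D] = \begin{pmatrix} T\,I_{n \times n} & \tau\mathbf{d} \\ \tau\mathbf{d}' & \tau d \end{pmatrix}$$

where $\mathbf{d}$ is the $(n \times 1)$ indicator of the treated units and, as defined further above, $d = \operatorname{tr}(\mathbf{d}\mathbf{d}') = \operatorname{tr}(\mathbf{d}'\mathbf{d})$ is their number. Inverting this partitioned matrix, we find for the $(n+1,n+1)$-element of the inverse:

$$([X_I \;\; D]'[X_I \;\; D])^{-1}_{n+1,n+1} = \left(\tau d - \tau\,\mathbf{d}'\,T^{-1}\,\mathbf{d}\,\tau\right)^{-1} = \frac{T}{\tau d\,(T - \tau)} \qquad (5)$$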
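The closed form in (5) can be checked numerically. The sketch below assumes NumPy, stacks observations period by period, and treats `tau` as the number of treated periods and `d` as the number of treated units; the helper name `ife_design` and the small panel dimensions are hypothetical choices made only for illustration.

```python
import numpy as np

def ife_design(n, T, d, s, tau):
    """Build X_I (T stacked n-by-n identities) and the treatment dummy D.

    Assumption for this sketch: the first d units are treated in the tau
    periods s, ..., s + tau - 1 (0-based), and rows are ordered period by period.
    """
    X_I = np.tile(np.eye(n), (T, 1))      # shape (n*T, n)
    D = np.zeros((T, n))
    D[s:s + tau, :d] = 1.0                # treated units in treated periods
    return X_I, D.reshape(-1, 1)          # D as an (n*T, 1) column

n, T, d, s, tau = 6, 5, 2, 2, 2           # illustrative panel dimensions
X_I, D = ife_design(n, T, d, s, tau)

W = np.hstack([X_I, D])                   # [X_I D]
S_I = np.linalg.inv(W.T @ W)[-1, -1]      # (n+1, n+1) element of the inverse
closed_form = T / (tau * d * (T - tau))   # right-hand side of (5)

print(S_I, closed_form)
```

The two printed numbers should coincide up to rounding for any admissible choice of `n`, `T`, `d` and `tau` with `0 < tau < T` and `0 < d < n`.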

Under Fixed Effects on the Treated, the product sum matrix becomes:

$$X_G'X_G = T\begin{pmatrix} n & d \\ d & d \end{pmatrix}$$

Hence,

$$[X_G \;\; D]'[X_G \;\; D] = \begin{pmatrix} Tn & Td & \tau d \\ Td & Td & \tau d \\ \tau d & \tau d & \tau d \end{pmatrix}$$

Inverting this partitioned matrix, we find for the element (3,3) of the inverse:

$$([X_G \;\; D]'[X_G \;\; D])^{-1}_{3,3} = \left[\tau d - \tau \begin{pmatrix} d & d \end{pmatrix} (X_G'X_G)^{-1} \begin{pmatrix} d \\ d \end{pmatrix} \tau \right]^{-1} \qquad (6)$$

Using

$$(X_G'X_G)^{-1} = \frac{1}{T d\,(n - d)} \begin{pmatrix} d & -d \\ -d & n \end{pmatrix}$$

this becomes:

$$([X_G \;\; D]'[X_G \;\; D])^{-1}_{3,3} = \left[\tau d - \frac{\tau^2}{T d\,(n - d)} \begin{pmatrix} d & d \end{pmatrix} \begin{pmatrix} 0 \\ d\,(n - d) \end{pmatrix} \right]^{-1} = \left[\tau d - \frac{\tau^2 d}{T}\right]^{-1} = \frac{T}{\tau d\,(T - \tau)} \qquad (7)$$

(7) is equal to (5), which completes the proof.

In applied work, the possible correlation of additional characteristics $Z$ with $(X \;\; D)$ means the above relation no longer obtains exactly. Unless, however, this correlation amounts to near-collinearity, its effect is small relative to the overfitting effect described in the proposition.

4 Conclusion

Applications of fixed effect and difference in differences estimators sometimes employ individual, observation-unit specific fixed effects when group-specific fixed effects would suffice for identification. This note has examined the properties of difference in differences estimators of treatment effects under two different fixed effects specifications. It shows that overfitting under individual, observation-unit specific fixed effects generates lower standard errors on the treatment effect coefficient than estimation under a minimal specification with group-specific effects. Depending on the correlation with other regressors, this bias grows at or near the relative decrease of the residual sum of squares as the number of overfitted fixed effects increases. In large samples, which are frequent in evaluation studies, this overfitting bias may lead to substantial underestimation of the standard errors on treatment effect coefficients, and hence to substantial false positives.

References

Angrist, J., and S. Pischke (2009): Mostly Harmless Econometrics. Princeton University Press.

Ashenfelter, O. (1978): "Estimating the Effect of Training Programs on Earnings," Review of Economics and Statistics, 60, 47-57.

Ashenfelter, O., and D. Card (1985): "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs," Review of Economics and Statistics, 64, 648-660.

Bertrand, M., E. Duflo, and S. Mullainathan (2004): "How Much Should We Trust the Differences-in-Differences Estimator?," Quarterly Journal of Economics, 119, 249-275.

Heckman, J., R. LaLonde, and J. Smith (1999): "The Economics and Econometrics of Active Labor Market Programs," in Handbook of Labor Economics, ed. by O. Ashenfelter, and D. Card, pp. 1865-2033. Elsevier.
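As a numerical illustration of the proposition in Section 3, the following sketch estimates the same simulated panel under individual fixed effects and under a group fixed effect on the treated. It assumes NumPy; the data-generating process (unit-level heterogeneity plus i.i.d. noise, a true effect of 0.5), the panel dimensions, the degrees-of-freedom-corrected residual variance, and the helper name `lsdv_last` are all assumptions made for this example and are not taken from the note.

```python
import numpy as np

def lsdv_last(W, y):
    """OLS of y on W; return the last coefficient, its estimated variance,
    and the residual variance (RSS divided by observations minus regressors)."""
    n_obs, k = W.shape
    WtW_inv = np.linalg.inv(W.T @ W)
    beta = WtW_inv @ (W.T @ y)
    resid = y - W @ beta
    sigma2 = resid @ resid / (n_obs - k)
    return beta[-1], sigma2 * WtW_inv[-1, -1], sigma2

rng = np.random.default_rng(42)
n, T, d, s, tau = 50, 6, 20, 3, 3          # illustrative panel dimensions

treated = np.zeros(n)
treated[:d] = 1.0                          # first d units form the treatment group
D = np.zeros((T, n))
D[s:s + tau, :d] = 1.0                     # treated units in the tau treated periods
D = D.ravel()                              # stacked period by period

X_I = np.tile(np.eye(n), (T, 1))           # individual fixed effects
X_G = np.column_stack([np.ones(n * T), np.tile(treated, T)])  # constant + group dummy

# Hypothetical data: unit heterogeneity, a treatment effect of 0.5, and noise.
unit_effects = rng.normal(size=n)
y = 0.5 * D + np.tile(unit_effects, T) + rng.normal(size=n * T)

b_I, var_I, s2_I = lsdv_last(np.column_stack([X_I, D]), y)
b_G, var_G, s2_G = lsdv_last(np.column_stack([X_G, D]), y)

print(b_I, b_G)                            # treatment coefficients
print(var_I / var_G, s2_I / s2_G)          # variance ratio vs residual variance ratio
```

The equality of the two ratios is an algebraic identity and does not depend on the simulated data; how far the ratio falls below one does depend on the assumed amount of unit-level heterogeneity.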