An Algorithm to Estimate the Two-Way Fixed Effect Model Paulo Somaini Frank Wolak


May 2014

Authors' affiliations: Department of Economics, MIT, Cambridge, Massachusetts, U.S.A., e-mail: psomaini@mit.edu; Department of Economics, Stanford University, Stanford, California, U.S.A., e-mail: wolak@stanford.edu.

Abstract

We present an algorithm to estimate the two-way fixed effect linear model. The algorithm relies on the Frisch-Waugh-Lovell Theorem and applies to ordinary least squares (OLS), two-stage least squares (TSLS) and generalized method of moments (GMM) estimators. The coefficients of interest are computed using the residuals from the projection of all variables on the two sets of fixed effects. Our algorithm has three important advantages. First, it manages memory and computational resources efficiently, which speeds up the computation of the estimates. Second, it allows the researcher to estimate multiple specifications using the same structure of fixed effects at a very low computational cost. Third, the asymptotic variance of the parameters of interest can be consistently estimated using standard routines on the residualized data.

1 Introduction

Large data sets allow researchers to obtain precise estimates even after controlling for different sources of heterogeneity. Appropriately controlling for heterogeneity often requires including two sets of high-dimensional fixed effects. For example, consider estimating residential electricity demand using daily consumption from a panel of 1000 households over a 5-year period. Half of the households were exposed to real-time pricing while the other half faced a regular two-part tariff. It seems appropriate to include household fixed effects that capture heterogeneity in electricity consumption habits, and day-of-sample fixed effects that capture common unobserved time-specific demand shocks (e.g., due to varying weather conditions). If each consumer is observed 1826 times, the data would contain 1.8 million observations. Given memory constraints, creating a set of dummies for either household or day effects is not practical and may not be feasible at all.

In this paper we propose a feasible algorithm that computes the effect of the variables of interest (e.g., exposure to time-based pricing and price elasticities) while managing memory and computational power efficiently. The algorithm relies on the Frisch-Waugh-Lovell Theorem. The coefficients of interest are computed by OLS, TSLS or GMM using the residuals from the projection of all variables on the two sets of fixed effects. This procedure is trivial in balanced panels with equally weighted observations, but it can be quite complicated when observations are not equally weighted or the panel is unbalanced. The algorithm in this paper is specially suited for the latter case.

Let $N$ be the number of households/groups and $T$ be the number of periods. The data consist of approximately $N \times T$ observations. Unbalanced panels may have fewer observations; conversely, if a group is observed more than once in each time period the number of observations may be larger. Constructing an additional set of dummies requires adding $\min(N,T) - 1$ variables. This is feasible only if memory can store an array with dimensions $NT \times (\min(N,T) - 1)$. We propose an algorithm where this is not necessary.

The first step of the algorithm computes a set of three matrices that characterize the structure of fixed effects given a particular sample. The first matrix is $N$ by $T$, with typical element equal to the sum of weights for observations that share a particular pair of fixed effects. The other two matrices can be computed from the first one and are used repeatedly in the subsequent steps. The single most computationally intensive operation in this step is inverting a $(\min(N,T) - 1)$ by $(\min(N,T) - 1)$ matrix. The second step of the algorithm obtains the residuals of the projection of each of the variables on the two sets of fixed effects. This step is computationally inexpensive, as the matrix inversion required to compute the projection was done once and for all in the first step. The third step is to estimate the desired specification with the residualized data using standard statistical routines. By virtue of the Frisch-Waugh-Lovell Theorem, the estimate of the asymptotic variance calculated with the residualized data is consistent.

Our algorithm has three important advantages relative to existing routines. First, it manages memory and computational resources efficiently, which speeds up the computation of the estimates. Second, it allows the researcher to estimate multiple specifications using the same set of dummies at a very low computational cost.
The most computationally intensive operation, inverting the matrix in step one, has to be performed only once no matter how many explanatory variables are included in the analysis. The results of step one can be stored and reused as the researcher adds more variables to the analysis and runs different specifications. Third, the asymptotic variance of the parameters of interest can be consistently estimated using standard routines for OLS, TSLS or GMM on the residualized data, including the estimator robust to heteroskedasticity and within-group correlation.

Standard routines available in statistical software do not deal with two-way fixed effect models efficiently. Stata allows the user to absorb one set of fixed effects but requires generating a set of dummies for the other. In SAS, PROC PANEL has a TWOWAY option that creates one set of dummies. Both procedures may require more memory than is typically available. There are some useful user-written algorithms that avoid creating dummies; they are specifically designed for situations where the panel structure is sparse, that is, cases where $N$ and $T$ are both very large but the number of observations is orders of magnitude smaller than $N \times T$. For example, 2 million wage observations for $N = 100{,}000$ individuals in $T = 20{,}000$ firms. In these cases, the challenge is inverting a $T \times T$ matrix. In Stata, algorithms such as reg2hdfe and a2reg avoid this inversion by relying on iterative procedures. Their drawback is that the results cannot be reused to estimate different specifications. The algorithm proposed here is specifically designed to deal with dense panel structures where inverting a $T \times T$ matrix is costly, but possible. The inverse matrix can be stored and used repeatedly when estimating different specifications.

The rest of this paper is organized as follows. The next section describes the panel data model with two-way fixed effects and the OLS, TSLS, and GMM estimators with their corresponding asymptotic variance. Section 3 describes the algorithm. Section 4 compares its performance with other available methods.

2 The Panel Model

Following Arellano (1987), consider the model:

$$y_{it} = x_{it}'\beta + e_i + h_t + u_{it} \qquad (t \in \{1,\dots,T\};\ i \in \{1,\dots,N\}),$$

where $y_{it}$ is the outcome of household $i$ in period $t$, $x_{it}$ is a $K \times 1$ vector of included variables, $h_t$ is a time fixed effect and $e_i$ is a group/household fixed effect.¹ The household effect $e_i$ and the time effect $h_t$ are unobservable and potentially correlated with $x_{it}$. There is a set of instruments $z_{it}$ (an $L \times 1$ vector, $L \geq K$) that is assumed to satisfy the following exogeneity condition:

$$E(u_{it} \mid z_{i1},\dots,z_{iT}, e_i, h_1,\dots,h_T) = 0.$$

The $u_{it}$ are assumed to be independently distributed across groups (or individuals), but no restrictions are placed on the form of the autocovariances for a given group:

$$E(\tilde{u}_i \tilde{u}_i' \mid z_{i1},\dots,z_{iT}, e_i, h_1,\dots,h_T) = \tilde{\Omega}_i,$$

where $\tilde{u}_i$ stacks the $u_{it}$ of group $i$.

Each observation has a weight $w_{it}$ that can be related to the inverse of the probability of being sampled. If a pair $(i,t)$ is unobserved then $w_{it} = 0$ (missing data are assumed to occur at random). Let $\tilde{y}$ denote the vector of outcomes $y_{it}$ stacked by group and then ordered chronologically. Similarly, let $\tilde{X}$ be the matrix of stacked vectors $x_{it}'$ and $\tilde{Z}$ be the matrix of stacked vectors $z_{it}'$. $\tilde{D}$ is a matrix of dummies for groups and $\tilde{H}$ is a matrix of dummies for time periods. $w$ is the vector of weights and $\tilde{u}$ is the vector of unobserved $u_{it}$. Either $\tilde{H}$ or $\tilde{D}$ has one of its columns removed so that $\operatorname{rank}([\tilde{D}, \tilde{H}]) = T + N - 1$. It is convenient to remove a column of $\tilde{H}$ if $T < N$ and a column of $\tilde{D}$ if $N < T$. Let $y = \operatorname{diag}(\sqrt{w})\tilde{y}$, $u = \operatorname{diag}(\sqrt{w})\tilde{u}$, $X = \operatorname{diag}(\sqrt{w})\tilde{X}$, $Z = \operatorname{diag}(\sqrt{w})\tilde{Z}$, $D = \operatorname{diag}(\sqrt{w})\tilde{D}$ and $H = \operatorname{diag}(\sqrt{w})\tilde{H}$, where the square root is taken element by element.
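The model can be written as:

$$y = X\beta + De + Hh + u. \qquad (1)$$

¹ The panel can be unbalanced. There can be unobserved pairs $(i,t)$ or pairs $(i,t)$ that are observed more than once. To keep notation simple, we will focus only on imbalances of the first kind, but the algorithm handles both.

Let $M$ denote the annihilator matrix of $S = [D, H]$: $M = I - S(S'S)^{-1}S'$, so that $M' = M$, $MM = M$ and $MS = 0$. Denote $y^{+} = My$, $X^{+} = MX$, $Z^{+} = MZ$. By the Frisch-Waugh-Lovell Theorem (Frisch and Waugh, 1933; Lovell, 1963, 2008; Giles, 1984), the GMM estimator of $\beta$ for a weighting matrix $\hat{W}$ is:

$$\hat{\beta}_{GMM} = \left(X^{+\prime} Z^{+} \hat{W} Z^{+\prime} X^{+}\right)^{-1} X^{+\prime} Z^{+} \hat{W} Z^{+\prime} y^{+} = \beta + \left(X^{+\prime} Z^{+} \hat{W} Z^{+\prime} X^{+}\right)^{-1} X^{+\prime} Z^{+} \hat{W} Z^{+\prime} u.$$

If $\hat{W} = (Z^{+\prime} Z^{+})^{-1}$, this estimator is the TSLS estimator; if $K = L$, it is the instrumental variables estimator; and if $X = Z$, it is the OLS estimator:

$$\hat{\beta}_{OLS} = \left(X^{+\prime} X^{+}\right)^{-1} X^{+\prime} y^{+} = \beta + \left(X^{+\prime} X^{+}\right)^{-1} X^{+\prime} u.$$

Under appropriate regularity conditions:

$$\sqrt{N}\,(\hat{\beta} - \beta) \xrightarrow{d} \mathcal{N}\left(0,\ J^{-1} V J^{-1}\right),$$

where

$$J = \operatorname*{plim}_{N \to \infty} N^{-1} \left(X^{+\prime} Z^{+} \hat{W} Z^{+\prime} X^{+}\right), \qquad V = \operatorname*{plim}_{N \to \infty} N^{-1} \left[ X^{+\prime} Z^{+} \hat{W} \left( \sum_{i=1}^{N} Z_i^{+\prime} \Omega_i Z_i^{+} \right) \hat{W} Z^{+\prime} X^{+} \right],$$

and $\Omega_i = \operatorname{diag}(\sqrt{w_i})\,\tilde{\Omega}_i\,\operatorname{diag}(\sqrt{w_i})$ is the covariance matrix of the weighted disturbances of group $i$, with $\tilde{\Omega}_i$ the covariance matrix of its unweighted disturbances.

Consistent estimates of the asymptotic variance can be obtained by applying standard statistical routines to the residualized data $(y^{+}, X^{+}, Z^{+})$. Consider for example the OLS estimator. Its asymptotic distribution specializes to:

$$\sqrt{N}\,(\hat{\beta}_{OLS} - \beta) \xrightarrow{d} \mathcal{N}\left(0,\ J^{-1} V J^{-1}\right), \qquad J = \operatorname*{plim}_{N \to \infty} N^{-1} X^{+\prime} X^{+}, \qquad V = \operatorname*{plim}_{N \to \infty} N^{-1} \sum_{i=1}^{N} X_i^{+\prime} \Omega_i X_i^{+}.$$

Arellano (1987) considers three estimators of $\operatorname{avar}(\hat{\beta}_{OLS})$. Let $u^{+} = y^{+} - X^{+}\hat{\beta}_{OLS}$. The first estimator is robust to heteroskedasticity and within-group correlation:

$$\widehat{\operatorname{avar}}_1(\hat{\beta}_{OLS}) = \left(X^{+\prime} X^{+}\right)^{-1} \left( \sum_{i=1}^{N} X_i^{+\prime} u_i^{+} u_i^{+\prime} X_i^{+} \right) \left(X^{+\prime} X^{+}\right)^{-1}. \qquad (2)$$

This estimator can be calculated by clustering standard errors at the group level.²

² In Stata: reg plus_y plus_x*, vce(cluster h)

The second estimator produces consistent standard errors if $x_{it}$ and $u_{it}$ are fourth-order independent and if disturbances are homoskedastic across groups, $\Omega_i = \Omega$ for all $i$:

$$\widehat{\operatorname{avar}}_2(\hat{\beta}_{OLS}) = \left(X^{+\prime} X^{+}\right)^{-1} \left( \sum_{i=1}^{N} X_i^{+\prime} \hat{\Omega}^{+} X_i^{+} \right) \left(X^{+\prime} X^{+}\right)^{-1}, \qquad (3)$$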

where

$$\hat{\Omega}^{+} = N^{-1} \sum_{i=1}^{N} u_i^{+} u_i^{+\prime}. \qquad (4)$$

The third estimator produces consistent standard errors under the classical assumption $\Omega_i = \sigma^2 I$:

$$\widehat{\operatorname{avar}}_3(\hat{\beta}_{OLS}) = \hat{\sigma}^2 \left(X^{+\prime} X^{+}\right)^{-1}, \qquad (5)$$

where

$$\hat{\sigma}^2 = \frac{u^{+\prime} u^{+}}{L - N - T - K}$$

and $L$ here denotes the number of observations. This estimator can be calculated by running the regression of $y^{+}$ on $X^{+}$ and multiplying the estimate of the asymptotic variance of $\hat{\beta}_{OLS}$ by $\frac{L - K}{L - N - T - K}$. These estimators are obtained by applying standard routines to the residualized data $(y^{+}, X^{+})$.

3 The Algorithm

In balanced panels with constant weights in one of the two dimensions ($w_{it} = w_i$ or $w_{it} = w_t$), the computation of $\hat{\beta}$ and its asymptotic variance is straightforward. $y^{+}$ is obtained by subtracting the weighted mean of $\tilde{y}$ along both dimensions:

$$y_{it}^{+} = \tilde{y}_{it} - \bar{y}_i - \bar{y}_t, \qquad \text{where} \quad \bar{y}_i = \frac{\sum_t w_{it}\,\tilde{y}_{it}}{\sum_t w_{it}} \quad \text{and} \quad \bar{y}_t = \frac{\sum_i w_{it}\,(\tilde{y}_{it} - \bar{y}_i)}{\sum_i w_{it}}.$$

$X^{+}$ and $Z^{+}$ are computed following the same procedure. This heuristic approach can be justified formally. Premultiplying the original model in equation (1) first by $M_D$, the annihilator of $D$, and then by $M_H$, the annihilator of $H$, results in:

$$M_H M_D\, y = M_H M_D X \beta + M_H M_D H h + M_H M_D\, u. \qquad (6)$$

If the panel is balanced and the weights are constant in $t$ or in $i$, then $M_H M_D H = 0$, and the transformed model does not depend on the fixed effects. If the panel is unbalanced or the group weights vary, $M_H M_D H \neq 0$ in general, and the transformation $M_H M_D$ does not eliminate the fixed effects in matrix $H$. The fixed effects are annihilated only if the model is premultiplied by $M = I - S(S'S)^{-1}S'$, where $S = [D, H]$.

The algorithm presented in this paper is specifically designed to deal with unbalanced panels and consists of three steps:

1. Compute $(S'S)^{-1}$. This step requires inverting a $(\min(N,T) - 1)$ by $(\min(N,T) - 1)$ matrix and storing it along with two $N$ by $T$ matrices. These three matrices contain all the information required to construct the annihilator matrix $M$.

2. Obtain $(y^{+}, X^{+}, Z^{+})$.

3. Use standard methods to estimate $\hat{\beta}$ and its asymptotic variance by OLS, TSLS or GMM.

Only the first step is computationally intensive, as it requires inverting a potentially large matrix. This step only has to be performed once for a given panel structure. The computational cost of residualizing variables and running different specifications with them is relatively low.

Computation of $(S'S)^{-1}$

According to the definition of $S$,

$$(S'S)^{-1} = \begin{bmatrix} D'D & D'H \\ H'D & H'H \end{bmatrix}^{-1}.$$

$D'D$ is a diagonal matrix with typical diagonal element $(i,i)$ equal to $\sum_t w_{it}$. Similarly, $H'H$ is a diagonal matrix with typical diagonal element $(t,t)$ equal to $\sum_i w_{it}$. $D'H$ is an $N$-by-$T$ matrix with typical element $(i,t)$ equal to $w_{it}$. Using the formula for the inverse of a partitioned matrix:

$$\begin{bmatrix} A & B \\ B' & C \end{bmatrix} = \begin{bmatrix} D'D & D'H \\ H'D & H'H \end{bmatrix}^{-1}, \qquad (7)$$

$$A = \left( D'D - D'H\,(H'H)^{-1}\,H'D \right)^{-1}, \qquad (8)$$

$$C = \left( H'H - H'D\,(D'D)^{-1}\,D'H \right)^{-1}, \qquad (9)$$

$$B = -A\,D'H\,(H'H)^{-1} = -(D'D)^{-1}\,D'H\,C. \qquad (10)$$

If $T < N$, the first step of the algorithm calculates and stores $C$, $B$, $D'H$ and $(D'D)^{-1}$ ($C$ is $T-1$ by $T-1$, $D'H$ and $B$ are $N$ by $T-1$, and $D'D$ is $N$ by $N$ but diagonal, so only $N$ numbers are stored). The only non-diagonal matrix that is inverted is (9). $A$ can be calculated from the stored matrices by

$$A = (D'D)^{-1} + (D'D)^{-1}\,D'H\,C\,H'D\,(D'D)^{-1}. \qquad (11)$$

To economize computing power and memory, $A$, an $N$ by $N$ matrix, is not constructed.

If $N < T$, this step calculates and stores $A$, $B$, $D'H$ and $(H'H)^{-1}$ ($A$ is $N-1$ by $N-1$, $D'H$ and $B$ are $N-1$ by $T$, and $H'H$ is $T$ by $T$ but diagonal, so only $T$ numbers are stored). The only non-diagonal matrix that is inverted is (8). $C$ can be calculated by

$$C = (H'H)^{-1} + (H'H)^{-1}\,H'D\,A\,D'H\,(H'H)^{-1}. \qquad (12)$$

However, $C$ is never constructed or stored. Notice that the algorithm never constructs or stores a non-diagonal square matrix of size equal to the greater of $T$ and $N$. This feature makes the algorithm more robust to memory constraints.
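The partitioned-inverse computation for the case $T < N$ can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' Mata code: the function name `two_way_set`, the variable names, and the small weight matrix `W` are assumptions made for the example. The matrix $A$ of equation (11) is built here only to check the blocks against a direct inversion of $S'S$; an actual implementation would avoid forming it.

```python
import numpy as np

def two_way_set(W):
    """Step 1 for T < N (hypothetical helper).

    W is the N-by-T matrix of summed weights w_it, with 0 for unobserved
    pairs. The last time dummy is dropped, so H has T-1 columns.
    """
    dd_inv = 1.0 / W.sum(axis=1)      # diagonal of (D'D)^{-1}, length N
    hh = W.sum(axis=0)[:-1]           # diagonal of H'H, length T-1
    DH = W[:, :-1]                    # D'H, N by T-1
    # Equation (9): the only dense inversion, of size (T-1) by (T-1).
    C = np.linalg.inv(np.diag(hh) - DH.T @ (dd_inv[:, None] * DH))
    # Equation (10): B = -(D'D)^{-1} D'H C.
    B = -(dd_inv[:, None] * DH) @ C
    return dd_inv, DH, B, C

# Small unbalanced example: three (i, t) pairs are unobserved.
rng = np.random.default_rng(0)
N, T = 6, 4
W = rng.uniform(0.5, 2.0, size=(N, T))
W[0, 1] = W[2, 3] = W[4, 0] = 0.0

dd_inv, DH, B, C = two_way_set(W)

# Check against a direct inversion of S'S (feasible only because N is tiny).
SS = np.block([[np.diag(W.sum(axis=1)), DH],
               [DH.T, np.diag(W.sum(axis=0)[:-1])]])
SS_inv = np.linalg.inv(SS)
# Equation (11), formed here for verification only.
A = np.diag(dd_inv) + (dd_inv[:, None] * DH) @ C @ (DH.T * dd_inv)
print(np.allclose(SS_inv[:N, :N], A),
      np.allclose(SS_inv[:N, N:], B),
      np.allclose(SS_inv[N:, N:], C))
```

Only `C` requires a dense inversion; `dd_inv`, `DH`, `B` and `C` are the objects a step-one routine would keep for reuse across specifications.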

The residualized data $(y^{+}, X^{+}, Z^{+})$

We obtain the projection coefficients

$$\hat{\delta} = A\,D'y + B\,H'y, \qquad \hat{\tau} = B'D'y + C\,H'y,$$

and compute $y^{+} = y - D\hat{\delta} - H\hat{\tau}$. The variables $x_k^{+}$ in $X^{+}$ and $z_k^{+}$ in $Z^{+}$ are obtained in the same way. If $T < N$, the matrix $A$ is replaced by the expression in (11). $D'y$ is a vector of length $N$, so $A\,D'y$ can be calculated without ever creating $A$, or any other $N$ by $N$ matrix that is not diagonal:

$$A\,D'y = (D'D)^{-1}(D'y) + (D'D)^{-1}(D'H)\,C\,(H'D)\,(D'D)^{-1}(D'y).$$

Similarly, if $T > N$, $C\,H'y$ is calculated without ever creating $C$, or any other $T$ by $T$ matrix that is not diagonal.

Estimation of the parameters and their asymptotic variance

The residualized data $(y^{+}, X^{+}, Z^{+})$ can be used as if they were the original data in any statistical package. In Stata, for example,

reg plus_y plus_x*, vce(cluster h)

returns $\hat{\beta}_{OLS}$ along with its asymptotic variance estimate that is robust to within-group correlation;

ivregress gmm plus_y (plus_x* = plus_z*), wmatrix(cluster h)

returns $\hat{\beta}_{GMM}$, where the weighting matrix $\hat{W}$ is set equal to

$$\left( N^{-1} \sum_{i=1}^{N} Z_i^{+\prime} u_i^{+} u_i^{+\prime} Z_i^{+} \right)^{-1}.$$

Stata also returns a consistent estimate of the asymptotic variance of $\hat{\beta}_{GMM}$.

ivregress 2sls plus_y (plus_x = plus_z)

returns a two-stage least squares estimate of $\beta$ along with a consistent estimate of its asymptotic variance robust to heteroskedasticity. If the asymptotic variance is estimated by (5), which is consistent under the classical assumption that the error terms are independently and identically distributed ($\Omega_i = \sigma^2 I$), then the estimate of $\hat{\sigma}$ should be adjusted to account for the reduced degrees of freedom. In other words, the estimate of the asymptotic variance should be multiplied by $\frac{L - K}{L - N - T - K}$.

4 Comparative Performance Analysis

This section compares the performance of the proposed algorithm with three existing algorithms in Stata: a2reg, reg2hdfe and felsdvreg. The comparisons are not entirely fair, as these algorithms were designed for the sparse panel models that are typical in labor market applications. They are still the best available options even for dense panels.
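Before turning to the implementation, steps one and two of Section 3 can be illustrated with a short NumPy sketch for the case $T < N$. It is a sketch under stated assumptions, not the authors' code: the function names `two_way_set` and `proj_var`, the index arrays `gi` and `ti`, and the simulated data are all hypothetical. The projection coefficients are computed from the stored matrices only, so neither the dummies nor the $N$ by $N$ matrix $A$ is ever formed.

```python
import numpy as np

def two_way_set(gi, ti, w, N, T):
    """Step 1 (T < N): store (D'D)^{-1}, D'H and C; last period dummy dropped."""
    W = np.zeros((N, T))
    np.add.at(W, (gi, ti), w)                 # summed weights per (i, t) pair
    dd_inv = 1.0 / W.sum(axis=1)
    DH = W[:, :-1]
    C = np.linalg.inv(np.diag(W.sum(axis=0)[:-1]) - DH.T @ (dd_inv[:, None] * DH))
    return dd_inv, DH, C

def proj_var(v, gi, ti, w, fe):
    """Step 2: residualize sqrt(w) * v on both sets of weighted dummies."""
    dd_inv, DH, C = fe
    sw = np.sqrt(w)
    y = sw * v
    Dy = np.bincount(gi, weights=sw * y, minlength=DH.shape[0])            # D'y
    Hy = np.bincount(ti, weights=sw * y, minlength=DH.shape[1] + 1)[:-1]   # H'y
    tau = C @ (Hy - DH.T @ (dd_inv * Dy))     # tau = B'D'y + C H'y
    delta = dd_inv * (Dy - DH @ tau)          # delta = A D'y + B H'y, A never formed
    tau = np.append(tau, 0.0)                 # dropped period has coefficient 0
    return y - sw * (delta[gi] + tau[ti])

# Small unbalanced, weighted panel (all values are made up for illustration).
rng = np.random.default_rng(1)
N, T = 8, 5
gi, ti = np.divmod(np.arange(N * T), T)       # stacked by group, then by period
keep = np.ones(N * T, dtype=bool)
keep[[3, 11, 17, 26]] = False                 # remove four (i, t) pairs
gi, ti = gi[keep], ti[keep]
w = rng.uniform(0.5, 2.0, gi.size)
x = rng.standard_normal(gi.size)
y = (x + rng.standard_normal(N)[gi] + rng.standard_normal(T)[ti]
     + rng.standard_normal(gi.size))

fe = two_way_set(gi, ti, w, N, T)             # computed once, reused below
y_plus = proj_var(y, gi, ti, w, fe)
x_plus = proj_var(x, gi, ti, w, fe)
beta_fwl = (x_plus @ y_plus) / (x_plus @ x_plus)   # OLS on residualized data
print(beta_fwl)
```

The expression for `delta` follows from substituting (10) and (11) into $\hat{\delta} = A\,D'y + B\,H'y$, which collapses to $(D'D)^{-1}(D'y - D'H\hat{\tau})$. Because `fe` depends only on the panel structure and the weights, additional variables can be residualized without repeating the inversion.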
We implemented the algorithm in Mata, the matrix language in Stata. The implementation contains two main programs. twowayset performs the first step of the algorithm for a pair of fixed effects and stores the required matrices. projvar performs the second step and returns a set of residualized variables. For example, if hhid is a variable that contains a household identifier and tid contains an hour-of-sample identifier, then

twowayset hhid tid, root("c:/abc")

performs step one and stores the matrices in the file C:/abc. If consumption and hourprice are the endogenous and exogenous variables, respectively,

projvar consumption hourprice, root("c:/abc"), p(plus_)

creates the variables plus_consumption and plus_hourprice, which contain the residualized versions of the original variables.³ The actual estimation is performed using the built-in commands regress or ivregress on the residualized variables.

The command a2reg, proposed by Ouazad (2008), uses a conjugate gradient method to solve the least squares problem and obtain $\hat{\beta}_{OLS}$. The algorithm never computes $X'X$; therefore, it does not report an estimate of the asymptotic variance. The command reg2hdfe, proposed by Guimaraes and Portugal (2010), iteratively projects the dependent variable on the independent variables and the two sets of dummies. While computationally intensive, this approach imposes minimal memory requirements. The command felsdvreg, proposed by Cornelissen (2008), absorbs one set of fixed effects and constructs components of the OLS normal equations using the identifier of the other set of fixed effects. Thus, it solves for $\hat{\beta}_{OLS}$ without creating a set of dummies or relying on an iterative process.

We generate a panel of $N = 10^{n}$ households and $T = 10^{t}$ time periods and randomly drop 10 percent of the observations. The resulting panel is unbalanced. We draw household-specific and time-specific fixed effects from a standard normal. We also draw $K$ covariates $x$ and errors $\upsilon$ from independent standard normals. The dependent variable $y$ is:

$$y_{it} = \sum_{k=1}^{K} x_{itk} + d_i + h_t + \upsilon_{it}.$$

We estimate the parameters of the model using our procedure (twoway) and the three existing algorithms and compare their running time.
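The simulation design just described can be reproduced at a small scale with the following NumPy sketch. The function names, the seed, and the panel size are assumptions chosen for illustration. The estimator used here is a brute-force regression on explicit dummies, which stands in for the commands being compared and is feasible only because the panel is tiny.

```python
import numpy as np

def simulate_panel(n_groups, n_periods, k, drop=0.10, seed=0):
    """Draw an unbalanced panel following the design in the text."""
    rng = np.random.default_rng(seed)
    gi, ti = np.divmod(np.arange(n_groups * n_periods), n_periods)
    keep = rng.random(gi.size) >= drop        # drop about 10% of observations
    gi, ti = gi[keep], ti[keep]
    d = rng.standard_normal(n_groups)         # household-specific effects
    h = rng.standard_normal(n_periods)        # time-specific effects
    X = rng.standard_normal((gi.size, k))     # K covariates
    y = X.sum(axis=1) + d[gi] + h[ti] + rng.standard_normal(gi.size)
    return y, X, gi, ti

def dummy_ols(y, X, gi, ti, n_groups, n_periods):
    """Brute-force benchmark with explicit dummies (small panels only)."""
    rows = np.arange(y.size)
    D = np.zeros((y.size, n_groups))
    D[rows, gi] = 1.0
    H = np.zeros((y.size, n_periods))
    H[rows, ti] = 1.0
    # One time dummy is dropped to avoid perfect collinearity with D.
    regressors = np.hstack([X, D, H[:, :-1]])
    coef, *_ = np.linalg.lstsq(regressors, y, rcond=None)
    return coef[: X.shape[1]]

y, X, gi, ti = simulate_panel(100, 20, k=2)
beta_hat = dummy_ols(y, X, gi, ti, 100, 20)
print(beta_hat)   # every slope in the data-generating process equals 1
```

With $N = 100$ and $T = 20$ the dummy matrix has only 119 columns; at the sizes reported in the tables, an explicit dummy matrix of this kind is what the algorithm is designed to avoid.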
Notice that the panel structure is different in each draw, so we are not taking advantage of the fact that our method allows running the matrix inversion in step one just once. Table 1 summarizes the results for different values of $N$, $T$ and $K$.

Table 1 shows that our procedure outperforms algorithms designed for sparse panels. The only exception is the case with $N = T = 10{,}000$, where twoway spends too much time inverting a $9{,}999 \times 9{,}999$ matrix; reg2hdfe performs the estimation much faster. Even this case is useful to illustrate one of the advantages of our procedure: the increase in computing time from $K = 2$ to $K = 10$ is relatively small compared to the increase observed for reg2hdfe. This is also true in the other cases, where twoway outperforms reg2hdfe. Our method only requires inverting the matrix once, independently of the number of explanatory variables $K$. Table 2 shows that the computing time employed in step 1, which performs the inversion, does not depend on $K$. The computing time in step 2, which projects each explanatory variable on the set of fixed effects, is increasing in $K$.

³ See the documentation to the Stata code for additional details. The algorithm was also implemented in Matlab and SAS to take advantage of some specific features of those programs. The Matlab implementation uses sparse matrices. The SAS implementation never loads the raw data into the computer's RAM.

Running Time (in seconds)

     N       T    K     twoway   reg2hdfe    a2reg   felsdvreg
   100     100    2        0.1        0.3      0.2         0.7
   100     100   10        0.1        0.6      0.3         0.8
  1000     100    2        0.3        2.7      2.1         8.0
  1000     100   10        0.9        6.8      3.0         8.4
  1000    1000    2        8.1       24.7     16.4      2074.2
  1000    1000   10       13.4       70.9     27.7      2141.1
 10000     100    2        3.3       25.1     17.3        73.3
 10000     100   10        8.4       71.5     28.7        82.5
 10000    1000    2       48.4      242.2    163.6     21409.9
 10000    1000   10       94.2      672.2    265.6     21572.9
 10000   10000    2     9266.1     2806.3        *         >1d
 10000   10000   10     9836.5     8220.4        *         >1d
100000     100    2       26.2      269.8    168.7         >1d
100000     100   10       81.4      891.2    278.8         >1d
100000    1000    2      397.1     2755.9        *         >1d
100000    1000   10      970.5     8274.4        *         >1d

Table 1: This table compares the performance of each algorithm in Stata/SE 12.1 on an Intel Core i7-2600 CPU 3.40GHz with 32 GB RAM operating under 64-bit Windows 7. N is the number of groups, T is the number of observations per group, K is the number of explanatory variables. The column labeled twoway shows the average computing time of our algorithm measured in seconds. Columns reg2hdfe, a2reg and felsdvreg show the computing time for each alternative algorithm. The asterisk represents a violation of the matrix size limit in Stata, and >1d indicates that the procedure was stopped after one day.

Running Time by Step (in seconds)

     N       T    K     Step 1     Step 2   Step 3
   100     100    2        0.0        0.0      0.1
   100     100   10        0.0        0.1      0.0
  1000     100    2        0.1        0.2      0.0
  1000     100   10        0.1        0.8      0.0
  1000    1000    2        6.1        1.8      0.1
  1000    1000   10        6.1        6.8      0.4
 10000     100    2        1.4        1.8      0.2
 10000     100   10        1.4        6.6      0.4
 10000    1000    2       21.3       25.3      1.8
 10000    1000   10       21.3       68.3      4.6
 10000   10000    2     9055.9      193.0     17.2
 10000   10000   10     9055.5      732.5     48.6
100000     100    2        6.1       18.4      1.7
100000     100   10        6.3       70.6      4.5
100000    1000    2      172.8      207.2     17.2
100000    1000   10      170.7      751.4     48.4

Table 2: This table breaks down the computing time spent in each of the steps of our algorithm. N is the number of groups, T is the number of observations per group, K is the number of explanatory variables. Step 1 performs the matrix inversion and constructs the matrices that will be needed later on. Step 2 residualizes the dependent and explanatory variables. Step 3 uses the command regress to compute the OLS estimator using the residualized data.

Final Remarks

We present an algorithm to estimate a two-way fixed effect linear model. While existing algorithms are designed for sparse panels, ours works best in dense but unbalanced panels. The algorithm relies on the Frisch-Waugh-Lovell Theorem and applies to ordinary least squares (OLS), two-stage least squares (TSLS) and generalized method of moments (GMM) estimators. The coefficients of interest are computed using the residuals from the projection of all variables on the two sets of fixed effects. Our algorithm has three important advantages. First, it manages memory and computational resources efficiently, which speeds up the computation of the estimates. Second, it allows the researcher to estimate multiple specifications using the same structure of fixed effects at a very low computational cost. Third, the asymptotic variance of the parameters of interest can be consistently estimated using standard routines on the residualized data.

References

Arellano, M. (1987): "Computing Robust Standard Errors for Within-Groups Estimators," Oxford Bulletin of Economics and Statistics, 49(4), 431-34.

Baum, C. F., M. E. Schaffer, and S. Stillman (2007): "Enhanced routines for instrumental variables/GMM estimation and testing," Stata Journal, 7(4), 465-506.

Cornelissen, T. (2008): "The Stata command felsdvreg to fit a linear model with two high-dimensional fixed effects," Stata Journal, 8(2), 170-189.

Frisch, R., and F. V. Waugh (1933): "Partial Time Regressions as Compared with Individual Trends," Econometrica, 1(4), 387-401.

Gaure, S. (2013): "OLS with multiple high dimensional category variables," Computational Statistics & Data Analysis, 66, 8-18.

Giles, D. E. A. (1984): "Instrumental variables regressions involving seasonal data," Economics Letters, 14(4), 339-343.

Guimaraes, P., and P. Portugal (2010): "A simple feasible procedure to fit models with high-dimensional fixed effects," Stata Journal, 10(4), 628.

Hayashi, F. (2000): Econometrics. Princeton University Press, Princeton.

Lovell, M. C. (1963): "Seasonal Adjustment of Economic Time Series and Multiple Regression Analysis," Journal of the American Statistical Association, 58(304), 993-1010.

Lovell, M. C. (2008): "A Simple Proof of the FWL Theorem," The Journal of Economic Education, 39(1), 88-91.

Ouazad, A. (2008): "A2REG: Stata module to estimate models with two fixed effects."

Wansbeek, T., and A. Kapteyn (1989): "Estimation of the error-components model with incomplete panels," Journal of Econometrics, 41(3), 341-361.

Wooldridge, J. M. (2010): Econometric Analysis of Cross Section and Panel Data. The MIT Press, Cambridge, Mass., second edn.