Time-Series Cross-Section Analysis

Time-Series Cross-Section Analysis: Models for Long Panels
Jamie Monogan, University of Georgia
February 17, 2016

Objectives

By the end of this meeting, participants should be able to:
- Estimate fixed effects and random effects models.
- Estimate a linear model with a lagged dependent variable using OLS.
- Correct for contemporaneous correlation in standard errors of estimates.

General Considerations: Customary Problems of Pooled Data (and the Problems They Can Produce)

1. Functional form: Dynamic or static? Linear or nonlinear? Consequence: $\hat{\beta}$ is biased and inconsistent.
2. Unit effects, i.e., each unit requires an offset, $C_i$, so that $E(y)$ is the same across units. Consequence: $\hat{\beta}$ is biased and inconsistent. Key question: Do your unit effects correlate with predictors?
3. Autocorrelation in the time dimension of residuals. Consequence: $\hat{\beta}$ is inefficient (with biased standard errors).
4. Heteroscedasticity across units. Consequence: $\hat{\beta}$ is inefficient (with biased standard errors).
5. Contemporaneous correlation: errors at time $t$ are correlated across units. Consequence: $\hat{\beta}$ is inefficient (with biased standard errors).
6. Spatial autocorrelation: geographically proximate units have correlated errors. Consequence: $\hat{\beta}$ is inefficient (with biased standard errors).

General Considerations: Two Guiding Principles

1. Cookie-cutter approaches can be a recipe for disaster.
2. Error assumption problems are proportional to the size of errors.
   - A well-fit model can tolerate error assumption violations and produce good estimates. (Observational wisdom: second digit of s.e.)
   - A specification where most of the variance remains in the error term will be highly sensitive to error assumptions.

The Implication for Procedure
- Spend your time getting the theory and model specification right, not worrying about the error term.
- Do what you can (e.g., dummies, where needed) to move variance into the structural part of the model.
- Only after you have done your best on structure should you begin to think about modeling error.

General Considerations: A Point of View on Theory

Instead of the dichotomy between theory and empirical knowledge of error, assume three components:
1. Genuine a priori theory (not tainted by encounter with data).
2. Knowledge about context and data that you bring to your research; call it quasi-theory (i.e., unit effects).
3. Error: that which is totally inexplicable.

Advice
- My procedural advice amounts to saying: maximize the variance accounted for by (1) and (2) before concerning yourself with (3).
- It is not smart to fool yourself or your reader that (2) is actually an explanation.
- But it is a lot less smart to leave that variance in the error term, where it will surely cause mischief.

General Considerations: Informing Software of the Panel Structure

R
- For many commands (lm, lme), R does not get bent out of shape.
- For others (plm), there are options to specify in the command, e.g.: index=c("state", "year")

Stata
- Stata has very powerful pooled data capabilities, none of which work unless your file is xtset for panel data.
- The syntax is: xtset unitvar timevar (and order matters). E.g., xtset state year
- Once you have xtset it, Stata respects the stacked structure of your data (e.g., it won't re-sort) and opens up all the xt commands, e.g., xtreg, xtgls, xtpcse, xtregar.
- Do: help xt

Unit Effects

Logic: everything that happened to the units before the period of our observation history is likely to have altered mean levels. Thus, the true model is often:

$y_{it} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{jit} + C_i + u_{it}$

where $C_i$ is the effect of being in unit $i$ at all times. But with a naïve OLS model we fit:

$y_{it} = \beta_0 + \sum_{j=1}^{k} \beta_j x_{jit} + u_{it}$

which clearly is underspecified.

Unit Effects: The Consequence of Ignoring Unit Effects

- OLS estimates will be biased and inconsistent if $Cov(C_i, x_{jit}) \neq 0$ for at least one $j$. Why? If $C_i$ is not estimated, it goes into the error. Then unit variables which are accidentally correlated with it will proxy the effect. This will produce false inferences that unit variables explain between-unit differences that are actually due to history.
- OLS estimates will be inefficient with biased standard errors if $Cov(C_i, x_{jit}) = 0, \forall j$. Why? If $C_i$ is not estimated, it goes into the error. Since $C_i$ is present in the error term for all of individual $i$'s observations, all of those observations will correlate.

Some Solutions
1. Fixed effects (i.e., unit dummy variables).
2. Random effects (GLS).
3. Lagged dependent variable.
4. Recentering approaches.

Unit Effects: Fixed Effects Model

$y_{it} = \sum_{j=1}^{k} \beta_j x_{jit} + \gamma_i + u_{it}$

We directly estimate each $\gamma_i$. Two principal approaches:
- Least squares dummy variables (LSDV).
- The within estimator.

R command: fe.mod <- plm(y ~ x, data=dataset, index=c("unitvar", "timevar"), model="within")
Stata command: xtreg depvar varlist, fe

Properties
The fixed effects estimator solves the problem and produces unbiased $\hat{\beta}$. But two issues:
1. If the number of units is large, the estimation of all those parameters is inefficient.
2. Unit dummies (or unit means) are perfectly collinear with time-invariant variables (and highly collinear with quasi-unit constants), making it impossible to estimate a model with both fixed effects and time-invariant variables (AKA unit constants).
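
The within estimator from the fixed effects slide can be illustrated with a minimal sketch in plain Python. The two-unit panel below is hypothetical and noise-free, built so that the unit effect is correlated with x; the variable names are illustrative and not part of the original slides. It compares the pooled OLS slope with the slope obtained after subtracting each unit's means.

```python
# Toy balanced panel (hypothetical data): y_it = 2 * x_it + C_i, with no noise.
# The unit effect C_i is larger for the unit with larger x, so it correlates with x.
panel = {
    "A": {"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]},     # C_A = 0
    "B": {"x": [4.0, 5.0, 6.0], "y": [18.0, 20.0, 22.0]},  # C_B = 10
}

def mean(values):
    return sum(values) / len(values)

def ols_slope(xs, ys):
    """Slope from a bivariate OLS regression of ys on xs with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Pooled OLS ignores C_i, so the slope absorbs between-unit differences.
pooled_x = [x for unit in panel.values() for x in unit["x"]]
pooled_y = [y for unit in panel.values() for y in unit["y"]]
pooled_slope = ols_slope(pooled_x, pooled_y)

# Within estimator: subtract each unit's own means, then run OLS on the deviations.
within_x, within_y = [], []
for unit in panel.values():
    mx, my = mean(unit["x"]), mean(unit["y"])
    within_x += [x - mx for x in unit["x"]]
    within_y += [y - my for y in unit["y"]]
within_slope = ols_slope(within_x, within_y)

print("pooled slope:", round(pooled_slope, 3))
print("within slope:", round(within_slope, 3))
```

In this constructed example the within slope recovers the true coefficient of 2, while the pooled slope is pulled upward because the omitted unit effect is correlated with x.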

Unit Effects: Error Components, aka GLSE, aka Random Effects

You know this as a random intercept model, or a compound symmetry model.

Reasoning: For pooled data we can apportion error variance sources as:

$\sigma^2 = \sigma^2_{units} (+ \sigma^2_{time}) + \sigma^2_{within}$

That is, some proportion of the total error is due to differences between units, some proportion to differences between times, and some is pure error, due to neither units nor time. Or, dropping time:

$\sigma^2 = \sigma^2_{units} + \sigma^2_{within}$

Unit Effects: The Logic of Random Effects

- In the fixed effects approach we estimate parameters for fixed effects between units and thereby remove that variance from the error term.
- In random effects (GLSE, error components, etc.) we leave the between-unit variance in the error term and change our assumptions to reflect the knowledge that between-unit variance is in the error term. That is, we specify a (particular) GLS model which assumes between-unit variance in the errors.
- The empirical key to this specification is estimating $\rho$, conceptually the proportion of all error that is due to unit effects.

Unit Effects: Estimating $\rho$

$\rho = \frac{\sigma^2_{Between}}{\sigma^2_{Total}}$

$\hat{\rho} = \frac{(\text{OLS Residual Variance}) - (\text{LSDV Residual Variance})}{(\text{OLS Residual Variance})}$

Many ways to estimate $\rho$ in the r.e. model:
- Maddala & Mount (1973): Monte Carlo analyses with 11 different estimators.
- Baltagi (1981, 2005): Considers additional options.
- You have already used FIML and REML.
- Other options just use post hoc residual analysis.

With $\hat{\rho}$, you can use formulas you know to get:

$\beta_{FGLS} = [X' \Omega^{-1} X]^{-1} X' \Omega^{-1} y$

Analysis by Clark & Linzer suggests that the choice of estimator does matter. (E.g., plm [R] yields different results from lmer [R] and BUGS.)
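
The post hoc residual approach to $\hat{\rho}$ can be sketched in plain Python as follows. The three-unit panel is hypothetical. Two choices here are assumptions, since the slide does not specify them: the LSDV residuals are obtained through the equivalent within regression, and "residual variance" is taken as the mean squared residual with no degrees-of-freedom correction.

```python
# Hypothetical panel: unit -> (x values, y values). Units differ mainly in level.
panel = {
    "A": ([1.0, 2.0, 3.0], [2.1, 3.9, 6.0]),
    "B": ([2.0, 3.0, 4.0], [9.0, 11.1, 12.9]),
    "C": ([3.0, 4.0, 5.0], [16.1, 17.9, 20.0]),
}

def mean(values):
    return sum(values) / len(values)

def residual_variance(xs, ys):
    """Mean squared residual from a bivariate OLS fit of ys on xs with an intercept.
    Assumption: no degrees-of-freedom correction is applied."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    residuals = [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)

pooled_x, pooled_y = [], []   # raw data for pooled OLS
within_x, within_y = [], []   # unit-demeaned data (same residuals as LSDV)
for xs, ys in panel.values():
    pooled_x += xs
    pooled_y += ys
    within_x += [x - mean(xs) for x in xs]
    within_y += [y - mean(ys) for y in ys]

ols_var = residual_variance(pooled_x, pooled_y)
lsdv_var = residual_variance(within_x, within_y)

# Share of the pooled OLS residual variance that the unit dummies remove.
rho_hat = (ols_var - lsdv_var) / ols_var
print("rho_hat:", round(rho_hat, 4))
```

Because almost all of the pooled residual variance in this toy panel comes from level differences between units, the estimate lands very close to 1.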

Unit Effects: Estimation of GLSE (Besides What You Already Know)

R: re.mod <- plm(y ~ x, data=dataset, index=c("unitvar", "timevar"), model="random")
Stata: xtreg depvar varlist, re

In Stata, re (random effects) is the default and does not need to be specified, but it is useful to remind yourself what model you are estimating.

Unit Effects: Properties of Random Effects Estimator

- Does not deal with a diminishing correlated error structure or with heteroscedasticity.
- But these error assumption violations produce unbiased but inefficient $\hat{\beta}$ and are generally much less serious in their consequences than are unit effects.
- Alternative: GLS-ARMA (Kmenta).
- Produces BLUE estimates if the problem of unit effects is the only assumption violation and the unit effects do not correlate with the covariates.

Running a random effects model? A Hausman test is a common tool for evaluating model specification. In R: phtest from the plm library.

Dynamic Models: Perspective on Dynamics

Beck & Katz Perspective
We have a choice between:
- Static formulations, which then have expected unit effects and autocorrelation in the time dimension.
- Dynamic specification, where $y_{t-1}$ on the right-hand side will usually control history (hence unit effects) and eliminate autocorrelated error.

Static requires some feasible GLS approach to deal with error assumptions and, because it is static, may be an inferior model of (inherently dynamic) causality.

Dynamic is biased and inconsistent, but is probably the right structure and can then be estimated with OLS. Besides, a wrong functional form also induces bias.

Conclusion: Dynamic specification with OLS is better. Note: we do have options on how we dynamically specify a model.

One problem remains: contemporaneous correlation. This was a possibility with all the other models too, but was so far down the list of problems that most analysts chose not to deal with it.
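
A minimal plain-Python sketch of the lagged dependent variable setup follows. The two series are hypothetical and generated exactly from $y_t = 1 + 0.5\,y_{t-1}$, and covariates are left out to keep the example short. The point it illustrates is that the lag must be built within each unit, so that a unit's first observation is never paired with another unit's last one.

```python
# Hypothetical series per unit, generated exactly from y_t = 1 + 0.5 * y_{t-1}.
series = {
    "A": [0.0, 1.0, 1.5, 1.75],
    "B": [8.0, 5.0, 3.5, 2.75],
}

# Build (y_{t-1}, y_t) pairs unit by unit; each unit loses its first period.
lagged, current = [], []
for ys in series.values():
    lagged += ys[:-1]
    current += ys[1:]

# Bivariate OLS of y_t on y_{t-1} with an intercept.
n = len(current)
mean_lag = sum(lagged) / n
mean_cur = sum(current) / n
phi = (sum((l - mean_lag) * (c - mean_cur) for l, c in zip(lagged, current))
       / sum((l - mean_lag) ** 2 for l in lagged))
alpha = mean_cur - phi * mean_lag

print("intercept:", round(alpha, 6), "coefficient on lagged y:", round(phi, 6))
```

With real data the fit would not be exact, and additional predictors would enter the regression alongside the lag.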

Dynamic Models: Panel Corrected Standard Errors

Contemporaneous Correlation Example
- Contemporaneous (error) correlation arises from some time-related process which affects most units similarly at the same time and is not part of the model structure.
- In a state analysis where some economic indicator (e.g., budget shares) is the dependent variable, the national economy perturbs most states so that they have similar errors at particular times.
- You could model this effect (if you have theory).
- You could dummy time points (which might be sensible if you had many states and few time points).
- But if you did neither, you would have contemporaneous correlation of errors across units.

Dynamic Models: Panel Corrected Standard Errors

The Impact
- $\hat{\beta}$ estimates will be inefficient.
- Standard error estimates will be biased downward.
- Therefore, t statistics are overstated and p-values are smaller than they should be.
- Monte Carlo analysis suggests that these effects are quite small, and that even high levels of contemporaneous correlation cause only modest distortions.

Dynamic Models: Panel Corrected Standard Errors

Assume $\Sigma$, an $N \times N$ matrix of cross-unit contemporaneous correlations, with elements:

$\hat{\Sigma}_{ij} = \frac{\sum_{t=1}^{T} e_{it} e_{jt}}{T}$

If $I_T$ is a $T \times T$ identity matrix for the time dimension, then $\Omega = \Sigma \otimes I_T$, where $\Omega$ is $NT \times NT$, and standard errors are taken from (the square roots of) the diagonals of:

$(X'X)^{-1} X' \Omega X (X'X)^{-1}$

This is the panel correction.
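
The panel correction can be traced step by step in a minimal plain-Python sketch for a model with an intercept and one predictor. The balanced 3-unit, 4-period panel is hypothetical, and the helper names (inv2, matmul2, design_row) are illustrative only. Since $\Omega = \Sigma \otimes I_T$ only links errors from the same period, $X' \Omega X$ is accumulated as $\sum_i \sum_j \hat{\Sigma}_{ij} \sum_t x_{it} x_{jt}'$ without forming the $NT \times NT$ matrix.

```python
# Hypothetical balanced panel: rows are units, columns are time periods.
x = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 1.0, 4.0, 3.0],
     [0.0, 3.0, 2.0, 5.0]]
y = [[1.2, 2.9, 3.1, 5.3],
     [2.4, 1.8, 4.0, 4.4],
     [0.3, 3.9, 2.2, 6.2]]
N, T = len(x), len(x[0])

def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def design_row(i, t):
    """Row of X for unit i at time t: intercept and predictor."""
    return (1.0, x[i][t])

# Step 1: OLS coefficients, beta = (X'X)^{-1} X'y.
XtX = [[0.0, 0.0], [0.0, 0.0]]
Xty = [0.0, 0.0]
for i in range(N):
    for t in range(T):
        r = design_row(i, t)
        for a in range(2):
            Xty[a] += r[a] * y[i][t]
            for b in range(2):
                XtX[a][b] += r[a] * r[b]
XtX_inv = inv2(XtX)
beta = [sum(XtX_inv[a][b] * Xty[b] for b in range(2)) for a in range(2)]

# Step 2: residuals and Sigma_ij = sum_t e_it * e_jt / T.
e = [[y[i][t] - beta[0] - beta[1] * x[i][t] for t in range(T)] for i in range(N)]
sigma = [[sum(e[i][t] * e[j][t] for t in range(T)) / T for j in range(N)]
         for i in range(N)]

# Step 3: X' Omega X with Omega = Sigma (kron) I_T, summed over same-period pairs.
meat = [[0.0, 0.0], [0.0, 0.0]]
for i in range(N):
    for j in range(N):
        for t in range(T):
            ri, rj = design_row(i, t), design_row(j, t)
            for a in range(2):
                for b in range(2):
                    meat[a][b] += sigma[i][j] * ri[a] * rj[b]

# Step 4: sandwich covariance; PCSEs are square roots of its diagonal.
V = matmul2(matmul2(XtX_inv, meat), XtX_inv)
pcse = [V[0][0] ** 0.5, V[1][1] ** 0.5]
print("beta:", beta)
print("panel-corrected s.e.:", pcse)
```

This is a sketch of the formula on the slide under the stated assumptions, not a substitute for the pcse library in R or xtpcse in Stata, which also handle unbalanced panels and further options.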

Dynamic Models: Panel Corrected Standard Errors

Software

R
- Estimate a linear model.
- Input the model name into the pcse function from the pcse library.

Stata
- xtpcse y varlist
- Produces OLS $\hat{\beta}$s and panel corrected standard errors.

For Next Time

- Study for the midterm exam.
- Read FLW chapters 11-13.
- Download Owsiak's (2013) dataset of the impact of settled borders on democracy. The file owsiakjop2013.dta is in Stata format at http://dx.doi.org/10.7910/dvn/arkoti. Your cross-sectional units are countries (ccode) and temporal units are years (year).
- Remove every country-year with any missing observation. This is easy, but I will not tell you how. Do not ask.
- Estimate a linear regression with OLS using the following specification (with variable names in parentheses): The dependent variable is Polity score (polity2), and the predictors are an indicator for having all borders settled (allsettle), lagged GDP (laggdpam), lagged change in GDP (laggdpchg), lagged trade openness (lagtradeopen), lagged military personnel (lagmilper), lagged urban population (lagupop), and lagged previous non-democratic movement (lagsumdown).
- Test for unit effects in the residuals with ANOVA. What is your conclusion?
- Estimate a fixed effects model. Test for serial correlation with a panel Breusch-Godfrey test. What is your conclusion?
- Estimate a lagged dependent variable model with OLS. Taken at face value, what do you make of the coefficient on lagged Polity score (lagpolity)?
- How do your three results contrast? Which model do you most prefer?