Spurious Fixed Effects Regression


In Choi*

First Draft: April, 00; This version: June, 0

* I thank Wonho Song and an anonymous referee for their helpful comments. Research assistance for this paper was provided by Minchul Yum, whom I thank. Department of Economics, Sogang University, Shinsu-dong, Mapo-gu, Seoul, Korea. E-mail: inchoi@gmail.com, inchoi@sogang.ac.kr

Abstract

This paper shows that spurious regression results can occur for a fixed effects model with weak time series variation in the regressor and/or strong time series variation in the regression errors when the first-differenced and Within-OLS estimators are used. Asymptotic properties of these estimators and the related t-tests and model selection criteria are studied by sending the number of cross-sectional observations to infinity. This paper shows that the first-differenced and Within-OLS estimators diverge in probability, that the related t-tests are inconsistent, that $R^2$s converge to zero in probability and that AIC and BIC diverge to $-\infty$ in probability. The results of the paper warn that one should not jump to the use of fixed effects regressions without considering the degree of time series variation in the data.

1 Introduction

Regressions that do not reveal true statistical relations are called spurious. Spurious regressions are known to occur in various time series settings. Yule (1926) tries to explain the unreasonably high correlation between the mortality rate for the years 1866-1911 and the ratio of Church of England marriages to all marriages. He does so by using time series that are positively correlated, as are their first differences. Yule (1926) is regarded as the first work that studied spurious regressions. More recently, Granger and Newbold (1974) demonstrate via simulation that nonsense regression results can be observed between two independent random walks. Their analysis has been extended to various models involving stochastic and/or deterministic trends.

Note: See Aldrich (1995) for a historical account of spurious correlations.
According to Aldrich, the connection between Yule (1926) and Granger and Newbold (1974) had been largely ignored until it was pointed out by Hendry (1986).

However, spurious regression results for conventional micro panels have never been reported formally. This paper shows that spurious regression results can occur for a fixed effects model with weak time series variation in the regressor and/or strong time series variation in the regression errors. This happens when popular estimators such as the first-differenced and Within-OLS estimators are used. Under weak time series variation, the variance of the regressor shrinks to zero as the number of cross-sectional observations goes to infinity. Some micro panels display weak time series variation. For example, real wages collected over a short span of time or monopoly prices under regulation do not change much over time. Strong time series variation in the regression errors is modelled by a diverging variance of the regression errors. More specifically, by sending the number of cross-sectional observations to infinity, this paper will show the following for a regressor with weak time series variation and/or regression errors with strong time series variation: the first-differenced and Within-OLS estimators diverge in probability, the related t-tests are inconsistent, $R^2$s converge to zero in probability, and AIC and BIC diverge to $-\infty$ in probability. Because these results are not expected from standard panel regression theory, such results should obviously not be taken seriously.

This paper has the following plan. Section 2 illustrates a spurious panel regression using a two-period fixed effects model. The basic model and assumptions are also introduced in this section. Section 3 extends the results of Section 2 to the case of a multiperiod fixed effects model. Section 4 reports simulation results regarding the t-tests and coefficients of determination. Section 5 contains a summary and further remarks. All the limits are taken by sending $N$ to $\infty$.

2 An illustration

Consider the following two-period fixed effects model with a single regressor $x_{it}$:

$$y_{it} = \alpha_i + \beta x_{it} + u_{it} \quad (i = 1, \ldots, N;\ t = 1, 2), \tag{1}$$

where

$$x_{it} = z_i + a_{it} \quad \text{and} \quad u_{it} = v_i + b_{it}, \tag{2}$$

and $z_i$ and $v_i$ are random variables.
As usual, $\alpha_i$ is an individual effects variable correlated with $x_{it}$. Observed data are $\{y_{it}\}$ and $\{x_{it}\}$, but $\{z_i\}$ and $\{a_{it}\}$ are not separately observed. The random variables $\{a_{it}\}$ and $\{b_{it}\}$ bring time series variation to the observed data. For $\{a_{it}\}$ and $\{b_{it}\}$, we assume:

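Assumption 1. $(a_{it}, b_{it})' \sim \text{iid}\left(0, \begin{bmatrix} N^{-2\delta}\sigma_a^2 & 0 \\ 0 & N^{-2\gamma}\sigma_b^2 \end{bmatrix}\right)$ for every $N$, $i$ and $t$.

Assumption 2. $\delta \ge 0$ and $\frac{1}{2} + \gamma - \delta < 0$.

As will be shown, spurious regression results for the differenced estimator, the t-test and $R^2$ arise under Assumptions 1 and 2. Assumption 2 has two cases: $\delta > 0$ with $\gamma < \delta - \frac{1}{2}$, and $\delta = 0$ with $\gamma < -\frac{1}{2}$.

In the former case, there is weak time series variation in the regressor for large $N$. This implies the well-known multicollinearity problem of regression, so spurious regression results are well expected. This case is relevant in practice because some panel data display such behavior. For example, real wages collected over a short time span or monopoly prices under regulation do not change much over time.

In the latter case, there is no multicollinearity problem, but there is strong time series variation in the regression errors.

We will use the notation $\Delta w_i = w_{i2} - w_{i1}$ throughout this section. The Within-OLS and first-differenced estimators are known to be identical for a two-period fixed effects model. These estimators also yield the same t-ratio and $R^2$. The first-differenced estimator is written as

$$\hat\beta_d = \frac{\sum_{i=1}^N \Delta x_i \Delta y_i}{\sum_{i=1}^N (\Delta x_i)^2} = \beta + \frac{\sum_{i=1}^N \Delta a_i \Delta b_i}{\sum_{i=1}^N (\Delta a_i)^2}.$$

Note that $N^{\delta}\Delta a_i \sim \text{iid}(0, 2\sigma_a^2)$ and $N^{\gamma}\Delta b_i \sim \text{iid}(0, 2\sigma_b^2)$. Also, $\text{Cov}(N^{\delta}\Delta a_i, N^{\gamma}\Delta b_i) = 0$ for every $N$. The central limit theorem then gives

$$\frac{N^{\delta+\gamma}}{\sqrt{N}} \sum_{i=1}^N \Delta a_i \Delta b_i \xrightarrow{d} N(0, \Omega) \tag{3}$$

with $\Omega = E\left[N^{2(\delta+\gamma)} (\Delta a_i)^2 (\Delta b_i)^2\right] = 4\sigma_a^2\sigma_b^2$. Furthermore, the law of large numbers yields

$$\frac{N^{2\delta}}{N} \sum_{i=1}^N (\Delta a_i)^2 \xrightarrow{p} 2\sigma_a^2. \tag{4}$$

Relations (3) and (4) give

$$N^{\frac{1}{2}+\gamma-\delta}(\hat\beta_d - \beta) \xrightarrow{d} N\left(0, \frac{\sigma_b^2}{\sigma_a^2}\right). \tag{5}$$

This can also be written as $\hat\beta_d = \beta + O_p\left(N^{-(\frac{1}{2}+\gamma-\delta)}\right)$. Hence $|\hat\beta_d|$ diverges in probability to $\infty$ when $\frac{1}{2}+\gamma-\delta < 0$. In fact, the condition $\delta \ge 0$ is not required for spurious regression results. We are interested in this case because it corresponds to weak ($\delta > 0$) and moderate ($\delta = 0$) time series variation in the regressor.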
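Relation (5) can be checked numerically. The following is a minimal sketch in standard-library Python (3.8+), not the author's code. It draws the two-period model (1)-(2) under Assumption 1 with normal errors and $\sigma_a = \sigma_b = 1$. The choices $z_i, v_i \sim N(0, 1)$ and $\alpha_i = z_i + N(0, 1)$ are illustrative assumptions that are not taken from the paper, and the function names are hypothetical.

If relation (5) holds, two things should appear in every case:

- The scaled error $N^{\frac12+\gamma-\delta}(\hat\beta_d - \beta)$ has a standard deviation near $\sigma_b/\sigma_a = 1$.
- The unscaled spread of $\hat\beta_d$ is far larger in the spurious cases than in the standard case $(\delta, \gamma) = (0, 0)$.

```python
import random
import statistics


def fd_estimate_two_period(n, beta, delta, gamma, rng, sigma_a=1.0, sigma_b=1.0):
    """One draw of the first-differenced estimator in the two-period model (1)-(2).

    Illustrative assumptions (not from the paper): normal draws, z_i, v_i ~ N(0, 1)
    and alpha_i = z_i + N(0, 1), so the individual effect is correlated with x_it.
    As in Assumption 1, a_it has s.d. n**(-delta)*sigma_a and b_it has
    s.d. n**(-gamma)*sigma_b.
    """
    sd_a = sigma_a * n ** (-delta)
    sd_b = sigma_b * n ** (-gamma)
    sxy = sxx = 0.0
    for _ in range(n):
        z, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        alpha = z + rng.gauss(0.0, 1.0)
        x = [z + rng.gauss(0.0, sd_a) for _ in range(2)]                 # x_it = z_i + a_it
        y = [alpha + beta * xt + v + rng.gauss(0.0, sd_b) for xt in x]   # u_it = v_i + b_it
        dx, dy = x[1] - x[0], y[1] - y[0]   # alpha_i, z_i and v_i cancel here
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx


def scaled_fd_errors(n, beta, delta, gamma, reps=200, seed=1):
    """Monte Carlo draws of n**(1/2 + gamma - delta) * (beta_hat_d - beta)."""
    rng = random.Random(seed)
    rate = n ** (0.5 + gamma - delta)
    return [rate * (fd_estimate_two_period(n, beta, delta, gamma, rng) - beta)
            for _ in range(reps)]


if __name__ == "__main__":
    n = 200
    for delta, gamma in [(0.0, 0.0), (0.9, 0.3), (0.0, -0.6)]:
        scaled = scaled_fd_errors(n, 1.0, delta, gamma)
        sd_scaled = statistics.pstdev(scaled)
        sd_raw = sd_scaled / n ** (0.5 + gamma - delta)
        print(f"(delta, gamma) = ({delta}, {gamma}): "
              f"sd of scaled error {sd_scaled:.2f}, sd of beta_hat_d {sd_raw:.3f}")
```

Under this design, the dispersion of $\hat\beta_d$ itself grows like $N^{\delta-\gamma-\frac12}$, which is the divergence described above. Exact figures depend on the seed and the number of replications.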

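Next, we study properties of the t-test. The t-ratio for the null hypothesis $H_0: \beta = 0$ is written as

$$t = \frac{\hat\beta_d}{\sqrt{\hat\sigma^2 \big/ \sum_{i=1}^N (\Delta x_i)^2}}, \quad \text{where } \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^N (\Delta y_i - \hat\beta_d \Delta x_i)^2.$$

Since

$$N^{2\gamma}\hat\sigma^2 = \frac{N^{2\gamma}}{N}\sum_{i=1}^N (\Delta b_i)^2 + N^{2\gamma}(\hat\beta_d-\beta)^2 \frac{1}{N}\sum_{i=1}^N (\Delta a_i)^2 - 2N^{2\gamma}(\hat\beta_d-\beta)\frac{1}{N}\sum_{i=1}^N \Delta a_i \Delta b_i = O_p(1) + O_p(N^{-1}) + O_p(N^{-1}) \tag{6}$$

by relations (3), (4), (5) and the law of large numbers applied to the first term of (6), we have

$$N^{2\gamma}\hat\sigma^2 \xrightarrow{p} 2\sigma_b^2. \tag{7}$$

Relations (4), (5) and (7) yield

$$t = \frac{N^{\frac{1}{2}+\gamma-\delta}\hat\beta_d}{\sqrt{N^{2\gamma}\hat\sigma^2 \big/ \left(N^{2\delta-1}\sum_{i=1}^N (\Delta x_i)^2\right)}} \xrightarrow{d} \frac{N(0, \sigma_b^2/\sigma_a^2)}{\sqrt{\sigma_b^2/\sigma_a^2}} = N(0, 1).$$

This shows that the t-ratio has a standard normal distribution in the limit under the null hypothesis. However, under the alternative hypothesis $H_1: \beta \neq 0$,

$$t = \frac{N^{\frac{1}{2}+\gamma-\delta}(\hat\beta_d-\beta)}{\sqrt{N^{2\gamma}\hat\sigma^2 \big/ \left(N^{2\delta-1}\sum_{i=1}^N (\Delta x_i)^2\right)}} + \frac{N^{\frac{1}{2}+\gamma-\delta}\beta}{\sqrt{N^{2\gamma}\hat\sigma^2 \big/ \left(N^{2\delta-1}\sum_{i=1}^N (\Delta x_i)^2\right)}} = c_N + d_N, \text{ say},$$

where $c_N \xrightarrow{d} N(0, 1)$ and $d_N = O_p\left(N^{\frac{1}{2}+\gamma-\delta}\right)$. Hence the t-ratio has a standard normal distribution in the limit when $\frac{1}{2}+\gamma-\delta < 0$. That is, the t-test is not consistent when $\frac{1}{2}+\gamma-\delta < 0$, or equivalently when $\hat\beta_d$ diverges in probability.

The coefficient of determination obtained by regressing $\Delta y_i$ on $\Delta x_i$ is defined by

$$R^2 = \frac{\hat\beta_d^2 \sum_{i=1}^N (\Delta x_i)^2}{\sum_{i=1}^N (\Delta y_i)^2} = \frac{C_N}{D_N}, \text{ say}.$$

Writing $\hat\beta_d = \beta + (\hat\beta_d - \beta)$, we find from relations (4) and (5) that

$$N^{2\gamma}C_N = \beta^2 N^{2\gamma}\sum_{i=1}^N (\Delta a_i)^2 + 2\beta N^{2\gamma}(\hat\beta_d-\beta)\sum_{i=1}^N (\Delta a_i)^2 + N^{2\gamma}(\hat\beta_d-\beta)^2\sum_{i=1}^N (\Delta a_i)^2 = O_p\left(N^{1+2\gamma-2\delta}\right) + O_p\left(N^{\frac{1}{2}+\gamma-\delta}\right) + O_p(1). \tag{8}$$

Moreover,

$$N^{2\gamma}D_N = \beta^2 N^{2\gamma}\sum_{i=1}^N (\Delta a_i)^2 + N^{2\gamma}\sum_{i=1}^N (\Delta b_i)^2 + 2\beta N^{2\gamma}\sum_{i=1}^N \Delta a_i \Delta b_i = O_p\left(N^{1+2\gamma-2\delta}\right) + O_p(N) + O_p\left(N^{\frac{1}{2}+\gamma-\delta}\right). \tag{9}$$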
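The t-ratio result and the ratio $R^2 = C_N/D_N$ defined above can both be tracked in a small simulation. This is a hedged sketch in standard-library Python, not the paper's code. It assumes normal $a_{it}, b_{it}$ with $\sigma_a = \sigma_b = 1$, and the function names are hypothetical. Because $\alpha_i$, $z_i$ and $v_i$ cancel after differencing, it draws $\Delta a_i$ and $\Delta b_i$ directly.

The value $\beta = 0.2$ is the largest value used in Section 4. With that value, the rejection frequency of the nominal 5% test should stay near 5% when $\frac12+\gamma-\delta<0$, while the test has good power in the standard case $(\delta, \gamma) = (0, 0)$. With $\beta = 1$, the average $R^2$ should be close to zero in the spurious cases.

```python
import math
import random
import statistics


def fd_t_and_r2(n, beta, delta, gamma, rng):
    """t-ratio for H0: beta = 0 and R^2 from one two-period FD regression.

    alpha_i, z_i and v_i cancel after differencing, so this sketch draws
    dx_i = a_i2 - a_i1 and du_i = b_i2 - b_i1 directly, assuming normal
    a_it, b_it with sigma_a = sigma_b = 1 and the Assumption 1 scaling.
    """
    sd_dx = math.sqrt(2.0) * n ** (-delta)
    sd_du = math.sqrt(2.0) * n ** (-gamma)
    dx = [rng.gauss(0.0, sd_dx) for _ in range(n)]
    dy = [beta * x + rng.gauss(0.0, sd_du) for x in dx]
    sxx = sum(x * x for x in dx)
    sxy = sum(x * y for x, y in zip(dx, dy))
    syy = sum(y * y for y in dy)
    b_hat = sxy / sxx
    sigma2 = sum((y - b_hat * x) ** 2 for x, y in zip(dx, dy)) / n   # sigma_hat^2
    t = b_hat / math.sqrt(sigma2 / sxx)
    r2 = b_hat ** 2 * sxx / syy                                       # C_N / D_N
    return t, r2


def rejection_rate_and_mean_r2(n, beta, delta, gamma, reps=300, seed=2):
    """Share of |t| > 1.96 (nominal 5% two-sided test) and the average R^2."""
    rng = random.Random(seed)
    draws = [fd_t_and_r2(n, beta, delta, gamma, rng) for _ in range(reps)]
    reject = sum(abs(t) > 1.96 for t, _ in draws) / reps
    mean_r2 = statistics.fmean(r2 for _, r2 in draws)
    return reject, mean_r2


if __name__ == "__main__":
    for delta, gamma in [(0.0, 0.0), (0.9, 0.3), (0.0, -0.6)]:
        rej, _ = rejection_rate_and_mean_r2(200, 0.2, delta, gamma)
        _, r2 = rejection_rate_and_mean_r2(200, 1.0, delta, gamma)
        print(f"(delta, gamma) = ({delta}, {gamma}): "
              f"rejection rate at beta = 0.2: {rej:.3f}, mean R^2 at beta = 1: {r2:.3f}")
```

Convergence is slow here: with $\frac12+\gamma-\delta=-0.1$, the term $d_N$ shrinks only like $N^{-0.1}$. Larger values of $\beta$ therefore still show some excess rejection at moderate $N$.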

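Relations (8) and (9) show that

$$R^2 = \frac{O_p\left(N^{1+2\gamma-2\delta}\right) + O_p\left(N^{\frac{1}{2}+\gamma-\delta}\right) + O_p(1)}{O_p\left(N^{1+2\gamma-2\delta}\right) + O_p(N) + O_p\left(N^{\frac{1}{2}+\gamma-\delta}\right)}, \tag{10}$$

implying $R^2 \xrightarrow{p} 0$ when $\frac{1}{2}+\gamma-\delta < 0$. Thus, regardless of the quality of $x_{it}$ as a regressor, small values of $R^2$ will be observed. In the standard panel regression case $(\delta, \gamma) = (0.0, 0.0)$, relation (10) reduces to $R^2 = O_p(1)$ (but not $o_p(1)$). If $\gamma > 0$, model selection criteria such as AIC and BIC diverge to $-\infty$ in probability. This is because their common term $\ln(\hat\sigma^2)$ does so, as is expected from relation (7). But of course, this does not mean that the selected regressors are optimal. It is simply an indication of a spurious panel regression.

3 Spurious regressions for a multiperiod fixed effects model

This section considers the model of Section 2 and studies properties of the Within-OLS and first-differenced estimators for $T > 2$. The model continues to have a single regressor in this section, which will simplify the exposition. Assumptions 1 and 2 are also retained.

3.1 Within-OLS estimator

The Within-OLS estimator is written as

$$\hat\beta_w = \frac{\sum_{i=1}^N\sum_{t=1}^T (x_{it}-\bar x_i)(y_{it}-\bar y_i)}{\sum_{i=1}^N\sum_{t=1}^T (x_{it}-\bar x_i)^2} = \beta + \frac{\sum_{i=1}^N\sum_{t=1}^T (a_{it}-\bar a_i)b_{it}}{\sum_{i=1}^N\sum_{t=1}^T (a_{it}-\bar a_i)^2},$$

where $\bar q_i = \frac{1}{T}\sum_{t=1}^T q_{it}$. Because $N^{\delta+\gamma}\sum_{t=1}^T (a_{it}-\bar a_i)b_{it} \sim \text{iid}\left(0, (T-1)\sigma_a^2\sigma_b^2\right)$ for each $T$, the central limit theorem yields

$$\frac{N^{\delta+\gamma}}{\sqrt{N}}\sum_{i=1}^N\sum_{t=1}^T (a_{it}-\bar a_i)b_{it} \xrightarrow{d} N\left(0, (T-1)\sigma_a^2\sigma_b^2\right). \tag{11}$$

Moreover, since $E\left[N^{2\delta}\sum_{t=1}^T (a_{it}-\bar a_i)^2\right] = (T-1)\sigma_a^2$, the law of large numbers gives

$$\frac{N^{2\delta}}{N}\sum_{i=1}^N\sum_{t=1}^T (a_{it}-\bar a_i)^2 \xrightarrow{p} (T-1)\sigma_a^2. \tag{12}$$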
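The Within-OLS and first-differenced estimators can be coded directly from their definitions. The sketch below is standard-library Python. The data-generating choices for $\alpha_i$, $z_i$ and $v_i$ and all function names are illustrative assumptions, not the author's implementation.

The sketch checks two things numerically:

- The two estimators coincide when $T = 2$, as stated in Section 2.
- With $(\delta, \gamma) = (0.9, 0.3)$ and $\sigma_a = \sigma_b = 1$, relations (11) and (12) imply that the scaled error $N^{\frac12+\gamma-\delta}(\hat\beta_w-\beta)$ should have a spread close to $1/\sqrt{T-1}$.

```python
import random
import statistics


def simulate_panel(n, t_periods, beta, delta, gamma, rng):
    """Draw x[i][t], y[i][t] from model (1)-(2) under Assumption 1 (sigma_a = sigma_b = 1).

    Illustrative assumptions: normal draws, z_i, v_i ~ N(0, 1) and
    alpha_i = z_i + N(0, 1), so that alpha_i is correlated with x_it.
    """
    sd_a, sd_b = n ** (-delta), n ** (-gamma)
    xs, ys = [], []
    for _ in range(n):
        z, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        alpha = z + rng.gauss(0.0, 1.0)
        x = [z + rng.gauss(0.0, sd_a) for _ in range(t_periods)]
        xs.append(x)
        ys.append([alpha + beta * xt + v + rng.gauss(0.0, sd_b) for xt in x])
    return xs, ys


def within_ols(xs, ys):
    """beta_hat_w: pooled OLS on deviations from individual time means."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
        num += sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
        den += sum((xt - x_bar) ** 2 for xt in x)
    return num / den


def first_differenced(xs, ys):
    """beta_hat_d: OLS of Delta y_it on Delta x_it (t = 2, ..., T) without intercept."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        for t in range(1, len(x)):
            dx, dy = x[t] - x[t - 1], y[t] - y[t - 1]
            num += dx * dy
            den += dx * dx
    return num / den


def scaled_within_sd(n, t_periods, beta, delta, gamma, reps=200, seed=3):
    """Monte Carlo s.d. of n**(1/2 + gamma - delta) * (beta_hat_w - beta)."""
    rng = random.Random(seed)
    rate = n ** (0.5 + gamma - delta)
    errs = []
    for _ in range(reps):
        xs, ys = simulate_panel(n, t_periods, beta, delta, gamma, rng)
        errs.append(rate * (within_ols(xs, ys) - beta))
    return statistics.pstdev(errs)


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = simulate_panel(200, 2, 0.5, 0.9, 0.3, rng)
    print("T = 2:", within_ols(xs, ys), first_differenced(xs, ys))   # equal up to rounding
    for t_periods in (2, 6):
        print(f"T = {t_periods}: scaled s.d. {scaled_within_sd(200, t_periods, 0.5, 0.9, 0.3):.3f}, "
              f"1/sqrt(T-1) = {(t_periods - 1) ** -0.5:.3f}")
```

The within transformation and first differencing both remove $\alpha_i$, $z_i$ and $v_i$. This is why the individual effect, although correlated with $x_{it}$, does not bias either estimator in this design.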

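Relations (11) and (12) provide the limiting distribution of $\hat\beta_w$ as

$$N^{\frac{1}{2}+\gamma-\delta}(\hat\beta_w - \beta) \xrightarrow{d} N\left(0, \frac{\sigma_b^2}{(T-1)\sigma_a^2}\right). \tag{13}$$

As in the previous section, $|\hat\beta_w|$ diverges in probability to $\infty$ when $\frac{1}{2}+\gamma-\delta < 0$. As usual, the t-ratio is defined as

$$t_w = \frac{\hat\beta_w}{\sqrt{\hat\sigma^2 \big/ \sum_{i=1}^N\sum_{t=1}^T (x_{it}-\bar x_i)^2}}, \quad \text{where } \hat\sigma^2 = \frac{1}{N(T-1)}\sum_{i=1}^N\sum_{t=1}^T \left(y_{it}-\bar y_i-\hat\beta_w(x_{it}-\bar x_i)\right)^2.$$

We use three facts. First, $N^{2\delta}\sum_{t=1}^T (a_{it}-\bar a_i)^2$ is iid across $i$ with mean $(T-1)\sigma_a^2$. Second, $N^{2\gamma}\sum_{t=1}^T (b_{it}-\bar b_i)^2$ is iid across $i$ with mean $(T-1)\sigma_b^2$. Third, relation (11) holds. Together these give

$$N^{2\gamma}\hat\sigma^2 = \frac{N^{2\gamma}}{N(T-1)}\left[\sum_{i=1}^N\sum_{t=1}^T (b_{it}-\bar b_i)^2 - 2(\hat\beta_w-\beta)\sum_{i=1}^N\sum_{t=1}^T (b_{it}-\bar b_i)(a_{it}-\bar a_i) + (\hat\beta_w-\beta)^2\sum_{i=1}^N\sum_{t=1}^T (a_{it}-\bar a_i)^2\right] = O_p(1) + O_p(N^{-1}) + O_p(N^{-1}),$$

which implies

$$N^{2\gamma}\hat\sigma^2 \xrightarrow{p} \sigma_b^2. \tag{14}$$

Relations (12), (13) and (14) yield, under the null hypothesis,

$$t_w = \frac{N^{\frac{1}{2}+\gamma-\delta}\hat\beta_w}{\sqrt{N^{2\gamma}\hat\sigma^2 \big/ \left(N^{2\delta-1}\sum_{i=1}^N\sum_{t=1}^T (x_{it}-\bar x_i)^2\right)}} \xrightarrow{d} \frac{N\left(0, \sigma_b^2/((T-1)\sigma_a^2)\right)}{\sqrt{\sigma_b^2/((T-1)\sigma_a^2)}} = N(0, 1).$$

Under the alternative hypothesis $H_1: \beta \neq 0$, however, we have $t_w \xrightarrow{d} N(0, 1)$ when $\frac{1}{2}+\gamma-\delta < 0$, as in Section 2. Thus, the t-test is not consistent when $\frac{1}{2}+\gamma-\delta < 0$. Using the same methods as in Section 2, we can also show that $R^2 \xrightarrow{p} 0$ when $\frac{1}{2}+\gamma-\delta < 0$. Likewise, AIC and BIC diverge to $-\infty$ in probability when $\gamma > 0$. The details are not worth reporting here.

3.2 First-differenced estimator

Write the first-differenced estimator as

$$\hat\beta_d = \beta + \frac{\sum_{i=1}^N\sum_{t=2}^T \Delta a_{it}\Delta b_{it}}{\sum_{i=1}^N\sum_{t=2}^T (\Delta a_{it})^2}.$$

Since $N^{\delta+\gamma}\sum_{t=2}^T \Delta a_{it}\Delta b_{it} \sim \text{iid}\left(0, (6T-8)\sigma_a^2\sigma_b^2\right)$ for each $T$, we have

$$\frac{N^{\delta+\gamma}}{\sqrt{N}}\sum_{i=1}^N\sum_{t=2}^T \Delta a_{it}\Delta b_{it} \xrightarrow{d} N\left(0, (6T-8)\sigma_a^2\sigma_b^2\right).$$

In addition, the terms $N^{2\delta}\sum_{t=2}^T (\Delta a_{it})^2$ are independent and identically distributed with $E\left[N^{2\delta}\sum_{t=2}^T (\Delta a_{it})^2\right] = 2(T-1)\sigma_a^2$. This gives $\frac{N^{2\delta}}{N}\sum_{i=1}^N\sum_{t=2}^T (\Delta a_{it})^2 \xrightarrow{p} 2(T-1)\sigma_a^2$. Thus, we obtain

$$N^{\frac{1}{2}+\gamma-\delta}(\hat\beta_d - \beta) \xrightarrow{d} N\left(0, \frac{(6T-8)\sigma_b^2}{4(T-1)^2\sigma_a^2}\right). \tag{15}$$

Again, $|\hat\beta_d|$ diverges in probability to $\infty$ when $\frac{1}{2}+\gamma-\delta < 0$. Relations (13) and (15) show that the Within-OLS and first-differenced estimators have the same limiting distribution when $T = 2$, but different distributions when $T > 2$.

Because $\{\Delta x_{it}\}$ and $\{\Delta u_{it}\}$ are moving-average processes of order 1, we should not use the usual standard error for the t-ratio, which does not account for the serial correlation. Instead, the t-ratio should be defined by using a robust standard error. Letting $\Delta\hat u_{it} = \Delta y_{it} - \hat\beta_d\Delta x_{it}$, a robust estimator of the variance of $\hat\beta_d$ is

$$\widehat{\text{Var}}(\hat\beta_d) = \frac{\sum_{i=1}^N\sum_{t=2}^T (\Delta x_{it}\Delta\hat u_{it})^2 + 2\sum_{i=1}^N\sum_{t=2}^{T-1} \Delta x_{it}\Delta x_{i(t+1)}\Delta\hat u_{it}\Delta\hat u_{i(t+1)}}{\left(\sum_{i=1}^N\sum_{t=2}^T (\Delta x_{it})^2\right)^2},$$

which utilizes the fact that $\{\Delta x_{it}\}$ and $\{\Delta u_{it}\}$ are moving-average processes of order 1. Using this, we define the t-ratio for the null hypothesis $H_0: \beta = 0$ as $t_d = \hat\beta_d \big/ \sqrt{\widehat{\text{Var}}(\hat\beta_d)}$. Under the null, we can write

$$t_d = \frac{\frac{N^{\delta+\gamma}}{\sqrt{N}}\sum_{i=1}^N\sum_{t=2}^T \Delta a_{it}\Delta b_{it}}{\sqrt{\frac{N^{2(\delta+\gamma)}}{N}\left[\sum_{i=1}^N\sum_{t=2}^T (\Delta a_{it}\Delta\hat u_{it})^2 + 2\sum_{i=1}^N\sum_{t=2}^{T-1} \Delta a_{it}\Delta a_{i(t+1)}\Delta\hat u_{it}\Delta\hat u_{i(t+1)}\right]}}.$$

The denominator converges in probability to $\sqrt{(6T-8)\sigma_a^2\sigma_b^2}$, so $t_d \xrightarrow{d} N(0, 1)$ under the null hypothesis. Under the alternative hypothesis $H_1: \beta \neq 0$, the t-ratio also has a standard normal distribution in the limit when $\frac{1}{2}+\gamma-\delta < 0$. This follows in the same manner as in Section 2. Last, as in Section 2, we also have $R^2 \xrightarrow{p} 0$ when $\frac{1}{2}+\gamma-\delta < 0$. AIC and BIC diverge to $-\infty$ in probability when $\gamma > 0$.

4 Simulation

This section reports some simulation results regarding the t-tests and $R^2$s from the first-differenced and Within-OLS regressions. The data were generated by equations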
(1) and (2) of Section 2 with $T = 2, 6$; $N = 100, 200, 500$; $\alpha_i = z_i = v_i = 0$ for all $i$; and $\beta = 0.00, 0.05, 0.10, 0.15, 0.20$. For $\{a_{it}\}$ and $\{b_{it}\}$, standard normal numbers were used to generate $(a_{it}, b_{it})' \sim \text{iid } N\left(0, \begin{bmatrix} N^{-2\delta} & 0 \\ 0 & N^{-2\gamma} \end{bmatrix}\right)$.

We considered three cases: $(\delta, \gamma) = (0.9, 0.3)$, $(0.0, -0.6)$ and $(0.0, 0.0)$.

- The case $(\delta, \gamma) = (0.9, 0.3)$ corresponds to weak time series variation in both the regressor and the regression errors.
- The case $(0.0, -0.6)$ corresponds to moderate time series variation in the regressor and strong variation in the errors.
- These two cases generate spurious panel regression results, while $(\delta, \gamma) = (0.0, 0.0)$ leads to a standard panel regression.

The simulation results of this section are based on 50,000 iterations.

In Figure 1, rejection frequencies of the t-tests for the null hypothesis $H_0: \beta = 0$ are plotted. The left-side and middle panels show the rejection frequencies for the spurious panel regression cases $(\delta, \gamma) = (0.9, 0.3)$ and $(0.0, -0.6)$. As predicted by the asymptotic theory of Sections 2 and 3, the empirical rejection frequencies are close to 0.05 for all the values of $\beta$, $T$ and $N$. This implies that the t-tests lack asymptotic power. The right-side panels show the rejection frequencies for the standard panel regression case, $(\delta, \gamma) = (0.0, 0.0)$. Under the null hypothesis $\beta = 0$, the rejection frequency is close to 0.05. It increases with $\beta$ and $N$, as is well expected from standard panel regression theory.

Figure 2 plots $R^2$s from the first-differenced and Within-OLS regressions. When $(\delta, \gamma) = (0.9, 0.3)$ or $(0.0, -0.6)$, $R^2$ converges to zero as $N$ increases for all the values of $\beta$. This is consistent with the asymptotic theory of Sections 2 and 3, but contrary to the common perception that $R^2$ should increase as signals from the regressors become stronger. By contrast, when $(\delta, \gamma) = (0.0, 0.0)$, $R^2$ steadily increases with the value of $\beta$, as predicted by standard panel regression theory. Figure 2 illustrates that $R^2$s from the spurious panel regressions take values close to zero.
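The robust variance estimator of Section 3.2 can be sketched in the simulation design just described: $\alpha_i = z_i = v_i = 0$, with normal $a_{it}$ and $b_{it}$ of variances $N^{-2\delta}$ and $N^{-2\gamma}$. This standard-library Python sketch is not the code behind the paper's simulations. It uses far fewer replications than 50,000, and its function names are hypothetical. It compares, for $T = 6$, the usual t-ratio with the MA(1)-robust one.

From the formulas above, the usual variance estimate is asymptotically about $4(T-1)/(6T-8) = 5/7$ of the robust one. Under $H_0$ with $(\delta, \gamma) = (0, 0)$, the usual t-test should therefore over-reject, while the robust t-test should stay near 5%. Under the spurious design, the robust test should also reject near 5% even when $\beta = 0.2$.

```python
import math
import random
import statistics


def fd_t_ratios(n, t_periods, beta, delta, gamma, rng):
    """Naive and MA(1)-robust t-ratios for H0: beta = 0 in the FD regression.

    Mirrors the Section 4 design: alpha_i = z_i = v_i = 0 and
    (a_it, b_it)' ~ iid N(0, diag(n**(-2*delta), n**(-2*gamma))).
    Returns (naive t, robust t, naive variance / robust variance).
    """
    sd_a, sd_b = n ** (-delta), n ** (-gamma)
    dxs, dys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, sd_a) for _ in range(t_periods)]
        y = [beta * xt + rng.gauss(0.0, sd_b) for xt in x]
        dxs.append([x[t] - x[t - 1] for t in range(1, t_periods)])
        dys.append([y[t] - y[t - 1] for t in range(1, t_periods)])
    sxx = sum(dx * dx for row in dxs for dx in row)
    sxy = sum(dx * dy for rx, ry in zip(dxs, dys) for dx, dy in zip(rx, ry))
    b_hat = sxy / sxx
    meat = 0.0   # sum of (dx * du_hat)^2 plus twice the lag-one cross products
    ssr = 0.0    # sum of squared FD residuals, used by the naive variance
    for rx, ry in zip(dxs, dys):
        resid = [dy - b_hat * dx for dx, dy in zip(rx, ry)]
        s = [dx * e for dx, e in zip(rx, resid)]
        meat += sum(v * v for v in s) + 2.0 * sum(s[t] * s[t + 1] for t in range(len(s) - 1))
        ssr += sum(e * e for e in resid)
    robust_var = meat / sxx ** 2
    naive_var = ssr / (n * (t_periods - 1)) / sxx
    return b_hat / math.sqrt(naive_var), b_hat / math.sqrt(robust_var), naive_var / robust_var


def monte_carlo(n, t_periods, beta, delta, gamma, reps=250, seed=4):
    """Rejection frequencies at the nominal 5% level and the mean variance ratio."""
    rng = random.Random(seed)
    draws = [fd_t_ratios(n, t_periods, beta, delta, gamma, rng) for _ in range(reps)]
    naive_rej = sum(abs(t) > 1.96 for t, _, _ in draws) / reps
    robust_rej = sum(abs(t) > 1.96 for _, t, _ in draws) / reps
    return naive_rej, robust_rej, statistics.fmean(r for _, _, r in draws)


if __name__ == "__main__":
    for delta, gamma, beta in [(0.0, 0.0, 0.0), (0.0, 0.0, 0.2), (0.9, 0.3, 0.2), (0.0, -0.6, 0.2)]:
        naive, robust, ratio = monte_carlo(200, 6, beta, delta, gamma)
        print(f"(delta, gamma) = ({delta}, {gamma}), beta = {beta}: naive {naive:.3f}, "
              f"robust {robust:.3f}, naive/robust variance {ratio:.2f}")
```

The factor 2 on the lag-one cross products mirrors the robust variance formula in Section 3.2. Dropping those terms would reduce the estimator to a heteroskedasticity-only correction that ignores the MA(1) structure.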
5 Summary and further remarks

This paper sets up a panel regression model with varying degrees of time series variation in the regressor and errors. It analyzes asymptotic properties of the estimators, t-tests and model selection criteria from the first-differenced and Within-OLS estimations. Weak time series variation in the regressor and/or strong time series variation in the regression errors are considered. These cases are shown to bring spurious regression results for the first-differenced and Within-OLS estimators and the related t-tests and $R^2$s. In addition, information criteria such as AIC and BIC are shown to diverge to minus infinity in probability when the regression errors have weak time series variation.

The results of this paper suggest that it is inappropriate to use fixed effects panel regressions in some cases. But it is also admitted that the theory of this paper does not provide any guideline for judging the appropriateness of fixed effects panel regressions, and that further research is required to develop and implement such a guideline. As a temporary remedy, however, the following may be taken as evidence of spuriousness:

- a fixed effects regression that yields a large (in modulus) value of the estimator in use and a moderate value of the t-ratio, along with small values of $R^2$, AIC and BIC;
- markedly different results from the pooled-OLS and fixed effects regressions.

In a nutshell, this paper warns that empirical practice should avoid jumping to the use of fixed effects regressions without considering the degree of time series variation in the data.

This paper considers the case of a single regressor only. It is expected that similar results will hold in the case of multiple regressors, but more work needs to be done for this case.

References

Aldrich, J. (1995). Correlations Genuine and Spurious in Pearson and Yule, Statistical Science, Vol. 10, pp. 364-376.

Granger, C.W.J., and P. Newbold (1974). Spurious Regressions in Econometrics, Journal of Econometrics, Vol. 2, pp. 111-120.

Hendry, D.F. (1986). Econometric Modelling with Cointegrated Variables: An Overview, Oxford Bulletin of Economics and Statistics, Vol. 48, pp. 201-212.

Yule, G.U. (1926). Why Do We Sometimes Get Nonsense-Correlations between Time-Series? A Study in Sampling and the Nature of Time-Series, Journal of the Royal Statistical Society, Vol. 89, pp. 1-63.