Asymptotic Least Squares Theory


Asymptotic Least Squares Theory
CHUNG-MING KUAN
Department of Finance & CRETA
December 5, 2011
C.-M. Kuan (National Taiwan Univ.), Asymptotic Least Squares Theory, December 5, 2011, slide 1 / 85

Lecture Outline
1 When Regressors are Stochastic
2 Asymptotic Properties of the OLS Estimators
  Consistency
  Asymptotic Normality
3 Consistent Estimation of Covariance Matrix
4 Large-Sample Tests
  Wald Test
  Lagrange Multiplier Test
  Likelihood Ratio Test
  Power of Tests
  Example: Analysis of Suicide Rate
5 Digression: Instrumental Variable Estimator

Lecture Outline (cont'd)
6 I(1) Variables
7 Autoregression of an I(1) Variable
  Asymptotic Properties of the OLS Estimator
  Tests of Unit Root
8 Tests of Stationarity against I(1)
9 Regressions of I(1) Variables
  Spurious Regressions
  Cointegration
  Error Correction Model

When Regressors are Stochastic

Given $y = X\beta + e$, suppose that $X$ is stochastic. Then [A2](i) does not hold, because $\mathrm{IE}(y)$ cannot be $X\beta_o$. It would be difficult to evaluate $\mathrm{IE}(\hat\beta_T)$ and $\mathrm{var}(\hat\beta_T)$ because $\hat\beta_T$ is a complex function of the elements of $y$ and $X$.

Assume $\mathrm{IE}(y \mid X) = X\beta_o$. Then
$$\mathrm{IE}(\hat\beta_T) = \mathrm{IE}\big[(X'X)^{-1}X'\,\mathrm{IE}(y \mid X)\big] = \beta_o.$$
If $\mathrm{var}(y \mid X) = \sigma_o^2 I_T$,
$$\mathrm{var}(\hat\beta_T) = \mathrm{IE}\big[(X'X)^{-1}X'\,\mathrm{var}(y \mid X)\,X(X'X)^{-1}\big] = \sigma_o^2\,\mathrm{IE}(X'X)^{-1},$$
which is not the same as $\sigma_o^2(X'X)^{-1}$. Moreover, $(X'X)^{-1}X'y$ is not normally distributed even when $y$ is.

Q: Is the condition $\mathrm{IE}(y \mid X) = X\beta_o$ realistic?

Suppose that $x_t$ contains only one regressor, $y_{t-1}$. Then $\mathrm{IE}(y_t \mid x_1, \ldots, x_T) = x_t'\beta_o$ implies $\mathrm{IE}(y_t \mid y_1, \ldots, y_{T-1}) = \beta_o y_{t-1}$, which equals $y_t$ with probability one. As such, the conditional variance of $y_t$,
$$\mathrm{var}(y_t \mid y_1, \ldots, y_{T-1}) = \mathrm{IE}\big\{[y_t - \mathrm{IE}(y_t \mid y_1, \ldots, y_{T-1})]^2 \mid y_1, \ldots, y_{T-1}\big\},$$
must be zero, rather than a positive constant $\sigma_o^2$.

Note: When $X$ is stochastic, a different framework is needed to evaluate the properties of the OLS estimator.

Notations

We observe $(y_t\ w_t')'$, where $w_t$ ($m \times 1$) is the vector of all exogenous variables. Let $W^t = \{w_1, \ldots, w_t\}$ and $Y^t = \{y_1, \ldots, y_t\}$. Then $\{Y^{t-1}, W^t\}$ generates a $\sigma$-algebra that is the information set up to time $t$.

Regressors $x_t$ ($k \times 1$) are taken from the information set $\{Y^{t-1}, W^t\}$, and the resulting linear specification is
$$y_t = x_t'\beta + e_t, \quad t = 1, 2, \ldots, T.$$
The OLS estimator of this specification is
$$\hat\beta_T = (X'X)^{-1}X'y = \Big(\sum_{t=1}^T x_t x_t'\Big)^{-1}\Big(\sum_{t=1}^T x_t y_t\Big).$$

Consistency

The OLS estimator $\hat\beta_T$ is strongly (weakly) consistent for $\beta^*$ if $\hat\beta_T \to \beta^*$ a.s. (in probability) as $T \to \infty$. That is, $\hat\beta_T$ will eventually be close to $\beta^*$ in a proper probabilistic sense when enough information becomes available.

[B1] (i) $\{x_t x_t'\}$ obeys a SLLN (WLLN) with the almost sure (prob.) limit
$$M_{xx} := \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^T \mathrm{IE}(x_t x_t'),$$
which is nonsingular.
(ii) $\{x_t y_t\}$ obeys a SLLN (WLLN) with the almost sure (prob.) limit
$$m_{xy} := \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^T \mathrm{IE}(x_t y_t).$$

[B2] There exists a $\beta_o$ such that $y_t = x_t'\beta_o + \epsilon_t$ with $\mathrm{IE}(x_t\epsilon_t) = 0$ for all $t$.

By [B1] and Lemma 5.13, the OLS estimator $\hat\beta_T$ satisfies
$$\Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1}\Big(\frac{1}{T}\sum_{t=1}^T x_t y_t\Big) \to M_{xx}^{-1} m_{xy} \quad \text{a.s. (in probability)}.$$
When [B2] holds, $\mathrm{IE}(x_t y_t) = \mathrm{IE}(x_t x_t')\beta_o$, so that $m_{xy} = M_{xx}\beta_o$ and $\beta^* = \beta_o$.

Theorem 6.1
Consider the linear specification $y_t = x_t'\beta + e_t$.
(i) When [B1] holds, $\hat\beta_T$ is strongly (weakly) consistent for $\beta^* = M_{xx}^{-1} m_{xy}$.
(ii) When [B1] and [B2] hold, $\beta_o = M_{xx}^{-1} m_{xy}$, so that $\hat\beta_T$ is strongly (weakly) consistent for $\beta_o$.

Remarks:
- Theorem 6.1 is about consistency (not unbiasedness); what really matters is whether the data are governed by some SLLN (WLLN).
- Note that [B1] explicitly allows $x_t$ to be a random vector which may contain lagged dependent variables ($y_{t-j}$, $j \geq 1$) and other random variables in the information set. Also, the random data may exhibit dependence and heterogeneity, as long as such dependence and heterogeneity do not affect the LLN in [B1].
- Given [B2], $x_t'\beta$ is the correct specification for the linear projection of $y_t$, and the OLS estimator converges to the parameter of interest $\beta_o$.
- A sufficient condition for [B2] is that there exists $\beta_o$ such that $\mathrm{IE}(y_t \mid Y^{t-1}, W^t) = x_t'\beta_o$. (Why?)

Corollary 6.2
Suppose that the $(y_t\ x_t')'$ are independent random vectors with bounded $(2+\delta)$th moment for any $\delta > 0$, such that $M_{xx}$ and $m_{xy}$ defined in [B1] exist. Then, the OLS estimator $\hat\beta_T$ is strongly consistent for $\beta^* = M_{xx}^{-1} m_{xy}$. If [B2] also holds, $\hat\beta_T$ is strongly consistent for $\beta_o$.

Proof: By the Cauchy-Schwarz inequality (Lemma 5.5), the $i$th element of $x_t y_t$ satisfies
$$\mathrm{IE}|x_{ti} y_t|^{1+\delta} \leq \big[\mathrm{IE}|x_{ti}|^{2(1+\delta)}\big]^{1/2}\big[\mathrm{IE}|y_t|^{2(1+\delta)}\big]^{1/2} < \infty,$$
for some $\delta > 0$. Similarly, each element of $x_t x_t'$ also has bounded $(1+\delta)$th moment. Then $\{x_t x_t'\}$ and $\{x_t y_t\}$ obey Markov's SLLN by Lemma 5.26, with the respective almost sure limits $M_{xx}$ and $m_{xy}$.

Example: Given the specification $y_t = \alpha y_{t-1} + e_t$, suppose that $\{y_t^2\}$ and $\{y_t y_{t-1}\}$ obey a SLLN (WLLN). Then the OLS estimator of $\alpha$ satisfies
$$\hat\alpha_T \to \frac{\lim_{T\to\infty} \frac{1}{T}\sum_{t=2}^T \mathrm{IE}(y_t y_{t-1})}{\lim_{T\to\infty} \frac{1}{T}\sum_{t=2}^T \mathrm{IE}(y_{t-1}^2)} \quad \text{a.s. (in probability)}.$$
When $\{y_t\}$ indeed follows a stationary AR(1) process: $y_t = \alpha_o y_{t-1} + u_t$, $|\alpha_o| < 1$, where the $u_t$ are i.i.d. with mean zero and variance $\sigma_u^2$, we have $\mathrm{IE}(y_t) = 0$, $\mathrm{var}(y_t) = \sigma_u^2/(1-\alpha_o^2)$, and $\mathrm{cov}(y_t, y_{t-1}) = \alpha_o\,\mathrm{var}(y_t)$. Hence,
$$\hat\alpha_T \to \frac{\mathrm{cov}(y_t, y_{t-1})}{\mathrm{var}(y_t)} = \alpha_o \quad \text{a.s. (in probability)}.$$
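The AR(1) consistency result above is easy to see by simulation. A minimal sketch (the sample size, seed, and parameter values here are our own choices, not from the text): generate a stationary AR(1) with $\alpha_o = 0.5$ and compute the OLS estimate without intercept.

```python
import numpy as np

# Simulate a stationary AR(1): y_t = alpha_o * y_{t-1} + u_t, u_t i.i.d. N(0,1).
rng = np.random.default_rng(0)
alpha_o, T = 0.5, 200_000

u = rng.normal(size=T)
y = np.empty(T)
y[0] = u[0]
for t in range(1, T):
    y[t] = alpha_o * y[t - 1] + u[t]

# OLS without intercept: alpha_hat = sum(y_t y_{t-1}) / sum(y_{t-1}^2).
alpha_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(alpha_hat)  # close to alpha_o = 0.5 for large T
```

For large $T$, `alpha_hat` is within sampling error of $\alpha_o$, as Theorem 6.1 predicts.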

When $x_t'\beta_o$ is not the linear projection, i.e., $\mathrm{IE}(x_t\epsilon_t) \neq 0$,
$$\mathrm{IE}(x_t y_t) = \mathrm{IE}(x_t x_t')\beta_o + \mathrm{IE}(x_t\epsilon_t).$$
Then $m_{xy} = M_{xx}\beta_o + m_{x\epsilon}$, where
$$m_{x\epsilon} = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^T \mathrm{IE}(x_t\epsilon_t).$$
The limit of the OLS estimator now reads
$$\beta^* = M_{xx}^{-1} m_{xy} = \beta_o + M_{xx}^{-1} m_{x\epsilon}.$$

Example: Given the specification $y_t = x_t'\beta + e_t$, suppose $\mathrm{IE}(y_t \mid Y^{t-1}, W^t) = x_t'\beta_o + z_t'\gamma_o$, where the $z_t$ are in the information set but distinct from $x_t$. Writing
$$y_t = x_t'\beta_o + z_t'\gamma_o + \epsilon_t = x_t'\beta_o + u_t,$$
we have $\mathrm{IE}(x_t u_t) = \mathrm{IE}(x_t z_t')\gamma_o \neq 0$. It follows that
$$\hat\beta_T \to M_{xx}^{-1} m_{xy} = \beta_o + M_{xx}^{-1} M_{xz}\gamma_o,$$
with $M_{xz} := \lim_{T\to\infty} \sum_{t=1}^T \mathrm{IE}(x_t z_t')/T$. The limit cannot be $\beta_o$ unless $x_t$ is orthogonal to $z_t$, i.e., $\mathrm{IE}(x_t z_t') = 0$.

Example: Given $y_t = \alpha y_{t-1} + e_t$, suppose that $y_t = \alpha_o y_{t-1} + \epsilon_t$, $|\alpha_o| < 1$, where $\epsilon_t = u_t - \pi_o u_{t-1}$ with $|\pi_o| < 1$, and $\{u_t\}$ is a white noise with mean zero and variance $\sigma_u^2$. Here, $\{y_t\}$ is a weakly stationary ARMA(1,1) process. We know $\hat\alpha_T$ converges to $\mathrm{cov}(y_t, y_{t-1})/\mathrm{var}(y_{t-1})$ almost surely (in probability). Note, however, that $\epsilon_{t-1} = u_{t-1} - \pi_o u_{t-2}$ and
$$\mathrm{IE}(y_{t-1}\epsilon_t) = \mathrm{IE}[y_{t-1}(u_t - \pi_o u_{t-1})] = -\pi_o\sigma_u^2.$$
The limit of $\hat\alpha_T$ is then
$$\frac{\mathrm{cov}(y_t, y_{t-1})}{\mathrm{var}(y_{t-1})} = \frac{\alpha_o\,\mathrm{var}(y_{t-1}) + \mathrm{cov}(\epsilon_t, y_{t-1})}{\mathrm{var}(y_{t-1})} = \alpha_o - \frac{\pi_o\sigma_u^2}{\mathrm{var}(y_{t-1})}.$$
The OLS estimator is inconsistent for $\alpha_o$ unless $\pi_o = 0$.
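This inconsistency can also be checked numerically. A hedged sketch (parameter values $\alpha_o = 0.5$, $\pi_o = 0.3$, the seed, and the sample size are ours): for an ARMA(1,1) with $\sigma_u^2 = 1$, $\mathrm{var}(y_t) = (1 + \pi_o^2 - 2\alpha_o\pi_o)/(1 - \alpha_o^2)$, so the probability limit of $\hat\alpha_T$ can be computed and compared with the simulation.

```python
import numpy as np

# ARMA(1,1): y_t = a_o * y_{t-1} + u_t - p_o * u_{t-1}, u_t i.i.d. N(0,1).
rng = np.random.default_rng(1)
a_o, p_o, T = 0.5, 0.3, 400_000

u = rng.normal(size=T + 1)
y = np.empty(T)
y[0] = u[1] - p_o * u[0]
for t in range(1, T):
    y[t] = a_o * y[t - 1] + u[t + 1] - p_o * u[t]

a_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])   # OLS on y_t = a * y_{t-1} + e_t

# Theoretical plim: a_o - p_o * sigma_u^2 / var(y), with sigma_u = 1 and
# var(y) = (1 + p_o^2 - 2 a_o p_o) / (1 - a_o^2) for this ARMA(1,1).
var_y = (1 + p_o**2 - 2 * a_o * p_o) / (1 - a_o**2)
plim = a_o - p_o / var_y
print(a_hat, plim)  # a_hat is near plim, well away from a_o = 0.5
```

The simulated `a_hat` settles near `plim` (about 0.215 here), not $\alpha_o = 0.5$, illustrating the bias term $-\pi_o\sigma_u^2/\mathrm{var}(y_{t-1})$.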

Remark: Given the specification $y_t = \alpha y_{t-1} + x_t'\beta + e_t$, suppose that $y_t = \alpha_o y_{t-1} + x_t'\beta_o + \epsilon_t$, where the $\epsilon_t$ are serially correlated (e.g., AR(1) or MA(1)). The OLS estimator is inconsistent because $\alpha_o y_{t-1} + x_t'\beta_o$ is not the linear projection, a consequence of the joint presence of a lagged dependent variable (e.g., $y_{t-1}$) and serially correlated disturbances.

Asymptotic Normality

By asymptotic normality of $\hat\beta_T$ we mean:
$$\sqrt{T}(\hat\beta_T - \beta_o) \xrightarrow{D} N(0, D_o),$$
where $D_o$ is a p.d. matrix. We may also write
$$D_o^{-1/2}\sqrt{T}(\hat\beta_T - \beta_o) \xrightarrow{D} N(0, I_k).$$
Given the specification $y_t = x_t'\beta + e_t$ and [B2], define
$$V_T := \mathrm{var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T x_t\epsilon_t\Big).$$

[B3] $\{V_o^{-1/2}x_t\epsilon_t\}$ obeys a CLT, where $V_o = \lim_{T\to\infty} V_T$ is p.d.

The normalized OLS estimator is
$$\sqrt{T}(\hat\beta_T - \beta_o) = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T x_t\epsilon_t\Big) = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1} V_o^{1/2}\Big[V_o^{-1/2}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T x_t\epsilon_t\Big)\Big] \xrightarrow{D} M_{xx}^{-1}V_o^{1/2}\,N(0, I_k).$$

Theorem 6.6
Given $y_t = x_t'\beta + e_t$, suppose that [B1](i), [B2] and [B3] hold. Then,
$$\sqrt{T}(\hat\beta_T - \beta_o) \xrightarrow{D} N(0, D_o), \qquad D_o = M_{xx}^{-1} V_o M_{xx}^{-1}.$$

Corollary 6.7
Given $y_t = x_t'\beta + e_t$, suppose that the $(y_t\ x_t')'$ are independent random vectors with bounded $(4+\delta)$th moment for any $\delta > 0$ and that [B2] holds. If $M_{xx}$ defined in [B1] and $V_o$ defined in [B3] exist, then
$$\sqrt{T}(\hat\beta_T - \beta_o) \xrightarrow{D} N(0, D_o), \qquad D_o = M_{xx}^{-1} V_o M_{xx}^{-1}.$$

Proof: Let $z_t = \lambda'x_t\epsilon_t$, where $\lambda$ is such that $\lambda'\lambda = 1$. If $\{z_t\}$ obeys a CLT, then $\{x_t\epsilon_t\}$ obeys a multivariate CLT by the Cramér-Wold device. Clearly, the $z_t$ are independent r.v. with mean zero and $\mathrm{var}(z_t) = \lambda'[\mathrm{var}(x_t\epsilon_t)]\lambda$. By data independence,
$$V_T = \mathrm{var}\Big(\frac{1}{\sqrt{T}}\sum_{t=1}^T x_t\epsilon_t\Big) = \frac{1}{T}\sum_{t=1}^T \mathrm{var}(x_t\epsilon_t).$$

Proof (cont'd): The average of $\mathrm{var}(z_t)$ is then
$$\frac{1}{T}\sum_{t=1}^T \mathrm{var}(z_t) = \lambda' V_T \lambda \to \lambda' V_o \lambda.$$
By the Cauchy-Schwarz inequality,
$$\mathrm{IE}|x_{ti} y_t|^{2+\delta} \leq \big[\mathrm{IE}|x_{ti}|^{2(2+\delta)}\big]^{1/2}\big[\mathrm{IE}|y_t|^{2(2+\delta)}\big]^{1/2} < \infty,$$
for some $\delta > 0$. Similarly, the $x_{ti}x_{tj}$ have bounded $(2+\delta)$th moment. It follows that $x_{ti}\epsilon_t$ and $z_t$ also have bounded $(2+\delta)$th moment by Minkowski's inequality. Then by Liapunov's CLT,
$$\frac{1}{\sqrt{T(\lambda' V_o \lambda)}}\sum_{t=1}^T z_t \xrightarrow{D} N(0, 1).$$

Example: Consider $y_t = \alpha y_{t-1} + e_t$.

Case 1: $y_t = \alpha_o y_{t-1} + u_t$ with $|\alpha_o| < 1$, where the $u_t$ are i.i.d. with mean zero and variance $\sigma_u^2$. Note that
$$\mathrm{var}(y_{t-1}u_t) = \mathrm{IE}(y_{t-1}^2)\,\mathrm{IE}(u_t^2) = \sigma_u^4/(1-\alpha_o^2),$$
and $\mathrm{cov}(y_{t-1}u_t,\, y_{t-1-j}u_{t-j}) = 0$ for all $j > 0$. A CLT ensures:
$$\frac{\sqrt{1-\alpha_o^2}}{\sigma_u^2}\,\frac{1}{\sqrt{T}}\sum_{t=2}^T y_{t-1}u_t \xrightarrow{D} N(0, 1).$$
As $\sum_{t=2}^T y_{t-1}^2/T$ converges to $\sigma_u^2/(1-\alpha_o^2)$, we have
$$\frac{\sqrt{1-\alpha_o^2}}{\sigma_u^2}\,\frac{\sigma_u^2}{1-\alpha_o^2}\,\sqrt{T}(\hat\alpha_T - \alpha_o) = \frac{1}{\sqrt{1-\alpha_o^2}}\sqrt{T}(\hat\alpha_T - \alpha_o) \xrightarrow{D} N(0, 1),$$
or equivalently, $\sqrt{T}(\hat\alpha_T - \alpha_o) \xrightarrow{D} N(0, 1-\alpha_o^2)$.
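A Monte Carlo sketch of this limiting distribution (replication count, sample size, and $\alpha_o = 0.5$ are our assumptions): across many simulated AR(1) samples, the variance of $\sqrt{T}(\hat\alpha_T - \alpha_o)$ should approach $1 - \alpha_o^2 = 0.75$.

```python
import numpy as np

# Monte Carlo for sqrt(T)(alpha_hat - alpha_o) -> N(0, 1 - alpha_o^2).
rng = np.random.default_rng(2)
alpha_o, T, reps = 0.5, 1_000, 1_500

u = rng.normal(size=(reps, T))
y = np.empty((reps, T))
y[:, 0] = u[:, 0]
for t in range(1, T):                       # simulate all replications at once
    y[:, t] = alpha_o * y[:, t - 1] + u[:, t]

num = (y[:, 1:] * y[:, :-1]).sum(axis=1)    # sum y_t y_{t-1}, per replication
den = (y[:, :-1] ** 2).sum(axis=1)          # sum y_{t-1}^2, per replication
stats = np.sqrt(T) * (num / den - alpha_o)  # sqrt(T)(alpha_hat - alpha_o)
print(stats.var())  # near 1 - alpha_o^2 = 0.75
```

The sample variance of the normalized statistics matches $1 - \alpha_o^2$ up to Monte Carlo error.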

Example (cont'd): When $\{y_t\}$ is a random walk: $y_t = y_{t-1} + u_t$. We already know that $\mathrm{var}\big(T^{-1/2}\sum_{t=2}^T y_{t-1}u_t\big)$ diverges with $T$, and hence $\{y_{t-1}u_t\}$ does not obey a CLT. Thus, there is no guarantee that the normalized $\hat\alpha_T$ is asymptotically normally distributed.

Theorem 6.9
Given $y_t = x_t'\beta + e_t$, suppose that [B1](i), [B2] and [B3] hold. Then,
$$\hat D_T^{-1/2}\sqrt{T}(\hat\beta_T - \beta_o) \xrightarrow{D} N(0, I_k),$$
where $\hat D_T = \big(\sum_{t=1}^T x_t x_t'/T\big)^{-1}\hat V_T\big(\sum_{t=1}^T x_t x_t'/T\big)^{-1}$, with $\hat V_T \xrightarrow{IP} V_o$.

Remarks:
1 Theorem 6.6 may hold for weakly dependent and heterogeneously distributed data, as long as these data obey proper LLN and CLT.
2 Normalizing the OLS estimator with an inconsistent estimator of $D_o^{1/2}$ destroys asymptotic normality.

Consistent Estimation of Covariance Matrix

Consistent estimation of $D_o$ amounts to consistent estimation of $V_o$. Write
$$V_o = \lim_{T\to\infty} V_T = \lim_{T\to\infty}\sum_{j=-T+1}^{T-1}\Gamma_T(j),$$
with
$$\Gamma_T(j) = \begin{cases} \dfrac{1}{T}\sum_{t=j+1}^T \mathrm{IE}(x_t\epsilon_t\epsilon_{t-j}x_{t-j}'), & j = 0, 1, 2, \ldots, \\[1ex] \dfrac{1}{T}\sum_{t=-j+1}^T \mathrm{IE}(x_{t+j}\epsilon_{t+j}\epsilon_t x_t'), & j = -1, -2, \ldots. \end{cases}$$
When $\{x_t\epsilon_t\}$ is weakly stationary, $\mathrm{IE}(x_t\epsilon_t\epsilon_{t-j}x_{t-j}')$ depends only on the time difference $j$ but not on $t$. Thus,
$$\Gamma_T(j) = \Gamma_T(-j)' = \mathrm{IE}(x_t\epsilon_t\epsilon_{t-j}x_{t-j}'), \quad j = 0, 1, 2, \ldots,$$
and
$$V_o = \Gamma(0) + \lim_{T\to\infty}\sum_{j=1}^{T-1}\big[\Gamma(j) + \Gamma(j)'\big].$$

Eicker-White Estimator

Case 1: When $\{x_t\epsilon_t\}$ has no serial correlation,
$$V_o = \lim_{T\to\infty}\Gamma_T(0) = \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^T \mathrm{IE}(\epsilon_t^2 x_t x_t').$$
A heteroskedasticity-consistent estimator of $V_o$ is
$$\hat V_T = \frac{1}{T}\sum_{t=1}^T \hat e_t^2 x_t x_t',$$
which permits conditional heteroskedasticity of unknown form. The Eicker-White estimator of $D_o$ is:
$$\hat D_T = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1}\Big(\frac{1}{T}\sum_{t=1}^T \hat e_t^2 x_t x_t'\Big)\Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1}.$$
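A minimal numpy sketch of the Eicker-White sandwich above (the function name, simulated design, and parameter values are ours): with conditionally heteroskedastic errors, the estimated $\hat D_T$ should approach $D_o = M_{xx}^{-1}V_oM_{xx}^{-1}$.

```python
import numpy as np

def eicker_white(X, y):
    """OLS with the Eicker-White estimator D_hat of D_o (sketch)."""
    T = X.shape[0]
    Minv = np.linalg.inv(X.T @ X / T)          # (sum x_t x_t'/T)^{-1}
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    V_hat = (X * (e**2)[:, None]).T @ X / T    # sum e_t^2 x_t x_t' / T
    return beta_hat, Minv @ V_hat @ Minv       # asy. var of beta_hat is D_hat/T

rng = np.random.default_rng(3)
T = 50_000
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
eps = rng.normal(size=T) * np.sqrt(0.5 + x**2)  # heteroskedastic: var = 0.5 + x^2
y = 1.0 + 2.0 * x + eps

beta_hat, D_hat = eicker_white(X, y)
print(beta_hat)
print(D_hat)  # here D_o = V_o = diag(1.5, 3.5) since M_xx = I_2
```

In this design $M_{xx} = I_2$ and $V_o = \mathrm{diag}(1.5,\, 3.5)$, so `D_hat` should be close to that diagonal matrix.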

The Eicker-White estimator is robust when heteroskedasticity is present and of an unknown form. If the $\epsilon_t$ are also conditionally homoskedastic, $\mathrm{IE}(\epsilon_t^2 \mid Y^{t-1}, W^t) = \sigma_o^2$, then
$$V_o = \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^T \mathrm{IE}\big[\mathrm{IE}(\epsilon_t^2 \mid Y^{t-1}, W^t)\,x_t x_t'\big] = \sigma_o^2 M_{xx}.$$
Then $D_o = M_{xx}^{-1}V_oM_{xx}^{-1} = \sigma_o^2 M_{xx}^{-1}$, and it can be consistently estimated by
$$\hat D_T = \hat\sigma_T^2\Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1},$$
as in the classical model.

Newey-West Estimator

Case 2: When $\{x_t\epsilon_t\}$ exhibits serial correlations such that
$$V_T = \sum_{j=-l(T)}^{l(T)}\Gamma_T(j) \to V_o,$$
where $l(T)$ diverges with $T$, we may try to estimate $V_T$.

A difficulty: The sample counterpart $\sum_{j=-l(T)}^{l(T)}\hat\Gamma_T(j)$, which is based on the sample counterpart of $\Gamma_T(j)$, may not be p.s.d.

A heteroskedasticity and autocorrelation-consistent (HAC) estimator that is guaranteed to be p.s.d. has the following form:
$$\hat V_T^\kappa = \sum_{j=-T+1}^{T-1}\kappa\Big(\frac{j}{l(T)}\Big)\hat\Gamma_T(j), \qquad (1)$$
where $\kappa$ is a kernel function and $l(T)$ is its bandwidth.
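A hedged sketch of this kernel HAC construction using the Bartlett kernel (function name, weight convention $1 - j/(l+1)$, bandwidth choice, and the AR(1) test series are our assumptions): for a scalar AR(1) series with coefficient 0.5 and unit innovation variance, the long-run variance is $1/(1-0.5)^2 = 4$.

```python
import numpy as np

def bartlett_hac(scores, l):
    """Bartlett-kernel HAC estimate of V_o from a T x k array of x_t * e_t.

    Uses weights 1 - j/(l+1), which keep the estimate p.s.d. (sketch)."""
    T, _ = scores.shape
    V = scores.T @ scores / T                  # Gamma_hat(0)
    for j in range(1, l + 1):
        w = 1.0 - j / (l + 1)                  # Bartlett weight
        G = scores[j:].T @ scores[:-j] / T     # Gamma_hat(j)
        V += w * (G + G.T)                     # add Gamma_hat(j) + Gamma_hat(j)'
    return V

rng = np.random.default_rng(4)
T = 200_000
v = rng.normal(size=T)
z = np.empty(T)
z[0] = v[0]
for t in range(1, T):                          # AR(1) scores with rho = 0.5
    z[t] = 0.5 * z[t - 1] + v[t]

V_hat = bartlett_hac(z[:, None], l=int(T ** (1 / 3)))
print(V_hat[0, 0])  # approaches the long-run variance 1/(1-0.5)^2 = 4
```

With a bandwidth growing like $T^{1/3}$, the estimate converges to the long-run variance despite the serial correlation in the scores.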

The estimator of $D_o$ due to Newey and West (1987),
$$\hat D_T^\kappa = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1}\hat V_T^\kappa\Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1},$$
is robust to both conditional heteroskedasticity of $\epsilon_t$ and serial correlations of $x_t\epsilon_t$. The Eicker-White and Newey-West estimators do not rely on any parametric model of conditional heteroskedasticity and serial correlations.

The kernel $\kappa$ satisfies: $|\kappa(x)| \leq 1$, $\kappa(0) = 1$, $\kappa(x) = \kappa(-x)$ for all $x \in \mathbb{R}$, $\int |\kappa(x)|\,dx < \infty$, $\kappa$ is continuous at 0 and at all but a finite number of other points in $\mathbb{R}$, and $\int \kappa(x)e^{-ix\omega}\,dx \geq 0$ for all $\omega \in \mathbb{R}$.

Some Commonly Used Kernel Functions
1 Bartlett kernel (Newey and West, 1987): $\kappa(x) = 1 - |x|$ for $|x| \leq 1$, and $\kappa(x) = 0$ otherwise.
2 Parzen kernel (Gallant, 1987):
$$\kappa(x) = \begin{cases} 1 - 6x^2 + 6|x|^3, & |x| \leq 1/2, \\ 2(1-|x|)^3, & 1/2 \leq |x| \leq 1, \\ 0, & \text{otherwise;} \end{cases}$$
3 Quadratic spectral kernel (Andrews, 1991):
$$\kappa(x) = \frac{25}{12\pi^2 x^2}\Big(\frac{\sin(6\pi x/5)}{6\pi x/5} - \cos(6\pi x/5)\Big);$$
4 Daniell kernel (Ng and Perron, 1996):
$$\kappa(x) = \frac{\sin(\pi x)}{\pi x}.$$

Figure: The Bartlett, Parzen, quadratic spectral and Daniell kernels.

Remarks: Bandwidth l(t ): It can be of order o(t 1/2 ), Andrews (1991). (What does this imply?) The Bartlett and Parzen kernels have the bounded support [ 1, 1], but the quadratic spectral and Daniel kernels have unbounded support. Andrews (1991): The quadratic spectral kernel is to be preferred in HAC estimation. Rate of convergence: O(T 1/3 ) for the Bartlett kernel, and O(T 2/5 ) for the Parzen and quadratic spectral. The quadratic spectral kernel is more efficient asymptotically than the Parzen kernel, and the Bartlett kernel is the least efficient. The optimal choice of l(t ) is an important issue in practice. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 30 / 85

Wald Test

Null hypothesis: $R\beta_o = r$. We want to check whether $R\hat\beta_T$ is sufficiently close to $r$. By Theorem 6.6,
$$(RD_oR')^{-1/2}\sqrt{T}\,R(\hat\beta_T - \beta_o) \xrightarrow{D} N(0, I_q),$$
where $D_o = M_{xx}^{-1}V_oM_{xx}^{-1}$. Given a consistent estimator for $D_o$:
$$\hat D_T = \Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1}\hat V_T\Big(\frac{1}{T}\sum_{t=1}^T x_t x_t'\Big)^{-1},$$
with $\hat V_T$ a consistent estimator of $V_o$, we have
$$(R\hat D_TR')^{-1/2}\sqrt{T}\,R(\hat\beta_T - \beta_o) \xrightarrow{D} N(0, I_q).$$

The Wald test statistic is
$$W_T = T(R\hat\beta_T - r)'(R\hat D_TR')^{-1}(R\hat\beta_T - r).$$

Theorem 6.10
Given $y_t = x_t'\beta + e_t$, suppose that [B1](i), [B2] and [B3] hold. Then under the null, $W_T \xrightarrow{D} \chi^2(q)$, where $q$ is the number of hypotheses.

- Data are not required to be serially uncorrelated, homoskedastic, or normally distributed.
- The limiting $\chi^2$ distribution of the Wald test is only an approximation to the exact distribution.

Example: Given the specification $y_t = x_{1,t}'b_1 + x_{2,t}'b_2 + e_t$, where $x_{1,t}$ is $(k-s) \times 1$ and $x_{2,t}$ is $s \times 1$. Hypothesis: $R\beta_o = 0$, where $R = [0_{s\times(k-s)}\ I_s]$. The Wald test statistic is
$$W_T = T\hat\beta_T'R'\big(R\hat D_TR'\big)^{-1}R\hat\beta_T \xrightarrow{D} \chi^2(s),$$
where $\hat D_T = (X'X/T)^{-1}\hat V_T(X'X/T)^{-1}$. The exact form of $W_T$ depends on $\hat D_T$. When $\hat V_T = \hat\sigma_T^2(X'X/T)$ is consistent for $V_o$, $\hat D_T = \hat\sigma_T^2(X'X/T)^{-1}$ is consistent for $D_o$, and the Wald statistic becomes
$$W_T = T\hat\beta_T'R'\big[R(X'X/T)^{-1}R'\big]^{-1}R\hat\beta_T/\hat\sigma_T^2,$$
which is $s$ times the standard $F$ statistic.
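The Wald construction can be sketched numerically under the homoskedastic choice of $\hat V_T$ (the simulated design, seed, and sample size are our assumptions, not from the text):

```python
import numpy as np

# Wald test of H0: beta_2 = beta_3 = 0 in y = b0 + b1*x1 + b2*x2 + e,
# with the homoskedastic D_hat = sigma2_hat * (X'X/T)^{-1}.
rng = np.random.default_rng(5)
T, k = 5_000, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.0, 0.0]) + rng.normal(size=T)   # null is true

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
sigma2 = e @ e / (T - k)
D_hat = sigma2 * np.linalg.inv(X.T @ X / T)

R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
diff = R @ b                                              # R beta_hat - r, r = 0
W = T * diff @ np.linalg.solve(R @ D_hat @ R.T, diff)
print(W)  # approximately chi-square(2) under the null
```

Under the null, repeating this over many samples would give values distributed approximately as $\chi^2(2)$.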

Lagrange Multiplier (LM) Test

Given the constraint $R\beta = r$, the Lagrangian is
$$\frac{1}{T}(y - X\beta)'(y - X\beta) + (R\beta - r)'\lambda,$$
where $\lambda$ is the $q \times 1$ vector of Lagrange multipliers. The solutions are:
$$\ddot\lambda_T = 2\big[R(X'X/T)^{-1}R'\big]^{-1}(R\hat\beta_T - r),$$
$$\ddot\beta_T = \hat\beta_T - (X'X/T)^{-1}R'\ddot\lambda_T/2.$$
The LM test checks whether $\ddot\lambda_T$ (the shadow price of the constraint) is sufficiently close to zero.

By the asymptotic normality of $\sqrt{T}(R\hat\beta_T - r)$,
$$\Lambda_o^{-1/2}\sqrt{T}\,\ddot\lambda_T \xrightarrow{D} N(0, I_q),$$
where $\Lambda_o = 4(RM_{xx}^{-1}R')^{-1}(RD_oR')(RM_{xx}^{-1}R')^{-1}$. Let $\ddot V_T$ be a consistent estimator of $V_o$ based on the constrained estimation result. Then, with
$$\ddot\Lambda_T = 4\big[R(X'X/T)^{-1}R'\big]^{-1}\big[R(X'X/T)^{-1}\ddot V_T(X'X/T)^{-1}R'\big]\big[R(X'X/T)^{-1}R'\big]^{-1},$$
we have $\ddot\Lambda_T^{-1/2}\sqrt{T}\,\ddot\lambda_T \xrightarrow{D} N(0, I_q)$. The LM statistic is
$$LM_T = T\,\ddot\lambda_T'\,\ddot\Lambda_T^{-1}\,\ddot\lambda_T.$$

Theorem 6.12
Given $y_t = x_t'\beta + e_t$, suppose that [B1](i), [B2] and [B3] hold. Then under the null, $LM_T \xrightarrow{D} \chi^2(q)$, where $q$ is the number of hypotheses.

Writing $R\hat\beta_T - r = R(X'X/T)^{-1}X'(y - X\ddot\beta_T)/T = R(X'X/T)^{-1}X'\ddot e/T$, we have
$$\ddot\lambda_T = 2\big[R(X'X/T)^{-1}R'\big]^{-1}R(X'X/T)^{-1}X'\ddot e/T.$$
The LM test is then
$$LM_T = T\,\ddot e'X(X'X)^{-1}R'\big[R(X'X/T)^{-1}\ddot V_T(X'X/T)^{-1}R'\big]^{-1}R(X'X)^{-1}X'\ddot e.$$
That is, the LM test requires only constrained estimation.

Note: Under the null, $W_T - LM_T \xrightarrow{IP} 0$; if $V_o$ were known, the Wald and LM tests would be algebraically equivalent. (Why?)

Example: Test whether one would like to add $s$ additional regressors to the specification $y_t = x_{1,t}'b_1 + e_t$. The unconstrained specification is $y_t = x_{1,t}'b_1 + x_{2,t}'b_2 + e_t$, and the null hypothesis is $R\beta_o = 0$ with $R = [0_{s\times(k-s)}\ I_s]$.

The LM test can be computed as on the previous page, using the constrained estimator $\ddot\beta_T = (\ddot b_{1,T}'\ \ 0')'$ with $\ddot b_{1,T} = (X_1'X_1)^{-1}X_1'y$. Letting $X = [X_1\ X_2]$ and $\ddot e = y - X_1\ddot b_{1,T}$, suppose that $\ddot V_T = \ddot\sigma_T^2(X'X/T)$ is consistent for $V_o$ under the null, where $\ddot\sigma_T^2 = \sum_{t=1}^T \ddot e_t^2/(T-k+s)$. Then, the LM test is
$$LM_T = T\,\ddot e'X(X'X)^{-1}R'\big[R(X'X/T)^{-1}R'\big]^{-1}R(X'X)^{-1}X'\ddot e/\ddot\sigma_T^2.$$

Using the formula for the inverse of a partitioned matrix,
$$R(X'X)^{-1}R' = [X_2'(I-P_1)X_2]^{-1},$$
$$R(X'X)^{-1}X' = [X_2'(I-P_1)X_2]^{-1}X_2'(I-P_1),$$
where $P_1 = X_1(X_1'X_1)^{-1}X_1'$. Clearly, $(I-P_1)\ddot e = \ddot e$. The LM statistic is thus
$$\begin{aligned} LM_T &= \ddot e'X(X'X)^{-1}R'\big[R(X'X)^{-1}R'\big]^{-1}R(X'X)^{-1}X'\ddot e/\ddot\sigma_T^2 \\ &= \ddot e'(I-P_1)X_2[X_2'(I-P_1)X_2]^{-1}X_2'(I-P_1)\ddot e/\ddot\sigma_T^2 \\ &= \ddot e'X_2[X_2'(I-P_1)X_2]^{-1}X_2'\ddot e/\ddot\sigma_T^2 \\ &= \ddot e'X_2\,R(X'X)^{-1}R'\,X_2'\ddot e/\ddot\sigma_T^2. \end{aligned}$$

As $X_1'\ddot e = 0$, we can write $\ddot e'X = [\ddot e'X_1\ \ \ddot e'X_2] = [0_{1\times(k-s)}\ \ \ddot e'X_2]$. A simple version of the LM test then reads
$$LM_T = \frac{\ddot e'X(X'X)^{-1}X'\ddot e}{\ddot e'\ddot e/(T-k+s)} = (T-k+s)R^2,$$
where $R^2$ is the non-centered $R^2$ of the auxiliary regression of $\ddot e$ on $X$.

Note: The LM test may also be computed as $TR^2$ if $\ddot\sigma_T^2 = \ddot e'\ddot e/T$ is the MLE of $\sigma^2$.
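The auxiliary-regression version is simple to compute. A sketch (the simulated design and constants are our assumptions): regress the constrained residuals on the full regressor matrix and scale the non-centered $R^2$.

```python
import numpy as np

# LM test for adding X2 to y = X1 b1 + e via the auxiliary regression of
# the constrained residuals on X = [X1 X2]; LM = (T - k + s) * R^2.
rng = np.random.default_rng(6)
T, s = 4_000, 2
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = rng.normal(size=(T, s))
y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=T)   # X2 truly irrelevant

b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
e_con = y - X1 @ b1                                  # constrained residuals

X = np.hstack([X1, X2])
g = np.linalg.solve(X.T @ X, X.T @ e_con)            # auxiliary regression
R2 = (e_con @ (X @ g)) / (e_con @ e_con)             # non-centered R^2
LM = (T - X.shape[1] + s) * R2
print(LM)  # approximately chi-square(s) under the null
```

Only the constrained fit and one auxiliary regression are needed, which is the practical appeal of the LM form.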

Likelihood Ratio (LR) Test

The OLS estimator $\hat\beta_T$ is also the MLE $\tilde\beta_T$ that maximizes
$$L_T(\beta, \sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(\sigma^2) - \frac{1}{2\sigma^2 T}\sum_{t=1}^T (y_t - x_t'\beta)^2.$$
With $\hat e_t = y_t - x_t'\tilde\beta_T$, the unconstrained MLE of $\sigma^2$ is
$$\tilde\sigma_T^2 = \frac{1}{T}\sum_{t=1}^T \hat e_t^2.$$
Given $R\beta = r$, let $\ddot\beta_T$ denote the constrained MLE of $\beta$. Then $\ddot e_t = y_t - x_t'\ddot\beta_T$, and the constrained MLE of $\sigma^2$ is
$$\ddot\sigma_T^2 = \frac{1}{T}\sum_{t=1}^T \ddot e_t^2.$$

For $H_0: R\beta_o = r$, the LR test compares the constrained and unconstrained values of $L_T$:
$$LR_T = -2T\big(L_T(\ddot\beta_T, \ddot\sigma_T^2) - L_T(\tilde\beta_T, \tilde\sigma_T^2)\big) = T\log\Big(\frac{\ddot\sigma_T^2}{\tilde\sigma_T^2}\Big).$$
The null would be rejected if $LR_T$ is far from zero.

Theorem 6.15
Given $y_t = x_t'\beta + e_t$, suppose that [B1](i), [B2] and [B3] hold and that $\tilde\sigma_T^2(X'X/T)$ is consistent for $V_o$. Then under the null hypothesis, $LR_T \xrightarrow{D} \chi^2(q)$, where $q$ is the number of hypotheses.
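The variance-ratio form of $LR_T$ takes one line once both fits are available. A sketch (testing a zero slope; the simulated data and seed are our assumptions):

```python
import numpy as np

# LR test of H0: slope = 0 in y = b0 + b1*x + e, via
# LR = T * log(sigma2_constrained / sigma2_unconstrained).
rng = np.random.default_rng(7)
T = 3_000
x = rng.normal(size=T)
y = 1.0 + rng.normal(size=T)              # H0 true: slope is zero

X = np.column_stack([np.ones(T), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
s2_unc = np.mean((y - X @ b) ** 2)        # unconstrained MLE of sigma^2
s2_con = np.mean((y - y.mean()) ** 2)     # constrained fit: intercept only
LR = T * np.log(s2_con / s2_unc)
print(LR)  # approximately chi-square(1) under the null
```

Since the constrained residual variance can never be smaller than the unconstrained one, $LR_T \geq 0$ by construction.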

Noting that $\ddot e = X(\tilde\beta_T - \ddot\beta_T) + \hat e$ and $X'\hat e = 0$, we have
$$\ddot\sigma_T^2 = \tilde\sigma_T^2 + (\tilde\beta_T - \ddot\beta_T)'(X'X/T)(\tilde\beta_T - \ddot\beta_T).$$
We have seen that
$$\tilde\beta_T - \ddot\beta_T = (X'X/T)^{-1}R'\big[R(X'X/T)^{-1}R'\big]^{-1}(R\hat\beta_T - r).$$
It follows that
$$\ddot\sigma_T^2 = \tilde\sigma_T^2 + (R\hat\beta_T - r)'\big[R(X'X/T)^{-1}R'\big]^{-1}(R\hat\beta_T - r),$$
and that
$$LR_T = T\log\big(1 + \underbrace{(R\hat\beta_T - r)'\big[R(X'X/T)^{-1}R'\big]^{-1}(R\hat\beta_T - r)/\tilde\sigma_T^2}_{=:\;a_T}\big).$$

Owing to consistency of $\hat\beta_T$, $a_T \to 0$ under the null. The mean value expansion of $\log(1 + a_T)$ about $a_T = 0$ yields
$$\log(1 + a_T) = (1 + \bar a_T)^{-1}a_T,$$
where $\bar a_T$ lies between $a_T$ and 0 and converges to zero. Then,
$$LR_T = T(1 + \bar a_T)^{-1}a_T = Ta_T + o_{IP}(1),$$
where $Ta_T$ is the Wald statistic with $\hat V_T = \tilde\sigma_T^2(X'X/T)$. When this $\hat V_T$ is consistent for $V_o$, $LR_T$ has a limiting $\chi^2(q)$ distribution.

Note: The applicability of the LR test here is limited because it cannot be made robust to conditional heteroskedasticity and serial correlation. (Why?)

Remarks:
- When the Wald test involves $\hat V_T = \tilde\sigma_T^2(X'X/T)$ and the LM test uses $\ddot V_T = \ddot\sigma_T^2(X'X/T)$, it can be shown that $W_T \geq LR_T \geq LM_T$. Hence, conflicting inferences may arise in finite samples when the critical value falls between two of these statistics.
- When $\tilde\sigma_T^2(X'X/T)$ and $\ddot\sigma_T^2(X'X/T)$ are both consistent for $V_o$, the Wald, LM, and LR tests are asymptotically equivalent.

Power of Tests

Consider the alternative hypothesis: $R\beta_o = r + \delta$, where $\delta \neq 0$. Under the alternative,
$$\sqrt{T}(R\hat\beta_T - r) = \sqrt{T}R(\hat\beta_T - \beta_o) + \sqrt{T}\delta,$$
where the first term on the RHS converges and the second term diverges. We have $\mathrm{IP}(W_T > c) \to 1$ for any critical value $c$, because
$$\frac{1}{T}W_T \xrightarrow{IP} \delta'(RD_oR')^{-1}\delta > 0.$$
The Wald test is therefore a consistent test.

Example: Analysis of Suicide Rate

Part I: Estimation results based on different covariance matrices

                const     D_t    u_{t-1}  u_{t-1}D_t   R-bar^2
OLS Coeff.       5.60   -0.75     1.93      0.52       0.64
OLS s.e.         2.32    3.00     1.17      1.27
White s.e.       2.23    2.55     0.96      1.04
NW-B s.e.        2.79    3.62     1.09      1.26
  (t-ratio)     (2.00) (-0.21)   (1.78)    (0.42)
NW-QS s.e.       2.94    3.98     1.13      1.32
  (t-ratio)     (1.91) (-0.19)   (1.72)    (0.40)
FGLS Coeff.     19.14    0.73     0.13      0.10

NW-B and NW-QS stand for the Newey-West estimates based on the Bartlett and quadratic spectral kernels, respectively, with the truncation lag chosen by the package in R; D_t = 1 for t > 1994.

Part II: Estimation results based on different covariance matrices

                const     D_t    u_{t-1}     t       t*D_t   R-bar^2
OLS Coeff.      12.36  -15.26     0.38    -0.50      1.19     0.91
OLS s.e.         1.05    2.04     0.36     0.08      0.14
White s.e.       0.71    1.41     0.26     0.05      0.09
NW-B s.e.        1.67   14.58     0.86     0.04      0.70
  (t-ratio)     (7.41) (-1.05)   (0.44) (-14.10)    (1.69)
NW-QS s.e.       1.94   17.35     1.01     0.04      0.84
  (t-ratio)     (6.37) (-0.88)   (0.38) (-12.43)    (1.42)
FGLS Coeff.     14.67   18.22     0.23     0.56      1.35

NW-B and NW-QS stand for the Newey-West estimates based on the Bartlett and quadratic spectral kernels, respectively, with the truncation lag chosen by the package in R; D_t = 1 for t > 1994.

Instrumental Variable Estimator

Situations in which the OLS estimator is inconsistent:
1 A model omits relevant regressors.
2 A model includes lagged dependent variables as regressors together with serially correlated errors.
3 A model involves regressors that are measured with error.
4 The dependent variable and regressors are jointly determined at the same time (simultaneity problem).
5 The dependent variable is determined by some unobservable factors that are correlated with the regressors (selectivity problem).

To obtain consistency, let $z_t$ ($k \times 1$) be variables taken from $\{Y^{t-1}, W^t\}$ such that $\mathrm{IE}(z_t\epsilon_t) = 0$ and the $z_t$ are correlated with $x_t$ in the sense that $\mathrm{IE}(z_t x_t')$ is nonsingular.

The sample counterpart of $\mathrm{IE}(z_t\epsilon_t) = \mathrm{IE}[z_t(y_t - x_t'\beta_o)] = 0$ is
$$\frac{1}{T}\sum_{t=1}^T z_t(y_t - x_t'\beta) = 0,$$
which is a system of $k$ equations with $k$ unknowns. The solution is the instrumental variable (IV) estimator:
$$\check\beta_T = \Big(\sum_{t=1}^T z_t x_t'\Big)^{-1}\Big(\sum_{t=1}^T z_t y_t\Big) \xrightarrow{IP} M_{zx}^{-1}m_{zy} = \beta_o,$$
under a suitable LLN. This is also a method of moment estimator, because it solves the sample counterpart of the moment conditions $\mathrm{IE}[z_t(y_t - x_t'\beta_o)] = 0$. This method breaks down when more than $k$ instruments are available.
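A minimal sketch of the just-identified IV estimator in a scalar endogenous-regressor design (the data-generating process, seed, and sample size are our own illustration): the regressor shares a component with the error, so OLS is biased upward, while the instrument restores consistency.

```python
import numpy as np

# IV in a scalar model y = 2x + eps, where x is endogenous: x = z + v and
# eps = w + v, so cov(x, eps) = var(v) = 1 while z is a valid instrument.
rng = np.random.default_rng(8)
T = 100_000
z = rng.normal(size=T)                 # instrument, uncorrelated with eps
v = rng.normal(size=T)
eps = rng.normal(size=T) + v           # error shares v with x
x = z + v                              # endogenous regressor
y = 2.0 * x + eps

beta_ols = (x @ y) / (x @ x)           # plim = 2 + cov(x,eps)/var(x) = 2.5
beta_iv = (z @ y) / (z @ x)            # plim = cov(z,y)/cov(z,x) = 2
print(beta_ols, beta_iv)
```

Here `beta_ols` settles near 2.5 while `beta_iv` recovers the true coefficient 2, matching $\check\beta_T = (Z'X)^{-1}Z'y$ in the $k = 1$ case.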

Assume a CLT: $T^{-1/2}\sum_{t=1}^T z_t \epsilon_t \stackrel{D}{\longrightarrow} \mathcal{N}(0, V_o)$, with
$$V_o = \lim_{T\to\infty} \mathrm{var}\Bigl(T^{-1/2}\sum_{t=1}^T z_t \epsilon_t\Bigr).$$
The normalized IV estimator is asymptotically normal:
$$\sqrt{T}(\check\beta_T - \beta_o) = \Bigl(\frac{1}{T}\sum_{t=1}^T z_t x_t'\Bigr)^{-1}\Bigl(\frac{1}{\sqrt{T}}\sum_{t=1}^T z_t \epsilon_t\Bigr) \stackrel{D}{\longrightarrow} \mathcal{N}(0, D_o),$$
where $D_o = M_{zx}^{-1} V_o (M_{zx}')^{-1}$. Then $D_T^{-1/2}\sqrt{T}(\check\beta_T - \beta_o) \stackrel{D}{\longrightarrow} \mathcal{N}(0, I_k)$, where $D_T$ is a consistent estimator for $D_o$.

I(1) Variables

$\{y_t\}$ is said to be an I(1) (integrated of order 1) process if $y_t = y_{t-1} + \epsilon_t$, with $\epsilon_t$ satisfying:

[C1] $\{\epsilon_t\}$ is a weakly stationary process with mean zero and variance $\sigma_\epsilon^2$ and obeys an FCLT:
$$\frac{1}{\sigma\sqrt{T}}\sum_{t=1}^{[Tr]} \epsilon_t = \frac{1}{\sigma\sqrt{T}}\, y_{[Tr]} \Rightarrow w(r), \quad 0 \le r \le 1,$$
where $w$ is the standard Wiener process and $\sigma^2$ is the long-run variance of $\epsilon_t$:
$$\sigma^2 = \lim_{T\to\infty} \mathrm{var}\Bigl(\frac{1}{\sqrt{T}}\sum_{t=1}^T \epsilon_t\Bigr).$$

- Partial sums of an I(0) series (e.g., $\sum_{i=1}^t \epsilon_i$) form an I(1) series, while taking the first difference of an I(1) series ($y_t - y_{t-1}$) yields an I(0) series.
- A random walk is I(1) with i.i.d. $\epsilon_t$ and $\sigma^2 = \sigma_\epsilon^2$.
- When $\epsilon_t = y_t - y_{t-1}$ is a stationary ARMA($p$, $q$) process, $y_t$ is an I(1) process known as an ARIMA($p$, 1, $q$) process.
- An I(1) series $y_t$ has mean zero and variance increasing linearly with $t$, and its autocovariances $\mathrm{cov}(y_t, y_s)$ do not vanish as $|t - s|$ increases.
- Many macroeconomic and financial time series are (or behave like) I(1) processes.
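A small simulation (illustrative only, not from the slides) exhibits the two defining features just listed: the variance of a random walk grows linearly in $t$, while first-differencing recovers an I(0) series.

```python
import numpy as np

rng = np.random.default_rng(42)
reps, T = 2_000, 400

eps = rng.standard_normal((reps, T))
y = eps.cumsum(axis=1)                # y_t = y_{t-1} + eps_t: random walk, I(1)

# var(y_t) = t * var(eps_t) grows linearly in t ...
var_t = y.var(axis=0)
print(var_t[99], var_t[399])          # roughly 100 and 400

# ... while the first difference recovers the I(0) innovations.
dy = np.diff(y, axis=1)
print(dy.var())                       # roughly 1
```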

ARIMA vs. ARMA Processes

Figure: Sample paths of ARIMA and ARMA series.

I(1) vs. Trend Stationarity

Trend stationary series: $y_t = a_o + b_o t + \epsilon_t$, where $\epsilon_t$ is I(0).

Figure: Sample paths of random walk and trend stationary series.

Autoregression of an I(1) Variable

Suppose $\{y_t\}$ is a random walk: $y_t = \alpha_o y_{t-1} + \epsilon_t$ with $\alpha_o = 1$ and $\epsilon_t$ i.i.d. random variables with mean zero and variance $\sigma_\epsilon^2$.

$\{y_t\}$ does not obey a LLN; instead, $\sum_{t=2}^T y_{t-1}\epsilon_t = O_{\mathrm{IP}}(T)$ and $\sum_{t=2}^T y_{t-1}^2 = O_{\mathrm{IP}}(T^2)$.

Given the specification $y_t = \alpha y_{t-1} + e_t$, the OLS estimator of $\alpha$ is
$$\hat\alpha_T = \frac{\sum_{t=2}^T y_{t-1} y_t}{\sum_{t=2}^T y_{t-1}^2} = 1 + \frac{\sum_{t=2}^T y_{t-1}\epsilon_t}{\sum_{t=2}^T y_{t-1}^2} = 1 + O_{\mathrm{IP}}(T^{-1}),$$
which is $T$-consistent. This is also known as a super-consistent estimator.
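The $T$-consistency can be seen in a quick simulation (my own sketch, not from the slides): the OLS estimate of $\alpha$ sits very close to 1 even in moderate samples, with the error shrinking at rate $1/T$ rather than the usual $1/\sqrt{T}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def alpha_hat(T, rng):
    """OLS slope in y_t = alpha * y_{t-1} + e_t for a simulated random walk."""
    y = rng.standard_normal(T).cumsum()
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

results = {}
for T in (100, 1_000, 10_000):
    a = alpha_hat(T, rng)
    results[T] = a
    print(T, a, T * (a - 1))          # T*(a-1) stays O_p(1); a - 1 shrinks like 1/T
```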

Asymptotic Properties of the OLS Estimator

Lemma 7.1: Let $y_t = y_{t-1} + \epsilon_t$ be an I(1) series with $\epsilon_t$ satisfying [C1]. Then,
(i) $T^{-3/2}\sum_{t=1}^T y_{t-1} \Rightarrow \sigma \int_0^1 w(r)\,dr$;
(ii) $T^{-2}\sum_{t=1}^T y_{t-1}^2 \Rightarrow \sigma^2 \int_0^1 w(r)^2\,dr$;
(iii) $T^{-1}\sum_{t=1}^T y_{t-1}\epsilon_t \Rightarrow \tfrac{1}{2}\bigl[\sigma^2 w(1)^2 - \sigma_\epsilon^2\bigr] = \sigma^2 \int_0^1 w(r)\,dw(r) + \tfrac{1}{2}(\sigma^2 - \sigma_\epsilon^2)$,
where $w$ is the standard Wiener process.

Note: When $y_t$ is a random walk, $\sigma^2 = \sigma_\epsilon^2$.

Theorem 7.2: Let $y_t = y_{t-1} + \epsilon_t$ be an I(1) series with $\epsilon_t$ satisfying [C1]. Given the specification $y_t = \alpha y_{t-1} + e_t$, the normalized OLS estimator of $\alpha$ is
$$T(\hat\alpha_T - 1) = \frac{\sum_{t=2}^T y_{t-1}\epsilon_t/T}{\sum_{t=2}^T y_{t-1}^2/T^2} \Rightarrow \frac{\tfrac{1}{2}\bigl[w(1)^2 - \sigma_\epsilon^2/\sigma^2\bigr]}{\int_0^1 w(r)^2\,dr},$$
where $w$ is the standard Wiener process. When $y_t$ is a random walk,
$$T(\hat\alpha_T - 1) \Rightarrow \frac{\tfrac{1}{2}\bigl[w(1)^2 - 1\bigr]}{\int_0^1 w(r)^2\,dr},$$
which does not depend on $\sigma_\epsilon^2$ or $\sigma^2$ and is asymptotically pivotal.

Lemma 7.3: Let $y_t = y_{t-1} + \epsilon_t$ be an I(1) series with $\epsilon_t$ satisfying [C1]. Then,
(i) $T^{-2}\sum_{t=1}^T (y_{t-1} - \bar y_{-1})^2 \Rightarrow \sigma^2 \int_0^1 w^*(r)^2\,dr$;
(ii) $T^{-1}\sum_{t=1}^T (y_{t-1} - \bar y_{-1})\epsilon_t \Rightarrow \sigma^2 \int_0^1 w^*(r)\,dw(r) + \tfrac{1}{2}(\sigma^2 - \sigma_\epsilon^2)$,
where $w$ is the standard Wiener process and $w^*(r) = w(r) - \int_0^1 w(s)\,ds$ is the demeaned Wiener process.

Theorem 7.4: Let $y_t = y_{t-1} + \epsilon_t$ be an I(1) series with $\epsilon_t$ satisfying [C1]. Given the specification $y_t = c + \alpha y_{t-1} + e_t$, the normalized OLS estimators of $\alpha$ and $c$ satisfy
$$T(\hat\alpha_T - 1) \Rightarrow \frac{\int_0^1 w^*(r)\,dw(r) + \tfrac{1}{2}\bigl(1 - \sigma_\epsilon^2/\sigma^2\bigr)}{\int_0^1 w^*(r)^2\,dr} =: A,$$
$$\sqrt{T}\,\hat c_T \Rightarrow \sigma w(1) - A\,\sigma\int_0^1 w(r)\,dr.$$
In particular, when $y_t$ is a random walk,
$$T(\hat\alpha_T - 1) \Rightarrow \frac{\int_0^1 w^*(r)\,dw(r)}{\int_0^1 w^*(r)^2\,dr}.$$

- The limiting results for autoregressions with an I(1) variable are not invariant to model specification.
- All the results here are based on data with the DGP $y_t = y_{t-1} + \epsilon_t$, without an intercept.
- These results would break down if the DGP were $y_t = c_o + y_{t-1} + \epsilon_t$ with a non-zero $c_o$; such series are said to be I(1) with drift.
- An I(1) process with a drift,
$$y_t = c_o + y_{t-1} + \epsilon_t = c_o t + \sum_{i=1}^t \epsilon_i,$$
contains a deterministic trend plus an I(1) series without drift.

Tests of Unit Root

1. Given the specification $y_t = \alpha y_{t-1} + e_t$, the unit-root hypothesis is $\alpha_o = 1$, and a leading unit-root test is the $t$ test:
$$\tau_0 = \frac{\bigl(\sum_{t=2}^T y_{t-1}^2\bigr)^{1/2}(\hat\alpha_T - 1)}{\hat\sigma_{T,1}},$$
where $\hat\sigma_{T,1}^2 = \sum_{t=2}^T (y_t - \hat\alpha_T y_{t-1})^2/(T-2)$.

2. Given the specification $y_t = c + \alpha y_{t-1} + e_t$, a unit-root test is
$$\tau_c = \frac{\bigl[\sum_{t=2}^T (y_{t-1} - \bar y_{-1})^2\bigr]^{1/2}(\hat\alpha_T - 1)}{\hat\sigma_{T,2}},$$
where $\hat\sigma_{T,2}^2 = \sum_{t=2}^T (y_t - \hat c_T - \hat\alpha_T y_{t-1})^2/(T-3)$.
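The $\tau_0$ statistic is easy to assemble from its definition. The sketch below (illustrative, not from the slides) compares a true random walk, where $\tau_0$ behaves like a draw from the Dickey-Fuller distribution, with a stationary AR(1), where $\tau_0$ falls far below the 5% critical value $-1.95$.

```python
import numpy as np

rng = np.random.default_rng(7)

def tau0(y):
    """Dickey-Fuller t-statistic for alpha = 1 in y_t = alpha * y_{t-1} + e_t."""
    y0, y1 = y[:-1], y[1:]
    T = len(y)
    a = (y0 @ y1) / (y0 @ y0)
    s2 = ((y1 - a * y0) ** 2).sum() / (T - 2)
    return np.sqrt(y0 @ y0) * (a - 1) / np.sqrt(s2)

# Under the null: a pure random walk.
rw = rng.standard_normal(2_000).cumsum()

# Under the alternative: a stationary AR(1) with coefficient 0.5.
e = rng.standard_normal(2_000)
ar = np.empty(2_000)
ar[0] = e[0]
for t in range(1, 2_000):
    ar[t] = 0.5 * ar[t - 1] + e[t]

t_rw = tau0(rw)
t_ar = tau0(ar)
print(t_rw)   # typically above the 5% critical value -1.95
print(t_ar)   # strongly negative: the unit root is clearly rejected
```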

Theorem 7.5: Let $y_t$ be generated as a random walk. Then,
$$\tau_0 \Rightarrow \frac{\tfrac{1}{2}\bigl[w(1)^2 - 1\bigr]}{\bigl[\int_0^1 w(r)^2\,dr\bigr]^{1/2}}, \qquad \tau_c \Rightarrow \frac{\int_0^1 w^*(r)\,dw(r)}{\bigl[\int_0^1 w^*(r)^2\,dr\bigr]^{1/2}}.$$

For the specification with a time trend variable,
$$y_t = c + \alpha y_{t-1} + \beta\Bigl(t - \frac{T}{2}\Bigr) + e_t,$$
the $t$-statistic of $\alpha_o = 1$ is denoted $\tau_t$.

Dickey-Fuller distributions

Table: Some percentiles of the Dickey-Fuller distributions.

Test     1%      2.5%    5%      10%     50%     90%     95%     97.5%   99%
$\tau_0$  -2.58   -2.23   -1.95   -1.62   -0.51    0.89    1.28    1.62    2.01
$\tau_c$  -3.42   -3.12   -2.86   -2.57   -1.57   -0.44   -0.08    0.23    0.60
$\tau_t$  -3.96   -3.67   -3.41   -3.13   -2.18   -1.25   -0.94   -0.66   -0.32

These distributions are not symmetric about zero and place more mass on negative values. $\tau_c$ is negative about 95% of the time, and $\tau_t$ is virtually a non-positive random variable.

The Dickey-Fuller Distributions

Figure: The distributions of the Dickey-Fuller $\tau_0$ and $\tau_c$ tests vs. $\mathcal{N}(0, 1)$.

Implementation

In practice, we estimate one of the following specifications:
1. $\Delta y_t = \theta y_{t-1} + e_t$.
2. $\Delta y_t = c + \theta y_{t-1} + e_t$.
3. $\Delta y_t = c + \theta y_{t-1} + \beta\bigl(t - \tfrac{T}{2}\bigr) + e_t$.

The unit-root hypothesis $\alpha_o = 1$ is now equivalent to $\theta_o = 0$. The weak limits of the normalized estimators $T\hat\theta_T$ are the same as the respective limits of $T(\hat\alpha_T - 1)$ under the null hypothesis. The unit-root tests are now computed as the $t$-ratios of $\theta$ in these specifications.

Phillips-Perron Tests

Note: The Dickey-Fuller tests check only the random walk hypothesis and are invalid for testing general I(1) processes.

Theorem 7.6: Let $y_t = y_{t-1} + \epsilon_t$ be an I(1) series with $\epsilon_t$ satisfying [C1]. Then,
$$\tau_0 \Rightarrow \frac{\sigma}{\sigma_\epsilon}\Biggl(\frac{\tfrac{1}{2}\bigl[w(1)^2 - \sigma_\epsilon^2/\sigma^2\bigr]}{\bigl[\int_0^1 w(r)^2\,dr\bigr]^{1/2}}\Biggr), \qquad \tau_c \Rightarrow \frac{\sigma}{\sigma_\epsilon}\Biggl(\frac{\int_0^1 w^*(r)\,dw(r) + \tfrac{1}{2}\bigl(1 - \sigma_\epsilon^2/\sigma^2\bigr)}{\bigl[\int_0^1 w^*(r)^2\,dr\bigr]^{1/2}}\Biggr).$$

Let $\hat e_t$ denote the OLS residuals and $s_{Tn}^2$ a Newey-West type estimator of $\sigma^2$ based on $\hat e_t$:
$$s_{Tn}^2 = \frac{1}{T-1}\sum_{t=2}^T \hat e_t^2 + \frac{2}{T-1}\sum_{s=1}^{T-1}\kappa\Bigl(\frac{s}{n}\Bigr)\sum_{t=s+2}^T \hat e_t \hat e_{t-s},$$
with $\kappa$ a kernel function and $n = n(T)$ its bandwidth.

Phillips (1987) proposed the following modified $\tau_0$ and $\tau_c$ statistics:
$$Z(\tau_0) = \frac{\hat\sigma_{T,1}}{s_{Tn}}\,\tau_0 - \frac{s_{Tn}^2 - \hat\sigma_{T,1}^2}{2\, s_{Tn}\bigl(\sum_{t=2}^T y_{t-1}^2/T^2\bigr)^{1/2}},$$
$$Z(\tau_c) = \frac{\hat\sigma_{T,2}}{s_{Tn}}\,\tau_c - \frac{s_{Tn}^2 - \hat\sigma_{T,2}^2}{2\, s_{Tn}\bigl[\sum_{t=2}^T (y_{t-1} - \bar y_{-1})^2/T^2\bigr]^{1/2}};$$
see also Phillips and Perron (1988).
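A minimal sketch of a Bartlett-kernel estimator of the long-run variance, the key ingredient of $s_{Tn}^2$ (illustrative, not the slides' implementation). It is applied to an AR(1) whose true long-run variance is known in closed form, $\sigma^2 = \sigma_e^2/(1-\phi)^2 = 4$; the bandwidth rule below is a common choice, assumed here rather than taken from the slides.

```python
import numpy as np

def nw_long_run_var(e, n):
    """Newey-West estimator of the long-run variance with Bartlett kernel
    weights 1 - s/(n+1) and bandwidth n."""
    T = len(e)
    s2 = e @ e / T
    for s in range(1, n + 1):
        gamma_s = (e[s:] @ e[:-s]) / T          # sample autocovariance at lag s
        s2 += 2.0 * (1.0 - s / (n + 1)) * gamma_s
    return s2

rng = np.random.default_rng(3)
T, phi = 20_000, 0.5
u = rng.standard_normal(T)
e = np.empty(T)
e[0] = u[0]
for t in range(1, T):
    e[t] = phi * e[t - 1] + u[t]

# True long-run variance of this AR(1): 1 / (1 - phi)^2 = 4.
n = int(4 * (T / 100) ** (2 / 9))               # a common bandwidth rule (assumed)
s2_hat = nw_long_run_var(e, n)
print(s2_hat)                                   # near 4, slightly below due to kernel downweighting
```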

The Phillips-Perron tests eliminate the nuisance parameters by suitable transformations of $\tau_0$ and $\tau_c$ and have the same limits as those of the Dickey-Fuller tests.

Corollary 7.7: Let $y_t = y_{t-1} + \epsilon_t$ be an I(1) series with $\epsilon_t$ satisfying [C1]. Then,
$$Z(\tau_0) \Rightarrow \frac{\tfrac{1}{2}\bigl[w(1)^2 - 1\bigr]}{\bigl[\int_0^1 w(r)^2\,dr\bigr]^{1/2}}, \qquad Z(\tau_c) \Rightarrow \frac{\int_0^1 w^*(r)\,dw(r)}{\bigl[\int_0^1 w^*(r)^2\,dr\bigr]^{1/2}}.$$

Augmented Dickey-Fuller (ADF) Tests

Said and Dickey (1984) suggest filtering out the correlations in a weakly stationary process by a linear AR model with a proper order. The augmented specifications are:
1. $\Delta y_t = \theta y_{t-1} + \sum_{j=1}^k \gamma_j \Delta y_{t-j} + e_t$.
2. $\Delta y_t = c + \theta y_{t-1} + \sum_{j=1}^k \gamma_j \Delta y_{t-j} + e_t$.
3. $\Delta y_t = c + \theta y_{t-1} + \beta\bigl(t - \tfrac{T}{2}\bigr) + \sum_{j=1}^k \gamma_j \Delta y_{t-j} + e_t$.

Note: This approach avoids non-parametric kernel estimation of $\sigma^2$ but requires choosing a proper lag order $k$ for the augmented specifications (say, by a model selection criterion such as AIC or SIC).

KPSS Tests

$\{y_t\}$ is trend stationary if it fluctuates around a deterministic trend: $y_t = a_o + b_o t + \epsilon_t$, where $\epsilon_t$ satisfies [C1]. When $b_o = 0$, it is level stationary. Kwiatkowski, Phillips, Schmidt, and Shin (1992) proposed testing stationarity by
$$\eta_T = \frac{1}{T^2 s_{Tn}^2}\sum_{t=1}^T \Bigl(\sum_{i=1}^t \hat e_i\Bigr)^2,$$
where $s_{Tn}^2$ is a Newey-West estimator of $\sigma^2$ based on $\hat e_t$. To test the null of trend stationarity, $\hat e_t = y_t - \hat a_T - \hat b_T t$; to test the null of level stationarity, $\hat e_t = y_t - \bar y$.
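The $\eta_T$ statistic assembles directly from partial sums of residuals and a long-run variance estimate. A sketch for the level-stationarity case (assumptions: my own Bartlett-kernel helper and a common bandwidth rule, not the slides' exact choices), contrasting a stationary series with a random walk:

```python
import numpy as np

def nw_long_run_var(e, n):
    """Bartlett-kernel Newey-West estimator of the long-run variance."""
    T = len(e)
    s2 = e @ e / T
    for s in range(1, n + 1):
        s2 += 2.0 * (1.0 - s / (n + 1)) * (e[s:] @ e[:-s]) / T
    return s2

def kpss_level(y, n):
    """KPSS statistic eta_T for the null of level stationarity."""
    e = y - y.mean()                      # residuals from a mean-only fit
    S = e.cumsum()                        # partial sums of the residuals
    T = len(y)
    return (S @ S) / (T**2 * nw_long_run_var(e, n))

rng = np.random.default_rng(11)
T = 2_000
n = int(4 * (T / 100) ** (1 / 4))         # a common bandwidth choice (assumed)

stationary = rng.standard_normal(T)       # level stationary: eta_T small
walk = rng.standard_normal(T).cumsum()    # I(1): eta_T diverges with T

eta_stationary = kpss_level(stationary, n)
eta_walk = kpss_level(walk, n)
print(eta_stationary)                     # typically below the 5% value 0.463
print(eta_walk)                           # large: stationarity rejected
```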

The partial sums of $\hat e_t = y_t - \bar y$ satisfy
$$\sum_{t=1}^{[Tr]} \hat e_t = \sum_{t=1}^{[Tr]} (\epsilon_t - \bar\epsilon) = \sum_{t=1}^{[Tr]} \epsilon_t - \frac{[Tr]}{T}\sum_{t=1}^T \epsilon_t, \quad r \in (0, 1].$$
Then, by a suitable FCLT,
$$\frac{1}{\sigma\sqrt{T}}\sum_{t=1}^{[Tr]} \hat e_t \Rightarrow w(r) - r\,w(1) = w_0(r),$$
the Brownian bridge. Similarly, given $\hat e_t = y_t - \hat a_T - \hat b_T t$,
$$\frac{1}{\sigma\sqrt{T}}\sum_{t=1}^{[Tr]} \hat e_t \Rightarrow w(r) + (2r - 3r^2)\,w(1) - (6r - 6r^2)\int_0^1 w(s)\,ds,$$
which is a tied-down process (it is zero at $r = 1$ with probability one).

Theorem 7.8: Let $y_t = a_o + b_o t + \epsilon_t$ with $\epsilon_t$ satisfying [C1]. Then, $\eta_T$ computed from $\hat e_t = y_t - \hat a_T - \hat b_T t$ satisfies
$$\eta_T \Rightarrow \int_0^1 f(r)^2\,dr, \quad \text{where } f(r) = w(r) + (2r - 3r^2)\,w(1) - (6r - 6r^2)\int_0^1 w(s)\,ds.$$
Let $y_t = a_o + \epsilon_t$ with $\epsilon_t$ satisfying [C1]. Then, $\eta_T$ computed from $\hat e_t = y_t - \bar y$ satisfies
$$\eta_T \Rightarrow \int_0^1 w_0(r)^2\,dr,$$
where $w_0$ is the Brownian bridge.

Table: Some percentiles of the distributions of the KPSS test.

Test                  1%      2.5%    5%      10%
level stationarity    0.739   0.574   0.463   0.347
trend stationarity    0.216   0.176   0.146   0.119

These tests have power against I(1) series because $\eta_T$ diverges under I(1) alternatives. KPSS tests also have power against other alternatives, such as stationarity with mean changes and trend stationarity with trend breaks. Thus, rejecting the null of stationarity does not imply that the series must be I(1).

The KPSS Distributions

Figure: The distributions of the KPSS tests (trend and level stationarity).

Spurious Regressions

Granger and Newbold (1974): regressing one random walk on another typically yields a significant $t$-ratio. They refer to this result as spurious regression.

Given the specification $y_t = \alpha + \beta x_t + e_t$, let $\hat\alpha_T$ and $\hat\beta_T$ denote the OLS estimators of $\alpha$ and $\beta$, respectively, with the corresponding $t$-ratios $t_\alpha = \hat\alpha_T/s_\alpha$ and $t_\beta = \hat\beta_T/s_\beta$, where $s_\alpha$ and $s_\beta$ are the OLS standard errors of $\hat\alpha_T$ and $\hat\beta_T$.

The DGP: $y_t = y_{t-1} + u_t$ and $x_t = x_{t-1} + v_t$, where $\{u_t\}$ and $\{v_t\}$ are mutually independent processes satisfying the following condition.
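The Granger-Newbold finding is easy to reproduce in a few lines (a hypothetical Monte Carlo, not from the slides): when two independent random walks are regressed on each other, $|t_\beta| > 1.96$ occurs far more often than the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(5)
reps, T = 500, 200
reject = 0

for _ in range(reps):
    y = rng.standard_normal(T).cumsum()    # two independent random walks
    x = rng.standard_normal(T).cumsum()
    X = np.column_stack([np.ones(T), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (T - 2)
    cov = s2 * np.linalg.inv(X.T @ X)      # OLS covariance estimate
    t_beta = b[1] / np.sqrt(cov[1, 1])
    reject += abs(t_beta) > 1.96

rate = reject / reps
print(rate)    # far above the nominal 0.05
```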

[C2] $\{u_t\}$ and $\{v_t\}$ are two weakly stationary processes with mean zero and respective variances $\sigma_u^2$ and $\sigma_v^2$, and obey an FCLT with long-run variances
$$\sigma_y^2 = \lim_{T\to\infty}\frac{1}{T}\,\mathrm{IE}\Bigl(\sum_{t=1}^T u_t\Bigr)^2, \qquad \sigma_x^2 = \lim_{T\to\infty}\frac{1}{T}\,\mathrm{IE}\Bigl(\sum_{t=1}^T v_t\Bigr)^2.$$

We have the following results:
$$T^{-3/2}\sum_{t=1}^T y_t \Rightarrow \sigma_y\int_0^1 w_y(r)\,dr, \qquad T^{-2}\sum_{t=1}^T y_t^2 \Rightarrow \sigma_y^2\int_0^1 w_y(r)^2\,dr,$$
where $w_y$ is a standard Wiener process. Similarly,
$$T^{-3/2}\sum_{t=1}^T x_t \Rightarrow \sigma_x\int_0^1 w_x(r)\,dr, \qquad T^{-2}\sum_{t=1}^T x_t^2 \Rightarrow \sigma_x^2\int_0^1 w_x(r)^2\,dr.$$

We also have
$$T^{-2}\sum_{t=1}^T (y_t - \bar y)^2 \Rightarrow \sigma_y^2\int_0^1 w_y(r)^2\,dr - \sigma_y^2\Bigl(\int_0^1 w_y(r)\,dr\Bigr)^2 =: \sigma_y^2\, m_y,$$
$$T^{-2}\sum_{t=1}^T (x_t - \bar x)^2 \Rightarrow \sigma_x^2\int_0^1 w_x(r)^2\,dr - \sigma_x^2\Bigl(\int_0^1 w_x(r)\,dr\Bigr)^2 =: \sigma_x^2\, m_x,$$
where $w_y^*(r) = w_y(r) - \int_0^1 w_y(s)\,ds$ and $w_x^*(r) = w_x(r) - \int_0^1 w_x(s)\,ds$ are two mutually independent, demeaned Wiener processes. Also,
$$T^{-2}\sum_{t=1}^T (y_t - \bar y)(x_t - \bar x) \Rightarrow \sigma_y\sigma_x\Bigl(\int_0^1 w_y(r)\,w_x(r)\,dr - \int_0^1 w_y(r)\,dr\int_0^1 w_x(r)\,dr\Bigr) =: \sigma_y\sigma_x\, m_{yx}.$$

Theorem 7.9: Let $y_t = y_{t-1} + u_t$ and $x_t = x_{t-1} + v_t$, where $\{u_t\}$ and $\{v_t\}$ are mutually independent and satisfy [C2]. Given the specification $y_t = \alpha + \beta x_t + e_t$,
(i) $\hat\beta_T \Rightarrow \dfrac{\sigma_y}{\sigma_x}\,\dfrac{m_{yx}}{m_x}$;
(ii) $T^{-1/2}\hat\alpha_T \Rightarrow \sigma_y\Bigl(\int_0^1 w_y(r)\,dr - \dfrac{m_{yx}}{m_x}\int_0^1 w_x(r)\,dr\Bigr)$;
(iii) $T^{-1/2}\, t_\beta \Rightarrow \dfrac{m_{yx}}{(m_y m_x - m_{yx}^2)^{1/2}}$;
(iv) $T^{-1/2}\, t_\alpha \Rightarrow \dfrac{m_x\int_0^1 w_y(r)\,dr - m_{yx}\int_0^1 w_x(r)\,dr}{\bigl[(m_y m_x - m_{yx}^2)\int_0^1 w_x(r)^2\,dr\bigr]^{1/2}}$,
where $w_x$ and $w_y$ are two mutually independent, standard Wiener processes.

- While the true parameters are $\alpha_o = \beta_o = 0$, $\hat\beta_T$ has a non-degenerate limiting distribution, and $\hat\alpha_T$ diverges at the rate $T^{1/2}$.
- Theorem 7.9 (iii) and (iv) indicate that $t_\alpha$ and $t_\beta$ both diverge at the rate $T^{1/2}$, so one is likely to reject the null $\alpha_o = \beta_o = 0$ using the critical values from the standard normal distribution.
- Spurious trend: Nelson and Kang (1984) showed that, when $\{y_t\}$ is in fact a random walk, one may easily find a significant time-trend specification $y_t = a + b\,t + e_t$.
- Phillips and Durlauf (1986) demonstrated that the $F$ test (and hence the $t$-ratio) of $b_o = 0$ in the time-trend specification above diverges at the rate $T$, which explains why an incorrect inference would result.

Cointegration

Consider an equilibrium relation between $y$ and $x$: $a y - b x = 0$. With real data $(y_t, x_t)$, $z_t := a y_t - b x_t$ are equilibrium errors, because they need not be zero all the time.

- When $y_t$ and $x_t$ are both I(1), a linear combination of them, $z_t$, is in general an I(1) series. Then $\{z_t\}$ rarely crosses zero, and the equilibrium condition entails little empirical restriction on $z_t$.
- When $y_t$ and $x_t$ involve the same random walk $q_t$, say $y_t = q_t + u_t$ and $x_t = c\,q_t + v_t$ with $\{u_t\}$ and $\{v_t\}$ I(0), then $z_t := c\,y_t - x_t = c\,u_t - v_t$, which is a linear combination of I(0) series and hence is also I(0).
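The shared-random-walk construction can be checked numerically (an illustrative sketch with a hypothetical value $c = 2$): $y_t$ and $x_t$ each wander widely, but the combination $c\,y_t - x_t = c\,u_t - v_t$ stays bounded.

```python
import numpy as np

rng = np.random.default_rng(9)
T, c = 2_000, 2.0

q = rng.standard_normal(T).cumsum()    # common random walk (stochastic trend)
u = rng.standard_normal(T)             # I(0) noises
v = rng.standard_normal(T)

y = q + u                              # both y and x are I(1) ...
x = c * q + v

z = c * y - x                          # ... but c*y - x = c*u - v is I(0)
print(np.ptp(y), np.ptp(x))            # wide ranges: the stochastic trends dominate
print(z.var())                         # bounded: near c**2 + 1 = 5
```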

Granger (1981), Granger and Weiss (1983), and Engle and Granger (1987): Let $y_t$ be a $d$-dimensional vector of I(1) series. The elements of $y_t$ are cointegrated if there exists a $d \times 1$ vector $\alpha$ such that $z_t = \alpha' y_t$ is I(0); we then say the elements of $y_t$ are CI(1,1). The vector $\alpha$ is a cointegrating vector.

The space spanned by linearly independent cointegrating vectors is the cointegrating space; the number of linearly independent cointegrating vectors is the cointegrating rank, which is the dimension of the cointegrating space. If the cointegrating rank is $r$, we can collect $r$ linearly independent cointegrating vectors into the $d \times r$ matrix $A$ such that $z_t = A' y_t$ is a vector of I(0) series. The cointegrating rank is at most $d - 1$. (Why?)

Cointegrating Regression

Cointegrating regression: $y_{1,t} = \alpha' y_{2,t} + z_t$. Then $(1, -\alpha')'$ is the cointegrating vector, and $z_t$ are the regression (equilibrium) errors.

When the elements of $y_t$ are cointegrated, $z_t$ is in general correlated with $y_{2,t}$. This correlation does not affect consistency of the OLS estimator asymptotically, but it results in finite-sample bias and efficiency loss.

Efficiency: Saikkonen (1991) proposed a modified cointegrating regression,
$$y_{1,t} = \alpha' y_{2,t} + \sum_{j=-k}^{k} \Delta y_{2,t-j}'\, b_j + e_t,$$
so that the OLS estimator of $\alpha$ is asymptotically efficient.

Tests of Cointegration

One can verify a cointegration relation by applying unit-root tests, such as the augmented Dickey-Fuller test and the Phillips-Perron test, to $\hat z_t$. The null hypothesis that a unit root is present is equivalent to the hypothesis of no cointegration.

A difficulty in implementing a unit-root test on the cointegration residuals $\hat z_t$ is that $\hat z_t$ is not a raw series but the result of OLS fitting. Thus, even when $z_t$ is in fact I(1), the residuals $\hat z_t$ may not have much variation and hence may behave like a stationary series.

Engle and Granger (1987), Engle and Yoo (1987), and Davidson and MacKinnon (1993) simulated proper critical values for the unit-root tests on cointegrating residuals. Similar to the unit-root tests discussed earlier, these critical values are all model dependent.

Table: Some percentiles of the distributions of the cointegration $\tau_c$ test.

d     1%      2.5%    5%      10%
2    -3.90   -3.59   -3.34   -3.04
3    -4.29   -4.00   -3.74   -3.45
4    -4.64   -4.35   -4.10   -3.81

Drawbacks of cointegrating regressions:
1. The choice of the dependent variable is somewhat arbitrary.
2. This approach is more suitable for finding only one cointegrating relationship.

One may estimate multiple cointegration relations by a vector regression. It is now typical to adopt the maximum likelihood approach of Johansen (1988) to estimate the cointegrating space directly.

Error Correction Model (ECM)

When the elements of $y_t$ are cointegrated with $A' y_t = z_t$, there exists an error correction model (ECM):
$$\Delta y_t = B z_{t-1} + C_1 \Delta y_{t-1} + \cdots + C_k \Delta y_{t-k} + \nu_t.$$

- Cointegration characterizes the long-run equilibrium relations because it deals with the levels of I(1) variables; the ECM deals with the differences of the variables and describes their short-run dynamics.
- When cointegration exists, a vector AR model of $\Delta y_t$ is misspecified because it omits $z_{t-1}$, and the parameter estimates are inconsistent.
- In practice, we regress $\Delta y_t$ on $\hat z_{t-1}$ and lagged $\Delta y_t$'s. Standard asymptotic theory applies here because the ECM involves only stationary variables when cointegration exists.
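A sketch of estimating a single-equation ECM by OLS, under a hypothetical DGP of my own: $y_t = x_t + z_t$ with $z_t$ an AR(1) with coefficient $\phi = 0.5$, so the error-correction coefficient on $z_{t-1}$ in the $\Delta y_t$ equation is $\phi - 1 = -0.5$. For simplicity the known cointegrating vector $(1, -1)$ is used in place of a first-stage OLS fit.

```python
import numpy as np

rng = np.random.default_rng(13)
T, phi = 3_000, 0.5

x = rng.standard_normal(T).cumsum()    # I(1) driver
e = rng.standard_normal(T)
z = np.empty(T)                        # stationary equilibrium error, AR(1)
z[0] = e[0]
for t in range(1, T):
    z[t] = phi * z[t - 1] + e[t]
y = x + z                              # cointegrated: y - x = z is I(0)

# ECM regression: dy_t on z_{t-1} (here y - x, the known relation) and dx_t.
dy = np.diff(y)
dx = np.diff(x)
zlag = (y - x)[:-1]
X = np.column_stack([np.ones(T - 1), zlag, dx])
b, *_ = np.linalg.lstsq(X, dy, rcond=None)
print(b[1])    # error-correction coefficient, near phi - 1 = -0.5
print(b[2])    # short-run coefficient on dx_t, near 1
```

Negative $b_1$ is the error-correction mechanism: when $y$ sits above its equilibrium relation with $x$, subsequent changes in $y$ pull it back down.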