Asymptotic Least Squares Theory

Asymptotic Least Squares Theory CHUNG-MING KUAN Department of Finance & CRETA December 5, 2011 C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 1 / 85

Lecture Outline 1 When Regressors are Stochasitc 2 Asymptotic Properties of the OLS Estimators Consistency Asymptotic Normality 3 Consistent Estimation of Covariance Matrix 4 Large-Sample Tests Wald Test Lagrange Multiplier Test Likelihood Ratio Test Power of Tests Example: Analysis of Suicide Rate 5 Digression: Instrumental Variable Estimator C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 2 / 85

Lecture Outline (cont d) 6 I (1) Variables 7 Autoregression of an I (1) Variable Asymptotic Properties of the OLS Estimator Tests of Unit Root 8 Tests of Stationarity against I (1) 9 Regressions of I (1) Variables Spurious Regressions Cointegration Error Correction Model C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 3 / 85

When Regressors are Stochasitc Given y = Xβ + e, suppose that X is stochastic. Then, [A2](i) does not hold because IE(y) can not be Xβ o. It would be difficult to evaluate IE(ˆβ T ) and var(ˆβ T ) because ˆβ T is a complex function of the elements of y and X. Assume IE(y X) = Xβ o. IE(ˆβ T ) = IE [ (X X) 1 X IE(y X) ] = β o. If var(y X) = σ 2 oi T, var(ˆβ T ) = IE [ (X X) 1 X var(y X)X(X X) 1] = σ 2 o IE(X X) 1, which is not the same as σ 2 o(x X) 1. (X X) 1 X y is not normally distributed even when y is. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 4 / 85

Q: Is the condition IE(y X) = Xβ o realistic? Suppose that x t contains only one regressor y t 1. Then, IE(y t x 1,..., x T ) = x tβ o implies IE(y t y 1,..., y T 1 ) = β o y t 1, which is y t with probability one. As such, the conditional variance of y t, var(y t y 1,..., y T 1 ) = IE{[y t IE(y t y 1,..., y T 1 )] 2 y 1,..., y T 1 }, must be zero, rather than a positive constant σo. 2 Note: When X is stochastic, a different framework is needed to evaluate the properties of the OLs estimator. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 5 / 85

Notations We observe (y t w t), where w t (m 1) is the vector of all exogenous variables. W t = {w 1,..., w t } and Y t = {y 1,..., y t }. Then, {Y t 1, W t } generates a σ-algebra that is the information set up to time t. Regressors x t (k 1) are taken from the information set {Y t 1, W t }, and the resulting linear specification is y t = x tβ + e t, t = 1, 2,..., T. The OLS estimator of this specification is ( T ) 1 ( T ) ˆβ T = (X X) 1 X y = x t x t x t y t. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 6 / 85

Consistency The OLS estimator ˆβ T is strongly (weakly) consistent for β a.s. if ˆβ T β IP (ˆβ T β ) as T. That is, ˆβ T will be eventually close to β in a proper probabilistic sense when enough information becomes available. [B1] (i) {x t x t} obeys a SLLN (WLLN) with the almost sure (prob.) limit M xx := lim T 1 T T IE(x tx t), which is nonsingular. [B1] (ii) {x t y t } obeys a SLLN (WLLN) with the almost sure (prob.) limit m xy := lim T 1 T T IE(x ty t ). [B2] There exists a β o such that y t = x tβ o + ɛ t with IE(x t ɛ t ) = 0 for all t. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 7 / 85

By [B1] and Lemma 5.13, the OLS estimator of ˆβ T is ( 1 T ) 1 ( T x t x 1 t T ) T x t y t M 1 xx m xy a.s. (in probability). When [B2] holds, IE(x t y t ) = IE(x t x t)β o, so that m xy = M xx β o, and β = β o. Theorem 6.1 Consider the linear specification y t = x tβ + e t. (i) When [B1] holds, ˆβ T is strongly (weakly) consistent for β = M 1 xx m xy. (ii) When [B1] and [B2] hold, β o = M 1 xx m xy so that ˆβ T is strongly (weakly) consistent for β o. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 8 / 85

Remarks: Theorem 6.1 is about consistency (not unbiasedness), and what really matters is whether the data are governed by some SLLN (WLLN). Note that [B1] explicitly allows x t to be a random vector which may contain some lagged dependent variables (y t j, j 1) and other random variables in the information set. Also, the random data may exhibit dependence and heterogeneity, as long as such dependence and heterogeneity do not affect the LLN in [B1]. Given [B2], x tβ is the correct specification for the linear projection of y t, and the OLS estimator converges to the parameter of interest β o. A sufficient condition for [B2] is that there exists β o such that IE(y t Y t 1, W t ) = x tβ o. (Why?) C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 9 / 85

Corollary 6.2 Suppose that (y t x t) are independent random vectors with bounded (2 + δ) th moment for any δ > 0, such that M xx and m xy defined in [B1] exist. Then, the OLS estimator ˆβ T is strongly consistent for β = M 1 xx m xy. If [B2] also holds, ˆβ T is strongly consistent for β o. Proof: By the Cauchy-Schwartz inequality (Lemma 5.5), the i th element of x t y t is such that IE x ti y t 1+δ [ IE x ti 2(1+δ)] 1/2[ IE yt 2(1+δ)] 1/2, for some > 0. Similarly, each element of x t x t also has bounded (1 + δ) th moment. Then, {x t x t} and {x t y t } obey Markov s SLLN by Lemma 5.26 with the respective almost sure limits M xx and m xy. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 10 / 85

Example: Given the specification: y t = αy t 1 + e t, suppose that {y 2 t } and {y t y t 1 } obey a SLLN (WLLN). Then, the OLS estimator of α is such that ˆα T lim T 1 T T IE(y ty t 1 ) T IE(y t 1 2 ) lim T 1 T a.s. (in probability). When {y t } indeed follows a stationary AR(1) process: y t = α o y t 1 + u t, α o < 1, where u t are i.i.d. with mean zero and variance σu, 2 we have IE(y t ) = 0, var(y t ) = σu/(1 2 αo) 2 and cov(y t, y t 1 ) = α o var(y t ). We have ˆα T cov(y t, y t 1 ) var(y t ) = α o, a.s. (in probability). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 11 / 85

When x tβ o is not the linear projection, i.e., IE(x t ɛ t ) 0, IE(x t y t ) = IE(x t x t)β o + IE(x t ɛ t ). Then, m xy = M xx β o + m xɛ, where 1 m xɛ = lim T T T IE(x t ɛ t ). The limit of the OLS estimator now reads β = M 1 xx m xy = β o + M 1 xx m xɛ. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 12 / 85

Example: Given the specification: y t = x tβ + e t, suppose IE(y t Y t 1, W t ) = x tβ o + z tγ o, where z t are in the information set but distinct from x t. Writing y t = x tβ o + z tγ o + ɛ t = x tβ o + u t, we have IE(x t u t ) = IE(x t z t)γ o 0. It follows that ˆβ T M 1 xx m xy = β o + M 1 xx M xz γ o, with M xz := lim T T IE(x tz t)/t. The limit can not be β o unless x t is orthogonal to z t, i.e., IE(x t z t) = 0. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 13 / 85

Example: Given y t = αy t 1 + e t, suppose that y t = α o y t 1 + ɛ t, α o < 1, where ɛ t = u t π o u t 1 with π o < 1, and {u t } is a white noise with mean zero and variance σu. 2 Here, {y t } is a weakly stationary ARMA(1,1) process. We know ˆα T converges to cov(y t, y t 1 )/ var(y t 1 ) almost surely (in probability). Note, however, that ɛ t 1 = u t 1 π o u t 2 and IE(y t 1 ɛ t ) = IE[y t 1 (u t π o u t 1 )] = π o σu. 2 The limit of ˆα T is then cov(y t, y t 1 ) var(y t 1 ) = α o var(y t 1 ) + cov(ɛ t, y t 1 ) var(y t 1 ) = α o π oσ 2 u var(y t 1 ). The OLS estimator is inconsistent for α o unless π o = 0. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 14 / 85

Remark: Given the specification: y t = αy t 1 + x tβ + e t, suppose that y t = α o y t 1 + x tβ o + ɛ t, such that ɛ t are serially correlated (e.g., AR(1) or MA(1)). The OLS estimator is inconsistent because α o y t 1 + x tβ o is not the linear projection, a consequence of the joint presence of a lagged dependent variable (e.g., y t 1 ) and serially correlated disturbances (e.g., ɛ t being AR(1) or MA(1)). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 15 / 85

Asymptotic Normality By asymptotic normality of ˆβ T we mean: T (ˆβ T β o ) D N (0, D o ), where D o is a p.d. matrix. We may also write D 1/2 o T (ˆβ T β o ) D N (0, I k ). Given the specification y t = x tβ + e t and [B2], define ( 1 V T := var T T x t ɛ t ) [B3] {V 1/2 o x t ɛ t } obeys a CLT, where V o = lim T V T is p.d.. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 16 / 85

The normalized OLS estimator is ( ) 1 ( 1 T T (ˆβ T β o ) = x T t x 1 t T ( 1 = T ) 1 T x t x t V 1/2 o D M 1 xx V 1/2 o N (0, I k ). [ T V 1/2 o x t ɛ t ) ( 1 T T x t ɛ t )] Theorem 6.6 Given y t = x tβ + e t, suppose that [B1](i), [B2] and [B3] hold. Then, T (ˆβ T β o ) D N (0, D o ), D o = M 1 xx V o M 1 xx. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 17 / 85

Corrollary 6.7 Given y t = x tβ + e t, suppose that (y t x t) are independent random vectors with bounded (4 + δ) th moment for any δ > 0 and that [B2] holds. If M xx defined in [B1] and V o defined in [B3] exist, T (ˆβ T β o ) D N (0, D o ), D o = M 1 xx V o M 1 xx. Proof: Let z t = λ x t ɛ t, where λ is such that λ λ = 1. If {z t } obeys a CLT, then {x t ɛ t } obeys a multivariate CLT by the Cramér-Wold device. Clearly, z t are independent r.v. with mean zero and var(z t ) = λ [var(x t ɛ t )]λ. By data independence, ( 1 V T = var T T x t ɛ t ) = 1 T T var(x t ɛ t ). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 18 / 85

Proof (Cont d): The average of var(z t ) is then 1 T T var(z t ) = λ V T λ λv o λ. By the Cauchy-Schwartz inequality, IE x ti y t 2+δ [ IE x ti 2(2+δ)] 1/2[ IE yt 2(2+δ)] 1/2, for some > 0. Similarly, x ti x tj have bounded (2 + δ) th moment. It follows that x ti ɛ t and z t also have bounded (2 + δ) th moment by Minkowski s inequality. Then by Liapunov s CLT, 1 T (λ V o λ) T z t D N (0, 1). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 19 / 85

Example: Consider y t = αy t 1 + e t. Case 1: y t = α o y t 1 + u t with α o < 1, where u t are i.i.d. with mean zero and variance σu. 2 Note var(y t 1 u t ) = IE(yt 1) 2 IE(u t 2 ) = σu/(1 4 αo), 2 and cov(y t 1 u t, y t 1 j u t j ) = 0 for all j > 0. A CLT ensures: 1 α 2 o σu 2 T T y t 1 u t D N (0, 1). As T y 2 t 1 /T converges to σ2 u/(1 α 2 o), we have 1 α 2 o σ 2 u σ 2 u 1 α 2 o T (ˆαT α o ) = 1 T (ˆαT α 1 α 2 o ) D N (0, 1), o or equivalently, T (ˆα T α o ) D N (0, 1 α 2 o). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 20 / 85

Example (cont d): When {y t } is a random walk: y t = y t 1 + u t. We already know var(t 1/2 T y t 1u t ) diverges with T and hence {y t 1 u t } does not obey a CLT. Thus, there is no guarantee that normalized ˆα T is asymptotically normally distributed. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 21 / 85

Theorem 6.9 Given y t = x tβ + e t, suppose that [B1](i), [B2] and [B3] hold. Then, D 1/2 T T (ˆβT β o ) D N (0, I k ), where D T = ( T x tx t/t ) 1 VT ( T x tx t/t ) 1, with V T IP V o. Remarks: 1 Theorem 6.6 may hold for weakly dependent and heterogeneously distributed data, as long as these data obey proper LLN and CLT. 2 Normalizing the OLS estimator with an inconsistent estimator of D 1/2 o destroys asymptotic normality. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 22 / 85

Consistent Estimation of Covariance Matrix Consistent estimation of D o amounts to consistent estimation of V o. Write V o = lim T V T = lim T 1 T j= T +1 Γ T (j), with 1 T T t=j+1 Γ T (j) = IE(x tɛ t ɛ t j x t j ), j = 0, 1, 2,..., 1 T T t= j+1 IE(x t+jɛ t+j ɛ t x t), j = 1, 2,.... When {x t ɛ t } is weakly stationary, IE(x t ɛ t ɛ t j x t j ) depends only on the time difference j but not on t. Thus, Γ T (j) = Γ T ( j) = IE(x t ɛ t ɛ t j x t j), j = 0, 1, 2,..., and V o = Γ(0) + lim T 2 T 1 j=1 Γ(j). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 23 / 85

Eicker-White Estimator Case 1: When {x t ɛ t } has no serial correlations, V o = lim Γ 1 T (0) = lim T T T T IE(ɛ 2 t x t x t). A heteroskedasticity-consistent estimator of V o is V T = 1 T T êt 2 x t x t, which permits conditional heteroskedasticity of unknown form. The Eicker-White estimator of D o is: ( 1 D T = T ) 1 ( T x t x 1 t T ) ( T êt 2 x t x 1 t T 1 T x t x t). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 24 / 85

The Eicker-White estimator is robust when heteroskedasticity is present and of an unknown form. If ɛ t are also conditionally homoskedastic: IE ( ɛ 2 t Y t 1, W t) = σ 2 o, 1 V o = lim T T T IE [ IE ( ɛ 2 t Y t 1, W t) x t x t] = σ 2 o M xx. Then, D o is M 1 xx V o M 1 xx = σo 2 M 1 xx, and it can be consistently estimated by ( D T = ˆσ T 2 1 T 1 T x t x t), as in the classical model. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 25 / 85

Newey-West Estimator Case 2: When {x t ɛ t } exhibits serial correlations such that l(t ) V T = j= l(t ) Γ T (j) V o, where l(t ) diverges with T, we may try to estimate V T. A difficulty: The sample counterpart l(t ) j= l(t ) Γ T (j), which is based on the sample counterpart of Γ T (j), may not be p.s.d. A heteroskedasticity and autocorrelation-consistent (HAC) estimator that is guaranteed to be p.s.d. has the following form: V κ T = T 1 j= T +1 ( j κ ) ΓT (j), (1) l(t ) where κ is a kernel function and l(t ) is its bandwidth. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 26 / 85

The estimator of D o due to Newey and West (1987), ( D κ 1 T = T ) 1 T x t x t Vκ T ( 1 T 1 T x t x t), is robust to both conditional heteroskedasticity of ɛ t and serial correlations of x t ɛ t. The Eicker-White and Newey-West estimators do not rely on any parametric model of cond. heteroskedasticity and serial correlations. κ satisfies: κ(x) 1, κ(0) = 1, κ(x) = κ( x) for all x R, κ(x) dx <, κ is continuous at 0 and at all but a finite number of other points in R, and κ(x)e ixω dx 0, ω R. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 27 / 85

Some Commonly Used Kernel Functions 1 Bartlett kernel (Newey and West, 1987): κ(x) = 1 x for x 1, and κ(x) = 0 otherwise. 2 Parzen kernel (Gallant, 1987): 1 6x 2 + 6 x 3, x 1/2, κ(x) = 2(1 x ) 3, 1/2 x 1, 0, otherwise; 3 Quadratic spectral kernel (Andrews, 1991): κ(x) = 25 ( ) sin(6πx/5) 12π 2 x 2 cos(6πx/5) ; 6πx/5 4 Daniel kernel (Ng and Perron, 1996): κ(x) = sin(πx) πx. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 28 / 85

κ(x) 0.2 0.0 0.2 0.4 0.6 0.8 1.0 Bartlett Parzen Quadratic Spectral Daniell 0 1 2 3 4 x Figure: The Bartlett, Parzen, quandratic spectral and Daniel kernels. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 29 / 85

Remarks: Bandwidth l(t ): It can be of order o(t 1/2 ), Andrews (1991). (What does this imply?) The Bartlett and Parzen kernels have the bounded support [ 1, 1], but the quadratic spectral and Daniel kernels have unbounded support. Andrews (1991): The quadratic spectral kernel is to be preferred in HAC estimation. Rate of convergence: O(T 1/3 ) for the Bartlett kernel, and O(T 2/5 ) for the Parzen and quadratic spectral. The quadratic spectral kernel is more efficient asymptotically than the Parzen kernel, and the Bartlett kernel is the least efficient. The optimal choice of l(t ) is an important issue in practice. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 30 / 85

Wald Test Null hypothesis: Rβ o = r Want to check if Rˆβ T is sufficiently close to r. By Theorem 6.6, (RD o R ) 1/2 T R(ˆβ T β o ) D o = M 1 xx V o M 1 xx. Given a consistent estimator for D o : ( 1 D T = T ) 1 ( T x t x 1 t VT T 1 T x t x t), with V T be a consistent estimator of V o, we have (R D T R ) 1/2 T R(ˆβ T β o ) D N (0, I q ). D N (0, I q ), where C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 31 / 85

The Wald test statistic is W T = T (Rˆβ T r) (R D T R ) 1 (Rˆβ T r). Theorem 6.10 Given y t = x tβ + e t, suppose that [B1](i), [B2] and [B3] hold. Then D under the null, W T χ 2 (q), where q is the number of hypotheses. Data are not required to be serially uncorrelated, homoskedastic, or normally distributed. The limiting χ 2 distribution of the Wald test is only an approximation to the exact distribution. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 32 / 85

Example: Given the specification y t = x 1,t b 1 + x 2,t b 2 + e t, where x 1,t is (k s) 1 and x 2,t is s 1. Hypothesis: Rβ o = 0, where R = [0 s (k s) I s ]. The Wald test statistic is W T = T ˆβ T R ( R D T R ) 1 Rˆβ T D χ 2 (s), where D T = (X X/T ) 1 ˆV T (X X/T ) 1. The exact form of W T depends on D T. When V T = ˆσ 2 T (X X/T ) is consistent for V o, D T = ˆσ 2 T (X X/T ) 1 is consistent for D o, and the Wald statistic becomes W T = T ˆβ T R [ R(X X/T ) 1 R ] 1 Rˆβ T /ˆσ 2 T, which is s times the standard F statistic. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 33 / 85

Lagrange Multiplier (LM) Test Given the constraint Rβ = r, the Lagrangian is 1 T (y Xβ) (y Xβ) + (Rβ r) λ, where λ is the q 1 vector of Lagrange multipliers. The solutions are: λ T = 2 [ R(X X/T ) 1 R ] 1 (Rˆβ T r), β T = ˆβ T (X X/T ) 1 R λ T /2. The LM test checks if λ T (the shadow price of the constraint) is sufficiently close to zero. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 34 / 85

By the asymptotic normality of T (Rˆβ T r), Λ 1/2 o T λ T D N (0, Iq ), where Λ o = 4(RM 1 xx R ) 1 (RD o R )(RM 1 xx R ) 1. Let V T be a consistent estimator of V o based on the constrained estimation result. Then, Λ T = 4 [ R(X X/T ) 1 R ] 1[ R(X X/T ) 1 V T (X X/T ) 1 R ] [ R(X X/T ) 1 R ] 1, and Λ 1/2 T T λ T LM T = T λ 1 T Λ λ T T. D N (0, Iq ). The LM statistic is C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 35 / 85

Theorem 6.12 Given y t = x tβ + e t, suppose that [B1](i), [B2] and [B3] hold. Then D under the null, LM T χ 2 (q), where q is the number of hypotheses. Writing Rˆβ T r = R(X X/T ) 1 X (y X β T )/T = R(X X/T ) 1 X ë/t, λ T = 2 [ R(X X/T ) 1 R ] 1 R(X X/T ) 1 X ë/t. The LM test is then LM T = T ë X(X X) 1 R [ R(X X/T ) 1 VT (X X/T ) 1 R ] 1 R(X X) 1 X ë. That is, the LM test requires only constrained estimation. IP Note: Under the null, W T LM T 0; if V o is known, the Wald and LM tests would be algebraically equivalent. (why?) C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 36 / 85

Example: Testing whether one would like to add additional s regressors to the specification: y t = x 1,t b 1 + e t. The unconstrained specification is y t = x 1,tb 1 + x 2,tb 2 + e t, and the null hypothesis is Rβ o = 0 with R = [0 s (k s) I s ]. The LM test can be computed as in previous page, using the constrained estimator β T = ( b 1,T 0 ) with b 1,T = (X 1X 1 ) 1 X 1y. Letting X = [X 1 X 2 ] and ë = y X 1 b 1,T, suppose that V T = σ T 2 (X X/T ) is consistent for V o under the null, where σ T 2 = T ë2 t /(T k + s). Then, the LM test is LM T = T ë X(X X) 1 R [ R(X X/T ) 1 R ] 1 R(X X) 1 X ë/ σ 2 T. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 37 / 85

Using the formula for the inverse of a partitioned matrix, R(X X) 1 R = [X 2(I P 1 )X 2 ] 1, R(X X) 1 X = [X 2(I P 1 )X 2 ] 1 X 2(I P 1 ). Clearly, (I P 1 )ë = ë. The LM statistic is thus LM T = ë X(X X) 1 R [ R(X X) 1 R ] 1 R(X X) 1 X ë/ σ T 2 = ë (I P 1 )X 2 [X 2(I P 1 )X 2 ] 1 X 2(I P 1 )ë/ σ T 2 = ë X 2 [X 2(I P 1 )X 2 ] 1 X 2ë/ σ2 T = ë X 2 R(X X) 1 R X 2ë/ σ2 T. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 38 / 85

As X 1ë = 0, we can write ë X 2 R = [0 1 (k s) ë X 2 ] = ë X. A simple version of the LM test reads LM T = ë X(X X) 1 X ë ë ë/(t k + s) = (T k + s)r2, where R 2 is the non-centered R 2 of the auxiliary regression of ë on X. Note: The LM test may also be computed as TR 2, if σ 2 T = ë ë/t is an MLE estimator. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 39 / 85

Likelihood Ratio (LR) Test The OLS estimator ˆβ T is also the MLE β T that maximizes L T (β, σ 2 ) = 1 2 log(2π) 1 2 log(σ2 ) 1 T T (y t x tβ) 2 2σ 2. With ê t = y t x t β T, the unconstrained MLE of σ 2 is σ 2 T = 1 T T êt 2. Given Rβ = r, let β T denote the constrained MLE of β. Then ë t = y t x t β T, and the constrained MLE of σ 2 is σ 2 T = 1 T T ët 2. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 40 / 85

For H 0 : Rβ o = r, the LR test compares the constrained and unconstrained L T : LR T = 2T ( ( L T ( β T, σ T 2 ) L T ( β σ T, σ T 2 2 ) )) = T log T σ T 2. The null would be rejected if LR T is far from zero. Theorem 6.15 Given y t = x tβ + e t, suppose that [B1](i), [B2] and [B3] hold and that σ 2 T (X X/T ) is consistent for V o. Then under the null hypothesis, LR T D χ 2 (q), where q is the number of hypotheses. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 41 / 85

Noting ë = X( β T β T ) + ê and X ê = 0, we have σ T 2 = σ2 T + ( β T β T ) (X X/T )( β T β T ). We have seen β T β T = (X X/T ) 1 R [ R(X X/T ) 1 R ] 1 (Rˆβ T r). It follows that and that σ 2 T = σ2 T + (R β T r) [R(X X/T ) 1 R ] 1 (R β T r), LR T = T log ( 1 + (R β T r) [R(X X/T ) 1 R ] 1 (R β T r)/ σ T 2 ). }{{} =: a T C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 42 / 85

Owing to consistency of ˆβ T, a T 0. The mean value expansion of log(1 + a T ) about a T = 0 yields log(1 + a T ) (1 + a T ) 1 a T, where a T lies between a T and 0 and converges to zero. Then, LR T = T (1 + a T ) 1 a T = Ta T + o IP (1), where Ta T is the Wald statistic with V T = σ T 2 (X X/T ). When this V T is consistent for V o, LR T has a limiting χ 2 (q) distribution. Note: The applicability of the LR test here is limited because it can not be made robust to conditional heteroskedasticity and serial correlation. (Why?) C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 43 / 85

Remarks: When the Wald test involves V T = σ 2 T (X X/T ) and the LM test uses V T = σ 2 T (X X/T ), it can be shown that W T LR T LM T. Hence, conflicting inferences in finite samples may arise when the critical values are between two statistics. When V T = σ T 2 (X X/T ) and V T = σ T 2 (X X/T ) are all consistent for V o, the Wald, LM, and LR tests are asymptotically equivalent. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 44 / 85

Power of Tests Consider the alternative hypothesis: Rβ o = r + δ, where δ 0. Under the alternative, T (Rˆβ T r) = T R(ˆβ T β o ) + T δ, where the first term on the RHS converges and the second term diverges. We have IP(W T > c) 1 for any critical value c, because 1 T W T IP δ (RD o R ) 1 δ. The Wald test is therefore a consistent test. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 45 / 85

Example: Analysis of Suicide Rate Part I: Estimation results based on different covariance matrices const D t u t 1 u t 1 D t R2 OLS Coeff. 5.60 0.75 1.93 0.52 0.64 OLS s.e. 2.32 3.00 1.17 1.27 White s.e. 2.23 2.55 0.96 1.04 NW-B s.e. 2.79 3.62 1.09 1.26 (t-ratio) (2.00) ( 0.21) (1.78) (0.42) NW-QS s.e. 2.94 3.98 1.13 1.32 (t-ratio) (1.91) ( 0.19) (1.72) (0.40) FGLS Coeff. 19.14 0.73 0.13 0.10 NW-B and NW-QS stand for the Newey-West estimates based on the Bartlett and quadratic spectral kernels, respectively, with the truncation lag chosen by the package in R; D t = 1 for t > T = 1994. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 46 / 85

Part II: Estimation results based on different covariance matrices const D t u t 1 t td t R2 OLS Coeff. 12.36 15.26 0.38 0.50 1.19 0.91 OLS s.e. 1.05 2.04 0.36 0.08 0.14 White s.e. 0.71 1.41 0.26 0.05 0.09 NW-B s.e. 1.67 14.58 0.86 0.04 0.70 (t-ratio) (7.41) ( 1.05) (0.44) ( 14.10) (1.69) NW-QS s.e. 1.94 17.35 1.01 0.04 0.84 (t-ratio) (6.37) ( 0.88) (0.38) ( 12.43) (1.42) FGLS Coeff. 14.67 18.22 0.23 0.56 1.35 NW-B and NW-QS stand for the Newey-West estimates based on the Bartlett and quadratic spectral kernels, respectively, with the truncation lag chosen by the package in R; D t = 1 for t > T = 1994. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 47 / 85

Instrumental Variable Estimator OLS inconsistency: 1 A model omits relevant regressors. 2 A model includes lagged dependent variables as regressors and serially correlated errors. 3 A model involves regressors that are measured with errors. 4 The dependent variable and regressors are jointly determined at the same time (simultaneity problem). 5 The dependent variable is determined by some unobservable factors which are correlated with regressors (selectivity problem). To obtain consistency, let z t (k 1) be variables taken from (Y t 1, W t ) such that IE(z t ɛ t ) = 0 and z t are correlated with x t in the sense that IE(z t x t) is not singular. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 48 / 85

The sample counterpart of IE(z t ɛ t ) = IE[z t (y t x tβ o )] = 0 is 1 T T [z t (y t x tβ)] = 0, which is a system of k equations with k unknowns. The solution is the instrumental variable (IV) estimator: ( T ) 1 ( T ) ˇβ T = z t x IP t z t y t M 1 zx m zy = β o, under suitable LLN. This is also a method of moment estimator, because it solves the sample counterpart of the moment conditions: IE[z t (y t x tβ o )] = 0. This method breaks down when more than k instruments are available. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 49 / 85

Assume CLT: T 1/2 T z D tɛ t N (0, Vo ) with ( T ) z t ɛ t. V o = lim T var 1 T The normalized IV estimator has asymptotic normality: ( 1 T (ˇβ T β o ) = T ) 1 ( T z t x 1 t T where D o = M 1 zx V o M 1 zx. Then, T (ˇβ T β o ) D 1/2 T estimator for D o. T ) D z t ɛ t N (0, D o ), D N (0, I k ), where D T is a consistent C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 50 / 85

I (1) Variables {y t } is said to be an I (1) (integrated of order 1) process if y t = y t 1 + ɛ t, with ɛ t satisfying: [C1] {ɛ t } is a weakly stationary process with mean zero and variance σ 2 ɛ and obeys an FCLT: 1 σ T [Tr] ɛ t = 1 σ T y [Tr] w(r), 0 r 1, where w is standard Wiener process, and σ 2 is the long-run variance of ɛ t : σ 2 = lim T var ( 1 T T ɛ t ). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 51 / 85

Partial sums of an I (0) series (e.g., t i=1 ɛ i) form an I (1) series, while taking first difference of an I (1) series (e.g., y t y t 1 ) yields an I (0) series. A random walk is I (1) with i.i.d. ɛ t and σ 2 = σ 2 ɛ. When ɛ t = y t y t 1 is a stationary ARMA(p, q) process, y is an I (1) process and known as an ARIMA(p, 1, q) process. An I (1) series y t has mean zero and variance increasing linearly with t, and its autocovariances cov(y t, y s ) do not decrease when t s increases. Many macroeconomic and financial time series are (or behave like) I (1) processes. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 52 / 85

ARIMA vs. ARMA Processes 15 ARIMA ARMA 25 20 ARIMA ARMA 10 15 5 10 0 5 5 0 0 50 100 150 200 250 0 50 100 150 200 250 Figure: Sample paths of ARIMA and ARMA series. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 53 / 85

I (1) vs. Trend Stationarity Trend stationary series: y t = a o + b o t + ɛ t, where ɛ t are I (0). 25 random walk 30 random walk 20 15 10 25 20 15 10 5 5 0 0 0 50 100 150 200 250 0 50 100 150 200 250 Figure: Sample paths of random walk and trend stationary series. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 54 / 85

Autoregression of an I (1) Variable Suppose {y t } is a random walk such that y t = α o y t 1 + ɛ t with α o = 1 and ɛ t i.i.d. random variables with mean zero and variance σ 2 ɛ. {y t } does not obey a LLN, and T t=2 y t 1ɛ t = O IP (T ) and T t=2 y 2 t 1 = O IP(T 2 ). Given the specification: y t = αy t 1 + e t, the OLS estimator of α is: ˆα T = T t=2 y t 1y t T t=2 y 2 t 1 = 1 + T t=2 y t 1ɛ t T t=2 y 2 t 1 = 1 + O IP (T 1 ), which is T -consistent. This is also known as a super consistent estimator. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 55 / 85

Asymptotic Properties of the OLS Estimator Lemma 7.1 Let y t = y t 1 + ɛ t be an I (1) series with ɛ t satisfying [C1]. Then, (i) T 3/2 1 T y t 1 σ w(r) dr; (ii) T 2 T y 2 t 1 σ2 (iii) T 1 T y t 1ɛ t 1 2 [σ2 w(1) 2 σ 2 ɛ ] = σ 2 0 1 0 1 0 w(r) 2 dr; where w is the standard Wiener process. Note: When y t is a random walk, σ 2 = σ 2 ɛ. w(r) dw(r) + 1 2 (σ2 σ 2 ɛ ), C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 56 / 85

Theorem 7.2 Let y t = y t 1 + ɛ t be an I (1) series with ɛ t satisfying [C1]. Given the specification y t = αy t 1 + e t, the normalized OLS estimator of α is: T (ˆα T 1) = T t=2 y 1 t 1ɛ t /T T t=2 y t 1 2 /T 2 2[ w(1) 2 σɛ 2 /σ 2 ] 1 0 w(r)2 dr where w is the standard Wiener process. When y t is a random walk, [ w(1) 2 1 ] T (ˆα T 1) 1 2 1 0 w(r)2 dr, which does not depend on σ 2 ɛ and σ 2 and is asymptotically pivotal.. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 57 / 85

Lemma 7.3 Let y t = y t 1 + ɛ t be an I (1) series with ɛ t satisfying [C1]. Then, (i) T 2 T (y t 1 ȳ 1 ) 2 σ 2 (ii) T 1 T (y t 1 ȳ 1 )ɛ t σ 2 1 0 1 0 w (r) 2 dr; w (r) dw(r) + 1 2 (σ2 σ 2 ɛ ), where w is the standard Wiener process and w (t) = w(t) 1 0 w(r) dr. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 58 / 85

Theorem 7.4 Let y t = y t 1 + ɛ t be an I (1) series with ɛ t satisfying [C1]. Given the specification y t = c + αy t 1 + e t, the normalized OLS estimators of α and c are: T (ˆα T 1) 1 0 w (r) dw(r) + 1 2 (1 σ2 ɛ /σ ) 2 1 0 w (r) 2 dr 1 T ĉt A (σ w(r) dr 0 In particular, when y t is a random walk, ) + σ w(1). =: A, T (ˆα T 1) 1 0 w (r) dw(r) 1 0 w. (r) 2 dr C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 59 / 85

The limiting results for autoregressions with an I (1) variable are not invariant to model specification. All the results here are based on the data with DGP: y t = y t 1 + ɛ t. intercept. These results would break down if the DGP is y t = c o + y t 1 + ɛ t with a non-zero c o ; such series are said to be I (1) with drift. I (1) process with a drift: t y t = c o + y t 1 + ɛ t = c o t + ɛ i, i=1 which contains a deterministic trend and an I (1) series without drift. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 60 / 85

Tests of Unit Root 1 Given the specification y t = αy t 1 + e t, the unit root hypothesis is α o = 1, and a leading unit-root test is the t test: τ 0 = ( T t=2 y t 1 2 ) 1/2(ˆαT 1), ˆσ T,1 where ˆσ 2 T,1 = T t=2 (y t ˆα T y t 1 ) 2 /(T 2). 2 Given the specification y t = c + αy t 1 + e t, a unit-root test is τ c = [ T t=2 (y t 1 ȳ 1 ) 2] 1/2 (ˆαT 1) ˆσ T,2, where ˆσ 2 T,2 = T t=2 (y t ĉ T ˆα T y t 1 ) 2 /(T 3). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 61 / 85

Theorem 7.5 Let y t be generated as a random walk. Then, τ 0 τ c 1 2 [w(1)2 1] [ 1 0 w(r)2 dr ] 1/2, 1 0 w (r) dw(r) [ 1 0 w (r) 2 dr ] 1/2. For the specification with a time trend variable: ( y t = c + αy t 1 + β t T ) + e 2 t, the t-statistic of α o = 1 is denoted as τ t. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 62 / 85

Dickey-Fuller distributions Table: Some percentiles of the Dickey-Fuller distributions. Test 1% 2.5% 5% 10% 50% 90% 95% 97.5% 99% τ 0 2.58 2.23 1.95 1.62 0.51 0.89 1.28 1.62 2.01 τ c 3.42 3.12 2.86 2.57 1.57 0.44 0.08 0.23 0.60 τ t 3.96 3.67 3.41 3.13 2.18 1.25 0.94 0.66 0.32 These distributions are not symmetric about zero and assume more negative values. τ c assumes negatives values about 95% of times, and τ t is virtually a non-positive random variable. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 63 / 85

The Dickey-Fuller Distributions 0.0 0.1 0.2 0.3 0.4 0.5 0.6 τ c τ 0 N (0, 1) 4 2 0 2 4 Figure: The distributions of the Dickey-Fuller τ 0 and τ c tests vs. N (0, 1). 1 C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 64 / 85

Implementation In practice, we estimate one of the following specifications: 1 y t = θy t 1 + e t. 2 y t = c + θy t 1 + e t. 3 y t = c + θy t 1 + β ( t T ) 2 + et. The unit-root hypothesis α o = 1 is now equivalent to θ o = 0. The weak limits of the normalized estimators T ˆθ T are the same as the respective limits of T (ˆα T 1) under the null hypothesis. The unit-root tests are now computed as the t-ratios of these specifications. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 65 / 85

Phillips-Perron Tests Note: The Dickey-Fuller tests check only the random walk hypothesis and are invalid for testing general I (1) processes. Theorem 7.6 Let y t = y t 1 + ɛ t be an I (1) series with ɛ t satisfying [C1]. Then, τ 0 σ σ ɛ τ c σ σ ɛ ( ) 1 2 [w(1)2 σɛ 2 /σ ] 2 [ 1 0 w(r)2 dr ], 1/2 ( 1 0 w (r) dw(r) + 1 2 (1 σ2 ɛ /σ ) 2 [ 1 0 w (r) 2 dr ] 1/2 ), C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 66 / 85

Let ê t denote the OLS residuals and stn 2 a Newey-West type estimator of σ 2 based on ê t : s 2 Tn = 1 T 1 T t=2 êt 2 + 2 T 2 ( s ) T κ T 1 n s=1 t=s+2 with κ a kernel function and n = n(t ) its bandwidth. ê t ê t s, Phillips (1987) proposed the following modified τ 0 and τ c statistics: Z(τ 0 ) = ˆσ 1 T 2 τ s 0 (s2 Tn ˆσ2 T ( ) Tn s T Tn t=2 y t 1 2 /T 2) 1/2, Z(τ c ) = ˆσ 1 T 2 τ s c (s2 T ˆσ2 T [ ) Tn s T Tn t=2 (y t 1 ȳ 1 ) 2] 1/2 ; see also Phillips and Perron (1988). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 67 / 85

The Phillips-Perron tests eliminate the nuisance parameters by suitable transformations of τ 0 and τ c and have the same limits as those of the Dickey-Fuller tests. Corollary 7.7. Let y t = y t 1 + ɛ t be an I (1) series with ɛ t satisfying [C1]. Then, 1 2 [ w(1) 2 1 ] Z(τ 0 ) [ 1 0 w(r)2 dr ] 1/2, 1 0 Z(τ c ) w (r) dw(r) [ 1 0 w (r) 2 dr ] 1/2. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 68 / 85

Augmented Dickey-Fuller (ADF) Tests Said and Dickey (1984) suggest filtering out the correlations in a weakly stationary process by a linear AR model with a proper order. The augmented specifications are: 1 y t = θy t 1 + k j=1 γ j y t j + e t. 2 y t = c + θy t 1 + k j=1 γ j y t j + e t. ( ) 3 y t = c + θy t 1 + β t T 2 + k j=1 γ j y t j + e t. Note: This approach avoids non-parametric kernel estimation of σ 2 but requires choosing a proper lag order k for the augmented specifications (say, by a model selection criteria, such as AIC or SIC). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 69 / 85

KPSS Tests {y t } is trend stationary if it fluctuates around a deterministic trend: y t = a o + b o t + ɛ t, where ɛ t satisfy [C1]. When b o = 0, it is level stationary. Kwiatkowski, Phillips, Schmidt, and Shin (1992) proposed testing stationarity by η T = 1 T 2 s 2 Tn ( T t ) 2 ê i, i=1 where s 2 Tn is a Newey-West estimator of σ2 based on ê t. To test the null of trend stationarity, ê t = y t â T ˆb T t. To test the null of level stationarity, ê t = y t ȳ. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 70 / 85

The partial sums of ê t = y t ȳ are such that [Tr] [Tr] [Tr] ê t = (ɛ t ɛ) = ɛ t [Tr] T Then by a suitable FCLT, T ɛ t, r (0, 1]. 1 σ T [Tr] ê t w(r) rw(1) = w 0 (r). Similarly, given ê t = y t â T ˆb T t, 1 σ T [Tr] ê t w(r) + (2r 3r 2 )w(1) (6r 6r 2 ) 1 0 w(s) ds, which is a tide-down process (it is zero at r = 1 with prob. one). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 71 / 85

Theorem 7.8 Let y t = a o + b o t + ɛ t with ɛ t satisfying [C1]. Then, η T computed from ê t = y t â T ˆb T t is: η T 1 0 f (r) 2 dr, where f (r) = w(r) + (2r 3r 2 )w(1) (6r 6r 2 ) 1 0 w(s) ds. Let y t = a o + ɛ t with ɛ t satisfying [C1]. Then, η T computed from ê t = y t ȳ is: η T 1 0 w 0 (r) 2 dr, where w 0 is the Brownian bridge. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 72 / 85

Table: Some percentiles of the distributions of the KPSS test. Test 1% 2.5% 5% 10% level stationarity 0.739 0.574 0.463 0.347 trend stationarity 0.216 0.176 0.146 0.119 These tests have power against I (1) series because η T would diverge under I (1) alternatives. KPSS tests also have power against other alternatives, such as stationarity with mean changes and trend stationarity with trend breaks. Thus, rejecting the null of stationarity does not imply that the series must be I (1). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 73 / 85

The KPSS Distributions 0 5 10 15 trend level 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Figure: The distributions of the KPSS tests. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 74 / 85

Spurious Regressions Granger and Newbold (1974): Regressing one random walk on the other typically yields a significant t-ratio. They refer to this result as spurious regression. Given the specification y t = α + βx t + e t, let ˆα T and ˆβ T denote the OLS estimators for α and β, respectively, and the corresponding t-ratios: t α = ˆα T /s α and t β = ˆβ T /s β, where s α and s β are the OLS standard errors for ˆα T and ˆβ T. y t = y t 1 + u t and x t = x t 1 + v t, where {u t } and {v t } are mutually independent processes satisfying the following condition. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 75 / 85

[C2] {u t } and {v t } are two weakly stationary processes with mean zero and respective variances σ 2 u and σ 2 v and obey an FCLT with: ( T ) 2 σy 2 1 = lim T T IE u t, ( T ) 2 σx 2 1 = lim T T IE v t. We have the following results: 1 T 3/2 T 1 y t σ y w y (r) dr, 0 1 T 2 T yt 2 σy 2 1 0 w y (r) 2 dr, where w y is a standard Wiener processes. Similarly, 1 T 3/2 T 1 x t σ x w x (r) dr, 0 1 T 2 T xt 2 σx 2 1 0 w x (r) 2 dr. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 76 / 85

We also have 1 T 2 1 T 2 T (y t ȳ) 2 σy 2 T (x t x) 2 σx 2 1 0 1 0 w y (r) 2 dr σ 2 y w x (r) 2 dr σ 2 x ( 1 0 ( 1 0 ) 2 w y (r) dr =: σy 2 m y, ) 2 w x (r) dr =: σxm 2 x, where w y (t) = w y (t) 1 0 w y (r) dr and w x (t) = w x (t) 1 0 w x(r) dr are two mutually independent, de-meaned Wiener processes. Also, 1 T 2 T (y t ȳ)(x t x t ) ( 1 σ y σ x w y (r)w x (r) dr 0 =: σ y σ x m yx. 1 0 w y (r) dr 1 0 ) w x (r) dr C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 77 / 85

Theorem 7.9 Let y t = y t 1 + u t and x t = x t 1 + v t, where {u t } and {v t } are mutually independent and satisfy [C2]. Given the specification y t = α + βx t + e t, (i) ˆβ T σ y m yx σ x m x, ( 1 (ii) T 1/2 ˆα T σ y w y (r) dr m yx 0 m yx (iii) T 1/2 t β (m y m x myx) 2 1/2, 1 m x 1 0 ) w x (r) dr, (iv) T 1/2 t α m x 0 w 1 y (r) dr m yx 0 w x(r) dr [ (my m x myx) 2 1 0 w x(r) 2 dr ] 1/2, w x and w y are two mutually independent, standard Wiener processes. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 78 / 85

While the true parameters should be α o = β o = 0, ˆβ T has a limiting distribution, and ˆα T diverges at the rate T 1/2. Theorem 7.9 (iii) and (iv) indicate that t α and t β both diverge at the rate T 1/2 and are likely to reject the null of α o = β o = 0 using the critical values from the standard normal distribution. Spurious trend: Nelson and Kang (1984) showed that, when {y t } is in fact a random walk, one may easily find significant time trend specification: y t = a + b t + e t. Phillips and Durlauf (1986) demonstrate that the F test (and hence the t-ratio) of b o = 0 in the time trend specification above diverges at the rate T, which explains why an incorrect inference would result. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 79 / 85

Cointegration Consider an equilibrium relation between y and x: ay bx = 0. With real data (y t, x t ), z t := ay t bx t are equilibrium errors because they need not be zero all the time. y t and x t are both I (1): A linear combination of them, z t, is, in general, an I (1) series. Then, {z t } rarely crosses zero, and the equilibrium condition entails little empirical restriction on z t. When y t and x t involve the same random walk q t such that y t = q t + u t and x t = cq t + v t, where {u t } and {v t } are I (0). Then, z t := cy t x t = cu t v t, which is a linear combination of I (0) series and hence is also I (0). C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 80 / 85

Granger (1981), Granger and Weiss (1983), and Engle and Granger (1987): Let y t be a d-dimensional vector I (1) series. The elements of y t are cointegrated if there exists a d 1 vector, α, such that z t = α y t is I (0). We say the elements of y t are CI(1,1). The vector α is a cointegrating vector. The space spanned by linearly independent cointegating vectors is the cointegrating space; the number of linearly independent cointegrating vectors is the cointegrating rank which is the dimension of the cointegrating space. If the cointegrating rank is r, we can put r linearly independent cointegrating vectors together and form the d r matrix A such that z t = A y t is a vector I (0) series. The cointegrating rank is at most d 1. (Why?) C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 81 / 85

Cointegrating Regression Cointegrating regression: y 1,t = α y 2,t + z t. Then, (1 α ) is the cointegrating vector and z t are the regression (equilibrium) errors. When the elements of y t are cointegrated, z t is correlated with y 2,t. Consistency of the OLS estimators do not matter asymptotically, but correlation would result in finite-sample bias and efficiency loss. Efficiency: Saikkonen (1991) proposed a modified co-integrating regression: k y 1,t = α y 2,t + y 2,t jb j + e t, j= k so that the OLS estimator of α is asymptotically efficient. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 82 / 85

Tests of Cointegration One can verify a cointegration relation by applying unit-root tests, such as the augmented Dickey-Fuller test and the Phillips-Perron test, to ẑ t. The null hypothesis that a unit root is present is equivalent to the hypothesis of no cointegration. To implement a unit-root test on cointegration residuals ẑ T, a difficulty is that ẑ T is not a raw series but a result of OLS fitting. Thus, even when z t may be I (1), the residuals ẑ t may not have much variation and hence behave like a stationary series. Engle and Granger (1987), Engle and Yoo (1987), and Davidson and MacKinnon (1993) simulated proper critical values for the unit-root tests on cointegrating residuals. Similar to the unit-root tests discussed earlier, these critical values are all model dependent. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 83 / 85

Table: Some percentiles of the distributions of the cointegration τ c test. d 1% 2.5% 5% 10% 2 3.90 3.59 3.34 3.04 3 4.29 4.00 3.74 3.45 4 4.64 4.35 4.10 3.81 Drawbacks of cointegrating regressions: 1 The choice of the dependent variable is somewhat arbitrary. 2 This approach is more suitable for finding only one cointegrating relationship. One may estimate multiple cointegration relations by a vector regression. It is now typical to adopt the maximum likelihood approach of Johansen (1988) to estimate the cointegrating space directly. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 84 / 85

Error Correction Model (ECM) When the elements of y t are cointegrated with A y t = z t, then there exists an error correction model (ECM): y t = Bz t 1 + C 1 y t 1 + + C k y t k + ν t. Cointegration characterizes the long-run equilibrium relations because it deals with the levels of I (1) variables, and the ECM deals with the differences of variables and describes short-run dynamics. When cointegration exists, a vector AR model of y t is misspecified because it omits z t 1, and the parameter estimates are inconsistent. We regress y t on ẑ t 1 and lagged y t. Here, standard asymptotic theory applies because ECM involves only stationary variables when cointegration exists. C.-M. Kuan (National Taiwan Univ.) Asymptotic Least Squares Theory December 5, 2011 85 / 85