Faculty of Economics

Cambridge Working Paper Economics: 85

EXTREME CANONICAL CORRELATIONS AND HIGH-DIMENSIONAL COINTEGRATION ANALYSIS

Alexei Onatski (University of Cambridge)
Chen Wang (University of Cambridge)

25 January 2018

Extreme canonical correlations and high-dimensional cointegration analysis

Alexei Onatski and Chen Wang

Faculty of Economics, University of Cambridge

August 5, 2017

Abstract

The simplest version of Johansen's (1988) trace test for cointegration is based on the squared sample canonical correlations between a random walk and its own innovations. Onatski and Wang (2017) show that the empirical distribution of such squared canonical correlations weakly converges to the Wachter distribution as the sample size and the dimensionality of the random walk go to infinity proportionally. In this paper we prove that, in addition, the extreme squared correlations almost surely converge to the upper and lower boundaries of the support of the Wachter distribution. This result yields strong laws of large numbers for the averages of functions of the squared canonical correlations that may be discontinuous or unbounded outside the support of the Wachter distribution. In particular, we establish the a.s. limit of the scaled Johansen's trace statistic, which has a logarithmic singularity at unity. We use this limit to derive a previously unknown analytic expression for the Bartlett-type correction coefficient for Johansen's test in a high-dimensional environment.

Keywords: High-dimensional random walk, cointegration, extreme canonical correlations, Wachter distribution, trace statistic.

1 Introduction and the main result

Analysis of cointegration between a large number of time series is a challenging but useful exercise. Its applications include high-dimensional vector error correction modelling for forecasting purposes (Engel et al. (2015)), inference in nonstationary

panel data models (Banerjee et al. (2014), Pedroni et al. (2015)), and verification of the assumptions under which composite commodity price indexes satisfy microeconomic laws of demand (Lewbel (1993), Brown (23)). With the increasing availability of large datasets, the need for high-dimensional cointegration research will multiply.

A central role in likelihood-based cointegration analysis is played by the squared sample canonical correlation coefficients between a simple transformation of the levels and the first differences of the data. This paper and its companion, Onatski and Wang (2017), study such canonical correlations under the simultaneous asymptotic regime, where the dimensionality of the data goes to infinity proportionally to the sample size. Onatski and Wang (2017) (henceforth OW17) show that the empirical distribution of the squared sample canonical correlations weakly converges to the so-called Wachter distribution. They use this result to explain the severe over-rejection of the no-cointegration hypothesis when the dimensionality of the data is relatively large.

In this paper, we show that the extreme squared canonical correlations almost surely (a.s.) converge to the upper and lower boundaries of the support of the Wachter distribution. Our finding yields strong laws of large numbers for the averages of functions of the squared canonical correlations that may be discontinuous or unbounded outside an open interval containing the support of the Wachter distribution. In particular, we establish the a.s. limit of the scaled Johansen's (1988) trace statistic, which has a logarithmic singularity at unity. We use this limit to derive an explicit expression for the Bartlett-type correction coefficient for Johansen's test. Such an expression was previously unknown, and the value of the coefficient had to be obtained numerically (see Johansen et al. (2005)).

Our setting can be described in the context of likelihood ratio testing for no cointegration in the model

$$\Delta X_t = \Pi\left(X_{t-1} - t\hat\rho\right) + \gamma + \eta_t, \qquad (1)$$

where $X_t$, $t = 1, \ldots, T+1$, are $p$-dimensional data, $\Delta X_t = X_t - X_{t-1}$ with $X_0 = 0$, the $\eta_t$ are i.i.d. $N(0, \Sigma)$ vectors, and $\hat\rho = X_{T+1}/(T+1)$. This model is similar to Johansen's (1995, eq. 5.4) model

$$H\colon\ \Delta X_t = \Pi\left(X_{t-1} - t\rho\right) + \gamma + \eta_t, \qquad (2)$$

where the deterministic trend is introduced so that there is no quadratic trend in $X_t$. In (1), $\rho$ is replaced by the preliminary estimate $\hat\rho$. Such a replacement yields the simultaneous diagonalizability of the matrices used in the computation of the squared canonical correlations, which makes our theoretical analysis possible. We explain this in more detail in Section 5.

As is well known, the LR statistic for testing the null hypothesis that $\Pi = 0$ against an unrestricted $\Pi$ equals

$$\mathrm{LR} = -(T+1)\sum_{j=1}^{p}\ln\left(1 - \lambda_{pj}\right), \qquad (3)$$

where $\lambda_{pj}$ is the $j$-th largest squared sample canonical correlation between the demeaned vectors $\Delta X_t$ and $X_{t-1} - t\hat\rho$. In what follows, we always assume that the null hypothesis holds, so that the true value of $\Pi$ is zero. In addition, we assume that the true value of $\gamma$ in the data generating process (1) is zero as well. Note that demeaning $X_{t-1} - t\hat\rho$ and $X_{t-1} - (t-1)\hat\rho$ yields the same result. On the other hand, $X_{t-1} - (t-1)\hat\rho$ is a $p$-dimensional random walk detrended so that its last values are tied down to zero. Hence, the $\lambda_{pj}$ can be interpreted as the squared sample canonical correlations between a lagged detrended and demeaned random walk and its demeaned innovations.

Consider the simultaneous asymptotic regime where $p, T \to \infty$ so that $p/T \to c$. We abbreviate such a regime as $p, T \to_c \infty$. Without loss of generality, we assume that $p$ is strictly increasing along the sequence, so that $T$ can be viewed as a function of $p$. OW17 shows that, as $p, T \to_c \infty$ with $c \in (0, 1]$, the empirical distribution of $\lambda_{p1} \geq \cdots \geq \lambda_{pp}$,

$$F_p(\lambda) = \frac{1}{p}\sum_{i=1}^{p}\mathbf{1}\{\lambda_{pi} \leq \lambda\},$$

a.s. weakly converges to the Wachter distribution $W_c$ with an atom of size $\max\{0,\, 2 - 1/c\}$ at unity and density

$$f(\lambda; c) = \frac{1+c}{2\pi c\lambda(1-\lambda)}\sqrt{(b_+ - \lambda)(\lambda - b_-)} \qquad (4)$$

supported on $[b_-, b_+] \subseteq (0, 1]$, where $b_\pm = \left(\sqrt{2c} \pm \sqrt{c(1-c)}\right)^2/(1+c)^2$. (OW17 establishes the weak convergence $F_p(\lambda) \Rightarrow W_c(\lambda)$ both for Gaussian and non-Gaussian $\eta$. When $\eta$ is non-Gaussian and has two finite moments, OW17 establishes the weak convergence in probability. When $\eta$ is Gaussian, the convergence is a.s.)

The main result of this paper strengthens OW17's finding as follows.

Theorem 1 For $c \in (0, 1/2)$, $\lambda_{p1} \to b_+$ a.s. and $\lambda_{pp} \to b_-$ a.s. as $p, T \to_c \infty$.

For $c \in (0, 1/2)$, Theorem 1 implies that, a.s., no squared canonical correlations lie outside any open interval covering $[b_-, b_+]$ for sufficiently large $p$. Since $F_p$ a.s. weakly converges to $W_c$, this implies that any function $f(\cdot)$ that is continuous and bounded on such an open interval covering $[b_-, b_+]$, but may have discontinuities or other singularities outside that interval, satisfies the strong law of large numbers

$$\frac{1}{p}\sum_{j=1}^{p}f(\lambda_{pj}) \to \int f(\lambda)\,dW_c(\lambda) \ \text{ a.s. as } p, T \to_c \infty.$$

In particular, the likelihood ratio statistic (3), although defined in terms of the unbounded function $\ln(1-\lambda)$, a.s. converges to a constant after scaling, because its singularity lies outside $[b_-, b_+]$ for $c \in (0, 1/2)$. (For $c > 1/2$, $\lambda_{p1}$ equals 1 with probability 1; therefore, the LR statistic is not well defined. For $c = 1/2$, $b_+ = 1$, so that the singularity of $\ln(1-\lambda)$ lies at the upper boundary of the support of $W_c$.)

Corollary 2 Suppose that $c \in (0, 1/2)$. Then, as $p, T \to_c \infty$, $\mathrm{LR}/p^2 \to \overline{\mathrm{LR}}_c$ a.s., where

$$\overline{\mathrm{LR}}_c = \frac{1+c}{c^2}\ln(1+c) - \frac{1-c}{c^2}\ln(1-c) + \frac{1-2c}{c^2}\ln(1-2c).$$

Proof: OW17 shows that the expression on the right-hand side of the above display equals $-c^{-1}\int\ln(1-\lambda)\,dW_c(\lambda)$. Since, by Theorem 1, $\lambda_{p1}$ a.s. remains bounded away from unity, the a.s. weak convergence of $F_p$ to $W_c$ implies that this integral is the a.s. limit of $\mathrm{LR}/p^2$.

In the next section we use Corollary 2 to derive a previously unknown explicit expression for the Bartlett-type correction coefficient for Johansen's trace test. In Section 3, we describe the setup for the proof of Theorem 1. Section 4 contains the proof. In Section 5 we discuss the reasons for working with model (1) rather than (2), and derive some results for (2). Section 6 discusses directions for future work and concludes. All technical proofs are given in the Supplementary Material (SM).
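Theorem 1 and Corollary 2 are easy to illustrate in a small Monte Carlo experiment. The sketch below is our own illustration, not code from the paper: it assumes Python with numpy, and all function names are ours. It simulates the squared sample canonical correlations between the demeaned innovations and the demeaned detrended lagged random walk for $c = p/T = 0.2$, and compares the extreme correlations with the support boundaries $b_\pm$ (the roots of $(c - z + cz)^2 = 4cz(1-z)$; see the factorization (33) below) and the scaled trace statistic with the limit of Corollary 2.

```python
import numpy as np

def wachter_support(c):
    """Support endpoints b_-, b_+ of the Wachter limit: the roots of
    (c - z + cz)^2 = 4cz(1 - z), cf. the factorization (33)."""
    b_plus = (np.sqrt(2 * c) + np.sqrt(c * (1 - c))) ** 2 / (1 + c) ** 2
    b_minus = (np.sqrt(2 * c) - np.sqrt(c * (1 - c))) ** 2 / (1 + c) ** 2
    return b_minus, b_plus

def lr_limit(c):
    """A.s. limit of LR / p^2 from Corollary 2, for c in (0, 1/2)."""
    return ((1 + c) * np.log(1 + c)
            - (1 - c) * np.log(1 - c)
            + (1 - 2 * c) * np.log(1 - 2 * c)) / c ** 2

def squared_canonical_correlations(p, T, rng):
    """Squared sample canonical correlations between the demeaned
    innovations dX_t and the demeaned detrended lagged levels
    X_{t-1} - t * rho_hat, t = 1, ..., T+1, under the null
    (Pi = 0, gamma = 0, Sigma = I_p)."""
    eta = rng.standard_normal((p, T + 1))
    X = np.cumsum(eta, axis=1)                       # X_t, t = 1..T+1, X_0 = 0
    rho_hat = X[:, -1] / (T + 1)
    t = np.arange(1, T + 2)
    Xlag = np.hstack([np.zeros((p, 1)), X[:, :-1]])  # columns X_0, ..., X_T
    B = Xlag - np.outer(rho_hat, t)                  # X_{t-1} - t * rho_hat
    A = eta - eta.mean(axis=1, keepdims=True)        # demean both series
    B = B - B.mean(axis=1, keepdims=True)
    S00, S01 = A @ A.T, A @ B.T
    S11, S10 = B @ B.T, B @ A.T
    M = np.linalg.solve(S00, S01) @ np.linalg.solve(S11, S10)
    return np.sort(np.linalg.eigvals(M).real)

rng = np.random.default_rng(0)
p, T = 100, 500                                      # c = p / T = 0.2
lam = squared_canonical_correlations(p, T, rng)
b_minus, b_plus = wachter_support(p / T)
LR = -(T + 1) * np.log(1 - lam).sum()
print(lam[0], lam[-1], (b_minus, b_plus), LR / p**2, lr_limit(p / T))
```

With $p = 100$ and $T = 500$ the extreme correlations typically land within a few hundredths of $b_\pm$, and $\mathrm{LR}/p^2$ is close to $\overline{\mathrm{LR}}_c$, in line with Theorem 1 and Corollary 2.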

2 Bartlett-type correction

The standard Johansen's LR test is based on asymptotic critical values derived under the regime in which $p$ is fixed whereas $T \to \infty$. As is well known, the test performs poorly in finite samples where $p$ is moderately large. Even relatively small $p$'s, such as five or six, lead to substantial over-rejection of the null hypothesis (see Ho and Sorensen (1996) and Gonzalo and Pitarakis (1995, 1999)). One of the partial solutions to the over-rejection problem is the Bartlett correction of the LR statistic (see Johansen (2002)). The idea is to scale the statistic so that its finite sample distribution better fits the asymptotic distribution of the unscaled statistic. Specifically, let $E_{p,\infty}$ be the mean of the asymptotic distribution under the fixed-$p$, large-$T$ asymptotic regime. Then, if the finite sample mean, $E_{p,T}$, satisfies

$$E_{p,T} = E_{p,\infty}\left(1 + a(p)/T + o(1/T)\right), \qquad (5)$$

the scaled statistic is defined as $\mathrm{LR}/(1 + a(p)/T)$. By construction, the fit between the scaled mean and the original asymptotic mean is improved by an order of magnitude. Although, as shown by Jensen and Wood (1997) in the context of unit root testing, the fit between higher moments does not improve by an order of magnitude, it may become substantially better (see Nielsen (1997)).

Theoretical analysis of the adjustment factor $1 + a(p)/T$ is difficult. The exact expression for $a(p)$ is known only for $p = 1$ (see Larsson (1998)). Therefore, Johansen (2002) proposes to approximate the Bartlett correction factor $BC_{p,T} \equiv E_{p,T}/E_{p,\infty}$ numerically. Here, we propose an alternative correction factor, equal to the ratio of the limits of $\mathrm{LR}/p^2$ under the simultaneous asymptotics $p, T \to_c \infty$ and under the sequential asymptotics, where first $T$ and then $p$ goes to infinity. Monte Carlo analysis in OW17 suggests that the simultaneous asymptotic limit $\overline{\mathrm{LR}}_c$, derived in Corollary 2, provides a very good centering point for $\mathrm{LR}/p^2$ for moderately large $p$.
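As a numerical sanity check, the proposed factor — the simultaneous limit of $\mathrm{LR}/p^2$ from Corollary 2 divided by the sequential limit derived in Theorem 3 below — can be tabulated and compared against the fitted approximation $\exp\{0.549c + 0.552c^2\}$ of Johansen et al. (2005) discussed later in this section. The snippet is our own sketch (not from the paper; numpy assumed):

```python
import numpy as np

def bartlett_factor(c):
    """Closed-form Bartlett-type correction factor: the ratio of the
    simultaneous limit of LR/p^2 (Corollary 2) to the sequential limit 2."""
    return ((1 + c) * np.log(1 + c)
            - (1 - c) * np.log(1 - c)
            + (1 - 2 * c) * np.log(1 - 2 * c)) / (2 * c ** 2)

def johansen_fit(c):
    """Large-T version of the fit reported in Johansen et al. (2005)."""
    return np.exp(0.549 * c + 0.552 * c ** 2)

cs = np.linspace(0.01, 0.3, 30)
gap = np.max(np.abs(bartlett_factor(cs) - johansen_fit(cs)))
print(gap)
```

On $c \leq 0.3$ the maximum gap between the two curves is below 0.01, consistent with the close fit visible in Figure 1.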
From a theoretical perspective, this can be explained by the fact that, in contrast to the standard asymptotics, the simultaneous asymptotics does not neglect terms of relatively high order in $p/T$, which results in an improved approximation quality. The sequential asymptotic limit is derived in the following theorem (see the SM for a proof).

Theorem 3 Suppose that $c \in (0, 1/2)$. Then, as first $T$ and then $p$ go to infinity, $\mathrm{LR}/p^2 \to 2$ in probability.

This theorem and Corollary 2 yield the following analytic expression for the proposed Bartlett-type correction factor:

$$BC_{p,T} = \frac{1+c}{2c^2}\ln(1+c) - \frac{1-c}{2c^2}\ln(1-c) + \frac{1-2c}{2c^2}\ln(1-2c), \qquad (6)$$

where $c \equiv p/T$.

It is interesting to compare $BC_{p,T}$ to the numerical approximation to $BC_{p,T} \equiv E_{p,T}/E_{p,\infty}$ obtained in Johansen et al. (2005). That paper simulates $BC_{p,T}$ for various values of $p$ and $T$ and fits a function of the form

$$\widetilde{BC}_{p,T} = \exp\left\{a_1 c + a_2 c^2 + \left[a_3 c^2 + b_1\right]/T\right\}$$

to the obtained results. For relatively large values of $T$, the term $[a_3 c^2 + b_1]/T$ in the above expression is small. When it is ignored, the fitted function becomes particularly simple:

$$\widetilde{BC}_{p,T} = \exp\left\{0.549c + 0.552c^2\right\}.$$

Figure 1 superimposes the graphs of $BC_{p,T}$ and $\widetilde{BC}_{p,T}$ as functions of $c$. For $c \leq 0.3$, there is a strikingly good fit between the two curves, with a maximum distance between them of 0.0067. For $c > 0.3$ the quality of the fit quickly deteriorates. This can be explained by the fact that all $(p, T)$-pairs used in Johansen et al.'s (2005) simulations are such that $c < 0.3$, so their numerical approximation does not cover cases with $c > 0.3$. To the best of our knowledge, analytical expressions, such as (6), for Bartlett-type correction factors were previously unavailable. Although the expression is not simple, it certainly is elementary, and easy to compute and analyze. Since the expression is analytic, it does not depend on the details of any numerical experiments, and the range of its applicability covers all $c < 1/2$.

3 Setup

In this section, we introduce the setup for the proof of Theorem 1. Let $\Delta X$, $\tilde X$, and $\eta$ be the $p \times (T+1)$ matrices with columns $\Delta X_t$, $X_{t-1} - t\hat\rho$, and $\eta_t$, respectively. Further, let $l$ be the $(T+1)$-vector of ones, $M_l = I_{T+1} - ll'/(T+1)$ the projection on the space orthogonal to $l$, and $U$ the $(T+1)\times(T+1)$ upper triangular matrix with ones above the main diagonal and zeros on the diagonal. Then under

Figure : Bartlett correction factors as functions of p/t. Solid line: the factor based on the ratio of the simultaneous and seqeuntial limits of LR/p 2. Dashed line: numerical approximation from Johansen et al. (25). the null hypothesis M l = ηm l and M l = ηm l UM l, (7) where the second equality is derived as follows. Let τ = (, 2,..., T + ). Note that τ = l U + l and ˆρ = γ + ηl/ (T + ). Therefore, M l = (ηu ˆρ τ ) M l = ηum l T + ηll UM l = ηm l UM l. Equations (7) imply that the squared sample canonical correlations λ p, =,..., p, between demeaned t and demeaned t tˆρ can be interpreted as the eigenvalues of the product P P 2, where P and P 2 are proections on the column spaces of M l U M l η and M l η, respectively. Clearly, λ p s are invariant with respect to right-multiplication of η by any invertible matrix. Hence, without loss of generality, we will assume that η t are i.i.d. N (, I p ) vectors. An equivalent interpretation of λ p, =,..., p, views them as the eigenvalues of matrix S S S S, where S = S and S = ηm l U M l η, S = ηm l UM l U M l η, S = ηm l η. (8) 7

As shown in OW17, $M_l U'M_l$, $M_l U M_l U'M_l$ and $M_l$ are circulant matrices; that is, their $(i_1, j_1)$-th and $(i_2, j_2)$-th elements are equal as long as $i_1 - j_1$ equals $i_2 - j_2$ modulo $T+1$. As is well known (e.g. Golub and Van Loan (1996), ch. 4.7.7), circulant matrices are simultaneously diagonalizable. Precisely, if $V$ is a $(T+1)\times(T+1)$ circulant matrix with first column $v$, then $V = F^*\operatorname{diag}(Fv)F/(T+1)$, where $F$ is the discrete Fourier transform matrix with elements $F_{st} = \exp\{-i2\pi(s-1)(t-1)/(T+1)\}$, and the superscript $*$ denotes transposition and complex conjugation. This yields the following lemma.

Lemma 4 Let $\omega_s = 2\pi s/(T+1)$ and

$$\hat\Delta = \operatorname{diag}\left\{\left(1 - e^{i\omega_1}\right)^{-1}, \ldots, \left(1 - e^{i\omega_T}\right)^{-1}\right\}. \qquad (9)$$

Further, let $\hat\eta = \eta F$ be the $p \times (T+1)$ matrix whose rows are the discrete Fourier transforms, at frequencies $0, \omega_1, \ldots, \omega_T$, of the rows of $\eta$, and let $\hat\eta_1$ be the $p \times T$ matrix obtained from $\hat\eta$ by removing its first column, corresponding to zero frequency. Then

$$S_{01} = \hat\eta_1\hat\Delta^*\hat\eta_1^*/(T+1), \quad S_{11} = \hat\eta_1\hat\Delta\hat\Delta^*\hat\eta_1^*/(T+1), \quad S_{10} = \hat\eta_1\hat\Delta\hat\eta_1^*/(T+1), \quad\text{and}\quad S_{00} = \hat\eta_1\hat\eta_1^*/(T+1).$$

The diagonal of $\hat\Delta$ consists of the reciprocals of the values of the transfer function (see e.g. Brillinger (1981), ch. 2.7) of the leaded first-difference filter

$$X_t \mapsto X_{t+1} - X_t \qquad (10)$$

at the frequencies $\omega_s$, $s \geq 1$. Hence, the $\lambda_{pj}$ can also be viewed as the squared sample canonical correlations between the discrete Fourier transforms of the $\eta_t$ and their products with the inverse of the transfer function of the filter (10). This yields a convenient frequency domain interpretation of Johansen's (1991) trace statistic (3). The strongly serially dependent time domain series $X_{t-1} - t\hat\rho$ are replaced by the heteroskedastic frequency domain series $(1 - e^{i\omega_s})^{-1}\hat\eta_s$, with $\hat\eta_{s_1}$ independent from $\hat\eta_{s_2}$ as long as $s_1 + s_2 \neq T+1$.
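The simultaneous diagonalization underlying Lemma 4 is easy to verify numerically. The sketch below is our own illustration (numpy assumed): it builds a random circulant matrix and checks the identity $V = F^*\operatorname{diag}(Fv)F/(T+1)$.

```python
import numpy as np

n = 8                                       # plays the role of T + 1
rng = np.random.default_rng(1)
v = rng.standard_normal(n)                  # first column of the circulant V

# Circulant matrix: V[i, j] = v[(i - j) mod n], so its first column is v.
V = np.array([[v[(i - j) % n] for j in range(n)] for i in range(n)])

# DFT matrix F_{st} = exp(-i 2 pi (s-1)(t-1) / n) (0-indexed here).
s = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(s, s) / n)

# V = F* diag(F v) F / n: the eigenvalues of V are the DFT of v.
V_rebuilt = F.conj().T @ np.diag(F @ v) @ F / n
print(np.max(np.abs(V - V_rebuilt)))
```

The same identity applied simultaneously to the three circulant matrices above is what delivers the frequency-domain representation of $S_{01}$, $S_{11}$, and $S_{00}$ in Lemma 4.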

Below, we will work with real-valued sine and cosine Fourier transforms of $\eta$. In addition, we will interchange the order of the frequencies so that $\omega_{s_1}$ and $\omega_{s_2}$ with $s_1 + s_2 = T+1$ become adjacent pairs. Specifically, let $T$ be even (the case of odd $T$ can be analyzed similarly), let $P = \{p_{st}\}$ be the $T \times T$ permutation matrix with elements

$$p_{st} = \begin{cases} 1 & \text{if } s = 1, \ldots, T/2 \text{ and } t = 2s - 1, \\ 1 & \text{if } s = T/2 + 1, \ldots, T \text{ and } t = 2(T - s + 1), \\ 0 & \text{otherwise}, \end{cases}$$

and let

$$W = I_{T/2} \otimes \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ i/\sqrt{2} & -i/\sqrt{2} \end{pmatrix},$$

where $\otimes$ denotes the Kronecker product. Further, let $\varepsilon = \hat\eta_1 P W^*/\sqrt{T(T+1)}$ and $\Lambda = \operatorname{diag}\{\Lambda_1, \ldots, \Lambda_{T/2}\}$ with

$$\Lambda_j = \frac{1}{2}\begin{pmatrix} 1 & \cot(\omega_j/2) \\ -\cot(\omega_j/2) & 1 \end{pmatrix}.$$

A direct calculation shows that

$$\Lambda\Lambda' = \operatorname{diag}\left\{r_1 I_2, \ldots, r_{T/2} I_2\right\} \quad\text{with}\quad r_j = \left(4\sin^2(\omega_j/2)\right)^{-1}.$$

Lemma 5 The columns of $\varepsilon$ are i.i.d. $N(0, I_p/T)$ vectors. The matrix $S_{01}S_{11}^{-1}S_{10}S_{00}^{-1}$ equals $CD^{-1}C'A^{-1}$, where

$$C = \varepsilon\Lambda\varepsilon', \quad D = \varepsilon\Lambda\Lambda'\varepsilon', \quad\text{and}\quad A = \varepsilon\varepsilon'.$$

This lemma yields yet another interpretation of the $\lambda_{pj}$, $j = 1, \ldots, p$: they can be thought of as the eigenvalues of the matrix

$$CD^{-1}C'A^{-1} = (\varepsilon\Lambda\varepsilon')(\varepsilon\Lambda\Lambda'\varepsilon')^{-1}(\varepsilon\Lambda'\varepsilon')(\varepsilon\varepsilon')^{-1}.$$

The convenience of this interpretation stems from the block-diagonality of $\Lambda$ and the diagonality of $\Lambda\Lambda'$. Let $\varepsilon_{(j)}$ be the $p \times 2$ matrix that consists of the $(2j-1)$-th and the $2j$-th columns of $\varepsilon$; in particular, $\varepsilon = [\varepsilon_{(1)}, \ldots, \varepsilon_{(T/2)}]$. The key advantage of studying $C$, $D$, $A$ as opposed to $S_{01}$, $S_{11}$, and $S_{00}$ is that $C$, $D$, $A$ can be represented as sums of

independent components of rank two. Specifically,

$$C = \sum_{j=1}^{T/2}\varepsilon_{(j)}\Lambda_j\varepsilon_{(j)}', \quad D = \sum_{j=1}^{T/2} r_j\,\varepsilon_{(j)}\varepsilon_{(j)}', \quad\text{and}\quad A = \sum_{j=1}^{T/2}\varepsilon_{(j)}\varepsilon_{(j)}'.$$

OW17 exploits these representations to derive the limit of the empirical distribution $F_p$ of the eigenvalues of $CD^{-1}C'A^{-1}$. That paper proves the convergence of $F_p$ to $W_c$ by establishing the convergence of the Stieltjes transform of $F_p$, defined as

$$m_p(z) \equiv \int(\lambda - z)^{-1}\,dF_p(\lambda) = \operatorname{tr}\left(\left(CD^{-1}C'A^{-1} - zI_p\right)^{-1}\right)/p.$$

Our proof of Theorem 1 relies on some of the results of OW17. Therefore, to complete the setup of the analysis below, we now briefly outline the relevant findings of that paper.

The first step in OW17's derivations is to use the Sherman-Morrison-Woodbury formula for the inverse of a perturbed matrix,

$$(V + WY)^{-1} = V^{-1} - V^{-1}\left(W^{-1} + YV^{-1}\right)^{-1}YV^{-1},$$

to derive identities (11)-(14), which express $m_p(z)$ and $1 + zm_p(z)$ as averages, over $j = 1, \ldots, T/2$, of traces of the form

$$(1-z)^{-2}\operatorname{tr}\left(\left[I_2,\, r_j\Lambda_j\right]\Omega_j^{(q)}\left[I_2,\, r_j\Lambda_j\right]^*\right)$$

and of analogous traces with $zr_j\Lambda_j$ or $[0,\, I_2]$ in place of the bracketed blocks (compare the error terms (22)-(25) below). Here $\Omega_j^{(q)} \equiv \Omega_j^{(q)}(z)$ is the $4\times 4$ matrix, given in (15), whose $2\times 2$ blocks are built from the matrices $v_j^{(q)} \equiv v_j^{(q)}(z)$, $u_j^{(q)} \equiv u_j^{(q)}(z)$, and $\tilde v_j^{(q)} \equiv \tilde v_j^{(q)}(z)$, which are defined as

follows. Let

$$A_j = A - \varepsilon_{(j)}\varepsilon_{(j)}', \quad C_j = C - \varepsilon_{(j)}\Lambda_j\varepsilon_{(j)}', \quad D_j = D - r_j\varepsilon_{(j)}\varepsilon_{(j)}',$$

$$M_j = C_jD_j^{-1}C_j' - zA_j, \quad\text{and}\quad \tilde M_j = C_j'A_j^{-1}C_j - zD_j.$$

Then

$$v_j^{(q)} = \varepsilon_{(j)}'M_j^{-1}\varepsilon_{(j)}, \quad u_j^{(q)} = \varepsilon_{(j)}'D_j^{-1}C_j'M_j^{-1}\varepsilon_{(j)}, \quad\text{and}\quad \tilde v_j^{(q)} = \varepsilon_{(j)}'\tilde M_j^{-1}\varepsilon_{(j)}.$$

The entries of these matrices are quadratic forms in the columns of $\varepsilon_{(j)}$. In what follows, we use the superscript $(q)$ to denote matrices that involve quadratic forms in the columns of $\varepsilon_{(j)}$, to distinguish them from similarly defined matrices that do not involve such quadratic forms.

The next step in OW17 is to replace $\Omega_j^{(q)}$ in equations (11)-(14) by the matrix $\Omega_j$, which is obtained from $\Omega_j^{(q)}$ by replacing $v_j^{(q)}(z)$, $u_j^{(q)}(z)$, and $\tilde v_j^{(q)}(z)$ in (15) with $v_p(z)I_2$, $u_p(z)I_2$, and $\tilde v_p(z)I_2$, respectively, where

$$v_p(z) = \operatorname{tr}\left(M^{-1}\right)/T, \quad u_p(z) = \operatorname{tr}\left(D^{-1}C'M^{-1}\right)/T, \quad\text{and}\quad \tilde v_p(z) = \operatorname{tr}\left(\tilde M^{-1}\right)/T.$$

Here $M = CD^{-1}C' - zA$ and $\tilde M = C'A^{-1}C - zD$. To simplify notation, we will suppress the dependence of $v_p(z)$, $u_p(z)$, and $\tilde v_p(z)$ on $p$ and $z$. It is straightforward to verify that $\Omega_j$ has an explicit form, given in (16), with $2\times 2$ blocks proportional to $I_2$ and common scalar factor $z/\delta_j$, where

$$\delta_j = z\tilde v\left(1 + v - zv\right) + r_j\left(u + zv\right) - (1-z)u^2. \qquad (17)$$
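The Sherman-Morrison-Woodbury formula used in this derivation is easy to verify numerically for a square, invertible perturbation factor $W$. The sketch below is ours (numpy assumed); the matrices are scaled so that all the inverses involved are guaranteed to exist.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 6

# V: symmetric positive definite with smallest eigenvalue >= p.
A0 = rng.standard_normal((p, p))
V = A0 @ A0.T + p * np.eye(p)

# W: diagonally dominant, hence invertible; Y: small, so ||W Y|| < lambda_min(V).
W = 10 * np.eye(p) + 0.5 * rng.standard_normal((p, p))
Y = 0.05 * rng.standard_normal((p, p))

# (V + WY)^{-1} = V^{-1} - V^{-1} (W^{-1} + Y V^{-1})^{-1} Y V^{-1}
lhs = np.linalg.inv(V + W @ Y)
Vi = np.linalg.inv(V)
rhs = Vi - Vi @ np.linalg.inv(np.linalg.inv(W) + Y @ Vi) @ Y @ Vi
print(np.max(np.abs(lhs - rhs)))
```

In OW17's derivations the role of the perturbation $WY$ is played by the rank-two components $\varepsilon_{(j)}(\cdot)\varepsilon_{(j)}'$ removed from $A$, $C$, and $D$, which is what produces the quadratic forms $v_j^{(q)}$, $u_j^{(q)}$, and $\tilde v_j^{(q)}$.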

Taking traces in equations (11)-(14), after replacing $\Omega_j^{(q)}$ by $\Omega_j$, yields the equations

$$m_p(z) = -\frac{1}{c} + \frac{2}{cT}\sum_{j=1}^{T/2}\frac{1 - z\tilde v + r_j(u + v)}{(1-z)\delta_j} + e_1(z), \qquad (18)$$

$$1 + zm_p(z) = \frac{2}{cT}\sum_{j=1}^{T/2}\frac{1 - z\tilde v + r_j z(u + zv)}{(1-z)\delta_j} + e_2(z), \qquad (19)$$

$$1 + zm_p(z) = \frac{2}{cT}\sum_{j=1}^{T/2}\frac{1 - z\tilde v + r_j\left(u(1+z)/2 + zv\right)}{(1-z)\delta_j} + e_3(z), \qquad (20)$$

$$0 = \frac{2}{cT}\sum_{j=1}^{T/2}\frac{u - r_j v/2}{\delta_j} + e_4(z), \qquad (21)$$

where the $e_k(z)$, $k = 1, \ldots, 4$, are the approximation errors due to replacing $\Omega_j^{(q)}$ by $\Omega_j$. Specifically,

$$e_1(z) = \frac{1}{p}\sum_{j=1}^{T/2}(1-z)^{-2}\operatorname{tr}\left(\left[I_2,\, r_j\Lambda_j\right]\left(\Omega_j - \Omega_j^{(q)}\right)\left[I_2,\, r_j\Lambda_j\right]^*\right), \qquad (22)$$

$$e_2(z) = \frac{1}{p}\sum_{j=1}^{T/2}(1-z)^{-2}\operatorname{tr}\left(\left[I_2,\, zr_j\Lambda_j\right]\left(\Omega_j - \Omega_j^{(q)}\right)\left[I_2,\, zr_j\Lambda_j\right]^*\right), \qquad (23)$$

$$e_3(z) = \frac{1}{p}\sum_{j=1}^{T/2}(1-z)^{-2}\operatorname{tr}\left(\left[I_2,\, zr_j\Lambda_j\right]\left(\Omega_j - \Omega_j^{(q)}\right)\left[I_2,\, r_j\Lambda_j\right]^*\right), \qquad (24)$$

$$e_4(z) = \frac{1}{p}\sum_{j=1}^{T/2}(1-z)\operatorname{tr}\left(\left[0,\, I_2\right]\left(\Omega_j - \Omega_j^{(q)}\right)\left[I_2,\, r_j\Lambda_j\right]^*\right). \qquad (25)$$

Finally, OW17 shows that the errors $e_k(z)$, $k = 1, \ldots, 4$, converge to zero pointwise over $z$ from a compact subset of the upper half of the complex plane, $\mathbb{C}^+$. This allows OW17 to argue that $m_p(z)$ converges to $m(z)$ uniformly over this compact subset, where $m(z)$ satisfies the limiting version of the system (18)-(21) that sets the $e_k(z)$ to zero. Solving the limiting system, OW17 shows that $m(z)$ is the Stieltjes transform of $W_c$, which yields the convergence of $F_p$ to $W_c$.

Our proof of Theorem 1 starts from the system (18)-(21). It amounts to establishing fast convergence of the errors $e_k(z)$, $k = 1, \ldots, 4$, to zero as $z$ runs over a sequence $z_p$ with $\operatorname{Im} z_p \to 0$ and $\operatorname{Re} z_p$ bounded away from the support of the Wachter distribution $W_c$.

4 Proof of Theorem 1

4.1 Outline of the proof

The general strategy of our proof is similar to that used in Bai and Silverstein's (1998) (BS98) study of the asymptotic behavior of the extreme eigenvalues of sample covariance matrices. The main ideas are as follows. Consider a sequence $\{z_p\}$ such that (s.t.)

$$x_p \equiv \operatorname{Re} z_p \in [0, 1] \quad\text{and}\quad y_p \equiv \operatorname{Im} z_p = y\,p^{-\alpha} \qquad (26)$$

with $\alpha \geq 0$ and $y \in (0, 1]$ independent of $p$. We study the behavior of $m_p(z_p)$ as $p, T \to_c \infty$. Let $\check m(z)$ be the Stieltjes transform of $W_{\check c}$, where $W_{\check c}$ is obtained from the limiting distribution $W_c$ by replacing $c$ with $\check c \equiv p/T$. Consider an interval $[a, b]$ outside the supports of $W_{\check c}$ and $W_c$ for all large $p$. Since $F_p$ consists of masses $1/p$ at the $\lambda_{pj}$, and since $W_{\check c}([a, b]) = 0$, we have the following decomposition:

$$\operatorname{Im}\left(m_p(z_p) - \check m(z_p)\right) = \frac{1}{p}\sum_{\lambda_{pj}\in[a,b]}\frac{y_p}{(\lambda_{pj} - x_p)^2 + y_p^2} + \int_{[a,b]^c}\frac{y_p}{(\lambda - x_p)^2 + y_p^2}\,d\left(F_p(\lambda) - W_{\check c}(\lambda)\right). \qquad (27)$$

The existence of $\lambda_{pj} \in [a, b]$ puts an upper bound on the speed of convergence of

$$\sup_{x_p\in[a,b]}\left|m_p(z_p) - \check m(z_p)\right| \qquad (28)$$

that is linked to the speed of convergence $y_p \to 0_+$ via the first term on the right-hand side of (27). Proving that the convergence of (28) is faster than that bound shows that there are no $\lambda_{pj}$ in $[a, b]$ for all sufficiently large $p$.

The analysis of the speed of convergence of (28) is done in several steps.

1. We show that the expected number of eigenvalues in $[a, b]$ cannot grow faster than $p^\beta$ with $\beta < 1$ as $p \to \infty$.

2. We use 1. to derive an upper bound on the speed of convergence of the stochastic part $m_p(z_p) - Em_p(z_p)$ of $m_p(z_p) - \check m(z_p)$.

3. We derive an upper bound on the speed of convergence of the deterministic part $Em_p(z_p) - \check m(z_p)$ of $m_p(z_p) - \check m(z_p)$, and combine the results.

An implementation of these three steps requires a non-trivial extension of BS98. The fact that we have to deal with a product of four dependent stochastic matrices, $CD^{-1}C'A^{-1}$, presents substantial challenges, relative to the case of a sample covariance matrix, that we overcome. The key is to establish fast convergence to zero of the errors $e_k(z_p)$ defined in (22)-(25), which requires a detailed analysis of the matrices $\Omega_j$, $\Omega_j^{(q)}$, and their difference $\Omega_j - \Omega_j^{(q)}$.

4.2 Step 1: Speed of convergence of $EF_p([a, b])$

4.2.1 Rough bounds on the approximation errors

To establish bounds on the approximation errors $e_k(z_p)$, $k = 1, \ldots, 4$, we will use the identity

$$\Omega_j - \Omega_j^{(q)} = \Omega_j^{(q)}\left(\left(\Omega_j^{(q)}\right)^{-1} - \Omega_j^{-1}\right)\Omega_j. \qquad (29)$$

Definition 6 (Tao and Vu (2010)) Let $E$ be an event depending on $p$. Then $E$ holds with overwhelming probability (w.ow.p.) if $\Pr(E) \geq 1 - O_C\left(p^{-C}\right)$ for every constant $C > 0$. Here $O_C(p^{-C})$ denotes a quantity that is smaller than $Bp^{-C}$ with a constant $B$ that may depend on $C$.

Lemma 7 Suppose that $z = z_p$. Then for any $(C, d, \gamma) \in (0, \infty) \times (0, \infty) \times [0, 1/2)$ and any $\alpha \in [0, \alpha_{\gamma d})$ with $\alpha_{\gamma d} \equiv (1/2 - \gamma)/(1 + d)$, the inequality

$$\max_{j=1,\ldots,T/2}\left\|\Omega_j^{-1} - \left(\Omega_j^{(q)}\right)^{-1}\right\| < Cp^{-\gamma}y_p^{d}$$

holds w.ow.p.

To prove the lemma, we use the convergence of quadratic forms $\xi_p'W_p\xi_p$ in Gaussian vectors $\xi_p$ and the fact that the entries of $(\Omega_j^{(q)})^{-1}$ are such forms, whereas the entries of $\Omega_j^{-1}$ are the points of concentration of these forms (see the SM). Since $y_p = y\,p^{-\alpha}$, the upper bound $Cp^{-\gamma}y_p^{d}$ on $\|\Omega_j^{-1} - (\Omega_j^{(q)})^{-1}\|$ converges to zero as fast as $p^{-(\alpha d + \gamma)}$. The rate $\alpha d + \gamma$ of this convergence can be made arbitrarily close to $1/2$ by choosing $\alpha$ sufficiently close to $\alpha_{\gamma d}$, choosing $d$ sufficiently large, and/or choosing $\gamma$ sufficiently close to $1/2$. However, faster convergence rates for the bound are achieved at the expense of slower convergence of $y_p$ to zero. The reason for this trade-off is that the convergence of $\xi_p'W_p\xi_p$ is slowed down by large $\|W_p\|$, and the quadratic forms appearing in the entries of $(\Omega_j^{(q)})^{-1}$ have $\|W_p\|$ proportional to $y_p^{-1} = y^{-1}p^{\alpha}$.

If we set $\alpha = 0$, then $y_p$ does not converge to zero as $p \to \infty$. However, since in this case $\gamma$ can be chosen arbitrarily close to $1/2$, the upper bound on $\|\Omega_j^{-1} - (\Omega_j^{(q)})^{-1}\|$ derived in the lemma still may converge to zero at a rate arbitrarily close to $p^{-1/2}$.

Lemma 8 (i) For any $\alpha \in [0, 1/2)$ there exists $C > 0$ such that, w.ow.p.,

$$\max_{j=1,\ldots,T/2}\left\|\Omega_j^{(q)}\right\| \leq Cy_p^{-5} \quad\text{and}\quad \max_{j=1,\ldots,T/2}\left\|\Omega_j\right\| \leq Cy_p^{-5}.$$

(ii) For any $\alpha \in [0, 1/6)$ and any $\rho > 0$, there exists $C > 0$ such that

$$E\left(\max_{j=1,\ldots,T/2}\left\|\Omega_j^{(q)}\right\|^{\rho}\right) \leq Cy_p^{-5\rho}.$$

The constant $C$ in Lemma 8 does not need to coincide with that in Lemma 7. In what follows, $C$ denotes a constant whose value can change from one appearance to another. Identity (29), Lemmas 7-8, and the fact that $|1 - z_p|^{-2} \leq y_p^{-2}$ imply that for any $C \in (0, \infty)$, $d \in [5, \infty)$, and $\gamma \in [0, 1/2)$,

$$|1 - z_p|^{-2}\max_{j=1,\ldots,T/2}\left\|\Omega_j - \Omega_j^{(q)}\right\| \leq Cp^{-\gamma}y_p^{d-12} \quad\text{w.ow.p.} \qquad (30)$$

as long as $\alpha < \alpha_{\gamma d}$. The requirement $d \geq 5$ ensures that $\alpha_{\gamma d} \leq 1/12$, so that Lemma 8 applies. Combining (30) with equation (22) yields

$$\left|e_1(z_p)\right| \leq Cp^{-\gamma}y_p^{d-12} \quad\text{w.ow.p.}$$

Similar inequalities hold for $e_k(z_p)$, $k = 2, 3, 4$. Hence, we have the following lemma.

Lemma 9 For any $(C, d, \gamma) \in (0, \infty) \times [5, \infty) \times [0, 1/2)$, any $\alpha \in [0, \alpha_{\gamma d})$, and any $k = 1, \ldots, 4$,

$$\left|e_k(z_p)\right| \leq Cp^{-\gamma}y_p^{d-12} \quad\text{w.ow.p.}$$

4.2.2 System reduction

In the SM, we show that the system of equations (18)-(21) can be reduced to the following simple form:

$$\tilde v + 2u = \tilde e_1,$$
$$zv + u + c/(1-c) = \tilde e_2,$$
$$m - v(1-c)/c = \tilde e_3,$$
$$m^2cz(1-z) - m(c - z + cz) + 1 = \tilde e_4. \qquad (31)$$

The transformed errors $\tilde e_k$ are non-linear functions of the original errors $e_k$ and of the variables $\tilde v$, $u$, $v$, and $m$ (we suppress the dependence of all these quantities on $p$ and $z$ for brevity of notation). We use bounds on these variables and Lemma 9 to derive the following result.

Lemma 10 For any $(C, d, \gamma) \in (0, \infty) \times [3, \infty) \times [0, 1/2)$, any $\alpha \in [0, \alpha_{\gamma d})$, and any $k = 1, \ldots, 4$,

$$\left|\tilde e_k\right| \leq Cy_p^{d-42} \quad\text{w.ow.p.}$$

The transformed errors ẽ_k are non-linear functions of the original errors e_k and of the variables ṽ, u, v, and m (we suppress the dependence of all these quantities on p and z for brevity of notation). We use bounds on these variables and Lemma 9 to derive the following result.

Lemma 10 For any (C, d, γ) ∈ (0, ∞) × [3, ∞) × [0, 1/2), any α ∈ [0, α_{γd}), and any k = 1, ..., 4,

|ẽ_k| ≤ C y_p^{d−42}  w.ow.p.

4.2.3 Analysis of m − m₀

Let us define m₀ ≡ m₀(z) as the solution of the equation m² cz(1 − z) − m(c − z + cz) + 1 = 0 given by

m₀ = [c − z + cz + √((c − z + cz)² − 4cz(1 − z))] / (2cz(1 − z)),   (32)

where the branch of the square root, with the cut along the positive real semi-axis, is chosen so that the square root has positive imaginary part. It follows, e.g., from Theorem 1.6 of Bai et al. (2015) that such m₀ is the Stieltjes transform of the Wachter distribution W_c with density

f(λ; c) = (1 + c) √((b₊ − λ)(λ − b₋)) / (2πcλ(1 − λ))

supported on [b₋, b₊] ⊂ (0, 1], where b_± = c(√2 ± √(1 − c))² / (1 + c)². Note that the expression under the square root in (32) can be factorized as

(c − z + cz)² − 4cz(1 − z) = (1 + c)²(z − b₊)(z − b₋).   (33)

Since the linear factors z − b₊ and z − b₋ cannot be simultaneously small, (33) implies a useful inequality:

|(c − z_p + cz_p)² − 4cz_p(1 − z_p)| > C y_p   (34)

for some C > 0 and all sufficiently large p.
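Since the support boundaries and the density of W_c are central to everything that follows, a small numerical sanity check may be useful (this is our own sketch; the helper names wachter_support and wachter_density are ours, not the paper's). It verifies the factorization (33), that m₀ from (32) solves the quadratic equation, and that the density integrates to one:

```python
import numpy as np

def wachter_support(c):
    # support boundaries b± = c(√2 ± √(1-c))²/(1+c)² of the Wachter law W_c
    return (c*(np.sqrt(2) - np.sqrt(1 - c))**2/(1 + c)**2,
            c*(np.sqrt(2) + np.sqrt(1 - c))**2/(1 + c)**2)

def wachter_density(lam, c):
    b_m, b_p = wachter_support(c)
    # clip guards against tiny negative values at the support endpoints
    return (1 + c)*np.sqrt(np.clip((b_p - lam)*(lam - b_m), 0, None)) \
        / (2*np.pi*c*lam*(1 - lam))

c = 0.3
b_m, b_p = wachter_support(c)

# (33): (c - z + cz)^2 - 4cz(1-z) = (1+c)^2 (z - b+)(z - b-) at an arbitrary z
z = 0.37 + 0.41j
lhs = (c - z + c*z)**2 - 4*c*z*(1 - z)
rhs = (1 + c)**2*(z - b_p)*(z - b_m)
assert abs(lhs - rhs) < 1e-12

# m0 from (32) solves cz(1-z) m^2 - (c - z + cz) m + 1 = 0
s = np.sqrt(lhs)
if s.imag < 0:
    s = -s                      # branch with positive imaginary part
m0 = (c - z + c*z + s)/(2*c*z*(1 - z))
assert abs(c*z*(1 - z)*m0**2 - (c - z + c*z)*m0 + 1) < 1e-12

# the density integrates to one; λ = mid + rad·sinθ smooths the endpoints
theta = np.linspace(-np.pi/2, np.pi/2, 200001)
lam = (b_p + b_m)/2 + (b_p - b_m)/2*np.sin(theta)
g = wachter_density(lam, c)*(b_p - b_m)/2*np.cos(theta)
mass = (theta[1] - theta[0])*(g.sum() - 0.5*(g[0] + g[-1]))
assert abs(mass - 1) < 1e-3
print(round(mass, 6))
```

The sign flip on s mirrors the branch requirement in (32) that the square root have positive imaginary part.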

From the last equation of system (31), we have

m = [c − z + cz + √((c − z + cz)² − 4cz(1 − z) + 4ẽ_4 cz(1 − z))] / (2cz(1 − z)),

which differs from (32) only by the term 4ẽ_4 cz(1 − z) under the square root. By Lemma 10 and inequality (34), when z = z_p, this term can be made negligible relative to the rest of the expression under the square root by choosing d ≥ 42.¹³ Then, the difference m − m₀ is of the order of ẽ_4 / √((c − z_p + cz_p)² − 4cz_p(1 − z_p)). In the SM, we use this fact to prove the following lemma.

Lemma 11 For any α ∈ [0, 1/9), l > 0, and ρ ≥ 8l, there exists a constant C that may depend on α, l, and ρ s.t. for any ε > 0,

Pr(y_p sup_{x_p ∈ [0,1]} |m(z_p) − m₀(z_p)| > ε) ≤ C ε^{-ρ} p^{-l}.

The inequality established in Lemma 11 is analogous to inequality (3.23) in BS98. In the SM, we use BS98's argument leading from (3.23) to (3.28) to obtain a bound on EF_p([a, b]). Let E denote the unconditional expectation and E_k denote the conditional expectation given ε^{(1)}, ..., ε^{(k)}.

Proposition 12 Let [a, b] be an interval that lies outside the supports of W_c and W_{c_p} for all sufficiently large p. We have

max_{k=1,...,T/2} E_k(F_p([a, b]))² = o_{a.s.}(p^{-2/9})  and  max_{k=1,...,T/2} E_k F_p([a, b]) = o_{a.s.}(p^{-1/9}).

For future reference, we similarly have

max_{k=1,...,T/2} E_k(F_p([a₁, b₁]))² = o_{a.s.}(p^{-2/9})  and  max_{k=1,...,T/2} E_k F_p([a₁, b₁]) = o_{a.s.}(p^{-1/9}),   (35)

where [a₁, b₁] = [a − ε, b + ε] with ε such that [a − 2ε, b + 2ε] lies outside the support

¹³ Even the choice d = 42 and γ = 0 would lead to the negligibility of ẽ_4 because the constant C in Lemma 10 can be chosen at will, that is, arbitrarily small.

of W c. Indeed, for all suffi ciently large p, [a, b ] lies outside the supports of both W c and W c so that the requirement of Proposition 2 is met. 4.3 Step 2: Convergence of m Em We now consider behavior of m Em m p (z p ) Em p (z p ) along the sequence z p x p + iy p with y p = y p α, α = /456, and y R + an arbitrary fixed positive real number. We will show that, for such a choice of α, sup py p m p (z p ) Em p (z p ) a.s.. x p [a,b] Since ( ) ( ) m p x () + iy p mp x (2) + iy p x () x (2) yp 2, it is suffi cient to show that max xp Sp py p m p (z p ) Em p (z p ) a.s., where S p is the set of p 2 points uniformly spaced on [a, b]. We use the following key representation of m Em in the form of a sum of the martingale difference sequence T/2 m Em = E m E m. = As shown in the SM, this representation can be rewritten in the following form m Em = p T/2 = ( (E E ) tr Γ (q) Ω (q) ), (36) where Ω (q) with is as defined in (5) above and Γ (q) = ( z v(q) z u(q) a (q) b (q) z v(q) z u(q) r b (q) r c (q) ) a (q) = ε ()M A M ε (), b (q) = ε ()D C M A M ε (), and c (q) = ε ()D C M A M C D ε (). 8

Consider the identity

(Ω_j^{(q)})^{-1} = (Ω_j^{(d)})^{-1} + (Ω_j^{(d)})^{-1}(Ω_j^{(d)} − Ω_j^{(q)})(Ω_j^{(d)})^{-1} + ((Ω_j^{(d)})^{-1}(Ω_j^{(d)} − Ω_j^{(q)}))² (Ω_j^{(q)})^{-1},   (37)

where

Ω_j^{(d)}(z) = ( (1/z + Ev) I₂   (r_j/z + Eu) I₂ ; (r_j/z + Eu) I₂   (r_j/z + zEṽ) I₂ ).   (38)

In this definition, we use the superscript (d) to emphasize the fact that Ω_j^{(d)} is a deterministic matrix.

Lemma 13 There exists C > 0 such that sup_{x_p ∈ [a,b]} max_{j=1,...,T/2} ‖(Ω_j^{(d)}(z_p))^{-1}‖ ≤ C for all sufficiently large p.

Identity (37) implies the following decomposition:

m − Em = (1/p) Σ_{j=1}^{T/2} (E_j − E_{j−1}) tr(Γ_j^{(q)} (Ω_j^{(d)})^{-1})
 + (1/p) Σ_{j=1}^{T/2} (E_j − E_{j−1}) tr(Γ_j^{(q)} (Ω_j^{(d)})^{-1}(Ω_j^{(d)} − Ω_j^{(q)})(Ω_j^{(d)})^{-1})   (39)
 + (1/p) Σ_{j=1}^{T/2} (E_j − E_{j−1}) tr(Γ_j^{(q)} ((Ω_j^{(d)})^{-1}(Ω_j^{(d)} − Ω_j^{(q)}))² (Ω_j^{(q)})^{-1}).

We further expand (39) as follows. Define

Γ̂_j = ( (v_j/z − a_j) I₂   (u_j/z − b_j) I₂ ; (v_j/z − r_j b_j) I₂   (u_j/z − r_j c_j) I₂ )  and

Ω̂_j = ( (1/z + v_j) I₂   (r_j/z + u_j) I₂ ; (r_j/z + u_j) I₂   (r_j/z + zṽ_j) I₂ ),

where v_j = T^{-1} tr M_j, u_j = T^{-1} tr(C D M_j), ṽ_j = T^{-1} tr(M_j^{-1}), a_j = T^{-1} tr(M_j A^{-1} M_j), b_j = T^{-1} tr(D C M_j A^{-1} M_j), and c_j = T^{-1} tr(D C M_j A^{-1} M_j C D). Then, as shown in the SM, we have

m − Em = W₁ + W₂ + W₃ + W₄,   (40)

where

W₁ = (1/p) Σ_{j=1}^{T/2} (E_j − E_{j−1}) tr((Γ_j^{(q)} − Γ̂_j)(Ω_j^{(d)})^{-1}),
W₂ = (1/p) Σ_{j=1}^{T/2} (E_j − E_{j−1}) tr(Γ̂_j (Ω_j^{(d)})^{-1}(Ω̂_j − Ω_j^{(q)})(Ω_j^{(d)})^{-1}),
W₃ = (1/p) Σ_{j=1}^{T/2} (E_j − E_{j−1}) tr((Γ_j^{(q)} − Γ̂_j)(Ω_j^{(d)})^{-1}(Ω_j^{(d)} − Ω_j^{(q)})(Ω_j^{(d)})^{-1}),  and
W₄ = (1/p) Σ_{j=1}^{T/2} (E_j − E_{j−1}) tr(Γ_j^{(q)} ((Ω_j^{(d)})^{-1}(Ω_j^{(d)} − Ω_j^{(q)}))² (Ω_j^{(q)})^{-1}).

The terms W_k in the decomposition (40) are small in the sense that their moments quickly decay as p → ∞. A general strategy for proving this uses the fact that all these terms can be viewed as sums of martingale difference sequences, and therefore Burkholder's moment inequalities (see Lemmas 2.1 and 2.2 in BS98) are applicable. The moments of the corresponding summands can be bounded using results on quadratic forms in Gaussian vectors, detailed in the SM. The so-obtained bounds involve quantities such as E|tr M_j|/T. These quantities can be split into two parts, corresponding to the eigenvalues λ_{ip} that lie outside and inside the interval [a₁, b₁]. The outside components are bounded for x_p ∈ [a, b] because the distance between [a₁, b₁]^c and [a, b] is fixed and positive. The inside components are bounded by products of powers of y_p^{-1} and EF_p([a₁, b₁]), the expected proportion of the eigenvalues λ_{ip} that belong to [a₁, b₁]. Given the choice of y_p made in this section, such products are small by (35). Following this general

strategy, we prove our next proposition (see SM), which is the main result of this subsection.

Proposition 14 For any k = 1, ..., 4, max_{x_p ∈ S_p} p y_p |W_k| → 0 a.s., and hence

max_{x_p ∈ S_p} p y_p |m − Em| → 0  a.s.

4.4 Step 3: Convergence of Em − m₀

Taking expectations of both parts of equations (11)-(14) and replacing EΩ_j^{(q)} by Ω_j^{(d)}, we obtain an analog of the approximate system (18)-(21) for the variables Em, Ev, Eu, and Eṽ instead of m, v, u, and ṽ:

Em = (c(1 − z)²/(c_p T)) Σ_{j=1}^{T/2} [zEṽ + r_j(Eu + Ev)(1 − z)] / δ_j + ē₁,   (41)
c + zEm = (c(1 − z)²/(c_p T)) Σ_{j=1}^{T/2} [zEṽ + r_j z(Eu + zEv)(1 − z)] / δ_j + ē₂,   (42)
1 + zEm = (c(1 − z)²/(c_p T)) Σ_{j=1}^{T/2} [zEṽ + r_j(Eu(1 + z)/2 + zEv)(1 − z)] / δ_j + ē₃,   (43)
1 = (2/(c_p T)) Σ_{j=1}^{T/2} [Eu − r_j Ev/2] / δ_j + ē₄,   (44)

where

δ_j = zEṽ(1 + Ev − zEv) + r_j(Eu + zEv)(1 − z) − (Eu)²,

and

ē₁ = (1/p) Σ_{j=1}^{T/2} (1 − z)^{-2} tr([I₂, r_j I₂](Ω_j^{(d)} − EΩ_j^{(q)})[I₂, r_j I₂]′),
ē₂ = (1/p) Σ_{j=1}^{T/2} (1 − z)^{-2} tr([I₂, r_j z I₂](Ω_j^{(d)} − EΩ_j^{(q)})[I₂, z r_j I₂]′),
ē₃ = (1/p) Σ_{j=1}^{T/2} (1 − z)^{-2} tr([I₂, r_j z I₂](Ω_j^{(d)} − EΩ_j^{(q)})[I₂, r_j I₂]′),
ē₄ = (1/p) Σ_{j=1}^{T/2} (1 − z)^{-1} tr([0, I₂](Ω_j^{(d)} − EΩ_j^{(q)})[I₂, r_j I₂]′).

The identity Ω_j^{(d)} − Ω_j^{(q)} = Ω_j^{(d)}((Ω_j^{(q)})^{-1} − (Ω_j^{(d)})^{-1})Ω_j^{(q)} yields the decomposition

EΩ_j^{(q)} = R_{1j} + R_{2j} + R_{3j},   (45)

where

R_{1j} = Ω_j^{(d)} − Ω_j^{(d)} E((Ω_j^{(q)})^{-1} − (Ω_j^{(d)})^{-1}) Ω_j^{(d)},
R_{2j} = E(Ω_j^{(d)}((Ω_j^{(q)})^{-1} − (Ω_j^{(d)})^{-1}))² Ω_j^{(d)},  and
R_{3j} = −E(Ω_j^{(d)}((Ω_j^{(q)})^{-1} − (Ω_j^{(d)})^{-1}))³ Ω_j^{(q)}.

As we show in the SM, for any x_p ∈ [a, b], ‖E((Ω_j^{(q)})^{-1} − (Ω_j^{(d)})^{-1})‖ and ‖E(((Ω_j^{(q)})^{-1} − (Ω_j^{(d)})^{-1})²)‖ are of order p^{-1}, whereas ‖E(((Ω_j^{(q)})^{-1} − (Ω_j^{(d)})^{-1})³)‖ is of an even smaller order, and ‖Ω_j^{(d)}‖ as well as E‖Ω_j^{(q)}‖ are bounded. These facts would have implied that the ē_k are of order p^{-1}, had there been no (1 − z)^{-2} multipliers in the definitions of ē₁, ..., ē₃ and no (1 − z)^{-1} multiplier in the definition of ē₄. If [a, b] includes unity, then these multipliers are not uniformly bounded over Re z ∈ [a, b]. However, it turns out that the norms of (1 − z)^{-1}[I₂, r_j I₂]Ω_j^{(d)} and of (1 − z)^{-1}[I₂, r_j z I₂]Ω_j^{(d)} are uniformly bounded over [a, b] (see the proof of Lemma 15 in the SM), which is sufficient to guarantee that the ē_k are of order p^{-1} notwithstanding the presence of the multipliers (1 − z)^{-2} and (1 − z)^{-1} in their definitions.

Lemma 15 There exists C > 0 s.t., for any k = 1, ..., 4, sup_{x_p ∈ [a,b]} |ē_k(z_p)| ≤ C p^{-1}.

As explained in the proof given in the SM, the inequality for ē₂ can be slightly strengthened so that

sup_{x_p ∈ [a,b]} |ē₂(z_p)/z_p| ≤ C p^{-1}.   (46)

Such a strengthened version is used in the proof of Lemma 16. Similarly to the above reduction of the approximate system (18)-(21) to the simple form (31), we reduce the system of equations (41)-(44) to

Eṽ + 2Eu = ê₁,
zEv + Eu + c/(1 − c) = ê₂,
Em − Ev(1 − c)/c = ê₃,
(Em)² cz(1 − z) − Em(c − z + cz) + 1 = ê₄,   (47)

where ê_k, k = 1, ..., 4, are nonlinear functions of ē_k, k = 1, ..., 4, Ev, Eu, and Eṽ.

Lemma 16 There exists C > 0 s.t., for any k = 1, ..., 4, sup_{x_p ∈ [a,b]} |ê_k(z_p)| ≤ C p^{-1}.

Now recall the explicit form (32) of m₀. The fourth equation of (47) yields a similar expression for Em,

Em(z) = [c − z + cz + √((c − z + cz)² − 4cz(1 − z) + 4ê₄ cz(1 − z))] / (2cz(1 − z)).

Hence, the difference Em(z_p) − m₀(z_p) is of the order of ê₄(z_p) / √((c − z_p + cz_p)² − 4cz_p(1 − z_p)). On the other hand, identity (33) implies that

inf_{x_p ∈ [a,b]} |(c − z_p + cz_p)² − 4cz_p(1 − z_p)| > ε

for all sufficiently large p, where ε is the positive number used in the definition of [a₁, b₁]. Therefore, the following proposition follows from Lemma 16.

Proposition 17 There exists C > 0 s.t. sup_{x_p ∈ [a,b]} |Em(z_p) − m₀(z_p)| ≤ C p^{-1}.

Propositions 14 and 17 yield

sup_{x_p ∈ [a,b]} |m(z_p) − m₀(z_p)| = o_{a.s.}(1/(p y_p))   (48)

with y_p = y p^{-α}, α = 1/456, and y an arbitrary fixed positive real number. This yields Theorem 1 via the following arguments. The main idea of these arguments is outlined in Section 4.1 above. Using (48) with y = k for k ∈ {1, 2, ..., 228}, we obtain

max_{k ∈ {1,2,...,228}} sup_{x_p ∈ [a,b]} |m(x_p + ikp^{-α}) − m₀(x_p + ikp^{-α})| = o_{a.s.}(p^{-1+α}).

Taking imaginary parts, we obtain

max_{k ∈ {1,2,...,228}} sup_{x_p ∈ [a,b]} |∫ d(F_p(λ) − W_c(λ)) / ((x_p − λ)² + k²p^{-2α})| = o_{a.s.}(p^{-1+2α}).

Taking differences of the integrals corresponding to different values of k yields

max_{k₁ ≠ k₂} sup_{x_p ∈ [a,b]} |∫ p^{-2α} d(F_p(λ) − W_c(λ)) / Π_{s=1}^{2}((x_p − λ)² + k_s²p^{-2α})| = o_{a.s.}(p^{-1+2α}),

max_{k₁,k₂,k₃ distinct} sup_{x_p ∈ [a,b]} |∫ p^{-4α} d(F_p(λ) − W_c(λ)) / Π_{s=1}^{3}((x_p − λ)² + k_s²p^{-2α})| = o_{a.s.}(p^{-1+2α}),

...

sup_{x_p ∈ [a,b]} |∫ p^{-454α} d(F_p(λ) − W_c(λ)) / Π_{s=1}^{228}((x_p − λ)² + s²p^{-2α})| = o_{a.s.}(p^{-1+2α}),

so that

sup_{x_p ∈ [a,b]} |∫ d(F_p(λ) − W_c(λ)) / Π_{s=1}^{228}((x_p − λ)² + s²p^{-2α})| = o_{a.s.}(1).

Splitting up the integral, we obtain

sup_{x_p ∈ [a,b]} |∫ 1_{[a₁,b₁]^c}(λ) d(F_p(λ) − W_c(λ)) / Π_{s=1}^{228}((x_p − λ)² + s²p^{-2α})|   (49)
 + sup_{x_p ∈ [a,b]} (1/p) Σ_{λ_{ip} ∈ [a₁,b₁]} Π_{s=1}^{228}((x_p − λ_{ip})² + s²p^{-2α})^{-1} = o_{a.s.}(1),

where 1_{[a₁,b₁]^c}(λ) is the indicator function equal to unity if and only if λ ∉ [a₁, b₁].

Now suppose that there exists a subsequence p_n such that, for each p_n, at

least one eigenvalue λ_{ip_n} belongs to [a, b]. Setting x_{p_n} equal to such an eigenvalue, we see that the sum on the left-hand side of (49) is no smaller than Π_{s=1}^{228} s^{-2} > 0 for all p_n. Therefore, at such x_{p_n}, the integral on the left-hand side of (49) must be uniformly bounded away from zero over all p_n. But the integral must a.s. converge to zero, because the integrand is uniformly bounded and both F_{p_n} and W_{c_{p_n}} weakly converge a.s. to W_c, which satisfies W_c([a₁, b₁]) = 0. Therefore, with probability one, no eigenvalues λ_{ip} will appear in [a, b] for all sufficiently large p.

5 Johansen's H₁ model

If the data generating process is described by Johansen's H₁ model (2) rather than (1), the LR statistic for testing the null hypothesis that Π = 0 still has form (3). However, now the λ_{ip} equal the eigenvalues of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀, where the S̃_ij are defined differently from the S_ij given in (8). Specifically, they correspond to the sample covariance and cross-covariance matrices of the demeaned processes ΔX_t and (X_{t−1}′, t)′ (see Johansen (1995, ch. 6.2)). That is, in contrast to (8),

S̃₀₁ = (η′M_l U′η, η′M_l τ)  and  S̃₁₁ = ( η′U M_l U′η   η′U M_l τ ; τ′M_l U′η   τ′M_l τ ),

while, similarly to the above, S̃₁₀ = S̃₀₁′ and S̃₀₀ = η′M_l η. Here τ denotes the time trend, τ = (1, 2, ..., T + 1)′.

In contrast to the matrices S₀₀, S₀₁, and S₁₁ given in (8), the matrices S̃₀₀, S̃₀₁, and S̃₁₁ cannot be simultaneously rotated to the form ε′Wε, where W is a block-diagonal matrix. Therefore, in the case of the H₁ model, there is no convenient frequency-domain reformulation of Johansen's test, and the above analysis does not go through. It is, however, possible to show that, asymptotically, at most one eigenvalue of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ remains above and separated from b₊, and at most one eigenvalue remains below and separated from b₋. Hence, the second largest and the second smallest eigenvalues of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ a.s. converge to b₊ and b₋. Recall that the eigenvalues of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀ equal those of P₁P₂, where P₁ and P₂ are the projections on the column spaces of Y ≡ M_l U M_l η and Z ≡ M_l η, respectively.
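As a rough numerical illustration of these limits (our own sketch, not part of the paper's argument; the demeaning conventions and sample sizes below are our choices), one can simulate a pure random walk and compare the squared sample canonical correlations between its demeaned innovations and demeaned lagged levels with the support boundaries b_± = c(√2 ± √(1 − c))²/(1 + c)²:

```python
import numpy as np

rng = np.random.default_rng(42)
p, T = 100, 400                      # c = p/T = 1/4
c = p / T
b_m = c*(np.sqrt(2) - np.sqrt(1 - c))**2/(1 + c)**2
b_p = c*(np.sqrt(2) + np.sqrt(1 - c))**2/(1 + c)**2

eta = rng.standard_normal((T, p))    # innovations of a pure random walk (Pi = 0)
X = np.cumsum(eta, axis=0)
dX, Xlag = eta[1:], X[:-1]           # Delta X_t paired with X_{t-1}
dX = dX - dX.mean(axis=0)            # demeaning (a simplified M_l projection)
Xlag = Xlag - Xlag.mean(axis=0)

# squared canonical correlations = squared singular values of Q1' Q2,
# where Q1, Q2 are orthonormal bases of the two column spaces
Q1, _ = np.linalg.qr(dX)
Q2, _ = np.linalg.qr(Xlag)
lam = np.linalg.svd(Q1.T @ Q2, compute_uv=False)**2

# extremes should sit near the Wachter support boundaries
assert lam.min() > b_m - 0.1 and lam.max() < b_p + 0.1
assert lam.max() > b_p - 0.15 and lam.min() < b_m + 0.1
print(f"support [{b_m:.3f}, {b_p:.3f}], extremes [{lam.min():.3f}, {lam.max():.3f}]")
```

With p = 100 and T = 400 the extreme eigenvalues should already sit close to b₋ and b₊, with no eigenvalues escaping the support, consistent with the theorem.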
Similarly, the eigenvalues of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ equal those of P̃₁P₂, where P̃₁ is the projection on the column space of Ỹ ≡ (M_l U η, M_l τ). Note that Ỹ has p + 1 columns, whereas Y has p columns. Let us augment Y by a zero column to obtain Ȳ ≡ (M_l U M_l η, 0). Obviously, the projections on the

columns of Y and Ȳ coincide and equal P₁. Further,

Ỹ − Ȳ = (M_l U ll′η/(T + 1), M_l τ) = (M_l τ l′η/(T + 1), M_l τ),   (50)

and the matrix (M_l τ l′η/(T + 1), M_l τ) has rank one.

Lemma 18 Let Y₁ and Y₂ be n × m matrices, and let P_{Y₁} and P_{Y₂} be the projections on the spaces spanned by the columns of Y₁ and Y₂, respectively. If rank(Y₁ − Y₂) = r, then there exist n × r matrices y₁ and y₂ such that

P_{Y₁} − P_{Y₂} = P_{y₁} − P_{y₂},

where P_{y₁} and P_{y₂} are the projections on the spaces spanned by the columns of y₁ and y₂, respectively. In particular, rank(P_{Y₁} − P_{Y₂}) ≤ 2r.

Proof: Assume that Y₁ − Y₂ = ab′, where a is n × r and b′ = (0, I_r). This assumption does not lead to a loss of generality because P_{Y₁} and P_{Y₂} are invariant with respect to multiplication of Y₁ and Y₂ from the right by arbitrary invertible m × m matrices, and the above form of b can be achieved by such a multiplication. Let us partition Y₁ and Y₂ as [Y₁₁, Y₁₂] and [Y₂₁, Y₂₂], where Y₁₂ and Y₂₂ are the last r columns of Y₁ and Y₂, respectively. We have Y₂₁ = Y₁₁ and Y₂₂ + a = Y₁₂. Denote I_n − P_{Y₂₁} by M₁, where P_{Y₂₁} is the projection on the space spanned by the columns of Y₂₁, and let y₂ = M₁Y₂₂. Note that P_{Y₂} = P_{[Y₂₁, y₂]} = P_{Y₂₁} + P_{y₂}, where the second equality holds because Y₂₁ is orthogonal to y₂. Similarly, we have P_{Y₁} = P_{Y₂₁} + P_{y₁}, where y₁ = M₁Y₁₂. Therefore, P_{Y₁} − P_{Y₂} = P_{y₁} − P_{y₂}.

Lemma 18 and equality (50) imply that there exists no more than one eigenvalue of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ that is larger than the largest eigenvalue of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀, and no more than one eigenvalue of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ that is smaller than the smallest eigenvalue of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀. Indeed, note that the eigenvalues of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀, which equal those of P₁P₂, coincide with the eigenvalues of a symmetric matrix P₂P₁P₂. Similarly, the eigenvalues of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ coincide with the eigenvalues

of a symmetric matrix P₂P̃₁P₂. By Lemma 18,

P₂P̃₁P₂ − P₂P₁P₂ = P₂P_{y₁}P₂ − P₂P_{y₂}P₂,

where P_{y₁} and P_{y₂} are projections on one-dimensional spaces. Hence, our statement concerning the eigenvalues of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ and S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀ follows from Weyl's inequalities for the eigenvalues of a sum of symmetric matrices (see, e.g., Horn and Johnson (1985, Theorem 4.3.1)).

We have conducted a small-scale Monte Carlo study which suggests that, in fact, the largest eigenvalue of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ converges to b₊, similarly to the largest eigenvalue of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀. However, the smallest eigenvalue of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ is close to zero, whereas, in accordance with our theoretical results, the smallest eigenvalue of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀ converges to b₋. A more formal analysis of the extreme eigenvalues of S̃₀₀⁻¹S̃₀₁S̃₁₁⁻¹S̃₁₀ would amount to studying low-rank perturbations of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀. There exists a large literature on low-rank perturbations of classical random matrix ensembles (see, e.g., Capitaine and Donati-Martin (2016) and references therein). However, this literature is not directly applicable to S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀. We leave the analysis of small-rank perturbations of such a matrix for future research.

6 Conclusion and discussion

This paper establishes the a.s. convergence of the largest and the smallest eigenvalues of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀ to the upper and the lower boundaries of the support of the Wachter distribution W_c. This complements Onatski and Wang's (2017) result on the a.s. weak convergence of the empirical distribution of these eigenvalues to W_c. The strategy of our proofs is similar to that of the proof of the convergence of the extreme eigenvalues of the sample covariance matrix in BS98. However, the fact that we have to deal with a product of the four dependent stochastic matrices S₀₀⁻¹, S₀₁, S₁₁⁻¹, and S₁₀ presents non-trivial challenges that we overcome. The eigenvalues of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀ can be interpreted as squared canonical correlations between the demeaned innovations of a high-dimensional random walk and the detrended and demeaned levels of this random walk.
Such eigenvalues form the basis for the LR test of no cointegration in a high-dimensional vector autoregression of order one. The LR statistic has a singularity at unity; hence, Onatski and Wang's (2017) result cannot be used to establish its a.s. convergence.
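To see numerically why the singularity at unity is harmless, note that Johansen's trace statistic has the form LR = −T Σ_i log(1 − λ_{ip}), so its limit involves ∫ −log(1 − λ) dW_c(λ). The sketch below (our own; it does not reproduce the paper's exact normalization of the LR statistic) checks that b₊ < 1 and that this integral is finite for c = 0.25:

```python
import numpy as np

c = 0.25
b_m = c*(np.sqrt(2) - np.sqrt(1 - c))**2/(1 + c)**2
b_p = c*(np.sqrt(2) + np.sqrt(1 - c))**2/(1 + c)**2
assert b_p < 1                       # support stays away from the singularity at unity

# integral of -log(1-lam) against the W_c density; the substitution
# lam = mid + rad*sin(theta) removes the endpoint square-root singularities
mid, rad = (b_p + b_m)/2, (b_p - b_m)/2
theta = np.linspace(-np.pi/2, np.pi/2, 100001)
lam = mid + rad*np.sin(theta)
dens = (1 + c)*np.sqrt(np.clip((b_p - lam)*(lam - b_m), 0, None)) \
    / (2*np.pi*c*lam*(1 - lam))
g = -np.log(1 - lam)*dens*rad*np.cos(theta)
limit = (theta[1] - theta[0])*(g.sum() - 0.5*(g[0] + g[-1]))
assert np.isfinite(limit) and limit > 0
print(round(limit, 4))
```

Because the integrand −log(1 − λ) is bounded on [b₋, b₊] whenever b₊ < 1, the a.s. limit of the suitably normalized LR statistic is a finite integral against W_c, which is the substance of the next paragraph.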

The result of this paper shows that the singularity can be ignored because none of the eigenvalues of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀ are close to unity asymptotically. Thus, our Corollary 2 establishes the a.s. limit of the LR statistic. We use this result to obtain an analytic formula for a Bartlett-type correction coefficient for the LR test.

We establish Theorem 1 under Gaussianity of the errors η_t of model (1). We need the Gaussianity for two reasons. First, it allows us to reduce the analysis of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀ to that of a product of matrices constructed from C, D, and A, which have the form ε′Wε with block-diagonal W, where ε has i.i.d. elements. Second, we use it to derive bounds on the expected value of the inverse of the smallest eigenvalue of A (in the SM). In principle, the first reason can be circumvented by simply assuming that the matrix ε of the discrete Fourier transforms of η has i.i.d. (but not necessarily Gaussian) elements. This still leaves the second reason intact. Unfortunately, even a seemingly innocuous assumption that the elements of ε are i.i.d. Bernoulli random variables leads to non-invertibility of A with small but positive probability and, hence, to the non-existence of the expected value of the inverse of the smallest eigenvalue of A. We leave removing the Gaussianity assumption as an important topic for future research.

Onatski and Wang (2017) establish the a.s. weak convergence of the empirical distribution of the eigenvalues of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀ to W_c under more general data generating processes than the one described by (1). Extension of the results of this paper to such more general processes would require analyzing the effect of small-rank perturbations on the extreme eigenvalues of S₀₀⁻¹S₀₁S₁₁⁻¹S₁₀. As we discuss above, such an analysis is not straightforward and needs a substantial further research effort. Another important research task is to study the asymptotic fluctuations of functionals of F_p around their a.s. limits.
This would allow one to derive an asymptotic distribution of the LR statistic under the simultaneous asymptotics. We are undertaking such a study as a separate project.

References

[1] Bai, Z. and Silverstein, J. (1998) No Eigenvalues Outside the Support of the Limiting Spectral Distribution of Large-dimensional Sample Covariance Matrices, The Annals of Probability 26, 316-345.

[2] Bai, Z., Hu, J., Pan, G., and Zhou, W. (2015) Convergence of the empirical distribution function of Beta matrices, Bernoulli 21, 1538-1574.

[3] Banerjee, A., Marcellino, M., and Osbat, C. (2004) Some cautions on the use of panel methods for integrated series of macroeconomic data, The Econometrics Journal 7(2), 322-340.

[4] Brillinger, D.R. (1981) Time Series: Data Analysis and Theory. Holden-Day, Inc., San Francisco.

[5] Capitaine, M. and Donati-Martin, C. (2016) Spectrum of deformed random matrices and free probability, arXiv:1607.05560v1.

[6] Davis, G.C. (2003) The generalized composite commodity theorem: Stronger support in the presence of data limitations, Review of Economics and Statistics 85, 476-480.

[7] Engel, C., N.C. Mark, and K.D. West (2015) Factor Model Forecasts of Exchange Rates, Econometric Reviews 34, 32-55.

[8] Golub, G.H. and C.F. Van Loan (1996) Matrix Computations, The Johns Hopkins University Press.

[9] Gonzalo, J. and J.-Y. Pitarakis (1995) Comovements in Large Systems, Working Paper 95-38, Statistics and Econometrics Series, Universidad Carlos III de Madrid.

[10] Gonzalo, J. and J.-Y. Pitarakis (1999) Dimensionality Effect in Cointegration Analysis, in Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W.J. Granger, Oxford University Press, Oxford, pp. 212-229.

[11] Ho, M.S. and B.E. Sorensen (1996) Finding Cointegration Rank in High Dimensional Systems Using the Johansen Test: An Illustration Using Data Based Monte Carlo Simulations, Review of Economics and Statistics 78, 726-732.

[12] Horn, R.A. and C.R. Johnson (1985) Matrix Analysis, Cambridge University Press.

[13] Jensen, J.L. and A.T.A. Wood (1997) On the non-existence of a Bartlett correction for unit root tests, Statistics and Probability Letters 35, 181-187.

[14] Johansen, S. (1988) Statistical Analysis of Cointegrating Vectors, Journal of Economic Dynamics and Control 12, 231-254.

[15] Johansen, S. (1991) Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models, Econometrica 59, 1551-1580.

[16] Johansen, S. (1995) Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, Oxford University Press Inc., New York.

[17] Johansen, S. (2002) A small sample correction for the test of cointegrating rank in the vector autoregressive model, Econometrica 70, 1929-1961.

[18] Johansen, S., H. Hansen, and S. Fachin (2005) A simulation study of some functionals of random walk, manuscript available at http://www.math.ku.dk/~sjo/.

[19] Larsson, R. (1998) Bartlett Corrections for Unit Root Test Statistics, Journal of Time Series Analysis 19, 426-438.

[20] Lewbel, A. (1996) Aggregation without separability: a generalized composite commodity theorem, American Economic Review 86, 524-543.

[21] Nielsen, B. (1997) Bartlett correction of the unit root test in autoregressive models, Biometrika 84, 500-504.

[22] Onatski, A. and C. Wang (2017) Alternative asymptotics for cointegration tests in large VARs, manuscript, University of Cambridge.

[23] Pedroni, P.L., Vogelsang, T.J., Wagner, M., and Westerlund, J. (2015) Nonparametric rank tests for non-stationary panels, Journal of Econometrics 185, 378-391.

[24] Tao, T. and Vu, V. (2010) Random Matrices: Universality of Local Eigenvalue Statistics, Acta Mathematica 206, 127-204.

Supplementary Material for "Extreme canonical correlations and high-dimensional cointegration analysis"

Alexei Onatski and Chen Wang
Faculty of Economics, University of Cambridge

August 5, 2017

Abstract. This note contains supplementary material for Onatski and Wang (2017a) (OW in what follows). It is lined up with the sections in the main text to make it easy to locate the required proofs.

Contents

1 Introduction and the main result
1.1 There is no supplementary material for this section.
2 Bartlett-type correction
2.1 Proof of Theorem OW3
2.1.1 Convergence of the largest eigenvalue of the ...-limit of (...)
3 Setup
3.1 Proof of Lemma OW4 (diagonalization)
3.2 Proof of Lemma OW5 (form of ...)
3.3 Derivation of equations OW11-OW14
3.3.1 Proof of Lemma 1 (links between variables with and without tilde)
4 Proof of Theorem OW1
4.1 Outline of the proof
4.1.1 There is no supplementary material for this section of OW.
4.2 Step 1: Speed of convergence of EF_p([a, b])
4.2.1 Rough bounds. Proof of Lemma OW7 (bound on ‖Ω_j^{-1} − (Ω_j^{(q)})^{-1}‖)
4.2.2 Rough bounds. Proof of Lemma 3 (bounds on ..., etc.)
4.2.3 Rough bounds. Bounds on the smallest and largest eigenvalues (probabilities of tail events, and moments)
4.2.4 Rough bounds. Proof of Lemma OW8 (bounds on ‖Ω_j^{(q)}‖)
4.2.5 System reduction. Derivation of system (OW31) and proof of Lemma OW10
4.2.6 Analysis of m − m₀. Proof of Lemma OW11
4.2.7 Analysis of m − m₀. Proof of Proposition OW12 (bound on EF_p([a, b]))
4.3 Step 2: Convergence of m − Em
4.3.1 Proof of equation (OW36) (initial representation of m − Em)
4.3.2 Proof of Lemma OW13 (boundedness of ‖(Ω_j^{(d)})^{-1}‖)
4.3.3 Details of a proof of (30) (about the a.s. convergence of ...)
4.3.4 Proof of the decomposition (OW40) (m − Em = W₁ + W₂ + W₃ + W₄)
4.3.5 Proof of Proposition OW14 (a.s. convergence of m − Em)
4.3.6 Proof of Lemma 23 (bound on max_j sup_{[a,b]} ‖E(Ω_j^{(q)})^{-1} − (Ω_j^{(d)})^{-1}‖)
4.4 Step 3: Convergence of Em − m₀