
VAR Models and Cointegration

The Granger representation theorem links cointegration to error correction models. In a series of important papers and in a marvelous textbook, Soren Johansen firmly roots cointegration and error correction models in a vector autoregression framework. This section outlines Johansen's approach to cointegration modeling.

The Cointegrated VAR

Consider the levels VAR(p) for the $(n \times 1)$ vector $Y_t$:

$$Y_t = \Phi D_t + \Pi_1 Y_{t-1} + \cdots + \Pi_p Y_{t-p} + \varepsilon_t, \quad t = 1, \ldots, T,$$

where $D_t$ contains the deterministic terms.

Remarks:

The VAR(p) model is stable if $\det(I_n - \Pi_1 z - \cdots - \Pi_p z^p) = 0$ has all roots outside the complex unit circle.

If there are roots on the unit circle then some or all of the variables in $Y_t$ are I(1) and they may also be cointegrated.

If $Y_t$ is cointegrated then the VAR representation is not the most suitable representation for analysis because the cointegrating relations are not explicitly apparent.

The cointegrating relations become apparent if the levels VAR is transformed to the vector error correction model (VECM)

$$\Delta Y_t = \Phi D_t + \Pi Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t,$$

$$\Pi = \Pi_1 + \cdots + \Pi_p - I_n, \qquad \Gamma_k = -\sum_{j=k+1}^{p} \Pi_j, \quad k = 1, \ldots, p-1.$$

In the VECM, $\Delta Y_t$ and its lags are I(0). The term $\Pi Y_{t-1}$ is the only one which includes potential I(1) variables, and for $\Delta Y_t$ to be I(0) it must be the case that $\Pi Y_{t-1}$ is also I(0). Therefore, $\Pi Y_{t-1}$ must contain the cointegrating relations if they exist.
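The mapping from the levels VAR coefficients to the VECM coefficients can be sketched numerically. The snippet below is a minimal illustration, not part of the original notes: the function name `var_to_vecm` and the VAR(2) coefficient values are assumptions chosen only so that the resulting long-run matrix has reduced rank.

```python
import numpy as np

def var_to_vecm(Pi_list):
    """Map levels-VAR(p) matrices [Pi_1, ..., Pi_p] to VECM matrices.

    Returns (Pi, Gammas) where Pi = Pi_1 + ... + Pi_p - I_n and
    Gammas[k-1] = Gamma_k = -(Pi_{k+1} + ... + Pi_p), k = 1, ..., p-1.
    """
    Pi_list = [np.asarray(P, dtype=float) for P in Pi_list]
    n = Pi_list[0].shape[0]
    p = len(Pi_list)
    Pi = sum(Pi_list) - np.eye(n)
    # Pi_list[k:] holds Pi_{k+1}, ..., Pi_p because the list is 0-indexed.
    Gammas = [-sum(Pi_list[k:], np.zeros((n, n))) for k in range(1, p)]
    return Pi, Gammas

# Hypothetical VAR(2) coefficients (illustrative values only).
Pi1 = np.array([[0.50, 0.25], [0.25, 0.50]])
Pi2 = np.array([[0.25, 0.00], [0.00, 0.25]])

Pi, Gammas = var_to_vecm([Pi1, Pi2])
print("Pi =\n", Pi)
print("Gamma_1 =\n", Gammas[0])
print("rank(Pi) =", np.linalg.matrix_rank(Pi, tol=1e-10))
```

With these illustrative values the long-run matrix is singular, which is the reduced-rank situation discussed next.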

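If the VAR(p) process has unit roots ($z = 1$) then

$$\det(I_n - \Pi_1 - \cdots - \Pi_p) = 0 \;\Rightarrow\; \det(\Pi) = 0 \;\Rightarrow\; \Pi \text{ is singular}.$$

If $\Pi$ is singular then it has reduced rank; that is, $\mathrm{rank}(\Pi) = r < n$. There are two cases to consider:

1. $\mathrm{rank}(\Pi) = 0$. This implies that $\Pi = 0$, so $Y_t \sim I(1)$ and is not cointegrated. The VECM reduces to a VAR(p-1) in first differences:

$$\Delta Y_t = \Phi D_t + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t.$$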

2. $0 < \mathrm{rank}(\Pi) = r < n$. This implies that $Y_t$ is I(1) with $r$ linearly independent cointegrating vectors and $n - r$ common stochastic trends (unit roots). Since $\Pi$ has rank $r$ it can be written as the product

$$\underset{(n \times n)}{\Pi} = \underset{(n \times r)}{\alpha} \; \underset{(r \times n)}{\beta'}$$

where $\alpha$ and $\beta$ are $(n \times r)$ matrices with $\mathrm{rank}(\alpha) = \mathrm{rank}(\beta) = r$. The rows of $\beta'$ form a basis for the $r$ cointegrating vectors and the elements of $\alpha$ distribute the impact of the cointegrating vectors to the evolution of $\Delta Y_t$. The VECM becomes

$$\Delta Y_t = \Phi D_t + \alpha \beta' Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t,$$

where $\beta' Y_{t-1} \sim I(0)$ since $\beta'$ is a matrix of cointegrating vectors.

Non-uniqueness

It is important to recognize that the factorization $\Pi = \alpha \beta'$ is not unique, since for any $r \times r$ nonsingular matrix $H$ we have

$$\alpha \beta' = \alpha H H^{-1} \beta' = (\alpha H)(\beta H^{-1\prime})' = \alpha^* \beta^{*\prime}, \qquad \alpha^* = \alpha H, \quad \beta^* = \beta H^{-1\prime}.$$

Hence the factorization $\Pi = \alpha \beta'$ only identifies the space spanned by the cointegrating relations. Obtaining unique values of $\alpha$ and $\beta'$ requires further restrictions on the model.
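A quick numerical check makes the non-uniqueness concrete. This is a sketch under assumed values: the matrices `alpha`, `beta`, and `H` below are hypothetical and chosen only for illustration (n = 3, r = 2).

```python
import numpy as np

# Hypothetical loading and cointegrating matrices (n = 3, r = 2).
alpha = np.array([[-0.50, 0.00],
                  [0.25, -0.50],
                  [0.00, 0.25]])
beta = np.array([[1.0, 0.0],
                 [-1.0, 1.0],
                 [0.0, -1.0]])

# Any nonsingular r x r matrix H gives an equivalent factorization.
H = np.array([[1.0, 1.0],
              [0.0, 2.0]])

alpha_star = alpha @ H                   # alpha* = alpha H
beta_star = beta @ np.linalg.inv(H).T    # beta*  = beta (H^{-1})'

Pi = alpha @ beta.T
Pi_star = alpha_star @ beta_star.T

print("same Pi:", np.allclose(Pi, Pi_star))
print("same beta:", np.allclose(beta, beta_star))
```

The two factorizations reproduce the same long-run matrix even though the individual factors differ, so only the column space of the cointegrating matrix is pinned down.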

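Example: Consider the bivariate VAR(1) model for $Y_t = (y_{1t}, y_{2t})'$

$$Y_t = \Pi_1 Y_{t-1} + \varepsilon_t.$$

The VECM is

$$\Delta Y_t = \Pi Y_{t-1} + \varepsilon_t, \qquad \Pi = \Pi_1 - I_2.$$

Assuming $Y_t$ is cointegrated, there exists a $2 \times 1$ vector $\beta = (\beta_1, \beta_2)'$ such that

$$\beta' Y_t = \beta_1 y_{1t} + \beta_2 y_{2t} \sim I(0).$$

Using the normalization $\beta_1 = 1$ and $\beta_2 = -\beta$, the cointegrating relation becomes

$$\beta' Y_t = y_{1t} - \beta y_{2t}.$$

This normalization suggests the stochastic long-run equilibrium relation

$$y_{1t} = \beta y_{2t} + u_t,$$

where $u_t = \beta' Y_t$ is the I(0) disequilibrium error.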

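Since $Y_t$ is cointegrated with one cointegrating vector, $\mathrm{rank}(\Pi) = 1$ so that

$$\Pi = \alpha \beta' = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \begin{pmatrix} 1 & -\beta \end{pmatrix} = \begin{pmatrix} \alpha_1 & -\alpha_1 \beta \\ \alpha_2 & -\alpha_2 \beta \end{pmatrix}.$$

The elements in the vector $\alpha$ are interpreted as speed of adjustment coefficients. The cointegrated VECM for $\Delta Y_t$ may be rewritten as

$$\Delta Y_t = \alpha \beta' Y_{t-1} + \varepsilon_t.$$

Writing the VECM equation by equation gives

$$\Delta y_{1t} = \alpha_1 (y_{1,t-1} - \beta y_{2,t-1}) + \varepsilon_{1t},$$
$$\Delta y_{2t} = \alpha_2 (y_{1,t-1} - \beta y_{2,t-1}) + \varepsilon_{2t}.$$

The stability conditions for the bivariate VECM are related to the stability conditions for the disequilibrium error $\beta' Y_t$.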

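It is straightforward to show that $\beta' Y_t$ follows an AR(1) process: premultiplying $Y_t = Y_{t-1} + \alpha \beta' Y_{t-1} + \varepsilon_t$ by $\beta'$ gives

$$\beta' Y_t = (1 + \beta' \alpha)\, \beta' Y_{t-1} + \beta' \varepsilon_t$$

or

$$u_t = \phi u_{t-1} + v_t, \qquad u_t = \beta' Y_t,$$
$$\phi = 1 + \beta' \alpha = 1 + (\alpha_1 - \beta \alpha_2), \qquad v_t = \beta' \varepsilon_t = \varepsilon_{1t} - \beta \varepsilon_{2t}.$$

The AR(1) model for $u_t$ is stable as long as

$$|\phi| = |1 + (\alpha_1 - \beta \alpha_2)| < 1.$$

For example, suppose $\beta = 1$. Then the stability condition is

$$|\phi| = |1 + (\alpha_1 - \alpha_2)| < 1,$$

which is satisfied if $\alpha_1 - \alpha_2 < 0$ and $\alpha_1 - \alpha_2 > -2$.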
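The AR(1) result for the disequilibrium error can be checked by simulation. The following is a minimal sketch under assumed values: the adjustment coefficients (-0.1, 0.1), the cointegrating vector (1, -1), the sample size, and the random seed are all illustrative choices, not values taken from this example.

```python
import numpy as np

alpha = np.array([-0.1, 0.1])   # assumed speed-of-adjustment coefficients
beta = np.array([1.0, -1.0])    # cointegrating vector (1, -beta)' with beta = 1

# phi = 1 + beta'alpha = 1 + (alpha_1 - beta * alpha_2)
phi = 1.0 + beta @ alpha
is_stable = abs(phi) < 1.0

# Simulate the bivariate VECM  dY_t = alpha * beta'Y_{t-1} + eps_t.
rng = np.random.default_rng(0)
T = 200
eps = rng.standard_normal((T, 2))
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = Y[t - 1] + alpha * (beta @ Y[t - 1]) + eps[t]

# The disequilibrium error u_t = beta'Y_t should satisfy
# u_t = phi * u_{t-1} + v_t exactly, with v_t = beta'eps_t.
u = Y @ beta
v = eps @ beta
max_gap = np.max(np.abs(u[1:] - phi * u[:-1] - v[1:]))

print("phi =", phi, "stable:", is_stable, "max identity gap:", max_gap)
```

The identity holds up to floating-point rounding because it is an algebraic consequence of the VECM, not a statistical approximation.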

Johansen's Methodology for Modeling Cointegration

The basic steps in Johansen's methodology are:

1. Specify and estimate a VAR(p) model for $Y_t$.
2. Construct likelihood ratio tests for the rank of $\Pi$ to determine the number of cointegrating vectors.
3. If necessary, impose normalization and identifying restrictions on the cointegrating vectors.
4. Given the normalized cointegrating vectors, estimate the resulting cointegrated VECM by maximum likelihood.

Likelihood Ratio Tests for the Number of Cointegrating Vectors

The unrestricted cointegrated VECM is denoted $H(r)$. The I(1) model $H(r)$ can be formulated as the condition that the rank of $\Pi$ is less than or equal to $r$. This creates a nested set of models

$$H(0) \subset \cdots \subset H(r) \subset \cdots \subset H(n),$$

where $H(0)$ is the non-cointegrated VAR and $H(n)$ is the stationary VAR(p).

This nested formulation is convenient for developing a sequential procedure to test for the number $r$ of cointegrating relationships.

Remarks:

Since the rank of the long-run impact matrix $\Pi$ gives the number of cointegrating relationships in $Y_t$, Johansen formulates likelihood ratio (LR) statistics for the number of cointegrating relationships as LR statistics for determining the rank of $\Pi$.

These LR tests are based on the estimated eigenvalues $\hat\lambda_1 > \hat\lambda_2 > \cdots > \hat\lambda_n$ associated with the matrix $\Pi$. These eigenvalues equal the squared canonical correlations between $\Delta Y_t$ and $Y_{t-1}$, corrected for lagged $\Delta Y_t$ and $D_t$, and so lie between 0 and 1.

Recall that the rank of $\Pi$ is equal to the number of non-zero eigenvalues of $\Pi$.

Johansen's Trace Statistic

Johansen's LR statistic tests the nested hypotheses

$$H_0(r_0): r = r_0 \quad \text{vs.} \quad H_1(r_0): r > r_0.$$

The LR statistic, called the trace statistic, is given by

$$LR_{trace}(r_0) = -T \sum_{i=r_0+1}^{n} \ln(1 - \hat\lambda_i).$$

If $\mathrm{rank}(\Pi) = r_0$ then $\hat\lambda_{r_0+1}, \ldots, \hat\lambda_n$ should all be close to zero and $LR_{trace}(r_0)$ should be small, since $\ln(1 - \hat\lambda_i) \approx 0$ for $i > r_0$.

In contrast, if $\mathrm{rank}(\Pi) > r_0$ then some of $\hat\lambda_{r_0+1}, \ldots, \hat\lambda_n$ will be nonzero (but less than 1) and $LR_{trace}(r_0)$ should be large, since $\ln(1 - \hat\lambda_i) \ll 0$ for some $i > r_0$.
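The trace statistic is a simple function of the estimated eigenvalues. The sketch below assumes the eigenvalues are already available; the function name `trace_statistic`, the eigenvalues, and the sample size are hypothetical and used only to show the arithmetic.

```python
import numpy as np

def trace_statistic(eigenvalues, T, r0):
    """LR_trace(r0) = -T * sum_{i=r0+1}^{n} ln(1 - lambda_i).

    Eigenvalues are sorted in decreasing order first, so index r0
    (0-based) corresponds to lambda_{r0+1} in the formula.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return -T * np.sum(np.log(1.0 - lam[r0:]))

# Hypothetical eigenvalues and sample size, for illustration only.
lam_hat = [0.30, 0.02]
T = 100

for r0 in range(len(lam_hat)):
    print("r0 =", r0, "LR_trace =", trace_statistic(lam_hat, T, r0))
```

With one eigenvalue well away from zero and one near zero, the statistic for r0 = 0 is large while the statistic for r0 = 1 is small, which matches the reasoning above.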

Result: The asymptotic null distribution of $LR_{trace}(r_0)$ is not chi-square but instead is a multivariate version of the Dickey-Fuller unit root distribution, which depends on the dimension $n - r_0$ and the specification of the deterministic terms. Critical values for this distribution are tabulated in Osterwald-Lenum (1992) for $n - r_0 = 1, \ldots, 10$.

Sequential Procedure for Determining the Number of Cointegrating Vectors

1. First test $H_0(r_0 = 0)$ against $H_1(r_0 > 0)$. If this null is not rejected then it is concluded that there are no cointegrating vectors among the $n$ variables in $Y_t$.
2. If $H_0(r_0 = 0)$ is rejected then it is concluded that there is at least one cointegrating vector, and we proceed to test $H_0(r_0 = 1)$ against $H_1(r_0 > 1)$. If this null is not rejected then it is concluded that there is only one cointegrating vector.
3. If $H_0(r_0 = 1)$ is rejected then it is concluded that there are at least two cointegrating vectors.
4. The sequential procedure is continued until the null is not rejected.
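The sequential procedure above can be sketched as a short loop. This is an illustration only: the function name `select_rank` is hypothetical, and the statistics and critical values passed in below are placeholder numbers, not values from Osterwald-Lenum (1992) or from any data set.

```python
def select_rank(trace_stats, critical_values):
    """Return the first r0 whose null H0(r = r0) is not rejected.

    trace_stats[r0] is LR_trace(r0) and critical_values[r0] is the
    matching critical value (for dimension n - r0), r0 = 0, ..., n-1.
    If every null is rejected, the procedure ends at r = n.
    """
    for r0, (stat, cv) in enumerate(zip(trace_stats, critical_values)):
        if stat <= cv:          # null not rejected: stop here
            return r0
    return len(trace_stats)     # all nulls rejected

# Placeholder inputs (hypothetical; supply real statistics and
# tabulated critical values in practice).
stats = [40.0, 12.0, 1.5]
cvs = [30.0, 15.0, 4.0]

print("selected rank:", select_rank(stats, cvs))
```

In this made-up case the null of no cointegration is rejected and the null of one cointegrating vector is not, so the loop stops at rank one.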

Johansen's Maximum Eigenvalue Statistic

Johansen also derives an LR statistic for the hypotheses

$$H_0(r_0): r = r_0 \quad \text{vs.} \quad H_1(r_0): r = r_0 + 1.$$

The LR statistic, called the maximum eigenvalue statistic, is given by

$$LR_{max}(r_0) = -T \ln(1 - \hat\lambda_{r_0+1}).$$

As with the trace statistic, the asymptotic null distribution of $LR_{max}(r_0)$ is not chi-square but instead is a complicated function of Brownian motion, which depends on the dimension $n - r_0$ and the specification of the deterministic terms. Critical values for this distribution are tabulated in Osterwald-Lenum (1992) for $n - r_0 = 1, \ldots, 10$.
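The two statistics are linked by construction: the trace statistic at a given rank is the sum of the maximum eigenvalue statistics from that rank upward. The sketch below illustrates this with hypothetical eigenvalues and sample size; the function names are assumptions made for the example.

```python
import numpy as np

def max_eigenvalue_statistic(eigenvalues, T, r0):
    """LR_max(r0) = -T * ln(1 - lambda_{r0+1}), eigenvalues sorted descending."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return -T * np.log(1.0 - lam[r0])

def trace_statistic(eigenvalues, T, r0):
    """LR_trace(r0) = -T * sum_{i=r0+1}^{n} ln(1 - lambda_i)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return -T * np.sum(np.log(1.0 - lam[r0:]))

# Hypothetical eigenvalues and sample size, for illustration only.
lam_hat = [0.30, 0.10, 0.02]
T = 100
n = len(lam_hat)

for r0 in range(n):
    lr_max = max_eigenvalue_statistic(lam_hat, T, r0)
    lr_trace = trace_statistic(lam_hat, T, r0)
    sum_of_max = sum(max_eigenvalue_statistic(lam_hat, T, j) for j in range(r0, n))
    print(r0, lr_max, lr_trace, sum_of_max)
```

In particular, the two statistics coincide when only one eigenvalue remains to be tested.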

Specification of Deterministic Terms

Following Johansen (1995), the deterministic terms are restricted to the form

$$\Phi D_t = \mu_t = \mu_0 + \mu_1 t.$$

If the deterministic terms are unrestricted then the time series in $Y_t$ may exhibit quadratic trends and there may be a linear trend term in the cointegrating relationships. Restricted versions of the trend parameters $\mu_0$ and $\mu_1$ limit the trending nature of the series in $Y_t$. The trend behavior of $Y_t$ can be classified into five cases:

1. Model $H_2(r)$: $\mu_t = 0$ (no constant):

$$\Delta Y_t = \alpha \beta' Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t,$$

and all the series in $Y_t$ are I(1) without drift and the cointegrating relations $\beta' Y_t$ have mean zero.

2. Model $H_1^*(r)$: $\mu_t = \mu_0 = \alpha \rho_0$ (restricted constant):

$$\Delta Y_t = \alpha(\beta' Y_{t-1} + \rho_0) + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t,$$

the series in $Y_t$ are I(1) without drift and the cointegrating relations $\beta' Y_t$ have non-zero means $\rho_0$.

3. Model $H_1(r)$: $\mu_t = \mu_0$ (unrestricted constant):

$$\Delta Y_t = \mu_0 + \alpha \beta' Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t,$$

the series in $Y_t$ are I(1) with drift vector $\mu_0$ and the cointegrating relations $\beta' Y_t$ may have a nonzero mean.

4. Model $H^*(r)$: $\mu_t = \mu_0 + \alpha \rho_1 t$ (restricted trend). The restricted VECM is

$$\Delta Y_t = \mu_0 + \alpha(\beta' Y_{t-1} + \rho_1 t) + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t,$$

the series in $Y_t$ are I(1) with drift vector $\mu_0$ and the cointegrating relations $\beta' Y_t$ have a linear trend term $\rho_1 t$.

5. Model $H(r)$: $\mu_t = \mu_0 + \mu_1 t$ (unrestricted constant and trend). The unrestricted VECM is

$$\Delta Y_t = \mu_0 + \mu_1 t + \alpha \beta' Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t,$$

the series in $\Delta Y_t$ have a linear trend (a quadratic trend in the levels $Y_t$) and the cointegrating relations $\beta' Y_t$ have a linear trend.

Maximum Likelihood Estimation of the Cointegrated VECM

If it is found that $\mathrm{rank}(\Pi) = r$, $0 < r < n$, then the cointegrated VECM

$$\Delta Y_t = \Phi D_t + \alpha \beta' Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t$$

becomes a reduced rank multivariate regression. Johansen derived the maximum likelihood estimator of the parameters under the reduced rank restriction $\mathrm{rank}(\Pi) = r$ (see Hamilton for details). He shows that

$$\hat\beta_{mle} = (\hat v_1, \ldots, \hat v_r),$$

where $\hat v_i$ are the eigenvectors associated with the eigenvalues $\hat\lambda_i$. The MLEs of the remaining parameters are obtained by least squares estimation of

$$\Delta Y_t = \Phi D_t + \alpha \hat\beta_{mle}' Y_{t-1} + \Gamma_1 \Delta Y_{t-1} + \cdots + \Gamma_{p-1} \Delta Y_{t-p+1} + \varepsilon_t.$$
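The reduced rank regression can be sketched for the simplest case. The code below is a minimal illustration under strong assumptions that go beyond the text: a VAR(1) with no deterministic terms and no lagged differences, so no preliminary regressions are needed. The product-moment matrices S00, S01, S11 and the generalized eigenvalue problem are the standard ingredients of Johansen's estimator, but the function name, the simulated data, and the true parameter values are all hypothetical.

```python
import numpy as np

def johansen_var1(Y, r):
    """Reduced-rank estimates for dY_t = alpha beta' Y_{t-1} + eps_t.

    Y is a (T+1) x n array of levels. Returns eigenvalues (descending),
    beta_hat (n x r) and alpha_hat (n x r).
    """
    Y = np.asarray(Y, dtype=float)
    R0 = np.diff(Y, axis=0)          # Delta Y_t
    R1 = Y[:-1]                      # Y_{t-1}
    T = R0.shape[0]
    S00 = R0.T @ R0 / T
    S01 = R0.T @ R1 / T
    S11 = R1.T @ R1 / T

    # Solve |lambda * S11 - S10 S00^{-1} S01| = 0 via a symmetric transform.
    B = S01.T @ np.linalg.solve(S00, S01)             # S10 S00^{-1} S01
    L = np.linalg.cholesky(S11)                       # S11 = L L'
    A = np.linalg.solve(L, np.linalg.solve(L, B).T)   # L^{-1} B L^{-T}
    A = 0.5 * (A + A.T)                               # remove rounding asymmetry
    lam, W = np.linalg.eigh(A)
    order = np.argsort(lam)[::-1]
    lam, W = lam[order], W[:, order]

    V = np.linalg.solve(L.T, W)      # eigenvectors, scaled so V' S11 V = I
    beta_hat = V[:, :r]
    alpha_hat = S01 @ beta_hat       # least squares of dY_t on beta'Y_{t-1}
    return lam, beta_hat, alpha_hat

# Simulated data with assumed true values alpha = (-0.4, 0.2)', beta = (1, -1)'.
rng = np.random.default_rng(0)
alpha_true = np.array([-0.4, 0.2])
beta_true = np.array([1.0, -1.0])
T = 1000
Y = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    Y[t] = Y[t - 1] + alpha_true * (beta_true @ Y[t - 1]) + rng.standard_normal(2)

lam, beta_hat, alpha_hat = johansen_var1(Y, r=1)
print("eigenvalues:", lam)
print("beta_hat normalized on y1:", beta_hat[:, 0] / beta_hat[0, 0])
```

The raw eigenvector is only determined up to scale, so it is divided by its first element before being compared with the assumed cointegrating vector.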

Normalized Estimates of α and β

The factorization

$$\hat\Pi_{mle} = \hat\alpha_{mle} \hat\beta_{mle}'$$

is not unique. The columns of $\hat\beta_{mle}$ may be interpreted as linear combinations of the underlying cointegrating relations. For interpretation, it is often convenient to normalize or identify the cointegrating vectors by choosing a specific coordinate system in which to express the variables.

Johansen's normalized MLE

An arbitrary normalization, suggested by Johansen, is to solve for the triangular representation of the cointegrated system (the default method in Eviews). The resulting normalized cointegrating vector is denoted $\hat\beta_{c,mle}$. The normalization of the MLE for $\beta$ to $\hat\beta_{c,mle}$ will affect the MLE of $\alpha$ but not the MLEs of the other parameters in the VECM.

Let $\hat\beta_{c,mle}$ denote the MLE of the normalized cointegrating matrix $\beta_c$. Johansen (1995) showed that

$T(\mathrm{vec}(\hat\beta_{c,mle}) - \mathrm{vec}(\beta_c))$ is asymptotically (mixed) normally distributed;

$\hat\beta_{c,mle}$ is super consistent, converging at rate $T$ rather than $\sqrt{T}$.

Testing Linear Restrictions on β

The Johansen MLE procedure only produces an estimate of the basis for the space of cointegrating vectors. It is often of interest to test if some hypothesized cointegrating vector lies in the space spanned by the estimated basis:

$$H_0: \underset{(r \times n)}{\beta'} = \begin{pmatrix} \beta_0' \\ \phi_0' \end{pmatrix},$$

where $\beta_0'$ is the $s \times n$ matrix of hypothesized cointegrating vectors and $\phi_0'$ is the $(r - s) \times n$ matrix of remaining unspecified cointegrating vectors.

Result: Johansen (1995) showed that a likelihood ratio statistic can be computed, which is asymptotically distributed as a $\chi^2$ with $s(n - r)$ degrees of freedom.

Cointegration and the BN Decomposition

The Granger Representation Theorem (GRT) provides an explicit link between the VECM form of a cointegrated VAR and the Wold or moving average representation. The GRT also provides insight into the Beveridge-Nelson decomposition of a cointegrated time series.

Let $y_t$ be cointegrated with $r$ cointegrating vectors captured in the $r \times n$ matrix $\beta'$ so that $\beta' y_t$ is I(0). Suppose $\Delta y_t$ has the Wold representation

$$\Delta y_t = \mu + \Psi(L) u_t, \qquad \Psi(L) = \sum_{k=0}^{\infty} \Psi_k L^k, \quad \Psi_0 = I_n.$$

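Using $\Psi(L) = \Psi(1) + (1 - L)\tilde\Psi(L)$, the BN decomposition of $y_t$ is given by

$$y_t = y_0 + \mu t + \Psi(1) \sum_{k=1}^{t} u_k + \tilde u_t - \tilde u_0, \qquad \tilde u_t = \tilde\Psi(L) u_t.$$

Multiply both sides by $\beta'$ to give

$$\beta' y_t = \beta' \mu t + \beta' \Psi(1) \sum_{k=1}^{t} u_k + \beta'(y_0 + \tilde u_t - \tilde u_0).$$

Since $\beta' y_t$ is I(0) we must have that

$$\beta' \Psi(1) = 0 \;\Rightarrow\; \Psi(1) \text{ is singular and has rank } n - r.$$

The singularity of $\Psi(1)$ implies that the long-run covariance of $\Delta y_t$,

$$\Psi(1) \Sigma \Psi(1)',$$

is singular and has rank $n - r$.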

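Now suppose that $y_t$ has the VECM representation

$$\Gamma(L) \Delta y_t = c + \Pi y_{t-1} + u_t, \qquad \Pi = \alpha \beta',$$

where the $n \times r$ matrices $\alpha$ and $\beta$ both have rank $r$. The Granger Representation Theorem (GRT) gives an explicit mapping from the BN decomposition to the parameters of the VECM.

Define the $n \times (n - r)$ full rank matrices $\alpha_\perp$ and $\beta_\perp$ such that

$\alpha' \alpha_\perp = 0$, $\beta' \beta_\perp = 0$

$\mathrm{rank}(\alpha, \alpha_\perp) = n$, $\mathrm{rank}(\beta, \beta_\perp) = n$

$(\alpha_\perp' \Gamma(1) \beta_\perp)^{-1}$ exists, where $\Gamma(1) = I_n - \sum_{i=1}^{p-1} \Gamma_i$

$\beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1} \alpha_\perp' + \alpha (\alpha' \Gamma(1) \beta)^{-1} \beta' = I_n$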

Theorem (GRT). If $\det(A(z)) = 0$ implies that $|z| > 1$ or $z = 1$, and $\mathrm{rank}(\Pi) = r < n$, then there exist $n \times r$ matrices $\alpha$ and $\beta$ of rank $r$ such that $\Pi = \alpha \beta'$. A necessary and sufficient condition for $\beta' y_t$ to be I(0) is that $\alpha_\perp' \Gamma(1) \beta_\perp$ has full rank. Then the BN decomposition of $y_t$ has the representation

$$y_t = \mu t + \Psi(1) \sum_{k=1}^{t} u_k + y_0 + \tilde u_t - \tilde u_0,$$

where

$$\Psi(1) = \beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1} \alpha_\perp'.$$

$y_t$ is a cointegrated process with cointegrating vectors given by the rows of $\beta'$.

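The main part of the GRT is the explicit representation for $\Psi(1)$:

$$\Psi(1) = \beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1} \alpha_\perp'.$$

Notice that

$$\beta' \Psi(1) = \beta' \beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1} \alpha_\perp' = 0,$$
$$\Psi(1) \alpha = \beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1} \alpha_\perp' \alpha = 0.$$

The common trends in $y_t$ are extracted using

$$TS_t = \Psi(1) \sum_{k=1}^{t} u_k = \beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1} \alpha_\perp' \sum_{k=1}^{t} u_k = \xi\, \alpha_\perp' \sum_{k=1}^{t} u_k,$$

where $\xi = \beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1}$. Hence the common trends are the linear combinations

$$\alpha_\perp' \sum_{k=1}^{t} u_k.$$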

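The GRT in a Cointegrated Bivariate VAR(1) Model

To illustrate the GRT, consider the simple cointegrated bivariate VECM

$$\Delta y_t = \alpha \beta' y_{t-1} + u_t,$$

where $\alpha = (-0.1, 0.1)'$ and $\beta = (1, -1)'$. Here there is one cointegrating vector and one common trend. It may be easily deduced that

$$\Gamma(1) = I_2, \quad \alpha_\perp = (1, 1)', \quad \beta_\perp = (1, 1)', \quad \alpha_\perp' \Gamma(1) \beta_\perp = 2.$$

The common trend is then given by

$$\alpha_\perp' \sum_{k=1}^{t} u_k = (1, 1) \begin{pmatrix} \sum_{k=1}^{t} u_{1k} \\ \sum_{k=1}^{t} u_{2k} \end{pmatrix} = \sum_{k=1}^{t} u_{1k} + \sum_{k=1}^{t} u_{2k},$$

and the loadings on the common trend are

$$\xi = \beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1} = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}.$$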

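Now suppose that $\alpha = (-0.1, 0)'$ so that $y_{2t}$ is weakly and strongly exogenous. The VECM has the simplified form

$$\Delta y_{1t} = -0.1\, \beta' y_{t-1} + u_{1t},$$
$$\Delta y_{2t} = u_{2t}.$$

Then

$$\Gamma(1) = I_2, \quad \alpha_\perp = (0, 1)', \quad \beta_\perp = (1, 1)', \quad \alpha_\perp' \Gamma(1) \beta_\perp = 1.$$

Interestingly, the common trend is simply $y_{2t}$ (taking $y_{20} = 0$):

$$\alpha_\perp' \sum_{k=1}^{t} u_k = (0, 1) \begin{pmatrix} \sum_{k=1}^{t} u_{1k} \\ \sum_{k=1}^{t} u_{2k} \end{pmatrix} = \sum_{k=1}^{t} u_{2k} = y_{2t}.$$

The loadings on the common trend are

$$\xi = \beta_\perp (\alpha_\perp' \Gamma(1) \beta_\perp)^{-1} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$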
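The long-run impact matrix in the two bivariate examples can be checked numerically. The sketch below assumes the adjustment vectors (-0.1, 0.1)' and (-0.1, 0)' with cointegrating vector (1, -1)', which are the values consistent with the orthogonal complements quoted in the examples. The helper names `orth_complement` and `long_run_impact` are hypothetical, and the orthogonal complements are computed from an SVD, so they may differ from (1, 1)' or (0, 1)' by a scale factor that cancels in the product.

```python
import numpy as np

def orth_complement(A):
    """Basis for the orthogonal complement of the columns of an n x r matrix.

    Assumes A has full column rank r; returns an n x (n - r) matrix.
    """
    A = np.asarray(A, dtype=float)
    n, r = A.shape
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    return U[:, r:]

def long_run_impact(alpha, beta, Gamma1):
    """Psi(1) = beta_perp (alpha_perp' Gamma(1) beta_perp)^{-1} alpha_perp'."""
    a_perp = orth_complement(alpha)
    b_perp = orth_complement(beta)
    middle = a_perp.T @ Gamma1 @ b_perp
    return b_perp @ np.linalg.solve(middle, a_perp.T)

beta = np.array([[1.0], [-1.0]])
Gamma1 = np.eye(2)                     # VAR(1): no lagged differences

alpha_a = np.array([[-0.1], [0.1]])    # both variables adjust
alpha_b = np.array([[-0.1], [0.0]])    # y2 does not adjust

Psi1_a = long_run_impact(alpha_a, beta, Gamma1)
Psi1_b = long_run_impact(alpha_b, beta, Gamma1)

print("Psi(1), first case:\n", Psi1_a)
print("Psi(1), second case:\n", Psi1_b)
print("beta'Psi(1) = 0:", np.allclose(beta.T @ Psi1_a, 0), np.allclose(beta.T @ Psi1_b, 0))
print("Psi(1)alpha = 0:", np.allclose(Psi1_a @ alpha_a, 0), np.allclose(Psi1_b @ alpha_b, 0))
```

Each result equals the loading vector times the transposed orthogonal complement of the adjustment vector, and in the second case only shocks to the second variable have a permanent effect.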