Notes on Time Series Modeling

Garey Ramey
University of California, San Diego
January 17

1 Stationary processes

Definition. A stochastic process is any set of random variables $y_t$ indexed by $t \in T$: $\{y_t\}_{t \in T}$. These notes consider discrete stochastic processes, i.e., processes indexed by the set of integers $T = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$: $\{y_t\}_{t=-\infty}^{\infty}$. Moreover, each $y_t$ is a real number. Stochastic processes will be referred to more concisely as "processes."

Definition. The joint distribution of the process $\{y_t\}$ is determined by the joint distributions of all finite subsets of the $y_t$'s:

$$F_{t_1, t_2, \ldots, t_n}(\xi_1, \xi_2, \ldots, \xi_n) = \Pr(y_{t_1} \le \xi_1,\; y_{t_2} \le \xi_2,\; \ldots,\; y_{t_n} \le \xi_n),$$

for all possible collections of distinct integers $t_1, t_2, \ldots, t_n$. This distribution can be used to determine moments:

Mean: $\mu_t = E(y_t) = \int y_t \, dF_t(y_t)$;

Variance: $\gamma_{0t} = E(y_t - \mu_t)^2 = \int (y_t - \mu_t)^2 \, dF_t(y_t)$;

Autocovariance: $\gamma_{jt} = E(y_t - \mu_t)(y_{t-j} - \mu_{t-j}) = \iint (y_t - \mu_t)(y_{t-j} - \mu_{t-j}) \, dF_{t,t-j}(y_t, y_{t-j})$.

Definition. $\{\varepsilon_t\}$ is a white noise process if, for all $t$:

$$\mu_t = E(\varepsilon_t) = 0; \qquad \gamma_{0t} = E(\varepsilon_t^2) = \sigma^2; \qquad \gamma_{jt} = E(\varepsilon_t \varepsilon_{t-j}) = 0, \; j \ne 0.$$

$\{\varepsilon_t\}$ is a Gaussian white noise process if $\varepsilon_t \sim N(0, \sigma^2)$ for all $t$, i.e., $\varepsilon_t$ is normally distributed with mean $0$ and variance $\sigma^2$.

Definition. $\{y_t\}$ is an AR(1) process (first-order autoregressive) if

$$y_t = c + \phi y_{t-1} + \varepsilon_t,$$

where $\{\varepsilon_t\}$ is white noise and $c, \phi$ are arbitrary constants. For an AR(1) process with $|\phi| < 1$:

$$\mu_t = \frac{c}{1-\phi}; \qquad \gamma_{0t} = \frac{\sigma^2}{1-\phi^2}; \qquad \gamma_{jt} = \frac{\phi^j \sigma^2}{1-\phi^2}.$$

Note that the moments in the preceding examples do not depend on $t$. This property defines an important class of processes.

Definition. $\{y_t\}$ is covariance-stationary or weakly stationary if $\mu_t$ and $\gamma_{jt}$ do not depend on $t$. For these notes we will simply say "stationary." Intuitively, for a stationary process the effects of a given realization of $y_t$ die out as $t \to \infty$.
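As a quick numerical check of the AR(1) formulas above, the following sketch simulates a Gaussian AR(1) and compares sample moments with $\mu = c/(1-\phi)$, $\gamma_0 = \sigma^2/(1-\phi^2)$, and $\gamma_j = \phi^j \sigma^2/(1-\phi^2)$. It assumes NumPy is available; the parameter values are illustrative and not from the notes.

```python
import numpy as np

# Illustrative parameters (not from the notes): y_t = c + phi*y_{t-1} + eps_t
c, phi, sigma, T = 1.0, 0.9, 0.5, 200_000
rng = np.random.default_rng(0)

eps = rng.normal(0.0, sigma, T)
y = np.empty(T)
y[0] = c / (1 - phi)                      # start at the unconditional mean
for t in range(1, T):
    y[t] = c + phi * y[t - 1] + eps[t]

mu = c / (1 - phi)                        # theoretical mean
gamma0 = sigma**2 / (1 - phi**2)          # theoretical variance
gamma = lambda j: phi**j * gamma0         # theoretical autocovariance

print("mean:    sample %.4f  theory %.4f" % (y.mean(), mu))
print("var:     sample %.4f  theory %.4f" % (y.var(), gamma0))
for j in (1, 2, 5):
    sample_cov = np.mean((y[j:] - y.mean()) * (y[:-j] - y.mean()))
    print("gamma_%d: sample %.4f  theory %.4f" % (j, sample_cov, gamma(j)))
```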

Definition. $\{y_t\}$ is an MA($\infty$) process (infinite-order moving average) if

$$y_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},$$

where $\{\varepsilon_t\}$ is white noise and $\mu, \psi_0, \psi_1, \ldots$ are arbitrary constants. A sufficient condition for stationarity of an MA($\infty$) process is square summability of the coefficients $\{\psi_j\}$:

$$\sum_{j=0}^{\infty} \psi_j^2 < \infty.$$

2 Linear forecasting interpretation

Consider the problem of forecasting $y_t$ on the basis of past observations $y_{t-1}, y_{t-2}, \ldots$. A linear forecasting rule predicts $y_t$ by a linear function of $n$ past observations $y_{t-1}, y_{t-2}, \ldots, y_{t-n}$:

$$\sum_{j=1}^{n} g_j(n) y_{t-j},$$

where $g_1(n), \ldots, g_n(n)$ are constants. The forecast error and mean squared error of the forecast rule are given by:

$$FE = y_t - \sum_{j=1}^{n} g_j(n) y_{t-j}; \qquad MSE = E\Big(y_t - \sum_{j=1}^{n} g_j(n) y_{t-j}\Big)^2.$$

Definition. The linear projection of $y_t$ on $y_{t-1}, y_{t-2}, \ldots, y_{t-n}$ is the linear forecasting rule that satisfies

$$E\Big[\Big(y_t - \sum_{j=1}^{n} g_j^P(n) y_{t-j}\Big) y_{t-s}\Big] = 0, \quad s = 1, 2, \ldots,$$

i.e., the forecast error is uncorrelated with $y_{t-s}$ for all $s > 0$. Let the linear projection be denoted by $y_t^P(n)$: $y_t^P(n) = \sum_{j=1}^{n} g_j^P(n) y_{t-j}$. The associated forecast error is denoted by $\varepsilon_t^P(n)$: $\varepsilon_t^P(n) = y_t - \sum_{j=1}^{n} g_j^P(n) y_{t-j}$.
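For a zero-mean stationary process, the orthogonality conditions above reduce to a linear system in the autocovariances, the normal equations $\sum_{j=1}^{n} g_j^P(n)\,\gamma_{j-s} = \gamma_s$, $s = 1, \ldots, n$. The sketch below is a minimal illustration of this, assuming NumPy and plugging in the AR(1) autocovariances $\gamma_j = \phi^{|j|}\gamma_0$ from Section 1; the values of $\phi$, $\gamma_0$, and $n$ are arbitrary. The solution puts weight $\phi$ on the first lag and zero on the rest, anticipating the AR(1) projection example later in these notes.

```python
import numpy as np

# Illustrative AR(1) autocovariances: gamma_j = phi**|j| * gamma0 (phi, gamma0 hypothetical)
phi, gamma0, n = 0.9, 1.0, 5
gamma = lambda j: phi**abs(j) * gamma0

# Normal equations: sum_j g_j * gamma(j - s) = gamma(s), s = 1..n (zero-mean process)
G = np.array([[gamma(j - s) for j in range(1, n + 1)] for s in range(1, n + 1)])
rhs = np.array([gamma(s) for s in range(1, n + 1)])
g_proj = np.linalg.solve(G, rhs)

print(np.round(g_proj, 6))  # approximately [phi, 0, 0, 0, 0]
```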

The linear projection has the following key property.

Proposition. The linear projection minimizes MSE among all linear forecasting rules.

Proof. For any linear forecasting rule, we may write

$$\begin{aligned}
MSE &= E\Big(y_t - \sum_{j=1}^{n} g_j(n) y_{t-j}\Big)^2 = E\Big(y_t - y_t^P(n) + y_t^P(n) - \sum_{j=1}^{n} g_j(n) y_{t-j}\Big)^2 \\
&= E\big(y_t - y_t^P(n)\big)^2 + E\Big(y_t^P(n) - \sum_{j=1}^{n} g_j(n) y_{t-j}\Big)^2 + 2E\Big[\big(y_t - y_t^P(n)\big)\Big(y_t^P(n) - \sum_{j=1}^{n} g_j(n) y_{t-j}\Big)\Big] \\
&= E\big(y_t - y_t^P(n)\big)^2 + E\Big(\sum_{j=1}^{n} \big(g_j^P(n) - g_j(n)\big) y_{t-j}\Big)^2 + 2\sum_{j=1}^{n} \big(g_j^P(n) - g_j(n)\big) E\big[\big(y_t - y_t^P(n)\big) y_{t-j}\big].
\end{aligned}$$

The third term is zero by the definition of the linear projection. Thus, MSE is minimized by setting $g_j(n) = g_j^P(n)$ for all $j$, i.e., the linear projection gives the lowest MSE among all linear forecasting rules.

It can be shown that the linear projections $y_t^P(n)$ converge in mean square to a random variable $y_t^P$ as $n \to \infty$:

$$\lim_{n \to \infty} E\big(y_t^P(n) - y_t^P\big)^2 = 0.$$

$y_t^P$ is the linear projection of $y_t$ on $y_{t-1}, y_{t-2}, \ldots$.

Definition. The fundamental innovation is the forecast error associated with the linear projection of $y_t$ on $y_{t-1}, y_{t-2}, \ldots$: $\varepsilon_t = y_t - y_t^P$. Note that the fundamental innovation is a least squares residual that obeys the orthogonality condition $E(\varepsilon_t y_{t-s}) = 0$ for $s = 1, 2, \ldots$.

Wold Decomposition Theorem. Any stationary process $\{y_t\}$ with $E y_t = 0$ can be represented as

$$y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} + \kappa_t,$$

where: (i) $\psi_0 = 1$ and $\sum_{j=0}^{\infty} \psi_j^2 < \infty$ (square summability); (ii) $\{\varepsilon_t\}$ is white noise; (iii) $\varepsilon_t = y_t - y_t^P$ (fundamental innovation); (iv) $\kappa_t$ is the linear projection of $\kappa_t$ on $y_{t-1}, y_{t-2}, \ldots$; and (v) $E(\varepsilon_s \kappa_t) = 0$ for all $s$ and $t$. $\sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$ is called the linearly indeterministic component, and $\kappa_t$ is called the linearly deterministic component.

The Wold Decomposition Theorem represents the stationary process $\{y_t\}$ in terms of processes $\{\varepsilon_t\}$ and $\{\kappa_t\}$ that are orthogonal at all leads and lags. The component $\{\kappa_t\}$ can be predicted arbitrarily well from a linear function of $y_{t-1}, y_{t-2}, \ldots$, while the component $\{\varepsilon_t\}$ is the forecast error when $y_t^P$ is used to forecast $y_t$.

Definition. A stationary process $\{y_t\}$ is purely linearly indeterministic if $\kappa_t = 0$ for all $t$.

For a purely linearly indeterministic process, the Wold Decomposition Theorem shows that $\{y_t\}$ can be represented as an MA($\infty$) process with $\mu = 0$:

$$y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}.$$

This is called the moving average representation of $\{y_t\}$.

Example. An AR(1) process $\{y_t\}$ may be represented using the lag operator:

$$(1 - \phi L) y_t = c + \varepsilon_t.$$

If $|\phi| < 1$:

$$y_t = \frac{1}{1 - \phi L}(c + \varepsilon_t) = \sum_{j=0}^{\infty} (\phi L)^j (c + \varepsilon_t) = \frac{c}{1-\phi} + \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}.$$

Express as deviation from mean: $\hat{y}_t = y_t - \frac{c}{1-\phi} = \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}$. Then $E(\hat{y}_t) = 0$. It follows that $\{\hat{y}_t\}$ is purely linearly indeterministic, and the MA($\infty$) representation has $\psi_j = \phi^j$. Moreover:

$$E\big[(\hat{y}_t - \phi \hat{y}_{t-1}) \hat{y}_{t-s}\big] = E\Big[\varepsilon_t \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-s-j}\Big] = \sum_{j=0}^{\infty} \phi^j E(\varepsilon_t \varepsilon_{t-s-j}) = 0, \quad s = 1, 2, \ldots.$$

Thus $\hat{y}_t^P = \phi \hat{y}_{t-1}$, and the $\varepsilon_t$'s are the fundamental innovations of the process.

3 MA($\infty$) representation of AR(p) processes

Definition. $\{y_t\}$ is an AR(p) process ($p$th-order autoregressive) if

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t, \tag{1}$$

where $\{\varepsilon_t\}$ is white noise and $c, \phi_1, \ldots, \phi_p$ are arbitrary constants. Represent (1) using the lag operator:

$$(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) y_t = c + \varepsilon_t.$$

To obtain an MA($\infty$) representation, consider the equation

$$\lambda^p - \phi_1 \lambda^{p-1} - \phi_2 \lambda^{p-2} - \cdots - \phi_p = 0. \tag{2}$$

This is called the characteristic equation. According to the fundamental theorem of algebra, there are $p$ roots $\lambda_1, \lambda_2, \ldots, \lambda_p$ in the complex plane such that, for any $\lambda$:

$$\lambda^p - \phi_1 \lambda^{p-1} - \phi_2 \lambda^{p-2} - \cdots - \phi_p = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_p).$$

Note that complex roots come in conjugate pairs $\lambda_i = a + bi$, $\lambda_k = a - bi$. Divide through by $\lambda^p$ and let $z = 1/\lambda$:

$$1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = (1 - \lambda_1 z)(1 - \lambda_2 z) \cdots (1 - \lambda_p z).$$

By setting $z = L$ we may write

$$(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) y_t = (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_p L) y_t = c + \varepsilon_t.$$

Assume $|\lambda_i| < 1$ for all $i$, i.e., all roots lie inside the unit circle on the complex plane (recall $|a + bi| = \sqrt{a^2 + b^2}$, which is the length of the vector $(a, b)$). Solve for $y_t$:

$$y_t = \frac{1}{1 - \lambda_1 L} \, \frac{1}{1 - \lambda_2 L} \cdots \frac{1}{1 - \lambda_p L} (c + \varepsilon_t). \tag{3}$$

The MA($\infty$) representation is derived from (3) in two steps.

Step 1 - Constant term. Note that, for any constant $\lambda_i$ with $|\lambda_i| < 1$:

$$\frac{1}{1 - \lambda_i L}\, c = \sum_{j=0}^{\infty} (\lambda_i L)^j c = c \sum_{j=0}^{\infty} \lambda_i^j = \frac{c}{1 - \lambda_i}.$$

Thus:

$$\frac{1}{1 - \lambda_1 L} \, \frac{1}{1 - \lambda_2 L} \cdots \frac{1}{1 - \lambda_p L}\, c = \frac{c}{(1 - \lambda_1)(1 - \lambda_2) \cdots (1 - \lambda_p)} = \frac{c}{1 - \phi_1 - \phi_2 - \cdots - \phi_p} \equiv \mu. \tag{4}$$

Step 2 - MA coefficients. Suppose the roots of (2) are distinct, i.e., $\lambda_i \ne \lambda_k$ for all $i \ne k$. Then the product term in (3) can be expanded with partial fractions:

$$\frac{1}{1 - \lambda_1 L} \, \frac{1}{1 - \lambda_2 L} \cdots \frac{1}{1 - \lambda_p L} = \sum_{i=1}^{p} \frac{\omega_i}{1 - \lambda_i L}, \quad \text{where} \quad \omega_i = \frac{\lambda_i^{p-1}}{\prod_{k \ne i} (\lambda_i - \lambda_k)}. \tag{5}$$

It can be shown that $\sum_{i=1}^{p} \omega_i = 1$. Furthermore, we can write:

$$\sum_{i=1}^{p} \frac{\omega_i}{1 - \lambda_i L} = \sum_{i=1}^{p} \omega_i \sum_{j=0}^{\infty} (\lambda_i L)^j = \sum_{j=0}^{\infty} \Big(\sum_{i=1}^{p} \omega_i \lambda_i^j\Big) L^j = \sum_{j=0}^{\infty} \psi_j L^j, \quad \text{where} \quad \psi_j = \sum_{i=1}^{p} \omega_i \lambda_i^j. \tag{6}$$

Thus:

$$\frac{1}{1 - \lambda_1 L} \, \frac{1}{1 - \lambda_2 L} \cdots \frac{1}{1 - \lambda_p L}\, \varepsilon_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}.$$

Clearly, $\psi_0 = 1$. Combining the terms gives:

$$y_t = \mu + \varepsilon_t + \sum_{j=1}^{\infty} \psi_j \varepsilon_{t-j}. \tag{7}$$

The restriction $|\lambda_i| < 1$ for all $i$ implies that $\{\psi_j\}$ satisfies square summability, and so $\{y_t\}$ is stationary. The following proposition summarizes this analysis.

Proposition. Suppose (2) has distinct roots $\lambda_1, \ldots, \lambda_p$ satisfying $|\lambda_i| < 1$ for all $i$. Then the AR(p) process (1) is stationary and has an MA($\infty$) representation (7), where $\mu$ is given by (4) and $\psi_j$ is given by (5) and (6).

Often AR(p) processes are analyzed using this alternative form of the characteristic equation:

$$1 - \phi_1 z - \phi_2 z^2 - \cdots - \phi_p z^p = 0.$$

In this case, the stationarity condition is that the roots lie outside of the unit circle, since the roots of this equation are the inverses of the earlier roots.
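As a numerical illustration of the Proposition, the sketch below (NumPy assumed; the AR(2) coefficients are made up) computes the $\psi_j$ of an AR(2) two ways: from the roots via (5)-(6), and from the recursion obtained by matching coefficients of powers of $L$ in $(1 - \phi_1 L - \phi_2 L^2) \sum_j \psi_j L^j = 1$, namely $\psi_j = \phi_1 \psi_{j-1} + \phi_2 \psi_{j-2}$ with $\psi_0 = 1$. The two agree.

```python
import numpy as np

# Illustrative stationary AR(2): y_t = c + 0.6 y_{t-1} + 0.2 y_{t-2} + eps_t
phi = np.array([0.6, 0.2])
p = len(phi)

# Roots of lambda^p - phi_1 lambda^{p-1} - ... - phi_p = 0 (stationarity: all inside unit circle)
lam = np.roots(np.concatenate(([1.0], -phi)))
assert np.all(np.abs(lam) < 1), "AR(2) not stationary for these coefficients"

# Partial-fraction weights: omega_i = lam_i^(p-1) / prod_{k != i} (lam_i - lam_k)
omega = np.array([lam[i] ** (p - 1) / np.prod([lam[i] - lam[k] for k in range(p) if k != i])
                  for i in range(p)])

# psi_j from the roots, equations (5)-(6)
J = 10
psi_roots = np.array([np.sum(omega * lam ** j) for j in range(J)]).real

# Same coefficients from the recursion psi_j = phi_1 psi_{j-1} + phi_2 psi_{j-2}, psi_0 = 1
psi_rec = np.zeros(J)
psi_rec[0] = 1.0
for j in range(1, J):
    psi_rec[j] = sum(phi[i] * psi_rec[j - 1 - i] for i in range(p) if j - 1 - i >= 0)

print(np.allclose(psi_roots, psi_rec))  # True
```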

4 Nonstationary processes

a. Trend-stationary processes

Definition. $\{y_t\}$ is a trend-stationary process if $\{y_t - y_t^{tr}\}$ is stationary, where $\{y_t^{tr}\}$ is a deterministic sequence referred to as the trend of $\{y_t\}$.

Example. Let $\{y_t\}$ be given by

$$y_t = \sum_{k=0}^{K} \delta_k t^k + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},$$

where $\delta_0, \delta_1, \ldots, \delta_K$ are arbitrary constants. In this case the process has a polynomial trend:

$$y_t^{tr} = \sum_{k=0}^{K} \delta_k t^k.$$

$\{y_t\}$ is trend-stationary as long as its MA($\infty$) component is stationary.

b. Unit root processes

Definition. An AR(p) process is integrated of order $r$, or I($r$), if its characteristic equation has $r$ roots equal to unity.

Let $\{y_t\}$ be an AR(p) process whose roots satisfy $|\lambda_i| < 1$, $i = 1, \ldots, p-1$, and $\lambda_p = 1$. Then $\{y_t\}$ is I(1), and it may be written as

$$(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p) y_t = (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_{p-1} L)(1 - L) y_t = (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_{p-1} L) \Delta y_t = c + \varepsilon_t,$$

where $\Delta y_t = y_t - y_{t-1}$ is the first difference of $y_t$. It follows that the process $\{\Delta y_t\}$ is stationary.

Example. The following AR(1) process is called a random walk with drift:

$$y_t = c + y_{t-1} + \varepsilon_t.$$

Define the process $\{\Delta y_t\}$ by $\Delta y_t = c + \varepsilon_t$. Then $\{\Delta y_t\}$ is stationary.
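To see the factorization of a unit-root AR(p) at work, the following sketch (NumPy assumed; the coefficients are illustrative) takes an AR(2) with $\phi_1 = 1.3$, $\phi_2 = -0.3$, whose characteristic roots are $1$ and $0.3$, so $y_t$ is I(1) while $\Delta y_t$ follows a stationary AR(1) with coefficient $0.3$; a long simulation recovers that coefficient from the differenced series.

```python
import numpy as np

# Illustrative I(1) AR(2): y_t = 1.3 y_{t-1} - 0.3 y_{t-2} + eps_t
phi1, phi2 = 1.3, -0.3
print(np.roots([1.0, -phi1, -phi2]))   # characteristic roots: 1.0 and 0.3 (one unit root)

# Simulate, then difference: Delta y_t = 0.3 Delta y_{t-1} + eps_t should be stationary
rng = np.random.default_rng(1)
T = 100_000
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]

dy = np.diff(y)
# OLS slope of dy_t on dy_{t-1} (no constant): should be close to 0.3
rho_hat = (dy[1:] @ dy[:-1]) / (dy[:-1] @ dy[:-1])
print(round(rho_hat, 3))
```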

5 VAR(p) processes

a. Definition

Definition. $\{Y_t\}$ is a VAR(p) process ($p$th-order vector autoregressive) if

$$Y_t = C + \sum_{i=1}^{p} \Phi_i Y_{t-i} + \varepsilon_t, \tag{8}$$

where

$$Y_t = \begin{bmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{nt} \end{bmatrix}, \quad C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} \phi^i_{11} & \phi^i_{12} & \cdots & \phi^i_{1n} \\ \phi^i_{21} & \phi^i_{22} & \cdots & \phi^i_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \phi^i_{n1} & \phi^i_{n2} & \cdots & \phi^i_{nn} \end{bmatrix}, \quad \varepsilon_t = \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{nt} \end{bmatrix},$$

and $\varepsilon_t$ is vector white noise:

$$E(\varepsilon_t) = 0_{n \times 1}; \qquad E(\varepsilon_t \varepsilon_t') = \Omega,$$

where $\Omega$ is a positive definite and symmetric $n \times n$ matrix, and $E(\varepsilon_s \varepsilon_t') = 0_{n \times n}$ for all $s \ne t$. $\Omega$ is the variance-covariance matrix of the white noise vector. Positive definiteness means that $x' \Omega x > 0$ for all nonzero $n$-vectors $x$.

b. Stationarity

To evaluate stationarity of a VAR(p), consider the equation

$$\big| I_n \lambda^p - \Phi_1 \lambda^{p-1} - \Phi_2 \lambda^{p-2} - \cdots - \Phi_p \big| = 0, \tag{9}$$

where $|\cdot|$ denotes the determinant and $I_n$ is the $n \times n$ identity matrix. The VAR is stationary if all solutions $\lambda = \lambda_i$ to (9) satisfy $|\lambda_i| < 1$. (Note that there are $np$ roots of (9), possibly repeated, and complex roots come in conjugate pairs.)

Equivalently, the VAR is stationary if all values of $z$ satisfying

$$\big| I_n - \Phi_1 z - \Phi_2 z^2 - \cdots - \Phi_p z^p \big| = 0$$

lie outside of the unit circle. The solutions to (9) can be computed using the following $np \times np$ matrix:

$$F = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\ I_n & 0_n & \cdots & 0_n & 0_n \\ 0_n & I_n & \cdots & 0_n & 0_n \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0_n & 0_n & \cdots & I_n & 0_n \end{bmatrix},$$

where $0_n$ is an $n \times n$ matrix of zeros. It can be shown that the eigenvalues $\lambda_1, \ldots, \lambda_{np}$ of $F$ are precisely the solutions to $\big| I_n \lambda^p - \Phi_1 \lambda^{p-1} - \Phi_2 \lambda^{p-2} - \cdots - \Phi_p \big| = 0$.

To calculate the mean of a stationary VAR, take expectations:

$$E Y_t = C + \sum_{i=1}^{p} \Phi_i E Y_{t-i}.$$

Stationarity implies $E Y_t = \mu$ for all $t$. Thus:

$$\mu = \Big(I_n - \sum_{i=1}^{p} \Phi_i\Big)^{-1} C.$$

The variance and autocovariances of a stationary VAR are given by

$$\Gamma_j = E(Y_t - \mu)(Y_{t-j} - \mu)'.$$

Each $\Gamma_j$ is an $n \times n$ matrix, whose $ik$ element gives the covariance between $y_{it}$ and $y_{k,t-j}$.
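The stacked matrix $F$ (often called the companion matrix) makes the stationarity check easy to implement. Below is a minimal sketch, assuming NumPy; the VAR(2) coefficient matrices and the helper function name are mine, not from the notes.

```python
import numpy as np

def var_is_stationary(Phis):
    """Check stationarity of a VAR(p) from its coefficient matrices [Phi_1, ..., Phi_p]."""
    n, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(Phis)              # first block row: Phi_1 ... Phi_p
    if p > 1:
        F[n:, :-n] = np.eye(n * (p - 1))    # identity blocks below the first block row
    eig = np.linalg.eigvals(F)
    return bool(np.all(np.abs(eig) < 1)), eig

# Illustrative bivariate VAR(2)
Phi1 = np.array([[0.5, 0.1], [0.0, 0.4]])
Phi2 = np.array([[0.2, 0.0], [0.1, 0.1]])
stationary, eig = var_is_stationary([Phi1, Phi2])
print(stationary, np.round(np.abs(eig), 3))
```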

c. MA($\infty$) representation

To obtain an MA($\infty$) representation, express (8) as

$$\Big(I_n - \sum_{i=1}^{p} \Phi_i L^i\Big) Y_t = C + \varepsilon_t.$$

Stationarity allows us to invert the lag polynomial:

$$\Big(I_n - \sum_{i=1}^{p} \Phi_i L^i\Big)^{-1} = \sum_{j=0}^{\infty} \Psi_j L^j. \tag{10}$$

Thus:

$$Y_t = \mu + \sum_{j=0}^{\infty} \Psi_j \varepsilon_{t-j}. \tag{11}$$

The values of $\Psi_j$, $j = 0, 1, 2, \ldots$, may be obtained using the method of undetermined coefficients. Write (10) as

$$I_n = \Big(I_n - \sum_{i=1}^{p} \Phi_i L^i\Big) \sum_{j=0}^{\infty} \Psi_j L^j. \tag{12}$$

The constant terms on each side of (12) must agree. Thus:

$$I_n = \Psi_0. \tag{13}$$

Further, since there are no powers of $L$ on the LHS, the coefficient of $L^j$ on the RHS must equal zero for each $j > 0$:

$$\Psi_j - \Psi_{j-1} \Phi_1 - \cdots - \Psi_{j-p} \Phi_p = 0, \quad j = 1, 2, \ldots, \tag{14}$$

where $\Psi_{j-i} = 0_n$ whenever $j - i < 0$. Given the coefficients $\Phi_i$ and $\Psi_0 = I_n$, (14) may be iterated to compute the MA coefficients $\Psi_1, \Psi_2, \Psi_3, \ldots$.

Nonuniqueness. Importantly, the MA($\infty$) representation of a VAR is nonunique. Let $H$ be any nonsingular $n \times n$ matrix, and define $u_t \equiv H \varepsilon_t$. Note that $u_t$ is vector white noise:

$$E(u_t) = H E(\varepsilon_t) = 0_{n \times 1}; \qquad E(u_t u_t') = H E(\varepsilon_t \varepsilon_t') H' = H \Omega H'; \qquad E(u_s u_t') = H E(\varepsilon_s \varepsilon_t') H' = 0_n, \; s \ne t;$$

and $H \Omega H'$ is positive definite since $H' x$ is nonzero whenever $x$ is.

The MA($\infty$) representation can be expressed as

$$Y_t = \mu + \sum_{j=0}^{\infty} \Psi_j H^{-1} H \varepsilon_{t-j} = \mu + \sum_{j=0}^{\infty} \tilde{\Psi}_j u_{t-j},$$

where $\tilde{\Psi}_j \equiv \Psi_j H^{-1}$. Note that in this case $u_t$ is not the fundamental innovation. To obtain the MA($\infty$) representation in terms of the fundamental innovation we must impose the normalization $\tilde{\Psi}_0 = I_n$, i.e., $H = I_n$.

6 Identification of shocks

a. Triangular factorization

We wish to assess how fluctuations in "more exogenous" variables affect "less exogenous" ones. One way to do this is to rearrange the vector of innovations $\varepsilon_t$ into components that derive from "exogenous shocks" to the $n$ variables. This can be accomplished using a triangular factorization of $\Omega$. For any positive definite symmetric matrix $\Omega$, there exists a unique representation of the form

$$\Omega = A D A', \tag{15}$$

where $A$ is a lower triangular matrix with 1's along the principal diagonal:

$$A = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{bmatrix},$$

and $D$ is a diagonal matrix:

$$D = \begin{bmatrix} d_{11} & 0 & \cdots & 0 \\ 0 & d_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_{nn} \end{bmatrix},$$

with $d_{ii} > 0$ for $i = 1, \ldots, n$.
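Numerically, $A$ and $D$ can be recovered from the Cholesky factor $P = A D^{1/2}$ introduced later in this section: divide each column of $P$ by its diagonal entry to obtain $A$, and square the diagonal to obtain $D$. A minimal sketch, assuming NumPy, with a made-up covariance matrix:

```python
import numpy as np

# Made-up positive definite covariance matrix Omega (for illustration only)
Omega = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

P = np.linalg.cholesky(Omega)          # lower triangular, Omega = P P'
d_sqrt = np.diag(P)                    # sqrt(d_11), ..., sqrt(d_nn)
A = P / d_sqrt                         # divide each column by its diagonal -> unit diagonal
D = np.diag(d_sqrt**2)

print(np.allclose(A @ D @ A.T, Omega))           # True: Omega = A D A'
print(np.allclose(np.diag(A), np.ones(3)))       # True: ones on the diagonal of A
```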

Use the factorization to define a vector of exogenous shocks: $u_t \equiv A^{-1} \varepsilon_t$. Substitute into the MA($\infty$) representation to obtain an alternative "structural" representation:

$$Y_t = \mu + \varepsilon_t + \sum_{j=1}^{\infty} \Psi_j \varepsilon_{t-j} = \mu + A u_t + \sum_{j=1}^{\infty} \Psi_j A u_{t-j} = \mu + \sum_{j=0}^{\infty} \Theta_j u_{t-j},$$

where $\Theta_0 \equiv A$ and $\Theta_j \equiv \Psi_j A$, $j = 1, 2, \ldots$. Note that the shocks $u_{1t}, \ldots, u_{nt}$ are mutually uncorrelated:

$$E(u_t u_t') = A^{-1} E(\varepsilon_t \varepsilon_t') (A^{-1})' = A^{-1} \Omega (A')^{-1} = A^{-1} A D A' (A')^{-1} = D.$$

Thus:

$$Var(u_{it}) = d_{ii}; \qquad Cov(u_{it}, u_{kt}) = 0.$$

To implement this approach, we order the variables from "most exogenous" to "least exogenous." This means that innovations to $y_{it}$ are affected by the shocks $u_{1t}, \ldots, u_{it}$, but not by $u_{i+1,t}, \ldots, u_{nt}$.

Bivariate case. Let $n = 2$, and write $\hat{y}_{it} \equiv y_{it} - \mu_i$. Then (11) may be expressed as

$$\begin{bmatrix} \hat{y}_{1t} \\ \hat{y}_{2t} \end{bmatrix} = \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} + \sum_{j=1}^{\infty} \begin{bmatrix} \psi^j_{11} & \psi^j_{12} \\ \psi^j_{21} & \psi^j_{22} \end{bmatrix} \begin{bmatrix} \varepsilon_{1,t-j} \\ \varepsilon_{2,t-j} \end{bmatrix}.$$

Here $y_{1t}$ is taken to be "most exogenous." $\Omega$ is factorized using the matrices

$$A = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} d_{11} & 0 \\ 0 & d_{22} \end{bmatrix}.$$

Thus,

$$\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} = \begin{bmatrix} u_{1t} \\ a_{21} u_{1t} + u_{2t} \end{bmatrix}.$$

Innovations to $y_{1t}$ are driven by the exogenous shocks $u_{1t}$. Innovations to $y_{2t}$ are driven by both innovations to $y_{1t}$ and the uncorrelated shocks $u_{2t}$. Furthermore, for $j > 0$:

$$\Theta_j = \Psi_j A = \begin{bmatrix} \psi^j_{11} & \psi^j_{12} \\ \psi^j_{21} & \psi^j_{22} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} = \begin{bmatrix} \psi^j_{11} + a_{21} \psi^j_{12} & \psi^j_{12} \\ \psi^j_{21} + a_{21} \psi^j_{22} & \psi^j_{22} \end{bmatrix}.$$

The alternative "structural" MA($\infty$) representation is

$$\begin{bmatrix} \hat{y}_{1t} \\ \hat{y}_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} + \sum_{j=1}^{\infty} \begin{bmatrix} \psi^j_{11} + a_{21} \psi^j_{12} & \psi^j_{12} \\ \psi^j_{21} + a_{21} \psi^j_{22} & \psi^j_{22} \end{bmatrix} \begin{bmatrix} u_{1,t-j} \\ u_{2,t-j} \end{bmatrix}.$$

We can use this to assess the effects of an exogenous shock to $y_{1t}$. Suppose the system begins in the nonstochastic steady state:

$$\begin{bmatrix} u_{1,t-j} \\ u_{2,t-j} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; j = 1, 2, \ldots \quad \Rightarrow \quad \begin{bmatrix} \hat{y}_{1,t-j} \\ \hat{y}_{2,t-j} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

At time $t$ there is a positive shock to $y_{1t}$, and there are no shocks following this:

$$\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}; \qquad \begin{bmatrix} u_{1,t+j} \\ u_{2,t+j} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; j = 1, 2, \ldots.$$

Then from the above representation we have

$$\begin{bmatrix} \hat{y}_{1t} \\ \hat{y}_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ a_{21} \end{bmatrix}; \qquad \begin{bmatrix} \hat{y}_{1,t+j} \\ \hat{y}_{2,t+j} \end{bmatrix} = \begin{bmatrix} \psi^j_{11} + a_{21} \psi^j_{12} & \psi^j_{12} \\ \psi^j_{21} + a_{21} \psi^j_{22} & \psi^j_{22} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \psi^j_{11} + a_{21} \psi^j_{12} \\ \psi^j_{21} + a_{21} \psi^j_{22} \end{bmatrix}, \; j = 1, 2, \ldots.$$

Subsequent movements in each variable are driven by the direct effect of $y_{1t}$ and an indirect effect coming through the response of $y_{2t}$. These are the orthogonalized impulse-response functions.

We can also assess the effects of a positive shock to $y_{2t}$, as captured by $u_{2t}$. In this case the change in $y_{2t}$ is conditioned on $u_{1t}$, i.e., $u_{2t}$ indicates the movement in $y_{2t}$ that cannot be predicted after $u_{1t}$ is known. Setting

$$\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}; \qquad \begin{bmatrix} u_{1,t+j} \\ u_{2,t+j} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \; j = 1, 2, \ldots,$$

we have

$$\begin{bmatrix} \hat{y}_{1t} \\ \hat{y}_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}; \qquad \begin{bmatrix} \hat{y}_{1,t+j} \\ \hat{y}_{2,t+j} \end{bmatrix} = \begin{bmatrix} \psi^j_{11} + a_{21} \psi^j_{12} & \psi^j_{12} \\ \psi^j_{21} + a_{21} \psi^j_{22} & \psi^j_{22} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \psi^j_{12} \\ \psi^j_{22} \end{bmatrix}, \; j = 1, 2, \ldots.$$

Note that $u_{1t}$ affects $y_{2t}$ in period $t$ (as long as $a_{21} \ne 0$), but $u_{2t}$ does not affect $y_{1t}$. This is the sense in which $y_{1t}$ is "more exogenous."

Empirical implementation. For a given observed sample of size $T$, we can obtain OLS estimates $\hat{C}$ and $\hat{\Phi}_i$, $i = 1, \ldots, p$, by regressing $Y_t$ on a constant term and $p$ lags $Y_{t-1}, \ldots, Y_{t-p}$. Estimated innovations are obtained from the OLS residuals:

$$\hat{\varepsilon}_t = Y_t - \hat{C} - \sum_{i=1}^{p} \hat{\Phi}_i Y_{t-i}.$$

The variance-covariance matrix is estimated as

$$\hat{\Omega} = \frac{1}{T} \sum_{t=1}^{T} \hat{\varepsilon}_t \hat{\varepsilon}_t'.$$

Estimates of the MA coefficients $\hat{\Psi}_j$, $j = 1, 2, \ldots$, can be obtained using the formulas derived above:

$$\hat{\Psi}_0 = I_n; \qquad \hat{\Psi}_s - \hat{\Psi}_{s-1} \hat{\Phi}_1 - \hat{\Psi}_{s-2} \hat{\Phi}_2 - \cdots - \hat{\Psi}_{s-p} \hat{\Phi}_p = 0, \quad s = 1, 2, \ldots.$$

Orthogonalized impulse response functions are computed as

$$\hat{\Theta}_0 = \hat{A}; \qquad \hat{\Theta}_j = \hat{\Psi}_j \hat{A}, \quad j = 1, 2, \ldots,$$

where $\hat{A}$ is taken from the triangular factorization of $\hat{\Omega}$. The coefficient $\hat{\theta}^j_{ik}$, the $ik$-element of $\hat{\Theta}_j$, gives the response of $\hat{y}_{i,t+j}$ to a one-unit positive shock to $u_{kt}$.
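The steps above translate directly into a few lines of linear algebra. The sketch below (NumPy assumed; the simulated data, lag length, and function names are illustrative and not from the notes) estimates a VAR(p) by OLS, forms $\hat{\Omega}$ from the residuals, iterates the $\hat{\Psi}_j$ recursion, and computes orthogonalized impulse responses $\hat{\Theta}_j = \hat{\Psi}_j \hat{A}$.

```python
import numpy as np

def estimate_var(Y, p):
    """OLS estimates of C and Phi_1..Phi_p from a (T x n) data array Y."""
    T, n = Y.shape
    X = np.hstack([np.ones((T - p, 1))] + [Y[p - i:T - i] for i in range(1, p + 1)])
    B = np.linalg.lstsq(X, Y[p:], rcond=None)[0]          # (1 + n*p) x n
    C = B[0]
    Phis = [B[1 + n * (i - 1):1 + n * i].T for i in range(1, p + 1)]
    resid = Y[p:] - X @ B
    Omega = resid.T @ resid / (T - p)
    return C, Phis, Omega

def orthogonal_irf(Phis, Omega, horizon):
    """Theta_j = Psi_j A, j = 0..horizon, with A from the triangular factorization of Omega."""
    n, p = Omega.shape[0], len(Phis)
    P = np.linalg.cholesky(Omega)
    A = P / np.diag(P)                                    # unit-diagonal lower triangular
    Psi = [np.eye(n)]
    for j in range(1, horizon + 1):
        Psi.append(sum(Psi[j - i] @ Phis[i - 1] for i in range(1, p + 1) if j - i >= 0))
    return [Ps @ A for Ps in Psi]

# Illustrative use on simulated bivariate VAR(1) data
rng = np.random.default_rng(2)
T, Phi_true = 5000, np.array([[0.5, 0.2], [0.0, 0.4]])
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = Phi_true @ Y[t - 1] + rng.normal(size=2)

C_hat, Phi_hat, Omega_hat = estimate_var(Y, p=1)
Theta = orthogonal_irf(Phi_hat, Omega_hat, horizon=8)
print(np.round(Theta[1], 2))   # response at horizon 1 to one-unit shocks u_1t, u_2t
```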

Cholesky factorization. For any positive definite symmetric matrix $\Omega$, there exists a unique representation of the form

$$\Omega = P P', \tag{16}$$

where

$$P = A D^{1/2} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \sqrt{d_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{d_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{d_{nn}} \end{bmatrix}.$$

This is called the Cholesky factorization. Using the Cholesky factorization, the vector of exogenous shocks may be defined as $v_t \equiv P^{-1} \varepsilon_t$. In the structural representation, $A$ is simply replaced by $P$. Moreover, $E(v_t v_t') = I_n$, i.e., $Var(v_{it}) = 1$ for all $i$.

b. Forecast error decomposition

For a stationary VAR(p), consider the problem of forecasting $Y_{t+s}$ at period $t$. Using (11), the forecast error may be written

$$Y_{t+s} - E_t Y_{t+s} = \sum_{j=1}^{s} \Psi_{s-j} \varepsilon_{t+j},$$

where $E_t(\cdot)$ denotes expectation conditional on period-$t$ information.

The mean squared error of the $s$-period-ahead forecast is given by

$$\begin{aligned}
MSE(s) &= E\big[(Y_{t+s} - E_t Y_{t+s})(Y_{t+s} - E_t Y_{t+s})'\big] = E\Big[\Big(\sum_{j=1}^{s} \Psi_{s-j} \varepsilon_{t+j}\Big)\Big(\sum_{l=1}^{s} \Psi_{s-l} \varepsilon_{t+l}\Big)'\Big] \\
&= \sum_{j=1}^{s} \Psi_{s-j} E(\varepsilon_{t+j} \varepsilon_{t+j}') \Psi_{s-j}' + \sum_{j=1}^{s} \sum_{\substack{l=1 \\ l \ne j}}^{s} \Psi_{s-j} E(\varepsilon_{t+j} \varepsilon_{t+l}') \Psi_{s-l}' = \sum_{j=1}^{s} \Psi_{s-j} \Omega \Psi_{s-j}',
\end{aligned}$$

since $E(\varepsilon_{t+j} \varepsilon_{t+j}') = E(\varepsilon_t \varepsilon_t') = \Omega$ for all $j$, while $E(\varepsilon_{t+j} \varepsilon_{t+l}') = 0_{n \times n}$ for $l \ne j$.

$MSE(s)$ can be decomposed based on the contributions of the identified shocks $u_{1t}, \ldots, u_{nt}$. The innovations $\varepsilon_t$ may be expressed as

$$\varepsilon_t = A u_t = \sum_{i=1}^{n} A_i u_{it},$$

where $A_i$ is the $i$th column of the matrix $A$ defined in (15). Thus:

$$\Omega = E(\varepsilon_t \varepsilon_t') = E\Big[\Big(\sum_{i=1}^{n} A_i u_{it}\Big)\Big(\sum_{k=1}^{n} A_k u_{kt}\Big)'\Big] = \sum_{i=1}^{n} A_i E(u_{it}^2) A_i' + \sum_{i=1}^{n} \sum_{k \ne i} A_i E(u_{it} u_{kt}) A_k' = \sum_{i=1}^{n} A_i d_{ii} A_i',$$

since $E(u_{it}^2) = Var(u_{it}) = d_{ii}$ and, for $k \ne i$, $E(u_{it} u_{kt}) = Cov(u_{it}, u_{kt}) = 0$. Substitution gives

$$MSE(s) = \sum_{j=1}^{s} \Psi_{s-j} \Big(\sum_{i=1}^{n} A_i d_{ii} A_i'\Big) \Psi_{s-j}' = \sum_{i=1}^{n} d_{ii} \sum_{j=1}^{s} \Psi_{s-j} A_i A_i' \Psi_{s-j}'. \tag{17}$$

Equation (17) decomposes $MSE(s)$ into $n$ terms, associated with variation contributed by the $n$ shocks $u_{1t}, \ldots, u_{nt}$. As $s \to \infty$, stationarity implies $E_t Y_{t+s} \to \mu$, and

$$MSE(s) \to E(Y_t - \mu)(Y_t - \mu)' = \Gamma_0,$$

i.e., $MSE(s)$ converges to the variance of the VAR. Thus, (17) decomposes the variance in terms of the contributions of the underlying shocks. When the Cholesky factorization is used, the vectors $A_i$ are replaced by vectors $P_i$, which are columns of the matrix $P$ defined in (16).
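Equation (17) is straightforward to evaluate once the $\Psi_j$ and the factorization of $\Omega$ are in hand. The sketch below (NumPy assumed; the VAR(1) coefficient matrix and $\Omega$ are made up) computes each shock's contribution to $MSE(s)$ and checks that the contributions add up to $\sum_{j=1}^{s} \Psi_{s-j} \Omega \Psi_{s-j}'$.

```python
import numpy as np

# Illustrative bivariate VAR(1) with coefficient Phi and innovation covariance Omega
Phi = np.array([[0.5, 0.2], [0.1, 0.4]])
Omega = np.array([[1.0, 0.4], [0.4, 0.5]])
s = 10                                          # forecast horizon

P = np.linalg.cholesky(Omega)
A = P / np.diag(P)                              # unit-diagonal factor, Omega = A D A'
d = np.diag(P) ** 2                             # diagonal of D

Psi = [np.linalg.matrix_power(Phi, j) for j in range(s)]   # Psi_j = Phi^j for a VAR(1)

# Contribution of shock i to the diagonal of MSE(s), per equation (17)
contrib = np.zeros((2, 2))                      # rows: variables, cols: shocks
for i in range(2):
    Ai = A[:, [i]]
    contrib[:, i] = d[i] * sum(Ps @ Ai @ Ai.T @ Ps.T for Ps in Psi).diagonal()

mse = sum(Ps @ Omega @ Ps.T for Ps in Psi).diagonal()
print(np.allclose(contrib.sum(axis=1), mse))    # True: contributions add up to the total MSE
print(np.round(contrib / mse[:, None], 3))      # variance shares of each shock
```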

c. Identification via long-run restrictions

Consider the following bivariate VAR process:

$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} y_{1,t-1} \\ 0 \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} + \sum_{j=1}^{\infty} \Psi_j \begin{bmatrix} \varepsilon_{1,t-j} \\ \varepsilon_{2,t-j} \end{bmatrix}, \tag{18}$$

with variance-covariance matrix $\Omega$, where

$$\Omega = \begin{bmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{bmatrix}.$$

Note that forecasted values of $y_{1t}$ are permanently affected by innovations, while the effects on $y_{2t}$ die out when the $\Psi_j$'s satisfy suitable stationarity restrictions. This distinction can be used to identify "permanent" versus "transitory" shocks. Write (18) as

$$\begin{bmatrix} \Delta y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} + \sum_{j=1}^{\infty} \Psi_j \begin{bmatrix} \varepsilon_{1,t-j} \\ \varepsilon_{2,t-j} \end{bmatrix}, \tag{19}$$

where $\Delta y_t = y_t - y_{t-1}$, and assume that (19) is stationary. We wish to obtain a structural representation

$$\begin{bmatrix} \Delta y_{1t} \\ y_{2t} \end{bmatrix} = \sum_{j=0}^{\infty} \Theta_j \begin{bmatrix} u_{1,t-j} \\ u_{2,t-j} \end{bmatrix}, \tag{20}$$

where $u_{1t}$ and $u_{2t}$ indicate permanent and transitory shocks, respectively, and

$$\Theta_j = \begin{bmatrix} \theta^j_{11} & \theta^j_{12} \\ \theta^j_{21} & \theta^j_{22} \end{bmatrix}.$$

Assume $Cov(u_{1t}, u_{2t}) = 0$ and $Var(u_{1t}) = Var(u_{2t}) = 1$, i.e., the variances of the shocks are normalized to unity. Furthermore, since $\Theta_0 u_t = \varepsilon_t$:

$$E(\Theta_0 u_t u_t' \Theta_0') = E(\varepsilon_t \varepsilon_t') \quad \Rightarrow \quad \Theta_0 \Theta_0' = \Omega.$$

Recall that the Cholesky factorization gives a unique lower triangular matrix $P$ satisfying $P P' = \Omega$. It follows that $\Theta_0 = P Q$ for some orthogonal matrix $Q$, i.e., $Q$ satisfies $Q Q' = I$. Orthogonality implies three restrictions on $Q$, so we need one more restriction to identify $\Theta_0$.

For the fourth restriction, assume that $u_{2t}$ has no long-run effect on the level of $y_{1t}$, so that $u_{2t}$ is transitory. For this to be true, all effects on $\Delta y_{1t}$ must cancel out in the long run:

$$\sum_{j=0}^{\infty} \theta^j_{12} = 0.$$

Moreover, since $\Theta_j = \Psi_j \Theta_0$ for $j \ge 1$:

$$\theta^j_{12} = \psi^j_{11} \theta^0_{12} + \psi^j_{12} \theta^0_{22}.$$

Substitute and rearrange (using $\psi^0_{11} = 1$ and $\psi^0_{12} = 0$):

$$\theta^0_{12} \sum_{j=0}^{\infty} \psi^j_{11} + \theta^0_{22} \sum_{j=0}^{\infty} \psi^j_{12} = 0.$$

This supplies one more restriction, and thus $\Theta_0$ is identified.

7 Granger causality

Consider two stationary processes $\{y_{1t}\}$ and $\{y_{2t}\}$. Recall that the linear projection of $y_{1t}$ on $y_{1,t-1}, y_{1,t-2}, \ldots$, denoted by $y_{1t}^P$, minimizes MSE among all linear forecast rules. We are interested in whether the variable $y_{2t}$ can be used to obtain better predictions of $y_{1t}$. That is, does the linear projection of $y_{1t}$ on $y_{1,t-1}, y_{1,t-2}, \ldots$ and $y_{2,t-1}, y_{2,t-2}, \ldots$ give a lower MSE than $y_{1t}^P$? If not, then we say that the variable $y_{2t}$ does not Granger-cause $y_{1t}$.

Suppose $y_{1t}$ and $y_{2t}$ are given by a bivariate VAR:

$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \sum_{i=1}^{p} \begin{bmatrix} \phi^i_{11} & \phi^i_{12} \\ \phi^i_{21} & \phi^i_{22} \end{bmatrix} \begin{bmatrix} y_{1,t-i} \\ y_{2,t-i} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}.$$

Then $y_{2t}$ does not Granger-cause $y_{1t}$ if the coefficient matrices are lower triangular:

$$\begin{bmatrix} \phi^i_{11} & \phi^i_{12} \\ \phi^i_{21} & \phi^i_{22} \end{bmatrix} = \begin{bmatrix} \phi^i_{11} & 0 \\ \phi^i_{21} & \phi^i_{22} \end{bmatrix}, \quad i = 1, \ldots, p.$$

To test for Granger causality, estimate the first equation in the VAR with and without the parameter restriction:

$$y_{1t} = c_1 + \sum_{i=1}^{p} \phi^i_{11} y_{1,t-i} + \eta_{1t}; \qquad y_{1t} = c_1 + \sum_{i=1}^{p} \big(\phi^i_{11} y_{1,t-i} + \phi^i_{12} y_{2,t-i}\big) + \varepsilon_{1t}.$$

Let $\hat{\eta}_{1t}$ and $\hat{\varepsilon}_{1t}$ be the fitted residuals and let the sample size be $T$. Define

$$RSS_0 = \sum_{t=1}^{T} \hat{\eta}_{1t}^2; \qquad RSS_1 = \sum_{t=1}^{T} \hat{\varepsilon}_{1t}^2.$$

Then for large $T$ the following statistic has a $\chi^2(p)$ distribution:

$$S = \frac{T (RSS_0 - RSS_1)}{RSS_1}.$$

If $S$ exceeds a designated critical value for a $\chi^2(p)$ variable (e.g., the 5% critical value), then we reject the null hypothesis that $y_{2t}$ does not Granger-cause $y_{1t}$, i.e., $y_{2t}$ does help in forecasting $y_{1t}$.
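The test amounts to comparing a restricted and an unrestricted OLS regression. The sketch below (NumPy and SciPy assumed; the simulated data, lag length, significance level, and function name are illustrative, not from the notes) computes $S = T(RSS_0 - RSS_1)/RSS_1$ and compares it with the $\chi^2(p)$ critical value.

```python
import numpy as np
from scipy.stats import chi2

def granger_test(y1, y2, p, level=0.05):
    """Test H0: y2 does not Granger-cause y1, using the chi^2(p) statistic S."""
    T = len(y1) - p                                # effective number of observations used
    lags = lambda x: np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
    ones = np.ones((T, 1))
    X0 = np.hstack([ones, lags(y1)])               # restricted: own lags only
    X1 = np.hstack([ones, lags(y1), lags(y2)])     # unrestricted: add lags of y2
    rss = lambda X: np.sum((y1[p:] - X @ np.linalg.lstsq(X, y1[p:], rcond=None)[0]) ** 2)
    RSS0, RSS1 = rss(X0), rss(X1)
    S = T * (RSS0 - RSS1) / RSS1
    return S, S > chi2.ppf(1 - level, df=p)

# Illustrative data in which y2 does help predict y1
rng = np.random.default_rng(3)
T = 2000
y2 = rng.normal(size=T)
y1 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.4 * y1[t - 1] + 0.5 * y2[t - 1] + rng.normal()

S, reject = granger_test(y1, y2, p=2)
print(round(S, 1), reject)    # large S -> reject "y2 does not Granger-cause y1"
```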

Sources

Hamilton, Time Series Analysis, 1994, chs. 3, 4, 11.

Blanchard and Quah, "The Dynamic Effects of Aggregate Demand and Supply Disturbances," AER, Sept. 1989.