Notes on Time Series Modeling

Garey Ramey
University of California, San Diego

January 2017

1 Stationary processes

Definition A stochastic process is any set of random variables $y_t$ indexed by $t \in T$: $\{y_t\}_{t \in T}$. These notes will consider discrete stochastic processes, i.e., processes indexed by the set of integers $T = \{\dots, -2, -1, 0, 1, 2, \dots\}$: $\{y_t\}_{t=-\infty}^{\infty}$. Moreover, each $y_t$ is a real number. Stochastic processes will be referred to more concisely as "processes."

Definition The joint distribution of the process $\{y_t\}$ is determined by the joint distributions of all finite subsets of the $y_t$'s:
$$F_{t_1,t_2,\dots,t_n}(\xi_1,\xi_2,\dots,\xi_n) = \Pr(y_{t_1} \le \xi_1,\ y_{t_2} \le \xi_2,\ \dots,\ y_{t_n} \le \xi_n),$$
for all possible collections of distinct integers $t_1, t_2, \dots, t_n$. This distribution can be used to determine moments:
$$\text{Mean:}\quad \mu_t = E(y_t) = \int y_t \, dF_t(y_t);$$
$$\text{Variance:}\quad \gamma_{0t} = E(y_t - \mu_t)^2 = \int (y_t - \mu_t)^2 \, dF_t(y_t);$$
$$\text{Autocovariance:}\quad \gamma_{jt} = E(y_t - \mu_t)(y_{t-j} - \mu_{t-j}) = \iint (y_t - \mu_t)(y_{t-j} - \mu_{t-j}) \, dF_{t,t-j}(y_t, y_{t-j}).$$

Definition $\{\varepsilon_t\}$ is a white noise process if, for all $t$:
$$\mu_t = E(\varepsilon_t) = 0; \qquad \gamma_{0t} = E(\varepsilon_t^2) = \sigma^2; \qquad \gamma_{jt} = E(\varepsilon_t \varepsilon_{t-j}) = 0,\ j \ne 0.$$
$\{\varepsilon_t\}$ is a Gaussian white noise process if $\varepsilon_t \sim N(0, \sigma^2)$ for all $t$, i.e., $\varepsilon_t$ is normally distributed with mean $0$ and variance $\sigma^2$.

Definition $\{y_t\}$ is an AR(1) process (first-order autoregressive) if
$$y_t = c + \phi y_{t-1} + \varepsilon_t,$$
where $\{\varepsilon_t\}$ is white noise and $c, \phi$ are arbitrary constants. For an AR(1) process with $|\phi| < 1$:
$$\mu_t = \frac{c}{1-\phi}; \qquad \gamma_{0t} = \frac{\sigma^2}{1-\phi^2}; \qquad \gamma_{jt} = \frac{\phi^j \sigma^2}{1-\phi^2}.$$

Note that the moments in the preceding examples do not depend on $t$. This property defines an important class of processes.

Definition $\{y_t\}$ is covariance-stationary or weakly stationary if $\mu_t$ and $\gamma_{jt}$ do not depend on $t$. For these notes we will simply say "stationary." Intuitively, for a stationary process the effects of a given realization of $y_t$ die out as $t \to \infty$.
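The AR(1) moment formulas are easy to check by simulation. The following minimal sketch (plain NumPy; the helper name simulate_ar1 is ours, not from any library) generates a long Gaussian AR(1) sample and compares sample moments with the theoretical values:

```python
import numpy as np

def simulate_ar1(c, phi, sigma, T, seed=0):
    """Simulate y_t = c + phi*y_{t-1} + eps_t with Gaussian white noise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, T)
    y = np.empty(T)
    y[0] = c / (1 - phi)              # start at the unconditional mean
    for t in range(1, T):
        y[t] = c + phi * y[t - 1] + eps[t]
    return y

c, phi, sigma, T = 1.0, 0.9, 1.0, 200_000
y = simulate_ar1(c, phi, sigma, T)

print(y.mean(), c / (1 - phi))                 # mean: c/(1-phi)
print(y.var(), sigma**2 / (1 - phi**2))        # variance: sigma^2/(1-phi^2)
j = 3
print(np.cov(y[j:], y[:-j])[0, 1],             # autocovariance at lag j
      phi**j * sigma**2 / (1 - phi**2))
```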
Definition $\{y_t\}$ is an MA($\infty$) process (infinite-order moving average) if
$$y_t = \mu + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},$$
where $\{\varepsilon_t\}$ is white noise and $\mu, \psi_0, \psi_1, \dots$ are arbitrary constants. A sufficient condition for stationarity of an MA($\infty$) process is square summability of the coefficients $\{\psi_j\}$:
$$\sum_{j=0}^{\infty} \psi_j^2 < \infty.$$

Linear forecasting interpretation Consider the problem of forecasting $y_t$ on the basis of past observations $y_{t-1}, y_{t-2}, \dots$. A linear forecasting rule predicts $y_t$ as a linear function of $n$ past observations $y_{t-1}, y_{t-2}, \dots, y_{t-n}$:
$$\sum_{j=1}^{n} g_j(n)\, y_{t-j},$$
where $g_1(n), \dots, g_n(n)$ are constants. The forecast error and mean squared error of the forecast rule are given by:
$$FE = y_t - \sum_{j=1}^{n} g_j(n)\, y_{t-j}; \qquad MSE = E\Big(y_t - \sum_{j=1}^{n} g_j(n)\, y_{t-j}\Big)^2.$$

Definition The linear projection of $y_t$ on $y_{t-1}, y_{t-2}, \dots, y_{t-n}$ is the linear forecasting rule that satisfies
$$E\Big[\Big(y_t - \sum_{j=1}^{n} g_j^P(n)\, y_{t-j}\Big)\, y_{t-s}\Big] = 0, \qquad s = 1, 2, \dots, n,$$
i.e., the forecast error is uncorrelated with $y_{t-s}$ for all $s > 0$. Let the linear projection be denoted by $y_t^P(n)$:
$$y_t^P(n) = \sum_{j=1}^{n} g_j^P(n)\, y_{t-j}.$$
The associated forecast error is denoted by $\varepsilon_t^P(n)$:
$$\varepsilon_t^P(n) = y_t - \sum_{j=1}^{n} g_j^P(n)\, y_{t-j}.$$
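In a finite sample, the projection coefficients can be approximated by OLS, since the OLS normal equations are the sample analogue of the orthogonality conditions above. A minimal sketch (the helper name is ours); for an AR(1), the estimated rule should put essentially all weight on the first lag, anticipating the AR(1) example below:

```python
import numpy as np

def projection_coeffs(y, n):
    """Regress y_t on y_{t-1}, ..., y_{t-n}; the OLS normal equations are
    the sample analogue of E[(y_t - sum_j g_j y_{t-j}) y_{t-s}] = 0."""
    T = len(y)
    X = np.column_stack([y[n - j:T - j] for j in range(1, n + 1)])
    g, *_ = np.linalg.lstsq(X, y[n:], rcond=None)
    return g

rng = np.random.default_rng(1)
phi, T = 0.8, 100_000
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal()
print(projection_coeffs(y, 4))   # approximately [0.8, 0, 0, 0]
```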
The linear projection has the following key property.

Proposition The linear projection minimizes $MSE$ among all linear forecasting rules.

Proof For any linear forecasting rule, we may write
$$\begin{aligned}
MSE &= E\Big(y_t - \sum_{j=1}^{n} g_j(n)\, y_{t-j}\Big)^2 = E\Big(y_t - y_t^P(n) + y_t^P(n) - \sum_{j=1}^{n} g_j(n)\, y_{t-j}\Big)^2 \\
&= E\big(y_t - y_t^P(n)\big)^2 + E\Big(y_t^P(n) - \sum_{j=1}^{n} g_j(n)\, y_{t-j}\Big)^2 + 2\,E\Big[\big(y_t - y_t^P(n)\big)\Big(y_t^P(n) - \sum_{j=1}^{n} g_j(n)\, y_{t-j}\Big)\Big] \\
&= E\big(y_t - y_t^P(n)\big)^2 + E\Big(\sum_{j=1}^{n} \big(g_j^P(n) - g_j(n)\big)\, y_{t-j}\Big)^2 + 2\sum_{j=1}^{n} \big(g_j^P(n) - g_j(n)\big)\, E\big[\big(y_t - y_t^P(n)\big)\, y_{t-j}\big].
\end{aligned}$$
The third term is zero by the definition of the linear projection. Thus, $MSE$ is minimized by setting $g_j(n) = g_j^P(n)$ for all $j$, i.e., the linear projection gives the lowest $MSE$ among all linear forecasting rules.

It can be shown that the linear projections $y_t^P(n)$ converge in mean square to a random variable $y_t^P$ as $n \to \infty$:
$$\lim_{n \to \infty} E\big(y_t^P(n) - y_t^P\big)^2 = 0.$$
$y_t^P$ is the linear projection of $y_t$ on $y_{t-1}, y_{t-2}, \dots$.

Definition The fundamental innovation is the forecast error associated with the linear projection of $y_t$ on $y_{t-1}, y_{t-2}, \dots$:
$$\varepsilon_t = y_t - y_t^P.$$
Note that the fundamental innovation is a least squares residual that obeys the orthogonality condition $E(\varepsilon_t y_{t-s}) = 0$ for $s = 1, 2, \dots$.

2 Wold Decomposition Theorem

Any stationary process $\{y_t\}$ with $E y_t = 0$ can be represented as
$$y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} + \kappa_t,$$
where: (i) $\psi_0 = 1$ and $\sum_{j=0}^{\infty} \psi_j^2 < \infty$ (square summability); (ii) $\{\varepsilon_t\}$ is white noise; (iii) $\varepsilon_t = y_t - y_t^P$ (fundamental innovation); (iv) $\kappa_t$ is the linear projection of $\kappa_t$ on $y_{t-1}, y_{t-2}, \dots$; and (v) $E(\varepsilon_s \kappa_t) = 0$ for all $s$ and $t$.

$\sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$ is called the linearly indeterministic component, and $\kappa_t$ is called the linearly deterministic component. The Wold Decomposition Theorem represents the stationary process $\{y_t\}$ in terms of processes $\{\varepsilon_t\}$ and $\{\kappa_t\}$ that are orthogonal at all leads and lags. The component $\{\kappa_t\}$ can be predicted arbitrarily well from a linear function of $y_{t-1}, y_{t-2}, \dots$, while the component $\{\varepsilon_t\}$ is the forecast error when $y_t^P$ is used to forecast $y_t$.

Definition A stationary process $\{y_t\}$ is purely linearly indeterministic if $\kappa_t = 0$ for all $t$.

For a purely linearly indeterministic process, the Wold Decomposition Theorem shows that $\{y_t\}$ can be represented as an MA($\infty$) process with $\mu = 0$:
$$y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}.$$
This is called the moving average representation of $\{y_t\}$.

Example An AR(1) process $\{y_t\}$ may be represented using the lag operator:
$$(1 - \phi L)\, y_t = c + \varepsilon_t.$$
If $|\phi| < 1$:
$$y_t = \frac{1}{1 - \phi L}\,(c + \varepsilon_t) = \sum_{j=0}^{\infty} (\phi L)^j (c + \varepsilon_t) = \frac{c}{1-\phi} + \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}.$$
Express as deviation from mean:
$$\hat y_t = y_t - \frac{c}{1-\phi} = \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-j}.$$
Then $E(\hat y_t) = 0$. It follows that $\{\hat y_t\}$ is purely linearly indeterministic, and the MA($\infty$) representation has $\psi_j = \phi^j$. Moreover:
$$E\big[(\hat y_t - \phi \hat y_{t-1})\,\hat y_{t-s}\big] = E\Big[\varepsilon_t \sum_{j=0}^{\infty} \phi^j \varepsilon_{t-s-j}\Big] = \sum_{j=0}^{\infty} \phi^j E(\varepsilon_t \varepsilon_{t-s-j}) = 0.$$
Thus $\hat y_t^P = \phi \hat y_{t-1}$, and the $\varepsilon_t$'s are the fundamental innovations of the process.

3 MA($\infty$) representation of AR($p$) processes

Definition $\{y_t\}$ is an AR($p$) process ($p$th-order autoregressive) if
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t, \tag{1}$$
where $\{\varepsilon_t\}$ is white noise and $c, \phi_1, \dots, \phi_p$ are arbitrary constants.

Represent (1) using the lag operator:
$$(1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p)\, y_t = c + \varepsilon_t.$$
To obtain an MA($\infty$) representation, consider the equation
$$\lambda^p - \phi_1 \lambda^{p-1} - \phi_2 \lambda^{p-2} - \dots - \phi_p = 0. \tag{2}$$
This is called the characteristic equation. According to the fundamental theorem of algebra, there are $p$ roots $\lambda_1, \lambda_2, \dots, \lambda_p$ in the complex plane such that, for any $\lambda$:
$$\lambda^p - \phi_1 \lambda^{p-1} - \phi_2 \lambda^{p-2} - \dots - \phi_p = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_p).$$
Note that complex roots come in conjugate pairs $\lambda_i = a + bi$, $\lambda_k = a - bi$. Divide through by $\lambda^p$ and let $z = 1/\lambda$:
$$1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p = (1 - \lambda_1 z)(1 - \lambda_2 z) \cdots (1 - \lambda_p z).$$
By setting $z = L$ we may write
$$(1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p)\, y_t = (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_p L)\, y_t = c + \varepsilon_t.$$
Assume $|\lambda_i| < 1$ for all $i$, i.e., all roots lie inside the unit circle on the complex plane (recall $|a + bi| = \sqrt{a^2 + b^2}$, which is the length of the vector $(a, b)$). Solve for $y_t$:
$$y_t = \frac{1}{1 - \lambda_1 L} \cdot \frac{1}{1 - \lambda_2 L} \cdots \frac{1}{1 - \lambda_p L}\,(c + \varepsilon_t). \tag{3}$$
The MA($\infty$) representation is derived from (3) in two steps.

Step 1 - Constant term. Note that, for any root $\lambda_i$:
$$\frac{1}{1 - \lambda_i L}\, c = \sum_{j=0}^{\infty} (\lambda_i L)^j c = \sum_{j=0}^{\infty} \lambda_i^j c = \frac{c}{1 - \lambda_i}.$$
Thus:
$$\frac{1}{1 - \lambda_1 L} \cdot \frac{1}{1 - \lambda_2 L} \cdots \frac{1}{1 - \lambda_p L}\, c = \frac{c}{(1-\lambda_1)(1-\lambda_2)\cdots(1-\lambda_p)} = \frac{c}{1 - \phi_1 - \phi_2 - \dots - \phi_p} \equiv \mu. \tag{4}$$

Step 2 - MA coefficients. Suppose the roots of (2) are distinct, i.e., $\lambda_i \ne \lambda_k$ for all $i \ne k$. Then the product term in (3) can be expanded with partial fractions:
$$\frac{1}{1 - \lambda_1 L} \cdot \frac{1}{1 - \lambda_2 L} \cdots \frac{1}{1 - \lambda_p L} = \sum_{i=1}^{p} \frac{\omega_i}{1 - \lambda_i L}, \qquad \text{where } \omega_i \equiv \frac{\lambda_i^{p-1}}{\prod_{\substack{k=1 \\ k \ne i}}^{p} (\lambda_i - \lambda_k)}. \tag{5}$$
It can be shown that $\sum_{i=1}^{p} \omega_i = 1$. Furthermore, we can write:
$$\sum_{i=1}^{p} \frac{\omega_i}{1 - \lambda_i L} = \sum_{i=1}^{p} \omega_i \sum_{j=0}^{\infty} (\lambda_i L)^j = \sum_{j=0}^{\infty} \Big(\sum_{i=1}^{p} \omega_i \lambda_i^j\Big) L^j = \sum_{j=0}^{\infty} \psi_j L^j, \qquad \text{where } \psi_j \equiv \sum_{i=1}^{p} \omega_i \lambda_i^j. \tag{6}$$
Thus:
$$\frac{1}{1 - \lambda_1 L} \cdot \frac{1}{1 - \lambda_2 L} \cdots \frac{1}{1 - \lambda_p L}\, \varepsilon_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}.$$
Clearly, $\psi_0 = 1$. Combining the terms gives:
$$y_t = \mu + \varepsilon_t + \sum_{j=1}^{\infty} \psi_j \varepsilon_{t-j}. \tag{7}$$
The restriction $|\lambda_i| < 1$ for all $i$ implies that $\{\psi_j\}$ satisfies square summability, and so $\{y_t\}$ is stationary. The following proposition summarizes this analysis.

Proposition Suppose (2) has distinct roots $\lambda_1, \dots, \lambda_p$ satisfying $|\lambda_i| < 1$ for all $i$. Then the AR($p$) process (1) is stationary and has an MA($\infty$) representation (7), where $\mu$ is given by (4) and $\psi_j$ is given by (5) and (6).

Often AR($p$) processes are analyzed using this alternative form of the characteristic equation:
$$1 - \phi_1 z - \phi_2 z^2 - \dots - \phi_p z^p = 0.$$
In this case, the stationarity condition is that the roots lie outside of the unit circle, since the roots of this equation are the inverses of the earlier roots.
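Both forms of the stationarity check are mechanical to implement. A minimal sketch using numpy.roots on the $\lambda$-form polynomial (2) (helper names are ours):

```python
import numpy as np

def ar_roots(phi):
    """Roots of lambda^p - phi_1*lambda^(p-1) - ... - phi_p = 0.

    numpy.roots takes coefficients from the highest power down, so the
    characteristic polynomial is [1, -phi_1, ..., -phi_p].
    """
    return np.roots(np.r_[1.0, -np.asarray(phi)])

def is_stationary(phi):
    """AR(p) is stationary iff all characteristic roots lie inside the unit circle."""
    return np.all(np.abs(ar_roots(phi)) < 1.0)

print(ar_roots([0.6, 0.25]), is_stationary([0.6, 0.25]))   # both roots inside
print(is_stationary([1.0, 0.0]))                           # unit root: False
```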
4 Nonstationary processes

a Trend-stationary processes

Definition $\{y_t\}$ is a trend-stationary process if $\{y_t - y_t^{tr}\}$ is stationary, where $\{y_t^{tr}\}$ is a deterministic sequence referred to as the trend of $\{y_t\}$.

Example Let $\{y_t\}$ be given by
$$y_t = \sum_{k=0}^{K} \delta_k t^k + \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j},$$
where $\delta_0, \delta_1, \dots, \delta_K$ are arbitrary constants. In this case the process has a polynomial trend:
$$y_t^{tr} = \sum_{k=0}^{K} \delta_k t^k.$$
$\{y_t\}$ is trend-stationary as long as its MA($\infty$) component is stationary.

b Unit root processes

Definition An AR($p$) process is integrated of order $r$, or I($r$), if its characteristic equation has $r$ roots equal to unity.

Let $\{y_t\}$ be an AR($p$) process whose roots satisfy $|\lambda_i| < 1$, $i = 1, \dots, p-1$, and $\lambda_p = 1$. Then $\{y_t\}$ is I(1), and it may be written as
$$(1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p)\, y_t = (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_{p-1} L)(1 - L)\, y_t = (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_{p-1} L)\, \Delta y_t = c + \varepsilon_t,$$
where $\Delta y_t = y_t - y_{t-1}$ is the first difference of $y_t$. It follows that the process $\{\Delta y_t\}$ is stationary.

Example The following AR(1) process is called a random walk with drift:
$$y_t = c + y_{t-1} + \varepsilon_t.$$
Define the process $\{\Delta y_t\}$ by $\Delta y_t = c + \varepsilon_t$. Then $\{\Delta y_t\}$ is stationary.
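A quick numerical illustration of the last example (our own sketch, not from the notes): simulate a random walk with drift and check that the first difference behaves like stationary noise centered at $c$, while the level wanders:

```python
import numpy as np

rng = np.random.default_rng(0)
c, T = 0.5, 10_000
eps = rng.normal(size=T)
y = np.cumsum(c + eps)        # y_t = c + y_{t-1} + eps_t, with y_0 = 0

dy = np.diff(y)               # first difference: dy_t = c + eps_t
print(dy.mean(), dy.var())    # approximately c and Var(eps) = 1

# The level series is not stationary: its sample variance keeps growing
# as the sample lengthens, unlike the differenced series.
```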
5 VAR($p$) processes

a Definition

Definition $\{Y_t\}$ is a VAR($p$) process ($p$th-order vector autoregressive) if
$$Y_t = C + \sum_{i=1}^{p} \Phi_i Y_{t-i} + \varepsilon_t, \tag{8}$$
where
$$Y_t = \begin{bmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{nt} \end{bmatrix}, \quad C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} \phi_{11}^i & \phi_{12}^i & \cdots & \phi_{1n}^i \\ \phi_{21}^i & \phi_{22}^i & \cdots & \phi_{2n}^i \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{n1}^i & \phi_{n2}^i & \cdots & \phi_{nn}^i \end{bmatrix}, \quad \varepsilon_t = \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{nt} \end{bmatrix},$$
and $\varepsilon_t$ is vector white noise:
$$E(\varepsilon_t) = 0_{n \times 1}; \qquad E(\varepsilon_t \varepsilon_t') = \Omega; \qquad E(\varepsilon_s \varepsilon_t') = 0_{n \times n} \text{ for all } s \ne t,$$
where $\Omega$ is a positive definite and symmetric $n \times n$ matrix. $\Omega$ is the variance-covariance matrix of the white noise vector. Positive definiteness means that $x' \Omega x > 0$ for all nonzero $n$-vectors $x$.

b Stationarity

To evaluate stationarity of a VAR($p$), consider the equation
$$\big|\lambda^p I_n - \lambda^{p-1} \Phi_1 - \lambda^{p-2} \Phi_2 - \dots - \Phi_p\big| = 0, \tag{9}$$
where $|\cdot|$ denotes the determinant and $I_n$ is the $n \times n$ identity matrix. The VAR is stationary if all solutions $\lambda = \lambda_i$ to (9) satisfy $|\lambda_i| < 1$. (Note that there are $np$ roots of (9), possibly repeated, and complex roots come in conjugate pairs.) Equivalently, the VAR is stationary if all values of $z$ satisfying
$$\big|I_n - \Phi_1 z - \Phi_2 z^2 - \dots - \Phi_p z^p\big| = 0$$
lie outside of the unit circle.
The solutions to (9) can be computed using the following $np \times np$ matrix:
$$F = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\ I_n & 0_n & \cdots & 0_n & 0_n \\ 0_n & I_n & \cdots & 0_n & 0_n \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0_n & 0_n & \cdots & I_n & 0_n \end{bmatrix},$$
where $0_n$ is an $n \times n$ matrix of zeros. It can be shown that the eigenvalues $\lambda_1, \dots, \lambda_{np}$ of $F$ are precisely the solutions to
$$\big|\lambda^p I_n - \lambda^{p-1} \Phi_1 - \lambda^{p-2} \Phi_2 - \dots - \Phi_p\big| = 0.$$

To calculate the mean of a stationary VAR, take expectations:
$$E Y_t = C + \sum_{i=1}^{p} \Phi_i E Y_{t-i}.$$
Stationarity implies $E Y_t = \mu$ for all $t$. Thus:
$$\mu = \Big(I_n - \sum_{i=1}^{p} \Phi_i\Big)^{-1} C.$$

The variance and autocovariances of a stationary VAR are given by
$$\Gamma_j = E(Y_t - \mu)(Y_{t-j} - \mu)'.$$
Each $\Gamma_j$ is an $n \times n$ matrix, with $\gamma_{ik,j}$ giving the covariance between $y_{it}$ and $y_{k,t-j}$.
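The companion matrix, the eigenvalue-based stationarity check, and the mean formula translate directly into code. A minimal sketch (function names are ours):

```python
import numpy as np

def companion(Phi):
    """Stack VAR(p) coefficient matrices Phi = [Phi_1, ..., Phi_p]
    (each n x n) into the np x np companion matrix F."""
    p = len(Phi)
    n = Phi[0].shape[0]
    F = np.zeros((n * p, n * p))
    F[:n, :] = np.hstack(Phi)            # top block row: Phi_1 ... Phi_p
    F[n:, :-n] = np.eye(n * (p - 1))     # identity blocks below the top row
    return F

def var_is_stationary(Phi):
    """Stationary iff all eigenvalues of F lie inside the unit circle."""
    return np.all(np.abs(np.linalg.eigvals(companion(Phi))) < 1.0)

def var_mean(C, Phi):
    """mu = (I_n - Phi_1 - ... - Phi_p)^(-1) C for a stationary VAR."""
    n = len(C)
    return np.linalg.solve(np.eye(n) - sum(Phi), C)

Phi = [np.array([[0.5, 0.1], [0.0, 0.3]]),
       np.array([[0.1, 0.0], [0.05, 0.1]])]
C = np.array([1.0, 0.5])
print(var_is_stationary(Phi), var_mean(C, Phi))
```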
c MA($\infty$) representation

To obtain an MA($\infty$) representation, express (8) as
$$\Big(I_n - \sum_{i=1}^{p} \Phi_i L^i\Big) Y_t = C + \varepsilon_t.$$
Stationarity allows us to invert the lag polynomial:
$$\Big(I_n - \sum_{i=1}^{p} \Phi_i L^i\Big)^{-1} = \sum_{j=0}^{\infty} \Psi_j L^j. \tag{10}$$
Thus:
$$Y_t = \mu + \sum_{j=0}^{\infty} \Psi_j \varepsilon_{t-j}. \tag{11}$$

The values of $\Psi_j$, $j = 0, 1, 2, \dots$ may be obtained using the method of undetermined coefficients. Write (10) as
$$I_n = \Big(I_n - \sum_{i=1}^{p} \Phi_i L^i\Big) \sum_{j=0}^{\infty} \Psi_j L^j. \tag{12}$$
The constant terms on each side of (12) must agree. Thus:
$$I_n = \Psi_0. \tag{13}$$
Further, since there are no powers of $L$ on the LHS, the coefficient of $L^j$ on the RHS must equal zero for each $j > 0$:
$$\Psi_j - \Psi_{j-1} \Phi_1 - \Psi_{j-2} \Phi_2 - \dots - \Psi_{j-p} \Phi_p = 0, \qquad j = 1, 2, \dots, \tag{14}$$
where $\Psi_j = 0$ for $j < 0$. Given the coefficients $\Phi_i$ and $\Psi_0 = I_n$, (14) may be iterated to compute the MA coefficients $\Psi_1, \Psi_2, \Psi_3, \dots$.
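The recursion (13)-(14) is straightforward to iterate. A minimal sketch (helper name is ours), checked against the VAR(1) case where $\Psi_j = \Phi_1^j$:

```python
import numpy as np

def ma_coefficients(Phi, J):
    """Psi_0, ..., Psi_J from Psi_j = Psi_{j-1} Phi_1 + ... + Psi_{j-p} Phi_p,
    with Psi_0 = I_n and Psi_j = 0 for j < 0 (method of undetermined
    coefficients, equations (13)-(14))."""
    p = len(Phi)
    n = Phi[0].shape[0]
    Psi = [np.eye(n)]
    for j in range(1, J + 1):
        Psi_j = sum(Psi[j - i] @ Phi[i - 1]
                    for i in range(1, p + 1) if j - i >= 0)
        Psi.append(Psi_j)
    return Psi

Phi1 = np.array([[0.5, 0.2], [0.0, 0.4]])
Psi = ma_coefficients([Phi1], 3)
print(np.allclose(Psi[3], np.linalg.matrix_power(Phi1, 3)))  # True
```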
Nonuniqueness. Importantly, the MA($\infty$) representation of a VAR is nonunique. Let $H$ be any nonsingular $n \times n$ matrix, and define $u_t \equiv H \varepsilon_t$. Note that $u_t$ is vector white noise:
$$E(u_t) = H E(\varepsilon_t) = 0_{n \times 1}; \qquad E(u_t u_t') = H E(\varepsilon_t \varepsilon_t') H' = H \Omega H'; \qquad E(u_s u_t') = H E(\varepsilon_s \varepsilon_t') H' = 0_{n \times n},\ s \ne t;$$
and $H \Omega H'$ is positive definite since $H' x$ is nonzero whenever $x$ is. The MA($\infty$) representation can be expressed as
$$Y_t = \mu + \sum_{j=0}^{\infty} \Psi_j H^{-1} H \varepsilon_{t-j} = \mu + \sum_{j=0}^{\infty} \Theta_j u_{t-j},$$
where $\Theta_j \equiv \Psi_j H^{-1}$. Note that in this case $u_t$ is not the fundamental innovation. To obtain the MA($\infty$) representation in terms of the fundamental innovation we must impose the normalization $\Theta_0 = I_n$, i.e., $H = I_n$.

6 Identification of shocks

a Triangular factorization

We wish to assess how fluctuations in "more exogenous" variables affect "less exogenous" ones. One way to do this is to rearrange the vector of innovations $\varepsilon_t$ into components that derive from "exogenous shocks" to the $n$ variables. This can be accomplished using a triangular factorization of $\Omega$. For any positive definite symmetric matrix $\Omega$, there exists a unique representation of the form
$$\Omega = A D A', \tag{15}$$
where $A$ is a lower triangular matrix with 1's along the principal diagonal:
$$A = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{bmatrix},$$
and $D$ is a diagonal matrix:
$$D = \begin{bmatrix} d_{11} & & & \\ & d_{22} & & \\ & & \ddots & \\ & & & d_{nn} \end{bmatrix},$$
with $d_{ii} > 0$ for $i = 1, \dots, n$.
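NumPy does not expose the $A D A'$ factorization directly, but it follows from the Cholesky factor of $\Omega$ by rescaling columns. A minimal sketch (helper name is ours):

```python
import numpy as np

def triangular_factorization(Omega):
    """Return (A, D) with Omega = A D A', A unit lower triangular, D diagonal.

    Built from the Cholesky factor P (Omega = P P') by taking
    A = P diag(1/p_11, ..., 1/p_nn) and D = diag(p_11^2, ..., p_nn^2).
    """
    P = np.linalg.cholesky(Omega)
    d = np.diag(P)
    A = P / d            # divide each column j by its diagonal entry p_jj
    D = np.diag(d**2)
    return A, D

Omega = np.array([[4.0, 2.0], [2.0, 3.0]])
A, D = triangular_factorization(Omega)
print(A)                                  # unit lower triangular
print(np.allclose(A @ D @ A.T, Omega))    # True
```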
Use the factorization to define a vector of exogenous shocks:
$$u_t \equiv A^{-1} \varepsilon_t.$$
Substitute into the MA($\infty$) representation to obtain an alternative "structural" representation:
$$Y_t = \mu + \varepsilon_t + \sum_{j=1}^{\infty} \Psi_j \varepsilon_{t-j} = \mu + A u_t + \sum_{j=1}^{\infty} \Psi_j A u_{t-j} = \mu + \sum_{j=0}^{\infty} \Theta_j u_{t-j},$$
where $\Theta_0 \equiv A$ and $\Theta_j \equiv \Psi_j A$, $j = 1, 2, \dots$. Note that the shocks $u_{1t}, \dots, u_{nt}$ are mutually uncorrelated:
$$E(u_t u_t') = A^{-1} E(\varepsilon_t \varepsilon_t')(A^{-1})' = A^{-1} \Omega (A')^{-1} = A^{-1} A D A' (A')^{-1} = D.$$
Thus:
$$\mathrm{Var}(u_{it}) = d_{ii}; \qquad \mathrm{Cov}(u_{it}, u_{kt}) = 0.$$

To implement this approach, we order the variables from "most exogenous" to "least exogenous." This means that innovations to $y_{it}$ are affected by the shocks $u_{1t}, \dots, u_{it}$, but not by $u_{i+1,t}, \dots, u_{nt}$.

Bivariate case. Let $n = 2$. (11) may be expressed as
$$\begin{bmatrix} \hat y_{1t} \\ \hat y_{2t} \end{bmatrix} = \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} + \sum_{j=1}^{\infty} \begin{bmatrix} \psi_{11,j} & \psi_{12,j} \\ \psi_{21,j} & \psi_{22,j} \end{bmatrix} \begin{bmatrix} \varepsilon_{1,t-j} \\ \varepsilon_{2,t-j} \end{bmatrix},$$
where $\hat y_{it} \equiv y_{it} - \mu_i$. Here $y_{1t}$ is taken to be "most exogenous." $\Omega$ is factorized using the matrices
$$A = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} d_{11} & 0 \\ 0 & d_{22} \end{bmatrix}.$$
Thus,
$$\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} = \begin{bmatrix} u_{1t} \\ a_{21} u_{1t} + u_{2t} \end{bmatrix}.$$
Innovations to $y_{1t}$ are driven by the exogenous shocks $u_{1t}$. Innovations to $y_{2t}$ are driven by both innovations to $y_{1t}$ and uncorrelated shocks $u_{2t}$. Furthermore, for $j > 0$:
$$\Theta_j = \Psi_j A = \begin{bmatrix} \psi_{11,j} & \psi_{12,j} \\ \psi_{21,j} & \psi_{22,j} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} = \begin{bmatrix} \psi_{11,j} + a_{21} \psi_{12,j} & \psi_{12,j} \\ \psi_{21,j} + a_{21} \psi_{22,j} & \psi_{22,j} \end{bmatrix}.$$
The alternative "structural" MA($\infty$) representation is
$$\begin{bmatrix} \hat y_{1t} \\ \hat y_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} + \sum_{j=1}^{\infty} \begin{bmatrix} \psi_{11,j} + a_{21} \psi_{12,j} & \psi_{12,j} \\ \psi_{21,j} + a_{21} \psi_{22,j} & \psi_{22,j} \end{bmatrix} \begin{bmatrix} u_{1,t-j} \\ u_{2,t-j} \end{bmatrix}.$$

We can use this to assess the effects of an exogenous shock to $y_{1t}$. Suppose the system begins in the nonstochastic steady state:
$$\begin{bmatrix} u_{1,t-j} \\ u_{2,t-j} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ j = 1, 2, \dots \quad \Rightarrow \quad \begin{bmatrix} \hat y_{1,t-j} \\ \hat y_{2,t-j} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
At time $t$ there is a positive shock to $y_{1t}$, and there are no shocks following this:
$$\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}; \qquad \begin{bmatrix} u_{1,t+j} \\ u_{2,t+j} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ j = 1, 2, \dots.$$
Then from the above representation we have
$$\begin{bmatrix} \hat y_{1t} \\ \hat y_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ a_{21} \end{bmatrix}; \qquad \begin{bmatrix} \hat y_{1,t+j} \\ \hat y_{2,t+j} \end{bmatrix} = \begin{bmatrix} \psi_{11,j} + a_{21} \psi_{12,j} & \psi_{12,j} \\ \psi_{21,j} + a_{21} \psi_{22,j} & \psi_{22,j} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \psi_{11,j} + a_{21} \psi_{12,j} \\ \psi_{21,j} + a_{21} \psi_{22,j} \end{bmatrix}.$$
Subsequent movements in each variable are driven by the direct effect of $y_{1t}$ and an indirect effect coming through the response of $y_{2t}$. These are the orthogonalized impulse-response functions.

We can also assess the effects of a positive shock to $y_{2t}$, as captured by $u_{2t}$. In this case the change in $y_{2t}$ is conditioned on $u_{1t}$, i.e., $u_{2t}$ indicates the movement in $y_{2t}$ that cannot be predicted after $u_{1t}$ is known.
$$\begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}; \qquad \begin{bmatrix} u_{1,t+j} \\ u_{2,t+j} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\ j = 1, 2, \dots;$$
$$\begin{bmatrix} \hat y_{1t} \\ \hat y_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ a_{21} & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}; \qquad \begin{bmatrix} \hat y_{1,t+j} \\ \hat y_{2,t+j} \end{bmatrix} = \begin{bmatrix} \psi_{11,j} + a_{21} \psi_{12,j} & \psi_{12,j} \\ \psi_{21,j} + a_{21} \psi_{22,j} & \psi_{22,j} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \psi_{12,j} \\ \psi_{22,j} \end{bmatrix}.$$
Note that $u_{1t}$ affects $y_{2t}$ in period $t$ (as long as $a_{21} \ne 0$), but $u_{2t}$ does not affect $y_{1t}$. This is the sense in which $y_{1t}$ is "more exogenous."

Empirical implementation. For a given observed sample of size $T$, we can obtain OLS estimates $\hat C$ and $\hat \Phi_i$, $i = 1, \dots, p$ by regressing $Y_t$ on a constant term and $p$ lags $Y_{t-1}, \dots, Y_{t-p}$. Estimated innovations are obtained from the OLS residuals:
$$\hat \varepsilon_t = Y_t - \hat C - \sum_{i=1}^{p} \hat \Phi_i Y_{t-i}.$$
The variance-covariance matrix is estimated as
$$\hat \Omega = \frac{1}{T} \sum_{t=1}^{T} \hat \varepsilon_t \hat \varepsilon_t'.$$
Estimates of the MA coefficients $\hat \Psi_j$, $j = 1, 2, \dots$ can be obtained using the formulas derived above:
$$\hat \Psi_0 = I_n; \qquad \hat \Psi_s - \hat \Psi_{s-1} \hat \Phi_1 - \hat \Psi_{s-2} \hat \Phi_2 - \dots - \hat \Psi_{s-p} \hat \Phi_p = 0,\ s = 1, 2, \dots.$$
Orthogonalized impulse-response functions are computed as
$$\hat \Theta_0 = \hat A; \qquad \hat \Theta_j = \hat \Psi_j \hat A,\ j = 1, 2, \dots.$$
The coefficient $\hat \theta_{ik,j}$, the $ik$-th element of $\hat \Theta_j$, gives the response of $\hat y_{i,t+j}$ to a one-unit positive shock to $u_{kt}$.
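These empirical steps chain together as below: OLS estimation of (8), the residual covariance $\hat \Omega$, the $\hat \Psi$ recursion, and $\hat \Theta_j = \hat \Psi_j \hat A$. A minimal sketch assuming the helpers ma_coefficients and triangular_factorization from the sketches above are in scope; all function names are ours:

```python
import numpy as np

def estimate_var(Y, p):
    """OLS for Y_t = C + Phi_1 Y_{t-1} + ... + Phi_p Y_{t-p} + eps_t.
    Y is a (T x n) array. Returns C_hat, [Phi_1,...,Phi_p], Omega_hat."""
    T, n = Y.shape
    X = np.hstack([np.ones((T - p, 1))] +
                  [Y[p - i:T - i] for i in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)   # ((1 + n*p) x n)
    C = B[0]
    Phi = [B[1 + (i - 1) * n:1 + i * n].T for i in range(1, p + 1)]
    resid = Y[p:] - X @ B
    # Divide by the number of usable observations (T - p); the notes use 1/T.
    Omega = resid.T @ resid / len(resid)
    return C, Phi, Omega

def orthogonalized_irf(Phi, Omega, J):
    """Theta_j = Psi_j A for j = 0,...,J, with A from Omega = A D A'."""
    A, _ = triangular_factorization(Omega)
    return [Psi @ A for Psi in ma_coefficients(Phi, J)]
```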
Cholesky factorization. For any positive definite symmetric matrix $\Omega$, there exists a unique representation of the form
$$\Omega = P P', \tag{16}$$
where
$$P = A D^{1/2} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{bmatrix} \begin{bmatrix} \sqrt{d_{11}} & & & \\ & \sqrt{d_{22}} & & \\ & & \ddots & \\ & & & \sqrt{d_{nn}} \end{bmatrix}.$$
This is called the Cholesky factorization. Using the Cholesky factorization, the vector of exogenous shocks may be defined as:
$$v_t \equiv P^{-1} \varepsilon_t.$$
In the structural representation, $A$ is simply replaced by $P$. Moreover, $E(v_t v_t') = I_n$, i.e., $\mathrm{Var}(v_{it}) = 1$ for all $i$.
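In NumPy the Cholesky factor is available directly as np.linalg.cholesky; a quick check of the relation $P = A D^{1/2}$, reusing the triangular_factorization helper from above:

```python
import numpy as np

Omega = np.array([[4.0, 2.0], [2.0, 3.0]])
P = np.linalg.cholesky(Omega)             # Omega = P P'
A, D = triangular_factorization(Omega)    # helper defined earlier
print(np.allclose(P, A @ np.sqrt(D)))     # P = A D^{1/2}: True
print(np.allclose(P @ P.T, Omega))        # True
```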
since E(" t+ " t+ ) = E(" t" t) = for all, while E(" t+ " t+l ) = nn for l 6=. MSE(s) can be decomposed based on the contributions of the identi ed shocks u 1t ; :::; u nt. The innovations " t may be expressed as " t = Au t = X n i=1 A iu it ; where A i is the i th column of the matrix A de ned in (1). Thus: X = E(" t " n t) = E A iu it X n i=1 k=1 A k u kt = X n i=1 A ie(u it)a i + X n = X n i=1 X i=1 A id ii A i; i6=k A ie(u it u kt )A k since E(u it ) = V ar(u it) = d ii and, for k 6= i, E(u it u kt ) = Cov(u it ; u kt ) =. Substitution gives MSE(s) = X s = X n s =1 d X s ii i=1 =1 X n s i=1 A id ii A i A i A i s : s (17) Equation (17) decomposes M SE(s) into n terms, associated with variation contributed by the n shocks u 1t ; :::; u nt. As s! 1, stationarity implies E t Y t+s!, and MSE(s)! E(Y t )(Y t ) = ; i.e., MSE(s) converges to the variance of the V AR. Thus, (17) decomposes the variance in terms of the contributions of the underlying shocks. When the Cholesky factorization is used, the vectors A i are replaced by vectors P i, which are columns of the matrix P de ned in (16). c Identi cation via long-run restrictions Consider the following bivariate VAR process: 4 y 1t = 4 y 1;t 1 + 4 " 1t + X 1 y t " t =1 4 " 1;t " ;t ; (18) 18
c Identification via long-run restrictions

Consider the following bivariate VAR process:
$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} y_{1,t-1} \\ 0 \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} + \sum_{j=1}^{\infty} \Psi_j \begin{bmatrix} \varepsilon_{1,t-j} \\ \varepsilon_{2,t-j} \end{bmatrix}, \tag{18}$$
with variance-covariance matrix $E(\varepsilon_t \varepsilon_t') = \Omega$, where
$$\Omega = \begin{bmatrix} \omega_{11} & \omega_{12} \\ \omega_{12} & \omega_{22} \end{bmatrix}.$$
Note that forecasted values of $y_{1t}$ are permanently affected by innovations, while the effects on $y_{2t}$ die out when the $\psi$'s satisfy suitable stationarity restrictions. This distinction can be used to identify "permanent" versus "transitory" shocks.

Write (18) as
$$\begin{bmatrix} \Delta y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} + \sum_{j=1}^{\infty} \Psi_j \begin{bmatrix} \varepsilon_{1,t-j} \\ \varepsilon_{2,t-j} \end{bmatrix}, \tag{19}$$
where $\Delta y_t = y_t - y_{t-1}$, and assume that (19) is stationary. We wish to obtain a structural representation
$$\begin{bmatrix} \Delta y_{1t} \\ y_{2t} \end{bmatrix} = \sum_{j=0}^{\infty} \Theta_j \begin{bmatrix} u_{1,t-j} \\ u_{2,t-j} \end{bmatrix}, \tag{20}$$
where $u_{1t}$ and $u_{2t}$ indicate permanent and transitory shocks, respectively, and
$$\Theta_0 = \begin{bmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{bmatrix}.$$
Assume $\mathrm{Cov}(u_{1t}, u_{2t}) = 0$ and $\mathrm{Var}(u_{1t}) = \mathrm{Var}(u_{2t}) = 1$, i.e., the variances of the shocks are normalized to unity. Furthermore, since $\Theta_0 u_t = \varepsilon_t$:
$$E(\Theta_0 u_t u_t' \Theta_0') = E(\varepsilon_t \varepsilon_t') \quad \Rightarrow \quad \Theta_0 \Theta_0' = \Omega.$$
Recall that the Cholesky factorization gives a unique lower triangular matrix $P$ satisfying $P P' = \Omega$. It follows that $\Theta_0 = P Q$ for some orthogonal matrix $Q$, i.e., $Q$ satisfies $Q Q' = I$. Orthogonality implies three restrictions on $Q$, so we need one more restriction to identify $\Theta_0$.

For the fourth restriction, assume that $u_{2t}$ has no long-run effect on the level of $y_{1t}$, so that $u_{2t}$ is transitory. For this to be true, all effects on $\Delta y_{1t}$ must cancel out in the long run:
$$\sum_{j=0}^{\infty} \theta_{12,j} = 0,$$
where $\theta_{12,j}$ denotes the $12$-element of $\Theta_j$ (so $\theta_{12,0} = \theta_{12}$). Moreover, since $\Theta_j = \Psi_j \Theta_0$:
$$\theta_{12,j} = \psi_{11,j} \theta_{12} + \psi_{12,j} \theta_{22}.$$
Substitute and rearrange:
$$\theta_{12} \sum_{j=0}^{\infty} \psi_{11,j} + \theta_{22} \sum_{j=0}^{\infty} \psi_{12,j} = 0.$$
This supplies one more restriction, and thus $\Theta_0$ is identified.
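Given $\Psi_j$ and $\Omega$ for the stationary system (19), the long-run restriction pins down $\Theta_0$ in closed form: the cumulative impact matrix $\Psi(1)\Theta_0$, with $\Psi(1) = \sum_j \Psi_j$, satisfies $[\Psi(1)\Theta_0][\Psi(1)\Theta_0]' = \Psi(1)\,\Omega\,\Psi(1)'$ and must be lower triangular under the restriction, so it equals the Cholesky factor of $\Psi(1)\,\Omega\,\Psi(1)'$. A minimal sketch of this standard construction (our own implementation; the infinite sum is truncated at J terms, and ma_coefficients is the helper from above):

```python
import numpy as np

def blanchard_quah_theta0(Phi, Omega, J=500):
    """Theta_0 under the restriction that the second shock has no
    long-run effect on the level of the first variable.

    Psi1 approximates the long-run multiplier Psi(1); the cumulative
    structural impact Psi(1) Theta_0 must be lower triangular, so it is
    the Cholesky factor of Psi(1) Omega Psi(1)'."""
    Psi1 = sum(ma_coefficients(Phi, J))          # truncated Psi(1)
    S = np.linalg.cholesky(Psi1 @ Omega @ Psi1.T)
    return np.linalg.solve(Psi1, S)              # Theta_0 = Psi(1)^{-1} S

# Checks: Theta_0 @ Theta_0.T reproduces Omega, and the (1,2) element of
# Psi1 @ Theta_0 is numerically zero (the long-run restriction).
```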
7 Granger causality

Consider two stationary processes $\{y_{1t}\}$ and $\{y_{2t}\}$. Recall that the linear projection of $y_{1t}$ on $y_{1,t-1}, y_{1,t-2}, \dots$, denoted by $y_{1t}^P$, minimizes $MSE$ among all linear forecast rules. We are interested in whether the variable $y_{2t}$ can be used to obtain better predictions of $y_{1t}$. That is, does the linear projection of $y_{1t}$ on $y_{1,t-1}, y_{1,t-2}, \dots$ and $y_{2,t-1}, y_{2,t-2}, \dots$ give a lower $MSE$ than $y_{1t}^P$? If not, then we say that the variable $y_{2t}$ does not Granger-cause $y_{1t}$.

Suppose $y_{1t}$ and $y_{2t}$ are given by a bivariate VAR:
$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \sum_{i=1}^{p} \begin{bmatrix} \phi_{11}^i & \phi_{12}^i \\ \phi_{21}^i & \phi_{22}^i \end{bmatrix} \begin{bmatrix} y_{1,t-i} \\ y_{2,t-i} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}.$$
Then $y_{2t}$ does not Granger-cause $y_{1t}$ if the coefficient matrices are lower triangular:
$$\begin{bmatrix} \phi_{11}^i & \phi_{12}^i \\ \phi_{21}^i & \phi_{22}^i \end{bmatrix} = \begin{bmatrix} \phi_{11}^i & 0 \\ \phi_{21}^i & \phi_{22}^i \end{bmatrix}, \qquad i = 1, \dots, p.$$
To test for Granger causality, estimate the first equation in the VAR with and without the parameter restriction:
$$y_{1t} = c_1 + \sum_{i=1}^{p} \phi_{11}^i y_{1,t-i} + \eta_{1t};$$
$$y_{1t} = c_1 + \sum_{i=1}^{p} \big(\phi_{11}^i y_{1,t-i} + \phi_{12}^i y_{2,t-i}\big) + \varepsilon_{1t}.$$
Let $\hat \eta_{1t}$ and $\hat \varepsilon_{1t}$ be the fitted residuals and let the sample size be $T$. Define
$$RSS_0 = \sum_{t=1}^{T} \hat \eta_{1t}^2; \qquad RSS_1 = \sum_{t=1}^{T} \hat \varepsilon_{1t}^2.$$
Then for large $T$ the following statistic has a $\chi^2(p)$ distribution:
$$S = \frac{T\,(RSS_0 - RSS_1)}{RSS_1}.$$
If $S$ exceeds a designated critical value for a $\chi^2(p)$ variable (e.g., the 5% critical value), then we reject the null hypothesis that $y_{2t}$ does not Granger-cause $y_{1t}$, i.e., $y_{2t}$ does help in forecasting $y_{1t}$.
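A minimal sketch of the test (our own code; the chi-square critical value comes from scipy.stats):

```python
import numpy as np
from scipy import stats

def granger_test(y1, y2, p):
    """Test H0: y2 does not Granger-cause y1, via S = T(RSS0 - RSS1)/RSS1."""
    T = len(y1)
    # Restricted: y1_t on a constant and own lags; unrestricted adds lags of y2.
    X0 = np.hstack([np.ones((T - p, 1))] +
                   [y1[p - i:T - i, None] for i in range(1, p + 1)])
    X1 = np.hstack([X0] + [y2[p - i:T - i, None] for i in range(1, p + 1)])
    target = y1[p:]
    rss = []
    for X in (X0, X1):
        b, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ b
        rss.append(resid @ resid)
    # T - p usable observations play the role of T in the statistic.
    S = (T - p) * (rss[0] - rss[1]) / rss[1]
    return S, stats.chi2.ppf(0.95, df=p)   # reject H0 if S exceeds the 5% value

# Usage: S, crit = granger_test(y1, y2, p=4); reject H0 if S > crit.
```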
Sources

Hamilton, Time Series Analysis, 1994, chs. 3, 4, 11.

Blanchard and Quah, "The Dynamic Effects of Aggregate Demand and Supply Disturbances," AER, Sept. 1989.