THE UNIVERSITY OF CHICAGO Booth School of Business
Business 41914, Spring Quarter 2015, Mr. Ruey S. Tsay
Reference: Chapter 2 of the textbook.

Lecture 3: Vector Autoregressive (VAR) Models

1 Introduction

Vector autoregressive models are perhaps the most widely used multivariate time series models. They are the dynamic version of the multivariate multiple linear regressions commonly used in multivariate statistical analysis. We start with stationary VAR models. As before, in this course, {a_t} is a sequence of serially uncorrelated random vectors with mean zero and positive-definite covariance matrix Σ_a, which is denoted by Σ_a > 0. A k-dimensional time series z_t follows a VAR(p) model if

z_t = φ_0 + Σ_{i=1}^p φ_i z_{t−i} + a_t,  (1)

where φ_0 is a k-dimensional constant vector and the φ_i are k × k real-valued matrices. For a VAR(p) model, we shall discuss the following topics:

1. Model structure and Granger causality
2. Relation to transfer function models (distributed-lag models)
3. Stationarity condition
4. Invertibility condition
5. Moment equations: multivariate Yule-Walker equations
6. Implied component models
7. MA representation

2 VAR(1) Models

3 VAR(2) Models

4 VAR(p) Models

5 Estimation

[This handout follows Chapter 2 of Tsay (2014).]
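Model (1) is easy to simulate, which is useful for checking the estimation and diagnostic results that follow. Below is a minimal NumPy sketch of a bivariate VAR(2); the coefficients, seed, and burn-in length are made up for illustration (the course's own demonstrations use the R package MTS, not this code):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical bivariate VAR(2): z_t = phi0 + phi1 z_{t-1} + phi2 z_{t-2} + a_t
k, p, T = 2, 2, 500
phi0 = np.array([0.2, -0.1])
phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.3]])
phi2 = np.array([[0.2, 0.0],
                 [0.1, 0.2]])
Sigma_a = np.array([[1.0, 0.3],
                    [0.3, 1.0]])   # positive-definite innovation covariance

# Stationarity check: all eigenvalues of the companion matrix lie inside the unit circle
Phi = np.block([[phi1, phi2],
                [np.eye(k), np.zeros((k, k))]])
rho = np.abs(np.linalg.eigvals(Phi)).max()   # about 0.8 here, so the model is stationary

# Serially uncorrelated Gaussian shocks a_t, then simulate and discard a burn-in
a = rng.multivariate_normal(np.zeros(k), Sigma_a, size=T + 100)
z = np.zeros((T + 100, k))
for t in range(p, T + 100):
    z[t] = phi0 + phi1 @ z[t - 1] + phi2 @ z[t - 2] + a[t]
z = z[100:]   # T x k array of simulated observations
```

The burn-in drop makes the retained series approximately a draw from the stationary distribution regardless of the zero initial values.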
Given the data, we have, for a VAR(p) model,

z_t = φ_0 + φ_1 z_{t−1} + ··· + φ_p z_{t−p} + a_t,  t = p+1, ..., T.

1. Least squares estimates (generalized LSE)

A VAR(p) model can be written as z'_t = x'_t β + a'_t, where x_t = (1, z'_{t−1}, ..., z'_{t−p})' is a (kp+1)-dimensional vector and β' = [φ_0, φ_1, ..., φ_p] is a k × (kp+1) matrix. Putting in matrix form, the data for estimation are

Z = Xβ + A,  (2)

where Z is a (T−p) × k matrix with ith row being z'_{p+i}, X is the (T−p) × (kp+1) design matrix with ith row being x'_{p+i}, and A is a (T−p) × k matrix with ith row being a'_{p+i}. The matrix representation in Eq. (2) is particularly convenient for a VAR(p) model. For example, column j of β contains the parameters associated with z_{jt}. From Eq. (2), we obtain

vec(Z) = (I_k ⊗ X)vec(β) + vec(A).  (3)

Note that the covariance matrix of vec(A) is Σ_a ⊗ I_{T−p}. The GLS estimate of β is obtained by minimizing

S(β) = [vec(A)]'(Σ_a^{−1} ⊗ I_{T−p})vec(A) = [vec(Z − Xβ)]'(Σ_a^{−1} ⊗ I_{T−p})vec(Z − Xβ)  (4)
     = tr[(Z − Xβ)Σ_a^{−1}(Z − Xβ)'].  (5)

The last equality holds because Σ_a is a symmetric matrix and we use tr(DBC) = vec(C')'(B' ⊗ I)vec(D). From Eq. (4), we have

S(β) = [vec(Z) − (I_k ⊗ X)vec(β)]'(Σ_a^{−1} ⊗ I_{T−p})[vec(Z) − (I_k ⊗ X)vec(β)]
     = vec(Z)'(Σ_a^{−1} ⊗ I_{T−p})vec(Z) − 2 vec(β)'(Σ_a^{−1} ⊗ X')vec(Z) + vec(β)'(Σ_a^{−1} ⊗ X'X)vec(β).  (6)

Taking partial derivatives of S(β) with respect to vec(β), we obtain

∂S(β)/∂vec(β) = −2(Σ_a^{−1} ⊗ X')vec(Z) + 2(Σ_a^{−1} ⊗ X'X)vec(β).  (7)

Equating to zero gives the normal equations

(Σ_a^{−1} ⊗ X'X)vec(β̂) = (Σ_a^{−1} ⊗ X')vec(Z).

Consequently, the GLS estimate of a VAR(p) model is

vec(β̂) = (Σ_a^{−1} ⊗ X'X)^{−1}(Σ_a^{−1} ⊗ X')vec(Z) = [Σ_a ⊗ (X'X)^{−1}](Σ_a^{−1} ⊗ X')vec(Z)
        = [I_k ⊗ (X'X)^{−1}X']vec(Z)  (8)
        = vec[(X'X)^{−1}(X'Z)],
where the lst equlity holds becuse vec(db) (I D)vec(B). In other words, we obtin T β (X X) (X Z) which interestingly does not depend on Σ. x t x t T x t z t, (9) Remrk. The result in Eq. (9) shows tht one cn obtin the GLS estimte of VAR(p) model eqution-by-eqution. Tht is, one cn consider the k multiple liner regressions of z it on x t seprtely, where i,..., k. This estimtion method is convenient when one considers prmeter constrints in VAR(p) model. Ordinry Lest Squres (OLS) Estimte Reders my notice tht the GLS estimte of VAR(p) model in Eq. (9) is identicl to tht of the ordinry lest squres estimte of the multivrite multiple liner regression in Eq. (). Replcing Σ in Eq. (5) by I k, we hve the objective function of the ordinry lest squres estimtion S o (β) tr[(z Xβ)(Z Xβ) ]. (0) The derivtions discussed bove continue to hold step-by-step with Σ replced by I k. One thus obtins the sme estimte given in Eq. (9) for β. The fct tht the GLS estimte is the sme s the OLS estimte for VAR(p) model ws first shown in Zellner (96). In wht follows, we refer to the estimte in Eq. (9) simply s the lest squres (LS) estimte. The residul of the LS estimte is â t z t ˆφ p 0 ˆφ i z t i, i t p +,, T nd let  be the residul mtrix, i.e.  Z X β [I T p X(X X) X ]Y. The LS estimte of the innovtionl covrince mtrix Σ is Σ T (k + )p T â t â t T (k + )p  Â, where the denomintor is determined by [T p (kp + )], which is the effective smple size less the number of prmeters in the eqution for ech component z it. By Eq. (), we see tht β β (X X) X A. () Since E(A) 0, we see tht the LS estimte is n unbised estimtor. The LS estimte of VAR(p) model hs the following properties. Theorem.. For the sttionry VAR(p) model, ssume tht t re independent nd identiclly distributed with men zero nd positive definite covrince mtrix Σ. Then, (i) 3
E(β̂) = β, where β is defined in Eq. (2), (ii) E(Σ̂_a) = Σ_a, (iii) the residual matrix Â and the LS estimate β̂ are uncorrelated, and (iv) the covariance of the parameter estimates is

Cov[vec(β̂)] = Σ_a ⊗ (X'X)^{−1} = Σ_a ⊗ [Σ_{t=p+1}^T x_t x'_t]^{−1}.

2. Maximum likelihood estimates under normality

Assume further that a_t of the VAR(p) model follows a multivariate normal distribution. Let z_{h:q} denote the observations from t = h to t = q (inclusive). The conditional likelihood function of the data can be written as

L(z_{(p+1):T} | z_{1:p}, β, Σ_a) = Π_{t=p+1}^T p(z_t | z_{1:(t−1)}, β, Σ_a)
 = Π_{t=p+1}^T p(a_t | β, Σ_a)
 = Π_{t=p+1}^T (2π)^{−k/2} |Σ_a|^{−1/2} exp[−(1/2) a'_t Σ_a^{−1} a_t]
 ∝ |Σ_a|^{−(T−p)/2} exp[−(1/2) Σ_{t=p+1}^T tr(a'_t Σ_a^{−1} a_t)].

The log-likelihood function then becomes

ℓ(β, Σ_a) = c − ((T−p)/2) log(|Σ_a|) − (1/2) Σ_{t=p+1}^T tr(a'_t Σ_a^{−1} a_t)
          = c − ((T−p)/2) log(|Σ_a|) − (1/2) tr[Σ_a^{−1} Σ_{t=p+1}^T a_t a'_t],

where c is a constant, and we use the properties that tr(CD) = tr(DC) and tr(C + D) = tr(C) + tr(D). Noting that Σ_{t=p+1}^T a_t a'_t = A'A, where A = Z − Xβ is the error matrix in Eq. (2), we can rewrite the log-likelihood function as

ℓ(β, Σ_a) = c − ((T−p)/2) log(|Σ_a|) − (1/2) S(β),  (12)

where S(β) is given in Eq. (5).

Since the parameter matrix β only appears in the last term of ℓ(β, Σ_a), maximizing the log-likelihood function over β is equivalent to minimizing S(β). Consequently, the maximum likelihood (ML) estimate of β is the same as its LS estimate. Next, taking the partial derivative of the log-likelihood function with respect to Σ_a, we obtain

∂ℓ(β̂, Σ_a)/∂Σ_a = −((T−p)/2) Σ_a^{−1} + (1/2) Σ_a^{−1} Â'Â Σ_a^{−1}.  (13)
Equating the prior normal equation to zero, we obtain the maximum likelihood estimate of Σ_a as

Σ̃_a = Â'Â/(T−p) = (1/(T−p)) Σ_{t=p+1}^T â_t â'_t.  (14)

This result is the same as that for the multiple linear regression. The ML estimate of Σ_a is only asymptotically unbiased. Finally, the Hessian matrix of β can be obtained by taking the partial derivative of Eq. (7), namely

−∂²ℓ(β, Σ_a)/∂vec(β)∂vec(β)' = (1/2) ∂²S(β)/∂vec(β)∂vec(β)' = Σ_a^{−1} ⊗ X'X.

The inverse of the Hessian matrix provides the asymptotic covariance matrix of the ML estimate of vec(β). Next, taking the derivative of Eq. (13), we obtain

∂²ℓ(β̂, Σ_a)/∂vec(Σ_a)∂vec(Σ_a)' = ((T−p)/2)(Σ_a^{−1} ⊗ Σ_a^{−1}) − (1/2)[(Σ_a^{−1}Â'ÂΣ_a^{−1}) ⊗ Σ_a^{−1} + Σ_a^{−1} ⊗ (Σ_a^{−1}Â'ÂΣ_a^{−1})].

Consequently, we have

E[∂²ℓ(β̂, Σ_a)/∂vec(Σ_a)∂vec(Σ_a)'] = −((T−p)/2)(Σ_a^{−1} ⊗ Σ_a^{−1}).

This result provides the asymptotic covariance matrix for the ML estimates of the elements of Σ_a.

Theorem 2.2. Suppose that the innovation a_t of the stationary VAR(p) model follows a multivariate normal distribution with mean zero and positive-definite covariance matrix Σ_a. Then, the maximum likelihood estimates are β̂ = (X'X)^{−1}X'Z and Σ̃_a = (1/(T−p)) Σ_{t=p+1}^T â_t â'_t. Also, (i) (T−p)Σ̃_a is distributed as W_{k,T−(k+1)p−1}(Σ_a), a Wishart distribution, (ii) vec(β̂) is normally distributed with mean vec(β) and covariance matrix Σ_a ⊗ (X'X)^{−1}, and (iii) vec(β̂) is independent of Σ̃_a, where Z and X are defined in Eq. (2). Furthermore, √(T−p)[vec(β̂) − vec(β)] and √(T−p)[vec(Σ̃_a) − vec(Σ_a)] are asymptotically normally distributed with mean zero and covariance matrices Σ_a ⊗ G^{−1} and 2Σ_a ⊗ Σ_a, respectively, where G = E(x_t x'_t) with x_t defined in Eq. (2). Finally, given the data set {z_1, ..., z_T}, the maximized likelihood of a VAR(p) model is

L(β̂, Σ̃_a | z_{1:p}) = (2π)^{−k(T−p)/2} |Σ̃_a|^{−(T−p)/2} exp[−k(T−p)/2].  (15)

This value is useful in likelihood ratio tests to be discussed later.

3. Properties of the estimates

Assume that the fourth moments of a_t are finite. Specifically, a_t = (a_{1t}, ..., a_{kt})' is continuous and satisfies

E|a_{it} a_{jt} a_{ut} a_{vt}| < ∞,  for i, j, u, v = 1, ..., k and all t.  (16)
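Before turning to the asymptotic results, the finite-sample behavior of the LS estimate in Eq. (9) is easy to check by simulation. The NumPy sketch below (with made-up VAR(1) coefficients and Σ_a = I_2, not an example from the course) builds Z and X as in Eq. (2) and recovers β and Σ_a:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stationary bivariate VAR(1) with illustrative coefficients
k, p, T = 2, 1, 4000
phi0 = np.array([0.3, -0.2])
phi1 = np.array([[0.6, 0.2],
                 [-0.1, 0.4]])
a = rng.standard_normal((T + 200, k))   # innovations with Sigma_a = I_2
z = np.zeros((T + 200, k))
for t in range(1, T + 200):
    z[t] = phi0 + phi1 @ z[t - 1] + a[t]
z = z[200:]                              # drop burn-in

# Data matrices of Eq. (2): ith row of X is x'_{p+i} = (1, z'_{p+i-1})
Z = z[p:]                                       # (T-p) x k
X = np.column_stack([np.ones(T - p), z[:-p]])   # (T-p) x (kp+1)

# LS estimate of Eq. (9): beta_hat = (X'X)^{-1} X'Z, computable equation-by-equation
beta_hat, *_ = np.linalg.lstsq(X, Z, rcond=None)

# Unbiased residual covariance estimate with denominator T - p - (kp+1)
A_hat = Z - X @ beta_hat
Sigma_hat = A_hat.T @ A_hat / (T - p - (k * p + 1))
```

Note the orientation: β̂ stacks [φ'_0; φ'_1], so the intercept is read off the first row of β̂ and φ_1 is the transpose of the remaining lag block.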
Lemma 2.3. If the VAR(p) process z_t of Eq. (1) is stationary and satisfies the condition in Eq. (16), then, as T → ∞, we have (i) X'X/(T−p) →_p G, and (ii)

(1/√(T−p)) vec(X'A) = (1/√(T−p)) (I_k ⊗ X')vec(A) →_d N(0, Σ_a ⊗ G),

where →_p and →_d denote convergence in probability and in distribution, respectively, X and A are defined in Eq. (2), and G is the nonsingular matrix given by

G = [ 0  0'  ]   [ 1 ]
    [ 0  Γ*_0 ] + [ u ] [1, u'],

where 0 is the kp-dimensional vector of zeros, Γ*_0 is defined as

Γ*_0 = [ Γ_0       Γ_1       ···  Γ_{p−1} ]
       [ Γ'_1      Γ_0       ···  Γ_{p−2} ]
       [  ⋮         ⋮        ⋱     ⋮      ]
       [ Γ'_{p−1}  Γ'_{p−2}  ···  Γ_0     ]   (kp × kp),  (17)

and u = 1_p ⊗ μ with 1_p being the p-dimensional vector of ones.

Theorem 2.3. Suppose that the VAR(p) time series z_t in Eq. (1) is stationary and its innovation a_t satisfies the assumption in Eq. (16). Then, as T → ∞, (i) β̂ →_p β, and (ii)

√(T−p)[vec(β̂) − vec(β)] = √(T−p) vec(β̂ − β) →_d N(0, Σ_a ⊗ G^{−1}),

where G is defined in Lemma 2.3.

Proof. By Eq. (11), we have

β̂ − β = [X'X/(T−p)]^{−1} [X'A/(T−p)] →_p 0,

because the last term approaches 0. This establishes the consistency of β̂. For result (ii), we can use Eq. (8) to obtain

√(T−p)[vec(β̂) − vec(β)] = √(T−p)[I_k ⊗ (X'X)^{−1}X']vec(A)
 = √(T−p)[I_k ⊗ (X'X)^{−1}][I_k ⊗ X']vec(A)
 = [I_k ⊗ (X'X/(T−p))^{−1}] (1/√(T−p)) [I_k ⊗ X']vec(A).

Therefore, the limiting distribution of √(T−p)[vec(β̂) − vec(β)] is the same as that of

(I_k ⊗ G^{−1}) (1/√(T−p)) (I_k ⊗ X')vec(A).
Hence, by Lemma 2.3, the limiting distribution of √(T−p)[vec(β̂) − vec(β)] is normal and the covariance matrix is

(I_k ⊗ G^{−1})(Σ_a ⊗ G)(I_k ⊗ G^{−1}) = Σ_a ⊗ G^{−1}.

The proof is complete.

4. Bayesian estimation, including the Minnesota prior of Litterman (1986)

6 Order selection

1. Sequential likelihood ratio tests
2. Information criteria: AIC, BIC, and HQ

7 Model checking

1. Residual cross-correlation matrices and their properties

Let Â = Z − Xβ̂ be the residual matrix of a fitted VAR(p) model, using the notation in Eq. (2). The ith row of Â is

â_{p+i} = z_{p+i} − φ̂_0 − Σ_{j=1}^p φ̂_j z_{p+i−j}.

The lag-l cross-covariance matrix of the residual series is defined as

Ĉ_l = (1/(T−p)) Σ_{t=p+l+1}^T â_t â'_{t−l}.

Note that Ĉ_0 = Σ̃_a is the residual covariance matrix. In matrix notation, we can rewrite the lag-l residual cross-covariance matrix Ĉ_l as

Ĉ_l = (1/(T−p)) Â' B^l Â,  l ≥ 0,  (18)

where B is the (T−p) × (T−p) back-shift matrix defined as

B = [ 0'_{T−p−1}  0         ]
    [ I_{T−p−1}   0_{T−p−1} ],

where 0_h is the h-dimensional vector of zeros. The lag-l residual cross-correlation matrix is defined as

R̂_l = D̂^{−1} Ĉ_l D̂^{−1},  (19)

where D̂ is the diagonal matrix of the standard errors of the residual series, that is, D̂ = [diag(Ĉ_0)]^{1/2}. In particular, R̂_0 is the residual correlation matrix. For model checking, we consider the asymptotic joint distribution of the residual cross-covariance matrices Ξ̂_m = [Ĉ_1, ..., Ĉ_m]. Using the notation in Eq. (18), we have

Ξ̂_m = (1/(T−p)) Â'[BÂ, B²Â, ..., B^m Â] = (1/(T−p)) Â' B_m (I_m ⊗ Â),  (20)
where B_m = [B, B², ..., B^m] is a (T−p) × m(T−p) matrix. Since Â = A − X(β̂ − β) and letting T_p = T − p, we have

T_p Ξ̂_m = A'B_m(I_m ⊗ A) − A'B_m[I_m ⊗ X(β̂ − β)]  (21)
  − (β̂ − β)'X'B_m(I_m ⊗ A) + (β̂ − β)'X'B_m[I_m ⊗ X(β̂ − β)].

Lemma 2.4. Suppose that z_t follows the stationary VAR(p) model of Eq. (1) with a_t being a white noise process with mean zero and positive-definite covariance matrix Σ_a. Also, assume that the assumption in Eq. (16) holds, that the parameter matrix β of the model in Eq. (2) is consistently estimated via the methods discussed before, and that the residual cross-covariance matrix is defined in Eq. (18). Then, √T_p vec(Ξ̂_m) has the same limiting distribution as

√T_p vec(Ξ_m) − √T_p H̃' vec[(β̂ − β)'],

where T_p = T − p, Ξ_m is the theoretical counterpart of Ξ̂_m obtained by dividing the first term of Eq. (21) by T_p, and H̃ = H ⊗ I_k with

H = [ 0'    0'        ···  0'            ]
    [ Σ_a   ψ'_1 Σ_a  ···  ψ'_{m−1} Σ_a  ]
    [ 0_k   Σ_a       ···  ψ'_{m−2} Σ_a  ]
    [  ⋮     ⋮        ⋱     ⋮            ]
    [ 0_k   0_k       ···  ψ'_{m−p} Σ_a  ]   ((kp+1) × km),

where 0 is the k-dimensional vector of zeros, 0_k is the k × k matrix of zeros, and the ψ_i are the coefficient matrices of the MA representation of the VAR(p) model.

Lemma 2.5. Assume that z_t is a stationary VAR(p) series satisfying the conditions of Lemma 2.4. Then

[ (1/√T_p) vec(X'A) ]                [ G    H          ]
[ √T_p vec(Ξ_m)     ]  →_d  N( 0, Σ_a ⊗ [ H'   I_m ⊗ Σ_a ] ),

where G is defined in Lemma 2.3 and H is defined in Lemma 2.4.

Theorem 2.4. Suppose that z_t follows the stationary VAR(p) model of Eq. (1) with a_t being a white noise process with mean zero and positive-definite covariance matrix Σ_a. Also, assume that the assumption in Eq. (16) holds, that the parameter matrix β of the model in Eq. (2) is consistently estimated by a method discussed before, and that the residual cross-covariance matrix is defined in Eq. (18). Then,

√T_p vec(Ξ̂_m) →_d N(0, Σ_{c,m}),

where

Σ_{c,m} = (I_m ⊗ Σ_a − H'G^{−1}H) ⊗ Σ_a = I_m ⊗ Σ_a ⊗ Σ_a − H*'[(Γ*_0)^{−1} ⊗ Σ_a]H*,
where H and G are as defined in Lemma 2.5, Γ*_0 is the expanded covariance matrix defined in Eq. (17), and H* = H̆ ⊗ I_k with H̆ being the submatrix of H with the first row of zeros removed. Let D be the diagonal matrix of the standard errors of the components of a_t, that is, D = diag{σ_{11,a}^{1/2}, ..., σ_{kk,a}^{1/2}}, where Σ_a = [σ_{ij,a}].

Theorem 2.5. Assume that the conditions of Theorem 2.4 hold. Then,

√T_p vec(ξ̂_m) →_d N(0, Σ_{r,m}),

where ξ̂_m = [R̂_1, ..., R̂_m] is the residual cross-correlation counterpart of Ξ̂_m and

Σ_{r,m} = [(I_m ⊗ R_0) − H'_0 G^{−1} H_0] ⊗ R_0,

where R_0 is the lag-0 cross-correlation matrix of a_t, H_0 = H(I_m ⊗ D^{−1}), and G, as before, is defined in Lemma 2.3.

2. Multivariate portmanteau statistics

Let R_l be the theoretical lag-l cross-correlation matrix of the innovations a_t. The hypothesis of interest in model checking is

H_0: R_1 = ··· = R_m = 0 versus H_a: R_j ≠ 0 for some 1 ≤ j ≤ m,  (22)

where m is a pre-specified positive integer. The portmanteau statistic is often used to perform the test. For the residual series, the statistic becomes

Q_k(m) = T_p² Σ_{l=1}^m (1/(T_p−l)) tr(R̂'_l R̂_0^{−1} R̂_l R̂_0^{−1})
 = T_p² Σ_{l=1}^m (1/(T_p−l)) tr(D̂R̂'_l D̂ · D̂^{−1}R̂_0^{−1}D̂^{−1} · D̂R̂_l D̂ · D̂^{−1}R̂_0^{−1}D̂^{−1})
 = T_p² Σ_{l=1}^m (1/(T_p−l)) tr(Ĉ'_l Ĉ_0^{−1} Ĉ_l Ĉ_0^{−1}).  (23)

Theorem 2.6. Suppose that z_t follows the stationary VAR(p) model of Eq. (1) with a_t being a white noise process with mean zero and positive-definite covariance matrix Σ_a. Also, assume that the assumption in Eq. (16) holds, that the parameter matrix β of the model in Eq. (2) is consistently estimated by a method discussed earlier, and that the residual cross-covariance matrix is defined in Eq. (18). Then, the test statistic Q_k(m) is asymptotically distributed as a chi-square distribution with (m−p)k² degrees of freedom.

3. Model simplification and refinements

Testing Zero Parameters

Let ω be a v-dimensional vector consisting of the target parameters. In other words, v is the
number of parameters to be fixed to zero. Let ω̂ be the counterpart of ω in the parameter matrix β̂ of Eq. (2). The hypothesis of interest is

H_0: ω = 0 versus H_a: ω ≠ 0.

Clearly, there exists a v × k(kp+1) locating matrix K such that

K vec(β) = ω, and K vec(β̂) = ω̂.  (24)

By Theorem 2.3 and properties of the multivariate normal distribution, we have

√T_p (ω̂ − ω) →_d N[0, K(Σ_a ⊗ G^{−1})K'],  (25)

where T_p = T − p is the effective sample size. Consequently, under H_0, we have

T_p ω̂'[K(Σ_a ⊗ G^{−1})K']^{−1} ω̂ →_d χ²_v,  (26)

where v = dim(ω).

8 Linear constraints and Granger causality test

The prior test can be generalized to linear constraints, namely

vec(β) = Jγ + r,  (27)

where J is a k(kp+1) × P constant matrix of rank P, r is a k(kp+1)-dimensional constant vector, and γ denotes the P-dimensional vector of unknown parameters.

9 Forecasting

1. Mean square errors

We can use the VAR(1) model to demonstrate forecasting and the mean-reverting behavior of stationary VAR processes.

2. Forecasts with parameter uncertainty

With estimated parameters, we have the l-step ahead forecast error

ê_h(l) = z_{h+l} − ẑ_h(l) = [z_{h+l} − z_h(l)] + [z_h(l) − ẑ_h(l)] = e_h(l) + [z_h(l) − ẑ_h(l)].  (28)

The two terms of the forecast error ê_h(l) are therefore uncorrelated and we have

Cov[ê_h(l)] = Cov[e_h(l)] + E{[z_h(l) − ẑ_h(l)][z_h(l) − ẑ_h(l)]'} ≡ Cov[e_h(l)] + MSE[z_h(l) − ẑ_h(l)],  (29)

where the notation ≡ is used to denote equivalence. Letting T_p = T − p denote the effective sample size in estimation, we assume that the parameter estimates satisfy

√T_p vec(β̂ − β) →_d N(0, Σ_β).
Since z_h(l) is a differentiable function of vec(β), one can show that, conditional on F_h,

√T_p [ẑ_h(l) − z_h(l)] →_d N(0, [∂z_h(l)/∂vec(β)'] Σ_β [∂z_h(l)'/∂vec(β)]).

This result suggests that we can approximate the mean square error (MSE) in Eq. (29) by

Ω_l = E{[∂z_h(l)/∂vec(β)'] Σ_β [∂z_h(l)'/∂vec(β)]}.

If we further assume that a_t is multivariate normal, then we have

√T_p [ẑ_h(l) − z_h(l)] →_d N(0, Ω_l).

Consequently, we have

Cov[ê_h(l)] ≈ Cov[e_h(l)] + (1/T_p) Ω_l.  (30)

It remains to derive the quantity Ω_l. To this end, we need to obtain the derivatives ∂z_h(l)/∂vec(β). Using the VAR(p) model, we have

z_h(l) = J P^l x_h,  l ≥ 1,  (31)

where

P = [ 1  0'_{kp} ]
    [ ν  Φ       ]   ((kp+1) × (kp+1)),   J = [0_k, I_k, 0_{k×k(p−1)}]   (k × (kp+1)),   ν = [φ'_0, 0'_{k(p−1)}]',

where Φ is the companion matrix of φ(B) defined before, 0_m is an m-dimensional vector of zeros, and 0_{m×n} is an m × n matrix of zeros. This is a generalized version of the l-step ahead forecast of a VAR(1) model that includes the constant vector φ_0 in the recursion, and it can be shown by mathematical induction. Using Eq. (31), we have

∂z_h(l)/∂vec(β)' = ∂vec(J P^l x_h)/∂vec(β)'
 = (x'_h ⊗ J) ∂vec(P^l)/∂vec(β)'
 = (x'_h ⊗ J) [Σ_{i=0}^{l−1} (P')^{l−1−i} ⊗ P^i] (I_{kp+1} ⊗ J')
 = Σ_{i=0}^{l−1} x'_h(P')^{l−1−i} ⊗ J P^i J'
 = Σ_{i=0}^{l−1} x'_h(P')^{l−1−i} ⊗ ψ_i,
where we have used the fact that J P^i J' = ψ_i. Using the least squares estimate β̂, we have, via Eq. (??), Σ_β = G^{−1} ⊗ Σ_a. Therefore,

Ω_l = E{[∂z_h(l)/∂vec(β)'](G^{−1} ⊗ Σ_a)[∂z_h(l)'/∂vec(β)]}
 = Σ_{i=0}^{l−1} Σ_{j=0}^{l−1} E[x'_h(P')^{l−1−i} G^{−1} P^{l−1−j} x_h] ψ_i Σ_a ψ'_j
 = Σ_{i=0}^{l−1} Σ_{j=0}^{l−1} E{tr[x'_h(P')^{l−1−i} G^{−1} P^{l−1−j} x_h]} ψ_i Σ_a ψ'_j
 = Σ_{i=0}^{l−1} Σ_{j=0}^{l−1} tr[(P')^{l−1−i} G^{−1} P^{l−1−j} E(x_h x'_h)] ψ_i Σ_a ψ'_j
 = Σ_{i=0}^{l−1} Σ_{j=0}^{l−1} tr[(P')^{l−1−i} G^{−1} P^{l−1−j} G] ψ_i Σ_a ψ'_j.  (32)

In particular, if l = 1, then

Ω_1 = tr(I_{kp+1}) Σ_a = (kp+1) Σ_a,

and

Cov[ê_h(1)] ≈ Σ_a + ((kp+1)/T_p) Σ_a = ((T_p + kp + 1)/T_p) Σ_a.

What is the main implication of the result?

10 Impulse response function

It is also known as multiplier analysis.

1. Definition, cumulative impulse response function

Consider the MA representation of a vector time series (assuming zero mean),

z_t = a_t + ψ_1 a_{t−1} + ψ_2 a_{t−2} + ···.

Suppose we would like to study the impact of a_0 = (1, 0, ..., 0)' on the system. That is, we would like to know what would happen to the system when the shock a_0 = (1, 0, ..., 0)' occurs and all the other shocks are zero. From the model, it is easily seen that

z_0 = a_0,  z_1 = ψ_1 a_0 = ψ_1[·, 1],  z_2 = ψ_2 a_0 = ψ_2[·, 1],  ...,

where ψ_i[·, 1] denotes the first column of ψ_i. The cumulative impact of a_0 on the system up to time t = n is then

ψ̄_n[·, 1] = Σ_{i=0}^n ψ_i[·, 1],  with ψ_0 = I_k.
The same argument applies to a shock occurring at the ith component of a_0. Consequently, the cumulative impact on the system up to time t = n can be written as

ψ̄_n = Σ_{i=0}^n ψ_i,

and the total multiplier or long-run effect is

ψ̄_∞ = Σ_{i=0}^∞ ψ_i.

2. Orthogonal innovations

The statement of the prior subsection implicitly assumes there is no correlation among the components of a_0. In real applications, the components of a_t are typically correlated, so that one cannot change one component without affecting the others. A more formal definition of the impact of a_{1t} on the future observation z_{t+j} for j > 0 is

∂z_{t+j}/∂a_{1t} = ψ_j ∂a_t/∂a_{1t} = ψ_j Σ_a[·, 1]/σ_{11,a},

under the linear model framework. In the above, I use the simple linear regression relation a_{it} = (σ_{1i,a}/σ_{11,a}) a_{1t} + ε_{it}. To simplify the concept, one can perform an orthogonalization of a_t, such as the Cholesky decomposition of Σ_a, namely Σ_a = U'U, where U is an upper triangular matrix with positive diagonal elements. Let η_t = (U')^{−1} a_t. Then, Cov(η_t) = I_k and we have

z_t = ψ(B) a_t = ψ(B) U'(U')^{−1} a_t = [ψ(B) U'] η_t.

We can then apply the concept of the prior section using the modified ψ-weight matrices ψ_j U'. A weakness of this approach to impulse response functions is that the Cholesky decomposition depends on the ordering of the elements of a_t. Thus, care must be exercised in multiplier analysis.

11 Forecast error covariance decomposition

12 R demonstration and examples

13 MTS commands used

1. VARorder: order selection (including VARorderI)
2. VAR: estimation
3. refVAR: refined VAR estimation
4. VARpsi: compute ψ-weight matrices
5. VARpred: prediction
6. VARirf: impulse response function
7. VARchi: testing zero parameter constraints
8. BVAR: Bayesian VAR estimation
9. MTSdiag: model checking
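For readers curious about what VARpsi and VARirf compute, the ψ-weight recursion ψ_0 = I, ψ_i = Σ_{j=1}^{min(i,p)} φ_j ψ_{i−j}, together with the Cholesky-orthogonalized responses of Section 10, can be sketched in a few lines of NumPy (function and variable names below are mine, not MTS's; the coefficients are illustrative):

```python
import numpy as np

def var_psi(phis, nlags):
    """psi-weights of a VAR(p): psi_0 = I, psi_i = sum_{j=1}^{min(i,p)} phi_j psi_{i-j}."""
    k = phis[0].shape[0]
    psi = [np.eye(k)]
    for i in range(1, nlags + 1):
        psi.append(sum(phis[j - 1] @ psi[i - j]
                       for j in range(1, min(i, len(phis)) + 1)))
    return psi

# Illustrative VAR(1) coefficient and innovation covariance
phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.3]])
Sigma_a = np.array([[1.0, 0.3],
                    [0.3, 1.0]])

psi = var_psi([phi1], nlags=10)

# Orthogonalization: Sigma_a = U'U with U upper triangular; np.linalg.cholesky
# returns the lower-triangular factor L with Sigma_a = L L', so U' = L.
L = np.linalg.cholesky(Sigma_a)
orth_irf = [psi_i @ L for psi_i in psi]   # responses to orthogonal shocks eta_t = (U')^{-1} a_t
cum_irf = np.cumsum(orth_irf, axis=0)     # cumulative multipliers psi-bar_n U'
```

For a VAR(1), ψ_i = φ_1^i, which gives a quick sanity check on the recursion; note also that the lag-0 orthogonalized response satisfies (ψ_0 U')(ψ_0 U')' = Σ_a, reflecting the ordering dependence discussed above.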