Time Series (Part II)


Time Series (Part II)
Luca Gambetti
UAB, IDEA, Winter 2011

Contacts
Prof.: Luca Gambetti
Office: B3-1130, Edifici B
Office hours:
Email: luca.gambetti@uab.es
Webpage: http://pareto.uab.es/lgambetti/

Goal of the course
The main objective of the course is to provide students with a comprehensive set of tools for empirical research with time series data.
Description
This is the second part of an introductory 40-hour course in Time Series Analysis with applications in macroeconomics. This part focuses on the theory of multivariate time series models.

Contents
1. VAR: preliminaries, representation, stationarity, second moments, specification (1-2).
2. VAR: estimation and hypothesis testing, Granger causality (1-2).
3. VAR: forecasting and impulse response functions (1).
4. Structural VAR: theory (1).
5. Structural VAR: applications (1).
6. Nonstationary data - Cointegration (1).
7. Structural Factor models and FAVARs (1).
8. Bayesian VARs (1).

References
1. P. J. Brockwell and R. A. Davis (2009), Time Series: Theory and Methods, Springer-Verlag: Berlin.
2. F. Canova (2007), Methods for Applied Macroeconomic Research, Princeton University Press: Princeton.
3. J. D. Hamilton (1994), Time Series Analysis, Princeton University Press: Princeton.
4. H. Lütkepohl (2005), New Introduction to Multiple Time Series Analysis, Springer-Verlag: Berlin.

Grades
40% problem sets, 60% final take-home exam.
Econometric Software
GRETL, MATLAB.

1. STATIONARY VECTOR PROCESSES
(This part is partly based on the Hamilton textbook and Marco Lippi's notes.)

1 Some Preliminary Definitions and Results
Random Vector: A vector X = (X_1, ..., X_n)' whose components are scalar-valued random variables on a probability space.
Vector Random Process: A family of random vectors {X_t, t ∈ T} defined on a probability space, where T is a set of time points. Typically T = R, T = Z or T = N, the sets of real, integer and natural numbers, respectively.
Time Series Vector: A particular realization of a vector random process.

1.1 The Lag operator
The lag operator L maps a sequence {X_t} into a sequence {Y_t} such that Y_t = LX_t = X_{t-1}, for all t. If we apply L repeatedly, for instance L(L(LX_t)), we will use the convention L(L(LX_t)) = L^3 X_t = X_{t-3}. If we apply L to a constant c, Lc = c.
Inversion: L^{-1} is the inverse of L, such that L^{-1}(L)X_t = X_t.

1.2 Polynomials in the lag operator
We can form polynomials: α(L) = 1 + α_1 L + α_2 L^2 + ... + α_p L^p is a polynomial in the lag operator of order p and is such that α(L)X_t = X_t + α_1 X_{t-1} + ... + α_p X_{t-p}.
Lag polynomials can also be inverted. For a polynomial φ(L), we are looking for the values of the coefficients α_i of φ(L)^{-1} = α_0 + α_1 L + α_2 L^2 + ... such that φ(L)^{-1} φ(L) = 1.
Case 1: p = 1. Let φ(L) = (1 - φL) with |φ| < 1. To find the inverse write
(1 - φL)(α_0 + α_1 L + α_2 L^2 + ...) = 1
and note that all the coefficients of the non-zero powers of L must be equal to zero. This gives
α_0 = 1
-φ + α_1 = 0  ⟹  α_1 = φ
-φα_1 + α_2 = 0  ⟹  α_2 = φ^2
-φα_2 + α_3 = 0  ⟹  α_3 = φ^3

and so on. In general α_k = φ^k, so (1 - φL)^{-1} = Σ_{j=0}^∞ φ^j L^j, provided that |φ| < 1. It is easy to check this because
(1 - φL)(1 + φL + φ^2 L^2 + ... + φ^k L^k) = 1 - φ^{k+1} L^{k+1}
so
(1 + φL + φ^2 L^2 + ... + φ^k L^k) = (1 - φ^{k+1} L^{k+1}) / (1 - φL)
and, as k → ∞, Σ_{j=0}^k φ^j L^j → 1/(1 - φL).
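A quick numerical check of this expansion (a minimal MATLAB sketch; the value of φ and the truncation order K are arbitrary illustrative choices):

% Truncated inverse of (1 - phi*L): the coefficients should be phi.^j
phi = 0.6;                           % any |phi| < 1
K   = 10;                            % truncation order (illustrative)
inv_coeffs = phi.^(0:K);             % 1, phi, phi^2, ..., phi^K
check = conv([1 -phi], inv_coeffs);  % multiply (1 - phi*L) by the truncated inverse
disp(check)                          % ~ [1 0 ... 0 -phi^(K+1)], as in the identity above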

Case 2: p = 2. Let φ(L) = (1 - φ_1 L - φ_2 L^2). To find the inverse it is useful to factor the polynomial in the following way
(1 - φ_1 L - φ_2 L^2) = (1 - λ_1 L)(1 - λ_2 L)
where λ_1, λ_2 are the reciprocals of the roots of the left-hand-side polynomial or, equivalently, the eigenvalues of the matrix
[ φ_1  φ_2 ]
[  1    0  ]
Suppose |λ_1|, |λ_2| < 1 and λ_1 ≠ λ_2. We have that (1 - φ_1 L - φ_2 L^2)^{-1} = (1 - λ_1 L)^{-1}(1 - λ_2 L)^{-1}. Therefore we can use what we have seen above for the case p = 1. We can write
(1 - λ_1 L)^{-1}(1 - λ_2 L)^{-1} = (λ_1 - λ_2)^{-1} [ λ_1/(1 - λ_1 L) - λ_2/(1 - λ_2 L) ]
 = c_1 [1 + λ_1 L + λ_1^2 L^2 + ...] + c_2 [1 + λ_2 L + λ_2^2 L^2 + ...]
 = (c_1 + c_2) + (c_1 λ_1 + c_2 λ_2)L + (c_1 λ_1^2 + c_2 λ_2^2)L^2 + ...

where c_1 = λ_1/(λ_1 - λ_2) and c_2 = -λ_2/(λ_1 - λ_2).
Matrix of polynomials in the lag operator: A(L) is a matrix of polynomials in the lag operator if its elements are polynomials in the lag operator, e.g.
A(L) = [ 1     L   ]  =  [ 1  0 ]  +  [ 0  1 ] L
       [ 0   2 + L ]     [ 0  2 ]     [ 0  1 ]
We also define
A(0) = [ 1  0 ]    and    A(1) = [ 1  1 ]
       [ 0  2 ]                  [ 0  3 ]

1.3 Covariance Stationarity
Let Y_t be an n-dimensional random vector, Y_t = [Y_{1t}, ..., Y_{nt}]'. Then Y_t is covariance (weakly) stationary if E(Y_t) = μ and the autocovariance matrices E[(Y_t - μ)(Y_{t-j} - μ)'] = Γ_j, for all t and j, are independent of t and finite.
Stationarity of each of the components of Y_t does not imply stationarity of the vector Y_t. Stationarity in the vector case requires that the components of the vector are stationary and costationary.
Although γ_j = γ_{-j} for a scalar process, the same is not true for a vector process. The correct relation is Γ_j' = Γ_{-j}.

1.4 Convergence of random variables
We review three concepts of stochastic convergence in the univariate case and then extend them to the multivariate framework. Let {x_T, T = 1, 2, ...} be a sequence of random variables.
Convergence in probability. The sequence {x_T} converges in probability to the random variable x if for every ε > 0
lim_{T→∞} Pr(|x_T - x| > ε) = 0
When the above condition is satisfied, x is called the probability limit or plim of the sequence {x_T}, written plim x_T = x.
Convergence in mean square. The sequence of random variables {x_T} converges in mean square to x, denoted x_T →^{m.s.} x, if
lim_{T→∞} E(x_T - x)^2 = 0

Convergence in distribution. Let F_T denote the cumulative distribution function of x_T and F the cumulative distribution function of the scalar x. The sequence is said to converge in distribution, written x_T →^d x (or x_T →^L x), if
lim_{T→∞} F_T(c) = F(c)
for all real numbers c at which F is continuous. We will equivalently write x_T →^d N(μ, σ^2) when the limiting distribution is normal.
The concepts of stochastic convergence can be generalized to the multivariate setting. Suppose {X_T}_{T=1}^∞ is a sequence of n-dimensional random vectors and X is an n-dimensional random vector. Then
1. X_T →^p X if X_{kT} →^p X_k for k = 1, ..., n
2. X_T →^{m.s.} X if lim_{T→∞} E[(X_T - X)'(X_T - X)] = 0
3. X_T →^d X if lim_{T→∞} F_T(c) = F(c) at all continuity points c, where F_T and F are the joint distribution functions of X_T and X.

Proposition C.1L. Suppose {X_T}_{T=1}^∞ is a sequence of n-dimensional random vectors. Then the following relations hold:
(a) X_T →^{m.s.} X  ⟹  X_T →^p X  ⟹  X_T →^d X
(c) (Slutsky's Theorem) If g : R^n → R^m is a continuous function, then
X_T →^p X  ⟹  g(X_T) →^p g(X)   (i.e. plim g(X_T) = g(plim X_T))
X_T →^d X  ⟹  g(X_T) →^d g(X)
Example. Suppose that X_T →^d N(0, 1). Then X_T^2 converges in distribution to the square of a N(0, 1) random variable, i.e. X_T^2 →^d χ^2_1.

Proposition C.2L. Suppose {X_T}_{T=1}^∞ and {Y_T}_{T=1}^∞ are sequences of n × 1 random vectors, A_T is a sequence of n × n random matrices, X is an n × 1 random vector, c is a fixed n × 1 vector, and A is a fixed n × n matrix.
1. If plim X_T, plim Y_T and plim A_T exist, then
(a) plim(X_T + Y_T) = plim X_T + plim Y_T,  plim(X_T - Y_T) = plim X_T - plim Y_T
(b) plim c'X_T = c' plim X_T
(c) plim X_T'Y_T = (plim X_T)'(plim Y_T)
(d) plim A_T X_T = (plim A_T)(plim X_T)
2. If X_T →^d X and plim(X_T - Y_T) = 0, then Y_T →^d X.
3. If X_T →^d X and plim Y_T = c, then
(a) X_T + Y_T →^d X + c
(b) Y_T'X_T →^d c'X
4. If X_T →^d X and plim A_T = A, then A_T X_T →^d AX.
5. If X_T →^d X and plim A_T = 0, then plim A_T X_T = 0.

Example. Let {X_T} be a sequence of n × 1 random vectors with X_T →^d N(μ, Ω), and let {Y_T} be a sequence of n × 1 random vectors with Y_T →^p C. Then by 3.(b), Y_T'X_T →^d N(C'μ, C'ΩC).

1.5 Limit Theorems
The Law of Large Numbers and the Central Limit Theorem are the most important results for computing the limits of sequences of random variables. There are many versions of the LLN and CLT, which differ in the assumptions about the dependence of the variables.
Proposition C.12L (Weak law of large numbers)
1. (iid sequences) Let {Y_t} be an i.i.d. sequence of random variables with finite mean μ. Then
Ȳ_T = T^{-1} Σ_{t=1}^T Y_t →^p μ
2. (independent sequences) Let {Y_t} be a sequence of independent random variables with E(Y_t) = μ < ∞ and E|Y_t|^{1+ε} ≤ c < ∞ for some ε > 0 and a finite constant c. Then T^{-1} Σ_{t=1}^T Y_t →^p μ.
3. (uncorrelated sequences) Let {Y_t} be a sequence of uncorrelated random variables with E(Y_t) = μ < ∞ and Var(Y_t) ≤ c < ∞ for some finite constant c. Then T^{-1} Σ_{t=1}^T Y_t →^p μ.
4. (stationary processes) Let Y_t be a covariance stationary process with finite mean E(Y_t) = μ and autocovariances E[(Y_t - μ)(Y_{t-j} - μ)] = γ_j that are absolutely summable, Σ_{j=0}^∞ |γ_j| < ∞. Then Ȳ_T →^{m.s.} μ and hence Ȳ_T →^p μ.
Weak stationarity and absolutely summable autocovariances are sufficient conditions for a law of large numbers to hold.
Proposition C.13L (Central limit theorem)
1. (i.i.d. sequences) Let {Y_t} be a sequence of n-dimensional iid(μ, Σ) random vectors. Then √T(Ȳ_T - μ) →^d N(0, Σ).
2. (stationary processes) Let Y_t = μ + Σ_{j=0}^∞ Φ_j ε_{t-j} be an n-dimensional stationary random process with ε_t an i.i.d. vector, E(Y_t) = μ < ∞ and Σ_{j=0}^∞ |Φ_j| < ∞. Then √T(Ȳ_T - μ) →^d N(0, Σ_{j=-∞}^∞ Γ_j), where Γ_j is the autocovariance matrix at lag j.

2 Some stationary processes
2.1 White Noise (WN)
An n-dimensional vector white noise ɛ_t = [ɛ_{1t}, ..., ɛ_{nt}]' ∼ WN(0, Ω) is such that E(ɛ_t) = 0 and Γ_k = Ω (Ω a symmetric positive definite matrix) if k = 0 and Γ_k = 0 if k ≠ 0. If ɛ_t, ɛ_τ are independent for t ≠ τ the process is an independent vector white noise (i.i.d.). If also ɛ_t is normal the process is a Gaussian WN.
Important: A vector whose components are white noise is not necessarily a vector white noise. Example: let u_t be a scalar white noise and define ɛ_t = (u_t, u_{t-1})'. Then
E(ɛ_t ɛ_t') = [ σ_u^2    0    ]        E(ɛ_t ɛ_{t-1}') = [   0     0 ]
              [   0    σ_u^2  ]                           [ σ_u^2   0 ]
so the first autocovariance matrix is not zero.

2.2 Vector Moving Average (VMA)
Given the n-dimensional vector white noise ɛ_t, a vector moving average of order q is defined as
Y_t = μ + ɛ_t + C_1 ɛ_{t-1} + ... + C_q ɛ_{t-q}
where the C_j are n × n matrices of coefficients and μ is the mean of Y_t.
The VMA(1). Let us consider the VMA(1)
Y_t = μ + ɛ_t + C_1 ɛ_{t-1}
with ɛ_t ∼ WN(0, Ω). The variance of the process is given by
Γ_0 = E[(Y_t - μ)(Y_t - μ)'] = Ω + C_1 Ω C_1'
with autocovariances
Γ_1 = C_1 Ω,   Γ_{-1} = Ω C_1',   Γ_j = 0 for |j| > 1
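These formulas are easy to verify by simulation. A minimal MATLAB sketch (the matrices C_1 and Ω below are arbitrary illustrative choices):

% Compare the theoretical VMA(1) autocovariances with sample moments
rng(0);  T = 1e5;
C1    = [0.4 0.2; -0.1 0.5];
Omega = [1 0.3; 0.3 1];
eps   = chol(Omega, 'lower') * randn(2, T);       % eps_t ~ WN(0, Omega)
Y     = eps(:, 2:T) + C1 * eps(:, 1:T-1);         % zero-mean VMA(1)
Gamma0_theory = Omega + C1 * Omega * C1';
Gamma1_theory = C1 * Omega;
Gamma0_sample = (Y * Y') / (T - 1);
Gamma1_sample = (Y(:, 2:end) * Y(:, 1:end-1)') / (T - 2);
disp(Gamma0_theory - Gamma0_sample)               % both differences are close to zero
disp(Gamma1_theory - Gamma1_sample)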

The VMA(q). Let us consider the VMA(q)
Y_t = μ + ɛ_t + C_1 ɛ_{t-1} + ... + C_q ɛ_{t-q}
with ɛ_t ∼ WN(0, Ω) and μ the mean of Y_t. The variance of the process is given by
Γ_0 = E[(Y_t - μ)(Y_t - μ)'] = Ω + C_1 Ω C_1' + C_2 Ω C_2' + ... + C_q Ω C_q'
with autocovariances
Γ_j = C_j Ω + C_{j+1} Ω C_1' + C_{j+2} Ω C_2' + ... + C_q Ω C_{q-j}'   for j = 1, 2, ..., q
Γ_j = Ω C_{-j}' + C_1 Ω C_{1-j}' + C_2 Ω C_{2-j}' + ... + C_{q+j} Ω C_q'   for j = -1, -2, ..., -q
Γ_j = 0 for |j| > q

The VMA(∞). A useful process, as we will see, is the VMA(∞)
Y_t = μ + Σ_{j=0}^∞ C_j ε_{t-j}    (1)
A very important result is that if the sequence {C_j} is absolutely summable (i.e. Σ_{j=0}^∞ ||C_j|| < ∞ where ||C_j|| = Σ_m Σ_n |c_{mn,j}|, or equivalently each sequence formed by the elements of the matrices is absolutely summable), then the infinite sum above generates a well-defined (mean square convergent) process (see for instance Proposition C.10L).
Proposition (10.2H). Let Y_t be an n × 1 vector satisfying
Y_t = μ + Σ_{j=0}^∞ C_j ε_{t-j}
where ε_t is a vector WN with E(ε_t) = 0, E(ε_t ε_{t-j}') = Ω for j = 0 and zero otherwise, and {C_j}_{j=0}^∞ is absolutely summable. Let Y_{it} denote the ith element of Y_t and μ_i the ith element of μ. Then

(a) The autocovariance between the ith variable at time t and the jth variable s periods earlier, E[(Y_{it} - μ_i)(Y_{jt-s} - μ_j)], exists and is given by the row i, column j element of
Γ_s = Σ_{v=0}^∞ C_{s+v} Ω C_v'    for s = 0, 1, 2, ...
(b) The sequence of matrices {Γ_s}_{s=0}^∞ is absolutely summable.
If furthermore {ε_t}_{t=-∞}^∞ is an i.i.d. sequence with E|ε_{i1,t} ε_{i2,t} ε_{i3,t} ε_{i4,t}| < ∞ for i_1, i_2, i_3, i_4 = 1, 2, ..., n, then also
(c) E|Y_{i1,t1} Y_{i2,t2} Y_{i3,t3} Y_{i4,t4}| < ∞ for all t_1, t_2, t_3, t_4
(d) (1/T) Σ_{t=1}^T Y_{it} Y_{jt-s} →^p E(Y_{it} Y_{jt-s}) for i, j = 1, 2, ..., n and for all s
Implications:
1. Result (a) implies that the second moments of an MA(∞) with absolutely summable coefficients can be found by taking the limit of the autocovariances of an MA(q).

2. Result (b) ensures ergodicity for the mean.
3. Result (c) says that Y_t has bounded fourth moments.
4. Result (d) says that Y_t is ergodic for second moments.

2.3 Invertibility and fundamentalness
The VMA is invertible if and only if the determinant of C(L) vanishes only outside the unit circle, i.e. if det(C(z)) ≠ 0 for all |z| ≤ 1. If the process is invertible it possesses a unique VAR representation (this will become clear later on).
Example. Consider the process
[ Y_{1t} ]   [ 1     L   ] [ ε_{1t} ]
[ Y_{2t} ] = [ 0   θ - L ] [ ε_{2t} ]
Here det(C(z)) = θ - z, which is zero for z = θ. Obviously the process is invertible if and only if |θ| > 1.
The VMA is fundamental if and only if det(C(z)) ≠ 0 for all |z| < 1. In the previous example the process is fundamental if and only if |θ| ≥ 1. In the case |θ| = 1 the process is fundamental but noninvertible.

Provided that |θ| > 1 the MA process can be inverted and the shock can be obtained as a combination of present and past values of Y_t. In fact
[ 1   -L/(θ - L) ] [ Y_{1t} ]   [ ε_{1t} ]
[ 0    1/(θ - L) ] [ Y_{2t} ] = [ ε_{2t} ]
Notice that for any noninvertible process whose determinant does not vanish on the unit circle there is an invertible process with an identical autocovariance structure.
Example: univariate MA. Consider the MA(1) y_t = u_t + m u_{t-1} with u_t ∼ WN and |m| > 1, i.e. y_t is noninvertible. The autocovariances are E(y_t^2) = (1 + m^2)σ_u^2, E(y_t y_{t-j}) = mσ_u^2 for j = 1, -1, and E(y_t y_{t-j}) = 0 for |j| > 1. Now consider the alternative representation
y_t = v_t + (1/m) v_{t-1}

which is invertible, and
v_t = (1 + (1/m)L)^{-1} y_t = (1 + (1/m)L)^{-1} (1 + mL) u_t
where v_t is indeed a white noise process with variance m^2 σ_u^2.
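The observational equivalence of the two representations can be checked directly (a minimal MATLAB sketch; m and σ_u^2 are arbitrary illustrative values):

% Autocovariances of the noninvertible MA(1) and of its invertible counterpart
m = 2;  sigma2_u = 1;                                        % |m| > 1
% noninvertible: y_t = u_t + m*u_{t-1}, Var(u) = sigma2_u
gamma0_noninv = (1 + m^2) * sigma2_u;   gamma1_noninv = m * sigma2_u;
% invertible:    y_t = v_t + (1/m)*v_{t-1}, Var(v) = m^2*sigma2_u
sigma2_v = m^2 * sigma2_u;
gamma0_inv = (1 + 1/m^2) * sigma2_v;    gamma1_inv = (1/m) * sigma2_v;
disp([gamma0_noninv gamma1_noninv; gamma0_inv gamma1_inv])   % the two rows coincide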

2.4 Wold Decomposition
Any zero-mean stationary vector process Y_t admits the following representation
Y_t = C(L)ε_t + μ_t    (2)
where C(L)ε_t is the stochastic component, with C(L) = Σ_{i=0}^∞ C_i L^i, and μ_t is the purely deterministic component, the one perfectly forecastable using linear combinations of past Y_t. If μ_t = 0 the process is said to be regular. Here we only consider regular processes.
(2) is the Wold representation of Y_t, which is unique and for which the following properties hold:
(a) ɛ_t is the innovation for Y_t, i.e. ɛ_t = Y_t - Proj(Y_t | Y_{t-1}, Y_{t-2}, ...).
(b) ɛ_t is white noise: E(ɛ_t) = 0, E(ɛ_t ɛ_τ') = 0 for t ≠ τ, E(ɛ_t ɛ_t') = Ω.
(c) The coefficients are square summable: Σ_{j=0}^∞ ||C_j||^2 < ∞.
(d) C_0 = I.

The result is very powerful since it holds for any covariance stationary process. However, the theorem does not imply that (2) is the true representation of the process. For instance the process could be stationary but nonlinear or noninvertible.

2.5 Other fundamental MA(∞) Representations
It is easy to extend the Wold representation to the general class of fundamental MA(∞) representations. For any nonsingular matrix R of constants we have
Y_t = C(L)R u_t = D(L) u_t
where u_t = R^{-1} ɛ_t ∼ WN(0, R^{-1} Ω R^{-1}'). Fundamentalness is ensured since u_t is a linear combination of the Wold shocks. The roots of the determinant of D(L) coincide with those of C(L).

3 VAR: representations
If the MA matrix lag polynomial is invertible, then a stationary vector process has a Vector Autoregressive (VAR) representation. We define C(L)^{-1} as the (n × n) lag polynomial such that C(L)^{-1} C(L) = I; i.e. when these lag polynomial matrices are matrix-multiplied, all the lag terms cancel out. This operation in effect converts lags of the errors into lags of the vector of dependent variables; thus we move from MA coefficients to VAR coefficients. Define A(L) = C(L)^{-1}. Then, given the (invertible) MA coefficients, it is easy to map these into the VAR coefficients:
Y_t = C(L)ɛ_t   ⟺   A(L)Y_t = ɛ_t    (3)
where A(L) = A_0 L^0 + A_1 L^1 + A_2 L^2 + ... and the A_j are (n × n) matrices of coefficients.

To show that this matrix lag polynomial exists and how it maps into the coefficients of C(L), note that by assumption we have the identity
(A_0 + A_1 L^1 + A_2 L^2 + ...)(I + C_1 L^1 + C_2 L^2 + ...) = I
After distributing, the identity implies that the coefficients on the non-zero powers of the lag operator must be zero, which gives the following recursive solution for the VAR coefficients:
A_0 = I
A_1 = -A_0 C_1
A_k = -A_0 C_k - A_1 C_{k-1} - ... - A_{k-1} C_1
As noted, the VAR is of infinite order (i.e. an infinite number of lags is required to fully represent the joint density). In practice, the VAR is usually restricted for estimation by truncating the lag length. The pth-order vector autoregression, denoted VAR(p), is given by
Y_t = A_1 Y_{t-1} + A_2 Y_{t-2} + ... + A_p Y_{t-p} + ɛ_t    (4)
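The recursion above is easy to code. A minimal MATLAB sketch (the MA(1) coefficient matrix C_1 and the truncation K are arbitrary illustrative choices):

% Map (invertible) MA coefficients C_1, ..., C_q into VAR coefficients via
% A_0 = I and A_k = -(A_0 C_k + A_1 C_{k-1} + ... + A_{k-1} C_1), with C_j = 0 for j > q
n = 2;  K = 6;
C = cat(3, [0.5 0.1; 0.2 0.3]);                   % C(:,:,j) = C_j; here q = 1
q = size(C, 3);
A = zeros(n, n, K + 1);  A(:,:,1) = eye(n);       % A(:,:,k+1) stores A_k
for k = 1:K
    acc = zeros(n);
    for j = 1:min(k, q)
        acc = acc - A(:,:,k-j+1) * C(:,:,j);      % -A_{k-j} C_j
    end
    A(:,:,k+1) = acc;
end
disp(A(:,:,2)); disp(A(:,:,3))                    % A_1 = -C_1, A_2 = C_1^2 for an MA(1)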

Note: here we are considering zero-mean processes. In case the mean of Y_t is not zero we should add a constant in the VAR equations.
VAR(1) representation. Any VAR(p) can be rewritten as a VAR(1). To form a VAR(1) from the general model we define
e_t = [ɛ_t', 0, ..., 0]',   Y_t = [Y_t', Y_{t-1}', ..., Y_{t-p+1}']'
    [ A_1  A_2  ...  A_{p-1}  A_p ]
    [ I_n   0   ...    0       0  ]
A = [  0   I_n  ...    0       0  ]
    [  :    :           :      :  ]
    [  0    0   ...   I_n      0  ]
Therefore we can rewrite the VAR(p) as a VAR(1)
Y_t = A Y_{t-1} + e_t
This is also known as the companion form of the VAR(p).
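Building the companion matrix is straightforward once the coefficient matrices are stacked. A minimal MATLAB sketch with illustrative VAR(2) coefficients:

% Companion-form matrix of a VAR(p): [A_1 ... A_p] stacked on top of [I, 0]
A1 = [0.5 0.1; 0.0 0.4];  A2 = [0.2 0.0; 0.1 0.1];        % illustrative coefficient matrices
Acell = {A1, A2};
n = size(A1, 1);  p = numel(Acell);
A = [cell2mat(Acell); eye(n*(p-1)), zeros(n*(p-1), n)];   % (np x np) companion matrix
disp(abs(eig(A))')   % eigenvalue moduli: all < 1 means the VAR is stable (Section 4)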

SUR representation. The VAR(p) can be stacked as
Y = XΓ + u
where Y = [Y_1, ..., Y_T]', X = [X_1, ..., X_T]' with X_t = [Y_{t-1}', Y_{t-2}', ..., Y_{t-p}']', u = [ɛ_1, ..., ɛ_T]' and Γ = [A_1, ..., A_p]'.
Vec representation. Let vec denote the column-stacking operator, i.e. if
    [ X_11  X_12 ]
X = [ X_21  X_22 ]
    [ X_31  X_32 ]
then vec(X) = [X_11, X_21, X_31, X_12, X_22, X_32]'.
Let γ = vec(Γ). Then the VAR can be rewritten as
Y_t = (I_n ⊗ X_t')γ + ɛ_t

4 VAR: Stationarity
4.1 Stability and stationarity
Consider the VAR(1)
Y_t = μ + AY_{t-1} + ε_t
Substituting backward we obtain
Y_t = μ + AY_{t-1} + ε_t
    = μ + A(μ + AY_{t-2} + ε_{t-1}) + ε_t
    = (I + A)μ + A^2 Y_{t-2} + Aε_{t-1} + ε_t
    ...
Y_t = (I + A + ... + A^{j-1})μ + A^j Y_{t-j} + Σ_{i=0}^{j-1} A^i ε_{t-i}
If all the eigenvalues of A are smaller than one in modulus then
1. A^j = PΛ^j P^{-1} → 0;
2. the sequence A^i, i = 0, 1, ..., is absolutely summable;

3. the infinite sum Σ_{i=0}^∞ A^i ε_{t-i} exists in mean square (see e.g. Proposition C.10L);
4. (I + A + ... + A^j)μ → (I - A)^{-1}μ and A^j Y_{t-j} → 0 as j goes to infinity.
Therefore, if the eigenvalues are smaller than one in modulus, Y_t has the following representation
Y_t = (I - A)^{-1}μ + Σ_{i=0}^∞ A^i ε_{t-i}
Note that the eigenvalues λ of A satisfy det(Iλ - A) = 0. Therefore the eigenvalues correspond to the reciprocals of the roots of the determinant of A(z) = I - Az. A VAR(1) is called stable if det(I - Az) ≠ 0 for |z| ≤ 1. Equivalently, stability requires that all the eigenvalues of A are smaller than one in absolute value.

For a VAR(p) the stability condition likewise requires that all the eigenvalues of A (the AR matrix of the companion form of Y_t) are smaller than one in modulus, or equivalently that all the roots are larger than one in modulus. Therefore a VAR(p) is called stable if
det(I - A_1 z - A_2 z^2 - ... - A_p z^p) ≠ 0 for |z| ≤ 1
A condition for stationarity: a stable VAR process is stationary. Notice that the converse is not true: an unstable process can be stationary.
Notice that the vector MA(∞) representation of a stationary VAR satisfies the absolute summability condition, so the assumptions of Proposition 10.2H hold.

4.2 Back to the Wold representation
How can we find the Wold representation starting from a VAR? We need to invert the VAR representation. So let us rewrite the VAR(p) as a VAR(1). Substituting backward in the companion form we have
Y_t = A^j Y_{t-j} + A^{j-1} e_{t-j+1} + ... + A e_{t-1} + e_t
If the conditions for stationarity are satisfied, the series Σ_{j=0}^∞ A^j converges and Y_t has a VMA(∞) representation in terms of the Wold shock e_t given by
Y_t = (I - AL)^{-1} e_t = Σ_{j=0}^∞ A^j e_{t-j} = C(L)e_t
where C_0 = A^0 = I, C_1 = A, C_2 = A^2, ..., C_k = A^k. The coefficients C_j of the Wold representation of the original Y_t are the n × n upper-left block of C_j.
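As a sketch of how the Wold (VMA) coefficients can be computed in practice, the following MATLAB lines take powers of the companion matrix and read off their upper-left block (the VAR(2) coefficients are arbitrary illustrative values):

% Wold coefficients of a stationary VAR(p): C_j = upper-left n x n block of A^j
A1 = [0.5 0.1; 0.0 0.4];  A2 = [0.2 0.0; 0.1 0.1];
n = 2;  J = 4;
A = [A1 A2; eye(n), zeros(n)];          % companion matrix of the VAR(2)
C = zeros(n, n, J + 1);
Apow = eye(2*n);
for j = 0:J
    C(:,:,j+1) = Apow(1:n, 1:n);        % C_j
    Apow = Apow * A;
end
disp(C(:,:,2))                          % C_1, which equals A_1 for a VAR(2)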

Example. A stationary VAR(1):
[ Y_{1t} ]   [ 0.5   0.3 ] [ Y_{1t-1} ]   [ ɛ_{1t} ]
[ Y_{2t} ] = [ 0.02  0.8 ] [ Y_{2t-1} ] + [ ɛ_{2t} ]
                  [ 1    0.3 ]          [ 0.81 ]
E(ɛ_t ɛ_t') = Ω = [ 0.3  0.1 ]     λ =  [ 0.48 ]
Figure 1: blue: Y_1, green: Y_2.
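A picture like Figure 1 can be reproduced by simulating the process above; a minimal MATLAB sketch (the sample size and the seed are illustrative choices):

% Simulate the example VAR(1) and check its eigenvalues
rng(0);  T = 200;
A     = [0.5 0.3; 0.02 0.8];
Omega = [1 0.3; 0.3 0.1];
P = chol(Omega, 'lower');                % to draw eps_t with variance Omega
Y = zeros(2, T);
for t = 2:T
    Y(:, t) = A * Y(:, t-1) + P * randn(2, 1);
end
disp(abs(eig(A))')                       % eigenvalue moduli of about 0.81 and 0.48: stable
plot(Y')                                 % the two simulated series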

5 VAR: second moments
Let us consider the companion form of a stationary (zero mean for simplicity) VAR(p) defined earlier
Y_t = AY_{t-1} + e_t    (5)
The variance of Y_t is given by
Σ = E(Y_t Y_t') = AΣA' + Ω    (6)
A closed-form solution to (6) can be obtained in terms of the vec operator. Let A, B, C be matrices such that the product ABC exists. A property of the vec operator is that
vec(ABC) = (C' ⊗ A)vec(B)
Applying the vec operator to both sides of (6) we have
vec(Σ) = (A ⊗ A)vec(Σ) + vec(Ω)
so that
vec(Σ) = (I - A ⊗ A)^{-1} vec(Ω)

The jth autocovariance of Y_t (denoted Γ_j) can be found by post-multiplying (5) by Y_{t-j}' and taking expectations:
E(Y_t Y_{t-j}') = AE(Y_{t-1} Y_{t-j}') + E(e_t Y_{t-j}')
Thus
Γ_j = AΓ_{j-1}
or
Γ_j = A^j Σ
The variance Σ and the jth autocovariance Γ_j of the original series Y_t are given by the first n rows and columns of Σ and Γ_j respectively.
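These formulas translate directly into code. A minimal MATLAB sketch using the example VAR(1) above:

% Unconditional second moments of a VAR(1): vec(Sigma) = (I - A kron A)^{-1} vec(Omega)
A     = [0.5 0.3; 0.02 0.8];
Omega = [1 0.3; 0.3 0.1];
n = size(A, 1);
vecSigma = (eye(n^2) - kron(A, A)) \ Omega(:);   % solve the linear system
Sigma  = reshape(vecSigma, n, n);                % variance of Y_t
Gamma1 = A * Sigma;                              % first autocovariance, Gamma_j = A^j Sigma
disp(Sigma); disp(Gamma1)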

6 VAR: specification
Specification of the VAR is key for empirical analysis. We have to decide on the following:
1. Number of lags p.
2. Which variables to include.
3. Type of transformations.

6.1 Number of lags
As in the univariate case, care must be taken to account for all systematic dynamics in multivariate models. In VAR models this is usually done by choosing a sufficient number of lags to ensure that the residuals in each of the equations are white noise.
AIC: Akaike information criterion. Choose the p that minimizes
AIC(p) = T ln det(Ω̂) + 2(n^2 p)
BIC: Bayesian information criterion. Choose the p that minimizes
BIC(p) = T ln det(Ω̂) + (n^2 p) ln T
HQ: Hannan-Quinn information criterion. Choose the p that minimizes
HQ(p) = T ln det(Ω̂) + 2(n^2 p) ln ln T

The ˆp obtained using BIC and HQ are consistent, while the one obtained with AIC is not: AIC overestimates the true order with positive probability and underestimates it with zero probability.
Suppose a VAR(p) is fitted to Y_1, ..., Y_T (Y_t not necessarily stationary). In small samples the following relations hold:
ˆp_BIC ≤ ˆp_AIC if T ≥ 8
ˆp_BIC ≤ ˆp_HQ for all T
ˆp_HQ ≤ ˆp_AIC if T ≥ 16
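The criteria are straightforward to compute from the OLS residual covariance of each fitted VAR(p). A minimal MATLAB sketch on simulated data (the DGP, sample size and maximum lag are illustrative choices):

% Lag-length selection by AIC, BIC and HQ on a common estimation sample
rng(0);  T = 200;  n = 2;  pmax = 6;
A1true = [0.5 0.3; 0.02 0.8];
Y = zeros(n, T);
for t = 2:T, Y(:, t) = A1true * Y(:, t-1) + randn(n, 1); end
crit = zeros(pmax, 3);
Teff = T - pmax;                                   % same sample for every p
for p = 1:pmax
    X = ones(Teff, 1);                             % constant
    for j = 1:p, X = [X, Y(:, pmax+1-j:T-j)']; end % lagged Y's
    Ydep = Y(:, pmax+1:T)';
    B = X \ Ydep;                                  % equation-by-equation OLS
    U = Ydep - X * B;
    Omega = (U' * U) / Teff;
    k = n^2 * p;
    crit(p, :) = Teff*log(det(Omega)) + [2*k, k*log(Teff), 2*k*log(log(Teff))];
end
[~, phat] = min(crit);                             % chosen p for [AIC BIC HQ]
disp(phat)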

6.2 Type of variables
Variable selection is a key step in the specification of the model. VAR models are small-scale models, so usually 2 to 8 variables are used. Variable selection depends on the particular application and, in general, all the variables conveying relevant information should be included. We will see several examples.

6.3 Type of transformations
Problem: many economic time series display a trend over time and are clearly nonstationary (their mean is not constant).
Trend-stationary series: Y_t = μ + bt + ε_t, ε_t ∼ WN.
Difference-stationary series: Y_t = μ + Y_{t-1} + ε_t, ε_t ∼ WN.

Figure 1: blue: log(GDP), green: log(CPI).

These series can be thought of as generated by some nonstationary process. Here are some examples.
Example: difference stationary
[ Y_{1t} ]   [ 0.01 ]   [  0.7  0.3 ] [ Y_{1t-1} ]   [ ɛ_{1t} ]
[ Y_{2t} ] = [ 0.02 ] + [ -0.2  1.2 ] [ Y_{2t-1} ] + [ ɛ_{2t} ]
         [ 1    0.3 ]        [ 1   ]
with Ω = [ 0.3  0.1 ],  λ =  [ 0.9 ]

Example: trend stationary
[ Y_{1t} ]   [  0   ]     [ 0.5   0.3 ] [ Y_{1t-1} ]   [ ɛ_{1t} ]
[ Y_{2t} ] = [ 0.01 ] t + [ 0.02  0.8 ] [ Y_{2t-1} ] + [ ɛ_{2t} ]
         [ 1    0.3 ]        [ 0.81 ]
with Ω = [ 0.3  0.1 ],  λ =  [ 0.48 ]

So: 1) How do I know whether the series are stationary? 2) What should I do if they are nonstationary?

Dickey-Fuller test
In 1979, Dickey and Fuller proposed the following test for stationarity:
1. Estimate by OLS the equation
Δx_t = b + γx_{t-1} + ε_t
2. Test the null γ = 0 against the alternative γ < 0.
3. If the null is not rejected, then
x_t = b + x_{t-1} + ε_t
which is a random walk with drift.
4. On the contrary, if γ < 0, then x_t is a stationary AR with a = 1 + γ < 1:
x_t = b + ax_{t-1} + ε_t
An alternative is to specify the equation augmented by a deterministic trend
Δx_t = b + γx_{t-1} + ct + ε_t

With this specification, under the alternative the process is stationary around a deterministic linear trend.
Augmented Dickey-Fuller test. In the augmented version of the test, p lags of Δx_t are added, i.e.
A(L)Δx_t = b + γx_{t-1} + ε_t
or
A(L)Δx_t = b + γx_{t-1} + ct + ε_t
If the test statistic is smaller (more negative) than the critical value, then the null hypothesis of a unit root is rejected.
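The ADF regression itself is just OLS; only the critical values are nonstandard and come from the Dickey-Fuller tables (econometric packages provide the test built in). A minimal MATLAB sketch on simulated random-walk data (series, sample size and lag order are illustrative):

% ADF regression with a constant: Delta x_t on [1, x_{t-1}, p lagged differences]
rng(0);  T = 300;  x = cumsum(randn(T, 1));        % data: a pure random walk
p  = 2;                                            % augmentation lags
dx = diff(x);
y  = dx(p+1:end);
X  = [ones(size(y)), x(p+1:end-1)];                % constant and x_{t-1}
for j = 1:p, X = [X, dx(p+1-j:end-j)]; end         % lagged differences
bhat = X \ y;
u    = y - X * bhat;
s2   = (u' * u) / (length(y) - size(X, 2));
Vb   = s2 * inv(X' * X);
t_gamma = bhat(2) / sqrt(Vb(2, 2));                % ADF statistic (t-ratio on gamma)
disp(t_gamma)                                      % compare with DF critical values (about -2.86 at 5%)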

Transformations I: first differences
Let Δ = 1 - L be the first-difference filter, i.e. a filter such that ΔY_t = Y_t - Y_{t-1}, and let us consider the simple case of a random walk with drift
Y_t = μ + Y_{t-1} + ɛ_t
where ɛ_t is WN. By applying the first-difference filter (1 - L) the process is transformed into a stationary process
ΔY_t = μ + ɛ_t
Let us now consider a process with a deterministic trend
Y_t = μ + δt + ɛ_t
By differencing the process we obtain
ΔY_t = δ + ɛ_t - ɛ_{t-1}
which is a stationary process but is not invertible because it contains a unit root in the MA part.

log(GDP) and log(CPI) in first differences.

Transformations II: removing a deterministic trend
Removing a deterministic trend (linear or quadratic) from a trend-stationary variable is fine. However, this is not enough if the process is a unit root with drift. To see this consider again the process
Y_t = μ + Y_{t-1} + ɛ_t
This can be written as
Y_t = μt + Y_0 + Σ_{j=1}^t ɛ_j
By removing the deterministic trend the mean of the process becomes constant, but the variance grows over time, so the process is not stationary.
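A small Monte Carlo makes the point visible: after removing the deterministic trend from simulated random walks with drift, the variance across replications still grows linearly with t (a minimal MATLAB sketch; sample size, number of replications and drift are illustrative choices):

% Detrending a random walk with drift does not remove the stochastic trend
rng(0);  T = 200;  N = 1000;  mu = 0.1;
E = randn(T, N);
Y = mu * (1:T)' + cumsum(E);                 % N independent random walks with drift, Y_0 = 0
Ydetr = Y - mu * (1:T)';                     % remove the deterministic trend mu*t
disp(var(Ydetr([50 100 200], :), 0, 2)')     % variance across replications: roughly 50, 100, 200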

log(GDP) and log(CPI) linearly detrended.

Transformations of trending variables: Hodrick-Prescott filter
The filter separates the trend from the cyclical component of a scalar time series. Suppose y_t = g_t + c_t, where g_t is the trend component and c_t is the cycle. The trend is obtained by solving the following minimization problem
min_{{g_t}_{t=1}^T}  Σ_{t=1}^T c_t^2 + λ Σ_{t=2}^{T-1} [(g_{t+1} - g_t) - (g_t - g_{t-1})]^2
The parameter λ is a positive number (for quarterly data usually λ = 1600) which penalizes variability in the growth component, while the first part is the penalty on the cyclical component. The larger λ, the smoother the trend component.
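The first-order conditions of this problem yield the linear system (I + λK'K)g = y, where K is the (T-2) × T second-difference matrix. A minimal MATLAB sketch (the simulated series and λ = 1600 are illustrative):

% Hodrick-Prescott filter: solve (I + lambda*K'K) g = y, then c = y - g
rng(0);  T = 120;  lambda = 1600;                 % quarterly convention
y = cumsum(0.05 + randn(T, 1));                   % illustrative trending series
e = ones(T, 1);
K = spdiags([e -2*e e], 0:2, T-2, T);             % row t: g_t - 2*g_{t+1} + g_{t+2}
g = (speye(T) + lambda * (K' * K)) \ y;           % trend component
c = y - g;                                        % cyclical component
plot([y g])                                       % the series and its HP trend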