DYNAMIC MODELS FOR STATIONARY TIME SERIES
Econometrics 2, Lecture Note 4
Heino Bohn Nielsen
March 2, 2007

Lecture note 2 considered the statistical analysis of regression models for time series data, and gave sufficient conditions for the ordinary least squares (OLS) principle to provide consistent, unbiased, and asymptotically normal estimators. The main assumptions imposed on the data were stationarity and weak dependence, and the main assumption on the model was some degree of exogeneity of the regressors. This lecture note introduces some popular classes of dynamic time series models and goes into detail with the mathematical structure and the economic interpretation of the models. To introduce the ideas we first consider the moving average (MA) model, which is probably the simplest dynamic model. Next, we consider the popular autoregressive (AR) model and present conditions for this model to generate stationary time series. We also discuss the relationship between AR and MA models and introduce mixed ARMA models allowing for both AR and MA terms. Finally, we look at the class of autoregressive models with additional explanatory variables, the so-called autoregressive distributed lag (ADL) models, and derive the dynamic multipliers and the steady-state solution.

Outline
1 Estimating Dynamic Effects
2 Univariate Moving Average Models
3 Univariate Autoregressive Models
4 ARMA Estimation and Model Selection
5 Univariate Forecasting
6 The Autoregressive Distributed Lag Model
7 Concluding Remarks

1 Estimating Dynamic Effects

A central feature of time series data is a pronounced time dependence, and it follows that shocks may have dynamic effects. Below we present a number of popular time series models for the estimation of dynamic responses after an impulse to the model. As a starting point it is useful to distinguish univariate from multivariate models.

1.1 Univariate Models

Univariate models consider a single time series, $y_1, y_2, \ldots, y_T$, and model the systematic variation in $y_t$ as a function of its own past, i.e. $y_t \mid y_{t-1}, y_{t-2}, \ldots$. Although the univariate approach is limited, it nevertheless serves two important purposes. The first purpose is to serve as a descriptive tool to characterize the dynamic properties of a time series. We may for example be interested in the strength of the time dependence, or persistence, of the time series, and we often want to assess whether the main assumption of stationarity is likely to be fulfilled. In empirical applications the univariate description often precedes a more elaborate multivariate analysis of the data. The second purpose is forecasting: to forecast $y_{T+1}$ at time $T$ based on a multivariate model for $y_t \mid x_t$ it is obviously necessary to know $x_{T+1}$, which is generally not in the information set at time $T$. Univariate models offer the possibility of forecasts of $y_t$ based solely on its own past. These forecasts are simple extrapolations based on the systematic variation in the past.

Below we look at two classes of univariate models. We first introduce the moving average model in Section 2. This is the simplest class of dynamic models, and the conditions for stationarity are straightforwardly obtained. We next present the autoregressive model in Section 3 and derive the stationarity condition by referring to the MA case. We emphasize the relationship between the two models and present the ARMA class of mixed models allowing for both autoregressive and moving average terms. We continue in Sections 4 and 5 to discuss estimation and forecasting.

1.2 Multivariate Models

An alternative to univariate models is to consider a model for $y_t$ given an information set including other explanatory variables, the vector $x_t$ say. These models are obviously more interesting from an economic point of view, and they allow the derivation of dynamic multipliers,
$$\frac{\partial y_t}{\partial x_t}, \quad \frac{\partial y_{t+1}}{\partial x_t}, \quad \frac{\partial y_{t+2}}{\partial x_t}, \ldots$$
An important example could be the analysis of monetary policy, in which case $x_t$ could be the policy interest rate and $y_t$ the variable of interest, e.g. unemployment or inflation. In Section 6 we look at a model for $y_t$ conditional on $x_t$ and the past, i.e. $y_t \mid y_{t-1}, y_{t-2}, \ldots, x_t, x_{t-1}, x_{t-2}, \ldots$. This is known as the autoregressive distributed lag (ADL) model and is the workhorse in single-equation dynamic modelling.

2 Univariate Moving Average Models

Possibly the simplest class of univariate time series models is the moving average (MA) model. To discuss the MA model we first define an IID error process $\epsilon_t$ with mean zero and constant variance $\sigma^2$. An error term with these properties, which we will write as $\epsilon_t \sim IID(0, \sigma^2)$, is often referred to as a white noise process, and in the time series literature $\epsilon_t$ is sometimes called an innovation or a shock to the process. Recall that a (weakly) stationary stochastic process is characterized by constant mean, variance, and autocovariances (unconditionally), and the white noise process, $\epsilon_t$, is obviously stationary.

2.1 Finite MA Models

Next we define the moving average model of order $q$, MA($q$), by the equation
$$y_t = \mu + \epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \ldots + \alpha_q \epsilon_{t-q}, \quad t = 1, 2, \ldots, T, \qquad (1)$$
which states that $y_t$ is a moving average of $q$ past shocks. The equation in (1) defines $y_t$ for $t = 1, 2, \ldots, T$, which means that we need $q$ initial values for the unobserved error process, and it is customary to assume that $\epsilon_{-(q-1)} = \epsilon_{-(q-2)} = \ldots = \epsilon_{-1} = \epsilon_0 = 0$. The equation in (1) includes a constant term, but the deterministic specification could be made more general, and the equation could include e.g. a linear trend or seasonal dummies. We note that $\alpha_i$ is the response on $y_t$ of an impulse to $\epsilon_{t-i}$, and the sequence of $\alpha$'s is often referred to as the impulse response function.

The stochastic process $y_t$ can be characterized directly from (1). The expectation is given by
$$E[y_t] = E[\mu + \epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \ldots + \alpha_q \epsilon_{t-q}] = \mu,$$
which is just the constant term in (1). The variance can be derived by inserting (1) in the definition,
$$\gamma_0 := V(y_t) = E[(y_t - \mu)^2] = E[(\epsilon_t + \alpha_1 \epsilon_{t-1} + \alpha_2 \epsilon_{t-2} + \ldots + \alpha_q \epsilon_{t-q})^2]$$
$$= E[\epsilon_t^2] + E[\alpha_1^2 \epsilon_{t-1}^2] + E[\alpha_2^2 \epsilon_{t-2}^2] + \ldots + E[\alpha_q^2 \epsilon_{t-q}^2] = (1 + \alpha_1^2 + \alpha_2^2 + \ldots + \alpha_q^2)\,\sigma^2. \qquad (2)$$
The third line in the derivation follows from the independent distribution of $\epsilon_t$, so that all covariances are zero, $E[\epsilon_t \epsilon_{t-h}] = 0$ for $h \neq 0$. This implies that the variance of the sum is the sum of the variances. The autocovariances of $y_t$ can be found in a similar way, noting

again that all cross terms are zero, i.e.
$$\gamma_1 := Cov(y_t, y_{t-1}) = E[(y_t - \mu)(y_{t-1} - \mu)]$$
$$= E[(\epsilon_t + \alpha_1 \epsilon_{t-1} + \ldots + \alpha_q \epsilon_{t-q})(\epsilon_{t-1} + \alpha_1 \epsilon_{t-2} + \ldots + \alpha_q \epsilon_{t-q-1})]$$
$$= (\alpha_1 + \alpha_2\alpha_1 + \alpha_3\alpha_2 + \ldots + \alpha_q\alpha_{q-1})\,\sigma^2,$$
$$\gamma_2 := Cov(y_t, y_{t-2}) = E[(y_t - \mu)(y_{t-2} - \mu)]$$
$$= E[(\epsilon_t + \alpha_1 \epsilon_{t-1} + \ldots + \alpha_q \epsilon_{t-q})(\epsilon_{t-2} + \alpha_1 \epsilon_{t-3} + \ldots + \alpha_q \epsilon_{t-q-2})]$$
$$= (\alpha_2 + \alpha_3\alpha_1 + \alpha_4\alpha_2 + \ldots + \alpha_q\alpha_{q-2})\,\sigma^2,$$
$$\vdots$$
$$\gamma_q := Cov(y_t, y_{t-q}) = \alpha_q \sigma^2,$$
while the autocovariances are zero for larger lag lengths, $\gamma_k := Cov(y_t, y_{t-k}) = 0$ for $k > q$.

We note that the mean and variance are constant, and the autocovariance $\gamma_k$ depends on $k$ but not on $t$. By this we conclude that the MA($q$) process is stationary by construction. The intuition is that $y_t$ is a linear combination of stationary terms, and with constant weights the properties of $y_t$ are independent of $t$.

A common way to characterize the properties of the time series is to use the autocorrelation function, ACF. It follows from the covariances that the ACF is given by the sequence
$$\rho_k := \frac{\gamma_k}{\gamma_0} = \frac{\alpha_k + \alpha_{k+1}\alpha_1 + \alpha_{k+2}\alpha_2 + \ldots + \alpha_q\alpha_{q-k}}{1 + \alpha_1^2 + \alpha_2^2 + \ldots + \alpha_q^2}, \quad \text{for } k \leq q,$$
while $\rho_k = 0$ for $k > q$. We note that the MA($q$) process has a memory of $q$ periods.

To illustrate the appearance of MA processes, Figure 1 reports some simulated series and their theoretical and estimated autocorrelation functions. We note that the appearance of the MA($q$) process depends on the order of the process, $q$, and on the parameters, $\alpha_1, \ldots, \alpha_q$, but the shocks to the white noise process are in all cases recognizable, and the properties do not change fundamentally. Also note that the MA($q$) process has a memory of $q$ periods, in the sense that it takes $q$ time periods before the effect of a shock $\epsilon_t$ has disappeared. A memory of exactly $q$ periods may be difficult to rationalize in many economic settings, and the MA model is not used very often in econometric applications.

2.2 Infinite MA Models

The arguments for the finite MA process can be extended to the infinite moving average process, MA($\infty$). In this case, however, we have to ensure that the variance of $y_t$ is bounded,
$$V(y_t) = (1 + \alpha_1^2 + \alpha_2^2 + \ldots)\,\sigma^2 < \infty,$$
which requires that $\sum_{j=0}^{\infty} \alpha_j^2 < \infty$ so that the infinite sum converges.

[Figure 1 panels: (A) $y_t = \epsilon_t$; (B) autocorrelation function for (A); (C) $y_t = \epsilon_t + 0.80\,\epsilon_{t-1}$; (D) autocorrelation function for (C); (E) $y_t = \epsilon_t - 0.80\,\epsilon_{t-1}$; (F) autocorrelation function for (E); (G) $y_t = \epsilon_t + \epsilon_{t-1} + 0.8\,\epsilon_{t-2} + 0.4\,\epsilon_{t-3} - 0.6\,\epsilon_{t-4} - \epsilon_{t-5}$; (H) autocorrelation function for (G).]

Figure 1: Examples of simulated MA($q$) processes, with theoretical and estimated autocorrelation functions. Horizontal lines are the 95% confidence bounds for zero autocorrelations derived for an IID process.
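
The simulations behind Figure 1 are easy to reproduce. As a minimal sketch (not the code used for the figure), the following Python snippet simulates an MA(1) process with $\alpha = 0.8$ and compares the estimated first-order autocorrelation with the theoretical value $\rho_1 = \alpha/(1+\alpha^2)$ from the formula above; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
T, alpha, sigma = 1000, 0.8, 1.0

# Simulate y_t = eps_t + alpha * eps_{t-1} with eps_t ~ IID N(0, sigma^2)
eps = rng.normal(0.0, sigma, T + 1)
y = eps[1:] + alpha * eps[:-1]

def acf(x, k):
    """Sample autocorrelation at lag k (deviations from the sample mean)."""
    x = x - x.mean()
    return (x[k:] * x[:-k]).sum() / (x * x).sum()

rho1_theory = alpha / (1.0 + alpha**2)          # ACF formula for an MA(1)
print(f"theoretical rho_1 = {rho1_theory:.3f}")
print(f"estimated   rho_1 = {acf(y, 1):.3f}")
print(f"estimated   rho_2 = {acf(y, 2):.3f}   (should be close to zero)")
```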

3 Univariate Autoregressive Models

The autoregressive (AR) model with $p$ lags is defined by the equation
$$y_t = \delta + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \ldots + \theta_p y_{t-p} + \epsilon_t, \quad t = 1, 2, \ldots, T, \qquad (3)$$
where $\epsilon_t \sim IID(0, \sigma^2)$ is a white noise error process. We use again the convention that the equation in (3) holds for observations $y_1, y_2, \ldots, y_T$, which means that we have observed also the $p$ previous values, $y_{-(p-1)}, y_{-(p-2)}, \ldots, y_{-1}, y_0$; they are referred to as initial values for the equation. The assumptions imply that we can think of the systematic part as the best linear prediction of $y_t$ given the past,
$$E[y_t \mid y_{t-1}, y_{t-2}, \ldots] = E[y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{t-p}] = \delta + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \ldots + \theta_p y_{t-p}.$$
We note that the first $p$ lags capture all the information in the past, and that the conditional expectation is a linear function of the information set.

3.1 The AR(1) Model

The analysis of the autoregressive model is more complicated than the analysis of moving average models, and to simplify the analysis we focus on the autoregressive model with $p = 1$ lag, the so-called first order autoregressive, AR(1), model:
$$y_t = \delta + \theta y_{t-1} + \epsilon_t. \qquad (4)$$
In this case the first lag captures all the information in the past.

The only exogenous (or forcing) variable in (4) is the error term, and the development of the time series $y_t$ is determined solely by the sequence of innovations $\epsilon_1, \ldots, \epsilon_T$. To make this point explicit, we can find the solution for $y_t$ in terms of the innovations and the initial value. To do this we recursively substitute the expressions for $y_{t-k}$, $k = 1, 2, \ldots$, to obtain the solution
$$y_t = \delta + \theta y_{t-1} + \epsilon_t$$
$$= \delta + \theta(\delta + \theta y_{t-2} + \epsilon_{t-1}) + \epsilon_t$$
$$= (1 + \theta)\delta + \epsilon_t + \theta\epsilon_{t-1} + \theta^2 y_{t-2}$$
$$= (1 + \theta)\delta + \epsilon_t + \theta\epsilon_{t-1} + \theta^2(\delta + \theta y_{t-3} + \epsilon_{t-2})$$
$$= (1 + \theta + \theta^2)\delta + \epsilon_t + \theta\epsilon_{t-1} + \theta^2\epsilon_{t-2} + \theta^3 y_{t-3}$$
$$\vdots$$
$$= (1 + \theta + \theta^2 + \theta^3 + \ldots + \theta^{t-1})\delta + \epsilon_t + \theta\epsilon_{t-1} + \theta^2\epsilon_{t-2} + \ldots + \theta^{t-1}\epsilon_1 + \theta^t y_0. \qquad (5)$$
We see that $y_t$ is given by a deterministic term, a moving average of past innovations, and a term involving the initial value. Due to the moving average structure the solution is often referred to as the moving average representation of $y_t$.

If we for a moment make the abstract assumption that the process $y_t$ started in the remote infinite past, then we may state the solution as an infinite sum,
$$y_t = (1 + \theta + \theta^2 + \theta^3 + \ldots)\delta + \epsilon_t + \theta\epsilon_{t-1} + \theta^2\epsilon_{t-2} + \ldots, \qquad (6)$$
which we recognize as an infinite moving average process. It follows from the result for infinite MA processes that the AR(1) process is stationary if the MA terms converge to zero so that the infinite sum converges. That is the case if $|\theta| < 1$, which is known as the stationarity condition for an AR(1) model. In the analysis of the AR(1) model below we assume stationarity and impose this condition. We emphasize that while a finite MA process is always stationary, stationarity of the AR process requires conditions on the parameters.

The properties of the time series $y_t$ can again be found directly from the moving average representation. The expectation of $y_t$ given the initial value is
$$E[y_t \mid y_0] = (1 + \theta + \theta^2 + \theta^3 + \ldots + \theta^{t-1})\delta + \theta^t y_0,$$
where the last term involving the initial value converges to zero as $t$ increases, $\theta^t y_0 \to 0$. The unconditional mean is the expectation of the convergent geometric series in (6), i.e.
$$E[y_t] = \frac{\delta}{1 - \theta} =: \mu.$$
This is not the constant term of the model, $\delta$; it also depends on the autoregressive parameter, $\theta$. We hasten to note that the constant unconditional mean is not defined if $\theta = 1$, but that case is ruled out by the stationarity condition.

The unconditional variance can also be found from the solution in (6). Using the definition we obtain
$$\gamma_0 := V(y_t) = E[(y_t - \mu)^2] = E[(\epsilon_t + \theta\epsilon_{t-1} + \theta^2\epsilon_{t-2} + \theta^3\epsilon_{t-3} + \ldots)^2]$$
$$= \sigma^2 + \theta^2\sigma^2 + \theta^4\sigma^2 + \theta^6\sigma^2 + \ldots = (1 + \theta^2 + \theta^4 + \theta^6 + \ldots)\,\sigma^2,$$
which is again a convergent geometric series with the limit
$$\gamma_0 = \frac{\sigma^2}{1 - \theta^2}.$$
The autocovariances can also be found from (6):
$$\gamma_1 := Cov(y_t, y_{t-1}) = E[(y_t - \mu)(y_{t-1} - \mu)]$$
$$= E[(\epsilon_t + \theta\epsilon_{t-1} + \theta^2\epsilon_{t-2} + \ldots)(\epsilon_{t-1} + \theta\epsilon_{t-2} + \theta^2\epsilon_{t-3} + \ldots)]$$
$$= \theta\sigma^2 + \theta^3\sigma^2 + \theta^5\sigma^2 + \theta^7\sigma^2 + \ldots = \theta\gamma_0,$$

where we again use that the covariances are zero: $E[\epsilon_t\epsilon_{t-h}] = 0$ for $h \neq 0$. Likewise it follows that
$$\gamma_k := Cov(y_t, y_{t-k}) = \theta^k \gamma_0.$$
The autocorrelation function, ACF, of a stationary AR(1) is given by
$$\rho_k := \frac{\gamma_k}{\gamma_0} = \frac{\theta^k\gamma_0}{\gamma_0} = \theta^k,$$
which is an exponentially decreasing function if $|\theta| < 1$. It is a general result that the autocorrelation function goes exponentially to zero for a stationary autoregressive time series.

Graphically the results imply that a stationary time series will fluctuate around a constant mean with a constant variance. Non-zero autocorrelations imply that consecutive observations are correlated and the fluctuations may be systematic, but over time the process will not deviate too much from the unconditional mean. This is often phrased as the process being mean reverting, and we also say that the process has an attractor, defined as a steady state level to which it will eventually return: in this case the unconditional mean, $\mu$.

Figure 2 shows examples of first order autoregressive processes. Note that the appearance depends more fundamentally on the autoregressive parameter. For the non-stationary case, $\theta = 1$, the process wanders arbitrarily up and down with no attractor; this is known as a random walk. If $|\theta| > 1$ the process for $y_t$ is explosive. We note that there may be marked differences between the true and estimated ACF in small samples.

3.2 The General Case

The results for a general AR($p$) model are most easily presented in terms of the lag polynomial introduced in Box 1. We rewrite the model in (3) as
$$y_t = \delta + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \ldots + \theta_p y_{t-p} + \epsilon_t$$
$$y_t - \theta_1 y_{t-1} - \theta_2 y_{t-2} - \ldots - \theta_p y_{t-p} = \delta + \epsilon_t$$
$$\theta(L)y_t = \delta + \epsilon_t, \qquad (7)$$
where $\theta(z) := 1 - \theta_1 z - \theta_2 z^2 - \ldots - \theta_p z^p$ is the autoregressive polynomial.

The AR($p$) model can be written as an infinite moving average model, MA($\infty$), if the autoregressive polynomial can be inverted. As presented in Box 1 we can write the inverse polynomial as
$$\theta^{-1}(z) = 1 + c_1 z + c_2 z^2 + c_3 z^3 + c_4 z^4 + \ldots,$$
where the coefficients converge to zero. It follows that we can write the AR($p$) model as
$$y_t = \theta^{-1}(L)(\delta + \epsilon_t)$$
$$= (1 + c_1 L + c_2 L^2 + c_3 L^3 + c_4 L^4 + \ldots)(\delta + \epsilon_t)$$
$$= (1 + c_1 + c_2 + c_3 + c_4 + \ldots)\delta + \epsilon_t + c_1\epsilon_{t-1} + c_2\epsilon_{t-2} + c_3\epsilon_{t-3} + \ldots, \qquad (8)$$

Box 1: Lag Polynomials and Characteristic Roots

A useful tool in time series analysis is the lag operator, $L$, which has the property that it lags a variable one period, i.e. $Ly_t = y_{t-1}$ and $L^2 y_t = y_{t-2}$. The lag operator is related to the well-known first difference operator, $\Delta = 1 - L$, and $\Delta y_t = (1 - L)y_t = y_t - y_{t-1}$. Using the lag operator we can write the AR(1) model as
$$y_t - \theta y_{t-1} = \delta + \epsilon_t \quad \Longleftrightarrow \quad (1 - \theta L)y_t = \delta + \epsilon_t,$$
where $\theta(z) := 1 - \theta z$ is a (first order) polynomial. For higher order autoregressive processes the polynomial will have the form $\theta(z) := 1 - \theta_1 z - \theta_2 z^2 - \ldots - \theta_p z^p$. Note that the polynomial evaluated in $z = 1$ is $\theta(1) = 1 - \theta_1 - \theta_2 - \ldots - \theta_p$, which is one minus the sum of the coefficients.

The characteristic equation is defined as the polynomial equation $\theta(z) = 0$, and the $p$ solutions, $z_1, \ldots, z_p$, are denoted the characteristic roots. For the AR(1) model the characteristic root is given by
$$\theta(z) = 1 - \theta z = 0 \quad \Longleftrightarrow \quad z = \theta^{-1},$$
which is just the inverse coefficient. The polynomial of an AR($p$) model has $p$ roots, and for $p \geq 2$ they may be complex numbers of the form $z_j = a_j \pm b_j\sqrt{-1}$. Recall that the roots can be used to factorize the polynomial, and the characteristic polynomial can be rewritten as
$$\theta(z) = 1 - \theta_1 z - \theta_2 z^2 - \ldots - \theta_p z^p = (1 - \phi_1 z)(1 - \phi_2 z)\cdots(1 - \phi_p z), \qquad (B1\text{-}1)$$
where $\phi_j = z_j^{-1}$ denotes an inverse root.

The usual results for convergent geometric series also hold for expressions involving the lag operator, and if $|\theta| < 1$ it holds that
$$1 + \theta L + \theta^2 L^2 + \theta^3 L^3 + \ldots = \frac{1}{1 - \theta L} =: \theta^{-1}(L), \qquad (B1\text{-}2)$$
where the right hand side is called the inverse polynomial. The inverse polynomial is infinite and it exists if the terms on the left hand side converge to zero, i.e. if $|\theta| < 1$. Higher order polynomials may also be inverted if the infinite sum converges. Based on the factorization in (B1-1) we note that the general polynomial $\theta(z)$ is invertible if each of the factors is invertible, i.e. if all the inverse roots are smaller than one in absolute value, $|\phi_j| < 1$, $j = 1, 2, \ldots, p$.

The general inverse polynomial has infinitely many terms,
$$\theta^{-1}(z) = 1 + c_1 z + c_2 z^2 + c_3 z^3 + c_4 z^4 + \ldots,$$
where the coefficients can be found as complicated functions of the autoregressive parameters: $c_1 = \theta_1$, $c_2 = c_1\theta_1 + \theta_2$, $c_3 = c_2\theta_1 + c_1\theta_2 + \theta_3$, $c_4 = c_3\theta_1 + c_2\theta_2 + c_1\theta_3 + \theta_4$, etc., where we insert $\theta_k = 0$ for $k > p$. Technically, the MA coefficients can be derived from the multiplication of the inverse factors of the form (B1-2). These calculations are straightforward but tedious, and will not be presented here.
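
The recursion for the MA coefficients and the stationarity check via inverse roots are simple to carry out numerically. The sketch below is a hypothetical illustration using numpy only: it computes the inverse roots of $\theta(z) = 1 - \theta_1 z - \ldots - \theta_p z^p$ and the first coefficients of $\theta^{-1}(z)$ for an arbitrary AR(2) example.

```python
import numpy as np

theta = [1.2, -0.4]                      # example AR(2) coefficients theta_1, theta_2

# Roots of theta(z) = 1 - theta_1 z - ... - theta_p z^p; np.roots wants the
# coefficients ordered from the highest power down to the constant.
poly = np.r_[-np.array(theta)[::-1], 1.0]
roots = np.roots(poly)
inv_roots = 1.0 / roots
print("inverse roots:", np.round(inv_roots, 3))
print("stationary:", np.all(np.abs(inv_roots) < 1))

# MA coefficients c_j of the inverse polynomial via the recursion in Box 1:
# c_j = c_{j-1} theta_1 + c_{j-2} theta_2 + ... (with theta_k = 0 for k > p).
def ma_coefficients(theta, n):
    c = [1.0]
    for j in range(1, n + 1):
        c.append(sum(theta[i - 1] * c[j - i] for i in range(1, min(j, len(theta)) + 1)))
    return np.array(c)

print("c_0..c_6:", np.round(ma_coefficients(theta, 6), 3))
```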

[Figure 2 panels: (A) $y_t = 0.5\,y_{t-1} + \epsilon_t$; (B) autocorrelation function for (A); (C) $y_t = 0.9\,y_{t-1} + \epsilon_t$; (D) autocorrelation function for (C); (E) $y_t = -0.9\,y_{t-1} + \epsilon_t$; (F) autocorrelation function for (E); (G) $y_t = y_{t-1} + \epsilon_t$; (H) $y_t = 1.05\,y_{t-1} + \epsilon_t$.]

Figure 2: Examples of simulated AR($p$) processes, with theoretical and estimated autocorrelation functions. Horizontal lines are the 95% confidence bounds for zero autocorrelations derived for an IID process.
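
A small simulation makes the role of the autoregressive parameter in Figure 2 concrete. The sketch below (an illustrative choice of sample size and seed, not the code behind the figure) generates AR(1) paths for stationary, random walk, and explosive values of $\theta$ and reports how volatile each path is and where it ends up.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100

def simulate_ar1(theta, T, y0=0.0):
    """Simulate y_t = theta * y_{t-1} + eps_t with standard normal shocks."""
    y = np.empty(T)
    prev = y0
    for t in range(T):
        prev = theta * prev + rng.normal()
        y[t] = prev
    return y

for theta in (0.5, 0.9, -0.9, 1.0, 1.05):
    path = simulate_ar1(theta, T)
    print(f"theta = {theta:5.2f}: sample std = {path.std():8.2f}, last value = {path[-1]:8.2f}")
```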

where we have used that $\delta$ is constant and $L\delta = \delta$. We recognize (8) as a moving average representation. We note that the AR($p$) model is stationary if the moving average coefficients converge to zero, and that is automatically the case for the inverse polynomial. We can therefore state the stationarity condition for an AR($p$) model as the requirement that the autoregressive polynomial is invertible, i.e. that the $p$ roots of the autoregressive polynomial are larger than one in absolute value, $|z_j| > 1$, or that the inverse roots are smaller than one, $|\phi_j| < 1$.

Note that the moving average coefficients measure the dynamic impact of a shock to the process,
$$\frac{\partial y_t}{\partial \epsilon_t} = 1, \quad \frac{\partial y_t}{\partial \epsilon_{t-1}} = c_1, \quad \frac{\partial y_t}{\partial \epsilon_{t-2}} = c_2, \ldots,$$
and the sequence of MA coefficients, $c_1, c_2, c_3, \ldots$, is also known as the impulse responses of the process. Stationarity implies that the impulse response function dies out eventually.

Again we can find the properties of the process from the MA representation. As an example, the constant mean is given by
$$\mu := E[y_t] = (1 + c_1 + c_2 + c_3 + c_4 + \ldots)\delta = \frac{\delta}{\theta(1)} = \frac{\delta}{1 - \theta_1 - \theta_2 - \ldots - \theta_p}.$$
For this to be defined we require that $z = 1$ is not a root of the autoregressive polynomial, but that is ensured by the stationarity condition. The variance is given by
$$\gamma_0 := V(y_t) = (1 + c_1^2 + c_2^2 + c_3^2 + c_4^2 + \ldots)\,\sigma^2.$$

3.3 ARMA and ARIMA Models

At this point it is worth emphasizing the duality between AR and MA models. We have seen that a stationary AR($p$) model can always be represented by an MA($\infty$) model because the autoregressive polynomial $\theta(z)$ can be inverted. We may also write the MA model using a lag polynomial, $y_t = \mu + \alpha(L)\epsilon_t$, where $\alpha(z) := 1 + \alpha_1 z + \ldots + \alpha_q z^q$ is a polynomial. If $\alpha(z)$ is invertible (i.e. if all the inverse roots of $\alpha(z) = 0$ are smaller than one), then the MA($q$) model can also be represented by an AR($\infty$) model. As a consequence we may approximate an MA model with an autoregression with many lags; or we may alternatively represent a long autoregression with a shorter MA model.

The AR($p$) and MA($q$) models can also be combined into a so-called autoregressive moving average, ARMA($p,q$), model,
$$y_t = \delta + \theta_1 y_{t-1} + \ldots + \theta_p y_{t-p} + \epsilon_t + \alpha_1\epsilon_{t-1} + \ldots + \alpha_q\epsilon_{t-q}.$$
This is a very flexible class of models that is capable of representing many different patterns of autocovariances. Again we can write the model in terms of lag polynomials as
$$\theta(L)(y_t - \mu) = \alpha(L)\epsilon_t,$$

Box 2: Autocorrelations from Yule-Walker Equations

The presented approach for calculation of autocovariances and autocorrelations is totally general, but it is sometimes difficult to apply by hand because it requires that the MA representation is derived. An alternative way to calculate autocorrelations is based on the so-called Yule-Walker equations, which are obtained directly from the model. To illustrate, this box derives the autocorrelations for an AR(2) model.

Consider the AR(2) model given by
$$y_t = \delta + \theta_1 y_{t-1} + \theta_2 y_{t-2} + \epsilon_t. \qquad (B2\text{-}1)$$
First we find the mean by taking expectations, $E[y_t] = \delta + \theta_1 E[y_{t-1}] + \theta_2 E[y_{t-2}] + E[\epsilon_t]$. Assuming stationarity, $E[y_t] = E[y_{t-1}]$, and it follows that
$$\mu := E[y_t] = \frac{\delta}{1 - \theta_1 - \theta_2}.$$
Next we define a new process as the deviation from the mean, $\tilde y_t := y_t - \mu$, so that
$$\tilde y_t = \theta_1\tilde y_{t-1} + \theta_2\tilde y_{t-2} + \epsilon_t. \qquad (B2\text{-}2)$$
Now remember that $V(y_t) = E[(y_t - \mu)^2] = E[\tilde y_t^2]$, and we do all calculations on (B2-2) rather than (B2-1). If we multiply both sides of (B2-2) with $\tilde y_t$ and take expectations, we find
$$E[\tilde y_t^2] = \theta_1 E[\tilde y_{t-1}\tilde y_t] + \theta_2 E[\tilde y_{t-2}\tilde y_t] + E[\epsilon_t\tilde y_t] \quad \Longleftrightarrow \quad \gamma_0 = \theta_1\gamma_1 + \theta_2\gamma_2 + \sigma^2, \qquad (B2\text{-}3)$$
where we have used the definitions and that $E[\epsilon_t\tilde y_t] = E[\epsilon_t(\theta_1\tilde y_{t-1} + \theta_2\tilde y_{t-2} + \epsilon_t)] = \sigma^2$. If we instead multiply with $\tilde y_{t-1}$, $\tilde y_{t-2}$, and $\tilde y_{t-3}$, we obtain
$$E[\tilde y_t\tilde y_{t-1}] = \theta_1 E[\tilde y_{t-1}\tilde y_{t-1}] + \theta_2 E[\tilde y_{t-2}\tilde y_{t-1}] + E[\epsilon_t\tilde y_{t-1}] \quad \Longleftrightarrow \quad \gamma_1 = \theta_1\gamma_0 + \theta_2\gamma_1 \qquad (B2\text{-}4)$$
$$E[\tilde y_t\tilde y_{t-2}] = \theta_1 E[\tilde y_{t-1}\tilde y_{t-2}] + \theta_2 E[\tilde y_{t-2}\tilde y_{t-2}] + E[\epsilon_t\tilde y_{t-2}] \quad \Longleftrightarrow \quad \gamma_2 = \theta_1\gamma_1 + \theta_2\gamma_0 \qquad (B2\text{-}5)$$
$$E[\tilde y_t\tilde y_{t-3}] = \theta_1 E[\tilde y_{t-1}\tilde y_{t-3}] + \theta_2 E[\tilde y_{t-2}\tilde y_{t-3}] + E[\epsilon_t\tilde y_{t-3}] \quad \Longleftrightarrow \quad \gamma_3 = \theta_1\gamma_2 + \theta_2\gamma_1. \qquad (B2\text{-}6)$$
The set of equations (B2-3)-(B2-6) is known as the Yule-Walker equations. To find the variance we can substitute $\gamma_1$ and $\gamma_2$ into (B2-3) and solve. This is a bit tedious, however, and will not be done here. To find the autocorrelations, $\rho_k := \gamma_k/\gamma_0$, just divide the Yule-Walker equations by $\gamma_0$ to obtain
$$\rho_1 = \theta_1 + \theta_2\rho_1, \quad \rho_2 = \theta_1\rho_1 + \theta_2, \quad \text{and} \quad \rho_k = \theta_1\rho_{k-1} + \theta_2\rho_{k-2}, \quad k \geq 3,$$
or, by collecting terms,
$$\rho_1 = \frac{\theta_1}{1 - \theta_2}, \quad \rho_2 = \frac{\theta_1^2}{1 - \theta_2} + \theta_2, \quad \text{and} \quad \rho_k = \theta_1\rho_{k-1} + \theta_2\rho_{k-2}, \quad k \geq 3.$$
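
The Yule-Walker recursion in Box 2 is easy to check numerically. The following sketch, with an arbitrary AR(2) parameterization, computes $\rho_1$, $\rho_2$ and the higher-order autocorrelations from the closed-form expressions above and compares them with the sample ACF of a long simulated series.

```python
import numpy as np

theta1, theta2 = 0.5, 0.3
rng = np.random.default_rng(2)

# Theoretical autocorrelations from the Yule-Walker equations
rho = [1.0, theta1 / (1.0 - theta2), theta1**2 / (1.0 - theta2) + theta2]
for k in range(3, 11):
    rho.append(theta1 * rho[k - 1] + theta2 * rho[k - 2])

# Long simulated AR(2) series for comparison
T = 20000
y = np.zeros(T)
for t in range(2, T):
    y[t] = theta1 * y[t - 1] + theta2 * y[t - 2] + rng.normal()

def sample_acf(x, k):
    x = x - x.mean()
    return (x[k:] * x[:-k]).sum() / (x * x).sum()

for k in (1, 2, 5, 10):
    print(f"k={k:2d}: Yule-Walker {rho[k]: .3f}   sample {sample_acf(y, k): .3f}")
```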

and there may exist both an AR representation, $\alpha^{-1}(L)\theta(L)(y_t - \mu) = \epsilon_t$, and an MA representation, $(y_t - \mu) = \theta^{-1}(L)\alpha(L)\epsilon_t$, both with infinitely many terms. In this way we may think of the ARMA model as a parsimonious representation of a given autocovariance structure, i.e. the representation that uses as few parameters as possible.

For $y_t$ to be stationary it is required that the inverse roots of the characteristic equation are all smaller than one. If there is a root at unity, a so-called unit root $\phi_1 = 1$, then the factorized polynomial can be written as
$$\theta(z) = (1 - z)(1 - \phi_2 z)\cdots(1 - \phi_p z). \qquad (9)$$
Noting that $\Delta = 1 - L$ is the first difference operator, the model can be written in terms of the first differenced variable,
$$\theta^*(L)\Delta y_t = \delta + \alpha(L)\epsilon_t, \qquad (10)$$
where the polynomial $\theta^*(z)$ is defined as the last $p - 1$ factors in (9). The general model in (10) is referred to as an integrated ARMA model, or an ARIMA($p,d,q$), with $p$ stationary autoregressive roots, $d$ first differences, and $q$ MA terms. We return to the properties of unit root processes and the testing for unit roots later in the course.

Example 1 (Danish house prices): To illustrate the construction of ARIMA models we consider the real Danish house prices, 1972:1-2004:2, defined as the log of the house price index divided by the consumer price index. Estimating a second order autoregressive model yields
$$p_t = 0.0034 + 1.545\,p_{t-1} - 0.565\,p_{t-2},$$
with the autoregressive polynomial given by $\theta(z) = 1 - 1.545\,z + 0.565\,z^2$. The $p = 2$ inverse roots of the polynomial are given by $\phi_1 = 0.953$ and $\phi_2 = 0.592$, and we can factorize the polynomial as
$$\theta(z) = 1 - 1.545\,z + 0.565\,z^2 = (1 - 0.953\,z)(1 - 0.592\,z).$$
We do not want to test for unit roots at this point, and we assume without testing that the first root is unity and use $(1 - 0.953\,L) \approx (1 - L) = \Delta$. Estimating an AR(1) model for $\Delta p_t$, which is the same as an ARIMA(1,1,0) model for $p_t$, we get the following results,
$$\Delta p_t = 0.0008369 + 0.544\,\Delta p_{t-1},$$
where the second root is basically unchanged.

4 ARMA Estimation and Model Selection

If we are willing to assume a specific distributional form for the error process, $\epsilon_t$, then it is natural to estimate the parameters using maximum likelihood. The most popular
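
The calculations in Example 1 can be mimicked without specialized software because, as noted in Section 4, conditional ML for a pure AR model reduces to OLS. The sketch below uses a simulated stand-in for the log real house price series (the actual Danish data are not reproduced here), estimates the AR(2) by OLS, computes the inverse roots, and then re-estimates an AR(1) on the first differences, i.e. an ARIMA(1,1,0).

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in series: a persistent AR(2) level, loosely mimicking a log price index
T = 130
p = np.zeros(T)
for t in range(2, T):
    p[t] = 0.003 + 1.54 * p[t - 1] - 0.56 * p[t - 2] + 0.01 * rng.normal()

def ols_ar(y, lags):
    """Regress y_t on a constant and y_{t-1},...,y_{t-lags}; return coefficients."""
    Y = y[lags:]
    X = np.column_stack([np.ones(len(Y))] + [y[lags - j:-j] for j in range(1, lags + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta                     # [delta, theta_1, ..., theta_lags]

delta, th1, th2 = ols_ar(p, 2)
inv_roots = 1.0 / np.roots([-th2, -th1, 1.0])   # roots of 1 - th1*z - th2*z^2
print("AR(2) estimates:", round(delta, 4), round(th1, 3), round(th2, 3))
print("inverse roots (abs):", np.round(np.abs(inv_roots), 3))

# ARIMA(1,1,0): AR(1) estimated on the first differences
dp = np.diff(p)
delta_d, th_d = ols_ar(dp, 1)
print("AR(1) on first differences:", round(delta_d, 4), round(th_d, 3))
```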

assumption in macro-econometrics is the assumption of normal errors. In this case the log-likelihood function has the well-known form
$$\log L(\theta, \sigma^2) = -\frac{T}{2}\log(2\pi\sigma^2) - \sum_{t=1}^{T}\frac{\epsilon_t^2}{2\sigma^2}, \qquad (11)$$
where $\theta$ contains the parameters to be estimated in the conditional mean and $\sigma^2$ is the error variance. Other distributional assumptions can also be used, and in financial econometrics it is often preferred to rely on error distributions with more probability mass in the tails of the distribution, i.e. with a higher probability of extreme observations. A popular choice is the Student $t(v)$ distribution, where $v$ is the number of degrees of freedom. Low degrees of freedom give a heavy-tailed distribution, and for $v \to \infty$ the $t(v)$ distribution approaches the normal distribution. In a likelihood analysis we may even treat $v$ as a parameter and estimate the degrees of freedom in the $t$ distribution.

The autoregressive model is basically a linear regression, and estimation is particularly simple. We can write the error in terms of the observed variables,
$$\epsilon_t = y_t - \delta - \theta_1 y_{t-1} - \theta_2 y_{t-2} - \ldots - \theta_p y_{t-p}, \quad t = 1, 2, \ldots, T,$$
insert in (11), and maximize the likelihood function over $(\delta, \theta_1, \theta_2, \ldots, \theta_p, \sigma^2)'$ given the observed data. We note that the analysis is conditional on the initial values and that the ML estimator coincides with the OLS estimator for this case.

For the moving average model the idea is the same, but it is more complicated to express the likelihood function in terms of observed data. Consider for illustration the MA(1) model given by $y_t = \mu + \epsilon_t + \alpha\epsilon_{t-1}$. To express the sequence of error terms, $\epsilon_1, \ldots, \epsilon_T$, as a function of the observed data, $y_1, \ldots, y_T$, we solve recursively for the error terms
$$\epsilon_1 = y_1 - \mu$$
$$\epsilon_2 = y_2 - \mu - \alpha\epsilon_1 = y_2 - \alpha y_1 - \mu + \alpha\mu$$
$$\epsilon_3 = y_3 - \mu - \alpha\epsilon_2 = y_3 - \alpha y_2 + \alpha^2 y_1 - \mu + \alpha\mu - \alpha^2\mu$$
$$\epsilon_4 = y_4 - \mu - \alpha\epsilon_3 = y_4 - \alpha y_3 + \alpha^2 y_2 - \alpha^3 y_1 - \mu + \alpha\mu - \alpha^2\mu + \alpha^3\mu$$
$$\vdots$$
where we have assumed a zero initial value, $\epsilon_0 = 0$. The resulting log-likelihood is a complicated non-linear function of the parameters $(\mu, \alpha, \sigma^2)'$, but it can be maximized using numerical algorithms to produce the ML estimates.
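
The recursive construction of the MA(1) errors translates directly into a conditional log-likelihood that can be maximized numerically. The sketch below is a minimal illustration using scipy's general-purpose optimizer (not the exact algorithm used by econometric packages); the data are simulated so the true values $(\mu, \alpha, \sigma) = (0.5, 0.6, 1.0)$ are known.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, mu_true, alpha_true, sigma_true = 500, 0.5, 0.6, 1.0
eps = rng.normal(0.0, sigma_true, n + 1)
y = mu_true + eps[1:] + alpha_true * eps[:-1]

def negative_loglik(params, y):
    mu, alpha, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)
    # Recover eps_t recursively, assuming eps_0 = 0 (conditional likelihood)
    eps_hat = np.empty(len(y))
    prev = 0.0
    for t in range(len(y)):
        prev = y[t] - mu - alpha * prev
        eps_hat[t] = prev
    T = len(y)
    loglik = -0.5 * T * np.log(2 * np.pi * sigma2) - (eps_hat**2).sum() / (2 * sigma2)
    return -loglik

res = minimize(negative_loglik, x0=[0.0, 0.0, 0.0], args=(y,), method="BFGS")
mu_hat, alpha_hat, log_sigma_hat = res.x
print("ML estimates: mu =", round(mu_hat, 3), " alpha =", round(alpha_hat, 3),
      " sigma =", round(np.exp(log_sigma_hat), 3))
```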

4.1 Model Selection

In empirical applications it is necessary to choose the lag orders, $p$ and $q$, for the ARMA model. If we have a list of potential models, e.g.
$$\text{ARMA}(1,1): \quad y_t - \theta y_{t-1} = \delta + \epsilon_t + \alpha\epsilon_{t-1}$$
$$\text{AR}(1): \quad y_t - \theta y_{t-1} = \delta + \epsilon_t$$
$$\text{MA}(1): \quad y_t = \delta + \epsilon_t + \alpha\epsilon_{t-1},$$
then we should find a way to choose the most relevant one for a given data set. This is known as model selection in the literature. There are two different approaches: general-to-specific (GETS) testing and model selection based on information criteria.

If the models are nested, i.e. if one model is a special case of a larger model, then we may use standard likelihood ratio (LR) testing to evaluate the reduction from the large to the small model. In the above example, it holds that the AR(1) model is nested within the ARMA(1,1), written as AR(1) $\subset$ ARMA(1,1), where the reduction imposes the restriction $\alpha = 0$. This restriction can be tested by an LR test, and we may reject or accept the reduction to an AR(1) model. Likewise it holds that MA(1) $\subset$ ARMA(1,1), and we may test the restriction $\theta = 0$ using an LR test to judge the reduction from the ARMA(1,1) to the MA(1). If both reductions are accepted, however, it is not clear how to choose between the AR(1) model and the MA(1) model. The models are not nested (you cannot impose a restriction on one model to get the other) and standard test theory will not apply. In practice we can estimate both models and compare the fit of the models and the outcome of misspecification testing, but there is no valid formal test between the models.

An alternative approach is called model selection, and it is valid also for non-nested models with the same regressand. We know that the more parameters we allow in a model, the smaller is the residual variance. To obtain a parsimonious model we therefore want to balance the model fit against the complexity of the model. This balance can be measured by a so-called information criterion that takes the log-likelihood and subtracts a penalty for the number of parameters, i.e.
$$IC = \log\hat\sigma^2 + \text{penalty}(T, \#\text{parameters}).$$
A small value indicates a more favorable trade-off, and model selection could be based on minimizing the information criterion. Different criteria have been proposed based on different penalty functions. Three important examples are the Akaike, the Hannan-Quinn, and the Schwarz Bayesian criteria, defined as, respectively,
$$AIC = \log\hat\sigma^2 + \frac{2k}{T}$$
$$HQ = \log\hat\sigma^2 + \frac{2k\log(\log(T))}{T}$$
$$BIC = \log\hat\sigma^2 + \frac{k\log(T)}{T},$$

where $k$ is the number of estimated parameters, e.g. $k = p + q + 1$. The idea of model selection is to choose the model with the smallest information criterion, i.e. the best combination of fit and parsimony. It is worth emphasizing that different information criteria will not necessarily give the same preferred model, and the model selection may not agree with the GETS testing. In practice it is therefore often difficult to make firm choices, and with several candidate models it is a sound principle to ensure that the conclusions you draw from an analysis are robust to the chosen model.

A final and less formal approach to identification of $p$ and $q$ in the ARMA($p,q$) model is based directly on the shape of the autocorrelation function. Recall from Lecture Note 1 that the partial autocorrelation function, PACF, is defined as the autocorrelation conditional on intermediate lags, $Corr(y_t, y_{t-k} \mid y_{t-1}, y_{t-2}, \ldots, y_{t-k+1})$. It follows directly from the definition that an AR($p$) model will have $p$ significant partial autocorrelations and the PACF is zero for lags $k > p$; at the same time it holds that the ACF exhibits an exponential decay. For the MA($q$) model we observe the reverse pattern: we know that the first $q$ entries in the ACF are non-zero while the ACF is zero for $k > q$; and if we write the MA model as an AR($\infty$) we expect the PACF to decay exponentially. The AR and MA models are therefore mirror images, and by looking at the ACF and PACF we could get an idea of the appropriate values for $p$ and $q$. This methodology is known as the Box-Jenkins identification procedure, and it was very popular in times when ARMA models were hard to estimate. With today's computers it is probably easier to test formally on the parameters than informally on their implications in terms of autocorrelation patterns.

Example 2 (Danish consumption-income ratio): To illustrate model selection we consider the log of the Danish quarterly consumption-to-income ratio, 1971:1-2003:2. This is also the inverse savings rate. The time series is illustrated in Figure 3 (A), and the autocorrelation functions are given in (B). The PACF suggests that the first autoregressive coefficient is strongly significant while the second is more borderline. Due to the exponential decay of the ACF, implied by the autoregressive coefficient, it is hard to assess the presence of MA terms; this is often the case with the Box-Jenkins identification. We therefore estimate an ARMA(2,2),
$$y_t - \theta_1 y_{t-1} - \theta_2 y_{t-2} = \delta + \epsilon_t + \alpha_1\epsilon_{t-1} + \alpha_2\epsilon_{t-2},$$
and all sub-models obtained by imposing restrictions on the parameters $(\theta_1, \theta_2, \alpha_1, \alpha_2)'$. Estimation results and summary statistics for the models are given in Table 1. All three information criteria are minimized for the purely autoregressive AR(2) model, but the value for the mixed ARMA(1,1) is by and large identical. The reductions from the ARMA(2,2) to these two models are easily accepted by LR tests. Both models seem to give a good description of the covariance structure of the data, and based on the output in Table 1 it is hard to make a firm choice.
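
Scanning a small grid of ARMA orders and comparing information criteria is routine with modern software. The sketch below is an illustration on simulated data rather than the Danish series, using statsmodels; note that statsmodels reports the criteria on the $-2\log L + \text{penalty}$ scale rather than the per-observation scale used in the formulas above, but the ranking of models is the same.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
T = 300
y = np.zeros(T)
for t in range(2, T):                     # simulate an AR(2) as the "true" model
    y[t] = 0.5 * y[t - 1] + 0.25 * y[t - 2] + rng.normal()

results = []
for p in range(3):
    for q in range(3):
        fit = ARIMA(y, order=(p, 0, q), trend="c").fit()
        results.append((p, q, fit.aic, fit.hqic, fit.bic))

print(" p  q       AIC        HQ       BIC")
for p, q, aic, hqic, bic in results:
    print(f"{p:2d} {q:2d} {aic:9.2f} {hqic:9.2f} {bic:9.2f}")

best = min(results, key=lambda r: r[4])   # pick the model with the smallest BIC
print("BIC selects ARMA(%d,%d)" % (best[0], best[1]))
```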

ARMA(p,q)              (2,2)      (2,1)      (2,0)      (1,2)      (1,1)      (1,0)      (0,0)
theta_1                1.418      0.573      0.536      0.833      0.857      0.715       ...
                      (4.28)     (1.70)     (6.36)     (12.2)     (15.2)     (11.7)
theta_2               -0.516      0.224      0.251       ...        ...        ...        ...
                     (-1.82)     (0.89)     (2.95)
alpha_1               -0.899     -0.040       ...      -0.304     -0.301       ...        ...
                     (-2.80)    (-0.11)               (-2.78)    (-3.06)
alpha_2                0.308       ...        ...       0.085       ...        ...        ...
                      (2.13)                           (0.95)
mu                    -0.094     -0.094     -0.094     -0.094     -0.094     -0.094     -0.095
                     (-11.1)    (-9.77)    (-9.87)    (-9.87)    (-9.44)    (-12.6)    (-30.4)
log-likelihood       300.822    300.395    300.389    300.428    299.993    296.174    249.826
BIC                   -4.441     -4.472     -4.509     -4.472     -4.503     -4.482     -3.806
HQ                    -4.506     -4.524     -4.548     -4.525     -4.542     -4.508     -3.819
AIC                   -4.551     -4.560     -4.575     -4.560     -4.569     -4.526     -3.828
Normality             [0.70]     [0.83]     [0.84]     [0.81]     [0.84]     [0.80]     [0.16]
No-autocorrelation    [0.45]     [0.64]     [0.72]     [0.63]     [0.66]     [0.20]     [0.00]

Table 1: Estimation results for ARMA(2,2) and sub-models. Figures in parentheses are t-ratios. Figures in square brackets are p-values for misspecification tests. Estimation is done using the ARFIMA package in PcGive, which uses a slightly more complicated treatment of initial values than that presented in the present text.

[Figure 3 panels: (A) consumption-income ratio; (B) autocorrelation function for (A), showing ACF and PACF; (C) forecasts from an AR(2), with forecasts and actual values; (D) AR(2) and ARMA(1,1) forecasts with actual values.]

Figure 3: Example based on Danish consumption-income ratio.

5 Univariate Forecasting

It is straightforward to forecast with univariate ARMA models, and the simple structure allows you to produce forecasts based solely on the past of the process. The obtained forecasts are plain extrapolations from the systematic part of the process, and they will contain no economic insight. In particular, it is very rarely possible to predict turning points, i.e. business cycle changes, based on a single time series. The forecast may nevertheless be helpful in analyzing the direction of future movement in a time series, all other things equal.

The object of interest is a prediction of $y_{T+k}$ given the information up to time $T$. Formally we define the information set available at time $T$ as $I_T = \{\ldots, y_{T-1}, y_T\}$, and we define the optimal predictor as the conditional expectation $y_{T+k|T} := E[y_{T+k} \mid I_T]$.

To illustrate the idea we consider the case of an ARMA(1,1) model,
$$y_t = \delta + \theta y_{t-1} + \epsilon_t + \alpha\epsilon_{t-1},$$
for $t = 1, 2, \ldots, T$. To forecast the next observation, $y_{T+1}$, we write the equation
$$y_{T+1} = \delta + \theta y_T + \epsilon_{T+1} + \alpha\epsilon_T,$$
and the best prediction is the conditional expectation of the right-hand side. We note that $y_T$ and $\epsilon_T = y_T - \delta - \theta y_{T-1} - \alpha\epsilon_{T-1}$ are in the information set at time $T$, while the best prediction of future shocks is zero, $E[\epsilon_{T+k} \mid I_T] = 0$ for $k > 0$. We find the predictions
$$y_{T+1|T} = E[\delta + \theta y_T + \epsilon_{T+1} + \alpha\epsilon_T \mid I_T] = \delta + \theta y_T + \alpha\epsilon_T$$
$$y_{T+2|T} = E[\delta + \theta y_{T+1} + \epsilon_{T+2} + \alpha\epsilon_{T+1} \mid I_T] = \delta + \theta E[y_{T+1} \mid I_T] = \delta + \theta y_{T+1|T}$$
$$y_{T+3|T} = E[\delta + \theta y_{T+2} + \epsilon_{T+3} + \alpha\epsilon_{T+2} \mid I_T] = \delta + \theta E[y_{T+2} \mid I_T] = \delta + \theta y_{T+2|T}.$$
We note that the error term, $\epsilon_T$, affects the first-period forecast due to the MA(1) structure; after that the first order autoregressive structure takes over and the forecasts will converge exponentially towards the unconditional expectation, $\mu$. In practice we replace the true parameters with the estimators, $\hat\theta$, $\hat\alpha$, and $\hat\delta$, and the true errors with the estimated residuals, $\hat\epsilon_1, \ldots, \hat\epsilon_T$, which produces the feasible forecasts, $\hat y_{T+k|T}$.

5.1 Forecast Accuracy

The forecasts above are point forecasts, i.e. the best point predictions given by the model. Often it is of interest to assess the variances of these forecasts and produce confidence bounds or a distribution of the forecasts.
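
The forecast recursion for the ARMA(1,1) can be written in a few lines. The sketch below is a bare-bones illustration with made-up parameter values and a made-up last observation and residual; it simply applies the formulas above, so after the first step the forecasts decay geometrically towards $\mu = \delta/(1 - \theta)$.

```python
delta, theta, alpha = 0.2, 0.7, 0.4      # assumed ARMA(1,1) parameters
y_T, eps_T = 1.5, 0.3                    # last observation and last residual

forecasts = []
y_prev = delta + theta * y_T + alpha * eps_T   # one-step forecast uses eps_T
forecasts.append(y_prev)
for k in range(2, 11):                   # further steps: pure AR(1) recursion
    y_prev = delta + theta * y_prev
    forecasts.append(y_prev)

mu = delta / (1.0 - theta)               # unconditional mean (long-run attractor)
for k, f in enumerate(forecasts, start=1):
    print(f"y_(T+{k}|T) = {f:.3f}")
print(f"unconditional mean mu = {mu:.3f}")
```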

To analyze forecast errors, consider first an MA($q$) model,
$$y_t = \mu + \epsilon_t + \alpha_1\epsilon_{t-1} + \alpha_2\epsilon_{t-2} + \ldots + \alpha_q\epsilon_{t-q}.$$
The sequence of forecasts is given by
$$y_{T+1|T} = \mu + \alpha_1\epsilon_T + \alpha_2\epsilon_{T-1} + \ldots + \alpha_q\epsilon_{T-q+1}$$
$$y_{T+2|T} = \mu + \alpha_2\epsilon_T + \ldots + \alpha_q\epsilon_{T-q+2}$$
$$y_{T+3|T} = \mu + \alpha_3\epsilon_T + \ldots + \alpha_q\epsilon_{T-q+3}$$
$$\vdots$$
$$y_{T+q|T} = \mu + \alpha_q\epsilon_T$$
$$y_{T+q+1|T} = \mu,$$
where the information set improves the predictions for $q$ periods. The corresponding forecast errors are given by the error terms not in the information set,
$$y_{T+1} - y_{T+1|T} = \epsilon_{T+1}$$
$$y_{T+2} - y_{T+2|T} = \epsilon_{T+2} + \alpha_1\epsilon_{T+1}$$
$$y_{T+3} - y_{T+3|T} = \epsilon_{T+3} + \alpha_1\epsilon_{T+2} + \alpha_2\epsilon_{T+1}$$
$$\vdots$$
$$y_{T+q} - y_{T+q|T} = \epsilon_{T+q} + \alpha_1\epsilon_{T+q-1} + \ldots + \alpha_{q-1}\epsilon_{T+1}$$
$$y_{T+q+1} - y_{T+q+1|T} = \epsilon_{T+q+1} + \alpha_1\epsilon_{T+q} + \ldots + \alpha_q\epsilon_{T+1}.$$
We can find the variance of the forecast as the expected squared forecast error, i.e.
$$C_1 = E[\epsilon_{T+1}^2 \mid I_T] = \sigma^2$$
$$C_2 = E[(\epsilon_{T+2} + \alpha_1\epsilon_{T+1})^2 \mid I_T] = (1 + \alpha_1^2)\,\sigma^2$$
$$C_3 = E[(\epsilon_{T+3} + \alpha_1\epsilon_{T+2} + \alpha_2\epsilon_{T+1})^2 \mid I_T] = (1 + \alpha_1^2 + \alpha_2^2)\,\sigma^2$$
$$\vdots$$
$$C_q = E[(\epsilon_{T+q} + \alpha_1\epsilon_{T+q-1} + \ldots + \alpha_{q-1}\epsilon_{T+1})^2 \mid I_T] = (1 + \alpha_1^2 + \alpha_2^2 + \ldots + \alpha_{q-1}^2)\,\sigma^2$$
$$C_{q+1} = E[(\epsilon_{T+q+1} + \alpha_1\epsilon_{T+q} + \ldots + \alpha_q\epsilon_{T+1})^2 \mid I_T] = (1 + \alpha_1^2 + \alpha_2^2 + \ldots + \alpha_q^2)\,\sigma^2.$$
Note that the forecast error variance increases with the forecast horizon, and that the forecast variances converge to the unconditional variance of $y_t$, see equation (2). This result just reflects that the information set, $I_T$, is useless for predictions in the remote future: the best prediction will be the unconditional mean, $\mu$, and the uncertainty is the unconditional variance, $C_\infty = \gamma_0$.

Assuming that the error term is normally distributed, we may produce 95% confidence bounds for the forecasts as $y_{T+h|T} \pm 1.96\sqrt{C_h}$, where 1.96 is the 97.5% quantile of the standard normal distribution. Alternatively we may give a full distribution of the forecasts as $N(y_{T+h|T}, C_h)$.
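
The forecast error variances $C_h$ follow directly from the MA($\infty$) coefficients, so for an AR(1) they can be accumulated in a loop. The sketch below uses assumed parameter values (an illustration, not estimates from any data) and prints 95% bands $y_{T+h|T} \pm 1.96\sqrt{C_h}$; the bands widen towards the unconditional standard deviation $\sigma/\sqrt{1-\theta^2}$.

```python
import numpy as np

delta, theta, sigma = 0.2, 0.7, 1.0      # assumed AR(1) parameters
y_T = 1.5

H = 12
point, C = [], []
var_h, f_h = 0.0, y_T
for h in range(1, H + 1):
    f_h = delta + theta * f_h                      # point forecast recursion
    var_h += theta ** (2 * (h - 1)) * sigma**2     # C_h = (1 + theta^2 + ... + theta^(2(h-1))) sigma^2
    point.append(f_h)
    C.append(var_h)

for h in range(H):
    lo = point[h] - 1.96 * np.sqrt(C[h])
    hi = point[h] + 1.96 * np.sqrt(C[h])
    print(f"h={h+1:2d}: forecast {point[h]:6.3f}   95% band [{lo:6.3f}, {hi:6.3f}]")
print("unconditional std:", round(sigma / np.sqrt(1 - theta**2), 3))
```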

To derive the forecast error variance for AR and ARMA models we just write the models in their MA($\infty$) form and use the derivations above. For the case of a simple AR(1) model the infinite MA representation is given in (6), and the forecast error variances are found to be
$$C_1 = \sigma^2, \quad C_2 = (1 + \theta^2)\,\sigma^2, \quad C_3 = (1 + \theta^2 + \theta^4)\,\sigma^2, \ldots$$
where we again note that the forecast error variances converge to the unconditional variance, $\gamma_0$.

Example 3 (Danish consumption-income ratio): For the Danish consumption-income ratio, we saw that the AR(2) model and the ARMA(1,1) gave by and large identical in-sample results. Figure 3 (C) shows the out-of-sample forecasts for the AR(2) model. We note that the forecasts are very smooth, just describing an exponential convergence back to the unconditional mean, the attractor in the stationary model. The shaded area is the distribution of the forecast, and the widest band corresponds to 95% confidence. Graph (D) compares the forecasts from the AR(2) and the ARMA(1,1) models as well as their 95% confidence bands. The two sets of forecasts are very similar, and for practical use it does not matter which one we choose; this is reassuring as the choice between the models was very difficult.

6 The Autoregressive Distributed Lag Model

The models considered so far were all univariate in the sense that they focus on a single variable, $y_t$. In most situations, however, we are primarily interested in the interrelationships between variables. As an example we could be interested in the dynamic responses to an intervention in a given variable, i.e. in the dynamic multipliers
$$\frac{\partial y_t}{\partial x_t}, \quad \frac{\partial y_{t+1}}{\partial x_t}, \quad \frac{\partial y_{t+2}}{\partial x_t}, \ldots$$
A univariate model will not suffice for this purpose, and the object of interest is the conditional model for $y_t \mid y_{t-1}, y_{t-2}, \ldots, x_t, x_{t-1}, x_{t-2}, \ldots$. Under the assumption that the conditional mean is a linear function, a natural starting point is the unrestricted autoregressive distributed lag (ADL) model defined as
$$y_t = \delta + \theta_1 y_{t-1} + \phi_0 x_t + \phi_1 x_{t-1} + \epsilon_t, \qquad (12)$$
for $t = 1, 2, \ldots, T$ and $\epsilon_t \sim IID(0, \sigma^2)$. To simplify the notation we consider the model with one explanatory variable and one lag in $y_t$ and $x_t$, but the model can easily be extended to more general cases.

By referring to the results for an AR(1) process, we immediately see that the process $y_t$ is stationary if $|\theta_1| < 1$ and $x_t$ is a stationary process. The first condition excludes unit roots in the equation (12), while the second condition states that the forcing variable,

$x_t$, is also stationary. The results from Lecture Note 2 hold for estimation and inference in the model; here we are interested in the dynamic properties of the model under the stationarity condition.

Before we go into details with the dynamic properties of the ADL model, we want to emphasize that the model is quite general and contains other relevant models as special cases. The AR(1) model analyzed above prevails if $\phi_0 = \phi_1 = 0$. A static regression with IID errors is obtained if $\theta_1 = \phi_1 = 0$. A static regression with AR(1) autocorrelated errors is obtained if $\phi_1 = -\theta_1\phi_0$; this is the common factor restriction discussed in Lecture Note 2. Finally, a model in first differences is obtained if $\theta_1 = 1$ and $\phi_0 = -\phi_1$. Whether these special cases are relevant can be analyzed with standard Wald or LR tests on the coefficients.

6.1 Dynamic and Long-Run Multipliers

To derive the dynamic multipliers, we write the equations for different observations,
$$y_t = \delta + \theta_1 y_{t-1} + \phi_0 x_t + \phi_1 x_{t-1} + \epsilon_t$$
$$y_{t+1} = \delta + \theta_1 y_t + \phi_0 x_{t+1} + \phi_1 x_t + \epsilon_{t+1}$$
$$y_{t+2} = \delta + \theta_1 y_{t+1} + \phi_0 x_{t+2} + \phi_1 x_{t+1} + \epsilon_{t+2},$$
and state the dynamic multipliers as the derivatives:
$$\frac{\partial y_t}{\partial x_t} = \phi_0$$
$$\frac{\partial y_{t+1}}{\partial x_t} = \theta_1\frac{\partial y_t}{\partial x_t} + \phi_1 = \theta_1\phi_0 + \phi_1$$
$$\frac{\partial y_{t+2}}{\partial x_t} = \theta_1\frac{\partial y_{t+1}}{\partial x_t} = \theta_1(\theta_1\phi_0 + \phi_1)$$
$$\frac{\partial y_{t+3}}{\partial x_t} = \theta_1\frac{\partial y_{t+2}}{\partial x_t} = \theta_1^2(\theta_1\phi_0 + \phi_1)$$
$$\vdots$$
$$\frac{\partial y_{t+k}}{\partial x_t} = \theta_1^{k-1}(\theta_1\phi_0 + \phi_1).$$
Under the stationarity condition, $|\theta_1| < 1$, shocks have only transitory effects,
$$\frac{\partial y_{t+k}}{\partial x_t} \to 0 \quad \text{as } k \to \infty.$$
We think of the sequence of multipliers as the impulse responses to a temporary change in $x_t$, and for stationary variables it is natural that the long-run effect is zero. To illustrate the flexibility of the ADL model with one lag, Figure 4 (A) reports examples of dynamic multipliers. In all cases the contemporaneous impact is $\partial y_t/\partial x_t = 0.8$, but the dynamic profile can be fundamentally different depending on the parameters.

Now consider a permanent shift in $x_t$, so that $E[x_t]$ changes. To find the long-run

multiplier we take expectations to obtain
$$y_t = \delta + \theta_1 y_{t-1} + \phi_0 x_t + \phi_1 x_{t-1} + \epsilon_t$$
$$E[y_t] = \delta + \theta_1 E[y_{t-1}] + \phi_0 E[x_t] + \phi_1 E[x_{t-1}]$$
$$E[y_t](1 - \theta_1) = \delta + (\phi_0 + \phi_1)E[x_t]$$
$$E[y_t] = \frac{\delta}{1 - \theta_1} + \frac{\phi_0 + \phi_1}{1 - \theta_1}E[x_t]. \qquad (13)$$
In the derivation we have used the stationarity of $y_t$ and $x_t$, i.e. that $E[y_t] = E[y_{t-1}]$ and $E[x_t] = E[x_{t-1}]$. Based on (13) we find the long-run multiplier as the derivative
$$\frac{\partial E[y_t]}{\partial E[x_t]} = \frac{\phi_0 + \phi_1}{1 - \theta_1}. \qquad (14)$$
It also holds that the long-run multiplier is the sum of the short-run effects, i.e.
$$\frac{\partial y_t}{\partial x_t} + \frac{\partial y_{t+1}}{\partial x_t} + \frac{\partial y_{t+2}}{\partial x_t} + \ldots = \phi_0 + (\theta_1\phi_0 + \phi_1) + \theta_1(\theta_1\phi_0 + \phi_1) + \theta_1^2(\theta_1\phi_0 + \phi_1) + \ldots$$
$$= \phi_0(1 + \theta_1 + \theta_1^2 + \ldots) + \phi_1(1 + \theta_1 + \theta_1^2 + \ldots) = \frac{\phi_0 + \phi_1}{1 - \theta_1}.$$
For stationary variables we may define a steady state as $y_t^* = y_{t-1}^*$ and $x_t^* = x_{t-1}^*$. Inserting this in the ADL model we get the so-called long-run solution:
$$y_t^* = \frac{\delta}{1 - \theta_1} + \frac{\phi_0 + \phi_1}{1 - \theta_1}x_t^* = \mu + \beta x_t^*, \qquad (15)$$
which is the steady state version of the equation in (13), and we see that the steady state derivative is exactly the long-run multiplier in (14). Figure 4 (B) reports examples of cumulated dynamic multipliers, which picture the convergence towards the long-run solutions. The first period impact is in all cases $\partial y_t/\partial x_t = 0.8$, while the long-run impact is $\beta$, which depends on the parameters of the model.

Example 4 (Danish income and consumption): To illustrate the impact of income changes on changes in consumption, we consider time series for the quarter-to-quarter growth in Danish private aggregate consumption, $\Delta c_t$, and private disposable income, $\Delta y_t$, for 1971:1-2004:2, see Figure 4 (C). To analyze the interdependencies between the variables we estimate an ADL model with one lag and obtain the equation
$$\Delta c_t = \underset{(2.09)}{0.003} - \underset{(-3.88)}{0.328}\,\Delta c_{t-1} + \underset{(3.80)}{0.229}\,\Delta y_t + \underset{(1.48)}{0.092}\,\Delta y_{t-1}, \qquad (16)$$
where the numbers in parentheses are $t$-ratios. Apart from a number of outliers, the model appears to be well specified, and the misspecification test for no autocorrelation cannot be rejected. The residuals are reported in graph (D). The long-run solution is given by (15) with coefficients
$$\mu = \frac{\delta}{1 - \theta_1} = \frac{0.003}{1 + 0.328} = 0.0024 \quad \text{and} \quad \beta = \frac{\phi_0 + \phi_1}{1 - \theta_1} = \frac{0.229 + 0.092}{1 + 0.328} = 0.242.$$

[Figure 4 panels: (A) dynamic multipliers in an ADL(1,1) for $y_t = 0.5\,y_{t-1} + 0.8\,x_t + 0.2\,x_{t-1} + \epsilon_t$, $y_t = 0.9\,y_{t-1} + 0.8\,x_t + 0.2\,x_{t-1} + \epsilon_t$, $y_t = 0.5\,y_{t-1} + 0.8\,x_t + 0.8\,x_{t-1} + \epsilon_t$, and $y_t = 0.5\,y_{t-1} + 0.8\,x_t - 0.6\,x_{t-1} + \epsilon_t$; (B) cumulated dynamic multipliers from (A); (C) growth in Danish consumption and income; (D) standardized residuals from the ADL model; (E) dynamic multipliers (scaled): lag weights and cumulated multipliers; (F) forecasts from the ADL model with actual values.]

Figure 4: (A)-(B): Examples of dynamic multipliers for an ADL model with one lag. (C)-(D): Empirical example based on Danish income and consumption.

The standard errors of $\mu$ and $\beta$ are complicated functions of the original variances and covariances, but PcGive supplies them automatically, and we find that $t_{\mu=0} = 2.10$ and $t_{\beta=0} = 3.31$, so both long-run coefficients are significantly different from zero at a 5% level. The impulse responses, $\partial\Delta c_t/\partial\Delta y_t$, $\partial\Delta c_{t+1}/\partial\Delta y_t$, $\ldots$, are reported in Figure 4 (E). We note that the contemporaneous impact is 0.229 from (16). The cumulated response is also presented in graph (E); it converges to the long-run multiplier, $\beta = 0.242$, which in this case is very close to the first period impact.
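
The dynamic and cumulated multipliers in Figure 4 follow from the recursion derived above and can be tabulated for any parameterization. The sketch below uses the point estimates from Example 4 ($\theta_1 = -0.328$, $\phi_0 = 0.229$, $\phi_1 = 0.092$) and checks that the cumulated response converges to the long-run multiplier $\beta = (\phi_0 + \phi_1)/(1 - \theta_1)$.

```python
theta1, phi0, phi1 = -0.328, 0.229, 0.092   # ADL(1,1) estimates from Example 4

# Dynamic multipliers: phi0, (theta1*phi0 + phi1), theta1*(theta1*phi0 + phi1), ...
multipliers = [phi0, theta1 * phi0 + phi1]
for k in range(2, 11):
    multipliers.append(theta1 * multipliers[-1])

beta = (phi0 + phi1) / (1.0 - theta1)        # long-run multiplier
cumulated = 0.0
for k, m in enumerate(multipliers):
    cumulated += m
    print(f"lag {k:2d}: multiplier {m: .4f}   cumulated {cumulated: .4f}")
print(f"long-run multiplier beta = {beta:.4f}")
```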

6.2 General Case

To state the results for the general case of more lags, it is again convenient to use lag polynomials. We write the model with $p$ and $q$ lags as
$$\theta(L)y_t = \delta + \phi(L)x_t + \epsilon_t,$$
where the polynomials are defined as
$$\theta(L) := 1 - \theta_1 L - \theta_2 L^2 - \ldots - \theta_p L^p \quad \text{and} \quad \phi(L) := \phi_0 + \phi_1 L + \phi_2 L^2 + \ldots + \phi_q L^q.$$
Under stationarity we can write the model as
$$y_t = \theta^{-1}(L)\delta + \theta^{-1}(L)\phi(L)x_t + \theta^{-1}(L)\epsilon_t, \qquad (17)$$
in which there are infinitely many terms in both $x_t$ and $\epsilon_t$. Writing the combined polynomial as $c(L) = \theta^{-1}(L)\phi(L) = c_0 + c_1 L + c_2 L^2 + \ldots$, the dynamic multipliers are given by the sequence $c_0, c_1, c_2, \ldots$ This is parallel to the result for the AR(1) model stated above.

If we take expectations in (17) we get the equivalent of (13),
$$E[y_t] = \theta^{-1}(L)\delta + \theta^{-1}(L)\phi(L)E[x_t].$$
Now recall that $E[y_t] = E[y_{t-1}] = LE[y_t]$ and $E[x_t] = E[x_{t-1}] = LE[x_t]$ are constant due to stationarity, so that $\theta^{-1}(L)\phi(L)E[x_t] = \theta^{-1}(1)\phi(1)E[x_t]$, and the long-run solution can be found as the derivative
$$\frac{\partial E[y_t]}{\partial E[x_t]} = \frac{\phi(1)}{\theta(1)} = \frac{\phi_0 + \phi_1 + \phi_2 + \ldots + \phi_q}{1 - \theta_1 - \theta_2 - \ldots - \theta_p}.$$
This is also the sum of the dynamic multipliers, $c_0 + c_1 + c_2 + \ldots$

6.3 Error-Correction Model

There exists a convenient formulation of the model that incorporates the long-run solution directly. In particular, we can rewrite the model as
$$y_t = \delta + \theta_1 y_{t-1} + \phi_0 x_t + \phi_1 x_{t-1} + \epsilon_t$$
$$y_t - y_{t-1} = \delta + (\theta_1 - 1)y_{t-1} + \phi_0 x_t + \phi_1 x_{t-1} + \epsilon_t$$
$$y_t - y_{t-1} = \delta + (\theta_1 - 1)y_{t-1} + \phi_0(x_t - x_{t-1}) + (\phi_0 + \phi_1)x_{t-1} + \epsilon_t$$
$$\Delta y_t = \delta + (\theta_1 - 1)y_{t-1} + \phi_0\Delta x_t + (\phi_0 + \phi_1)x_{t-1} + \epsilon_t. \qquad (18)$$
The idea is that the levels appear only once, and the model can be written in the form
$$\Delta y_t = \phi_0\Delta x_t - (1 - \theta_1)(y_{t-1} - \mu - \beta x_{t-1}) + \epsilon_t, \qquad (19)$$

where $\mu$ and $\beta$ refer to (15). The model in (19) is known as the error-correction model (ECM), and it has a very natural interpretation in terms of the long-run steady state solution and the dynamic adjustment towards equilibrium. First note that $y_{t-1} - y_{t-1}^* = y_{t-1} - \mu - \beta x_{t-1}$ is the deviation from steady state in the previous period. For the steady state to be sustained the variables have to eliminate the deviations and move towards the long-run solution. This tendency is captured by the coefficient $-(1 - \theta_1) < 0$: if $y_{t-1}$ is above the steady state value then $\Delta y_t$ will be affected negatively, and there is a tendency for $y_t$ to move back towards the steady state. We say that $y_t$ error-corrects or equilibrium-corrects, and $y_t^* = \mu + \beta x_t$ is the attractor for $y_t$.

We can estimate the ECM model and obtain the alternative representation directly. We note that the models (12), (18), and (19) are equivalent, and we can always calculate from one representation to the other. It is important to note, however, that the version in (18) is a linear regression model that can be estimated using OLS, while the solved ECM in (19) is non-linear and requires a different estimation procedure (e.g. maximum likelihood). The choice of which model to estimate depends on the purpose of the analysis and on the software used. The analysis in PcGive focuses on the standard ADL model, and the long-run solution and the dynamic multipliers are automatically supplied. In other software packages it is sometimes more convenient to estimate the ECM form directly.

Example 5 (Danish income and consumption): If we estimate the linear error-correction form for the Danish income and consumption data series we obtain the equation
$$\Delta^2 c_t = \underset{(3.80)}{0.229}\,\Delta^2 y_t - \underset{(-15.7)}{1.328}\,\Delta c_{t-1} + \underset{(3.19)}{0.321}\,\Delta y_{t-1} + \underset{(2.09)}{0.0032}.$$
Note that the results are equivalent to the results from the ADL model in (16). We can find the long-run multiplier as $\beta = 0.321/1.328 = 0.242$. Using maximum likelihood (with a normal error distribution) we can also estimate the solved ECM form to obtain
$$\Delta^2 c_t = \underset{(3.80)}{0.229}\,\Delta^2 y_t - \underset{(-15.7)}{1.328}\left(\Delta c_{t-1} - \underset{(2.10)}{0.0024} - \underset{(3.31)}{0.242}\,\Delta y_{t-1}\right),$$
where we again recognize the long-run solution.

6.4 Conditional Forecasts

From the conditional ADL model we can also produce forecasts of $y_{T+k}$. This is parallel to the univariate forecasts presented above, and we will not go into details here. Note, however, that to forecast $y_{T+k}$ we need to know $x_{T+k}$. Since this is rarely in the information set at time $T$, a feasible forecast could forecast $x_{T+k}$ first, e.g. using a univariate time series model, and then forecast $y_{T+k}$ conditional on $\hat x_{T+k|T}$.
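
Because the linear error-correction form (18) is just a regression, it can be estimated by OLS and the long-run coefficients recovered afterwards. The sketch below is a generic illustration on simulated data (not the Danish series): it generates an ADL(1,1) with a stationary forcing variable, runs the ECM regression of $\Delta y_t$ on $\Delta x_t$, $y_{t-1}$, and $x_{t-1}$, and backs out $\beta = (\phi_0 + \phi_1)/(1 - \theta_1)$ from the estimated coefficients.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 500
theta1, phi0, phi1, delta = 0.6, 0.8, 0.2, 0.1

# Simulate a stationary forcing variable and the ADL(1,1) for y
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = delta + theta1 * y[t - 1] + phi0 * x[t] + phi1 * x[t - 1] + rng.normal()

# ECM regression: dy_t = const + a*y_{t-1} + b*dx_t + c*x_{t-1} + error
dy, dx = np.diff(y), np.diff(x)
X = np.column_stack([np.ones(T - 1), y[:-1], dx, x[:-1]])
const, a, b, c = np.linalg.lstsq(X, dy, rcond=None)[0]

print("adjustment coefficient (theta1 - 1):", round(a, 3))
print("impact multiplier phi0:             ", round(b, 3))
print("long-run multiplier beta:           ", round(c / -a, 3),
      "  (true value:", round((phi0 + phi1) / (1 - theta1), 3), ")")
```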