
Ch. 15 Forecasting

Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series. For the present we proceed as if the model were known exactly.

One of the important problems in time series analysis is the following: given $T$ observations on a realization, predict the $(T+s)$th observation in the realization, where $s$ is a positive integer. The prediction is sometimes called the forecast of the $(T+s)$th observation.

Forecasting is a central task of time series analysis. In a regression model we usually have an existing economic theory whose parameters we wish to estimate; the estimated coefficients already play a role, such as confirming or refuting that theory, so whether or not to forecast from the estimated model depends on the researcher's own interest. The estimated coefficients of a time series model, by contrast, have no direct economic interpretation; an important role of time series analysis is therefore to forecast precisely from this purely mechanical model.

1 Principle of Forecasting

1.1 Forecasts Based on Conditional Expectations

Suppose we are interested in forecasting the value of a variable $Y_{t+1}$ based on a set of variables $\mathbf{x}_t$ observed at date $t$. For example, we might want to forecast $Y_{t+1}$ based on its $m$ most recent values. In this case, $\mathbf{x}_t = [Y_t, Y_{t-1}, \ldots, Y_{t-m+1}]'$.

Let $Y_{t+1|t}$ denote a forecast of $Y_{t+1}$ based on $\mathbf{x}_t$ (a function of $\mathbf{x}_t$). To evaluate the usefulness of this forecast, we need to specify a loss function. A quadratic loss function means choosing the forecast $Y_{t+1|t}$ so as to minimize
$$MSE(Y_{t+1|t}) = E(Y_{t+1} - Y_{t+1|t})^2,$$
which is known as the mean squared error.

Theorem: The forecast with the smallest mean squared error is the (statistical) expectation of $Y_{t+1}$ conditional on $\mathbf{x}_t$:
$$Y_{t+1|t} = E(Y_{t+1} \mid \mathbf{x}_t).$$

Proof: Let $g(\mathbf{x}_t)$ be a forecasting function of $Y_{t+1}$ other than the conditional expectation $E(Y_{t+1}\mid\mathbf{x}_t)$. The MSE associated with $g(\mathbf{x}_t)$ is
$$
E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}\mid\mathbf{x}_t) + E(Y_{t+1}\mid\mathbf{x}_t) - g(\mathbf{x}_t)]^2
= E[Y_{t+1} - E(Y_{t+1}\mid\mathbf{x}_t)]^2 + 2E\{[Y_{t+1} - E(Y_{t+1}\mid\mathbf{x}_t)][E(Y_{t+1}\mid\mathbf{x}_t) - g(\mathbf{x}_t)]\} + E\{[E(Y_{t+1}\mid\mathbf{x}_t) - g(\mathbf{x}_t)]^2\}.
$$
Denote $\eta_{t+1} \equiv [Y_{t+1} - E(Y_{t+1}\mid\mathbf{x}_t)][E(Y_{t+1}\mid\mathbf{x}_t) - g(\mathbf{x}_t)]$. Then
$$
E(\eta_{t+1}\mid\mathbf{x}_t) = [E(Y_{t+1}\mid\mathbf{x}_t) - g(\mathbf{x}_t)]\, E\big([Y_{t+1} - E(Y_{t+1}\mid\mathbf{x}_t)]\mid\mathbf{x}_t\big) = [E(Y_{t+1}\mid\mathbf{x}_t) - g(\mathbf{x}_t)]\cdot 0 = 0.
$$
By the law of iterated expectations it follows that $E(\eta_{t+1}) = E_{\mathbf{x}_t}\big(E[\eta_{t+1}\mid\mathbf{x}_t]\big) = 0$. Therefore
$$
E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}\mid\mathbf{x}_t)]^2 + E\{[E(Y_{t+1}\mid\mathbf{x}_t) - g(\mathbf{x}_t)]^2\}. \quad (1)
$$
The second term on the right-hand side of (1) cannot be made smaller than zero, and the first term does not depend on $g(\mathbf{x}_t)$. The function $g(\mathbf{x}_t)$ that makes the mean squared error (1) as small as possible is the one that sets the second term in (1) to zero:
$$g(\mathbf{x}_t) = E(Y_{t+1}\mid\mathbf{x}_t).$$
The MSE of this optimal forecast is $E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}\mid\mathbf{x}_t)]^2$.
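As a quick numerical check of this result (not part of the original notes), the following sketch compares the MSE of the conditional mean with that of an arbitrary competing forecast for a simple hypothetical data-generating process, $Y_{t+1} = x_t^2 + \varepsilon_{t+1}$; the model, names, and parameter values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: Y_{t+1} = x_t^2 + eps_{t+1}
x = rng.normal(size=n)
y_next = x**2 + rng.normal(size=n)

cond_mean = x**2            # E(Y_{t+1} | x_t) for this DGP
competitor = 1.0 + 0.5 * x  # an arbitrary alternative forecast g(x_t)

mse_cond = np.mean((y_next - cond_mean) ** 2)
mse_comp = np.mean((y_next - competitor) ** 2)
print(f"MSE of conditional mean: {mse_cond:.3f}")  # close to Var(eps) = 1
print(f"MSE of competitor:       {mse_comp:.3f}")  # strictly larger
```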

1.2 Forecasts Based on Linear Projection

Suppose we now restrict attention to the class of forecasts in which $Y_{t+1}$ is a linear function of $\mathbf{x}_t$:
$$Y_{t+1|t} = \boldsymbol{\alpha}'\mathbf{x}_t.$$

Definition: The forecast $\boldsymbol{\alpha}'\mathbf{x}_t$ is called the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$ if the forecast error $(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)$ is uncorrelated with $\mathbf{x}_t$:
$$E[(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)\,\mathbf{x}_t'] = \mathbf{0}'. \quad (2)$$

Theorem: The linear projection produces the smallest mean squared error among the class of linear forecasting rules.

Proof: Let $\mathbf{g}'\mathbf{x}_t$ be any arbitrary linear forecasting function of $Y_{t+1}$. The MSE associated with $\mathbf{g}'\mathbf{x}_t$ is
$$
E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t + \boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2
= E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2 + 2E\{[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]\} + E[\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2.
$$
Denote $\eta_{t+1} \equiv [Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]$. Then
$$
E(\eta_{t+1}) = E\{[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][(\boldsymbol{\alpha} - \mathbf{g})'\mathbf{x}_t]\} = \big(E[(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)\mathbf{x}_t']\big)[\boldsymbol{\alpha} - \mathbf{g}] = \mathbf{0}'[\boldsymbol{\alpha} - \mathbf{g}] = 0.
$$
Therefore
$$
E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2 + E[\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2. \quad (3)
$$
The second term on the right-hand side of (3) cannot be made smaller than zero, and the first term does not depend on $\mathbf{g}'\mathbf{x}_t$. The function $\mathbf{g}'\mathbf{x}_t$ that makes the mean squared error (3) as small as possible sets the second term in (3) to zero:
$$\mathbf{g}'\mathbf{x}_t = \boldsymbol{\alpha}'\mathbf{x}_t.$$
The MSE of this optimal linear forecast is $E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2$.

Since $\boldsymbol{\alpha}'\mathbf{x}_t$ is the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$, we will use the notation
$$\hat{P}(Y_{t+1}\mid\mathbf{x}_t) = \boldsymbol{\alpha}'\mathbf{x}_t$$
to indicate this linear projection. Notice that
$$MSE[\hat{P}(Y_{t+1}\mid\mathbf{x}_t)] \ge MSE[E(Y_{t+1}\mid\mathbf{x}_t)],$$
since the conditional expectation offers the best possible forecast.

For most applications a constant term will be included in the projection. We will use the symbol $\hat{E}$ to indicate a linear projection on a vector of random variables $\mathbf{x}_t$ along with a constant term:
$$\hat{E}(Y_{t+1}\mid\mathbf{x}_t) \equiv \hat{P}(Y_{t+1}\mid 1, \mathbf{x}_t).$$
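As an illustration only (not part of the original notes), the following sketch computes the sample analogue of the linear projection coefficients from data simulated under an assumed AR(2) process and then verifies that the resulting forecast error is (approximately) uncorrelated with $\mathbf{x}_t$, as required by (2); all names and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50_000

# Hypothetical example: project Y_{t+1} on x_t = [1, Y_t, Y_{t-1}]' using data
# simulated from an AR(2), Y_t = 0.5 Y_{t-1} - 0.3 Y_{t-2} + eps_t.
eps = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + eps[t]

X = np.column_stack([np.ones(T - 3), y[2:-1], y[1:-2]])  # [1, Y_t, Y_{t-1}]
y_next = y[3:]                                           # Y_{t+1}

# Sample analogue of alpha' = E(Y_{t+1} x_t') [E(x_t x_t')]^{-1}
alpha = np.linalg.solve(X.T @ X, X.T @ y_next)
err = y_next - X @ alpha

print("projection coefficients:", alpha.round(3))            # roughly [0, 0.5, -0.3]
print("E[err * x_t] approx:", (X.T @ err / len(err)).round(4))  # near zero
```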

2 Forecasts Based on an Infinite Number of Observations

Recall that a general stationary and invertible ARMA(p, q) process can be written as
$$\phi(L)(Y_t - \mu) = \theta(L)\varepsilon_t, \quad (4)$$
where $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p$, $\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$, and all the roots of $\phi(z) = 0$ and $\theta(z) = 0$ lie outside the unit circle.

2.1 Forecasting Based on Lagged $\varepsilon$'s (MA($\infty$) Form)

Assume first that the lagged $\varepsilon_t$'s are observed directly; the following principle will be useful in the forecasting manipulations that follow. Consider the MA($\infty$) form of (4):
$$Y_t - \mu = \varphi(L)\varepsilon_t \quad (5)$$
with $\varepsilon_t$ white noise and
$$\varphi(L) = \theta(L)\phi^{-1}(L) = \sum_{j=0}^{\infty}\varphi_j L^j, \qquad \varphi_0 = 1, \qquad \sum_{j=0}^{\infty}|\varphi_j| < \infty.$$

Suppose that we have an infinite number of observations on $\varepsilon$ through date $t$, that is $\{\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\}$, and that we also know the values of $\mu$ and $\{\varphi_1, \varphi_2, \ldots\}$. Say we want to forecast the value of $Y_{t+s}$, $s$ periods from now. Note that (5) implies
$$Y_{t+s} = \mu + \varepsilon_{t+s} + \varphi_1\varepsilon_{t+s-1} + \cdots + \varphi_{s-1}\varepsilon_{t+1} + \varphi_s\varepsilon_t + \varphi_{s+1}\varepsilon_{t-1} + \cdots$$
We consider the best linear forecast: the ARMA models considered so far are only assumed to be covariance stationary, with no assumption made about their distribution, so we cannot form the conditional expectation.

The best linear forecast takes the form
$$
\hat{E}(Y_{t+s}\mid\varepsilon_t, \varepsilon_{t-1}, \ldots) = \mu + \varphi_s\varepsilon_t + \varphi_{s+1}\varepsilon_{t-1} + \cdots \quad (6)
$$
$$
= [\mu,\ \varphi_s,\ \varphi_{s+1},\ \ldots]\,[1,\ \varepsilon_t,\ \varepsilon_{t-1},\ \ldots]' \quad (7)
$$
$$
= \boldsymbol{\alpha}'\mathbf{x}_t. \quad (8)
$$
That is, the unknown future $\varepsilon$'s are set to their expected value of zero. (We cannot simply invoke the conditional expectation $E(\varepsilon_{t+i}\mid\varepsilon_t,\varepsilon_{t-1},\ldots) = 0$ for $i = 1, 2, \ldots, s$, since the $\varepsilon_t$ are only assumed to be uncorrelated, which does not imply that they are independent.) The error associated with this forecast is uncorrelated with $\mathbf{x}_t = [1, \varepsilon_t, \varepsilon_{t-1}, \ldots]'$:
$$
E\{\mathbf{x}_t[Y_{t+s} - \hat{E}(Y_{t+s}\mid\varepsilon_t, \varepsilon_{t-1}, \ldots)]\}
= E\left\{\begin{bmatrix}1\\ \varepsilon_t\\ \varepsilon_{t-1}\\ \vdots\end{bmatrix}(\varepsilon_{t+s} + \varphi_1\varepsilon_{t+s-1} + \cdots + \varphi_{s-1}\varepsilon_{t+1})\right\} = \mathbf{0}.
$$
The mean squared error associated with this forecast is
$$
E[Y_{t+s} - \hat{E}(Y_{t+s}\mid\varepsilon_t, \varepsilon_{t-1}, \ldots)]^2 = (1 + \varphi_1^2 + \varphi_2^2 + \cdots + \varphi_{s-1}^2)\sigma^2.
$$

Example: For an MA(q) process, the optimal linear forecast is
$$
\hat{E}(Y_{t+s}\mid\varepsilon_t, \varepsilon_{t-1}, \ldots) =
\begin{cases}
\mu + \theta_s\varepsilon_t + \theta_{s+1}\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q+s} & \text{for } s = 1, 2, \ldots, q \\
\mu & \text{for } s = q+1, q+2, \ldots
\end{cases}
$$
The MSE is
$$
\begin{cases}
\sigma^2 & \text{for } s = 1 \\
(1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_{s-1}^2)\sigma^2 & \text{for } s = 2, 3, \ldots, q \\
(1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2 & \text{for } s = q+1, q+2, \ldots
\end{cases}
$$
The MSE increases with the forecast horizon $s$ up until $s = q$. If we try to forecast an MA(q) farther than $q$ periods into the future, the forecast is simply the unconditional mean of the series, $E(Y_{t+s}) = \mu$, and the MSE is the unconditional variance of the series, $\mathrm{Var}(Y_{t+s}) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2$.
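The MA(q) forecast and its MSE above translate directly into code. The following sketch (not part of the original notes; the helper name and example parameters are assumptions) implements them given a record of observed shocks.

```python
import numpy as np

def ma_forecast(mu, theta, eps_hist, s, sigma2):
    """s-step forecast of an MA(q) process from observed shocks.

    mu       : mean of the process
    theta    : [theta_1, ..., theta_q]
    eps_hist : observed shocks [..., eps_{t-1}, eps_t] (most recent last);
               needs at least q - s + 1 values when s <= q
    s        : forecast horizon (>= 1)
    sigma2   : innovation variance
    Returns (forecast, mse).
    """
    q = len(theta)
    if s > q:  # beyond q steps: unconditional mean and variance
        return mu, (1.0 + np.sum(np.asarray(theta) ** 2)) * sigma2
    # mu + theta_s eps_t + theta_{s+1} eps_{t-1} + ... + theta_q eps_{t-q+s}
    fcst = mu + sum(theta[k - 1] * eps_hist[-1 - (k - s)] for k in range(s, q + 1))
    mse = (1.0 + np.sum(np.asarray(theta[: s - 1]) ** 2)) * sigma2
    return fcst, mse

# Example with a hypothetical MA(2): theta = [0.4, 0.3], sigma^2 = 1
print(ma_forecast(mu=2.0, theta=[0.4, 0.3], eps_hist=[0.5, -1.0], s=1, sigma2=1.0))
# -> (1.75, 1.0)
```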

A compact lag-operator expression for the forecast in (6) is sometimes used. Rewrite $Y_{t+s}$ as in (5) as
$$Y_{t+s} = \mu + \varphi(L)\varepsilon_{t+s} = \mu + \varphi(L)L^{-s}\varepsilon_t.$$
Consider the polynomial obtained when $\varphi(L)$ is divided by $L^s$:
$$
\frac{\varphi(L)}{L^s} = L^{-s} + \varphi_1 L^{1-s} + \varphi_2 L^{2-s} + \cdots + \varphi_{s-1}L^{-1} + \varphi_s L^0 + \varphi_{s+1}L^1 + \varphi_{s+2}L^2 + \cdots
$$
The annihilation operator replaces negative powers of $L$ by zero; for example,
$$
\left[\frac{\varphi(L)}{L^s}\right]_+ = \varphi_s L^0 + \varphi_{s+1}L^1 + \varphi_{s+2}L^2 + \cdots
$$
Therefore the optimal forecast (6) can be written in lag-operator notation as
$$
\hat{E}(Y_{t+s}\mid\varepsilon_t, \varepsilon_{t-1}, \ldots) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+ \varepsilon_t. \quad (9)
$$

2.2 Forecasting Based on Lagged $Y$'s

The previous forecasts were based on the assumption that $\varepsilon_t$ is observed directly. In the usual forecasting situation, we actually have observations on lagged $Y$'s, not lagged $\varepsilon$'s. Suppose that the general ARMA(p, q) has an AR($\infty$) representation given by
$$\eta(L)(Y_t - \mu) = \varepsilon_t \quad (10)$$
with $\varepsilon_t$ white noise and
$$\eta(L) = \theta^{-1}(L)\phi(L) = \sum_{j=0}^{\infty}\eta_j L^j = \varphi^{-1}(L), \qquad \eta_0 = 1, \qquad \sum_{j=0}^{\infty}|\eta_j| < \infty.$$
Under these conditions, we can substitute (10) into (9) to obtain the forecast of $Y_{t+s}$ as a function of lagged $Y$'s:
$$
\hat{E}(Y_{t+s}\mid Y_t, Y_{t-1}, \ldots) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+ \eta(L)(Y_t - \mu) \quad (11)
$$

or
$$
\hat{E}(Y_{t+s}\mid Y_t, Y_{t-1}, \ldots) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+ \frac{1}{\varphi(L)}(Y_t - \mu). \quad (12)
$$
Equation (12) is known as the Wiener-Kolmogorov prediction formula.

2.2.1 Forecasting an AR(1) Process

There are two ways to forecast an AR(1) process:

(a) Wiener-Kolmogorov prediction formula. For the covariance-stationary AR(1) process we have
$$\varphi(L) = \frac{1}{1 - \phi L} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \cdots$$
and
$$
\left[\frac{\varphi(L)}{L^s}\right]_+ = \phi^s + \phi^{s+1}L + \phi^{s+2}L^2 + \cdots = \frac{\phi^s}{1 - \phi L}.
$$
The optimal linear s-period-ahead forecast for a stationary AR(1) process is therefore
$$
\hat{E}(Y_{t+s}\mid Y_t, Y_{t-1}, \ldots) = \mu + \frac{\phi^s}{1 - \phi L}(1 - \phi L)(Y_t - \mu) = \mu + \phi^s(Y_t - \mu).
$$
The forecast decays geometrically from $(Y_t - \mu)$ toward $\mu$ as the forecast horizon $s$ increases.

(b) Recursive substitution and the lag operator. The AR(1) process can be represented as (using [1.1.9] on p. 3 of Hamilton)
$$
Y_{t+s} - \mu = \phi^s(Y_t - \mu) + \phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \cdots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s}.
$$
Setting $E(\varepsilon_{t+h}) = 0$, $h = 1, 2, \ldots, s$, the optimal linear s-period-ahead forecast for a stationary AR(1) process is again
$$
\hat{E}(Y_{t+s}\mid Y_t, Y_{t-1}, \ldots) = \mu + \phi^s(Y_t - \mu).
$$
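A minimal sketch (with assumed parameter values, not part of the original notes) confirms that iterating the AR(1) recursion with future shocks set to zero reproduces the closed-form forecast $\mu + \phi^s(Y_t - \mu)$.

```python
# Hypothetical AR(1): (Y_t - mu) = phi (Y_{t-1} - mu) + eps_t
mu, phi, y_t = 2.0, 0.8, 3.5
s = 4

# Closed-form s-step forecast: mu + phi^s (Y_t - mu)
closed_form = mu + phi**s * (y_t - mu)

# Same forecast by iterating the recursion with future shocks set to zero
y_hat = y_t
for _ in range(s):
    y_hat = mu + phi * (y_hat - mu)

print(closed_form, y_hat)  # identical (up to floating point)
```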

The forecast error is
$$\phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \cdots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s},$$
which is uncorrelated with the forecast's information $Y_t\ (= \mu + \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \cdots)$. The MSE of this forecast is
$$
E(\phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \cdots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s})^2 = (1 + \phi^2 + \phi^4 + \cdots + \phi^{2(s-1)})\sigma^2.
$$
Notice that this grows with $s$ and asymptotically approaches $\sigma^2/(1 - \phi^2)$, the unconditional variance of $Y$. That is, using the information in $Y_t$ to forecast $Y_{t+s}$, the MSE can never be larger than the unconditional variance of $Y_{t+s}$.

2.2.2 Forecasting an AR(p) Process

(a) Recursive substitution and the lag operator. Following equation (11) in Chapter 13, the value of $Y$ at $t+s$ for an AR(p) process can be represented as
$$
Y_{t+s} - \mu = f^{(s)}_{11}(Y_t - \mu) + f^{(s)}_{12}(Y_{t-1} - \mu) + \cdots + f^{(s)}_{1p}(Y_{t-p+1} - \mu) + f^{(s-1)}_{11}\varepsilon_{t+1} + f^{(s-2)}_{11}\varepsilon_{t+2} + \cdots + f^{(1)}_{11}\varepsilon_{t+s-1} + \varepsilon_{t+s},
$$
where $f^{(j)}_{1i}$ denotes the $(1, i)$ element of $\mathbf{F}^j$ and
$$
\mathbf{F} \equiv
\begin{bmatrix}
\phi_1 & \phi_2 & \phi_3 & \cdots & \phi_{p-1} & \phi_p \\
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}.
$$
Setting $E(\varepsilon_{t+h}) = 0$, $h = 1, 2, \ldots, s$, the optimal linear s-period-ahead forecast for a stationary AR(p) process is therefore
$$
\hat{E}(Y_{t+s}\mid Y_t, Y_{t-1}, \ldots) = \mu + f^{(s)}_{11}(Y_t - \mu) + f^{(s)}_{12}(Y_{t-1} - \mu) + \cdots + f^{(s)}_{1p}(Y_{t-p+1} - \mu).
$$
The associated forecast error is
$$
Y_{t+s} - \hat{E}(Y_{t+s}) = f^{(s-1)}_{11}\varepsilon_{t+1} + f^{(s-2)}_{11}\varepsilon_{t+2} + \cdots + f^{(1)}_{11}\varepsilon_{t+s-1} + \varepsilon_{t+s}.
$$
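The AR(p) forecast can be computed directly from powers of the companion matrix $\mathbf{F}$. A minimal sketch follows (the helper name is an assumption; it is not code from the original notes).

```python
import numpy as np

def ar_forecast(mu, phi, y_hist, s):
    """s-step forecast of an AR(p) process via the companion matrix F.

    mu     : process mean
    phi    : [phi_1, ..., phi_p]
    y_hist : [Y_t, Y_{t-1}, ..., Y_{t-p+1}] (most recent first)
    """
    p = len(phi)
    F = np.zeros((p, p))
    F[0, :] = phi
    F[1:, :-1] = np.eye(p - 1)        # companion form
    Fs = np.linalg.matrix_power(F, s)
    dev = np.asarray(y_hist) - mu     # deviations from the mean
    return mu + Fs[0, :] @ dev        # mu + sum_i f^(s)_{1i} (Y_{t-i+1} - mu)
```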

It is important to note that to forecast an AR(p) process, an optimal s-period-ahead linear forecast based on an infinite number of observations $\{Y_t, Y_{t-1}, \ldots\}$ in fact makes use of only the $p$ most recent values $\{Y_t, Y_{t-1}, \ldots, Y_{t-p+1}\}$.

2.2.3 Forecasting an MA(1) Process

Consider an invertible MA(1) process
$$Y_t - \mu = (1 + \theta L)\varepsilon_t$$
with $|\theta| < 1$.

(a) Applying the Wiener-Kolmogorov formula, we have
$$
\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta L}{L^s}\right]_+ \frac{1}{1 + \theta L}(Y_t - \mu).
$$
To forecast an MA(1) process one period ahead ($s = 1$),
$$\left[\frac{1 + \theta L}{L}\right]_+ = \theta,$$
and so
$$
\hat{Y}_{t+1|t} = \mu + \frac{\theta}{1 + \theta L}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots
$$
To forecast an MA(1) process $s = 2, 3, \ldots$ periods into the future,
$$\left[\frac{1 + \theta L}{L^s}\right]_+ = 0 \quad \text{for } s = 2, 3, \ldots,$$
and so
$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, \ldots$$

(b) From recursive substitution: an MA(1) process at period $t+1$ is
$$Y_{t+1} - \mu = \varepsilon_{t+1} + \theta\varepsilon_t.$$
At period $t$, set $E(\varepsilon_{t+s}) = 0$, $s = 1, 2, \ldots$. The optimal linear one-period-ahead forecast for a stationary MA(1) process is therefore
$$
\hat{Y}_{t+1|t} = \mu + \theta\varepsilon_t = \mu + \theta(1 + \theta L)^{-1}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots
$$
An MA(1) process at period $t+s$ is
$$Y_{t+s} - \mu = \varepsilon_{t+s} + \theta\varepsilon_{t+s-1},$$
so the forecast of an MA(1) process $s = 2, 3, \ldots$ periods into the future is
$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, \ldots$$

2.2.4 Forecasting an MA(q) Process

For an invertible MA(q) process,
$$Y_t - \mu = (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q)\varepsilon_t,$$
the forecast becomes
$$
\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}{L^s}\right]_+ \frac{1}{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}(Y_t - \mu).
$$
Now
$$
\left[\frac{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}{L^s}\right]_+ =
\begin{cases}
\theta_s + \theta_{s+1}L + \theta_{s+2}L^2 + \cdots + \theta_q L^{q-s} & \text{for } s = 1, 2, \ldots, q \\
0 & \text{for } s = q+1, q+2, \ldots
\end{cases}
$$

Thus for horizons of $s = 1, 2, \ldots, q$, the forecast is given by
$$
\hat{Y}_{t+s|t} = \mu + \frac{\theta_s + \theta_{s+1}L + \theta_{s+2}L^2 + \cdots + \theta_q L^{q-s}}{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}(Y_t - \mu).
$$
A forecast farther than $q$ periods into the future is simply the unconditional mean $\mu$. It is important to note that to forecast an MA(q) process, an optimal s-period-ahead ($s \le q$) linear forecast in principle requires all of the historical values of $Y$, $\{Y_t, Y_{t-1}, \ldots\}$.

2.2.5 Forecasting an ARMA(1, 1) Process

Consider an ARMA(1, 1) process
$$(1 - \phi L)(Y_t - \mu) = (1 + \theta L)\varepsilon_t \quad (13)$$
that is stationary and invertible. Denoting $(1 + \theta L)\varepsilon_t = \eta_t$, this ARMA(1, 1) process can be represented as (using [1.1.9] on p. 3 of Hamilton)
$$
\begin{aligned}
Y_{t+s} - \mu &= \phi^s(Y_t - \mu) + \phi^{s-1}\eta_{t+1} + \phi^{s-2}\eta_{t+2} + \cdots + \phi\eta_{t+s-1} + \eta_{t+s} \\
&= \phi^s(Y_t - \mu) + \phi^{s-1}(1 + \theta L)\varepsilon_{t+1} + \phi^{s-2}(1 + \theta L)\varepsilon_{t+2} + \cdots + \phi(1 + \theta L)\varepsilon_{t+s-1} + (1 + \theta L)\varepsilon_{t+s} \\
&= \phi^s(Y_t - \mu) + (\phi^{s-1}\varepsilon_{t+1} + \phi^{s-1}\theta\varepsilon_t) + (\phi^{s-2}\varepsilon_{t+2} + \phi^{s-2}\theta\varepsilon_{t+1}) + \cdots + (\phi\varepsilon_{t+s-1} + \phi\theta\varepsilon_{t+s-2}) + (\varepsilon_{t+s} + \theta\varepsilon_{t+s-1}).
\end{aligned}
$$

Setting $E(\varepsilon_{t+h}) = 0$, $h = 1, 2, \ldots, s$, the optimal linear s-period-ahead forecast for a stationary ARMA(1, 1) process is therefore
$$
\begin{aligned}
\hat{E}(Y_{t+s}\mid Y_t, Y_{t-1}, \ldots) &= \mu + \phi^s(Y_t - \mu) + \phi^{s-1}\theta\varepsilon_t \\
&= \mu + \phi^s(Y_t - \mu) + \phi^{s-1}\theta\,\frac{1 - \phi L}{1 + \theta L}(Y_t - \mu) \qquad \text{from (13)} \\
&= \mu + \frac{(1 + \theta L)\phi^s + \phi^{s-1}\theta(1 - \phi L)}{1 + \theta L}(Y_t - \mu) \\
&= \mu + \frac{\phi^s + \theta L\phi^s + \phi^{s-1}\theta - \theta L\phi^s}{1 + \theta L}(Y_t - \mu) \\
&= \mu + \frac{\phi^s + \phi^{s-1}\theta}{1 + \theta L}(Y_t - \mu).
\end{aligned}
$$
It is important to note that to forecast an ARMA(1, 1) process, an optimal s-period-ahead linear forecast in principle requires all of the historical values of $Y$, $\{Y_t, Y_{t-1}, \ldots\}$.

Exercise: Suppose that the US quarterly seasonally adjusted unemployment rate for the period 1948-1972 behaves much like the time series
$$
Y_t - 4.77 = 1.54(Y_{t-1} - 4.77) - 0.67(Y_{t-2} - 4.77) + \varepsilon_t,
$$
where the $\varepsilon_t$ are uncorrelated random variables with mean 0 and variance 1. Let us assume that we know that this is the proper representation. The four most recent observations, up to 1972, are $Y_{1972} = 5.30$, $Y_{1971} = 5.53$, $Y_{1970} = 5.77$, and $Y_{1969} = 5.83$. Make the best linear forecast of $Y_{1972+s}$, $s = 1, 2, 3, 4$, based on the information available.
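As an illustration of the recursion involved (this sketch is not part of the original notes), the exercise can be attacked by iterating the AR(2) equation with future $\varepsilon$'s set to zero; as noted above for AR(p) processes, only the two most recent observations are needed.

```python
# Recursive forecasts for the AR(2) exercise above (illustrative sketch).
mu, phi1, phi2 = 4.77, 1.54, -0.67
y = {1972: 5.30, 1971: 5.53}          # only the two most recent values are needed

dev = [y[1972] - mu, y[1971] - mu]     # [Y_t - mu, Y_{t-1} - mu]
forecasts = {}
for s in range(1, 5):
    d = phi1 * dev[0] + phi2 * dev[1]  # one-step recursion with future eps set to 0
    dev = [d, dev[0]]
    forecasts[1972 + s] = mu + d

print(forecasts)
```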

3 Forecasts Based on a Finite Number of Observations

This section continues to assume that population parameters are known with certainty, but develops forecasts based on a finite number of observations $m$, namely $\{Y_t, Y_{t-1}, \ldots, Y_{t-m+1}\}$.

3.1 Approximations to the Optimal Forecast

For an AR(p) process, an optimal s-period-ahead linear forecast based on an infinite number of observations $\{Y_t, Y_{t-1}, \ldots\}$ in fact makes use of only the $p$ most recent values $\{Y_t, Y_{t-1}, \ldots, Y_{t-p+1}\}$. For an MA or ARMA process, however, we would in principle require all of the historical values of $Y$ in order to implement the formulas of the preceding section. In reality we do not have an infinite number of observations for forecasting.

One approach to forecasting based on a finite number of observations is to act as if pre-sample values of $Y$ were all equal to the mean value $\mu$ (equivalently, pre-sample $\varepsilon$'s equal to 0). The idea is thus to use the approximation
$$
\hat{E}(Y_{t+s}\mid 1, Y_t, Y_{t-1}, \ldots) \approx \hat{E}(Y_{t+s}\mid Y_t, Y_{t-1}, \ldots, Y_{t-m+1};\ Y_{t-m} = \mu, Y_{t-m-1} = \mu, \ldots).
$$
For example, in forecasting an MA(1) process, the one-period-ahead forecast
$$
\hat{Y}_{t+1|t} = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots
$$
is approximated by
$$
\hat{Y}_{t+1|t} \approx \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots + (-1)^{m-1}\theta^m(Y_{t-m+1} - \mu).
$$
For $m$ large and $|\theta|$ small (so that the omitted terms multiply the deviations of $Y$ from $\mu$ by smaller and smaller numbers), this clearly gives an excellent approximation. For $|\theta|$ closer to unity, the approximation may be poor.
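A small sketch of this truncated MA(1) forecast, treating pre-sample $Y$'s as equal to $\mu$ (the helper name and data values are illustrative assumptions, not from the original notes):

```python
import numpy as np

def ma1_forecast_truncated(mu, theta, y_recent):
    """Approximate one-step MA(1) forecast using only the m most recent Y's.

    y_recent : [Y_t, Y_{t-1}, ..., Y_{t-m+1}] (most recent first);
               pre-sample Y's are implicitly treated as equal to mu.
    """
    dev = np.asarray(y_recent) - mu
    weights = np.array([(-1) ** j * theta ** (j + 1) for j in range(len(dev))])
    return mu + weights @ dev

# Illustrative check with small theta vs. theta close to 1 (hypothetical data)
y = [0.4, -0.2, 0.1, 0.05, -0.3]
print(ma1_forecast_truncated(mu=0.0, theta=0.3, y_recent=y))
print(ma1_forecast_truncated(mu=0.0, theta=0.95, y_recent=y))
```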

3.2 Exact Finite-Sample Forecasts

An alternative approach is to calculate the exact projection of $Y_{t+1}$ on its $m$ most recent values. (For this we do not need to assume any particular model, such as an ARMA.) Let
$$\mathbf{x}_t = [1,\ Y_t,\ Y_{t-1},\ \ldots,\ Y_{t-m+1}]'.$$
We thus seek a linear forecast of the form
$$
\hat{E}(Y_{t+1}\mid\mathbf{x}_t) = \boldsymbol{\alpha}^{(m)\prime}\mathbf{x}_t = \alpha^{(m)}_0 + \alpha^{(m)}_1 Y_t + \alpha^{(m)}_2 Y_{t-1} + \cdots + \alpha^{(m)}_m Y_{t-m+1}. \quad (14)
$$
The coefficient relating $Y_{t+1}$ to $Y_t$ in a projection of $Y_{t+1}$ on the $m$ most recent values of $Y$ is denoted $\alpha^{(m)}_1$ in (14). This will in general be different from the coefficient relating $Y_{t+1}$ to $Y_t$ in a projection of $Y_{t+1}$ on the $(m+1)$ most recent values of $Y$; the latter coefficient would be denoted $\alpha^{(m+1)}_1$.

From (2) we know that $E(Y_{t+1}\mathbf{x}_t') = \boldsymbol{\alpha}'E(\mathbf{x}_t\mathbf{x}_t')$, or
$$
\boldsymbol{\alpha}' = E(Y_{t+1}\mathbf{x}_t')[E(\mathbf{x}_t\mathbf{x}_t')]^{-1}, \quad (15)
$$
assuming that $E(\mathbf{x}_t\mathbf{x}_t')$ is a nonsingular matrix. Since for a covariance-stationary process $Y_t$,
$$\gamma_j = E(Y_{t+j} - \mu)(Y_t - \mu) = E(Y_{t+j}Y_t) - \mu^2,$$
we have here
$$
E(Y_{t+1}\mathbf{x}_t') = E\big(Y_{t+1}[1,\ Y_t,\ Y_{t-1},\ \ldots,\ Y_{t-m+1}]\big) = [\mu,\ (\gamma_1 + \mu^2),\ (\gamma_2 + \mu^2),\ \ldots,\ (\gamma_m + \mu^2)]
$$

and
$$
E(\mathbf{x}_t\mathbf{x}_t') = E\left\{\begin{bmatrix}1\\ Y_t\\ Y_{t-1}\\ \vdots\\ Y_{t-m+1}\end{bmatrix}[1,\ Y_t,\ Y_{t-1},\ \ldots,\ Y_{t-m+1}]\right\}
= \begin{bmatrix}
1 & \mu & \mu & \cdots & \mu \\
\mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \cdots & \gamma_{m-1} + \mu^2 \\
\mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \cdots & \gamma_{m-2} + \mu^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu & \gamma_{m-1} + \mu^2 & \gamma_{m-2} + \mu^2 & \cdots & \gamma_0 + \mu^2
\end{bmatrix}.
$$
Then
$$
\boldsymbol{\alpha}^{(m)\prime} = [\mu,\ (\gamma_1 + \mu^2),\ (\gamma_2 + \mu^2),\ \ldots,\ (\gamma_m + \mu^2)]
\begin{bmatrix}
1 & \mu & \mu & \cdots & \mu \\
\mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \cdots & \gamma_{m-1} + \mu^2 \\
\mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \cdots & \gamma_{m-2} + \mu^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu & \gamma_{m-1} + \mu^2 & \gamma_{m-2} + \mu^2 & \cdots & \gamma_0 + \mu^2
\end{bmatrix}^{-1}.
$$

When a constant term is included in $\mathbf{x}_t$, it is more convenient to express variables in deviations from the mean. We can then calculate the projection of $(Y_{t+1} - \mu)$ on $\mathbf{x}_t = [(Y_t - \mu), (Y_{t-1} - \mu), \ldots, (Y_{t-m+1} - \mu)]'$:
$$
\hat{Y}_{t+1|t} - \mu = \alpha^{(m)}_1(Y_t - \mu) + \alpha^{(m)}_2(Y_{t-1} - \mu) + \cdots + \alpha^{(m)}_m(Y_{t-m+1} - \mu).
$$
For this definition of $\mathbf{x}_t$ the coefficients can be calculated from (15) to be
$$
\begin{bmatrix}\alpha^{(m)}_1\\ \alpha^{(m)}_2\\ \vdots\\ \alpha^{(m)}_m\end{bmatrix} =
\begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1}\\
\gamma_1 & \gamma_0 & \cdots & \gamma_{m-2}\\
\vdots & & \ddots & \vdots\\
\gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0
\end{bmatrix}^{-1}
\begin{bmatrix}\gamma_1\\ \gamma_2\\ \vdots\\ \gamma_m\end{bmatrix}. \quad (16)
$$

To generate an s-period-ahead forecast $\hat{Y}_{t+s|t}$ we would use
$$
\hat{Y}_{t+s|t} - \mu = \alpha^{(m,s)}_1(Y_t - \mu) + \alpha^{(m,s)}_2(Y_{t-1} - \mu) + \cdots + \alpha^{(m,s)}_m(Y_{t-m+1} - \mu),
$$
where
$$
\begin{bmatrix}\alpha^{(m,s)}_1\\ \alpha^{(m,s)}_2\\ \vdots\\ \alpha^{(m,s)}_m\end{bmatrix} =
\begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1}\\
\gamma_1 & \gamma_0 & \cdots & \gamma_{m-2}\\
\vdots & & \ddots & \vdots\\
\gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0
\end{bmatrix}^{-1}
\begin{bmatrix}\gamma_s\\ \gamma_{s+1}\\ \vdots\\ \gamma_{s+m-1}\end{bmatrix}.
$$
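A minimal sketch (the function name is an assumption; the example autocovariances are those of a hypothetical MA(1) with $\theta = 0.5$ and $\sigma^2 = 1$) that solves the Toeplitz system above for the exact projection weights, given the first few autocovariances:

```python
import numpy as np

def exact_projection_coeffs(gammas, m, s=1):
    """Exact finite-sample projection weights alpha^(m,s) from autocovariances.

    gammas : [gamma_0, gamma_1, ..., gamma_K] with K >= s + m - 1
    Solves the Toeplitz system in (16) for the weights on
    (Y_t - mu), ..., (Y_{t-m+1} - mu).
    """
    g = np.asarray(gammas, dtype=float)
    Gamma = np.array([[g[abs(i - j)] for j in range(m)] for i in range(m)])
    rhs = g[s : s + m]              # [gamma_s, ..., gamma_{s+m-1}]
    return np.linalg.solve(Gamma, rhs)

# Example: MA(1) with theta = 0.5, sigma^2 = 1 has
# gamma_0 = 1.25, gamma_1 = 0.5, gamma_j = 0 for j >= 2.
print(exact_projection_coeffs([1.25, 0.5, 0.0, 0.0], m=3, s=1))
```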

4 Sums of ARMA Processes

This section explores the nature of series that result from adding two different ARMA processes together.

4.1 Sum of an MA(1) Process and White Noise

Suppose that a series $X_t$ follows a zero-mean MA(1) process:
$$X_t = u_t + \delta u_{t-1},$$
where $u_t$ is white noise with variance $\sigma_u^2$. The autocovariances of $X_t$ are thus
$$
E(X_t X_{t-j}) =
\begin{cases}
(1 + \delta^2)\sigma_u^2 & \text{for } j = 0 \\
\delta\sigma_u^2 & \text{for } j = \pm 1 \\
0 & \text{otherwise.}
\end{cases} \quad (17)
$$
Let $v_t$ indicate a separate white noise series with variance $\sigma_v^2$. Suppose, furthermore, that $v$ and $u$ are uncorrelated at all leads and lags:
$$E(u_t v_{t-j}) = 0 \quad \text{for all } j,$$
implying that
$$E(X_t v_{t-j}) = 0 \quad \text{for all } j. \quad (18)$$
Let an observed series $Y_t$ represent the sum of the MA(1) and the white noise process:
$$Y_t = X_t + v_t \quad (19)$$
$$= u_t + \delta u_{t-1} + v_t. \quad (20)$$
The question now posed is: what are the time series properties of $Y$?

Clearly, $Y_t$ has mean zero, and its autocovariances can be deduced from (17) and (18):
$$
E(Y_t Y_{t-j}) = E(X_t + v_t)(X_{t-j} + v_{t-j}) = E(X_t X_{t-j}) + E(v_t v_{t-j}) =
\begin{cases}
(1 + \delta^2)\sigma_u^2 + \sigma_v^2 & \text{for } j = 0 \\
\delta\sigma_u^2 & \text{for } j = \pm 1 \\
0 & \text{otherwise.}
\end{cases} \quad (21)
$$
Thus, the sum $X_t + v_t$ is covariance-stationary, and its autocovariances are zero beyond one lag, as for an MA(1). We might naturally then ask whether there exists a zero-mean MA(1) representation for $Y$,
$$Y_t = \varepsilon_t + \theta\varepsilon_{t-1}, \quad (22)$$
with
$$
E(\varepsilon_t\varepsilon_{t-j}) =
\begin{cases}
\sigma^2 & \text{for } j = 0 \\
0 & \text{otherwise,}
\end{cases} \quad (23)
$$
whose autocovariances match those implied by (21). The autocovariances of (22) would be given by
$$
E(Y_t Y_{t-j}) =
\begin{cases}
(1 + \theta^2)\sigma^2 & \text{for } j = 0 \\
\theta\sigma^2 & \text{for } j = \pm 1 \\
0 & \text{otherwise.}
\end{cases}
$$
In order to be consistent with (21), it must be the case that
$$(1 + \theta^2)\sigma^2 = (1 + \delta^2)\sigma_u^2 + \sigma_v^2 \quad (24)$$
$$\theta\sigma^2 = \delta\sigma_u^2. \quad (25)$$
Two values of $\theta$ satisfying (24) and (25) can be found from the quadratic formula,
$$
\theta = \frac{[(1 + \delta^2) + (\sigma_v^2/\sigma_u^2)] \pm \sqrt{[(1 + \delta^2) + (\sigma_v^2/\sigma_u^2)]^2 - 4\delta^2}}{2\delta};
$$
we take the value of $(\theta, \sigma^2)$ associated with the invertible representation, solved from (24) and (25) simultaneously.
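A short sketch (parameter values are illustrative assumptions, not from the original notes) that solves (24)-(25) numerically for the invertible $(\theta, \sigma^2)$ and checks that the implied autocovariances match (21):

```python
import numpy as np

def invertible_ma1_params(delta, sigma2_u, sigma2_v):
    """Solve (24)-(25) for the invertible (theta, sigma^2) of Y = MA(1) + noise."""
    c = (1.0 + delta**2) + sigma2_v / sigma2_u
    roots = np.roots([delta, -c, delta])   # delta*theta^2 - c*theta + delta = 0
    theta = roots[np.abs(roots) < 1][0]    # the invertible root (|theta| < 1)
    sigma2 = delta * sigma2_u / theta      # from (25)
    return float(np.real(theta)), float(np.real(sigma2))

theta, sigma2 = invertible_ma1_params(delta=0.6, sigma2_u=1.0, sigma2_v=0.5)
print(theta, sigma2)
# Check: implied autocovariances match (21)
print((1 + theta**2) * sigma2, (1 + 0.6**2) * 1.0 + 0.5)  # gamma_0
print(theta * sigma2, 0.6 * 1.0)                          # gamma_1
```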

Let us consider whether (22) could indeed characterize the data $Y_t$ generated by (20). This would require
$$(1 + \theta L)\varepsilon_t = (1 + \delta L)u_t + v_t,$$
or
$$
\varepsilon_t = (1 + \theta L)^{-1}[(1 + \delta L)u_t + v_t]
= (u_t - \theta u_{t-1} + \theta^2 u_{t-2} - \theta^3 u_{t-3} + \cdots)
+ \delta(u_{t-1} - \theta u_{t-2} + \theta^2 u_{t-3} - \theta^3 u_{t-4} + \cdots)
+ (v_t - \theta v_{t-1} + \theta^2 v_{t-2} - \theta^3 v_{t-3} + \cdots). \quad (26)
$$
The series $\varepsilon_t$ defined in (26) is a distributed lag on past values of $u$ and $v$, so it might seem to possess a rich autocorrelation structure. In fact, it turns out to be white noise! To see this, note from (21) that the autocovariance-generating function of $Y$ can be written
$$
g_Y(z) = \delta\sigma_u^2 z^{-1} + [(1 + \delta^2)\sigma_u^2 + \sigma_v^2]z^0 + \delta\sigma_u^2 z^1 = \sigma_u^2(1 + \delta z)(1 + \delta z^{-1}) + \sigma_v^2, \quad (27)
$$
so the autocovariance-generating function of $\varepsilon_t = (1 + \theta L)^{-1}Y_t$ is
$$
g_\varepsilon(z) = \frac{\sigma_u^2(1 + \delta z)(1 + \delta z^{-1}) + \sigma_v^2}{(1 + \theta z)(1 + \theta z^{-1})}. \quad (28)
$$
But $\theta$ and $\sigma^2$ were chosen so as to make the autocovariance-generating function of $(1 + \theta L)\varepsilon_t$, namely
$$(1 + \theta z)\sigma^2(1 + \theta z^{-1}), \quad (29)$$
identical to the right side of (27). Thus, (28) is simply equal to $g_\varepsilon(z) = \sigma^2$, a white noise series.

To summarize, adding an MA(1) process to a white noise series with which it is uncorrelated at all leads and lags produces a new MA(1) process characterized by (22).

4.2 Sum of Two Independent Moving Average Processes

As a necessary preliminary to what follows, consider a stochastic process $W_t$ which is the sum of two independent moving average processes of orders $q_1$ and $q_2$, respectively. That is,
$$W_t = \theta_1(L)u_t + \theta_2(L)v_t,$$
where $\theta_1(L)$ and $\theta_2(L)$ are polynomials in $L$ of orders $q_1$ and $q_2$, and the white noise processes $u_t$ and $v_t$ have zero means and are mutually uncorrelated at all leads and lags. Let $q = \max(q_1, q_2)$; then it is clear that the autocovariance function $\gamma_j$ of $W_t$ must be zero for $j > q$. (To see this, write $W_t = X_t + Y_t$ for the two MA components; then $\gamma^W_j = E(W_t W_{t-j}) = E[(X_t + Y_t)(X_{t-j} + Y_{t-j})] = E(X_t X_{t-j}) + E(Y_t Y_{t-j}) = \gamma^X_j + \gamma^Y_j$ for $j = 0, \pm 1, \pm 2, \ldots, \pm q$, and $0$ otherwise, since the cross terms vanish.) It follows that there exists a representation of $W_t$ as a single moving average process of order $q$:
$$W_t = \theta_3(L)\varepsilon_t, \quad (30)$$
where $\varepsilon_t$ is a white noise process with mean zero. Thus the sum of two uncorrelated moving average processes is another moving average process, whose order is the same as that of the component process of higher order.

4.3 Sum of an ARMA(p, q) Process and White Noise

Consider the general stationary and invertible ARMA(p, q) model
$$\phi(L)X_t = \theta(L)u_t,$$
where $u_t$ is a white noise process with variance $\sigma_u^2$. Let $v_t$ indicate a separate white noise series with variance $\sigma_v^2$, and suppose, furthermore, that $v$ and $u$ are uncorrelated at all leads and lags. Let an observed series $Y_t$ represent the sum of the ARMA(p, q) process and the white noise process:
$$Y_t = X_t + v_t. \quad (31)$$
In general we have
$$\phi(L)Y_t = \phi(L)X_t + \phi(L)v_t = \theta(L)u_t + \phi(L)v_t.$$
Letting $Q = \max(q, p)$, it follows from (30) that $Y_t$ is an ARMA(p, Q) process.

4.4 Adding Two Autoregressive Processes

Suppose now that $X_t$ is an AR($p_1$) process and $W_t$ is an AR($p_2$) process:
$$\phi_1(L)X_t = u_t$$
$$\phi_2(L)W_t = v_t, \quad (32)$$
where $u_t$ and $v_t$ are white noise processes uncorrelated with each other at all leads and lags. Suppose again that we observe
$$Y_t = X_t + W_t,$$
and we wish to determine the nature of the observed process $Y_t$. From (32) we have
$$\phi_2(L)\phi_1(L)X_t = \phi_2(L)u_t$$
$$\phi_1(L)\phi_2(L)W_t = \phi_1(L)v_t,$$
so that
$$\phi_1(L)\phi_2(L)(X_t + W_t) = \phi_1(L)v_t + \phi_2(L)u_t,$$
or
$$\phi_1(L)\phi_2(L)Y_t = \phi_1(L)v_t + \phi_2(L)u_t.$$
Therefore, by (30), $Y_t$ is an ARMA($p_1 + p_2$, $\max\{p_1, p_2\}$) process.

5 Wold's Decomposition

All of the covariance-stationary ARMA processes considered in Chapter 14 can be written in the form
$$Y_t = \mu + \sum_{j=0}^{\infty}\varphi_j\varepsilon_{t-j}, \quad (33)$$
where $\varepsilon_t$ is a white noise process, $\varphi_0 = 1$, and $\sum_{j=0}^{\infty}\varphi_j^2 < \infty$.

One might think that we are able to write all these processes in the form of (33) only because the discussion was restricted to a convenient class of models (parametric ARMA models). However, the following result establishes that the representation (33) is in fact fundamental for any covariance-stationary time series.

Theorem (Wold's Decomposition): Let $Y_t$ be any covariance-stationary stochastic process with $EY_t = 0$. Then it can be written as
$$Y_t = \sum_{j=0}^{\infty}\varphi_j\varepsilon_{t-j} + \eta_t, \quad (34)$$
where $\varphi_0 = 1$, $\sum_{j=0}^{\infty}\varphi_j^2 < \infty$, $E\varepsilon_t^2 = \sigma^2 \ge 0$, $E(\varepsilon_t\varepsilon_s) = 0$ for $t \ne s$, $E\varepsilon_t = 0$, and $E\varepsilon_t\eta_s = 0$ for all $t$ and $s$; and $\eta_t$ is a process that can be predicted arbitrarily well by a linear function of only past values of $Y_t$, i.e., $\eta_t$ is linearly deterministic. (When $\eta_t = 0$, the process is called purely linearly indeterministic.) Furthermore,
$$\varepsilon_t = Y_t - \hat{E}(Y_t\mid Y_{t-1}, Y_{t-2}, \ldots).$$

Proof: See Sargent (1987, pp. 286-90), Fuller (1996, pp. 94-98), or Brockwell and Davis (1991, pp. 187-89).

Wold's decomposition is important for us because it explains the sense in which ARMA models (stochastic difference equations) provide a general model for the indeterministic part of any univariate stationary stochastic process, and also the sense in which there exists a white-noise process $\varepsilon_t$ (which is in fact the forecast error from a one-period-ahead linear projection) that is the building block for the indeterministic part of $Y_t$.