Lecture on ARMA model

Robert M. de Jong
Ohio State University, Columbus, OH 43210, USA

Chien-Ho Wang
National Taipei University, Taipei City 104, Taiwan ROC

October 19, 2006 (very preliminary edition, comments welcome)

Robert M. de Jong, Department of Economics, Ohio State University, 429 Arps Hall, Columbus, OH 43210, USA. Email: dejong@econ.ohio-state.edu. Chien-Ho Wang, Department of Economics, National Taipei University, Taipei City 104, Taiwan. Email: wangchi3@mail.ntpu.edu.tw.

Contents

1 Basic concepts
  1.1 Difference equations
  1.2 The lag operator
  1.3 Weak and strict stationarity
  1.4 Stationary ARMA processes
    1.4.1 Moving average processes
    1.4.2 Autoregressive processes
    1.4.3 Autoregressive moving average (ARMA) processes
  1.5 The Box-Jenkins modelling philosophy
    1.5.1 Identification
    1.5.2 Estimation
    1.5.3 Diagnostic checking
  1.6 Forecasting with the ARMA model

1 Basic concepts

The autoregressive moving average (ARMA) model is the basic model of time series analysis. In this chapter we present some basic ideas about the ARMA model. The general ARMA(p,q) model is

    y_t - ρ_1 y_{t-1} - ρ_2 y_{t-2} - ... - ρ_p y_{t-p} = ε_t + φ_1 ε_{t-1} + ... + φ_q ε_{t-q},    (1)

where {ε_t} is a white noise series with mean 0 and variance σ^2.

1.1 Difference equations

Before studying the ARMA model, we introduce some mathematical tools for solving dynamic systems. First, consider the dynamic equation

    y_t - ρ y_{t-1} = 0.    (2)

Equation (2) is called a first-order ordinary difference equation. When |ρ| < 1, we say that this first-order difference equation is stable, meaning that its solution converges as time goes to infinity. To solve the equation, consider the general solution

    y_t = C exp(ξt).    (3)

We can rewrite Equation (2) as

    y_t = ρ y_{t-1}.    (4)

Substituting the general solution into Equation (4) gives

    C exp(ξt) = ρ C exp(ξ(t-1)) = ρ C exp(ξt) exp(-ξ).    (5)

Dividing both sides of Equation (5) by C exp(ξt),

    1 = ρ exp(-ξ),    (6)

so that

    ρ = exp(ξ).    (7)

For the system to be stable, Equation (7) requires ξ < 0 when ξ is real-valued. If ξ is a complex number, write ξ = a + bi. From De Moivre's theorem (Hamilton 1994) we know that exp(iϕ) = cos(ϕ) + i sin(ϕ). Substituting the general solution into Equation (4),

    y_t = C exp(ξt) = C exp((a + bi)t) = C exp(at) exp(bti) = ρ C exp((a + bi)(t-1)).    (8)

Proceeding as before, we obtain

    1 = ρ exp(-a) exp(-bi).    (9)

Recall the modulus formula in polar coordinates,

    |exp(iϕ)| = √(cos^2(ϕ) + sin^2(ϕ)) = 1.    (10)

Substituting Equation (10) into Equation (9),

    1 = ρ exp(-a).    (11)

For a complex-valued solution, we therefore need a < 0 for the system to be stable.

Next we consider the second-order ordinary difference equation

    y_t - ρ_1 y_{t-1} - ρ_2 y_{t-2} = 0.    (12)

The general solution is

    y_t = C_1 exp(ξ_1 t) + C_2 exp(ξ_2 t).    (13)

Substituting the general solution into Equation (12), Equation (12) can be written as

    1 - ρ_1 exp(-ξ) - ρ_2 exp(-2ξ) = 0.    (14)

Setting x = exp(-ξ), we can factor Equation (14) as

    -ρ_2 (x - φ_1)(x - φ_2) = 0.    (15)

When ξ_1 and ξ_2 are real, the two possible solutions are

    exp(-ξ_1) = φ_1   and   exp(-ξ_2) = φ_2.    (16)

For the second-order difference equation to be stable, we need |φ_1| > 1 and |φ_2| > 1. For complex-valued solutions, write

    ξ_1 = a_1 + b_1 i,   ξ_2 = a_2 + b_2 i.    (17)

Substituting Equation (17) into Equation (13),

    y_t = C_1 exp(ξ_1 t) + C_2 exp(ξ_2 t)
        = C_1 exp(a_1 t) exp(i b_1 t) + C_2 exp(a_2 t) exp(i b_2 t)
        = C_1 exp(a_1 t)[cos(b_1 t) + i sin(b_1 t)] + C_2 exp(a_2 t)[cos(b_2 t) + i sin(b_2 t)].    (18)

For the second-order difference equation to be stable, we need a_1 < 0 and a_2 < 0. The two examples above show that a difference equation is stable when the roots of its lag polynomial lie outside the unit circle.
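The stability condition is easy to check numerically by computing the roots of the lag polynomial. The following Python sketch (not part of the original notes; it assumes numpy is available and uses illustrative coefficient values) tests whether all roots of 1 - ρ_1 z - ρ_2 z^2 lie outside the unit circle.

    import numpy as np

    # Stability check for y_t - rho1*y_{t-1} - rho2*y_{t-2} = 0 (illustrative values).
    # The lag polynomial is 1 - rho1*z - rho2*z^2; the equation is stable when all
    # of its roots lie outside the unit circle.
    rho1, rho2 = 0.5, 0.3
    # np.roots takes coefficients from the highest power down: -rho2*z^2 - rho1*z + 1.
    roots = np.roots([-rho2, -rho1, 1.0])
    print("roots:", roots)
    print("stable:", np.all(np.abs(roots) > 1.0))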

1.2 The lag operator

The lag operator is a useful notion in time series analysis. Given a time series {x_t}, t = 0, ..., T, we can use the lag operator to model the series and investigate its characteristics. The basic properties of the lag operator L are:

1. L x_t = x_{t-1}. Example: Δx_t = x_t - x_{t-1} = x_t - L x_t = (1 - L) x_t.

2. L^2 x_t = L(L x_t) = L x_{t-1} = x_{t-2}.

3. If C is a constant, then L C = C.

4. The lag operator applies to everything that follows it: L(y_t x_t) = y_{t-1} x_{t-1}, which is different from y_t L x_t = y_t x_{t-1}.

5. A lag polynomial is a(L) = Σ_{i=0}^∞ a_i L^i.

Some examples of lag operations:

1. L(x_t + y_t) = L x_t + L y_t = x_{t-1} + y_{t-1}.

2. y_t - (1/2) y_{t-1} - 3 y_{t-2} = y_t - (1/2) L y_t - 3 L^2 y_t = (1 - (1/2)L - 3L^2) y_t = (1 - 2L)(1 + 1.5L) y_t.

3. a(L) y_t = Σ_{i=0}^∞ a_i L^i y_t = Σ_{i=0}^∞ a_i y_{t-i}.

4. The autoregressive moving average (ARMA) model y_t - ρ_1 y_{t-1} - ... - ρ_p y_{t-p} = ε_t + φ_1 ε_{t-1} + ... + φ_q ε_{t-q} can be rewritten as P(L) y_t = φ(L) ε_t, where P(L) = 1 - ρ_1 L - ρ_2 L^2 - ... - ρ_p L^p and φ(L) = 1 + φ_1 L + φ_2 L^2 + ... + φ_q L^q.
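Example 2 above factors a lag polynomial by hand; one quick way to double-check such a factorisation is to multiply the factors back together. A minimal Python sketch (numpy assumed; not part of the original notes):

    from numpy.polynomial import polynomial as P
    import numpy as np

    # Verify that 1 - 0.5L - 3L^2 = (1 - 2L)(1 + 1.5L), as in example 2.
    # Coefficients are ordered by ascending power: [c0, c1, c2] means c0 + c1*L + c2*L^2.
    product = P.polymul([1.0, -2.0], [1.0, 1.5])    # (1 - 2L)(1 + 1.5L)
    print(product)                                  # [ 1.  -0.5 -3. ]
    print(np.allclose(product, [1.0, -0.5, -3.0]))  # True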

With lag operator notation, we can derive the stability condition of a linear difference equation easily. Returning to Equation (2), we can write

    y_t - ρ y_{t-1} = 0   ⟺   (1 - ρL) y_t = 0.

The system is stable when the root of the lag polynomial, i.e. the solution z = 1/ρ of 1 - ρz = 0, lies outside the unit circle.

1.3 Weak and strict stationarity

Let {y_t} be a random variable indexed by t, with mean μ. We define the autocovariances of {y_t} as

    γ_k = E[(y_t - E(y_t))(y_{t-k} - E(y_{t-k}))],   k = 0, ±1, ±2, ....    (19)

The autocovariance of {y_t} depends on k; when k = 0 the autocovariance equals var(y_t). In addition to the autocovariances, we define the autocorrelations of {y_t} as

    ρ_k = E[(y_t - μ)(y_{t-k} - μ)] / √(E(y_t - μ)^2 E(y_{t-k} - μ)^2).    (20)

If the expectation is constant over time, E(y_t) = E(y_{t-1}) = ... = E(y_{t-k}), the autocorrelation function simplifies to

    ρ_k = γ_k / γ_0.    (21)

Under weak stationarity, the covariance between y_t and y_{t-k} depends only on the distance k between them, not on the time index t. A time series {y_t} is called strictly stationary if its joint distributions are invariant across time periods, so that, for example,

    E(y_t + y_{t+3})^4 = E(y_1 + y_4)^4 = E(y_{100} + y_{103})^4.    (22)

Throughout, we write

    γ_k = cov(y_t, y_{t-k}),    (23)
    ρ_k = γ_k / γ_0.    (24)

Definition 1 (Wold decomposition; an alibi for ARMA models) Any covariance-stationary process y_t can be written as

    y_t = k_t + Σ_{j=0}^∞ ψ_j ε_{t-j},   Σ_{j=0}^∞ ψ_j^2 < ∞,    (25)

where k_t is a deterministic (detrend) component and ε_t is white noise with variance σ^2.
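Sample counterparts of γ_k and ρ_k are straightforward to compute. The following Python sketch (numpy assumed; the helper name sample_acf is ours, not from the notes) estimates the autocorrelation function of a series with the usual 1/T normalisation:

    import numpy as np

    def sample_acf(y, nlags):
        # Sample autocorrelations rho_k = gamma_k / gamma_0 for k = 0, ..., nlags.
        y = np.asarray(y, dtype=float)
        T = len(y)
        d = y - y.mean()
        gamma = np.array([np.sum(d[k:] * d[:T - k]) / T for k in range(nlags + 1)])
        return gamma / gamma[0]

    # On a white-noise series, all sample autocorrelations beyond lag 0 should be near zero.
    rng = np.random.default_rng(0)
    print(np.round(sample_acf(rng.standard_normal(500), 5), 3))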

1.4 Stationary ARMA processes

In this section we introduce some basic stationary time series processes.

1.4.1 Moving average processes

First-order moving average process (MA(1)). The basic form of the first-order moving average process is

    y_t = ε_t + θ ε_{t-1},    (26)

where ε_t is a white noise series with variance σ^2. If the coefficient θ satisfies |θ| < 1, this MA(1) process is invertible. The variance of the MA(1) process is

    γ_0 = var(y_t) = var(ε_t) + var(θ ε_{t-1}) = var(ε_t) + θ^2 var(ε_{t-1}) = (1 + θ^2) σ^2.    (27)

The first-order autocovariance is

    γ_1 = cov(y_t, y_{t-1}) = cov(ε_t + θ ε_{t-1}, ε_{t-1} + θ ε_{t-2})
        = cov(ε_t, ε_{t-1}) + θ cov(ε_t, ε_{t-2}) + θ var(ε_{t-1}) + θ^2 cov(ε_{t-1}, ε_{t-2}) = θ σ^2.    (28)

The higher-order autocovariances are γ_j = cov(y_t, y_{t-j}) = 0 for j = 2, 3, .... The first autocorrelation of the MA(1) process is

    ρ_1 = γ_1 / γ_0 = θ / (1 + θ^2),

and the higher-order autocorrelations are ρ_j = γ_j / γ_0 = 0 for j = 2, 3, ....
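The MA(1) moment formulas above can be checked by simulation. A short Python sketch (numpy assumed; parameter values are illustrative):

    import numpy as np

    # Simulate y_t = eps_t + theta*eps_{t-1} and compare the sample first-order
    # autocorrelation with the theoretical value theta / (1 + theta^2).
    rng = np.random.default_rng(1)
    theta, sigma, T = 0.6, 1.0, 100_000
    eps = rng.normal(0.0, sigma, T + 1)
    y = eps[1:] + theta * eps[:-1]

    d = y - y.mean()
    rho1_hat = np.sum(d[1:] * d[:-1]) / np.sum(d ** 2)
    print(rho1_hat, theta / (1 + theta ** 2))   # both close to 0.441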

Higher-order moving average process (MA(q)). The moving average process of order q has the form

    y_t = ε_t + θ_1 ε_{t-1} + θ_2 ε_{t-2} + ... + θ_q ε_{t-q}.    (29)

What do the autocovariance functions look like? First, consider the variance γ_0 (writing θ_0 = 1):

    γ_0 = E(y_t^2) = E(Σ_{j=0}^q θ_j ε_{t-j})^2 = Σ_{j=0}^q θ_j^2 σ^2.    (30)

Second, consider the higher-order autocovariances γ_k:

    γ_k = E(y_t y_{t-k}) = E[(Σ_{j=0}^q θ_j ε_{t-j})(Σ_{j=0}^q θ_j ε_{t-k-j})]
        = Σ_{j=0}^q Σ_{i=0}^q θ_j θ_i E(ε_{t-j} ε_{t-k-i})
        = Σ_{j=0}^q θ_j θ_{j-k} σ^2 I(0 ≤ j - k ≤ q)
        = Σ_{j=k}^q θ_j θ_{j-k} σ^2,    (31)

where k ≤ q and I(·) is an indicator function. If k > q, the autocovariances are equal to zero (γ_k = 0 for k > q). From the MA(q) autocovariance functions we can derive the autocorrelation function.

1.4.2 Autoregressive processes

In this section we introduce another useful stochastic process, the autoregressive model.

First-order autoregressive process:

    y_t = φ y_{t-1} + ε_t,    (32)

where |φ| < 1 and ε_t is a white noise series with variance σ^2. Equation (32) is called a first-order autoregressive process (AR(1)). Using recursive substitution, we can rewrite Equation (32) as an infinite sum of past errors,

    y_t = Σ_{j=0}^∞ φ^j ε_{t-j}.    (33)

Using the lag operator, Equation (32) becomes

    (1 - φL) y_t = ε_t.    (34)

If the root of 1 - φz = 0 lies outside the unit circle, i.e. |φ| < 1, this AR(1) process is weakly stationary. When the root of 1 - φz = 0 lies on or inside the unit circle, i.e. |φ| ≥ 1, the effect of ε_{t-j} on y_t does not die out, and y_t is explosive. For the AR(1) model, the autocovariance functions are as follows:

    γ_0 = E(y_t^2) = E(Σ_{j=0}^∞ φ^j ε_{t-j})^2 = Σ_{j=0}^∞ φ^{2j} E(ε_{t-j}^2) = σ^2 / (1 - φ^2),    (35)

    γ_k = E(y_t y_{t-k}) = E[(Σ_{j=0}^∞ φ^j ε_{t-j})(Σ_{i=0}^∞ φ^i ε_{t-k-i})]
        = Σ_{j=0}^∞ Σ_{i=0}^∞ φ^j φ^i E(ε_{t-j} ε_{t-k-i})
        = Σ_{j=0}^∞ Σ_{i=0}^∞ φ^j φ^i σ^2 I(j = k + i)
        = Σ_{i=0}^∞ φ^{k+i} φ^i σ^2 = φ^k Σ_{i=0}^∞ φ^{2i} σ^2 = φ^k σ^2 / (1 - φ^2).    (36)

Another way to calculate the autocovariances of the AR(1) model is

    γ_k = E(y_t y_{t-k}) = E[(φ y_{t-1} + ε_t) y_{t-k}] = φ E(y_{t-1} y_{t-k}) + E(ε_t y_{t-k}) = φ γ_{k-1}.    (37)

Recursive substitution in Equation (37) gives

    γ_k = φ^k γ_0.    (38)

The variance of the AR(1) model is

    γ_0 = σ^2 / (1 - φ^2).    (39)

Substituting Equation (39) into (38),

    γ_k = φ^k σ^2 / (1 - φ^2).    (40)

The autocorrelation function of the AR(1) model is therefore

    ρ_k = γ_k / γ_0 = φ^k,   k = 0, 1, ....    (41)
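Equation (41) can likewise be verified by simulation. A minimal Python sketch (numpy assumed; φ = 0.7 is an illustrative value):

    import numpy as np

    # Simulate y_t = phi*y_{t-1} + eps_t and compare sample autocorrelations with phi^k.
    rng = np.random.default_rng(2)
    phi, T = 0.7, 200_000
    eps = rng.standard_normal(T)
    y = np.empty(T)
    y[0] = eps[0]
    for t in range(1, T):
        y[t] = phi * y[t - 1] + eps[t]

    d = y - y.mean()
    gamma0 = np.sum(d * d)
    for k in range(1, 5):
        rho_k = np.sum(d[k:] * d[:-k]) / gamma0
        print(k, round(rho_k, 3), round(phi ** k, 3))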

Second-order autoregressive process:

    y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ε_t.    (42)

For Equation (42) to be weakly stationary, we need the roots of the lag polynomial 1 - φ_1 z - φ_2 z^2 = 0 to lie outside the unit circle. To derive the autocovariances of the AR(2) model, first multiply Equation (42) by y_{t-1}:

    y_t y_{t-1} = φ_1 y_{t-1}^2 + φ_2 y_{t-2} y_{t-1} + ε_t y_{t-1}.    (43)

Taking expectations of Equation (43), we obtain

    γ_1 = φ_1 γ_0 + φ_2 γ_1.    (44)

Second, multiply Equation (42) by y_{t-2} and take expectations:

    E(y_t y_{t-2}) = φ_1 E(y_{t-1} y_{t-2}) + φ_2 E(y_{t-2}^2) + E(ε_t y_{t-2}),    (45)

which gives the second equation

    γ_2 = φ_1 γ_1 + φ_2 γ_0.    (46)

Dividing Equations (44) and (46) by γ_0, we obtain two equations in the autocorrelations:

    ρ_1 = φ_1 + φ_2 ρ_1,    (47)
    ρ_2 = φ_1 ρ_1 + φ_2.    (48)

These two equations are called the Yule-Walker equations. We can use the Yule-Walker equations to calculate autocorrelations for higher-order AR or ARMA processes as well. From the Yule-Walker equations, the first- and second-order autocorrelations are

    ρ_1 = φ_1 / (1 - φ_2),    (49)
    ρ_2 = φ_1^2 / (1 - φ_2) + φ_2.    (50)
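The Yule-Walker equations (47)-(48) can be used in both directions: given (φ_1, φ_2) they yield the autocorrelations, and given (estimated) autocorrelations they can be solved for the AR coefficients. A small Python sketch (numpy assumed; coefficient values are illustrative):

    import numpy as np

    # Forward direction: from (phi1, phi2) to (rho1, rho2), Equations (49)-(50).
    phi1, phi2 = 0.5, 0.3
    rho1 = phi1 / (1.0 - phi2)
    rho2 = phi1 * rho1 + phi2
    print(rho1, rho2)                           # 0.714..., 0.657...

    # Reverse direction: solve the same two equations for (phi1, phi2)
    # given the autocorrelations, as one would with sample values.
    A = np.array([[1.0, rho1],
                  [rho1, 1.0]])
    print(np.linalg.solve(A, [rho1, rho2]))     # recovers [0.5, 0.3]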

1.4.3 Autoregressive moving average (ARMA) processes

The simplest autoregressive moving average process is the ARMA(1,1) process,

    y_t = φ y_{t-1} + ε_t + θ ε_{t-1}.    (51)

If the root of the autoregressive polynomial 1 - φz = 0 lies outside the unit circle, Equation (51) is weakly stationary. If the moving average polynomial 1 + θz = 0 has its root outside the unit circle, we can rewrite the ARMA(1,1) model as an AR(∞) representation; the ARMA(1,1) process is then invertible. For the ARMA(1,1) model, we can use Yule-Walker equations to derive the autocovariance and autocorrelation functions. First, we obtain the following two equations:

    E(y_t^2) - φ E(y_t y_{t-1}) = γ_0 - φ γ_1 = E(y_t ε_t) + θ E(y_t ε_{t-1}),    (52)
    E(y_t y_{t-1}) - φ E(y_{t-1}^2) = γ_1 - φ γ_0 = E(y_{t-1} ε_t) + θ E(y_{t-1} ε_{t-1}).    (53)

From the weak stationarity of the ARMA(1,1) process and ε_t ~ i.i.d.(0, σ^2), we know that E(y_t ε_t) = E(y_{t-1} ε_{t-1}) and E(y_{t-1} ε_t) = 0. For E(y_t ε_t) and E(y_t ε_{t-1}),

    E(y_t ε_t) = E[(φ y_{t-1} + ε_t + θ ε_{t-1}) ε_t] = φ E(y_{t-1} ε_t) + E(ε_t^2) + θ E(ε_{t-1} ε_t) = σ^2,    (54)
    E(y_t ε_{t-1}) = E[(φ y_{t-1} + ε_t + θ ε_{t-1}) ε_{t-1}] = φ E(y_{t-1} ε_{t-1}) + E(ε_t ε_{t-1}) + θ E(ε_{t-1}^2)
                   = φ σ^2 + 0 + θ σ^2 = (φ + θ) σ^2.    (55)

Substituting Equations (54) and (55) into Equations (52) and (53) gives the system

    γ_0 - φ γ_1 = σ^2 + θ(φ + θ) σ^2,
    γ_1 - φ γ_0 = θ σ^2.

Solving these two equations simultaneously, we obtain

    γ_0 = (1 + 2θφ + θ^2) σ^2 / (1 - φ^2)   and   γ_1 = φ (1 + 2φθ + θ^2) σ^2 / (1 - φ^2) + θ σ^2.

For the higher-order Yule-Walker equations,

    E(y_t y_{t-i}) - φ E(y_{t-1} y_{t-i}) = E(ε_t y_{t-i}) + θ E(ε_{t-1} y_{t-i})  ⟹  γ_i - φ γ_{i-1} = 0,   i = 2, 3, ....    (56)

Applying Equation (56) recursively, we obtain the general form of the autocovariances,

    γ_i = φ^{i-1} γ_1 = φ^{i-1} [φ (1 + 2φθ + θ^2) σ^2 / (1 - φ^2) + θ σ^2],   i = 2, 3, ....

1.5 The Box-Jenkins modelling philosophy

Following the idea of the Wold decomposition, stationary time series data can be modelled by an ARMA(p,q) model. But since p and q may be any positive integers, fitting an ARMA model with infinitely many parameters is intractable for economists and statisticians. Empirical economists want to find a model that fits the data with a parsimonious ARMA structure. Box and Jenkins (1976) propose a model selection procedure. The standard Box-Jenkins procedure consists of four steps:

1. Transform the nonstationary time series into a weakly stationary time series.

2. Identify an ARMA(p,q) model from the transformed time series.

3. Estimate the parameters of this ARMA(p,q) model.

4. Use diagnostic tests to check model adequacy, and re-identify an ARMA(p,q) model if the preliminary model is found inappropriate.

Except for the first step, we introduce each of these steps below.
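As an illustration of steps 2-4, the following Python sketch runs an identification / estimation / diagnostic pass on a simulated series using statsmodels (assumed to be installed; the ARIMA class and acorr_ljungbox function are statsmodels' implementations of the tools discussed in the next subsections). It is a sketch of the workflow only, not part of the original notes.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    # Step 1 is assumed done: we start from a stationary series (here a simulated ARMA(1,1)).
    rng = np.random.default_rng(4)
    eps = rng.standard_normal(400)
    y = np.empty(400)
    y[0] = eps[0]
    for t in range(1, 400):
        y[t] = 0.6 * y[t - 1] + eps[t] + 0.3 * eps[t - 1]

    # Steps 2-3: identify (p, q) by an information criterion and estimate the model.
    best = min(((ARIMA(y, order=(p, 0, q)).fit().bic, p, q)
                for p in range(3) for q in range(3)),
               key=lambda x: x[0])
    print("chosen (p, q):", best[1:])

    # Step 4: diagnostic checking on the residuals of the chosen model.
    res = ARIMA(y, order=(best[1], 0, best[2])).fit()
    print(acorr_ljungbox(res.resid, lags=[10]))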

1.5.1 Identification

In the Box-Jenkins approach, the most important and most difficult step is identification. After transforming the ARIMA(p,d,q) model into an ARMA(p,q) model, we must find the best orders p and q. The sample autocorrelation function is one way to judge the order of the moving average part, because the autocorrelation function of an MA(q) process is zero beyond the true order q. We can therefore judge an MA(q) model by its sample autocorrelations. The sample autocorrelation function is calculated as

    ρ̂_j = [ (1/T) Σ_{t=j+1}^T (y_t - ȳ)(y_{t-j} - ȳ) ] / [ (1/T) Σ_{t=1}^T (y_t - ȳ)^2 ] = ĉov(y_t, y_{t-j}) / v̂ar(y_t).    (57)

As T → ∞, the sample autocorrelation function ρ̂_j converges to the population autocorrelation function ρ_j.

For an AR(p) model, we use the partial autocorrelation function (PACF) to judge the order of the autoregressive part. The partial autocorrelation between y_t and y_{t-j} is the correlation between y_t and y_{t-j} after removing, by the best linear projection, the effect of the intermediate values y_{t-1}, ..., y_{t-j+1} (Brockwell and Davis 1991, Kuan 2003). We can use the following definition to calculate partial autocorrelations.

Definition 2 (Greene 2003) The partial autocorrelation between y_t and y_{t-j} is the last coefficient in the best linear projection of y_t on 1, y_{t-1}, ..., y_{t-j}, i.e. the last element φ̂_j of the vector

    (φ̂_1, ..., φ̂_j)' = Γ̂_j^{-1} (γ̂_1, ..., γ̂_j)',   where   Γ̂_j = [ γ̂_0     ...  γ̂_{j-1}
                                                                       ...            ...
                                                                       γ̂_{j-1} ...  γ̂_0 ],

so that ρ_j^{pac} = φ̂_j.    (58)

If y_t is an AR(p) process and y_t is correlated with y_{t-j} after controlling for the intermediate lags, the last coefficient will be far from zero; on the other hand, if y_t is uncorrelated with y_{t-j}, the partial autocorrelation coefficient will be zero. We can therefore use the sample partial autocorrelation function to judge the order p of the AR part: when the model is AR(p), the partial autocorrelation graph cuts off after order p.

Another very popular way to select the orders is a model selection criterion. Two common information criteria are the Akaike information criterion (AIC, Akaike 1973) and the Bayesian information criterion (BIC, also called the Schwarz information criterion, SIC; Schwarz 1978):

    AIC = ln(SSR/T) + 2(p + q + 1)/T,    (59)
    BIC = ln(SSR/T) + (p + q + 1) ln(T)/T,    (60)

where SSR is the sum of squared residuals of the QMLE and p and q are candidate orders of the autoregressive and moving average parts. When we add more autoregressive or moving average terms, the SSR decreases but (p + q + 1) increases, so the criteria embody a tradeoff between fit and parameter parsimony. BIC imposes a larger penalty on the number of parameters. Generally we choose the model with the minimum value of AIC or BIC; if the AIC minimum and the BIC minimum disagree, we choose the model with the minimum BIC, because AIC tends to overfit when p + q + 1 is large.
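The two criteria in (59)-(60) are simple functions of the residual sum of squares. A minimal Python sketch (numpy assumed; the function names and the numerical values are ours, for illustration only):

    import numpy as np

    def aic(ssr, T, p, q):
        # Equation (59): ln(SSR/T) + 2(p+q+1)/T.
        return np.log(ssr / T) + 2 * (p + q + 1) / T

    def bic(ssr, T, p, q):
        # Equation (60): ln(SSR/T) + (p+q+1) ln(T)/T.
        return np.log(ssr / T) + (p + q + 1) * np.log(T) / T

    # A richer model lowers the SSR but pays a penalty; BIC penalises extra
    # parameters more heavily than AIC.
    T = 200
    print(aic(150.0, T, 1, 0), bic(150.0, T, 1, 0))
    print(aic(148.0, T, 2, 1), bic(148.0, T, 2, 1))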

1.5.2 Estimation

Once a basic ARMA(p,q) specification is chosen, we can estimate its parameters. There are three main estimation methods: ordinary least squares (OLS), conditional maximum likelihood (CMLE), and full maximum likelihood (FMLE). When the stationary time series model is an AR(p), we can use OLS to estimate the autoregressive parameters. Consider the AR(1) model

    y_t = φ y_{t-1} + ε_t.

The OLS estimator for the AR(1) model is

    φ̂ = Σ_{t=2}^T y_t y_{t-1} / Σ_{t=2}^T y_{t-1}^2 = Σ_{t=2}^T (φ y_{t-1} + ε_t) y_{t-1} / Σ_{t=2}^T y_{t-1}^2
       = φ + [(1/T) Σ_{t=2}^T ε_t y_{t-1}] / [(1/T) Σ_{t=2}^T y_{t-1}^2],    (61)

and its asymptotic distribution is

    √T (φ̂ - φ) = [(1/√T) Σ_{t=2}^T ε_t y_{t-1}] / [(1/T) Σ_{t=2}^T y_{t-1}^2]
               →d N(0, E(ε_t^2 y_{t-1}^2) / [E(y_{t-1}^2)]^2) = N(0, σ^2 E(y_{t-1}^2) / [E(y_{t-1}^2)]^2)
               = N(0, σ^2 / [σ^2/(1 - φ^2)]) = N(0, 1 - φ^2).    (62)

For the AR(p) model,

    y_t = φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + ε_t,   t = p + 1, ..., T,    (63)

we can write Equation (63) in matrix form as Y = Xβ + ε, where φ is a p × 1 vector and X is the (T - p) × p matrix

    X = [ y_p      ...  y_1
          ...           ...
          y_{T-1}  ...  y_{T-p} ].

The asymptotic distribution of the least squares estimator for the AR(p) model is

    √T (φ̂ - φ) →d N(0, σ^2 Γ_{p-1}^{-1}),    (64)

where Γ_{p-1} is the p × p autocovariance matrix

    Γ_{p-1} = [ γ_0      ...  γ_{p-1}
                ...           ...
                γ_{p-1}  ...  γ_0 ].

The result follows from

    φ̂ = (X'X)^{-1} X'Y,    (65)
    √T (φ̂ - φ) = (X'X/T)^{-1} (X'ε/√T),    (66)
    (1/T) X'X →p Γ_{p-1},    (67)
    (1/√T) X'ε →d N(0, σ^2 Γ_{p-1}),   since (1/T) E(X'ε ε'X) → σ^2 Γ_{p-1},    (68)
    √T (φ̂ - φ) →d N(0, σ^2 Γ_{p-1}^{-1}).    (69)

Some remarks on least squares estimation:

- Least squares works only for AR(p) models.
- No distributional assumptions are made on ε_t.
- It (basically) ignores the first p observations.
- We could alternatively solve the Yule-Walker equations, which use more sample information; implicitly γ̂_j = (1/T) Σ_{t=p+1}^T y_t y_{t-j} is used.

Conditional maximum likelihood estimation for AR models. Consider an AR(1) model y_t = φ y_{t-1} + ε_t, where ε_t ~ N(0, σ^2). Using Bayes' rule,

    f(y_2, ..., y_T | y_1) = f(y_T | y_1, ..., y_{T-1}) f(y_{T-1} | y_1, ..., y_{T-2}) ... f(y_2 | y_1).    (70)

Since the conditional distribution of y_t given y_{t-1} is N(φ y_{t-1}, σ^2),

    f(y_2, ..., y_T | y_1) = (σ √(2π))^{-(T-1)} exp( -(1/(2σ^2)) Σ_{t=2}^T (y_t - φ y_{t-1})^2 ).    (71)

Conditional MLE:

- assumes the ε_t are normal;
- conditions on the first p observations;
- maximizes f(y_{p+1}, ..., y_T | y_1, y_2, ..., y_p);

- for AR(p) models, CMLE = LS;
- intuitively, we neglect the information in y_1, y_2, ..., y_p;
- it can also be applied to ARMA(p,q) models.

Full maximum likelihood estimation for AR models. From Bayes' rule, the unconditional joint density of all the y_t is

    f(y_1, ..., y_T) = f(y_T | y_1, ..., y_{T-1}) f(y_{T-1} | y_1, ..., y_{T-2}) ... f(y_2 | y_1) f(y_1).    (72)

Under normality, the full vector satisfies

    (y_1, ..., y_T)' ~ N(0, Ω),    (73)

so that

    f(y_1, ..., y_T) = (2π)^{-T/2} det(Ω)^{-1/2} exp( -(1/2) y' Ω^{-1} y ).    (74)

Full MLE:

- maximizes f(y_1, y_2, ..., y_T);
- is computationally often extremely difficult, and not available in all computer packages;
- why bother if it is asymptotically equivalent to CMLE anyway? (Conditioning on the first p observations does not matter in the asymptotics.)

Conditional maximum likelihood estimation for MA models. Consider an MA(1) model y_t = ε_t + θ ε_{t-1}, where ε_t ~ N(0, σ^2) and |θ| < 1. Then y_t | ε_{t-1} ~ N(θ ε_{t-1}, σ^2), and if ε_0 = 0 is given, y_1 | ε_0 ~ N(0, σ^2). Using Bayes' rule, we can write the conditional density of the MA(1) model, provided it is invertible, as

    f(y_1, ..., y_T | ε_0 = 0) = f(y_T | ε_0, ..., ε_{T-1}) f(y_{T-1} | ε_0, ..., ε_{T-2}) ... f(y_2 | ε_1, ε_0) f(y_1 | ε_0),    (75)

    f(y_1, y_2, ..., y_T | ε_0 = 0) = (σ √(2π))^{-T} exp( -(1/(2σ^2)) Σ_{t=1}^T ε_t^2 ).    (76)

If y_t is invertible, then ε_t = (1 + θL)^{-1} y_t = Σ_{j=0}^{t-1} (-θ)^j y_{t-j} (given ε_0 = 0), and Equation (76) can be written as

    f(y_1, y_2, ..., y_T | ε_0 = 0) = (σ √(2π))^{-T} exp( -(1/(2σ^2)) Σ_{t=1}^T ( Σ_{j=0}^{t-1} (-θ)^j y_{t-j} )^2 ).    (77)

Full maximum likelihood estimation for MA models. As in the AR case,

    f(y_1, ..., y_T) = (2π)^{-T/2} det(Ω)^{-1/2} exp( -(1/2) y' Ω^{-1} y ).    (78)
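The conditional likelihood (77) suggests a simple way to estimate an MA(1) numerically: recover ε_t recursively from the data (with ε_0 = 0) and maximise the resulting Gaussian likelihood. The Python sketch below (numpy and scipy assumed; the helper name ma1_neg_loglik is ours, and σ^2 is concentrated out) illustrates the idea on simulated data.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def ma1_neg_loglik(theta, y):
        # Conditional (eps_0 = 0) negative log-likelihood of an MA(1),
        # with sigma^2 concentrated out; see Equations (76)-(77).
        eps = np.empty_like(y)
        prev = 0.0
        for t in range(len(y)):          # recursion: eps_t = y_t - theta*eps_{t-1}
            prev = y[t] - theta * prev
            eps[t] = prev
        sigma2 = np.mean(eps ** 2)       # concentrated variance estimate
        return 0.5 * len(y) * np.log(sigma2)

    # Simulated data with true theta = 0.5; the estimate should be close to 0.5.
    rng = np.random.default_rng(5)
    e = rng.standard_normal(2001)
    y = e[1:] + 0.5 * e[:-1]
    res = minimize_scalar(ma1_neg_loglik, bounds=(-0.99, 0.99), args=(y,), method="bounded")
    print(res.x)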

1.5.3 Diagnostic checking

When model estimation is finished, we need to check whether the model is suitable for the data. One method is to check the residuals of the estimated ARMA(p,q) model: when the model is correctly specified, its residuals ought to behave like a white noise sequence, whereas if the model is incorrect, the residual sample autocorrelations will not be close to zero. Consider the residuals ε̂_t from the estimated ARMA(p,q) model. The sample autocorrelations of ε̂_t are

    ρ̂_{ε,j} = Σ_{t=1}^{T-j} (ε̂_t - ε̄)(ε̂_{t+j} - ε̄) / Σ_{t=1}^T (ε̂_t - ε̄)^2,    (79)

where ε̄ denotes the sample average of the residuals. The Box-Pierce residual test (Box and Pierce, 1970) is

    Q_BP = T Σ_{j=1}^m (ρ̂_{ε,j})^2 →d χ^2(m - p - q)    (80)

as T → ∞. When the residuals are not white noise, Q_BP becomes large relative to the χ^2 critical value, and we reject H_0 (the residuals are white noise) in favour of the alternative. The other residual test is the Ljung-Box test (Ljung and Box 1978),

    Q_LB = T(T + 2) Σ_{j=1}^m (ρ̂_{ε,j})^2 / (T - j) →d χ^2(m - p - q)    (81)

as T → ∞.
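Both statistics are easy to compute directly from Equations (79)-(81). A minimal Python sketch (numpy assumed; the function name portmanteau is ours, and resid stands for the residuals of whatever ARMA(p,q) model has been fitted):

    import numpy as np

    def portmanteau(resid, m, p, q):
        # Box-Pierce and Ljung-Box statistics from the first m residual autocorrelations.
        e = np.asarray(resid, dtype=float)
        T = len(e)
        d = e - e.mean()
        denom = np.sum(d ** 2)
        rho = np.array([np.sum(d[:T - j] * d[j:]) / denom for j in range(1, m + 1)])
        q_bp = T * np.sum(rho ** 2)
        q_lb = T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, m + 1)))
        return q_bp, q_lb, m - p - q   # statistics and chi-square degrees of freedom

    # White-noise "residuals" should give small statistics relative to chi^2(m - p - q).
    rng = np.random.default_rng(6)
    print(portmanteau(rng.standard_normal(300), m=10, p=1, q=1))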

1.6 Forecasting with the ARMA model

In time series modelling, one important question for empirical economists is whether the model is suitable for the data, and one main criterion is forecasting ability. In this section we investigate forecasting with the ARMA(p,q) model. Before doing so, we first introduce some basic conditional expectation concepts. Let F_t = σ(x_i, -∞ < i ≤ t) be the information set that includes all observations of x up to time t. Writing the conditional expectation of the innovation with respect to F_{t-1},

    E(ε_t | F_{t-1}) = 0.    (82)

By Equation (82), ε_t cannot be predicted from the information set F_{t-1}. If we want to predict y_{t+1} from the information set F_t = {y_t, y_{t-1}, ...}, we look for the predictor that minimizes the mean squared error between the true and predicted values. Let f(y_t, y_{t-1}, ...) be a prediction of y_{t+1}. We need to find

    arg min_f E[y_{t+1} - f(y_t, y_{t-1}, ...)]^2.    (83)

We can rewrite the objective in Equation (83) as

    E[y_{t+1} - E(y_{t+1} | y_t, y_{t-1}, ...) + E(y_{t+1} | y_t, y_{t-1}, ...) - f(y_t, y_{t-1}, ...)]^2
      = E[y_{t+1} - E(y_{t+1} | y_t, y_{t-1}, ...)]^2
        + 2 E{[y_{t+1} - E(y_{t+1} | y_t, y_{t-1}, ...)][E(y_{t+1} | y_t, y_{t-1}, ...) - f(y_t, y_{t-1}, ...)]}
        + E[E(y_{t+1} | y_t, y_{t-1}, ...) - f(y_t, y_{t-1}, ...)]^2.    (84)

When f(y_t, y_{t-1}, ...) = E(y_{t+1} | y_t, y_{t-1}, ...), the last two terms of Equation (84) vanish, so the conditional expectation is the best (minimum mean squared error) predictor of y_{t+1}, and more generally of y_{t+m}, given F_t.

Example: AR(1). Assume an AR(1) model with i.i.d. errors ε_t, y_t = ρ y_{t-1} + ε_t with |ρ| < 1, and suppose we want to use all the information up to time t to predict y_{t+1}:

    E(y_{t+1} | y_t, ...) = E(ρ y_t + ε_{t+1} | y_t, ...) = E(ρ y_t | y_t, ...) + E(ε_{t+1} | y_t, ...) = ρ y_t.

If we want to use all the information up to time t to predict y_{t+m}, the AR(1) model can be rewritten by recursive substitution as y_{t+m} = ρ^m y_t + Σ_{j=0}^{m-1} ρ^j ε_{t+m-j}, so that

    E(y_{t+m} | y_t, ...) = ρ^m y_t.    (85)

Example: MA(1). Consider an MA(1) model with i.i.d. errors ε_t, y_t = ε_t + θ ε_{t-1} with |θ| < 1, and use F_t to predict y_{t+1}. By invertibility, ε_t = (1 + θL)^{-1} y_t = Σ_{j=0}^∞ (-θ)^j y_{t-j}, so

    E(y_{t+1} | y_t, y_{t-1}, ...) = E(ε_{t+1} + θ ε_t | y_t, y_{t-1}, ...) = θ ε_t.

Example: ARMA(1,1). Consider an ARMA(1,1) model with i.i.d. errors ε_t, y_t - ρ y_{t-1} = ε_t + θ ε_{t-1} with |ρ| < 1 and |θ| < 1, and use F_t to predict y_{t+1}:

    E(y_{t+1} | y_t, ...) = E(ρ y_t + ε_{t+1} + θ ε_t | y_t, y_{t-1}, ...) = ρ y_t + θ ε_t.
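The three forecasting examples translate directly into a few lines of code. The Python sketch below (numpy assumed; the helper names are ours, and the unobserved ε_0 is set to zero as a simplifying assumption) computes the m-step AR(1) forecast of Equation (85) and the one-step ARMA(1,1) forecast ρ y_t + θ ε_t, recovering ε_t recursively from the data.

    import numpy as np

    def ar1_forecast(y_t, rho, m):
        # E(y_{t+m} | F_t) = rho^m * y_t, Equation (85).
        return rho ** m * y_t

    def arma11_one_step(y, rho, theta):
        # One-step forecast rho*y_T + theta*eps_T, with eps recovered recursively
        # from eps_t = y_t - rho*y_{t-1} - theta*eps_{t-1}, starting from eps_0 = 0.
        eps = 0.0
        for t in range(1, len(y)):
            eps = y[t] - rho * y[t - 1] - theta * eps
        return rho * y[-1] + theta * eps

    # Simulated ARMA(1,1) data with rho = 0.6, theta = 0.3 (illustrative values).
    rng = np.random.default_rng(7)
    rho, theta = 0.6, 0.3
    e = rng.standard_normal(500)
    y = np.empty(500)
    y[0] = e[0]
    for t in range(1, 500):
        y[t] = rho * y[t - 1] + e[t] + theta * e[t - 1]

    print([round(ar1_forecast(y[-1], rho, m), 3) for m in range(1, 4)])
    print(round(arma11_one_step(y, rho, theta), 3))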

References

Akaike, H. (1973), "Information theory and an extension of the maximum likelihood principle," in B. N. Petrov and F. Csaki (eds.), 2nd International Symposium on Information Theory, pp. 267-281, Akademiai Kiado, Budapest.

Box, G. E. P. and G. M. Jenkins (1976), Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.

Box, G. E. P. and D. A. Pierce (1970), "Distribution of residual autocorrelations in autoregressive-integrated moving average time series models," Journal of the American Statistical Association 65, 1509-1526.

Brockwell, P. J. and R. A. Davis (1991), Time Series: Theory and Methods, 2nd ed., New York, NY: Springer-Verlag.

Greene, W. (2003), Econometric Analysis, 5th ed., New Jersey: Prentice Hall.

Hamilton, J. (1994), Time Series Analysis, Princeton, NJ: Princeton University Press.

Kuan, C.-M. (2003), Lecture on Basic Time Series Models, Institute of Economics, Academia Sinica.

Ljung, G. M. and G. E. P. Box (1978), "On a measure of lack of fit in time series models," Biometrika 65, 297-303.

Schwarz, G. (1978), "Estimating the dimension of a model," Annals of Statistics 6, 461-464.