The Identification of ARIMA Models

APPENDIX 4: The Identification of ARIMA Models

As we have established in a previous lecture, there is a one-to-one correspondence between the parameters of an ARMA(p, q) model, including the variance of the disturbance, and the leading p + q + 1 elements of the autocovariance function. Given the true autocovariances of a process, we might be able to discern the orders p and q of its autoregressive and moving-average operators; and, given these orders, we should then be able to deduce the values of the parameters.

There are two other functions, prominent in time-series analysis, from which it is possible to recover the parameters of an ARMA process. These are the partial autocorrelation function and the spectral density function. The appearance of each of these functions gives an indication of the nature of the underlying process to which it belongs; and, in theory, the business of identifying the model and of recovering its parameters can be conducted on the basis of any one of them. In practice, the process is assisted by taking account of all three functions.

The empirical versions of the three functions which are used in a model-building exercise may differ considerably from their theoretical counterparts. Even when the data are truly generated by an ARMA process, the sampling errors which affect the empirical functions can lead one to identify the wrong model. This hazard is revealed by sampling experiments. When the data come from the real world, the notion that there is an underlying ARMA process is a fiction, and the business of model identification becomes more doubtful. Then there may be no such thing as the correct model; and the choice amongst the alternative models must be made partly with a view to their intended uses.

The Autocorrelation Functions

The techniques of model identification which are most commonly used were propounded originally by Box and Jenkins (1970). Their basic tools were the sample autocorrelation function and the partial autocorrelation function. We shall describe these functions and their use separately from the spectral density function, which ought, perhaps, to be used more often in selecting models. The fact that the spectral density function is often overlooked is probably due to an unfamiliarity with frequency-domain analysis on the part of many model builders.
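The correspondence between the parameters and the leading autocovariances is easily made concrete. The following minimal sketch, which assumes that the Python statsmodels library is available, computes the theoretical autocovariances of the ARMA(2, 2) process that reappears in Figure 4 below; the fragment is illustrative and forms no part of the original text.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_acovf

# The ARMA(2, 2) process (1 - 1.69L + 0.81L^2)y(t) = (1 + 0.90L + 0.81L^2)e(t).
# statsmodels takes the lag-polynomial coefficients as they appear in the
# operators, including the leading unit coefficients.
ar = np.array([1.0, -1.69, 0.81])
ma = np.array([1.0, 0.90, 0.81])

# The leading p + q + 1 = 5 autocovariances; together with the variance of
# the disturbance (sigma2), they determine the parameters of the model.
gamma = arma_acovf(ar, ma, nobs=5, sigma2=1.0)
print(gamma)
```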

Autocorrelation function (ACF). Given a sample y_0, y_1, ..., y_{T−1} of T observations, we define the sample autocorrelation function to be the sequence of values

(1)    r_τ = c_τ/c_0,    τ = 0, 1, ..., T − 1,

wherein

(2)    c_τ = (1/T) ∑_{t=τ}^{T−1} (y_t − ȳ)(y_{t−τ} − ȳ)

is the empirical autocovariance at lag τ and c_0 is the sample variance. One should note that, as the value of the lag increases, the number of observations comprised in the empirical autocovariance diminishes, until the final element c_{T−1} = T^{−1}(y_0 − ȳ)(y_{T−1} − ȳ) is reached, which comprises only the first and the last of the mean-adjusted observations. In plotting the sequence {r_τ}, we shall omit the value of r_0, which is invariably unity. Moreover, in interpreting the plot, one should be wary of giving too much credence to the empirical autocorrelations at lags which are high in relation to the size of the sample.

Partial autocorrelation function (PACF). The sample partial autocorrelation p_τ at lag τ is simply the correlation between the two sets of residuals obtained from regressing the elements y_t and y_{t−τ} on the set of intervening values y_{t−1}, y_{t−2}, ..., y_{t−τ+1}. The partial autocorrelation measures the dependence between y_t and y_{t−τ} after the effect of the intervening values has been removed.

The sample partial autocorrelation p_τ is virtually the same quantity as the estimated coefficient of lag τ obtained by fitting an autoregressive model of order τ to the data. Indeed, the difference between the two quantities vanishes as the sample size increases. The Durbin–Levinson algorithm provides an efficient way of computing the sequence {p_τ} of partial autocorrelations from the sequence {c_τ} of autocovariances. It can be seen, in view of this algorithm, that the information in {c_τ} is equivalent to the information contained jointly in {p_τ} and c_0. Therefore the sample autocorrelation function {r_τ} and the sample partial autocorrelation function {p_τ} are equivalent in terms of their information content.

The Methodology of Box and Jenkins

The model-building methodology of Box and Jenkins relies heavily upon the two functions {r_τ} and {p_τ} defined above. It involves a cycle comprising the three stages of model selection, model estimation and model checking. In view of the difficulties of selecting an appropriate model, it is envisaged that the cycle might have to be repeated several times and that, at the end, there might be more than one model of the same series.
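As a concrete rendering of equations (1) and (2) and of the Durbin–Levinson recursion, consider the following Python sketch. The function names are our own, and the fragment is meant as an illustration rather than as a substitute for a library routine.

```python
import numpy as np

def sample_acf(y, maxlag):
    """Sample autocorrelations r_tau = c_tau / c_0 of equations (1)-(2)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    dev = y - y.mean()
    c = np.array([dev[tau:] @ dev[:T - tau] / T for tau in range(maxlag + 1)])
    return c / c[0]

def durbin_levinson_pacf(r):
    """Partial autocorrelations {p_tau} from the autocorrelations {r_tau},
    computed by the Durbin-Levinson recursion."""
    maxlag = len(r) - 1
    phi = np.zeros((maxlag + 1, maxlag + 1))  # phi[k, j]: AR(k) coefficients
    pacf = np.ones(maxlag + 1)                # p_0 = 1 by convention
    for k in range(1, maxlag + 1):
        num = r[k] - phi[k - 1, 1:k] @ r[k - 1:0:-1]
        den = 1.0 - phi[k - 1, 1:k] @ r[1:k]
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
        pacf[k] = phi[k, k]                   # p_k is the kth AR(k) coefficient
    return pacf

# Usage: pacf = durbin_levinson_pacf(sample_acf(y, maxlag=20))
```

The recursion makes explicit the equivalence claimed above: the partial autocorrelations are computed from the autocorrelations alone, and the passage can be reversed.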

Figure 1. The concentration readings from a chemical process, together with the autocorrelation function of the series and the autocorrelation function of its differences.

Reduction to stationarity. The first step, which is taken before embarking on the cycle, is to examine the time plot of the data and to judge whether or not it could be the outcome of a stationary process. If a trend is evident in the data, then it must be removed. A variety of techniques of trend removal, which include the fitting of parametric curves and of spline functions, have been discussed in previous lectures. When such a function is fitted, it is to the sequence of residuals that the ARMA model is applied.

However, Box and Jenkins were inclined to believe that many empirical series can be modelled adequately by supposing that some suitable difference of the process is stationary. Thus the process generating the observed series y(t) might be modelled by the ARIMA(p, d, q) equation

(3)    α(L)∇^d y(t) = µ(L)ε(t),

wherein ∇^d = (I − L)^d is the dth power of the difference operator. In that case, the differenced series z(t) = ∇^d y(t) will be described by a stationary ARMA(p, q) model. The inverse operator ∇^{−1} is the summing or integrating operator, which accounts for the fact that the model depicted by equation (3) is described as an autoregressive integrated moving-average model.

To determine whether stationarity has been achieved, either by trend removal or by differencing, one may examine the autocorrelation sequence of the residual or processed series. The sequence corresponding to a stationary process should converge quite rapidly to zero as the value of the lag increases. An empirical autocorrelation function which exhibits a smooth pattern of significant values at high lags indicates a nonstationary series. An example is provided by Figure 1, where a comparison is made between the autocorrelation function of the original series and that of its differences. Although the original series does not appear to embody a systematic trend, it does drift in a haphazard manner which suggests a random walk; and it is appropriate to apply the difference operator.

Once the degree of differencing has been determined, the autoregressive and moving-average orders are selected by examining the sample autocorrelations and sample partial autocorrelations. The characteristics of pure autoregressive and pure moving-average processes are easily spotted. Those of a mixed autoregressive moving-average model are not so easily unravelled.
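To illustrate the diagnostic, the brief sketch below, which reuses the sample_acf helper defined above, simulates a random walk; the autocorrelations of the levels decline only slowly, whereas those of the differenced series fall away at once. The seed and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(150))   # random walk: (I - L)y(t) = e(t)
z = np.diff(y)                            # z(t) = (I - L)y(t), i.e. d = 1

print(np.round(sample_acf(y, 10), 2))     # smooth, slowly fading pattern
print(np.round(sample_acf(z, 10), 2))     # negligible beyond lag zero
```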

Moving-average processes. The theoretical autocorrelation function {ρ_τ} of a pure moving-average process of order q has ρ_τ = 0 for all τ > q. The corresponding partial autocorrelation function {π_τ} is liable to decay towards zero gradually. To judge whether a sample autocorrelation function {r_τ} shows evidence of such a truncation, we need some scale by which to judge the significance of the values of its elements.

Figure 2. The graph of 120 observations on a simulated series generated by the MA(2) process y(t) = (1 + 0.90L + 0.81L^2)ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.

As a guide to determining whether the parent autocorrelations are in fact zero after lag q, we may use a result of Bartlett [1946] which shows that, for a sample of size T, the standard deviation of r_τ is approximately

(4)    (1/√T){1 + 2(r_1^2 + r_2^2 + ··· + r_q^2)}^{1/2}    for τ > q.

The result is also given by Fuller [1976, p. 237]. A simpler measure of the scale of the autocorrelations is provided by the limits of ±1.96/√T, which are the approximate 95% confidence bounds for the autocorrelations of a white-noise sequence. These bounds are represented by the dashed horizontal lines on the accompanying graphs.

Autoregressive processes. The theoretical autocorrelation function {ρ_τ} of a pure autoregressive process of order p obeys a homogeneous difference equation based upon the autoregressive operator α(L) = 1 + α_1 L + ··· + α_p L^p. That is to say,

(5)    ρ_τ = −(α_1 ρ_{τ−1} + ··· + α_p ρ_{τ−p})    for all τ ≥ p.

In general, the sequence generated by this equation will represent a mixture of damped exponential and sinusoidal functions. If the sequence is of a sinusoidal nature, then the presence of complex roots in the operator α(L) is indicated. One can expect the empirical autocovariance function of a pure AR process to be of the same nature as its theoretical parent.

It is the partial autocorrelation function which serves most clearly to identify a pure AR process. The theoretical partial autocorrelation function {π_τ} of an AR(p) process has π_τ = 0 for all τ > p. Likewise, all elements of the sample partial autocorrelation function are expected to be close to zero for lags greater than p, which corresponds to the fact that they are simply estimates of zero-valued parameters. The significance of the values of the partial autocorrelations is judged by the fact that, for a pth-order process, their standard deviations for all lags greater than p are approximated by 1/√T. Thus the bounds of ±1.96/√T are also plotted on the graph of the partial autocorrelation function.
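Both significance scales are easily computed. The sketch below, which again relies on the sample_acf helper, applies Bartlett's formula (4) to a simulation of the MA(2) series of Figure 2 and prints the white-noise bounds that serve for the partial autocorrelations; the simulation details are our own.

```python
import numpy as np

def bartlett_se(r, q, T):
    """Approximate standard deviation of r_tau for tau > q under the
    hypothesis of an MA(q) parent process: Bartlett's formula (4)."""
    return np.sqrt((1.0 + 2.0 * np.sum(np.asarray(r)[1:q + 1] ** 2)) / T)

# Simulate y(t) = (1 + 0.90L + 0.81L^2)e(t), as in Figure 2.
rng = np.random.default_rng(1)
e = rng.standard_normal(122)
y = e[2:] + 0.90 * e[1:-1] + 0.81 * e[:-2]
T = len(y)                                    # 120 observations

r = sample_acf(y, 12)
se = bartlett_se(r, q=2, T=T)                 # scale for lags beyond q = 2
print([tau for tau in range(3, 13) if abs(r[tau]) > 1.96 * se])  # few, if any
print(1.96 / np.sqrt(T))                      # white-noise bounds for the PACF
```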

Figure 3. The graph of 120 observations on a simulated series generated by the AR(2) process (1 − 1.69L + 0.81L^2)y(t) = ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.

Mixed processes. In the case of a mixed ARMA(p, q) process, neither the theoretical autocorrelation function nor the theoretical partial autocorrelation function has any abrupt cutoff. Indeed, there is little that can be inferred from either of these functions, or from their empirical counterparts, beyond the fact that neither a pure MA model nor a pure AR model would be appropriate. On its own, the autocovariance function of an ARMA(p, q) process is not easily distinguished from that of a pure AR process. In particular, its elements γ_τ satisfy the same difference equation as that of a pure AR model for all values of τ > max(p, q).

Figure 4. The graph of 120 observations on a simulated series generated by the ARMA(2, 2) process (1 − 1.69L + 0.81L^2)y(t) = (1 + 0.90L + 0.81L^2)ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.

There is good reason to regard mixed models as more appropriate in practice than pure models of either variety. For a start, there is the fact that a rational transfer function is far more effective in approximating an arbitrary impulse response than is an autoregressive transfer function, whose parameters are confined to the denominator, or a moving-average transfer function, which has its parameters in the numerator. Indeed, it might be appropriate, sometimes, to approximate a pure process of high order by a more parsimonious mixed model.

Mixed models are also favoured by the fact that the sum of any two mutually independent autoregressive processes gives rise to an ARMA process. Let y(t) and z(t) be autoregressive processes of orders p and r respectively, described by the equations α(L)y(t) = ε(t) and ρ(L)z(t) = η(t), wherein ε(t) and η(t) are mutually independent white-noise processes. Then their sum will be

(6)    y(t) + z(t) = ε(t)/α(L) + η(t)/ρ(L) = {ρ(L)ε(t) + α(L)η(t)}/{α(L)ρ(L)} = µ(L)ζ(t)/{α(L)ρ(L)},

where µ(L)ζ(t) = ρ(L)ε(t) + α(L)η(t) constitutes a moving-average process of order max(p, r).

In economics, where the data series are highly aggregated, mixed models would often seem to be called for. In the context of electrical and mechanical engineering, there may be some justification for pure AR models. Here there is often abundant data, sufficient to sustain the estimation of pure autoregressive models of high order. Therefore the principle of parametric parsimony is less persuasive than it might be in an econometric context. However, pure AR models perform poorly whenever the data are affected by errors of observation; and, in this respect, a mixed model is liable to be more robust. One can understand this feature of mixed models by recognising that the sum of a pure AR(p) process and a white-noise process is an ARMA(p, p) process.
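Equation (6) can be checked numerically. The following sketch, which assumes the statsmodels library for the theoretical autocovariances, adds the autocovariances of two independent AR(1) processes and verifies that the sum obeys the difference equation of the product operator α(L)ρ(L) at all lags beyond q = 1, exactly as an ARMA(2, 1) process must; the coefficients are arbitrary.

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_acovf

# Independent AR(1) processes (1 - 0.8L)y(t) = e(t) and (1 - 0.5L)z(t) = n(t).
g_y = arma_acovf(np.array([1.0, -0.8]), np.array([1.0]), nobs=8)
g_z = arma_acovf(np.array([1.0, -0.5]), np.array([1.0]), nobs=8)
g = g_y + g_z                  # independence: the autocovariances simply add

# The sum is governed by the product polynomial a(L) = alpha(L)rho(L).
a = np.polymul([1.0, -0.8], [1.0, -0.5])      # [1.0, -1.3, 0.4]

# For an ARMA(2, 1) process, g[t] + a1*g[t-1] + a2*g[t-2] = 0 for all t > 1.
for t in range(2, 8):
    assert np.isclose(g[t], -a[1] * g[t - 1] - a[2] * g[t - 2])
print("the summed process satisfies the ARMA(2, 1) recursion")
```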