    library(forecast)
    log_ap <- log(AirPassengers)
    fit <- auto.arima(log_ap, ic = "aicc")

7 Model diagnostics

Model diagnostics is the final step in the three-step procedure for time series model building suggested by (and attributed to) Box and Jenkins (1970):

Identification, where we look at the data (with ACF, PACF, differencing, lag plots, periodogram, ...), and also at any subject-specific information about the data, to suggest subclasses of parsimonious models we might consider.

Estimation, where we fit the chosen model, or models of interest, to the data.

Diagnostic checking, where we study how the model fits the data, and look for any signs of an inadequate fit using formal hypothesis tests.

The steps overlap, as is the case with information criteria, which can only be found after estimation of the parameters. Please bear in mind that this procedure was suggested when computing was expensive, and even then the procedure was meant to be iterative; the most adequate model may not be found in one iteration.

7.1 Residuals

Let us next take a closer look at the residuals of the ARMA models. Notice that in the time series context, there is no natural decomposition of the data into fitted values and residuals. Please keep this in mind when using the R functions fitted and resid with time series models; see Figure 32.

Consider first an AR(p). If the data really come from an AR(p), and if the estimated parameters are close to their true values, the residuals

    e_i = (x_i − μ̂) − Σ_{j=1}^p φ̂_j (x_{i−j} − μ̂),    i = p+1, ..., n,

should be distributed approximately as white noise. Likewise, for a general ARMA(p,q), the residuals can be expressed as

    e_i = x_i − E[X_i | X_1 = x_1, ..., X_{i−1} = x_{i−1}],

where the conditional expectation is with respect to the process (X_i) following the ARMA model with the estimated parameters (φ̂, θ̂, σ̂², μ̂).

The first step in the residual analysis is to look at the ACF and PACF of the residuals, to see whether they appear similar to those calculated from white noise.
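As a minimal sketch of this first check (the simulated series and the object names x, fit, e are our own, not from the notes):

    # Fit an AR(2) to a simulated AR(2) series and inspect the residual ACF/PACF.
    set.seed(1)
    x <- arima.sim(model = list(ar = c(1/2, 1/3)), n = 200)
    fit <- arima(x, order = c(2, 0, 0))  # AR(2) with an estimated mean
    e <- resid(fit)
    acf(e)   # sample ACF of the residuals: should stay within the white-noise bands
    pacf(e)  # sample PACF of the residuals: likewise

If the model is adequate, roughly 95% of the displayed autocorrelations should fall within the ±1.96/√n bands.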

Figure 32: Residuals of a linear model (top) and residuals of an AR(1) with φ₁ = 3/4 (bottom).

7.2 Residual tests

Definition 7.1 (Box-Pierce test). The Box-Pierce statistic is calculated for some p + q < K ≪ n as

    Q = n Σ_{j=1}^K r_j²,

where r_j is the sample autocorrelation of the residual series. If the model is correct, then Q is approximately distributed as χ²_{K−p−q}.¹³

13. That is, the null (that the model is correct) is rejected if Q is greater than the 1−α quantile of χ²_{K−p−q}.

Definition 7.2 (Ljung-Box test). The Ljung-Box test is exactly as Box-Pierce, but with a modified statistic

    Q = n(n+2) Σ_{j=1}^K r_j²/(n−j),

which has been found empirically to be often a more accurate approximation of χ²_{K−p−q}.
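To make Definitions 7.1 and 7.2 concrete, here is a sketch that computes both statistics directly from the residual autocorrelations (the helper name box_stats is ours); its p-values should match those of R's Box.test up to rounding:

    # Box-Pierce and Ljung-Box statistics from first principles, for residuals e
    # of a model with fitdf = p + q estimated ARMA coefficients.
    box_stats <- function(e, K, fitdf) {
      n <- length(e)
      r <- acf(e, lag.max = K, plot = FALSE)$acf[-1]  # r_1, ..., r_K (drop lag 0)
      Q_bp <- n * sum(r^2)                            # Box-Pierce
      Q_lb <- n * (n + 2) * sum(r^2 / (n - 1:K))      # Ljung-Box
      df <- K - fitdf
      c(Q_BP = Q_bp, p_BP = pchisq(Q_bp, df, lower.tail = FALSE),
        Q_LB = Q_lb, p_LB = pchisq(Q_lb, df, lower.tail = FALSE))
    }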

Example 7.3. Ljung-Box with an MA(3) fitted to a simulated AR(2).

Figure 33: You should not trust the Ljung-Box statistic reported by the R function tsdiag(fit)... (Panels: standardized residuals; p-values for the Ljung-Box statistic.)

    n <- 80; q <- 3; p <- 0
    x <- arima.sim(model = list(ar = c(1/2, 1/3)), n = n)
    fit <- arima(x, order = c(p, 0, q))
    e <- resid(fit)
    pval <- rep(NA, 10)
    for (lag in (p+q+1):10) {
      pval[lag] <- Box.test(e, lag = lag, fitdf = p + q, type = "Ljung-Box")$p.value
    }

Remark 7.4. The R function tsdiag calculates the Ljung-Box statistics with the wrong degrees of freedom, not taking the number of estimated parameters into account, leading to overestimated p-values!

Remark 7.5. The Box-Pierce and Ljung-Box tests generally may fail to disqualify poorly fitting models with smaller data sets (cf. also Brockwell and Davis, p. 312). This means that failing to reject the null should not be taken as a strong indication that the model is necessarily the most adequate one. (Example 7.3 with n = 200 often leads to a clear rejection of the null.)
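The sample-size effect of Remark 7.5 is easy to see by repeating Example 7.3 many times; a sketch (the repetition count 500 and the helper name reject_rate are arbitrary choices of ours):

    # Rejection rate of the corrected Ljung-Box test (K = 10, fitdf = p + q)
    # when an MA(3) is wrongly fitted to AR(2) data, for two sample sizes.
    set.seed(2)
    reject_rate <- function(n, reps = 500, alpha = 0.05) {
      mean(replicate(reps, {
        x <- arima.sim(model = list(ar = c(1/2, 1/3)), n = n)
        fit <- arima(x, order = c(0, 0, 3))
        Box.test(resid(fit), lag = 10, fitdf = 3, type = "Ljung-Box")$p.value < alpha
      }))
    }
    reject_rate(80)   # often modest: the misfit frequently goes undetected
    reject_rate(200)  # markedly higher: the null is now usually rejected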

Figure 34: The incorrect statistics calculated by tsdiag, and the correct Ljung-Box and Box-Pierce (o) statistics.

7.3 Overfitting

Sometimes it can be instructive to fit a higher-order model to reassure ourselves that the chosen model should, in fact, be sufficient. If the preliminary model is, say, an AR(2), we may try to fit an AR(3) and inspect its coefficients. If the first two coefficients of the fitted AR(3) do not differ significantly from those of the AR(2), and the third does not differ significantly from zero, this overfitting procedure gives further support to our choice of the AR(2).

Example 7.6. Suppose we have fitted an AR(1) to the data, and both residual analysis and information criteria support our choice. We fit an AR(2) and compare the coefficients.

            φ̂₁ ± s.d.          φ̂₂ ± s.d.          σ̂²
    AR(1)   0.1935 ± 0.0509                       1.5618
    AR(2)   0.1865 ± 0.0518   0.0368 ± 0.0520     1.5607

What would you conclude?
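A sketch of this comparison in R (the simulated series and the use of $var.coef for the standard errors are our choices; the numbers will not match Example 7.6):

    # Overfitting check: fit AR(1) and AR(2) to the same series and compare
    # coefficients with their standard errors.
    set.seed(3)
    x <- arima.sim(model = list(ar = 0.2), n = 400, sd = sqrt(1.56))
    fit1 <- arima(x, order = c(1, 0, 0))
    fit2 <- arima(x, order = c(2, 0, 0))
    cbind(estimate = coef(fit2), s.e. = sqrt(diag(fit2$var.coef)))
    # Support for the AR(1): ar1 of fit2 is close to ar1 of fit1, and
    # ar2 is within about two standard errors of zero.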

8 Forecasting

Forecasting in time-series models relies on calculating the best forecasts, in the mean square sense, from the model with the estimated parameters. This means calculating the conditional expectations

    x̂_{i+h|1:i} := E[X_{i+h} | X_1 = x_1, ..., X_i = x_i],    h ≥ 1,

where the conditional expectation is with respect to the process (X_i) following the ARMA model with the estimated parameters.

In order to have confidence intervals for the prediction, we should consider the conditional distribution of X_{i+h} given X_1 = x_1, ..., X_i = x_i. Under the assumption of Gaussian white noise, we only need to calculate the predictive variance

    v_{i+h|1:i} = Var(X_{i+h} | X_1 = x_1, ..., X_i = x_i).¹⁴

14. If the prediction (x̂_{i+1|1:i}, ..., x̂_{i+h|1:i}) is considered simultaneously, then one could also consider the conditional covariance matrix.

Remark 8.1. Note that these confidence intervals may be optimistic because of the Gaussian assumption: heavier-tailed noise might well imply wider confidence intervals. Note also that the parameter uncertainty is not taken into account, which is another reason the prediction confidence intervals may be optimistic.

8.1 Autoregressive process

In the case of an AR(p), we already noted in Section 7.1 that the one-step predictors come directly from the definition: for i ≥ p,

    x̂_{i+1|1:i} := E[X_{i+1} | X_1 = x_1, ..., X_i = x_i] = μ̂ + Σ_{j=1}^p φ̂_j (x_{i−j+1} − μ̂).

The conditional variance is just the variance of W_{i+1}, that is, σ̂². For the rest of this section, we assume μ̂ = 0 to simplify the expressions; if (X_i) is the non-centred AR(p) process, we consider X̂_i = X_i − μ̂ and so forth.

The two-step predictor can be calculated as

    x̂_{i+2|1:i} = E[W_{i+2} + Σ_{j=1}^p φ̂_j X_{i+2−j} | X_1 = x_1, ..., X_i = x_i]
                = φ̂₁ E[X_{i+1} | X_1 = x_1, ..., X_i = x_i] + Σ_{j=2}^p φ̂_j x_{i+2−j}
                = φ̂₁ x̂_{i+1|1:i} + Σ_{j=2}^p φ̂_j x_{i+2−j},

where the latter two sums equal zero if p = 1. If we denote x̂_{j|1:i} = x_j for 1 ≤ j ≤ i, we have the general result for i ≥ p and h ≥ 1:

    x̂_{i+h|1:i} = Σ_{j=1}^p φ̂_j x̂_{i+h−j|1:i}.

This just means that we calculate x̂_{i+h|1:i} from the previous values and previous predictions via the AR(p) definition, ignoring the noise.

Remark 8.2. For any stationary AR(p), the predictors x̂_{i+h|1:i} converge to μ̂ as h increases (at an exponential rate).
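The recursion is straightforward to implement; a sketch (the function name ar_forecast is ours, and the series is centred so that μ̂ = 0):

    # h-step AR(p) forecasts via the recursion above, for a centred series.
    ar_forecast <- function(x, phi, h) {
      p <- length(phi)
      n <- length(x)
      xhat <- c(x, numeric(h))  # positions 1..n: data; n+1..n+h: forecasts
      for (k in 1:h) {
        # x-hat_{n+k|1:n} = sum_j phi_j * x-hat_{n+k-j|1:n}
        xhat[n + k] <- sum(phi * xhat[n + k - (1:p)])
      }
      xhat[n + (1:h)]
    }
    # Check against R's predict; the difference should be numerically negligible.
    set.seed(4)
    x <- arima.sim(model = list(ar = c(1/2, 1/3)), n = 200)
    fit <- arima(x, order = c(2, 0, 0), include.mean = FALSE)
    max(abs(ar_forecast(as.numeric(x), coef(fit)[1:2], 10) -
            as.numeric(predict(fit, n.ahead = 10)$pred)))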

Example 8.3. The variance of the prediction of an AR(1) satisfies, for i ≥ 2 and h ≥ 2,

    v_{i+h|1:i} = Var(X_{i+h} | X_1 = x_1, ..., X_i = x_i)
                = σ̂² + φ̂₁² Var(X_{i+h−1} | X_1 = x_1, ..., X_i = x_i)
                = ... = σ̂² Σ_{k=0}^{h−1} (φ̂₁²)^k.

We observe that v_{i+h|1:i} → σ̂²/(1 − φ̂₁²) as h increases. Any stationary AR(p) behaves similarly, that is, the variance of the predictor stabilises to the stationary variance (at an exponential rate).

8.2 General ARIMA

For a general ARIMA process, closed-form expressions are not available, but both the prediction and the variance of the prediction (under the Gaussian assumption) can be calculated numerically. In the case of a regular stationary ARMA (Condition 4.26), we could write, in principle,

    X_i = Σ_{j=1}^∞ β_j X_{i−j} + W_i,

where the constants β_j converge to zero exponentially fast. Therefore, it comes as no surprise that the long-horizon predictions behave similarly to the AR(p) case, that is, x̂_{i+h|1:i} → μ̂ and v_{i+h|1:i} → γ₀ (at an exponential rate).

When the model involves differencing, we can consider the model as a non-stationary ARMA, and carry out the prediction and the calculation of the predictive variance with the same numerical tools as for a stationary ARMA. However, the model is in this case non-stationary, and the predictive variances increase towards infinity as h increases.

Example 8.4. Prediction from an ARIMA(1,0,0)(1,0,0)₁₂ fitted to the NY births data (from 1948) with a linear trend regressor.

    h <- 48
    n <- length(b)  # b: the NY births series, loaded earlier
    t <- time(b)
    f1 <- arima(b, xreg = t, order = c(1,0,0),
                seasonal = list(order = c(1,0,0), period = 12))
    p1 <- predict(f1, n.ahead = h, newxreg = t[n] + (1:h)/12)
    m1 <- p1$pred; s1 <- p1$se
    ts.plot(b, m1, m1 + 1.96*s1, m1 - 1.96*s1, col = c(1,2,2,2), lty = c(1,1,2,2))
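A quick numerical check of Example 8.3 and of the stabilisation claim above (a sketch; the fitted AR(1) and the horizon of 50 steps are our choices):

    # The predictive variance of a fitted AR(1) stabilises at the stationary
    # variance sigma^2 / (1 - phi^2).
    set.seed(5)
    y <- arima.sim(model = list(ar = 0.7), n = 500)
    fit0 <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
    se <- predict(fit0, n.ahead = 50)$se
    phi_hat <- coef(fit0)[["ar1"]]
    c(last_pred_var = se[50]^2,
      stationary_var = fit0$sigma2 / (1 - phi_hat^2))  # nearly equal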

Figure 35: Predictions of the ARIMA(1,0,0)(1,0,0)₁₂ of Example 8.4 (top) and of a similarly fitted non-stationary ARIMA(1,1,0)(1,0,0)₁₂ (bottom).

9 Spectrum of a stationary process

The periodogram was a transform calculated from a finite-length vector. We next consider a slightly more abstract concept: the spectrum of a stationary process. There are two key differences: the process (X_i)_{i∈ℤ} is of infinite length, and the process can take multiple realisations.

9.1 Spectral density

The spectral density of a stationary process is, in fact, a discrete-time Fourier transform (DTFT) of the autocovariance function. It is analogous to the DFT, but with an infinite number of frequencies.

Definition 9.1 (Spectral density). Suppose that (X_t) is a stationary process with an autocovariance sequence satisfying

    Σ_{k=0}^∞ |γ_k| < ∞.

The spectral density of the process (X_t) (or, equivalently, of the autocovariance sequence (γ_k)) is the function

    f(λ) = (1/2π) Σ_{k=−∞}^∞ γ_k e^{−ikλ},    for λ ∈ (−π, π].
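As a concrete check of the definition (a sketch; the AR(1) parameters and the truncation K = 200 are our choices): for an AR(1), γ_k = σ² φ^{|k|}/(1 − φ²), and the truncated sum should be close to the known closed form f(λ) = σ²/(2π |1 − φ e^{−iλ}|²):

    # Spectral density of an AR(1) via a truncated DTFT of the autocovariances,
    # compared with the closed-form expression.
    phi <- 0.6; sigma2 <- 1; K <- 200
    lambda <- seq(-pi, pi, length.out = 512)
    k <- -K:K
    gamma_k <- sigma2 * phi^abs(k) / (1 - phi^2)   # autocovariances gamma_k
    f_sum <- sapply(lambda, function(l) Re(sum(gamma_k * exp(-1i * k * l))) / (2*pi))
    f_closed <- sigma2 / (2 * pi * abs(1 - phi * exp(-1i * lambda))^2)
    max(abs(f_sum - f_closed))                     # tiny truncation error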