library(forecast)
log_ap <- log(AirPassengers)
fit <- auto.arima(log_ap, ic="aicc")

7 Model diagnostics

Model diagnostics is the final step in the three-step procedure for time series model building suggested by (and attributed to) Box and Jenkins (1970):

Identification where we look at the data (with ACF, PACF, differencing, lag plots, periodogram...), and also any subject-specific information about the data, to suggest subclasses of parsimonious models we might consider.

Estimation where we fit the chosen model, or models of interest, to the data.

Diagnostic checking where we study how the model fits the data, and look for any signs of an inadequate fit using formal hypothesis tests.

The steps overlap, as is the case with information criteria, which can only be computed after estimation of the parameters. Please bear in mind that this procedure was suggested when computing was expensive, and even then the procedure was meant to be iterative; the most adequate model may not be found in one iteration.

7.1 Residuals

Let us next take a closer look at the residuals of the ARMA models. Notice that in the time series context, there is no natural decomposition of the data into fitted values and residuals. Please keep this in mind when using the R functions fitted and resid with time series models; see Figure 32.

Consider first an AR(p). If the data really come from an AR(p), and if the estimated parameters are close to their true values, we should have the residuals

e_i = (x_i - \hat\mu) - \sum_{j=1}^{p} \hat\phi_j (x_{i-j} - \hat\mu),    i = p+1,...,n,

distributed approximately according to white noise. Likewise, for a general ARMA(p,q), the residuals can be expressed as

e_i = x_i - E[X_i | X_1 = x_1, ..., X_{i-1} = x_{i-1}],

where the conditional expectation is with respect to the process (X_i) following the ARMA with the estimated parameters (\hat\phi, \hat\theta, \hat\sigma^2, \hat\mu).

The first step in the residual analysis is to look at the ACF and PACF of the residuals, and check whether they appear similar to those calculated from white noise.
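As a minimal sketch of this first check (using a simulated ARMA(1,1) series purely for illustration; the data, model order and seed are not from the notes), one can extract the residuals with resid and compare their ACF and PACF with white-noise behaviour:

set.seed(1)
x <- arima.sim(model=list(ar=0.6, ma=0.3), n=200)   # illustrative ARMA(1,1) data
fit <- arima(x, order=c(1,0,1))                     # fit the same model class
e <- resid(fit)                                     # residuals e_i
par(mfrow=c(1,2))
acf(e); pacf(e)                                     # should look like white noise if the fit is adequate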
Figure 32: Residuals of a linear model (top) and residuals of an AR(1) with \phi_1 = 3/4 (bottom).

7.2 Residual tests

Definition 7.1 (Box-Pierce test). The Box-Pierce statistic is calculated, for some p + q < K < n, as

Q = n \sum_{j=1}^{K} r_j^2,

where r_j is the sample autocorrelation of the residual series. If the model is correct, then Q is approximately distributed as \chi^2_{K-p-q}.13

Definition 7.2 (Ljung-Box test). The Ljung-Box test is exactly as Box-Pierce, but with a modified statistic

Q = n(n+2) \sum_{j=1}^{K} \frac{r_j^2}{n-j},

which has been found empirically to often be a more accurate approximation of \chi^2_{K-p-q}.

Example 7.3. Ljung-Box with MA(3) fitted to simulated AR(2).

13. That is, the null (that the model is correct) is rejected if Q is greater than the 1 - \alpha quantile of \chi^2_{K-p-q}.
Figure 33: You should not trust the Ljung-Box statistic reported by the R function tsdiag(fit)...

n <- 80; q <- 3; p <- 0
x <- arima.sim(model=list(ar=c(1/2, 1/3)), n)
fit <- arima(x, order=c(p,0,q)); e <- resid(fit); pval <- rep(NA, 10)
for(lag in (p+q+1):10) {
  pval[lag] <- Box.test(e, lag=lag, fitdf=p+q, type="Ljung-Box")$p.value
}

Remark 7.4. The R function tsdiag calculates the Ljung-Box statistics with the wrong degrees of freedom, not taking the number of estimated parameters into account, leading to overestimated p-values!

Remark 7.5. The Box-Pierce and Ljung-Box tests generally may fail to disqualify poorly fitting models with smaller data sets (cf. also Brockwell and Davis, p. 312). This means that failing to reject the null should not be taken as a strong indication that the model is necessarily the most adequate one. (Example 7.3 with n = 200 often leads to a clear rejection of the null.)
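Continuing the example above, the issue in Remark 7.4 can be illustrated by recomputing the p-values with the degrees of freedom that tsdiag effectively uses (fitdf = 0) and comparing them with the correct ones. This is only a sketch of the comparison shown in Figure 34, not the exact code used to produce it:

pval_wrong <- rep(NA, 10)
for(lag in (p+q+1):10) {
  # fitdf = 0 ignores the p+q estimated parameters, as tsdiag does
  pval_wrong[lag] <- Box.test(e, lag=lag, fitdf=0, type="Ljung-Box")$p.value
}
plot(pval, ylim=c(0,1), pch=1, xlab="lag", ylab="p-value")  # correct degrees of freedom
points(pval_wrong, pch=4)                                   # overestimated p-values
abline(h=0.05, lty=2)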
Figure 34: The incorrect statistics calculated by tsdiag, and the correct Ljung-Box and Box-Pierce (o) statistics.

7.3 Overfitting

Sometimes, it can be instructive to fit a higher-order model to reassure ourselves that the chosen model should, in fact, be sufficient. If the preliminary model is, say, AR(2), we may try to fit an AR(3) and inspect its coefficients. If the first two coefficients of the fitted AR(3) do not significantly differ from those of the AR(2), and the third does not significantly differ from zero, this overfitting procedure can give further support to our choice of the AR(2).

Example 7.6. Suppose we have fitted an AR(1) to the data, and both residual analysis and information criteria support our choice. We fit an AR(2) and compare the coefficients.

         \hat\phi_1 ± s.d.      \hat\phi_2 ± s.d.      \hat\sigma^2
AR(1)    0.1935 ± 0.0509                               1.5618
AR(2)    0.1865 ± 0.0518        0.0368 ± 0.0520        1.5607

What would you conclude?

8 Forecasting

Forecasting in time series models relies on calculating forecasts, optimal in the mean square sense, from the model with the estimated parameters. This means calculating the conditional expectations

\hat{x}_{i+h|1:i} := E[X_{i+h} | X_1 = x_1, ..., X_i = x_i],    h ≥ 1,

where the conditional expectation is with respect to the process (X_i) following the ARMA with the estimated parameters.

In order to have confidence intervals for the prediction, we should consider the conditional distribution of X_{i+h} given X_1 = x_1, ..., X_i = x_i. Under the
assumption of Gaussian white noise, we only need to calculate the predictive variance

v_{i+h|1:i} = Var(X_{i+h} | X_1 = x_1, ..., X_i = x_i).14

Remark 8.1. Note that these confidence intervals may be optimistic because of the Gaussian assumption: heavier-tailed noise might well imply wider confidence intervals. Note also that the parameter uncertainty is not taken into account, which is another reason why the prediction confidence intervals may be optimistic.

8.1 Autoregressive process

In the case of AR(p), we already noted in Section 7.1 that the one-step predictors come directly from the definition: for i ≥ p,

\hat{x}_{i+1|1:i} := E[X_{i+1} | X_1 = x_1, ..., X_i = x_i] = \hat\mu + \sum_{j=1}^{p} \hat\phi_j (x_{i-j+1} - \hat\mu).

The conditional variance is just the variance of W_{i+1}, that is, \hat\sigma^2.

For the rest of this section, we assume \hat\mu = 0 to simplify expressions; if (X_i) were the non-centred AR(p) process, we would consider \hat{X}_i = X_i - \hat\mu and so forth. The two-step predictor can be calculated as

\hat{x}_{i+2|1:i} = E[ W_{i+2} + \sum_{j=1}^{p} \hat\phi_j X_{i+2-j} | X_1 = x_1, ..., X_i = x_i ]
                 = \hat\phi_1 E[X_{i+1} | X_1 = x_1, ..., X_i = x_i] + \sum_{j=2}^{p} \hat\phi_j x_{i+2-j}
                 = \hat\phi_1 \hat{x}_{i+1|1:i} + \sum_{j=2}^{p} \hat\phi_j x_{i+2-j},

where the latter two sums equal zero if p = 1. If we denote \hat{x}_{j|1:i} = x_j for 1 ≤ j ≤ i, we have the general result for i ≥ p and h ≥ 1:

\hat{x}_{i+h|1:i} = \sum_{j=1}^{p} \hat\phi_j \hat{x}_{i+h-j|1:i}.

This just means that we calculate \hat{x}_{i+h|1:i} from the previous values and previous predictions via the AR(p) definition, ignoring the noise.

Remark 8.2. For any stationary AR(p), the predictors \hat{x}_{i+h|1:i} converge to \hat\mu as h increases (at an exponential rate).

14. If the prediction (\hat{x}_{i+1|1:i}, ..., \hat{x}_{i+h|1:i}) is considered simultaneously, then one could also consider the conditional covariance matrix.
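As a small illustration of this recursion (a sketch with a simulated AR(2); the data, seed and horizon are chosen purely for illustration), the h-step predictions can be computed directly from the fitted coefficients and compared with what R's predict returns:

set.seed(2)
x <- arima.sim(model=list(ar=c(0.5, 0.3)), n=200)
fit <- arima(x, order=c(2,0,0))
phi <- coef(fit)[1:2]; mu <- coef(fit)["intercept"]
h <- 10; n <- length(x)
xhat <- c(as.numeric(x) - mu, rep(NA, h))     # centred observations, then predictions
for (k in 1:h) {
  # AR(2) recursion over previous values/predictions, noise ignored
  xhat[n + k] <- phi[1] * xhat[n + k - 1] + phi[2] * xhat[n + k - 2]
}
cbind(manual = xhat[n + (1:h)] + mu,
      arima  = as.numeric(predict(fit, n.ahead = h)$pred))   # should essentially coincide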
Example 8.3. The variance of the prediction of an AR(1) satisfies, for i ≥ 2 and h ≥ 2,

v_{i+h|1:i} = Var(X_{i+h} | X_1 = x_1, ..., X_i = x_i)
            = \hat\sigma^2 + \hat\phi_1^2 Var(X_{i+h-1} | X_1 = x_1, ..., X_i = x_i)
            = ... = \hat\sigma^2 \sum_{k=0}^{h-1} (\hat\phi_1^2)^k.

We observe that v_{i+h|1:i} → \hat\sigma^2 / (1 - \hat\phi_1^2) as h increases. Any stationary AR(p) behaves similarly, that is, the variance of the predictor stabilises to the stationary variance (at an exponential rate).

8.2 General ARIMA

For a general ARIMA process, closed-form expressions are not available, but both the prediction and the variance of the prediction (under the Gaussian assumption) can be calculated numerically. In the case of a regular stationary ARMA (Condition 4.26), we could write, in principle,

X_i = \sum_{j=1}^{\infty} \beta_j X_{i-j} + W_i,

where the constants \beta_j converge to zero exponentially fast. Therefore, it comes as no surprise that the long-horizon predictions behave similarly as in the AR(p) case, that is, \hat{x}_{i+h|1:i} → \hat\mu and v_{i+h|1:i} → \gamma_0 (at an exponential rate).

When the model involves differencing, we can consider the model as a non-stationary ARMA, and do the prediction and calculate the variance of the prediction with the same numerical tools as for a stationary ARMA. However, the model is in this case non-stationary, and the predictive variances increase towards infinity as h increases.

Example 8.4. Prediction from ARIMA(1,0,0)(1,0,0)_{12} fitted to NY births data (from 1948) with a linear trend regressor.

h <- 48; t <- time(b); n <- length(b)
f1 <- arima(b, xreg=t, order=c(1,0,0),
            seasonal=list(order=c(1,0,0), period=12))
p1 <- predict(f1, h, newxreg=t[n]+(1:h)/12)
m1 <- p1$pred; s1 <- p1$se
ts.plot(b, m1, m1+1.96*s1, m1-1.96*s1, col=c(1,2,2,2), lty=c(1,1,2,2))
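The non-stationary comparison in the bottom panel of Figure 35 can be reproduced along the same lines (a sketch only; the exact specification behind the figure, e.g. whether the trend regressor was kept, is not shown in the notes). With differencing, the prediction standard errors keep growing with the horizon instead of levelling off:

f2 <- arima(b, order=c(1,1,0),
            seasonal=list(order=c(1,0,0), period=12))   # d = 1: non-stationary
p2 <- predict(f2, h)
m2 <- p2$pred; s2 <- p2$se
ts.plot(b, m2, m2+1.96*s2, m2-1.96*s2, col=c(1,2,2,2), lty=c(1,1,2,2))
ts.plot(s1, s2, lty=c(1,2))   # s1 stabilises, s2 keeps increasing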
Figure 35: Predictions of the ARIMA(1,0,0)(1,0,0)_{12} of Example 8.4 (top) and a similarly fitted non-stationary ARIMA(1,1,0)(1,0,0)_{12} (bottom).

9 Spectrum of a stationary process

The periodogram was a transform calculated from a finite-length vector. We next consider a slightly more abstract concept, the spectrum of a stationary process. There are two key differences: the process (X_i)_{i \in \mathbb{Z}} is of infinite length, and the process can take multiple realisations.

9.1 Spectral density

The spectral density of a stationary process is, in fact, a discrete-time Fourier transform (DTFT) of the autocovariance function. It is analogous to the DFT, but with an infinite number of frequencies.

Definition 9.1 (Spectral density). Suppose that (X_t) is a stationary process with autocovariance sequence satisfying

\sum_{k=0}^{\infty} |\gamma_k| < \infty.

The spectral density of the process (X_t) (or equivalently of the autocovariance (\gamma_k)) is the function

f(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma_k e^{-ik\lambda},    for \lambda \in (-\pi, \pi].
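To make the definition concrete, here is a small numerical sketch (not from the notes; the AR(1) parameters and truncation point are illustrative choices). Because \gamma_k decays geometrically for an AR(1), a truncated version of the sum above already matches the standard closed-form spectral density \sigma^2 / (2\pi (1 - 2\phi\cos\lambda + \phi^2)) very closely:

phi <- 0.7; sigma2 <- 1                       # illustrative AR(1) parameters
K <- 200                                      # truncation point for the sum
gamma <- sigma2 * phi^(0:K) / (1 - phi^2)     # gamma_k = sigma^2 phi^|k| / (1 - phi^2)
lambda <- seq(-pi, pi, length.out = 401)
# truncated DTFT: (gamma_0 + 2 * sum_{k=1}^{K} gamma_k cos(k lambda)) / (2 pi)
f_approx <- sapply(lambda, function(l) (gamma[1] + 2 * sum(gamma[-1] * cos((1:K) * l))) / (2 * pi))
f_exact  <- sigma2 / (2 * pi * (1 - 2 * phi * cos(lambda) + phi^2))
max(abs(f_approx - f_exact))                  # truncation error is tiny
plot(lambda, f_exact, type = "l"); lines(lambda, f_approx, lty = 2)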