STA 6857: ARIMA and SARIMA Models (§3.8 and 3.9)
Arthur Berg

Outline
1 Building ARIMA Models
2 SARIMA
3 Homework 4c

Return Rate
Suppose $x_t$ is the value of an investment at time $t$ and $p_t$ is the percentage change from $t-1$ to $t$ (which may be negative). Then we can write
$$x_t = (1 + p_t)\,x_{t-1}.$$
Taking logs produces
$$\log(x_t) = \log(1 + p_t) + \log(x_{t-1}),$$
or equivalently
$$\nabla \log(x_t) = \log(1 + p_t) \approx p_t,$$
where the approximation holds when $p_t$ is close to zero. Another representation of $\nabla \log(x_t)$ is
$$\nabla \log(x_t) = \log(x_t) - \log(x_{t-1}) = \log\left(\frac{x_t}{x_{t-1}}\right).$$

US Gross National Product
We consider the seasonally adjusted quarterly US GNP from 1947(1) to 2002(3), giving a total of n = 223 observations.
Source: http://research.stlouisfed.org/ (Economic Data FRED® → Gross Domestic Product (GDP) and Components → GDP/GNP → GNP)
> gnp96 = read.table("mydata/gnp96.dat")
> gnp = ts(gnp96[,2], start=1947, frequency=4)
> plot(gnp, lwd=3)
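The Return Rate identity can be checked numerically. The sketch below uses made-up percentage changes and a made-up starting value (`p` and `x0` are illustrative assumptions, not GNP data) to show that the differenced logs equal log(1 + p_t) exactly and are close to p_t when p_t is near zero.

```r
# Hypothetical percentage changes and starting value (illustrative only)
p  <- c(0.01, -0.02, 0.005, 0.03, -0.01)
x0 <- 100

# Build the series from x_t = (1 + p_t) x_{t-1}
x <- x0 * cumprod(1 + p)

# Differenced logs of the series, including the starting value
dlog <- diff(log(c(x0, x)))

# Exact identity: differenced log equals log(1 + p_t) up to rounding
max(abs(dlog - log(1 + p)))

# Approximation: differenced log is close to p_t when p_t is near zero
max(abs(dlog - p))
```

The approximation error grows with the size of p_t, which is why the growth-rate interpretation is most accurate for small period-to-period changes.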
US Gross National Product (cont)
Just for kicks, let's look at the ACF.
> acf(gnp, 50)
Simple differencing may not be the answer.
> plot(diff(gnp))

Percentage Quarterly Growth of US GNP
Instead, we consider the growth rate $x_t = \nabla \log(y_t)$, where $y_t$ is the GNP series.
> gnpgr = diff(log(gnp)) # growth rate
> plot.ts(gnpgr)

Modeling Percentage Quarterly Growth of US GNP
The plots of the ACF and PACF of the GNP growth rate indicate two potential models for the log GNP series:
- ARIMA(0,1,2)
- ARIMA(1,1,0)
We fit an AR(1) to the growth rate $\nabla \log(\text{gnp})$.
> (gnpgr.ar = arima(gnpgr, order = c(1, 0, 0)))
arima(x = gnpgr, order = c(1, 0, 0))
         ar1  intercept
      0.3467     0.0083
s.e.  0.0627     0.0010
sigma^2 estimated as 9.03e-05: log likelihood = 718.61, aic = -1431.22

Modeling in R
R says "intercept" but means "mean". Therefore the fitted model is
$$x_t - .0083 = .347\,(x_{t-1} - .0083) + w_t,$$
or equivalently
$$x_t = .005 + .347\,x_{t-1} + w_t,$$
i.e. if $\alpha$ is the intercept and $\mu$ is the mean, then $\alpha = \mu(1 - \phi)$.
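The mean-versus-intercept distinction can be illustrated on simulated data. This is a minimal sketch: the series `x` is simulated with `arima.sim` as a stand-in for `gnpgr` (an assumption made so the example runs without the data file), with parameters set to the fitted values quoted on the slide.

```r
set.seed(1)
mu  <- 0.0083   # mean reported by R as "intercept" on the slide
phi <- 0.347    # AR(1) coefficient from the slide

# Simulated stand-in for the growth-rate series (not the real GNP data)
x <- mu + arima.sim(list(ar = phi), n = 500, sd = 0.0095)

fit     <- arima(x, order = c(1, 0, 0))
mu.hat  <- unname(coef(fit)["intercept"])  # R's "intercept" estimates the mean
phi.hat <- unname(coef(fit)["ar1"])

# Constant of the regression form x_t = alpha + phi x_{t-1} + w_t
alpha.hat <- mu.hat * (1 - phi.hat)
c(mean = mu.hat, alpha = alpha.hat, sample.mean = mean(x))

# Same conversion applied to the slide's estimates
mu * (1 - phi)
```

The value labelled "intercept" tracks the sample mean of the series, while the regression-style constant is smaller by the factor (1 - phi).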
Modeling in R
From the expression $\alpha = \mu(1 - \phi)$, we see $\sigma_\alpha = \sigma_\mu (1 - \phi)$. Therefore we can write down the fitted model, which incorporates the standard errors of the estimators:
$$x_t = .005_{(.0006)} + .347_{(.063)}\,x_{t-1} + w_t$$
and $\hat{\sigma} = \sqrt{9.03 \times 10^{-5}} \approx .0095$.
Also, R has an issue with the "I" part of ARIMA fits when there is an AR component, so first difference the data and then fit an ARMA model.

Modeling Percentage Quarterly Growth of US GNP
We fit an MA(2) to the growth rate $\nabla \log(\text{gnp})$.
> (gnpgr.ma = arima(gnpgr, order = c(0, 0, 2)))
arima(x = gnpgr, order = c(0, 0, 2))
         ma1     ma2  intercept
      0.3028  0.2035     0.0083
s.e.  0.0654  0.0644     0.0010
sigma^2 estimated as 8.92e-05: log likelihood = 719.96, aic = -1431.93
The R output indicates the model
$$x_t = .0083_{(.001)} + .303_{(.065)}\,w_{t-1} + .204_{(.064)}\,w_{t-2} + w_t$$
with $\hat{\sigma} = .0094$.

The Two Models Aren't That Different
The first 10 terms of the MA($\infty$) representation of the AR(1) model are computed in R as
> ARMAtoMA(ar=.35, ma=0, 10) # prints psi-weights
[1] 3.500000e-01 1.225000e-01 4.287500e-02 1.500625e-02
[5] 5.252187e-03 1.838266e-03 6.433930e-04 2.251875e-04
[9] 7.881564e-05 2.758547e-05
So one (rather crude) approximation to the model
$$x_t = .35\,x_{t-1} + w_t$$
is
$$x_t = .35\,w_{t-1} + .12\,w_{t-2} + w_t,$$
which is close to the fitted MA(2) model
$$x_t = .0083_{(.001)} + .303_{(.065)}\,w_{t-1} + .204_{(.064)}\,w_{t-2} + w_t.$$

Diagnostic Checking
Investigate the residuals $x_t - \hat{x}_t^{t-1}$ or the standardized residuals
$$e_t = \frac{x_t - \hat{x}_t^{t-1}}{\sqrt{\hat{P}_t^{t-1}}}.$$
If the model fits well, the standardized residuals should behave like an iid sequence with mean zero and variance one.
Diagnostic Checks
- Check the plot of standardized residuals for patterns and outliers.
- Check the ACF of the residuals, $\hat{\rho}_e$, for significant lags.
- Use the Ljung-Box-Pierce Q-statistic to measure collective autocorrelation (not just significance at a single lag).
The Ljung-Box-Pierce Q-statistic is given as
$$Q = n(n + 2) \sum_{h=1}^{H} \frac{\hat{\rho}_e^2(h)}{n - h}.$$
Under the null hypothesis of model adequacy, Q has the asymptotic distribution $Q \sim \chi^2_{H-p-q}$.
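The Q-statistic can be computed directly from the residual ACF and compared with R's built-in `Box.test`. This is a sketch on simulated data: the MA(2) series below is a stand-in for `gnpgr` (an assumption so the example runs without the data file), and H = 20 is an arbitrary choice of maximum lag.

```r
set.seed(42)
# Simulated MA(2) series standing in for the GNP growth rate
x   <- arima.sim(list(ma = c(0.3, 0.2)), n = 222)
fit <- arima(x, order = c(0, 0, 2))
e   <- residuals(fit)

n <- length(e)
H <- 20                                            # maximum lag (arbitrary choice)
rho <- acf(e, lag.max = H, plot = FALSE)$acf[-1]   # sample ACF at lags 1..H

# Q = n (n + 2) * sum_{h=1}^{H} rho(h)^2 / (n - h)
Q <- n * (n + 2) * sum(rho^2 / (n - 1:H))

# Built-in version; fitdf = p + q removes the fitted ARMA parameters from the df
bt <- Box.test(e, lag = H, type = "Ljung-Box", fitdf = 2)
c(by.hand = Q, Box.test = unname(bt$statistic))

# p-value from the chi-squared distribution with H - p - q degrees of freedom
pchisq(Q, df = H - 2, lower.tail = FALSE)
```

A large p-value is consistent with the residuals being uncorrelated up to lag H; a small one suggests the model leaves structure unexplained.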
Diagnostic Checking of gnpgr.ma
> tsdiag(gnpgr.ma, gof.lag=20)

Testing Normality and Outliers
The following approaches are useful in testing normality and identifying outliers:
- histogram of the residuals
- QQ-plot of the residuals
- Shapiro-Wilk test of normality
> hist(gnpgr.ma$resid, br=12)

Testing Normality and Outliers in gnpgr.ma
> qqnorm(gnpgr.ma$resid)
> qqline(gnpgr.ma$resid, col = 2)
> shapiro.test(gnpgr.ma$resid)
Shapiro-Wilk normality test
data: gnpgr.ma$resid
W = 0.9803, p-value = 0.003416

Return of the Varve Series
Recall that we argued for considering log(varve) instead of varve directly. Let's take a closer look at log(varve).
> varve = scan("mydata/varve.dat")
> varve2 = diff(log(varve))
> ts.plot(varve2)
> acf(varve2, lwd=5)
> pacf(varve2, lwd=5)
Diagnostics of ARIMA(0,1,1) on Logged Varve Data
> (varve.ma = arima(log(varve), order = c(0, 1, 1)))
arima(x = log(varve), order = c(0, 1, 1))
          ma1
      -0.7705
s.e.   0.0341
sigma^2 estimated as 0.2353: log likelihood = -440.72, aic = 885.44
> tsdiag(varve.ma)

Fitting ARIMA(1,1,1) to Logged Varve Data
> pacf(varve.ma$resid, lwd=5)
> (varve.arma = arima(log(varve), order = c(1, 1, 1)))
arima(x = log(varve), order = c(1, 1, 1))
         ar1      ma1
      0.2330  -0.8858
s.e.  0.0518   0.0292
sigma^2 estimated as 0.2284: log likelihood = -431.44, aic = 868.88
Watch Out for Overfitting!

Model Selection in US GNP Series
n = length(gnpgr)              # sample size
kma = length(gnpgr.ma$coef)    # number of parameters in ma model
sma = gnpgr.ma$sigma2          # mle of sigma^2
kar = length(gnpgr.ar$coef)    # number of parameters in ar model
sar = gnpgr.ar$sigma2          # mle of sigma^2
# AIC                                Returned Value
log(sma) + (n+2*kma)/n         # MA2    -8.298
log(sar) + (n+2*kar)/n         # AR1    -8.294
# AICc
log(sma) + (n+kma)/(n-kma-2)   # MA2    -8.288
log(sar) + (n+kar)/(n-kar-2)   # AR1    -8.285
# BIC
log(sma) + kma*log(n)/n        # MA2    -9.252
log(sar) + kar*log(n)/n        # AR1    -9.264

Seasonal ARMA(P, Q)
Seasonal ARMA(P, Q) is used when seasonal (hence nonstationary) behavior is present in the time series. We use the model
$$\Phi_P(B^s)\,x_t = \Theta_Q(B^s)\,w_t,$$
where s = 12 if the data are monthly, s = 4 if the data are quarterly, etc.
Seasonal differencing may be in order if the seasonal component follows a random walk, as in
$$S_t = S_{t-12} + v_t.$$
The seasonal difference of order D is defined as
$$\nabla_s^D x_t = (1 - B^s)^D x_t.$$

SARIMA Model
Definition (SARIMA Model)
The seasonal autoregressive integrated moving average model of Box and Jenkins (1970) is given by
$$\Phi_P(B^s)\,\phi(B)\,\nabla_s^D \nabla^d x_t = \alpha + \Theta_Q(B^s)\,\theta(B)\,w_t$$
and is denoted as an ARIMA$(p, d, q) \times (P, D, Q)_s$.
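To make the operator notation concrete, here is a worked expansion of one simple case, the ARIMA(0,1,1) × (0,1,1)_12 model with α = 0. The sign convention θ(B) = 1 + θB and Θ(B^12) = 1 + ΘB^12 is an assumption of this example (it matches the convention R's arima() uses for moving-average terms); the slides do not state it explicitly.

```latex
% ARIMA(0,1,1) x (0,1,1)_{12} with alpha = 0: d = D = 1, s = 12, p = P = 0, q = Q = 1
\begin{align*}
\nabla_{12}\nabla x_t &= \Theta(B^{12})\,\theta(B)\,w_t \\
(1 - B^{12})(1 - B)\,x_t &= (1 + \Theta B^{12})(1 + \theta B)\,w_t \\
(1 - B - B^{12} + B^{13})\,x_t &= (1 + \theta B + \Theta B^{12} + \theta\Theta B^{13})\,w_t \\
x_t &= x_{t-1} + x_{t-12} - x_{t-13}
      + w_t + \theta w_{t-1} + \Theta w_{t-12} + \theta\Theta w_{t-13}
\end{align*}
```

The multiplicative structure shows up in the lag-13 terms: their coefficients are products of the nonseasonal and seasonal parts rather than free parameters.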
Federal Reserve Board Production Index
> prod = ts(scan("mydata/prod.dat"), start=1948, frequency=12)
> ts.plot(prod)
> par(mfrow=c(2,1))
> acf(prod, 48)
> pacf(prod, 48)

Federal Reserve Board Production Index
par(mfrow=c(2,1)) # (P)ACF of d1 data
acf(diff(prod), 48)
pacf(diff(prod), 48)
par(mfrow=c(2,1)) # (P)ACF of d1-d12 data
acf(diff(diff(prod),12), 48)
pacf(diff(diff(prod),12), 48)

Federal Reserve Board Production Index
> prod.fit3 = arima(prod, order=c(1,1,1),
+   seasonal=list(order=c(2,1,1), period=12))
> prod.fit3 # to view the results
arima(x = prod, order = c(1, 1, 1), seasonal = list(order = c(2, 1, 1), period = 12))
         ar1      ma1     sar1     sar2     sma1
      0.5753  -0.2709  -0.2153  -0.2800  -0.4968
s.e.  0.1120   0.1300   0.0784   0.0619   0.0712
sigma^2 estimated as 1.351: log likelihood = -568.22, aic = 1148.43
> tsdiag(prod.fit3, gof.lag=48) # diagnostics
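The coefficient table for prod.fit3 can be translated into operator form. The equation below is a reading of that output, assuming R's arima() parameterization in which AR polynomials are written 1 − φB − … and MA polynomials are written 1 + θB + …; a negative sar coefficient therefore appears with a plus sign on the left-hand side, and a negative ma coefficient with a minus sign on the right.

```latex
% ARIMA(1,1,1) x (2,1,1)_{12} fit to the production index, coefficients from prod.fit3
\[
(1 - 0.5753B)\,(1 + 0.2153B^{12} + 0.2800B^{24})\,\nabla_{12}\nabla x_t
  = (1 - 0.2709B)\,(1 - 0.4968B^{12})\,w_t,
\qquad \hat{\sigma}_w^2 = 1.351
\]
```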
Federal Reserve Board Production Index
> prod.pr = predict(prod.fit3, n.ahead=12)
> U = prod.pr$pred + 2*prod.pr$se   # upper prediction limit
> L = prod.pr$pred - 2*prod.pr$se   # lower prediction limit
> ts.plot(prod, prod.pr$pred, col=1:2, type="o", ylim=c(105,175), xlim=c(1975,1980))
> lines(U, col="blue", lty="dashed")
> lines(L, col="blue", lty="dashed")

Textbook Reading
Read the following sections from the textbook:
- §4.1 (Introduction to Spectral Analysis)
- §4.2 (Cyclical Behavior and Periodicity)

Textbook Problems
Do the following exercises from the textbook: 3.31 3.35