STA 6857 ARIMA and SARIMA Models (§3.8 and §3.9)

Outline
1 Building ARIMA Models
2 SARIMA
3 Homework 4c

Arthur Berg

Return Rate
Suppose $x_t$ is the value of an investment at time $t$ and $p_t$ is the percentage change from $t-1$ to $t$ (which may be negative). Then we can write
$$x_t = (1 + p_t)\, x_{t-1}.$$
Taking logs produces equivalently
$$\log(x_t) = \log(1 + p_t) + \log(x_{t-1})$$
$$\nabla \log(x_t) = \log(1 + p_t) \approx p_t$$
where the approximation holds when $p_t$ is close to zero. Another representation of $\nabla \log(x_t)$ is
$$\nabla \log(x_t) = \log(x_t) - \log(x_{t-1}) = \log\left(\frac{x_t}{x_{t-1}}\right).$$
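
A minimal R sketch of the approximation above, using simulated returns (the values of `p` and the starting value 100 are assumptions for illustration, not course data). It checks that the differenced logs equal log(1 + p_t) exactly and are close to p_t when p_t is small.

```r
# Simulate small percentage changes and build the investment value path.
set.seed(1)
p <- rnorm(100, mean = 0.005, sd = 0.01)   # hypothetical returns p_t
x <- 100 * cumprod(1 + p)                  # x_t = (1 + p_t) x_{t-1}, x_0 = 100

# Differenced logs: log(x_t) - log(x_{t-1}) = log(1 + p_t) for t >= 2
growth <- diff(log(x))

# Gap to the exact identity and to the first-order approximation p_t
max_identity_gap <- max(abs(growth - log(1 + p[-1])))
max_approx_gap   <- max(abs(growth - p[-1]))
print(c(identity = max_identity_gap, approximation = max_approx_gap))
```

The approximation error is of order p_t^2 / 2, so it grows quickly once returns are no longer small.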

US Gross National Product
We consider the seasonally adjusted quarterly US GNP from 1947(1) to 2002(3), giving a total of n = 223 observations.
http://research.stlouisfed.org/ (Economic Data FRED® > Gross Domestic Product (GDP) and Components > GDP/GNP > GNP)
> gnp96 = read.table("mydata/gnp96.dat")
> gnp = ts(gnp96[,2], start=1947, frequency=4)
> plot(gnp, lwd=3)

US Gross National Product (cont)
Just for kicks, let's look at the ACF.
> acf(gnp, 50)
Simple differencing may not be the answer.
> plot(diff(gnp))

Percentage Quarterly Growth of US GNP
Instead, we consider the growth rate $x_t = \nabla \log(y_t)$, where $y_t$ is the GNP series.
> gnpgr = diff(log(gnp)) # growth rate
> plot.ts(gnpgr)

Modeling Percentage Quarterly Growth of US GNP
The plots of the ACF and PACF of the GNP growth rate indicate two potential models for the log GNP series:
- ARIMA(0,1,2)
- ARIMA(1,1,0)
We fit an AR(1) to the growth rate $\nabla \log(\text{gnp})$.
> (gnpgr.ar = arima(gnpgr, order = c(1, 0, 0)))
Call:
arima(x = gnpgr, order = c(1, 0, 0))
Coefficients:
         ar1  intercept
      0.3467     0.0083
s.e.  0.0627     0.0010
sigma^2 estimated as 9.03e-05:  log likelihood = 718.61,  aic = -1431.22

Modeling in R
R says "intercept" but means "mean". Therefore the fitted model is
$$x_t - .0083 = .347\,(x_{t-1} - .0083) + w_t$$
or equivalently
$$x_t = .005 + .347\, x_{t-1} + w_t,$$
i.e. if $\alpha$ is the intercept and $\mu$ is the mean, then $\alpha = \mu(1 - \phi)$.
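
A short sketch on simulated data illustrating that the `intercept` reported by `arima()` estimates the mean $\mu$, and how to recover $\alpha = \mu(1 - \phi)$. The values `mu = 5` and `phi = 0.5` are assumptions chosen for illustration.

```r
set.seed(42)
mu  <- 5      # hypothetical true mean
phi <- 0.5    # hypothetical AR(1) coefficient
x <- mu + arima.sim(model = list(ar = phi), n = 2000)

fit <- arima(x, order = c(1, 0, 0))
mu_hat  <- unname(coef(fit)["intercept"])  # R's "intercept" estimates the mean
phi_hat <- unname(coef(fit)["ar1"])
alpha_hat <- mu_hat * (1 - phi_hat)        # regression-style intercept

print(c(mean = mu_hat, sample_mean = mean(x), alpha = alpha_hat))
```

The reported "intercept" should sit close to the sample mean (about 5 here), while the regression-style intercept is roughly half of it.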

Modeling in R
From the expression $\alpha = \mu(1 - \phi)$, we see (approximately, treating $\phi$ as fixed) that $\sigma_\alpha = \sigma_\mu (1 - \phi)$. Therefore we can write down the fitted model, which incorporates the standard errors of the estimators,
$$x_t = .005_{(.0006)} + .347_{(.063)}\, x_{t-1} + w_t$$
and $\hat\sigma = \sqrt{9.03 \times 10^{-5}} \approx .0095$.
Also, R's arima() has an issue with the I part of ARIMA fits: when d > 0 it does not estimate a constant term. So first difference the data, then fit an ARMA model.
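
The advice to difference first can be illustrated with a small sketch on simulated data (the drift 0.01, the AR coefficient 0.35, and the noise scale are assumptions for illustration). Fitting with d = 1 inside `arima()` yields no mean term, whereas differencing by hand and fitting an ARMA model does.

```r
set.seed(7)
# Hypothetical growth series with nonzero mean 0.01, then integrate it.
g <- 0.01 + arima.sim(model = list(ar = 0.35), n = 300, sd = 0.01)
y <- cumsum(as.numeric(g))           # "log level" series with drift

fit_d1   <- arima(y, order = c(1, 1, 0))        # differencing inside arima()
fit_diff <- arima(diff(y), order = c(1, 0, 0))  # difference first, then ARMA

names(coef(fit_d1))    # only ar1: no constant is estimated when d > 0
names(coef(fit_diff))  # ar1 and intercept (the mean of the differences)
```

Without the mean term, the AR coefficient in `fit_d1` has to absorb the drift, so the two fits generally give different `ar1` estimates.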

Modeling Percentage Quarterly Growth of US GNP
We fit an MA(2) to the growth rate $\nabla \log(\text{gnp})$.
> (gnpgr.ma = arima(gnpgr, order = c(0, 0, 2)))
Call:
arima(x = gnpgr, order = c(0, 0, 2))
Coefficients:
         ma1     ma2  intercept
      0.3028  0.2035     0.0083
s.e.  0.0654  0.0644     0.0010
sigma^2 estimated as 8.92e-05:  log likelihood = 719.96,  aic = -1431.93
The R output indicates the model
$$x_t = .0083_{(.001)} + .303_{(.065)}\, w_{t-1} + .204_{(.064)}\, w_{t-2} + w_t$$
with $\hat\sigma = .0094$.

The Two Models Aren't That Different
The first 10 terms of the MA(∞) representation of the AR(1) model are computed in R as
> ARMAtoMA(ar=.35, ma=0, 10) # prints psi-weights
 [1] 3.500000e-01 1.225000e-01 4.287500e-02 1.500625e-02
 [5] 5.252187e-03 1.838266e-03 6.433930e-04 2.251875e-04
 [9] 7.881564e-05 2.758547e-05
So one (rather crude) approximation to the model $x_t = .35\, x_{t-1} + w_t$ is
$$x_t = .35\, w_{t-1} + .12\, w_{t-2} + w_t$$
which is close to the fitted MA(2) model
$$x_t = .0083_{(.001)} + .303_{(.065)}\, w_{t-1} + .204_{(.064)}\, w_{t-2} + w_t.$$
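
For an AR(1) model the psi-weights have the closed form $\psi_j = \phi^j$. A quick sketch checking this against `ARMAtoMA()` (the variable names are illustrative):

```r
phi <- 0.35
psi <- ARMAtoMA(ar = phi, ma = 0, lag.max = 10)  # psi_1, ..., psi_10
all.equal(psi, phi^(1:10))                        # AR(1): psi_j = phi^j
round(psi[1:2], 2)                                # leading weights: 0.35 0.12
```

Truncating after the first two weights gives the crude MA(2) approximation; the dropped weights start at about .043 and decay geometrically.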

Diagnostic Checking
Investigate the residuals $x_t - \hat{x}_t^{t-1}$ or the standardized residuals
$$e_t = \frac{x_t - \hat{x}_t^{t-1}}{\sqrt{\hat{P}_t^{t-1}}}.$$
If the model fits well, the standardized residuals should behave like an iid sequence with mean zero and variance one.
Diagnostic Checks
- Check the plot of the standardized residuals for patterns and outliers.
- Check the ACF, $\hat\rho_e(h)$, for significant lags.
- Use the Ljung-Box-Pierce Q-statistic to measure collective autocorrelation (not just significance at a single lag).
The Ljung-Box-Pierce Q-statistic is given as
$$Q = n(n + 2) \sum_{h=1}^{H} \frac{\hat\rho_e^2(h)}{n - h}.$$
Under the null of model adequacy, Q has the asymptotic distribution $Q \sim \chi^2_{H-p-q}$.
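
The Q-statistic can be computed directly from the residual ACF. The sketch below does so on simulated MA(2) data (the data and the choice H = 20 are assumptions for illustration) and compares the result with `Box.test()`, whose `fitdf` argument supplies the p + q degrees-of-freedom correction.

```r
set.seed(123)
x   <- arima.sim(model = list(ma = c(0.3, 0.2)), n = 222)  # hypothetical MA(2) data
fit <- arima(x, order = c(0, 0, 2))
e   <- residuals(fit)

H <- 20; p <- 0; q <- 2
n <- length(e)
rho <- acf(e, lag.max = H, plot = FALSE)$acf[-1]   # rho_e(1), ..., rho_e(H)
Q <- n * (n + 2) * sum(rho^2 / (n - 1:H))
p_value <- pchisq(Q, df = H - p - q, lower.tail = FALSE)

bt <- Box.test(e, lag = H, type = "Ljung-Box", fitdf = p + q)
print(c(Q_manual = Q, Q_Box.test = unname(bt$statistic), p_value = p_value))
```

A large p-value is consistent with model adequacy; a small one suggests leftover autocorrelation in the residuals.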

Diagnostic Checking of gnpgr.ma
> tsdiag(gnpgr.ma, gof.lag=20)

Testing Normality and Outliers
The following approaches are useful in testing normality and identifying outliers:
- histogram of the residuals
- QQ-plot of the residuals
- Shapiro-Wilk test of normality
> hist(gnpgr.ma$resid, br=12)

Testing Normality and Outliers in gnpgr.ma
> qqnorm(gnpgr.ma$resid)
> qqline(gnpgr.ma$resid, col = 2)
> shapiro.test(gnpgr.ma$resid)
        Shapiro-Wilk normality test
data:  gnpgr.ma$resid
W = 0.9803, p-value = 0.003416
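
To get a feel for how the Shapiro-Wilk test reacts to non-normal residuals, here is a small sketch on simulated data (both series are hypothetical stand-ins for model residuals, and the 3-standard-deviation cutoff is an assumed rule of thumb rather than something stated on the slides).

```r
set.seed(2024)
e_norm  <- rnorm(222)            # hypothetical well-behaved residuals
e_heavy <- rt(222, df = 3)       # hypothetical heavy-tailed residuals

sw_norm  <- shapiro.test(e_norm)
sw_heavy <- shapiro.test(e_heavy)
print(c(normal = sw_norm$p.value, heavy_tailed = sw_heavy$p.value))

# Flag standardized values beyond 3 standard deviations as outlier candidates
z <- (e_heavy - mean(e_heavy)) / sd(e_heavy)
which(abs(z) > 3)
```

A small p-value, as in the gnpgr.ma output above, points to a departure from normality, often driven by a few extreme residuals.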

Return of the Varve Series
Recall we argued for considering log(varve) instead of varve directly. Let's take a closer look at log(varve).
> varve = scan("mydata/varve.dat")
> varve2 = diff(log(varve))
> ts.plot(varve2)
> acf(varve2, lwd=5)
> pacf(varve2, lwd=5)

Diagnostics of ARIMA(0,1,1) on Logged Varve Data
> (varve.ma = arima(log(varve), order = c(0, 1, 1)))
Call:
arima(x = log(varve), order = c(0, 1, 1))
Coefficients:
          ma1
      -0.7705
s.e.   0.0341
sigma^2 estimated as 0.2353:  log likelihood = -440.72,  aic = 885.44
> tsdiag(varve.ma)

Fitting ARIMA(1,1,1) to Logged Varve Data
> pacf(varve.ma$resid, lwd=5)
> (varve.arma = arima(log(varve), order = c(1, 1, 1)))
Call:
arima(x = log(varve), order = c(1, 1, 1))
Coefficients:
         ar1      ma1
      0.2330  -0.8858
s.e.  0.0518   0.0292
sigma^2 estimated as 0.2284:  log likelihood = -431.44,  aic = 868.88

Watch Out for Overfitting!

Model Selection in US GNP Series
n = length(gnpgr)               # sample size
kma = length(gnpgr.ma$coef)     # number of parameters in ma model
sma = gnpgr.ma$sigma2           # mle of sigma^2
kar = length(gnpgr.ar$coef)     # number of parameters in ar model
sar = gnpgr.ar$sigma2           # mle of sigma^2
# AIC                             Returned Value
log(sma) + (n+2*kma)/n          # MA2  -8.298
log(sar) + (n+2*kar)/n          # AR1  -8.294
# AICc
log(sma) + (n+kma)/(n-kma-2)    # MA2  -8.288
log(sar) + (n+kar)/(n-kar-2)    # AR1  -8.285
# BIC
log(sma) + kma*log(n)/n         # MA2  -9.252
log(sar) + kar*log(n)/n         # AR1  -9.264
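
The three criteria above can be wrapped in a small helper. This is a sketch only: `info_criteria` is a hypothetical function name, and the data are simulated (the mean, AR coefficient, and noise scale are assumptions loosely modeled on the GNP growth fit).

```r
# Criteria in the per-observation form used on the slide:
#   AIC  = log(sigma2) + (n + 2k)/n
#   AICc = log(sigma2) + (n + k)/(n - k - 2)
#   BIC  = log(sigma2) + k log(n)/n
info_criteria <- function(fit, n) {
  k  <- length(fit$coef)   # number of estimated coefficients
  s2 <- fit$sigma2         # MLE of the innovation variance
  c(AIC  = log(s2) + (n + 2 * k) / n,
    AICc = log(s2) + (n + k) / (n - k - 2),
    BIC  = log(s2) + k * log(n) / n)
}

set.seed(99)
x <- 0.008 + arima.sim(model = list(ar = 0.35), n = 222, sd = 0.0095)
fits <- list(AR1 = arima(x, order = c(1, 0, 0)),
             MA2 = arima(x, order = c(0, 0, 2)))
tab <- sapply(fits, info_criteria, n = length(x))
print(round(tab, 3))
```

Smaller values are preferred. In the slide's output AIC and AICc slightly favor the MA(2) while BIC favors the AR(1), reflecting BIC's heavier penalty on the extra parameter.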

Seasonal ARMA(P, Q)
Seasonal ARMA(P, Q) is used when seasonal (hence nonstationary) behavior is present in the time series. We use the model
$$\Phi_P(B^s)\, x_t = \Theta_Q(B^s)\, w_t$$
where s = 12 if the data are monthly, s = 4 if the data are quarterly, etc.
Seasonal differencing may be in order if the seasonal component follows a random walk, as in
$$S_t = S_{t-12} + v_t.$$
The seasonal difference of order D is defined as
$$\nabla_s^D x_t = (1 - B^s)^D x_t.$$
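
A minimal sketch of seasonal differencing in R on a simulated seasonal random walk (the series length 120 and the zero starting values are assumptions). With D = 1 and s = 12, `diff(S, lag = 12)` computes $S_t - S_{t-12}$ and recovers the noise $v_t$.

```r
set.seed(5)
s <- 12
v <- rnorm(120)
# Seasonal random walk S_t = S_{t-12} + v_t, started from zeros
S <- numeric(120)
for (t in 1:120) S[t] <- (if (t > s) S[t - s] else 0) + v[t]
S <- ts(S, frequency = s)

d12 <- diff(S, lag = s)          # (1 - B^12) S_t = S_t - S_{t-12}
manual <- S[(s + 1):120] - S[1:(120 - s)]
all.equal(as.numeric(d12), manual)
all.equal(as.numeric(d12), v[(s + 1):120])   # recovers the white noise v_t
```

Higher seasonal orders use the `differences` argument, e.g. `diff(S, lag = 12, differences = 2)` for D = 2.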

SARIMA Model
Definition (SARIMA Model)
The seasonal autoregressive integrated moving average model of Box and Jenkins (1970) is given by
$$\Phi_P(B^s)\, \phi(B)\, \nabla_s^D \nabla^d x_t = \alpha + \Theta_Q(B^s)\, \theta(B)\, w_t$$
and is denoted as an ARIMA(p, d, q) × (P, D, Q)$_s$.
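
As a worked illustration of the definition, consider the simple case ARIMA(0,1,1) × (0,1,1)$_{12}$ with $\alpha = 0$. The sign convention $\theta(B) = 1 + \theta B$ and $\Theta_Q(B^{12}) = 1 + \Theta B^{12}$ is an assumption here (it matches the convention R uses for MA coefficients); the slides do not state it.

```latex
\begin{align*}
(1 - B^{12})(1 - B)\, x_t &= (1 + \Theta B^{12})(1 + \theta B)\, w_t \\
(1 - B - B^{12} + B^{13})\, x_t &= (1 + \theta B + \Theta B^{12} + \theta\Theta B^{13})\, w_t \\
x_t &= x_{t-1} + x_{t-12} - x_{t-13}
      + w_t + \theta w_{t-1} + \Theta w_{t-12} + \theta\Theta w_{t-13}
\end{align*}
```

The multiplicative form therefore implies a coefficient $\theta\Theta$ at lag 13 without introducing an extra parameter.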

Federal Reserve Board Production Index
> prod = ts(scan("mydata/prod.dat"), start=1948, frequency=12)
> ts.plot(prod)
> par(mfrow=c(2,1))
> acf(prod, 48)
> pacf(prod, 48)

Federal Reserve Board Production Index
par(mfrow=c(2,1)) # (P)ACF of d1 data
acf(diff(prod), 48)
pacf(diff(prod), 48)
par(mfrow=c(2,1)) # (P)ACF of d1-d12 data
acf(diff(diff(prod),12), 48)
pacf(diff(diff(prod),12), 48)

Federal Reserve Board Production Index
> prod.fit3 = arima(prod, order=c(1,1,1),
+   seasonal=list(order=c(2,1,1), period=12))
> prod.fit3 # to view the results
Call:
arima(x = prod, order = c(1, 1, 1), seasonal = list(order = c(2, 1,
Coefficients:
         ar1      ma1     sar1     sar2     sma1
      0.5753  -0.2709  -0.2153  -0.2800  -0.4968
s.e.  0.1120   0.1300   0.0784   0.0619   0.0712
sigma^2 estimated as 1.351:  log likelihood = -568.22,  aic = 1148.
> tsdiag(prod.fit3, gof.lag=48) # diagnostics

Federal Reserve Board Production Index
> prod.pr = predict(prod.fit3, n.ahead=12)
> U = prod.pr$pred + 2*prod.pr$se
> L = prod.pr$pred - 2*prod.pr$se
> ts.plot(prod,prod.pr$pred, col=1:2, type="o", ylim=c(105,175), xl
> lines(U, col="blue", lty="dashed")
> lines(L, col="blue", lty="dashed")
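
The same forecasting steps can be tried without the course data files. The sketch below swaps in R's built-in `AirPassengers` series and an ARIMA(0,1,1) × (0,1,1)$_{12}$ model on the log scale; both the dataset and the model order are assumptions for illustration, not the production index fit above.

```r
# Built-in monthly series (an assumption for illustration; not the prod data)
y <- log(AirPassengers)
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

pr <- predict(fit, n.ahead = 12)   # list with components pred and se
U <- pr$pred + 2 * pr$se           # upper limit (roughly 95%)
L <- pr$pred - 2 * pr$se           # lower limit

# Forecasts and limits for the 12 months following the end of the data
print(round(cbind(forecast = pr$pred, lower = L, upper = U), 3))
```

As on the slide, the limits can be overlaid on a plot of the series with `lines(U, lty = "dashed")` and `lines(L, lty = "dashed")`.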

Textbook Reading
Read the following sections from the textbook:
- §4.1 (Introduction to Spectral Analysis)
- §4.2 (Cyclical Behavior and Periodicity)

Textbook Problems
Do the following exercises from the textbook:
3.31 3.35