AR, MA and ARMA models
Hedibert Lopes
Based on Tsay's Analysis of Financial Time Series (3rd edition)
Outline
1. Stationarity
2. Autocorrelation function
3. Portmanteau tests
4. White noise and linear time series
5. AR models
6. Partial autocorrelation function
7. Information criteria: AIC and BIC
8. Forecasting AR models
9. MA models
10. Estimation and forecasting of MA models
11. Summary
Linear Time Series Analysis and Its Applications
For basic concepts of linear time series analysis, see Box, Jenkins, and Reinsel (1994, Chapters 2-3), Brockwell and Davis (1996, Chapters 1-3), and Shumway and Stoffer (2011, Chapters 1-3). The theories of linear time series discussed include stationarity, dynamic dependence, the autocorrelation function, modeling, and forecasting.
The econometric models introduced include (a) simple autoregressive (AR) models, (b) simple moving-average (MA) models, (c) mixed autoregressive moving-average (ARMA) models, (d) seasonal models, (e) unit-root nonstationarity, (f) regression models with time series errors, and (g) fractionally differenced models for long-range dependence.
Strict stationarity
The foundation of time series analysis is stationarity. A time series $\{r_t\}$ is said to be strictly stationary if the joint distribution of $(r_{t_1},\ldots,r_{t_k})$ is identical to that of $(r_{t_1+t},\ldots,r_{t_k+t})$ for all $t$, where $k$ is an arbitrary positive integer and $(t_1,\ldots,t_k)$ is a collection of $k$ positive integers. In other words, the joint distribution of $(r_{t_1},\ldots,r_{t_k})$ is invariant under time shifts. This is a very strong condition that is hard to verify empirically.
Weak stationarity
A time series $\{r_t\}$ is weakly stationary if both the mean of $r_t$ and the covariance between $r_t$ and $r_{t-l}$ are time invariant, where $l$ is an arbitrary integer. More specifically, $\{r_t\}$ is weakly stationary if (a) $E(r_t) = \mu$, a constant, and (b) $\mathrm{Cov}(r_t, r_{t-l}) = \gamma_l$, which depends only on $l$. In practice, suppose that we have observed $T$ data points $\{r_t : t = 1,\ldots,T\}$. Weak stationarity implies that a time plot of the data would show the $T$ values fluctuating with constant variation around a fixed level. In applications, weak stationarity enables one to make inference concerning future observations (e.g., prediction).
Properties
The covariance $\gamma_l = \mathrm{Cov}(r_t, r_{t-l})$ is called the lag-$l$ autocovariance of $r_t$. It has two important properties: (a) $\gamma_0 = \mathrm{Var}(r_t)$, and (b) $\gamma_{-l} = \gamma_l$. The second property holds because
$$\mathrm{Cov}(r_t, r_{t-(-l)}) = \mathrm{Cov}(r_{t-(-l)}, r_t) = \mathrm{Cov}(r_{t+l}, r_t) = \mathrm{Cov}(r_{t_1}, r_{t_1-l}),$$
where $t_1 = t + l$.
Autocorrelation function
The autocorrelation function (ACF) at lag $l$ is
$$\rho_l = \frac{\mathrm{Cov}(r_t, r_{t-l})}{\sqrt{\mathrm{Var}(r_t)\mathrm{Var}(r_{t-l})}} = \frac{\mathrm{Cov}(r_t, r_{t-l})}{\mathrm{Var}(r_t)} = \frac{\gamma_l}{\gamma_0},$$
where the property $\mathrm{Var}(r_t) = \mathrm{Var}(r_{t-l})$ for a weakly stationary series is used. In general, the lag-$l$ sample autocorrelation of $r_t$ is defined as
$$\hat\rho_l = \frac{\sum_{t=l+1}^{T}(r_t - \bar r)(r_{t-l} - \bar r)}{\sum_{t=1}^{T}(r_t - \bar r)^2}, \qquad 0 \le l < T - 1.$$
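As a quick sanity check, the sample ACF formula can be computed directly and compared with R's built-in acf(); this is a minimal sketch using a simulated series y, an illustrative choice rather than data from the text.

# Manual sample ACF versus R's built-in acf()
set.seed(42)
y    = rnorm(200)          # any observed series would do
ybar = mean(y); T = length(y)
rho.hat = function(l) sum((y[(l+1):T]-ybar)*(y[1:(T-l)]-ybar)) / sum((y-ybar)^2)
sapply(1:5, rho.hat)                     # lags 1 to 5 from the definition
acf(y, lag.max=5, plot=FALSE)$acf[2:6]   # should agree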
Portmanteau test
Box and Pierce (1970) propose the Portmanteau statistic
$$Q^*(m) = T \sum_{l=1}^{m} \hat\rho_l^2$$
as a test statistic for the null hypothesis $H_0: \rho_1 = \cdots = \rho_m = 0$ against the alternative hypothesis $H_a: \rho_i \ne 0$ for some $i \in \{1,\ldots,m\}$. Under the assumption that $\{r_t\}$ is an iid sequence with certain moment conditions, $Q^*(m)$ is asymptotically $\chi^2_m$.
Ljung and Box (1978)
Ljung and Box (1978) modify the $Q^*(m)$ statistic as below to increase the power of the test in finite samples:
$$Q(m) = T(T+2) \sum_{l=1}^{m} \frac{\hat\rho_l^2}{T - l}.$$
The decision rule is to reject $H_0$ if $Q(m) > \chi^2_\alpha$, where $\chi^2_\alpha$ denotes the $100(1-\alpha)$th percentile of a $\chi^2_m$ distribution.
# Load data
da = read.table("http://faculty.chicagobooth.edu/ruey.tsay/teaching/fts3/m-ibm3dx2608.txt", header=TRUE)
# IBM simple returns and squared returns
sibm  = da[,2]
sibm2 = sibm^2
# Sample ACFs
par(mfrow=c(1,2))
acf(sibm)
acf(sibm2)
# Ljung-Box statistic Q(30)
Box.test(sibm, lag=30, type="Ljung")
Box.test(sibm2, lag=30, type="Ljung")

> Box.test(sibm, lag=30, type="Ljung")
        Box-Ljung test
data:  sibm
X-squared = 38.241, df = 30, p-value = 0.1437

> Box.test(sibm2, lag=30, type="Ljung")
        Box-Ljung test
data:  sibm2
X-squared = 182.12, df = 30, p-value < 2.2e-16
White noise
A time series $r_t$ is called a white noise if $\{r_t\}$ is a sequence of independent and identically distributed random variables with finite mean and variance. All the autocorrelations are zero. If $r_t$ is normally distributed with mean zero and variance $\sigma^2$, the series is called a Gaussian white noise.
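To connect white noise with the portmanteau test above, here is a small illustration on simulated data (not from the text): the Ljung-Box statistic applied to Gaussian white noise should be insignificant.

# Ljung-Box test on simulated Gaussian white noise:
# all population autocorrelations are zero, so expect a large p-value
set.seed(123)
wn = rnorm(500)
Box.test(wn, lag=10, type="Ljung")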
Linear time series
A time series $r_t$ is said to be linear if it can be written as
$$r_t = \mu + \sum_{i=0}^{\infty} \psi_i a_{t-i},$$
where $\mu$ is the mean of $r_t$, $\psi_0 = 1$, and $\{a_t\}$ is white noise. Here $a_t$ denotes the new information at time $t$ of the time series and is often referred to as the innovation or shock at time $t$. If $r_t$ is weakly stationary, we can obtain its mean and variance easily by using the independence of $\{a_t\}$:
$$E(r_t) = \mu, \qquad V(r_t) = \sigma_a^2 \sum_{i=0}^{\infty} \psi_i^2,$$
where $\sigma_a^2$ is the variance of $a_t$.
The lag-$l$ autocovariance of $r_t$ is
$$\gamma_l = \mathrm{Cov}(r_t, r_{t-l}) = E\left[\left(\sum_{i=0}^{\infty}\psi_i a_{t-i}\right)\left(\sum_{j=0}^{\infty}\psi_j a_{t-l-j}\right)\right] = \sum_{j=0}^{\infty} \psi_{j+l}\psi_j E(a_{t-l-j}^2) = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+l},$$
so
$$\rho_l = \frac{\gamma_l}{\gamma_0} = \frac{\sum_{i=0}^{\infty}\psi_i\psi_{i+l}}{1 + \sum_{i=1}^{\infty}\psi_i^2}.$$
Linear time series models are econometric and statistical models used to describe the pattern of the $\psi$ weights of $r_t$.
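For a concrete case, the $\psi$ weights of an AR(1) with $\phi_1 = 0.9$ (an illustrative value) are $\psi_i = 0.9^i$; the built-in function stats::ARMAtoMA recovers them from the AR representation.

# psi weights of the MA(infinity) representation of an AR(1)
ARMAtoMA(ar=0.9, lag.max=10)   # psi_1, ..., psi_10
0.9^(1:10)                     # matches phi1^i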
AR(1)
For instance, a stationary AR(1) model can be written as
$$r_t - \mu = \phi_1 (r_{t-1} - \mu) + a_t,$$
where $\{a_t\}$ is white noise. It is easy to see that
$$r_t - \mu = \sum_{i=0}^{\infty} \phi_1^i a_{t-i} \qquad \text{and} \qquad V(r_t) = \frac{\sigma_a^2}{1 - \phi_1^2},$$
provided that $\phi_1^2 < 1$. In other words, the weak stationarity of an AR(1) model implies that $|\phi_1| < 1$. Using $\phi_0 = (1 - \phi_1)\mu$, one can rewrite a stationary AR(1) as
$$r_t = \phi_0 + \phi_1 r_{t-1} + a_t,$$
with $\phi_1$ measuring the persistence of the process.
Autocovariance function
The autocovariance function of a stationary AR(1) process is
$$\gamma_0 = E[(r_t - \mu)(r_t - \mu)] = E[(\phi_1(r_{t-1} - \mu) + a_t)(r_t - \mu)] = \phi_1 \gamma_1 + \sigma_a^2,$$
and, for $l \ge 1$,
$$\gamma_l = E[(r_t - \mu)(r_{t-l} - \mu)] = E[(\phi_1(r_{t-1} - \mu) + a_t)(r_{t-l} - \mu)] = \phi_1 E[(r_{t-1} - \mu)(r_{t-l} - \mu)] = \phi_1 \gamma_{l-1} = \phi_1^l \gamma_0,$$
with $\gamma_{-l} = \gamma_l$ by symmetry.
Autocorrelation function
From $\gamma_1 = \phi_1 \gamma_0$ and $\gamma_0 = \phi_1 \gamma_1 + \sigma_a^2$, it follows that
$$\gamma_0 = \frac{\sigma_a^2}{1 - \phi_1^2}.$$
The ACF of a (weakly) stationary AR(1) process is
$$\rho_l = \phi_1^{|l|}, \qquad l = \pm 1, \pm 2, \ldots,$$
which decays exponentially at rate $\phi_1$.
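A quick numerical check of $\rho_l = \phi_1^l$ using stats::ARMAacf, with the illustrative value $\phi_1 = 0.9$:

# Theoretical ACF of an AR(1): rho_l = phi1^l
phi1 = 0.9
ARMAacf(ar=phi1, lag.max=10)   # lags 0 to 10
phi1^(0:10)                    # identical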
AR(2)
An AR(2) model assumes the form
$$r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + a_t,$$
where
$$E(r_t) = \mu = \frac{\phi_0}{1 - \phi_1 - \phi_2},$$
provided that $\phi_1 + \phi_2 \ne 1$. It is easy to see that
$$\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \sigma_a^2,$$
$$\gamma_l = \phi_1\gamma_{l-1} + \phi_2\gamma_{l-2}, \quad l > 0, \qquad \rho_l = \phi_1\rho_{l-1} + \phi_2\rho_{l-2}, \quad l \ge 2.$$
Stationarity
It is easy to see that
$$\rho_1 = \frac{\phi_1}{1 - \phi_2} \qquad \text{and} \qquad \rho_2 = \phi_1\rho_1 + \phi_2,$$
so
$$\gamma_0 = \phi_1\rho_1\gamma_0 + \phi_2\rho_2\gamma_0 + \sigma_a^2 = \left(\phi_1\rho_1 + \phi_2(\phi_1\rho_1 + \phi_2)\right)\gamma_0 + \sigma_a^2 = \left(\frac{\phi_1^2(1+\phi_2)}{1-\phi_2} + \phi_2^2\right)\gamma_0 + \sigma_a^2,$$
such that
$$\gamma_0 = \frac{(1-\phi_2)\,\sigma_a^2}{(1+\phi_2)\left[(1-\phi_2)^2 - \phi_1^2\right]}.$$
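The $\gamma_0$ formula can be checked by simulation; a minimal sketch with illustrative values $(\phi_1, \phi_2) = (0.5, 0.3)$ and $\sigma_a^2 = 1$:

# Theoretical gamma_0 versus the sample variance of a long AR(2) path
set.seed(1234)
phi1 = 0.5; phi2 = 0.3; sigma2 = 1
gamma0 = (1-phi2)*sigma2 / ((1+phi2)*((1-phi2)^2 - phi1^2))
y = arima.sim(model=list(ar=c(phi1,phi2)), n=1e5, sd=sqrt(sigma2))
c(theoretical=gamma0, simulated=var(y))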
# Stationarity region of an AR(2); see
# https://stats.stackexchange.com/questions/118019/a-proof-for-the-stationarity-of-an-ar2
pdf(file="ar2-stationarityregion.pdf",width=7,height=7)
phi1 <- seq(from = -2.5, to = 2.5, length = 51)
plot(phi1,1+phi1,lty="dashed",type="l",
     xlab="",ylab="",cex.axis=.8,ylim=c(-1.5,1.5))
abline(a = -1, b = 0, lty="dashed")
abline(a = 1, b = -1, lty="dashed")
title(ylab=expression(phi[2]),xlab=expression(phi[1]),cex.lab=.8)
polygon(x = phi1[6:46], y = 1-abs(phi1[6:46]), col="gray")
lines(phi1,-phi1^2/4)   # below this curve the characteristic roots are complex
text(0,-.5,expression(phi[2] < -phi[1]^2/4),cex=.7)
text(1.2,.5,expression(phi[2]>1-phi[1]),cex=.7)
text(-1.75,.5,expression(phi[2]>1+phi[1]),cex=.7)
points(1.75,-0.8,pch=16,col=1)
points(0.9,0.05,pch=16,col=2)
dev.off()
Two sets of φ's
Let us compute the ACF for two cases: $(\phi_1, \phi_2) = (1.75, -0.8)$ and $(\phi_1, \phi_2) = (0.9, 0.05)$.
[Figure: AR(2) stationarity triangle (shaded) in the $(\phi_1, \phi_2)$ plane with the two points marked; the curve $\phi_2 = -\phi_1^2/4$ separates real from complex characteristic roots.]
Theoretical ACFs
acf.hedi = function(k,phi1,phi2){
  rhos    = rep(0,k)
  rhos[1] = phi1/(1-phi2)
  rhos[2] = phi1*rhos[1]+phi2
  for (l in 3:k) rhos[l] = phi1*rhos[l-1]+phi2*rhos[l-2]
  plot(0:k,c(1,rhos),type="h",xlab="lag",ylab="ACF")
  title(substitute(paste(phi[1],"=",a,", ",phi[2],"=",b),
        list(a=phi1,b=phi2)))
  abline(h=0,lty=2)
}
pdf(file="ar2-acf.pdf",width=7,height=5)
par(mfrow=c(1,2))
acf.hedi(100,1.75,-0.8)
acf.hedi(150,0.9,0.05)
dev.off()
[Figure: theoretical ACFs. Left: $\phi_1 = 1.75$, $\phi_2 = -0.8$ (damped oscillations). Right: $\phi_1 = 0.9$, $\phi_2 = 0.05$ (smooth exponential decay).]
Empirical ACFs
Let us simulate some data and compute the empirical ACFs.
acf1.hedi = function(n,k,phi1,phi2){
  y = rep(0,2*n)
  for (t in 3:(2*n)) y[t] = phi1*y[t-1]+phi2*y[t-2]+rnorm(1,0,1)
  y = y[(n+1):(2*n)]   # discard the first n draws as burn-in
  acf(y,k)
}
pdf(file="ar2-acf-1.pdf",width=10,height=7)
par(mfrow=c(2,2))
acf.hedi(100,1.75,-0.8)
acf.hedi(150,0.9,0.05)
set.seed(1234)
acf1.hedi(10000,100,1.75,-0.8)
acf1.hedi(10000,150,0.9,0.05)
dev.off()
[Figures: theoretical (top row) vs empirical (bottom row) ACFs for $(\phi_1, \phi_2) = (1.75, -0.8)$ and $(0.9, 0.05)$, with sample sizes $n = 10{,}000$, $1{,}000$, $500$, and $200$; the sample ACFs become noisier as $n$ decreases.]
US real GNP
As an illustration, consider the quarterly growth rate of U.S. real gross national product (GNP), seasonally adjusted, from the second quarter of 1947 to the first quarter of 1991. Here we simply employ an AR(3) model for the data. Denoting the growth rate by $r_t$, the fitted model is
$$r_t = 0.0047 + 0.348 r_{t-1} + 0.179 r_{t-2} - 0.142 r_{t-3} + a_t, \qquad \hat\sigma_a = 0.0097.$$
Alternatively,
$$r_t - 0.348 r_{t-1} - 0.179 r_{t-2} + 0.142 r_{t-3} = 0.0047 + a_t,$$
with the corresponding third-order difference equation
$$1 - 0.348B - 0.179B^2 + 0.142B^3 = 0,$$
which factors as
$$(1 + 0.521B)(1 - 0.869B + 0.274B^2) = 0.$$
The first factor $(1 + 0.521B)$ corresponds to an exponentially decaying component of the GNP growth rate.
Business cycles
The second factor $(1 - 0.869B + 0.274B^2)$ confirms the existence of stochastic business cycles. For an AR(2) model with a pair of complex characteristic roots, the average length of the stochastic cycles is
$$k = \frac{2\pi}{\cos^{-1}\left[\phi_1/(2\sqrt{-\phi_2})\right]},$$
which here gives $k = 10.62$ quarters, about 3 years.
Fact: If one uses a nonlinear model to separate the U.S. economy into expansion and contraction periods, the data show that the average duration of contraction periods is about 3 quarters and that of expansion periods is about 12 quarters. The average duration of 10.62 quarters is a compromise between the two separate durations. The periodic feature obtained here is common among growth rates of national economies. For example, similar features can be found for many OECD countries.
# R code
gnp = scan(file="http://faculty.chicagobooth.edu/ruey.tsay/teaching/fts3/dgnp82.txt")
# Create a time-series object
gnp1 = ts(gnp,frequency=4,start=c(1947,2))
par(mfrow=c(1,1))
plot(gnp1)
points(gnp1,pch="*")
# Find the AR order
m1 = ar(gnp,method="mle")
m1$order
m2 = arima(gnp,order=c(3,0,0))
m2
# In R, "intercept" denotes the mean of the series,
# so the constant term is obtained below:
(1-.348-.1793+.1423)*0.0077
# Residual standard error
sqrt(m2$sigma2)
# Characteristic equation and solutions
p1 = c(1,-m2$coef[1:3])
roots = polyroot(p1)
# Compute the absolute values (moduli) of the solutions
Mod(roots)
[1] 1.913308 1.920152 1.913308
# Average length of business cycles
k = 2*pi/acos(1.590253/1.913308)
AR(p)
The results for the AR(1) and AR(2) models readily generalize to the general AR(p) model:
$$r_t = \phi_0 + \phi_1 r_{t-1} + \cdots + \phi_p r_{t-p} + a_t,$$
where $p$ is a nonnegative integer and $\{a_t\}$ is white noise. The mean of a stationary series is
$$E(r_t) = \frac{\phi_0}{1 - \phi_1 - \cdots - \phi_p},$$
provided that the denominator is not zero. The associated characteristic equation of the model is
$$1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0.$$
If all the solutions of this equation are greater than 1 in modulus, then the series $r_t$ is stationary.
Partial autocorrelation function (PACF)
The PACF of a stationary time series is a function of its ACF and is a useful tool for determining the order $p$ of an AR model. A simple yet effective way to introduce the PACF is to consider the following AR models in consecutive orders:
$$r_t = \phi_{0,1} + \phi_{1,1} r_{t-1} + e_{1t},$$
$$r_t = \phi_{0,2} + \phi_{1,2} r_{t-1} + \phi_{2,2} r_{t-2} + e_{2t},$$
$$r_t = \phi_{0,3} + \phi_{1,3} r_{t-1} + \phi_{2,3} r_{t-2} + \phi_{3,3} r_{t-3} + e_{3t},$$
$$\vdots$$
The estimate $\hat\phi_{i,i}$ of the $i$th equation is called the lag-$i$ sample PACF of $r_t$.
For a stationary Gaussian AR(p) model, it can be shown that the sample PACF has the following properties:
$\hat\phi_{p,p} \to \phi_p$ as $T \to \infty$.
$\hat\phi_{l,l} \to 0$ for all $l > p$.
$V(\hat\phi_{l,l}) \approx 1/T$ for $l > p$.
These results say that, for an AR(p) series, the sample PACF cuts off at lag $p$, as illustrated below.
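These properties can be seen on simulated data; a minimal sketch using an illustrative AR(2) with $(\phi_1, \phi_2) = (0.9, -0.4)$:

# Sample PACF of a simulated AR(2): spikes at lags 1 and 2,
# then approximately zero with standard error about 1/sqrt(T)
set.seed(42)
y = arima.sim(model=list(ar=c(0.9,-0.4)), n=2000)
pacf(y, lag.max=10)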
AIC
The well-known Akaike information criterion (AIC) (Akaike, 1973) is defined as
$$\mathrm{AIC} = \underbrace{-\frac{2}{T}\log(\text{likelihood})}_{\text{goodness of fit}} + \underbrace{\frac{2}{T}(\text{number of parameters})}_{\text{penalty function}},$$
where the likelihood function is evaluated at the maximum-likelihood estimates and $T$ is the sample size. For a Gaussian AR(l) model, AIC reduces to
$$\mathrm{AIC}(l) = \log(\tilde\sigma_l^2) + \frac{2l}{T},$$
where $\tilde\sigma_l^2$ is the maximum-likelihood estimate of $\sigma_a^2$, the variance of $a_t$.
BIC
Another commonly used criterion function is the Schwarz Bayesian information criterion (BIC). For a Gaussian AR(l) model, the criterion is
$$\mathrm{BIC}(l) = \log(\tilde\sigma_l^2) + \frac{l \log(T)}{T}.$$
The penalty for each parameter used is 2 for AIC and $\log(T)$ for BIC. Thus, BIC tends to select a lower-order AR model when the sample size is moderate or large.
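The two criteria can be compared on simulated data; a minimal sketch fitting AR(0) through AR(6) to an illustrative AR(3) series:

# Order selection by AIC and BIC; BIC's heavier penalty tends to
# favor a lower order
set.seed(42)
y  = arima.sim(model=list(ar=c(0.4,0.2,-0.3)), n=400)
ic = sapply(0:6, function(p){
  fit = arima(y, order=c(p,0,0), method="ML")
  c(AIC=AIC(fit), BIC=BIC(fit))
})
colnames(ic) = paste0("AR(", 0:6, ")")
round(ic, 2)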
Forecasting
For the AR(p) model, suppose that we are at time index $h$ and are interested in forecasting $r_{h+l}$, where $l \ge 1$. The time index $h$ is called the forecast origin and the positive integer $l$ is the forecast horizon. Let $\hat r_h(l)$ be the forecast of $r_{h+l}$ under the minimum squared error loss function, i.e., $\hat r_h(l)$ minimizes
$$E\{[r_{h+l} - g]^2 \mid F_h\}$$
over all functions $g$ of the information available at time $h$ (inclusive), that is, functions of $F_h$. We refer to $\hat r_h(l)$ as the $l$-step-ahead forecast of $r_t$ at the forecast origin $h$.
1-step-ahead forecast
It is easy to see that
$$\hat r_h(1) = E(r_{h+1} \mid F_h) = \phi_0 + \sum_{i=1}^{p} \phi_i r_{h+1-i},$$
and the associated forecast error is
$$e_h(1) = r_{h+1} - \hat r_h(1) = a_{h+1}, \qquad V(e_h(1)) = V(a_{h+1}) = \sigma_a^2.$$
2-step-ahead forecast
Similarly,
$$\hat r_h(2) = \phi_0 + \phi_1 \hat r_h(1) + \phi_2 r_h + \cdots + \phi_p r_{h+2-p},$$
with
$$e_h(2) = a_{h+2} + \phi_1 a_{h+1} \qquad \text{and} \qquad V(e_h(2)) = (1 + \phi_1^2)\sigma_a^2.$$
Multistep-ahead forecast
In general,
$$r_{h+l} = \phi_0 + \sum_{i=1}^{p} \phi_i r_{h+l-i} + a_{h+l},$$
and
$$\hat r_h(l) = \phi_0 + \sum_{i=1}^{p} \phi_i \hat r_h(l-i),$$
where $\hat r_h(i) = r_{h+i}$ if $i \le 0$.
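The recursion can be implemented directly and checked against R's predict(); a minimal sketch using an illustrative fitted AR(2):

# Multistep AR forecasts via the recursion above
set.seed(42)
y   = arima.sim(model=list(ar=c(0.6,0.2)), n=500)
fit = arima(y, order=c(2,0,0))
# In R, "intercept" is the series mean, so phi0 = (1-phi1-phi2)*mu
phi0 = fit$coef["intercept"]*(1 - fit$coef["ar1"] - fit$coef["ar2"])
r = as.numeric(y); h = length(r); fc = numeric(4)
for (l in 1:4){
  past  = function(i) if (i <= 0) r[h+i] else fc[i]   # r_h(i) = r_{h+i} for i <= 0
  fc[l] = phi0 + fit$coef["ar1"]*past(l-1) + fit$coef["ar2"]*past(l-2)
}
fc
predict(fit, n.ahead=4)$pred   # should match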
Mean reversion
It can be shown that for a stationary AR(p) model, $\hat r_h(l) \to E(r_t)$ as $l \to \infty$, meaning that for such a series the long-term point forecast approaches its unconditional mean. This property is referred to as mean reversion in the finance literature. For an AR(1) model, the speed of mean reversion is measured by the half-life, defined as
$$\text{half-life} = \frac{\log(0.5)}{\log(|\phi_1|)}.$$
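For example, with the illustrative value $\phi_1 = 0.9$ the half-life is about 6.6 periods:

# Half-life of mean reversion for an AR(1)
phi1 = 0.9
log(0.5)/log(abs(phi1))   # approximately 6.58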
MA(1) model
There are several ways to introduce MA models. One approach is to treat the model as a simple extension of white noise series. Another approach is to treat the model as an infinite-order AR model with some parameter constraints. We adopt the second approach.
We may entertain, at least in theory, an AR model with infinite order:
$$r_t = \phi_0 + \phi_1 r_{t-1} + \phi_2 r_{t-2} + \cdots + a_t.$$
However, such an AR model is not realistic because it has infinitely many parameters. One way to make the model practical is to assume that the coefficients $\phi_i$ satisfy some constraints, so that they are determined by a finite number of parameters. A special case of this idea is
$$r_t = \phi_0 - \theta_1 r_{t-1} - \theta_1^2 r_{t-2} - \theta_1^3 r_{t-3} - \cdots + a_t,$$
where the coefficients depend on a single parameter $\theta_1$ via $\phi_i = -\theta_1^i$ for $i \ge 1$.
Obviously,
$$r_t + \theta_1 r_{t-1} + \theta_1^2 r_{t-2} + \theta_1^3 r_{t-3} + \cdots = \phi_0 + a_t,$$
so, lagging this identity one period and multiplying by $\theta_1$,
$$\theta_1(r_{t-1} + \theta_1 r_{t-2} + \theta_1^2 r_{t-3} + \theta_1^3 r_{t-4} + \cdots) = \theta_1(\phi_0 + a_{t-1}).$$
Subtracting the second equation from the first,
$$r_t = \phi_0(1 - \theta_1) + a_t - \theta_1 a_{t-1},$$
i.e., $r_t$ is a weighted average of the shocks $a_t$ and $a_{t-1}$. Therefore, the model is called an MA model of order 1, or MA(1) model for short.
MA(q)
The general form of an MA(q) model is
$$r_t = c_0 + a_t - \sum_{i=1}^{q} \theta_i a_{t-i},$$
or
$$r_t = c_0 + (1 - \theta_1 B - \cdots - \theta_q B^q) a_t,$$
where $q > 0$. Moving-average models are always weakly stationary because they are finite linear combinations of a white noise sequence, for which the first two moments are time invariant. In particular,
$$E(r_t) = c_0, \qquad V(r_t) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma_a^2.$$
ACF of an MA(1)
Assume that $c_0 = 0$ for simplicity. Multiplying the model by $r_{t-l}$,
$$r_{t-l} r_t = r_{t-l} a_t - \theta_1 r_{t-l} a_{t-1}.$$
Taking expectations, we obtain
$$\gamma_1 = -\theta_1 \sigma_a^2 \qquad \text{and} \qquad \gamma_l = 0, \quad \text{for } l > 1.$$
Since $V(r_t) = (1 + \theta_1^2)\sigma_a^2$, it follows that
$$\rho_0 = 1, \qquad \rho_1 = \frac{-\theta_1}{1 + \theta_1^2}, \qquad \rho_l = 0, \quad \text{for } l > 1.$$
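These formulas agree with stats::ARMAacf, noting that R parameterizes the MA part as $r_t = a_t + \theta a_{t-1}$, so Tsay's $\theta_1$ enters with a minus sign; $\theta_1 = 0.5$ below is an illustrative value.

# Theoretical ACF of an MA(1): rho_1 = -theta1/(1+theta1^2), zero beyond
theta1 = 0.5
ARMAacf(ma=-theta1, lag.max=5)   # note the sign flip in R's convention
-theta1/(1+theta1^2)             # matches the lag-1 value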
ACF of an MA(2)
For the MA(2) model, the autocorrelation coefficients are
$$\rho_1 = \frac{-\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_2 = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad \rho_l = 0, \quad \text{for } l > 2.$$
Here the ACF cuts off at lag 2. This property generalizes to other MA models: for an MA(q) model, the lag-$q$ autocorrelation is not zero, but $\rho_l = 0$ for $l > q$. Hence an MA(q) series is only linearly related to its first $q$ lagged values, and so it is a finite-memory model.
Estimation
Maximum-likelihood estimation is commonly used to estimate MA models. There are two approaches for evaluating the likelihood function of an MA model.
The first approach assumes that $a_t = 0$ for $t \le 0$, so that $a_1 = r_1 - c_0$, $a_2 = r_2 - c_0 + \theta_1 a_1$, and so on. This approach is referred to as the conditional-likelihood method.
The second approach treats the initial shocks $a_t$, $t \le 0$, as additional parameters of the model and estimates them jointly with the other parameters. This approach is referred to as the exact-likelihood method.
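R's arima() offers a related conditional/exact split through its method argument: method="CSS" minimizes the conditional sum of squares (initial shocks set to zero), while method="ML" maximizes the exact Gaussian likelihood via a state-space representation. This is a minimal sketch on simulated data, not the exact-likelihood treatment of the initial shocks as parameters described above.

# Conditional (CSS) versus exact (ML) likelihood estimation of an MA(1)
set.seed(42)
y = arima.sim(model=list(ma=-0.5), n=300)
arima(y, order=c(0,0,1), method="CSS")$coef
arima(y, order=c(0,0,1), method="ML")$coef   # usually very close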
Forecasting an MA(1)
For the 1-step-ahead forecast of an MA(1) process, the model says
$$r_{h+1} = c_0 + a_{h+1} - \theta_1 a_h.$$
Taking the conditional expectation, we have
$$\hat r_h(1) = E(r_{h+1} \mid F_h) = c_0 - \theta_1 a_h, \qquad e_h(1) = r_{h+1} - \hat r_h(1) = a_{h+1},$$
with $V[e_h(1)] = \sigma_a^2$. Similarly,
$$\hat r_h(2) = E(r_{h+2} \mid F_h) = c_0, \qquad e_h(2) = r_{h+2} - \hat r_h(2) = a_{h+2} - \theta_1 a_{h+1},$$
with $V[e_h(2)] = (1 + \theta_1^2)\sigma_a^2$.
Forecasting an MA(2)
Similarly, for an MA(2) model, we have
$$r_{h+l} = c_0 + a_{h+l} - \theta_1 a_{h+l-1} - \theta_2 a_{h+l-2},$$
from which we obtain
$$\hat r_h(1) = c_0 - \theta_1 a_h - \theta_2 a_{h-1}, \qquad \hat r_h(2) = c_0 - \theta_2 a_h, \qquad \hat r_h(l) = c_0, \quad \text{for } l > 2.$$
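As a numerical illustration (simulated data with illustrative parameters), the forecasts of a fitted MA(2) are flat at the estimated mean from step 3 on:

# MA(2) forecasts revert to c_0 after two steps
set.seed(42)
y   = arima.sim(model=list(ma=c(-0.6,0.3)), n=500)
fit = arima(y, order=c(0,0,2))
predict(fit, n.ahead=5)$pred   # constant from the 3rd step onward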
Summary
A brief summary of AR and MA models is in order. We have discussed the following properties:
For MA models, the ACF is useful in specifying the order because the ACF cuts off at lag $q$ for an MA(q) series.
For AR models, the PACF is useful in order determination because the PACF cuts off at lag $p$ for an AR(p) process.
An MA series is always stationary, but for an AR series to be stationary, all solutions of its characteristic equation must be greater than 1 in modulus.
Carefully read Section 2.6 of Tsay (2010) about ARMA models.