Chapter 3: Generalised AR and MA Models and Applications

3.1 Generalised Autoregressive Processes

Consider an AR(1) process given by $(1 - \alpha B)X_t = Z_t$, $|\alpha| < 1$. In this case, the acf is $\rho_k = \alpha^k$ for $k \ge 0$, and the pacf is
\[
\phi_{kk} = \begin{cases} 1 & k = 0 \\ \alpha & k = 1 \\ 0 & k \ge 2 \end{cases}
\]
with spectral density function
\[
f_X(\omega) = \frac{\sigma^2}{2\pi}\left(1 - 2\alpha\cos\omega + \alpha^2\right)^{-1}, \quad -\pi < \omega < \pi.
\]
If you consider some simulations of time series generated by $(1 - \alpha B)^{\delta} X_t = Z_t$, $|\alpha| < 1$, $\delta > 0$, you notice that the ACF, PACF and spectrum are similar for various values of $\delta$. It is therefore difficult to distinguish these cases from the standard AR(1) case with $\delta = 1$. We call the class of time series generated by $(1 - \alpha B)^{\delta} X_t = Z_t$ a generalised first-order autoregression, or GAR(1). It can be seen that this parameter $\delta$ may increase the accuracy of forecast values. We now discuss some properties of this new class of GAR(1) models.
3.1.1 Some basic results

1. Gamma function:
\[
\Gamma(x) = \begin{cases} \int_0^\infty t^{x-1} e^{-t}\,dt & x > 0 \\ \text{undefined} & x = 0 \\ x^{-1}\,\Gamma(x+1) & x < 0 \end{cases}
\]
It can be seen that $\Gamma(x) = (x-1)\Gamma(x-1)$, so when $x$ is a positive integer, $\Gamma(x) = (x-1)!$. Furthermore, $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$.

2. Beta function:
\[
B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt
\]
and it can be shown that $B(x, y) = \dfrac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$.

3. Binomial coefficients, for any $n \in \mathbb{R}$:
\[
\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1} = \frac{\Gamma(n+1)}{\Gamma(k+1)\,\Gamma(n-k+1)}
\]
When $n$ is a positive integer, $\binom{n}{k} = {}^{n}C_k = \frac{n!}{(n-k)!\,k!}$. Note that you can't define factorials when $n$ is not an integer, so the Gamma-function form is the one that generalises.

4. Hypergeometric function:
\[
F(a, b; c; x) = 1 + \frac{ab}{1!\,c}\,x + \frac{a(a+1)\,b(b+1)}{2!\,c(c+1)}\,x^2 + \frac{a(a+1)(a+2)\,b(b+1)(b+2)}{3!\,c(c+1)(c+2)}\,x^3 + \cdots
\]
We can re-express this using Pochhammer notation, $(a)_j = a(a+1)\cdots(a+j-1)$ with $(a)_0 = 1$:
\[
F(a, b; c; x) = \sum_{j=0}^{\infty} \frac{(a)_j\,(b)_j}{j!\,(c)_j}\,x^j.
\]
Further, since $(a)_j = \Gamma(a+j)/\Gamma(a)$, we have
\[
F(a, b; c; x) = \sum_{j=0}^{\infty} \frac{\Gamma(a+j)\,\Gamma(b+j)\,\Gamma(c)}{\Gamma(j+1)\,\Gamma(a)\,\Gamma(b)\,\Gamma(c+j)}\,x^j.
\]
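As a quick numerical illustration (not part of the original notes; the function names are my own), the generalised binomial coefficient can be computed from the product formula, and the hypergeometric function by truncating the Pochhammer series — a sketch assuming $|x| < 1$ so the series converges:

```python
import math

def gen_binom(n: float, k: int) -> float:
    """Generalised binomial coefficient for real n:
    n(n-1)...(n-k+1) / k!  (reduces to nCk for integer n)."""
    num = 1.0
    for i in range(k):
        num *= (n - i)
    return num / math.factorial(k)

def hypergeom_F(a: float, b: float, c: float, x: float, terms: int = 200) -> float:
    """Partial sum of F(a,b;c;x) = sum_j (a)_j (b)_j x^j / (j! (c)_j)."""
    total, term = 1.0, 1.0
    for j in range(terms):
        # Each Pochhammer factor advances by one: (a)_{j+1}/(a)_j = a + j.
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * x
        total += term
    return total

print(gen_binom(5, 2))            # integer case: 10.0
print(gen_binom(0.5, 2))          # fractional case: -0.125
print(hypergeom_F(1, 1, 1, 0.5))  # F(1,1;1;x) = 1/(1-x), so approximately 2
```

The product form avoids the sign headaches of evaluating $\Gamma$ at negative arguments.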
Now, our model is $(1 - \alpha B)^{\delta} X_t = Z_t$, or equivalently $X_t = (1 - \alpha B)^{-\delta} Z_t$. So we can look at the general binomial expansion:
\[
(1 - x)^{-1} = 1 + x + x^2 + x^3 + \cdots
\]
and
\[
(1 - x)^{n} = \sum_{r=0}^{\infty} \binom{n}{r}(-x)^r;
\]
when $n$ is a positive integer, the above series terminates. For our purposes we have
\[
(1 - \alpha B)^{\delta} = \sum_{j=0}^{\infty} \binom{\delta}{j}(-\alpha B)^j = \sum_{j=0}^{\infty} (-1)^j \binom{\delta}{j}(\alpha B)^j, \quad \delta \in \mathbb{R}.
\]
Let
\[
\pi_j = (-1)^j \binom{\delta}{j} = \frac{(-1)^j\,\delta(\delta-1)\cdots(\delta-j+1)}{j!} = \frac{(-\delta)(-\delta+1)\cdots(-\delta+j-1)}{j!} = \frac{\Gamma(j-\delta)}{\Gamma(-\delta)\,\Gamma(j+1)}.
\]
Thus $(1-\alpha B)^{\delta} X_t = Z_t$ is equivalent to the AR($\infty$) process
\[
\sum_{j=0}^{\infty} \pi_j\,\alpha^j X_{t-j} = Z_t,
\]
which can be expressed in the equivalent MA($\infty$) form $X_t = (1 - \alpha B)^{-\delta} Z_t$. We can derive the form of the weights as above, just changing $-\delta$ to $+\delta$:
\[
X_t = \sum_{j=0}^{\infty} \psi_j\,\alpha^j Z_{t-j}, \quad\text{where}\quad \psi_j = \frac{\Gamma(j+\delta)}{\Gamma(\delta)\,\Gamma(j+1)}
\]
(i.e. $\pi_j$ with $-\delta$ changed to $+\delta$).
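The $\psi$ weights are easy to tabulate. A minimal sketch (helper names are my own; it uses log-gamma to avoid overflow, which is safe here since $\delta > 0$) checking the closed form against the one-step recursion implied by the binomial expansion, and the AR(1) special case $\delta = 1$:

```python
import math

def psi(j: int, delta: float) -> float:
    """MA(inf) weight psi_j = Gamma(j+delta) / (Gamma(delta) Gamma(j+1)),
    computed in log space (valid for delta > 0)."""
    return math.exp(math.lgamma(j + delta) - math.lgamma(delta) - math.lgamma(j + 1))

# The weights satisfy psi_0 = 1 and psi_j = psi_{j-1} (j - 1 + delta) / j.
delta = 0.4
closed = [psi(j, delta) for j in range(6)]
rec = [1.0]
for j in range(1, 6):
    rec.append(rec[-1] * (j - 1 + delta) / j)
for a, b in zip(closed, rec):
    assert math.isclose(a, b)

# For delta = 1 the GAR(1) collapses to AR(1): every psi_j equals 1.
print([round(psi(j, 1.0), 10) for j in range(4)])  # [1.0, 1.0, 1.0, 1.0]
```

The recursion is what one would use in practice to truncate the MA($\infty$) representation for simulation.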
3.1.2 Convergence

Theorem 3.1.1. $X_t = \sum_{j=0}^{\infty} \psi_j\,\alpha^j Z_{t-j}$ converges in the mean square sense.

Proof. $E(X_t^2) = \sigma^2 \sum_{j=0}^{\infty} (\psi_j\,\alpha^j)^2$. Recall the ratio test of convergence: given terms $u_1, u_2, \ldots$, let
\[
\lim_{j\to\infty}\left|\frac{u_{j+1}}{u_j}\right| = L;
\]
if $L < 1$ then $\sum u_j$ converges; if $L = 1$ then there's no decision; and if $L > 1$ then $\sum u_j$ diverges. Letting $u_j = \psi_j\,\alpha^j$, we have
\[
\frac{u_{j+1}}{u_j} = \frac{\psi_{j+1}}{\psi_j}\,\alpha = \frac{\Gamma(j+1+\delta)}{\Gamma(j+2)\,\Gamma(\delta)}\cdot\frac{\Gamma(j+1)\,\Gamma(\delta)}{\Gamma(j+\delta)}\,\alpha = \frac{j+\delta}{j+1}\,\alpha = \frac{1 + \delta/j}{1 + 1/j}\,\alpha.
\]
So we have
\[
\lim_{j\to\infty}\left|\frac{u_{j+1}}{u_j}\right| = \lim_{j\to\infty}\frac{1+\delta/j}{1+1/j}\,|\alpha| = |\alpha|.
\]
Thus, as $|\alpha| < 1$, the result follows. ∎

3.1.3 Autocovariance function

\[
\gamma_k = \operatorname{cov}(X_t, X_{t-k}) = E(X_t X_{t-k}) = E\left[\left(\sum_{j=0}^{\infty} \psi_j\,\alpha^j Z_{t-j}\right)\left(\sum_{l=0}^{\infty} \psi_l\,\alpha^l Z_{t-k-l}\right)\right]
\]
\[
= \sigma^2\left(\psi_k\,\alpha^k + \psi_{k+1}\psi_1\,\alpha^{k+2} + \psi_{k+2}\psi_2\,\alpha^{k+4} + \cdots\right) = \sigma^2\,\alpha^k \sum_{j=0}^{\infty} \psi_{j+k}\,\psi_j\,\alpha^{2j}.
\]
When $k = 0$, $\gamma_0 = \operatorname{var}(X_t) = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2\,\alpha^{2j}$. Note that when $\delta = 1$, $\psi_j = \frac{\Gamma(j+1)}{\Gamma(j+1)\Gamma(1)} = 1$ and we collapse back to our standard case:
\[
\gamma_0 = \operatorname{var}(X_t) = \sigma^2 \sum_{j=0}^{\infty} \alpha^{2j} = \frac{\sigma^2}{1 - \alpha^2}.
\]
Recall the hypergeometric function
\[
F(\theta_1, \theta_2; \theta_3; \theta) = \sum_{j=0}^{\infty} \frac{\Gamma(\theta_1+j)\,\Gamma(\theta_2+j)\,\Gamma(\theta_3)}{\Gamma(\theta_1)\,\Gamma(\theta_2)\,\Gamma(\theta_3+j)\,\Gamma(j+1)}\,\theta^j.
\]
Now, we have
\[
\gamma_0 = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2\,\alpha^{2j} = \sigma^2 \sum_{j=0}^{\infty} \frac{\Gamma(j+\delta)}{\Gamma(j+1)\,\Gamma(\delta)}\cdot\frac{\Gamma(j+\delta)}{\Gamma(j+1)\,\Gamma(\delta)}\,\alpha^{2j} = \sigma^2\,F(\delta, \delta; 1; \alpha^2),
\]
i.e. we have a hypergeometric function with $\theta_1 = \theta_2 = \delta$, $\theta_3 = 1$ and $\theta = \alpha^2$. Similarly, we can re-express the autocovariance function as
\[
\gamma_k = \sigma^2\,\alpha^k \sum_{j=0}^{\infty} \psi_{k+j}\,\psi_j\,\alpha^{2j} = \sigma^2\,\alpha^k\,\frac{\Gamma(k+\delta)}{\Gamma(\delta)\,\Gamma(k+1)}\,F(\delta, k+\delta; k+1; \alpha^2).
\]
Combining these results we have the autocorrelation function:
\[
\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\alpha^k\,\Gamma(k+\delta)\,F(\delta, k+\delta; k+1; \alpha^2)}{\Gamma(\delta)\,\Gamma(k+1)\,F(\delta, \delta; 1; \alpha^2)}.
\]

3.1.4 Spectral Density Function

Recall that when $X_t = G(B)Z_t$,
\[
f_X(\omega) = |G(e^{-i\omega})|^2\,f_Z(\omega).
\]
AR(1) case: $(1-\alpha B)X_t = Z_t$, or $X_t = (1-\alpha B)^{-1} Z_t$:
\[
f_X(\omega) = |1 - \alpha e^{-i\omega}|^{-2}\,f_Z(\omega) = \left(1 - 2\alpha\cos\omega + \alpha^2\right)^{-1} f_Z(\omega).
\]
If $\{Z_t\} \sim WN(0, \sigma^2)$ then $f_Z(\omega) = \frac{\sigma^2}{2\pi}$, $-\pi < \omega < \pi$.
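The hypergeometric expression for $\rho_k$ can be cross-checked against a direct summation over the $\psi$-weights. The sketch below is illustrative only (helper names are my own, and both series are truncated, which assumes $|\alpha|$ is well below 1); it confirms the two routes agree and that $\delta = 1$ recovers the AR(1) acf $\rho_k = \alpha^k$:

```python
import math

def psi(j, delta):
    """psi_j = Gamma(j+delta) / (Gamma(delta) Gamma(j+1)), delta > 0."""
    return math.exp(math.lgamma(j + delta) - math.lgamma(delta) - math.lgamma(j + 1))

def F(a, b, c, x, terms=400):
    """Truncated Gauss hypergeometric series, |x| < 1."""
    total, term = 1.0, 1.0
    for j in range(terms):
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * x
        total += term
    return total

def rho_hypergeom(k, alpha, delta):
    """rho_k = alpha^k Gamma(k+d) F(d,k+d;k+1;a^2) / (Gamma(d) Gamma(k+1) F(d,d;1;a^2))."""
    coef = math.exp(math.lgamma(k + delta) - math.lgamma(delta) - math.lgamma(k + 1))
    return alpha**k * coef * F(delta, k + delta, k + 1, alpha**2) / F(delta, delta, 1, alpha**2)

def rho_series(k, alpha, delta, terms=400):
    """rho_k computed directly from the MA(inf) weights."""
    num = sum(psi(j + k, delta) * psi(j, delta) * alpha**(2 * j) for j in range(terms))
    den = sum(psi(j, delta)**2 * alpha**(2 * j) for j in range(terms))
    return alpha**k * num / den

alpha, delta = 0.6, 0.7
for k in range(1, 5):
    assert math.isclose(rho_hypergeom(k, alpha, delta), rho_series(k, alpha, delta), rel_tol=1e-8)
# delta = 1 collapses to the AR(1) acf rho_k = alpha^k.
assert math.isclose(rho_hypergeom(3, 0.5, 1.0), 0.5**3, rel_tol=1e-9)
```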
GAR(1) case: $(1 - \alpha B)^{\delta} X_t = Z_t$, and we have
\[
f_X(\omega) = \left(1 - 2\alpha\cos\omega + \alpha^2\right)^{-\delta}\,\frac{\sigma^2}{2\pi}, \quad -\pi < \omega < \pi.
\]
Thus, the basic properties (turning points, etc.) don't change. Therefore it's hard to distinguish a fractional-$\delta$ class from an integer class, as they have very similar properties.

3.2 Parameter Estimation

3.2.1 Maximum Likelihood Estimation

We need to give a distribution to $\{Z_t\}$ rather than just white noise; therefore we assume $\{Z_t\} \sim NID(0, \sigma^2)$. Then
\[
L = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}\,X^T \Sigma^{-1} X\right\}
\]
where $X = (X_1, \ldots, X_n)^T$ and $\Sigma = E(XX^T)$. We can do MLE by numerical maximisation of $L$, which gives the ML estimates.

3.2.2 Periodogram Estimation

An alternative method of estimating $\alpha$, $\delta$ and $\sigma^2$ is to take
\[
\log f_X(\omega) = -\delta\log\left(1 - 2\alpha\cos\omega + \alpha^2\right) + C
\]
and let $I_X(\omega)$ be an estimate (based on the periodogram) of $f_X(\omega)$. Then we have
\[
\log I_X(\omega) = -\delta\log\left(1 - 2\alpha\cos\omega + \alpha^2\right) + \log\frac{I_X(\omega)}{f_X(\omega)} + C.
\]
Now, because the $I_X(\omega_j)$ are (approximately) uncorrelated random variables, we can write this as a regression-type equation by letting
\[
y_j = \log I_X(\omega_j), \quad x_j = \log\left(1 - 2\alpha\cos\omega_j + \alpha^2\right), \quad b = -\delta, \quad e_j = \log\frac{I_X(\omega_j)}{f_X(\omega_j)}, \quad a = C.
\]
Putting the above together we have $y_j = a + b\,x_j + e_j$.
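The GAR(1) spectral density is a one-liner. A small sketch (function names my own) verifying that $\delta = 1$ reproduces the AR(1) form $|1 - \alpha e^{-i\omega}|^{-2}\,\sigma^2/(2\pi)$:

```python
import cmath, math

def gar1_sdf(omega: float, alpha: float, delta: float, sigma2: float = 1.0) -> float:
    """Spectral density of GAR(1): (1 - 2*alpha*cos(w) + alpha^2)^(-delta) * sigma2/(2*pi)."""
    return (1 - 2 * alpha * math.cos(omega) + alpha**2) ** (-delta) * sigma2 / (2 * math.pi)

# Sanity check: delta = 1 matches the AR(1) spectral density computed from
# the transfer function directly.
alpha = 0.5
for omega in (0.1, 1.0, 2.5):
    ar1 = abs(1 - alpha * cmath.exp(-1j * omega)) ** (-2) / (2 * math.pi)
    assert math.isclose(gar1_sdf(omega, alpha, 1.0), ar1)
```

Raising the AR(1) spectrum to the power $\delta$ only rescales the log-spectrum, which is exactly what the periodogram regression above exploits.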
We can then estimate the parameters by minimising the sum of squared errors
\[
\sum_j e_j^2 = \sum_j \left(y_j - a - b\,x_j\right)^2,
\]
but we cannot differentiate directly, as the parameter $\alpha$ is embedded in $x_j$; therefore we use numerical minimisation. In other words, since there is no explicit solution, we need to use a numerical method.

3.2.3 Whittle Estimation Procedure

See Shitan and Peiris (2008).

3.3 Generalised Moving Average Models

Standard MA(1) models are defined as $X_t = Z_t - \theta Z_{t-1} = (1 - \theta B)Z_t$. We won't go into much detail, but we can write the Generalised Moving Average process of order 1, GMA(1), as
\[
X_t = (1 - \theta B)^{\delta} Z_t.
\]
We can get all of our results, as with the GAR(1), by changing the sign of $\delta$ and replacing $\alpha$ by $\theta$.
Chapter 4: Analysis and Applications of Long Memory Time Series

Let $\{X_t\}$ be a stationary time series and let $\rho_k$ be its ACF at lag $k$. $\{X_t\}$ is said to have:

1. No memory if $\rho_k = 0$ for all $k \ne 0$.
2. Short memory if $\rho_k \approx \phi^k$, $|\phi| < 1$; that is, $\rho_k$ decays to zero at an exponential rate. In this case, it is clear that there exists a constant $A > 1$ such that $A^k \rho_k \to c$ as $k \to \infty$. Equivalently, $\rho_k \approx c\phi^k$ and hence $\sum |\rho_k| < \infty$.
3. Long memory if $\rho_k \approx k^{-\delta}$, $0 < \delta < 1$; that is, $\rho_k$ decays slowly at a hyperbolic rate. In this case, $k^{\delta}\rho_k \to c$ as $k \to \infty$.

4.1 Properties of Long Memory Time Series

With an acf $\rho_k \approx k^{-\delta}$, $0 < \delta < 1$, we can check whether $\sum \rho_k$ is convergent by recalling that
\[
\sum_r \frac{1}{r^p} \ \text{is convergent for } p > 1 \text{ and divergent for } p \le 1.
\]
Thus $\sum \rho_k \approx \sum k^{-\delta}$ is divergent, as $0 < \delta < 1$. So $\sum \rho_k = \infty$; however, $\rho_k \to 0$ as $k \to \infty$, so we still have a stationary process. Note that the acf gives an indication of the memory type of the underlying process. We
can also look at the spectral properties. The sdf of a stationary process $\{X_t\}$ is
\[
f_X(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty} \rho_k\,e^{-i\omega k} = \frac{1}{2\pi}\left[1 + 2\sum_{k=1}^{\infty}\rho_k\cos\omega k\right], \quad -\pi < \omega < \pi.
\]

1. No memory processes have $\rho_k = 0$ for $k \ne 0$, which implies
\[
f_X(\omega) = \frac{1}{2\pi}, \quad -\pi < \omega < \pi.
\]
Thus the spectrum is uniform over $(-\pi, \pi)$.
2. Short memory processes, such as ARMA-type processes, have a bounded sdf for all $\omega$. If we look at the sdf at $\omega = 0$,
\[
f_X(0) = \frac{1}{2\pi}\left[1 + 2\sum_{k=1}^{\infty}\rho_k\right] = c < \infty.
\]
So for short memory type processes, we get a bounded point on the $f_X(\omega)$ axis.
3. Long memory type processes have a discontinuity at $\omega = 0$:
\[
f_X(0) = \frac{1}{2\pi}\left[1 + 2\sum_{k=1}^{\infty}\rho_k\right] = \infty.
\]
Thus the sdf is unbounded at $\omega = 0$. Note that we typically plot $\omega$ on the x-axis and $f_X(\omega)$ on the y-axis, so a short memory process will be continuous when crossing the y-axis, but a long memory process will only approach the y-axis asymptotically.

4.2 Identification

1. Calculate the sample acf, $r_k$.
2. Calculate the sample sdf, or periodogram, using
\[
I_n(\omega_j) = \frac{1}{2\pi n}\left|\sum_{t=1}^{n} x_t\,e^{-it\omega_j}\right|^2, \quad -\pi < \omega_j = \frac{2\pi j}{n} < \pi.
\]
3. Hurst coefficient. The famous hydrologist Hurst (1951) noticed that there is a relationship between the rescaled adjusted range, $R$, and long memory, or persistent, time series.
The rescaled adjusted range is defined as
\[
R = \max_{1\le i\le n}\left[Y_i - Y_1 - \frac{i}{n}\left(Y_n - Y_1\right)\right] - \min_{1\le i\le n}\left[Y_i - Y_1 - \frac{i}{n}\left(Y_n - Y_1\right)\right]
\]
where $Y_i = \sum_{t=1}^{i} X_t$. We further define the adjusted standard deviation as
\[
S = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}.
\]
Plots of $\log(R/S)$ against $\log n$ were observed for many time series, and it was noticed that for persistent time series (a) the points were scattered around a straight line, and (b) the slope was always greater than one half. Therefore we can write a regression line
\[
\log\frac{R}{S} = a + H\log n,
\]
where the slope coefficient $H$ is called the Hurst coefficient. An estimate is
\[
\hat{H} = \frac{\log(R/S)}{\log n}.
\]

4.3 Fractional Integration and Long Memory Time Series Modelling

In many time series problems, we usually consider the order of differencing, $d$, as either 0, 1 or 2; that is, the $d$-th difference of $\{X_t\}$ is stationary. Let
\[
Y_t = (1 - B)^d X_t.
\]
A notation: $X_t \sim I(d)$. If $\{X_t\} \sim I(0)$ then no differencing is required to get a stationary series and the acf decays exponentially. If $X_t \sim I(1)$ then $X_t$ is non-stationary; for example, if $X_t = a + bt + Z_t$ then $E(X_t - X_{t-1}) = b$. Further, if $X_t \sim I(2)$, for example $X_t = a + bt + ct^2 + Z_t$, then letting $u_t = X_t - X_{t-1}$, the second difference $v_t = u_t - u_{t-1}$ has constant mean. Thus, we cannot accommodate the hyperbolic decay of the acf by integer differencing. It has been noticed that when $-\tfrac{1}{2} < d < \tfrac{1}{2}$ and $d \ne 0$, one can accommodate this hyperbolic decay of the acf. In particular, we will show that when $0 < d < \tfrac{1}{2}$, the process $\{X_t\}$ satisfying $(1 - B)^d X_t = Z_t$ is a stationary, long memory time series. Note that in general, when $-\tfrac{1}{2} < d < \tfrac{1}{2}$, applying $(1 - B)^d$ is called fractional differencing (or integration).
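The R/S calculation above can be sketched in a few lines. This is a simplified single-span variant (computing cumulative deviations from the sample mean, which absorbs the $Y_1$ and trend adjustments); function names are my own, and for serious use one would regress $\log(R/S)$ on $\log n$ over many sub-spans rather than use a single span:

```python
import math, random

def rs_hurst(x):
    """Single-span rescaled-range estimate H_hat = log(R/S) / log(n)."""
    n = len(x)
    mean = sum(x) / n
    # Cumulative deviations from the mean (the 'adjusted' partial sums).
    y, c = [], 0.0
    for v in x:
        c += v - mean
        y.append(c)
    r = max(y) - min(y)                                 # adjusted range R
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # adjusted std dev S
    return math.log(r / s) / math.log(n)

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(2000)]
h = rs_hurst(noise)
print(round(h, 2))  # for white noise this is typically near one half
```

A persistent (long memory) series would give a value noticeably above one half.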
4.3.1 FARIMA(p, d, q)

If we fit an ARMA(p, q) model to $\{Y_t\}$, where $Y_t = (1 - B)^d X_t$, we get
\[
\Phi(B)Y_t = \Theta(B)Z_t
\]
where $\{Z_t\} \sim WN(0, \sigma^2)$ and $-\tfrac{1}{2} < d < \tfrac{1}{2}$; this is what's called a fractionally integrated ARIMA, or FARIMA(p, d, q), model. To develop the theory, we first look at FARIMA(0, d, 0), or FDWN (fractionally differenced white noise), given by
\[
(1 - B)^d X_t = Z_t \quad\text{or}\quad \nabla^d X_t = Z_t.
\]
Note that from here on we take $d$ as a fractional real number with $-\tfrac{1}{2} < d < \tfrac{1}{2}$.

Theorem 4.3.1.
1. When $d < \tfrac{1}{2}$, $\{X_t\}$ is stationary and equivalent to
\[
X_t = \nabla^{-d} Z_t = \sum_{j=0}^{\infty}\psi_j Z_{t-j}, \quad\text{where}\quad \psi_j = \frac{\Gamma(j+d)}{\Gamma(j+1)\,\Gamma(d)}.
\]
2. When $d > -\tfrac{1}{2}$, $\{X_t\}$ is invertible and equivalent to
\[
\sum_{j=0}^{\infty}\pi_j X_{t-j} = Z_t, \quad\text{where}\quad \pi_j = \frac{\Gamma(j-d)}{\Gamma(j+1)\,\Gamma(-d)}.
\]
Thus you need $-\tfrac{1}{2} < d < \tfrac{1}{2}$; otherwise you still have fractional integration, but you'll have stationarity without invertibility, or invertibility without stationarity.

Proof. 1. Recall that $\operatorname{var}(X_t) = \sigma^2 \sum_j \psi_j^2$ and that
\[
\frac{\Gamma(x+a)}{\Gamma(x+b)} \approx x^{a-b} \quad\text{for large } x,
\]
so $\psi_j \propto j^{d-1}$ for large $j$, and therefore
\[
\operatorname{var}(X_t) \approx \sigma^2 \sum_j j^{2(d-1)}.
\]
Now, recalling that $\sum_r r^{-p}$ is convergent for $p > 1$ and divergent for $p \le 1$, we require $\operatorname{var}(X_t) < \infty$, which will only occur if $-2(d-1) > 1$, i.e. $d < \tfrac{1}{2}$.
2. Similarly, for invertibility we need $\sum_j \pi_j^2 < \infty$, which only occurs when $d > -\tfrac{1}{2}$. ∎

Exercise 4.3.1. Show that the sdf of fractionally differenced white noise is given by
\[
f_X(\omega) = \frac{\sigma^2}{2\pi}\left[2\sin\frac{\omega}{2}\right]^{-2d}.
\]
Recall that if $X_t$ and $Y_t$ are stationary with $Y_t = G(B)X_t$, then $f_Y(\omega) = |G(e^{-i\omega})|^2 f_X(\omega)$. So with $X_t = (1-B)^{-d}Z_t = G(B)Z_t$ we have
\[
f_X(\omega) = |1 - e^{-i\omega}|^{-2d} f_Z(\omega) = |(1 - \cos\omega) + i\sin\omega|^{-2d} f_Z(\omega) = \left[(1-\cos\omega)^2 + \sin^2\omega\right]^{-d} f_Z(\omega)
\]
\[
= \left[2 - 2\cos\omega\right]^{-d} f_Z(\omega) = \left[4\sin^2\frac{\omega}{2}\right]^{-d} f_Z(\omega) = \left[2\sin\frac{\omega}{2}\right]^{-2d} f_Z(\omega).
\]
Note that we've used the trigonometric identity $\cos 2A = 1 - 2\sin^2 A$.

Note that for $0 < d < \tfrac{1}{2}$, $f_X(\omega) \to \infty$ as $\omega \to 0$. We can also see this by noting that for small $\omega$, $\sin\frac{\omega}{2} \approx \frac{\omega}{2}$, so near zero $f_X(\omega) \approx \omega^{-2d} f_Z(\omega)$, and as $\omega \to 0$ we have a discontinuity. Note that when $-\tfrac{1}{2} < d < 0$, $f_X(\omega)$ exists near $\omega = 0$; therefore the series has short memory behaviour. It is clear that a FARIMA process has long memory properties when $0 < d < \tfrac{1}{2}$.

Exercise: Let $\gamma_k$ be the acvf at lag $k$ of a FDWN. Show that
1. $\gamma_k = \sigma^2\,(-1)^k\,\dfrac{\Gamma(1-2d)}{\Gamma(-d+k+1)\,\Gamma(-d-k+1)}$;
2. $\rho_k = \dfrac{\Gamma(k+d)\,\Gamma(1-d)}{\Gamma(k-d+1)\,\Gamma(d)}$.

Exercise: Find the pacf at lags 1 and 2 of a FDWN. Ans: $\dfrac{d}{1-d}$, $\dfrac{d}{2-d}$.
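The closed forms in these exercises can be checked numerically. A sketch (helper names my own) verifying that $\rho_1$ matches the stated lag-1 pacf $d/(1-d)$, and that $k^{1-2d}\rho_k$ settles to a constant — the hyperbolic decay that defines long memory:

```python
import math

def rho_fdwn(k: int, d: float) -> float:
    """acf of FDWN: rho_k = Gamma(k+d) Gamma(1-d) / (Gamma(k-d+1) Gamma(d)),
    computed in log space (all arguments positive for 0 < d < 1/2, k >= 0)."""
    return math.exp(math.lgamma(k + d) + math.lgamma(1 - d)
                    - math.lgamma(k - d + 1) - math.lgamma(d))

d = 0.3
# Lag-1 value agrees with the pacf answer d / (1 - d).
assert math.isclose(rho_fdwn(1, d), d / (1 - d))

# Hyperbolic decay: k^(1-2d) * rho_k approaches the constant Gamma(1-d)/Gamma(d).
vals = [k ** (1 - 2 * d) * rho_fdwn(k, d) for k in (100, 200, 400)]
assert max(vals) - min(vals) < 0.01
print([round(v, 4) for v in vals])
```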
4.3.2 Estimation of d

First note that
\[
f_X(\omega) = \left[4\sin^2\frac{\omega}{2}\right]^{-d} f_Z(\omega),
\]
therefore taking logs we have
\[
\log f_X(\omega) = -d\log\left[4\sin^2\frac{\omega}{2}\right] + \log f_Z(\omega).
\]
Let $I_X(\omega)$ be the periodogram based on $n$ observations. Now,
\[
\log I_X(\omega) = -d\log\left[4\sin^2\frac{\omega}{2}\right] + \log\frac{I_X(\omega)}{f_X(\omega)} + \log f_Z(\omega)
\]
\[
= \log f_Z(0) - d\log\left[4\sin^2\frac{\omega}{2}\right] + \log\frac{I_X(\omega)}{f_X(\omega)} + \log\frac{f_Z(\omega)}{f_Z(0)}.
\]
Since $Z_t \sim WN(0, \sigma^2)$, $f_Z(\omega) = \frac{\sigma^2}{2\pi}$ for $-\pi < \omega < \pi$. So in this case $\frac{f_Z(\omega)}{f_Z(0)} = 1$ and $\log 1 = 0$, so we can drop this term. (In general this will hold for $\omega$ near 0.) So, letting
\[
y_j = \log I_X(\omega_j), \quad x_j = \log\left[4\sin^2\frac{\omega_j}{2}\right], \quad b = -d, \quad \varepsilon_j = \log\frac{I_X(\omega_j)}{f_X(\omega_j)}, \quad a = \log f_Z(0),
\]
where $\omega_j = \frac{2\pi j}{n}$, we can write
\[
y_j = a + b\,x_j + \varepsilon_j
\]
where the distribution of $\varepsilon_j$ is unknown. In practice we set $j = 1, 2, \ldots, m < n$.

4.3.3 Distribution of $\varepsilon_j$

We've seen that for short memory time series,
\[
I_X(\omega_j) \overset{\text{asy}}{\sim} \tfrac{1}{2}\,f_X(\omega_j)\,\chi^2_2; \quad j \ne 0, \tfrac{n}{2}.
\]

Theorem 4.3.2. Let $X_t$ be a process generated by $(1-B)^d X_t = Z_t$ with $-\tfrac{1}{2} < d < 0$. Then for large $n$,
\[
\log\frac{I_X(\omega_j)}{f_X(\omega_j)}; \quad j \ne 0, \tfrac{n}{2},
\]
follows an independent Gumbel-type distribution with mean $-\delta_E$, where $\delta_E = 0.577216$ is Euler's constant, and variance $\frac{\pi^2}{6}$. Note that $Y$ is said to have a Gumbel distribution with parameters $\alpha$ and $\beta$ if the cdf of $Y$ is
\[
F_Y(y) = \exp\left\{-e^{-\left(\frac{y-\alpha}{\beta}\right)}\right\}.
\]
In this case, $E(Y) = \alpha + \beta\delta_E$ and $\operatorname{var}(Y) = \frac{\pi^2\beta^2}{6}$.

Proof. When $d < 0$,
\[
\frac{I_X(\omega_j)}{f_X(\omega_j)} \sim \tfrac{1}{2}\chi^2_2 \equiv \exp(1),
\]
with pdf $L_X(x) = e^{-x}$, $x > 0$, and $E(X) = 1$, $\operatorname{var}(X) = 1$. Let
\[
U = \log\frac{I_X(\omega_j)}{f_X(\omega_j)};
\]
then
\[
P(U > u) = P\left(\frac{I_X(\omega_j)}{f_X(\omega_j)} > e^u\right) = \exp\left\{-e^u\right\},
\]
noting that if $X \sim \exp(\lambda)$ then $P(X > x) = e^{-\lambda x}$. Hence $-U$ has cdf $P(-U \le v) = \exp\{-e^{-v}\}$, a Gumbel distribution with $\alpha = 0$ and $\beta = 1$; therefore $E(U) = -\delta_E$ and $\operatorname{var}(U) = \frac{\pi^2}{6}$, neither of which depends on $j$. We therefore have an iid sequence and can use standard regression techniques. ∎

So our model is $y_j = a + b\,x_j + \varepsilon_j$. Our least squares estimate of $b$ is
\[
\hat{b} = \frac{\sum_{j=1}^{m}(x_j - \bar{x})\,y_j}{\sum_{j=1}^{m}(x_j - \bar{x})^2}, \quad m < n, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}.
\]
It is easy to see that
\[
\operatorname{var}(\hat{b}) = \frac{\sigma_\varepsilon^2}{\sum (x_j - \bar{x})^2} = \frac{\pi^2/6}{\sum (x_j - \bar{x})^2}.
\]
Let $\hat{d}_p$ be the corresponding estimate of $d$ based on the periodogram approach: $\hat{d}_p = -\hat{b}$.
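The whole log-periodogram regression can be sketched end to end: simulate FDWN by truncating the MA($\infty$) representation, compute the periodogram at the Fourier frequencies, and regress $\log I_X(\omega_j)$ on $\log[4\sin^2(\omega_j/2)]$ for $j = 1, \ldots, m$. This is an illustration with my own function names and arbitrary truncation and seed choices, not a production estimator:

```python
import cmath, math, random

def periodogram(x, j):
    """I(w_j) = |sum_t x_t e^{-i t w_j}|^2 / (2 pi n), with w_j = 2 pi j / n."""
    n = len(x)
    w = 2 * math.pi * j / n
    s = sum(v * cmath.exp(-1j * w * (t + 1)) for t, v in enumerate(x))
    return abs(s) ** 2 / (2 * math.pi * n)

def estimate_d(x, m):
    """Regress y_j = log I(w_j) on x_j = log(4 sin^2(w_j/2)); return d_hat = -b_hat."""
    n = len(x)
    xs, ys = [], []
    for j in range(1, m + 1):
        w = 2 * math.pi * j / n
        xs.append(math.log(4 * math.sin(w / 2) ** 2))
        ys.append(math.log(periodogram(x, j)))
    xbar, ybar = sum(xs) / m, sum(ys) / m
    b = sum((u - xbar) * y for u, y in zip(xs, ys)) / sum((u - xbar) ** 2 for u in xs)
    return -b

# Simulate FDWN via a truncated MA(inf): X_t = sum_j psi_j Z_{t-j},
# psi_j = Gamma(j+d) / (Gamma(j+1) Gamma(d)), built by recursion.
random.seed(42)
d_true, n, trunc = 0.3, 512, 1000
psi = [1.0]
for j in range(1, trunc):
    psi.append(psi[-1] * (j - 1 + d_true) / j)
z = [random.gauss(0, 1) for _ in range(n + trunc)]
x = [sum(p * z[t + trunc - j] for j, p in enumerate(psi)) for t in range(n)]

d_hat = estimate_d(x, m=int(n ** 0.5))  # m = n^0.5, the popular choice
print(round(d_hat, 3))
```

With only $m \approx \sqrt{n}$ frequencies the estimate is noisy, consistent with the variance formula for $\hat{b}$ above.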
Under certain conditions and for large $n$, we have the usual result:
\[
\frac{\hat{d}_p - d}{SE(\hat{d}_p)} \sim N(0, 1).
\]
Now, in the general case we have $X_t = (1-B)^{-d}\,Y_t$ with
\[
f_X(\omega) = \left[4\sin^2\frac{\omega}{2}\right]^{-d} f_Y(\omega)
\]
and
\[
\log f_X(\omega) = -d\log\left[4\sin^2\frac{\omega}{2}\right] + \log f_Y(\omega),
\]
thus
\[
\log I_X(\omega) = -d\log\left[4\sin^2\frac{\omega}{2}\right] + \log\frac{I_X(\omega)}{f_X(\omega)} + \log f_Y(\omega).
\]
Note that at this stage we don't have a regression, as $\log f_Y(\omega)$ is not a constant; therefore we add and subtract $\log f_Y(0)$:
\[
\log I_X(\omega) = \log f_Y(0) - d\log\left[4\sin^2\frac{\omega}{2}\right] + \log\frac{I_X(\omega)}{f_X(\omega)} + \log\frac{f_Y(\omega)}{f_Y(0)}.
\]
Now, $\log\frac{f_Y(\omega_j)}{f_Y(0)}$ is not a constant for all $\omega_j$ (as it was previously), but it is approximately constant when $\omega_j \approx 0$. Thus, this can be considered as a simple linear regression equation if we consider the values of $\omega$ near the origin. In this case,
\[
y_j = a + b\,x_j + \varepsilon_j; \quad \omega_j = \frac{2\pi j}{n}; \quad j = 1, 2, \ldots, m \ll n.
\]
In practice we usually take $m = n^{\alpha}$ with $0 < \alpha < 1$; a popular choice is $\alpha = 0.5$.

4.3.4 Estimation of d using a smoothed periodogram

We can also estimate $d$ using a smoothed, or weighted, periodogram. Let
\[
\hat{f}_s(\omega) = \frac{1}{2\pi}\sum_{|s| < m}\lambda(s)\,\hat{R}(s)\,e^{-i\omega s}
\]
be a smoothed periodogram based on a suitable lag window. In this case,
\[
\frac{\nu\,\hat{f}_s(\omega)}{f(\omega)} \overset{a}{\sim} \chi^2_\nu, \quad\text{where}\quad \nu = \frac{2n}{\sum_{|s|<n}\lambda(s)^2}.
\]

References:
Reisen, V. (1994). J. Time Series Analysis, 15(3), 335–350.
Chen, Abraham and Peiris (1994). J. Time Series Analysis, 15(5), 473–487.
Generalized Fractional Processes