Levinson-Durbin Recursions: I

- note: B&D and S&S say "Durbin-Levinson," but "Levinson-Durbin" is more commonly used (Levinson, 1947, and Durbin, 1960, are the source articles; sometimes just "Levinson" is used)
- recursions solve $\Gamma_n \mathbf{a}_n = \boldsymbol{\gamma}_n(1)$ efficiently, giving us the coefficients $\mathbf{a}_n$ needed for the best linear predictor $\hat{X}_{n+1} = \mathbf{a}_n^T\mathbf{X}_n$ of $X_{n+1}$ given $\mathbf{X}_n = [X_n, \ldots, X_1]^T$
- in doing so, L-D recursions also give us the coefficients $\mathbf{a}_m$ for $\hat{X}_{m+1} = \mathbf{a}_m^T\mathbf{X}_m$, $m = 1, \ldots, n-1$, the best linear predictor of $X_{m+1}$ given $\mathbf{X}_m = [X_m, \ldots, X_1]^T$, along with the partial autocorrelation function (PACF), also known as the partial autocorrelation sequence or reflection coefficient sequence
- will state L-D recursions without proof (B&D have one; S&S leave it as an exercise; an alternative proof will be given in Stat/EE 520)

BD 69, CC 113, SS 112, 165    XI 1
Levinson-Durbin Recursions: II

- to keep track of best linear predictors as the sample size n increases (and to emphasize certain connections with AR processes), will switch notation from $\mathbf{a}_n$ to $\boldsymbol{\phi}_n$
- henceforth we now write
$$\hat{X}_{n+1} = \phi_{n,1}X_n + \phi_{n,2}X_{n-1} + \cdots + \phi_{n,n}X_1 = \boldsymbol{\phi}_n^T\mathbf{X}_n,$$
where $\boldsymbol{\phi}_n \equiv [\phi_{n,1}, \phi_{n,2}, \ldots, \phi_{n,n}]^T$
- simplify $\boldsymbol{\gamma}_n(1)$ to just $\boldsymbol{\gamma}_n$ so that $\boldsymbol{\gamma}_n = [\gamma(1), \gamma(2), \ldots, \gamma(n)]^T$
- in new notation, L-D recursions solve for $\boldsymbol{\phi}_n$ in $\Gamma_n\boldsymbol{\phi}_n = \boldsymbol{\gamma}_n$
- recall that $\Gamma_n$ is the covariance matrix for $\mathbf{X}_n$, so its $(i,j)$th element is $\mathrm{cov}\{X_i, X_j\} = \gamma(i-j)$

XI 2
Levinson-Durbin Recursions: III

- referring back to overhead X 13, will denote the mean square error (MSE) associated with the predictor $\hat{X}_{n+1}$ as
$$v_n \equiv E\{(X_{n+1} - \hat{X}_{n+1})^2\} = \mathrm{var}\{X_{n+1} - \hat{X}_{n+1}\}$$
$$= \gamma(0) - \boldsymbol{\phi}_n^T\boldsymbol{\gamma}_n = \mathrm{var}\{X_{n+1}\} - \boldsymbol{\phi}_n^T\,\mathrm{cov}\{X_{n+1}, \mathbf{X}_n\}$$

BD 69, 70    XI 3
Levinson-Durbin Recursions: IV

- for n = 1, have $\hat{X}_2 = \phi_{1,1}X_1$
- equation $\Gamma_n\boldsymbol{\phi}_n = \boldsymbol{\gamma}_n$ becomes $\gamma(0)\phi_{1,1} = \gamma(1)$; solution is $\phi_{1,1} = \gamma(1)/\gamma(0) = \rho(1)$  (*)
- associated MSE is
$$v_1 = \gamma(0) - \boldsymbol{\phi}_1^T\boldsymbol{\gamma}_1 = \gamma(0) - \phi_{1,1}\gamma(1) = \gamma(0) - \phi_{1,1}[\phi_{1,1}\gamma(0)] \quad\text{(making use of (*))}$$
$$= \gamma(0)(1 - \phi_{1,1}^2) = v_0(1 - \phi_{1,1}^2), \quad\text{with } v_0 \equiv \gamma(0)$$
- Q: why is $\gamma(0)$ a natural definition for $v_0$?
- note connection to AR(1) model $X_t = \phi_{1,1}X_{t-1} + Z_t$ with $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2(1 - \phi_{1,1}^2))$, for which $\gamma(0) = \sigma^2$

BD 70, SS 112    XI 4
Levinson-Durbin Recursions: V

- given $\boldsymbol{\phi}_{n-1}$ & $v_{n-1}$, L-D recursion gets $\boldsymbol{\phi}_n$ & $v_n$ in 3 steps (see the code sketch following this overhead)
1. get nth order partial autocorrelation (more on this later!):
$$\phi_{n,n} = \frac{\gamma(n) - \sum_{j=1}^{n-1}\phi_{n-1,j}\,\gamma(n-j)}{v_{n-1}}$$
note: sum is the inner product of $\boldsymbol{\phi}_{n-1}$ & the order reversal of $\boldsymbol{\gamma}_{n-1}$
2. get remaining $\phi_{n,j}$'s:
$$\begin{bmatrix}\phi_{n,1}\\ \vdots\\ \phi_{n,n-1}\end{bmatrix} = \begin{bmatrix}\phi_{n-1,1}\\ \vdots\\ \phi_{n-1,n-1}\end{bmatrix} - \phi_{n,n}\begin{bmatrix}\phi_{n-1,n-1}\\ \vdots\\ \phi_{n-1,1}\end{bmatrix}$$
3. get nth order MSE: $v_n = v_{n-1}(1 - \phi_{n,n}^2)$

BD 70, SS 112    XI 5
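to make the three steps concrete, here is a minimal R sketch of the recursions (the function name and interface are mine, not taken from the course code): given $\gamma(0), \ldots, \gamma(n)$, it returns $\phi_{n,1}, \ldots, \phi_{n,n}$, the partial autocorrelations $\phi_{1,1}, \ldots, \phi_{n,n}$, and the MSEs $v_0, \ldots, v_n$

```r
# Minimal sketch of the Levinson-Durbin recursions; `acvf` holds
# gamma(0), gamma(1), ..., gamma(n) in positions 1 through n+1.
levinson_durbin <- function(acvf) {
  n <- length(acvf) - 1
  v <- numeric(n + 1)
  v[1] <- acvf[1]                      # v_0 = gamma(0)
  phi <- numeric(0)                    # holds phi_{m,1}, ..., phi_{m,m}
  pacf <- numeric(n)
  for (m in 1:n) {
    # step 1: m-th partial autocorrelation; the sum is the inner
    # product of phi_{m-1} with the order reversal of gamma_{m-1}
    num <- acvf[m + 1]
    if (m > 1) num <- num - sum(phi * acvf[m:2])
    phi_mm <- num / v[m]
    # step 2: phi_{m,j} = phi_{m-1,j} - phi_{m,m} * phi_{m-1,m-j}
    if (m > 1) phi <- phi - phi_mm * rev(phi)
    phi <- c(phi, phi_mm)
    pacf[m] <- phi_mm
    # step 3: m-th order MSE
    v[m + 1] <- v[m] * (1 - phi_mm^2)
  }
  list(phi = phi, pacf = pacf, v = v)
}
```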
Levinson-Durbin Recursions: VI

- as a first example, reconsider the AR(1) process $X_t = \phi X_{t-1} + Z_t$, where $|\phi| < 1$ and $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$
- have already argued (X 14) that $\hat{X}_{n+1} = \phi X_n$, i.e., $\boldsymbol{\phi}_n = [\phi, 0, \ldots, 0]^T$, and $v_n = \sigma^2$ for all n since the MSE is $\sigma^2$
- accordingly, let's apply the L-D recursions to $\boldsymbol{\phi}_{n-1} = [\phi, 0, \ldots, 0]^T$ & $v_{n-1} = \sigma^2$ and see if the required forms for $\boldsymbol{\phi}_n$ and $v_n$ pop out
- step 1: recalling that $\gamma(h) = \sigma^2\phi^h/(1 - \phi^2)$ for $h \ge 0$, we have
$$\phi_{n,n} = \frac{\gamma(n) - \sum_{j=1}^{n-1}\phi_{n-1,j}\,\gamma(n-j)}{v_{n-1}} = \frac{\sigma^2(\phi^n - \phi_{n-1,1}\,\phi^{n-1})}{v_{n-1}(1 - \phi^2)} = \frac{\sigma^2(\phi^n - \phi^n)}{v_{n-1}(1 - \phi^2)} = 0$$

XI 6
Levinson-Durbin Recursions: VII

- step 2 yields
$$\begin{bmatrix}\phi_{n,1}\\ \phi_{n,2}\\ \vdots\\ \phi_{n,n-2}\\ \phi_{n,n-1}\end{bmatrix} = \begin{bmatrix}\phi_{n-1,1}\\ \phi_{n-1,2}\\ \vdots\\ \phi_{n-1,n-2}\\ \phi_{n-1,n-1}\end{bmatrix} - \phi_{n,n}\begin{bmatrix}\phi_{n-1,n-1}\\ \phi_{n-1,n-2}\\ \vdots\\ \phi_{n-1,2}\\ \phi_{n-1,1}\end{bmatrix} = \begin{bmatrix}\phi\\ 0\\ \vdots\\ 0\\ 0\end{bmatrix} - 0\cdot\begin{bmatrix}0\\ 0\\ \vdots\\ 0\\ \phi\end{bmatrix},$$
so $\boldsymbol{\phi}_n = [\phi, 0, \ldots, 0]^T$ as required

XI 7
Levinson-Durbin Recursions: VIII

- step 3: $v_n = v_{n-1}(1 - \phi_{n,n}^2) = v_{n-1} = \sigma^2$, as required
- note: partial autocorrelation $\phi_{n,n}$ for an AR(1) process is $\phi$ for n = 1 and is zero for n = 2, 3, ...
- homework exercise: run L-D recursions on an MA(1) process
- as a 2nd example, reconsider the stationary process of Problem 3(b): $X_t = Z_1\cos(\omega t) + Z_2\sin(\omega t)$, where $Z_1$ and $Z_2$ are independent $\mathcal{N}(0,1)$ RVs
- ACVF for $\{X_t\}$ is $\gamma(h) = \cos(\omega h)$ (since $\gamma(0) = 1$, the same is its ACF $\rho(h)$)
- starting with $\hat{X}_2 = \phi_{1,1}X_1$ (n = 1 case), we have $\phi_{1,1} = \rho(1) = \cos(\omega)$ and $v_1 = \gamma(0)(1 - \phi_{1,1}^2) = 1 - \cos^2(\omega) = \sin^2(\omega)$

XI 8
Levinson-Durbin Recursions: IX

- now let us get the coefficients for $\hat{X}_3 = \phi_{2,1}X_2 + \phi_{2,2}X_1$ (n = 2 case) using the L-D recursions
- first step,
$$\phi_{n,n} = \frac{\gamma(n) - \sum_{j=1}^{n-1}\phi_{n-1,j}\,\gamma(n-j)}{v_{n-1}},$$
yields, for n = 2 (recalling $\gamma(h) = \cos(\omega h)$ & $\phi_{1,1} = \cos(\omega)$),
$$\phi_{2,2} = \frac{\gamma(2) - \phi_{1,1}\gamma(1)}{v_1} = \frac{\cos(2\omega) - \cos(\omega)\cos(\omega)}{\sin^2(\omega)} = -1$$
because of the trig identity $\cos(2\omega) - \cos^2(\omega) = -\sin^2(\omega)$

XI 9
Levinson-Durbin Recursions: X

- second step of the L-D recursions, namely,
$$\begin{bmatrix}\phi_{n,1}\\ \vdots\\ \phi_{n,n-1}\end{bmatrix} = \begin{bmatrix}\phi_{n-1,1}\\ \vdots\\ \phi_{n-1,n-1}\end{bmatrix} - \phi_{n,n}\begin{bmatrix}\phi_{n-1,n-1}\\ \vdots\\ \phi_{n-1,1}\end{bmatrix},$$
yields, for n = 2, $\phi_{2,1} = \phi_{1,1} - \phi_{2,2}\phi_{1,1} = \cos(\omega)[1 - (-1)] = 2\cos(\omega)$
- third step of the L-D recursions, namely, $v_n = v_{n-1}(1 - \phi_{n,n}^2)$, yields, for n = 2, $v_2 = v_1[1 - (-1)^2] = 0$
- thus $X_3$ is perfectly predictable given $X_2$ & $X_1$: $\hat{X}_3 = 2\cos(\omega)X_2 - X_1 = X_3$
- thus, for all t, $X_t$ is perfectly predictable given $X_{t-1}$ & $X_{t-2}$: $\hat{X}_t = 2\cos(\omega)X_{t-1} - X_{t-2} = X_t$ (Q: why?)

BD 77    XI 10
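as a numerical check, running the levinson_durbin sketch from overhead XI 5's code on this ACVF reproduces the n = 2 results above (the choice $\omega = \pi/6$ is mine, purely for illustration)

```r
omega <- pi / 6
out <- levinson_durbin(cos(omega * (0:2)))  # gamma(0), gamma(1), gamma(2)
out$phi   # phi_{2,1} = 2*cos(omega) ~ 1.732051, phi_{2,2} = -1
out$v     # v_0 = 1, v_1 = sin^2(omega) = 0.25, v_2 = 0
```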
Aside: Step-Down Levinson-Durbin Recursions: I

- application of the L-D recursions to the AR(p) process
$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + Z_t$$
yields, for $n \ge p$,
$$\hat{Y}_{n+1} = \phi_{n,1}Y_n + \cdots + \phi_{n,n}Y_1 = \phi_1 Y_n + \cdots + \phi_p Y_{n-p+1},$$
i.e., $\hat{Y}_{n+1}$ only depends on the p most recent values and, when n > p, not on the remote values $Y_{n-p}, \ldots, Y_1$
- associated prediction error is $Y_{n+1} - \hat{Y}_{n+1} = Y_{n+1} - \phi_1 Y_n - \cdots - \phi_p Y_{n-p+1} = Z_{n+1}$, so the MSE is $v_n = \mathrm{var}\{Y_{n+1} - \hat{Y}_{n+1}\} = \mathrm{var}\{Z_{n+1}\} = \sigma^2$
- given $\phi_{p,1} = \phi_1$, $\phi_{p,2} = \phi_2$, ..., $\phi_{p,p} = \phi_p$ and $\sigma^2$, can invert the L-D recursions to get the coefficients for the best linear predictors of orders $p-1, p-2, \ldots, 1$ and the associated MSEs

XI 11
Aside: Step-Down Levinson-Durbin Recursions: II

- given $\phi_{h,1}, \ldots, \phi_{h,h}$ & $v_h$, compute
1. $\phi_{h-1,j} = \dfrac{\phi_{h,j} + \phi_{h,h}\,\phi_{h,h-j}}{1 - \phi_{h,h}^2}$, $\quad 1 \le j \le h-1$
2. $v_{h-1} = v_h/(1 - \phi_{h,h}^2)$
- step-down L-D recursion yields $\phi_{h-1,1}, \ldots, \phi_{h-1,h-1}$ & $v_{h-1}$
- start with $\phi_{p,1} = \phi_1$, ..., $\phi_{p,p} = \phi_p$ & $v_p = \sigma^2$; apply step-down recursions to get the $\phi_{p-1,j}$'s & $v_{p-1}$, the $\phi_{p-2,j}$'s & $v_{p-2}$, ..., $\phi_{1,1}$ & $v_1$ (see the sketch after this list)
- as opposed to the usual L-D recursions, step-down L-D recursions do not make use of the ACVF $\gamma(h)$ for $\{Y_t\}$
- in fact, given $\phi_1, \phi_2, \ldots, \phi_p$ & $\sigma^2$, can use the results of the step-down L-D recursions to compute $\gamma(h)$ (yet another method!)

XI 12
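here is a minimal R sketch of the step-down recursions (names are mine): starting from $\phi_{p,1}, \ldots, \phi_{p,p}$ and $v_p = \sigma^2$, it returns all lower-order coefficient vectors and MSEs

```r
# Step-down Levinson-Durbin recursions: phi_p holds the AR coefficients
# phi_{p,1}, ..., phi_{p,p} and v_p is sigma^2.
step_down <- function(phi_p, v_p) {
  p <- length(phi_p)
  phis <- vector("list", p)               # phis[[h]] holds phi_{h,1..h}
  vs <- numeric(p + 1)                    # vs[h + 1] holds v_h
  phis[[p]] <- phi_p
  vs[p + 1] <- v_p
  for (h in p:1) {
    phi_h <- phis[[h]]
    vs[h] <- vs[h + 1] / (1 - phi_h[h]^2) # v_{h-1} = v_h / (1 - phi_{h,h}^2)
    if (h > 1)                            # phi_{h-1,j}, j = 1, ..., h-1
      phis[[h - 1]] <- (phi_h[-h] + phi_h[h] * rev(phi_h[-h])) / (1 - phi_h[h]^2)
  }
  list(phis = phis, vs = vs)
}

# check against the AR(2) example coming on overhead XI 23:
# gives phi_{1,1} = 1/2, v_1 = 4/3 and v_0 = 16/9
step_down(c(3/4, -1/2), 1)
```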
Aside: Step-Down Levinson-Durbin Recursions: III

- to do so, return to overhead XI 4 and note that
$$\gamma(0) \equiv v_0 = v_1/(1 - \phi_{1,1}^2) \quad\text{and}\quad \gamma(1) = \gamma(0)\phi_{1,1}$$
- next go to overhead XI 5, grab
$$\phi_{n,n} = \frac{\gamma(n) - \sum_{j=1}^{n-1}\phi_{n-1,j}\,\gamma(n-j)}{v_{n-1}}$$
and manipulate it to get
$$\gamma(n) = \phi_{n,n}v_{n-1} + \sum_{j=1}^{n-1}\phi_{n-1,j}\,\gamma(n-j)$$
and thus
$$\gamma(2) = \phi_{2,2}v_1 + \phi_{1,1}\gamma(1)$$
$$\gamma(3) = \phi_{3,3}v_2 + \phi_{2,1}\gamma(2) + \phi_{2,2}\gamma(1)$$
etc., ending with
$$\gamma(p) = \phi_{p,p}v_{p-1} + \phi_{p-1,1}\gamma(p-1) + \cdots + \phi_{p-1,p-1}\gamma(1)$$

XI 13
Aside: Step-Down Levinson-Durbin Recursions: IV

- to get $\gamma(p+1), \gamma(p+2), \ldots$, make use of an equation stated on overhead IX 50:
$$\gamma(k) = \phi_1\gamma(k-1) + \cdots + \phi_p\gamma(k-p),$$
which holds for all $k \ge p+1$ (a code sketch combining these steps follows below)
- note: can now argue that the AR coefficients $\phi_1, \phi_2, \ldots, \phi_p$ and the sequence of partial autocorrelations $\phi_{1,1}, \phi_{2,2}, \ldots, \phi_{p,p}$ are equivalent to one another (in particular, $\phi_{p,p} = \phi_p$)
- we now return to our regularly scheduled program ...

XI 14
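a sketch combining the last two overheads, assuming the step_down function from the earlier code block (the name ar_acvf and the guard assumptions are mine): compute $\gamma(0), \ldots, \gamma(n_{\max})$ from the AR coefficients and $\sigma^2$

```r
# ACVF of an AR(p) process from phi_1, ..., phi_p and sigma^2, via the
# step-down L-D recursions (gamma(0), ..., gamma(p)) and the recursion
# gamma(k) = phi_1 gamma(k-1) + ... + phi_p gamma(k-p) for k > p.
# Assumes nmax >= p >= 1.
ar_acvf <- function(phi, sigma2, nmax) {
  p <- length(phi)
  sdr <- step_down(phi, sigma2)
  gam <- numeric(nmax + 1)                # gam[h + 1] holds gamma(h)
  gam[1] <- sdr$vs[1]                     # gamma(0) = v_0
  gam[2] <- gam[1] * sdr$phis[[1]][1]     # gamma(1) = gamma(0) phi_{1,1}
  if (p >= 2) for (n in 2:p)              # gamma(n), n = 2, ..., p
    gam[n + 1] <- sdr$phis[[n]][n] * sdr$vs[n] + sum(sdr$phis[[n - 1]] * gam[n:2])
  if (nmax > p) for (k in (p + 1):nmax)   # gamma(k), k > p
    gam[k + 1] <- sum(phi * gam[k:(k - p + 1)])
  gam
}

# check against the AR(1) formula gamma(h) = sigma^2 phi^h / (1 - phi^2):
# the differences below should all be essentially zero
ar_acvf(0.9, 1, 4) - 0.9^(0:4) / (1 - 0.9^2)
```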
One-Step-Ahead Prediction Errors (Innovations): I

- given a time series $X_1, X_2, \ldots$, can use the L-D recursions to find the coefficients $\boldsymbol{\phi}_{m-1}$ for $\hat{X}_m$, i.e., the best linear predictor of $X_m$ given $X_{m-1}, \ldots, X_1$
- define $\hat{X}_1 = 0$ and $\hat{\mathbf{X}}_n = [\hat{X}_n, \hat{X}_{n-1}, \ldots, \hat{X}_1]^T$
- letting $m = 1, 2, \ldots, n$, can generate a series of one-step-ahead prediction errors (or innovations): $U_m = X_m - \hat{X}_m$
- collect these into $\mathbf{U}_n = [U_n, U_{n-1}, \ldots, U_1]^T$ so that we can write $\mathbf{U}_n = \mathbf{X}_n - \hat{\mathbf{X}}_n$

BD 71, SS 114    XI 15
One-Step-Ahead Prediction Errors (Innovations): II

- can write $\mathbf{U}_n = A_n^T\mathbf{X}_n$, where $A_n$ is lower triangular:
$$A_n = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 & 0\\ -\phi_{n-1,1} & 1 & \cdots & 0 & 0 & 0\\ \vdots & \vdots & & \vdots & \vdots & \vdots\\ -\phi_{n-1,n-3} & -\phi_{n-2,n-4} & \cdots & 1 & 0 & 0\\ -\phi_{n-1,n-2} & -\phi_{n-2,n-3} & \cdots & -\phi_{2,1} & 1 & 0\\ -\phi_{n-1,n-1} & -\phi_{n-2,n-2} & \cdots & -\phi_{2,2} & -\phi_{1,1} & 1 \end{bmatrix}$$
- inverse of $A_n$ is also lower triangular, so let's write it as
$$C_n \equiv \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 & 0\\ \theta_{n-1,1} & 1 & \cdots & 0 & 0 & 0\\ \vdots & \vdots & & \vdots & \vdots & \vdots\\ \theta_{n-1,n-3} & \theta_{n-2,n-4} & \cdots & 1 & 0 & 0\\ \theta_{n-1,n-2} & \theta_{n-2,n-3} & \cdots & \theta_{2,1} & 1 & 0\\ \theta_{n-1,n-1} & \theta_{n-2,n-2} & \cdots & \theta_{2,2} & \theta_{1,1} & 1 \end{bmatrix}$$

BD 72, SS 114    XI 16
One-Step-Ahead Prediction Errors (Innovations): III

- since $C_n$ is the inverse of $A_n$, $\mathbf{U}_n = A_n^T\mathbf{X}_n$ leads to $\mathbf{X}_n = C_n^T\mathbf{U}_n$; i.e., the time series can be reexpressed in terms of its innovations
- recall that the L-D recursions give $v_{m-1} = E\{(X_m - \hat{X}_m)^2\} = \mathrm{var}\{U_m\}$, $m = 1, 2, \ldots, n$
- can use the so-called innovations algorithm to get both $v_m$ and the elements of $C_m$ (note: take a sum with upper limit $-1$ to be 0):
$$\theta_{m,m-k} = \frac{\gamma(m-k) - \sum_{j=0}^{k-1}\theta_{k,k-j}\,\theta_{m,m-j}\,v_j}{v_k}, \quad 0 \le k < m$$
$$v_m = \gamma(0) - \sum_{j=0}^{m-1}\theta_{m,m-j}^2\,v_j$$
- start with $v_0 = \gamma(0)$; get $\theta_{1,1}$ & $v_1$; get $\theta_{2,2}$, $\theta_{2,1}$ & $v_2$; etc.

BD 72, 73, SS 114    XI 17
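a minimal R sketch of the innovations algorithm (function name and layout are mine): given $\gamma(0), \ldots, \gamma(n)$, it returns the $\theta_{m,j}$'s and the MSEs $v_0, \ldots, v_n$

```r
# Innovations algorithm for a stationary process: acvf holds
# gamma(0), ..., gamma(n); theta[m, j] holds theta_{m,j};
# v[m + 1] holds v_m.
innovations <- function(acvf) {
  n <- length(acvf) - 1
  theta <- matrix(0, n, n)
  v <- numeric(n + 1)
  v[1] <- acvf[1]                      # v_0 = gamma(0)
  for (m in 1:n) {
    for (k in 0:(m - 1)) {             # theta_{m,m-k}, k = 0, ..., m-1
      s <- acvf[m - k + 1]             # gamma(m - k)
      if (k > 0) for (j in 0:(k - 1))
        s <- s - theta[k, k - j] * theta[m, m - j] * v[j + 1]
      theta[m, m - k] <- s / v[k + 1]
    }
    v[m + 1] <- acvf[1] - sum(theta[m, m:1]^2 * v[1:m])
  }
  list(theta = theta, v = v)
}

# e.g., for an MA(1) process X_t = Z_t + 0.5 Z_{t-1} with sigma^2 = 1
# (so gamma(0) = 1.25, gamma(1) = 0.5, gamma(h) = 0 otherwise),
# theta_{m,1} should converge to 0.5 and v_m to 1 as m grows:
out <- innovations(c(1.25, 0.5, rep(0, 18)))
```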
One-Step-Ahead Prediction Errors (Innovations): IV

- since $\mathbf{X}_n = C_n^T\mathbf{U}_n$, can write (with $\theta_{m,0} \equiv 1$)
$$X_{m+1} = \sum_{j=0}^{m}\theta_{m,j}\,U_{m-j+1}, \quad m = 1, 2, \ldots, n-1,$$
i.e., a linear combination of innovations yields the time series
- since $\hat{\mathbf{X}}_n = \mathbf{X}_n - \mathbf{U}_n = C_n^T\mathbf{U}_n - \mathbf{U}_n = (C_n - I_n)^T\mathbf{U}_n$, where $I_n$ is the $n \times n$ identity matrix, can also write
$$\hat{X}_{m+1} = \sum_{j=1}^{m}\theta_{m,j}\,U_{m-j+1}, \quad m = 1, 2, \ldots, n-1,$$
i.e., a linear combination of innovations also yields the predictions
- HW exercise: innovations $U_1, U_2, \ldots, U_n$ are uncorrelated

BD 72, SS 114    XI 18
Aside: Simulation of ARMA Processes: I

- often of interest to generate realizations of ARMA processes
- first consider a stationary & causal Gaussian AR(p) process:
$$Y_t - \phi_1 Y_{t-1} - \cdots - \phi_p Y_{t-p} = Z_t, \quad \{Z_t\} \sim \text{Gaussian WN}(0, \sigma^2)$$
- recall that, for any $t \ge p+1$, the best linear predictor $\hat{Y}_t$ of $Y_t$ given $Y_{t-1}, \ldots, Y_1$ takes the form $\hat{Y}_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p}$; innovations are $U_t = Y_t - \hat{Y}_t = Z_t$ and have MSE $v_{t-1} \equiv \mathrm{var}\{U_t\} = \sigma^2$
- can use the step-down L-D recursions to get the coefficients for $\hat{Y}_t = \phi_{t-1,1}Y_{t-1} + \cdots + \phi_{t-1,t-1}Y_1$, $t = 2, 3, \ldots, p$, and the associated MSEs $v_{t-1}$ (recall that $\hat{Y}_1 = 0$ by definition)

XI 19
Aside: Simulation of ARMA Processes: II

- innovations $U_t = Y_t - \hat{Y}_t$, $t = 1, \ldots, p$, are such that
1. $E\{U_t\} = 0$ and $\mathrm{var}\{U_t\} = v_{t-1}$
2. $U_1, U_2, \ldots, U_p$ are uncorrelated RVs (homework exercise), which implies independence under the Gaussian assumption
- easy to simulate the $U_t$'s: generate p independent realizations of $\mathcal{N}(0,1)$ RVs, say, $\tilde{Z}_1, \ldots, \tilde{Z}_p$, and set $U_t = v_{t-1}^{1/2}\tilde{Z}_t$
- can unroll the $U_t$'s to get simulations of the $Y_t$'s, $t = 1, \ldots, p$:
$$U_1 = Y_1 - \hat{Y}_1 = Y_1 \ \text{yields}\ Y_1 = U_1$$
$$U_2 = Y_2 - \hat{Y}_2 = Y_2 - \phi_{1,1}Y_1 \ \text{yields}\ Y_2 = \phi_{1,1}Y_1 + U_2$$
$$U_3 = Y_3 - \hat{Y}_3 = Y_3 - \phi_{2,1}Y_2 - \phi_{2,2}Y_1 \ \text{yields}\ Y_3 = \phi_{2,1}Y_2 + \phi_{2,2}Y_1 + U_3$$

XI 20
Aside: Simulation of ARMA Processes: III

- finally,
$$U_p = Y_p - \hat{Y}_p = Y_p - \phi_{p-1,1}Y_{p-1} - \cdots - \phi_{p-1,p-1}Y_1$$
yields
$$Y_p = \phi_{p-1,1}Y_{p-1} + \cdots + \phi_{p-1,p-1}Y_1 + U_p$$
- can now generate the remainder of the desired simulated series using
$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \sigma\tilde{Z}_t, \quad t = p+1, p+2, \ldots,$$
where the $\tilde{Z}_t$'s are independent realizations of $\mathcal{N}(0,1)$ RVs (these are independent of $\tilde{Z}_1, \ldots, \tilde{Z}_p$ also)

XI 21
Aside: Simulation of ARMA Processes: IV

- knowing how to simulate the AR process $\phi(B)Y_t = Z_t$, can in turn simulate the ARMA process $\phi(B)X_t = \theta(B)Z_t$, since we can create the ARMA process $\{X_t\}$ by applying the filter $\theta(B)$ to the AR process $\{Y_t\}$:
$$X_t = \theta(B)Y_t = \theta(B)\phi^{-1}(B)Z_t, \quad\text{i.e.,}\quad \phi(B)X_t = \theta(B)Z_t$$
(see overhead IX 47)
- hence can generate a simulated ARMA series of length n via
$$X_t = Y_t + \theta_1 Y_{t-1} + \cdots + \theta_q Y_{t-q}, \quad t = q+1, \ldots, q+n;$$
i.e., need to make a simulated AR series of length $n + q$

XI 22
Example: Simulation of ARMA(2,2) Process: I

- consider the ARMA(2,2) process given by
$$X_t = \tfrac{3}{4}X_{t-1} - \tfrac{1}{2}X_{t-2} + Z_t + \tfrac{7}{10}Z_{t-1} - \tfrac{1}{10}Z_{t-2}, \quad \{Z_t\} \sim \mathrm{WN}(0, 1),$$
so that $v_2 = 1$
- to simulate the AR(2) process $Y_t = \tfrac{3}{4}Y_{t-1} - \tfrac{1}{2}Y_{t-2} + Z_t$, need to run the reverse (step-down) L-D recursions once to obtain
$$\phi_{1,1} = \frac{\phi_{2,1} + \phi_{2,2}\,\phi_{2,1}}{1 - \phi_{2,2}^2} = \frac{\tfrac{3}{4} - \tfrac{1}{2}\cdot\tfrac{3}{4}}{1 - \tfrac{1}{4}} = \frac{1}{2}, \quad v_1 = \frac{v_2}{1 - \phi_{2,2}^2} = \frac{4}{3}$$
and hence
$$v_0 = \frac{v_1}{1 - \phi_{1,1}^2} = \frac{16}{9}$$

XI 23
Example: Simulation of ARMA(2,2) Process: II

- thus would generate the AR(2) process using
$$Y_1 = \tfrac{4}{3}\tilde{Z}_1$$
$$Y_2 = \tfrac{1}{2}Y_1 + \tfrac{2}{\sqrt{3}}\tilde{Z}_2$$
$$Y_3 = \tfrac{3}{4}Y_2 - \tfrac{1}{2}Y_1 + \tilde{Z}_3$$
$$\vdots$$
$$Y_{n+2} = \tfrac{3}{4}Y_{n+1} - \tfrac{1}{2}Y_n + \tilde{Z}_{n+2},$$
where the $\tilde{Z}_t$'s are IID $\mathcal{N}(0,1)$ RVs
- desired ARMA(2,2) process is given by $X_t = Y_{t+2} + \tfrac{7}{10}Y_{t+1} - \tfrac{1}{10}Y_t$, $t = 1, \ldots, n$ (see the code sketch below)
- overhead VIII 24 shows the AR(2) series (n = 100) used to form the ARMA(2,2) simulation (n = 98) in the next overhead

XI 24
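a sketch of this exact simulation scheme in R (the function name is mine; contrast with arima.sim's burn-in approach noted on overhead XI 26)

```r
# Exact simulation of the ARMA(2,2) example via stationary initial
# conditions: simulate the AR(2) series Y_t of length n + 2, then
# apply the MA filter theta(B) to get X_t, t = 1, ..., n.
sim_arma22 <- function(n) {
  Z <- rnorm(n + 2)                         # IID N(0,1) deviates
  Y <- numeric(n + 2)
  Y[1] <- (4/3) * Z[1]                      # sqrt(v_0) = sqrt(16/9)
  Y[2] <- (1/2) * Y[1] + (2/sqrt(3)) * Z[2] # sqrt(v_1) = sqrt(4/3)
  for (t in 3:(n + 2))
    Y[t] <- (3/4) * Y[t - 1] - (1/2) * Y[t - 2] + Z[t]
  # X_t = Y_{t+2} + (7/10) Y_{t+1} - (1/10) Y_t, t = 1, ..., n
  Y[3:(n + 2)] + (7/10) * Y[2:(n + 1)] - (1/10) * Y[1:n]
}

x <- sim_arma22(98)   # same length as the series on the overheads below
```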
Realization of Second AR(2) Process

[Figure: plot of the simulated AR(2) series $x_t$ versus t, t = 0 to 100]

VIII 24
Realization of ARMA(2,2) Process

[Figure: plot of the simulated ARMA(2,2) series $x_t$ versus t, t = 0 to 100]

XI 25
Aside: Simulation of ARMA Processes: V

- method described here is deemed exact because of its use of so-called stationary initial conditions (the method used in the R function arima.sim is not exact: it makes use of a burn-in period)
- source article is Kay (1981), which is just over a page in length, making it one of the shortest useful articles relevant to time series analysis (the shortest is undoubtedly David, 1985!)

XI 26
Multi-Step-Ahead Prediction: I

- reconsider the one-step-ahead predictor $\hat{X}_{n+1}$ of $X_{n+1}$ given $X_n, X_{n-1}, \ldots, X_1$
- in preparation for considering multi-step-ahead prediction, will now denote $\hat{X}_{n+1}$ by $\hat{X}_{n+1|n}$
- $\hat{X}_{n+1|n}$ can be written as either a linear combination of previous time series values or previous innovations:
$$\hat{X}_{n+1|n} = \sum_{j=1}^{n}\phi_{n,j}X_{n-j+1} \quad\text{or}\quad \hat{X}_{n+1|n} = \sum_{j=1}^{n}\theta_{n,j}U_{n-j+1}$$
- for a given $h \ge 2$, want to formulate the best linear predictor $\hat{X}_{n+h|n}$ of $X_{n+h}$ given $X_n, X_{n-1}, \ldots, X_1$

XI 27
Multi-Step-Ahead Prediction: II

- first approach: replacing n in
$$\hat{X}_{n+1|n} = \sum_{j=1}^{n}\phi_{n,j}X_{n-j+1}$$
with $n + h - 1$ gives
$$\hat{X}_{n+h|n+h-1} = \sum_{j=1}^{n+h-1}\phi_{n+h-1,j}X_{n+h-j}$$
- above involves the unobserved $X_{n+h-1}, \ldots, X_{n+1}$, but replacing these with $\hat{X}_{n+h-1|n}, \ldots, \hat{X}_{n+1|n}$ gives the desired predictor:
$$\hat{X}_{n+h|n} = \sum_{j=1}^{h-1}\phi_{n+h-1,j}\hat{X}_{n+h-j|n} + \sum_{j=h}^{n+h-1}\phi_{n+h-1,j}X_{n+h-j}$$

XI 28
Multi-Step-Ahead Prediction: III

- leads to a recursive scheme for computing $\hat{X}_{n+h|n}$, starting with the one-step-ahead predictor $\hat{X}_{n+1|n}$ (we know how to get this!)
- two-step-ahead predictor: replace $X_{n+1}$ in
$$\hat{X}_{n+2|n+1} = \sum_{j=1}^{n+1}\phi_{n+1,j}X_{n+2-j}$$
with $\hat{X}_{n+1|n}$ to get
$$\hat{X}_{n+2|n} = \phi_{n+1,1}\hat{X}_{n+1|n} + \sum_{j=2}^{n+1}\phi_{n+1,j}X_{n+2-j}$$

XI 29
Multi-Step-Ahead Prediction: IV

- three-step-ahead predictor: replace $X_{n+2}$ & $X_{n+1}$ in
$$\hat{X}_{n+3|n+2} = \sum_{j=1}^{n+2}\phi_{n+2,j}X_{n+3-j}$$
with $\hat{X}_{n+2|n}$ & $\hat{X}_{n+1|n}$ to get
$$\hat{X}_{n+3|n} = \phi_{n+2,1}\hat{X}_{n+2|n} + \phi_{n+2,2}\hat{X}_{n+1|n} + \sum_{j=3}^{n+2}\phi_{n+2,j}X_{n+3-j}$$
- yadda, yadda, yadda, coming eventually to the desired
$$\hat{X}_{n+h|n} = \sum_{j=1}^{h-1}\phi_{n+h-1,j}\hat{X}_{n+h-j|n} + \sum_{j=h}^{n+h-1}\phi_{n+h-1,j}X_{n+h-j}$$
(see the code sketch following this overhead)

XI 30
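a sketch of this recursive scheme in R, assuming the levinson_durbin function from overhead XI 5's code sketch (the function name here is mine; acvf must supply $\gamma(0), \ldots, \gamma(n+h-1)$, and the coefficients are recomputed at each step for clarity rather than efficiency)

```r
# h-step-ahead predictions Xhat_{n+1|n}, ..., Xhat_{n+h|n} via the
# first approach: at step k, the coefficients phi_{n+k-1,j} multiply
# the already-computed predictions (standing in for the unobserved
# X_{n+k-1}, ..., X_{n+1}) followed by the observed X_n, ..., X_1.
predict_h_steps <- function(x, acvf, h) {
  n <- length(x)
  xrev <- rev(x)                       # X_n, X_{n-1}, ..., X_1
  xhat <- numeric(h)
  for (k in 1:h) {
    phi <- levinson_durbin(acvf[1:(n + k)])$phi  # phi_{n+k-1,1..n+k-1}
    past <- c(if (k > 1) rev(xhat[1:(k - 1)]), xrev)
    xhat[k] <- sum(phi * past)
  }
  xhat
}
```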
Multi-Step-Ahead Prediction: V

- since $\hat{X}_{n+1|n}, \ldots, \hat{X}_{n+h-1|n}$ are all linear combinations of $X_n, \ldots, X_1$, it follows that
$$\hat{X}_{n+h|n} = \sum_{j=1}^{h-1}\phi_{n+h-1,j}\hat{X}_{n+h-j|n} + \sum_{j=h}^{n+h-1}\phi_{n+h-1,j}X_{n+h-j}$$
is also such:
$$\hat{X}_{n+h|n} = \sum_{j=1}^{n}a_jX_{n-j+1}$$
- can show that $\mathbf{a}_n = [a_1, \ldots, a_n]^T$ so defined is a solution to $\Gamma_n\mathbf{a}_n = \boldsymbol{\gamma}_n(h)$, where the $n \times n$ matrix $\Gamma_n$ has $(i,j)$th entry $\gamma(i-j)$, while $\boldsymbol{\gamma}_n(h) = [\gamma(h), \ldots, \gamma(h+n-1)]^T$

XI 31
Multi-Step-Ahead Prediction: VI

- second approach: replacing n in
$$\hat{X}_{n+1|n} = \sum_{j=1}^{n}\theta_{n,j}U_{n-j+1}$$
with $n + h - 1$ gives
$$\hat{X}_{n+h|n+h-1} = \sum_{j=1}^{n+h-1}\theta_{n+h-1,j}U_{n+h-j}$$
- above involves the unobserved $U_{n+h-1}, \ldots, U_{n+1}$, but replacing these with their expected values (zero!) gives the desired predictor:
$$\hat{X}_{n+h|n} = \sum_{j=h}^{n+h-1}\theta_{n+h-1,j}U_{n+h-j} = \sum_{j=1}^{n}\theta_{n+h-1,n+h-j}U_j$$

XI 32
Multi-Step-Ahead Prediction: VII

- MSE of the h-step-ahead forecast is
$$E\{(X_{n+h} - \hat{X}_{n+h|n})^2\} = E\{X_{n+h}^2\} - 2E\{X_{n+h}\hat{X}_{n+h|n}\} + E\{\hat{X}_{n+h|n}^2\}$$
$$= \gamma(0) - E\{\hat{X}_{n+h|n}^2\} = \gamma(0) - \mathrm{var}\{\hat{X}_{n+h|n}\},$$
since $E\{X_{n+h}^2\} = \gamma(0)$ and $E\{X_{n+h}\hat{X}_{n+h|n}\} = E\{\hat{X}_{n+h|n}^2\}$ (homework exercise!)
- since $\mathrm{var}\{U_j\} = v_{j-1}$ and the $U_j$'s are uncorrelated,
$$\mathrm{var}\{\hat{X}_{n+h|n}\} = \mathrm{var}\Big\{\sum_{j=1}^{n}\theta_{n+h-1,n+h-j}U_j\Big\} = \sum_{j=1}^{n}\theta_{n+h-1,n+h-j}^2\,v_{j-1}$$
- MSE is thus given by
$$E\{(X_{n+h} - \hat{X}_{n+h|n})^2\} = \gamma(0) - \sum_{j=1}^{n}\theta_{n+h-1,n+h-j}^2\,v_{j-1} \equiv \sigma_n^2(h)$$

BD 74, 75    XI 33
Multi-Step-Ahead Prediction: VIII

- under a Gaussian assumption, can use the above to form 95% prediction bounds for the unknown $X_{n+h}$: $\hat{X}_{n+h|n} \pm 1.96\,\sigma_n(h)$
- as an example, consider the 1st part of the wind speed series $x_1, \ldots, x_{100}$
- after centering $x_t$ by subtracting off its sample mean $\bar{x}$, we model $\tilde{x}_t = x_t - \bar{x}$ as an AR(1) process $X_t = \phi X_{t-1} + Z_t$ with $\phi$ estimated by $\hat{\phi} = \hat{\rho}(1) \doteq 0.856$ (cf. overhead X 16)
- based on $x_1, \ldots, x_{100}$, forecast the last 28 values $x_{101} - \bar{x}, \ldots, x_{128} - \bar{x}$ of the time series and see how well we do
- following overheads show results from homegrown R code based on the theory presented above and from the built-in R functions ar and predict (see the sketch below)

BD 74, 75    XI 34
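for the built-in route, a sketch along these lines (the wind speed data are not reproduced in these overheads, so a simulated stand-in AR(1) series is used here; the fitted value will therefore differ from $\hat{\phi} \doteq 0.856$)

```r
# Multi-step-ahead forecasts with 95% prediction bounds using the
# built-in R functions ar and predict; x is a stand-in AR(1) series.
set.seed(42)
x <- arima.sim(list(ar = 0.856), n = 100)
fit <- ar(x, aic = FALSE, order.max = 1)  # Yule-Walker: phi-hat = rho-hat(1)
pr <- predict(fit, n.ahead = 28)          # Xhat_{100+h|100}, h = 1, ..., 28
upper <- pr$pred + 1.96 * pr$se           # 95% prediction bounds under
lower <- pr$pred - 1.96 * pr$se           # a Gaussian assumption
```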
Multi-Step-Ahead Prediction of Wind Speed

[Figure: centered wind speed series and multi-step-ahead predictions, $\tilde{x}_t$ versus t, from homegrown R code]

XI 35
Multi-Step-Ahead Prediction of Wind Speed using R

[Figure: centered wind speed series and multi-step-ahead predictions, $\tilde{x}_t$ versus t, from the built-in R functions ar and predict]

XI 36
Predictions Based on Infinite Past: I

- rather than using $X_n, \ldots, X_1$ to predict $X_{n+h}$, suppose we use, for some $m \ge 0$, $X_n, \ldots, X_1, X_0, X_{-1}, \ldots, X_{-m}$, and form a predictor to be denoted by $\hat{X}_{n+h|n,m}$
- by letting $m \to \infty$ and assuming the limit exists (in the MS sense), can write
$$\hat{X}_{n+h|n,\infty} = \sum_{j=1}^{\infty}\alpha_jX_{n-j+1},$$
where the $\alpha_j$'s are set by a version of the orthogonality principle:
$$\mathrm{cov}\Big\{X_{n+h} - \sum_{j=1}^{\infty}\alpha_jX_{n-j+1},\; X_{n-i}\Big\} = 0, \quad i = 0, 1, \ldots$$

BD 75, SS 115    XI 37
Predictions Based on Infinite Past: II

- refer to $\hat{X}_{n+h|n,\infty}$ as the predictor of $X_{n+h}$ based on the infinite past $X_n, X_{n-1}, \ldots$
- associated prediction error $X_{n+h} - \hat{X}_{n+h|n,\infty}$ has MSE
$$E\{(X_{n+h} - \hat{X}_{n+h|n,\infty})^2\} = \mathrm{var}\{X_{n+h} - \hat{X}_{n+h|n,\infty}\},$$
which can be compared to $\mathrm{var}\{X_{n+h} - \hat{X}_{n+h|n}\}$ to see how much can be gained from having lots more data (recall that $\hat{X}_{n+h|n}$ is based on just $X_n, X_{n-1}, \ldots, X_1$)

BD 75, 76, SS 115    XI 38
Predictions Based on Infinite Past: III

- applying the representation $X_t = \sum_{j=0}^{\infty}\psi_jZ_{t-j}$ at $t = n + h$ yields
$$X_{n+h} = \sum_{j=0}^{\infty}\psi_jZ_{n+h-j}$$
- consider the $Z_t$'s that make up $X_{n+h}$ but not $X_n$; i.e., $Z_{n+h}, Z_{n+h-1}, \ldots, Z_{n+1}$
- replacing these h RVs by their expected values (zero) gives
$$\hat{X}_{n+h|n,\infty} = \sum_{j=h}^{\infty}\psi_jZ_{n+h-j}$$
- prediction error is thus
$$X_{n+h} - \hat{X}_{n+h|n,\infty} = \sum_{j=0}^{\infty}\psi_jZ_{n+h-j} - \sum_{j=h}^{\infty}\psi_jZ_{n+h-j} = \sum_{j=0}^{h-1}\psi_jZ_{n+h-j}$$

XI 39
Predictions Based on Infinite Past: IV

- since $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, the variance of $X_{n+h} - \hat{X}_{n+h|n,\infty} = \sum_{j=0}^{h-1}\psi_jZ_{n+h-j}$, i.e., the MSE of $\hat{X}_{n+h|n,\infty}$, is given by
$$\mathrm{var}\{X_{n+h} - \hat{X}_{n+h|n,\infty}\} = \sigma^2\sum_{j=0}^{h-1}\psi_j^2$$
- in particular, for h = 1, the MSE is $\mathrm{var}\{X_{n+1} - \hat{X}_{n+1|n,\infty}\} = \sigma^2$ rather than $v_n = \mathrm{var}\{X_{n+1} - \hat{X}_{n+1|n}\}$
- homework exercise: compare MSEs for specific MA(1) and AR(1) processes with specific sample sizes n

CC 196, SS 116    XI 40
References

- H. A. David (1985), "Bias of $S^2$ Under Dependence," The American Statistician, 39, p. 201
- J. Durbin (1960), "The Fitting of Time Series Models," Revue de l'Institut International de Statistique / Review of the International Statistical Institute, 28, pp. 233-44
- S. M. Kay (1981), "Efficient Generation of Colored Noise," Proceedings of the IEEE, 69, pp. 480-1
- N. Levinson (1947), "The Wiener RMS Error Criterion in Filter Design and Prediction," Journal of Mathematical Physics, 25, pp. 261-78

XI 41