Model choice: Akaike's criterion

Akaike's criterion: Kullback-Leibler discrepancy

Given a family of probability densities $\{f(\cdot\,;\psi),\ \psi\in\Psi\}$, the Kullback-Leibler index of $f(\cdot\,;\psi)$ relative to $f(\cdot\,;\theta)$ is
$$\Delta(\psi\mid\theta) = E_\theta\big(-2\log f(X;\psi)\big) = -2\int_{\mathbb{R}^n}\log\big(f(x;\psi)\big)\,f(x;\theta)\,dx.$$
The Kullback-Leibler discrepancy between $f(\cdot\,;\psi)$ and $f(\cdot\,;\theta)$ is
$$d(\psi\mid\theta) = \Delta(\psi\mid\theta) - \Delta(\theta\mid\theta) = -2\int_{\mathbb{R}^n}\log\left(\frac{f(x;\psi)}{f(x;\theta)}\right)f(x;\theta)\,dx.$$
Jensen's inequality implies $E(\log Y)\le\log(E(Y))$ for any positive random variable $Y$. Hence
$$d(\psi\mid\theta) \ge -2\log\int_{\mathbb{R}^n}\frac{f(x;\psi)}{f(x;\theta)}\,f(x;\theta)\,dx = -2\log 1 = 0,$$
with equality only if $f(x;\psi) = f(x;\theta)$ a.e. $[f(\cdot\,;\theta)]$.

24 novembre 2014
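As a quick numerical sanity check (a hedged sketch, not part of the slides), one can verify $d(\psi\mid\theta)\ge 0$ for two illustrative Gaussian densities; for $N(0,1)$ against $N(1,2)$ the discrepancy equals $\log 2$ in closed form.

```python
import numpy as np

# Hedged sketch: numerically check that the Kullback-Leibler discrepancy
# d(psi|theta) = -2 E_theta[log(f(X;psi)/f(X;theta))] is non-negative.
# The two Gaussian densities are arbitrary illustrative choices.
def gauss(x, mu, s2):
    return np.exp(-(x - mu) ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]
f_theta = gauss(x, 0.0, 1.0)   # "true" density f(.; theta) = N(0, 1)
f_psi = gauss(x, 1.0, 2.0)     # candidate density f(.; psi) = N(1, 2)

d = -2.0 * np.sum(np.log(f_psi / f_theta) * f_theta) * dx
# closed form here: d = 2 * KL(N(0,1) || N(1,2)) = log 2
print(d)  # ~0.6931
```

The integral is positive, as Jensen's inequality guarantees, and matches the closed-form Gaussian value.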
Approximating the Kullback-Leibler discrepancy

Given observations $X_1,\dots,X_n$, we would like to minimize $d(\psi\mid\theta)$ among all candidate models $\psi$, given the true model $\theta$. As the true model is unknown, we estimate $d(\psi\mid\theta)$.

Let $\psi = (\phi,\vartheta,\sigma^2)$ be the parameters of an ARMA(p,q) model and $\hat\psi$ the MLE based on $X_1,\dots,X_n$. Let $Y$ be an independent realization of the same process. Then
$$-2\log L_Y(\hat\phi,\hat\vartheta,\hat\sigma^2) = n\log(2\pi) + n\log\hat\sigma^2 + \log(r_0\cdots r_{n-1}) + \frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2}.$$
Indeed, recall that for an ARMA(p,q) process
$$L(\phi,\vartheta,\sigma^2) = (2\pi\sigma^2)^{-n/2}(r_0\cdots r_{n-1})^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}S(\phi,\vartheta)\right\}$$
with $S(\phi,\vartheta) = \sum_{j=1}^{n}\frac{(x_j-\hat x_j)^2}{r_{j-1}}$. The quantities $r_0,\dots,r_{n-1}$ depend only on the parameters $(\phi,\vartheta)$ and not on the observed data; the data enter the likelihood only through the terms $(x_j-\hat x_j)^2$ in $S(\phi,\vartheta)$. Hence
$$-2\log L_Y(\hat\phi,\hat\vartheta,\hat\sigma^2) = -2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + \frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2} - \frac{S_X(\hat\phi,\hat\vartheta)}{\hat\sigma^2} = -2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + \frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2} - n,$$
since $n\hat\sigma^2 = S_X(\hat\phi,\hat\vartheta)$. Taking expectations,
$$E_\theta\big(\Delta(\hat\psi\mid\theta)\big) = E_{(\phi,\vartheta,\sigma^2)}\big(-2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2)\big) + E_{(\phi,\vartheta,\sigma^2)}\left(\frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2}\right) - n.$$
Kullback-Leibler discrepancy and AICC

Using linear approximations and the asymptotic distributions of the estimators, one arrives at
$$E_{(\phi,\vartheta,\sigma^2)}\big(S_Y(\hat\phi,\hat\vartheta)\big) \approx \sigma^2(n+p+q).$$
Similarly, $n\hat\sigma^2 = S_X(\hat\phi,\hat\vartheta)$ is for large $n$ distributed as $\sigma^2\chi^2(n-p-q-2)$ and is asymptotically independent of $(\hat\phi,\hat\vartheta)$. Hence
$$E_{(\phi,\vartheta,\sigma^2)}\left(\frac{S_Y(\hat\phi,\hat\vartheta)}{\hat\sigma^2}\right) \approx \frac{\sigma^2(n+p+q)}{\sigma^2(n-p-q-2)/n} = \frac{n(n+p+q)}{n-p-q-2}.$$
From $E_\theta(\Delta(\hat\psi\mid\theta)) = E_{(\phi,\vartheta,\sigma^2)}(-2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2)) + E_{(\phi,\vartheta,\sigma^2)}\big(S_Y(\hat\phi,\hat\vartheta)/\hat\sigma^2\big) - n$, it follows that
$$\mathrm{AICC} = -2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + \frac{2(p+q+1)n}{n-p-q-2}$$
is an approximately unbiased estimate of $\Delta(\hat\theta\mid\theta)$.
Criteria for model choice

The order is chosen by minimizing the value of the AICC (corrected Akaike information criterion):
$$-2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + \frac{2(p+q+1)n}{n-p-q-2}.$$
The second term can be regarded as a penalty for models with a large number of parameters.

For large $n$ it is approximately the same as the Akaike information criterion (AIC), $-2\log L_X(\hat\phi,\hat\vartheta,\hat\sigma^2) + 2(p+q+1)$, but it carries a higher penalty for finite $n$ and is therefore somewhat less likely to overfit. In R:

AICC <- AIC(myfit, k = 2*n/(n-p-q-2))

A rule of thumb is that the fits of model 1 and model 2 are not significantly different if $|\mathrm{AICC}_1 - \mathrm{AICC}_2| < 2$ (only the difference matters, not the absolute value of the AICC). Hence we may decide to choose model 1 if it is simpler than model 2 (or if its residuals are closer to white noise), even if $\mathrm{AICC}_1 > \mathrm{AICC}_2$, as long as $\mathrm{AICC}_1 < \mathrm{AICC}_2 + 2$.
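The two penalties can be compared directly with a small helper (a hedged Python sketch of the formulas above, not the slides' R code); the parameter count $p+q+1$ covers the $\phi$'s, $\vartheta$'s and $\sigma^2$.

```python
# Sketch of AIC and AICC for an ARMA(p,q) fit with p+q+1 estimated parameters.
def aic_aicc(loglik, n, p, q):
    k = p + q + 1                                         # number of parameters
    aic = -2.0 * loglik + 2.0 * k                         # Akaike's criterion
    aicc = -2.0 * loglik + 2.0 * k * n / (n - p - q - 2)  # corrected version
    return aic, aicc

# illustrative values: loglik, n, p, q are arbitrary
aic, aicc = aic_aicc(loglik=-100.0, n=50, p=2, q=1)
print(aic, aicc)  # AICC carries the heavier penalty: 208.0 vs ~208.89
```

As $n \to \infty$ the correction factor $2kn/(n-p-q-2) \to 2k$, so the two criteria coincide asymptotically.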
Tests on residuals

$\hat X_t(\hat\varphi,\hat\vartheta)$: predicted value of $X_t$ given the estimates $(\hat\varphi,\hat\vartheta)$.
$$\hat W_t = \frac{X_t - \hat X_t(\hat\varphi,\hat\vartheta)}{\big(r_{t-1}(\hat\varphi,\hat\vartheta)\big)^{1/2}}\qquad\text{(standardized residuals).}$$
Portmanteau tests on the ACF of $\hat W_t$: Box-Pierce; Ljung-Box. Test on turning points; rank tests; ...
Autocovariance

A multivariate stochastic process $\{X_t\in\mathbb{R}^m\}$, $t\in\mathbb{Z}$, is weakly stationary if
$$E(X_{t,i}^2)<\infty\ \ \forall t,i,\qquad E(X_t)\equiv\mu,\qquad \mathrm{Cov}(X_{t+h},X_t)\equiv\Gamma(h).$$
In particular $\gamma_{ij}(h) = \mathrm{Cov}(X_{t+h,i},X_{t,j}) = E\big((X_{t+h,i}-\mu_i)(X_{t,j}-\mu_j)\big)$.

Note that in general $\gamma_{ij}(h)\neq\gamma_{ji}(h)$, while
$$\gamma_{ij}(h) = \mathrm{Cov}(X_{t+h,i},X_{t,j}) \overset{\text{(stationarity)}}{=} \mathrm{Cov}(X_{t,i},X_{t-h,j}) \overset{\text{(symmetry)}}{=} \mathrm{Cov}(X_{t-h,j},X_{t,i}) = \gamma_{ji}(-h).$$

Another simple property is $|\gamma_{ij}(h)| \le (\gamma_{ii}(0)\gamma_{jj}(0))^{1/2}$. The ACF is $\rho_{ij}(h) = \gamma_{ij}(h)\,(\gamma_{ii}(0)\gamma_{jj}(0))^{-1/2}$.
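The identity $\gamma_{ij}(h) = \gamma_{ji}(-h)$ also holds exactly for sample cross-covariances, which a short check makes concrete (a hedged sketch; the two simulated series are toy choices):

```python
import numpy as np

# Sample cross-covariance gamma_hat_{xy}(h); h may be negative.
def ccov(x, y, h):
    n = len(x)
    xb, yb = x.mean(), y.mean()
    if h >= 0:
        return np.sum((x[h:] - xb) * (y[:n - h] - yb)) / n
    return np.sum((x[:n + h] - xb) * (y[-h:] - yb)) / n

rng = np.random.default_rng(0)
z = rng.standard_normal(501)
x1 = z[1:]
x2 = 0.5 * z[:-1] + rng.standard_normal(500)  # x2 lags x1 by one step

# gamma_hat_12(h) = gamma_hat_21(-h) for every lag
ok = all(abs(ccov(x1, x2, h) - ccov(x2, x1, -h)) < 1e-12 for h in range(-5, 6))
print(ok)  # True
```

Both estimators sum the same products $(x_{t+h}-\bar x)(y_t-\bar y)$, so the identity is exact, not merely asymptotic.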
Multivariate white noise and MA

A multivariate stochastic process $\{Z_t\in\mathbb{R}^m\}$ is a white noise with covariance $S$, written $\{Z_t\}\sim\mathrm{WN}(0,S)$, if $\{Z_t\}$ is stationary with mean $0$ and ACVF
$$\Gamma(h) = \begin{cases} S & h = 0\\ 0 & h\neq 0.\end{cases}$$

$\{X_t\in\mathbb{R}^m\}$ is a linear process if
$$X_t = \sum_{k=-\infty}^{+\infty} C_k Z_{t-k},\qquad \{Z_t\}\sim\mathrm{WN}(0,S),$$
where the $C_k$ are matrices such that $\sum_{k=-\infty}^{+\infty} |(C_k)_{ij}| < +\infty$ for all $i,j = 1,\dots,m$.

$\{X_t\}$ is then stationary with $\Gamma_X(h) = \sum_{k=-\infty}^{+\infty} C_{k+h}\,S\,C_k^t$.
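For a finite MA the ACVF formula can be checked directly; in particular it implies $\Gamma_X(-h) = \Gamma_X(h)^t$ (a hedged sketch with arbitrary illustrative matrices):

```python
import numpy as np

# Finite MA with matrix coefficients C_0, C_1 and white-noise covariance S.
rng = np.random.default_rng(1)
C = {0: rng.standard_normal((2, 2)), 1: rng.standard_normal((2, 2))}
A = rng.standard_normal((2, 2))
S = A @ A.T  # symmetric positive semi-definite covariance

def Gamma(h):
    # Gamma_X(h) = sum_k C_{k+h} S C_k^t, with only k = 0, 1 present
    return sum(C[k + h] @ S @ C[k].T for k in C if (k + h) in C)

print(np.allclose(Gamma(-1), Gamma(1).T))  # True: Gamma(-h) = Gamma(h)^t
```

The symmetry follows from the formula by reindexing $k \mapsto k+h$ and using $S = S^t$, mirroring $\gamma_{ij}(h) = \gamma_{ji}(-h)$ above.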
Estimation of the mean

The mean $\mu$ can be estimated through $\bar X_n$. From the univariate theory, we know:
$E(\bar X_n) = \mu$;
$V((\bar X_n)_i) \to 0$ (as $n\to\infty$) if $\gamma_{ii}(h)\to 0$ as $h\to\infty$;
$n\,V((\bar X_n)_i) \to \sum_{h=-\infty}^{+\infty}\gamma_{ii}(h)$ if $\sum_{h=-\infty}^{+\infty}|\gamma_{ii}(h)| < +\infty$.
Moreover $(\bar X_n)_i$ is asymptotically normal. Stronger assumptions are required for the vector $\bar X_n$ to be asymptotically normal.

Theorem. If $X_t = \mu + \sum_{k=-\infty}^{+\infty} C_k Z_{t-k}$, $\{Z_t\}\sim\mathrm{WN}(0,S)$, then
$$n^{1/2}(\bar X_n - \mu) \Rightarrow N\Big(0,\ \sum_{h=-\infty}^{+\infty}\Gamma(h)\Big),\qquad \Gamma(h) = \sum_{k=-\infty}^{+\infty} C_{k+h}\,S\,C_k^t.$$
Confidence intervals for the mean

In principle, from $\bar X_n \approx N\big(\mu,\ \frac1n\sum_h \Gamma(h)\big)$ one could build an $m$-dimensional confidence ellipsoid. But this is not intuitive, and $C_k$ and $S$ are not known and would have to be estimated. Instead, build confidence intervals from
$$(\bar X_n)_i \approx N\Big(\mu_i,\ \frac1n\sum_{h=-\infty}^{+\infty}\gamma_{ii}(h)\Big).$$

$\sum_{h=-\infty}^{+\infty}\gamma_{ii}(h) = 2\pi f_i(0)$ can be consistently estimated by
$$2\pi\hat f_i(0) = \sum_{h=-r}^{r}\Big(1-\frac{|h|}{r}\Big)\hat\gamma_{ii}(h),\qquad\text{where } r_n\to\infty \text{ and } \frac{r_n}{n}\to 0.$$

Componentwise confidence intervals can be combined. If we have found $u_i(\alpha)$ s.t. $P(|\mu_i-(\bar X_n)_i| < u_i(\alpha)) \approx 1-\alpha$, then
$$P\big(|\mu_i-(\bar X_n)_i| < u_i(\alpha),\ i=1,\dots,m\big) \ge 1 - \sum_{i=1}^{m} P\big(|\mu_i-(\bar X_n)_i| \ge u_i(\alpha)\big) \approx 1 - m\alpha.$$
Choosing $\alpha = 0.05/m$, one has an approximate 95%-confidence $m$-rectangle.
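The Bonferroni rectangle can be illustrated by Monte Carlo (a hedged sketch: independent standard-normal components stand in for the standardized means, which is a simplifying assumption):

```python
import numpy as np
from statistics import NormalDist

# Bonferroni: per-component level alpha/m gives joint coverage >= 1 - alpha.
m, alpha = 3, 0.05
u = NormalDist().inv_cdf(1.0 - (alpha / m) / 2.0)  # two-sided quantile

rng = np.random.default_rng(2)
sims = rng.standard_normal((20_000, m))   # standardized (mu_i - (Xbar_n)_i)
coverage = np.all(np.abs(sims) < u, axis=1).mean()
print(coverage)  # at least about 1 - alpha = 0.95
```

With independent components the joint coverage is $(1-\alpha/m)^m \approx 0.951$ here, slightly above the Bonferroni guarantee of $0.95$.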
Estimation of the ACVF (bivariate case, m = 2)

$$\hat\Gamma(h) = \begin{cases}\displaystyle \frac1n\sum_{t=1}^{n-h}(X_{t+h}-\bar X_n)(X_t-\bar X_n)^t & 0\le h<n\\[2ex] \displaystyle \frac1n\sum_{t=-h+1}^{n}(X_{t+h}-\bar X_n)(X_t-\bar X_n)^t & -n<h<0.\end{cases}$$
$$\hat\rho_{ij}(h) = \hat\gamma_{ij}(h)\,(\hat\gamma_{ii}(0)\hat\gamma_{jj}(0))^{-1/2}.$$

Theorem. If $X_t = \mu + \sum_{k=-\infty}^{+\infty} C_k Z_{t-k}$, $\{Z_t\}\sim\mathrm{IID}(0,S)$, then for every $h$
$$\hat\gamma_{ij}(h) \overset{P}{\to} \gamma_{ij}(h),\qquad \hat\rho_{ij}(h) \overset{P}{\to} \rho_{ij}(h),\qquad\text{as } n\to\infty.$$
An example: Southern Oscillation Index

Southern Oscillation Index (an environmental measure) compared to fish recruitment in the South Pacific (1950 to 1985). (Figure: time-series plots of the Southern Oscillation Index and of Recruitment, 1950-1985.)

ACF of Southern Oscillation Index

(Figure: sample ACF/CCF matrix with panels soi, soi & rec, rec & soi, rec.) The bottom-left panel is $\gamma_{12}$ at negative lags.
An example from Box and Jenkins

Sales (V2) with a leading indicator (V1). (Figure: time-series plots of V1 and V2, about 150 observations.)

ACF of sales data

(Figure: sample ACF/CCF matrix with panels V1, V1 & V2, V2 & V1, V2.) The data are not stationary.
Differenced sales data

(Figure: time-series plots of the differenced series V1 and V2.)

ACF of differenced sales data

(Figure: sample ACF/CCF matrix of the differenced series.) The cross-correlation is relevant only at lags 2 and 3.
Testing for independence of time series: basis

In general the asymptotic distribution of $\hat\gamma_{ij}(h)$ is complicated. But:

Theorem. Let
$$X_{t,1} = \sum_{j=-\infty}^{\infty}\alpha_j Z_{t-j,1},\qquad X_{t,2} = \sum_{j=-\infty}^{\infty}\beta_j Z_{t-j,2}$$
with $\{Z_{t,1}\}\sim\mathrm{WN}(0,\sigma_1^2)$, $\{Z_{t,2}\}\sim\mathrm{WN}(0,\sigma_2^2)$ independent of each other. Then
$$n\,V(\hat\gamma_{12}(h)) \underset{n\to\infty}{\longrightarrow} \sum_{j=-\infty}^{\infty}\gamma_{11}(j)\gamma_{22}(j),\qquad n^{1/2}\hat\rho_{12}(h) \Rightarrow N\Big(0,\ \sum_{j=-\infty}^{\infty}\rho_{11}(j)\rho_{22}(j)\Big).$$
Testing for independence of time series: an example

Suppose $\{X_{t,1}\}$ and $\{X_{t,2}\}$ are independent AR(1) processes with $\rho_{ii}(h) = 0.8^{|h|}$. Then the asymptotic variance of $\hat\rho_{12}(h)$ is
$$n^{-1}\sum_{h=-\infty}^{\infty} 0.64^{|h|} \approx 4.556\,n^{-1}.$$
Values of $|\hat\rho_{12}(h)|$ quite a bit larger than $1.96\,n^{-1/2}$ will therefore be common even if the two series are independent. If instead one of the series is white noise, then $V(\hat\rho_{12}(h)) \approx \frac1n$. Hence, in testing for independence, it is often recommended to prewhiten one of the series.
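Both the constant 4.556 and the inflation of $\hat\rho_{12}$ can be checked numerically (a hedged sketch; sample sizes, seed and the number of replications are arbitrary simulation choices):

```python
import numpy as np

# The slide's constant: sum_h 0.64^|h| = (1 + 0.64)/(1 - 0.64) = 4.556
target = sum(0.64 ** abs(h) for h in range(-200, 201))
print(round(target, 3))  # 4.556

def ar1(n, phi, rng, burn=100):
    z = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + z[t]
    return x[burn:]

rng = np.random.default_rng(3)
n, nsim = 200, 2000
rhats = np.array([np.corrcoef(ar1(n, 0.8, rng), ar1(n, 0.8, rng))[0, 1]
                  for _ in range(nsim)])
print(n * rhats.var())  # near the asymptotic 4.556, far above the naive 1
```

So the naive $\pm 1.96/\sqrt{n}$ bands are roughly $\sqrt{4.556} \approx 2.1$ times too narrow for these series.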
Prewhitening a time series

Instead of testing $\hat\rho_{12}(h)$ for the original series, one transforms them into white noise. If $\{X_{t,1}\}$ and $\{X_{t,2}\}$ are invertible ARMA processes, then
$$Z_{t,i} = \sum_{k=0}^{\infty}\pi_k^{(i)} X_{t-k,i} \sim \mathrm{WN}(0,\sigma_i^2),\qquad i = 1,2,$$
where $\sum_{k=0}^{\infty}\pi_k^{(i)} z^k = \pi^{(i)}(z) = \phi^{(i)}(z)/\theta^{(i)}(z)$.

$\{X_{t,1}\}$ and $\{X_{t,2}\}$ are independent if and only if $\{Z_{t,1}\}$ and $\{Z_{t,2}\}$ are, hence one tests $\hat\rho_{Z_1,Z_2}(h)$.

As $\phi^{(i)}(z)$ and $\theta^{(i)}(z)$ are not known, one fits an ARMA to each series and uses the residuals $\hat W_{t,i}$ in place of $Z_{t,i}$. It may be enough to do this for just one of the series.
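For an AR(1) the prewhitening filter is simply $\pi(z) = 1 - \varphi z$, as a toy sketch shows (hedged: the true $\varphi$ is used for simplicity, whereas in practice one uses the fitted model's residuals):

```python
import numpy as np

# AR(1) series X_t = 0.9 X_{t-1} + Z_t; prewhitening recovers Z_t.
rng = np.random.default_rng(4)
n, phi = 2000, 0.9
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

w = x[1:] - phi * x[:-1]                  # Z_t = X_t - phi X_{t-1}
r1_x = np.corrcoef(x[1:], x[:-1])[0, 1]   # lag-1 ACF of the raw series
r1_w = np.corrcoef(w[1:], w[:-1])[0, 1]   # lag-1 ACF after prewhitening
print(round(r1_x, 2), round(r1_w, 2))     # near 0.9 and near 0.0
```

After the filter, the $\pm 1.96/\sqrt{n}$ bands are the appropriate reference for $\hat\rho_{Z_1,Z_2}(h)$.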
Simulated data

The 1st series is AR(1) with $\varphi = 0.9$; the 2nd series is AR(2) with $\varphi_1 = 0.7$, $\varphi_2 = 0.27$. (Figure: time-series plots of dat1 and dat2, 200 observations.)

ACF of simulated data

(Figure: sample ACF/CCF matrix with panels dat1, dat1 & dat2, dat2 & dat1, dat2.)
ACF of residuals

The correct model is fitted to both series by MLE. (Figure: sample ACF/CCF matrix of the residuals fitunk1$res and fitunk2$res.) A few cross-correlation coefficients may appear slightly significant.
Bartlett's formula

More generally:

Theorem. If $\{X_t\}$ is a bivariate Gaussian time series with $\sum_{h=-\infty}^{+\infty}|\gamma_{ij}(h)| < \infty$, then
$$\lim_{n\to\infty} n\,\mathrm{Cov}\big(\hat\rho_{12}(h),\hat\rho_{12}(k)\big) = \sum_{j=-\infty}^{+\infty}\Big[\rho_{11}(j)\rho_{22}(j+k-h) + \rho_{12}(j+k)\rho_{21}(j-h)
- \rho_{12}(h)\big(\rho_{11}(j)\rho_{12}(j+k) + \rho_{22}(j)\rho_{21}(j-k)\big)
- \rho_{12}(k)\big(\rho_{11}(j)\rho_{12}(j+h) + \rho_{22}(j)\rho_{21}(j-h)\big)
+ \rho_{12}(h)\rho_{12}(k)\Big(\tfrac12\rho_{11}^2(j) + \rho_{12}^2(j) + \tfrac12\rho_{22}^2(j)\Big)\Big].$$
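A quick sanity check (hedged sketch): when $\rho_{12}\equiv 0$ and $h = k$, every term of Bartlett's formula vanishes except $\sum_j \rho_{11}(j)\rho_{22}(j)$, recovering the AR(1) constant from the earlier example.

```python
import numpy as np

# Independent AR(1) series with rho_ii(h) = 0.8^|h|; rho_12 = 0, h = k = 0.
J = np.arange(-500, 501)
rho11 = 0.8 ** np.abs(J)
h = k = 0
surviving = np.sum(rho11 * 0.8 ** np.abs(J + k - h))  # rho_11(j) rho_22(j+k-h)
print(round(surviving, 3))  # 4.556
```

This agrees with $\sum_h 0.64^{|h|} = 1.64/0.36 \approx 4.556$.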
Spectral density of multivariate series

If $\sum_{h=-\infty}^{+\infty}|\gamma_{ij}(h)| < \infty$, one can define
$$f(\lambda) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} e^{-ih\lambda}\,\Gamma(h),\qquad \lambda\in[-\pi,\pi],$$
and one obtains
$$\Gamma(h) = \int_{-\pi}^{\pi} e^{i\lambda h} f(\lambda)\,d\lambda$$
and
$$X_t = \int_{-\pi}^{\pi} e^{i\lambda t}\,dZ(\lambda),$$
where the $Z_i(\cdot)$ are (complex) processes with independent increments s.t.
$$\int_{\lambda_1}^{\lambda_2} f_{ij}(\lambda)\,d\lambda = E\Big(\big(Z_i(\lambda_2)-Z_i(\lambda_1)\big)\overline{\big(Z_j(\lambda_2)-Z_j(\lambda_1)\big)}\Big).$$
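The inversion formula $\Gamma(h) = \int e^{i\lambda h} f(\lambda)\,d\lambda$ can be verified numerically in the scalar case (a hedged sketch: the MA(1) autocovariances with $\vartheta = 0.5$, $\sigma^2 = 1$ are illustrative choices):

```python
import numpy as np

# ACVF of an MA(1): gamma(0) = 1.25, gamma(+-1) = 0.5, zero elsewhere.
gamma = {0: 1.25, 1: 0.5, -1: 0.5}
lam = np.linspace(-np.pi, np.pi, 20_001)
f = sum(g * np.exp(-1j * h * lam) for h, g in gamma.items()) / (2 * np.pi)

dl = lam[1] - lam[0]
g1 = (np.sum(np.exp(1j * lam) * f) * dl).real  # recover gamma(1) by quadrature
print(round(g1, 3))  # 0.5
```

The Riemann sum recovers $\gamma(1) = 0.5$ to within discretization error.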
Coherence of series

For a bivariate series, the coherence at frequency $\lambda$ is
$$\mathcal{X}_{12}(\lambda) = \frac{f_{12}(\lambda)}{[f_{11}(\lambda)f_{22}(\lambda)]^{1/2}}$$
and represents the correlation between $dZ_1(\lambda)$ and $dZ_2(\lambda)$. The squared coherency function $|\mathcal{X}_{12}(\lambda)|^2$ satisfies $0 \le |\mathcal{X}_{12}(\lambda)|^2 \le 1$.

Remark. If $X_{t,2} = \sum_{k=-\infty}^{+\infty}\psi_k X_{t-k,1}$, then $|\mathcal{X}_{12}(\lambda)|^2 \equiv 1$.
Periodogram

Define
$$J(\omega_j) = n^{-1/2}\sum_{t=1}^{n} X_t e^{-it\omega_j},\qquad \omega_j = 2\pi j/n,$$
for $j$ between $-[(n-1)/2]$ and $[n/2]$. Then $I_n(\omega_j) = J(\omega_j)J^*(\omega_j)$, where $^*$ means transpose and complex conjugate. In particular
$$I_{12}(\omega_j) = \frac1n\left(\sum_{t=1}^{n} X_{t1} e^{-it\omega_j}\right)\left(\sum_{t=1}^{n} X_{t2} e^{it\omega_j}\right)$$
is the cross-periodogram.
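The definition translates directly into code (a hedged sketch: the $e^{-it\omega}$ sign convention and the simulated data are assumptions for illustration):

```python
import numpy as np

# Periodogram matrix I_n(omega_j) = J(omega_j) J*(omega_j) via a direct DFT.
rng = np.random.default_rng(5)
n = 128
X = rng.standard_normal((n, 2))
t = np.arange(1, n + 1)

j = 5
wj = 2.0 * np.pi * j / n
Jv = X.T @ np.exp(-1j * t * wj) / np.sqrt(n)   # (J_1(w_j), J_2(w_j))
I = np.outer(Jv, np.conj(Jv))                  # 2x2 periodogram matrix

# diagonal entries are the univariate periodograms: real, non-negative,
# and equal to |FFT|^2 / n at the same Fourier frequency
print(abs(I[0, 0].imag) < 1e-12, I[0, 0].real >= 0)
```

The off-diagonal entry `I[0, 1]` is the cross-periodogram $I_{12}(\omega_j)$, complex in general.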
Estimation of spectral density and coherence

Again, one estimates $f(\lambda)$ by
$$\hat f(\lambda) = \frac{1}{2\pi}\sum_{k=-m_n}^{m_n} W_n(k)\, I_n\Big(g(n,\lambda) + \frac{2\pi k}{n}\Big).$$
If $X_t = \sum_{k=-\infty}^{+\infty} C_k Z_{t-k}$, $\{Z_t\}\sim\mathrm{IID}(0,S)$, then
$$\hat f_{ij}(\lambda) \text{ is } \mathrm{AN}\Big(f_{ij}(\lambda),\ f_{ij}(\lambda)\sum_{k=-m_n}^{m_n} W_n^2(k)\Big),\qquad 0<\lambda<\pi.$$
The natural estimator of $|\mathcal{X}_{12}(\lambda)|^2$ is
$$\hat\chi_{12}^2(\lambda) = \frac{|\hat f_{12}(\lambda)|^2}{\hat f_{11}(\lambda)\hat f_{22}(\lambda)}.$$
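A minimal version of this estimator can be sketched with a flat smoothing window (hedged: the flat window is an illustrative stand-in for $W_n$, and the second series is an exact linear filter of the first, so by the earlier remark the estimated squared coherency should be close to 1):

```python
import numpy as np

# Smoothed-periodogram estimate of squared coherency |f12|^2 / (f11 f22).
rng = np.random.default_rng(6)
n = 2048
x = rng.standard_normal(n + 1)
x1, x2 = x[1:], x[1:] + 0.5 * x[:-1]   # X_{t,2} = X_{t,1} + 0.5 X_{t-1,1}

J1 = np.fft.fft(x1) / np.sqrt(n)
J2 = np.fft.fft(x2) / np.sqrt(n)
I11, I22, I12 = np.abs(J1) ** 2, np.abs(J2) ** 2, J1 * np.conj(J2)

def smooth(a, m=8):
    # flat moving average over 2m+1 neighbouring Fourier frequencies
    return np.convolve(a, np.ones(2 * m + 1) / (2 * m + 1), mode="same")

chi2 = np.abs(smooth(I12)) ** 2 / (smooth(I11) * smooth(I22))
print(round(float(np.median(chi2)), 2))  # close to 1
```

Note that without smoothing the raw ratio $|I_{12}|^2/(I_{11}I_{22})$ is identically 1, so the smoothing step is essential for the estimator to be informative.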
An example of coherency estimation

Squared coherency between SOI and recruitment. (Figure: estimated squared coherency plotted against frequency, 0 to 6.) The horizontal line represents a (conservative) test of the hypothesis $|\mathcal{X}_{12}(\lambda)|^2 = 0$. There is strong coherency at the period of 1 year and at periods longer than 3 years.