Practical conditions on Markov chains for weak convergence of tail empirical processes
Olivier Wintenberger, University of Copenhagen and Paris VI
Joint work with Rafał Kulik and Philippe Soulier
Toronto, May 4, 2016
1/31
I. Regularly varying time series II. The tail empirical process III. Statistical applications 2/31
I. Regularly varying time series

Let {X_t, t ∈ ℤ} be a real-valued stationary time series. It is regularly varying iff for any k ≥ 0,

    L( (X_0/x, X_1/x, ..., X_k/x) | |X_0| > x ) → (Y_0, ..., Y_k)  as x → ∞.

This defines a process {Y_k, k ≥ 0} called the tail process, Basrak and Segers (2009).
- Y_0 has a two-sided Pareto distribution on (−∞, −1] ∪ [1, ∞).
- If for some j ≥ 1, X_j is extremally independent of X_0, then Y_j ≡ 0.
- |Y_0| is independent of (Y_j/|Y_0|)_{j≥0} = (Θ_j)_{j≥0}, called the spectral tail process.
3/31
Harris recurrent Markov chains

Assume X_t = g(Φ_t), with {Φ_t} an aperiodic, recurrent and irreducible Markov chain with kernel P admitting an invariant distribution π, such that X_0 = g(Φ_0) is regularly varying. Then {X_t} is a stationary regularly varying time series, Resnick and Zeber (2013), Janssen and Segers (2014). Its tail process is a multiplicative random walk:

    Y_t = A_t ⋯ A_1 Y_0,  t ≥ 0,

for some iid sequence (A_t).
4/31
Drift condition

Assume the following drift condition holds:

    PV(x) ≤ λ V(x) + b 1_C(x),

where λ ∈ (0, 1), V : E → [1, ∞) and C is a small set. Under these conditions the chain (Φ_t) is β-mixing with geometric rate, and so is (X_t), Meyn and Tweedie (1997).
5/31
The condition DC_p, p > 0, Mikosch and W. (2014)

Let {u_n} be an increasing sequence such that

    lim_n F̄(u_n) = 0  and  lim_n n F̄(u_n) = ∞.

There exist p ∈ (0, α/2) and a constant c such that

    g^p ≤ c V,                                                              (1)

and, for every compact set [a, b] ⊂ (0, ∞),

    limsup_n  sup_{a ≤ s ≤ b}  E[ V(Φ_0) 1_{s u_n < g(Φ_0)} ] / ( u_n P(X_0^p > u_n) ) < ∞.   (2)

Roughly, (1) and (2) imply the existence of c_1, c_2 and n_0 such that

    c_1 P(V(Φ_0) > u_n) ≤ P(|X_0|^p > u_n) ≤ c_2 P(V(Φ_0) > u_n),  n ≥ n_0.
6/31
Example 1: the GARCH(1,1) process

We consider a GARCH(1,1) process X_t = σ_t Z_t, where (Z_t) is iid N(0, 1) and

    σ_t² = α_0 + σ_{t−1}² (α_1 Z_{t−1}² + β_1) = α_0 + σ_{t−1}² A_t.

This is a special SRE; DC_p then holds under Kesten's condition E[A_0^{α/2}] = 1, for p < α, with V(x) = |x|^p and g(x) = x, Mikosch and W. (2014).
7/31
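For intuition, the GARCH(1,1) recursion above can be simulated directly. A minimal sketch; the parameter values and the function name are illustrative choices, not from the talk:

```python
import numpy as np

def simulate_garch11(n, alpha0=0.1, alpha1=0.3, beta1=0.5, burn=1000, seed=0):
    """Simulate X_t = sigma_t * Z_t with iid N(0,1) innovations, where
    sigma_t^2 = alpha0 + sigma_{t-1}^2 * (alpha1 * Z_{t-1}^2 + beta1),
    i.e. A_t = alpha1 * Z_{t-1}^2 + beta1 in the SRE form of the slide."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n + burn)
    sig2 = np.empty(n + burn)
    sig2[0] = alpha0 / (1.0 - alpha1 - beta1)  # stationary mean of sigma_t^2
    for t in range(1, n + burn):
        sig2[t] = alpha0 + sig2[t - 1] * (alpha1 * z[t - 1] ** 2 + beta1)
    return np.sqrt(sig2[burn:]) * z[burn:]     # discard the burn-in period
```

Since α_1 + β_1 < 1 here, the path is stationary with finite variance; heavier tails arise as α_1 + β_1 approaches 1.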
Example 2: the AR(p) process

Assume that {X_j, j ∈ ℤ} is an AR(p) model

    X_j = φ_1 X_{j−1} + ⋯ + φ_p X_{j−p} + ε_j,  j ≥ 1,

that satisfies the following conditions:
- {ε_j, j ∈ ℤ} are iid, regularly varying with index α;
- the polynomial 1 − φ_1 z − ⋯ − φ_p z^p has no root in the closed unit disk.
As for random-coefficient AR models, DC_p holds for p < α, for some V and with g(x_1, ..., x_p) = x_1, Feigin and Tweedie (1985).
8/31
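Such an AR(p) path with regularly varying innovations is easy to simulate; a sketch, assuming symmetrized Pareto-type noise as an illustrative choice of regularly varying innovation (the function name and defaults are not from the talk):

```python
import numpy as np

def simulate_ar(phi, n, alpha=2.5, burn=500, seed=1):
    """Simulate an AR(p) path driven by symmetrized Pareto-type innovations,
    which are regularly varying with index alpha."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    # symmetric heavy-tailed noise: P(|eps| > t) ~ t^(-alpha)
    eps = rng.pareto(alpha, n + burn) * rng.choice([-1.0, 1.0], size=n + burn)
    x = np.zeros(n + burn)
    for j in range(p, n + burn):
        # x[j-p:j][::-1] lists X_{j-1}, ..., X_{j-p} in that order
        x[j] = np.dot(phi, x[j - p:j][::-1]) + eps[j]
    return x[burn:]
```

With α = 2.5 the innovations have finite variance, so the usual autocorrelation structure of the AR model is visible in simulations.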
Example 3: the Threshold ARCH(1) process

Let ξ ∈ ℝ. Assume that {X_j} is the T-ARCH model

    X_j = (b_10 + b_11 X_{j−1}²)^{1/2} Z_j 1_{X_{j−1} < ξ} + (b_20 + b_21 X_{j−1}²)^{1/2} Z_j 1_{X_{j−1} ≥ ξ},

that satisfies the following conditions:
- b_ij > 0;
- {Z_j, j ∈ ℤ} are iid Gaussian r.v.;
- the Lyapunov exponent γ = p log b_11^{1/2} + (1 − p) log b_21^{1/2} + E[log|Z_1|], where p = P(Z_1 < 0), is strictly negative;
- (b_11 b_21)^{p/2} E[|Z_0|^p] < 1.
Then DC_p holds.
9/31
A counterexample

Let {Z_j, j ∈ ℤ} be iid positive integer-valued r.v., regularly varying with index β > 1. Define the Markov chain {X_j, j ≥ 0} by the recursion

    X_j = X_{j−1} − 1  if X_{j−1} > 1,
    X_j = Z_j          if X_{j−1} = 1.

Since β > 1, the chain admits a stationary distribution π on ℕ given by

    π(n) = P(Z_0 ≥ n) / E[Z_0],  n ≥ 1.

Karamata's Theorem implies that π is regularly varying with index α = β − 1.
10/31
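This countdown chain can be simulated in a few lines; a sketch, where sampling Z by inverse transform so that P(Z ≥ m) decays like m^(−β) is an illustrative choice, not from the talk:

```python
import numpy as np

def simulate_queue(n, beta=1.5, seed=2):
    """Simulate the counterexample chain: X_j = X_{j-1} - 1 while X_{j-1} > 1,
    and X_j = Z_j once the chain hits the atom {1}, with Z regularly varying
    with index beta > 1."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=np.int64)
    x[0] = 1
    for j in range(1, n):
        if x[j - 1] > 1:
            x[j] = x[j - 1] - 1          # deterministic countdown
        else:
            # inverse transform: ceil(U^(-1/beta)) has P(Z >= m) ~ m^(-beta)
            x[j] = int(np.ceil(rng.uniform() ** (-1.0 / beta)))
    return x
```

The long deterministic descents from each heavy-tailed jump are exactly what destroys geometric ergodicity and makes the extremal index 0.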
- The state {1} is an atom and the return time to the atom is distributed as Z_0.
- The chain is not geometrically ergodic and DC_p cannot hold.
- The time series is regularly varying with Y_j = 1 for all j ≥ 0.
- It admits a phantom distribution, namely the distribution of Z_0, O'Brien (1987), Doukhan et al. (2015):

    P( max_{1 ≤ j ≤ n} X_j ≤ u_n ) − P(Z_0 ≤ u_n)^{n/E[Z_0]} → 0.

[Figure: sample paths of the GARCH (top) and QUEUE (bottom) processes over 10000 time steps.]
11/31
II. The tail empirical process

II.1. Univariate TEP

The univariate tail empirical process is defined by

    e_n(t) = (1 / (n F̄(u_n))) Σ_{i=1}^n { 1_{X_i > u_n t} − F̄(u_n t) },

relevant to infer the marginal tail. In the iid case,

    √(n F̄(u_n)) e_n(t) ⇒ W(t^{−α}),

where W is the standard Brownian motion and ⇒ means weak convergence in the space D(0, ∞] endowed with the J_1 topology, Starica and Resnick (1997).
12/31
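For intuition, the uncentered counterpart T_n(t) = (n F̄(u_n))^{-1} Σ_i 1_{X_i > u_n t}, with the empirical tail in place of the unknown F̄, can be computed directly; for regularly varying data it should be close to t^{−α} for t ≥ 1. A sketch; the function name and the use of an empirical quantile as threshold are illustrative assumptions:

```python
import numpy as np

def tail_empirical_df(x, u, t):
    """Uncentered tail empirical distribution function:
    T_n(t) = (# of X_i > u * t) / (# of X_i > u).
    For regularly varying data with index alpha, T_n(t) ~ t^(-alpha), t >= 1."""
    x = np.asarray(x, dtype=float)
    k = np.count_nonzero(x > u)          # number of exceedances of u
    return np.count_nonzero(x > u * t) / k
```

For instance, with iid Pareto data of index α = 2 and u the empirical 95% quantile, T_n(2) should be close to 2^{−2} = 0.25.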
II.2. Multivariate TEP

For each quantity of interest, an appropriate type of TEP is used.

Joint exceedances, Kulik and Soulier (2015), extremograms, Davis and Mikosch (2009):

    e_n(t_1, ..., t_h) = (1 / (n F̄(u_n))) Σ_{i=1}^n { 1_{X_{i+1} > u_n t_1, ..., X_{i+h} > u_n t_h} − P(X_1 > u_n t_1, ..., X_h > u_n t_h) }.

Conditional limiting distributions:

    e_n(t_1, ..., t_h) = (1 / (n F̄(u_n))) Σ_{i=1}^n 1_{X_{i+1} ≤ u_n t_1, ..., X_{i+h} ≤ u_n t_h} 1_{X_i > u_n} − P(X_1 ≤ u_n t_1, ..., X_h ≤ u_n t_h | X_0 > u_n).
13/31
Spectral tail process, Drees et al. (2015). The relevant TEP is

    e_n(t) = (1 / (n F̄(u_n))) Σ_{i=1}^n 1_{X_{i+1}/X_i ≤ t, X_i > u_n} − P(X_1/X_0 ≤ t | X_0 > u_n).

Many variations exist and, more generally, cluster functionals, Drees and Rootzén (2010). Weighted versions:

    e_n(t_1, ..., t_h) = (1 / (n F̄(u_n))) Σ_{i=1}^n { Ψ(X_{i+1}, ..., X_{i+h}) 1_{X_{i+1} > u_n t_1, ..., X_{i+h} > u_n t_h} − E[Ψ(X_1, ..., X_h) 1_{X_1 > u_n t_1, ..., X_h > u_n t_h}] }.
14/31
Extremal quantities of interest
- Univariate: the tail index; extreme quantiles.
- Multivariate, extremal dependence: various coefficients of extremal dependence; extremograms; distribution of the tail process.
- Multivariate, extremal independence: conditional limiting distributions; conditional scaling exponents.
- Infinite dimensional: extremal index; cluster functionals.
15/31
Conditions for time series

To prove the weak convergence of the tail empirical process for time series, three types of conditions are needed:
- a temporal dependence condition;
- a tightness criterion;
- a finite clustering condition;
plus bias conditions. We will not discuss the bias issues, which are important but of a completely different nature (second-order conditions).
16/31
Temporal dependence

β-mixing allows the blocks method, Rootzén (2009). For a suitable choice of sequences r_n and u_n, the limiting distribution of e_n is the same as that of

    ẽ_n(t) = (1 / (n F̄(u_n))) Σ_{i=1}^{m_n} Σ_{j=1}^{r_n} { 1_{X̃_{(i−1)r_n+j} > u_n t} − F̄(u_n t) },

where m_n = [n/r_n] and the coupling blocks, Yu (1994),

    { X̃_j, j = (i−1)r_n + 1, ..., i r_n },  i = 1, ..., m_n,

are iid, each distributed as one original block of {X_j}. Given this approximation, we need two more conditions: existence of the limiting variance of each block, suitably normalized, and tightness of the TEP.
17/31
The variance of one block

By standard computations, we obtain

    (1 / (r_n F̄(u_n))) var( Σ_{j=1}^{r_n} 1_{X_j > u_n s} )
      = F̄(u_n s)/F̄(u_n) + Σ_{j=1}^{r_n} P(X_0 > u_n s, X_j > u_n s)/F̄(u_n) + O(r_n F̄(u_n))
      ≈ s^{−α} + Σ_{j=1}^{r_n} P(X_0 > u_n s, X_j > u_n s)/F̄(u_n) + O(r_n F̄(u_n)).

If we assume that r_n F̄(u_n) → 0, then the sum above must have a limit.
18/31
Note that joint regular variation of (X_0, X_j) implies that, for fixed L > 0, by definition of the tail process {Y_j, j ≥ 1},

    lim_n Σ_{j=1}^L P(X_0 > u_n s, X_j > u_n s)/F̄(u_n) = s^{−α} Σ_{j=1}^L P(Y_j > 1).

This implies that for each L ≥ 1,

    liminf_n (1 / (r_n F̄(u_n))) var( Σ_{j=1}^{r_n} 1_{X_j > u_n s} ) ≥ s^{−α} Σ_{j=1}^L P(Y_j > 1).

Thus a necessary condition for the quantity on the left-hand side to have a finite limit is

    Σ_{j=1}^∞ P(Y_j > 1) < ∞.
19/31
Finite clustering condition

A sufficient condition is

    lim_{L→∞} limsup_n (1/F̄(u_n)) Σ_{L ≤ j ≤ r_n} P(X_0 > u_n s, X_j > u_n s) = 0          (FC)

(known under the misleading name anti-clustering condition). If Condition FC holds, then for s ≤ t,

    lim_n n F̄(u_n) cov(e_n(t), e_n(s)) = s^{−α} Σ_{j∈ℤ} P(Y_j > t/s),

where {Y_j, j ∈ ℤ} is the tail process (extended to negative integers thanks to the forward–backward formula of Basrak and Segers, 2009).
20/31
- If the time series is extremally independent, i.e. Y_j = 0 for j ≠ 0, then the limit is the same as in the iid case.
- If Condition FC holds, then the extremal index θ of the chain {X_t} is positive and is given by, Basrak and Segers (2009),

    θ = P( max_{k ≥ 1} Y_k ≤ 1 ) > 0.

- In the counterexample θ = 0, Roberts et al. (2006).
21/31
Checking Condition FC for Markov chains

Lemma. Under the DC_p and minorization conditions, Condition FC holds.

Proof: The set C is a regenerative set. Let τ_C be the first return time to C. Fix an integer L > 0 and split the sum at τ_C:

    (1/F̄(u_n)) E[ Σ_{j=L}^{r_n} 1_{s u_n < X_0} 1_{s u_n < X_j} ]
      ≤ (1/F̄(u_n)) E[ Σ_{j=L}^{τ_C} 1_{s u_n < X_0} 1_{s u_n < X_j} ]
        + (1/F̄(u_n)) E[ Σ_{j=τ_C+1}^{r_n} 1_{s u_n < X_0} 1_{s u_n < X_j} 1_{τ_C ≤ r_n} ].

The last term is dealt with by standard regenerative arguments.
22/31
For r > 1, the sum up to the first visit is bounded by

    (1/F̄(u_n)) E[ Σ_{j=L}^{τ_C} 1_{s u_n < X_0} 1_{s u_n < X_j} ]
      ≤ ∫ E[ Σ_{j=L}^{τ_C} 1_{s u_n < X_j} | Φ_0 = x ] ν_n(dx)
      ≤ C u_n^{−p} r^{−L} ∫ E[ Σ_{j=L}^{τ_C} r^j V(Φ_j) | Φ_0 = x ] ν_n(dx).

The geometric drift condition states precisely that there exists r > 1 such that

    E[ Σ_{j=0}^{τ_C} r^j V(Φ_j) | Φ_0 = x ] ≤ C |x|^p.

Plugging this bound into the integral yields

    (1/F̄(u_n)) E[ Σ_{j=L}^{τ_C} 1_{s u_n < X_0} 1_{s u_n < X_j} ] ≤ C r^{−L}.
23/31
Tightness

Tightness of the process e_n under the β-mixing condition holds under the following sufficient condition: for each a > 0 and s, t ≥ a,

    limsup_n (1 / (r_n F̄(u_n))) E[ ( Σ_{j=1}^{r_n} 1_{u_n s < X_j ≤ u_n t} )² ] ≤ C (s^{−α} − t^{−α}).

This condition can be proved for Markov chains which satisfy the DC_p and minorization conditions as above. Moreover, the following restriction on the block size is needed:

    lim_n r_n / √(n F̄(u_n)) = 0.
24/31
Theorem. Let the DC_p and minorization conditions hold, and assume moreover that there exists η > 0 such that

    lim_n log^{1+η}(n) { F̄(u_n) + 1/(n F̄(u_n)) } = 0.

Let s_0 > 0 be fixed. The TEP converges weakly in ℓ^∞([s_0, ∞)) to a centered Gaussian process at the rate √(n F̄(u_n)). If ψ : ℝ^{h+1} → ℝ is such that

    |ψ(x_0, ..., x_h)| ≤ c( (|x_0| ∨ 1)^{q_0} + ⋯ + (|x_h| ∨ 1)^{q_h} ),

with q_i + q_{i′} ≤ p ≤ α/2 for all i, i′ = 0, ..., h, then the weighted TEP converges weakly to a centered Gaussian process at the rate √(n F̄(u_n)).
25/31
Examples and a counterexample

Most usual Markovian time series satisfy the DC_p condition:
- GARCH(1,1), SRE, AR(p),
- functional autoregressive processes X_{t+1} = g(X_t) + Z_t with limsup_{|x|→∞} |g(x)|/|x| = λ < 1,
- threshold models, AR-ARCH models, etc.

The counterexample does not satisfy the DC_p condition and its TEP converges at a rate different from the iid case. The limit is not Gaussian if var(Z) = ∞.
26/31
III. Statistical applications

In the case of extremal independence, Y_j = 0 for all j ≠ 0, under the DC_p and minorization conditions, the univariate TEP has the same limit as in the iid case. Thus most of the statistical inference works similarly to the iid case.
27/31
The Hill estimator

The classical Hill estimator of γ = 1/α is defined as

    γ̂ = (1/k) Σ_{j=1}^k log( X_{n:n−j+1} / X_{n:n−k} ).

Corollary. Under the DC_p and minorization conditions, and if one can neglect the bias, then

    √k (γ̂ − γ) ⇒ N( 0, α^{−2} ( 1 + 2 Σ_{j=1}^∞ P(Y_j > 1 | Y_0 > 1) ) ).

Notice that the limit variance is the one from the iid case multiplied by 1 + 2 Σ_{j≥1} P(Y_j > 1 | Y_0 > 1), a sum of extremogram terms.
28/31
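The estimator itself is a one-liner on sorted data. A minimal sketch, matching the order-statistics formula above (the function name is an illustrative choice):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator gamma_hat = (1/k) * sum_{j=1}^{k}
    log(X_{n:n-j+1} / X_{n:n-k}), based on the k largest order statistics."""
    xs = np.sort(np.asarray(x, dtype=float))
    # xs[-k:] are the k largest values, xs[-k-1] is X_{n:n-k}
    return float(np.mean(np.log(xs[-k:] / xs[-k - 1])))
```

On exact iid Pareto(α = 2) data the estimator concentrates around γ = 1/α = 0.5.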
Estimation of the cluster index

Under Condition DC_p, the cluster index b_+ exists, Mikosch and Wintenberger (2014):

    b_+(h) = (1/h) lim_n P(X_0 + ⋯ + X_h > u_n) / F̄(u_n),   lim_{h→∞} b_+(h) = b_+.

Define

    b̂_+(h) = (1/(kh)) Σ_{j=1}^{n−h} 1_{X_j + ⋯ + X_{j+h} > X_{n:n−k}}.

Corollary. Under the DC_p and minorization conditions, and if one can neglect the bias, √k (b̂_+(h) − b_+(h)) converges weakly to a Gaussian r.v.
29/31
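The estimator b̂_+(h) above can be computed with rolling sums; a sketch, using a cumulative-sum trick for the windows (the function name is an illustrative choice, not from the talk):

```python
import numpy as np

def cluster_index_estimator(x, h, k):
    """Estimator b_hat_+(h) = (kh)^{-1} * sum_{j=1}^{n-h}
    1{X_j + ... + X_{j+h} > X_{n:n-k}}, where X_{n:n-k} is the
    (k+1)-th largest order statistic, as on the slide."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    thresh = np.sort(x)[n - k - 1]                 # X_{n:n-k}
    csum = np.concatenate(([0.0], np.cumsum(x)))
    window_sums = csum[h + 1:] - csum[:n - h]      # sums of h+1 consecutive terms
    return float(np.count_nonzero(window_sums > thresh)) / (k * h)
```

Note that for iid positive heavy-tailed data the limit b_+(h) equals (h+1)/h, since a window sum exceeds a high threshold essentially when a single term does.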
What about the counterexample?

In most statistical applications we are interested in the behavior of the TEP e_n at X_{n:n−k}/u_n. Both quantities behave in a very unusual way, but the ultimate estimation seems to work normally.

[Figure: estimation results for the counterexample.]
30/31
Thanks! 31/31