Heavy Tailed Time Series with Extremal Independence

Rafał Kulik and Philippe Soulier

Conference in honour of Prof. Herold Dehling, Bochum, January 16, 2015
Plan

- Regular variation
- Extremal independence, extremal dependence
- Framework for extremal independence: Conditional Extreme Value
- Models: Markov chains, stochastic volatility models (with long memory)
- Statistical inference
- Simulations
- To do
Regular variation of random variables

A random variable $X$ is regularly varying with index $\alpha > 0$ if
$$P(X > x) = x^{-\alpha} L(x),$$
where $L(\cdot)$ is slowly varying at infinity. Define a sequence $a_n$ by
$$\lim_{n\to\infty} n P(X > a_n) = 1,$$
and define the measure $\nu_\alpha$ on the Borel sigma-field $\mathcal{B}((0,\infty])$ by
$$\nu_\alpha(dy) = \alpha y^{-\alpha-1} 1_{\{y>0\}}\, dy.$$

(All random variables considered in this talk are nonnegative. For a general theory of regular variation, see Resnick (2007).)
Then regular variation of $X$ is equivalent to the vague convergence on $\mathcal{B}((0,\infty])$
$$n P(a_n^{-1} X \in \cdot) \xrightarrow{v} \nu_\alpha, \quad n \to \infty.$$
Equivalently, for all continuous functions $f$ with compact support,
$$n \int f(x)\, P(a_n^{-1} X \in dx) \to \int f(x)\, \nu_\alpha(dx).$$
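The tail convergence above can be checked numerically. The sketch below (not from the talk; the sample size, threshold level and seed are our choices) uses a standard Pareto($\alpha$) sample and verifies that the empirical tail ratio $P(X > ty)/P(X > t)$ at an intermediate threshold $t$ is close to $\nu_\alpha((y,\infty]) = y^{-\alpha}$.

```python
import numpy as np

# Numerical sketch: for a standard Pareto(alpha) variable, P(X > x) = x**(-alpha)
# for x >= 1, so the tail ratio P(X > t*y) / P(X > t) should approach
# nu_alpha((y, inf]) = y**(-alpha) at an intermediate threshold t.
rng = np.random.default_rng(0)
alpha = 2.0
n = 200_000
x = rng.pareto(alpha, size=n) + 1.0     # standard Pareto(alpha) sample

k = 5_000                               # number of tail exceedances used
t = np.quantile(x, 1.0 - k / n)         # intermediate threshold
for y in (1.5, 2.0, 4.0):
    ratio = np.mean(x > t * y) / np.mean(x > t)
    print(f"y={y}: empirical {ratio:.3f} vs limit {y**(-alpha):.3f}")
```

The choice of an intermediate $k$ (here $k/n = 2.5\%$) mirrors the intermediate sequences used later in the statistical inference part of the talk.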
Regular variation of random vectors

Definition 1. A vector $X = (X_1, \dots, X_d)$ in $\mathbb{R}^d$ is regularly varying with index $\alpha$ if there exist a nondegenerate measure $\nu$ (homogeneous with index $\alpha$) on $[0,\infty]^d \setminus \{0\}$ and a sequence of constants $c_n$ such that $n P(c_n^{-1} X \in \cdot) \xrightarrow{v} \nu$.

A stationary time series is regularly varying if all its finite-dimensional distributions are regularly varying. Since we will be dealing with stationary time series, assume that all marginal distributions are the same. Then Definition 1 implies that
$$\lim_{x\to\infty} \frac{P(X_1 > x, \dots, X_d > x)}{P(X_1 > x)} = \frac{\nu((1,\infty] \times \cdots \times (1,\infty])}{\nu((1,\infty] \times [0,\infty] \times \cdots \times [0,\infty])} \in [0,\infty). \quad (1)$$
Extremal dependence / independence

Definition 2. Let $X$ be a regularly varying random vector in $\mathbb{R}^d$. It is said to be extremally independent if its exponent measure is concentrated on the axes, and extremally dependent if the exponent measure is not concentrated on the axes.

Extremal dependence implies that:
- the limit in (1) is positive;
- large values cluster, that is, a large value is followed by another large value;
- the Conditional Tail Expectation grows linearly:
$$\lim_{x\to\infty} x^{-1} E[X_j \mid X_1 > x] \in (0,\infty), \quad j = 1, 2, \dots.$$
Examples

(The random variables $Z_j$ below are independent and regularly varying with index $\alpha$.)

Extremal dependence:
1. AR(1) model: $X_j = \sum_{k=0}^{\infty} \rho^k Z_{j-k}$, $\rho \in (0,1)$.
2. Stationary solutions to stochastic recurrence equations $X_{j+1} = A_{j+1} X_j + B_{j+1}$, where $(A_j, B_j)$ are i.i.d. random vectors (under appropriate conditions; see Basrak, Davis, Mikosch (1999, 2002)).

Extremal independence (see Mikosch and Rezapour (2013), Kulik and Soulier (2014), Drees and Janssen (2014) for a variety of other models):
1. Exponential AR(1) model: the stationary solution to $X_{j+1} = X_j^{\varphi} Z_{j+1}$, $\varphi \in (0,1)$.
2. Stochastic volatility model: $X_j = \exp(Y_j) Z_j$, where $\{Y_j\}$ is a dependent Gaussian sequence.
Framework for extremal independence

Example 1. Let $X_1$ and $X_2$ be i.i.d. regularly varying random variables. Then
$$\lim_{x\to\infty} \frac{P(X_1 > x u_1, X_2 > x u_2)}{P(X_1 > x)} = 0, \quad (2)$$
$$\lim_{x\to\infty} \frac{P(X_1 > x u_1, X_2 > x u_2)}{P^2(X_1 > x)} = u_1^{-\alpha} u_2^{-\alpha}, \quad (3)$$
$$\lim_{x\to\infty} P(X_1 > x u_1, X_2 > u_2 \mid X_1 > x) = u_1^{-\alpha} P(X_2 > u_2). \quad (4)$$
The limit (2) shows that, under extremal independence, the usual extreme value theory is uninformative about events where both components are extremely large. The limits (3) and (4) show that a different scaling, in either the numerator or the denominator, may yield non-degenerate limits.
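The limits (2) and (3) can be illustrated by simulation. The sketch below (our illustration, with $u_1 = u_2 = 1$ and a finite threshold of our choosing) draws two i.i.d. standard Pareto($\alpha$) samples: the joint exceedance probability is negligible relative to $P(X_1 > x)$ but of the right order relative to $P^2(X_1 > x)$.

```python
import numpy as np

# Numerical sketch: two i.i.d. standard Pareto(alpha) variables are extremally
# independent.  At a high threshold thr, the joint exceedance probability is
# negligible relative to P(X1 > thr) (limit (2)) but comparable to
# P(X1 > thr)**2 (limit (3) with u1 = u2 = 1).
rng = np.random.default_rng(1)
alpha, n, thr = 2.0, 1_000_000, 5.0
x1 = rng.pareto(alpha, size=n) + 1.0
x2 = rng.pareto(alpha, size=n) + 1.0

p1 = np.mean(x1 > thr)                       # ~ thr**(-alpha) = 0.04
pjoint = np.mean((x1 > thr) & (x2 > thr))    # ~ thr**(-2*alpha) = 0.0016
print(pjoint / p1)       # small, tends to 0 as thr grows
print(pjoint / p1**2)    # tends to u1**(-alpha) * u2**(-alpha) = 1
```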
Extremal independence

Several concepts have been introduced to deal with extremally independent random vectors:
- Hidden Regular Variation and the Tail Dependence Coefficient, formalizing the situation in (3); a bivariate concept, not really applicable to time series (Resnick, series of papers over the last ten years; Ledford and Tawn (1996, 2003)).
- The Conditional Extreme Value framework, formalizing the situation in (4) (Heffernan and Resnick (2007); papers by Resnick over the last ten years; Kulik and Soulier (2014)).

General technical idea: change the space of vague convergence from $[0,\infty]^d \setminus \{0\}$ to something smaller.
Conditional Extreme Value framework

Assumption 1. There exist scaling functions $b_j$, $j \ge 1$, and Radon measures $\mu_d$, $d \ge 1$, on $(0,\infty] \times [0,\infty]^d$ such that
$$\frac{1}{P(X_0 > x)} P\left( \left( \frac{X_0}{x}, \frac{X_1}{b_1(x)}, \dots, \frac{X_d}{b_d(x)} \right) \in \cdot \right) \xrightarrow{v} \mu_d$$
on $(0,\infty] \times [0,\infty]^d$, and for all $y_0 > 0$:
1. the measure $\mu_d([y_0,\infty] \times \cdot)$ on $[0,\infty]^d$ is not concentrated on a line through infinity;
2. the measure $\mu_d([y_0,\infty] \times \cdot)$ on $[0,\infty]^d$ is not concentrated on a hyperplane;
3. the measure $\mu_d(\cdot \times [0,\infty]^d)$ on $(0,\infty]$ is not concentrated at infinity.
Consequences of the assumption:

1. We can define the multivariate distribution functions $\Psi_d$ on $[1,\infty) \times [0,\infty]^d$ by
$$\Psi_d(y) = \lim_{x\to\infty} P\left( \frac{X_0}{x} \le y_0, \frac{X_1}{b_1(x)} \le y_1, \dots, \frac{X_d}{b_d(x)} \le y_d \,\middle|\, X_0 > x \right).$$
2. The functions $b_h$ are regularly varying with index $\kappa_h$. We call $\kappa_h$ the (lag $h$) conditional scaling exponent. These indices reflect the influence of an extreme event at time zero on future lags.
3. The Conditional Tail Expectation at lag $h$ grows at rate $b_h$:
$$\lim_{x\to\infty} \frac{1}{b_h(x)} E[X_h \mid X_0 > x] \in (0,\infty), \quad h = 1, 2, \dots.$$
Models: Markov chains

Assumption 2. There exist a function $b$, regularly varying at infinity with index $\kappa \ge 0$, and a distribution function $G$ on $[0,\infty)$, not concentrated at one point, such that
$$\lim_{x\to\infty} \Pi(x, b(x) A) = G(A) \quad (5)$$
for all Borel sets $A \subset [0,\infty)$ such that $G(\partial A) = 0$.

This means that the transition kernel $\Pi$ is asymptotically homogeneous: conditionally on $X_0 = x$, the distribution of $X_1/b(x)$ converges weakly to the distribution $G$ as $x \to \infty$.
The next result states that Assumption 2 implies Assumption 1. Define $b_0(x) = x$, $b_1(x) = b(x)$ and, for $h \ge 1$, $b_h = b_{h-1} \circ b$.

Theorem 3. Let $\{X_j\}$ be a Markov chain whose transition kernel satisfies Assumption 2 and whose initial distribution has right tail index $\alpha > 0$. Assume moreover that $G(\{0\}) = 0$. Then Assumption 1 holds, and the limiting conditional distribution of
$$\left( \frac{X_0}{x}, \frac{X_1}{b_1(x)}, \dots, \frac{X_h}{b_h(x)}, \dots \right)$$
given $X_0 > x$ as $x \to \infty$ is the distribution of the exponential AR(1) process $\{Y_j, j \ge 0\}$ defined by $Y_j = Y_{j-1}^{\kappa} W_j$, where $\{W_j\}$ is an i.i.d. sequence with distribution $G$, independent of the standard Pareto random variable $Y_0$ with tail index $\alpha$. (We call $\{Y_j\}$ the tail chain.)
Models: Exponential AR(1)

Let the time series $\{V_j\}$ be defined by $V_j = e^{\xi_j}$ with
$$\xi_j = \varphi \xi_{j-1} + \epsilon_j,$$
where $0 \le \varphi < 1$ and $\{\epsilon_j, j \in \mathbb{Z}\}$ is an i.i.d. sequence such that $E[\epsilon_0] = 0$ and
$$P(e^{\epsilon_0} > x) = x^{-\alpha} \ell(x)$$
for some $\alpha > 0$ and a slowly varying function $\ell$. The random variables $V_j$ are regularly varying with index $\alpha$. The exponential AR(1) process satisfies the recursion $V_{j+1} = V_j^{\varphi} e^{\epsilon_{j+1}}$.
We have $\Pi(x, x^{\varphi} A) = G(A)$, where $G$ is the distribution of $e^{\epsilon_0}$. Since $G(\{0\}) = 0$, Theorem 3 is applicable. The tail chain is the non-stationary exponential AR(1) process $\{Y_j\}$ defined by $Y_j = Y_{j-1}^{\varphi} e^{\epsilon_j}$, where $Y_0$ is a standard Pareto random variable.

The limiting conditional distribution of $V_h$ given $V_0 > x$ is
$$\lim_{x\to\infty} P(V_h \le x^{\varphi^h} y \mid V_0 > x) = \int_1^{\infty} P\left( e^{\xi_{0,h}} \le v^{-\varphi^h} y \right) \alpha v^{-\alpha-1}\, dv.$$
The conditional scaling exponent is $\kappa_h = \varphi^h$. If $\alpha > 1$, then
$$\lim_{x\to\infty} E\left[ \frac{V_h}{x^{\kappa_h}} \,\middle|\, V_0 > x \right] = \frac{\alpha E[e^{\xi_{0,h}}]}{\alpha - \kappa_h} = \frac{\alpha E[V_0]}{(\alpha - \kappa_h) E[V_0^{\kappa_h}]}.$$
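The scaling $\kappa_h = \varphi^h$ can be seen in simulation. On the log scale $\xi_h = \varphi^h \xi_0 + \text{(future noise with mean zero)}$, so $E[\xi_h \mid \xi_0 > s] = \varphi^h E[\xi_0 \mid \xi_0 > s]$ exactly. The sketch below (our illustration; the particular noise distribution, with $e^{\epsilon}$ Pareto-tailed and $E[\epsilon] = 0$, is an assumption of ours) checks this ratio empirically.

```python
import numpy as np

# Simulation sketch of the exponential AR(1): xi_j = phi*xi_{j-1} + eps_j,
# V_j = exp(xi_j).  We choose eps = E/alpha - 1/alpha with E standard
# exponential, so E[eps] = 0 and P(e^eps > x) = e^{-1} x^{-alpha}.
# Since xi_h = phi**h * xi_0 + mean-zero future noise, the conditional mean
# ratio E[xi_h | xi_0 > s] / E[xi_0 | xi_0 > s] equals kappa_h = phi**h.
rng = np.random.default_rng(2)
phi, alpha, n = 0.7, 2.0, 500_000
eps = rng.exponential(size=n) / alpha - 1.0 / alpha
xi = np.empty(n)
xi[0] = 0.0
for j in range(1, n):
    xi[j] = phi * xi[j - 1] + eps[j]

s = np.quantile(xi, 0.99)                 # high threshold for conditioning
idx = np.where(xi[:-2] > s)[0]            # times j with xi_j > s (room for lag 2)
for h in (1, 2):
    ratio = np.mean(xi[idx + h]) / np.mean(xi[idx])
    print(f"h={h}: {ratio:.3f} vs kappa_h = {phi**h:.3f}")
```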
Models: Stochastic volatility process with heavy-tailed volatility

Assume that $X_j = V_j Z_j = e^{\xi_j} Z_j$, where $\{\xi_j, j \in \mathbb{Z}\}$ is the AR(1) process considered before and $\{Z_j, j \in \mathbb{Z}\}$ is a sequence of i.i.d. random variables, independent of $\{\xi_j, j \in \mathbb{Z}\}$, such that $E[Z_0^q] < \infty$ for some $q > \alpha$. Then
$$P(X_0 > x) \sim E[Z_0^{\alpha}]\, P(e^{\xi_0} > x).$$
The conditional scaling exponent is thus $\kappa_h = \varphi^h$, and
$$P(X_h \le x^{\kappa_h} y \mid X_0 > x) \to \int_0^{\infty} P\left( Z_0 > v^{-1},\, Z_h e^{\xi_{0,h}} \le v^{-\kappa_h} y \right) \alpha v^{-\alpha-1}\, dv.$$
Extensions: $V_j = e^{\xi_j}$, where $\xi_j$ is a long memory linear process with double-exponential marginals.
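The tail equivalence $P(X_0 > x) \sim E[Z_0^{\alpha}] P(V > x)$ is a Breiman-type lemma and is easy to check numerically. In the sketch below (our illustration), $V$ is taken to be exactly standard Pareto($\alpha$), a simplified stand-in for the volatility $e^{\xi_0}$, and $Z$ is uniform on $(0,2)$, which is light tailed with $E[Z^{\alpha}] = 4/3$ for $\alpha = 2$.

```python
import numpy as np

# Numerical sketch of the Breiman-type tail equivalence
#   P(V*Z > x) ~ E[Z^alpha] * P(V > x),  Z light tailed, V Pareto(alpha).
# Here V is exact standard Pareto(2) (stand-in for e^{xi_0}) and Z ~ U(0, 2),
# for which E[Z^2] = 4/3.
rng = np.random.default_rng(3)
alpha, n, thr = 2.0, 1_000_000, 10.0
v = rng.pareto(alpha, size=n) + 1.0
z = rng.uniform(0.0, 2.0, size=n)
x = v * z

ratio = np.mean(x > thr) / np.mean(v > thr)
print(ratio)    # should be close to E[Z^alpha] = 4/3
```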
Statistical inference

Goals: estimate $b_h(\cdot)$, $\kappa_h$ and the limiting conditional distribution.

Define the bivariate tail empirical distribution function by
$$\widetilde{H}_n(s, y) = \frac{1}{n \bar{F}(u_n)} \sum_{j=1}^{n} 1_{\{X_j > u_n s,\, X_{j+h} \le b_h(u_n) y\}}, \quad s > 0,\ y \in \mathbb{R}.$$
Let $H_n(s, y) = E[\widetilde{H}_n(s, y)]$. Under Assumption 1, we can define $H(s, y) = \lim_{n\to\infty} E[\widetilde{H}_n(s, y)]$. We consider the centered and renormalized tail empirical process
$$\mathbb{G}_n(s, y) = \sqrt{n \bar{F}(u_n)}\, \{\widetilde{H}_n(s, y) - H_n(s, y)\}.$$
Theorem 4. Let $\{X_j\}$ be a strictly stationary sequence such that Assumption 1 holds. Under appropriate weak dependence, anti-clustering and continuity assumptions (see Kulik and Soulier (2015); also Davis and Mikosch (2009), Drees and Rootzén (2010)), the process $\mathbb{G}_n$ converges weakly to $\mathbb{G}$, where $\mathbb{G}$ is an almost surely continuous centered Gaussian process with covariance function $\sum_{j \in \mathbb{Z}} c_j(s, t, x, y)$, where
$$c_j(s, t, x, y) = \lim_{n\to\infty} \frac{1}{\bar{F}(u_n)} P(X_0 > u_n s,\, X_j > u_n t,\, X_h \le b_h(u_n) x,\, X_{j+h} \le b_h(u_n) y).$$

Current work (with Philippe Soulier and Olivier Wintenberger): verification of the weak dependence, anti-clustering and continuity assumptions by drift conditions for Markov chains.
Estimation of $b_h$ and the conditional distribution

Let $\Psi_h(y) = \lim_{x\to\infty} P(X_h \le b_h(x) y \mid X_0 > x) = H(1, y)$. Let $k = k_n$ be an intermediate sequence, that is, $k \to \infty$ and $k/n \to 0$, and for the chosen $k$ let $u_n$ be such that $k = n \bar{F}(u_n)$. From Theorem 4 we can conclude that
$$\frac{X_{n:n-k}}{u_n} \xrightarrow{p} 1,$$
where $X_{n:1} \le X_{n:2} \le \cdots \le X_{n:n}$ are the order statistics of $X_1, \dots, X_n$. It is thus reasonable to expect that
$$\frac{1}{k} \sum_{j=1}^{n} \frac{X_{j+h}}{b_h(u_n)}\, 1_{\{X_j > X_{n:n-k}\}} \xrightarrow{p} 1.$$
Therefore, we define an estimator of $b_h(u_n)$ by
$$\hat{b}_{h,n} = \frac{1}{k} \sum_{j=1}^{n} X_{j+h}\, 1_{\{X_j > X_{n:n-k}\}}.$$
A natural candidate to estimate $\Psi_h$ is then
$$\hat{\Psi}_{h,n}(y) = \frac{1}{k} \sum_{j=1}^{n} 1_{\{X_j > X_{n:n-k}\}}\, 1_{\{X_{j+h} \le \hat{b}_{h,n} y\}} = \widetilde{H}_n\left( \frac{X_{n:n-k}}{u_n}, \frac{\hat{b}_{h,n}}{b_h(u_n)}\, y \right).$$
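These two estimators are straightforward to implement. The sketch below (helper names are ours, not the talk's) tests them on i.i.d. Pareto data, where extremal independence makes the limits explicit: $X_{j+h}$ is independent of the event $\{X_j \text{ large}\}$, so $b_h$ is asymptotically constant (equal to $E[X_0]$, with $\kappa_h = 0$) and $\Psi_h(y) = P(X_0 \le E[X_0]\, y)$.

```python
import numpy as np

# Minimal sketch of the estimators hat{b}_{h,n} and hat{Psi}_{h,n}.
# Benchmark (our choice): i.i.d. standard Pareto(3) data, for which
# b_h -> E[X_0] = alpha/(alpha-1) = 1.5 and Psi_h(y) = P(X_0 <= 1.5*y).

def hat_b(x, h, k):
    """Estimator of b_h(u_n): mean of X_{j+h} over the k largest X_j."""
    n = len(x)
    thr = np.sort(x[: n - h])[-(k + 1)]      # X_{n:n-k}, the (k+1)-th largest
    exceed = x[: n - h] > thr                # exactly k exceedances (no ties a.s.)
    return np.sum(x[h:][exceed]) / k

def hat_psi(x, h, k, y):
    """Estimator of Psi_h(y) = lim P(X_h <= b_h(x) y | X_0 > x)."""
    n = len(x)
    thr = np.sort(x[: n - h])[-(k + 1)]
    exceed = x[: n - h] > thr
    b = np.sum(x[h:][exceed]) / k
    return np.sum(exceed & (x[h:] <= b * y)) / k

rng = np.random.default_rng(4)
alpha, n, h, k = 3.0, 500_000, 1, 5_000
x = rng.pareto(alpha, size=n) + 1.0          # i.i.d. standard Pareto(3)

print(hat_b(x, h, k))          # ~ E[X_0] = 1.5
print(hat_psi(x, h, k, 1.0))   # ~ 1 - 1.5**(-3) ≈ 0.704
```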
Theorem 5. Under the conditions of Theorem 4,
$$\sqrt{k} \left\{ \frac{\hat{b}_{h,n}}{b_h(u_n)} - 1,\ \frac{X_{n:n-k}}{u_n} - 1 \right\} \xrightarrow{d} (Z_1, Z_2), \quad (6)$$
where
$$Z_1 = \int_1^{\infty} y\, \mathbb{G}(dx, dy) - \alpha^{-1} \mathbb{G}(1, \infty), \qquad Z_2 = \alpha^{-1} \mathbb{G}(1, \infty).$$
If moreover the function $H$ is continuously differentiable, then
$$\sqrt{k}\, \{\hat{\Psi}_{h,n} - \Psi_h\} \Rightarrow \Lambda \quad \text{in } D((-\infty, \infty)),$$
where $\Lambda$ is defined by
$$\Lambda(y) = \mathbb{G}(1, y) + \frac{\partial H}{\partial x}(1, y)\, Z_2 + y\, \frac{\partial H}{\partial y}(1, y)\, Z_1.$$
To do

- Testing for extremal independence; in particular, how can we use our results to distinguish between GARCH and stochastic volatility models?
- Statistical inference in the case of long memory.
- Estimation of the scaling functions $b_h$ in the case of infinite variance: stable limits require convergence in the $M_1$ or $M_2$ topology.