On the estimation of the heavy tail exponent in time series using the max spectrum. Stilian A. Stoev

On the estimation of the heavy tail exponent in time series using the max spectrum
Stilian A. Stoev (sstoev@umich.edu), University of Michigan, Ann Arbor, U.S.A.
JSM, Salt Lake City, 2007
Joint work with George Michailidis (gmichail@umich.edu) and Murad Taqqu (murad@math.bu.edu)

Outline
Heavy tails are ubiquitous
An old problem
Max spectrum
The estimator
Asymptotic properties
Data examples

Heavy tails
A random variable $X$ is said to be heavy tailed if $P\{|X| > x\} \sim L(x)\,x^{-\alpha}$, as $x \to \infty$, for some $\alpha > 0$ and a slowly varying function $L$.
Here we focus on the simpler but important context: $X \ge 0$ a.s. and $P\{X > x\} \sim C x^{-\alpha}$, as $x \to \infty$.
Infinite moments: for $p > 0$, $E X^p < \infty$ if and only if $p < \alpha$. In particular,
$0 < \alpha \le 2 \Rightarrow \mathrm{Var}(X) = \infty$ and $0 < \alpha \le 1 \Rightarrow E|X| = \infty$.
The estimation of the heavy tail exponent $\alpha$ is an important problem with a rich history.
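As a worked check of the moment statement, take the exact Pareto tail $P\{X > x\} = x^{-\alpha}$, $x \ge 1$. This special case is an assumption made here for concreteness; the slide allows a general constant $C$ and only asymptotic equivalence. For $0 < p < \alpha$,

$$
E X^p = \int_0^\infty p\,x^{p-1} P\{X > x\}\,dx
      = \int_0^1 p\,x^{p-1}\,dx + p\int_1^\infty x^{p-1-\alpha}\,dx
      = 1 + \frac{p}{\alpha - p}
      = \frac{\alpha}{\alpha - p},
$$

while for $p \ge \alpha$ the second integral diverges, so $E X^p = \infty$. With $\alpha = 1.5$, for instance, the mean is $E X = 3$ but the variance is infinite.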

Heavy tails everywhere: Traded volumes
[Figure: Traded volumes (no. of stocks), INTC, November 2005.]

Heavy tails everywhere: TCP durations
[Figure: TCP flow sizes (packets) against time, UNC link (~ 36 min); second panel: the first minute.]

Heavy tails everywhere: Insurance claims
[Figure: Danish Fire Loss Data, 1980-1990. Panels: the loss series; Hill plot of $\hat\alpha_H(k)$ against order statistics $k$; max spectrum against scales $j$, with $H = 0.604$ (0.00897), $\alpha = 1.655$.]

Tail exponent estimation: an old problem
Hill (1975) derived the MLE in the Pareto model $P\{X > x\} = x^{-\alpha}$, $x \ge 1$, and introduced the Hill plot:
$\hat\alpha_H(k) := \Big( \frac{1}{k} \sum_{i=1}^{k} \log X_{i,n} - \log X_{k+1,n} \Big)^{-1}$,
where $X_{1,n} \ge X_{2,n} \ge \cdots \ge X_{k,n}$ are the top $k$ order statistics of the sample.
A lot of work exists for iid data, less for dependent data:
Resnick and Stǎricǎ (1995): consistency of Hill type estimators.
J. Hill (2006): asymptotic normality of Hill type estimators under NED (near epoch dependence) conditions.
...
Even for iid data, Hill plots are volatile and hard to interpret: the "Hill horror plot".
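The Hill estimator and the values behind a Hill plot can be sketched in a few lines. This is a minimal illustration assuming NumPy and positive data without ties; the function names `hill_estimator` and `hill_plot_values`, the sample size, and the choice `k = 2000` are hypothetical and not taken from the slides.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the tail exponent from the top k order statistics."""
    x = np.asarray(x, dtype=float)
    if not 1 <= k < x.size:
        raise ValueError("k must satisfy 1 <= k < len(x)")
    # Decreasing order: xs[0] >= xs[1] >= ... plays the role of X_{1,n} >= X_{2,n} >= ...
    xs = np.sort(x)[::-1]
    # Mean log of the top k values minus the log of the (k+1)-th largest value.
    mean_log_excess = np.mean(np.log(xs[:k])) - np.log(xs[k])
    return 1.0 / mean_log_excess

def hill_plot_values(x, k_max):
    """Return k = 1..k_max and the Hill estimates alpha_hat(k), computed in one pass."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    logs = np.log(xs[: k_max + 1])
    ks = np.arange(1, k_max + 1)
    running_mean = np.cumsum(logs[:k_max]) / ks
    return ks, 1.0 / (running_mean - logs[1:])

# Illustration on exact Pareto data, P{X > x} = x^(-alpha), x >= 1 (inverse transform).
rng = np.random.default_rng(0)
alpha = 1.5
sample = (1.0 - rng.uniform(size=20000)) ** (-1.0 / alpha)
alpha_hat = hill_estimator(sample, k=2000)
ks, hill_values = hill_plot_values(sample, k_max=3000)
```

Plotting `hill_values` against `ks` gives the Hill plot; for data that are only asymptotically Pareto, the estimate typically varies with `k`, which is the interpretation difficulty the slide refers to.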

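Another approach: max self similarity
For iid $(X_k)$ with tail exponent $\alpha$:
$\frac{1}{n^{1/\alpha}} \max_{1 \le i \le n} X_i \stackrel{d}{\longrightarrow} Z$, as $n \to \infty$,
where $P\{Z \le x\} = \exp\{-C x^{-\alpha}\}$, $x > 0$.
The above continues to hold for many dependent stationary $(X_k)$!
Given $X_1, \dots, X_n$, set
$D(j,k) := \max_{1 \le i \le 2^j} X_{2^j(k-1)+i}$, $1 \le k \le n_j := [n/2^j]$, $1 \le j \le \log_2 n$,
to be block maxima of dyadic sizes, and let
$Y_j := \frac{1}{n_j} \sum_{k=1}^{n_j} \log_2 D(j,k)$.
Observe that
$Y_j \approx E \log_2 (2^{j/\alpha} Z) = j/\alpha + E \log_2 Z$, as $j \to \infty$.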
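The block maxima $D(j,k)$ and the averages $Y_j$ translate directly into array operations. The following is a minimal sketch assuming NumPy and strictly positive data; the function name `max_spectrum` and the simulated Fréchet sample are illustrative choices, not part of the slides.

```python
import numpy as np

def max_spectrum(x):
    """Return scales j = 1..floor(log2 n) and Y_j, the mean log2 block maximum at scale j."""
    x = np.asarray(x, dtype=float)
    n = x.size
    j_max = n.bit_length() - 1          # floor(log2(n)) for an integer n >= 1
    scales = np.arange(1, j_max + 1)
    y = np.empty(j_max)
    for idx, j in enumerate(scales):
        block = 2 ** int(j)
        n_j = n // block
        # D(j, k): maxima over n_j non-overlapping blocks of length 2^j
        # (observations beyond n_j * 2^j are discarded).
        d = x[: n_j * block].reshape(n_j, block).max(axis=1)
        y[idx] = np.mean(np.log2(d))
    return scales, y

# Illustration on iid alpha-Frechet data, P{Z <= x} = exp(-x^(-alpha)),
# for which block maxima of size 2^j are distributed exactly as 2^(j/alpha) Z.
rng = np.random.default_rng(0)
alpha = 1.5
z = rng.exponential(size=2 ** 16) ** (-1.0 / alpha)
scales, y = max_spectrum(z)
```

For this sample the increments `y[1] - y[0]`, `y[2] - y[1]`, ... at the small scales should be close to $1/\alpha \approx 0.67$, while the largest scales are averages over very few blocks and are correspondingly noisy.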

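The max spectrum: iid asymptotics
The $Y_j$'s, $1 \le j \le \log_2 n$, are the max spectrum of the data set $(X_k, 1 \le k \le n)$.
An estimator of $\alpha$ is then derived from the $Y_j$ via regression:
$1/\hat\alpha = 1/\hat\alpha[j_1, j_2] := \sum_{j=j_1}^{j_2} w_j Y_j$, with $\sum_j w_j = 0$ and $\sum_j j\,w_j = 1$.
Since $Y_j \approx j/\alpha + C$, these two constraints give $\sum_j w_j Y_j \approx 1/\alpha$.
For iid data: the estimator $\hat\alpha[j_1, j_2]$ is consistent and asymptotically normal, as $j_1, j_2 \to \infty$ but so that $n/2^{j_1}, n/2^{j_2} \to \infty$.
Thm [S., Michailidis & Taqqu (2006)] For iid data under second order tail regularity conditions, let $r(n) \le \log_2 n$ be such that
$\sqrt{n}/2^{r(n)(1/2 + \beta/\alpha)} + r(n)\,2^{r(n)/2}/\sqrt{n} \to 0$, as $n \to \infty$.
Then
$\sup_{x \in \mathbb{R}} \big| P\{ \sqrt{n_{j_2 + r(n)}}\,((\theta, Y) - (\theta, \mu_r)) \le x \} - \Phi(x/\sigma_\theta) \big| \to 0$, as $n \to \infty$.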
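Because $Y_j \approx j/\alpha + C$, any weights with $\sum_j w_j = 0$ and $\sum_j j\,w_j = 1$ make $\sum_j w_j Y_j$ an estimate of the slope $1/\alpha$, which is then inverted to estimate $\alpha$. The sketch below uses ordinary least squares weights, which is one simple choice satisfying both constraints and an assumption of this example (the slides also mention optimal GLS weights). The function names are hypothetical, and `scales` is assumed to hold consecutive integers in increasing order.

```python
import numpy as np

def ols_weights(j1, j2):
    """OLS slope weights w_j, j = j1..j2; they satisfy sum w_j = 0 and sum j*w_j = 1."""
    j = np.arange(j1, j2 + 1, dtype=float)
    centered = j - j.mean()
    return centered / np.sum(centered ** 2)

def alpha_from_spectrum(scales, y, j1, j2):
    """Estimate alpha as 1 / sum_j w_j Y_j over the scale range j1..j2 (requires j1 < j2)."""
    scales = np.asarray(scales)
    y = np.asarray(y, dtype=float)
    mask = (scales >= j1) & (scales <= j2)
    w = ols_weights(j1, j2)
    h_hat = float(np.dot(w, y[mask]))   # estimates the slope H = 1/alpha
    return 1.0 / h_hat

# Sanity check on an exactly linear spectrum Y_j = j/alpha + C (synthetic, noise free).
scales = np.arange(1, 16)
y = scales / 1.5 + 0.4
w = ols_weights(5, 10)
alpha_hat = alpha_from_spectrum(scales, y, j1=5, j2=10)
```

On the noise-free line the estimate recovers $\alpha = 1.5$ up to rounding, which confirms that the intercept $C$ is removed by the constraint $\sum_j w_j = 0$.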

The max spectrum: iid asymptotics (cont'd)
Here $Y = (Y_{j+r(n)})_{j=j_1}^{j_2}$, $\theta = (\theta_j)_{j=j_1}^{j_2}$, $\mu_r = ((j + r(n))/\alpha + C,\ j_1 \le j \le j_2)$, and $\sigma_\theta^2 = \alpha^{-2}\,\theta^t \Sigma\,\theta$.
Remarks:
The $\beta > 0$ governs the second order tail behavior. Roughly: $P\{X > x\} \sim C x^{-\alpha}(1 + D x^{-\beta})$, as $x \to \infty$.
The asymptotic covariance matrix $\Sigma$ is the same as for Fréchet data. It does not depend on $\alpha$, and $C = E \log_2 Z$.
Consistency and asymptotic normality for $\hat\alpha[r(n) + j_1, r(n) + j_2]$ follow.
The rates are the same as for the Hill estimator (Hall, 1982).
The explicit asymptotic covariance $\alpha^{-2}\Sigma$ of the max spectrum $Y$ yields the optimal linear GLS estimators, which are important in practice.

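The max spectrum: dependent data
Let $(X_k)_{k \in \mathbb{Z}}$ be stationary, with tail exponent $\alpha$ and extremal index $\theta > 0$. Then
$n^{-1/\alpha} \max_{1 \le k \le n} X_k \stackrel{d}{\longrightarrow} \theta^{1/\alpha} Z$, where $n^{-1/\alpha} \max_{1 \le k \le n} X_k^* \stackrel{d}{\longrightarrow} Z$ $(n \to \infty)$,
and where $(X_k^*)$ are iid copies of $X$.
Since $\theta > 0$, the max spectrum $(Y_j)$ for time series scales as for iid data:
$Y_j \approx j/\alpha + C$, as $j \to \infty$ and $n_j = n/2^j \to \infty$.
The same regression based estimators, $1/\hat\alpha = \sum_{j=j_1}^{j_2} w_j Y_j$, work!
The asymptotics for $\hat\alpha$ are harder than for iid data!
Intuition: the block maxima $D(j,k)$, $1 \le k \le n_j$, are asymptotically iid, as $j \to \infty$.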

Max spectrum illustration: TCP durations
[Figure: TCP flow sizes (bytes), max self similarity: max spectrum against scales $j$, with the fitted slope $H$ and the resulting estimate of $\alpha$.]

Two asymptotic regimes
Intermediate scales: Fix integers $j_1 < j_2$ and let
$\hat\alpha_n = \hat\alpha[r(n)+j_1, r(n)+j_2]$, where $r(n) \to \infty$ and $2^{r(n)}/n \to 0$, as $n \to \infty$.
We expect to get consistency and asymptotic normality for $\hat\alpha_n$.
Large scales: Fix $l \in \mathbb{N}$ and focus on the largest $l + 1$ scales:
$\hat\alpha_n = \hat\alpha[\log_2 n - l, \log_2 n]$.
We can only get "distributional consistency":
$\hat\alpha_n \stackrel{d}{\longrightarrow} \hat\alpha_Z$, as $n \to \infty$, with $\hat\alpha_Z$ a random variable.
Both regimes are useful and interesting in practice. More details...

Intermediate scales asymptotics
The regularity conditions: for $M_n := \max_{1 \le k \le n} X_k$,
$P\{n^{-1/\alpha} M_n \le x\} = \exp\{-c(n,x)\,x^{-\alpha}\}$, $x > 0$,
where
$|c(n,x) - c_X| \le c_1(x)\,n^{-\beta}$, $x > 0$, with $c_1(x) = O(x^{-R})$, $x \to 0$. (1)
(Plus a technicality at $x \to 0$.)
Intuition: $\beta$ controls the second order tail behavior of $M_n$.
Caveat: Relation (1) may be hard to verify! We have it for moving maxima.
We get rates on moments of $f(M_n/n^{1/\alpha})$, in particular:
Thm [S. & Michailidis (2006)] Under the above conditions, for all $k \in \mathbb{N}$,
$|E \log_2^k (M_n/n^{1/\alpha}) - E \log_2^k (Z)| = O(n^{-\beta})$, as $n \to \infty$,
provided $\int_1^\infty c_1(x)\,x^{-\alpha-1+\delta}\,dx < \infty$, for some $\delta > 0$.

Intermediate scales: asymptotic normality
Let $(X_k)$ be stationary with tail exponent $\alpha > 0$.
Thm [S. & Michailidis (2006)] Under the above conditions, and if $(X_k)$ is $m$-dependent, we have
$\sqrt{n_{r(n)}}\,(\hat\alpha_n - \alpha) \stackrel{d}{\longrightarrow} N(0, \alpha^2 c_w)$,
where $c_w = w^t \Sigma w$ and $\hat\alpha_n = \hat\alpha[r(n)+j_1, r(n)+j_2]$, provided
$2^{r(n)}/n + n/2^{r(n)(1 + 2\min\{1,\beta\})} \to 0$, as $n \to \infty$.
Remarks:
The same asymptotic variance as in the iid case.
Intuition: the block maxima $D(j,k)$, $1 \le k \le n_j$, are asymptotically iid!
$\beta$ captures second order tails PLUS dependence.
Asymptotic confidence intervals are available!
Optimal linear GLS estimators are available!
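The form of the limiting variance and the resulting confidence intervals can be traced through a delta-method step. The sketch below assumes that the slope estimate $\hat H_n := \sum_j w_j Y_{j}$ over the scales $r(n)+j_1, \dots, r(n)+j_2$ satisfies a central limit theorem with variance $\alpha^{-2} c_w$ under the same normalization $\sqrt{n_{r(n)}}$; this matches the covariance $\alpha^{-2}\Sigma$ of the max spectrum, but the exact normalization of $\Sigma$ is an assumption here. With $g(h) = 1/h$ and $g'(1/\alpha) = -\alpha^2$,

$$
\sqrt{n_{r(n)}}\,\big(\hat H_n - 1/\alpha\big) \stackrel{d}{\longrightarrow} N\big(0, \alpha^{-2} c_w\big)
\quad\Longrightarrow\quad
\sqrt{n_{r(n)}}\,\big(\hat\alpha_n - \alpha\big) \stackrel{d}{\longrightarrow} N\big(0, (\alpha^{2})^{2}\,\alpha^{-2} c_w\big) = N\big(0, \alpha^{2} c_w\big).
$$

Plugging in $\hat\alpha_n$ for $\alpha$ gives an approximate confidence interval of level $1 - \gamma$:

$$
\hat\alpha_n \Big( 1 \pm z_{1-\gamma/2}\,\sqrt{c_w / n_{r(n)}} \Big),
$$

where $z_{1-\gamma/2}$ is the standard normal quantile. The relative width depends only on the weights, on $\Sigma$, and on the number of blocks $n_{r(n)}$, not on $\alpha$ itself.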

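Large scales: distributional consistency
The regularity conditions and $m$-dependence are restrictive.
As in Davis & Resnick (1985), let
$X_k = \sum_{i=0}^{\infty} c_i\,\xi_{k-i}$, where $\sum_i |c_i|^{\delta} < \infty$, $0 < \delta < \min\{1, \alpha\}$.
Here $(\xi_k)$ are iid and $P\{|\xi| > x\} \sim C x^{-\alpha}$, $x \to \infty$, with $P\{\xi > x\}/P\{|\xi| > x\} \to p \in [0,1]$, as $x \to \infty$.
Lemma For $X_k(m) := \max_{1 \le i \le m} X_{m(k-1)+i}$, $k = 1, 2, \dots$, we get
$\{m^{-1/\alpha} X_k(m)\}_{k \in \mathbb{N}} \stackrel{fdd}{\longrightarrow} \{Z_k\}_{k \in \mathbb{N}}$, as $m \to \infty$,
where $(Z_k)$ are iid $\alpha$-Fréchet, provided $p \max_i c_i > 0$ or $(1-p)\max_i(-c_i) > 0$.
This justifies the asymptotic independence phenomenon for the block maxima $(D(j,k))_k$ as $j \to \infty$!
Thm [S. & Michailidis (2006)] Under the above conditions, with fixed $l$,
$\hat\alpha_n \stackrel{d}{\longrightarrow} \hat\alpha_{Z,l}$, as $n \to \infty$,
where $\hat\alpha_n = \hat\alpha[\text{top } l \text{ scales}]$ and $\hat\alpha_{Z,l}$ is based on iid $\alpha$-Fréchet data.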

Distributional consistency: implications
No consistency, but confidence intervals!
Covers more processes!
The approximation is often valid for small $n$.

AR(1) with Pareto ($\alpha = 1.5$) innovations
[Figure: AR(1) time series with Pareto innovations, $\phi = 0.9$, $\alpha = 1.5$; two Hill plots of $\hat\alpha$ against order statistics $k$, over different ranges of $k$.]
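An experiment of this kind can be reproduced in outline as follows. This is a sketch under stated assumptions: the innovations are exact Pareto with $P\{\xi > x\} = x^{-\alpha}$, $x \ge 1$, the sample size, burn-in, seed and scale range are arbitrary illustrative choices, and the slope is fitted by ordinary least squares via `np.polyfit`. The function names are hypothetical.

```python
import numpy as np

def simulate_ar1_pareto(n, phi, alpha, rng, burn_in=1000):
    """Path of X_t = phi * X_{t-1} + xi_t with iid Pareto(alpha) innovations xi_t >= 1."""
    xi = (1.0 - rng.uniform(size=n + burn_in)) ** (-1.0 / alpha)
    x = np.empty(n + burn_in)
    x[0] = xi[0]
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + xi[t]
    return x[burn_in:]                  # drop the burn-in to approximate stationarity

def max_spectrum_alpha(x, j1, j2):
    """Max spectrum estimate of alpha: inverse OLS slope of Y_j against j, j = j1..j2."""
    scales = np.arange(j1, j2 + 1)
    y = []
    for j in scales:
        block = 2 ** int(j)
        n_j = x.size // block
        # Block maxima over non-overlapping blocks of length 2^j.
        d = x[: n_j * block].reshape(n_j, block).max(axis=1)
        y.append(np.mean(np.log2(d)))
    slope = np.polyfit(scales, np.array(y), 1)[0]   # highest-degree coefficient first
    return 1.0 / slope

rng = np.random.default_rng(1)
x = simulate_ar1_pareto(n=2 ** 16, phi=0.9, alpha=1.5, rng=rng)
alpha_hat = max_spectrum_alpha(x, j1=8, j2=13)
```

The value of `alpha_hat` depends on the chosen range of scales: at small scales the block maxima are not yet in the asymptotic regime, while at the largest scales only a few blocks are averaged, so the range `[j1, j2]` trades bias against variance.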

The max spectrum...
[Figure: Max spectrum against scales $j$; max self similarity estimate $\alpha = 1.4844$.]

Data examples: the advantage of time scales

Google: traded volume
[Figure: Transaction volumes for GOOG in November 2005 (number of shares against day of the month); confidence intervals for $\alpha$ per day.]

Google: traded volume, the time series
[Figure: Transaction volumes for GOOG on a single day in November 2005 (number of shares); Hill plot of $\alpha$ against order statistics $k$; max spectrum against scales $j$, with the resulting estimate of $\alpha$.]

Intel: traded volume
[Figure: Transaction volumes for INTC in November 2005 (number of shares against day of the month); confidence intervals for $\alpha$ per day.]

Intel: strange time series
[Figure: Transaction volumes for INTC on a single day in November 2005 (number of shares); Hill plot of $\alpha$ against order statistics $k$; max spectrum against scales $j$, with separate estimates of $\alpha$ over two ranges of scales.]

Intel: typical time series
[Figure: Transaction volumes for INTC on a single day in November 2005 (number of shares); Hill plot of $\alpha$ against order statistics $k$; max spectrum against scales $j$, with the resulting estimate of $\alpha$.]

References:
Davis, R. A. and Resnick, S. I. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. The Annals of Probability 13(1), 179-195.
Hall, P. (1982) On some simple estimates of an exponent of regular variation. J. Roy. Statist. Soc. Ser. B 44, 37-42.
Hill, B. M. (1975) A simple general approach to inference about the tail of a distribution. The Annals of Statistics 3, 1163-1174.
Resnick, S. and Stǎricǎ, C. (1995) Consistency of Hill's estimator for dependent data. Journal of Applied Probability 32, 139-167.
Stoev, S. and Michailidis, G. (2006) On the estimation of the heavy tail exponent in time series using the max spectrum. Technical Report, University of Michigan.
Stoev, S., Michailidis, G., and Taqqu, M. S. (2006) Estimating heavy tail exponents through max self similarity. Technical Report, University of Michigan.
WRDS, https://wrds.wharton.upenn.edu/. Wharton School of Management, University of Pennsylvania.