On the estimation of the heavy tail exponent in time series using the max spectrum. Stilian A. Stoev


On the estimation of the heavy tail exponent in time series using the max spectrum
Stilian A. Stoev (sstoev@umich.edu), University of Michigan, Ann Arbor, U.S.A.
JSM, Salt Lake City, 2007
Joint work with George Michailidis (gmichail@umich.edu) and Murad Taqqu (murad@math.bu.edu)

Outline: Heavy tails are ubiquitous; An old problem; Max spectrum; The estimator; Asymptotic properties; Data examples.

Heavy tails
A random variable $X$ is said to be heavy tailed if $P\{|X| > x\} \sim L(x)\,x^{-\alpha}$, as $x \to \infty$, for some $\alpha > 0$ and a slowly varying function $L$.
Here we focus on the simpler but important context: $X \ge 0$ a.s. and $P\{X > x\} \sim C x^{-\alpha}$, as $x \to \infty$.
Infinite moments: for $p > 0$, $E X^p < \infty$ if and only if $p < \alpha$. In particular, $0 < \alpha \le 2 \Rightarrow \mathrm{Var}(X) = \infty$ and $0 < \alpha \le 1 \Rightarrow E|X| = \infty$.
The estimation of the heavy tail exponent $\alpha$ is an important problem with a rich history.
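The power-law tail above can be checked numerically. The following is a minimal sketch, assuming an exact Pareto law $P\{X > x\} = x^{-\alpha}$ for $x \ge 1$ (so $C = 1$); the value $\alpha = 1.5$, the sample size, the seed, and the helper names `simulate_pareto` and `empirical_tail` are illustrative choices, not taken from the slides.

```python
import numpy as np

def simulate_pareto(n, alpha, rng):
    """Draw n iid Pareto variables with P{X > x} = x**(-alpha) for x >= 1."""
    # Inverse-CDF method: if U ~ Uniform(0, 1], then U**(-1/alpha) has this tail.
    u = 1.0 - rng.random(n)  # values in (0, 1], so the power is always finite
    return u ** (-1.0 / alpha)

def empirical_tail(x, threshold):
    """Fraction of observations strictly above the threshold."""
    return float(np.mean(x > threshold))

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
alpha = 1.5                      # illustrative value (assumption)
x = simulate_pareto(200_000, alpha, rng)

# Compare the empirical tail with the power law x**(-alpha) at a few levels.
for t in (2.0, 5.0, 10.0):
    print(f"x={t:5.1f}  empirical={empirical_tail(x, t):.4f}  theory={t ** -alpha:.4f}")
```

With $\alpha = 1.5$ the variance is infinite, so the sample variance of `x` does not settle down as the sample grows, while the empirical tail fractions do.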

Heavy tails everywhere: Traded volumes
[Figure: Traded volumes (number of stocks), INTC, Nov 2005.]

Heavy tails everywhere: TCP durations
[Figure: TCP flow sizes (packets) against time, UNC link 00 (~ 36 min); second panel: the first minute.]

Heavy tails everywhere: Insurance claims
[Figure: Danish fire loss data, 1980-1990. Panels: the loss series; Hill plot against order statistics; max spectrum against scales j. Annotations as extracted: "Hill plot: α (k) =.394 H" and "Max Spectrum H= 0.604 (0.00897), α =.655".]

Tail exponent estimation: an old problem
Hill (1975) derived the MLE in the Pareto model $P\{X > x\} = x^{-\alpha}$, $x \ge 1$, and introduced the Hill plot: $\hat\alpha_H(k) := \left( \frac{1}{k} \sum_{i=1}^{k} \log(X_{i,n}) - \log(X_{k+1,n}) \right)^{-1}$, where $X_{1,n} \ge X_{2,n} \ge \cdots \ge X_{k,n}$ are the top $k$ order statistics of the sample.
A lot of work for iid data, less for dependent data:
Resnick and Stǎricǎ (1995): consistency of Hill type estimators.
J. Hill (2006): asymptotic normality of Hill type estimators under NED (near epoch dependence) conditions.
...
Even for iid data, Hill plots are volatile and hard to interpret: the "Hill horror plot".
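The Hill estimator defined above can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the function names, the simulated Pareto sample, and the grid of $k$ values are assumptions made for the example.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of alpha from the top k order statistics of a positive sample."""
    x = np.asarray(x, dtype=float)
    if not 1 <= k < x.size:
        raise ValueError("k must satisfy 1 <= k < len(x)")
    top = np.sort(x)[::-1][: k + 1]            # X_{1,n} >= ... >= X_{k+1,n}
    mean_log_excess = np.mean(np.log(top[:k])) - np.log(top[k])
    return 1.0 / mean_log_excess

def hill_plot_values(x, ks):
    """Hill estimates over a grid of k; plotting them against k gives the Hill plot."""
    return np.array([hill_estimator(x, k) for k in ks])

rng = np.random.default_rng(1)                 # arbitrary seed
alpha = 1.5                                    # illustrative value (assumption)
sample = (1.0 - rng.random(50_000)) ** (-1.0 / alpha)   # exact Pareto sample

ks = np.arange(50, 2001, 50)
estimates = hill_plot_values(sample, ks)
print(estimates[:5])
```

For exact Pareto data the estimates are stable in $k$; the volatility mentioned on the slide typically shows up when the tail is only asymptotically Pareto or the data are dependent.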

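Another approach: max self similarity
For iid $(X_k)$ with tail exponent $\alpha$: $n^{-1/\alpha} \bigvee_{i=1}^{n} X_i \xrightarrow{d} Z$, as $n \to \infty$, where $P\{Z \le x\} = \exp\{-C x^{-\alpha}\}$, $x > 0$.
The above continues to hold for many dependent stationary $(X_k)$!
Given $X_1, \ldots, X_n$, set $D(j, k) := \bigvee_{i=1}^{2^j} X_{2^j(k-1)+i}$, $1 \le k \le n_j := [n/2^j]$, $1 \le j \le \log_2(n)$, to be block maxima of dyadic sizes.
Observe that $Y_j := \frac{1}{n_j} \sum_{k=1}^{n_j} \log_2 D(j, k) \approx E\log_2(2^{j/\alpha} Z) = j/\alpha + E\log_2 Z$, as $j \to \infty$.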
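The block maxima $D(j,k)$ and the averages $Y_j$ can be computed directly. The sketch below is one possible implementation under the definitions on this slide; the function name `max_spectrum` and the simulated Pareto input are assumptions for illustration.

```python
import numpy as np

def max_spectrum(x):
    """Return scales j = 1..floor(log2 n) and Y_j = mean over k of log2 D(j, k)."""
    x = np.asarray(x, dtype=float)
    if x.size < 2 or np.any(x <= 0):
        raise ValueError("need at least two strictly positive observations")
    n_scales = int(x.size).bit_length() - 1        # floor(log2 n)
    scales = np.arange(1, n_scales + 1)
    y = np.empty(n_scales)
    for idx, j in enumerate(scales):
        block = 2 ** int(j)
        n_j = x.size // block                      # number of complete blocks
        # D(j, k): maximum over the k-th block of 2**j consecutive observations
        d = x[: n_j * block].reshape(n_j, block).max(axis=1)
        y[idx] = np.mean(np.log2(d))
    return scales, y

rng = np.random.default_rng(2)                     # arbitrary seed
alpha = 1.5                                        # illustrative value (assumption)
sample = (1.0 - rng.random(2 ** 16)) ** (-1.0 / alpha)

scales, y = max_spectrum(sample)
# For large j the increments of Y_j should be close to 1/alpha.
print(np.round(np.diff(y), 3))
```

The largest scales rest on very few blocks, so their $Y_j$ values are noisy; that is one reason the estimator uses a range of scales rather than a single increment.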

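The max spectrum: iid asymptotics
The $Y_j$'s, $1 \le j \le \log_2 n$, form the max spectrum of the data set $(X_k,\ 1 \le k \le n)$.
An estimator of $\alpha$ is then derived from the $Y_j$ via regression: $\hat\alpha^{-1} = \hat\alpha^{-1}[j_1, j_2] := \sum_{j=j_1}^{j_2} w_j Y_j$, with $\sum_j w_j = 0$ and $\sum_j j\,w_j = 1$. Since $Y_j \approx j/\alpha + C$, these two constraints give $\sum_j w_j Y_j \approx 1/\alpha$.
For iid data: the estimator $\hat\alpha[j_1, j_2]$ is consistent and asymptotically normal, as $j_1, j_2 \to \infty$, but so that $n/2^{j_1}, n/2^{j_2} \to \infty$.
Thm [S., Michailidis & Taqqu (2006)] For iid data under second order tail regularity conditions, let $r(n) \le \log_2 n$ be such that $\sqrt{n}/2^{r(n)(1/2+\beta/\alpha)} + r(n)\,2^{r(n)/2}/\sqrt{n} \to 0$, as $n \to \infty$. Then $\sup_{x \in \mathbb{R}} \left| P\{\sqrt{n_{j_2+r(n)}}\,(\langle\theta, Y\rangle - \langle\theta, \mu_r\rangle) \le x\} - \Phi(x/\sigma_\theta) \right| \to 0$, $n \to \infty$.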
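One concrete set of weights meeting the two constraints is the ordinary least squares slope weights, $w_j = (j - \bar j)/\sum_i (i - \bar j)^2$. The sketch below uses that choice as an assumption (the slides also mention optimal GLS weights, which are not reproduced here); the function names and the scale range are illustrative.

```python
import numpy as np

def max_spectrum(x):
    """Scales j = 1..floor(log2 n) and Y_j = mean_k log2 of dyadic block maxima."""
    x = np.asarray(x, dtype=float)
    n_scales = int(x.size).bit_length() - 1
    y = np.empty(n_scales)
    for j in range(1, n_scales + 1):
        block = 2 ** j
        n_j = x.size // block
        y[j - 1] = np.mean(np.log2(x[: n_j * block].reshape(n_j, block).max(axis=1)))
    return np.arange(1, n_scales + 1), y

def regression_weights(j1, j2):
    """OLS slope weights: they sum to 0 and satisfy sum_j j * w_j = 1."""
    j = np.arange(j1, j2 + 1, dtype=float)
    centered = j - j.mean()
    return centered / np.sum(centered ** 2)

def alpha_hat(x, j1, j2):
    """Estimate alpha as the reciprocal of the weighted sum of Y_j over [j1, j2]."""
    _, y = max_spectrum(x)
    w = regression_weights(j1, j2)
    h_hat = float(np.dot(w, y[j1 - 1 : j2]))       # estimates 1/alpha
    return 1.0 / h_hat

rng = np.random.default_rng(4)                      # arbitrary seed
sample = (1.0 - rng.random(2 ** 16)) ** (-1.0 / 1.5)   # Pareto, alpha = 1.5 (assumption)
print(alpha_hat(sample, 4, 10))
```

The choice of $[j_1, j_2]$ trades bias (small scales) against variance (large scales), which is what the condition on $r(n)$ in the theorem formalizes.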

The max spectrum: iid asymptotics (cont'd)
Here $Y = (Y_{j+r(n)})_{j=j_1}^{j_2}$, $\theta = (\theta_j)_{j=j_1}^{j_2}$, $\mu_r = ((j + r(n))/\alpha + C,\ j_1 \le j \le j_2)$, and $\sigma_\theta^2 = \alpha^{-2}\,\theta^t \Sigma\, \theta$.
Remarks:
The parameter $\beta > 0$ governs the second order tail behavior. Roughly: $P\{X > x\} \sim C x^{-\alpha}(1 + D x^{-\beta})$, as $x \to \infty$.
The asymptotic covariance matrix $\Sigma$ is the same as for Fréchet data. It does not depend on $\alpha$ and $C = E\log_2 Z$.
Consistency and asymptotic normality for $\hat\alpha[r(n) + j_1, r(n) + j_2]$ follow.
The rates are the same as for the Hill estimator (Hall, 1982).
The explicit asymptotic covariance $\alpha^{-2}\Sigma$ of the max spectrum $Y$ yields the optimal linear GLS estimators, which are important in practice.

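The max spectrum: dependent data
Let $(X_k)_{k \in \mathbb{Z}}$ be stationary, with tail exponent $\alpha$ and extremal index $\theta > 0$. Then $n^{-1/\alpha} \bigvee_{1 \le k \le n} X_k \xrightarrow{d} \theta^{1/\alpha} Z$, where $n^{-1/\alpha} \bigvee_{1 \le k \le n} X_k^* \xrightarrow{d} Z$ $(n \to \infty)$ and $(X_k^*)$ are iid copies of $X$.
Since $\theta > 0$, the max spectrum $(Y_j)$ for time series scales as for iid data: $Y_j \approx j/\alpha + C$, as $j \to \infty$ and $n_j = n/2^j \to \infty$.
The same regression based estimators $\hat\alpha^{-1} = \sum_{j=j_1}^{j_2} w_j Y_j$ work!
The asymptotics for $\hat\alpha$ are harder (than for iid data)!
Intuition: the block maxima $D(j, k)$, $1 \le k \le n_j$, are asymptotically iid, as $j \to \infty$.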
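The effect of the extremal index can be seen in a small simulation. The sketch below assumes a moving-maxima series $X_k = \max(Z_k, Z_{k-1})$ with iid Pareto $Z_k$, which has extremal index $\theta = 1/2$, and compares it with an iid series having the same marginal law. Dependence should shift the max spectrum by roughly $\alpha^{-1}\log_2\theta$ without changing its slope. The model, the scale $j = 8$, and all names are illustrative assumptions.

```python
import numpy as np

def mean_log2_block_max(x, j):
    """Y_j: average of log2 of maxima over consecutive blocks of length 2**j."""
    block = 2 ** j
    n_j = x.size // block
    d = x[: n_j * block].reshape(n_j, block).max(axis=1)
    return float(np.mean(np.log2(d)))

rng = np.random.default_rng(3)                 # arbitrary seed
alpha, n, j = 1.5, 2 ** 18, 8                  # illustrative values (assumption)

# Dependent series: moving maxima of iid Pareto noise (extremal index 1/2).
z = (1.0 - rng.random(n + 1)) ** (-1.0 / alpha)
x_dep = np.maximum(z[1:], z[:-1])

# iid series with the same marginal: maximum of two independent Pareto draws.
z1 = (1.0 - rng.random(n)) ** (-1.0 / alpha)
z2 = (1.0 - rng.random(n)) ** (-1.0 / alpha)
x_iid = np.maximum(z1, z2)

shift = mean_log2_block_max(x_dep, j) - mean_log2_block_max(x_iid, j)
expected = np.log2(0.5) / alpha                # (1/alpha) * log2(theta)
print(f"observed shift {shift:.3f}, expected about {expected:.3f}")
```

Because the shift is the same at every large scale, it is absorbed into the intercept $C$ and the regression slope still estimates $1/\alpha$.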

Max spectrum illustration: TCP durations
[Figure: TCP flow sizes (bytes), max self similarity: max spectrum against scales j. Annotation as extracted: "H= 0.94 (0.044637), α =.08".]

Two asymptotic regimes
Intermediate scales: Fix integers $j_1 < j_2$ and let $\hat\alpha_n = \hat\alpha[r(n)+j_1, r(n)+j_2]$, where $r(n) \to \infty$ and $2^{r(n)}/n \to 0$, as $n \to \infty$. We expect to get consistency and asymptotic normality for $\hat\alpha_n$.
Large scales: Fix $l \in \mathbb{N}$ and focus on the largest $l + 1$ scales: $\hat\alpha_n = \hat\alpha[\log_2 n - l, \log_2 n]$. We can only get "distributional consistency": $\hat\alpha_n \xrightarrow{d} \hat\alpha_Z$, as $n \to \infty$, with $\hat\alpha_Z$ a random variable.
Both regimes are useful/interesting in practice. More details...

Intermediate scales asymptotics
The regularity conditions: for $M_n := \max_{1 \le k \le n} X_k$, $P\{n^{-1/\alpha} M_n \le x\} = \exp\{-c(n, x)\,x^{-\alpha}\}$, $x > 0$, where $|c(n, x) - c_X| \le c_1(x)\,n^{-\beta}$, $x > 0$, with $c_1(x) = O(x^{-R})$, $x \to 0$. (1)
(Plus a technicality near $x = 0$.)
Intuition: $\beta$ controls the second order tail behavior of $M_n$.
Caveat: Relation (1) may be hard to verify! We have it for moving maxima.
We get rates on moments of $f(M_n/n^{1/\alpha})$, in particular:
Thm [S. & Michailidis (2006)] Under the above conditions, for all $k \in \mathbb{N}$, $|E\log^k(M_n/n^{1/\alpha}) - E\log^k(Z)| = O(n^{-\beta})$, as $n \to \infty$, provided $\int c_1(x)\,x^{-\alpha-1+\delta}\,dx < \infty$, for $\delta > 0$.

Intermediate scales: asymptotic normality
Let $(X_k)$ be stationary with tail exponent $\alpha > 0$.
Thm [S. & Michailidis (2006)] Under the above conditions, and if $(X_k)$ is $m$-dependent, we have $\sqrt{n_{r(n)}}\,(\hat\alpha_n - \alpha) \xrightarrow{d} N(0, \alpha^2 c_w)$, where $c_w = w^t \Sigma\, w$ and $\hat\alpha_n = \hat\alpha[r(n) + j_1, r(n) + j_2]$, provided $2^{r(n)}/n + n/2^{r(n)(1+2\min\{1,\beta\})} \to 0$, as $n \to \infty$.
Remarks:
The same asymptotic variance as in the iid case.
Intuition: the block maxima $D(j, k)$, $1 \le k \le n_j$, are asymptotically iid!
$\beta$ captures second order tails PLUS dependence.
Asymptotic confidence intervals are available!
Optimal linear GLS estimators are available!
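The variance $\alpha^2 c_w$ in the theorem is what the delta method gives when the regression produces an estimate of $1/\alpha$. The following worked step is a sketch under that assumption: the notation $\hat H_n$, the stated limit for $\hat H_n$, and the confidence level $\gamma$ are introduced here for illustration and are not taken from the slides.

```latex
% Assumption: \hat H_n := \sum_{j=j_1}^{j_2} w_j Y_{r(n)+j} estimates 1/\alpha, with
%   \sqrt{n_{r(n)}}\,(\hat H_n - 1/\alpha) \xrightarrow{d} N(0, \alpha^{-2} c_w).
\begin{align*}
\hat\alpha_n &= g(\hat H_n), \qquad g(h) = 1/h, \qquad g'(1/\alpha) = -\alpha^{2}, \\
\sqrt{n_{r(n)}}\,(\hat\alpha_n - \alpha)
  &\xrightarrow{d} N\bigl(0,\ g'(1/\alpha)^{2}\,\alpha^{-2} c_w\bigr)
   = N\bigl(0,\ \alpha^{2} c_w\bigr), \\
\text{approximate } (1-\gamma) \text{ interval:}\quad
  &\hat\alpha_n \pm z_{1-\gamma/2}\,\hat\alpha_n \sqrt{c_w / n_{r(n)}} .
\end{align*}
```

The interval plugs $\hat\alpha_n$ in for $\alpha$ in the asymptotic variance; $c_w$ depends only on the weights and on $\Sigma$, which the slides describe as free of $\alpha$.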

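Large scales: distributional consistency
The regularity conditions and $m$-dependence are restrictive.
As in Davis & Resnick (1985), let $X_k = \sum_{i=0}^{\infty} c_i\,\xi_{k-i}$, where $\sum_i |c_i|^{\delta} < \infty$, $0 < \delta < \min\{1, \alpha\}$. Here $(\xi_k)$ are iid and $P\{|\xi| > x\} \sim C x^{-\alpha}$, $x \to \infty$, with $P\{\xi > x\}/P\{|\xi| > x\} \to p \in [0, 1]$, as $x \to \infty$.
Lemma: For $X_k(m) := \max_{1 \le i \le m} X_{m(k-1)+i}$, $k = 1, 2, \ldots$, we get $\{m^{-1/\alpha} X_k(m)\}_{k \in \mathbb{N}} \xrightarrow{fdd} \{Z_k\}_{k \in \mathbb{N}}$, as $m \to \infty$, where $(Z_k)$ are iid $\alpha$-Fréchet, provided $p \max_i c_i > 0$ or $(1 - p)\max_i(-c_i) > 0$.
This justifies the asymptotic independence phenomenon for the block maxima $(D(j, k))_k$ as $j \to \infty$!
Thm [S. & Michailidis (2006)] Under the above conditions, with fixed $l$, $\hat\alpha_n \xrightarrow{d} \hat\alpha_{Z,l}$, as $n \to \infty$, where $\hat\alpha_n = \hat\alpha[\text{top } l \text{ scales}]$ and $\hat\alpha_Z$ is based on iid $\alpha$-Fréchet data $Z_1, Z_2, \ldots$ whose number is determined by $l$.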

Distributional consistency: implications
No consistency, but confidence intervals!
Covers more processes!
The approximation is often valid for small n.

AR(1) with Pareto innovations
[Figure: AR(1) with Pareto innovations, φ = 0.9 (α as extracted: "α =.5"): the simulated series and Hill plots against order statistics k.]
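A comparison of this kind can be reproduced in outline. The sketch below simulates an AR(1) series with Pareto innovations and computes a Hill estimate and a max spectrum estimate; $\varphi = 0.9$ follows the slide, while $\alpha = 1.5$, the sample size, the value of $k$, the scale range, and all function names are assumptions chosen for illustration. The slope is fitted by ordinary least squares, not by the GLS weights mentioned in the talk.

```python
import numpy as np

def simulate_ar1_pareto(n, phi, alpha, rng, burn_in=1000):
    """X_t = phi * X_{t-1} + Z_t with iid Pareto(alpha) innovations Z_t >= 1."""
    z = (1.0 - rng.random(n + burn_in)) ** (-1.0 / alpha)
    x = np.empty(n + burn_in)
    x[0] = z[0]
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + z[t]
    return x[burn_in:]                         # drop the start-up transient

def hill_estimator(x, k):
    """Hill estimate from the top k order statistics."""
    top = np.sort(x)[::-1][: k + 1]
    return 1.0 / (np.mean(np.log(top[:k])) - np.log(top[k]))

def max_spectrum_alpha(x, j1, j2):
    """Reciprocal of the OLS slope of Y_j against j over scales j1..j2."""
    scales = np.arange(j1, j2 + 1)
    y = []
    for j in scales:
        block = 2 ** int(j)
        n_j = x.size // block
        d = x[: n_j * block].reshape(n_j, block).max(axis=1)
        y.append(np.mean(np.log2(d)))
    slope = np.polyfit(scales, y, 1)[0]
    return 1.0 / slope

rng = np.random.default_rng(5)                 # arbitrary seed
x = simulate_ar1_pareto(2 ** 18, phi=0.9, alpha=1.5, rng=rng)

alpha_hill = hill_estimator(x, k=1000)         # k chosen arbitrarily
alpha_ms = max_spectrum_alpha(x, 10, 15)       # scale range chosen arbitrarily
print(f"Hill (k=1000): {alpha_hill:.3f}   max spectrum [10, 15]: {alpha_ms:.3f}")
```

Both estimates depend noticeably on the tuning choice ($k$ for Hill, the scale range for the max spectrum), so a single run should be read as an illustration only.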

The max spectrum...
[Figure: Max self similarity: max spectrum against scales j. Annotation as extracted: "α =.4844".]

Data examples: the advantage of time scales

Google: traded volume
[Figure: Transaction volumes (number of shares) for GOOG in November 2005, by day of the month; second panel: confidence intervals for α per day.]

Google: traded volume, the time series
[Figure: Transaction volumes (number of shares) for GOOG: Nov 7, 2005; Hill plot against order statistics k; max spectrum against scales j. Annotation as extracted: "α =.079".]

Intel: traded volume
[Figure: Transaction volumes (number of shares) for INTC in November 2005, by day of the month; second panel: confidence intervals for α per day.]

Intel: strange time series
[Figure: Transaction volumes (number of shares) for INTC: Nov 3, 2005; Hill plot against order statistics k; max spectrum against scales j. Annotation as extracted: "α(7,) =.0578, α(,6) = 5.8".]

Intel: typical time series
[Figure: Transaction volumes (number of shares) for INTC, Nov 2005; Hill plot against order statistics k; max spectrum against scales j. Annotation as extracted: "α =.5564".]

References:
Davis, R. A. and Resnick, S. I. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. The Annals of Probability 13(1), 179-195.
Hall, P. (1982) On some simple estimates of an exponent of regular variation. J. Roy. Statist. Soc. Ser. B 44, 37-42.
Hill, B. M. (1975) A simple general approach to inference about the tail of a distribution. The Annals of Statistics 3, 1163-1174.
Resnick, S. and Stǎricǎ, C. (1995) Consistency of Hill's estimator for dependent data. Journal of Applied Probability 32, 139-167.
Stoev, S. and Michailidis, G. (2006) On the estimation of the heavy tail exponent in time series using the max spectrum. Technical Report, University of Michigan.
Stoev, S., Michailidis, G., and Taqqu, M. S. (2006) Estimating heavy tail exponents through max self similarity. Technical Report, University of Michigan.
WRDS, https://wrds.wharton.upenn.edu/. Wharton School of Management, University of Pennsylvania.