Analysis methods of heavy-tailed data


1 Institute of Control Sciences, Russian Academy of Sciences, Moscow, Russia. PhD course: February 13-18, 2006, Bamberg, Germany; June 19-23, 2006, Brest, France; May 14-19, 2007, Trondheim, Norway.

2 Chapter 1. Introduction: definitions and basic properties of classes of heavy-tailed distributions. Tail index estimation. Methods for selecting the number of the largest order statistics in the Hill estimator. Rough methods for the detection of heavy tails and the number of finite moments. Section 1 contains the introduction with the necessary definitions, basic properties and examples of heavy-tailed data. The tail index indicates the shape of the tail and is therefore the basic characteristic of heavy-tailed data. Methods of tail index estimation are presented. Finally, several rough tools for the detection of heavy tails, the number of finite moments and dependence are considered.

3 Main assumption. Let $X_1, \dots, X_n$ be a sample of i.i.d. r.v.s distributed with the heavy-tailed DF $F(x)$.
Definition. A DF $F(x)$ (or the r.v. $X$) is called heavy-tailed if its tail $\bar F(x) = 1 - F(x) > 0$, $x \ge 0$, satisfies, for all $y \ge 0$,
$\lim_{x\to\infty} P\{X > x + y \mid X > x\} = \lim_{x\to\infty} \bar F(x + y)/\bar F(x) = 1$.
[Figure: comparison of probability tails.]

4 Examples of heavy- and light-tailed distributions.
Heavy-tailed distributions. Subexponential: Pareto, lognormal, Weibull with shape parameter less than 1. With regularly varying tails: Pareto, Cauchy, Burr, Fréchet, Zipf-Mandelbrot law.
Light-tailed distributions: exponential, gamma, Weibull with shape parameter greater than 1, normal, finite distributions.

5 Heavy-tailed distributions have been accepted as realistic models for various phenomena: WWW-session characteristics (sizes and durations of sub-sessions; sizes of responses; inter-response time intervals); on/off-periods of packet traffic; file sizes; service time in queueing models; flood levels of rivers; major insurance claims; extreme levels of ozone concentrations; high wind-speed values; wave heights during a storm; low and high temperatures.

6 Basic definitions. Let $X^n = \{X_1, \dots, X_n\}$ be a sample of i.i.d. r.v.s distributed with the DF $F(x)$ and $M_n = \max(X_1, X_2, \dots, X_n)$. Extreme value theory assumes that for a suitable choice of normalizing constants $a_n$, $b_n$ it holds
$P\{(M_n - b_n)/a_n \le x\} = F^n(b_n + a_n x) \to H_\gamma(x)$ as $n \to \infty$, $x \in \mathbb{R}$,
and the extreme value DF $H_\gamma(x)$ is of one of the following types:
$H_\gamma(x) = \exp(-x^{-1/\gamma})$, $x > 0$, $\gamma > 0$ (Fréchet);
$H_\gamma(x) = \exp(-(-x)^{-1/\gamma})$, $x < 0$, $\gamma < 0$ (Weibull);
$H_\gamma(x) = \exp(-e^{-x})$, $x \in \mathbb{R}$, $\gamma = 0$ (Gumbel).
Definition. The parameter $\gamma$ is called the extreme value index (EVI) and defines the shape of the tail of the r.v. $X$. The parameter $\alpha = 1/\gamma$ is called the tail index.

7 Pickands's theorem: the limit distribution of the excess distribution of the $X_i$'s is necessarily of the Generalized Pareto form
$\lim_{u \uparrow x_F,\; u + x < x_F} P(X_1 - u > x \mid X_1 > u) = (1 + \gamma x)_+^{-1/\gamma}$, $x \in \mathbb{R}$,
where $x_F = \sup\{x \in \mathbb{R} : F(x) < 1\}$ is the right endpoint of the distribution $F$ and the shape parameter $\gamma \in \mathbb{R}$. Therefore, the Generalized Pareto distribution (GPD) with DF
$\Psi_{\sigma,\gamma}(x) = 1 - (1 + \gamma x/\sigma)_+^{-1/\gamma}$
is often used as a model of the tail of the distribution.

8 The classes of heavy-tailed distributions: distributions with regularly varying tails (RVT), $X \in R_{-1/\gamma}$:
$P\{X > x\} = x^{-1/\gamma}\, l(x)$, $x > 0$, $\gamma > 0$,
where $l$ is a slowly varying function, i.e. $\lim_{x\to\infty} l(tx)/l(x) = 1$ for all $t > 0$.
Examples: Pareto, Burr and Fréchet distributions belong to RVT. Examples of $l(x)$ are $c \ln x$, $c \ln(\ln x)$ and all functions converging to positive constants.

9 The classes of heavy-tailed distributions: subexponential distributions (S), $X \in S$:
$P\{S_n > x\} \sim n P\{X_1 > x\} \sim P\{M_n > x\}$ as $x \to \infty$,
where $S_n = X_1 + \dots + X_n$, $n \ge 2$, and $M_n = \max_{i=1,\dots,n}\{X_i\}$.
Example: the Weibull distribution with shape parameter less than 1 belongs to S. Intuitively, subexponentiality means that the only way the sum can be large is by one of the summands getting large (in contrast, in the light-tailed case all summands are large if the sum is so).

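10 Basic properties of regularly varying distributions. Lemma. Let $X \in R_{-\alpha}$. Then:
(i) $X \in S$.
(ii) $E\{X^\beta\} < \infty$ if $\beta < \alpha$; $E\{X^\beta\} = \infty$ if $\beta > \alpha$.
(iii) If $\alpha > 1$, then $X_r \in R_{1-\alpha}$ and $P\{X_r > x\} \sim l(x)\, x^{1-\alpha}/((\alpha - 1) E\{X\})$ as $x \to \infty$.
(iv) If $Y$ is non-negative and independent of $X$ such that $P\{Y > x\} = l_2(x)\, x^{-\alpha_2}$, then $X + Y \in R_{-\min(\alpha, \alpha_2)}$ and $P\{X + Y > x\} \sim P\{X > x\} + P\{Y > x\}$ as $x \to \infty$.
(v) If $Y$ is non-negative and independent of $X$ such that $E\{Y^{\alpha+\varepsilon}\} < \infty$ for some $\varepsilon > 0$, then $XY \in R_{-\alpha}$ and $P\{XY > x\} \sim E\{Y^\alpha\}\, P\{X > x\}$ as $x \to \infty$.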

11 Important property for the rough detection of heavy tails and the number of finite moments. Let $X \in R_{-\alpha}$, $\alpha = 1/\gamma$. Then
$E\{X^\beta\} < \infty$ if $\beta < 1/\gamma$; $E\{X^\beta\} = \infty$ if $\beta > 1/\gamma$.
Examples:
If $\alpha = 2$, $\gamma = 0.5$, then $E X_1 < \infty$ (the first moment is finite), $E X_1^2 = \infty$ (the second moment is infinite or does not exist).
If $\alpha = 0.5$, $\gamma = 2$, then $E X_1 = \infty$, $E X_1^2 = \infty$, ... (all moments are infinite or do not exist).

12 Estimators of tail index. $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ are the order statistics of the sample $X^n = \{X_1, X_2, \dots, X_n\}$. For $\gamma > 0$:
Hill's estimator: $\hat\gamma^H(n, k) = \frac{1}{k} \sum_{i=1}^{k} \ln X_{(n-i+1)} - \ln X_{(n-k)}$.
Ratio estimator, Goldie, Smith (1987): $a_n = a_n(x_n) = \sum_{i=1}^{n} \ln(X_i/x_n)\, I\{X_i > x_n\} \Big/ \sum_{i=1}^{n} I\{X_i > x_n\}$, where $x_n$ is an arbitrary threshold level, e.g., $x_n = X_{(n-k)}$.
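The two estimators above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `hill_estimator` and `ratio_estimator` are hypothetical, and the code assumes that the observations entering the logarithms are strictly positive.

```python
import math


def hill_estimator(sample, k):
    """Hill's estimate of gamma based on the k largest order statistics."""
    n = len(sample)
    if not 1 <= k <= n - 1:
        raise ValueError("k must satisfy 1 <= k <= n - 1")
    x = sorted(sample)            # x[j - 1] plays the role of X_(j)
    threshold = x[n - k - 1]      # X_(n-k)
    if threshold <= 0:
        raise ValueError("the k + 1 largest observations must be positive")
    top = x[n - k:]               # X_(n-k+1), ..., X_(n)
    return sum(math.log(v) for v in top) / k - math.log(threshold)


def ratio_estimator(sample, threshold):
    """Ratio estimator: mean of ln(X_i / x_n) over the exceedances of x_n."""
    exceedances = [v for v in sample if v > threshold]
    if not exceedances:
        raise ValueError("no observation exceeds the threshold")
    return sum(math.log(v / threshold) for v in exceedances) / len(exceedances)
```

With the threshold set to $X_{(n-k)}$ and no ties in the sample, both functions return the same value, which is the relation between the two estimators noted on the slide.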

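13 Estimators of tail index. For $\gamma \in \mathbb{R}$:
Moment estimator, Dekkers, Einmahl, de Haan (1989): $\hat\gamma^M(n, k) = \hat\gamma^H(n, k) + 1 - 0.5 \left(1 - (\hat\gamma^H(n, k))^2 / S_{n,k}\right)^{-1}$, where $S_{n,k} = \frac{1}{k} \sum_{i=1}^{k} \left(\ln X_{(n-i+1)} - \ln X_{(n-k)}\right)^2$.
UH estimator, Berlinet (1998): $\hat\gamma^{UH}(n, k) = \frac{1}{k} \sum_{i=1}^{k} \ln UH_i - \ln UH_{k+1}$, where $UH_i = X_{(n-i)}\, \hat\gamma^H(n, i)$.
Pickands's estimator: $\hat\gamma^P(n, k) = \frac{1}{\ln 2} \ln \frac{X_{(k)} - X_{(2k)}}{X_{(2k)} - X_{(4k)}}$ (for this formula to estimate the upper tail, the order statistics have to be indexed from the largest, $X_{(1)} \ge X_{(2)} \ge \dots$).
No recursiveness!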

14 Group estimator, Davydov, Paulauskas, Račkauskas (2000): the sample $X^n$ is divided into $l$ groups $V_1, \dots, V_l$, each group containing $m$ r.v.s, i.e. $n = l \cdot m$. Estimator of the function of the tail index:
$z_l = \frac{1}{l} \sum_{i=1}^{l} k_{li} = \frac{\hat\alpha}{\hat\alpha + 1} = \frac{1}{1 + \hat\gamma}$,
where $k_{li} = M^{(2)}_{li} / M^{(1)}_{li}$, $M^{(1)}_{li} = \max\{X_j : X_j \in V_i\}$ and $M^{(2)}_{li}$ is the second largest element in the same group $V_i$.
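A minimal Python sketch of the group estimator follows. The function name `group_estimator` is hypothetical, groups are formed from consecutive observations, trailing observations that do not fill a complete group are dropped (an assumption made here for simplicity), and the data are assumed positive.

```python
def group_estimator(sample, m):
    """Return (z_l, gamma_hat) for groups of m consecutive observations."""
    if m < 2:
        raise ValueError("each group needs at least two observations")
    l = len(sample) // m                 # number of complete groups
    if l < 1:
        raise ValueError("sample is shorter than one group")
    ratios = []
    for i in range(l):
        group = sorted(sample[i * m:(i + 1) * m])
        # k_li: second largest element divided by the largest element
        ratios.append(group[-2] / group[-1])
    z = sum(ratios) / l                  # estimates 1 / (1 + gamma)
    return z, 1.0 / z - 1.0
```

For example, `group_estimator([1, 2, 4, 3, 6, 12], 3)` uses the ratios 2/4 and 6/12, so that $z_l = 0.5$ and the corresponding estimate of $\gamma$ is 1.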

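15 Group estimator, Davydov, Paulauskas, Račkauskas (2000). Theoretical background.
1. For distributions with regularly varying tails $1 - F(x) = x^{-\alpha} l(x)$ and $l = m = [\sqrt{n}]$, it is proved that
$z_l \to \frac{\alpha}{\alpha + 1} = \frac{1}{1 + \gamma}$ a.s.
2. For distributions $1 - F(x) = C_1 x^{-\alpha} + C_2 x^{-\beta} + o(x^{-\beta})$ with $\beta = 2\alpha$ it holds
$\sqrt{l} \left( l^{-1} \sum_{i=1}^{l} k_{li} - \alpha (1 + \alpha)^{-1} \right) \left( l^{-1} \sum_{i=1}^{l} k_{li}^2 - \Big( l^{-1} \sum_{i=1}^{l} k_{li} \Big)^2 \right)^{-1/2} \xrightarrow{d} N(0, 1)$.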

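16 Recursiveness of the estimate $z_l$. On-line estimation. Having the next group of observations $V_{l+1}$, it follows that
$\gamma_{l+1} = \left( \frac{l}{l+1} \cdot \frac{1}{1 + \gamma_l} + \frac{k_{l+1,l+1}}{l+1} \right)^{-1} - 1$.
After getting $i$ additional groups $V_{l+1}, \dots, V_{l+i}$ with $m$ elements each,
$\gamma_{l+i} = (l + i) \left( \frac{l}{1 + \gamma_l} + k_{l+1,l+1} + \dots + k_{l+i,l+i} \right)^{-1} - 1$,
i.e. $\gamma_{l+i}$ is obtained using $\gamma_l$ by $O(1)$ calculations.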

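17 Recursiveness of the estimate $z_l$. On-line estimation. Since $z_{l+i} = 1/(1 + \gamma_{l+i})$ and
$z_{l+i} = \left( l z_l + \sum_{j=1}^{i} k_{l+j,l+j} \right) / (l + i)$,
it holds that $\mathrm{bias}(z_{l+i}) = \mathrm{bias}(z_l)$ and $\mathrm{var}(z_{l+i}) = \mathrm{var}(z_l)\, l/(l + i)$, so that $\mathrm{var}(z_{l+i}) < \mathrm{var}(z_l)$ for $i > 0$.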
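The on-line update can be sketched as follows. The function name `update_gamma` and its arguments are hypothetical; `new_ratios` is assumed to hold the ratios (second largest over largest) of the newly arrived groups, each of the same size $m$ as the earlier groups.

```python
def update_gamma(gamma_l, l, new_ratios):
    """Update the group estimate gamma_l (built from l groups) with new groups.

    Only gamma_l, l and the ratios of the new groups are needed, so the cost
    does not depend on the number of observations already processed.
    """
    i = len(new_ratios)
    # l / (1 + gamma_l) equals the sum of the first l ratios, i.e. l * z_l
    total = l / (1.0 + gamma_l) + sum(new_ratios)
    return (l + i) / total - 1.0
```

Starting from $\gamma_l = 1$ with $l = 2$ (so $z_l = 0.5$) and adding two groups with ratios 0.25 each gives $z_{l+2} = 1.5/4 = 0.375$, hence $\gamma_{l+2} = 1/0.375 - 1 \approx 1.667$, the same value as recomputing the estimator from all four groups.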

18 The selection of the number of the largest order statistics in the Hill estimator.
A Hill plot $\{(k, \hat\gamma^H(n, k)),\ 1 \le k \le n - 1\}$: the estimate $\hat\gamma^H(n, k)$ is chosen from an interval of $k$ in which this function demonstrates stability.
Plot of the mean excess function $\{(u, e_n(u)) : X_{(1)} < u < X_{(n)}\}$, where
$e_n(u) = \sum_{i=1}^{n} (X_i - u)\, 1\{X_i > u\} \Big/ \sum_{i=1}^{n} 1\{X_i > u\}$
is the sample mean excess function over the threshold $u$. If this plot follows a reasonably straight line above a certain value of $u$, then this indicates that the excesses over $u$ follow a generalized Pareto distribution with positive shape parameter, whose mean excess function is $e_P(u) = (1 + \gamma u)/(1 - \gamma)$. The order statistic nearest to $u$ then determines the estimate of $k$.
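The sample mean excess function can be computed directly from its definition. The sketch below is illustrative: the function names are hypothetical, and the choice of the distinct observed values below the maximum as thresholds is an assumption, not something prescribed by the slides.

```python
def mean_excess(sample, u):
    """Sample mean excess e_n(u): average of X_i - u over the X_i exceeding u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold u")
    return sum(excesses) / len(excesses)


def mean_excess_plot(sample):
    """Points (u, e_n(u)) with u running over the distinct values below the maximum."""
    thresholds = sorted(set(sample))[:-1]
    return [(u, mean_excess(sample, u)) for u in thresholds]
```

For the toy sample `[1, 2, 3, 4]` the plot consists of the points (1, 2.0), (2, 1.5) and (3, 1.0). For real data one would look for a range of $u$ above which the points follow a roughly straight, increasing line.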

19 The sensitivity of the Hill estimate to k. [Figure: Hill's estimate gamma(k) against k for 15 samples of Weibull (left), Pareto (middle) and Fréchet (right) distributions, all with shape parameter $\alpha = 1/\gamma = 0.5$. Sample size is n = 1000.]

20 Plot of the mean excess function. Left: Weibull distribution. Right: Pareto distribution. The shape parameter is 0.5, sample size 1000.

21 Bootstrap method for k selection. Minimization of the empirical bootstrap estimate of the mean squared error of $\hat\gamma$:
$MSE(\hat\gamma) = E(\hat\gamma - \gamma)^2 \to \min_k$.
The bootstrap estimate is obtained by drawing $B$ samples with replacement from the original data set $X^n$. Smaller re-samples $\{X_1^*, \dots, X_{n_1}^*\}$ of the size $n_1 = n^\beta$, $0 < \beta < 1$, are used. The corresponding smaller $k_1$ and an optimal $k$ are related by
$k = k_1 (n/n_1)^\alpha$, $0 < \alpha < 1$,
where $\beta = 1/2$ and $\alpha = 2/3$, P. Hall (1990).

22 The empirical bootstrap estimate of $MSE(\hat\gamma)$ is
$MSE^*(n_1, k_1) = (\hat b^*(n_1, k_1))^2 + \mathrm{var}^*(n_1, k_1) \to \min_{k_1}$,
where
$\hat b^*(n_1, k_1) = \frac{1}{B} \sum_{b=1}^{B} \hat\gamma_b^*(n_1, k_1) - \hat\gamma(n, k)$,
$\mathrm{var}^*(n_1, k_1) = \frac{1}{B - 1} \sum_{b=1}^{B} \left( \hat\gamma_b^*(n_1, k_1) - \frac{1}{B} \sum_{b=1}^{B} \hat\gamma_b^*(n_1, k_1) \right)^2$
are the empirical bootstrap estimates of the bias and the variance, and $\hat\gamma_b^*$ is Hill's estimate constructed from a re-sample of the size $n_1$ with the parameter $k_1$.
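The bootstrap selection of $k$ can be sketched as below. This is a rough illustration under stated assumptions rather than the procedure used for the results in these slides: the function names are hypothetical, the same $B$ re-samples are reused for every candidate $k_1$, the value $k = k_1 (n/n_1)^\alpha$ is rounded and clipped to $[1, n-1]$, and the data are assumed positive with at least a few dozen observations.

```python
import math
import random


def hill(sorted_x, k):
    """Hill's estimate from an ascending sorted list using the k largest values."""
    n = len(sorted_x)
    top = sorted_x[n - k:]
    return sum(math.log(v) for v in top) / k - math.log(sorted_x[n - k - 1])


def select_k_bootstrap(sample, B=100, beta=0.5, alpha=2.0 / 3.0, seed=0):
    """Pick k by minimizing the bootstrap MSE estimate over k1 on re-samples."""
    rng = random.Random(seed)
    n = len(sample)
    n1 = max(3, int(n ** beta))               # re-sample size n1 = n^beta
    scale = (n / n1) ** alpha                 # k = k1 * (n / n1)^alpha
    full = sorted(sample)
    resamples = [sorted(rng.choices(sample, k=n1)) for _ in range(B)]
    best_k1, best_mse = 1, float("inf")
    for k1 in range(1, n1):
        k = min(n - 1, max(1, round(k1 * scale)))
        estimates = [hill(r, k1) for r in resamples]
        mean = sum(estimates) / B
        bias = mean - hill(full, k)           # bootstrap bias estimate
        var = sum((e - mean) ** 2 for e in estimates) / (B - 1)
        mse = bias ** 2 + var
        if mse < best_mse:
            best_k1, best_mse = k1, mse
    return min(n - 1, max(1, round(best_k1 * scale)))
```

A typical call would be `select_k_bootstrap(data, B=200)`, followed by Hill's estimate with the returned $k$. The result depends on the random seed, so in practice it is worth checking its stability over several seeds.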

23 Double bootstrap method for k selection. The method of Danielsson, de Haan, Peng and de Vries (1997) requires fewer parameters than the bootstrap method of Hall (1990): $n_1$ and $B$ are required, $\alpha$ is not required. Auxiliary statistic:
$MSE(z_{n,k}^*) = E\left(z_{n,k}^* - z_{n,k}\right)^2$,
where $z_{n,k} = M_{n,k} - 2(\hat\gamma^H(n, k))^2$, $M_{n,k} = \frac{1}{k} \sum_{j=1}^{k} \left(\log X_{(n-j+1)} - \log X_{(n-k)}\right)^2$, and $z_{n,k}^*$ is a bootstrap estimate of $z_{n,k}$.

24 Double bootstrap method for k selection. $M_{n,k}/(2\hat\gamma^H(n, k))$ and $\hat\gamma^H(n, k)$ are consistent estimates of $\gamma$ $\Longrightarrow$ $z_{n,k} \to 0$ as $n \to \infty$ $\Longrightarrow$ $AMSE(z_{n,k}) = E(z_{n,k})^2 \to \min_k$.

25 The double bootstrap procedure is:
1. Draw $B$ bootstrap sub-samples of the size $n_1 \in (\sqrt{n}, n)$ (e.g., $n_1 \approx n^{3/4}$) from the original sample and determine the value $\hat k_{n_1}$ that minimizes the MSE of $z_{n_1,k}$.
2. Repeat this for $B$ sub-samples of the size $n_2 = [n_1^2/n]$ ($[x]$ is the integer part of $x$) and determine the value $\hat k_{n_2}$ that minimizes the MSE of $z_{n_2,k}$.
3. Calculate $\hat k_n^{opt}$ from the formula
$\hat k_n^{opt} = \left[ \frac{(\hat k_{n_1})^2}{\hat k_{n_2}} \left(1 - \frac{1}{\hat\rho_1}\right)^{\frac{2}{2\hat\rho_1 - 1}} \right]$, $\hat\rho_1 = \frac{\log \hat k_{n_1}}{2 \log(\hat k_{n_1}/n_1)}$,
and estimate $\gamma$ by Hill's estimate with $\hat k_n^{opt}$.
The method is robust with respect to the choice of $n_1$, Gomes, Oliveira (2000).

26 The sequential procedure for k selection is based on the theoretical result that the maximum random fluctuation $\sqrt{i}\, |\hat\gamma^H(n, i) - \gamma|$, $2 \le i \le k$, is of the order $(\log \log n)^{1/2}$ in probability, Drees & Kaufmann (1998).

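27 Sequential procedure for k selection. Algorithm.
1. An initial estimate $\hat\gamma_0 = \hat\gamma^H(n, 2\sqrt{n})$ of the parameter $\gamma$ is obtained by Hill's estimate.
2. For $r_n = 2.5\, \hat\gamma_0\, n^{0.25}$ we compute
$\hat k(r_n) = \min\{k \in \{1, \dots, n - 1\} : \max_{2 \le i \le k} \sqrt{i}\, |\hat\gamma^H(n, i) - \hat\gamma^H(n, k)| > r_n\}$.
If $r_n$ is too large and the condition $\max_{2 \le i \le k} \sqrt{i}\, |\hat\gamma^H(n, i) - \hat\gamma^H(n, k)| > r_n$ is never satisfied, it is recommended to repeatedly replace $r_n$ by $0.9 r_n$ until $\hat k(r_n)$ is well defined.
3. Similarly, $\hat k(r_n^\varepsilon)$ for $\varepsilon = 0.7$ is computed.
4. The optimal
$\hat k_{opt} = \frac{1}{3} \left( \frac{\hat k(r_n^\varepsilon)}{(\hat k(r_n))^\varepsilon} \right)^{1/(1 - \varepsilon)} (2 \hat\gamma_0^2)^{1/3}$
is calculated and $\gamma$ is estimated by $\hat\gamma^H(n, \hat k_{opt})$.
The method is sensitive to the choice of $r_n$.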

28 Plot method for m selection in the group estimator. [Figure: the plot $\{(m, 1/z_m - 1)\}$ for a Pareto distribution with $\gamma = 1$; the true $\gamma$ is shown by the dotted line. Sample sizes are $n \in \{150, 500, 1000\}$.]

29 Plot method for m selection in the group estimator. Plot: $\{(m, z_m),\ m_0 < m < M_0\}$, $m_0 > 2$, $M_0 < n/2$, where $m = n/l$ and
$z_m = (m/n) \sum_{i=1}^{[n/m]} k_{(n/m) i}$.
From the consistency result $z_l \to \frac{1}{1 + \gamma}$ a.s. it follows that there must be an interval $[m_-, m_+]$ such that $z_m \approx \alpha/(1 + \alpha) = (1 + \gamma)^{-1}$ for all $m \in [m_-, m_+]$. We suggest choosing the average value $\bar z = \mathrm{mean}\{1/z_m - 1 : m \in [m_-, m_+]\}$ and $m^* \in [m_-, m_+]$ as a point at which the plotted value $1/z_m - 1$ equals $\bar z$.

30 Bootstrap method for the automatic selection of m. Minimization of the empirical bootstrap estimate of the mean squared error of $(1 + \gamma_l)^{-1}$, $l = n/m$:
$MSE(\gamma_l) = E\left( \frac{1}{l} \sum_{i=1}^{l} k_{li} - \frac{1}{1 + \gamma} \right)^2 \to \min_m$.
The bootstrap estimate is obtained by drawing $B$ samples with replacement from the original data set $X^n$. Smaller re-samples $\{X_1^*, \dots, X_{n_1}^*\}$ of the size $n_1 = n^d$, $0 < d < 1$, are used. The re-sample is divided into $l_1$ subgroups. The sizes of subgroups $m_1$ and $m$ are related by
$m = m_1 (n/n_1)^c$, $0 < c < 1$,
where $m_1$ is the size of subgroups in re-samples.

31 The empirical bootstrap estimate of the MSE is
$MSE^*(l_1, m_1) = (\hat b^*(l_1, m_1))^2 + \mathrm{var}^*(l_1, m_1) \to \min_{m_1}$,
where
$\hat b^*(l_1, m_1) = \frac{1}{B} \sum_{b=1}^{B} z_{l_1}^b - z_l$,
$\mathrm{var}^*(l_1, m_1) = \frac{1}{B - 1} \sum_{b=1}^{B} \left( z_{l_1}^b - \frac{1}{B} \sum_{b=1}^{B} z_{l_1}^b \right)^2$
are the empirical bootstrap estimates of the bias and the variance, and $z_{l_1}^b = \frac{1}{l_1} \sum_{i=1}^{l_1} k_{l_1 i}$ is the group estimator constructed from a re-sample. How to select $c$ and $d$?

32 Simulation study: the selection of c and d. Asymptotic theory (P. Hall (1990)) recommends $d = 1/2$ and $c = 2/3$ for the bootstrap estimation of the parameter $k$ of Hill's estimate of $\gamma$. We check $c \in \{0.05, 0.1, 0.2, \dots, 0.5\}$ for a fixed $d = 0.5$. Relative bias and the square root of the mean squared error:
$\mathrm{Bias}_\gamma = \frac{1}{\gamma} \cdot \frac{1}{N_R} \sum_{i=1}^{N_R} (\hat\gamma_i - \gamma)$,
$\mathrm{RMSE}_\gamma = \frac{1}{\gamma} \sqrt{\frac{1}{N_R} \sum_{i=1}^{N_R} (\hat\gamma_i - \gamma)^2}$.

33 Conclusions: the best values of c for the fixed d = 0.5 are c = { }; the bias of the group estimator is larger for Weibull distribution. Further research: proof of theoretically best values of c and d for the group estimator using the bootstrap.

34 Rough methods for the detection of heavy tails and the number of finite moments. 1. Ratio of the maximum to the sum. Let $X_1, X_2, \dots, X_n$ be i.i.d. r.v.s. To check the moment conditions of the data we define the statistic
$R_n(p) = M_n(p)/S_n(p)$, $n \ge 1$, $p > 0$,
where $S_n(p) = |X_1|^p + \dots + |X_n|^p$ and $M_n(p) = \max(|X_1|^p, \dots, |X_n|^p)$, $n \ge 1$.
For different values of $p$ the plot of $n \mapsto R_n(p)$ gives preliminary information about the distribution $P\{|X| > x\}$. If $R_n(p)$ is small for large $n$, then $E|X|^p < \infty$ follows. For large $n$, a significant difference between $R_n(p)$ and zero indicates that the moment $E|X|^p$ is infinite.
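The statistic $R_n(p)$ is easy to track along the sample. The sketch below (the function name `max_to_sum_path` is hypothetical) returns the whole path $R_1(p), \dots, R_N(p)$ in one pass, which is what one would plot against $n$.

```python
def max_to_sum_path(sample, p):
    """Return [R_1(p), ..., R_N(p)] with R_n(p) = max|X_i|^p / sum|X_i|^p, i <= n."""
    path = []
    running_sum = 0.0
    running_max = 0.0
    for x in sample:
        value = abs(x) ** p
        running_sum += value
        running_max = max(running_max, value)
        # Guard against a leading run of zeros, for which the ratio is undefined.
        path.append(running_max / running_sum if running_sum > 0 else float("nan"))
    return path
```

For the toy sample `[1, 1, 2]` and $p = 1$ the path is `[1.0, 0.5, 0.5]`. On real data one would compare the paths for several values of $p$ and check which of them approach zero as $n$ grows.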

35 Rough methods for the detection of heavy tails and the number of finite moments. 2. QQ-plot. A QQ-plot (quantiles-against-quantiles plot) draws the dependence
$\left\{ \left( X_{(k)}, F^{\leftarrow}\!\left( \frac{n - k + 1}{n + 1} \right) \right) : k = 1, \dots, n \right\}$,
where $X_{(1)} \ge \dots \ge X_{(n)}$ are the order statistics of the sample (indexed here in decreasing order, so that the largest observation is paired with the highest quantile), and $F^{\leftarrow}$ is the inverse function of the DF $F$. Often, the QQ-plot is built as a dependence of exponential quantiles against the order statistics of the underlying sample. Then $F^{\leftarrow}$ is the inverse function of the exponential DF, and a linear QQ-plot corresponds to the exponential distribution.
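The points of an exponential QQ-plot can be generated as follows. The sketch assumes that the order statistics are indexed in decreasing order, so that the largest observation is paired with the probability $n/(n+1)$; the function name and the `rate` parameter (the exponential parameter $\lambda$) are hypothetical.

```python
import math


def exponential_qq_points(sample, rate=1.0):
    """Pairs (X_(k), F^{-1}((n - k + 1) / (n + 1))) for the exponential DF."""
    n = len(sample)
    descending = sorted(sample, reverse=True)     # X_(1) >= ... >= X_(n)
    points = []
    for k in range(1, n + 1):
        prob = (n - k + 1) / (n + 1)
        quantile = -math.log(1.0 - prob) / rate   # inverse of 1 - exp(-rate * x)
        points.append((descending[k - 1], quantile))
    return points
```

If the points lie close to a straight line, the exponential model is plausible; a curve that bends away from a line for the largest observations points to a tail heavier than exponential.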

36 Rough methods for the detection of heavy tails and the number of finite moments. 3. Plot of the mean excess function
$e_n(u) = \sum_{i=1}^{n} (X_i - u)\, 1\{X_i > u\} \Big/ \sum_{i=1}^{n} 1\{X_i > u\}$.
1. Heavy-tailed distributions: $e(u) \to \infty$, $u \to \infty$.
2. A Pareto distribution: a linear $e(u)$.
3. An exponential distribution: the constant $e(u) = 1/\lambda$.
4. Light-tailed distributions: $e(u) \to 0$, $u \to \infty$.

37 Rough methods for the detection of heavy tails and the number of finite moments. [Figure: plots of the function $e(u)$ for several distributions.]

38 Rough methods for the detection of heavy tails and the number of finite moments. 4. Hill's and other estimators of the tail index. The Hill estimate works badly if
1. the underlying DF does not have a regularly varying tail,
2. the tail index $\alpha = 1/\gamma$ is not positive,
3. the sample size is not large enough,
4. the tail is not heavy enough, i.e. $\gamma$ is not big,
5. $F \in R_{-\alpha}$, since it strongly depends on the slowly varying function $l(x)$, which is usually unknown.
The disadvantages of Hill's estimate show that one has to apply several estimates of the tail index in a thorough analysis of data.

39 Definition of independence. Definition. Random variables $\xi_1, \xi_2, \dots, \xi_n$ are independent if for any $x_1, x_2, \dots, x_n$ it holds
$P\{\xi_1 \le x_1, \xi_2 \le x_2, \dots, \xi_n \le x_n\} = P\{\xi_1 \le x_1\} P\{\xi_2 \le x_2\} \cdots P\{\xi_n \le x_n\}$
or, in terms of cumulative distribution functions,
$F(x_1, x_2, \dots, x_n) = F_1(x_1) F_2(x_2) \cdots F_n(x_n)$,
where $F_k(x_k)$ is the distribution function of the r.v. $\xi_k$.

40 Rough methods for the dependence detection. The correlation between two r.v.s $X_1$ and $X_2$ is defined by
$\rho(X_1, X_2) = \frac{\mathrm{cov}(X_1, X_2)}{\sqrt{\mathrm{var}(X_1)\, \mathrm{var}(X_2)}}$,
where $\mathrm{cov}(X_1, X_2) = E(X_1 X_2) - E(X_1) E(X_2)$ is the covariance and $\mathrm{var}(X)$ is the variance.
1. If $X_1$ and $X_2$ are independent, then $\rho(X_1, X_2) = 0$.
2. If $\rho(X_1, X_2) = 0$, then $X_1$ and $X_2$ are not necessarily independent!
3. If Gaussian r.v.s $X_1$ and $X_2$ are not correlated, then they are independent.
4. For non-Gaussian r.v.s this may not be true.

41 Rough methods for the dependence detection. In general, covariances and correlations cannot indicate dependence.
1. They describe the degree to which two r.v.s are linearly dependent: $\rho(X_1, X_2) \in [-1, 1]$.
2. $\rho(X_1, X_2) = \pm 1$ $\Longleftrightarrow$ $X_1$ and $X_2$ are perfectly linearly dependent, i.e. $X_2 = \alpha + \beta X_1$ for some $\alpha \in \mathbb{R}$ and $\beta \ne 0$.
What tool could be appropriate to indicate dependence?

42 Exact methods for the dependence detection. The mixing conditions of a stationary sequence should be considered for dependence detection. Definition (Rosenblatt (1956b)). The strictly stationary ergodic sequence of random vectors $X_t$ is strongly mixing with rate function $\varphi_k$ if
$\sup_{A \in \sigma(X_t,\, t \le 0),\ B \in \sigma(X_t,\, t > k)} |P(A \cap B) - P(A) P(B)| = \varphi_k \to 0$, $k \to \infty$,
where $\sigma(\cdot)$ denotes the generated σ-field. The rate $\varphi_k$ shows how fast the dependence between the past and the future decreases.

43 Exact methods for the dependence detection. Another measure of dependence is given by the dependence index
$\beta_n = \sup_{x,y} \sum_{j=1}^{n} |f_j(x, y) - f(x) f(y)|$,
where $f(x)$ is the marginal probability density function (PDF) of a stationary sequence $\{X_j, j = 1, 2, \dots\}$ and $f_j(x, y)$ is the joint PDF of $X_1$ and $X_{1+j}$, $j = 1, 2, \dots$
1. For i.i.d. sequences $\beta_n = 0$ for all $n$,
2. for sequences with strong long range dependence $\beta_n$ may tend to infinity, and
3. in between $\beta_n$ may converge to a finite limit at various rates.
It is difficult to estimate such dependence measures by statistical tools.

44 Rough methods for the dependence detection. The autocorrelation function (ACF) is
$\rho_X(h) = \rho(X_t, X_{t+h}) = E\big((X_t - E(X_t))(X_{t+h} - E(X_{t+h}))\big) / \mathrm{Var}(X_t)$.
Let $\{X_t, t = 0, \pm 1, \pm 2, \dots\}$ be a stationary time series. The standard sample ACF at lag $h \in \mathbb{Z}$ is
$\rho_{n,X}(h) = \frac{\sum_{t=1}^{n-h} (X_t - \bar X_n)(X_{t+h} - \bar X_n)}{\sum_{t=1}^{n} (X_t - \bar X_n)^2}$,
where $\bar X_n = \frac{1}{n} \sum_{t=1}^{n} X_t$ is the sample mean. The accuracy of $\rho_{n,X}(h)$ may be poor if the sample size $n$ is small or $h$ is large. The relevance of $\rho_{n,X}(h)$ is determined by its rate of convergence to the real ACF. When the distribution of the $X_t$'s is very heavy-tailed (in the sense that $E X_t^4 = \infty$), this rate can be extremely slow.

45 The autocorrelation function for heavy-tailed data. For heavy-tailed data with infinite variance it is better to use the modified sample ACF
$\tilde\rho_{n,X}(h) = \frac{\sum_{t=1}^{n-h} X_t X_{t+h}}{\sum_{t=1}^{n} X_t^2}$,
i.e. the sample ACF without the centering by $\bar X_n$. However, this estimate may behave very unpredictably for non-linear processes: in that case this sample ACF may converge in distribution to a non-degenerate r.v. depending on $h$. For linear models it converges in distribution to a constant depending on $h$, Davis and Resnick (1985).
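Both versions of the sample ACF, the standard one with centering by the sample mean and the modified one without it, can be written side by side. The function names below are hypothetical.

```python
def sample_acf(x, h):
    """Standard sample ACF at lag h (centered by the sample mean)."""
    n = len(x)
    mean = sum(x) / n
    numerator = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h))
    denominator = sum((v - mean) ** 2 for v in x)
    return numerator / denominator


def modified_sample_acf(x, h):
    """Modified sample ACF at lag h for heavy-tailed data (no centering)."""
    n = len(x)
    numerator = sum(x[t] * x[t + h] for t in range(n - h))
    denominator = sum(v * v for v in x)
    return numerator / denominator
```

For the toy series `[1, 2, 3, 4]` and lag 1 the standard version gives 1.25/5 = 0.25, while the modified version gives 20/30, roughly 0.667. The two differ substantially whenever the mean is far from zero relative to the spread of the data.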

46 Confidence intervals of the sample ACF: linear processes. The causal ARMA (autoregressive moving average) process $X_t$ has the representation
$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$, $t = 0, \pm 1, \dots$, (1)
where $Z_t$ is an i.i.d. noise sequence and $\{\psi_j\}$ is a sequence of real numbers providing the convergence of the random series in (1), i.e., $\sum_{j=0}^{\infty} |\psi_j| < \infty$, Brockwell and Davis (1991).
The model ARMA(p,q) has the form $X_t = \sum_{j=1}^{p} \theta_j X_{t-j} + \sum_{j=0}^{q} \psi_j Z_{t-j}$, $t = 1, \dots, n$.
The model MA(q) has the form $X_t = \sum_{j=0}^{q} \psi_j Z_{t-j}$.

47 Confidence intervals of the sample ACF: linear processes. Bartlett's formula. If the process is linear (1), $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ and $E Z_t^4 < \infty$, then the standard sample ACF $\rho_{n,X}(i)$ has the asymptotic joint normal distribution with mean $\rho_X(i)$ (i.e. the ACF) and variance
$\mathrm{var}(\rho_{n,X}(i)) = c_{ii}/n$,
$c_{ii} = \sum_{k=-\infty}^{\infty} \left[ \rho_X^2(k + i) + \rho_X(k - i) \rho_X(k + i) + 2 \rho_X^2(i) \rho_X^2(k) - 4 \rho_X(i) \rho_X(k) \rho_X(k + i) \right]$, (2)
as $n \to \infty$. Bartlett's formula (2) allows one to test the hypothesis $\rho_X(i) = 0$.

48 Confidence intervals of the sample ACF. Example. For i.i.d. white noise $Z_t$ we have $\rho_Z(0) = 1$ and $\rho_Z(i) = 0$ for $i \ne 0$ (since $Z_t$ and $Z_{t+i}$ are independent) and $\mathrm{var}(\rho_{n,Z}(i)) = 1/n$ by (2). For an ARMA process driven by such a white noise $Z_t$:
1. The sample ACF is approximately normally distributed with mean 0 and variance $1/n$ for sufficiently large $n$.
2. This provides a 95% confidence interval with the bounds $\pm 1.96/\sqrt{n}$ for the sample ACF.
3. The hypothesis $\rho_X(i) = 0$ is accepted if $\rho_{n,X}(i)$ falls within this interval.
What would happen if
1. the noise $Z_t$ is not normal, and (or)
2. the process is not linear?

49 Confidence intervals of the heavy-tailed sample ACF $\tilde\rho_{n,X}(h)$: linear processes. Consider an ARMA process with i.i.d. regularly varying noise and tail index $0 < \alpha < 2$ (infinite variance). The heavy-tailed (modified) sample ACF $\tilde\rho_{n,X}(h)$ estimates the quantity $\sum_j \psi_j \psi_{j+h} / \sum_j \psi_j^2$, which represents the autocorrelation of $X_0$ and $X_h$ in the case of a finite variance. Illusion: the heavy-tailed sample ACF $\tilde\rho_{n,X}(h)$ can be applied to heavy-tailed processes without problem. Recommendation (Resnick (2006), p. 349): for $\alpha < 1$ use $\tilde\rho_{n,X}(h)$; for $1 < \alpha < 2$ use the classical sample ACF $\rho_{n,X}(h)$. The calculation of confidence intervals in both cases is not easy.

50 Confidence intervals of the heavy-tailed sample ACF: nonlinear processes. GARCH(p,q) process (Mikosch (2002)):
$X_t = \mu + \sigma_t Z_t$, $t \in \mathbb{Z}$,
$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$,
where $(Z_t)$ is an i.i.d. sequence, $\sigma_t$ and $Z_t$ are independent for a fixed $t$, the mean $\mu$ is estimated from the data (in particular, $\mu = 0$), and $\alpha_i$ and $\beta_j$ are non-negative constants. If the marginal distribution of the time series is very heavy-tailed, namely, the fourth moment is infinite, then the asymptotic normal confidence bounds for the sample ACF are no longer applicable.

51 Testing of long-range dependence (LRD). Definition. A stationary process $(X_t)$ is long range dependent if $\sum_{h=0}^{\infty} |\rho_X(h)| = \infty$, where $\rho_X(h)$ is the ACF at lag $h \in \mathbb{Z}$, and short range dependent otherwise. This property implies that even though the $\rho_X(h)$'s are individually small for large lags, their cumulative effect is important. To detect LRD by statistical procedures one replaces $\rho_X(h)$ by the sample ACF $\rho_{n,X}(h)$. The LRD effect is typical for long time series, e.g., several thousand points. One can look at lags 250, 300, 350, etc. Usual short range dependent data sets would show a sample ACF dying out after only a few lags and then staying within the 95% Gaussian confidence window $\pm 1.96/\sqrt{n}$.

52 Testing of long-range dependence: Hurst parameter. One can assume that for some constant $c_\rho > 0$
$\rho_X(h) \sim c_\rho h^{2(H - 1)}$ for large $h$ and $H \in (0.5, 1)$ (LRD case).
The constant $H \in (0.5, 1)$ is called the Hurst parameter. The closer $H$ is to 1, the slower $\rho_X(h)$ decays to zero as $h \to \infty$, i.e., the longer is the range of dependence in the time series. Kettani and Gubner (2002):
$\hat H_n = 0.5 \left( 1 + \log_2 (1 + \rho_{n,X}(1)) \right)$.
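The estimator of Kettani and Gubner only needs the standard sample ACF at lag 1, so it can be sketched in a few lines. The function names are hypothetical, and the guard against a lag-1 autocorrelation of -1 or below is an addition made here to keep the logarithm defined.

```python
import math


def lag1_sample_acf(x):
    """Standard sample ACF at lag 1 (centered by the sample mean)."""
    n = len(x)
    mean = sum(x) / n
    numerator = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    denominator = sum((v - mean) ** 2 for v in x)
    return numerator / denominator


def hurst_estimate(x):
    """H_n = 0.5 * (1 + log2(1 + rho_n(1)))."""
    rho1 = lag1_sample_acf(x)
    if 1.0 + rho1 <= 0.0:
        raise ValueError("1 + rho(1) must be positive")
    return 0.5 * (1.0 + math.log2(1.0 + rho1))
```

A lag-1 autocorrelation of zero gives $\hat H_n = 0.5$, and values of $\hat H_n$ close to 1 correspond to a lag-1 autocorrelation close to 1.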

53 Dependence detection for bivariate data. Measures:
Kendall's rank coefficient $\tau$ can be estimated from the sample by
$\rho_\tau = \frac{2 S_\tau}{n(n - 1)}$, where $S_\tau = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \mathrm{sign}(r_j - r_i)$,
and $r_i$ is the rank, with respect to the second characteristic, of the observation that has rank $i$ with respect to the first characteristic.
Spearman's rank coefficient $\rho$ can be estimated from the sample by
$\rho_S = 1 - \frac{6 S_\rho}{n^3 - n}$, where $S_\rho = \sum_{i=1}^{n} (r_i - i)^2$.
Pickands dependence function $A(t)$: $\rho_\tau$ and $\rho_S$ can be represented by means of $A(t)$.
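Both rank coefficients can be computed from the sequence $r_1, \dots, r_n$ obtained by ordering the pairs by the first component and ranking the second. The sketch below assumes that there are no ties in either component; the function names are hypothetical.

```python
def second_ranks(xs, ys):
    """r_i: rank of y within ys for the pair whose x has rank i (no ties assumed)."""
    order_by_x = sorted(range(len(xs)), key=lambda idx: xs[idx])
    rank_of_y = {idx: rank for rank, idx in
                 enumerate(sorted(range(len(ys)), key=lambda idx: ys[idx]), start=1)}
    return [rank_of_y[idx] for idx in order_by_x]


def kendall_tau(xs, ys):
    """Sample Kendall coefficient 2 * S_tau / (n * (n - 1))."""
    r = second_ranks(xs, ys)
    n = len(r)
    s_tau = sum((r[j] > r[i]) - (r[j] < r[i])       # sign(r_j - r_i)
                for i in range(n) for j in range(i + 1, n))
    return 2.0 * s_tau / (n * (n - 1))


def spearman_rho(xs, ys):
    """Sample Spearman coefficient 1 - 6 * S_rho / (n^3 - n)."""
    r = second_ranks(xs, ys)
    n = len(r)
    s_rho = sum((r[i] - (i + 1)) ** 2 for i in range(n))
    return 1.0 - 6.0 * s_rho / (n ** 3 - n)
```

For `xs = [1, 2, 3, 4]` and `ys = [1, 3, 2, 4]` the ranks are $r = (1, 3, 2, 4)$, so $S_\tau = 4$ and $S_\rho = 2$, giving $\rho_\tau = 8/12 \approx 0.667$ and $\rho_S = 1 - 12/60 = 0.8$.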

54 Pickands dependence function. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a bivariate i.i.d. random sample with a bivariate extreme value distribution $G(x, y)$. Example: $X_1$ is a TCP-flow file size, $Y_1$ is the duration of its transmission. Similarly to the univariate case, this implies that there exist normalizing constants $a_{j,n} > 0$ and $b_{j,n} \in \mathbb{R}$, $j = 1, 2$, such that as $n \to \infty$,
$P\{(M_{1,n} - b_{1,n})/a_{1,n} \le x,\ (M_{2,n} - b_{2,n})/a_{2,n} \le y\} = F^n(a_{1,n} x + b_{1,n},\ a_{2,n} y + b_{2,n}) \to G(x, y)$, (3)
where $M_{1,n} = \max(X_1, \dots, X_n)$ and $M_{2,n} = \max(Y_1, \dots, Y_n)$ are the component-wise maxima. The vector $(M_{1,n}, M_{2,n})$ will in general not be present in the original data.

55 Pickands dependence function. Definition. Let $F_1(x)$ and $F_2(y)$ be the DFs of $X$ and $Y$. Let $G_j(x)$, $j = 1, 2$, be a univariate extreme value DF such that $F_j(x)$ is in its domain of attraction. $G(x, y)$ may be determined by the margins $G_1(x)$ and $G_2(y)$ through the representation
$G(x, y) = \exp\left( \log(G_1(x) G_2(y))\, A\!\left( \frac{\log G_2(y)}{\log(G_1(x) G_2(y))} \right) \right)$,
where $A(t)$, $t \in [0, 1]$, is the Pickands dependence function.

56 Pickands dependence function: properties. In the bivariate case the function $A(t)$ satisfies two properties:
1. $\max(1 - t, t) \le A(t) \le 1$, $t \in [0, 1]$, i.e. $A(0) = A(1) = 1$ and $A(t)$ lies inside the triangle determined by the points (0, 1), (1, 1) and (0.5, 0.5);
2. $A(t)$ is convex.
The case $A(t) \equiv 1$ corresponds to total independence and $A(t) = \max(1 - t, t)$ corresponds to total dependence.

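57 A-estimators.
Hall and Tajvidi (2000): $\hat A_n^{HT}(t) = \left( \frac{1}{n} \sum_{i=1}^{n} \min\left( \frac{\hat\xi_i / \bar\xi_n}{1 - t},\ \frac{\hat\eta_i / \bar\eta_n}{t} \right) \right)^{-1}$.
Capéraà et al. (1997): $\log \hat A_n^{C}(t) = \frac{1}{n} \sum_{i=1}^{n} \log \max\left( t \hat\xi_i,\ (1 - t) \hat\eta_i \right) - t\, \frac{1}{n} \sum_{i=1}^{n} \log \hat\xi_i - (1 - t)\, \frac{1}{n} \sum_{i=1}^{n} \log \hat\eta_i$.
Here $\hat\xi_i = -\log \hat G_1(X_i)$ and $\hat\eta_i = -\log \hat G_2(Y_i)$, $i = 1, \dots, n$, $\bar\xi_n = n^{-1} \sum_{i=1}^{n} \hat\xi_i$, $\bar\eta_n = n^{-1} \sum_{i=1}^{n} \hat\eta_i$.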

58 Problems of A-estimators.
1. The estimators are not convex. They may be improved by taking a convex hull.
2. The margins $G_1(x)$ and $G_2(x)$ are unknown. One has to replace them by their estimates $\hat G_1(x)$ and $\hat G_2(x)$, e.g., by empirical DFs constructed from component-wise maxima over blocks of data. The number of these maxima may be rather small, which may affect the accuracy of an empirical DF.
3. The component-wise maxima may not be observable together, i.e. there are pairs of maxima which are not present in the sample.

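59 Bivariate quantile curve. Both estimates of $A(t)$ make it possible to obtain the estimate $\hat G(x, y)$ and to construct bivariate quantile curves
$Q(\hat G, p) = \{(x, y) : \hat G(x, y) = p\}$, $0 < p < 1$, (4)
$\hat G(x, y) = \exp\left( \log(\hat G_1(x) \hat G_2(y))\, \hat A\!\left( \frac{\log \hat G_2(y)}{\log(\hat G_1(x) \hat G_2(y))} \right) \right)$,
assuming that $\hat G_1(x) = p^{(1 - w)/\hat A(w)}$ and $\hat G_2(y) = p^{w/\hat A(w)}$ for some $w \in [0, 1]$ in order to get $\hat G(x, y) = p$, Beirlant et al. (2004). The quantile curve consists of the points
$Q(\hat G, p) = \left\{ \left( \hat G_1^{-1}\big(p^{(1 - w)/\hat A(w)}\big),\ \hat G_2^{-1}\big(p^{w/\hat A(w)}\big) \right) : w \in [0, 1] \right\}$.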

60 Web-traffic analysis Characteristics of sub-sessions: the size of a sub-session (s.s.s); the duration of a sub-session (d.s.s.). Characteristics of the transferred Web-pages: the size of the response (s.r.); the inter-response time (i.r.t.).

61 Description of the Web data. [Table: main statistical characteristics of the Web data. Columns: s.s.s. (B), d.s.s. (sec), s.r. (B), i.r.t. (sec). Rows: sample size, minimum, maximum, mean, StDev. Numerical entries missing.]

62 Results of the Web traffic analysis. [Table: estimation of the tail index for Web-traffic characteristics. Columns: r.v., $c$, $m^b$, $\gamma_l^b$, $\gamma_l^p$, $\hat\gamma^H_{n,k}$. Rows: s.s.s., s.r., i.r.t., d.s.s. Numerical entries missing.]

63 Results of the Web traffic analysis. Notations to the previous table: $\gamma_l^b$ is the group estimator with the bootstrap-selected parameter $m$ ($m^b$); $\gamma_l^p$ is the group estimator with the plot-selected parameter $m$; $\hat\gamma^H_{n,k}$ is Hill's estimator with the plot-selected parameter $k$.

64 [Figure: the EVI estimation by Hill's estimator and the group estimator $\gamma_l$ for the data sets size of sub-sessions (left) and duration of sub-sessions (right).]

65 [Figure: the EVI estimation by Hill's estimator and the group estimator $\gamma_l$ for the data sets inter-response times (left) and size of responses (right).]

66 Conclusions from the analysis of the tail index:
1. the distributions of the considered Web-traffic characteristics are heavy-tailed;
2. at least the $\beta$th moments, $\beta \ge 2$, of the distributions of the s.s.s., s.r. and d.s.s. are not finite;
3. the distribution of i.r.t. has two finite moments;
4. for s.s.s. it might be possible (when $1 < \hat\gamma < 2$) that $\alpha < 1$ and the expectation is also not finite.

67 Results of the Web traffic analysis. Analysis of the values $R_n(p)$. The values $R_n(p)$ are dramatically large for large $n$ and $p \ge 2$, e.g., in the case of the duration of sub-sessions and $n = 350$, $\ln R_n(p) \sim 10$ for $p = 2$ and $\ln R_n(p) \sim 10^3$ for $p = 3$. Hence, one may conclude that all moments of the considered r.v.s apart from the first one are not finite.

68 [Figure: $n \mapsto \ln R_n(p)$ for the duration of sub-sessions (left) and the size of sub-sessions (right) for a variety of p-values.]

69 [Figure: $n \mapsto \ln R_n(p)$ for the inter-response times (left) and the size of responses (right) for a variety of p-values.]

70 Results of the Web traffic analysis. Analysis of the mean excess function. The plots $u \mapsto e_n(u)$ tend to infinity for large $u$, implying heavy tails. These plots are close to a linear shape for all sets of data. The latter implies that the considered distributions can be modelled by a DF of a Pareto type.

71 [Figure: exceedance $e_n(u)$ against the threshold $u$ for the duration of sub-sessions (left) and the size of sub-sessions (right).]

72 [Figure: exceedance $e_n(u)$ against the threshold $u$ for the inter-response times (left) and the size of responses (right).]

73 Results of the Web traffic analysis. Analysis of QQ-plots. Two versions of QQ-plots are shown. The top right plots show that the distribution of the d.s.s. is close to lognormal and Pareto distributions; the distributions of s.s.s., i.r.t. and s.r. are close to Pareto. The top left plots show that the exponential distribution cannot be used as an appropriate model for these r.v.s; the distributions of d.s.s., s.s.s., i.r.t. and s.r. are close to a Generalized Pareto distribution with the DF
$\Psi_{\sigma,\gamma}(x) = 1 - (1 + \gamma x/\sigma)^{-1/\gamma}$ for $\gamma \ne 0$, and $\Psi_{\sigma,\gamma}(x) = 1 - \exp(-x/\sigma)$ for $\gamma = 0$,
with different values of the parameters $\gamma$ and $\sigma$. The QQ-plot does not give a unique model to fit the underlying distribution.

74 QQ-plots for the duration of sub-sessions. Left: exponential quantiles (top) and quantiles of the GPD(0.3;1) (bottom) against d.s.s./s (the linear curves correspond to the QQ-plot of the same distributions). Right: Empirical DFs of the r.v. $U_i = F(X_i)$. Exponential, Pareto, Weibull, lognormal and normal distributions are used as models of F. The linear curve corresponds to the exponential distribution.

75 QQ-plot for the size of sub-sessions. Left: exponential quantiles (top) and quantiles of the GPD(1;0.015) (bottom, first plot) and the GPD(0.05;0.3) (bottom, second plot) against s.s.s./s (the linear curves correspond to the QQ-plot of the same distributions). Right: Empirical DFs of the r.v. U i = F(X i ). Exponential, Pareto, Weibull, lognormal and normal distributions are used as the models of F. The linear curve corresponds to the exponential distribution.

76 QQ-plots for the inter-response times. Left: exponential quantiles (top) and quantiles of the GPD(0.8;0.015) (bottom) against i.r.t./s (the linear curves correspond to the QQ-plot of the same distributions). Right: Empirical DFs of the r.v. U i = F(X i ). Exponential, Pareto, Weibull, lognormal and normal distributions are used as models of F. The linear curve corresponds to the exponential distribution.

77 QQ-plot for the size of responses. Left: exponential quantiles (top) and quantiles of the GPD(1;0.015) (bottom) against s.r./s (the linear curves correspond to the QQ-plot of the same distributions). Right: Empirical DFs of the r.v. U i = F(X i ). Exponential, Pareto, Weibull, lognormal and normal distributions are used as the models of F. The linear curve corresponds to the exponential distribution.

78 Results of the Web traffic analysis. ACF analysis of Web-data. The previous analysis shows that the considered Web data are heavy-tailed with infinite variance. Therefore, the modified sample ACF (without centering by the sample mean)
$\tilde\rho_{n,X}(h) = \frac{\sum_{t=1}^{n-h} X_t X_{t+h}}{\sum_{t=1}^{n} X_t^2}$
is relevant.

79 Summary results of the preliminary analysis. Comparison of the recommended methods for Web traffic data.
r.v. | Number of finite moments: R_n(p) | Number of finite moments: Hill & group estimator | Type of distribution: QQ-plot | Type of distribution: e_n(u)
s.s.s. (B) | 1 | 1 | GPD(0.015; 1), GPD(0.05; 0.3) | Pareto-like
d.s.s. (sec) | 1 | 1 | GPD(1; 0.3), lognormal | Pareto-like
s.r. (B) | 1 | 1 | GPD(0.015; 1) | Pareto-like
i.r.t. (sec) | 1 | 2 | GPD(0.015; 0.8) | Pareto-like

80 Testing of dependence. [Figure: ACF estimation by the modified sample ACF and the standard sample ACF for the data sets s.s.s. (first two plots, left) and d.s.s. (last two plots, right). The dotted horizontal lines indicate 95% asymptotic confidence bounds ($\pm 1.96/\sqrt{n}$) corresponding to the ACF of i.i.d. Gaussian r.v.s.]

81 Testing of dependence. [Figure: ACF estimation by the modified sample ACF and the standard sample ACF for the data sets i.r.t. (first two plots, left) and s.r. (last two plots, right). The dotted horizontal lines indicate 95% asymptotic confidence bounds ($\pm 1.96/\sqrt{n}$) corresponding to the ACF of i.i.d. Gaussian r.v.s.]

82 Hurst parameter estimation for Web traffic data. Table: Hurst parameter estimates $\hat{H}_n$ for the Web traffic data sets s.s.s., d.s.s., i.r.t. and s.r. Main conclusion: all data sets are heavy-tailed and not long-range dependent.

83 TCP-flow analysis. We observe TCP-flow sizes S and durations D gathered from one source-destination pair. The motivation is to estimate the distribution of the maximal rate (or throughput) R = S/D and the expected throughput ER (or ES/D) that the transport system provides.

84 Description of the TCP-flow data. The analyzed data consist of: TCP-flow sizes and durations of transmissions measured in the mobile network of the Finnish operator Elisa; mobile TCP connections from periods of low, average and high network load conditions; TCP flows on port 80 (a WWW (HTTP) application). The number of analyzed flows is and, for practical reasons, we consider 61 disjoint bivariate samples, each of size n =

85 Description of the TCP-flow data. Table columns: Statistic; Unit; Definition; Sample mean (Min, Max); Sample variance (Min, Max). Rows: Size (kb), defined as Content and as Transmitted; Duration (sec), defined as SYN-FIN. The results are [min, max] ranges over all 61 samples. "Content" refers to the size of the downloaded web content and "Transmitted" means Content plus the segments retransmitted by TCP. Both are measures of the size of a flow. "SYN-FIN" means from the three-way handshake (synchronization) to finish.

86 Results of the TCP-flow data analysis. Table: Estimation of the EVI for flow sizes (Content and Transmitted) and durations (SYN-FIN). Columns: Min and Max of each of $\gamma^{H}(n, k)$, $\gamma_l$, $\hat{\gamma}^{M}(n, k)$ and $\hat{\gamma}^{UH}(n, k)$. Rows: Content, Transmitted, SYN-FIN.

87 Conclusions from the analysis of the tail index for TCP-flow data: (1) the distributions of TCP-flow size and duration are heavy-tailed; (2) all estimators apart from the group estimator $\gamma_l$ indicate that the flow-size samples (both Content and Transmitted) may have infinite variance, under the assumption that their distributions are regularly varying; (3) some samples of flow durations may have two finite moments.

88 Results of the TCP-flow data analysis. Table: Comparison of the rough methods for TCP-flow data.

Columns: data; number of first finite moments by $R_n(p)$; number of first finite moments by the estimators of γ; type of distribution by QQ-plot; type of distribution by $e_n(u)$.
Content: 1; 2 or 1; GPD(1, 1.3); Pareto-like
Transmitted: 1; 2 or < 1; GPD(1, 1.3); Pareto-like
SYN-FIN: < 1; 2 or < 1; GPD(1, 0.85); Pareto-like

89 Testing of dependence. ACF estimation by the modified sample ACF and the standard sample ACF of one sub-sample (n = 1000) of the TCP-flow sizes (first two plots left) and durations (last two plots right). The horizontal lines indicate 95% asymptotic confidence bounds ($\pm 1.96/\sqrt{n}$). Conclusions: The TCP-flow sizes may be independent. The ACFs of the TCP-flow durations have three clusters that may indicate dependence.

90 Testing of long-range dependence. Table: Hurst parameter estimation for Web traffic and TCP-flow data. Columns: TCP flow size, TCP flow duration; row: $\hat{H}_n$. Main conclusion: TCP-flow size and duration data sets are heavy-tailed and not long-range dependent.

91 Problems of throughput investigation. The distributions of both S and D are heavy-tailed and their expectations may not be finite. Thus, ES/D may not be computable. Since S and D are dependent and positive, the DF of the ratio R = S/D is defined by

$$F_R(x) = P\{S/D \le x\} = \int_0^\infty \int_0^{zx} dF(y, z) = \int_0^\infty \int_0^{zx} f(y, z)\,dy\,dz,$$

where f(y, z) is the joint PDF of S and D, and its expectation by

$$ER = \int_0^\infty x\,dF_R(x),$$

if the latter integral converges.
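As a simple empirical counterpart of the DF of R, one can form the ratios $R_i = S_i/D_i$ from the observed pairs and use their empirical DF. The sketch below is illustrative only: the function names are hypothetical, and the sample mean is a meaningful estimate of ER only if the integral above converges, which heavy tails may prevent.

```python
def throughput_sample(sizes, durations):
    """Return the ratios R_i = S_i / D_i for paired observations."""
    if len(sizes) != len(durations):
        raise ValueError("sizes and durations must have equal length")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    return [s / d for s, d in zip(sizes, durations)]


def empirical_df(sample, x):
    """Empirical DF: the fraction of observations not exceeding x."""
    if not sample:
        raise ValueError("sample is empty")
    return sum(1 for r in sample if r <= x) / len(sample)


def sample_mean(sample):
    """Sample mean; informative for ER only if ER is finite."""
    if not sample:
        raise ValueError("sample is empty")
    return sum(sample) / len(sample)
```

For example, sizes `[10, 20, 30, 40]` and durations `[1, 4, 3, 2]` give the ratios `[10, 5, 10, 20]`.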

92 Bivariate analysis of TCP-flow data. First, we check the dependence between the pairs $(S_1, D_1), \ldots, (S_n, D_n)$ to apply (3). For this purpose, we can calculate the ACF of the r.v.s $r_i = \sqrt{S_i^2 + D_i^2}$, $i = 1, \ldots, n$. The sample ACF of $\{r_i\}$ is small in absolute value at all lags (the possible exceptions are two lags that fall outside the 95% confidence interval). One may suppose that the size-duration pairs are independent.
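The check described above can be sketched as follows, assuming that $r_i$ is the Euclidean norm of the pair $(S_i, D_i)$ and that the bounds are the usual $\pm 1.96/\sqrt{n}$ for i.i.d. data. The function names are hypothetical.

```python
import math


def radial_sample(sizes, durations):
    """r_i = sqrt(S_i^2 + D_i^2) for each pair (assumed definition)."""
    if len(sizes) != len(durations):
        raise ValueError("sizes and durations must have equal length")
    return [math.hypot(s, d) for s, d in zip(sizes, durations)]


def sample_acf(x, h):
    """Standard (mean-centred) sample ACF at lag h."""
    n = len(x)
    mean = sum(x) / n
    denominator = sum((v - mean) ** 2 for v in x)
    if denominator == 0:
        raise ValueError("the sample is constant")
    numerator = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h))
    return numerator / denominator


def lags_outside_bounds(x, max_lag):
    """Lags 1..max_lag whose sample ACF exceeds 1.96/sqrt(n) in absolute value."""
    bound = 1.96 / math.sqrt(len(x))
    return [h for h in range(1, max_lag + 1) if abs(sample_acf(x, h)) > bound]
```

A short list of lags returned by `lags_outside_bounds` is consistent with independence at the 95% level, since about one lag in twenty is expected to fall outside the bounds by chance.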

93 Bivariate analysis of TCP-flow data. [Figure: scatter plot; axes: Block Maxima of Size (MB) and Block Maxima of Duration (hours).] Scatter plot of pairs of block maxima $(M^{j}_{S,m}, M^{j}_{D,m})$, $j = 1, \ldots, 610$, when the block size is m = The pairs that are present in the initial sample are marked by dots, whereas the pairs that are not present (e.g., the maximal size in the group does not necessarily correspond to the maximal duration in this group) are marked by circles. Lines D = S/384 and D = S/42 indicate 384 kb/s (EDGE) and 42 kb/s (GPRS) access rates.
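The component-wise block maxima and the dot/circle distinction in the scatter plot can be illustrated with a short sketch. The function name `block_maxima_pairs` and the choice to drop an incomplete trailing block are assumptions made for the example.

```python
def block_maxima_pairs(sizes, durations, m):
    """Component-wise maxima over consecutive disjoint blocks of length m.

    Returns one tuple (max_size, max_duration, observed) per block, where
    observed is True when the pair of maxima is itself one of the observed
    (size, duration) pairs in that block.
    """
    if len(sizes) != len(durations):
        raise ValueError("sizes and durations must have equal length")
    if m <= 0:
        raise ValueError("block size m must be positive")
    n_blocks = len(sizes) // m  # an incomplete trailing block is dropped
    result = []
    for j in range(n_blocks):
        block_s = sizes[j * m:(j + 1) * m]
        block_d = durations[j * m:(j + 1) * m]
        max_s, max_d = max(block_s), max(block_d)
        observed = (max_s, max_d) in zip(block_s, block_d)
        result.append((max_s, max_d, observed))
    return result
```

Blocks with `observed` equal to `False` correspond to the circles: the largest size and the longest duration in the block come from different flows.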

94 Testing of dependence of the block maxima. [Figure: two panels, ACF against lag.] Estimates of the standard sample ACF of both maxima samples of size 61, corresponding to TCP-flow sizes (left) and durations (right). The dotted horizontal lines indicate 95% asymptotic confidence bounds $\pm 1.96/\sqrt{n}$.

95 Testing of dependence of the block maxima. [Figure: two panels, ACF against lag.] Estimates of the standard sample ACF of both maxima samples of size 610, corresponding to TCP-flow sizes (left) and durations (right). The dotted horizontal lines indicate 95% asymptotic confidence bounds $\pm 1.96/\sqrt{n}$.

96 Testing of distribution of the block maxima. The Generalized Extreme Value (GEV) distribution

$$H_\gamma(x) = \begin{cases} \exp\left(-\left(1 + \gamma\,\dfrac{x - \mu}{\sigma}\right)^{-1/\gamma}\right), & \gamma \neq 0, \\ \exp\left(-e^{-(x - \mu)/\sigma}\right), & \gamma = 0, \end{cases}$$

is applied as a model of the block maxima distribution. Table: Maximum likelihood estimates of the GEV parameters γ, µ, σ from the block maxima samples of size 610 of the TCP-flow data, for Size (Content) and Duration (SYN-FIN).
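The GEV distribution function can be evaluated directly from its definition. The sketch below uses only the standard library; the function name `gev_cdf` is hypothetical, and the values returned outside the support (0 below the lower endpoint for γ > 0, 1 above the upper endpoint for γ < 0) follow the usual convention rather than anything stated on the slide.

```python
import math


def gev_cdf(x, gamma, mu, sigma):
    """GEV distribution function H_gamma(x) with location mu and scale sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if gamma == 0.0:
        # Gumbel case
        return math.exp(-math.exp(-z))
    arg = 1.0 + gamma * z
    if arg <= 0.0:
        # x lies outside the support of the distribution
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-arg ** (-1.0 / gamma))
```

Libraries may parametrize the shape differently; for instance, `scipy.stats.genextreme` uses a shape parameter with the opposite sign, so its documentation should be checked before comparing fitted values.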

97 Testing of distribution of the block maxima. [Figure: two panels, Quantiles of GEV against Sorted Sample.] QQ-plots of block maxima samples corresponding to TCP-flow sizes (left) and durations (right).

98 Testing of dependence of the TCP-flow data. [Figure: two panels, A(t) against t.] The estimation of the Pickands dependence function by the estimators $\hat{A}^{C}_n(t)$ (dashed line) and $\hat{A}^{HT}_n(t)$ (solid line). The maxima sample of size 61 (left) and of size 610 (right). The marginal distributions $G_1(x)$ and $G_2(x)$ of TCP-flow sizes and durations are estimated by GEV. Conclusions: TCP-flow size and duration are dependent.
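To make the idea of a nonparametric estimate of A(t) concrete, here is a minimal sketch of a Pickands-type estimator. It is not claimed to be either of the estimators plotted on the slide. It assumes the margins have already been transformed to unit exponential variables, $\xi_i = -\log G_1(X_i)$ and $\eta_i = -\log G_2(Y_i)$, and it assumes the convention in which $\min(\xi/(1-t), \eta/t)$ is exponential with mean $1/A(t)$. The optional normalization by the sample means, which forces the estimate to equal 1 at t = 0 and t = 1, is likewise an assumption of this example, as is the function name.

```python
import math


def pickands_estimate(xi, eta, t, normalize=True):
    """Pickands-type estimate of the dependence function A(t), 0 <= t <= 1.

    xi, eta: positive values, assumed to be unit-exponential transforms
    of the two margins.
    """
    if len(xi) != len(eta) or not xi:
        raise ValueError("xi and eta must be non-empty and of equal length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if normalize:
        mean_xi = sum(xi) / len(xi)
        mean_eta = sum(eta) / len(eta)
        xi = [v / mean_xi for v in xi]
        eta = [v / mean_eta for v in eta]
    total = 0.0
    for a, b in zip(xi, eta):
        left = a / (1.0 - t) if t < 1.0 else math.inf
        right = b / t if t > 0.0 else math.inf
        total += min(left, right)
    return len(xi) / total
```

Two reference shapes help when reading such estimates: A(t) = 1 for all t corresponds to independence, and A(t) = max(t, 1 - t) to complete dependence. Raw estimates need not be convex, so in practice they are often post-processed.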

99 Bivariate quantile curves of the TCP-flow data. Using estimates of A(t) one can construct bivariate quantile curves of TCP-flow data by (4). [Figure: two panels, Flow Duration against Flow Size, with curves for p = 0.75, p = 0.9 and p = 0.95.] Estimated quantile curves of TCP-flow data for $p \in \{0.75, 0.9, 0.95\}$ corresponding to the estimator $\hat{A}^{C}_n(t)$: the maxima sample of size 61 (left) and of size 610 (right).

100 Conclusions from bivariate analysis of TCP-flow data. The analysis is made from samples of moderate size. Size S and duration D are heavy-tailed with a probably infinite second moment. Their distributions are complicated in the sense that they do not belong to any known parametric model. Estimates of the Pickands dependence function show that S and D are dependent. Bivariate quantile curves show that the bivariate extreme value distribution of (S, D) is not quite heavy-tailed, in the sense that few observations fall in the outlier area, i.e. beyond the 97.5% quantile curve. This may be a special property of these mobile TCP data. Bivariate quantile curves are sensitive to, at least, (1) the estimation of the parameters of the margins of G(x, y) and the estimates of A(t), and (2) the number of component-wise maxima, or the block size.

101 Notes. When analyzing real data, three items have to be investigated: (1) the preliminary detection of heavy tails; (2) the dependence structure of univariate data; (3) the dependence structure of multivariate data.

102 References:
1. Markovich, N.M. and Krieger, U.R. (2005) Statistical Inspection and Analysis Techniques for Traffic Data Arising from the Internet. HETNETs 04 special journal issue on Convergent Multi-Service Networks and Next Generation Internet: Performance Modelling and Evaluation, pp. 72/1-72/9.
2. Beirlant, J., Goegebeur, Y., Teugels, J. and Segers, J. (2004) Statistics of Extremes: Theory and Applications. Wiley, Chichester, West Sussex.
3. Davis, R. and Resnick, S. (1985) Limit theory for moving averages of random variables with regularly varying tail probabilities. Ann. Probability 13.
4. Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997) Modeling Extremal Events. Springer, Berlin.
5. Brockwell, P.J. and Davis, R.A. (1991) Time Series: Theory and Methods, 2nd edition. Springer, New York.

103 References:
6. Kettani, H. and Gubner, J.A. (2002) A novel approach to the estimation of the Hurst parameter in self-similar traffic. In: Proceedings of IEEE Conference on Local Computer Networks, Tampa, Florida.
7. Resnick, S.I. (2006) Heavy-Tail Phenomena. Probabilistic and Statistical Modeling. Springer, New York.


A New Class of Tail-dependent Time Series Models and Its Applications in Financial Time Series A New Class of Tail-dependent Time Series Models and Its Applications in Financial Time Series Zhengjun Zhang Department of Mathematics, Washington University, Saint Louis, MO 63130-4899, USA Abstract

More information

Tail empirical process for long memory stochastic volatility models

Tail empirical process for long memory stochastic volatility models Tail empirical process for long memory stochastic volatility models Rafa l Kulik and Philippe Soulier Carleton University, 5 May 2010 Rafa l Kulik and Philippe Soulier Quick review of limit theorems and

More information

The Fundamentals of Heavy Tails Properties, Emergence, & Identification. Jayakrishnan Nair, Adam Wierman, Bert Zwart

The Fundamentals of Heavy Tails Properties, Emergence, & Identification. Jayakrishnan Nair, Adam Wierman, Bert Zwart The Fundamentals of Heavy Tails Properties, Emergence, & Identification Jayakrishnan Nair, Adam Wierman, Bert Zwart Why am I doing a tutorial on heavy tails? Because we re writing a book on the topic Why

More information

High quantile estimation for some Stochastic Volatility models

High quantile estimation for some Stochastic Volatility models High quantile estimation for some Stochastic Volatility models Ling Luo Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the degree of

More information

Limit theorems for dependent regularly varying functions of Markov chains

Limit theorems for dependent regularly varying functions of Markov chains Limit theorems for functions of with extremal linear behavior Limit theorems for dependent regularly varying functions of In collaboration with T. Mikosch Olivier Wintenberger wintenberger@ceremade.dauphine.fr

More information

4. Distributions of Functions of Random Variables

4. Distributions of Functions of Random Variables 4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n

More information

Assessing Dependence in Extreme Values

Assessing Dependence in Extreme Values 02/09/2016 1 Motivation Motivation 2 Comparison 3 Asymptotic Independence Component-wise Maxima Measures Estimation Limitations 4 Idea Results Motivation Given historical flood levels, how high should

More information

Overview of Extreme Value Analysis (EVA)

Overview of Extreme Value Analysis (EVA) Overview of Extreme Value Analysis (EVA) Brian Reich North Carolina State University July 26, 2016 Rossbypalooza Chicago, IL Brian Reich Overview of Extreme Value Analysis (EVA) 1 / 24 Importance of extremes

More information

PENULTIMATE APPROXIMATIONS FOR WEATHER AND CLIMATE EXTREMES. Rick Katz

PENULTIMATE APPROXIMATIONS FOR WEATHER AND CLIMATE EXTREMES. Rick Katz PENULTIMATE APPROXIMATIONS FOR WEATHER AND CLIMATE EXTREMES Rick Katz Institute for Mathematics Applied to Geosciences National Center for Atmospheric Research Boulder, CO USA Email: rwk@ucar.edu Web site:

More information

ENGR352 Problem Set 02

ENGR352 Problem Set 02 engr352/engr352p02 September 13, 2018) ENGR352 Problem Set 02 Transfer function of an estimator 1. Using Eq. (1.1.4-27) from the text, find the correct value of r ss (the result given in the text is incorrect).

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4.1 (*. Let independent variables X 1,..., X n have U(0, 1 distribution. Show that for every x (0, 1, we have P ( X (1 < x 1 and P ( X (n > x 1 as n. Ex. 4.2 (**. By using induction

More information

Thomas Mikosch: How to model multivariate extremes if one must?

Thomas Mikosch: How to model multivariate extremes if one must? MaPhySto The Danish National Research Foundation: Network in Mathematical Physics and Stochastics Research Report no. 21 October 2004 Thomas Mikosch: How to model multivariate extremes if one must? ISSN

More information

Multivariate Pareto distributions: properties and examples

Multivariate Pareto distributions: properties and examples Multivariate Pareto distributions: properties and examples Ana Ferreira 1, Laurens de Haan 2 1 ISA UTL and CEAUL, Portugal 2 Erasmus Univ Rotterdam and CEAUL EVT2013 Vimeiro, September 8 11 Univariate

More information

Nonlinear time series

Nonlinear time series Based on the book by Fan/Yao: Nonlinear Time Series Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna October 27, 2009 Outline Characteristics of

More information

A THRESHOLD APPROACH FOR PEAKS-OVER-THRESHOLD MODELING USING MAXIMUM PRODUCT OF SPACINGS

A THRESHOLD APPROACH FOR PEAKS-OVER-THRESHOLD MODELING USING MAXIMUM PRODUCT OF SPACINGS Statistica Sinica 20 2010, 1257-1272 A THRESHOLD APPROACH FOR PEAKS-OVER-THRESHOLD MODELING USING MAXIMUM PRODUCT OF SPACINGS Tony Siu Tung Wong and Wai Keung Li The University of Hong Kong Abstract: We

More information

The sample autocorrelations of financial time series models

The sample autocorrelations of financial time series models The sample autocorrelations of financial time series models Richard A. Davis 1 Colorado State University Thomas Mikosch University of Groningen and EURANDOM Abstract In this chapter we review some of the

More information

AN ASYMPTOTICALLY UNBIASED MOMENT ESTIMATOR OF A NEGATIVE EXTREME VALUE INDEX. Departamento de Matemática. Abstract

AN ASYMPTOTICALLY UNBIASED MOMENT ESTIMATOR OF A NEGATIVE EXTREME VALUE INDEX. Departamento de Matemática. Abstract AN ASYMPTOTICALLY UNBIASED ENT ESTIMATOR OF A NEGATIVE EXTREME VALUE INDEX Frederico Caeiro Departamento de Matemática Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa 2829 516 Caparica,

More information

Models and estimation.

Models and estimation. Bivariate generalized Pareto distribution practice: Models and estimation. Eötvös Loránd University, Budapest, Hungary 7 June 2011, ASMDA Conference, Rome, Italy Problem How can we properly estimate the

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Estimation of risk measures for extreme pluviometrical measurements

Estimation of risk measures for extreme pluviometrical measurements Estimation of risk measures for extreme pluviometrical measurements by Jonathan EL METHNI in collaboration with Laurent GARDES & Stéphane GIRARD 26th Annual Conference of The International Environmetrics

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

The high order moments method in endpoint estimation: an overview

The high order moments method in endpoint estimation: an overview 1/ 33 The high order moments method in endpoint estimation: an overview Gilles STUPFLER (Aix Marseille Université) Joint work with Stéphane GIRARD (INRIA Rhône-Alpes) and Armelle GUILLOU (Université de

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Classical Extreme Value Theory - An Introduction

Classical Extreme Value Theory - An Introduction Chapter 1 Classical Extreme Value Theory - An Introduction 1.1 Introduction Asymptotic theory of functions of random variables plays a very important role in modern statistics. The objective of the asymptotic

More information

Quantitative Modeling of Operational Risk: Between g-and-h and EVT

Quantitative Modeling of Operational Risk: Between g-and-h and EVT : Between g-and-h and EVT Paul Embrechts Matthias Degen Dominik Lambrigger ETH Zurich (www.math.ethz.ch/ embrechts) Outline Basel II LDA g-and-h Aggregation Conclusion and References What is Basel II?

More information

Multivariate Heavy Tails, Asymptotic Independence and Beyond

Multivariate Heavy Tails, Asymptotic Independence and Beyond Multivariate Heavy Tails, endence and Beyond Sidney Resnick School of Operations Research and Industrial Engineering Rhodes Hall Cornell University Ithaca NY 14853 USA http://www.orie.cornell.edu/ sid

More information