Refining the Central Limit Theorem Approximation via Extreme Value Theory


Ulrich K. Müller
Economics Department, Princeton University
February 2018

Abstract: We suggest approximating the distribution of the sum of independent and identically distributed random variables with a Pareto-like tail by combining extreme value approximations for the largest summands with a normal approximation for the sum of the smaller summands. If the tail is well approximated by a Pareto density, then this new approximation has substantially smaller error rates compared to the usual normal approximation for underlying distributions with finite variance and less than three moments. It can also provide an accurate approximation for some infinite variance distributions.

AMS 2000 subject classification: 60F05, 60G70, 60E07
Key words and phrases: Regular variation, rates of convergence

1 Introduction

Consider approximations to the distribution of the sum $S_n=\sum_{i=1}^n X_i$ of independent mean-zero random variables with distribution function $F$. If $\sigma_0^2=\int x^2\,dF(x)$ exists, then $n^{-1/2}S_n$ is asymptotically normal by the central limit theorem. The quality of this approximation is poor if $\max_i X_i$ is not much smaller than $n^{1/2}$, since then a single non-normal random variable has non-negligible influence on $n^{-1/2}S_n$. Extreme value theory provides large sample approximations to the behavior of the largest observations, suggesting that it may be fruitfully employed in the derivation of better approximations to the distribution of $S_n$.

For simplicity, consider the case where $F$ has a light left tail and a heavy right tail. Specifically, assume $\int_{-\infty}^0|x|^3\,dF(x)<\infty$ and
$$\lim_{x\to\infty}\frac{1-F(x)}{(x/\sigma)^{-1/\alpha}}=1\qquad(1)$$
for $1/3<\alpha<1$, so that the right tail of $F$ is approximately Pareto with shape parameter $1/\alpha$ and scale parameter $\sigma$. Let $X_{1:n}\ge X_{2:n}\ge\cdots\ge X_{n:n}$ be the order statistics. For a given sequence $k=k_n\ge 1$, split $S_n$ into two pieces
$$S_n=\sum_{i=k+1}^n X_{i:n}+\sum_{i=1}^k X_{i:n}.\qquad(2)$$
Note that conditional on the $k$-th order statistic $X_{k:n}=x$, $\sum_{i=k+1}^n X_{i:n}$ has the same distribution as $\sum_{i=1}^{n-k}X_i^x$, where $X_i^x$ are i.i.d. from the truncated distribution $F_x$ with $F_x(u)=F(u)/F(x)$ for $u\le x$ and $F_x(u)=1$ otherwise. Let $\mu(x)$ and $\sigma^2(x)$ be the mean and variance of $X_i^x$. Since $F_x$ is less skewed than $F$, one would expect the distributional approximation (denoted by $\approx$) of the central limit theorem,
$$\sum_{i=k+1}^n X_{i:n}\approx(n-k)\mu(X_{k:n})+(n-k)^{1/2}\sigma(X_{k:n})Z\quad\text{for }Z\sim\mathcal N(0,1),\qquad(3)$$
to be relatively accurate. At the same time, extreme value theory implies that under (1),
$$\sum_{i=1}^k X_{i:n}\approx\sigma n^{\alpha}\sum_{i=1}^k\Gamma_i^{-\alpha}\quad\text{for }\Gamma_i=\sum_{j=1}^i E_j,\ E_j\text{ i.i.d. standard exponential.}\qquad(4)$$
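To make the split (2) and the extreme value approximation (4) concrete, here is a minimal Python sketch, with an exact mean-centered Pareto tail and arbitrary illustrative values of $n$, $k$, $\alpha$ and $\sigma$ (my own choices for the example, not taken from the paper):

```python
import math
import random

random.seed(0)

alpha, sigma, n, k = 0.4, 1.0, 10_000, 10

def pareto_draw():
    # X = sigma * U**(-alpha) is Pareto with shape 1/alpha and scale sigma;
    # subtracting the mean sigma/(1 - alpha) centers it at zero.
    u = random.random()
    return sigma * u ** (-alpha) - sigma / (1.0 - alpha)

# Order statistics X_{1:n} >= ... >= X_{n:n} and the split (2).
x = sorted((pareto_draw() for _ in range(n)), reverse=True)
s_n = sum(x)
top, rest = sum(x[:k]), sum(x[k:])  # k largest summands, and the remainder

# Extreme value approximation (4): X_{i:n} is approximately
# sigma * n**alpha * Gamma_i**(-alpha), with Gamma_i cumulative sums of
# i.i.d. standard exponentials.
e = [random.expovariate(1.0) for _ in range(k)]
gammas = [sum(e[:i + 1]) for i in range(k)]
ev_top = sigma * n ** alpha * sum(g ** (-alpha) for g in gammas)

print(top + rest - s_n)  # the two pieces recombine to S_n, up to roundoff
print(top, ev_top)       # same order of magnitude, n**alpha is about 40 here
```

The first print is zero up to floating point roundoff; the second pair agrees only in distribution, since the $\Gamma_i$ are drawn independently of the sample.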

Combining (3) and (4) suggests
$$S_n\approx(n-k)\mu(\sigma n^{\alpha}\Gamma_k^{-\alpha})+(n-k)^{1/2}\sigma(\sigma n^{\alpha}\Gamma_k^{-\alpha})Z+\sigma n^{\alpha}\sum_{i=1}^k\Gamma_i^{-\alpha}\qquad(5)$$
with $Z$ independent of $(\Gamma_i)_{i=1}^k$. If $\alpha<1/2$, the approximate Pareto tail (1) and $E[X_1]=0$ imply
$$\mu(x)\approx-\frac{\sigma^{1/\alpha}}{1-\alpha}x^{1-1/\alpha}\quad\text{and}\quad\sigma^2(x)\approx\sigma_0^2-\frac{\sigma^{1/\alpha}}{1-2\alpha}x^{2-1/\alpha}$$
for large $x$. From $(n-k)/n\to1$, this further yields
$$S_n\approx n^{\alpha}\sigma\left(-\frac{\Gamma_k^{1-\alpha}}{1-\alpha}+\Big(\frac{\sigma_0^2 n^{1-2\alpha}}{\sigma^2}-\frac{\Gamma_k^{1-2\alpha}}{1-2\alpha}\Big)_+^{1/2}Z+\sum_{i=1}^k\Gamma_i^{-\alpha}\right)\qquad(6)$$
with $(y)_+=\max(y,0)$, which depends on $F$ only through the unconditional variance $\sigma_0^2$ and the two tail parameters $(\alpha,\sigma)$. Note that $E[\Gamma_i^{-\alpha}]=\Gamma(i-\alpha)/\Gamma(i)$ and $E[\Gamma_k^{1-\alpha}]=\Gamma(k+1-\alpha)/\Gamma(k)=(1-\alpha)\sum_{i=1}^k\Gamma(i-\alpha)/\Gamma(i)$, so the right-hand side of (6) is the sum of a mean-zero right skewed random variable and a (dependent) random-scale mean-zero normal variable.

Theorem 1 below provides an upper bound on the convergence rate of the error in the approximation (6). The proof combines the Berry-Esseen bound for the central limit theorem approximation in (3) and the rate result in Corollary 5.5.5 of Reiss (1989) for the extreme value approximation in (4). If the tail of $F$ is such that the approximation in (4) is accurate, then for both fixed and diverging $k$ the error in (6) converges to zero faster than the error in the usual mean-zero normal approximation. The approximation (6) thus helps illuminate the nature and origin of the leading error terms in the first order normal approximation, as derived in Chapter 2 of Hall (1982), for such $F$. We also provide a characterization of the bound minimizing choice of $k$.

If $\alpha>1/2$, then the distribution of $n^{-\alpha}S_n$ converges to a one-sided stable law with index $1/\alpha$. An elegant argument by LePage, Woodroofe, and Zinn (1981) shows that this limiting law can be written as $\sigma\sum_{i=1}^{\infty}(\Gamma_i^{-\alpha}-E[\Gamma_i^{-\alpha}])$. The approximation (5) thus remains potentially accurate under (1) also for infinite variance distributions. To obtain a further approximation akin to (6), note that (1) implies
$$\sigma^2(x)\approx\sigma^2(\bar x)+\frac{\sigma^{1/\alpha}}{\alpha}\int_{\bar x}^{x}u^{1-1/\alpha}\,du$$
for large $\bar x\le x$. Let $\bar x_n=\sigma(n/k)^{\alpha}$. Then
$$S_n\approx n^{\alpha}\sigma\left(-\frac{\Gamma_k^{1-\alpha}}{1-\alpha}+\Big(\frac{\sigma^2(\bar x_n)n^{1-2\alpha}}{\sigma^2}+\int_{\Gamma_k}^{k}t^{-2\alpha}\,dt\Big)_+^{1/2}Z+\sum_{i=1}^k\Gamma_i^{-\alpha}\right)\qquad(7)$$
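The two moment identities quoted after (6) can be checked mechanically. The following stdlib-Python sketch verifies $E[\Gamma_k^{1-\alpha}]=\Gamma(k+1-\alpha)/\Gamma(k)=(1-\alpha)\sum_{i=1}^k\Gamma(i-\alpha)/\Gamma(i)$, and hence that the skewed part of (6) is mean-zero (the parameter grid is an arbitrary choice):

```python
import math

for alpha in (0.35, 0.40, 0.45):
    for k in (1, 2, 5, 10):
        # E[Gamma_i**(-alpha)] = Gamma(i - alpha) / Gamma(i) for Gamma_i ~ Gamma(i, 1)
        e_neg = [math.gamma(i - alpha) / math.gamma(i) for i in range(1, k + 1)]
        # E[Gamma_k**(1 - alpha)] computed two ways
        lhs = math.gamma(k + 1 - alpha) / math.gamma(k)
        rhs = (1 - alpha) * sum(e_neg)
        assert math.isclose(lhs, rhs, rel_tol=1e-12)
        # mean of the skewed part of (6):
        # -(1 - alpha)**-1 E[Gamma_k**(1 - alpha)] + sum_i E[Gamma_i**(-alpha)] = 0
        assert abs(-lhs / (1 - alpha) + sum(e_neg)) < 1e-10
print("identities hold")
```

The normal part of (6) is mean-zero by the independence of $Z$, so these identities imply the mean-zero property of the whole right-hand side.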

which depends on $F$ only through the tail parameters $(\alpha,\sigma)$ and the sequence of truncated variances $\sigma^2(\bar x_n)$. The approximation (7) could also be applied to the case $\alpha<1/2$, so that one obtains a unifying approximation for values of $\alpha$ both smaller and larger than $1/2$. Indeed, for a mean-centered Pareto distribution of index $1/\alpha$, the results below imply that for a suitable choice of $k$, this approximation has an error that converges to zero much faster than the error from the first order approximation via the normal or non-normal stable limit for $\alpha$ close to $1/2$. The approach here thus also sheds light on the nature of the leading error terms of the non-normal stable limit, such as those derived by Christoph and Wolf (1992).

For $\alpha>1/2$ the idea of splitting up $S_n$ as in (2) and jointly analyzing the asymptotic behavior of the pieces is already pursued in Csörgő, Haeusler, and Mason (1988). The contribution here is to derive error rates for the resulting approximation to the distribution of the sum, especially for $1/3<\alpha<1/2$, and to develop the additional approximation of the truncated mean and variance induced by the approximate Pareto tail.

The next section formalizes these arguments and discusses various forms of writing the variance term, as well as the approximation for the case where both tails are heavy. Section 3 contains the proofs.

2 Assumptions and Main Results

The following condition imposes the right tail of $F$ to be in the $\delta$-neighborhood of the Pareto distribution with index $1/\alpha$, as defined in Chapter 2 of Falk, Hüsler, and Reiss (2004).

Condition 1. For some $\delta,x_0>0$ and $1/3<\alpha<1$, $F(x)$ admits a density $f(x)$ for all $x>x_0$ of the form
$$f(x)=(\alpha\sigma)^{-1}(x/\sigma)^{-1/\alpha-1}(1+\eta(x))\quad\text{with }|\eta(x)|\le Kx^{-\delta/\alpha}\text{ uniformly in }x>x_0.$$

As discussed in Falk, Hüsler, and Reiss (2004), Condition 1 can be motivated by considering the remainder in the von Mises condition for extreme value theory. It is also closely related to the assumption that the tail of $F$ is second order regularly varying, as studied by de Haan and Stadtmüller (1996) and de Haan and Resnick (1996).
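For an exact mean-centered Pareto tail (the benchmark case of Condition 1), the truncated mean has a closed form, and the approximation $\mu(x)\approx-\sigma^{1/\alpha}x^{1-1/\alpha}/(1-\alpha)$ from the introduction can be checked directly. The closed-form expressions below are derived for this special case and are assumptions of the sketch, not formulas from the paper:

```python
alpha, sigma = 0.4, 1.0
m = sigma / (1 - alpha)  # mean of the uncentered Pareto variable

def truncated_mean(y):
    """E[Y | Y <= y] for Y = X - m, with X Pareto (shape 1/alpha, scale sigma)."""
    x = y + m                              # threshold on the uncentered scale
    cdf = 1 - (x / sigma) ** (-1 / alpha)  # P(Y <= y)
    # closed form for E[(X - m) * 1{X <= x}] for the exact Pareto density
    partial = sigma ** (1 / alpha) * x ** (-1 / alpha) * (m - x / (1 - alpha))
    return partial / cdf

def approx_mean(y):
    # Pareto-tail approximation of the truncated mean used for (6)
    return -sigma ** (1 / alpha) * y ** (1 - 1 / alpha) / (1 - alpha)

ratios = [truncated_mean(y) / approx_mean(y) for y in (10.0, 100.0, 1000.0)]
print(ratios)  # tends to 1 as the truncation point y grows
```

The relative error shrinks as the truncation point grows, consistent with the error bounds of Lemma 2 below.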
Many heavy-tailed distributions satisfy Condition 1: for the right tail of a student-t distribution with $\nu$ degrees of freedom, $\alpha=1/\nu$ and $\delta=2/\nu$; for the tail of a Fréchet or generalized extreme value distribution with tail index $1/\alpha$, $\delta=1$; and for an exact Pareto tail, $\delta$ may be chosen arbitrarily large. In general, shifts of the distribution affect $\delta$; for instance, a mean-centered Pareto distribution satisfies Condition 1 only for $\delta\le\alpha$. See Remark 4 below.

We write $C$ for a generic positive constant that does not depend on $n$ or $k$, not necessarily the same in each instance it is used.

Theorem 1. Under Condition 1,
(a) for $1/3<\alpha<1/2$,
$$\sup_x\Big|P(n^{-1/2}S_n\le x)-P\Big(n^{\alpha-1/2}\sigma\Big(-\frac{\Gamma_k^{1-\alpha}}{1-\alpha}+\Big(\frac{\sigma_0^2 n^{1-2\alpha}}{\sigma^2}-\frac{\Gamma_k^{1-2\alpha}}{1-2\alpha}\Big)_+^{1/2}Z+\sum_{i=1}^k\Gamma_i^{-\alpha}\Big)\le x\Big)\Big|\le C\rho_{n,k};$$
(b) for $1/3<\alpha<1$, with $\bar x_n=\sigma(n/k)^{\alpha}$, $c_n=(n\log n)^{1/2}$ for $\alpha=1/2$ and $c_n=n^{\max(\alpha,1/2)}$ otherwise,
$$\sup_x\Big|P(c_n^{-1}S_n\le x)-P\Big(c_n^{-1}n^{\alpha}\sigma\Big(-\frac{\Gamma_k^{1-\alpha}}{1-\alpha}+\Big(\frac{\sigma^2(\bar x_n)n^{1-2\alpha}}{\sigma^2}+\int_{\Gamma_k}^{k}t^{-2\alpha}\,dt\Big)_+^{1/2}Z+\sum_{i=1}^k\Gamma_i^{-\alpha}\Big)\le x\Big)\Big|\le C\rho_{n,k},$$
where
$$\rho_{n,k}=\begin{cases}n^{-1/2}(n/k)^{3\alpha-1}+k^{1/2}(k/n)^{\delta}&\text{for }1/3<\alpha\le1/2,\\ k^{-1/2}+k^{1/2}(k/n)^{\delta}&\text{for }1/2<\alpha<1.\end{cases}$$

It is straightforward to characterize the rate for $k$ which minimizes the bound $\rho_{n,k}$. For two positive sequences $a_n,b_n$, write $a_n\asymp b_n$ if $0<\liminf a_n/b_n\le\limsup a_n/b_n<\infty$.

Lemma 1. Let $k_n\asymp n^{\theta_{\alpha,\delta}}$ with
$$\theta_{\alpha,\delta}=\begin{cases}\max\Big(0,\dfrac{6\alpha+2\delta-3}{6\alpha+2\delta-1}\Big)&\text{for }1/3<\alpha\le1/2,\\[4pt] \dfrac{\delta}{1+\delta}&\text{for }1/2<\alpha<1.\end{cases}$$

Figure 1: Error Convergence Rates of Refined Approximation

Then $\min_k\rho_{n,k}\asymp\rho_{n,k_n}\asymp n^{-r_{\alpha,\delta}}$ with
$$r_{\alpha,\delta}=\begin{cases}\delta&\text{for }\delta\le3(1/2-\alpha),\\[2pt]\dfrac{\delta+3/2-3\alpha}{2\delta+6\alpha-1}&\text{for }\delta>3(1/2-\alpha)\end{cases}$$
for $1/3<\alpha\le1/2$, and $r_{\alpha,\delta}=\dfrac{\delta}{2(1+\delta)}$ for $1/2<\alpha<1$.

Remarks. 1. For $1/3<\alpha<1/2$, Hall (1979) shows that under Condition 1, the error in the usual normal approximation to the distribution of $S_n$ satisfies $\sup_x|P(n^{-1/2}S_n\le x)-P(\sigma_0Z\le x)|\asymp n^{1-1/(2\alpha)}$, so convergence is very slow for $\alpha$ close to $1/2$. For $\alpha=1/2$, Theorems 3 and 4 in Hall (1980) imply that under Condition 1, $(n\log n)^{-1/2}S_n$ converges to a normal distribution at a logarithmic rate. For any $\delta>0$, the new approximation with optimal choice of $k$ yields a better rate for $\alpha$ sufficiently close to $1/2$, and for $\delta$ sufficiently large, the rate is at least as fast as $n^{-1/3}$ for all $1/3<\alpha<1/2$. Thus, if the tail of $F$ is sufficiently close to being Pareto in the sense of Condition 1, then the new approximations can provide dramatic improvements over the normal approximation. Even keeping $k$ fixed improves over the benchmark rate $n^{1-1/(2\alpha)}$ as long as $\delta>1/(2\alpha)-1$ for $1/3<\alpha<1/2$. At the same time, if $\delta<1/2$, then $n^{-r_{\alpha,\delta}}$ is larger than $n^{1-1/(2\alpha)}$ for some $\alpha$ sufficiently close to $1/3$, so the new approximation is potentially worse than the usual normal approximation (or, equivalently, the optimal choice of $k$ then is $k=0$).

For $1/2<\alpha<1$ and under Condition 1, $\sup_x|P(n^{-\alpha}S_n\le x)-P(\sigma\sum_{i=1}^{\infty}(\Gamma_i^{-\alpha}-E[\Gamma_i^{-\alpha}])\le x)|=O(n^{1-2\alpha})$ by Theorem 1 of Hall (1981), and his Theorem 2 shows this rate to be sharp under a suitably strengthened version of Condition 1. More specifically, for a mean-centered Pareto distribution, the rate is exactly $n^{1-2\alpha}$ (cf. Christoph and Wolf (1992), Example 4.25), which, for any $\delta>0$, is slower than $n^{-r_{\alpha,\delta}}$ for $\alpha$ sufficiently close to $1/2$. Figure 1 plots some of these rates.

2. An alternative approximation is obtained by replacing the term $\sigma_0^2\sigma^{-2}n^{1-2\alpha}$ in the positive part function in parts (a) and (b) of Theorem 1 by $\sigma^{-2}n^{1-2\alpha}\sigma^2(\sigma(n/\Gamma_k)^{\alpha})$, with an approximation error that is still bounded by $C\rho_{n,k}$. Substitution of the term $(\sigma_0^2\sigma^{-2}n^{1-2\alpha}-(1-2\alpha)^{-1}\Gamma_k^{1-2\alpha})_+^{1/2}$ in part (a) of Theorem 1 by $(\sigma_0^2\sigma^{-2}n^{1-2\alpha}-(1-2\alpha)^{-1}k^{1-2\alpha})_+^{1/2}$ (or dropping the integral in part (b) for $1/3<\alpha<1/2$) induces an additional error of order $(k/n)^{1-2\alpha}k^{-1/2}$. In general, this worsens the bound, although even with this further approximation, the rate can still be better than the baseline rate of $n^{1-1/(2\alpha)}$. For $1/2<\alpha<1$, dropping the integral in part (b) induces an additional error of order $k^{-1/2}$, so this simpler approximation still has an error no larger than $C\rho_{n,k}$.

3. Consider the case where both tails of $F$ are approximately Pareto, that is, Condition 1 holds for the right tail with parameters $(\alpha_r,\sigma_r,\delta_r)$, and for some $x_0>0$, for all $x>x_0$,
$$f(-x)=(\alpha_l\sigma_l)^{-1}(x/\sigma_l)^{-1/\alpha_l-1}(1+\eta_l(x))\quad\text{with }|\eta_l(x)|\le Kx^{-\delta_l/\alpha_l}.$$
Proceeding as in the introduction then suggests
$$S_n\approx\frac{\sigma_ln^{\alpha_l}\Upsilon_k^{1-\alpha_l}}{1-\alpha_l}-\frac{\sigma_rn^{\alpha_r}\Gamma_k^{1-\alpha_r}}{1-\alpha_r}+n^{1/2}\sigma(\sigma_ln^{\alpha_l}\Upsilon_k^{-\alpha_l},\sigma_rn^{\alpha_r}\Gamma_k^{-\alpha_r})Z+\sigma_rn^{\alpha_r}\sum_{i=1}^k\Gamma_i^{-\alpha_r}-\sigma_ln^{\alpha_l}\sum_{i=1}^k\Upsilon_i^{-\alpha_l}$$
with $(\Upsilon_i)_{i=1}^k$ an independent copy of $(\Gamma_i)_{i=1}^k$ and $\sigma^2(x_l,x_r)$ the variance of $X_1$ conditional on $-x_l\le X_1\le x_r$. If $1/3<\alpha_l,\alpha_r<1$, then arguments analogous to the proof of Theorem 1 show that the error of this approximation is bounded by an expression of the form $C(\rho_{n,k}(\alpha_l,\delta_l)+\rho_{n,k}(\alpha_r,\delta_r))$, and the same form of the bound is obtained by replacing $\sigma^2(x_l,x_r)$ with
$$\bar\sigma^2(x_l,x_r)=\sigma^2(\bar x_{l,n},\bar x_{r,n})+\frac{\sigma_r^{1/\alpha_r}}{\alpha_r}\int_{\bar x_{r,n}}^{x_r}u^{1-1/\alpha_r}\,du+\frac{\sigma_l^{1/\alpha_l}}{\alpha_l}\int_{\bar x_{l,n}}^{x_l}u^{1-1/\alpha_l}\,du$$
for $\bar x_{r,n}=\sigma_r(n/k)^{\alpha_r}$ and $\bar x_{l,n}=\sigma_l(n/k)^{\alpha_l}$ (and the integrals may be dropped for $1/2<\alpha_l,\alpha_r<1$, see the preceding remark). If $\alpha_r=\max(\alpha_l,\alpha_r)>1/2$ and $\alpha_l\ne\alpha_r$, then the first order approximation to the distribution of $S_n$ is a one-sided stable law that does not depend on the smaller tail index. In contrast, the approximation above reflects the impact of both heavy tails, and in general, ignoring the relatively lighter tail leads to a worse bound.
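The approximating law in Theorem 1(a), i.e. the right-hand side of (6), is easy to simulate. The sketch below is one reading of that formula with arbitrary illustrative parameters ($\sigma_0$, $\sigma$, $\alpha$, $n$, $k$ are my choices, and the sampler is an interpretation of (6), not code from the paper):

```python
import math
import random

random.seed(1)

alpha, sigma, sigma0, n, k = 0.4, 1.0, 2.0, 10_000, 5

def draw_rhs():
    """One draw from the right-hand side of (6)."""
    e = [random.expovariate(1.0) for _ in range(k)]
    gammas = [sum(e[:i + 1]) for i in range(k)]
    gk = gammas[-1]
    # mean-zero right-skewed part
    skew = -gk ** (1 - alpha) / (1 - alpha) + sum(g ** (-alpha) for g in gammas)
    # random scale of the normal part, clipped at zero as in (6)
    scale2 = sigma0 ** 2 / sigma ** 2 * n ** (1 - 2 * alpha) \
        - gk ** (1 - 2 * alpha) / (1 - 2 * alpha)
    z = random.gauss(0.0, 1.0)
    return n ** alpha * sigma * (skew + math.sqrt(max(scale2, 0.0)) * z)

draws = [draw_rhs() for _ in range(20_000)]
print(sum(draws) / len(draws))  # near zero relative to the spread of the draws
```

Rescaling the draws by $n^{-1/2}$ gives the object compared against $n^{-1/2}S_n$ in Theorem 1(a).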

4. Suppose the right tail of $F$ is well approximated by a shifted Pareto distribution, that is, for some $m\in\mathbb R$ and $\delta_1,x_1>0$,
$$f(x)=(\alpha\sigma)^{-1}((x-m)/\sigma)^{-1/\alpha-1}(1+\eta_1(x))\quad\text{for all }x>x_1\text{ with }|\eta_1(x)|\le Kx^{-\delta_1/\alpha}.$$
This implies that $F$ satisfies Condition 1, but only for $\delta=\min(\delta_1,\alpha)$. Let $F_0(x)=F(x+m)$ and $\mu_0(x)=\int_{-\infty}^xu\,dF_0(u)/F_0(x)$, so that $\mu(x)=m+\mu_0(x-m)$. Thus, proceeding as for (6) yields $(X_{i:n})_{i=1}^k\approx(m+\sigma n^{\alpha}\Gamma_i^{-\alpha})_{i=1}^k$ and
$$S_n\approx km+n^{\alpha}\sigma\Big(-\frac{\Gamma_k^{1-\alpha}}{1-\alpha}+\big(\sigma^{-2}n^{1-2\alpha}\sigma^2(m+\sigma n^{\alpha}\Gamma_k^{-\alpha})\big)_+^{1/2}Z+\sum_{i=1}^k\Gamma_i^{-\alpha}\Big).\qquad(8)$$
Straightforward modifications of the proof of Theorem 1 show that the approximation error in (8) is bounded by $C\rho_{n,k}$ with $\delta_1$ in place of $\delta$, and this form of the bound also applies if $\sigma^2(x)$ is further approximated by $\sigma^2(m+\bar x_n)+(\sigma^{1/\alpha}/\alpha)\int_{\bar x_n}^{x-m}u^{1-1/\alpha}\,du$ for $\bar x_n=\sigma(n/k)^{\alpha}$. So, for instance, if $F$ is mean-centered Pareto with $1/3<\alpha<1$, then $\delta_1$ may be chosen arbitrarily large, and the approximation (8) with $k_n$ of Lemma 1 yields a substantially better bound on the convergence rate compared to the original approximation (7) with a bound of the form $C\rho_{n,k}$. The cost of this further refinement, however, is the introduction of a tail location parameter $m$ in addition to the tail scale and tail shape parameters $(\sigma,\alpha)$.

3 Proofs

Let $W_n=(X_{1:n},X_{2:n},\dots,X_{k:n})$. The proof of Theorem 1 relies heavily on Corollary 5.5.5 of Reiss (1989) (also see Theorem 2.2.4 of Falk, Hüsler, and Reiss (2004)), which implies that under Condition 1,
$$\sup_B\big|P(W_n\in B)-P((\sigma n^{\alpha}\Gamma_1^{-\alpha},\dots,\sigma n^{\alpha}\Gamma_k^{-\alpha})\in B)\big|\le C(k^{1/2}(k/n)^{\delta}+k/n)\qquad(9)$$
where the supremum is over Borel sets $B$ in $\mathbb R^k$. Without loss of generality, assume $x_0$ is large enough that $1-F(x_0)\le1/2$ and $\sigma^2(x_0)>0$. We first prove two elementary lemmas. Let $C$ denote a generic positive constant that does not depend on $n$ or $k$, not necessarily the same in each instance it is used.

Lemma 2. Under Condition 1, for all $x>x_0$,
(a) for $1/3<\alpha<1$, $|\mu(x)|\le Cx^{1-1/\alpha}$ and $\big|\mu(x)+(1-\alpha)^{-1}\sigma^{1/\alpha}x^{1-1/\alpha}\big|\le Cx^{1-(1+\delta)/\alpha}$;

(b) for $1/3<\alpha<1/2$, $\big|\sigma^2(x)-\sigma_0^2+(1-2\alpha)^{-1}\sigma^{1/\alpha}x^{2-1/\alpha}\big|\le C(x^{2-(1+\delta)/\alpha}+x^{2-2/\alpha})$;
(c) for $1/2<\alpha<1$, $C^{-1}x^{2-1/\alpha}\le\sigma^2(x)\le Cx^{2-1/\alpha}$ and $\big|\sigma^2(x)-\sigma^2(\bar x)-(\sigma^{1/\alpha}/\alpha)\int_{\bar x}^xu^{1-1/\alpha}\,du\big|\le C(\bar x^{2-(1+\delta)/\alpha}+\bar x^{2-2/\alpha})$ for $x_0<\bar x\le x$;
(d) for $\alpha=1/2$, $C^{-1}\log x\le\sigma^2(x)\le C\log x$, and $\big|\sigma^2(x)-\sigma^2(\bar x)-(\sigma^{1/\alpha}/\alpha)\int_{\bar x}^xu^{1-1/\alpha}\,du\big|\le C(\bar x^{-2\delta}\log x+\bar x^{-2}\log x)$ for $x_0<\bar x\le x$;
(e) for $1/3<\alpha<1$, $\int_{-\infty}^x|u|^3\,dF(u)\le Cx^{3-1/\alpha}$.

Proof. (a) Follows from $\mu(x)F(x)=-\int_x^{\infty}u\,dF(u)$ and, under Condition 1, $\big|\int_x^{\infty}u\,dF(u)-(1-\alpha)^{-1}\sigma^{1/\alpha}x^{1-1/\alpha}\big|\le C\int_x^{\infty}u^{-1/\alpha}u^{-\delta/\alpha}\,du\le Cx^{1-(1+\delta)/\alpha}$ and $1-F(x)\le Cx^{-1/\alpha}$.
(b),(c),(d) Since $\sigma^2(x)=F(x)^{-1}\big(\sigma_0^2-\int_x^{\infty}u^2\,dF(u)\big)-\mu(x)^2$ for $\alpha<1/2$ and $\sigma^2(x)=F(x)^{-1}\int_{-\infty}^xu^2\,dF(u)-\mu(x)^2$ for $\alpha\ge1/2$, the results follow from $1-F(x)\le Cx^{-1/\alpha}$, $\big|\int_{\bar x}^xu^2\,dF(u)-(\sigma^{1/\alpha}/\alpha)\int_{\bar x}^xu^{1-1/\alpha}\,du\big|\le C\bar x^{2-(1+\delta)/\alpha}$ via Condition 1, and the result in part (a).
(e) Follows from $\int_{-\infty}^x|u|^3\,dF(u)=\int_{-\infty}^0|u|^3\,dF(u)+\int_0^xu^3\,dF(u)$ and $\int_0^xu^3\,dF(u)\le Cx^{3-1/\alpha}$ by Condition 1.

Lemma 3. Under Condition 1, (a) with $V=\max(X_{k:n},x_0)$, $E[V^c]\le C(n/k)^{\alpha c}$ for all $0<c<k/\alpha$; (b) with $\bar G=\max(\sigma n^{\alpha}\Gamma_k^{-\alpha},x_0)$, $E[\bar G^c]\le C(n/k)^{\alpha c}$ for all $0<c<k/\alpha$.

Proof. (a) Let $W=V/(\sigma(n/k)^{\alpha})$, so that we need to show that $E[W^c]$ is uniformly bounded. We have, for $t\ge1$,
$$P(W>t)=P(X_{k:n}>t\sigma(n/k)^{\alpha})=P(1-F(X_{k:n})<1-F(t\sigma(n/k)^{\alpha}))\le P(U_{k:n}<C(k/n)t^{-1/\alpha})$$
where $U_{k:n}$ is the $k$-th smallest order statistic of $n$ i.i.d. uniform $[0,1]$ variables, and $C$ is such that $1-F(x)\le C(x/\sigma)^{-1/\alpha}$ for all $x>x_0$. By Lemma 3.1.2 of Reiss (1989), $P(U_{k:n}<sk/n)\le(Cs)^k$ for all $s>0$. Thus $P(W>t)\le Ct^{-k/\alpha}$, where the last inequality holds for all $t\ge C$, and the result follows.

(b) Clearly, $E[\bar G^c]\le x_0^c+(\sigma n^{\alpha})^cE[\Gamma_k^{-\alpha c}]$. For $k>\alpha c$, $E[\Gamma_k^{-\alpha c}]=\Gamma(k-\alpha c)/\Gamma(k)\le Ck^{-\alpha c}$, and the result follows.

Proof of Theorem 1. We can assume $\rho_{n,k}\le1$ in the following, since otherwise there is nothing to prove. Let $\bar X=\max(X_{k:n},x_0)$. Lemma 3.1.1 in Reiss (1989) implies that under Condition 1, $P(X_{k:n}\ne\bar X)\le Ck/n$. Write $G_n(x)=P(S_n\le x)$. Assume first $1/3<\alpha<1/2$. We have
$$G_n(x)=E\Big[P\Big(\sum_{i=k+1}^nX_{i:n}\le x-\sum_{i=1}^kX_{i:n}\,\Big|\,W_n\Big)\Big].$$
Note that conditional on $W_n$, the distribution of $\sum_{i=k+1}^nX_{i:n}$ is the same as that of the sum of $n-k$ i.i.d. draws from the truncated distribution $F_{X_{k:n}}$, with mean $\mu(X_{k:n})$ and variance $\sigma^2(X_{k:n})$. The Berry-Esseen bound hence implies
$$\sup_t\Big|E\Big[1\Big[\frac{\sum_{i=k+1}^nX_{i:n}-(n-k)\mu(X_{k:n})}{(n-k)^{1/2}\sigma(X_{k:n})}\le t\Big]\,\Big|\,W_n\Big]-\Phi(t)\Big|\le\frac{C\int|u|^3\,dF_{X_{k:n}}(u)}{(n-k)^{1/2}\sigma^3(X_{k:n})}$$
where $\Phi$ is the standard normal distribution function. Replacing $X_{k:n}$ by $\bar X$, by Lemma 2(e), $\int|u|^3\,dF_{\bar X}(u)\le C\bar X^{3-1/\alpha}$ and $\sigma^3(\bar X)\ge\sigma^3(x_0)>0$ a.s. From Lemma 3(a), $E[\bar X^{3-1/\alpha}]\le C(n/k)^{3\alpha-1}$, so that
$$\sup_x\Big|G_n(x)-E\Phi\Big(\frac{x-\sum_{i=1}^kX_{i:n}-(n-k)\mu(\bar X)}{(n-k)^{1/2}\sigma(\bar X)}\Big)\Big|\le Cn^{-1/2}(n/k)^{3\alpha-1}.$$
From (9), with $\bar G=\max(\sigma n^{\alpha}\Gamma_k^{-\alpha},x_0)$, $\sup_x|G_n(x)-G_n^{(1)}(x)|\le C(n^{-1/2}(n/k)^{3\alpha-1}+k^{1/2}(k/n)^{\delta}+k/n)$, where
$$G_n^{(1)}(x)=E\Phi\Big(\frac{x-\sigma n^{\alpha}\sum_{i=1}^k\Gamma_i^{-\alpha}-(n-k)\mu(\bar G)}{(n-k)^{1/2}\sigma(\bar G)}\Big).$$
Note that by (9), $P(\sigma n^{\alpha}\Gamma_k^{-\alpha}\ne\bar G)\le P(X_{k:n}\ne\bar X)+C(k^{1/2}(k/n)^{\delta}+k/n)$. Now focus on the claim in part (a). By Lemma 2(a) and (b),
$$\big|\mu(\bar G)+(1-\alpha)^{-1}\sigma^{1/\alpha}\bar G^{1-1/\alpha}\big|\le C\bar G^{1-(1+\delta)/\alpha}\quad\text{and}\quad\big|\sigma^2(\bar G)-\sigma_0^2+(1-2\alpha)^{-1}\sigma^{1/\alpha}\bar G^{2-1/\alpha}\big|\le C(\bar G^{2-(1+\delta)/\alpha}+\bar G^{2-2/\alpha})$$
a.s. Thus, exploiting that $\Phi$ and its density are uniformly bounded, and that $0<\sigma(x_0)\le$

$\sigma(\bar G)\le\sigma_0$ a.s., exact first order Taylor expansions and Lemma 3(b) yield
$$\sup_x\Big|G_n^{(1)}(x)-E\Phi\Big(\frac{x-n^{\alpha}\sigma\big(\sum_{i=1}^k\Gamma_i^{-\alpha}-(1-\alpha)^{-1}\bar\Gamma_k^{1-\alpha}\big)}{n^{\alpha}\sigma\big(\sigma_0^2\sigma^{-2}n^{1-2\alpha}-(1-2\alpha)^{-1}\bar\Gamma_k^{1-2\alpha}\big)_+^{1/2}}\Big)\Big|\le C\rho_{n,k}$$
where $\bar\Gamma_k=\min(\Gamma_k,(\sigma n^{\alpha}/x_0)^{1/\alpha})$, so that $P(\bar\Gamma_k\ne\Gamma_k)=P(\sigma n^{\alpha}\Gamma_k^{-\alpha}\ne\bar G)$, and we can replace any $\bar\Gamma_k$ by $\Gamma_k$ in the last expression without changing the form of the right hand side. Thus, by another exact Taylor expansion and an application of the Cauchy-Schwarz inequality, we can replace $(n-k)/n$ by unity at the cost of another error term of the form $C(k/n)^{3/2}$. The result in part (a) now follows after eliminating dominated terms, and the proof of part (b) for $1/3<\alpha<1/2$ follows from the same steps.

So consider $\alpha=1/2$. Let $A_n$ be the event $k/2\le\Gamma_k\le2k$. By Chebyshev's inequality, $P(A_n^c)\le Ck^{-1}$. Conditional on $A_n$, $C^{-1}\log(n/k)\le\sigma^2(\bar G)\le C\log(n/k)$ and $\big|\sigma^2(\bar G)-\sigma^2(\bar x_n)-\sigma^2\int_{\Gamma_k}^kt^{-1}\,dt\big|\le C$ a.s. by Lemma 2(a) and (d). Exact first order Taylor expansions thus yield
$$\sup_x\Big|G_n^{(1)}(x)-E\Phi\Big(\frac{x-n^{1/2}\sigma\big(\sum_{i=1}^k\Gamma_i^{-1/2}-2\Gamma_k^{1/2}\big)}{n^{1/2}\sigma\big(\sigma^2(\bar x_n)\sigma^{-2}+\int_{\Gamma_k}^kt^{-1}\,dt\big)_+^{1/2}}\Big)\Big|\le C\rho_{n,k},$$
and replacing $(n-k)/n$ by unity induces an additional error term of the form $Ck^{-1}$ by the same arguments as employed above (and recalling that $P(A_n^c)\le Ck^{-1}$).

We are left to prove the claim for $1/2<\alpha<1$. Note that the distribution of $\sum_{i=k+1}^nX_{i:n}$ conditional on $W_n$ only depends on $W_n$ through $X_{k:n}$. Let $\Phi_x$ be the conditional distribution function of $\big(\sum_{i=k+1}^nX_{i:n}-(n-k)\mu(x)\big)/\big((n-k)^{1/2}\sigma(x)\big)$ given $X_{k:n}=x$. For future reference, note that by Theorem 1.1 in Goldstein (2010), $\|\Phi_x-\Phi\|_1=\int|\Phi_x(t)-\Phi(t)|\,dt\le C\int|u|^3\,dF_x(u)/\big((n-k)^{1/2}\sigma^3(x)\big)$, so that by Lemma 2(c) and (e), $\|\Phi_x-\Phi\|_1\le Cn^{-1/2}x^{1/(2\alpha)}$ for $x>x_0$. We have
$$G_n(x)=E\Phi_{X_{k:n}}\Big(\frac{x-\sum_{i=1}^kX_{i:n}-(n-k)\mu(X_{k:n})}{(n-k)^{1/2}\sigma(X_{k:n})}\Big)$$

so that by (9), $\sup_x|G_n(x)-G_n^{(2)}(x)|\le C(k^{1/2}(k/n)^{\delta}+k/n)$, where
$$G_n^{(2)}(x)=E\Phi_{\bar G}\Big(\frac{x-\sigma n^{\alpha}\sum_{i=1}^k\Gamma_i^{-\alpha}-(n-k)\mu(\bar G)}{(n-k)^{1/2}\sigma(\bar G)}\Big).$$
Let $U$ be a uniform random variable on the unit interval, independent of $(\Gamma_i)_{i=1}^k$, and let $q_x$ be the quantile function of $\Phi_x$. Then
$$G_n^{(2)}(x)=P\Big((n-k)^{1/2}\sigma(\bar G)q_{\bar G}(U)+(n-k)\mu(\bar G)+\sigma n^{\alpha}\sum_{i=1}^k\Gamma_i^{-\alpha}\le x\Big).$$
Since $\Gamma_1/\Gamma_2,\Gamma_2/\Gamma_3,\dots,\Gamma_{k-1}/\Gamma_k,\Gamma_k$ are independent (cf. Corollary 1.6.11 of Reiss (1989)), the distribution of $(\Gamma_1/\Gamma_2)^{-\alpha}$ conditional on $\Gamma_2,\Gamma_3,\dots,\Gamma_k$ is the same as that conditional on $\Gamma_2$, which by a direct calculation is found to be Pareto with parameter $1/\alpha$. Thus, with $\psi(u)=1[u\ge1](1-u^{-1/\alpha})$,
$$G_n^{(2)}(x)=E\Big[\psi\Big(\frac{x-(n-k)^{1/2}\sigma(\bar G)q_{\bar G}(U)-(n-k)\mu(\bar G)-\sigma n^{\alpha}\sum_{i=2}^k\Gamma_i^{-\alpha}}{\sigma n^{\alpha}\Gamma_2^{-\alpha}}\Big)\Big].$$
Note that for arbitrary $a>0$ and $b\in\mathbb R$, with $g(t)=\psi(at+b)$,
$$E[g(q_x(U))]-E[g(Z)]=\int g(t)\,d(\Phi_x-\Phi)(t)=\int(\Phi(t)-\Phi_x(t))\,dg(t)\le\sup_t|g'(t)|\,\|\Phi_x-\Phi\|_1$$
where the second equality stems from Riemann-Stieltjes integration by parts. Conditional on the event $A_n$ as defined above, $\|\Phi_{\bar G}-\Phi\|_1\le Ck^{-1/2}$, $C^{-1}(n/k)^{2\alpha-1}\le\sigma^2(\bar G)\le C(n/k)^{2\alpha-1}$, $\big|\sigma^2(\bar G)-\sigma^2(\bar x_n)-\sigma^2\int_{\Gamma_k}^kt^{-2\alpha}\,dt\big|\le C$ and $\big|\mu(\bar G)+(1-\alpha)^{-1}\sigma^{1/\alpha}\bar G^{1-1/\alpha}\big|\le C\bar G^{1-(1+\delta)/\alpha}$ a.s. by Lemma 2(a) and (c). Thus, by exact first order Taylor expansions and exploiting that $\psi'$ is uniformly bounded,
$$\sup_x|G_n^{(2)}(x)-G_n^{(3)}(x)|\le C\big(k^{-1/2}+k^{1/2}(k/n)^{\delta}+(k/n)^{3/2}\big)$$

where
$$G_n^{(3)}(x)=E\Big[\psi\Big(\Gamma_2^{\alpha}\Big(\frac{x}{\sigma n^{\alpha}}-\Big(\frac{\sigma^2(\bar x_n)n^{1-2\alpha}}{\sigma^2}+\int_{\Gamma_k}^{k}t^{-2\alpha}\,dt\Big)_+^{1/2}Z-\sum_{i=2}^k\Gamma_i^{-\alpha}+\frac{\Gamma_k^{1-\alpha}}{1-\alpha}\Big)\Big)\Big].$$
As before, replacing $(n-k)/n$ by unity induces another error term of the form $C(k/n)^{3/2}$, and the result follows after eliminating dominated terms.

References

Christoph, G., and W. Wolf (1992): Convergence Theorems with a Stable Limit Law, Mathematical Research. Akademie Verlag, Berlin.

Csörgő, S., E. Haeusler, and D. M. Mason (1988): A probabilistic approach to the asymptotic distribution of sums of independent, identically distributed random variables, Advances in Applied Mathematics, 9(3), 259-333.

de Haan, L., and S. Resnick (1996): Second-order regular variation and rates of convergence in extreme-value theory, The Annals of Probability, 24(1), 97-124.

de Haan, L., and U. Stadtmüller (1996): Generalized regular variation of second order, Journal of the Australian Mathematical Society, 61(3), 381-395.

Falk, M., J. Hüsler, and R.-D. Reiss (2004): Laws of Small Numbers: Extremes and Rare Events. Birkhäuser, Basel.

Goldstein, L. (2010): Bounds on the constant in the mean central limit theorem, The Annals of Probability, 38(4), 1672-1689.

Hall, P. (1979): On the rate of convergence in the central limit theorem for distributions with regularly varying tails, Probability Theory and Related Fields, 49(1), 1-11.

Hall, P. (1980): Characterizing the rate of convergence in the central limit theorem. I, The Annals of Probability, 8(6), 1037-1048.

Hall, P. (1981): Two-sided bounds on the rate of convergence to a stable law, Probability Theory and Related Fields, 57(3), 349-364.

Hall, P. (1982): Rates of Convergence in the Central Limit Theorem. Pitman Publishing, Boston.

LePage, R., M. Woodroofe, and J. Zinn (1981): Convergence to a stable distribution via order statistics, The Annals of Probability, 9(4), 624-632.

Reiss, R.-D. (1989): Approximate Distributions of Order Statistics: With Applications to Nonparametric Statistics. Springer Verlag, New York.