Estimation of the extreme value index and high quantiles under random censoring


Jan Beirlant (1), Emmanuel Delafosse (2) and Armelle Guillou (2)

(1) Katholieke Universiteit Leuven, Department of Mathematics, Celestijnenlaan 200B, 3001 Leuven, Belgium
(2) Université Paris VI, L.S.T.A., Boîte 158, 175 rue du Chevaleret, 75013 Paris

Key words and phrases: Pareto index, extreme quantile, censoring, Kaplan-Meier estimator.

Abstract. In this paper, we consider the estimation of the extreme value index and of extreme quantiles in the presence of random censoring. Since our main motivation is application in insurance, we focus on the Fréchet and Gumbel domains of attraction. Without censoring, the most famous estimator of the Pareto index is the classical Hill (1975) estimator. We propose adaptations of this estimator to the censored case and use them to build extreme quantile estimators. We begin a theoretical study of the asymptotic properties of these estimators. Their finite sample behaviour is illustrated in a small simulation study and in a practical insurance example.
1. Introduction

A data set is called censored when some observations are only known to lie within a restricted range of values and are not otherwise measured. Statistical techniques for analyzing censored data sets are well studied, especially in survival analysis and biostatistics in general, where censoring mechanisms are common. Right censoring, where some results are only known to be at least as large as the reported value, has received particular attention; see for instance Cox and Oakes (1984). That work, however, concerns central characteristics of the underlying distribution. The literature on tail or extreme value analysis for censored data is almost nonexistent. In Reiss and Thomas (1997) (section 6.), Beirlant et al. (1996) (section 2.7) and, for truncated data, Beirlant and Guillou (2001), some estimators of tail indices were proposed without any deeper study of their behaviour. Important problems such as the estimation of extreme quantiles have apparently not been considered before in general.

Data sets with censored extreme data often occur in insurance when reported payments cannot be larger than the maximum payment value of the contract. When the reported payment equals the maximum payment, the real payment can indeed be equal to the maximum or can be censored. The situation where all data above a fixed value are censored is referred to as truncation or type I censoring. This case was considered in Beirlant and Guillou (2001). It can occur when the observations are not the real payments but the payments as a fraction of the sum insured, in which case the truncation level equals 100%.

Here we consider random right censoring. The claim sizes $X$ are possibly censored by the maximum payment $Y$. A maximum payment of a given contract is then considered as a realization of the random variable $Y$. Different situations can now occur, depending on whether the censoring values (the maximum payment values) are observed or not.

To be more specific, let $X_i$, $i \in \mathbb{N}$, be independent and identically distributed (i.i.d.) random variables with common distribution function (df) $F$ and let $Y_i$, $i \in \mathbb{N}$, be a second i.i.d. sequence with df $G$. We only observe

$$Z_i = X_i \wedge Y_i, \qquad \delta_i = 1_{\{X_i \le Y_i\}}, \qquad i \in \mathbb{N}.$$

We denote by $H$ the df of $Z$ and let $\tau_H = \inf\{x : H(x) = 1\}$, the supremum of the support of $H$. We define $H^{1}(z) = \mathbb{P}(Z > z, \delta = 1) = \mathbb{P}(z < X \le Y)$.

Being motivated by actuarial applications, we confine ourselves to the case where sample maxima from $X$ samples are in the domain of attraction of the Fréchet or Gumbel law.
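The observation scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: both $X$ and $Y$ are taken to be strict Pareto variables on $[1, \infty)$ with hypothetical indices `alpha` and `beta`, and the function name `simulate_censored_sample` is made up for this example.

```python
import numpy as np


def simulate_censored_sample(n, alpha, beta, seed=0):
    """Return (Z, delta) for independent strict Pareto X and Y (an assumption).

    X has survival function x**(-alpha) and Y has survival function
    x**(-beta) on [1, inf); only Z = min(X, Y) and delta = 1{X <= Y}
    are returned, mimicking random right censoring.
    """
    rng = np.random.default_rng(seed)
    # Inverse transform: if U is uniform on (0, 1], U**(-1/alpha) is Pareto(alpha).
    x = (1.0 - rng.random(n)) ** (-1.0 / alpha)
    y = (1.0 - rng.random(n)) ** (-1.0 / beta)
    z = np.minimum(x, y)           # observed value
    delta = (x <= y).astype(int)   # 1 if the claim X itself is observed
    return z, delta


z, delta = simulate_censored_sample(n=10_000, alpha=2.0, beta=1.0)
print("proportion of uncensored observations:", delta.mean())
```

Under these Pareto assumptions the probability that an observation is uncensored is `alpha / (alpha + beta)`, so a heavier-tailed censoring variable (smaller `beta`) leads to lighter censoring.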
Restricting attention to the Fréchet and Gumbel domains typically means that we consider polynomially decreasing tails or exponentially decreasing tails with infinite right endpoint. We will consequently consider the following cases:

(i) Observing $(Z, \delta)$, $X$ independent of $Y$, and both $X$ and $Y$ in the domain of attraction of the Fréchet law;
(ii) Observing $(Z, \delta)$, $X$ independent of $Y$, $X$ in the domain of attraction of the Fréchet or the Gumbel law, and $Y$ in the domain of attraction of the Fréchet law.

In order to illustrate the methods presented in this paper, we use a liability insurance example from Frees and Valdez (1998).

2. Estimation techniques

2.1. Observing $(Z, \delta)$, $X$ independent of $Y$, and both $X$ and $Y$ in the domain of attraction of the Fréchet law

Suppose that $F$ is of Pareto type, that is, there exists a positive constant $\alpha$ for which

$$1 - F(x) = x^{-\alpha} \ell_1(x), \tag{1}$$

where $\ell_1$ is a slowly varying function at infinity, satisfying

$$\frac{\ell_1(\lambda x)}{\ell_1(x)} \to 1 \quad \text{when } x \to \infty, \text{ for all } \lambda > 0.$$

In order for the censoring to be not too heavy, it appears natural to assume that the censoring distribution is also heavy tailed:

$$1 - G(x) = x^{-\beta} \ell_2(x), \tag{2}$$

for some $\beta > 0$ and slowly varying $\ell_2$. Assuming that $X$ and $Y$ are independent, so that $1 - H(x) = (1 - F(x))(1 - G(x))$, it now follows that

$$1 - H(x) = x^{-(\alpha+\beta)} \ell(x), \tag{3}$$

with $\ell$ also a slowly varying function at infinity. These conditions can be restated in terms of the tail quantile functions as

$$U_F(x) = x^{1/\alpha} \ell_{1,U}(x), \qquad U_G(x) = x^{1/\beta} \ell_{2,U}(x), \qquad U_H(x) = x^{1/(\alpha+\beta)} \ell_U(x),$$

with $U_F(x) = \inf\{y : F(y) \ge 1 - 1/x\}$, $x > 1$, and $\ell_{1,U}$, $\ell_{2,U}$ and $\ell_U$ again slowly varying functions at infinity.

Our goal is to discuss the estimation of $\gamma := 1/\alpha$ and of extreme quantiles $x_{F,p} := U_F(1/p)$ with $p < 1/n$. This problem has received a lot of attention in the case of no censoring, i.e. when $X_i \le Y_i$ for all $i = 1, \ldots, n$. The most famous estimator of $\gamma$ is Hill's (1975) estimator, given by

$$H_{X,k,n} = \frac{1}{k} \sum_{i=1}^{k} \log X_{n-i+1,n} - \log X_{n-k,n}. \tag{4}$$

Turning to the estimation of high quantiles, the estimator proposed by Weissman (1978) serves as a reference under Pareto-type models without censoring:

$$\hat x_{p,k} = X_{n-k,n} \left( \frac{k+1}{(n+1)p} \right)^{H_{X,k,n}}. \tag{5}$$

In case of random right censoring, the likelihood based on the $N_t$ relative excesses $E_{j,t} = Z_j / t$, $Z_j > t$, is changed into

$$\prod_{j=1}^{N_t} \left( \alpha E_{j,t}^{-\alpha-1} \right)^{\delta_j} \left( E_{j,t}^{-\alpha} \right)^{1-\delta_j},$$

leading to the estimator

$$H^{(c)}_{Z,t} = \frac{\sum_{i=1}^{n} \log(Z_i/t)\, 1_{\{Z_i > t\}}}{\sum_{i=1}^{n} \delta_i\, 1_{\{Z_i > t\}}}, \tag{6}$$

while for the extreme quantile estimator we propose to use

$$\hat x^{(c)}_{p,t} = t \left( \frac{1 - \hat F_n(t)}{p} \right)^{H^{(c)}_{Z,t}}, \tag{7}$$

where $\hat F_n(x)$, $x < \tau_H$, denotes the Kaplan-Meier (1958) product limit estimator of $F(x)$, defined as

$$1 - \hat F_n(x) = \prod_{j=1}^{n} \left[ 1 - \frac{\delta_{j,n}}{n-j+1} \right]^{1_{\{Z_{j,n} \le x\}}},$$

where $Z_{j,n}$ denote the order statistics associated with $Z_1, \ldots, Z_n$ and $\delta_{j,n} := \delta_i$ if and only if $Z_{j,n} = Z_i$. The corresponding tail probability estimator is then given by

$$\hat{\mathbb{P}}^{(c)}(X > x) = \left(1 - \hat F_n(t)\right) \left( \frac{x}{t} \right)^{-1/H^{(c)}_{Z,t}}. \tag{8}$$

When choosing $t = Z_{n-k,n}$, we obtain the estimator

$$H^{(c)}_{Z,k,n} = \frac{\sum_{j=1}^{k} \left( \log Z_{n-j+1,n} - \log Z_{n-k,n} \right)}{\sum_{j=1}^{k} \delta_{n-j+1,n}}, \tag{9}$$

which is the original Hill estimator adapted for right censoring. We will also give another interpretation of this estimator, based on a novel QQ-plot.

2.2. Observing $(Z, \delta)$, $X$ independent of $Y$, $X$ in the domain of attraction of the Fréchet or Gumbel law, and $Y$ in the domain of attraction of the Fréchet law

When considering the extension to the case where $\gamma \ge 0$, again as in the no-censoring case there are mainly two sets of solutions, which originate from two different formulations of the model.

First, the maximum likelihood approach based on peaks over threshold (POT) relies on the results of Balkema and de Haan (1974) and Pickands (1975), stating that the limit distribution of the absolute exceedances over a threshold $t$ when $t \to \infty$ is given by a generalized Pareto distribution (GPD). In the case of censoring, we can easily adapt the likelihood to

$$\prod_{j=1}^{N_t} \left[ f_{GPD}(\tilde E_j) \right]^{\delta_j} \left[ 1 - F_{GPD}(\tilde E_j) \right]^{1-\delta_j}$$

where $\tilde E_j = Z_j - t$ if $Z_j > t$ and $1 - F_{GPD}(x) = \left( 1 + \frac{\gamma}{\sigma} x \right)^{-1/\gamma}$. Then, the maximization of this expression leads to a POT estimator for $\gamma$, which we further denote by $\hat\gamma^{(c)}_{t,ML}$.

Secondly, we can construct a new estimator based on upper order statistics, for instance within the framework of the QQ-plot regression technique. For example, in the case of no censoring, Beirlant et al. (1996) proposed an estimator of a real-valued index based on a generalized quantile plot, which takes over the role of the Pareto quantile plot in this more general setting. More precisely, they proposed to look at the graph with coordinates

$$\left( \log \frac{n+1}{j},\ \log UH_{j,n} \right), \qquad j = 1, \ldots, n-1,$$

with $UH_{j,n} = X_{n-j,n} H_{X,j,n}$. Again this plot becomes ultimately linear for small $j$, with slope approximating $\gamma$. One can then construct several regression based estimators, such as

$$\hat\gamma_{k,UH} = \frac{1}{k} \sum_{j=1}^{k} \log UH_{j,n} - \log UH_{k+1,n}.$$

From the above it appears natural to define a generalization of $\hat\gamma_{k,UH}$ to the censoring case as a slope estimator of the generalized quantile plot adapted for censoring,

$$\left( -\log\left( 1 - \hat F_n(Z_{n-j+1,n}) \right),\ \log UH^{(c)}_{j,n} \right), \qquad j = 1, \ldots, n-1, \tag{10}$$

where $UH^{(c)}_{j,n} = Z_{n-j,n} H^{(c)}_{Z,j,n}$:

$$\hat\gamma^{(c)}_{k,UH} = \frac{\sum_{j=1}^{k} \left( \log UH^{(c)}_{j,n} - \log UH^{(c)}_{k+1,n} \right)}{\sum_{j=1}^{k} \delta_{n-j+1,n}}. \tag{11}$$

Using one of the abovementioned estimators $\hat\gamma^{(c)}_{\cdot,\cdot}$ of $\gamma \ge 0$, we can now propose new estimators for the quantile $x_{F,p}$, in the spirit of the one proposed by Dekkers et al. (1989) in the case of no censoring:

$$\hat x^{(c)}_{p,t,\cdot} = t + \hat\gamma^{(c)}_{\cdot,\cdot}\, t\, \frac{\left( \frac{1 - \hat F_n(t)}{p} \right)^{\hat\gamma^{(c)}_{\cdot,\cdot}} - 1}{\hat\gamma^{(c)}_{\cdot,\cdot}}. \tag{12}$$

Under suitable assumptions, we establish the asymptotic properties of our estimators. We illustrate their behaviour in a small simulation study and in a practical insurance example.
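As a rough numerical illustration of the Section 2.1 estimators, the sketch below implements the censored Hill estimator (9), the Kaplan-Meier product limit estimator and the quantile estimator (7) with $t = Z_{n-k,n}$. It is not the authors' code: the function names are hypothetical, ties among the $Z_i$ are assumed absent, and the demonstration data are strict Pareto samples with made-up parameters `alpha = 2` and `beta = 1`.

```python
import numpy as np


def censored_hill(z, delta, k):
    """Hill estimator adapted for right censoring, as in (9)."""
    order = np.argsort(z)
    zs = np.asarray(z, dtype=float)[order]
    ds = np.asarray(delta)[order]
    n = len(zs)
    top = zs[n - k:]            # Z_{n-k+1,n}, ..., Z_{n,n}
    threshold = zs[n - k - 1]   # Z_{n-k,n}
    n_uncensored = ds[n - k:].sum()
    if n_uncensored == 0:
        raise ValueError("no uncensored observation among the k largest")
    return np.sum(np.log(top) - np.log(threshold)) / n_uncensored


def kaplan_meier_survival(z, delta, x):
    """Product limit estimate of 1 - F(x), assuming no ties among the z."""
    order = np.argsort(z)
    zs = np.asarray(z, dtype=float)[order]
    ds = np.asarray(delta)[order]
    n = len(zs)
    j = np.arange(1, n + 1)
    factors = 1.0 - ds / (n - j + 1.0)   # one factor per order statistic
    return np.prod(factors[zs <= x])


def censored_quantile(z, delta, k, p):
    """Extreme quantile estimator (7) with threshold t = Z_{n-k,n}."""
    t = np.sort(np.asarray(z, dtype=float))[len(z) - k - 1]
    gamma_hat = censored_hill(z, delta, k)
    return t * (kaplan_meier_survival(z, delta, t) / p) ** gamma_hat


# Demonstration on simulated strict Pareto data (an assumption, see above).
rng = np.random.default_rng(1)
n, alpha, beta = 20_000, 2.0, 1.0
x = (1.0 - rng.random(n)) ** (-1.0 / alpha)
y = (1.0 - rng.random(n)) ** (-1.0 / beta)
z, delta = np.minimum(x, y), (x <= y).astype(int)

k, p = 2_000, 0.001
gamma_hat = censored_hill(z, delta, k)
x_hat = censored_quantile(z, delta, k, p)
print(f"gamma_hat = {gamma_hat:.3f} (true value {1 / alpha})")
print(f"x_hat = {x_hat:.2f} (true value {p ** (-1 / alpha):.2f})")
```

When every observation is uncensored, `censored_hill` reduces to the classical Hill estimator (4) and `kaplan_meier_survival` to the empirical survival function, which gives a quick sanity check. In practice the choice of `k` matters a great deal, and the value used here is arbitrary.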

Bibliography

[1] Balkema, A. and de Haan, L. (1974). Residual life time at great age, Ann. Probab., 2, 792-804.
[2] Beirlant, J. and Guillou, A. (2001). Pareto index estimation under moderate right censoring, Scand. Actuarial J., 2, 111-125.
[3] Beirlant, J., Teugels, J.L. and Vynckier, P. (1996). Practical Analysis of Extreme Values, Leuven University Press, Leuven.
[4] Beirlant, J., Vynckier, P. and Teugels, J.L. (1996). Excess functions and estimation of the extreme value index, Bernoulli, 2, 293-318.
[5] Cox, D.R. and Oakes, D. (1984). Analysis of Survival Data, Chapman and Hall, New York.
[6] Dekkers, A.L.M., Einmahl, J.H.J. and de Haan, L. (1989). A moment estimator for the index of an extreme-value distribution, Ann. Statist., 17, 1833-1855.
[7] Frees, E. and Valdez, E. (1998). Understanding relationships using copulas, North American Actuarial Journal, 2, 5.
[8] Hill, B.M. (1975). A simple general approach to inference about the tail of a distribution, Ann. Statist., 3, 1163-1174.
[9] Kaplan, E.L. and Meier, P. (1958). Non-parametric estimation from incomplete observations, J. Amer. Statist. Assoc., 53, 457-481.
[10] Pickands III, J. (1975). Statistical inference using extreme order statistics, Ann. Statist., 3, 119-131.
[11] Reiss, R.D. and Thomas, M. (1997). Statistical Analysis of Extreme Values with Applications to Insurance, Finance, Hydrology and Other Fields, Birkhäuser Verlag, Basel.
[12] Weissman, I. (1978). Estimation of parameters and large quantiles based on the largest observations, J. Amer. Statist. Assoc., 73, 812-815.