A TEST OF FIT FOR THE GENERALIZED PARETO DISTRIBUTION BASED ON TRANSFORMS


Dimitrios Konstantinides, Simos G. Meintanis

Department of Statistics and Actuarial Science, University of the Aegean, Karlovassi, 832 00 Samos, Greece
and
Department of Economics, National and Kapodistrian University of Athens, 8 Pesmazoglou Street, 105 59 Athens, Greece

Abstract. The generalized Pareto distribution is a very popular two-parameter model for extreme events. In this article we develop a class of goodness-of-fit tests for the composite null hypothesis that the data at hand come from a generalized Pareto distribution with unspecified parameter values. In doing so, the parameters are estimated, and the resulting estimates are employed in transforming the data, so that under the null hypothesis the transformed data approximately follow a unit exponential distribution. The null hypothesis of exponentiality is then tested instead, by utilizing the empirical Laplace transform. The method is shown to be consistent, and the asymptotic null distribution of the test statistic is derived. The results of a Monte Carlo study are presented which include, apart from the proposed test, the standard methods based on the empirical distribution function, implemented by employing moment, maximum likelihood, and probability weighted moments estimates.

Keywords. Extreme events, Goodness of fit test, Empirical Laplace transform.

AMS 2000 classification numbers: 62G10, 62G20

1 Introduction

The analysis of series of observations consisting of the largest (or smallest) values was traditionally based on the family of generalized extreme value distributions. However, this approach has recently been criticized, mainly because of its inefficient use of the available data. One approach towards recovering this loss of information is to consider several of the larger order statistics. The resulting exceedances (the differences between these order statistics and a given high threshold) are then typically modelled by the generalized Pareto distribution (GPD). The GPD is a two-parameter model with a shape parameter α and a scale parameter c. Its distribution function is

F(x; α, c) = 1 - (1 - αx/c)^{1/α}, α ∈ IR, c > 0,

with support x > 0 (resp. 0 < x < c/α) if α ≤ 0 (resp. α > 0). We write GP(α, c) to denote the GPD with parameters α and c. For α < 0 the distribution is related to the Pareto distribution of the second kind (see Johnson et al., 1994, p. 575), whereas for α = 0 and α = 1, respectively, the exponential with scale c and the uniform on (0, c) result. For applications of the GPD to such diverse areas of applied research as meteorology, economics, ecology and reliability, the reader is referred to Choulakian and Stephens (2001), Singh and Ahmad (2004), and the references therein.

Recently, Choulakian and Stephens (2001) proposed classical goodness-of-fit tests for the null hypothesis

H_0: the law of X is GP(α, c) for some α ∈ IR and c > 0,

based on independent copies {X_j}_{j=1}^n of the random variable X ≥ 0. These tests, namely the Kolmogorov-Smirnov, the Cramér-von Mises and the Anderson-Darling tests, utilize the empirical distribution function (EDF). In this article we develop a class of goodness-of-fit tests for the GPD based on the empirical Laplace transform. To this end consider, instead of X, the random variable Y = -(1/α) log(1 - αX/c), which under H_0 follows a unit exponential distribution. Consequently the Laplace transform (LT) L(t) = E(e^{-tY}) of Y satisfies

(1.1)    (1 + t) L(t) - 1 = 0, for all t > 0.
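As an illustration (not part of the original paper, whose computations were done in FORTRAN), the following Python/NumPy sketch draws from GP(α, c) by inverting the distribution function, applies the transformation Y = -(1/α) log(1 - αX/c), and checks empirically that (1 + t)L(t) - 1 is close to zero under the null hypothesis; all function names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def gpd_sample(n, alpha, c):
    # Draw from GP(alpha, c), F(x) = 1 - (1 - alpha*x/c)^(1/alpha), by inversion.
    u = rng.uniform(size=n)
    if alpha == 0.0:
        return -c * np.log(1.0 - u)          # exponential limit
    return (c / alpha) * (1.0 - (1.0 - u) ** alpha)

alpha, c, n = 0.3, 1.0, 200_000
x = gpd_sample(n, alpha, c)

# Under H_0, Y = -(1/alpha) * log(1 - alpha*X/c) is unit exponential,
# so its Laplace transform satisfies (1 + t) L(t) - 1 = 0 for all t > 0.
y = -np.log(1.0 - alpha * x / c) / alpha

for t in (0.5, 1.0, 2.0):
    L_t = np.mean(np.exp(-t * y))            # empirical Laplace transform L_n(t)
    print(t, (1.0 + t) * L_t - 1.0)          # approximately zero under H_0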

Hence we may test H_0 by employing the data

(1.2)    Y_j = -(1/α̂) log(1 - α̂X_j/ĉ), j = 1, 2, ..., n,

where α̂ (resp. ĉ) denotes a consistent estimator of the parameter α (resp. c). Under H_0 and for large n, {Y_j}_{j=1}^n will approximately follow a unit exponential distribution. Therefore it would be natural (in view of (1.1)) to base a test of H_0 on a measure of deviation from zero of the random function D_n(t) = (1 + t) L_n(t) - 1 on [0, ∞), where

L_n(t) = (1/n) Σ_{j=1}^n exp(-t Y_j)

is the empirical LT of the transformed data {Y_j}_{j=1}^n. Such a test for exponentiality was developed by Henze and Meintanis (2002) and takes the form

(1.3)    T_{n,a} = n ∫_0^∞ D_n²(t) e^{-at} dt, a > 0.

Since the test that rejects exponentiality for large values of T_{n,a} was shown to be more powerful than the classical tests referred to above, it is hoped that the same will hold in the present, more general, situation. It should be emphasized that the empirical LT has proved a valuable tool for statistical inference, not only in the case of testing exponentiality, but also in goodness-of-fit tests for the inverse Gaussian distribution (Henze and Klar, 2002) and the gamma distribution (Henze and Meintanis, 2004), as well as in the estimation context (Csörgő and Teugels, 1990; Feuerverger, 1989), to name a few.

The paper is organized as follows. In Section 2 we derive the limit null distribution of T_{n,a} and establish the consistency of the test based on T_{n,a} against general alternatives. The results of a Monte Carlo study are presented in Section 3, where the new tests are compared with the EDF goodness-of-fit tests for the GPD law. The paper concludes with real data examples given in Section 4.
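For illustration, T_{n,a} can also be evaluated directly from the defining integral in (1.3) by one-dimensional numerical quadrature. The sketch below is our own (it assumes SciPy's quad routine) and is not the implementation used in the paper.

import numpy as np
from scipy.integrate import quad

def T_na_integral(y, a):
    # T_{n,a} = n * int_0^inf [(1 + t) L_n(t) - 1]^2 * exp(-a t) dt,
    # with L_n(t) the empirical Laplace transform of Y_1, ..., Y_n.
    y = np.asarray(y, dtype=float)

    def integrand(t):
        L_n = np.mean(np.exp(-t * y))
        return ((1.0 + t) * L_n - 1.0) ** 2 * np.exp(-a * t)

    value, _ = quad(integrand, 0.0, np.inf)
    return y.size * value

# Example: unit exponential data should give small values of T_{n,a}.
rng = np.random.default_rng(1)
print(T_na_integral(rng.exponential(size=100), a=0.5))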

2 Theoretical results

We briefly review the setting for asymptotic distribution theory; for more details the reader is referred to Henze and Meintanis (2004). A convenient setting is the separable Hilbert space H = L²(IR_+, B, e^{-at} dt) of (equivalence classes of) measurable functions f : IR_+ → IR satisfying ∫_0^∞ f²(t) e^{-at} dt < ∞. The inner product and the norm in H are denoted by <f, g> and ||f||, respectively. Here and in what follows, the notation →_D means convergence in distribution of random elements and random variables, →_P means convergence in probability, o_P(1) stands for convergence in probability to zero, O_P(1) denotes boundedness in probability, and i.i.d. means independent and identically distributed.

In view of the parametric bootstrap procedure, which will be carried out to obtain critical values for T_{n,a}, we study the null asymptotic distribution of T_{n,a} under a triangular array X_n1, X_n2, ..., X_nn, n ≥ 1, of rowwise i.i.d. random variables, where X_n1 ~ GP(α_n, 1), α_n ∈ IR, and lim_{n→∞} α_n = α. For the bootstrap procedure, α_n = α̂_n is chosen. We further consider only regular estimators α̂_n and ĉ_n of α and c, respectively. Namely, we assume that under the triangular array referred to above,

√n (α̂_n - α_n) = (1/√n) Σ_{j=1}^n ψ_1(X_nj; α_n) + ε_{n,1},
√n (ĉ_n - 1) = (1/√n) Σ_{j=1}^n ψ_2(X_nj; α_n) + ε_{n,2},

where, for k = 1, 2, ε_{n,k} = o_P(1) and ψ_k has zero mean and a finite second moment satisfying lim_{n→∞} E ψ_k²(X_n1; α_n) = E ψ_k²(X; α), with X ~ GP(α, 1).

3 Finite sample comparisons

This section presents the results of a Monte Carlo study, conducted at the 10% nominal level with 1 000 replications, to assess the performance of the new tests. To avoid reliance on asymptotic critical values, and since the null distribution of all test statistics considered depends on the (unknown) value of the shape parameter α, we performed a parametric bootstrap to obtain the critical point p_n of the test as follows: Conditionally on the observed value of α̂_n, generate 100 bootstrap samples from GP(α̂_n, 1). Calculate the value of the test statistic, say T_j (j = 1, 2, ..., 100), for each bootstrap sample. Obtain p_n as T_(90), where T_(j), j = 1, 2, ..., 100, denote the ordered values of the T_j. We have adapted the choice in Gürtler and Henze (2000) and used the modified critical point p_n = T_(90) + 0.90 (T_(91) - T_(90)), which leads to a more accurate empirical level of the test.
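A minimal sketch of this parametric bootstrap step is given below (illustrative Python only; the callables statistic and estimate are placeholders for the chosen test statistic and estimation method, and re-estimating the parameters on each bootstrap sample is our reading of the procedure).

import numpy as np

rng = np.random.default_rng(2)

def bootstrap_critical_point(alpha_hat, n, statistic, estimate, B=100, level=0.10):
    # Parametric bootstrap, conditionally on alpha_hat: resample from GP(alpha_hat, 1),
    # re-estimate the parameters, and recompute the test statistic.
    t_boot = np.empty(B)
    for b in range(B):
        u = rng.uniform(size=n)
        if alpha_hat == 0.0:
            x = -np.log(1.0 - u)
        else:
            x = (1.0 - (1.0 - u) ** alpha_hat) / alpha_hat
        a_b, c_b = estimate(x)
        t_boot[b] = statistic(x, a_b, c_b)
    t_boot.sort()
    k = int(np.floor((1.0 - level) * B)) - 1     # index of T_(90) when B = 100, level = 0.10
    # Modified critical point T_(90) + 0.90 * (T_(91) - T_(90)).
    return t_boot[k] + (1.0 - level) * (t_boot[k + 1] - t_boot[k])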

i) Methods of estimation: We consider the moment (MO) estimators

α̂_n = (1/2) [ (X̄_n / S_n)² - 1 ],   ĉ_n = (1/2) X̄_n [ (X̄_n / S_n)² + 1 ],

where

X̄_n = (1/n) Σ_{j=1}^n X_j,   S_n² = (1/n) Σ_{j=1}^n (X_j - X̄_n)².

The MO estimators are regular, in the sense of (2.3), for α > -1/4 (Hosking and Wallis, 1987). Hosking and Wallis (1987) also consider a related class of estimators, which are regular for α > -1/2. These are termed probability weighted moments (PWM) estimators and may be written as

α̂_n = X̄_n / (X̄_n - 2w) - 2,   ĉ_n = 2w X̄_n / (X̄_n - 2w),

where

w = (1/n) Σ_{j=1}^n (1 - π_j) X_(j),   π_j = (j - 0.35)/n,   j = 1, 2, ..., n,

and X_(1) ≤ X_(2) ≤ ... ≤ X_(n) are the order statistics of {X_j}_{j=1}^n. The maximum likelihood (ML) estimators are regular for α < 1/2 (Smith, 1984). We have employed the routine of Grimshaw (1993) to obtain the ML estimators, supplemented by a bisection routine to locate the maximum of the profile log-likelihood

L(ϑ) = -Σ_{j=1}^n log(1 - ϑX_j) - n log[ -(1/(nϑ)) Σ_{j=1}^n log(1 - ϑX_j) ],

with respect to ϑ = α/c. Let ϑ̂_n denote the maximizing value. Then the ML estimators of α and c are

α̂_n = -(1/n) Σ_{j=1}^n log(1 - ϑ̂_n X_j),   ĉ_n = α̂_n / ϑ̂_n.
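The MO and PWM estimators can be coded directly from the formulas above, as in the following illustrative sketch (ours, not the paper's FORTRAN code); the ML estimator is omitted here since it requires numerical maximization of L(ϑ) along the lines of Grimshaw (1993).

import numpy as np

def gpd_mo(x):
    # Moment estimators: alpha = (1/2)[(Xbar/S)^2 - 1], c = (1/2) Xbar [(Xbar/S)^2 + 1].
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = x.var()                      # (1/n) * sum (X_j - Xbar)^2
    r = m * m / s2
    return 0.5 * (r - 1.0), 0.5 * m * (r + 1.0)

def gpd_pwm(x):
    # Probability weighted moments estimators of Hosking and Wallis (1987).
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    pi = (np.arange(1, n + 1) - 0.35) / n
    w = np.mean((1.0 - pi) * x)
    m = x.mean()
    return m / (m - 2.0 * w) - 2.0, 2.0 * m * w / (m - 2.0 * w)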

Comparative evaluations of the performance of estimators for the GPD, including the estimators considered here, may be found in Peng and Welsh (2001) and Singh and Ahmad (2004).

ii) Test statistics: From (1.3) it follows by straightforward algebra that the new test statistic may conveniently be written as

T_{n,a} = (1/n) Σ_{j,k=1}^n [1 + (Y_j + Y_k + a + 1)²] / (Y_j + Y_k + a)³ - 2 Σ_{j=1}^n (1 + Y_j + a) / (Y_j + a)² + n/a,

with Y_j, j = 1, 2, ..., n, defined in (1.2).

The new test is compared with the EDF procedures developed by Choulakian and Stephens (2001). Let F̂(x) = F(x; α̂_n, ĉ_n). Then the Kolmogorov-Smirnov (KS) statistic is

KS = max{D⁺, D⁻},

where

D⁺ = max_{j=1,2,...,n} { j/n - F̂(X_(j)) },   D⁻ = max_{j=1,2,...,n} { F̂(X_(j)) - (j - 1)/n }.

The Cramér-von Mises (CM) statistic is

CM = 1/(12n) + Σ_{j=1}^n ( F̂(X_(j)) - (2j - 1)/(2n) )²,

and the Anderson-Darling (AD) statistic is

AD = -n - (1/n) Σ_{j=1}^n [ (2j - 1) log F̂(X_(j)) + (2(n - j) + 1) log(1 - F̂(X_(j))) ].

iii) Simulation results: All calculations were done at the Department of Economics, University of Athens, using double precision arithmetic in FORTRAN and routines from the IMSL library, whenever available. Apart from the GPD, the gamma distribution with density Γ(θ)^{-1} x^{θ-1} exp(-x), the Weibull distribution with density θ x^{θ-1} exp(-x^θ), and the log-normal distribution with density (θx)^{-1} (2π)^{-1/2} exp[-(log x)²/(2θ²)] are considered. These distributions will be denoted by Γ(θ), W(θ) and LN(θ), respectively, while the GPD with unit scale will simply be denoted by GP(α).
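A compact illustrative sketch (again ours, not the authors' FORTRAN/IMSL routines) of the closed form of T_{n,a} and of the EDF statistics above is:

import numpy as np

def T_na(y, a):
    # Closed form of T_{n,a} for the transformed data Y_1, ..., Y_n.
    y = np.asarray(y, dtype=float)
    n = y.size
    s = y[:, None] + y[None, :] + a                      # Y_j + Y_k + a
    term1 = np.sum((1.0 + (s + 1.0) ** 2) / s ** 3) / n
    term2 = 2.0 * np.sum((1.0 + y + a) / (y + a) ** 2)
    return term1 - term2 + n / a

def edf_statistics(x, alpha, c):
    # KS, CM and AD statistics of Choulakian and Stephens (2001),
    # with F_hat(x) = F(x; alpha, c) evaluated at the estimated parameters.
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    z = 1.0 - np.exp(-x / c) if alpha == 0.0 else 1.0 - (1.0 - alpha * x / c) ** (1.0 / alpha)
    ks = max(np.max(j / n - z), np.max(z - (j - 1) / n))
    cm = 1.0 / (12.0 * n) + np.sum((z - (2.0 * j - 1.0) / (2.0 * n)) ** 2)
    ad = -n - np.mean((2.0 * j - 1.0) * np.log(z) + (2.0 * (n - j) + 1.0) * np.log(1.0 - z))
    return ks, cm, ad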

Table 1: Percentage of rejection for 1 000 Monte Carlo samples of size n = 50 (left part), n = 75 (middle part) and n = 100 (right part). Estimation: MO (top), PWM (middle) and ML (bottom).

                   n = 50                  n = 75                  n = 100
             T_0.25 T_0.50 T_0.75    T_0.25 T_0.50 T_0.75    T_0.25 T_0.50 T_0.75
MO
GP(-0.20)      13     13     13        12     12     13        11     12     12
GP(-0.10)      11     12     12        10     11     11        10     11     11
GP(0.0)        11     11     11        10     10     10        10     10     10
GP(0.20)        9     10      9         9      9      9         9      9      9
GP(0.50)        9      9      9         9      9      8         9      9      9
GP(1.0)         8      8      8         8      8      7         8      9      9
Γ(2.0)         66     65     62        83     83     78        92     92     89
W(0.75)        62     62     61        73     73     73        81     80     79
W(1.25)        24     24     21        30     32     30        41     41     42
W(1.50)        45     50     51        60     67     66        75     81     81
LN(1.0)        62     52     45        83     72     59        93     85     77
PWM
GP(-0.40)      10     11     11        10     10     10        10     11     11
GP(-0.20)       9     10     10        10      9     11         9     10     10
GP(0.0)         9      9      9         9      9     10         8      8      9
GP(0.20)        9     10      9         9      9      9         9      8      9
GP(0.50)        9     10     10         9     10      9         9     10     10
GP(1.0)         9     11     10         9      9      9         9     11     10
Γ(2.0)         71     76     73        88     88     86        95     95     94
W(0.75)        47     47     48        59     57     57        67     65     66
W(1.25)        26     27     26        32     32     32        39     44     42
W(1.50)        56     62     64        73     79     79        84     88     88
LN(1.0)        77     73     67        93     89     84        98     96     95
ML
GP(-1.0)        8      9      8         9      9     10         9     10     10
GP(-0.75)       8      8      8         9     10     10         9      9      9
GP(-0.50)       8      9      9         9     10     11         9     10      9
GP(-0.25)       9      9      9         9     10     10        10     10     10
GP(0.0)         9      9      9         9     10     10        10     10     10
GP(0.20)        8      8      8         9      9     10        10     10     10
GP(0.40)        6      6      6         7      8      8         9     10      9
Γ(2.0)         62     59     55        90     88     83        97     96     94
W(0.75)        42     41     40        54     51     50        63     61     61
W(1.25)        19     20     18        30     29     29        43     43     41
W(1.50)        43     41     40        72     70     68        85     84     82
LN(1.0)        68     60     55        89     81     74        97     92     88

Table 2: Percentage of rejection for 1 000 Monte Carlo samples of size n = 50 (left part), n = 75 (middle part) and n = 100 (right part). Estimation: MO (top), PWM (middle) and ML (bottom).

                   n = 50                  n = 75                  n = 100
               CM     KS     AD        CM     KS     AD        CM     KS     AD
MO
GP(-0.20)      13     13     13        11     12     11        10     11     10
GP(-0.10)      12     12     11        10     10     10        10     10     10
GP(0.0)        10     10     10        10     10      9         9      9      9
GP(0.20)        8      8      8         8      8      8         8      7      7
GP(0.50)        8      8      8         8      7      8         7      9      8
GP(1.0)         9      8      9         8      8      8         9      8      9
Γ(2.0)         64     61     66        82     78     85        92     89     94
W(0.75)        49     45     58        60     55     70        66     60     76
W(1.25)        22     20     21        27     24     27        36     34     38
W(1.50)        55     54     54        72     69     74        85     83     85
LN(1.0)        10     10     10        10     10     10        10     10     10
PWM
GP(-0.40)      11     12     10        10     10      9        10      9      9
GP(-0.20)      11     10     10         9      9      9         9      8      8
GP(0.0)        10      9      9         9      8      8         8      7      7
GP(0.20)        9      9      9         9      9      8         9      9      9
GP(0.50)       11     10     10        10      9      9         9      9      9
GP(1.0)        11     10     11        10      9      9        10      9     10
Γ(2.0)         76     73     76        89     88     90        96     95     96
W(0.75)        33     29     42        41     37     52        49     42     61
W(1.25)        27     26     27        33     31     33        42     41     42
W(1.50)        65     64     65        82     81     82        89     89     90
LN(1.0)        50     44     56        69     63     78        83     79     91
ML
GP(-1.0)       10     11     10        10     10     10         8      8      8
GP(-0.75)      10     10     11        11     10      9         8      8      8
GP(-0.50)      11     11     10        10      9     10         8      7      9
GP(-0.25)      10     11     10        10     10      9         9      9      8
GP(0.0)        10     10     10        10     10     10        10      9      9
GP(0.20)        8      9      8         9      9      9         9     10      9
GP(0.40)        6      6      6         7      8      8         8      9      9
Γ(2.0)         51     44     54        80     73     85        90     87     94
W(0.75)        27     25     38        35     32     47        43     36     57
W(1.25)        19     17     18        27     23     28        36     31     37
W(1.50)        37     31     38        61     54     65        78     72     81
LN(1.0)        48     41     56        66     63     77        82     78     91

Table 1 shows results (percentage of rejection rounded to the nearest integer) for T_{n,a}, a = 0.25, 0.50, 0.75. In the table we simply write T_a instead of T_{n,a}. Corresponding results for the KS, CM and AD tests are shown in Table 2.