Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators



Computational Statistics & Data Analysis 51 (2006)

Alberto Luceño
E.T.S. de Ingenieros de Caminos, University of Cantabria, 39005 Santander, Spain
E-mail address: lucenoa@unican.es
doi:10.1016/j.csda

Received 15 January 2005; received in revised form 12 July 2005; accepted 2 September 2005. Available online 12 October 2005.

Abstract

Some of the most powerful techniques currently available to test the goodness of fit of a hypothesized continuous cumulative distribution function (CDF) use statistics based on the empirical distribution function (EDF), such as those of Kolmogorov, Cramér-von Mises and Anderson-Darling, among others. The use of EDF statistics was analyzed for estimation purposes. In this approach, maximum goodness-of-fit estimators (also called minimum distance estimators) of the parameters of the CDF can be obtained by minimizing any of the EDF statistics with respect to the unknown parameters. The results showed that there is no unique EDF statistic that can be considered most efficient for all situations. Consequently, the possibility of defining new EDF statistics is entertained; in particular, an Anderson-Darling statistic of degree two and one-sided Anderson-Darling statistics of degree one and two appear to be notable in some situations. The procedure is shown to be able to deal successfully with the estimation of the parameters of homogeneous and heterogeneous generalized Pareto distributions, even when maximum likelihood and other estimation methods fail.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Anderson-Darling statistic; Cramér-von Mises statistic; Empirical distribution function; Generalized linear models; Kolmogorov distance; Minimum distance estimator

1. Introduction

The maximum likelihood (ML) estimation method is known to be asymptotically optimal to estimate the parameters of many discrete and continuous distributions. However, there is a considerable number of continuous distributions for which the probability density function and, subsequently, the likelihood function can be made arbitrarily large at some point, and hence maximum likelihood estimators (MLEs) of the parameters of these distributions do not generally exist. Moreover, if the range of the distribution depends on unknown parameters, MLEs may not possess their classical statistical properties because Cramér's regularity conditions often fail to hold in this situation. There are also some distributions such that the likelihood function may not have a local maximum for some sample values, so that no MLE exists.

When the ML method cannot be used, an alternative estimation method is the method of moments (MOM). However, there is also a considerable number of distributions for which some of the first few moments are not finite, with the consequence that moment estimates do not exist. Even when the moments exist, the MOM method may produce inefficient estimators. In addition, the MOM method cannot be used in the context of generalized linear models.

An interesting example of a distribution that poses these difficulties is provided by the generalized Pareto CDF given by

$$F_{\theta,k}(x) = \begin{cases} 1 - (1 - kx/\theta)^{1/k} & \text{if } k \neq 0, \\ 1 - \exp(-x/\theta) & \text{if } k = 0, \end{cases} \qquad (1)$$

where $\theta > 0$ is a scale parameter and $k$ is a shape parameter. The range of $x$ is $x \geq 0$ for $k \leq 0$ and $0 \leq x \leq \theta/k$ for $k > 0$. When $k > 1$, MLEs do not exist because the probability density corresponding to (1) tends to infinity when $x$ tends to $\theta/k$. Moreover, Cramér's regularity conditions do not hold for $k > 1/3$. The mean and variance are $\mu = \theta/(1 + k)$ and $\sigma^2 = \theta^2/\{(1 + k)^2 (1 + 2k)\}$, so that $\mu$ and $\sigma^2$ are finite only for $k > -1$ and $k > -1/2$, respectively. Consequently, moment estimators do not exist for $k \leq -1/2$.

The problem of fitting the generalized Pareto distribution (GPD) to data has been approached by several authors including Hosking et al. (1985), Hosking and Wallis (1987), Davison and Smith (1990), Walshaw (1990), Grimshaw (1993), Castillo and Hadi (1997), and Castillo et al. (2005), among others. Goodness-of-fit tests for the GPD have been suggested by Choulakian and Stephens (2001).

The GPD is also important because it contains the exponential distribution with mean $\theta$ as a limiting case when $k$ tends to 0, the uniform distribution in the range $[0, \theta]$ when $k = 1$, and the standard Pareto distribution when $k < 0$. Moreover, its relevance has recently increased considerably (see Appendix A) because, as shown by Pickands (1975), it can be put in connection with the generalized extreme value distribution (GEVD) having CDF

$$F_{\mu,\psi,k}(x) = \begin{cases} \exp\{-(1 - k(x - \mu)/\psi)^{1/k}\} & \text{if } k \neq 0, \\ \exp[-\exp\{-(x - \mu)/\psi\}] & \text{if } k = 0. \end{cases} \qquad (2)$$

In this paper we analyze a method for estimating the parameters of continuous CDFs, which is based on minimizing empirical distribution function (EDF) statistics and can be used as an alternative or as a complement to other estimation methods.
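To make the parameterization in Eq. (1) concrete, the following Python sketch evaluates the GPD CDF, inverts it, and draws samples by inverse-transform sampling. The function names (`gpd_cdf`, `gpd_quantile`, `gpd_sample`) are illustrative choices and do not come from the paper; the quantile formula is simply Eq. (1) solved for $x$.

```python
import math
import random


def gpd_cdf(x, theta, k):
    """CDF of Eq. (1): scale theta > 0, shape k (sign convention of Eq. (1))."""
    if x <= 0.0:
        return 0.0
    if k == 0.0:
        return 1.0 - math.exp(-x / theta)
    base = 1.0 - k * x / theta
    if base <= 0.0:  # only possible for k > 0, i.e. x >= theta / k
        return 1.0
    return 1.0 - base ** (1.0 / k)


def gpd_quantile(p, theta, k):
    """Inverse of gpd_cdf for 0 <= p < 1, obtained by solving Eq. (1) for x."""
    if k == 0.0:
        return -theta * math.log(1.0 - p)
    return theta * (1.0 - (1.0 - p) ** k) / k


def gpd_sample(n, theta, k, rng):
    """Draw n values by inverse-transform sampling with a random.Random instance."""
    return [gpd_quantile(rng.random(), theta, k) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    xs = gpd_sample(5, theta=1.0, k=1.0, rng=rng)  # k = 1 is the uniform U[0, theta]
    print(xs)
```

Note that some software uses the opposite sign for the shape parameter (for example, a shape $c = -k$), so the convention should be checked before comparing estimates across tools.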
Because EDF statistics are used to test the goodness of fit of continuous distributions, we call this method the maximum goodness of fit (MGF) estimation method. The estimators provided by the MGF method will be called maximum goodness-of-fit estimators (MGFEs). The origin of the MGF method goes back to Wolfowitz (1953, 1957) and Kac et al. (1955), under the name of minimum distance method. Moreover, Pollard (1980) proved the $\sqrt{n}$-consistency of the minimum distance estimators and found their asymptotic distribution. Because the name minimum distance method is often used in other contexts not related to EDF statistics (for example, when minimizing functions of the sample and population autocorrelations of the residuals in time series, or the distance between sample and predicted moments), we prefer to use the name MGF throughout the paper.

One important property of the MGF method is that it can be used in situations in which there are no MOM or ML estimators. In contrast with the MOM or ML methods, which lead to unique estimators, the MGF method provides several estimators depending on the particular EDF statistic chosen, thus providing a wider inductive basis. For instance, one particular EDF statistic could provide more weight to the left tail of the distribution, whereas a second EDF statistic could assign more weight to the right tail or to the central part of the distribution, and a third statistic could assign equal weight to every part of the distribution.

Even though the MGF method seems to have been disregarded as a useful estimation method to fit the GPD to data (see, e.g., Castillo et al., 2005; Coles, 2001; Smith, 2003), we shall show throughout the paper that the MGF method can be successfully used to estimate the parameters of the GPD (and of generalized linear models based on the GPD) even for very extreme values of the shape parameter.

Section 2 compiles the classical EDF statistics used throughout the paper together with some new EDF statistics that are useful for estimation purposes. Section 3 describes the MGF estimation method. The performance of MGF estimators is analyzed for homogeneous GPDs in Section 4 and for generalized linear models based on GPDs in Section 5; a real example of application to ocean engineering is also considered in Section 4. Concluding remarks are given in Section 6.

2. Some EDF statistics useful for estimation

Let $(x_1, \ldots, x_n)$ be a sample of $n$ IID observations on a continuous random variable $X$ with CDF $F(x)$. Let $x_{(1)} \leq \cdots \leq x_{(n)}$ be the corresponding order statistics and $S_n(x)$ be the empirical distribution function (see Rao, 1973).

Table 1. Three classical EDF statistics

Statistic             Acronym   Formula
Kolmogorov distance   KS        $D_n = \sup_x |F(x) - S_n(x)|$
Cramér-von Mises      CM        $W_n^2 = n \int_{-\infty}^{\infty} \{F(x) - S_n(x)\}^2 \, dF(x)$
Anderson-Darling      AD        $A_n^2 = n \int_{-\infty}^{\infty} \frac{\{F(x) - S_n(x)\}^2}{F(x)\{1 - F(x)\}} \, dF(x)$

Table 2. Modified Anderson-Darling statistics

Statistic                        Acronym   Formula
Right-tail AD                    ADR       $R_n^2 = n \int_{-\infty}^{\infty} \frac{\{F(x) - S_n(x)\}^2}{1 - F(x)} \, dF(x)$
Left-tail AD                     ADL       $L_n^2 = n \int_{-\infty}^{\infty} \frac{\{F(x) - S_n(x)\}^2}{F(x)} \, dF(x)$
Right-tail AD of second degree   AD2R      $r_n^2 = n \int_{-\infty}^{\infty} \frac{\{F(x) - S_n(x)\}^2}{\{1 - F(x)\}^2} \, dF(x)$
Left-tail AD of second degree    AD2L      $l_n^2 = n \int_{-\infty}^{\infty} \frac{\{F(x) - S_n(x)\}^2}{\{F(x)\}^2} \, dF(x)$
AD of second degree              AD2       $a_n^2 = r_n^2 + l_n^2$

The purpose of EDF statistics is to measure the distance between $F(x)$ and $S_n(x)$. Throughout the paper, the three classical EDF statistics in Table 1, as well as the five modified EDF statistics introduced in Table 2, will be considered. The AD statistic $A_n^2$ gives more weight to the tails of the CDF than the CM statistic $W_n^2$. Similarly, the ADR and ADL statistics assign more weight to the selected tail of the CDF than the CM statistic. Either of the tails, or both of them, can receive even larger weights by using second degree Anderson-Darling statistics. Note also that $a_n^2$ is defined in analogy with the relationship $A_n^2 = R_n^2 + L_n^2$ satisfied by the AD, ADR and ADL statistics. Because $S_n(x)$ is a step function with jumps at the order statistics, the EDF statistics in Tables 1 and 2 can be written in alternative forms, which are more useful for computational purposes (see Appendix B).

3. Maximum goodness-of-fit estimation

3.1. Homogeneous populations

Suppose that $(x_1, \ldots, x_n)$ is a sample of $n$ IID observations on a continuous random variable $X$ having CDF $F_\Theta(x)$, where $\Theta$ is a vector of unknown parameters. MGFEs for $\Theta$ can be obtained by minimizing any one of the EDF statistics provided in Section 2.

3.2. Heterogeneous populations

Now suppose that $(y_1, \ldots, y_n)$ is a sample of independent but not necessarily identically distributed observations on $n$ continuous random variables $(Y_1, \ldots, Y_n)$ having CDFs $F_{i,\Theta}(y_i)$, $i = 1, \ldots, n$, where the functions $F_{i,\Theta}(\cdot)$ share the same set of unknown parameters $\Theta$. In this case one cannot directly evaluate EDF statistics for $(y_1, \ldots, y_n)$ because these data come from different CDFs. However, any one of the random variables $(U_1, \ldots, U_n)$ defined by the transformations $U_i = F_{i,\Theta}(Y_i)$, where $0 \leq U_i \leq 1$ and $i = 1, \ldots, n$, must have the same uniform $U[0, 1]$ distribution if $F_{i,\Theta}(\cdot)$ is the true CDF of $Y_i$. This can be easily demonstrated as follows:

$$u_i = F_{i,\Theta}(y_i) = \Pr(Y_i \leq y_i) = \Pr[F_{i,\Theta}(Y_i) \leq F_{i,\Theta}(y_i)] = \Pr(U_i \leq u_i) = F_{U_i}(u_i),$$

where $0 \leq u_i \leq 1$. Hence $S_\Theta \equiv (u_1, \ldots, u_n)$ is a sample of $n$ IID realizations on a $U[0, 1]$ random variable, for which one can evaluate EDF statistics. MGFEs for $\Theta$ can be computed by minimizing with respect to $\Theta$ any one of the EDF statistics obtained for the transformed samples $S_\Theta$.

4. Performance of the MGF estimators in homogeneous populations

4.1. Uniform distribution

The $U[0, \theta]$ CDF is a particular case of the GPD, obtained using $k = 1$ in (1). Suppose that $(x_1, \ldots, x_n)$ is a sample of $n$ IID observations on a $U[0, \theta]$ random variable $X$. Although Cramér's regularity conditions do not hold, the scale parameter $\theta$ can be estimated using the ML method. The MLE of $\theta$ is

$$\hat{\theta} = \max(x_1, \ldots, x_n). \qquad (3)$$

This estimator is biased because $E(\hat{\theta}) = n\theta/(n + 1) \neq \theta$, but superefficient because $\mathrm{var}(\hat{\theta}) = n\theta^2/\{(n + 2)(n + 1)^2\}$. The root mean squared error (RMSE) of $\hat{\theta}$ is $\mathrm{RMSE}(\hat{\theta}) = \theta\{(n + 1)(n + 2)/2\}^{-1/2}$.

We have simulated 1 samples of size n = 1 on a $U[0, 1]$ distribution and have computed MGFEs $\tilde{\theta}$ for $\theta$ by minimizing all the EDF statistics in Section 2. (The results are, obviously, invariant with respect to the value of $\theta$.) The minimizations have been performed subject to the boundary condition $\tilde{\theta} \geq \max(x_1, \ldots, x_n)$, which is required when no outliers are present. Table 3 provides the estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, which have been ordered in ascending order of their RMSE estimates. The table also provides the bias and RMSE corresponding to the MLE $\hat{\theta}$ in ordered position.

Table 3 shows that the MGFEs obtained minimizing AD2R and AD2 statistics have smaller bias and RMSE than the MLE $\hat{\theta}$. The ADR and AD statistics provide estimators with smaller bias than $\hat{\theta}$, although their RMSEs are larger than $\mathrm{RMSE}(\hat{\theta})$.
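As a rough illustration of the uniform case, the sketch below computes an MGFE of $\theta$ by minimizing the AD2R statistic, using its computational form from Appendix B with $z_i = x_{(i)}/\theta$, over a grid of values $\theta > \max(x_1, \ldots, x_n)$. The grid search, the grid range, the sample size of 100 and the seed are assumptions made only for this example; the paper does not say which numerical minimizer was used.

```python
import math
import random


def ad2r_statistic(z):
    """Right-tail AD statistic of second degree for sorted values z in [0, 1)."""
    n = len(z)
    log_term = 2.0 * sum(math.log(1.0 - zi) for zi in z)
    # (2i - 1) / (1 - z_{n+1-i}) for i = 1..n, written with 0-based indices
    ratio_term = sum((2 * i + 1) / (1.0 - z[n - 1 - i]) for i in range(n)) / n
    return log_term + ratio_term


def fit_uniform_scale(x, grid_points=4000, upper_factor=1.5):
    """Grid-search minimiser of AD2R over theta in (max(x), upper_factor * max(x)]."""
    xs = sorted(x)
    m = xs[-1]
    best_theta, best_value = None, math.inf
    for j in range(1, grid_points + 1):
        theta = m * (1.0 + (upper_factor - 1.0) * j / grid_points)
        value = ad2r_statistic([xi / theta for xi in xs])
        if value < best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


rng = random.Random(12345)
sample = [rng.random() for _ in range(100)]  # U[0, 1], so the true theta is 1
theta_mgf, ad2r_min = fit_uniform_scale(sample)
theta_mle = max(sample)                      # Eq. (3)
print(theta_mle, theta_mgf)
```

Because the statistic grows without bound as $\theta$ approaches the sample maximum from above, the minimizer lies strictly above the MLE, which is one way to see why it can offset the downward bias of Eq. (3).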
Although the MGFEs using AD2R and AD2 statistics outperform the MLE of $\theta$ for the $U[0, \theta]$ distribution, one cannot expect that MGFEs will generally outperform MLEs in terms of the RMSE they produce, particularly if Cramér's regularity conditions hold, because in this case the ML method provides asymptotically optimal estimators. However, we shall provide evidence that MGFEs are often close to MLEs in terms of RMSEs, thus providing a justification for the use of MGFEs when the ML method lacks its optimality properties.

Table 3. Estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, together with the bias and RMSE of the MLE, based on 1 samples of size 1 on a uniform $U[0, \theta]$ distribution with $\theta = 1$, in ascending order of the RMSEs
Columns: AD2R, AD2, MLE, ADR, AD, CM, KS, ADL, AD2L
Rows: Bias, RMSE

4.2. Some exponential distributions

The exponential Exp($\theta$) CDF is a particular case of the GPD, obtained using $k = 0$ in (1). Suppose that $(x_1, \ldots, x_n)$ is a sample of $n$ IID observations on an Exp($\theta$) random variable $X$. The MLE $\hat{\theta} = \bar{x}$ is unbiased, asymptotically optimal and $\mathrm{var}(\hat{\theta}) = \theta^2/n$. Therefore, MGFEs are expected to be inferior to $\hat{\theta}$ in terms of RMSEs.

We have simulated 1 samples of size 1 for the Exp(1) distribution and have computed the corresponding MGFEs. (The results are, obviously, invariant with respect to the value of $\theta$.) Table 4 provides the estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, together with the bias and RMSE of the MLE, ordered according to their RMSEs. Although the MLE is optimal, the difference is small with respect to the MGFEs using ADR and AD statistics.

Table 4. Estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, together with the bias and RMSE of the MLE, based on 1 samples of size 1 on an exponential Exp($\theta$) distribution with $\theta = 1$, in ascending order of the RMSEs
Columns: MLE, ADR, AD, CM, KS, AD2, ADL, AD2R, AD2L
Rows: Bias, RMSE

Suppose now that the samples of size 100 are contaminated so that 95 observations come from the Exp($\theta$) distribution with $\theta = 1$, but five observations correspond to the Exp($4\theta$) distribution. Table 5 provides the new values of the bias and RMSEs for the ML and MGF estimators of $\theta$. One can see that the MLE is now outperformed by five of the eight MGFEs. Consequently, MGFEs may be more robust than MLEs.

Table 5. Estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, together with the bias and RMSE of the MLE, based on 1 samples of size 1 on the contaminated exponential distribution of Section 4.2 with $\theta = 1$
Columns: AD, CM, KS, ADR, ADL, MLE, AD2L, AD2, AD2R
Rows: Bias, RMSE

As a third model, suppose that the random variable $X$ has the shifted exponential CDF

$$F_{\alpha,\theta}(x) = \begin{cases} 1 - \exp\{-(x - \alpha)/\theta\} & \text{if } x \geq \alpha, \\ 0 & \text{if } x < \alpha. \end{cases}$$

The MLEs are $\hat{\alpha} = \min(x_1, \ldots, x_n)$ and $\hat{\theta} = \bar{x} - \hat{\alpha}$. These estimators are biased, because $E(\hat{\alpha}) = \alpha + \theta/n$ and $E(\hat{\theta}) = \theta(n - 1)/n$, and have variances $\mathrm{var}(\hat{\alpha}) = \theta^2/n^2$ and $\mathrm{var}(\hat{\theta}) = \theta^2(n - 1)/n^2$, so that $\hat{\alpha}$ is superefficient for the location parameter $\alpha$. Cramér's regularity conditions do not hold because of $\alpha$. Table 6 provides the values of the bias and RMSEs for the MGF and ML estimators based on 1 samples of size 1 with $\alpha = 1$ and $\theta = 1$. The minimizations of the EDF statistics have been performed subject to the boundary condition $\tilde{\alpha} \leq \min(x_1, \ldots, x_n)$. The MLE of $\theta$ is better than the corresponding MGFEs. However, the MGFEs for $\alpha$ based on AD2L and ADL statistics outperform the MLE $\hat{\alpha}$ both in terms of bias and RMSE; in addition, most of the MGFEs for $\alpha$ have smaller bias than the MLE $\hat{\alpha}$.

Table 6. Estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, together with the bias and RMSE of the MLE, based on 1 samples of size 1 on the shifted exponential distribution of Section 4.2 with $\alpha = 1$ and $\theta = 1$
Columns for $\alpha$: AD2L, ADL, MLE, AD2, AD, CM, KS, ADR, AD2R
Columns for $\theta$: MLE, ADR, AD, CM, KS, AD2, ADL, AD2L, AD2R
Rows in each block: Bias, RMSE

4.3. Normal distribution

For completeness, Table 7 shows the results corresponding to a normal $N(\mu, \sigma^2)$ distribution with mean $\mu = 1$ and standard deviation $\sigma = 1$. Obviously, the MLEs of $\mu$ and $\sigma$ are optimal in this situation. Nevertheless, the difference between the RMSEs provided by the MLEs and several MGFEs (e.g., using the AD statistic) is very small, particularly for the estimation of $\mu$.

Table 7. Estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, together with the bias and RMSE of the MLE, based on 1 samples of size 1 on a normal N(1, 1) distribution, in ascending order of the RMSEs
Columns for $\mu$: MLE, CM, AD, ADR, ADL, AD2, KS, AD2R, AD2L
Columns for $\sigma$: MLE, AD, ADL, ADR, KS, CM, AD2, AD2R, AD2L
Rows in each block: Bias, RMSE

4.4. Generalized Pareto distribution

As shown in Section 1, the ML method fails to provide estimators for the parameters $k$ and $\theta$ of the GPD, given in (1), for most positive values of the parameter $k$. Thus, Hosking and Wallis (1987) compare MLEs with MOM estimates and with the estimates provided by a method of probability-weighted moments (PWM), but only consider a small range of values of $k$, namely, $|k| < 0.5$. Castillo and Hadi (1997) introduce the elemental percentile method (EPM) and expand the range of values of $k$ to $|k| \leq 2$, but disregard MLEs. Following these authors, we shall compare MGF estimators with MOM, PWM and EPM estimators, and also with the estimator provided by a quasi-ML (QML) method.

QML method: Loosely speaking, our QML method uses a combination of the standard ML method when $k < 0.75$ and a modified ML method when $k \geq 0.75$. But, because it is not possible to know the exact value of $k$ using solely the information in a random sample $(x_1, \ldots, x_n)$, the decision about whether to use the standard or the modified ML method must be taken on empirical grounds. Therefore, we have adopted the QML method having the following steps:

(1) Compute

$$\tilde{k} = -\frac{1}{n - 1} \sum_{i=1}^{n-1} \ln\left(1 - \frac{x_{(i)}}{\max(x_1, \ldots, x_n)}\right) \qquad (4)$$

and

$$Z = 1 - \frac{\sum_{i=1}^{n} x_i^2 / n}{2\bar{x}^2}. \qquad (5)$$

(2) If $\tilde{k} < 0.75$ and $Z < 0$, compute standard MLEs for $k$ and $\theta$.

(3) Otherwise, estimate $k$ using Eq. (4) and estimate $\theta$ using

$$\tilde{\theta} = \tilde{k} \max(x_1, \ldots, x_n). \qquad (6)$$

The justification of this method is as follows. When $k$ is large, the range of the GPD is $0 \leq x \leq \theta/k$ and the ML method fails. Therefore, for large values of $k$, one can use estimators such that $\tilde{\theta}/\tilde{k} = \max(x_1, \ldots, x_n)$, which is in agreement with the MLE provided by Eq. (3) for $k = 1$. This justifies (6). Moreover, introducing (6) in the log-likelihood function of the parameters $k$ and $\theta$ given the remaining sample $(x_{(1)}, \ldots, x_{(n-1)})$, and maximizing the resulting expression with respect to $k$, leads to (4). The additional criterion based on Eq. (5) can be justified on intuitive grounds because the asymptotic value of $Z$ is $1/3$ for the uniform distribution ($k = 1$) and 0 for the exponential distribution ($k = 0$). Eq. (5) can also be justified because $nZ$ is the slope of the log-likelihood function in the direction of $k$ when $k$ tends to 0 and $\hat{\theta} = \bar{x}$ (for $k = 0$, the MLE of $\theta$ is $\hat{\theta} = \bar{x}$, so that $nZ$ is related to the sign of $\hat{k}$ in the neighborhood of $k = 0$).

Simulation results: Table 8 provides estimated bias and RMSEs for the MGF estimators obtained using the eight EDF statistics of Section 2 together with the estimators provided by the methods referred to in this section (QML or ML, MOM, PWM and EPM). These estimated bias and RMSEs are based on 1 samples of size 1 on the GPD with $\theta = 1$ and $k = \{-2, -1, 0, 1, 2\}$. The minimizations of the EDF statistics have been performed subject to the boundary condition $\tilde{\theta}/\tilde{k} \geq \max(x_1, \ldots, x_n)$ whenever $\tilde{k} > 0$. (The results are, obviously, invariant with respect to the value of $\theta$.)
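The decision part of the QML method, steps (1) to (3), can be sketched in Python as follows. The function name `qml_decision` is hypothetical, the cut-offs 0.75 and 0 are taken as read from step (2), and the standard ML branch is only flagged rather than implemented, because it needs a numerical optimizer that the text does not specify. The sketch also assumes that the sample maximum is not tied, as is the case with probability one for continuous data.

```python
import math


def qml_decision(x, k_cut=0.75):
    """Steps (1)-(3) of the QML method.

    Returns (k_tilde, z_stat, use_standard_ml, theta_tilde), where theta_tilde
    is the Eq. (6) estimate and is relevant only if use_standard_ml is False.
    """
    xs = sorted(x)
    n = len(xs)
    x_max = xs[-1]
    # Eq. (4): average over the n - 1 smallest observations
    k_tilde = -sum(math.log(1.0 - xi / x_max) for xi in xs[:-1]) / (n - 1)
    # Eq. (5): Z = 1 - (mean of squares) / (2 * squared mean)
    mean = sum(xs) / n
    mean_sq = sum(xi * xi for xi in xs) / n
    z_stat = 1.0 - mean_sq / (2.0 * mean * mean)
    # Step (2): standard ML is chosen only when both conditions hold
    use_standard_ml = (k_tilde < k_cut) and (z_stat < 0.0)
    # Eq. (6): scale estimate used in step (3)
    theta_tilde = k_tilde * x_max
    return k_tilde, z_stat, use_standard_ml, theta_tilde


# Evenly spaced values on (0, 1], which mimic a uniform sample (k = 1)
k_t, z_s, use_ml, theta_t = qml_decision([i / 10 for i in range(1, 11)])
print(k_t, z_s, use_ml, theta_t)
```

For this evenly spaced input, $Z$ is positive and $\tilde{k}$ exceeds 0.75, so the sketch selects step (3), in line with the remark that $Z$ tends to $1/3$ for the uniform distribution.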

Table 8. Estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2 and the QML, ML, MOM, PWM and EPM methods of Section 4.4, based on 1 samples of size 1 on the GPD with $\theta = 1$ and $k = \{-2, -1, 0, 1, 2\}$, in ascending order of the RMSEs
k = 2
  Columns for $\theta$: QML, ADR, EPM, AD2R, AD, CM, KS, AD2, ADL, PWM, MOM, AD2L
  Columns for $k$: QML, ADR, AD2R, EPM, AD, CM, KS, ADL, AD2, PWM, MOM, AD2L
k = 1
  Columns for $\theta$: QML, AD2R, ADR, EPM, AD, CM, KS, AD2, ADL, MOM, PWM, AD2L
  Columns for $k$: QML, AD2R, EPM, ADR, AD, CM, KS, AD2, ADL, PWM, MOM, AD2L
k = 0
  Columns for $\theta$: MOM, EPM, ADR, PWM, QML, AD, CM, AD2R, KS, ADL, AD2, AD2L
  Columns for $k$: MOM, PWM, ADR, AD2R, QML, EPM, AD, CM, KS, ADL, AD2, AD2L
k = -1
  Columns for $\theta$: AD, MLE, ADR, CM, KS, ADL, EPM, AD2R, AD2, AD2L, PWM, MOM
  Columns for $k$: MLE, ADR, AD, CM, AD2R, PWM, KS, ADL, EPM, AD2, MOM, AD2L
k = -2
  Columns for $\theta$: AD, ADL, MLE, CM, KS, ADR, AD2, EPM, AD2R, AD2L, PWM, MOM
  Columns for $k$: MLE, ADR, AD, CM, KS, ADL, AD2R, AD2, EPM, PWM, AD2L, MOM
Rows in each block: Bias, RMSE

One can see that the QML method appears to provide the best estimators for $\theta$ and $k$ when $k = 2$ and 1, even though this method has not been previously considered in the literature. The second best estimator appears to be an MGF estimator, namely, the one based on the ADR statistic for $k = 2$ and on the AD2R statistic for $k = 1$.

The MOM estimator appears to be the best for $\theta$ and $k$ when $k = 0$, whereas the ADR statistic provides the third best estimator for $\theta$ and $k$ in this case; the second place is shared by the EPM method for $\theta$ and the PWM method for $k$. The QML occupies the fifth position, whereas the MLE cannot be used yet (because Eqs. (4) and (5) do not always suggest using the ML method when the unknown value of $k$ equals 0). Note that the MOM estimator appears to be among the worst estimators of $\theta$ and $k$ for $k = \{-2, -1, 1, 2\}$.

The AD statistic appears to provide the best estimator for $\theta$ when $k = -1$ and $-2$, even though the ML method is always (i.e., with sampling frequency equal to 1) chosen by Eqs. (4) and (5). The MLE appears, however, in the first position as an estimator of $k$ when $k = -1$ and $-2$, closely followed by the ADR and AD statistics in the second and third places. The fact that the MLE for $\theta$ is outperformed by the MGF estimator based on the AD statistic when $k = -1$ and $-2$ can probably be explained because MLEs do not display their asymptotic efficiency for samples of size 1 on the GPD, which was shown by Hosking and Wallis (1987) for $|k| < 0.5$. Although Hosking and Wallis showed that the standard ML method gives worse performance than the MOM and PWM methods for sample sizes around 1 and $|k| < 0.5$, this does not hold for $k = -1$ or $-2$.

Example: To illustrate the use of MGFEs, we consider the Bilbao waves data $X$ analyzed by Castillo and Hadi (1997). These are 179 zero-crossing hourly mean periods (in seconds) of the sea waves measured in a Bilbao buoy in January. According to Pickands (1975), for large enough threshold $u$, the CDF of $X - u$, conditional on $X > u$, converges to the GPD as $u$ increases (see Appendix A). Hence Castillo and Hadi (1997) analyze the data using thresholds at $u$ = 7, 7.5, 8, 8.5, 9 and 9.5.

Fig. 1 shows the empirical and estimated CDFs obtained for $u = 7$ with the eight MGFEs in Section 2. One can see that the GPD does not provide a good fit of the data, particularly in the left tail of the distribution; apparently, the number of small values of $X - 7$ contained in the sample is smaller than would be expected under the GPD. Therefore, the threshold $u = 7$ is not large enough for Pickands' asymptotic results to hold approximately.

Fig. 2 is a reproduction of Fig. 1 using now threshold $u = 7.5$ (the resulting sample size is 154). One can see that the fit provided by the GPD improves drastically, particularly for the estimates obtained using KS, CM, AD, ADR and ADL statistics. Although one could increase the threshold value and repeat the analysis, the results obtained would be based on smaller sample sizes as the threshold increased, with the consequence that important information in the sample would be disregarded. Careful study of Figs. 1 and 2 reveals subtle differences among the estimated CDFs obtained with the eight MGFEs, depending on the weights they assign to each tail of the CDF.

The upper part of Fig. 3 is a Cartesian representation of the estimates $(\tilde{\theta}, \tilde{k})$ obtained using all the estimation methods considered in this section. The lower part of Fig. 3 shows the corresponding values of the mean $\mu$ and range $\tilde{\theta}/\tilde{k}$ estimated for $\{X - 7.5 \mid X > 7.5\}$. The standard errors of $\tilde{\theta}$ and $\tilde{k}$ can be obtained by simulation; for example, using $u = 7.5$ and 1 random samples, the resulting standard errors for the MGFEs based on the ADR statistic are .168 and .86, respectively.

Fig. 3 shows that the MGFEs provided by the AD, ADR and ADL statistics and the MOM and PWM estimates are very close to each other, thus appearing as a compact cluster of estimates in the figure. The KS and CM statistics provide close but somewhat smaller estimates for $\theta$ and $k$, and larger estimates for $\theta/k$. The remaining estimates (EPM, QML, AD2, AD2R and AD2L) are farther apart from each other, showing larger values of $\tilde{\theta}$ and $\tilde{k}$, and smaller values of $\tilde{\theta}/\tilde{k}$; note that most of the methods in this group assign large weights to the data in the right tail (QML, AD2 and AD2R) and/or to the data in the left tail (AD2 and AD2L).

5. Performance of the MGF estimators in heterogeneous populations

One important property of the MGF estimation method is that it can be used to fit generalized linear models using the procedure in Section 3.2. This is in contrast with the MOM, PWM, EPM and QML methods of Section 4.4, which cannot be used when the information about the unknown parameters is contained in a sample $(Y_1, \ldots, Y_n)$ of independent but not necessarily identically distributed random variables $Y_i$, $i = 1, \ldots, n$.

To illustrate the performance of the MGF method, we use two generalized linear models based on the GPD. According to the first model, the CDF of $Y_i$ is given by

$$F_{\theta_1,\theta_2,k}(y) = \begin{cases} 1 - \{1 - ky/(\theta_1 + \theta_2 X_i)\}^{1/k} & \text{if } k \neq 0, \\ 1 - \exp\{-y/(\theta_1 + \theta_2 X_i)\} & \text{if } k = 0, \end{cases}$$

where $X_i$ is a covariate. We have taken $k = 2$ so that none of the methods in Section 4.4, including the ML method, can be used to estimate $\theta_1$, $\theta_2$ and $k$ with the exception of the MGF methods. We have also taken θ 1 = 1, θ 2 = 1 and X i = i/1, for $i = 1, \ldots, 100$, so that the values of the scale parameters $\theta_1 + \theta_2 X_i$ range from 1.1 to 11. Table 9 shows the estimated bias and RMSEs of the MGF estimators $\tilde{\theta}_1$, $\tilde{\theta}_2$ and $\tilde{k}$ corresponding to the eight EDF statistics of Section 2, based on 1 samples on the random vector $(Y_1, \ldots, Y_{100})$. The AD statistic appears to provide the most efficient estimator for the three unknown parameters. The second and third places are occupied by the ADL and CM statistics, respectively.
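The transformation of Section 3.2 applied to the first model can be sketched in Python as follows: each $y_i$ is mapped to $u_i = F_{i,\Theta}(y_i)$ using its own scale $\theta_1 + \theta_2 X_i$, and the AD statistic is then evaluated on the transformed sample. The covariate values, parameter values, seed and the clipping constant `eps` are illustrative assumptions rather than the paper's exact simulation design, and no optimizer is included; a full MGF fit would minimize `ad_statistic` over $(\theta_1, \theta_2, k)$.

```python
import math
import random


def gpd_cdf(y, theta, k):
    """GPD CDF of Eq. (1) with scale theta and shape k."""
    if y <= 0.0:
        return 0.0
    if k == 0.0:
        return 1.0 - math.exp(-y / theta)
    base = 1.0 - k * y / theta
    return 1.0 if base <= 0.0 else 1.0 - base ** (1.0 / k)


def gpd_quantile(p, theta, k):
    """Inverse of gpd_cdf for 0 <= p < 1."""
    if k == 0.0:
        return -theta * math.log(1.0 - p)
    return theta * (1.0 - (1.0 - p) ** k) / k


def ad_statistic(u):
    """Anderson-Darling statistic (computational form) for values u in (0, 1)."""
    z = sorted(u)
    n = len(z)
    total = sum((2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
                for i in range(n))
    return -n - total / n


def transformed_sample(y, x, theta1, theta2, k, eps=1e-12):
    """u_i = F_i(y_i) with scale theta1 + theta2 * x_i, clipped away from 0 and 1."""
    u = []
    for yi, xi in zip(y, x):
        ui = gpd_cdf(yi, theta1 + theta2 * xi, k)
        u.append(min(max(ui, eps), 1.0 - eps))
    return u


# Illustrative design (hypothetical values chosen for this sketch)
rng = random.Random(2006)
n = 100
x = [(i + 1) / n for i in range(n)]
true_params = (1.0, 10.0, 2.0)  # theta1, theta2, k
y = [gpd_quantile(rng.random(), true_params[0] + true_params[1] * xi, true_params[2])
     for xi in x]

ad_true = ad_statistic(transformed_sample(y, x, *true_params))
ad_wrong = ad_statistic(transformed_sample(y, x, 3.0, 10.0, 2.0))
print(ad_true, ad_wrong)
```

Under the generating parameters the transformed values behave like a uniform sample and the statistic stays small, whereas a misspecified $\theta_1$ distorts them and inflates the statistic; this contrast is what the minimization exploits.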

Fig. 1. Empirical and estimated generalized Pareto CDFs for the Bilbao wave exceedances data using threshold u = 7 and the MGF estimators in Section 2. (Panels: KS, CM, AD, ADR, ADL, AD2, AD2R, AD2L.)

Fig. 2. Empirical and estimated generalized Pareto CDFs for the Bilbao wave exceedances data using threshold u = 7.5 and the MGF estimators in Section 2. (Panels: KS, CM, AD, ADR, ADL, AD2, AD2R, AD2L.)

Fig. 3. Estimated values of $\theta$ versus $k$ (upper part) and the mean $\mu$ versus the range $\theta/k$ (lower part) for $\{X - 7.5 \mid X > 7.5\}$, obtained using the GPD and all the estimation methods in Section 4.4 for the Bilbao wave exceedances data. (Both panels: threshold = 7.5; plotted methods: KS, CM, AD, ADR, ADL, AD2, AD2R, AD2L, EPM, MOM, PWM, QML.)

The second model we have considered is defined by the following CDFs for the projections of the random vector $(Y_1, \ldots, Y_n)$:

$$F_{\theta,k_1,k_2}(y) = \begin{cases} 1 - \{1 - (k_1 + k_2 X_i)\, y/\theta\}^{1/(k_1 + k_2 X_i)} & \text{if } k_1 + k_2 X_i \neq 0, \\ 1 - \exp\{-y/\theta\} & \text{if } k_1 + k_2 X_i = 0, \end{cases}$$

where $X_i$ is a covariate. For illustration, we have taken k 1 = 1, k 2 = 1, θ = 1, n = 1 and X i = 2ln{(i.5)/2} so that the shape parameters $k_1 + k_2 X_i$ range from to 4.29 (note that this range is considerably larger than those used previously in the literature). Consequently, the MOM, PWM, EPM, QML and ML methods in Section 4.4 cannot be used to estimate $\theta$, $k_1$ and $k_2$. Table 10 shows the estimated bias and RMSEs of the MGF estimators $\tilde{\theta}$, $\tilde{k}_1$ and $\tilde{k}_2$ corresponding to the eight EDF statistics of Section 2, based on 1 samples on the random vector (Y 1,...,Y 1 ).

Table 9. Estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, based on 1 samples (Y 1,...,Y 1 ) on independent generalized Pareto random variables $Y_i$, i = 1,...,1, having parameters $k = 2$ and $\theta_i = \theta_1 + \theta_2 X_i$, where θ 1 = 1, θ 2 = 1 and X i = i/1, in ascending order of the RMSEs
Columns for $\theta_1$: AD, ADL, CM, KS, AD2, AD2R, ADR, AD2L
Columns for $\theta_2$: AD, ADL, CM, KS, ADR, AD2, AD2R, AD2L
Columns for $k$: AD, ADL, CM, ADR, AD2, AD2R, KS, AD2L
Rows in each block: Bias, RMSE

Table 10. Estimated bias and RMSEs for the MGFEs corresponding to the eight EDF statistics of Section 2, based on 1 samples (Y 1,...,Y 1 ) on independent generalized Pareto random variables $Y_i$, i = 1,...,1, having parameters θ = 1 and $k_i = k_1 + k_2 X_i$, where k 1 = 1, k 2 = 1 and X i = 2ln{(i.5)/2}, in ascending order of the RMSEs
Columns for $\theta$: AD, ADL, CM, ADR, AD2L, AD2, KS, AD2R
Columns for $k_1$: AD, ADL, CM, ADR, KS, AD2, AD2R, AD2L
Columns for $k_2$: AD, ADL, ADR, CM, AD2, AD2L, AD2R, KS
Rows in each block: Bias, RMSE

The AD statistic appears again to provide the most efficient estimator for the three unknown parameters, followed by the ADL statistic in second place.

6. Concluding remarks

In this article, we propose using statistics based on the empirical distribution function to estimate parameters of probability distributions in homogeneous populations and parameters of generalized linear models in heterogeneous populations. Among these statistics, we consider the classical Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling statistics, which are routinely used to perform goodness-of-fit tests in homogeneous populations. In addition, we introduce modified statistics such as the left-tail and right-tail Anderson-Darling statistics and the Anderson-Darling statistics of second degree, and illustrate their usefulness in the estimation context.

We have shown that some of the new estimation methods, generically called maximum goodness-of-fit methods, outperform the ML method to estimate the parameters of some distributions; for example, the scale parameter of the uniform distribution in Section 4.1, the location parameter of the shifted exponential distribution in Section 4.2, and the scale parameter of the generalized Pareto distribution in Section 4.4.

We have carefully considered the generalized Pareto distribution (GPD), which has recently received considerable attention after the discovery by Pickands (1975) of its close connection with the generalized extreme value distribution. The GPD has since become one of the most important distributions to model extreme values in financial, insurance, environmental, hydrological and ocean statistics, among others. It is also very interesting from the estimation point of view, because classical estimation methods such as the ML and moment methods fail for important ranges of the shape parameter of the GPD. However, we have shown in Section 4.4 that the maximum goodness-of-fit estimation methods can always be used, whatever the value of the shape parameter might be, and also that one can always find some maximum goodness-of-fit estimators having the best or close to the best efficiency among the estimators included in the analysis. In this context we have introduced a quasi-maximum likelihood method, which appears to have very good efficiency for positive values of the shape parameter.

Moreover, we have shown in Sections 3 and 5 that maximum goodness-of-fit estimation methods can be used in the context of generalized linear models. In particular, we have used two generalized Pareto models for heterogeneous populations, whose parameters cannot be estimated by any of the methods considered, including ML, quasi-maximum likelihood, moment, probability-weighted moment and elemental percentile methods. However, these generalized Pareto models can be fitted using maximum goodness-of-fit estimators. The efficiency attained by these estimators in this context compares well with the efficiency attained for homogeneous populations.

Acknowledgements

This research was partially supported by the Spanish DGI Grant MTM. I thank Professor George E.P. Box for great help during my repeated visits to the University of Wisconsin-Madison, where part of the research was performed. I am also grateful to a co-editor, an associate editor and two referees for their very helpful suggestions, which have led to an improved manuscript.

Appendix A. Some connections between the GPD and the GEVD

Pickands (1975) shows that classical limit results for sample maxima lead to parallel limit results for exceedances over thresholds. In particular, suppose that $X_1, X_2, \ldots$ is a sequence of independent and identically distributed (IID) random variables such that an appropriately normalized CDF of $M_n = \max(X_1, \ldots, X_n)$ converges to the GEVD.
Then, for a large enough threshold $u$, the CDF of $X_i - u$, conditional on $X_i > u$, converges to the GPD as $u$ increases. Both distributions share the same shape parameter $k$. A connection among the remaining parameters, which also embraces the intensity of a Poisson process, can be established using a Poisson-GPD process (see Coles, 2001; Smith, 2003) in which the number $N$ of exceedances over the level $u$ in any particular period (e.g., any year) has a Poisson distribution with mean $\lambda$ and, conditional on $N \geq 1$, the exceedances $X_1, \ldots, X_N$ are IID random variables following the GPD. Then

$$\operatorname{Prob}\{\max(X_1, \ldots, X_N) \leq x\} = e^{-\lambda} + \sum_{i=1}^{\infty} \frac{\lambda^i e^{-\lambda}}{i!} \left\{1 - \left(1 - k(x-u)/\theta\right)^{1/k}\right\}^i = \exp\left\{-\lambda \left(1 - k(x-u)/\theta\right)^{1/k}\right\}. \tag{A.1}$$

Identifying Eqs. (2) and (A.1) leads to the following relationships among the threshold $u$, the Poisson mean $\lambda$ and the parameters of the GPD and GEVD:

$$\theta = \psi \lambda^{k} = \psi - k(u - \mu).$$

In addition, if $u$ changes, the scale of the GPD becomes a function of $u$, say $\theta_u$, such that $\theta_u + ku$ stays constant independently of $u$. Clearly, the GPD provides a wider inductive basis than the GEVD because the number of exceedances over interesting thresholds in any given period is usually much larger than one, which is the number of maxima in the period. This applies to many natural phenomena such as floods, waves, winds, temperatures, or earthquakes, among others.

Appendix B. Computational forms for the EDF statistics

Using the notation $z_i = F(x_{(i)})$ and considering that $S_n(x)$ is a step function with jumps at the order statistics, the EDF statistics in Tables 1 and 2 can be written in the forms of Table B.1.
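As an illustration only, the computational forms of Table B.1 can be sketched in a few lines of Python. The function names `gpd_cdf` and `edf_statistics`, the sample values and the parameter values $k = 0.2$, $\theta = 1$ are assumptions made for this sketch and do not come from the paper; the GPD CDF is taken in the form $F(x) = 1 - (1 - kx/\theta)^{1/k}$ that appears inside Eq. (A.1).

```python
import math


def gpd_cdf(x, k, theta):
    """GPD CDF, F(x) = 1 - (1 - k*x/theta)**(1/k), as it appears in Eq. (A.1).

    Assumes x lies inside the support (x >= 0, and x < theta/k when k > 0).
    """
    if k == 0.0:
        return 1.0 - math.exp(-x / theta)  # exponential limit as k -> 0
    return 1.0 - (1.0 - k * x / theta) ** (1.0 / k)


def edf_statistics(z):
    """Return the Table B.1 statistics for z_i = F(x_(i)), with 0 < z_i < 1."""
    z = sorted(z)                                    # z_1 <= ... <= z_n
    n = len(z)
    z_rev = z[::-1]                                  # z_{n+1-i} for i = 1..n
    odd = [2 * i - 1 for i in range(1, n + 1)]       # weights 2i - 1
    mid = [(i - 0.5) / n for i in range(1, n + 1)]   # plotting positions (i - 1/2)/n

    sum_z = sum(z)
    sum_ln_z = sum(math.log(v) for v in z)
    sum_ln_1mz = sum(math.log(1.0 - v) for v in z)
    # Weighted sums, already divided by n as in the table.
    w_ln_z = sum(w * math.log(v) for w, v in zip(odd, z)) / n
    w_ln_1mz = sum(w * math.log(1.0 - v) for w, v in zip(odd, z_rev)) / n
    w_inv_z = sum(w / v for w, v in zip(odd, z)) / n
    w_inv_1mz = sum(w / (1.0 - v) for w, v in zip(odd, z_rev)) / n

    stats = {
        "KS": 1.0 / (2 * n) + max(abs(v - m) for v, m in zip(z, mid)),
        "CM": 1.0 / (12 * n) + sum((v - m) ** 2 for v, m in zip(z, mid)),
        "AD": -n - w_ln_z - w_ln_1mz,
        "ADR": n / 2.0 - 2.0 * sum_z - w_ln_1mz,
        "ADL": -3.0 * n / 2.0 + 2.0 * sum_z - w_ln_z,
        "AD2R": 2.0 * sum_ln_1mz + w_inv_1mz,
        "AD2L": 2.0 * sum_ln_z + w_inv_z,
    }
    stats["AD2"] = stats["AD2R"] + stats["AD2L"]
    return stats


# Hypothetical data, treated as exceedances from a GPD with k = 0.2, theta = 1.
sample = [0.05, 0.21, 0.40, 0.77, 1.30, 2.10]
z = [gpd_cdf(x, 0.2, 1.0) for x in sorted(sample)]
for name, value in edf_statistics(z).items():
    print(f"{name}: {value:.4f}")
```

A maximum goodness-of-fit estimate would then be obtained by minimizing one of these statistics over $(k, \theta)$ with a numerical optimizer of the reader's choice; that step is left out of the sketch. Note that the forms imply $A_n^2 = R_n^2 + L_n^2$, which gives a quick consistency check on any implementation.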

Table B.1
Computational forms for the EDF statistics

Acronym    Formula
KS      $D_n = \frac{1}{2n} + \max_{1 \leq i \leq n} \left| z_i - \frac{i - 1/2}{n} \right|$
CM      $W_n^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left( z_i - \frac{i - 1/2}{n} \right)^2$
AD      $A_n^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \{\ln z_i + \ln(1 - z_{n+1-i})\}$
ADR     $R_n^2 = \frac{n}{2} - 2 \sum_{i=1}^{n} z_i - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \ln(1 - z_{n+1-i})$
ADL     $L_n^2 = -\frac{3n}{2} + 2 \sum_{i=1}^{n} z_i - \frac{1}{n} \sum_{i=1}^{n} (2i-1) \ln z_i$
AD2R    $r_n^2 = 2 \sum_{i=1}^{n} \ln(1 - z_i) + \frac{1}{n} \sum_{i=1}^{n} \frac{2i-1}{1 - z_{n+1-i}}$
AD2L    $l_n^2 = 2 \sum_{i=1}^{n} \ln z_i + \frac{1}{n} \sum_{i=1}^{n} \frac{2i-1}{z_i}$
AD2     $a_n^2 = 2 \sum_{i=1}^{n} \{\ln z_i + \ln(1 - z_i)\} + \frac{1}{n} \sum_{i=1}^{n} \left( \frac{2i-1}{z_i} + \frac{2i-1}{1 - z_{n+1-i}} \right)$

References

Castillo, E., Hadi, A.S., 1997. Fitting the generalized Pareto distribution to data. J. Amer. Statist. Assoc. 92.
Castillo, E., Hadi, A.S., Balakrishnan, N., Sarabia, J.M., 2005. Extreme Value and Related Models with Applications in Engineering and Science. Wiley, New York.
Choulakian, V., Stephens, M.A., 2001. Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 43.
Coles, S., 2001. An Introduction to Statistical Modeling of Extreme Values. Springer, London.
Davison, A.C., Smith, R.L., 1990. Models for exceedances over high thresholds (with comments). J. Roy. Statist. Soc. B 52.
Grimshaw, S.D., 1993. Computing maximum likelihood estimates for the generalized Pareto distribution. Technometrics 35.
Hosking, J.R.M., Wallis, J.R., 1987. Parameter and quantile estimation for the generalized Pareto distribution. Technometrics 29.
Hosking, J.R.M., Wallis, J.R., Wood, E.F., 1985. Estimation of the generalized extreme-value distribution by the method of probability-weighted moments. Technometrics 27.
Kac, M., Kiefer, J., Wolfowitz, J., 1955. On tests of normality and other tests of goodness of fit based on distance methods. Ann. Math. Statist. 26.
Pickands, J., 1975. Statistical inference using extreme order statistics. Ann. Statist. 3.
Pollard, D., 1980. The minimum distance method of testing. Metrika 27.
Rao, C.R., 1973. Linear Statistical Inference and its Applications, second ed. Wiley, New York.
Smith, R.L., 2003. Statistics of Extremes, with Applications in Environment, Insurance and Finance. Department of Statistics, University of North Carolina, Chapel Hill, NC.
Walshaw, D., 1990. Discussion of "Models for exceedances over high thresholds" by A.C. Davison and R.L. Smith. J. Roy. Statist. Soc. B 52.
Wolfowitz, J., 1953. Estimation by the minimum distance method. Ann. Inst. Statist. Math. 5.
Wolfowitz, J., 1957. The minimum distance method. Ann. Math. Statist. 28.

A TEST OF FIT FOR THE GENERALIZED PARETO DISTRIBUTION BASED ON TRANSFORMS

Dimitrios Konstantinides, Simos G. Meintanis
Department of Statistics and Actuarial Science, University of the Aegean, Karlovassi,