A Note on Box-Cox Quantile Regression Estimation of the Parameters of the Generalized Pareto Distribution

A Note o Box-Cox Quatile Regressio Estimatio of the Parameters of the Geeralized Pareto Distributio JM va Zyl Abstract: Makig use of the quatile equatio, Box-Cox regressio ad Laplace distributed disturbaces, likelihood estimators are foud makig use of least absolute deviatio methods for the parameters of the geeralized Pareto distributio. Keywords: Geeralized Pareto, quatile, Box-Cox, Laplace distributio AMS classificatio: 6F0,60E05 Itroductio The parameters of the geeralized Pareto distributio (GPD) are estimated by makig use of quatile regressio ad the Box-Cox form (Zeller,996), assumig Laplace (double expoetial) distributed errors i the regressio. This assumptio leads to the least absolute estimatio (LAD) form, ofte used i quatile regressio (Kua (007), Koeker ad Hallock (00)). DasGupta ad Mishra (003) wrote a review o LAD estimatio. Some of the aspects of importace are the asymptotic ormality of LAD estimators, iterative estimatio procedures ad if the disturbaces follow a Laplace distributio, likelihood estimatio is equivalet to LAD estimatio. The properties ad a discussio of estimatio for the GPD is give i the book of Johso, Kotz ad Balakrisha (994). The two methods most widely to estimate the GPD parameters are the momet estimator (Hoskig &Wallis (987)) ad maximum likelihood estimatio (Zhag, 007), (Smith, 985). There are complicatios besides the difficult oliear aspect of the maximizatio of the likelihood, ad the aamolous behaviour of the likelihood surface is discussed by del Castillo ad Daoudi (009). Teugels ad Varoele (004) ivestigated the

use of the Box-Cox trasformatio for heavy-tailed distributios focusig o the trasformatio o the regular variatio properties of tail quatile fuctios. The GPD refers to a two-parameter distributio, but the distributio ca be writte i a geeralized 3-parameter GPD form with a locatio parameter µ, / ξ ξ ( x µ ) F( x) = +, x µ, σ > 0, ξ 0, σ σ a scale parameter ad ξ the shape parameter. The shape parameter is ofte importat to estimate whe workig with extreme values ad Pickads (975) showed that for large the tail idex over a certai threshold ca be estimated by usig the fact that the distributio of the observatios over the threshold follows a geeralized Pareto distributio. By rewritig the distributio fuctio, it follows that ξ ξ ( x µ ) ( F( x)) =. Let σ F ( ξ ) F( x) =, ξ ξ thus ( ) F ξ has the liear regressio form ( ξ ) F x 0 x 0 ( ) = β + β, β = µ / σ, β = / σ. A variatio of quatile regressio with the observatios as the depedet variable was used, but it does ot lead to the liear regressio form without approximatig the distributio (Johso, Kotz ad Balakrisha, 994), (Jagger ad Elser, 008). For most applicatios µ is zero ad the simpler form ( ξ ) F x x ( ) = β, β = / σ used. The likelihood will be derived i sectio ad the results of a simulatio study give to compare this estimator with the momet estimator. The likelihood ad simulatio results The stochastic model with a observed value ( ξ ) ( ξ y = F ) ( x) + u, where u is a error term assumed to be i.i.d. Laplace distributed with mea zero ad variace φ. It is reasoable to assume that the errors have a symmetric distributio ad the Laplace assumptio leads to the correct form for quatile regressio ad more robust estimators. The Laplace desity is

pu ( u) = exp( u θ / φ), < u <, φ > 0, φ with variace φ. The maximum likelihood estimate of φ is the mea absolute deviatio calculated about the media of the sample It will be assumed that E( u) = θ = 0. Deote a sample of size observatios from the GPD by x,..., x. The empirical j distributio values are F( x( j) ) = for the ordered sample x(),..., x ( ), Deote + the vector of observatios by ( ξ ) y, u the vector of errors. Let X x() =. x ( ) The Jacobia of the trasformatio from the u j, j =,..., to y,..., y is = j, by otig that j= J ( u y) y ξ u j y i = ( y ) ξ + i, i=j ad 0 otherwise. By followig the same methods as i Zeller (996), it follows that the loglikelihood simplifies to ( ξ ) L = cost + ( ξ + ) log( y ) log( ˆ j τ ( ξ )), where ˆ( τ ξ ) = ˆ ˆ y j β0 β x( j) j= deotes the mea absolute deviatio. β ˆ ' = [ ˆ β ˆ 0 β]' is such that it is the least absolute estimators (LAD) of β, β for the give ξ. The maximum likelihood estimators ca be foud by fidig the values of ξ ad β for which the loglikelihood is a maximum. I the case of the 3-parameter GPD iteratively reweighted least squares (IRLS) ca be used for a give value of ξ to fid the LAD estimators, the log-likelihood calculated ad the maximum over all possible ξ 's yields the estimator. The method ivolves the iterative procedure: j= ˆ ( ' ) ' ( j+ ) ( j) ( j) ( ξ ) β = X W X X W y, ( j) W a diagoal matrix with diagoal elemets: w = y Xβ, X i th row, i =,...,. ( j) ( ξ ) ( j) i i i i 3

The weights of zero or very small residuals ca be put equal to a large umber. This procedure is repeated util covergece. Momet estimators of β ca be used as a startig value for the iterative procedure. Sice the least squares estimates are ot very stable for the 3-parameter GPD samples, these estimates was ot used as startig values. x The momet estimators are ˆ σ = x( + ) ad ˆ x ξ = ( ). s s The results of a simulatio study comparig the momet estimators ad the Laplace regressio form will be give below. The usual two parameter form will be cosidered first. It is assumed that ξ = 0.35, σ =. The average estimated parameters for 500 simulatios are give i table ad. Note that ˆ β ˆ = / σ. The techique will be called the Box-Cox method i the table. Sample Size ( σ = ) Box-Cox Momet Estimator 5.0674 (0.0448) 0.8534 (0.058) 50.043 (0.06) 0.83 (0.075) 50.000 (0.0009) 0.903 (0.06) 50.05 (0.0006) 0.905 (0.0083) Table. Estimated ˆ β 's ad var( ˆ β) i brackets. The simulatio results for ξ is show i table. Sample Size ( ξ = 0.35) Box-Cox Momet Estimator 5 0.3480 (0.046) 0.930 (0.037) 50 0.3488 (0.0043) 0. (0.07) 50 0.3669 (0.0074) 0.575 (0.007) 50 0.354 (0.0039) 0.775 (0.0056) Table. Estimated ˆ's ξ ad var( ˆ ξ ) i brackets. It ca be see that the Box-Cox regressio form estimators are stable for the various sample sizes ad closer estimates tha the momet estimator, especially 4

whe estimatig ξ. It is a oliear procedure, but much easier to perform tha the maximizatio of the GPD likelihood. Show below is a histogram of 500 ˆ's ξ estimated from samples with 50 observatios, the GPD parameters are ξ = 0.35, σ =, µ = 0. The mea of the ˆ's ξ is 0.3578 ad covariace 0.0039. Figure. Histogram of 500 estimated ˆ's ξ, =50, ξ = 0.35. I the two histograms below, 500 estimated β0 ' s, β ' s from sample of size 50 for the 3-parameter GPD with parameters ξ = 0.40, σ =, µ = is show. Thus β = 0.5 ad β 0 =. The averages of the ˆ ξ ' s, ˆ β ˆ 0 ' s ad β ' s are 0.376, - 0.773 ad 0.435, with covariaces 0.088, 0.0 ad 0.048 respectively. 5

Figure. Histogram of 500 estimated β 0 = µ / σ =, =50, ξ = 0.40. Figure 3. Histogram of 500 estimated β = / σ = 0.5, =50, ξ = 0.40. 6

3. Applicatio to the absolute NYSE Composite log returs This techique was applied to the 50 largest absolute log returs of the NYSE Composite idex, daily closig values idex, for the period begiig 000 till the ed of 009. There were 54 observatios. The estimated parameters are: ˆ ξ = 0.90, ˆ σ = 0.06, ˆ µ = 0.048. The idex, the absolute log returs ad the fitted cdf of the GPD versus the empirical cdf are show i figures 4-6. figure 4. Daily closig values of the NYSE Composite Idex, 000-009. 7

figure 5. Absolute log returs, NYSE Composite Idex. Figure 6. The fitted GPD ad empirical cumulative distributio fuctio. 8

The extreme values is clearly well fitted by the GPD showig that the largest absolute log returs are regularly varyig, but the idex is greater tha 4, which is a importat coditio eeded to fit GARCH models. 4 Coclusio The quatile regressio form assumig Laplace errors, outperforms the method of momets. Although ivolvig oliear LAD estimatio, it is much easier tha maximizig the likelihood. The estimators have good asymptotic properties sice the estimators are maximum likelihood estimators. Possible refiemets ca be developed to esure fast covergece of the log-likelihood maximizatio. Refereces DasGupta, M. ad Mishra, R.K.,003. Least Absolute Deviatio Estimatio of Multi-Equatio Liear Ecoometric Models: A Study Based o Mote Carlo Experimets. North-Easter Hill Uiversity (NEHU) Ecoomics Workig Paper No. skm/0. Del Castillo, J. ad Daoudi, J., 009. Estimatio of the geeralized Pareto distributio. Statistics ad Probability Letters, 79, 684 688. Hoskig, J.R.M. ad Wallace, J.R., 987. Parameter ad quatile estimatio for the geeralized Pareto distributio. Techometrics, 9(3), 339 349. Jagger, T.H. ad Elser, J.B., 008. Modellig tropical cycloe itesity with quatile regressio. Iteratioal Joural of Climatology, 9,0, 35 36. Johso, N.L., Kotz, S. ad Balakrisha, N., 994. Cotiuous Uivariate Distributios. Volume. Wiley, New York. Koeker, R. ad Hallock, K., 00. Quatile regressio. Joural of Ecoomic Perspectives, 5, 43 56. Kua, C-M, 007. A itroductio to quatile regressio. Istitute of Ecoomics, Academia Siica. Pickads, J.,975. Statistical iferece usig extreme order statistics, A. Statistics, 3, 9 3. Smith, R.L., 987. Estimatig tails of probability distributios. A. Statist.,5, 74-07. 9

Teugels, J.F. ad Varoele, G., 004. Box-Cox trasformatios ad heavy-tailed distributios. Stochastic Methods ad Their Applicatios, 4, 3 7. Zeller, A., 996. A Itroductio to Bayesia Iferece i Ecoometrics. New York: Wiley. Zhag, J., 007. Likelihood momet estimatio for the geeralized Pareto distributio. Austr. N.Z. J. Stat. 49(), 69-77. 0