Information Matrix for Pareto(IV), Burr, and Related Distributions

COMMUNICATIONS IN STATISTICS: Theory and Methods, Vol. 32, No. 2, pp. 315–325, 2003

Vytaras Brazauskas*
Department of Mathematical Sciences, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA

ABSTRACT

In this paper, the exact form of the information matrix for Pareto(IV) and related distributions is determined. The Pareto(IV) family, being very general, includes the more specialized families of Pareto(I), Pareto(II), and Pareto(III) distributions, and the Burr family of distributions, as special cases. These distributions arise, for example, as tractable parametric models in actuarial science, economics, finance, and telecommunications. Additionally, a useful mathematical result with its own domain of importance is obtained. In particular, an explicit formula for the improper integral $\int_0^\infty (\log x)^m/(1+x)^{1+b}\,dx$, with $b>0$ and non-negative integer $m$, is derived.

*Correspondence: Vytaras Brazauskas, Department of Mathematical Sciences, University of Wisconsin-Milwaukee, P.O. Box 413, Milwaukee, WI 53201-0413, USA; E-mail: vytaras@uwm.edu.

DOI: 10.1081/STA-120018188. Copyright © 2003 by Marcel Dekker, Inc., 270 Madison Avenue, New York, NY 10016. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 0361-0926 (Print); 1532-415X (Online). www.dekker.com

Key Words: Gamma function; Information; Pareto models; Polygamma functions.

1. INTRODUCTION

In this paper, the exact form of the (Fisher) information matrix for Pareto(IV) and related distributions is determined. It is well known that the information matrix is a valuable tool for deriving the covariance matrix in the asymptotic distribution of maximum likelihood estimators (MLE). As discussed in Serfling (1980), Chapter 4, under suitable regularity conditions the determinant (divided by the sample size) of the asymptotic covariance matrix of the MLE reaches an optimal lower bound for the volume of the spread ellipsoid of joint estimators. In the univariate case, this optimality property of the MLE is widely used in robustness versus efficiency studies as a quantitative benchmark for efficiency considerations. See, for example, Brazauskas and Serfling (2000a, 2000b), Hampel et al. (1986), Huber (1981), Kimber (1983a, 1983b), and Lehmann (1983), Chapter 5.

The Pareto(IV) family, being very general, includes the more specialized families of Pareto(I), Pareto(II), and Pareto(III) distributions, and the Burr family of distributions, as special cases. These distributions are suitable for situations involving relatively high probability in the upper tail. More specifically, such models have been formulated in the context of actuarial science, economics, finance, and teletraffic, for example, for distributions of variables such as sizes of insurance claims, sizes of firms, incomes in a population of people, stock price fluctuations, and lengths of telephone calls. (See Arnold (1983) and Johnson et al. (1994), Chapter 19, for a broad discussion of Pareto models and their diverse applications.) New application contexts continue to be found. For example, Crato et al. (1997) have recently discovered Pareto-type tail behavior in the cost distributions of combinatorial search algorithms.

As a by-product of the main result, a useful mathematical formula (which plays a very important role in our derivations) is obtained. In particular, an explicit formula for the improper integral

$$\int_0^\infty \frac{(\log x)^m}{(1+x)^{1+b}}\,dx, \qquad (1.1)$$

with $b>0$ and non-negative integer $m$, is derived. Surprisingly, however, computation of integral (1.1) has not been addressed in the mathematical literature, including such comprehensive handbooks of mathematics as Abramowitz and Stegun (1972) and Harris and Stocker (1998).

A hierarchy of Pareto models, as well as their relation to the Burr family, is discussed in Sec. 2. Elements of the information matrix for Pareto(IV), Burr, and related distributions are computed in Sec. 3. The derivation of integral (1.1), along with other intermediate integrals and formulas, is presented in the Appendix.

2. PARETO(IV) AND RELATED DISTRIBUTIONS

As discussed in Arnold (1983), Chapter 3, a hierarchy of Pareto distributions is established by starting with the classical Pareto(I) distribution and subsequently introducing additional parameters which relate to location, scale, shape, and inequality. Such an approach leads to a very general family of distributions, called the Pareto(IV) family, with the cumulative distribution function

$$F(x) = 1 - \left[1 + \left(\frac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-\alpha}, \quad x > \mu, \qquad (2.1)$$

where $-\infty<\mu<\infty$ is the location parameter, $\sigma>0$ is the scale parameter, $\gamma>0$ is the inequality parameter, and $\alpha>0$ is the shape parameter which characterizes the tail of the distribution. We denote this distribution by Pareto(IV)$(\mu,\sigma,\gamma,\alpha)$.

Note that in general statistical science there is no such type of parameter as "inequality". Parameter $\gamma$ is called the inequality parameter because of its interpretation in the economics context. That is, if we choose $\mu=0$ and $\alpha=1$ in expression (2.1), then parameter $\gamma$ ($\gamma\le 1$) is precisely the Gini index of inequality.

Clearly, the other three types of Pareto distributions can be identified as special cases of the Pareto(IV) family by appropriately choosing parameters in Eq. (2.1) (see Arnold (1983), pp. 44–45):

Pareto(I)$(\sigma,\alpha)$ = Pareto(IV)$(\sigma,\sigma,1,\alpha)$,
Pareto(II)$(\mu,\sigma,\alpha)$ = Pareto(IV)$(\mu,\sigma,1,\alpha)$,
Pareto(III)$(\mu,\sigma,\gamma)$ = Pareto(IV)$(\mu,\sigma,\gamma,1)$.

The Burr family of distributions is also sufficiently flexible and enjoys long popularity in the actuarial science literature (see, for example, Daykin et al. (1994) and Klugman et al. (1998)). However, even this family can be treated as a special case of Pareto(IV):

Burr$(\sigma,\gamma,\alpha)$ = Pareto(IV)$(0,\sigma,1/\gamma,\alpha)$

(see Klugman et al. (1998), p. 574).

In order to make the Pareto(IV) distribution a regular family, we assume that parameter $\mu$ is known and, without loss of generality, equal to 0. (For regularity conditions on $F$ and their interpretation see, for example, Serfling (1980), pp. 144–145.) We also note that this assumption is not too restrictive for modeling purposes because in typical applications the lower limit of the variables of interest is known. In the insurance and reinsurance context, for example, the lower limit is pre-defined by a contract and can be represented as a deductible or a retention level. (This, perhaps, is one of the main reasons why the generality of the Burr distribution remains sufficient for actuarial modeling.)

3. INFORMATION MATRIX FOR PARETO(IV)

Suppose $X$ is a random variable with the probability density function $f_\theta(\cdot)$, where $\theta=(\theta_1,\ldots,\theta_k)$. Then the information matrix $I(\theta)$ is the $k\times k$ matrix with elements

$$I_{ij}(\theta) = E_\theta\!\left[\frac{\partial \log f_\theta(X)}{\partial\theta_i}\,\frac{\partial \log f_\theta(X)}{\partial\theta_j}\right].$$

For the Pareto(IV)$(0,\sigma,\gamma,\alpha)$ distribution we have $(\theta_1,\theta_2,\theta_3)=(\sigma,\gamma,\alpha)$ and the density function

$$f_\theta(x) = \frac{\alpha\,(x/\sigma)^{1/\gamma-1}}{\gamma\sigma\left(1+(x/\sigma)^{1/\gamma}\right)^{\alpha+1}}, \quad x>0. \qquad (3.1)$$

Thus, the required partial derivatives are

$$\frac{\partial \log f_\theta(x)}{\partial\theta_1} = \frac{\partial \log f_\theta(x)}{\partial\sigma} = \frac{\alpha}{\gamma\sigma} - \frac{\alpha+1}{\gamma\sigma}\cdot\frac{1}{1+(x/\sigma)^{1/\gamma}},$$

$$\frac{\partial \log f_\theta(x)}{\partial\theta_2} = \frac{\partial \log f_\theta(x)}{\partial\gamma} = -\frac{1}{\gamma} + \frac{\alpha}{\gamma^2}\log(x/\sigma) - \frac{\alpha+1}{\gamma^2}\cdot\frac{\log(x/\sigma)}{1+(x/\sigma)^{1/\gamma}},$$

$$\frac{\partial \log f_\theta(x)}{\partial\theta_3} = \frac{\partial \log f_\theta(x)}{\partial\alpha} = \frac{1}{\alpha} - \log\!\left(1+(x/\sigma)^{1/\gamma}\right).$$
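The distribution function, density, and score components of this section can be sketched in a few lines of Python. This is a minimal illustration, not part of the original paper: it assumes the location is fixed at $\mu=0$, takes the Pareto(IV) form $F(x)=1-[1+(x/\sigma)^{1/\gamma}]^{-\alpha}$, and uses hypothetical function names (`pareto4_cdf`, `pareto4_logpdf`, `pareto4_quantile`, `pareto4_score`).

```python
import math

def pareto4_cdf(x, sigma, gamma, alpha):
    """CDF of Pareto(IV)(0, sigma, gamma, alpha) at x > 0 (location assumed 0)."""
    return 1.0 - (1.0 + (x / sigma) ** (1.0 / gamma)) ** (-alpha)

def pareto4_logpdf(x, sigma, gamma, alpha):
    """Log-density corresponding to pareto4_cdf, valid for x > 0."""
    y = (x / sigma) ** (1.0 / gamma)
    return (math.log(alpha) - math.log(gamma) - math.log(sigma)
            + (1.0 / gamma - 1.0) * math.log(x / sigma)
            - (alpha + 1.0) * math.log1p(y))

def pareto4_quantile(u, sigma, gamma, alpha):
    """Inverse CDF for 0 < u < 1, obtained by solving F(x) = u for x."""
    return sigma * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** gamma

def pareto4_score(x, sigma, gamma, alpha):
    """Partial derivatives of the log-density w.r.t. (sigma, gamma, alpha)."""
    y = (x / sigma) ** (1.0 / gamma)
    w = 1.0 / (1.0 + y)              # 1 / (1 + (x/sigma)^(1/gamma))
    lx = math.log(x / sigma)
    d_sigma = alpha / (gamma * sigma) - (alpha + 1.0) / (gamma * sigma) * w
    d_gamma = (-1.0 / gamma + alpha * lx / gamma ** 2
               - (alpha + 1.0) * lx * w / gamma ** 2)
    d_alpha = 1.0 / alpha - math.log1p(y)
    return (d_sigma, d_gamma, d_alpha)
```

A uniform draw `u` passed through `pareto4_quantile` gives a Pareto(IV) variate, which is one simple way to simulate from the model when checking the information matrix by Monte Carlo.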

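Note that for this problem the first- and second-order partial derivatives of $f_\theta(\cdot)$ with respect to $\theta_i$ and $\theta_j$ exist and, therefore, an alternative formula (based on the second-order derivatives) for computation of the information matrix elements may be used. We found, however, that except for a couple of cases this approach does not simplify our derivations. Therefore, it is not pursued here.

Since the information matrix $I(\theta)$ is symmetric, it is enough to find elements $I_{11}(\theta)$, $I_{12}(\theta)$, $I_{13}(\theta)$, $I_{22}(\theta)$, $I_{23}(\theta)$, and $I_{33}(\theta)$. Derivation of these elements is based on the following strategy: first, we express each $I_{ij}(\theta)$, $1\le i\le j\le 3$, in terms of integrals $A_1$–$A_5$, $B_1$–$B_4$, $C_1$, $C_2$, $D_1$, $D_2$, which are defined (and their explicit formulas are presented) in Appendix A.2; then tedious algebraic simplifications yield the following formulas. All integrals are over $(0,\infty)$.

$$I_{11}(\theta) = \int \left(\frac{\partial \log f_\theta(x)}{\partial\sigma}\right)^{2} f_\theta(x)\,dx = \frac{1}{\gamma^2\sigma^2}\left[\alpha^2 - 2\alpha(\alpha+1)A_1 + (\alpha+1)^2 A_2\right] = \frac{\alpha}{\gamma^2\sigma^2(\alpha+2)},$$

$$I_{12}(\theta) = \int \frac{\partial \log f_\theta(x)}{\partial\sigma}\,\frac{\partial \log f_\theta(x)}{\partial\gamma}\, f_\theta(x)\,dx = \frac{\alpha^2}{\gamma^3\sigma}B_1 - \frac{2\alpha(\alpha+1)}{\gamma^3\sigma}B_2 + \frac{(\alpha+1)^2}{\gamma^3\sigma}B_3 = -\frac{1}{\gamma^2\sigma(\alpha+2)}\left\{\alpha\left[\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \Gamma'(1) - 1\right] + 1\right\}$$

(the terms free of $\log(x/\sigma)$ cancel because $(\alpha+1)A_1=\alpha$),

$$I_{13}(\theta) = \int \frac{\partial \log f_\theta(x)}{\partial\sigma}\,\frac{\partial \log f_\theta(x)}{\partial\alpha}\, f_\theta(x)\,dx = \frac{1}{\gamma\sigma}\left[1 - \alpha A_3 - \frac{\alpha+1}{\alpha}A_1 + (\alpha+1)A_5\right] = -\frac{1}{\gamma\sigma(\alpha+1)},$$

$$I_{22}(\theta) = \int \left(\frac{\partial \log f_\theta(x)}{\partial\gamma}\right)^{2} f_\theta(x)\,dx = \frac{1}{\gamma^2} - \frac{2\alpha}{\gamma^3}B_1 + \frac{2(\alpha+1)}{\gamma^3}B_2 + \frac{\alpha^2}{\gamma^4}B_4 - \frac{2\alpha(\alpha+1)}{\gamma^4}C_1 + \frac{(\alpha+1)^2}{\gamma^4}C_2$$
$$\phantom{I_{22}(\theta)} = \frac{1}{\gamma^2(\alpha+2)}\left\{\alpha\left[\frac{\Gamma''(\alpha)}{\Gamma(\alpha)} - 2\Gamma'(1)\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \Gamma''(1)\right] - 2(\alpha-1)\left[\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \Gamma'(1)\right] + \alpha\right\},$$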

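$$I_{23}(\theta) = \int \frac{\partial \log f_\theta(x)}{\partial\gamma}\,\frac{\partial \log f_\theta(x)}{\partial\alpha}\, f_\theta(x)\,dx = -\frac{1}{\alpha\gamma} + \frac{1}{\gamma^2}B_1 - \frac{\alpha+1}{\alpha\gamma^2}B_2 - \frac{\alpha}{\gamma^2}D_1 + \frac{\alpha+1}{\gamma^2}D_2 + \frac{1}{\gamma}A_3 = \frac{1}{\gamma(\alpha+1)}\left[\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \Gamma'(1) - 1\right],$$

$$I_{33}(\theta) = \int \left(\frac{\partial \log f_\theta(x)}{\partial\alpha}\right)^{2} f_\theta(x)\,dx = \frac{1}{\alpha^2} - \frac{2}{\alpha}A_3 + A_4 = \frac{1}{\alpha^2}.$$

Finally, elements $I_{12}(\theta)$, $I_{22}(\theta)$, and $I_{23}(\theta)$ can be written in a simpler form by using the polygamma functions $\Psi^{(n)}(a) = (d^n/da^n)\left(\Gamma'(a)/\Gamma(a)\right)$ for $a>0$ and integer $n\ge 0$. Specifically, we use the digamma $\Psi(a)=\Psi^{(0)}(a)$ and trigamma $\Psi'(a)$ functions. Thus, the information matrix $I_P(\theta)$ for the Pareto(IV)$(0,\sigma,\gamma,\alpha)$ distribution, with $\theta=(\sigma,\gamma,\alpha)$, is given by:

$$I_P(\theta) = \begin{pmatrix}
\dfrac{\alpha}{\gamma^2\sigma^2(\alpha+2)} & -\dfrac{\alpha[\Psi(\alpha)-\Psi(1)-1]+1}{\gamma^2\sigma(\alpha+2)} & -\dfrac{1}{\gamma\sigma(\alpha+1)} \\[2ex]
-\dfrac{\alpha[\Psi(\alpha)-\Psi(1)-1]+1}{\gamma^2\sigma(\alpha+2)} & \dfrac{\alpha\left[(\Psi(\alpha)-\Psi(1)-1)^2+\Psi'(\alpha)+\Psi'(1)\right]+2[\Psi(\alpha)-\Psi(1)]}{\gamma^2(\alpha+2)} & \dfrac{\Psi(\alpha)-\Psi(1)-1}{\gamma(\alpha+1)} \\[2ex]
-\dfrac{1}{\gamma\sigma(\alpha+1)} & \dfrac{\Psi(\alpha)-\Psi(1)-1}{\gamma(\alpha+1)} & \dfrac{1}{\alpha^2}
\end{pmatrix}.$$

3.1. Special Cases

3.1.1. Burr$(\sigma,\gamma,\alpha)$ Distribution

Since the Burr distribution is a reparametrization of Pareto(IV)$(0,\sigma,\gamma,\alpha)$, with $\gamma$ replaced by $1/\gamma$, it follows from Lehmann (1983), Sec. 2.7, that its information matrix $I_B(\theta)$ can be derived from $J\,I_P(\theta)\,J^{\top}$, where $J$ is the Jacobian matrix of the transformation of variables. Thus, $I_B(\theta)$, with rows and columns ordered as $(\sigma,\gamma,\alpha)$, is given by:

$$I_B(\theta) = \begin{pmatrix}
\dfrac{\alpha\gamma^2}{\sigma^2(\alpha+2)} & \dfrac{\alpha[\Psi(\alpha)-\Psi(1)-1]+1}{\sigma(\alpha+2)} & -\dfrac{\gamma}{\sigma(\alpha+1)} \\[2ex]
\dfrac{\alpha[\Psi(\alpha)-\Psi(1)-1]+1}{\sigma(\alpha+2)} & \dfrac{\alpha\left[(\Psi(\alpha)-\Psi(1)-1)^2+\Psi'(\alpha)+\Psi'(1)\right]+2[\Psi(\alpha)-\Psi(1)]}{\gamma^2(\alpha+2)} & -\dfrac{\Psi(\alpha)-\Psi(1)-1}{\gamma(\alpha+1)} \\[2ex]
-\dfrac{\gamma}{\sigma(\alpha+1)} & -\dfrac{\Psi(\alpha)-\Psi(1)-1}{\gamma(\alpha+1)} & \dfrac{1}{\alpha^2}
\end{pmatrix}.$$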
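The closed-form matrices of this section can be evaluated with a short Python sketch. It is offered as an illustration under stated assumptions, not as the author's code: the parameter order is taken to be $(\sigma,\gamma,\alpha)$ with $\mu=0$; the entries coded below are the polygamma forms obtained from the $A$–$D$ decomposition; the Burr case is assumed to correspond to replacing $\gamma$ by $1/\gamma$, so the Jacobian is $\mathrm{diag}(1,-1/\gamma^2,1)$; and `digamma` and `trigamma` are hypothetical helpers built from the standard recurrence plus an asymptotic series so that only the standard library is needed.

```python
import math

def digamma(x):
    """Psi(x) for x > 0: shift up with Psi(x) = Psi(x+1) - 1/x, then asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (acc + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))

def trigamma(x):
    """Psi'(x) for x > 0: shift up with Psi'(x) = Psi'(x+1) + 1/x^2, then asymptotic series."""
    acc = 0.0
    while x < 10.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (acc + 1.0 / x + 0.5 * inv2
            + (inv2 / x) * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0)))

def pareto4_information(sigma, gamma, alpha):
    """3x3 information matrix of Pareto(IV)(0, sigma, gamma, alpha), order (sigma, gamma, alpha)."""
    d = digamma(alpha) - digamma(1.0)      # Psi(alpha) - Psi(1)
    p = trigamma(alpha) + trigamma(1.0)    # Psi'(alpha) + Psi'(1)
    i11 = alpha / (gamma ** 2 * sigma ** 2 * (alpha + 2.0))
    i12 = -(alpha * (d - 1.0) + 1.0) / (gamma ** 2 * sigma * (alpha + 2.0))
    i13 = -1.0 / (gamma * sigma * (alpha + 1.0))
    i22 = (alpha * ((d - 1.0) ** 2 + p) + 2.0 * d) / (gamma ** 2 * (alpha + 2.0))
    i23 = (d - 1.0) / (gamma * (alpha + 1.0))
    i33 = 1.0 / alpha ** 2
    return [[i11, i12, i13], [i12, i22, i23], [i13, i23, i33]]

def burr_information(sigma, gamma_b, alpha):
    """Burr information matrix via J * I_P * J^T with Pareto(IV) gamma = 1 / gamma_b."""
    ip = pareto4_information(sigma, 1.0 / gamma_b, alpha)
    jac = [1.0, -1.0 / gamma_b ** 2, 1.0]  # diagonal of the Jacobian
    return [[jac[i] * ip[i][j] * jac[j] for j in range(3)] for i in range(3)]
```

Setting `alpha = 1.0` or `gamma = 1.0` and dropping the corresponding row and column gives the two-parameter Pareto(III) and Pareto(II) matrices, which is a convenient sanity check on the general formula.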

3.1.2. Pareto(III)$(0,\sigma,\gamma)$ Distribution

This is a special case of Pareto(IV) with $\alpha=1$. Therefore, the third row and third column of $I_P(\theta)$ (these represent information about parameter $\alpha$) are dropped, and we substitute $\alpha=1$ into the expressions of the remaining elements. This yields

$$I(\sigma,\gamma) = \begin{pmatrix} \dfrac{1}{3\gamma^2\sigma^2} & 0 \\[2ex] 0 & \dfrac{2\Psi'(1)+1}{3\gamma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{3\gamma^2\sigma^2} & 0 \\[2ex] 0 & \dfrac{\pi^2+3}{9\gamma^2} \end{pmatrix}.$$

3.1.3. Pareto(II)$(0,\sigma,\alpha)$ Distribution

This is a special case of Pareto(IV) with $\gamma=1$. Therefore, the second row and second column of $I_P(\theta)$ (these represent information about parameter $\gamma$) are dropped, and we substitute $\gamma=1$ into the remaining formulas. This yields

$$I(\sigma,\alpha) = \begin{pmatrix} \dfrac{\alpha}{\sigma^2(\alpha+2)} & -\dfrac{1}{\sigma(\alpha+1)} \\[2ex] -\dfrac{1}{\sigma(\alpha+1)} & \dfrac{1}{\alpha^2} \end{pmatrix}.$$

It should be noted here that the information matrices $I(\sigma,\gamma)$ and $I(\sigma,\alpha)$ are readily available in Arnold (1983).

APPENDIX

A.1. Three Lemmas

Lemma 1. For $b>0$ and non-negative integer $m$,

$$\int_0^\infty \frac{(\log x)^m}{(1+x)^{1+b}}\,dx = \frac{1}{\Gamma(1+b)}\sum_{i=0}^{m}\binom{m}{i}(-1)^{m-i}\,\Gamma^{(i)}(1)\,\Gamma^{(m-i)}(b), \qquad (A.1)$$

where $\Gamma(a)=\int_0^\infty x^{a-1}e^{-x}\,dx$ and $\Gamma^{(n)}(a)=\int_0^\infty (\log x)^n x^{a-1}e^{-x}\,dx$ are the gamma function and its $n$-th order derivative, respectively.

Proof. We start with the following fact (Abramowitz and Stegun (1972), p. 255):

$$x^{-a} = \frac{1}{\Gamma(a)}\int_0^\infty t^{a-1}e^{-xt}\,dt, \quad x>0,\ a>0.$$

Next, after applying the above formula (with $1+x$ in place of $x$ and $a=1+b$) to the integral in Eq. (A.1) and interchanging the order of integration, we have

$$\int_0^\infty \frac{(\log x)^m}{(1+x)^{1+b}}\,dx = \frac{1}{\Gamma(1+b)}\int_0^\infty t^{b}e^{-t}\int_0^\infty e^{-tx}(\log x)^m\,dx\,dt.$$

Further, the substitution $z=tx$ and application of the binomial formula to $(\log z-\log t)^m$ lead to the following expression:

$$\frac{1}{\Gamma(1+b)}\int_0^\infty\!\!\int_0^\infty t^{b-1}e^{-t}\left[\sum_{i=0}^{m}\binom{m}{i}(\log z)^i(-\log t)^{m-i}\right]e^{-z}\,dz\,dt.$$

Finally, simple reorganization of terms yields the result. □

Lemma 2. For $b>1$ and integer $n\ge 1$,

$$\Gamma^{(n)}(b) = (b-1)\,\Gamma^{(n)}(b-1) + n\,\Gamma^{(n-1)}(b-1), \qquad (A.2)$$

where $\Gamma(\cdot)$ and $\Gamma^{(n)}(\cdot)$ denote the gamma function and its $n$-th order derivative, respectively.

Proof. Integration by parts of the first derivative of the gamma function, $\Gamma'(b)$, leads to

$$\Gamma'(b) = (b-1)\,\Gamma'(b-1) + \Gamma(b-1).$$

Differentiation of this equation $n-1$ times yields Eq. (A.2). □

Lemma 3. For $b>0$,

$$\int_0^\infty \frac{\log(1+x)\,\log x}{(1+x)^{1+b}}\,dx = \frac{1}{b}\left(\frac{\Gamma''(b)}{\Gamma(b)} - \left(\frac{\Gamma'(b)}{\Gamma(b)}\right)^{2} - \frac{1}{b}\left[\frac{\Gamma'(b)}{\Gamma(b)} - \Gamma'(1)\right]\right), \qquad (A.3)$$

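where $\Gamma(\cdot)$ and $\Gamma^{(n)}(\cdot)$ denote the gamma function and its $n$-th order derivative, respectively.

Proof. Note that

$$\int_0^\infty \frac{\log(1+x)\,\log x}{(1+x)^{1+b}}\,dx = -\frac{d}{db}\left[\int_0^\infty \frac{\log x}{(1+x)^{1+b}}\,dx\right].$$

Formula (A.1) with $m=1$ and differentiation of the right-hand side yield (A.3). □

A.2. Useful Integrals

In all expressions below, function $f_\theta(x)$ is the Pareto(IV)$(0,\sigma,\gamma,\alpha)$ density function given by Eq. (3.1), and all integrals are over $(0,\infty)$.

Integrals $A_1$–$A_5$ are derived via straightforward integration.

$$A_1 = \int \frac{f_\theta(x)}{1+(x/\sigma)^{1/\gamma}}\,dx = \frac{\alpha}{\alpha+1},$$
$$A_2 = \int \frac{f_\theta(x)}{\left(1+(x/\sigma)^{1/\gamma}\right)^{2}}\,dx = \frac{\alpha}{\alpha+2},$$
$$A_3 = \int \log\!\left(1+(x/\sigma)^{1/\gamma}\right) f_\theta(x)\,dx = \frac{1}{\alpha},$$
$$A_4 = \int \log^2\!\left(1+(x/\sigma)^{1/\gamma}\right) f_\theta(x)\,dx = \frac{2}{\alpha^2},$$
$$A_5 = \int \frac{\log\!\left(1+(x/\sigma)^{1/\gamma}\right)}{1+(x/\sigma)^{1/\gamma}}\, f_\theta(x)\,dx = \frac{\alpha}{(\alpha+1)^2}.$$

Integrals $B_1$–$B_4$ are computed by making use of formula (A.1).

$$B_1 = \int \log(x/\sigma)\, f_\theta(x)\,dx = \gamma\left[\Gamma'(1) - \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}\right],$$
$$B_2 = \int \frac{\log(x/\sigma)}{1+(x/\sigma)^{1/\gamma}}\, f_\theta(x)\,dx = \frac{\alpha\gamma}{\alpha+1}\left[\Gamma'(1) - \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \frac{1}{\alpha}\right],$$
$$B_3 = \int \frac{\log(x/\sigma)}{\left(1+(x/\sigma)^{1/\gamma}\right)^{2}}\, f_\theta(x)\,dx = \frac{\alpha\gamma}{\alpha+2}\left[\Gamma'(1) - \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \frac{2\alpha+1}{\alpha(\alpha+1)}\right],$$
$$B_4 = \int \log^2(x/\sigma)\, f_\theta(x)\,dx = \gamma^2\left[\Gamma''(1) - 2\Gamma'(1)\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \frac{\Gamma''(\alpha)}{\Gamma(\alpha)}\right].$$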
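Lemmas 1 and 3 lend themselves to a direct numerical check. The sketch below is illustrative only and rests on several assumptions that are not in the paper: $b$ is restricted to positive integers so that $\Gamma'(b)$ and $\Gamma''(b)$ follow from harmonic sums, $m$ is limited to 0, 1, or 2, and the integrals are approximated by the trapezoid rule after the substitution $x=e^t$, which turns the integrand into a smooth function with exponentially decaying tails. All function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, equal to -Gamma'(1)

def gamma_derivs(b):
    """Return (Gamma(b), Gamma'(b), Gamma''(b)) for a positive integer b."""
    psi = -EULER_GAMMA + sum(1.0 / k for k in range(1, b))             # digamma(b)
    psi1 = math.pi ** 2 / 6.0 - sum(1.0 / k ** 2 for k in range(1, b))  # trigamma(b)
    g = math.gamma(b)
    return (g, g * psi, g * (psi1 + psi ** 2))

def lemma1_rhs(m, b):
    """Right-hand side of (A.1) for m in {0, 1, 2} and positive integer b."""
    d1, db = gamma_derivs(1), gamma_derivs(b)
    total = sum(math.comb(m, i) * (-1) ** (m - i) * d1[i] * db[m - i]
                for i in range(m + 1))
    return total / math.gamma(1 + b)

def lemma3_rhs(b):
    """Right-hand side of (A.3) for positive integer b."""
    g, g1, g2 = gamma_derivs(b)
    return (g2 / g - (g1 / g) ** 2 - (g1 / g + EULER_GAMMA) / b) / b

def integrate_log_scale(weight, b, lo=-60.0, hi=60.0, n=24000):
    """Trapezoid rule for the integral of weight(log x) / (1+x)^(1+b) over x > 0."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        t = lo + k * h
        # dx = e^t dt, so the kernel is e^t / (1 + e^t)^(1+b), computed on the log scale
        val = weight(t) * math.exp(t - (1 + b) * math.log1p(math.exp(t)))
        total += val if 0 < k < n else 0.5 * val
    return total * h

def lemma1_lhs(m, b):
    return integrate_log_scale(lambda t: t ** m, b)

def lemma3_lhs(b):
    return integrate_log_scale(lambda t: t * math.log1p(math.exp(t)), b)
```

For instance, with $m=1$ and $b=1$ both sides of (A.1) vanish, and with $b=1$ both sides of (A.3) equal $\pi^2/6$; comparing `lemma1_lhs(m, b)` with `lemma1_rhs(m, b)` for other small integers gives a quick guard against sign or index slips when the lemmas are reused.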

Lemmas 1 and 2 are applied in the computation of integrals $C_1$ and $C_2$.

$$C_1 = \int \frac{\log^2(x/\sigma)}{1+(x/\sigma)^{1/\gamma}}\, f_\theta(x)\,dx = \frac{\alpha\gamma^2}{\alpha+1}\left[\Gamma''(1) - 2\Gamma'(1)\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \frac{\Gamma''(\alpha)}{\Gamma(\alpha)} + \frac{2}{\alpha}\left(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \Gamma'(1)\right)\right],$$

$$C_2 = \int \frac{\log^2(x/\sigma)}{\left(1+(x/\sigma)^{1/\gamma}\right)^{2}}\, f_\theta(x)\,dx = \frac{\alpha\gamma^2}{\alpha+2}\left[\Gamma''(1) - 2\Gamma'(1)\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + \frac{\Gamma''(\alpha)}{\Gamma(\alpha)} + \frac{2(2\alpha+1)}{\alpha(\alpha+1)}\left(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \Gamma'(1)\right) + \frac{2}{\alpha(\alpha+1)}\right].$$

Integrals $D_1$ and $D_2$ are derived by employing Lemmas 1–3.

$$D_1 = \int \log(x/\sigma)\,\log\!\left(1+(x/\sigma)^{1/\gamma}\right) f_\theta(x)\,dx = \gamma\left[\frac{\Gamma''(\alpha)}{\Gamma(\alpha)} - \left(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}\right)^{2} - \frac{1}{\alpha}\left(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \Gamma'(1)\right)\right],$$

$$D_2 = \int \frac{\log(x/\sigma)\,\log\!\left(1+(x/\sigma)^{1/\gamma}\right)}{1+(x/\sigma)^{1/\gamma}}\, f_\theta(x)\,dx = \frac{\alpha\gamma}{\alpha+1}\left[\frac{\Gamma''(\alpha)}{\Gamma(\alpha)} - \left(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}\right)^{2} - \frac{1}{\alpha+1}\left(\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \Gamma'(1)\right) - \frac{2\alpha+1}{\alpha^2(\alpha+1)}\right].$$

REFERENCES

Abramowitz, M., Stegun, I. A. (1972). Handbook of Mathematical Functions. National Bureau of Standards, Applied Mathematics Series, No. 55.

Arnold, B. C. (1983). Pareto Distributions. Fairland, Maryland: International Cooperative Publishing House.

Brazauskas, V., Serfling, R. (2000a). Robust estimation of tail parameters for two-parameter Pareto and exponential models via generalized quantile statistics. Extremes 3(3):231–249.

Brazauskas, V., Serfling, R. (2000b). Robust and efficient estimation of the tail index of a single-parameter Pareto distribution. North American Actuarial Journal 4(4):12–27.

Daykin, C. D., Pentikäinen, T., Pesonen, M. (1994). Practical Risk Theory for Actuaries. London: Chapman and Hall.

Gomes, C. P., Selman, B., Crato, N. (1997). Heavy-tailed distributions in combinatorial search. In: Smolka, G., ed. Principles and Practice of Constraint Programming, CP-97. Lecture Notes in Computer Science. Vol. 1330, 121–135.

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., Stahel, W. A. (1986). Robust Statistics: The Approach Based on Influence Functions. New York: Wiley.

Harris, J. W., Stocker, H. (1998). Handbook of Mathematics and Computational Science. New York: Springer.

Huber, P. J. (1981). Robust Statistics. New York: Wiley.

Johnson, N. L., Kotz, S., Balakrishnan, N. (1994). Continuous Univariate Distributions. Vol. 1. 2nd ed. New York: Wiley.

Kimber, A. C. (1983a). Trimming in gamma samples. Applied Statistics 32(1):7–14.

Kimber, A. C. (1983b). Comparison of some robust estimators of scale in gamma samples with known shape. Journal of Statistical Computation and Simulation 18:273–286.

Klugman, S. A., Panjer, H. H., Willmot, G. E. (1998). Loss Models: From Data to Decisions. New York: Wiley.

Lehmann, E. L. (1983). Theory of Point Estimation. New York: Wiley.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.