RC 14492 (6497) 3/22/89, revised 7/15/96
Mathematics   9 pages

Research Report

Some theoretical results concerning L-moments

J. R. M. Hosking

IBM Research Division
T. J. Watson Research Center
Yorktown Heights, NY 10598
Some theoretical results concerning L-moments

J. R. M. Hosking
IBM Research Division
P. O. Box 218
Yorktown Heights, NY 10598

Abstract. L-moments and L-moment ratios are quantities useful in the summarization and estimation of probability distributions. Hosking (J. R. Statist. Soc. B, 1990) describes the theory and applications of L-moments. Here we give an expanded discussion of some of the theory in Hosking (1990), including in particular proofs of Theorems 2.1, 2.2 and 2.3 of that paper.
1. L-moments: definitions and basic properties

Let $X$ be a real-valued random variable with cumulative distribution function $F(x)$ and quantile function $x(F)$, and let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ be the order statistics of a random sample of size $n$ drawn from the distribution of $X$. Define the L-moments of $X$ to be the quantities
\[
\lambda_r = r^{-1} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} E X_{r-k:r}, \qquad r = 1, 2, \ldots. \tag{1}
\]
The "L" in "L-moments" emphasizes that $\lambda_r$ is a linear function of the expected order statistics. Furthermore, as noted in Hosking (1990, section 3), the natural estimator of $\lambda_r$ based on an observed sample of data is a linear combination of the ordered data values, i.e. an L-statistic.

The expectation of an order statistic may be written as
\[
E X_{j:r} = \frac{r!}{(j-1)!\,(r-j)!} \int x\, \{F(x)\}^{j-1} \{1-F(x)\}^{r-j}\, dF(x)
\]
(David, 1981, p. 33). Substituting this expression in (1), expanding the binomials in $F(x)$ and summing the coefficients of each power of $F(x)$ gives
\[
\lambda_r = \int_0^1 x(F)\, P^*_{r-1}(F)\, dF, \qquad r = 1, 2, \ldots, \tag{2}
\]
where
\[
P^*_r(F) = \sum_{k=0}^{r} p^*_{r,k}\, F^k
\qquad \text{and} \qquad
p^*_{r,k} = (-1)^{r-k} \binom{r}{k} \binom{r+k}{k}.
\]
$P^*_r(F)$ is the $r$th shifted Legendre polynomial, related to the usual Legendre polynomials $P_r(u)$ by $P^*_r(u) = P_r(2u-1)$. Shifted Legendre polynomials are orthogonal on the interval $(0, 1)$ with constant weight function (Lanczos, 1957, p. 286; though his $P^*_r(\,\cdot\,)$ differs by a factor $(-1)^r$ from ours). The first few L-moments are
\[
\begin{aligned}
\lambda_1 &= E X = \int_0^1 x\, dF, \\
\lambda_2 &= \tfrac{1}{2} E(X_{2:2} - X_{1:2}) = \int_0^1 x\,(2F-1)\, dF, \\
\lambda_3 &= \tfrac{1}{3} E(X_{3:3} - 2X_{2:3} + X_{1:3}) = \int_0^1 x\,(6F^2 - 6F + 1)\, dF, \\
\lambda_4 &= \tfrac{1}{4} E(X_{4:4} - 3X_{3:4} + 3X_{2:4} - X_{1:4}) = \int_0^1 x\,(20F^3 - 30F^2 + 12F - 1)\, dF.
\end{aligned}
\]
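As an illustrative aside, the L-statistic estimators mentioned above can be sketched in code. In the sketch below (which follows the standard unbiased probability-weighted-moment estimators of Hosking, 1990, rather than anything derived in this report), the quantities `b[r]` estimate probability-weighted moments, and the coefficients are those of the shifted Legendre polynomials in the display above.

```python
from math import comb

def sample_l_moments(data):
    """First four sample L-moments (requires at least 4 data values).

    Sketch of the L-statistic estimators: b[r] are unbiased estimators of
    probability-weighted moments, and the coefficients below are those of
    the shifted Legendre polynomials for lambda_1, ..., lambda_4."""
    x = sorted(data)
    n = len(x)
    # b_r = n^{-1} * sum_{i=1}^{n} [C(i-1, r) / C(n-1, r)] * x_{i:n}
    b = [sum(comb(i - 1, r) * x[i - 1] for i in range(1, n + 1))
         / (n * comb(n - 1, r)) for r in range(4)]
    l1 = b[0]                                      # lambda_1 (the mean)
    l2 = 2 * b[1] - b[0]                           # lambda_2
    l3 = 6 * b[2] - 6 * b[1] + b[0]                # lambda_3
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]  # lambda_4
    return l1, l2, l3, l4
```

For the sample $\{1, 2, 3, 4\}$ this gives $l_1 = 2.5$ and $l_2 = 5/6$, with $l_3 = l_4 = 0$ by the symmetry of the sample.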
The use of L-moments to describe probability distributions is justified by the following theorem.

Theorem 1. (i) The L-moments $\lambda_r$, $r = 1, 2, \ldots$, of a real-valued random variable $X$ exist if and only if $X$ has finite mean. (ii) A distribution whose mean exists is characterized by its L-moments $\{\lambda_r : r = 1, 2, \ldots\}$.

Proof. A finite mean implies finite expectations of all order statistics (David, 1981, p. 33), whence part (i) follows immediately. For part (ii), we first show that a distribution is characterized by the set $\{E X_{r:r} : r = 1, 2, \ldots\}$. This has been proved by Chan (1967) and Konheim (1971); the following proof is essentially Konheim's. Let $X$ and $Y$ be random variables with cumulative distribution functions $F$ and $G$ and quantile functions $x(u)$ and $y(u)$ respectively. Let
\[
\mu_r^{(X)} = E X_{r:r} = r \int x\,\{F(x)\}^{r-1}\, dF(x), \qquad
\mu_r^{(Y)} = E Y_{r:r} = r \int x\,\{G(x)\}^{r-1}\, dG(x).
\]
Then
\[
\mu_{r+2}^{(X)} - \mu_{r+1}^{(X)}
= \int_0^1 \{(r+2)u^{r+1} - (r+1)u^{r}\}\, x(u)\, du
= \int_0^1 u^{r} \cdot u(1-u)\, dx(u)
= \int_0^1 u^{r}\, dz_X(u),
\]
the middle equality following on integration by parts, where $z_X(u)$, defined by $dz_X(u) = u(1-u)\,dx(u)$, is an increasing function on $(0, 1)$. If $\mu_r^{(X)} = \mu_r^{(Y)}$, $r = 1, 2, \ldots$, then
\[
\int_0^1 u^r\, dz_X(u) = \int_0^1 u^r\, dz_Y(u), \qquad r = 0, 1, \ldots.
\]
Thus $z_X$ and $z_Y$ are distributions which have the same moments on the finite interval $(0, 1)$; consequently (Feller, 1971, pp. 222–224), $z_X = z_Y$. This implies that $x(u) = y(u)$. We have shown that a distribution with finite mean is characterized by the set $\{\mu_r : r = 1, 2, \ldots\}$. Using (2), we have
\[
\lambda_r = \sum_{k=1}^{r} p^*_{r-1,k-1}\, k^{-1} \mu_k,
\]
whence
\[
\mu_r = \sum_{k=1}^{r} (2k-1)\, \frac{r!\,(r-1)!}{(r-k)!\,(r-1+k)!}\, \lambda_k. \tag{3}
\]
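Relation (3) can be verified numerically. The sketch below uses the unit exponential distribution, for which $E X_{r:r}$ is the $r$th harmonic number $H_r$ and the first four L-moments are $1, 1/2, 1/6, 1/12$; these values are standard facts about the exponential distribution, assumed here rather than derived in this report.

```python
from fractions import Fraction
from math import factorial

def mu_from_lambda(lams):
    """E X_{r:r} from the first r L-moments via equation (3):
    mu_r = sum_{k=1}^{r} (2k-1) r!(r-1)! / ((r-k)!(r-1+k)!) * lambda_k."""
    r = len(lams)
    return sum(Fraction((2 * k - 1) * factorial(r) * factorial(r - 1),
                        factorial(r - k) * factorial(r - 1 + k)) * lams[k - 1]
               for k in range(1, r + 1))

# Unit exponential: lambda = (1, 1/2, 1/6, 1/12); E X_{r:r} should equal
# the harmonic number H_r = 1 + 1/2 + ... + 1/r (known values, for checking).
lams = [Fraction(1), Fraction(1, 2), Fraction(1, 6), Fraction(1, 12)]
mus = [mu_from_lambda(lams[:r]) for r in range(1, 5)]
```

Exact rational arithmetic is used so that the check of (3) is not clouded by rounding error.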
Thus a given set of $\lambda_r$ determines a unique set of $\mu_r$, so the characterization of a distribution in terms of the latter quantities extends to the former.

Thus a distribution may be specified by its L-moments even if some of its conventional moments do not exist. Furthermore, such a specification is always unique: this is of course not true of conventional moments. Indeed, the proof of Theorem 1 shows (in a sense) why L-moments characterize a distribution whereas conventional moments, in general, do not: characterization by L-moments reduces to the classical moment problem on a finite interval (the Hausdorff moment problem), "given $s_r = \int_0^1 u^r\, dz(u)$, $r = 0, 1, \ldots$, find $z(u)$", whereas characterization by conventional moments is the classical moment problem on an infinite interval (the Hamburger moment problem), "given $s_r = \int_{-\infty}^{\infty} u^r\, dz(u)$, $r = 0, 1, \ldots$, find $z(u)$". Only the Hausdorff problem has a unique solution.

As shown in Hosking (1990), $\lambda_2$ is a measure of the scale or dispersion of the random variable $X$. It is often convenient to standardize the higher moments $\lambda_r$, $r \ge 3$, so that they are independent of the units of measurement of $X$. Define, therefore, the L-moment ratios of $X$ to be the quantities
\[
\tau_r = \lambda_r / \lambda_2, \qquad r = 3, 4, \ldots.
\]
It is also possible to define a function of L-moments which is analogous to the coefficient of variation: this is the L-CV, $\tau = \lambda_2/\lambda_1$. Bounds on the numerical values of the L-moment ratios and L-CV are given by the following theorem.

Theorem 2. Let $X$ be a nondegenerate random variable with finite mean. Then the L-moment ratios of $X$ satisfy $|\tau_r| < 1$, $r \ge 3$. If in addition $X \ge 0$ almost surely, then $\tau$, the L-CV of $X$, satisfies $0 < \tau < 1$.

Proof. Define $Q_r(t)$ by
\[
t(1-t)\,Q_r(t) = \frac{(-1)^r}{r!}\, \frac{d^r}{dt^r} \{t(1-t)\}^{r+1};
\]
$Q_r(t)$ is the Jacobi polynomial $P_r^{(1,1)}(2t-1)$. From Szegő (1959, chap. 4) it follows that
\[
\frac{d}{dt}\{t(1-t)\,Q_r(t)\} = -(r+1)\, P^*_{r+1}(t),
\]
so integrating (2) by parts gives
\[
\lambda_r = \Bigl[\, -x\, F(x)\{1-F(x)\}\,(r-1)^{-1} Q_{r-2}(F(x)) \Bigr]
+ \int F(x)\{1-F(x)\}\,(r-1)^{-1} Q_{r-2}(F(x))\, dx.
\]
The integrated term vanishes, for finiteness of the mean ensures that $x\,F(x)\{1-F(x)\} \to 0$ as $x$ approaches the endpoints of the distribution: thus
\[
\lambda_r = \int F(x)\{1-F(x)\}\,(r-1)^{-1} Q_{r-2}(F(x))\, dx. \tag{4}
\]
Since $Q_0(t) = 1$, the case $r = 2$ gives
\[
\lambda_2 = \int F(x)\{1-F(x)\}\, dx.
\]
Now $0 \le F(x) \le 1$ for all $x$, and because $X$ is nondegenerate there exists a set of nonzero measure on which $0 < F(x) < 1$: thus $\lambda_2 > 0$. Since $F(x)\{1-F(x)\} \ge 0$ for all $x$ it follows that
\[
|\lambda_r| \le (r-1)^{-1} \sup_{0 \le t \le 1} |Q_{r-2}(t)| \int F(x)\{1-F(x)\}\, dx.
\]
From Szegő (1959, p. 166) it follows that
\[
\sup_{0 \le t \le 1} |Q_r(t)| = r + 1,
\]
with the supremum being attained only at $t = 0$ or $t = 1$. Thus $|\lambda_r| \le \lambda_2$, with equality only if $F(x)$ can take only the values 0 and 1, i.e. only if $X$ is degenerate. Thus a nondegenerate distribution has $|\lambda_r| < \lambda_2$, which together with $\lambda_2 > 0$ implies $|\tau_r| < 1$.

If $X \ge 0$ almost surely then $\lambda_1 = E X > 0$ and $\lambda_2 > 0$, so $\tau = \lambda_2/\lambda_1 > 0$; furthermore $E X_{1:2} > 0$, so
\[
\tau - 1 = (\lambda_2 - \lambda_1)/\lambda_1 = -E X_{1:2}/\lambda_1 < 0.
\]

We consider the boundedness of L-moment ratios to be an advantage. Intuitively, it seems easier to interpret a measure such as $\tau_3$, which is constrained to lie within the interval $(-1, 1)$, than the conventional skewness, which can take arbitrarily large values.

More stringent bounds on the $\tau_r$ can be found. The proof of Theorem 1 implies that a sequence $\mu_1, \mu_2, \ldots$, can be the "$E X_{r:r}$" of some random variable $X$ if and only if there exists an increasing function $z$ such that
\[
\mu_{r+2} - \mu_{r+1} = \int_0^1 u^r\, dz(u), \qquad r = 0, 1, \ldots.
\]
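Before pursuing these sharper bounds, the identity $\lambda_2 = \int F(x)\{1-F(x)\}\,dx$ obtained in the proof above admits a quick numerical illustration. The sketch below uses the unit exponential distribution, whose $\lambda_2 = 1/2$ is a standard value; the truncation point 40 and the grid size are arbitrary choices made for the illustration.

```python
import math

def lambda2_from_cdf(F, a, b, n=100000):
    """lambda_2 = integral of F(x){1 - F(x)} dx (the r = 2 case of (4)),
    evaluated by the trapezoidal rule on [a, b]; the finite interval
    truncates the support of the distribution."""
    h = (b - a) / n
    total = 0.5 * (F(a) * (1 - F(a)) + F(b) * (1 - F(b)))
    for i in range(1, n):
        x = a + i * h
        total += F(x) * (1 - F(x))
    return total * h

# Unit exponential, whose lambda_2 = 1/2 (known value, used as a check)
F_exp = lambda x: 1.0 - math.exp(-x)
lam2 = lambda2_from_cdf(F_exp, 0.0, 40.0)
```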
Establishing conditions for the existence of the function $z$ is part of the classical "moment problem", and was solved by Hausdorff (1923). Akhiezer (1965, p. 74) shows that $z$ exists if and only if the following quadratic forms are nonnegative:
\[
\sum_{i,j=0}^{m} (\mu_{i+j+3} - \mu_{i+j+2})\, x_i x_j, \qquad
\sum_{i,j=0}^{m} (-\mu_{i+j+3} + 2\mu_{i+j+2} - \mu_{i+j+1})\, x_i x_j
\]
for $r = 2m - 1$, and
\[
\sum_{i,j=0}^{m} (\mu_{i+j+2} - \mu_{i+j+1})\, x_i x_j, \qquad
\sum_{i,j=0}^{m-1} (-\mu_{i+j+4} + 2\mu_{i+j+3} - \mu_{i+j+2})\, x_i x_j
\]
for $r = 2m - 2$. These conditions can be expressed in terms of the nonnegativity of the determinants of certain matrices whose elements are linear combinations of the $\mu_r$. Mallows (1973) gives an exact statement of the result.

Theorem 3 (Mallows, 1973, Theorem 2(ii)). The sequence $\mu_1, \ldots, \mu_n$ may be regarded as the expectations of the largest order statistics of samples of size $1, \ldots, n$ of a real-valued random variable if and only if either (a) $A_k > 0$, $k = 2, \ldots, n$, and $B_k > 0$, $k = 3, \ldots, n$; or (b) for some even $m$, $2 \le m < n$, we have $A_k > 0$ and $B_k > 0$ for $k = 2, \ldots, m-1$, $A_m = 0$, $B_m > 0$, and $A_k = B_k = 0$ for $k = m+1, \ldots, n$. Here $A_k$ and $B_k$ are determinants of matrices, as follows:
\[
\begin{aligned}
A_{2k} &= \det\,[\mu_{i+j} - \mu_{i+j-1}]_{i,j=1,\ldots,k}, &
A_{2k+1} &= \det\,[\mu_{i+j+1} - \mu_{i+j}]_{i,j=1,\ldots,k}, \\
B_{2k} &= \det\,[-\mu_{i+j} + 2\mu_{i+j-1} - \mu_{i+j-2}]_{i,j=2,\ldots,k}, &
B_{2k+1} &= \det\,[-\mu_{i+j+1} + 2\mu_{i+j} - \mu_{i+j-1}]_{i,j=1,\ldots,k}.
\end{aligned}
\]
From Theorem 3 and equation (3) we can obtain the possible values of the L-moments of a distribution. In particular, for a nondegenerate distribution the constraints on $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ are that
\[
\mu_2 - \mu_1 > 0, \qquad \mu_3 - \mu_2 > 0, \qquad -\mu_3 + 2\mu_2 - \mu_1 > 0,
\]
\[
(\mu_4 - \mu_3)(\mu_2 - \mu_1) - (\mu_3 - \mu_2)^2 > 0, \qquad -\mu_4 + 2\mu_3 - \mu_2 > 0,
\]
and so the constraints on $\lambda_1$, $\lambda_2$, $\tau_3$ and $\tau_4$ are that
\[
0 < \lambda_2, \qquad -1 < \tau_3 < 1, \qquad \tfrac{1}{4}(5\tau_3^2 - 1) \le \tau_4 < 1.
\]
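These constraints are easy to check for particular distributions. The sketch below verifies the $\tau_4$ bound for the uniform and exponential distributions, whose $(\tau_3, \tau_4)$ values $(0, 0)$ and $(1/3, 1/6)$ are standard facts assumed for the illustration.

```python
def tau4_lower_bound(tau3):
    """Lower bound on tau_4 implied by the constraints above:
    tau_4 >= (5 * tau_3**2 - 1) / 4."""
    return (5.0 * tau3 * tau3 - 1.0) / 4.0

# Known (tau_3, tau_4) pairs for two standard distributions
cases = {"uniform": (0.0, 0.0), "exponential": (1.0 / 3.0, 1.0 / 6.0)}
checks = {name: (-1.0 < t3 < 1.0 and tau4_lower_bound(t3) <= t4 < 1.0)
          for name, (t3, t4) in cases.items()}
```

For the exponential distribution, for example, the bound is $(5/9 - 1)/4 = -1/9$, comfortably below $\tau_4 = 1/6$.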
Here for good measure are the constraints on $\tau_5$ and $\tau_6$:
\[
\frac{1}{5}\left\{\tau_3(7\tau_4 - 2) - \frac{7(1-\tau_4)(1 + 4\tau_4 - 5\tau_3^2)}{5(1+\tau_3)}\right\}
\le \tau_5 \le
\frac{1}{5}\left\{\tau_3(7\tau_4 - 2) + \frac{7(1-\tau_4)(1 + 4\tau_4 - 5\tau_3^2)}{5(1-\tau_3)}\right\},
\]
\[
\frac{1}{25}(42\tau_4^2 - 14\tau_4 - 3) + \frac{6(2\tau_3 - 7\tau_3\tau_4 + 5\tau_5)^2}{35(1 + 4\tau_4 - 5\tau_3^2)}
\le \tau_6 \le
\frac{1}{10}(3 + 7\tau_4) - \frac{15(\tau_3 - \tau_5)^2}{14(1-\tau_4)}.
\]
Equality in these bounds can be attained only by a distribution which can take a finite number ($m$, say) of distinct values. Such a distribution satisfies the lower bound on $\tau_{2m}$ and both the lower and upper bounds on $\tau_r$, $r > 2m$.

2. L-moments as measures of distributional shape

Oja (1981), extending work of Bickel and Lehmann (1975, 1976) and van Zwet (1964), has defined intuitively reasonable criteria for one probability distribution on the real line to be located further to the right (more dispersed, more skew, more kurtotic) than another. A real-valued functional of a distribution that preserves the partial ordering of distributions implied by these criteria may then reasonably be called a "measure of location" (dispersion, skewness, kurtosis). The following theorem shows that $\tau_3$ and $\tau_4$ are, by Oja's criteria, measures of skewness and kurtosis respectively.

Theorem 4. Let $X$ and $Y$ be real-valued random variables with cumulative distribution functions $F$ and $G$ respectively, and L-moments $\lambda_r^{(X)}$ and $\lambda_r^{(Y)}$ and L-moment ratios $\tau_r^{(X)}$ and $\tau_r^{(Y)}$ respectively.
(i) If $Y = aX + b$, then $\lambda_1^{(Y)} = a\lambda_1^{(X)} + b$ and $\lambda_2^{(Y)} = |a|\,\lambda_2^{(X)}$; if in addition $a > 0$, then $\tau_3^{(Y)} = \tau_3^{(X)}$ and $\tau_4^{(Y)} = \tau_4^{(X)}$.
(ii) Let $\Delta(x) = G^{-1}(F(x)) - x$. If $\Delta(x) \ge 0$ for all $x$, then $\lambda_1^{(Y)} \ge \lambda_1^{(X)}$. If $\Delta(x)$ is an increasing function of $x$, then $\lambda_2^{(Y)} \ge \lambda_2^{(X)}$. If $\Delta(x)$ is convex, then $\tau_3^{(Y)} \ge \tau_3^{(X)}$. If $X$ and $Y$ are symmetric and $\Delta(x)$ is convex of order 3 (Oja, 1981, Definition 2.1), then $\tau_4^{(Y)} \ge \tau_4^{(X)}$.

Proof. Part (i) is trivial. Part (ii) was proved by Oja (1981) for $\lambda_1$ and $\lambda_2$, in Oja's notation $\mu_1(F)$ and $\tfrac{1}{2}\sigma_1(F)$ respectively.
For $\tau_3$, assume that $X$ and $Y$ are continuous, with probability density functions $f$ and $g$ respectively, and let $r(x) = f(x)/g\{G^{-1}(F(x))\}$: because $\Delta(x)$ is convex, $r(x) = d\Delta(x)/dx + 1$ is increasing. Now by (4) and the substitution $y = G^{-1}(F(x))$,
\[
\lambda_2^{(Y)} = \int G(y)\{1-G(y)\}\, dy = \int F(x)\{1-F(x)\}\, r(x)\, dx,
\]
and similarly
\[
\lambda_3^{(Y)} = \int F(x)\{1-F(x)\}\{2F(x) - 1\}\, r(x)\, dx.
\]
Thus
\[
\lambda_2^{(X)} \lambda_2^{(Y)} \{\tau_3^{(Y)} - \tau_3^{(X)}\}
= \lambda_3^{(Y)} \lambda_2^{(X)} - \lambda_3^{(X)} \lambda_2^{(Y)}
\]
may be written as
\[
\int F(1-F)(2F-1)\,r\,dx \cdot \int F(1-F)\,dx
\;-\; \int F(1-F)(2F-1)\,dx \cdot \int F(1-F)\,r\,dx, \tag{5}
\]
wherein $F(1-F)$ is a positive function of $x$ and $2F-1$ and $r$ are increasing. Chebyshev's inequality for integrals (Mitrinović, 1970, Theorem 1, p. 40) implies that (5) is positive. Because $\lambda_2^{(X)} \lambda_2^{(Y)} > 0$ it follows that $\tau_3^{(Y)} \ge \tau_3^{(X)}$. A discrete random variable can be approximated arbitrarily closely by a continuous random variable, so the result is also valid for discrete random variables. The proof for $\tau_4$ is similar.

3. Approximating a quantile function

Sillitto (1969) derived L-moments, without so naming them, as coefficients in the approximation of a quantile function by polynomials. As a matter of taste, we prefer to regard (1) as the fundamental definition; the approximation to the quantile function then becomes an inversion theorem, expressing the quantile function in terms of the L-moments.

Theorem 5 (Sillitto, 1969). Let $X$ be a real-valued continuous random variable with finite variance, quantile function $x(F)$ and L-moments $\lambda_r$, $r \ge 1$. Then the representation
\[
x(F) = \sum_{r=1}^{\infty} (2r-1)\,\lambda_r\, P^*_{r-1}(F), \qquad 0 < F < 1,
\]
is convergent in mean square, i.e.
\[
R_s(F) = x(F) - \sum_{r=1}^{s} (2r-1)\,\lambda_r\, P^*_{r-1}(F),
\]
the remainder after stopping the infinite sum after $s$ terms, satisfies
\[
\int_0^1 \{R_s(F)\}^2\, dF \to 0 \quad \text{as } s \to \infty.
\]

Proof. We seek an approximation to the quantile function $x(F)$ of the form
\[
x(F) \approx \sum_{r=1}^{s} a_r\, P^*_{r-1}(F), \qquad 0 < F < 1. \tag{6}
\]
The shifted Legendre polynomials $P^*_{r-1}(F)$ are a natural choice as the basis of the approximation because they are orthogonal on $0 < F < 1$ with constant weight function. To determine the $a_r$ in (6) we denote the error of the approximation (6) by
\[
R_s(F) = x(F) - \sum_{r=1}^{s} a_r\, P^*_{r-1}(F)
\]
and seek to minimize the mean square error $\int_0^1 \{R_s(F)\}^2\, dF$. The condition that $X$ has finite variance ensures that the mean square error is finite. We have
\[
\begin{aligned}
\int_0^1 \{R_s(F)\}^2\, dF
&= \int_0^1 \{x(F)\}^2\, dF
 - 2 \sum_{r=1}^{s} a_r \int_0^1 x(F)\, P^*_{r-1}(F)\, dF
 + \sum_{r=1}^{s} \sum_{t=1}^{s} a_r a_t \int_0^1 P^*_{r-1}(F)\, P^*_{t-1}(F)\, dF \\
&= \int_0^1 \{x(F)\}^2\, dF - 2 \sum_{r=1}^{s} a_r \lambda_r + \sum_{r=1}^{s} a_r^2/(2r-1)
\end{aligned}
\]
(where we have used the orthogonality results $\int_0^1 P^*_r(F)\, P^*_s(F)\, dF = 0$ if $r \ne s$, $\int_0^1 \{P^*_r(F)\}^2\, dF = 1/(2r+1)$), which is minimized by choosing
\[
a_r = (2r-1)\,\lambda_r.
\]
That $\int_0^1 \{R_s(F)\}^2\, dF \to 0$ as $s \to \infty$, i.e. that the set of orthogonal functions $P^*_{r-1}(F)$ is complete, is a standard result in Sturm–Liouville theory. A proof is given by Titchmarsh (1946, chap. 4).
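Theorem 5 can be illustrated numerically. The sketch below evaluates the L-moments of the unit exponential quantile function $x(F) = -\log(1-F)$ by a midpoint rule and confirms that the mean-square error of the truncated expansion shrinks as $s$ grows; the grid size and the truncation levels are arbitrary choices made for the illustration.

```python
import math

def shifted_legendre(r, F):
    """P*_r(F) = P_r(2F - 1), via the Legendre three-term recurrence
    (n+1) P_{n+1}(u) = (2n+1) u P_n(u) - n P_{n-1}(u)."""
    u = 2.0 * F - 1.0
    p_prev, p = 1.0, u
    if r == 0:
        return p_prev
    for n in range(1, r):
        p_prev, p = p, ((2 * n + 1) * u * p - n * p_prev) / (n + 1)
    return p

def mse_of_truncation(x_of_F, s, n=2000):
    """Mean-square error of the s-term expansion
    x(F) ~ sum_{r=1}^{s} (2r-1) lambda_r P*_{r-1}(F),
    with lambda_r and the error integral both evaluated by a midpoint rule."""
    Fs = [(i + 0.5) / n for i in range(n)]
    lam = [sum(x_of_F(F) * shifted_legendre(r - 1, F) for F in Fs) / n
           for r in range(1, s + 1)]
    err = 0.0
    for F in Fs:
        approx = sum((2 * r - 1) * lam[r - 1] * shifted_legendre(r - 1, F)
                     for r in range(1, s + 1))
        err += (x_of_F(F) - approx) ** 2
    return err / n

# Unit exponential quantile function (illustrative example)
xq = lambda F: -math.log(1.0 - F)
mses = [mse_of_truncation(xq, s) for s in (1, 2, 4, 8)]
```

For the exponential distribution the exact mean-square errors at $s = 1, 2, 4$ are $1$, $1/4$ and $1/16$, and the computed values decrease accordingly.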
References

Akhiezer, N. I. (1965). The classical moment problem. New York: Hafner.

Bickel, P. J., and Lehmann, E. L. (1975). Descriptive statistics for nonparametric models. I. Introduction. II. Location. Annals of Statistics, 3, 1038–1069.

Bickel, P. J., and Lehmann, E. L. (1976). Descriptive statistics for nonparametric models. III. Dispersion. Annals of Statistics, 4, 1139–1158.

Chan, L. K. (1967). On a characterization of distributions by expected values of extreme order statistics. American Mathematical Monthly, 74, 950–951.

David, H. A. (1981). Order statistics, 2nd ed. New York: Wiley.

Feller, W. (1971). An introduction to probability theory and its applications, vol. 2. New York: Wiley.

Hausdorff, F. (1923). Momentprobleme für ein endliches Intervall. Mathematische Zeitschrift, 16, 220–248.

Hosking, J. R. M. (1990). L-moments: analysis and estimation of distributions using linear combinations of order statistics. Journal of the Royal Statistical Society, Series B, 52, 105–124.

Konheim, A. G. (1971). A note on order statistics. American Mathematical Monthly, 78, 524.

Lanczos, C. (1957). Applied analysis. London: Pitman.

Mallows, C. L. (1973). Bounds on distribution functions in terms of expectations of order-statistics. Annals of Probability, 1, 297–303.

Mitrinović, D. S. (1970). Analytic inequalities. Berlin: Springer.

Oja, H. (1981). On location, scale, skewness and kurtosis of univariate distributions. Scandinavian Journal of Statistics, 8, 154–168.

Sillitto, G. P. (1969). Derivation of approximants to the inverse distribution function of a continuous univariate population from the order statistics of a sample. Biometrika, 56, 641–650.

Szegő, G. (1959). Orthogonal polynomials, 2nd ed. New York: American Mathematical Society.

Titchmarsh, E. C. (1946). Eigenfunction expansions. New York: Oxford University Press.

Van Zwet, W. R. (1964). Convex transformations of random variables. Mathematical Centre Tract 7. Amsterdam: Mathematisch Centrum.