Some Measures for Asymmetry of Distributions
Georgi N. Boshnakov
First version: 31 January 2006
Research Report No. 5, 2006, Probability and Statistics Group, School of Mathematics, The University of Manchester
Short running title: Measures for asymmetry

Georgi N. Boshnakov
Mathematics Department
University of Manchester Institute of Science and Technology
P O Box 88
Manchester M60 1QD
UK
E-mail: georgi.boshnakov@umist.ac.uk
Abstract

We propose several measures, functional and scalar, for asymmetry of distributions by comparing the behaviour of probability densities to the right and left of the mode(s). We also show how to generate classes of equivalent distributions from a given distribution, allowing for varying asymmetry but retaining some information theoretic properties of the original distribution, such as the entropy.

Key Words: asymmetry; asymmetry curve; asymmetry index; confidence characteristic

1 Introduction

The numerical characteristic normally employed to characterise the lack of symmetry of a distribution is the coefficient of skewness, a standardised version of the third central moment. A variant of this takes expectation with respect to the median rather than the mean. Various measures of asymmetry are discussed by MacGillivray (1986).

We propose some alternatives which measure asymmetry with respect to modes rather than means or medians. The proposed measures always exist and seem quite intuitive.

Our approach provides a systematic way to create, from a reference distribution, an entire family of distributions having different asymmetries but sharing some fundamental properties of the original distribution, such as the differential entropy. This may be of some interest for kernel estimation as a source of asymmetric kernels.

2 Notation

We consider absolutely continuous distributions and measure asymmetry by comparing how long it takes for the density to fall to a given value on the two sides of the modes. A very useful tool for this is the confidence transformation (Boshnakov, 2003), which produces, from a given source distribution, a new distribution (called the confidence characteristic) whose density is, effectively, a rearrangement of the values of the source density in decreasing order.
This transformation preserves some important properties of the source distribution, and so it makes sense to say that distributions having the same confidence density belong to a family; see Boshnakov (2003) for details and related notions. The distribution function and the probability density of the confidence characteristic are called the confidence distribution function and confidence density, respectively.
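The informal description of the confidence density as a decreasing rearrangement can be tried out numerically. The following is only a minimal sketch: it assumes that sorting grid values of the source density in decreasing order approximates the confidence density, and it uses the standard normal density as a test case. The function names and the grid settings are illustrative choices, not part of the paper. For a symmetric unimodal density with mode 0 the rearrangement should be close to $f(\ell/2)$, which is what the last lines compare against.

```python
import math

def std_normal_pdf(z):
    """Standard normal density, used here only as a test case."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def rearrange_decreasing(f, lo, hi, n=20_001):
    """Approximate the decreasing rearrangement of f on [lo, hi].

    Returns (ells, values, dz): values[k] is the (k+1)-th largest grid
    value of f, and ells[k] = k * dz is the total length of the k grid
    cells that carry larger values.
    """
    dz = (hi - lo) / (n - 1)
    values = sorted((f(lo + k * dz) for k in range(n)), reverse=True)
    ells = [k * dz for k in range(n)]
    return ells, values, dz

ells, g, dz = rearrange_decreasing(std_normal_pdf, -10.0, 10.0)

# The rearranged values should still integrate to (approximately) one.
total = dz * sum(g)

# For a symmetric unimodal density, g(l) should be close to f(l / 2).
max_gap = max(abs(gk - std_normal_pdf(0.5 * lk)) for lk, gk in zip(ells, g))

print(f"integral of g ~ {total:.6f}, max |g(l) - f(l/2)| ~ {max_gap:.2e}")
```

Because the rearranged values are the same numbers as the original grid values, any grid approximation of an integral of the form $\int \psi(f(z))\,dz$, such as the differential entropy, is unchanged by the sorting step.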
Given a distribution, we denote its distribution function, probability density, confidence distribution function, and confidence density by $F$, $f$, $G$, and $g$, respectively.

3 Asymmetry curves and coefficients

Suppose that $f$ is unimodal with mode $m$. For $\ell > 0$, let $x = x(\ell)$ and $y = y(\ell)$ be such that $f(m + x) = f(m - y)$ and $\ell = x + y$. Then $x = x(\ell)$ and $y = y(\ell)$ are monotonically increasing as functions of $\ell > 0$. If $f$ is strictly monotone on each side of the mode, then $\ell$ represents the length of the region where $f$ is greater than $f(m + x)$, while $x$ and $y$ represent the lengths of the parts of this region where $f$ is monotonically decreasing and increasing, respectively.

To accommodate multi-modal distributions we use this property to define $x(\ell)$ and $y(\ell)$. For any $c > 0$ let $S_c = \{z : f(z) \ge c\}$. Let also $S_{I,c}$ (respectively, $S_{D,c}$) be the subregion of $S_c$ where the density $f$ is strictly increasing (decreasing). Denote the lengths of these regions by $\ell$, $\ell_{\mathrm{incr}}$, and $\ell_{\mathrm{decr}}$, respectively. We will assume that $\ell = \ell_{\mathrm{incr}} + \ell_{\mathrm{decr}}$.

If $f$ is symmetric then $\ell_{\mathrm{decr}} = \ell_{\mathrm{incr}}$ for all $\ell$. So, for a symmetric distribution the parametric plot of $\ell_{\mathrm{decr}} = \ell_{\mathrm{decr}}(\ell)$ and $\ell_{\mathrm{incr}} = \ell_{\mathrm{incr}}(\ell)$ as functions of $\ell$ is a straight line with slope one. For asymmetric distributions such a plot provides comprehensive information about asymmetry. So, we introduce the following definition.

Definition 1. The curve $(\ell_{\mathrm{decr}}(\ell), \ell_{\mathrm{incr}}(\ell))$, $\ell \ge 0$, is said to be the asymmetry curve of $f$, and $(\ell_{\mathrm{decr}}(\ell), \ell_{\mathrm{incr}}(\ell) - \ell/2)$ its detrended asymmetry curve.

The detrended asymmetry curves of symmetric distributions coincide with the positive x-axis. Asymmetry may be defined in terms of functions of $(\ell_{\mathrm{decr}}(\ell), \ell_{\mathrm{incr}}(\ell))$ as follows.

Definition 2. The functions $\ell_{\mathrm{decr}}(\ell)/\ell$, $\ell_{\mathrm{incr}}(\ell)/\ell$, and $\ell_{\mathrm{decr}}(\ell)/\ell_{\mathrm{incr}}(\ell)$, where $\ell > 0$, are called right-asymmetry, left-asymmetry, and odds-asymmetry, respectively.

Symmetric distributions may be thought of as having a constant asymmetry equal to zero. More generally:

Definition 3. A distribution is said to have constant asymmetry if its asymmetry curve is a straight line.
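For a unimodal density the points of the asymmetry curve can be computed directly from the level sets: for a level $c$ below the peak, solve $f(z) = c$ once on each side of the mode. The sketch below does this by bisection. The helper names, the bracketing interval, and the Gamma density with shape 3 and rate 1 are illustrative assumptions, not choices made in the paper.

```python
import math

def bisect_root(h, a, b, tol=1e-12, max_iter=200):
    """Root of h on [a, b] by bisection; h(a) and h(b) must differ in sign."""
    ha = h(a)
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        hm = h(mid)
        if (hm > 0) == (ha > 0):
            a, ha = mid, hm      # root lies in the right half
        else:
            b = mid              # root lies in the left half
        if b - a < tol:
            break
    return 0.5 * (a + b)

def asymmetry_point(f, mode, lo, hi, level):
    """Return (l_decr, l_incr) for the level set {f >= level} of a unimodal f.

    lo and hi must bracket the level set: f(lo) < level and f(hi) < level.
    """
    left = bisect_root(lambda z: f(z) - level, lo, mode)    # increasing branch
    right = bisect_root(lambda z: f(z) - level, mode, hi)   # decreasing branch
    return right - mode, mode - left

def gamma_pdf(x, alpha=3.0, lam=1.0):
    """Gamma density with shape alpha and rate lam (illustrative parameters)."""
    if x <= 0.0:
        return 0.0
    return lam**alpha / math.gamma(alpha) * x**(alpha - 1.0) * math.exp(-lam * x)

mode = (3.0 - 1.0) / 1.0          # (alpha - 1) / lam for the defaults above
peak = gamma_pdf(mode)

# A few points of the asymmetry curve, from levels near the peak downwards.
curve = [asymmetry_point(gamma_pdf, mode, 0.0, 60.0, frac * peak)
         for frac in (0.9, 0.5, 0.1)]

for l_decr, l_incr in curve:
    print(f"l = {l_decr + l_incr:.4f}  l_decr = {l_decr:.4f}  l_incr = {l_incr:.4f}")
```

Plotting the pairs `(l_decr, l_incr)` for a fine range of levels gives a numerical version of the asymmetry curve; dividing each coordinate by `l_decr + l_incr` gives the right- and left-asymmetry.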
For many common distributions the asymmetry curve for small $\ell$ is close to a straight line with slope one, indicating approximate symmetry in a neighbourhood of the mode. Figure 1 shows the asymmetry curves of several Gamma distributions. It makes sense to say that the distribution whose curve is closest to the line with slope 1 is the most symmetric one, Gamma(1,120) in this case. This is expected here since increasing the second parameter of the Gamma distribution results in a distribution closer to normal; see also the example in Section 4.3.

Various summary characteristics of the above functions may be considered candidates for the title coefficient (or index) of asymmetry. Let

\[
r_{\mathrm{pos}} = E_g\!\left(\frac{\ell_{\mathrm{decr}}(\ell)}{\ell}\right), \qquad
r_{\mathrm{neg}} = E_g\!\left(\frac{\ell_{\mathrm{incr}}(\ell)}{\ell}\right), \qquad
r_{\mathrm{assym}} = r_{\mathrm{pos}} - r_{\mathrm{neg}}, \tag{1}
\]

where $E_g$ denotes expectation with respect to the confidence density $g(\ell)$. We will call $r_{\mathrm{assym}}$, $r_{\mathrm{pos}}$, and $r_{\mathrm{neg}}$ the mean asymmetry, the mean positive asymmetry and the mean negative asymmetry, respectively. The expectations above exist since $\ell_{\mathrm{decr}}(\ell)/\ell$ and $\ell_{\mathrm{incr}}(\ell)/\ell$ are positive and less than one. Moreover,

Theorem 1. The coefficients $r_{\mathrm{pos}}$, $r_{\mathrm{neg}}$, and $r_{\mathrm{assym}}$ are always finite; $r_{\mathrm{pos}}$ and $r_{\mathrm{neg}}$ are in the interval $[0, 1]$, and $r_{\mathrm{assym}}$ is in $[-1, 1]$.

The mean odds-asymmetry is defined by

\[
r_{\mathrm{odds}} = E_g\!\left(\frac{\ell_{\mathrm{decr}}(\ell)}{\ell_{\mathrm{incr}}(\ell)}\right), \tag{2}
\]

and infinity is a possible value for it.

From the definitions above it is easy to see that symmetric distributions have constant asymmetry.

Theorem 2. Let $f$ be symmetric with median $m$, i.e., $f(m + x) = f(m - x)$. Then $f$ has constant asymmetry; the right-asymmetry, left-asymmetry, $r_{\mathrm{pos}}$, and $r_{\mathrm{neg}}$ are equal to $1/2$; $r_{\mathrm{odds}} = 1$, $r_{\mathrm{assym}} = 0$.

Non-symmetric distributions may also have constant asymmetry. A typical example arises when the density has several modes and decreases symmetrically around each one of them.

It is clear that the introduced measures of asymmetry are invariant with respect to a shift. Some are invariant with respect to a change of scale as well, others are not. Indeed, let $Y = cX$, where $c > 0$ is a constant and $X$ is a random variable. We will use the above notation with an additional index $x$ or $y$ for the probability characteristics of $X$ and $Y$.
The two densities are related by

\[
f_y(y) = \frac{1}{c}\, f_x\!\left(\frac{y}{c}\right).
\]
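Hence $f_y(a) = f_y(a + \ell)$ if and only if $f_x(a/c) = f_x((a + \ell)/c)$. In the particular case when $f_x$ is unimodal this shows that a level-set length $\ell_x$ for $X$ is transformed to $\ell_y = c\,\ell_x$ for $Y$. Similarly, it can be seen that $\ell_{y,\mathrm{incr}}(\ell) = c\,\ell_{x,\mathrm{incr}}(\ell/c)$ and $\ell_{y,\mathrm{decr}}(\ell) = c\,\ell_{x,\mathrm{decr}}(\ell/c)$. Thus a change of scale leads, in general, to a scale change in the discussed asymmetry measures. It is easy to see, however, that the change of the asymmetry curve, for example, corresponds to changing the units of its plot. The odds-asymmetry is invariant under a scale transformation.

The assumptions of unimodality can be removed with the help of Corollary 1 from Boshnakov (2003).

4 Asymmetry of some distributions

4.1 Symmetric unimodal distributions

Let $F$ be a symmetric unimodal distribution with mode $M$. Then its confidence density is $g(\ell) = f(M + \ell/2) = f(M - \ell/2)$, i.e., $\ell_{\mathrm{decr}}(\ell) = \ell_{\mathrm{incr}}(\ell) = \ell/2$ in this case. So, the asymmetry curve is $(\ell/2, \ell/2)$, $r_{\mathrm{pos}} = r_{\mathrm{neg}} = 1/2$, $r_{\mathrm{assym}} = 0$, $r_{\mathrm{odds}} = 1$.

4.2 Triangular distribution

Let

\[
f(x) = \begin{cases} 2x/H, & 0 \le x \le H, \\ 2(1 - x)/(1 - H), & H \le x \le 1, \end{cases}
\]

where $H \in (0, 1)$. The system

\[
f(H - \ell_{\mathrm{incr}}) = f(H + \ell_{\mathrm{decr}}), \qquad \ell = \ell_{\mathrm{decr}} + \ell_{\mathrm{incr}},
\]

gives $\ell_{\mathrm{decr}}(\ell) = (1 - H)\ell$ and $\ell_{\mathrm{incr}}(\ell) = H\ell$. So,

\[
\ell_{\mathrm{decr}}(\ell) = \frac{1 - H}{H}\,\ell_{\mathrm{incr}}(\ell), \qquad
\frac{\ell_{\mathrm{decr}}(\ell)}{\ell} = 1 - H, \qquad
\frac{\ell_{\mathrm{incr}}(\ell)}{\ell} = H, \qquad
\frac{\ell_{\mathrm{decr}}(\ell)}{\ell} - \frac{\ell_{\mathrm{incr}}(\ell)}{\ell} = 1 - 2H,
\]

where $\ell \in [0, 1]$. Hence, the asymmetry of the triangular distribution is constant. The distribution is symmetric if $H = 1/2$, skewed to the right if $H < 1/2$ and skewed to the left otherwise. The odds-asymmetry is equal to $(1 - H)/H$.

4.3 Γ-distribution

Let $f$ be a Γ-density,

\[
f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}, \qquad x \ge 0. \tag{3}
\]

We assume here that $\alpha > 1$. In this case the distribution is unimodal with mode $M = (\alpha - 1)/\lambda$ such that

\[
f(M) > f(x) \quad \text{for every } x \ne M. \tag{4}
\]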
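The equation $f(M + \ell_{\mathrm{decr}}) = f(M - \ell_{\mathrm{incr}})$, where $\ell_{\mathrm{decr}} \ge 0$ and $\ell_{\mathrm{incr}} \ge 0$, will be satisfied if

\[
(M + \ell_{\mathrm{decr}})^{\alpha - 1} e^{-\lambda (M + \ell_{\mathrm{decr}})} = (M - \ell_{\mathrm{incr}})^{\alpha - 1} e^{-\lambda (M - \ell_{\mathrm{incr}})},
\]

which can be written as

\[
\left( \frac{M + \ell_{\mathrm{decr}}}{M - \ell_{\mathrm{incr}}} \right)^{\alpha - 1} = e^{\lambda (M + \ell_{\mathrm{decr}} - M + \ell_{\mathrm{incr}})} = e^{\lambda \ell}. \tag{5}
\]

Also,

\[
\frac{M + \ell_{\mathrm{decr}}}{M - \ell_{\mathrm{incr}}} = \frac{(M - \ell_{\mathrm{incr}}) + (\ell_{\mathrm{incr}} + \ell_{\mathrm{decr}})}{M - \ell_{\mathrm{incr}}} = 1 + \frac{\ell}{M - \ell_{\mathrm{incr}}}. \tag{6}
\]

From equations (5) and (6) we get

\[
1 + \frac{\ell}{M - \ell_{\mathrm{incr}}} = e^{\lambda \ell/(\alpha - 1)}.
\]

Hence,

\[
\frac{\ell}{M - \ell_{\mathrm{incr}}} = e^{\lambda \ell/(\alpha - 1)} - 1.
\]

So,

\[
M - \ell_{\mathrm{incr}} = \frac{\ell}{e^{\lambda \ell/(\alpha - 1)} - 1}.
\]

Finally, using the identity $\ell_{\mathrm{decr}} + \ell_{\mathrm{incr}} = \ell$, we get

\[
\ell_{\mathrm{incr}} = M - \frac{\ell}{e^{\lambda \ell/(\alpha - 1)} - 1}, \qquad
\ell_{\mathrm{decr}} = \ell - M + \frac{\ell}{e^{\lambda \ell/(\alpha - 1)} - 1}.
\]

Hence, when $\alpha > 1$ the confidence density of the Γ-distribution is

\[
g(\ell) = f(M + \ell_{\mathrm{decr}})
= \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \left( \ell + \frac{\ell}{e^{\lambda \ell/(\alpha - 1)} - 1} \right)^{\alpha - 1}
\exp\!\left\{ -\lambda \left( \ell + \frac{\ell}{e^{\lambda \ell/(\alpha - 1)} - 1} \right) \right\}, \qquad \ell \ge 0.
\]

From the way we defined $\ell_{\mathrm{decr}}$ and $\ell_{\mathrm{incr}}$, they should have the following limiting behaviour:

\[
\ell_{\mathrm{decr}} \to \infty \ \text{as } \ell \to \infty, \qquad
\ell_{\mathrm{decr}} \to 0 \ \text{as } \ell \to 0, \qquad
\ell_{\mathrm{incr}} \to M \ \text{as } \ell \to \infty, \qquad
\ell_{\mathrm{incr}} \to 0 \ \text{as } \ell \to 0.
\]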
This is indeed so, since it is easy to verify that for any $c > 0$

\[
\frac{\ell}{e^{c\ell} - 1} \to 0 \ \text{as } \ell \to \infty
\qquad \text{and} \qquad
\frac{\ell}{e^{c\ell} - 1} \to \frac{1}{c} \ \text{as } \ell \to 0,
\]

and here $1/c = (\alpha - 1)/\lambda = M$. We also have $g(0) = f(M)$, as expected. Thus, the odds-asymmetry tends to infinity as $\ell \to \infty$, while the right-asymmetry and the asymmetry tend to 1.

We see that the asymmetry curve of the Γ-distribution depends only on $M = (\alpha - 1)/\lambda$. In other words, for Γ-distributions having the same mode the asymmetry curves and all measures of asymmetry derived from them are the same. For comparison, the usual coefficient of skewness is equal to $2/\sqrt{\alpha}$ and so depends on $\alpha$ but not on $\lambda$.

5 Distributions with given asymmetry

The confidence transformation has a number of desirable properties. In particular, it preserves the entropy and other information theoretic properties of the original distribution; see Boshnakov (2003) for details. It is therefore justifiable to classify distributions by their confidence characteristics. By reversing the above process, distributions with specified asymmetry properties may be generated.

5.1 Constant asymmetry

Suppose that $g$ is the confidence density of some unimodal distribution and we wish to create a distribution with the same confidence characteristic but with odds-asymmetry $c > 0$. Using the established notation, the following relations should be satisfied:

\[
\frac{\ell_{\mathrm{decr}}}{\ell_{\mathrm{incr}}} = c, \qquad
\ell_{\mathrm{decr}} + \ell_{\mathrm{incr}} = \ell, \qquad
f(M + \ell_{\mathrm{decr}}) = f(M - \ell_{\mathrm{incr}}) = g(\ell). \tag{7}
\]

These give $\ell_{\mathrm{decr}} = c\ell/(1 + c)$ and $\ell_{\mathrm{incr}} = \ell/(1 + c)$. Let $x \ge 0$, $y \ge 0$. The required density is

\[
f(M + x) = g\!\left( \frac{1 + c}{c}\, x \right), \qquad
f(M - y) = g\big( (1 + c)\, y \big).
\]

5.2 An asymmetric normal family

The confidence distribution function and confidence density of the standard normal distribution are $G(\ell) = 2\Phi(\ell/2) - 1$ and $g(\ell) = \varphi(\ell/2) = \exp(-\ell^2/8)/\sqrt{2\pi}$. The above formulae then give

\[
f(M + x) = \frac{1}{\sqrt{2\pi}}\, e^{-\left( \frac{1 + c}{c} \right)^2 x^2/8}, \qquad
f(M - y) = \frac{1}{\sqrt{2\pi}}\, e^{-(1 + c)^2 y^2/8}.
\]

The distributions obtained by varying $c$ and $M$ form a family of distributions having constant asymmetry and the same confidence characteristic as the standard normal distribution.

5.3 Another family

It may be more convenient in some circumstances to express $\ell$ and $\ell_{\mathrm{incr}}$ in terms of $\ell_{\mathrm{decr}}$. So, let

\[
\ell_{\mathrm{incr}} = u(\ell_{\mathrm{decr}}), \qquad \ell = \ell_{\mathrm{decr}} + u(\ell_{\mathrm{decr}}), \tag{8}
\]

where $u(\cdot)$ is an appropriate function. Then we may define a new density $f$ by

\[
f(M + \ell_{\mathrm{decr}}) = f(M - \ell_{\mathrm{incr}}) = g(\ell) = g(\ell_{\mathrm{decr}} + u(\ell_{\mathrm{decr}})). \tag{9}
\]

For example, if we take $g(z) = \lambda e^{-\lambda z}$ to be the exponential density and $u(x) = x^2$, then we get

\[
f(M + \ell_{\mathrm{decr}}) = f(M - \ell_{\mathrm{decr}}^2) = g(\ell_{\mathrm{decr}} + u(\ell_{\mathrm{decr}})) = \lambda e^{-\lambda (\ell_{\mathrm{decr}} + \ell_{\mathrm{decr}}^2)}.
\]

Writing $t$ for the signed displacement from the mode, this can be written as

\[
f(M + t) = \begin{cases}
g(t + t^2) = \lambda e^{-\lambda (t + t^2)}, & \text{if } t \ge 0, \\
g\big(\sqrt{|t|} + |t|\big) = \lambda e^{-\lambda (\sqrt{|t|} + |t|)}, & \text{if } t < 0.
\end{cases}
\]

Now the asymmetry is not constant. For example, the odds-asymmetry is $\ell_{\mathrm{decr}}/\ell_{\mathrm{decr}}^2 = 1/\ell_{\mathrm{decr}}$.

6 Conclusion

We defined some measures of asymmetry which provide useful information about this type of property. The asymmetry curve and its variants provide comprehensive information, while their averaged counterparts summarise it to single numbers. These measures provide a systematic way to generate asymmetric distributions.
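The construction of Section 5.2 can be checked numerically. The sketch below evaluates the asymmetric normal density for one illustrative value of the odds-asymmetry ($c = 2$, with the mode placed at 0) and integrates it on each side of the mode with a simple midpoint rule. The function names, the value of $c$, and the integration limits are assumptions made for illustration only. If the construction is right, the two masses should add up to one and their ratio should equal $c$.

```python
import math

def asymmetric_normal_pdf(z, c=2.0, mode=0.0):
    """Density with odds-asymmetry c built from g(l) = phi(l / 2)."""
    t = z - mode
    if t >= 0.0:
        ell = (1.0 + c) / c * t      # l_decr = t, so l = (1 + c) t / c
    else:
        ell = (1.0 + c) * (-t)       # l_incr = |t|, so l = (1 + c) |t|
    return math.exp(-ell**2 / 8.0) / math.sqrt(2.0 * math.pi)

def integrate(f, a, b, n=20_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

c = 2.0
# Integrate each side of the mode separately because of the kink at 0.
right_mass = integrate(lambda z: asymmetric_normal_pdf(z, c), 0.0, 40.0)
left_mass = integrate(lambda z: asymmetric_normal_pdf(z, c), -40.0, 0.0)

print(f"total mass ~ {right_mass + left_mass:.6f}")
print(f"right/left mass ratio ~ {right_mass / left_mass:.6f}")
```

The mass ratio equals the odds-asymmetry here because, under constant asymmetry, every level set is split between the two sides of the mode in the same proportion $c : 1$.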
Figure 1: Asymmetry curves, $(\ell_{\mathrm{decr}}(\ell), \ell_{\mathrm{incr}}(\ell))$, $\ell > 0$, of Gamma(1,4), Gamma(1,20) and Gamma(1,120). When the second parameter is increased, the curves are closer to the straight line with slope 1 (almost the same in the case of Gamma(1,120) over the plotted range), the interpretation being that the corresponding distributions become more symmetric. All pictured curves, however, have horizontal asymptotes because the left limit of the support of the Gamma distribution is finite.

References

Boshnakov, G. N. (2003) Confidence characteristics of distributions, Statistics & Probability Letters 63/4, 353–360.

MacGillivray, H. L. (1986) Skewness and asymmetry: Measures and orderings, Ann. Stat. 14, 994–1011.