Some Measures for Asymmetry of Distributions


Georgi N. Boshnakov
First version: 31 January 2006
Research Report No. 5, 2006, Probability and Statistics Group, School of Mathematics, The University of Manchester

Mathematics Department, University of Manchester Institute of Science and Technology, P O Box 88, Manchester M60 1QD, UK
E-mail: georgi.boshnakov@umist.ac.uk
Short running title: Measures for asymmetry

Abstract

We propose several measures, functional and scalar, for asymmetry of distributions by comparing the behaviour of probability densities to the right and left of the mode(s), and show how to generate classes of equivalent distributions from a given distribution, allowing for varying asymmetry but retaining some information theoretic properties of the original distribution, such as the entropy.

Key Words: asymmetry; asymmetry curve; asymmetry index; confidence characteristic

1 Introduction

The numerical characteristic normally employed to characterise the lack of symmetry of a distribution is the coefficient of skewness, a standardised version of the third central moment. A variant of this takes expectation with respect to the median rather than the mean. Various measures of asymmetry are discussed by MacGillivray (1986).

We propose some alternatives which measure asymmetry with respect to modes rather than means or medians. The proposed measures always exist and seem quite intuitive. Our approach provides a systematic way to create, from a reference distribution, an entire family of distributions having different asymmetries but sharing some fundamental properties of the original distribution, such as the differential entropy. This may be of some interest for kernel estimation as a source of asymmetric kernels.

2 Notation

We consider absolutely continuous distributions and measure asymmetry by comparing how long it takes the density, moving away from the mode(s), to fall to a given value on each side. A very useful tool for this is the confidence transformation (Boshnakov, 2003), which produces, from a given source distribution, a new distribution (called the confidence characteristic) whose density is, effectively, a rearrangement of the values of the source density in decreasing order. This transformation preserves some important properties of the source distribution, so it makes sense to say that distributions having the same confidence density belong to a family; see Boshnakov (2003) for details and related notions. The distribution function and the probability density of the confidence characteristic are called the confidence distribution function and the confidence density, respectively.
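The description of the confidence density as a decreasing rearrangement of the source density can be illustrated numerically. The following is a minimal Python sketch, not the construction of Boshnakov (2003): it assumes the density is negligible outside a finite interval [a, b], samples it at the midpoints of a uniform grid, and sorts the sampled values in decreasing order, the k-th largest value being placed at $\ell = (k + 1/2)h$. The helper name `confidence_density_on_grid` is hypothetical. For the standard normal density the result should be close to $g(\ell) = \varphi(\ell/2)$, the closed form quoted in Section 5.2.

```python
import math

def confidence_density_on_grid(f, a, b, n):
    """Approximate the confidence density g of a density f (hypothetical helper).

    f is assumed negligible outside [a, b]. Its values at the n cell midpoints
    are sorted in decreasing order; the k-th largest value is assigned to
    ell = (k + 0.5) * h, i.e. g(ell) is (approximately) the level at which the
    region where f exceeds that level has total length ell.
    """
    h = (b - a) / n
    values = sorted((f(a + (i + 0.5) * h) for i in range(n)), reverse=True)
    ells = [(k + 0.5) * h for k in range(n)]
    return ells, values, h

def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

ells, g, h = confidence_density_on_grid(std_normal_pdf, -10.0, 10.0, 20000)
total = sum(g) * h  # Riemann sum of g; should be close to 1
```

Sorting only permutes the sampled values, so the Riemann sum of g equals that of f, mirroring the fact that the rearranged function is again a probability density.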

Given a distribution, we denote its distribution function, probability density, confidence distribution function, and confidence density by $F$, $f$, $G$, and $g$, respectively.

3 Asymmetry curves and coefficients

Suppose that $f$ is unimodal with mode $m$. For $\ell > 0$, let $x = x(\ell)$ and $y = y(\ell)$ be such that $f(m + x) = f(m - y)$ and $\ell = x + y$. Then $x(\ell)$ and $y(\ell)$ are monotonically increasing functions of $\ell > 0$. If $f$ is strictly monotone on each side of the mode, then $\ell$ is the length of the region where $f$ is greater than $f(m + x)$, while $x$ and $y$ are the lengths of the parts of this region where $f$ is decreasing and increasing, respectively.

To accommodate multimodal distributions we use this property to define $x(\ell)$ and $y(\ell)$. For any $c > 0$ let $S_c = \{z : f(z) \ge c\}$. Let also $S_{I,c}$ (respectively, $S_{D,c}$) be the subregion of $S_c$ where the density $f$ is strictly increasing (decreasing). Denote the lengths of these regions by $\ell$, $\ell_{incr}$, and $\ell_{decr}$, respectively. We will assume that $\ell = \ell_{incr} + \ell_{decr}$.

If $f$ is symmetric then $\ell_{decr} = \ell_{incr}$ for all $\ell$. So, for a symmetric distribution the parametric plot of $\ell_{decr} = \ell_{decr}(\ell)$ and $\ell_{incr} = \ell_{incr}(\ell)$ as functions of $\ell$ is a straight line with slope one. For asymmetric distributions such a plot provides comprehensive information about asymmetry. So, we introduce the following definition.

Definition 1. The curve $(\ell_{decr}(\ell), \ell_{incr}(\ell))$, $\ell \ge 0$, is said to be the asymmetry curve of $f$, and $(\ell_{decr}(\ell), \ell_{incr}(\ell) - \ell_{decr}(\ell))$ its detrended asymmetry curve.

The detrended asymmetry curves of symmetric distributions coincide with the positive x-axis. Asymmetry may be defined in terms of functions of $(\ell_{decr}(\ell), \ell_{incr}(\ell))$ as follows.

Definition 2. The functions $\ell_{decr}(\ell)/\ell$, $\ell_{incr}(\ell)/\ell$, and $\ell_{decr}(\ell)/\ell_{incr}(\ell)$, where $\ell > 0$, are called right-asymmetry, left-asymmetry, and odds-asymmetry, respectively.

Symmetric distributions may be thought of as having a constant asymmetry equal to zero. More generally:

Definition 3. A distribution is said to have constant asymmetry if its asymmetry curve is a straight line.
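For a unimodal density, the point of the asymmetry curve at a given $\ell$ can be computed by solving $f(m - y) = f(m + \ell - y)$ for $y = \ell_{incr}$. The following is a minimal Python sketch, assuming $f$ is continuous and strictly monotone on each side of a known mode $m$; it uses plain bisection, and the function names are hypothetical. The triangular density of Section 4.2 serves as a test case, since there $\ell_{decr}(\ell) = (1 - H)\ell$ and $\ell_{incr}(\ell) = H\ell$.

```python
import math

def asymmetry_point(f, m, ell, tol=1e-12):
    """Return (ell_decr, ell_incr) at length ell for a unimodal density f with mode m.

    Solves f(m - y) = f(m + ell - y) for y = ell_incr by bisection. The function
    h(y) = f(m - y) - f(m + ell - y) is positive at y = 0, negative at y = ell,
    and decreasing in y when f is strictly monotone on each side of m.
    """
    lo, hi = 0.0, ell
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(m - mid) - f(m + ell - mid) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)
    return ell - y, y

def asymmetry_curve(f, m, ells):
    """Points (ell_decr, ell_incr) of the asymmetry curve at the given lengths."""
    return [asymmetry_point(f, m, ell) for ell in ells]

# Triangular density on [0, 1] with mode H (Section 4.2), used as a test case.
H = 0.3

def triangular_pdf(x):
    if 0.0 <= x <= H:
        return 2.0 * x / H
    if H < x <= 1.0:
        return 2.0 * (1.0 - x) / (1.0 - H)
    return 0.0

ells = [0.1, 0.5, 0.9]
curve = asymmetry_curve(triangular_pdf, H, ells)
```

Plotting such points against each other gives the asymmetry curve; dividing by $\ell$ or by each other gives the functions of Definition 2.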
For many common distributions the asymmetry curve for small $\ell$ is close to a straight line with slope one, indicating approximate symmetry in a neighbourhood of the mode. Figure 1 shows the asymmetry curves of several Gamma distributions. It makes sense to say that the distribution whose curve is closest to the line with slope 1 is the most symmetric one, Gamma(1,120) in this case. This is expected here since increasing the second parameter of the Gamma distribution results in a distribution closer to normal; see also the example in Section 4.3.

Various summary characteristics of the above functions may be considered candidates for the title of coefficient (or index) of asymmetry. Let
$$r_{pos} = E_g\left(\frac{\ell_{decr}}{\ell}\right), \quad r_{neg} = E_g\left(\frac{\ell_{incr}}{\ell}\right), \quad r_{assym} = r_{pos} - r_{neg}, \qquad (1)$$
where $E_g$ denotes expectation with respect to the confidence density $g(\ell)$. We will call $r_{assym}$, $r_{pos}$, and $r_{neg}$ the mean asymmetry, the mean positive asymmetry, and the mean negative asymmetry, respectively. The expectations above exist since $\ell_{decr}/\ell$ and $\ell_{incr}/\ell$ are nonnegative and do not exceed one. Moreover:

Theorem 1. The coefficients $r_{pos}$, $r_{neg}$, and $r_{assym}$ are always finite; $r_{pos}$ and $r_{neg}$ are in the interval $[0, 1]$, and $r_{assym}$ is in $[-1, 1]$.

The mean odds-asymmetry is defined by
$$r_{odds} = E_g\left(\frac{\ell_{decr}(\ell)}{\ell_{incr}(\ell)}\right), \qquad (2)$$
and infinity is a possible value for it.

From the definitions above it is easy to see that symmetric distributions have constant asymmetry.

Theorem 2. Let $f$ be symmetric about its median $m$, i.e., $f(m + x) = f(m - x)$. Then $f$ has constant asymmetry; the right-asymmetry, left-asymmetry, $r_{pos}$, and $r_{neg}$ are equal to $1/2$; $r_{odds} = 1$ and $r_{assym} = 0$.

Non-symmetric distributions may also have constant asymmetry. A typical example occurs when the density has several modes and decreases symmetrically around each of them.

It is clear that the introduced measures of asymmetry are invariant with respect to a shift. Some are invariant with respect to a change of scale as well, others are not. Indeed, let $Y = cX$, where $X$ is a random variable and $c > 0$ is a constant. We will use the above notation with an additional index $x$ or $y$ for the probability characteristics of $X$ and $Y$.
The two densities are related by $f_y(y) = \frac{1}{c} f_x\left(\frac{y}{c}\right)$.

Hence $f_y(a) = f_y(a + \ell)$ if and only if $f_x(a/c) = f_x((a + \ell)/c)$. In the particular case when $f_x$ is unimodal, this shows that lengths of level sets are multiplied by $c$: $\ell_y = c\,\ell_x$. Similarly, it can be seen that $\ell_{y,incr}(\ell) = c\,\ell_{x,incr}(\ell/c)$ and $\ell_{y,decr}(\ell) = c\,\ell_{x,decr}(\ell/c)$. Thus a change of scale leads, in general, to a scale change in the discussed asymmetry measures. It is easy to see, however, that the change of the asymmetry curve, for example, corresponds to changing the units of its plot. The odds-asymmetry is invariant under a scale transformation. The assumption of unimodality can be removed with the help of Corollary 1 from Boshnakov (2003).

4 Asymmetry of some distributions

4.1 Symmetric unimodal distributions

Let $F$ be a symmetric unimodal distribution with mode $m$. Then its confidence density is $g(\ell) = f(m + \ell/2) = f(m - \ell/2)$, i.e., $\ell_{decr}(\ell) = \ell_{incr}(\ell) = \ell/2$ in this case. So, the asymmetry curve is $(\ell/2, \ell/2)$, $r_{pos} = r_{neg} = 1/2$, $r_{assym} = 0$, and $r_{odds} = 1$.

4.2 Triangular distribution

Let
$$f(x) = \begin{cases} 2x/H, & 0 \le x \le H, \\ 2(1 - x)/(1 - H), & H \le x \le 1, \end{cases}$$
where $H \in (0, 1)$. The system $f(H - \ell_{incr}) = f(H + \ell_{decr})$, $\ell = \ell_{decr} + \ell_{incr}$, gives $\ell_{decr}(\ell) = (1 - H)\ell$ and $\ell_{incr}(\ell) = H\ell$. So,
$$\ell_{decr}(\ell) = \frac{1 - H}{H}\,\ell_{incr}(\ell), \quad \frac{\ell_{decr}(\ell)}{\ell} = 1 - H, \quad \frac{\ell_{incr}(\ell)}{\ell} = H, \quad \frac{\ell_{decr}(\ell)}{\ell} - \frac{\ell_{incr}(\ell)}{\ell} = 1 - 2H,$$
where $\ell \in (0, 1]$. Hence, the asymmetry of the triangular distribution is constant. The distribution is symmetric if $H = 1/2$, skewed to the right if $H < 1/2$, and skewed to the left otherwise. The odds-asymmetry is equal to $(1 - H)/H$.

4.3 Γ-distribution

Let $f$ be a Γ-density,
$$f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}, \quad x \ge 0. \qquad (3)$$
We assume here that $\alpha > 1$. In this case the distribution is unimodal with mode $M = (\alpha - 1)/\lambda$ such that
$$f(M) > f(x) \quad \text{for every } x \neq M. \qquad (4)$$

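The equation $f(M + \ell_{decr}) = f(M - \ell_{incr})$, where $\ell_{decr} \ge 0$ and $\ell_{incr} \ge 0$, will be satisfied if
$$(M + \ell_{decr})^{\alpha - 1} e^{-\lambda(M + \ell_{decr})} = (M - \ell_{incr})^{\alpha - 1} e^{-\lambda(M - \ell_{incr})},$$
which can be written as
$$\left(\frac{M + \ell_{decr}}{M - \ell_{incr}}\right)^{\alpha - 1} = e^{\lambda(M + \ell_{decr} - M + \ell_{incr})} = e^{\lambda\ell}. \qquad (5)$$
Also,
$$\frac{M + \ell_{decr}}{M - \ell_{incr}} = \frac{(M - \ell_{incr}) + (\ell_{incr} + \ell_{decr})}{M - \ell_{incr}} = 1 + \frac{\ell}{M - \ell_{incr}}. \qquad (6)$$
From equations (5) and (6) we get
$$1 + \frac{\ell}{M - \ell_{incr}} = e^{\lambda\ell/(\alpha - 1)}.$$
Hence,
$$\frac{\ell}{M - \ell_{incr}} = e^{\lambda\ell/(\alpha - 1)} - 1, \quad \text{so} \quad M - \ell_{incr} = \frac{\ell}{e^{\lambda\ell/(\alpha - 1)} - 1}.$$
Finally, using the identity $\ell_{decr} + \ell_{incr} = \ell$, we get
$$\ell_{incr} = M - \frac{\ell}{e^{\lambda\ell/(\alpha - 1)} - 1}, \qquad \ell_{decr} = \ell - M + \frac{\ell}{e^{\lambda\ell/(\alpha - 1)} - 1}.$$
Hence, when $\alpha > 1$ the confidence density of the Γ-distribution is
$$g(\ell) = f(M + \ell_{decr}) = \frac{\lambda^\alpha}{\Gamma(\alpha)} \left(\ell + \frac{\ell}{e^{\lambda\ell/(\alpha - 1)} - 1}\right)^{\alpha - 1} \exp\left\{-\lambda\left(\ell + \frac{\ell}{e^{\lambda\ell/(\alpha - 1)} - 1}\right)\right\}, \quad \ell \ge 0.$$
From the way we defined $\ell_{decr}$ and $\ell_{incr}$, they should have the following limiting behaviour:
$$\ell_{decr} \to \infty \text{ as } \ell \to \infty, \qquad \ell_{decr} \to 0 \text{ as } \ell \to 0, \qquad \ell_{incr} \to M \text{ as } \ell \to \infty, \qquad \ell_{incr} \to 0 \text{ as } \ell \to 0.$$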
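The closed-form expressions for $\ell_{incr}$, $\ell_{decr}$ and $g$ can be checked numerically. The following minimal Python sketch (helper names hypothetical) evaluates them for the Γ-density (3), using `math.expm1` for $e^{\lambda\ell/(\alpha-1)} - 1$ to avoid cancellation at small $\ell$, and approximates the coefficients $r_{pos}$ and $r_{neg}$ of (1) with a composite Simpson rule. The choice $\alpha = 4$, $\lambda = 1$ (mode $M = 3$), the lower cut-off and the truncation point are illustrative assumptions.

```python
import math

def gamma_pdf(x, alpha, lam):
    """Gamma density (3) with shape alpha and rate lam."""
    if x <= 0.0:
        return 0.0
    return lam ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-lam * x)

def gamma_asymmetry(ell, alpha, lam):
    """Closed-form (ell_incr, ell_decr) for the Gamma density, alpha > 1."""
    M = (alpha - 1) / lam              # mode
    q = ell / math.expm1(ell / M)      # ell / (exp(lam * ell / (alpha - 1)) - 1)
    return M - q, ell - M + q

def gamma_confidence_density(ell, alpha, lam):
    """g(ell) = f(M + ell_decr(ell))."""
    M = (alpha - 1) / lam
    _, ell_decr = gamma_asymmetry(ell, alpha, lam)
    return gamma_pdf(M + ell_decr, alpha, lam)

def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * func(a + i * h)
    return s * h / 3

alpha, lam = 4.0, 1.0
lo, upper = 1e-9, 80.0  # avoid the 0/0 at ell = 0; g is negligible beyond 80 here

mass = simpson(lambda ell: gamma_confidence_density(ell, alpha, lam), lo, upper)
r_pos = simpson(lambda ell: gamma_asymmetry(ell, alpha, lam)[1] / ell
                * gamma_confidence_density(ell, alpha, lam), lo, upper)
r_neg = simpson(lambda ell: gamma_asymmetry(ell, alpha, lam)[0] / ell
                * gamma_confidence_density(ell, alpha, lam), lo, upper)
```

Since $\ell_{decr} + \ell_{incr} = \ell$, the two numerical coefficients should add up to the total mass of $g$, which in turn should be close to one; for this right-skewed case $r_{pos}$ exceeds $1/2$.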

This is indeed so, since it is easy to verify that for any $c > 0$
$$\frac{\ell}{e^{c\ell} - 1} \to 0 \text{ as } \ell \to \infty \qquad \text{and} \qquad \frac{\ell}{e^{c\ell} - 1} \to \frac{1}{c} \text{ as } \ell \to 0,$$
and here $c = \lambda/(\alpha - 1) = 1/M$. We also have $g(0) = f(M)$, as expected. Thus, as $\ell \to \infty$, the odds-asymmetry tends to infinity, while the right-asymmetry and the asymmetry $\ell_{decr}/\ell - \ell_{incr}/\ell$ tend to 1.

We see that the asymmetry curve of the Γ-distribution depends only on $M = (\alpha - 1)/\lambda$. In other words, for Γ-distributions having the same mode, the asymmetry curves and the functional measures of asymmetry derived from them (right-, left-, and odds-asymmetry) are the same. For comparison, the usual coefficient of skewness is equal to $2/\sqrt{\alpha}$ and so depends on $\alpha$ but not on $\lambda$.

5 Distributions with given asymmetry

The confidence transformation has a number of desirable properties. In particular, it preserves the entropy and other information theoretic properties of the original distribution; see Boshnakov (2003) for details. It is therefore justifiable to classify distributions by their confidence characteristics. By reversing the above process, distributions with specified asymmetry properties may be generated.

5.1 Constant asymmetry

Suppose that $g$ is the confidence density of some unimodal distribution and we wish to create a distribution with the same confidence characteristic but with odds-asymmetry $c > 0$. Using the established notation, the following relations should be satisfied:
$$\frac{\ell_{decr}}{\ell_{incr}} = c, \qquad \ell_{decr} + \ell_{incr} = \ell, \qquad f(m + \ell_{decr}) = f(m - \ell_{incr}) = g(\ell). \qquad (7)$$
Let $x \ge 0$, $y \ge 0$. The required density is
$$f(m + x) = g\left(\frac{1 + c}{c}\,x\right), \qquad f(m - y) = g((1 + c)\,y).$$

5.2 An asymmetric normal family

The confidence distribution function and confidence density of the standard normal distribution are $G(\ell) = 2\Phi(\ell/2) - 1$ and $g(\ell) = \varphi(\ell/2) = \exp(-\ell^2/8)/\sqrt{2\pi}$. The above formulae then give
$$f(m + x) = \frac{1}{\sqrt{2\pi}}\, e^{-\left(\frac{1 + c}{c}\right)^2 x^2/8}, \qquad f(m - y) = \frac{1}{\sqrt{2\pi}}\, e^{-(1 + c)^2 y^2/8}.$$
The distributions obtained by varying $c$ and $m$ form a family of distributions having constant asymmetry and the same confidence characteristic as the standard normal distribution.

5.3 Another family

It may be more convenient in some circumstances to express $\ell$ and $\ell_{incr}$ in terms of $\ell_{decr}$. So, let
$$\ell_{incr} = u(\ell_{decr}), \qquad \ell = \ell_{decr} + u(\ell_{decr}), \qquad (8)$$
where $u(\cdot)$ is an appropriate function. Then we may define a new density $f$ by
$$f(m + \ell_{decr}) = f(m - \ell_{incr}) = g(\ell) = g(\ell_{decr} + u(\ell_{decr})). \qquad (9)$$
For example, if we take $g(z) = \lambda e^{-\lambda z}$ to be the exponential density and $u(x) = x^2$, then we get
$$f(m + \ell_{decr}) = f(m - \ell_{decr}^2) = g(\ell_{decr} + u(\ell_{decr})) = \lambda e^{-\lambda(\ell_{decr} + \ell_{decr}^2)}.$$
Writing $z$ for the signed distance from the mode, this can be written as
$$f(m + z) = \begin{cases} g(z + z^2) = \lambda e^{-\lambda(z + z^2)}, & z \ge 0, \\ g(|z| + \sqrt{|z|}) = \lambda e^{-\lambda(|z| + \sqrt{|z|})}, & z < 0. \end{cases}$$
Now the asymmetry is not constant. For example, the odds-asymmetry is $\ell_{decr}/\ell_{decr}^2 = 1/\ell_{decr}$.

6 Conclusion

We defined some measures of asymmetry which provide useful information about this property of distributions. The asymmetry curve and its variants provide comprehensive information, while their averaged counterparts summarise it in single numbers. These measures provide a systematic way to generate asymmetric distributions.
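The constructions of Section 5 can also be checked numerically. The following is a minimal Python sketch with hypothetical helper names: `constant_odds_density` builds the density of Section 5.1 from a confidence density $g$ and an odds-asymmetry $c$, applied here to the normal confidence density of Section 5.2, and `u_square_density` is the example of Section 5.3 with $u(x) = x^2$. A midpoint rule over a finite interval (an illustrative assumption) checks that each construction integrates to one and that, in the constant-odds case, the mass to the right of the mode is $c/(1 + c)$, which follows from the substitution $\ell = (1 + c)x/c$.

```python
import math

def constant_odds_density(g, c, m=0.0):
    """Section 5.1: density with confidence density g and constant odds-asymmetry c > 0.

    f(m + x) = g((1 + c) x / c) for x >= 0 and f(m - y) = g((1 + c) y) for y > 0.
    """
    def f(z):
        if z >= m:
            return g((1.0 + c) * (z - m) / c)
        return g((1.0 + c) * (m - z))
    return f

def normal_confidence_density(ell):
    """Section 5.2: g(ell) = phi(ell / 2) = exp(-ell^2 / 8) / sqrt(2 pi)."""
    return math.exp(-ell * ell / 8.0) / math.sqrt(2.0 * math.pi)

def u_square_density(lam, m=0.0):
    """Section 5.3 example: g(z) = lam * exp(-lam * z) and ell_incr = ell_decr ** 2."""
    def f(z):
        d = z - m
        if d >= 0.0:
            # right of the mode: ell_decr = d, so ell = d + d^2
            return lam * math.exp(-lam * (d + d * d))
        # left of the mode: ell_incr = |d| and ell_decr = sqrt(|d|)
        a = -d
        return lam * math.exp(-lam * (a + math.sqrt(a)))
    return f

def midpoint_integral(func, a, b, n=60000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

c = 2.0
f_norm = constant_odds_density(normal_confidence_density, c)
total_norm = midpoint_integral(f_norm, -30.0, 30.0)
right_norm = midpoint_integral(f_norm, 0.0, 30.0)

f_u = u_square_density(1.0)
total_u = midpoint_integral(f_u, -30.0, 30.0)
```

The point $m + c\,y$ and the point $m - y$ receive the same density value, which is exactly the constant odds-asymmetry $c$; in the second family the matching pairs are $m + \ell_{decr}$ and $m - \ell_{decr}^2$.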

[Figure 1: asymmetry curves of three Gamma distributions (plot not reproduced)]

Figure 1: Asymmetry curves, $(\ell_{decr}(\ell), \ell_{incr}(\ell))$, $\ell > 0$, of Gamma(1,4), Gamma(1,20) and Gamma(1,120). When the second parameter is increased, the curves are closer to the straight line with slope 1 (almost the same in the case of Gamma(1,120) over the plotted range), the interpretation being that the corresponding distributions become more symmetric. All pictured curves, however, have horizontal asymptotes because the left limit of the support of the Gamma distribution is finite.

References

Boshnakov, G. N. (2003). Confidence characteristics of distributions. Statistics & Probability Letters 63(4), 353–360.

MacGillivray, H. L. (1986). Skewness and asymmetry: Measures and orderings. Ann. Stat. 14, 994–1011.