arxiv: v2 [math.pr] 8 Feb 2016


Bounds on Tail Probabilities in Exponential Families

Peter Harremoës
Copenhagen Business College, Copenhagen, Denmark
E-mail: harremoes@ieee.org

Abstract In this paper we present various new inequalities for tail probabilities for distributions that are elements of the most important exponential families. These families include the Poisson distributions, the Gamma distributions, the binomial distributions, the negative binomial distributions and the inverse Gaussian distributions. All these exponential families have simple variance functions, and the variance functions play an important role in the exposition. All the inequalities presented in this paper are formulated in terms of the signed log-likelihood. The inequalities are of a qualitative nature in that they can be formulated either in terms of stochastic domination or in terms of an intersection property that states that a certain discrete distribution is very close to a certain continuous distribution.

Keywords Tail probability · exponential family · signed log-likelihood · variance function · inequalities

Mathematics Subject Classification E5 6E7 60F0

1 Introduction

Let $X_1, \dots, X_n$ be iid random variables such that the moment generating function $\beta \mapsto E[\exp(\beta X_1)]$ is finite in a neighborhood of the origin. For a fixed value of $\mu$ one is interested in approximating the tail distribution

$$\Pr\Big(\frac{1}{n}\sum_{i=1}^{n} X_i \le \mu\Big).$$

If $\mu$ is close to the mean of $X_1$ one would usually approximate the tail probability by the tail probability of a Gaussian random variable. If $\mu$ is far from the mean of $X_1$ the tail probability can be estimated using large deviation theory. According to the Sanov theorem the probability that the deviation from the mean is as large as $\mu$ is of the order $\exp(-nD)$ where $D$ is a constant, or to be more precise

$$\frac{1}{n}\ln \Pr\Big(\frac{1}{n}\sum_{i=1}^{n} X_i \le \mu\Big) \to -D \quad \text{for } n \to \infty.$$

Bahadur and Rao [1, 2] improved the estimate of this large deviation probability, and in [5] such Gaussian tail approximations were extended to situations where one normally uses large deviation techniques.

The distribution of the signed log-likelihood is close to a standard Gaussian for a variety of distributions. As an asymptotic result for large sample sizes this is not new [10], but in this paper we are interested in inequalities that hold for any sample size. Some inequalities of this type can be found in [?, 9, 6, ?], but here we attempt to give a more systematic presentation including a number of new or improved inequalities.

In this paper we let $\tau$ denote the circle constant $2\pi$, and $\varphi$ will denote the standard Gaussian density

$$\varphi(x) = \frac{\exp(-x^2/2)}{\tau^{1/2}}.$$

We let $\Phi$ denote the distribution function of the standard Gaussian,

$$\Phi(t) = \int_{-\infty}^{t} \varphi(x)\,dx.$$

The rest of the paper is organized as follows. In Section 2 we define the signed log-likelihood of exponential families and look at some of its fundamental properties. Next we study inequalities for the signed log-likelihood for certain exponential families associated with continuous waiting times. We start with the inverse Gaussian in Section 3, which is particularly simple. Then we study the exponential distributions (Section 4) and more general Gamma distributions (Section 5). Next we turn our attention to discrete waiting times. First we obtain some new inequalities for the geometric distributions (Section 6) and then we generalize the results to negative binomial distributions (Section 7). The negative binomial distributions are waiting times in Bernoulli processes, so in Section 8 our inequalities between negative binomial distributions and Gamma distributions are translated into inequalities between binomial distributions and Poisson distributions. Combined with our domination inequalities for Gamma distributions we obtain an intersection inequality between binomial distributions and the standard Gaussian distribution. In this paper the focus is on intersection inequalities and stochastic domination inequalities, but in the discussion we mention some related inequalities of other types and how they may be improved.

2 The signed log-likelihood for exponential families

Consider the 1-dimensional exponential family $P_\beta$ where

$$\frac{dP_\beta}{dP_0}(x) = \frac{\exp(\beta x)}{Z(\beta)}$$

and $Z$ denotes the moment generating function given by $Z(\beta) = \int \exp(\beta x)\,dP_0 x$. Let $P^\mu$ denote the element in the exponential family with mean value $\mu$, and let $\hat\beta(\mu)$

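denote the corresponding maximum likelihood estimate of $\beta$. Let $\mu_0$ denote the mean value of $P_0$. Then

$$D(P^\mu \| P_0) = \int \ln\Big(\frac{dP^\mu}{dP_0}(x)\Big)\,dP^\mu x.$$

The variance function of an exponential family is defined so that $V(\mu)$ is the variance of $P^\mu$. The variance function uniquely characterizes the corresponding exponential family, and most important exponential families have very simple variance functions. If we know the variance function, the divergence can be calculated according to the following formula.

Lemma 1 In an exponential family $(P^\mu)$ parametrized by mean value $\mu$ and with variance function $V$, information divergence can be calculated according to the formula

$$D(P^{\mu_1} \| P^{\mu_2}) = \int_{\mu_1}^{\mu_2} \frac{\mu - \mu_1}{V(\mu)}\,d\mu.$$

Proof The divergence is given by

$$D(P^{\mu_1} \| P^{\mu_2}) = \int \ln\Big(\frac{dP^{\mu_1}}{dP^{\mu_2}}(x)\Big)\,dP^{\mu_1}x = \int \ln\Big(\frac{\exp(\beta_1 x)/Z(\beta_1)}{\exp(\beta_2 x)/Z(\beta_2)}\Big)\,dP^{\mu_1}x$$
$$= \int \big((\beta_1 - \beta_2)x - \ln Z(\beta_1) + \ln Z(\beta_2)\big)\,dP^{\mu_1}x = (\beta_1 - \beta_2)\mu_1 - \ln Z(\beta_1) + \ln Z(\beta_2).$$

The derivative with respect to $\beta_2$ is

$$\frac{\partial}{\partial \beta_2} D(P^{\mu_1} \| P^{\mu_2}) = \mu_2 - \mu_1.$$

Therefore the derivative with respect to $\mu_2$ is

$$\frac{\partial}{\partial \mu_2} D(P^{\mu_1} \| P^{\mu_2}) = \frac{\mu_2 - \mu_1}{d\mu_2/d\beta_2} = \frac{\mu_2 - \mu_1}{V(\mu_2)}.$$

Together with the trivial identity $D(P^{\mu_1} \| P^{\mu_1}) = 0$ the result follows:

$$D(P^{\mu_1} \| P^{\mu_2}) = \int_{\mu_1}^{\mu_2} \frac{\mu - \mu_1}{V(\mu)}\,d\mu.$$

Definition 1 (From [3]) Let $X$ be a random variable with distribution $P_0$. Then the signed log-likelihood $G(X)$ of $X$ is the random variable given by

$$G(x) = \begin{cases} -[2D(P^x \| P_0)]^{1/2}, & \text{for } x < \mu_0; \\ +[2D(P^x \| P_0)]^{1/2}, & \text{for } x \ge \mu_0. \end{cases}$$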
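The integral formula of Lemma 1 and the definition of the signed log-likelihood can be tried out numerically. The following is only a minimal sketch: the function names are hypothetical, the integral is approximated with a plain midpoint rule, and the two closed forms used for comparison are the ones that appear for the exponential family (variance function $\mu^2$) and the inverse Gaussian family (variance function $\mu^3/\lambda$) in the sections on those distributions.

```python
import math


def divergence(x, mu0, variance, steps=20000):
    """Midpoint-rule approximation of D(P^x || P^mu0) via Lemma 1.

    With mu_1 = x and mu_2 = mu0 the lemma gives
    integral_{x}^{mu0} (mu - x) / V(mu) dmu = integral_{mu0}^{x} (x - mu) / V(mu) dmu.
    """
    h = (x - mu0) / steps  # negative when x < mu0, which keeps the sign right
    total = 0.0
    for i in range(steps):
        mu = mu0 + (i + 0.5) * h
        total += (x - mu) / variance(mu)
    return total * h


def signed_log_likelihood(x, mu0, variance):
    """G(x) = sign(x - mu0) * sqrt(2 * D(P^x || P^mu0)), as in Definition 1."""
    d = max(divergence(x, mu0, variance), 0.0)
    return math.copysign(math.sqrt(2.0 * d), x - mu0)


def divergence_exponential(x, mu0):
    """Closed form for V(mu) = mu^2."""
    return x / mu0 - 1.0 - math.log(x / mu0)


def divergence_inverse_gaussian(x, mu0, lam):
    """Closed form for V(mu) = mu^3 / lam."""
    return lam * (x - mu0) ** 2 / (2.0 * x * mu0 ** 2)


if __name__ == "__main__":
    for x in (0.2, 3.0):
        print(x, divergence(x, 1.0, lambda m: m * m), divergence_exponential(x, 1.0))
```

The only input that changes between families is the variance function, which is the point of Lemma 1.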

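We will need the following general lemma.

Lemma 2 If the variance function is increasing then $G(x)/(x - \mu_0)$ is a decreasing function of $x$.

Proof Write $D(x) = D(P^x \| P_0)$, so that $G(x)^2 = 2D(x)$, $G'(x) = D'(x)/G(x)$ and $D'(x) = \int_{\mu_0}^{x} \frac{1}{V(\mu)}\,d\mu$. We have

$$\frac{d}{dx}\frac{G(x)}{x - \mu_0} = \frac{(x - \mu_0)G'(x) - G(x)}{(x - \mu_0)^2} = \frac{(x - \mu_0)\int_{\mu_0}^{x}\frac{1}{V(\mu)}\,d\mu - 2D(x)}{(x - \mu_0)^2\,G(x)}.$$

We have to prove that the numerator is positive for $x < \mu_0$ and negative for $x > \mu_0$. The numerator can be calculated as

$$(x - \mu_0)\int_{\mu_0}^{x}\frac{1}{V(\mu)}\,d\mu - 2\int_{\mu_0}^{x}\frac{x - \mu}{V(\mu)}\,d\mu = -\int_{\mu_0}^{x}\frac{x + \mu_0 - 2\mu}{V(\mu)}\,d\mu.$$

If $x > \mu_0$ then, because $V$ is increasing,

$$-\int_{\mu_0}^{x}\frac{x + \mu_0 - 2\mu}{V(\mu)}\,d\mu = -\int_{\mu_0}^{\frac{x+\mu_0}{2}}\frac{x + \mu_0 - 2\mu}{V(\mu)}\,d\mu - \int_{\frac{x+\mu_0}{2}}^{x}\frac{x + \mu_0 - 2\mu}{V(\mu)}\,d\mu$$
$$\le -\int_{\mu_0}^{\frac{x+\mu_0}{2}}\frac{x + \mu_0 - 2\mu}{V\big(\frac{x+\mu_0}{2}\big)}\,d\mu - \int_{\frac{x+\mu_0}{2}}^{x}\frac{x + \mu_0 - 2\mu}{V\big(\frac{x+\mu_0}{2}\big)}\,d\mu = 0.$$

The inequality for $x < \mu_0$ is proved in the same way.

3 Inequalities for inverse Gaussian

The inverse Gaussian distribution is used to model waiting times for a Wiener process (Brownian motion) with drift. An inverse Gaussian distribution has density

$$f(w) = \Big[\frac{\lambda}{\tau w^3}\Big]^{1/2} \exp\Big(-\frac{\lambda (w - \mu)^2}{2\mu^2 w}\Big)$$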

with mean value parameter $\mu$ and shape parameter $\lambda$.

[Fig. 1: The signed log-likelihood of an inverse Gaussian distribution.]

The variance function is $V(\mu) = \mu^3/\lambda$. The divergence of an inverse Gaussian distribution with mean $\mu_1$ from an inverse Gaussian distribution with mean $\mu_2$ is

$$\int_{\mu_1}^{\mu_2}\frac{\mu - \mu_1}{\mu^3/\lambda}\,d\mu = \frac{\lambda(\mu_2 - \mu_1)^2}{2\mu_1\mu_2^2}.$$

Hence the signed log-likelihood is

$$G_{\mu,\lambda}(w) = \frac{\lambda^{1/2}(w - \mu)}{w^{1/2}\mu}.$$

We observe that

$$G_{\mu,\lambda}(w) = \Big[\frac{\lambda}{\mu}\Big]^{1/2}\frac{w/\mu - 1}{[w/\mu]^{1/2}} = \Big[\frac{\lambda}{\mu}\Big]^{1/2} G_{1,1}\Big(\frac{w}{\mu}\Big).$$

Note that the saddle-point approximation [4] is exact for the family of inverse Gaussian distributions, i.e.

$$f(w) = \frac{\varphi(G(w))}{[V(w)]^{1/2}}.$$

Lemma 3 The probability density of the random variable $G_{\mu,\lambda}(W)$ is

$$\frac{2\varphi(z)}{1 + g\big(z[\mu/\lambda]^{1/2}\big)}$$

where $g$ denotes the function $G_{1,1}^{-1}$.

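Proof The density of $G_{\mu,\lambda}(W)$ is

$$\frac{f\big(G_{\mu,\lambda}^{-1}(z)\big)}{G'_{\mu,\lambda}\big(G_{\mu,\lambda}^{-1}(z)\big)} = \frac{\varphi\big(G_{\mu,\lambda}(G_{\mu,\lambda}^{-1}(z))\big)}{\big[V(G_{\mu,\lambda}^{-1}(z))\big]^{1/2}\,G'_{\mu,\lambda}\big(G_{\mu,\lambda}^{-1}(z)\big)} = \frac{\varphi(z)}{\big[V(G_{\mu,\lambda}^{-1}(z))\big]^{1/2}\,G'_{\mu,\lambda}\big(G_{\mu,\lambda}^{-1}(z)\big)}.$$

Now we use that

$$G_{\mu,\lambda}(w) = \frac{w^{1/2}}{\mu}\lambda^{1/2} - \frac{\lambda^{1/2}}{w^{1/2}}.$$

Hence

$$G'_{\mu,\lambda}(w) = \frac{\lambda^{1/2}}{2 w^{1/2}\mu} + \frac{\lambda^{1/2}}{2 w^{3/2}} = \lambda^{1/2}\frac{\mu + w}{2 w^{3/2}\mu}$$

and

$$[V(w)]^{1/2}\,G'_{\mu,\lambda}(w) = \Big[\frac{w^3}{\lambda}\Big]^{1/2}\lambda^{1/2}\frac{\mu + w}{2 w^{3/2}\mu} = \frac{\mu + w}{2\mu}.$$

Therefore the density of $G_{\mu,\lambda}(W)$ is

$$f(z) = \frac{2\mu\,\varphi(z)}{G_{\mu,\lambda}^{-1}(z) + \mu}.$$

By isolating $x$ in the equation $G_{\mu,\lambda}(x) = z$ we get

$$G_{\mu,\lambda}^{-1}(z) = \mu\,G_{1,1}^{-1}\Big(\Big[\frac{\mu}{\lambda}\Big]^{1/2} z\Big).$$

Hence

$$f(z) = \frac{2\varphi(z)}{G_{1,1}^{-1}\big(z[\mu/\lambda]^{1/2}\big) + 1},$$

which proves the lemma.

Lemma 4 (From [6]) Let $X_1$ and $X_2$ denote random variables with density functions $f_1$ and $f_2$. If $f_1(x) \ge f_2(x)$ for $x \le x_0$ and $f_1(x) \le f_2(x)$ for $x \ge x_0$, then $X_1$ is stochastically dominated by $X_2$. In particular, if $f_2(x)/f_1(x)$ is increasing then $X_1$ is stochastically dominated by $X_2$.

Proof Assume that $f_1(x) \ge f_2(x)$ for $x \le x_0$ and that $f_1(x) \le f_2(x)$ for $x \ge x_0$. For $t \le x_0$ we have

$$P(X_1 \le t) = \int_{-\infty}^{t} f_1(x)\,dx \ge \int_{-\infty}^{t} f_2(x)\,dx = P(X_2 \le t).$$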


[Fig. 2: The density of the signed log-likelihood of an inverse Gaussian distribution (blue) compared with the density of a standard Gaussian distribution (red).]

[Fig. 3: Plot of the quantiles of a standard Gaussian vs. the same quantiles of the signed log-likelihood of the inverse Gaussian.]

Similarly it is proved that $P(X_1 \ge t) \le P(X_2 \ge t)$ for $t \ge x_0$, but this implies that $P(X_1 > t) \le P(X_2 > t)$. If $f_2(x)/f_1(x)$ is increasing then there exists a number $x_0$ such that $f_1(x) \ge f_2(x)$ for $x \le x_0$ and $f_1(x) \le f_2(x)$ for $x \ge x_0$.

Theorem 1 If $W$ is inverse Gaussian distributed $IG(\mu, \lambda)$ then the signed log-likelihood $G_{\mu,\lambda}(W)$ is stochastically dominated by the standard Gaussian distribution, i.e. the inequality

$$\Phi(G_{\mu,\lambda}(w)) \le \Pr(W \le w)$$

holds for any $w \in\, ]0, \infty[$.

Proof We have to prove that if $W$ has an inverse Gaussian distribution then $G(W)$ is stochastically dominated by the standard Gaussian. According to Lemma 4 we can prove stochastic dominance by proving that $\varphi(z)/f(z)$ is increasing. Now

$$\frac{\varphi(z)}{f(z)} = \frac{g\big(z[\mu/\lambda]^{1/2}\big) + 1}{2},$$

which is increasing because the function $g$ is increasing.
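Theorem 1 can also be checked directly against the closed-form distribution function of the inverse Gaussian. The sketch below is not taken from the paper: the function names are hypothetical, and the formula $F(w) = \Phi\big(\sqrt{\lambda/w}\,(w/\mu - 1)\big) + e^{2\lambda/\mu}\,\Phi\big(-\sqrt{\lambda/w}\,(w/\mu + 1)\big)$ is the standard textbook expression, used here as an assumption. Its first term is exactly $\Phi(G_{\mu,\lambda}(w))$, so the inequality of the theorem amounts to the second term being non-negative.

```python
import math


def phi_cdf(t):
    """Standard Gaussian distribution function Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def signed_ll_ig(w, mu, lam):
    """G_{mu,lambda}(w) = lambda^(1/2) (w - mu) / (w^(1/2) mu)."""
    return math.sqrt(lam / w) * (w / mu - 1.0)


def ig_cdf(w, mu, lam):
    """Textbook closed form of Pr(W <= w) for W ~ IG(mu, lambda) (assumed, not from the paper)."""
    a = math.sqrt(lam / w)
    first = phi_cdf(a * (w / mu - 1.0))  # equals Phi(G(w))
    second = math.exp(2.0 * lam / mu) * phi_cdf(-a * (w / mu + 1.0))
    return first + second


if __name__ == "__main__":
    for w in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(w, phi_cdf(signed_ll_ig(w, 1.0, 1.0)), ig_cdf(w, 1.0, 1.0))
```

The factor `exp(2 * lam / mu)` overflows for very large `lam / mu`, so the sketch is only meant for moderate parameter values.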

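If Wald random variables are added they become more and more Gaussian, and so do their signed log-likelihoods. The next theorem states that the convergence of the signed log-likelihood towards the standard Gaussian is monotone in stochastic domination.

Theorem 2 Assume that $W_1$ and $W_2$ have inverse Gaussian distributions $IG(\mu_1, \lambda_1)$ and $IG(\mu_2, \lambda_2)$, and let $G(W_1)$ and $G(W_2)$ denote their signed log-likelihoods. Then $G(W_1)$ is stochastically dominated by $G(W_2)$ if and only if $\mu_1/\lambda_1 > \mu_2/\lambda_2$.

Proof We have to prove that the densities satisfy

$$\frac{2\varphi(z)}{g\big(z[\mu_1/\lambda_1]^{1/2}\big) + 1} < \frac{2\varphi(z)}{g\big(z[\mu_2/\lambda_2]^{1/2}\big) + 1}$$

for $z > 0$ and the reverse inequality for $z < 0$. The inequality is equivalent to

$$g\big(z[\mu_1/\lambda_1]^{1/2}\big) > g\big(z[\mu_2/\lambda_2]^{1/2}\big).$$

For $z > 0$ this follows because the function $g$ is increasing. The reversed inequality is proved in the same way.

4 Exponential distributions

Although the tail probabilities of the exponential distribution are easy to calculate, the inequalities related to the signed log-likelihood of the exponential distribution are non-trivial and will be useful later. The exponential distribution $Exp_\theta$ has density

$$f(x) = \frac{1}{\theta}\exp\Big(-\frac{x}{\theta}\Big).$$

The distribution function is

$$\Pr(X \le x) = \int_0^x \frac{1}{\theta}\exp\Big(-\frac{t}{\theta}\Big)\,dt = 1 - \exp\Big(-\frac{x}{\theta}\Big).$$

The mean of the exponential distribution $Exp_\theta$ is $\theta$ and the variance is $\theta^2$, so the variance function is $V(\mu) = \mu^2$. The divergence can be calculated as

$$D(Exp_{\theta_1} \| Exp_{\theta_2}) = \int_{\theta_1}^{\theta_2}\frac{\mu - \theta_1}{\mu^2}\,d\mu = \frac{\theta_1}{\theta_2} - 1 - \ln\frac{\theta_1}{\theta_2}.$$

From this we see that

$$G_{Exp_\theta}(x) = \pm\Big[2\Big(\frac{x}{\theta} - 1 - \ln\frac{x}{\theta}\Big)\Big]^{1/2} = \gamma\Big(\frac{x}{\theta}\Big)$$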

where $\gamma$ denotes the function

$$\gamma(x) = \begin{cases} -[2(x - 1 - \ln x)]^{1/2}, & \text{when } x \le 1; \\ +[2(x - 1 - \ln x)]^{1/2}, & \text{when } x > 1. \end{cases}$$

[Fig. 4: The signed log-likelihood $\gamma(x)$ of an exponential distribution.]

Note that the saddle-point approximation is exact for the family of exponential distributions, i.e.

$$f(x) = \frac{\tau^{1/2}}{e}\,\frac{\varphi(G(x))}{[V(x)]^{1/2}}.$$

Lemma 5 The density of the signed log-likelihood of an exponential random variable is given by

$$\frac{\tau^{1/2}}{e}\,\frac{z\,\varphi(z)}{\gamma^{-1}(z) - 1}.$$

Proof Let $X$ be an $Exp_1$ distributed random variable. The density of the signed log-likelihood is

$$\frac{f(G^{-1}(z))}{G'(G^{-1}(z))} = \frac{\tau^{1/2}}{e}\,\frac{\varphi(G(G^{-1}(z)))}{[V(G^{-1}(z))]^{1/2}\,G'(G^{-1}(z))}.$$

The variance function is $V(x) = x^2$, so the density is

$$\frac{\tau^{1/2}}{e}\,\frac{\varphi(z)}{[V(G^{-1}(z))]^{1/2}\,G'(G^{-1}(z))} = \frac{\tau^{1/2}}{e}\,\frac{\varphi(z)}{G^{-1}(z)\,G'(G^{-1}(z))}.$$

From $G^2/2 = D$ it follows that $G \cdot G' = D'$, so that

$$G'(x) = \frac{dD/dx}{G(x)} = \frac{1 - 1/x}{G(x)}.$$

Hence the density of $G(X)$ can be written as

$$\frac{\tau^{1/2}}{e}\,\frac{\varphi(z)\,G(G^{-1}(z))}{G^{-1}(z)\big(1 - 1/G^{-1}(z)\big)} = \frac{\tau^{1/2}}{e}\,\frac{z\,\varphi(z)}{G^{-1}(z) - 1} = \frac{\tau^{1/2}}{e}\,\frac{z\,\varphi(z)}{\gamma^{-1}(z) - 1},$$

which proves the lemma.

[Fig. 5: Plot of the quantiles of a standard Gaussian vs. the quantiles of the signed log-likelihood of an exponential distribution.]

Theorem 3 The signed log-likelihood of an exponentially distributed random variable is stochastically dominated by the standard Gaussian.

Proof The quotient between the density of a standard Gaussian and the density of $G(X)$ is

$$\frac{e}{\tau^{1/2}}\,\frac{\gamma^{-1}(z) - 1}{z}.$$

We have to prove that this quotient is increasing. The function $\gamma$ is increasing, so it is sufficient to prove that $t \mapsto \frac{t - 1}{\gamma(t)}$ is increasing or, equivalently, that $\frac{\gamma(t)}{t - 1}$ is decreasing. This follows from Lemma 2 because the variance function is increasing.
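The two facts used for Theorem 3 lend themselves to a quick numerical check. The sketch below uses hypothetical function names and a finite grid, so it illustrates the statement rather than proving it. It evaluates the density quotient in the parametrization $t = \gamma^{-1}(z)$, which avoids inverting $\gamma$, and compares $\Phi(\gamma(x/\theta))$ with the exponential distribution function.

```python
import math


def phi_cdf(t):
    """Standard Gaussian distribution function Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def gamma_fn(x):
    """gamma(x) = sign(x - 1) * [2 (x - 1 - ln x)]^(1/2)."""
    d = max(x - 1.0 - math.log(x), 0.0)
    return math.copysign(math.sqrt(2.0 * d), x - 1.0)


def density_quotient(t):
    """(e / tau^(1/2)) (t - 1) / gamma(t), i.e. phi(z) / f_G(z) at z = gamma(t), for t != 1."""
    tau = 2.0 * math.pi
    return math.e / math.sqrt(tau) * (t - 1.0) / gamma_fn(t)


def signed_ll_exp(x, theta):
    """Signed log-likelihood of Exp_theta (mean theta) at x."""
    return gamma_fn(x / theta)


def exp_cdf(x, theta):
    """Pr(X <= x) = 1 - exp(-x / theta)."""
    return -math.expm1(-x / theta)


if __name__ == "__main__":
    for x in (0.1, 1.0, 5.0):
        print(x, phi_cdf(signed_ll_exp(x, 1.0)), exp_cdf(x, 1.0))
```

The point $t = 1$ is excluded from the quotient because both numerator and denominator vanish there.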

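5 Gamma distributions

The sum of $k$ exponentially distributed random variables is Gamma distributed $\Gamma(k, \theta)$, where $k$ is called the shape parameter and $\theta$ is the scale parameter. It has density

$$f(x) = \frac{1}{\theta^k\,\Gamma(k)}\,x^{k-1}\exp\Big(-\frac{x}{\theta}\Big)$$

and this formula is used to define the Gamma distribution when $k$ is not an integer. The mean of the Gamma distribution $\Gamma(k, \theta)$ is $k\theta$ and the variance is $k\theta^2$, so the variance function is $V(\mu) = \mu^2/k$. The divergence can be calculated as

$$D(\Gamma(k, \theta_1) \| \Gamma(k, \theta_2)) = \int_{k\theta_1}^{k\theta_2}\frac{\mu - k\theta_1}{\mu^2/k}\,d\mu = k\Big(\frac{\theta_1}{\theta_2} - 1 - \ln\frac{\theta_1}{\theta_2}\Big).$$

Further we have that

$$G_{\Gamma(k,\theta)}(x) = k^{1/2}\,\gamma\Big(\frac{x}{k\theta}\Big).$$

Note that the saddle-point approximation is exact for the family of Gamma distributions, i.e.

$$f(x) = \frac{k^k\exp(-k)}{\Gamma(k)}\,\frac{1}{x}\exp\Big(-k\Big(\frac{x}{k\theta} - 1 - \ln\frac{x}{k\theta}\Big)\Big) = \frac{k^k\,\tau^{1/2}\exp(-k)}{\Gamma(k)\,k^{1/2}}\,\frac{\varphi\big(G_{\Gamma(k,\theta)}(x)\big)}{[V(x)]^{1/2}}.$$

Proposition 1 If $F$ denotes the distribution function of the distribution $\Gamma(k, \theta)$ with mean $\mu = k\theta$, then $\frac{d}{d\mu}F(t)$ equals minus the density of the distribution $\Gamma(k + 1, \theta)$.

Proof We have

$$F(t) = \int_0^t \frac{1}{\theta^k\,\Gamma(k)}\,x^{k-1}\exp\Big(-\frac{x}{\theta}\Big)\,dx = \int_0^{t/\theta}\frac{1}{\Gamma(k)}\,y^{k-1}\exp(-y)\,dy.$$

Hence

$$\frac{d}{d\theta}F(t) = \frac{1}{\Gamma(k)}\Big(\frac{t}{\theta}\Big)^{k-1}\exp\Big(-\frac{t}{\theta}\Big)\Big(-\frac{t}{\theta^2}\Big) = -\frac{k}{\theta^{k+1}\,\Gamma(k+1)}\,t^k\exp\Big(-\frac{t}{\theta}\Big),$$

$$\frac{d}{d\mu}F(t) = -\frac{1}{\theta^{k+1}\,\Gamma(k+1)}\,t^k\exp\Big(-\frac{t}{\theta}\Big).$$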

[Fig. 6: The quantiles of a standard Gaussian vs. Gamma distributions for three values of the shape parameter $k$ (blue, black, and green). The red line corresponds to a perfect match with a standard Gaussian.]

The dependence on shape and scaling is determined from the equation

$$D\Big(\Gamma\Big(k, \frac{x}{k}\Big) \Big\| \Gamma(k, \theta)\Big) = k\Big(\frac{x}{k\theta} - 1 - \ln\frac{x}{k\theta}\Big).$$

From this we see that

$$G_{k,\theta}(x) = \pm\Big[2k\Big(\frac{x}{k\theta} - 1 - \ln\frac{x}{k\theta}\Big)\Big]^{1/2} = \pm k^{1/2}\Big[2\Big(\frac{x}{k\theta} - 1 - \ln\frac{x}{k\theta}\Big)\Big]^{1/2} = k^{1/2}\,\gamma\Big(\frac{x}{k\theta}\Big),$$

which proves the proposition.

The following lemma is proved in the same way as Lemma 5.

Lemma 6 The density of the signed log-likelihood of a Gamma random variable is given by

$$\frac{k^k\,\tau^{1/2}\exp(-k)}{\Gamma(k)\,k^{1/2}}\cdot\frac{z/k^{1/2}}{\gamma^{-1}\big(z/k^{1/2}\big) - 1}\,\varphi(z).$$

Theorem 4 (From [6]) The signed log-likelihood of a Gamma distributed random variable is stochastically dominated by the standard Gaussian, i.e.

$$\Pr(X \le x) \ge \Phi\big(G_{\Gamma(k,\theta)}(x)\big).$$

Proof This is proved in the same way as the corresponding result for exponential distributions.
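For integer shape parameters the bound of Theorem 4 can be evaluated with elementary functions only. The sketch below assumes integer $k$, so that the Gamma distribution function has the Erlang form $1 - e^{-x/\theta}\sum_{j<k}(x/\theta)^j/j!$; the theorem itself is not restricted to integers. The function names are hypothetical.

```python
import math


def phi_cdf(t):
    """Standard Gaussian distribution function Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def gamma_fn(x):
    """gamma(x) = sign(x - 1) * [2 (x - 1 - ln x)]^(1/2)."""
    d = max(x - 1.0 - math.log(x), 0.0)
    return math.copysign(math.sqrt(2.0 * d), x - 1.0)


def signed_ll_gamma(x, k, theta):
    """G_{Gamma(k, theta)}(x) = k^(1/2) gamma(x / (k theta))."""
    return math.sqrt(k) * gamma_fn(x / (k * theta))


def erlang_cdf(x, k, theta):
    """Pr(X <= x) for X ~ Gamma(k, theta) with integer shape k (Erlang form, an assumption here)."""
    t = x / theta
    term, partial_sum = 1.0, 0.0  # term holds t^j / j!
    for j in range(k):
        partial_sum += term
        term *= t / (j + 1)
    return 1.0 - math.exp(-t) * partial_sum


if __name__ == "__main__":
    k, theta = 5, 1.0
    for x in (1.0, 3.0, 5.0, 10.0):
        print(x, phi_cdf(signed_ll_gamma(x, k, theta)), erlang_cdf(x, k, theta))
```

For $k = 1$ the functions reduce to the exponential case of the previous section.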

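Theorem 5 Let $X_1$ and $X_2$ denote Gamma distributed random variables with shape parameters $k_1$ and $k_2$ and scale parameters $\theta_1$ and $\theta_2$. Then the signed log-likelihood of $X_1$ is dominated by the signed log-likelihood of $X_2$ if and only if $k_1 \le k_2$.

Proof We have to prove that

$$\frac{z/k_1^{1/2}}{\gamma^{-1}\big(z/k_1^{1/2}\big) - 1}\,\varphi(z) < \frac{z/k_2^{1/2}}{\gamma^{-1}\big(z/k_2^{1/2}\big) - 1}\,\varphi(z)$$

for $z > 0$ and the reverse inequality for $z < 0$. The inequality is equivalent to

$$\frac{\gamma^{-1}\big(z/k_2^{1/2}\big) - 1}{z/k_2^{1/2}} < \frac{\gamma^{-1}\big(z/k_1^{1/2}\big) - 1}{z/k_1^{1/2}}.$$

This follows because the function $t \mapsto \frac{\gamma^{-1}(t) - 1}{t}$ is increasing and because both sides have the same limit as $z$ tends to zero from the right.

6 Geometric distributions

Compounding a Poisson distribution $Po(\lambda)$ with rate parameter $\lambda$ distributed according to an exponential distribution $Exp_\theta$ leads to a geometric distribution that we will denote $Geo_\theta$. We note that this is an unusual way of parametrizing the geometric distributions, but it will be useful for some of our calculations. Since $\lambda$ is both the mean and the variance of $Po(\lambda)$, the mean of $Geo_\theta$ is $\theta$ and the variance function is $V(\mu) = \mu + \mu^2$. The point probabilities of a geometric distribution can be written as

$$\Pr(M = m) = \int_0^\infty \frac{\lambda^m}{m!}\exp(-\lambda)\,\frac{1}{\theta}\exp\Big(-\frac{\lambda}{\theta}\Big)\,d\lambda = \int_0^\infty \frac{(\theta t)^m}{m!}\exp(-\theta t)\exp(-t)\,dt = \frac{\theta^m}{(\theta + 1)^{m+1}}.$$

The distribution function can be calculated as

$$\Pr(M \le m) = \sum_{j=0}^{m}\frac{\theta^j}{(\theta + 1)^{j+1}} = 1 - \Big(\frac{\theta}{\theta + 1}\Big)^{m+1}.$$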

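[Fig. 7: Plot of the quantiles of the signed log-likelihood of $Exp_{3.5}$ vs. the quantiles of the signed log-likelihood of $Geo_{3.5}$.]

The divergence is given by

$$D(Geo_{\theta_1} \| Geo_{\theta_2}) = \int_{\theta_1}^{\theta_2}\frac{\mu - \theta_1}{\mu + \mu^2}\,d\mu = \theta_1\ln\frac{\theta_1}{\theta_2} - (\theta_1 + 1)\ln\frac{\theta_1 + 1}{\theta_2 + 1}.$$

Hence the signed log-likelihood of the geometric distribution with mean $\theta$ is given by

$$g_\theta(x) = \pm\Big[2\Big(x\ln\frac{x}{\theta} - (x + 1)\ln\frac{x + 1}{\theta + 1}\Big)\Big]^{1/2}. \qquad (2)$$

Theorem 6 Assume that the random variable $M$ has a geometric distribution $Geo_\theta$ and let the random variable $X$ be exponentially distributed $Exp_\theta$. If

$$\Pr(X \le x) = \Pr(M < m)$$

then

$$G_{Geo_\theta}(m - 1/2) \le G_{Exp_\theta}(x) \le G_{Geo_\theta}(m).$$

Proof First we note that $G_{Exp_\theta}(x) = \gamma(x/\theta)$ and $\Pr(X \le x) = \Pr(X/\theta \le x/\theta)$. Therefore we introduce the variable $y = x/\theta$ and the random variable $Y = X/\theta$ that is exponentially distributed $Exp_1$. We will prove that $\Pr(Y \le y) = \Pr(M < m)$ implies

$$g_\theta(m - 1/2) \le \gamma(y) \le g_\theta(m).$$

The two inequalities are proved separately. First we prove that $\Pr(Y \le y) = \Pr(M < m)$ implies that $g_\theta(m - 1/2) \le \gamma(y)$. Equivalently we have to prove that

$$\gamma(y) - g_\theta(m - 1/2) = \frac{\gamma(y)^2 - g_\theta(m - 1/2)^2}{\gamma(y) + g_\theta(m - 1/2)}$$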

15 Bounds on Tail Probabilities 5 is positive The probability Pr M < m is a decreasing function of Therefore the probability Pr Y y is a decreasing function of, but the destribution of Y does not depend on so y must be a decreasing function of Therefore the denominator γ y + g m / is a decreasing function of and it equals zero when m / The numerator also equals zero when m / so it is sufficient to prove that the numerator is a decreasing function of Therefore we have to prove the inequality γ y g m 0 or, equivalently, that g m γ y One also have to prove that Pr Y y Pr M < m implies that γ y g m and it is sufficient to prove that γ y g m We have γ y dy d For the geometric distribution we have g m Therefore we have to prove that m + / + dy d d γ y dy y m ln m m + m + m + + dy d m ln m y + Therefore Equation can be solved as m exp y + + y m ln The derivative is dy d m + m + m +

Finally we have to prove that m + / + m + m + / m + + / ln / ln ln m ln + m + ln + + The right inequality is trivial. The left inequality is equivalent to + ln + 0 / m + We have for The derivative is negative, which proves the theorem. + ln + 0 / / + +,

Corollary 1 Assume that the random variable $M$ has a geometric distribution $Geo_\theta$ and let the random variable $X$ be exponentially distributed $Exp_\theta$. If

$$G_{Exp_\theta}(x) = G_{Geo_\theta}(m)$$

then

$$\Pr(M < m) \le \Pr(X \le x) \le \Pr(M \le m).$$

If we plot quantiles of an exponential distribution against the corresponding quantiles of the signed log-likelihood of a geometric distribution we get a staircase function, i.e. a sequence of horizontal lines. The inequality means that the left endpoint of any step is to the left of the line $y = x$. Actually the line $y = x$ intersects each step, and we say that the plot has an intersection property, as illustrated in Figure 7.

Proof Since $\Pr(X \le x) = \Pr(M < m)$ implies $G_{Exp_\theta}(x) \le G_{Geo_\theta}(m)$, and both $\Pr(X \le x)$ and $G_{Exp_\theta}(x)$ are increasing functions of $x$, we have that $G_{Exp_\theta}(x) = G_{Geo_\theta}(m)$ implies that $\Pr(X \le x) \ge \Pr(M < m)$. Since $\Pr(X \le x) = \Pr(M < m)$

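implies $G_{Geo_\theta}(m - 1/2) \le G_{Exp_\theta}(x)$, we have that $G_{Geo_\theta}(m - 1/2) = G_{Exp_\theta}(x)$ implies that $\Pr(X \le x) \le \Pr(M < m)$. Hence $G_{Geo_\theta}(m + 1/2) = G_{Exp_\theta}(x)$ implies that $\Pr(X \le x) \le \Pr(M < m + 1) = \Pr(M \le m)$. Since $G_{Geo_\theta}(m) \le G_{Geo_\theta}(m + 1/2)$ we also have that $G_{Geo_\theta}(m) = G_{Exp_\theta}(x)$ implies that $\Pr(X \le x) \le \Pr(M \le m)$.

7 Inequalities for negative binomial distributions

Compounding a Poisson distribution $Po(\lambda)$ with rate parameter $\lambda$ distributed according to a Gamma distribution $\Gamma(k, \theta)$ leads to a negative binomial distribution. The link to waiting times in Bernoulli processes will be explored in Section 8. In this section we will parametrize the negative binomial distribution as $neg(k, \theta)$, where $k$ and $\theta$ are the parameters of the corresponding Gamma distribution. We note that this is an unusual way of parametrizing the negative binomial distribution, but it will be useful for some of our calculations. Since $\lambda$ is both the mean and the variance of $Po(\lambda)$ we can calculate the mean of $neg(k, \theta)$ as $\mu = k\theta$ and the variance as $V(\mu) = \mu + \mu^2/k$. The point probabilities of a negative binomial distribution can be written in several ways:

$$\Pr(M = m) = \int_0^\infty \frac{\lambda^m}{m!}\exp(-\lambda)\,\frac{1}{\theta^k\,\Gamma(k)}\,\lambda^{k-1}\exp\Big(-\frac{\lambda}{\theta}\Big)\,d\lambda = \int_0^\infty \frac{(\theta t)^m}{m!}\exp(-\theta t)\,\frac{1}{\Gamma(k)}\,t^{k-1}\exp(-t)\,dt = \frac{\Gamma(m + k)}{m!\,\Gamma(k)}\,\frac{\theta^m}{(\theta + 1)^{m+k}}.$$

We need an explicit formula for the divergence, which is given by

$$D(neg(k, \theta_1) \| neg(k, \theta_2)) = \int_{k\theta_1}^{k\theta_2}\frac{\mu - k\theta_1}{\mu + \mu^2/k}\,d\mu = k\Big(\theta_1\ln\frac{\theta_1}{\theta_2} - (\theta_1 + 1)\ln\frac{\theta_1 + 1}{\theta_2 + 1}\Big).$$

The signed log-likelihood is given by

$$G_{neg(k,\theta)}(x) = k^{1/2}\,g_\theta\Big(\frac{x}{k}\Big)$$

where $g_\theta$ is given by Equation (2). We will need the following lemma.

Lemma 7 A Poisson random variable $K$ with distribution $Po(\lambda)$ satisfies

$$\frac{d}{d\lambda}\Pr(K \le k) = -\Pr(K = k).$$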

Proof If $X$ is Gamma distributed $\Gamma(k + 1, 1)$ then

$$\Pr(K \le k) = \Pr(K < k + 1) = \Pr(X \ge \lambda).$$

Hence

$$\frac{d}{d\lambda}\Pr(K \le k) = -\frac{1}{\Gamma(k + 1)}\,\lambda^{(k+1)-1}\exp(-\lambda) = -\frac{\lambda^k}{k!}\exp(-\lambda),$$

which proves the lemma.

Lemma 8 If the distribution of $M_k$ is $neg(k, \theta)$ then the partial derivative of the distribution function with respect to the mean value parameter equals

$$\frac{d}{d\mu}\Pr(M_k \le m) = -\Pr(M_{k+1} = m)$$

where $M_{k+1}$ is $neg(k + 1, \theta)$.

Proof We have

$$\frac{d}{d\mu}\Pr(M_k \le m) = \frac{1}{k}\,\frac{d}{d\theta}\int_0^\infty \sum_{j=0}^{m} Po(\theta t; j)\,\frac{1}{\Gamma(k)}\,t^{k-1}\exp(-t)\,dt$$
$$= \frac{1}{k}\int_0^\infty -t\,Po(\theta t; m)\,\frac{1}{\Gamma(k)}\,t^{k-1}\exp(-t)\,dt = -\int_0^\infty Po(\theta t; m)\,\frac{1}{\Gamma(k + 1)}\,t^{k}\exp(-t)\,dt.$$

The last integral equals $\Pr(M_{k+1} = m)$, which proves the lemma.

The following theorem generalizes Corollary 1 from $k = 1$ to arbitrary positive values of $k$. We cannot use the same proof technique because we do not have an explicit formula for the quantile function of the Gamma distributions except in the case when $k = 1$. Lemma 4 cannot be used because we want to compare a discrete distribution with a continuous distribution. Instead the proof combines a proof method developed by Zubkov and Serov [11] with the ideas and results developed in the previous sections.

Theorem 7 Assume that the random variable $M$ has a negative binomial distribution $neg(k, \theta)$ and let the random variable $X$ be Gamma distributed $\Gamma(k, \theta)$. If

$$G_{\Gamma(k,\theta)}(x) = G_{neg(k,\theta)}(m)$$

then

$$\Pr(M < m) \le \Pr(X \le x) \le \Pr(M \le m). \qquad (3)$$
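The intersection property in Inequality (3) can be illustrated numerically; with $k = 1$ the same code covers Corollary 1 for geometric and exponential distributions. The sketch below rests on several assumptions that are not in the paper: the shape $k$ is an integer so that the Gamma distribution function has the elementary Erlang form, the equation $G_{\Gamma(k,\theta)}(x) = G_{neg(k,\theta)}(m)$ is solved for $x$ by plain bisection, and all function names are hypothetical.

```python
import math


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def gamma_fn(y):
    """gamma(y) = sign(y - 1) * [2 (y - 1 - ln y)]^(1/2)."""
    d = max(y - 1.0 - math.log(y), 0.0)
    return math.copysign(math.sqrt(2.0 * d), y - 1.0)


def g_geo(x, theta):
    """Equation (2): signed log-likelihood of Geo_theta at x."""
    d = xlogy(x, x / theta) - (x + 1.0) * math.log((x + 1.0) / (theta + 1.0))
    return math.copysign(math.sqrt(2.0 * max(d, 0.0)), x - theta)


def signed_ll_neg(m, k, theta):
    """G_{neg(k, theta)}(m) = k^(1/2) g_theta(m / k)."""
    return math.sqrt(k) * g_geo(m / k, theta)


def gamma_inverse(z):
    """Solve gamma(y) = z for y by bisection (gamma is increasing on ]0, inf[)."""
    if z == 0.0:
        return 1.0
    if z < 0.0:
        lo, hi = 1e-300, 1.0
    else:
        lo, hi = 1.0, 2.0
        while gamma_fn(hi) < z:
            hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_fn(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def erlang_cdf(x, k, theta):
    """Pr(X <= x) for X ~ Gamma(k, theta), integer k only (an assumption of this sketch)."""
    t = x / theta
    term, partial_sum = 1.0, 0.0
    for j in range(k):
        partial_sum += term
        term *= t / (j + 1)
    return 1.0 - math.exp(-t) * partial_sum


def neg_pmf(m, k, theta):
    """Pr(M = m) = Gamma(m + k) / (m! Gamma(k)) * theta^m / (theta + 1)^(m + k)."""
    log_p = (math.lgamma(m + k) - math.lgamma(m + 1) - math.lgamma(k)
             + m * math.log(theta) - (m + k) * math.log(theta + 1.0))
    return math.exp(log_p)


def intersection_triple(m, k, theta):
    """Return (Pr(M < m), Pr(X <= x), Pr(M <= m)) with x defined by G_Gamma(x) = G_neg(m)."""
    z = signed_ll_neg(m, k, theta)
    x = k * theta * gamma_inverse(z / math.sqrt(k))
    lower = sum(neg_pmf(j, k, theta) for j in range(m))
    return lower, erlang_cdf(x, k, theta), lower + neg_pmf(m, k, theta)


if __name__ == "__main__":
    for m in range(6):
        print(m, intersection_triple(m, 2, 1.75))
```

Each printed triple should be non-decreasing from left to right if the reconstruction of the parametrization above is right; a non-integer $k$ would require a numerical incomplete Gamma function in place of `erlang_cdf`.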

19 Bounds on Tail Probabilities 9 Proof Below we only give the proof of the upper bound in Inequality 3 The lower bound is proved the in the same way First we note that G Γ k, x G Γ k, x / and Pr X x Pr X / x / Therefore we introduce the variable y x / and the random variable Y X / that is Gamma distributed Γ k, Introduce the difference and note that δ µ 0 Pr M m Pr Y y δ 0 lim µ 0 δ µ We note that there exists at least one value of µ 0 such that δ µ 0 0 It is sufficient to prove that δ is first increasing and then decreasing in [0, [ According to Lemma 8 the derivative of Pr M m with respect to µ 0 is Γ m + k + m Pr M m µ 0 m!γ k + + m+k+ m + k Γ m + k m k + m!γ k + m+k m + k Pr M m µ 0 + k ˆ + Pr M m + where µ 0 /k is the scale parameter and where and ˆ m /k is the maximum likelihood estimate of the scale parameter Let ˆPr denote the probability of M calculated with respect to this maximum likelihood estimate ˆ Then we have m + k Pr M m + exp D ˆPr M m The condition can be written as which implies G Γ k, x G negk, m k / y γ k / g ˆ k y ˆ γ g k Differentiation with respect to gives k dy y k d ˆ + so that dy d k y ˆ +

20 0 Peter Harremoës Therefore dy Pr Y y f y d Combining these results we get kk τ / exp k Γ k k / kk τ / exp k Γ k k / kk τ / exp k Γ k k / δ m + k + ˆPr M m exp D kk τ / exp k Γ k k / exp D y k y exp D y k ˆ + exp D γ g ˆ exp D γ g ˆ kk τ / exp k Γ k ˆ γ g ˆ Remark that the first factor is positive and that exp D + Γ k k / ˆ k k τ / exp k k + ˆPr M m ˆ + ˆ + ˆ + Γ k m + k k k τ / exp k ˆPr M m is a positive number that does not depend on Therefore it is sufficient to prove that is a decreasing function of, or, equivalently, to prove that ˆ γ g ˆ γ g ˆ ˆ is an increasing function of The partial derivative with respect to is ˆ γ g ˆ + γ g ˆ γ g ˆ ˆ ˆ + + γ g ˆ γ g ˆ γ g ˆ ˆ ˆ+γ g ˆ ˆ ˆγ g ˆ We have to prove that γ g ˆ γ g ˆ ˆ ˆ + γ g ˆ

21 Bounds on Tail Probabilities If ˆ the inequality is equivalent to γ g ˆ ˆ γ g ˆ ˆ + If ˆ < the inequality is equivalent to γ g ˆ ˆ γ g ˆ ˆ + The equation s s t can be solved with respect to x, which gives the solutions s + t ± +4t] / [t For ˆ we get [ ] / ˆ + 4 ˆ For ˆ < we get γ g ˆ + γ g ˆ + ˆ ˆ+ + ˆ+ ˆ+ [ + ˆ ˆ + ˆ + + 4ˆ ˆ + ˆ ˆ+ ] / [ ˆ + 4 ˆ ˆ+ ˆ+ [ + ˆ ˆ + ˆ + + 4ˆ ˆ + Since γ is increasing we have to prove that [ ˆ + ˆ + + 4ˆ g ˆ γ ˆ + ˆ + ] / ] / ] / or, equivalently, that [ ˆ + ˆ + + 4ˆ g ˆ γ ˆ + ˆ + { { } g ˆ γ + g ˆ + γ + ] / [ / } ˆ + ˆ+ +4ˆ] ˆ ˆ+ [ / ˆ + ˆ+ +4ˆ] ˆ ˆ+

is positive. Both the denominator and the numerator are zero when $\theta = \hat\theta$. Therefore it is sufficient to prove that both the denominator and the numerator are decreasing functions of $\theta$. First we prove that the denominator is decreasing. The first term is obviously decreasing. The second term is composed of $\gamma$, which is increasing, and

$$y \mapsto \frac{y + 2 \pm [y^2 + 4y]^{1/2}}{2},$$

which is increasing or decreasing depending on the sign of $\pm$, and the function ˆ which is decreasing when ˆ and increasing when ˆ ˆ+ Therefore the composed function is a decreasing function of $\theta$. The numerator can be written as [ / { ˆ ln ˆ ˆ + ln ˆ } ˆ + ˆ+ +4ˆ] ˆ + ˆ+ [ / + ˆ + ˆ+ +4ˆ] ln + ˆ ˆ+ We calculate the derivative with respect to $\theta$, which can be written as 4 +ˆ+4 + ˆ + ˆ + [ /, ˆ + + 4ˆ] + + ˆ + + 4ˆ which is obviously less than or equal to zero.

If we want to give lower bounds and upper bounds on the tail probabilities of a negative binomial distribution, the following reformulation of Theorem 7 is useful.

Corollary 2 Assume that the random variable $M$ has a negative binomial distribution $neg(k, \theta)$ and let the random variable $X$ be Gamma distributed $\Gamma(k, \theta)$. Then

$$\Pr(X \le x_m) \le \Pr(M \le m) \le \Pr(X \le x_{m+1}) \qquad (5)$$

where $x_m$ and $x_{m+1}$ are determined by

$$G_{\Gamma(k,\theta)}(x_m) = G_{neg(k,\theta)}(m), \qquad G_{\Gamma(k,\theta)}(x_{m+1}) = G_{neg(k,\theta)}(m + 1).$$

8 Inequalities for binomial distributions and Poisson distributions

We will prove that intersection results for binomial distributions and Poisson distributions follow from the corresponding intersection result for negative binomial distributions and Gamma distributions. We note that the point probabilities of a negative binomial distribution can be written as

$$\frac{\Gamma(m + k)}{m!\,\Gamma(k)}\,\frac{\theta^m}{(\theta + 1)^{m+k}} = \frac{k^{\overline{m}}}{m!}\,p^k (1 - p)^m$$

where $p = \frac{1}{\theta + 1}$ and where $k^{\overline{m}}$ denotes the raising factorial.

[Fig. 8: Plot of the quantiles of a standard Gaussian vs. the quantiles of the signed log-likelihood of a binomial distribution.]

Let $nb(p, k)$ denote a negative binomial distribution with success probability $p$. Then $nb(p, k)$ is the distribution of the number of failures before the $k$'th success in a Bernoulli process with success probability $p$. Our inequality for the negative binomial distribution can be translated into an inequality for the binomial distribution. Assume that $K$ is binomial $bin(n, p)$ and $M$ is negative binomial $nb(p, k)$. Then

$$\Pr(K \ge k) = \Pr(M + k \le n).$$

In terms of $p$ the divergence can be written as

$$D(nb(p_1, k) \| nb(p_2, k)) = \frac{k}{p_1}\Big(p_1\ln\frac{p_1}{p_2} + (1 - p_1)\ln\frac{1 - p_1}{1 - p_2}\Big).$$

We have

$$D(bin(n, p_1) \| bin(n, p_2)) = n\Big(p_1\ln\frac{p_1}{p_2} + (1 - p_1)\ln\frac{1 - p_1}{1 - p_2}\Big)$$

so

$$D\Big(nb\Big(\frac{k}{n}, k\Big) \Big\| nb(p, k)\Big) = n\Big(\frac{k}{n}\ln\frac{k/n}{p} + \Big(1 - \frac{k}{n}\Big)\ln\frac{1 - k/n}{1 - p}\Big) = D\Big(bin\Big(n, \frac{k}{n}\Big) \Big\| bin(n, p)\Big).$$

If $G_{bin}$ is the signed log-likelihood of $bin(n, p)$ and $G_{nb}$ is the signed log-likelihood of $nb(p, k)$ then

$$G_{bin(n,p)}(k) = -G_{nb(p,k)}(n - k).$$

Let $K$ be Poisson distributed with mean $\lambda$ and let $X$ be Gamma distributed with shape parameter $k$ and scale parameter 1, i.e. the distribution of the waiting time until $k$ observations from a Poisson process with intensity 1. Then

$$\Pr(K \ge k) = \Pr(X \le \lambda).$$

Next we note that

$$D(Po(k) \| Po(\lambda)) = D\Big(\Gamma\Big(k, \frac{\lambda}{k}\Big) \Big\| \Gamma(k, 1)\Big).$$
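The translation between binomial and negative binomial distributions can be checked with exact sums. The sketch below uses hypothetical function names, integer $k \ge 1$ and small $n$; it compares $\Pr(K \ge k)$ with $\Pr(M + k \le n)$ and confirms that the two signed log-likelihoods agree up to sign, using $\theta = (1 - p)/p$ to pass from $nb(p, k)$ to the parametrization $neg(k, \theta)$.

```python
import math


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_upper_tail(k, n, p):
    """Pr(K >= k) for K ~ bin(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def nb_cdf(m, p, k):
    """Pr(M <= m) where M counts failures before the k'th success (integer k >= 1)."""
    return sum(math.comb(j + k - 1, j) * p ** k * (1.0 - p) ** j for j in range(m + 1))


def signed_ll_binomial(k, n, p):
    """Signed log-likelihood of bin(n, p) at k, with D = D(bin(n, k/n) || bin(n, p))."""
    d = xlogy(k, k / (n * p)) + xlogy(n - k, (n - k) / (n * (1.0 - p)))
    return math.copysign(math.sqrt(2.0 * max(d, 0.0)), k - n * p)


def signed_ll_nb(m, p, k):
    """Signed log-likelihood of nb(p, k) at m, via neg(k, theta) with theta = (1 - p) / p."""
    theta = (1.0 - p) / p
    x = m / k
    d = k * (xlogy(x, x / theta) - (x + 1.0) * math.log((x + 1.0) / (theta + 1.0)))
    return math.copysign(math.sqrt(2.0 * max(d, 0.0)), x - theta)


if __name__ == "__main__":
    n, p = 7, 0.3
    for k in range(1, n + 1):
        print(k, binom_upper_tail(k, n, p), nb_cdf(n - k, p, k),
              signed_ll_binomial(k, n, p), signed_ll_nb(n - k, p, k))
```

The first two columns of the output should coincide and the last two should differ only in sign.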


[Fig. 9: Plot of quantiles of a standard Gaussian vs. the signed log-likelihood of the Poisson distribution $Po(3.5)$.]

If $G_{Po(\lambda)}$ is the signed log-likelihood for $Po(\lambda)$ and $G_{\Gamma(k,1)}$ is the signed log-likelihood for $\Gamma(k, 1)$ then

$$G_{Po(\lambda)}(k) = -G_{\Gamma(k,1)}(\lambda).$$

Theorem 8 Assume that $K$ is binomially distributed $bin(n, p)$ and let $G_{bin(n,p)}$ denote the signed log-likelihood function of the exponential family based on $bin(n, p)$. Assume that $L$ is a Poisson random variable with distribution $Po(\lambda)$ and let $G_{Po(\lambda)}$ denote the signed log-likelihood function of the exponential family based on $Po(\lambda)$. If

$$G_{bin(n,p)}(k) = G_{Po(\lambda)}(k)$$

then

$$\Pr(K < k) \le \Pr(L < k) \le \Pr(K \le k). \qquad (6)$$

Proof Let $M$ denote a negative binomial random variable with distribution $nb(p, k)$ and let $X$ denote a Gamma random variable with distribution $\Gamma(k, \theta)$, where the parameter $\theta$ equals $(1 - p)/p$ such that the distributions $nb(p, k)$ and $\Gamma(k, \theta)$ have the same mean value. Now

$$G_{nb(p,k)}(n - k) = -G_{bin(n,p)}(k)$$

and

$$G_{\Gamma(k,\theta)}(\theta\lambda) = -G_{Po(\lambda)}(k).$$

Then

$$G_{nb(p,k)}(n - k) = G_{\Gamma(k,\theta)}(\theta\lambda).$$

The left part of Inequality (6) is proved as follows:

$$\Pr(K < k) = 1 - \Pr(K \ge k) = 1 - \Pr(M + k \le n) \le 1 - \Pr(X \le \theta\lambda) = 1 - \Pr(L \ge k) = \Pr(L < k).$$

The right part of the inequality is proved in the same way.

Note that Theorem 7 cannot be proved from Theorem 8 because the number parameter of a binomial distribution has to be an integer while the number parameter of a negative binomial distribution may assume any positive value. Now our inequalities for negative binomial distributions can be translated into inequalities for binomial distributions, and we can prove an intersection inequality for the binomial family as stated in the following corollary, which was recently proved by Serov and Zubkov [11].

Corollary 3 Assume that $K$ is binomially distributed $bin(n, p)$ and let $G_{bin(n,p)}$ denote the signed log-likelihood function of the exponential family based on $bin(n, p)$. Then

$$\Pr(K < k) \le \Phi\big(G_{bin(n,p)}(k)\big) \le \Pr(K \le k). \qquad (7)$$

Similarly, assume that $L$ is Poisson distributed $Po(\lambda)$ and let $G_{Po(\lambda)}$ denote the signed log-likelihood function of the exponential family based on $Po(\lambda)$. Then

$$\Pr(L < k) \le \Phi\big(G_{Po(\lambda)}(k)\big) \le \Pr(L \le k). \qquad (8)$$

Proof First we prove the left part of Inequality (8). Let $X$ denote a Gamma distributed $\Gamma(k, 1)$ random variable and let $Z$ denote a standard Gaussian. Then $G_{Po(\lambda)}(k) = -G_{\Gamma(k,1)}(\lambda)$ and, by Theorem 4,

$$\Pr(L < k) = 1 - \Pr(L \ge k) = 1 - \Pr(X \le \lambda) = \Pr(X \ge \lambda) \le \Pr\big(Z \ge G_{\Gamma(k,1)}(\lambda)\big) = \Pr\big(Z \le G_{Po(\lambda)}(k)\big) = \Phi\big(G_{Po(\lambda)}(k)\big).$$

The left part of Inequality (7) is obtained by combining the left part of Inequality (8) with the left part of Inequality (6).

The right part of Inequality (7) follows from the left part of Inequality (7) by replacing $p$ by $1 - p$ and replacing $k$ by $n - k$.

Since a Poisson distribution is a limit of binomial distributions, the right part of Inequality (8) follows from the right part of Inequality (7).

The intersection property for Poisson distributions was proved in [6], where the inequality for binomial distributions was also conjectured.
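Inequalities (7) and (8) only involve elementary quantities, so they are easy to tabulate. The following sketch uses hypothetical function names; the parameter choices $bin(7, 1/2)$ and $Po(3.5)$ in the usage lines are picked for illustration and are not asserted to be the ones behind the figures.

```python
import math


def phi_cdf(t):
    """Standard Gaussian distribution function Phi(t)."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def signed_ll_binomial(k, n, p):
    """sign(k - n p) * [2 D(bin(n, k/n) || bin(n, p))]^(1/2)."""
    d = xlogy(k, k / (n * p)) + xlogy(n - k, (n - k) / (n * (1.0 - p)))
    return math.copysign(math.sqrt(2.0 * max(d, 0.0)), k - n * p)


def signed_ll_poisson(k, lam):
    """sign(k - lam) * [2 (k ln(k / lam) - k + lam)]^(1/2)."""
    d = xlogy(k, k / lam) - k + lam
    return math.copysign(math.sqrt(2.0 * max(d, 0.0)), k - lam)


def binom_cdf(k, n, p):
    """Pr(K <= k) for K ~ bin(n, p); returns 0 for k < 0."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(0, k + 1))


def poisson_cdf(k, lam):
    """Pr(L <= k) for L ~ Po(lam); returns 0 for k < 0."""
    return sum(math.exp(-lam + xlogy(j, lam) - math.lgamma(j + 1)) for j in range(0, k + 1))


if __name__ == "__main__":
    n, p = 7, 0.5
    for k in range(n + 1):
        print(k, binom_cdf(k - 1, n, p), phi_cdf(signed_ll_binomial(k, n, p)), binom_cdf(k, n, p))
    for k in range(8):
        print(k, poisson_cdf(k - 1, 3.5), phi_cdf(signed_ll_poisson(k, 3.5)), poisson_cdf(k, 3.5))
```

In each printed row the middle value should lie between the two neighbouring values, which is the intersection property stated in the corollary.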

9 Summary

The main theorems in this paper are domination theorems and intersection theorems. The first type of inequality states that the signed log-likelihood of one distribution is dominated by the signed log-likelihood of another distribution, i.e. the distribution function of the first distribution is larger than the distribution function of the second distribution.

Table 1 Stochastic domination results

Signed log-likelihood of | Dominated by signed log-likelihood of | Condition | Theorem
Inverse Gaussian | Gaussian | | 1
$IG(\mu_1, \lambda_1)$ | $IG(\mu_2, \lambda_2)$ | $\mu_1/\lambda_1 > \mu_2/\lambda_2$ | 2
Gamma | Gaussian | | 4
$\Gamma(k_1, \theta_1)$ | $\Gamma(k_2, \theta_2)$ | $k_1 \le k_2$ | 5

Note that the exponential distributions are special cases of Gamma distributions.

The second type of result is intersection results, i.e. the distribution function of the signed log-likelihood of a discrete distribution is a staircase function where each step is intersected by the distribution function of the signed log-likelihood of a continuous distribution.

Table 2 Intersection results

Discrete distribution | Continuous distribution | Result
Geometric | Exponential | Corollary 1
Negative binomial | Gamma | Theorem 7
Binomial | Gaussian | Corollary 3
Poisson | Gaussian | Corollary 3

10 Discussion

In this paper we have presented inequalities of two types. The inequalities for inverse Gaussian distributions, exponential distributions and other Gamma distributions are about stochastic domination. The inequalities for Poisson distributions, binomial distributions, geometric distributions and other negative binomial distributions are about intersection. These inequalities can be combined in order to get inequalities of other types. For instance a negative binomial random variable $M$ with distribution $neg(k, \theta)$ satisfies

$$\Phi\big(G_{neg(k,\theta)}(m)\big) \le \Pr(M \le m),$$

where $G_{neg(k,\theta)}$ denotes the signed log-likelihood of the negative binomial distribution. Contrary to the similar inequality for the binomial distribution, the inequality

$$\Pr(M < m) \le \Phi\big(G_{neg(k,\theta)}(m)\big)$$

does in general not hold, as illustrated in Figure 10.

[Fig. 10: Plot of the quantiles of a standard Gaussian vs. the quantiles of the signed log-likelihood of a negative binomial distribution (blue) and of the signed log-likelihood of a Gamma distribution (green).]

We have both lower bounds and upper bounds on the Poisson distributions. The upper bound for the Poisson distribution corresponds to the lower bound for the Gamma distribution presented in Theorem 4, but the lower bound for the Poisson distribution translates into a new upper bound for the distribution function of the Gamma distribution. Numerical calculations also indicate that in Inequality (8) the right hand inequality can be improved to

$$\Phi\big(G_{Po(\lambda)}(k + 1/2)\big) \le \Pr(L \le k).$$

This inequality is much tighter than the inequality in (8). Similarly, J. Reiczigel, L. Rejtő and G. Tusnády conjectured that both the lower bound and the upper bound in Inequality (7) can be significantly improved when $p = 1/2$ [9], and their conjecture has been a major motivation for initiating this research.

For the most important distributions, like the binomial distributions, the Poisson distributions, the negative binomial distributions, the inverse Gaussian distributions and the Gamma distributions, we can formulate sharp inequalities that hold for any sample size. All these distributions have variance functions that are polynomials of order 2 and 3. Natural exponential families with polynomial variance functions of order at most 3 have been classified [8, 7], and there is a chance that one can formulate and prove sharp inequalities for each of these exponential families. Although there may exist very nice results for the rest of the exponential families with simple variance functions, the rest of these exponential families have much fewer applications than the exponential families that have been the subject of the present paper.

In the present paper inequalities have been developed for specific exponential families, but one may hope that some more general inequalities may be developed where bounds on the tails are derived directly from the properties of the variance function.

Acknowledgements The author wants to thank Gabor Tusnády, who inspired me to look at this type of inequalities. The author also wants to thank László Györfi, Joram Gat, János Komlós, and A. Zubkov for useful correspondence or discussions. Finally I want to express my gratitude to Narayana Prasad Santhanam, who invited me to a two month visit at the Electrical Engineering Department, University of Hawai'i. This paper was completed during my visit.

References

1. Bahadur RR (1960) Some approximations to the binomial distribution function. Annals of Mathematical Statistics 31:43-54
2. Bahadur RR, Rao RR (1960) On deviation of the sample mean. Annals of Mathematical Statistics 31
3. Barndorff-Nielsen OE (1990) A note on the standardized signed log likelihood ratio. Scandinavian Journal of Statistics 17:157-160, URL stable/
4. Daniels HE (1954) Saddlepoint approximations in statistics. Ann Math Statist 25(4)
5. Györfi L, Harremoës P, Tusnády G. Gaussian approximation of large deviation probabilities, unpublished
6. Harremoës P, Tusnády G (2012) Information divergence is more χ²-distributed than the χ²-statistic. In: International Symposium on Information Theory (ISIT 2012), IEEE, Cambridge, Massachusetts, USA, URL org/abs/05
7. Letac G, Mora M (1990) Natural real exponential families with cubic variance functions. Ann Stat 18:1-37
8. Morris C (1982) Natural exponential families with quadratic variance functions. Ann Statist 10
9. Reiczigel J, Rejtő L, Tusnády G. A sharpning of Tusnády's inequality, arxiv 0367v
10. Zhang T (2008) Limiting distribution of the G statistics. Statistics and Probability Letters 78:1656-1661, URL statprobletters008pdf
11. Zubkov AM, Serov AA (2013) A complete proof of universal inequalities for the distribution function of the binomial law. Theory Probab Appl 57(3), DOI 037/S X, URL http://arxivorg/abs/073838
