On Approximating a Generalized Binomial by Binomial and Poisson Distributions

International Journal of Statistics and Systems. ISSN 0973-2675, Volume 3, Number (2008), pp. 113-124. Research India Publications. http://www.ripublication.com/ijss.htm

P. Wongkasem and K. Teerapabolarn*
Department of Mathematics, Faculty of Science, Burapha University, Chonburi, 20131 Thailand

R. Gulasirima
Applied Statistics Program, Faculty of Science and Technology, Suan Dusit Rajabhat University, Bangkok, 10300 Thailand

*Corresponding author. E-mail: kanint@buu.ac.th

Abstract

The aim of this paper is to approximate a generalized binomial distribution by binomial and Poisson distributions. Using the ω-function and the two Stein identities, the main result of each approximation is obtained in terms of the total variation distance between the two distributions and its upper bound. Moreover, some numerical examples are given to show applications of the results to the three special cases of the generalized binomial distribution (binomial, hypergeometric and Pólya distributions).

2000 Mathematics Subject Classification: Primary 60F05.

Keywords: Generalized binomial distribution; binomial approximation; binomial distribution; hypergeometric distribution; Pólya distribution; Poisson approximation; Poisson distribution; Stein identity; ω-function.

1. Introduction

In 1979, Dwass [5] introduced a generalized discrete distribution that covers the binomial, hypergeometric and Pólya distributions, and he called it a generalized binomial distribution. The distribution depends on four parameters, $A$, $B$, $n$ and $\alpha$, where $A$ and $B$ are positive, $n$ is a positive integer, and $\alpha$ is an arbitrary real number satisfying $(n-1)\alpha \le A+B$ and such that $A^{(i)}$ and $B^{(i)}$ are never negative for $i = 1, \dots, n$, where

$$x^{(i)} = x(x-\alpha)\cdots(x-(i-1)\alpha).$$

The distribution of the generalized binomial random variable is defined by

$$p(k) = \binom{n}{k}\frac{A^{(k)}B^{(n-k)}}{(A+B)^{(n)}}, \quad k = 0, 1, \dots, n, \tag{1.1}$$

$$\phantom{p(k)} = \binom{n}{k}\frac{[A(A-\alpha)\cdots(A-(k-1)\alpha)]\,[B(B-\alpha)\cdots(B-(n-k-1)\alpha)]}{(A+B)(A+B-\alpha)\cdots(A+B-(n-1)\alpha)}, \quad k = 0, 1, \dots, n. \tag{1.2}$$

The form of (1.2) is similar to the distribution form with regard to the Pólya urn model in Johnson et al. [7] on pp. 58. Dwass [5] pointed out three special cases of the distribution in (1.1), determined by $\alpha$, as follows:

(i). If $\alpha = 0$, then the binomial distribution with parameters $n$ and $\frac{A}{A+B}$ results, that is,

$$p(k) = \binom{n}{k}\left(\frac{A}{A+B}\right)^{k}\left(\frac{B}{A+B}\right)^{n-k}, \quad k = 0, 1, \dots, n. \tag{1.3}$$

(ii). If $\alpha > 0$, then the outcome is the hypergeometric distribution with parameters $A$, $B$, $n$ and $\alpha$ and, when $A/\alpha$ and $B/\alpha$ are integers, the distribution

$$p(k) = \frac{\binom{A/\alpha}{k}\binom{B/\alpha}{n-k}}{\binom{A/\alpha + B/\alpha}{n}}, \quad k = 0, 1, \dots, \min\{n, A/\alpha\}, \tag{1.4}$$

is the well-known classical hypergeometric distribution.

(iii). If $\alpha < 0$, then the result of (1.1) is the Pólya distribution with parameters $A$, $B$, $n$ and $\alpha$ given by

$$p(k) = \frac{\binom{-A/\alpha + k - 1}{k}\binom{-B/\alpha + n - k - 1}{n-k}}{\binom{-A/\alpha - B/\alpha + n - 1}{n}}, \quad k = 0, 1, \dots, n. \tag{1.5}$$
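The product form of the probability function and its three special cases can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper names `falling` and `gen_binom_pmf` and the parameter values A = 3, B = 7, n = 5 are assumptions chosen only for illustration. It evaluates the product form directly and compares it with the binomial (α = 0), classical hypergeometric (α = 1) and Pólya (α = -1) forms.

```python
from math import comb, isclose


def falling(x, i, alpha):
    """Return x^(i) = x (x - alpha) ... (x - (i - 1) alpha); the empty product is 1."""
    out = 1.0
    for j in range(i):
        out *= x - j * alpha
    return out


def gen_binom_pmf(A, B, n, alpha):
    """Return [p(0), ..., p(n)] from the product form of the generalized binomial pmf."""
    denom = falling(A + B, n, alpha)
    return [
        comb(n, k) * falling(A, k, alpha) * falling(B, n - k, alpha) / denom
        for k in range(n + 1)
    ]


# Illustrative parameters (an assumption, not taken from the paper).
A, B, n = 3, 7, 5

# alpha = 0: binomial with success probability A / (A + B) = 0.3.
pmf_binomial = gen_binom_pmf(A, B, n, 0)
direct_binomial = [comb(n, k) * 0.3**k * 0.7 ** (n - k) for k in range(n + 1)]

# alpha = 1 with integer A and B: classical hypergeometric.
pmf_hyper = gen_binom_pmf(A, B, n, 1)
direct_hyper = [comb(A, k) * comb(B, n - k) / comb(A + B, n) for k in range(n + 1)]

# alpha = -1: Polya distribution.
pmf_polya = gen_binom_pmf(A, B, n, -1)
direct_polya = [
    comb(A + k - 1, k) * comb(B + n - k - 1, n - k) / comb(A + B + n - 1, n)
    for k in range(n + 1)
]

for name, pmf, direct in [
    ("binomial", pmf_binomial, direct_binomial),
    ("hypergeometric", pmf_hyper, direct_hyper),
    ("Polya", pmf_polya, direct_polya),
]:
    agree = all(isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(pmf, direct))
    print(name, "sum =", round(sum(pmf), 12), "matches special case:", agree)
```

Using the running product rather than factorials keeps the sketch valid for non-integer A, B and α, which the general definition allows.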

Note that in the special case where $\alpha = -1$ the distribution is the beta-binomial distribution with parameters $A$, $B$ and $n$. If $A$, $B$, $n$ and $-\alpha$ are all positive integers, then it is the same distribution as in Brown and Phillips [3], and it is the negative hypergeometric distribution when $\alpha = -1$.

It is well known that the hypergeometric distribution can be approximated by the binomial distribution, and that both binomial and hypergeometric distributions can also be approximated by the Poisson distribution under certain conditions on their parameters. Similarly, if the conditions on the parameters of the generalized binomial, binomial and Poisson distributions are satisfied, then the generalized binomial distribution can also be approximated by the binomial distribution or the Poisson distribution. In this paper, the generalized binomial distribution is approximated by binomial and Poisson distributions, and the accuracy of each approximation is measured in terms of the total variation distance and its upper bound. The total variation distance between generalized binomial and binomial distributions and the total variation distance between generalized binomial and Poisson distributions are defined by

$$d_{TV}(GB(A,B,n,\alpha), B(n,p)) = \sup_{E}\,\bigl|GB(A,B,n,\alpha)(E) - B(n,p)(E)\bigr| \tag{1.6}$$

and

$$d_{TV}(GB(A,B,n,\alpha), P(\lambda)) = \sup_{E}\,\bigl|GB(A,B,n,\alpha)(E) - P(\lambda)(E)\bigr|, \tag{1.7}$$

where $E$ is a subset of $\{0, \dots, n\}$ in (1.6) and a subset of $\mathbb{N} \cup \{0\}$ in (1.7), and $GB(A,B,n,\alpha)$, $B(n,p)$ and $P(\lambda)$ are generalized binomial, binomial and Poisson distributions, respectively. The main tools for determining an upper bound for each total variation distance are the ω-function associated with the generalized binomial random variable and the Stein identities for binomial and Poisson distributions.
In 1998, Majsnerowska [8] adapted the relation of the ω-function associated with a non-negative integer-valued random variable $X$ (Cacoullos and Papathanasiou [4]) to be

$$\omega(k+1) = \frac{p(k)}{p(k+1)}\,\omega(k) + \frac{\mu - (k+1)}{\sigma^{2}} \ge 0, \quad k = 0, 1, \dots, \tag{1.8}$$

where $\omega(0) = \mu/\sigma^{2}$, $p(k) > 0$ for all $k \ge 0$, and $\mu$ and $\sigma^{2}$ are the mean and variance of $X$. The recurrence relation (1.8) together with the two Stein identities is used to derive the two main results in Section 2. In Section 3, some numerical examples are given to show applications of the results to the binomial, hypergeometric and Pólya distributions, which are the special cases of the generalized binomial distribution. The conclusions of this study are presented in the last section.
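The two quantities introduced above, the total variation distance and the ω-function recurrence, are easy to evaluate numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: the function names `tv_distance` and `omega_from_pmf` are hypothetical, and the binomial example with n = 10 and p = 0.3 is chosen only because its ω-function has the simple form (n - k)/(nq), which follows from the recurrence. The distance is computed as the sum of the positive parts of p(k) - q(k), which equals the supremum over sets E when both arguments are probability distributions.

```python
from math import comb, exp, factorial


def tv_distance(p, q):
    """Total variation distance sup_E |P(E) - Q(E)|.

    `p` is a finite pmf given as a list on {0, ..., n}; `q` is a function
    k -> Q({k}). The supremum is attained on E = {k : p(k) > q(k)}, which
    lies inside the support of `p`, so no truncation of Q is needed.
    """
    return sum(max(pk - q(k), 0.0) for k, pk in enumerate(p))


def omega_from_pmf(p):
    """Run the recurrence for the omega-function, starting from omega(0) = mu / var.

    Requires p(k) > 0 for every k in the list.
    """
    mu = sum(k * pk for k, pk in enumerate(p))
    var = sum((k - mu) ** 2 * pk for k, pk in enumerate(p))
    w = [mu / var]
    for k in range(len(p) - 1):
        w.append(p[k] / p[k + 1] * w[k] + (mu - (k + 1)) / var)
    return w


# Illustrative example (an assumption): binomial(n, pr) against Poisson(n * pr).
n, pr = 10, 0.3
binom_pmf = [comb(n, k) * pr**k * (1 - pr) ** (n - k) for k in range(n + 1)]
lam = n * pr


def poisson_pmf(k):
    return exp(-lam) * lam**k / factorial(k)


d_tv = tv_distance(binom_pmf, poisson_pmf)
omega = omega_from_pmf(binom_pmf)
closed_form = [(n - k) / (n * (1 - pr)) for k in range(n + 1)]

print("d_TV(binomial, Poisson) =", d_tv)
print("max |omega - (n - k)/(nq)| =", max(abs(a - b) for a, b in zip(omega, closed_form)))
```

Passing the second distribution as a function avoids having to truncate the infinite Poisson support, at the cost of a slightly asymmetric interface.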

2. Main results

This section uses the relation of the ω-function in (1.8) and the two Stein identities for binomial and Poisson distributions to derive the main results associated with (1.6) and (1.7).

Let $X$ be the generalized binomial random variable with probability distribution defined as in (1.1). Its mean and variance are $\mu = \frac{nA}{A+B}$ and $\sigma^{2} = \frac{nAB(A+B-n\alpha)}{(A+B)^{2}(A+B-\alpha)}$, respectively, and its associated ω-function is given in the following proposition.

Proposition 1. Let $\omega(x)$ be the ω-function associated with the generalized binomial random variable $X$ and $p(k) > 0$ for every $0 \le k \le n$. Then

$$\omega(k) = \frac{(n-k)(A-k\alpha)}{(A+B)\sigma^{2}}, \quad k = 0, 1, \dots, n, \tag{2.1}$$

where $\sigma^{2} = \frac{nAB(A+B-n\alpha)}{(A+B)^{2}(A+B-\alpha)}$.

Proof. Following Dwass [5] and (1.8), the recurrence relation of the ω-function associated with the random variable $X$ can be expressed in the form

$$\omega(k) = \frac{\mu - k}{\sigma^{2}} + \frac{k[B-(n-k)\alpha]}{(n-k+1)[A-(k-1)\alpha]}\,\omega(k-1), \quad k = 1, \dots, n,$$

where $\omega(0) = \frac{nA}{(A+B)\sigma^{2}}$. Therefore, we have

$$\omega(1) = \frac{(n-1)(A-\alpha)}{(A+B)\sigma^{2}}, \quad \omega(2) = \frac{(n-2)(A-2\alpha)}{(A+B)\sigma^{2}}, \quad \dots, \quad \omega(n) = \frac{(n-n)(A-n\alpha)}{(A+B)\sigma^{2}},$$

which gives (2.1).

Remark 1.

1. If $\alpha = 0$, then (2.1) reduces to

$$\omega(k) = \frac{(n-k)A}{(A+B)\sigma^{2}}, \quad k = 0, 1, \dots, n, \tag{2.2}$$

with $\sigma^{2} = \frac{nAB}{(A+B)^{2}}$, which is the ω-function associated with the binomial random variable and the distribution in (1.3).

2. If $\alpha > 0$ and $A/\alpha$ and $B/\alpha$ are integers, then the ω-function in (2.1) can be written as

$$\omega(k) = \frac{(n-k)(A/\alpha - k)}{(A/\alpha + B/\alpha)\sigma^{2}}, \quad k = 0, 1, \dots, \min\{n, A/\alpha\}, \tag{2.3}$$

with the same variance as in (2.1). This is the ω-function associated with the classical hypergeometric random variable and the distribution in (1.4).

3. If $\alpha < 0$, then (2.1) can be rewritten as

$$\omega(k) = \frac{(n-k)(-A/\alpha + k)}{(-A/\alpha - B/\alpha)\sigma^{2}}, \quad k = 0, 1, \dots, n, \tag{2.4}$$

where $\sigma^{2} = \frac{nAB(A+B-n\alpha)}{(A+B)^{2}(A+B-\alpha)}$. It is the ω-function associated with the Pólya random variable and the distribution in (1.5).

Before giving the main result for each approximation, we recall the following relation for the ω-function, stated by Cacoullos and Papathanasiou [4]: if a function $g$ satisfies $E|\omega(X)\Delta g(X)| < \infty$ and $\sigma^{2} = \mathrm{Var}(X) < \infty$, then

$$\mathrm{Cov}(X, g(X)) = \sigma^{2} E[\omega(X)\Delta g(X)], \tag{2.5}$$

where $\Delta g(x) = g(x+1) - g(x)$. By taking $g(x) = x$, $E[\omega(X)] = 1$ is obtained as a consequence of (2.5).

2.1. A result of the binomial approximation

The theorem below gives an estimate of the total variation distance between $GB(A,B,n,\alpha)$ and $B(n,p)$. To obtain the estimate, the Stein identity for the binomial setting (see Barbour et al. [1] on pp. 88-89 or Barbour and Chen [2] on pp. 03) is applied for fixed parameters $n \in \mathbb{N}$ and $p = 1 - q \in (0,1)$, every subset $E$ of $\{0, \dots, n\}$ and the bounded real-valued function $f = f_{E} : \mathbb{N} \cup \{0\} \to \mathbb{R}$ (defined as in the references just cited):

$$GB(A,B,n,\alpha)(E) - B(n,p)(E) = E[(n-X)\,p\,f(X+1) - q\,X f(X)]. \tag{2.6}$$

For any subset $E$ of $\{0, \dots, n\}$, Ehm [6] showed that

$$\sup_{k,E}|\Delta f(k)| = \sup_{k,E}|f(k+1) - f(k)| \le \frac{1 - p^{n+1} - q^{n+1}}{(n+1)pq}, \tag{2.7}$$

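where $f(0) = f(1)$ and $f(k) = f(n)$ for $k \ge n$.

Theorem 1. Let $X$ be the generalized binomial random variable and $p = 1 - q = \frac{A}{A+B} > 0$. Then the following inequality holds:

$$d_{TV}\!\left(GB(A,B,n,\alpha), B\!\left(n, \tfrac{A}{A+B}\right)\right) \le \left[1 - \left(\frac{A}{A+B}\right)^{n+1} - \left(\frac{B}{A+B}\right)^{n+1}\right]\frac{n(n-1)\alpha\,\delta(\alpha)}{(n+1)(A+B-\alpha)}, \tag{2.8}$$

where $\delta(\alpha) = 1$ if $\alpha \ge 0$ and $\delta(\alpha) = -1$ if $\alpha < 0$.

Proof. It can be seen that

$$\begin{aligned}
E[(n-X)\,p\,f(X+1) - q\,X f(X)] &= E[np\,f(X+1) - pX\Delta f(X) - X f(X)] \\
&= E[\mu f(X+1)] - pE[X\Delta f(X)] - E[X f(X)] \\
&= E[\mu f(X+1)] - pE[X\Delta f(X)] - \mathrm{Cov}(X, f(X)) - E[\mu f(X)] \\
&= E[\mu \Delta f(X)] - pE[X\Delta f(X)] - \sigma^{2}E[\omega(X)\Delta f(X)] \quad \text{(by (2.5))} \\
&= E\{[\mu - pX - \sigma^{2}\omega(X)]\Delta f(X)\}
\end{aligned}$$

and, by (1.6), (2.6) and (2.7), we have

$$d_{TV}(GB(A,B,n,\alpha), B(n,p)) \le \frac{1 - p^{n+1} - q^{n+1}}{(n+1)pq}\,E\bigl|\mu - pX - \sigma^{2}\omega(X)\bigr|. \tag{2.9}$$

By Proposition 1,

$$\mu - pk - \sigma^{2}\omega(k) = \frac{nA - kA - (n-k)(A-k\alpha)}{A+B} = \frac{(n-k)k\alpha}{A+B},$$

which implies that

$$\mu - pk - \sigma^{2}\omega(k) \;\begin{cases} \ge 0 & \text{if } \alpha \ge 0, \\ \le 0 & \text{if } \alpha < 0. \end{cases}$$

For $\alpha \ge 0$,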

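$$E\bigl|\mu - pX - \sigma^{2}\omega(X)\bigr| = \mu - pE(X) - \sigma^{2}E[\omega(X)] = \mu q - \sigma^{2} = \frac{pqn(n-1)\alpha}{A+B-\alpha},$$

and

$$E\bigl|\mu - pX - \sigma^{2}\omega(X)\bigr| = -\frac{pqn(n-1)\alpha}{A+B-\alpha} \quad \text{for } \alpha < 0.$$

Therefore, from both cases, we obtain

$$E\bigl|\mu - pX - \sigma^{2}\omega(X)\bigr| = \frac{pqn(n-1)\alpha\,\delta(\alpha)}{A+B-\alpha}, \tag{2.10}$$

where $\delta(\alpha) = 1$ if $\alpha \ge 0$ and $\delta(\alpha) = -1$ if $\alpha < 0$. Substituting (2.10) into the right-hand side of (2.9), and replacing $p$ and $q$ by $\frac{A}{A+B}$ and $\frac{B}{A+B}$, respectively, gives the inequality (2.8).

The following two corollaries are consequences of Theorem 1 with regard to approximating the hypergeometric and Pólya distributions by the binomial distribution.

Corollary 1. If $\alpha > 0$ and $A/\alpha$ and $B/\alpha$ are integers, then the following inequality holds:

$$d_{TV}\!\left(H(A,B,n,\alpha), B\!\left(n, \tfrac{A}{A+B}\right)\right) \le \left[1 - \left(\frac{A}{A+B}\right)^{n+1} - \left(\frac{B}{A+B}\right)^{n+1}\right]\frac{n(n-1)\alpha}{(n+1)(A+B-\alpha)}. \tag{2.11}$$

If additionally $p = \frac{n}{A+B}$ and $A$ is an integer, then

$$d_{TV}\!\left(H(A,B,n,\alpha), B\!\left(A, \tfrac{n}{A+B}\right)\right) \le \left[1 - \left(\frac{n}{A+B}\right)^{A+1} - \left(1 - \frac{n}{A+B}\right)^{A+1}\right]\frac{A(A-1)\alpha}{(A+1)(A+B-\alpha)}, \tag{2.12}$$

where $H(A,B,n,\alpha)$ is the hypergeometric distribution.

Corollary 2. For $\alpha < 0$,

$$d_{TV}\!\left(PY(A,B,n,\alpha), B\!\left(n, \tfrac{A}{A+B}\right)\right) \le \left[1 - \left(\frac{A}{A+B}\right)^{n+1} - \left(\frac{B}{A+B}\right)^{n+1}\right]\frac{n(n-1)(-\alpha)}{(n+1)(A+B-\alpha)}, \tag{2.13}$$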

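where $PY(A,B,n,\alpha)$ is the Pólya distribution.

Remark 2. The result of Theorem 1, and each result of Corollaries 1 and 2, yields a good binomial approximation provided that $\frac{(n-1)|\alpha|}{A+B-\alpha}$ is small, that is, $|\alpha|$ is small and $A+B-\alpha$ is large.

2.2. A result of the Poisson approximation

Similar to the binomial approximation, the Stein identity for the Poisson case (see Barbour et al. [1] on pp. 6-7 or Barbour and Chen [2] on pp. 65) is applied for every positive constant $\lambda$, every subset $E$ of $\mathbb{N} \cup \{0\}$ and the bounded real-valued function $f = f_{E} : \mathbb{N} \cup \{0\} \to \mathbb{R}$ (defined as in the references just cited):

$$GB(A,B,n,\alpha)(E) - P(\lambda)(E) = E[\lambda f(X+1) - X f(X)]. \tag{2.14}$$

For any subset $E$ of $\mathbb{N} \cup \{0\}$, Barbour et al. [1] proved that

$$\sup_{k,E}|\Delta f(k)| = \sup_{k,E}|f(k+1) - f(k)| \le \lambda^{-1}(1 - e^{-\lambda}). \tag{2.15}$$

The following theorem presents a Poisson estimate of the total variation distance between $GB(A,B,n,\alpha)$ and $P(\lambda)$ in terms of the four parameters $A$, $B$, $n$ and $\alpha$.

Theorem 2. Let $X$ be the generalized binomial random variable and $\lambda = \frac{nA}{A+B} > 0$, and let $A \ge (n-1)(-\alpha)$ for $\alpha < 0$. Then

$$d_{TV}\!\left(GB(A,B,n,\alpha), P\!\left(\tfrac{nA}{A+B}\right)\right) \le \left(1 - e^{-\frac{nA}{A+B}}\right)\frac{(n-1)B\alpha + A(A+B-\alpha)}{(A+B)(A+B-\alpha)}. \tag{2.16}$$

Proof. From (2.14), and since $\lambda = \mu$, we have

$$\begin{aligned}
d_{TV}(GB(A,B,n,\alpha), P(\lambda)) &= \sup_{E}\bigl|\lambda E[f(X+1)] - E[X f(X)]\bigr| \\
&= \sup_{E}\bigl|\lambda E[f(X+1)] - \mathrm{Cov}(X, f(X)) - \mu E[f(X)]\bigr| \\
&= \sup_{E}\bigl|\lambda E[\Delta f(X)] - \mathrm{Cov}(X, f(X))\bigr| \\
&= \sup_{E}\bigl|\lambda E[\Delta f(X)] - \sigma^{2}E[\omega(X)\Delta f(X)]\bigr| \quad \text{(by (2.5))} \\
&= \sup_{E}\bigl|E\{[\lambda - \sigma^{2}\omega(X)]\Delta f(X)\}\bigr|,
\end{aligned}$$

which, by (2.15), gives

$$d_{TV}(GB(A,B,n,\alpha), P(\lambda)) \le \lambda^{-1}(1 - e^{-\lambda})\,E\bigl|\lambda - \sigma^{2}\omega(X)\bigr|. \tag{2.17}$$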

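Since, by Proposition 1,

$$\lambda - \sigma^{2}\omega(k) = \frac{nA - (n-k)(A-k\alpha)}{A+B} = \frac{[(n-k)\alpha + A]\,k}{A+B} \ge 0,$$

it follows that

$$E\bigl|\lambda - \sigma^{2}\omega(X)\bigr| = \lambda - \sigma^{2}E[\omega(X)] = \lambda - \sigma^{2} = \lambda\,\frac{(n-1)B\alpha + A(A+B-\alpha)}{(A+B)(A+B-\alpha)}.$$

Substituting this result into (2.17) and putting $\lambda = \frac{nA}{A+B}$ proves the theorem.

Immediately from (2.16), additional results of the Poisson approximation are obtained for the three distributions as follows.

Corollary 3. For $\alpha = 0$,

$$d_{TV}\!\left(B\!\left(n, \tfrac{A}{A+B}\right), P\!\left(\tfrac{nA}{A+B}\right)\right) \le \left(1 - e^{-\frac{nA}{A+B}}\right)\frac{A}{A+B}. \tag{2.18}$$

Corollary 4. For $\alpha > 0$, if $A/\alpha$ and $B/\alpha$ are integers, then

$$d_{TV}\!\left(H(A,B,n,\alpha), P\!\left(\tfrac{nA}{A+B}\right)\right) \le \left(1 - e^{-\frac{nA}{A+B}}\right)\frac{(n-1)B\alpha + A(A+B-\alpha)}{(A+B)(A+B-\alpha)}. \tag{2.19}$$

Corollary 5. For $\alpha < 0$, if $A \ge (n-1)(-\alpha)$, then

$$d_{TV}\!\left(PY(A,B,n,\alpha), P\!\left(\tfrac{nA}{A+B}\right)\right) \le \left(1 - e^{-\frac{nA}{A+B}}\right)\frac{(n-1)B\alpha + A(A+B-\alpha)}{(A+B)(A+B-\alpha)}. \tag{2.20}$$

Remark 3.

1. If $\frac{A}{A+B}$ and $\alpha$ are small, then the result of (2.16) is a good Poisson estimate.

2. It is noted that, for $\alpha > 0$ and $B > 0$,

$$\frac{(n-1)B(-\alpha) + A(A+B+\alpha)}{(A+B)(A+B+\alpha)} < \frac{A}{A+B} < \frac{(n-1)B\alpha + A(A+B-\alpha)}{(A+B)(A+B-\alpha)}.$$

Hence, for parameters $\alpha$ of equal magnitude, the bound in (2.20) is always less than the bounds in (2.18) and (2.19).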
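Both upper bounds are closed-form expressions in A, B, n and α, so they can be evaluated directly. The following sketch is illustrative only: the function names `binomial_bound` and `poisson_bound` are hypothetical, and the parameter values (A = 10, B = 990, n = 5, α in {0, 1, -1}) are assumptions chosen for demonstration. The factor αδ(α) in the binomial bound is written as |α|, which is the same quantity.

```python
from math import exp


def binomial_bound(A, B, n, alpha):
    """Upper bound on d_TV between GB(A, B, n, alpha) and binomial(n, A/(A+B))."""
    p = A / (A + B)
    q = 1.0 - p
    ehm_factor = 1.0 - p ** (n + 1) - q ** (n + 1)
    # alpha * delta(alpha) equals |alpha| for every real alpha.
    return ehm_factor * n * (n - 1) * abs(alpha) / ((n + 1) * (A + B - alpha))


def poisson_bound(A, B, n, alpha):
    """Upper bound on d_TV between GB(A, B, n, alpha) and Poisson(nA/(A+B)).

    For alpha < 0 the theorem assumes A >= (n - 1)(-alpha); the sketch
    raises an error when that condition fails instead of returning a value.
    """
    if alpha < 0 and A < (n - 1) * (-alpha):
        raise ValueError("bound requires A >= (n - 1)(-alpha) when alpha < 0")
    lam = n * A / (A + B)
    ratio = ((n - 1) * B * alpha + A * (A + B - alpha)) / ((A + B) * (A + B - alpha))
    return (1.0 - exp(-lam)) * ratio


# Illustrative parameters (an assumption): A + B = 1000, n = 5, A = 10.
A, B, n = 10, 990, 5
for alpha in (0, 1, -1):
    print(
        f"alpha = {alpha:2d}:",
        f"binomial bound = {binomial_bound(A, B, n, alpha):.5f},",
        f"Poisson bound = {poisson_bound(A, B, n, alpha):.5f}",
    )
```

For α = 0 the binomial bound is zero, as expected, since the generalized binomial distribution is then exactly binomial; the Poisson bound reduces to the familiar form (1 - e^{-np})p.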

3. Numerical examples

The following numerical examples illustrate how well binomial and Poisson distributions approximate the generalized binomial distribution, and how tight the upper bound for each total variation distance is, using the results in Theorems 1 and 2 or Corollaries 1-5.

Table 1: Sample values of each total variation distance ($d_{TV}$) between generalized binomial and binomial distributions and its upper bound (U.B.)

  n    A    d_TV      U.B.      d_TV      U.B.      d_TV      U.B.
            (2.11)    (2.11)    (2.12)    (2.12)    (2.13)    (2.13)
  5   10    0.0009    0.0000    0.00043   0.00044   0.0009    0.0000
 10   10    0.00080   0.00086   0.00080   0.00086   0.00078   0.00086
 20   10    0.009     0.00345   0.0039    0.0063    0.0080    0.00344
  5   25    0.00044   0.00047   0.0057    0.008     0.00043   0.00047
 10   25    0.006     0.0099    0.0046    0.0053    0.0059    0.0099
 20   25    0.00459   0.00747   0.00577   0.00944   0.00445   0.00746
  5   50    0.00075   0.00088   0.0088    0.0085    0.00075   0.00088
 10   50    0.006     0.00353   0.08      0.099     0.00      0.0035
 20   50    0.0055    0.095     0.045     0.03093   0.00540   0.09

Table 1 provides sample values of each total variation distance between generalized binomial and binomial distributions and its upper bound for fixed $A + B = 1000$ and $\alpha = 1, -1$, corresponding to Corollaries 1 and 2, respectively.

Table 2: Sample values of each total variation distance between generalized binomial and Poisson distributions and its upper bound (U.B.)

  n    A    d_TV      U.B.      d_TV      U.B.      d_TV      U.B.
            (2.18)    (2.18)    (2.19)    (2.19)    (2.20)    (2.20)
  5   10    0.00047   0.00049   0.00066   0.00068   0.0008    0.0009
 10   10    0.00087   0.00095   0.0066    0.0080    0.00009   0.0000
 20   10    0.0049    0.008     0.00439   0.0053    0.003     *******
  5   25    0.0065    0.0094    0.00309   0.00340   0.00      0.0048
 10   25    0.00436   0.00553   0.00598   0.00747   0.0076    0.00359
 20   25    0.0058    0.00984   0.0040    0.073     0.0036    0.0055
  5   50    0.00893   0.006     0.00968   0.090     0.0088    0.00
 10   50    0.086     0.0967    0.04      0.0304    0.00964   0.063
 20   50    0.04      0.036     0.0973    0.04303   0.0088    0.00

For the Poisson approximation, sample values of each total variation distance between generalized binomial and Poisson distributions and its upper bound are presented for fixed $A + B = 1000$ and $\alpha = 0, 1, -1$, corresponding to Corollaries 3, 4 and 5, respectively, as seen in Table 2.

The numerical results in the two tables suggest that the binomial and Poisson approximations to the generalized binomial distribution are quite accurate when $n$ and/or $A$ are small; that is, the estimate of the total variation distance in each result is close to the true value of the distance provided that $n$ and/or $A$ are small and $A+B-\alpha$ is large. Comparing the numerical results for (2.11) and (2.13) in Table 1 with those for (2.19) and (2.20) in Table 2 indicates that the result in (2.11) is better than the result in (2.19) and, when $n$ and $A$ differ more, the result in (2.13) is also better than the result in (2.20).

4. Conclusions

In this study the upper bounds in (2.8) and (2.16) are estimates of the total variation distance between the generalized binomial and binomial distributions and of the total variation distance between the generalized binomial and Poisson distributions, respectively. Furthermore, each upper bound in (2.8) and (2.16) is also a criterion for measuring the accuracy of the corresponding approximation: if the bound is small, then a good binomial or Poisson approximation to the generalized binomial distribution is obtained. Conversely, if the bound is large, then the binomial or Poisson distribution is not appropriate for approximating the generalized binomial distribution. From the results of (2.8) and (2.16), the upper bound of the binomial approximation is small when $|\alpha|$ is small and $A+B-\alpha$ is large, and the upper bound of the Poisson approximation is small when both $\alpha$ and $\frac{A}{A+B}$ are small.

Acknowledgements.
The first two authors would like to thank the Faculty of Science, Burapha University, for financial support.

References

[1] Barbour, A. D., Holst, L., and Janson, S., 1992, Poisson approximation, Oxford Studies in Probability 2, Clarendon Press, Oxford.
[2] Barbour, A. D., and Chen, L. H. Y., 2005, An introduction to Stein's method, Singapore University Press, Singapore.
[3] Brown, T. C., and Phillips, M. J., 1999, Negative binomial approximation with Stein's method, Meth. Comp. Appl. Probab., 1(4), pp. 407-421.
[4] Cacoullos, T., and Papathanasiou, V., 1989, Characterization of distributions by variance bounds, Statist. Probab. Lett., 7(5), pp. 351-356.

[5] Dwass, M., 1979, A generalized binomial distribution, Amer. Statistician, 33(2), pp. 86-87.
[6] Ehm, W., 1991, Binomial approximation to the Poisson binomial distribution, Statist. Probab. Lett., 11(1), pp. 7-16.
[7] Johnson, N. L., Kotz, S., and Kemp, A. W., 2005, Univariate Discrete Distributions, 3rd edition, Wiley, New York.
[8] Majsnerowska, M., 1998, A note on Poisson approximation by ω-functions, Appl. Math., 25(3), pp. 387-392.