Nonparametric estimation of the number of zeros in truncated count distributions

celestin.kokonendji@univ-fcomte.fr Seminar of IRP on Statistical Advances for Complex Data CRM, Bellaterra : 215.11.

Nonparametric estimation of the number of zeros in truncated count distributions
Célestin C. KOKONENDJI
University of Franche-Comté, France
Laboratoire de Mathématiques de Besançon - UMR 6623 CNRS-UFC
celestin.kokonendji@univ-fcomte.fr
Seminar of IRP on Statistical Advances for Complex Data, CRM, Bellaterra
Joint work with Pere Puig, UAB

Acknowledgements
Centre de Recerca Matemàtica (CRM): Intensive Research Program (IRP) on Statistical Advances for Complex Data.
Universitat Autònoma de Barcelona (UAB): Departament de Matemàtiques & Servei d'Estadística Aplicada.
Pere PUIG:
- Invitation to the IRP on Statistical Advances for Complex Data ("Multivariate over-, equi- and underdispersion", in progress)
- DoReMi Workshop & Seminari del DEIO (UPC) with Marta Pérez-Casany (also for the Barcelona and Sitges visits)
- Many excursions (e.g. Costa Brava), Castanyada, etc.
Moltes Gràcies

Outline
Title: Nonparametric estimation of the number of zeros in truncated count distributions
1 Introduction
2 Count distributions with log-convex pgf
3 Fascination with lower bounds of $p_0$
4 Estimating the non-observed number of zeros
5 Applications

Introduction
In many practical situations the researcher is not able to observe the entire distribution of counts in an experiment. In particular, the zeros are often not observed, leading to so-called zero-truncated count data. For instance, capture-recapture models, used in Biology and Ecology, are a methodology commonly used to estimate the size of an animal population.
In many cases the estimation of the unobserved number of zeros is an important issue:
- Cholera data set of McKendrick
- Number of words known but unused by Shakespeare
- Number of grizzly bear females in Yellowstone

Cholera data set of McKendrick [i]
Probably the oldest example of estimation of the number of zeros is that of McKendrick (1926), who analyzed the number of individuals with cholera in 223 households in a village in India.
[Table: No. of infections / No. of households (frequency); zero class: (168)]
McKendrick argued that a household could have no cases of cholera either because its members had not been exposed, or because they had been exposed but had not been infected. He wanted to estimate the number of individuals who were exposed but did not develop the symptoms. To do this, he ignored the 168 households with zero cases and developed an estimator of the number of zeros from the other observations, based on the zero-truncated Poisson distribution.
i. McKendrick, A. (1926). Application of mathematics to medical problems. Proc. Edinb. Math. Soc. 44.

Number of words known but unused by Shakespeare [ii]
Another interesting example arises from the following question: how many words did Shakespeare know? The information to be taken into account is that Shakespeare wrote [...] different words, of which [...] words were used exactly once, 4343 words were used exactly twice, 2292 were used exactly three times, and so forth. A reduced version of the full table is reported in Efron and Thisted.
[Table: Occurrences / No. of words (frequency)]
In this problem the frequency of zeros to be estimated represents the number of words that Shakespeare knew but did not use in any of his known works.
ii. Efron, B., Thisted, R. (1976). Estimating the number of unseen species: How many words did Shakespeare know? Biometrika 63.

Number of grizzly bear females in Yellowstone [iii]
Most practical examples of estimating the number of zeros are related to the capture-recapture sampling scheme. Keating et al. (2002) studied the annual numbers of females with cubs-of-the-year in the Yellowstone grizzly bear population, from 1986 to 2001. The table below shows the number of unique females with cubs-of-the-year that were seen exactly $j$ times during the year 1998.
[Table: Sightings / No. of bears (frequency)]
Each sighting is considered a "capture", so that 11 females have been captured exactly once, 13 have been captured twice, and so forth. In this case, the number of bears that have been observed is just 33. The frequency of zeros $f_0$ represents the number of bears not observed, so that the total number of grizzly bear females would be $33 + f_0$.
iii. Keating, K., Schwartz, C., Haroldson, M., Moody, D. (2002). Estimating numbers of females with cubs-of-the-year in the Yellowstone grizzly bear population. URSUS 13.

Count distributions with log-convex pgf
- Discrete Compound Poisson distributions
- Mixed Poisson distributions
- Log-convexity class
A very wide class of count distributions:
- Compound and Mixed Poisson
- Examples with differences ("Desigual")
- Overdispersion (relative to Poisson)
- Zero-inflation (relative to Poisson)
(Portrait: Siméon Denis Poisson)

Discrete Compound Poisson distributions
A r.v. $X$ follows a discrete Compound-Poisson (dcp) distribution if
$$X = \sum_{i=1}^{N} Y_i, \quad \text{with pgf} \quad \Phi_X(t) := E\,t^X = \sum_{k=0}^{\infty} t^k P(X = k) = \exp\{-\lambda[1 - \Psi(t)]\},$$
where $N \sim \mathrm{Poisson}(\lambda)$ and $Y_1, Y_2, \dots$ are iid count r.v.'s, independent of $N$, with pgf $\Psi(\cdot)$.
The dcp distributions constitute a huge family of count distributions, according to Feller's characterization: the dcp distributions are the only discrete distributions that are infinitely divisible.
See, e.g., Johnson et al. (2005) and Steutel and van Harn (2004) for properties, formulae and algorithms to calculate the probabilities.
Examples of dcp distributions: Hermite, negative binomial, strict arcsine, Poisson-Tweedie, Hinde-Demétrio [a].
a. Kokonendji, C.C., Dossou-Gbété, S., Demétrio, C.G.B. (2004). Some discrete exponential dispersion models: Poisson-Tweedie and Hinde-Demétrio classes. SORT 28.

Mixed Poisson distributions [iv]
A r.v. $X$ follows a Mixed-Poisson (MP) distribution on $\mathbb{N} := \{0, 1, \dots\}$ if
$$p_k := P(X = k) = \int_0^{\infty} e^{-\lambda}\frac{\lambda^k}{k!}\,dF(\lambda), \quad \text{with} \quad \Phi_X(t) = \int_0^{\infty} e^{-\lambda(1-t)}\,dF(\lambda),$$
where $F$ is a distribution function on $[0, \infty)$.
Examples of mixing distributions $F$ (resulting MP distribution): Poisson (Neyman A), gamma (negative binomial), inverse-Gaussian (Sichel or PIG), Tweedie positive stables (Poisson-Tweedie), $F$ with finite support.
Remarks:
- All Poisson-Tweedie (PTw) distributions are both MP and dcp; PTw $\cap$ HD = {NB}.
- MP distributions with $F$ of finite support are not dcp.
- Hermite, strict arcsine and HD\NB are dcp but not MP.
iv. Grandell, J. (1997). Mixed Poisson Processes. Chapman & Hall, London.

Class of log-convex pgf
Proposition. Let $X$ be a discrete r.v., Compound- or Mixed-Poisson distributed, with pgf $\Phi_X(\cdot)$. Then $\log \Phi_X(\cdot)$ is a convex function on $[0, 1]$.
Proof: Easy for dcp. As for MP, we need $\Phi\,\Phi'' \ge (\Phi')^2$. Let $dG_t(\lambda) = e^{-\lambda(1-t)}\,dF(\lambda)$; then
$$\int \lambda^2\,dG_t(\lambda)\int dG_t(\lambda) \ge \left(\int \lambda\,dG_t(\lambda)\right)^2 \quad \text{(Cauchy-Schwarz)}.$$
Properties: Log-convexity $\Rightarrow$ Overdispersion ($\mathrm{Var}\,X \ge EX$) and Zero-inflation ($p_0 \ge e^{-EX}$).
The class of count distributions with log-convex pgf is wider than (dcp $\cup$ MP).
Example ("Desigual"): $\Phi_X(t) = 1/5 + t/5 + t^2/5 + t^3/20 + 7t^4/20$ is a log-convex function on $[0, 1]$, but $X$ is not in (dcp $\cup$ MP), by direct calculations.
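The example pgf can be checked numerically. The sketch below assumes the coefficients read as $t^3/20$ and $7t^4/20$ (the only reading under which the probabilities sum to one), and it tests the condition $\Phi\Phi'' - (\Phi')^2 \ge 0$ on a finite grid only, so it illustrates the claim rather than proving it. It also evaluates the inequality $3p_3p_0 \ge p_1p_2$ that any Compound- or Mixed-Poisson distribution would have to satisfy.

```python
# Probabilities p_0..p_4 of the example pgf (assumed reading of the coefficients).
p = [1 / 5, 1 / 5, 1 / 5, 1 / 20, 7 / 20]

def phi(t):
    """pgf: sum of p_k * t^k."""
    return sum(pk * t**k for k, pk in enumerate(p))

def dphi(t):
    """First derivative of the pgf."""
    return sum(k * pk * t ** (k - 1) for k, pk in enumerate(p) if k >= 1)

def d2phi(t):
    """Second derivative of the pgf."""
    return sum(k * (k - 1) * pk * t ** (k - 2) for k, pk in enumerate(p) if k >= 2)

# log(phi) is convex iff phi * phi'' - (phi')^2 >= 0; evaluated on a grid of [0, 1].
grid = [i / 1000 for i in range(1001)]
min_gap = min(phi(t) * d2phi(t) - dphi(t) ** 2 for t in grid)

# A dcp or MP distribution would need 3 * p_3 * p_0 >= p_1 * p_2 (k = 1, r = 2).
lhs = 3 * p[3] * p[0]
rhs = p[1] * p[2]

print("min of phi*phi'' - phi'^2 on the grid:", min_gap)
print("3*p3*p0 =", lhs, " p1*p2 =", rhs)
```

A positive minimum gap is consistent with log-convexity, while `lhs < rhs` shows that the Compound- or Mixed-Poisson inequality fails for this distribution.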

Fascination with lower bounds of $p_0$: from "Desigual" to $0 = 1 + e^{i\pi}$
- Some lower bounds of $p_0$
- An improved inequality

Some lower bounds of $p_0$: Part I (dcp $\cup$ MP)
Proposition (I). Let $X$ be a discrete r.v., Compound- or Mixed-Poisson distributed. Then
$$\binom{k+r}{r}\,p_{k+r}\,p_0 \ge p_k\,p_r, \quad k, r \ge 1, \tag{1}$$
where $p_k = P(X = k)$, $k \in \{0, 1, 2, \dots\}$.
Set of lower bounds of $p_0$: (1) implies
$$p_0 \ge \frac{p_k\,p_r}{\binom{k+r}{r}\,p_{k+r}}, \quad k, r \ge 1. \tag{2}$$
Remarks:
(i) The equalities in (1) or (2) are satisfied iff $X$ is Poisson distributed.
(ii) Taking $k = r = 1$ gives the well-known Chao (1987) lower bound (Böhning, 2010):
$$p_0 \ge \frac{p_1^2}{2p_2}. \tag{3}$$

Some lower bounds of $p_0$: Part II (Log-convexity)
In general, log-convexity does not imply the inequalities (1) or (2); cf. the preceding example ("Desigual"), where $3p_3p_0 < p_1p_2$. However, log-convexity yields other inequalities for $p_0$, involving the population mean and, again, Chao's lower bound.
Proposition (II). Let $X$ be a discrete r.v. with a log-convex pgf $\Phi_X(\cdot)$ on $[0, 1]$, such that $E(X) = \mu$. Then:
i. $p_0 \ge \exp(-\mu)$: (Poisson) zero-inflation;
ii. $p_0 \ge p_1/\mu \iff \mu \ge p_1/p_0$: Turing's estimator (Good, 1953);
iii. $p_0 \ge p_1^2/(2p_2)$: Chao's lower bound.
Notes:
- Equalities in (i)-(iii) hold for the Poisson distribution.
- The inequalities (i)-(iii) are well known for both, or for one, of the Compound- and Mixed-Poisson families.

An improved inequality: Part III
Let $X$ be a r.v., Compound- or Mixed-Poisson distributed, with $E(X) = \mu$. Because all the inequalities in (2) and in Prop. (II) are satisfied, a sharper lower bound of $p_0$ can be obtained by taking the maximum of all of them. Concretely,
$$p_M := \max_{r,k} \frac{p_k\,p_r}{\binom{k+r}{r}\,p_{k+r}} \quad \Longrightarrow \quad p_0 \ge \max\{p_M,\ \exp(-\mu),\ p_1/\mu\}. \tag{4}$$
Lemma (Lanumteang & Böhning (2011), in the proof of their Th. 1). Let $X$ be a discrete r.v., Mixed-Poisson distributed. Then
$$\frac{p_1}{p_0} \le \frac{2p_2}{p_1} \le \frac{3p_3}{p_2} \le \dots \le \frac{k\,p_k}{p_{k-1}} \le \dots$$
Proposition (III). Under Mixed Poisson:
$$p_M := \max_{r,k} \frac{p_k\,p_r}{\binom{k+r}{r}\,p_{k+r}} = \frac{p_1^2}{2p_2}.$$

Example 1: Negative binomial = (HD $\cap$ PTw) $\subset$ MP
$$p_k = \left(\frac{\phi}{\phi+\mu}\right)^{\phi}\left(\frac{\mu}{\phi+\mu}\right)^{k}\frac{\Gamma(\phi+k)}{k!\,\Gamma(\phi)}, \quad k = 0, 1, 2, \dots$$
with mean $\mu$ and shape parameter $\phi > 0$. Direct calculations show that
$$\frac{p_k\,p_r}{\binom{k+r}{r}\,p_{k+r}} = \left(\frac{\phi}{\phi+\mu}\right)^{\phi}\frac{\Gamma(\phi+k)\,\Gamma(\phi+r)}{\Gamma(\phi+k+r)\,\Gamma(\phi)}, \quad k, r = 1, 2, \dots$$
and its maximum is attained for $k = r = 1$, i.e., at Chao's lower bound (3). This agrees with Prop. (III) because the NB is a Mixed Poisson. Consequently,
$$p_M = \left(\frac{\phi}{\phi+\mu}\right)^{\phi}\frac{\phi}{\phi+1}.$$
Because $p_1/\mu = \left(\frac{\phi}{\phi+\mu}\right)^{\phi+1} \le p_M$ for $\mu \ge 1$, direct calculations show that the inequality (4) reduces to
$$p_0 \ge \max\left\{\left(\frac{\phi}{\phi+\mu}\right)^{\phi}\frac{\phi}{\phi+1},\ \exp(-\mu)\right\}. \tag{5}$$
The maximum on the right-hand side of (5) is attained at $\exp(-\mu)$ for $0 < \mu \le \mu_0$, and at $p_M$ for $\mu \ge \mu_0$, where $\mu_0$ is the solution of the equation $\exp(-\mu) = p_M$.
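The negative binomial claims can be checked numerically. This is a minimal sketch with illustrative parameter values ($\mu = 2$, $\phi = 1.5$, chosen here and not taken from the slides); the helper names `nb_pmf` and `ratio` are hypothetical. It scans the ratios $p_kp_r/[\binom{k+r}{r}p_{k+r}]$ over a small range of $(k, r)$ and compares the largest with the closed form for $p_M$.

```python
from math import comb, exp, lgamma, log

def nb_pmf(k, mu, phi):
    """Negative binomial probability with mean mu and shape phi (log scale for stability)."""
    log_p = (phi * log(phi / (phi + mu)) + k * log(mu / (phi + mu))
             + lgamma(phi + k) - lgamma(k + 1) - lgamma(phi))
    return exp(log_p)

def ratio(k, r, mu, phi):
    """Lower bound of p_0 from inequality (2) for the pair (k, r)."""
    return (nb_pmf(k, mu, phi) * nb_pmf(r, mu, phi)
            / (comb(k + r, r) * nb_pmf(k + r, mu, phi)))

mu, phi = 2.0, 1.5  # illustrative values only
ratios = {(k, r): ratio(k, r, mu, phi) for k in range(1, 8) for r in range(1, 8)}
best = max(ratios, key=ratios.get)

# Closed form of p_M stated for the negative binomial.
p_M_closed = (phi / (phi + mu)) ** phi * phi / (phi + 1)
p0 = nb_pmf(0, mu, phi)

print("maximizing pair:", best)
print("max ratio:", ratios[best], " closed form:", p_M_closed)
print("p0:", p0, " bound (5):", max(p_M_closed, exp(-mu)))
```

For these values the maximizing pair is $k = r = 1$, in line with Prop. (III), and $p_0$ lies above the bound in (5).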

Example 2: Hermite of 3rd order [v] (dcp \ MP)
Consider $X$ a count r.v., Compound-Poisson distributed, where the compounding distribution takes a finite range of values, 0, 1, 2 and 3. It leads to a third-order Hermite distribution, which can be represented as
$$X = X_1 + 2X_2 + 3X_3, \quad \text{with independent } X_i \sim \mathcal{P}(\lambda_i).$$
Its probabilities, $p_k = P(X = k)$, can be calculated using the recursive relation
$$p_k = (p_{k-1}\lambda_1 + 2p_{k-2}\lambda_2 + 3p_{k-3}\lambda_3)/k,$$
where $p_0 = \exp(-\lambda_1 - \lambda_2 - \lambda_3)$ and $p_{-1} = p_{-2} = 0$.
This distribution is in dcp \ MP, and consequently the value of $p_M$ in (4) is not always Chao's lower bound. Indeed, taking $\lambda_2 = 0.5$ and $\lambda_3 = 1$, numerical calculations show that:
- for $\lambda_1 = 1.5$ the maximum is at Chao's lower bound, i.e. $p_M = p_1^2/(2p_2)$,
- for $\lambda_1 = 2$ the maximum is $p_M = p_1p_2/(3p_3)$, and
- for $\lambda_1 = 3$ the maximum is $p_M = p_2^2/(6p_4)$.
v. Puig, P., Barquinero, J.F. (2011). An application of compound Poisson modelling to biological dosimetry. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 467 (2127).
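The recursion for the third-order Hermite probabilities is short enough to sketch directly. The function name `hermite3_pmf` and the truncation point `kmax` are choices made here for illustration. The sketch only checks basic properties of the recursion (total mass close to one, mean $\lambda_1 + 2\lambda_2 + 3\lambda_3$); it does not attempt to reproduce the slide's numerical conclusions about where the maximum is attained.

```python
from math import exp

def hermite3_pmf(lam1, lam2, lam3, kmax):
    """Probabilities p_0..p_kmax of X = X1 + 2*X2 + 3*X3 with Xi ~ Poisson(lam_i),
    using p_k = (lam1*p_{k-1} + 2*lam2*p_{k-2} + 3*lam3*p_{k-3}) / k."""
    lam = (lam1, lam2, lam3)
    p = [exp(-(lam1 + lam2 + lam3))]  # p_0
    for k in range(1, kmax + 1):
        total = 0.0
        for j in (1, 2, 3):
            if k - j >= 0:  # terms with negative index are zero
                total += j * lam[j - 1] * p[k - j]
        p.append(total / k)
    return p

lam1, lam2, lam3 = 1.5, 0.5, 1.0
p = hermite3_pmf(lam1, lam2, lam3, kmax=60)

total_mass = sum(p)
mean = sum(k * pk for k, pk in enumerate(p))
print("total mass:", total_mass)
print("mean:", mean, " expected:", lam1 + 2 * lam2 + 3 * lam3)
```

The truncation at `kmax = 60` is far in the tail for these parameters, so the truncated mass is negligible.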

Estimating the non-observed number of zeros: from Chao to $0 = 1 + e^{i\pi}$
- Improved Chao estimate
- Turing estimate
- ZI-estimate
- Final result
The Imitation Game: Alan M. Turing and ...
How can these inequalities be applied to the estimation of the number of zeros?

Improved Chao estimate
Consider $X$ a count r.v. with probabilities $p_k$, $k = 0, 1, 2, \dots$, where only the zero-truncated r.v. $X \mid X > 0$ (positive values) is observed. Let $x = (x_1, x_2, \dots, x_n)$ be a sample of size $n$ of $X \mid X > 0$, and let $f_k$ denote the number (frequency) of $x_i$ equal to $k$, $k = 1, 2, \dots, m$ ($m$ is the largest count observed in the sample). It is evident that $f_1 + f_2 + \dots + f_m = n$.
Let $f_0$ denote the number of non-observed zeros, to be estimated. The size of the complete sample (counting the zeros) would be $N = f_0 + n$ (which represents the total number of individuals in the capture-recapture experiment).
Taking into account that $p_i \approx f_i/N$, the inequalities (2) lead to the following lower bound estimates of $f_0$:
$$\hat f_{r,k} = \frac{f_k\,f_r}{\binom{k+r}{r}\,f_{k+r}}, \quad 1 \le k, r, \quad k + r \le m. \tag{6}$$
The well-known Chao (1984, 1987) estimator of $f_0$ is obtained for $r = k = 1$.
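The family of estimates in (6) can be computed directly from the frequency vector. In the sketch below, the function name `lower_bound_estimates` is hypothetical, pairs with $f_{k+r} = 0$ are skipped (a choice made here to avoid division by zero), and the frequencies $f_1, \dots, f_4 = 32, 16, 6, 1$ are the commonly cited McKendrick cholera counts, taken as an assumption since the table values are not shown in this text.

```python
from math import comb

def lower_bound_estimates(freq):
    """freq[k-1] = f_k for k = 1..m.
    Returns {(k, r): f_k * f_r / (C(k+r, r) * f_{k+r})} for k <= r, k + r <= m."""
    m = len(freq)
    f = {k: freq[k - 1] for k in range(1, m + 1)}
    out = {}
    for k in range(1, m):
        for r in range(k, m - k + 1):  # r >= k because the estimate is symmetric
            if f[k + r] > 0:           # skip undefined ratios (assumption)
                out[(k, r)] = f[k] * f[r] / (comb(k + r, r) * f[k + r])
    return out

# Assumed McKendrick frequencies f_1..f_4 (zero class excluded).
est = lower_bound_estimates([32, 16, 6, 1])
best = max(est, key=est.get)

for pair, value in sorted(est.items()):
    print(pair, round(value, 2))
print("Chao estimate (k = r = 1):", est[(1, 1)])
print("largest estimate:", est[best], "at", best)
```

With these frequencies the pair $(1, 1)$ gives Chao's estimate and the largest value comes from $f_1f_3/(4f_4)$.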

Turing estimate
The inequalities (i) and (ii) in Proposition (II) also yield lower bound estimates of $f_0$. The population mean $\mu$ in (i)-(ii) can be replaced by
$$\hat\mu := \frac{s}{n + f_0}, \quad \text{where } s = \sum_{i=1}^{n} x_i.$$
Then, inequality (ii) in Proposition (II) leads to
$$\frac{f_0}{n + f_0} \ge \frac{f_1/(n + f_0)}{s/(n + f_0)},$$
and isolating $f_0$ we obtain Turing's estimator of $f_0$,
$$\hat f_T = \frac{n f_1}{s - f_1}. \tag{7}$$
Note: The so-called Good-Turing estimator [vi] of the population size, $\hat N_T = \hat f_T + n = n/(1 - f_1/s)$, underestimates it for the (very wide) family of distributions with log-convex pgf, by Prop. (II).
vi. See Good (1953), Chao & Lin (2012), Chiu et al. (2014), for capture-recapture problems.
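A minimal sketch of formula (7), using a hypothetical helper `turing_estimate` and, as an assumption, the commonly cited McKendrick cholera frequencies $f_1, \dots, f_4 = 32, 16, 6, 1$. It also checks the identity $\hat f_T + n = n/(1 - f_1/s)$ stated in the note.

```python
def turing_estimate(freq):
    """Turing's lower-bound estimate of f_0: n * f_1 / (s - f_1),
    where freq[k-1] = f_k, n = sum of f_k and s = sum of k * f_k."""
    n = sum(freq)
    s = sum(k * fk for k, fk in enumerate(freq, start=1))
    f1 = freq[0]
    if s == f1:
        raise ValueError("all observed counts equal 1: estimate is undefined")
    return n * f1 / (s - f1)

freq = [32, 16, 6, 1]  # assumed McKendrick frequencies f_1..f_4
n = sum(freq)
s = sum(k * fk for k, fk in enumerate(freq, start=1))

f_T = turing_estimate(freq)
N_T = n / (1 - freq[0] / s)  # Good-Turing population size

print("n =", n, " s =", s)
print("f_T =", f_T)
print("N_T =", N_T, " f_T + n =", f_T + n)
```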

ZI-estimate
Replacing again $\mu$ by $\hat\mu := s/(n + f_0)$ in the inequality (i) of Prop. (II), we obtain
$$\frac{f_0}{n + f_0} \ge \exp\left(-\frac{s}{n + f_0}\right) \iff \frac{x}{1 + x} \ge \exp\left(-\frac{\bar x}{1 + x}\right),$$
where $x = f_0/n$ and $\bar x = s/n$. From here, we define the zi-estimator of $f_0$,
$$\hat f_Z = n\hat x, \tag{8}$$
where $\hat x$ is the unique solution of the equation
$$-(1 + x)\log\left(\frac{x}{1 + x}\right) = \bar x. \tag{9}$$
Note: This estimator is well defined because the left-hand side of (9) is a decreasing function, tending to infinity as $x \to 0$ and to 1 as $x$ grows, and $\bar x > 1$.
Set of (under)estimators of $f_0$: $\hat f_{r,k}$, $\hat f_T$, $\hat f_Z$.
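Equation (9) has no closed-form solution, but since its left-hand side is decreasing, a simple bisection finds the root. The sketch below is one possible way to do it; the function name `zi_estimate`, the bracketing strategy and the tolerance are choices made here, and the frequencies $32, 16, 6, 1$ are the commonly cited McKendrick cholera counts, used as an assumption.

```python
from math import log

def zi_estimate(freq, tol=1e-12):
    """Solve (1 + x) * log((1 + x) / x) = xbar for x by bisection and
    return f_Z = n * x, where freq[k-1] = f_k and xbar = s / n."""
    n = sum(freq)
    s = sum(k * fk for k, fk in enumerate(freq, start=1))
    xbar = s / n
    if xbar <= 1:
        raise ValueError("the mean of the positive counts must exceed 1")

    def g(x):
        # Decreasing in x: +infinity near 0, tends to 1 - xbar < 0 for large x.
        return (1 + x) * log((1 + x) / x) - xbar

    lo, hi = 1e-12, 1.0
    while g(lo) <= 0:   # make sure the lower end is left of the root
        lo /= 2
    while g(hi) > 0:    # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return n * (lo + hi) / 2

f_Z = zi_estimate([32, 16, 6, 1])  # assumed McKendrick frequencies f_1..f_4
print("f_Z =", f_Z)
```

Any standard root finder could replace the bisection; it is used here only to keep the example free of external dependencies.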

Final results of estimation
Because $\hat f_{r,k}$, $\hat f_T$ and $\hat f_Z$ underestimate $f_0$, we propose the estimator obtained by maximizing over all of them. With
$$\hat f_M = \max_{r,k} \frac{f_k\,f_r}{\binom{k+r}{r}\,f_{k+r}}, \quad 1 \le k, r, \quad k + r \le m,$$
the estimator under the Compound- or Mixed-Poisson assumption is
$$\hat f_0 = \max\{\hat f_M,\ \hat f_Z,\ \hat f_T\}. \tag{10}$$
If $\hat f_C = f_1^2/(2f_2)$ is Chao's estimator ($r = k = 1$), then under the log-convex pgf assumption it is suitable to consider
$$\hat f_0 = \max\{\hat f_C,\ \hat f_Z,\ \hat f_T\}. \tag{11}$$
Remark: The variances of the estimators (10) and (11) are complicated to derive. We therefore suggest using a bootstrap method to estimate the variance and the associated confidence interval for any given sample.
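The combined estimator and the suggested bootstrap can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the resampling scheme (nonparametric resampling of the observed positive counts), the number of replicates, the handling of empty frequency classes, and all function names are choices made here. The sample is built from the commonly cited McKendrick cholera frequencies $32, 16, 6, 1$, which are also an assumption.

```python
import random
import statistics
from math import comb, log

def freqs(sample):
    """Frequency vector [f_1, ..., f_m] of a sample of positive counts."""
    return [sample.count(k) for k in range(1, max(sample) + 1)]

def f_M(f):
    """Largest f_k f_r / (C(k+r, r) f_{k+r}); pairs with f_{k+r} = 0 are skipped."""
    m = len(f)
    vals = [f[k - 1] * f[r - 1] / (comb(k + r, r) * f[k + r - 1])
            for k in range(1, m) for r in range(k, m - k + 1) if f[k + r - 1] > 0]
    return max(vals, default=0.0)  # 0 is a trivial lower bound when no pair is usable

def f_T(f):
    """Turing estimate n f_1 / (s - f_1)."""
    n, s = sum(f), sum(k * fk for k, fk in enumerate(f, 1))
    return n * f[0] / (s - f[0])

def f_Z(f):
    """ZI estimate: root of (1 + x) log((1 + x) / x) = s / n, found by bisection."""
    n = sum(f)
    xbar = sum(k * fk for k, fk in enumerate(f, 1)) / n
    g = lambda x: (1 + x) * log((1 + x) / x) - xbar
    lo, hi = 1e-9, 1.0
    while g(hi) > 0:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return n * hi

def estimate(sample):
    """Estimator (10): max of f_M, f_Z and f_T."""
    f = freqs(sample)
    if len(f) == 1:
        raise ValueError("all counts equal 1: estimators are undefined")
    return max(f_M(f), f_Z(f), f_T(f))

def bootstrap(sample, B=2000, seed=0):
    """Bootstrap mean, SD and 95% percentile interval of the estimator."""
    rng = random.Random(seed)
    stats = []
    while len(stats) < B:
        resample = rng.choices(sample, k=len(sample))
        try:
            stats.append(estimate(resample))
        except ValueError:
            continue  # skip degenerate resamples
    stats.sort()
    ci = (stats[int(0.025 * B)], stats[int(0.975 * B) - 1])
    return statistics.mean(stats), statistics.stdev(stats), ci

sample = [1] * 32 + [2] * 16 + [3] * 6 + [4] * 1  # assumed cholera counts
point = estimate(sample)
mean, sd, (lo, hi) = bootstrap(sample)
print("point estimate:", point)
print("bootstrap mean:", mean, " SD:", sd, " 95% CI:", (lo, hi))
```

Replacing `f_M(f)` by the single Chao term `f[0] ** 2 / (2 * f[1])` in `estimate` would give the log-convex version (11). The bootstrap figures depend on the seed and on the number of replicates, so they should not be expected to match published values exactly.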

Applications: Three Examples
Coming back to:
1 Cholera data set of McKendrick's problem
2 Number of words known but unused by Shakespeare
3 Number of grizzly bear females in Yellowstone

Cholera data set of McKendrick (1926)
Cholera in 223 households in a village in India.
[Table: No. of infections / No. of households (frequency); zero class: (168)]
Result: $\hat f_M = 48$, $\hat f_Z = $ [...], $\hat f_T = $ [...] and $\hat f_C = 32$. Here, $\hat f_M = (f_1 f_3)/(4f_4) = 48$.
Under the Compound- or Mixed-Poisson assumption, $\hat f_0 = 48$; under the log-convex pgf assumption, $\hat f_0 = 33$.
Variability using 5 bootstrap samples (and CI by the quantile method):
Estimator | Mean | SD | 95% CI
$\hat f_0$ (Compound- or Mixed-Poisson) | [...] | [...] | [26.5, 17.68]
$\hat f_0$ (log-convex pgf) | [...] | [...] | [22.38, 72.82]
Note that prior knowledge about the distributional pattern is important, because the width of the confidence interval is in general greater for the Compound- or Mixed-Poisson estimator.

Number of words known but unused by Shakespeare
A reduced version of the full table is reported in Efron and Thisted (1976).
[Table: Occurrences / No. of words (frequency)]
Result (using the full table): $\hat f_M = $ [...], $\hat f_Z = $ [...], $\hat f_T = $ [...] and $\hat f_C = $ [...].
Both estimators of $f_0$ (Compound- or Mixed-Poisson, and log-convex pgf) coincide with $\hat f_C$, Chao's estimator.
The simulation of 1 bootstrap samples produces:
Estimator | Mean | SD | 95% CI
$\hat f_0$ (Compound- or Mixed-Poisson) | [...] | [...] | [...]
$\hat f_0$ (log-convex pgf) | [...] | [...] | [...]

Number of grizzly bear females in Yellowstone
Estimation of the population of grizzly bear females (Keating et al., 2002).
[Table: Sightings / No. of bears (frequency)]
Result: $\hat f_M = $ [...], $\hat f_Z = $ [...], $\hat f_T = $ [...] and $\hat f_C = $ [...].
Under the Compound- or Mixed-Poisson assumption, $\hat f_0 = 28$; under the log-convex pgf assumption, $\hat f_0 = 6$. Adding the observed number of bears, 33, the estimated population sizes are $\hat N = 61$ and $\hat N = 39$, respectively.
The simulation of 5 bootstrap samples produces:
Estimator | Mean | SD | 95% CI
$\hat f_0$ (Compound- or Mixed-Poisson) | [...] | [...] | [5.83, 6.17]
$\hat f_0$ (log-convex pgf) | [...] | [...] | [3.5, 16.]

"Jo mai perdo. O bé guanyo, o n'aprenc."
"I never lose. I either win or I learn."

Supplementary References
a. Böhning, D. (2010). Some general comparative points on Chao's and Zelterman's estimators of the population size. Scand. J. Statist. 37.
b. Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scand. J. Statist. 11.
c. Chao, A. (1987). Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43.
d. Chao, A., Lin, C.-W. (2012). Nonparametric lower bounds for species richness and shared species richness under sampling without replacement. Biometrics 68.
e. Chiu, C.-H., Wang, Y.-T., Walther, B.A., Chao, A. (2014). An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics 70.
f. Good, I.J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika 40.
g. Johnson, N.L., Kemp, A.W., Kotz, S. (2005). Univariate Discrete Distributions (3rd ed.). Wiley, New Jersey.
h. Kemp, A.W., Kemp, C.D. (1966). An alternative derivation of the Hermite distribution. Biometrika 53.
i. Lanumteang, K., Böhning, D. (2011). An extension of Chao's estimator of population size based on the first three capture frequency counts. Comput. Statist. Data Anal. 55.
j. Steutel, F.W., van Harn, K. (2004). Infinite Divisibility of Probability Distributions on the Real Line (1st ed.). Dekker, New York.

Thanks - Gràcies - Merci - Singuila

Proof of Proposition (I)
See Steutel and van Harn (2004, Chap. II, p. 51) for the Compound-Poisson distributions. For the Mixed-Poisson distributions, note that the inequalities (1) are equivalent to
$$\int e^{-\lambda}\lambda^{k}\,dF(\lambda)\int e^{-\lambda}\lambda^{r}\,dF(\lambda) \le \int e^{-\lambda}\lambda^{r+k}\,dF(\lambda)\int e^{-\lambda}\,dF(\lambda). \tag{12}$$
Defining the probability measure over the positive reals
$$dG(\lambda) = \frac{e^{-\lambda}\,dF(\lambda)}{\int e^{-\lambda}\,dF(\lambda)},$$
the inequality (12) can be written as $E(Y^r)E(Y^k) \le E(Y^{k+r})$, where $Y$ is a positive r.v. with distribution $G$.
It is well known that for any positive r.v. $Y$, $E(Y^s)^{1/s} \le E(Y^z)^{1/z}$ for all $0 < s \le z$ (moment monotonicity). Without loss of generality we can assume that $r \le k$. Then,
$$E(Y^{k+r}) \ge E(Y^k)^{(k+r)/k} = E(Y^k)\,E(Y^k)^{r/k} \ge E(Y^k)\,E(Y^r),$$
and the proof is complete.

Proof of Proposition (II)
(i) Due to the convexity, the tangent line to $\log(\Phi_X(t))$ at $t = t_0$ always lies below $\log(\Phi_X(t))$, that is,
$$\log(\Phi_X(t)) \ge \frac{\Phi_X'(t_0)}{\Phi_X(t_0)}(t - t_0) + \log(\Phi_X(t_0)).$$
In particular, for $t_0 = 1$, taking into account that $\Phi_X'(1) = \mu$ and $\Phi_X(1) = 1$, we obtain $\log(\Phi_X(t)) \ge \mu(t - 1)$, and for $t = 0$ it leads to $\log(p_0) \ge -\mu$.
(ii) Note that the first derivative of $\log(\Phi_X(t))$ is an increasing function for $t \in [0, 1]$. In particular, the second inequality is deduced from
$$\frac{\Phi_X'(0)}{\Phi_X(0)} \le \frac{\Phi_X'(1)}{\Phi_X(1)}.$$
(iii) The third inequality is a direct consequence of the pgf log-convexity at $t = 0$. Because $\log(\Phi_X(t))$ is a convex function, calculating the second derivative we obtain that $\Phi_X(t)\,\Phi_X''(t) \ge (\Phi_X'(t))^2$. Evaluating this expression at $t = 0$, the third inequality directly holds.
Note: Evaluating the expression $\Phi_X(t)\,\Phi_X''(t) \ge (\Phi_X'(t))^2$ at $t = 1$, we directly obtain that any count r.v. having a log-convex pgf is overdispersed.

Proof of Proposition (III)
Because the Lemma establishes the set of inequalities
$$p_0 \ge \frac{p_1^2}{2p_2} \ge \frac{p_1 p_2}{3p_3} \ge \dots \ge \frac{p_1 p_k}{(k+1)\,p_{k+1}} \ge \dots,$$
we need only prove that
$$\frac{p_1}{(k+1)\,p_{k+1}} \ge \frac{p_r}{\binom{k+r}{r}\,p_{k+r}}, \quad r = 2, 3, \dots$$
This inequality is equivalent to
$$\int e^{-\lambda}\lambda^{k+1}\,dF(\lambda)\int e^{-\lambda}\lambda^{r}\,dF(\lambda) \le \int e^{-\lambda}\lambda^{r+k}\,dF(\lambda)\int \lambda e^{-\lambda}\,dF(\lambda).$$
Similarly to the proof of Proposition (I), defining the probability measure
$$dG(\lambda) = \frac{\lambda e^{-\lambda}\,dF(\lambda)}{\int \lambda e^{-\lambda}\,dF(\lambda)},$$
the inequality can be expressed as $E(Y^{r-1})E(Y^k) \le E(Y^{k+r-1})$, where $Y$ is a r.v. with distribution $G$. Using again the moment monotonicity, the proof is completed.
