Biostatistics Department Technical Report

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution


Biostatistics Department Technical Report BST006-00
Estimation of Prevalence by Pool Screening With Equal Sized Pools and a Negative Binomial Sampling Model
Charles R. Katholi, Ph.D.
Emeritus Professor, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham
ckatholi@uab.edu

The screening of pools of insects to make estimates of the prevalence of infection of some disease in a vector species is becoming more and more common due to the increasing sensitivity and specificity of PCR methods. In this approach, the investigator collects pools or groups of the vector species. These pools are then tested using an assay method such as PCR, and each pool is evaluated as positive or negative depending on whether the disease of interest is found in the pool. An early use of this approach was testing groups of men drafted to serve in the military for syphilis. In this case, the method was used to provide screening at reduced cost, since if a pool was found to be negative it was certain that none of the men in the pool had the disease. This approach is particularly appropriate in the case where the prevalence is low, and screening pools allows one to check a large number of insects with a smaller amount of labor than would be required to test each individual insect by dissection or some other protocol. The appropriate statistical model to use in conjunction with the pool screening approach depends very much on the way that the sampling and testing are done. Many investigators have considered the case of binomial sampling [1-5]. For testing individual insects (i.e., pools of size 1), George and Elston [6] recommended geometric sampling when the probability of an event was small. They gave confidence intervals for the prevalence based on this model. They did not, however, investigate the statistical properties of the estimator. Lui [7] extended their work on the confidence interval by considering negative binomial sampling and showed that as the number of successes required increased, the width of the confidence interval decreased. Lui also did not discuss point estimators or their statistical properties, nor did he investigate the statistical properties of his confidence intervals.
In this report we investigate these sampling models when an investigator collects and tests pools until some pre-determined number of positive pools is observed. We shall consider point and interval estimators obtained by both classical and Bayesian methods and investigate their statistical properties. In what follows we shall denote the size of the pools collected by m, the prevalence of infection by p, the number of positive results to be observed before quitting by r, and the number of times the experiment is carried out by N. If the prevalence of infection is p, then the probability that a pool of size m tests negative is given by (1-p)^m and the probability that a pool is positive is [1 - (1-p)^m]. If we let Y be the number of negative pools observed prior to getting r positive pools, then Y has a negative binomial distribution and we take this as the probability model upon which to base our calculations. If Y_1, Y_2, …, Y_N are the results of N such experiments we shall often denote them as a vector Y = (Y_1, …, Y_N)^T. Following normal practice we shall generally denote random variables by capital letters and their realizations by small letters.

Classical Methods: We begin by finding the maximum likelihood estimator of p given that Y has the negative binomial distribution

f(y|p) = P(Y = y|p) = C(y + r - 1, y) [1 - (1-p)^m]^r [(1-p)^m]^y,   y = 0, 1, 2, …;  0 < p < 1,

where C(a, b) denotes the binomial coefficient.
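To make the model concrete, here is a small sketch (illustrative only; the function names are mine, not the report's) that evaluates the probability that a pool tests positive and the negative binomial probability f(y|p) above.

```python
import math

def pool_positive_prob(p, m):
    """Probability that a pool of m insects tests positive: 1 - (1-p)^m."""
    return 1.0 - (1.0 - p) ** m

def neg_binomial_pmf(y, p, r, m):
    """f(y|p) = C(y+r-1, y) [1-(1-p)^m]^r [(1-p)^m]^y: probability of seeing
    y negative pools before the r-th positive pool is observed."""
    theta = pool_positive_prob(p, m)
    return math.comb(y + r - 1, y) * theta ** r * (1.0 - theta) ** y

# With prevalence 1/1000 and pools of 50 insects, a pool is positive
# roughly 4.9% of the time; the pmf sums to 1 over y = 0, 1, 2, ...
theta = pool_positive_prob(0.001, 50)
total = sum(neg_binomial_pmf(y, 0.001, 3, 50) for y in range(5000))
```

The truncation at y = 5000 is far beyond the mean number of negative pools here, so `total` is numerically indistinguishable from 1.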

Given the results of N replications of the sampling procedure, Y_1, Y_2, …, Y_N, the likelihood function is given by

l(p|Y) = Π_{j=1}^{N} C(Y_j + r - 1, Y_j) [1 - (1-p)^m]^r [(1-p)^m]^{Y_j}.

If we define L(p|Y) = ln l(p|Y) we have

L(p|Y) = Nr ln[1 - (1-p)^m] + mT ln(1-p) + Σ_{j=1}^{N} ln C(Y_j + r - 1, Y_j),  where T = Σ_{j=1}^{N} Y_j.

Recall that since the Y_1, Y_2, …, Y_N are i.i.d. negative binomial with parameters r and p, T is negative binomial with parameters Nr and p. Following the usual procedure we take the derivative of the log likelihood with respect to p, set it equal to zero and solve the resulting equation. We shall show that the solution is a maximum and that it is unique.

∂L/∂p = Nrm(1-p)^{m-1}/[1 - (1-p)^m] - mT/(1-p) = [m/(1-p)] { Nr(1-p)^m/[1 - (1-p)^m] - T }.

Since m/(1-p) is positive for 0 < p < 1 we need only consider the right hand factor when this is set to zero. Setting the right hand term to zero and solving for p yields

p̂ = 1 - [ T/(T + Nr) ]^{1/m}.

Examination of u(p) = Nr(1-p)^m/[1 - (1-p)^m] - T reveals that as p approaches zero, u(p) tends to +∞. Similarly, as p approaches 1, u(p) approaches -T. Because the function is continuous on (0, 1) and changes sign in the interval, it follows from the intermediate value theorem that there is at least one root of the equation in the interval (0, 1). Next note that it is easily demonstrated that the left hand term in the expression for u(p) is strictly monotone decreasing, so there is only one solution on (0, 1). Finally,

∂²L/∂p² = -Nrm(m-1)(1-p)^{m-2}/[1 - (1-p)^m] - Nrm²(1-p)^{2m-2}/[1 - (1-p)^m]² - mT/(1-p)² < 0

for all 0 < p < 1, and so p̂ is the unique maximum likelihood estimator (MLE) of p. Next we consider the asymptotic properties of the MLE. It is commonly the custom to assume the consistency and asymptotic normality of maximum likelihood estimators as given when certain (generally unspecified) regularity conditions are met. Most basic texts in statistics do not say what these conditions are. They can be found, for example, in Singer and Sen [10], Serfling [11] and Ferguson [12]. The exact assumptions

differ somewhat among the authors and some can be difficult to establish in practice. Wald [13] established the strong consistency of the MLE under very weak conditions compared to those given in general. On the other hand, when the MLE is available in closed form, as it is here, it is often possible to establish these properties directly, and that approach is taken here. In what follows we will denote the MLE by p̂_N to emphasize that it is the estimator based on N replications of the sampling process. We begin by proving the following result.

Theorem 0: p̂_N is a strongly consistent estimator; that is, p̂_N → p a.s. as N → ∞.

proof: We have previously shown that

p̂_N = 1 - [T/(T + Nr)]^{1/m} = 1 - [X̄_N/(X̄_N + r)]^{1/m},  X̄_N = T/N = (1/N) Σ_{i=1}^{N} Y_i.

Let c = E(Y_i) = r(1-p)^m/[1 - (1-p)^m]; then by Khintchine's strong law of large numbers X̄_N → c a.s., and W_N = X̄_N - c → 0 a.s. Expanding the right hand side of the expression for p̂_N in a Taylor series about c (Lagrange form of the remainder) and noting that c/(c + r) = (1-p)^m, we can write, for some h with 0 < h < 1,

p̂_N = p - (r/m) W_N (X̄_N - hW_N)^{1/m - 1} (X̄_N + r - hW_N)^{-1/m - 1}.

Define the vector random variable V_N = (W_N, X̄_N)^T and note that V_N → v_0 = (0, c)^T a.s. Also define the function

g(v) = (r/m) w (x - hw)^{1/m - 1} (x + r - hw)^{-1/m - 1},  v = (w, x)^T.

g(v) is clearly a continuous function of its arguments near v_0 and hence is a Borel function. Hence by general convergence results for Borel functions we have g(V_N) → g(v_0) = 0 a.s. This completes the proof.
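Theorem 0 can be illustrated by simulation (a sketch of mine, not from the report). The MLE below is written with log1p/expm1, which also sidesteps the cancellation issue discussed later for very rare events, and a small negative binomial simulation shows p̂_N settling near p as N grows.

```python
import math
import random

def prevalence_mle(total_neg, n_reps, r, m):
    """p-hat = 1 - (T/(T + N*r))**(1/m), computed as -expm1(log1p(-Nr/(T+Nr))/m)
    so that no precision is lost when T is huge compared to N*r."""
    big_r = n_reps * r
    return -math.expm1(math.log1p(-big_r / (total_neg + big_r)) / m)

def simulate_total_neg(p, m, r, n_reps, rng):
    """Total negative pools over N replications, each run until r positive pools."""
    theta = 1.0 - (1.0 - p) ** m          # probability a pool tests positive
    total = 0
    for _ in range(n_reps * r):           # NB(r, theta) = sum of r geometric counts
        while rng.random() >= theta:
            total += 1
    return total

rng = random.Random(1)
estimates = [prevalence_mle(simulate_total_neg(0.01, 25, 5, n, rng), n, 5, 25)
             for n in (5, 5000)]          # few replications vs. many
```

With N = 5000 replications the estimate lands within a small fraction of a percent of the true prevalence p = 0.01, as the strong consistency result predicts.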

We are also interested in the asymptotic behavior of the moments of p̂_N. To this end, consider the function

f(t) = 1 - [t/(t + Nr)]^{1/m} = 1 - t^{1/m} (t + Nr)^{-1/m}

and note that E(T) = Nr(1-p)^m/[1 - (1-p)^m]. In order to expand f(t) about t_0 = E(T) we need the derivatives of f(t). To this end we note that f(t) can be written in terms of the product of two functions, u(t) = t^{1/m} and v(t) = (t + Nr)^{-1/m}; that is, f(t) = 1 - u(t)v(t). We shall also utilize Leibniz's rule for taking the s-th derivative of a product of two functions; that is,

d^s(uv)/dt^s = Σ_{j=0}^{s} C(s, j) [d^{s-j}u/dt^{s-j}] [d^j v/dt^j].   (1)

It is straightforward to show that

d^k u/dt^k = t^{1/m - k} Π_{j=0}^{k-1} (1/m - j),  d^k v/dt^k = (-1)^k (t + Nr)^{-1/m - k} Π_{j=0}^{k-1} (1/m + j),

and that

u(E(T)) = (Nr)^{1/m} (1-p) [1 - (1-p)^m]^{-1/m},  v(E(T)) = (Nr)^{-1/m} [1 - (1-p)^m]^{1/m}.

If we make the rule that an empty product equals 1, then, writing q = (1-p)^m and θ = 1 - q, for 0 ≤ j ≤ s we can write

[d^{s-j}u/dt^{s-j}] [d^j v/dt^j] |_{t=E(T)} = (-1)^j (1-p) θ^s (Nr)^{-s} q^{j-s} Π_{i=0}^{s-j-1} (1/m - i) Π_{i=0}^{j-1} (1/m + i).

This is used in conjunction with Leibniz's rule, equation (1), to calculate all needed derivatives. As it happens, examination of the first few of these reveals a pattern, and so the general k-th derivative can be expressed in closed form. To this end, we consider several cases, again writing q = (1-p)^m and θ = 1 - q.

Case i: (s = 1)

d(uv)/dt |_{E(T)} = u'v + uv' = [(1-p)θ/(m(Nr))] (1/q - 1) = (1-p)θ²/(m(Nr)q).   (2)

Case ii: (s = 2)

d²(uv)/dt² |_{E(T)} = u''v + 2u'v' + uv'',

which after plugging in the parts and gathering terms is

d²(uv)/dt² |_{E(T)} = [(1-p)θ²/(m²(Nr)²)] [ (1-m)/q² - 2/q + (1+m) ].

But the term in brackets is divisible by θ = 1 - q, since (1+m)q² - 2q + (1-m) = (q - 1)[(1+m)q - (1-m)], leading to the final result

d²(uv)/dt² |_{E(T)} = -(1-p)θ³ [(1+m)q - (1-m)] / (m²(Nr)²q²).   (3)

Examination of several more derivatives (s = 3, 4 and 5) leads to the general result

d^k(uv)/dt^k |_{E(T)} = (-1)^{k-1} (1-p) θ^{k+1} g_k(p) / (m^k (Nr)^k q^k),

where the polynomials g_k(p) are given in equation (5) below. Having developed this formula, it is now possible to expand the equation for the MLE in a Taylor series about E(T). Thus we have

p̂_N = f(T) = p - [d(uv)/dt]|_{E(T)} (T - E(T)) - (1/2!) [d²(uv)/dt²]|_{E(T)} (T - E(T))² - ⋯

That is, writing q = (1-p)^m and θ = 1 - q,

p̂_N = p - [(1-p)θ²/(m(Nr)q)] (T - E(T))
        + [(1-p)θ³ g_2(p)/(2! m²(Nr)²q²)] (T - E(T))²
        - [(1-p)θ⁴ g_3(p)/(3! m³(Nr)³q³)] (T - E(T))³
        + [(1-p)θ⁵ g_4(p)/(4! m⁴(Nr)⁴q⁴)] (T - E(T))⁴ - ⋯   (4)

where

g_1(p) = 1,  g_2(p) = (1+m)(1-p)^m - (1-m),   (5)

and for k = 3, 4, … the g_k(p) are the analogous polynomials in (1-p)^m obtained by carrying the factorization used in Case ii to higher order. Noting that each term in this expansion has a factor of the form (T - E(T))^k = ( Σ_{i=1}^{N} (Y_i - E(Y_i)) )^k, we see that taking the expected value of p̂_N involves finding the expected value of averages of i.i.d. random variables. Clearly, by the way we constructed the series,

E(T - E(T)) = Σ_{i=1}^{N} E(Y_i - E(Y_i)) = 0,

while

E(T - E(T))² = Var(T) = Nrq/θ²

and

E(T - E(T))³ = Nrq(1 + q)/θ³,

and finally

E(T - E(T))⁴ = 3[Var(T)]² + Nrq(1 + 4q + q²)/θ⁴ = 3(Nrq/θ²)² + Nrq(1 + 4q + q²)/θ⁴,

again with q = (1-p)^m and θ = 1 - q. If we now take the expectation term-wise in the series for p̂_N and gather terms in powers of 1/(Nr), we obtain:

Theorem 1: The first few terms in the expansion for E(p̂_N) in powers of 1/(Nr) are

E(p̂_N) = p + (1-p)θ g_2(p)/(2! m²(Nr)q)
          + [(1-p)θ/((Nr)²q²)] [ 3g_4(p)/(4! m⁴) - (1+q)g_3(p)/(3! m³) ] + ⋯

where the g_k(p) are defined as in equation (5). We note that both g_2(p) and the bracketed second order combination are positive, and so the maximum likelihood estimator is upwardly biased. From the expansion in equation (4) we can also find the first few terms in the expansion for (p̂_N - p)², take the expectation with respect to T as above, and obtain the second order approximation to the mean square error for p̂_N. To this end, moving p to the left hand side of equation (4) and squaring both sides yields,

(p̂_N - p)² = [(1-p)θ²/(m(Nr)q)]² (T - E(T))²
             - 2 [(1-p)θ²/(m(Nr)q)] [(1-p)θ³g_2(p)/(2! m²(Nr)²q²)] (T - E(T))³
             + { [(1-p)θ³g_2(p)/(2! m²(Nr)²q²)]² + 2 [(1-p)θ²/(m(Nr)q)] [(1-p)θ⁴g_3(p)/(3! m³(Nr)³q³)] } (T - E(T))⁴ - ⋯

again with q = (1-p)^m and θ = 1 - q. Next, taking the expectation on both sides of the equation and gathering terms in powers of 1/(Nr) yields,

Theorem 2: The first few terms in the expansion for the mean square error in powers of 1/(Nr) are

E(p̂_N - p)² = (1-p)²θ²/(m²(Nr)q)
              + [(1-p)²θ²/((Nr)²q²)] [ 3g_2²(p)/(4m⁴) + g_3(p)/m⁴ - (1+q)g_2(p)/m³ ] + ⋯

By combining the expansions from Theorems 1 and 2 we can find the first few terms of the expansion for the variance of p̂_N. Carrying out this process leads to the following theorem.
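The leading term of Theorem 2 can be sanity-checked by simulation. The sketch below is mine, not the report's, and uses deliberately loose tolerances: the mean squared error of p̂ over many simulated experiments should be close to (1-p)²θ²/(m²·Nr·q), with q = (1-p)^m and θ = 1 - q.

```python
import random

def mle(total_neg, big_r, m):
    """p-hat = 1 - (T/(T + Nr))**(1/m) with Nr = big_r total positive pools."""
    return 1.0 - (total_neg / (total_neg + big_r)) ** (1.0 / m)

def sim_total(p, m, big_r, rng):
    """Negative pools accumulated before each of Nr positive pools."""
    theta = 1.0 - (1.0 - p) ** m
    total = 0
    for _ in range(big_r):
        while rng.random() >= theta:
            total += 1
    return total

p, m, big_r = 0.02, 10, 100               # Nr = N*r = 100 positive pools
q = (1.0 - p) ** m
theta = 1.0 - q
leading_mse = (1.0 - p) ** 2 * theta ** 2 / (m ** 2 * big_r * q)

rng = random.Random(7)
draws = [mle(sim_total(p, m, big_r, rng), big_r, m) for _ in range(3000)]
mse = sum((d - p) ** 2 for d in draws) / len(draws)
```

The Monte Carlo MSE and the first-order formula agree to within a few percent here; the residual gap is the 1/(Nr)² term of Theorem 2 plus simulation noise.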

Theorem 3: The first few terms in the expansion for the variance of p̂_N are

Var(p̂_N) = (1-p)²θ²/(m²(Nr)q)
           + [(1-p)²θ²/((Nr)²q²)] [ g_2²(p)/(2m⁴) + g_3(p)/m⁴ - (1+q)g_2(p)/m³ ] + ⋯

with q = (1-p)^m and θ = 1 - q. Note that, as we would expect, the first term in this expansion is just the reciprocal of the Fisher information. We now want to consider the asymptotic distribution of p̂_N. We shall prove the following theorem:

Theorem 4:

m √(Nr(1-p)^m) (p̂_N - p) / { (1-p)[1 - (1-p)^m] } →D N(0, 1).

proof: We shall again utilize a truncated Taylor expansion to demonstrate this result. In a manner analogous to equation (4) we can obtain, for some h with 0 < h < 1,

p̂_N - p = -[(1-p)θ²/(mrq)] W_N + (1/2!) R_N W_N²,

where

R_N = -(r/m) [ (1/m - 1)(X̄_N - hW_N)^{1/m - 2}(X̄_N + r - hW_N)^{-1/m - 1} - (1/m + 1)(X̄_N - hW_N)^{1/m - 1}(X̄_N + r - hW_N)^{-1/m - 2} ]

and, as before, X̄_N = T/N = (1/N) Σ_{i=1}^{N} Y_i, W_N = X̄_N - c, and c = r(1-p)^m/[1 - (1-p)^m]. Next we note that the first term can be written as

-[(1-p)θ/(m√(Nrq))] × [θ√N W_N/√(rq)].

We note that the term in the left hand bracket is just the square root of the first term in the expansion for the variance given in Theorem 3, while the term in the right hand bracket is just W_N multiplied by the reciprocal of the square root of the variance of W_N, since Var(W_N) = Var(Y)/N = rq/(Nθ²). Hence we can write that

m√(Nrq)(p̂_N - p)/[(1-p)θ] = -θ√N W_N/√(rq) + [m√(Nrq)/(2(1-p)θ)] R_N W_N²,   (6)

with R_N the remainder factor in the expansion of p̂_N - p above. We have noted previously that X̄_N → c a.s. and that W_N → 0 a.s. Since almost sure convergence implies convergence in probability, each of these converges to its respective limit in probability as well. Finally, consider the quantity N^{1/4} W_N. We shall show that this converges in probability to 0 as N tends to infinity. To this end, by Chebyshev's inequality,

P(N^{1/4}|W_N| > ε) ≤ E(√N W_N²)/ε² = √N Var(W_N)/ε² = rq/(√N θ² ε²),

so that P(N^{1/4}|W_N| > ε) → 0 as N → ∞; that is, N^{1/4}W_N converges to zero in probability, and hence so does √N W_N² = (N^{1/4}W_N)². An argument like the one used in Theorem 0 now shows that the second term on the right hand side of the equal sign in equation (6) converges to zero in probability. By the central limit theorem the first term (in brackets) on the right hand side converges in distribution to a N(0,1), and by symmetry of the normal so does the negative of this random variable. Finally, application of Slutsky's theorem completes the proof.

To summarize, we have shown that a unique maximum likelihood estimator exists and that it has all the usual properties we associate with an MLE. We have also obtained the first few terms of asymptotic expansions for the bias and variance of the MLE. One

practical matter needs to be noted. That is, the MLE is well defined even if the experiment is carried out only once. That is, in case the investigator collects and tests pools until r positive pools are found, the MLE is just

p̂ = 1 - [Y/(Y + r)]^{1/m},

where Y is the number of negative pools tested prior to obtaining r positive pools. On the other hand, it is not reasonable in this case to invoke any of the asymptotic results noted above. It is particularly worthy of note that in the event of a very rare event, it might be of practical interest to set r = 1. For the rare event case, it is also important to exercise care in the computation of p̂. This is because Y will be large compared to r, and so the ratio Y/(Y + r) will approach 1. The ratio raised to the 1/m power is even closer to 1, and so there will be excessive cancellation leading to loss of precision in the computation of p̂. This problem can be solved by noting that

[Y/(Y + r)]^{1/m} = e^ξ,  ξ = (1/m) ln[1 - r/(Y + r)],

and using the Maclaurin expansion

ln(1 - x) = -(x + x²/2 + x³/3 + ⋯ + x^k/k + ⋯),  -1 < x < 1,

with x = r/(Y + r) to calculate the small negative quantity ξ. Then the Maclaurin expansion

p̂ = 1 - e^ξ = -(ξ + ξ²/2! + ξ³/3! + ⋯)

is used to calculate p̂ without loss of precision due to cancellation. To get a sense of the asymptotic behavior of the MLE, we note that the key factor in each term of the expansions given in Theorems 1 and 3 is the quantity Nr. Table I shows the effect of Nr on the bias for several values of p when the pool size m = 50. From the table it is clear that the key number from an asymptotic point of view is the number of positive pools observed (Nr). It can also be noted here that the results of Theorems 1 and 3 still hold if the experiment is changed so that at each site the investigator collects until r_i, i = 1, 2, …, N, positive pools are observed. That is, if the sites are indexed as i = 1, 2, …, N, the investigator observes r_i positive pools at the i-th site. In this case the log likelihood function is

L(p|Y) = ( Σ_{j=1}^{N} r_j ) ln[1 - (1-p)^m] + mT ln(1-p) + Σ_{j=1}^{N} ln C(Y_j + r_j - 1, Y_j),  T = Σ_{j=1}^{N} Y_j,

and the maximum likelihood estimator is

p̂* = 1 - [ T/(T + Σ_{j=1}^{N} r_j) ]^{1/m}.

If r̄ = (1/N) Σ_{j=1}^{N} r_j, then the expansions of Theorems 1 and 3 are the same but with r replaced by r̄, and so the bias and mean square error depend asymptotically on powers of 1/(N r̄); N r̄ = Σ_{j=1}^{N} r_j grows without bound as N tends to infinity.

p                    Nr    E(p̂)       Bias
1/1000 = 0.001       1     0.0515     0.0505
                     2     0.0045     0.0035
                     3     0.0016     0.0006
                     4     0.00134    0.00034
                     5     0.00125    0.00025
                     10    0.00112    0.00012
                     15    0.00107    0.00007
                     20    0.00105    0.00005
                     25    0.00104    0.00004
1/10,000 = 0.0001    1     0.00546    0.00536
                     2     0.0004     0.0003
                     3     0.00015    0.00005
                     4     0.00013    0.00003
                     5     0.00012    0.00002
                     10    0.000112   0.000012
                     15    0.000107   0.000007
                     20    0.000105   0.000005
                     25    0.000104   0.000004

Table I: Behavior of the MLE with respect to bias as the quantity Nr increases, given p and pools of size m = 50.

We next consider confidence intervals produced from this modeling approach. As mentioned previously, George and Elston [6] considered a confidence interval for the case r = 1, while Lui [7] considered the case r > 1 with N = 1. The extension of their results to the screening of pools is very easy. As we have previously observed, if we have collected specimens at N sites until r positive pools are observed at each site, and if we denote the number of negative pools collected at site i by Y_i, then the distribution of

T = Σ_{i=1}^{N} Y_i is negative binomial with parameters q = (1-p)^m and Nr. Given that the investigator has observed t total failures, the classical confidence limits are then given by solving the equations

Σ_{y=0}^{t} C(y + Nr - 1, y) [1 - (1-p)^m]^{Nr} [(1-p)^m]^y = α/2   (8)

Σ_{y=t}^{∞} C(y + Nr - 1, y) [1 - (1-p)^m]^{Nr} [(1-p)^m]^y = α/2.   (9)

A number of investigators [7-9] have shown that these sums can be replaced with equivalent binomial sums and hence that the sums can be found from the incomplete beta function. In particular, following Patil [9], let

w(y, 1-(1-p)^m, Nr) = C(y + Nr - 1, y) [1 - (1-p)^m]^{Nr} [(1-p)^m]^y

and W(t, 1-(1-p)^m, Nr) = Σ_{y=0}^{t} w(y, 1-(1-p)^m, Nr); then

W(t, 1-(1-p)^m, Nr) = I_{1-(1-p)^m}(Nr, t + 1),

where

I_x(Nr, t + 1) = [1/B(Nr, t + 1)] ∫_0^x u^{Nr-1} (1-u)^t du

is the incomplete beta function with parameters Nr and t + 1. Lui then uses the known relationship between the beta distribution and the F-distribution to obtain closed form equations for the end points of the confidence interval based on critical values from the F-distribution. From a computational point of view, it is no easier to calculate the quantile values from the F-distribution than it is to calculate those from the beta distribution, and so in our study we have used the appropriate quantiles from the incomplete beta function. The confidence intervals should be exact (i.e., for any value of p, P(p_l < p < p_u) ≥ 1 - α, and the infimum with respect to p of such probabilities is exactly 1 - α). Table II shows the coverage probabilities of the intervals for two values of the unknown parameter p and for a number of values of r.
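Patil's identity between the negative binomial sum and the incomplete beta function can be verified numerically. The sketch below is mine (the crude trapezoid quadrature is for illustration only): both sides compute W(t, 1-(1-p)^m, Nr).

```python
import math

def nb_sum(t, theta, big_r):
    """W(t, theta, Nr) = sum_{y=0}^{t} C(y+Nr-1, y) theta^Nr (1-theta)^y."""
    return sum(math.comb(y + big_r - 1, y) * theta ** big_r * (1.0 - theta) ** y
               for y in range(t + 1))

def reg_inc_beta(x, a, b, steps=20000):
    """Regularized incomplete beta I_x(a, b) by the trapezoid rule (sketch only;
    adequate here because a and b are small integers)."""
    h = x / steps
    f = lambda u: u ** (a - 1) * (1.0 - u) ** (b - 1)
    area = h * (0.5 * (f(0.0) + f(x)) + sum(f(i * h) for i in range(1, steps)))
    return area * math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

theta = 1.0 - (1.0 - 0.002) ** 50      # pool-positive probability for m = 50
lhs = nb_sum(7, theta, 5)              # negative binomial sum, Nr = 5, t = 7
rhs = reg_inc_beta(theta, 5, 8)        # I_theta(Nr, t+1)
```

A production implementation would of course use a library quantile/CDF routine for the beta distribution rather than hand-rolled quadrature; the point here is only that the two expressions agree.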

p                    r     Coverage probability
1/1000 = 0.001       2     0.95355
                     3     0.9555
                     4     0.9535
                     5     0.9559
                     10    0.955
                     15    0.95
                     20    0.95089
                     25    0.95045
1/10,000 = 0.0001    2     0.9509
                     3     0.95039
                     4     0.9508
                     5     0.9505
                     10    0.9503
                     15    0.9500
                     20    0.95009
                     25    0.95006

Table II: Coverage probabilities for the confidence intervals when the true values of the unknown parameter p are as given in the table. For this table, the pool size is m = 50.

In Table III we give results about coverage probabilities for the case m = 50, r = 10, and p taking on values in the interval (1/10,000, 1/1000).

p × 10^4    Coverage probability
0.450       0.95006
0.750       0.95006
0.96        0.95007
1.30        0.95009
1.585       0.950
2.000       0.95003
2.60        0.95045
3.500       0.95035
4.30        0.9504
5.500       0.95086
7.00        0.95078
9.000       0.95067
10.000      0.955

Table III. Coverage probabilities for 95% confidence intervals for 1/10,000 ≤ p ≤ 1/1000 when the pool size is m = 50 and r = 10.

In Table III we see the typical variation of the coverage probability with changes in p that one expects with discrete distributions. These intervals are exact and slightly

conservative, but not nearly as conservative as the binomial confidence intervals, for example.

Bayesian Methods: We now take a look at the use of the Bayesian method to find point estimates and credibility intervals for this sampling situation. The new decision which must be made here is the choice of the prior distribution. We shall assume initially that we have no prior experience upon which to base a decision concerning a prior distribution. For that reason, we shall utilize the Jeffreys prior for our analysis. It is not difficult to show that for our negative binomial model this prior, g(p), is such that

g(p) ∝ (1-p)^{m/2 - 1} / [1 - (1-p)^m].

Combining this with the likelihood

l(p|Y) = K [1 - (1-p)^m]^{Nr} [(1-p)^m]^T,  T = Σ_{j=1}^{N} Y_j,  K = Π_{i=1}^{N} C(Y_i + r - 1, Y_i),

leads to the posterior distribution

π(p) = m [1 - (1-p)^m]^{Nr - 1} (1-p)^{m(t + 1/2) - 1} / B(Nr, t + 1/2),  0 < p < 1,

where t is the value of the random variable T actually observed. Although the classical concept of point estimate is not part of the Bayes approach, it is nonetheless possible to consider the mode of the distribution as analogous to a point estimate. In this case, as long as Nr ≥ 2, π(p) has a single maximum on 0 < p < 1 at the point

p_b = 1 - [ (t + 1/2 - 1/m) / (t + 1/2 - 1/m + Nr - 1) ]^{1/m}.

We noted earlier that the maximum likelihood estimator p̂ is upwardly biased. It requires only simple algebra to show that p_b < p̂, suggesting that it might be a less biased estimator. This possibility was investigated numerically, and as is shown in Table IV, p_b is far less biased than p̂. This would suggest that if bias in the point estimator is important to the investigator, p_b is a better choice for the point estimator, particularly when r is small.

p                    Nr    E(p̂)            E(p_b)
1/1000 = 0.001       2     4.5 × 10^-3     9.8450 × 10^-4
                     3     1.6 × 10^-3     1.0004 × 10^-3
                     5     1.25 × 10^-3    1.0002 × 10^-3
                     10    1.12 × 10^-3    1.0001 × 10^-3
1/10,000 = 0.0001    2     2.4 × 10^-4     9.9860 × 10^-5
                     3     1.50 × 10^-4    1.00005 × 10^-4
                     5     1.25 × 10^-4    1.0000 × 10^-4
                     10    1.12 × 10^-4    1.0000 × 10^-4

Table IV: Comparison of the expected value of the MLE, p̂, to the expected value of the mode of the Bayesian posterior distribution, p_b. Calculations shown are for pool sizes of m = 50.

Equal tail area (1-α)·100% credibility intervals are also easily calculated given the posterior distribution π(p). To this end we must find values p_l and p_u such that

∫_0^{p_l} π(p) dp = α/2  and  ∫_0^{p_u} π(p) dp = 1 - α/2.

That is, for p_l we must solve the equation

α/2 = ∫_0^{p_l} m [1 - (1-p)^m]^{Nr - 1} (1-p)^{m(t + 1/2) - 1} / B(Nr, t + 1/2) dp = ∫_{u_l}^{1} u^{t + 1/2 - 1} (1-u)^{Nr - 1} / B(t + 1/2, Nr) du.

Note that if we let u_l = (1 - p_l)^m, the right hand integral can be solved for u_l by means of the beta distribution quantile function; p_l is then easily calculated from u_l by simple algebra. Computation of p_u is accomplished in the same manner. Again, coverage probability is not a Bayesian concept, but for comparison purposes Table V gives the calculated coverage of the 95% Bayesian credibility interval for a range of values of p when m = 50 and r = 3. From Table V it is clear that the credibility interval is not exact in the sense described above. On the other hand, it appears to be reasonably close to the nominal level across all values of p considered and so should be quite useful. Before using this in a situation where you have prior knowledge which leads you to believe that the random variable p is in a particular interval, a simulation study for a moderate number of values of p in this interval will be valuable in assessing results.
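A quick numerical comparison of the two point estimates (my sketch, assuming the closed-form posterior mode p_b = 1 - [(t + 1/2 - 1/m)/(t + 1/2 - 1/m + Nr - 1)]^{1/m} from the Jeffreys-prior posterior; the function names are illustrative):

```python
def mle(t, big_r, m):
    """Maximum likelihood estimate: 1 - (t/(t + Nr))**(1/m)."""
    return 1.0 - (t / (t + big_r)) ** (1.0 / m)

def posterior_mode(t, big_r, m):
    """Mode of the Jeffreys-prior posterior (assumes Nr >= 2 and m > 2):
    p_b = 1 - [(t + 1/2 - 1/m)/(t + 1/2 - 1/m + Nr - 1)]**(1/m)."""
    a = t + 0.5 - 1.0 / m
    return 1.0 - (a / (a + big_r - 1.0)) ** (1.0 / m)

# t = 300 negative pools observed, Nr = 20 positive pools, pools of m = 50.
p_hat = mle(300, 20, 50)
p_b = posterior_mode(300, 20, 50)
```

Consistent with the text, p_b always falls strictly below p̂ for m > 2, illustrating the smaller upward bias of the posterior mode.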

p × 10^4    Coverage probability
0.96        0.94984
1.234       0.9509
1.585       0.9500
2.035       0.94979
2.613       0.95049
3.355       0.9499
4.307       0.9495
5.53        0.95098
7.101       0.953
9.119       0.94769
11.709      0.947
15.034      0.9500
19.305      0.95480
24.788      0.9504

Table V: Coverage probabilities for the 95% credibility interval for p in the approximate range (1/10,000, 1/400). The actual values of p were chosen equally spaced on a logarithmic scale. For this table, the pool size m = 50 and r = 3.

General Conclusions: This method of sampling has been suggested as a reasonable approach to take when the population prevalence is believed to be very small. We have investigated the maximum likelihood approach for finding a point estimate, shown that it is strongly consistent and that it is asymptotically normally distributed. We have found asymptotic expansions for the bias and mean square error and have investigated the bias of the estimator numerically. We have also considered exact confidence intervals analogous to the Clopper-Pearson intervals for the binomial distribution. Finally, we have considered a Bayesian analysis based on the noninformative Jeffreys prior. We have found that the MLE is upwardly biased, and severely so when the number of positive pools required prior to stopping the sampling is small (r = 1 or 2). The confidence intervals are slightly conservative, but not nearly as conservative as the Clopper-Pearson intervals for the binomial sampling model. We have also seen that the mode of the posterior distribution, viewed as a point estimate, is nearly unbiased even when r is small. The Bayesian credibility intervals, although not exact, have nice coverage properties from a frequentist point of view.

References:
(1). Thompson, K.H., Estimation of the proportion of vectors in a natural population of insects. Biometrics 18: 568-578 (1962).
(2). Chiang, C.L., Reeves, W.C., Statistical estimation of virus infection rates in mosquito vector populations. Am J Hyg 75: 377-391 (1962).
(3). Katholi, C.R., Toé, L., Merriweather, A., Unnasch, T.R., Determining the prevalence of Onchocerca volvulus infection in vector populations by PCR screening of pools of black flies. J Infect Dis 172: 1414-1417 (1995).
(4). Barker, J.T., Statistical Estimators of Infection Potential Based on PCR Pool Screening With Unequal Pool Sizes, Biostatistics. Birmingham: University of Alabama at Birmingham (2000).
(5). Hepworth, G., Exact confidence intervals for proportions estimated by group testing. Biometrics 52: 1134-1146 (1996).
(6). George, V.T. and Elston, R.C., Confidence intervals based on the first occurrence of an event. Statistics in Medicine, Vol. 12, pp 685-690 (1993).
(7). Lui, K., Confidence limits for the population prevalence rate based on the negative binomial distribution. Statistics in Medicine, Vol. 14, pp 1471-1477 (1995).
(8). Morris, K.W., A note on direct and inverse binomial sampling. Biometrika, Vol. 50, No. 3/4, pp 544-545 (1963).
(9). Patil, G.P., On the evaluation of the Negative Binomial Distribution with examples. Technometrics, Vol. 2, No. 4, pp 501-505 (1960).
(10). Singer, J.M. and Sen, P.K., Large Sample Methods in Statistics, Chapman & Hall, New York (1993).
(11). Serfling, R.J., Approximation Theorems of Mathematical Statistics, Wiley Series in Probability and Mathematical Statistics, John Wiley and Sons, New York (1980).
(12). Ferguson, T.S., A Course in Large Sample Theory, Chapman and Hall, New York (1996).
(13). Wald, A., Note on the Consistency of the Maximum Likelihood Estimate, Annals of Mathematical Statistics, Vol. 20, No. 4, pp 595-601 (1949).