Tolerance Intervals for Hypergeometric and Negative Hypergeometric Variables


Sankhyā: The Indian Journal of Statistics, 2015, Volume 77-B, Part 1. © 2014, Indian Statistical Institute.

Derek S. Young
University of Kentucky, Lexington, USA

Abstract

Tolerance intervals for discrete variables are widely used, especially in industrial applications. However, there is no thorough treatment of tolerance intervals when sampling without replacement. This paper proposes methods for constructing one-sided tolerance limits and two-sided tolerance intervals for hypergeometric and negative hypergeometric variables. Equal-tailed tolerance intervals (i.e., tolerance intervals that control the percentages in both tails) are studied, followed by a small adjustment to the nominal coverage level to obtain tolerance intervals that control a specified inner percentage of the sampled distribution. The tolerance interval calculations implicitly use confidence bounds for $M$, the unknown number of elements possessing a certain attribute in the finite population of size $N$. Three different methods for obtaining such confidence bounds are suggested: a large sample approach, an approach with a continuity correction, and an exact method based on nonrandomization. The intervals are examined for desirable coverage probabilities and expected widths. The methods are also illustrated using some examples.

AMS (2000) subject classification. Primary 62F25; Secondary 62F03.

Keywords and phrases. Acceptance sampling, coverage probability, exact confidence bounds, expected width, monotone likelihood ratio property, tolerance package.

1 Introduction

A statistical tolerance interval is an interval that is expected to contain at least a certain proportion of the sampled population ($P$) with a specified confidence level ($1-\alpha$). Tolerance intervals are important for applications in quality control, engineering, and the pharmaceutical industry.
See the texts by Hahn and Meeker (1991) and Krishnamoorthy and Mathew (2009) for examples.

Electronic supplementary material: The online version of this article (doi: /s ) contains supplementary material, which is available to authorized users.

Like confidence and prediction intervals, (approximate) tolerance intervals are available for numerous continuous and discrete distributions, regression models, and some multivariate settings. The literature on constructing tolerance intervals for continuous distributions is extensive and dates back to the seminal works of Wilks (1941, 1942). However, there is considerably less literature on tolerance intervals for discrete distributions, and the majority of such works focus on the binomial, Poisson, and negative binomial distributions. One of the earliest works is Zacks (1970), who developed uniformly most accurate (UMA) upper tolerance limits for discrete distributions that possess the monotone likelihood ratio property. While the work of Zacks (1970) is firmly rooted in a theoretical framework, Hahn and Chandra (1981) took a more pragmatic approach to the problem so that practitioners could easily calculate tolerance limits (intervals) for binomial and Poisson random variables. Zaslavsky (2007) provided numerous examples of calculating discrete tolerance limits for clinical trials having dichotomous outcomes. Mathew and Young (2013) used a fiducial approach to construct tolerance intervals for functions of discrete random variables. Using the framework of Hahn and Chandra (1981), Young (2014) provided an extensive simulation study of different methods for computing negative binomial tolerance intervals.

The approaches for constructing tolerance intervals for the above discrete distributions are often conservative. However, methods to improve their coverage probabilities have been investigated. Wang and Tsung (2009) proposed a coverage-adjustment procedure to compute the exact minimum and average coverage probabilities of binomial and Poisson tolerance intervals. Krishnamoorthy et al.
(2011) applied this coverage adjustment to obtain two-sided binomial and Poisson tolerance intervals that do not necessarily control the percentages in both tails. Cai and Wang (2009) utilized a probability-matching technique to construct tolerance intervals for any distribution belonging to the natural discrete exponential family having a quadratic variance function. Their approach involves a two-term Edgeworth expansion that re-centers the tolerance interval by causing the first-order smoothing term (i.e., first-order probability matching) or the first-order and second-order smoothing terms (i.e., second-order probability matching) to vanish. This removes systematic bias and results in better coverage probabilities. The above references highlight most of the work that has been done regarding tolerance limits for the binomial, Poisson, and negative binomial distributions. However, there is little work that properly treats the tolerance

interval problem when sampling without replacement. Eichenberger et al. (2011) presented some work relevant to this topic for sample size determination for social surveys, but it relied on a normal approximation and the notions of size resolution and difference resolution. We provide a rigorous development of tolerance intervals for hypergeometric and negative hypergeometric variables and highlight the corresponding sampling schemes to which they are applicable. For example, the hypergeometric distribution is often used when constructing attribute acceptance sampling plans (see Chapter 15 of Montgomery (2013)). Meanwhile, a negative hypergeometric distribution is often used when constructing an attribute inverse sampling plan to determine the total number of samples to draw (without replacement) in order to observe a specified number of the attribute of interest. Acceptance limits for either plan can be framed as tolerance limits for the respective distribution.

We note that the negative hypergeometric distribution is, in general, less studied in the literature than the hypergeometric distribution, especially regarding the computation of statistical intervals. See Zhang and Johnson (2011) for a discussion of approximate negative hypergeometric confidence intervals. In the literature, the negative hypergeometric distribution is also referred to as the inverse hypergeometric distribution (Guenther, 1975) and the hypergeometric waiting-time distribution (Johnson et al., 2005).

It is helpful to briefly comment on some of the notation used throughout this paper. Specifically, we will use $Y$ and $Z$ to denote hypergeometric and negative hypergeometric random variables, respectively. $X$ will denote a general random variable that will often be treated as a discrete random variable.
Moreover, these random variables will be superscripted with an asterisk (e.g., $X^*$) to indicate when another random variable is drawn independently from the same distributional family, but perhaps with different fixed parameter values.

This paper is organized as follows. In Section 2, we describe the general set-up for constructing one-sided tolerance limits and equal-tailed tolerance intervals for discrete distributions following Zacks (1970) and Hahn and Chandra (1981). The method implicitly uses confidence intervals for the parameter of interest. In Section 3, we outline large sample and exact methods for constructing confidence intervals and tolerance intervals for both the hypergeometric and negative hypergeometric distributions. We utilize the theory presented in Lehmann and Romano (2005) and Wright (1997) regarding exact randomized one-sided confidence bounds for hypergeometric distributions, which are known to be uniformly most accurate (UMA). We further leverage the monotone likelihood ratio property of the hypergeometric and negative hypergeometric distributions combined with the theory presented

in Zacks (1970) to develop exact tolerance limits for these two distributions. All two-sided tolerance intervals that we present are conservative; however, we apply a coverage adjustment using a criterion suggested in Krishnamoorthy et al. (2011) for the binomial and Poisson settings, which is based on the methodology of Wang and Tsung (2009). In Section 4, we compare the performance of these different intervals. In Section 5, we present examples of hypergeometric and negative hypergeometric tolerance intervals as well as the corresponding functions that are available in the tolerance package (Young, 2010) for the R programming language (R Development Core Team, 2013). Finally, we end with a brief discussion in Section 6.

2 Tolerance Intervals for Discrete Distributions

Let $X$ be a discrete random variable with cumulative distribution $F(\cdot;\theta,n)$, where $\theta$ is the parameter of interest and $n$ is a known size parameter (e.g., the number of draws from a hypergeometric distribution or the target number of successes to draw from a negative hypergeometric distribution). Let $X^*$ follow the same distribution independent of $X$, but with size $m$, which may or may not be equal to $n$. A $(1-\alpha, P)$ tolerance interval $[L(X), U(X)]$ is constructed so that

$$\begin{aligned}
\Pr_X\{\Pr_{X^*}\{L(X) \le X^* \le U(X) \mid X\} \ge P\}
&= \Pr_X\{(\Pr_{X^*}\{X^* \le U(X) \mid X\} - \Pr_{X^*}\{X^* \le L(X)-1 \mid X\}) \ge P\} \\
&= \Pr_X\{F(U(X);\theta,m) - F(L(X)-1;\theta,m) \ge P\} \ge 1-\alpha.
\end{aligned} \tag{1}$$

Analogously, a lower $(1-\alpha, P)$ tolerance bound requires finding the largest integer $L_1(X)$ such that

$$\Pr_X\{1 - F(L_1(X)-1;\theta,m) \ge P\} \ge 1-\alpha, \tag{2}$$

while an upper $(1-\alpha, P)$ tolerance bound requires finding the smallest integer $U_1(X)$ such that

$$\Pr_X\{F(U_1(X);\theta,m) \ge P\} \ge 1-\alpha. \tag{3}$$

In the above, $(1-\alpha)$ is the confidence level and $P$ (called the content) is the proportion of the sampled population that we wish to capture.
Note that because of the discrete nature of the problem, the coverage probability requirements above are at least $(1-\alpha)$, whereas in the continuous setting they would be equalities. Moreover, we must carefully

handle complementary probability statements when finding the bounds; i.e., $\Pr(X < x_0) = \Pr(X \le x_0 - 1)$ for integer-valued $x_0$.

Suppose $x$ is an observed value of our random variable $X$ and that $F(\cdot;\theta,n)$ is monotonic in $\theta$. The method presented in Hahn and Chandra (1981) (henceforth referred to as the "Hahn-Chandra method") for constructing equal-tailed tolerance intervals for discrete distributions can be summarized in two steps:

1. Based on the observed value $x$, construct a two-sided $100(1-\alpha)\%$ confidence interval for $\theta$, $(\theta_{L;\alpha}(x,n), \theta_{U;\alpha}(x,n))$.

2. For a future sample size $m$, find the maximum integer $L(x)$ and the minimum integer $U(x)$ such that

$$1 - F(L(x)-1;\theta_{L;\alpha}(x,n),m) \ge \frac{1+P}{2} \tag{4}$$

and

$$F(U(x);\theta_{U;\alpha}(x,n),m) \ge \frac{1+P}{2}, \tag{5}$$

respectively.

The above can easily be modified to obtain one-sided tolerance limits. Specifically, we would replace the confidence interval in Step 1 by the necessary one-sided $100(1-\alpha)\%$ confidence bound for $\theta$, say $\theta_{L_1;\alpha}(x,n)$ or $\theta_{U_1;\alpha}(x,n)$, and then replace $(1+P)/2$ in Step 2 by $P$.

The Hahn-Chandra method as outlined is a two-sided setting that results in equal-tailed tolerance intervals because we are controlling the percentages in both tails. However, one can control some inner percentage of the distribution by using a slightly different criterion for Step 2. Namely,

2a. For a future sample size $m$, find integers $U(x) > L(x)$ such that

$$F(U(x);\theta_{U;\alpha}(x,n),m) - F(L(x)-1;\theta_{L;\alpha}(x,n),m) \ge P. \tag{6}$$

Different criteria are employed depending on whether the objective is to control the percentages in both tails or to control some inner (possibly central) percentage. This also impacts how the coverage probabilities of the resulting tolerance intervals are calculated. These differences will be discussed in the following subsection.
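As a rough illustration of Step 2, the search for the limits in (4) and (5) can be sketched in Python. This is only a sketch under stated assumptions: the binomial distribution stands in for the generic $F(\cdot;\theta,m)$, the confidence limits 0.05 and 0.15 are hypothetical values standing in for the output of Step 1, and the function names are illustrative rather than taken from any package.

```python
from math import comb

def binom_cdf(x, p, m):
    """F(x; p, m) for a binomial(m, p) variable (stand-in for the generic F)."""
    if x < 0:
        return 0.0
    if x >= m:
        return 1.0
    return sum(comb(m, j) * p**j * (1 - p)**(m - j) for j in range(x + 1))

def equal_tailed_limits(cdf, theta_lo, theta_hi, m, P):
    """Step 2: largest L satisfying (4) and smallest U satisfying (5),
    searched over the integers 0..m."""
    target = (1 + P) / 2
    L = max(l for l in range(m + 1) if 1 - cdf(l - 1, theta_lo, m) >= target)
    U = min(u for u in range(m + 1) if cdf(u, theta_hi, m) >= target)
    return L, U

# Hypothetical Step 1 output: a confidence interval (0.05, 0.15) for theta.
L, U = equal_tailed_limits(binom_cdf, 0.05, 0.15, m=50, P=0.90)
print(L, U)
```

For a one-sided limit, the same search would be run with the appropriate one-sided confidence bound and with `target` set to $P$ instead of $(1+P)/2$.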

Clearly the method used to construct the confidence interval in Step 1 above will impact the performance of the resulting tolerance interval. As Krishnamoorthy et al. (2011) point out, coverage properties of one-sided tolerance limits are typically similar to those of the confidence intervals that were used to construct them. However, coverage properties of two-sided tolerance intervals often warrant further investigation to gauge their degree of conservatism. Regardless, we will investigate coverage properties for both one-sided tolerance limits and two-sided tolerance intervals for hypergeometric and negative hypergeometric variables. Besides coverage probabilities, expected widths are also important in evaluating the performance of tolerance intervals and statistical intervals in general. See Brown et al. (2001) for such comparisons regarding confidence interval estimation of binomial proportions. We now provide the general formulas for coverage probabilities and expected widths in the discrete set-up.

2.1. Performance Measures. In this section, we will define the coverage probabilities and expected widths of discrete two-sided tolerance intervals. It is straightforward to alter the formulas for assessing the performance of one-sided upper and one-sided lower tolerance limits.

The coverage probability of a discrete tolerance interval is the probability that the calculated tolerance interval captures at least a proportion $P$ of the sampled population. Letting $p_X(\cdot;\theta,n)$ denote the probability mass function of $X$, the coverage probability of a $(1-\alpha, P)$ tolerance interval $[L(X;\theta_{L;\alpha}(X,n)), U(X;\theta_{U;\alpha}(X,n))]$ controlling for some inner $P \times 100\%$ of the sampled population is given by

$$\begin{aligned}
&\Pr_X\{\Pr_{X^*}\{L(X;\theta_{L;\alpha}(X,n)) \le X^* \le U(X;\theta_{U;\alpha}(X,n)) \mid X\} \ge P\} \\
&\qquad = \sum_{t=0}^{n} p_X(t;\theta,n)\, I\Bigg\{\sum_{x^*=L(t;\theta_{L;\alpha}(t,n))}^{U(t;\theta_{U;\alpha}(t,n))} p_{X^*}(x^*;\theta,m) \ge P\Bigg\},
\end{aligned} \tag{7}$$

where $I\{\cdot\}$ is the indicator function.
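The sum in (7) can be evaluated by direct enumeration over the possible observed values $t$. The following Python sketch assumes a binomial mass function as a stand-in for $p_X$, and the rule `limits(t)` is a hypothetical placeholder for whichever tolerance interval procedure is being assessed; neither is prescribed by the text.

```python
from math import comb

def binom_pmf(x, p, n):
    """Binomial(n, p) mass function, used here as a stand-in for p_X."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def coverage(theta, n, m, P, limits):
    """Evaluate (7): weight each observable t by its probability and count it
    when the interval limits(t) = (L, U) holds content at least P."""
    total = 0.0
    for t in range(n + 1):
        L, U = limits(t)
        content = sum(binom_pmf(x, theta, m) for x in range(L, U + 1))
        total += binom_pmf(t, theta, n) * (content >= P)
    return total

# Hypothetical interval rule: observed value plus or minus 4, clipped to 0..20.
cov = coverage(0.3, 20, 20, 0.90, lambda t: (max(t - 4, 0), min(t + 4, 20)))
print(cov)
```

An interval that always spans the full support has coverage 1 under this calculation, while an interval that never holds content $P$ has coverage 0, which gives two quick sanity checks for any implementation.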
Moreover, the expected width of this interval is given by

$$\sum_{t=0}^{n} \big(U(t;\theta_{U;\alpha}(t,n)) - L(t;\theta_{L;\alpha}(t,n))\big)\, p_X(t;\theta,n). \tag{8}$$

Due to the more stringent requirement for tolerance intervals when controlling for the tails (Step 2) compared to controlling for some inner percentage (Step 2a), the coverage probability of an equal-tailed tolerance interval must be larger than the true coverage probability. Hence, the coverage

probability of a $(1-\alpha, P)$ equal-tailed tolerance interval $[L_e(X;\theta_{L;\alpha}(X,n)), U_e(X;\theta_{U;\alpha}(X,n))]$ is given by

$$\sum_{t=0}^{n} p_X(t;\theta,n)\, I\Big\{\big\{L_e(t;\theta_{L;\alpha}(t,n)) \le Q((1-P)/2;\theta,m)\big\} \cap \big\{U_e(t;\theta_{U;\alpha}(t,n)) \ge Q((1+P)/2;\theta,m)\big\}\Big\}, \tag{9}$$

where $Q(\cdot;\theta,m)$ is the quantile function for the distribution characterized by $F(\cdot;\theta,m)$. Note that the above expression is the probability with respect to $F(\cdot;\theta,n)$ that the equal-tailed tolerance interval includes the $(1-P)/2$th and $(1+P)/2$th quantiles of the distribution characterized by $F(\cdot;\theta,m)$. Analogous to the setting when controlling for an inner percentage, the expected width of the equal-tailed tolerance interval is given by

$$\sum_{t=0}^{n} \big(U_e(t;\theta_{U;\alpha}(t,n)) - L_e(t;\theta_{L;\alpha}(t,n))\big)\, p_X(t;\theta,n). \tag{10}$$

3 Hypergeometric and Negative Hypergeometric Tolerance Intervals

Consider a finite universe of $N$ elements with an unknown number $M$ having a particular attribute of interest, where $0 \le M \le N$. When sampling $n$ items without replacement from this universe, we let the random variable $Y$ denote the number of elements in the sample that possess the attribute of interest. Thus, the random variable $Y$ follows a hypergeometric distribution with probability mass function

$$p_Y(y;M,n) = \begin{cases} \dfrac{\binom{M}{y}\binom{N-M}{n-y}}{\binom{N}{n}}, & \text{for } \max\{n-N+M, 0\} \le y \le \min\{M, n\} \\[1ex] 0, & \text{otherwise,} \end{cases} \tag{11}$$

which we write as $Y \sim \mathrm{Hyp}(n, N, M)$.

Suppose that we again sample without replacement, but until a specified number of elements $k$, $1 \le k \le M$, having the attribute of interest is observed. We let the random variable $Z$ denote the number of elements drawn

until $k$ successes are observed. We then say that the random variable $Z$ follows a negative hypergeometric distribution with probability mass function

$$p_Z(z;M,k) = \begin{cases} \dfrac{\binom{z-1}{k-1}\binom{N-z}{M-k}}{\binom{N}{M}}, & \text{for } 0 < k \le z \le N-M+k \\[1ex] 0, & \text{otherwise,} \end{cases} \tag{12}$$

which we write as $Z \sim \mathrm{NegHyp}(k, N, M)$. Note that the relationship between the negative hypergeometric and the hypergeometric is similar to that between the negative binomial and the binomial; see Miller and Fridell (2007) for further discussion. Also, since $N$ and $M$ have the same meaning for each distribution, we use the same symbols for both rather than introduce additional notation.

For each distribution, the value $M$ is the unknown parameter of interest. Thus, confidence intervals for $M$ are necessary to employ the Hahn-Chandra method. For the hypergeometric distribution, let $[M_{L;\alpha}(y;n), M_{U;\alpha}(y;n)]$ denote a two-sided $100(1-\alpha)\%$ confidence interval for $M$. For a future sample size $m$, we find the appropriate integers $L(y)$ and $U(y)$ that satisfy the requirements in Step 2 or Step 2a, which are the limits for a two-sided $(1-\alpha, P)$ hypergeometric tolerance interval. Similarly, for the negative hypergeometric distribution, let $[M_{L;\alpha}(z;k), M_{U;\alpha}(z;k)]$ denote a two-sided $100(1-\alpha)\%$ confidence interval for $M$. For a future number of target successes $l$, we also find the appropriate integers $L(z)$ and $U(z)$, which are the limits for a two-sided $(1-\alpha, P)$ negative hypergeometric tolerance interval. Without loss of generality, we will simply assume that $m = n$ and $l = k$ for the remainder of our discussion, as this does not affect the overall results from the comparative study.

3.1. Large Sample Intervals. Confidence intervals based on large sample theory can be constructed for both the hypergeometric and negative hypergeometric distributions.
To do this, we utilize the fact that the binomial and negative binomial distributions can be used as large sample approximations for the hypergeometric and negative hypergeometric distributions, respectively; see Miller and Fridell (2007) for discussion. Moreover, it is easier to first re-parameterize so that we can apply standard Wald-type confidence intervals.

When using the binomial approximation to the hypergeometric distribution, $\hat{p} = y/n$ is the maximum likelihood estimate for the proportion $p$ of elements possessing the attribute of interest. Moreover, $\mathrm{Var}(\hat{p}) = p(1-p)/n$.

Then an estimate for $M$ is $[N\hat{p}]$, where $[\cdot]$ is the nearest integer function, and a standard large sample $100(1-\alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm q_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}, \tag{13}$$

where $q_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution. While there are many other confidence intervals for a binomial proportion (see, for example, Newcombe (1998) and Brown et al. (2001)), our large sample approach will be based on (13). When incorporating a finite population correction factor, we get the following large sample $100(1-\alpha)\%$ confidence interval for $M$:

$$\left[\left\lfloor N\left(\hat{p} - q_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\,\frac{N-n}{N-1}}\right)\right\rfloor,\ \left\lceil N\left(\hat{p} + q_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\,\frac{N-n}{N-1}}\right)\right\rceil\right], \tag{14}$$

where $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ are the floor and ceiling functions, respectively. Moreover, we can modify the above large sample interval by employing a continuity correction, which results in the following $100(1-\alpha)\%$ confidence interval for $M$:

$$\left[\left\lfloor N\left(\hat{p} - q_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\,\frac{N-n}{N-1}} - \frac{1}{2n}\right)\right\rfloor,\ \left\lceil N\left(\hat{p} + q_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}\,\frac{N-n}{N-1}} + \frac{1}{2n}\right)\right\rceil\right]. \tag{15}$$

When using the negative binomial approximation to the negative hypergeometric distribution, $\hat{\nu} = k/z$ is the maximum likelihood estimate for the proportion $\nu$ of elements possessing the attribute of interest. Moreover, $\mathrm{Var}(\hat{\nu}) = \nu^2(1-\nu)/z$. Then an estimate for $M$ is $[N\hat{\nu}]$ and a large sample $100(1-\alpha)\%$ confidence interval for $\nu$ is

$$\hat{\nu} \pm q_{1-\alpha/2}\sqrt{\hat{\nu}^2(1-\hat{\nu})/z}. \tag{16}$$

Again, many other confidence intervals are available for a negative binomial proportion (see, for example, Tian et al. (2009) and Young (2014)), but our large sample approach will be based on (16). Like the hypergeometric setting, we can incorporate a finite population correction factor to get the following large sample $100(1-\alpha)\%$ confidence interval for $M$:

$$\left[\left\lfloor N\left(\hat{\nu} - q_{1-\alpha/2}\sqrt{\frac{\hat{\nu}^2(1-\hat{\nu})}{z}\,\frac{N-z}{N-1}}\right)\right\rfloor,\ \left\lceil N\left(\hat{\nu} + q_{1-\alpha/2}\sqrt{\frac{\hat{\nu}^2(1-\hat{\nu})}{z}\,\frac{N-z}{N-1}}\right)\right\rceil\right]. \tag{17}$$
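The intervals (14) and (17) are simple enough to compute directly. The Python sketch below is one possible reading of those two formulas, not the implementation in the tolerance package. The function names are illustrative, and clipping the limits to $[0, N]$ is an added assumption that does not appear in (14) or (17). The hypergeometric call uses $y = 15$, $n = 1000$, $N = 5000$, the data of Example 1 in Section 5; the negative hypergeometric inputs are hypothetical.

```python
from math import ceil, floor, sqrt
from statistics import NormalDist

def hyper_ls_ci(y, n, N, alpha=0.05):
    """Large sample interval (14) for M with finite population correction."""
    p_hat = y / n
    q = NormalDist().inv_cdf(1 - alpha / 2)
    half = q * sqrt(p_hat * (1 - p_hat) / n * (N - n) / (N - 1))
    lo, hi = floor(N * (p_hat - half)), ceil(N * (p_hat + half))
    return max(lo, 0), min(hi, N)  # clipping to [0, N] is an assumption

def neghyper_ls_ci(z, k, N, alpha=0.05):
    """Large sample interval (17) for M with finite population correction."""
    nu_hat = k / z
    q = NormalDist().inv_cdf(1 - alpha / 2)
    half = q * sqrt(nu_hat**2 * (1 - nu_hat) / z * (N - z) / (N - 1))
    lo, hi = floor(N * (nu_hat - half)), ceil(N * (nu_hat + half))
    return max(lo, 0), min(hi, N)  # clipping to [0, N] is an assumption

print(hyper_ls_ci(15, 1000, 5000))    # (41, 109)
print(neghyper_ls_ci(80, 50, 500))    # hypothetical z = 80, k = 50, N = 500
```

The continuity-corrected versions would subtract and add the extra $1/(2n)$ or $1/(2z)$ term inside the parentheses before multiplying by $N$.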

Moreover, the $100(1-\alpha)\%$ large sample confidence interval for $M$ with a continuity correction is

$$\left[\left\lfloor N\left(\hat{\nu} - q_{1-\alpha/2}\sqrt{\frac{\hat{\nu}^2(1-\hat{\nu})}{z}\,\frac{N-z}{N-1}} - \frac{1}{2z}\right)\right\rfloor,\ \left\lceil N\left(\hat{\nu} + q_{1-\alpha/2}\sqrt{\frac{\hat{\nu}^2(1-\hat{\nu})}{z}\,\frac{N-z}{N-1}} + \frac{1}{2z}\right)\right\rceil\right]. \tag{19}$$

Using any of the above set-ups, we can then plug the respective confidence interval into Step 2 of the Hahn-Chandra method to obtain the appropriate $(1-\alpha, P)$ equal-tailed tolerance interval. Moreover, it is easy to obtain one-sided confidence limits for $M$ and construct one-sided $(1-\alpha, P)$ tolerance limits similarly.

3.2. Exact Intervals. In this section, we show how to obtain UMA (or exact) one-sided upper $(1-\alpha, P)$ tolerance limits. The construction of one-sided lower tolerance limits is completely analogous, and so it is enough to consider the one-sided upper setting. We will then use these limits to compute an exact-based two-sided tolerance interval.

Consider again the discrete random variable $X$ and the corresponding distribution function $F(x;\theta,n)$. Suppose that we are interested in the hypothesis test

$$H_0: \theta = \theta_0 \quad \text{versus} \quad H_A: \theta < \theta_0. \tag{20}$$

For size $\alpha$ tests involving discrete distributions, it is usually not possible to choose a critical region consisting of realizations that yield test statistics of size exactly $\alpha$. However, the theory in Chapter 3 of Lehmann and Romano (2005) shows how any such randomized test based on $X$ has the representation of a nonrandomized test based on $X$ and an independent standard uniform random variable $U$. Specifically, the statistic $T = X + U$ is equivalent to the pair $(X, U)$, since with probability 1, $X = \lfloor T \rfloor$ and $U = T - \lfloor T \rfloor$. Thus, the distribution of $T$ is continuous and confidence bounds can be based on this statistic. Wright (1997) applied this approach when studying both randomized and nonrandomized $100(1-\alpha)\%$ confidence bounds when attributes are rare in finite universes.
Now, define $\mathcal{P}$ as the family of distribution functions $F(x;\theta,n)$, where $\theta \in \Theta$; i.e., $\mathcal{P} = \{F(x;\theta,n) : \theta \in \Theta\}$. Assume that $\mathcal{P}$ is a monotone likelihood ratio family, which means for each $\theta' < \theta$, $p(x;\theta,n)/p(x;\theta',n)$ is

non-decreasing in $x$. We now state a definition and theorem, both due to Zacks (1970):

Definition 1. An upper $(1-\alpha, P)$ tolerance limit is UMA if, subject to Equation (3), it has the optimum property at all $\theta \in \Theta$ that $\Pr_X\{U_1(X) \ge F^{-1}(P;\theta',m)\}$ is at a minimum for all $\theta'$ such that $F^{-1}(P;\theta',m) > F^{-1}(P;\theta,m)$.

Note that in Zacks (1970), $m$ is simply taken to be $n$ and suppressed from the formulas. Using the representation of $T$ defined above (here $R$ denotes the independent standard uniform variable in that representation), we can state the following theorem:

Theorem 1. If $\mathcal{P} = \{F(x;\theta,n) : \theta \in \Theta\}$ is a monotone likelihood ratio family in $x$ and if $\theta_{U_1;\alpha}(X+R,n)$ is a UMA upper confidence limit for $\theta$ at confidence level $(1-\alpha)$, then

$$U_1(X;\theta_{U_1;\alpha}(X+R,n)) = F^{-1}(P;\theta_{U_1;\alpha}(X+R,n),m) \tag{21}$$

is a UMA upper $(1-\alpha, P)$ tolerance limit for $\mathcal{P}$.

See Zacks (1970) for the proof. Letting $L_1(X;\theta_{L_1;\alpha}(X+R,n))$ denote the analogous UMA one-sided lower $(1-\alpha, P)$ tolerance limit, we can then simply define an exact-based two-sided equal-tailed tolerance interval $[L(X;\theta_{L;\alpha}(X+R,n)), U(X;\theta_{U;\alpha}(X+R,n))]$ as

$$[L_1(X;\theta_{L_1;\alpha/2}(X+R,n)),\ U_1(X;\theta_{U_1;\alpha/2}(X+R,n))]. \tag{22}$$

The above theory is developed in the context of randomized bounds. However, nonrandomized bounds are still exact and just as efficacious for our discussion. Thus, we proceed in obtaining nonrandomized bounds and avoid additional computational complexities, such as with the algorithm in Wright (1997).

We first note that both the hypergeometric and negative hypergeometric distributions possess the monotone likelihood ratio property (see Appendix A). For the hypergeometric distribution, it is easy to construct a UMA upper confidence bound by defining

$$C_{UH}(y) = \{M : \Pr_M\{Y \le y\} > \alpha\}, \qquad \hat{M}_{UH}(y;\alpha) = \arg\max_{M}\{\Pr_M\{Y \le y\} > \alpha\},$$

where $\hat{M}_{UH}(y;\alpha)$ is an exact nonrandomized $100(1-\alpha)\%$ upper confidence bound for $M$. Moreover, a UMA lower confidence bound is found by defining

$$C_{LH}(y) = \{M : \Pr_M\{Y \ge y\} > \alpha\}, \qquad \hat{M}_{LH}(y;\alpha) = \arg\min_{M}\{\Pr_M\{Y \ge y\} > \alpha\},$$

where $\hat{M}_{LH}(y;\alpha)$ is an exact nonrandomized $100(1-\alpha)\%$ lower confidence bound for $M$. Hence, $[\hat{M}_{LH}(y;\alpha/2), \hat{M}_{UH}(y;\alpha/2)]$ would be an exact-based nonrandomized $100(1-\alpha)\%$ confidence interval for $M$.

For the negative hypergeometric distribution, we proceed in a similar manner. To construct a UMA upper confidence bound, define

$$C_{UNH}(z) = \{M : \Pr_M\{Z \ge z\} > \alpha\}, \qquad \hat{M}_{UNH}(z;\alpha) = \arg\max_{M}\{\Pr_M\{Z \ge z\} > \alpha\},$$

where $\hat{M}_{UNH}(z;\alpha)$ is an exact nonrandomized $100(1-\alpha)\%$ upper confidence bound for $M$. Moreover, a UMA lower confidence bound is found by defining

$$C_{LNH}(z) = \{M : \Pr_M\{Z \le z\} > \alpha\}, \qquad \hat{M}_{LNH}(z;\alpha) = \arg\min_{M}\{\Pr_M\{Z \le z\} > \alpha\},$$

where $\hat{M}_{LNH}(z;\alpha)$ is an exact nonrandomized $100(1-\alpha)\%$ lower confidence bound for $M$. Hence, $[\hat{M}_{LNH}(z;\alpha/2), \hat{M}_{UNH}(z;\alpha/2)]$ would be an exact-based nonrandomized $100(1-\alpha)\%$ confidence interval for $M$. As stated at the end of Section 3.1, we can proceed to use any of the above confidence limits (intervals) to compute the appropriate $(1-\alpha, P)$ tolerance limits (intervals).

3.3. Minimum Coverage Two-Sided Tolerance Intervals. As Krishnamoorthy et al. (2011) point out, it is often unnecessary to control the tail percentages in practical applications (e.g., discrete quality assessment). Instead, controlling for some inner percentage of the sampled population would suffice. This amounts to Step 2a in the Hahn-Chandra method. For the binomial and Poisson distributions, Wang and Tsung (2009) provided a numerical approach to find the value $\alpha'$ so that an exact $(1-\alpha', P)$ tolerance interval will have minimum coverage probability close to the nominal level. Empirical work in Krishnamoorthy et al. (2011) regarding the binomial and Poisson distributions found that using $(1-2\alpha, P)$ equal-tailed tolerance intervals yielded (minimum) coverage probabilities close to the nominal level of $(1-\alpha)$. In the hypergeometric and negative hypergeometric settings, we do not develop a formal numerical approach like that in Wang and Tsung (2009) for determining $\alpha'$.
We are simply applying what Krishnamoorthy et al. (2011) suggested for $\alpha'$ in the binomial and Poisson settings. This adjustment is incorporated into our performance study in the next section. Also, to avoid confusing the "exact" terminology of Wang and Tsung (2009) with the "exact" terminology of the intervals in the previous subsection, we will henceforth refer to the former as a coverage-adjusted approach.
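To make the exact construction of this section concrete for the hypergeometric case, the sketch below searches for the nonrandomized confidence bounds on $M$ by brute force and then applies Step 2 of the Hahn-Chandra method. It is an illustrative Python sketch, not the code behind the tolerance package: the function names are made up, the direction of each inequality follows the usual reading of the bounds ($\Pr_M\{Y \le y\} > \alpha$ for the upper bound and $\Pr_M\{Y \ge y\} > \alpha$ for the lower), and the `adjust` flag implements the $(1-2\alpha, P)$ equal-tailed adjustment by using $\alpha$ rather than $\alpha/2$ for each bound. The inputs $N = 100$, $n = m = 20$, $y = 3$ are hypothetical.

```python
from math import comb

def hyper_pmf(y, M, n, N):
    """Hypergeometric mass function (11)."""
    if y < max(n - N + M, 0) or y > min(M, n):
        return 0.0
    return comb(M, y) * comb(N - M, n - y) / comb(N, n)

def hyper_cdf(y, M, n, N):
    """Pr_M{Y <= y}, with 0 returned below the support."""
    if y < 0:
        return 0.0
    return sum(hyper_pmf(j, M, n, N) for j in range(min(y, n) + 1))

def exact_bounds(y, n, N, alpha):
    """Nonrandomized lower and upper confidence bounds for M, each at level alpha."""
    upper = max(M for M in range(N + 1) if hyper_cdf(y, M, n, N) > alpha)
    lower = min(M for M in range(N + 1) if 1 - hyper_cdf(y - 1, M, n, N) > alpha)
    return lower, upper

def tolerance_interval(y, n, N, m, alpha, P, adjust=False):
    """Exact-based equal-tailed interval; adjust=True gives the (1 - 2*alpha, P) version."""
    a = alpha if adjust else alpha / 2
    M_lo, M_hi = exact_bounds(y, n, N, a)
    target = (1 + P) / 2
    L = max(l for l in range(m + 1) if 1 - hyper_cdf(l - 1, M_lo, m, N) >= target)
    U = min(u for u in range(m + 1) if hyper_cdf(u, M_hi, m, N) >= target)
    return L, U

M_lo, M_hi = exact_bounds(3, 20, 100, 0.025)
L, U = tolerance_interval(3, 20, 100, 20, 0.05, 0.90)
L_adj, U_adj = tolerance_interval(3, 20, 100, 20, 0.05, 0.90, adjust=True)
print((M_lo, M_hi), (L, U), (L_adj, U_adj))
```

The brute-force search over $M = 0, \ldots, N$ is adequate for small populations; for large $N$ a bisection over $M$ would be a natural refinement, since the tail probabilities are monotone in $M$. The negative hypergeometric case would follow the same pattern with (12) in place of (11) and the inequalities on $Z$ reversed.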

4 Performance Comparisons

For comparing the performance of the methods discussed in Section 3, we considered the following conditions. For the hypergeometric distribution, we considered $N \in \{100, 500\}$ and $n \in \{0.20N, 0.50N\}$. Hence, coverage probabilities and expected widths were calculated for $M = 0, 1, \ldots, N$. For the negative hypergeometric distribution, we again considered $N \in \{100, 500\}$, but with $k \in \{0.50N, 0.75N\}$. Hence, coverage probabilities and expected widths were calculated for $M = k, k+1, \ldots, N$. In practice, common values for $(1-\alpha)$ and $P$ are taken from the set $\{0.90, 0.95, 0.99\}$. We present results for (0.95, 0.90) tolerance limits and intervals for both the hypergeometric and negative hypergeometric distribution. We also ran limited studies at the (0.85, 0.90) and (0.95, 0.95) levels, which demonstrated results similar to those we present for the (0.95, 0.90) setting.

The figures we present for assessing the coverage probabilities and expected widths have some common characteristics that are worth noting. First is the oscillatory behavior that occurs with the coverage probabilities. This is typical for distributions that have a lattice structure, like the hypergeometric and negative hypergeometric. This has also been noted in other studies on statistical intervals for discrete distributions, such as in Brown et al. (2001), Cai and Wang (2009), and Young (2014). For our study, the oscillatory patterns become more distinct as $N$ increases, while the coverage probabilities also tend toward the nominal level for values of $M$ not near the extremes.

The expected widths for the one-sided tolerance limits were all very close to each other, regardless of the method employed. As such, we provide only a brief summary in Table 1 for the $\mathrm{Hyp}(50, 100, M)$ and $\mathrm{NegHyp}(250, 500, M)$ settings and for four select values of $M$. These results were typical regardless of the conditions.

Figure 1 shows the coverage probabilities for the one-sided hypergeometric tolerance limits.
Coverage probabilities for the large sample (LS) and exact (EX) methods appear to be similar, while the continuity correction (CC) method tends to be slightly more conservative. As $N$ increases, all three methods are closer to the nominal level (0.95); however, there are some higher values of $M$ where the EX method tends to be less conservative than the other methods. Given their similar performance with respect to expected widths, one could reasonably use either the LS or EX method, but the EX method performs slightly better for larger values of $M$.

Table 1: Some expected widths of one-sided upper (0.95, 0.90) tolerance limits for the large sample (LS), continuity correction (CC), and exact (EX) methods. [Columns: LS, CC, EX under $\mathrm{Hyp}(50, 100, M)$ and LS, CC, EX under $\mathrm{NegHyp}(250, 500, M)$.]

Figure 1: Coverage probabilities for the one-sided hypergeometric tolerance limits. The solid lines are for the LS method, the dashed lines are for the CC method, and the dotted lines are for the EX method. [Panels: (a) N = 100, n = 20; (b) N = 100, n = 50; (c) N = 500, n = 100; (d) N = 500, n = 250.]

Figure 2 shows the coverage probabilities for the two-sided hypergeometric tolerance intervals as well as when the coverage adjustment is applied.

Figure 2: Coverage probabilities for the two-sided hypergeometric tolerance intervals. The conditions used to generate the coverage probabilities are above each figure. The solid lines are for the original calculation and the dashed lines are for the coverage adjustment. [Panels: (a) N = 100, n = 20 (Large Sample); (b) N = 500, n = 250 (Large Sample); (c) N = 100, n = 20 (Continuity Correction); (d) N = 500, n = 250 (Continuity Correction); (e) N = 100, n = 20 (Exact); (f) N = 500, n = 250 (Exact).]

For $N = 100$, we see that the EX method actually is conservative relative to the other two methods. However, for $N = 500$, it appears to be performing closer to nominal. When comparing whether or not the coverage adjustment is applied, we observe little benefit for the case of $N = 100$; however, for $N = 500$ there appears to be a trade-off when the adjustment is not applied. Namely, there appear to be intervals of $M$ where the coverage probabilities improve with the adjustment, and intervals of $M$ where they do not. However, when we look at the expected widths of each setting in Figure 3, the coverage adjustment yields uniformly narrower intervals, especially for values of $M$ away from the extremes. Hence, the LS and EX methods with a coverage adjustment appear to have the better performance.

Figure 3: Expected widths for the two-sided hypergeometric tolerance intervals. Shading for the LS method, CC method, and EX method is noted on each figure. The solid lines are for the original calculation and the dashed lines are for the coverage adjustment. [Four panels, (a) to (d), covering N = 100 with n = 20 and n = 50, and N = 500 with n = 100 and n = 250; vertical axis: Expected Width.]

Figure 4 shows the coverage probabilities for the one-sided negative hypergeometric tolerance limits. Coverage probabilities for the EX method appear to be closer to nominal relative to the other two methods. As $N$ and $k$ increase, all three methods show signs of stabilizing closer to the nominal level. As noted earlier, all three methods perform similarly with respect to expected widths, just like in the hypergeometric setting. Given these results, the EX method appears to have the better performance.

Figure 4: Coverage probabilities for the one-sided negative hypergeometric tolerance limits. The solid lines are for the LS method, the dashed lines are for the CC method, and the dotted lines are for the EX method. [Panels: (a) N = 100, k = 50; (b) N = 100, k = 75; (c) N = 500, k = 250; (d) N = 500, k = 375.]

Figure 5 shows the coverage probabilities for the two-sided negative hypergeometric tolerance intervals, with and without the coverage adjustment. For N = 100, we see that the CC and EX methods are conservative, but the LS method is much more variable about the nominal level. However, for N = 500, the EX method performs closer to nominal relative to the LS and CC methods. When comparing whether or not the coverage adjustment is applied, we see that there is usually some improvement for most values of M when it is applied. Moreover, when we look at the expected widths of each setting in Figure 6, the coverage adjustment again yields uniformly narrower intervals, especially for values of M away from the extremes. Hence, the EX method with a coverage adjustment appears to have the better performance.

Figure 5: Coverage probabilities for the two-sided negative hypergeometric tolerance intervals. Panels (a)-(f) cover N = 100, k = 50 and N = 500, k = 375 under each of the Large Sample, Continuity Correction, and Exact methods; the conditions used to generate the coverage probabilities are above each figure. The solid lines are for the original method and the dashed lines are for the coverage-adjusted method.

Figure 6: Expected widths for the two-sided negative hypergeometric tolerance intervals. Panels (a)-(d) cover N = 100, k = 50; N = 100, k = 75; N = 500, k = 250; and N = 500, k = 375. Shading for the LS method, CC method, and EX method is noted on each figure. The solid lines are for the original calculation and the dashed lines are for the coverage adjustment.

5 R Functions and Examples

The R package tolerance (Young, 2010) includes tools for estimating tolerance limits for various data structures, such as data from: continuous distributions (e.g., normal, Weibull, and Cauchy); discrete distributions (e.g., binomial, Poisson, and negative binomial); regression settings (e.g., linear regression, nonlinear regression, and nonparametric regression); and multivariate settings (e.g., multivariate normal and multivariate linear regression). The tolerance package now includes functions that compute the hypergeometric and negative hypergeometric tolerance intervals discussed in Section 3. They are the hypertol.int and neghypertol.int functions, respectively. Both functions have the same arguments, but the meaning of some of them depends on the distribution. The arguments are summarized in Table 2. Further details on these functions can be found by typing ?hypertol.int and ?neghypertol.int in R.

Table 2: Arguments for the hypertol.int and neghypertol.int functions.

Argument  hypertol.int                              neghypertol.int
x         number of units with attribute in sample  total sample drawn to achieve n units with attribute
n         size of the sample drawn                  target number of units with attribute to draw
m         size of a future sample                   future target number of units with attribute to draw
N         population size (both functions)
alpha     level of the test (both functions)
P         the content (both functions)
side      numeric argument taking 1 or 2 for one-sided limits or a two-sided interval (both functions)
method    an argument to specify the large sample method ("LS"), the continuity correction method ("CC"), or the exact method ("EX") (both functions)

Example 1. (One-Sided Hypergeometric Tolerance Limits) Consider a manufacturing setting where a company needs to purchase plastic fasteners in bulk. The company receives N = 5000 fasteners in a given lot. When sampling n = 1000 fasteners without replacement, the company found y = 15 defective units. Due to the cost of the sampling procedure, the company will reduce future inspections to m = 100 fasteners sampled without replacement. They further need to specify a one-sided upper (0.95, 0.99) tolerance limit based on this information for determining when to accept or reject the lot of fasteners. This is found using the hypertol.int function as follows:

> hypertol.int(x = 15, n = 1000, N = 5000, m = 100, alpha = 0.05, P = 0.99, side = 1, method = "EX")
  alpha P rate p.hat 1-sided.lower 1-sided.upper

Thus, the company can be 95% confident that at least 99% of all lots will have no more than 6 defects in a future sample of 100 fasteners. Note that the value of y from our earlier formulas corresponds to the x argument in the R function. The above result is for the EX method. By changing the method argument, we can obtain the results for the LS and CC methods. For this example, the one-sided upper (0.95, 0.99) tolerance limit is also 6 for the LS and CC methods.

Example 2. (Hypergeometric Tolerance Intervals) Vener et al. (1993) analyzed data from grant applications submitted to the National Cancer Institute in response to a February 1993 request for applications. The applications went through a triage at the National Institutes of Health. A full committee comprised 21 members, from which 5 members formed a subcommittee to review an individual application. A total of 73 applications went through this process. Each application reviewed was assigned a competitiveness score by each member, which resulted in it being classified as competitive or noncompetitive. If at least two of the five subcommittee members voted for a grant application as competitive, then it was sent to the full committee of 21 members for further review. Otherwise, the grant application was rejected. The top half of Table 3 gives the data from this triage process. Vener et al. (1993) built a hypergeometric model to estimate probabilities of possible dispositions of grant applications as a result of this triage process. Clearly, there are some limitations and assumptions made.
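To make the triage rule concrete, the following base R sketch computes the probability that an application passes triage under a simple hypergeometric model. It assumes a hypothetical number `c_votes` of the 21 full-committee members who would vote competitive; that value and the variable names are illustrative assumptions and are not taken from Vener et al. (1993).

```r
# Sketch only: c_votes is a hypothetical value chosen for illustration.
committee <- 21   # full committee size (from the text)
subcomm   <- 5    # subcommittee size (from the text)
c_votes   <- 10   # ASSUMED number of committee members who would vote "competitive"

# X = number of competitive votes among the 5 subcommittee members, drawn
# without replacement from the committee. In stats::phyper(q, m, n, k),
# m = c_votes, n = committee - c_votes, and k = subcomm.
# The application passes triage when X >= 2, i.e. with probability 1 - P(X <= 1).
p_pass <- 1 - phyper(1, c_votes, committee - c_votes, subcomm)
print(p_pass)
```

Varying `c_votes` over 0 to 21 shows how the chance of surviving triage depends on the level of support in the full committee. The calculation treats the 21 members as interchangeable voters, which is itself a modelling assumption.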
For example, Vener et al. (1993) assumed that the reviewers were of roughly equal ability and were fairly homogeneous (i.e., in the long run they would each accept or reject the same percentage of applications). They also assumed that the full committee represented a gold standard for the review process. Moreover, there were some subsequent decision rules after triage that fed into their overall model and analysis.

Table 3: The peer review triage data of Vener et al. (1993) and the coal tit data of Ridiout (1999). Columns: # of Competitive Votes and Frequency (top half); # of Feeders Visited and Frequency (bottom half).

For the purposes of our example, we will treat the aggregated triage data as a realization from a hypergeometric distribution. A total of y = 231 competitive votes were cast by the subcommittees for the 73 proposals, which is calculated using the data in the top half of Table 3. Thus, there were n = 365 (5*73) potential competitive votes from the subcommittees. The committee (a finite population) of 21 would have the possibility of casting a total of N = 1533 (21*73) competitive votes. Using a hypergeometric distribution and clearly acknowledging the assumptions as stated in Vener et al. (1993), a (0.90, 0.90) tolerance interval is calculated as follows:

> hypertol.int(x = 231, n = 365, N = 1533, m = 21, alpha = 0.10, P = 0.90, side = 2, method = "EX")
  alpha P rate p.hat 2-sided.lower 2-sided.upper

Thus, with 90% confidence, we would expect at least 90% of the proposals reviewed by a full committee of m = 21 to have between 9 and 17 competitive votes. The above result is for the EX method, but the same tolerance interval is also obtained for the LS and CC methods.

Example 3. (One-Sided Negative Hypergeometric Tolerance Limits) Ridiout (1999) analyzed data from an experiment where the memory in coal tits (a small bird found primarily throughout temperate Eurasia) was studied. The birds were released into a room that contained four feeders, of which only one contained food that was visible to the bird.
The bird was removed from the room and then returned 15 minutes later. The feeders remained the same, but the food in the filled feeder was hidden. The number of distinct feeders visited by each bird was recorded. A total of 19 birds were in the experiment, with each bird used 5-15 times, resulting in 207 different trials. In 20 instances the bird gave up searching for the food; these were treated as censored observations. Jolliffe and Jolliffe (1997) applied the EM algorithm to incorporate the incomplete data and estimate different models for the probability distribution of the number of looks. Ridiout (1999) used a (generalized) negative hypergeometric distribution to estimate the probabilities. For our purposes, we will focus on the complete-data portion (i.e., 187 trials) of the analysis in Ridiout (1999). These data are given in the bottom half of Table 3.

There are again some limitations and assumptions made. For example, the negative hypergeometric model in Ridiout (1999) ignored the fact that all of the data arose from multiple measurements on 19 birds. There could also be, say, a learning or fatigue effect that occurs across these multiple trials. For the purposes of our example, we will treat the aggregated coal tit data as a realization from a negative hypergeometric distribution. A total of z = 306 visits were made to the feeders for the 187 (complete) trials, which is calculated using the data in the bottom half of Table 3. Since we are interested in when the coal tit arrives at the feeder with food, the total number of successes is k = 187. Thus, there were a possible N = 748 visits to feeders when aggregating across all of the birds. Using a negative hypergeometric distribution and clearly acknowledging the assumptions as stated in Ridiout (1999), we are interested in finding a one-sided upper (0.85, 0.90) tolerance limit for the total number of feeders visited by a bird; i.e., m = 1.
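The aggregated quantities above follow from simple arithmetic, sketched here in R. The variable names are illustrative assumptions; the counts are those quoted in the text.

```r
# Counts quoted in the text for the complete-data portion of the coal tit data.
trials  <- 187   # complete trials (each ends when the food is found)
feeders <- 4     # feeders in the room
z       <- 306   # total feeder visits across the complete trials

k <- trials             # one success (finding the food) per complete trial
N <- feeders * trials   # largest possible number of visits when aggregating

# Each complete trial involves between 1 and 4 distinct feeders,
# so the aggregated total must lie between k and N.
stopifnot(N == 748, k == 187, z >= k, z <= N)
```

With these quantities in hand, the quantity of interest is the one-sided upper (0.85, 0.90) tolerance limit with m = 1.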
The limit is found using the neghypertol.int function as follows:

> neghypertol.int(x = 306, n = 187, N = 748, m = 1, alpha = 0.15, P = 0.90, side = 1, method = "EX")
  alpha P rate p.hat 1-sided.lower 1-sided.upper

Thus, with 85% confidence, we would expect at least 90% of the birds to visit no more than 3 feeders in total to find the one with the food. Note that the values of k and z from our earlier formulas correspond to the n and x arguments, respectively, in the R function. The above result is for the EX method, but the one-sided upper (0.85, 0.90) tolerance limit is also 3 for the LS and CC methods.

Example 4. (Negative Hypergeometric Tolerance Intervals) We next consider the example in Zhang and Johnson (2011) for planning a sample survey that utilizes random digit dialing. Consider a sampling frame that is a list of both residential and non-residential telephone numbers. Suppose that a researcher has banks of N = 100 telephone numbers (the sampling frame), from which they randomly sample, one at a time, a sequence of telephone numbers (the primary sampling units). From a previous bank of 100, the researcher found that z = 21 calls were necessary until k = 15 residential numbers were reached. For workload planning purposes, the researcher would like to know, with 95% confidence, the total number of calls necessary to reach their target for 85% of all such lists. A (0.95, 0.85) tolerance interval is calculated as follows:

> neghypertol.int(x = 21, n = 15, N = 100, m = 15, alpha = 0.05, P = 0.85, side = 2, method = "EX")
  alpha P rate p.hat 2-sided.lower 2-sided.upper

Thus, with 95% confidence, the researcher can expect that at least 85% of all lists will require between 15 and 36 total calls to make contact with 15 residential units. The above result is for the EX method. For this example, the (0.95, 0.85) tolerance intervals for the LS and CC methods are [15, 32] and [15, 34], respectively.

6 Discussion

In this paper, we have provided a rigorous development of one-sided tolerance limits and two-sided tolerance intervals for hypergeometric and negative hypergeometric variables. The construction of such tolerance limits when sampling without replacement has not been handled in the literature. For one-sided tolerance limits and two-sided equal-tailed tolerance intervals, we applied the approach of Hahn and Chandra (1981). We also leveraged the numerical results of Krishnamoorthy et al. (2011) (based on the methodology of Wang and Tsung (2009)) to provide a coverage adjustment to the equal-tailed tolerance intervals as a way to estimate two-sided tolerance intervals that control an inner percentage of the sampled population.
The tolerance limits in all of these procedures depend on confidence intervals for M, the unknown number of elements possessing an attribute of interest in the population. We compared their performance based on three different approaches: a large sample approach, an approach with a continuity correction, and an exact method based on nonrandomization. From our comparisons, we found that the exact method typically performs better with respect to coverage probabilities and expected widths. We have also included functions for computing these tolerance limits in the R package tolerance (Young, 2010).

While we demonstrated the relative performance of the intervals between the different methods, we note that future research could be done to further improve coverage probabilities, especially for smaller N. Modifications to the probability-matching approach of Cai and Wang (2009) might be possible and could potentially improve coverage probabilities for the hypergeometric and negative hypergeometric tolerance intervals discussed here. However, we note that neither distribution belongs to the natural discrete exponential family and, hence, the approach of Cai and Wang (2009) does not directly apply. Moreover, their probability-matching approach involves fairly complex forms and is only fully developed for the one-sided setting, given the difficulty of extending the methodology to the two-sided setting. Thus, the trade-off between the complexity of such an approach and the gains in performance measures for the intervals would need to be closely considered.

Acknowledgements. We are grateful to three anonymous referees and an Associate Editor for numerous helpful comments during the preparation of this article. We would also like to thank Thomas Mathew for some suggestions on an earlier version of this work.

References

Brown, L.D., Cai, T.T. and DasGupta, A. (2001). Interval estimation for a binomial proportion. Statist. Sci. 16.
Cai, T.T. and Wang, H. (2009). Tolerance intervals for discrete distributions in exponential families. Statist. Sinica 19.
Eichenberger, P., Hulliger, B. and Potterat, J. (2011). Two measures for sample size determination. Survey Research Methods 5.
Guenther, W.C. (1975). The inverse hypergeometric - a useful model. Stat. Neerl. 29.
Hahn, G.J. and Chandra, R. (1981). Tolerance intervals for Poisson and binomial random variables. Journal of Quality Technology 13.
Hahn, G.J. and Meeker, W.Q. (1991). Statistical Intervals: A Guide for Practitioners. Wiley-Interscience, New York.
Johnson, N.L., Kemp, A.W. and Kotz, S. (2005). Univariate Discrete Distributions, 3rd edn. Wiley, Hoboken.
Jolliffe, I.T. and Jolliffe, A.R. (1997). Modelling memory in coal tits: An illustration of the EM algorithm. Biometrics 53.
Krishnamoorthy, K. and Mathew, T. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. Wiley, Hoboken.
Krishnamoorthy, K., Xia, Y. and Xie, F. (2011). A simple approximate procedure for constructing binomial and Poisson tolerance intervals. Comm. Statist. Theory Methods 40.
Lehmann, E.L. and Romano, J.P. (2005). Testing Statistical Hypotheses. Springer.
Mathew, T. and Young, D.S. (2013). Fiducial-based tolerance intervals for some discrete distributions. Comput. Statist. Data Anal. 61.
Miller, G.K. and Fridell, S.L. (2007). A forgotten discrete distribution? Reviving the negative hypergeometric model. Amer. Statist. 61.
Montgomery, D.C. (2013). Introduction to Statistical Quality Control, 7th edn. Wiley, New Jersey.
Newcombe, R.G. (1998). Two-sided confidence intervals for the single proportion: Comparison of seven methods. Stat. Med. 17.
R Development Core Team (2013). R: A Language and Environment for Statistical Computing, Vienna.
Ridiout, M.S. (1999). Memory in coal tits: An alternative model. Biometrics 55.
Tian, M., Tang, M.L., Ng, H.K.T. and Chen, P.S. (2009). A comparative study of confidence intervals for negative binomial proportions. J. Stat. Comput. Simul. 79.
Vener, K.J., Feuer, E.J. and Gorelic, L. (1993). A statistical model validating triage for the peer review process: Keeping the competitive applications in the review pipeline. The Federation of American Societies for Experimental Biology Journal 7.
Wang, H. and Tsung, F. (2009). Tolerance intervals with improved coverage probabilities for binomial and Poisson variables. Technometrics 51.
Wilks, S.S. (1941). Determination of sample sizes for setting tolerance limits. The Annals of Mathematical Statistics 12.
Wilks, S.S. (1942). Statistical prediction with special reference to the problem of tolerance limits. The Annals of Mathematical Statistics 13.
Wright, T. (1997). A simple algorithm for tighter exact upper confidence bounds with rare attributes in finite universes. Statist. Probab. Lett. 36.
Young, D.S. (2010). tolerance: An R package for estimating tolerance intervals. Journal of Statistical Software 36.
Young, D.S. (2014). A procedure for approximate negative binomial tolerance intervals. J. Stat. Comput. Simul. 84.
Zacks, S. (1970). Uniformly most accurate upper tolerance limits for monotone likelihood ratio families of discrete distributions. J. Amer. Statist. Assoc. 65.
Zaslavsky, B.G. (2007). Calculation of tolerance limits and sample size determination for clinical trials with dichotomous outcomes. J. Biopharm. Statist. 17.
Zhang, L. and Johnson, W.D. (2011). Approximate confidence intervals for a parameter of the negative hypergeometric distribution. In Proceedings of the Section on Survey Research Methods. American Statistical Association.

Appendix A Monotone Likelihood Ratio Property

For a likelihood function $L(\theta; X)$, define

$$\Lambda(\theta_1, \theta_2; X) = \frac{L(\theta_1; X)}{L(\theta_2; X)} \tag{A.1}$$

to be the ratio between the likelihood evaluated at $\theta_1$ and $\theta_2$, where $\theta_1$ $\theta_2$.
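As a numerical illustration of such a ratio, the base R sketch below evaluates it for the hypergeometric likelihood at two candidate values of M. The values of N, n, M1, and M2 are arbitrary choices made for illustration, and the larger value of M is placed in the numerator; this ordering is an assumption of the sketch and not a statement of the paper's convention.

```r
# Illustrative values only; none are taken from the paper.
N  <- 100   # population size
n  <- 20    # sample size
M1 <- 30    # smaller candidate number of units with the attribute
M2 <- 40    # larger candidate number of units with the attribute

x <- 0:n    # every x in 0..20 has positive probability under both M1 and M2

# stats::dhyper(x, m, n, k) takes m = units with the attribute,
# n = units without it, and k = sample size.
ratio <- dhyper(x, M2, N - M2, n) / dhyper(x, M1, N - M1, n)

# TRUE when the likelihood ratio increases strictly with the observed count x.
is_increasing <- all(diff(ratio) > 0)
print(is_increasing)
```

For these values the ratio grows with x, which is the behaviour a monotone likelihood ratio family is expected to show; other choices of N, n, M1, and M2 can be checked the same way, provided x is restricted to counts that are possible under both values of M.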


More information

A New Confidence Interval for the Difference Between Two Binomial Proportions of Paired Data

A New Confidence Interval for the Difference Between Two Binomial Proportions of Paired Data UW Biostatistics Working Paper Series 6-2-2003 A New Confidence Interval for the Difference Between Two Binomial Proportions of Paired Data Xiao-Hua Zhou University of Washington, azhou@u.washington.edu

More information

William C.L. Stewart 1,2,3 and Susan E. Hodge 1,2

William C.L. Stewart 1,2,3 and Susan E. Hodge 1,2 A Targeted Investigation into Clopper-Pearson Confidence Intervals William C.L. Stewart 1,2,3 and Susan E. Hodge 1,2 1Battelle Center for Mathematical Medicine, The Research Institute, Nationwide Children

More information

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Instructions This exam has 7 pages in total, numbered 1 to 7. Make sure your exam has all the pages. This exam will be 2 hours

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

Smoking Habits. Moderate Smokers Heavy Smokers Total. Hypertension No Hypertension Total

Smoking Habits. Moderate Smokers Heavy Smokers Total. Hypertension No Hypertension Total Math 3070. Treibergs Final Exam Name: December 7, 00. In an experiment to see how hypertension is related to smoking habits, the following data was taken on individuals. Test the hypothesis that the proportions

More information

SIMULATED POWER OF SOME DISCRETE GOODNESS- OF-FIT TEST STATISTICS FOR TESTING THE NULL HYPOTHESIS OF A ZIG-ZAG DISTRIBUTION

SIMULATED POWER OF SOME DISCRETE GOODNESS- OF-FIT TEST STATISTICS FOR TESTING THE NULL HYPOTHESIS OF A ZIG-ZAG DISTRIBUTION Far East Journal of Theoretical Statistics Volume 28, Number 2, 2009, Pages 57-7 This paper is available online at http://www.pphmj.com 2009 Pushpa Publishing House SIMULATED POWER OF SOME DISCRETE GOODNESS-

More information

Glossary. Appendix G AAG-SAM APP G

Glossary. Appendix G AAG-SAM APP G Appendix G Glossary Glossary 159 G.1 This glossary summarizes definitions of the terms related to audit sampling used in this guide. It does not contain definitions of common audit terms. Related terms

More information

Test Volume 11, Number 1. June 2002

Test Volume 11, Number 1. June 2002 Sociedad Española de Estadística e Investigación Operativa Test Volume 11, Number 1. June 2002 Optimal confidence sets for testing average bioequivalence Yu-Ling Tseng Department of Applied Math Dong Hwa

More information

37.3. The Poisson Distribution. Introduction. Prerequisites. Learning Outcomes

37.3. The Poisson Distribution. Introduction. Prerequisites. Learning Outcomes The Poisson Distribution 37.3 Introduction In this Section we introduce a probability model which can be used when the outcome of an experiment is a random variable taking on positive integer values and

More information

A New Two Sample Type-II Progressive Censoring Scheme

A New Two Sample Type-II Progressive Censoring Scheme A New Two Sample Type-II Progressive Censoring Scheme arxiv:609.05805v [stat.me] 9 Sep 206 Shuvashree Mondal, Debasis Kundu Abstract Progressive censoring scheme has received considerable attention in

More information

Fuzzy and Randomized Confidence Intervals and P -values

Fuzzy and Randomized Confidence Intervals and P -values Fuzzy and Randomized Confidence Intervals and P -values Charles J. Geyer and Glen D. Meeden May 23, 2005 Abstract. The optimal hypothesis tests for the binomial distribution and some other discrete distributions

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Review of Discrete Probability (contd.)

Review of Discrete Probability (contd.) Stat 504, Lecture 2 1 Review of Discrete Probability (contd.) Overview of probability and inference Probability Data generating process Observed data Inference The basic problem we study in probability:

More information

A Count Data Frontier Model

A Count Data Frontier Model A Count Data Frontier Model This is an incomplete draft. Cite only as a working paper. Richard A. Hofler (rhofler@bus.ucf.edu) David Scrogin Both of the Department of Economics University of Central Florida

More information

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work

Approximate interval estimation for EPMC for improved linear discriminant rule under high dimensional frame work Hiroshima Statistical Research Group: Technical Report Approximate interval estimation for PMC for improved linear discriminant rule under high dimensional frame work Masashi Hyodo, Tomohiro Mitani, Tetsuto

More information

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN Journal of Biopharmaceutical Statistics, 15: 889 901, 2005 Copyright Taylor & Francis, Inc. ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500265561 TESTS FOR EQUIVALENCE BASED ON ODDS RATIO

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

Discrete Distributions

Discrete Distributions Discrete Distributions STA 281 Fall 2011 1 Introduction Previously we defined a random variable to be an experiment with numerical outcomes. Often different random variables are related in that they have

More information

7 Estimation. 7.1 Population and Sample (P.91-92)

7 Estimation. 7.1 Population and Sample (P.91-92) 7 Estimation MATH1015 Biostatistics Week 7 7.1 Population and Sample (P.91-92) Suppose that we wish to study a particular health problem in Australia, for example, the average serum cholesterol level for

More information

Fuzzy and Randomized Confidence Intervals and P-values

Fuzzy and Randomized Confidence Intervals and P-values Fuzzy and Randomized Confidence Intervals and P-values Charles J. Geyer and Glen D. Meeden December 6, 2004 Abstract. The optimal hypothesis tests for the binomial distribution and some other discrete

More information

Distance-based test for uncertainty hypothesis testing

Distance-based test for uncertainty hypothesis testing Sampath and Ramya Journal of Uncertainty Analysis and Applications 03, :4 RESEARCH Open Access Distance-based test for uncertainty hypothesis testing Sundaram Sampath * and Balu Ramya * Correspondence:

More information

Modified Large Sample Confidence Intervals for Poisson Distributions: Ratio, Weighted Average and Product of Means

Modified Large Sample Confidence Intervals for Poisson Distributions: Ratio, Weighted Average and Product of Means Modified Large Sample Confidence Intervals for Poisson Distributions: Ratio, Weighted Average and Product of Means K. KRISHNAMOORTHY a, JIE PENG b AND DAN ZHANG a a Department of Mathematics, University

More information

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY FOR STABILITY STUDIES

AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY FOR STABILITY STUDIES Journal of Biopharmaceutical Statistics, 16: 1 14, 2006 Copyright Taylor & Francis, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500406421 AN ALTERNATIVE APPROACH TO EVALUATION OF POOLABILITY

More information

In Defence of Score Intervals for Proportions and their Differences

In Defence of Score Intervals for Proportions and their Differences In Defence of Score Intervals for Proportions and their Differences Robert G. Newcombe a ; Markku M. Nurminen b a Department of Primary Care & Public Health, Cardiff University, Cardiff, United Kingdom

More information

arxiv: v1 [math.st] 5 Jul 2007

arxiv: v1 [math.st] 5 Jul 2007 EXPLICIT FORMULA FOR COSTRUCTIG BIOMIAL COFIDECE ITERVAL WITH GUARATEED COVERAGE PROBABILITY arxiv:77.837v [math.st] 5 Jul 27 XIJIA CHE, KEMI ZHOU AD JORGE L. ARAVEA Abstract. In this paper, we derive

More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1 Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

On discrete distributions with gaps having ALM property

On discrete distributions with gaps having ALM property ProbStat Forum, Volume 05, April 202, Pages 32 37 ISSN 0974-3235 ProbStat Forum is an e-journal. For details please visit www.probstat.org.in On discrete distributions with gaps having ALM property E.

More information

IE 316 Exam 1 Fall 2011

IE 316 Exam 1 Fall 2011 IE 316 Exam 1 Fall 2011 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed 1 1. Suppose the actual diameters x in a batch of steel cylinders are normally

More information

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science EXAMINATION: QUANTITATIVE EMPIRICAL METHODS Yale University Department of Political Science January 2014 You have seven hours (and fifteen minutes) to complete the exam. You can use the points assigned

More information

Probability Distributions - Lecture 5

Probability Distributions - Lecture 5 Probability Distributions - Lecture 5 1 Introduction There are a number of mathematical models of probability density functions that represent the behavior of physical systems. In this lecture we explore

More information

OHSU OGI Class ECE-580-DOE :Statistical Process Control and Design of Experiments Steve Brainerd Basic Statistics Sample size?

OHSU OGI Class ECE-580-DOE :Statistical Process Control and Design of Experiments Steve Brainerd Basic Statistics Sample size? ECE-580-DOE :Statistical Process Control and Design of Experiments Steve Basic Statistics Sample size? Sample size determination: text section 2-4-2 Page 41 section 3-7 Page 107 Website::http://www.stat.uiowa.edu/~rlenth/Power/

More information

Lecture 21: October 19

Lecture 21: October 19 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

HYPERGEOMETRIC and NEGATIVE HYPERGEOMETIC DISTRIBUTIONS

HYPERGEOMETRIC and NEGATIVE HYPERGEOMETIC DISTRIBUTIONS HYPERGEOMETRIC and NEGATIVE HYPERGEOMETIC DISTRIBUTIONS A The Hypergeometric Situation: Sampling without Replacement In the section on Bernoulli trials [top of page 3 of those notes], it was indicated

More information

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic STATISTICS ANCILLARY SYLLABUS (W.E.F. the session 2014-15) Semester Paper Code Marks Credits Topic 1 ST21012T 70 4 Descriptive Statistics 1 & Probability Theory 1 ST21012P 30 1 Practical- Using Minitab

More information

The International Journal of Biostatistics

The International Journal of Biostatistics The International Journal of Biostatistics Volume 7, Issue 1 2011 Article 12 Consonance and the Closure Method in Multiple Testing Joseph P. Romano, Stanford University Azeem Shaikh, University of Chicago

More information

CHAPTER EVALUATING HYPOTHESES 5.1 MOTIVATION

CHAPTER EVALUATING HYPOTHESES 5.1 MOTIVATION CHAPTER EVALUATING HYPOTHESES Empirically evaluating the accuracy of hypotheses is fundamental to machine learning. This chapter presents an introduction to statistical methods for estimating hypothesis

More information

IE 581 Introduction to Stochastic Simulation

IE 581 Introduction to Stochastic Simulation 1. List criteria for choosing the majorizing density r (x) when creating an acceptance/rejection random-variate generator for a specified density function f (x). 2. Suppose the rate function of a nonhomogeneous

More information

Enquiry. Demonstration of Uniformity of Dosage Units using Large Sample Sizes. Proposal for a new general chapter in the European Pharmacopoeia

Enquiry. Demonstration of Uniformity of Dosage Units using Large Sample Sizes. Proposal for a new general chapter in the European Pharmacopoeia Enquiry Demonstration of Uniformity of Dosage Units using Large Sample Sizes Proposal for a new general chapter in the European Pharmacopoeia In order to take advantage of increased batch control offered

More information

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679

APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 651 APPENDIX B. BIBLIOGRAPHY 677 APPENDIX C. ANSWERS TO SELECTED EXERCISES 679 APPENDICES APPENDIX A. STATISTICAL TABLES AND CHARTS 1 Table I Summary of Common Probability Distributions 2 Table II Cumulative Standard Normal Distribution Table III Percentage Points, 2 of the Chi-Squared

More information

Tutorial on Approximate Bayesian Computation

Tutorial on Approximate Bayesian Computation Tutorial on Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology 16 May 2016

More information

Inequalities Between Hypergeometric Tails

Inequalities Between Hypergeometric Tails JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 7(3), 165 174 Copyright c 2003, Lawrence Erlbaum Associates, Inc. Inequalities Between Hypergeometric Tails MARY C. PHIPPS maryp@maths.usyd.edu.au

More information

HANDBOOK OF APPLICABLE MATHEMATICS

HANDBOOK OF APPLICABLE MATHEMATICS HANDBOOK OF APPLICABLE MATHEMATICS Chief Editor: Walter Ledermann Volume VI: Statistics PART A Edited by Emlyn Lloyd University of Lancaster A Wiley-Interscience Publication JOHN WILEY & SONS Chichester

More information

COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky

COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky COMPUTE CENSORED EMPIRICAL LIKELIHOOD RATIO BY SEQUENTIAL QUADRATIC PROGRAMMING Kun Chen and Mai Zhou University of Kentucky Summary Empirical likelihood ratio method (Thomas and Grunkmier 975, Owen 988,

More information

11. Statistical Evidence of Theomatics demonstrated at Luke 15:10-32

11. Statistical Evidence of Theomatics demonstrated at Luke 15:10-32 11. Statistical Evidence of Theomatics demonstrated at Luke 15:10-32 P Theology is based upon faith and is as such beyond a scientific investigation. The interpretation of theological phenomenons, therefore,

More information

STATISTICAL ANALYSIS AND COMPARISON OF SIMULATION MODELS OF HIGHLY DEPENDABLE SYSTEMS - AN EXPERIMENTAL STUDY. Peter Buchholz Dennis Müller

STATISTICAL ANALYSIS AND COMPARISON OF SIMULATION MODELS OF HIGHLY DEPENDABLE SYSTEMS - AN EXPERIMENTAL STUDY. Peter Buchholz Dennis Müller Proceedings of the 2009 Winter Simulation Conference M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, and R. G. Ingalls, eds. STATISTICAL ANALYSIS AND COMPARISON OF SIMULATION MODELS OF HIGHLY DEPENDABLE

More information

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity Prof. Kevin E. Thorpe Dept. of Public Health Sciences University of Toronto Objectives 1. Be able to distinguish among the various

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Empirical Likelihood Methods for Sample Survey Data: An Overview

Empirical Likelihood Methods for Sample Survey Data: An Overview AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use

More information

6.867 Machine Learning

6.867 Machine Learning 6.867 Machine Learning Problem set 1 Due Thursday, September 19, in class What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.

More information