Inverse-Type Sampling Procedure for Estimating the Number of Multinomial Classes
|
|
- Geraldine Lambert
- 6 years ago
- Views:
Transcription
1 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. SEQUENTIAL ANALYSIS Vol. 22, No. 4, pp , 23 Inverse-Type Sampling Procedure for Estimating the Number of Multinomial Classes Hokwon A. Cho* Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada, USA ABSTRACT A procedure with inverse-type sampling is proposed for the estimation of the number of classes (or cells) in a given multinomial distribution with a devised stopping rule that satisfies a preassigned P*-condition and controls the probability of a correct decision P{CD}. Dirichlet Integrals of Type II are adapted for developing the procedure, which is based on a decision-theoretic approach. We assume that we know neither the number of classes (or cells) nor the cell probabilities (parameters) of the given multinomial but we do assume the minimum cell probability "(>). Finally, the Monte Carlo experimentation is carried out in order to illustrate the theoretical results and the behavior of proposed procedure. Key Words: Inverse-type sampling; Number of classes in multinomial distribution; Probability of correct decision; Dirichlet Integrals Type II; Minimum cell probability. *Correspondence: Hokwon A. Cho, Department of Mathematical Sciences, University of Nevada, 455 Maryland Parkway, Box 4542, Las Vegas, NV 89154, USA; Fax: (77) ; cho@unlv.nevada.edu. 37 DOI: 1.181/SQA Copyright & 23 by Marcel Dekker, Inc (Print); (Online)
2 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 38 Cho 1. INTRODUCTION Suppose we have a multinomial population which is partitioned into an unknown number of classes, say k. Then a natural question will be to ask how many classes (or categories, cells) there are. The following examples are rather common in practical situations and have given in the literature. 1. Biologists, ecologists and botanists may be interested in estimating the number of different species in a population of fish, animals or plants. Such estimates are essential to obtain indices of diversity or the rates of extinction which play an important role in the study of ecosystems or in studying environmental conditions of the ecological diversity of species. [1] 2. Numismatists may be concerned with estimating the total number of dies used to produce coins to study ancient economies. Suppose the researcher assumes that each die has been used to manufacture approximately the same number of coins. [2,3] 3. Linguists may be interested in estimating the size of an author s vocabulary. [4,5] In addition to these applications, estimating the number of distinct records in a filing system such as the social security card file can be an important issue, where many records are suspected to be duplicated. An early article in the above area was written by Goodman. [6] Since then, many approaches have been proposed and developed. Bunge and Fitzpatrick [7] provided a comprehensive review of the existing literature according to methods and approaches with an extended bibliography. Even though there has been more work done on these problems, both frequentist and Bayesian methods [8] use rather restrictive assumptions; namely, equal cell probabilities or uniform prior. Here, we consider a more flexible inverse-type sampling procedure, which has not been considered so far, using multiple decision-theoretic methods. 2. FORMULATION Let X 1, X 2, be a sequence of independent observations drawn from a multinomial population with an unknown integer number of classes (or cells) k, with corresponding cell probabilities p i, where p i >, i ¼ 1, 2,, k and k i¼1p i ¼ 1. Then, there is a simple relation between the unknown k
3 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. Inverse-Type Sampling Procedure 39 and the smallest and largest values, namely, 1=p ½kŠ k 1=p ½1Š, ð2:1þ where p [i] represents the i-th ordered cell probability. For moderate and large numbers of observations, the number of cells with positive frequency in the sequential sample at stopping time will be a important part of our statistic for estimating the true number of cells k. However we need an appropriate stopping rule that will provide a confidence level on our estimate of k. From a practical point of view, we fix a sufficiently small positive value " and say that cells with probability less than this either do not exist or are not of any practical interest. We refer this condition as to the "-condition as ð"þ (the extended parameter space). From (2.1) it follows that k 1/"; thus if " ¼.1, then k 1 and if " ¼.5, then k Stopping Rule of the Procedure We propose the following stopping and decision rule for our procedure R; If the smallest positive frequency among the observed cells reaches a tabled positive integer c cð"þ, then we stop sampling. That is, the stopping rule for the total (random) number N c of observations required is N c ¼ inf n 3 Min f i ¼ c, ð2:2þ 1iK n where n ¼ K n i¼1 f i, n 1 and K n is the number of distinct cells seen in n observations, and the f i, i ¼ 1, 2,, K n are corresponding frequencies of each observed cell. The value of c, which is called the stopping value, will be determined by a P*-condition and this in turn will determine the random variable N c. We then use the statistic K n as an estimate of the true number of cells k. This value K n is clearly a lower bound on k and K n! k as c!1. Since we do not know at stopping time whether we have seen all the cells or not, we define a correct decision (CD) as observing all of the cells with p i " by stopping time. We show below that for the procedure R the P{CD R} P* where P* is a prespecified level of confidence such that 1/k P*<1; the tabled value of the stopping integer c depends on the prespecified P*, as well as on ". Equivalently the probability of a wrong decision P{WD R}, i.e., of stopping without seeing all the cells with p i ", will be less than 1 P*.
4 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 31 Cho Under our "-condition the proposed sequential procedure R terminates with probability one with a finite number of observations since the value of c will always be finite regardless of how small a positive value is prespecified for " The Configurations of the Cell Probabilities and Least Favorable Configuration The procedure R we use is constructed so that it satisfies the usual P*-requirement, namely P{CD R} P* for any configurations of cell probabilities, vector p, in the parameter space (") in which every component is at least ". This leads us to find the so-called least favorable configuration (LFC). [9] That is, we must look for the configuration which infimizes (or minimizes) the PfCDjRg over all vectors p 2 ð"þ. Thus if such a limit configuration p LFC exists and is found, then we will be able to confine our attention solely to the LFC rather than the entire space of parameters. However, we do focus on finding the LFC in this article. We want to satisfy the basic requirement that PfCDjRg P for all p 2 ð"þ. A probability configuration of parameters is said to be a "-least favorable configuration ("-LFC) if any given N, k and " the probability of a correct decision P{CD} becomes a minimum (or infimum) over all vectors p 2 ð"þ under "-condition. We note that the slippage ratio of cell probabilities, defined as p [ j] /p [ j 1], j ¼ 2, 3,, k, is related to find the LFC Basic Scheme of the Procedure In the first stage of our solution we take " ¼.1 and P* to be the traditional value.95 to give a specific example of the use of the procedure R. Then the goal is to find the smallest integers c under the stopping rule R which satisfies the P*-requirement. Firstly let us consider the case where only one cell has been observed, i.e., t ¼ 1. Using our assumption where only one cell has been observed, there may be a second cell; in the worst scenario suppose its cell probability values are p [1] ¼.1 and p [2] ¼.9. Also suppose P* ¼.95. Since PfCDjRg þ PfWDjRg ¼ 1, we want to find integer c such that P{WD} 1 P*. In order to find c(.1) for this situation, we set up the equation in n: :9 n þ :1 n :5 ð2:3þ
5 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. Inverse-Type Sampling Procedure 311 By solving this equation we find n to be 28.4 and the smallest integer above this is 29. In this case c is the value of n that satisfies Eq. (2.3) under the condition " ¼.1, namely 29. Therefore the number of observation required is 29 observations in the one cell observed in order to have PfCDjRg :95; the decision in this case would be k ¼ 1. That is, if we observe one cell at least 29 times in a row, then we conclude with probability.95 that there is only one class (or kind) in the population. In the next section we apply the same analogy to extend to the case where we observe more than one cell, i.e., t PROBABILITY OF CORRECT DECISION AND DIRICHLET INTEGRALS 3.1. Multinomial Events and C and D Functions Let two events E 1 and E 2 represent the minimum frequency and maximum frequency respectively among the cells at stopping time (a.s.t.) and the stopping rule is of type used in inverse sampling. Then applying the results shown in Olkin and Sobel, [1] we have PfE 1 g¼pf f i r, i ¼ 1, 2,, b a:s:t :when f bþ1 ¼ m for the first timeg ¼ Z ab Z ab 1 Z a1 f ðbþ ðx; r, mþ Yb Ca ðbþ ðr, mþ ð3:1þ and PfE 2 g¼pf f i <r, i ¼1, 2,, b a:s:t: when f bþ1 ¼ m for the first timeg where Z 1 Z 1 ¼ a b a b 1 D ðbþ a ðr, mþ f ðbþ ðx; r, mþ ¼ Z 1 a 1 ðm þ RÞ ðmþ Q b i¼1 ðr iþ f ðbþ ðx; r, mþ Yb i¼1 i¼1 dx i dx i Q b i¼1 xr i 1 i ð3:2þ dx i 1 þ P mþr ð3:3þ b i¼1 x i which is the (b-variate) Dirichlet-type II distribution, f i denotes the frequency in the i-th cell, and a i ¼ p i /p bþ1, i ¼ 1, 2,, b. Consider a multinomial model with 1 þ b þ j cells and the corresponding cell probabilities are p and p ¼ ( p 1, p 2,, p b : p bþ1,, p bþj ),
6 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 312 Cho where p þ bþj i¼1 p i ¼ 1. Let c be an integer such that 1 c b. Suppose that we stop sampling when the frequency f reaches the value of m for the first time. Let E denote the compound multinomial even that the frequencies for i ¼ 1, 2,, c are all at least r, the frequencies f i for i ¼ c þ 1, c þ 2,, b are all less than r and the frequencies for i ¼ b þ 1, b þ 2,, b þ j are all exactly equal to r at stopping time. Then the probability of the compound multinomial event E is given by the CDintegrals in Sobel et al. [11] :! ðc, b c;jþ ðm þ RÞ Y bþj Z a r a 1 Z a c i CDa ðr; mþ¼ ðmþ bþj ðrþ r i¼bþ1 Z 1 Z 1 Q b i¼1 x i r 1 dx i a cþ1 1 þ P bþj i¼bþ1 a i þ P mþr, ð3:4þ b i¼1 x i a b where a ¼ p/p ¼ ( p 1, p 2,, p b : p bþ1,, p bþj ) and R ¼ (b þ j )r. We note that c denotes the number of integrals of type C in the expression (3.4). Furthermore if j ¼, we are able to eliminate the j and simply (3.4) as follows: CD ðc, b cþ a ðr; mþ ¼ ðm þ RÞ ðmþ b ðrþ Z a 1 Z a c a cþ1 Z 1 Z 1 a b Q b i¼1 xr 1 i 1 þ P b i¼1 x i dx i mþr ð3:5þ By letting c ¼ b, this CD-integral reduces to the usual C-integral in Eq. (3.1) and letting c ¼, it reduces to the usual D-integral in Eq. (3.2). In addition, if we have a common value for the a, then we can use a without specifying the vector notations on the left-hand side of both Eqs. (3.4) and (3.5). It should be noted that since there is a common value of r in each partition, it is convenient to use one C and one D in the left hand side of the expressions (3.2) and (3.5) Expressions of P{WD} based on C and D Functions Let k be the true number of cells in the state of nature. Let T denote the observed number of cells at the stopping time. Also we assume in the present stage that every cell probability p i 1/1, i ¼ 1, 2,, k. Consider a configuration such as {9/1t, 9/1t,, 9/1t, 1/1}, where 9/1t 1/1, t ¼ 1, 2,, 9. In this configuration the slippage ratio of cell probabilities is a constant and 9/t. Note that if t ¼ 1, then we immediately stop
7 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. Inverse-Type Sampling Procedure 313 Figure 3.1. Cell structure and probabilities for omitting one cell. sampling since under our "-condition all p i s are equally probable spontaneously, and we call this equal parameter configuration (EPC). We now are going to focus on the case of omitting exactly one cell P{WD} for the case of Omitting One Cell: T ¼ k 1 In order to take the case of omitting one cell into consideration, first we need to investigate the structure of the cells in multinomial as shown in Fig Since the total number of cells is the sum of the observed cells and the missing cell, the total number of cells from the above diagram must be k ¼ t þ 1 where t ¼ 1, 2,,9. According to the cell structure, there can be three possibilities in omitting one cell. Hence the P{WD} can be completely considered on the basis of the three cases. Denoting by L the cell of size 9/1t and by S the cell of size 1/1, the probability of a wrong decision P{WD} for omitting one cell is composed of three parts by the cell structure, that is, PfWDg ¼tL; h Siþ tðt 1ÞhL; Liþ ts; h Li, ð3:6þ where tl; h Si, for instance, represents the fact that there are t possible cases in which one of the L is the missing cell and S is the stopping cell. Then by Eq. (3.4) we can express P{WD} in terms of multiple of C and D-integrals as follows: PfWDg¼tDC ð1,t 1Þ ð1,c;cþþtðt 1ÞDCCð1,1,t 2Þ ¼ t þ t ðt=9,1þ þ tdcð1,t 1Þ ð9=t,9=tþ Z 1 Z 1 Z 1 ð1 þ tcþ t ðcþ ð1 þ tcþ t ðcþ þ tðt 1Þ t=9 Z 1 Z 9=t 9=t ð1 þ tcþ t ðcþ Z 9=t dx Q t 1 1 þ x þ P t 1 i¼1 y i Z 1 Z t=9 Z 1 1 dy i i¼1 yc 1 i 1þtc dx Q t 1 1 þ x þ P t 1 i¼1 y i Z 1 dy i i¼1 yc 1 i 1þtc ð1,t=9,1þ ð1,c,c;cþ dxðy c 1 dyþ Q t 2 i¼1 zc 1 i dz i 1 þ x þ y þ P 1þtc : t 2 i¼1 z i ð3:7þ
8 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 314 Cho After some algebra and integrations of the D-type integrals we write Eq. (3.7) in the form 9 c ðtcþ PfWDg ¼t 9 þ t t ðcþ t c ðtcþ þ t 9 þ t t ðcþ Z 9=ð9þtÞ Z 9=ð9þtÞ Z 9=ð9þtÞ Q t 1 i¼1 wc 1 i Z 9=ð9þtÞ Z 1 ðtc cþ 1=2 Z 1=2 þ t ðt 1Þ 2 c t 1 ðcþ t c 1 9 2c 1 ðtc þ 1Þ 9 18 þ t 2 ðcþ t 2 ðcþ Z 9=ð18þtÞ Z 9=ð18þtÞ 1 þ P t 2 i¼1 w i dw i 1 þ P t 1 i¼1 w i Q t 1 i¼1 wc 1 i Q t 2 i¼1 wc 1 i dw i tc 1 t c 2 9 2c 2 ðtc þ 2Þ 9 18 þ t ðcþ ðc 1Þ t 2 ðcþ Z 9=ð18þtÞ Z 9=ð18þtÞ Q t 2 i¼1 wc 1 i dw i tc 2 1 þ P t 2 i¼1 w i tc dw i 1 þ P t 1 i¼1 w i 1 þ P t 2 i¼1 w i tc Q t 2 i¼1 wc 1 i dw i tc c 9 c Z ðtc þ cþ 9=ð18þtÞ Z 9=ð18þtÞ 18 þ t ðcþ ð1þ t 2 ðcþ 9 Q t 2 i¼1 wc 1 i dw >= i 1 þ P tc c t 2 i¼1 w >; : ð3:8þ i After combining similar terms and reshuffing the remaining terms in terms of the C-function, this can be written as 9 c PfWDg ¼t þ t c C ðt 1Þ 1Þ n 9=ð9þtÞðc; cþþtðt 9 þ t 9 þ t 2 c C ðt 2Þ 1=2 ðc; cþ Xc t c 1 18 c ) 2c 1 i C ðt 2Þ 9=ð18þtÞðc; 2c iþ : 18 þ t 18 þ t c 1 i¼1 ð3:9þ
9 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. Inverse-Type Sampling Procedure 315 Table 3.1. Total P{WD} for " ¼.1, P ¼.95. k t c(") P{WD, t ¼ k 1} (1) P{WD, t ¼ k 2} (2) Total P{WD} (1) þ (2) From Eq. (3.9), we are able to obtain values of c corresponding to different values of t, (t ¼ 1, 2,, 9). The values are given in Table 3.1 when P is specified as.95 under " ¼.1. It is most important to note that P{WD}.5 uniformly. It should also be noted that P{WD, T ¼ k 1} in Table 3.1 is not monotonic in either t or c. A reasonable explanation for this is that both the least favorable configuration and the c values are changing. After c settles (to 4 in the table) and the least favorable configuration becomes the one with exactly one cell probability slipped to the left and thereafter the results in Table 3.1 are monotonic. As we see in table, the maximum value t can take is 9 since the number of missing cells is 1 and total number of cells k ¼ t þ 1, and the probabilities of omitting one cell are uniformly smaller than 1 P* which is.5. Therefore the suggested procedure is well justified by guaranteeing P*-condition under the assumption P{WD} for the Case of Omitting Two Cells: T ¼ k 2 In order to apply the same analogy of the case of omitting one cell, let us consider the cell structure for the case of missing two cells. Since the total number of cells is the sum of the observed cells and the missing cells, the noticeable difference from the case of omitting one cell is that the total number of cells as shown in Fig. 3.2 must be k ¼ (t þ 1) þ 1 ¼ t þ 2 where t ¼ 1, 2,, 8. By the cell structure, there can be three possibilities in omitting two cells. Hence the P{WD} can be completely considered on the basis of the following three cases. Denoting L by the cell of size 9/1t and S by the cell of size 1/1,
10 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 316 Cho Figure 3.2. Cell structure and probabilities for omitting two cells. the probability of wrong decision P{WD} when we have two unobserved cells is given by PfWDg ¼ t þ 1 hl, L; Siþ t ðt þ 1ÞhL, S; Li 2 t þðtþ1þ hl, L; Li, ð3:1þ 2 where tðt þ 1ÞhL, S; Li, for instance, represents that there are t (t þ 1) possible cases of which one of L and S are missing cell and L is the stopping cell. Then we can express P{WD} in terms of multiple of C and D-integrals by Eq. (3.6). PfWDg ¼ t þ 1 2 D ð1þ 9=ðtþ1Þ Dð1Þ 9=ðtþ1Þ C ð1þ 9=ðtþ1Þð1, 1, c; cþ þðtþ1þtd ð1þ 1 Dð1Þ ðtþ1þ=9 C ðt 1Þ 1 ð1, 1, c; cþ t þðtþ1þ D ð1þ 1 2 Dð1Þ 1 C ðt 2Þ 1 C ð1þ ðtþ1þ=9ð1, 1, c, c; cþ: ð3:11þ Then we can express P{WD} in terms of multiple of C and D-integrals by Eq. (3.6) as follows. t ðt þ 1Þ ðtc þ 2Þ PfWDg ¼ 2 t ðcþ þ Q t 1 i¼1 xc 1 Z 1 9=ðtþ1Þ Z 1 9=ðtþ1Þ Z 9=ðtþ1Þ i dx i dy 1 dy 2 tcþ2 þ t ðt þ 1Þ 1þy 1 þy 2 þ P t 1 i¼1 x i Z 1 Z 1 1 ðtþ19þ=9 Z 1 ðt 1Þt ðt þ 1Þ 2 Z 1 ðtc þ 2Þ t ðcþ Q t 1 i¼1 xc 1 Z 9=ðtþ1Þ ð2 þ tcþ t ðcþ i dx i dy 1 dy 2 1 þ y 1 þ y 2 þ P t 1 i¼1 x i Z 1 Z 1 Z tcþ2 Z 1 Z ðtþ1þ=9 Q yc 1 3 dy t 2 3 i¼1 xc 1 i dx i dy 1 dy 2 1 þ y 1 þ y 2 þ P tcþ2 : ð3:12þ t 2 i¼1 x i þ y 3
11 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. Inverse-Type Sampling Procedure 317 After considerable algebra and integrations, we obtain PfWDg¼ tðt þ 1Þ 2 t þ 1 tc ðtcþ t þ 19 ðcþ t 1 ðcþ 9 c ðtcþ þ tðt þ 1Þ t þ 19 ðcþ t 1 ðcþ ( tðt 1Þðt þ 1Þ 1 ðtc cþ þ 2 3 c ðcþ t 2 ðcþ Xc t þ 1 c i 9 9 i¼1 Z 9=ð28þtÞ Z 9=ð28þtÞ Z 9=ðtþ19Þ Z t=ðtþ19þ Z 1=3 Z 9=ðtþ19Þ Z t=ðtþ19þ Q t 2 Q t 1 i¼1 yc 1 i dy i ð1 þ P t 1 i¼1 y iþ tc Q t 1 i¼1 yc 1 i dy i ð1 þ P t 1 i¼1 y iþ tc Z 1=3 i¼1 : yc 1 i dy i ð1 þ P t 2 i¼1 y iþ tc c 2c i ð2c iþ ðtc iþ 28 þ t ðcþ ðc i þ 1Þ t 2 ðcþ ð2c iþ Q ) t 2 i¼1 yc 1 i dy i ð1 þ P t 2 i¼1 y : ð3:13þ iþ tc i Using Eq. (3.1) this can be written in terms of C-functions as PfWDg¼ t ðtþ1þ tþ1 c C ðt 1Þ 9=ðtþ19Þ 2 tþ19 ðc; cþþtðtþ1þ 9 t þ 19 c C ðt 1Þ 9=ðtþ19Þðc; cþ ( tðt 1Þðt þ 1Þ þ 2 3 c C ðt 2Þ t þ 1 c i 27 c 1=3 ðc; cþ Xc 28 þ t 28 þ t i¼1! ) 2c 1 i C ðt 2Þ 9=ð28þtÞðc; 2c iþ : ð3:14þ c 1 Since the number of missing cells is 2 and total number of cells k ¼ t þ 2, the maximum value t can take is up to 8. The values of the stopping time and P{WD} are calculated and given in Table 3.1. We need to note importantly that three components in above expression have the identical C-functions when t ¼ 8 except coefficients of t. We know that if t ¼ 8, the configuration of cell structure becomes EPC Total Probability of Wrong Decision Since the events that exactly 1 cell is missing, exactly 2 cells are missing,, are mutually exclusive and P{WD} for more than two
12 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 318 Cho cells missing are extremely small by using the same analysis as in Subsections and 3.3.2, the total P{WD} is PfWDg ¼ X k 1 PfWD, t ¼ k jg j¼1 ð3:15þ PfWD, k ¼ t 1gþPfWD, k ¼ t 2g: Therefore we can approximate the total P{WD} as being close to the sum of column (1) and column (2) in the Table 3.1 and indeed we note that the results are uniformly below.5, which satisfies P*-condition in the procedure, for each value of k. Therefore the proposed procedure is well justified by guaranteeing the P*-condition under the assumption. As we see in Table 3.1, and the probabilities of omitting two cells are uniformly smaller than 1 P* which is.5. We note that for two cells missing the probabilities are relatively very smaller than the probability of one missing cell and have a negligible effect on raising the probability of wrong decision. Also additionally, we find the stopping values and calculated the lower bound (LB) for the expected number of observations E(N) given t, i.e., minimum total number of observations for the case of P* ¼.99. Then we define the c-value of Table 3.1 to be the stopping values of the procedure R and obtain the following table: We note that we do not use any c-value for t ¼ 1 since we simply assert there must be exactly 1 cells under the "-condition. Illustration 3.1. Suppose we observe 4 distinct cells (t ¼ 4). Then from Table 3.2 we obtain c ¼ 6 for P* ¼.95. If we have observed at least Table 3.2. Values of the stopping constant c " as a function of t under procedure R, " ¼ :1. P* ¼.95 P* ¼.99 t c LB of E(N t) c LB of E(N t)
13 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. Inverse-Type Sampling Procedure 319 c ¼ 6 from each of the t ¼ 4 cells, we stop sampling and assert that there are exactly 4 cells. Under our "-condition, all p i s are.1, the probability that our decision is correct is at least.95. However, if we wish to have P{CD}.99, we get c ¼ 9 according to the table. This implies that there should be at least c ¼ 9 observations from each of the t ¼ 4 cells under the assumption " ¼ 1/1 to keep the level of confidence at least.99, which is guaranteed by the proposed procedure R. 4. EXPECTATIONS OF THE NUMBER OF OBSERVATIONS IN THE PROCEDURE 4.1. Expected Total Number of Observations Let N c denote the total number of observations at stopping time needed by the procedure R, which is an inverse sampling procedure. Now we denote the total number of true cells by b (in fact, b stands for blue cell). The stopping rule says that sampling is terminated when the minimum positive frequency reaches a tabled value c which depends upon the observed number of different cells t and prespecified P*. Clearly the total number of observations is random but its moments are functions of the true number of cells and P*; they do not depend on c or t. Before we derive an explicit expression for the expectation of N c,we introduce the general representation for g-th ascending factorial moment of the waiting time until every one of the b cells reaches its quota in a simpler setting than our problem; the quota for cell i is r i (i ¼ 1, 2,, b). Sobel et al. [11] (p. 62, Eq. (5.8)) have shown that the -th ascending factorial moment of the waiting time until the every one of the b cells reaches its quota is given by ½Š ¼ Xb ¼1 ðr þ Þ ðr Þp C ðb 1Þ p< >=p ðr < > ; r þ Þ; ð4:1þ here refers to the cell which is the last one to reach its quota, where p < > =p ¼ðp 1 =p, p 2 =p,, p 1 =p, p þ1 =p,, p b =p Þ; r < > denotes the vector of parameters (r-values) with the component r absent, i.e., (r 1, r 2,, r 1,r þ1,, r b ). Now we consider a certain useful compound event and in order to apply the above factorial moment to our situation. In the notations we have been using, we have N c ¼ f 1 þ f 2 þþf b, where f i (below we
14 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 32 Cho also use x i for the value of f i ) is observed cell frequency in the i-th cell at stopping time, i ¼ 1, 2,, b. Lemma 4.1. Let X ¼ (X 1, X 2,, X b ) be the b-variate multinomial vector of frequencies with corresponding cell probabilities p ¼ ( p 1, p 2,, p b ) and b i¼1p i ¼ 1. Define an event E ¼ {at stopping time (a.s.t.) x ¼ c for the first time: x i c (i ¼ 1, 2,, 1) and x i ¼ (i ¼ þ 1, þ 2,, b)} which can occur under our inverse sampling procedure R. Let N c be the contribution to the total number of observations at stopping time, c þ x 1 þ x 2 þþx 1 ¼ c þ x 1 þ x 2 þþx 1 þ x þ1 þþx b, contributed by the event E. Then, writing E(N c ) for this contribution, the total expected sample size (a.s.t.) is a sum of terms of the form EðN c Þ¼ Xb c C ðb 1Þ p< >=p ðr < > ; c þ 1Þ ð4:2þ ¼1 p where p< >/p ¼ ( p 1 /p, p 2 /p,, p 1 /p, p þ 1/ p,, p b /p ) and we use the scalar c when all the r s are equal c to which is the minimum positive observed frequency at stopping time (our stopping constant). Proof. Let denote x i be the observed frequency at stopping time for i-th cell, i ¼ 1, 2,, b. There are now (b 1) blue cells and one stopping cell. Since N c ¼ c þ P b i¼1, i6¼ x i at stopping time, the expectation of N c becomes as follows: EðN c Þ¼ Xb ¼1 ðx 1 þ x 2 þþx b ÞPfa:s:t:X ¼ c, X t c, X j ¼ g ð4:3þ where i ¼ 1, 2,, 1, and j ¼ þ 1, þ 2,, b. Then we obtain 2 8 EðN c Þ¼ Xb X 1 X 1 X 1 c þ P 9 b j¼1, j6¼ x 3 < j! Y ¼1 x 1 ¼c x 2 ¼c x 1 ¼c c! Q b j¼1, j6¼ x p c b = 4 p x l 5 : l j! ; l¼1, l6¼ " ¼ Xb c X 1 X 1 X 1 p ¼1 x 1 ¼c x 2 ¼c x 1 ¼c 8 h ðc þ 1Þþ P i 9 b j¼1, j6¼ x 3 < j Y b = ðc þ 1Þ Q b j¼1, j6¼ ðx j þ 1Þ pcþ1 p x l : l 5: ð4:4þ ; l¼1, l6¼ Using the exact correspondence between the cumulative multinomial distribution and Dirichlet-Type II (i.e., the C-integral) in Olkin and Sobel [1] and Sobel et al., [11] we obtain Eq. (4.2). g
15 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. Inverse-Type Sampling Procedure 321 We note that we use the scalar c when all the r s are equal to c which is the minimum positive observed frequency at stopping time (our stopping constant). That is, from the multinomial interpretation of C-integrals (in fact, D-integrals have vanished due to some cells which have not appeared before the stopping time) and the expression in Eq. (3.6), the above result exactly coincides with the right hand side of Eq. (4.1). Example 4.1. Consider the case of k ¼ 2. The configuration of "-LFC for k ¼ 2 is {.9,.1}. Since the maximum number of missing cells is at most 1, the probability of the wrong decision will be exact in this case. From the Table 3.1, P{CD} ¼ 1 P{WD} ¼ ¼ From Eq. (4.2) we obtain 13ð1Þ EðN c Þ¼:9529 :9 Cð1Þ 1=9ð13, 14Þþ13ð1Þ :1 Cð1Þ 9 ð13, 14Þ 29ð1Þ :9 Dð1Þ 1=9ð1, 3Þþ29ð1Þ :1 Dð1Þ 9 ð1, 3Þ þ :471 29ð1Þ :9 Dð1Þ 1=9ð1, 3Þþ29ð1Þ :1 Dð1Þ 9 ð1, 3Þ ¼ 122:6396: This E (N c k ¼ 2) is quite agreeable with the result we obtain from Monte Carlo simulation which is in Table 5.1. Example 4.2. Consider the case of k ¼ 4. The configuration of "-LFC for k ¼ 4 is {.3,.3,.3,.1}. The number of missing cells can be more than 1, so we approximate the wrong decision based on one or two missing. From the Table 3.1, P{CD} ¼ ¼ Similarly, from Eq. (4.2) we obtain 6ð1Þ EðN c Þ¼:95818 :3 Cð1Þ 1=3 Cð1Þ 1 ð6, 6; 7Þþ6ð1Þ :1 Cð3Þ 3 ð6; 7Þ 8ð3Þ :3 Dð1Þ 1=3 Cð2Þ 1 ð1, 8; 9Þþ8ð6Þ :3 Dð1Þ 1 Cð1Þ 1 Cð1Þ 1=3ð1, 8, 8; 9Þ þ 8ð3Þ :1 Dð1Þ 3 Cð2Þ 3 ð1, 8; 9Þ þ :471 8ð3Þ :3 Dð1Þ 1=3 Cð2Þ 1 ð1, 8; 9Þ þ 8ð6Þ :3 Dð1Þ 1 Cð1Þ ¼ 56:9194: 1 Cð1Þ 1=3 ð1, 8, 8; 9Þþ8ð3Þ :1 Dð1Þ 3 Cð2Þ 3 ð1, 8; 9Þ
16 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 322 Cho This E(N c k ¼ 4) is comparable with the result we obtain from Monte Carlo simulation which is in Table Variance of the Total Number of Observations The first ascending factorial moment ( ¼ 1) is clearly the mean ¼ E(N c ). Since [2] ¼ E{N c (N c þ 1)} is the second ascending factorial moment of N c, the variance 2 is obtained from the relationship VARðN c Þ¼ ½2Š ð1 þ Þ: ð4:5þ 5. SIMULATION STUDIES In the Monte Carlo experimentation, we used several configurations for each value of k, (k ¼ 2, 4, 8) depending on the number t of cells that have been observed. We calculated the P{CD} for each configuration and several other configuration as well Monte Carlo Experimentation The results of the Monte Carlo simulation are summarized in the following tables, which show the probability of correct decision, P{CD}, Table 5.1. For k ¼ 2; P{CD} P, P ¼.95, " ¼.1. Configuration P{CD} E(N) s.e. #WD ð 1 2, 1 2Þ ð 9 1, 1 1Þ ð 8 Þ , 2 1 Table 5.2. For k ¼ 4; P{CD} P, P ¼.95, " ¼.1. Configuration P{CD} E(N) s.e. #WD ð 1 4, 1 4, 1 4, 1 4Þ ð 3 1, 3 1, 3 1, 1 1Þ ð 4 1, 4 1, 1 1, 1 1Þ ð 7 1, 1 1, 1 1, 1 1Þ
17 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. Inverse-Type Sampling Procedure 323 Table 5.3. For k ¼ 8; P{CD} P, P ¼.95, " ¼.1. Configuration P{CD} E(N) s.e. #WD ð 1 8, 1 8, 1 8, 1 8, 1 8, 1 8, 1 8, 1 8Þ ð 9 7, 9 7, 9 7, 9 7, 9 7, 9 7, 9 7, 1 1Þ ð 2 15, 2 15, 2 15, 2 15, 2 15, 2 15, 1 1, 1 1Þ ð 7 5, 7 5, 7 5, 7 5, 7 5, 1 1, 1 1, 1 1Þ ð 3 2, 3 2, 3 2, 3 2, 1 1, 1 1, 1 1, 1 1Þ ð 5 3, 5 3, 5 3, 1 1, 1 1, 1 1, 1 1, 1 1Þ ð 2 1, 2 1, 1 1, 1 1, 1 1, 1 1, 1 1, 1 1Þ ð 3 1, 1 1, 1 1, 1 1, 1 1, 1 1, 1 1, 1 1Þ the average number of observations, E(N), its standard error, s.e., and the number of wrong decisions in the experiments. Every row in each of Table corresponds to 1, independent experiments. REFERENCES 1. Mann, C. Extinction: Are ecologists crying wolf? Science 1991, 253, Marchand, J.; Schroeck, F., Jr. On estimation of the number of equally likely classes in a population. Commun. Statist. 1982, 11, Arnold, B.; Beaver, R. Estimation of the number of classes in a population. Biometrical Journal 1988, 3, McNeil, D. Estimating an author s vocabulary. J. Ameri. Statist. Assoc. 1973, 68, Efron, B.; Thisted, R. Estimating the number of unseen species: how many words did Shakespeare know? Biometrika 1976, 63, Goodman, L. On the estimation of the number of classes in a population. Ann. Math. Statist. 1949, 2, Bunge, J.; Fitzpatrick, M. Estimating the number of species: a review. J. Ameri. Statist. Assoc. 1993, 88, Boender, C.; Rinnooy Kan, A. A Bayesian analysis of the number of cells of a multinomial distribution. The Statistician 1983, 32, Gibbons, J.; Olkin, I.; Sobel, M. Selecting and Ordering Population: A New Statistical Methodology; John Wiley and Sons: New York, 1977.
18 MARCEL DEKKER, INC. 27 MADISON AVENUE NEW YORK, NY Marcel Dekker, Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker, Inc. 324 Cho 1. Olkin, I.; Sobel, M. Integral expression for tail probabilities of the multinomial and negative multinomial distributions. Biometrika 1965, 52, Sobel, M.; Uppuluri, V.; Frankowski, K. Selected Tables in Mathematical Statistics, Vol. IX-Dirichlet Integrals of Type II and Their Applications; Edited by IMS. American Mathematical Society, Providence: Rhode Island, Received August 22 Recommended by B. Eisenberg
Information Matrix for Pareto(IV), Burr, and Related Distributions
MARCEL DEKKER INC. 7 MADISON AVENUE NEW YORK NY 6 3 Marcel Dekker Inc. All rights reserved. This material may not be used or reproduced in any form without the express written permission of Marcel Dekker
More informationAppendix B from Costinot et al., A Theory of Capital Controls as Dynamic Terms-of-Trade Manipulation (JPE, vol. 122, no. 1, p. 77)
2014 by The University of Chicago. All rights reserved. DOI: 10.1086/674425 Appendix B from Costinot et al., A Theory of Capital Controls as Dynamic Terms-of-Trade Manipulation (JPE, vol. 122, no. 1, p.
More informationSAMPLE SIZE AND OPTIMAL DESIGNS IN STRATIFIED COMPARATIVE TRIALS TO ESTABLISH THE EQUIVALENCE OF TREATMENT EFFECTS AMONG TWO ETHNIC GROUPS
MARCEL DEKKER, INC. 70 MADISON AVENUE NEW YORK, NY 006 JOURNAL OF BIOPHARMACEUTICAL STATISTICS Vol., No. 4, pp. 553 566, 00 SAMPLE SIZE AND OPTIMAL DESIGNS IN STRATIFIED COMPARATIVE TRIALS TO ESTABLISH
More informationExperimental designs for precise parameter estimation for non-linear models
Minerals Engineering 17 (2004) 431 436 This article is also available online at: www.elsevier.com/locate/mineng Experimental designs for precise parameter estimation for non-linear models Z. Xiao a, *,
More informationSELECTING THE NORMAL POPULATION WITH THE SMALLEST COEFFICIENT OF VARIATION
SELECTING THE NORMAL POPULATION WITH THE SMALLEST COEFFICIENT OF VARIATION Ajit C. Tamhane Department of IE/MS and Department of Statistics Northwestern University, Evanston, IL 60208 Anthony J. Hayter
More informationMultistage Methodologies for Partitioning a Set of Exponential. populations.
Multistage Methodologies for Partitioning a Set of Exponential Populations Department of Mathematics, University of New Orleans, 2000 Lakefront, New Orleans, LA 70148, USA tsolanky@uno.edu Tumulesh K.
More informationCONSISTENT ESTIMATION THROUGH WEIGHTED HARMONIC MEAN OF INCONSISTENT ESTIMATORS IN REPLICATED MEASUREMENT ERROR MODELS
ECONOMETRIC REVIEWS, 20(4), 507 50 (200) CONSISTENT ESTIMATION THROUGH WEIGHTED HARMONIC MEAN OF INCONSISTENT ESTIMATORS IN RELICATED MEASUREMENT ERROR MODELS Shalabh Department of Statistics, anjab University,
More informationVersion of record first published: 01 Sep 2006.
This article was downloaded by: [University of Miami] On: 27 November 2012, At: 08:47 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office:
More informationShooting Methods for Numerical Solution of Stochastic Boundary-Value Problems
STOCHASTIC ANALYSIS AND APPLICATIONS Vol. 22, No. 5, pp. 1295 1314, 24 Shooting Methods for Numerical Solution of Stochastic Boundary-Value Problems Armando Arciniega and Edward Allen* Department of Mathematics
More informationA note on sufficient dimension reduction
Statistics Probability Letters 77 (2007) 817 821 www.elsevier.com/locate/stapro A note on sufficient dimension reduction Xuerong Meggie Wen Department of Mathematics and Statistics, University of Missouri,
More informationA numerical analysis of a model of growth tumor
Applied Mathematics and Computation 67 (2005) 345 354 www.elsevier.com/locate/amc A numerical analysis of a model of growth tumor Andrés Barrea a, *, Cristina Turner b a CIEM, Universidad Nacional de Córdoba,
More information1 Numbers. exponential functions, such as x 7! a x ; where a; x 2 R; trigonometric functions, such as x 7! sin x; where x 2 R; ffiffi x ; where x 0:
Numbers In this book we study the properties of real functions defined on intervals of the real line (possibly the whole real line) and whose image also lies on the real line. In other words, they map
More informationConvergence of a linear recursive sequence
int. j. math. educ. sci. technol., 2004 vol. 35, no. 1, 51 63 Convergence of a linear recursive sequence E. G. TAY*, T. L. TOH, F. M. DONG and T. Y. LEE Mathematics and Mathematics Education, National
More informationPossible numbers of ones in 0 1 matrices with a given rank
Linear and Multilinear Algebra, Vol, No, 00, Possible numbers of ones in 0 1 matrices with a given rank QI HU, YAQIN LI and XINGZHI ZHAN* Department of Mathematics, East China Normal University, Shanghai
More informationSample Spaces, Random Variables
Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted
More informationA Study on the Power Functions of the Shewhart X Chart via Monte Carlo Simulation
A Study on the Power Functions of the Shewhart X Chart via Monte Carlo Simulation M.B.C. Khoo Abstract The Shewhart X control chart is used to monitor shifts in the process mean. However, it is less sensitive
More informationOn a connection between the Bradley Terry model and the Cox proportional hazards model
Statistics & Probability Letters 76 (2006) 698 702 www.elsevier.com/locate/stapro On a connection between the Bradley Terry model and the Cox proportional hazards model Yuhua Su, Mai Zhou Department of
More informationDeterminants and polynomial root structure
International Journal of Mathematical Education in Science and Technology, Vol 36, No 5, 2005, 469 481 Determinants and polynomial root structure L G DE PILLIS Department of Mathematics, Harvey Mudd College,
More informationAdaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation
Research Article Received XXXX (www.interscience.wiley.com) DOI: 10.100/sim.0000 Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample
More informationSIMULATION - PROBLEM SET 1
SIMULATION - PROBLEM SET 1 " if! Ÿ B Ÿ 1. The random variable X has probability density function 0ÐBÑ œ " $ if Ÿ B Ÿ.! otherwise Using the inverse transform method of simulation, find the random observation
More informationMisclassification in Logistic Regression with Discrete Covariates
Biometrical Journal 45 (2003) 5, 541 553 Misclassification in Logistic Regression with Discrete Covariates Ori Davidov*, David Faraggi and Benjamin Reiser Department of Statistics, University of Haifa,
More informationConfidence Interval for the Ratio of Two Normal Variables (an Application to Value of Time)
Interdisciplinary Information Sciences Vol. 5, No. (009) 37 3 #Graduate School of Information Sciences, Tohoku University ISSN 30-9050 print/37-657 online DOI 0.036/iis.009.37 Confidence Interval for the
More informationSequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process
Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University
More informationMultinomial Allocation Model
Multinomial Allocation Model Theorem 001 Suppose an experiment consists of independent trials and that every trial results in exactly one of 8 distinct outcomes (multinomial trials) Let : equal the probability
More informationRounding Error in Numerical Solution of Stochastic Differential Equations
STOCHASTIC ANALYSIS AND APPLICATIONS Vol. 21, No. 2, pp. 281 300, 2003 Rounding Error in Numerical Solution of Stochastic Differential Equations Armando Arciniega* and Edward Allen Department of Mathematics
More informationDeep Learning for Computer Vision
Deep Learning for Computer Vision Lecture 3: Probability, Bayes Theorem, and Bayes Classification Peter Belhumeur Computer Science Columbia University Probability Should you play this game? Game: A fair
More informationCOUPON COLLECTING. MARK BROWN Department of Mathematics City College CUNY New York, NY
Probability in the Engineering and Informational Sciences, 22, 2008, 221 229. Printed in the U.S.A. doi: 10.1017/S0269964808000132 COUPON COLLECTING MARK ROWN Department of Mathematics City College CUNY
More informationThe exact bootstrap method shown on the example of the mean and variance estimation
Comput Stat (2013) 28:1061 1077 DOI 10.1007/s00180-012-0350-0 ORIGINAL PAPER The exact bootstrap method shown on the example of the mean and variance estimation Joanna Kisielinska Received: 21 May 2011
More informationHow do our representations change if we select another basis?
CHAPTER 6 Linear Mappings and Matrices 99 THEOREM 6.: For any linear operators F; G AðV Þ, 6. Change of Basis mðg FÞ ¼ mðgþmðfþ or ½G FŠ ¼ ½GŠ½FŠ (Here G F denotes the composition of the maps G and F.)
More informationHigher-Order Differential Equations
CHAPTER Higher-Order Differential Equations. HIGHER-ORDER HOMOGENEOUS EQUATIONS WITH CONSTANT COEFFICIENTS. ðd 7D þ 0Þy 0 has auxiliary equation m 7m þ 00 which factors into ðm Þðm 5Þ 0 and so has roots
More informationCODIMENSION AND REGULARITY OVER COHERENT SEMILOCAL RINGS
COMMUNICATIONS IN ALGEBRA, 29(10), 4811 4821 (2001) CODIMENSION AND REGULARITY OVER COHERENT SEMILOCAL RINGS Gaoha Tang, 1,2 Zhaoyong Hang, 1 and Wenting Tong 1 1 Department of Mathematics, Nanjing University,
More informationAnalysis of the Global Relation for the Nonlinear Schro dinger Equation on the Half-line
Letters in Mathematical Physics 65: 199 212, 23. 199 # 23 Kluwer Academic Publishers. Printed in the Netherlands. Analysis of the Global Relation for the Nonlinear Schro dinger Equation on the Half-line
More informationINTRODUCTION TO PATTERN RECOGNITION
INTRODUCTION TO PATTERN RECOGNITION INSTRUCTOR: WEI DING 1 Pattern Recognition Automatic discovery of regularities in data through the use of computer algorithms With the use of these regularities to take
More informationARTICLE IN PRESS. Computers & Operations Research
Computers & Operations Research 37 (2) 74 8 Contents lists available at ScienceDirect Computers & Operations Research ournal homepage: www.elsevier.com/locate/caor On tem blocking queues with a common
More informationName: Exam 2 Solutions. March 13, 2017
Department of Mathematics University of Notre Dame Math 00 Finite Math Spring 07 Name: Instructors: Conant/Galvin Exam Solutions March, 07 This exam is in two parts on pages and contains problems worth
More informationInferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data
Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone
More informationResearch Article Degenerate-Generalized Likelihood Ratio Test for One-Sided Composite Hypotheses
Mathematical Problems in Engineering Volume 2012, Article ID 538342, 11 pages doi:10.1155/2012/538342 Research Article Degenerate-Generalized Likelihood Ratio Test for One-Sided Composite Hypotheses Dongdong
More informationSAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS
Journal of Biopharmaceutical Statistics, 18: 1184 1196, 2008 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400802369053 SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE
More informationCyclic scheduling of a 2-machine robotic cell with tooling constraints
European Journal of Operational Research 174 (006) 777 796 Production, Manufacturing and Logistics Cyclic scheduling of a -machine robotic cell with tooling constraints Hakan Gultekin, M. Selim Akturk
More informationDistance-based test for uncertainty hypothesis testing
Sampath and Ramya Journal of Uncertainty Analysis and Applications 03, :4 RESEARCH Open Access Distance-based test for uncertainty hypothesis testing Sundaram Sampath * and Balu Ramya * Correspondence:
More informationHIGHER THAN SECOND-ORDER APPROXIMATIONS VIA TWO-STAGE SAMPLING
Sankhyā : The Indian Journal of Statistics 1999, Volume 61, Series A, Pt. 2, pp. 254-269 HIGHER THAN SECOND-ORDER APPROXIMATIONS VIA TWO-STAGE SAMPING By NITIS MUKHOPADHYAY University of Connecticut, Storrs
More informationVisually Identifying Potential Domains for Change Points in Generalized Bernoulli Processes: an Application to DNA Segmental Analysis
University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2009 Visually Identifying Potential Domains for
More informationRandom Variable. Pr(X = a) = Pr(s)
Random Variable Definition A random variable X on a sample space Ω is a real-valued function on Ω; that is, X : Ω R. A discrete random variable is a random variable that takes on only a finite or countably
More informationModel of Dry Matter and Plant Nitrogen Partitioning between Leaf and Stem for Coastal Bermudagrass. I. Dependence on Harvest Interval
JOURNAL OF PLANT NUTRITION Vol. 27, No. 9, pp. 1585 1592, 2004 Model of Dry Matter and Plant Nitrogen Partitioning between Leaf and Stem for Coastal Bermudagrass. I. Dependence on Harvest Interval A. R.
More informationAn Explicit Formula for the Discrete Power Function Associated with Circle Patterns of Schramm Type
Funkcialaj Ekvacioj, 57 (2014) 1 41 An Explicit Formula for the Discrete Power Function Associated with Circle Patterns of Schramm Type By Hisashi Ando1, Mike Hay2, Kenji Kajiwara1 and Tetsu Masuda3 (Kyushu
More informationIf the objects are replaced there are n choices each time yielding n r ways. n C r and in the textbook by g(n, r).
Caveat: Not proof read. Corrections appreciated. Combinatorics In the following, n, n 1, r, etc. will denote non-negative integers. Rule 1 The number of ways of ordering n distinguishable objects (also
More informationMathematical Methods for Physics and Engineering
Mathematical Methods for Physics and Engineering Lecture notes for PDEs Sergei V. Shabanov Department of Mathematics, University of Florida, Gainesville, FL 32611 USA CHAPTER 1 The integration theory
More informationPreservation of local dynamics when applying central difference methods: application to SIR model
Journal of Difference Equations and Applications, Vol., No. 4, April 2007, 40 Preservation of local dynamics when applying central difference methods application to SIR model LIH-ING W. ROEGER* and ROGER
More informationExponential and Logarithmic Functions
CHAPTER Eponential and Logarithmic Functions. EXPONENTIAL FUNCTIONS p. ffiffi ¼ :. ¼ :00. p pffiffi ffiffi ¼ :.. 0 0 f ðþ 0 0. 0 f ðþ 0 0 hðþ : : : :0 : 0 0 CHAPTER EXPONENTIAL AND LOGARITHMIC FUNCTIONS..
More informationAppendix D INTRODUCTION TO BOOTSTRAP ESTIMATION D.1 INTRODUCTION
Appendix D INTRODUCTION TO BOOTSTRAP ESTIMATION D.1 INTRODUCTION Bootstrapping is a general, distribution-free method that is used to estimate parameters ofinterest from data collected from studies or
More informationCOLLABORATION OF STATISTICAL METHODS IN SELECTING THE CORRECT MULTIPLE LINEAR REGRESSIONS
American Journal of Biostatistics 4 (2): 29-33, 2014 ISSN: 1948-9889 2014 A.H. Al-Marshadi, This open access article is distributed under a Creative Commons Attribution (CC-BY) 3.0 license doi:10.3844/ajbssp.2014.29.33
More informationAsymptotics of minimax stochastic programs $
Statistics & Probability Letters 78 (2008) 150 157 www.elsevier.com/locate/stapro Asymptotics of minimax stochastic programs $ Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute
More informationLECTURE 5 HYPOTHESIS TESTING
October 25, 2016 LECTURE 5 HYPOTHESIS TESTING Basic concepts In this lecture we continue to discuss the normal classical linear regression defined by Assumptions A1-A5. Let θ Θ R d be a parameter of interest.
More informationSelecting the Best Simulated System. Dave Goldsman. Georgia Tech. July 3 4, 2003
Selecting the Best Simulated System Dave Goldsman Georgia Tech July 3 4, 2003 Barry Nelson 1 Outline of Talk 1. Introduction. Look at procedures to select the... 2. Normal Population with the Largest Mean
More informationExistence of positive periodic solutions for a periodic logistic equation
Applied Mathematics and Computation 139 (23) 311 321 www.elsevier.com/locate/amc Existence of positive periodic solutions for a periodic logistic equation Guihong Fan, Yongkun Li * Department of Mathematics,
More informationModule 1. Probability
Module 1 Probability 1. Introduction In our daily life we come across many processes whose nature cannot be predicted in advance. Such processes are referred to as random processes. The only way to derive
More informationResearch Article A Nonparametric Two-Sample Wald Test of Equality of Variances
Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner
More informationA Type of Sample Size Planning for Mean Comparison in Clinical Trials
Journal of Data Science 13(2015), 115-126 A Type of Sample Size Planning for Mean Comparison in Clinical Trials Junfeng Liu 1 and Dipak K. Dey 2 1 GCE Solutions, Inc. 2 Department of Statistics, University
More informationAsymptotic efficiency and small sample power of a locally most powerful linear rank test for the log-logistic distribution
Math Sci (2014) 8:109 115 DOI 10.1007/s40096-014-0135-4 ORIGINAL RESEARCH Asymptotic efficiency and small sample power of a locally most powerful linear rank test for the log-logistic distribution Hidetoshi
More informationModel of Dry Matter and Plant Nitrogen Partitioning between Leaf and Stem for Coastal Bermudagrass. II. Dependence on Growth Interval
JOURNAL OF PLANT NUTRITION Vol. 27, No. 9, pp. 1593 1600, 2004 Model of Dry Matter and Plant Nitrogen Partitioning between Leaf and Stem for Coastal Bermudagrass. II. Dependence on Growth Interval A. R.
More informationAppendix A Conventions
Appendix A Conventions We use natural units h ¼ 1; c ¼ 1; 0 ¼ 1; ða:1þ where h denotes Planck s constant, c the vacuum speed of light and 0 the permittivity of vacuum. The electromagnetic fine-structure
More informationA Hypothesis-Free Multiple Scan Statistic with Variable Window
Biometrical Journal 50 (2008) 2, 299 310 DOI: 10.1002/bimj.200710412 299 A Hypothesis-Free Multiple Scan Statistic with Variable Window L. Cucala * Institut de Math matiques de Toulouse, Universit Paul
More informationAppendix A: Logic Gates and Boolean Algebra Used in the Book
ppendix : Logic Gates and oolean lgebra Used in the ook This appendix provides a brief set of notes on oolean algebra laws and their use. It is assumed that the reader already has knowledge of these laws.
More informationwhich are not all zero. The proof in the case where some vector other than combination of the other vectors in S is similar.
It follows that S is linearly dependent since the equation is satisfied by which are not all zero. The proof in the case where some vector other than combination of the other vectors in S is similar. is
More informationSimplified marginal effects in discrete choice models
Economics Letters 81 (2003) 321 326 www.elsevier.com/locate/econbase Simplified marginal effects in discrete choice models Soren Anderson a, Richard G. Newell b, * a University of Michigan, Ann Arbor,
More informationAn Approach to Constructing Good Two-level Orthogonal Factorial Designs with Large Run Sizes
An Approach to Constructing Good Two-level Orthogonal Factorial Designs with Large Run Sizes by Chenlu Shi B.Sc. (Hons.), St. Francis Xavier University, 013 Project Submitted in Partial Fulfillment of
More informationEssentials of expressing measurement uncertainty
Essentials of expressing measurement uncertainty This is a brief summary of the method of evaluating and expressing uncertainty in measurement adopted widely by U.S. industry, companies in other countries,
More informationD-Optimal Designs for Second-Order Response Surface Models with Qualitative Factors
Journal of Data Science 920), 39-53 D-Optimal Designs for Second-Order Response Surface Models with Qualitative Factors Chuan-Pin Lee and Mong-Na Lo Huang National Sun Yat-sen University Abstract: Central
More informationSynchronization of an uncertain unified chaotic system via adaptive control
Chaos, Solitons and Fractals 14 (22) 643 647 www.elsevier.com/locate/chaos Synchronization of an uncertain unified chaotic system via adaptive control Shihua Chen a, Jinhu L u b, * a School of Mathematical
More informationBiased Urn Theory. Agner Fog October 4, 2007
Biased Urn Theory Agner Fog October 4, 2007 1 Introduction Two different probability distributions are both known in the literature as the noncentral hypergeometric distribution. These two distributions
More informationSolution: chapter 2, problem 5, part a:
Learning Chap. 4/8/ 5:38 page Solution: chapter, problem 5, part a: Let y be the observed value of a sampling from a normal distribution with mean µ and standard deviation. We ll reserve µ for the estimator
More informationRandomized Algorithms
Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More informationThe F distribution and its relationship to the chi squared and t distributions
The chemometrics column Published online in Wiley Online Library: 3 August 2015 (wileyonlinelibrary.com) DOI: 10.1002/cem.2734 The F distribution and its relationship to the chi squared and t distributions
More informationA Note on Item Restscore Association in Rasch Models
Brief Report A Note on Item Restscore Association in Rasch Models Applied Psychological Measurement 35(7) 557 561 ª The Author(s) 2011 Reprints and permission: sagepub.com/journalspermissions.nav DOI:
More informationA nonparametric two-sample wald test of equality of variances
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David
More informationAPPENDIX C: Measure Theoretic Issues
APPENDIX C: Measure Theoretic Issues A general theory of stochastic dynamic programming must deal with the formidable mathematical questions that arise from the presence of uncountable probability spaces.
More informationMATH Notebook 3 Spring 2018
MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout........................................
More informationDifferentiation matrices in polynomial bases
Math Sci () 5 DOI /s9---x ORIGINAL RESEARCH Differentiation matrices in polynomial bases A Amiraslani Received January 5 / Accepted April / Published online April The Author(s) This article is published
More informationA SEQUENTIAL SAMPLING RULE FOR SELECTING THE MOST PROBABLE MULTINOMIAL EVENT. KHURSHEEO At.AM, KENZO SEO AND JAMES R. THOMPSON
A SEQUENTIAL SAMPLING RULE FOR SELECTING THE MOST PROBABLE MULTINOMIAL EVENT KHURSHEEO At.AM, KENZO SEO AND JAMES R. THOMPSON (Received Aug. 17, 1970) Summary A sequential sampling rule is given for selecting
More informationPhase Transition & Approximate Partition Function In Ising Model and Percolation In Two Dimension: Specifically For Square Lattices
IOSR Journal of Applied Physics (IOSR-JAP) ISS: 2278-4861. Volume 2, Issue 3 (ov. - Dec. 2012), PP 31-37 Phase Transition & Approximate Partition Function In Ising Model and Percolation In Two Dimension:
More informationMath 416 Lecture 3. The average or mean or expected value of x 1, x 2, x 3,..., x n is
Math 416 Lecture 3 Expected values The average or mean or expected value of x 1, x 2, x 3,..., x n is x 1 x 2... x n n x 1 1 n x 2 1 n... x n 1 n 1 n x i p x i where p x i 1 n is the probability of x i
More informationARTICLE IN PRESS. Physica B 343 (2004) A comparative study of Preisach scalar hysteresis models
Physica B 343 (2004) 164 170 A comparative study of Preisach scalar hysteresis models B. Azzerboni a, *, E. Cardelli b, G. Finocchio a a Dipartimento di Fisica della Materia e Tecnologie Fisiche Avanzante,
More informationA Note on Bayesian Inference After Multiple Imputation
A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in
More informationBootstrap tests of multiple inequality restrictions on variance ratios
Economics Letters 91 (2006) 343 348 www.elsevier.com/locate/econbase Bootstrap tests of multiple inequality restrictions on variance ratios Jeff Fleming a, Chris Kirby b, *, Barbara Ostdiek a a Jones Graduate
More informationMARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES
REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of
More informationConstructing Ensembles of Pseudo-Experiments
Constructing Ensembles of Pseudo-Experiments Luc Demortier The Rockefeller University, New York, NY 10021, USA The frequentist interpretation of measurement results requires the specification of an ensemble
More informationTesting Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA
Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize
More informationn N CHAPTER 1 Atoms Thermodynamics Molecules Statistical Thermodynamics (S.T.)
CHAPTER 1 Atoms Thermodynamics Molecules Statistical Thermodynamics (S.T.) S.T. is the key to understanding driving forces. e.g., determines if a process proceeds spontaneously. Let s start with entropy
More informationOptimal Cusum Control Chart for Censored Reliability Data with Log-logistic Distribution
CMST 21(4) 221-227 (2015) DOI:10.12921/cmst.2015.21.04.006 Optimal Cusum Control Chart for Censored Reliability Data with Log-logistic Distribution B. Sadeghpour Gildeh, M. Taghizadeh Ashkavaey Department
More informationAn inverse problem in estimating simultaneously the effective thermal conductivity and volumetric heat capacity of biological tissue
Applied Mathematical Modelling 31 (7) 175 1797 www.elsevier.com/locate/apm An inverse problem in estimating simultaneously the effective thermal conductivity and volumetric heat capacity of biological
More informationNON-HYPERELLIPTIC RIEMANN SURFACES OF GENUS FIVE ALL OF WHOSE WEIERSTRASS POINTS HAVE MAXIMAL WEIGHT1. Ryutaro Horiuchi. Abstract
R HORIUCHI KODAI MATH J 0 (2007), 79 NON-HYPERELLIPTIC RIEMANN SURFACES OF GENUS FIVE ALL OF WHOSE WEIERSTRASS POINTS HAVE MAXIMAL WEIGHT1 Ryutaro Horiuchi Abstract It is proved that any non-hyperelliptic
More informationProbability. Lecture Notes. Adolfo J. Rumbos
Probability Lecture Notes Adolfo J. Rumbos October 20, 204 2 Contents Introduction 5. An example from statistical inference................ 5 2 Probability Spaces 9 2. Sample Spaces and σ fields.....................
More informationIntroduction to Machine Learning
What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes
More informationStrongly chordal and chordal bipartite graphs are sandwich monotone
Strongly chordal and chordal bipartite graphs are sandwich monotone Pinar Heggernes Federico Mancini Charis Papadopoulos R. Sritharan Abstract A graph class is sandwich monotone if, for every pair of its
More informationNon-Bayesian Multiple Imputation
Journal of Official Statistics, Vol. 23, No. 4, 2007, pp. 433 452 Non-Bayesian Multiple Imputation Jan F. Bjørnstad Multiple imputation is a method specifically designed for variance estimation in the
More informationOn the excluded minors for the matroids of branch-width k $
Journal of Combinatorial Theory, Series B 88 (003) 61 65 http://www.elsevier.com/locate/jctb On the excluded minors for the matroids of branch-width k $ J.F. Geelen, a A.M.H. Gerards, b,c N. Robertson,
More informationSpontaneous magnetization of the square 2D Ising lattice with nearest- and weak next-nearest-neighbour interactions
Phase Transitions Vol. 82, No. 2, February 2009, 191 196 Spontaneous magnetization of the square 2D Ising lattice with nearest- and weak next-nearest-neighbour interactions H.J.W. Zandvliet a * and C.
More informationUNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS
UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS F. C. Nicolls and G. de Jager Department of Electrical Engineering, University of Cape Town Rondebosch 77, South
More informationIndirect Sampling in Case of Asymmetrical Link Structures
Indirect Sampling in Case of Asymmetrical Link Structures Torsten Harms Abstract Estimation in case of indirect sampling as introduced by Huang (1984) and Ernst (1989) and developed by Lavalle (1995) Deville
More informationEIGENVALUES AND SINGULAR VALUE DECOMPOSITION
APPENDIX B EIGENVALUES AND SINGULAR VALUE DECOMPOSITION B.1 LINEAR EQUATIONS AND INVERSES Problems of linear estimation can be written in terms of a linear matrix equation whose solution provides the required
More information