Estimating the number of shared species by a jackknife procedure


Environ Ecol Stat (2015) 22: DOI /s

Chia-Jui Chuang 1,2 · Tsung-Jen Shen 1 · Wen-Han Hwang 1

Received: 12 September 2014 / Revised: 13 April 2015 / Published online: 7 May 2015. Springer Science+Business Media New York 2015

Abstract

A sequence of jackknife estimators is developed to estimate the number of shared species in two communities. The estimators have simple and explicit formulae. A sequential testing criterion is also developed to determine a proper order for these jackknife estimators. The performance of the estimators is evaluated using empirical data on two forests from Malaysia, where 209 shared species are present in both forests, and using simulated data. Results for the empirical data and simulated scenarios (for sampling fractions ranging from 0.5 to 20 %) show that the jackknife estimator, compared with other existing estimators, has a smaller bias and provides more reliable interval estimation in most cases. Additionally, two avian datasets from Taiwan and Hong Kong are used to demonstrate the proposed method. To extend the proposed method to three communities, we also list the first six orders of the jackknife estimators explicitly.

Keywords: Quadrat sampling · Shared species · Two-sample jackknife

Handling Editor: Pierre Dutilleul.

Electronic supplementary material: The online version of this article (doi: /s ) contains supplementary material, which is available to authorized users.

Corresponding author: Wen-Han Hwang, wenhan@nchu.edu.tw

1 Department of Applied Mathematics and Institute of Statistics, National Chung Hsing University, Taichung, Taiwan
2 Center for Biomedical Resources, National Health Research Institutes, Zhunan, Taiwan

1 Introduction

Assume that there are $M_1$ and $M_2$ species in communities I and II, respectively. Of these species, without loss of generality, assume that the first $S$ species are shared by

both communities, with $M_1 - S$ species and $M_2 - S$ species being unique to communities I and II, respectively. Let $X = (X_1, \ldots, X_{M_1})$ and $Y = (Y_1, \ldots, Y_{M_2})$ be the frequencies of the $M_1$ and $M_2$ species randomly sampled from the two communities, respectively. Note that all species observed in both samples are undoubtedly shared species. However, species observed in only one sample have an uncertain designation: they can be unique species or shared species. The objective of this study is to estimate the number of shared species $S$ based on such samples.

Estimating the number of species shared by two distinct ecological communities is essential for understanding the spatial distribution of diversity in landscapes, for modeling species diversity patterns, such as the species–area relationship and beta diversity (Ostling et al. 2003; Krishnamani et al. 2004; Tjørve and Tjørve 2008), and for inferring mechanisms of diversity maintenance (Condit et al. 2002). The shared species between communities, traditionally characterized by the Jaccard or Sørensen similarity indices, form the basis of community ordination analysis and are widely used to measure beta diversity in macroecology (Magurran 2004; Colwell and Elsensohn 2014). However, in ecological applications, species similarities are nearly always calculated directly from sampled data with the implied assumption that samples constitute full coverage of the communities from which they are drawn. Given that in reality samples are almost always a minute fraction of the communities they are supposed to represent, this assumption is rarely true, and the sample bias can be substantial (Chao et al. 2006). In response to this problem, much effort has recently been dedicated to estimating shared species between communities based on samples (Chao et al. 2000; Schloss and Handelsman 2006; Chao et al. 2006, 2008; Yue and Clayton 2012).
In the context of estimating the shared species richness $S$, there are three major studies in the literature. Chao et al. (2000) were the first to address this problem. They extended a richness estimator based on the sample coverage for a single community (i.e., the so-called ACE estimator; Colwell and Coddington 1994) to estimate the number of species shared by two communities. Chao et al. (2006) derived a simple estimator for shared species based on the Laplace approximation formula. In a more recent attempt, Pan et al. (2009) proposed a nonparametric lower bound for shared species among multiple communities, which can be considered an extension of the popular richness estimator Chao2 (Colwell and Coddington 1994). However, these statistical developments have not been subjected to thorough examination, and no empirical test has yet been conducted to evaluate their performance in practice.

Prior to this study, we conducted empirical tests using simulated data from two large-scale forest plots in Malaysia to evaluate the performance of these three methods. Our results showed that all three methods suffered from serious underestimation of shared species when the sampling intensity was less than 5 % of the true community. This deficiency is critical because it is very rare to have sampling effort larger than 1 % in field surveys (Chiarucci et al. 2003). This made us question the practical utility of these methods and motivated us to develop more reliable methods for estimating shared species richness.

Our method is based on a two-sample jackknife procedure (Schechtman and Wang 2004). The derived method has an explicit form consisting of a sequence of estimators. We further develop a sequential acceptance–rejection criterion to determine the jackknife order among the sequence of estimators. Note that jackknife estimators in a single community have been widely applied to

estimate species richness or population size. This approach has been examined and recommended by several studies. For example, based on comprehensive studies of simulated data (Burnham and Overton 1978, 1979; Amstrup et al. 2010) and real datasets (Palmer 1990, 1991; Colwell and Coddington 1994; Walther and Morand 1998; Walther and Moore 2005; Williams et al. 2007; Gotelli and Colwell 2009), there is a general consensus that the jackknife method can be useful in practice.

In the remainder of this study, we first introduce the model structures and provide a brief overview of the three main methods proposed in the literature. We then derive the jackknife estimator for the number of shared species between communities and extend its application to quadrat sampling data, which are frequently used in plant surveys. The jackknife estimator is tested using simulated data from two forest plots in Malaysia. We conclude by discussing a generalization to estimate shared species in multiple communities.

2 Model and overview

Suppose there are two communities. Community I has $M_1$ species with relative abundances $(p_1, \ldots, p_{M_1})$ and community II has $M_2$ species with relative abundances $(q_1, \ldots, q_{M_2})$. Let $n_1$ and $n_2$ be the sample sizes for samples from communities I and II, respectively. Suppose that each individual is observed or detected independently of the others; hence the species counts $X = (X_1, \ldots, X_{M_1})$ follow a multinomial distribution with total $n_1$ and probabilities $(p_1, \ldots, p_{M_1})$, and similarly for $Y = (Y_1, \ldots, Y_{M_2})$. Let $I(\cdot)$ be an indicator function, where $I(A) = 1$ if the event $A$ occurs and 0 otherwise. Then

$$D = \sum_{i=1}^{S} I(X_i \ge 1, Y_i \ge 1)$$

denotes the number of observed shared species from the samples.
For any two nonnegative integers $j$ and $k$, we define

$$f_{jk} = \sum_{i=1}^{S} I(X_i = j, Y_i = k)$$

as the number of shared species with precisely $j$ individuals in sample I (the sample from community I) and $k$ individuals in sample II (the sample from community II). We can observe $f_{jk}$ only if both $j$ and $k$ are positive. For simplicity, we further let

$$f_{j+} = \sum_{k \ge 1} f_{jk} = \sum_{i=1}^{S} I(X_i = j, Y_i \ge 1)$$

be the number of observed shared species with $j$ individuals in sample I, and similarly define $f_{+k}$. Note that the parameter of interest $S$ can be decomposed into four terms,

$$S = D + f_{0+} + f_{+0} + f_{00}, \qquad (1)$$

where the last three terms are unobserved. With the above notation, we review the three methods for estimating $S$ in the literature.
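The notation above can be made concrete with a short Python sketch. It is only an illustration: the function names and the dictionary-based input format (species name mapped to its count in each sample) are assumptions made here, not part of the original method description.

```python
from collections import Counter


def shared_frequency_counts(x, y):
    """Return (D, f), where f[(j, k)] is the number of species observed
    exactly j times in sample I and k times in sample II (j, k >= 1).

    x, y: dicts mapping species name -> observed count (hypothetical format).
    """
    f = Counter()
    for species, xj in x.items():
        yk = y.get(species, 0)
        if xj >= 1 and yk >= 1:  # species seen in both samples
            f[(xj, yk)] += 1
    d = sum(f.values())  # D: number of observed shared species
    return d, f


def marginal_counts(f):
    """Return (f_{j+}, f_{+k}) as Counters keyed by j and by k."""
    f_row, f_col = Counter(), Counter()
    for (j, k), count in f.items():
        f_row[j] += count
        f_col[k] += count
    return f_row, f_col


# Hypothetical toy data: species "e" and "f" are each seen in one sample only.
x = {"a": 1, "b": 2, "c": 1, "d": 3, "e": 5}
y = {"a": 1, "b": 1, "c": 4, "f": 2}
d, f = shared_frequency_counts(x, y)
f_row, f_col = marginal_counts(f)
```

Only the observable cells with $j, k \ge 1$ are tabulated; the terms $f_{0+}$, $f_{+0}$ and $f_{00}$ in (1) cannot be computed from the samples, which is what the estimators reviewed next try to compensate for.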

Method I: Sample coverage

In Chao et al. (2000), the sample coverage $C$ with respect to two communities is defined as

$$C = \frac{\sum_{i=1}^{S} p_i q_i I(X_i \ge 1, Y_i \ge 1)}{\sum_{i=1}^{S} p_i q_i}, \qquad (2)$$

which is the fraction of product probabilities associated with the species common to both samples. Based on the method of moments, Chao et al. (2000) suggested an estimator of $C$:

$$\hat{C} = 1 - \frac{\sum_{i=1}^{D} \{X_i I(Y_i = 1) + Y_i I(X_i = 1) - I(X_i = 1, Y_i = 1)\}}{\sum_{i=1}^{D} X_i Y_i}.$$

This estimator performs very well in many situations. The underlying concept can be traced back to Good (1953), who considered the probability that an extra individual sampled from a community belongs to a new species. Good's coverage has been widely used to develop methods for richness estimation (Darroch and Ratcliff 1980; Esty 1985; Chao and Lee 1992).

As a special case, if the relative abundances are uniform, that is, $p_i = 1/M_1$ and $q_j = 1/M_2$ for all $i$ and $j$, the sample coverage (2) reduces to $C = D/S$. Thus, $D/\hat{C}$ is an estimate of $S$ in this case. However, the uniform case is unrealistic for most field studies. To account for more general situations, Chao et al. (2000) considered the asymptotic bias of $D/\hat{C}$ in terms of coefficients of covariance (CCVs). Let $r_i = p_i q_i$ for $i = 1, \ldots, S$, $\bar{p} = \sum_{i=1}^{S} p_i / S$, $\bar{q} = \sum_{i=1}^{S} q_i / S$, and $\bar{r} = \sum_{i=1}^{S} r_i / S$. The CCVs are

$$\Gamma_1 = \frac{\sum_{i=1}^{S} (p_i - \bar{p})(r_i - \bar{r})}{S \bar{p} \bar{r}}, \quad \Gamma_2 = \frac{\sum_{i=1}^{S} (q_i - \bar{q})(r_i - \bar{r})}{S \bar{q} \bar{r}}, \quad \Gamma_{12} = \frac{\sum_{i=1}^{S} (p_i - \bar{p})(q_i - \bar{q})(r_i - \bar{r})}{S \bar{p} \bar{q} \bar{r}}.$$

Furthermore, these CCVs can be estimated via the method of moments with some approximations. Based on this preliminary work, Chao et al. (2000) proposed an estimator of $S$:

$$\hat{S}_{\mathrm{Cov}} = \frac{D}{\hat{C}} + \frac{1}{\hat{C}} \left( f_{1+} \hat{\Gamma}_1 + f_{+1} \hat{\Gamma}_2 + f_{11} \hat{\Gamma}_{12} \right),$$

where $\hat{\Gamma}_1$, $\hat{\Gamma}_2$, and $\hat{\Gamma}_{12}$ are the associated CCV estimates given in Chao et al. (2000). Note that the estimator $\hat{S}_{\mathrm{Cov}}$ is influenced by high frequency values, so the authors suggested applying it to a subset of the original data with $X_i \le 10$ and $Y_i \le 10$ for all $i$.
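As a rough illustration of the coverage estimate $\hat{C}$ and of the uniform-case estimate $D/\hat{C}$, the following Python sketch evaluates both on hypothetical counts. The function name and input format are assumptions; the CCV correction terms of $\hat{S}_{\mathrm{Cov}}$ are not implemented because their estimates are only given in Chao et al. (2000).

```python
def coverage_estimate(x, y):
    """Estimate the two-community sample coverage C-hat.

    x, y: dicts mapping species name -> observed count (hypothetical format).
    Only species observed in both samples enter the sums.
    """
    shared = [(x[s], y[s]) for s in x if x[s] >= 1 and y.get(s, 0) >= 1]
    # Numerator: X_i I(Y_i = 1) + Y_i I(X_i = 1) - I(X_i = 1, Y_i = 1)
    numerator = sum(
        xi * (yi == 1) + yi * (xi == 1) - (xi == 1 and yi == 1)
        for xi, yi in shared
    )
    denominator = sum(xi * yi for xi, yi in shared)
    return 1 - numerator / denominator


# Hypothetical toy data with five observed shared species.
x = {"a": 5, "b": 1, "c": 2, "d": 4, "e": 1, "u": 7}
y = {"a": 3, "b": 2, "c": 1, "d": 4, "e": 1, "v": 2}
c_hat = coverage_estimate(x, y)
d = sum(1 for s in x if y.get(s, 0) >= 1)
s_uniform = d / c_hat  # D / C-hat, appropriate only for uniform abundances
```

In this toy case the rare shared species (those with a count of 1 in either sample) drive the numerator, so the estimated coverage drops as singletons become more common.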
The estimator has been implemented in the software programs Spade (Chao and Shen 2010) and EstimateS (Colwell and Elsensohn 2014).

Method II: Laplace approximation

The second method relies on the Laplace approximation, which is a popular technique for approximating integrals numerically (Goutis and Casella 1999). Owing to the requirements of the Laplace technique, Chao et al. (2006) assumed that each relative abundance ($p_i$ and $q_i$) is bounded below by a positive constant. When the sample sizes $n_1$ and $n_2$ are large, they established

$$E(f_{0+}) \approx \frac{E(f_{1+}^2)}{2E(f_{2+})}, \quad E(f_{+0}) \approx \frac{E(f_{+1}^2)}{2E(f_{+2})}, \quad \text{and} \quad E(f_{00}) \approx \frac{E(f_{11})E(f_{1+})E(f_{+1})}{4E(f_{2+})E(f_{+2})}.$$

According to (1), Chao et al. (2006) proposed a simple estimator using the Laplace approximation:

$$\hat{S}_{\mathrm{Lap}} = D + \frac{f_{1+}^2}{2 f_{2+}} + \frac{f_{+1}^2}{2 f_{+2}} + \frac{f_{11} f_{1+} f_{+1}}{4 f_{2+} f_{+2}}.$$

If one of the denominators ($f_{2+}$ and $f_{+2}$) is zero, Chao et al. (2006) suggested using a bias-corrected formula:

$$\hat{S}_{\mathrm{Lap}} = D + \frac{f_{1+}^2}{2(f_{2+} + 1)} + \frac{f_{+1}^2}{2(f_{+2} + 1)} + \frac{f_{11} f_{1+} f_{+1}}{4(f_{2+} + 1)(f_{+2} + 1)}. \qquad (3)$$

Method III: Lower bound

Instead of seeking an estimate of $S$ directly, Pan et al. (2009) obtained a lower bound estimate for the number of shared species. Pan et al. (2009) applied the Cauchy–Schwarz inequality and showed that

$$2E(f_{0+})E(f_{2+}) \ge \frac{(n_1 - 1)E^2(f_{1+})}{n_1}, \quad 2E(f_{+0})E(f_{+2}) \ge \frac{(n_2 - 1)E^2(f_{+1})}{n_2}, \quad \text{and} \quad 4E(f_{00})E(f_{22}) \ge \frac{(n_1 - 1)(n_2 - 1)E^2(f_{11})}{n_1 n_2}.$$

Substituting these terms into (1) and considering $(n_j - 1)/n_j \approx 1$ for $j = 1, 2$, as the $n_j$ are usually large, Pan et al. (2009) developed a lower bound estimator of $S$:

$$\hat{S}_{\mathrm{Low}} = D + \frac{f_{1+}^2}{2 f_{2+}} + \frac{f_{+1}^2}{2 f_{+2}} + \frac{f_{11}^2}{4 f_{22}}.$$

Similarly, when $f_{2+} = 0$, $f_{+2} = 0$, or $f_{22} = 0$, a correction like (3) could be adopted as well.

3 Jackknife procedure for estimating S

3.1 A sequence of jackknife estimators

The jackknife method was invented by Quenouille (1949) and has been widely applied for correcting statistical bias and estimating standard errors (Shao and Tu 1995). In ecology, Burnham and Overton (1978) applied the procedure to obtain a series of population size estimators for a closed capture–recapture model. Heltshe and Forrester (1983) considered species richness estimation based on quadrat sampling data. Traditionally, the first-order jackknife method is carried out by recomputing a desired statistic while successively leaving out one observation at a time from a one-sample dataset. Because our interest is in two-sample data, we followed extended works (Arvesen 1969; Schechtman and Wang 2004) to apply the jackknife procedure to two-sample situations.
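For readers who want to see the Laplace and lower bound formulas of Sect. 2 evaluated numerically, here is a minimal Python sketch of the uncorrected versions. The function names are hypothetical, and the zero-denominator corrections are deliberately left out (the sketch raises an error instead) rather than guessed at. The input values are the frequency counts reported for the Hong Kong data in Sect. 6.2.

```python
def s_laplace(d, f1p, f2p, fp1, fp2, f11):
    """Uncorrected Laplace-approximation estimator of shared richness."""
    if f2p == 0 or fp2 == 0:
        raise ValueError("zero denominator: a corrected formula is needed")
    return (d
            + f1p ** 2 / (2 * f2p)
            + fp1 ** 2 / (2 * fp2)
            + f11 * f1p * fp1 / (4 * f2p * fp2))


def s_lower_bound(d, f1p, f2p, fp1, fp2, f11, f22):
    """Uncorrected lower-bound estimator of shared richness."""
    if f2p == 0 or fp2 == 0 or f22 == 0:
        raise ValueError("zero denominator: a corrected formula is needed")
    return (d
            + f1p ** 2 / (2 * f2p)
            + fp1 ** 2 / (2 * fp2)
            + f11 ** 2 / (4 * f22))


# Counts from Example 2: D = 116, f1+ = 6, f2+ = 4, f+1 = 10, f+2 = 7,
# f11 = 1, f22 = 1.
lap = s_laplace(116, 6, 4, 10, 7, 1)
low = s_lower_bound(116, 6, 4, 10, 7, 1, 1)
```

The two estimators share their first three terms and differ only in how the doubly unseen class $f_{00}$ is approximated.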
Suppose that individuals in sample I have labels $a_l$, $l = 1, \ldots, n_1$, and individuals in sample II have labels $b_m$, $m = 1, \ldots, n_2$. For a parameter of interest $\theta$, let $\hat{\theta}$ be an estimator of $\theta$, let $\hat{\theta}^{(-l,\cdot)}$ be the estimate when individual $a_l$ is removed from sample I, let $\hat{\theta}^{(\cdot,-m)}$ be the estimate when individual $b_m$ is removed from sample II, and

let $\hat{\theta}^{(-l,-m)}$ be the estimate after individuals $a_l$ and $b_m$ are removed from samples I and II, respectively. Recall that $D$ is the number of observed shared species from the samples; intuitively, $\hat{S}_0 = D$ is chosen to be the basic estimator of $S$ for the jackknife procedure. The procedure starts by alternately and sequentially deleting $a_l$ and $b_m$ from the full dataset and then recounting the observed number of shared species in the resulting data. For instance, $\hat{S}_0^{(-l,\cdot)}$ is the observed number of shared species after deleting $a_l$ from sample I. Trivially, $\hat{S}_0^{(-l,\cdot)}$ can be either $D$ or $D - 1$, where the latter occurs when individual $a_l$ belongs to a species $i$ with $X_i = 1$ and $Y_i > 0$. Performing the usual jackknife method with respect to sample I yields the estimator

$$\hat{S}_{0,X} = n_1 \hat{S}_0 - (n_1 - 1) \frac{\sum_{l=1}^{n_1} \hat{S}_0^{(-l,\cdot)}}{n_1} = D + \frac{n_1 - 1}{n_1} f_{1+}.$$

Similarly, by jackknifing sample II, we obtain

$$\hat{S}_{0,Y} = n_2 \hat{S}_0 - (n_2 - 1) \frac{\sum_{m=1}^{n_2} \hat{S}_0^{(\cdot,-m)}}{n_2} = D + \frac{n_2 - 1}{n_2} f_{+1}.$$

By taking a weighted average of $\hat{S}_{0,X}$ and $\hat{S}_{0,Y}$ (Arvesen 1969), the first-order jackknife estimator is

$$\hat{S}_1 = \frac{n_1 \hat{S}_{0,X} + n_2 \hat{S}_{0,Y}}{n_1 + n_2} = D + \frac{n_1 - 1}{n_1 + n_2} f_{1+} + \frac{n_2 - 1}{n_1 + n_2} f_{+1}.$$

Nevertheless, as shown in Schechtman and Wang (2004), this first-order jackknife estimator does not reduce the bias in terms of asymptotic order. Hence a further correction is necessary. Following Schechtman and Wang (2004), we consider jackknifing $\hat{S}_{0,X}$ by deleting one individual $b_m$ at a time from sample II. As a result, we find the second-order estimator $\hat{S}_2$:

$$\begin{aligned}
\hat{S}_2 &= n_2 \hat{S}_{0,X} - (n_2 - 1) \frac{\sum_{m=1}^{n_2} \hat{S}_{0,X}^{(\cdot,-m)}}{n_2} \\
&= n_1 n_2 \hat{S}_0 - n_2 (n_1 - 1) \frac{\sum_{l=1}^{n_1} \hat{S}_0^{(-l,\cdot)}}{n_1} - n_1 (n_2 - 1) \frac{\sum_{m=1}^{n_2} \hat{S}_0^{(\cdot,-m)}}{n_2} + (n_1 - 1)(n_2 - 1) \frac{\sum_{l=1}^{n_1} \sum_{m=1}^{n_2} \hat{S}_0^{(-l,-m)}}{n_1 n_2} \\
&= D + \frac{n_1 - 1}{n_1} f_{1+} + \frac{n_2 - 1}{n_2} f_{+1} + \frac{(n_1 - 1)(n_2 - 1)}{n_1 n_2} f_{11}.
\end{aligned}$$

We note that, alternatively, it can be shown that $\hat{S}_2 = n_1 \hat{S}_{0,Y} - (n_1 - 1) \sum_{l=1}^{n_1} \hat{S}_{0,Y}^{(-l,\cdot)} / n_1$.
Briefly, $\hat{S}_2$ results from jackknifing $\hat{S}_0$ by alternately deleting one individual from either sample.
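The closed forms for the first- and second-order estimators can be checked against a literal leave-one-out computation. The Python sketch below does this for a small hypothetical dataset in which each sample is a list of species labels, one entry per individual; the function names and the data are illustrative assumptions, and the brute-force version follows the expanded double-deletion expression for the second-order estimator.

```python
from collections import Counter


def observed_shared(sample1, sample2):
    """Number of species present in both samples (the statistic D)."""
    return len(set(sample1) & set(sample2))


def jackknife_closed_forms(sample1, sample2):
    """Return (S1, S2) from the closed-form expressions."""
    n1, n2 = len(sample1), len(sample2)
    c1, c2 = Counter(sample1), Counter(sample2)
    shared = set(c1) & set(c2)
    d = len(shared)
    f1p = sum(1 for s in shared if c1[s] == 1)  # singletons in sample I
    fp1 = sum(1 for s in shared if c2[s] == 1)  # singletons in sample II
    f11 = sum(1 for s in shared if c1[s] == 1 and c2[s] == 1)
    s1 = d + (n1 - 1) / (n1 + n2) * f1p + (n2 - 1) / (n1 + n2) * fp1
    a, b = (n1 - 1) / n1, (n2 - 1) / n2
    s2 = d + a * f1p + b * fp1 + a * b * f11
    return s1, s2


def s2_brute_force(sample1, sample2):
    """Second-order estimate via explicit leave-one-out recounting."""
    n1, n2 = len(sample1), len(sample2)
    s0 = observed_shared(sample1, sample2)
    drop1 = [sample1[:l] + sample1[l + 1:] for l in range(n1)]
    drop2 = [sample2[:m] + sample2[m + 1:] for m in range(n2)]
    sum_l = sum(observed_shared(s, sample2) for s in drop1)
    sum_m = sum(observed_shared(sample1, s) for s in drop2)
    sum_lm = sum(observed_shared(s, t) for s in drop1 for t in drop2)
    return (n1 * n2 * s0
            - n2 * (n1 - 1) * sum_l / n1
            - n1 * (n2 - 1) * sum_m / n2
            + (n1 - 1) * (n2 - 1) * sum_lm / (n1 * n2))


# Hypothetical samples: one list entry per sampled individual.
sample1 = ["a", "b", "b", "c", "d", "d", "d"]
sample2 = ["a", "b", "c", "c", "e", "e"]
s1, s2 = jackknife_closed_forms(sample1, sample2)
s2_check = s2_brute_force(sample1, sample2)
```

The brute-force version costs on the order of $n_1 n_2$ recounts, which is why the explicit formula matters in practice. When $n_1$ and $n_2$ are large, the factors $(n_j - 1)/n_j$ are close to 1 and the second-order estimate is approximately $D + f_{1+} + f_{+1} + f_{11}$.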

In order to further reduce the statistical bias of $\hat{S}_0$, we suggest continuing this procedure. In this way we establish a sequence of estimators $\hat{S}_k$ for $k \ge 0$; the algorithm is summarized as follows.

Step 0: Initialize $\nu = 0$.

Step 1: Let $k = 2\nu + 1$ and define

$$\hat{S}_{2\nu,X} = n_1 \hat{S}_{2\nu} - (n_1 - 1) \frac{\sum_{l=1}^{n_1} \hat{S}_{2\nu}^{(-l,\cdot)}}{n_1} \quad \text{and} \quad \hat{S}_{2\nu,Y} = n_2 \hat{S}_{2\nu} - (n_2 - 1) \frac{\sum_{m=1}^{n_2} \hat{S}_{2\nu}^{(\cdot,-m)}}{n_2}.$$

The $k$th-order jackknife estimator, $k = 2\nu + 1$, is

$$\hat{S}_k = \frac{n_1 \hat{S}_{2\nu,X} + n_2 \hat{S}_{2\nu,Y}}{n_1 + n_2}.$$

Step 2: The $(k+1)$th-order jackknife estimator, $k + 1 = 2\nu + 2$, is

$$\hat{S}_{k+1} = n_2 \hat{S}_{2\nu+1,X} - (n_2 - 1) \frac{\sum_{m=1}^{n_2} \hat{S}_{2\nu+1,X}^{(\cdot,-m)}}{n_2}.$$

Step 3: Increment $\nu$ to $\nu + 1$ and return to Step 1.

In Theorem 1 of the Appendix, we show that $\hat{S}_k$ is a linear combination of the observed frequencies $f_{ij}$ and give the explicit formula. In addition, as the sample sizes ($n_1$ and $n_2$) are usually large in practice, we can further simplify the expression of $\hat{S}_k$; see Corollary 1 in the Appendix.

The variance estimator of $\hat{S}_k$ can be derived from a standard asymptotic approach. Due to random sampling, the random variables $S - D$ and $f_{ij}$, $i, j \ge 1$, follow a multinomial distribution with total $S$ and probabilities $1 - \pi$ and $\pi_{ij}$, $i \ge 1$, $j \ge 1$, where $\pi = \sum_{i \ge 1, j \ge 1} \pi_{ij}$ and $\pi_{ij}$ is the probability that a shared species is observed exactly $i$ times in sample I and $j$ times in sample II. Given $\hat{S}_k$, we estimate $\pi_{ij}$ by $\hat{\pi}_{ij} = f_{ij} / \hat{S}_k$ for all $i \ge 1$ and $j \ge 1$. As a consequence, we have

$$\widehat{\mathrm{Cov}}(f_{ij}, f_{st}) = \begin{cases} f_{ij}(1 - \hat{\pi}_{ij}) & \text{if } i = s,\ j = t; \\ -f_{ij} \hat{\pi}_{st} & \text{otherwise.} \end{cases}$$

Rewriting $\hat{S}_k = \sum_{i \ge 1} \sum_{j \ge 1} c_{ij} f_{ij}$ in terms of some constant coefficients $c_{ij}$, the variance estimator of $\hat{S}_k$ can be expressed as

$$\widehat{\mathrm{Var}}(\hat{S}_k) = \sum_{i \ge 1} \sum_{j \ge 1} c_{ij}^2 f_{ij} - \hat{S}_k. \qquad (4)$$

Remark. Similar to the proof in Cormack (1989), it is straightforward to show that the bias of the initial estimator $D$ cannot be expressed by a power series in the reciprocals of the sample sizes $n_1$ and $n_2$, and hence the bias-reduction assumption in the two-sample jackknife procedure of Schechtman and Wang (2004) is not satisfied. Nevertheless, in practice, the bias can be reduced by jackknife estimators under some conditions. For instance, the second-order jackknife estimator $\hat{S}_2$ can reduce the bias of $D$ in a

broad range of situations. To see this, let $d_1 = n_1 \bar{p}$ and $d_2 = n_2 \bar{q}$. Using the Taylor expansion of $D$ and $\hat{S}_2$ around $\bar{p}$ and $\bar{q}$, we obtain the asymptotic biases of $D$ and $\hat{S}_2$ in terms of $d_1$, $d_2$, and the coefficients of variation (CVs); see Web Appendix S2. Note that $d_1$ ($d_2$) is the average of the observed number of individuals for the shared species in community I (community II). As a consequence, we can evaluate the relative asymptotic biases of $D$ and $\hat{S}_2$, given $d_1$, $d_2$, and the CVs. Web Figure 1 displays the results under selected CVs. Based on these results, we conclude that the jackknife estimator $\hat{S}_2$ is able to reduce the bias of $D$, especially when $d_1$ and/or $d_2$ are small. Note that, when both $d_1$ and $d_2$ are large, the absolute biases of both $D$ and $\hat{S}_2$ tend to be small. The asymptotic bias of the other jackknife estimators can be evaluated by a parallel technique, but the results are much more complicated than those for $\hat{S}_2$.

3.2 Order selection

Although the jackknife estimator $\hat{S}_k$ is likely to have a smaller bias for larger $k$, it inevitably inflates the variance as more terms are involved. Thus, there is a bias–variance trade-off in selecting a jackknife order $k$. Here we use a sequential test procedure (Burnham and Overton 1978) as the decision criterion. For each $k \ge 0$, consider the following hypotheses:

$$H_{0k}: E(\hat{S}_{k+1} - \hat{S}_k) = 0 \quad \text{vs.} \quad H_{1k}: E(\hat{S}_{k+1} - \hat{S}_k) \ne 0. \qquad (5)$$

Assume that, under the null hypothesis $H_{0k}$, the test statistic

$$T_k = \frac{\hat{S}_{k+1} - \hat{S}_k}{\sqrt{\widehat{\mathrm{Var}}(\hat{S}_{k+1} - \hat{S}_k)}} \qquad (6)$$

is asymptotically normally distributed. For a significance level $\alpha$, the procedure begins by testing the hypothesis in (5) with order $k = 0$ and then continues to the next order until acceptance occurs. In other words, if the p-value associated with the test statistic $T_k$ is smaller than $\alpha$, the procedure goes to the next order of hypothesis, and it stops when the p-value exceeds $\alpha$. When the procedure stops at $k = \hat{k}$, our proposed estimator is $\hat{S}_{\mathrm{JK}} = \hat{S}_{\hat{k}}$.
Note that Burnham and Overton (1978) suggested using an interpolation formula at this stage, but the resulting estimate was less favorable than the proposed method in a simulation study (data not shown). The variance estimate in the denominator of (6) can be obtained via the same technique shown in (4), since $\hat{S}_{k+1} - \hat{S}_k$ is a linear combination of the observed frequencies $f_{ij}$. However, we caution that (4) is not suitable for estimating the variance of $\hat{S}_{\mathrm{JK}}$ because the selected order $\hat{k}$ is a random variable. Specifically, the variance of $\hat{S}_{\mathrm{JK}}$ would be underestimated if one treated $\hat{k}$ as fixed and applied (4) naively. Because an analytic variance estimator of $\hat{S}_{\mathrm{JK}}$ is currently not available, we suggest adopting a non-parametric bootstrap (Chao et al. 2000) to obtain a variance estimator instead.

It is possible that the proposed sequential testing procedure never terminates, i.e., the test never yields a p-value that exceeds the desired significance level $\alpha$, though

this outcome was unusual in our empirical study. To successfully implement the order selection, we set an upper bound $K_u$ on the jackknife order and an upper threshold on the estimate to avoid the extreme estimates that can occur frequently for higher orders $k$. Under these refinements, the sequential test procedure is stopped at $k$ when the next, $(k+1)$th-order, jackknife estimator would exceed the upper threshold. Moreover, when no acceptance occurs before order $K_u$, we take $K_u$ as the selected order. In practice, we suggest taking $K_u = 6$ because the procedure seldom selected an order larger than 6 in our experience. The upper threshold could be 10 times the number of observed shared species in both samples (Hwang and Huang 2003).

4 Quadrat sampling with incidence-based data

In plant ecology, it is common to collect data by quadrat sampling, in which an area of interest is divided into several regular quadrats (usually rectangular) and a random sample of quadrats is taken from the area. Within each sampled quadrat, instead of counting the exact abundance of each species, one only records the presence (1) or absence (0) of each species. Thus the sampling unit is a quadrat, and a vector of 0–1 values reflects species incidence in each quadrat. Although the incidence data differ from the structure we considered in the last section, the jackknife procedure developed in this study is equally applicable. In this section, we redefine the notation and corresponding statistics for the jackknife formula.

Under a quadrat sampling design, let $n_1$ and $n_2$ be the number of quadrats taken from communities I and II, respectively. Let $D$ be the observed number of shared species and $f_{jk}$ be the number of shared species detected in $j$ quadrats in community I and in $k$ quadrats in community II. Other symbols like $f_{j+}$ and $f_{+k}$ are defined similarly as in (1).
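For incidence data, the same frequency table can be tabulated from quadrat-level presence records. In the Python sketch below each quadrat is represented as a set of detected species; this input format and the function name are assumptions chosen for illustration only.

```python
from collections import Counter


def incidence_frequency_counts(quadrats1, quadrats2):
    """Return (n1, n2, D, f) for incidence data.

    quadrats1, quadrats2: lists of quadrats, each a set of species detected
    in that quadrat (hypothetical format). f[(j, k)] is the number of shared
    species detected in j quadrats of sample I and k quadrats of sample II.
    """
    # Number of quadrats in which each species was detected.
    x = Counter(sp for quadrat in quadrats1 for sp in set(quadrat))
    y = Counter(sp for quadrat in quadrats2 for sp in set(quadrat))
    shared = set(x) & set(y)
    f = Counter((x[sp], y[sp]) for sp in shared)
    return len(quadrats1), len(quadrats2), len(shared), f


# Hypothetical toy data: three quadrats from community I, two from community II.
quadrats1 = [{"a", "b"}, {"a"}, {"c"}]
quadrats2 = [{"a", "c"}, {"c", "d"}]
n1, n2, d, f = incidence_frequency_counts(quadrats1, quadrats2)
```

Here $n_1$ and $n_2$ count quadrats rather than individuals, and a species contributes at most 1 per quadrat regardless of how many individuals it has there.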
However, quadrat sampling incidence data differ from abundance data: the sampling unit, denoted $a_l$ in sample I and $b_m$ in sample II, is now an incidence vector rather than a scalar as in the previous sections. That is, $a_l = (a_{1l}, \ldots, a_{M_1 l})$, where $a_{il} = 1$ if the $i$th species has been detected in the $l$th quadrat of sample I and $a_{il} = 0$ otherwise; $b_m = (b_{1m}, \ldots, b_{M_2 m})$ is defined similarly. Following the same arguments as in Sect. 3 but treating $a_l$ and $b_m$ as the removed units, the jackknife estimators derived from the incidence-based quadrat sampling design are the same as before. The sequential testing criterion in Sect. 3.2 is again recommended for selecting the jackknife order. Note that the similarity between the abundance-based and incidence-based data is not a coincidence. In fact, it can be shown that all methods in Sect. 2 have the same representations for both incidence-based and abundance-based data; see Pan et al. (2009) for a remark on the two data types in these estimation approaches.

5 Empirical study

The performance of the various estimators was assessed by simulation, where two large-scale census rain forest datasets were considered as sampling populations to reflect the species structure in two real communities. The two forest plots, Pasoh and Lambir, are

both located in Malaysia. The Pasoh plot is 50 ha and is located in the Pasoh Forest Reserve, Peninsular Malaysia. The Lambir plot is 52 ha and is located in Lambir Hills National Park in Sarawak, Malaysia. In each plot, all free-standing trees and shrubs at least 1 cm in diameter at breast height were counted, located on a reference map with precise coordinates, and identified to species. To date, both plots have been censused several times; we use the data collected in 1985 for the Pasoh plot and in 1991 for the Lambir plot. Table 1 summarizes the background of the two plots, which includes locations, average annual rainfall, and species richness. There were 209 tree species in common between the two plots. In Fig. 1 we show the abundances of these shared species, where the number in the Pasoh plot ranges from 1 to 8821 (median 208) and in the Lambir plot from 1 to 3130 (median 141).

Table 1 Basic characteristics of the Pasoh and the Lambir plots

                          Pasoh          Lambir
Location                  2°58′ N, E     4°10′ N, E
Size of plot (ha)         50             52
Range of elevation (m)
Annual rainfall (mm)
No. of species
No. of individuals        320,           ,602
No. of shared species     209            209

Fig. 1 Frequency of the 209 shared species of trees in Pasoh (right) and Lambir (left) plots. Note that the horizontal axis, log(frequency), is on the log scale

We simulated quadrat sampling from the two plots and considered three quadrat sizes (5 × 5 m, 10 × 10 m, and 20 × 20 m) and seven sampling proportions (0.5, 1, 3, 5, 10, 20, and 30 %). For each combination of quadrat size and sampling proportion, 2000 pairs of samples of quadrats were randomly selected with replacement from the Pasoh and Lambir plots. Note that sampling without replacement is more appropriate than sampling with replacement in this application; however, since the sampling proportion is usually small in practice and at most 30 % in our study, these sampling schemes yield very similar results. Figure 2 displays the average frequency counts $f_{ij}$ used in the jackknife estimators. We found that the frequencies were not sensitive to quadrat size; however, they varied notably with the sampling proportion.

For each generated dataset, we computed the following estimators: the sample coverage estimator ($\hat{S}_{\mathrm{Cov}}$), the Laplace approximation estimator ($\hat{S}_{\mathrm{Lap}}$), the lower bound estimator ($\hat{S}_{\mathrm{Low}}$), the jackknife estimators $\hat{S}_1, \ldots, \hat{S}_6$, and the estimator $\hat{S}_{\mathrm{JK}}$ selected by the procedure proposed in Sect. 3.2 with $K_u = 6$ and significance level $\alpha = 0.1$ (results were similar for $\alpha = 0.05$ and $\alpha = 0.15$). For the proposed $\hat{S}_{\mathrm{JK}}$, we estimated the standard error (SE) using the bootstrap procedure (Chao et al. 2000) with 100 bootstrap replicates; SEs of the other estimators were derived according to (4). The resulting 2000 estimates along with their SEs were averaged to give the Estimate and $\hat{\sigma}$ in Web Tables 2–4 in the Supplementary Materials. The sample SE (denoted $\sigma$) and sample root mean squared error (RMSE) were also calculated. Moreover, we computed the percentage of the 2000 simulated datasets in which the 95 % confidence intervals covered the true number of shared species.
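For right-skewed richness estimators, a log-transformed interval in the style of Chao (1987) is a common way to build such confidence intervals. The Python sketch below shows the usual form of that construction, which treats the estimated number of unseen shared species, $\hat{S} - D$, as log-normally distributed. It is offered as an assumption about the general technique rather than as the exact computation used in this study, and the function name is hypothetical. The numbers plugged in are the Example 2 values reported in Sect. 6.2 ($D = 116$, $\hat{S}_{\mathrm{JK}} = 133$, SE = 9.6).

```python
import math


def log_transformed_ci(s_hat, se, d, z=1.96):
    """Log-transformed confidence interval for a richness estimate.

    s_hat: point estimate, se: its standard error, d: observed count.
    The lower limit can never fall below the observed count d.
    """
    unseen = s_hat - d
    if unseen <= 0:
        return float(d), float(d)  # nothing unseen is estimated
    ratio = math.exp(z * math.sqrt(math.log(1 + se ** 2 / unseen ** 2)))
    return d + unseen / ratio, d + unseen * ratio


lower, upper = log_transformed_ci(133, 9.6, 116)
```

Compared with a symmetric interval of the form estimate ± 1.96 SE, this interval is asymmetric, with a longer upper tail, and its lower limit respects the fact that at least $D$ shared species are known to exist.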
Since the distribution of a species richness estimator is generally skewed to the right, the log-transformed confidence interval suggested by Chao (1987) was adopted here.

In Fig. 3 we summarize the results of comparing the proposed jackknife estimators with the existing methods in terms of bias, RMSE, and coverage percentage of the 95 % confidence interval. With regard to the three existing methods, when the sampling proportion is very small (say 0.5 %), the sample coverage-based estimator $\hat{S}_{\mathrm{Cov}}$ outperforms $\hat{S}_{\mathrm{Lap}}$ and $\hat{S}_{\mathrm{Low}}$. However, the ranking of these estimators is reversed when the sampling proportion increases. As we can see from Fig. 3 and Web Tables 2–4, when the sampling proportion increases, the estimator $\hat{S}_{\mathrm{Cov}}$ approaches the target value $S = 209$ very slowly compared with the other methods. In this case, $\hat{S}_{\mathrm{Lap}}$ and $\hat{S}_{\mathrm{Low}}$ are preferred in terms of bias and RMSE. According to the empirical test, we also find that the difference between the Laplace approximation and the lower bound estimates is small, but the former usually has a smaller bias and RMSE than the latter. Nevertheless, all three methods have considerable negative bias, especially when the sampling proportion is less than 5 %. Given the same sampling proportion but different quadrat sizes, the magnitude of the bias increases slightly as the quadrat size becomes larger, except for $\hat{S}_{\mathrm{Cov}}$ and $\hat{S}_{\mathrm{JK}}$ at sampling proportions less than 3 %. A similar pattern is also found for the RMSE.

The jackknife estimators $\hat{S}_1, \ldots, \hat{S}_6$ show a clearly increasing trend with the order (Web Tables 2–4), where $\hat{S}_5$ has the smallest bias when the sampling proportion is less than 1 %, but this is accompanied by a rather large variance; $\hat{S}_4$ has the smallest RMSE in almost all cases when the sampling proportion is less than 10 %; $\hat{S}_2$ has the best performance among all considered methods when the sampling proportion

Fig. 2 Average frequency counts of $f_{ij}$ used in the jackknife estimators, where the data were generated from the Pasoh and Lambir plots for selected combinations of quadrat size (column) and sampling proportion $q$ (row)

Fig. 3 Bias (top panel), root mean squared error (middle panel), and coverage percentage of the 95 % confidence interval (bottom panel), plotted against sampling proportion, for estimators of the number of shared species ($\hat{S}_{\mathrm{Cov}}$, $\hat{S}_{\mathrm{JK}}$, $\hat{S}_{\mathrm{Lap}}$, $\hat{S}_{\mathrm{Low}}$, and $\hat{S}_1, \ldots, \hat{S}_6$) in the Pasoh and Lambir plots, where the sampling quadrat sizes are 5 × 5 m, 10 × 10 m, and 20 × 20 m

is more than 20 %. These results are as anticipated: a higher-order jackknife estimator is required to reduce bias when the data are sparse (i.e., when sampling effort is smaller). However, the variance of a higher-order estimator tends to increase as more terms are involved. In contrast, a lower-order jackknife estimator can work well when the data are rich (i.e., when sampling effort is greater).

In practical applications, it may be difficult to determine whether a particular dataset is rich enough to consider a jackknife estimate. Using the order selection procedure in Sect. 3.2, the order-selected jackknife estimator $\hat{S}_{\mathrm{JK}}$ performs well. The order selection procedure generally stops at $\hat{S}_2$ when $q \ge 10$ % and stops at $\hat{S}_3$ or $\hat{S}_4$ otherwise; see Web Tables 2–4. In comparison with the three existing methods, $\hat{S}_{\mathrm{JK}}$ has the smallest bias and the most reliable interval estimation, with the coverage percentage closest to the nominal level. Assessed using RMSE, $\hat{S}_{\mathrm{JK}}$ also performs favorably for small sampling proportions, while its RMSE is comparable with that of the other estimators for large sampling proportions, when the data are rich. Despite its generally superior performance, it is worth noting that the jackknife estimator still suffers a considerable negative bias when the sampling proportion is very small, and the coverage percentage of the interval estimation could reach as low as 75 %.

Finally, we note that the average of the bootstrap SEs is quite close to the sample SE; a simulation study (data not shown) found that the SE would be underestimated by more than 30 % if we used (4) and regarded the selected order as a fixed constant. As a consequence, the naïve use of (4) yielded an artificially narrow confidence interval and undermined the performance in terms of coverage percentage.
6 Case study

6.1 Example 1: Bird abundance data in two river estuaries

This illustrative example considers bird abundance data from two river estuaries, Ker-Ya River and Chung-Kang River, in Taiwan. A local wild bird society in Taiwan collected data weekly from April 1994 to March 1995; see Chao et al. (2000) for further details. There were 155 species (with 85,867 individuals) and 140 species (with 59,646 individuals) observed at the two estuaries. We found $D = 111$ bird species common to both estuaries and $f_{1+} = 10$, $f_{2+} = 2$, $f_{3+} = 6$, $f_{+1} = 15$, $f_{+2} = 7$, $f_{+3} = 3$, $f_{11} = 4$, $f_{12} = 2$, $f_{21} = 1$, and $f_{13} = f_{22} = f_{23} = 0$. The estimates $\hat{S}_k$ for $k = 1, \ldots, 4$ and the associated p-values for order selection are shown at the top of Table 2. At the significance level $\alpha = 0.1$, the order $k = 3$ was selected, so that $\hat{S}_{\mathrm{JK}} = \hat{S}_3$; its SE was obtained using the bootstrap procedure with 100 bootstrap replicates (Table 2). For comparison, $\hat{S}_{\mathrm{Cov}}$, $\hat{S}_{\mathrm{Lap}}$, and $\hat{S}_{\mathrm{Low}}$ were also evaluated. Though $\hat{S}_{\mathrm{Lap}}$ and $\hat{S}_{\mathrm{JK}}$ were very similar, $\hat{S}_{\mathrm{JK}}$ yielded a much smaller SE. In contrast, $\hat{S}_{\mathrm{Cov}}$ and $\hat{S}_{\mathrm{Low}}$ produced much smaller estimates.

6.2 Example 2: Hong Kong big bird race data

This example considers incidence-based data collected from a bird watch race; see Chao et al. (2006) for a description. The rules of the race were simple: record as

15 Environ Ecol Stat (2015) 22: Table 2 Estimated shared species for Example 1 (top) and 2 (bottom) Ŝ 1 Ŝ 2 Ŝ 3 Ŝ 4 Ŝ JK Ŝ Lap Ŝ Low Ŝ Cov Example 1: Abundance-based data Estimated S SE p value <10 4 < Example 2: Incidence-based data Estimated S SE p value <10 4 < The p values indicate the evidence against the null hypothesis of H 0,k 1 ; see Sect. 3.2 many bird species in Hong Kong territory as possible in a period of one month. Consequently, all watchers focused on species seen during the race regardless of abundance of observed species. At the end of the race, each team enumerated all the bird species they observed (together with some other watching information), resulting in incidence-based data. There were 19 participating teams in 1999 and 20 teams in During the race, 217 species were observed in 1999 and 220 species were observed in There were 116 species common to both years. In our notation, n 1 = 19, n 2 = 20, and D = 116; the relevant frequency counts were f 1+ = 6, f 2+ = 4, f 3+ = 5, f +1 = 10, f +2 = 7, f +3 = 4, f 11 = 1, f 12 = 3, f 13 = f 21 = f 23 = 0, and f 22 = 1. Based on these key statistics, results produced by various estimation methods are given at the bottom of Table 2. Jackknife estimates of orders 1 4 yield a range over With the selection order k at 2, we see that Ŝ JK is 133 and the SE is 9.6, slightly larger than results obtained by other estimators. 7 Discussion In this study, the two-sample jackknife procedure in Schechtman and Wang (2004) is extended and applied to estimate the number of shared species between two communities. In addition to developing a series of jackknife estimators for shared species richness, we also suggest a sequential testing criterion for selecting a proper order among these jackknife estimators to strike a reasonable trade-off between reducing bias and inflating variance. 
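The sequential testing criterion mentioned here can be sketched as a simple stopping rule. This is a minimal sketch, assuming a Burnham-and-Overton-style scheme in which the hypotheses H 0,1, H 0,2, ... are tested in turn and the procedure stops at the first one that is not rejected. The test statistic of Sect. 3.2 is not reproduced, so the p values are taken as given inputs; the function name `select_order` and the example p values are hypothetical and are not those of Table 2.

```python
def select_order(p_values, alpha=0.1):
    """Return the selected jackknife order.

    Assumption: p_values[k - 1] is the p value for testing H_{0,k}, i.e. the
    evidence against stopping at order k. The first non-rejected hypothesis
    fixes the order; if every hypothesis is rejected, the highest available
    order (one more than the number of tests) is returned.
    """
    for k, p in enumerate(p_values, start=1):
        if p > alpha:  # H_{0,k} not rejected: order k is deemed adequate
            return k
    return len(p_values) + 1


# Hypothetical p values: H_{0,1} and H_{0,2} rejected, H_{0,3} retained.
print(select_order([1e-5, 0.02, 0.40], alpha=0.1))  # prints 3
```

A larger α makes rejection easier and therefore tends to select higher orders, which trades lower bias for higher variance.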
The performance of the proposed and existing estimators was evaluated using an empirical study and two real datasets of avian communities. In the empirical study, we found that the proposed estimator Ŝ JK possesses advantageous properties compared with the other methods, especially for sampling fractions ranging from 0.5 to 20 %. To confirm our results, an additional simulation study based on six postulated communities with low sampling rates was also carried out, and our findings are summarized in the Supplementary Materials, where the performance of the shared

species estimators was similar to what we observed in the empirical data when neither community is a homogeneous population. It is worth noting that the second- and fourth-order jackknife estimators, Ŝ 2 and Ŝ 4, could have better performance than Ŝ JK in terms of bias, RMSE, and coverage percentage of the 95 % confidence interval. For a quick estimate of S without an order selection step, Ŝ 2 is recommended when the data are rich and Ŝ 4 when the data are sparse. Similar recommendations apply in species richness and population size estimation, where the first- and second-order jackknife estimators are frequently suggested in applications (Heltshe and Forrester 1983; Hellmann and Fowler 1999; Chao 2005). It is further worth remarking that rare species (observed only once or twice) convey the most information about the number of unseen species in the sample; Eren et al. (2012) and Chiu et al. (2014) also underscored the importance of low observed frequencies for estimator performance. The empirical study reveals that the proposed estimator Ŝ JK can yield interval estimates that are much more reasonable than some typical methods when the sample proportion is small (q ≤ 3 %), though this method still suffers from considerable negative bias in this case. Seeking a more satisfactory estimator in this setting is a challenge and is certainly worth pursuing. In a sense, this work at least provides a possible framework for doing so. In particular, the jackknife method appears promising for addressing this problem. For example, as shown in the empirical study, the fifth-order jackknife estimator Ŝ 5 performed well in terms of bias when q ≤ 1 %; unfortunately, it also produced a large variance. Based on our findings, further research may reduce the variance of a higher-order jackknife estimator and/or develop an alternative order selection procedure.
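The quick estimates Ŝ 2 and Ŝ 4 recommended above can be computed directly from the frequency counts. The sketch below uses the large-sample forms listed in Corollary 1 of the Appendix; the minus signs in the fourth-order formula were lost in this transcription and are inferred here from the coefficient recursion, so treat them as an assumption. The function name and argument layout are hypothetical. The counts are those quoted for Example 2 (Sect. 6.2).

```python
def quick_estimates(d_obs, f_row, f_col, f_joint):
    """Large-sample second- and fourth-order jackknife estimates of S.

    d_obs   : observed number of shared species D
    f_row   : {t: f_{t+}}, f_col: {u: f_{+u}}, f_joint: {(t, u): f_{tu}}
    Missing keys are treated as zero counts.
    """
    f1p, f2p = f_row.get(1, 0), f_row.get(2, 0)
    fp1, fp2 = f_col.get(1, 0), f_col.get(2, 0)

    def f(t, u):
        return f_joint.get((t, u), 0)

    # S2 = D + f_{1+} + f_{+1} + f_{11}
    s2 = d_obs + f1p + fp1 + f(1, 1)
    # S4 = D + 3f_{1+} - 2f_{2+} + 3f_{+1} - 2f_{+2}
    #        + 9f_{11} - 6f_{12} - 6f_{21} + 4f_{22}   (signs inferred)
    s4 = (d_obs + 3 * f1p - 2 * f2p + 3 * fp1 - 2 * fp2
          + 9 * f(1, 1) - 6 * f(1, 2) - 6 * f(2, 1) + 4 * f(2, 2))
    return s2, s4


# Counts quoted for Example 2 (Hong Kong big bird race).
s2, s4 = quick_estimates(
    116,
    {1: 6, 2: 4, 3: 5},
    {1: 10, 2: 7, 3: 4},
    {(1, 1): 1, (1, 2): 3, (2, 2): 1},
)
print(s2, s4)  # prints 133 137
```

The second-order value, 116 + 6 + 10 + 1 = 133, agrees with the Ŝ JK reported for Example 2, where order k = 2 was selected; the fourth-order value is shown only to illustrate the formula.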
Burnham and Overton (1978) proposed a sequential criterion to select the order from a series of jackknife estimators to estimate a population size. The selection criterion considered in our study is similar to theirs with one distinction regarding the variance estimation. To estimate the variance of the resulting estimator based on the sequential testing criterion, Burnham and Overton (1978) did not take the randomness of the selection order into account and instead only calculated the asymptotic variance of the selected estimator with a fixed order from the sequential test. The resulting variance is therefore underestimated, as we mentioned earlier. A non-parametric bootstrap procedure (Chao et al. 2000) is useful to overcome this drawback and thus improve the coverage percentage of such an estimator. In principle, as suggested by a referee, the selected order may reflect sufficiency of the sampling information, e.g., the data are sparse if the selected order k > 2 and vice versa. Although this seems reasonable, this suggestion warrants further investigation. More relevant to the overarching goal of this study would be to develop a stopping rule for obtaining an estimate with a desired accuracy (Yip et al. 2003) or an extension that directly accounts for the cost of sampling (Rasmussen and Starr 1979; Chao et al. 1993). It is straightforward to extend our method to estimate the number of shared species in multiple communities (Pan et al. 2009). In the Supplementary Materials, an algorithm describes in detail the sequence of jackknife estimators in the case of three communities. Several first-order jackknife estimators have been explicitly formulated

and tabulated in the Supplementary Materials. For the case of more than three communities, jackknife estimators can be developed in a similar manner.

Acknowledgments The authors are grateful to Professor Fangliang He for his valuable discussions and providing the Lambir forest plot data. The authors thank the referees and editor for their useful comments. We also thank Roman Gulati for his generous editing assistance. This work was supported by the Ministry of Science and Technology of Taiwan.

8 Appendix: A general result of the jackknife estimators Ŝ k

Define a 2-dimensional array of coefficients $d_{t,u}$ as:

$$
\begin{aligned}
d_{1,1} &= 1; \\
d_{t,t} &= -t\, d_{t-1,t-1}, && t \ge 2; \\
d_{t,1} &= 2^{t} - 1, && t \ge 2; \\
d_{t,u} &= d_{t-1,u} + u\,\bigl(d_{t-1,u} - d_{t-1,u-1}\bigr), && t \ge 2 \text{ and } 2 \le u < t; \\
d_{t,u} &= 0 && \text{otherwise.}
\end{aligned}
$$

These coefficients are used to simplify the expressions of jackknife estimators. The formulae can be summarized with the following Theorem.

Theorem 1 For each nonnegative integer $\nu$, we have:

$$
\hat{S}_{2\nu,X} = D + \sum_{t=1}^{\nu+1} d_{\nu+1,t}\,\frac{n_1 - t}{n_1}\, f_{t+}
+ \sum_{u=1}^{\nu} d_{\nu,u}\,\frac{n_2 - u}{n_2}\, f_{+u}
+ \sum_{t=1}^{\nu+1}\sum_{u=1}^{\nu} d_{\nu+1,t}\, d_{\nu,u}\,\frac{(n_1 - t)(n_2 - u)}{n_1 n_2}\, f_{tu} \tag{7}
$$

and

$$
\hat{S}_{2\nu,Y} = D + \sum_{t=1}^{\nu} d_{\nu,t}\,\frac{n_1 - t}{n_1}\, f_{t+}
+ \sum_{u=1}^{\nu+1} d_{\nu+1,u}\,\frac{n_2 - u}{n_2}\, f_{+u}
+ \sum_{t=1}^{\nu}\sum_{u=1}^{\nu+1} d_{\nu,t}\, d_{\nu+1,u}\,\frac{(n_1 - t)(n_2 - u)}{n_1 n_2}\, f_{tu}. \tag{8}
$$

Therefore, $\hat{S}_{2\nu+1} = (n_1 \hat{S}_{2\nu,X} + n_2 \hat{S}_{2\nu,Y})/(n_1 + n_2)$ is a linear combination of the frequencies $f_{tu}$. Furthermore, the $(2\nu + 2)$-th order jackknife estimator is:

$$
\hat{S}_{2\nu+2} = D + \sum_{t=1}^{\nu+1} d_{\nu+1,t}\,\frac{n_1 - t}{n_1}\, f_{t+}
+ \sum_{u=1}^{\nu+1} d_{\nu+1,u}\,\frac{n_2 - u}{n_2}\, f_{+u}
+ \sum_{t=1}^{\nu+1}\sum_{u=1}^{\nu+1} d_{\nu+1,t}\, d_{\nu+1,u}\,\frac{(n_1 - t)(n_2 - u)}{n_1 n_2}\, f_{tu}. \tag{9}
$$
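The coefficient recursion and Eq. (9) can be sketched in a few lines of code. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the minus signs in the recursion (which did not survive in this transcription) are inferred so that the resulting coefficients reproduce those appearing in Corollary 1, namely 3 and 2 in absolute value at the second level and 7 at the third.

```python
def d_coefficients(max_t):
    """Return {t: {u: d_{t,u}}} for 1 <= u <= t <= max_t.

    Recursion (signs inferred): d_{1,1} = 1, d_{t,1} = 2^t - 1,
    d_{t,t} = -t * d_{t-1,t-1},
    d_{t,u} = d_{t-1,u} + u * (d_{t-1,u} - d_{t-1,u-1}) for 2 <= u < t.
    """
    d = {1: {1: 1}}
    for t in range(2, max_t + 1):
        prev = d[t - 1]
        row = {1: 2 ** t - 1}
        for u in range(2, t):
            row[u] = prev[u] + u * (prev[u] - prev[u - 1])
        row[t] = -t * prev[t - 1]
        d[t] = row
    return d


def s_even(nu, d_obs, n1, n2, f_row, f_col, f_joint):
    """Jackknife estimate of order 2*nu + 2 following the form of Eq. (9)."""
    d = d_coefficients(nu + 1)[nu + 1]
    est = d_obs
    for t, dt in d.items():          # community I marginal counts f_{t+}
        est += dt * (n1 - t) / n1 * f_row.get(t, 0)
    for u, du in d.items():          # community II marginal counts f_{+u}
        est += du * (n2 - u) / n2 * f_col.get(u, 0)
    for t, dt in d.items():          # joint counts f_{tu}
        for u, du in d.items():
            weight = (n1 - t) * (n2 - u) / (n1 * n2)
            est += dt * du * weight * f_joint.get((t, u), 0)
    return est


coeffs = d_coefficients(3)
print(coeffs[2])  # {1: 3, 2: -2}
print(coeffs[3])  # {1: 7, 2: -12, 3: 6}
```

With very large n1 and n2 the weights tend to one, so `s_even(0, ...)` approaches D + f_{1+} + f_{+1} + f_{11}; this gives a convenient consistency check against the large-sample forms in the corollary.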

The proof is established by mathematical induction and is shown in the Supplementary Materials due to lengthy algebra. We can further simplify the formulae in the next Corollary.

Corollary 1 When the sample sizes $n_1$ and $n_2$ are sufficiently large, define $\lambda_j = (n_j - h)/(n_1 + n_2)$ for any finite number $h$ and $j = 1, 2$. Asymptotically, the explicit forms of the jackknife estimators $\hat{S}_k$ for $k = 1, \ldots, 6$, are as follows:

$$
\begin{aligned}
\hat{S}_1 ={}& D + \lambda_1 f_{1+} + \lambda_2 f_{+1}; \\
\hat{S}_2 ={}& D + f_{1+} + f_{+1} + f_{11}; \\
\hat{S}_3 ={}& D + (1 + 2\lambda_1) f_{1+} - 2\lambda_1 f_{2+} + (1 + 2\lambda_2) f_{+1} - 2\lambda_2 f_{+2} \\
& + 3 f_{11} - 2\lambda_2 f_{12} - 2\lambda_1 f_{21}; \\
\hat{S}_4 ={}& D + 3 f_{1+} - 2 f_{2+} + 3 f_{+1} - 2 f_{+2} + 9 f_{11} - 6 f_{12} - 6 f_{21} + 4 f_{22}; \\
\hat{S}_5 ={}& D + (3 + 4\lambda_1) f_{1+} - 2(1 + 5\lambda_1) f_{2+} + 6\lambda_1 f_{3+} \\
& + (3 + 4\lambda_2) f_{+1} - 2(1 + 5\lambda_2) f_{+2} + 6\lambda_2 f_{+3} \\
& + 21 f_{11} + (22\lambda_1 - 36) f_{12} + (22\lambda_2 - 36) f_{21} + 24 f_{22} \\
& + 18\lambda_1 f_{31} + 18\lambda_2 f_{13} - 12\lambda_1 f_{32} - 12\lambda_2 f_{23}; \\
\hat{S}_6 ={}& D + 7 f_{1+} - 12 f_{2+} + 6 f_{3+} + 7 f_{+1} - 12 f_{+2} + 6 f_{+3} \\
& + 49 f_{11} - 84 f_{12} - 84 f_{21} + 144 f_{22} + 42 f_{13} + 42 f_{31} \\
& - 72 f_{23} - 72 f_{32} + 36 f_{33}.
\end{aligned}
$$

References

Amstrup SC, McDonald TL, Manly BF (eds) (2010) Handbook of capture-recapture analysis. Princeton University Press, Princeton
Arvesen JN (1969) Jackknifing U-statistics. Ann Math Stat 40:
Burnham KP, Overton WS (1978) Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 65(3):
Burnham KP, Overton WS (1979) Robust estimation of population size when capture probabilities vary among animals. Ecology 60(5):
Chao A (1987) Estimating the population size for capture-recapture data with unequal catchability. Biometrics 43:
Chao A (2005) Species estimation and applications. In: Balakrishnan N, Read CB, Vidakovic B (eds) Encyclopedia of statistical sciences, vol 12, 2nd edn. Wiley, New York, pp
Chao A, Hwang W-H, Chen Y-C, Kuo C-Y (2000) Estimating the number of shared species in two communities. Stat Sin 10:
Chao A, Jost L, Chiang S-C, Jiang Y-H, Chazdon R (2008) A two-stage probabilistic approach to multiple-community similarity indices. Biometrics 64:
Chao A, Lee S-M (1992) Estimating the number of classes via sample coverage. J Am Stat Assoc 87:
Chao A, Ma M-C, Yang MCK (1993) Stopping rules and estimation for recapture debugging with unequal failure rates. Biometrika 80:
Chao A, Shen T-J (2010) Program SPADE (Species Prediction And Diversity Estimation). Program and User's Guide published at
Chao A, Shen T-J, Hwang W-H (2006) Application of Laplace's boundary-mode approximations to estimate species and shared species richness. Aust N Z J Stat 48:
Chiarucci A, Enright NJ, Perry GLW, Miller BP, Lamont BB (2003) Performance of nonparametric species richness estimators in a high diversity plant community. Divers Distrib 9:
Chiu CH, Wang YT, Walther BA, Chao A (2014) An improved nonparametric lower bound of species richness via a modified Good-Turing frequency formula. Biometrics 70(3):
Colwell RK, Coddington JA (1994) Estimating terrestrial biodiversity through extrapolation. Philos Trans R Soc Lond B 345:
Colwell RK, Elsensohn JE (2014) EstimateS turns 20: statistical estimation of species richness and shared species from samples, with non-parametric extrapolation. Ecography 37:
Condit R, Pitman N, Leigh EG Jr, Chave J, Terborgh J, Foster RB, Núñez P, Aguilar S, Valencia R, Villa G, Muller-Landau HC, Losos E, Hubbell SP (2002) Beta-diversity in tropical forest trees. Science 295:
Cormack RM (1989) Log-linear models for capture-recapture. Biometrics
Darroch JN, Ratcliff D (1980) A note on capture-recapture estimation. Biometrics 36:
Eren MI, Chao A, Hwang WH, Colwell RK (2012) Estimating the richness of a population when the maximum number of classes is fixed: a nonparametric solution to an archaeological problem. PLoS One 7(5):e34179
Esty WW (1985) Estimation of the number of classes in a population and the coverage of a sample. Math Stat 10:41–50
Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40:
Gotelli NJ, Colwell RK (2009) Estimating species richness. In: Magurran A, McGill B (eds) Frontiers in measuring biodiversity. Oxford University Press, New York
Goutis C, Casella G (1999) Explaining the saddle point approximation. Am Stat 53:
Heltshe JF, Forrester NE (1983) Estimating species using the jackknife procedure. Biometrics 39:1–11
Hellmann JJ, Fowler GW (1999) Bias, precision, and accuracy of four measures of species richness. Ecol Appl 9:
Hwang WH, Huang SY (2003) Estimation in capture-recapture models when covariates are subject to measurement errors. Biometrics 59:
Krishnamani R, Kumar A, Harte J (2004) Estimating species richness at large spatial scales using data from discrete plots. Ecography 27:
Magurran AE (2004) Measuring biological diversity. Blackwell, Oxford
Ostling A, Harte J, Green J, Kinzig A (2003) A community-level fractal property produces power-law species-area relationships. Oikos 103:
Palmer MW (1990) The estimation of species richness by extrapolation. Ecology 71:
Palmer MW (1991) Estimating species richness: the second-order jackknife reconsidered. Ecology 72:
Pan H-Y, Chao A, Foissner W (2009) A nonparametric lower bound for the number of species shared by multiple communities. J Agric Biol Environ Stat 14:
Quenouille MH (1949) Approximate tests of correlation in time series. J R Stat Soc Ser B 11:68–84
Rasmussen SL, Starr N (1979) Optimal and adaptive stopping in the search for new species. J Am Stat Assoc 74:
Schechtman E, Wang S (2004) Jackknifing two-sample statistics. J Stat Plan Inference 119:
Schloss PD, Handelsman J (2006) Introducing SONS, a tool for OTU-based comparisons of membership and structure between microbial communities. Appl Environ Microbiol 72:
Shao J, Tu D (1995) The jackknife and bootstrap. Springer, New York
Tjørve E, Tjørve KMC (2008) The species-area relationship, self-similarity, and the true meaning of the z-value. Ecology 89:
Walther BA, Moore JL (2005) The concepts of bias, precision and accuracy, and their use in testing the performance of species richness estimators, with a literature review of estimator performance. Ecography 28:
Walther BA, Morand S (1998) Comparative performance of species richness estimation methods. Parasitology 116:
Williams VL, Witkowski ET, Balkwill K (2007) The use of incidence-based species richness estimators, species accumulation curves and similarity measures to appraise ethnobotanical inventories from South Africa. Biodivers Conserv 16:
Yip PS, Fang X, Zhou Y, Wang Y (2003) Sequential procedure for fixed accuracy estimation of the population size in recapture sampling. Aust N Z J Stat 45:
Yue JC, Clayton MK (2012) Sequential sampling in the search for new shared species. J Stat Plan Inference 142:

Chia-Jui Chuang received the Ph.D. degree in mathematics from the National Chung Hsing University, Taiwan in He is now a research fellow in the National Health Research Institutes, Taiwan. His research interests are in ecological statistics and public health. Tsung-Jen Shen received the Ph.D. degree in statistics from the National Tsing Hua University, Taiwan in Since 2010, he has been an associate professor at the National Chung Hsing University. His research interests are in developing statistical methods to deal with ecological issues, including alpha and beta diversity indices estimation, species richness prediction and so forth. Wen-Han Hwang received the Ph.D. degree in statistics from the National Tsing Hua University, Taiwan in Since 2012, he has been a professor at the National Chung Hsing University. His research interests are in ecological statistics, measurement error analysis and statistical inference.

CHAO, JACKKNIFE AND BOOTSTRAP ESTIMATORS OF SPECIES RICHNESS

CHAO, JACKKNIFE AND BOOTSTRAP ESTIMATORS OF SPECIES RICHNESS IJAMAA, Vol. 12, No. 1, (January-June 2017), pp. 7-15 Serials Publications ISSN: 0973-3868 CHAO, JACKKNIFE AND BOOTSTRAP ESTIMATORS OF SPECIES RICHNESS CHAVAN KR. SARMAH ABSTRACT: The species richness

More information

Webinar Session 1. Introduction to Modern Methods for Analyzing Capture- Recapture Data: Closed Populations 1

Webinar Session 1. Introduction to Modern Methods for Analyzing Capture- Recapture Data: Closed Populations 1 Webinar Session 1. Introduction to Modern Methods for Analyzing Capture- Recapture Data: Closed Populations 1 b y Bryan F.J. Manly Western Ecosystems Technology Inc. Cheyenne, Wyoming bmanly@west-inc.com

More information

APPENDIX E: Estimating diversity profile based on the proposed RAD estimator (for abundance data).

APPENDIX E: Estimating diversity profile based on the proposed RAD estimator (for abundance data). Anne Chao, T. C. Hsieh, Robin L. Chazdon, Robert K. Colwell, and Nicholas J. Gotelli. 2015. Unveiling the species-rank abundance distribution by generalizing the Good-Turing sample coverage theory. Ecology

More information

A Nonparametric Estimator of Species Overlap

A Nonparametric Estimator of Species Overlap A Nonparametric Estimator of Species Overlap Jack C. Yue 1, Murray K. Clayton 2, and Feng-Chang Lin 1 1 Department of Statistics, National Chengchi University, Taipei, Taiwan 11623, R.O.C. and 2 Department

More information

Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample

Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample Environmental and Ecological Statistics 10, 429±443, 2003 Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample ANNE CHAO and TSUNG-JEN SHEN Institute of Statistics,

More information

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption

Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption Application of Parametric Homogeneity of Variances Tests under Violation of Classical Assumption Alisa A. Gorbunova and Boris Yu. Lemeshko Novosibirsk State Technical University Department of Applied Mathematics,

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

AN INCIDENCE-BASED RICHNESS ESTIMATOR FOR QUADRATS SAMPLED WITHOUT REPLACEMENT

AN INCIDENCE-BASED RICHNESS ESTIMATOR FOR QUADRATS SAMPLED WITHOUT REPLACEMENT Ecology, 89(7), 008, pp. 05 060 Ó 008 by the Ecological Society of America AN INCIDENCE-BASED RICHNESS ESTIMATOR FOR QUADRATS SAMPLED WITHOUT REPLACEMENT TSUNG-JEN SHEN 1, AND FANGLIANG HE 1 Department

More information

Plugin Confidence Intervals in Discrete Distributions

Plugin Confidence Intervals in Discrete Distributions Plugin Confidence Intervals in Discrete Distributions T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania Philadelphia, PA 19104 Abstract The standard Wald interval is widely

More information

Package SPECIES. R topics documented: April 23, Type Package. Title Statistical package for species richness estimation. Version 1.

Package SPECIES. R topics documented: April 23, Type Package. Title Statistical package for species richness estimation. Version 1. Package SPECIES April 23, 2011 Type Package Title Statistical package for species richness estimation Version 1.0 Date 2010-01-24 Author Ji-Ping Wang, Maintainer Ji-Ping Wang

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software April 2011, Volume 40, Issue 9. http://www.jstatsoft.org/ SPECIES: An R Package for Species Richness Estimation Ji-Ping Wang Northwestern University Abstract We introduce

More information

A CONDITIONALLY-UNBIASED ESTIMATOR OF POPULATION SIZE BASED ON PLANT-CAPTURE IN CONTINUOUS TIME. I.B.J. Goudie and J. Ashbridge

A CONDITIONALLY-UNBIASED ESTIMATOR OF POPULATION SIZE BASED ON PLANT-CAPTURE IN CONTINUOUS TIME. I.B.J. Goudie and J. Ashbridge This is an electronic version of an article published in Communications in Statistics Theory and Methods, 1532-415X, Volume 29, Issue 11, 2000, Pages 2605-2619. The published article is available online

More information

Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size

Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size Ecology, 93(12), 2012, pp. 2533 2547 Ó 2012 by the Ecological Society of America Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size ANNE CHAO 1,3 AND LOU

More information

Ecological Archives E A2

Ecological Archives E A2 Ecological Archives E091-147-A2 Ilyas Siddique, Ima Célia Guimarães Vieira, Susanne Schmidt, David Lamb, Cláudio José Reis Carvalho, Ricardo de Oliveira Figueiredo, Simon Blomberg, Eric A. Davidson. Year.

More information

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk

Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:

More information

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators

A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Statistics Preprints Statistics -00 A Simulation Study on Confidence Interval Procedures of Some Mean Cumulative Function Estimators Jianying Zuo Iowa State University, jiyizu@iastate.edu William Q. Meeker

More information

Supplement: Beta Diversity & The End Ordovician Extinctions. Appendix for: Response of beta diversity to pulses of Ordovician-Silurian extinction

Supplement: Beta Diversity & The End Ordovician Extinctions. Appendix for: Response of beta diversity to pulses of Ordovician-Silurian extinction Appendix for: Response of beta diversity to pulses of Ordovician-Silurian extinction Collection- and formation-based sampling biases within the original dataset Both numbers of occurrences and numbers

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

MATH Notebook 3 Spring 2018

MATH Notebook 3 Spring 2018 MATH448001 Notebook 3 Spring 2018 prepared by Professor Jenny Baglivo c Copyright 2010 2018 by Jenny A. Baglivo. All Rights Reserved. 3 MATH448001 Notebook 3 3 3.1 One Way Layout........................................

More information

Multiple regression and inference in ecology and conservation biology: further comments on identifying important predictor variables

Multiple regression and inference in ecology and conservation biology: further comments on identifying important predictor variables Biodiversity and Conservation 11: 1397 1401, 2002. 2002 Kluwer Academic Publishers. Printed in the Netherlands. Multiple regression and inference in ecology and conservation biology: further comments on

More information

Package breakaway. R topics documented: March 30, 2016

Package breakaway. R topics documented: March 30, 2016 Title Species Richness Estimation and Modeling Version 3.0 Date 2016-03-29 Author and John Bunge Maintainer Package breakaway March 30, 2016 Species richness estimation is an important

More information

A new multivariate CUSUM chart using principal components with a revision of Crosier's chart

A new multivariate CUSUM chart using principal components with a revision of Crosier's chart Title A new multivariate CUSUM chart using principal components with a revision of Crosier's chart Author(s) Chen, J; YANG, H; Yao, JJ Citation Communications in Statistics: Simulation and Computation,

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Sample size calculations for logistic and Poisson regression models

Sample size calculations for logistic and Poisson regression models Biometrika (2), 88, 4, pp. 93 99 2 Biometrika Trust Printed in Great Britain Sample size calculations for logistic and Poisson regression models BY GWOWEN SHIEH Department of Management Science, National

More information

Estimation and sample size calculations for correlated binary error rates of biometric identification devices

Estimation and sample size calculations for correlated binary error rates of biometric identification devices Estimation and sample size calculations for correlated binary error rates of biometric identification devices Michael E. Schuckers,11 Valentine Hall, Department of Mathematics Saint Lawrence University,

More information

Bootstrap Tests: How Many Bootstraps?

Bootstrap Tests: How Many Bootstraps? Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002

More information

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of

More information

Weizhen Wang & Zhongzhan Zhang

Weizhen Wang & Zhongzhan Zhang Asymptotic infimum coverage probability for interval estimation of proportions Weizhen Wang & Zhongzhan Zhang Metrika International Journal for Theoretical and Applied Statistics ISSN 006-1335 Volume 77

More information

Distribution Theory. Comparison Between Two Quantiles: The Normal and Exponential Cases

Distribution Theory. Comparison Between Two Quantiles: The Normal and Exponential Cases Communications in Statistics Simulation and Computation, 34: 43 5, 005 Copyright Taylor & Francis, Inc. ISSN: 0361-0918 print/153-4141 online DOI: 10.1081/SAC-00055639 Distribution Theory Comparison Between

More information

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS Journal of Biopharmaceutical Statistics, 18: 1184 1196, 2008 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400802369053 SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE

More information

Non-independence in Statistical Tests for Discrete Cross-species Data

Non-independence in Statistical Tests for Discrete Cross-species Data J. theor. Biol. (1997) 188, 507514 Non-independence in Statistical Tests for Discrete Cross-species Data ALAN GRAFEN* AND MARK RIDLEY * St. John s College, Oxford OX1 3JP, and the Department of Zoology,

More information

Estimation of the Conditional Variance in Paired Experiments

Estimation of the Conditional Variance in Paired Experiments Estimation of the Conditional Variance in Paired Experiments Alberto Abadie & Guido W. Imbens Harvard University and BER June 008 Abstract In paired randomized experiments units are grouped in pairs, often

More information

Efficient Robbins-Monro Procedure for Binary Data

Efficient Robbins-Monro Procedure for Binary Data Efficient Robbins-Monro Procedure for Binary Data V. Roshan Joseph School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205, USA roshan@isye.gatech.edu SUMMARY

More information

Testing Homogeneity Of A Large Data Set By Bootstrapping

Testing Homogeneity Of A Large Data Set By Bootstrapping Testing Homogeneity Of A Large Data Set By Bootstrapping 1 Morimune, K and 2 Hoshino, Y 1 Graduate School of Economics, Kyoto University Yoshida Honcho Sakyo Kyoto 606-8501, Japan. E-Mail: morimune@econ.kyoto-u.ac.jp

More information

The exact bootstrap method shown on the example of the mean and variance estimation

The exact bootstrap method shown on the example of the mean and variance estimation Comput Stat (2013) 28:1061 1077 DOI 10.1007/s00180-012-0350-0 ORIGINAL PAPER The exact bootstrap method shown on the example of the mean and variance estimation Joanna Kisielinska Received: 21 May 2011

More information

PRE-TEST ESTIMATION OF THE REGRESSION SCALE PARAMETER WITH MULTIVARIATE STUDENT-t ERRORS AND INDEPENDENT SUB-SAMPLES

PRE-TEST ESTIMATION OF THE REGRESSION SCALE PARAMETER WITH MULTIVARIATE STUDENT-t ERRORS AND INDEPENDENT SUB-SAMPLES Sankhyā : The Indian Journal of Statistics 1994, Volume, Series B, Pt.3, pp. 334 343 PRE-TEST ESTIMATION OF THE REGRESSION SCALE PARAMETER WITH MULTIVARIATE STUDENT-t ERRORS AND INDEPENDENT SUB-SAMPLES

More information

Glossary. Appendix G AAG-SAM APP G

Glossary. Appendix G AAG-SAM APP G Appendix G Glossary Glossary 159 G.1 This glossary summarizes definitions of the terms related to audit sampling used in this guide. It does not contain definitions of common audit terms. Related terms

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Change-point models and performance measures for sequential change detection
