Issues of Cost and Efficiency in the Design of Reliability Studies


Biometrics 59, December 2003

M. M. Shoukri (1,2), M. H. Asyali (1), and S. D. Walter (3)

(1) Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Center, P.O. Box 3354, Riyadh, Saudi Arabia
(2) Department of Epidemiology and Biostatistics, University of Western Ontario, London, Ontario, Canada
(3) Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
Email: shoukri@kfshrc.edu.sa

Summary. Reliability of continuous and dichotomous responses is usually assessed by means of the intraclass correlation coefficient (ICC). We derive the optimal allocation of the number of subjects k and the number of repeated measurements n that minimizes the variance of the estimated ICC. Cost constraints are discussed for the case of normally distributed responses. Tables showing optimal choices of k and n are given, along with guidelines for the design of reliability studies in light of our results and those reported by others.

Key words: Cost function; Lagrange multiplier; Reliability index; Sample size.

1. Introduction

Interobserver reliability studies are conducted to investigate the reproducibility and level of agreement of assessments made by several raters. Typically, several raters score each of a series of subjects and their assessments are then compared. For a general treatment of interobserver agreement, for both continuous and binary assessments, we refer the reader to the reviews by Dunn (1992), Shoukri (1999, 2000), Shoukri and Asyali (2002), and the references therein. An important aspect of the design of an interobserver reliability study is the determination of sample size.
Following the notation of Walter, Eliasziw, and Donner (1998), we suppose that n observations are made on each of k subjects, and that the jth observation $Y_{ij}$ for subject i (i = 1, 2, ..., k; j = 1, 2, ..., n) is

$$Y_{ij} = \mu + s_i + e_{ij}, \tag{1}$$

where the random subject effects $\{s_i\}$ are normally distributed with mean 0 and variance $\sigma_s^2$, or $N(0, \sigma_s^2)$, the measurement errors $\{e_{ij}\}$ are $N(0, \sigma_e^2)$, and the $\{s_i\}$ and $\{e_{ij}\}$ terms are independent. We assume the subjects are randomly drawn from some population of interest. Regardless of whether the assessments are binary or continuous scale measurements, an index of reliability should distinguish the within-subject variation from the between-subjects variation. A widely recognized index of reliability, which possesses this property, is the intraclass correlation coefficient (ICC), defined as $\rho = \sigma_s^2/(\sigma_s^2 + \sigma_e^2)$. Therefore, ρ is the proportion of the total variation that is associated with between-subject variation.

A frequently adopted design (for interrater reliability) is one in which the k subjects are each rated by the same n raters. However, a similar approach can also be adopted for test-retest reliability when a single subject is assessed repeatedly on each of several occasions, or when replicates are taken from different subjects by a single judge on different occasions (Haggard, 1958). In each of these cases, for both continuous and binary assessments, ρ can be estimated from an appropriate one-way ANOVA (Fisher, 1925; Elston, 1977). The question to be addressed here concerns the optimal combination (n, k) for the design that permits the most precise estimation of ρ.

For a fixed number of replicates, Donner and Eliasziw (1987) have provided contours of exact power for selected values of k and n. Eliasziw and Donner (1987) used these power results to identify optimal designs that minimize the study costs. Walter et al.
(1998) developed an approximation that allows the calculation of the required number of subjects, k, when the number of replicates n is fixed. The approximation avoids the intensive numerical work entailed in the exact method. We note, however, that reliability studies are often designed primarily to estimate the level of observer agreement, and their results are then reported in terms of estimates of agreement, rather than using hypothesis testing. Power considerations in the design of reliability studies require specification of the hypotheses to be tested; rejection of the null value in a study does not provide useful information, because the investigator needs to know more than the fact that the observed level of reliability is unlikely to be due to chance (Giraudeau and Mary, 2001). Given that interrater reliability studies emphasize estimation, it is natural to base their sample size calculations on the attainment of a specified level of precision in the estimation of ρ. For example, Bonett (2002) calculated the sample

size required to achieve a prescribed expected width for the confidence interval on ρ. This is similar to Donner's approach (1999), which focused on estimating the number of subjects needed to construct a confidence interval with fixed width on the intraclass kappa, for the case of a dichotomous outcome measure.

In the following development, we assume that the investigator is interested in the number of replicates, n, per subject, so that the variance of the estimator for ρ is minimized, given that the total number of measurements is constrained to be N = nk a priori. In Section 2, we provide background information on the estimation of the ICC and a variance expression for its moment estimator. In Section 3, we use calculus of optimization to find the optimal combination (n, k) that minimizes the variance of the estimate of the ICC when the response variable is continuously distributed. We also examine the situation where both k and n are determined in such a way that the variance of the estimator for ρ is minimized subject to cost constraints. We devote Section 4 to the issue of optimal design when the assessments are binary. Further discussion is presented in Section 5.

2. Background

Under the random effects model given by (1), the variance component estimator for ρ is given as

$$r = (\mathrm{MSB} - \mathrm{MSW})/\{\mathrm{MSB} + (n-1)\mathrm{MSW}\},$$

where MSB and MSW are, respectively, the between-subject and within-subject mean squares, obtained from the familiar one-way ANOVA. The derivation of r requires no assumption concerning the normality of the $Y_{ij}$. However, if we do assume that the $Y_{ij}$ are normally distributed, we can use an approximate expression for the large sample variance of r (Fisher, 1925; Swiger et al., 1964), as follows:

$$\mathrm{var}(r) = V(k, n, \rho) = 2(1-\rho)^2\{1+(n-1)\rho\}^2/\{kn(n-1)\}. \tag{2}$$

Note that an approximate 100(1 − α)% confidence interval on ρ can be given as $r \pm z_{1-\alpha/2}\{V(k, n, \rho)\}^{1/2}$, where $z_{1-\alpha/2}$ is the 100(1 − α/2) percentile of the standard normal distribution.
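The estimator r, the variance approximation (2), and the normal-theory interval can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names (`icc_anova`, `var_r`, `icc_ci`) are hypothetical, the data are assumed to be balanced (every subject has exactly n measurements), and the interval plugs the point estimate r into (2) in place of the unknown ρ.

```python
import math
from statistics import NormalDist


def icc_anova(data):
    """One-way ANOVA estimate r of the ICC.

    data: list of k lists, each holding the n measurements of one subject
    (balanced design assumed).
    """
    k, n = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (k * n)
    subject_means = [sum(row) / n for row in data]
    # Between-subject and within-subject mean squares.
    msb = n * sum((m - grand_mean) ** 2 for m in subject_means) / (k - 1)
    msw = sum((y - m) ** 2 for row, m in zip(data, subject_means) for y in row) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)


def var_r(k, n, rho):
    """Large-sample variance of r under normality, equation (2)."""
    return 2 * (1 - rho) ** 2 * (1 + (n - 1) * rho) ** 2 / (k * n * (n - 1))


def icc_ci(r, k, n, alpha=0.05):
    """Approximate 100(1 - alpha)% interval r +/- z * sqrt(V(k, n, r))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var_r(k, n, r))
    return r - half_width, r + half_width
```

For instance, `var_r(30, 2, 0.5)` evaluates (2) for 30 subjects with two replicates each, and `icc_ci(0.5, 30, 2)` returns the corresponding approximate 95% interval. The interval is not truncated to [0, 1]; whether to do so is left to the user.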
This interval depends on (i) the number of subjects k, (ii) the number of replicates n, and (iii) the point estimate of the ICC. Giraudeau and Mary (2001) suggested that a reliability study should be planned with regard to the width of the confidence interval of the ICC, in the same way as is usually done in a descriptive prevalence study (see Sukhatme et al. 1984, p. 45), or as in Freedman, Parmar, and Baker (1993) in estimating the probability of observer agreement. In planning a reliability study, we propose an approach similar to that adopted in survey samples aimed at estimating the population mean of some outcome variable (Sukhatme et al. 1984, p. 284). We provide an explicit expression for the required number of replicates, so that researchers can manipulate the value of ρ and cost, to evaluate their effects on the optimal allocation scheme defined by (n, k).

3. Optimal Allocation for Continuous Response Variable

3.1. The Normal Case

We will assume that, because of resource limitations, a reliability study is planned with a total of N = nk observations. Formally, we therefore need to decide on the optimal allocation of the observations to minimize (2), subject to N = nk being fixed a priori; ρ is assumed known. Here, we apply the basic idea in the method of constrained variation, using direct substitution. Substitution of N = nk gives

$$\mathrm{var}(r) = f(n, \rho) = 2(1-\rho)^2\{1+(n-1)\rho\}^2/\{N(n-1)\}. \tag{3}$$

Necessary and sufficient conditions for f(n, ρ) to have a unique minimum are given by Rao (1984, p. 53). Differentiating f with respect to n, equating to zero, and solving for n, we obtain

$$n_0 = (1+\rho)/\rho. \tag{4}$$

Note that we restrict our investigation to values of ρ that are strictly positive, since within the framework of reliability studies, negative values of ρ are meaningless.
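As a quick check on (3) and (4), the following Python sketch evaluates the variance at a fixed total N and returns the rounded allocation. The function names are hypothetical, and flooring n at 2 is an added assumption so that the within-subject mean square is defined; note also that Python's built-in `round` rounds exact halves to the nearest even integer.

```python
def var_fixed_total(n, rho, total):
    """Equation (3): var(r) when the total number of measurements N = n*k is fixed."""
    return 2 * (1 - rho) ** 2 * (1 + (n - 1) * rho) ** 2 / (total * (n - 1))


def optimal_replicates(rho):
    """Equation (4): real-valued optimum n0 = (1 + rho) / rho, for rho > 0."""
    return (1 + rho) / rho


def rounded_allocation(rho, total):
    """Round n0 to the nearest integer (at least 2), then round k = N / n."""
    n = max(2, round(optimal_replicates(rho)))
    k = round(total / n)
    return n, k
```

With ρ = 0.5 and N = 60, for example, `optimal_replicates(0.5)` gives 3.0 and `rounded_allocation(0.5, 60)` gives (3, 20); evaluating `var_fixed_total` at n = 2, 3, 4 shows the variance is smallest at n = 3, as (4) predicts.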
In practice, only integer values of (n, k) are used, and because N = nk is fixed a priori, optimum values of n were first rounded to the nearest integer; then k = N/n was rounded to the nearest integer as well. The values of var(r) at the optimal and appropriately rounded allocations for different values of N and ρ are listed in Table 1. We note that the net loss or gain in precision due to rounding is negligible. We observe from Table 1 that a higher number of replicates (n) would lead to a smaller number of subjects, which may reduce the generalizability of the study. In addition, when ρ is expected to be larger than 0.6, which is the case in many reliability studies, the results in Table 1 suggest that the study be planned with no more than two or three replicates per subject. This guideline is quite similar to that proposed by Giraudeau and Mary (2001), based on the attainment of a specified width for the 95% confidence interval of the ICC. This is also consistent with the results reported in Table 3 of Walter et al. (1998).

3.2. The Nonnormal Case

As indicated above, the sampling distribution and formula for the variance of the reliability estimates rely on the normality assumptions, despite the fact that real data seldom satisfy these assumptions. We might expect that normality would be only approximately satisfied, at best. A similar problem exists for statistical inference in the one-way random effects ANOVA model, although it has been found that the F-distribution of the ratio of mean squares is quite robust with respect to nonnormality under certain conditions. Scheffé (1959) investigated the effects of nonnormality, concluding that it has little effect on inferences on mean values, but serious effects on inferences concerning variances of random effects whose kurtosis γ differs from zero (p. 345).
Although Scheffé's conclusions were based on inferences for the variance ratio $\phi = \sigma_s^2/\sigma_e^2$, they may have similar implications for the reliability parameter $\rho = \phi/(1+\phi)$. Tukey (1956) obtained the variance of the variance component estimates under various ANOVA models by employing polykeys. For the one-way random effects model, together

with the delta method (Kendall and Stuart, 1986), it can be shown that, to a first order approximation,

$$\mathrm{var}(r) = 2(1-\rho)^2\{1+(n-1)\rho\}^2\{kn(n-1)\}^{-1} + \rho^2(1-\rho)^2 k^{-1}(\gamma_s + \gamma_e n^{-1}), \tag{5}$$

where $\gamma_s = E(s_i^4)/\sigma_s^4$ and $\gamma_e = E(e_{ij}^4)/\sigma_e^4$ (Hemmersley, 1949). Following the same optimization procedure as in Section 3.1, we find that the optimal value for n, say $n^*$, is

$$n^* = 1 + \{\rho(1+\gamma_s)^{1/2}\}^{-1}. \tag{6}$$

Clearly, when $\gamma_s = 0$, then $n^* = n_0$ (equation [4]). Moreover, for large values of $\gamma_s$, i.e., increased departure from normality, a smaller number of replicates is needed, implying that a proportionally larger number of subjects (k) should be recruited to ensure precise estimation of ρ. We therefore recommend the same recruitment strategy as in the normal case.

Table 1. Optimal combinations of (n, k), their rounded values, and the corresponding minimized values of var(r), for ρ = 0.1(0.1)0.9 and fixed N = nk = 60, 90, 120. Column headings: N, ρ, and n, k, var(r) (repeated three times). [Table body not available.]

3.3. Cost Implications

It has long been recognized that funding constraints determine the recruitment costs of a reliability study. The crucial decision in a typical study is to balance the cost of recruiting subjects with the need for a precise estimate of ρ. There have been some attempts to address the issue of power, rather than precision, in the presence of funding constraints. Eliasziw and Donner (1987) presented a method to determine the number of subjects, k, and number of replications, n, that minimize the overall cost of conducting a reliability study, while still providing acceptable power for tests of hypotheses concerning ρ. They also provided tables showing optimal choices of k and n under various cost constraints. In this section, we shall determine the combinations (n, k) that minimize the variance of r, as given by (2), subject to cost constraints.
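Returning briefly to the nonnormal case of Section 3.2, equations (5) and (6) are simple enough to evaluate directly. The Python sketch below is illustrative only: the function names are hypothetical, and the kurtosis term in (5) is coded as $\gamma_s + \gamma_e/n$, which is one reading of the flattened equation in this transcription and should be treated as an assumption.

```python
import math


def optimal_replicates_nonnormal(rho, gamma_s):
    """Equation (6): n* = 1 + 1 / (rho * sqrt(1 + gamma_s))."""
    return 1 + 1 / (rho * math.sqrt(1 + gamma_s))


def var_r_nonnormal(k, n, rho, gamma_s, gamma_e):
    """Equation (5), assuming the kurtosis term is (gamma_s + gamma_e / n) / k."""
    normal_part = 2 * (1 - rho) ** 2 * (1 + (n - 1) * rho) ** 2 / (k * n * (n - 1))
    kurtosis_part = rho ** 2 * (1 - rho) ** 2 * (gamma_s + gamma_e / n) / k
    return normal_part + kurtosis_part
```

Setting `gamma_s = 0` reproduces the normal-case optimum of (4), and increasing `gamma_s` lowers the recommended number of replicates, in line with the remark that heavier-tailed subject effects call for more subjects and fewer replicates.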
In our attempt to construct a flexible cost function, we adhere to the general guidelines identified by Flynn, Whitley, and Peters (2002), and Eliasziw and Donner (1987). First, one has to identify the approximate sampling and overhead costs. The sampling cost depends primarily on the size of the sample, and includes costs for data collection, travel, management, and other staff. On the other hand, overhead costs (such as the cost of setting up the data collection form) remain fixed, regardless of sample size. Following Sukhatme et al. (1984, p. 284), we assume that the overall cost function is given as

$$C = c_0 + kc_1 + nkc_2, \tag{7}$$

where $c_0$ is the fixed cost, $c_1$ is the cost of recruiting a single subject, and $c_2$ is the cost of making one observation. Using the method of Lagrange multipliers (Rao, 1984), we form the objective function G as

$$G = \mathrm{var}(r) + \lambda(C - c_0 - kc_1 - nkc_2),$$

where var(r) is given by (2) and λ is the Lagrange multiplier. The necessary and sufficient conditions for var(r) to have a constrained relative minimum are given by a theorem of Rao (1984, p. 68). Differentiating G with respect to n, k, and λ, and equating to zero, we obtain

$$n^3\rho c_2 - n^2 c_2(1+\rho) - nc_1(2-\rho) + (1-\rho)c_1 = 0, \tag{8}$$

$$\lambda = 2(1-\rho)^2\{1+(n-1)\rho\}\{1-2n+(n-1)\rho\}/\{k^2n^2(n-1)^2c_2\},$$

and

$$k = (C - c_0)/(c_1 + nc_2). \tag{9}$$

The third-degree polynomial in (8) has three roots. Using Descartes' rule of signs, we predict that there are two positive or two complex conjugate roots and exactly one negative root. Furthermore, since $c_1, c_2 > 0$ and 0 < ρ < 1, we conclude that there are indeed two (real) positive roots, one of which is

always between 0 and 1. This conveniently leaves us with only one relevant solution for the optimal value of n. The explicit expression for this optimal solution is

$$n_{\mathrm{opt}} = \{A^{1/3}/\rho - B + (1+\rho)/\rho\}/3,$$

where

$$A = 9R(\rho^3 - \rho^2 + \rho) + (\rho+1)^3 + 3\rho[3R\{(R+1)^2\rho^4 - (6R^2+4R-2)\rho^3 + 12R(R+1)\rho^2 - (8R^2+10R+2)\rho - R - 1\}]^{1/2},$$

$$B = \{3R\rho(\rho-2) - (\rho+1)^2\}/(\rho A^{1/3}),$$

and $R = c_1/c_2$. Clearly, $n_{\mathrm{opt}}$ depends on ρ and R (the cost of recruiting a subject relative to the cost of measuring a subject). Once the value of $n_{\mathrm{opt}}$ is determined, then from (9), the optimal k is

$$k_{\mathrm{opt}} = \{(C - c_0)/c_1\}/(1 + n_{\mathrm{opt}}/R). \tag{10}$$

The numerator of $k_{\mathrm{opt}}$ defines the resource available for total recruitment and measurement relative to the recruitment cost per subject. We note, from (10), that $n_{\mathrm{opt}}$ and $k_{\mathrm{opt}}$ are inversely related. The results of the optimization procedure appear in Table 2 for ρ = 0.4(0.1)0.9 and R = 0.01, 0.05, 0.25, 1, 5, 25, 50, and 100. It is apparent from Table 2 that, as R increases (decreases), the number of measurements per subject $n_{\mathrm{opt}}$ increases (decreases), while the number of subjects $k_{\mathrm{opt}}$ decreases (increases). On the other hand, when R is fixed, an increase in the value of ρ results in a decrease in the number of replicates and an increase in the number of subjects. This trend reflects two intuitive facts. The first is that it is sensible to decrease the number of items associated with a higher cost, and increase those with a lower cost. The second is that when ρ is large (high reproducibility), fewer replicates per subject are needed, while a larger number of subjects should be recruited, ensuring that ρ is estimated with appreciable precision. This remark is similar to the conclusion reached in the previous section, when costs were not explicitly considered.

Table 2. Optimal values of n that minimize var(r) for ρ = 0.4(0.1)0.9 and R = 0.01, 0.05, 0.25, 1, 5, 25, 50, and 100. [Table body not available.]
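Rather than coding the closed-form root, one can solve the cubic (8) numerically. The sketch below is one possible approach, not the authors' procedure: it divides (8) by $c_2$ so that only ρ and $R = c_1/c_2$ appear, then uses bisection to locate the single root larger than 1 (the cubic is negative at n = 1 and eventually positive). Function names and the tolerance are hypothetical choices.

```python
def cost_cubic(n, rho, ratio):
    """Left-hand side of equation (8) divided by c2, with ratio = R = c1 / c2."""
    return rho * n ** 3 - (1 + rho) * n ** 2 - ratio * (2 - rho) * n + ratio * (1 - rho)


def optimal_replicates_cost(rho, ratio, tol=1e-10):
    """Bisection for the root of (8) that exceeds 1 (assumes 0 < rho < 1, ratio >= 0)."""
    lo, hi = 1.0, 2.0
    while cost_cubic(hi, rho, ratio) <= 0:  # expand until the sign changes
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cost_cubic(mid, rho, ratio) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def optimal_subjects(total_cost, c0, c1, c2, n):
    """Equation (9): k = (C - c0) / (c1 + n * c2)."""
    return (total_cost - c0) / (c1 + n * c2)
```

For ρ = 0.9 and R = 1 the solver returns approximately 2.57, the value the paper quotes from Table 2, and with R = 0 it recovers $n_0 = (1+\rho)/\rho$. Using the cost figures of the paper's worked example (C = 1600, $c_0$ = 0, $c_1 = c_2$ = 15, n = 3), `optimal_subjects` gives about 26.7, which rounds to 27 subjects.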
Finally, we note that by setting $c_1 = 0$ in (8), i.e., R = 0, we obtain $n_{\mathrm{opt}} = (1+\rho)/\rho$, as in (4). This means that a special cost structure is implied in the optimal allocation discussed in Section 3.1.

Example. To assess the accuracy of Doppler echocardiography (DE) in determining aortic valve area (AVA) in a prospective evaluation of patients with aortic stenosis, an investigator wishes to demonstrate a high degree of reliability (ρ = 90%) in estimating AVA using the velocity integral method. Suppose that the total cost of conducting the study is fixed at $1600. We assume that the travel cost for a patient going from the health center to the tertiary hospital (where the procedure is done) is $15. The administrative cost of the procedure and the cost of using the DE is $15 per visit. It is assumed that $c_0$, the overhead cost, is absorbed by the hospital. From Table 2, $n_{\mathrm{opt}}$ for R = 1 and ρ = 0.9 is 2.57, which should be rounded up to 3. From (10), $k_{\mathrm{opt}} = (1600/15)/(1 + 3) \approx 26.7$, which rounds to 27; that is, we need 27 patients, with 3 measurements each. The minimized value of var(r) is [...].

4. Optimal Allocation for Dichotomous Assessments

When assessing interrater reliability, a choice must be made on how to measure the condition under investigation. One of the practical aspects of this decision concerns the relative advantages of measuring the trait on a continuous scale, as discussed in the previous sections, or on a dichotomous scale. In many medical screening programs, and in social sciences and psychology studies, it is often more feasible to record the subject's response on a dichotomous scale (such as presence/absence). If this approach is adopted, the issue of optimal allocation becomes very important because, as was demonstrated by Donner and Eliasziw (1994), the loss of power associated with measuring the trait on a dichotomous scale is quite severe, and frequently prohibitive.
Our primary focus in this section is the determination of the optimal allocation of fixed N = nk, so that the variance of the estimate of ρ is minimized when the response variable is dichotomous. Fleiss and Cuzick (1979) provided an example where the characteristic under investigation was the presence or absence of schizophrenia in hospitalized mental patients. Let $Y_{ij}$ be the jth rating made on the ith subject, where $Y_{ij} = 1$ if the condition is present, and 0 otherwise. Analogous to the continuous case, Landis and Koch (1977) employed the one-way random effects model (1), but without the normality assumption being imposed on either $e_{ij}$ or $s_i$. In this context, the standard assumption for the $Y_{ij}$ corresponding to the above ANOVA model is $E(Y_{ij}) = \pi = \Pr(Y_{ij} = 1)$ and $\sigma^2 = \mathrm{var}(Y_{ij}) = \pi(1-\pi)$. Moreover, let $\delta = \Pr(Y_{ij} = 1, Y_{il} = 1) = E(Y_{ij}Y_{il})$. Then it follows, for j ≠ l and i = 1, 2, ..., k, that

$$\delta = \mathrm{cov}(Y_{ij}, Y_{il}) + E(Y_{ij})E(Y_{il}) = \rho\pi(1-\pi) + \pi^2,$$

where ρ is the (within-subject) ICC. The probability that two given measurements from the same subject will have the same response is

$$P_o = \delta + (1-\pi)^2 + \rho\pi(1-\pi) = 1 - 2\pi(1-\pi)(1-\rho).$$

When ρ = 0, this probability (the probability of agreement by chance) reduces to $P_e = 1 - 2\pi(1-\pi)$. Therefore, the parameter ρ has a kappa-type probabilistic interpretation (Mak, 1988), i.e.,

$$\kappa = (P_o - P_e)/(1 - P_e) = [\{1 - 2\pi(1-\pi)(1-\rho)\} - \{1 - 2\pi(1-\pi)\}]/[1 - \{1 - 2\pi(1-\pi)\}] = \rho.$$

The ANOVA estimate for ρ is given by $\hat{\rho} = (\mathrm{MSB} - \mathrm{MSW})/\{\mathrm{MSB} + (n-1)\mathrm{MSW}\}$, where MSB and MSW are functions of the ith subject's total $Y_i = \sum_{j=1}^{n} Y_{ij}$ (Fleiss, 1981, p. 226). Crowder (1978) demonstrated the equivalence of the ANOVA model and the well-known common-correlation

model that occurs when, conditional on the subject effect $\mu_i$, the subject's total $Y_i$ has a binomial distribution, with conditional mean and variance given by $E(Y_i \mid \mu_i) = n\mu_i$ and $\mathrm{var}(Y_i \mid \mu_i) = n\mu_i(1-\mu_i)$, with $\mu_i$ assumed to follow a beta distribution with density function

$$f(\mu_i) = \Gamma(\alpha+\beta)\,\mu_i^{\alpha-1}(1-\mu_i)^{\beta-1}/\{\Gamma(\alpha)\Gamma(\beta)\},$$

with the appropriate parameterization, $\alpha = \pi(1-\rho)/\rho$ and $\beta = (1-\pi)(1-\rho)/\rho$. Therefore, the ANOVA model and the beta-binomial model are virtually indistinguishable (Cox and Snell, 1989).

For the nonnormal case, the optimal number of replicates under the ANOVA model was found to be $n^* = 1 + \{\rho(1+\gamma_s)^{1/2}\}^{-1}$ (equation [6]). Since $\gamma_s$ is the kurtosis of the subject effect distribution, one may use the kurtosis of the beta distribution (the subject random effect distribution for binary data) to determine the optimal number of replications in the case of a dichotomous response. One can derive $\gamma_s$ for the beta distribution from Kendall and Stuart (1986, p. 73), from which $\gamma_s = m_4/m_2^2$, where $m_4$ and $m_2$ are, respectively, the fourth and the second central moments of the beta distribution. Substituting $\gamma_s$ into (6), we obtain

$$n^* = 1 + \pi(1-\pi)\{(1+\rho)(1+2\rho)/\psi(\pi,\rho)\}^{1/2}, \tag{11}$$

where

$$\psi(\pi,\rho) = \pi[\rho+\pi(1-\rho)][2\rho+\pi(1-\rho)][3\rho+\pi(1-\rho)-4\pi(1+2\rho)] + (1+\rho)(1+2\rho)[6\pi^3(1-\pi)\rho + 3\pi^4 + \pi^2(1-\pi)^2\rho^2].$$

Table 3. Optimal allocation and the minimized values of var(r) for N = 60 and ρ = 0.3(0.1)0.9, for a dichotomous response. Column headings: π, ρ, and n, k, var(r) (repeated three times). [Table body not available.]

In contrast to the continuous measurement model, the optimal allocations in the case of dichotomous assessments depend on π, the mean of the binary response variable. Table 3 shows the optimal number of replicates $n^*$, the corresponding optimal number of subjects, $k = N/n^*$, and var(r) at the optimal and appropriately rounded allocations, for N = 60 and ρ = 0.3(0.1)0.9.
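The link between (6) and (11) can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code: it computes the moments of the beta distribution with $\alpha = \pi(1-\rho)/\rho$ and $\beta = (1-\pi)(1-\rho)/\rho$ from the standard raw-moment recursion, forms $\gamma_s = m_4/m_2^2$ as defined in the text, and compares the result of (6) with the closed form (11). All function names are hypothetical.

```python
import math


def beta_moment_ratio(pi, rho):
    """gamma_s = m4 / m2**2 for the beta subject-effect distribution."""
    # Raw moments E[mu^2], E[mu^3], E[mu^4] under alpha + beta = (1 - rho) / rho.
    e2 = pi * (rho + pi * (1 - rho))
    e3 = e2 * (2 * rho + pi * (1 - rho)) / (1 + rho)
    e4 = e3 * (3 * rho + pi * (1 - rho)) / (1 + 2 * rho)
    m2 = pi * (1 - pi) * rho  # second central moment
    m4 = e4 - 4 * pi * e3 + 6 * pi ** 2 * e2 - 3 * pi ** 4  # fourth central moment
    return m4 / m2 ** 2


def psi(pi, rho):
    """The function psi(pi, rho) appearing in equation (11)."""
    a = rho + pi * (1 - rho)
    b = 2 * rho + pi * (1 - rho)
    c = 3 * rho + pi * (1 - rho) - 4 * pi * (1 + 2 * rho)
    bracket = 6 * pi ** 3 * (1 - pi) * rho + 3 * pi ** 4 + pi ** 2 * (1 - pi) ** 2 * rho ** 2
    return pi * a * b * c + (1 + rho) * (1 + 2 * rho) * bracket


def optimal_replicates_binary(pi, rho):
    """Equation (11): optimal number of replicates for a dichotomous response."""
    return 1 + pi * (1 - pi) * math.sqrt((1 + rho) * (1 + 2 * rho) / psi(pi, rho))
```

Evaluating `optimal_replicates_binary(0.3, 0.6)` and `1 + 1 / (0.6 * math.sqrt(1 + beta_moment_ratio(0.3, 0.6)))` gives the same number up to rounding error, and swapping π for 1 − π leaves the result unchanged, which matches the symmetry noted in the text.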
For simplicity, we assigned a value of 0 to $\gamma_e$ while computing var(r) using (5). We note that, for fixed N, the allocations are equivalent for π and 1 − π, and therefore we have restricted the values of π to 0.1, 0.3, and 0.5. We also note that as π approaches 0.5, the number of replicates increases while the number of subjects decreases. On the other hand, as ρ increases, the number of replicates decreases and the number of subjects increases. Moreover, similar to Table 1, the rounding has a negligible effect on the efficiency of the ICC estimate.

5. Discussion

A crucial decision that a researcher faces in the design stage of a reliability study is the determination of the number of subjects k and the number of measurements per subject n. When we have prior knowledge of what constitutes an acceptable level of reliability, a hypothesis testing approach may be used, and the sample size calculations can then be performed using the methods of Donner and Eliasziw (1987) and Walter et al. (1998). However, in most cases, values of the reliability coefficient under the null and alternative hypotheses may be difficult to specify. For instance, the estimated value of the ICC depends on the degree of heterogeneity among the sampled subjects: the greater the heterogeneity, the higher the value of the ICC. Since most reliability studies focus on the estimation of the ICC with sufficient precision, the guidelines provided in this article, which we based on principles of mathematical optimization, allow an investigator to select the pair (n, k) that maximizes the precision of the estimated reliability index. Our proposed approach is quite simple and produces estimates of (n, k) that are in close agreement with results based on considerations of power. An interesting finding from our results is that, regardless of whether the assessments are continuous or binary, the variance is minimized with a small number of replicates, as long as the true index of reliability remains reasonably high.
In many clinical investigations, reliability of at least 60% is required to provide a method of measurement that has practical utility. Under such circumstances, one can safely recommend making only two or three observations per subject. We note

that cost implications for dichotomous assessments are quite important but, because of lack of space, we intend to report on this issue in a future article.

Acknowledgement

We thank the associate editor and two anonymous referees for their constructive comments. We appreciate the support for this research by the Research Center Administration and NSERC Canada.

Résumé

The reliability of continuous or dichotomous responses is usually assessed by means of the intraclass correlation coefficient (ICC). We compute the optimal allocation of the number of subjects k and the number of repeated measurements n that minimizes the variance of the estimated ICC. Cost constraints are considered for the case of responses with Gaussian distributions. Tables with the optimal choices of k and n are given, along with guidance for designing reliability studies in light of our results and those of other authors.

References

Bonett, D. G. (2002). Sample size requirements for estimating intraclass correlations with desired precision. Statistics in Medicine 21.
Cox, D. R. and Snell, E. J. (1989). Analysis of Binary Data, 2nd edition. London: Chapman and Hall.
Crowder, M. (1978). Beta-binomial ANOVA for proportions. Applied Statistics 27.
Donner, A. (1999). Sample size requirements for interval estimation of the intraclass kappa statistic. Communications in Statistics - Simulation 28(2).
Donner, A. and Eliasziw, M. (1987). Sample size requirements for reliability studies. Statistics in Medicine 6.
Donner, A. and Eliasziw, M. (1994). Statistical implications of the choice between a dichotomous or continuous trait in studies of interobserver agreement. Biometrics 50.
Dunn, G. (1992). Design and analysis of reliability studies. Statistical Methods in Medical Research 1.
Eliasziw, M. and Donner, A. (1987). A cost-function approach to the design of reliability studies. Statistics in Medicine 6.
Elston, R. (1977).
Response to query: Estimating heritability of a continuous trait. Biometrics 33.
Fisher, R. A. (1925). Statistical Methods for Research Workers. London: Oliver and Boyd.
Fleiss, J. (1981). Statistical Methods for Rates and Proportions, 2nd edition. New York: Wiley.
Fleiss, J. and Cuzick, J. (1979). The reliability of dichotomous judgments: Unequal number of judgments per subject. Applied Psychological Measurement 3.
Flynn, N. T., Whitley, E., and Peters, T. (2002). Recruitment strategy in a cluster randomized trial: Cost implications. Statistics in Medicine 21.
Freedman, L., Parmar, M., and Baker, S. (1993). The design of observer agreement studies with binary assessments. Statistics in Medicine 12.
Giraudeau, B. and Mary, J. Y. (2001). Planning a reproducibility study: How many subjects and how many replicates per subject for an expected width of the 95 per cent confidence interval of the intraclass correlation coefficient. Statistics in Medicine 20.
Haggard, E. R. (1958). Intraclass Correlation and the Analysis of Variance. New York: Dryden Press.
Hemmersley, I. M. (1949). The unbiased estimate and standard error of the intraclass variance. Metron 15.
Kendall, M. and Stuart, A. (1986). The Advanced Theory of Statistics, Volume I. London: Griffin.
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics 33.
Mak, T. K. (1988). Analyzing intraclass correlation for dichotomous variables. Applied Statistics 20.
Rao, S. S. (1984). Optimization: Theory and Applications, 2nd edition. New Delhi: Wiley Eastern.
Scheffé, H. (1959). The Analysis of Variance. New York: Wiley.
Shoukri, M. M. (1999). Agreement. In Encyclopedia of Biostatistics, P. Armitage and T. Colton (eds). New York: Wiley.
Shoukri, M. M. (2000). Agreement. In Encyclopedia of Epidemiology, M. Gail (ed). New York: Wiley.
Shoukri, M. M. and Asyali, M. H. (2002). Issues of cost and power in the design of agreement studies.
Technical Report, Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia.
Sukhatme, P. V., Sukhatme, B. V., Sukhatme, S., and Asok, C. (1984). Sampling Theory of Surveys with Applications. Ames: Iowa State University Press.
Swiger, L. A., Harvey, W. R., Everson, D. O., and Gregory, K. E. (1964). The variance of the intraclass correlation involving groups with one observation. Biometrics 20.
Tukey, J. W. (1956). Variance of variance components: I. Balanced designs. Annals of Mathematical Statistics 27.
Walter, D. S., Eliasziw, M., and Donner, A. (1998). Sample size and optimal design for reliability studies. Statistics in Medicine 17.

Received October. Revised May. Accepted May 2003.


More information

Assessing intra, inter and total agreement with replicated readings

Assessing intra, inter and total agreement with replicated readings STATISTICS IN MEDICINE Statist. Med. 2005; 24:1371 1384 Published online 30 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2006 Assessing intra, inter and total agreement

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

Superiority by a Margin Tests for One Proportion

Superiority by a Margin Tests for One Proportion Chapter 103 Superiority by a Margin Tests for One Proportion Introduction This module provides power analysis and sample size calculation for one-sample proportion tests in which the researcher is testing

More information

Chapter 19. Agreement and the kappa statistic

Chapter 19. Agreement and the kappa statistic 19. Agreement Chapter 19 Agreement and the kappa statistic Besides the 2 2contingency table for unmatched data and the 2 2table for matched data, there is a third common occurrence of data appearing summarised

More information

Sample size determination for logistic regression: A simulation study

Sample size determination for logistic regression: A simulation study Sample size determination for logistic regression: A simulation study Stephen Bush School of Mathematical Sciences, University of Technology Sydney, PO Box 123 Broadway NSW 2007, Australia Abstract This

More information

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Journal of Modern Applied Statistical Methods Volume 4 Issue Article 8 --5 Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Sudhir R. Paul University of

More information

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study Science Journal of Applied Mathematics and Statistics 2014; 2(1): 20-25 Published online February 20, 2014 (http://www.sciencepublishinggroup.com/j/sjams) doi: 10.11648/j.sjams.20140201.13 Robust covariance

More information

Stat 705: Completely randomized and complete block designs

Stat 705: Completely randomized and complete block designs Stat 705: Completely randomized and complete block designs Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 16 Experimental design Our department offers

More information

Random marginal agreement coefficients: rethinking the adjustment for chance when measuring agreement

Random marginal agreement coefficients: rethinking the adjustment for chance when measuring agreement Biostatistics (2005), 6, 1,pp. 171 180 doi: 10.1093/biostatistics/kxh027 Random marginal agreement coefficients: rethinking the adjustment for chance when measuring agreement MICHAEL P. FAY National Institute

More information

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS Journal of Biopharmaceutical Statistics, 18: 1184 1196, 2008 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400802369053 SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE

More information

Introduction to Business Statistics QM 220 Chapter 12

Introduction to Business Statistics QM 220 Chapter 12 Department of Quantitative Methods & Information Systems Introduction to Business Statistics QM 220 Chapter 12 Dr. Mohammad Zainal 12.1 The F distribution We already covered this topic in Ch. 10 QM-220,

More information

Group-Sequential Tests for One Proportion in a Fleming Design

Group-Sequential Tests for One Proportion in a Fleming Design Chapter 126 Group-Sequential Tests for One Proportion in a Fleming Design Introduction This procedure computes power and sample size for the single-arm group-sequential (multiple-stage) designs of Fleming

More information

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions K. Krishnamoorthy 1 and Dan Zhang University of Louisiana at Lafayette, Lafayette, LA 70504, USA SUMMARY

More information

Reliability of Three-Dimensional Facial Landmarks Using Multivariate Intraclass Correlation

Reliability of Three-Dimensional Facial Landmarks Using Multivariate Intraclass Correlation Reliability of Three-Dimensional Facial Landmarks Using Multivariate Intraclass Correlation Abstract The intraclass correlation coefficient (ICC) is widely used in many fields, including orthodontics,

More information

When enough is enough: early stopping of biometrics error rate testing

When enough is enough: early stopping of biometrics error rate testing When enough is enough: early stopping of biometrics error rate testing Michael E. Schuckers Department of Mathematics, Computer Science and Statistics St. Lawrence University and Center for Identification

More information

Reliability Coefficients

Reliability Coefficients Testing the Equality of Two Related Intraclass Reliability Coefficients Yousef M. Alsawaimeh, Yarmouk University Leonard S. Feldt, University of lowa An approximate statistical test of the equality of

More information

Assessing agreement with multiple raters on correlated kappa statistics

Assessing agreement with multiple raters on correlated kappa statistics Biometrical Journal 52 (2010) 61, zzz zzz / DOI: 10.1002/bimj.200100000 Assessing agreement with multiple raters on correlated kappa statistics Hongyuan Cao,1, Pranab K. Sen 2, Anne F. Peery 3, and Evan

More information

arxiv: v1 [math.st] 28 Feb 2017

arxiv: v1 [math.st] 28 Feb 2017 Bridging Finite and Super Population Causal Inference arxiv:1702.08615v1 [math.st] 28 Feb 2017 Peng Ding, Xinran Li, and Luke W. Miratrix Abstract There are two general views in causal analysis of experimental

More information

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation Research Article Received XXXX (www.interscience.wiley.com) DOI: 10.100/sim.0000 Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample

More information

HYPOTHESIS TESTING. Hypothesis Testing

HYPOTHESIS TESTING. Hypothesis Testing MBA 605 Business Analytics Don Conant, PhD. HYPOTHESIS TESTING Hypothesis testing involves making inferences about the nature of the population on the basis of observations of a sample drawn from the population.

More information

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping : Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj InSPiRe Conference on Methodology

More information

Title. Description. Quick start. Menu. stata.com. icc Intraclass correlation coefficients

Title. Description. Quick start. Menu. stata.com. icc Intraclass correlation coefficients Title stata.com icc Intraclass correlation coefficients Description Menu Options for one-way RE model Remarks and examples Methods and formulas Also see Quick start Syntax Options for two-way RE and ME

More information

TWO-FACTOR AGRICULTURAL EXPERIMENT WITH REPEATED MEASURES ON ONE FACTOR IN A COMPLETE RANDOMIZED DESIGN

TWO-FACTOR AGRICULTURAL EXPERIMENT WITH REPEATED MEASURES ON ONE FACTOR IN A COMPLETE RANDOMIZED DESIGN Libraries Annual Conference on Applied Statistics in Agriculture 1995-7th Annual Conference Proceedings TWO-FACTOR AGRICULTURAL EXPERIMENT WITH REPEATED MEASURES ON ONE FACTOR IN A COMPLETE RANDOMIZED

More information

Coefficients of agreement for fixed observers

Coefficients of agreement for fixed observers Statistical Methods in Medical Research 2006; 15: 255 271 Coefficients of agreement for fixed observers Michael Haber Department of Biostatistics, Rollins School of Public Health, Emory University, Atlanta,

More information

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization. 1 Chapter 1: Research Design Principles The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization. 2 Chapter 2: Completely Randomized Design

More information

UNIVERSITY OF CALGARY. Measuring Observer Agreement on Categorical Data. Andrea Soo A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

UNIVERSITY OF CALGARY. Measuring Observer Agreement on Categorical Data. Andrea Soo A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES UNIVERSITY OF CALGARY Measuring Observer Agreement on Categorical Data by Andrea Soo A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

A simulation study for comparing testing statistics in response-adaptive randomization

A simulation study for comparing testing statistics in response-adaptive randomization RESEARCH ARTICLE Open Access A simulation study for comparing testing statistics in response-adaptive randomization Xuemin Gu 1, J Jack Lee 2* Abstract Background: Response-adaptive randomizations are

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS

COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS COMPOSITIONAL IDEAS IN THE BAYESIAN ANALYSIS OF CATEGORICAL DATA WITH APPLICATION TO DOSE FINDING CLINICAL TRIALS M. Gasparini and J. Eisele 2 Politecnico di Torino, Torino, Italy; mauro.gasparini@polito.it

More information

Measurement Error. Martin Bland. Accuracy and precision. Error. Measurement in Health and Disease. Professor of Health Statistics University of York

Measurement Error. Martin Bland. Accuracy and precision. Error. Measurement in Health and Disease. Professor of Health Statistics University of York Measurement in Health and Disease Measurement Error Martin Bland Professor of Health Statistics University of York http://martinbland.co.uk/ Accuracy and precision In this lecture: measurements which are

More information

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Biometrical Journal 42 (2000) 1, 59±69 Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Kung-Jong Lui

More information

NIH Public Access Author Manuscript Stat Med. Author manuscript; available in PMC 2014 October 16.

NIH Public Access Author Manuscript Stat Med. Author manuscript; available in PMC 2014 October 16. NIH Public Access Author Manuscript Published in final edited form as: Stat Med. 2013 October 30; 32(24): 4162 4179. doi:10.1002/sim.5819. Sample Size Determination for Clustered Count Data A. Amatya,

More information

On Measuring Repeatability of Data from Self-Administered Questionnaires

On Measuring Repeatability of Data from Self-Administered Questionnaires International Journal of Epidemiology International Epidemlological Association 987 Vol6,. Printed in Great Britain On Measuring Repeatability of Data from Self-Administered Questionnaires S CHINN AND

More information

Unit 14: Nonparametric Statistical Methods

Unit 14: Nonparametric Statistical Methods Unit 14: Nonparametric Statistical Methods Statistics 571: Statistical Methods Ramón V. León 8/8/2003 Unit 14 - Stat 571 - Ramón V. León 1 Introductory Remarks Most methods studied so far have been based

More information

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping

Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming. and Optimal Stopping Interim Monitoring of Clinical Trials: Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj

More information

Optimal SPRT and CUSUM Procedures using Compressed Limit Gauges

Optimal SPRT and CUSUM Procedures using Compressed Limit Gauges Optimal SPRT and CUSUM Procedures using Compressed Limit Gauges P. Lee Geyer Stefan H. Steiner 1 Faculty of Business McMaster University Hamilton, Ontario L8S 4M4 Canada Dept. of Statistics and Actuarial

More information

Sample Size Determination

Sample Size Determination Sample Size Determination 018 The number of subjects in a clinical study should always be large enough to provide a reliable answer to the question(s addressed. The sample size is usually determined by

More information

S Abelman * Keywords: Multivariate analysis of variance (MANOVA), hypothesis testing.

S Abelman * Keywords: Multivariate analysis of variance (MANOVA), hypothesis testing. S Afr Optom 2006 65 (2) 62 67 A p p l i c a t i o n o f m u l t i v a r i a t e a n a l y s i s o f v a r i - a n c e ( M A N O VA ) t o d i s t a n c e r e f r a c t i v e v a r i - a b i l i t y a n

More information

Analysis of Variance and Co-variance. By Manza Ramesh

Analysis of Variance and Co-variance. By Manza Ramesh Analysis of Variance and Co-variance By Manza Ramesh Contents Analysis of Variance (ANOVA) What is ANOVA? The Basic Principle of ANOVA ANOVA Technique Setting up Analysis of Variance Table Short-cut Method

More information

Non-Inferiority Tests for the Ratio of Two Proportions in a Cluster- Randomized Design

Non-Inferiority Tests for the Ratio of Two Proportions in a Cluster- Randomized Design Chapter 236 Non-Inferiority Tests for the Ratio of Two Proportions in a Cluster- Randomized Design Introduction This module provides power analysis and sample size calculation for non-inferiority tests

More information

Table 2.14 : Distribution of 125 subjects by laboratory and +/ Category. Test Reference Laboratory Laboratory Total

Table 2.14 : Distribution of 125 subjects by laboratory and +/ Category. Test Reference Laboratory Laboratory Total 2.5. Kappa Coefficient and the Paradoxes. - 31-2.5.1 Kappa s Dependency on Trait Prevalence On February 9, 2003 we received an e-mail from a researcher asking whether it would be possible to apply the

More information

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN

TESTS FOR EQUIVALENCE BASED ON ODDS RATIO FOR MATCHED-PAIR DESIGN Journal of Biopharmaceutical Statistics, 15: 889 901, 2005 Copyright Taylor & Francis, Inc. ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400500265561 TESTS FOR EQUIVALENCE BASED ON ODDS RATIO

More information

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity Prof. Kevin E. Thorpe Dept. of Public Health Sciences University of Toronto Objectives 1. Be able to distinguish among the various

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

Understanding Ding s Apparent Paradox

Understanding Ding s Apparent Paradox Submitted to Statistical Science Understanding Ding s Apparent Paradox Peter M. Aronow and Molly R. Offer-Westort Yale University 1. INTRODUCTION We are grateful for the opportunity to comment on A Paradox

More information

A Use of the Information Function in Tailored Testing

A Use of the Information Function in Tailored Testing A Use of the Information Function in Tailored Testing Fumiko Samejima University of Tennessee for indi- Several important and useful implications in latent trait theory, with direct implications vidualized

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Lecture 5: Sampling Methods

Lecture 5: Sampling Methods Lecture 5: Sampling Methods What is sampling? Is the process of selecting part of a larger group of participants with the intent of generalizing the results from the smaller group, called the sample, to

More information

QED. Queen s Economics Department Working Paper No Hypothesis Testing for Arbitrary Bounds. Jeffrey Penney Queen s University

QED. Queen s Economics Department Working Paper No Hypothesis Testing for Arbitrary Bounds. Jeffrey Penney Queen s University QED Queen s Economics Department Working Paper No. 1319 Hypothesis Testing for Arbitrary Bounds Jeffrey Penney Queen s University Department of Economics Queen s University 94 University Avenue Kingston,

More information

HOW TO DETERMINE THE NUMBER OF SUBJECTS NEEDED FOR MY STUDY?

HOW TO DETERMINE THE NUMBER OF SUBJECTS NEEDED FOR MY STUDY? HOW TO DETERMINE THE NUMBER OF SUBJECTS NEEDED FOR MY STUDY? TUTORIAL ON SAMPLE SIZE AND POWER CALCULATIONS FOR INEQUALITY TESTS. John Zavrakidis j.zavrakidis@nki.nl May 28, 2018 J.Zavrakidis Sample and

More information

ACTEX CAS EXAM 3 STUDY GUIDE FOR MATHEMATICAL STATISTICS

ACTEX CAS EXAM 3 STUDY GUIDE FOR MATHEMATICAL STATISTICS ACTEX CAS EXAM 3 STUDY GUIDE FOR MATHEMATICAL STATISTICS TABLE OF CONTENTS INTRODUCTORY NOTE NOTES AND PROBLEM SETS Section 1 - Point Estimation 1 Problem Set 1 15 Section 2 - Confidence Intervals and

More information

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization Statistics and Probability Letters ( ) Contents lists available at ScienceDirect Statistics and Probability Letters journal homepage: wwwelseviercom/locate/stapro Using randomization tests to preserve

More information

Tutorial 4: Power and Sample Size for the Two-sample t-test with Unequal Variances

Tutorial 4: Power and Sample Size for the Two-sample t-test with Unequal Variances Tutorial 4: Power and Sample Size for the Two-sample t-test with Unequal Variances Preface Power is the probability that a study will reject the null hypothesis. The estimated probability is a function

More information

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at Some Applications of Exponential Ordered Scores Author(s): D. R. Cox Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 26, No. 1 (1964), pp. 103-110 Published by: Wiley

More information

DIFFERENT APPROACHES TO STATISTICAL INFERENCE: HYPOTHESIS TESTING VERSUS BAYESIAN ANALYSIS

DIFFERENT APPROACHES TO STATISTICAL INFERENCE: HYPOTHESIS TESTING VERSUS BAYESIAN ANALYSIS DIFFERENT APPROACHES TO STATISTICAL INFERENCE: HYPOTHESIS TESTING VERSUS BAYESIAN ANALYSIS THUY ANH NGO 1. Introduction Statistics are easily come across in our daily life. Statements such as the average

More information

Conditional Standard Errors of Measurement for Performance Ratings from Ordinary Least Squares Regression

Conditional Standard Errors of Measurement for Performance Ratings from Ordinary Least Squares Regression Conditional SEMs from OLS, 1 Conditional Standard Errors of Measurement for Performance Ratings from Ordinary Least Squares Regression Mark R. Raymond and Irina Grabovsky National Board of Medical Examiners

More information

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of

More information

Non-parametric methods

Non-parametric methods Eastern Mediterranean University Faculty of Medicine Biostatistics course Non-parametric methods March 4&7, 2016 Instructor: Dr. Nimet İlke Akçay (ilke.cetin@emu.edu.tr) Learning Objectives 1. Distinguish

More information

Data envelopment analysis

Data envelopment analysis 15 Data envelopment analysis The purpose of data envelopment analysis (DEA) is to compare the operating performance of a set of units such as companies, university departments, hospitals, bank branch offices,

More information

Use of the Log Odds Ratio to Assess the Reliability of Dichotomous Questionnaire Data

Use of the Log Odds Ratio to Assess the Reliability of Dichotomous Questionnaire Data Use of the Log Odds Ratio to Assess the Reliability of Dichotomous Questionnaire Data D. A. Sprott and M. D. Vogel-Sprott University of Waterloo, Canada The use of the log odds ratio to measure test-retest

More information

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure

Sample size determination for a binary response in a superiority clinical trial using a hybrid classical and Bayesian procedure Ciarleglio and Arendt Trials (2017) 18:83 DOI 10.1186/s13063-017-1791-0 METHODOLOGY Open Access Sample size determination for a binary response in a superiority clinical trial using a hybrid classical

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

IENG581 Design and Analysis of Experiments INTRODUCTION

IENG581 Design and Analysis of Experiments INTRODUCTION Experimental Design IENG581 Design and Analysis of Experiments INTRODUCTION Experiments are performed by investigators in virtually all fields of inquiry, usually to discover something about a particular

More information

Evaluation Strategies

Evaluation Strategies Evaluation Intrinsic Evaluation Comparison with an ideal output: Challenges: Requires a large testing set Intrinsic subjectivity of some discourse related judgments Hard to find corpora for training/testing

More information

A Multivariate Perspective

A Multivariate Perspective A Multivariate Perspective on the Analysis of Categorical Data Rebecca Zwick Educational Testing Service Ellijot M. Cramer University of North Carolina at Chapel Hill Psychological research often involves

More information

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method

More information

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs

Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs Introduction to the Analysis of Variance (ANOVA) Computing One-Way Independent Measures (Between Subjects) ANOVAs The Analysis of Variance (ANOVA) The analysis of variance (ANOVA) is a statistical technique

More information

Bootstrap Tests: How Many Bootstraps?

Bootstrap Tests: How Many Bootstraps? Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002

More information

Bootstrap Procedures for Testing Homogeneity Hypotheses

Bootstrap Procedures for Testing Homogeneity Hypotheses Journal of Statistical Theory and Applications Volume 11, Number 2, 2012, pp. 183-195 ISSN 1538-7887 Bootstrap Procedures for Testing Homogeneity Hypotheses Bimal Sinha 1, Arvind Shah 2, Dihua Xu 1, Jianxin

More information

How Measurement Error Affects the Four Ways We Use Data

How Measurement Error Affects the Four Ways We Use Data Measurement error is generally considered to be a bad thing, and yet there is very little written about how measurement error affects the way we use our measurements. This column will consider these effects

More information

Sample size calculations for logistic and Poisson regression models
