Sample Size Calculations for ROC Studies: Parametric Robustness and Bayesian Nonparametrics



Sample Size Calculations for ROC Studies: Parametric Robustness and Bayesian Nonparametrics

DUNLEI CHENG 1, ADAM J. BRANSCUM 2, and WESLEY O. JOHNSON 3

1 Institute for Health Care Research and Improvement, Baylor Health Care System, Dallas, TX 75206, USA
2 Department of Public Health, Oregon State University, Corvallis, OR 97331, USA
3 Department of Statistics, University of California, Irvine, Irvine, CA 92697, USA

Correspondence to: Dunlei Cheng, Institute for Health Care Research and Improvement, Baylor Health Care System, 8080 N. Central Expressway, Suite 500, Dallas, TX 75206, dunleic@baylorhealth.edu

Abstract

Methods for sample size calculations in ROC studies often assume independent normal distributions for test scores in the diseased and non-diseased populations. We consider sample size requirements under the default two-group normal model when the data distribution for the diseased population is either skewed or multimodal. For these two common scenarios we investigate the potential robustness of sample sizes calculated under the mis-specified normal model, and we compare them to sample sizes calculated under a more flexible nonparametric Dirichlet process mixture model. We also highlight the utility of flexible models for ROC data analysis and their importance to study design. When non-standard distributional shapes are anticipated, our Bayesian nonparametric approach allows investigators to determine a sample size based on more appropriate distributional assumptions than are generally applied. The method also provides researchers a tool for conducting a sensitivity analysis of sample size calculations that are based on a two-group normal model. We extend the proposed approach to comparative studies involving two continuous tests. Our simulation-based procedure is implemented using the WinBUGS and R software packages, and example code is made available.

Key Words: AUC, Bayesian design, Dirichlet process mixtures, ROC curve, simulation

1. Introduction

Careful planning is important in the design of studies that evaluate the performance of medical tests. The ability of a continuous test or, more generally, a classification procedure, to distinguish between two groups (e.g., disease positive, D, and disease negative, D̄) is characterized by the test's receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC). Let y denote a test score and let c denote a cutoff threshold used to categorize subjects as test positive or test negative (T+ or T-). We adhere to the convention that higher test scores are indicative of disease presence, so y > c corresponds to T+. For all c, the ROC curve plots the test's true positive probability (sensitivity), denoted η(c) = Pr(y_D > c), against its false positive probability (1 - specificity), where the specificity at cutoff c is θ(c) = Pr(y_D̄ < c).

When planning a test-accuracy study, a crucial element of the design is proper determination of a sample size that will enable conclusions to be drawn about AUC. Frequentist and Bayesian sample size procedures for ROC studies have often assumed normal distributions for test scores within the D and D̄ populations [e.g., 1, 2]. Modeling ROC data as normal when they are skewed or multimodal can result in biased inference for AUC when there is overlap in the distributions of test scores for the D and D̄ populations. A multimodal distribution for the D population will occur, for instance, when subgroups of individuals are in different stages of disease; individuals in advanced stages of disease will tend to have higher test scores than individuals who are newly infected. For example, Bayesian nonparametric estimates of ROC curves for two assays designed to detect Johne's disease in dairy cattle have been used as an alternative to two-group normal inference in order to account for multiple disease stages [3]. A right-skewed distribution of test scores in D may occur when only a small proportion of individuals survive long enough to experience disease progression. Although a normal distribution will often accurately model the D̄ population, it can fail when certain subgroups are more likely to yield false positives due to, for instance, cross-reactions of the test with a related infectious agent. We focus on one-test and two-test scenarios with continuous data; in contrast, a hierarchical Bayesian model for sample size determination in ROC analysis of multiple ordinal tests can be found in [4].

The importance of performing sample size and power calculations with the same model that is planned for the data analysis is well recognized. If a nonparametric analysis of ROC data is planned, then sample size calculations would ideally use the same nonparametric model.

With this motivation, the goals of the current study are: (i) to develop a flexible Bayesian approach to sample size analysis in ROC studies using Dirichlet process mixture (DPM) models and (ii) to investigate the robustness of power calculated under the two-group normal model when that model does not hold due to skewness or multimodality. The DPM model used in this paper is operationally a mixture of normal distributions with a large number of mixing components. Hence, DPMs have the flexibility to identify unanticipated multimodality, clumping, and skewness.

Throughout this paper the terms power and predictive power are used to describe the following approach to sample size determination in ROC studies. The criterion we adopt involves sampling multiple data sets from a marginal predictive distribution. Marginalization occurs with respect to a (sampling) prior distribution for all unknown model parameters. For each simulated data set, we compute the posterior probability that AUC exceeds some threshold (e.g., the AUC of a competitor test). The calculated predictive power is then a Monte Carlo average of those posterior probabilities. Similar usage of the term power in Bayesian sample size studies appears in [2]. Other methods used for sample size determination in ROC studies include the average length criterion [7] and the average variance criterion [9].

There has been much recent activity in the development of nonparametric approaches to ROC data analysis. Frequentist methods have used empirical likelihood [10], semiparametric generalized linear models [11], and kernel density estimation [12, 13] to estimate ROC curves. Bayesian DP mixtures of normals were used by [14] and [15]. A comparison of Bayesian DPMs to a two-group normal ROC analysis by [14] found that, in a particular example with sample sizes of about 1500 in each group, the parametric approach overestimated AUC when both groups have multimodal distributions.

ROC analysis using Bayesian mixtures of Polya tree models was developed by [15] and [16], but we focus solely on DPMs and do not explore Polya tree models here.

In the next section we describe the DPM model used in this study, including prior specification, posterior approximation, and AUC estimation and testing. Section 3 presents the general computational algorithm for predictive power estimation under the DPM and two-group normal procedures. Section 4 reports on a simulation study that compares sample sizes under the two procedures using data sets for which normality is satisfied or not. While most of our presentation focuses on studies involving one diagnostic test, in Section 4.5 we detail sample size methods when the goal is to directly compare the accuracy of two tests. A summary of our research is provided in Section 5.

2. Parametric and Nonparametric Models

We consider designs that involve a single (imperfect) continuous diagnostic test. We assume that true disease status has been ascertained by some other means for every sampled individual. Let y_{Di} (i = 1, ..., n_D) and y_{D̄j} (j = 1, ..., n_D̄) denote (possibly transformed) test scores from random samples of individuals within the D and D̄ populations, respectively. The standard two-group normal model assumes independent normal distributions for the test scores:

y_{Di} | μ_D, σ_D² ~ N(μ_D, σ_D²)   and   y_{D̄j} | μ_D̄, σ_D̄² ~ N(μ_D̄, σ_D̄²).

For sample size determination, it is common to use a diffuse proper prior for (μ_D, μ_D̄, σ_D², σ_D̄²), e.g., independent normals with large variances for the mean parameters and independent gamma or inverse gamma priors for the variances or standard deviations. The modeled AUC is given by

AUC = Φ( (μ_D − μ_D̄) / √(σ_D² + σ_D̄²) ),   (1)

where Φ(·) is the cumulative distribution function of the standard normal distribution.

A flexible Bayesian nonparametric alternative is the DPM of normals model with a countably infinite number of mixture components. Here, test scores are modeled as independent N(μ_{Di}, σ_D²) or N(μ_{D̄j}, σ_D̄²), where the μ_{Di}'s and μ_{D̄j}'s are treated as latent variables sampled from unspecified probability distributions G_D and G_D̄, respectively. The independent prior distributions for G_D and G_D̄ are taken to be Dirichlet processes with normal base distributions and weight parameters α_D and α_D̄. Hierarchical models for the two-group data are given by:

y_{Di} | μ_{Di}, σ_D² ~ N(μ_{Di}, σ_D²)        y_{D̄j} | μ_{D̄j}, σ_D̄² ~ N(μ_{D̄j}, σ_D̄²)
μ_{Di} | G_D ~ G_D                             μ_{D̄j} | G_D̄ ~ G_D̄
G_D ~ DP(N(μ_{GD}, σ_{GD}²), α_D)              G_D̄ ~ DP(N(μ_{GD̄}, σ_{GD̄}²), α_D̄).

A popular approach to constructing a Dirichlet process is the stick-breaking procedure developed by [17]. Ishwaran and James [18] connected the finite stick-breaking process with the finite Dirichlet distribution to approximate the DP using a Dirichlet-multinomial allocation (DMA) [see also 19]. It is well known that the DP is discrete with probability one, and thus samples from a DP will cluster. The DMA approximation to a DPM involves clustering as well.
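The clustering property is easy to see numerically. The short R sketch below is our own illustration (not code from the paper): it draws from a truncated stick-breaking construction of a DP with a N(0, 1) base distribution and α = 1; although 200 values are sampled, only a handful of distinct atoms typically appear.

    # Illustration (ours): draws from a (truncated) DP cluster on a few atoms.
    set.seed(5)
    alpha <- 1; M <- 50
    v <- rbeta(M, 1, alpha)
    w <- v * cumprod(c(1, 1 - v[-M]))      # stick-breaking weights
    theta <- rnorm(M)                      # atoms drawn from the N(0, 1) base
    draws <- sample(theta, 200, replace = TRUE, prob = w)
    length(unique(draws))                  # far fewer than 200 distinct values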

Let w_{Di} (w_{D̄j}) denote the sampled group corresponding to individual i in population D (individual j in population D̄). Under the DMA process, individual test scores from the D (or D̄) population are assigned to a cluster that is defined by a parameter μ_D[w_{Di}] (or μ_D̄[w_{D̄j}]), where w_{Di} (or w_{D̄j}) indexes the cluster that contains y_{Di} (or y_{D̄j}). Denote the maximum number of mixture components by M_D or M_D̄. Test scores with the same value of μ_D[w_{Di}] (or μ_D̄[w_{D̄j}]) belong to the same mixture component. The vectors of mixture probabilities, p_D and p_D̄, are modeled with Dirichlet prior distributions. Letting μ_D and μ_D̄ denote the vectors of distinct mean parameters for the mixture components in populations D and D̄, respectively, the DPM model in this context has the form:

y_{Di} | w_{Di}, μ_D, σ_D² ~ N(μ_D[w_{Di}], σ_D²)        y_{D̄j} | w_{D̄j}, μ_D̄, σ_D̄² ~ N(μ_D̄[w_{D̄j}], σ_D̄²)
μ_D[w] ~ N(μ_{GD}, σ_{GD}²)                              μ_D̄[w] ~ N(μ_{GD̄}, σ_{GD̄}²)
w_{Di} | p_D ~ Multinomial(1; p_D)                       w_{D̄j} | p_D̄ ~ Multinomial(1; p_D̄)
p_D | α_D ~ Dirichlet(α_D/M_D, ..., α_D/M_D)             p_D̄ | α_D̄ ~ Dirichlet(α_D̄/M_D̄, ..., α_D̄/M_D̄).

Independent normal-gamma priors can be placed on (μ_{GD}, σ_{GD}²) and (μ_{GD̄}, σ_{GD̄}²), and independent gamma priors can be used for σ_D² and σ_D̄². Various approaches have been used to handle α_D and α_D̄. Erkanli et al. [14] and [15] assigned gamma priors to the weight parameters, while [19] and [20] used a fixed value, such as 1, in conjunction with a sensitivity analysis for this choice. The choice of the maximum number of mixture components in the DMA process is also important. In their ROC study, Erkanli et al. [14] used a maximum of 10 for both M_D and M_D̄.

The AUC comparing the separation of the mixture components corresponding to, say, clusters w and w′ is given by

AUC(w, w′) = Φ( (μ_D[w] − μ_D̄[w′]) / √(σ_D² + σ_D̄²) ),   w = 1, ..., M_D;  w′ = 1, ..., M_D̄.   (2)

Erkanli et al. [14] show that the overall AUC is a weighted average of the components in (2), where the weights are determined by the mixture proportions in the vectors p_D and p_D̄, namely

AUC = Σ_{w=1}^{M_D} Σ_{w′=1}^{M_D̄} p_D[w] p_D̄[w′] AUC(w, w′).   (3)

The Bayesian model is fitted by generating a Monte Carlo approximation to the posterior distribution using Gibbs sampling. Estimates of ROC curves and AUC are determined by posterior means and outer percentiles. History plots, autocorrelations, and multiple chains run from a variety of starting values for a random selection of data sets are used to evaluate mixing and convergence of the Markov chain sampling procedure.
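Given values (or posterior draws) of the component means, the within-group variances, and the mixture weights, equations (2) and (3) amount to a small weighted sum. The helper below is a minimal R sketch; the function and argument names are ours, not taken from the paper's posted code, and in practice it would be applied to each MCMC iterate.

    # Sketch of equations (2)-(3): overall AUC for the mixture model.
    auc_dpm <- function(mu_D, p_D, var_D, mu_Dbar, p_Dbar, var_Dbar) {
      # AUC(w, w') for every pair of clusters, equation (2)
      pairwise <- pnorm(outer(mu_D, mu_Dbar, "-") / sqrt(var_D + var_Dbar))
      # weighted average over the mixture proportions, equation (3)
      sum(outer(p_D, p_Dbar) * pairwise)
    }

    # Illustrative example: two clusters in D, one cluster in D-bar
    auc_dpm(mu_D = c(0, 3), p_D = c(0.4, 0.6), var_D = 2,
            mu_Dbar = 0, p_Dbar = 1, var_Dbar = 1)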

3. Power Criterion and Simulation Algorithm

We determine a sample size combination (n_D, n_D̄) that yields a high predictive power of concluding that AUC > k when in fact the test's AUC exceeds k. The value of k can be determined from the AUC of a competitor's test, or it can be chosen according to a desired level of accuracy, e.g., 0.70, 0.80, or 0.90. The minimum value is k = 0.50, which would be used if the aim of the study is simply to establish that the test performs better than chance.

We employ a sample size criterion that is based on predictive power. Our simulation-based approach iterates between two steps: for a given sample size combination, first simulate multiple data sets under a model of interest from populations D and D̄, and then fit a DPM (or two-group normal) model to each simulated data set using Gibbs sampling and calculate the posterior probability that AUC > k. The average of the posterior probabilities across the simulated data sets is the predictive power. The procedure requires two types of prior distributions that are aligned with the two steps. Sampling priors are used to generate the model parameters needed to simulate data sets, and fitting priors are used in the analysis of the generated data [21]. The purpose of sampling priors is to incorporate uncertainty about the otherwise assumed true values of input parameters. These distributions (e.g., normal or uniform) are usually narrowly concentrated around a presumed true value. Fitting priors, on the other hand, usually have large variability so as not to overly influence the power analysis.

For each simulated data set, we calculate the posterior probability that AUC > k, namely Pr(AUC > k | y_D^f, y_D̄^f), where y_D^f and y_D̄^f denote vectors of (future) test scores simulated from the D and D̄ populations, respectively. Sample sizes n_D and n_D̄ can be used for the future study if, averaged over the diseased and non-diseased populations, the posterior probability is sufficiently high:

E[ Pr(AUC > k | y_D^f, y_D̄^f) ] ≥ 1 − β.   (4)

The specific value of β depends on the nature of the problem, but, in general, reasonable choices include 0.10 and 0.20. Based on (4), we select the current sample size combination if the average posterior probability that AUC is above the benchmark k is at least as high as 1 − β.

Under the two-group normal model, we require prior distributions for μ_D, μ_D̄, σ_D², and σ_D̄². Since the two-group normal is the base distribution of the DPM model, the same prior distributions are used for μ_{GD}, μ_{GD̄}, σ_{GD}², and σ_{GD̄}². For instance, we can assign a sampling prior of Uniform(2.5, 3.5) to the parameter μ_D, which reflects some uncertainty about a prior guess of 3. Fitting priors for these parameters are diffuse normal / inverse gamma joint distributions.

The computing algorithm below yields estimated predictive power across a range of sample size combinations. For every selected sample size pair, we simulate B data sets. The subscript t ∈ {1, ..., B} refers to the iteration in the simulation. The following steps are used for a single sample size combination (n_D, n_D̄) and are repeated across a set of sample size pairs.

1. Generate B parameter sets from the sampling priors and then, for i = 1, ..., n_D and j = 1, ..., n_D̄, simulate test scores y_{Dit} and y_{D̄jt}, t = 1, ..., B, from their presumed true population distributions.

2. Fit the DPM using a fitting prior for (μ_{GD}, σ_{GD}², μ_{GD̄}, σ_{GD̄}²). The DMA process involves α_D and α_D̄, which can be set equal to constants or assigned fitting priors. Specify M_D and M_D̄, and specify priors for p_D and p_D̄.

3. Numerically approximate the posterior distribution of AUC_t, as defined in equation (3), by first computing the individual components AUC_t(w, w′) in (2).

4. Numerically approximate the posterior probability that AUC_t exceeds k. A Monte Carlo approximation to this posterior probability is computed as the proportion of iterates for which AUC_t is greater than k, namely

r_t = (1/m) Σ_{s=1}^{m} I(AUC_t^(s) > k),

where I(·) denotes the indicator function and AUC_t^(s) denotes iterate s out of m simulated values from the posterior distribution of AUC_t.

5. Calculate the average of the r_t's, Σ_{t=1}^{B} r_t / B, which approximates the predictive power at the sample size combination (n_D, n_D̄).

Finally, repeat steps 1-5 over a range of sample size combinations to determine a sample size that achieves the desired predictive power. To calculate predictive power under the two-group normal model instead of the DPM, we use methods from [2]. Specifically, in step 2 we fit the two-group normal model, and in step 3 the approximation of the posterior distribution of AUC_t follows from formula (1) instead of (3).

We implemented the algorithm for the DPM model using a combination of the software packages R and WinBUGS. The parameters and data sets are generated in R, while posterior distributions are approximated using WinBUGS. A program to perform our procedure will be posted to our website.
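The following R sketch puts steps 1-5 together for the two-group normal version of the algorithm. To keep it self-contained we replace the WinBUGS/Gibbs step with exact posterior draws under Jeffreys-type priors, which the diffuse fitting priors described above approximate, and we use the Section 4.1 sampling priors for concreteness; the function names, the benchmark k = 0.90, and the reduced number of simulated data sets B are our choices, not the paper's posted code.

    # Hedged sketch (ours) of steps 1-5 under the two-group normal model.
    post_auc_normal <- function(yD, yDbar, m = 5000) {
      draw <- function(y) {
        n  <- length(y)
        s2 <- var(y)
        sigma2 <- (n - 1) * s2 / rchisq(m, df = n - 1)   # sigma^2 | y
        mu     <- rnorm(m, mean(y), sqrt(sigma2 / n))    # mu | sigma^2, y
        list(mu = mu, sigma2 = sigma2)
      }
      dD <- draw(yD); dDb <- draw(yDbar)
      pnorm((dD$mu - dDb$mu) / sqrt(dD$sigma2 + dDb$sigma2))  # AUC draws, equation (1)
    }

    predictive_power <- function(nD, nDbar, k = 0.90, B = 200) {
      r <- numeric(B)
      for (t in 1:B) {
        # Step 1: draw parameters from the sampling priors, then simulate one data set
        muD  <- runif(1, 2.5, 3.5);  varD  <- runif(1, 1.8, 2.2)
        muDb <- runif(1, -0.5, 0.5); varDb <- runif(1, 0.8, 1.2)
        yD    <- rnorm(nD,    muD,  sqrt(varD))
        yDbar <- rnorm(nDbar, muDb, sqrt(varDb))
        # Steps 2-4: posterior AUC draws and the probability that AUC exceeds k
        r[t] <- mean(post_auc_normal(yD, yDbar) > k)
      }
      mean(r)   # Step 5: predictive power at (nD, nDbar)
    }

    set.seed(1)
    predictive_power(30, 30)

Sweeping predictive_power over a grid of (nD, nDbar) values produces power curves analogous in shape to those in Figure 1.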

13 bimodal; both occur in applications [e.g., 3]. Finally, we consider the two-test scenario where, in population, scores for one of the tests are right skewed and for the other test they have a mixture distribution. All simulations in this section used B=1000 data sets with posterior approximations based on m=5000 Monte Carlo iterations after a 1000 iteration burn-in. We did not identify any difficulties with convergence Two-group normal data ata are generated under the two-group normal model. Suppose our best guesses for the means and variances, µ, µ, σ, and σ, are 3.0, 0,.0, and 1.0. Using formula (1), our best guess for AUC is therefore Sample size combinations of ( n, n ) = (10, 10), (0, 0),,(100,100) are used with k = We assume that on average, test scores in the group exceed those in the group, which is reflected in non-overlapping sampling priors and µ ~ Uniform(.5, 3.5) µ ~ Uniform(-0.5, 0.5). Generally, test scores tend to be more variable in the group, so we used Uniform(1.8,.) and Uniform(0.8, 1.) as the sampling priors for and σ, respectively. Under the two-group normal model, we used a diffuse proper fitting prior that approximates the Jeffreys prior, namely N(0,1000) for means and IG(0.001, 0.001) for variances. These distributions were also used in the PM model as fitting priors for µ G and µ G, and for σ G and σ σ G. In the PM model we set the maximum number of mixing components, M and M, at 10 and the concentration parameters, α and α, at 1. Thus, the parameters in the irichlet priors on p are each 0.1. p and 1

The calculated predictive power under the parametric and DPM methods was similar across the range of sample sizes considered; the largest difference in predictive power between the two methods did not exceed 0.02. Figure 1 presents a cubic polynomial spline for each approach across n_D = n_D̄ = n. To achieve 80% predictive power, both methods require about 38 total subjects, while 90% predictive power is attained with 6 fewer subjects per cohort under the two-group normal model than under the DPM. With 100 subjects sampled from each cohort, the predictive power under the parametric model and under the DPM was nearly identical.

Figure 1 about here.

Using 1000 simulated data sets, we also investigated the variability in the computed value of predictive power under the two-group normal and DPM models. For a total sample size of 20, the variance under the two-group normal was 0.068; a similar value was seen for the DPM (variance of 0.063). With larger sample sizes of n_D = n_D̄ = 100, the DPM and normal models again showed similar variability (e.g., a variance of 0.048 under the DPM).

We also investigated the influence on predictive power of a larger degree of overlap between the distributions of test scores for the diseased and non-diseased cohorts. Specifically, we changed the sampling prior mean of μ_D̄ from 0 to 2, so that our best guess for the AUC is approximately 0.7. We found that, across sample size combinations ranging from 10 per cohort to 100 per cohort with k = 0.6 and k = 0.65, predictive power for the two-group normal analysis is only slightly larger than predictive power computed under the DPM (Table 1).

Table 1 about here.

The selection of the maximum number of mixture components (M_D and M_D̄) in the DPM model was also explored in this setting. We let α_D = α_D̄ = 1 and M_D = M_D̄ = 5, 10, or 20, so the parameters in the Dirichlet priors on p_D and p_D̄ were 0.2, 0.1, or 0.05, respectively. The calculated predictive power was very similar for these three choices (Table 2). The computational time under M_D = M_D̄ = 10 was about 2/3 of what it was with M_D = M_D̄ = 20. Thus, using 5 or 10 may be a reasonable initial choice, and a single run with a larger value can be used to assess how results change when more structure is allowed in the model.

Table 2 about here.

4.2. Skewed data

We compared the predictive power calculated under the two-group normal and DPM models when the D group has test scores that vary according to a right-skewed distribution. We are particularly interested in the potential for robustness of the two-group normal model in terms of predictive power, without resorting to the use of a data transformation. The Box-Cox power transformation is a popular method to induce normality, but it is difficult to determine its transformation parameter in sample size problems without access to a sufficient amount of pilot data. We note, however, that in the rare case that an appropriate normal-generating transformation were known ahead of seeing the data, then it would be known that the normal-normal model would be appropriate for the future data, in which case it would be unnecessary to consider our nonparametric approach.

16 appropriate for the future data, in which case it would be unnecessary to consider our nonparametric approach. The y i s were sampled from an exponential(θ) distribution with θ drawn from a Gamma(0, 0 ) sampling prior. We therefore have as a best guess that diseased subjects have a mean test score of with a variance of and allow for uncertainty about these values. Since a normal distribution will often be appropriate for the group, the y s were generated using the same sampling model and priors as in Section j 4.1. Therefore, the "true" AUC is in a neighborhood of 0.79, which we used as our best guess. We used the same reference fitting priors and we compared the predictive power achieved with a total of 0 subjects up to 00 total subjects. Two comparison sizes were used, namely k = 0.70 and k = For both sizes, the predictive power estimated from the PM is larger compared to the power from the mis-specified two-group normal model, as demonstrated in Table 3. When k = 0.70, the predictive power under the parametric and PM models is similar except at small sample sizes. However, with the higher effect size, the difference in predictive power between the two approaches increases. For k = 0.75, the predictive power under the PM is on average about 6.3% higher than for the parametric method. For 70% power with an effect size of 0.75, 7 diseased subjects are needed according to the PM model whereas at least 50 diseased subjects are needed according to the twogroup normal. Table 3 about here. 15

We noted that when k = 0.70 and the sample sizes are large for both the D and D̄ groups, the predictive power under the two-group normal and DPM approaches is similar. This occurs, in part, because at large sample sizes the two-group normal appears to overestimate and underestimate different portions of the ROC curve. Figure 2 presents estimates of the ROC curve using data generated as the 1/101, 2/101, ..., 100/101 quantiles of the exponential distribution for the D group and of the standard normal distribution for the D̄ group. The DPM model yields an estimated ROC curve that more closely tracks the empirical ROC curve. The two-group normal model appears to drastically overestimate and underestimate different portions of the ROC curve. This realistic example illustrates the utility of flexible models for both ROC data analysis and sample size determination.

Figure 2 about here.

4.3. Bimodal data

Here we compare the sample size requirements under the two-group normal and DPM models for bimodal data. We assume population D̄ follows the normal distribution with the same sampling and fitting priors as in Section 4.1, but population D is bimodal. Specifically, the sampling model for y_{Di} is the mixture π_1 N(μ_{D1}, σ_{D1}²) + π_2 N(μ_{D2}, σ_{D2}²), where π_1 is drawn from a beta sampling prior and π_2 = 1 − π_1. The parameters μ_{D2} and σ_{D2}² were given the same sampling priors as μ_D and σ_D² in Section 4.1, i.e., Uniform(2.5, 3.5) and Uniform(1.8, 2.2), respectively. The sampling prior for σ_{D1}² was Uniform(7.8, 8.2), so the mixture distribution for the D group is considerably more dispersed than either single normal component.

We considered two sets of sampling priors for μ_{D1}, one with mean 0 and the other with mean 1. Under these two sampling priors, the corresponding AUCs are about 0.85 and 0.90, respectively. In both cases we used the same diffuse fitting priors as in Section 4.1, and predictive power was calculated to detect AUC above a fixed benchmark k. Of particular interest was whether the difference in predictive power between the two-group normal and DPM models changes with decreased test accuracy, i.e., with more overlap between the data distributions for the D and D̄ populations. Figure 3 shows that, across the range of sample sizes considered, the amount of separation between the distributions of test scores in this setting does not have a practical impact on the power difference between the DPM and the two-group normal model. For instance, with an AUC of 0.85, the predictive power under the DPM, averaged over the sample sizes considered, is only slightly larger than for the two-group normal. The difference in predictive power increases slightly when the true AUC is raised to 0.90. However, an interesting secondary result in this setting is that the DPM method generates increasing power functions whereas, especially for AUC = 0.85, the predictive power function under the two-group normal exhibits a zig-zag pattern and decreases for sample sizes of 60 to 80, 100 to 120, and 180 to 200.

Figure 3 about here.

The "true" (hypothetical) AUC in the case of a bimodal distribution for individuals who are D and a single normal distribution for population D̄ can often be well approximated by an AUC that is obtained using a single normal distribution in place of the true bimodal density, where the mean is the mixture mean and the variance is the true variance of the mixture. We calculated the "true" and "approximate" AUC values over a range of possible inputs and found near equivalence. On the other hand, it was also easy to construct scenarios where this was not the case. For example, with a 0.5 mixture of N(-1, 1) and N(3, 4) distributions for the D population, and a N(0, 1) for the D̄ population, the "true" and "approximate" AUCs are 0.576 and 0.644, respectively. We found many other such instances as well. Thus, if the normal-normal model is used, the estimate of the D population density will be a single normal and consequently the estimated AUC in this setting will be an estimate of the "approximate" AUC. So in the above instance (0.576 versus 0.644), the sample size expected to detect a larger (false) value will be different than it would be when using the more appropriate nonparametric approach, which would be targeting the true value.
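The contrast between the "true" and "approximate" AUCs in the example above can be verified directly; the short R sketch below (ours) reproduces the 0.576 versus 0.644 comparison.

    # "True" AUC for a 0.5/0.5 mixture of N(-1, 1) and N(3, 4) in the D group
    # versus the single-normal approximation with the mixture mean and variance.
    true_auc <- 0.5 * pnorm((-1 - 0) / sqrt(1 + 1)) + 0.5 * pnorm((3 - 0) / sqrt(4 + 1))
    mix_mean <- 0.5 * (-1) + 0.5 * 3                      # = 1
    mix_var  <- 0.5 * 1 + 0.5 * 4 + 0.25 * (3 - (-1))^2   # = 6.5
    approx_auc <- pnorm(mix_mean / sqrt(mix_var + 1))
    c(true_auc, approx_auc)   # roughly 0.58 and 0.64, matching 0.576 vs. 0.644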

4.4. Serologic data example

Hanson et al. [15] analyzed serologic data for Johne's disease in dairy cattle using DPM models. Test scores from the non-infected cows had an approximately normal distribution, whereas test values from the infected cows were bimodal, with modes located near 2 and 5. We used the posterior information from [15] to simulate data with a best guess of AUC at 0.89, and compared the predictive power at benchmark k = 0.80 under the two-group normal and DPM models. The parametric and DPM methods gave very similar predictive power at total sample sizes of 80 and 100.

Unlike the skewed data case, in this example with normal and bimodal data the two methods again perform similarly.

4.5. Two diagnostic tests

We extend our one-sample approach to studies that directly compare two tests. In the previous sections we assumed that a standard test has been in use for some time and that its AUC is well established. When the operating characteristics of the standard test are not established, a paired design is more appropriate. In this scenario both tests are applied to each individual, where for ease of exposition we continue to assume that one is a standard (S) test and the other is a newly developed (N) competitor. Additionally, in this section we consider a sample size criterion that targets precise estimation of the difference in AUCs between the two tests. The average length criterion used here finds the minimum sample size such that 95% posterior intervals for the difference in AUCs have a specified average width. The study goal is to establish the superiority of the new test relative to the standard test. The pairs (S, N) of test scores for non-diseased individuals, (y_{D̄j,S}, y_{D̄j,N}), are assumed to be correlated (ρ is about 0.8) and to follow a bivariate normal distribution with mean vector (0, 0) and variance vector (1, 1). For diseased individuals, we assume the y_{Di,S}'s have the same exponential distribution used in Section 4.2, and that y_{Di,N} is a linear shift of y_{Di,S}, specifically through the regression y_{Di,N} = y_{Di,S} + 2 + ε_i, where ε_i is normal(0, 1.5). Therefore, the difference in AUCs between the N and S tests is approximately 0.15, given that AUC_N is about 0.96 and AUC_S is about 0.81.
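A minimal R sketch of this paired data-generating mechanism is given below. The exponential rate, the size of the shift for the new test, and the reading of N(0, 1.5) as a variance are assumptions chosen to be consistent with the approximate AUC values quoted above, not the paper's exact inputs; the empirical (Mann-Whitney) AUCs are computed only as a sanity check.

    # Sketch (ours) of the paired-design data generation for the S and N tests.
    set.seed(4)
    nD <- 50; nDbar <- 50; rho <- 0.8

    # Non-diseased: correlated standard-normal scores for the S and N tests
    z <- rnorm(nDbar)
    yDbar_S <- z
    yDbar_N <- rho * z + sqrt(1 - rho^2) * rnorm(nDbar)

    # Diseased: skewed S scores; N scores are a noisy upward shift of S
    yD_S <- rexp(nD, rate = 0.8)                    # rate 0.8 is an assumption
    yD_N <- yD_S + 2 + rnorm(nD, 0, sqrt(1.5))      # shift and noise variance assumed

    # Empirical AUC for each test; values should sit near the 0.81 and 0.96 best guesses
    auc_hat <- function(x_D, x_Dbar) {
      mean(outer(x_D, x_Dbar, ">") + 0.5 * outer(x_D, x_Dbar, "=="))
    }
    c(S = auc_hat(yD_S, yDbar_S), N = auc_hat(yD_N, yDbar_N))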

We compare both average length and predictive power when bivariate normal and DPM models are fitted to the simulated data sets, in both cases using diffuse priors. We investigate the predictive power for testing AUC_N − AUC_S > 0.05 and set a desired average length for the 95% posterior interval of the AUC difference. Results in Table 4 show that, in this scenario, the average lengths are comparable under both methods, with a sample size of 50 from each group achieving the desired average length under both approaches. However, predictive power under the DPM is consistently lower than that under the parametric model at the sample sizes considered, reflecting that the two methods lead to different AUC estimates. We found that, across the sample size combinations considered, the two-group normal model consistently underestimated AUC_S, whereas the DPM model produced estimates that were close to the best guess of 0.81, especially as the sample size increased. Meanwhile, both methods provided similarly accurate estimates of AUC_N, because the data distribution under the new test is normal for the D̄ population and approximately normal for the D population. As a result, the posterior means of AUC_N − AUC_S under the two-group normal method were on average 8.1% higher across simulated data sets than those under the DPM method. This led to the illusion of higher predictive power at each sample size combination under the normal model compared to the DPM method. This example illustrates the importance of monitoring the individual AUCs in addition to their difference when using predictive power for sample size determination in paired designs.

Table 4 about here.

5. Conclusions

We developed a simulation-based approach that uses nonparametric Dirichlet process mixtures for sample size and predictive power calculations in ROC studies designed to compare a continuous medical test to a known standard test. An advantage of the DPM model for ROC data analysis is its capability to accommodate non-standard features such as skewness and multimodality that a default two-group normal analysis will fail to identify. We showed that predictive power computed under the traditional parametric approach may not be robust to model mis-specification when data are right skewed, but can be fairly robust to bimodal data within the diseased population. For bimodal data in population D and univariate normal data in population D̄, the similarity in predictive power under the DPM and two-group normal approaches is not surprising, since it is common in this setting for the region of distributional overlap to mimic the shape of the two-group normal. A further finding, based on the average length criterion in paired designs where the study goal is to estimate the difference between two AUCs, was that, although the DPM is a much more highly structured model than the normal-normal, the precision in interval estimates was similar for the two approaches in a particular simulation study.

We emphasize that a reason estimated AUCs can be the same in both skewed and bimodal situations is that the normal approximation for population D is often adequate for estimating AUC = Pr(X > Y), where the random variable X corresponds to test scores from population D and the random variable Y corresponds to test scores from population D̄, while at the same time the corresponding estimate of ROC(t) is biased high for small t and biased low for large t, resulting in cancellation when calculating AUC.

Therefore, the subsequent data analysis may result in a biased estimate of the ROC curve (and AUC) when the two-group normal model is applied to data that are skewed or multimodal. Thus, we recommend using a nonparametric power specification and a subsequent nonparametric data analysis. The DPM method is useful for sample size determination in ROC studies since it is partly immune to model mis-specification. An example of software code to implement the DPM and two-group normal methods that is easily modifiable can be obtained from our website or by emailing the corresponding author.

Sample size calculations are often performed using simpler models for both the data-generating distribution and the model planned for analyzing the data. Our approach, on the other hand, allows for the use of more complex data-generating distributions, since our nonparametric modeling of the data in the sample size calculation will adapt appropriately to the complexity of the data. Primary motivations in this report are thus to advertise the use of nonparametric methods for ROC data analysis and to provide methods for the corresponding sample size and predictive power calculations, thereby matching the sample size analysis and the data modeling. Since a major goal is the estimation of the ROC curve itself, and since parametric models for data are often overly restrictive, the DPM model seems ripe for use in making inferences. It only makes sense, then, to perform predictive power calculations using the same flexible nonparametric family of models, since this should provide a more accurate quantification of the sample sizes needed in the subsequent analysis.

The question arises: when would one decide to select sample size based on our hypothesis testing criterion versus the estimation criterion?

It is common in medical testing to have a standard reference marker that has been in use over some period of time by a variety of researchers and practitioners, with different cutoffs applied to meet different sensitivity-specificity requirements. Higher cutoffs usually lead to higher specificity and lower sensitivity. Moreover, it is also common for new markers to be developed with the goal of providing higher sensitivity-specificity pairings across cutoffs than would be achieved with the standard marker, which amounts to having a corresponding ROC curve that dominates the curve for the standard test. If the new marker can be shown to have an AUC that is appreciably greater than that of the standard marker, a step will have been taken in this direction. Thus, our hypothesis testing criterion for selecting sample size applies directly when it is of interest to make a decision about the preference for one marker over the other. On the other hand, the estimation criterion would be used when it is desired to establish, precisely, the value of an AUC for a single marker or the difference in AUCs for two markers. In either case, the goal is to find a sample size that will give sufficient precision for inference.

Throughout this paper we have assumed that true disease status could be determined by some means other than the test(s) under study. When this gold-standard setting fails to hold and the true infection status of sampled individuals is unknown, additional structure is needed to model and impute the latent status. Branscum et al. [16] developed a semiparametric approach for joint analysis with no gold-standard data. By ignoring the added uncertainty attached to unknown infection status, sample size estimates could fall far below what is actually required for a test-accuracy study. Further research is needed to extend the gold-standard methods presented in the current study to a nonparametric framework for sample size determination in designs with unidentified disease status.

For instance, the parametric methods in [2] for continuous tests without a gold standard could potentially be extended nonparametrically, or the nonparametric model in [16] could be applied directly to the task of sample size determination. Both approaches require input about disease prevalence in the source population, and informative priors become mandatory when the model lacks identifiability. Similar research for binary tests appears in [7].

Acknowledgements

The authors are grateful to two anonymous reviewers for their comments, which helped improve the paper.

References

1. Obuchowski NA, McClish DK. Sample size determination for diagnostic accuracy studies involving two-group normal ROC curve indices. Statistics in Medicine 1997; 16.
2. Cheng D, Branscum AJ, Stamey JD. A Bayesian approach to sample size determination for studies designed to evaluate continuous medical tests. Computational Statistics and Data Analysis 2010; 54.
3. Hanson TE, Branscum AJ, Gardner IA. Multivariate mixtures of Polya trees for modeling ROC data. Statistical Modelling 2008; 8.
4. Wang F, Gatsonis CA. Hierarchical models for ROC curve summary measures: design and analysis of multi-reader, multi-modality studies of medical tests. Statistics in Medicine 2008; 27.
5. Cheng D, Stamey JD, Branscum AJ. Bayesian approach to average power calculation for binary regression with misclassified outcomes. Statistics in Medicine 2009; 28.
6. Cheng D, Branscum AJ, Stamey JD. Accounting for response misclassification and covariate measurement error improves power and reduces bias in epidemiologic studies. Annals of Epidemiology 2010; 20.
7. Dendukuri N, Rahme E, Bélisle P, Joseph L. Bayesian sample size determination for prevalence and diagnostic test studies in the absence of a gold standard test. Biometrics 2004; 60.
8. Cheng D, Stamey JD, Branscum AJ. A general approach to sample size determination for prevalence surveys that use dual test protocols. Biometrical Journal 2007; 49.
9. Stamey JD, Seaman JW, Young DM. Bayesian sample-size determination for inference on two binomial populations with no gold standard classifier. Statistics in Medicine 2005; 24.

10. Qin G, Zhou X-H. Empirical likelihood inference for the area under the ROC curve. Biometrics 2006; 62.
11. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, 2003.
12. Zou KH, Hall WJ, Shapiro DE. Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Statistics in Medicine 1997; 16.
13. Lloyd CJ. Using smoothed receiver operating characteristic curves to summarize and compare diagnostic systems. Journal of the American Statistical Association 1998; 93.
14. Erkanli A, Sung M, Costello EJ, Angold A. Bayesian semi-parametric ROC analysis. Statistics in Medicine 2006; 25.
15. Hanson TE, Kottas A, Branscum AJ. Modelling stochastic order in the analysis of receiver operating characteristic data: Bayesian nonparametric approaches. Applied Statistics 2008; 57.
16. Branscum AJ, Johnson WO, Hanson TE, Gardner IA. Bayesian semiparametric ROC curve estimation and disease diagnosis. Statistics in Medicine 2008; 27.
17. Sethuraman J. A constructive definition of Dirichlet priors. Statistica Sinica 1994; 4.
18. Ishwaran H, James LF. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association 2001; 96.
19. Neal RM. Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 2000; 9.
20. Hanson TE, Branscum AJ, Johnson WO. Bayesian nonparametric modeling and data analysis: an introduction. In Handbook of Statistics, Volume 25, Dey DK, Rao CR (eds). Elsevier, 2005.
21. Wang F, Gelfand AE. A simulation-based approach to Bayesian sample size determination for performance under a given model and for separating models. Statistical Science 2002; 17.

Figure 1. Predictive power with k = 0.90 for the two-group normal (solid line) and DPM (dashed line) models when normal distributions are used to generate the data for the diseased and non-diseased populations.

29 Figure. Three estimates of the ROC curve when test scores of diseased subjects follow an exponential distribution with mean distributed standard normal and scores from non-diseased subjects are. The empirical estimate is plotted as a solid line, the Bayesian parametric estimate as a dashed line, and the PM estimate as a dotted line. 8

Figure 3. Predictive power curves (for a fixed benchmark k) when bimodal distributions generate data for the diseased group and a univariate normal distribution generates data for the non-diseased group. The solid line is for the two-group normal model and the dashed line is for the DPM analysis when AUC = 0.85. The dotted line is for the two-group normal and the dash-dotted line is for the DPM analysis when AUC = 0.90.

Table 1: Predictive power comparison when the diseased and disease-free groups follow normal distributions: (1) k = 0.6 for the two-group normal model; (2) k = 0.6 for the DPM model; (3) k = 0.65 for the two-group normal; (4) k = 0.65 for the DPM model.

Sample Size    Power 1    Power 2    Power 3    Power 4
(10, 10)
(20, 20)
(30, 30)
(40, 40)
(50, 50)
(60, 60)
(70, 70)
(80, 80)
(90, 90)
(100, 100)

Table 2: Predictive power comparison for three different maximum numbers of DPM mixture components, with α_D = α_D̄ = 1 and k = 0.9: (1) M_D = M_D̄ = 5; (2) M_D = M_D̄ = 10; (3) M_D = M_D̄ = 20.

Sample Size    Power 1    Power 2    Power 3
(20, 20)
(40, 40)
(60, 60)
(80, 80)
(100, 100)

Table 3: Predictive power comparison when the D group has a skewed data distribution: (1) k = 0.7 for the two-group normal model; (2) k = 0.7 for the DPM model; (3) k = 0.75 for the two-group normal; (4) k = 0.75 for the DPM model.

Sample Size    Power 1    Power 2    Power 3    Power 4
(10, 10)
(20, 20)
(30, 30)
(40, 40)
(50, 50)
(60, 60)
(70, 70)
(80, 80)
(90, 90)
(100, 100)

Table 4: Average length and predictive power comparisons for the two-test scenario computed under the bivariate normal model (1) and the DPM model (2).

Sample Size    Length 1    Length 2    Power 1    Power 2
(10, 10)
(20, 20)
(30, 30)
(40, 40)
(50, 50)


More information

Constructing Confidence Intervals of the Summary Statistics in the Least-Squares SROC Model

Constructing Confidence Intervals of the Summary Statistics in the Least-Squares SROC Model UW Biostatistics Working Paper Series 3-28-2005 Constructing Confidence Intervals of the Summary Statistics in the Least-Squares SROC Model Ming-Yu Fan University of Washington, myfan@u.washington.edu

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Bayesian Confidence Intervals for the Ratio of Means of Lognormal Data with Zeros

Bayesian Confidence Intervals for the Ratio of Means of Lognormal Data with Zeros Bayesian Confidence Intervals for the Ratio of Means of Lognormal Data with Zeros J. Harvey a,b & A.J. van der Merwe b a Centre for Statistical Consultation Department of Statistics and Actuarial Science

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Modelling Receiver Operating Characteristic Curves Using Gaussian Mixtures

Modelling Receiver Operating Characteristic Curves Using Gaussian Mixtures Modelling Receiver Operating Characteristic Curves Using Gaussian Mixtures arxiv:146.1245v1 [stat.me] 5 Jun 214 Amay S. M. Cheam and Paul D. McNicholas Abstract The receiver operating characteristic curve

More information

Lehmann Family of ROC Curves

Lehmann Family of ROC Curves Memorial Sloan-Kettering Cancer Center From the SelectedWorks of Mithat Gönen May, 2007 Lehmann Family of ROC Curves Mithat Gonen, Memorial Sloan-Kettering Cancer Center Glenn Heller, Memorial Sloan-Kettering

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements

Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Supplement to A Hierarchical Approach for Fitting Curves to Response Time Measurements Jeffrey N. Rouder Francis Tuerlinckx Paul L. Speckman Jun Lu & Pablo Gomez May 4 008 1 The Weibull regression model

More information

THE SKILL PLOT: A GRAPHICAL TECHNIQUE FOR EVALUATING CONTINUOUS DIAGNOSTIC TESTS

THE SKILL PLOT: A GRAPHICAL TECHNIQUE FOR EVALUATING CONTINUOUS DIAGNOSTIC TESTS THE SKILL PLOT: A GRAPHICAL TECHNIQUE FOR EVALUATING CONTINUOUS DIAGNOSTIC TESTS William M. Briggs General Internal Medicine, Weill Cornell Medical College 525 E. 68th, Box 46, New York, NY 10021 email:

More information

Richard D Riley was supported by funding from a multivariate meta-analysis grant from

Richard D Riley was supported by funding from a multivariate meta-analysis grant from Bayesian bivariate meta-analysis of correlated effects: impact of the prior distributions on the between-study correlation, borrowing of strength, and joint inferences Author affiliations Danielle L Burke

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements [Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements Aasthaa Bansal PhD Pharmaceutical Outcomes Research & Policy Program University of Washington 69 Biomarkers

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013 BSquare: An R package for Bayesian simultaneous quantile regression Luke B Smith and Brian J Reich North Carolina State University May 21, 2013 BSquare in an R package to conduct Bayesian quantile regression

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Appendix F. Computational Statistics Toolbox. The Computational Statistics Toolbox can be downloaded from:

Appendix F. Computational Statistics Toolbox. The Computational Statistics Toolbox can be downloaded from: Appendix F Computational Statistics Toolbox The Computational Statistics Toolbox can be downloaded from: http://www.infinityassociates.com http://lib.stat.cmu.edu. Please review the readme file for installation

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Modeling conditional dependence among multiple diagnostic tests

Modeling conditional dependence among multiple diagnostic tests Received: 11 June 2017 Revised: 1 August 2017 Accepted: 6 August 2017 DOI: 10.1002/sim.7449 RESEARCH ARTICLE Modeling conditional dependence among multiple diagnostic tests Zhuoyu Wang 1 Nandini Dendukuri

More information

A spatial scan statistic for multinomial data

A spatial scan statistic for multinomial data A spatial scan statistic for multinomial data Inkyung Jung 1,, Martin Kulldorff 2 and Otukei John Richard 3 1 Department of Epidemiology and Biostatistics University of Texas Health Science Center at San

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

Subjective and Objective Bayesian Statistics

Subjective and Objective Bayesian Statistics Subjective and Objective Bayesian Statistics Principles, Models, and Applications Second Edition S. JAMES PRESS with contributions by SIDDHARTHA CHIB MERLISE CLYDE GEORGE WOODWORTH ALAN ZASLAVSKY \WILEY-

More information

Bayesian model selection for computer model validation via mixture model estimation

Bayesian model selection for computer model validation via mixture model estimation Bayesian model selection for computer model validation via mixture model estimation Kaniav Kamary ATER, CNAM Joint work with É. Parent, P. Barbillon, M. Keller and N. Bousquet Outline Computer model validation

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic STATISTICS ANCILLARY SYLLABUS (W.E.F. the session 2014-15) Semester Paper Code Marks Credits Topic 1 ST21012T 70 4 Descriptive Statistics 1 & Probability Theory 1 ST21012P 30 1 Practical- Using Minitab

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

CMPS 242: Project Report

CMPS 242: Project Report CMPS 242: Project Report RadhaKrishna Vuppala Univ. of California, Santa Cruz vrk@soe.ucsc.edu Abstract The classification procedures impose certain models on the data and when the assumption match the

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Modelling stochastic order in the analysis of receiver operating characteristic data: Bayesian non-parametric approaches

Modelling stochastic order in the analysis of receiver operating characteristic data: Bayesian non-parametric approaches Appl. Statist. (2008) 57, Part 2, pp. 207 225 Modelling stochastic order in the analysis of receiver operating characteristic data: Bayesian non-parametric approaches Timothy E. Hanson, University of Minnesota,

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison

David B. Dahl. Department of Statistics, and Department of Biostatistics & Medical Informatics University of Wisconsin Madison AN IMPROVED MERGE-SPLIT SAMPLER FOR CONJUGATE DIRICHLET PROCESS MIXTURE MODELS David B. Dahl dbdahl@stat.wisc.edu Department of Statistics, and Department of Biostatistics & Medical Informatics University

More information

Bayesian Statistics. Debdeep Pati Florida State University. April 3, 2017

Bayesian Statistics. Debdeep Pati Florida State University. April 3, 2017 Bayesian Statistics Debdeep Pati Florida State University April 3, 2017 Finite mixture model The finite mixture of normals can be equivalently expressed as y i N(µ Si ; τ 1 S i ), S i k π h δ h h=1 δ h

More information

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2

Rejoinder. 1 Phase I and Phase II Profile Monitoring. Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 Rejoinder Peihua Qiu 1, Changliang Zou 2 and Zhaojun Wang 2 1 School of Statistics, University of Minnesota 2 LPMC and Department of Statistics, Nankai University, China We thank the editor Professor David

More information

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675.

McGill University. Department of Epidemiology and Biostatistics. Bayesian Analysis for the Health Sciences. Course EPIB-675. McGill University Department of Epidemiology and Biostatistics Bayesian Analysis for the Health Sciences Course EPIB-675 Lawrence Joseph Bayesian Analysis for the Health Sciences EPIB-675 3 credits Instructor:

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Bayes methods for categorical data. April 25, 2017

Bayes methods for categorical data. April 25, 2017 Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,

More information

Rank Regression with Normal Residuals using the Gibbs Sampler

Rank Regression with Normal Residuals using the Gibbs Sampler Rank Regression with Normal Residuals using the Gibbs Sampler Stephen P Smith email: hucklebird@aol.com, 2018 Abstract Yu (2000) described the use of the Gibbs sampler to estimate regression parameters

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models

Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December

More information

Chapter 2. Data Analysis

Chapter 2. Data Analysis Chapter 2 Data Analysis 2.1. Density Estimation and Survival Analysis The most straightforward application of BNP priors for statistical inference is in density estimation problems. Consider the generic

More information

Regression tree-based diagnostics for linear multilevel models

Regression tree-based diagnostics for linear multilevel models Regression tree-based diagnostics for linear multilevel models Jeffrey S. Simonoff New York University May 11, 2011 Longitudinal and clustered data Panel or longitudinal data, in which we observe many

More information

Machine Learning using Bayesian Approaches

Machine Learning using Bayesian Approaches Machine Learning using Bayesian Approaches Sargur N. Srihari University at Buffalo, State University of New York 1 Outline 1. Progress in ML and PR 2. Fully Bayesian Approach 1. Probability theory Bayes

More information

David Hughes. Flexible Discriminant Analysis Using. Multivariate Mixed Models. D. Hughes. Motivation MGLMM. Discriminant. Analysis.

David Hughes. Flexible Discriminant Analysis Using. Multivariate Mixed Models. D. Hughes. Motivation MGLMM. Discriminant. Analysis. Using Using David Hughes 2015 Outline Using 1. 2. Multivariate Generalized Linear Mixed () 3. Longitudinal 4. 5. Using Complex data. Using Complex data. Longitudinal Using Complex data. Longitudinal Multivariate

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Spatial Bayesian Nonparametrics for Natural Image Segmentation Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes

More information