Journal of Statistical Software


JSS Journal of Statistical Software
MMMMMM YYYY, Volume VV, Issue II. doi: /jss.v000.i00

GroupTest: Multiple Testing Procedure for Grouped Hypotheses

Zhigen Zhao

Abstract

In modern Big Data analysis, testing multiple hypotheses simultaneously has become an important tool for analyzing data arising from scientific studies in genetics, astronomy, the social sciences, and many other fields. The hypotheses can often be grouped according to the nature of the scientific investigation. For instance, genes can be grouped according to gene pathways, and nearby pixels in an image can be grouped together. Sparsity appears at both the between-group and within-group levels: only a small number of groups are significant, and only a small number of hypotheses within a significant group are significant. To fully incorporate the group information, Liu, Sarkar, and Zhao (2015) proposed the BSG model, described in Section 2. They further provided a methodology for testing these grouped hypotheses that controls the total posterior false discovery rate and the within-group false discovery rate. The GroupTest package implements all of these procedures in the R programming environment. The usage of GroupTest is illustrated in this paper with a simulated data set and a real data set from Liu et al. (2015).

Keywords: local fdr, group, posterior false discovery rate.

1. Introduction

In this era of Big Data, multiple hypothesis testing has become an important tool for analyzing data arising from various scientific studies. A tremendous amount of research over the last decades has focused on the theory, methodology, and applications of multiple comparisons. In their seminal work, Benjamini and Hochberg (1995) introduced the false discovery rate (FDR), an error rate that is more liberal than the familywise error rate. Over the last two decades, the BH method has been extended to many research scenarios; see, to name a few, Benjamini and Yekutieli (2001), Genovese and Wasserman (2002, 2004), Sun and Cai (2007, 2009), He, Sarkar, and Zhao (2015), Efron (2008), and Sarkar (2002, 2004, 2008).

However, many issues remain to be addressed when applying multiple testing procedures in scientific studies. One of them is group structure, where the hypotheses are naturally grouped according to, for instance, biological functions, gene pathways, or geological conditions. Ignoring the group structure when constructing multiple testing methods may lead to misleading conclusions (Efron 2008). Liu et al. (2015) analyzed the Adequate Yearly Progress (AYP) data set to compare the academic performance of socioeconomically disadvantaged and socioeconomically advantaged students across elementary schools, which are naturally grouped by school district. Sparsity appears at both the between-group (school district) and within-group (schools within a district) levels. Identifying the school districts that contain at least one school with a significant difference in academic performance between these two groups of students is as important as, if not more important than, identifying the schools themselves. In that paper, they introduced the BSG model to capture the group structure. Based on a decision-theoretic approach, Liu et al. (2015) proposed a two-stage method that provides decisions at both the between- and within-group levels while controlling the total posterior false discovery rate and the within-group false discovery rate. The GroupTest package provides all the functions needed to implement this two-stage method in the R programming environment.

This article is organized as follows. In Section 2, we restate the BSG model of Liu et al. (2015) for multiple testing with grouped hypotheses. In Section 3, we describe the functions in the package, explain the structure of the data, and introduce the AYP data set included in the package. In Section 4, we demonstrate how to use the package to make decisions.
We conclude the article in Section 5.

2. Model and Hypotheses Testing

Denote the hypotheses to be tested by $H_{gj}$, where $g = 1, 2, \ldots, G$ and $j = 1, 2, \ldots, m_g$; here $G$ is the total number of groups and $m_g$ is the number of hypotheses within the $g$-th group. Let $\theta_{gj}$ indicate whether the alternative hypothesis is true: $\theta_{gj} = 0$ if the null hypothesis corresponding to $H_{gj}$ is true and $\theta_{gj} = 1$ otherwise. Let $X_{gj}$ be the corresponding test statistic. The following model was introduced in Liu et al. (2015):

$$
\begin{aligned}
\theta_1, \ldots, \theta_G &\overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(\pi_1), \\
\theta_{gj} \mid \theta_g = 0 &\overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(0) \quad (\text{i.e., } P(\theta_{gj} = 0 \mid \theta_g = 0) = 1), \\
(\theta_{g1}, \ldots, \theta_{g m_g}) \mid \theta_g = 1 &\sim \frac{\prod_{j=1}^{m_g} \left\{ (1 - \pi_{2|1})^{1 - \theta_{gj}} \, \pi_{2|1}^{\theta_{gj}} \right\} I\!\left(\sum_j \theta_{gj} > 0\right)}{1 - (1 - \pi_{2|1})^{m_g}}, \\
X_{gj} \mid \theta_{gj} &\overset{\text{ind}}{\sim} (1 - \theta_{gj}) f_0(x_{gj}) + \theta_{gj} f_1(x_{gj}),
\end{aligned}
\tag{1}
$$

where $f_0(x)$ and $f_1(x)$ are the density functions of the test statistic under the null and alternative hypotheses, respectively. They call this model, with Truncated Bernoulli hidden states within each Significant Group, the BSG model. Here $\theta_g$, the indicator of whether the $g$-th group is significant, follows a Bernoulli distribution with success probability $\pi_1$; that is, the probability that a group is significant is $\pi_1$. If the group is not significant, then $\theta_{gj} = 0$ for all $j = 1, 2, \ldots, m_g$, implying that all the null hypotheses within this group are true. If a group is significant, the hidden states within this group follow a truncated Bernoulli distribution with success probability $\pi_{2|1}$, which ensures that at least one of the hypotheses within this group is false.

They further modeled the distribution of the test statistic as follows. When $\theta_{gj} = 0$, the test statistic $X_{gj}$ follows the standard normal distribution with density $\phi(x)$; when $\theta_{gj} = 1$, it follows a mixture of Gaussian distributions with $L$ components, namely

$$
f_1(x) = \sum_{l=1}^{L} c_l \, \frac{1}{\sigma_l} \, \phi\!\left( \frac{x - \mu_l}{\sigma_l} \right),
\tag{2}
$$

where $\sum_{l=1}^{L} c_l = 1$. The parameter of model (1)-(2) is $\omega = (\pi_1, \pi_{2|1}, c_l, \mu_l, \sigma_l)$, which can be estimated using the EM algorithm outlined in the appendix of Liu et al. (2015).

Let $\delta_{gj} \in \{0, 1\}$ be a decision on the hypothesis $H_{gj}$, where $\delta_{gj} = 1$ if we reject the null hypothesis $H_{gj}$ and $\delta_{gj} = 0$ otherwise. Traditionally, the FDR can be written as

$$
\mathrm{FDR} = E\left[ \frac{\sum_{g=1}^{G} \sum_{j=1}^{m_g} (1 - \theta_{gj}) \, \delta_{gj}}{\left\{ \sum_{g=1}^{G} \sum_{j=1}^{m_g} \delta_{gj} \right\} \vee 1} \right],
$$

where the hidden states $\theta_{gj}$ are assumed to be fixed and the expectation is taken with respect to $X$ only. Liu et al. (2015) considered the total posterior FDR for the grouped hypotheses, its counterpart from the Bayesian perspective:

$$
\mathrm{PFDR}_T(X) = E\left[ \left. \frac{\sum_{g=1}^{G} \sum_{j=1}^{m_g} (1 - \theta_{gj}) \, \delta_{gj}}{\left\{ \sum_{g=1}^{G} \sum_{j=1}^{m_g} \delta_{gj} \right\} \vee 1} \, \right| \, X \right].
\tag{3}
$$

At the within-group level, they further introduced the posterior FDR within a group, $\mathrm{PFDR}_{W_g}$, as

$$
\mathrm{PFDR}_{W_g}(X) = E\left[ \left. \frac{\sum_{j} \delta_{gj} (1 - \theta_{gj})}{\left\{ \sum_{j} \delta_{gj} \right\} \vee 1} \, \right| \, X \right].
$$

The total posterior FDR can be rewritten as a function of the following two quantities, which are crucial for the proposed method:

$$
\mathrm{fdr}_g(x) = P(\theta_g = 0 \mid x), \qquad \mathrm{fdr}_{j|g}(x) = P(\theta_{gj} = 0 \mid \theta_g = 1, x).
\tag{4}
$$

The first quantity measures whether the $g$-th group is significant; the second measures whether the $j$-th hypothesis within the $g$-th group is significant given that the group is significant.

Definition 2.1 (Multiple Testing Procedure for Grouped Hypotheses).

Step 1. Estimate the parameter $\omega$ using the EM algorithm;

Step 2. Calculate $\mathrm{fdr}_g$ and $\mathrm{fdr}_{j|g}$ according to (4);

Step 3. For each $g$, let $\mathrm{fdr}_{(1)|g} \le \mathrm{fdr}_{(2)|g} \le \cdots \le \mathrm{fdr}_{(m_g)|g}$ be the ordered values of $\mathrm{fdr}_{j|g}$, with $H_{g(1)}, \ldots, H_{g(m_g)}$ being the corresponding hypotheses, and find

$$
R_g = \max\left\{ k_g : \frac{1}{k_g} \sum_{j=1}^{k_g} \mathrm{fdr}_{(j)|g} \le \eta \right\},
$$

given $0 < \eta \le \alpha < 1$. Mark the hypotheses $H_{g(1)}, \ldots, H_{g(R_g)}$ for possible rejection and go to the next step.

Step 4. Calculate $\eta_g = \frac{1}{R_g} \sum_{j=1}^{R_g} \mathrm{fdr}_{(j)|g}$ and define $\mathrm{fdr}^*_g = 1 - (1 - \eta_g)(1 - \mathrm{fdr}_g)$ for each $g$. Order these values as $\mathrm{fdr}^*_{(1)} \le \cdots \le \mathrm{fdr}^*_{(G)}$, and find

$$
l = \max\left\{ k : \frac{\sum_{g=1}^{k} R_{(g)} \, \mathrm{fdr}^*_{(g)}}{\sum_{g=1}^{k} R_{(g)}} \le \alpha \right\},
$$

with $R_{(g)}$ being the value of $R$ for the group that corresponds to $\mathrm{fdr}^*_{(g)}$. The hypotheses that were marked for possible rejection in the groups $(1), \ldots, (l)$ are ultimately rejected.

The proposed two-stage method first screens the hypotheses in each group and marks some of them for potential rejection by controlling the posterior FDR at the within-group level. The between-group decision is then made so as to guarantee control of the total posterior FDR. The final rejections are the hypotheses that were marked for rejection within the rejected groups. As shown in Liu et al. (2015), this method fully utilizes the group structure and can substantially improve on existing methods.

3. Package

The R package GroupTest contains the following four functions and two data sets:

1. GT.wrapper is the main function called by users to do multiple testing for grouped hypotheses;
2. GT.em is the function for estimating the parameter $\omega$;
3. GT.localfdr is the function for calculating the local fdr scores defined in (4);
4. GT.decision is the function that makes the decisions based on Definition 2.1;
5. AYP, the Adequate Yearly Progress data of California, 2013;
6. GroupTest_simulate, a simulated data set.

3.1. Function GT.wrapper

This function is the main function to perform the two-stage testing for grouped hypotheses.
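As a point of reference for what the two-stage rule does once the local fdr scores are available, Steps 3 and 4 of Definition 2.1 can be sketched in a few lines of base R. This is an illustrative sketch, not the code of GT.decision: the function name two_stage_sketch and its inputs fdr_g and fdr_jg are hypothetical, and two conventions are assumptions of the sketch. First, R_g is set to 0 when no running average falls at or below eta. Second, groups with nothing marked are set aside before Step 4; they would contribute zero to both sums, so the set of rejected hypotheses should be unaffected.

```r
# Sketch of Steps 3 and 4 of Definition 2.1 (hypothetical helper, not GT.decision).
# fdr_g  : numeric vector of length G with P(theta_g = 0 | x)
# fdr_jg : list of length G; element g holds P(theta_gj = 0 | theta_g = 1, x)
two_stage_sketch <- function(fdr_g, fdr_jg, alpha = 0.05, eta = alpha) {
  G <- length(fdr_g)
  R <- integer(G)                     # number of hypotheses marked per group
  eta_g <- numeric(G)                 # average score of the marked hypotheses
  within_mark <- vector("list", G)    # marks, in the original order of each group
  for (g in seq_len(G)) {
    o <- order(fdr_jg[[g]])
    running_mean <- cumsum(fdr_jg[[g]][o]) / seq_along(o)
    ok <- which(running_mean <= eta)
    R[g] <- if (length(ok) > 0) max(ok) else 0L      # assumption: 0 if none
    mark <- rep(FALSE, length(o))
    mark[o[seq_len(R[g])]] <- TRUE
    within_mark[[g]] <- mark
    eta_g[g] <- if (R[g] > 0) running_mean[R[g]] else 0
  }
  # Step 4: combine within- and between-group scores, then scan ordered groups.
  fdr_star <- 1 - (1 - eta_g) * (1 - fdr_g)
  cand <- which(R > 0)                # assumption: skip groups with no marks
  ord <- cand[order(fdr_star[cand])]
  ratio <- cumsum(R[ord] * fdr_star[ord]) / cumsum(R[ord])
  ok <- which(ratio <= alpha)
  l <- if (length(ok) > 0) max(ok) else 0L
  group_rej <- rep(FALSE, G)
  group_rej[ord[seq_len(l)]] <- TRUE
  reject <- lapply(seq_len(G), function(g) within_mark[[g]] & group_rej[g])
  list(R = R, fdr_star = fdr_star, group_rej = group_rej, reject = reject)
}

# Toy input with two groups (made-up scores, for illustration only).
res <- two_stage_sketch(
  fdr_g  = c(0.01, 0.90),
  fdr_jg = list(c(0.02, 0.50, 0.01), c(0.30, 0.60)),
  alpha  = 0.05
)
res$reject
```

In this toy input, the two smallest scores of group 1 have a running average below eta, so two hypotheses are marked there and the group passes the between-group step; group 2 has nothing marked and is not rejected.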
    GT.wrapper(TestStatistic, alpha = 0.05, eta = alpha, pi1.ini = 0.7,
               pi2.1.ini = 0.4, L = 2, mul.ini = c(-1, 1), sigmal.ini = c(1, 1),
               cl.ini = c(0.5, 0.5), DELTA = 0.001, sigma.known = FALSE)

It takes the following arguments:

TestStatistic: a list of lists. Each inner list corresponds to one group and contains the test statistics, stored as X, and the group size, stored as mg;
alpha: the targeted FDR level. By default, it is 0.05;
eta: the targeted FDR level within each group. The default and recommended choice is alpha;
pi1.ini: initial value of the probability that a group is significant. By default, it is 0.7;
pi2.1.ini: initial value of the probability that an individual null hypothesis is false given that the group is significant. By default, it is 0.4;
L: the number of Gaussian components under the alternative hypothesis. By default, it is 2;
mul.ini: initial value, a vector of means for the components of the Gaussian mixture. By default, it is -1 and 1;
sigmal.ini: initial value, a vector of standard deviations for the components of the Gaussian mixture. By default, it is 1 and 1;
cl.ini: initial value, a vector of probabilities for the components of the Gaussian mixture. By default, it is 50% and 50%;
DELTA: the stopping criterion of the EM algorithm. At each iteration, we calculate the maximum absolute difference between the current estimates of the parameters and their previous values, and the algorithm stops once this maximum falls below DELTA. By default, it is 0.001;
sigma.known: a boolean variable indicating whether the variance is known. By default, it is FALSE.

This function returns the following two variables:

parameter: a list consisting of the parameters estimated by the EM algorithm. The elements are $\pi_1$, $\pi_{2|1}$, $c_l$, $\mu_l$, $\sigma_l$;
TSGroupTest[[g]]: the quantities regarding the $g$-th group, including the test statistics within this group, the individual conditional local fdr scores ($P(\theta_{gj} = 0 \mid x, \theta_g = 1)$), the group-wise local fdr score ($P(\theta_g = 0 \mid x)$), the between-group decision, and the within-group decisions.

3.2. Simulated Data Set

This is a simulated data set for demonstrating the structure of the data. The data set is a list of three lists, where each inner list corresponds to one group and stores the number of individual hypotheses and the test statistics for the individual hypotheses within that group. The group sizes are 3, 4, and 5, respectively.
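The layout described above can also be built by hand, which may help when preparing one's own test statistics. The sketch below draws hidden states and test statistics from model (1)-(2) and stores them in the same shape, one inner list per group with elements X and mg. The helper simulate_bsg and the parameter values are hypothetical choices for illustration; they are not part of GroupTest and are not the settings used to create the packaged simulated data. The truncated Bernoulli draw is done by simple rejection, which assumes pi2.1 > 0.

```r
# Simulate grouped test statistics in the list-of-groups layout
# (hypothetical helper, not part of the GroupTest package).
simulate_bsg <- function(mg, pi1, pi2.1, mul, sigmal, cl) {
  lapply(mg, function(m) {
    theta_g <- rbinom(1, 1, pi1)            # is the group significant?
    theta <- rep(0L, m)                     # all nulls true if it is not
    if (theta_g == 1) {
      # Truncated Bernoulli: redraw until at least one alternative is present.
      repeat {
        theta <- rbinom(m, 1, pi2.1)
        if (sum(theta) > 0) break
      }
    }
    # Alternative statistics come from the L-component Gaussian mixture (2).
    comp <- sample(seq_along(cl), m, replace = TRUE, prob = cl)
    x_alt <- rnorm(m, mean = mul[comp], sd = sigmal[comp])
    # Null statistics are standard normal.
    x <- ifelse(theta == 1, x_alt, rnorm(m))
    list(X = x, mg = m)
  })
}

set.seed(1)
toy <- simulate_bsg(mg = c(3, 4, 5), pi1 = 0.7, pi2.1 = 0.4,
                    mul = c(-1, 1), sigmal = c(1, 1), cl = c(0.5, 0.5))
sapply(toy, function(grp) grp$mg)           # group sizes stored in each list
```

An object of this shape matches the description of the TestStatistic argument given in Section 3.1; extra fields, such as a group label, can be stored alongside X and mg, as the AYP data set does.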

    library(GroupTest)
    data("GroupTest_simulate")
    GroupTest_simulate

    > GroupTest_simulate
    [[1]]
    [[1]]$X
    [1]

    [[1]]$mg
    [1] 3

    [[2]]
    [[2]]$X
    [1]

    [[2]]$mg
    [1] 4

    [[3]]
    [[3]]$X
    [1]

    [[3]]$mg
    [1] 5

3.3. AYP data set

This data set comes from the Adequate Yearly Progress (AYP) study of California elementary schools in 2013. Liu et al. (2015) used this data set to compare the academic performance of socioeconomically advantaged (SEA) and socioeconomically disadvantaged (SED) students in the elementary schools of California. For each school, they compared the success rates in Math exams for these two groups of students. The focus is to discover the schools with an unusually small or large difference in performance and to identify the school districts containing such schools. Let $p_{1i}$ and $p_{2i}$ be the success rates and $n_{1i}$ and $n_{2i}$ be the numbers of students in the groups of SEA and SED students, respectively, in the $i$-th school, $i = 1, \ldots, N$. A z-value for school $i$ is computed according to (Efron (2008), Cai and Sun (2009), Liu et al. (2015))

$$
z_i = \frac{p_{1i} - p_{2i} - \tau}{\sqrt{p_{1i}(1 - p_{1i})/n_{1i} + p_{2i}(1 - p_{2i})/n_{2i}}},
$$

where $\tau$ is the overall difference, $\mathrm{median}(p_{1i}) - \mathrm{median}(p_{2i})$, which is 18.4% in this AYP study. After removing the schools with extremely small or large test statistics and the schools with fewer than 20 students, there are 4118 ($= N$) elementary schools and 701 qualified school districts. The data are saved as a list of lists, with each inner list corresponding to one school district and containing the following three variables:

X: the test statistics for the individual schools within this school district;
mg: the number of schools within this school district;
School.District: the name of the school district.

    library(GroupTest)
    data("AYP")
    length(AYP)
    AYP[[1]]

    > length(AYP)
    [1] 701
    > AYP[[1]]
    $mg
    [1] 16

    $X
    [1]
    [7]
    [13]

    $School.District
    [1] "ABC Unified"

4. Examples

In this section, we demonstrate how to use the functions in this package to analyze both the simulated data and the AYP data set.

4.1. Simulated Data

The function GT.em estimates the parameter $\omega$ using the EM algorithm. The number of components of the Gaussian mixture is set through the argument L, with a default of 2. Initial values for $\omega$ should be provided. Depending on the data set, one can treat $\sigma$ as either known or unknown through the argument sigma.known. The function returns the estimate of $\omega$.

    library(GroupTest)
    data("GroupTest_simulate")

    em.estimate <- GT.em(GroupTest_simulate, L = 2, pi1.ini = 0.7, pi2.1.ini = 0.4,
                         mul.ini = c(-1, 1), sigmal.ini = c(1, 2), cl.ini = c(0.4, 0.6),
                         DELTA = 0.001, sigma.known = FALSE)

    > em.estimate
    $pi1
    [1]
    $pi2.1
    [1]
    $mul
    [1]
    $sigmal
    [1]
    $cl
    [1]
    $L
    [1] 2

Based on this output, $\pi_1$ and $\pi_{2|1}$ are estimated as 1 and 0.60, respectively, and $f_1(x)$, the density function under the alternative hypothesis, can be estimated as $\hat{f}_1(x) = 0.50\,N(.85, ) + 0.50\,N(0.02, )$.

To obtain decisions on all the hypotheses while incorporating the group structure, one can apply the GT.wrapper function, which integrates the estimation of the parameters and the calculation of the local fdr scores and provides the final decisions.

    GT.Test <- GT.wrapper(GroupTest_simulate, alpha = 0.05, eta = 0.05, pi1.ini = 0.7,
                          pi2.1.ini = 0.4, L = 2, mul.ini = c(-1, 1), sigmal.ini = c(1, 2),
                          cl.ini = c(0.4, 0.6), DELTA = 0.001, sigma.known = FALSE)

    > GT.Test[[1]]
    $X
    [1]
    $mg
    [1] 3
    $f0x
    [1]
    $f1x
    [1]
    $fx
    [1]
    $fdr.g
    [1] e-07
    $fdr.j.g
    [1]
    $prob.theta.1.theta.j.g.1
    [1]
    $m.j.g
         [,1] [,2]
    [1,] e-15
    [2,] e-59
    [3,] e-01
    $within.group.rej
    [1]
    $eta.g
    [1] 0
    $fdr.star.g
    [1] e-07
    $between.group.rej
    [1] 0

    > GT.Test[[2]]
    $X
    [1]
    $mg
    [1] 4
    $f0x
    [1]
    $f1x
    [1]
    $fx
    [1]
    $fdr.g
    [1] e-08
    $fdr.j.g
    [1]
    $prob.theta.1.theta.j.g.1
    [1]
    $m.j.g
         [,1] [,2]
    [1,] e-04
    [2,] e-05
    [3,] e-01
    [4,] e-01
    $within.group.rej
    [1]
    $eta.g
    [1] 0
    $fdr.star.g
    [1] e-08
    $between.group.rej
    [1] 0

    > GT.Test[[3]]
    $X
    [1]
    $mg
    [1] 5
    $f0x
    [1]
    $f1x
    [1]
    $fx
    [1]
    $fdr.g
    [1] e-07
    $fdr.j.g
    [1]
    $prob.theta.1.theta.j.g.1
    [1]
    $m.j.g
         [,1] [,2]
    [1,] e-01
    [2,] e-30
    [3,] e-09
    [4,] e-14
    [5,] e-01
    $within.group.rej
    [1]
    $eta.g
    [1] 0
    $fdr.star.g
    [1] e-07
    $between.group.rej
    [1] 0

Note that this function returns a list whose elements correspond to the individual groups, together with the estimates of the parameters. As an example, GT.Test[[3]] returns all the information regarding the 3rd group. Here are some key quantities:

X: the test statistics of all the hypotheses within the 3rd group;
mg: the number of hypotheses within this group;
fdr.g: the between-group local fdr score $\mathrm{fdr}_g = P(\theta_g = 0 \mid x)$;
fdr.j.g: the within-group local fdr scores $\mathrm{fdr}_{j|g} = P(\theta_{gj} = 0 \mid x, \theta_g = 1)$;
between.group.rej: the between-group decision, $\delta_g$;
within.group.rej: the within-group decisions, $\delta_{gj}$.

This function also returns the estimates of the parameters, which are the same as the output of GT.em().

    > GT.Test$parameter
    $pi1

    [1]
    $pi2.1
    [1]
    $mul
    [1]
    $sigmal
    [1]
    $cl
    [1]
    $L
    [1] 2

4.2. AYP Data Set

In this section, the AYP data set is analyzed using the functions provided by GroupTest.

    AYP.Test <- GT.wrapper(AYP, alpha = 0.1, sigma.known = TRUE)

    > AYP.Test$parameter
    $pi1
    [1]
    $pi2.1
    [1]
    $mul
    [1]
    $sigmal
    [1] 1 1
    $cl
    [1]
    $L
    [1] 2

With $L = 2$, note that $\hat{\pi}_1 = 0.533$, which implies that the estimated proportion of significant school districts is 53.3%. When a school district is significant, the chance that an individual school is significant is 59.4%. Under the alternative hypothesis, $f_1(x)$, the density function of the test statistic, can be estimated as $\hat{f}_1(x) = 0.792\,N(-1.878, 1) + 0.208\,N(2.644, 1)$.

    # Collect the between-group decisions for all school districts.
    G <- length(AYP)
    G.rej <- array(0, G)
    for (g in 1:G) G.rej[g] <- AYP.Test[[g]]$between.group.rej
    sum(G.rej)
    # Count the rejected schools over all districts.
    total.school <- 0
    for (g in 1:G) total.school <- total.school + sum(AYP.Test[[g]]$within.group.rej)

    > sum(G.rej)
    [1] 284
    > which(G.rej == 1)
    [1]
    [19]
    > total.school
    [1] 1085
    > AYP.Test[[1]]
    $mg
    [1] 16
    $X
    [1]
    [7]
    [13]
    $School.District
    [1] "ABC Unified"
    $fdr.g
    [1]
    $fdr.j.g
    [1]
    [7]
    [13]

    $within.group.rej
    [1]
    $fdr.star.g
    [1]
    $between.group.rej
    [1] 1

With $\eta = \alpha = 0.1$, the output above shows that 284 school districts are rejected, including the ABC Unified school district, which is stored as the first school district in the AYP data set. There are 16 schools within the ABC Unified school district, and three of them are rejected (the 7th, 9th, and 10th schools). The total number of rejected schools is 1085. As reported in Liu et al. (2015), the proposed test is much more powerful than alternative methods that do not incorporate the group structure. For instance, the SC method (Sun and Cai 2007) rejects 668 schools and the adaptive BH method (Benjamini and Hochberg 2000) rejects 629 schools; moreover, neither method provides any control at the between-group level.

5. Conclusion

In big data analysis with grouped hypotheses, how to incorporate the group structure to develop valid and sharp testing procedures is an important, urgent, and challenging problem that needs immediate attention. Liu et al. (2015) built a theoretical framework for developing such testing procedures using Bayesian and empirical Bayesian methodologies. The R package GroupTest has been developed to implement the proposed method so that it can be easily applied in real applications. In this article, the author used a simulated example and the AYP data set to demonstrate how to use the package.

References

Benjamini Y, Hochberg Y (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B, 57(1).

Benjamini Y, Hochberg Y (2000). On the Adaptive Control of the False Discovery Rate in Multiple Testing with Independent Statistics. Journal of Educational and Behavioral Statistics, 25(1).

Benjamini Y, Yekutieli D (2001). The Control of the False Discovery Rate in Multiple Testing under Dependency. Annals of Statistics.

Cai TT, Sun W (2009). Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks. Journal of the American Statistical Association, 104(488).

Efron B (2008). Microarrays, Empirical Bayes and the Two-Groups Model. Statistical Science, 23(1).

Genovese C, Wasserman L (2002). Operating Characteristics and Extensions of the False Discovery Rate Procedure. Journal of the Royal Statistical Society. Series B, 64(3).

Genovese C, Wasserman L (2004). A Stochastic Process Approach to False Discovery Control. The Annals of Statistics, 32(3).

He L, Sarkar SK, Zhao Z (2015). Capturing the Severity of Type II Errors in High-Dimensional Multiple Testing. Journal of Multivariate Analysis, 142.

Liu Y, Sarkar SK, Zhao Z (2015). A Decision Theoretic Approach to Multiple Testing of Grouped Hypotheses. Submitted.

Sarkar SK (2002). Some Results on False Discovery Rate in Stepwise Multiple Testing Procedures. Annals of Statistics.

Sarkar SK (2004). FDR-Controlling Stepwise Procedures and Their False Negatives Rates. Journal of Statistical Planning and Inference, 125(1).

Sarkar SK (2008). On Methods Controlling the False Discovery Rate (with discussion). Sankhya, Ser. A, 70.

Sun W, Cai TT (2007). Oracle and Adaptive Compound Decision Rules for False Discovery Rate Control. Journal of the American Statistical Association, 102(479).

Sun W, Cai TT (2009). Large-Scale Multiple Testing under Dependence. Journal of the Royal Statistical Society. Series B, 71(2).

Affiliation:

Zhigen Zhao
342 Speakman Hall
1801 N. 13th Street
Fox School of Business
Temple University
Philadelphia, PA
E-mail: zhaozhg@temple.edu

Journal of Statistical Software, published by the Foundation for Open Access Statistics
Submitted: yyyy-mm-dd
Accepted: yyyy-mm-dd


More information

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo PROCEDURES CONTROLLING THE k-fdr USING BIVARIATE DISTRIBUTIONS OF THE NULL p-values Sanat K. Sarkar and Wenge Guo Temple University and National Institute of Environmental Health Sciences Abstract: Procedures

More information

Heterogeneity and False Discovery Rate Control

Heterogeneity and False Discovery Rate Control Heterogeneity and False Discovery Rate Control Joshua D Habiger Oklahoma State University jhabige@okstateedu URL: jdhabigerokstateedu August, 2014 Motivating Data: Anderson and Habiger (2012) M = 778 bacteria

More information

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control Joshua D. Habiger Oklahoma State University jhabige@okstate.edu Nov. 8, 2013 Outline 1 : Motivation and FDR Research Areas

More information

The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE

The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE A Thesis in Statistics by Bing Han c 2007 Bing Han Submitted in

More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-5-2016 Large-Scale Multiple Testing of Correlations T. Tony Cai University of Pennsylvania Weidong Liu Follow this

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

New Approaches to False Discovery Control

New Approaches to False Discovery Control New Approaches to False Discovery Control Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry Wasserman Department of Statistics Carnegie

More information

Probabilistic Inference for Multiple Testing

Probabilistic Inference for Multiple Testing This is the title page! This is the title page! Probabilistic Inference for Multiple Testing Chuanhai Liu and Jun Xie Department of Statistics, Purdue University, West Lafayette, IN 47907. E-mail: chuanhai,

More information

Parametric Empirical Bayes Methods for Microarrays

Parametric Empirical Bayes Methods for Microarrays Parametric Empirical Bayes Methods for Microarrays Ming Yuan, Deepayan Sarkar, Michael Newton and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 General Model Structure: Two Conditions

More information

Hunting for significance with multiple testing

Hunting for significance with multiple testing Hunting for significance with multiple testing Etienne Roquain 1 1 Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France Séminaire MODAL X, 19 mai 216 Etienne Roquain Hunting for significance

More information

Advanced Statistical Methods: Beyond Linear Regression

Advanced Statistical Methods: Beyond Linear Regression Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi

More information

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Weidong Liu October 19, 2014 Abstract Large-scale multiple two-sample Student s t testing problems often arise from the

More information

arxiv: v2 [stat.me] 31 Aug 2017

arxiv: v2 [stat.me] 31 Aug 2017 Multiple testing with discrete data: proportion of true null hypotheses and two adaptive FDR procedures Xiongzhi Chen, Rebecca W. Doerge and Joseph F. Heyse arxiv:4274v2 [stat.me] 3 Aug 207 Abstract We

More information

A Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data

A Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data A Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data Juliane Schäfer Department of Statistics, University of Munich Workshop: Practical Analysis of Gene Expression Data

More information

Stat 206: Estimation and testing for a mean vector,

Stat 206: Estimation and testing for a mean vector, Stat 206: Estimation and testing for a mean vector, Part II James Johndrow 2016-12-03 Comparing components of the mean vector In the last part, we talked about testing the hypothesis H 0 : µ 1 = µ 2 where

More information

Estimation of the False Discovery Rate

Estimation of the False Discovery Rate Estimation of the False Discovery Rate Coffee Talk, Bioinformatics Research Center, Sept, 2005 Jason A. Osborne, osborne@stat.ncsu.edu Department of Statistics, North Carolina State University 1 Outline

More information

The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR

The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR CONTROLLING THE FALSE DISCOVERY RATE A Dissertation in Statistics by Scott Roths c 2011

More information

False Discovery Rate

False Discovery Rate False Discovery Rate Peng Zhao Department of Statistics Florida State University December 3, 2018 Peng Zhao False Discovery Rate 1/30 Outline 1 Multiple Comparison and FWER 2 False Discovery Rate 3 FDR

More information

On Methods Controlling the False Discovery Rate 1

On Methods Controlling the False Discovery Rate 1 Sankhyā : The Indian Journal of Statistics 2008, Volume 70-A, Part 2, pp. 135-168 c 2008, Indian Statistical Institute On Methods Controlling the False Discovery Rate 1 Sanat K. Sarkar Temple University,

More information

False discovery rate control for non-positively regression dependent test statistics

False discovery rate control for non-positively regression dependent test statistics Journal of Statistical Planning and Inference ( ) www.elsevier.com/locate/jspi False discovery rate control for non-positively regression dependent test statistics Daniel Yekutieli Department of Statistics

More information

Looking at the Other Side of Bonferroni

Looking at the Other Side of Bonferroni Department of Biostatistics University of Washington 24 May 2012 Multiple Testing: Control the Type I Error Rate When analyzing genetic data, one will commonly perform over 1 million (and growing) hypothesis

More information

arxiv: v1 [stat.me] 25 Aug 2016

arxiv: v1 [stat.me] 25 Aug 2016 Empirical Null Estimation using Discrete Mixture Distributions and its Application to Protein Domain Data arxiv:1608.07204v1 [stat.me] 25 Aug 2016 Iris Ivy Gauran 1, Junyong Park 1, Johan Lim 2, DoHwan

More information

Empirical Bayes Moderation of Asymptotically Linear Parameters

Empirical Bayes Moderation of Asymptotically Linear Parameters Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi

More information

Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments

Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments Jie Chen 1 Merck Research Laboratories, P. O. Box 4, BL3-2, West Point, PA 19486, U.S.A. Telephone:

More information

A Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments

A Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments A Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments Jie Chen 1 Merck Research Laboratories, P. O. Box 4, BL3-2, West Point, PA 19486, U.S.A. Telephone:

More information

Control of Directional Errors in Fixed Sequence Multiple Testing

Control of Directional Errors in Fixed Sequence Multiple Testing Control of Directional Errors in Fixed Sequence Multiple Testing Anjana Grandhi Department of Mathematical Sciences New Jersey Institute of Technology Newark, NJ 07102-1982 Wenge Guo Department of Mathematical

More information

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS Ying Liu Department of Biostatistics, Columbia University Summer Intern at Research and CMC Biostats, Sanofi, Boston August 26, 2015 OUTLINE 1 Introduction

More information

DISCUSSION OF A SIGNIFICANCE TEST FOR THE LASSO. By Peter Bühlmann, Lukas Meier and Sara van de Geer ETH Zürich

DISCUSSION OF A SIGNIFICANCE TEST FOR THE LASSO. By Peter Bühlmann, Lukas Meier and Sara van de Geer ETH Zürich Submitted to the Annals of Statistics DISCUSSION OF A SIGNIFICANCE TEST FOR THE LASSO By Peter Bühlmann, Lukas Meier and Sara van de Geer ETH Zürich We congratulate Richard Lockhart, Jonathan Taylor, Ryan

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Peak Detection for Images

Peak Detection for Images Peak Detection for Images Armin Schwartzman Division of Biostatistics, UC San Diego June 016 Overview How can we improve detection power? Use a less conservative error criterion Take advantage of prior

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Spring, 2006 1. DeGroot 1973 In (DeGroot 1973), Morrie DeGroot considers testing the

More information

Large-Scale Hypothesis Testing

Large-Scale Hypothesis Testing Chapter 2 Large-Scale Hypothesis Testing Progress in statistics is usually at the mercy of our scientific colleagues, whose data is the nature from which we work. Agricultural experimentation in the early

More information

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS Journal of Biopharmaceutical Statistics, 21: 726 747, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.551333 MULTISTAGE AND MIXTURE PARALLEL

More information

ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE

ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE Statistica Sinica 22 (2012), 1689-1716 doi:http://dx.doi.org/10.5705/ss.2010.255 ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE Irina Ostrovnaya and Dan L. Nicolae Memorial Sloan-Kettering

More information

More powerful control of the false discovery rate under dependence

More powerful control of the false discovery rate under dependence Statistical Methods & Applications (2006) 15: 43 73 DOI 10.1007/s10260-006-0002-z ORIGINAL ARTICLE Alessio Farcomeni More powerful control of the false discovery rate under dependence Accepted: 10 November

More information

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Lecture 7 April 16, 2018

Lecture 7 April 16, 2018 Stats 300C: Theory of Statistics Spring 2018 Lecture 7 April 16, 2018 Prof. Emmanuel Candes Scribe: Feng Ruan; Edited by: Rina Friedberg, Junjie Zhu 1 Outline Agenda: 1. False Discovery Rate (FDR) 2. Properties

More information

Controlling the False Discovery Rate in Two-Stage. Combination Tests for Multiple Endpoints

Controlling the False Discovery Rate in Two-Stage. Combination Tests for Multiple Endpoints Controlling the False Discovery Rate in Two-Stage Combination Tests for Multiple ndpoints Sanat K. Sarkar, Jingjing Chen and Wenge Guo May 29, 2011 Sanat K. Sarkar is Professor and Senior Research Fellow,

More information

discovery rate control

discovery rate control Optimal design for high-throughput screening via false discovery rate control arxiv:1707.03462v1 [stat.ap] 11 Jul 2017 Tao Feng 1, Pallavi Basu 2, Wenguang Sun 3, Hsun Teresa Ku 4, Wendy J. Mack 1 Abstract

More information

EECS564 Estimation, Filtering, and Detection Exam 2 Week of April 20, 2015

EECS564 Estimation, Filtering, and Detection Exam 2 Week of April 20, 2015 EECS564 Estimation, Filtering, and Detection Exam Week of April 0, 015 This is an open book takehome exam. You have 48 hours to complete the exam. All work on the exam should be your own. problems have

More information

Rejoinder on: Control of the false discovery rate under dependence using the bootstrap and subsampling

Rejoinder on: Control of the false discovery rate under dependence using the bootstrap and subsampling Test (2008) 17: 461 471 DOI 10.1007/s11749-008-0134-6 DISCUSSION Rejoinder on: Control of the false discovery rate under dependence using the bootstrap and subsampling Joseph P. Romano Azeem M. Shaikh

More information

Bayes Factor Single Arm Time-to-event User s Guide (Version 1.0.0)

Bayes Factor Single Arm Time-to-event User s Guide (Version 1.0.0) Bayes Factor Single Arm Time-to-event User s Guide (Version 1.0.0) Department of Biostatistics P. O. Box 301402, Unit 1409 The University of Texas, M. D. Anderson Cancer Center Houston, Texas 77230-1402,

More information

arxiv: v3 [math.st] 15 Jul 2018

arxiv: v3 [math.st] 15 Jul 2018 A New Step-down Procedure for Simultaneous Hypothesis Testing Under Dependence arxiv:1503.08923v3 [math.st] 15 Jul 2018 Contents Prasenjit Ghosh 1 and Arijit Chakrabarti 2 1 Department of Statistics, Presidency

More information

Effects of dependence in high-dimensional multiple testing problems. Kyung In Kim and Mark van de Wiel

Effects of dependence in high-dimensional multiple testing problems. Kyung In Kim and Mark van de Wiel Effects of dependence in high-dimensional multiple testing problems Kyung In Kim and Mark van de Wiel Department of Mathematics, Vrije Universiteit Amsterdam. Contents 1. High-dimensional multiple testing

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

A3. Statistical Inference Hypothesis Testing for General Population Parameters

A3. Statistical Inference Hypothesis Testing for General Population Parameters Appendix / A3. Statistical Inference / General Parameters- A3. Statistical Inference Hypothesis Testing for General Population Parameters POPULATION H 0 : θ = θ 0 θ is a generic parameter of interest (e.g.,

More information

On Assessing Bioequivalence and Interchangeability between Generics Based on Indirect Comparisons

On Assessing Bioequivalence and Interchangeability between Generics Based on Indirect Comparisons On Assessing Bioequivalence and Interchangeability between Generics Based on Indirect Comparisons Jiayin Zheng 1, Shein-Chung Chow 1 and Mengdie Yuan 2 1 Department of Biostatistics & Bioinformatics, Duke

More information

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses Gavin Lynch Catchpoint Systems, Inc., 228 Park Ave S 28080 New York, NY 10003, U.S.A. Wenge Guo Department of Mathematical

More information

Bayesian Aspects of Classification Procedures

Bayesian Aspects of Classification Procedures University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations --203 Bayesian Aspects of Classification Procedures Igar Fuki University of Pennsylvania, igarfuki@wharton.upenn.edu Follow

More information

Package horseshoe. November 8, 2016

Package horseshoe. November 8, 2016 Title Implementation of the Horseshoe Prior Version 0.1.0 Package horseshoe November 8, 2016 Description Contains functions for applying the horseshoe prior to highdimensional linear regression, yielding

More information

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Multiple testing methods to control the False Discovery Rate (FDR),

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Simultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models

Simultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models Simultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models David B. Dahl Department of Statistics Texas A&M University Marina Vannucci, Michael Newton, & Qianxing Mo

More information

Multiple hypothesis testing using the excess discovery count and alpha-investing rules

Multiple hypothesis testing using the excess discovery count and alpha-investing rules Multiple hypothesis testing using the excess discovery count and alpha-investing rules Dean P. Foster and Robert A. Stine Department of Statistics The Wharton School of the University of Pennsylvania Philadelphia,

More information

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons:

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons: STAT 263/363: Experimental Design Winter 206/7 Lecture January 9 Lecturer: Minyong Lee Scribe: Zachary del Rosario. Design of Experiments Why perform Design of Experiments (DOE)? There are at least two

More information

arxiv:math/ v1 [math.st] 29 Dec 2006 Jianqing Fan Peter Hall Qiwei Yao

arxiv:math/ v1 [math.st] 29 Dec 2006 Jianqing Fan Peter Hall Qiwei Yao TO HOW MANY SIMULTANEOUS HYPOTHESIS TESTS CAN NORMAL, STUDENT S t OR BOOTSTRAP CALIBRATION BE APPLIED? arxiv:math/0701003v1 [math.st] 29 Dec 2006 Jianqing Fan Peter Hall Qiwei Yao ABSTRACT. In the analysis

More information

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects A Simple, Graphical Procedure for Comparing Multiple Treatment Effects Brennan S. Thompson and Matthew D. Webb May 15, 2015 > Abstract In this paper, we utilize a new graphical

More information

Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses

Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses arxiv:1610.03330v1 [stat.me] 11 Oct 2016 Jingshu Wang, Chiara Sabatti, Art B. Owen Department of Statistics, Stanford University

More information

Manual: R package HTSmix

Manual: R package HTSmix Manual: R package HTSmix Olga Vitek and Danni Yu May 2, 2011 1 Overview High-throughput screens (HTS) measure phenotypes of thousands of biological samples under various conditions. The phenotypes are

More information

Multiple Dependent Hypothesis Tests in Geographically Weighted Regression

Multiple Dependent Hypothesis Tests in Geographically Weighted Regression Multiple Dependent Hypothesis Tests in Geographically Weighted Regression Graeme Byrne 1, Martin Charlton 2, and Stewart Fotheringham 3 1 La Trobe University, Bendigo, Victoria Austrlaia Telephone: +61

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 11 1 / 44 Tip + Paper Tip: Two today: (1) Graduate school

More information

Specific Differences. Lukas Meier, Seminar für Statistik

Specific Differences. Lukas Meier, Seminar für Statistik Specific Differences Lukas Meier, Seminar für Statistik Problem with Global F-test Problem: Global F-test (aka omnibus F-test) is very unspecific. Typically: Want a more precise answer (or have a more

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information