arxiv: v1 [stat.me] 13 Dec 2017


Local False Discovery Rate Based Methods for Multiple Testing of One-Way Classified Hypotheses

Sanat K. Sarkar, Zhigen Zhao

Department of Statistical Science, Temple University, Philadelphia, PA, 19122, USA

Abstract

This paper continues the line of research initiated in Liu et al. (2016) on developing a novel framework for multiple testing of hypotheses grouped in a one-way classified form using hypothesis-specific local false discovery rates (Lfdr's). It is built on an extension of the standard two-class mixture model from single to multiple groups, defining the hypothesis-specific Lfdr as a function of the conditional Lfdr for the hypothesis given that it is within a significant group and the Lfdr for the group itself, and involving a new parameter that measures the grouping effect. This definition captures the underlying group structure for the hypotheses belonging to a group more effectively than the standard two-class mixture model. Two new Lfdr based methods, possessing meaningful optimalities, are produced in their oracle forms. One, designed to control false discoveries across the entire collection of hypotheses, is proposed as a powerful alternative to simply pooling all the hypotheses into a single group and using a commonly used Lfdr based method under the standard single-group two-class mixture model. The other is proposed as an Lfdr analog of the method of Benjamini & Bogomolov (2014) for selective inference. It controls an Lfdr based measure of false discoveries associated with selecting groups concurrently with controlling the average of within-group false discovery proportions across the selected groups. Numerical studies show that our proposed methods are indeed more powerful than their relevant competitors, at least in their oracle forms, in commonly occurring practical scenarios.

Keywords: False Discovery Rate, Grouped Hypotheses, Large-Scale Multiple Testing.

Sanat K. Sarkar is Professor and Zhigen Zhao is Associate Professor in the Department of Statistical Science, Temple University. Sarkar's research was supported by NSF Grants DMS and DMS. Zhao's research was supported by NSF Grant DMS and NSF Grant IIS. Email addresses: sanat@temple.edu (S.K. Sarkar), zhaozhg@temple.edu (Z. Zhao).

Preprint submitted to Elsevier December 15, 2017

1. Introduction

Modern scientific studies aided by high-throughput technologies, such as those related to brain imaging, microarray analysis, astronomy, atmospheric science, drug discovery, and many others, increasingly rely on large-scale multiple testing as an integral part of statistical investigations focused on high-dimensional inference. With many of these investigations, notably in genome-wide association and neuroimaging studies, giving rise to testing of hypotheses that appear in groups, the multiple testing paradigm seems to be shifting from single to multiple groups of hypotheses. These groups, forming at single or multiple levels and creating one- or multi-way classified hypotheses, can occur naturally due to the underlying biological or experimental process or be created using internal or external information capturing certain specific features of the data.

Several newer questions arise with this paradigm shift. However, we will focus on the following two questions that seem relatively more relevant in light of what is available in the literature in the context of controlling an overall measure of false discoveries across the entire collection of hypotheses:

Q1. For multiple testing of hypotheses grouped into a one-way classified form, how to effectively capture the underlying group/classification structure, instead of simply pooling all the hypotheses into a single group, while controlling overall false discoveries across all individual hypotheses?

Q2. For hypotheses grouped into a one-way classified form in the context of post-selective inference where groups are selected before testing the hypotheses in the selected groups, how to effectively capture the underlying group/classification structure to control the expected average of false discovery proportions across the selected groups?

Progress has been made toward answering Q1 (Hu et al. (2010)) and Q2 (Benjamini & Bogomolov (2014)) for one-way classified hypotheses in the framework of Benjamini-Hochberg (Benjamini & Hochberg (1995)) type false discovery rate (FDR) control. However, research addressing these questions based on local false discovery rate (Lfdr) (Efron et al. (2001)) methodologies is largely absent, excepting the recent work of Liu et al.
(2016), where a method has been proposed in its oracle form to answer the following question related to Q1: When making important discoveries within each group is as important as making those discoveries across all hypotheses, how to maintain a control over falsely discovered hypotheses within each group while controlling it across all hypotheses?

The fact that an Lfdr based approach, with its Bayesian/empirical Bayesian and decision theoretic foundation, can yield powerful multiple testing methods that control false discoveries while effectively capturing dependence as well as other structures of the data in single- and multiple-group settings has been demonstrated before (Sun et al. (2006); Sun & Cai (2007); Efron (2008); Ferkingstad et al. (2008); Sarkar et al. (2008); Sun & Cai (2009); Cai & Sun (2009); Hu et al. (2010); Zablocki et al. (2014); Ignatiadis et al. (2016)). However, the work of Liu et al. (2016) is fundamentally different from these works in that it takes into account the sparsity of signals both across groups and within each active group. Consequently, the effect of a group's significance in terms of its Lfdr can be explicitly factored into a significance measure of each hypothesis within that group. On the other hand, in those other works, such as Sun & Cai (2009) and Hu et al. (2010), the significance measure of each hypothesis within a group is adjusted for the group's effect through its size rather than its measure of significance.

In this article, we continue the line of research initiated in Liu et al. (2016) to answer Q1 and Q2 in an Lfdr framework. More specifically, we borrow ideas from Liu et al. (2016) in developing methodological steps to present a unified group-adjusted multiple testing framework for one-way classified hypotheses that introduces a grouping effect into overall false discoveries across all individual hypotheses or the average of within-group false discovery proportions across selected groups. In the next section, we present the current state of knowledge closely pertinent to the present work and make remarks motivating the development of our proposed methodologies.

2. Literature Review and Motivating Remark

Suppose there are $N$ null hypotheses that appear in $m$ non-overlapping families/groups, with $H_{ij}$ being the $j$th hypothesis in the $i$th group ($i = 1,\ldots,m$; $j = 1,\ldots,n_i$). We refer to such a layout of hypotheses as one-way classified hypotheses. With $\theta_{ij}$ indicating the truth ($\theta_{ij} = 0$) or falsity ($\theta_{ij} = 1$) of $H_{ij}$, the Lfdr, defined by the posterior probability $P(\theta_{ij} = 0 \mid X)$, where $X = \{X_{ij}, i = 1,\ldots,m; j = 1,\ldots,n_i\}$, is the basic ingredient for constructing Lfdr based approaches controlling false discoveries.

The single-group case (or the case ignoring the group structure) has been considered extensively in the literature, notably by Sun & Cai (2007), Cai & Sun (2009) and He et al. (2015), who focused on constructing methods that are optimal, at least in their oracle forms. These oracle methods correspond to Bayes multiple decision rules under a single-group two-class mixture model (Efron et al. (2001); Newton et al. (2004); Storey (2002)) that minimize the marginal false non-discovery rate (mFNR), a measure of false non-discoveries closely related to the notion of false non-discovery rate (FNR) introduced in Genovese & Wasserman (2002) and Sarkar (2004), subject to controlling the marginal false discovery rate (mFDR), a measure of false discoveries closely related to the BH FDR and the positive FDR (pFDR) of Storey (2002).

Multiple-group versions of single-group Lfdr based approaches to multiple testing have started getting attention recently; among them, the following seem more relevant to our work.
Cai & Sun (2009) extended their work from single to multiple groups (one-way classified hypotheses) under the following model: With $i$ taking the value $k$ with some prior probability $\pi_k$, the pairs $(X_{ij}, \theta_{ij})$, $j = 1,\ldots,n_i$, given $i = k$, are assumed to be iid random pairs with $X_{kj} \mid \theta_{kj} \sim (1 - \theta_{kj}) f_{k0} + \theta_{kj} f_{k1}$, for some given densities $f_{k0}$ and $f_{k1}$, and $\theta_{kj} \sim \mathrm{Bernoulli}(p_k)$. They developed a method, which in its oracle form minimizes mFNR subject to controlling mFDR and is defined in terms of thresholding the conditional Lfdr's

\[
\mathrm{CLfdr}_i(X_{ij}) = \frac{(1 - p_i) f_{i0}(X_{ij})}{f_i(X_{ij})}, \qquad f_i(X_{ij}) = (1 - p_i) f_{i0}(X_{ij}) + p_i f_{i1}(X_{ij}),
\]

for $j = 1,\ldots,n_i$, $i = 1,\ldots,m$, before proposing a data-driven version of the oracle method that asymptotically maintains the original oracle properties. It should be noted that the probability $\pi_k$ relates to the size of group $k$ and provides little information about the significance of the group itself.

Ferkingstad et al. (2008) brought the grouped hypotheses setting into testing a single family of hypotheses in an attempt to empower the typical Lfdr based thresholding approach by leveraging an external covariate. They partitioned the p-values into a number of small bins (groups) according to ordered values of the covariate. With the underlying two-class mixture model defined separately for each bin depending on the corresponding value of the covariate, they defined the so-called covariate-modulated Lfdr as the posterior probability of a null hypothesis given the value of the covariate for the corresponding bin. They estimated the covariate-modulated Lfdr in each bin using a Bayesian approach before proposing their thresholding method, which does not necessarily control an overall measure of false discoveries such as the mFDR or the posterior FDR. An extension of this work from single to multiple covariates can be seen in Zablocki et al. (2014) and Scott et al. (2015).

Very recently, Cai et al. (2016) developed a novel grouped hypotheses testing framework for two-sample multiple testing of the differences between two highly sparse mean vectors, having constructed the groups to extract sparsity information in the data by using a carefully constructed auxiliary covariate. They proposed an Lfdr based optimal multiple testing procedure controlling FDR as a powerful alternative to standard procedures based on the sample mean differences.

A sudden upsurge of research has taken place recently in selective/post-selection inference due to its importance in light of the realization by the scientific community that the lack of reproducibility of a scientist's work is often caused by his/her failure to account for selection bias. When multiple hypotheses are simultaneously tested in a selective inference setting, it gives rise to a grouped hypotheses testing framework with the tested groups being selected from a given set of groups of hypotheses. Benjamini & Bogomolov (2014) introduced the notion of the expected average of false discovery proportions across the selected groups as an appropriate error rate to control in this setting and proposed a method that controls it. Since then, a few papers have been written in this area (Peterson et al. (2016a) and Heller et al. (2017)); however, no research has been produced yet in the Lfdr framework.

Remark 2.1. When grouping of hypotheses occurs, naturally or artificially, an assumption can be made that the significance of a hypothesis is influenced by that of the group it belongs to.
The Lfdr under the standard two-class mixture model, however, does not help in assessing a group's influence on the true significance of its hypotheses. This has been the main motivation behind the work of Liu et al. (2016), who considered a group-adjusted two-class mixture model that yields an explicit representation of each hypothesis-specific Lfdr as a function of its group-adjusted form and the Lfdr for the group it is associated with. It allows them to produce a method that provides a separate control over within-group false discoveries for truly significant groups in addition to having a control of false discoveries across all individual hypotheses. That paper, as mentioned in the Introduction, motivates us to proceed further with the development of newer Lfdr based multiple testing methods for one-way classified hypotheses as described in the following section.

3. Proposed Methodologies

Let us define $H_i = \bigcap_{j=1}^{n_i} H_{ij}$, letting $H_i = 0$ (or $= 1$) mean that the $i$th group, and hence each (or at least one) of its component hypotheses, is non-significant (or significant). Let $\theta_i$ indicate the truth ($\theta_i = 0$) or falsity ($\theta_i = 1$) of $H_i$. We express each $\theta_{ij}$ as follows: $\theta_{ij} = \theta_i \cdot \theta_{j|i}$, with $\theta_{j|i}$ indicating the truth or falsity of $H_{ij}$ conditional on the status of $H_i$, i.e., $\theta_{ij} = 0$ if $\theta_i = 0$; and $\theta_{ij} = 0$ or $1$ according to whether $\theta_{j|i} = 0$ or $1$, if $\theta_i = 1$. This representation of the $\theta_{ij}$'s brings the underlying group structure of the hypotheses into their binary hidden states conditional on the binary hidden states of the groups containing them.

Let us now recall from Liu et al. (2016) the model, with a different name, extending the two-class mixture model (Efron et al., 2001) from single to multiple groups under the setting of one-way classified hypotheses. The following distribution, introduced in Liu et al. (2016) with a different name, plays an important role in this model:

Definition 3.1. [Truncated Product Bernoulli (TPBern($\pi$, $n$))]. A set of $n$ binary variables $Z_1,\ldots,Z_n$ with the following joint probability distribution is said to have a TPBern($\pi$, $n$) distribution:

\[
P(Z_1 = z_1,\ldots,Z_n = z_n)
= \frac{1}{1 - (1-\pi)^n} \prod_{i=1}^{n} \left\{ \pi^{z_i} (1-\pi)^{1-z_i} \right\} I\left( \sum_{i=1}^{n} z_i > 0 \right)
= \frac{(1-\pi)^n}{1 - (1-\pi)^n} \left( \frac{\pi}{1-\pi} \right)^{\sum_{i=1}^{n} z_i} I\left( \sum_{i=1}^{n} z_i > 0 \right).
\]

When hypotheses belonging to a certain group/family are simultaneously tested, this distribution provides a natural adjustment of the commonly used product Bernoulli distribution for the set of binary hidden states of the hypotheses, conditional on the group/family itself being significant.

Definition 3.2. [Group-Adjusted Two-Class Mixture Model for One-Way Classified Hypotheses (One-Way GAMM)]. Let $(X_{ij}, j = 1,\ldots,n_i, \theta_i, \theta_{j|i}, j = 1,\ldots,n_i)$ be the set of random variables associated with the $i$th group, for $i = 1,\ldots,m$. The groups are independently distributed with the following model for group $i$:

\[
\begin{aligned}
X_{ij} \mid \theta_i, \theta_{j|i} &\overset{\text{ind}}{\sim} (1 - \theta_i \theta_{j|i}) f_0(x_{ij}) + \theta_i \theta_{j|i} f_1(x_{ij}), \text{ for some given densities } f_0 \text{ and } f_1, \\
P(\theta_{j|i} = 0 \mid \theta_i = 0) &= 1, \text{ for each } j = 1,\ldots,n_i; \\
(\theta_{1|i},\ldots,\theta_{n_i|i}) \mid \theta_i = 1 &\sim \mathrm{TPBern}(\pi_{2i}; n_i), \\
\theta_i &\sim \mathrm{Bern}(\pi_1).
\end{aligned}
\]

Let $\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i}) \equiv \mathrm{Lfdr}_{ij}(x;\pi_1,\pi_{2i}) = \Pr(\theta_{ij} = 0 \mid X = x)$, $\mathrm{Lfdr}_i(\pi_1,\pi_{2i}) \equiv \mathrm{Lfdr}_i(x;\pi_1,\pi_{2i}) = \Pr(\theta_i = 0 \mid X = x)$, and $\mathrm{Lfdr}_{j|i}(\pi_1,\pi_{2i}) \equiv \mathrm{Lfdr}_{j|i}(x;\pi_1,\pi_{2i}) = \Pr(\theta_{j|i} = 0 \mid \theta_i = 1, X = x)$ be the local FDRs corresponding to $H_{ij}$ (hypothesis), $H_i$ (group), and $H_{ij}$ given $H_i = 1$ (conditional), respectively, under One-Way GAMM.
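To make Definitions 3.1 and 3.2 concrete, the following is a minimal simulation sketch, not part of the original method description. It assumes equal group sizes, a common $\pi_2$ for all groups, $f_0 = N(0,1)$ and $f_1 = N(\mu,1)$; the function names `sample_tpbern` and `sample_one_way_gamm` are hypothetical. TPBern is drawn by rejection, which is valid because TPBern is a product Bernoulli vector conditioned on having at least one nonzero coordinate.

```python
import numpy as np

def sample_tpbern(pi, n, rng):
    """Draw one TPBern(pi, n) vector by rejection sampling.

    A product-Bernoulli(pi) vector is redrawn until it has at least one 1.
    Requires pi > 0, otherwise the loop never terminates.
    """
    while True:
        z = rng.random(n) < pi
        if z.any():
            return z.astype(int)

def sample_one_way_gamm(m, n, pi1, pi2, mu, rng):
    """Simulate m groups of n observations under a One-Way GAMM sketch.

    Assumptions (illustrative only): f0 = N(0, 1), f1 = N(mu, 1),
    equal group size n and the same pi2 in every group.
    Returns (x, theta_group, theta), with theta[i, j] = theta_i * theta_{j|i}.
    """
    theta_group = (rng.random(m) < pi1).astype(int)      # theta_i ~ Bern(pi1)
    theta = np.zeros((m, n), dtype=int)
    for i in range(m):
        if theta_group[i] == 1:                          # significant group
            theta[i] = sample_tpbern(pi2, n, rng)        # at least one signal
    x = rng.normal(loc=mu * theta, scale=1.0)            # mean mu under f1, 0 under f0
    return x, theta_group, theta
```

By construction, every group with $\theta_i = 1$ contains at least one false null, and every group with $\theta_i = 0$ contains none.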
It is easy to see that

\[
\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i}) = 1 - [1 - \mathrm{Lfdr}_i(\pi_1,\pi_{2i})][1 - \mathrm{Lfdr}_{j|i}(\pi_1,\pi_{2i})], \tag{3.1}
\]

showing how a hypothesis-specific local FDR factors into the local FDRs for the group and for the hypothesis conditional on the group's significance. Let $\mathrm{Lfdr}_{ij}(\pi_{2i}) = [(1 - \pi_{2i}) f_0(x_{ij})]/m_i(x_{ij})$, with $m_i(x) = (1 - \pi_{2i}) f_0(x) + \pi_{2i} f_1(x)$, and $\mathrm{Lfdr}_i(\pi_{2i}) = \prod_{j=1}^{n_i} \mathrm{Lfdr}_{ij}(\pi_{2i})$. Then, as shown in Appendix,

\[
\mathrm{Lfdr}_{j|i}(\pi_1,\pi_{2i}) \equiv \mathrm{Lfdr}_{j|i}(\pi_{2i}) = \frac{\mathrm{Lfdr}_{ij}(\pi_{2i}) - \mathrm{Lfdr}_i(\pi_{2i})}{1 - \mathrm{Lfdr}_i(\pi_{2i})}, \tag{3.2}
\]

and

\[
\mathrm{Lfdr}_i(\pi_1;\pi_{2i}) \equiv \mathrm{Lfdr}_i(\lambda_i;\pi_{2i}) = \frac{\mathrm{Lfdr}_i(\pi_{2i})}{\mathrm{Lfdr}_i(\pi_{2i}) + \lambda_i [1 - \mathrm{Lfdr}_i(\pi_{2i})]}, \tag{3.3}
\]

where

\[
\lambda_i = \frac{\pi_1}{1 - \pi_1} \cdot \frac{(1 - \pi_{2i})^{n_i}}{1 - (1 - \pi_{2i})^{n_i}}. \tag{3.4}
\]

When $\lambda_i = 1$, $\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i})$ reduces to $\mathrm{Lfdr}_{ij}(\pi_{2i})$, and so One-Way GAMM with $\lambda_i = 1$ for all $i$ represents the case of no group effect. These results can be summarised in the following:

Proposition 3.1. Let $\mathrm{Lfdr}_{ij}(\pi_{2i})$ be the local FDR associated with $H_{ij}$ in group $i$ under the standard single-group two-class mixture model with $\pi_{2i}$ being the probability of a hypothesis in the group being significant, and $\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i})$ be the same under One-Way GAMM that incorporates a similar two-class mixture model across the groups with $\pi_1$ as the chance of a group being significant. Then, $\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i})$ can be expressed in terms of $\mathrm{Lfdr}_{ij}(\pi_{2i})$ and $\lambda_i$ as follows by making use of (3.1)-(3.3), with $\lambda_i$ measuring an effect due to grouping for group $i$:

\[
\mathrm{Lfdr}_{ij}(\lambda_i,\pi_{2i}) = \frac{\mathrm{Lfdr}_i(\pi_{2i}) + \lambda_i [\mathrm{Lfdr}_{ij}(\pi_{2i}) - \mathrm{Lfdr}_i(\pi_{2i})]}{\mathrm{Lfdr}_i(\pi_{2i}) + \lambda_i [1 - \mathrm{Lfdr}_i(\pi_{2i})]}, \tag{3.5}
\]

for each $i = 1,\ldots,m$; $j = 1,\ldots,n_i$.

Remark 3.1. The above results bring home the point that in an Lfdr based approach to testing hypotheses belonging to a group/family that itself is likely to be significant with a chance of its own, the Lfdr for the group should be separated out from that for each hypothesis before assessing the true significance of the hypothesis. More specifically, suppose that we have a single group (i.e., $m = 1$) of hypotheses to test. Then, the hypotheses should be tested by taking away from them the confounding effect of the group's significance by using $\mathrm{Lfdr}_{j|1}(\pi_{21})$ or the cumulative averages of them, depending on whether one desires to control the local FDR or the average local FDR (when controlling posterior FDR). Of course, one should test the significance of the group using its local FDR, $\mathrm{Lfdr}_1(\lambda_1,\pi_{21})$, before proceeding to test the hypotheses in it at a level depending on that for $\mathrm{Lfdr}_1(\lambda_1,\pi_{21})$.

More specifically, if one wants to control the average local FDR, say at $\alpha$, then we propose to reject the hypotheses associated with $\mathrm{Lfdr}_{(j)|1}(\pi_{21})$, $j = 1,\ldots,R_1$, the first $R_1$ increasingly ordered values of $\mathrm{Lfdr}_{j|1}(\pi_{21})$, where $R_1$ is such that

\[
\frac{1}{R_1} \sum_{j=1}^{R_1} \mathrm{Lfdr}_{(j)|1}(\pi_{21}) \le \frac{\alpha - \mathrm{Lfdr}_1(\lambda_1,\pi_{21})}{1 - \mathrm{Lfdr}_1(\lambda_1,\pi_{21})}.
\]

The $\mathrm{Lfdr}_1(\lambda_1,\pi_{21})$ equals 0 if the group is assumed to be significant, or it can be controlled at some pre-assigned level less than $\alpha$ to check if the group is significant. Clearly, when $\lambda_1 = 1$, our proposal reduces to controlling the average local FDR for a single group of hypotheses under the standard two-class mixture model without introducing any group effect. We will extend this proposal from single to multiple groups of hypotheses in the following.
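The relations (3.2)-(3.5) can be checked numerically with a short sketch. It is only an illustration under assumptions not made in the text: Gaussian densities $f_0 = N(0,1)$ and $f_1 = N(\mu,1)$, equal group sizes, and a common $\pi_2$; the names `normal_pdf`, `group_effect` and `gamm_lfdr` are hypothetical.

```python
import numpy as np

def normal_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * np.asarray(x, dtype=float) ** 2) / np.sqrt(2.0 * np.pi)

def group_effect(pi1, pi2, n):
    """Grouping-effect parameter lambda_i of (3.4)."""
    q = (1.0 - pi2) ** n
    return pi1 / (1.0 - pi1) * q / (1.0 - q)

def gamm_lfdr(x, pi1, pi2, mu):
    """Oracle local FDRs for an (m, n) array x, one row per group.

    Assumes f0 = N(0, 1) and f1 = N(mu, 1) (illustrative choice).
    Returns (lfdr_hyp, lfdr_group, lfdr_cond):
      lfdr_hyp   : Lfdr_ij(lambda_i, pi_2i), eq. (3.5), shape (m, n)
      lfdr_group : Lfdr_i(lambda_i, pi_2i),  eq. (3.3), shape (m,)
      lfdr_cond  : Lfdr_{j|i}(pi_2i),        eq. (3.2), shape (m, n)
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[1]
    f0, f1 = normal_pdf(x), normal_pdf(x - mu)
    lfdr_std = (1 - pi2) * f0 / ((1 - pi2) * f0 + pi2 * f1)   # Lfdr_ij(pi_2i)
    prod = lfdr_std.prod(axis=1, keepdims=True)               # Lfdr_i(pi_2i)
    lam = group_effect(pi1, pi2, n)
    denom = prod + lam * (1 - prod)
    lfdr_group = prod / denom                                 # (3.3)
    lfdr_cond = (lfdr_std - prod) / (1 - prod)                # (3.2)
    lfdr_hyp = (prod + lam * (lfdr_std - prod)) / denom       # (3.5)
    return lfdr_hyp, lfdr_group[:, 0], lfdr_cond
```

Setting `pi1 = 1 - (1 - pi2) ** n` gives $\lambda_i = 1$, in which case `lfdr_hyp` coincides with the standard two-class Lfdr, matching the no-group-effect case described above.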

We express $\delta_{ij} \in \{0,1\}$, the decision rule associated with $\theta_{ij}$, similarly to $\theta_{ij}$, as follows: $\delta_{ij}(X) = \delta_i(X) \cdot \delta_{j|i}(X)$, with $\delta_i(X) \in \{0,1\}$ and $\delta_{j|i}(X) \in \{0,1\}$ being the decision rules for $\theta_i$ and $\theta_{j|i}$, respectively. This provides a two-stage approach to deciding between $\theta_{ij} = 0$ and $\theta_{ij} = 1$ simultaneously for all $(i,j)$. This paper relates to the development of such two-stage approaches, but focused on controlling the posterior expected proportion of false discoveries across all hypotheses, referred to as the total posterior FDR ($\mathrm{PFDR}_T$), or the posterior expected average false discovery proportion across the selected/significant groups, referred to as the selective posterior FDR ($\mathrm{PFDR}_S$), at a given level $\alpha$. In other words, we consider determining $(\delta_i(X), \delta_{j|i}(X))$, $i = 1,\ldots,m$, $j = 1,\ldots,n_i$, satisfying

\[
\mathrm{PFDR}_T = E\left[ \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} (1 - \theta_{ij}) \delta_{ij}(X)}{\max\left\{ \sum_{i=1}^{m} \sum_{j=1}^{n_i} \delta_{ij}(X), 1 \right\}} \,\Big|\, X \right] \le \alpha, \tag{3.6}
\]

or

\[
\mathrm{PFDR}_S = E\left[ \frac{1}{|S|} \sum_{i \in S} \frac{\sum_{j=1}^{n_i} (1 - \theta_{ij}) \delta_{ij}(X)}{\max\left\{ \sum_{j=1}^{n_i} \delta_{ij}(X), 1 \right\}} \,\Big|\, X \right] \le \alpha, \tag{3.7}
\]

where $S$ is the set of indices for the selected groups, with the expectations taken with respect to the $\theta_{ij}$'s conditional on $X$. For notational convenience, we will often hide the symbol $X$ in the $\delta$'s.

Using (3.1), we see that $\mathrm{PFDR}_T$ and $\mathrm{PFDR}_S$ simplify, respectively, to

\[
\mathrm{PFDR}_T = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} \mathrm{Lfdr}_{ij}(\lambda_i,\pi_{2i}) \delta_{ij}}{\max\left( \sum_{i=1}^{m} \sum_{j=1}^{n_i} \delta_{ij}, 1 \right)}
= \frac{\sum_{i=1}^{m} \delta_i R_i \left\{ 1 - [1 - \mathrm{Lfdr}_i(\lambda_i,\pi_{2i})][1 - \mathrm{PFDR}_{W_i}] \right\}}{\max\left( \sum_{i=1}^{m} \delta_i R_i, 1 \right)}, \tag{3.8}
\]

and

\[
\mathrm{PFDR}_S = \frac{\sum_{i=1}^{m} \delta_i \left\{ 1 - [1 - \mathrm{Lfdr}_i(\lambda_i,\pi_{2i})][1 - \mathrm{PFDR}_{W_i}] \right\}}{\max\left( \sum_{i=1}^{m} \delta_i, 1 \right)}, \tag{3.9}
\]

where $R_i = \delta_i \sum_{j=1}^{n_i} \delta_{j|i}$, and $\mathrm{PFDR}_{W_i} = \sum_{j=1}^{n_i} \delta_{j|i} \mathrm{Lfdr}_{j|i}(\pi_{2i}) / \max(R_i, 1)$ is the within-group posterior FDR for group $i$.

The above representations of $\mathrm{PFDR}_T$ and $\mathrm{PFDR}_S$ under One-Way GAMM provide a Group Adjusted TEsting (GATE) framework for one-way classified hypotheses using their local FDRs, allowing us to produce algorithms (in their oracle forms) answering each of Q1 and Q2. We commonly refer to these algorithms as One-Way GATE algorithms.

Answering Q1

Before we present an algorithm in its oracle form answering Q1, it is important to note the following theorem that drives its development with some optimality property.

Theorem 3.1. Let

\[
\mathrm{PFNR}_T = E\left[ \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} \theta_{ij} (1 - \delta_{ij}(X))}{\max\left\{ \sum_{i=1}^{m} \sum_{j=1}^{n_i} (1 - \delta_{ij}(X)), 1 \right\}} \,\Big|\, X \right] \tag{3.10}
\]

denote the total posterior FNR ($\mathrm{PFNR}_T$) of a decision rule $\delta(X) = \{\delta_{ij}(X), i = 1,\ldots,m, j = 1,\ldots,n_i\}$. The $\mathrm{PFNR}_T$ of the decision rule $\delta(X)$ with $\delta_{ij}(X) = I(\mathrm{Lfdr}_{ij}(\lambda_i,\pi_{2i}) \le c)$, for $c \in (0,1)$ satisfying $\mathrm{PFDR}_T = \alpha$, is always less than or equal to that of any other $\delta_{ij}(X)$ with $\mathrm{PFDR}_T \le \alpha$.

A proof of this theorem can be seen in Appendix.

Algorithm 1 One-Way GATE 1 (Oracle).
1: Calculate $\mathrm{Lfdr}_{ij}(\lambda_i,\pi_{2i})$, the hypothesis-specific local FDR under One-Way GAMM, from Proposition 3.1, for each $i = 1,\ldots,m$; $j = 1,\ldots,n_i$.
2: Pool all these $\mathrm{Lfdr}_{ij}$'s together and sort them as $\mathrm{Lfdr}_{(1)} \le \cdots \le \mathrm{Lfdr}_{(N)}$.
3: Reject the hypotheses associated with $\mathrm{Lfdr}_{(k)}$, $k = 1,\ldots,R$, where $R = \max\left\{ l : \sum_{k=1}^{l} \mathrm{Lfdr}_{(k)} \le l\alpha \right\}$.

Theorem 3.2. The oracle One-Way GATE 1 controls $\mathrm{PFDR}_T$ at $\alpha$.

This theorem can be proved using standard arguments used for Lfdr based approaches to testing a single group of hypotheses (see, e.g., Sun & Cai (2007); Sarkar & Zhou (2008)). It is important to note that $\mathrm{PFDR}_T$ may not equal a pre-specified value of $\alpha$, and so Algorithm 1 is generally sub-optimal in the sense that it is the closest to one that is optimal as stated in Theorem 3.1.

Remark 3.2. When $\lambda_i = 1$ for all $i$, i.e., when the underlying grouping of hypotheses is ineffective in the sense that a group's own chance of being significant is no different from when it is formed by combining a set of independent hypotheses, One-Way GATE 1 reduces to the standard Lfdr based approach (like that in Sun & Cai (2007); He et al. (2015); and in many others).
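Steps 2 and 3 of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation; it takes the hypothesis-specific Lfdr values of Step 1 as given, and the function name `one_way_gate1` is hypothetical.

```python
import numpy as np

def one_way_gate1(lfdr, alpha):
    """Oracle step-up rule of Algorithm 1 on precomputed Lfdr values.

    lfdr  : array of any shape holding Lfdr_ij(lambda_i, pi_2i) (Step 1).
    alpha : target level for the total posterior FDR.
    Returns a boolean array of the same shape; True marks a rejection.
    """
    flat = np.asarray(lfdr, dtype=float).ravel()
    order = np.argsort(flat, kind="stable")              # Step 2: pool and sort
    running_mean = np.cumsum(flat[order]) / np.arange(1, flat.size + 1)
    # Step 3: R = max{l : sum_{k<=l} Lfdr_(k) <= l * alpha}. The running mean
    # of increasingly sorted values is nondecreasing, so take the last index
    # at which it is still within alpha.
    passing = np.nonzero(running_mean <= alpha)[0]
    n_reject = passing[-1] + 1 if passing.size else 0
    reject = np.zeros(flat.size, dtype=bool)
    reject[order[:n_reject]] = True
    return reject.reshape(np.shape(lfdr))
```

The average of the Lfdr values over the rejected set is the realized total posterior FDR of (3.8), so by construction it does not exceed `alpha` whenever at least one hypothesis is rejected.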
As we will see from simulation studies in Section 4, with λ i increasing (or decreasing) from 1, i.e., when a group s chance of being significant gets larger (or smaller) than what it is if the group consists of independent hypotheses, the standard Lfdr based approach becomes less powerful (or fails to control the error rate) Answering Q2 There are applications in the context of selective inference of multiple groups/familes of hypotheses where discovering significant groups, and hence a control over a measure of their false discoveries, is scientifically no less meaningful than making such discoveries for individual hypotheses subject to a control over a similar measure of false discoveries across all of them. For instance, as Peterson et al. (2016b) noted, in a multiphenotype genome-wide association study, which is often focused on groups/families of all phenotype specific hypotheses related to different genetic variants, rejecting H i corresponding to variant i is considered an important discovery in the process of identifying phenotypes that are significantly associated with that variant. They borrowed ideas from Benjamini & Bogomolov (2014) and considered a hierarchical testing method that allows control of this so-called between-group in the process of 8

9 controlling the expected average of false discovery proportions across significant groups (due to Benjamini & Bogomolov (2014)). The following algorithm in its oracle form answering Q2 offers an Lfdr based alternative to the hierarchical testing method of Peterson et al. (2016b). It allows a control over m i=1 P B = δ i Lfdr i (λ i,π 2i ) max ( m i=1 δ i,1 ), an Lfdr analog of the aforementioned between-group for the selected groups, while controlling P S. The following notation is being used in this algorithm: For 0 < α < 1, R i (α ) = max{1 k n i : k j=1 Lfdr (j) i(π 2i ) kα }, with Lfdr (j) i (π 2i ), j = 1,...,n i, being the sorted values of the Lfdr j i (π 2i ) s in group i. Algorithm 2 One-Way GATE 2 (Oracle). 1: Given an (0, α), select the largest subset of group indices S such that i S Lfdr i (λ i,π 2i ). 1 S 2: For each i S, and any given α α, find R i (α ) to calculate P S (α ) = 1 1 (1 Lfdr i (λ i,π 2i )) S i S 3: Find α (S) = sup{α : P S (α ) α}. 1 1 R i (α ) 4: Reject the hypotheses associated with P S (α (S)). R i (α ) j=1 Lfdr (j) i (π 2i ). (3.11) Theorem 3.3. The oracle One-Way GATE 2 controls P S at α subject to a control over P B at < α. This theorem can be proved by noting that the left-hand side of (3.11) is the P S of the procedure produced by Algorithm 2. and Let [ m i=1 PFNR B = E θ i (1 δ i (X)) max{ m i=1 (1 δ i (X)),1} ], X [ ni j=1 PFNR Wi = E θ ] j i(1 δ j i (X)) max{ n i j=1 (1 δ j i(x)),1} X denote between-group posterior FNR and within-group posterior FNR for group i, respectively, for a decision rule of the form δ ij (X) = δ i (X)δ j i (X), with δ i (X) = I(Lfdr i (λ i,π 2i ) c) and δ j i (X) = I(Lfdr j i (π 2i ) c ), for some 0 < c,c < 1, i = 1,...,m. Remark 3.3. 
From Theorem 3.1, we have the following optimality result regarding One-Way GATE 2: Given any 0 < < α < 1, (i) the PFNR B of the decision rule of the form δ i (X) = I(Lfdr i (λ i,π 2i ) c) with 0 < c < 1 satisfying P B = is less than or equal to that of any other δ i (X) with P B. 9

10 (ii) Given δ i (X), i = 1,...,m, with P B, there exists an α () α, subject to P S = α, such that, for each i, PFNR Wi of the decision rule of the form δ j i (X) = I(Lfdr j i (π 2i ) c ) with 0 < c < 1 satisfying P Wi = α() is less than or equal to that of any other decision rule in that group for which P Wi α (). Remark 3.4. It is important to note that One-Way GATE 2 without Step 1 can be used in situations where the focus is on controlling P S given a selection rule (or S). 4. Numerical Studies This section presents results of numerical studies we conducted to examine the performances of One-Way GATE 1 and One-Way GATE 2 compared to their relevant competitors in their oracle forms One-Way GATE 1 We considered various simulation settings involving 10,000 or 100,000 hypotheses grouped into equal-sized groups to investigate the performance of One-Way GATE 1 in comparison with its three competitors, all in their oracle forms. The first competitor, named as oracle Method, ignores the group structure by pooling all the hypotheses together into a single group, while the other two are oracle (Sun & Cai (2009)) and oracle (Hu et al. (2010)) methods. They operate under our model setting with equal group size n as follows: Oracle Method: The single-group Lfdr based method of Sun & Cai (2007) is applied to the mn hypotheses pooled together into a single group under a two-class mixture model X ij (1 p)f 0 (x ij )+pf 1 (x ij ), with p = m 1 m i=1 p i, where p i = π 1 and = π 2i /[1 (1 π 2i ) n i]. Oracle Method: The single-group Lfdr based method of Sun & Cai (2007) is applied to the mn hypotheses pooled together into a single group assuming a two-class mixture model X ij (1 )f 0 (x ij ) + f 1 (x ij ) for the n hypotheses in group i, for each i = 1,...,m. 
Oracle Method: X ij is converted to its p-value P ij before a level α BH method is applied to the weighted p-values Pij w = p(1 p i)p ij /p i, i = 1,...,m;j = 1,...,n, for the mn hypotheses pooled together into a single group. The simulations involved independently generated triplets of observations (X ij,θ i,θ j i ), i = 1,...,m(= 200 or 2000); j = 1,...,n i (= 5 or 50), with (i) θ i Bern(π 1 = 0.3); (ii) θ j i s jointly following TPBern(π 2i ;n i ), with π 2i determined from (3.4) using λ = k 2 /100 for k = 1, 2,..., 19 or 20; and (iii) X ij θ ij N(0,1) if θ ij = 0, and 0.3N( 2,1) + 0.7N(µ 2,1) if θ ij = 1, where µ 2 = 1.5 or 1.6 or... or 2.9 or 3.0. The oracle versions of One-Way GATE 1, the Method, method, and method were applied to the data for testing θ ij = 0 against θ ij = 1 simultaneously for all (i,j) at α = 0.05, and the simulated values of the total false discovery rate, the average number of true rejections, and the average number of total rejections were obtained for each of them based on 1000 replications. 10

Figure 1: Oracle One-Way GATE 1: $m = 2000$, $n_i = 5$, $\mu_2 = 1.5$. The x-axis corresponds to $\lambda$, varying from 0.01 to 4.

Figures 1-3 and 6-14 display how the four methods compare across different values of $\pi_{2i}$ (or $\lambda$) and $\mu_2$ as the group size changes from a small to a large value. The first three of these figures are being used here to point out scenarios where One-Way GATE 1 is seen to perform better than its competitors when $\mu_2 = 1.5$. The rest of these graphs for larger values of µ are put in Appendix to see if the comparative performance pattern among the four methods changes with increasing value of µ. Figures 1-3 show that oracle One-Way GATE 1 controls the FDR at the desired level 0.05 well. The oracle Method also controls the FDR at the desired level. However, it is seen to be less powerful than oracle One-Way GATE 1, as expected, with the power difference getting larger with increased group size. The superior performance of oracle One-Way GATE 1 over oracle method when λ 1 is clearly shown by these graphs. The oracle method fails to control the FDR, with the resultant FDR getting as large as 0.47, when λ < 1. This happens because it uses a larger value of $\pi_{2i}$ when λ is small, inflating the FDR by an amount relating to the value of λ. When λ is larger, it uses a smaller value of $\pi_{2i}$, resulting in a method which is overly conservative. The has a similar pattern. It fails to control the FDR when λ < 1 and is overly conservative when λ > 1. This conservativeness gets more and more prominent as λ increases. When λ < 1, the method yields slightly more rejections, largely due to its inflated error rate. When λ > 1, oracle One-Way GATE 1 works far better than oracle method and oracle method.

12 alpha= Figure 2: Oracle One-Way GATE 1: m = 200,n i = 50,µ 2 = 1.5. The x-axis corresponds to π 2i, varying from 0.05 to As seen from Figures 6-14, oracle One-Way GATE 1 is seen to retain its improved performance over the oracle versions of, and methods for larger values of µ One-Way GATE 2 Simulation studies were conducted to compare oracle One-Way GATE 2 to its only competitor, the method (Benjamini & Bogomolov (2014)) in its oracle form that operates as follows: Oracle method using Simes combination: X ij is converted to its p-value P ij. With P i(1) P i(n) denoting the sorted p-values in group i, let P i = min 1 j n {n(1 )P i(j) /j} denote Simes combination of the p-values in group i in its oracle form, for i = 1,...,m. Let G be the set of indices of the group specific hypotheses H i rejected using the oracle level α BH method based on (1 π 1 )P i, i = 1,...,m. Reject the hypotheses corresponding to P i(j) for all i G and j R i = max{j : (1 π 1 )P i(j) j G α/mn}. The comparison was made in terms of selective, average number of total rejections, and average number of true rejections were carried out under the same setting as in One-Way GATE 1. Figures 4 and 5 present the comparison for the setting where m = 2,000, n i = 50, and π 1 = 0.10 and 0.52 respectively and = The results for other settings are reported in Figures First, it is demonstrated that both the oracle One-Way GATE 2 and oracle method control the P S well. 12

Figure 3: Oracle One-Way GATE 1: $m = 2000$, $n_i = 50$, $\mu_2 = 1.5$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to ….

The oracle One-Way GATE 2 is more powerful, in the sense of yielding a larger number of true rejections, when $\pi_1$ is relatively small, that is, when the significant groups are sparse. When $\pi_1$ is as large as 0.8, most of the groups are selected, so there is little adjustment for selection in the oracle method of Benjamini & Bogomolov (2014), which therefore makes more rejections. When the group size is large ($n_i = 50$), the oracle One-Way GATE 2 is more powerful than that oracle method; however, the latter can yield a larger number of rejections when the group size is small ($n_i = 5$).

5. Concluding Remarks

The primary focus of this article has been to continue the line of research in Liu et al. (2016) to answer Q1 and Q2 for one-way classified hypotheses, providing the groundwork for our broader goal of answering these questions in the setting of two-way classified hypotheses. The two-way classified setting occurs in many applications. For instance, in time-course microarray experiments (see, e.g., Storey et al. (2005); Yuan & Kendziorski (2006); Sun & Wei (2011)), the hypotheses of interest can be laid out in a two-way classified form, with gene and time point representing the two categories of classification. In multi-phenotype GWAS (Peterson et al. (2016b); Segura et al. (2012)), the families of hypotheses related to different phenotypes form one level of grouping, while the other level of grouping is formed by the families of hypotheses corresponding to different SNPs. A two-way classified structure of hypotheses also occurs in brain imaging studies (Liu et al. (2009); Stein et al. (2010); Lin et al. (2014); Barber & Ramdas (2015)).

Figure 4: Oracle One-Way GATE 2: $m = 2000$, $n_i = 50$, $\pi_1 = $ ….

Now that we have a theoretical framework that successfully captures the underlying group effect and yields powerful approaches to multiple testing in the one-way classified setting, we can proceed to extend it to produce new and powerful Lfdr based approaches answering Q1 and Q2 in the two-way classified setting. We intend to do that in our future communications.

Also, we have focused in this paper on developing the GATE algorithms in their oracle forms. In practice, one can estimate the unknown quantities in these oracle methods using various estimation techniques; see, e.g., Liu et al. (2016). Additionally, one can assume hyper-priors for the parameters and use Bayesian tools to calculate the Lfdrs. We leave these for future research.

The figures associated with our numerical studies involving the oracle form of the method of Benjamini & Bogomolov (2014) seem to suggest that this method, as originally proposed, can potentially be improved by plugging into it an estimated proportion of active groups. This is another important direction that we will pursue in our future research.

A. Appendix

A.1. Proofs of (3.2) and (3.3)

These results, although they appeared before in Liu et al. (2016), are proved here using different and simpler arguments. They are restated, without any loss of generality, for a single group with slightly different notation in the following lemma.

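Figure 5: Oracle One-Way GATE 2: $m = 2000$, $n_i = 50$, $\pi_1 = $ ….

Lemma A.1. Conditionally given $\theta \sim \mathrm{Bern}(\pi_1)$, let $(X_j, \theta_j)$, $j = 1, \ldots, n$, be distributed as follows: (i) $X_1, \ldots, X_n \mid \theta_1, \ldots, \theta_n \overset{ind}{\sim} (1 - \theta\theta_j) f_0(x_j) + \theta\theta_j f_1(x_j)$, and (ii) $\theta_1, \ldots, \theta_n \sim \mathrm{TPBern}(\pi_2; n)$. Let $\mathrm{Lfdr}_j(\pi_2) \equiv \mathrm{Lfdr}(x_j; \pi_2) = (1 - \pi_2) f_0(x_j)/m(x_j)$, with $m(x) = (1 - \pi_2) f_0(x) + \pi_2 f_1(x)$, for $j = 1, \ldots, n$, and $\mathrm{Lfdr}^*(\pi_2) = \prod_{j=1}^n \mathrm{Lfdr}_j(\pi_2)$. Then,

$$\Pr(\theta_j = 0 \mid \theta = 1, X_1 = x_1, \ldots, X_n = x_n) = \frac{\mathrm{Lfdr}_j(\pi_2) - \mathrm{Lfdr}^*(\pi_2)}{1 - \mathrm{Lfdr}^*(\pi_2)} \tag{A.1}$$

and

$$\Pr(\theta = 0 \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{\mathrm{Lfdr}^*(\pi_2)}{\mathrm{Lfdr}^*(\pi_2) + \lambda[1 - \mathrm{Lfdr}^*(\pi_2)]}, \tag{A.2}$$

where $\lambda = \dfrac{\pi_1}{1 - \pi_1} \cdot \dfrac{(1 - \pi_2)^n}{1 - (1 - \pi_2)^n}$.

Proof. First, note that

$$(X_1, \ldots, X_n) \mid \theta = 0 \;\sim\; \prod_{j=1}^n f_0(x_j) = \frac{\prod_{j=1}^n m(x_j)}{(1 - \pi_2)^n}\, \mathrm{Lfdr}^*(\pi_2), \tag{A.3}$$

and

$$\begin{aligned}
(X_1, \ldots, X_n) \mid \theta = 1 \;&\sim\; \frac{1}{1 - (1 - \pi_2)^n} \sum_{\theta_1, \ldots, \theta_n:\ \sum_{j=1}^n \theta_j > 0}\ \prod_{j=1}^n \{(1 - \theta_j) f_0(x_j) + \theta_j f_1(x_j)\} \prod_{j=1}^n \{\pi_2^{\theta_j} (1 - \pi_2)^{1 - \theta_j}\} \\
&= \frac{1}{1 - (1 - \pi_2)^n} \left[ \prod_{j=1}^n m(x_j) - (1 - \pi_2)^n \prod_{j=1}^n f_0(x_j) \right] \\
&= \frac{\prod_{j=1}^n m(x_j)}{1 - (1 - \pi_2)^n}\, [1 - \mathrm{Lfdr}^*(\pi_2)],
\end{aligned} \tag{A.4}$$

from which we get

$$(X_1, \ldots, X_n) \;\sim\; \left\{ \frac{(1 - \pi_1)\, \mathrm{Lfdr}^*(\pi_2)}{(1 - \pi_2)^n} + \frac{\pi_1 [1 - \mathrm{Lfdr}^*(\pi_2)]}{1 - (1 - \pi_2)^n} \right\} \prod_{j=1}^n m(x_j). \tag{A.5}$$

Formula (A.2) follows upon dividing $(1 - \pi_1)$ times (A.3) by (A.5).

When $\theta_j = 0$, the conditional distribution of $X_1, \ldots, X_n$ given $\theta = 1$ can be obtained similarly to that in (A.4) as follows:

$$\begin{aligned}
&\frac{(1 - \pi_2) f_0(x_j)}{1 - (1 - \pi_2)^n} \sum_{\theta_k,\, k \ne j:\ \sum_{k \ne j} \theta_k > 0}\ \prod_{k (\ne j) = 1}^n \{(1 - \theta_k) f_0(x_k) + \theta_k f_1(x_k)\} \prod_{k (\ne j) = 1}^n \{\pi_2^{\theta_k} (1 - \pi_2)^{1 - \theta_k}\} \\
&\quad = \frac{(1 - \pi_2) f_0(x_j)}{1 - (1 - \pi_2)^n} \left[ \prod_{k (\ne j) = 1}^n m(x_k) - (1 - \pi_2)^{n-1} \prod_{k (\ne j) = 1}^n f_0(x_k) \right] \\
&\quad = \frac{\prod_{k=1}^n m(x_k)}{1 - (1 - \pi_2)^n}\, [\mathrm{Lfdr}_j(\pi_2) - \mathrm{Lfdr}^*(\pi_2)].
\end{aligned} \tag{A.6}$$

Formula (A.1) then follows upon dividing (A.6) by (A.4).

Proof of Theorem 3.1. For notational simplicity, we will hide $X$ in $\delta_{ij}(X)$, $\delta^*_{ij}(X)$, $\mathrm{Lfdr}_{ij}(X)$. First, we note the following inequalities:

$$\alpha \sum_{ij} (\delta_{ij} - \delta^*_{ij}) \;\ge\; \sum_{ij} (\delta_{ij} - \delta^*_{ij})\, \mathrm{Lfdr}_{ij} \;\ge\; c \sum_{ij} (\delta_{ij} - \delta^*_{ij}), \tag{A.7}$$

the first of which follows from the fact that the $\mathrm{PFDR}_T$ of $\delta$ is less than or equal to $\alpha$, which is the $\mathrm{PFDR}_T$ of $\delta^*$, while the second one follows from $\sum_{ij} (\delta^*_{ij} - \delta_{ij})(c - \mathrm{Lfdr}_{ij}) \ge 0$, because of the definition of $\delta^*_{ij}$. Since $\alpha = \sum_{ij} \delta^*_{ij} \mathrm{Lfdr}_{ij} / \max\{\sum_{ij} \delta^*_{ij}, 1\} \le c$, we have from (A.7) that $\sum_{ij} (\delta_{ij} - \delta^*_{ij})\, \mathrm{Lfdr}_{ij} \le 0$, that is,

$$\sum_{ij} (1 - \delta_{ij})\, \mathrm{Lfdr}_{ij} \;\ge\; \sum_{ij} (1 - \delta^*_{ij})\, \mathrm{Lfdr}_{ij}. \tag{A.8}$$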

With $\mathrm{PFNR}_T(\delta)$ and $\mathrm{PFNR}_T(\delta^*)$ denoting the $\mathrm{PFNR}_T$ of $\delta$ and $\delta^*$, respectively, we now note that

$$\begin{aligned}
c \left[ \frac{\mathrm{PFNR}_T(\delta)}{1 - \mathrm{PFNR}_T(\delta)} - \frac{\mathrm{PFNR}_T(\delta^*)}{1 - \mathrm{PFNR}_T(\delta^*)} \right]
&= c \left[ \frac{\sum_{ij} (1 - \delta_{ij})(1 - \mathrm{Lfdr}_{ij})}{\sum_{ij} (1 - \delta_{ij})\, \mathrm{Lfdr}_{ij}} - \frac{\sum_{ij} (1 - \delta^*_{ij})(1 - \mathrm{Lfdr}_{ij})}{\sum_{ij} (1 - \delta^*_{ij})\, \mathrm{Lfdr}_{ij}} \right] \\
&= \sum_{ij} \left[ \frac{1 - \delta_{ij}}{\sum_{i'j'} (1 - \delta_{i'j'})\, \mathrm{Lfdr}_{i'j'}} - \frac{1 - \delta^*_{ij}}{\sum_{i'j'} (1 - \delta^*_{i'j'})\, \mathrm{Lfdr}_{i'j'}} \right] [c(1 - \mathrm{Lfdr}_{ij}) - (1 - c)\, \mathrm{Lfdr}_{ij}] \\
&\ge 0,
\end{aligned}$$

with the inequality holding due to the definition of $\delta^*_{ij}$ and the inequality in (A.8). Thus, since $x/(1 - x)$ is increasing in $x$ on $[0, 1)$, we have

$$\mathrm{PFNR}_T(\delta) \ge \mathrm{PFNR}_T(\delta^*),$$

which proves the theorem.

References

Barber, R. F., & Ramdas, A. (2015). The p-filter: multilayer false discovery rate control for grouped hypotheses. Journal of the Royal Statistical Society. Series B, 79.

Benjamini, Y., & Bogomolov, M. (2014). Selective inference on multiple families of hypotheses. Journal of the Royal Statistical Society. Series B, 76.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B, 57.

Cai, T. T., & Sun, W. (2009). Simultaneous testing of grouped hypotheses: Finding needles in multiple haystacks. Journal of the American Statistical Association, 104.

Cai, T. T., Sun, W., & Wang, W. (2016). CARS: Covariate assisted ranking and screening for large-scale two-sample inference. Technical Report.

Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23.

Efron, B., Tibshirani, R., Storey, J. D., & Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96.

Ferkingstad, E., Frigessi, A., Rue, H., Thorleifsson, G., & Kong, A. (2008). Unsupervised empirical Bayesian multiple testing with external covariates. The Annals of Applied Statistics, 2.

Genovese, C., & Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society. Series B, 64.

He, L., Sarkar, S. K., & Zhao, Z. (2015). Capturing the severity of type II errors in high-dimensional multiple testing. Journal of Multivariate Analysis, 142.

Heller, R., Chatterjee, N., Krieger, A., & Shi, J. (2017). Post-selection inference following aggregate level hypothesis testing in large scale genomic data. Journal of the American Statistical Association, 113. Available online.

Hu, J. X., Zhao, H., & Zhou, H. H. (2010). False discovery rate control with groups. Journal of the American Statistical Association, 105.

Ignatiadis, N., Klaus, B., Zaugg, J. B., & Huber, W. (2016). Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods, 13.

Lin, D., Calhoun, V. D., & Wang, Y. (2014). Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. Medical Image Analysis, 18.

Liu, J., Pearlson, G., Windemuth, A., Ruano, G., Perrone-Bizzozero, N. I., & Calhoun, V. (2009). Combining fMRI and SNP data to investigate connections between brain function and genetics using parallel ICA. Human Brain Mapping, 30.

Liu, Y., Sarkar, S. K., & Zhao, Z. (2016). A new approach to multiple testing of grouped hypotheses. Journal of Statistical Planning and Inference, 179.

Newton, M. A., Noueiry, A., Sarkar, D., & Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5.

Peterson, C. B., Bogomolov, M., Benjamini, Y., & Sabatti, C. (2016a). Many phenotypes without many false discoveries: Error controlling strategies for multitrait association studies. Genetic Epidemiology, 40.

Peterson, C. B., Bogomolov, M., Benjamini, Y., & Sabatti, C. (2016b). Many phenotypes without many false discoveries: Error controlling strategies for multitrait association studies. Genetic Epidemiology, 40.

Sarkar, S. K. (2004). FDR-controlling stepwise procedures and their false negatives rates. Journal of Statistical Planning and Inference, 125.

Sarkar, S. K., & Zhou, T. (2008). Controlling Bayes directional false discovery rate in random effects model. Journal of Statistical Planning and Inference, 138.

Sarkar, S. K., Zhou, T., & Ghosh, D. (2008). A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statistica Sinica, 18.

Scott, J. G., Kelly, R. C., Smith, M. A., Zhou, P., & Kass, R. E. (2015). False discovery rate regression: an application to neural synchrony detection in primary visual cortex. Journal of the American Statistical Association, 110.

Segura, V., Vilhjálmsson, B. J., Platt, A., Korte, A., Seren, Ü., Long, Q., & Nordborg, M. (2012). An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nature Genetics, 44.

Stein, J. L., Hua, X., Lee, S., Ho, A. J., Leow, A. D., Toga, A. W., Saykin, A. J., Shen, L., Foroud, T., Pankratz, N. et al. (2010). Voxelwise genome-wide association study (vGWAS). NeuroImage, 53.

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society. Series B, 64.

Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G., & Davis, R. W. (2005). Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences of the United States of America, 102.

Sun, L., Craiu, R. V., Paterson, A. D., & Bull, S. B. (2006). Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genetic Epidemiology, 30.

Sun, W., & Cai, T. T. (2007). Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association, 102.

Sun, W., & Cai, T. T. (2009). Large-scale multiple testing under dependence. Journal of the Royal Statistical Society. Series B, 71.

Sun, W., & Wei, Z. (2011). Multiple testing for pattern identification, with applications to microarray time-course experiments. Journal of the American Statistical Association, 106.

Yuan, M., & Kendziorski, C. (2006). Hidden Markov models for microarray time course data in multiple biological conditions. Journal of the American Statistical Association, 101.

Zablocki, R. W., Schork, A. J., Levine, R. A., Andreassen, O. A., Dale, A. M., & Thompson, W. K. (2014). Covariate-modulated local false discovery rate for genome-wide association studies. Bioinformatics, (p. btu145).
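Returning to Lemma A.1 of Appendix A.1, the closed forms (A.1) and (A.2) can be sanity-checked numerically by brute-force enumeration over all configurations of $(\theta_1, \ldots, \theta_n)$. The Python sketch below is illustrative only and rests on assumptions not fixed by the lemma: $f_0$ and $f_1$ are taken to be the $N(0,1)$ and $N(\mu,1)$ densities, the TPBern$(\pi_2; n)$ law is read as i.i.d. Bern$(\pi_2)$ components conditioned on at least one being nonzero (as in the sum appearing in (A.4)), and the function names `closed_form` and `brute_force` are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float = 0.0) -> float:
    """Density of N(mean, 1) at x (assumed form of f_0 and f_1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def closed_form(
    x: Sequence[float], pi1: float, pi2: float, mu: float
) -> Tuple[List[float], float]:
    """Evaluate the right-hand sides of (A.1) and (A.2)."""
    n = len(x)
    f0 = [normal_pdf(v) for v in x]
    f1 = [normal_pdf(v, mu) for v in x]
    lfdr = [(1 - pi2) * a / ((1 - pi2) * a + pi2 * b) for a, b in zip(f0, f1)]
    lfdr_star = math.prod(lfdr)
    q = (1 - pi2) ** n
    lam = pi1 / (1 - pi1) * q / (1 - q)
    within = [(l - lfdr_star) / (1 - lfdr_star) for l in lfdr]  # (A.1)
    group = lfdr_star / (lfdr_star + lam * (1 - lfdr_star))     # (A.2)
    return within, group


def brute_force(
    x: Sequence[float], pi1: float, pi2: float, mu: float
) -> Tuple[List[float], float]:
    """Compute the same posterior probabilities by direct enumeration."""
    n = len(x)
    f0 = [normal_pdf(v) for v in x]
    f1 = [normal_pdf(v, mu) for v in x]
    q = (1 - pi2) ** n
    joint_null = (1 - pi1) * math.prod(f0)  # theta = 0: every X_j follows f_0
    joint_alt = 0.0                         # theta = 1, summed over configurations
    joint_alt_zero = [0.0] * n              # theta = 1 together with theta_j = 0
    for cfg in itertools.product((0, 1), repeat=n):
        if sum(cfg) == 0:
            continue  # excluded by the truncation
        prior = math.prod(pi2 if t else 1 - pi2 for t in cfg) / (1 - q)
        lik = math.prod(f1[j] if t else f0[j] for j, t in enumerate(cfg))
        weight = pi1 * prior * lik
        joint_alt += weight
        for j, t in enumerate(cfg):
            if t == 0:
                joint_alt_zero[j] += weight
    within = [z / joint_alt for z in joint_alt_zero]
    group = joint_null / (joint_null + joint_alt)
    return within, group


if __name__ == "__main__":
    data = [0.3, 2.1, -0.5, 1.7]
    print(closed_form(data, pi1=0.3, pi2=0.2, mu=2.0))
    print(brute_force(data, pi1=0.3, pi2=0.2, mu=2.0))
```

The enumeration costs $2^n$ terms, so it is only practical for small $n$; its purpose is to confirm that the product form $\mathrm{Lfdr}^*(\pi_2)$ and the parameter $\lambda$ reproduce the exact posterior probabilities under the stated reading of the model.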

A.2. More simulation results

Figure 6: Oracle One-Way GATE 1: $G = 2000$, $n_i = 5$, $\mu_2 = 2$. The x-axis corresponds to $\lambda$, varying from 0.01 to 4.

Figure 7: Oracle One-Way GATE 1: $G = 2000$, $n_i = 5$, $\mu_2 = 2.5$. The x-axis corresponds to $\lambda$, varying from 0.01 to 4.

Figure 8: Oracle One-Way GATE 1: $G = 2000$, $n_i = 5$, $\mu_2 = 3$. The x-axis corresponds to $\lambda$, varying from 0.01 to 4.

Figure 9: Oracle One-Way GATE 1: $G = 200$, $n_i = 50$, $\mu_2 = 2$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to ….

Figure 10: Oracle One-Way GATE 1: $G = 200$, $n_i = 50$, $\mu_2 = 2.5$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to ….

Figure 11: Oracle One-Way GATE 1: $G = 200$, $n_i = 50$, $\mu_2 = 3$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to ….

Figure 12: Oracle One-Way GATE 1: $G = 2000$, $n_i = 50$, $\mu_2 = 2$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to ….

Figure 13: Oracle One-Way GATE 1: $G = 2000$, $n_i = 50$, $\mu_2 = 2.5$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to ….

Figure 14: Oracle One-Way GATE 1: $G = 2000$, $n_i = 50$, $\mu_2 = 3$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to ….

Figure 15: Oracle One-Way GATE 2: $G = 2000$, $n_i = 5$, $\pi_1 = $ ….

Figure 16: Oracle One-Way GATE 2: $G = 2000$, $n_i = 5$, $\pi_1 = $ ….

Figure 17: Oracle One-Way GATE 2: $G = 2000$, $n_i = 5$, $\pi_1 = $ ….

Figure 18: Oracle One-Way GATE 2: $G = 200$, $n_i = 50$, $\pi_1 = $ ….

Figure 19: Oracle One-Way GATE 2: $G = 200$, $n_i = 50$, $\pi_1 = 0.5$.

Figure 20: Oracle One-Way GATE 2: $G = 200$, $n_i = 50$, $\pi_1 = 0.8$.

Figure 21: Oracle One-Way GATE 2: $G = 2000$, $n_i = 50$, $\pi_1 = $ ….

Figure 22: Oracle One-Way GATE 2: $G = 2000$, $n_i = 50$, $\pi_1 = 0.5$.

Figure 23: Oracle One-Way GATE 2: $G = 2000$, $n_i = 50$, $\pi_1 = $ ….


More information

STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE. National Institute of Environmental Health Sciences and Temple University

STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE. National Institute of Environmental Health Sciences and Temple University STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE Wenge Guo 1 and Sanat K. Sarkar 2 National Institute of Environmental Health Sciences and Temple University Abstract: Often in practice

More information

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Weidong Liu October 19, 2014 Abstract Large-scale multiple two-sample Student s t testing problems often arise from the

More information

Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses

Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses Amit Zeisel, Or Zuk, Eytan Domany W.I.S. June 5, 29 Amit Zeisel, Or Zuk, Eytan Domany (W.I.S.)Improving

More information

Methods for High Dimensional Inferences With Applications in Genomics

Methods for High Dimensional Inferences With Applications in Genomics University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-12-2011 Methods for High Dimensional Inferences With Applications in Genomics Jichun Xie University of Pennsylvania,

More information

Heterogeneity and False Discovery Rate Control

Heterogeneity and False Discovery Rate Control Heterogeneity and False Discovery Rate Control Joshua D Habiger Oklahoma State University jhabige@okstateedu URL: jdhabigerokstateedu August, 2014 Motivating Data: Anderson and Habiger (2012) M = 778 bacteria

More information

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control Joshua D. Habiger Oklahoma State University jhabige@okstate.edu Nov. 8, 2013 Outline 1 : Motivation and FDR Research Areas

More information

Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University)

Factor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University) Factor-Adjusted Robust Multiple Test Jianqing Fan Princeton University with Koushiki Bose, Qiang Sun, Wenxin Zhou August 11, 2017 Outline 1 Introduction 2 A principle of robustification 3 Adaptive Huber

More information

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS Journal of Biopharmaceutical Statistics, 21: 726 747, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.551333 MULTISTAGE AND MIXTURE PARALLEL

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014 Homework 4 (version 3) - posted October 3 Assigned October 2; Due 11:59PM October 9 Problem 1 (Easy) a. For the genetic regression model: Y

More information

arxiv: v2 [stat.me] 9 Aug 2018

arxiv: v2 [stat.me] 9 Aug 2018 Submitted to the Annals of Applied Statistics arxiv: arxiv:1706.09375 MULTILAYER KNOCKOFF FILTER: CONTROLLED VARIABLE SELECTION AT MULTIPLE RESOLUTIONS arxiv:1706.09375v2 stat.me 9 Aug 2018 By Eugene Katsevich,

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 11 1 / 44 Tip + Paper Tip: Two today: (1) Graduate school

More information

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE By Wenge Guo and M. Bhaskara Rao National Institute of Environmental Health Sciences and University of Cincinnati A classical approach for dealing

More information

In many areas of science, there has been a rapid increase in the

In many areas of science, there has been a rapid increase in the A general framework for multiple testing dependence Jeffrey T. Leek a and John D. Storey b,1 a Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21287; and b Lewis-Sigler

More information

Step-down FDR Procedures for Large Numbers of Hypotheses

Step-down FDR Procedures for Large Numbers of Hypotheses Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate

More information

A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables

A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,

More information

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate

Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Model-Free Knockoffs: High-Dimensional Variable Selection that Controls the False Discovery Rate Lucas Janson, Stanford Department of Statistics WADAPT Workshop, NIPS, December 2016 Collaborators: Emmanuel

More information

The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR

The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR CONTROLLING THE FALSE DISCOVERY RATE A Dissertation in Statistics by Scott Roths c 2011

More information

Confounder Adjustment in Multiple Hypothesis Testing

Confounder Adjustment in Multiple Hypothesis Testing in Multiple Hypothesis Testing Department of Statistics, Stanford University January 28, 2016 Slides are available at http://web.stanford.edu/~qyzhao/. Collaborators Jingshu Wang Trevor Hastie Art Owen

More information

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations

Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem

More information

Exam: high-dimensional data analysis January 20, 2014

Exam: high-dimensional data analysis January 20, 2014 Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

arxiv: v1 [math.st] 13 Mar 2008

arxiv: v1 [math.st] 13 Mar 2008 The Annals of Statistics 2008, Vol. 36, No. 1, 337 363 DOI: 10.1214/009053607000000550 c Institute of Mathematical Statistics, 2008 arxiv:0803.1961v1 [math.st] 13 Mar 2008 GENERALIZING SIMES TEST AND HOCHBERG

More information

Rank conditional coverage and confidence intervals in high dimensional problems

Rank conditional coverage and confidence intervals in high dimensional problems conditional coverage and confidence intervals in high dimensional problems arxiv:1702.06986v1 [stat.me] 22 Feb 2017 Jean Morrison and Noah Simon Department of Biostatistics, University of Washington, Seattle,

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017

Lecture 7: Interaction Analysis. Summer Institute in Statistical Genetics 2017 Lecture 7: Interaction Analysis Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 39 Lecture Outline Beyond main SNP effects Introduction to Concept of Statistical Interaction

More information

29 Sample Size Choice for Microarray Experiments

29 Sample Size Choice for Microarray Experiments 29 Sample Size Choice for Microarray Experiments Peter Müller, M.D. Anderson Cancer Center Christian Robert and Judith Rousseau CREST, Paris Abstract We review Bayesian sample size arguments for microarray

More information

Uncertain Inference and Artificial Intelligence

Uncertain Inference and Artificial Intelligence March 3, 2011 1 Prepared for a Purdue Machine Learning Seminar Acknowledgement Prof. A. P. Dempster for intensive collaborations on the Dempster-Shafer theory. Jianchun Zhang, Ryan Martin, Duncan Ermini

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

1 Differential Privacy and Statistical Query Learning

1 Differential Privacy and Statistical Query Learning 10-806 Foundations of Machine Learning and Data Science Lecturer: Maria-Florina Balcan Lecture 5: December 07, 015 1 Differential Privacy and Statistical Query Learning 1.1 Differential Privacy Suppose

More information

Simultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models

Simultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models Simultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models David B. Dahl Department of Statistics Texas A&M University Marina Vannucci, Michael Newton, & Qianxing Mo

More information

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE

ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE Statistica Sinica 22 (2012), 1689-1716 doi:http://dx.doi.org/10.5705/ss.2010.255 ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE Irina Ostrovnaya and Dan L. Nicolae Memorial Sloan-Kettering

More information

GENERALIZING SIMES TEST AND HOCHBERG S STEPUP PROCEDURE 1. BY SANAT K. SARKAR Temple University

GENERALIZING SIMES TEST AND HOCHBERG S STEPUP PROCEDURE 1. BY SANAT K. SARKAR Temple University The Annals of Statistics 2008, Vol. 36, No. 1, 337 363 DOI: 10.1214/009053607000000550 Institute of Mathematical Statistics, 2008 GENERALIZING SIMES TEST AND HOCHBERG S STEPUP PROCEDURE 1 BY SANAT K. SARKAR

More information

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han

Math for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han Math for Machine Learning Open Doors to Data Science and Artificial Intelligence Richard Han Copyright 05 Richard Han All rights reserved. CONTENTS PREFACE... - INTRODUCTION... LINEAR REGRESSION... 4 LINEAR

More information

More powerful control of the false discovery rate under dependence

More powerful control of the false discovery rate under dependence Statistical Methods & Applications (2006) 15: 43 73 DOI 10.1007/s10260-006-0002-z ORIGINAL ARTICLE Alessio Farcomeni More powerful control of the false discovery rate under dependence Accepted: 10 November

More information

Familywise Error Rate Controlling Procedures for Discrete Data

Familywise Error Rate Controlling Procedures for Discrete Data Familywise Error Rate Controlling Procedures for Discrete Data arxiv:1711.08147v1 [stat.me] 22 Nov 2017 Yalin Zhu Center for Mathematical Sciences, Merck & Co., Inc., West Point, PA, U.S.A. Wenge Guo Department

More information

Optimal detection of weak positive dependence between two mixture distributions

Optimal detection of weak positive dependence between two mixture distributions Optimal detection of weak positive dependence between two mixture distributions Sihai Dave Zhao Department of Statistics, University of Illinois at Urbana-Champaign and T. Tony Cai Department of Statistics,

More information

A General Framework for High-Dimensional Inference and Multiple Testing

A General Framework for High-Dimensional Inference and Multiple Testing A General Framework for High-Dimensional Inference and Multiple Testing Yang Ning Department of Statistical Science Joint work with Han Liu 1 Overview Goal: Control false scientific discoveries in high-dimensional

More information

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Multiple testing methods to control the False Discovery Rate (FDR),

More information