arxiv: v1 [stat.me] 13 Dec 2017


Local False Discovery Rate Based Methods for Multiple Testing of One-Way Classified Hypotheses

Sanat K. Sarkar, Zhigen Zhao
Department of Statistical Science, Temple University, Philadelphia, PA, 19122, USA

Sanat K. Sarkar is Professor and Zhigen Zhao is Associate Professor of the Department of Statistical Science, Temple University. Sarkar's research was supported by NSF Grants DMS and DMS. Zhao's research was supported by NSF Grant DMS and NSF Grant IIS. Email addresses: sanat@temple.edu (S.K. Sarkar), zhaozhg@temple.edu (Z. Zhao). Preprint submitted to Elsevier December 15, 2017.

Abstract

This paper continues the line of research initiated in Liu et al. (2016) on developing a novel framework for multiple testing of hypotheses grouped in a one-way classified form using hypothesis-specific local false discovery rates (Lfdr's). It is built on an extension of the standard two-class mixture model from single to multiple groups, defining the hypothesis-specific Lfdr as a function of the conditional Lfdr for the hypothesis given that it is within a significant group and the Lfdr for the group itself, and involving a new parameter that measures the grouping effect. This definition captures the underlying group structure for the hypotheses belonging to a group more effectively than the standard two-class mixture model. Two new Lfdr based methods, possessing meaningful optimalities, are produced in their oracle forms. One, designed to control false discoveries across the entire collection of hypotheses, is proposed as a powerful alternative to simply pooling all the hypotheses into a single group and using a commonly used Lfdr based method under the standard single-group two-class mixture model. The other is proposed as an Lfdr analog of the method of Benjamini & Bogomolov (2014) for selective inference. It controls an Lfdr based measure of false discoveries associated with selecting groups concurrently with controlling the average of within-group false discovery proportions across the selected groups. Numerical studies show that our proposed methods are indeed more powerful than their relevant competitors, at least in their oracle forms, in commonly occurring practical scenarios.

Keywords: False Discovery Rate, Grouped Hypotheses, Large-Scale Multiple Testing.

1. Introduction

Modern scientific studies aided by high-throughput technologies, such as those related to brain imaging, microarray analysis, astronomy, atmospheric science, drug discovery, and many others, increasingly rely on large-scale multiple testing as an integral part of statistical investigations focused on high-dimensional inference. With many of these investigations, notably genome-wide association and neuroimaging studies, giving rise to testing of hypotheses that appear in groups, the multiple testing paradigm seems to be shifting from single to multiple

groups of hypotheses. These groups, forming at single or multiple levels and thus creating one- or multiway classified hypotheses, can occur naturally due to the underlying biological or experimental process, or be created using internal or external information capturing certain specific features of the data. Several newer questions arise with this paradigm shift. However, we will focus on the following two questions, which seem relatively more relevant in light of what is available in the literature in the context of controlling an overall measure of false discoveries across the entire collection of hypotheses:

Q1. For multiple testing of hypotheses grouped into a one-way classified form, how to effectively capture the underlying group/classification structure, instead of simply pooling all the hypotheses into a single group, while controlling overall false discoveries across all individual hypotheses?

Q2. For hypotheses grouped into a one-way classified form in the context of post-selective inference, where groups are selected before testing the hypotheses in the selected groups, how to effectively capture the underlying group/classification structure to control the expected average of false discovery proportions across the selected groups?

Progress has been made toward answering Q1 (Hu et al. (2010)) and Q2 (Benjamini & Bogomolov (2014)) for one-way classified hypotheses in the framework of Benjamini-Hochberg (Benjamini & Hochberg (1995)) type false discovery rate (FDR) control. However, research addressing these questions using local false discovery rate (Lfdr) (Efron et al. (2001)) based methodologies is largely absent, except for the recent work of Liu et al. (2016), where a method has been proposed in its oracle form to answer the following question related to Q1: When making important discoveries within each group is as important as making those discoveries across all hypotheses, how to maintain a control over falsely discovered hypotheses within each group while controlling it across all hypotheses?

The fact that an Lfdr based approach, with its Bayesian/empirical Bayesian and decision theoretic foundation, can yield powerful multiple testing methods that control false discoveries while effectively capturing dependence as well as other structures of the data in single- and multiple-group settings has been demonstrated before (Sun et al. (2006); Sun & Cai (2007); Efron (2008); Ferkingstad et al. (2008); Sarkar et al. (2008); Sun & Cai (2009); Cai & Sun (2009); Hu et al. (2010); Zablocki et al. (2014); Ignatiadis et al. (2016)). However, the work of Liu et al. (2016) is fundamentally different from these works in that it takes into account the sparsity of signals both across groups and within each active group. Consequently, the effect of a group's significance in terms of its Lfdr can be explicitly factored into a significance measure of each hypothesis within that group. In those other works, such as Sun & Cai (2009) and Hu et al. (2010), on the other hand, the significance measure of each hypothesis within a group is adjusted for the group's effect through its size rather than its measure of significance.

In this article, we continue the line of research initiated in Liu et al. (2016) to answer Q1 and Q2 in an Lfdr framework. More specifically, we borrow ideas from Liu et al. (2016) in developing methodological steps to present a unified group-adjusted multiple testing framework for one-way classified hypotheses that introduces a grouping effect into overall false discoveries across all

individual hypotheses or the average of within-group false discovery proportions across selected groups. In the next section, we present the current state of knowledge closely pertinent to the present work and make remarks motivating the development of our proposed methodologies.

2. Literature Review and Motivating Remark

Suppose there are $N$ null hypotheses that appear in $m$ non-overlapping families/groups, with $H_{ij}$ being the $j$th hypothesis in the $i$th group ($i = 1,\dots,m$; $j = 1,\dots,n_i$). We refer to such a layout of hypotheses as one-way classified hypotheses. With $\theta_{ij}$ indicating the truth ($\theta_{ij} = 0$) or falsity ($\theta_{ij} = 1$) of $H_{ij}$, the Lfdr, defined by the posterior probability $P(\theta_{ij} = 0 \mid X)$, where $X = \{X_{ij}, i = 1,\dots,m; j = 1,\dots,n_i\}$, is the basic ingredient for constructing Lfdr based approaches controlling false discoveries.

The single-group case (or the case ignoring the group structure) has been considered extensively in the literature, notably by Sun & Cai (2007), Cai & Sun (2009) and He et al. (2015), who focused on constructing methods that are optimal, at least in their oracle forms. These oracle methods correspond to Bayes multiple decision rules under a single-group two-class mixture model (Efron et al. (2001); Newton et al. (2004); Storey (2002)) that minimize the marginal false non-discovery rate (mFNR), a measure of false non-discoveries closely related to the notion of false non-discovery rate (FNR) introduced in Genovese & Wasserman (2002) and Sarkar (2004), subject to controlling the marginal false discovery rate (mFDR), a measure of false discoveries closely related to the BH FDR and the positive FDR (pFDR) of Storey (2002).

Multiple-group versions of single-group Lfdr based approaches to multiple testing have started getting attention recently; among them, the following seem more relevant to our work.

Cai & Sun (2009) extended their work from single to multiple groups (one-way classified hypotheses) under the following model: With $i$ taking the value $k$ with some prior probability $\pi_k$, the pairs $(X_{ij}, \theta_{ij})$, $j = 1,\dots,n_i$, given $i = k$, are assumed to be iid random pairs with $X_{kj} \mid \theta_{kj} \sim (1-\theta_{kj})f_{k0} + \theta_{kj}f_{k1}$, for some given densities $f_{k0}$ and $f_{k1}$, and $\theta_{kj} \sim \mathrm{Bernoulli}(p_k)$. They developed a method which, in its oracle form, minimizes mFNR subject to controlling mFDR and is defined in terms of thresholding the conditional Lfdr's
$$\mathrm{CLfdr}_i(X_{ij}) = \frac{(1-p_i)f_{i0}(X_{ij})}{f_i(X_{ij})}, \qquad f_i(X_{ij}) = (1-p_i)f_{i0}(X_{ij}) + p_i f_{i1}(X_{ij}),$$
for $j = 1,\dots,n_i$, $i = 1,\dots,m$, before proposing a data-driven version of the oracle method that asymptotically maintains the original oracle properties. It should be noted that the probability $\pi_k$ relates to the size of group $k$ and provides little information about the significance of the group itself.

Ferkingstad et al. (2008) brought the grouped hypotheses setting into testing a single family of hypotheses in an attempt to empower the typical Lfdr based thresholding approach by leveraging an external covariate. They partitioned the p-values into a number of small bins (groups) according to ordered values of the covariate. With the underlying two-class mixture model defined separately for each bin depending on the corresponding value of the covariate, they defined the so-called covariate-modulated Lfdr as the posterior probability of a null hypothesis given the value of the covariate for the corresponding bin. They estimated

the covariate-modulated Lfdr in each bin using a Bayesian approach before proposing their thresholding method, which does not necessarily control an overall measure of false discoveries such as the mFDR or the posterior FDR. An extension of this work from single to multiple covariates can be seen in Zablocki et al. (2014) and Scott et al. (2015).

Very recently, Cai et al. (2016) developed a novel grouped hypotheses testing framework for two-sample multiple testing of the differences between two highly sparse mean vectors, having constructed the groups to extract sparsity information in the data by using a carefully constructed auxiliary covariate. They proposed an Lfdr based optimal multiple testing procedure controlling FDR as a powerful alternative to standard procedures based on the sample mean differences.

A sudden upsurge of research has taken place recently in selective/post-selection inference due to its importance in light of the realization by the scientific community that the lack of reproducibility of a scientist's work is often caused by his/her failure to account for selection bias. When multiple hypotheses are simultaneously tested in a selective inference setting, it gives rise to a grouped hypotheses testing framework with the tested groups being selected from a given set of groups of hypotheses. Benjamini & Bogomolov (2014) introduced the notion of the expected average of false discovery proportions across the selected groups as an appropriate error rate to control in this setting and proposed a method that controls it. Since then, a few papers have been written in this area (Peterson et al. (2016a) and Heller et al. (2017)); however, no research has been produced yet in the Lfdr framework.

Remark 2.1. When grouping of hypotheses occurs, naturally or artificially, an assumption can be made that the significance of a hypothesis is influenced by that of the group it belongs to. The Lfdr under the standard two-class mixture model, however, does not help in assessing a group's influence on the true significance of its hypotheses. This has been the main motivation behind the work of Liu et al. (2016), who considered a group-adjusted two-class mixture model that yields an explicit representation of each hypothesis-specific Lfdr as a function of its group-adjusted form and the Lfdr for the group it is associated with. It allows them to produce a method that provides a separate control over within-group false discoveries for truly significant groups in addition to having a control of false discoveries across all individual hypotheses. That paper, as mentioned in the Introduction, motivates us to proceed further with the development of newer Lfdr based multiple testing methods for one-way classified hypotheses, as described in the following section.

3. Proposed Methodologies

Let us define $H_i = \bigcap_{j=1}^{n_i} H_{ij}$ and let $H_i = 0$ (or $= 1$) mean that the $i$th group, and hence each (or at least one) of its component hypotheses, is non-significant (or significant). Let $\theta_i$ indicate the truth ($\theta_i = 0$) or falsity ($\theta_i = 1$) of $H_i$. We express each $\theta_{ij}$ as follows: $\theta_{ij} = \theta_i \cdot \theta_{j|i}$, with $\theta_{j|i}$ indicating the truth or falsity of $H_{ij}$ conditional on the status of $H_i$, i.e., $\theta_{ij} = 0$ if $\theta_i = 0$; and $\theta_{ij} = 0$ or $1$ according to whether $\theta_{j|i} = 0$ or $1$, if $\theta_i = 1$. This representation of the $\theta_{ij}$'s brings the underlying group structure of the hypotheses into their binary hidden states conditional on the binary hidden states of the groups containing them.

Let us now recall from Liu et al. (2016) the model, given here under a different name, that extends the two-class mixture model (Efron et al., 2001) from single to multiple groups under the setting of one-way classified hypotheses. The following distribution, also introduced in Liu et al. (2016) under a different name, plays an important role in this model:

Definition 3.1. [Truncated Product Bernoulli (TPBern$(\pi, n)$)]. A set of $n$ binary variables $Z_1,\dots,Z_n$ with the following joint probability distribution is said to have a TPBern$(\pi, n)$ distribution:
$$P(Z_1 = z_1,\dots,Z_n = z_n) = \frac{1}{1-(1-\pi)^n}\prod_{i=1}^{n}\left\{\pi^{z_i}(1-\pi)^{1-z_i}\right\} I\left(\sum_{i=1}^{n} z_i > 0\right) = \frac{(1-\pi)^n}{1-(1-\pi)^n}\left(\frac{\pi}{1-\pi}\right)^{\sum_{i=1}^{n} z_i} I\left(\sum_{i=1}^{n} z_i > 0\right).$$

When hypotheses belonging to a certain group/family are simultaneously tested, this distribution provides a natural adjustment of the commonly used product Bernoulli distribution for the set of binary hidden states of the hypotheses, conditional on the group/family itself being significant.

Definition 3.2. [Group-Adjusted Two-Class Mixture Model for One-Way Classified Hypotheses (One-Way GAMM)]. Let $(X_{ij}, j = 1,\dots,n_i, \theta_i, \theta_{j|i}, j = 1,\dots,n_i)$ be the set of random variables associated with the $i$th group, for $i = 1,\dots,m$. The groups are independently distributed with the following model for group $i$:
$$X_{ij} \mid \theta_i, \theta_{j|i} \overset{ind}{\sim} (1-\theta_i\theta_{j|i})f_0(x_{ij}) + \theta_i\theta_{j|i}f_1(x_{ij}), \text{ for some given densities } f_0 \text{ and } f_1,$$
$$P(\theta_{j|i} = 0 \mid \theta_i = 0) = 1, \text{ for each } j = 1,\dots,n_i;$$
$$(\theta_{1|i},\dots,\theta_{n_i|i}) \mid \theta_i = 1 \sim \mathrm{TPBern}(\pi_{2i}; n_i),$$
$$\theta_i \sim \mathrm{Bern}(\pi_1).$$

Let
$$\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i}) \equiv \mathrm{Lfdr}_{ij}(x;\pi_1,\pi_{2i}) = \Pr(\theta_{ij} = 0 \mid X = x),$$
$$\mathrm{Lfdr}_{i}(\pi_1,\pi_{2i}) \equiv \mathrm{Lfdr}_{i}(x;\pi_1,\pi_{2i}) = \Pr(\theta_{i} = 0 \mid X = x),$$
$$\mathrm{Lfdr}_{j|i}(\pi_1,\pi_{2i}) \equiv \mathrm{Lfdr}_{j|i}(x;\pi_1,\pi_{2i}) = \Pr(\theta_{j|i} = 0 \mid \theta_i = 1, X = x)$$
be the local FDRs corresponding to $H_{ij}$ (hypothesis), $H_i$ (group), and $H_{ij}$ given $H_i = 1$ (conditional), respectively, under One-Way GAMM.
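The two definitions above can be made concrete with a small sampler. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, $f_0$ is taken to be $N(0,1)$ and $f_1$ to be $N(\mu,1)$ purely for illustration, all groups share one size $n$ and one $\pi_2$, and TPBern is drawn by simple rejection (redrawing until at least one success), which is slow when $\pi_2$ is very small.

```python
import numpy as np

def sample_tpbern(pi, n, rng):
    """Draw one TPBern(pi, n) vector: n iid Bernoulli(pi) draws,
    conditioned on at least one success (rejection sampling)."""
    while True:
        z = rng.random(n) < pi
        if z.any():
            return z.astype(int)

def sample_one_way_gamm(m, n, pi1, pi2, mu, rng):
    """Draw data from a One-Way GAMM with m groups of size n.
    Assumption for illustration: f0 = N(0, 1) and f1 = N(mu, 1)."""
    theta_group = (rng.random(m) < pi1).astype(int)   # theta_i ~ Bern(pi1)
    theta_cond = np.zeros((m, n), dtype=int)          # theta_{j|i}
    for i in range(m):
        if theta_group[i] == 1:
            theta_cond[i] = sample_tpbern(pi2, n, rng)
    theta = theta_group[:, None] * theta_cond         # theta_ij = theta_i * theta_{j|i}
    x = rng.normal(loc=mu * theta, scale=1.0)         # null mean 0, signal mean mu
    return x, theta_group, theta
```

By construction, every group with $\theta_i = 1$ contains at least one false null hypothesis, which is exactly the adjustment that TPBern makes to the product Bernoulli distribution.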
It is easy to see that
$$\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i}) = 1 - [1-\mathrm{Lfdr}_{i}(\pi_1,\pi_{2i})][1-\mathrm{Lfdr}_{j|i}(\pi_1,\pi_{2i})], \tag{3.1}$$
showing how a hypothesis-specific local FDR factors into the local FDRs for the group and for the hypothesis conditional on the group's significance. Let $\mathrm{Lfdr}_{ij}(\pi_{2i}) = [(1-\pi_{2i})f_0(x_{ij})]/m_i(x_{ij})$, with $m_i(x) = (1-\pi_{2i})f_0(x) + \pi_{2i}f_1(x)$, and $\mathrm{Lfdr}_{i}(\pi_{2i}) = \prod_{j=1}^{n_i}\mathrm{Lfdr}_{ij}(\pi_{2i})$. Then, as shown in the Appendix,
$$\mathrm{Lfdr}_{j|i}(\pi_1,\pi_{2i}) \equiv \mathrm{Lfdr}_{j|i}(\pi_{2i}) = \frac{\mathrm{Lfdr}_{ij}(\pi_{2i}) - \mathrm{Lfdr}_{i}(\pi_{2i})}{1-\mathrm{Lfdr}_{i}(\pi_{2i})}, \tag{3.2}$$

and
$$\mathrm{Lfdr}_{i}(\pi_1;\pi_{2i}) \equiv \mathrm{Lfdr}_{i}(\lambda_i;\pi_{2i}) = \frac{\mathrm{Lfdr}_{i}(\pi_{2i})}{\mathrm{Lfdr}_{i}(\pi_{2i}) + \lambda_i[1-\mathrm{Lfdr}_{i}(\pi_{2i})]}, \tag{3.3}$$
where
$$\lambda_i = \frac{\pi_1}{1-\pi_1}\cdot\frac{(1-\pi_{2i})^{n_i}}{1-(1-\pi_{2i})^{n_i}}. \tag{3.4}$$
When $\lambda_i = 1$, $\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i})$ reduces to $\mathrm{Lfdr}_{ij}(\pi_{2i})$, and so One-Way GAMM with $\lambda_i = 1$ for all $i$ represents the case of no group effect. These results can be summarised in the following:

Proposition 3.1. Let $\mathrm{Lfdr}_{ij}(\pi_{2i})$ be the local FDR associated with $H_{ij}$ in group $i$ under the standard single-group two-class mixture model, with $\pi_{2i}$ being the probability of a hypothesis in the group being significant, and let $\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i})$ be the same under One-Way GAMM, which incorporates a similar two-class mixture model across the groups with $\pi_1$ as the chance of a group being significant. Then, $\mathrm{Lfdr}_{ij}(\pi_1,\pi_{2i})$ can be expressed in terms of $\mathrm{Lfdr}_{ij}(\pi_{2i})$ and $\lambda_i$ as follows by making use of (3.1)-(3.3), with $\lambda_i$ measuring an effect due to grouping for group $i$:
$$\mathrm{Lfdr}_{ij}(\lambda_i,\pi_{2i}) = \frac{\mathrm{Lfdr}_{i}(\pi_{2i}) + \lambda_i[\mathrm{Lfdr}_{ij}(\pi_{2i}) - \mathrm{Lfdr}_{i}(\pi_{2i})]}{\mathrm{Lfdr}_{i}(\pi_{2i}) + \lambda_i[1-\mathrm{Lfdr}_{i}(\pi_{2i})]}, \tag{3.5}$$
for each $i = 1,\dots,m$; $j = 1,\dots,n_i$.

Remark 3.1. The above results bring home the point that in an Lfdr based approach to testing hypotheses belonging to a group/family that itself is likely to be significant with a chance of its own, the Lfdr for the group should be separated out from that for each hypothesis before assessing the true significance of the hypothesis. More specifically, suppose that we have a single group (i.e., $m = 1$) of hypotheses to test. Then, the hypotheses should be tested by taking away from them the confounding effect of the group's significance by using $\mathrm{Lfdr}_{j|1}(\pi_{21})$ or the cumulative averages of them, depending on whether one desires to control the local FDR or the average local FDR (when controlling the posterior FDR). Of course, one should test the significance of the group using its local FDR, $\mathrm{Lfdr}_{1}(\lambda_1,\pi_{21})$, before proceeding to test the hypotheses in it at a level depending on that for $\mathrm{Lfdr}_{1}(\lambda_1,\pi_{21})$. More specifically, if one wants to control the average local FDR, say at $\alpha$, then we propose to reject the hypotheses associated with $\mathrm{Lfdr}_{(j)|1}(\pi_{21})$, $j = 1,\dots,R_1$, the first $R_1$ increasingly ordered values of $\mathrm{Lfdr}_{j|1}(\pi_{21})$, where $R_1$ is such that
$$\frac{1}{R_1}\sum_{j=1}^{R_1}\mathrm{Lfdr}_{(j)|1}(\pi_{21}) \le \frac{\alpha - \mathrm{Lfdr}_{1}(\lambda_1,\pi_{21})}{1-\mathrm{Lfdr}_{1}(\lambda_1,\pi_{21})}.$$
The $\mathrm{Lfdr}_{1}(\lambda_1,\pi_{21})$ equals 0 if the group is assumed to be significant, or it can be controlled at some pre-assigned level below $\alpha$ to check if the group is significant. Clearly, when $\lambda_1 = 1$, our proposal reduces to controlling the average local FDR for a single group of hypotheses under the standard two-class mixture model without introducing any group effect. We will extend this proposal from single to multiple groups of hypotheses in the following.
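As a numerical illustration of how the group, conditional, and hypothesis-specific local FDRs relate, the sketch below evaluates them for one group from its standard two-class Lfdr values and a given grouping effect $\lambda_i$. The function name and interface are hypothetical, and the inputs are assumed to lie strictly between 0 and 1 so that no denominator vanishes.

```python
import numpy as np

def group_adjusted_lfdr(base_lfdr, lam):
    """Group-adjusted local FDRs for a single group.

    base_lfdr : standard two-class values Lfdr_ij(pi_2i), each in (0, 1)
    lam       : grouping effect lambda_i > 0
    Returns (group Lfdr, conditional Lfdrs, hypothesis-specific Lfdrs).
    """
    base = np.asarray(base_lfdr, dtype=float)
    prod = np.prod(base)                          # product of the base Lfdrs over the group
    denom = prod + lam * (1.0 - prod)
    group = prod / denom                          # group-level Lfdr_i(lambda_i; pi_2i)
    cond = (base - prod) / (1.0 - prod)           # conditional Lfdr_{j|i}(pi_2i)
    hyp = (prod + lam * (base - prod)) / denom    # hypothesis-level Lfdr_ij(lambda_i, pi_2i)
    return group, cond, hyp
```

With `lam = 1.0` the hypothesis-level values coincide with the inputs, matching the no-group-effect case, and for any `lam` the output satisfies `hyp = 1 - (1 - group) * (1 - cond)`.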

We express $\delta_{ij} \in \{0,1\}$, the decision rule associated with $\theta_{ij}$, similarly to $\theta_{ij}$, as follows: $\delta_{ij}(X) = \delta_i(X)\cdot\delta_{j|i}(X)$, with $\delta_i(X) \in \{0,1\}$ and $\delta_{j|i}(X) \in \{0,1\}$ being the decision rules for $\theta_i$ and $\theta_{j|i}$, respectively. This provides a two-stage approach to deciding between $\theta_{ij} = 0$ and $\theta_{ij} = 1$ simultaneously for all $(i,j)$. This paper relates to the development of such two-stage approaches, but focused on controlling the posterior expected proportion of false discoveries across all hypotheses, referred to as the total posterior FDR ($\mathrm{PFDR}_T$), or the posterior expected average false discovery proportion across the selected/significant groups, referred to as the selective posterior FDR ($\mathrm{PFDR}_S$), at a given level $\alpha$. In other words, we consider determining $(\delta_i(X), \delta_{j|i}(X))$, $i = 1,\dots,m$, $j = 1,\dots,n_i$, satisfying
$$\mathrm{PFDR}_T = E\left[\frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(1-\theta_{ij})\delta_{ij}(X)}{\max\left\{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\delta_{ij}(X),1\right\}}\,\Big|\,X\right] \le \alpha, \tag{3.6}$$
or
$$\mathrm{PFDR}_S = E\left[\frac{1}{|S|}\sum_{i\in S}\frac{\sum_{j=1}^{n_i}(1-\theta_{ij})\delta_{ij}(X)}{\max\left\{\sum_{j=1}^{n_i}\delta_{ij}(X),1\right\}}\,\Big|\,X\right] \le \alpha, \tag{3.7}$$
where $S$ is the set of indices for the selected groups, with the expectations taken with respect to the $\theta_{ij}$'s conditional on $X$. For notational convenience, we will often hide the symbol $X$ in the $\delta$'s.

Using (3.1), we see that $\mathrm{PFDR}_T$ and $\mathrm{PFDR}_S$ simplify, respectively, to
$$\mathrm{PFDR}_T = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\mathrm{Lfdr}_{ij}(\lambda_i,\pi_{2i})\delta_{ij}}{\max\left(\sum_{i=1}^{m}\sum_{j=1}^{n_i}\delta_{ij},1\right)} = \frac{\sum_{i=1}^{m}\delta_i R_i\{1-[1-\mathrm{Lfdr}_{i}(\lambda_i,\pi_{2i})][1-\mathrm{PFDR}_{W_i}]\}}{\max\left(\sum_{i=1}^{m}\delta_i R_i,1\right)}, \tag{3.8}$$
and
$$\mathrm{PFDR}_S = \frac{\sum_{i=1}^{m}\delta_i\{1-[1-\mathrm{Lfdr}_{i}(\lambda_i,\pi_{2i})][1-\mathrm{PFDR}_{W_i}]\}}{\max\left(\sum_{i=1}^{m}\delta_i,1\right)}, \tag{3.9}$$
where $R_i = \delta_i\sum_{j=1}^{n_i}\delta_{j|i}$, and $\mathrm{PFDR}_{W_i} = \sum_{j=1}^{n_i}\delta_{j|i}\mathrm{Lfdr}_{j|i}(\pi_{2i})/\max(R_i,1)$ is the within-group posterior FDR for group $i$.

The above representations of $\mathrm{PFDR}_T$ and $\mathrm{PFDR}_S$ under One-Way GAMM provide a Group Adjusted TEsting (GATE) framework for one-way classified hypotheses using their local FDRs, allowing us to produce algorithms (in their oracle forms) answering each of Q1 and Q2.
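Since the decisions depend only on $X$, the conditional expectation of $1-\theta_{ij}$ in (3.6) and (3.7) is simply the hypothesis-specific Lfdr, so both error measures can be evaluated directly from Lfdr values and 0/1 decisions. The following is a minimal sketch with hypothetical function names; it follows (3.6) and (3.7) literally, so a selected group without any rejection contributes zero to the selective measure.

```python
import numpy as np

def pfdr_total(lfdr, delta):
    """Total posterior FDR as in (3.6).
    lfdr[i][j]  : hypothesis-specific Lfdr for hypothesis j of group i
    delta[i][j] : 0/1 rejection decision for the same hypothesis"""
    num = sum(float(np.sum(np.asarray(l) * np.asarray(d))) for l, d in zip(lfdr, delta))
    den = sum(int(np.sum(d)) for d in delta)
    return num / max(den, 1)

def pfdr_selective(lfdr, delta, selected):
    """Selective posterior FDR as in (3.7): the average, over the selected
    group indices, of the within-group posterior false discovery proportion."""
    if len(selected) == 0:
        return 0.0
    terms = [
        float(np.sum(np.asarray(lfdr[i]) * np.asarray(delta[i])))
        / max(int(np.sum(delta[i])), 1)
        for i in selected
    ]
    return float(np.mean(terms))
```

For example, with two groups having Lfdr values (0.1, 0.9) and (0.2, 0.4) and decisions (1, 0) and (1, 1), the total measure is $0.7/3$ while the selective measure over both groups is the average of 0.1 and 0.3.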
We commonly refer to these algorithms as One-Way GATE algorithms.

3.1. Answering Q1

Before we present an algorithm in its oracle form answering Q1, it is important to note the following theorem, which drives its development and gives it an optimality property.

Theorem 3.1. Let
$$\mathrm{PFNR}_T = E\left[\frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\theta_{ij}(1-\delta_{ij}(X))}{\max\left\{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(1-\delta_{ij}(X)),1\right\}}\,\Big|\,X\right] \tag{3.10}$$

denote the total posterior FNR ($\mathrm{PFNR}_T$) of a decision rule $\delta(X) = \{\delta_{ij}(X), i = 1,\dots,m, j = 1,\dots,n_i\}$. The $\mathrm{PFNR}_T$ of the decision rule $\delta(X)$ with $\delta_{ij}(X) = I(\mathrm{Lfdr}_{ij}(\lambda_i,\pi_{2i}) \le c)$, for $c \in (0,1)$ satisfying $\mathrm{PFDR}_T = \alpha$, is always less than or equal to that of any other $\delta_{ij}(X)$ with $\mathrm{PFDR}_T \le \alpha$.

A proof of this theorem can be seen in the Appendix.

Algorithm 1 One-Way GATE 1 (Oracle).
1: Calculate $\mathrm{Lfdr}_{ij}(\lambda_i,\pi_{2i})$, the hypothesis-specific local FDR under One-Way GAMM, from Proposition 3.1, for each $i = 1,\dots,m$; $j = 1,\dots,n_i$.
2: Pool all these $\mathrm{Lfdr}_{ij}$'s together and sort them as $\mathrm{Lfdr}_{(1)} \le \dots \le \mathrm{Lfdr}_{(N)}$.
3: Reject the hypotheses associated with $\mathrm{Lfdr}_{(k)}$, $k = 1,\dots,R$, where $R = \max\left\{l : \sum_{k=1}^{l}\mathrm{Lfdr}_{(k)} \le l\alpha\right\}$.

Theorem 3.2. The oracle One-Way GATE 1 controls $\mathrm{PFDR}_T$ at $\alpha$.

This theorem can be proved using standard arguments used for Lfdr based approaches to testing a single group of hypotheses (see, e.g., Sun & Cai (2007); Sarkar & Zhou (2008)). It is important to note that $\mathrm{PFDR}_T$ may not equal a pre-specified value of $\alpha$, and so Algorithm 1 is generally sub-optimal, although it is the closest to the rule that is optimal in the sense of Theorem 3.1.

Remark 3.2. When $\lambda_i = 1$ for all $i$, i.e., when the underlying grouping of hypotheses is ineffective in the sense that a group's own chance of being significant is no different from when it is formed by combining a set of independent hypotheses, One-Way GATE 1 reduces to the standard Lfdr based approach (like that in Sun & Cai (2007), He et al. (2015), and many others).
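The three steps of Algorithm 1 translate almost directly into code. The sketch below is illustrative only: the function name is hypothetical, and it takes the already pooled hypothesis-specific Lfdr values as input rather than computing them from data.

```python
import numpy as np

def one_way_gate1(lfdr, alpha):
    """Oracle One-Way GATE 1 step-up rule on pooled Lfdr values.
    Returns a boolean mask marking the rejected hypotheses."""
    lfdr = np.asarray(lfdr, dtype=float)
    order = np.argsort(lfdr, kind="stable")              # sort Lfdr values increasingly
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]       # all l with sum_{k<=l} Lfdr_(k) <= l * alpha
    reject = np.zeros(lfdr.size, dtype=bool)
    if passing.size > 0:
        R = passing[-1] + 1                              # largest such l
        reject[order[:R]] = True
    return reject
```

Because the running mean of increasingly sorted values is non-decreasing, the largest $l$ meeting the condition is the last index at which the running mean is still at most $\alpha$. For instance, with values (0.01, 0.5, 0.03, 0.2, 0.9) and $\alpha = 0.1$, the running means are 0.01, 0.02, 0.08, 0.185, and 0.328, so the three smallest values are rejected.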
As we will see from simulation studies in Section 4, with $\lambda_i$ increasing (or decreasing) from 1, i.e., when a group's chance of being significant gets larger (or smaller) than what it is if the group consists of independent hypotheses, the standard Lfdr based approach becomes less powerful (or fails to control the error rate).

3.2. Answering Q2

There are applications in the context of selective inference of multiple groups/families of hypotheses where discovering significant groups, and hence a control over a measure of their false discoveries, is scientifically no less meaningful than making such discoveries for individual hypotheses subject to a control over a similar measure of false discoveries across all of them. For instance, as Peterson et al. (2016b) noted, in a multiphenotype genome-wide association study, which is often focused on groups/families of all phenotype-specific hypotheses related to different genetic variants, rejecting $H_i$ corresponding to variant $i$ is considered an important discovery in the process of identifying phenotypes that are significantly associated with that variant. They borrowed ideas from Benjamini & Bogomolov (2014) and considered a hierarchical testing method that allows control of this so-called between-group FDR in the process of

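controlling the expected average of false discovery proportions across significant groups (due to Benjamini & Bogomolov (2014)).

The following algorithm in its oracle form answering Q2 offers an Lfdr based alternative to the hierarchical testing method of Peterson et al. (2016b). It allows a control over
$$\mathrm{PFDR}_B = \frac{\sum_{i=1}^{m}\delta_i\,\mathrm{Lfdr}_{i}(\lambda_i,\pi_{2i})}{\max\left(\sum_{i=1}^{m}\delta_i,1\right)},$$
an Lfdr analog of the aforementioned between-group FDR for the selected groups, while controlling $\mathrm{PFDR}_S$. The following notation is used in this algorithm: For $0 < \alpha' < 1$,
$$R_i(\alpha') = \max\left\{1 \le k \le n_i : \sum_{j=1}^{k}\mathrm{Lfdr}_{(j)|i}(\pi_{2i}) \le k\alpha'\right\},$$
with $\mathrm{Lfdr}_{(j)|i}(\pi_{2i})$, $j = 1,\dots,n_i$, being the sorted values of the $\mathrm{Lfdr}_{j|i}(\pi_{2i})$'s in group $i$.

Algorithm 2 One-Way GATE 2 (Oracle).
1: Given an $\eta \in (0,\alpha)$, select the largest subset of group indices $S$ such that $\frac{1}{|S|}\sum_{i\in S}\mathrm{Lfdr}_{i}(\lambda_i,\pi_{2i}) \le \eta$.
2: For each $i \in S$, and any given $\alpha' \le \alpha$, find $R_i(\alpha')$ to calculate
$$\mathrm{PFDR}_S(\alpha') = 1 - \frac{1}{|S|}\sum_{i\in S}\left(1-\mathrm{Lfdr}_{i}(\lambda_i,\pi_{2i})\right)\left[1-\frac{1}{R_i(\alpha')}\sum_{j=1}^{R_i(\alpha')}\mathrm{Lfdr}_{(j)|i}(\pi_{2i})\right]. \tag{3.11}$$
3: Find $\alpha^*(S) = \sup\{\alpha' : \mathrm{PFDR}_S(\alpha') \le \alpha\}$.
4: Reject the hypotheses associated with $\mathrm{PFDR}_S(\alpha^*(S))$, i.e., in each selected group $i \in S$, those with the $R_i(\alpha^*(S))$ smallest values of $\mathrm{Lfdr}_{j|i}(\pi_{2i})$.

Theorem 3.3. The oracle One-Way GATE 2 controls $\mathrm{PFDR}_S$ at $\alpha$ subject to a control over $\mathrm{PFDR}_B$ at $\eta < \alpha$.

This theorem can be proved by noting that the left-hand side of (3.11) is the $\mathrm{PFDR}_S$ of the procedure produced by Algorithm 2.

Let
$$\mathrm{PFNR}_B = E\left[\frac{\sum_{i=1}^{m}\theta_i(1-\delta_i(X))}{\max\left\{\sum_{i=1}^{m}(1-\delta_i(X)),1\right\}}\,\Big|\,X\right]$$
and
$$\mathrm{PFNR}_{W_i} = E\left[\frac{\sum_{j=1}^{n_i}\theta_{j|i}(1-\delta_{j|i}(X))}{\max\left\{\sum_{j=1}^{n_i}(1-\delta_{j|i}(X)),1\right\}}\,\Big|\,X\right]$$
denote the between-group posterior FNR and the within-group posterior FNR for group $i$, respectively, for a decision rule of the form $\delta_{ij}(X) = \delta_i(X)\delta_{j|i}(X)$, with $\delta_i(X) = I(\mathrm{Lfdr}_{i}(\lambda_i,\pi_{2i}) \le c)$ and $\delta_{j|i}(X) = I(\mathrm{Lfdr}_{j|i}(\pi_{2i}) \le c')$, for some $0 < c, c' < 1$, $i = 1,\dots,m$.

Remark 3.3. From Theorem 3.1, we have the following optimality result regarding One-Way GATE 2: Given any $0 < \eta < \alpha < 1$,

(i) the $\mathrm{PFNR}_B$ of the decision rule of the form $\delta_i(X) = I(\mathrm{Lfdr}_{i}(\lambda_i,\pi_{2i}) \le c)$ with $0 < c < 1$ satisfying $\mathrm{PFDR}_B = \eta$ is less than or equal to that of any other $\delta_i(X)$ with $\mathrm{PFDR}_B \le \eta$.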

(ii) Given $\delta_i(X)$, $i = 1,\dots,m$, with $\mathrm{PFDR}_B \le \eta$, there exists an $\alpha'(\eta) \le \alpha$, subject to $\mathrm{PFDR}_S = \alpha$, such that, for each $i$, the $\mathrm{PFNR}_{W_i}$ of the decision rule of the form $\delta_{j|i}(X) = I(\mathrm{Lfdr}_{j|i}(\pi_{2i}) \le c')$ with $0 < c' < 1$ satisfying $\mathrm{PFDR}_{W_i} = \alpha'(\eta)$ is less than or equal to that of any other decision rule in that group for which $\mathrm{PFDR}_{W_i} \le \alpha'(\eta)$.

Remark 3.4. It is important to note that One-Way GATE 2 without Step 1 can be used in situations where the focus is on controlling $\mathrm{PFDR}_S$ given a selection rule (or $S$).

4. Numerical Studies

This section presents results of numerical studies we conducted to examine the performances of One-Way GATE 1 and One-Way GATE 2 compared to their relevant competitors in their oracle forms.

4.1. One-Way GATE 1

We considered various simulation settings involving 10,000 or 100,000 hypotheses grouped into equal-sized groups to investigate the performance of One-Way GATE 1 in comparison with its three competitors, all in their oracle forms. The first competitor ignores the group structure by pooling all the hypotheses together into a single group, while the other two are the oracle methods of Sun & Cai (2009) and Hu et al. (2010). They operate under our model setting with equal group size $n$ as follows:

Oracle pooled method: The single-group Lfdr based method of Sun & Cai (2007) is applied to the $mn$ hypotheses pooled together into a single group under a two-class mixture model $X_{ij} \sim (1-p)f_0(x_{ij}) + pf_1(x_{ij})$, with $p = m^{-1}\sum_{i=1}^{m}p_i$, where $p_i = \pi_1\pi_{2i}/[1-(1-\pi_{2i})^{n_i}]$ is the marginal probability that a hypothesis in group $i$ is significant.

Oracle method of Sun & Cai (2009): The single-group Lfdr based method of Sun & Cai (2007) is applied to the $mn$ hypotheses pooled together into a single group, assuming a two-class mixture model of the same form, with a group-specific mixing proportion, for the $n$ hypotheses in group $i$, for each $i = 1,\dots,m$.

Oracle method of Hu et al. (2010): $X_{ij}$ is converted to its p-value $P_{ij}$ before a level $\alpha$ BH method is applied to the weighted p-values $P^{w}_{ij} = p(1-p_i)P_{ij}/p_i$, $i = 1,\dots,m$; $j = 1,\dots,n$, for the $mn$ hypotheses pooled together into a single group.

The simulations involved independently generated triplets of observations $(X_{ij}, \theta_i, \theta_{j|i})$, $i = 1,\dots,m$ ($= 200$ or $2000$); $j = 1,\dots,n_i$ ($= 5$ or $50$), with (i) $\theta_i \sim \mathrm{Bern}(\pi_1 = 0.3)$; (ii) the $\theta_{j|i}$'s jointly following $\mathrm{TPBern}(\pi_{2i}; n_i)$, with $\pi_{2i}$ determined from (3.4) using $\lambda = k^2/100$ for $k = 1, 2,\dots, 19$ or $20$; and (iii) $X_{ij} \mid \theta_{ij} \sim N(0,1)$ if $\theta_{ij} = 0$, and $\sim 0.3N(-2,1) + 0.7N(\mu_2,1)$ if $\theta_{ij} = 1$, where $\mu_2 = 1.5$ or $1.6$ or ... or $2.9$ or $3.0$. The oracle versions of One-Way GATE 1 and of the three competing methods were applied to the data for testing $\theta_{ij} = 0$ against $\theta_{ij} = 1$ simultaneously for all $(i,j)$ at $\alpha = 0.05$, and the simulated values of the total false discovery rate, the average number of true rejections, and the average number of total rejections were obtained for each of them based on 1000 replications.
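The step "$\pi_{2i}$ determined from (3.4)" amounts to inverting the grouping effect for a fixed $\pi_1$ and group size. The sketch below assumes the form $\lambda = \frac{\pi_1}{1-\pi_1}\cdot\frac{(1-\pi_2)^n}{1-(1-\pi_2)^n}$, which is the expression stated for $\lambda$ in Lemma A.1 of the Appendix; the function names are hypothetical.

```python
import numpy as np

def grouping_effect(pi1, pi2, n):
    """lambda as a function of (pi1, pi2, n), in the form stated in Lemma A.1."""
    q = (1.0 - pi2) ** n                  # chance that a product-Bernoulli group has no signal
    return pi1 / (1.0 - pi1) * q / (1.0 - q)

def pi2_from_lambda(lam, pi1, n):
    """Invert grouping_effect for pi2, given lambda > 0, pi1 in (0, 1), group size n."""
    odds = pi1 / (1.0 - pi1)
    q = lam / (lam + odds)                # solves lam = odds * q / (1 - q) for q
    return 1.0 - q ** (1.0 / n)

# Grid of lambda values described in the simulation setting: k^2 / 100, k = 1, ..., 20.
lambda_grid = np.array([k ** 2 / 100.0 for k in range(1, 21)])
pi2_grid = np.array([pi2_from_lambda(lam, 0.3, 5) for lam in lambda_grid])
```

Under this form, a larger $\lambda$ corresponds to a smaller $\pi_{2i}$ when $\pi_1$ and $n_i$ are held fixed, and $\lambda = 1$ is attained exactly when $\pi_1 = 1-(1-\pi_{2i})^{n_i}$.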

11 alpha= λ λ λ Figure 1: Oracle One-Way GATE 1: m = 2000,n i = 5,µ 2 = 1.5. The x-axis corresponds to λ, varying from 0.01 to 4. Figures 1-3 and 6-14 display how the four methods compare across different values of π 2i (or λ) and µ 2 as the group size changes from small to a large value. The first three of these figures are being used here to point out scenarios where One-Way GATE 1 is seen to perform better than its competitors when µ 2 = 1.5. The rest of these graphs for larger values of µ are put in Appendix to see if the comparative performance pattern among the four methods changes with increasing value of µ. Figures 1-3 show that oracle One-Way GATE 1 controls the at the desired level 0.05 well. The oracle Method also controls the at the desired level. However, it is seen to be less powerful than oracle One-Way GATE 1, as expected, with the power difference getting larger with increased group size. The superior performance of oracle One-Way GATE 1 over oracle method when λ 1 is clearly shown by these graphs. The oracle method fails to control the, with the resultant getting as large as 0.47, when λ < 1. This happens because it uses a larger value of π 2i when λ is small, inflating the by an amount relating to the value of λ. When λ is larger, it uses a smaller value of π 2i, resulting in a method which is overly conservative. The has a similar pattern. It fails to control the when λ < 1 and is overly conservative when λ > 1. This conservativeness gets more and more prominent as λ increases. When λ < 1, the method yields slightly more rejections, largely due to its inflated error rate. When λ > 1, oracle One-Way GATE 1 works way better than oracle method and oracle method. 11

12 alpha= Figure 2: Oracle One-Way GATE 1: m = 200,n i = 50,µ 2 = 1.5. The x-axis corresponds to π 2i, varying from 0.05 to As seen from Figures 6-14, oracle One-Way GATE 1 is seen to retain its improved performance over the oracle versions of, and methods for larger values of µ One-Way GATE 2 Simulation studies were conducted to compare oracle One-Way GATE 2 to its only competitor, the method (Benjamini & Bogomolov (2014)) in its oracle form that operates as follows: Oracle method using Simes combination: X ij is converted to its p-value P ij. With P i(1) P i(n) denoting the sorted p-values in group i, let P i = min 1 j n {n(1 )P i(j) /j} denote Simes combination of the p-values in group i in its oracle form, for i = 1,...,m. Let G be the set of indices of the group specific hypotheses H i rejected using the oracle level α BH method based on (1 π 1 )P i, i = 1,...,m. Reject the hypotheses corresponding to P i(j) for all i G and j R i = max{j : (1 π 1 )P i(j) j G α/mn}. The comparison was made in terms of selective, average number of total rejections, and average number of true rejections were carried out under the same setting as in One-Way GATE 1. Figures 4 and 5 present the comparison for the setting where m = 2,000, n i = 50, and π 1 = 0.10 and 0.52 respectively and = The results for other settings are reported in Figures First, it is demonstrated that both the oracle One-Way GATE 2 and oracle method control the P S well. 12

Figure 3: Oracle One-Way GATE 1: $m = 2000$, $n_i = 50$, $\mu_2 = 1.5$. The x-axis corresponds to $\pi_{2i}$, starting from 0.05.

The oracle One-Way GATE 2 is more powerful, in terms of yielding a larger number of true rejections, when $\pi_1$ is relatively small, indicating a high level of sparsity at the between-group level. When $\pi_1$ is as large as 0.8, most of the groups are selected, and there is little adjustment for selection in the oracle method of Benjamini & Bogomolov (2014), which thus yields more rejections. When the group size is large ($= 50$), the oracle One-Way GATE 2 is more powerful than the oracle method of Benjamini & Bogomolov (2014); however, the latter can lead to a larger number of rejections when the group size is small ($= 5$).

5. Concluding Remarks

The primary focus of this article has been to continue the line of research in Liu et al. (2016) to answer Q1 and Q2 for one-way classified hypotheses, providing the groundwork for our broader goal of answering these questions in the setting of two-way classified hypotheses. The two-way classified setting occurs in many applications. For instance, in time-course microarray experiments (see, e.g., Storey et al. (2005); Yuan & Kendziorski (2006); Sun & Wei (2011)), the hypotheses of interest can be laid out in a two-way classified form with gene and time-point representing the two categories of classification. In multiphenotype GWAS (Peterson et al. (2016b); Segura et al. (2012)), the families of hypotheses related to different phenotypes form one level of grouping, while the other level of grouping is formed by the families of hypotheses corresponding to different SNPs. A two-way classified structure of hypotheses also occurs in brain imaging studies (Liu et al. (2009); Stein et al. (2010); Lin et al. (2014);

14 Sel α Figure 4: Oracle One-Way GATE 2: m = 2000,n i = 50,π 1 = The x-axis corresponds to varying from to Barber & Ramdas (2015)). Now that we know the theoretical framework successfully capturing the underlying group effect and yielding powerful approaches to multiple testing in the one-way classified setting, we can proceed to extend it to produce newer and powerful Lfdr based approaches answering Q1 and Q2 in two-way classified setting. We intend to do that in our future communications. Also, we have focused in this paper on developing the GATE algorithms in their oracle forms. In practice, one can estimate the unknown quantities in these oracle methods using various estimation techniques; see, e.g. Liu et al. (2016). Additionally, we can assume hyper-priors for the parameters and use Bayesian tools to calculate the Lfdrs. We will leave these for our future research. The figures associated with our numerical studies involving the method in its oracle form seems to suggest that this method, as proposed in Benjamini & Bogomolov (2014), can potentially be improved by plugging into it an estimated proportion of active groups. This is another important direction that we will pursue in our future research. A. Appendix A.1. Proofs of (3.2) and (3.3) These results, although appeared before in Liu et al. (2016), will be proved here using different and simpler arguments. They are re-stated, without any loss of generality, for a single group with slightly different notations in the following lemma. 14

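Figure 5: Oracle One-Way GATE 2: $m = 2000$, $n_i = 50$, $\pi_1 =$

Lemma A.1. Conditionally given $\theta \sim \mathrm{Bern}(\pi_1)$, let $(X_j, \theta_j)$, $j = 1, \ldots, n$, be distributed as follows: (i) $X_1, \ldots, X_n \mid \theta_1, \ldots, \theta_n \overset{ind}{\sim} (1 - \theta\theta_j) f_0(x_j) + \theta\theta_j f_1(x_j)$, and (ii) $\theta_1, \ldots, \theta_n \sim \mathrm{TPBern}(\pi_2; n)$. Let $\mathrm{Lfdr}_j(\pi_2) \equiv \mathrm{Lfdr}(x_j; \pi_2) = (1 - \pi_2) f_0(x_j)/m(x_j)$, with $m(x) = (1 - \pi_2) f_0(x) + \pi_2 f_1(x)$, for $j = 1, \ldots, n$, and $\mathrm{Lfdr}^*(\pi_2) = \prod_{j=1}^{n} \mathrm{Lfdr}_j(\pi_2)$. Then,

$$\Pr(\theta_j = 0 \mid \theta = 1, X_1 = x_1, \ldots, X_n = x_n) = \frac{\mathrm{Lfdr}_j(\pi_2) - \mathrm{Lfdr}^*(\pi_2)}{1 - \mathrm{Lfdr}^*(\pi_2)} \tag{A.1}$$

and

$$\Pr(\theta = 0 \mid X_1 = x_1, \ldots, X_n = x_n) = \frac{\mathrm{Lfdr}^*(\pi_2)}{\mathrm{Lfdr}^*(\pi_2) + \lambda[1 - \mathrm{Lfdr}^*(\pi_2)]}, \tag{A.2}$$

where $\lambda = \dfrac{\pi_1}{1 - \pi_1} \cdot \dfrac{(1 - \pi_2)^n}{1 - (1 - \pi_2)^n}$.

Proof. First, note that

$$(X_1, \ldots, X_n) \mid \theta = 0 \;\sim\; \prod_{j=1}^{n} f_0(x_j) = \frac{\prod_{j=1}^{n} m(x_j)}{(1 - \pi_2)^n}\, \mathrm{Lfdr}^*(\pi_2), \tag{A.3}$$

and

$$\begin{aligned}
(X_1, \ldots, X_n) \mid \theta = 1 \;&\sim\; \frac{1}{1 - (1 - \pi_2)^n} \sum_{\sum_{j=1}^{n} \theta_j > 0} \prod_{j=1}^{n} \{(1 - \theta_j) f_0(x_j) + \theta_j f_1(x_j)\} \prod_{j=1}^{n} \{\pi_2^{\theta_j} (1 - \pi_2)^{1 - \theta_j}\} \\
&= \frac{1}{1 - (1 - \pi_2)^n} \left[ \prod_{j=1}^{n} m(x_j) - (1 - \pi_2)^n \prod_{j=1}^{n} f_0(x_j) \right] \\
&= \frac{\prod_{j=1}^{n} m(x_j)}{1 - (1 - \pi_2)^n}\, [1 - \mathrm{Lfdr}^*(\pi_2)],
\end{aligned} \tag{A.4}$$

from which we get

$$(X_1, \ldots, X_n) \;\sim\; \left\{ \frac{(1 - \pi_1)\,\mathrm{Lfdr}^*(\pi_2)}{(1 - \pi_2)^n} + \frac{\pi_1 [1 - \mathrm{Lfdr}^*(\pi_2)]}{1 - (1 - \pi_2)^n} \right\} \prod_{j=1}^{n} m(x_j). \tag{A.5}$$

Formula (A.2) follows upon dividing $(1 - \pi_1)$ times (A.3) by (A.5).

When $\theta_j = 0$, the conditional distribution of $X_1, \ldots, X_n$ given $\theta = 1$ can be obtained similar to that in (A.4) as follows:

$$\begin{aligned}
&\frac{(1 - \pi_2) f_0(x_j)}{1 - (1 - \pi_2)^n} \sum_{\sum_{k (\ne j)} \theta_k > 0} \; \prod_{k (\ne j) = 1}^{n} \{(1 - \theta_k) f_0(x_k) + \theta_k f_1(x_k)\} \prod_{k (\ne j) = 1}^{n} \{\pi_2^{\theta_k} (1 - \pi_2)^{1 - \theta_k}\} \\
&\quad = \frac{(1 - \pi_2) f_0(x_j)}{1 - (1 - \pi_2)^n} \left[ \prod_{k (\ne j) = 1}^{n} m(x_k) - (1 - \pi_2)^{n-1} \prod_{k (\ne j) = 1}^{n} f_0(x_k) \right] \\
&\quad = \frac{\prod_{j=1}^{n} m(x_j)}{1 - (1 - \pi_2)^n}\, [\mathrm{Lfdr}_j(\pi_2) - \mathrm{Lfdr}^*(\pi_2)].
\end{aligned} \tag{A.6}$$

Formula (A.1) then follows upon dividing (A.6) by (A.4).

Proof of Theorem 3.1. For notational simplicity, we will hide $X$ in $\delta_{ij}(X)$, $\delta^*_{ij}(X)$, $\mathrm{Lfdr}_{ij}(X)$. First, we note the following inequalities:

$$\alpha \sum_{ij} (\delta_{ij} - \delta^*_{ij}) \;\ge\; \sum_{ij} (\delta_{ij} - \delta^*_{ij})\, \mathrm{Lfdr}_{ij} \;\ge\; c \sum_{ij} (\delta_{ij} - \delta^*_{ij}), \tag{A.7}$$

the first of which follows from the fact that the $\mathrm{PFDR}_T$ of $\delta$ is less than or equal to $\alpha$, which is the $\mathrm{PFDR}_T$ of $\delta^*$, while the second one follows from $\sum_{ij} (\delta_{ij} - \delta^*_{ij})(c - \mathrm{Lfdr}_{ij}) \le 0$, because of the definition of $\delta^*_{ij}$. Since $\alpha = \sum_{ij} \delta^*_{ij} \mathrm{Lfdr}_{ij} / \max\{\sum_{ij} \delta^*_{ij}, 1\} \le c$, we have from (A.7) that $\sum_{ij} (\delta_{ij} - \delta^*_{ij})\, \mathrm{Lfdr}_{ij} \le 0$, that is,

$$\sum_{ij} (1 - \delta_{ij})\, \mathrm{Lfdr}_{ij} \;\ge\; \sum_{ij} (1 - \delta^*_{ij})\, \mathrm{Lfdr}_{ij}. \tag{A.8}$$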

With $\mathrm{PFNR}_T(\delta)$ and $\mathrm{PFNR}_T(\delta^*)$ denoting the $\mathrm{PFNR}_T$ of $\delta$ and $\delta^*$, respectively, we now note that

$$\begin{aligned}
c \left[ \frac{\mathrm{PFNR}_T(\delta)}{1 - \mathrm{PFNR}_T(\delta)} - \frac{\mathrm{PFNR}_T(\delta^*)}{1 - \mathrm{PFNR}_T(\delta^*)} \right]
&= c \left[ \frac{\sum_{ij} (1 - \delta_{ij})(1 - \mathrm{Lfdr}_{ij})}{\sum_{ij} (1 - \delta_{ij})\, \mathrm{Lfdr}_{ij}} - \frac{\sum_{ij} (1 - \delta^*_{ij})(1 - \mathrm{Lfdr}_{ij})}{\sum_{ij} (1 - \delta^*_{ij})\, \mathrm{Lfdr}_{ij}} \right] \\
&= \sum_{ij} \left[ \frac{1 - \delta_{ij}}{\sum_{i'j'} (1 - \delta_{i'j'})\, \mathrm{Lfdr}_{i'j'}} - \frac{1 - \delta^*_{ij}}{\sum_{i'j'} (1 - \delta^*_{i'j'})\, \mathrm{Lfdr}_{i'j'}} \right] [c(1 - \mathrm{Lfdr}_{ij}) - (1 - c)\, \mathrm{Lfdr}_{ij}] \\
&\ge 0,
\end{aligned}$$

with the inequality holding due to the definition of $\delta^*_{ij}$ and the inequality in (A.8). Thus, we have

$$\mathrm{PFNR}_T(\delta) \ge \mathrm{PFNR}_T(\delta^*),$$

which proves the theorem.

References

Barber, R. F., & Ramdas, A. (2015). The p-filter: multilayer false discovery rate control for grouped hypotheses. Journal of the Royal Statistical Society. Series B, 79.

Benjamini, Y., & Bogomolov, M. (2014). Selective inference on multiple families of hypotheses. Journal of the Royal Statistical Society. Series B, 76.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B, 57.

Cai, T. T., & Sun, W. (2009). Simultaneous testing of grouped hypotheses: Finding needles in multiple haystacks. Journal of the American Statistical Association, 104.

Cai, T. T., Sun, W., & Wang, W. (2016). CARS: Covariate assisted ranking and screening for large-scale two-sample inference. Technical Report.

Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23.

Efron, B., Tibshirani, R., Storey, J. D., & Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association, 96.

Ferkingstad, E., Frigessi, A., Rue, H., Thorleifsson, G., & Kong, A. (2008). Unsupervised empirical Bayesian multiple testing with external covariates. The Annals of Applied Statistics, 2.

Genovese, C., & Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society. Series B, 64.

He, L., Sarkar, S. K., & Zhao, Z. (2015). Capturing the severity of type II errors in high-dimensional multiple testing. Journal of Multivariate Analysis, 142.

Heller, R., Chatterjee, N., Krieger, A., & Shi, J. (2017). Post-selection inference following aggregate level hypothesis testing in large scale genomic data. Journal of the American Statistical Association, 113. Available online.

Hu, J. X., Zhao, H., & Zhou, H. H. (2010). False discovery rate control with groups. Journal of the American Statistical Association, 105.

Ignatiadis, N., Klaus, B., Zaugg, J. B., & Huber, W. (2016). Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods, 13.

Lin, D., Calhoun, V. D., & Wang, Y. (2014). Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. Medical Image Analysis, 18.

Liu, J., Pearlson, G., Windemuth, A., Ruano, G., Perrone-Bizzozero, N. I., & Calhoun, V. (2009). Combining fMRI and SNP data to investigate connections between brain function and genetics using parallel ICA. Human Brain Mapping, 30.

Liu, Y., Sarkar, S. K., & Zhao, Z. (2016). A new approach to multiple testing of grouped hypotheses. Journal of Statistical Planning and Inference, 179.

Newton, M. A., Noueiry, A., Sarkar, D., & Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5.

Peterson, C. B., Bogomolov, M., Benjamini, Y., & Sabatti, C. (2016a). Many phenotypes without many false discoveries: Error controlling strategies for multitrait association studies. Genetic Epidemiology, 40.

Peterson, C. B., Bogomolov, M., Benjamini, Y., & Sabatti, C. (2016b). Many phenotypes without many false discoveries: Error controlling strategies for multitrait association studies. Genetic Epidemiology, 40.

Sarkar, S. K. (2004). FDR-controlling stepwise procedures and their false negatives rates. Journal of Statistical Planning and Inference, 125.

Sarkar, S. K., & Zhou, T. (2008). Controlling Bayes directional false discovery rate in random effects model. Journal of Statistical Planning and Inference, 138.

Sarkar, S. K., Zhou, T., & Ghosh, D. (2008). A general decision theoretic formulation of procedures controlling FDR and FNR from a Bayesian perspective. Statistica Sinica, 18.

Scott, J. G., Kelly, R. C., Smith, M. A., Zhou, P., & Kass, R. E. (2015). False discovery rate regression: an application to neural synchrony detection in primary visual cortex. Journal of the American Statistical Association, 110.

Segura, V., Vilhjálmsson, B. J., Platt, A., Korte, A., Seren, Ü., Long, Q., & Nordborg, M. (2012). An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nature Genetics, 44.

Stein, J. L., Hua, X., Lee, S., Ho, A. J., Leow, A. D., Toga, A. W., Saykin, A. J., Shen, L., Foroud, T., Pankratz, N. et al. (2010). Voxelwise genome-wide association study (vGWAS). Neuroimage, 53.

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society. Series B, 64.

Storey, J. D., Xiao, W., Leek, J. T., Tompkins, R. G., & Davis, R. W. (2005). Significance analysis of time course microarray experiments. Proceedings of the National Academy of Sciences of the United States of America, 102.

Sun, L., Craiu, R. V., Paterson, A. D., & Bull, S. B. (2006). Stratified false discovery control for large-scale hypothesis testing with application to genome-wide association studies. Genetic Epidemiology, 30.

Sun, W., & Cai, T. T. (2007). Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association, 102.

Sun, W., & Cai, T. T. (2009). Large-scale multiple testing under dependence. Journal of the Royal Statistical Society. Series B, 71.

Sun, W., & Wei, Z. (2011). Multiple testing for pattern identification, with applications to microarray time-course experiments. Journal of the American Statistical Association, 106.

Yuan, M., & Kendziorski, C. (2006). Hidden Markov models for microarray time course data in multiple biological conditions. Journal of the American Statistical Association, 101.

Zablocki, R. W., Schork, A. J., Levine, R. A., Andreassen, O. A., Dale, A. M., & Thompson, W. K. (2014). Covariate-modulated local false discovery rate for genome-wide association studies. Bioinformatics, (p. btu145).
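Formulas (A.1) and (A.2) of Lemma A.1 in Appendix A.1 can be checked numerically by comparing them with a brute-force enumeration of all configurations of $(\theta_1, \ldots, \theta_n)$. The sketch below is only a sanity check under stated assumptions: it takes $f_0 = N(0, 1)$ and $f_1 = N(\mu, 1)$ as illustrative densities (the lemma does not prescribe them), and it reads $\mathrm{TPBern}(\pi_2; n)$ as a product of Bernoulli($\pi_2$) variables truncated to exclude the all-zero configuration, which is what the sum in (A.4) indicates. The function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float = 0.0) -> float:
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def closed_form(x: Sequence[float], pi1: float, pi2: float,
                mu: float) -> Tuple[float, List[float]]:
    """Right-hand sides of (A.2) and (A.1), in that order."""
    n = len(x)
    lfdr = []
    for xj in x:
        f0, f1 = normal_pdf(xj), normal_pdf(xj, mu)
        lfdr.append((1 - pi2) * f0 / ((1 - pi2) * f0 + pi2 * f1))
    lfdr_prod = math.prod(lfdr)  # product of the Lfdr_j(pi2) over the group
    q = (1 - pi2) ** n
    lam = pi1 / (1 - pi1) * q / (1 - q)
    group = lfdr_prod / (lfdr_prod + lam * (1 - lfdr_prod))      # (A.2)
    within = [(l - lfdr_prod) / (1 - lfdr_prod) for l in lfdr]   # (A.1)
    return group, within


def brute_force(x: Sequence[float], pi1: float, pi2: float,
                mu: float) -> Tuple[float, List[float]]:
    """Same posterior probabilities by summing over every theta vector."""
    n = len(x)
    q = (1 - pi2) ** n
    # theta = 0: every X_j is drawn from f0.
    joint0 = (1 - pi1) * math.prod(normal_pdf(xj) for xj in x)
    joint1 = 0.0                 # total mass with theta = 1
    joint1_null = [0.0] * n      # mass with theta = 1 and theta_j = 0
    for thetas in itertools.product((0, 1), repeat=n):
        if sum(thetas) == 0:
            continue  # the truncation removes the all-zero configuration
        prior = math.prod(pi2 if t else 1 - pi2 for t in thetas) / (1 - q)
        like = math.prod(
            normal_pdf(xj, mu) if t else normal_pdf(xj)
            for xj, t in zip(x, thetas)
        )
        weight = pi1 * prior * like
        joint1 += weight
        for j, t in enumerate(thetas):
            if t == 0:
                joint1_null[j] += weight
    group = joint0 / (joint0 + joint1)
    within = [v / joint1 for v in joint1_null]
    return group, within
```

Calling both functions with the same arguments, for instance `x = [0.3, 2.1, -0.5, 1.4]`, `pi1 = 0.3`, `pi2 = 0.2`, `mu = 2.0`, should return values that agree up to floating-point error. The enumeration costs $2^n$ terms, so it is meant for small $n$ only.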

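The optimality claim established in the proof of Theorem 3.1 can be illustrated on a toy example. The sketch assumes, as the proof uses, that the optimal rule rejects a hypothesis exactly when its Lfdr is at most a threshold $c$, and it takes $\alpha$ to be that rule's value of $\sum \delta\,\mathrm{Lfdr} / \max\{\sum \delta, 1\}$. It then enumerates every 0/1 decision vector over ten made-up Lfdr values and checks that no vector meeting the same bound attains a smaller value of $\sum (1 - \delta)(1 - \mathrm{Lfdr}) / \max\{\sum (1 - \delta), 1\}$. The Lfdr values, the threshold, and the function names are all hypothetical.

```python
import itertools
from typing import Sequence, Tuple


def pfdr(delta: Sequence[int], lfdr: Sequence[float]) -> float:
    """Posterior false discovery rate of the decision vector delta."""
    return sum(d * l for d, l in zip(delta, lfdr)) / max(sum(delta), 1)


def pfnr(delta: Sequence[int], lfdr: Sequence[float]) -> float:
    """Posterior false non-discovery rate of the decision vector delta."""
    accepted = [1 - d for d in delta]
    missed = sum(a * (1 - l) for a, l in zip(accepted, lfdr))
    return missed / max(sum(accepted), 1)


def check_optimality(lfdr: Sequence[float], c: float) -> Tuple[float, float, float]:
    """Return (alpha, PFNR of the threshold rule, smallest PFNR gap found).

    The gap is PFNR(delta) minus PFNR(threshold rule), minimised over all
    decision vectors delta whose PFDR does not exceed alpha.
    """
    star = tuple(int(l <= c) for l in lfdr)  # threshold rule at level c
    alpha = pfdr(star, lfdr)
    best = pfnr(star, lfdr)
    smallest_gap = float("inf")
    for delta in itertools.product((0, 1), repeat=len(lfdr)):
        if pfdr(delta, lfdr) <= alpha:
            smallest_gap = min(smallest_gap, pfnr(delta, lfdr) - best)
    return alpha, best, smallest_gap


# Hypothetical Lfdr values and threshold, chosen only for illustration.
TOY_LFDR = [0.02, 0.05, 0.11, 0.18, 0.27, 0.35, 0.48, 0.61, 0.77, 0.93]
TOY_C = 0.3
```

A non-negative smallest gap from `check_optimality(TOY_LFDR, TOY_C)` is consistent with the theorem; the threshold rule itself is among the enumerated vectors, so the gap cannot be positive. Exhaustive enumeration is feasible only for a handful of hypotheses.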
A.2. More simulation results

Figure 6: Oracle One-Way GATE 1: $G = 2000$, $n_i = 5$, $\mu_2 = 2$. The x-axis corresponds to $\lambda$, varying from 0.01 to 4.

Figure 7: Oracle One-Way GATE 1: $G = 2000$, $n_i = 5$, $\mu_2 = 2.5$. The x-axis corresponds to $\lambda$, varying from 0.01 to 4.

Figure 8: Oracle One-Way GATE 1: $G = 2000$, $n_i = 5$, $\mu_2 = 3$. The x-axis corresponds to $\lambda$, varying from 0.01 to 4.

Figure 9: Oracle One-Way GATE 1: $G = 200$, $n_i = 50$, $\mu_2 = 2$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to

Figure 10: Oracle One-Way GATE 1: $G = 200$, $n_i = 50$, $\mu_2 = 2.5$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to

Figure 11: Oracle One-Way GATE 1: $G = 200$, $n_i = 50$, $\mu_2 = 3$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to

Figure 12: Oracle One-Way GATE 1: $G = 2000$, $n_i = 50$, $\mu_2 = 2$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to

Figure 13: Oracle One-Way GATE 1: $G = 2000$, $n_i = 50$, $\mu_2 = 2.5$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to

Figure 14: Oracle One-Way GATE 1: $G = 2000$, $n_i = 50$, $\mu_2 = 3$. The x-axis corresponds to $\pi_{2i}$, varying from 0.05 to

Figure 15: Oracle One-Way GATE 2: $G = 2000$, $n_i = 5$, $\pi_1 =$

Figure 16: Oracle One-Way GATE 2: $G = 2000$, $n_i = 5$, $\pi_1 =$

Figure 17: Oracle One-Way GATE 2: $G = 2000$, $n_i = 5$, $\pi_1 =$

Figure 18: Oracle One-Way GATE 2: $G = 200$, $n_i = 50$, $\pi_1 =$

Figure 19: Oracle One-Way GATE 2: $G = 200$, $n_i = 50$, $\pi_1 = 0.5$.

Figure 20: Oracle One-Way GATE 2: $G = 200$, $n_i = 50$, $\pi_1 = 0.8$.

Figure 21: Oracle One-Way GATE 2: $G = 2000$, $n_i = 50$, $\pi_1 =$

Figure 22: Oracle One-Way GATE 2: $G = 2000$, $n_i = 50$, $\pi_1 = 0.5$.

Figure 23: Oracle One-Way GATE 2: $G = 2000$, $n_i = 50$, $\pi_1 =$
