Applying the Benjamini Hochberg procedure to a set of generalized p-values

Similar documents
ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao

On adaptive procedures controlling the familywise error rate

STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE. National Institute of Environmental Health Sciences and Temple University

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo

Chapter 1. Stepdown Procedures Controlling A Generalized False Discovery Rate

Modified Simes Critical Values Under Positive Dependence

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

Two-stage stepup procedures controlling FDR

On Methods Controlling the False Discovery Rate 1

Sanat Sarkar Department of Statistics, Temple University Philadelphia, PA 19122, U.S.A. September 11, Abstract

arxiv: v1 [math.st] 31 Mar 2009

FALSE DISCOVERY AND FALSE NONDISCOVERY RATES IN SINGLE-STEP MULTIPLE TESTING PROCEDURES 1. BY SANAT K. SARKAR Temple University

Step-down FDR Procedures for Large Numbers of Hypotheses

ON TWO RESULTS IN MULTIPLE TESTING

Control of Directional Errors in Fixed Sequence Multiple Testing

Hochberg Multiple Test Procedure Under Negative Dependence

STEPUP PROCEDURES FOR CONTROL OF GENERALIZATIONS OF THE FAMILYWISE ERROR RATE

IMPROVING TWO RESULTS IN MULTIPLE TESTING

Procedures controlling generalized false discovery rate

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1

GENERALIZING SIMES TEST AND HOCHBERG S STEPUP PROCEDURE 1. BY SANAT K. SARKAR Temple University

Resampling-Based Control of the FDR

Rejoinder on: Control of the false discovery rate under dependence using the bootstrap and subsampling

A GENERAL DECISION THEORETIC FORMULATION OF PROCEDURES CONTROLLING FDR AND FNR FROM A BAYESIAN PERSPECTIVE

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

arxiv: v1 [math.st] 13 Mar 2008

Familywise Error Rate Controlling Procedures for Discrete Data

A class of improved hybrid Hochberg Hommel type step-up multiple test procedures

Control of Generalized Error Rates in Multiple Testing

A Mixture Gatekeeping Procedure Based on the Hommel Test for Clinical Trial Applications

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University

False discovery rate control for non-positively regression dependent test statistics

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

On weighted Hochberg procedures

Closure properties of classes of multiple testing procedures

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors

Statistica Sinica Preprint No: SS R1

Looking at the Other Side of Bonferroni

Heterogeneity and False Discovery Rate Control

The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR

Two simple sufficient conditions for FDR control

Statistical Applications in Genetics and Molecular Biology

On Generalized Fixed Sequence Procedures for Controlling the FWER

Specific Differences. Lukas Meier, Seminar für Statistik

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES

Compatible simultaneous lower confidence bounds for the Holm procedure and other Bonferroni based closed tests

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons:

GOTEBORG UNIVERSITY. Department of Statistics

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007

Multiple Testing of General Contrasts: Truncated Closure and the Extended Shaffer-Royen Method

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

Lecture 7 April 16, 2018

Biometrika Trust. Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika.

The International Journal of Biostatistics

Stat 206: Estimation and testing for a mean vector,

EMPIRICAL BAYES METHODS FOR ESTIMATION AND CONFIDENCE INTERVALS IN HIGH-DIMENSIONAL PROBLEMS

High-throughput Testing

Non-specific filtering and control of false positives

Multiple comparisons of slopes of regression lines. Jolanta Wojnar, Wojciech Zieliński

Testing Statistical Hypotheses

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method

Multistage Tests of Multiple Hypotheses

arxiv:math/ v1 [math.st] 29 Dec 2006 Jianqing Fan Peter Hall Qiwei Yao

Multiple Testing. Anjana Grandhi. BARDS, Merck Research Laboratories. Rahway, NJ Wenge Guo. Department of Mathematical Sciences

Exact and Approximate Stepdown Methods For Multiple Hypothesis Testing

Adaptive, graph based multiple testing procedures and a uniform improvement of Bonferroni type tests.

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs

False Discovery Rate

Lecture 12 November 3

Week 5 Video 1 Relationship Mining Correlation Mining

New Procedures for False Discovery Control

MULTIPLE TESTING PROCEDURES AND SIMULTANEOUS INTERVAL ESTIMATES WITH THE INTERVAL PROPERTY

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are,

SOME STEP-DOWN PROCEDURES CONTROLLING THE FALSE DISCOVERY RATE UNDER DEPENDENCE

More powerful control of the false discovery rate under dependence

FDR and ROC: Similarities, Assumptions, and Decisions

Testing hypotheses in order

Linear Combinations. Comparison of treatment means. Bruce A Craig. Department of Statistics Purdue University. STAT 514 Topic 6 1

Multiple hypothesis testing using the excess discovery count and alpha-investing rules

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

A note on tree gatekeeping procedures in clinical trials

Adaptive FDR control under independence and dependence

Journal of Statistical Software

Lecture 6 April

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Web-based Supplementary Materials for. of the Null p-values

Controlling the False Discovery Rate in Two-Stage. Combination Tests for Multiple Endpoints

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation

Testing Statistical Hypotheses

Comments on: Control of the false discovery rate under dependence using the bootstrap and subsampling

arxiv: v1 [math.st] 14 Nov 2012

A Simple, Graphical Procedure for Comparing Multiple Treatment Effects

Lecture 13: p-values and union intersection tests

STAT 5200 Handout #7a Contrasts & Post hoc Means Comparisons (Ch. 4-5)

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

Model Identification for Wireless Propagation with Control of the False Discovery Rate

arxiv: v3 [math.st] 15 Jul 2018

Transcription:

U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University

Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson, Uppsala University Abstract Let δ,..., δ n be decision functions with associated loss functions L,..., L n, with respect to some statistical model {P θ : θ Θ}. We assume non-negative random variables ˆR,..., ˆR n referred to as generalized p-values), satisfying E θ Lk δ k, θ)i{ ˆR k α} ) α, for α > 0, θ Θ and k =,..., n. It is shown, assuming independence between the pairs δ, ˆR ),..., δ, ˆR ), that the Benjamini Hochberg procedure [], determining the number of confirmed decisions through N = max{i : ˆRi) iα/n}, with respect to an ordering ˆR ) ˆR n), controls the expected average loss among confirmed decisions at level α. We also verify, without any assumption on dependence, that the expected average loss among confirmed decisions is bounded by α n k= /k, thus extending the results of Benjamini and Yekutieli [5] and Hommel [3]. The generalized perspective on p-values is motivated with reference to the treatment of directional conclusions in [4] and [5]. Finally, we demonstrate in a simulation study that several related procedures e.g. the Holm procedure [2]) can not be extended in a similar manner. Introduction The well-known Benjamini Hochberg procedure [] also known as Simes procedure [2]) assumes p-values ˆp,..., ˆp n and a significance level α. The procedure then rejects hypotheses corresponding to the N smallest values in an ordering ˆp ) ˆp n), with 0 N n determined by N = max{i : ˆp i) iα/n}. Simes proved [2], under the assumption that ˆp,..., ˆp n are independent random variables, that the probability of at least one rejection is bounded by α in the intersection of the null hypotheses. Simes result was later extended by Benjamini and Hochberg [], proving that the procedure generally under the assumption of independence controls the false discovery rate FDR, the expected proportion of false rejections among the total number of rejections) at level α. We consider a generalization of the logic of p-values in Section 2 below. Thus, starting with a loss function L corresponding to some decision function δ), we assume AMS 2000 subject classifications. Primary 62H5, 62F03; Secondary 62G0. Keywords and phrases. Multiple testing; false discovery rate; Simes procedure; significance testing; decision theory. Date. November 5, 20

that its expected loss can be controlled at level α through an indicator random variable I, in the sense that E θ Lδ, θ)i) α, for all θ Θ. A loss function L = I{θ Θ 0 } here corresponds to level-α testing of Θ 0, with a confirmation I = referred to as a rejection of Θ 0. Following this path of generalization, we refer to a non-negative random variable ˆR as a generalized p-value, with respect to L and δ, in case the following condition is fulfilled E θ Lδ, θ)i{ ˆR α} ) α, for all α > 0 and θ Θ. In the setting of loss functions L..., L n and confirmative indicators I..., I n, note that the false discovery proportion the proportion of false rejections among the total number of rejections), naturally generalizes into the following average loss among confirmed decisions, L = L I + + L n I n ) / I + + I n ), ) interpreted as zero if I = = I n = 0, that is, if no decisions are confirmed). We show Section 3) that the Benjamini Hochberg procedure applied to a set of generalized p-values ˆR..., ˆR n controls the expected average loss among confirmed decisions, in the sense that E θ L) αm/n, for all θ Θ, assuming pairwise independence for δ, ˆR ),..., δ n, ˆR n ), among which n m decisions are free of risk. We were motivated to investigate the generalized setting in view of the proposal by Jones and Tukey in [4] see also [5]). These papers consider a slightly different perspective compared to the conventional one-sided and two-sided significance testing. More precisely, with three decision procedures for directional conclusions, control was imposed on reversal rates instead of probabilities of false rejections. A connection between the present context of generalized p-values and three decision procedures for directional conclusions is presented in Section 2.. Interestingly, many procedures in the tradition of the Benjamini Hochberg procedure do not have similar extensions. For instance, with L..., L n being zero-one loss functions reporting the errors of the decision functions δ..., δ n ), one could guess that the well-known Holm procedure [2] applied to ˆR..., ˆR n would control the probability of at least one error among confirmed decisions so-called FWER, familywise error rate). This is however not the case, as demonstrated in Section 4. Nevertheless, the fundamental Bonferroni argument is valid in this context cf. Section 2.2). We also consider more recent adaptive FDR-controlling procedures in Section 4, showing by numerical results that their control of expected average loss among confirmed decisions does not extend to the generalized setting.. General dependence Several authors have considered the conservativeness of the Benjamini Hochberg procedure with respect to control of FDR in cases of dependent p-values ˆp,..., ˆp n. Simes 2

[2] included a simulation study concerning correlated test statistics and the intersection of the null hypotheses where the notions of FDR and FWER coincide). Later, Samuel-Cahn [8] and Hochberg and Rom [] gave examples with negatively dependent test statistics where conservativeness is violated albeit only to a small extent) in the intersection of the null hypotheses. Hommel [3] proved work which was later extended by Falk [6]) that the probability of a false rejection is bounded by { α = min, α k }. 2) Moreover, examples demonstrating the sharpness of the bound were given. Sarkar [9] proved that Simes original result on conservativeness for independent test statistics can be extended to a class of positively dependent test statistics. Regarding the complement of the intersection of the null hypotheses, Benjamini and Yekutieli [5] generalized Hommel s bound 2), proving that FDR α m k, 3) n k= k= with m referring to the number of true null hypotheses. Also, conservativeness was extended in [5] to a class of positively dependent test statistics with generalizations by Sarkar [20] and Finner, Dickhaus and Roters [8, Section 4]). With an asymptotic perspective regarding the number of null hypotheses), Finner, Dickhaus and Roters [7] investigated, theoretically and numerically, the impact of dependence on FDR for some commonly encountered test statistics. In the following Section 3) we generalize the bounds of Hommel 2) and Benjamini Yekutieli 3) by showing that, without any assumption of dependence, E θ L) α m n k, k= with L referring to the average loss among confirmed decisions as in )), and n m referring to the number of risk-free decisions. 2 Risk-controlled decisions Following the basic set-up in statistical decision theory, consider a random element X and a set of probability distributions {P θ : θ Θ}, where each element is a possible law of X cf. for instance [6, Section.] or [7, Chapter ]). Also, consider a nonnegative loss function L with respect to some decision function δ and the parameter space Θ cf. [7, Chapter 3]). In this setting, assume that an indicator random variable I a zero-one function of X) is given, with the purpose of confirming δ at maximal risk α. Thus, δ is confirmed whenever I =. Moreover, it is assumed that the following condition is fulfilled: E θ Lδ, θ)i) α, for all θ Θ. 4) Note that condition 4) generalizes the concept of a statistical test. Indeed, given some null hypothesis Θ 0 Θ, and the following zero-one loss function, L = I{θ Θ 0 }, 5) 3

condition 4) characterizes that the indicator I serves as a level-α test of Θ 0 with I = referred to as a rejection of Θ 0 ). Thus, condition 4) generalizes statistical testing in two senses: First, losses are allowed to assume other values than 0 and. Second, losses are allowed to depend on some decision function δ, i.e. on the random data X. Next, generalizing the treatment of p-values in [6, Section 3.3], assume that some family {Iα)} 0 α< of indicator random variables is given. Let the family be stochastically non decreasing, in the sense that Iα) = implies Iα ) = for α < α, and assume that E θ Lδ, θ)iα)) α, for all θ Θ and 0 α <. 6) Then, by defining a random variable ˆR = inf{α : Iα) = }, condition 6) transfers into the following condition cf. [6, Lemma 3.3.]): E θ LI{ ˆR α} ) α, for all θ Θ and 0 α <. 7) For brevity, we refer to ˆR in 7) as a generalized p-value with respect to L and δ. 2. Directional conclusions Following the treatment of three-decision procedures for directional conclusions in [4] and [5], let a partitioning {Θ, Θ 2 } of the parameter space Θ be given. For instance, Θ and Θ 2 may refer to the two possible directions of a treatment effect. Then, being interested in rational conclusions of whether θ Θ or θ Θ 2 holds, consider decision functions δ into the binary decision space {, 2}. A natural loss function distinguishing between true and false conclusions) is then given by Lδ, θ) = I{θ Θ }I{δ = 2} + I{θ Θ 2 }I{δ = }. 8) Introducing an indicator random varible I fulfilling condition 4) in this context, we obtain a three-decision procedure by replacing δ with an indefinite decision in case I = 0. Moreover, the reversal rate is then controlled at level α in the terminology of [4]). In fact, there is a close correspondence between three-decision procedures and non-colliding simultaneous tests of Θ and Θ 2 in this context cf. [5]). Thus, if ˆp is a p-value for testing Θ and ˆp 2 is a p-value for testing Θ 2, consider the directional decision function δ = 2 I{ˆp ˆp 2 } + I{ˆp > ˆp 2 }. In other words, δ concludes θ Θ if it appears to be more profitable to reject Θ 2, and vice versa. Moreover, consider ˆR = min {ˆp, ˆp 2 }. Then, with L given by 8), E θ LI{ ˆR α} ) = E θ I{ ˆR α} I{θ Θ }I{ˆp ˆp 2 } + I{θ Θ 2 }I{ˆp > ˆp 2 )) = I{θ Θ }P θ ˆp α ) + I{θ Θ 2 }P θ ˆp2 α ) α, 4

which proves that ˆR is an generalized p-value with respect to L and δ. As a final remark in this context, comparing the two loss functions in 5) and 8), note that both are zero-one loss functions. However, L in 8) depends on given data through the directional decision function δ). 2.2 Zero-one losses and the Bonferroni argument Now, let loss functions L,..., L n be given, with respect to one and the same statistical model {P θ : θ Θ}. Moreover, assume that corresponding generalized p-values ˆR,..., ˆR n are given. Then, applying the indicator random variables I k = I{ ˆR k α} simultaneously in order to confirm subsets of δ,..., δ n, with respect to some level of significance α, may lead to some undesirable consequences. For instance, assuming that L,..., L n are zero-one loss functions reporting the errors of δ,..., δ n respectively), with θ given such that each pair L k δ k, θ), ˆR k ) is independent of any other pair L k δ k, θ), ˆR k ), and α given such that the controlled risks satisfy E θ L I ) = = E θ L n I n ) = α, then, the probability of no error among confirmed decisions is given by, n { P θ min Lk, I k ) = 0 }) n = Eθ L k I k ) ) = α) n, k= k= which is considerably smaller than α if n is large. As an alternative, consider replacing I k by the more restrictive Bonferroniindicators I k = I{ ˆR k α/n}. Then, once more assuming zero-one loss functions L,..., L n, the probabilities of at least one error among confirmed decisions satisify the following bound: n { P θ Lk I k = }) E θ L k I k) n α/n = α. 9) k= k= 2.3 Maximal and average loss among confirmed decisions Note that condition 9) refers to the fact that the expected maximal loss among confirmed decisions is controlled at level α. In other words, E θ L) α for all θ Θ, with respect to L = max {L I,..., L n I n }. 0) Here, L may be considered as a combined measure of loss among confirmed decisions. An alternative measure is given by the following average loss among confirmed decisions, L = L I + + L n I n ) / I + + I n ), ) with L interpreted as zero if no decisions are confirmed I = = I n = 0). In multiple testing, obtaining E θ L) α with respect to 0) is referred to as controlling the FWER family-wise error rate, cf. [6, p. 349]) at level α. Similarly, obtaining E θ L) α with respect to ) is referred to as controlling the FDR false discovery rate, cf. []) at level α. 5

2.4 The Benjamini Hochberg procedure Consider non-random, zero-one loss functions L,..., L n and corresponding p-values ˆR,..., ˆR n, with respect to some statistical model {P θ : θ Θ}. Let ˆR ),..., ˆR n) refer to ˆR,..., ˆR n in a given ordering ˆR ) ˆR n). Benjamini and Hochberg proved [], assuming ˆR,..., ˆR n to be independent random variables and a level α, that the following indicator random variables with ˆR n+) := for notational convenience), I k = i= I { n ˆR k αi } n+ j=i+ I { n ˆR j) > αj } I { n ˆR i) αi }, k =,..., n, 2) controls the expected average loss among confirmed decisions at level α. Note that Definition 2) is usually interpreted in a stepwise manner. Thus, starting at the bottom of the hierarchy of ordered p-values, ˆR n) is first compared with α. If smaller than α, all decisions are confirmed. Otherwise, n ˆR n ) is compared with αn ), confirming all decisions corresponding to ˆR ),..., ˆR n ) in case αn ) exceeds n ˆR n ). Otherwise, n ˆR n 2) is compared with n 2)α, etcetera. Thus, the procedure confirms a random number N, 0 N n, of the given decisions, corresponding to the N smallest among ˆR,..., ˆR n, with N determined by N = max{i : ˆR i) iα/n}, setting ˆR 0) := 0 for notational convenience). Applying the definition ) of L as the average loss among confirmed decisions to the indicators 2), one obtains L = = n i I{N = i} L k I { n ˆR k αi } i= k= i= k= i L ki { n ˆR k αi } n+ j=i+ I { n ˆR j) > αj } I { n ˆR i) αi }. The fact that E θ L) α can thus be expressed as k= i= i E θ L k I { n ˆR k αi } n+ I { n ˆR j) > αj } I { n ˆR i) αi }) α. 3) j=i+ 3 Main results Theorem 3. generalizes the validity of the bound 3) to the setting of non-negative loss functions L,..., L n and corresponding generalized p-values ˆR,..., ˆR n. Part i) generalize the original result []. Here, the pairs δ, ˆR ),..., δ n, ˆR n ) are assumed to be independent. However, nothing is assumed about the dependence between δ k and ˆR k within a given pair δ k, ˆR k ). Part ii) generalizes Theorem.3 in [5]. In this case, an increased bound on the expected average loss among confirmed decisions is given, valid without any assumption on dependence. 6

Theorem 3.. Let δ,..., δ n be decision functions, with associated loss functions L,..., L n on some statistical model {P θ : θ Θ}. Moreover, let non-negative random variables ˆR,..., ˆR n and θ Θ be given such that, for an integer m, 0 m n, relations 4)-5) hold, for any β α, E θ Lk I{ ˆR k β} ) β, for k m, 4) E θ Lk I{ ˆR k β} ) = 0, for m + k n. 5) For a given α > 0, consider the corresponding level-α Benjamini Hochberg procedure, and let L refer to the average loss among confirmed decisions. i) Assume that the pairs δ, ˆR ),..., δ n, ˆR n ) are independent. Then E θ L) αm/n. 6) Moreover, if equality holds in 4), for all k m and all β α, then equality also holds in 6). ii) Without any additional asssumptions, E θ L) α m n i= i. Proof. To begin with, recall from 2)-3) that E θ L) is given by with the convention ˆR n+) = ): E θ L) = k= i= i E θ L k I { n ˆR k αi } n+ I { n ˆR j) > αj } I { n ˆR i) αi }). j=i+ Here, applying assumption 5), it follows that E θ L) = m k= i= i E θ L k I { n ˆR k αi } n+ I { n ˆR j) > αj } I { n ˆR i) αi }). 7) j=i+ For each integer k, k m, let X 2,k X n,k denote the ordered random variables ˆR ),..., ˆR n) when ˆR k is removed. Moreover, set X,k = 0 and X n+,k =, for convenience. Thus, rewriting 7) we obtain E θ L) = m k= i= i E θ L k I { n ˆR k αi } n+ I { nx j,k > αj } I { nx i,k αi }). 8) j=i+ Regarding part i), note that X 2,k X n,k are independent of the two random variables L k δ k, θ) and ˆR k. Hence, 8) simplifies to E θ L) = m k= i= i E θ Lk I { n ˆR k αi }) n+ E I { nx j,k > αj } I { nx i,k αi }). 9) j=i+ 7

Here, note that, for any k m, n+ i= j=i+ I { nx j,k > αj } I { nx i,k αi } =, 20) since 0 = X,k X 2,k X n+,k =. Thus, applying assumption 4) in 9), it follows from 20) that E θ L) αm/n, with equality in case of equality in 4), which completes the proof of part i). Regarding part ii), we note in view of 20) that, for any k m, i= max i n i I{ n ˆR k αi } n+ j=i+ { i I{ n ˆR k αi }} n I { nx j,k > αj } I { nx i,k αi } n+ i= j=i+ I { nx j,k > αj } I { nx i,k αi } We also note that { = max i n i I{ n ˆR k αi }}. 2) { max i n i I{ n ˆR k αi }} = i= Applying 2)-22) in 8), it follows that E θ L) m k= i I{ αi ) < n ˆR k αi } = n I{ ˆRk α } n + i )I { n i + ˆR k αi } i= = n I{ ˆRk α } n + ii + ) I{ n ˆR k αi }. 22) i= E θ L k n I{ ˆRk α } n + ii + ) I{ n ˆR k αi }). 23) i= Finally, applying assumption 4) to 23), we deduce that E θ L) m α n + α n k= proving part ii) of the theorem. 4 A simulation study n i= ) = α m i + n As a simple statistical model for evaluation of multiple testing procedures, we now consider so-called Dirac-uniform configurations referred to as U in Table ). More precisely, let 0 m n be given, with m determining the number of true null hypotheses. Then, let ˆp,..., ˆp m refer to i.i.d. random variables uniformly distributed i= i, 8

on the interval [0, ] and assume that remaining p-values are constantly equal to zero, ˆp m+ = = ˆp n = 0 With the generalized perspective cf. Section 2) we also consider a related model U½. Similarly, with 0 m n given, let ˆR,..., ˆR m be i.i.d. random variables uniformly distributed on the interval [0, /2], with associated decisions δ,..., δ m i.i.d. uniformly distributed on the binary set {0, }. The remaining generalized p- values and decision functions are assumed to be constantly equal to zero, ˆRm+ = = ˆR n = 0 = δ m+ = = δ n. All decisions are evaluated with respect to a loss function Lδ) = I{δ = }. As a motivation of the U½-model in this situation, consider the treatment of directional conclusions in Section 2.. A simple example in that context is given by the statistical model {Nµ, ) : µ R} of unit variance, univariate normal distributions, divided into H : µ 0 and H 2 : µ > 0. Let the naive directional decision function be given by δ = 2 I{X > 0} + I{X 0}. Then, ˆR = Φ X ) serves as a generalized p-value with respect to δ and the directional loss function, L = I{µ 0}I{X > 0} + I{µ > 0}I{X 0}. Note that ˆR is uniformly distributed on [0, /2], and that δ is uniformly distributed on {0, }, with ˆR and δ independent, in case µ = 0. Thus, the U½-model corresponds to directional conclusions with µ = = µ m = 0, µ m+ = = µ n =, and independent observations. Table reports on estimated FDR and FWER regarding six multiple testing procedures applied to ˆR,..., ˆR n in the above models U and U½. All procedures are performed at level α = 0.05, with respect to n = 00 and m = 2, 5, 20, 99. Moreover, all procedures are known to control the FDR/FWER respectively) at level α = 0.05 for U. m = 2 m = 5 m = 20 m = 99 U U½ U U½ U U½ U U½ BH 0.00 0.00 0.0025 0.0025 0.00 0.00 0.050 0.050 HM 0.049 0.05 0.049 0.050 0.049 0.049 0.049 0.049 HG 0.050 0.053 0.049 0.050 0.049 0.049 0.049 0.049 BL 0.020 0.00 0.050 0.025 0.004 0.004 0.026 0.025 GBS 0.06 0.00 0.034 0.025 0.050 0.083 0.050 0.049 S½ 0.00 0.00 0.025 0.025 0.050 0.00 0.050 0.495 Table : Estimated FDR and FWER for the two models U and U½ for n = 00 and selected values of m. BH refers to the Benjamini Hochberg FDR-procedure, HM to the Holm FWER-procedure [2], HG to the Hochberg FWER-procedure [0], BL to the Benjamini Liu FDR-procedure [4], GBS to the Gavrilov Benjamini Sarkar FDR-procedure [9], and S½ to the Storey FDR-procedure [23] with λ = /2. Note to begin with that the error rate of the Benjamini Hochberg procedure is constant when comparing the two models U and U½ with m fixed. Indeed, the 9

expected proportion of false conclusions is given by αm/n, as explained by Theorem 3. i) above. Next, the probability of a false conclusion with the Holm procedure [2] applied to the U½-model with m = 2 is slightly larger than 5%. Indeed, an error is committed with probability /2 in case exactly one of δ and δ 2 is confirmed, which occurs in case ˆR α/2, ˆR 2 > α or ˆR2 α/2, ˆR > α. Thus, the probability of confirming exactly one of δ and δ 2 equals 2α 2α). Moreover, an error is committed with probability 3/4 in case both δ and δ 2 are confirmed, that is, whenever ˆR α/2, ˆR 2 α or ˆR2 α/2, ˆR α. Here, the probability of confirming both δ and δ 2 equals 2 α 2α α 2 = 3α 2. Thus, the total probability of an error is given by In the case of α = 0.05, 2α 2α)/2 + 9α 2 /4 = α + α 2 /4 > α. α + α 2 /4 0.0506 0.05, as indicated by Table. As a consequence of the behaviour of the Holm procedure, the Hochberg FWERprocedure [0] applied to the U½-model with m = 2 also yields an error larger than 5%, since it always confirms more decisions than the Holm procedure. Indeed, the procedure HG is the step-up version of the step-down procedure HM. The step-down FDR-controlling procedure of Benjamini Liu [4] is known to be more powerful compared to the Benjamini-Hochberg procedure when m is small cf. [4]). This is in agreement with Table, where it can be seen that BL produces larger false discovery rates for m = 2 and 5 compared to BH. Also, it appears from Table that BL is conservative with respect to the model U½. In fact, one may verify that the false discovery rates of BL applied to U½ is approximately bounded by 0.025, for any 0 m 00 in this case. Finally, the two alternative FDR-procedures which we here consider cf. [8], [9] for GBS, and [22], [23] for S½) sometimes produce false discovery rates much larger than 5% when applied to the U½-model. This seems to be characteristic for adaptive FDR-controlling procedures cf. also [3] and [2]). Indeed, the notion of adaptiveness often refers to the strategy of incorporating an estimated value of the number of true null hypotheses into an existing step-wise procedure cf. e.g. [3, Section 3]). This strategy may thus be closely related to the assumption of data-invariant, zero-one loss functions L,..., L n. References [] Benjamini, Y. and Hochberg, Y. 995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57, 289-300. 0

[2] Benjamini, Y. and Hochberg, Y. 2000). On the adaptive control of the false discovery rate in multiple testing. J. Behav. Educ. Statist. 25, 60-83. [3] Benjamini, Y.; Krieger, A.M. and Yekutieli, D. 2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93, 49-507. [4] Benjamini, Y and Liu, W. 999). A step-down multiple hypothesis testing procedure that controls the false discovery rate under independence. J. Statist. Plann. Inference 82, 63-70. [5] Benjamini, Y. and Yekutieli, D. 200). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29, 65-88. [6] Falk, R.W. 989). Hommel s Bonferroni-type inequality for unequally spaced levels. Biometrika 76, 89-9. [7] Finner, H.; Dickhaus, T. and Roters, M. 2007). Dependency and false discovery rate: asymptotics. Ann. Statist. 35, 432-455. [8] Finner, H.; Dickhaus, T. and Roters, M. 2009). On the false discovery rate and an asymptotically optimal rejection curve. Ann. Statist. 37, 596-68. [9] Gavrilov, Y.; Benjamini, Y. and Sarkar, S.K. 2009). An adaptive stepdown procedure with proven FDR control under independence. Ann. Statist. 37, 69-629. [0] Hochberg, Y. 988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-802. [] Hochberg, Y. and Rom, D. 995). Extensions of multiple testing procedures based on Simes test. J. Statist. Plann. Inference 48, 4-52. [2] Holm, S. 979). A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6, 65-70. [3] Hommel, G. 983). Tests of the overall hypothesis for arbitrary dependence structures. Biometrical J. 25, 423-430. [4] Jones, L.V. and Tukey, J.W. 2000). A sensible formulation of the significance test. Psychol. Meth. 5, 4-44. [5] Jonsson, F. Characterizing optimality among three-decision procedures for directional conclusions. U.U.D.M Project Report 20:2 submitted). [6] Lehmann, E.L. and Romano, J.P. 2005). Testing Statistical Hypotheses. Third Edition. Springer-Verlag. [7] Liese, F. and Miescke, K.-J. 2008). Statistical Decision Theory. Springer- Verlag. [8] Samuel-Cahn, E. 996). Is the Simes improved Bonferroni procedure conservative? Biometrika 83, 928-933. [9] Sarkar, S.K. 998). Some probability inequalities for ordered MTP 2 random variables: a proof of the Simes conjecture. Ann. Statist. 26, 494-504.

[20] Sarkar, S.K. 2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist. 30, 239-257. [2] Simes, R.J. 986). An improved Bonferroni procedure for multiple tests of significance. Biometrika. 73, 75-754. [22] Storey, J.D. 2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 64, 479-498. [23] Storey, J.D. 2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66, 87-205. 2