FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES


Sanat K. Sarkar
Department of Statistics, Temple University, Speakman Hall (006-00), Philadelphia, PA 19122, USA

Abstract

The concept of False Discovery Rate (FDR), which is the expected proportion of false positives (Type I errors) among rejected hypotheses, has recently received increasing attention from researchers in multiple hypothesis testing. A similar measure involving false negatives (Type II errors), which we call the False Negatives Rate (FNR), is considered, and an explicit formula for it is developed for a generalized step-up-step-down procedure in terms of probability distributions of ordered test statistics. It is shown that a step-down procedure can be used to control the FNR under certain conditions on the test statistics. Various FDR-controlling procedures that exist in a given multiple testing situation are further studied in terms of the FNR. In particular, an unbiasedness property is defined for an FDR-controlling multiple testing procedure by the inequality $\mathrm{FDR} + \mathrm{FNR} \le 1$. The FDR-controlling generalized step-up-step-down procedure considered in Sarkar (2002), in particular the Benjamini and Hochberg (1995) step-up procedure, is proved to be unbiased when the test statistics are independent and is conjectured, based on simulations, to be so when they are dependent. Also conjectured is the unbiasedness of the FDR-controlling Benjamini and Liu (1999) step-down procedure with independent as well as dependent test statistics. Different FDR-controlling procedures are investigated, based on simulated data from equi-correlated multivariate normals, in terms of the quantity $1 - \mathrm{FNR} - \mathrm{FDR}$, which reflects the strength of unbiasedness.
Key words: Generalized step-up-step-down procedure, Benjamini-Hochberg step-up procedure, Benjamini-Liu step-down procedure, Unbiasedness of an FDR-controlling procedure, Power of an FDR-controlling procedure

MSC: 62J15, 62F03

Email address: sanat@sbm.temple.edu (Sanat K. Sarkar).

Presented at the 3rd International Conference on Multiple Comparisons, Bethesda, MD, August, Supported by NSA Grant MDA

Preprint submitted to Elsevier Science 8 July 2003

1 Introduction

In simultaneous testing of multiple null hypotheses, controlling the False Discovery Rate (FDR), which is the expected proportion of Type I errors among rejected hypotheses, is increasingly becoming an acceptable practice. The concept of FDR is quite simple compared to the traditional Familywise Error Rate (FWER) and seems appropriate in many practical situations (Abramovich and Benjamini, 1996; Abramovich et al., 2000; Basford and Tukey, 1997; Drigalenko and Elston, 1997; Efron et al., 2001; Storey and Tibshirani, 2001; Williams et al., 1999). It was introduced in multiple testing by Benjamini and Hochberg (1995), although similar ideas were discussed in some earlier papers; see Finner and Roters (2001) for some historical remarks.

Consider simultaneous testing of $n$ null hypotheses $H_1, \ldots, H_n$. The following table presents all possible outcomes when a multiple testing procedure is applied.

Table 1. The outcomes in testing n null hypotheses

              Rejected   Accepted   Total
True Null     V          W          n_0
False Null    S          T          n_1
Total         R          A          n

Let $Q = V/R$ if $R > 0$, and $Q = 0$ if $R = 0$; i.e., $Q$ is the proportion of false positives (Type I errors) among the rejected null hypotheses. Then, $E(Q)$ is the FDR of that procedure (Benjamini and Hochberg, 1995). A number of step-up and step-down procedures exist in the literature that control the FDR in situations involving either independent or some commonly encountered type of positively dependent test statistics (Benjamini and Hochberg, 1995; Benjamini and Liu, 1999; Benjamini and Yekutieli, 2001; Sarkar, 2002). How do we compare these different FDR-controlling procedures? Since the FDR is a measure of false positives (Type I errors), a similar measure in terms of false negatives (Type II errors) appears to be an obvious choice for an alternative criterion through which different FDR-controlling procedures can be compared.
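The bookkeeping behind Table 1 and the proportion $Q$ can be sketched in a few lines of Python. This is only an illustration: the function names and the boolean-list interface are assumptions made here, not part of the paper.

```python
def outcome_counts(is_true_null, rejected):
    """Tabulate the Table 1 counts from two equal-length boolean lists.

    is_true_null[i] is True when H_i is a true null (hypothetical input);
    rejected[i] is True when the procedure rejected H_i.
    """
    pairs = list(zip(is_true_null, rejected))
    v = sum(1 for t, r in pairs if t and r)          # true nulls rejected
    w = sum(1 for t, r in pairs if t and not r)      # true nulls accepted
    s = sum(1 for t, r in pairs if not t and r)      # false nulls rejected
    t_ = sum(1 for t, r in pairs if not t and not r) # false nulls accepted
    return {"V": v, "W": w, "S": s, "T": t_, "R": v + s, "A": w + t_}


def false_positive_proportion(counts):
    """Q = V/R if R > 0, and 0 if R = 0."""
    return counts["V"] / counts["R"] if counts["R"] > 0 else 0.0


counts = outcome_counts(
    is_true_null=[True, True, False, False, False],
    rejected=[True, False, True, True, False],
)
q = false_positive_proportion(counts)
```

Averaging `q` over repeated data sets would estimate the FDR, $E(Q)$.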
Let us define $N = T/A$ if $A > 0$ and $N = 0$ if $A = 0$, which is the proportion of false negatives among the accepted null hypotheses. Then, we define the False Negatives Rate (FNR) by $E(N)$. Genovese and Wasserman (2002) also introduced this measure, but called it the false non-discovery rate, and compared different procedures in terms of certain measures involving both FDR and FNR. Sarkar (2002) obtained an explicit expression for the FDR of a generalized

step-up-step-down procedure that includes both step-up and step-down procedures, and an upper bound to the FDR of a step-down procedure, in terms of probability distributions of ordered components of dependent random variables. From these the FDR-controlling property of a number of stepwise procedures, including those considered by Benjamini and Hochberg (1995), Benjamini and Liu (1999) and Benjamini and Yekutieli (2001), was established in the independence case as well as in some positive dependence cases. We revisit these FDR-controlling stepwise procedures in this article, make some clarifications, extend an existing result, and investigate these procedures further in terms of their FNR.

Although the concept of FNR can be used to compare different FDR-controlling procedures, controlling the FNR itself at a pre-specified level might be of primary importance in many situations. For this, a formula for the FNR of a generalized step-up-step-down procedure, similar to that for the FDR, is developed. It is then shown that a step-down procedure can be used to control the FNR under certain conditions on the underlying test statistics.

Towards studying how the different FDR-controlling procedures compare in terms of the FNR, we first introduce an unbiasedness property of a multiple testing procedure. It is defined using the FDR and the FNR of a multiple testing procedure and is based on the very basic idea that a good procedure must ensure, on the average, a high proportion of correct decisions compared to that of incorrect decisions. This property is investigated for the different FDR-controlling procedures in this article. It is proved that the FDR-controlling generalized step-up-step-down procedure of order $r$ (Sarkar, 2002), in particular the Benjamini-Hochberg step-up test, is unbiased when the underlying test statistics are independent.
In the case of dependent test statistics having a joint distribution that is equi-correlated multivariate normal with a positive common correlation, these procedures are empirically checked to be unbiased. The unbiasedness of the FDR-controlling Benjamini-Liu step-down procedure is also empirically verified in both the independence and dependence cases under the same multivariate distribution.

The extent to which the expected proportion of correctly accepted null hypotheses, as measured by $1 - \mathrm{FNR}$, differs from that of falsely rejected null hypotheses, which is the FDR, provides a measure of how powerful a procedure is. We will empirically compare the FDR-controlling procedures using this measure when the test statistics are equi-correlated multivariate normal.

The paper is organized as follows. The generalized step-up-step-down procedure is defined in Section 2. The FDR-controlling stepwise procedures with independent as well as dependent $X_i$'s are listed in Section 3. Section 4 presents the explicit expression for the FNR of the generalized step-up-step-down procedure and a discussion of how to control the FNR using a step-down procedure. The comparative study of the FDR-controlling procedures mentioned

in Section 3 in terms of the FNR is carried out in Section 5. We conclude the paper with some remarks in Section 6. The technical details of the proof of a theorem are deferred to the Appendix.

2 Generalized step-up-step-down procedure of order r

As the results on FDR and FNR are first developed in this paper for a generalized step-up-step-down procedure before being discussed for a step-up or step-down procedure in particular, it is helpful to define this procedure before we proceed further. It was first introduced by Tamhane et al. (1998).

Suppose there are $n$ null hypotheses $H_1, \ldots, H_n$ with the corresponding test statistics $X_1, \ldots, X_n$, respectively, which are continuous and are identically, not necessarily independently, distributed under the null hypotheses. We assume without any loss of generality that each $H_i$ is tested using a right-tailed test, i.e., it is rejected for a large value of $X_i$. Let $X_{1:n} \le \cdots \le X_{n:n}$ be the ordered $X_i$'s, corresponding to $H_{1:n}, \ldots, H_{n:n}$, respectively. Then, a generalized step-up-step-down procedure of order $r$ in terms of the $X_{i:n}$'s with critical values $c_{1:n} \le \cdots \le c_{n:n}$ operates as follows. Let

\[ \bar{i}_r = \min_{r+1 \le i \le n} \{ i : X_{i:n} \ge c_{i:n} \} \]

if the minimum exists, otherwise $\bar{i}_r = n + 1$, and

\[ \underline{i}_r = \max_{1 \le i \le r-1} \{ i : X_{i:n} < c_{i:n} \} \]

if the maximum exists, otherwise $\underline{i}_r = 0$. Then, this procedure accepts $\{H_{i:n} : i < \bar{i}_r\}$ and rejects the rest if $X_{r:n} < c_{r:n}$; whereas, it accepts $\{H_{i:n} : i \le \underline{i}_r\}$ and rejects the rest if $X_{r:n} \ge c_{r:n}$. Equivalently, it declares the hypotheses $H_{1:n}, \ldots, H_{j:n}$ as true and the rest as false if $(X_1, \ldots, X_n) \in B^r_{j,n}$, where

\[ B^r_{j,n} = \begin{cases} \{X_{j:n} < c_{j:n},\ X_{j+1:n} \ge c_{j+1:n}, \ldots, X_{r:n} \ge c_{r:n}\}, & j = 0, 1, \ldots, r-1, \\ \{X_{r:n} < c_{r:n}, \ldots, X_{j:n} < c_{j:n},\ X_{j+1:n} \ge c_{j+1:n}\}, & j = r, \ldots, n, \end{cases} \tag{2.1} \]

with $B^r_{0,n} = \{X_{1:n} \ge c_{1:n}, \ldots, X_{r:n} \ge c_{r:n}\}$ and $B^r_{n,n} = \{X_{r:n} < c_{r:n}, \ldots, X_{n:n} < c_{n:n}\}$. When $r = 1$, it is a step-up procedure, and when $r = n$, it is a step-down procedure.
Readers are referred to Finner and Roters (2002) and Liu (1996) for more on step-up and step-down procedures. With $r = 1$ or $n$, if the critical values are all the same, say equal to $c$, then it reduces to a single-step procedure, where each $H_i$ is rejected if and only if $X_i \ge c$.
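The way the order-$r$ procedure operates can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the paper: the function name `accepted_count` and its list-based interface are hypothetical, the statistics are assumed to be already sorted in increasing order, and the critical values are assumed nondecreasing.

```python
def accepted_count(x_sorted, crit, r):
    """Return j, the number of accepted hypotheses H_{1:n}, ..., H_{j:n}.

    x_sorted : ordered statistics X_{1:n} <= ... <= X_{n:n}
    crit     : critical values c_{1:n} <= ... <= c_{n:n}
    r        : order of the procedure, 1 <= r <= n (1-based, as in the text)
    """
    n = len(x_sorted)
    if x_sorted[r - 1] < crit[r - 1]:
        # H_{r:n} is accepted: move towards larger statistics and stop at
        # the first i in r+1..n with X_{i:n} >= c_{i:n}.
        for i in range(r + 1, n + 1):
            if x_sorted[i - 1] >= crit[i - 1]:
                return i - 1          # accept all i below that index
        return n                      # no such i: accept everything
    # H_{r:n} is rejected: move towards smaller statistics and stop at
    # the largest i in 1..r-1 with X_{i:n} < c_{i:n}.
    for i in range(r - 1, 0, -1):
        if x_sorted[i - 1] < crit[i - 1]:
            return i                  # accept H_{1:n}, ..., H_{i:n}
    return 0                          # no such i: reject everything


x = [0.2, 1.1, 1.3, 1.5]
c = [1.0, 1.0, 1.4, 1.4]
step_up = accepted_count(x, c, r=1)    # r = 1: step-up
step_down = accepted_count(x, c, r=4)  # r = n: step-down
```

On this toy input the step-up version accepts fewer hypotheses than the step-down version, which is the usual relation between the two when the same critical values are used.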

3 FDR-controlling stepwise procedures

We will summarize in this section some existing results on FDR, make some remarks and provide an extension. Starting with the formula

\[ \mathrm{FDR} = \sum_{i \in I_0} \sum_{j=0}^{n-1} \frac{1}{n-j}\, P\big[\{X_i \ge c_{j+1:n}\} \cap B^r_{j,n}\big], \tag{3.1} \]

where $\{H_i, i \in I_0\}$ is the set of true null hypotheses, Sarkar (2002) gave an explicit expression for the FDR of a generalized step-up-step-down procedure of order $r$. This is given by

\[ \begin{aligned} \mathrm{FDR} ={}& \frac{1}{n-r+1} \sum_{i \in I_0} P\{X_i \ge c_{r:n}\} \\ &+ \sum_{i \in I_0} \sum_{j=1}^{r-1} E\left[ \phi^r_{j,n}(X_i) \left\{ \frac{I(X_i \ge c_{j:n})}{n-j+1} - \frac{I(X_i \ge c_{j+1:n})}{n-j} \right\} \right] \\ &+ \sum_{i \in I_0} \sum_{j=r}^{n-1} E\left[ \psi^r_{j,n}(X_i) \left\{ \frac{I(X_i \ge c_{j+1:n})}{n-j} - \frac{I(X_i \ge c_{j:n})}{n-j+1} \right\} \right], \end{aligned} \tag{3.2} \]

with

\[ \phi^r_{j,n}(X_i) = P\{X_{j:n} \ge c_{j:n}, \ldots, X_{r:n} \ge c_{r:n} \mid X_i\} \]

and

\[ \psi^r_{j,n}(X_i) = P\{X_{r:n} < c_{r:n}, \ldots, X_{j:n} < c_{j:n} \mid X_i\}. \]

Under a variety of distributional settings of the $X_i$'s, the $c_{i:n}$'s can be obtained so that the FDR in (3.2) is less than or equal to $n_0 \alpha / n$, and hence less than or equal to $\alpha$. For example, when the $X_i$'s are independent, or have a multivariate distribution that possesses a positive dependence property in the sense that the $X_i$'s are PRDS on the subset $\{X_i, i \in I_0\}$, then $c_{i:n}$'s satisfying

\[ F(c_{i:n}) = 1 - (n - i + 1)\alpha / n, \quad i = 1, \ldots, n, \tag{3.3} \]

with $F(\cdot)$ being the common marginal null cdf of the $X_i$'s, provide control of the FDR at $\alpha$ (Sarkar, 2002). These are the Simes (1986) critical values used in the Benjamini-Hochberg step-up test with independent test statistics; see Sarkar and Chang (1997) and Sarkar (1998) for more on Simes critical values. It is important to note, however, that the FDR of the step-up test with these critical values in the independence case is actually exactly equal to $n_0 \alpha / n$ (Benjamini and Yekutieli, 2001; Finner and Roters, 2001; Sarkar, 2002).

An $n$-dimensional random vector $X = (X_1, \ldots, X_n)$ or the corresponding multivariate distribution is said to be positive regression dependent on a subset

$\{X_i, i \in M\}$, or simply on $M$ (PRDS on $M$), where $M \subseteq \{1, \ldots, n\}$, if $P\{X \in C \mid X_i\}$ is nondecreasing (nonincreasing) in $X_i$, for each $i \in M$, for any increasing (decreasing) set $C$ (Benjamini and Yekutieli, 2001; Sarkar, 2002). Karlin and Rinott's (1980) MTP$_2$ (multivariate totally positive of order two) property implies the PRDS property on any subset. For instance, independent random variables, which are trivially MTP$_2$, are PRDS. Multivariate normal distributions with certain types of non-negative correlations are MTP$_2$, and hence PRDS on any subset. These types refer to covariance matrices $\Sigma$ with the off-diagonals of $-\Sigma^{-1}$ being non-negative. Such a covariance matrix arises, for example, in the case of many-to-one comparisons in a one-way layout where the correlations are non-negative and of the form $\rho_{ij} = \lambda_i \lambda_j$, for $i \ne j$, for some non-negative $\lambda_i$'s (Karlin and Rinott, 1980). Also, as noted in Sarkar (2002), multivariate F, which arises in many-to-one comparisons of variances with one-sided alternatives, is MTP$_2$, and hence PRDS on any subset.

The PRDS property of the above types of multivariate normal can be invoked to establish the same property of the corresponding multivariate t on $I_0$ (corresponding to zero means), but conditional on positive values (Sarkar, 2002). Similarly, the PRDS property of independent normals implies the same property of the corresponding absolute-valued multivariate t on $I_0$ (Sarkar, 2002). It is important to note, however, that the PRDS property of multivariate normal on any subset holds as long as the correlations are non-negative. This follows from the fact that the conditional distribution given any $X_i$ is multivariate normal with a mean vector that is non-decreasing in $X_i$ and a covariance matrix that does not depend on $X_i$. Therefore, the aforementioned PRDS property of multivariate t corresponding to dependent normals holds as long as these normals have non-negative correlations (Benjamini and Yekutieli, 2001).
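For the equi-correlated normal case used later in the simulations, the Karlin-Rinott condition (a normal distribution is MTP$_2$ when the inverse covariance matrix has non-positive off-diagonal entries) can be checked by hand. The worked equation below is a standard matrix identity added here as an illustration, not a derivation taken from the paper; it assumes unit variances, a common correlation $0 \le \rho < 1$, and writes $I$ for the identity matrix and $\mathbf{1}$ for the vector of ones.

```latex
\Sigma = (1-\rho)\, I + \rho\, \mathbf{1}\mathbf{1}^{\top},
\qquad
\Sigma^{-1} = \frac{1}{1-\rho}
  \left[\, I - \frac{\rho}{1 + (n-1)\rho}\, \mathbf{1}\mathbf{1}^{\top} \right],
\qquad
\left(\Sigma^{-1}\right)_{ij}
  = -\,\frac{\rho}{(1-\rho)\,\{1 + (n-1)\rho\}} \;\le\; 0
  \quad (i \ne j).
```

So every off-diagonal entry of $\Sigma^{-1}$ is non-positive whenever $0 \le \rho < 1$, and the equi-correlated normal with a non-negative common correlation is of the MTP$_2$ type.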
It is useful to note that absolute-valued multivariate normal with non-negative correlations is not PRDS unless the correlations are all zero.

Remark 3.1. One should keep in mind that in the aforementioned FDR-controlling property of the step-up-step-down procedures with the critical values satisfying (3.3), the marginal distributions of the $X_i$ are completely known under the null hypotheses. When this is not the case, for example, when the null hypotheses are one-sided, the result will be less general. This is because the PRDS property of $(X_1, \ldots, X_n)$ on $I_0$ corresponding to such null hypotheses may no longer hold under some of the multivariate distributions listed above. For example, if the $X_i$'s are multivariate t corresponding to dependent normals with non-negative correlations, then it is not clear if they are PRDS on such $I_0$ conditional on positive values. This is because, as noted in Sarkar (2002), the arguments leading to this property for multivariate t explicitly require the condition that the mean of $X_i$ is zero when $i \in I_0$, and cease to work if this mean is non-zero; see also Lemma 3.1, condition (d), in Benjamini and Yekutieli (2001). Of course, random variables that are independent or have a

joint distribution that is multivariate normal with non-negative correlations or a multivariate F will still be PRDS on this $I_0$. Moreover, even if $(X_1, \ldots, X_n)$ is PRDS on $I_0$, since the differences

\[ \frac{P\{X_i \ge c_{j:n}\}}{n-j+1} - \frac{P\{X_i \ge c_{j+1:n}\}}{n-j}, \]

for different $i$ and $j$, are no longer zero under such null hypotheses, the inequality

\[ \begin{aligned} \mathrm{FDR} \le{}& \frac{1}{n-r+1} \sum_{i \in I_0} P\{X_i \ge c_{r:n}\} \\ &+ \sum_{i \in I_0} \sum_{j=1}^{r-1} \phi^r_{j,n}(c_{j+1:n}) \left[ \frac{P\{X_i \ge c_{j:n}\}}{n-j+1} - \frac{P\{X_i \ge c_{j+1:n}\}}{n-j} \right] \\ &+ \sum_{i \in I_0} \sum_{j=r}^{n-1} \psi^r_{j,n}(c_{j+1:n}) \left[ \frac{P\{X_i \ge c_{j+1:n}\}}{n-j} - \frac{P\{X_i \ge c_{j:n}\}}{n-j+1} \right], \end{aligned} \tag{3.4} \]

which follows from the PRDS property (Sarkar, 2002), now does not yield the result that $\mathrm{FDR} \le n_0 \alpha / n$ unless further conditions are imposed on $(X_1, \ldots, X_n)$.

Suppose that the marginal distribution of $X_i$ depends on a parameter $\theta_i$, $i = 1, \ldots, n$, and that the null hypotheses are $H_i : \theta_i \le \theta_0$, $i = 1, \ldots, n$, for some known $\theta_0$. Notice that we do not attach $\theta_i$ as a subscript to a probability statement involving the distribution of $X_i$ unless $\theta_i$ has some specific value. If the density of $X_i$, say $f(x, \theta_i)$, is TP$_2$ in $(x, \theta_i)$ (Karlin, 1968), i.e., has the monotone likelihood ratio property in $x$ (Lehmann, 1986, p. 78), as in the case of normal and many other commonly used distributions, then, with the $c_{i:n}$'s satisfying (3.3) (where $F(\cdot)$ is the cdf of $X_i$ at $\theta_i = \theta_0$), it is noted that the inequalities

\[ P\{X_i \ge c_{r:n}\} \le (n - r + 1)\alpha / n \]

and

\[ \frac{P\{X_i \ge c_{j+1:n}\}}{n-j} \le \frac{P\{X_i \ge c_{j:n}\}}{n-j+1} \]

are true for $i \in I_0$. The first inequality follows from the fact that $P\{X_i \ge x\}$ is increasing in $\theta_i$ [see, e.g., Lehmann (1986)]; whereas, the second inequality follows from the fact that $P\{X_i \ge x\}$ is also TP$_2$ in $(x, \theta_i)$ [see, e.g., DasGupta

and Sarkar (1984) and Karlin (1968)], yielding the inequality

\[ \frac{P\{X_i \ge c_{j+1:n}\}}{P\{X_i \ge c_{j:n}\}} \le \frac{P_{\theta_0}\{X_i \ge c_{j+1:n}\}}{P_{\theta_0}\{X_i \ge c_{j:n}\}}, \quad \text{for } \theta_i \le \theta_0. \]

Thus, the required inequality, namely, $\mathrm{FDR} \le n_0 \alpha / n$, now holds for $r = 1$, providing the following modification in the case of left-sided null hypotheses.

Result 3.1. For multiple testing of the null hypotheses $H_i : \theta_i \le \theta_0$, $i = 1, \ldots, n$, against the corresponding alternatives $H_i : \theta_i > \theta_0$, $i = 1, \ldots, n$, the Benjamini-Hochberg step-up procedure involving right-tailed tests based on the $X_i$'s using the critical values satisfying (3.3) controls the FDR at $\alpha$ if $(X_1, \ldots, X_n)$ is PRDS on $I_0$ and, for each $i$, the marginal density of $X_i$ is TP$_2$.

This is an extension of Theorem 5.2 of Benjamini and Yekutieli (2001) to the dependence case. The FDR-controlling property of other step-up-step-down procedures for these types of null hypotheses is, however, not clear.

It is important to point out that although this result can be established in the independence case with $X_i$ satisfying a less restrictive condition, namely, that it is simply stochastically increasing in $\theta_i$, as in Benjamini and Yekutieli (2001), it is not easy to prove in the dependence case without the stronger, yet relatively easy to verify, TP$_2$ condition. This is explained in the following.

With $J_i = \{1, \ldots, n\} \setminus \{i\}$, let $X_{1:J_i} \le \cdots \le X_{n-1:J_i}$ denote the ordered components of the subset $\{X_j, j \in J_i\}$. Then, note that for $j = 0, \ldots, n-1$,

\[ \begin{aligned} &\{X_{1:n} < c_{1:n}, \ldots, X_{j:n} < c_{j:n},\ X_{j+1:n} \ge c_{j+1:n},\ X_i > c_{j+1:n}\} \\ &\quad = \{X_{1:J_i} < c_{1:n}, \ldots, X_{j:J_i} < c_{j:n},\ X_{j+1:J_i} \ge c_{j+1:n},\ X_i > c_{j+1:n}\} \\ &\quad = \{R_n^{(-i)} = n - j - 1,\ X_i > c_{j+1:n}\}, \end{aligned} \]

where $R_n^{(-i)}$ is the number of null hypotheses rejected by the step-up test involving the subset $\{X_j, j \in J_i\}$ and the critical values $c_{1:n} \le \cdots \le c_{n-1:n}$.
In other words, the FDR in (3.1) for the step-up test ($r = 1$) can be rewritten as

\[ \mathrm{FDR} = \sum_{i \in I_0} \sum_{j=0}^{n-1} \frac{1}{n-j}\, P\{X_i > c_{j+1:n},\ R_n^{(-i)} = n - j - 1\}, \]

which simplifies under independence of the $X_i$'s to

\[ \sum_{i \in I_0} \sum_{j=0}^{n-1} \frac{1}{n-j}\, P\{X_i > c_{j+1:n}\}\, P\{R_n^{(-i)} = n - j - 1\}. \]
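The independence-case value $n_0 \alpha / n$ for the step-up test with the critical values (3.3) can be checked numerically. The Monte Carlo sketch below is an illustration only, not the paper's code: it assumes unit-variance normal statistics with mean 0 under true nulls and a hypothetical common shift `delta` under false nulls, and the function names are invented for this example.

```python
import random
from statistics import NormalDist


def simes_critical_values(n, alpha):
    """c_{i:n} solving F(c_{i:n}) = 1 - (n - i + 1) * alpha / n, as in (3.3),
    with F taken to be the standard normal cdf (an assumption of this sketch)."""
    z = NormalDist()
    return [z.inv_cdf(1 - (n - i + 1) * alpha / n) for i in range(1, n + 1)]


def step_up_rejections(x, crit):
    """Indices rejected by the step-up test (r = 1)."""
    order = sorted(range(len(x)), key=x.__getitem__)  # ascending statistics
    for pos, idx in enumerate(order):
        if x[idx] >= crit[pos]:
            return set(order[pos:])   # reject this one and all larger ones
    return set()


def simulate_fdr(n, n0, delta, alpha, reps, seed=0):
    """Average V/R (with 0 when R = 0) over independent replications.

    Hypotheses 0..n0-1 are the true nulls in this sketch.
    """
    rng = random.Random(seed)
    crit = simes_critical_values(n, alpha)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0 if i < n0 else delta, 1.0) for i in range(n)]
        rejected = step_up_rejections(x, crit)
        if rejected:
            total += sum(1 for i in rejected if i < n0) / len(rejected)
    return total / reps


estimate = simulate_fdr(n=20, n0=10, delta=2.0, alpha=0.1, reps=20000, seed=1)
target = 10 * 0.1 / 20   # n0 * alpha / n
```

With these settings the estimate should sit close to `target`, up to Monte Carlo error.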

Now, if $X_i$ is stochastically increasing in $\theta_i$ and the $c_{j:n}$'s satisfy (3.3), this FDR is less than or equal to

\[ \sum_{i \in I_0} \sum_{j=0}^{n-1} \frac{1}{n-j}\, P_{\theta_0}\{X_i > c_{j+1:n}\}\, P\{R_n^{(-i)} = n - j - 1\} = \frac{\alpha}{n} \sum_{i \in I_0} \sum_{j=0}^{n-1} P\{R_n^{(-i)} = n - j - 1\} = \frac{n_0 \alpha}{n}. \]

Notice that when the $X_i$'s are dependent, merely assuming that each $X_i$ is stochastically increasing in $\theta_i$ is not enough to carry this argument through.

In addition to the aforementioned stepwise procedures, there is another step-down procedure, suggested by Benjamini and Liu (1999), that controls the FDR at $\alpha$. The critical values $c_{i:n}$ of this step-down procedure are such that

\[ F(c_{i:n}) = \left[ 1 - \min\left(1, \frac{n}{i}\alpha\right) \right]^{1/i}, \quad i = 1, \ldots, n. \]

Benjamini and Liu (1999) proved that this step-down procedure controls the FDR at $\alpha$ when the statistics are independent. Sarkar (2002) has strengthened this by proving that the FDR-controlling property of this procedure still holds when the test statistics are positively dependent in the sense of being MTP$_2$ under any set of true and false null hypotheses, with the null distribution being exchangeable. Multivariate normal with a non-negative common correlation is of this type.

4 FNR of generalized step-up-step-down procedure of order r

This section presents some new results on FNR. In the first part of this section, we will derive an explicit formula for the FNR of a generalized step-up-step-down procedure, similar to that for the FDR given in Sarkar (2002). In the second part, we will explain how and when the FNR can be controlled at a pre-specified level.

Let $I_1 = \{1, \ldots, n\} \setminus I_0$; that is, $\{H_i, i \in I_1\}$ is the set of false null hypotheses. Observing that $B^r_{j,n} = \{A = j\}$, $j = 0, 1, \ldots, n$, we have

\[ \mathrm{FNR} = E\{T\, I(A > 0)/A\} = \sum_{j=1}^{n} \frac{1}{j}\, E\{T\, I(A = j)\} = \sum_{j=1}^{n} \frac{1}{j} \sum_{i \in I_1} E\{I(H_i \text{ is accepted},\ A = j)\}. \tag{4.1} \]

Once the event $\{A = j\}$ occurs, resulting in acceptance of $H_{1:n}, \ldots, H_{j:n}$ and rejection of $H_{j+1:n}, \ldots, H_{n:n}$, the hypothesis $H_i$ ($i \in I_1$) will be one of these accepted hypotheses if and only if $X_i \le c_{j:n}$. Thus, we have

\[ \mathrm{FNR} = \sum_{i \in I_1} \sum_{j=1}^{n} \frac{1}{j}\, P\{X_i \le c_{j:n},\ A = j\}. \tag{4.2} \]

A more explicit expression for (4.2) is obtained in the following theorem, whose proof is given in the Appendix.

Theorem 4.1. The FNR of a generalized step-up-step-down procedure of order $r$ is given by

\[ \begin{aligned} \mathrm{FNR} ={}& \frac{1}{r} \sum_{i \in I_1} P\{X_i \le c_{r:n}\} \\ &+ \sum_{i \in I_1} \sum_{j=2}^{r} E\left[ \phi^r_{j,n}(X_i) \left\{ \frac{I(X_i \le c_{j-1:n})}{j-1} - \frac{I(X_i \le c_{j:n})}{j} \right\} \right] \\ &+ \sum_{i \in I_1} \sum_{j=r+1}^{n} E\left[ \psi^r_{j,n}(X_i) \left\{ \frac{I(X_i \le c_{j:n})}{j} - \frac{I(X_i \le c_{j-1:n})}{j-1} \right\} \right]. \end{aligned} \tag{4.3} \]

Can the FNR of a stepwise procedure of the above type be controlled at some pre-specified level? This question was raised in Genovese and Wasserman (2002, p. 507). The answer is yes, under certain conditions on $(X_1, \ldots, X_n)$. Suppose first that $(X_1, \ldots, X_n)$ is PRDS on $I_1$. Then, $\phi^r_{j,n}(X_i)$ is nondecreasing and $\psi^r_{j,n}(X_i)$ is nonincreasing in $X_i$, for each $i \in I_1$. Since

\[ \frac{I(X_i \le c_{j-1:n})}{j-1} - \frac{I(X_i \le c_{j:n})}{j} \ \ge \text{ or } \le 0 \quad \text{according as } X_i \le \text{ or } > c_{j-1:n}, \]

the FNR in (4.3) is less than or equal to

\[ \begin{aligned} &\frac{1}{r} \sum_{i \in I_1} P\{X_i \le c_{r:n}\} + \sum_{i \in I_1} \sum_{j=2}^{r} \phi^r_{j,n}(c_{j-1:n}) \left[ \frac{P\{X_i \le c_{j-1:n}\}}{j-1} - \frac{P\{X_i \le c_{j:n}\}}{j} \right] \\ &\quad + \sum_{i \in I_1} \sum_{j=r+1}^{n} \psi^r_{j,n}(c_{j-1:n}) \left[ \frac{P\{X_i \le c_{j:n}\}}{j} - \frac{P\{X_i \le c_{j-1:n}\}}{j-1} \right]. \end{aligned} \]

Next suppose that the marginal density, $f(x, \theta_i)$, of $X_i$ is TP$_2$ in $(x, \theta_i)$ for each $i$, and that the null hypotheses are $H_i : \theta_i \le \theta_0$, $i = 1, \ldots, n$, which are tested against the alternatives $H_i : \theta_i > \theta_0$, $i = 1, \ldots, n$, respectively. Then, the following inequalities are true:

\[ P\{X_i \le c_{r:n}\} \le P_{\theta_0}\{X_i \le c_{r:n}\} \]

and

\[ \frac{P\{X_i \le c_{j-1:n}\}}{P\{X_i \le c_{j:n}\}} \le \frac{P_{\theta_0}\{X_i \le c_{j-1:n}\}}{P_{\theta_0}\{X_i \le c_{j:n}\}}, \]

for $i \in I_1$, again because $P\{X_i \le x\}$ is decreasing in $\theta_i$ as well as TP$_2$ in $(x, \theta_i)$. Thus, we have the following:

Result 4.1. Suppose that the joint distribution of $(X_1, \ldots, X_n)$ is PRDS on $I_1$, and that the marginal density, $f(x, \theta_i)$, of $X_i$ is TP$_2$ in $(x, \theta_i)$ for each $i$. Then, for simultaneous testing of the null hypotheses $H_i : \theta_i \le \theta_0$, $i = 1, \ldots, n$, against the corresponding alternatives $H_i : \theta_i > \theta_0$, $i = 1, \ldots, n$, the FNR of a step-down procedure involving right-tailed tests based on $(X_1, \ldots, X_n)$ is less than or equal to $n_1 \beta / n$ if the critical values $c_{1:n} \le \cdots \le c_{n:n}$ satisfy $F(c_{i:n}) = i\beta/n$, for $i = 1, \ldots, n$, where $F(x)$ is the common marginal cdf of the $X_i$'s when $\theta_i = \theta_0$, for $i = 1, \ldots, n$.

Random variables that are independent with TP$_2$ densities or have a joint distribution that is multivariate normal with non-negative correlations or multivariate F satisfy the conditions stated in Result 4.1. In fact, when the $X_i$'s are independent, the TP$_2$ condition on the marginal densities can be replaced by the weaker condition that $X_i$ is stochastically nondecreasing in $\theta_i$, for $i = 1, \ldots, n$. This is explained in the following.

Working with $r = n$ in (4.2), we note that, for $j = 1, \ldots, n$,

\[ \begin{aligned} &\{X_i \le c_{j:n},\ X_{j:n} < c_{j:n},\ X_{j+1:n} \ge c_{j+1:n}, \ldots, X_{n:n} \ge c_{n:n}\} \\ &\quad = \{X_i \le c_{j:n},\ X_{j-1:J_i} < c_{j:n},\ X_{j:J_i} \ge c_{j+1:n}, \ldots, X_{n-1:J_i} \ge c_{n:n}\} \\ &\quad = \{X_i \le c_{j:n},\ A_n^{(-i)} = j - 1\}, \end{aligned} \]

where $A_n^{(-i)}$ is the number of null hypotheses accepted by the step-down test involving the subset $\{X_j, j \in J_i\}$ and the critical values $c_{2:n} \le \cdots \le c_{n:n}$. Therefore, when the $X_i$'s are independent, the FNR of the step-down test with the critical values satisfying the condition stated in Result 4.1 reduces to

\[ \begin{aligned} \sum_{i \in I_1} \sum_{j=1}^{n} \frac{1}{j}\, P\{X_i \le c_{j:n}\}\, P\{A_n^{(-i)} = j - 1\} &\le \sum_{i \in I_1} \sum_{j=1}^{n} \frac{1}{j}\, P_{\theta_0}\{X_i \le c_{j:n}\}\, P\{A_n^{(-i)} = j - 1\} \\ &= \frac{\beta}{n} \sum_{i \in I_1} \sum_{j=1}^{n} P\{A_n^{(-i)} = j - 1\} = \frac{n_1 \beta}{n}. \end{aligned} \]

5 Comparison of FDR-controlling procedures in terms of their FNR

In this section, we will study the different FDR-controlling procedures mentioned in Section 3 in terms of their FNR. First, we will define an unbiasedness property.

Definition. An FDR-controlling multiple testing procedure is said to be unbiased at level $\alpha$ if $\mathrm{FDR} \le \alpha$ and $\mathrm{FDR} + \mathrm{FNR} \le 1$.

This property is driven by the consideration that a good multiple testing procedure must guarantee that, on the average, the proportion of incorrect decisions is not more than the proportion of correct decisions. When the FDR is considered as a measure of incorrect decisions, an analogous measure of correct decisions is $1 - \mathrm{FNR}$, which Genovese and Wasserman (2002) called the correct non-discovery rate. In situations where controlling false negatives is of primary importance, the FNR provides a measure of incorrect decisions with the corresponding measure of correct decisions being $1 - \mathrm{FDR}$. In any case, whether we have a multiple testing procedure designed to control FDR or FNR, the inequality $\mathrm{FDR} + \mathrm{FNR} \le 1$ represents a desirable property for any such multiple testing procedure.

Result 5.1. Suppose that $X_1, \ldots, X_n$ are independent with the marginal distribution of $X_i$ depending on a parameter $\theta_i$, $i = 1, \ldots, n$. Then, for simultaneous testing of the null hypotheses $H_i : \theta_i = \theta_0$, $i = 1, \ldots, n$, against the corresponding alternatives $H_i : \theta_i > \theta_0$, $i = 1, \ldots, n$, a generalized step-up-step-down procedure of order $r$ using $(X_1, \ldots, X_n)$ and the critical values satisfying (3.3)

(with $F$ being the cdf of $X_i$ at $\theta_i = \theta_0$) is unbiased if $\alpha < 0.5$.

Proof. Observing that the FNR in (4.2) reduces to $P\{A > 0\}$ if the first summation is taken over all $i \in \{1, \ldots, n\}$, we first rewrite (4.2) as

\[ \mathrm{FNR} = P\{A > 0\} - \sum_{i \in I_0} \sum_{j=1}^{n} \frac{1}{j}\, P\{X_i \le c_{j:n},\ A = j\}. \tag{5.1} \]

We then note that, for any fixed $i$ and $j = 1, \ldots, n$,

\[ \{X_i \le c_{j:n},\ A = j\} \supseteq \{X_i \le c_{1:n},\ A = j\} = \{X_i \le c_{1:n},\ A_{r-1}^{(-i)} = j - 1\}, \]

where $A_{r-1}^{(-i)}$ is the number of null hypotheses accepted by the step-up-step-down procedure of order $r - 1$ based on the subset $\{X_j, j \in J_i\}$ and the critical values $c_{2:n}, \ldots, c_{n:n}$. Note that $A_0^{(-i)} = A_1^{(-i)}$. Therefore, when the $X_i$'s are independent and the $c_{i:n}$'s satisfy (3.3), the FNR in (5.1) is less than or equal to

\[ 1 - \sum_{i \in I_0} \sum_{j=1}^{n} \frac{1}{j}\, P_{\theta_0}\{X_i \le c_{1:n}\}\, P\{A_{r-1}^{(-i)} = j - 1\} \le 1 - \frac{n_0 (1 - \alpha)}{n} \le 1 - \frac{n_0}{n}\alpha \tag{5.2} \]

if $\alpha \le 0.5$. The result then follows by noting that $\mathrm{FDR} \le n_0 \alpha / n$ (Sarkar, 2002).

Remark 5.1. When the null hypotheses are left-sided, i.e., $H_i : \theta_i \le \theta_0$, for $i = 1, \ldots, n$, we need the additional condition that $X_i$ is stochastically nondecreasing in $\theta_i$. This condition provides the inequality $P\{X_i \le c_{1:n}\} \ge P_{\theta_0}\{X_i \le c_{1:n}\} = 1 - \alpha$, for each $i \in I_0$, implying the first inequality in (5.2). Also, under this condition, the inequality $\mathrm{FDR} \le n_0 \alpha / n$, which combines with (5.2) to give $\mathrm{FDR} + \mathrm{FNR} \le 1$, is known only for the Benjamini-Hochberg step-up test (see the remarks following Result 3.1). Thus, with left-sided null hypotheses, Result 5.1 becomes less general in that it is known to hold only for the Benjamini-Hochberg procedure under the additional stochastic increasing property of each $X_i$.

Does this result still hold when the test statistics are dependent? Also, is the Benjamini-Liu step-down procedure with independent or dependent test statistics unbiased? These questions, which appear to be relatively more difficult to answer analytically, are going to be addressed elsewhere. However, we

conjecture in this article that the answers are yes, at least in the equi-correlated multivariate normal case, based on simulations.

The difference $1 - (\mathrm{FDR} + \mathrm{FNR})$ indicates the strength of unbiasedness, and in that sense it represents a measure of power of a multiple testing procedure. Between two procedures, the one with the higher value of this difference is more powerful in the sense that it maintains either a higher proportion of correctly accepted null hypotheses or a lower proportion of falsely rejected null hypotheses. It is interesting to note that Genovese and Wasserman (2001) suggested using $\mathrm{FDR} + \mathrm{FNR}$ as a measure of risk to compare multiple testing procedures.

Based on simulated data from an equi-correlated multivariate normal, we numerically compare the different FDR-controlling stepwise procedures in terms of this concept of power. Each of the FDR-controlling procedures mentioned in Section 3 is applied to a randomly generated vector from an $n$-dimensional multivariate normal distribution with $n_0$ of the $n$ means being zero and the rest being $\delta$, a common variance 1, and a non-negative common correlation $\rho$. Based on 10,000 independent repetitions of this experiment, we obtain simulated values of $1 - \mathrm{FNR} - \mathrm{FDR}$. Tables 1-2 report these simulated values for $\alpha = 0.05$, $n = 100$, $\delta = 1$ and 3, and for some selected values of $r$ (with BL referring to the Benjamini-Liu procedure), $n_0$ and $\rho$.

While all of the FDR-controlling procedures become more powerful as $n_0$ increases, the performance of the step-up-step-down procedures, including the Benjamini-Hochberg step-up procedure ($r = 1$), does not seem to differ much in the independence case. They are equally good in the independence case not only in terms of controlling the FDR at a given level (as seen from the last columns, which give $1 - \mathrm{FDR}$) but also in terms of maintaining a low value for $\mathrm{FDR} + \mathrm{FNR}$.
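The simulation design described above can be sketched as follows for the step-up case ($r = 1$) only. This is a small-scale illustration under assumptions made here, not the code behind the paper's tables: the function name `power_measure` is hypothetical, and the equi-correlated vector is generated through one shared normal component, a standard construction that gives unit variances and common correlation `rho`.

```python
import math
import random
from statistics import NormalDist


def power_measure(n, n0, delta, rho, alpha, reps, seed=0):
    """Simulated (FDR, FNR, 1 - FNR - FDR) for the step-up test with the
    critical values (3.3) under an equi-correlated normal model.

    Hypotheses 0..n0-1 have mean 0 (true nulls); the rest have mean delta.
    """
    rng = random.Random(seed)
    z = NormalDist()
    crit = [z.inv_cdf(1 - (n - i + 1) * alpha / n) for i in range(1, n + 1)]
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    fdr = fnr = 0.0
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)  # common component inducing correlation
        x = [a * shared + b * rng.gauss(0.0, 1.0) + (0.0 if i < n0 else delta)
             for i in range(n)]
        order = sorted(range(n), key=x.__getitem__)
        j = n                          # number accepted; default: accept all
        for pos, idx in enumerate(order):
            if x[idx] >= crit[pos]:
                j = pos
                break
        accepted, rejected = order[:j], order[j:]
        if rejected:
            fdr += sum(i < n0 for i in rejected) / len(rejected)
        if accepted:
            fnr += sum(i >= n0 for i in accepted) / len(accepted)
    fdr /= reps
    fnr /= reps
    return fdr, fnr, 1.0 - fnr - fdr


fdr, fnr, power = power_measure(n=20, n0=10, delta=3.0, rho=0.25,
                                alpha=0.05, reps=5000, seed=2)
```

Running the same function over a grid of `n0` and `rho` values would give a table in the spirit of the comparison described above, restricted to the step-up procedure.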
However, as the statistics become more dependent, and the number of true null hypotheses is not too large, the performance of the Benjamini-Hochberg test becomes better than that of any other step-up-step-down test. The Benjamini-Liu step-down procedure has an inconsistent performance compared to the others. It dominates in an extreme situation when both $n_0$ and $\rho$ are very small; otherwise it is weak. For instance, with $n = 100$ and $\delta = 3$, the $1 - \mathrm{FNR} - \mathrm{FDR}$ of the BL procedure is five times as large as that for any of the step-up-step-down procedures when $n_0 = 0$ and $\rho = 0.25$, but it quickly becomes less dominant as $n_0$ and $\rho$ become larger. This can be explained by the fact that the BL procedure terminates and declares all of the null hypotheses to be false if it continues up to $n - [n\alpha]$ steps without accepting any of them. Thus, it allows rejections of many null hypotheses irrespective of the values of the corresponding test statistics. This causes $1 - \mathrm{FNR} = P\{\text{rejecting all } H_i\}$ to be relatively quite high when $n_0 = 0$, even though $\mathrm{FDR} = 0$, particularly when $\delta$ is large. When $\rho$ is large, the BL procedure has a small chance of early termination and does not have a dramatic performance. In conclusion, while none of these procedures dominates, the Benjamini-Hochberg procedure appears to have an edge over the others in most situations.
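The early-termination behaviour of the BL procedure can be read off its critical values. The sketch below evaluates the null cdf levels $F(c_{i:n}) = [1 - \min(1, n\alpha/i)]^{1/i}$ stated in Section 3 and counts how many of them are zero, which corresponds to critical values of $-\infty$. The function name is hypothetical, and exact fractions are used only to avoid rounding at the boundary $i = n\alpha$.

```python
from fractions import Fraction


def bl_null_cdf_levels(n, alpha):
    """Levels F(c_{i:n}) = [1 - min(1, n * alpha / i)] ** (1 / i), i = 1..n.

    alpha should be a Fraction so that n * alpha / i is computed exactly.
    """
    levels = []
    for i in range(1, n + 1):
        inner = 1 - min(Fraction(1), n * alpha / i)
        levels.append(float(inner) ** (1.0 / i))
    return levels


levels = bl_null_cdf_levels(100, Fraction(1, 20))   # n = 100, alpha = 0.05
always_rejected = sum(1 for v in levels if v == 0.0)
```

A level of zero means the corresponding ordered hypothesis is rejected whatever its statistic, once the step-down pass reaches it; with $n = 100$ and $\alpha = 0.05$ that happens for the $[n\alpha]$ smallest ordered statistics.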

6 Concluding Remarks

A concept of unbiasedness of an FDR- or FNR-controlling multiple testing procedure is introduced in this article. An unbiasedness criterion is defined based on the consideration that the expected proportion of correct decisions should not fall below the expected proportion of incorrect decisions in a good procedure. We have chosen $1 - \mathrm{FNR}$ (or $1 - \mathrm{FDR}$) as the measure of correct decisions when FDR (or FNR) is considered to be the measure of incorrect decisions, and have defined $\mathrm{FDR} + \mathrm{FNR} \le 1$ as the unbiasedness condition. While we have focused in this article on proving this unbiasedness property for certain FDR-controlling procedures, an analog of this result for FNR-controlling procedures can also be established.

As in single-hypothesis testing, the unbiasedness criterion imposes an optimality condition on FDR- or FNR-controlling procedures and defines a meaningful sub-class of procedures to be studied further in terms of other operating characteristics. The quantity $\mathrm{FDR} + \mathrm{FNR}$ (or $1 - \mathrm{FNR} - \mathrm{FDR}$) provides such an operating characteristic as it measures the strength of unbiasedness. We call it a measure of power of an FDR- or FNR-controlling procedure. The procedure with the smallest value of $\mathrm{FDR} + \mathrm{FNR}$, if there is any, would be the most powerful in this class. On the other hand, a procedure that is uniformly dominated in terms of $\mathrm{FDR} + \mathrm{FNR}$ by another procedure would be inadmissible. The quantity $\mathrm{FDR} + \mathrm{FNR}$ was also considered by Genovese and Wasserman (2002), who suggested using $\mathrm{FDR} + \lambda\, \mathrm{FNR}$ as a measure of risk for some fixed user-specified value of $\lambda$.

It is important to emphasize that our definition of unbiasedness of an $\alpha$-level FDR procedure implicitly requires that the inequalities $\mathrm{FDR} \le \alpha$ and $\mathrm{FDR} + \mathrm{FNR} \le 1$ be true for all values of $(n_0, n_1)$ and the underlying parameters, with strict inequality holding for at least one of these values. To further clarify this, we will consider two examples, hinted by a referee, in the following.

Example 1.
Consider a procedure that randomly selects $R$ of the $n$ null hypotheses and rejects all of them, with the value of $R \in [0, n]$ being chosen according to a certain probability distribution. The FDR of this test procedure is $(n_0/n) P\{R > 0\}$, as $E(V \mid R = r) = r(n_0/n)$, and similarly the FNR is $(n_1/n) P\{R < n\}$. Thus, this procedure is an FDR-controlling unbiased procedure at level $\alpha$ iff $P\{R > 0\} \le \alpha$.

Example 2. Let $X_i \sim f(x, \theta_i)$, which is stochastically increasing in $\theta_i$, and let the $X_i$'s be independent. Consider testing $H_0: \theta_i = \theta_0$ against $\theta_i > \theta_0$, for $i = 1, \ldots, n$. Suppose that we have a procedure that, with probability $1 - \alpha$, accepts all of the null hypotheses and, with probability $\alpha$, rejects only the one null hypothesis that corresponds to $\min_{1 \le i \le n} X_i$. For this procedure,

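$$\mathrm{FDR} = \alpha \sum_{i \in I_0} P\{X_i = \min_{1 \le j \le n} X_j\},$$

and

$$\mathrm{FNR} = (1 - \alpha)\frac{n_1}{n} + \alpha\left[\frac{n_1}{n-1} \sum_{i \in I_0} P\{X_i = \min_{1 \le j \le n} X_j\} + \frac{n_1 - 1}{n-1} \sum_{i \in I_1} P\{X_i = \min_{1 \le j \le n} X_j\}\right].$$

Clearly, the FDR is controlled at $\alpha$. However, if $\lim_{\theta_i \to \infty} P\{X_i > x\} = 1$ for any fixed $x$, which is true for many distributions, we notice that

$$\sum_{i \in I_0} P\{X_i = \min_{1 \le j \le n} X_j\} \to \sum_{i \in I_0} P\{X_i = \min_{j \in I_0} X_j\} = 1,$$

and hence

$$\mathrm{FDR} + \mathrm{FNR} \to \alpha + (1 - \alpha)\frac{n_1}{n} + \alpha\frac{n_1}{n-1}, \tag{6.1}$$

as $\theta_i \to \infty$, for each $i \in I_1$. Taking $n_1 = n - 1$ in (6.1), we see that $\mathrm{FDR} + \mathrm{FNR} > 1$ when $\alpha > 1/(n+1)$. In other words, for sufficiently large values of $\theta_i$ for $i \in I_1$, the inequality $\mathrm{FDR} + \mathrm{FNR} \le 1$ does not hold when $(n_0, n_1) = (1, n-1)$ and $n$ is not small, thereby proving that this procedure, even though it controls the FDR, is not unbiased.

On the other hand, if the null hypothesis corresponding to $\max_{1 \le i \le n} X_i$, rather than $\min_{1 \le i \le n} X_i$, is rejected with probability $\alpha$, the resulting multiple testing procedure controls the FDR at $\alpha$ and is also unbiased. This is because, since

$$P\{X_i = \max_{1 \le j \le n} X_j\} \le P_{\theta_0, \ldots, \theta_0}\{X_i = \max_{1 \le j \le n} X_j\} = 1/n, \quad \text{for all } i \in I_0,$$

under the stochastic increasing property of $X_i$, we have

$$\mathrm{FDR} = \alpha \sum_{i \in I_0} P\{X_i = \max_{1 \le j \le n} X_j\} \le \frac{n_0}{n}\alpha,$$

and

$$\begin{aligned}
\mathrm{FNR} &= (1 - \alpha)\frac{n_1}{n} + \alpha\left[\frac{n_1}{n-1} \sum_{i \in I_0} P\{X_i = \max_{1 \le j \le n} X_j\} + \frac{n_1 - 1}{n-1} \sum_{i \in I_1} P\{X_i = \max_{1 \le j \le n} X_j\}\right] \\
&= (1 - \alpha)\frac{n_1}{n} + \alpha\left[\frac{n_1 - 1}{n-1} + \frac{1}{n-1} \sum_{i \in I_0} P\{X_i = \max_{1 \le j \le n} X_j\}\right] \\
&\le (1 - \alpha)\frac{n_1}{n} + \alpha\left[\frac{n_1 - 1}{n-1} + \frac{n_0}{n(n-1)}\right],
\end{aligned}$$

implying that $\mathrm{FDR} \le \alpha$ and $\mathrm{FDR} + \mathrm{FNR} \le (1 - \alpha)(n_1/n) + \alpha \le 1$.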
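The limit (6.1) and the contrast between the min-based and max-based rules of Example 2 can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it takes $X_i \sim N(\theta_i, 1)$ with $\theta_0 = 0$ and a common value `theta` for all false nulls, and it estimates by simulation the probability that the selected statistic belongs to a true null. The function names `limit_sum`, `fdr_fnr` and `estimate_s0` are hypothetical and do not come from the paper.

```python
import random


def limit_sum(n, n1, alpha):
    """Right-hand side of (6.1): the limit of FDR + FNR for the min-based rule."""
    return alpha + (1 - alpha) * n1 / n + alpha * n1 / (n - 1)


def fdr_fnr(n, n1, alpha, s0):
    """FDR and FNR of the Example 2 rule.

    s0 is the probability that the selected (min or max) statistic
    belongs to a true null, i.e. the sum over I_0 in the text.
    """
    fdr = alpha * s0
    fnr = (1 - alpha) * n1 / n + alpha * (n1 * s0 + (n1 - 1) * (1 - s0)) / (n - 1)
    return fdr, fnr


def estimate_s0(n, n1, theta, pick, reps, seed=0):
    """Monte Carlo estimate of s0 under the assumed normal model.

    pick is the builtin min or max; the first n - n1 indices are true nulls.
    """
    rng = random.Random(seed)
    n0 = n - n1
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0 if i < n0 else theta, 1.0) for i in range(n)]
        chosen = pick(range(n), key=xs.__getitem__)  # index of the min or max
        hits += chosen < n0
    return hits / reps
```

For example, with `n = 10`, `n1 = 9` and `alpha = 0.1` (which exceeds 1/(n+1)), `limit_sum(10, 9, 0.1)` is slightly above 1, while plugging `estimate_s0(10, 9, 8.0, max, 500)` into `fdr_fnr` gives a sum below 1, in line with the argument above.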

It is interesting to notice that the FDR-controlling multiple testing procedure in Example 2, which is seen to be biased (or unbiased) according to our definition, is actually developed using tests that are biased (or unbiased) in a different sense, i.e., they are less (or more) likely to reject false null hypotheses than true null hypotheses. This raises an interesting question: How does the present unbiasedness property of an FDR-controlling multiple testing procedure relate to the unbiasedness condition of the individual tests? This is discussed in a different communication (Sarkar, 2003).

There is a slight difference between the concept of unbiasedness of a multiple testing procedure defined here in terms of its FDR and FNR and the classical concept of unbiasedness of a single test defined in terms of its Type I and Type II errors. A multiple testing procedure controlling the FDR at $\alpha$ is called unbiased according to our definition if $1 - \mathrm{FNR} \ge \mathrm{FDR}$, whereas a single test controlling the Type I error rate at $\alpha$ is unbiased if $1 - P(\text{Type II error}) \ge \alpha$ (Lehmann, 1986). Why haven't we defined the unbiasedness of an $\alpha$-level FDR-controlling multiple testing procedure by the inequality $1 - \mathrm{FNR} \ge \alpha$? The reason is that when, for example, all null hypotheses are false, yet the underlying parameters are very close to their values under the null hypotheses, $1 - \mathrm{FNR}$, which reduces to $P(\text{rejecting all } H_i)$, could be very close to 0 (see, e.g., the simulation results in Tables 1-2 for the normal case). Thus, in this situation, $1 - \mathrm{FNR}$ is not greater than or equal to $\alpha$, but it is greater than or equal to the FDR (which is of course 0).

7 APPENDIX

Proof of Theorem 4.1. First note that

$$\{X_{j:n} \ge c_{j:n}, \ldots, X_{r:n} \ge c_{r:n}\} = \{A \le j - 1\}, \quad \text{for } j \le r,$$

and

$$\{X_{r:n} < c_{r:n}, \ldots, X_{j:n} < c_{j:n}\} = \{A \ge j\}, \quad \text{for } j \ge r.$$

Hence,

$$\begin{aligned}
&\sum_{j=2}^{r} E\left[\phi^{r}_{j,n}(X_i)\left\{\frac{I(X_i \le c_{j-1:n})}{j-1} - \frac{I(X_i \le c_{j:n})}{j}\right\}\right] \\
&\quad = \sum_{j=2}^{r} \frac{1}{j-1} P\{A \le j-1, X_i \le c_{j-1:n}\} - \sum_{j=2}^{r} \frac{1}{j} P\{A \le j-1, X_i \le c_{j:n}\} \\
&\quad = \sum_{j=1}^{r-1} \frac{1}{j} P\{A = j, X_i \le c_{j:n}\} - \frac{1}{r} P\{A \le r-1, X_i \le c_{r:n}\}.
\end{aligned} \tag{7.1}$$

Similarly, we have

$$\begin{aligned}
&\sum_{j=r+1}^{n} E\left[\psi^{r}_{j,n}(X_i)\left\{\frac{I(X_i \le c_{j:n})}{j} - \frac{I(X_i \le c_{j-1:n})}{j-1}\right\}\right] \\
&\quad = \sum_{j=r+1}^{n} \frac{1}{j} P\{A = j, X_i \le c_{j:n}\} - \frac{1}{r} P\{A \ge r+1, X_i \le c_{r:n}\}.
\end{aligned} \tag{7.2}$$

Using (7.1) and (7.2) in the right-hand side of (4.2), we get the right-hand side of (4.3), which is the FNR. This proves the theorem.

Acknowledgements. The author is grateful to Gengqian Cai for preparation of the tables, and to two referees for many valuable comments.

References

Abramovich, F., Benjamini, Y., Adaptive thresholding of wavelet coefficients. Computational Statistics and Data Analysis 225.
Abramovich, F., Benjamini, Y., Donoho, D. L., Johnstone, I. M., Adapting to unknown sparsity by controlling the false discovery rate. Technical Report No. Department of Statistics, Stanford University.
Basford, K. E., Tukey, J. W., Graphical profiles as an aid to understanding plant breeding experiments. Journal of Statistical Planning and Inference 57.
Benjamini, Y., Hochberg, Y., Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57.
Benjamini, Y., Liu, W., A step-down multiple hypothesis testing procedure that controls the false discovery rate under independence. Journal of Statistical Planning and Inference 82.
Benjamini, Y., Yekutieli, D., The control of the false discovery rate in multiple testing under dependence. Annals of Statistics 29.
DasGupta, S., Sarkar, S. K., On TP$_2$ and log-concavity. Inequalities in Statistics and Probability, IMS Lecture Notes - Monograph Series 5.

Drigalenko, E. I., Elston, R. C., False discoveries in genome scanning. Genetic Epidemiology 14.
Efron, B., Tibshirani, R., Storey, J. D., Tusher, V., Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96.
Finner, H., Roters, M., On the false discovery rate and expected type I errors. Biometrical Journal 43.
Finner, H., Roters, M., Multiple hypotheses testing and expected number of type I errors. Annals of Statistics 30.
Genovese, C., Wasserman, L., Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society, Series B 64.
Karlin, S., Total Positivity. Stanford Press, CA.
Karlin, S., Rinott, Y., Classes of orderings of measures and related correlation inequalities I: Multivariate totally positive distributions. Journal of Multivariate Analysis 10.
Lehmann, E., Testing Statistical Hypotheses, 2nd Edition. Wiley, New York.
Liu, W., Multiple tests of a non-hierarchical finite family of hypotheses. Journal of the Royal Statistical Society, Series B 58.
Sarkar, S. K., Some probability inequalities for ordered MTP$_2$ random variables: A proof of the Simes conjecture. Annals of Statistics 26.
Sarkar, S. K., Some results on false discovery rate in stepwise multiple testing procedures. Annals of Statistics 30.
Sarkar, S. K., False discovery and false non-discovery rates in single-step multiple testing procedures. Technical Report, Temple University.
Sarkar, S. K., Chang, C. K., Simes method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association 92.
Simes, R. J., An improved Bonferroni procedure for multiple tests of significance. Biometrika 73.
Storey, J. D., Tibshirani, R., Estimating false discovery rates under dependence with applications to DNA microarrays. Technical Report, Department of Statistics, Stanford University.
Tamhane, A. C., Liu, W., Dunnett, C. W., A generalized step-up-down multiple test procedure. The Canadian Journal of Statistics 26.
Williams, V. S. L., Jones, L. V., Tukey, J. W., Controlling error in multiple comparisons with examples from state-to-state differences in educational achievement. Journal of Educational and Behavioral Statistics 24.

Table 1. Simulated values of $1 - \mathrm{FNR} - \mathrm{FDR}$, $n = 100$, $\delta = 1$ (headings: $\rho$, $r$, $n_0/n$, BL).

Table 2. Simulated values of $1 - \mathrm{FNR} - \mathrm{FDR}$, $n = 100$, $\delta = 3$ (headings: $\rho$, $r$, $n_0/n$, BL).
