This paper has been submitted for consideration for publication in Biometrics

Similar documents
Familywise Error Rate Controlling Procedures for Discrete Data

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

Statistica Sinica Preprint No: SS R1

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo

Stat 206: Estimation and testing for a mean vector,

On weighted Hochberg procedures

arxiv: v1 [math.st] 31 Mar 2009

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

Modified Simes Critical Values Under Positive Dependence

Testing a secondary endpoint after a group sequential test. Chris Jennison. 9th Annual Adaptive Designs in Clinical Trials

On Generalized Fixed Sequence Procedures for Controlling the FWER

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007

Control of Directional Errors in Fixed Sequence Multiple Testing

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

Step-down FDR Procedures for Large Numbers of Hypotheses

More powerful control of the false discovery rate under dependence

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation

Multiple Dependent Hypothesis Tests in Geographically Weighted Regression

Lecture 7 April 16, 2018

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

A Mixture Gatekeeping Procedure Based on the Hommel Test for Clinical Trial Applications

A Gatekeeping Test on a Primary and a Secondary Endpoint in a Group Sequential Design with Multiple Interim Looks

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Multiple Testing. Anjana Grandhi. BARDS, Merck Research Laboratories. Rahway, NJ Wenge Guo. Department of Mathematical Sciences

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

Non-specific filtering and control of false positives

Applying the Benjamini Hochberg procedure to a set of generalized p-values

The Pennsylvania State University The Graduate School Eberly College of Science GENERALIZED STEPWISE PROCEDURES FOR

Hunting for significance with multiple testing

ROI ANALYSIS OF PHARMAFMRI DATA:

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University

McGill University. Faculty of Science MATH 204 PRINCIPLES OF STATISTICS II. Final Examination

High-throughput Testing

INTRODUCTION TO INTERSECTION-UNION TESTS

Adaptive designs beyond p-value combination methods. Ekkehard Glimm, Novartis Pharma EAST user group meeting Basel, 31 May 2013

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors

SIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1

On adaptive procedures controlling the familywise error rate

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /8/2016 1/38

Testing hypotheses in order

Large-Scale Multiple Testing of Correlations

Resampling-Based Control of the FDR

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

STAT Section 5.8: Block Designs

Large-Scale Multiple Testing of Correlations

Journal of Statistical Software

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018

Inverse Sampling for McNemar s Test

Group Sequential Designs: Theory, Computation and Optimisation

Rank-sum Test Based on Order Restricted Randomized Design

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Unit 14: Nonparametric Statistical Methods

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions

Adaptive Designs: Why, How and When?

A3. Statistical Inference Hypothesis Testing for General Population Parameters

Statistical Applications in Genetics and Molecular Biology

Alpha-Investing. Sequential Control of Expected False Discoveries

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

October 1, Keywords: Conditional Testing Procedures, Non-normal Data, Nonparametric Statistics, Simulation study

Reports of the Institute of Biostatistics

Chapter 7: Hypothesis testing

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

False discovery rate control for non-positively regression dependent test statistics

Chapter 7 Fall Chapter 7 Hypothesis testing Hypotheses of interest: (A) 1-sample

Answer keys for Assignment 10: Measurement of study variables (The correct answer is underlined in bold text)

Multiple testing: Intro & FWER 1

Extending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie

Performance of Deming regression analysis in case of misspecified analytical error ratio in method comparison studies

Statistical Applications in Genetics and Molecular Biology

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

High-Throughput Sequencing Course

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

ROI analysis of pharmafmri data: an adaptive approach for global testing

Two-stage stepup procedures controlling FDR

Christopher J. Bennett

Non-parametric Tests

arxiv: v2 [stat.me] 31 Aug 2017

Tests for spatial randomness based on spacings

Design of the Fuzzy Rank Tests Package

MULTIPLE TESTING PROCEDURES AND SIMULTANEOUS INTERVAL ESTIMATES WITH THE INTERVAL PROPERTY

A Robust Test for Two-Stage Design in Genome-Wide Association Studies

Population Variance. Concepts from previous lectures. HUMBEHV 3HB3 one-sample t-tests. Week 8

HYPOTHESIS TESTING. Hypothesis Testing

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection

A GENERAL DECISION THEORETIC FORMULATION OF PROCEDURES CONTROLLING FDR AND FNR FROM A BAYESIAN PERSPECTIVE

False Discovery Control in Spatial Multiple Testing

Stat 516, Homework 1

Statistical Applications in Genetics and Molecular Biology

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Rigorous Science - Based on a probability value? The linkage between Popperian science and statistical analysis

Lecture 2: Descriptive statistics, normalizations & testing

Statistical testing. Samantha Kleinberg. October 20, 2009

Transcription:

BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza - University of Rome, Piazzale Aldo Moro, 5, 00185 Rome, Italy *email: alessio.farcomeni@uniroma1.it and L. Finos Department of Statistical Sciences University of Padua, Via Cesare Battisti 241, 35121 Padova, Italy *email: livio@stat.unipd.it SUMMARY: We propose a multiple testing procedure controlling the false discovery rate. The procedure is based on a possibly data driven ordering of the hypotheses, which are tested at the uncorrected level q until a suitable number is not rejected. When the order is data driven, larger effect sizes are considered first, therefore selecting more interesting hypotheses with larger probability. The proposed procedure is valid under independence for the test statistics. We also propose a modification which makes our procedure valid under arbitrary dependence. It is shown in simulation that we compare particularly well when the sample size is small. We conclude with an application to identification of molecular signatures of intracranial ependymoma. The methods are implemented in an R package (somemtp), freely available on CRAN. This paper includes supplementary material online. KEY WORDS: Data driven order; False Discovery Rate; Multiple testing. This paper has been submitted for consideration for publication in Biometrics

Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses 1 This Web Appendix is organized as follows: in Section 1 we provide the proofs to Lemma 1, Theorem 1, Theorem 2 and Theorem 3 of the main paper. Section 2 reports the results of additional simulation studies. In Section 3 we briefly describe an example in which the test statistics are ordered a-priori, to complement the example with data driven order in the main paper. 1. Proofs of Lemma and Theorems 1.1 Proof of Lemma 1 We begin by noting that N 1 0 (i) is a sum of i independent indicators, and the process stopped at the i-th hypothesis since J(i, q) successes, each of probability at least 1 q, are obtained. A success for this negative binomial trial is defined as p (j) > q. At the same time, for all j < i, there were less than J(j, q) successes. We have the following assumption on J(i, q): J(i 1, q) J(i 2, q) for any i 1 i 2. (1) We begin by noting that if J(i, q) = J(q), that is, (1) holds with J(i 1, q) = J(i 2, q) for all i 1, i 2, we have a negative binomial trial. This is exactly the setting of Finos and Farcomeni (2011), Theorem 1. If J(i, q) = J(q), note that Pr(N 1 0 (i) k J(q)) = Pr(N 1 0 k I = i J(q)) = Pr(I = i N 1 0 k J(q)) Pr(N 1 0 k J(q)) Pr(N 1 0 k J(q)), where I denotes the random variable indicating the position of the last hypothesis reached by the algorithm, we have explicited that the algorithm stops after J(q) successes by conditioning; and the first equality follows from the very definition of N 1 0 (i). For the marginal number of type I errors N 1 0 it is shown in Finos and Farcomeni (2011), Theorem

2 Biometrics, 1, that the thesis holds under the assumption that J(i, q) does not depend on i. Given that the proof of this statement is exactly the same as that of point (i) in Finos and Farcomeni (2011), Theorem 1, we omit it. We therefore have Pr(N 1 0 k J(q)) 1 F Bneg(J(q),1 q) (k) and consequently the thesis: Pr(N 1 0 (i) k J(q)) 1 F Bneg(J(q),1 q) (k 1). To see that the thesis holds also under (1) we can use Theorem 1 in Finos and Farcomeni (2011) and the following reasoning: when the number of successes prescribed before stopping is smaller, the algorithm stops earlier and hence there is a lower number of type I errors. We now explicit this reasoning more formally. Let H 0 denote the set of indices associated with true null hypotheses. We have: Pr(N 1 0 (i) k) = Pr( I(p (j) > q) k j H 0 1,...,i i i 1 I(p (j) > q) J(i, q) I(p (j) > q) < J(i 1, q)... I(p (1) > q) = 0) = Pr( I(p j > q) k j H 0 1,...,i i i 1 I(p j > q) J(i, q) I(p j > q) < J(i 1, q)... I(p 1 > q) = 0) Pr( I(p j > q) k j H 0 1,...,i i i 1 I(p j > q) J(i, q) I(p j > q) < J(i, q)... I(p 1 > q) < J(i, q)) = Pr(N 1 0 k I = i J(i, q)) Pr(N 1 0 k J(i, q)) 1 F Bneg(J(i,q),1 q) (k 1) where we explicited the algorithm at the first step, used the hypothesis on a priori ordering at the second, (1) at the third. At the third step we now obtain a classical negative binomial trial, with number of successes J(i, q). Consequently, as long as the algorithm is stopped

Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses 3 on or before the i-th hypothesis, we can use Finos and Farcomeni (2011), Theorem 1, to show the last step. The result holds for any i > 0, hence the thesis. 1.2 Proof of Theorem 1 Note that (1) trivially holds in our definition J(i, q) = i(1 q)/(2 q). We can therefore use Lemma 1 to study the expected value of N 1 0 (i): E[N 1 0 (i)] = i Pr(N 1 0 (i) j) (2) i 1 F Bneg(J(i,q),1 q) (j 1) 1 F Bneg(J(i,q),1 q) (j 1) = J(i, q)q/(1 q). Our proposed procedure leads us to stop testing after u p-values have been reached by the sequential testing. Note that u is a random variable, and that there are at least u J(u, q) rejections if u > 0, and 0 rejections if u = 0. We can now bound the : [ ] N1 0 (u) F DR = E R(u) [ ] N1 0 (i) sup E i R(i) E[N 1 0 (i)] sup i i J(i, q) J(i, q)q sup i (1 q)(i J(i, q)), where we used (2) at the last step. If we now fix J(i, q) i(1 q)/(2 q), we have the following inequality for the last expression: sup i J(i, q)q (1 q)(i J(i, q)) iq(1 q) sup i (2 q)(i i(1 q)/(2 q)) i(1 q) = q i(1 q) = q. Consequently, if we fix u using the proposed algorithm, we have F DR q under independence.

4 Biometrics, 1.3 Proof of Theorem 2 Define p jki = Pr(p j q D u = i R = k j H 0 ) (i.e. the probability that a true null hypothesis is rejected and the procedure stops at step i 1 after k rejections). Recall that H 0 denotes the set of indices associated with true null hypotheses. Note that p (j)ki = p jki given the assumptions on a priori ordering. The representation we now use has some similarities with that in the main result of Benjamini and Yekutieli (2001). Let us remark that m i=1 ik=1 p jki = Pr(p j q D ( m i=1 i k=1 u = i R = k) j H 0 ) = Pr(p j q D j H 0 ) q/ m (2 q)/(j + 1). The can be written as (note that if k < i/(2 q) there are no rejections) [ ] N1 0 E = m i p jki R j H 0 i=1 k=1 k = j H 0 i>j k i/(2 q) Consequently, E [ ] N1 0 R = p jki k j H 0 i>j k i/(2 q) 2 q p jki j H 0 i>j i k i/(2 q) 2 q p jki j H 0 j + 1 i>j k i/(2 q) 2 q m j H 0 j + 1 q/ 2 q j + 1 q p jki k 1.4 Proof of Theorem 3 Under the model and the null hypothesis, M j is a sufficient complete statistic for σ j. By assumption the test statistic T j is ancillary to the dispersion parameter σ j under its corresponding H 0, therefore M j is independent of the test statistic T j through Basu theorem. This holds for all p-values calculated by transformation of the null CDF of test statistics T j, even with arbitrarily dependent errors (see for instance Läuter et al. (1998), Finos (2011)). As a consequence, the ordering on which any hypothesis j is tested does not depend on the value of T j whenever the corresponding null hypothesis is true. The corresponding p-value

Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses 5 is therefore upper bounded by a uniform distribution under the null for assigned testing rank. We can now use Theorem 1 under independence or Theorem 2 without assumptions on dependence. 2. Additional simulation studies We give in this section an account of additional simulation studies. The setting is as follows: we perform one-sample t-tests, with data generated from standard normals under the null hypothesis. We let n = 5, 10, 20, 50; m = 100, 1000, 10000; and q = 0.01, 0.05, 0.10. We fix the mean under the alternative hypotheses so that the single tests have a prescribed power of 70% when q = 0.05 (hence, the mean under the alternative decreases as the sample size n increases). The position of the false hypotheses are selected randomly at each iteration, and unless stated otherwise the number of false nulls is set so that 10% of the m hypotheses are false. For each setting we generate the data, compute p-values, and apply the Benjamini and Hochberg (1995) () and Benjamini and Yekutieli (2001) () procedures; together with our procedure with data-driven order of the hypotheses () and the version for general dependence (GD). We perform 10000 Monte Carlo iterations and estimate power through the fraction of correctly rejected hypotheses over the number of false nulls. 2.1 Variability In this section we replicate the independence setting in the main paper, but we now report on the variability of the measures. Given that the procedure seems to be strict at the initial stages and liberal afterwards, it could happen that our procedure could stop very early in certain cases and proceed very far in other cases. This would imply a large variability of the power measure P ow = N 1 1 M 1.

6 Biometrics, In order to check this point, we compare our procedure () with the Benjamini and Hochberg (1995) () procedure. We compute the coefficient of variation of P ow, as defined above, and of the quantity F DP = N 1 0 R + I(R = 0). We report the ratio of the two coefficients of variation, and the raw coefficients of variation for P ow in Table 3 for different values of n and m. [Table 1 about here.] The most important message of Table 3 is given by the column regarding CV (P ow ), which reports fairly small coefficients of variation in most cases, except maybe when n = 50 and m = 100, 1000. Hence, procedure with small or moderate sample size does not yield a power with large variability. When n = 5, 10 the CV (P ow) of the procedure is actually smaller than or approximately equal to that of the procedure. When n 20, the procedure leads to a very small coefficient of variation for our measure of power. For what concerns the variability of the error measure, we have a similar result: when n < 20, procedure gives a smaller variability than the procedure, and for n 20 the reverse holds. As a referee noted, the variance of the FDP may be inflated for both procedures under dependence (see e.g. Owen (2005); Farcomeni (2006, 2007)). This simulation study complements the simulations in the main paper and in this Web Appendix, and confirms that the procedure is particularly suited for cases in which the sample size is relatively small. 2.2 control under block dependence In this Section we generate data as described in Section 2, with dependent test statistics. We use a block dependence assumption in order to mimic a genomic situation (it is frequently assumed that genes are dependent in blocks of size 20-30). We then generate data in blocks

Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses 7 of size 25, in which observations within each block arise from a multivariate normal with a covariance matrix with diagonal elements equal to 1 and off-diagonal elements equal to ρ = 0.3. Figure 1 and Figure 2 report the estimated power and the estimated level of the multiple test, respectively. [Figure 1 about here.] [Figure 2 about here.] It can be seen that the nominal error rate is not exceeded by the procedures. It actually is very hard to find simulated examples which lead the error rates to exceed the nominal error rate, see for instance Benjamini and Yekutieli (2001) and Reiner-Benaim (2007). It can be seen that the procedures valid under general dependence become more conservative. Our and GD procedures outperform the respective competitors for n < 20, while for n = 50 they are outperformed by and, respectively. When n = 20 the comparison is mixed. It shall be noted that even if there is some mild dependence in the data, all procedures still show a good power, which is comparable to that under independence as illustrated in the main paper. 3. 1,25-dihydroxycholecalciferol in kidney transplant patients 1,25-dihydroxycholecalciferol (1,25-DHCC) is a physiologically active form of Vitamin D. Vitamin D can be acquired either with the diet or by the metabolism of 7-dehydrocholesterol, after exposure to ultraviolet light. Vitamin D is then transported to the liver where it is converted into 25-hydroxycholecalciferol (25-HCC). 25-HCC is finally transported to the kidney, which converts it to the active 1,25-DHCC. The active form is an important hormone which regulates calcium absorption and mobilization, among other things. While the benefits and importance of an adequate Vitamin D intake (by means of the diet or by

8 Biometrics, exposition to sunlight) are well recognized, it is apparent that the ability of the body to actually use the Vitamin D is linked to liver and kidney health. In this study we compare 135 patients who have undergone a kidney transplant with 290 controls. For each of them we have a measurement of 25-HCC (in one of five classes) and 1,25-DHCC from blood samples. Our aim is to identify doses of 25-HCC in which transplant patients have a significantly different amount of 1,25-DHCC. If these are significantly lower, the study would help us measure what amounts of 25-HCC can a transplanted kidney bear as compared to a native kidney. A paper discussing the findings from a clinical perspective is in preparation. We can proceed by pre-specifying the order of testing by considering the p-value associated with the lower 25-HCC class first, and then considering increasing classes of 25-HCC. The class with the largest 25-HCC is tested last (if at all). It is important to underline here that, as a matter of fact, any pre-specified ordering would lead to control of the according to Theorem 1 in the main paper. The a priori order we suggest is justified according to the following reasoning: first of all, we have increasing doses of 25- HCC, which provide a natural ordering of the tests; secondly, we would like to focus on low 25-HCC doses, that is, results of clinical relevance are that associated with low 25- HCC doses. We are mostly interested in subjects with a low vitamin D intake since those subjects are at higher risk. Subjects with a limited kidney functionality but high vitamin D intake may still end up with enough 1,25-DHCC for a safe mineral equilibrium; hence any difference found associated with high initial amounts of 25-HCC may not be clinically relevant. In Table 3 we report class-specific medians of 1,25-DHCC for each group and p-values arising from Mann-Whitney testing. Note that given that the order is pre-specified, we do not need any assumption on the error distribution.

Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses 9 [Table 2 about here.] With our procedure and q = 0.05 we end up rejecting the null hypothesis of no differences between transplants and controls in all but the third 25-HCC class. This corresponds to rejecting all p-values for which p j < 0.05, as if we were in a single inference situation. On the other hand, Benjamini and Hochberg (1995) procedure leads us to reject the null hypothesis only for the last two classes, i.e., for 25-HCC 26. These correspond to very low p-values, but are much less clinically relevant than the first two classes. The additional, important, rejections obtained with the procedure lead us to the following conclusions: when there is a low vitamin D intake, the transplanted kidney may not receive enough 25- HCC to promote the use of enzymes leading to as much active hormone as normal kidneys. Consequently, while suitable levels of vitamin D intake is an important recommendation for every one, it seems even more important for kidney transplants. References Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society (Ser. B) 57, 289 300. Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29, 1165 1188. Farcomeni, A. (2006). More powerful control of the false discovery rate under dependence. Statistical Methods & Applications 15, 43 73. Farcomeni, A. (2007). Some results on the control of the false discovery rate under dependence. Scandinavian Journal of Statistics 34, 275 297. Finos, L. (2011). A note on left-spherically distributed test with covariates. Statistics & Probability Letters 81, 639 641.

10 Biometrics, Finos, L. and Farcomeni, A. (2011). k-fwer control without p-value adjustment, with application to detection of genetic determinants of multiple sclerosis in Italian twins. Biometrics 67, 174 181. Läuter, J., Glimm, E., and Kropf, S. (1998). Multivariate tests based on left-spherically distributed linear scores. Annals of Statistics 26, 1972 1988. Owen, A. (2005). Variance of the number of false discoveries. Journal of the Royal Statistical Society (Ser. B) 67, 411 426. Reiner-Benaim, A. (2007). control by the procedure for two-sided correlated tests with implications to gene expression data analysis. Biometrical Journal 49, 107 126. Received. Revised. Accepted

Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses 11 q =.01, m=100 q =.05, m=100 q =.10, m=100.00.10.25 GD.00.25.50.75 GD.00.25.50.75 GD q =.01, m=1 000 q =.05, m=1 000 q =.10, m=1 000.00.10.25 GD.00.25.50.75 GD.00.25.50.75 GD q =.01, m=10 000 q =.05, m=10 000 q =.10, m=10 000.00.10.25 GD.00.25.50.75 GD.00.25.50.75 GD Figure 1. Proportion of correctly rejected hypotheses for different values of q, n, m. The proportion of false nulls is set at 10%, the power of each single test at 70%. Block dependent test statistics

12 Biometrics, q =.01, m=100 q =.05, m=100 q =.10, m=100 GD GD GD q =.01, m=1 000 q =.05, m=1 000 q =.10, m=1 000 GD GD GD q =.01, m=10 000 q =.05, m=10 000 q =.10, m=10 000 GD GD GD Figure 2. for different values of q, n, m. The proportion of false nulls is set at 10%, the power of each single test at 70%. Block dependent test statistics.

Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses 13 n m CV (F DP ) CV (F DP ) CV (P ow ) CV (P ow ) CV (P ow ) CV (P ow ) CV (F DP ) CV (F DP ) 5 100 0.50 0.26 1.19 0.31 3.27 1.64 10 100 0.90 0.85 0.59 0.50 2.31 2.07 20 100 1.59 2.07 0.41 0.84 1.91 3.05 50 100 2.51 3.87 0.34 1.30 1.79 4.50 5 1000 0.21 0.18 1.17 0.20 2.71 0.56 10 1000 0.74 0.74 0.25 0.19 0.79 0.58 20 1000 1.42 2.92 0.14 0.41 0.65 0.92 50 1000 4.92 10.72 0.11 1.20 0.61 2.98 5 10000 0.10 0.12 1.19 0.15 2.21 0.22 10 10000 0.81 1.29 0.08 0.11 0.25 0.20 20 10000 1.32 3.83 0.05 0.17 0.20 0.27 50 10000 6.02 18.90 0.04 0.67 0.19 1.16 Table 1 Coefficients of variation and their ratios for N 1 0 (F DP ) and N 1 1 R M 1 (P ow), obtained with the and procedures under independence. The results are averaged over B = 10000 replications.

14 Biometrics, 25-HCC class Transplants Controls p-value < 13 35 37.10 0.0451 [13 20) 37.2 43.50 0.03260 [20, 26) 45 50.60 0.48900 [26 34.5) 44.85 52 0.003420 34.5 48.15 Table 2 59.70 0.01720 Class-specific median of 1,25-DHCC and p-value for comparing transplants and controls within that class