Department of Biostatistics University of North Carolina. Institute of Statistics Mimeo Series No Oct 1997

Similar documents
Modified Simes Critical Values Under Positive Dependence

Applying the Benjamini Hochberg procedure to a set of generalized p-values

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao

IMPROVING TWO RESULTS IN MULTIPLE TESTING

On adaptive procedures controlling the familywise error rate

Hochberg Multiple Test Procedure Under Negative Dependence

Lecture 6 April

ON TWO RESULTS IN MULTIPLE TESTING

Biometrika Trust. Biometrika Trust is collaborating with JSTOR to digitize, preserve and extend access to Biometrika.

A class of improved hybrid Hochberg Hommel type step-up multiple test procedures

A Mixture Gatekeeping Procedure Based on the Hommel Test for Clinical Trial Applications

Adaptive, graph based multiple testing procedures and a uniform improvement of Bonferroni type tests.

Control of Directional Errors in Fixed Sequence Multiple Testing

CHL 5225H Advanced Statistical Methods for Clinical Trials: Multiplicity

Group Sequential Designs: Theory, Computation and Optimisation

On Generalized Fixed Sequence Procedures for Controlling the FWER

More Powerful Tests for Homogeneity of Multivariate Normal Mean Vectors under an Order Restriction

Mixtures of multiple testing procedures for gatekeeping applications in clinical trials

MULTISTAGE AND MIXTURE PARALLEL GATEKEEPING PROCEDURES IN CLINICAL TRIALS

Likelihood Ratio Tests and Intersection-Union Tests. Roger L. Berger. Department of Statistics, North Carolina State University

PRINCIPLES OF STATISTICAL INFERENCE

Familywise Error Rate Controlling Procedures for Discrete Data

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo

On weighted Hochberg procedures

Multiple Endpoints: A Review and New. Developments. Ajit C. Tamhane. (Joint work with Brent R. Logan) Department of IE/MS and Statistics

Multiple Dependent Hypothesis Tests in Geographically Weighted Regression

Alpha-Investing. Sequential Control of Expected False Discoveries

The International Journal of Biostatistics

Superchain Procedures in Clinical Trials. George Kordzakhia FDA, CDER, Office of Biostatistics Alex Dmitrienko Quintiles Innovation

Exact and Approximate Stepdown Methods For Multiple Hypothesis Testing

INCOMPLETE MULTIRESPONSE DESIGNS AND SURROGATE ENDPOINTS IN CLINICAL TRIALS. Pranab K.

Katz Family of Distributions and Processes

MONTE CARLO ANALYSIS OF CHANGE POINT ESTIMATORS

Declarative Statistics

INTRODUCTION TO INTERSECTION-UNION TESTS

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Understanding Ding s Apparent Paradox

Statistica Sinica Preprint No: SS R1

Probability and Stochastic Processes

Multiple Testing of General Contrasts: Truncated Closure and the Extended Shaffer-Royen Method

Department of Biostatistics University of North Carolina at Chapel Hill

Adaptive designs beyond p-value combination methods. Ekkehard Glimm, Novartis Pharma EAST user group meeting Basel, 31 May 2013

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

Comparing Adaptive Designs and the. Classical Group Sequential Approach. to Clinical Trial Design

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1

HANDBOOK OF APPLICABLE MATHEMATICS

ON THE USE OF GENERAL POSITIVE QUADRATIC FORMS IN SIMULTANEOUS INFERENCE

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Generalized Multivariate Rank Type Test Statistics via Spatial U-Quantiles

Testing hypotheses in order

Type I error rate control in adaptive designs for confirmatory clinical trials with treatment selection at interim

Duke University. Duke Biostatistics and Bioinformatics (B&B) Working Paper Series. Randomized Phase II Clinical Trials using Fisher s Exact Test

Estimation of a Two-component Mixture Model

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Pubh 8482: Sequential Analysis

Frank C Porter and Ilya Narsky: Statistical Analysis Techniques in Particle Physics Chap. c /9/9 page 331 le-tex

Analysis of variance and linear contrasts in experimental design with generalized secant hyperbolic distribution

On robust and efficient estimation of the center of. Symmetry.

Stat 5101 Lecture Notes

arxiv: v1 [math.st] 28 Feb 2017

Optimal exact tests for multiple binary endpoints

Major Matrix Mathematics Education 7-12 Licensure - NEW

Applied Mathematics Research Report 07-08

An Alpha-Exhaustive Multiple Testing Procedure

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University

n-step mingling inequalities: new facets for the mixed-integer knapsack set

Control of Generalized Error Rates in Multiple Testing

A UNIVERSAL SEQUENCE OF CONTINUOUS FUNCTIONS

Fiducial Inference and Generalizations

A note on tree gatekeeping procedures in clinical trials

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES

Research Note: A more powerful test statistic for reasoning about interference between units

Adaptive Designs: Why, How and When?

Sample size re-estimation in clinical trials. Dealing with those unknowns. Chris Jennison. University of Kyoto, January 2018

ADMISSIBILITY OF UNBIASED TESTS FOR A COMPOSITE HYPOTHESIS WITH A RESTRICTED ALTERNATIVE

On detection of unit roots generalizing the classic Dickey-Fuller approach

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Step-down FDR Procedures for Large Numbers of Hypotheses

Optimal Sequential Procedures with Bayes Decision Rules

INFERENCE APPROACHES FOR INSTRUMENTAL VARIABLE QUANTILE REGRESSION. 1. Introduction

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

Testing Statistical Hypotheses

ADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ

The Skorokhod reflection problem for functions with discontinuities (contractive case)

Adaptive Extensions of a Two-Stage Group Sequential Procedure for Testing a Primary and a Secondary Endpoint (II): Sample Size Re-estimation

Testing the homogeneity of variances in a two-way classification

STAT 461/561- Assignments, Year 2015

Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data

Journal of Biostatistics and Epidemiology

Consonance and the Closure Method in Multiple Testing. Institute for Empirical Research in Economics University of Zurich

Chapter 1 Statistical Inference

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Multistage Tests of Multiple Hypotheses

arxiv: v4 [stat.me] 20 Oct 2017

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

COMPARING TRANSFORMATIONS USING TESTS OF SEPARATE FAMILIES. Department of Biostatistics, University of North Carolina at Chapel Hill, NC.

Estimation of parametric functions in Downton s bivariate exponential distribution

Received: 2/7/07, Revised: 5/25/07, Accepted: 6/25/07, Published: 7/20/07 Abstract

Transcription:

SOME REMARKS ON SIMES TYPE MULTIPLE TESTS OF SIGNIFICANCE by Pranab K. Sen Department of Biostatistics University of North Carolina Institute of Statistics Mimeo Series No. 2177 Oct 1997

Some Remarks on Simes type multiple tests of significance PRANAB K. SEN Departments of Biostatistics and Statistics, University of North Carolina, Chapel Hill, NC 27599-7400 e-mail: pksen@bios.unc.edu Abstract Simes' theorem, posed in a multiple testing setup, has its genesis in the classical ballot theorem in uniform order statistics. The purpose of this note is to comment on this rediscovery of the Ballot theorem in multiple testing theory, and with a view to enhance scope of applications, to stress its ramifications in the same context. This privides sharper error rates. AMS Subject Classifications: 62J15, 62E15. Keywords: Ballot Therorem; Bonferroni procedure; discrete distributions, step-down procedure's, uniform order statistics. 1 Introduction Let a null hypothesis Ho be expressible as the intersection of k(~ 2) components HOb"" HOk' who:>e respestive alternatives are denoted by Hu,..., Hlk; then the alternative hypothesis to the null one is HI that can be expressed as the uninon of the components H Ij, j = 1,..., k. Let TI,...,Tk be the test statistics for testing these component (null vs. alternative) hypotheses, and let their observed significance level (OSL) or the so called p-values be denoted by PI,"" Pk respectively. We denote the ordered p-values by P(I),..., P(k) respectively. Then Simes' (1986) theorem, quite popular in multiple testing theory, can be stated as follows. Theorem 1. If the T j are independent, then under Ho, for every a E (0,1), P{P(j) > ja/k, for all j = 1,..., k} =1- a. (1.1).. Simes (1986) has an elegant proof bsaed on the method of induction. Basically, Simes' theorem is a reinstatement of the classical ballot theorem related to uniform order statistics, as may be found it the texts by Takacs (1967) or Karlin (1969). We shall make this point clear in the next section Wha1 is more in this context is the intricate connection between the two setups that generates addition~.1 multiple testing procedures some of which have already been considered by others [ viz., Hochberg 1988, Hommel, 1988, Hochberg and Rom, 1995, Benjamini and Hochberg, 1996 and Samuel-Cahn, 1996, among others). In Section 3, we will add some further feathers to this hat by incorporating 1

2 PRANAB K. SE,j\' some exact results in this setup. Finally, in Section 4, along with the case of discrete distributions for p-values, the case of dependent tests is discussed with due emphasis on the step-down procedures. 2 The ballot theorems Let U1,..., Uk be k independent and identically distributed (i.i.d.) random variables (r.v.) having the uniform (0, 1) distribution, and let U(1) < '" < U(k) be the associated ordered r.v.'s. Lf'~, Gk(t) = k- 1 2:7=1 1(U1 ~ t), t E (0,1) be the empirical distribution function (d.f.). Then note the' Gk(t) = ilk, for U(i) ~ t < U(i+l), i = 0,1,..., k, where conventionally, Uo = 0, U(k+1) = 1. ballot theorem can then be stated as follows. Theorem 2. For every I ~ 1, and every k ~ 1, t' (2.1) We refer to Karlin (1969, p.251) for an elegant proof. It is easy to verify that Gk(t) < It, Vt E (0,1) :? U(i) > i/(k, ), i = 1,..., k. (2.2).. Therefore, the same probability holds for (2.2). Let us now compare Theorems 1 and 2. Note that under the null hypothesis, the p-values PI,..., Pk are i.i.d.r.v.'s with the uniform (0,1) distribution, so that the P(j) correspond to the uniform order statistics U(j). Hence, letting I = a-i, we readily obtain that (1.1) and (2.1) are equivalent. The Simes procedure provides only a test for the overall null hypothesis. Hommel (1988) coihdered a stagewise rejective multiple testing procedure, while Hochberg (1988) considered an extended Simes procedure wherein he compared P(i) against a/(k - i + 1), i = 1,..., k, and showed that works out well in the multiple hypothesis testing setup. Yochberg and Rom (1995) have some further results in this direction with emphasis on logically related hypotheses (LRH). Keeping these in mind, we consider the following extension of the ballot theorem, where the order statistics U(i) are defined as in before. Theorem 3. For every 0< a1 <... < ak < 1; k ~ 1, P{a} = P{ U(j) ~ aj, j = 1,...,k} (2.3) where for each j(= 1,..., k), and U E (aj, 1),,, and u Hkj(U) =l Hk(j_1)(Uj)duj, j =2,... k; J. Proof. Note that the joint density of U(j), j = 1,..., k is given by k! 1(0 < U1 <... < Uk < 1), HkO(U) = 1(u ~ aj). (2.4) (2.5) (2.6) where {Uj ~ aj,j = 1,..., k} is a subset of the cone {o < U1 <... < Uk < I}. Hence, the rest of the proof follows by some standard arguments.

MULTIPLE COMPARISONS AND SIMES' TEST 3 In passing we may remark that the aj may generally depend on k, and hence the Hkj, even though dependent only on a r, r ~ j, may depend on k. These recursive relations are helpful foj actual numerical evaluation of P {a} in specific situations that would be considered in the next section. 3 Multiple comparisons We consider some specific values of k. For k = 1, we note that by definition P{ad = l-al = H ll(i), so that al = 0:, as it should be. For k = 2, we have so that al = ~a2 = 0:/2 leads to an exact 1-0: probability for P{at, a2}. This is in agreement with Simes (1986) as well as Hochberg (1988). The attainment ofthe exact level 1-0: may also be accomplished with some other combinations of ai, a2 for which the right hand side of (3.1) is equ3j. to 1-0:. For example, allowing al to approach a2 we get the Bonferroni solution a2 =1-,/(l - 0: y, and allowing al to be arbirary close to 0, we have a2 larger than 0:. In this way, a variety of ffiultipl. testing procedures can be considered along the same line as in Hochberg (1988). Consider next i le case of k = 3. Using the recursion relation, we obtain that 6H33(1) = 6(1- at}(1 - a2)(1 - a3) (3.1) -3(1- a2)2(1- a3) - 3(1- at}(1- a3)2 + (1- a3)3. (3.2) If as in Hochberg (1988), we let a = a3 = 2a2 = 3al, then the above expression simplifies to (1 - a)(1 + ta2) that is strictly greater than 1 - a, so that choosing a = 0: the overall significance level is given by 1 - (1-0:)(1 + to: 2 ) that is strictly smaller than 0:. Consequently, solving for (1 - a)(1 + ta2) = 1-0:, we can achieve an exact significance level 0:. It also shows that there are other combinations ofthe aj that lead to the attainment of the exact level 1-0: for P{at, a2, a3}. The recursive relations can be used successively to consider the case of k ~ 3, and instead of the Hochberg (1988) solution, a slightly modified one can be obtained that has an exact size. This way, Simes procedure can be adapted better to multiple hypothesis testing problems. In general for any k ~ 2, if we relate the aj = a. m(j, k), j = 1,..., k where the m(j, k) are preassumed nonnegative constants (and nondecreasing in j), then we may note that Hkk(l) is a polynomial of degree k in a with coefficients that are functions of the preassumed m(j, k)'s. Therefore, we are at liberty to choose a spending junction m(j, k), j ~ k that is appropriate in a particular context and leads to an attainment of the exact significance level 0:. Of course, such a spending function needs to be so chosen that the closure principle referred to in Hochberg (1988) and Hochberg and Rom (1995) holds for the subset hypothesis testing problem. The latter authors considered the LRH family (Shaffer, 1986), and our findings remain pertinent for that as well. There is, however, a basic theoretical query: Can some of these multiple testing procedures be justified on the ground of admissibility or minimaxity? As of now, the major emphasis has been laid down on the overall significance level and

MULTIPLE COMPARISONS AND SIMES' TEST 5 [l distribution of Pj, though concentrated on the unit interval (0, 1), has jump-discontinuitlul aun jump-magnitudes that depend on the underlying null distributions. Therefore, the P j are nolonger distribution-free. In this case, for k = 1, the value ofal may not be exactly equal to Q, and moreover, for a given Q(O < Q < 1), there may not be an exact solution for P{a} = 1 - Q. The situation may depend on whether we define the P j by including or excluding the observed points of the test statistics; as they have discrete null distributions, these two values may not generally agree. The inclusion may result in an OSL level larger while the exclusion may lead to a smaller than the value as obtained in the continuous case. A randomization test can be used to achieve this goal, but then the tie probability explicitly enters the picture, and the Ballot theorems may no longer be tenable. Whether this results in a conservativeness or not depends on the underlying null distributions, and hence, no blanket prescription can be made to suit all cases. Moreover, for different component hypotheses, the null hypothesis distributions might not be the same, and hence, their impact on tilt' distributions of the P j may vary from one to other components. We refer to Samuel-Cahn (19961 for some discussion on conservativeness aspects of the Simes test, and that applies to others as wdl Let us make some comments on the dependent case where the test statistics for the component hypotheses are not stochastically independent of each other (even under the null hypothesis). Again the dependence may be either of the positive association type or negative association type, and in the case of several component hypotheses, there could be a mixed-type dependence too. Thus, the effect of dependence on the Simes-Hochberg type multiple tests may depend on the underlying dependence pattern. If the P j are not stochastically independent, the Simes theorem or the parent ballot theorems may no longer be true. As a simple illustration consider the case where the Pi are exchangeable (i.e., symmetric dependent) r.v.'s, each having marginally the uniform (0, 1) distribution. If the dependence is of negative type the result will be different from the independence case as well as the case of positive dependfence. This can be studied as in Samuel-Cahn (1996) where the bivariate normal case has been dealt with in detail. In multiple tests of significance, the step-down procedure, introduced by J. Roy (1958) for multlvariate analysis and popularized by the pioneering work oflate S. N. Roy and his associates ( Roy ~t al., 1971), occupies a prominent place. The step-down procedure relates to a finite union-intersection principle wherein there are essentially k, a fixed positive integer, component hypotheses that are to be tested in an order of priority or importance. The main feature of this procedure is that the OSL values as obtained in the subsequent steps are quasi-independent under the null hypothesis, though the usual test statistics for the component hypotheses are generally dependent when viewed simultaneously. This finite VI-structure results in a sharper error rate for the component hypotheses, though in terms of consistency, the infinite VI-tests generally perform better. This feature was incorporated by Sen (1983) in proposing a Fisherian detour of the step-down procedure based on the p-values for the subsequent steps. It follows from the same argument that the ballot theorems (2 and 3) apply to these p-values as well, and hence, the Simes-Hochberg multiple tests of significance apply also to the step-down procedures. For some allied results on multiple testing based on OSL values, we refer to Sen (1988); the findings of the current study also pertain to those setups.

4 PRANAB K. SEN family-wise error rates for subfamilies of hypotheses, and when it comes to the question of l,)wer properties, the alternative hypotheses may be too broad to allow a simple resolution. In many problems of practical interest, we encounter multiple tests of significance where k, the number of components, may not be very small. For example, in a clinical trial involving the group sequential methodology, one may have a value of k as large as 10. Although Theorem 3 can still be used in such a context with recursive computations, there remains the open question: Is there any optimality or desirability property of the Simes-Hochberg tests in such a context? As has been discussed in Sen (1998) that in a multiparameter setting, even in most simple parametric models, a universally optimal multiple testing procedure may not generally exit. The choice of a multiple testing procedure is largely guided by practical considerations and robustness cum validity perspectives. In that way the flexibility of choice of the aj, granted by Theorem 3 offers a variety of OSL based tests. In this context, the Fisher method of combinining independent tests has some asymptotic optimality properties; though in finite sample cases and for broader families of alternatives, this asymptotic optimality criterion might not be very appealing. If a group-sequential plan is adopted, the stopping number relating to the stopping of the trial at an intermediate stage has a distributi0u that might depend on the null hypothesis being true or not. Hence, in judging the merits of a testing scheme, these factors are to be taken into account. Guided by robustness considerations, fol lar~e k, it might be appealing to choose a comparatively smaller value r, and let, al <... < ar = ar+l =... = ak (3.3) Thus, effectively more importance is attached to the largest r p-values. The expression for P{a} in Theorem 3 then reduces to P{a} = k[r] 1 Hk(r-l)(u)(l- u)k-rdu, (3.4) Gr where k[r] == k(k - 1)... (k - r + 1), r 2: 1, and k[o] = 1. When we do not want to attach too much importance to just a few components, we may choose the opposite way, namely al =... = ar < ar+l <... < ak, for some r < k. (3 5) This way we can make use of Theorem 3 in a variety of models of practical importance. For large values of k, the weak convergence of the empirical distributional process or equivalently the ulliforn quantile process to a Brownian bridge can be most convenioently used to provide some simple approximations; we refer to Sen (1998) for some allied results. In this specific situation at hand, since we are dealing with uniform order statistics, a value of k as low as 16 would justify such asymptotic approximations. 4 Discrete and dependent cases There has been lot of discussions on the impact of discreteness of the underlying null hypothesis distributions of the test statistics as well as on the case where the component test statistics are not necessarily independent. In the discrete case, the Pj may not have the continuous uniform (0,1) distribution (under the null hypothesis), and as has been noted by Schmid (1958), the null hypothesis

6 PRANAB K. SEN References.. Benjamini, Y, and Hochberg, Y (1996). More on Simes' test. Preprint. Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800-802. Hochberg, Y, and Rom, D. R (1995). Extensions of multiple testing procedures based on Simes'test. J. Statist. Plan. Infer. 48, 141-152. Hommel, G. (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383-386. Karlin, S. (1969). A First Course in Stochastic Processes. Academic Press, New York. Roy, J. (1958). Step-down procedures in multivariate analysis. Ann. Math. Statist. 29, 1177-1188. Roy, S. N., Gnanadesikan, R, and Srivastava, J. N. (1971). Analysis and Design of Certain Quantitative Multiresponse Experiments. Pergamon Press, New York. Samuel-Cahn, E. (1996). Is the Simes' improved Bonferroni procedure conservative? Biometrika, 83, 928-933. Schmid, P. (1958). On the Kolmogorov and Smirnov limit theorems for discontinuous distribution functions. Ann. Math. Statist. 29, 1011-1027. Sen, P. K. (1983). A Fisherian detour of the step-down procedure. In Contributions to Statistics: Essays in Honour of Norman L. Johnson (ed. P. K. Sen), North Holland, Amsterdam, pp. 367-377. Sen, P. K. (1988). Combination of statistical tests for multivariate hypotheses against restricted alternatives. In Advasnces in Multivariate Statistical Analysis (ed. S. Dasgupta and J. K. Ghosh), Indian Statistical Institute, Calcutta, pp. 377-402. Sen, P. K. (1998). Multiple comparisons in interim analysis. Preprint Shaffer, J. P. (1986). Modified sequentially rejective multiple test procedures. J. Amer. Statist. Assoc. 81, 826-831. Simes, R J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73, 751-754. Takacs, L. (1967). Combinatorial Methods in the Theory of Stochastic Processes. John Wiley, New York.