Supplementary Materials: A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies

Anjana Grandhi, Wenge Guo, Shyamal D. Peddada

S1 Notations and Definitions

Let $m$ denote the number of genes in the data, which has gene expressions for each gene on $p$ categories (tumor sizes and normal tissue). Let $\mu_{ij}$ denote the mean response corresponding to the $i$-th category in the $j$-th gene, $i = 1, \ldots, p$, $j = 1, \ldots, m$. The problem of biological interest that we discuss in the context of the uterine fibroid data is to detect genes that are differentially expressed in a tumor size category compared to the normal sample. Thus, if the last category, $p$, corresponds to the normal sample, we need to test whether the pairwise differences $\theta_{ij} = \mu_{ij} - \mu_{pj}$, $i = 1, 2, \ldots, q$ and $j = 1, 2, \ldots, m$, are equal to zero, where $q = p - 1$. For each gene $j$, the pairwise null and alternative hypotheses are

$$H^j_{0i}: \theta_{ij} = 0 \quad \text{against} \quad H^j_{1i}: \theta_{ij} \neq 0, \quad i = 1, \ldots, q. \tag{S1}$$

For each gene $j$, we have a vector of parameters $\theta_j = (\theta_{1j}, \theta_{2j}, \ldots, \theta_{qj})$. We first need to find the genes that are differentially expressed in at least one tumor sample compared to normal samples. Thus, we define the null and alternative screening hypotheses to test the significance of each gene as

$$H^j_{0screen}: \theta_j = 0 \quad \text{against} \quad H^j_{1screen}: \theta_j \neq 0, \quad j = 1, \ldots, m, \tag{S2}$$

for testing whether all parameters $\theta_{ij}$, $i = 1, \ldots, q$, are simultaneously 0 or not, or equivalently, whether all $\mu_{ij}$, $i = 1, \ldots, p$, are equal or not. These hypotheses give rise to $m$ families of hypotheses corresponding to the $m$ genes, with each family having a screening hypothesis and $q$ pairwise hypotheses. Figure S1 shows a simple graphical representation of the structure of hypotheses in our formulation.

Let $x^k_{ij}$ denote the $k$-th observed gene expression of the $j$-th gene in the $i$-th group, $k = 1, \ldots, n_i$, with $n_i$ being the sample size for the $i$-th group, $j = 1, \ldots, m$, $i = 1, \ldots, p$. Let $T_{ij}$ and $P_{ij}$, $i = 1, \ldots, q$, $j = 1, \ldots, m$, denote the test statistics and the p-values, respectively, for testing $H^j_{0i}$.
The test statistics for testing the screening hypotheses $H^j_{0screen}$ are obtained as a function of $T_{ij}$, $i = 1, \ldots, q$, for instance, the highest order statistic of $T_{ij}$, $i = 1, \ldots, q$. We denote the p-values for testing the screening hypotheses as $P^j_{screen}$. For each family $j$ we denote the vector of p-values $P_j = (P_{1j}, P_{2j}, \ldots, P_{qj})$, based on the test statistics $T_j = (T_{1j}, T_{2j}, \ldots, T_{qj})$, for testing the pairwise hypotheses in (S1). If $H^j_{0i}$ is rejected we conclude on direction, i.e., declare $\theta_{ij} > 0$ if $T_{ij} > 0$ or declare $\theta_{ij} < 0$ if $T_{ij} < 0$.

Given the screening p-values $P^j_{screen}$, for every $j = 1, 2, \ldots, m$, to carry out the simultaneous testing of the screening hypotheses in (S2), we use the BH procedure [1], which controls the FDR at a given level $\alpha$. This is a step-up procedure as follows: given the ordered screening p-values $P^{(1)}_{screen} \le P^{(2)}_{screen} \le \cdots \le P^{(m)}_{screen}$ and the corresponding screening null hypotheses $H^{(1)}_{0screen}, H^{(2)}_{0screen}, \ldots, H^{(m)}_{0screen}$, find

$$R = \max\left\{ j : P^{(j)}_{screen} \le j\alpha/m \right\}$$

and reject $H^{(1)}_{0screen}, \ldots, H^{(R)}_{0screen}$, provided the maximum exists; otherwise, accept all the screening hypotheses.

When an $H^j_{0screen}: \theta_j = 0$ is rejected, further decisions are made on the pairwise hypotheses in (S1) and, on rejection, directional decisions are made on the signs of the components $\theta_{ij}$. A Type I error might occur due to wrongly rejecting $H^j_{0screen}$, or correctly rejecting $H^j_{0screen}$ but wrongly rejecting $H^j_{0i}$ for some $i = 1, \ldots, q$. A Type II error might occur due to failing to reject a false null hypothesis $H^j_{0screen}$, or correctly rejecting a false $H^j_{0screen}$ but failing to reject a false null pairwise hypothesis $H^j_{0i}$ for some $i = 1, \ldots, q$. A directional error (Type III error) might occur due to correctly rejecting $H^j_{0screen}$ but wrongly assigning the sign of $\theta_{ij}$ while correctly rejecting $H^j_{0i}: \theta_{ij} = 0$.

The general practice in any multiple testing problem is to find a procedure that controls the Type I errors and minimizes the Type II errors. Here, we need to control Type I as well as Type III errors. A practical way of doing that would be to use an error rate combining both Type I and Type III errors in the FDR framework and make sure that it is controlled. An error rate that combines Type I errors and Type III errors in the FWER setup is the mdFWER [2, 3], which is the probability of making at least one Type I error or directional error. Heller et al. [4] used the Overall False Discovery Rate (OFDR), which is the expected proportion of falsely discovered gene sets out of all discovered gene sets, as an appropriate error measure to control in their two-stage procedure for identifying differentially expressed genes and gene sets. The concept of OFDR was introduced by Benjamini and Heller [5] in the context of testing partial conjunction hypotheses. Inspired by Heller et al. [4] and Shaffer [2], Guo et al. [6] define the mixed directional False Discovery Rate (mdFDR), stated in Definition 1.
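The BH step-up rule applied to the screening p-values can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name `bh_step_up` and the example p-values are assumptions made here.

```python
def bh_step_up(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up rule.

    Returns (R, rejected), where R is the largest rank j with
    P_(j) <= j * alpha / m (0 if no such rank exists) and `rejected`
    holds the original indices of the R smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda j: p_values[j])
    R = 0
    for rank, j in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank that meets its threshold.
        if p_values[j] <= rank * alpha / m:
            R = rank
    return R, sorted(order[:R])


# Hypothetical screening p-values for m = 5 genes.
p_screen = [0.001, 0.30, 0.012, 0.04, 0.75]
R, rejected = bh_step_up(p_screen, alpha=0.05)
print(R, rejected)
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold, which is why the loop does not stop at the first failure.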
Let $V_j$ denote the indicator function of at least one Type I error or directional error committed while testing family $j$ and the pairwise hypotheses in it, i.e., $V_j$ is 1 if either $H^j_{0screen}$ is falsely rejected, or $H^j_{0screen}$ is correctly rejected but at least one Type I error or directional error occurs while testing the pairwise hypotheses in family $j$; $V_j$ is 0 otherwise. Let $R$ denote the number of screening hypotheses rejected by a multiple testing procedure, that is, $R = \max\{ j : P^{(j)}_{screen} \le j\alpha/m \}$. Then, the mdFDR is formally defined as follows.

Definition 1: mdFDR (mixed directional False Discovery Rate). The expected proportion of Type I and directional errors among all discovered families,

$$\text{mdFDR} = E\left[ \frac{\sum_{j=1}^{m} V_j}{\max(R, 1)} \right]. \tag{S3}$$

S2 Proof of Theorem 1

Proof. Let $I_0$ denote the set of true null screening hypotheses $H^j_{0screen}$ and $I_1$ the set of false $H^j_{0screen}$, with $|I_0| = m_0$ and $|I_1| = m_1$, $m_0 + m_1 = m$. From the definition of the mdFDR,

$$\text{mdFDR} = E(Q) = E\left[ \frac{\sum_{j=1}^{m} V_j}{\max(R, 1)} \right]. \tag{S4}$$

Let $P^{(1)}_{screen} \le \cdots \le P^{(m)}_{screen}$ denote the ordered screening p-values. In the event that $R = r$, $P^{(k)}_{screen} \le r\alpha/m$ for $k = 1, 2, \ldots, r$ and $P^{(k)}_{screen} > (r+1)\alpha/m$ for $k = r+1, \ldots, m$. Consequently, exactly $r$ of the $P^j_{screen}$'s are $\le r\alpha/m$. Then (S4) can be written as

$$\begin{aligned}
E(Q) &= \sum_{r=1}^{m} \frac{1}{r} \sum_{j=1}^{m} \Pr(V_j = 1, R = r) \\
&= \sum_{r=1}^{m} \sum_{j \in I_0} \frac{1}{r} \Pr\left(P^j_{screen} \le \frac{r\alpha}{m},\ R^{(-j)} = r - 1\right) \\
&\quad + \sum_{r=1}^{m} \sum_{j \in I_1} \frac{1}{r} \Pr\left(P^j_{screen} \le \frac{r\alpha}{m},\ \text{Type I or Type III error at } j,\ R^{(-j)} = r - 1\right),
\end{aligned} \tag{S5}$$

where $R^{(-j)}$ denotes the number of screening hypotheses rejected from the set $\{H^1_{0screen}, H^2_{0screen}, \ldots, H^{j-1}_{0screen}, H^{j+1}_{0screen}, \ldots, H^m_{0screen}\}$ by using the step-up procedure with the critical values $(i+1)\alpha/m$, $i = 1, \ldots, m-1$.

First consider the second term in (S5). By the assumption of independence of the p-value vectors we can write it as follows:

$$\begin{aligned}
&\sum_{r=1}^{m} \sum_{j \in I_1} \frac{1}{r} \Pr\left(P^j_{screen} \le \frac{r\alpha}{m},\ \text{Type I or Type III error at } j\right) \Pr\left(R^{(-j)} = r - 1\right) \\
&\quad \le \sum_{r=1}^{m} \sum_{j \in I_1} \frac{1}{r} \cdot \frac{r\alpha}{m} \Pr\left(R^{(-j)} = r - 1\right) \qquad \text{(S6)} \\
&\quad = \frac{m_1}{m}\alpha. \qquad \text{(S7)}
\end{aligned}$$

The inequality in (S6) follows because we use an mdFWER controlling procedure at level $r\alpha/m$ for each significant family and, as $j \in I_1$, the probability of making at least one Type I error or directional error in family $j$ is at most $r\alpha/m$. Summing over all values of $r$, the equality in (S7) follows by noting that $\sum_{r=1}^{m} \Pr(R^{(-j)} = r - 1) = 1$.

Next consider the first term in (S5). By the assumption of independence of the p-value vectors we can write it as follows:

$$\begin{aligned}
&\sum_{r=1}^{m} \sum_{j \in I_0} \frac{1}{r} \Pr\left(P^j_{screen} \le \frac{r\alpha}{m}\right) \Pr\left(R^{(-j)} = r - 1\right) \\
&\quad \le \sum_{r=1}^{m} \sum_{j \in I_0} \frac{1}{r} \cdot \frac{r\alpha}{m} \Pr\left(R^{(-j)} = r - 1\right) \qquad \text{(S8)} \\
&\quad = \frac{m_0}{m}\alpha. \qquad \text{(S9)}
\end{aligned}$$

The inequality in (S8) follows from the fact that the true null p-values are stochastically larger than or equal to $U(0, 1)$. Summing over all values of $r$, the equality in (S9) follows by noting that $\sum_{r=1}^{m} \Pr(R^{(-j)} = r - 1) = 1$. The result follows by combining (S7) and (S9): since $m_0 + m_1 = m$, the two bounds add up to $\text{mdFDR} \le \alpha$.

In the theorem we only assume that the p-value vectors are independent, and we make no assumption about the component pairwise p-values. This implies that the p-values within a gene across tumor samples may have any dependence structure. The mdFWER controlling procedure used determines under what kind of dependence structure of the p-values within genes the procedure is valid. For example, if we use Holm's mdFWER controlling procedure, which is proved to control the mdFWER when the p-values are independent, then this theorem is valid under the additional assumption that the pairwise p-values $P_{ij}$, $i = 1, 2, \ldots, q$, of a vector $P_j$ are independent.

The generality of this algorithm makes it a flexible procedure to apply to several practical situations where multidimensional directional decisions are required. Although in the paper we discuss testing of differential gene expression in each tumor size against the normal sample for each gene, this procedure can be applied to any type of pairwise comparison desired to be tested for each gene. For example, if it is of interest to group genes by the inequalities among the mean responses, we would want to detect the pattern of mean responses in the $p$ categories, known as the directional pattern, and see how the mean responses vary across the categories. Some common inequalities are $\mu_{1j} \le \mu_{2j} \le \cdots \le \mu_{pj}$ (monotone pattern) and $\mu_{1j} \le \cdots \le \mu_{ij} \ge \mu_{i+1,j} \ge \cdots \ge \mu_{pj}$ (umbrella pattern with peak $\mu_{ij}$). To test for the pattern we need to test the differences of the mean responses of successive categories, $\theta_{ij} = \mu_{i+1,j} - \mu_{ij}$, $i = 1, 2, \ldots, q$ and $j = 1, 2, \ldots, m$, with $q = p - 1$. If the problem of interest is testing all pairwise differences of the $p$ categories, possibly unordered, then $q = p(p-1)/2$. Based on the question we want to answer from the data, an appropriate methodology can be developed from this general procedure.
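The three comparison schemes mentioned above (each category against the control, successive categories, and all pairs) differ only in the contrast vectors that map the category means to the tested differences. The following sketch builds those contrasts in plain Python; the function names and the sign convention for the all-pairs case (earlier category minus later category) are assumptions made here for illustration.

```python
def contrasts_vs_control(p):
    """q = p - 1 rows; row i gives theta_i = mu_i - mu_p (last category is control)."""
    rows = []
    for i in range(p - 1):
        row = [0] * p
        row[i], row[p - 1] = 1, -1
        rows.append(row)
    return rows


def contrasts_successive(p):
    """q = p - 1 rows; row i gives theta_i = mu_{i+1} - mu_i (pattern detection)."""
    rows = []
    for i in range(p - 1):
        row = [0] * p
        row[i], row[i + 1] = -1, 1
        rows.append(row)
    return rows


def contrasts_all_pairs(p):
    """q = p(p-1)/2 rows; one row mu_i - mu_k for every pair i < k (assumed sign)."""
    rows = []
    for i in range(p):
        for k in range(i + 1, p):
            row = [0] * p
            row[i], row[k] = 1, -1
            rows.append(row)
    return rows


def apply_contrasts(contrasts, mu):
    """Return the vector of differences theta for one gene with category means mu."""
    return [sum(c * v for c, v in zip(row, mu)) for row in contrasts]


# Hypothetical category means for one gene with p = 3 categories.
mu_j = [1.0, 3.0, 2.0]
print(apply_contrasts(contrasts_vs_control(3), mu_j))
print(apply_contrasts(contrasts_successive(3), mu_j))
print(len(contrasts_all_pairs(4)))
```

With the means above, the successive differences change sign once, which is the signature of an umbrella pattern with its peak at the second category.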
S3 Details of Statistical Methodology for FGS Gene Expression Data

Dunnett $P_{screen}$ for Step 1: The scenario is that of comparing multiple groups with a common control group, and the standard method used in this situation is the Dunnett test [7]. The Dunnett test is a powerful method that is designed specifically for this kind of comparison. The test assumes that the underlying distributions of the data from the different groups have the same variance, and the test statistics are obtained by using a pooled estimate of the variance. This assumption is valid for the uterine fibroid data, as the gene expressions are normalized to have similar means and variances for comparability. The test statistic for testing (S1) is given by

$$T^{Dunn}_{ij} = \frac{\bar{x}_{ij} - \bar{x}_{pj}}{s_j \sqrt{\frac{1}{n_i} + \frac{1}{n_p}}}, \tag{S10}$$

where $\bar{x}_{ij} = \frac{1}{n_i} \sum_{k=1}^{n_i} x^k_{ij}$, $i = 1, \ldots, p$; $j = 1, \ldots, m$, are the sample means and $s_j^2 = \sum_{i=1}^{p} \sum_{k=1}^{n_i} (x^k_{ij} - \bar{x}_{ij})^2 / \nu$, $j = 1, \ldots, m$, are the pooled sample variances, with $\nu = \sum_{i=1}^{p} n_i - p$. The null distribution of each $T^{Dunn}_{ij}$ is a univariate t-distribution with $\nu$ degrees of freedom. The vector of Dunnett test statistics $T^{Dunn}_j = (T^{Dunn}_{1j}, \ldots, T^{Dunn}_{qj})$ has a $q$-variate t-distribution with $\nu$ degrees of freedom and correlation matrix $R = (\rho_{ik})_{q \times q}$, where the $\rho_{ik}$ are defined in Dunnett [7]. The Dunnett-adjusted critical value for the two-sided test for $\{T^{Dunn}_{ij}, i = 1, \ldots, q\}$, denoted by $u_\alpha(q, \nu)$, is the quantile of the above $q$-variate t-distribution such that

$$\Pr\left(T^{Dunn}_{1j} \le u_\alpha, \ldots, T^{Dunn}_{qj} \le u_\alpha\right) = 1 - \frac{\alpha}{2},$$

or equivalently,

$$\Pr\left(\max_{i = 1, \ldots, q} T^{Dunn}_{ij} \ge u_\alpha\right) = \frac{\alpha}{2}.$$

The observed values of $T^{Dunn}_{ij}$, say $t^{Dunn}_{ij}$, are compared to $u_\alpha(q, \nu)$ and we reject $H^j_{0i}$ if $|t^{Dunn}_{ij}| > u_\alpha(q, \nu)$.

For each gene $j$ we have a vector of observed Dunnett test statistics, $t^{Dunn}_j = (t^{Dunn}_{1j}, \ldots, t^{Dunn}_{qj})$. Let $t^{max}_j = \max_{i = 1, \ldots, q} |t^{Dunn}_{ij}|$. We obtain the screening p-value for testing the screening hypotheses (S2) as follows:

$$P^j_{screen} = \Pr\left(\max_{i = 1, \ldots, q} |T_{ij}| > t^{max}_j\right) = 1 - \Pr\left(-t^{max}_j \le T_{ij} \le t^{max}_j,\ i = 1, \ldots, q\right), \tag{S11}$$

where the probability is obtained from the CDF of the $q$-variate t-distribution with $\nu$ degrees of freedom and the correlation structure defined in Dunnett [7]. Let $R$ denote the number of rejected screening hypotheses when applying the BH procedure to these screening p-values (not to be confused with the correlation matrix above).

Dunnett mdFWER controlling procedure for Steps 2-3: We use the Dunnett procedure to obtain the Dunnett-adjusted p-values, $\tilde{P}^{Dunnett}_{ij}$, for testing the pairwise hypotheses as follows:

$$\tilde{P}^{Dunnett}_{ij} = 2 \Pr\left(\max_{i = 1, \ldots, q} T^{Dunn}_{ij} \ge |t^{Dunn}_{ij}|\right). \tag{S12}$$

We reject $H^j_{0i}$ if the corresponding adjusted p-value satisfies $\tilde{P}^{Dunnett}_{ij} \le R\alpha/m$, and conclude $\theta_{ij} > 0$ if $T^{Dunn}_{ij} > 0$ and vice versa.

S4 Supplementary Results for the Simulation Study

Methods Used in Simulation Study

In this section we describe the different mdFDR controlling methodologies used in the simulation study. We develop methodologies by combining the Dunnett screening procedure with four different mdFWER controlling procedures for Steps 2 and 3, and compare the performance of the resulting four methodologies with the Guo et al. [6] methodology in terms of mdFDR control and power.
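The Dunnett statistics and the screening p-value described in Section S3 can be illustrated with a short Python sketch that uses only the standard library. The statistic follows the pooled-variance formula directly. The screening p-value is approximated by Monte Carlo simulation under the global null rather than by evaluating the multivariate t CDF, which is a substitution made here for simplicity, so the result carries simulation error. All function names and the example data are hypothetical.

```python
import math
import random


def dunnett_statistics(groups):
    """Dunnett statistics for one gene.

    groups: list of p lists of observations; the LAST list is the control.
    Returns (list of q = p - 1 statistics, degrees of freedom nu).
    """
    p = len(groups)
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    nu = sum(sizes) - p
    # Pooled variance: within-group sum of squares divided by nu.
    ss = sum((x - mean) ** 2 for g, mean in zip(groups, means) for x in g)
    s = math.sqrt(ss / nu)
    stats = [
        (means[i] - means[-1]) / (s * math.sqrt(1 / sizes[i] + 1 / sizes[-1]))
        for i in range(p - 1)
    ]
    return stats, nu


def screening_p_value_mc(t_max, sizes, n_sim=20000, seed=0):
    """Monte Carlo estimate of Pr(max_i |T_i| > t_max) under the global null.

    sizes: the p group sizes, control last. Simulating the group means and
    the pooled variance directly reproduces the Dunnett correlation structure.
    """
    rng = random.Random(seed)
    p = len(sizes)
    nu = sum(sizes) - p
    exceed = 0
    for _ in range(n_sim):
        # Null sample means with unit error variance: N(0, 1 / n_i).
        means = [rng.gauss(0.0, 1.0) / math.sqrt(n) for n in sizes]
        # Pooled standard deviation: sqrt(chi-square(nu) / nu).
        s = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
        max_abs = max(
            abs(means[i] - means[-1]) / (s * math.sqrt(1 / sizes[i] + 1 / sizes[-1]))
            for i in range(p - 1)
        )
        if max_abs > t_max:
            exceed += 1
    return exceed / n_sim


# Hypothetical data for one gene: two tumor categories and a control.
gene = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [0.0, 1.0, 2.0]]
t, nu = dunnett_statistics(gene)
t_max = max(abs(v) for v in t)
print(t, nu, screening_p_value_mc(t_max, [len(g) for g in gene]))
```

Replacing the Monte Carlo step with an exact multivariate t CDF routine would remove the simulation error; the sketch avoids that dependency only to stay self-contained.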
Screening Procedure for Step 1: Dunnett $P_{screen}$: The Dunnett method [7, 8, 9] is a powerful method specifically designed to test hypotheses where several treatments are compared with a common control in an unbalanced one-way layout. The multiple pairwise comparisons we make with the FGS gene expression data fit into the framework of the Dunnett test [7]. The test statistics $T_{ij}$ for testing the pairwise hypotheses are obtained as described in [7], with details given in Section S3.

Procedures for Steps 2 and 3:

Dunnett Procedure: We obtain the Dunnett-adjusted pairwise p-values [8, 9], to be used in Step 2 of the algorithm, and call them $\tilde{P}_{ij}$. The details are given in Section S3. The procedure rejects $H^j_{0i}$ if $\tilde{P}_{ij} \le R\alpha/m$, as in Section S3, and concludes on the sign of $\theta_{ij}$ based on the sign of the test statistic $T_{ij}$.

Holm Procedure: We use Holm's step-down procedure at level $R\alpha/m$ within each significant gene. Order the pairwise p-values for each significant gene $j$ as $P^j_{(1)} \le \cdots \le P^j_{(q)}$, with corresponding hypotheses denoted as $H^j_{0(1)}, \ldots, H^j_{0(q)}$. Let $k$ ($k = 1, \ldots, q$) be the maximum index such that $P^j_{(i)} \le \frac{R\alpha}{m(q - i + 1)}$ for all $i \le k$; then reject $H^j_{0(1)}, \ldots, H^j_{0(k)}$, conclude on direction based on the sign of the test statistic, and accept the rest of the hypotheses.

Hochberg Procedure: We use Hochberg's step-up procedure at level $R\alpha/m$ within each significant gene to identify significant categories. Let $k$ ($k = 1, \ldots, q$) be the maximum index such that $P^j_{(k)} \le \frac{R\alpha}{m(q - k + 1)}$; then reject $H^j_{0(1)}, \ldots, H^j_{0(k)}$, conclude on direction based on the sign of the test statistic, and accept the rest of the hypotheses.

Bonferroni Procedure: The Bonferroni procedure is a commonly used single-step multiple testing procedure that strongly controls the FWER. Reject $H^j_{0i}$ if $P_{ij} \le \frac{R\alpha}{mq}$ and conclude on direction based on the sign of the test statistic $T_{ij}$.

Guo et al. [6] Procedure: The procedure of Guo et al. [6] is a special case of the general testing procedure. They first obtain the p-values $\{P_{1j}, \ldots, P_{qj}\}$ for testing the pairwise hypotheses in (S1) and use Bonferroni pooling to obtain the screening p-values as follows: $P^j_{screen} = q \min\{P_{1j}, \ldots, P_{qj}\}$. Apply the BH procedure to these screening p-values and note down $R$, the number of significant genes.
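The within-gene rules for Steps 2 and 3 can be sketched as follows. Each function takes the pairwise p-values of one significant gene and a level, which would be set to R * alpha / m in the procedure described above. The function names, the helper for directional decisions, and the example numbers are assumptions made here for illustration.

```python
def bonferroni(p_values, level):
    """Single-step rule: reject hypothesis i if P_i <= level / q."""
    q = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= level / q]


def holm(p_values, level):
    """Step-down rule: walk up the ordered p-values and stop at the first failure."""
    q = len(p_values)
    order = sorted(range(q), key=lambda i: p_values[i])
    rejected = []
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= level / (q - rank + 1):
            rejected.append(i)
        else:
            break
    return sorted(rejected)


def hochberg(p_values, level):
    """Step-up rule: reject up to the LARGEST rank that meets its threshold."""
    q = len(p_values)
    order = sorted(range(q), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= level / (q - rank + 1):
            k = rank
    return sorted(order[:k])


def directions(rejected, t_stats):
    """Directional decision for each rejected hypothesis: +1 if T > 0, else -1."""
    return {i: (1 if t_stats[i] > 0 else -1) for i in rejected}


def bonferroni_pool(p_values):
    """Bonferroni-pooled screening p-value q * min(P_1, ..., P_q)."""
    return len(p_values) * min(p_values)


# Hypothetical setting: m = 100 genes, R = 20 significant genes, alpha = 0.05.
level = 20 * 0.05 / 100
p_gene = [0.004, 0.0045, 0.008]
t_gene = [2.9, -2.8, 2.6]
for rule in (bonferroni, holm, hochberg):
    rejected = rule(p_gene, level)
    print(rule.__name__, rejected, directions(rejected, t_gene))
```

In this example the smallest p-value misses the most stringent threshold, so the Bonferroni and Holm rules reject nothing, while the Hochberg rule rejects all three hypotheses because the largest p-value meets the least stringent threshold.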
For each significant gene, use the Bonferroni procedure discussed above to identify significant pairwise differences and conclude on direction.

Results for the Simulation Study

In this section we present the results from the simulation study that consider different kinds of dependencies of the gene expressions. Dependence within genes across groups means that the gene expressions from different tumor samples are dependent but, given any two genes for a sample, the expressions are independent between the two genes; as several tumor samples belong to the same subject, this kind of dependence structure is valid. Dependence among genes means that the gene expressions from different genes are dependent but, given any two samples for a gene, the expressions are independent; as the genes belonging to the same gene set have similar activity, this kind of dependence structure is also valid in the FGS microarray data.
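The two dependence structures described above can be mimicked with a shared-factor construction. The sketch below draws unit-variance normal expressions with one common correlation, either across the categories of each gene or across the genes of each category. The unit variance, the nonnegative common correlation `rho`, and all function names are assumptions made here for illustration and are not taken from the authors' simulation code.

```python
import math
import random


def equicorrelated_normals(mu, rho, rng):
    """One draw of Z with Z_i ~ N(mu_i, 1) and corr(Z_i, Z_k) = rho (0 <= rho <= 1).

    A shared standard normal factor carries the common correlation.
    """
    shared = rng.gauss(0.0, 1.0)
    return [
        m + math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        for m in mu
    ]


def simulate_replicate(mu, rho, rng, dependence="within_genes"):
    """Simulate one p x m matrix of expressions; mu[i][j] is category i, gene j.

    within_genes: the p categories of each gene are correlated, genes independent.
    among_genes:  the m genes of each category are correlated, categories independent.
    """
    p, m = len(mu), len(mu[0])
    z = [[0.0] * m for _ in range(p)]
    if dependence == "within_genes":
        for j in range(m):
            column = equicorrelated_normals([mu[i][j] for i in range(p)], rho, rng)
            for i in range(p):
                z[i][j] = column[i]
    elif dependence == "among_genes":
        for i in range(p):
            z[i] = equicorrelated_normals(mu[i], rho, rng)
    else:
        raise ValueError("dependence must be 'within_genes' or 'among_genes'")
    return z


# Hypothetical means for p = 3 categories and m = 4 genes (all null).
rng = random.Random(0)
mu = [[0.0] * 4 for _ in range(3)]
z = simulate_replicate(mu, rho=0.5, rng=rng, dependence="within_genes")
print(len(z), len(z[0]))
```

The shared-factor form gives each component variance rho + (1 - rho) = 1 and covariance rho with every other component, which is why a single extra normal draw per vector suffices.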

We define the concept of average power for the three-step procedure for comparing different methodologies that control the mdFDR at the same level. Let $\mathcal{R}$ denote the set of indices of rejected screening hypotheses $H^j_{0screen}$, with $|\mathcal{R}| = R$. Let $R_j$, $j \in \mathcal{R}$, denote the number of pairwise hypotheses $H^j_{0i}$ rejected for each significant gene. Let $S$ denote the number of false null screening hypotheses rejected and let $S_j$, $j \in \mathcal{R}$, denote the number of false null pairwise hypotheses rejected for each discovered gene. Then, the average power for the three-step procedure is defined as follows.

Definition 2: Average Power in Three Step Procedure. The average power is defined as the expected proportion of false null hypotheses rejected in Step 1 and Step 2 among all rejections in Step 1 and Step 2,

$$\text{Average Power} = E\left[ \frac{S + \sum_{j \in \mathcal{R}} S_j}{\max\left(R + \sum_{j \in \mathcal{R}} R_j,\ 1\right)} \right]. \tag{S13}$$

The different mdFDR controlling procedures that are shown in Figures 4-6 and Figures S2-S5 are: (i) the proposed methodology (Dunnett), (ii) Dunnett screening and Holm procedure (Holm), (iii) Dunnett screening and Hochberg procedure (Hochberg), (iv) Dunnett screening and Bonferroni procedure (Bonferroni), and (v) the Guo et al. [6] procedure.

Dependence within genes across groups. In this case the components $Z^s_{ij}$, $i = 1, \ldots, p$, are dependent with $Z^s_{ij} \sim N(\mu_{ij}, 1)$ and have a common correlation $\rho = 0.2, 0.5, 0.8$. The results are summarized in Figure 5 and Figures S2-S3. All five procedures control the mdFDR at less than $\alpha = 0.05$. Once again, as in the case of independence, the proposed method gains in power compared to the other methods.

Dependence among genes. We next considered the situation where gene expressions are dependent among genes. For this simulation, the components $Z^s_{ij}$, $j = 1, \ldots, m$, are dependent with $Z^s_{ij} \sim N(\mu_{ij}, 1)$ and have a common correlation $\rho = 0.2, 0.5, 0.8$. The results are summarized in Figure 6 and Figures S4-S5.
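For a single simulation replicate, the ratio inside the expectation of the average power in Definition 2, and the analogous ratio for the mixed directional FDR, reduce to simple counts. The sketch below computes both and averages them over replicates as a plain Monte Carlo estimate; the function names, the dictionary layout keyed by discovered gene, and the example counts are assumptions made here for illustration.

```python
def power_proportion(S, S_j, R_j):
    """Ratio inside the average power for one replicate.

    S:   number of false null screening hypotheses rejected.
    S_j: dict gene -> number of false null pairwise hypotheses rejected.
    R_j: dict gene -> number of pairwise hypotheses rejected.
    Both dicts are keyed by the discovered genes, so R = len(R_j).
    """
    R = len(R_j)
    return (S + sum(S_j.values())) / max(R + sum(R_j.values()), 1)


def error_proportion(V, R):
    """Ratio inside the mixed directional FDR for one replicate.

    V: list of 0/1 indicators, one per family, of at least one
       Type I or directional error; R: number of discovered families.
    """
    return sum(V) / max(R, 1)


def average(values):
    """Monte Carlo estimate of an expectation: the mean over replicates."""
    return sum(values) / len(values)


# Hypothetical counts from two simulation replicates.
rep1 = power_proportion(2, {0: 1, 3: 2, 5: 0}, {0: 1, 3: 2, 5: 1})
rep2 = power_proportion(0, {}, {})
print(average([rep1, rep2]))
print(error_proportion([1, 0, 0, 1], 4))
```

The `max(..., 1)` guard in both ratios means a replicate with no rejections contributes zero rather than an undefined value, matching the convention used in the definitions.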
Again, under dependence among genes, all five procedures control the mdFDR at less than $\alpha = 0.05$ and, as in the case of independence, the proposed method gains in power compared to the other methods.

References

1. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B 57, 289-300 (1995)
2. Shaffer, J.P.: Control of directional errors with stagewise multiple test procedures. Annals of Statistics 8, 1342-1347 (1980)
3. Finner, H.: Stepwise multiple test procedures and control of directional errors. Annals of Statistics 27, 274-289 (1999)
4. Heller, R., Manduchi, E., Grant, G., Ewens, W.: A flexible two-stage procedure for identifying gene sets that are differentially expressed. Bioinformatics 25, 929-942 (2009)
5. Benjamini, Y., Heller, R.: Screening for partial conjunction hypotheses. Biometrics 64, 1215-1222 (2008)
6. Guo, W., Sarkar, S.K., Peddada, S.D.: Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics 66, 485-492 (2010)
7. Dunnett, C.W.: A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association 50, 1096-1121 (1955)
8. Dunnett, C.W., Tamhane, A.C.: Step-down multiple tests for comparing treatments with a control in unbalanced one-way layouts. Statistics in Medicine 10, 939-947 (1991)
9. Dunnett, C.W., Tamhane, A.C.: A step-up multiple test procedure. Journal of the American Statistical Association 87, 162-170 (1992)