Multiple Testing. Hoang Tran. Department of Statistics, Florida State University

Large-Scale Testing
Examples:
- Microarray data: testing for differences in gene expression between two traits/conditions.
- Microbiome data: which bacteria are differentially abundant between two traits/conditions?
- Prostate cancer data: n = 102 subjects (52 cases and 50 controls) and N = 6033 genes. How do we test for differences in gene expression?

Large-Scale Testing
1. For the jth gene, compute the two-sample t statistic t_j comparing gene expression between cases and controls.
2. Test H_0j: gene j's expression levels are the same between the two groups, at significance level α.
3. Repeat for all N = 6033 genes.
What's the problem?
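
A minimal sketch of this per-gene loop in R, using simulated data in place of the prostate study (the matrix X, the group labels, and the gene count are illustrative assumptions):

    set.seed(1)
    n_cases <- 52; n_controls <- 50; N <- 6033
    X <- matrix(rnorm((n_cases + n_controls) * N), ncol = N)   # subjects x genes (simulated)
    group <- rep(c("case", "control"), c(n_cases, n_controls))

    tests <- apply(X, 2, function(x) {
      tt <- t.test(x[group == "case"], x[group == "control"])  # two-sample t test for gene j
      c(t = unname(tt$statistic), p = tt$p.value)
    })
    t_j <- tests["t", ]
    p_j <- tests["p", ]

    alpha <- 0.05
    sum(p_j <= alpha)   # number of genes declared "significant" at level alpha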

Multiple Comparisons
Running 100 separate hypothesis tests at α = 0.05 will produce about 5 significant results even if every case is actually null.
Examples:
- Efficacy of a drug measured by the reduction of many different disease symptoms (one test per symptom).
- Two methods of teaching writing are used on students, and the groups are compared in terms of grammar, spelling, etc. (one test per outcome).
The more tests we run, the greater the chance of a type I error.
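
A quick simulation of this point (sample sizes and replication count are arbitrary): each replicate runs 100 two-sample t tests on purely null data and counts how many clear α = 0.05.

    set.seed(1)
    n_sig <- replicate(500, {
      p <- replicate(100, t.test(rnorm(20), rnorm(20))$p.value)  # 100 tests, all nulls true
      sum(p <= 0.05)
    })
    mean(n_sig)   # close to 100 * 0.05 = 5 false positives per experiment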

Bonferroni Correction
Family-Wise Error Rate: the probability of rejecting any true null hypothesis.
Bonferroni: test each individual hypothesis at level α/N.
Let J_0 be the indices of the true H_0j, with |J_0| = N_0. Then

FWER = P( ∪_{j ∈ J_0} { p_j ≤ α/N } ) ≤ Σ_{j ∈ J_0} P{ p_j ≤ α/N } = N_0 α/N ≤ N α/N = α.

The Bonferroni correction ensures FWER ≤ α.
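
In R this is just a smaller cutoff, or equivalently p.adjust with method = "bonferroni"; the p-values below are simulated for illustration.

    set.seed(1)
    N <- 6033
    p <- c(runif(N - 100), rbeta(100, 1, 1e4))        # mostly null, plus 100 genuinely small p-values
    alpha <- 0.05

    sum(p <= alpha / N)                               # rejections at the Bonferroni cutoff
    sum(p.adjust(p, method = "bonferroni") <= alpha)  # same rejections via adjusted p-values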

Bonferroni Correction
No requirement of independence of the p_j's, but it's perhaps too conservative. With N = 6033 and α = 0.05 we only reject when p_j ≤ 8.3 × 10^-6! We want to control type I errors, but we also want to find interesting/significant genes.

Holm's Procedure
Order the p-values p_(1) ≤ p_(2) ≤ ... ≤ p_(N), with H_0(j) the corresponding null hypotheses. Let j_0 be the smallest index j such that p_(j) > α/(N − j + 1). Reject all H_0(j) with j < j_0 and accept all with j ≥ j_0. Satisfies FWER ≤ α but is not as conservative as Bonferroni (more rejections).
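
Holm's procedure is available through p.adjust with method = "holm"; rejecting hypotheses whose adjusted p-value is at most α reproduces the j_0 rule above (simulated p-values again).

    set.seed(1)
    p <- c(runif(5933), rbeta(100, 1, 1e4))
    alpha <- 0.05

    holm_rej <- p.adjust(p, method = "holm") <= alpha
    bonf_rej <- p.adjust(p, method = "bonferroni") <= alpha
    c(holm = sum(holm_rej), bonferroni = sum(bonf_rej))  # Holm rejects at least as many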

Stepdown Procedures
Holm's procedure: look at the most significant test first and continue rejecting hypotheses as long as the p-values are small enough. An improvement: incorporate the dependence structure of the individual tests.

Generic Stepdown Method
For K ⊆ {1, ..., N}, let H_K be the intersection hypothesis that all H_0j with j ∈ K are true. T_j is the jth test statistic; order the statistics T_(1) ≤ ... ≤ T_(N), with corresponding hypotheses H_0(j).
1. Let K_1 = {1, ..., N}. If T_(N) ≤ ĉ_K1(1 − α), then accept all hypotheses and stop; otherwise, reject H_0(N) and continue.
2. Let K_2 be the indices of the hypotheses not previously rejected. If T_(N−1) ≤ ĉ_K2(1 − α), then accept all remaining hypotheses and stop; otherwise, reject H_0(N−1) and continue.
3. Continue in this manner.

Generic Stepdown Method
How do we find the critical values ĉ_K(1 − α)? They are the (1 − α) quantiles of max_{j ∈ K} T_j under H_K. Under certain conditions this gives

FWER ≤ P( max_{j ∈ J_0} T_j > ĉ_J0(1 − α) ) ≤ α

(Lehmann and Romano 2005). Not as conservative as Holm's procedure in general.
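
A minimal sketch of one way to carry this out for a two-group comparison, estimating each ĉ_K(1 − α) as the (1 − α) quantile of max_{j ∈ K} T_j over permuted group labels; the simulated data, the |t| statistics, and the permutation scheme are illustrative assumptions, not a prescription from the slides.

    set.seed(1)
    X <- matrix(rnorm(50 * 200), nrow = 50)   # 50 subjects x 200 genes (simulated)
    group <- rep(c(0, 1), each = 25)

    t_stats <- function(X, g) {
      apply(X, 2, function(x) abs(t.test(x[g == 1], x[g == 0])$statistic))
    }

    stepdown_maxT <- function(X, group, alpha = 0.05, B = 200) {
      T_obs <- t_stats(X, group)
      N <- length(T_obs)
      T_perm <- replicate(B, t_stats(X, sample(group)))   # N x B statistics under permuted labels
      rejected <- logical(N)
      K <- seq_len(N)                                     # hypotheses not yet rejected
      for (j in order(T_obs, decreasing = TRUE)) {        # most significant first
        c_K <- quantile(apply(T_perm[K, , drop = FALSE], 2, max), 1 - alpha)
        if (T_obs[j] <= c_K) break                        # accept all remaining and stop
        rejected[j] <- TRUE
        K <- setdiff(K, j)
      }
      rejected
    }

    sum(stepdown_maxT(X, group))   # number of rejections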

The Hypothesis of Homogeneity
Consider testing all (N choose 2) pairwise hypotheses H_i,j: µ_i = µ_j, i < j.
{H_1,2, H_2,3} cannot be the set of all true hypotheses: if µ_1 = µ_2 and µ_2 = µ_3, then µ_1 = µ_3, so H_1,3 would be true as well. Yet the previous methods allow us to accept H_1,2 and H_2,3 while rejecting H_1,3.

A Holm-Type Approach
Setup (N = 6): normal random variables with common variance σ². Order the observations X_(1) ≤ ... ≤ X_(N), with corresponding means µ_(j). Let p̂_(i),(j) be the p-value for testing µ_(i) = µ_(j).
Procedure:
1. If p̂_(6),(1) > α/(N choose 2), accept all hypotheses and terminate. Otherwise, reject µ_(1) = µ_(6) and continue.
2. Test the larger of X_(6) − X_(2) and X_(5) − X_(1) by comparing p̂_(6),(2) or p̂_(5),(1) with α/((N choose 2) − 1).
FWER is controlled at α.

An Improvement
Suppose we are at step 2 (µ_(1) = µ_(6) has been rejected). Then µ_(1) = µ_(2) or µ_(2) = µ_(6) must be false, so at most (6 choose 2) − 5 = 10 < (6 choose 2) − 1 = 14 of the pairwise hypotheses can still be true, and step 2 can use the cutoff α/10 instead of α/14. No violation of FWER, and more rejections.

An Improvement [figure]

FWER Summary
- Family-Wise Error Rate: the probability of rejecting any true null hypothesis.
- Bonferroni, Holm, and the generic stepdown method all control FWER.
- Holm and the generic stepdown method have at least as much power as Bonferroni.
- We can exploit the logical structure of pairwise tests to improve the Holm procedure's power in those situations.

False-Discovery Rates
FWER is probably too conservative for large N, even N ≥ 20; for N in the thousands or millions the issue is exacerbated. A more liberal criterion: false-discovery rates.

False-Discovery Rates [figure: table of outcomes — of the N hypotheses, N_0 are null and N_1 = N − N_0 are non-null; R are rejected in total, of which a are true nulls (false discoveries) and b are non-nulls (true discoveries)]

False-Discovery Rates
The number of false discoveries a is unobservable. We want to keep the false-discovery proportion Fdp = a/R small. Define FDR(D) = E(Fdp(D)). We can't observe Fdp, but we can control FDR: decision rule D controls FDR at level q ∈ (0, 1) if FDR(D) ≤ q.

Benjamini-Hochberg Procedure for FDR Control
1. For given q, let j_max be the largest j such that p_(j) ≤ (j/N) q.
2. Let D_q be the rule that rejects H_0(j) for all j ≤ j_max.
If the p-values are independent: FDR(D_q) = (N_0/N) q ≤ q. FDR is more generous than FWER.
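
A minimal sketch of the two steps applied to a vector of p-values (simulated here, not the prostate data):

    set.seed(1)
    p <- c(runif(950), rbeta(50, 1, 50))      # mostly null, a few genuinely small p-values
    N <- length(p)
    q <- 0.10

    p_sorted <- sort(p)
    j_max <- max(c(0, which(p_sorted <= (seq_len(N) / N) * q)))        # step 1
    reject <- if (j_max > 0) p <= p_sorted[j_max] else rep(FALSE, N)   # step 2: reject H_0(j), j <= j_max
    sum(reject)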

Benjamini-Hochberg q-values
p.adjust in R computes BH-adjusted p-values (q-values). In practice, q < 0.10 is typically treated as significant. Example: q = 0.05 means we expect about 5% of the tests declared significant to be false positives.
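
Usage sketch: rejecting the tests whose BH-adjusted p-value is at most q gives the same rejections as the rule D_q above (same simulated p-values as before).

    set.seed(1)
    p <- c(runif(950), rbeta(50, 1, 50))
    q_values <- p.adjust(p, method = "BH")
    sum(q_values <= 0.10)   # tests declared significant at q = 0.10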

Microbiome Example for FDR
Goal: study the association of the microbiome with asthma exacerbations. n = 3122 samples and N = 268 taxa. Question: which taxa are differentially abundant between samples with and without asthma exacerbations (498 exacerbators, 2624 non-exacerbators)? These are (possibly overdispersed) count data, so we fit a negative binomial regression for each taxon.
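
A hedged sketch of this per-taxon analysis in R, assuming a samples-by-taxa count matrix counts, a 0/1 indicator exacerbation, and a log library-size offset log_depth; these object names and the model details are illustrative, not the study's actual code.

    library(MASS)   # for glm.nb

    analyze_taxa <- function(counts, exacerbation, log_depth) {
      pvals <- apply(counts, 2, function(y) {
        # negative binomial regression for one taxon
        # (in practice one might wrap this in tryCatch for taxa that fail to converge)
        fit <- glm.nb(y ~ exacerbation + offset(log_depth))
        summary(fit)$coefficients["exacerbation", "Pr(>|z|)"]
      })
      p.adjust(pvals, method = "BH")   # BH q-values across the N = 268 taxa
    }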

Microbiome Example for FDR [figure]

Bayesian Interpretation of FDR
Each of the N cases is null with prior probability π_0 or non-null with probability π_1 = 1 − π_0. Each z statistic has density f_0(z) if null (i.e. N(0, 1)) and f_1(z) if non-null (unknown). F_0(z) and F_1(z) are the corresponding CDFs, with survival curves S_0(z) = 1 − F_0(z) and S_1(z) = 1 − F_1(z). Define the mixtures S(z) = π_0 S_0(z) + π_1 S_1(z) and f(z) = π_0 f_0(z) + π_1 f_1(z).

Bayesian Interpretation of FDR
Suppose z_j ≥ z_0 = 3. Then the tail-area false-discovery rate is Fdr(z_0) ≡ P(case j is null | z_j ≥ z_0) = π_0 S_0(z_0)/S(z_0). S_0(z_0) is usually known (i.e. 1 − Φ(z_0)), and Ŝ(z_0) = #{z_j ≥ z_0}/N. Empirical Bayes estimate: Fdr(z_0) ≈ π_0 S_0(z_0)/Ŝ(z_0).
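
A minimal sketch of this estimate, assuming a N(0, 1) null and taking π_0 = 1 as a conservative upper bound (simulated z-values for illustration):

    set.seed(1)
    z <- c(rnorm(5700), rnorm(333, mean = 3))   # mostly null, some shifted non-null
    z0 <- 3
    pi0 <- 1                                    # conservative choice for pi_0
    S0_z0 <- 1 - pnorm(z0)                      # S_0(z_0) under the N(0, 1) null
    S_hat <- mean(z >= z0)                      # Shat(z_0) = #{z_j >= z_0} / N
    Fdr_hat <- pi0 * S0_z0 / S_hat
    Fdr_hat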

Bayesian Interpretation of FDR
The BH condition p_(j) ≤ (j/N) q becomes S_0(z_(j)) ≤ Ŝ(z_(j)) q, i.e. the empirical Bayes estimate π_0 S_0(z_(j))/Ŝ(z_(j)) of Fdr(z_(j)) is at most π_0 q ≤ q. So BH rejects exactly those cases for which the empirical Bayes posterior probability of nullness is too small.

A Note about False-Negative Rates
The false-negative proportion is Fnp = (N_1 − b)/(N − R), where b is the number of non-null cases among the R rejections. The expectation of Fnp is a measure of type II error. Let A be the region in which a null hypothesis is accepted. Then 1 − Fdr(A) estimates the Bayesian false-negative rate P(case j is non-null | z_j ∈ A).

Local FDR
Instead of a tail-area probability, what about conditioning on z_j = z_0? Local FDR: fdr(z_0) = P(case j is null | z_j = z_0) = π_0 f_0(z_0)/f(z_0). π_0 is unknown but can be estimated (Efron 2010) or set to 1 (most cases are null); f(z) is unknown but can be estimated.

Local FDR [figure]

Local FDR
Conventionally interesting threshold: fdr(z) ≤ 0.2. Relation between local and tail-area FDR: Fdr(z_0) = E[fdr(z) | z ≥ z_0]. Often Fdr(z_0) < fdr(z_0).

Local FDR
Computing f̂(z): fit a Poisson regression with a fourth-degree polynomial in z on the log scale to the counts in a histogram of the z-values. See the next figure.
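
A minimal sketch of this fit and the resulting local fdr estimate, taking π_0 = 1 and f_0 = N(0, 1); the simulated z-values and the number of histogram bins are illustrative choices.

    set.seed(1)
    z <- c(rnorm(5700), rnorm(333, mean = 3))

    h   <- hist(z, breaks = 90, plot = FALSE)
    d   <- data.frame(counts = h$counts, mids = h$mids)
    fit <- glm(counts ~ poly(mids, 4), family = poisson, data = d)   # log f is a 4th-degree polynomial

    bin_width <- diff(h$breaks)[1]
    f_hat <- function(z0)                       # density estimate f-hat(z0)
      predict(fit, newdata = data.frame(mids = z0), type = "response") / (length(z) * bin_width)
    fdr_hat <- function(z0) pmin(1, dnorm(z0) / f_hat(z0))   # pi_0 * f_0 / f with pi_0 = 1

    fdr_hat(c(0, 2, 3, 4))   # local fdr drops as z moves into the right tail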

Local FDR [figure: histogram of the z-values with the fitted f̂(z)]

Choosing the Null Distribution
In large-scale testing we can examine hundreds, thousands, or millions of z-values, and the theoretical null distribution we chose may turn out to be inappropriate. What if we determine the null distribution empirically from the data? Use the R package locfdr (which also computes the local FDR).
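
A brief usage sketch of locfdr on simulated z-values, assuming the package's default empirical-null fit (the theoretical N(0, 1) null can be requested with nulltype = 0); component names follow the package documentation.

    # install.packages("locfdr")
    library(locfdr)

    set.seed(1)
    z <- c(rnorm(5700), rnorm(333, mean = 3))

    out <- locfdr(z)      # empirical null by default; plots the histogram and fitted curves
    head(out$fdr)         # estimated local fdr for each z_j
    out$fp0               # fitted null parameters (delta, sigma, p0)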

Empirical Null Distribution [figure]

FDR Summary
- FDR is a more liberal criterion than FWER.
- Many practitioners prefer to control FDR; in gene studies it is often more important to discover interesting genes than to avoid every false positive.
- The BH procedure for FDR rejects the cases for which the empirical Bayes posterior probability of nullness is too small.
- Local FDR: investigate more than just the tail-area probability.