Heterogeneity and False Discovery Rate Control

Joshua D. Habiger
Oklahoma State University
jhabige@okstate.edu
URL: jdhabiger.okstate.edu
August 2014

Motivating Data: Anderson and Habiger (2012)
M = 778 bacteria living near the roots of wheat plants were identified.
Prevalence (count) of each bacterium in the Low (L), Medium-Low (ML), Medium (M), Medium-High (MH), and High (H) productivity groups:
  Bacteria # (m)     L    ML     M    MH     H   Total (n_m)
  1                  0     1     1     0     5     7
  2                  9     2     0     0     3    10
  ...
  M = 778           16    10    29    18    13    86
  Biomass (g)     0.85  1.33  1.81  2.37  3.00
Research question: Which bacteria are positively associated with productivity?

More Formally
Data: $Y_{mj}$ = prevalence of the $m$th bacterium in the $j$th group; $x_j$ = shoot biomass of group $j$
Model: $Y_{mj} \sim \mathrm{Poisson}\left(e^{\beta_{0m} + \beta_{1m} x_j}\right)$
Hypotheses: $H_m: \beta_{1m} = 0$ vs. $K_m: \beta_{1m} > 0$
Test statistic: $T_m = \sum_j Y_{mj} x_j$
Ancillary statistic: $Y_{m\cdot} = \sum_j Y_{mj}$
P-value: $P_m = \Pr(T_m \ge t_m \mid Y_{m\cdot} = n_m)$ (McCullagh and Nelder, 1989)
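The conditional p-value above can be computed exactly in small cases. Under $H_m$ the Poisson means are equal across groups, so given the total $n_m$ the counts are multinomial with equal cell probabilities, and $T_m$ is distributed as the sum of $n_m$ independent draws from the biomass values. The sketch below uses that fact; the function name `conditional_pvalue` is hypothetical, and the biomass values in the usage lines read the slide's figures as 0.85 to 3.00 g, which is an assumption.

```python
from collections import defaultdict


def conditional_pvalue(counts, x):
    """Exact Pr(T >= t_obs | total count) under H: beta_1 = 0.

    counts: observed counts Y_1..Y_J for one bacterium
    x:      covariate (e.g. shoot biomass) for each of the J groups
    """
    n = sum(counts)
    t_obs = sum(y * xj for y, xj in zip(counts, x))
    J = len(x)
    # Distribution of the running sum of n i.i.d. uniform draws from x.
    dist = {0.0: 1.0}
    for _ in range(n):
        new = defaultdict(float)
        for s, prob in dist.items():
            for xj in x:
                # Round keys so equal sums reached by different paths merge.
                new[round(s + xj, 10)] += prob / J
        dist = new
    # Small tolerance guards against floating-point noise at the boundary.
    return sum(prob for s, prob in dist.items() if s >= t_obs - 1e-9)


# Bacterium 1 from the motivating table (biomass decimals are assumed).
biomass = [0.85, 1.33, 1.81, 2.37, 3.00]
print(conditional_pvalue([0, 1, 1, 0, 5], biomass))
```

The dictionary of partial sums grows with $n_m$, so for the largest totals in the data a Monte Carlo or normal approximation may be more practical than this exact enumeration.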

Basic Goal
Reject as many $H_m$'s as possible.
Constraint: $\mathrm{FDR} = E[\mathrm{FDP}] \le \alpha$, where $\mathrm{FDP} = V / \max\{R, 1\}$
$V$ = number of false discoveries
$R$ = number of discoveries
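As a small illustration of the definition, the false discovery proportion for one set of decisions can be computed as follows. The function name and the boolean-list inputs are hypothetical; in practice the truth of each $H_m$ is unknown, so this is only usable in simulations, where the FDR is approximated by averaging the FDP over replications.

```python
def false_discovery_proportion(rejected, is_true_null):
    """FDP = V / max(R, 1) for one realization.

    rejected:     iterable of booleans, True if H_m was rejected
    is_true_null: iterable of booleans, True if H_m is actually true
    """
    rejected = list(rejected)
    is_true_null = list(is_true_null)
    V = sum(1 for r, h in zip(rejected, is_true_null) if r and h)  # false discoveries
    R = sum(1 for r in rejected if r)                              # all discoveries
    return V / max(R, 1)
```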

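BH Procedure
Procedure: $\delta_m(P_m; \hat{t}_\alpha^0) = I(P_m \le \hat{t}_\alpha^0)$, where $\hat{t}_\alpha^0$ is chosen as follows:
1. Sort the p-values: $P_{(1)} \le P_{(2)} \le \dots \le P_{(M)}$
2. $k = \max\{m : P_{(m)} \le \alpha m / M\}$
3. $\hat{t}_\alpha^0 = \alpha k / M$
Properties: $\mathrm{FDR}(\hat{t}_\alpha^0) \le \alpha M_0 / M \le \alpha$ under certain dependence structures, where $M_0$ = number of true nulls (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001)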
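The three steps of the BH procedure can be sketched with NumPy as below. The function names are hypothetical, and the convention $k = 0$ (threshold 0, no rejections) when no p-value satisfies the step-2 inequality is an assumption not spelled out on the slide.

```python
import numpy as np


def bh_threshold(pvalues, alpha):
    """Return (k, t_hat) for the BH step-up procedure at level alpha."""
    p_sorted = np.sort(np.asarray(pvalues, dtype=float))
    M = p_sorted.size
    # Positions (0-based) where P_(m) <= alpha * m / M, with m = 1..M.
    below = np.nonzero(p_sorted <= alpha * np.arange(1, M + 1) / M)[0]
    k = int(below[-1]) + 1 if below.size else 0  # assumed convention: k = 0 if none
    return k, alpha * k / M


def bh_reject(pvalues, alpha):
    """Boolean decisions delta_m = I(P_m <= t_hat), in the original order."""
    _, t_hat = bh_threshold(pvalues, alpha)
    return np.asarray(pvalues, dtype=float) <= t_hat


example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bh_threshold(example_p, alpha=0.05))
```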

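Adaptive BH Procedure
Procedure: $\delta_m(P_m; \hat{t}_\alpha^\lambda) = I(P_m \le \hat{t}_\alpha^\lambda)$, where $\hat{t}_\alpha^\lambda$ is chosen as follows:
1. Sort the p-values: $P_{(1)} \le P_{(2)} \le \dots \le P_{(M)}$
2. Compute $\hat{M}_0(\lambda) = \dfrac{\sum_m I(P_m > \lambda) + 1}{1 - \lambda}$
3. $k = \max\{m : P_{(m)} \le \alpha m / \hat{M}_0(\lambda)\}$
4. $\hat{t}_\alpha^\lambda = \alpha k / \hat{M}_0(\lambda)$
Properties (Storey, Taylor, and Siegmund, 2004):
  $\mathrm{FDR}(\hat{t}_\alpha^\lambda) \le \alpha$ under certain dependence structures
  $\lim_{M \to \infty} \hat{t}_\alpha^0 \le \lim_{M \to \infty} \hat{t}_\alpha^\lambda$ a.s. under weak dependence
  $\lim_{M \to \infty} \mathrm{FDP}(\hat{t}_\alpha^\lambda) \le \alpha$ a.s. under weak dependence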
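A minimal sketch of the adaptive procedure follows. It estimates the number of true nulls with $\hat{M}_0(\lambda)$ and plugs that estimate into the step-up comparison. Two choices here are assumptions: the threshold is taken as $\alpha k / \hat{M}_0(\lambda)$, so that exactly the $k$ smallest p-values are rejected, and $k = 0$ when no p-value qualifies. The names `storey_m0` and `adaptive_bh` and the default $\lambda = 0.5$ are hypothetical.

```python
import numpy as np


def storey_m0(pvalues, lam):
    """Estimate of the number of true nulls: (#{P_m > lam} + 1) / (1 - lam)."""
    p = np.asarray(pvalues, dtype=float)
    return (np.sum(p > lam) + 1) / (1.0 - lam)


def adaptive_bh(pvalues, alpha, lam=0.5):
    """Return (decisions, t_hat, m0_hat) for the adaptive BH procedure."""
    p = np.asarray(pvalues, dtype=float)
    M = p.size
    m0_hat = storey_m0(p, lam)
    p_sorted = np.sort(p)
    # Step-up comparison uses m0_hat in place of M.
    below = np.nonzero(p_sorted <= alpha * np.arange(1, M + 1) / m0_hat)[0]
    k = int(below[-1]) + 1 if below.size else 0  # assumed convention: k = 0 if none
    t_hat = alpha * k / m0_hat
    return p <= t_hat, t_hat, m0_hat


example_p = [0.001, 0.008, 0.012, 0.022, 0.3, 0.45, 0.48, 0.7, 0.8, 0.95]
decisions, t_hat, m0_hat = adaptive_bh(example_p, alpha=0.05, lam=0.5)
print(int(decisions.sum()), t_hat, m0_hat)
```

When few p-values exceed $\lambda$, $\hat{M}_0(\lambda)$ falls below $M$ and the comparison line becomes steeper, which is where the extra rejections relative to plain BH come from.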

Question
Note the ancillary statistics (totals $n_m$):
  Bacteria # (m)     L    ML     M    MH     H   Total (n_m)
  1                  0     1     1     0     5     7
  2                  9     2     0     0     3    10
  ...
  M = 778           16    10    29    18    13    86
Can/should we use this information?

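WEIGHTED Adaptive BH Procedure
Procedure: $\delta_m(P_m; \hat{t}_\alpha^\lambda w_m) = I(P_m / w_m \le \hat{t}_\alpha^\lambda) = I(Q_m \le \hat{t}_\alpha^\lambda)$, where $\hat{t}_\alpha^\lambda$ is chosen as follows:
1. Select weights $w_1, \dots, w_M$ such that $\bar{w} = 1$ (they could depend on the $n_m$'s, for example)
2. Compute weighted p-values $Q_m = P_m / w_m$
3. Sort the weighted p-values: $Q_{(1)} \le Q_{(2)} \le \dots \le Q_{(M)}$
4. Compute $\hat{M}_0(\lambda) = \dfrac{\sum_m I(Q_m > \lambda) + 1}{1 - \lambda}$
5. $k = \max\{m : Q_{(m)} \le \alpha m / \hat{M}_0(\lambda)\}$
6. $\hat{t}_\alpha^\lambda = \alpha k / \hat{M}_0(\lambda)$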
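The six steps can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: the weights are rescaled to have mean one inside the function (to enforce $\bar{w} = 1$), the threshold is taken as $\alpha k / \hat{M}_0(\lambda)$ so that the $k$ smallest weighted p-values are rejected, and $k = 0$ when none qualifies. The function name and the default $\lambda = 0.5$ are hypothetical.

```python
import numpy as np


def weighted_adaptive_bh(pvalues, weights, alpha, lam=0.5):
    """Return (decisions, t_hat, m0_hat) for the weighted adaptive BH procedure."""
    p = np.asarray(pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    if p.shape != w.shape:
        raise ValueError("pvalues and weights must have the same shape")
    if np.any(w <= 0):
        raise ValueError("weights must be strictly positive")
    w = w / w.mean()              # step 1: enforce mean weight equal to 1
    q = p / w                     # step 2: weighted p-values (may exceed 1)
    M = q.size
    q_sorted = np.sort(q)         # step 3
    m0_hat = (np.sum(q > lam) + 1) / (1.0 - lam)  # step 4
    below = np.nonzero(q_sorted <= alpha * np.arange(1, M + 1) / m0_hat)[0]
    k = int(below[-1]) + 1 if below.size else 0   # step 5 (k = 0 if none)
    t_hat = alpha * k / m0_hat                    # step 6
    return q <= t_hat, t_hat, m0_hat


example_p = [0.001, 0.008, 0.012, 0.022, 0.3, 0.45, 0.48, 0.7, 0.8, 0.95]
example_w = [1.5] * 5 + [0.5] * 5   # illustrative weights with mean 1
print(weighted_adaptive_bh(example_p, example_w, alpha=0.05))
```

With all weights equal to one the function reduces to the unweighted adaptive procedure, which is a convenient sanity check.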

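Finite Sample Results
Theorem (FDR control): If the $P_m$'s are independent under the $H_m$'s and independent of the other $P_m$'s, then
  $\mathrm{FDR}(\hat{t}_\alpha^\lambda \mathbf{w}) \le \alpha \, \bar{w}_0 \, \dfrac{1 - \lambda}{1 - \lambda \bar{w}_0}$,
where $\bar{w}_0$ is the mean weight among the true $H_m$'s.
Corollaries for FDR control:
  $\bar{w}_0 \le 1$
  $w_m = 1$ (Storey et al., 2004)
  Take $\alpha^* = \alpha \, \dfrac{1}{w_{(M)}} \, \dfrac{1 - \lambda w_{(M)}}{1 - \lambda}$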
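The adjusted level in the last corollary can be checked directly from the theorem's bound. The short derivation below assumes that $w_{(M)}$ denotes the largest weight, that $0 < \lambda < 1$ with $\lambda w_{(M)} < 1$, and that the procedure is run at the adjusted level $\alpha^*$ in place of $\alpha$; the function $g$ is introduced here only for illustration.

```latex
\begin{align*}
g(w) &= \frac{w(1-\lambda)}{1-\lambda w},
\qquad
g'(w) = \frac{1-\lambda}{(1-\lambda w)^2} > 0
\quad \text{for } \lambda w < 1, \\
\mathrm{FDR}
&\le \alpha^* \, g(\bar{w}_0)
\le \alpha^* \, g\bigl(w_{(M)}\bigr)
\qquad \text{since } \bar{w}_0 \le w_{(M)}, \\
&= \alpha \, \frac{1}{w_{(M)}} \, \frac{1-\lambda w_{(M)}}{1-\lambda}
   \cdot \frac{w_{(M)}(1-\lambda)}{1-\lambda w_{(M)}}
 = \alpha .
\end{align*}
```

The same monotonicity explains the first corollary: $g(1) = 1$, so the bound is at most $\alpha$ whenever $\bar{w}_0 \le 1$.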

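Asymptotic Results
Under weak dependence:
Theorem (Larger threshold): The weighted adaptive BH procedure uses a larger threshold than the weighted unadaptive BH procedure:
  $\lim_{M \to \infty} \hat{t}_\alpha^0 \le \lim_{M \to \infty} \hat{t}_\alpha^\lambda$ a.s.
Theorem (FDP control): $\lim_{M \to \infty} \mathrm{FDP}(\hat{t}_\alpha^\lambda \mathbf{w}) \le \alpha$ a.s. if $\mu_0 \le 1$, where $\mu_0$ is the asymptotic mean of the null weights.
Corollaries for FDP control:
  optimal weights for the random effects model
  weights positively correlated with the optimal weights
  $w_m$ i.i.d. with $E[W_m] = 1$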

More Asymptotic Results
Question: Can we use a larger threshold?
Theorem ($\alpha$-exhaustive FDP control): Under weak dependence,
  $\lim_{M \to \infty} \mathrm{FDP}(\hat{t}_\alpha^\lambda \mathbf{w}) = \alpha$ a.s.
under the least favorable distribution ($\mu_0 = 1$ and $E[\delta_m] = 1$ if $H_m$ is false), that is, the Dirac-Uniform configuration; see Finner, Dickhaus, and Roters (2009).
Corollaries for $\alpha$-exhaustive FDP control:
  optimal weights for certain random effects models
  weights positively correlated with the above weights
  $w_m = 1$ (Storey et al., 2004)
  $w_m$ i.i.d. with $E[W_m] = 1$

Simulation Assessment
Setup:
  Optimal weights for the random effects model considered
  Heterogeneity in the distribution of the data when $H_m$ is false
  Heterogeneity in the prior probability for the state of $H_m$
Power of the weighted vs. the unweighted adaptive procedure:
  Weight type                                         Gain in power (++, +, 0, -, --)
  Optimal weight                                      ++
  Noisy optimal weight (optimal weight × Un(0,2))     +
  Independent weights                                 -
FDR always controlled.

Example: Bacteria data
Optimal weights are somewhat involved:
  Need to use the decision function framework
  Can show that the effect sizes depend on $n_m$, and assume average power = 1/2
[Figure: weight and power plotted against effect size, and a histogram (frequency) of $n$]
Results: WA BH (o): 38 discoveries vs. A BH: 32 discoveries

Concluding Remarks
Many different weighting schemes are possible; the right choice depends on the type of heterogeneity.
The weighted procedure is robust: FDP control holds under weak dependence even if the weights are misspecified.
The potential loss in power is small compared with the potential gain.

Acknowledgements/References
Thanks to: W. Sun, Y. Liang, I. Ahmad, E. Peña, A. Adekpedjou
Some References
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57(1), 289-300.
Finner, H., T. Dickhaus, and M. Roters (2009). On the false discovery rate and an asymptotically optimal rejection curve. The Annals of Statistics 37(2), 596-618.
Genovese, C., K. Roeder, and L. Wasserman (2006). False discovery control with p-value weighting. Biometrika 93(3), 509-524.
Habiger, J. D. (2012). A method for modifying multiple testing procedures. J. Statist. Plann. Inference 142(7), 2227-2231.
Peña, E., J. Habiger, and W. Wu (2011). Power-enhanced multiple decision functions controlling family-wise error and false discovery rates. The Annals of Statistics 39(1), 556-583.
Storey, J. D., J. E. Taylor, and D. Siegmund (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society, Series B 66(1), 187-205.