Heterogeneity and False Discovery Rate Control

1 Heterogeneity and False Discovery Rate Control
Joshua D. Habiger
Oklahoma State University
jhabige@okstate.edu
URL: jdhabiger.okstate.edu
August, 2014

2 Motivating Data: Anderson and Habiger (2012)
M = 778 bacteria living near the roots of wheat plants were identified.
The prevalence (count) of each bacterium was recorded in the Low (L), Medium-Low (ML), Medium (M), Medium-High (MH), and High (H) productivity groups.
[Table: columns Bacteria # (m), L, ML, M, MH, H, total ($n_m$); a final row gives Biomass (g) for each group. Entries not preserved.]
Research question: Which bacteria are positively associated with productivity?

3 More Formally
Data: $Y_{mj}$ = prevalence of the $m$th bacterium in the $j$th group; $x_j$ = shoot biomass of group $j$
Model: $Y_{mj} \sim \mathrm{Poisson}\left(e^{\beta_{0m} + \beta_{1m} x_j}\right)$
Hypotheses: $H_m: \beta_{1m} = 0$ vs $K_m: \beta_{1m} > 0$
Test statistic: $T_m = \sum_j Y_{mj} x_j$
Ancillary statistic: $Y_{m\cdot} = \sum_j Y_{mj}$
P-value: $P_m = \Pr(T_m \ge t_m \mid Y_{m\cdot} = n_m)$ [1]
[1] McCullagh and Nelder (1989)
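A minimal sketch of the conditional p-value, not the authors' implementation. It relies on a standard fact: under $H_m$ the Poisson means are equal across groups, so given the total $n_m$ the counts are multinomial with equal cell probabilities $1/J$. The function names, the brute-force enumeration, and the toy inputs are illustrative assumptions; enumeration is only practical for small totals.

```python
from itertools import combinations
from math import factorial, prod

def compositions(n, parts):
    """Yield every tuple of `parts` non-negative integers that sums to n."""
    # Stars and bars: place parts-1 bars among n+parts-1 slots.
    for bars in combinations(range(n + parts - 1), parts - 1):
        prev = -1
        counts = []
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(n + parts - 2 - prev)
        yield tuple(counts)

def conditional_p_value(y, x):
    """Exact Pr(T >= t_obs | total = n) under the null, with T = sum_j Y_j x_j.

    Assumes equal multinomial cell probabilities 1/J under the null.
    """
    n, J = sum(y), len(x)
    t_obs = sum(yj * xj for yj, xj in zip(y, x))
    p = 0.0
    for counts in compositions(n, J):
        t = sum(c * xj for c, xj in zip(counts, x))
        if t >= t_obs - 1e-12:  # small tolerance for floating-point x values
            coef = factorial(n) // prod(factorial(c) for c in counts)
            p += coef / J**n
    return p
```

For the real data, where group totals can be large, a normal approximation or Monte Carlo sampling from the same multinomial would be the more practical route.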

4 Basic Goal
Reject as many $H_m$s as possible
Constraint: $\mathrm{FDR} = E[\mathrm{FDP}] \le \alpha$
$\mathrm{FDP} = \dfrac{V}{\max\{R, 1\}}$
$V$ = # False Discoveries
$R$ = # Discoveries
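As a small illustration of the definition, the false discovery proportion of one set of decisions can be computed as below. The function name and the boolean-list inputs are assumptions made for this sketch; in practice the true null status is unknown, so this is only usable in simulations.

```python
def false_discovery_proportion(rejected, is_true_null):
    """FDP = V / max(R, 1) for one realized set of decisions."""
    r = sum(1 for rej in rejected if rej)                # R: number of discoveries
    v = sum(1 for rej, null in zip(rejected, is_true_null)
            if rej and null)                             # V: false discoveries
    return v / max(r, 1)
```

The FDR is the expectation of this quantity, so in a simulation it would be estimated by averaging the FDP over replications.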

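5 BH Procedure
Procedure: $\delta_m(P_m; \hat t^0_\alpha) = I(P_m \le \hat t^0_\alpha)$, where to choose $\hat t^0_\alpha$:
  1. Sort p-values: $P_{(1)} \le P_{(2)} \le \dots \le P_{(M)}$
  2. $k = \max\{m : P_{(m)} \le \alpha m / M\}$
  3. $\hat t^0_\alpha = \alpha k / M$
Properties: $\mathrm{FDR}(\hat t^0_\alpha) \le \alpha M_0 / M \le \alpha$ under certain dependence structure [2], where $M_0$ = # true nulls
[2] Benjamini and Hochberg (1995); Benjamini and Yekutieli (2001)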
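The three steps can be sketched directly in code. The function names are illustrative, and the sketch uses plain Python lists rather than any particular statistics library.

```python
def bh_threshold(p_values, alpha):
    """Return the BH cutoff alpha * k / M, with k the largest m such that
    the m-th smallest p-value is at most alpha * m / M (k = 0 if none)."""
    M = len(p_values)
    k = 0
    for m, p in enumerate(sorted(p_values), start=1):
        if p <= alpha * m / M:
            k = m
    return alpha * k / M

def bh_reject(p_values, alpha):
    """Decision functions delta_m = I(P_m <= threshold), in input order."""
    t = bh_threshold(p_values, alpha)
    return [p <= t for p in p_values]
```

Usage: `bh_reject([0.001, 0.008, 0.039, 0.041], 0.05)` returns one boolean per hypothesis, in the order the p-values were supplied.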

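6 Adaptive BH Procedure
Procedure: $\delta_m(P_m; \hat t^\lambda_\alpha) = I(P_m \le \hat t^\lambda_\alpha)$, where to choose $\hat t^\lambda_\alpha$:
  1. Sort p-values: $P_{(1)} \le P_{(2)} \le \dots \le P_{(M)}$
  2. Compute $\hat M_0(\lambda) = \dfrac{\sum_m I(P_m > \lambda) + 1}{1 - \lambda}$
  3. $k = \max\{m : P_{(m)} \le \alpha m / \hat M_0(\lambda)\}$
  4. $\hat t^\lambda_\alpha = \alpha k / M$
Properties [3]:
  $\mathrm{FDR}(\hat t^\lambda_\alpha) \le \alpha$ under certain dependence structure
  $\lim_{M \to \infty} \hat t^0_\alpha \le \lim_{M \to \infty} \hat t^\lambda_\alpha$ a.s. under weak dependence
  $\lim_{M \to \infty} \mathrm{FDP}(\hat t^\lambda_\alpha) \le \alpha$ a.s. under weak dependence
[3] Storey, Taylor, Siegmund (2004)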
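A hedged sketch of the adaptive procedure follows. It computes the estimate of the number of true nulls from step 2 and the index k from step 3, then rejects the k smallest p-values. That is the usual step-up reading and corresponds to a cutoff of $\alpha k / \hat M_0(\lambda)$; this is an assumption of the sketch, since step 4 on the slide prints the cutoff as $\alpha k / M$. The function names and the default $\lambda = 0.5$ are also illustrative choices.

```python
def storey_null_estimate(p_values, lam):
    """Estimate of the number of true nulls: (#{P_m > lam} + 1) / (1 - lam)."""
    return (sum(1 for p in p_values if p > lam) + 1) / (1.0 - lam)

def adaptive_bh_reject(p_values, alpha, lam=0.5):
    """Step-up adaptive BH: reject the k smallest p-values, where k is the
    largest m with P_(m) <= alpha * m / M0_hat(lam)."""
    m0_hat = storey_null_estimate(p_values, lam)
    sorted_p = sorted(p_values)
    k = 0
    for m, p in enumerate(sorted_p, start=1):
        if p <= alpha * m / m0_hat:
            k = m
    if k == 0:
        return [False] * len(p_values)
    cutoff = sorted_p[k - 1]  # k-th smallest p-value
    return [p <= cutoff for p in p_values]
```

When most p-values are small, the estimate falls below M and the critical values $\alpha m / \hat M_0(\lambda)$ are larger than the BH values $\alpha m / M$, which is where the gain in discoveries comes from.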

7 Question
Note the ancillary statistics:
[Table: columns Bacteria # (m), L, ML, M, MH, H, total ($n_m$). Entries not preserved.]
Can/should we use this information?

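8 WEIGHTED Adaptive BH Procedure
Procedure: $\delta_m(P_m; \hat t^\lambda_\alpha w_m) = I(P_m / w_m \le \hat t^\lambda_\alpha) \equiv I(Q_m \le \hat t^\lambda_\alpha)$, where to choose $\hat t^\lambda_\alpha$:
  1. Select weights $w_1, \dots, w_M$ s.t. $\bar w = 1$ (could depend on the $n_m$s, for example)
  2. Compute weighted p-values $Q_m = P_m / w_m$
  3. Sort weighted p-values: $Q_{(1)} \le Q_{(2)} \le \dots \le Q_{(M)}$
  4. Compute $\hat M_0(\lambda) = \dfrac{\sum_m I(Q_m > \lambda) + 1}{1 - \lambda}$
  5. $k = \max\{m : Q_{(m)} \le \alpha m / \hat M_0(\lambda)\}$
  6. $\hat t^\lambda_\alpha = \alpha k / M$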
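The weighted version can be sketched the same way. The function name, the rescaling of the supplied weights to mean 1, and the default $\lambda = 0.5$ are assumptions of this sketch. As in the usual step-up form, it rejects the k smallest weighted p-values, which is an assumed reading of the final step rather than a transcription of it.

```python
def weighted_adaptive_bh(p_values, weights, alpha, lam=0.5):
    """Weighted adaptive BH on Q_m = P_m / w_m; returns one decision per
    hypothesis, in input order."""
    M = len(p_values)
    if len(weights) != M:
        raise ValueError("need one weight per p-value")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    mean_w = sum(weights) / M
    w = [wi / mean_w for wi in weights]        # enforce mean weight 1
    q = [p / wi for p, wi in zip(p_values, w)]  # weighted p-values
    m0_hat = (sum(1 for qm in q if qm > lam) + 1) / (1.0 - lam)
    sorted_q = sorted(q)
    k = 0
    for m, qm in enumerate(sorted_q, start=1):
        if qm <= alpha * m / m0_hat:
            k = m
    if k == 0:
        return [False] * M
    cutoff = sorted_q[k - 1]  # k-th smallest weighted p-value
    return [qm <= cutoff for qm in q]
```

With all weights equal, this reduces to the unweighted adaptive procedure. A weight above 1 shrinks the weighted p-value and makes rejection easier for that hypothesis, at the cost of the hypotheses whose weights fall below 1.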

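9 Finite Sample Results
Theorem (FDR control): If the $P_m$s under true $H_m$s are independent, and independent of the other $P_m$s, then
  $\mathrm{FDR}(\hat t^\lambda_\alpha w) \le \alpha \, \bar w_0 \, \dfrac{1 - \lambda}{1 - \lambda \bar w_0}$
for $\bar w_0$ the mean weight among true $H_m$s
Corollaries for FDR control:
  $\bar w_0 \le 1$
  $w = 1$ - Storey et al (2004)
  Take $\alpha^* = \alpha \, \dfrac{1}{w_{(M)}} \, \dfrac{1 - \lambda w_{(M)}}{1 - \lambda}$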
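The following sketch checks the arithmetic behind the corollaries numerically. It assumes the bound reads $\alpha \bar w_0 (1 - \lambda) / (1 - \lambda \bar w_0)$ and that the adjusted level is $\alpha (1 / w_{(M)}) (1 - \lambda w_{(M)}) / (1 - \lambda)$, with $w_{(M)}$ the largest weight; both are reconstructions from the flattened slide text, and the function names are hypothetical.

```python
def fdr_bound(alpha, lam, w0_bar):
    """Assumed finite-sample bound: alpha * w0_bar * (1 - lam) / (1 - lam * w0_bar)."""
    if lam * w0_bar >= 1:
        raise ValueError("bound requires lam * w0_bar < 1")
    return alpha * w0_bar * (1 - lam) / (1 - lam * w0_bar)

def adjusted_alpha(alpha, lam, w_max):
    """Assumed adjusted level: alpha * (1 / w_max) * (1 - lam * w_max) / (1 - lam)."""
    if lam * w_max >= 1:
        raise ValueError("adjustment requires lam * w_max < 1")
    return alpha * (1.0 / w_max) * (1 - lam * w_max) / (1 - lam)
```

Under these assumptions the bound equals $\alpha$ when the mean null weight is 1, falls below $\alpha$ when it is smaller, and running at the adjusted level brings the bound back to $\alpha$ in the worst case where the mean null weight equals the largest weight.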

10 Asymptotic Results
Under weak dependence:
Theorem (Larger Threshold): The weighted adaptive BH uses a larger threshold than the weighted unadaptive BH:
  $\lim_{M \to \infty} \hat t^0_\alpha \le \lim_{M \to \infty} \hat t^\lambda_\alpha$ a.s.
Theorem (FDP control):
  $\lim_{M \to \infty} \mathrm{FDP}(\hat t^\lambda_\alpha w) \le \alpha$ a.s. if $\mu_0 \le 1$
  $\mu_0$ = asymptotic mean of null weights
Corollaries for FDP control:
  optimal weights for random effects model
  weights positively correlated with optimal weights
  $w_m$ iid with $E[W_m] = 1$

11 More Asymptotic Results
Question: Can we use a larger threshold?
Theorem ($\alpha$-exhaustive FDP control): Under weak dependence,
  $\lim_{M \to \infty} \mathrm{FDP}(\hat t^\lambda_\alpha w) = \alpha$ a.s.
under the least favorable distribution ($\mu_0 = 1$ and $E[\delta_m] = 1$ if $H_m$ false) [a]
[a] Dirac-Uniform, see Finner, Dickhaus, and Roters (2009)
Corollaries for $\alpha$-exhaustive FDP control:
  optimal weights for certain random effects models
  weights positively correlated with above weights
  $w_m = 1$ - Storey et al (2004)
  $w_m$ iid with $E[W_m] = 1$

12 Simulation Assessment
Setup:
  Optimal weights for random effects model considered
  Heterogeneity - distribution of data when $H_m$ false
  Heterogeneity - prior probability for state of $H_m$
Power of weighted vs unweighted adaptive procedure:
  Weight type                 Gain in Power (++, +, 0, -, --)
  Optimal weight              ++
  Noisy optimal weight [4]    +
  Independent weights         -
FDR always controlled
[4] Noisy optimal weight = optimal weight $\times$ Un(0,2)
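The noisy weights in the simulation can be sketched as below, assuming the footnote means each optimal weight is multiplied by an independent Uniform(0, 2) draw. The function name, the seeded generator, and the rescaling back to mean 1 (to match the mean-weight constraint of the weighted procedure) are assumptions of this sketch, not details given on the slide.

```python
import random

def noisy_weights(optimal_weights, rng):
    """Multiply each weight by an independent Uniform(0, 2) draw, then
    rescale so the weights average to 1 (rescaling is an assumption)."""
    noisy = [w * rng.uniform(0.0, 2.0) for w in optimal_weights]
    mean = sum(noisy) / len(noisy)
    return [w / mean for w in noisy]

# Example with hypothetical optimal weights and a fixed seed for repeatability.
example = noisy_weights([0.5, 1.0, 1.5, 1.0], random.Random(2014))
```

Because Uniform(0, 2) has mean 1, the perturbation leaves each weight unchanged on average while weakening its correlation with the optimal weight.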

13 Example: Bacteria data
Optimal weights: somewhat involved
  Need to use the decision function framework
  Can show effect sizes depend on $n_m$, and assume average power = 1/2
[Figure: plots with axes weight, power, frequency, effect size, and n. Images not preserved.]
Results: WA BH (o): 38 discoveries vs A BH: 32 discoveries

14 Concluding Remarks
Many different weighting schemes - depends on type of heterogeneity
Weighted procedure is robust: FDP control is provided under weak dependence even if weights are misspecified
Potential loss in power is small compared with the potential gain

15 Acknowledgements/References
Thanks to: W. Sun, Y. Liang, I. Ahmad, E. Peña, A. Adekpedjou
Some References
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57(1).
Finner, H., T. Dickhaus, and M. Roters (2009). On the false discovery rate and an asymptotically optimal rejection curve. The Annals of Statistics 37(2).
Genovese, C., K. Roeder, and L. Wasserman (2006). False discovery control with p-value weighting. Biometrika 93(3).
Habiger, J. D. (2012). A method for modifying multiple testing procedures. J. Statist. Plann. Inference 142(7).
Peña, E., J. Habiger, and W. Wu (2011). Power-enhanced multiple decision functions controlling family-wise error and false discovery rates. Annals of Statistics 39(1).
Storey, J. D., J. E. Taylor, and D. Siegmund (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. Journal of the Royal Statistical Society, Series B 66(1).

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control

Joshua D. Habiger, Oklahoma State University, jhabige@okstate.edu, Nov. 8, 2013
Outline
1: Motivation and FDR Research Areas