False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

Ståle Nygård
Trial Lecture, Dec 19, 2008

Lecture outline
- Motivation for not using classical p values in large-scale simultaneous multiple testing situations
- False discovery rate (FDR) and other multiple testing error measurements
- Estimation of FDR
- FDR, power and sample size

Classical single hypothesis testing
Let $\mu$ be the difference in mean between two groups. We want to test the hypotheses $H_0: \mu = 0$ vs $H_1: \mu \neq 0$.
Observations in group 1: $X = X_1, X_2, \ldots, X_{n_x}$
Observations in group 2: $Y = Y_1, Y_2, \ldots, Y_{n_y}$
Test procedure:
- Find a test statistic $Z = h(X, Y)$.
- Reject $H_0$ if $p = 2P(Z > |z_{obs}| \mid H_0 \text{ is true}) < \alpha$, where $\alpha$ is the significance level (e.g. 0.05) and $z_{obs}$ is the observed value of $Z$.

When $H_0$ is correct
[Figure: left panel, distribution of the test statistic (density $f(t)$ against $t$); right panel, histogram of p values (frequency against p value).]
Given that the model for the data used under $H_0$ is correct, p values have a Uniform(0,1) distribution.
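The uniformity of null p values can be checked by simulation. The following is a minimal sketch, assuming two groups of N(0,1) observations with known unit variance, so that the test statistic is exactly N(0,1) under $H_0$; the function names and simulation sizes are illustrative and not taken from the lecture.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p value for a N(0,1) test statistic: 2 * P(Z > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_null_p_values(n_tests, n_per_group, rng):
    """Simulate p values when both groups are N(0,1), so H0 holds."""
    p_values = []
    for _ in range(n_tests):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        # Known unit variance, so the standard error is sqrt(2 / n).
        z = (sum(x) - sum(y)) / n_per_group / math.sqrt(2.0 / n_per_group)
        p_values.append(two_sided_p(z))
    return p_values

rng = random.Random(0)
p_null = simulate_null_p_values(n_tests=5000, n_per_group=10, rng=rng)

# Under Uniform(0,1), roughly 10% of the p values fall in each tenth of [0, 1].
bin_fractions = [
    sum(1 for p in p_null if i / 10 <= p < (i + 1) / 10) / len(p_null)
    for i in range(10)
]
print([round(f, 3) for f in bin_fractions])
```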

Single hypothesis testing set-up

                Not reject $H_0$    Reject $H_0$
$H_0$ true      Correct             Type I error
$H_0$ false     Type II error       Correct

Significance level = P(type I error) = $\alpha$.
Power = 1 - P(type II error) = $1 - \beta$, i.e. the probability of detecting a difference if there is a true difference.

Microarrays
Microarrays measure differences in expression levels between two conditions.
[Figure: microarray gene expressions, sick vs healthy. The legend distinguishes genes that are more expressed in the sick individual, more expressed in the healthy individual, and genes with the same expression level in sick and healthy individuals.]

Microarray test statistic
We want to test differential expression between two groups for genes $i = 1, \ldots, m$ ($m$ of order 10,000).
This can be done using the ordinary two-sample t statistic
$t_i = \frac{\bar{x}_i - \bar{y}_i}{\hat\sigma_i}$,
where $\hat\sigma_i$ is the (estimated) standard deviation of the difference $\bar{x}_i - \bar{y}_i$.
Variance estimates can be improved by borrowing strength across genes, a technique called variance shrinkage:
$z_i = \frac{\bar{x}_i - \bar{y}_i}{\sqrt{B \hat\sigma_{all}^2 + (1 - B) \hat\sigma_i^2}}$
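A small sketch of the shrinkage idea, under stated assumptions: the per-gene variance of the difference is estimated in the Welch form, $\hat\sigma_{all}^2$ is taken to be the mean of the per-gene variances, and the weight $B$ is supplied by hand. None of these choices is specified in the lecture, and the toy data are made up.

```python
import math
import statistics

def shrunken_statistics(x_rows, y_rows, weight):
    """Return (t, z) per gene.

    x_rows[i], y_rows[i]: measurements for gene i in group 1 and group 2.
    weight: the shrinkage weight B in [0, 1] (chosen by hand here).
    """
    diffs, variances = [], []
    for x, y in zip(x_rows, y_rows):
        diffs.append(statistics.mean(x) - statistics.mean(y))
        # Assumed estimator: variance of the difference in means (Welch form).
        variances.append(statistics.variance(x) / len(x)
                         + statistics.variance(y) / len(y))
    pooled = statistics.mean(variances)  # assumed choice for sigma_all^2
    t = [d / math.sqrt(v) for d, v in zip(diffs, variances)]
    z = [d / math.sqrt(weight * pooled + (1.0 - weight) * v)
         for d, v in zip(diffs, variances)]
    return t, z

# Toy data for three genes, three replicates per group.
x_rows = [[5.1, 5.0, 5.2], [3.0, 4.0, 5.0], [7.0, 7.5, 6.5]]
y_rows = [[4.1, 4.0, 4.2], [3.5, 4.5, 5.5], [7.1, 7.4, 6.8]]
t_stats, z_stats = shrunken_statistics(x_rows, y_rows, weight=0.5)
print([round(t, 2) for t in t_stats])
print([round(z, 2) for z in z_stats])
```

Gene 1 has a very small variance estimate, so its ordinary t statistic is large; shrinking towards the pooled variance pulls it down, while the statistic of the noisiest gene moves away from zero.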

Bootstrap estimated test statistic
Variance shrinkage is often accompanied by bootstrap estimation of the distribution of the test statistic under $H_0$.
For each of $B$ bootstrap samples:
- From the pooled data $\{x_1, \ldots, x_n, y_1, \ldots, y_n\}$, draw $\{x_1^*, \ldots, x_n^*\}$ and $\{y_1^*, \ldots, y_n^*\}$.
- Calculate the null statistic $z^*$ from the $x^*$s and the $y^*$s.
Compare the observed test statistic $z_{obs}$ with the $B$ $z^*$ values.
[Figure: histogram of $z^*$ (frequency against $z^*$), with $z_{obs}$ marked.]
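The resampling scheme can be sketched as follows for a single gene. This is an illustration under assumptions: the statistic is an unshrunken two-sample statistic, resampling is with replacement from the pooled data, and the p value uses the common `(1 + count) / (B + 1)` convention; the data are invented.

```python
import math
import random
import statistics

def z_statistic(x, y):
    """Difference in means divided by its estimated standard error."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

def bootstrap_null(x, y, n_boot, rng):
    """Draw n_boot null statistics z* by resampling from the pooled data."""
    pooled = list(x) + list(y)
    z_star = []
    while len(z_star) < n_boot:
        xs = rng.choices(pooled, k=len(x))  # sampling with replacement
        ys = rng.choices(pooled, k=len(y))
        if len(set(xs)) == 1 and len(set(ys)) == 1:
            continue  # degenerate resample: zero standard error
        z_star.append(z_statistic(xs, ys))
    return z_star

x = [2.1, 2.5, 1.9, 2.8, 2.4, 2.2]
y = [1.0, 1.4, 0.8, 1.3, 1.1, 0.9]
z_obs = z_statistic(x, y)
z_star = bootstrap_null(x, y, n_boot=2000, rng=random.Random(0))

# Two-sided bootstrap p value (assumed convention, avoids p = 0).
p_boot = (1 + sum(1 for z in z_star if abs(z) >= abs(z_obs))) / (len(z_star) + 1)
print(round(z_obs, 2), p_boot)
```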

P values from a microarray experiment
[Figure: histograms of p values (frequency against p value) for null genes, for non-null genes, and for all genes on the microarray. In the last panel the cut-off $\alpha$ divides the genes into true positives, false positives, true negatives and false negatives.]

Multiple testing set-up

                Not reject $H_0$    Reject $H_0$    Total
$H_0$ true      TN                  FP              $m_0$
$H_0$ false     FN                  TP              $m - m_0$
Total           $m - R$             $R$             $m$

$m$ = # of hypotheses
$m_0$ = # of true $H_0$s
$R$ = # of rejected $H_0$s
TP = # of true positives
FP = # of false positives
TN = # of true negatives
FN = # of false negatives

Sections: Family-wise error rate (FWER); False discovery rate, Benjamini & Hochberg ("frequentist" approach); False discovery rate, mixture model ("Bayesian" approach)

Type I error rates
Family-wise error rate (FWER): $FWER = P(FP \geq 1)$
False discovery rate (FDR): $FDR = E\{\frac{FP}{R} I(R > 0)\}$, i.e. the expected proportion of falsely rejected $H_0$s among all rejections if there are any rejections, otherwise zero.
Positive false discovery rate (pFDR): $pFDR = E(\frac{FP}{R} \mid R > 0)$, i.e. the same as FDR, but conditioned on having at least one rejection.
Per comparison error rate (PCER): $PCER = \frac{E(FP)}{m}$
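To make the four definitions concrete, the sketch below estimates them by Monte Carlo when every hypothesis is tested at an unadjusted level $\alpha$. The set-up is an assumption for illustration only: independent N(0,1) statistics for null genes, N(3,1) for non-null genes, $m = 50$ and $m_0 = 40$.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p value for a N(0,1) test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_error_rates(m, m0, effect, alpha, n_sim, rng):
    """Estimate FWER, FDR, pFDR and PCER for unadjusted testing at level alpha."""
    any_fp = 0
    fp_total = 0
    fdp_sum = 0.0
    sims_with_rejection = 0
    for _ in range(n_sim):
        fp = tp = 0
        for i in range(m):
            is_null = i < m0  # the first m0 hypotheses are true nulls
            z = rng.gauss(0.0 if is_null else effect, 1.0)
            if two_sided_p(z) < alpha:
                if is_null:
                    fp += 1
                else:
                    tp += 1
        r = fp + tp
        any_fp += 1 if fp >= 1 else 0
        fp_total += fp
        if r > 0:
            fdp_sum += fp / r
            sims_with_rejection += 1
    return {
        "FWER": any_fp / n_sim,
        "FDR": fdp_sum / n_sim,  # FP/R counted as zero when R = 0
        "pFDR": fdp_sum / sims_with_rejection if sims_with_rejection else float("nan"),
        "PCER": fp_total / (n_sim * m),
    }

rates = simulate_error_rates(m=50, m0=40, effect=3.0, alpha=0.05,
                             n_sim=2000, rng=random.Random(1))
print({name: round(value, 3) for name, value in rates.items()})
```

With 40 independent true nulls tested at level 0.05, the FWER is $1 - 0.95^{40} \approx 0.87$, while the PCER stays near $0.05 \cdot 40 / 50 = 0.04$.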

Family-wise error rates (FWER)
Usual way of controlling for multiple testing in the pre-genomic era.
$FWER = P(FP \geq 1)$ is the probability of at least one false positive.
Most common method: Bonferroni (1936), with adjusted p value $\tilde{p} = \min(mp, 1)$.
Other methods:
- Šidák (1967)
- Stepwise procedures, e.g. Holm (1979)
- Westfall & Young (1993)
For genome-wide data, controlling FWER leads to very low power!
Less conservative approach: generalized FWER (Dudoit et al., 2004, and van der Laan et al., 2004): $P(FP \geq k)$.
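As a sketch of two of the methods named above, the following computes Bonferroni adjusted p values, $\min(mp, 1)$, and Holm's step-down adjusted p values. The function names and the four example p values are illustrative.

```python
def bonferroni(p_values):
    """Bonferroni adjusted p values: min(m * p, 1)."""
    m = len(p_values)
    return [min(m * p, 1.0) for p in p_values]

def holm(p_values):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p value
        # The multiplier shrinks from m down to 1 along the ordered p values;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min((m - rank) * p_values[i], 1.0))
        adjusted[i] = running_max
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.005]
p_bonferroni = bonferroni(p_values)
p_holm = holm(p_values)
print([round(p, 3) for p in p_bonferroni])
print([round(p, 3) for p in p_holm])
```

Holm's adjusted p values are never larger than Bonferroni's, which is why the stepwise procedure is at least as powerful while still controlling the FWER.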

False discovery rate (FDR)
Benjamini and Hochberg (1995): $FDR = E\{\frac{FP}{R} I(R > 0)\}$ is the expected proportion of false positives among the positives if there are any positives, else zero.
Common method: Benjamini & Hochberg's (BH) step-up procedure:
- Let $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(m)}$ be the ordered raw p values.
- Let $\hat{k} = \max\{k : \frac{m p_{(k)}}{k} \leq \alpha\}$.
- Reject all hypotheses whose p values are smaller than or equal to $p_{(\hat{k})}$, i.e. those corresponding to $p_{(1)}, \ldots, p_{(\hat{k})}$, and do not reject those corresponding to $p_{(\hat{k}+1)}, \ldots, p_{(m)}$.
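A minimal Python sketch of the step-up rule as stated above; the function name and the example p values are illustrative and not from the lecture.

```python
def bh_step_up(p_values, alpha):
    """Benjamini-Hochberg step-up: return (k_hat, rejection flags in input order)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for k, i in enumerate(order, start=1):
        # Keep the largest k with m * p_(k) / k <= alpha.
        if m * p_values[i] / k <= alpha:
            k_hat = k
    rejected = [False] * m
    for i in order[:k_hat]:
        rejected[i] = True
    return k_hat, rejected

p_values = [0.001, 0.03, 0.035, 0.04]
k_hat, rejected = bh_step_up(p_values, alpha=0.05)
print(k_hat, rejected)
```

In this example the condition fails at $k = 2$ ($4 \cdot 0.03 / 2 = 0.06$) but holds again at $k = 3$ and $k = 4$. Because the procedure takes the largest $k$ that satisfies the condition, the second hypothesis is rejected as well; this is what "step-up" refers to.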

BH step-up: Motivation
$\hat{k} = \max\{k : \frac{m p_{(k)}}{k} \leq \alpha\}$
The core of the BH step-up is $\frac{m p_{(k)}}{k}$.
$m_0 p_{(k)}$ is an estimate of the expected number of false positives when $p_{(k)}$ is the cut-off value for the raw p values.
Since $m_0$ is unknown, $m$ is used as a conservative estimate of $m_0$.
$\frac{m p_{(k)}}{k}$ is then an estimate of the proportion of expected false positives among the total number of positives $k$.

Modification for general dependence
Benjamini & Yekutieli (2001)
The Benjamini & Yekutieli (BY) step-up procedure modifies the BH procedure for general dependence:
$\hat{k} = \max\{k : \frac{m \left(\sum_{l=1}^{m} \frac{1}{l}\right) p_{(k)}}{k} \leq \alpha\}$
When $m$ is large, the penalty of the BY procedure is about $\log(m)$ compared to the BH procedure.
This can be a large price to pay for allowing arbitrary dependence (Ge et al., 2003).
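The BY rule differs from BH only by the harmonic-sum factor. A sketch, with an illustrative function name and example p values:

```python
import math

def by_step_up(p_values, alpha):
    """Benjamini-Yekutieli step-up: return (k_hat, rejection flags, penalty)."""
    m = len(p_values)
    penalty = sum(1.0 / l for l in range(1, m + 1))  # sum_{l=1}^m 1/l
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for k, i in enumerate(order, start=1):
        if m * penalty * p_values[i] / k <= alpha:
            k_hat = k
    rejected = [False] * m
    for i in order[:k_hat]:
        rejected[i] = True
    return k_hat, rejected, penalty

p_values = [0.001, 0.03, 0.035, 0.04]
k_hat_by, rejected_by, penalty = by_step_up(p_values, alpha=0.05)
print(k_hat_by, rejected_by, round(penalty, 4))

# Size of the penalty for a microarray-sized problem, next to log(m).
harmonic_10000 = sum(1.0 / l for l in range(1, 10001))
print(round(harmonic_10000, 3), round(math.log(10000), 3))
```

With only four p values the penalty is $25/12 \approx 2.08$, which already reduces the number of rejections at $\alpha = 0.05$ from four (plain BH) to one.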

Proportion of true nulls
The number of null genes $m_0$ is unknown, and therefore so is the proportion $\pi_0 = \frac{m_0}{m}$.
$\pi_0$ is important in the estimation of FDR.

Estimating $\pi_0$: Schweder and Spjøtvoll's estimator
Look at an interval $[\lambda, 1]$, where most p values are assumed to come from true nulls.
The Schweder and Spjøtvoll (1982) estimator is
$\hat\pi_0(\lambda) = \frac{\#\{p_i > \lambda\}}{m(1 - \lambda)}$ for a fixed $\lambda \in (0, 1)$.
[Figure: histogram of p values (frequency), split into null genes and non-null genes, with the cut-off $\lambda$ marked.]
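A sketch of the estimator on a constructed data set: 800 null p values placed on an even grid in (0, 1) and 200 non-null p values all below 0.02. The data are artificial and chosen so that the answer is exactly 0.8.

```python
def pi0_estimate(p_values, lam):
    """Schweder-Spjotvoll estimator: #{p_i > lambda} / (m * (1 - lambda))."""
    m = len(p_values)
    return sum(1 for p in p_values if p > lam) / (m * (1.0 - lam))

# Artificial data: evenly spread null p values plus small non-null p values.
null_p = [(i + 0.5) / 800 for i in range(800)]
non_null_p = [0.0001 * (j + 1) for j in range(200)]
p_values = null_p + non_null_p

pi0_half = pi0_estimate(p_values, lam=0.5)
pi0_quarter = pi0_estimate(p_values, lam=0.25)
print(pi0_half, pi0_quarter)
```

On real data the estimate depends on $\lambda$: a small $\lambda$ lets non-null p values into the interval and biases the estimate upwards, while a large $\lambda$ leaves few p values and increases the variance.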

Estimating $\pi_0$ using convex decreasing p value density (Langaas et al., 2002)
For $p$ close to 1, $f(p) \approx \pi_0$.
It is reasonable to assume that $f(p)$ is decreasing in $p$.
Assuming that $f(p)$ is also convex leads to improved estimation of $f(1)$, which can be used as an estimate of $\pi_0$.
[Figure: two panels, p value density estimated as decreasing, and p value density estimated as convex decreasing.]

Inserting $\hat\pi_0$ to improve FDR estimate
The BH step-up procedure finds $\hat{k} = \max\{k : \frac{m p_{(k)}}{k} \leq \alpha\}$, where $m$ was a conservative estimate of the number of true nulls.
The BH procedure with adaptive control (Benjamini & Hochberg, 2000) finds $\hat{k} = \max\{k : \frac{\hat\pi_0 m p_{(k)}}{k} \leq \alpha\}$.
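The effect of inserting $\hat\pi_0$ can be seen in a small example. In this sketch the value $\hat\pi_0 = 0.8$ is simply assumed rather than estimated, and the six p values are made up; setting `pi0=1.0` gives the plain BH procedure.

```python
def adaptive_bh(p_values, alpha, pi0=1.0):
    """BH step-up with m replaced by pi0 * m; returns (k_hat, rejection flags)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for k, i in enumerate(order, start=1):
        if pi0 * m * p_values[i] / k <= alpha:
            k_hat = k
    rejected = [False] * m
    for i in order[:k_hat]:
        rejected[i] = True
    return k_hat, rejected

p_values = [0.001, 0.02, 0.03, 0.04, 0.5, 0.9]
k_plain, _ = adaptive_bh(p_values, alpha=0.05)             # plain BH
k_adaptive, _ = adaptive_bh(p_values, alpha=0.05, pi0=0.8)  # assumed pi0 estimate
print(k_plain, k_adaptive)
```

Here plain BH rejects one hypothesis and the adaptive version rejects four, because the estimated number of false positives at each cut-off is scaled down by the factor 0.8.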

Mixture model for p values
According to Genovese & Wasserman (2002):
Conditional distributions of p values
- Null genes: Uniform(0,1) (when the correct distribution for the test statistic is used to calculate the p values).
- Non-null genes: $h(p)$
The unconditional distribution of p values is then
$f(p) = \pi_0 \cdot 1 + (1 - \pi_0) h(p)$

Mixture model for test statistic
The unconditional distribution of z values is (Efron et al., 2001)
$f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z)$,
where $f_0(z)$ is the distribution of the test statistic $Z$ for null genes and $f_1(z)$ is the distribution of $Z$ for non-null genes.

(Empirical) Bayesian Fdr and local fdr
Assume (without loss of generality) that $H_0$ is rejected for large values of $Z$.
The mixture model based, or (empirical) Bayesian, false discovery rate is
$q(z) = Fdr(z) = P(H_0 \text{ true} \mid Z \geq z) = \frac{P(Z \geq z \mid H_0 \text{ true}) P(H_0 \text{ true})}{P(Z \geq z)} = \frac{\pi_0 (1 - F_0(z))}{1 - F(z)}$,
where $F_0$ is the cumulative distribution of $Z$ under $H_0$, and $F$ is the unconditional cumulative distribution of $Z$.
The local fdr (locally at $Z = z$) is defined as (Efron et al., 2001)
$fdr(z) = P(H_0 \text{ true} \mid Z = z) = \frac{\pi_0 f_0(z)}{f(z)}$
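Both quantities can be evaluated in closed form when the mixture is fully known. The sketch below assumes a hypothetical mixture with $\pi_0 = 0.9$, $f_0 = N(0,1)$ and $f_1 = N(3,1)$; in practice these components have to be estimated.

```python
import math

PI0 = 0.9      # hypothetical proportion of null genes
EFFECT = 3.0   # hypothetical mean of Z for non-null genes

def norm_pdf(z, mean=0.0):
    """Density of N(mean, 1)."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)

def norm_sf(z, mean=0.0):
    """Upper tail probability P(Z >= z) for N(mean, 1)."""
    return 0.5 * math.erfc((z - mean) / math.sqrt(2.0))

def tail_fdr(z):
    """Fdr(z) = pi0 * (1 - F0(z)) / (1 - F(z))."""
    null_tail = PI0 * norm_sf(z)
    return null_tail / (null_tail + (1.0 - PI0) * norm_sf(z, EFFECT))

def local_fdr(z):
    """fdr(z) = pi0 * f0(z) / f(z)."""
    null_part = PI0 * norm_pdf(z)
    return null_part / (null_part + (1.0 - PI0) * norm_pdf(z, EFFECT))

for z in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(z, round(tail_fdr(z), 4), round(local_fdr(z), 4))
```

In this example the tail-area Fdr is smaller than the local fdr at every cut-off: Fdr(z) averages the local fdr over all statistics at least as extreme as z, and the local fdr decreases in z.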

Connection between BH ("frequentist") FDR and empirical Bayesian Fdr
Frequentist procedure: the BH step-up procedure with adaptive control finds $\hat{k}$ such that
$\hat{k} = \max\{k : \frac{\hat\pi_0 p_{(k)}}{k/m} \leq \alpha\}$.
Rejecting the hypotheses corresponding to $p_{(1)}, \ldots, p_{(\hat{k})}$ provides $FDR \leq \alpha$.
Let $z_1 \geq z_2 \geq \cdots \geq z_m$ be the ordered z values. The empirical Bayesian procedure finds
$\hat{l} = \max\{l : \widehat{Fdr}(z_l) \leq \alpha\}$, where
$\widehat{Fdr}(z_l) = \frac{\hat\pi_0 P(Z \geq z_l \mid H_0 \text{ true})}{\hat{P}(Z \geq z_l)} = \frac{\hat\pi_0 p_l}{l/m}$

Estimation under mixture model
Recall the mixture model $f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z)$.
The null distribution $f_0(z)$ is usually assumed to be $N(0, 1)$ (but the normality assumption may be violated), or found by bootstrap estimation via resampling of group labels.
The unconditional distribution $f(z)$ can be approximated by smoothing the empirical distribution.

Estimation under mixture model
An upper bound for $\pi_0$ can be found by requiring (Efron et al., 2001)
$1 - fdr(z) = 1 - \pi_0 f_0(z)/f(z) > 0$ for all $z$.
This yields
$\pi_0 \leq \min_z f(z)/f_0(z)$
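A small numerical illustration of the bound, again assuming a fully known hypothetical mixture ($\pi_0 = 0.9$, $f_0 = N(0,1)$, $f_1 = N(3,1)$) and a grid search over $z$. With real data $f$ would be a smoothed estimate, so the bound would be noisier than here.

```python
import math

TRUE_PI0 = 0.9   # hypothetical value used to build the mixture
EFFECT = 3.0     # hypothetical mean of Z for non-null genes

def norm_pdf(z, mean=0.0):
    """Density of N(mean, 1)."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)

def mixture_pdf(z):
    """f(z) = pi0 * f0(z) + (1 - pi0) * f1(z)."""
    return TRUE_PI0 * norm_pdf(z) + (1.0 - TRUE_PI0) * norm_pdf(z, EFFECT)

# Grid search for min_z f(z) / f0(z) on [-4, 4].
grid = [-4.0 + 0.01 * i for i in range(801)]
pi0_upper_bound = min(mixture_pdf(z) / norm_pdf(z) for z in grid)
print(round(pi0_upper_bound, 6))
```

The minimum is attained where non-null statistics are essentially absent (here far in the left tail), so the ratio $f/f_0$ approaches the true $\pi_0$ from above.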

Violation of N(0,1) assumption
The null distribution is not necessarily $N(0, 1)$. Deviations from $N(0, 1)$ are caused by
(1) Non-normal data and $n$ too small for asymptotic theory to be valid.
(2) Unobserved covariates, which inflate the distribution.
(3) Correlation across arrays.
(4) Correlation between genes.
Bootstrap cannot resolve (2)-(4).
Efron (2007) suggests estimating an empirical null distribution.

Estimating empirical null distribution (Efron, 2007)
Assume $f_0(z) \sim N(\delta_0, \sigma_0^2)$.
Estimate $\delta_0$ and $\sigma_0^2$ by fitting a quadratic curve to the log of the distribution of $Z$ around 0.
The procedure is called central matching.
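The idea behind central matching can be sketched as follows. If $\log f_0(z) = \beta_0 + \beta_1 z + \beta_2 z^2$, then $\sigma_0^2 = -1/(2\beta_2)$ and $\delta_0 = \beta_1 \sigma_0^2$. This sketch fits the quadratic by ordinary least squares to log histogram counts, which is a simplification assumed here in place of a smoothed density estimate; the window, bin width and simulated data are also assumptions.

```python
import math
import random

def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def central_matching(z_values, half_width=1.5, bin_width=0.2):
    """Estimate (delta0, sigma0) from a quadratic fit to log bin counts near 0."""
    n_bins = int(round(2.0 * half_width / bin_width))
    counts = [0] * n_bins
    for z in z_values:
        k = math.floor((z + half_width) / bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    # Bin midpoints and log counts (empty bins are skipped).
    points = [(-half_width + (k + 0.5) * bin_width, math.log(c))
              for k, c in enumerate(counts) if c > 0]
    # Normal equations for the least squares fit y = b0 + b1*x + b2*x^2.
    a = [[sum(x ** (i + j) for x, _ in points) for j in range(3)] for i in range(3)]
    rhs = [sum(y * x ** i for x, y in points) for i in range(3)]
    _, b1, b2 = solve_linear(a, rhs)
    sigma2 = -1.0 / (2.0 * b2)  # requires b2 < 0, i.e. a concave fit
    return b1 * sigma2, math.sqrt(sigma2)

# Simulated z values from a null that is wider and shifted relative to N(0,1).
rng = random.Random(2)
z_values = [rng.gauss(0.2, 1.3) for _ in range(20000)]
delta0_hat, sigma0_hat = central_matching(z_values)
print(round(delta0_hat, 2), round(sigma0_hat, 2))
```

Only the central part of the histogram is used because, in a real experiment, that is where the null genes are assumed to dominate.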

Sections: Type II errors; Optimizing power; Sample size

Type II errors
False non-discovery rate (FNDR) is the proportion of non-null genes among all non-significant genes.
False negative rate (FNR) is the proportion of non-significant genes among all non-null genes.
Sensitivity = power = 1 - FNR, i.e. the proportion of significant genes among all non-null genes.
Given a Type I error rate $\alpha$, an optimal testing procedure maximizes sensitivity (minimizes FNR).
[Figure: histogram of p values (frequency) with the cut-off $\alpha$, showing true positives, false positives, true negatives and false negatives.]
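In terms of the counts TN, FP, FN and TP, the realized versions of these proportions can be computed as below. The counts are invented for illustration.

```python
def type_two_summaries(tn, fp, fn, tp):
    """Realized proportions from the multiple testing table."""
    non_significant = tn + fn
    non_null = fn + tp
    significant = fp + tp
    return {
        "FNDR": fn / non_significant if non_significant else 0.0,
        "FNR": fn / non_null if non_null else 0.0,
        "sensitivity": tp / non_null if non_null else 0.0,
        "FDP": fp / significant if significant else 0.0,  # false discovery proportion
    }

# Hypothetical outcome for m = 10000 genes, of which 1000 are non-null.
summary = type_two_summaries(tn=8900, fp=100, fn=400, tp=600)
print({name: round(value, 4) for name, value in summary.items()})
```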

Optimal discovery procedure (Storey, 2007)
Neyman-Pearson (NP) lemma (1933): given observed data, the optimal testing procedure is based on the likelihood ratio
$\frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}$
Storey (2007) applies the NP lemma to the multiple testing situation.
Assume that test $j$ has density $f_j$ under $H_0$ and $g_j$ under $H_1$.
The optimal discovery procedure (ODP) statistic for a gene with observation vector $x$ is defined as
$S_{ODP}(x) = \frac{\text{Sum of } P(x \text{ under } H_1) \text{ for all non-null genes}}{\text{Sum of } P(x \text{ under } H_0) \text{ for all null genes}} = \frac{\sum_{j=m_0+1}^{m} g_j(x)}{\sum_{j=1}^{m_0} f_j(x)}$
The $f_j$s and $g_j$s, as well as $m_0$, must be estimated.

Optimal discovery procedure (Storey, 2007)
The ODP procedure:
1. Evaluate the estimated ODP statistic for each gene.
2. Use bootstrap to simulate data from the null distribution for each gene, and recompute the ODP statistics to get a null distribution for the ODP statistic.
3. Use the observed and resampled ODP statistics to calculate a q-value for each gene.

Covariate modulated FDR (Ferkingstad et al., 2008)
Sensitivity can also be increased by adding external covariates $x_i$, $i = 1, \ldots, m$.
Let $g(p \mid x)$ be the conditional density of $p$ under $H_1$ and $\pi_0(x) = P(H_0 \text{ true} \mid x)$.
The mixture model for p values given $x$ is then
$f(p \mid x) = \pi_0(x) + (1 - \pi_0(x)) g(p \mid x)$.

Sample size assessments (Pawitan et al., 2005)
FDR (and FNR) as a function of sample size.

Sample size assessments (Efron, 2007)
Efron (2007) studied how multiplying the sample size by a factor $c$ would affect the local fdr.

c                  1      1.5    2      2.5    3
Prostate cancer    0.68   0.54   0.44   0.38   0.34
HIV                0.45   0.31   0.23   0.18   0.14

Summary
- Use of classical p values is problematic in large-scale simultaneous hypothesis testing situations, as it easily generates too many false positives.
- For microarrays, the false discovery rate (FDR) is a convenient measure for balancing the number of false positives and false negatives.
- FDR can be calculated using the Benjamini & Hochberg step-up procedure (the "frequentist" approach) or a mixture model approach ("Bayesian" or "empirical Bayesian").
- The mixture model approach has recently been used to avoid the $N(0, 1)$ null distribution assumption, and to include external covariates.
- Methods for power and sample size calculations when controlling significance via FDR have recently been proposed.

References
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. B, 57:289-300.
Benjamini, Y. and Hochberg, Y. (2000). The adaptive control of the false discovery rate in multiple hypotheses testing. J. Educ. Behav. Statist., 25:60-83.
Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist., 29:1165-1188.
Efron, B. (2007). Size, power and false discovery rates. Ann. Statist., 35:1351-1377.
Efron, B. et al. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc., 96:1151-1160.
Ferkingstad, E. et al. (2008). Unsupervised empirical Bayesian multiple testing with external covariates. Ann. Appl. Statist., 2:714-735.

Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist., 32:1035-1061.
Langaas, M. et al. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. J. Roy. Statist. Soc. Ser. B, 67:555-572.
Pawitan, Y. et al. (2005). False discovery rate, sensitivity and sample size for microarray studies. Bioinformatics, 21:3017-3024.
Storey, J. D. (2002). A direct approach to false discovery rates. J. Roy. Statist. Soc. B, 64:479-498.
Storey, J. D. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100:9440-9445.
Storey, J. D. (2007). The optimal discovery procedure: a new approach to simultaneous significance testing. J. Roy. Statist. Soc. B, 69:347-368.