Non-specific filtering and control of false positives

Size: px

Start display at page:

Download "Non-specific filtering and control of false positives"

Gerald Oscar Summers
5 years ago
Views:

1 Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 EBI is an outstation of the European Molecular Biology Laboratory

2 Outline Multiple testing I: overview Genome-wide experiments Why adjustment is necessary Experiment-wide type I error rates Uniformity and a mixture of p-values Non-specific filtering The ALL data: B-cell subtypes Non-specific filtering to increase rejections Multiple testing II: impact of filtering Non-specific filtering: are we controlling error rate? Independence Implications for FWER and FDR Slide 2

3 Multiple testing I Slide 3

Differential expression testing Acute lymphocytic leukemia (ALL) data Chiaretti et al., Clinical Cancer Research 11:7209, 2005.

4 Differential expression testing Acute lymphocytic leukemia (ALL) data Chiaretti et al., Clinical Cancer Research 11:7209, Immunophenotypic analysis of cell surface markers identified T-cell derivation in 33 samples. B-cell derivation in 95 samples. Affymetrix HG-U95Av2 oligonucleotide arrays with ~ 13,000 probe sets. Chiaretti et al. identified 792 differentially expressed genes, with sufficient levels of expression and variation across groups. Clustered expression data for all 128 subjects, and a subset of 475 genes showing evidence of differential expression between groups Slide 4

5 The traditional type I error rate The traditional p-value for a single gene: Define a test statistic T based on the expression data. Compute its value, t, for the observed data. Define p = P(T > t the gene is not differentially expressed). Compare p to an acceptable type I (false positive) error rate. Suppose Chiaretti et al. had compared replicate samples, so that no gene was differentially expressed. Comparing p-values to =.05 gives and average of 13, = 650 false positives. Per-family error rate (PFER): the expected number of false positives. For the real data, PFER 650. Slide 5

6 Experiment-wide type I error rates Not rejected Rejected Total True null hypotheses U V m 0 False null hypotheses T S m 1 Total m R R m 0 Family-wise error rate: P(V > 0), i.e., the probability of one or more false positives. For large m 0, this is very difficult to keep small. False discovery rate (FDR): let Q = V/R, or 0 if R is 0. The FDR is E(Q), i.e., the expected fraction of false positives among all discoveries. Slide 6

7 A nice property of continuous random variables For a continuous random variable X, P(X = x) is 0 for any x. Continuous Gaussian Exponential Discrete Binomial Geometric The distribution function: Standard normal distribution function F(x) = P(X x). The nice property: if we define random U = F(X), then U is uniformly distributed on the unit interval [0,1]. P(X x) x Slide 7

of X Histogram of F(X) Frequency 0 2000 4000 6000 8000 Frequency 0

8 A nice property of CDFs for continuous RVs > X = rnorm(100000) > F = pnorm > hist(x, breaks = 50) > hist(f(x), breaks = 50) Histogram of X Histogram of F(X) Frequency Frequency X F(X) Slide 8

9 A nice property? To compute a p-value for testing a null hypothesis H 0, we typically Define a test statistic T, and compute its value t for the observed data. Assume we know the distribution of T when H 0 is true: F 0. Compute p = 1 F 0 (t), i.e., define p = P(T > t H 0 is true). Compare p to some. Now define the random variable P = 1 F 0 (T). If H 0 is true, then F 0 (T) is uniformly distribution on [0,1]. By symmetry, P is uniformly distribution on [0,1] as well. Suppose 20% of genes are differentially expressed, so that 0 = m 0 m = Slide 9

10 Observed p-values: a mixture A. F 0 C. Mixture density 0 F F ( 0 = 0.8) B. F Density II III I IV False negative True positive False positive True negative p Slide 10

11 Observed p-values: a mixture A. F 0 C. Mixture density 0 F F ( 0 = 0.8) B. F Density p Slide 11

12 Non-specific filtering Slide 12

13 Continuing with the Chiaretti et al. ALL data 79 subjects with B-cell ALL: 37 with the BCR/ABL mutation ( Philadelphia chromosome ) 42 with no observed cytogenetic abnormalities We ve seen that Multiple testing correction becomes more extreme for larger m. Removal of true null hypotheses reduces the FDR associated with a particular p- value cutoff. Proposal: non-specific filtering von Heydebreck, Huber and Gentleman, Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. Wiley, McClintick and Edenberg, BMC Bioinformatics 7:49, Slide 13

14 Non-specific filtering For a given gene, write the data as ((c 1,Y 1 ),,(c p,y p )). First group (c = 1): i = 1,, p 1. First group (c = 2): i = p 1 + 1,, p 1 + p 2. Conditions under which we expect little variation in Y: 1. Genes which are absent in both samples. (Probes will still report noise and crosshybridization, typically at the same level in both groups.) 2. Probe sets which do not respond to target. 3. Genes which are not differentially expressed. A non-specific filter: Ignores c 1,, c p, i.e., U (I) (Y). Helps identify any of these three classes, based on our a priori understanding of array behavior. Apply standard testing to genes passing the filter, using some U (II) (c,y). Slide 14

15 Increased rejection rate Stage one non-specific filter statistic: compute the overall variance U (I) (Y) = S 2 = 1 and remove the smallest. Stage two: standard two-sample t-test for genes passing stage 1. p 1 p (Y i Y ) 2 i=1 a b R = 0.5 = 0.4 = 0.3 = 0.2 = 0.1 = 0 R FDR (BH) q value Slide 15

16 Increased power? An increased detection rate implies increased power only if we are still controlling type I errors at the nominal level. a b R = 0.5 = 0.4 = 0.3 = 0.2 = 0.1 = 0 R FDR (BH) q value Slide 16

17 Multiple testing II: impact of filtering Slide 17

18 Slide 18 Notation ahead!

19 Conditional control Random variable definitions: V: number of false positives. FWER = P(V > 0). R: total number of rejections. Q: V / max{r, 1}. FDR = E(Q). Note that R = 0 implies that V = 0. M: the random set (of size M) of hypothesis passing the stage-one filter. U (I) and U (II) : the stage-one and stage-two test statistics. Conditional control is sufficient: E(Q) = E(E(Q M)). Because we only reject at stage two, given M, we can assess conditional control of type I error by considering how a procedure performs when applied to the M conditionally distributed (U (II) I1,,U (II) IM ). M={I1,,I M } Thus, FWER or FDR control is achieved if the conditional distributions of the stage-two statistics meet requirements for the control procedure. Slide 19

20 Requirements for FWER and FDR control Marginal properties of true-null test statistics. Distributions must be properly specified. Joint properties of all test-statistics. Adjustment procedures may work for arbitrary dependence structure (e.g., Bonferroni and Holm). estimate and correct for dependence (e.g., Westfall and Young). require that dependence structure satisfies some assumptions. Assumptions about dependence structure: Independence! Subset pivotality. Positive regression dependence. Convergence of the empirical processes based on V( ) and R( ). Etc. Slide 20

21 Independence of stage one and stage two test statistics For genes for which the null hypotheses is true, U (I) and U (II) are statistically independent in both of the following cases: For normally distributed data: Stage one: overall mean,, or variance, S 2 = 1 p Y = 1 p p 1 (Y i Y ) 2. p Y i=1 i i=1 Stage two: the standard two-sample t-statistic, or any location- and scaleinvariant test statistic. Non-parametrically: Stage one: any function of the data which (i) to filter gene g only uses data from gene g, and (ii) doesn t depend on the order of the arguments. S 2 above, or the IQR, are both candidates. Stage two: the Wilcoxon rank sum test statistic. Both can be extended to the multi-class context: ANOVA and Kruskal Wallis. Slide 21

22 Independence: Bonferroni and Holm FWER adjustments Single-stage Bonferroni correction compares each p i to /m. The Holm step-down procedure is a more powerful variant. For ordered p- values p (1) p (2) p (m), reject as long as Independence of U (I) and U (II) implies that the key step in proving that these procedures control FWER still applies: ( P(V > 0) = P {U (I) > a,u (II) ) i 0 i i > b} p (i ) < i 0 i 0 P(U (I) i > a,u i (II) > b) = P(U (I) i > a)p(p i < ) = E(M). m i + 1. Slide 22

23 FWER: Westfall and Young Westfall and Young (1993) controls FWER with more power, but depends on the joint distribution of all p-values: ( ). p i = P min 1 j m P j p i H 0 C WY93 is valid under subset pivotality. If this holds for the one-stage procedure, it holds for the two-stage non-specific filtering approach as well. H 0 C Distribution of min P j under is typically estimated by permutation. If filtering changes correlation structure, new structure is used by permutation! Slide 23

24 FDR: Benjamini & Hochberg and Storey adjustments What is the FDR associated with use of cutoff? Naive estimator: V is not observable, but E(V) is m 0, which is bounded by m. E(R) cannot be computed, but R can be used as an estimator. FDR ( ) = Evaluating at each p (i) using m or ˆm 0 gives BH95 or Storey adjustments, respectively: FDR (p(i ) ) = m 0 #{i : p i } m #{i : p i } m 0 #{i : p i p (i ) } = m 0. i density II I III IV FDR( ) = E V( ) R( ) 1. False negative True positive False positive True negative p Slide 24

25 FDR: Benjamini & Hochberg and Storey adjustments The foregoing motivation for the BH95 and Storey procedures uses E(V( )) = m 0. FDR( ) = E V( ) R( ) 1. Marginal independence of true null U (I) and U (II) means that this still applies at stage two in expectation. Define M 0 to be the random number of true nulls passing stage 1. Then E(V( )) = P(U (I) i > a, P i i < ) 0 = P(U (I) i > a)p( P i < ) i 0 = E(M 0 ). density II I III IV False negative True positive False positive True negative p Slide 25

26 Counterexamples Slide 26

27 Counterexample #1: normality matters Filter on S 2, test using T. For true null hypotheses, data are i.i.d. normal with probability p, but heavily skewed with probability 1 p. Let latent X indicate mixture component identity. Frequency ^ X = ^ X = 1 Frequency T(Y) X = T(Y) X = 1 The filter statistic is now predictive for X. The conditional distribution of T will be more weighted towards the X = 1 case than the unconditional distribution. Frequency Frequency Slide 27

This is more pronounced for small n precisely the context in which limma is most useful.

28 Counterexample #2: the limma t-statistic Filter on S 2, test using the limma moderated t-statistic. The moderated t-statistic is not scale invariant, due to the effect of the global variance estimator. This is more pronounced for small n precisely the context in which limma is most useful. Filtering on overall mean rather than variance solves the problem. t Moderated t and overall variance S Slide 28

29 Counterexample #2: the limma t-statistic Unconditional p values Conditional p values Frequency Frequency fit$p.value[, 2] fit$p.value[s > median(s), 2] Slide 29

30 Conclusions In actual examples, use of an independent filter leads to (biologically) significant increases in the number of genes identified. Some commonly used stage one/stage two test statistic pairs are statistically independent for genes which are not differentially expressed but others are not! The non-specific criterion is not enough. Given this independence, Bonferroni and Holm FWER control is valid in the two-stage procedure. Likewise for FDR-controlling procedures which make no assumptions about independence. Correlation structure may change under filtering. Permutation-based Westfall and Young correction accounts for this. FDR control could theoretically suffer, although the practical impact is likely to be small. Effect of filtering on correlation can also be checked, and impact, assessed. Slide 30

Advanced Statistical Methods: Beyond Linear Regression

Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi