Aliaksandr Hubin University of Oslo Aliaksandr Hubin (UIO) Bayesian FDR / 25

1 Presentation of the paper: "The Positive False Discovery Rate: A Bayesian Interpretation and the q-value", J.D. Storey, The Annals of Statistics, Vol. 31, No. 6 (Dec. 2003), pp
Aliaksandr Hubin, University of Oslo, aliaksah@math.uio.no

2 Overview
  1 Introduction
  2 Multiple Hypothesis Testing
  3 Error Measurements and Control
  4 Bayesian interpretation of pFDR
  5 The q-value
  6 Dependence of test statistics and asymptotic properties
  7 A connection to classification theory
  8 An application to DNA microarrays in a Bayesian framework
  9 Conclusions
  10 Discussion

3 Introduction
Single hypothesis testing aims to minimize the Type II error while keeping the Type I error controlled at some positive level $\alpha$;
In multiple hypothesis testing, controlling each test individually increases the number of both false positives and false negatives;
Measures such as the FWER, $\Pr\{V \ge 1\}$, and the FDR, $E\{V/R\}$, have been suggested to measure the number of false positives;
A number of methods to control the FWER and/or the FDR have been suggested: the Bonferroni method, the Benjamini-Hochberg procedure, etc.;
This is my very first time using LaTeX and I have tried to play around with different features: please do not judge my formatting too strictly.

4 Possible outcomes from $m$ hypothesis tests

                    Accept null   Reject null   Total
Null true           U             V             $m_0$
Alternative true    T             S             $m_1$
Total               W             R             $m$

Table 1

5 List of measures of the Type I error level and their drawbacks

Controlled measures:
  1 $\Pr\{V > 0\}$
  2 $E\{V/R\}$
  3 $E\{V/R \mid R > 0\} \Pr\{R > 0\}$
  4 $E\{V/R \mid R > 0\}$
  5 $E\{V\} / E\{R\}$

Drawbacks:
  1 Significant decrease in the power of the $m$ tests
  2 Not defined when $R = 0$
  3 Of little interest in cases when all cases are significant
  4 Equals 1 when $m = m_0$, whereas $\alpha \in (0, 1)$
  5 Equals 1 when $m = m_0$, whereas $\alpha \in (0, 1)$

The author nevertheless chooses $E\{V/R \mid R > 0\}$ as the measure to be controlled, calls it the pFDR (positive false discovery rate), and argues that such a measure should be defined only when at least one rejection occurs. The author also claims that it makes sense for the measure to equal one when $m = m_0$, but gives neither a practical nor a theoretical reason for that.
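The relation between measures 3 and 4, and the behaviour of the pFDR when $m = m_0$, can be checked with a small Monte Carlo experiment. The sketch below is only illustrative: the function name `simulate_error_rates`, the fixed rejection rule "p-value at most alpha", and the choice of drawing alternative p-values as $U^4$ are assumptions made here, not something taken from the slides or the paper.

```python
import random

def simulate_error_rates(m, m0, alpha, n_rep=2000, seed=0):
    """Monte Carlo estimates of (FDR, pFDR, Pr{R > 0}) for m independent tests.

    Assumptions for illustration only: null p-values are Uniform(0, 1),
    alternative p-values are drawn as U**4 (skewed towards 0), and a test
    is rejected when its p-value is <= alpha.
    """
    rng = random.Random(seed)
    sum_ratio = 0.0   # accumulates V/R over replications with R > 0
    n_positive = 0    # number of replications with R > 0
    for _ in range(n_rep):
        v = sum(rng.random() <= alpha for _ in range(m0))           # false positives V
        s = sum(rng.random() ** 4 <= alpha for _ in range(m - m0))  # true positives S
        r = v + s                                                   # rejections R
        if r > 0:
            sum_ratio += v / r
            n_positive += 1
    pr_r_positive = n_positive / n_rep
    pfdr = sum_ratio / n_positive if n_positive else float("nan")
    fdr = sum_ratio / n_rep  # V/R is counted as 0 whenever R = 0
    return fdr, pfdr, pr_r_positive

# All nulls true (m = m0): every rejection is false, so the pFDR estimate is 1.
fdr_null, pfdr_null, pr_null = simulate_error_rates(m=5, m0=5, alpha=0.05)
# Mixed case: the FDR estimate equals the pFDR estimate times the Pr{R > 0} estimate.
fdr_mix, pfdr_mix, pr_mix = simulate_error_rates(m=20, m0=15, alpha=0.05, n_rep=500, seed=1)
```

In the all-null run the estimated FDR coincides with the estimated $\Pr\{R > 0\}$, which is the sense in which measure 4 cannot be held below a level $\alpha < 1$ when $m = m_0$.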

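6 One should be careful when controlling the pFDR by means of the Benjamini and Hochberg procedure

Procedure:
$\hat{k} = \operatorname{argmax}_{1 \le k \le m} \{k : p_{(k)} \le \alpha \frac{k}{m}\}$, where $p_{(i)} \le p_{(i+1)}$, $i \in [1, m-1] \cap \mathbb{Z}$.
Reject all $H_{0(i)}$, $i \le \hat{k}$, where $H_{0(i)}$ is the null hypothesis corresponding to $p_{(i)}$.

Note that the Benjamini-Hochberg procedure controls the FDR (measure 3) at level $\alpha$, which is not (!!!) the same as controlling the pFDR at level $\alpha$: since $\mathrm{FDR} = \mathrm{pFDR} \cdot \Pr\{R > 0\}$, it only guarantees $\mathrm{pFDR} \le \alpha / \Pr\{R > 0\}$.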
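The step-up rule on this slide can be sketched in a few lines of Python. This is a minimal illustration written for this transcript; the function name `benjamini_hochberg` and the example p-values are made up here and do not come from the presentation.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the sorted original indices of the hypotheses rejected by the
    Benjamini-Hochberg step-up rule at level alpha."""
    m = len(p_values)
    # Indices of the hypotheses ordered by ascending p-value: p_(1) <= ... <= p_(m).
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for k, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * k / m:
            k_hat = k  # keep the largest k with p_(k) <= alpha * k / m
    # Reject the k_hat hypotheses with the smallest p-values.
    return sorted(order[:k_hat])

# Hypothetical p-values: the two smallest fail their own thresholds, but the
# third passes, so the step-up rule rejects all three.
rejected = benjamini_hochberg([0.02, 0.03, 0.035, 0.9], alpha=0.05)
```

The example is chosen to show the step-up character of the rule: a hypothesis can be rejected even though its own p-value exceeds its threshold, provided a larger ordered p-value meets its threshold.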

7 Bayesian interpretation of pFDR

8 p-value and q-value definitions

p-value:
$\text{p-value}(t) = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{T \in \Gamma_\alpha \mid H_0\} = \Pr\{T \ge t \mid H_0\}$
The p-value is the Type I error incurred when rejecting every hypothesis whose statistic is equal to or more extreme than $t$; in other words, it is the minimal Type I error over all significance regions that reject a statistic with value $t$.

q-value:
$\text{q-value}(t) = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \mathrm{pFDR}\{\Gamma_\alpha\} = \inf_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{H_0 \mid T \in \Gamma_\alpha\} = \mathrm{pFDR}\{T \ge t\} = \Pr\{H_0 \mid T \ge t\}$
The q-value is the pFDR incurred when rejecting every hypothesis whose statistic is equal to or more extreme than $t$; in other words, it is the minimal pFDR over all significance regions that reject a statistic with value $t$.
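To make the two definitions concrete, the sketch below evaluates $\Pr\{T \ge t \mid H_0\}$ and $\Pr\{H_0 \mid T \ge t\}$ for a toy two-component mixture. Everything specific in it is an assumption chosen for illustration: the null distribution N(0, 1), the alternative N(2, 1), the prior null probability `pi0 = 0.8`, and the function names.

```python
from statistics import NormalDist

NULL = NormalDist(0.0, 1.0)  # assumed distribution of T under H0
ALT = NormalDist(2.0, 1.0)   # assumed distribution of T under H1

def p_value(t):
    """Pr{T >= t | H0} for the one-sided region {T >= t}."""
    return 1.0 - NULL.cdf(t)

def q_value(t, pi0):
    """Pr{H0 | T >= t}, obtained from Bayes' theorem for the mixture
    pi0 * NULL + (1 - pi0) * ALT."""
    tail0 = 1.0 - NULL.cdf(t)  # Pr{T >= t | H0}
    tail1 = 1.0 - ALT.cdf(t)   # Pr{T >= t | H1}
    return pi0 * tail0 / (pi0 * tail0 + (1.0 - pi0) * tail1)

# Compare the two quantities at a few cutoffs.
for t in (0.0, 1.0, 2.0, 3.0):
    print(f"t = {t:.1f}  p-value = {p_value(t):.4f}  q-value = {q_value(t, pi0=0.8):.4f}")
```

The p-value conditions on the null being true, whereas the q-value conditions on the statistic landing in the rejection region, which is why the latter depends on the prior probability `pi0` and on the alternative distribution.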

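9 q-value minimization in terms of Type I error and power
Note that
$\operatorname{argmin}_{\Gamma_\alpha : t \in \Gamma_\alpha} \mathrm{pFDR}\{\Gamma_\alpha\} = \operatorname{argmin}_{\Gamma_\alpha : t \in \Gamma_\alpha} \Pr\{H_0 \mid T \in \Gamma_\alpha\} = \operatorname{argmin}_{\Gamma_\alpha : t \in \Gamma_\alpha} \frac{\Pr\{T \in \Gamma_\alpha \mid H_0\}}{\Pr\{T \in \Gamma_\alpha \mid H_1\}} = \operatorname{argmin} \frac{G_0(\alpha)}{G_1(\alpha)}$
where $G_0(\alpha) = \Pr\{T \in \Gamma_\alpha \mid H_0\}$ is the Type I error and $G_1(\alpha) = \Pr\{T \in \Gamma_\alpha \mid H_1\}$ is the power.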
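One way to see why minimizing the pFDR amounts to minimizing the ratio of the Type I error to the power is to apply Bayes' theorem to $\Pr\{H_0 \mid T \in \Gamma_\alpha\}$. The derivation below is a sketch that assumes the mixture setting with prior probabilities $\pi_0 = \Pr\{H_0\}$ and $\pi_1 = 1 - \pi_0$, both strictly between 0 and 1:

```latex
\begin{align*}
\mathrm{pFDR}\{\Gamma_\alpha\}
  &= \Pr\{H_0 \mid T \in \Gamma_\alpha\}
   = \frac{\pi_0 \Pr\{T \in \Gamma_\alpha \mid H_0\}}
          {\pi_0 \Pr\{T \in \Gamma_\alpha \mid H_0\} + \pi_1 \Pr\{T \in \Gamma_\alpha \mid H_1\}} \\
  &= \left[ 1 + \frac{\pi_1}{\pi_0} \cdot
       \frac{\Pr\{T \in \Gamma_\alpha \mid H_1\}}{\Pr\{T \in \Gamma_\alpha \mid H_0\}} \right]^{-1}
\end{align*}
```

The last expression is increasing in the ratio $\Pr\{T \in \Gamma_\alpha \mid H_0\} / \Pr\{T \in \Gamma_\alpha \mid H_1\}$, so the region that minimizes this ratio also minimizes the pFDR.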

10 Relations between p-value and q-value for concave $G_1(\alpha)$
[Figure 1]

11 pFDR transformation of p-value to $\Gamma_\alpha$
This theorem says that, through the pFDR, the space of p-values can be transformed into the space of significance regions if and only if the power function increases more slowly than the Type I error, which is its argument; in other words, if and only if the power function is concave.

12 Generalization of Theorem 1
As one can see, Theorem 1 is not valid for both of these settings.

13 Asymptotic properties of FDR-controlling measures

14 Asymptotic properties of FDR-controlling measures
Where the following equations define asymptotic frequency-based analogues of the Type I error and the power:
Thus, Theorem 4 says that if $G_0$, $G_1$ and $\pi_0$ can be calculated, then for sufficiently large $m$ they provide good approximations for all three FDR-controlling measures.

15 Practical example of such convergence

16 Relation to classification theory
FNR: $\mathrm{FNR} = E\{T/W \mid W > 0\} \Pr\{W > 0\}$
pFNR: $\mathrm{pFNR} = E\{T/W \mid W > 0\}$

17 Bayes misclassification error $BE(\Gamma)$
$BE(\Gamma) = (1 - \lambda) \Pr\{T_i \in \Gamma, H_i = 0\} + \lambda \Pr\{T_i \notin \Gamma, H_i = 1\}$

                    Classify $H_i$ as 0   Classify $H_i$ as 1
Null true           0                     $1 - \lambda$
Alternative true    $\lambda$             0

Table 2. Outcomes of classification with the corresponding penalties
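The Bayes error on this slide can be evaluated directly for a simple one-sided region $\Gamma = \{t \ge c\}$. The sketch below assumes a toy model that is not part of the presentation: $T \sim N(0, 1)$ under the null, $T \sim N(\mu, 1)$ with $\mu > 0$ under the alternative, and prior null probability `pi0`. The function names are likewise hypothetical. Under these assumptions the cutoff minimizing the error is the point where the posterior null probability equals $\lambda$.

```python
import math
from statistics import NormalDist

def bayes_error(c, lam, pi0, mu):
    """BE(Gamma) for Gamma = {t >= c} under the assumed normal mixture."""
    null, alt = NormalDist(0.0, 1.0), NormalDist(mu, 1.0)
    false_pos = pi0 * (1.0 - null.cdf(c))  # Pr{T in Gamma, H = 0}
    false_neg = (1.0 - pi0) * alt.cdf(c)   # Pr{T not in Gamma, H = 1}
    return (1.0 - lam) * false_pos + lam * false_neg

def posterior_null(t, pi0, mu):
    """Pr{H = 0 | T = t} under the assumed normal mixture."""
    f0 = NormalDist(0.0, 1.0).pdf(t)
    f1 = NormalDist(mu, 1.0).pdf(t)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)

def bayes_cutoff(lam, pi0, mu):
    """Cutoff c solving Pr{H = 0 | T = c} = lam.

    Valid for mu > 0, 0 < lam < 1 and 0 < pi0 < 1; it follows from
    f1(c) / f0(c) = exp(mu * c - mu**2 / 2) for unit-variance normals.
    """
    return mu / 2.0 + math.log(pi0 * (1.0 - lam) / (lam * (1.0 - pi0))) / mu

# Hypothetical settings: equal penalties, 80% true nulls, alternative mean 2.
c_star = bayes_cutoff(lam=0.5, pi0=0.8, mu=2.0)
```

Moving the cutoff away from `c_star` in either direction increases `bayes_error`, which illustrates why the Bayes rule rejects exactly those statistics whose posterior null probability is at most $\lambda$.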

18 Bayesian interpretation of pFNR

19 Trade-off between different mixed error measures
Where the set $B_\lambda$, $\lambda \in [0, 1]$, defines the Bayes rule for the cost matrix given by Table 2.

20 Practical application to DNA microarrays
Performed steps and achieved results:
  1 $T_i \mid H_i \sim (1 - H_i) F_0 + H_i F_1$;
  2 $\Pr\{H_i = 0 \mid T_i = t_i\} = \frac{\pi_0 f_0(t_i)}{\pi_0 f_0(t_i) + \pi_1 f_1(t_i)}$ is estimated by $\widehat{\Pr}\{H_i = 0 \mid T_i = t_i\}$;
  3 $\hat{B}_\lambda = \{t : \widehat{\Pr}\{H = 0 \mid T = t\} \le \lambda\}$;
  4 $\lambda$ is chosen to be 0.10;
  5 $\mathrm{pFDR}\{\hat{B}_{0.10}\} = \Pr\{H = 0 \mid T \in \hat{B}_{0.10}\}$;
  6 $\hat{q}\text{-value}(t_i) = \widehat{\Pr}\{H_i = 0 \mid T_i \in \hat{B}_{\widehat{\Pr}\{H_i = 0 \mid T_i = t_i\}}\}$
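The steps above can be sketched in Python under strong simplifying assumptions. The densities `f0` and `f1` and the proportion `pi0` are treated as known here, whereas in the application they are estimated from the data. The estimate of $\Pr\{H = 0 \mid T \in B\}$ used below, the average posterior null probability of the observed statistics falling in $B$, is a plug-in choice made for this sketch and is not claimed to be the estimator used in the paper. All names and numbers are hypothetical.

```python
from statistics import NormalDist

def posterior_null(t, pi0, f0, f1):
    """Step 2: Pr{H = 0 | T = t} for the mixture pi0 * f0 + (1 - pi0) * f1."""
    return pi0 * f0(t) / (pi0 * f0(t) + (1.0 - pi0) * f1(t))

def estimate_q_values(stats, pi0, f0, f1):
    """Steps 2, 3 and 6: posterior null probabilities and plug-in q-values."""
    post = [posterior_null(t, pi0, f0, f1) for t in stats]
    q_hat = []
    for p_i in post:
        # Observed statistics inside B_lambda with lambda = p_i (never empty,
        # because the statistic itself always belongs to its own region).
        region = [p_j for p_j in post if p_j <= p_i]
        # Assumed plug-in estimate of Pr{H = 0 | T in B_lambda}.
        q_hat.append(sum(region) / len(region))
    return post, q_hat

# Hypothetical statistics and an assumed N(0, 1) null and N(3, 1) alternative.
stats = [-0.5, 0.3, 1.0, 2.5, 3.1, 4.0]
post, q_hat = estimate_q_values(stats, pi0=0.8,
                                f0=NormalDist(0.0, 1.0).pdf,
                                f1=NormalDist(3.0, 1.0).pdf)
# Steps 3 and 4: the region B_0.10 collects statistics with posterior <= 0.10.
significant = [i for i, p in enumerate(post) if p <= 0.10]
```

Because each estimated q-value averages posterior probabilities that are no larger than the statistic's own posterior, it never exceeds that posterior, which mirrors the relation between a region-based and a pointwise error measure.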

21 Conclusions

22 Discussion of stupid (???) stuff
Multiple Type I error measure:
$A_Y(\Gamma_\alpha, \theta_{H_0}) = \Pr\{V > Y\} = 1 - F_{\mathrm{bin}(\Pr(H_1 \mid H_0))}(Y)$, $Y = r_1 N$
Multiple Type II error measure:
$B_Z(\Gamma_\alpha, \theta_{H_1}) = \Pr\{T > Z\} = 1 - F_{\mathrm{bin}(\Pr(H_0 \mid H_1))}(Z)$, $Z = r_2 N$
Bayesian rule:
$\{\Gamma_\alpha, \theta_{H_0}, \theta_{H_1}\} = \operatorname{argmin}_{\Gamma_\alpha, \theta_{H_0}, \theta_{H_1}} \{\lambda_1 B_Z(\Gamma_\alpha, \theta_{H_1}) + \lambda_2 A_Y(\Gamma_\alpha, \theta_{H_0})\}$

23 Discussion of stupid (???) stuff
Multiple p-value:
$P(t_1, \ldots, t_n)_Y = \inf_{\Gamma_\alpha : \{\tau_1, \ldots, \tau_Y\} \in \Gamma_\alpha} \Pr\{\{\tau_1, \ldots, \tau_Y\} \in \Gamma_\alpha, \{t_1, \ldots, t_n\} \setminus \{\tau_1, \ldots, \tau_Y\} \notin \Gamma_\alpha \mid H_0\}$, $\{\tau_1, \ldots, \tau_Y\} \subseteq \{t_1, \ldots, t_n\}$

24 References
J.D. Storey (2003). The Positive False Discovery Rate: A Bayesian Interpretation and the q-value. The Annals of Statistics 31(6).

25 The End. Thank you for your attention!
