Hunting for significance with multiple testing

1 Hunting for significance with multiple testing
Etienne Roquain, Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France
Séminaire MODAL'X, 19 May 2016
Etienne Roquain, Hunting for significance with multiple testing, 1 / 38

2 Outline: 1 Introduction; 2 Classical results; 3 FDP control; 4 Posthoc inference

3 "A multiple testing story"
[Illustration not preserved]

6 Shadoks motto
[Illustration not preserved]

7 False positive in practice
Abstract of [Giovannucci et al. (1995)]: "Between 1986 and 1992, 812 new cases of prostate cancer [...] were documented. Of 46 vegetables and fruits or related products, four were significantly associated with lower prostate cancer risk; of the four, tomato sauce (P for trend = .001), tomatoes (P for trend = .03), and pizza (P for trend = .05), but not strawberries, were primary sources of lycopene. Combined intake of tomatoes, tomato sauce, tomato juice, and pizza (which accounted for 82% of lycopene intake) was inversely associated with risk of prostate cancer (multivariate RR = 0.65; 95% CI = [...], for consumption frequency greater than 10 versus less than 1.5 servings per week; P for trend = .01) and advanced (stages C and D) prostate cancers (multivariate RR = 0.47; 95% CI = 0.22-1.00; P for trend = .03)."
Should we eat pizza to avoid prostate cancer?

8 Massive and complex datasets
Gene expressions: interesting genes?
Genome-wide association study: interesting SNPs?
Copy number alterations: interesting probes?

9 Massive and complex datasets
Neuroimaging (fMRI): activated regions?
Neuroscience: periods for neuronal response?
Astronomy: directions with stars?

10 Single testing
Data: $X = (X_1, \dots, X_n)$ i.i.d. $\mathcal{N}(\mu, 1)$, $\mu \in \mathbb{R}$ unknown.
Null hypothesis $H_0$: "$\mu \le 0$" against alternative $H_1$: "$\mu > 0$".
Test statistic: $T(X) = n^{-1/2} \sum_{i=1}^n X_i$; p-value $p(X) = \bar\Phi(T(X))$, where $\bar\Phi$ denotes the upper tail function of $\mathcal{N}(0,1)$.
Examples: $T = 0.5$ gives $p = 0.31$; $T = 2$ gives $p \approx 0.02$.
Test of level $\alpha$: reject $H_0$ if $p(X) \le \alpha$.
Risk under $H_0$: $P(p(X) \le \alpha) \le \alpha$.
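
The single test above can be sketched in a few lines of Python. This is only an illustration under the slide's Gaussian model; the function names `upper_tail` and `one_sided_p_value` are hypothetical, and the upper tail of N(0,1) is computed with the standard library's `math.erfc`.

```python
import math

def upper_tail(x):
    """Standard normal upper tail P(N(0,1) >= x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def one_sided_p_value(sample):
    """p-value for H0: mu <= 0 against H1: mu > 0, from i.i.d. N(mu, 1) observations."""
    n = len(sample)
    t_stat = sum(sample) / math.sqrt(n)  # T(X) = n^{-1/2} * sum_i X_i
    return upper_tail(t_stat)

# The two statistic values quoted on the slide
print(round(upper_tail(0.5), 2))  # 0.31
print(round(upper_tail(2.0), 3))  # 0.023
```

A test of level alpha then rejects whenever the returned p-value is at most alpha.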

11 Multiple testing
Pure noise, $m$ independent tests, each at level 0.05: at least one false positive occurs with probability $1 - (1 - 0.05)^m$.
For $m = 48$ this probability is about 0.91; for $m = 192$ it is about 0.9999.
[Figures not preserved]
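
The probability of at least one false positive can be checked numerically. The sketch below assumes independent tests with uniform null p-values, each run at level 0.05; the function name is hypothetical.

```python
def prob_at_least_one_false_positive(m, alpha=0.05):
    """P(at least one of m independent true nulls is rejected at level alpha)."""
    return 1.0 - (1.0 - alpha) ** m

for m in (1, 48, 192):
    print(m, prob_at_least_one_false_positive(m))
```

With a single test the risk is the nominal 0.05, but it is already above 0.9 for 48 tests.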

13 Canonical setting
$X_i$ = (average of group 2) - (average of group 1), rescaled, for item $i$, $1 \le i \le m$.
Gaussian model: $X \sim \mathcal{N}(\mu, \Gamma)$ on $\mathbb{R}^m$.
Hypotheses: $H_{0,i}$: "$\mu_i \le 0$" vs $H_{1,i}$: "$\mu_i > 0$", $1 \le i \le m$.
Parameter of interest: $\theta_i = \mathbf{1}\{\mu_i > 0\}$, $1 \le i \le m$.
p-values: $p_i = \bar\Phi(X_i)$, $1 \le i \le m$.
Dependence: covariance matrix $\Gamma$ (known or unknown). [Figure not preserved: three example covariance matrices, with row and column axes]
Aim: recover $\theta$ from $(p_i(X), 1 \le i \le m)$.

14 General setting
Data $X \in (\mathcal{X}, \mathfrak{X})$ with $X \sim P \in \mathcal{P}$ (model).
$H_{0,i}$: "$P \in \mathcal{P}_{0,i}$", $1 \le i \le m$: null hypotheses for $P$.
True/false label $\theta = \theta(P) \in \{0, 1\}^m$ such that $\theta_i = 0$ if and only if $H_{0,i}$ is true for $P$.
Assume p-values $(p_i(X), 1 \le i \le m)$ such that:
if $\theta_i = 0$, $p_i(X) \sim U(0, 1)$;
if $\theta_i = 1$, the distribution of $p_i(X)$ is left arbitrary (typically "small").
Aim: recover $\theta$ from $(p_i(X), 1 \le i \le m)$.

15 Thresholding
[Figure not preserved: p-values $p_i$ against index $i$, with a threshold $t$]
Rejection set $R = \{1 \le i \le m : p_i(X) \le t\}$.
Some false positives.

18 Ordering p-values
[Figure not preserved: p-values $p_i$ against index $i$, then ordered p-values $p_{(\ell)}$ against rank $\ell$]
Stopping rule: reject the $\hat\ell$ smallest p-values.

22 Choosing procedure? "Quality" criterion?

23 Outline: 1 Introduction; 2 Classical results (current section); 3 FDP control; 4 Posthoc inference

24 Family-wise error rate (FWER)
For a threshold $t$: $V(t) = \sum_{i=1}^m (1 - \theta_i) \mathbf{1}\{p_i \le t\}$ and $\mathrm{FWER}(t, P) = P(V(t) \ge 1) = P(p_{(1:\mathcal{H}_0)} \le t)$, where $p_{(1:\mathcal{H}_0)}$ is the smallest p-value among the true nulls.
FWER control: $\alpha$ given, find $t = t_\alpha$ with $\forall P \in \mathcal{P}$, $\mathrm{FWER}(t, P) \le \alpha$.
Clear interpretation.
Bonferroni: $t = \alpha/m$ (union bound).
Possible refinements: $X \sim \mathcal{N}(\mu, \Gamma)$ (one-sided), $\Gamma$ known. Then $V(t) \le \bar V(t) = \sum_{i=1}^m \mathbf{1}\{\bar\Phi(Z_i) \le t\}$ with $Z \sim \mathcal{N}(0, \Gamma)$.
Choose $t$ such that $P_{Z \sim \mathcal{N}(0,\Gamma)}(\bar\Phi(Z)_{(1:m)} \le t) \le \alpha$, where $\bar\Phi(Z)_{(1:m)} = \min_{1 \le i \le m} \bar\Phi(Z_i)$.
Relaxation to k-FWER.
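
The refinement for known $\Gamma$ can be approximated by Monte Carlo. The sketch below is an illustration, not the speaker's implementation: it assumes an equicorrelated $\Gamma$ with correlation `rho`, and the function names, the number of simulations and the seed are all hypothetical choices.

```python
import math
import random

def upper_tail(x):
    """Standard normal upper tail P(N(0,1) >= x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def min_null_p_value(m, rho, rng):
    """min_i upper_tail(Z_i) for one draw Z ~ N(0, Gamma), Gamma equicorrelated (assumption)."""
    shared = rng.gauss(0.0, 1.0)
    z_max = max(
        math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        for _ in range(m)
    )
    # upper_tail is decreasing, so the largest Z_i gives the smallest p-value
    return upper_tail(z_max)

def calibrated_threshold(m, rho, alpha, n_sim=2000, seed=0):
    """Threshold t whose simulated probability P(min_i p_i <= t) is at most alpha."""
    rng = random.Random(seed)
    minima = sorted(min_null_p_value(m, rho, rng) for _ in range(n_sim))
    k = int(alpha * n_sim)  # number of simulated minima allowed at or below t
    return minima[k - 1] if k >= 1 else 0.0

m, alpha = 50, 0.2
print("Bonferroni threshold:", alpha / m)
print("calibrated threshold (rho = 0.5):", calibrated_threshold(m, 0.5, alpha))
```

Under positive correlation the calibrated threshold is larger than $\alpha/m$, so it rejects at least as much as Bonferroni while targeting the same FWER level.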

28 FWER(Bonf.) illustration, $\alpha = 0.2$ [value of $m$ lost in extraction]
[Figure sequence not preserved: p-values with the noncorrected and Bonferroni thresholds]

38 False discovery rate (FDR)
What should we choose for $k$? 4 false positives out of 10 positives? 7 false positives out of 100 positives?
Idea: link $k$ to the number of positives.
For a threshold $\hat t$: $R(\hat t) = \sum_{i=1}^m \mathbf{1}\{p_i \le \hat t\}$, $\mathrm{FDR}(\hat t, P) = E[\mathrm{FDP}(\hat t, P)]$, $\mathrm{FDP}(\hat t, P) = V(\hat t) / R(\hat t)$ (with the convention $0/0 = 0$).
FDR control: $\alpha$ given, find $\hat t = \hat t_\alpha$ with $\forall P \in \mathcal{P}$, $\mathrm{FDR}(\hat t, P) \le \alpha$.
If $\mathrm{FDR} \le 0.05$: with 20 rejections, allow at most 1 false discovery (on average); with 100 rejections, allow at most 5 false discoveries (on average).
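
As a small illustration of these definitions, the function below computes $V$, $R$ and the FDP for a given threshold. It assumes the true labels $\theta$ are known, which is only the case in simulations; the function name and the toy data are hypothetical.

```python
def false_discovery_proportion(p_values, theta, t):
    """Return (V, R, FDP) for the rejection set {i : p_i <= t}; theta[i] == 0 marks a true null."""
    rejected = [i for i, p in enumerate(p_values) if p <= t]
    n_rejected = len(rejected)                              # R(t)
    n_false = sum(1 for i in rejected if theta[i] == 0)     # V(t)
    fdp = n_false / n_rejected if n_rejected > 0 else 0.0   # convention 0/0 = 0
    return n_false, n_rejected, fdp

# Toy data: the first two items are true signals, the last three are true nulls
p_values = [0.001, 0.02, 0.03, 0.2, 0.6]
theta = [1, 1, 0, 0, 0]
print(false_discovery_proportion(p_values, theta, 0.05))
```

The FDR is the expectation of the third returned value over repetitions of the experiment.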

42 BH procedure [Benjamini and Hochberg, 1995]
$\hat t = \alpha (\hat\ell \vee 1)/m$ with $\hat\ell = \max\{0 \le \ell \le m : p_{(\ell)} \le \alpha \ell / m\}$ (with $p_{(0)} = 0$).
[Figure not preserved: ordered p-values with the noncorrected, BH and Bonferroni thresholds]
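
The step-up rule above can be written directly from its definition. This is a minimal sketch; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha):
    """Return (l_hat, threshold, rejected indices) for the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p-value
    l_hat = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            l_hat = rank                  # keep the largest rank that passes
    threshold = alpha * max(l_hat, 1) / m
    rejected = sorted(order[:l_hat])      # the l_hat smallest p-values
    return l_hat, threshold, rejected

# The p-values of rank 2 and 3 miss their own critical values (0.02 and 0.03),
# but the one of rank 4 passes (0.038 <= 0.04), so the four smallest are rejected.
print(benjamini_hochberg([0.03, 0.035, 0.001, 0.038, 0.6], alpha=0.05))
```

The example shows the step-up behaviour: the procedure looks for the last crossing, not the first failure.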

43 FDR(BH) illustration, $\alpha = 0.2$ [values of $m$ and of the FDP lost in extraction]
[Figure sequence not preserved]

48 FDR control for BH
Theorem [Benjamini and Hochberg (1995)]. Assume that $p_i \sim U(0, 1)$ whenever $\theta_i = 0$ and that $(p_i, 1 \le i \le m)$ are mutually independent, for any $P \in \mathcal{P}$. Then for all $P \in \mathcal{P}$, $\mathrm{FDR}(\mathrm{BH}_\alpha, P) \le \pi_0 \alpha \le \alpha$, where $\pi_0 = m_0/m$ is the proportion of zeros in $\theta(P)$.
Simple proof [Blanchard and R. (2008, EJS)].
Lemma: let $U \sim U(0, 1)$ and $g : [0, 1] \to [0, \infty)$ be nonincreasing and measurable. Then $E\left( \frac{\mathbf{1}\{U \le c\, g(U)\}}{g(U)} \mathbf{1}\{g(U) > 0\} \right) \le c$ for any $c > 0$.
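
To see how the lemma yields the theorem, here is a short sketch of the argument. It assumes that $\mathrm{BH}_\alpha$ rejects exactly the $\hat\ell$ hypotheses with $p_i \le \alpha \hat\ell / m$; the symbols $\mathcal{H}_0$, $m_0$ and $g_i$ are notation introduced here for illustration.

```latex
\begin{align*}
\mathrm{FDP}(\mathrm{BH}_\alpha, P)
  &= \sum_{i \in \mathcal{H}_0}
     \frac{\mathbf{1}\{p_i \le \alpha \hat\ell / m\}}{\hat\ell}\,
     \mathbf{1}\{\hat\ell > 0\},
  \qquad \mathcal{H}_0 = \{i : \theta_i = 0\},\quad m_0 = |\mathcal{H}_0| . \\
\intertext{Fix $i \in \mathcal{H}_0$ and condition on $(p_j)_{j \ne i}$. Then
$\hat\ell = g_i(p_i)$ for a nonincreasing function $g_i$, and $p_i \sim U(0,1)$
by independence. The lemma with $c = \alpha/m$ gives}
\mathbb{E}\left[
  \frac{\mathbf{1}\{p_i \le c\, g_i(p_i)\}}{g_i(p_i)}\,
  \mathbf{1}\{g_i(p_i) > 0\}
  \,\middle|\, (p_j)_{j \ne i} \right]
  &\le c = \frac{\alpha}{m} . \\
\intertext{Summing over the $m_0$ true nulls and taking expectations,}
\mathrm{FDR}(\mathrm{BH}_\alpha, P)
  &\le m_0\, \frac{\alpha}{m} = \pi_0 \alpha \le \alpha .
\end{align*}
```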

51 Extension to positive dependence
Theorem [Benjamini and Yekutieli (2001)]. For all $P \in \mathcal{P}$, $\mathrm{FDR}(\mathrm{BH}_\alpha, P) \le \pi_0 \alpha$ if $p_i \sim U(0, 1)$ whenever $\theta_i = 0$ and $p = (p_i, 1 \le i \le m)$ is weak PRDS: $\forall P \in \mathcal{P}$, $\forall i$ with $\theta_i = 0$, $\forall D \subset [0, 1]^m$ nondecreasing, $u \mapsto P(p \in D \mid p_i \le u)$ is nondecreasing.
Simple proof [Blanchard and R. (2008, EJS)].
Lemma: let $U \sim U(0, 1)$ and $V$ be such that $\forall v$, $u \mapsto P(V < v \mid U \le u)$ is nondecreasing. Then $E\left( \frac{\mathbf{1}\{U \le c V\}}{V} \mathbf{1}\{V > 0\} \right) \le c$ for any $c > 0$.

53 FDR revolution
Benjamini and Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing.
Figure: 13,427 papers citing this work [period lost in extraction] according to the Web of Science, by research area: Genetics & Heredity; Biotechnology & Applied Microbiology; Neurosciences & Neurology; Mathematics; Science & Technology, Other Topics; Biochemistry & Molecular Biology; Computer Science; Environmental Sciences & Ecology; Oncology; Mathematical & Computational Biology; others.
30,085 citations on Google Scholar.

54 Outline: 1 Introduction; 2 Classical results; 3 FDP control (current section); 4 Posthoc inference

55 Motivation
Distribution of FDP(BH) under $\rho$-equicorrelation.
[Figure not preserved: density of FDP(BH) with its mean, median and 95% quantile, one panel per value of $\rho$ ($\rho = 0$, $\rho = 0.1$ and a third value lost in extraction)]

56 Aim
Choose new critical values $\tau_\ell$, $1 \le \ell \le m$ (incorporating the dependence structure), such that $\hat t = \tau_{\hat\ell}$ (built the same way as BH) satisfies $P(\mathrm{FDP}(\hat t) \le \alpha) \ge 1 - \zeta$.
[Figure not preserved: critical values for $\rho = 0$, $0.1$, $0.5$ and several values of $\zeta$]

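57 Method
[Delattre and R. New procedures controlling the false discovery proportion via Romano-Wolf's heuristic (2015, AoS)]
Proposition: let $B(\ell, \ell') = P(\bar V(\tau_\ell) \ge \lfloor \alpha \ell' \rfloor + 1)$, where $V(t) \le \bar V(t)$ for all $t$ (with $\bar V(t)$ nondecreasing in $t$ and $\bar V(0) = 0$ a.s.). Then $P(\mathrm{FDP}(\hat t) > \alpha)$ is bounded by a sum over $\ell = 1, \dots, m$ of an expression in $B(\ell, \ell)$, $B(\ell - 1, \ell)$, $B(\ell, \ell - 1)$ and $B(\ell - 1, \ell - 1)$ [operators lost in extraction].
Example: $X \sim \mathcal{N}(\mu, \Gamma)$ (one-sided), $\Gamma$ known: $B(\ell, \ell') = P_{Z \sim \mathcal{N}(0,\Gamma)}\left( \sum_{i=1}^m \mathbf{1}\{\bar\Phi(Z_i) \le \tau_\ell\} \ge \lfloor \alpha \ell' \rfloor + 1 \right)$.
Incorporates dependence; much more powerful than [Guo et al. (2014, AoS)].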

58 Outline: 1 Introduction; 2 Classical results; 3 FDP control; 4 Posthoc inference (current section)

59 Coming back to the pizza...
Abstract of [Giovannucci et al. (1995)], quoted on slide 7 ("False positive in practice").
How many false positives in that abstract?

60 Volcano plot [Chiaretti et al. (2004)]
[Figure not preserved]

61 Selection bias: what do the statistic values mean among the selected items?
Simulate i.i.d. absolute values of $\mathcal{N}(0, 1)$.
Data 1 and Data 2: two simulated samples of different sizes [sizes lost in extraction; figures not preserved].
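
The effect can be reproduced with a short simulation. The sample sizes 50 and 1000, the number of selected values and the seed below are hypothetical choices, since the slide's own sizes did not survive extraction.

```python
import random

def mean_all_and_selected(n, n_selected=5, seed=0):
    """Mean of n simulated |N(0,1)| values, and mean of the n_selected largest ones."""
    rng = random.Random(seed)
    values = sorted((abs(rng.gauss(0.0, 1.0)) for _ in range(n)), reverse=True)
    return sum(values) / n, sum(values[:n_selected]) / n_selected

for n in (50, 1000):
    overall, selected = mean_all_and_selected(n)
    print(n, round(overall, 2), round(selected, 2))
```

The overall mean stays near $\sqrt{2/\pi} \approx 0.80$ whatever the sample size, while the mean of the selected values is much larger and grows with the number of candidates, even though the data are pure noise.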

62 Posthoc inference
Selective inference = inference after some selection procedure, e.g., after LASSO selection [Tibshirani et al. (2015), Barber and Candes (2015, AoS)].
"In omnia paratus" posthoc inference = inference that is ready for any selection.
Number of false positives in $R \subset \{1, \dots, m\}$: $V(R) = \sum_{i=1}^m (1 - \theta_i) \mathbf{1}\{i \in R\}$.
Aim (posthoc bound): $\alpha$ given, find $\bar V_\alpha(\cdot) \in \mathbb{N}$ such that for all $P \in \mathcal{P}$, $P\left( \forall R \subset \{1, \dots, m\} : V(R) \le \bar V_\alpha(R) \right) \ge 1 - \alpha$.
Proposed in [Goeman and Solari (2011, Stat. Science)] (with closed testing).

63 Our method: reference family
If $|A \cap \mathcal{H}_0| \le 5$ and $|B \cap \mathcal{H}_0| \le 7$, what bound follows for $|R \cap \mathcal{H}_0|$?
[Figure not preserved: sets $A$, $B$ and a selected set $R$; the numerical bound shown on the slide is garbled in extraction]

66 JR criterion
[Blanchard, Neuvial and R. On controlled posthoc inference (ongoing work)]
Define, for $\mathcal{T} = \{t_k\}_{1 \le k \le m}$, $\mathrm{JR}(\mathcal{T}, P) = P(\exists k \in \{1, \dots, m\} : V(t_k) \ge k) = 1 - P(\forall k \in \{1, \dots, m\} : |\{i : p_i \le t_k\} \cap \mathcal{H}_0| \le k - 1)$.
Proposition 1. JR control at level $\alpha$ implies the posthoc bound $\bar V_\alpha(R) = \min_{1 \le k \le m} \left\{ \sum_{i \in R} \mathbf{1}\{p_i(X) > t_k\} + k - 1 \right\}$, $R \subset \{1, \dots, m\}$.
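
The bound of Proposition 1 is cheap to evaluate for any selected set. In the sketch below, the thresholds are a hypothetical input that is simply assumed to control JR at the desired level; the function name and the toy p-values are illustrative as well.

```python
def posthoc_bound(p_values, selected, thresholds):
    """Upper bound on the number of false positives among the indices in `selected`.

    thresholds[k - 1] plays the role of t_k and is assumed to control JR.
    """
    best = len(selected)  # trivial bound, never worse than the k = 1 term
    for k, t_k in enumerate(thresholds, start=1):
        above = sum(1 for i in selected if p_values[i] > t_k)
        best = min(best, above + k - 1)
    return best

p_values = [0.001, 0.004, 0.035, 0.2, 0.7]
thresholds = [0.01, 0.02, 0.03, 0.04, 0.05]
print(posthoc_bound(p_values, [0, 1, 2], thresholds))        # 1
print(posthoc_bound(p_values, [0, 1, 2, 3, 4], thresholds))  # 3
```

Because the guarantee holds simultaneously over all sets, the same thresholds can be reused for any selection made after looking at the data.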

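69 Controlling JR
Proposition 2. Taking $t_k = t_k(\lambda)$ nondecreasing and right-continuous in $\lambda$, $\mathrm{JR}(\mathcal{T}, P) = P\left( \min_{1 \le k \le m} \left\{ t_k^{-1}(p_{(k:\mathcal{H}_0)}) \right\} \le \lambda \right)$.
Example: $\lambda$-adjustment. $X \sim \mathcal{N}(\mu, \Gamma)$ (one-sided), $\Gamma$ known, $t_k(\lambda) = \lambda k / m$, with $\lambda$ chosen as the $\alpha$-quantile of $\mathcal{D}_{Z \sim \mathcal{N}(0,\Gamma)}\left( \min_{1 \le k \le m} \left\{ \frac{m}{k} \bar\Phi(Z)_{(k:m)} \right\} \right)$.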
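
The $\lambda$-adjustment can be approximated by simulation. The sketch below assumes an equicorrelated $\Gamma$ with correlation `rho`; this choice, the function names, the number of simulations and the seed are assumptions made for illustration.

```python
import math
import random

def upper_tail(x):
    """Standard normal upper tail P(N(0,1) >= x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def jr_statistic(z):
    """min over k of (m / k) * (k-th smallest p-value), with p_i = upper_tail(z_i)."""
    m = len(z)
    p_sorted = sorted(upper_tail(v) for v in z)
    return min(m * p / k for k, p in enumerate(p_sorted, start=1))

def calibrate_lambda(m, rho, alpha, n_sim=2000, seed=0):
    """Empirical alpha-quantile of the statistic under Z ~ N(0, Gamma), Gamma equicorrelated."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)
        z = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(m)]
        stats.append(jr_statistic(z))
    stats.sort()
    k = int(alpha * n_sim)  # number of simulated statistics allowed at or below lambda
    return stats[k - 1] if k >= 1 else 0.0

print(calibrate_lambda(20, 0.0, 0.2))  # close to alpha under independence
print(calibrate_lambda(20, 0.5, 0.2))
```

Under independence the statistic is uniform on $[0, 1]$ (Simes' equality), so the calibrated $\lambda$ is close to $\alpha$; with dependence the simulation adapts $\lambda$ to $\Gamma$. The thresholds are then $t_k = \lambda k / m$.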

70 Outlook
Various criteria: FWER, FDR, FDP, JR.
Various challenges: surviving dependence; incorporating known dependence; adapting to unknown dependence.

71 Continuous testing in a nutshell
[Blanchard, Delattre and R. Testing over a continuum of null hypotheses with False Discovery Rate control (2014, Bernoulli)]
$(H_{0,t}, t \in [0, 1])$: continuous set of null hypotheses.
$(p_t(X), t \in [0, 1])$: p-value process (jointly measurable).
Continuous FDR: $\mathrm{FDR}(\hat t, P) = E[\mathrm{FDP}(\hat t, P)]$, with $\mathrm{FDP}(\hat t, P) = \frac{\Lambda(\{t : \theta_t = 0,\ p_t(X) \le \hat t\})}{\Lambda(\{t : p_t(X) \le \hat t\})}$.
[Figure not preserved: p-value process with the threshold, correct rejections and erroneous rejections; numerical labels garbled in extraction]
Continuous BH procedure $\hat t_\alpha$. Finite-dimensional PRDS. Result: continuous FDR control.

73 Application
[Picard et al. Continuous testing for Poisson process intensities (ongoing work)]
$N^A$: heterogeneous Poisson process with intensity $\lambda_A$.
$N^B$: heterogeneous Poisson process with intensity $\lambda_B$.
$H_{0,t}$: "$\lambda_A = \lambda_B$ on $I_t$", where $I_t$, $t \in [0, 1]$, are sliding windows.
Basic test statistics give finite-dimensional PRDS.
Continuous BH = data-driven weighted BH.
Pouzat, package STAR: Spike Train Analysis with R.


Estimation of a Two-component Mixture Model Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint

More information

Sta$s$cs for Genomics ( )

Sta$s$cs for Genomics ( ) Sta$s$cs for Genomics (140.688) Instructor: Jeff Leek Slide Credits: Rafael Irizarry, John Storey No announcements today. Hypothesis testing Once you have a given score for each gene, how do you decide

More information

Control of Directional Errors in Fixed Sequence Multiple Testing

Control of Directional Errors in Fixed Sequence Multiple Testing Control of Directional Errors in Fixed Sequence Multiple Testing Anjana Grandhi Department of Mathematical Sciences New Jersey Institute of Technology Newark, NJ 07102-1982 Wenge Guo Department of Mathematical

More information

Statistical Emerging Pattern Mining with Multiple Testing Correction

Statistical Emerging Pattern Mining with Multiple Testing Correction Statistical Emerging Pattern Mining with Multiple Testing Correction Junpei Komiyama 1, Masakazu Ishihata 2, Hiroki Arimura 2, Takashi Nishibayashi 3, Shin-Ichi Minato 2 1. U-Tokyo 2. Hokkaido Univ. 3.

More information

Two simple sufficient conditions for FDR control

Two simple sufficient conditions for FDR control Electronic Journal of Statistics Vol. 2 (2008) 963 992 ISSN: 1935-7524 DOI: 10.1214/08-EJS180 Two simple sufficient conditions for FDR control Gilles Blanchard, Fraunhofer-Institut FIRST Kekuléstrasse

More information

More powerful control of the false discovery rate under dependence

More powerful control of the false discovery rate under dependence Statistical Methods & Applications (2006) 15: 43 73 DOI 10.1007/s10260-006-0002-z ORIGINAL ARTICLE Alessio Farcomeni More powerful control of the false discovery rate under dependence Accepted: 10 November

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Hypothesis testing Machine Learning CSE546 Kevin Jamieson University of Washington October 30, 2018 2018 Kevin Jamieson 2 Anomaly detection You are

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman

More information

On adaptive procedures controlling the familywise error rate

On adaptive procedures controlling the familywise error rate , pp. 3 On adaptive procedures controlling the familywise error rate By SANAT K. SARKAR Temple University, Philadelphia, PA 922, USA sanat@temple.edu Summary This paper considers the problem of developing

More information

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors

Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. Table of Outcomes. T=number of type 2 errors The Multiple Testing Problem Multiple Testing Methods for the Analysis of Microarray Data 3/9/2009 Copyright 2009 Dan Nettleton Suppose one test of interest has been conducted for each of m genes in a

More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

arxiv: v1 [stat.me] 24 May 2017

arxiv: v1 [stat.me] 24 May 2017 Continuous testing for Poisson process intensities : A new perspective on scanning statistics arxiv:1705.08800v1 [stat.me] 24 May 2017 Franck Picard Univ Lyon, Université Lyon 1, CNRS, LBBE UMR5558, F-69622

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

arxiv: v2 [stat.me] 31 Aug 2017

arxiv: v2 [stat.me] 31 Aug 2017 Multiple testing with discrete data: proportion of true null hypotheses and two adaptive FDR procedures Xiongzhi Chen, Rebecca W. Doerge and Joseph F. Heyse arxiv:4274v2 [stat.me] 3 Aug 207 Abstract We

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses

Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses Amit Zeisel, Or Zuk, Eytan Domany W.I.S. June 5, 29 Amit Zeisel, Or Zuk, Eytan Domany (W.I.S.)Improving

More information

Post-selection Inference for Changepoint Detection

Post-selection Inference for Changepoint Detection Post-selection Inference for Changepoint Detection Sangwon Hyun (Justin) Dept. of Statistics Advisors: Max G Sell, Ryan Tibshirani Committee: Will Fithian (UC Berkeley), Alessandro Rinaldo, Kathryn Roeder,

More information

Package pfa. July 4, 2016

Package pfa. July 4, 2016 Type Package Package pfa July 4, 2016 Title Estimates False Discovery Proportion Under Arbitrary Covariance Dependence Version 1.1 Date 2016-06-24 Author Jianqing Fan, Tracy Ke, Sydney Li and Lucy Xia

More information

Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses

Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses arxiv:1610.03330v1 [stat.me] 11 Oct 2016 Jingshu Wang, Chiara Sabatti, Art B. Owen Department of Statistics, Stanford University

More information

Specific Differences. Lukas Meier, Seminar für Statistik

Specific Differences. Lukas Meier, Seminar für Statistik Specific Differences Lukas Meier, Seminar für Statistik Problem with Global F-test Problem: Global F-test (aka omnibus F-test) is very unspecific. Typically: Want a more precise answer (or have a more

More information

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing

Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing Statistics Journal Club, 36-825 Beau Dabbs and Philipp Burckhardt 9-19-2014 1 Paper

More information

Heterogeneity and False Discovery Rate Control

Heterogeneity and False Discovery Rate Control Heterogeneity and False Discovery Rate Control Joshua D Habiger Oklahoma State University jhabige@okstateedu URL: jdhabigerokstateedu August, 2014 Motivating Data: Anderson and Habiger (2012) M = 778 bacteria

More information

Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators

Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators Pierre Neuvial To cite this version: Pierre Neuvial. Asymptotic Results on Adaptive False Discovery

More information

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Applying the Benjamini Hochberg procedure to a set of generalized p-values U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure

More information

Bumpbars: Inference for region detection. Yuval Benjamini, Hebrew University

Bumpbars: Inference for region detection. Yuval Benjamini, Hebrew University Bumpbars: Inference for region detection Yuval Benjamini, Hebrew University yuvalbenj@gmail.com WHOA-PSI-2017 Collaborators Jonathan Taylor Stanford Rafael Irizarry Dana Farber, Harvard Amit Meir U of

More information

hal , version 2-2 Apr 2010

hal , version 2-2 Apr 2010 Submitted to the Annals of Statistics arxiv: 1002.2845 EXACT CALCULATIONS FOR FALSE DISCOVERY PROPORTION WITH APPLICATION TO LEAST FAVORABLE CONFIGURATIONS By Etienne Roquain and Fanny Villers UPMC University

More information

arxiv: v3 [math.st] 15 Jul 2018

arxiv: v3 [math.st] 15 Jul 2018 A New Step-down Procedure for Simultaneous Hypothesis Testing Under Dependence arxiv:1503.08923v3 [math.st] 15 Jul 2018 Contents Prasenjit Ghosh 1 and Arijit Chakrabarti 2 1 Department of Statistics, Presidency

More information

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1

Controlling Bayes Directional False Discovery Rate in Random Effects Model 1 Controlling Bayes Directional False Discovery Rate in Random Effects Model 1 Sanat K. Sarkar a, Tianhui Zhou b a Temple University, Philadelphia, PA 19122, USA b Wyeth Pharmaceuticals, Collegeville, PA

More information

Testing over a continuum of null hypotheses with False Discovery Rate control

Testing over a continuum of null hypotheses with False Discovery Rate control Bernoulli 20(1), 2014, 304 333 DOI: 10.3150/12-BEJ488 arxiv:1110.3599v3 [stat.me] 6 Feb 2014 Testing over a continuum of null hypotheses with False Discovery Rate control GILLES BLANCHARD 1, SYLVAIN DELATTRE

More information

Week 5 Video 1 Relationship Mining Correlation Mining

Week 5 Video 1 Relationship Mining Correlation Mining Week 5 Video 1 Relationship Mining Correlation Mining Relationship Mining Discover relationships between variables in a data set with many variables Many types of relationship mining Correlation Mining

More information

Chapter 1. Stepdown Procedures Controlling A Generalized False Discovery Rate

Chapter 1. Stepdown Procedures Controlling A Generalized False Discovery Rate Chapter Stepdown Procedures Controlling A Generalized False Discovery Rate Wenge Guo and Sanat K. Sarkar Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park,

More information

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 28 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 3, 2015 1 2 3 4 5 1 Familywise error rates 2 procedure 3 Performance of with multiple

More information

Hypes and Other Important Developments in Statistics

Hypes and Other Important Developments in Statistics Hypes and Other Important Developments in Statistics Aad van der Vaart Vrije Universiteit Amsterdam May 2009 The Hype Sparsity For decades we taught students that to estimate p parameters one needs n p

More information

STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE. National Institute of Environmental Health Sciences and Temple University

STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE. National Institute of Environmental Health Sciences and Temple University STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE Wenge Guo 1 and Sanat K. Sarkar 2 National Institute of Environmental Health Sciences and Temple University Abstract: Often in practice

More information

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Multiple testing methods to control the False Discovery Rate (FDR),

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 164 Multiple Testing Procedures: R multtest Package and Applications to Genomics Katherine

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 11 1 / 44 Tip + Paper Tip: Two today: (1) Graduate school

More information

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES

FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES FDR-CONTROLLING STEPWISE PROCEDURES AND THEIR FALSE NEGATIVES RATES Sanat K. Sarkar a a Department of Statistics, Temple University, Speakman Hall (006-00), Philadelphia, PA 19122, USA Abstract The concept

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity

More information

Procedures controlling generalized false discovery rate

Procedures controlling generalized false discovery rate rocedures controlling generalized false discovery rate By SANAT K. SARKAR Department of Statistics, Temple University, hiladelphia, A 922, U.S.A. sanat@temple.edu AND WENGE GUO Department of Environmental

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. doi: 10.18637/jss.v000.i00 GroupTest: Multiple Testing Procedure for Grouped Hypotheses Zhigen Zhao Abstract In the modern Big Data

More information

Linear Combinations. Comparison of treatment means. Bruce A Craig. Department of Statistics Purdue University. STAT 514 Topic 6 1

Linear Combinations. Comparison of treatment means. Bruce A Craig. Department of Statistics Purdue University. STAT 514 Topic 6 1 Linear Combinations Comparison of treatment means Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 6 1 Linear Combinations of Means y ij = µ + τ i + ǫ ij = µ i + ǫ ij Often study

More information

Control of Generalized Error Rates in Multiple Testing

Control of Generalized Error Rates in Multiple Testing Institute for Empirical Research in Economics University of Zurich Working Paper Series ISSN 1424-0459 Working Paper No. 245 Control of Generalized Error Rates in Multiple Testing Joseph P. Romano and

More information

Lecture 6 April

Lecture 6 April Stats 300C: Theory of Statistics Spring 2017 Lecture 6 April 14 2017 Prof. Emmanuel Candes Scribe: S. Wager, E. Candes 1 Outline Agenda: From global testing to multiple testing 1. Testing the global null

More information

Semi-Penalized Inference with Direct FDR Control

Semi-Penalized Inference with Direct FDR Control Jian Huang University of Iowa April 4, 2016 The problem Consider the linear regression model y = p x jβ j + ε, (1) j=1 where y IR n, x j IR n, ε IR n, and β j is the jth regression coefficient, Here p

More information

Online Rules for Control of False Discovery Rate and False Discovery Exceedance

Online Rules for Control of False Discovery Rate and False Discovery Exceedance Online Rules for Control of False Discovery Rate and False Discovery Exceedance Adel Javanmard and Andrea Montanari August 15, 216 Abstract Multiple hypothesis testing is a core problem in statistical

More information

1 Statistical inference for a population mean

1 Statistical inference for a population mean 1 Statistical inference for a population mean 1. Inference for a large sample, known variance Suppose X 1,..., X n represents a large random sample of data from a population with unknown mean µ and known

More information

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control Joshua D. Habiger Oklahoma State University jhabige@okstate.edu Nov. 8, 2013 Outline 1 : Motivation and FDR Research Areas

More information

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model

Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population

More information

Confidence Thresholds and False Discovery Control

Confidence Thresholds and False Discovery Control Confidence Thresholds and False Discovery Control Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Larry Wasserman Department of Statistics

More information

A GENERAL DECISION THEORETIC FORMULATION OF PROCEDURES CONTROLLING FDR AND FNR FROM A BAYESIAN PERSPECTIVE

A GENERAL DECISION THEORETIC FORMULATION OF PROCEDURES CONTROLLING FDR AND FNR FROM A BAYESIAN PERSPECTIVE A GENERAL DECISION THEORETIC FORMULATION OF PROCEDURES CONTROLLING FDR AND FNR FROM A BAYESIAN PERSPECTIVE Sanat K. Sarkar 1, Tianhui Zhou and Debashis Ghosh Temple University, Wyeth Pharmaceuticals and

More information

Universität Potsdam. Testing over a continuum of null hypotheses. Gilles Blanchard Sylvain Delattre Étienne Roquain

Universität Potsdam. Testing over a continuum of null hypotheses. Gilles Blanchard Sylvain Delattre Étienne Roquain Universität Potsdam Gilles Blanchard Sylvain Delattre Étienne Roquain Testing over a continuum of null hypotheses Preprints des Instituts für Mathematik der Universität Potsdam 1 (2012) 1 Preprints des

More information

Statistica Sinica Preprint No: SS R1

Statistica Sinica Preprint No: SS R1 Statistica Sinica Preprint No: SS-2017-0072.R1 Title Control of Directional Errors in Fixed Sequence Multiple Testing Manuscript ID SS-2017-0072.R1 URL http://www.stat.sinica.edu.tw/statistica/ DOI 10.5705/ss.202017.0072

More information

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007 Department of Statistics University of Central Florida Technical Report TR-2007-01 25APR2007 Revised 25NOV2007 Controlling the Number of False Positives Using the Benjamini- Hochberg FDR Procedure Paul

More information

Estimating False Discovery Proportion Under Arbitrary Covariance Dependence

Estimating False Discovery Proportion Under Arbitrary Covariance Dependence Estimating False Discovery Proportion Under Arbitrary Covariance Dependence arxiv:1010.6056v2 [stat.me] 15 Nov 2011 Jianqing Fan, Xu Han and Weijie Gu May 31, 2018 Abstract Multiple hypothesis testing

More information

Familywise Error Rate Control via Knockoffs

Familywise Error Rate Control via Knockoffs Familywise Error Rate Control via Knockoffs Abstract We present a novel method for controlling the k-familywise error rate (k-fwer) in the linear regression setting using the knockoffs framework first

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

BTRY 7210: Topics in Quantitative Genomics and Genetics

BTRY 7210: Topics in Quantitative Genomics and Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:

More information

Séminaire INRA de Toulouse

Séminaire INRA de Toulouse Séminaire INRA de Toulouse Mélisande ALBERT Travaux de thèse, Université Nice Sophia Antipolis, LJAD Vendredi 17 mars Mélisande ALBERT Séminaire INRA Toulouse Vendredi 17 mars 1 / 28 Table of contents

More information

Lecture 12 April 25, 2018

Lecture 12 April 25, 2018 Stats 300C: Theory of Statistics Spring 2018 Lecture 12 April 25, 2018 Prof. Emmanuel Candes Scribe: Emmanuel Candes, Chenyang Zhong 1 Outline Agenda: The Knockoffs Framework 1. The Knockoffs Framework

More information

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests

Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Incorporation of Sparsity Information in Large-scale Multiple Two-sample t Tests Weidong Liu October 19, 2014 Abstract Large-scale multiple two-sample Student s t testing problems often arise from the

More information

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS Ying Liu Department of Biostatistics, Columbia University Summer Intern at Research and CMC Biostats, Sanofi, Boston August 26, 2015 OUTLINE 1 Introduction

More information

EMPIRICAL BAYES METHODS FOR ESTIMATION AND CONFIDENCE INTERVALS IN HIGH-DIMENSIONAL PROBLEMS

EMPIRICAL BAYES METHODS FOR ESTIMATION AND CONFIDENCE INTERVALS IN HIGH-DIMENSIONAL PROBLEMS Statistica Sinica 19 (2009), 125-143 EMPIRICAL BAYES METHODS FOR ESTIMATION AND CONFIDENCE INTERVALS IN HIGH-DIMENSIONAL PROBLEMS Debashis Ghosh Penn State University Abstract: There is much recent interest

More information