False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

Ståle Nygård, Trial Lecture, Dec 19, 2008

Lecture outline
- Motivation for not using classical p values in large-scale simultaneous multiple testing situations
- False discovery rate (FDR) and other multiple testing error measurements
- Estimation of FDR
- FDR, power and sample size

Classical single hypothesis testing
Let $\mu$ be the difference in mean between two groups. We want to test the hypotheses
$$H_0: \mu = 0 \quad \text{vs} \quad H_1: \mu \neq 0.$$
Observations in group 1: $X = (X_1, X_2, \ldots, X_{n_x})$
Observations in group 2: $Y = (Y_1, Y_2, \ldots, Y_{n_y})$
Test procedure: find a test statistic $Z = h(X, Y)$. Reject $H_0$ if
$$p = 2P(Z > |z_{\mathrm{obs}}| \mid H_0 \text{ is true}) < \alpha,$$
where $\alpha$ is the significance level (e.g. 0.05) and $z_{\mathrm{obs}}$ is the observed value of $Z$.
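As a minimal sketch of the test procedure above, the following assumes that $Z$ is standard normal under $H_0$ (the slide leaves the null distribution of $Z$ unspecified). The function names are illustrative, not from the lecture.

```python
from statistics import NormalDist


def two_sided_p_value(z_obs: float) -> float:
    """p = 2 * P(Z > |z_obs|), assuming Z ~ N(0, 1) under H0."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_obs)))


def reject_h0(z_obs: float, alpha: float = 0.05) -> bool:
    """Reject H0 when the two-sided p value falls below alpha."""
    return two_sided_p_value(z_obs) < alpha
```

For example, `reject_h0(2.5)` rejects at the 5% level while `reject_h0(1.0)` does not.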

When $H_0$ is correct
[Figure: distribution $f(t)$ of the test statistic, and frequency histogram of the p values.]
Given that the model for the data used under $H_0$ is correct, p values have a Uniform(0,1) distribution.

Single hypothesis testing set-up

              Not reject H_0    Reject H_0
H_0 true      Correct           Type I error
H_0 false     Type II error     Correct

Significance level = P(type I error) = $\alpha$
Power = 1 - P(type II error) = $1 - \beta$, where $\beta$ = P(type II error), i.e. the probability of detecting a difference if there is a true difference.

Microarrays
Microarrays measure differences in expression levels between two conditions, e.g. sick vs healthy.
[Figure: microarray gene expressions. Legend: more expressed in the sick individual; more expressed in the healthy individual; same expression level in sick and healthy individuals.]

Microarray test statistic
We want to test differential expression between two groups for $i = 1, \ldots, m$ genes ($m$ of order 10000). This can be done using the ordinary two sample t statistic
$$t_i = \frac{\bar{x}_i - \bar{y}_i}{\hat{\sigma}_i},$$
where $\hat{\sigma}_i$ is the (estimated) standard deviation of the difference $\bar{x}_i - \bar{y}_i$.
Variance estimates can be improved by borrowing strength across genes in a technique called variance shrinkage:
$$z_i = \frac{\bar{x}_i - \bar{y}_i}{\sqrt{B \hat{\sigma}_{\mathrm{all}}^2 + (1 - B) \hat{\sigma}_i^2}}.$$
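The two statistics above can be sketched as follows. This is an illustration under assumptions the slide does not state: the pooled-variance form of the two sample t statistic is used, $\hat{\sigma}_{\mathrm{all}}^2$ is taken as the plain average of the per-gene variances, and the shrinkage weight $B$ is supplied by the caller.

```python
import math
from statistics import mean, variance


def diff_variance(x, y):
    """Pooled-variance estimate of Var(mean(x) - mean(y))."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return pooled * (1 / nx + 1 / ny)


def t_statistic(x, y):
    """Ordinary two sample t statistic for one gene."""
    return (mean(x) - mean(y)) / math.sqrt(diff_variance(x, y))


def shrunken_statistics(genes_x, genes_y, B):
    """z_i with each gene's variance shrunk toward an across-gene value.

    Assumption: sigma_all^2 is the average of the per-gene variances.
    """
    variances = [diff_variance(x, y) for x, y in zip(genes_x, genes_y)]
    var_all = mean(variances)
    return [
        (mean(x) - mean(y)) / math.sqrt(B * var_all + (1 - B) * v)
        for x, y, v in zip(genes_x, genes_y, variances)
    ]
```

With `B = 0` the shrunken statistic reduces to the ordinary t statistic; with `B = 1` every gene shares the same denominator.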

Bootstrap estimated test statistic
Variance shrinkage is often accompanied by bootstrap estimation of the distribution of the test statistic under $H_0$.
For each of $B$ bootstrap samples:
- From the pooled observations $\{x_1, \ldots, x_n, y_1, \ldots, y_n\}$, draw $\{x_1^*, \ldots, x_n^*\}$ and $\{y_1^*, \ldots, y_n^*\}$.
- Calculate the null statistic $z^*$ from the $x^*$s and the $y^*$s.
Compare the observed test statistic $z_{\mathrm{obs}}$ with the $B$ $z^*$ values.
[Figure: histogram of $z^*$ with $z_{\mathrm{obs}}$ marked.]
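The resampling scheme above can be sketched as below. Several details are assumptions made for illustration: resampling is with replacement from the pooled observations, the comparison is two-sided, a plain mean difference stands in for the statistic, and the `+ 1` correction in the returned p value is a common convention rather than something stated on the slide.

```python
import random
from statistics import mean


def mean_difference(x, y):
    """Stand-in test statistic (hypothetical choice for this sketch)."""
    return mean(x) - mean(y)


def bootstrap_p_value(x, y, statistic, n_boot=1000, seed=0):
    """Compare the observed statistic with n_boot null statistics z*."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    z_obs = statistic(x, y)
    exceed = 0
    for _ in range(n_boot):
        # Draw both groups from the pooled data, so H0 holds by construction.
        x_star = rng.choices(pooled, k=len(x))
        y_star = rng.choices(pooled, k=len(y))
        if abs(statistic(x_star, y_star)) >= abs(z_obs):
            exceed += 1
    # Assumed convention: add one so the p value is never exactly zero.
    return (exceed + 1) / (n_boot + 1)
```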

P values from a microarray experiment
[Figure: frequency histograms of p values for null genes, for non-null genes, and for all genes on the microarray. The last panel marks true positives, false positives, true negatives and false negatives relative to the cut-off $\alpha$.]

Multiple testing set-up

              Not reject H_0    Reject H_0    Total
H_0 true      TN                FP            m_0
H_0 false     FN                TP            m - m_0
Total         m - R             R             m

m = # of hypotheses
m_0 = # of true H_0s
R = # of rejected H_0s
TP = # of true positives
FP = # of false positives
TN = # of true negatives
FN = # of false negatives

Sections: Family-wise error rate (FWER); False discovery rate, Benjamini & Hochberg (frequentist approach); False discovery rate, mixture model (Bayesian approach)

Type I error rates
Family-wise error rate (FWER): $\mathrm{FWER} = P(FP \geq 1)$
False discovery rate (FDR): $\mathrm{FDR} = E\{\frac{FP}{R} I(R > 0)\}$, i.e. the expected proportion of falsely rejected $H_0$ among all rejections if there are any rejections, otherwise zero.
Positive false discovery rate (pFDR): $\mathrm{pFDR} = E(\frac{FP}{R} \mid R > 0)$, i.e. same as FDR, but conditioned on having at least one rejection.
Per comparison error rate (PCER): $\mathrm{PCER} = E(FP)/m$
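To make the four definitions concrete, the sketch below replaces each expectation by an average over repeated experiments. The input format, a list of hypothetical `(FP, R)` pairs with one pair per experiment, is an assumption made for illustration.

```python
def false_discovery_proportion(fp, r):
    """FP / R if there are rejections, otherwise zero."""
    return fp / r if r > 0 else 0.0


def empirical_error_rates(outcomes, m):
    """Monte Carlo versions of FWER, FDR, pFDR and PCER.

    outcomes: list of (FP, R) pairs, one per simulated experiment.
    m: number of hypotheses tested in each experiment.
    """
    n = len(outcomes)
    with_rejections = [(fp, r) for fp, r in outcomes if r > 0]
    return {
        "FWER": sum(fp >= 1 for fp, _ in outcomes) / n,
        "FDR": sum(false_discovery_proportion(fp, r) for fp, r in outcomes) / n,
        # pFDR conditions on R > 0, so experiments without rejections are dropped.
        "pFDR": (sum(fp / r for fp, r in with_rejections) / len(with_rejections)
                 if with_rejections else float("nan")),
        "PCER": sum(fp for fp, _ in outcomes) / (n * m),
    }
```

Because pFDR averages only over experiments with at least one rejection, it is never smaller than FDR on the same outcomes.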

Family-wise error rates (FWER)
Usual way of controlling for multiple testing in the pre-genomic era.
$\mathrm{FWER} = P(FP \geq 1)$ is the probability of at least one false positive.
Most common method: Bonferroni (1936), with adjusted p value $\tilde{p} = \min(mp, 1)$.
Other methods:
- Šidák (1967)
- Stepwise procedures, e.g. Holm (1979)
- Westfall & Young (1993)
For genome-wide data, controlling FWER leads to very low power!
Less conservative approach: generalized FWER (Dudoit et al., 2004, and van der Laan et al., 2004): $P(FP \geq k)$.
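A short sketch of the single-step adjustments named above. The Bonferroni formula is the one on the slide; the Šidák formula $1 - (1 - p)^m$ is the standard single-step form and is not written out in the lecture.

```python
def bonferroni_adjust(p_values):
    """Bonferroni adjusted p values: min(m * p, 1)."""
    m = len(p_values)
    return [min(m * p, 1.0) for p in p_values]


def sidak_adjust(p_values):
    """Sidak adjusted p values: 1 - (1 - p)^m (standard form, not on the slide)."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]
```

With $m$ of order 10000, a raw p value must be below roughly $5 \times 10^{-6}$ to survive a Bonferroni adjustment at level 0.05, which illustrates the loss of power mentioned above.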

False discovery rate (FDR), Benjamini and Hochberg (1995)
$\mathrm{FDR} = E\{\frac{FP}{R} I(R > 0)\}$ is the expected proportion of false positives, if there are any positives, else zero.
Common method: Benjamini & Hochberg's (BH) step-up procedure:
- Let $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(m)}$ be the ordered raw p values.
- Let $\hat{k} = \max\{k : \frac{m p_{(k)}}{k} \leq \alpha\}$.
- Reject all hypotheses whose p values are smaller than or equal to $p_{(\hat{k})}$, i.e. reject those corresponding to $p_{(1)}, \ldots, p_{(\hat{k})}$ and retain those corresponding to $p_{(\hat{k}+1)}, \ldots, p_{(m)}$.
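The step-up rule above can be sketched in a few lines. The function name and the choice to return original indices are illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of the hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank k with m * p_(k) / k <= alpha,
        # even if some smaller ranks fail the bound.
        if m * p_values[idx] / rank <= alpha:
            k_hat = rank
    return sorted(order[:k_hat])
```

Note the step-up behaviour: with p values `[0.9, 0.02, 0.8, 0.021]` and `alpha=0.05`, the smallest p value fails its own bound (4 × 0.02 / 1 > 0.05), but the second smallest passes (4 × 0.021 / 2 ≤ 0.05), so both are rejected.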

BH step-up: Motivation
$$\hat{k} = \max\{k : \frac{m p_{(k)}}{k} \leq \alpha\}$$
The core of the BH step-up is $m p_{(k)}/k$.
$m_0 p_{(k)}$ is an estimate of the expected number of false positives when $p_{(k)}$ is the cut-off value for the raw p values.
Since $m_0$ is unknown, $m$ is used as a conservative estimate of $m_0$.
$m p_{(k)}/k$ is then an estimate of the expected proportion of false positives among the total number of positives $k$.

Modification for general dependence, Benjamini & Yekutieli (2001)
The Benjamini & Yekutieli (BY) step-up procedure modifies the BH procedure to allow for general dependence:
$$\hat{k} = \max\{k : \frac{m \left(\sum_{l=1}^{m} \frac{1}{l}\right) p_{(k)}}{k} \leq \alpha\}$$
When $m$ is large, the penalty of the BY procedure is about $\log(m)$ compared to the BH procedure.
This can be a large price to pay for allowing arbitrary dependence (Ge et al. 2003).
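A sketch of the BY rule, written to mirror the formula above; only the harmonic-sum factor differs from the BH step-up. Function names are illustrative.

```python
import math


def harmonic_sum(m):
    """sum_{l=1}^{m} 1/l, the BY penalty factor (about log(m) for large m)."""
    return sum(1.0 / l for l in range(1, m + 1))


def benjamini_yekutieli(p_values, alpha=0.05):
    """Indices of the hypotheses rejected by the BY step-up procedure."""
    m = len(p_values)
    c_m = harmonic_sum(m)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if m * c_m * p_values[idx] / rank <= alpha:
            k_hat = rank
    return sorted(order[:k_hat])
```

For $m = 10000$ the factor is close to $\log(10000) + 0.577 \approx 9.8$, so BY behaves roughly like BH run at level $\alpha / 9.8$.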

Proportion of true nulls
The number of null genes $m_0$ is unknown, and therefore so is the proportion $\pi_0 = m_0/m$.
$\pi_0$ is important in estimation of FDR.

Estimating $\pi_0$: Schweder and Spjøtvoll's estimator
Look at an interval $[\lambda, 1]$, where most p values are assumed to come from true nulls.
The Schweder and Spjøtvoll (1982) estimator is
$$\hat{\pi}_0(\lambda) = \frac{\#\{p_i > \lambda\}}{m(1 - \lambda)}$$
for a fixed $\lambda \in (0, 1)$.
[Figure: frequency histogram of p values split into null genes and non-null genes, with $\lambda$ marked.]
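The estimator above is a one-liner; the default `lam=0.5` below is an arbitrary illustrative choice, not a recommendation from the lecture.

```python
def pi0_estimate(p_values, lam=0.5):
    """Schweder-Spjotvoll type estimate: #{p_i > lam} / (m * (1 - lam))."""
    m = len(p_values)
    return sum(p > lam for p in p_values) / (m * (1.0 - lam))
```

As a check of the logic: if 100 null p values are spread evenly over (0, 1) and 20 non-null p values are all near zero, then 50 of the 120 p values exceed 0.5 and the estimate is 50 / (120 × 0.5) ≈ 0.83, matching the true proportion 100/120.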

Estimating $\pi_0$ using convex decreasing p value density (Langaas et al., 2002)
For $p$ close to 1, $f(p) \approx \pi_0$.
It is reasonable to assume that $f(p)$ is decreasing in $p$.
Assuming that $f(p)$ is also convex leads to improved estimation of $f(1)$, which can be used as an estimate of $\pi_0$.
[Figure: two panels, a decreasing p value density and a convex decreasing p value density.]

Inserting $\hat{\pi}_0$ to improve FDR estimate
The BH step-up procedure finds
$$\hat{k} = \max\{k : \frac{m p_{(k)}}{k} \leq \alpha\},$$
where $m$ was a conservative estimate of the number of true nulls.
The BH procedure with adaptive control (Benjamini & Hochberg, 2000) finds
$$\hat{k} = \max\{k : \frac{\hat{\pi}_0 m p_{(k)}}{k} \leq \alpha\}.$$
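A sketch of the adaptive rule. One assumption should be flagged: $\hat{\pi}_0$ is computed here with the Schweder and Spjøtvoll type estimator at a fixed $\lambda$ (capped at 1), which is a convenient stand-in and not necessarily the estimator used by Benjamini & Hochberg (2000).

```python
def adaptive_bh(p_values, alpha=0.05, lam=0.5):
    """BH step-up with m replaced by pi0_hat * m.

    Returns (pi0_hat, sorted indices of rejected hypotheses).
    Assumption: pi0_hat = #{p_i > lam} / (m * (1 - lam)), capped at 1.
    """
    m = len(p_values)
    pi0_hat = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if pi0_hat * m * p_values[idx] / rank <= alpha:
            k_hat = rank
    return pi0_hat, sorted(order[:k_hat])
```

With p values `[0.001, 0.008, 0.012, 0.02, 0.035, 0.6, 0.7, 0.8]` the estimate is 0.75. Plain BH at level 0.05 stops at four rejections (8 × 0.035 / 5 = 0.056), while the adaptive version also rejects the fifth (0.75 × 0.056 = 0.042).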

Mixture model for p values
According to Genovese & Wasserman (2002), the conditional distributions of p values are:
- Null genes: Uniform(0,1) (when the correct distribution for the test statistic is used to calculate the p values).
- Non-null genes: $h(p)$.
The unconditional distribution of p values is then
$$f(p) = \pi_0 + (1 - \pi_0) h(p).$$

Mixture model for test statistic
The unconditional distribution of z values is (Efron et al., 2001)
$$f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z),$$
where $f_0(z)$ is the distribution of the test statistic $Z$ for null genes and $f_1(z)$ is the distribution of $Z$ for non-null genes.

(Empirical) Bayesian Fdr and local fdr
Assume (without loss of generality) that $H_0$ is rejected for large values of $Z$.
The mixture model based or (empirical) Bayesian false discovery rate is
$$q(z) = \mathrm{Fdr}(z) = P(H_0 \text{ true} \mid Z \geq z) = \frac{P(Z \geq z \mid H_0 \text{ true}) P(H_0 \text{ true})}{P(Z \geq z)} = \frac{\pi_0 (1 - F_0(z))}{1 - F(z)},$$
where $F_0$ is the cumulative distribution of $Z$ under $H_0$, and $F$ is the unconditional cumulative distribution of $Z$.
Local fdr (locally at $Z = z$) is defined as (Efron et al., 2001)
$$\mathrm{fdr}(z) = P(H_0 \text{ true} \mid Z = z) = \frac{\pi_0 f_0(z)}{f(z)}.$$
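To see how the two quantities differ, the sketch below evaluates both in a fully specified two-component normal mixture. All parameters are assumed known here purely for illustration (null $N(0, 1)$, non-null $N(\mu_1, \sigma_1^2)$); in practice they must be estimated.

```python
from statistics import NormalDist


def tail_and_local_fdr(z, pi0, mu1, sigma1=1.0):
    """Return (Fdr(z), fdr(z)) for f = pi0 * N(0,1) + (1 - pi0) * N(mu1, sigma1^2)."""
    null, alt = NormalDist(0.0, 1.0), NormalDist(mu1, sigma1)
    # Local version uses densities at z.
    f0, f1 = null.pdf(z), alt.pdf(z)
    f = pi0 * f0 + (1.0 - pi0) * f1
    # Tail-area version uses upper tail probabilities P(Z >= z).
    tail0 = 1.0 - null.cdf(z)
    tail = pi0 * tail0 + (1.0 - pi0) * (1.0 - alt.cdf(z))
    return pi0 * tail0 / tail, pi0 * f0 / f
```

With `pi0=0.9`, `mu1=3` and `z=3`, the tail-area Fdr is about 0.024 while the local fdr is about 0.091: the tail-area value averages over all more extreme z values, so it is smaller than the local value at the cut-off itself.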

Connection between BH (frequentist) FDR and empirical Bayesian Fdr
Frequentist procedure: the BH step-up procedure with adaptive control finds $\hat{k}$ such that
$$\hat{k} = \max\{k : \frac{\hat{\pi}_0 p_{(k)}}{k/m} \leq \alpha\}.$$
Rejecting the hypotheses corresponding to $p_{(1)}, \ldots, p_{(\hat{k})}$ provides $\mathrm{FDR} \leq \alpha$.
Let $z_1 \geq z_2 \geq \cdots \geq z_m$ be the ordered z values. The empirical Bayesian procedure finds
$$\hat{l} = \max\{l : \widehat{\mathrm{Fdr}}(z_l) \leq \alpha\},$$
where
$$\widehat{\mathrm{Fdr}}(z_l) = \frac{\hat{\pi}_0 P(Z \geq z_l \mid H_0 \text{ true})}{\hat{P}(Z \geq z_l)} = \frac{\hat{\pi}_0 p_l}{l/m}.$$

Estimation under mixture model
Recall the mixture model $f(z) = \pi_0 f_0(z) + (1 - \pi_0) f_1(z)$.
The null distribution $f_0(z)$ is usually assumed to be $N(0, 1)$ (but the normality assumption may be violated), or found by bootstrap estimation via resampling group labels.
The unconditional distribution $f(z)$ can be approximated by smoothing the empirical distribution.

Estimation under mixture model
An upper bound for $\pi_0$ can be found by requiring (Efron et al., 2001)
$$1 - \mathrm{fdr}(z) = 1 - \pi_0 f_0(z)/f(z) \geq 0 \quad \text{for all } z.$$
This yields
$$\pi_0 \leq \min_z f(z)/f_0(z).$$
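The bound above can be evaluated on a grid of z values once $f$ and $f_0$ are available. In this sketch both densities are passed in as callables and the grid is supplied by the caller; in practice $f$ would be a smoothed estimate, so the minimum would be noisy in the tails.

```python
def pi0_upper_bound(f, f0, z_grid):
    """Upper bound min_z f(z) / f0(z), evaluated over a finite grid."""
    return min(f(z) / f0(z) for z in z_grid)
```

For example, with `f0 = NormalDist(0, 1).pdf` and `f` the mixture `0.9 * N(0,1) + 0.1 * N(3,1)`, the ratio $f/f_0 = \pi_0 + (1 - \pi_0) f_1/f_0$ is smallest where $f_1/f_0$ is negligible, and the bound recovers a value just above 0.9.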

Violation of N(0,1) assumption
The null distribution is not necessarily $N(0, 1)$. Deviations from $N(0, 1)$ are caused by
(1) Non-normal data and $n$ too small for asymptotic theory to be valid.
(2) Unobserved covariates, which inflate the distribution.
(3) Correlation across arrays.
(4) Correlation between genes.
Bootstrap cannot resolve (2)-(4). Efron (2007) suggests estimating an empirical null distribution.

Estimating empirical null distribution (Efron, 2007)
Assume $f_0(z) \sim N(\delta_0, \sigma_0^2)$.
Estimate $\delta_0$ and $\sigma_0^2$ by fitting a quadratic curve to the log of the distribution of $Z$ around 0.
The procedure is called central matching.
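The idea behind central matching can be sketched as follows. If $\log f(z) \approx a + bz + cz^2$ near 0 and the centre is dominated by null genes, then matching to a normal log density gives $\sigma_0^2 = -1/(2c)$ and $\delta_0 = b \sigma_0^2$. The sketch takes log-density values on a grid as given; how those values are obtained (e.g. from a smoothed histogram) is left open, and the least-squares solver is a plain normal-equations fit written for self-containment.

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2; returns (a, b, c)."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 normal-equation system.
    A = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [u - factor * v for u, v in zip(A[r], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))


def central_matching(z_grid, log_density):
    """Return (delta0, sigma0) from a quadratic fit to log f near zero."""
    _, b, c = fit_quadratic(z_grid, log_density)
    sigma2 = -1.0 / (2.0 * c)  # requires c < 0, i.e. a concave fit
    return b * sigma2, math.sqrt(sigma2)
```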

Sections: Type II errors; Optimizing power; Sample size

Type II errors
False non-discovery rate (FNDR) is the proportion of non-null genes among all non-significant genes.
False negative rate (FNR) is the proportion of non-significant genes among all non-null genes.
Sensitivity = power = 1 - FNR, i.e. the proportion of significant genes among all non-null genes.
Given Type I error rate $\alpha$, an optimal testing procedure maximizes sensitivity (minimizes FNR).
[Figure: frequency histogram of p values marking true positives, false positives, true negatives and false negatives relative to the cut-off $\alpha$.]
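In terms of the counts in the multiple testing table (TP, FP, TN, FN), the definitions above translate directly into ratios. The sketch below computes the realized proportions for a single experiment; the function name and the dictionary layout are illustrative.

```python
def type_ii_summaries(tp, fp, tn, fn):
    """Realized FNDR, FNR, sensitivity and false discovery proportion."""
    non_significant = tn + fn   # genes for which H0 is not rejected
    non_null = tp + fn          # genes for which H0 is false
    significant = tp + fp       # genes for which H0 is rejected
    fnr = fn / non_null if non_null else 0.0
    return {
        "FNDR": fn / non_significant if non_significant else 0.0,
        "FNR": fnr,
        "sensitivity": 1.0 - fnr,
        "FDP": fp / significant if significant else 0.0,
    }
```

With 1000 genes of which 100 are non-null, and counts TP = 80, FP = 20, TN = 880, FN = 20, the sensitivity is 0.8 while the FNDR is only 20/900 ≈ 0.022, because non-significant genes are dominated by true nulls.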

Optimal discovery procedure (Storey, 2007)
Neyman-Pearson (NP) lemma (1933): given observed data, the optimal testing procedure is based on the likelihood ratio
$$\frac{P(\text{data} \mid H_1)}{P(\text{data} \mid H_0)}.$$
Storey (2007) applies the NP lemma to the multiple testing situation.
Assume that test $j$ has density $f_j$ under $H_0$ and $g_j$ under $H_1$.
The optimal discovery procedure (ODP) statistic for a gene with observation vector $x$ is defined as
$$S_{\mathrm{ODP}}(x) = \frac{\text{sum of } P(x \text{ under } H_1) \text{ for all non-null genes}}{\text{sum of } P(x \text{ under } H_0) \text{ for all null genes}} = \frac{\sum_{j=m_0+1}^{m} g_j(x)}{\sum_{j=1}^{m_0} f_j(x)}.$$
The $f_j$s and $g_j$s, as well as $m_0$, must be estimated.
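A toy sketch of the ODP statistic. It assumes the null and non-null densities are already known and passed in as callables, whereas in practice they (and $m_0$) must be estimated; for simplicity the example in the note below uses a scalar observation rather than an observation vector.

```python
def odp_statistic(x, null_densities, alt_densities):
    """S_ODP(x): sum of non-null densities at x over sum of null densities at x."""
    numerator = sum(g(x) for g in alt_densities)
    denominator = sum(f(x) for f in null_densities)
    return numerator / denominator
```

For instance, with three null genes having density $N(0, 1)$ and two non-null genes with densities $N(2, 1)$ and $N(-2, 1)$ (hypothetical values), the statistic is larger at $x = 2$ than at $x = 0$ and is symmetric in $x$, since evidence from all non-null genes is pooled regardless of which gene is being tested.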

Optimal discovery procedure (Storey, 2007)
The ODP procedure:
1 Evaluate the estimated ODP statistic for each gene.
2 Use bootstrap to simulate data from the null distribution for each gene, and recompute ODP to get a null distribution for ODP.
3 Use observed and resampled ODPs to calculate a q-value for each gene.

Covariate modulated FDR (Ferkingstad et al., 2008)
Sensitivity can also be increased by adding external covariates $x_i$, $i = 1, \ldots, m$.
Let $g(p \mid x)$ be the conditional density of $p$ under $H_1$ and $\pi_0(x) = P(H_0 \text{ true} \mid x)$.
The mixture model for p values given $x$ is then
$$f(p \mid x) = \pi_0(x) + (1 - \pi_0(x)) g(p \mid x).$$

Sample size assessments (Pawitan et al., 2005)
[Figure: FDR (and FNR) as a function of sample size.]

Sample size assessments (Efron, 2007)
Efron (2007) studied how multiplying the sample size by a factor $c$ would affect the local fdr.
[Table with columns $c$, Prostate cancer, HIV; values not recovered.]

Summary
- Use of classical p values is problematic in large-scale simultaneous hypothesis testing situations, as it easily generates too many false positives.
- For microarrays, the false discovery rate (FDR) is a convenient measure for balancing the number of false positives and false negatives.
- FDR can be calculated using the Benjamini & Hochberg step-up procedure (frequentist approach) or a mixture model (Bayesian or empirical Bayesian approach).
- The mixture model approach has recently been used to avoid the $N(0, 1)$ null distribution assumption, and to include external covariates.
- Methods for power and sample size calculations when controlling significance via FDR have recently been proposed.

References
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. B, 57.
Benjamini, Y. and Hochberg, Y. (2000). The adaptive control of the false discovery rate in multiple hypotheses testing. J. Educ. Behav. Statist., 25.
Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist., 29.
Efron, B. (2007). Size, power and false discovery rates. Ann. Statist., 35.
Efron, B. et al. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc., 96.
Ferkingstad, E. et al. (2008). Unsupervised empirical Bayesian multiple testing with external covariates. Ann. Appl. Statist., 2.

Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist., 32.
Langaas, M. et al. (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data. J. Roy. Statist. Soc. Ser. B, 67.
Pawitan, Y. et al. (2005). False discovery rate, sensitivity and sample size for microarray studies. Bioinformatics, 21.
Storey, J. D. (2002). A direct approach to false discovery rates. J. Roy. Statist. Soc. B, 64.
Storey, J. D. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA, 100.
Storey, J. D. (2007). The optimal discovery procedure: a new approach to simultaneous significance testing. J. Roy. Statist. Soc. B, 69.


More information

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo PROCEDURES CONTROLLING THE k-fdr USING BIVARIATE DISTRIBUTIONS OF THE NULL p-values Sanat K. Sarkar and Wenge Guo Temple University and National Institute of Environmental Health Sciences Abstract: Procedures

More information

Biostatistics Advanced Methods in Biostatistics IV

Biostatistics Advanced Methods in Biostatistics IV Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu Lecture 11 1 / 44 Tip + Paper Tip: Two today: (1) Graduate school

More information

A Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments

A Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments A Bayesian Determination of Threshold for Identifying Differentially Expressed Genes in Microarray Experiments Jie Chen 1 Merck Research Laboratories, P. O. Box 4, BL3-2, West Point, PA 19486, U.S.A. Telephone:

More information

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao

ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE. By Wenge Guo and M. Bhaskara Rao ON STEPWISE CONTROL OF THE GENERALIZED FAMILYWISE ERROR RATE By Wenge Guo and M. Bhaskara Rao National Institute of Environmental Health Sciences and University of Cincinnati A classical approach for dealing

More information

More powerful control of the false discovery rate under dependence

More powerful control of the false discovery rate under dependence Statistical Methods & Applications (2006) 15: 43 73 DOI 10.1007/s10260-006-0002-z ORIGINAL ARTICLE Alessio Farcomeni More powerful control of the false discovery rate under dependence Accepted: 10 November

More information

Looking at the Other Side of Bonferroni

Looking at the Other Side of Bonferroni Department of Biostatistics University of Washington 24 May 2012 Multiple Testing: Control the Type I Error Rate When analyzing genetic data, one will commonly perform over 1 million (and growing) hypothesis

More information

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data Faming Liang, Chuanhai Liu, and Naisyin Wang Texas A&M University Multiple Hypothesis Testing Introduction

More information

False discovery rate procedures for high-dimensional data Kim, K.I.

False discovery rate procedures for high-dimensional data Kim, K.I. False discovery rate procedures for high-dimensional data Kim, K.I. DOI: 10.6100/IR637929 Published: 01/01/2008 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue

More information

Multiple Testing Procedures under Dependence, with Applications

Multiple Testing Procedures under Dependence, with Applications Multiple Testing Procedures under Dependence, with Applications Alessio Farcomeni November 2004 ii Dottorato di ricerca in Statistica Metodologica Dipartimento di Statistica, Probabilità e Statistiche

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2004 Paper 164 Multiple Testing Procedures: R multtest Package and Applications to Genomics Katherine

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 6, Issue 1 2007 Article 28 A Comparison of Methods to Control Type I Errors in Microarray Studies Jinsong Chen Mark J. van der Laan Martyn

More information

Multiple Hypothesis Testing in Microarray Data Analysis

Multiple Hypothesis Testing in Microarray Data Analysis Multiple Hypothesis Testing in Microarray Data Analysis Sandrine Dudoit jointly with Mark van der Laan and Katie Pollard Division of Biostatistics, UC Berkeley www.stat.berkeley.edu/~sandrine Short Course:

More information

POSITIVE FALSE DISCOVERY PROPORTIONS: INTRINSIC BOUNDS AND ADAPTIVE CONTROL

POSITIVE FALSE DISCOVERY PROPORTIONS: INTRINSIC BOUNDS AND ADAPTIVE CONTROL Statistica Sinica 18(2008, 837-860 POSITIVE FALSE DISCOVERY PROPORTIONS: INTRINSIC BOUNDS AND ADAPTIVE CONTROL Zhiyi Chi and Zhiqiang Tan University of Connecticut and Rutgers University Abstract: A useful

More information

On Methods Controlling the False Discovery Rate 1

On Methods Controlling the False Discovery Rate 1 Sankhyā : The Indian Journal of Statistics 2008, Volume 70-A, Part 2, pp. 135-168 c 2008, Indian Statistical Institute On Methods Controlling the False Discovery Rate 1 Sanat K. Sarkar Temple University,

More information

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Multiple testing methods to control the False Discovery Rate (FDR),

More information

Inferential Statistical Analysis of Microarray Experiments 2007 Arizona Microarray Workshop

Inferential Statistical Analysis of Microarray Experiments 2007 Arizona Microarray Workshop Inferential Statistical Analysis of Microarray Experiments 007 Arizona Microarray Workshop μ!! Robert J Tempelman Department of Animal Science tempelma@msuedu HYPOTHESIS TESTING (as if there was only one

More information

arxiv:math/ v1 [math.st] 29 Dec 2006 Jianqing Fan Peter Hall Qiwei Yao

arxiv:math/ v1 [math.st] 29 Dec 2006 Jianqing Fan Peter Hall Qiwei Yao TO HOW MANY SIMULTANEOUS HYPOTHESIS TESTS CAN NORMAL, STUDENT S t OR BOOTSTRAP CALIBRATION BE APPLIED? arxiv:math/0701003v1 [math.st] 29 Dec 2006 Jianqing Fan Peter Hall Qiwei Yao ABSTRACT. In the analysis

More information

Resampling-based Multiple Testing with Applications to Microarray Data Analysis

Resampling-based Multiple Testing with Applications to Microarray Data Analysis Resampling-based Multiple Testing with Applications to Microarray Data Analysis DISSERTATION Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School

More information

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007 Department of Statistics University of Central Florida Technical Report TR-2007-01 25APR2007 Revised 25NOV2007 Controlling the Number of False Positives Using the Benjamini- Hochberg FDR Procedure Paul

More information

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control

Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control Weighted Adaptive Multiple Decision Functions for False Discovery Rate Control Joshua D. Habiger Oklahoma State University jhabige@okstate.edu Nov. 8, 2013 Outline 1 : Motivation and FDR Research Areas

More information

Tools and topics for microarray analysis

Tools and topics for microarray analysis Tools and topics for microarray analysis USSES Conference, Blowing Rock, North Carolina, June, 2005 Jason A. Osborne, osborne@stat.ncsu.edu Department of Statistics, North Carolina State University 1 Outline

More information

Hypothesis testing (cont d)

Hypothesis testing (cont d) Hypothesis testing (cont d) Ulrich Heintz Brown University 4/12/2016 Ulrich Heintz - PHYS 1560 Lecture 11 1 Hypothesis testing Is our hypothesis about the fundamental physics correct? We will not be able

More information

TO HOW MANY SIMULTANEOUS HYPOTHESIS TESTS CAN NORMAL, STUDENT S t OR BOOTSTRAP CALIBRATION BE APPLIED? Jianqing Fan Peter Hall Qiwei Yao

TO HOW MANY SIMULTANEOUS HYPOTHESIS TESTS CAN NORMAL, STUDENT S t OR BOOTSTRAP CALIBRATION BE APPLIED? Jianqing Fan Peter Hall Qiwei Yao TO HOW MANY SIMULTANEOUS HYPOTHESIS TESTS CAN NORMAL, STUDENT S t OR BOOTSTRAP CALIBRATION BE APPLIED? Jianqing Fan Peter Hall Qiwei Yao ABSTRACT. In the analysis of microarray data, and in some other

More information

Sta$s$cs for Genomics ( )

Sta$s$cs for Genomics ( ) Sta$s$cs for Genomics (140.688) Instructor: Jeff Leek Slide Credits: Rafael Irizarry, John Storey No announcements today. Hypothesis testing Once you have a given score for each gene, how do you decide

More information

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses Gavin Lynch Catchpoint Systems, Inc., 228 Park Ave S 28080 New York, NY 10003, U.S.A. Wenge Guo Department of Mathematical

More information

ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE

ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE Statistica Sinica 22 (2012), 1689-1716 doi:http://dx.doi.org/10.5705/ss.2010.255 ESTIMATING THE PROPORTION OF TRUE NULL HYPOTHESES UNDER DEPENDENCE Irina Ostrovnaya and Dan L. Nicolae Memorial Sloan-Kettering

More information

Single gene analysis of differential expression

Single gene analysis of differential expression Single gene analysis of differential expression Giorgio Valentini DSI Dipartimento di Scienze dell Informazione Università degli Studi di Milano valentini@dsi.unimi.it Comparing two conditions Each condition

More information

Some General Types of Tests

Some General Types of Tests Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use test of some general class of tests that often have good asymptotic properties.

More information

Tweedie s Formula and Selection Bias. Bradley Efron Stanford University

Tweedie s Formula and Selection Bias. Bradley Efron Stanford University Tweedie s Formula and Selection Bias Bradley Efron Stanford University Selection Bias Observe z i N(µ i, 1) for i = 1, 2,..., N Select the m biggest ones: z (1) > z (2) > z (3) > > z (m) Question: µ values?

More information

Alpha-Investing. Sequential Control of Expected False Discoveries

Alpha-Investing. Sequential Control of Expected False Discoveries Alpha-Investing Sequential Control of Expected False Discoveries Dean Foster Bob Stine Department of Statistics Wharton School of the University of Pennsylvania www-stat.wharton.upenn.edu/ stine Joint

More information

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs with Remarks on Family Selection Dissertation Defense April 5, 204 Contents Dissertation Defense Introduction 2 FWER Control within

More information

Specific Differences. Lukas Meier, Seminar für Statistik

Specific Differences. Lukas Meier, Seminar für Statistik Specific Differences Lukas Meier, Seminar für Statistik Problem with Global F-test Problem: Global F-test (aka omnibus F-test) is very unspecific. Typically: Want a more precise answer (or have a more

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. doi: 10.18637/jss.v000.i00 GroupTest: Multiple Testing Procedure for Grouped Hypotheses Zhigen Zhao Abstract In the modern Big Data

More information

False Discovery Rate

False Discovery Rate False Discovery Rate Peng Zhao Department of Statistics Florida State University December 3, 2018 Peng Zhao False Discovery Rate 1/30 Outline 1 Multiple Comparison and FWER 2 False Discovery Rate 3 FDR

More information

Resampling-Based Control of the FDR

Resampling-Based Control of the FDR Resampling-Based Control of the FDR Joseph P. Romano 1 Azeem S. Shaikh 2 and Michael Wolf 3 1 Departments of Economics and Statistics Stanford University 2 Department of Economics University of Chicago

More information

The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE

The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE The Pennsylvania State University The Graduate School A BAYESIAN APPROACH TO FALSE DISCOVERY RATE FOR LARGE SCALE SIMULTANEOUS INFERENCE A Thesis in Statistics by Bing Han c 2007 Bing Han Submitted in

More information

Correlation, z-values, and the Accuracy of Large-Scale Estimators. Bradley Efron Stanford University

Correlation, z-values, and the Accuracy of Large-Scale Estimators. Bradley Efron Stanford University Correlation, z-values, and the Accuracy of Large-Scale Estimators Bradley Efron Stanford University Correlation and Accuracy Modern Scientific Studies N cases (genes, SNPs, pixels,... ) each with its own

More information

A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data

A moment-based method for estimating the proportion of true null hypotheses and its application to microarray gene expression data Biostatistics (2007), 8, 4, pp. 744 755 doi:10.1093/biostatistics/kxm002 Advance Access publication on January 22, 2007 A moment-based method for estimating the proportion of true null hypotheses and its

More information

Multiple hypothesis testing using the excess discovery count and alpha-investing rules

Multiple hypothesis testing using the excess discovery count and alpha-investing rules Multiple hypothesis testing using the excess discovery count and alpha-investing rules Dean P. Foster and Robert A. Stine Department of Statistics The Wharton School of the University of Pennsylvania Philadelphia,

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Hunting for significance with multiple testing

Hunting for significance with multiple testing Hunting for significance with multiple testing Etienne Roquain 1 1 Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France Séminaire MODAL X, 19 mai 216 Etienne Roquain Hunting for significance

More information

Exam: high-dimensional data analysis February 28, 2014

Exam: high-dimensional data analysis February 28, 2014 Exam: high-dimensional data analysis February 28, 2014 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question (not the subquestions) on a separate piece of paper.

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1 2004 Article 14 Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate Mark J. van der Laan

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

SOME STEP-DOWN PROCEDURES CONTROLLING THE FALSE DISCOVERY RATE UNDER DEPENDENCE

SOME STEP-DOWN PROCEDURES CONTROLLING THE FALSE DISCOVERY RATE UNDER DEPENDENCE Statistica Sinica 18(2008), 881-904 SOME STEP-DOWN PROCEDURES CONTROLLING THE FALSE DISCOVERY RATE UNDER DEPENDENCE Yongchao Ge 1, Stuart C. Sealfon 1 and Terence P. Speed 2,3 1 Mount Sinai School of Medicine,

More information

Positive false discovery proportions: intrinsic bounds and adaptive control

Positive false discovery proportions: intrinsic bounds and adaptive control Positive false discovery proportions: intrinsic bounds and adaptive control Zhiyi Chi and Zhiqiang Tan University of Connecticut and The Johns Hopkins University Running title: Bounds and control of pfdr

More information

MIXTURE MODELS FOR DETECTING DIFFERENTIALLY EXPRESSED GENES IN MICROARRAYS

MIXTURE MODELS FOR DETECTING DIFFERENTIALLY EXPRESSED GENES IN MICROARRAYS International Journal of Neural Systems, Vol. 16, No. 5 (2006) 353 362 c World Scientific Publishing Company MIXTURE MOLS FOR TECTING DIFFERENTIALLY EXPRESSED GENES IN MICROARRAYS LIAT BEN-TOVIM JONES

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

DETECTING DIFFERENTIALLY EXPRESSED GENES WHILE CONTROLLING THE FALSE DISCOVERY RATE FOR MICROARRAY DATA

DETECTING DIFFERENTIALLY EXPRESSED GENES WHILE CONTROLLING THE FALSE DISCOVERY RATE FOR MICROARRAY DATA University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Dissertations and Theses in Statistics Statistics, Department of 2009 DETECTING DIFFERENTIALLY EXPRESSED GENES WHILE CONTROLLING

More information

New Procedures for False Discovery Control

New Procedures for False Discovery Control New Procedures for False Discovery Control Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Elisha Merriam Department of Neuroscience University

More information

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES By Wenge Guo Gavin Lynch Joseph P. Romano Technical Report No. 2018-06 September 2018

More information