Stat 206: Estimation and testing for a mean vector,
Part II
James Johndrow

Comparing components of the mean vector

In the last part, we talked about testing the hypothesis $H_0: \mu_1 = \mu_2$, where $\mu_1$ and $\mu_2$ are two population mean vectors. This is a natural testing setting for multivariate statistics, but it often isn't the focus of scientific interest. The more common case in applied multivariate analysis is that we have observations on a $p$-vector, and we want to simultaneously test all of the marginal hypotheses $H_{0j}: \mu_{1j} = \mu_{2j}$, $j = 1, \dots, p$, i.e. that each component of $\mu_1$ equals the corresponding component of $\mu_2$. To see why this differs, keep in mind that we could reject some of the $H_{0j}$ and fail to reject others, whereas before we were performing a single test for equality of the whole vector.

Each $H_{0j}$ is a univariate hypothesis, and when both groups are jointly normal (or, in this case, even just marginally normal) we would do a $t$ test for each $H_{0j}$. This is, in fact, perfectly valid for each test considered on its own. The problem is that we are doing $p$ tests simultaneously, and in many of the applications of interest in modern multivariate statistics, $p$ might be large, even bigger than $n$. The classical tests we just covered won't work when $p > n$,¹ but the methods we'll talk about now can be applied in that setting as well. Moreover, they aren't restricted to that setting; in fact, the problem of multiple testing is relevant whenever $p > 1$.

¹ Why? For one thing, the sample covariance matrix isn't even full rank when $p > n$, which means we cannot compute its inverse. But that inverse appears in the Hotelling test statistic, so we're stuck.

If we test $H_{0j}$ at level $\alpha$, then by definition the probability of a Type I error is $\alpha$. Recall that a p-value is the probability under the null of observing a test statistic at least as extreme as the one observed. Accordingly, we can recast level-$\alpha$ testing as "reject $H_0$ if $\tilde p < \alpha$."²
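Footnote 1's claim is easy to check numerically. The sketch below uses simulated standard normal data (the sizes n = 10, p = 20 and the seed are arbitrary illustrative choices, not from the notes) and counts the numerically nonzero eigenvalues of the sample covariance matrix.

```r
# Illustration of footnote 1 with simulated data (n, p and the seed are
# arbitrary choices, not taken from the notes).
set.seed(206)
n <- 10   # number of observations
p <- 20   # dimension, with p > n
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

S <- cov(X)  # p x p sample covariance matrix
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values

# Numerical rank: count eigenvalues that are not negligibly small.
# Centering makes the n rows sum to zero, so the rank is at most n - 1.
rank_S <- sum(ev > 1e-10 * max(ev))
print(c(rank = rank_S, dimension = p))
```

Because the rank is $n - 1 < p$, $S$ has no inverse, and the Hotelling statistic cannot be formed.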
Since we are testing $p$ such hypotheses, the probability of making at least one Type I error, referred to as the familywise error rate (FWER), is in general larger than $\alpha$. If we instead perform each test at level $\alpha/p$ then, by Boole's inequality,³

$$P\Big[\bigcup_{j=1}^{p} \{\tilde p_j < \alpha/p\} \;\Big|\; H_0\Big] \le \sum_{j=1}^{p} P\big[\tilde p_j < \alpha/p \;\big|\; H_0\big] = p \cdot \frac{\alpha}{p} = \alpha,$$

so the familywise error rate is controlled at level $\alpha$.

² I have put a tilde on $p$ here so as not to cause confusion with the dimension of the vector/number of hypotheses.

³ Some authors refer to this as Bonferroni's inequality or the union bound.

It's not necessarily obvious that controlling the familywise error rate is the right thing to do, but it's also clear that testing each hypothesis at level $\alpha$ doesn't make much sense when $p$ is more than a few. To see why, suppose that the hypothesis tests are all independent,⁴ and that every one of the null hypotheses $H_{0j}$ is actually true. Then if we test each at level $\alpha$, the number of Type I errors (false positives) $V$ is distributed as

$$V \sim \text{Binomial}(p, \alpha).$$

When $p\alpha$ is an integer it is the median of this distribution, so in particular $P[V \ge p\alpha] \ge 1/2$. So if $\alpha = 0.05$ and $p = 1000$, we will make at least 50 mistakes (Type I errors) at least half of the time. This is an obvious problem in modern science, where $p$ is often in the hundreds or thousands and $n$ is similar to $p$ or smaller.

⁴ A dubious assumption, but useful for exposition.

On the other hand, controlling the FWER is pretty conservative. It might not be so bad to have a few Type I errors, so long as they are a relatively small proportion of the total number of rejections of $H_{0j}$. In particular, we want to choose a level $\alpha$ at which to perform each test such that $V/(V+S)$ is small, where

$$V = \sum_j 1\{\tilde p_j < \alpha,\ H_{0j} \text{ true}\}, \qquad S = \sum_j 1\{\tilde p_j < \alpha,\ H_{1j} \text{ true}\}$$

are the number of rejections for which the null was true and the number of rejections for which the alternative was true, respectively. In this notation, the FWER is equal to $P[V \ge 1]$. The quantity

$$Q_e = E[Q] = E\left[\frac{V}{V+S}\right],$$

with $Q$ defined to be 0 when $V + S = 0$, is called the false discovery rate (FDR). $Q_e$ is the expectation of the unobserved random variable $Q$. Let $p_0$ be the number of null hypotheses that are true. The following two basic facts are important:

1. If all the null hypotheses are true, then the FDR is equal to the FWER (in that case $S = 0$, so $Q = 1\{V \ge 1\}$).
2. When $p_0 < p$, the FDR is less than or equal to the FWER (since $Q \le 1\{V \ge 1\}$ always).

Therefore, any procedure that controls FWER also controls FDR.
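The numbers in the discussion above can be reproduced in a few lines of R. This sketch assumes, as the text does, p = 1000 independent tests with every null true. The helper `fdp`, which computes the realized proportion $Q = V/(V+S)$ (taken as 0 when nothing is rejected), is a hypothetical name introduced here for illustration only.

```r
# Numbers behind the FWER discussion, assuming p independent tests with
# every null hypothesis true (the "dubious assumption" of footnote 4).
alpha <- 0.05
p <- 1000

# V ~ Binomial(p, alpha). Since p * alpha = 50 is an integer, it is the
# median of V, so P[V >= 50] = P[V > 49] is at least 1/2.
p_at_least_50 <- pbinom(p * alpha - 1, size = p, prob = alpha,
                        lower.tail = FALSE)

# FWER = P[V >= 1] when each hypothesis is tested at level alpha ...
fwer_unadjusted <- 1 - (1 - alpha)^p
# ... and when each is tested at the Bonferroni level alpha / p.
fwer_bonferroni <- 1 - (1 - alpha / p)^p

print(c(p_at_least_50 = p_at_least_50,
        fwer_unadjusted = fwer_unadjusted,
        fwer_bonferroni = fwer_bonferroni))

# Hypothetical helper: realized false discovery proportion Q = V / (V + S),
# defined as 0 when there are no rejections. `reject` and `is_null` are
# logical vectors with one entry per hypothesis.
fdp <- function(reject, is_null) {
  R <- sum(reject)              # total discoveries, V + S
  if (R == 0) return(0)
  V <- sum(reject & is_null)    # false discoveries
  V / R
}

# Three rejections, one of which is a true null, so Q = 1/3.
print(fdp(reject = c(TRUE, TRUE, FALSE, TRUE),
          is_null = c(TRUE, FALSE, TRUE, FALSE)))
```

Unadjusted testing makes at least one false rejection essentially always, while the Bonferroni level keeps the FWER just under 0.05.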
However, a procedure that controls only the FDR is less stringent, so there is the potential for higher power. In particular, when $p - p_0$ is large, $S$ tends to be large, resulting in a larger difference between $E[Q]$ and $P[V \ge 1]$. Thus, controlling FDR at level $\alpha$ is less conservative than controlling FWER at level $\alpha$, since it allows us to make multiple Type I errors in expectation, so long as they don't account for too high a proportion of the total number of discoveries (hypothesis tests for which we reject the null). Notice that the FDR depends on probabilities under both the null and the alternative, so we'll now be concerned with the power functions of the component tests.

A motivating example

Let's motivate the discussion to follow with a real problem.⁵ The following dataset has gene expression measurements for two samples: one of size $n_1$ from healthy prostate tissue and one of size $n_2$ from prostate cancer tissue. The expression levels of $p = 6033$ genes were measured. The scientific question of interest is whether the genes are differentially expressed in cancer and healthy tissue. The corresponding null hypotheses are $H_{0j}: \mu_{1j} = \mu_{2j}$, $j = 1, \dots, p$, where $\mu_1$ is the true mean for the control group and $\mu_2$ the true mean for the disease-state group. Let $\bar x_1$ be the sample mean for the control group and $\bar x_2$ the sample mean for the cancer group. Then, assuming that the variances are equal between the control and disease-state groups, the two-sample $t$ statistic for each gene is

$$t_j = \frac{\bar x_{1j} - \bar x_{2j}}{\sqrt{s_j}},$$

where $s_j$ is the estimated variance of $\bar x_{1j} - \bar x_{2j}$ under equal variances,

$$s_j = \frac{s_{1j} + s_{2j}}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad s_{1j} = \sum_{i=1}^{n_1} (x_{ij} - \bar x_{1j})^2, \qquad s_{2j} = \sum_{i=n_1+1}^{n_1+n_2} (x_{ij} - \bar x_{2j})^2.$$

⁵ This dataset is analyzed at length in Efron's Large Scale Inference (2014).

It will be convenient to transform from $t$-values to $z$-values, i.e.

$$z_j = \Phi^{-1}\big(2\,T_{n_1+n_2-2}(|t_j|)\big),$$

where $T_{n_1+n_2-2}$ is the upper-tail probability of the $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom, the distribution of the test statistic under the null. Under the null, $z_j$ then has a standard normal distribution. This allows us to discuss methods in generality, emphasizing that the basic principles apply regardless of the distribution of the test statistic used. Let's load the data and compute $z_j$ for each coordinate.

```r
library(ggplot2)

load('../../datasets-efron/prostatedata.rdata')

# Column names of prostatedata code the group of each subject (1 or 2).
ytmp <- as.numeric(colnames(prostatedata))
y <- character(length(ytmp))
y[ytmp == 1] <- 'cancer'
y[ytmp == 2] <- 'control'

# One row per subject, one column per gene.
x <- as.matrix(t(prostatedata))
colnames(x) <- paste('gene', seq(ncol(x)), sep = '')
rownames(x) <- paste('subject', seq(nrow(x)), sep = '')
x <- data.frame(x)
xs <- split(x, as.factor(y))   # list of two data frames, one per group

# t.test.p is not defined in the notes; assumed definition: two-sided
# p-value of the pooled-variance (equal variances) two-sample t test.
t.test.p <- function(a, b) t.test(a, b, var.equal = TRUE)$p.value

pvals <- mapply(t.test.p, xs[[1]], xs[[2]])   # one p-value per gene
z.scores <- data.frame(z = qnorm(pvals))

ggplot(z.scores, aes(x = z)) +
  geom_histogram(aes(y = after_stat(density)), bins = 50) +
  geom_vline(xintercept = c(qnorm(.05/6033), qnorm(1 - .05/6033))) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), col = 'red')
```

Figure 1: z scores for the prostate data, with the Bonferroni threshold at level 0.05 shown as a vertical line and the theoretical distribution of the z score under the null overlaid in red.

As you can see, if we applied the Bonferroni bound to control FWER, we would not have many discoveries (rejections of the null). In fact, there are only 2 out of the 6033 genes.

```r
bonf.p <- p.adjust(pvals, method = 'bonferroni')
sum(bonf.p < .05)
## [1] 2
```

FDR control: the method of Benjamini and Hochberg

Perhaps the earliest, and simplest, method for control of FDR was proposed by Benjamini and Hochberg (1995). We give the procedure here; for a proof that the procedure controls FDR at the specified level, see the original paper. Let $\tilde p_{(1)} \le \tilde p_{(2)} \le \dots \le \tilde p_{(p)}$ be the ordered p-values, and let $k$ be the largest $j$ for which

$$\tilde p_{(j)} \le \frac{j}{p}\,\alpha, \qquad (1)$$

and reject all $H_{(j)}$, $j = 1, \dots, k$, where $H_{(j)}$ is the null hypothesis corresponding to $\tilde p_{(j)}$. If the test statistics are either independent or satisfy a positive dependence condition (see Benjamini and Yekutieli (2001)), this procedure controls FDR at level $\alpha$. Let's use the Benjamini-Hochberg procedure on the prostate cancer data:

```r
# p.adjust method names are case-sensitive: 'BH', not 'bh'.
bh.p <- p.adjust(pvals, method = 'BH')
sum(bh.p < .05)
## [1] 21
```

As expected, we have considerably more discoveries (rejections of the null): 21 instead of 2 at the 0.05 level.

Benjamini and Yekutieli (2001) propose a second procedure that controls FDR at level $\alpha$ for any dependence structure of the test statistics. Let $k$ be the largest $j$ for which

$$\tilde p_{(j)} \le \frac{j}{p \sum_{i=1}^{p} i^{-1}}\,\alpha,$$

and reject all $H_{(j)}$, $j = 1, \dots, k$. Then FDR is controlled at level $\alpha$. Let's apply this alternative FDR-controlling procedure to the prostate data:

```r
by.p <- p.adjust(pvals, method = 'BY')
sum(by.p < .05)
## [1] 2
```

In this particular case, we've given up all of the additional discoveries we made using the Benjamini-Hochberg procedure. Benjamini and Yekutieli (2001) argue that in most applied problems, the positive dependence condition required by the original Benjamini-Hochberg procedure is likely to hold. The specific condition needed is the following. A set $D$ is increasing if $x \in D$ and $y \ge x$ (componentwise) imply $y \in D$. Let $J_0 \subset \{1, \dots, p\}$ be an arbitrary subset of indices.

Definition 1. A random vector $X$ is PRDS (positive regression dependency on each one from a subset) on $J_0$ if, for any increasing set $D$ and for each $j \in J_0$, $P[X \in D \mid X_j = x]$ is nondecreasing in $x$.

The condition, then, for the procedure in (1) to control FDR at level $\alpha$ is the following.

Theorem 1 (Benjamini and Yekutieli (2001)). Let $X$ be the random vector of test statistics and $J_0 = \{j : H_{0j} \text{ is true}\}$. If $X$ is PRDS on $J_0$, then the Benjamini-Hochberg procedure in (1) controls the FDR at level less than or equal to $(p_0/p)\alpha$.
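Rule (1) and the Benjamini-Yekutieli rule are short enough to write out directly. The sketch below implements both step-up rules from the definitions above and checks that they reject the same hypotheses as R's built-in `p.adjust` with methods `"BH"` and `"BY"`. The helper name `step_up_reject`, the simulated p-values, and the seed are illustrative assumptions, not part of the notes.

```r
# Step-up rules written out from the definitions: find the largest j with
# p_(j) <= j * alpha / (p * c_p) and reject the j smallest p-values.
# c_p = 1 gives Benjamini-Hochberg, rule (1); c_p = sum(1 / (1:p)) gives
# Benjamini-Yekutieli. `step_up_reject` is a hypothetical helper.
step_up_reject <- function(pvals, alpha = 0.05, method = c("BH", "BY")) {
  method <- match.arg(method)
  p <- length(pvals)
  c_p <- if (method == "BH") 1 else sum(1 / seq_len(p))
  ord <- order(pvals)                          # indices of sorted p-values
  thresholds <- seq_len(p) * alpha / (p * c_p)
  below <- which(pvals[ord] <= thresholds)     # all j meeting the criterion
  reject <- rep(FALSE, p)
  if (length(below) > 0) {
    k <- max(below)                            # largest such j
    reject[ord[seq_len(k)]] <- TRUE            # reject H_(1), ..., H_(k)
  }
  reject
}

# Simulated p-values: 900 true nulls and 100 one-sided z-test alternatives
# with mean shift 3 (illustrative settings).
set.seed(1995)
z <- c(rnorm(900), rnorm(100, mean = 3))
pv <- pnorm(z, lower.tail = FALSE)

bh_manual <- step_up_reject(pv, alpha = 0.05, method = "BH")
by_manual <- step_up_reject(pv, alpha = 0.05, method = "BY")

# An adjusted p-value <= alpha is equivalent to rejection by the step-up rule.
bh_padj <- p.adjust(pv, method = "BH") <= 0.05
by_padj <- p.adjust(pv, method = "BY") <= 0.05

print(c(BH = sum(bh_manual), BY = sum(by_manual)))
print(c(BH_agrees = identical(bh_manual, bh_padj),
        BY_agrees = identical(by_manual, by_padj)))
```

Because the BY thresholds are the BH thresholds divided by $\sum_{i \le p} i^{-1} > 1$, BY always rejects a subset of what BH rejects, which matches the drop from 21 to 2 discoveries on the prostate data.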

Benjamini and Yekutieli (2001) provide numerous examples of common applications in which the condition in Theorem 1 holds. Because the condition is considered relatively weak, the procedure in (1) has become the default FDR-controlling procedure. However, when using the Benjamini-Hochberg procedure you should at least check whether your specific application has previously been considered with respect to the PRDS condition in the literature, or evaluate the plausibility of the condition for your problem (which usually requires specifying a likelihood).

Other methods for controlling FDR

Numerous other methods for controlling FDR exist, though most require significant background in other areas of statistics to understand thoroughly. Among these are empirical Bayes and fully Bayes procedures that offer control of local FDR, the probability that a given test statistic corresponds to a false discovery. If interested, I recommend Efron's book Large Scale Inference (2014) for a comprehensive treatment of empirical Bayes approaches.
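The simplest setting in which the PRDS condition of Theorem 1 holds is independent test statistics. There, a small Monte Carlo experiment shows the bound $(p_0/p)\alpha$ at work; for independent tests with continuous null p-values the bound is known to hold with equality, so the estimate should land near $0.8 \times 0.1 = 0.08$. All settings below (number of hypotheses, number of true nulls, signal size, seed) are illustrative assumptions.

```r
# Monte Carlo sketch of Theorem 1 for independent one-sided z-tests.
# Illustrative settings: p = 200 hypotheses, p0 = 160 true nulls,
# alternatives shifted by 2.5, BH at level alpha = 0.1.
set.seed(2001)
p <- 200
p0 <- 160
alpha <- 0.1
n_rep <- 2000
is_null <- c(rep(TRUE, p0), rep(FALSE, p - p0))

fdp_bh <- replicate(n_rep, {
  z <- rnorm(p, mean = ifelse(is_null, 0, 2.5))
  pv <- pnorm(z, lower.tail = FALSE)             # one-sided p-values
  reject <- p.adjust(pv, method = "BH") <= alpha
  R <- sum(reject)
  if (R == 0) 0 else sum(reject & is_null) / R   # realized Q = V / (V + S)
})

fdr_hat <- mean(fdp_bh)   # Monte Carlo estimate of E[Q]
print(c(estimated_FDR = fdr_hat, bound = p0 / p * alpha))
```

Under positive dependence the same experiment would be expected to give an FDR at or below the bound; under arbitrary dependence there is no such guarantee for rule (1), which is what motivates the Benjamini-Yekutieli correction.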


More information

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES

A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES A NEW APPROACH FOR LARGE SCALE MULTIPLE TESTING WITH APPLICATION TO FDR CONTROL FOR GRAPHICALLY STRUCTURED HYPOTHESES By Wenge Guo Gavin Lynch Joseph P. Romano Technical Report No. 2018-06 September 2018

More information

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6. Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized

More information

Sample Size Estimation for Studies of High-Dimensional Data

Sample Size Estimation for Studies of High-Dimensional Data Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Two sample T 2 test 1 Two sample T 2 test 2 Analogous to the univariate context, we

More information

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments

Linear Models and Empirical Bayes Methods for. Assessing Differential Expression in Microarray Experiments Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments by Gordon K. Smyth (as interpreted by Aaron J. Baraff) STAT 572 Intro Talk April 10, 2014 Microarray

More information

Multiple Testing. Tim Hanson. January, Modified from originals by Gary W. Oehlert. Department of Statistics University of South Carolina

Multiple Testing. Tim Hanson. January, Modified from originals by Gary W. Oehlert. Department of Statistics University of South Carolina Multiple Testing Tim Hanson Department of Statistics University of South Carolina January, 2017 Modified from originals by Gary W. Oehlert Type I error A Type I error is to wrongly reject the null hypothesis

More information

Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses

Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses Adaptive Filtering Multiple Testing Procedures for Partial Conjunction Hypotheses arxiv:1610.03330v1 [stat.me] 11 Oct 2016 Jingshu Wang, Chiara Sabatti, Art B. Owen Department of Statistics, Stanford University

More information

On adaptive procedures controlling the familywise error rate

On adaptive procedures controlling the familywise error rate , pp. 3 On adaptive procedures controlling the familywise error rate By SANAT K. SARKAR Temple University, Philadelphia, PA 922, USA sanat@temple.edu Summary This paper considers the problem of developing

More information

Introductory Econometrics

Introductory Econometrics Session 4 - Testing hypotheses Roland Sciences Po July 2011 Motivation After estimation, delivering information involves testing hypotheses Did this drug had any effect on the survival rate? Is this drug

More information

CSE 312 Final Review: Section AA

CSE 312 Final Review: Section AA CSE 312 TAs December 8, 2011 General Information General Information Comprehensive Midterm General Information Comprehensive Midterm Heavily weighted toward material after the midterm Pre-Midterm Material

More information

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n = Hypothesis testing I I. What is hypothesis testing? [Note we re temporarily bouncing around in the book a lot! Things will settle down again in a week or so] - Exactly what it says. We develop a hypothesis,

More information

Week 5 Video 1 Relationship Mining Correlation Mining

Week 5 Video 1 Relationship Mining Correlation Mining Week 5 Video 1 Relationship Mining Correlation Mining Relationship Mining Discover relationships between variables in a data set with many variables Many types of relationship mining Correlation Mining

More information

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math. Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if

More information

Tweedie s Formula and Selection Bias. Bradley Efron Stanford University

Tweedie s Formula and Selection Bias. Bradley Efron Stanford University Tweedie s Formula and Selection Bias Bradley Efron Stanford University Selection Bias Observe z i N(µ i, 1) for i = 1, 2,..., N Select the m biggest ones: z (1) > z (2) > z (3) > > z (m) Question: µ values?

More information

Estimation of a Two-component Mixture Model

Estimation of a Two-component Mixture Model Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint

More information

Proof Techniques (Review of Math 271)

Proof Techniques (Review of Math 271) Chapter 2 Proof Techniques (Review of Math 271) 2.1 Overview This chapter reviews proof techniques that were probably introduced in Math 271 and that may also have been used in a different way in Phil

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,

More information

FALSE DISCOVERY AND FALSE NONDISCOVERY RATES IN SINGLE-STEP MULTIPLE TESTING PROCEDURES 1. BY SANAT K. SARKAR Temple University

FALSE DISCOVERY AND FALSE NONDISCOVERY RATES IN SINGLE-STEP MULTIPLE TESTING PROCEDURES 1. BY SANAT K. SARKAR Temple University The Annals of Statistics 2006, Vol. 34, No. 1, 394 415 DOI: 10.1214/009053605000000778 Institute of Mathematical Statistics, 2006 FALSE DISCOVERY AND FALSE NONDISCOVERY RATES IN SINGLE-STEP MULTIPLE TESTING

More information

The optimal discovery procedure: a new approach to simultaneous significance testing

The optimal discovery procedure: a new approach to simultaneous significance testing J. R. Statist. Soc. B (2007) 69, Part 3, pp. 347 368 The optimal discovery procedure: a new approach to simultaneous significance testing John D. Storey University of Washington, Seattle, USA [Received

More information

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004

Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Exceedance Control of the False Discovery Proportion Christopher Genovese 1 and Larry Wasserman 2 Carnegie Mellon University July 10, 2004 Multiple testing methods to control the False Discovery Rate (FDR),

More information

Chapter 23. Inference About Means

Chapter 23. Inference About Means Chapter 23 Inference About Means 1 /57 Homework p554 2, 4, 9, 10, 13, 15, 17, 33, 34 2 /57 Objective Students test null and alternate hypotheses about a population mean. 3 /57 Here We Go Again Now that

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity

More information

One-sample categorical data: approximate inference

One-sample categorical data: approximate inference One-sample categorical data: approximate inference Patrick Breheny October 6 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/25 Introduction It is relatively easy to think about the distribution

More information

IEOR165 Discussion Week 12

IEOR165 Discussion Week 12 IEOR165 Discussion Week 12 Sheng Liu University of California, Berkeley Apr 15, 2016 Outline 1 Type I errors & Type II errors 2 Multiple Testing 3 ANOVA IEOR165 Discussion Sheng Liu 2 Type I errors & Type

More information

False Discovery Control in Spatial Multiple Testing

False Discovery Control in Spatial Multiple Testing False Discovery Control in Spatial Multiple Testing WSun 1,BReich 2,TCai 3, M Guindani 4, and A. Schwartzman 2 WNAR, June, 2012 1 University of Southern California 2 North Carolina State University 3 University

More information

Technical Report 1004 Dept. of Biostatistics. Some Exact and Approximations for the Distribution of the Realized False Discovery Rate

Technical Report 1004 Dept. of Biostatistics. Some Exact and Approximations for the Distribution of the Realized False Discovery Rate Technical Report 14 Dept. of Biostatistics Some Exact and Approximations for the Distribution of the Realized False Discovery Rate David Gold ab, Jeffrey C. Miecznikowski ab1 a Department of Biostatistics,

More information

STA 437: Applied Multivariate Statistics

STA 437: Applied Multivariate Statistics Al Nosedal. University of Toronto. Winter 2015 1 Chapter 5. Tests on One or Two Mean Vectors If you can t explain it simply, you don t understand it well enough Albert Einstein. Definition Chapter 5. Tests

More information

Quantitative Understanding in Biology Short Course Session 9 Principal Components Analysis

Quantitative Understanding in Biology Short Course Session 9 Principal Components Analysis Quantitative Understanding in Biology Short Course Session 9 Principal Components Analysis Jinhyun Ju Jason Banfelder Luce Skrabanek June 21st, 218 1 Preface For the last session in this course, we ll

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

16.3 One-Way ANOVA: The Procedure

16.3 One-Way ANOVA: The Procedure 16.3 One-Way ANOVA: The Procedure Tom Lewis Fall Term 2009 Tom Lewis () 16.3 One-Way ANOVA: The Procedure Fall Term 2009 1 / 10 Outline 1 The background 2 Computing formulas 3 The ANOVA Identity 4 Tom

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

Lecture 9: Classification, LDA

Lecture 9: Classification, LDA Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information