Technical Report 1004 Dept. of Biostatistics. Some Exact and Approximations for the Distribution of the Realized False Discovery Rate

Size: px
Start display at page:

Download "Technical Report 1004 Dept. of Biostatistics. Some Exact and Approximations for the Distribution of the Realized False Discovery Rate"

Transcription

1 Technical Report 14 Dept. of Biostatistics Some Exact and Approximations for the Distribution of the Realized False Discovery Rate David Gold ab, Jeffrey C. Miecznikowski ab1 a Department of Biostatistics, University at Buffalo, Buffalo, NY , USA b Department of Biostatistics, Roswell Park Cancer Institute, New York Short title: Realized False Discovery Rate Proofs to be sent to: Jeffrey C. Miecznikowski Department of Biostatistics School of Public Health and Health Professions 249 Farber Hall University at Buffalo 3435 Main Street Buffalo NY , USA 1 Corresponding Author. Department of Biostatistics, School of Public Health and Health Professions, 249 Farber Hall, University at Buffalo, 3435 Main Street, Buffalo NY , USA. Tel:+1(716) jcm38@buffalo.edu 1

2 Some Exact and Approximations for the Distribution of the Realized False Discovery Rate by David Gold and Jeff Miecznikowski Here, we derive the distribution of the realized false discovery rate (rfdr), for Benjamini and Hochberg s (1995) procedure, given a general distribution of test statistics. The following notation is refered to 1. a the desired FDR 2. test statistics, X iid π F (X) + π 1 F 1 (X), X = (X 1,..., X m ) mixture of the CDF s F, the null distribution, and F 1 the alternative, with mixture weights π + π 1 = 1. We are mainly interested in the case of a t-test, where F is a t distribution with v degrees of freedom and F 1 is a (possibly) mixture of non-central t s, with v degrees of freedom and noncentrality parameter η. 3. two-sided p-values p i = 2(1 F ( X i )), i = 1,..., m with distribution P (p c) = π P (p c) + π 1 P 1 (p c) 4. ordered p-values p (1),..., p (m) 5. c j = aj/m for j = 1,..., m 1 Density of the Ordered p-value chosen Define the sets: A j = {p (j) aj/m, p (j+1) > a(j + 1)/m, p (j+2) > a(j + 2)/m,..., p (m) > a} B j = {p (j) > aj/m, p (j+1) > a(j + 1)/m,..., p (m) > a} C j = {p (j+1) > a(j + 1)/m, p (j+2) > a(j + 2)/m,..., p (m) > a} Then P (A j ) = P (C j ) P (B j ). Note that for sufficiently large m, large π 1, and entropy between P and P 1, the approximation P (A j ) P ({p (j) aj/m, p (j+1) > a(j + 1)/m}) is efficient. 2

3 2 Distribution of Order Statistics The joint distribution of two order statistics is derived in Casella and Berger. P (p (i1 ) u 1, p (i2 ) u 2 ) define U 1 = I(p i u 1 ) U 2 = I(u 1 < p i u 2 ) P (p (i1 ) u 1, p (i2 ) u2) = P (i 1 U 1 < i 2, i 2 U 1 + U 2 n) + P (U 1 i 2 ) = i 2 1 s1=i 1 m s 1 s2=i 2 s 1 P (U 1 = s 1, U 2 = s 2 ) + P (U 1 i 2 ) for the general case, where p 1,..., p m are not necessarily independent or identically distributed, P (U 1 = s 1, U 2 = s 2 ) = u1 u1 u2 u 1 q Q u2 u 1 u 2 u 2 P pq1,...,p qm (p q1,..., p qm )dp q1 dp qm for the set Q = {q : (q 1,..., q m ) are permutations of (1,..., m)}, and reducing in the iid case to, P (U 1 = s 1, U 2 = s 2 ) = m! s 1!s 2!(m s 1 s 2 )! [P (p u 1)] s 1 [P (p u 2 ) P (p u 1 )] s 2 [1 P (p u 2 )] m s 1 s 2 The joint CDF of k order statistics is derived in Glueck et. al (28) for the non-identically distributed case, in particular with two sub-populations. In order to calculate the probability of B j, or that k = m j 1 of the largest order statistics are greater than constants c 1, c 2,..., c k, following the logic in Glueck et al. and some of their notation, for p 1,..., p m iid F, the joint CDF 3

4 of the order statistics, (p (j1 ),..., p (jk )), P ( k s=1{p (js) c s }) = P ( k s=1{ at least j s of p i s c s }) = P ( k s=1{i s j s }) = P ( k s=1{i s = i s }) i I = i I P ( k+1 s=1{i s i s 1 of p i s (c s 1, c s ]}) = k+1 m! i I s=1 [P (p c s ) P (p c s 1 )] (is i s 1) (i s i s 1 )! given I s = m i=1 I(p i c s ), so that I 1 I k, leading to the index set I = {i : = i i 1 i k i k+1 = m, i s j s all s [1, k]} We, however, are interested in the joint probability P ( k s=1{p (js) > c s }) = P ( k s=1{at most j s 1 of p i s c j }) = P ( k s=1{i s < j s }) = i I P ( k s=1{i s = i s }) = i I P ( k+1 s=1{i s i s 1 of p i s (c s 1, c s ]}) = k+1 m! i I s=1 [P (p c s ) P (p c s 1 )] (is i s 1) (i s i s 1 )! I = {i : = i i 1 i k i k+1 = m, i s < j s all s [1, k]} note that for 2-sided p-values, the P (p c s ) = P (X X cs ) + P (X X cs ), where and P (X X cs ) = F ( X cs ), etc., and X cs = F 1 (.5c s ). The results in Glueck can be extended for a multivariate distribution P ( k+1 s=1{i s i s 1 of p i s (c s 1, c s ]}) = c1 c1 c2 c2 ck+1 ck+1 c c 1 c k P pq1,...,p qm (p q1,..., p qm )dp q1 dp qm q Q c c 1 c k 4

5 There are i 1 integrals from (, c 1 ), (i 2 i 1 ) from (c 1, c 2 ),..., and m i k integrals from (c k, ). In order to compute the respective n-dim integral over 2-sided p-values, for each permutation, there are 2 m possible ways to integrate over the distribution of test statistics, over positive and negative domains, respectively. Suppose for example that X f and X f 1 are independent, and that within each sub-population, the covariance matrix is block diagonal, with B and B 1 blocks, repsectively. This leads to considerable reductions in computation, i.e. the product of B - and B 1 -dim integrals, rather than one m-dim integral, assuming the order of integration is inter-changable. Permuations within-block are not necessary to compute, where variables are exchangable. 3 Density of the p-value threshold P (τ) = P () + m P (τ j)p (A j ) j=1 where P () is the probability that no tests are rejected, and P (τ j) = P (p (j) A j ) = 1 1 a(j+1)/m a P (p (j), p (j+1),..., p (m) A j )dp (j+1) dp (m) or, P (τ j) = τ P (p (j) τ A j ) = τ P (p (j) τ, p (j+1) > a(j + 1)/m,..., p (m) > a)/p (A j ) if τ aj/m, and otherwise. For the bivariate approximation, let à j = {p (j) aj/m, p (j+1) > a(j + 1)/m}. 5

6 P (τ j) τ P (p (j) τ Ãj) = τ P (p (j) τ, p (j+1) > a(j + 1)/m)/P (Ãj) = τ P (p (j) τ)/p (Ãj) τ P (p (j) τ, p (j+1) a(j + 1)/m)/P (Ãj) if τ < aj/m and otherwise. Further, for the iid case, ( ) m τ P (p (j) τ) = j P (p = τ)[p (p τ)] j 1 [1 P (p τ)] m j j and j+1 1 τ P (p (j) τ, p (j+1) a(j + 1)/m) = s1=j m s 1 s2=j+1 s 1 where, the partial of P (U 1 j + 1) w.r.t τ is found as above, as ( ) m (j + 1) P (p = τ)[p (p τ)] (j+1) 1 [1 P (p τ)] m (j+1) j + 1 τ P (U 1 = s 1, U 2 = s 2 ) + τ P (U 1 j + 1) and τ = P (U 1 = s 1, U 2 = s 2 ) = m! τ s 1!s 2!(m s 1 s 2 )! [P (p τ)]s 1 [P (p a(j + 1)/m) P (p τ)] s 2 [1 P (p a(j + 1)/m)] m s 1 s 2 m! s 1!s 2!(m s 1 s 2 )! [1 P (p a(j + 1)/m)]m s 1 s 2 [s 1 P (p τ) s1 1 P (p = τ)[p (p a(j + 1)/m) P (p τ)] s 2 P (p τ) s 1 (s 2 )[P (p a(j + 1)/m) P (p τ)] s 2 1 P (p = τ)] Again, note P (p c) is found above. 6

7 4 CDF of rfdr w if w + w 1 > rf DR = w + w 1 if w + w 1 = where w is the count of false, and w 1 true rejections, respectively. The CDF is defined as P (rf DR c j) = w,w 1 : rf DR c P (w, w 1 m, m 1, F, j)p (m, m 1 m, F ) Stating the obvious, m Binom(m, π ) m 1 = m m We can find the conditional joint distribution P (w, w 1 m, m 1, F, j) = P (w, w 1, j m, m 1, F )/P (j m, m 1, F ) as described at the end of Section 3, letting f be partitioned into two subpopulations of size m and m 1, requiring that w and w 1 of the integration limits be (, c 1 ) respectively by sub-population. Also, consider the results in Glueck for two populations. Then we have P (rf DR c) = j P (rf DR c j)p (A j ) mote that P (rf DR c τ) can be approximated well, for large m, treating w, w 1 as independent, e.g. P (w τ) w r = Binom(r, m, π P (p τ)). P (rf DR c) = 1 P (rf DR c τ)p (τ)dτ 7

8 4.1 Independent Case Under the integral, there is a double sum of independent terms, with each depending on τ. Integrating each term and aggregating yields the result. The sum is composed of the terms P (rf DR c τ)p (τ j) = w ( ) m (π τ) r (1 π τ) m r w,w 1 : rf DR c r =1 w 1 r 1 =1 ( m r 1 r ) (π 1 τ) r 1 (1 π 1 F 1 (τ)) m r1 [ τ P (p (j) τ)/p (Ãj) ] τ P (p (j) τ, p (j+1) a(j + 1)/m)/P (Ãj) Then for any r, r 1, in the above, [ [ = E 1 E 2 E 3 + j+1 1 s1=j m s 1 s2=j+1 s 1 E 4 (s 1, s 2 )(E 5 (s 1, s 2 ) E 6 (s 1, s 2 )) where ( ) ( ) m E 1 = (π τ) r (1 π τ) m r m (π 1 τ) r 1 (1 π 1 P 1 (p τ)) m r 1 r r ( ) 1 m E 2 = j P (p = τ)[p (p τ)] j 1 [1 P (p τ)] n j j ( ) m E 3 = (j + 1) P (p = τ)[p (p τ)] (j+1) 1 [1 P (p τ)] n (j+1) j + 1 E 4 = m! s 1!s 2!(m s 1 s 2 )! [1 P (p a(j + 1)/m)]m s 1 s 2 E 5 = s 1 P (p τ) s1 1 P (p = τ)[p (p a(j + 1)/m) P (p τ)] s 2 s 1 E 6 = P (p τ) s 1 (s 2 s 1 )[P (p a(j + 1)/m) P (p τ)] s 2 s 1 1 P (p τ) E 7 = P (Ãj) where f, F are the pdf and cdf of the p-values, respectively. the integrals that need to be performed are proportional to cj τ r +r 1 (1 π τ) m r (1 π 1 P 1 (p τ)) m r 1 P (p = τ)[p (p τ)] j 1 [1 P (p τ)] m j dτ ]] /E 6 8

9 cj τ r +r 1 (1 π τ) m r (1 π 1 P 1 (p τ)) m r 1 P (p = τ) s 1 1 P (p = τ)[p (p a(j + 1)/m) P (p = τ)] s 2 s 1 dτ cj τ r +r 1 (1 π τ) m r (1 π 1 P 1 (p τ)) m r 1 P (p τ) s 1 (s 2 s 1 ) [P (p a(j + 1)/m) P (p τ)] s 2 s 1 1 P (p = τ)dτ 4.2 Correlation Case Correlation introduces complexities, that are beyond our capacity, with current state of the art computing. It is unpractical and unrealistic to expect that we will generate results for the general correlated case. However, for restricted and special cases, results can be achieved quickly. One such case is the block diagonal correlation matrix, of blocks of size B, identically distributted in a block, and further assuming that variables following either component distribution f or f 1 are independent. For the approximation, relying on the set Ã, rather than the full set A, we need perform calculations for the joint density of two order statistics. There are four combinations of two cases that we must consider, for two variables that are indepedent or dependent, and belonging to components f or f 1, respectively, and weight results accordingly. For the independent variables, we take previous results. For dependent variables, we compute, for each component weighting accordingly, P (U 1 = s 1, U 2 = s 2 ) = B! u1 u1 u2 u 1 u2 u 1 u 2 u 2 P p1,...,p B (p 1,..., p B )dp q1 dp qm where B! is the number of ways to permute B variables. If the block sizes vary, then we may compute over each size, and weight accordingly. If allow variables from each component in a block, then we must weight accordingly, with the correct number of permutations, which must be mixed over the correct binomial distribution, given the population rates. All of these considerations are for the sake of computational speed. 9

10 5 SIMULATIONS To Be Determined 6 REFERENCES 1. Benjamini and Hochberg (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B. 57: Glueck, Karimpour-Fard, Mandel,Hunter, Muller (28) Fast Computation by Block Permanents of Cumulative Distribution Functions of Order Statistics from Several Populations. Commun Stat Theory Methods. 37(18): doi: 1.18/ Casella and Berger (21) Statistical Inference. Duxbury Press. 1

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo

PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo PROCEDURES CONTROLLING THE k-fdr USING BIVARIATE DISTRIBUTIONS OF THE NULL p-values Sanat K. Sarkar and Wenge Guo Temple University and National Institute of Environmental Health Sciences Abstract: Procedures

More information

Stat 206: Estimation and testing for a mean vector,

Stat 206: Estimation and testing for a mean vector, Stat 206: Estimation and testing for a mean vector, Part II James Johndrow 2016-12-03 Comparing components of the mean vector In the last part, we talked about testing the hypothesis H 0 : µ 1 = µ 2 where

More information

STAT 461/561- Assignments, Year 2015

STAT 461/561- Assignments, Year 2015 STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and

More information

False Discovery Rate

False Discovery Rate False Discovery Rate Peng Zhao Department of Statistics Florida State University December 3, 2018 Peng Zhao False Discovery Rate 1/30 Outline 1 Multiple Comparison and FWER 2 False Discovery Rate 3 FDR

More information

Probability and Stochastic Processes

Probability and Stochastic Processes Probability and Stochastic Processes A Friendly Introduction Electrical and Computer Engineers Third Edition Roy D. Yates Rutgers, The State University of New Jersey David J. Goodman New York University

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Correlation, z-values, and the Accuracy of Large-Scale Estimators. Bradley Efron Stanford University

Correlation, z-values, and the Accuracy of Large-Scale Estimators. Bradley Efron Stanford University Correlation, z-values, and the Accuracy of Large-Scale Estimators Bradley Efron Stanford University Correlation and Accuracy Modern Scientific Studies N cases (genes, SNPs, pixels,... ) each with its own

More information

Optional Stopping Theorem Let X be a martingale and T be a stopping time such

Optional Stopping Theorem Let X be a martingale and T be a stopping time such Plan Counting, Renewal, and Point Processes 0. Finish FDR Example 1. The Basic Renewal Process 2. The Poisson Process Revisited 3. Variants and Extensions 4. Point Processes Reading: G&S: 7.1 7.3, 7.10

More information

STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE. National Institute of Environmental Health Sciences and Temple University

STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE. National Institute of Environmental Health Sciences and Temple University STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE Wenge Guo 1 and Sanat K. Sarkar 2 National Institute of Environmental Health Sciences and Temple University Abstract: Often in practice

More information

Modified Simes Critical Values Under Positive Dependence

Modified Simes Critical Values Under Positive Dependence Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia

More information

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 28 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 3, 2015 1 2 3 4 5 1 Familywise error rates 2 procedure 3 Performance of with multiple

More information

arxiv: v1 [math.st] 31 Mar 2009

arxiv: v1 [math.st] 31 Mar 2009 The Annals of Statistics 2009, Vol. 37, No. 2, 619 629 DOI: 10.1214/07-AOS586 c Institute of Mathematical Statistics, 2009 arxiv:0903.5373v1 [math.st] 31 Mar 2009 AN ADAPTIVE STEP-DOWN PROCEDURE WITH PROVEN

More information

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data

False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using

More information

Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models

Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models Statistical regularity Properties of relative frequency

More information

Research Article Sample Size Calculation for Controlling False Discovery Proportion

Research Article Sample Size Calculation for Controlling False Discovery Proportion Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

Probabilistic Inference for Multiple Testing

Probabilistic Inference for Multiple Testing This is the title page! This is the title page! Probabilistic Inference for Multiple Testing Chuanhai Liu and Jun Xie Department of Statistics, Purdue University, West Lafayette, IN 47907. E-mail: chuanhai,

More information

Chapter 1. Stepdown Procedures Controlling A Generalized False Discovery Rate

Chapter 1. Stepdown Procedures Controlling A Generalized False Discovery Rate Chapter Stepdown Procedures Controlling A Generalized False Discovery Rate Wenge Guo and Sanat K. Sarkar Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park,

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information

Looking at the Other Side of Bonferroni

Looking at the Other Side of Bonferroni Department of Biostatistics University of Washington 24 May 2012 Multiple Testing: Control the Type I Error Rate When analyzing genetic data, one will commonly perform over 1 million (and growing) hypothesis

More information

Two-stage stepup procedures controlling FDR

Two-stage stepup procedures controlling FDR Journal of Statistical Planning and Inference 38 (2008) 072 084 www.elsevier.com/locate/jspi Two-stage stepup procedures controlling FDR Sanat K. Sarar Department of Statistics, Temple University, Philadelphia,

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

THE UNIVERSITY OF HONG KONG DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE

THE UNIVERSITY OF HONG KONG DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE THE UNIVERSITY OF HONG KONG DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE STAT1301 PROBABILITY AND STATISTICS I EXAMPLE CLASS 2 Review Definition of conditional probability For any two events A and B,

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Advanced Statistical Methods: Beyond Linear Regression

Advanced Statistical Methods: Beyond Linear Regression Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi

More information

Alpha-Investing. Sequential Control of Expected False Discoveries

Alpha-Investing. Sequential Control of Expected False Discoveries Alpha-Investing Sequential Control of Expected False Discoveries Dean Foster Bob Stine Department of Statistics Wharton School of the University of Pennsylvania www-stat.wharton.upenn.edu/ stine Joint

More information

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW

A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW Miguel A Gómez-Villegas and Beatriz González-Pérez Departamento de Estadística

More information

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018

High-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018 High-Throughput Sequencing Course Multiple Testing Biostatistics and Bioinformatics Summer 2018 Introduction You have previously considered the significance of a single gene Introduction You have previously

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

3 Comparison with Other Dummy Variable Methods

3 Comparison with Other Dummy Variable Methods Stats 300C: Theory of Statistics Spring 2018 Lecture 11 April 25, 2018 Prof. Emmanuel Candès Scribe: Emmanuel Candès, Michael Celentano, Zijun Gao, Shuangning Li 1 Outline Agenda: Knockoffs 1. Introduction

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida First Year Examination Department of Statistics, University of Florida August 19, 010, 8:00 am - 1:00 noon Instructions: 1. You have four hours to answer questions in this examination.. You must show your

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Asymptotic Statistics-III. Changliang Zou

Asymptotic Statistics-III. Changliang Zou Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (

More information

Exam: high-dimensional data analysis January 20, 2014

Exam: high-dimensional data analysis January 20, 2014 Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish

More information

Resampling-Based Control of the FDR

Resampling-Based Control of the FDR Resampling-Based Control of the FDR Joseph P. Romano 1 Azeem S. Shaikh 2 and Michael Wolf 3 1 Departments of Economics and Statistics Stanford University 2 Department of Economics University of Chicago

More information

Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses

Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses Amit Zeisel, Or Zuk, Eytan Domany W.I.S. June 5, 29 Amit Zeisel, Or Zuk, Eytan Domany (W.I.S.)Improving

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1 2004 Article 14 Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate Mark J. van der Laan

More information

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses

On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses Gavin Lynch Catchpoint Systems, Inc., 228 Park Ave S 28080 New York, NY 10003, U.S.A. Wenge Guo Department of Mathematical

More information

Lecture 27. December 13, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 27. December 13, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

CS Lecture 19. Exponential Families & Expectation Propagation

CS Lecture 19. Exponential Families & Expectation Propagation CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces

More information

Effects of dependence in high-dimensional multiple testing problems. Kyung In Kim and Mark van de Wiel

Effects of dependence in high-dimensional multiple testing problems. Kyung In Kim and Mark van de Wiel Effects of dependence in high-dimensional multiple testing problems Kyung In Kim and Mark van de Wiel Department of Mathematics, Vrije Universiteit Amsterdam. Contents 1. High-dimensional multiple testing

More information

Applying the Benjamini Hochberg procedure to a set of generalized p-values

Applying the Benjamini Hochberg procedure to a set of generalized p-values U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure

More information

ST5215: Advanced Statistical Theory

ST5215: Advanced Statistical Theory Department of Statistics & Applied Probability Monday, September 26, 2011 Lecture 10: Exponential families and Sufficient statistics Exponential Families Exponential families are important parametric families

More information

Some General Types of Tests

Some General Types of Tests Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use test of some general class of tests that often have good asymptotic properties.

More information

Model Identification for Wireless Propagation with Control of the False Discovery Rate

Model Identification for Wireless Propagation with Control of the False Discovery Rate Model Identification for Wireless Propagation with Control of the False Discovery Rate Christoph F. Mecklenbräuker (TU Wien) Joint work with Pei-Jung Chung (Univ. Edinburgh) Dirk Maiwald (Atlas Elektronik)

More information

Supplementary Material for On the evaluation of climate model simulated precipitation extremes

Supplementary Material for On the evaluation of climate model simulated precipitation extremes Supplementary Material for On the evaluation of climate model simulated precipitation extremes Andrea Toreti 1 and Philippe Naveau 2 1 European Commission, Joint Research Centre, Ispra (VA), Italy 2 Laboratoire

More information

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs

Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs with Remarks on Family Selection Dissertation Defense April 5, 204 Contents Dissertation Defense Introduction 2 FWER Control within

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

The miss rate for the analysis of gene expression data

The miss rate for the analysis of gene expression data Biostatistics (2005), 6, 1,pp. 111 117 doi: 10.1093/biostatistics/kxh021 The miss rate for the analysis of gene expression data JONATHAN TAYLOR Department of Statistics, Stanford University, Stanford,

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

Peak Detection for Images

Peak Detection for Images Peak Detection for Images Armin Schwartzman Division of Biostatistics, UC San Diego June 016 Overview How can we improve detection power? Use a less conservative error criterion Take advantage of prior

More information

Journal Club: Higher Criticism

Journal Club: Higher Criticism Journal Club: Higher Criticism David Donoho (2002): Higher Criticism for Heterogeneous Mixtures, Technical Report No. 2002-12, Dept. of Statistics, Stanford University. Introduction John Tukey (1976):

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Procedures controlling generalized false discovery rate

Procedures controlling generalized false discovery rate rocedures controlling generalized false discovery rate By SANAT K. SARKAR Department of Statistics, Temple University, hiladelphia, A 922, U.S.A. sanat@temple.edu AND WENGE GUO Department of Environmental

More information

FDR and ROC: Similarities, Assumptions, and Decisions

FDR and ROC: Similarities, Assumptions, and Decisions EDITORIALS 8 FDR and ROC: Similarities, Assumptions, and Decisions. Why FDR and ROC? It is a privilege to have been asked to introduce this collection of papers appearing in Statistica Sinica. The papers

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Review for the previous lecture Theorems and Examples: How to obtain the pmf (pdf) of U = g ( X Y 1 ) and V = g ( X Y) Chapter 4 Multiple Random Variables Chapter 43 Bivariate Transformations Continuous

More information

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University

Multiple Testing. Hoang Tran. Department of Statistics, Florida State University Multiple Testing Hoang Tran Department of Statistics, Florida State University Large-Scale Testing Examples: Microarray data: testing differences in gene expression between two traits/conditions Microbiome

More information

Estimation of a Two-component Mixture Model

Estimation of a Two-component Mixture Model Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint

More information

Multiple testing: Intro & FWER 1

Multiple testing: Intro & FWER 1 Multiple testing: Intro & FWER 1 Mark van de Wiel mark.vdwiel@vumc.nl Dep of Epidemiology & Biostatistics,VUmc, Amsterdam Dep of Mathematics, VU 1 Some slides courtesy of Jelle Goeman 1 Practical notes

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

Step-down FDR Procedures for Large Numbers of Hypotheses

Step-down FDR Procedures for Large Numbers of Hypotheses Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate

More information

Bivariate Paired Numerical Data

Bivariate Paired Numerical Data Bivariate Paired Numerical Data Pearson s correlation, Spearman s ρ and Kendall s τ, tests of independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

Learning Objectives for Stat 225

Learning Objectives for Stat 225 Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity

More information

Heterogeneity and False Discovery Rate Control

Heterogeneity and False Discovery Rate Control Heterogeneity and False Discovery Rate Control Joshua D Habiger Oklahoma State University jhabige@okstateedu URL: jdhabigerokstateedu August, 2014 Motivating Data: Anderson and Habiger (2012) M = 778 bacteria

More information

A Likelihood Ratio Test

A Likelihood Ratio Test A Likelihood Ratio Test David Allen University of Kentucky February 23, 2012 1 Introduction Earlier presentations gave a procedure for finding an estimate and its standard error of a single linear combination

More information

arxiv: v2 [stat.me] 31 Aug 2017

arxiv: v2 [stat.me] 31 Aug 2017 Multiple testing with discrete data: proportion of true null hypotheses and two adaptive FDR procedures Xiongzhi Chen, Rebecca W. Doerge and Joseph F. Heyse arxiv:4274v2 [stat.me] 3 Aug 207 Abstract We

More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza

More information

Lecture 7 April 16, 2018

Lecture 7 April 16, 2018 Stats 300C: Theory of Statistics Spring 2018 Lecture 7 April 16, 2018 Prof. Emmanuel Candes Scribe: Feng Ruan; Edited by: Rina Friedberg, Junjie Zhu 1 Outline Agenda: 1. False Discovery Rate (FDR) 2. Properties

More information

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1

More information

Review Quiz. 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the

Review Quiz. 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the Review Quiz 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the Cramér Rao lower bound (CRLB). That is, if where { } and are scalars, then

More information

Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information

Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information Budi Pratikno 1 and Shahjahan Khan 2 1 Department of Mathematics and Natural Science Jenderal Soedirman

More information

STAT 501 Assignment 1 Name Spring Written Assignment: Due Monday, January 22, in class. Please write your answers on this assignment

STAT 501 Assignment 1 Name Spring Written Assignment: Due Monday, January 22, in class. Please write your answers on this assignment STAT 5 Assignment Name Spring Reading Assignment: Johnson and Wichern, Chapter, Sections.5 and.6, Chapter, and Chapter. Review matrix operations in Chapter and Supplement A. Examine the matrix properties

More information

0 0'0 2S ~~ Employment category

0 0'0 2S ~~ Employment category Analyze Phase 331 60000 50000 40000 30000 20000 10000 O~----,------.------,------,,------,------.------,----- N = 227 136 27 41 32 5 ' V~ 00 0' 00 00 i-.~ fl' ~G ~~ ~O~ ()0 -S 0 -S ~~ 0 ~~ 0 ~G d> ~0~

More information

Hybrid Dirichlet processes for functional data

Hybrid Dirichlet processes for functional data Hybrid Dirichlet processes for functional data Sonia Petrone Università Bocconi, Milano Joint work with Michele Guindani - U.T. MD Anderson Cancer Center, Houston and Alan Gelfand - Duke University, USA

More information

Hunting for significance with multiple testing

Hunting for significance with multiple testing Hunting for significance with multiple testing Etienne Roquain 1 1 Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France Séminaire MODAL X, 19 mai 216 Etienne Roquain Hunting for significance

More information

Regular Sparse Crossbar Concentrators

Regular Sparse Crossbar Concentrators Regular Sparse Crossbar Concentrators Weiming Guo and A. Yavuz Oruç Electrical Engineering Department College Park, MD 20742 ABSTRACT A bipartite concentrator is a single stage sparse crossbar switching

More information

False Discovery Control in Spatial Multiple Testing

False Discovery Control in Spatial Multiple Testing False Discovery Control in Spatial Multiple Testing WSun 1,BReich 2,TCai 3, M Guindani 4, and A. Schwartzman 2 WNAR, June, 2012 1 University of Southern California 2 North Carolina State University 3 University

More information

Contents 1. Contents

Contents 1. Contents Contents 1 Contents 6 Distributions of Functions of Random Variables 2 6.1 Transformation of Discrete r.v.s............. 3 6.2 Method of Distribution Functions............. 6 6.3 Method of Transformations................

More information

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method

Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman

More information

Ratio of Linear Function of Parameters and Testing Hypothesis of the Combination Two Split Plot Designs

Ratio of Linear Function of Parameters and Testing Hypothesis of the Combination Two Split Plot Designs Middle-East Journal of Scientific Research 13 (Mathematical Applications in Engineering): 109-115 2013 ISSN 1990-9233 IDOSI Publications 2013 DOI: 10.5829/idosi.mejsr.2013.13.mae.10002 Ratio of Linear

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.

Course information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Course information: Instructor: Tim Hanson, Leconte 219C, phone 777-3859. Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Text: Applied Linear Statistical Models (5th Edition),

More information

Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018

Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018 Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals BayesMP Zhiguang Huo 1, Chi Song 2, George Tseng

More information

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data

Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations

More information

Bumpbars: Inference for region detection. Yuval Benjamini, Hebrew University

Bumpbars: Inference for region detection. Yuval Benjamini, Hebrew University Bumpbars: Inference for region detection Yuval Benjamini, Hebrew University yuvalbenj@gmail.com WHOA-PSI-2017 Collaborators Jonathan Taylor Stanford Rafael Irizarry Dana Farber, Harvard Amit Meir U of

More information

Standard Testing Procedures for White Noise and Heteroskedasticity

Standard Testing Procedures for White Noise and Heteroskedasticity Standard Testing Procedures for White Noise and Heteroskedasticity Violetta Dalla 1, Liudas Giraitis 2 and Peter C.B. Phillips 3 1 University of Athens, 2 Queen Mary, UL, 3 Yale University A Celebration

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. doi: 10.18637/jss.v000.i00 GroupTest: Multiple Testing Procedure for Grouped Hypotheses Zhigen Zhao Abstract In the modern Big Data

More information

Familywise Error Rate Controlling Procedures for Discrete Data

Familywise Error Rate Controlling Procedures for Discrete Data Familywise Error Rate Controlling Procedures for Discrete Data arxiv:1711.08147v1 [stat.me] 22 Nov 2017 Yalin Zhu Center for Mathematical Sciences, Merck & Co., Inc., West Point, PA, U.S.A. Wenge Guo Department

More information

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed.

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed. STAT 302 Introduction to Probability Learning Outcomes Textbook: A First Course in Probability by Sheldon Ross, 8 th ed. Chapter 1: Combinatorial Analysis Demonstrate the ability to solve combinatorial

More information

STAT 501 Assignment 1 Name Spring 2005

STAT 501 Assignment 1 Name Spring 2005 STAT 50 Assignment Name Spring 005 Reading Assignment: Johnson and Wichern, Chapter, Sections.5 and.6, Chapter, and Chapter. Review matrix operations in Chapter and Supplement A. Written Assignment: Due

More information

Introduction to Bayesian Methods. Introduction to Bayesian Methods p.1/??

Introduction to Bayesian Methods. Introduction to Bayesian Methods p.1/?? to Bayesian Methods Introduction to Bayesian Methods p.1/?? We develop the Bayesian paradigm for parametric inference. To this end, suppose we conduct (or wish to design) a study, in which the parameter

More information

Sequential Analysis & Testing Multiple Hypotheses,

Sequential Analysis & Testing Multiple Hypotheses, Sequential Analysis & Testing Multiple Hypotheses, CS57300 - Data Mining Spring 2016 Instructor: Bruno Ribeiro 2016 Bruno Ribeiro This Class: Sequential Analysis Testing Multiple Hypotheses Nonparametric

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Semi-Penalized Inference with Direct FDR Control

Semi-Penalized Inference with Direct FDR Control Jian Huang University of Iowa April 4, 2016 The problem Consider the linear regression model y = p x jβ j + ε, (1) j=1 where y IR n, x j IR n, ε IR n, and β j is the jth regression coefficient, Here p

More information

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS

REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS Ying Liu Department of Biostatistics, Columbia University Summer Intern at Research and CMC Biostats, Sanofi, Boston August 26, 2015 OUTLINE 1 Introduction

More information

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007

Department of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007 Department of Statistics University of Central Florida Technical Report TR-2007-01 25APR2007 Revised 25NOV2007 Controlling the Number of False Positives Using the Benjamini- Hochberg FDR Procedure Paul

More information

Asymptotic Statistics-VI. Changliang Zou

Asymptotic Statistics-VI. Changliang Zou Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous

More information