Technical Report 1004 Dept. of Biostatistics. Some Exact and Approximations for the Distribution of the Realized False Discovery Rate
|
|
- Beverly Felicia Robinson
- 6 years ago
- Views:
Transcription
1 Technical Report 14 Dept. of Biostatistics Some Exact and Approximations for the Distribution of the Realized False Discovery Rate David Gold ab, Jeffrey C. Miecznikowski ab1 a Department of Biostatistics, University at Buffalo, Buffalo, NY , USA b Department of Biostatistics, Roswell Park Cancer Institute, New York Short title: Realized False Discovery Rate Proofs to be sent to: Jeffrey C. Miecznikowski Department of Biostatistics School of Public Health and Health Professions 249 Farber Hall University at Buffalo 3435 Main Street Buffalo NY , USA 1 Corresponding Author. Department of Biostatistics, School of Public Health and Health Professions, 249 Farber Hall, University at Buffalo, 3435 Main Street, Buffalo NY , USA. Tel:+1(716) jcm38@buffalo.edu 1
2 Some Exact and Approximations for the Distribution of the Realized False Discovery Rate by David Gold and Jeff Miecznikowski Here, we derive the distribution of the realized false discovery rate (rfdr), for Benjamini and Hochberg s (1995) procedure, given a general distribution of test statistics. The following notation is refered to 1. a the desired FDR 2. test statistics, X iid π F (X) + π 1 F 1 (X), X = (X 1,..., X m ) mixture of the CDF s F, the null distribution, and F 1 the alternative, with mixture weights π + π 1 = 1. We are mainly interested in the case of a t-test, where F is a t distribution with v degrees of freedom and F 1 is a (possibly) mixture of non-central t s, with v degrees of freedom and noncentrality parameter η. 3. two-sided p-values p i = 2(1 F ( X i )), i = 1,..., m with distribution P (p c) = π P (p c) + π 1 P 1 (p c) 4. ordered p-values p (1),..., p (m) 5. c j = aj/m for j = 1,..., m 1 Density of the Ordered p-value chosen Define the sets: A j = {p (j) aj/m, p (j+1) > a(j + 1)/m, p (j+2) > a(j + 2)/m,..., p (m) > a} B j = {p (j) > aj/m, p (j+1) > a(j + 1)/m,..., p (m) > a} C j = {p (j+1) > a(j + 1)/m, p (j+2) > a(j + 2)/m,..., p (m) > a} Then P (A j ) = P (C j ) P (B j ). Note that for sufficiently large m, large π 1, and entropy between P and P 1, the approximation P (A j ) P ({p (j) aj/m, p (j+1) > a(j + 1)/m}) is efficient. 2
3 2 Distribution of Order Statistics The joint distribution of two order statistics is derived in Casella and Berger. P (p (i1 ) u 1, p (i2 ) u 2 ) define U 1 = I(p i u 1 ) U 2 = I(u 1 < p i u 2 ) P (p (i1 ) u 1, p (i2 ) u2) = P (i 1 U 1 < i 2, i 2 U 1 + U 2 n) + P (U 1 i 2 ) = i 2 1 s1=i 1 m s 1 s2=i 2 s 1 P (U 1 = s 1, U 2 = s 2 ) + P (U 1 i 2 ) for the general case, where p 1,..., p m are not necessarily independent or identically distributed, P (U 1 = s 1, U 2 = s 2 ) = u1 u1 u2 u 1 q Q u2 u 1 u 2 u 2 P pq1,...,p qm (p q1,..., p qm )dp q1 dp qm for the set Q = {q : (q 1,..., q m ) are permutations of (1,..., m)}, and reducing in the iid case to, P (U 1 = s 1, U 2 = s 2 ) = m! s 1!s 2!(m s 1 s 2 )! [P (p u 1)] s 1 [P (p u 2 ) P (p u 1 )] s 2 [1 P (p u 2 )] m s 1 s 2 The joint CDF of k order statistics is derived in Glueck et. al (28) for the non-identically distributed case, in particular with two sub-populations. In order to calculate the probability of B j, or that k = m j 1 of the largest order statistics are greater than constants c 1, c 2,..., c k, following the logic in Glueck et al. and some of their notation, for p 1,..., p m iid F, the joint CDF 3
4 of the order statistics, (p (j1 ),..., p (jk )), P ( k s=1{p (js) c s }) = P ( k s=1{ at least j s of p i s c s }) = P ( k s=1{i s j s }) = P ( k s=1{i s = i s }) i I = i I P ( k+1 s=1{i s i s 1 of p i s (c s 1, c s ]}) = k+1 m! i I s=1 [P (p c s ) P (p c s 1 )] (is i s 1) (i s i s 1 )! given I s = m i=1 I(p i c s ), so that I 1 I k, leading to the index set I = {i : = i i 1 i k i k+1 = m, i s j s all s [1, k]} We, however, are interested in the joint probability P ( k s=1{p (js) > c s }) = P ( k s=1{at most j s 1 of p i s c j }) = P ( k s=1{i s < j s }) = i I P ( k s=1{i s = i s }) = i I P ( k+1 s=1{i s i s 1 of p i s (c s 1, c s ]}) = k+1 m! i I s=1 [P (p c s ) P (p c s 1 )] (is i s 1) (i s i s 1 )! I = {i : = i i 1 i k i k+1 = m, i s < j s all s [1, k]} note that for 2-sided p-values, the P (p c s ) = P (X X cs ) + P (X X cs ), where and P (X X cs ) = F ( X cs ), etc., and X cs = F 1 (.5c s ). The results in Glueck can be extended for a multivariate distribution P ( k+1 s=1{i s i s 1 of p i s (c s 1, c s ]}) = c1 c1 c2 c2 ck+1 ck+1 c c 1 c k P pq1,...,p qm (p q1,..., p qm )dp q1 dp qm q Q c c 1 c k 4
5 There are i 1 integrals from (, c 1 ), (i 2 i 1 ) from (c 1, c 2 ),..., and m i k integrals from (c k, ). In order to compute the respective n-dim integral over 2-sided p-values, for each permutation, there are 2 m possible ways to integrate over the distribution of test statistics, over positive and negative domains, respectively. Suppose for example that X f and X f 1 are independent, and that within each sub-population, the covariance matrix is block diagonal, with B and B 1 blocks, repsectively. This leads to considerable reductions in computation, i.e. the product of B - and B 1 -dim integrals, rather than one m-dim integral, assuming the order of integration is inter-changable. Permuations within-block are not necessary to compute, where variables are exchangable. 3 Density of the p-value threshold P (τ) = P () + m P (τ j)p (A j ) j=1 where P () is the probability that no tests are rejected, and P (τ j) = P (p (j) A j ) = 1 1 a(j+1)/m a P (p (j), p (j+1),..., p (m) A j )dp (j+1) dp (m) or, P (τ j) = τ P (p (j) τ A j ) = τ P (p (j) τ, p (j+1) > a(j + 1)/m,..., p (m) > a)/p (A j ) if τ aj/m, and otherwise. For the bivariate approximation, let à j = {p (j) aj/m, p (j+1) > a(j + 1)/m}. 5
6 P (τ j) τ P (p (j) τ Ãj) = τ P (p (j) τ, p (j+1) > a(j + 1)/m)/P (Ãj) = τ P (p (j) τ)/p (Ãj) τ P (p (j) τ, p (j+1) a(j + 1)/m)/P (Ãj) if τ < aj/m and otherwise. Further, for the iid case, ( ) m τ P (p (j) τ) = j P (p = τ)[p (p τ)] j 1 [1 P (p τ)] m j j and j+1 1 τ P (p (j) τ, p (j+1) a(j + 1)/m) = s1=j m s 1 s2=j+1 s 1 where, the partial of P (U 1 j + 1) w.r.t τ is found as above, as ( ) m (j + 1) P (p = τ)[p (p τ)] (j+1) 1 [1 P (p τ)] m (j+1) j + 1 τ P (U 1 = s 1, U 2 = s 2 ) + τ P (U 1 j + 1) and τ = P (U 1 = s 1, U 2 = s 2 ) = m! τ s 1!s 2!(m s 1 s 2 )! [P (p τ)]s 1 [P (p a(j + 1)/m) P (p τ)] s 2 [1 P (p a(j + 1)/m)] m s 1 s 2 m! s 1!s 2!(m s 1 s 2 )! [1 P (p a(j + 1)/m)]m s 1 s 2 [s 1 P (p τ) s1 1 P (p = τ)[p (p a(j + 1)/m) P (p τ)] s 2 P (p τ) s 1 (s 2 )[P (p a(j + 1)/m) P (p τ)] s 2 1 P (p = τ)] Again, note P (p c) is found above. 6
7 4 CDF of rfdr w if w + w 1 > rf DR = w + w 1 if w + w 1 = where w is the count of false, and w 1 true rejections, respectively. The CDF is defined as P (rf DR c j) = w,w 1 : rf DR c P (w, w 1 m, m 1, F, j)p (m, m 1 m, F ) Stating the obvious, m Binom(m, π ) m 1 = m m We can find the conditional joint distribution P (w, w 1 m, m 1, F, j) = P (w, w 1, j m, m 1, F )/P (j m, m 1, F ) as described at the end of Section 3, letting f be partitioned into two subpopulations of size m and m 1, requiring that w and w 1 of the integration limits be (, c 1 ) respectively by sub-population. Also, consider the results in Glueck for two populations. Then we have P (rf DR c) = j P (rf DR c j)p (A j ) mote that P (rf DR c τ) can be approximated well, for large m, treating w, w 1 as independent, e.g. P (w τ) w r = Binom(r, m, π P (p τ)). P (rf DR c) = 1 P (rf DR c τ)p (τ)dτ 7
8 4.1 Independent Case Under the integral, there is a double sum of independent terms, with each depending on τ. Integrating each term and aggregating yields the result. The sum is composed of the terms P (rf DR c τ)p (τ j) = w ( ) m (π τ) r (1 π τ) m r w,w 1 : rf DR c r =1 w 1 r 1 =1 ( m r 1 r ) (π 1 τ) r 1 (1 π 1 F 1 (τ)) m r1 [ τ P (p (j) τ)/p (Ãj) ] τ P (p (j) τ, p (j+1) a(j + 1)/m)/P (Ãj) Then for any r, r 1, in the above, [ [ = E 1 E 2 E 3 + j+1 1 s1=j m s 1 s2=j+1 s 1 E 4 (s 1, s 2 )(E 5 (s 1, s 2 ) E 6 (s 1, s 2 )) where ( ) ( ) m E 1 = (π τ) r (1 π τ) m r m (π 1 τ) r 1 (1 π 1 P 1 (p τ)) m r 1 r r ( ) 1 m E 2 = j P (p = τ)[p (p τ)] j 1 [1 P (p τ)] n j j ( ) m E 3 = (j + 1) P (p = τ)[p (p τ)] (j+1) 1 [1 P (p τ)] n (j+1) j + 1 E 4 = m! s 1!s 2!(m s 1 s 2 )! [1 P (p a(j + 1)/m)]m s 1 s 2 E 5 = s 1 P (p τ) s1 1 P (p = τ)[p (p a(j + 1)/m) P (p τ)] s 2 s 1 E 6 = P (p τ) s 1 (s 2 s 1 )[P (p a(j + 1)/m) P (p τ)] s 2 s 1 1 P (p τ) E 7 = P (Ãj) where f, F are the pdf and cdf of the p-values, respectively. the integrals that need to be performed are proportional to cj τ r +r 1 (1 π τ) m r (1 π 1 P 1 (p τ)) m r 1 P (p = τ)[p (p τ)] j 1 [1 P (p τ)] m j dτ ]] /E 6 8
9 cj τ r +r 1 (1 π τ) m r (1 π 1 P 1 (p τ)) m r 1 P (p = τ) s 1 1 P (p = τ)[p (p a(j + 1)/m) P (p = τ)] s 2 s 1 dτ cj τ r +r 1 (1 π τ) m r (1 π 1 P 1 (p τ)) m r 1 P (p τ) s 1 (s 2 s 1 ) [P (p a(j + 1)/m) P (p τ)] s 2 s 1 1 P (p = τ)dτ 4.2 Correlation Case Correlation introduces complexities, that are beyond our capacity, with current state of the art computing. It is unpractical and unrealistic to expect that we will generate results for the general correlated case. However, for restricted and special cases, results can be achieved quickly. One such case is the block diagonal correlation matrix, of blocks of size B, identically distributted in a block, and further assuming that variables following either component distribution f or f 1 are independent. For the approximation, relying on the set Ã, rather than the full set A, we need perform calculations for the joint density of two order statistics. There are four combinations of two cases that we must consider, for two variables that are indepedent or dependent, and belonging to components f or f 1, respectively, and weight results accordingly. For the independent variables, we take previous results. For dependent variables, we compute, for each component weighting accordingly, P (U 1 = s 1, U 2 = s 2 ) = B! u1 u1 u2 u 1 u2 u 1 u 2 u 2 P p1,...,p B (p 1,..., p B )dp q1 dp qm where B! is the number of ways to permute B variables. If the block sizes vary, then we may compute over each size, and weight accordingly. If allow variables from each component in a block, then we must weight accordingly, with the correct number of permutations, which must be mixed over the correct binomial distribution, given the population rates. All of these considerations are for the sake of computational speed. 9
10 5 SIMULATIONS To Be Determined 6 REFERENCES 1. Benjamini and Hochberg (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B. 57: Glueck, Karimpour-Fard, Mandel,Hunter, Muller (28) Fast Computation by Block Permanents of Cumulative Distribution Functions of Order Statistics from Several Populations. Commun Stat Theory Methods. 37(18): doi: 1.18/ Casella and Berger (21) Statistical Inference. Duxbury Press. 1
PROCEDURES CONTROLLING THE k-fdr USING. BIVARIATE DISTRIBUTIONS OF THE NULL p-values. Sanat K. Sarkar and Wenge Guo
PROCEDURES CONTROLLING THE k-fdr USING BIVARIATE DISTRIBUTIONS OF THE NULL p-values Sanat K. Sarkar and Wenge Guo Temple University and National Institute of Environmental Health Sciences Abstract: Procedures
More informationStat 206: Estimation and testing for a mean vector,
Stat 206: Estimation and testing for a mean vector, Part II James Johndrow 2016-12-03 Comparing components of the mean vector In the last part, we talked about testing the hypothesis H 0 : µ 1 = µ 2 where
More informationSTAT 461/561- Assignments, Year 2015
STAT 461/561- Assignments, Year 2015 This is the second set of assignment problems. When you hand in any problem, include the problem itself and its number. pdf are welcome. If so, use large fonts and
More informationFalse Discovery Rate
False Discovery Rate Peng Zhao Department of Statistics Florida State University December 3, 2018 Peng Zhao False Discovery Rate 1/30 Outline 1 Multiple Comparison and FWER 2 False Discovery Rate 3 FDR
More informationProbability and Stochastic Processes
Probability and Stochastic Processes A Friendly Introduction Electrical and Computer Engineers Third Edition Roy D. Yates Rutgers, The State University of New Jersey David J. Goodman New York University
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationCorrelation, z-values, and the Accuracy of Large-Scale Estimators. Bradley Efron Stanford University
Correlation, z-values, and the Accuracy of Large-Scale Estimators Bradley Efron Stanford University Correlation and Accuracy Modern Scientific Studies N cases (genes, SNPs, pixels,... ) each with its own
More informationOptional Stopping Theorem Let X be a martingale and T be a stopping time such
Plan Counting, Renewal, and Point Processes 0. Finish FDR Example 1. The Basic Renewal Process 2. The Poisson Process Revisited 3. Variants and Extensions 4. Point Processes Reading: G&S: 7.1 7.3, 7.10
More informationSTEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE. National Institute of Environmental Health Sciences and Temple University
STEPDOWN PROCEDURES CONTROLLING A GENERALIZED FALSE DISCOVERY RATE Wenge Guo 1 and Sanat K. Sarkar 2 National Institute of Environmental Health Sciences and Temple University Abstract: Often in practice
More informationModified Simes Critical Values Under Positive Dependence
Modified Simes Critical Values Under Positive Dependence Gengqian Cai, Sanat K. Sarkar Clinical Pharmacology Statistics & Programming, BDS, GlaxoSmithKline Statistics Department, Temple University, Philadelphia
More informationLecture 28. Ingo Ruczinski. December 3, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University
Lecture 28 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University December 3, 2015 1 2 3 4 5 1 Familywise error rates 2 procedure 3 Performance of with multiple
More informationarxiv: v1 [math.st] 31 Mar 2009
The Annals of Statistics 2009, Vol. 37, No. 2, 619 629 DOI: 10.1214/07-AOS586 c Institute of Mathematical Statistics, 2009 arxiv:0903.5373v1 [math.st] 31 Mar 2009 AN ADAPTIVE STEP-DOWN PROCEDURE WITH PROVEN
More informationFalse discovery rate and related concepts in multiple comparisons problems, with applications to microarray data
False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using
More informationProbability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models
Probability Models in Electrical and Computer Engineering Mathematical models as tools in analysis and design Deterministic models Probability models Statistical regularity Properties of relative frequency
More informationResearch Article Sample Size Calculation for Controlling False Discovery Proportion
Probability and Statistics Volume 2012, Article ID 817948, 13 pages doi:10.1155/2012/817948 Research Article Sample Size Calculation for Controlling False Discovery Proportion Shulian Shang, 1 Qianhe Zhou,
More informationOn prediction and density estimation Peter McCullagh University of Chicago December 2004
On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating
More informationProbabilistic Inference for Multiple Testing
This is the title page! This is the title page! Probabilistic Inference for Multiple Testing Chuanhai Liu and Jun Xie Department of Statistics, Purdue University, West Lafayette, IN 47907. E-mail: chuanhai,
More informationChapter 1. Stepdown Procedures Controlling A Generalized False Discovery Rate
Chapter Stepdown Procedures Controlling A Generalized False Discovery Rate Wenge Guo and Sanat K. Sarkar Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park,
More informationNon-specific filtering and control of false positives
Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview
More informationLooking at the Other Side of Bonferroni
Department of Biostatistics University of Washington 24 May 2012 Multiple Testing: Control the Type I Error Rate When analyzing genetic data, one will commonly perform over 1 million (and growing) hypothesis
More informationTwo-stage stepup procedures controlling FDR
Journal of Statistical Planning and Inference 38 (2008) 072 084 www.elsevier.com/locate/jspi Two-stage stepup procedures controlling FDR Sanat K. Sarar Department of Statistics, Temple University, Philadelphia,
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca
More informationTHE UNIVERSITY OF HONG KONG DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE
THE UNIVERSITY OF HONG KONG DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE STAT1301 PROBABILITY AND STATISTICS I EXAMPLE CLASS 2 Review Definition of conditional probability For any two events A and B,
More informationReview (Probability & Linear Algebra)
Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint
More informationAdvanced Statistical Methods: Beyond Linear Regression
Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi
More informationAlpha-Investing. Sequential Control of Expected False Discoveries
Alpha-Investing Sequential Control of Expected False Discoveries Dean Foster Bob Stine Department of Statistics Wharton School of the University of Pennsylvania www-stat.wharton.upenn.edu/ stine Joint
More informationA CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW
A CONDITION TO OBTAIN THE SAME DECISION IN THE HOMOGENEITY TEST- ING PROBLEM FROM THE FREQUENTIST AND BAYESIAN POINT OF VIEW Miguel A Gómez-Villegas and Beatriz González-Pérez Departamento de Estadística
More informationHigh-Throughput Sequencing Course. Introduction. Introduction. Multiple Testing. Biostatistics and Bioinformatics. Summer 2018
High-Throughput Sequencing Course Multiple Testing Biostatistics and Bioinformatics Summer 2018 Introduction You have previously considered the significance of a single gene Introduction You have previously
More informationLecture 2: Repetition of probability theory and statistics
Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:
More information3 Comparison with Other Dummy Variable Methods
Stats 300C: Theory of Statistics Spring 2018 Lecture 11 April 25, 2018 Prof. Emmanuel Candès Scribe: Emmanuel Candès, Michael Celentano, Zijun Gao, Shuangning Li 1 Outline Agenda: Knockoffs 1. Introduction
More informationFirst Year Examination Department of Statistics, University of Florida
First Year Examination Department of Statistics, University of Florida August 19, 010, 8:00 am - 1:00 noon Instructions: 1. You have four hours to answer questions in this examination.. You must show your
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationExam: high-dimensional data analysis January 20, 2014
Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish
More informationResampling-Based Control of the FDR
Resampling-Based Control of the FDR Joseph P. Romano 1 Azeem S. Shaikh 2 and Michael Wolf 3 1 Departments of Economics and Statistics Stanford University 2 Department of Economics University of Chicago
More informationImproving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses
Improving the Performance of the FDR Procedure Using an Estimator for the Number of True Null Hypotheses Amit Zeisel, Or Zuk, Eytan Domany W.I.S. June 5, 29 Amit Zeisel, Or Zuk, Eytan Domany (W.I.S.)Improving
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 3, Issue 1 2004 Article 14 Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate Mark J. van der Laan
More informationOn Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses
On Procedures Controlling the FDR for Testing Hierarchically Ordered Hypotheses Gavin Lynch Catchpoint Systems, Inc., 228 Park Ave S 28080 New York, NY 10003, U.S.A. Wenge Guo Department of Mathematical
More informationLecture 27. December 13, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
More informationCS Lecture 19. Exponential Families & Expectation Propagation
CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces
More informationEffects of dependence in high-dimensional multiple testing problems. Kyung In Kim and Mark van de Wiel
Effects of dependence in high-dimensional multiple testing problems Kyung In Kim and Mark van de Wiel Department of Mathematics, Vrije Universiteit Amsterdam. Contents 1. High-dimensional multiple testing
More informationApplying the Benjamini Hochberg procedure to a set of generalized p-values
U.U.D.M. Report 20:22 Applying the Benjamini Hochberg procedure to a set of generalized p-values Fredrik Jonsson Department of Mathematics Uppsala University Applying the Benjamini Hochberg procedure
More informationST5215: Advanced Statistical Theory
Department of Statistics & Applied Probability Monday, September 26, 2011 Lecture 10: Exponential families and Sufficient statistics Exponential Families Exponential families are important parametric families
More informationSome General Types of Tests
Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use test of some general class of tests that often have good asymptotic properties.
More informationModel Identification for Wireless Propagation with Control of the False Discovery Rate
Model Identification for Wireless Propagation with Control of the False Discovery Rate Christoph F. Mecklenbräuker (TU Wien) Joint work with Pei-Jung Chung (Univ. Edinburgh) Dirk Maiwald (Atlas Elektronik)
More informationSupplementary Material for On the evaluation of climate model simulated precipitation extremes
Supplementary Material for On the evaluation of climate model simulated precipitation extremes Andrea Toreti 1 and Philippe Naveau 2 1 European Commission, Joint Research Centre, Ispra (VA), Italy 2 Laboratoire
More informationFamily-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs
Family-wise Error Rate Control in QTL Mapping and Gene Ontology Graphs with Remarks on Family Selection Dissertation Defense April 5, 204 Contents Dissertation Defense Introduction 2 FWER Control within
More informationStatistical testing. Samantha Kleinberg. October 20, 2009
October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find
More informationThe miss rate for the analysis of gene expression data
Biostatistics (2005), 6, 1,pp. 111 117 doi: 10.1093/biostatistics/kxh021 The miss rate for the analysis of gene expression data JONATHAN TAYLOR Department of Statistics, Stanford University, Stanford,
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationPeak Detection for Images
Peak Detection for Images Armin Schwartzman Division of Biostatistics, UC San Diego June 016 Overview How can we improve detection power? Use a less conservative error criterion Take advantage of prior
More informationJournal Club: Higher Criticism
Journal Club: Higher Criticism David Donoho (2002): Higher Criticism for Heterogeneous Mixtures, Technical Report No. 2002-12, Dept. of Statistics, Stanford University. Introduction John Tukey (1976):
More informationThe Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations
The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture
More informationProcedures controlling generalized false discovery rate
rocedures controlling generalized false discovery rate By SANAT K. SARKAR Department of Statistics, Temple University, hiladelphia, A 922, U.S.A. sanat@temple.edu AND WENGE GUO Department of Environmental
More informationFDR and ROC: Similarities, Assumptions, and Decisions
EDITORIALS 8 FDR and ROC: Similarities, Assumptions, and Decisions. Why FDR and ROC? It is a privilege to have been asked to introduce this collection of papers appearing in Statistica Sinica. The papers
More informationChapter 4 Multiple Random Variables
Review for the previous lecture Theorems and Examples: How to obtain the pmf (pdf) of U = g ( X Y 1 ) and V = g ( X Y) Chapter 4 Multiple Random Variables Chapter 43 Bivariate Transformations Continuous
More informationMultiple Testing. Hoang Tran. Department of Statistics, Florida State University
Multiple Testing Hoang Tran Department of Statistics, Florida State University Large-Scale Testing Examples: Microarray data: testing differences in gene expression between two traits/conditions Microbiome
More informationEstimation of a Two-component Mixture Model
Estimation of a Two-component Mixture Model Bodhisattva Sen 1,2 University of Cambridge, Cambridge, UK Columbia University, New York, USA Indian Statistical Institute, Kolkata, India 6 August, 2012 1 Joint
More informationMultiple testing: Intro & FWER 1
Multiple testing: Intro & FWER 1 Mark van de Wiel mark.vdwiel@vumc.nl Dep of Epidemiology & Biostatistics,VUmc, Amsterdam Dep of Mathematics, VU 1 Some slides courtesy of Jelle Goeman 1 Practical notes
More informationApplied Multivariate and Longitudinal Data Analysis
Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference
More informationStep-down FDR Procedures for Large Numbers of Hypotheses
Step-down FDR Procedures for Large Numbers of Hypotheses Paul N. Somerville University of Central Florida Abstract. Somerville (2004b) developed FDR step-down procedures which were particularly appropriate
More informationBivariate Paired Numerical Data
Bivariate Paired Numerical Data Pearson s correlation, Spearman s ρ and Kendall s τ, tests of independence University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html
More informationLearning Objectives for Stat 225
Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:
More informationChapter 2: Fundamentals of Statistics Lecture 15: Models and statistics
Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:
More informationLarge-Scale Multiple Testing of Correlations
Large-Scale Multiple Testing of Correlations T. Tony Cai and Weidong Liu Abstract Multiple testing of correlations arises in many applications including gene coexpression network analysis and brain connectivity
More informationHeterogeneity and False Discovery Rate Control
Heterogeneity and False Discovery Rate Control Joshua D Habiger Oklahoma State University jhabige@okstateedu URL: jdhabigerokstateedu August, 2014 Motivating Data: Anderson and Habiger (2012) M = 778 bacteria
More informationA Likelihood Ratio Test
A Likelihood Ratio Test David Allen University of Kentucky February 23, 2012 1 Introduction Earlier presentations gave a procedure for finding an estimate and its standard error of a single linear combination
More informationarxiv: v2 [stat.me] 31 Aug 2017
Multiple testing with discrete data: proportion of true null hypotheses and two adaptive FDR procedures Xiongzhi Chen, Rebecca W. Doerge and Joseph F. Heyse arxiv:4274v2 [stat.me] 3 Aug 207 Abstract We
More informationThis paper has been submitted for consideration for publication in Biometrics
BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza
More informationLecture 7 April 16, 2018
Stats 300C: Theory of Statistics Spring 2018 Lecture 7 April 16, 2018 Prof. Emmanuel Candes Scribe: Feng Ruan; Edited by: Rina Friedberg, Junjie Zhu 1 Outline Agenda: 1. False Discovery Rate (FDR) 2. Properties
More informationUQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables
UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1
More informationReview Quiz. 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the
Review Quiz 1. Prove that in a one-dimensional canonical exponential family, the complete and sufficient statistic achieves the Cramér Rao lower bound (CRLB). That is, if where { } and are scalars, then
More informationTesting Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information
Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information Budi Pratikno 1 and Shahjahan Khan 2 1 Department of Mathematics and Natural Science Jenderal Soedirman
More informationSTAT 501 Assignment 1 Name Spring Written Assignment: Due Monday, January 22, in class. Please write your answers on this assignment
STAT 5 Assignment Name Spring Reading Assignment: Johnson and Wichern, Chapter, Sections.5 and.6, Chapter, and Chapter. Review matrix operations in Chapter and Supplement A. Examine the matrix properties
More information0 0'0 2S ~~ Employment category
Analyze Phase 331 60000 50000 40000 30000 20000 10000 O~----,------.------,------,,------,------.------,----- N = 227 136 27 41 32 5 ' V~ 00 0' 00 00 i-.~ fl' ~G ~~ ~O~ ()0 -S 0 -S ~~ 0 ~~ 0 ~G d> ~0~
More informationHybrid Dirichlet processes for functional data
Hybrid Dirichlet processes for functional data Sonia Petrone Università Bocconi, Milano Joint work with Michele Guindani - U.T. MD Anderson Cancer Center, Houston and Alan Gelfand - Duke University, USA
More informationHunting for significance with multiple testing
Hunting for significance with multiple testing Etienne Roquain 1 1 Laboratory LPMA, Université Pierre et Marie Curie (Paris 6), France Séminaire MODAL X, 19 mai 216 Etienne Roquain Hunting for significance
More informationRegular Sparse Crossbar Concentrators
Regular Sparse Crossbar Concentrators Weiming Guo and A. Yavuz Oruç Electrical Engineering Department College Park, MD 20742 ABSTRACT A bipartite concentrator is a single stage sparse crossbar switching
More informationFalse Discovery Control in Spatial Multiple Testing
False Discovery Control in Spatial Multiple Testing WSun 1,BReich 2,TCai 3, M Guindani 4, and A. Schwartzman 2 WNAR, June, 2012 1 University of Southern California 2 North Carolina State University 3 University
More informationContents 1. Contents
Contents 1 Contents 6 Distributions of Functions of Random Variables 2 6.1 Transformation of Discrete r.v.s............. 3 6.2 Method of Distribution Functions............. 6 6.3 Method of Transformations................
More informationControlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method
Controlling the False Discovery Rate: Understanding and Extending the Benjamini-Hochberg Method Christopher R. Genovese Department of Statistics Carnegie Mellon University joint work with Larry Wasserman
More informationRatio of Linear Function of Parameters and Testing Hypothesis of the Combination Two Split Plot Designs
Middle-East Journal of Scientific Research 13 (Mathematical Applications in Engineering): 109-115 2013 ISSN 1990-9233 IDOSI Publications 2013 DOI: 10.5829/idosi.mejsr.2013.13.mae.10002 Ratio of Linear
More informationStatistical Data Analysis Stat 3: p-values, parameter estimation
Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationReview (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology
Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna
More informationCourse information: Instructor: Tim Hanson, Leconte 219C, phone Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment.
Course information: Instructor: Tim Hanson, Leconte 219C, phone 777-3859. Office hours: Tuesday/Thursday 11-12, Wednesday 10-12, and by appointment. Text: Applied Linear Statistical Models (5th Edition),
More informationZhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018
Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals BayesMP Zhiguang Huo 1, Chi Song 2, George Tseng
More informationAsymptotic Distribution of the Largest Eigenvalue via Geometric Representations of High-Dimension, Low-Sample-Size Data
Sri Lankan Journal of Applied Statistics (Special Issue) Modern Statistical Methodologies in the Cutting Edge of Science Asymptotic Distribution of the Largest Eigenvalue via Geometric Representations
More informationBumpbars: Inference for region detection. Yuval Benjamini, Hebrew University
Bumpbars: Inference for region detection Yuval Benjamini, Hebrew University yuvalbenj@gmail.com WHOA-PSI-2017 Collaborators Jonathan Taylor Stanford Rafael Irizarry Dana Farber, Harvard Amit Meir U of
More informationStandard Testing Procedures for White Noise and Heteroskedasticity
Standard Testing Procedures for White Noise and Heteroskedasticity Violetta Dalla 1, Liudas Giraitis 2 and Peter C.B. Phillips 3 1 University of Athens, 2 Queen Mary, UL, 3 Yale University A Celebration
More informationJournal of Statistical Software
JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. doi: 10.18637/jss.v000.i00 GroupTest: Multiple Testing Procedure for Grouped Hypotheses Zhigen Zhao Abstract In the modern Big Data
More informationFamilywise Error Rate Controlling Procedures for Discrete Data
Familywise Error Rate Controlling Procedures for Discrete Data arxiv:1711.08147v1 [stat.me] 22 Nov 2017 Yalin Zhu Center for Mathematical Sciences, Merck & Co., Inc., West Point, PA, U.S.A. Wenge Guo Department
More informationSTAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed.
STAT 302 Introduction to Probability Learning Outcomes Textbook: A First Course in Probability by Sheldon Ross, 8 th ed. Chapter 1: Combinatorial Analysis Demonstrate the ability to solve combinatorial
More informationSTAT 501 Assignment 1 Name Spring 2005
STAT 50 Assignment Name Spring 005 Reading Assignment: Johnson and Wichern, Chapter, Sections.5 and.6, Chapter, and Chapter. Review matrix operations in Chapter and Supplement A. Written Assignment: Due
More informationIntroduction to Bayesian Methods. Introduction to Bayesian Methods p.1/??
to Bayesian Methods Introduction to Bayesian Methods p.1/?? We develop the Bayesian paradigm for parametric inference. To this end, suppose we conduct (or wish to design) a study, in which the parameter
More informationSequential Analysis & Testing Multiple Hypotheses,
Sequential Analysis & Testing Multiple Hypotheses, CS57300 - Data Mining Spring 2016 Instructor: Bruno Ribeiro 2016 Bruno Ribeiro This Class: Sequential Analysis Testing Multiple Hypotheses Nonparametric
More informationMonte Carlo Studies. The response in a Monte Carlo study is a random variable.
Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating
More informationSemi-Penalized Inference with Direct FDR Control
Jian Huang University of Iowa April 4, 2016 The problem Consider the linear regression model y = p x jβ j + ε, (1) j=1 where y IR n, x j IR n, ε IR n, and β j is the jth regression coefficient, Here p
More informationREPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS
REPRODUCIBLE ANALYSIS OF HIGH-THROUGHPUT EXPERIMENTS Ying Liu Department of Biostatistics, Columbia University Summer Intern at Research and CMC Biostats, Sanofi, Boston August 26, 2015 OUTLINE 1 Introduction
More informationDepartment of Statistics University of Central Florida. Technical Report TR APR2007 Revised 25NOV2007
Department of Statistics University of Central Florida Technical Report TR-2007-01 25APR2007 Revised 25NOV2007 Controlling the Number of False Positives Using the Benjamini- Hochberg FDR Procedure Paul
More informationAsymptotic Statistics-VI. Changliang Zou
Asymptotic Statistics-VI Changliang Zou Kolmogorov-Smirnov distance Example (Kolmogorov-Smirnov confidence intervals) We know given α (0, 1), there is a well-defined d = d α,n such that, for any continuous
More information