Multiple Testing


Test Hypotheses in Microarray Studies

Microarray studies aim to discover genes in biological samples that are differentially expressed under different experimental conditions. They aim at
- having a high probability of declaring genes to be significantly expressed if they are truly expressed (high power, i.e. low type II error risk),
- while keeping the probability of making false declarations of expression acceptably low (controlling the type I error risk).

Lee & Whitmore (2002) Statistics in Medicine 21, 3543-3570

[Figure: expression (vertical axis) by gene (horizontal axis)]

Multiple Testing

Microarray studies typically involve the simultaneous study of thousands of genes, so the probability of producing incorrect test conclusions (false positives and false negatives) must be controlled for the whole gene set.

For each gene there are two possible situations:
- the gene is not differentially expressed, i.e. hypothesis $H_0$ is true
- the gene is differentially expressed at the level described by the alternative hypothesis $H_A$

Test declaration (decision):
- the gene is differentially expressed ($H_0$ rejected)
- the gene is unexpressed ($H_0$ not rejected)

                                   true hypothesis
test declaration                   unexpressed ($H_0$)                      expressed ($H_A$)
unexpressed ($H_0$ not rejected)   true negative                            false negative (type II error $\beta$)
expressed ($H_0$ rejected)         false positive (type I error $\alpha$)   true positive

Lee & Whitmore (2002) Statistics in Medicine 21, 3543-3570

Multiple Testing

Testing simultaneously $m$ hypotheses $H_1, \ldots, H_m$; $m_0$ of these hypotheses are true.

                                        # not rejected   # rejected   total
# true hypotheses (unexpressed genes)   U                V            m_0
# false hypotheses (expressed genes)    T                S            m - m_0
total                                   m - R            R            m

- The counts U, V, S, T are random variables in advance of the analysis of the study data.
- Observed random variable: R = number of rejected hypotheses
- U, V, S, T are not observable random variables.
- V = number of type I errors (false positives)
- T = number of type II errors (false negatives)

Dudoit et al. (2002) Multiple Hypothesis Testing in Microarray Experiments, Technical Report

Type I and II Error Rates

                                        # not rejected   # rejected   total
# true hypotheses (unexpressed genes)   U                V            m_0
# false hypotheses (expressed genes)    T                S            m - m_0
total                                   m - R            R            m

- $\alpha_0$ = probability of type I error for any gene = $E(V)/m_0$
- $\beta_1$ = probability of type II error for any gene = $E(T)/(m - m_0)$
- $\alpha_F$ = family-wise error rate (FWER) = $P(V > 0)$ (probability of at least one type I error)
- False discovery rate (FDR) (Benjamini & Hochberg, 1995) = expected proportion of false positives among the rejected hypotheses:

$$\mathrm{FDR} = E(Q), \qquad Q = \begin{cases} V/R, & R > 0 \\ 0, & R = 0 \end{cases}$$

Dudoit et al. (2002)
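
The definitions of FWER and FDR can be illustrated with a short Python sketch. It is only an illustration: in a real study V (the number of false positives) is not observable, so the (V, R) pairs below are hypothetical values invented for the example, and the function names are assumptions rather than part of any package.

```python
def false_discovery_proportion(v, r):
    """Q = V/R if R > 0, and Q = 0 if R = 0."""
    return v / r if r > 0 else 0.0


def empirical_error_rates(outcomes):
    """Average over repetitions of a study.

    outcomes: list of (V, R) pairs, one per hypothetical repetition,
    where V = number of false positives and R = number of rejections.
    Returns (FWER estimate, FDR estimate).
    """
    n = len(outcomes)
    fwer = sum(1 for v, _ in outcomes if v > 0) / n  # P(V > 0)
    fdr = sum(false_discovery_proportion(v, r) for v, r in outcomes) / n  # E(Q)
    return fwer, fdr


# Hypothetical (V, R) outcomes, invented for illustration only.
outcomes = [(0, 5), (1, 4), (0, 0), (2, 10)]
fwer, fdr = empirical_error_rates(outcomes)
```

In this toy input two of the four repetitions contain at least one false positive, while the average proportion of false positives among the rejections is much smaller, which is why FDR control is the less strict criterion.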

Strong vs. weak control

Expectations and probabilities are conditional on which hypotheses are true.

- Strong control: control of the Type I error rate under any combination of true and false hypotheses, i.e., any value of $m_0$:
  $\bigcap_{j \in M_0} H_j$ for all $M_0 \subseteq \{1, \ldots, m\}$, $|M_0| = m_0$
- Weak control: control of the Type I error rate only when all hypotheses are true, i.e. under the complete null hypothesis $H_0^C = \bigcap_{j=1}^{m} H_j$, with $m_0 = m$

Dudoit et al. (2002)

Notations

For hypothesis $H_j$, $j = 1, \ldots, m$:
- observed test statistics $t_j$
- observed unadjusted p-values $p_j$

Ordered p-values and test statistics, with ordering $\{r_j\}_{j = 1, \ldots, m}$:

$$p_{r_1} \le p_{r_2} \le \cdots \le p_{r_m}, \qquad |t_{r_1}| \ge |t_{r_2}| \ge \cdots \ge |t_{r_m}|$$

Dudoit et al. (2002)

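Control of the family-wise error rate (FWER): critical values

observed p-values   Bonferroni     Holm step-down         Hochberg step-up
$p_{r_1}$           $\alpha/m$     $\alpha/m$             $\alpha/m$
$p_{r_2}$           $\alpha/m$     $\alpha/(m-1)$         $\alpha/(m-1)$
:                   :              :                      :
$p_{r_j}$           $\alpha/m$     $\alpha/(m-j+1)$       $\alpha/(m-j+1)$
:                   :              :                      :
$p_{r_{m-1}}$       $\alpha/m$     $\alpha/2$             $\alpha/2$
$p_{r_m}$           $\alpha/m$     $\alpha$               $\alpha$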

Control of the family-wise error rate (FWER): procedures

1. Single-step Bonferroni procedure

   reject $H_j$ with $p_j \le \alpha/m$; adjusted p-value $\tilde p_j = \min(m p_j, 1)$

2. Holm (1979) step-down procedure

   $$j^* = \min\{ j : p_{r_j} > \alpha/(m - j + 1) \}, \quad \text{reject } H_{r_j} \text{ for } j = 1, \ldots, j^* - 1$$

   adjusted p-value $$\tilde p_{r_j} = \max_{k = 1, \ldots, j} \{ \min((m - k + 1)\, p_{r_k}, 1) \}$$

3. Hochberg (1988) step-up procedure

   $$j^* = \max\{ j : p_{r_j} \le \alpha/(m - j + 1) \}, \quad \text{reject } H_{r_j} \text{ for } j = 1, \ldots, j^*$$

   adjusted p-value $$\tilde p_{r_j} = \min_{k = j, \ldots, m} \{ \min((m - k + 1)\, p_{r_k}, 1) \}$$

4. Single-step Šidák procedure

   adjusted p-value $\tilde p_j = 1 - (1 - p_j)^m$

Dudoit et al. (2002)
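
The four FWER adjustments (Bonferroni, Holm step-down, Hochberg step-up, Šidák) can be sketched in plain Python. This is a minimal illustration, assuming the input is a list of unadjusted p-values; the function names are hypothetical and the p-values in the usage lines are made up.

```python
def bonferroni(p):
    """Single-step Bonferroni: min(m * p_j, 1)."""
    m = len(p)
    return [min(m * pj, 1.0) for pj in p]


def sidak(p):
    """Single-step Sidak: 1 - (1 - p_j)^m."""
    m = len(p)
    return [1.0 - (1.0 - pj) ** m for pj in p]


def holm(p):
    """Holm step-down: running maximum of (m - k + 1) * p_(k), capped at 1."""
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j])  # indices r_1, ..., r_m
    adjusted = [0.0] * m
    running_max = 0.0
    for k, j in enumerate(order):  # k = 0 is rank 1, so m - k = m - rank + 1
        running_max = max(running_max, min((m - k) * p[j], 1.0))
        adjusted[j] = running_max
    return adjusted


def hochberg(p):
    """Hochberg step-up: running minimum, taken from the largest p-value down."""
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j])
    adjusted = [0.0] * m
    running_min = 1.0
    for k in range(m - 1, -1, -1):
        j = order[k]
        running_min = min(running_min, min((m - k) * p[j], 1.0))
        adjusted[j] = running_min
    return adjusted


# Made-up unadjusted p-values for four hypotheses.
p_raw = [0.01, 0.04, 0.03, 0.005]
p_bonf = bonferroni(p_raw)
p_holm = holm(p_raw)
p_hoch = hochberg(p_raw)
p_sidak = sidak(p_raw)
```

A hypothesis is rejected at level α when its adjusted p-value is at most α, so the adjusted values allow the procedures to be compared directly: Hochberg is never larger than Holm, and Holm is never larger than Bonferroni.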

Resampling

Estimate the joint distribution of the test statistics $T_1, \ldots, T_m$ under the complete null hypothesis $H_0^C$ by permuting the columns of the gene expression data matrix X.

Permutation algorithm for non-adjusted p-values

For the b-th permutation, $b = 1, \ldots, B$:
1. Permute the n columns of the data matrix X.
2. Compute test statistics $t_{1,b}, \ldots, t_{m,b}$ for each hypothesis.

The permutation distribution of the test statistic $T_j$ for hypothesis $H_j$, $j = 1, \ldots, m$, is given by the empirical distribution of $t_{j,1}, \ldots, t_{j,B}$. For two-sided alternative hypotheses, the permutation p-value for hypothesis $H_j$ is

$$p_j^* = \frac{1}{B} \sum_{b=1}^{B} I(|t_{j,b}| \ge |t_j|)$$

where $I(\cdot)$ is the indicator function, equaling 1 if the condition in parentheses is true, and 0 otherwise.

Dudoit et al. (2002)
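
The permutation algorithm for unadjusted p-values can be sketched as follows. The sketch makes several assumptions that are not in the slides: the test statistic is a two-sample Welch-type t statistic, the data matrix is a list of rows (one row per gene), group membership is coded as 0/1 labels, and permutations are drawn at random rather than enumerated. Permuting the labels is equivalent to permuting the columns of X. All names and data are hypothetical.

```python
import random
from statistics import mean, variance


def t_statistic(values, labels):
    """Two-sample t statistic with unequal variances (an assumed choice)."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(b) - mean(a)) / se


def permutation_p_values(X, labels, B=1000, seed=0):
    """Two-sided permutation p-values, one per row (gene) of X."""
    rng = random.Random(seed)
    observed = [abs(t_statistic(row, labels)) for row in X]
    counts = [0] * len(X)
    perm = list(labels)
    for _ in range(B):
        rng.shuffle(perm)  # one random relabelling of the n samples
        for j, row in enumerate(X):
            if abs(t_statistic(row, perm)) >= observed[j]:
                counts[j] += 1
    return [c / B for c in counts]


# Made-up expression values: gene 0 differs between groups, gene 1 does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
X = [
    [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1],
    [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5],
]
p_values = permutation_p_values(X, labels, B=500, seed=1)
```

With only 8 samples there are 70 distinct splits into two groups of four, so the smallest attainable two-sided p-value is about 2/70; small sample sizes limit the resolution of permutation p-values.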

Control of the family-wise error rate (FWER)

Permutation algorithm of Westfall & Young (1993)
- step-down procedure without assuming a t distribution of the test statistics for each gene's differential expression
- adjusted p-values directly estimated by permutation
- strong control of FWER
- takes the dependency structure of the hypotheses into account

Control of the family-wise error rate (FWER)

Permutation algorithm of Westfall & Young (maxT)
- Order the observed test statistics: $|t_{r_1}| \ge |t_{r_2}| \ge \cdots \ge |t_{r_m}|$
- For the b-th permutation of the data ($b = 1, \ldots, B$):
  - divide the data into artificial control and treatment groups
  - compute test statistics $t_{1,b}, \ldots, t_{m,b}$
  - compute successive maxima of the test statistics:
    $$u_{m,b} = |t_{r_m,b}|, \qquad u_{j,b} = \max\{ u_{j+1,b}, |t_{r_j,b}| \} \quad \text{for } j = m-1, \ldots, 1$$
- Compute adjusted p-values:
  $$\tilde p_{r_j}^* = \frac{1}{B} \sum_{b=1}^{B} I(u_{j,b} \ge |t_{r_j}|)$$

Dudoit et al. (2002)

Control of the family-wise error rate (FWER)

Permutation algorithm of Westfall & Young: Example

Sorted observed values:

gene    1     4     5     2     3
|t|     0.1   0.2   2.8   3.4   7.1

One permutation b and the resulting adjusted p-values:

gene   |t_b|   u_b   I(u_b > |t|)   sum over b   adjusted p-value = sum / B
1      1.3     1.3   1              935          0.935
4      0.8     1.3   1              876          0.876
5      3.0     3.0   1              138          0.138
2      2.1     3.0   0              145          0.145
3      1.8     3.0   0              48           0.048

B = 1000 permutations

O. Hartmann - NGFN Symposium, 19.11.2002, Berlin

Example: Leukemia study, Golub et al. (1999)

- patients with ALL (acute lymphoblastic leukemia): $n_1 = 27$
- patients with AML (acute myeloid leukemia): $n_2 = 11$
- Affy-Chip: 6817 genes
- reduction to 3051 genes according to certain exclusion criteria for expression values

Example: Leukemia study, Golub et al. (1999)

[Two figures from Dudoit et al. (2002)]

Control of the False Discovery Rate (FDR)

While in some cases FWER control is needed, the multiplicity problem in microarray data does not require protection against even a single type I error, so the severe loss of power involved in such protection is not justified. Instead, it may be more appropriate to emphasize the proportion of errors among the identified differentially expressed genes. The expectation of this proportion is the False Discovery Rate (FDR).

$$\mathrm{FDR} = E(Q), \qquad Q = \begin{cases} V/R, & R > 0 \\ 0, & R = 0 \end{cases}$$

- R = number of rejected hypotheses
- V = number of type I errors (false positives)

Reiner, Yekutieli & Benjamini (2003) Bioinformatics 19, 368-375

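Control of the False Discovery Rate (FDR): procedures

1. Linear step-up procedure (Benjamini & Hochberg, 1995)

   $$j^* = \max\{ j : p_{r_j} \le \tfrac{j}{m}\, q \}, \quad \text{reject } H_{r_j} \text{ for } j = 1, \ldots, j^*$$

   adjusted p-value $$\tilde p_{r_j} = \min_{k = j, \ldots, m} \{ \min( \tfrac{m}{k}\, p_{r_k}, 1 ) \}$$

   - controls FDR at level q for independent test statistics: $\mathrm{FDR} \le \tfrac{m_0}{m}\, q \le q$

2. Benjamini & Yekutieli (2001)

   - procedure 1 controls the FDR under certain dependency structures (positive regression dependency)
   - step-up procedure for more general cases (replace $q$ by $q / \sum_{i=1}^{m} 1/i$):

   $$j^* = \max\{ j : p_{r_j} \le \tfrac{j}{m}\, q / \textstyle\sum_{i=1}^{m} 1/i \}, \quad \text{reject } H_{r_j} \text{ for } j = 1, \ldots, j^*$$

   adjusted p-value $$\tilde p_{r_j} = \min_{k = j, \ldots, m} \{ \min( \tfrac{m}{k} \textstyle\sum_{i=1}^{m} \tfrac{1}{i}\; p_{r_k}, 1 ) \}$$

   - this modification may be too conservative for the microarray problem

Reiner, Yekutieli & Benjamini (2003) Bioinformatics 19, 368-375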
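
Both step-up FDR adjustments (Benjamini & Hochberg, and the Benjamini & Yekutieli modification for general dependency) differ only by a constant factor, which the following Python sketch makes explicit. It is a minimal illustration; the function names are hypothetical and the p-values in the usage lines are made up.

```python
def _step_up_adjust(p, factor):
    """Running minimum of factor * (m / k) * p_(k), from the largest p-value down."""
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j])  # indices r_1, ..., r_m
    adjusted = [0.0] * m
    running_min = 1.0
    for k in range(m - 1, -1, -1):  # k = m - 1 is rank m
        j = order[k]
        running_min = min(running_min, min(factor * m / (k + 1) * p[j], 1.0))
        adjusted[j] = running_min
    return adjusted


def benjamini_hochberg(p):
    """Linear step-up adjusted p-values (Benjamini & Hochberg, 1995)."""
    return _step_up_adjust(p, 1.0)


def benjamini_yekutieli(p):
    """Step-up adjusted p-values with the extra factor sum_{i=1}^{m} 1/i."""
    m = len(p)
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    return _step_up_adjust(p, harmonic)


# Made-up unadjusted p-values for four hypotheses.
p_raw = [0.01, 0.04, 0.03, 0.005]
p_bh = benjamini_hochberg(p_raw)
p_by = benjamini_yekutieli(p_raw)
```

For m = 4 the extra factor is 1 + 1/2 + 1/3 + 1/4, roughly 2.08; it grows like log(m), which is why the second procedure can be conservative for thousands of genes.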

Control of the False Discovery Rate (FDR): procedures (continued)

3. Adaptive procedures (Benjamini & Hochberg, 2000)

   - try to estimate $m_0$ and use $q^* = q\, m / m_0$ instead of $q$ in procedure 1 to gain more power
   - Storey (2001) suggests a similar version to estimate $m_0$, which is implemented in SAM (Storey & Tibshirani, 2003)
   - adaptive methods offer better performance only by utilizing the difference between $m_0/m$ and 1; if the difference is small, i.e. when the potential proportion of differentially expressed genes is small, they offer little advantage in power, while their properties are not well established

4. Resampling FDR adjustments

   - Yekutieli & Benjamini (1999) J. Statist. Plan. Inference 82, 171-196
   - Reiner, Yekutieli & Benjamini (2003) Bioinformatics 19, 368-375

Example: Leukemia study, Golub et al. (1999)

[Figure from Dudoit et al. (2002)]

Example: Apo AI Experiment, Callow et al. (2000)

Apolipoprotein A1 (Apo A1) experiment in mice

- aim: identification of differentially expressed genes in liver tissues
- experimental group: 8 mice with the apo A1 gene knocked out (apo A1 KO)
- control group: 8 C57Bl/6 mice
- experimental sample: cDNA for each of the 16 mice, labeled with red (Cy5)
- reference sample: pooled cDNA of the 8 control mice, labeled with green (Cy3)
- cDNA arrays with 6384 cDNA probes, 200 related to lipid metabolism
- 16 hybridizations overall

Example: Apo AI Experiment, Callow et al. (2000)

[Two figures from Dudoit et al. (2002)]

Multiple Testing -- Summary

- For multiple testing problems there are several methods to control the family-wise error rate (FWER).
- FDR controlling procedures are promising alternatives to more conservative FWER controlling procedures.
- Strong control of the type I error rate is essential in the microarray context.
- Adjusted p-values provide flexible summaries of the results from a multiple testing procedure and allow for a comparison of different methods.
- Substantial gain in power can be obtained by taking into account the joint distribution of the test statistics (e.g. Westfall & Young, 1993; Reiner, Yekutieli & Benjamini, 2003).
- Recommended software: Bioconductor R multtest package (http://www.bioconductor.org/)

Adapted from S. Dudoit, Bioconductor short course 2002

Multiple Testing -- Literature

Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. R. Statist. Soc. B 57: 289-300.

Benjamini, Y. & Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics, J. Educ. Behav. Stat. 25: 60-83.

Benjamini, Y. & Yekutieli, D. (2001). The control of the false discovery rate under dependency, Ann. Stat. 29: 1165-1188.

Callow, M. J., Dudoit, S., Gong, E. L., Speed, T. P. & Rubin, E. M. (2000). Microarray expression profiling identifies genes with altered expression in HDL deficient mice, Genome Research 10(12): 2022-2029.

Dudoit, S., Shaffer, J. P. & Boldrick, J. C. (Submitted). Multiple hypothesis testing in microarray experiments, Technical Report #110 (http://stat-www.berkeley.edu/users/sandrine/publications.html).

Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D. & Lander, E. S. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286: 531-537.

Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance, Biometrika 75: 800-802.

Holm, S. (1979). A simple sequentially rejective multiple test procedure, Scand. J. Statist. 6: 65-70.

Lee, M.-L. T. & Whitmore, G. A. (2002). Power and sample size for DNA microarray studies, Statistics in Medicine 21: 3543-3570.

Reiner, A., Yekutieli, D. & Benjamini, Y. (2003). Identifying differentially expressed genes using false discovery rate controlling procedures, Bioinformatics 19: 368-375.

Westfall, P. H. & Young, S. S. (1993). Resampling-based multiple testing: Examples and methods for p-value adjustment, John Wiley & Sons.

Yekutieli, D. & Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics, J. Stat. Plan. Infer. 82: 171-196.

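2 x 2 Factorial Experiments

Two experimental factors, e.g.
- treatment (untreated T-, treated T+)
- strain (knock-out KN, wild-type WT)

Linear model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

- $x_1 = 0$: strain = KN; $x_1 = 1$: strain = WT
- $x_2 = 0$: treatment = T-; $x_2 = 1$: treatment = T+

- $\beta_1$: strain effect
- $\beta_2$: treatment effect
- $\beta_3$: interaction effect of strain and treatment

Expected expression by strain and treatment:

treatment   strain KN             strain WT
T-          $\beta_0$             $\beta_0 + \beta_1$
T+          $\beta_0 + \beta_2$   $\beta_0 + \beta_1 + \beta_2 + \beta_3$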
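
Because the 2 x 2 model with interaction has one parameter per cell, its least-squares estimates can be read off the four cell means. The Python sketch below illustrates this under the coding x1 = 0 for KN, x1 = 1 for WT, x2 = 0 for T-, x2 = 1 for T+; the function name and the expression values are hypothetical.

```python
from statistics import mean


def fit_2x2(observations):
    """Least-squares estimates for y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error.

    observations: list of (x1, x2, y) with x1, x2 in {0, 1}.
    The model is saturated, so the fitted values equal the cell means.
    """
    cells = {}
    for x1, x2, y in observations:
        cells.setdefault((x1, x2), []).append(y)
    m = {key: mean(values) for key, values in cells.items()}
    b0 = m[(0, 0)]                                      # KN, T-
    b1 = m[(1, 0)] - m[(0, 0)]                          # strain effect at T-
    b2 = m[(0, 1)] - m[(0, 0)]                          # treatment effect in KN
    b3 = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]  # interaction
    return b0, b1, b2, b3


# Made-up expression values, two replicates per cell.
data = [
    (0, 0, 8.0), (0, 0, 8.2),    # KN, T-
    (1, 0, 9.0), (1, 0, 9.2),    # WT, T-
    (0, 1, 10.0), (0, 1, 10.2),  # KN, T+
    (1, 1, 12.0), (1, 1, 12.2),  # WT, T+
]
b0, b1, b2, b3 = fit_2x2(data)
```

In this toy data the treatment raises expression by 2 units in KN but by 3 units in WT, and the difference of 1 unit is the interaction estimate.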

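2 x 2 Factorial Experiments

[Figure: interaction plots for $\beta_3 > 0$, showing the cell means $\beta_0$, $\beta_0 + \beta_1$, $\beta_0 + \beta_2$ and $\beta_0 + \beta_1 + \beta_2 + \beta_3$ by treatment (T-, T+) and by strain (KN, WT)]

2 x 2 Factorial Experiments

[Figure: interaction plots for $\beta_3 < 0$, showing the cell means $\beta_0$, $\beta_0 + \beta_1$, $\beta_0 + \beta_2$ and $\beta_0 + \beta_1 + \beta_2 + \beta_3$ by treatment (T-, T+) and by strain (KN, WT)]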

2 x 2 Factorial Experiments: hypotheses

$H_0: \beta_3 = 0$
- effect of strain is independent of treatment, or
- effect of treatment is independent of strain, or
- strain and treatment are additive

$H_A: \beta_3 \ne 0$
- treatment interacts with strain
- treatment modifies effect of strain
- strain modifies effect of treatment
- treatment and strain are nonadditive

$H_0: \beta_1 = \beta_3 = 0$
- strain is not associated with expression Y

$H_A: \beta_1 \ne 0$ or $\beta_3 \ne 0$
- strain is associated with expression Y
- strain is associated with expression Y for either T- or T+

$H_0: \beta_2 = \beta_3 = 0$
- treatment is not associated with expression Y

$H_A: \beta_2 \ne 0$ or $\beta_3 \ne 0$
- treatment is associated with expression Y
- treatment is associated with expression Y for either KN or WT

F. E. Harrell, Jr. (2001) Regression Modeling Strategies, Springer
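
A test of the interaction hypothesis (beta_3 = 0) can be sketched with a t statistic: the interaction estimate is a contrast of the four cell means with coefficients +1 and -1, so under the model its variance is sigma^2 times the sum of the reciprocal cell sizes. The Python sketch below returns only the statistic and its degrees of freedom; converting it to a p-value would need a t distribution function, which is left out. The function name and the data are hypothetical, and the coding x1 = 0/1 for KN/WT and x2 = 0/1 for T-/T+ is assumed.

```python
from statistics import mean


def interaction_t_statistic(observations):
    """t statistic for H0: b3 = 0 in y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error.

    observations: list of (x1, x2, y) with x1, x2 in {0, 1}
    and at least two replicates in at least one cell.
    Returns (t, degrees_of_freedom).
    """
    cells = {}
    for x1, x2, y in observations:
        cells.setdefault((x1, x2), []).append(y)
    means = {key: mean(values) for key, values in cells.items()}
    b3 = means[(1, 1)] - means[(1, 0)] - means[(0, 1)] + means[(0, 0)]
    # residual sum of squares around the cell means
    sse = sum((y - means[key]) ** 2 for key, values in cells.items() for y in values)
    df = len(observations) - 4
    mse = sse / df
    se = (mse * sum(1.0 / len(values) for values in cells.values())) ** 0.5
    return b3 / se, df


# Made-up expression values, two replicates per cell.
data = [
    (0, 0, 8.0), (0, 0, 8.2),    # KN, T-
    (1, 0, 9.0), (1, 0, 9.2),    # WT, T-
    (0, 1, 10.0), (0, 1, 10.2),  # KN, T+
    (1, 1, 12.0), (1, 1, 12.2),  # WT, T+
]
t_interaction, df_residual = interaction_t_statistic(data)
```

In a microarray study this statistic would be computed for every gene, and the resulting p-values would then be adjusted for multiplicity.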

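2 x 2 Factorial Experiments -- Treatment effect

[Figure: expression (vertical axis, 7 to 13) by treatment (T-, T+) and by strain (KN, WT)]

2 x 2 Factorial Experiments -- Strain effect

[Figure: expression (vertical axis, 7 to 12) by treatment (T-, T+) and by strain (KN, WT)]

2 x 2 Factorial Experiments -- Strain effect

[Figure: expression (vertical axis, 6.0 to 7.0) by treatment (T-, T+) and by strain (KN, WT)]

2 x 2 Factorial Experiments -- Interaction effect

[Figure: expression (vertical axis, 8.0 to 9.0) by treatment (T-, T+) and by strain (KN, WT)]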