Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018

1 Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals (BayesMP)
Zhiguang Huo (1), Chi Song (2), George Tseng (3)
(1) Department of Biostatistics, University of Florida
(2) Department of Biostatistics, the Ohio State University
(3) Department of Biostatistics, University of Pittsburgh
July 30, 2018

2 Background for data integration (Tseng et al. 2012)
- Horizontal meta-analysis: the same type of genomic data from multiple patient cohorts.
- Vertical integrative analysis: multiple types of genomic data from the same patient cohort.
- BayesMP: differential expression (DE) analysis.

3 Background for meta-analysis of DE analysis
According to Tseng et al. (2012), there are four major categories of transcriptomic meta-analysis:
- Combine effect sizes: fixed effects model, random effects model.
- Combine p-values:
  - p-value aggregation methods: Fisher (Fisher, 1925), Stouffer (Stouffer, 1949);
  - order statistics: minp (Tippett, 1931), maxp (Wilkinson, 1951), rop (Song, 2014).
- Combine ranks: rank sum, rank product (Hong et al., 2006).
- Direct merge.

4 Combine p-values
Combining p-values is simple, powerful, and not affected by batch effects.

Table: p-value combining method. E.g. combine $p_{11}, p_{12}, \ldots, p_{1S}$.

Genes   Study 1    Study 2    ...   Study S
1       $p_{11}$   $p_{12}$   ...   $p_{1S}$
2       $p_{21}$   $p_{22}$   ...   $p_{2S}$
3       $p_{31}$   $p_{32}$   ...   $p_{3S}$
...
G       $p_{G1}$   $p_{G2}$   ...   $p_{GS}$

Genomic meta-analysis:
- Apply a p-value combining method gene by gene.
- Adjust for multiple comparisons.
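The gene-wise procedure on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the p-value matrix is hypothetical, the function names are made up for the example, and the Benjamini-Hochberg adjustment is only one possible choice for the multiple-comparison step.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2S df under the null."""
    s = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic divided by 2
    # For even df = 2S the chi-square survival function has a closed form:
    # exp(-x/2) * sum_{k=0}^{S-1} (x/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, s):
        term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total


def stouffer_combine(pvalues):
    """Stouffer's method: sum of z-scores divided by sqrt(S)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed choice of adjustment)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-value matrix: rows are genes, columns are studies.
p_matrix = [
    [0.001, 0.020, 0.015],
    [0.400, 0.700, 0.550],
    [0.030, 0.010, 0.200],
]
fisher_p = [fisher_combine(row) for row in p_matrix]
stouffer_p = [stouffer_combine(row) for row in p_matrix]
fisher_q = benjamini_hochberg(fisher_p)
```

Each row of `p_matrix` is reduced to one combined p-value, and the adjustment is then applied across genes.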

5 Motivation 1: Hypothesis testing setting
$\theta_s$ is the effect size of study $s$, $1 \le s \le S$.
- $HS_B$ targets biomarkers that are DE in one or more studies: $H_0: \theta \in \bigcap_s \{\theta_s = 0\}$ vs $H_A: \theta \in \bigcup_s \{\theta_s \neq 0\}$ (Fisher, minp).
- $HS_A$ targets biomarkers that are DE in all studies: $H_0: \theta \in \bigcap_s \{\theta_s = 0\}$ vs $H_A: \theta \in \bigcap_s \{\theta_s \neq 0\}$ (maxp).
- $HS_r$ targets biomarkers that are DE in $r$ or more studies: $H_0: \theta \in \bigcap_s \{\theta_s = 0\}$ vs $H_A: \theta \in \{\sum_s I\{\theta_s \neq 0\} \ge r\}$ (rop).
Problem: in $HS_A$ and $HS_r$ the null and alternative hypotheses are not complementary.
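The three order-statistic methods named above can be written as one function, since minp and maxp are the special cases $r = 1$ and $r = S$ of rop. The sketch below assumes independent studies with uniform p-values under the null; the function names and the example p-values are illustrative only.

```python
import math


def rop_pvalue(pvalues, r):
    """p-value for the r-th smallest of S independent p-values.

    Under the null the r-th order statistic follows Beta(r, S - r + 1), whose CDF
    at x equals P(at least r of S uniform draws are <= x).
    """
    s = len(pvalues)
    x = sorted(pvalues)[r - 1]
    return sum(
        math.comb(s, k) * x**k * (1.0 - x) ** (s - k) for k in range(r, s + 1)
    )


def minp_pvalue(pvalues):
    """minp: the smallest p-value (r = 1)."""
    return rop_pvalue(pvalues, 1)


def maxp_pvalue(pvalues):
    """maxp: the largest p-value (r = S)."""
    return rop_pvalue(pvalues, len(pvalues))


# Hypothetical gene that is DE in one study only.
example = [0.01, 0.2, 0.6]
results = {
    "minp": minp_pvalue(example),
    "rop (r=2)": rop_pvalue(example, 2),
    "maxp": maxp_pvalue(example),
}
```

For a gene with signal in a single study, minp gives a small combined p-value while maxp does not, which mirrors the difference between the $HS_B$ and $HS_A$ targets.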

6 Motivation 2: differential expression from multiple tissues
[Figure: heatmap of gene modules I to VI across brown fat, heart, and liver.]
Phenotypes: black, wild type; red, VLCAD-deficient.
Differential expression patterns:
- Homogeneous differential expression pattern (Modules I, II).
- Study-specific differential expression pattern (Modules III, IV, V, VI).
How can meta-analysis differential expression patterns (meta-patterns) be categorized?

7 Z statistics and its distribution
[Figure: Z statistics distribution in one study. Black line: null component. Red line: positive DE component. Blue line: negative DE component.]
- $p_{gs}$ is the one-sided p-value for gene $g$ and study $s$.
- $Z_{gs} = \Phi^{-1}(p_{gs})$, where $\Phi^{-1}(\cdot)$ is the inverse cumulative distribution function (CDF) of the standard Gaussian distribution.
- Null component: assume a standard Gaussian distribution or an empirical null (Efron, 2004).
- Alternative component: Dirichlet process.
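The transformation $Z_{gs} = \Phi^{-1}(p_{gs})$ can be sketched with the Python standard library. The clamping constant `eps` is an assumption added here so that p-values of exactly 0 or 1 do not produce infinite Z statistics; it is not part of the slide.

```python
from statistics import NormalDist


def p_to_z(p_one_sided, eps=1e-12):
    """Map a one-sided p-value to a Z statistic via the inverse standard normal CDF."""
    # Clamp to the open interval (0, 1); eps is an illustrative choice.
    p = min(max(p_one_sided, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


# Hypothetical one-sided p-values for one gene across three studies.
z_values = [p_to_z(p) for p in (0.5, 0.975, 0.001)]
```

With this formula p-values near 0 map to large negative Z and p-values near 1 map to large positive Z, so the direction of the one-sided test determines which tail corresponds to up-regulation.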

8 Multiple studies
[Figure: Z statistics distribution in three studies. Panels: (a) Study 1, (b) Study 2, (c) Study 3.]
$Y_{gs} \in \{-1, 0, 1\}$ is the DE indicator:
$f^{(s)}(Z_{gs} \mid Y_{gs}) = f^{(s)}_{0}(Z_{gs})\, I(Y_{gs} = 0) + f^{(s)}_{+1}(Z_{gs})\, I(Y_{gs} = 1) + f^{(s)}_{-1}(Z_{gs})\, I(Y_{gs} = -1)$.
Prior: $Y_{gs} \sim \mathrm{Mult}\big(1, (1 - \pi_g, \pi_g^{+}, \pi_g^{-})\big) \cdot (0, 1, -1)^{\top}$, where $\pi_g^{+} = \pi_g \delta_g$ and $\pi_g^{-} = \pi_g (1 - \delta_g)$.
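As a small illustration of the prior on $Y_{gs}$, the sketch below draws DE indicators for one gene across studies. The values of `pi_g` and `delta_g` and the function name are hypothetical; in the model they are latent quantities updated during sampling.

```python
import random


def sample_de_indicators(pi_g, delta_g, n_studies, rng):
    """Draw Y_gs in {0, +1, -1} with probabilities (1 - pi_g, pi_g*delta_g, pi_g*(1 - delta_g))."""
    values = [0, 1, -1]
    weights = [1.0 - pi_g, pi_g * delta_g, pi_g * (1.0 - delta_g)]
    return rng.choices(values, weights=weights, k=n_studies)


rng = random.Random(0)
# Hypothetical gene: DE with probability 0.3, and up-regulated 80% of the time when DE.
y_g = sample_de_indicators(pi_g=0.3, delta_g=0.8, n_studies=5, rng=rng)
```

Here $\pi_g$ controls how often the gene is DE across studies and $\delta_g$ controls the split between up- and down-regulation.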

9 Graphical Model
[Figure: graphical model with nodes $G_{0+}$, $G_{0-}$, $\gamma$, $\beta$, $\alpha$, $\pi_g$, $\delta_g$, $G_{s+}$, $G_{s-}$, $Y_{gs}$, $f^{(s)}_{k+}$, $f^{(s)}_{k-}$, $f^{(s)}_{0}$, $Z_{gs}$.]
Figure: Graphical representation of the Bayesian latent hierarchical model. Shaded nodes are observed variables. Dashed nodes are pre-estimated or fixed parameters. Arrows represent the generative process. Dashed lines represent equivalent variables. $s$ is the study index and $g$ is the gene index.

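10 Bayesian computing
1. Update the $\pi_g$'s:
   $\pi_g \mid Y_{gs} \sim \mathrm{Beta}\big(\gamma/(G - \gamma) + Y_g^{+} + Y_g^{-},\; S - Y_g^{+} - Y_g^{-} + 1\big)$,
   where $Y_g^{+} = \sum_s I(Y_{gs} = 1)$ and $Y_g^{-} = \sum_s I(Y_{gs} = -1)$.
2. Update the $\delta_g$'s:
   $\delta_g \mid Y_{gs} \sim \mathrm{Beta}(\beta + Y_g^{+},\; \beta + Y_g^{-})$.
3. Update the $Y_{gs}$'s: first update the $C_{gs}$'s such that
   $\Pr(C_{gs} = k \mid C_{-g,s}, Z_{gs}, \pi_g^{\pm}) \propto h^{(s)}_{k}(Z_{gs} \mid C_{-g,s})\, (\pi_g^{+})^{I(k>0)}\, (\pi_g^{-})^{I(k<0)}\, (1 - \pi_g)^{I(k=0)}$,
   then set $Y_{gs} = \mathrm{sgn}(C_{gs})$.
Conjugacy makes the Bayesian computing very fast.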
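Steps 1 and 2 are draws from Beta distributions and can be sketched directly. This is an illustration under stated assumptions rather than the authors' C++ code: the function name, the hyperparameter values, and the use of Python's `random.betavariate` are choices made for the example, and step 3 (the Dirichlet process cluster update) is not shown.

```python
import random


def update_pi_delta(y_g, gamma, beta, n_genes, rng):
    """One Gibbs draw of (pi_g, delta_g) given the DE indicators of gene g.

    y_g     : list of Y_gs in {-1, 0, 1}, one entry per study
    gamma   : hyperparameter with 0 < gamma < n_genes (value is hypothetical)
    beta    : hyperparameter with beta > 0 (value is hypothetical)
    n_genes : total number of genes G
    """
    n_studies = len(y_g)
    y_plus = sum(1 for y in y_g if y == 1)
    y_minus = sum(1 for y in y_g if y == -1)
    # Step 1: pi_g | Y ~ Beta(gamma/(G - gamma) + Y+ + Y-, S - Y+ - Y- + 1)
    pi_g = rng.betavariate(
        gamma / (n_genes - gamma) + y_plus + y_minus,
        n_studies - y_plus - y_minus + 1,
    )
    # Step 2: delta_g | Y ~ Beta(beta + Y+, beta + Y-)
    delta_g = rng.betavariate(beta + y_plus, beta + y_minus)
    return pi_g, delta_g


rng = random.Random(1)
# Hypothetical gene that is up-regulated in two of three studies.
pi_g, delta_g = update_pi_delta([1, 1, 0], gamma=100.0, beta=0.5, n_genes=10000, rng=rng)
```

Because both conditionals are standard Beta distributions, each update costs a single random draw per gene, which is the conjugacy advantage the slide refers to.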

11 Decision making framework (Problem 1)
For meta-analysis purposes, we declare as differentially expressed the genes that fall in:
- $\Omega_{\bar A}$: $\Omega^{1}_{\bar A} = \{\vec\theta_g : \sum_{s=1}^{S} I(\theta_{gs} \neq 0) = S\}$.
- $\Omega_{B}$: $\Omega^{1}_{B} = \{\vec\theta_g : \sum_{s=1}^{S} I(\theta_{gs} \neq 0) \ge 1\}$.
- $\Omega_{r}$: $\Omega^{1}_{r} = \{\vec\theta_g : \sum_{s=1}^{S} I(\theta_{gs} \neq 0) \ge r\}$.
Efron (2001) proposed the local FDR, here $\xi_g = \Pr(\vec\theta_g \in \Omega^{0}_{\bar A} \mid Z) = 1 - \Pr(\vec\theta_g \in \Omega^{1}_{\bar A} \mid Z)$.
Given a threshold $\kappa$, we declare gene $g$ a DE gene if $\xi_g \le \kappa$; the expected number of false discoveries is $\sum_g \xi_g I(\xi_g \le \kappa)$.
The Bayesian false discovery rate (FDR) (Newton 2004) is defined as
$\mathrm{FDR}(\kappa) = \dfrac{\sum_g \xi_g I(\xi_g \le \kappa)}{\sum_g I(\xi_g \le \kappa)}$.
We compare the FDR of our Bayesian approach with the frequentist FDR (Benjamini-Hochberg).
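The Bayesian FDR formula translates almost line for line into code. In the sketch below the local FDR values are made up, and the search for the largest threshold $\kappa$ that keeps the Bayesian FDR under a target is one straightforward way to use the definition, not necessarily the procedure used by the authors.

```python
def bayesian_fdr(local_fdr, kappa):
    """Return (Bayesian FDR, number of declared genes) at threshold kappa."""
    selected = [xi for xi in local_fdr if xi <= kappa]
    if not selected:
        return 0.0, 0
    # Expected false discoveries divided by the number of discoveries.
    return sum(selected) / len(selected), len(selected)


def largest_threshold(local_fdr, target=0.05):
    """Largest observed xi_g usable as kappa with Bayesian FDR <= target, or None."""
    best = None
    for kappa in sorted(set(local_fdr)):
        fdr, _ = bayesian_fdr(local_fdr, kappa)
        if fdr <= target:
            best = kappa
    return best


# Hypothetical local FDR values xi_g for six genes.
xi = [0.01, 0.02, 0.03, 0.2, 0.6, 0.9]
kappa = largest_threshold(xi, target=0.05)
fdr, n_declared = bayesian_fdr(xi, kappa)
```

In this toy example three genes are declared, because adding the fourth would raise the average local FDR of the declared set above 5%.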

12 Biomarker clustering for meta-patterns of homogeneous and heterogeneous differential signals (Problem 2)
- Denote by $U_{gs}$ the posterior probability vector for $Y_{gs}$: $U_{gs} = \big(\Pr(Y_{gs} = 1 \mid Z), \Pr(Y_{gs} = -1 \mid Z), \Pr(Y_{gs} = 0 \mid Z)\big)$.
- For each pair of genes $i$ and $j$, calculate the dissimilarity between $U_{is}$ and $U_{js}$ in study $s$, then average over the study index $s$.
- Apply tight clustering (Tseng and Wong) to the gene-gene dissimilarity matrix to obtain stable modules.
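The slide does not state which dissimilarity is used between $U_{is}$ and $U_{js}$. The sketch below assumes a symmetrized Kullback-Leibler divergence purely for illustration, with a small floor `eps` to avoid taking the log of zero; the posterior vectors are invented, and the tight clustering step itself is not shown.

```python
import math


def symmetric_kl(u, v, eps=1e-10):
    """Symmetrized KL divergence KL(u||v) + KL(v||u) between two probability vectors."""
    total = 0.0
    for a, b in zip(u, v):
        a, b = max(a, eps), max(b, eps)  # floor is an illustrative safeguard
        total += (a - b) * math.log(a / b)
    return total


def gene_dissimilarity(u_i, u_j):
    """Average the per-study dissimilarity between two genes over all studies."""
    per_study = [symmetric_kl(a, b) for a, b in zip(u_i, u_j)]
    return sum(per_study) / len(per_study)


# Hypothetical posterior vectors (P(Y=1), P(Y=-1), P(Y=0)) for three studies.
gene_a = [(0.90, 0.05, 0.05), (0.85, 0.05, 0.10), (0.80, 0.10, 0.10)]
gene_b = [(0.88, 0.06, 0.06), (0.80, 0.10, 0.10), (0.82, 0.08, 0.10)]
gene_c = [(0.05, 0.05, 0.90), (0.05, 0.90, 0.05), (0.10, 0.10, 0.80)]

genes = [gene_a, gene_b, gene_c]
# Gene-gene dissimilarity matrix that a clustering method would take as input.
dissimilarity = [[gene_dissimilarity(gi, gj) for gj in genes] for gi in genes]
```

Genes with similar posterior patterns across studies, such as `gene_a` and `gene_b`, end up close together, while a gene with a study-specific pattern such as `gene_c` is far from both.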

13 Simulation (FDR)
Table: Comparison of different methods by FDR for decision spaces $D_{\bar A}$, $D_B$, and $D_r$ ($r = S/2 + 1$). The nominal FDR is 5% for all compared methods. The mean results and SD (in parentheses) were calculated based on 100 simulations. Each row is one $(S, \sigma)$ setting; only the SDs are legible.

$D_{\bar A}$          $D_B$                          $D_r$
BayesMP   maxp      BayesMP   Fisher    AW        BayesMP   rop
(0.008)   (0.013)   (0.006)   (0.005)   (0.004)   (0.005)   (0.008)
(0.012)   (0.016)   (0.008)   (0.006)   (0.006)   (0.006)   (0.010)
(0.018)   (0.021)   (0.010)   (0.008)   (0.009)   (0.008)   (0.015)
(0.009)   (0.017)   (0.005)   (0.004)   (0.004)   (0.005)   (0.008)
(0.016)   (0.023)   (0.006)   (0.005)   (0.005)   (0.007)   (0.008)
(0.032)   (0.035)   (0.008)   (0.008)   (0.008)   (0.008)   (0.013)
(0.019)   (0.023)   (0.004)   (0.004)   (0.004)   (0.005)   (0.010)
(0.029)   (0.027)   (0.006)   (0.005)   (0.005)   (0.009)   (0.012)
(0.063)   (0.038)   (0.007)   (0.006)   (0.006)   (0.009)   (0.014)

14 Simulation (AUC)
Table: Comparison of different methods by AUC of ROC curve for decision spaces $D_{\bar A}$, $D_B$, and $D_r$ ($r = S/2 + 1$). The nominal FDR is 5% for all compared methods. The mean results and SD (in parentheses) were calculated based on 100 simulations. Each row is one $(S, \sigma)$ setting; only the SDs are legible.

$D_{\bar A}$          $D_B$                          $D_r$
BayesMP   maxp      BayesMP   Fisher    AW        BayesMP   rop
(0.003)   (0.003)   (0.002)   (0.002)   (0.002)   (0.002)   (0.003)
(0.006)   (0.007)   (0.004)   (0.004)   (0.004)   (0.004)   (0.005)
(0.008)   (0.008)   (0.005)   (0.005)   (0.005)   (0.006)   (0.006)
(0.004)   (0.003)   (0.002)   (0.002)   (0.002)   (0.002)   (0.002)
(0.007)   (0.006)   (0.004)   (0.004)   (0.004)   (0.004)   (0.005)
(0.009)   (0.009)   (0.005)   (0.005)   (0.005)   (0.005)   (0.006)
(0.007)   (0.003)   (0.001)   (0.002)   (0.001)   (0.002)   (0.002)
(0.011)   (0.006)   (0.003)   (0.003)   (0.003)   (0.004)   (0.004)
(0.013)   (0.010)   (0.004)   (0.004)   (0.005)   (0.005)   (0.006)

15 Mouse Metabolism data
Table: Sample size description

Study       Wild type   VLCAD-deficient
Brown fat   4           4
Heart       3           4
Liver       4           4

- Metabolism disorder in children.
- Two genotypes of the mouse model, wild type (VLCAD +/+) and VLCAD-deficient (VLCAD -/-), were studied.
- The total number of genes from these three transcriptomic studies is 14,495.
- Under $D_B$ at FDR 5%, we declared 1,701 genes.
- Under $D_{\bar A}$ at FDR 5%, we declared 133 genes.

16 Mouse Metabolism data meta-pattern
[Figure: modules I to VI across brown fat, heart, and liver. Panels: (a) heatmap, (b) CS, (c) bar plot with categories Brown+, Heart+, Liver+, Brown-, Heart-, Liver-. Module sizes as listed in the figure: n = 277, 195, 194, 140, 276, 110.]

17 Mouse Metabolism data pathway enrichment analysis
Table: module information

Module     Target pathway                         q value
module 1   KEGG LYSOSOME                          q =
module 2   BIOCARTA AHSP PATHWAY                  q =
module 3   DEFENSE RESPONSE                       q =
module 4   BIOCARTA MCM PATHWAY                   q =
module 5   none
module 6   FC GAMMA R MEDIATED PHAGOCYTOSIS       q =

18 Mouse Metabolism data: $D_{\bar A}$ at FDR 5%, 133 genes
[Figure panels: (a) Brown, (b) Heart, (c) Liver.]
Figure: Heatmaps of 133 DE genes detected under $D_{\bar A}$ (at FDR level of 5%) in the mouse metabolism dataset.

19 Summary
Novelty:
1. The p-value based method can combine data from different microarray and RNA-seq platforms.
2. The Bayesian framework provides complementary decision-making spaces.
3. The non-parametric Bayesian framework makes the method robust against distributional assumptions.
4. Meta-patterns help characterize heterogeneity among studies of the same disease with different phenotypes.
Performance:
1. Better performance than current meta-analysis hypothesis testing methods (AUC, FDR, etc.).
2. Computing is fast because of conjugacy.
3. Implemented in C++ and publicly available on GitHub.


More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden

Clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein. Some slides adapted from Jacques van Helden Clustering Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein Some slides adapted from Jacques van Helden Gene expression profiling A quick review Which molecular processes/functions

More information

The locfdr Package. August 19, hivdata... 1 lfdrsim... 2 locfdr Index 5

The locfdr Package. August 19, hivdata... 1 lfdrsim... 2 locfdr Index 5 Title Computes local false discovery rates Version 1.1-2 The locfdr Package August 19, 2006 Author Bradley Efron, Brit Turnbull and Balasubramanian Narasimhan Computation of local false discovery rates

More information

Statistical analysis of microarray data: a Bayesian approach

Statistical analysis of microarray data: a Bayesian approach Biostatistics (003), 4, 4,pp. 597 60 Printed in Great Britain Statistical analysis of microarray data: a Bayesian approach RAPHAEL GTTARD University of Washington, Department of Statistics, Box 3543, Seattle,

More information

Chapter 10. Semi-Supervised Learning

Chapter 10. Semi-Supervised Learning Chapter 10. Semi-Supervised Learning Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Outline

More information

Differential Modeling for Cancer Microarray Data

Differential Modeling for Cancer Microarray Data Differential Modeling for Cancer Microarray Data Omar Odibat Department of Computer Science Feb, 01, 2011 1 Outline Introduction Cancer Microarray data Problem Definition Differential analysis Existing

More information

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Gene Ontology and Functional Enrichment. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein The parsimony principle: A quick review Find the tree that requires the fewest

More information

PB HLTH 240A: Advanced Categorical Data Analysis Fall 2007

PB HLTH 240A: Advanced Categorical Data Analysis Fall 2007 Cohort study s formulations PB HLTH 240A: Advanced Categorical Data Analysis Fall 2007 Srine Dudoit Division of Biostatistics Department of Statistics University of California, Berkeley www.stat.berkeley.edu/~srine

More information

Identifying Bio-markers for EcoArray

Identifying Bio-markers for EcoArray Identifying Bio-markers for EcoArray Ashish Bhan, Keck Graduate Institute Mustafa Kesir and Mikhail B. Malioutov, Northeastern University February 18, 2010 1 Introduction This problem was presented by

More information

Lecture 27. December 13, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 27. December 13, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Statistical testing. Samantha Kleinberg. October 20, 2009

Statistical testing. Samantha Kleinberg. October 20, 2009 October 20, 2009 Intro to significance testing Significance testing and bioinformatics Gene expression: Frequently have microarray data for some group of subjects with/without the disease. Want to find

More information

A Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification

A Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification A Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification Ming Yuan and Christina Kendziorski (March 17, 2005) Abstract Although both clustering and identification

More information

CHOOSING THE LESSER EVIL: TRADE-OFF BETWEEN FALSE DISCOVERY RATE AND NON-DISCOVERY RATE

CHOOSING THE LESSER EVIL: TRADE-OFF BETWEEN FALSE DISCOVERY RATE AND NON-DISCOVERY RATE Statistica Sinica 18(2008), 861-879 CHOOSING THE LESSER EVIL: TRADE-OFF BETWEEN FALSE DISCOVERY RATE AND NON-DISCOVERY RATE Radu V. Craiu and Lei Sun University of Toronto Abstract: The problem of multiple

More information

Weighted gene co-expression analysis. Yuehua Cui June 7, 2013

Weighted gene co-expression analysis. Yuehua Cui June 7, 2013 Weighted gene co-expression analysis Yuehua Cui June 7, 2013 Weighted gene co-expression network (WGCNA) A type of scale-free network: A scale-free network is a network whose degree distribution follows

More information

Adaptive Filtering Procedures for Replicability Analysis of High-throughput Experiments

Adaptive Filtering Procedures for Replicability Analysis of High-throughput Experiments Adaptive Filtering Procedures for Replicability Analysis of High-throughput Experiments Jingshu Wang 1, Weijie Su 1, Chiara Sabatti 2, and Art B. Owen 2 1 Department of Statistics, University of Pennsylvania

More information

Alpha-Investing. Sequential Control of Expected False Discoveries

Alpha-Investing. Sequential Control of Expected False Discoveries Alpha-Investing Sequential Control of Expected False Discoveries Dean Foster Bob Stine Department of Statistics Wharton School of the University of Pennsylvania www-stat.wharton.upenn.edu/ stine Joint

More information

Bayesian Regression (1/31/13)

Bayesian Regression (1/31/13) STA613/CBB540: Statistical methods in computational biology Bayesian Regression (1/31/13) Lecturer: Barbara Engelhardt Scribe: Amanda Lea 1 Bayesian Paradigm Bayesian methods ask: given that I have observed

More information

A BAYESIAN STEPWISE MULTIPLE TESTING PROCEDURE. By Sanat K. Sarkar 1 and Jie Chen. Temple University and Merck Research Laboratories

A BAYESIAN STEPWISE MULTIPLE TESTING PROCEDURE. By Sanat K. Sarkar 1 and Jie Chen. Temple University and Merck Research Laboratories A BAYESIAN STEPWISE MULTIPLE TESTING PROCEDURE By Sanat K. Sarar 1 and Jie Chen Temple University and Merc Research Laboratories Abstract Bayesian testing of multiple hypotheses often requires consideration

More information

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model

More information

A Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data

A Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data A Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data Juliane Schäfer Department of Statistics, University of Munich Workshop: Practical Analysis of Gene Expression Data

More information

Semiparametric Varying Coefficient Models for Matched Case-Crossover Studies

Semiparametric Varying Coefficient Models for Matched Case-Crossover Studies Semiparametric Varying Coefficient Models for Matched Case-Crossover Studies Ana Maria Ortega-Villa Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial

More information

Sample Size Estimation for Studies of High-Dimensional Data

Sample Size Estimation for Studies of High-Dimensional Data Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,

More information

29 Sample Size Choice for Microarray Experiments

29 Sample Size Choice for Microarray Experiments 29 Sample Size Choice for Microarray Experiments Peter Müller, M.D. Anderson Cancer Center Christian Robert and Judith Rousseau CREST, Paris Abstract We review Bayesian sample size arguments for microarray

More information

Microarray Data Analysis: Discovery

Microarray Data Analysis: Discovery Microarray Data Analysis: Discovery Lecture 5 Classification Classification vs. Clustering Classification: Goal: Placing objects (e.g. genes) into meaningful classes Supervised Clustering: Goal: Discover

More information

Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks

Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2009 Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks T. Tony Cai University of Pennsylvania

More information

Large-Scale Multiple Testing of Correlations

Large-Scale Multiple Testing of Correlations University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-5-2016 Large-Scale Multiple Testing of Correlations T. Tony Cai University of Pennsylvania Weidong Liu Follow this

More information

Bayesian Methods for Highly Correlated Data. Exposures: An Application to Disinfection By-products and Spontaneous Abortion

Bayesian Methods for Highly Correlated Data. Exposures: An Application to Disinfection By-products and Spontaneous Abortion Outline Bayesian Methods for Highly Correlated Exposures: An Application to Disinfection By-products and Spontaneous Abortion November 8, 2007 Outline Outline 1 Introduction Outline Outline 1 Introduction

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Bayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University

Bayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University Bayesian Inference and the Parametric Bootstrap Bradley Efron Stanford University Importance Sampling for Bayes Posterior Distribution Newton and Raftery (1994 JRSS-B) Nonparametric Bootstrap: good choice

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information