Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018
1 Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals (BayesMP)
Zhiguang Huo 1, Chi Song 2, George Tseng 3
1 Department of Biostatistics, University of Florida
2 Department of Biostatistics, the Ohio State University
3 Department of Biostatistics, University of Pittsburgh
July 30, 2018
2 Background for data integration (Tseng et al. 2012)
Horizontal meta-analysis: the same type of genomic data from multiple patient cohorts.
Vertical integrative analysis: multiple types of genomic data from the same patient cohort.
BayesMP: differential expression (DE) analysis.
3 Background for meta-analysis of DE analysis
According to Tseng et al. (2012), there are four major categories of transcriptomic meta-analysis:
1. Combine effect sizes: fixed effects models, random effects models.
2. Combine p-values:
   p-value aggregation methods: Fisher (Fisher, 1925), Stouffer (Stouffer, 1949);
   order statistics: minp (Tippett, 1931), maxp (Wilkinson, 1951), rop (Song, 2014).
3. Combine ranks: ranksum, rankprod (Hong et al., 2006).
4. Direct merge.
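4 Combine p-values
Combining p-values is simple and powerful, and it is unaffected by between-study batch effects because each p-value is computed within a single study.
Table: p-value combining method. E.g. combine $p_{11}, p_{12}, \ldots, p_{1S}$.
Genes | Study 1 | Study 2 | ... | Study S
1 | $p_{11}$ | $p_{12}$ | ... | $p_{1S}$
2 | $p_{21}$ | $p_{22}$ | ... | $p_{2S}$
3 | $p_{31}$ | $p_{32}$ | ... | $p_{3S}$
... | ... | ... | ... | ...
G | $p_{G1}$ | $p_{G2}$ | ... | $p_{GS}$
Genomic meta-analysis:
1. Apply the p-value combining method gene by gene.
2. Adjust for multiple comparisons.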
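The gene-wise workflow on this slide can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names and the example p-value matrix are hypothetical, Stouffer's method is shown with equal weights, and Benjamini-Hochberg is used as one possible multiple-comparison adjustment.

```python
import math
from statistics import NormalDist

def fisher_combine(pvals):
    """Fisher's method: -2 * sum(log p) is chi-square with 2S df under the null."""
    s = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # statistic divided by 2
    # For even df = 2S the chi-square survival function has a closed form:
    # exp(-x/2) * sum_{k=0}^{S-1} (x/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, s):
        term *= half_stat / k
        total += term
    return min(1.0, math.exp(-half_stat) * total)

def stouffer_combine(pvals):
    """Stouffer's method on one-sided p-values, equal weights (an assumption)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical G x S matrix of p-values (3 genes, 3 studies).
p_matrix = [
    [0.001, 0.004, 0.020],
    [0.300, 0.700, 0.500],
    [0.010, 0.600, 0.030],
]
fisher_p = [fisher_combine(row) for row in p_matrix]      # step 1, gene by gene
stouffer_p = [stouffer_combine(row) for row in p_matrix]
fisher_q = benjamini_hochberg(fisher_p)                   # step 2, adjust
```

Both combiners reduce to the original p-value when a gene has a single study, which is a quick sanity check on the implementation.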
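5 Motivation 1: Hypothesis testing setting
$\theta_s$ is the effect size of study $s$, $1 \le s \le S$.
$HS_B$ targets biomarkers that are DE in one or more studies: $H_0: \vec{\theta} \in \bigcap_s \{\theta_s = 0\}$ vs $H_A: \vec{\theta} \in \bigcup_s \{\theta_s \neq 0\}$. Methods: Fisher, minp.
$HS_A$ targets biomarkers that are DE in all studies: $H_0: \vec{\theta} \in \bigcap_s \{\theta_s = 0\}$ vs $H_A: \vec{\theta} \in \bigcap_s \{\theta_s \neq 0\}$. Method: maxp.
$HS_r$ targets biomarkers that are DE in $r$ or more studies: $H_0: \vec{\theta} \in \bigcap_s \{\theta_s = 0\}$ vs $H_A: \sum_{s=1}^{S} I\{\theta_s \neq 0\} \ge r$. Method: rop.
Problem: in $HS_A$ and $HS_r$ the null and alternative hypotheses are not complementary, so together they do not cover the whole parameter space.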
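The three order-statistic methods named on this slide can be computed by one function, since under the null the r-th smallest of S independent uniform p-values follows a Beta(r, S - r + 1) distribution. The sketch below is illustrative only; `rop_pvalue` and the example p-values are hypothetical names and numbers.

```python
from math import comb

def rop_pvalue(pvals, r):
    """p-value of the r-th smallest of S independent p-values.

    The Beta(r, S - r + 1) CDF at x equals P(Binomial(S, x) >= r),
    which avoids any special-function library.
    """
    s = len(pvals)
    x = sorted(pvals)[r - 1]
    return sum(comb(s, j) * x**j * (1.0 - x) ** (s - j) for j in range(r, s + 1))

pvals = [0.01, 0.20, 0.03, 0.04]      # hypothetical p-values of one gene, S = 4
minp = rop_pvalue(pvals, 1)           # r = 1: minp, equals 1 - (1 - p_(1))^S
maxp = rop_pvalue(pvals, len(pvals))  # r = S: maxp, equals p_(S)^S
rop3 = rop_pvalue(pvals, 3)           # rop with r = 3
```

Setting r = 1 and r = S recovers minp and maxp, so rop interpolates between the two settings.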
6 Motivation 2: differential expression from multiple tissues
[Figure: heatmap of gene modules I to VI across three tissues: brown fat, heart, liver.]
Phenotypes: black, wild type; red, VLCAD-deficient.
Differential expression patterns:
Homogeneous differential expression pattern (Modules I, II).
Study-specific differential expression pattern (Modules III, IV, V, VI).
How can we categorize meta-analysis differential expression patterns (meta-patterns)?
7 Z statistics and their distribution
[Figure: Z statistics distribution in one study. Black line: null component. Red line: positive DE component. Blue line: negative DE component.]
$p_{gs}$ is the one-sided p-value for gene $g$ and study $s$.
$Z_{gs} = \Phi^{-1}(p_{gs})$, where $\Phi^{-1}(\cdot)$ is the inverse cumulative distribution function (CDF) of the standard Gaussian distribution.
Null component: assume a standard Gaussian distribution or an empirical null (Efron, 2004).
Alternative component: Dirichlet process.
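As a small illustration of the transform $Z_{gs} = \Phi^{-1}(p_{gs})$, the sketch below uses the standard library's normal distribution. The clipping constant `eps` is an assumption added here to keep Z finite when a p-value is exactly 0 or 1; it is not part of the slides.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def p_to_z(p, eps=1e-12):
    """Z = Phi^{-1}(p); p is clipped away from 0 and 1 (assumed safeguard)."""
    p = min(max(p, eps), 1.0 - eps)
    return std_normal.inv_cdf(p)

one_sided_p = [0.001, 0.5, 0.999]   # hypothetical p-values from one study
z_scores = [p_to_z(p) for p in one_sided_p]
```

With this convention, p-values near 0 map to large negative Z and p-values near 1 to large positive Z, so the direction of the one-sided test determines which tail corresponds to up-regulation.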
8 Multiple studies
[Figure: Z statistics distribution in three studies. Panels: (a) Study 1, (b) Study 2, (c) Study 3.]
$Y_{gs} \in \{-1, 0, 1\}$ is the DE indicator:
$f^{(s)}(Z_{gs} \mid Y_{gs}) = f^{(s)}_{0}(Z_{gs}) \, I(Y_{gs} = 0) + f^{(s)}_{+1}(Z_{gs}) \, I(Y_{gs} = 1) + f^{(s)}_{-1}(Z_{gs}) \, I(Y_{gs} = -1)$.
Prior: $Y_{gs} \sim \mathrm{Mult}\big(1, (1 - \pi_g, \pi_g^{+}, \pi_g^{-})\big) \cdot (0, 1, -1)^{\top}$, where $\pi_g^{+} = \pi_g \delta_g$ and $\pi_g^{-} = \pi_g (1 - \delta_g)$.
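To make the three-component structure concrete, the following sketch applies Bayes' rule to one Z value with fixed component densities. It is a simplification under stated assumptions: the slides model the DE components with a Dirichlet process, whereas here they are replaced by fixed Gaussians, and all names and numbers are hypothetical.

```python
from statistics import NormalDist

def class_probs(pi_g, delta_g):
    """Prior probabilities of Y = 0, +1, -1 for one gene."""
    return {0: 1.0 - pi_g, 1: pi_g * delta_g, -1: pi_g * (1.0 - delta_g)}

def posterior_y(z, pi_g, delta_g, components):
    """Pr(Y = k | Z = z) by Bayes' rule with fixed component densities."""
    prior = class_probs(pi_g, delta_g)
    weights = {k: prior[k] * components[k].pdf(z) for k in prior}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Assumed stand-ins for f_0, f_{+1}, f_{-1}; not the Dirichlet process model.
components = {0: NormalDist(0, 1), 1: NormalDist(2.5, 1), -1: NormalDist(-2.5, 1)}
post = posterior_y(z=3.0, pi_g=0.2, delta_g=0.5, components=components)
```

Here $\pi_g$ controls how likely the gene is to be DE at all and $\delta_g$ splits that probability between the positive and negative directions.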
9 Graphical Model
[Figure: graphical model with nodes $G_{0+}$, $G_{0-}$, $\gamma$, $\beta$, $\alpha$, $\pi_g$, $\delta_g$, $G_{s+}$, $G_{s-}$, $Y_{gs}$, $f^{(s)}_{k+}$, $f^{(s)}_{k-}$, $f^{(s)}_{0}$, $Z_{gs}$.]
Figure: Graphical representation of the Bayesian latent hierarchical model. Shaded nodes are observed variables. Dashed nodes are pre-estimated or fixed parameters. Arrows represent the generative process. Dashed lines represent equivalent variables. $s$ is the study index and $g$ is the gene index.
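10 Bayesian computing
1. Update the $\pi_g$'s: $\pi_g \mid Y_{gs} \sim \mathrm{Beta}\big(\gamma/(G - \gamma) + Y_g^{+} + Y_g^{-}, \; S - Y_g^{+} - Y_g^{-} + 1\big)$, where $Y_g^{+} = \sum_s I(Y_{gs} = 1)$ and $Y_g^{-} = \sum_s I(Y_{gs} = -1)$.
2. Update the $\delta_g$'s: $\delta_g \mid Y_{gs} \sim \mathrm{Beta}(\beta + Y_g^{+}, \; \beta + Y_g^{-})$.
3. Update the $Y_{gs}$'s: first update the $C_{gs}$'s such that
$\Pr(C_{gs} = k \mid C_{-g,s}, Z_{gs}, \pi_g^{\pm}) \propto h^{(s)}_{k}(Z_{gs} \mid C_{-g,s}) \, (\pi_g^{+})^{I(k>0)} (\pi_g^{-})^{I(k<0)} (1 - \pi_g)^{I(k=0)}$,
then set $Y_{gs} = \mathrm{sgn}(C_{gs})$.
Conjugacy makes the Bayesian computation fast.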
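Steps 1 and 2 above are plain Beta draws, which is where the speed from conjugacy comes from. The sketch below shows only those two steps for a single gene; step 3 (the component labels $C_{gs}$) is omitted. The function name and the hyperparameter values for $\gamma$, $\beta$ and $G$ are hypothetical choices for illustration.

```python
import random

def update_pi_delta(y_row, gamma, beta, n_genes, rng):
    """One Gibbs draw of (pi_g, delta_g) given the DE indicators of gene g.

    y_row holds Y_gs in {-1, 0, 1} for s = 1..S.
    """
    s = len(y_row)
    y_plus = sum(1 for y in y_row if y == 1)
    y_minus = sum(1 for y in y_row if y == -1)
    # Step 1: pi_g | Y ~ Beta(gamma/(G - gamma) + Y+ + Y-, S - Y+ - Y- + 1)
    pi_g = rng.betavariate(gamma / (n_genes - gamma) + y_plus + y_minus,
                           s - y_plus - y_minus + 1)
    # Step 2: delta_g | Y ~ Beta(beta + Y+, beta + Y-)
    delta_g = rng.betavariate(beta + y_plus, beta + y_minus)
    return pi_g, delta_g

rng = random.Random(0)
y_row = [1, 1, 0, -1]  # hypothetical indicators of one gene across S = 4 studies
draws = [update_pi_delta(y_row, gamma=50.0, beta=0.5, n_genes=1000, rng=rng)
         for _ in range(2000)]
mean_pi = sum(d[0] for d in draws) / len(draws)
mean_delta = sum(d[1] for d in draws) / len(draws)
```

For this example the draws come from Beta(3.05, 2) and Beta(2.5, 1.5), so the averages should settle near 0.60 and 0.625 respectively.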
11 Decision making framework (Problem 1)
For meta-analysis purposes, we declare as differentially expressed the genes that fall in one of the following spaces:
$\Omega_{\bar{A}}$: $\Omega^{1}_{\bar{A}} = \{\vec{\theta}_g : \sum_{s=1}^{S} I(\theta_{gs} \neq 0) = S\}$.
$\Omega_{B}$: $\Omega^{1}_{B} = \{\vec{\theta}_g : \sum_{s=1}^{S} I(\theta_{gs} \neq 0) \ge 1\}$.
$\Omega_{r}$: $\Omega^{1}_{r} = \{\vec{\theta}_g : \sum_{s=1}^{S} I(\theta_{gs} \neq 0) \ge r\}$.
Efron (2001) proposed the local FDR, here $\xi_g = \Pr(\vec{\theta}_g \in \Omega^{0}_{\bar{A}} \mid Z) = 1 - \Pr(\vec{\theta}_g \in \Omega^{1}_{\bar{A}} \mid Z)$.
Given a threshold $\kappa$, we declare gene $g$ a DE gene if $\xi_g \le \kappa$; the expected number of false discoveries is $\sum_g \xi_g I(\xi_g \le \kappa)$.
The Bayesian false discovery rate (FDR) (Newton 2004) is defined as $\sum_g \xi_g I(\xi_g \le \kappa) \,/\, \sum_g I(\xi_g \le \kappa)$.
We compare the Bayesian FDR of our approach with the frequentist FDR (Benjamini-Hochberg).
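The decision rule on this slide can be sketched as follows, assuming posterior draws of the DE indicators are available from the sampler. The function names, the way the threshold is searched, and the example values are hypothetical; the formulas for $\xi_g$ and the Bayesian FDR follow the definitions above.

```python
def local_fdr(y_samples, rule):
    """xi_g = 1 - Pr(gene in the alternative space | Z), estimated from MCMC draws.

    y_samples: list of draws, each a list of Y_gs in {-1, 0, 1} over the S studies.
    rule: minimum number of DE studies (S for Omega_A-bar, 1 for Omega_B, r for Omega_r).
    """
    hits = sum(1 for draw in y_samples
               if sum(1 for y in draw if y != 0) >= rule)
    return 1.0 - hits / len(y_samples)

def bayesian_fdr(xi, kappa):
    """Expected false discoveries divided by the number of declared genes."""
    declared = [x for x in xi if x <= kappa]
    return sum(declared) / len(declared) if declared else 0.0

def largest_threshold(xi, target):
    """Largest observed kappa whose Bayesian FDR stays at or below the target."""
    best = None
    for kappa in sorted(set(xi)):
        if bayesian_fdr(xi, kappa) <= target:
            best = kappa
    return best

xi = [0.01, 0.02, 0.03, 0.20, 0.60, 0.95]  # hypothetical local FDR values
kappa = largest_threshold(xi, target=0.05)
```

Because the Bayesian FDR is the average of the declared $\xi_g$ values, it can only grow as $\kappa$ increases, so scanning the sorted values finds the largest admissible threshold.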
12 Biomarker clustering for meta-patterns of homogeneous and heterogeneous differential signals (Problem 2)
Denote by $U_{gs}$ the posterior probability vector for $Y_{gs}$: $U_{gs} = \big(\Pr(Y_{gs} = 1 \mid Z), \Pr(Y_{gs} = -1 \mid Z), \Pr(Y_{gs} = 0 \mid Z)\big)$.
For genes $i$ and $j$, we calculate the dissimilarity between $U_{is}$ and $U_{js}$ in study $s$ and then average over the study index $s$.
We apply tight clustering (Tseng and Wong) to the gene-gene dissimilarity matrix to obtain stable modules.
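A gene-gene dissimilarity matrix of this kind could be assembled as in the sketch below. The slide does not name the dissimilarity measure, so a symmetrized Kullback-Leibler divergence is assumed here purely for illustration; the function names and the posterior vectors are hypothetical, and the clustering step itself is not shown.

```python
import math

def sym_kl(u, v, eps=1e-10):
    """Symmetrized Kullback-Leibler divergence (assumed measure, see lead-in)."""
    total = 0.0
    for a, b in zip(u, v):
        a, b = max(a, eps), max(b, eps)      # guard against log(0)
        total += (a - b) * math.log(a / b)   # KL(u||v) + KL(v||u), term by term
    return total

def gene_dissimilarity(u_i, u_j):
    """Average the per-study dissimilarity over the S studies."""
    return sum(sym_kl(a, b) for a, b in zip(u_i, u_j)) / len(u_i)

# Hypothetical U_gs vectors (Pr(Y=1|Z), Pr(Y=-1|Z), Pr(Y=0|Z)) for 3 genes, 2 studies.
U = [
    [(0.90, 0.05, 0.05), (0.85, 0.05, 0.10)],   # up-regulated in both studies
    [(0.88, 0.02, 0.10), (0.80, 0.10, 0.10)],   # similar pattern
    [(0.05, 0.05, 0.90), (0.05, 0.90, 0.05)],   # down-regulated in study 2 only
]
dissim = [[gene_dissimilarity(U[i], U[j]) for j in range(len(U))]
          for i in range(len(U))]
```

In this toy example the first two genes end up close to each other and far from the third, which is the structure a clustering method would then pick up.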
13 Simulation (FDR)
Table: Comparison of different methods by FDR for decision spaces $D_{\bar{A}}$, $D_B$, and $D_r$ ($r = S/2 + 1$). The nominal FDR is 5% for all compared methods. The mean results and SD (in parentheses) were calculated based on 100 simulations.
S | σ | $D_{\bar{A}}$ BayesMP | $D_{\bar{A}}$ maxp | $D_B$ BayesMP | $D_B$ Fisher | $D_B$ AW | $D_r$ BayesMP | $D_r$ rop
 | | (0.008) | (0.013) | (0.006) | (0.005) | (0.004) | (0.005) | (0.008)
 | | (0.012) | (0.016) | (0.008) | (0.006) | (0.006) | (0.006) | (0.010)
 | | (0.018) | (0.021) | (0.010) | (0.008) | (0.009) | (0.008) | (0.015)
 | | (0.009) | (0.017) | (0.005) | (0.004) | (0.004) | (0.005) | (0.008)
 | | (0.016) | (0.023) | (0.006) | (0.005) | (0.005) | (0.007) | (0.008)
 | | (0.032) | (0.035) | (0.008) | (0.008) | (0.008) | (0.008) | (0.013)
 | | (0.019) | (0.023) | (0.004) | (0.004) | (0.004) | (0.005) | (0.010)
 | | (0.029) | (0.027) | (0.006) | (0.005) | (0.005) | (0.009) | (0.012)
 | | (0.063) | (0.038) | (0.007) | (0.006) | (0.006) | (0.009) | (0.014)
14 Simulation (AUC)
Table: Comparison of different methods by AUC of the ROC curve for decision spaces $D_{\bar{A}}$, $D_B$, and $D_r$ ($r = S/2 + 1$). The nominal FDR is 5% for all compared methods. The mean results and SD (in parentheses) were calculated based on 100 simulations.
S | σ | $D_{\bar{A}}$ BayesMP | $D_{\bar{A}}$ maxp | $D_B$ BayesMP | $D_B$ Fisher | $D_B$ AW | $D_r$ BayesMP | $D_r$ rop
 | | (0.003) | (0.003) | (0.002) | (0.002) | (0.002) | (0.002) | (0.003)
 | | (0.006) | (0.007) | (0.004) | (0.004) | (0.004) | (0.004) | (0.005)
 | | (0.008) | (0.008) | (0.005) | (0.005) | (0.005) | (0.006) | (0.006)
 | | (0.004) | (0.003) | (0.002) | (0.002) | (0.002) | (0.002) | (0.002)
 | | (0.007) | (0.006) | (0.004) | (0.004) | (0.004) | (0.004) | (0.005)
 | | (0.009) | (0.009) | (0.005) | (0.005) | (0.005) | (0.005) | (0.006)
 | | (0.007) | (0.003) | (0.001) | (0.002) | (0.001) | (0.002) | (0.002)
 | | (0.011) | (0.006) | (0.003) | (0.003) | (0.003) | (0.004) | (0.004)
 | | (0.013) | (0.010) | (0.004) | (0.004) | (0.005) | (0.005) | (0.006)
15 Mouse Metabolism data
Table: Sample size description
Study | Wild type | VLCAD-deficient
Brown fat | 4 | 4
Heart | 3 | 4
Liver | 4 | 4
Metabolism disorder in children.
Two genotypes of the mouse model were studied: wild type (VLCAD +/+) and VLCAD-deficient (VLCAD -/-).
The total number of genes from these three transcriptomic studies is 14,495.
Under $D_B$ at FDR 5%, we declared 1,701 genes.
Under $D_{\bar{A}}$ at FDR 5%, we declared 133 genes.
16 Mouse Metabolism data: meta-pattern
[Figure: modules I to VI across brown fat, heart and liver. Panels: (a) heatmap, (b) CS, (c) bar plot over Brown+, Heart+, Liver+, Brown-, Heart-, Liver-. Module sizes in order of appearance: n = 277, 195, 194, 140, 276, 110.]
17 Mouse Metabolism data: pathway enrichment analysis
Table: module information
Module | Target pathway | q value
1 | KEGG LYSOSOME | q =
2 | BIOCARTA AHSP PATHWAY | q =
3 | DEFENSE RESPONSE | q =
4 | BIOCARTA MCM PATHWAY | q =
5 | none |
6 | FC GAMMA R MEDIATED PHAGOCYTOSIS | q =
18 Mouse Metabolism data: $D_{\bar{A}}$ at FDR 5%, 133 genes
[Figure panels: (a) Brown fat, (b) Heart, (c) Liver.]
Figure: Heatmaps of the 133 DE genes detected under $D_{\bar{A}}$ (at an FDR level of 5%) in the mouse metabolism dataset.
19 Summary
Novelty:
1. The p-value based method can combine data from different microarray and RNA-seq platforms.
2. The Bayesian framework provides complementary decision-making spaces.
3. The non-parametric Bayesian framework makes the method robust against distributional assumptions.
4. Meta-patterns help characterize heterogeneity among studies of the same disease with different phenotypes.
Performance:
1. Better performance than current meta-analysis hypothesis testing methods in the simulations (AUC, FDR, etc.).
2. Computing is fast because of conjugacy.
3. Implemented in C++ and publicly available on GitHub.
More informationA BAYESIAN STEPWISE MULTIPLE TESTING PROCEDURE. By Sanat K. Sarkar 1 and Jie Chen. Temple University and Merck Research Laboratories
A BAYESIAN STEPWISE MULTIPLE TESTING PROCEDURE By Sanat K. Sarar 1 and Jie Chen Temple University and Merc Research Laboratories Abstract Bayesian testing of multiple hypotheses often requires consideration
More informationSupplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control
Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model
More informationA Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data
A Practical Approach to Inferring Large Graphical Models from Sparse Microarray Data Juliane Schäfer Department of Statistics, University of Munich Workshop: Practical Analysis of Gene Expression Data
More informationSemiparametric Varying Coefficient Models for Matched Case-Crossover Studies
Semiparametric Varying Coefficient Models for Matched Case-Crossover Studies Ana Maria Ortega-Villa Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial
More informationSample Size Estimation for Studies of High-Dimensional Data
Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,
More information29 Sample Size Choice for Microarray Experiments
29 Sample Size Choice for Microarray Experiments Peter Müller, M.D. Anderson Cancer Center Christian Robert and Judith Rousseau CREST, Paris Abstract We review Bayesian sample size arguments for microarray
More informationMicroarray Data Analysis: Discovery
Microarray Data Analysis: Discovery Lecture 5 Classification Classification vs. Clustering Classification: Goal: Placing objects (e.g. genes) into meaningful classes Supervised Clustering: Goal: Discover
More informationSimultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2009 Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks T. Tony Cai University of Pennsylvania
More informationLarge-Scale Multiple Testing of Correlations
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-5-2016 Large-Scale Multiple Testing of Correlations T. Tony Cai University of Pennsylvania Weidong Liu Follow this
More informationBayesian Methods for Highly Correlated Data. Exposures: An Application to Disinfection By-products and Spontaneous Abortion
Outline Bayesian Methods for Highly Correlated Exposures: An Application to Disinfection By-products and Spontaneous Abortion November 8, 2007 Outline Outline 1 Introduction Outline Outline 1 Introduction
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationBayesian Inference and the Parametric Bootstrap. Bradley Efron Stanford University
Bayesian Inference and the Parametric Bootstrap Bradley Efron Stanford University Importance Sampling for Bayes Posterior Distribution Newton and Raftery (1994 JRSS-B) Nonparametric Bootstrap: good choice
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More information