Dispersion modeling for RNAseq differential analysis

E. Bonafede (1), F. Picard (2), S. Robin (3), C. Viroli (1)
(1) Univ. Bologna, (2) CNRS/Univ. Lyon I, (3) INRA/AgroParisTech, Paris
IBC, Victoria, July 2016
S. Robin (AgroParisTech / INRA) Dispersion modeling for RNAseq differential analysis IBC, Victoria 1 / 20

Outline
1. Modeling dispersion in RNAseq experiments
2. Statistical inference & test statistics
3. Simulations & illustration
4. Conclusions and future work

1. Modeling dispersion in RNAseq experiments

General problem
RNAseq is a sequencing-based technology that measures the expression level of all the genes of a given species in a given sample (condition).
Differential analysis: p genes, d conditions (possibly with replicates); find the genes whose expression varies across conditions.
Data: $Y_{ijr}$ = RNAseq read count for gene i in replicate r of condition j.

Negative binomial model
An RNAseq count is the number of reads mapped onto a gene's sequence. The observed variability often exceeds that expected under a Poisson model.
Popular model [14]: $Y_{ijr} \sim \mathcal{NB}(\lambda_{ij}, \alpha_i)$, where $1/\alpha_i$ is the over-dispersion parameter: $\mathbb{V}(Y_{ijr}) = \lambda_{ij}(1 + \lambda_{ij}/\alpha_i)$.
Differential analysis: for each gene i, test $H_0 = \{\lambda_{i1} = \lambda_{i2}\}$ vs $H_1 = \{\lambda_{i1} \neq \lambda_{i2}\}$.
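As a numerical check of this $(\lambda, \alpha)$ parametrization, the sketch below (standard-library Python; the function names are ours) evaluates the negative binomial pmf with mean $\lambda$ and shape $\alpha$, then sums it to recover the mean $\lambda$ and the variance $\lambda(1 + \lambda/\alpha)$. The truncation point `y_max` is an arbitrary choice that is ample for the values used here.

```python
import math

def nb_logpmf(y: int, lam: float, alpha: float) -> float:
    """log P(Y = y) for Y ~ NB(lam, alpha): mean lam, variance lam * (1 + lam / alpha)."""
    out = math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
    out += alpha * math.log(alpha / (alpha + lam))
    if y > 0:  # the y * log(...) term vanishes at y = 0
        out += y * math.log(lam / (alpha + lam))
    return out

def nb_moments(lam: float, alpha: float, y_max: int = 2000):
    """Total mass, mean and variance of NB(lam, alpha), summing the pmf over 0..y_max."""
    probs = [math.exp(nb_logpmf(y, lam, alpha)) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

total, mean, var = nb_moments(lam=10.0, alpha=2.0)
print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 10.0 60.0
```

As $\alpha$ grows the variance tends to $\lambda$, the Poisson limit: `nb_moments(5.0, 1e6)` gives a variance of about 5.000025.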

Modeling over-dispersion
Assumption on α: the same α for all genes is unrealistic; gene-specific $\alpha_i$ are hard to estimate, especially with few replicates.
Several approaches: shrinkage: edgeR [15,13], DSS [18]; function of the mean expression: [8], NBPSeq [5], DESeq [1,11]; Bayesian estimation: baySeq [6]; mixture model: DEXUS [7] (components = conditions); and many others: NOISeq [17], PoissonSeq [10], QuasiSeq [12], TSPM [2].
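A quick illustration of why gene-specific $\alpha_i$ are hard to estimate from few replicates. The naive method-of-moments estimator $\hat\alpha = m^2/(s^2 - m)$, obtained by inverting $\mathbb{V}(Y) = \lambda(1 + \lambda/\alpha)$ with the sample mean $m$ and sample variance $s^2$, is undefined (infinite, i.e. Poisson) whenever $s^2 \le m$, and it swings widely between similar-looking triplets of counts. This estimator and the counts below are our own toy choices, not what any of the packages listed above does.

```python
import math

def moment_alpha(counts):
    """Method-of-moments estimate alpha = m^2 / (s^2 - m), from V(Y) = lam * (1 + lam / alpha).

    Returns math.inf (no visible over-dispersion, i.e. Poisson) when s^2 <= m.
    """
    n = len(counts)
    m = sum(counts) / n
    s2 = sum((y - m) ** 2 for y in counts) / (n - 1)   # unbiased sample variance
    return math.inf if s2 <= m else m * m / (s2 - m)

# Three hypothetical triplets of replicates, all with sample mean 11.
for counts in ([5, 20, 8], [6, 17, 10], [10, 12, 11]):
    print(counts, moment_alpha(counts))
```

The three triplets share the same mean, yet give $\hat\alpha \approx 2.3$, $\hat\alpha = 6.05$ and no over-dispersion at all; this instability is what pooling information across genes tries to tame.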

Mixture model for the dispersion
Poisson-Gamma representation: $U \sim \mathcal{G}am(\alpha, \alpha)$, $Y \mid U \sim \mathcal{P}(\lambda U) \Rightarrow Y \sim \mathcal{NB}(\lambda, \alpha)$. The over-dispersion is generated by the latent variable U.
Mixture model for the dispersion: genes belong to K different dispersion groups.
Dispersion group of gene i: $Z_i \sim \mathcal{M}(1; (\omega_k))$;
latent dispersion variable: $U_{ijr} \mid Z_i = k \sim \mathcal{G}am(\alpha_k, \alpha_k)$;
observed count: $Y_{ijr} \mid U_{ijr} \sim \mathcal{P}(\lambda_{ij} U_{ijr})$.
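The three-level generative model can be simulated directly. Below is a minimal standard-library Python sketch. It assumes the shape/rate reading of $\mathcal{G}am(\alpha_k, \alpha_k)$, so that $U$ has mean 1 and variance $1/\alpha_k$. Python's `random.gammavariate` takes shape and *scale*, hence the `1 / a`. The Poisson sampler is Knuth's multiplication method, adequate only for moderate means. The function names are hypothetical.

```python
import math
import random

def poisson(mu, rng):
    """Knuth's multiplicative Poisson sampler; fine for moderate mu (exp(-mu) underflows past ~745)."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_gene(lam, n_rep, omega, alphas, rng):
    """Z_i ~ M(1; omega); U_ijr | Z_i = k ~ Gam(alpha_k, alpha_k); Y_ijr | U_ijr ~ P(lam[j] * U_ijr)."""
    k = rng.choices(range(len(omega)), weights=omega)[0]
    a = alphas[k]
    # gammavariate(shape, scale): rate alpha_k means scale 1 / alpha_k, so E[U] = 1 and V[U] = 1 / alpha_k.
    counts = [[poisson(lam_j * rng.gammavariate(a, 1.0 / a), rng) for _ in range(n_rep)]
              for lam_j in lam]
    return k, counts

rng = random.Random(1)
ys = [simulate_gene([5.0], 1, [1.0], [2.0], rng)[1][0][0] for _ in range(100_000)]
m = sum(ys) / len(ys)
v = sum((y - m) ** 2 for y in ys) / (len(ys) - 1)
print(round(m, 2), round(v, 2))   # should be close to 5 and 5 * (1 + 5 / 2) = 17.5
```

With a single group the empirical mean and variance match the negative binomial moments, which is a direct check of the Poisson-Gamma representation.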

Distributions
Figure: distributions of $U_{ijr} \sim \mathcal{G}am(\alpha_k, \alpha_k)$ and of the corresponding $Y_{ijr} \sim \mathcal{NB}(\lambda, \alpha_k)$.

Graphical representation
Figure: graphical model for gene i.
Marginal distributions: $Y_{ijr} \mid Z_i = k \sim \mathcal{NB}(\lambda_{ij}, \alpha_k)$, so that $Y_{ijr} \sim \sum_k \omega_k\, \mathcal{NB}(\lambda_{ij}, \alpha_k)$.
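Because all replicates of gene i share the same group label $Z_i$, the counts of one gene are not independent marginally. The gene-level likelihood is $\sum_k \omega_k \prod_{j,r} \mathcal{NB}(Y_{ijr}; \lambda_{ij}, \alpha_k)$, not a product of per-count mixtures. The minimal Python sketch below computes it on the log scale (log-sum-exp), together with the posterior probabilities $P(Z_i = k \mid Y_i)$. The function names and the example counts are hypothetical.

```python
import math

def nb_logpmf(y, lam, alpha):
    """log NB(y; lam, alpha): mean lam, variance lam * (1 + lam / alpha)."""
    out = math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
    out += alpha * math.log(alpha / (alpha + lam))
    if y > 0:
        out += y * math.log(lam / (alpha + lam))
    return out

def gene_loglik_and_posterior(counts, lam, omega, alphas):
    """counts[j][r] = Y_ijr and lam[j] = lambda_ij for one gene i.

    Returns log sum_k omega_k prod_{j,r} NB(Y_ijr; lambda_ij, alpha_k) and the
    posterior probabilities P(Z_i = k | Y_i), computed with the log-sum-exp trick.
    """
    logs = [math.log(w) + sum(nb_logpmf(y, lam_j, a)
                              for ys, lam_j in zip(counts, lam) for y in ys)
            for w, a in zip(omega, alphas)]
    top = max(logs)
    loglik = top + math.log(sum(math.exp(v - top) for v in logs))
    return loglik, [math.exp(v - loglik) for v in logs]

# Widely spread replicates favour the strongly over-dispersed group (alpha = 1).
ll, post = gene_loglik_and_posterior([[3, 25, 9], [40, 4, 18]], [12.0, 20.0],
                                     [0.5, 0.5], [1.0, 50.0])
print(round(ll, 2), [round(q, 3) for q in post])
```

The posterior probabilities computed here are exactly the conditional moments of $Z \mid Y$ that an EM algorithm needs.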

2. Statistical inference & test statistics

A two-hidden-layer model
Parameter $\theta = \{\omega = (\omega_k), \alpha = (\alpha_k), \lambda = (\lambda_{ij})\}$; hidden variables $H = \{Z = (Z_i), U = (U_{ijr})\}$; observed variables $Y = (Y_{ijr})$.
Likelihood decomposition [4]:
$$\begin{aligned}
\log p_\theta(Y) &= \log \int p_\theta(Y, H)\, dH \\
&= \mathbb{E}[\log p_\theta(Y, H) \mid Y] + \mathcal{H}[p_\theta(H \mid Y)] \\
&= \mathbb{E}[\log p_\omega(Z) + \log p_\alpha(U \mid Z) + \log p_\lambda(Y \mid U) \mid Y] + \mathcal{H}[p_\theta(H \mid Y)],
\end{aligned}$$
where $\mathcal{H}$ denotes the entropy.
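Spelling out the three terms of the complete-data log-likelihood shows what the E step has to deliver. Here $Z_{ik} = \mathbb{1}\{Z_i = k\}$, and $\mathcal{G}am(\alpha_k, \alpha_k)$ is read as shape $\alpha_k$, rate $\alpha_k$. This reading is our assumption, consistent with $\mathbb{V}(Y) = \lambda(1 + \lambda/\alpha)$.

$$
\begin{aligned}
\log p_\omega(Z) &= \sum_i \sum_k Z_{ik} \log \omega_k, \\
\log p_\alpha(U \mid Z) &= \sum_i \sum_k Z_{ik} \sum_{j,r} \big[\alpha_k \log \alpha_k - \log \Gamma(\alpha_k) + (\alpha_k - 1) \log U_{ijr} - \alpha_k U_{ijr}\big], \\
\log p_\lambda(Y \mid U) &= \sum_{i,j,r} \big[Y_{ijr} \log(\lambda_{ij} U_{ijr}) - \lambda_{ij} U_{ijr} - \log Y_{ijr}!\big].
\end{aligned}
$$

Each term is linear in $Z_{ik}$, $Z_{ik} U_{ijr}$ and $Z_{ik} \log U_{ijr}$, so only their conditional expectations given $Y$ are needed. By Gamma-Poisson conjugacy, $U_{ijr} \mid Y, Z_i = k \sim \mathcal{G}am(\alpha_k + Y_{ijr}, \alpha_k + \lambda_{ij})$. Hence $\mathbb{E}(U_{ijr} \mid Y, Z_i = k) = (\alpha_k + Y_{ijr})/(\alpha_k + \lambda_{ij})$ and $\mathbb{E}(\log U_{ijr} \mid Y, Z_i = k) = \psi(\alpha_k + Y_{ijr}) - \log(\alpha_k + \lambda_{ij})$, where $\psi$ is the digamma function.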

EM algorithm
Aim: find $\hat\theta = \arg\max_\theta \log p_\theta(Y)$.
E step: compute the conditional moments of $Z \mid Y$ (multinomial) and of $U \mid Y$ (mixture of Gammas).
M step: estimate the parameters: $\omega_k$ explicitly; $\alpha_k$ numerically, via quasi-Newton (or a fixed-point scheme); $\lambda_{ij}$ explicitly.
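The E and M steps can be sketched in standard-library Python. This is our own minimal reading of the algorithm, not the authors' MixtNB code. It rests on the following choices, all ours:
- the shape/rate Gamma parametrization;
- the conjugate posterior $U_{ijr} \mid Y, Z_i = k \sim \mathcal{G}am(\alpha_k + Y_{ijr}, \alpha_k + \lambda_{ij})$;
- a bisection on the concave $\alpha_k$ criterion, in place of the quasi-Newton step;
- a hand-rolled digamma;
- arbitrary starting values.

`digamma`, `solve_alpha`, `em_mixtnb` and the data generator `simulate` are hypothetical names.

```python
import math
import random

def digamma(x):
    """psi(x) for x > 0: psi(x) = psi(x + 1) - 1/x up to x >= 10, then the asymptotic series."""
    out = 0.0
    while x < 10.0:
        out -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return out + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def nb_logpmf(y, lam, a):
    """log NB(y; lam, a), mean lam, variance lam * (1 + lam / a); lam == 0 is fine when y == 0."""
    out = math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1) + a * math.log(a / (a + lam))
    return out + (y * math.log(lam / (a + lam)) if y > 0 else 0.0)

def solve_alpha(c, lo=1e-6, hi=1e8):
    """Root in a of log(a) + 1 - psi(a) + c = 0 (the alpha_k score), by bisection on log(a)."""
    def score(a):
        return math.log(a) + 1.0 + c - digamma(a)     # decreasing in a
    if score(hi) >= 0.0:                              # no interior root: group looks Poisson
        return hi
    lo_l, hi_l = math.log(lo), math.log(hi)
    for _ in range(80):
        mid = 0.5 * (lo_l + hi_l)
        if score(math.exp(mid)) > 0.0:
            lo_l = mid
        else:
            hi_l = mid
    return math.exp(0.5 * (lo_l + hi_l))

def em_mixtnb(Y, K, n_iter=50):
    """EM for K dispersion groups; Y[i][j][r] = count for gene i, condition j, replicate r.

    Returns (omega, alphas, lam, tau, trace); trace[t] = log p(Y) at the start of iteration t.
    """
    p = len(Y)
    lam = [[sum(ys) / len(ys) for ys in gene] for gene in Y]     # start: sample means
    omega = [1.0 / K] * K
    alphas = [10.0 ** (k / max(K - 1, 1)) for k in range(K)]     # arbitrary start: 1, ..., 10
    trace = []
    for _ in range(n_iter):
        # E step for Z: tau_ik = P(Z_i = k | Y_i), and the observed-data log-likelihood.
        tau, ll = [], 0.0
        for gene, lam_i in zip(Y, lam):
            logs = [(math.log(w) if w > 0 else -math.inf)
                    + sum(nb_logpmf(y, lij, a) for ys, lij in zip(gene, lam_i) for y in ys)
                    for w, a in zip(omega, alphas)]
            top = max(logs)
            li = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += li
            tau.append([math.exp(v - li) for v in logs])
        trace.append(ll)
        # E step for U: U_ijr | Y, Z_i = k ~ Gam(a + y, a + lij) (shape, rate), hence
        # E[U] = (a + y) / (a + lij) and E[log U] = psi(a + y) - log(a + lij).
        new_omega = [sum(t[k] for t in tau) / p for k in range(K)]            # explicit
        new_alphas = []
        for k, a in enumerate(alphas):
            N = A = 0.0
            for gene, lam_i, t in zip(Y, lam, tau):
                for ys, lij in zip(gene, lam_i):
                    for y in ys:
                        N += t[k]
                        A += t[k] * (digamma(a + y) - math.log(a + lij) - (a + y) / (a + lij))
            new_alphas.append(solve_alpha(A / N) if N > 0 else a)               # numerical
        new_lam = [[sum(ys) / sum(t[k] * (alphas[k] + y) / (alphas[k] + lij)
                                  for k in range(K) for y in ys)
                    for ys, lij in zip(gene, lam_i)]
                   for gene, lam_i, t in zip(Y, lam, tau)]                       # explicit
        omega, alphas, lam = new_omega, new_alphas, new_lam
    return omega, alphas, lam, tau, trace

def simulate(p, lam_pair, n_rep, omega, alphas, seed=1):
    """Hypothetical helper: draw p genes from the model (Knuth Poisson sampler, moderate means)."""
    rng = random.Random(seed)

    def poisson(mu):
        limit, k, prod = math.exp(-mu), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k

    Y = []
    for _ in range(p):
        a = alphas[rng.choices(range(len(omega)), weights=omega)[0]]            # draw Z_i
        Y.append([[poisson(lj * rng.gammavariate(a, 1.0 / a)) for _ in range(n_rep)]
                  for lj in lam_pair])
    return Y

Y = simulate(300, (10.0, 30.0), 5, (0.4, 0.6), (1.0, 20.0))
omega, alphas, lam, tau, trace = em_mixtnb(Y, K=2, n_iter=40)
print([round(w, 2) for w in omega], [round(a, 2) for a in alphas])
```

Each M step maximizes its own block of the expected complete-data log-likelihood, since $\alpha$ and $\lambda$ appear in separate terms. So `trace` should be non-decreasing, which is a cheap sanity check on any implementation.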

Three contrasts
Test: $H_0 = \{\lambda_{i1} = \lambda_{i2}\}$ vs $H_1 = \{\lambda_{i1} \neq \lambda_{i2}\}$.
Considered contrasts: difference $= \hat\lambda_{i1} - \hat\lambda_{i2}$; ratio $= \hat\lambda_{i1} / \hat\lambda_{i2}$; log-ratio $= \ln(\hat\lambda_{i1} / \hat\lambda_{i2})$.
Contrast variance: first-order approximation derived via the delta method.
Test statistic = Contrast $/ \sqrt{\mathbb{V}(\text{Contrast})}$.
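Below is a minimal sketch of the three delta-method statistics. It treats $\hat\lambda_{i1}$ and $\hat\lambda_{i2}$ as independent, with given approximate variances $v_1, v_2$; how those variances are obtained is left open here. First-order expansions give:
- $\mathbb{V}(\text{difference}) = v_1 + v_2$;
- $\mathbb{V}(\text{log-ratio}) \approx v_1/\lambda_{i1}^2 + v_2/\lambda_{i2}^2$;
- $\mathbb{V}(\text{ratio}) \approx (\lambda_{i1}/\lambda_{i2})^2\, \mathbb{V}(\text{log-ratio})$.

Several choices are our assumptions: we centre the ratio at its null value 1 before standardizing, we use a normal reference distribution, and the function name is hypothetical.

```python
import math

def contrast_tests(l1, l2, v1, v2):
    """Delta-method z-statistics and two-sided normal p-values for three contrasts.

    l1, l2: estimates of lambda_i1, lambda_i2 (assumed independent);
    v1, v2: their approximate variances (inputs; how to obtain them is left open here).
    """
    rel = v1 / l1 ** 2 + v2 / l2 ** 2            # sum of squared coefficients of variation
    ratio = l1 / l2
    z = {
        "difference": (l1 - l2) / math.sqrt(v1 + v2),             # V = v1 + v2
        "ratio": (ratio - 1.0) / math.sqrt(ratio ** 2 * rel),     # V ~ ratio^2 * rel; centred at 1
        "log_ratio": math.log(ratio) / math.sqrt(rel),            # V ~ rel
    }
    # Two-sided p-value 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return {name: (stat, math.erfc(abs(stat) / math.sqrt(2.0))) for name, stat in z.items()}

for name, (stat, pval) in contrast_tests(20.0, 10.0, 4.0, 1.0).items():
    print(f"{name:11s} z = {stat:.3f}  p = {pval:.2g}")
```

The three statistics generally differ for the same pair of estimates, because the linear approximation behind each variance is taken on a different scale.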

3. Simulations & illustration

Simulation design
compcodeR package [16]: independent, realistic RNAseq simulation. p = 1000 or 5000 genes (including 10% down- and 10% up-regulated); d = 2 conditions; $n_j$ = 3, 5, 10; library size = …
Evaluation criteria: type-I error control; ROC curve; estimation of the dispersion:
$$\hat{\mathbb{V}}(Y_{ijr}) = \hat\lambda_{ij}\Big(1 + \hat\lambda_{ij} \sum_k \mathbb{E}_{\hat\theta}(Z_{ik} \mid Y)/\hat\alpha_k\Big).$$
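All mixture components share the same mean $\lambda_{ij}$. The variance of $Y_{ijr}$ under the posterior group probabilities is therefore the $\mathbb{E}_{\hat\theta}(Z_{ik} \mid Y)$-weighted average of the component variances $\lambda_{ij}(1 + \lambda_{ij}/\alpha_k)$, which is the dispersion estimate used as an evaluation criterion. The small Python check below uses hypothetical numbers. The relative-error helper is our guess at a sensible error measure, not necessarily the one plotted in the results.

```python
def mixture_variance(lam, tau, alphas):
    """V(Y_ijr) = lam * (1 + lam * sum_k tau_k / alpha_k), with tau_k = E(Z_ik | Y)."""
    return lam * (1.0 + lam * sum(t / a for t, a in zip(tau, alphas)))

def relative_error(v_hat, v_true):
    """|v_hat - v_true| / v_true: one plausible error measure (assumption)."""
    return abs(v_hat - v_true) / v_true

lam, tau, alphas = 10.0, [0.25, 0.75], [1.0, 5.0]
v = mixture_variance(lam, tau, alphas)
# Same value as the tau-weighted average of the component variances lam * (1 + lam / alpha_k).
v_avg = sum(t * lam * (1.0 + lam / a) for t, a in zip(tau, alphas))
print(v, v_avg)
```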

Type-I error control (p = 1000)
Figure: type-I error of MixtNB, DESeq, edgeR, DSS, DEXUS, NBPSeq, TSPM, SAMseq, PoissonSeq and QuasiSeq, one panel per value of $n_j$.

ROC curves (p = 5000)
Figure: ROC curves (true positive rate vs false positive rate) for the MixtNB difference, ratio and log-ratio statistics, and for MixtNB compared with DESeq, edgeR and DSS; with DEXUS, NBPSeq and TSPM; and with SAMseq, PoissonSeq and QuasiSeq.

Estimation of dispersion (p = 5000)
Figure panels: relative estimation error of $\mathbb{V}(Y)$ as a function of the number K of mixture components; precision of the different methods (Common, DESeq2, DEXUS, DSS, edgeR, MixtNB, NBPSeq).

Illustration: data from [9]
p = … genes, d = 2 conditions (treated/control), $n_j$ = 3, 4; K = 3 dispersion groups (selected by BIC).
Table: number of declared differentially expressed genes at levels α = 5%, 1% and 0.1%, for the difference, ratio and log-ratio statistics and for DESeq, edgeR and DSS.
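The number of dispersion groups was selected by BIC. The sketch below shows that selection, assuming the usual form $-2 \log L + \nu \log n$ with $\nu = (K - 1) + K + pd$ free parameters. As our own assumption, the number of genes p plays the role of n. The maximized log-likelihoods would come from one EM fit per candidate K. The function names are hypothetical.

```python
import math

def bic(loglik, K, p, d):
    """BIC = -2 log L + nu * log(p) with nu = (K - 1) + K + p * d free parameters (omega, alpha, lambda).

    Using the number of genes p as the sample size in the penalty is an assumption.
    """
    nu = (K - 1) + K + p * d
    return -2.0 * loglik + nu * math.log(p)

def choose_K(logliks, p, d):
    """logliks maps K to the maximised log-likelihood of a K-group fit; returns the K with smallest BIC."""
    return min(logliks, key=lambda K: bic(logliks[K], K, p, d))
```

Only the $(K - 1) + K$ dispersion-related parameters change with K. The $pd$ mean parameters shift every BIC value by the same amount and do not affect which K is chosen.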

4. Conclusions and future work

Conclusions
Summary: a generic framework for RNAseq differential analysis, which can account for the library size via an offset $\mu_{jr}$; flexible modeling of over-dispersion via a mixture model; a genuine EM algorithm taking advantage of the Poisson-Gamma representation; state-of-the-art accuracy, control of the type-I error and estimation of the dispersion.
Published in Biometrics (2015) [3], with the R CRAN package MixtNB.
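The slides only say that library size enters through an offset $\mu_{jr}$. Our reading is that the offset multiplies the Poisson mean, $Y_{ijr} \mid U_{ijr} \sim \mathcal{P}(\mu_{jr} \lambda_{ij} U_{ijr})$. Under that assumption the $\lambda_{ij}$ update stays explicit, and the conjugate posterior becomes $\mathcal{G}am(\alpha_k + Y_{ijr}, \alpha_k + \mu_{jr} \lambda_{ij})$. The normalization of the offsets below (library size over mean library size) is a hypothetical convention, and the function names are ours.

```python
def library_offsets(lib_sizes):
    """Hypothetical convention: mu_jr = library size / mean library size (so offsets average 1)."""
    mean = sum(lib_sizes) / len(lib_sizes)
    return [s / mean for s in lib_sizes]

def lambda_update(counts, offsets, expected_u):
    """M step for lambda_ij when Y_ijr | U_ijr ~ P(mu_jr * lambda_ij * U_ijr).

    Maximising sum_r [y_r log(mu_r * lam) - mu_r * lam * E(U_r | Y)] gives
    lam = sum_r y_r / sum_r mu_r * E(U_r | Y). All arguments are indexed by replicate r.
    """
    return sum(counts) / sum(m * e for m, e in zip(offsets, expected_u))

mu = library_offsets([1e6, 2e6, 3e6])
print(mu, lambda_update([5, 10, 15], mu, [1.0, 1.0, 1.0]))  # [0.5, 1.0, 1.5] 10.0
```

Replicates with deeper sequencing get proportionally larger expected counts, so the estimated $\lambda_{ij}$ is on a common, depth-adjusted scale.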

Future work
Generalize to more complex designs (taking advantage of the GLM framework).
Negative binomial latent block model (LBM) for metagenomics: $Y_{ijr}$ = number of reads from species i in medium j (replicate r); simultaneous clustering of species and media using a variational EM algorithm for the Poisson-Gamma model.

Appendix: references
[1] S. Anders and W. Huber. Differential expression analysis for sequence count data. Genome Biology, 11(10):R106.
[2] P. L. Auer and R. W. Doerge. A two-stage Poisson model for testing RNA-seq data. Statistical Applications in Genetics and Molecular Biology, 10(1):1-26.
[3] E. Bonafede, F. Picard, S. Robin, and C. Viroli. Modeling overdispersion heterogeneity in differential expression analysis using mixtures. Biometrics.
[4] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B, 39:1-38.
[5] Y. Di, D. W. Schafer, J. S. Cumbie, and J. H. Chang. The NBP negative binomial model for assessing differential gene expression from RNA-Seq. Statistical Applications in Genetics and Molecular Biology, 10(1):1-28.
[6] T. Hardcastle and K. Kelly. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics, 11(422):1-15.
[7] G. Klambauer, T. Unterthiner, and S. Hochreiter. DEXUS: identifying differential expression in RNA-Seq studies with unknown conditions. Nucleic Acids Research, 42(21):1-11.
[8] C. W. Law, Y. Chen, W. Shi, and G. K. Smyth. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15:R29.
[9] H. Li, M. T. Lovci, Y. S. Kwon, M. G. Rosenfeld, X. D. Fu, and G. W. Yeo. Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proceedings of the National Academy of Sciences, 105(51).
[10] J. Li and R. Tibshirani. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, 22(5).
[11] M. I. Love, W. Huber, and S. Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550.
[12] S. P. Lund, D. Nettleton, D. J. McCarthy, and G. K. Smyth. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology, 11(5):8.
[13] D. J. McCarthy, Y. Chen, and G. K. Smyth. Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation. Nucleic Acids Research, 40(10).
[14] M. D. Robinson, D. J. McCarthy, and G. K. Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1).
[15] M. D. Robinson and G. K. Smyth. Small-sample estimation of negative binomial dispersion, with application to SAGE data. Biostatistics, 9.
[16] C. Soneson. compcodeR: an R package for benchmarking differential expression methods for RNA-seq data. Bioinformatics, 30(17).
[17] S. Tarazona, F. García-Alcalde, J. Dopazo, A. Ferrer, and A. Conesa. Differential expression in RNA-seq: a matter of depth. Genome Research, 21(12).
[18] H. Wu, C. Wang, and Z. Wu. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics, 14(2).


Lecture 6: Gaussian Mixture Models (GMM)

Lecture 6: Gaussian Mixture Models (GMM) Helsinki Institute for Information Technology Lecture 6: Gaussian Mixture Models (GMM) Pedram Daee 3.11.2015 Outline Gaussian Mixture Models (GMM) Models Model families and parameters Parameter learning

More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

EBSeq: An R package for differential expression analysis using RNA-seq data

EBSeq: An R package for differential expression analysis using RNA-seq data EBSeq: An R package for differential expression analysis using RNA-seq data Ning Leng, John Dawson, and Christina Kendziorski October 14, 2013 Contents 1 Introduction 2 2 Citing this software 2 3 The Model

More information

Estimation and Testing of Gene Expression Heterosis

Estimation and Testing of Gene Expression Heterosis Supplementary materials for this article are available at 1.17/s1353-14-173-. Estimation and Testing of Gene Expression Heterosis Tieming JI,Peng LIU,and Dan NETTLETON Heterosis, also known as the hybrid

More information

Differential Expression Analysis Techniques for Single-Cell RNA-seq Experiments

Differential Expression Analysis Techniques for Single-Cell RNA-seq Experiments Differential Expression Analysis Techniques for Single-Cell RNA-seq Experiments for the Computational Biology Doctoral Seminar (CMPBIO 293), organized by N. Yosef & T. Ashuach, Spring 2018, UC Berkeley

More information

Empirical likelihood tests for nonparametric detection of differential expression from RNA seq data

Empirical likelihood tests for nonparametric detection of differential expression from RNA seq data Empirical likelihood tests for nonparametric detection of differential expression from RNA seq data Article Accepted Version Thorne, T. (2015) Empirical likelihood tests for nonparametric detection of

More information

Statistical analysis of biological networks.

Statistical analysis of biological networks. Statistical analysis of biological networks. Assessing the exceptionality of network motifs S. Schbath Jouy-en-Josas/Evry/Paris, France http://genome.jouy.inra.fr/ssb/ Colloquium interactions math/info,

More information

The Expectation Maximization Algorithm & RNA-Sequencing

The Expectation Maximization Algorithm & RNA-Sequencing Senior Thesis in Mathematics The Expectation Maximization Algorithm & RNA-Sequencing Author: Maria Martinez Advisor: Dr. Johanna S. Hardin Submitted to Pomona College in Partial Fulfillment of the Degree

More information

What is the expectation maximization algorithm?

What is the expectation maximization algorithm? primer 2008 Nature Publishing Group http://www.nature.com/naturebiotechnology What is the expectation maximization algorithm? Chuong B Do & Serafim Batzoglou The expectation maximization algorithm arises

More information

Collaborative topic models: motivations cont

Collaborative topic models: motivations cont Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.

More information

Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets

Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets Matthias Katzfuß Advisor: Dr. Noel Cressie Department of Statistics The Ohio State University

More information

Gibbs Sampling Methods for Multiple Sequence Alignment

Gibbs Sampling Methods for Multiple Sequence Alignment Gibbs Sampling Methods for Multiple Sequence Alignment Scott C. Schmidler 1 Jun S. Liu 2 1 Section on Medical Informatics and 2 Department of Statistics Stanford University 11/17/99 1 Outline Statistical

More information

Mixture Models and Expectation-Maximization

Mixture Models and Expectation-Maximization Mixture Models and Expectation-Maximiation David M. Blei March 9, 2012 EM for mixtures of multinomials The graphical model for a mixture of multinomials π d x dn N D θ k K How should we fit the parameters?

More information

Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018

Zhiguang Huo 1, Chi Song 2, George Tseng 3. July 30, 2018 Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals BayesMP Zhiguang Huo 1, Chi Song 2, George Tseng

More information

Chapter 10. Semi-Supervised Learning

Chapter 10. Semi-Supervised Learning Chapter 10. Semi-Supervised Learning Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Outline

More information

One-shot Learning of Poisson Distributions Information Theory of Audic-Claverie Statistic for Analyzing cdna Arrays

One-shot Learning of Poisson Distributions Information Theory of Audic-Claverie Statistic for Analyzing cdna Arrays One-shot Learning of Poisson Distributions Information Theory of Audic-Claverie Statistic for Analyzing cdna Arrays Peter Tiňo School of Computer Science University of Birmingham, UK One-shot Learning

More information

Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study

Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study Journal: Manuscript ID: LSSP-00-0.R Manuscript Type: Original Paper Date Submitted by the Author: -May-0 Complete List

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

arxiv: v1 [stat.me] 6 Jun 2016

arxiv: v1 [stat.me] 6 Jun 2016 A gamma approximation to the Bayesian posterior distribution of a discrete parameter of the Generalized Poisson model arxiv:1606.01749v1 [stat.me] 6 Jun 2016 Tsung Fei Khang Institute of Mathematical Sciences,

More information

Models for Replicated Discrimination Tests: A Synthesis of Latent Class Mixture Models and Generalized Linear Mixed Models

Models for Replicated Discrimination Tests: A Synthesis of Latent Class Mixture Models and Generalized Linear Mixed Models Models for Replicated Discrimination Tests: A Synthesis of Latent Class Mixture Models and Generalized Linear Mixed Models Rune Haubo Bojesen Christensen & Per Bruun Brockhoff DTU Informatics Section for

More information

Introduc)on to RNA- Seq Data Analysis. Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas

Introduc)on to RNA- Seq Data Analysis. Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas Introduc)on to RNA- Seq Data Analysis Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas Material: hep://)ny.cc/rnaseq Slides: hep://)ny.cc/slidesrnaseq

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Sample Size Estimation for Studies of High-Dimensional Data

Sample Size Estimation for Studies of High-Dimensional Data Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,

More information

arxiv: v1 [stat.ml] 22 Jun 2012

arxiv: v1 [stat.ml] 22 Jun 2012 Hidden Markov Models with mixtures as emission distributions Stevenn Volant 1,2, Caroline Bérard 1,2, Marie-Laure Martin Magniette 1,2,3,4,5 and Stéphane Robin 1,2 arxiv:1206.5102v1 [stat.ml] 22 Jun 2012

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

Downloaded by Stanford University Medical Center Package from online.liebertpub.com at 10/25/17. For personal use only. ABSTRACT 1.

Downloaded by Stanford University Medical Center Package from online.liebertpub.com at 10/25/17. For personal use only. ABSTRACT 1. JOURNAL OF COMPUTATIONAL BIOLOGY Volume 24, Number 7, 2017 # Mary Ann Liebert, Inc. Pp. 721 731 DOI: 10.1089/cmb.2017.0053 A Poisson Log-Normal Model for Constructing Gene Covariation Network Using RNA-seq

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III) Title: Spatial Statistics for Point Processes and Lattice Data (Part III) Lattice Data Tonglin Zhang Outline Description Research Problems Global Clustering and Local Clusters Permutation Test Spatial

More information

A Generalized Poisson Model for Gene Expression Profiling using RNA Sequence Data

A Generalized Poisson Model for Gene Expression Profiling using RNA Sequence Data A Generalized Poisson Model for Gene Expression Profiling using RNA Sequence Data A Thesis Paper Submitted to the Graduate School in Partial Fulfillment of the Requirements for the Degree Master of Science

More information

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

Package MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.

Package MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1. Package MACAU2 April 8, 2017 Type Package Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data Version 1.10 Date 2017-03-31 Author Shiquan Sun, Jiaqiang Zhu, Xiang Zhou Maintainer Shiquan Sun

More information

Co-expression analysis

Co-expression analysis Co-expression analysis Etienne Delannoy & Marie-Laure Martin-Magniette & Andrea Rau ED& MLMM& AR Co-expression analysis Ecole chercheur SPS 1 / 49 Outline 1 Introduction 2 Unsupervised clustering Distance-based

More information

Label Switching and Its Simple Solutions for Frequentist Mixture Models

Label Switching and Its Simple Solutions for Frequentist Mixture Models Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching

More information

SIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE

SIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE SIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE A HYPOTHESIS TEST APPROACH Ismaïl Ahmed 1,2, Françoise Haramburu 3,4, Annie Fourrier-Réglat 3,4,5, Frantz Thiessard 4,5,6,

More information

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear

More information

Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data

Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data arxiv:1601.04879v2 [stat.ap] 12 May 2016 Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data Ranciati, S. ( )(1)(2), Viroli, C. (1),

More information

Non-specific filtering and control of false positives

Non-specific filtering and control of false positives Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview

More information

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation

More information

Generalized estimators for multiple testing: proportion of true nulls and false discovery rate by. Xiongzhi Chen and R.W. Doerge

Generalized estimators for multiple testing: proportion of true nulls and false discovery rate by. Xiongzhi Chen and R.W. Doerge Generalized estimators for multiple testing: proportion of true nulls and false discovery rate by Xiongzhi Chen and R.W. Doerge Department of Statistics, Purdue University, West Lafayette, USA. Technical

More information

Regression Models for Multivariate Count Data

Regression Models for Multivariate Count Data JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS 2017, VOL 26, NO 1, 1 13 http://dxdoiorg/101080/1061860020161154063 Regression Models for Multivariate Count Data Yiwen Zhang a,huazhou b, Jin Zhou c,

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Multivariate Survival Analysis

Multivariate Survival Analysis Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in

More information

Deciphering and modeling heterogeneity in interaction networks

Deciphering and modeling heterogeneity in interaction networks Deciphering and modeling heterogeneity in interaction networks (using variational approximations) S. Robin INRA / AgroParisTech Mathematical Modeling of Complex Systems December 2013, Ecole Centrale de

More information

Normalization of metagenomic data A comprehensive evaluation of existing methods

Normalization of metagenomic data A comprehensive evaluation of existing methods MASTER S THESIS Normalization of metagenomic data A comprehensive evaluation of existing methods MIKAEL WALLROTH Department of Mathematical Sciences CHALMERS UNIVERSITY OF TECHNOLOGY UNIVERSITY OF GOTHENBURG

More information

Gene Selection Using GeneSelectMMD

Gene Selection Using GeneSelectMMD Gene Selection Using GeneSelectMMD Jarrett Morrow remdj@channing.harvard.edu, Weilianq Qiu stwxq@channing.harvard.edu, Wenqing He whe@stats.uwo.ca, Xiaogang Wang stevenw@mathstat.yorku.ca, Ross Lazarus

More information

BIOINFORMATICS. On Differential Gene Expression Using RNA-Seq Data. Juhee Lee 1, Peter Müller 2 Shoudan Liang 3 Guoshuai Cai 3, and Yuan Ji 1

BIOINFORMATICS. On Differential Gene Expression Using RNA-Seq Data. Juhee Lee 1, Peter Müller 2 Shoudan Liang 3 Guoshuai Cai 3, and Yuan Ji 1 BIOINFORMATICS Vol. 00 no. 00 2011 Pages 1 8 On Differential Gene Expression Using RNA-Seq Data Juhee Lee 1, Peter Müller 2 Shoudan Liang 3 Guoshuai Cai 3, and Yuan Ji 1 1 Department of Biostatistics,

More information

Hierarchical Mixture Models for Expression Profiles

Hierarchical Mixture Models for Expression Profiles 2 Hierarchical Mixture Models for Expression Profiles MICHAEL A. NEWTON, PING WANG, AND CHRISTINA KENDZIORSKI University of Wisconsin at Madison Abstract A class of probability models for inference about

More information

Testing High-Dimensional Count (RNA-Seq) Data for Differential Expression

Testing High-Dimensional Count (RNA-Seq) Data for Differential Expression Testing High-Dimensional Count (RNA-Seq) Data for Differential Expression Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 6 1 References Anders & Huber (2010), Differential

More information

Alignment-free RNA-seq workflow. Charlotte Soneson University of Zurich Brixen 2017

Alignment-free RNA-seq workflow. Charlotte Soneson University of Zurich Brixen 2017 Alignment-free RNA-seq workflow Charlotte Soneson University of Zurich Brixen 2017 The alignment-based workflow ALIGNMENT COUNTING ANALYSIS Gene A Gene B... Gene X 7... 13............... The alignment-based

More information

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like

More information