Dispersion modeling for RNAseq differential analysis
E. Bonafede (1), F. Picard (2), S. Robin (3), C. Viroli (1)
(1) univ. Bologna, (2) CNRS/univ. Lyon I, (3) INRA/AgroParisTech, Paris
IBC, Victoria, July 2016
S. Robin (AgroParisTech / INRA)

Outline
1. Modeling dispersion in RNAseq experiments
2. Statistical inference & Test statistics
3. Simulations & Illustration
4. Conclusions and future work

Modeling dispersion in RNAseq experiments: General problem
RNAseq is a sequencing-based technology that gives access to a measure of the expression level of all the genes from a given species in a given sample (condition).
Differential analysis: p genes, d conditions (possibly with replicates). Find the genes whose expression varies across conditions.
Data: $Y_{ijr}$ = RNAseq read count for gene $i$ in replicate $r$ of condition $j$.

Modeling dispersion in RNAseq experiments: Negative binomial model
RNAseq count = number of reads mapped onto a gene's sequence.
Observed variability often exceeds that expected under a Poisson model.
Popular model [14]: $Y_{ijr} \sim \mathcal{NB}(\lambda_{ij}, \alpha_i)$, where $1/\alpha_i$ is the over-dispersion parameter: $\mathbb{V}(Y_{ijr}) = \lambda_{ij} (1 + \lambda_{ij}/\alpha_i)$.
Differential analysis: for each gene $i$, test $H_0 = \{\lambda_{i1} = \lambda_{i2}\}$ vs $H_1 = \{\lambda_{i1} \neq \lambda_{i2}\}$.
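
As a quick worked illustration of the variance formula on the slide above (the numerical values $\lambda = 100$ and $\alpha = 10$ are hypothetical and chosen only for the arithmetic):

```latex
\[
\mathbb{V}(Y) = \lambda \left(1 + \frac{\lambda}{\alpha}\right)
             = 100 \left(1 + \frac{100}{10}\right) = 1100 ,
\qquad \text{whereas a Poisson model would give } \mathbb{V}(Y) = \lambda = 100 .
\]
\[
\text{As } \alpha \to \infty, \quad \lambda \left(1 + \frac{\lambda}{\alpha}\right) \to \lambda ,
\text{ so the Poisson model is recovered as the limiting case.}
\]
```

The extra variance grows with $\lambda^2/\alpha$, so over-dispersion matters most for highly expressed genes with small $\alpha$.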

Modeling dispersion in RNAseq experiments: Modeling over-dispersion
Assumption on $\alpha$:
- Same $\alpha$ for all genes: unrealistic;
- Gene-specific $\alpha_i$: hard to estimate, especially with few replicates.
Several approaches:
- Shrinkage: edgeR [15,13], DSS [18];
- Function of mean expression: [8], NBPSeq [5], DESeq [1,11];
- Bayesian estimation: baySeq [6];
- Mixture model: DEXUS [7] (components = conditions);
- and many others: NOISeq [17], PoissonSeq [10], QuasiSeq [12], TSPM [2].

Modeling dispersion in RNAseq experiments: Mixture model for the dispersion
Poisson-Gamma representation: $U \sim \mathrm{Gam}(\alpha, \alpha)$, $Y \mid U \sim \mathcal{P}(\lambda U)$ $\Rightarrow$ $Y \sim \mathcal{NB}(\lambda, \alpha)$.
Over-dispersion is generated by the latent variable $U$.
Mixture model for the dispersion. Genes belong to $K$ different dispersion groups:
- Dispersion group of gene $i$: $Z_i \sim \mathcal{M}(1; (\omega_k))$;
- Latent dispersion variable: $U_{ijr} \mid Z_i = k \sim \mathrm{Gam}(\alpha_k, \alpha_k)$;
- Observed count: $Y_{ijr} \mid U_{ijr} \sim \mathcal{P}(\lambda_{ij} U_{ijr})$.
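
The generative model above can be checked by simulation. The following is a minimal sketch, not the authors' code: it assumes Gam(α, α) means shape α and rate α (so that E[U] = 1), uses a simple Knuth-style Poisson sampler because the Python standard library has none, and the function names and parameter values are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


def sample_poisson(mu, rng):
    """Draw from Poisson(mu) with Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_count(lam, alpha, rng):
    """Draw U ~ Gam(alpha, alpha), then Y | U ~ Poisson(lam * U)."""
    # gammavariate takes (shape, scale); scale = 1/alpha gives rate alpha and E[U] = 1.
    u = rng.gammavariate(alpha, 1.0 / alpha)
    return sample_poisson(lam * u, rng)


def sample_gene(lam, omegas, alphas, n_rep, rng):
    """Draw the dispersion group Z once per gene, then n_rep replicate counts."""
    k = rng.choices(range(len(omegas)), weights=omegas)[0]
    return k, [sample_count(lam, alphas[k], rng) for _ in range(n_rep)]


rng = random.Random(0)
lam, alpha = 5.0, 2.0  # hypothetical values
draws = [sample_count(lam, alpha, rng) for _ in range(50_000)]
emp_mean = fmean(draws)
emp_var = pvariance(draws)
theo_var = lam * (1 + lam / alpha)  # negative binomial variance: 17.5 here
print(f"mean {emp_mean:.2f} (theory {lam}), variance {emp_var:.2f} (theory {theo_var})")

group, counts = sample_gene(lam, [0.5, 0.5], [1.0, 20.0], n_rep=3, rng=rng)
```

The empirical mean and variance should land close to λ and λ(1 + λ/α), which is the marginal negative binomial behaviour stated on the slide.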

Modeling dispersion in RNAseq experiments: Distributions
[Figure: distributions of $U_{ijr} \sim \mathrm{Gam}(\alpha_k, \alpha_k)$ and of the resulting counts $Y_{ijr} \sim \mathcal{NB}(\lambda, \alpha_k)$]

Modeling dispersion in RNAseq experiments: Graphical representation
[Figure: graphical model for gene $i$]
Marginal distribution: $Y_{ijr} \mid Z_i = k \sim \mathcal{NB}(\lambda_{ij}, \alpha_k)$, $Y_{ijr} \sim \sum_k \omega_k \, \mathcal{NB}(\lambda_{ij}, \alpha_k)$.
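
The marginal mixture distribution can be evaluated directly. This is a small sketch under the mean/size parametrization implied by the variance formula λ(1 + λ/α); the function names and the values of ω, α and λ are hypothetical.

```python
import math


def nb_pmf(y, lam, alpha):
    """P(Y = y) for a negative binomial with mean lam and size alpha."""
    log_p = (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(alpha / (alpha + lam))
        + y * math.log(lam / (alpha + lam))
    )
    return math.exp(log_p)


def mixture_nb_pmf(y, lam, omegas, alphas):
    """Marginal pmf: sum over dispersion groups k of omega_k * NB(y; lam, alpha_k)."""
    return sum(w * nb_pmf(y, lam, a) for w, a in zip(omegas, alphas))


omegas = [0.5, 0.3, 0.2]   # hypothetical group proportions
alphas = [0.5, 2.0, 10.0]  # hypothetical group dispersions
lam = 20.0

# Truncate the infinite support; the tail beyond 5000 is negligible for these values.
probs = [mixture_nb_pmf(y, lam, omegas, alphas) for y in range(5000)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum(y * y * p for y, p in enumerate(probs)) - mean ** 2
expected_var = lam * (1 + lam * sum(w / a for w, a in zip(omegas, alphas)))
```

Because every component shares the same mean λ, the mixture variance is simply the ω-weighted average of the component variances, λ(1 + λ Σ_k ω_k/α_k).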

Statistical inference & Test statistics: A two-hidden-layer model
- Parameter: $\theta = \{\omega = (\omega_k), \alpha = (\alpha_k), \lambda = (\lambda_{ij})\}$;
- Hidden variables: $H = \{Z = (Z_i), U = (U_{ijr})\}$;
- Observed variables: $Y = (Y_{ijr})$.
Likelihood decomposition [4], with $\mathcal{H}[\cdot]$ denoting the entropy:
$\log p_\theta(Y) = \log \int p_\theta(Y, H) \, dH$
$= \mathbb{E}[\log p_\theta(Y, H) \mid Y] + \mathcal{H}[p_\theta(H \mid Y)]$
$= \mathbb{E}[\log p_\omega(Z) + \log p_\alpha(U \mid Z) + \log p_\lambda(Y \mid U) \mid Y] + \mathcal{H}[p_\theta(H \mid Y)]$
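
The decomposition follows in two short steps. The derivation below is a standard argument written in the notation of the slide; it is a sketch to fill in the reasoning, not a transcription of the talk.

```latex
\begin{align*}
\log p_\theta(Y)
  &= \log p_\theta(Y, H) - \log p_\theta(H \mid Y)
     && \text{(Bayes' rule, valid for every value of } H\text{)} \\
  &= \mathbb{E}\left[\log p_\theta(Y, H) \mid Y\right]
     - \mathbb{E}\left[\log p_\theta(H \mid Y) \mid Y\right]
     && \text{(take the expectation given } Y\text{)} \\
  &= \mathbb{E}\left[\log p_\theta(Y, H) \mid Y\right]
     + \mathcal{H}\left[p_\theta(H \mid Y)\right] .
\end{align*}
% The last line of the slide uses the factorization of the hierarchical model:
\[
p_\theta(Y, H) = p_\omega(Z)\, p_\alpha(U \mid Z)\, p_\lambda(Y \mid U) .
\]
```

Each of the three log terms involves a single block of parameters (ω, α or λ), which is why the M step can update them separately.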

Statistical inference & Test statistics: EM algorithm
Aim: find $\hat{\theta} = \arg\max_\theta \log p_\theta(Y)$.
E step: compute the conditional moments
- $Z \mid Y$: multinomial;
- $U \mid Y$: mixture of Gammas.
M step: estimate the parameters
- $\omega_k$: explicit;
- $\alpha_k$: numerical, via quasi-Newton (or fixed-point);
- $\lambda_{ij}$: explicit.
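
The E step and the two explicit M-step updates can be sketched as follows. This is an illustration derived from the model as stated on the slides, not the MixtNB implementation: it ignores the library-size offset, keeps the $\alpha_k$ fixed (their update is numerical), assumes every condition has at least one non-zero count, and all function names and data are hypothetical. It uses the conjugacy fact that $U_{ijr} \mid Y_{ijr} = y, Z_i = k \sim \mathrm{Gam}(\alpha_k + y, \alpha_k + \lambda_{ij})$.

```python
import math


def nb_logpmf(y, lam, alpha):
    """Log pmf of a negative binomial with mean lam and size alpha."""
    return (
        math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
        + alpha * math.log(alpha / (alpha + lam))
        + y * math.log(lam / (alpha + lam))
    )


def e_step(counts, lam, omegas, alphas):
    """counts[i][j] is the list of replicate counts of gene i in condition j.

    Returns tau[i][k] = P(Z_i = k | Y) and exp_u[i][j][r] = E[U_ijr | Y].
    """
    tau, exp_u = [], []
    for y_i, lam_i in zip(counts, lam):
        # Posterior group weights, computed on the log scale for stability.
        log_w = [
            math.log(w) + sum(nb_logpmf(y, l, a) for y_j, l in zip(y_i, lam_i) for y in y_j)
            for w, a in zip(omegas, alphas)
        ]
        top = max(log_w)
        weights = [math.exp(v - top) for v in log_w]
        tau_i = [v / sum(weights) for v in weights]
        tau.append(tau_i)
        # U | Y is a mixture of Gammas; its mean mixes (alpha_k + y) / (alpha_k + lam).
        exp_u.append([
            [sum(t * (a + y) / (a + l) for t, a in zip(tau_i, alphas)) for y in y_j]
            for y_j, l in zip(y_i, lam_i)
        ])
    return tau, exp_u


def m_step_explicit(counts, tau, exp_u):
    """Closed-form updates of omega_k and lambda_ij (alpha_k is left untouched)."""
    n_genes, n_groups = len(tau), len(tau[0])
    omegas = [sum(t[k] for t in tau) / n_genes for k in range(n_groups)]
    lam = [
        [sum(y_j) / sum(u_j) for y_j, u_j in zip(y_i, u_i)]
        for y_i, u_i in zip(counts, exp_u)
    ]
    return omegas, lam


# Two hypothetical genes, two conditions, three replicates each.
counts = [[[3, 5, 4], [10, 12, 9]], [[0, 1, 2], [40, 3, 90]]]
lam0 = [[sum(y_j) / len(y_j) for y_j in y_i] for y_i in counts]
omegas0, alphas = [0.5, 0.5], [1.0, 20.0]

tau, exp_u = e_step(counts, lam0, omegas0, alphas)
omegas1, lam1 = m_step_explicit(counts, tau, exp_u)
```

In this toy data set the second gene is far more variable than the first, so its posterior weight concentrates on the strongly over-dispersed group ($\alpha = 1$), while the first gene is assigned to the group with $\alpha = 20$.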

Statistical inference & Test statistics: Three contrasts
Test: $H_0 = \{\lambda_{i1} = \lambda_{i2}\}$ vs $H_1 = \{\lambda_{i1} \neq \lambda_{i2}\}$.
Considered contrasts:
- Difference $= \lambda_{i1} - \lambda_{i2}$;
- Ratio $= \lambda_{i1} / \lambda_{i2}$;
- Log-ratio $= \ln(\lambda_{i1} / \lambda_{i2})$.
Contrast variance: first-order approximation derived via the $\delta$-method.
Test statistic $= \text{Contrast} / \sqrt{\mathbb{V}(\text{Contrast})}$.
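
The first-order variance approximations for the three contrasts can be sketched as below. This is an illustration only: it assumes the two estimates are independent with known variances `var1` and `var2`, and it centers the ratio at its null value 1, which the slide does not spell out. The function name and the numbers are hypothetical.

```python
import math


def contrast_statistics(lam1, lam2, var1, var2):
    """Wald-type statistics for the three contrasts, using delta-method variances."""
    ratio = lam1 / lam2
    # First-order (delta-method) variances, assuming independent estimates.
    var_diff = var1 + var2
    var_ratio = var1 / lam2 ** 2 + lam1 ** 2 * var2 / lam2 ** 4
    var_log_ratio = var1 / lam1 ** 2 + var2 / lam2 ** 2
    return {
        "difference": (lam1 - lam2) / math.sqrt(var_diff),
        "ratio": (ratio - 1.0) / math.sqrt(var_ratio),  # centered at the H0 value
        "log_ratio": math.log(ratio) / math.sqrt(var_log_ratio),
    }


stats = contrast_statistics(lam1=12.0, lam2=8.0, var1=4.0, var2=2.0)
null_stats = contrast_statistics(lam1=8.0, lam2=8.0, var1=2.0, var2=2.0)
```

The three statistics test the same hypothesis but are not equivalent in finite samples, which is why the talk compares them.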

Simulations & Illustration: Simulation design
compcodeR package [16]: independent, realistic RNAseq simulation.
- p = 1000, 5000 genes (incl. 10% down- and 10% up-regulated);
- d = 2 conditions;
- $n_j$ = 3, 5, 10;
- Library size =
Evaluation criteria:
- Type-I error control;
- ROC curve;
- Estimation of the dispersion: $\mathbb{V}(Y_{ijr}) = \lambda_{ij} \left(1 + \lambda_{ij} \sum_k \mathbb{E}_\theta(Z_{ik} \mid Y) / \alpha_k\right)$.
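
The dispersion criterion above plugs the posterior group probabilities into the negative binomial variance. A minimal sketch, with hypothetical names and values (`tau_i` stands for the conditional probabilities $\mathbb{E}_\theta(Z_{ik} \mid Y)$ of one gene):

```python
def estimated_variance(lam_ij, tau_i, alphas):
    """lam * (1 + lam * sum_k tau_ik / alpha_k), the variance formula of the slide."""
    return lam_ij * (1.0 + lam_ij * sum(t / a for t, a in zip(tau_i, alphas)))


def relative_error(estimate, truth):
    """Relative error of a variance estimate with respect to the true variance."""
    return abs(estimate - truth) / truth


alphas = [0.5, 2.0, 10.0]  # hypothetical group dispersions
tau_i = [0.7, 0.2, 0.1]    # hypothetical posterior probabilities for one gene
lam_ij = 20.0

v_hat = estimated_variance(lam_ij, tau_i, alphas)
v_true = lam_ij * (1.0 + lam_ij / 0.5)  # if the gene truly had alpha = 0.5
err = relative_error(v_hat, v_true)
```

When the posterior puts all its weight on one group, the formula reduces to the single-group variance $\lambda_{ij}(1 + \lambda_{ij}/\alpha_k)$.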

Simulations & Illustration: Type-I error control (p = 1000)
[Figure: three panels, one per value of $n_j$, showing the type-I error of MixtNB, DESeq, edgeR, DSS, DEXUS, NBPSeq, TSPM, SAMseq, PoisSeq and QuasiSeq]

Simulations & Illustration: ROC curves (p = 5000)
[Figure: four ROC panels (true positive rate vs false positive rate): the Difference, Ratio and Log Ratio contrasts; MixtNB vs DESeq, edgeR, DSS; MixtNB vs DEXUS, NBPSeq, TSPM; MixtNB vs SAMseq, PoisSeq, QuasiSeq]

Simulations & Illustration: Estimation of dispersion (p = 5000)
[Figure, left panel: estimation error of $\mathbb{V}(Y)$ as a function of $K$ (relative error vs number of mixture components)]
[Figure, right panel: precision with the different methods (relative error for Common, DESeq2, DEXUS, DSS, edgeR, MixtNB, NBPSeq)]

Simulations & Illustration: Illustration: data from [9]
p genes, d = 2 conditions (treated/control), $n_j$ = 3, 4.
K = 3 dispersion groups (BIC).
[Table: number of declared differentially expressed genes at levels 5%, 1% and 0.1%, for the Difference, Ratio and Log Ratio contrasts and for DESeq, edgeR and DSS]

Conclusions and future work: Conclusions
Summary.
- A generic framework for RNAseq differential analysis;
- Can account for the library size via an offset $\mu_{jr}$;
- Flexible modeling of over-dispersion via a mixture model;
- A genuine EM algorithm taking advantage of the Poisson-Gamma representation;
- State-of-the-art accuracy + control of the type-I error + estimation of the dispersion.
Published in Biometrics (2015) [3] + R CRAN package MixtNB.

Conclusions and future work: Future work
- Generalize to more complex designs (taking advantage of the GLM framework);
- Negative binomial latent-block model (LBM) for metagenomics: $Y_{ijr}$ = number of reads from species $i$ in medium $j$ (rep. $r$);
- Simultaneous clustering of species and media using a variational EM algorithm for Poisson-Gamma.

Appendix: References
[1] S. Anders and W. Huber. Differential expression analysis for sequence count data. Genome Biology, 11(10):R106.
[2] P.L. Auer and R.W. Doerge. A two-stage Poisson model for testing RNA-seq data. Statistical Applications in Genetics and Molecular Biology, 10(1):1-26.
[3] E. Bonafede, F. Picard, S. Robin, and C. Viroli. Modeling overdispersion heterogeneity in differential expression analysis using mixtures. Biometrics, pages n/a-n/a.
[4] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B, 39:1-38.
[5] Y. Di, D.W. Schafer, J.S. Cumbie, and J.H. Chang. The NBP negative binomial model for assessing differential gene expression from RNA-Seq. Statistical Applications in Genetics and Molecular Biology, 10(1):1-28.
[6] T. Hardcastle and K. Kelly. baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics, 11(422):1-15.
[7] G. Klambauer, T. Unterthiner, and S. Hochreiter. DEXUS: identifying differential expression in RNA-Seq studies with unknown conditions. Nucleic Acids Research, 42(21):1-11.
[8] C.W. Law, Y. Chen, W. Shi, and G.K. Smyth. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15:R29.
[9] H. Li, M.T. Lovci, Y.S. Kwon, M.G. Rosenfeld, X.D. Fu, and G.W. Yeo. Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proceedings of the National Academy of Sciences, 105(51).
[10] J. Li and R. Tibshirani. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research, 22(5).
[11] M.I. Love, W. Huber, and S. Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550.
[12] S.P. Lund, D. Nettleton, D.J. McCarthy, and G.K. Smyth. Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology, 11(5):8.
[13] D.J. McCarthy, Y. Chen, and G.K. Smyth. Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation. Nucleic Acids Research, 40(10).
[14] M.D. Robinson, D.J. McCarthy, and G.K. Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1).
[15] M.D. Robinson and G.K. Smyth. Small-sample estimation of negative binomial dispersion, with application to SAGE data. Biostatistics, 9.
[16] C. Soneson. compcodeR - an R package for benchmarking differential expression methods for RNA-seq data. Bioinformatics, 30(17).
[17] S. Tarazona, F. García-Alcalde, J. Dopazo, A. Ferrer, and A. Conesa. Differential expression in RNA-seq: a matter of depth. Genome Research, 21(12).
[18] H. Wu, C. Wang, and Z. Wu. A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics, 14(2).
Bayesian latent hierarchical model for transcriptomic meta-analysis to detect biomarkers with clustered meta-patterns of differential expression signals BayesMP Zhiguang Huo 1, Chi Song 2, George Tseng
More informationChapter 10. Semi-Supervised Learning
Chapter 10. Semi-Supervised Learning Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Outline
More informationOne-shot Learning of Poisson Distributions Information Theory of Audic-Claverie Statistic for Analyzing cdna Arrays
One-shot Learning of Poisson Distributions Information Theory of Audic-Claverie Statistic for Analyzing cdna Arrays Peter Tiňo School of Computer Science University of Birmingham, UK One-shot Learning
More informationCommunications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study
Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study Journal: Manuscript ID: LSSP-00-0.R Manuscript Type: Original Paper Date Submitted by the Author: -May-0 Complete List
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationarxiv: v1 [stat.me] 6 Jun 2016
A gamma approximation to the Bayesian posterior distribution of a discrete parameter of the Generalized Poisson model arxiv:1606.01749v1 [stat.me] 6 Jun 2016 Tsung Fei Khang Institute of Mathematical Sciences,
More informationModels for Replicated Discrimination Tests: A Synthesis of Latent Class Mixture Models and Generalized Linear Mixed Models
Models for Replicated Discrimination Tests: A Synthesis of Latent Class Mixture Models and Generalized Linear Mixed Models Rune Haubo Bojesen Christensen & Per Bruun Brockhoff DTU Informatics Section for
More informationIntroduc)on to RNA- Seq Data Analysis. Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas
Introduc)on to RNA- Seq Data Analysis Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas Material: hep://)ny.cc/rnaseq Slides: hep://)ny.cc/slidesrnaseq
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More informationSample Size Estimation for Studies of High-Dimensional Data
Sample Size Estimation for Studies of High-Dimensional Data James J. Chen, Ph.D. National Center for Toxicological Research Food and Drug Administration June 3, 2009 China Medical University Taichung,
More informationarxiv: v1 [stat.ml] 22 Jun 2012
Hidden Markov Models with mixtures as emission distributions Stevenn Volant 1,2, Caroline Bérard 1,2, Marie-Laure Martin Magniette 1,2,3,4,5 and Stéphane Robin 1,2 arxiv:1206.5102v1 [stat.ml] 22 Jun 2012
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca
More informationDownloaded by Stanford University Medical Center Package from online.liebertpub.com at 10/25/17. For personal use only. ABSTRACT 1.
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 24, Number 7, 2017 # Mary Ann Liebert, Inc. Pp. 721 731 DOI: 10.1089/cmb.2017.0053 A Poisson Log-Normal Model for Constructing Gene Covariation Network Using RNA-seq
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationLattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)
Title: Spatial Statistics for Point Processes and Lattice Data (Part III) Lattice Data Tonglin Zhang Outline Description Research Problems Global Clustering and Local Clusters Permutation Test Spatial
More informationA Generalized Poisson Model for Gene Expression Profiling using RNA Sequence Data
A Generalized Poisson Model for Gene Expression Profiling using RNA Sequence Data A Thesis Paper Submitted to the Graduate School in Partial Fulfillment of the Requirements for the Degree Master of Science
More informationSupplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control
Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control Xiaoquan Wen Department of Biostatistics, University of Michigan A Model
More informationModel Selection for Semiparametric Bayesian Models with Application to Overdispersion
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and
More informationContents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1
Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services
More informationShu Yang and Jae Kwang Kim. Harvard University and Iowa State University
Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND
More informationPackage MACAU2. R topics documented: April 8, Type Package. Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data. Version 1.
Package MACAU2 April 8, 2017 Type Package Title MACAU 2.0: Efficient Mixed Model Analysis of Count Data Version 1.10 Date 2017-03-31 Author Shiquan Sun, Jiaqiang Zhu, Xiang Zhou Maintainer Shiquan Sun
More informationCo-expression analysis
Co-expression analysis Etienne Delannoy & Marie-Laure Martin-Magniette & Andrea Rau ED& MLMM& AR Co-expression analysis Ecole chercheur SPS 1 / 49 Outline 1 Introduction 2 Unsupervised clustering Distance-based
More informationLabel Switching and Its Simple Solutions for Frequentist Mixture Models
Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching
More informationSIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE
SIGNAL RANKING-BASED COMPARISON OF AUTOMATIC DETECTION METHODS IN PHARMACOVIGILANCE A HYPOTHESIS TEST APPROACH Ismaïl Ahmed 1,2, Françoise Haramburu 3,4, Annie Fourrier-Réglat 3,4,5, Frantz Thiessard 4,5,6,
More informationLinear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics
Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear
More informationMixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data
arxiv:1601.04879v2 [stat.ap] 12 May 2016 Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data Ranciati, S. ( )(1)(2), Viroli, C. (1),
More informationNon-specific filtering and control of false positives
Non-specific filtering and control of false positives Richard Bourgon 16 June 2009 bourgon@ebi.ac.uk EBI is an outstation of the European Molecular Biology Laboratory Outline Multiple testing I: overview
More informationExpression Data Exploration: Association, Patterns, Factors & Regression Modelling
Expression Data Exploration: Association, Patterns, Factors & Regression Modelling Exploring gene expression data Scale factors, median chip correlation on gene subsets for crude data quality investigation
More informationGeneralized estimators for multiple testing: proportion of true nulls and false discovery rate by. Xiongzhi Chen and R.W. Doerge
Generalized estimators for multiple testing: proportion of true nulls and false discovery rate by Xiongzhi Chen and R.W. Doerge Department of Statistics, Purdue University, West Lafayette, USA. Technical
More informationRegression Models for Multivariate Count Data
JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS 2017, VOL 26, NO 1, 1 13 http://dxdoiorg/101080/1061860020161154063 Regression Models for Multivariate Count Data Yiwen Zhang a,huazhou b, Jin Zhou c,
More informationPairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion
Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial
More informationMultivariate Survival Analysis
Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in
More informationDeciphering and modeling heterogeneity in interaction networks
Deciphering and modeling heterogeneity in interaction networks (using variational approximations) S. Robin INRA / AgroParisTech Mathematical Modeling of Complex Systems December 2013, Ecole Centrale de
More informationNormalization of metagenomic data A comprehensive evaluation of existing methods
MASTER S THESIS Normalization of metagenomic data A comprehensive evaluation of existing methods MIKAEL WALLROTH Department of Mathematical Sciences CHALMERS UNIVERSITY OF TECHNOLOGY UNIVERSITY OF GOTHENBURG
More informationGene Selection Using GeneSelectMMD
Gene Selection Using GeneSelectMMD Jarrett Morrow remdj@channing.harvard.edu, Weilianq Qiu stwxq@channing.harvard.edu, Wenqing He whe@stats.uwo.ca, Xiaogang Wang stevenw@mathstat.yorku.ca, Ross Lazarus
More informationBIOINFORMATICS. On Differential Gene Expression Using RNA-Seq Data. Juhee Lee 1, Peter Müller 2 Shoudan Liang 3 Guoshuai Cai 3, and Yuan Ji 1
BIOINFORMATICS Vol. 00 no. 00 2011 Pages 1 8 On Differential Gene Expression Using RNA-Seq Data Juhee Lee 1, Peter Müller 2 Shoudan Liang 3 Guoshuai Cai 3, and Yuan Ji 1 1 Department of Biostatistics,
More informationHierarchical Mixture Models for Expression Profiles
2 Hierarchical Mixture Models for Expression Profiles MICHAEL A. NEWTON, PING WANG, AND CHRISTINA KENDZIORSKI University of Wisconsin at Madison Abstract A class of probability models for inference about
More informationTesting High-Dimensional Count (RNA-Seq) Data for Differential Expression
Testing High-Dimensional Count (RNA-Seq) Data for Differential Expression Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 6 1 References Anders & Huber (2010), Differential
More informationAlignment-free RNA-seq workflow. Charlotte Soneson University of Zurich Brixen 2017
Alignment-free RNA-seq workflow Charlotte Soneson University of Zurich Brixen 2017 The alignment-based workflow ALIGNMENT COUNTING ANALYSIS Gene A Gene B... Gene X 7... 13............... The alignment-based
More informationPetr Volf. Model for Difference of Two Series of Poisson-like Count Data
Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like
More information