Unlocking RNA-seq tools for zero inflation and single cell applications using observation weights
1 Unlocking RNA-seq tools for zero inflation and single-cell applications using observation weights. Koen Van den Berge, Ghent University, Statistical Genomics
2 The team: Koen Van den Berge, Fanny Perraudeau, Jean-Philippe Vert, Charlotte Soneson, Mark Robinson, Sandrine Dudoit, Davide Risso, Michael Love, Lieven Clement
3 Single-cell RNA-sequencing (scRNA-seq) is noisier than bulk RNA-seq
4 Single-cell RNA-seq protocols
- Full-length protocols (e.g., SMART-Seq2)
  - Cells must be isolated (manually, FACS, ...).
  - Library prep is typically plate-based; one well contains one cell.
- Droplet-based protocols (e.g., 10X, Drop-seq)
  - Cells do not need to be isolated!
  - Cell-containing medium is mixed with bead-containing oil droplets.
5 Single-cell RNA-sequencing (scRNA-seq) is noisier than bulk RNA-seq. Data from [Pickrell et al. 2010, Fletcher et al. 2017, Zheng et al. 2017].
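6 Bulk RNA-seq differential expression (DE) analysis
Popular methods (edgeR, DESeq2) adopt negative binomial (NB) models
$y_{gi} \sim NB(\mu_{gi}, \phi_g)$
$\log(\mu_{gi}) = \eta_{gi}$
$\eta_{gi} = X_i \beta_g + \log(o_i)$
with $y_{gi}$ the expression count of gene $g$ in sample $i$, $\mu_{gi}$ its mean, $\phi_g$ the gene-wise dispersion, and $\log(o_i)$ an offset.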
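As a minimal sketch of the NB model on this slide, the snippet below evaluates the log link and the NB log-probability in plain Python. It assumes the parameterization $Var(y) = \mu + \phi\mu^2$ (size $1/\phi$); the two-group design, the coefficient values and the function names are hypothetical and chosen only for illustration. It is not how edgeR or DESeq2 estimate their parameters.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-probability of count y under an NB with mean mu and dispersion phi.

    Assumed parameterization: Var(y) = mu + phi * mu**2, i.e. size r = 1 / phi.
    """
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def nb_mean(x_row, beta_g, offset):
    """Mean under the log link: mu_gi = exp(x_i . beta_g + log(o_i))."""
    eta = sum(x * b for x, b in zip(x_row, beta_g)) + math.log(offset)
    return math.exp(eta)

# Hypothetical two-group design: intercept plus a group indicator.
beta_g = [math.log(2.0), math.log(3.0)]  # baseline rate 2, fold change 3
mu_control = nb_mean([1.0, 0.0], beta_g, offset=1.5)
mu_treated = nb_mean([1.0, 1.0], beta_g, offset=1.5)

# Log-probability of observing a zero count in the control group.
log_prob_zero = nb_logpmf(0, mu_control, phi=0.5)
```

The offset enters on the log scale, so it rescales the mean multiplicatively without changing the fold change encoded in `beta_g`.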
7 Bulk RNA-seq DE not worse than bespoke scRNA-seq tools
Jaakkola et al. (2016), Bioinformatics: "Our evaluations did not reveal systematic benefits of the currently available single-cell-specific methods."
Soneson & Robinson (2018), Nat. Meth.: "We found that bulk RNA-seq analysis methods do not generally perform worse than those developed specifically for scRNA-seq."
8 Bulk RNA-seq methods still break down due to zero inflation (ZI). Simulated (ZI-)bulk RNA-seq data using the [Zhou et al. 2014] framework.
12 Observation weights unlock bulk RNA-seq tools towards zero inflation
Excess zeros observed, i.e. zero inflation. We propose to model counts with a zero-inflated negative binomial (ZINB) distribution
$f_{ZINB}(y_{gi}; \mu_{gi}, \phi_g, \pi_{gi}) = \pi_{gi} I(y_{gi} = 0) + (1 - \pi_{gi}) f_{NB}(y_{gi}; \mu_{gi}, \phi_g)$. (1)
A ZINB model corresponds to a weighted NB where the observation weights are posterior probabilities
$w_{gi} = \frac{(1 - \pi_{gi}) f_{NB}(y_{gi}; \mu_{gi}, \phi_g)}{f_{ZINB}(y_{gi}; \mu_{gi}, \phi_g, \pi_{gi})}$ (2)
Weights are used to unlock RNA-seq NB models (edgeR, DESeq2) for zero inflation [Van den Berge, Perraudeau et al., 2018].
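The ZINB density and the posterior weights can be sketched in a few lines of Python. This sketch assumes the zero-inflation term is a point mass at zero and that the NB has variance $\mu + \phi\mu^2$; the function names and the parameter values are hypothetical. In practice the parameters are estimated (for example with zinbwave) rather than fixed by hand.

```python
import math

def nb_pmf(y, mu, phi):
    """NB probability of count y with mean mu and dispersion phi (size 1 / phi)."""
    r = 1.0 / phi
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, phi, pi):
    """ZINB probability: point mass at zero with weight pi, NB with weight 1 - pi."""
    point_mass = pi if y == 0 else 0.0
    return point_mass + (1.0 - pi) * nb_pmf(y, mu, phi)

def observation_weight(y, mu, phi, pi):
    """Posterior probability that count y was generated by the NB component."""
    return (1.0 - pi) * nb_pmf(y, mu, phi) / zinb_pmf(y, mu, phi, pi)

# Hypothetical parameter values for one gene in one cell.
w_zero = observation_weight(0, mu=5.0, phi=0.5, pi=0.3)      # downweighted zero
w_positive = observation_weight(4, mu=5.0, phi=0.5, pi=0.3)  # positive count
```

Only zeros can be downweighted: a positive count cannot come from the point mass, so its weight is 1, while a zero that is unlikely under the NB component gets a weight close to 0.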
14 zinbwave can be used to fit ZINB models in scRNA-seq
Estimation of the ZINB parameters using penalized likelihood, implemented in the ZINB-WaVE model [Risso et al. 2018], available from Bioconductor.
[Figure: ZINB-WaVE model schematic. The $n$ samples by $J$ genes matrices $\log \mu$ and $\text{logit}\, \pi$ are each modeled as $X\beta + (V\gamma)^T + W\alpha$, with $X$ ($n \times M$) known sample-level covariates, $V$ ($J \times L$) known gene-level covariates, $W$ ($n \times K$) unknown sample-level covariates, and $\beta$ ($M \times J$), $\gamma$ ($L \times n$), $\alpha$ ($K \times J$) unknown parameters.]
The intercept in $X$ acts as a gene-specific scaling factor; the intercept in $V$ acts as a sample-specific scaling factor.
Alternatively: EM algorithm (see last couple of slides).
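To make the matrix dimensions of the model schematic concrete, here is a small pure-Python sketch of the linear predictor $X\beta + (V\gamma)^T + W\alpha$. All values are made up, the helper names are hypothetical, and $W$ is given by hand here although it is estimated in the actual model. The sketch shows only how the terms combine, not the penalized-likelihood fit.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def linear_predictor(X, beta, V, gamma, W, alpha):
    """Return X beta + (V gamma)^T + W alpha as an n x J matrix."""
    xb = matmul(X, beta)                # (n x M)(M x J) -> n x J
    vg_t = transpose(matmul(V, gamma))  # ((J x L)(L x n))^T -> n x J
    wa = matmul(W, alpha)               # (n x K)(K x J) -> n x J
    return [[xb[i][j] + vg_t[i][j] + wa[i][j] for j in range(len(xb[0]))]
            for i in range(len(xb))]

# Toy example: n = 2 samples, J = 3 genes, M = L = K = 1.
X = [[1.0], [1.0]]          # intercept-only sample-level covariates
beta = [[0.5, 1.0, 1.5]]    # gene-specific intercepts (1 x J)
V = [[1.0], [1.0], [1.0]]   # intercept-only gene-level covariates
gamma = [[0.0, 0.2]]        # sample-specific intercepts (1 x n)
W = [[0.1], [-0.1]]         # latent sample-level factor, hand-set here
alpha = [[1.0, 0.0, -1.0]]  # factor loadings (K x J)

log_mu = linear_predictor(X, beta, V, gamma, W, alpha)
mu = [[math.exp(v) for v in row] for row in log_mu]
```

With intercept-only $X$ and $V$, `beta` shifts each gene and `gamma` shifts each sample, which is the scaling-factor role described on the slide.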
15 Downweighting excess zeros recovers the mean-variance trend, resulting in high power. Simulated (ZI-)bulk RNA-seq data using the [Zhou et al. 2014] framework.
17 High power, good FDR control in scRNA-seq simulations: full-length protocols. [Figure legend: DESeq2, MAST, ZINB-WaVE_DESeq2_common, ZINB-WaVE_edgeR_common, edgeR]
18 High power, good FDR control in scRNA-seq simulations: droplet-based protocols, e.g. 10X Genomics, Drop-seq
19 Mock comparisons on real data show good FPR control. Non-UMI dataset on 622 neuronal cells from [Usoskin et al. 2015]; 45 vs. 45 mock comparisons.
20 Downweighting leads to biologically meaningful results. 10X Genomics PBMC dataset, preprocessed using the tutorial from Seurat.
21 Method is implemented in the zinbwave Bioc package
- computeObservationalWeights for weights calculation
- edgeR: glmWeightedF for ZI-adjusted inference
- DESeq2: nbinomWaldTest and nbinomLRT for ZI-adjusted inference
- Tutorial available in the zinbwave vignette
22 What follows are some slides on the EM algorithm used in the zingeR method.
23 A zero-inflated negative binomial model
Distribution of counts $y$ for gene $g$ over samples $i$:
$y_{gi} \sim \pi_i \delta_0 + (1 - \pi_i) f_{NB}(\mu_{gi}, \phi_g)$,
i.e. a mixture distribution between a point mass at zero ($\delta_0$) and a negative binomial.
Log-likelihood
$l(y_{gi}) = \sum_i \log\{\pi_i I(y_{gi} = 0) + (1 - \pi_i) f_{NB}(y_{gi}; \mu_{gi}, \phi_g)\}$
does not factorize, so it is very difficult to maximize!
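25 Fitting a mixture distribution with EM
Estimate the mixture using the EM algorithm: introduce a latent variable $Z_{gi} \sim B(\pi_i)$ to assign zeros to the zero-inflation or count component. The joint density becomes
$f(y_{gi}, z_{gi}) = f(y_{gi} \mid z_{gi}) f(z_{gi}) = [\pi_i I(y_{gi} = 0)]^{z_{gi}} [(1 - \pi_i) f_{NB}(y_{gi}; \mu_{gi}, \phi_g)]^{(1 - z_{gi})}$
Maximization of the expected log-likelihood given the data:
$Q = E(l(y_{gi}, z_{gi}) \mid y_{gi})$
$= E(z_{gi} \mid y_{gi}) \log \pi_i + E(z_{gi} \mid y_{gi}) \log I(y_{gi} = 0) + [1 - E(z_{gi} \mid y_{gi})] \log(1 - \pi_i) + [1 - E(z_{gi} \mid y_{gi})] \log[f_{NB}(y_{gi}; \mu_{gi}, \phi_g)]$
The second term does not depend on the parameters and drops out of the maximization.
1. E-step: Calculate the expected log-likelihood
2. M-step: Maximize the expected log-likelihood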
27 E-step EM algorithm
- Calculate the posterior probability that a zero belongs to the zero-inflation component:
$E(z_{gi} \mid y_{gi}) = \frac{\hat\pi_i I(y_{gi} = 0)}{\hat\pi_i I(y_{gi} = 0) + (1 - \hat\pi_i) f_{NB}(y_{gi}; \hat\mu_{gi}, \hat\phi_g)}$
M-step
- Estimate the NB component parameters $\mu_{gi}$ and $\phi_g$ using edgeR
  - Incorporate observation-level weights $w_{gi} = 1 - E(z_{gi} \mid y_{gi})$ for counts $y_{gi}$
  - Because maximizing the ZINB likelihood for the NB model parameters is equivalent to maximizing a weighted NB likelihood.
- Estimate $\pi_i$ using a logistic regression model
$\log \frac{\pi_i}{1 - \pi_i} = \beta_0 + \beta_1 N_i$
with $N_i$ the log library size of sample $i$
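The E- and M-steps can be illustrated with a deliberately simplified EM loop for a single gene. This is a sketch under strong assumptions, not the zingeR implementation: the dispersion `phi` is held fixed, all samples share one mean `mu` (no covariates, no edgeR), and the mixing proportion `pi` is a single constant, which corresponds to an intercept-only logistic regression without the library-size term. The toy counts and all function names are hypothetical.

```python
import math

def nb_pmf(y, mu, phi):
    """NB probability of count y with mean mu and dispersion phi (size 1 / phi)."""
    r = 1.0 / phi
    return math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def e_step(counts, pi, mu, phi):
    """Posterior probability that each count belongs to the zero-inflation component."""
    z = []
    for y in counts:
        excess = pi if y == 0 else 0.0
        z.append(excess / (excess + (1.0 - pi) * nb_pmf(y, mu, phi)))
    return z

def m_step(counts, z):
    """Weighted NB mean (weights 1 - z) and updated mixing proportion."""
    weights = [1.0 - zi for zi in z]
    mu = sum(w * y for w, y in zip(weights, counts)) / sum(weights)
    pi = sum(z) / len(z)
    return pi, mu

def log_likelihood(counts, pi, mu, phi):
    """Observed-data ZINB log-likelihood."""
    return sum(math.log((pi if y == 0 else 0.0) + (1.0 - pi) * nb_pmf(y, mu, phi))
               for y in counts)

def fit_zinb_em(counts, phi, n_iter=200):
    pi, mu = 0.5, sum(counts) / len(counts)  # simple starting values
    trace = []
    for _ in range(n_iter):
        z = e_step(counts, pi, mu, phi)
        pi, mu = m_step(counts, z)
        trace.append(log_likelihood(counts, pi, mu, phi))
    return pi, mu, trace

# Toy data: 40 zeros and 60 positive counts (made up for illustration).
positives = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5,
             6, 6, 6, 6, 7, 7, 7, 8, 8, 9, 9, 10, 11, 12, 14]
counts = [0] * 40 + positives * 2
phi = 0.2
pi_hat, mu_hat, trace = fit_zinb_em(counts, phi)
weights = [1.0 - zi for zi in e_step(counts, pi_hat, mu_hat, phi)]
```

With `phi` fixed, the weighted mean is the exact maximizer of the weighted NB likelihood for a common mean, so the log-likelihood in `trace` does not decrease across iterations. The final `weights` are the observation weights that would be passed on to a weighted NB analysis.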
28 Why we use logistic regression with library size in the EM
29 Case study: Islam et al. (2014)
- Single-cell RNA-seq (scRNA-seq) allowed the study of sparse cell populations.
- One of the first datasets we worked with was from Islam et al. (2014). It demonstrates scRNA-seq for 85 cells consisting of two cell populations in mouse: embryonic stem cells and fibroblasts.
- This paper was one of the first scRNA-seq studies and motivated our method development.
- Link to paper:
More informationWageningen Summer School in Econometrics. The Bayesian Approach in Theory and Practice
Wageningen Summer School in Econometrics The Bayesian Approach in Theory and Practice September 2008 Slides for Lecture on Qualitative and Limited Dependent Variable Models Gary Koop, University of Strathclyde
More informationRNA-seq. Differential analysis
RNA-seq Differential analysis DESeq2 DESeq2 http://bioconductor.org/packages/release/bioc/vignettes/deseq 2/inst/doc/DESeq2.html Input data Why un-normalized counts? As input, the DESeq2 package expects
More informationGenome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics
Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics I believe it is helpful to number your slides for easy reference. It's been a while since I took
More informationSta s cal Models: Day 3
Sta s cal Models: Day 3 Linear Mixed Effects Models and Generalized Linear Models Vincenzo Coia & David Kepplinger Applied Sta s cs and Data Science Group UBC Sta s cs February 9, 2017 Linear Mixed Effects
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationLecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu
Lecture: Gaussian Process Regression STAT 6474 Instructor: Hongxiao Zhu Motivation Reference: Marc Deisenroth s tutorial on Robot Learning. 2 Fast Learning for Autonomous Robots with Gaussian Processes
More informationECE521 Tutorial 11. Topic Review. ECE521 Winter Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides. ECE521 Tutorial 11 / 4
ECE52 Tutorial Topic Review ECE52 Winter 206 Credits to Alireza Makhzani, Alex Schwing, Rich Zemel and TAs for slides ECE52 Tutorial ECE52 Winter 206 Credits to Alireza / 4 Outline K-means, PCA 2 Bayesian
More informationStatistical challenges in RNA-Seq data analysis
Statistical challenges in RNA-Seq data analysis Julie Aubert UMR 518 AgroParisTech-INRA Mathématiques et Informatique Appliquées Ecole de bioinformatique, Station biologique de Roscoff, 2013 Nov. 18 J.
More informationNormal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability
More information