DEXSeq paper discussion



L Collado-Torres, December 10th

Outline
1. Background
2. DEXSeq paper
3. Results

Gene Expression (Background)
Source: http://www.ncbi.nlm.


High-Throughput Sequencing (Background)
Source: Metzker, Sequencing technologies: the next generation, 2010, Nat Rev Genet

Alignment (Mapping) (Background)
Source: Trapnell et al., How to map billions of short reads onto genomes, 2009, Nat Biotech

What can we find? (Background)
Source: Sorek and Cossart, Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity, 2010, Nat Rev Genet


Main ideas (DEXSeq paper)
- Compare two or more conditions of interest to find the differentially expressed exons (DEX).
- Focus on differential expression: assume a transcript inventory is available.
- Account for biological variation.
- Use generalized linear models (GLMs).
- Fine-tune the method to make it fast, control false positives and, when possible, increase power.

Simplifying the exome: counting bins (DEXSeq paper)
Source: Anders, Reyes, Huber; Detecting differential usage of exons from RNA-seq data, 2012, Genome Research
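The figure on this slide shows how overlapping exons from different transcripts are cut into non-overlapping counting bins. A minimal Python sketch of that flattening, assuming each transcript of a single gene is given as a list of half-open (start, end) exon intervals on the same strand. `counting_bins` is a hypothetical name, not a DEXSeq function, and a real annotation pipeline has to handle more (for example strands and overlapping genes).

```python
def counting_bins(transcripts):
    """Split the union of all exons of one gene into disjoint counting bins.

    transcripts: list of transcripts, each a list of half-open (start, end) exons.
    Every exon boundary from any transcript becomes a bin boundary.
    """
    exons = [exon for transcript in transcripts for exon in transcript]
    boundaries = sorted({pos for start, end in exons for pos in (start, end)})
    bins = []
    for left, right in zip(boundaries, boundaries[1:]):
        # Keep the interval only if at least one exon covers it;
        # intervals covered by no exon are intronic in every transcript.
        if any(start <= left and right <= end for start, end in exons):
            bins.append((left, right))
    return bins


# Hypothetical gene with three transcripts: t2 extends the first exon,
# t3 uses an alternative last exon.
t1 = [(100, 200), (300, 400)]
t2 = [(100, 250), (300, 400)]
t3 = [(100, 200), (500, 600)]
print(counting_bins([t1, t2, t3]))
```

The extension (200, 250) used only by t2 becomes its own bin, so a change in how often it is included can be tested separately from the shared part of the first exon.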

Model (DEXSeq paper)
Use count data and assume the counts follow a negative binomial distribution:
$$K_{ijl} \sim \mathrm{NB}(\text{mean} = s_j \mu_{ijl},\ \text{dispersion} = \alpha_{il}) \qquad (1)$$
- $K_{ijl}$: count for counting bin $l$ of gene $i$ in sample $j = 1, \dots, m$
- $s_j$: size factor of sample $j$, needed because each sample is sequenced at a different depth
- $\alpha_{il}$: dispersion parameter
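The size factors $s_j$ in equation (1) have to be estimated from the data. A stdlib Python sketch of the median-of-ratios estimator from DESeq, which, to my understanding, DEXSeq also applies to its table of counting-bin counts. `size_factors` is a hypothetical helper, and rows with a zero count in any sample are simply skipped.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts[g][j] is the count of feature g (e.g. a counting bin) in sample j.
    """
    n_samples = len(counts[0])
    usable, log_geo_means = [], []
    for row in counts:
        # The geometric mean is zero if any count is zero, so skip such rows.
        if all(c > 0 for c in row):
            usable.append(row)
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    factors = []
    for j in range(n_samples):
        # log of (count in sample j / geometric mean of the row), per feature
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical table: sample 2 was sequenced twice as deeply as sample 1.
print(size_factors([[10, 20], [30, 60], [0, 7], [5, 10]]))
```

Taking the median rather than the mean of the ratios keeps a few strongly changed features from distorting the depth estimate.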

Poisson vs NB (DEXSeq paper)
Poisson GLM
- Outcome: $Y \sim \mathrm{Poisson}(\mu)$
- Link function: $\log \mu = x^\top \beta$
- Variance function: $\mathrm{Var}(Y) = V(\mu) = \alpha\mu$ with $\alpha = 1$; allowing $\alpha \neq 1$ is the quasi-likelihood approach.
Negative binomial
- Model: Gamma-Poisson mixture construction.
- Assume an unobserved random variable $E \sim \mathrm{Gamma}(\theta, 1/\theta)$ (shape $\theta$, scale $1/\theta$), with mean $\theta \cdot 1/\theta = 1$ and variance $\theta \cdot 1/\theta^2 = 1/\theta$.
- Assume $Y \mid E \sim \mathrm{Poisson}(\mu E)$.
- Then $Y$ has a negative binomial distribution with mean $\mu$ and variance $\mu + \mu^2/\theta = \mu(1 + \mu/\theta)$. In the DEXSeq paper $\alpha = 1/\theta$, so the variance is $\mu + \alpha\mu^2$.
- The variance of $Y$ increases quadratically with the mean rather than linearly.
Source: slides by Roger Peng
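The mean and variance of the mixture follow from the law of total variance: $\mathrm{Var}(Y) = \mathbb{E}[\mathrm{Var}(Y \mid E)] + \mathrm{Var}(\mathbb{E}[Y \mid E]) = \mu + \mu^2/\theta$. A short NumPy simulation, assuming NumPy's `Generator` API, that draws from the mixture exactly as constructed above and compares the empirical moments with these formulas and with a plain Poisson sample of the same mean. The values of `mu` and `theta` are arbitrary choices for illustration.

```python
import numpy as np


def gamma_poisson_sample(mu, theta, size, rng):
    """Draw Y from the Gamma-Poisson mixture described on the slide."""
    # E ~ Gamma(shape=theta, scale=1/theta): mean 1, variance 1/theta
    e = rng.gamma(shape=theta, scale=1.0 / theta, size=size)
    # Y | E ~ Poisson(mu * E)
    return rng.poisson(mu * e)


rng = np.random.default_rng(0)
mu, theta = 10.0, 2.0
alpha = 1.0 / theta  # the paper's dispersion parameter

y = gamma_poisson_sample(mu, theta, 200_000, rng)
print(f"NB mean {y.mean():.2f} (theory {mu})")
print(f"NB variance {y.var():.2f} (theory {mu + alpha * mu**2})")

# Plain Poisson with the same mean: variance equals the mean
poisson = rng.poisson(mu, size=200_000)
print(f"Poisson variance {poisson.var():.2f} (theory {mu})")
```

With these settings the NB variance (about 60) is six times the Poisson variance (about 10) at the same mean, which is the extra biological variability the Poisson model cannot express.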

Main log-linear model (DEXSeq paper)
$$\log \mu_{ijl} = \beta^G_i + \beta^E_{il} + \beta^C_{i\rho_j} + \beta^{EC}_{i\rho_j l} \qquad (2)$$
- $\beta^G_i$: baseline expression strength of gene $i$
- $\beta^E_{il}$: log of the expected fraction of the reads mapped to gene $i$ that overlap counting bin $l$
- $\beta^C_{i\rho_j}$: log fold change in the overall expression of gene $i$ under condition $\rho_j$
- $\rho_j$: experimental condition of sample $j$
- $\beta^{EC}_{i\rho_j l}$: effect of condition $\rho_j$ on the fraction of reads falling into counting bin $l$
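One consequence of equation (2): $\beta^C_{i\rho_j}$ multiplies every bin of the gene by the same factor, so it cannot change the share of the gene's reads that falls into any bin; only $\beta^{EC}_{i\rho_j l}$ can. A small sketch that evaluates the expected bin shares for one gene under one condition, with hypothetical coefficient values, the size factor set to 1, and $\beta^{EC} = 0$ for the reference condition.

```python
import math


def expected_bin_fractions(beta_g, beta_e, beta_c, beta_ec):
    """Expected share of a gene's reads per counting bin under model (2).

    beta_e and beta_ec are per-bin lists for one gene; beta_c is the
    gene-level log fold change for the condition considered.
    """
    mu = [math.exp(beta_g + be + beta_c + bec) for be, bec in zip(beta_e, beta_ec)]
    total = sum(mu)
    return [m / total for m in mu]


beta_e = [math.log(0.5), math.log(0.3), math.log(0.2)]  # hypothetical bin shares
no_change = [0.0, 0.0, 0.0]

control = expected_bin_fractions(1.0, beta_e, 0.0, no_change)
gene_doubled = expected_bin_fractions(1.0, beta_e, math.log(2), no_change)
bin3_tripled = expected_bin_fractions(1.0, beta_e, 0.0, [0.0, 0.0, math.log(3)])

print(control)       # shares 0.5, 0.3, 0.2
print(gene_doubled)  # same shares: a gene-level change is not DEX
print(bin3_tripled)  # bin 3 gains share at the expense of bins 1 and 2
```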

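Variability: gene expression + exon usage (DEXSeq paper)
- Variability in gene expression: the total number of transcripts of gene $i$ in a sample differs from the value expected under condition $\rho_j$.
- Variability in exon usage: samples differ in how much they use individual exons (counting bins).
$$\log \mu_{ijl} = \beta^G_i + \beta^E_{il} + \beta^S_{ij} + \beta^{EC}_{i\rho_j l} \qquad (3)$$
Model (3) replaces $\beta^C_{i\rho_j}$ with a sample-specific term $\beta^S_{ij}$, which absorbs the variability in gene expression.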

Dispersion estimates (DEXSeq paper)
Source: Anders, Reyes, Huber; Detecting differential usage of exons from RNA-seq data, 2012, Genome Research
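The slide only shows the paper's figure of dispersion estimates. To make the quantity concrete, here is a rough method-of-moments estimate of $\alpha_{il}$ for a single counting bin, derived from $\mathrm{Var}(K_{ijl}/s_j) = \mu_{il}/s_j + \alpha_{il}\mu_{il}^2$, assuming the mean $\mu_{ijl} = \mu_{il}$ is the same in all samples. This is not the estimator used in the paper, which has to work with few replicates; `moment_dispersion` is a hypothetical helper, and the simulation uses an unrealistically large number of samples so the estimate is stable.

```python
import numpy as np


def moment_dispersion(counts, size_factors):
    """Method-of-moments NB dispersion for one counting bin across samples."""
    q = counts / size_factors   # normalized counts K_j / s_j
    w = q.mean()                # estimate of the common mean mu
    v = q.var(ddof=1)           # sample variance of the normalized counts
    # Solve mean(Var(K_j / s_j)) = w * mean(1 / s_j) + alpha * w**2 for alpha;
    # clip at zero because the moment estimate can come out negative.
    alpha = (v - w * np.mean(1.0 / size_factors)) / w**2
    return max(float(alpha), 0.0)


# Simulate one bin: K_j ~ NB(mean = s_j * mu, dispersion = alpha) via Gamma-Poisson
rng = np.random.default_rng(1)
m = 50_000
true_mu, true_alpha = 20.0, 0.1
s = rng.choice([0.5, 1.0, 2.0], size=m)
e = rng.gamma(shape=1.0 / true_alpha, scale=true_alpha, size=m)
k = rng.poisson(s * true_mu * e)

alpha_hat = moment_dispersion(k, s)
print(f"estimated alpha {alpha_hat:.3f} (true {true_alpha})")
```

With only a handful of replicates per condition this estimate is very noisy, which is why per-bin estimates are usually stabilized by sharing information across bins.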

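Analysis of Deviance (DEXSeq paper)
- Deviance: $D(\hat\beta) = 2\tilde{\ell} - 2\ell(\hat\beta; y)$, where $\tilde{\ell}$ is the log-likelihood of the saturated model.
- Two spaces for $\beta$: a small (nested) space $S \subset L$ and a large space $L$, with $H_0: \beta \in S$ and $H_a: \beta \in L \setminus S$.
- Likelihood ratio: $\mathrm{LR} = \dfrac{\mathcal{L}(\hat\beta_S; y)}{\mathcal{L}(\hat\beta_L; y)}$. Under $H_0$, $-2\log\mathrm{LR} \sim \chi^2_{\dim L - \dim S}$ (asymptotically).
- Note: $D(\hat\beta_S) - D(\hat\beta_L) = -2[\ell(\hat\beta_S; y) - \ell(\hat\beta_L; y)] = -2\log\mathrm{LR}$.
Source: slides by Roger Peng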
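In practice the test only needs the two deviances: $D(\hat\beta_S) - D(\hat\beta_L)$ is compared with a $\chi^2$ distribution with $\dim L - \dim S$ degrees of freedom. A stdlib Python sketch; `chi2_sf` and `lrt_pvalue` are hypothetical helpers, and the closed-form $\chi^2$ tail used here only covers integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: df = 1 tail plus terms sqrt(2/pi) e^{-x/2} x^{k-1/2} / (1*3*...*(2k-1))
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # k = 1 term
    for k in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * k + 1)
    return tail


def lrt_pvalue(deviance_small, deviance_large, df):
    """p-value of the likelihood ratio test from the deviances of nested fits."""
    return chi2_sf(deviance_small - deviance_large, df)


# Hypothetical deviances for one counting bin, two conditions (df = 1)
print(lrt_pvalue(deviance_small=25.3, deviance_large=18.1, df=1))
```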

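Testing for DEX: ANODEV (DEXSeq paper)
For each counting bin $l'$ of gene $i$, fit two models:
$$\log \mu_{ijl} = \beta^G_i + \beta^E_{il} + \beta^S_{ij} \qquad (4)$$
$$\log \mu_{ijl} = \beta^G_i + \beta^E_{il} + \beta^S_{ij} + \beta^{EC}_{i\rho_j l}\,\delta_{ll'} \qquad (5)$$
where $\delta_{ll'} = 1$ if $l = l'$ and $0$ otherwise, so model (5) adds a condition effect only for the tested bin $l'$.
Then compare the two fits using analysis of deviance (ANODEV), and control the FDR by adjusting the p-values with the Benjamini-Hochberg method.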
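The Benjamini-Hochberg adjustment can be written in a few lines: sort the $m$ p-values, scale the $k$-th smallest by $m/k$, and take running minima from the largest down so the adjusted values stay monotone. A stdlib sketch; `benjamini_hochberg` is a hypothetical name, and in this setting the $m$ p-values would be those of all tested counting bins.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-bin p-values
p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p))
```

At the 10% FDR used in the results section, a bin would be called if its adjusted p-value is below 0.1.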

Finding DEX: the pasilla knockdown example in Drosophila melanogaster (Results)
Source:

Detection power depends on the mean (Results)
Source: reproduced with code from

Without considering biological variation (Results)
Source:

Interesting comparison (Results)
Mock comparison: test for DEX between replicates from the control condition, using an FDR of 10%.
- DEXSeq: 8 genes (159 in the real control vs. treatment comparison)
- Cuffdiff v1.3.0: 639 genes (37 in the real comparison)
The same pattern holds in other data sets.

Thanks!
Main source: Anders, Reyes, Huber; Detecting differential usage of exons from RNA-seq data, 2012, Genome Research. PMID:
