Isoform discovery and quantification from RNA-Seq data
C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Deloger
November 2016
Introduction: Forewords
Haas BJ, Zody MC. Advancing RNA-Seq analysis. Nat Biotechnol. 2010 May;28(5):421-3.
Quantification from RNA-Seq data
Previous talk: quantification at the gene level (Condition 1 vs. Condition 2).
But genes may be differentially spliced: many different mRNAs (isoforms) can arise from a single locus.
Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C. EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics. 2013 Apr 15;29(8):1035-43.
Quantification from RNA-Seq data
And isoforms may be differentially expressed between two conditions.
Leng N, et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics. 2013 Apr 15;29(8):1035-43.
Classification and usage of splicing events
Histogram: number of splicing events reported by AStalavista [1] on the latest available RefSeq annotation of each species (ce2, dm3, hg18, tair10).
[1] Foissac S, Sammeth M (2007) ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic Acids Research 35:W297-299. http://genome.crg.es/astalavista/
A real need?
- transcriptome from a new condition
- tissue-specific transcriptome
- different developmental stages
- transcriptome from a non-model organism
- cancer cells
- RNA maturation mutants
- ...
How to manage RNA-Seq data with genes subject to differential splicing?
- Is it possible to discover new isoforms? Cufflinks, Cuffmerge
- Is it possible to quantify the abundance of each isoform? RSEM, EBSeq
Isoform reconstruction and quantification from RNA-Seq
Introduction: Data
RNA-Seq data: profiling of sex-biased expression in Drosophila melanogaster
- tissue: whole flies
- developmental stage, age: adult, 5-7 days post-eclosion
- conditions: sex (female or male), biological duplicates
- samples (SRA [1]): Female rep1 GSM694258, Female rep2 GSM694259, Male rep1 GSM694260, Male rep2 GSM694261
- library: polyA+ mRNA, paired-end 2x75 bp, insert size +/- 200 bp
- genome reduction: chr 3R (autosome), from 377 to 13,947,890
[1] http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=gsm6942xx
Practical, step 1: import the data from the Published histories
Discovery of new transcripts: Cufflinks
Isoform reconstruction protocol (Tuxedo suite): Trapnell C, et al. Differential gene and transcript expression analysis of RNA-Seq experiments with TopHat and Cufflinks. Nat Protoc. 2012 Mar 1;7(3):562-78.
Practical, step 2: Cufflinks
Parameters:
- SAM or BAM file of aligned RNA-Seq reads: your mapping file
- Use Reference Annotation: set to "Use reference annotation as guide"
- Reference Annotation: your genome annotation
- Use effective length correction: No
We are interested in isoform detection, not in isoform quantification.
Cufflinks algorithm
Trapnell C, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010 May;28(5):511-5.
Our Cufflinks usage
- Goal: isoform discovery without quantification, so the parameters related to quantification (normalization, length correction) do not matter.
- Two signal-to-noise thresholds, one for isoforms and one for splicing events:
  - minimum expression ratio: given isoform / major isoform
  - read-count ratio: splicing site / intron
- With a well-known genome (fruit fly): use the reference annotation as a guide (Cufflinks may also be used without any reference annotation).
Trapnell C, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010 May;28(5):511-5.
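For readers working outside Galaxy, the settings above map roughly onto a Cufflinks command line. The sketch below only builds the argument list and does not run anything. The file names, output directory and threshold values are placeholders, and matching the two thresholds to `--min-isoform-fraction` and `--pre-mrna-fraction` is our reading of the slide, not something the training material states.

```python
import shlex

def cufflinks_command(bam, guide_gtf, out_dir,
                      min_isoform_fraction=0.1, pre_mrna_fraction=0.15):
    """Build a Cufflinks call for reference-guided isoform discovery.

    Threshold values are illustrative; tune them to your data.
    """
    return [
        "cufflinks",
        "--GTF-guide", guide_gtf,             # annotation used as a guide, novel isoforms still allowed
        "--min-isoform-fraction", str(min_isoform_fraction),  # given isoform / major isoform
        "--pre-mrna-fraction", str(pre_mrna_fraction),        # intronic signal vs. spliced signal
        "--no-effective-length-correction",   # quantification is not the goal here
        "--output-dir", out_dir,
        bam,
    ]

# Hypothetical file names, for illustration only.
cmd = cufflinks_command("female_rep1.bam", "dmel_chr3R.gtf", "cufflinks_female_rep1")
print(shlex.join(cmd))
```

The command would be run once per sample, which is why a merging step is needed afterwards.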
Cuffmerge: merge transcripts from many samples
- Cufflinks is run on each sample, which yields a different list of transcripts per sample.
- These lists must be unified with each other and with the reference annotation: cuffmerge.
Trapnell C, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010 May;28(5):511-5.
Practical, step 3: Cuffmerge
Parameters:
- GTF file produced by Cufflinks: your first genomic annotation produced by Cufflinks (gtf)
- Additional GTF Input Files: repeat up to your last annotation
- Use Reference Annotation: set to Yes, then select your initial genomic annotation (gtf)
- Use Sequence Data: set to Yes
- Choose the source for the reference list: set to History
- Using Reference file: your genomic sequence (fasta)
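As a command-line counterpart of the form above, here is a minimal sketch that writes the list of per-sample assemblies and builds a cuffmerge call. Sample names, file names and the output directory are hypothetical, and the command is only printed, not executed.

```python
import shlex
import tempfile
from pathlib import Path

def cuffmerge_command(assemblies, ref_gtf, ref_fasta, out_dir, list_path):
    """Write the assembly list read by cuffmerge and build the command."""
    # cuffmerge takes a text file with one Cufflinks GTF path per line
    Path(list_path).write_text("\n".join(assemblies) + "\n")
    return [
        "cuffmerge",
        "--ref-gtf", ref_gtf,          # initial genome annotation
        "--ref-sequence", ref_fasta,   # genome sequence
        "-o", out_dir,
        str(list_path),
    ]

# Hypothetical per-sample Cufflinks outputs.
samples = ["female_rep1", "female_rep2", "male_rep1", "male_rep2"]
assemblies = [f"cufflinks_{s}/transcripts.gtf" for s in samples]
list_file = Path(tempfile.mkdtemp()) / "assemblies.txt"
cmd = cuffmerge_command(assemblies, "dmel_chr3R.gtf", "dmel_chr3R.fa",
                        "merged_asm", list_file)
print(shlex.join(cmd))
```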
Results: understanding the Cufflinks classification of the transcripts (class codes)
http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/
Class code "="
The following reads are mapped to an existing transcript of the fly genome (here female sample 2 and male sample 1), with neither differential expression nor differential processing.
Class code "="
Another example of the "=" class, this time differentially expressed with respect to the male condition.
Class code "j"
The following reads are mapped to part of an existing transcript. This is a potential novel isoform in the female sample.
Class code "u"
The following reads are mapped to an intergenic region.
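To get an overview of the class codes in a merged annotation, one can tally the `class_code` attribute per transcript. The sketch below assumes GTF records carrying `transcript_id` and `class_code` attributes; the records themselves are made up for illustration.

```python
import re
from collections import Counter

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
CLASS_CODE = re.compile(r'class_code "([^"]+)"')

def count_class_codes(gtf_lines):
    """Count transcripts (not exons) per class code."""
    code_by_transcript = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue  # skip comments and malformed lines
        tid = TRANSCRIPT_ID.search(fields[8])
        code = CLASS_CODE.search(fields[8])
        if tid and code:
            code_by_transcript[tid.group(1)] = code.group(1)
    return Counter(code_by_transcript.values())

def record(start, end, tid, code):
    """Build one made-up exon record."""
    attrs = f'gene_id "XLOC_1"; transcript_id "{tid}"; class_code "{code}";'
    return "\t".join(["3R", "Cufflinks", "exon", str(start), str(end),
                      ".", "+", ".", attrs])

example = [
    record(100, 200, "TCONS_1", "="),
    record(300, 400, "TCONS_1", "="),  # second exon of the same transcript
    record(100, 250, "TCONS_2", "j"),
    record(900, 990, "TCONS_3", "u"),
]
codes = count_class_codes(example)
print(dict(codes))
```

Transcripts with code "j" or "u" are the candidates for novel isoforms and novel intergenic transcripts, respectively.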
Differential expression, transcript level: Forewords
Isoforms differential expression:
- RSEM aligns reads on a transcript reference and counts them
- EBSeq finds DE isoforms across two conditions
- plus some intermediate steps to link RSEM and EBSeq
Isoforms differential expression: RSEM
RSEM aligns reads on a transcript reference, obtained either:
- from the genome annotation (GTF file); only stranded features are accepted, so a filter is needed to remove unstranded isoforms
- directly from a transcript assembly (for a non-model organism, cancer cells, etc.)
RSEM: removing unstranded new isoforms
RSEM requires stranded features only, so we have to filter out unstranded isoforms.
Practical, step 4: filter for RSEM
Parameters:
- Filter: the file to be filtered, our merged gtf
- With following condition: c7!='.'
- Number of header lines to skip: 0 (we have no header at all)
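The same filter can be sketched in a few lines of Python: keep a GTF record only when its 7th column (the strand) is not ".". The example records are invented for illustration.

```python
def keep_stranded(gtf_lines):
    """Yield GTF records whose strand column (7th field) is not '.'."""
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] != ".":
            yield line

def record(strand, tid):
    """Build one made-up exon record with the given strand."""
    return "\t".join(["3R", "Cufflinks", "exon", "100", "200", ".",
                      strand, ".", f'transcript_id "{tid}";'])

example = [record("+", "TCONS_1"), record(".", "TCONS_2"), record("-", "TCONS_3")]
kept = list(keep_stranded(example))
print(len(kept), "of", len(example), "records kept")
```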
RSEM: preparing the transcript reference
Once the transcript reference is defined (from the stranded features of the GTF file, or directly from a transcript assembly), RSEM adds a polyA tail to each transcript (to capture reads from the 3' end of mRNAs) and indexes the reference (to save time).
Tool: RSEM prepare reference.
Practical, step 5: RSEM prepare reference
Parameters:
- Reference transcript source: set to "reference genome and gtf"
- reference fasta file: your genome sequence (fasta)
- gtf or gff3 file: your enhanced and merged genome annotation
- Use Bowtie2: Yes
RSEM features (RSEM: RNA-Seq by Expectation Maximization)
RSEM estimates the uncertainty due to both multiread allocation and random sampling effects, using all valid mappings of each read (mapping scores, probability for a read to come from a locus).
This requires a specific mapping (SAM/BAM) file that reports all the valid mappings of each read, so the mapping step (bowtie/bowtie2) has to be relaunched.
Some RSEM features:
- strand-specificity
- highly 5' or 3' biased distribution of read positions
- for single-end reads, the fragment length has to be fixed
- no support for gapped mapping (no indels)
EM algorithm: estimating the expression of cognate isoforms (EM: Expectation-Maximization)
Figure: first 3 cycles of the EM algorithm.
Abundance of the red isoform estimated after the 1st M-step: (1/3 read a + 1/2 read c + 1 read d + 1/2 read e) / (total number of reads), i.e. (0.33 + 0.5 + 1 + 0.5) / 5 = 0.47.
- proved to converge
- stop criterion: all probabilities (that a fragment is derived from a transcript) greater than 10^-7 have a relative change smaller than 10^-3
Tool: RSEM calculate expression.
L. Pachter: Models for transcript quantification from RNA-Seq, http://arxiv.org/pdf/1104.3889v2.pdf
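The arithmetic above can be reproduced with a toy EM loop. This is a minimal sketch, not RSEM's implementation: it assumes three isoforms of equal length, a uniform starting point, and a read/isoform compatibility map in which only the red isoform's reads (a, c, d, e) come from the slide; which of green or blue shares reads b, c and e is our assumption.

```python
def em_abundances(compat, n_iter=3):
    """Toy EM on a read -> compatible isoforms map (equal lengths assumed)."""
    isoforms = sorted({iso for isos in compat.values() for iso in isos})
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}  # uniform start
    history = []
    for _ in range(n_iter):
        # E-step: split each read among its isoforms, proportionally to theta
        counts = dict.fromkeys(isoforms, 0.0)
        for isos in compat.values():
            total = sum(theta[i] for i in isos)
            for i in isos:
                counts[i] += theta[i] / total
        # M-step: abundances are the fractional counts over the read total
        theta = {i: counts[i] / len(compat) for i in isoforms}
        history.append(theta)
    return history

# Assumed compatibility map (see the hedges above).
compat = {
    "a": ("red", "green", "blue"),
    "b": ("green", "blue"),
    "c": ("red", "blue"),
    "d": ("red",),
    "e": ("red", "green"),
}
history = em_abundances(compat)
for step, theta in enumerate(history, start=1):
    print(step, {iso: round(value, 2) for iso, value in theta.items()})
```

After the first cycle the red isoform gets (1/3 + 1/2 + 1 + 1/2) / 5, the value quoted on the slide; later cycles shift more weight towards red because read d can only come from it.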
Practical, step 6: RSEM calculate expression
Parameters:
- RSEM Reference Source: set to "From your history"
- RSEM reference: your previous reference
- Library type: set to Paired End Reads
- Read 1 fastq file and Read 2 fastq file: your reads (fastq)
- Use bowtie 1 or 2?: set to Bowtie 2
- Is the library strand specific?: set to forward orientation
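Outside Galaxy, the two RSEM steps (prepare reference, calculate expression) roughly correspond to the two commands built below. This is a sketch with hypothetical file and sample names; nothing is executed. The strand setting is expressed here with `--forward-prob 1.0`; strand-related options have changed between RSEM releases, so check the help of your installed version.

```python
import shlex

def rsem_prepare_reference(genome_fasta, gtf, ref_name):
    """Build the reference from a genome and a (stranded) GTF, with Bowtie 2 indices."""
    return ["rsem-prepare-reference", "--gtf", gtf, "--bowtie2",
            genome_fasta, ref_name]

def rsem_calculate_expression(fastq1, fastq2, ref_name, sample_name):
    """Quantify one paired-end, forward-stranded sample with Bowtie 2."""
    return ["rsem-calculate-expression",
            "--paired-end",
            "--bowtie2",
            "--forward-prob", "1.0",   # forward-oriented, strand-specific library
            fastq1, fastq2, ref_name, sample_name]

# Hypothetical names, for illustration only.
prepare = rsem_prepare_reference("dmel_chr3R.fa", "merged_stranded.gtf", "rsem_ref/dmel")
quantify = rsem_calculate_expression("female_rep1_1.fastq", "female_rep1_2.fastq",
                                     "rsem_ref/dmel", "female_rep1")
for cmd in (prepare, quantify):
    print(shlex.join(cmd))
```

The reference is prepared once; the quantification command is then repeated for each of the four samples.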
Isoforms differential expression: EBSeq
EBSeq: an empirical Bayesian approach that models a number of features observed in RNA-Seq data.
Tool: "Isoform level DE test across two conditions", which runs EBSeq to find DE isoforms across two conditions.
EBSeq algorithm
Mapping uncertainty increases with the presence of multiple isoforms of a given gene. In EBSeq:
- the expected count of an isoform follows a negative binomial distribution
- isoform-specific means and variances are estimated via the EM algorithm
- the uncertainty of isoform expression estimation is accommodated by modeling the differential variability observed in distinct groups of isoforms
- 3 groups are used, according to the number of isoforms of each gene (1, 2, or 3 and more)
Leng N, et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics. 2013 Apr 15;29(8):1035-43.
EBSeq directly models isoform expression
A collective analysis of all isoforms:
- reduces the power for identifying DE isoforms in group 1 (the true variances in that group are lower, on average, than those derived from the full collection of isoforms)
- increases the false discoveries in the 2 other groups (their true variances are higher)
Figure: change of the estimation uncertainty with increasing isoform complexity.
Leng N, et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics. 2013 Apr 15;29(8):1035-43.
Isoforms differential expression: EBSeq, pre-processing
EBSeq: an empirical Bayesian approach that models a number of features observed in RNA-Seq data.
2 workflows:
- Create IG Vector: creates a vector with the related group for each isoform
- Create Expression Table: 4 RSEM outputs -> 1 EBSeq input
Then "Isoform level DE test across two conditions" runs EBSeq to find DE isoforms across two conditions.
From RSEM to EBSeq
We have:
- 4 files (1 per replicate)
- these files are identically ordered by transcript name
We need:
- 1 file containing the number of isoforms each gene owns: the IG vector
- 1 file with all expected expression values: the expression table
We have to convert the RSEM output to fit the requirements of the EBSeq input.
Practical, step 7: EBSeq IG Vector (1/3)
What is the IG Vector?
- The IG Vector is a table with a single column of integers.
- Each row corresponds to the transcript on the same row of the expression table.
- Each integer is the group (1, 2 or 3) of the isoform, according to the number of isoforms of its gene.
Tools:
- "Cut" and "Remove beginning", from the Text Manipulation section
- "Get Ig vector from gene-isoform mapping for isoform level DE analysis", from the EBSeq section
Practical, step 7: EBSeq IG Vector (2/3)
RSEM isoform abundance table -> EBSeq IG Vector
Practical, step 7: EBSeq IG Vector (3/3)
Parameter:
- input: a count table from RSEM
Caution: all isoform abundance tabular files list the transcript and gene names in the same order, line by line. The Create IG Vector workflow relies on this order, so any of the isoform abundance files may be used in this step.
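The grouping rule described for the IG Vector can be sketched directly: count the isoforms of each gene, then cap the count at 3. This illustrates the rule only; it is not the code of the Galaxy workflow or of EBSeq, and the gene identifiers are made up.

```python
from collections import Counter

def ig_vector(gene_ids):
    """Return one group (1, 2 or 3) per isoform, in the input order.

    gene_ids: the gene of each isoform, in the same order as the
    expression table (e.g. the gene column of an RSEM isoform table).
    """
    gene_ids = list(gene_ids)
    isoforms_per_gene = Counter(gene_ids)
    # genes with 3 isoforms or more all fall into group 3
    return [min(isoforms_per_gene[gene], 3) for gene in gene_ids]

# Hypothetical gene column: G1 has 1 isoform, G2 has 2, G3 has 4.
genes = ["G1", "G2", "G2", "G3", "G3", "G3", "G3"]
print(ig_vector(genes))
```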
From RSEM Expression Table to EBSeq Data Matrix
The expression table has 5 columns:
- transcript name
- expected expression of F1, F2, M1 and M2
It is obtained by merging the 5th column of each RSEM isoform expression result.
Tool: Create Expression Table, available among the shared workflows.
Practical, step 8: Create Expression Table
Parameters:
- First Dataset, Second Dataset, Third Dataset and Fourth Dataset: your count tables
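What the Create Expression Table workflow produces can be approximated as follows: take the 5th column of each isoform table and paste the four columns next to the transcript names, after checking that the tables share the same order. This is a sketch under assumptions: the tables are tab-separated with one header line, and the tiny tables below are invented.

```python
import csv
import io

def expression_table(handles, sample_names, column=4):
    """Merge one column (index 4 = 5th column) of several isoform tables."""
    order, merged = None, []
    for handle in handles:
        rows = list(csv.reader(handle, delimiter="\t"))[1:]  # drop the header
        ids = [row[0] for row in rows]
        if order is None:
            order = ids
            merged = [[tid] for tid in ids]
        elif ids != order:
            raise ValueError("isoform tables are not in the same transcript order")
        for out_row, row in zip(merged, rows):
            out_row.append(row[column])
    return [["transcript_id"] + list(sample_names)] + merged

HEADER = ("transcript_id\tgene_id\tlength\teffective_length\t"
          "expected_count\tTPM\tFPKM\tIsoPct\n")

def fake_table(counts):
    """Made-up isoform table for two transcripts of one gene."""
    body = "".join(f"T{i}\tG1\t1000\t850\t{count}\t0\t0\t0\n"
                   for i, count in enumerate(counts, start=1))
    return io.StringIO(HEADER + body)

tables = [fake_table(counts) for counts in ([10, 5], [12, 4], [3, 20], [2, 25])]
matrix = expression_table(tables, ["F1", "F2", "M1", "M2"])
for row in matrix:
    print("\t".join(row))
```

The order check makes explicit the assumption that all four tables list the transcripts identically.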
Practical, step 9: EBSeq differential expression
Parameters:
- Isoform Expression: our data matrix
- The first row is Sample Names: Yes
- Enter which condition each sample belongs to: M, M, F, F
- Ig Vector: our IG Vector
List of DE isoforms
Conclusion
Isoforms differential expression: methods and tools
A classical RNA-Seq analysis, with many methods (and tools):
- Expression estimation:
  - Bayesian estimation of the parameters of a model: BitSeq, Cufflinks, eXpress
  - Expectation-Maximization approach to infer isoform abundances: RSEM-EBSeq, Sailfish/Salmon, Kallisto
- Mapping to:
  - the genome: Cuffdiff2, BitSeq, FluxCapacitor
  - the transcriptome: eXpress, RSEM-EBSeq
- Mapping-free: Sailfish/Salmon, Kallisto
Mapping-free?
Kallisto example: de Bruijn graph on the transcriptome
Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal RNA-Seq quantification with kallisto. Nat Biotechnol. 2016 May;34(5):525-7.
Mapping-free?
Kallisto example: de Bruijn graph on the transcriptome
- Supports multi-mapped reads
- The algorithm needs to be adapted to use stranded RNA-Seq
- No mapping = no visualization
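To give an intuition of the mapping-free idea, the sketch below indexes the k-mers of a toy transcriptome and finds, for each read, the set of transcripts compatible with all of its k-mers. It is a deliberately simplified illustration, not kallisto's algorithm: there is no graph structure, no k-mer skipping, no strand handling and no abundance estimation, and the sequences are invented.

```python
def build_kmer_index(transcripts, k):
    """Map each k-mer to the set of transcripts that contain it."""
    index = {}
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index

def compatible_transcripts(read, index, k):
    """Intersect the transcript sets of all k-mers of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            return set()  # a k-mer absent from the transcriptome
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()

# Two made-up isoforms sharing their 5' end.
transcripts = {"t1": "ATGGCGTACCTGA", "t2": "ATGGCGTTTCTGA"}
K = 5
index = build_kmer_index(transcripts, K)
shared = compatible_transcripts("ATGGCGT", index, K)    # read in the common part
specific = compatible_transcripts("GCGTACC", index, K)  # read over the t1-specific part
unknown = compatible_transcripts("GGGGGGG", index, K)   # read absent from both
print(sorted(shared), sorted(specific), sorted(unknown))
```

A read falling in the shared region stays compatible with both isoforms, which is how multi-mapped reads are kept without a base-level alignment.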
Isoforms with RNA-Seq: not yet
Isoform discovery and quantification from RNA-Seq is not yet a well-established measure:
- Methods based on the transcriptome are generally better (for quantification, not for discovery).
- EM methods are better than count-based methods (many EM methods are available, but they differ little in accuracy).
- The more abundant an isoform is, the more accurately it is inferred.
- Major bottlenecks: short read length (compared to 2.2 kb for mammalian transcripts) and multi-mapped reads.
Evaluating the accuracy of computational methods for isoform abundance is difficult:
- too few isoforms have experimental validation (e.g. qRT-PCR)
- synthetically generated datasets may not adequately capture the complexity of RNA-Seq experiments
Improve?
- Don't forget the microarrays designed for isoform detection (model organisms only, not for the discovery of new isoforms).
- Gain statistical power with spike-in measurements.
- Design protocols like ribodepletion, but targeting highly expressed housekeeping genes (to enrich for interesting transcripts).
- Complete isoform definitions with other NGS studies?
  - ChIP-Seq with a protein from the spliceosome as target
  - capturing the 5' ends of RNAs
  - ...
- Full-length cDNA technology (Pacific Biosciences)? Throughput still too low (10^4 transcripts, summer 2015).
Adapt! Biological question + organism + data:
- parameters, software, sequencing protocols (single or paired-end, stranded or not)
Kanitz A, Gypas F, Gruber AJ, Gruber AR, Martin G, Zavolan M. Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-Seq data. Genome Biol. 2015 Jul 23;16:150.
RNA-Seq: just a photo
RNA-Seq is just a single, sampled capture of RNA at a given position, at a given time, in one biological experiment... a poor-quality photo compared to real life.
Bonus: creating a workflow
1/9 Start a new workflow.
2/9 Add some details: both the name and the annotation are important for managing your own workflows.
3/9 Add some details: the workflow is created empty; let us add some tags before diving into the tools.
4/9 Cut.
5/9 Add some actions.
6/9 Remove Beginning.
7/9 EBSeq IG Vector.
8/9 Custom output.
9/9 End. Do not forget to save!