Isoform discovery and quantification from RNA-Seq data

Size: px
Start display at page:

Download "Isoform discovery and quantification from RNA-Seq data"

Transcription

1 Isoform discovery and quantification from RNA-Seq data C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Deloger November 2016 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

2 Introduction Forewords Haas BJ, Zody MC.: Advancing RNA-Seq analysis. Nat Biotechnol May;28(5):421-3 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

3 Introduction Forewords Quantification from RNA-Seq data Previous talk: quantification within the gene level Condition 1 Condition 2 Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C.: EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics Apr 15;29(8): C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

4 Introduction Forewords Quantification from RNA-Seq data Previous talk: quantification within the gene level Condition 1 Condition 2 but Genes may be differentially spliced many different mrnas from a single locus isoforms Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C.: EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics Apr 15;29(8): C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

5 Introduction Forewords Quantification from RNA-Seq data And isoforms may be differentially expressed between 2 conditions: Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C.: EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics Apr 15;29(8): C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

6 Introduction Forewords Classification and usage of splicing events Histogram: AStalavista 1 + lastests RefSeq versions available of species annotations, ce2, dm3, hg18, tair10 (number of splicing events) 1. Foissac S, Sammeth M (2007) ASTALAVISTA: dynamic and flexible analysis of alternative splicing events in custom gene datasets. Nucleic Acids Research 35:W C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

7 A real need? Introduction Forewords transcriptome from new condition tissue-specific transcriptome different development stages transcriptome from non model organism cancer cell RNA maturation mutant... How to manage RNA-Seq data with genes subjected to differential splicing? Is it possible to discover new isoforms? Is it possible to quantify abundance of each isoform? C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

8 A real need? Introduction Forewords transcriptome from new condition tissue-specific transcriptome different development stages transcriptome from non model organism cancer cell RNA maturation mutant... How to manage RNA-Seq data with genes subjected to differential splicing? Is it possible to discover new isoforms? Cufflinks, Cuffmerge Is it possible to quantify abundance of each isoform? RSEM, EBSeq C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

9 Introduction Forewords Isoforms reconstruction and quantification from RNA-Seq C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

10 Introduction RNA-Seq Data: Profiling of sex-biased expression in Drosophila melanogaster Data tissue: whole flies developmental stage, age: adult, 5-7 days post eclosion conditions: sex, female or male, biological duplicate Female rep1 Female rep2 Male rep1 Male rep2 SRA 1 GSM GSM GSM GSM PolyA+ mrna, paire-ends 2x75bp, insert size +/- 200bp genome reduction: chr 3R (autosom), from 377 to C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

11 Introduction TP 1 st step: Data importation from Published histories Data C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

12 Discovery of new transcript Isoforms reconstruction protocol Cufflinks Tuxedo suite: Trapnell C, & al.: Differential gene and transcript expression analysis of RNA-Seq experiments with TopHat and Cufflinks. Nat Protoc Mar 1;7(3): C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

13 TP 2 nd step: Cufflinks Discovery of new transcript Cufflinks Parameters SAM or BAM file of aligned RNA-Seq reads : Your mapping file Use Reference Annotation : Set to Use reference annotation as guide Reference Annotation : Your genome annotation Use effective length correction : No We are interesed in Isoform detection, but not in their quantification C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

14 Cufflinks algorithm Discovery of new transcript Cufflinks Trapnell C. et al.: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol May;28(5):511-5 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

15 Our cufflinks usage Discovery of new transcript Cufflinks for the discovery of isoform and without quantification aims no matter of the parameters related to quantification (normalization, length correction) 2 thresholds (signal to noise ratio), isoform and splicing event: minimum expression ratio: given isoform / majority isoform number of reads ratio: splicing site / intron with a well-known genome (fruitfly) use reference annotation as guide (but may be used with no reference annotation) Trapnell C. et al.: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol May;28(5):511-5 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

16 Discovery of new transcript Cuffmerge Merge transcripts from many samples Cufflinks done for each sample different lists of transcripts necessary to unify lists between them in connection with the reference annotations cuffmerge Trapnell C. et al.: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol May;28(5):511-5 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

17 Discovery of new transcript TP 3 rd step: Cuffmerge Cuffmerge Parameters GTF file produced by Cufflinks : Your first genomic annotation produced by Cufflinks (gtf) Additional GTF Input Files : Repeat up to your last annotation! Use Reference Annotation : Set to Yes, then insert your initial genomic annotation (gtf) Use Sequence Data : Set it to Yes Choose the source for the reference list : Set it to History Using Reference file : Your genomic sequence (fasta) C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

18 Discovery of new transcript Results Understanding the cuff classification of the transcripts C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

19 = class code Discovery of new transcript Results The following reads are mapped to an existing transcript in the fly genome (here female sample 2 and male sample 1) without any differential expression, nor differential processing. C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

20 = class code Discovery of new transcript Results Another example of = class which is differentially expressed in relation to the male condition. C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

21 j class code Discovery of new transcript Results The following reads are mapped to a part of an existing transcript. This is a potential novel isoform in female sample. C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

22 u class code Discovery of new transcript Results The following reads are mapped to an intergenic region. C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

23 Discovery of new transcript Results Isoforms reconstruction and quantification from RNA-Seq C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

24 Differential expression, transcript level Isoforms differential expression Forewords RSEM aligns reads on a reference of transcripts and counts EBSeq finds DE isoforms across two conditions and some intermediary steps to link RSEM and EBSeq C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

25 Differential expression, transcript level Forewords Isoforms differential expression: RSEM RSEM aligns reads on a transcript reference: computed from the genome annotations (gft file) only stranded features: filter to remove unstranded isoforms directly from transcript assembly (in case of non-model organism, cancer cell, etc) C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

26 Differential expression, transcript level RSEM: Pre-processing data RSEM: Removing unstranded new isoforms RSEM requires only stranded features, so we have to filter unstranded isoforms C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

27 Differential expression, transcript level TP 4 th step: filter for RSEM RSEM: Pre-processing data Parameters Filter : The file to be filtered, our merged gtf With following condition : c7!=. Number of header lines to skip : We have no header at all, so 0 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

28 Differential expression, transcript level RSEM: Pre-processing data Isoforms differential expression: RSEM RSEM aligns reads on a transcript reference: computed from the genome annotations (gft file) only stranded features: filter to remove unstranded isoforms directly from transcript assembly (in case of non-model organism, cancer cell, etc) RSEM adds a polya tail to each transcript (reads from 3 end mrna) and uses indexation (gain time) RSEM prepare reference C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

29 Differential expression, transcript level RSEM: Pre-processing data TP 5 th step: RSEM prepare reference Parameters Reference transcript source : Set it to reference genome and gtf reference fasta file : Your genome sequence (fasta) gtf or gff3 file : Your enhanced and merged genome annotation Use Bowtie2 : Hit Yes C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

30 RSEM features Differential expression, transcript level RSEM: Calculating Expression values RSEM estimates the incertainty due to both multiread allocation and random sampling effect using all valid mappings of the read (mapping scores, probability for a read to come from a locus) Need of a specific mapping (sam/bam) file: reporting of all the valid mappings for each read relaunch the mapping step (bowtie/bowtie2) Some RSEM features: strand-specificity highly 5 or 3 biaised ditribution of read positions in case of single-end, fix the fragment length does not support gapped mapping (no indel) RSEM: RNA-Seq by Expectation Maximization C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

31 Differential expression, transcript level RSEM: Calculating Expression values EM algorithm: estimate the expression of cognate isoforms EM: Expectated-Maximization First 3 cycles of EM algorithm. Abundance of red isoform estimated after the 1srt M-step: (1/3 read a + 1/2 read c + 1 read d + 1/2 read e)/(total read number), i.e (( )/5) proved to converge stop criterion: when all probabilities that a fragment is derived from a transcript 10-7 have a relative change than 10-3 RSEM calculate expression L. Pachter: Models for transcript quantification from RNA-Seq, C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

32 Differential expression, transcript level RSEM: Calculating Expression values TP 6 th step: RSEM calculate expression Parameters RSEM Reference Source : Set it to From your history RSEM reference : Your previous reference Library type : Set it to Paired End Reads Read 1 fastq file and Read 2 fastq file : Your reads (fastq) Use bowtie 1 or 2? : Set it to Bowtie 2 Is the library strand specific? : Set it to forward orientation C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

33 Differential expression, transcript level EBSeq algorithm Isoforms differential expression: EBSeq EBSeq: Empirical Bayesian approach that models a number of features observed in RNA-Seq data. Runs EBSeq to find DE isoforms across two conditions: Isoform level DE test across two conditions C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

34 Differential expression, transcript level EBSeq algorithm EBSeq algorithm Mapping incertainty increases due to the presence of multiple isoforms of a given gene. EBSeq: Expected count for an isoform is distributed as Negative Binomiale Isoform-specific means and variances are estimated via the EM algorithm EBSeq accomodates isoform expression estimation uncertainty by modeling the differential variability observed in distinct groups of isoforms. 3 groups: following the number of isoforms associated to each gene (1, 2 or 3 and more) Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C.: EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics Apr 15;29(8): C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

35 Differential expression, transcript level EBSeq algorithm EBSeq directly models isoform expression A collective analysis of isoforms: reduces the power for identifying isoform in the 1 group (the true variance in that group are lower, on average, than those derived from the full collection of isoforms) increases the false discoveries in the 2 other groups (true variances are higher). Changes of the estimation incertainty with the increase of isoform complexity Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C.: EBSeq: an empirical Bayes hierarchical model for inference in RNA-Seq experiments. Bioinformatics Apr 15;29(8): C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

36 Differential expression, transcript level Pre-processing data Isoforms differential expression: EBSeq Empirical Bayesian approach that models a number of features observed in RNA-Seq data. 2 workflows: Create a vector with the related group for each isoform Create IG Vector 4 RSEM outputs 1 EBSeq input Create Expression Table Runs EBSeq to find DE isoforms across two conditions: Isoform level DE test across two conditions C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

37 Differential expression, transcript level Pre-processing data Isoforms differential expression: EBSeq Empirical Bayesian approach that models a number of features observed in RNA-Seq data. 2 workflows: Create a vector with the related group for each isoform Create IG Vector 4 RSEM outputs 1 EBSeq input Create Expression Table Runs EBSeq to find DE isoforms across two conditions: Isoform level DE test across two conditions C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

38 Differential expression, transcript level Pre-processing data From RSEM to EBSeq We have: 4 files (1 per replicate) Those files are identically ordered by transcript names We need: - 1 file containing the number of isoforms each gene owns: IG vector - 1 file for all expected expression: Expression table We have to convert RSEM output to fit EBSeq input s requirements. C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

39 Differential expression, transcript level Pre-processing data TP 7 th step: EBSeq IG Vector (1/3) What is the IG Vector? - The IG Vector is a table with only one column of numbers (integers) - Each row corresponds to a transcript on the same row in the Expression table. Each integer in the IG Vector corresponds to the group 1, 2 or 3, according to the number of isoforms of the gene related to the considered isoform Tools : - Cut and Remove beginning from Text Manipulation section - Get Ig vector from gene-isoform mapping for isoform level DE analysis, available in EBSeq section C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

40 Differential expression, transcript level Pre-processing data TP 7 th step: EBSeq IG Vector (2/3) RSEM Isoform Abundance table EBSeq IG Vector C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

41 Differential expression, transcript level Pre-processing data TP 7 th step: EBSeq IG Vector (3/3) parameter input: A count table from RSEM. Caution: All Isoform abundances tabular files have the same succession of transcripts and genes names through each line. This succession is used by the Create IG Vector workflow. Therefore, any Isoform abundance file may be used in this step. C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

42 Differential expression, transcript level Pre-processing data Isoforms differential expression: EBseq Empirical Bayesian approach that models a number of features observed in RNA-Seq data. 2 workflows: Create a vector with the related group for each isoform Create IG Vector 4 RSEM outputs 1 EBSeq input Create Expression Table Runs EBSeq to find DE isoforms across two conditions: Isoform level DE test across two conditions C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

43 Differential expression, transcript level Pre-processing data From RSEM Expression Table to EBSeq Data Matrix The expression table 5 columns: - Transcripts name - The expected expression of F1, F2, M1 and M2 Obtained by merging the 5 th column of RSEM Isoform Expression results. Tool: Create Expression Table, available among the shared workflows C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

44 Differential expression, transcript level Pre-processing data TP 8 th step: Create Expression Table parameters First Dataset, Second Dataset, Third Dataset, and Fourth Dataset: Your count tables C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

45 Differential expression, transcript level Differential analysis Isoforms differential expression: EBseq Empirical Bayesian approach that models a number of features observed in RNA-Seq data. 2 workflows: Create a vector with the related group for each isoform Create IG Vector 4 RSEM outputs 1 EBSeq input Create Expression Table Runs EBSeq to find DE isoforms across two conditions: Isoform level DE test across two conditions C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

46 Differential expression, transcript level Differential analysis TP 9 th step: EBSeq Differential expression Parameters Isoform Expression : Our Data Matrix The first row is Sample Names : Yes Enter which condition each sample belongs to : M, M, F, F Ig Vector : Our IG Vector C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

47 List of DE isoforms Differential expression, transcript level Differential analysis C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

48 Conclusion Isoforms differential expression: methods and tools Classical RNA-Seq analysis method. Many methods (and tools): Expression estimation: Bayesian estimation of parameters of a model: BitSeq, Cufflinks, express Expectation-Maximization approach to inferring isoform abundances: RSEM-EBseq, Sailfish/Salmon, Kallisto Mapping to: the genome: Cuffdiff2, BitSeq, FluxCapacitor the transcriptome: express, RSEM-EBseq Mapping-free: Sailfish/Salmon, Kallisto C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

49 Mapping-free? Conclusion Kallisto example: De Bruijn Graph on transcriptome Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal RNA-Seq quantification with kallisto.nat Biotechnol May;34(5):525-7 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

50 Mapping-free? Conclusion Kallisto example: De Bruijn Graph on transcriptome C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

51 Mapping-free? Conclusion Kallisto example: De Bruijn Graph on transcriptome C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

52 Mapping-free? Conclusion Kallisto example: De Bruijn Graph on transcriptome C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

53 Mapping-free? Conclusion Kallisto example: De Bruijn Graph on transcriptome C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

54 Mapping-free? Conclusion Kallisto example: De Bruijn Graph on transcriptome Stand for multimap reads Need to adapt algorithm to use stranded RNAseq No mapping = no visualization C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

55 Conclusion Isoforms with RNA-Seq: not yet Isoforms discovery and quantification from RNA-Seq: not yet a well-established measure Methods based on transcriptome are generally better (for quantification but not for discovery) EM methods are better than count-based methods (many EM methods are available but differ little in accuracy) the more abundant is the isoform, the more accurately it is inferred major bottleneck: small size of read (comparing to 2.2 kb for mammals transcripts), multimap reads Evaluate the accuracy of isoform abundance computational methods: difficult too few number of isoform with experimental validation strategies (ex. qrt-pcr) synthetically generated datasets may not capture adequately the complexities of RNA-Seq experiments C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

56 Improve? Conclusion don t forget the micro-arrays designed for isoform detection (not for discovery of new isoform, model organism) gain statistical power with spike measurements make protocols like ribodepletion but for highly expressed housekeeping genes (to enrich with interesting transcripts) complete isoform definitions by other NGS studies? ChIPSeq with a protein from the spliceosome as target capturing the 5 ends of RNAs... full-length cdnas technology (Pacific Biosciences)? a too low throughput (10 4 transcripts, summer 2015) Adapt! biological query + organism + data Parameters, softwares, sequencing protocols (single or paired-end, stranded or not) Kanitz A, Gypas F, Gruber AJ, Gruber AR, Martin G, Zavolan M. Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-Seq data.genome Biol Jul 23;16:150 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

57 RNA-Seq: just a photo Conclusion RNA-Seq is just an unique and sampled RNA capture in a given position, at a given time, of one biological experiment... a poor quality photo comparing to real life C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

58 Start a new workflow Bonus: Création d un workflow 1/9 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

59 Add some details Bonus: Création d un workflow 2/9 Both name and annotation are important for your own workflows management C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

60 Add some details Bonus: Création d un workflow 3/9 The workflow is created empty, let us add some tags before diving through tools C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

61 Cut Bonus: Création d un workflow 4/9 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

62 Add some actions Bonus: Création d un workflow 5/9 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

63 Remove Beginning Bonus: Création d un workflow 6/9 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

64 EBSeq IG Vector Bonus: Création d un workflow 7/9 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

65 Custom output Bonus: Création d un workflow 8/9 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

66 End Bonus: Création d un workflow 9/9 Do not forget to save! C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification from RNA-Seq data November / 66

EBSeq: An R package for differential expression analysis using RNA-seq data

EBSeq: An R package for differential expression analysis using RNA-seq data EBSeq: An R package for differential expression analysis using RNA-seq data Ning Leng, John Dawson, and Christina Kendziorski October 14, 2013 Contents 1 Introduction 2 2 Citing this software 2 3 The Model

More information

Alignment-free RNA-seq workflow. Charlotte Soneson University of Zurich Brixen 2017

Alignment-free RNA-seq workflow. Charlotte Soneson University of Zurich Brixen 2017 Alignment-free RNA-seq workflow Charlotte Soneson University of Zurich Brixen 2017 The alignment-based workflow ALIGNMENT COUNTING ANALYSIS Gene A Gene B... Gene X 7... 13............... The alignment-based

More information

New RNA-seq workflows. Charlotte Soneson University of Zurich Brixen 2016

New RNA-seq workflows. Charlotte Soneson University of Zurich Brixen 2016 New RNA-seq workflows Charlotte Soneson University of Zurich Brixen 2016 Wikipedia The traditional workflow ALIGNMENT COUNTING ANALYSIS Gene A Gene B... Gene X 7... 13............... The traditional workflow

More information

Annotation of Plant Genomes using RNA-seq. Matteo Pellegrini (UCLA) In collaboration with Sabeeha Merchant (UCLA)

Annotation of Plant Genomes using RNA-seq. Matteo Pellegrini (UCLA) In collaboration with Sabeeha Merchant (UCLA) Annotation of Plant Genomes using RNA-seq Matteo Pellegrini (UCLA) In collaboration with Sabeeha Merchant (UCLA) inuscu1-35bp 5 _ 0 _ 5 _ What is Annotation inuscu2-75bp luscu1-75bp 0 _ 5 _ Reconstruction

More information

Our typical RNA quantification pipeline

Our typical RNA quantification pipeline RNA-Seq primer Our typical RNA quantification pipeline Upload your sequence data (fastq) Align to the ribosome (Bow>e) Align remaining reads to genome (TopHat) or transcriptome (RSEM) Make report of quality

More information

Bias in RNA sequencing and what to do about it

Bias in RNA sequencing and what to do about it Bias in RNA sequencing and what to do about it Walter L. (Larry) Ruzzo Computer Science and Engineering Genome Sciences University of Washington Fred Hutchinson Cancer Research Center Seattle, WA, USA

More information

High-throughput sequencing: Alignment and related topic

High-throughput sequencing: Alignment and related topic High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg HTS Platforms E s ta b lis h e d p la tfo rm s Illu m in a H is e q, A B I S O L id, R o c h e 4 5 4 N e w c o m e rs

More information

The Developmental Transcriptome of the Mosquito Aedes aegypti, an invasive species and major arbovirus vector.

The Developmental Transcriptome of the Mosquito Aedes aegypti, an invasive species and major arbovirus vector. The Developmental Transcriptome of the Mosquito Aedes aegypti, an invasive species and major arbovirus vector. Omar S. Akbari*, Igor Antoshechkin*, Henry Amrhein, Brian Williams, Race Diloreto, Jeremy

More information

express: Streaming read deconvolution and abundance estimation applied to RNA-Seq

express: Streaming read deconvolution and abundance estimation applied to RNA-Seq express: Streaming read deconvolution and abundance estimation applied to RNA-Seq Adam Roberts 1 and Lior Pachter 1,2 1 Department of Computer Science, 2 Departments of Mathematics and Molecular & Cell

More information

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007 -2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open

More information

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics

More information

Comparative analysis of RNA- Seq data with DESeq2

Comparative analysis of RNA- Seq data with DESeq2 Comparative analysis of RNA- Seq data with DESeq2 Simon Anders EMBL Heidelberg Two applications of RNA- Seq Discovery Eind new transcripts Eind transcript boundaries Eind splice junctions Comparison Given

More information

SUSTAINABLE AND INTEGRAL EXPLOITATION OF AGAVE

SUSTAINABLE AND INTEGRAL EXPLOITATION OF AGAVE SUSTAINABLE AND INTEGRAL EXPLOITATION OF AGAVE Editor Antonia Gutiérrez-Mora Compilers Benjamín Rodríguez-Garay Silvia Maribel Contreras-Ramos Manuel Reinhart Kirchmayr Marisela González-Ávila Index 1.

More information

Statistical Inferences for Isoform Expression in RNA-Seq

Statistical Inferences for Isoform Expression in RNA-Seq Statistical Inferences for Isoform Expression in RNA-Seq Hui Jiang and Wing Hung Wong February 25, 2009 Abstract The development of RNA sequencing (RNA-Seq) makes it possible for us to measure transcription

More information

COLE TRAPNELL, BRIAN A WILLIAMS, GEO PERTEA, ALI MORTAZAVI, GORDON KWAN, MARIJKE J VAN BAREN, STEVEN L SALZBERG, BARBARA J WOLD, AND LIOR PACHTER

COLE TRAPNELL, BRIAN A WILLIAMS, GEO PERTEA, ALI MORTAZAVI, GORDON KWAN, MARIJKE J VAN BAREN, STEVEN L SALZBERG, BARBARA J WOLD, AND LIOR PACHTER SUPPLEMENTARY METHODS FOR THE PAPER TRANSCRIPT ASSEMBLY AND QUANTIFICATION BY RNA-SEQ REVEALS UNANNOTATED TRANSCRIPTS AND ISOFORM SWITCHING DURING CELL DIFFERENTIATION COLE TRAPNELL, BRIAN A WILLIAMS,

More information

RNA- seq read mapping

RNA- seq read mapping RNA- seq read mapping Pär Engström SciLifeLab RNA- seq workshop October 216 IniDal steps in RNA- seq data processing 1. Quality checks on reads 2. Trim 3' adapters (opdonal (for species with a reference

More information

Supplemental Information

Supplemental Information Molecular Cell, Volume 52 Supplemental Information The Translational Landscape of the Mammalian Cell Cycle Craig R. Stumpf, Melissa V. Moreno, Adam B. Olshen, Barry S. Taylor, and Davide Ruggero Supplemental

More information

GEP Annotation Report

GEP Annotation Report GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:

More information

Introduction. SAMStat. QualiMap. Conclusions

Introduction. SAMStat. QualiMap. Conclusions Introduction SAMStat QualiMap Conclusions Introduction SAMStat QualiMap Conclusions Where are we? Why QC on mapped sequences Acknowledgment: Fernando García Alcalde The reads may look OK in QC analyses

More information

g A n(a, g) n(a, ḡ) = n(a) n(a, g) n(a) B n(b, g) n(a, ḡ) = n(b) n(b, g) n(b) g A,B A, B 2 RNA-seq (D) RNA mrna [3] RNA 2. 2 NGS 2 A, B NGS n(

g A n(a, g) n(a, ḡ) = n(a) n(a, g) n(a) B n(b, g) n(a, ḡ) = n(b) n(b, g) n(b) g A,B A, B 2 RNA-seq (D) RNA mrna [3] RNA 2. 2 NGS 2 A, B NGS n( ,a) RNA-seq RNA-seq Cuffdiff, edger, DESeq Sese Jun,a) Abstract: Frequently used biological experiment technique for observing comprehensive gene expression has been changed from microarray using cdna

More information

Unit-free and robust detection of differential expression from RNA-Seq data

Unit-free and robust detection of differential expression from RNA-Seq data Unit-free and robust detection of differential expression from RNA-Seq data arxiv:405.4538v [stat.me] 8 May 204 Hui Jiang,2,* Department of Biostatistics, University of Michigan 2 Center for Computational

More information

Mixtures and Hidden Markov Models for analyzing genomic data

Mixtures and Hidden Markov Models for analyzing genomic data Mixtures and Hidden Markov Models for analyzing genomic data Marie-Laure Martin-Magniette UMR AgroParisTech/INRA Mathématique et Informatique Appliquées, Paris UMR INRA/UEVE ERL CNRS Unité de Recherche

More information

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences Supplementary Material

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences Supplementary Material Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences Supplementary Material Charlotte Soneson, Michael I. Love, Mark D. Robinson Contents 1 Simulation details, sim2

More information

RNASeq Differential Expression

RNASeq Differential Expression 12/06/2014 RNASeq Differential Expression Le Corguillé v1.01 1 Introduction RNASeq No previous genomic sequence information is needed In RNA-seq the expression signal of a transcript is limited by the

More information

ASSESSING TRANSLATIONAL EFFICIACY THROUGH POLY(A)- TAIL PROFILING AND IN VIVO RNA SECONDARY STRUCTURE DETERMINATION

ASSESSING TRANSLATIONAL EFFICIACY THROUGH POLY(A)- TAIL PROFILING AND IN VIVO RNA SECONDARY STRUCTURE DETERMINATION ASSESSING TRANSLATIONAL EFFICIACY THROUGH POLY(A)- TAIL PROFILING AND IN VIVO RNA SECONDARY STRUCTURE DETERMINATION Journal Club, April 15th 2014 Karl Frontzek, Institute of Neuropathology POLY(A)-TAIL

More information

Alignment. Peak Detection

Alignment. Peak Detection ChIP seq ChIP Seq Hongkai Ji et al. Nature Biotechnology 26: 1293-1300. 2008 ChIP Seq Analysis Alignment Peak Detection Annotation Visualization Sequence Analysis Motif Analysis Alignment ELAND Bowtie

More information

Statistics for Differential Expression in Sequencing Studies. Naomi Altman

Statistics for Differential Expression in Sequencing Studies. Naomi Altman Statistics for Differential Expression in Sequencing Studies Naomi Altman naomi@stat.psu.edu Outline Preliminaries what you need to do before the DE analysis Stat Background what you need to know to understand

More information

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics Genome 541 Unit 4, lecture 2 Transcription factor binding using functional genomics Slides vs chalk talk: I m not sure why you chose a chalk talk over ppt. I prefer the latter no issues with readability

More information

Introduction to de novo RNA-seq assembly

Introduction to de novo RNA-seq assembly Introduction to de novo RNA-seq assembly Introduction Ideal day for a molecular biologist Ideal Sequencer Any type of biological material Genetic material with high quality and yield Cutting-Edge Technologies

More information

BIOINFORMATICS ORIGINAL PAPER

BIOINFORMATICS ORIGINAL PAPER BIOINFORMATICS ORIGINAL PAPER Vol. 25 no. 8 29, pages 126 132 doi:1.193/bioinformatics/btp113 Gene expression Statistical inferences for isoform expression in RNA-Seq Hui Jiang 1 and Wing Hung Wong 2,

More information

The official electronic file of this thesis or dissertation is maintained by the University Libraries on behalf of The Graduate School at Stony Brook

The official electronic file of this thesis or dissertation is maintained by the University Libraries on behalf of The Graduate School at Stony Brook Stony Brook University The official electronic file of this thesis or dissertation is maintained by the University Libraries on behalf of The Graduate School at Stony Brook University. Alll Rigghht tss

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier

ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier Data visualization, quality control, normalization & peak calling Peak annotation Presentation () Practical session

More information

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014 Going Beyond SNPs with Next Genera5on Sequencing Technology 02-223 Personalized Medicine: Understanding Your Own Genome Fall 2014 Next Genera5on Sequencing Technology (NGS) NGS technology Discover more

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

Lecture: Mixture Models for Microbiome data

Lecture: Mixture Models for Microbiome data Lecture: Mixture Models for Microbiome data Lecture 3: Mixture Models for Microbiome data Outline: - - Sequencing thought experiment Mixture Models (tangent) - (esp. Negative Binomial) - Differential abundance

More information

Genome-wide modelling of transcription kinetics reveals patterns of RNA production delays arxiv: v2 [q-bio.

Genome-wide modelling of transcription kinetics reveals patterns of RNA production delays arxiv: v2 [q-bio. Genome-wide modelling of transcription kinetics reveals patterns of RNA production delays arxiv:153.181v2 [q-bio.gn] 16 Jul 215 Antti Honkela 1, Jaakko Peltonen 2,3, Hande Topa 2, Iryna Charapitsa 4, Filomena

More information

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University Genome Annotation Qi Sun Bioinformatics Facility Cornell University Some basic bioinformatics tools BLAST PSI-BLAST - Position-Specific Scoring Matrix HMM - Hidden Markov Model NCBI BLAST How does BLAST

More information

Lecture 3: Mixture Models for Microbiome data. Lecture 3: Mixture Models for Microbiome data

Lecture 3: Mixture Models for Microbiome data. Lecture 3: Mixture Models for Microbiome data Lecture 3: Mixture Models for Microbiome data 1 Lecture 3: Mixture Models for Microbiome data Outline: - Mixture Models (Negative Binomial) - DESeq2 / Don t Rarefy. Ever. 2 Hypothesis Tests - reminder

More information

TRANSCRIPTOMICS. (or the analysis of the transcriptome) Mario Cáceres. Main objectives of genomics. Determine the entire DNA sequence of an organism

TRANSCRIPTOMICS. (or the analysis of the transcriptome) Mario Cáceres. Main objectives of genomics. Determine the entire DNA sequence of an organism TRANSCRIPTOMICS (or the analysis of the transcriptome) Mario Cáceres Main objectives of genomics Determine the entire DNA sequence of an organism Identify and annotate the complete set of genes encoded

More information

Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA

Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA Technologie w skali genomowej 2/ Algorytmiczne i statystyczne aspekty sekwencjonowania DNA Expression analysis for RNA-seq data Ewa Szczurek Instytut Informatyki Uniwersytet Warszawski 1/35 The problem

More information

Bayesian Clustering of Multi-Omics

Bayesian Clustering of Multi-Omics Bayesian Clustering of Multi-Omics for Cardiovascular Diseases Nils Strelow 22./23.01.2019 Final Presentation Trends in Bioinformatics WS18/19 Recap Intermediate presentation Precision Medicine Multi-Omics

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier

ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier ChIP-seq analysis M. Defrance, C. Herrmann, S. Le Gras, D. Puthier, M. Thomas.Chollier Visualization, quality, normalization & peak-calling Presentation (Carl Herrmann) Practical session Peak annotation

More information

Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons

Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons Gao and Li BMC Genomics (2017) 18:234 DOI 10.1186/s12864-017-3600-2 RESEARCH ARTICLE Open Access Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics

More information

DEXSeq paper discussion

DEXSeq paper discussion DEXSeq paper discussion L Collado-Torres December 10th, 2012 1 / 23 1 Background 2 DEXSeq paper 3 Results 2 / 23 Gene Expression 1 Background 1 Source: http://www.ncbi.nlm.nih.gov/projects/genome/probe/doc/applexpression.shtml

More information

Single Cell Sequencing

Single Cell Sequencing Single Cell Sequencing Fundamental unit of life Autonomous and unique Interactive Dynamic - change over time Evolution occurs on the cellular level Robert Hooke s drawing of cork cells, 1665 Type Prokaryotes

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

RNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford

RNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford RNAseq Applications in Genome Studies Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford RNAseq Protocols } Next generation sequencing protocol } cdna, not RNA sequencing

More information

Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics

Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics Genome 541 Gene regulation and epigenomics Lecture 2 Transcription factor binding using functional genomics I believe it is helpful to number your slides for easy reference. It's been a while since I took

More information

Taxonomical Classification using:

Taxonomical Classification using: Taxonomical Classification using: Extracting ecological signal from noise: introduction to tools for the analysis of NGS data from microbial communities Bergen, April 19-20 2012 INTRODUCTION Taxonomical

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

Introduc)on to RNA- Seq Data Analysis. Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas

Introduc)on to RNA- Seq Data Analysis. Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas Introduc)on to RNA- Seq Data Analysis Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas Material: hep://)ny.cc/rnaseq Slides: hep://)ny.cc/slidesrnaseq

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

SPOTTED cdna MICROARRAYS

SPOTTED cdna MICROARRAYS SPOTTED cdna MICROARRAYS Spot size: 50um - 150um SPOTTED cdna MICROARRAYS Compare the genetic expression in two samples of cells PRINT cdna from one gene on each spot SAMPLES cdna labelled red/green e.g.

More information

Overview - MS Proteomics in One Slide. MS masses of peptides. MS/MS fragments of a peptide. Results! Match to sequence database

Overview - MS Proteomics in One Slide. MS masses of peptides. MS/MS fragments of a peptide. Results! Match to sequence database Overview - MS Proteomics in One Slide Obtain protein Digest into peptides Acquire spectra in mass spectrometer MS masses of peptides MS/MS fragments of a peptide Results! Match to sequence database 2 But

More information

Student Handout Fruit Fly Ethomics & Genomics

Student Handout Fruit Fly Ethomics & Genomics Student Handout Fruit Fly Ethomics & Genomics Summary of Laboratory Exercise In this laboratory unit, students will connect behavioral phenotypes to their underlying genes and molecules in the model genetic

More information

Web-based Supplementary Materials for BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data

Web-based Supplementary Materials for BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data Web-based Supplementary Materials for BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data Yuan Ji 1,, Yanxun Xu 2, Qiong Zhang 3, Kam-Wah Tsui 3, Yuan Yuan 4, Clift Norris 1, Shoudan

More information

Multi-Assembly Problems for RNA Transcripts

Multi-Assembly Problems for RNA Transcripts Multi-Assembly Problems for RNA Transcripts Alexandru Tomescu Department of Computer Science University of Helsinki Joint work with Veli Mäkinen, Anna Kuosmanen, Romeo Rizzi, Travis Gagie, Alex Popa CiE

More information

1 Decomposition of ESG

1 Decomposition of ESG 1 Decomposition of ESG DiffSplice resolves alternative splicing events in complex gene models through decomposition of the splice graph. Figure 1 shows the hierarchical decomposition on gene VEGFA. In

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

Supplementary Figure 1 The number of differentially expressed genes for uniparental males (green), uniparental females (yellow), biparental males

Supplementary Figure 1 The number of differentially expressed genes for uniparental males (green), uniparental females (yellow), biparental males Supplementary Figure 1 The number of differentially expressed genes for males (green), females (yellow), males (red), and females (blue) in caring vs. control comparisons in the caring gene set and the

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq

SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq Mark F. Rogers, Christina Boucher, and Asa Ben-Hur Department of Computer Science 1873 Campus Delivery Fort Collins, CO 80523 rogersma@cs.colostate.edu,

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Statistical Models for Gene and Transcripts Quantification and Identification Using RNA-Seq Technology

Statistical Models for Gene and Transcripts Quantification and Identification Using RNA-Seq Technology Purdue University Purdue e-pubs Open Access Dissertations Theses and Dissertations Fall 2013 Statistical Models for Gene and Transcripts Quantification and Identification Using RNA-Seq Technology Han Wu

More information

Searching Sear ( Sub- (Sub )Strings Ulf Leser

Searching Sear ( Sub- (Sub )Strings Ulf Leser Searching (Sub-)Strings Ulf Leser This Lecture Exact substring search Naïve Boyer-Moore Searching with profiles Sequence profiles Ungapped approximate search Statistical evaluation of search results Ulf

More information

Count ratio model reveals bias affecting NGS fold changes

Count ratio model reveals bias affecting NGS fold changes Published online 8 July 2015 Nucleic Acids Research, 2015, Vol. 43, No. 20 e136 doi: 10.1093/nar/gkv696 Count ratio model reveals bias affecting NGS fold changes Florian Erhard * and Ralf Zimmer Institut

More information

TUTORIAL EXERCISES WITH ANSWERS

TUTORIAL EXERCISES WITH ANSWERS TUTORIAL EXERCISES WITH ANSWERS Tutorial 1 Settings 1. What is the exact monoisotopic mass difference for peptides carrying a 13 C (and NO additional 15 N) labelled C-terminal lysine residue? a. 6.020129

More information

Matrix-based pattern discovery algorithms

Matrix-based pattern discovery algorithms Regulatory Sequence Analysis Matrix-based pattern discovery algorithms Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)

More information

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis March 18, 2016 UVA Seminar RNA Seq 1 RNA Seq Gene expression is the transcription of the

More information

Differential expression analysis for sequencing count data. Simon Anders

Differential expression analysis for sequencing count data. Simon Anders Differential expression analysis for sequencing count data Simon Anders RNA-Seq Count data in HTS RNA-Seq Tag-Seq Gene 13CDNA73 A2BP1 A2M A4GALT AAAS AACS AADACL1 [...] ChIP-Seq Bar-Seq... GliNS1 4 19

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

Supplemental Data. Perea-Resa et al. Plant Cell. (2012) /tpc

Supplemental Data. Perea-Resa et al. Plant Cell. (2012) /tpc Supplemental Data. Perea-Resa et al. Plant Cell. (22)..5/tpc.2.3697 Sm Sm2 Supplemental Figure. Sequence alignment of Arabidopsis LSM proteins. Alignment of the eleven Arabidopsis LSM proteins. Sm and

More information

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1. Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer

More information

Mixture models for analysing transcriptome and ChIP-chip data

Mixture models for analysing transcriptome and ChIP-chip data Mixture models for analysing transcriptome and ChIP-chip data Marie-Laure Martin-Magniette French National Institute for agricultural research (INRA) Unit of Applied Mathematics and Informatics at AgroParisTech,

More information

High-throughput sequence alignment. November 9, 2017

High-throughput sequence alignment. November 9, 2017 High-throughput sequence alignment November 9, 2017 a little history human genome project #1 (many U.S. government agencies and large institute) started October 1, 1990. Goal: 10x coverage of human genome,

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Synteny Portal Documentation

Synteny Portal Documentation Synteny Portal Documentation Synteny Portal is a web application portal for visualizing, browsing, searching and building synteny blocks. Synteny Portal provides four main web applications: SynCircos,

More information

Package NarrowPeaks. September 24, 2012

Package NarrowPeaks. September 24, 2012 Package NarrowPeaks September 24, 2012 Version 1.0.1 Date 2012-03-15 Type Package Title Functional Principal Component Analysis to Narrow Down Transcription Factor Binding Site Candidates Author Pedro

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Exhaustive search. CS 466 Saurabh Sinha

Exhaustive search. CS 466 Saurabh Sinha Exhaustive search CS 466 Saurabh Sinha Agenda Two different problems Restriction mapping Motif finding Common theme: exhaustive search of solution space Reading: Chapter 4. Restriction Mapping Restriction

More information

SoyBase, the USDA-ARS Soybean Genetics and Genomics Database

SoyBase, the USDA-ARS Soybean Genetics and Genomics Database SoyBase, the USDA-ARS Soybean Genetics and Genomics Database David Grant Victoria Carollo Blake Steven B. Cannon Kevin Feeley Rex T. Nelson Nathan Weeks SoyBase Site Map and Navigation Video Tutorials:

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary Discussion Rationale for using maternal ythdf2 -/- mutants as study subject To study the genetic basis of the embryonic developmental delay that we observed, we crossed fish with different

More information

Expression arrays, normalization, and error models

Expression arrays, normalization, and error models 1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

Comparing whole genomes

Comparing whole genomes BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

More information

86 Part 4 SUMMARY INTRODUCTION

86 Part 4 SUMMARY INTRODUCTION 86 Part 4 Chapter # AN INTEGRATION OF THE DESCRIPTIONS OF GENE NETWORKS AND THEIR MODELS PRESENTED IN SIGMOID (CELLERATOR) AND GENENET Podkolodny N.L. *1, 2, Podkolodnaya N.N. 1, Miginsky D.S. 1, Poplavsky

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions? 1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section

More information

Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data

Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data Cinzia Viroli 1 joint with E. Bonafede 1, S. Robin 2 & F. Picard 3 1 Department of Statistical Sciences, University

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji

Gene Regula*on, ChIP- X and DNA Mo*fs. Statistics in Genomics Hongkai Ji Gene Regula*on, ChIP- X and DNA Mo*fs Statistics in Genomics Hongkai Ji (hji@jhsph.edu) Genetic information is stored in DNA TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCTC

More information

GBS Bioinformatics Pipeline(s) Overview

GBS Bioinformatics Pipeline(s) Overview GBS Bioinformatics Pipeline(s) Overview Getting from sequence files to genotypes. Pipeline Coding: Ed Buckler Jeff Glaubitz James Harriman Presentation: Terry Casstevens With supporting information from

More information