RNA- seq read mapping
|
|
- Jane Robyn Ball
- 6 years ago
- Views:
Transcription
1 RNA- seq read mapping Pär Engström SciLifeLab RNA- seq workshop October 216
2 IniDal steps in RNA- seq data processing 1. Quality checks on reads 2. Trim 3' adapters (opdonal (for species with a reference genome 3. Index reference genome 4. Map reads to genome (output in SAM or BAM format 5. Convert results to a sorted, indexed BAM file 6. Quality checks on mapped reads 7. Visualize read mappings on the genome Followed by further analyses
3 Input: sequence reads (FASTQ CTTCATTTCCCTCCAGTCCCTGGAGGGGCTTCTAGTATTACTGGGACAATGACCACGCTGCCTGTTTGTCTGTGAGTTACGGGCAACCAGCCTC + CGACCAGCTGATCGTGTCTCCAAGGGCAGAAGCACAAGCGGGGAGGCTGGGGTGGCTGCAGCGAGGTCCTCCCTAAGTAGGGCAGGGGAGCCCC + GGTGGCTGCAGCGAGGTCCTCCCTAAGTAGGGCAGGGGAGCCCCCAGGTGGGGAGGGCTCATGGGGGCCAGGGAGTAAGGCTGGCTCCCCTGGT + CCCACCTGCAACTTTCCTCCAAGTGTGGCTCGGAGAAGAAACATCAACAAGGACCCTGGGCTTCGATTCAAAAACTCCTCTGAAGCCATCCATG + CGCCTGCCCAGCAGTGTTTATCCTGGGATCCTCCTATTGGGGTTGAGGGAGGGGAAGACAGCAGGAAGGTTGAGGGAGCAGCAACTTGGCCAGA + ^\\cccc^y[ybee^bfcegagx_^aeehhheebzpbf_rzeo^_ea]`ye`[wyy^q_xab]zz^z\_ay[gy^anrow^pqxqx`a`xy`p^...
4 Goal: reads mapped to genome (SAM format HWI-ST118:7:126:3667:137198# 97 chr M2769N47M7S chr2 HWI-ST118:7:235:11836:132357# 177 chr S9M chr2 HWI-ST118:7:125:1818:8988# 97 chr M5S chr HWI-ST118:7:113:2457:7159# 129 chr M chr HWI-ST118:7:117:1423:14655# 99 chr M = HWI-ST118:7:116:168:6339# 163 chr M = 7336 HWI-ST118:7:236:199:6213# 99 chr M = 7337 HWI-ST118:7:235:8697:195892# 163 chr S97M = 7336 HWI-ST118:7:128:124:5258# 99 chr M3S = 7336 HWI-ST118:7:117:1423:14655# 147 chr M = HWI-ST118:7:128:1123:715# 99 chr M = 7336 HWI-ST118:7:217:11555:46214# 163 chr M = 7336 HWI-ST118:7:112:1213:8767# 73 chr M = 7335 HWI-ST118:7:112:1213:8767# 133 chr * = 7335 HWI-ST118:7:126:3667:137198# 145 chr M chr HWI-ST118:7:128:16138:8853# 99 chr M = 7337 HWI-ST118:7:226:7742:86872# 163 chr M = 7336 HWI-ST118:7:138:1466:19516# 99 chr S1M = 7338 HWI-ST118:7:231:14871:8111# 99 chr M = 7337 HWI-ST118:7:221:13683:6477# 145 chr S9M =
5 VisualizaDon of read alignments Scale chr2: 2 kb hg19 136,872, 136,873, 136,874, Gm12878 RNA-seq reads 136,875, 136,876, 4 _ Gm12878 RNA-seq coverage, minus strand Gm12878_minus _ 4.88 _ 1 Vert. Cons _ Common SNPs(144 RepeatMasker CXCR4 CXCR4 RefSeq Genes 1 vertebrates Basewise Conservation by PhyloP Simple Nucleotide Polymorphisms (dbsnp 144 Found in >= 1% of Samples Repeating Elements by RepeatMasker
6 Spliced alignment k Garber et al. Nature Methods 211
7 Introns can be very large! Human introns (Ensembl 1..8 Cumulative proportion , 1, 1, 1M 1M Intron size (bp
8 Limited sequence signals at splice sites 5 ss BP 3 ss H. sapiens D. rerio D. melanogaster P. chrysosporium S. cerevisiae P. tricornutum A. thaliana GT AG 98.6% GC AG 1.3% AT AC.1% Iwata and Gotoh BMC Genomics 211
9 MulD- mapping reads and pseudogenes FuncDonal gene Processed pseudogene Correct read alignment IdenDcal, spliced Incorrect read alignment Mismatches, not spliced Note: An aligner may report both alignments or either Some search strategies and scoring schemes give preference to unspliced alignments
10 How important is mapping accuracy? Depends what you want to do: IdenDfy novel genedc variants or RNA edidng Importance Allele- specific expression Genome annotadon Gene and transcript discovery DifferenDal expression
11 Current RNA- seq aligners TopHat2 Kim et al. Genome Biology 213 HISAT2 Kim et al. Nature Methods 215 STAR Dobin et al. Bioinforma8cs 213 GSNAP Wu and Nacu Bioinforma8cs 21 OLego Wu et al. Nucleic Acids Research 213 HPG aligner Medina et al. DNA Research 216 MapSplice2 hjp://
12 Compute requirements Program Run time (min Memory usage (GB HISATx HISATx HISAT STAR STARx GSNAP OLego TopHat2 1, Run times and memory usage for HISAT and other spliced aligners to align 19 million 11-bp RNA-seq reads from a lung fibroblast data set. We used three CPU cores to run the programs on a Mac Pro with a 3.7 GHz Quad-Core Intel Xeon E5 processor and 64 GB of RAM. Kim et al. Nature Methods 215
13 The predecessor: BLAT In the process of assembling and annotadng the human genome, I was faced with two very large- scale alignment problems: aligning three million ESTs and aligning 13 million mouse whole- genome random reads against the human genome. These alignments needed to be done in less than two weeks Dme on a moderate- sized (9 CPU Linux cluster in order to have Dme to process an updated genome every month or two. To achieve this I developed a very- high- speed mrna/dna and translated protein alignment algorithm. (Kent Genome Research 22
14 InnovaDons in RNA- seq alignment sooware Read pair alignment Consider base call quality scores SophisDcated indexing to decrease CPU and memory usage Map to genedc variants Resolve muld- mappers using regional read coverage Consider juncdon annotadon Two- step approach (juncdon discovery & final alignment
15 Two- step RNA- seq read mapping 1 st runofhisattodiscoversplicesites mapped e1# e2# e3# unmapped 2 nd runofhisattoalignreadsbymakinguseofthelistofsplicesitescollectedabove e1# e2# e3# Read Exon Intron GlobalSearch LocalSearch Extension Junc:onextension Kim et al. Nature Methods 215
16 Mapping accuracy Correctly and uniquely mapped Correctly mapped (multimapped Incorrectly mapped Unmapped ( ( ( ( ( ( ( (97.4 ( % % + % Percentage of reads HISAT 1 OLego GSNAP STAR STAR 2 HISAT HISAT 2 TopHat2 Accuracy for 2 million simulated human 1 bp reads with.5% mismatch rate Kim et al. Nature Methods 215
17 Mapping accuracy for reads with small anchors Percentage of reads, 2M_8_15 Percentage of reads, 2M_1_7 Correctly and uniquely mapped ( (7.7 HISAT (88.5 ( OLego Correctly mapped (multimapped 9.4 ( (9.4 GSNAP 51.1 (52.2 ( STAR 94.4 ( (92.6 STAR 2 Incorrectly mapped 97.2 ( (97.3 HISAT 97.2 ( (98.3 HISAT 2 Unmapped 93.5 ( (96 TopHat2 ( ( % % + % % % + % b 62.4% (M 25.1% (2M_gt_15 3.1% (gt_2m 4.2% (2M_1_7 5.1% (2M_8_15 Kim et al. Nature Methods 215
18 Novel juncdons are typically supported by few alignments b K562 whole cell K562 whole cell Annotated junctions 5, 1, 15, Novel junctions 1, 2, 3, Annotated junctions Minimum number of supporting mappings Minimum number of supporting mappings Each curve represents one RNA- seq read mapping protocol (program + sesngs. Engström et al. Nature Methods 213
19 1 c 1, 11, 12, True junctions d 68, 1, 72, 11, 76, 12, 8, 2, Several methods show over- confidence in annotadon Minimum number of supporting mappings 5, 1, 15, 2, 25, Annotated junctions All junctions False junctions Simulation 2 68, 68, 72, 76, 8, 68, 72, 3, 76, 35, 8, 4, 5, 1, 15, BAGET ann PASS Simulation 2 Simulation Annotated GEM ann 2 junctio PASS cons Simulation GEM cons ReadsMap Simulation 1 GEM cons ann SMALT Simulation 1 GSNAP STAR 1p GSNAP ann STAR 1p ann GSTRUCT STAR 2p GSTRUCT ann STAR 2p ann MapSplice TopHat1 MapSplice ann TopHat1 ann PALMapper TopHat2 PALM. ann TopHat2 ann PALM. cons PALM. cons ann 1, 5, 2, 1, 3, 15, 2, 4, 5, 1, 15, 5, 1, 15, 2, 25, 5, 1, 15, Engström et al. Nature Methods 213 d 8, Minimum numb False junctions Simulation 2
20 RecommendaDons Use STAR, HISAT2 or GSNAP STAR and HISAT2 are the fastest HISAT2 uses the least memory If you want to run Cufflinks, use TopHat2 (but don t Consider 2- pass read mapping (default in HISAT2 and TopHat2 No need to supply annotadon to mapper Check that juncdon discovery criteria are conservadve HISAT2 and GSNAP can use SNP data, which may give higher sensidvity For long (PacBio reads, STAR, BLAT or GMAP can be used Don t trust novel introns supported by single reads Always check the results!
21 Unsolved problems in RNA- seq read mapping Determine correct locadon of muldmapping reads Accurate alignment of indels Use gene annotadon in an unbiased fashion Cross- species mapping
22 Browsing your results Two main browsers: Integra:ve Genomics Viewer (IGV + Fast response (runs locally + Easy to load your data (including custom genomes - Limited funcdonality - User interface issues UCSC Genome Brower - - Sluggish (remote web site Need to place data on web server (e.g. UPPMAX webexport + Much public data for comparison + Good for sharing your data tracks (e.g. using track hubs
23 Important SAM fields Command: samtools view X file.bam Perfectly and uniquely aligned read pair: HWI-ST118:3:135:219:45397# ppr1 chr M = GT C@ NH:i:1 HI:i:1 AS:i:2 nm:i: HWI-ST118:3:135:219:45397# ppr2 chr M NH:i:1 HI:i:1 AS:i:2 nm:i: = CG 5< Problema:c read pair: HWI-ST118:3:219:617:66353# ppr2s chr M36S = CA B@ NH:i:2 HI:i:2 AS:i:135 nm:i:9 HWI-ST118:3:219:617:66353# ppr1s chr S73M1D21M = CC ## NH:i:2 HI:i:2 AS:i:135 nm:i:9
24 Thanks for listening!
GEP Annotation Report
GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:
More informationIntroduction. SAMStat. QualiMap. Conclusions
Introduction SAMStat QualiMap Conclusions Introduction SAMStat QualiMap Conclusions Where are we? Why QC on mapped sequences Acknowledgment: Fernando García Alcalde The reads may look OK in QC analyses
More informationWhole Genome Alignments and Synteny Maps
Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of
More informationMathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007
-2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open
More informationHigh-throughput sequencing: Alignment and related topic
High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg HTS Platforms E s ta b lis h e d p la tfo rm s Illu m in a H is e q, A B I S O L id, R o c h e 4 5 4 N e w c o m e rs
More informationRNAseq Applications in Genome Studies. Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford
RNAseq Applications in Genome Studies Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford RNAseq Protocols } Next generation sequencing protocol } cdna, not RNA sequencing
More informationAnnotation of Plant Genomes using RNA-seq. Matteo Pellegrini (UCLA) In collaboration with Sabeeha Merchant (UCLA)
Annotation of Plant Genomes using RNA-seq Matteo Pellegrini (UCLA) In collaboration with Sabeeha Merchant (UCLA) inuscu1-35bp 5 _ 0 _ 5 _ What is Annotation inuscu2-75bp luscu1-75bp 0 _ 5 _ Reconstruction
More informationBLAT The BLAST-Like Alignment Tool
Resource BLAT The BLAST-Like Alignment Tool W. James Kent Department of Biology and Center for Molecular Biology of RNA, University of California, Santa Cruz, Santa Cruz, California 95064, USA Analyzing
More informationBioinformatics Practical for Biochemists
Bioinformatics Practical for Biochemists Andrei Lupas, Birte Höcker, Steffen Schmidt WS 2013/2014 01. DNA & Genomics!! 1 Description Lectures about general topics in Bioinformatics & History Tutorials
More informationOur typical RNA quantification pipeline
RNA-Seq primer Our typical RNA quantification pipeline Upload your sequence data (fastq) Align to the ribosome (Bow>e) Align remaining reads to genome (TopHat) or transcriptome (RSEM) Make report of quality
More informationGenome Annotation. Qi Sun Bioinformatics Facility Cornell University
Genome Annotation Qi Sun Bioinformatics Facility Cornell University Some basic bioinformatics tools BLAST PSI-BLAST - Position-Specific Scoring Matrix HMM - Hidden Markov Model NCBI BLAST How does BLAST
More informationSynteny Portal Documentation
Synteny Portal Documentation Synteny Portal is a web application portal for visualizing, browsing, searching and building synteny blocks. Synteny Portal provides four main web applications: SynCircos,
More informationIsoform discovery and quantification from RNA-Seq data
Isoform discovery and quantification from RNA-Seq data C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Deloger November 2016 C. Toffano-Nioche, T. Dayris, Y. Boursin, M. Isoform Deloger discovery and quantification
More informationAnnotation of Drosophila grimashawi Contig12
Annotation of Drosophila grimashawi Contig12 Marshall Strother April 27, 2009 Contents 1 Overview 3 2 Genes 3 2.1 Genscan Feature 12.4............................................. 3 2.1.1 Genome Browser:
More informationBioinformatics Practical for Biochemists
Bioinformatics Practical for Biochemists Andrei Lupas, Birte Höcker, Steffen Schmidt WS 2012/2013 01. DNA & Genomics 1 Description Lectures about general topics in Bioinformatics & History Tutorials will
More informationIntroduction to Sequence Alignment. Manpreet S. Katari
Introduction to Sequence Alignment Manpreet S. Katari 1 Outline 1. Global vs. local approaches to aligning sequences 1. Dot Plots 2. BLAST 1. Dynamic Programming 3. Hash Tables 1. BLAT 4. BWT (Burrow Wheeler
More informationBrowsing Genomic Information with Ensembl Plants
Browsing Genomic Information with Ensembl Plants Etienne de Villiers, PhD (Adapted from slides by Bert Overduin EMBL-EBI) Outline of workshop Brief introduction to Ensembl Plants History Content Tutorial
More informationEnsembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:
Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationVariant visualisation and quality control
Variant visualisation and quality control You really should be making plots! 25/06/14 Paul Theodor Pyl 1 Classical Sequencing Example DNA.BAM.VCF Aligner Variant Caller A single sample sequencing run 25/06/14
More informationSupplemental Information
Molecular Cell, Volume 52 Supplemental Information The Translational Landscape of the Mammalian Cell Cycle Craig R. Stumpf, Melissa V. Moreno, Adam B. Olshen, Barry S. Taylor, and Davide Ruggero Supplemental
More informationCRISPRseek Workshop Design of target-specific guide RNAs in CRISPR-Cas9 genome-editing systems
April 2008 CRISPRseek Workshop Design of target-specific guide RNAs in CRISPR-Cas9 genome-editing systems Lihua Julie Zhu Sept 10th 2014 Michael Brodsky Jianhong Ou INSTALLATION First install R 3.1.0 Windows:
More informationexpress: Streaming read deconvolution and abundance estimation applied to RNA-Seq
express: Streaming read deconvolution and abundance estimation applied to RNA-Seq Adam Roberts 1 and Lior Pachter 1,2 1 Department of Computer Science, 2 Departments of Mathematics and Molecular & Cell
More informationBias in RNA sequencing and what to do about it
Bias in RNA sequencing and what to do about it Walter L. (Larry) Ruzzo Computer Science and Engineering Genome Sciences University of Washington Fred Hutchinson Cancer Research Center Seattle, WA, USA
More informationBLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010
BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for
More informationComparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes
More informationRGP finder: prediction of Genomic Islands
Training courses on MicroScope platform RGP finder: prediction of Genomic Islands Dynamics of bacterial genomes Gene gain Horizontal gene transfer Gene loss Deletion of one or several genes Duplication
More informationIon Torrent. The chip is the machine
Ion Torrent Introduction The Ion Personal Genome Machine [PGM] is simple, more costeffective, and more scalable than any other sequencing technology. Founded in 2007 by Jonathan Rothberg. Part of Life
More informationPredictive Genome Analysis Using Partial DNA Sequencing Data
Predictive Genome Analysis Using Partial DNA Sequencing Data Nauman Ahmed, Koen Bertels and Zaid Al-Ars Computer Engineering Lab, Delft University of Technology, Delft, The Netherlands {n.ahmed, k.l.m.bertels,
More informationSupplementary Information
Supplementary Information LINE-1-like retrotransposons contribute to RNA-based gene duplication in dicots Zhenglin Zhu 1, Shengjun Tan 2, Yaqiong Zhang 2, Yong E. Zhang 2,3 1. School of Life Sciences,
More informationA Browser for Pig Genome Data
A Browser for Pig Genome Data Thomas Mailund January 2, 2004 This report briefly describe the blast and alignment data available at http://www.daimi.au.dk/ mailund/pig-genome/ hits.html. The report describes
More informationPotato Genome Analysis
Potato Genome Analysis Xin Liu Deputy director BGI research 2016.1.21 WCRTC 2016 @ Nanning Reference genome construction???????????????????????????????????????? Sequencing HELL RIEND WELCOME BGI ZHEN LLOFRI
More informationPyrobayes: an improved base caller for SNP discovery in pyrosequences
Pyrobayes: an improved base caller for SNP discovery in pyrosequences Aaron R Quinlan, Donald A Stewart, Michael P Strömberg & Gábor T Marth Supplementary figures and text: Supplementary Figure 1. The
More informationGoing Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014
Going Beyond SNPs with Next Genera5on Sequencing Technology 02-223 Personalized Medicine: Understanding Your Own Genome Fall 2014 Next Genera5on Sequencing Technology (NGS) NGS technology Discover more
More informationPackage NarrowPeaks. September 24, 2012
Package NarrowPeaks September 24, 2012 Version 1.0.1 Date 2012-03-15 Type Package Title Functional Principal Component Analysis to Narrow Down Transcription Factor Binding Site Candidates Author Pedro
More informationSUPPLEMENTARY INFORMATION
SUPPLEMENTARY INFORMATION Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals Young Seok Ju, Jong-Il Kim, Sheehyun
More informationUnfixed endogenous retroviral insertions in the human population. Emanuele Marchi, Alex Kanapin, Gkikas Magiorkinis and Robert Belshaw
Unfixed endogenous retroviral insertions in the human population Emanuele Marchi, Alex Kanapin, Gkikas Magiorkinis and Robert Belshaw Supplementary Methods Common sources of 'false positives' in mining
More informationHigh-throughput sequence alignment. November 9, 2017
High-throughput sequence alignment November 9, 2017 a little history human genome project #1 (many U.S. government agencies and large institute) started October 1, 1990. Goal: 10x coverage of human genome,
More informationEBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013
EBI web resources II: Ensembl and InterPro Yanbin Yin Spring 2013 1 Outline Intro to genome annotation Protein family/domain databases InterPro, Pfam, Superfamily etc. Genome browser Ensembl Hands on Practice
More informationPrac%cal Bioinforma%cs for Life Scien%sts. Week 14, Lecture 28. István Albert Bioinforma%cs Consul%ng Center Penn State
Prac%cal Bioinforma%cs for Life Scien%sts Week 14, Lecture 28 István Albert Bioinforma%cs Consul%ng Center Penn State Final project A group of researchers are interested in studying protein binding loca%ons
More informationDEGseq: an R package for identifying differentially expressed genes from RNA-seq data
DEGseq: an R package for identifying differentially expressed genes from RNA-seq data Likun Wang Zhixing Feng i Wang iaowo Wang * and uegong Zhang * MOE Key Laboratory of Bioinformatics and Bioinformatics
More informationBTRY 7210: Topics in Quantitative Genomics and Genetics
BTRY 7210: Topics in Quantitative Genomics and Genetics Jason Mezey Biological Statistics and Computational Biology (BSCB) Department of Genetic Medicine jgm45@cornell.edu February 12, 2015 Lecture 3:
More informationAlignment Strategies for Large Scale Genome Alignments
Alignment Strategies for Large Scale Genome Alignments CSHL Computational Genomics 9 November 2003 Algorithms for Biological Sequence Comparison algorithm value scoring gap time calculated matrix penalty
More information10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison
10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:
More informationEnsembl Genomes (non-chordates): Quick tour. This quick tour provides a brief introduction to Ensembl Genomes [2], the non-chordate genome browser.
Paul Kersey [1] DNA & RNA Beginner 0.5 hour This quick tour provides a brief introduction to Ensembl Genomes [2], the non-chordate genome browser. Learning objectives: Basic understanding of Ensembl Genomes
More informationGrundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)
More informationLecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) Scribe: John Ekins
Lecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) 2 19 2015 Scribe: John Ekins Multiple Sequence Alignment Given N sequences x 1, x 2,, x N : Insert gaps in each of the sequences
More informationEvolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites
Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites Paper by: James P. Balhoff and Gregory A. Wray Presentation by: Stephanie Lucas Reviewed
More informationSpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq
SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq Mark F. Rogers, Christina Boucher, and Asa Ben-Hur Department of Computer Science 1873 Campus Delivery Fort Collins, CO 80523 rogersma@cs.colostate.edu,
More informationSmall RNA in rice genome
Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and
More informationEffective Cluster-Based Seed Design for Cross-Species Sequence Comparisons
Effective Cluster-Based Seed Design for Cross-Species Sequence Comparisons Leming Zhou and Liliana Florea 1 Methods Supplementary Materials 1.1 Cluster-based seed design 1. Determine Homologous Genes.
More informationTowards More Effective Formulations of the Genome Assembly Problem
Towards More Effective Formulations of the Genome Assembly Problem Alexandru Tomescu Department of Computer Science University of Helsinki, Finland DACS June 26, 2015 1 / 25 2 / 25 CENTRAL DOGMA OF BIOLOGY
More informationDetecting non-coding RNA in Genomic Sequences
Detecting non-coding RNA in Genomic Sequences I. Overview of ncrnas II. What s specific about RNA detection? III. Looking for known RNAs IV. Looking for unknown RNAs Daniel Gautheret INSERM ERM 206 & Université
More informationBioinformatics for Biologists
Bioinformatics for Biologists Sequence Analysis: Part I. Pairwise alignment and database searching Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Definitions The use of computational
More informationMegAlign Pro Pairwise Alignment Tutorials
MegAlign Pro Pairwise Alignment Tutorials All demo data for the following tutorials can be found in the MegAlignProAlignments.zip archive here. Tutorial 1: Multiple versus pairwise alignments 1. Extract
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationSingle Cell Sequencing
Single Cell Sequencing Fundamental unit of life Autonomous and unique Interactive Dynamic - change over time Evolution occurs on the cellular level Robert Hooke s drawing of cork cells, 1665 Type Prokaryotes
More informationThe MANTiS Manual. Contents. MANTiS Version 1.1
The MANTiS Manual MANTiS Version 1.1 Contents Connection to the MANTiS database... 2 Memory settings... 2 Main functionalities... 2 Character Mapping View... 4 Genome content View... 5 Biological processes
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 389; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs06.html 1/12/06 CAP5510/CGS5166 1 Evaluation
More informationIntroduc)on to RNA- Seq Data Analysis. Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas
Introduc)on to RNA- Seq Data Analysis Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas Material: hep://)ny.cc/rnaseq Slides: hep://)ny.cc/slidesrnaseq
More informationSupplementary Information for Discovery and characterization of indel and point mutations
Supplementary Information for Discovery and characterization of indel and point mutations using DeNovoGear Avinash Ramu 1 Michiel J. Noordam 1 Rachel S. Schwartz 2 Arthur Wuster 3 Matthew E. Hurles 3 Reed
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationIdentification of 3 0 gene ends using transcriptional and genomic conservation across vertebrates
Morgan et al. BMC Genomics 2012, 13:708 METHODOLOGY ARTICLE Open Access Identification of 3 0 gene ends using transcriptional and genomic conservation across vertebrates Marcos Morgan 1,2*, Alessandra
More informationtraining workshop 2015
TransPLANT user training workshop 2015 Slides: http://tinyurl.com/transplant2015 Workshop on variation data EMBL-EBI Hinxton-UK 2nd July 2015 Ensembl Genomes Team Notes: This workshop is based on Ensembl
More informationOrganization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p
Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=
More informationOrthologs Detection and Applications
Orthologs Detection and Applications Marcus Lechner Bioinformatics Leipzig 2009-10-23 Marcus Lechner (Bioinformatics Leipzig) Orthologs Detection and Applications 2009-10-23 1 / 25 Table of contents 1
More informationCOLE TRAPNELL, BRIAN A WILLIAMS, GEO PERTEA, ALI MORTAZAVI, GORDON KWAN, MARIJKE J VAN BAREN, STEVEN L SALZBERG, BARBARA J WOLD, AND LIOR PACHTER
SUPPLEMENTARY METHODS FOR THE PAPER TRANSCRIPT ASSEMBLY AND QUANTIFICATION BY RNA-SEQ REVEALS UNANNOTATED TRANSCRIPTS AND ISOFORM SWITCHING DURING CELL DIFFERENTIATION COLE TRAPNELL, BRIAN A WILLIAMS,
More informationExplore SNP polymorphism data. A. Dereeper, Y. Hueber
Explore SNP polymorphism data A. Dereeper, Y. Hueber Bioinformatics trainings, Supagro, February, 2016 Tablet Graphical tool to visualize assemblies Accept many formats ACE, SAM, BAM GATK (Genome Analysis
More informationTandem repeat 16,225 20,284. 0kb 5kb 10kb 15kb 20kb 25kb 30kb 35kb
Overview Fosmid XAAA112 consists of 34,783 nucleotides. Blat results indicate that this fosmid has significant identity to the 2R chromosome of D.melanogaster. Evidence suggests that fosmid XAAA112 contains
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org kcoombes@mdanderson.org
More informationPackage chimeraviz. April 13, 2019
Type Package Title Visualization tools for gene fusions Version 1.8.5 Package chimeraviz April 13, 2019 chimeraviz manages data from fusion gene finders and provides useful visualization tools. License
More informationGCD3033:Cell Biology. Transcription
Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More informationPackage chimeraviz. November 29, 2017
Type Package Title Visualization tools for gene fusions Version 1.4.0 Package chimeraviz November 29, 2017 chimeraviz manages data from fusion gene finders and provides useful visualization tools. License
More informationCGS 5991 (2 Credits) Bioinformatics Tools
CAP 5991 (3 Credits) Introduction to Bioinformatics CGS 5991 (2 Credits) Bioinformatics Tools Giri Narasimhan 8/26/03 CAP/CGS 5991: Lecture 1 1 Course Schedules CAP 5991 (3 credit) will meet every Tue
More informationCS612 - Algorithms in Bioinformatics
Fall 2017 Databases and Protein Structure Representation October 2, 2017 Molecular Biology as Information Science > 12, 000 genomes sequenced, mostly bacterial (2013) > 5x10 6 unique sequences available
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationEBI web resources II: Ensembl and InterPro
EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course
More informationComputational Structural Bioinformatics
Computational Structural Bioinformatics ECS129 Instructor: Patrice Koehl http://koehllab.genomecenter.ucdavis.edu/teaching/ecs129 koehl@cs.ucdavis.edu Learning curve Math / CS Biology/ Chemistry Pre-requisite
More informationHapsembler version 2.1 ( + Encore & Scarpa) Manual. Nilgun Donmez Department of Computer Science University of Toronto
Hapsembler version 2.1 ( + Encore & Scarpa) Manual Nilgun Donmez Department of Computer Science University of Toronto January 13, 2013 Contents 1 Introduction.................................. 2 2 Installation..................................
More informationGenomics and bioinformatics summary. Finding genes -- computer searches
Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence
More informationPart 8 Working with Nucleic Acids
Part 8 Working with Nucleic Acids http://cbm.msoe.edu/newwebsite/learntomodel Introduction Most Protein Databank files loaded into the CBM's Jmol Design Environment include protein structures and small
More informationStatistical Inferences for Isoform Expression in RNA-Seq
Statistical Inferences for Isoform Expression in RNA-Seq Hui Jiang and Wing Hung Wong February 25, 2009 Abstract The development of RNA sequencing (RNA-Seq) makes it possible for us to measure transcription
More informationSequence Database Search Techniques I: Blast and PatternHunter tools
Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered
More informationIntroduction to Bioinformatics Online Course: IBT
Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple
More informationThe Developmental Transcriptome of the Mosquito Aedes aegypti, an invasive species and major arbovirus vector.
The Developmental Transcriptome of the Mosquito Aedes aegypti, an invasive species and major arbovirus vector. Omar S. Akbari*, Igor Antoshechkin*, Henry Amrhein, Brian Williams, Race Diloreto, Jeremy
More informationCorrespondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons
Gao and Li BMC Genomics (2017) 18:234 DOI 10.1186/s12864-017-3600-2 RESEARCH ARTICLE Open Access Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics
More informationExtensive identification and analysis of conserved small ORFs in animals
Mackowiak et al. Genome Biology (2015) 16:179 DOI 10.1186/s13059-015-0742-x RESEARCH Open Access Extensive identification and analysis of conserved small ORFs in animals Sebastian D. Mackowiak 1, Henrik
More informationSupplemental Data. Perea-Resa et al. Plant Cell. (2012) /tpc
Supplemental Data. Perea-Resa et al. Plant Cell. (22)..5/tpc.2.3697 Sm Sm2 Supplemental Figure. Sequence alignment of Arabidopsis LSM proteins. Alignment of the eleven Arabidopsis LSM proteins. Sm and
More informationBiology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall
Biology Biology 1 of 26 Fruit fly chromosome 12-5 Gene Regulation Mouse chromosomes Fruit fly embryo Mouse embryo Adult fruit fly Adult mouse 2 of 26 Gene Regulation: An Example Gene Regulation: An Example
More informationSection 7. Junaid Malek, M.D.
Section 7 Junaid Malek, M.D. RNA Processing and Nomenclature For the purposes of this class, please do not refer to anything as mrna that has not been completely processed (spliced, capped, tailed) RNAs
More information(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.
1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the
More informationGenome Browsers And Genome Databases. Andy Conley Computational Genomics 2009
Genome Browsers And Genome Databases Andy Conley Computational What is a Genome Browser Genome browsers facilitate genomic analysis by presenting alignment, experimental and annotation data in the context
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationBIOINFORMATICS LAB AP BIOLOGY
BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationGENOME-WIDE ANALYSIS OF CORE PROMOTER REGIONS IN EMILIANIA HUXLEYI
1 GENOME-WIDE ANALYSIS OF CORE PROMOTER REGIONS IN EMILIANIA HUXLEYI Justin Dailey and Xiaoyu Zhang Department of Computer Science, California State University San Marcos San Marcos, CA 92096 Email: daile005@csusm.edu,
More informationRNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"
RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure
More informationDIMACS Workshop on Parallelism: A 2020 Vision Lattice Basis Reduction and Multi-Core
DIMACS Workshop on Parallelism: A 2020 Vision Lattice Basis Reduction and Multi-Core Werner Backes and Susanne Wetzel Stevens Institute of Technology 29th March 2011 Work supported through NSF Grant DUE
More informationAlignment-free RNA-seq workflow. Charlotte Soneson University of Zurich Brixen 2017
Alignment-free RNA-seq workflow Charlotte Soneson University of Zurich Brixen 2017 The alignment-based workflow ALIGNMENT COUNTING ANALYSIS Gene A Gene B... Gene X 7... 13............... The alignment-based
More information