Paired-End Read Length Lower Bounds for Genome Re-sequencing
|
|
- Madeline Blair
- 5 years ago
- Views:
Transcription
1 1/11 Paired-End Read Length Lower Bounds for Genome Re-sequencing Rayan Chikhi ENS Cachan Brittany PhD student in the Symbiose team, Irisa, France
2 2/11 NEXT-GENERATION SEQUENCING Next-gen vs. traditional (Sanger) DNA sequencers : Throughput Shorter reads (starting at 25 bp) Cost Paired reads (both ends of a fragment High quality reads of size k nt) ( 0.4% error rate per base on Illumina)
3 2/11 NEXT-GENERATION SEQUENCING Next-gen vs. traditional (Sanger) DNA sequencers : Throughput Cost High quality reads ( 0.4% error rate per base on Illumina) Some bioinformatics applications : Genome re-sequencing De novo (w/out reference genome) and comparative assembly Shorter reads (starting at 25 bp) Paired reads (both ends of a fragment of size k nt) Topic of this talk : Determine when NGS paired read length is too short for re-sequencing.
4 3/11 CONTEXT : GENOME RE-SEQUENCING Re-sequencing : align reads to a reference sequence to improve it and detect SNPs/indels Resequencing ambiguity : map: AACGTATGCA to: -AACGTTTGCA AACGTTTGCA-- Micro-repeated (30-50 nt) regions are : -not re-sequencable with short reads -not fully predicted by simple models (such as BLAST s E-value) nor RepeatMasker Solution : analyze actual genomes [Whiteford et al, 2005] : perfect uniqueness of single reads. Genomes Viral Bacterial Small eukaryote Human Read length for 12 nt 18 nt nt nt max uniqueness (100%) (97%) (90-95%) (85-95%)
5 SEQUENCERS CHART 6 5 Throughput (Gb/run) ABI SOLiD 2 (p) Illumina GA II (p) 1 Illumina GA ABI SOLiD Ti (p) 454 GS FLX Read length (nt) 4/11
6 SEQUENCERS CHART 6 Throughput (Gb/run) E. coli, 97% of reads map uniquely ABI SOLiD 2 (p) Human genome, 95% of reads map uniquely Illumina GA II (p) Human genome chr 1, 80% of genome covered by contigs > 10k nt 1 ABI SOLiD 1 Illumina GA 454 GS Read length (nt) 5/11
7 6/11 METHODS We study the perfect uniqueness of mate-paired reads : (σ, δ)-pair : r 1 r 2 σ ± δ Example values for Illumina [Lee 09] : σ 150, δ 15 a (σ, δ)-pair (r 1, r 2 ) is unique there is no other (r 1, r 2 ) pair distant of σ ± δ in the genome ---AACGT---TTGCA AACGT-----TTGCA-- Here, the (3, 2)-pair AACGT---TTGCA is not unique. U = number of unique (σ, δ)-pairs number of (σ, δ)-pairs
8 7/11 METHODS, ALGORITHM Build a suffix array and lcp of the genome sequence We developed a novel pairs-counting algorithm based on a suffix array. Here, δ = 0 case is shown. Complexity : O(n + nδ) time and memory For each read length l, do: Find duplicate reads For each r such that lcp[r]=l, dupe[r]=1 Determine if each (σ,0)-pair is unique For each r s.t. dupe[r]=1, pair[r,r+l+σ]++ For each r s.t. dupe[r]=1, If (pair[r,r+l+σ]>=2) found_pair(r) } Variant of RepAnalyse
9 7/11 METHODS, ALGORITHM Build a suffix array and lcp of the genome sequence Notes : One-time computation, but high memory usage. RepAnalyse on the human genome : 2 days Our algorithm on Medaka genome, δ = 0 : 12 hrs. For each read length l, do: Find duplicate reads For each r such that lcp[r]=l, dupe[r]=1 Determine if each (σ,0)-pair is unique For each r s.t. dupe[r]=1, pair[r,r+l+σ]++ For each r s.t. dupe[r]=1, If (pair[r,r+l+σ]>=2) found_pair(r) } Variant of RepAnalyse
10 8/11 RESULTS : Comparison between paired and single uniqueness 100 Lambda-phage (50 kbp) E. coli (4.7 Mbp) H. sapiens chr X (154 Mbp) Unique reads (%) Read length (nt) paired unpaired Fugu (408 Mbp) Medaka (886 Mbp) both strands are considered, and (σ, δ)=(300,0)
11 9/11 RESULTS : Evaluation of mate-pair separation vs. uniqueness in the E. coli genome 10% σ = 9000, δ = 5% 0% 10% σ = 5000, δ = 5% 0% 10% σ = 1000, δ = 5% 0% 10% σ = 850, δ = 4% 0% 10% σ = 600, δ = 5% 0% 10% σ = 350, δ = 4% 0% 10% σ = 100, δ = 5% 0% Uniqueness of reads in the E. coli genome Read length (nt) % 99.75% % 99.67% 99.73% % 98.90% 98.99% 99.13% 98.77% 98.85% 98.96% 98.53% 98.57% 98.66% 98.25% 98.26% 98.32% 97.85% 97.85% 97.88%
12 10/11 CONCLUSION Conclusion : Best recipe for paired-end sequencing : 1. increase mate-pair separation 2. keep variability of separation as low as possible 3. use longer read lengths Given perfect separation precision, short (estimate : nt) mate-paired reads should map uniquely to 95% of the human genome. Perspectives : Use statistical models to obtain bounds for approximate uniqueness of reads. Find theoretical lower bounds for de novo assembly. Study the time complexity of paired-end de novo assembly (prelim results : NP-hardness of several models).
13 11/11 ACKNOWLEDGEMENTS Dominique Lavenier, PhD advisor Aurélien Biogenouest genomics center Symbiose team and ENS Cachan Britanny CS dept Any Question?
14 12/11 SUPPL. MATERIAL : Is the right mate always in the same strand? [H. Li et al, 2008] Correct paired-end reads : SOLiD : Right mate always on the same strand Illumina GA : Right mate always on the other strand 100 E. coli (4.7 Mbp) If one wishes to perform structural variation detection, then the uniqueness of both correct and discordant reads matters. Unique reads (%) right mate in forward strang right mate in reverse strand right mate in any strand Read length (nt)
15 SUPPL. MATERIAL : Comparison between paired, 2x paired and single uniqueness 100 Lambda-phage (50 kbp) E. coli (4.7 Mbp) H. sapiens chr X (154 Mbp) Unique reads (%) Read length (nt) paired paired (with read length x2) unpaired Fugu (408 Mbp) Medaka (886 Mbp) both strands are considered, and (σ, δ)=(300,0) 13/11
16 14/11 SUPPL. MATERIAL : Near-perfect micro-repeats Counting only perfect micro-repeats gives a upper bound on unicity, hence a lower bound on read length.
Cycle «Analyse de données de séquençage à haut-débit»
Cycle «Analyse de données de séquençage à haut-débit» Module 1/5 Analyse ADN Chadi Saad CRIStAL - Équipe BONSAI - Univ Lille, CNRS, INRIA (chadi.saad@univ-lille.fr) Présentation de Sophie Gallina (source:
More informationMapping-free and Assembly-free Discovery of Inversion Breakpoints from Raw NGS Reads
1st International Conference on Algorithms for Computational Biology AlCoB 2014 Tarragona, Spain, July 1-3, 2014 Mapping-free and Assembly-free Discovery of Inversion Breakpoints from Raw NGS Reads Claire
More informationGenomes Comparision via de Bruijn graphs
Genomes Comparision via de Bruijn graphs Student: Ilya Minkin Advisor: Son Pham St. Petersburg Academic University June 4, 2012 1 / 19 Synteny Blocks: Algorithmic challenge Suppose that we are given two
More informationHigh-throughput sequence alignment. November 9, 2017
High-throughput sequence alignment November 9, 2017 a little history human genome project #1 (many U.S. government agencies and large institute) started October 1, 1990. Goal: 10x coverage of human genome,
More informationGenome Assembly. Sequencing Output. High Throughput Sequencing
Genome High Throughput Sequencing Sequencing Output Example applications: Sequencing a genome (DNA) Sequencing a transcriptome and gene expression studies (RNA) ChIP (chromatin immunoprecipitation) Example
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Sequence Assembly
CMPS 6630: Introduction to Computational Biology and Bioinformatics Sequence Assembly Why Genome Sequencing? Sanger (1982) introduced chaintermination sequencing. Main idea: Obtain fragments of all possible
More informationAnnotation of Plant Genomes using RNA-seq. Matteo Pellegrini (UCLA) In collaboration with Sabeeha Merchant (UCLA)
Annotation of Plant Genomes using RNA-seq Matteo Pellegrini (UCLA) In collaboration with Sabeeha Merchant (UCLA) inuscu1-35bp 5 _ 0 _ 5 _ What is Annotation inuscu2-75bp luscu1-75bp 0 _ 5 _ Reconstruction
More informationPan-genomics: theory & practice
Pan-genomics: theory & practice Michael Schatz Sept 20, 2014 GRC Assembly Workshop #gi2014 / @mike_schatz Part 1: Theory Advances in Assembly! Perfect Human Assembly First PacBio RS @ CSHL Perfect Microbes
More informationDesigning and Testing a New DNA Fragment Assembler VEDA-2
Designing and Testing a New DNA Fragment Assembler VEDA-2 Mark K. Goldberg Darren T. Lim Rensselaer Polytechnic Institute Computer Science Department {goldberg, limd}@cs.rpi.edu Abstract We present VEDA-2,
More informationPyrobayes: an improved base caller for SNP discovery in pyrosequences
Pyrobayes: an improved base caller for SNP discovery in pyrosequences Aaron R Quinlan, Donald A Stewart, Michael P Strömberg & Gábor T Marth Supplementary figures and text: Supplementary Figure 1. The
More informationGeneral context Anchor-based method Evaluation Discussion. CoCoGen meeting. Accuracy of the anchor-based strategy for genome alignment.
CoCoGen meeting Accuracy of the anchor-based strategy for genome alignment Raluca Uricaru LIRMM, CNRS Université de Montpellier 2 3 octobre 2008 1 / 31 Summary 1 General context 2 Global alignment : anchor-based
More informationComparative genomics: Overview & Tools + MUMmer algorithm
Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first
More informationIntroduction to de novo RNA-seq assembly
Introduction to de novo RNA-seq assembly Introduction Ideal day for a molecular biologist Ideal Sequencer Any type of biological material Genetic material with high quality and yield Cutting-Edge Technologies
More informationSJÄLVSTÄNDIGA ARBETEN I MATEMATIK
SJÄLVSTÄNDIGA ARBETEN I MATEMATIK MATEMATISKA INSTITUTIONEN, STOCKHOLMS UNIVERSITET Analysing k-mer distributions in a genome sequencing project av Josene Röhss 2014 - No 22 MATEMATISKA INSTITUTIONEN,
More informationSingle-cell genomics applied to the picobiliphytes using next-generation sequencing
Department of Ecology, Evolution and Natural Resources and Institute of Marine and Coastal Sciences Rutgers University, NJ 08901 Single-cell genomics applied to the picobiliphytes using next-generation
More informationDiscovery of Genomic Structural Variations with Next-Generation Sequencing Data
Discovery of Genomic Structural Variations with Next-Generation Sequencing Data Advanced Topics in Computational Genomics Slides from Marcel H. Schulz, Tobias Rausch (EMBL), and Kai Ye (Leiden University)
More informationDe novo assembly and genotyping of variants using colored de Bruijn graphs
De novo assembly and genotyping of variants using colored de Bruijn graphs Iqbal et al. 2012 Kolmogorov Mikhail 2013 Challenges Detecting genetic variants that are highly divergent from a reference Detecting
More informationAssembly improvement: based on Ragout approach. student: Anna Lioznova scientific advisor: Son Pham
Assembly improvement: based on Ragout approach student: Anna Lioznova scientific advisor: Son Pham Plan Ragout overview Datasets Assembly improvements Quality overlap graph paired-end reads Coverage Plan
More informationAlgorithmics and Bioinformatics
Algorithmics and Bioinformatics Gregory Kucherov and Philippe Gambette LIGM/CNRS Université Paris-Est Marne-la-Vallée, France Schedule Course webpage: https://wikimpri.dptinfo.ens-cachan.fr/doku.php?id=cours:c-1-32
More informationTranscriptional Circuitry and the Regulatory Conformation of the Genome
Transcriptional Circuitry and the Regulatory Conformation of the Genome I-CORE for Gene Regulation in Complex Human Disease. Feb. 1 st, 2015 The Hebrew University Introduction to Next-generation sequencing
More informationStatistical analysis of genomic binding sites using high-throughput ChIP-seq data
Statistical analysis of genomic binding sites using high-throughput ChIP-seq data Ibrahim Ali H Nafisah Department of Statistics University of Leeds Submitted in accordance with the requirments for the
More informationBio 119 Bacterial Genomics 6/26/10
BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis
More informationWhole genome sequencing (WGS) as a tool for monitoring purposes. Henrik Hasman DTU - Food
Whole genome sequencing (WGS) as a tool for monitoring purposes Henrik Hasman DTU - Food The Challenge Is to: Continue to increase the power of surveillance and diagnostic using molecular tools Develop
More informationWhole genome sequencing (WGS) - there s a new tool in town. Henrik Hasman DTU - Food
Whole genome sequencing (WGS) - there s a new tool in town Henrik Hasman DTU - Food Welcome to the NGS world TODAY Welcome Introduction to Next Generation Sequencing DNA purification (Hands-on) Lunch (Sandwishes
More informationGenomics and bioinformatics summary. Finding genes -- computer searches
Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence
More informationGrundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)
More informationNew Contig Creation Algorithm for the de novo DNA Assembly Problem
New Contig Creation Algorithm for the de novo DNA Assembly Problem Mohammad Goodarzi Computer Science A thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in
More informationCHAPTER : Prokaryotic Genetics
CHAPTER 13.3 13.5: Prokaryotic Genetics 1. Most bacteria are not pathogenic. Identify several important roles they play in the ecosystem and human culture. 2. How do variations arise in bacteria considering
More informationHigh-throughput sequencing: Alignment and related topic
High-throughput sequencing: Alignment and related topic Simon Anders EMBL Heidelberg HTS Platforms E s ta b lis h e d p la tfo rm s Illu m in a H is e q, A B I S O L id, R o c h e 4 5 4 N e w c o m e rs
More informationDiscrete Probability Distributions: Poisson Applications to Genome Assembly
Discrete Probability Distributions: Poisson Applications to Genome Assembly Binomial Distribution Binomially distributed random variable X equals sum of n independent Bernoulli trials The probability mass
More informationUnfixed endogenous retroviral insertions in the human population. Emanuele Marchi, Alex Kanapin, Gkikas Magiorkinis and Robert Belshaw
Unfixed endogenous retroviral insertions in the human population Emanuele Marchi, Alex Kanapin, Gkikas Magiorkinis and Robert Belshaw Supplementary Methods Common sources of 'false positives' in mining
More informationG4120: Introduction to Computational Biology
ICB Fall 2009 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology & Immunology Copyright 2008 Oliver Jovanovic, All Rights Reserved. Genome
More informationChIP seq peak calling. Statistical integration between ChIP seq and RNA seq
Institute for Computational Biomedicine ChIP seq peak calling Statistical integration between ChIP seq and RNA seq Olivier Elemento, PhD ChIP-seq to map where transcription factors bind DNA Transcription
More informationTowards More Effective Formulations of the Genome Assembly Problem
Towards More Effective Formulations of the Genome Assembly Problem Alexandru Tomescu Department of Computer Science University of Helsinki, Finland DACS June 26, 2015 1 / 25 2 / 25 CENTRAL DOGMA OF BIOLOGY
More informationProbabilistic Arithmetic Automata
Probabilistic Arithmetic Automata Applications of a Stochastic Computational Framework in Biological Sequence Analysis Inke Herms PhD thesis defense Overview 1 Probabilistic Arithmetic Automata 2 Application
More informationHumans have two copies of each chromosome. Inherited from mother and father. Genotyping technologies do not maintain the phase
Humans have two copies of each chromosome Inherited from mother and father. Genotyping technologies do not maintain the phase Genotyping technologies do not maintain the phase Recall that proximal SNPs
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More informationMulti-Assembly Problems for RNA Transcripts
Multi-Assembly Problems for RNA Transcripts Alexandru Tomescu Department of Computer Science University of Helsinki Joint work with Veli Mäkinen, Anna Kuosmanen, Romeo Rizzi, Travis Gagie, Alex Popa CiE
More informationIntroduction to Molecular and Cell Biology
Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the molecular basis of disease? What
More information2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology
2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the
More informationCONTENTS. P A R T I Genomes 1. P A R T II Gene Transcription and Regulation 109
CONTENTS ix Preface xv Acknowledgments xxi Editors and contributors xxiv A computational micro primer xxvi P A R T I Genomes 1 1 Identifying the genetic basis of disease 3 Vineet Bafna 2 Pattern identification
More informationPredictive Genome Analysis Using Partial DNA Sequencing Data
Predictive Genome Analysis Using Partial DNA Sequencing Data Nauman Ahmed, Koen Bertels and Zaid Al-Ars Computer Engineering Lab, Delft University of Technology, Delft, The Netherlands {n.ahmed, k.l.m.bertels,
More information23/01/2018. PiRATE: a Pipeline to Retrieve and Annotate TEs of non-model organisms. Transposable elements (TEs) Impact of TEs on genomes
Transposable elements () PiRATE: a Pipeline to Retrieve and Annotate of non-model organisms are DNAsequences able to move (= transposition) into the host genome of eucaryotic and procaryotic organisms
More informationBioinformatics Practical for Biochemists
Bioinformatics Practical for Biochemists Andrei Lupas, Birte Höcker, Steffen Schmidt WS 2013/2014 01. DNA & Genomics!! 1 Description Lectures about general topics in Bioinformatics & History Tutorials
More informationBioinformatics Practical for Biochemists
Bioinformatics Practical for Biochemists Andrei Lupas, Birte Höcker, Steffen Schmidt WS 2012/2013 01. DNA & Genomics 1 Description Lectures about general topics in Bioinformatics & History Tutorials will
More informationSequence Database Search Techniques I: Blast and PatternHunter tools
Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered
More informationThe Saguaro Genome. Toward the Ecological Genomics of a Sonoran Desert Icon. Dr. Dario Copetti June 30, 2015 STEMAZing workshop TCSS
The Saguaro Genome Toward the Ecological Genomics of a Sonoran Desert Icon Dr. Dario Copetti June 30, 2015 STEMAZing workshop TCSS Why study a genome? - the genome contains the genetic information of an
More informationHeterozygous BMN lines
Optical density at 80 hours 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 0.8 0.6 0.4 0.2 a YPD b YPD + 1µM nystatin c YPD + 2µM nystatin d YPD + 4µM nystatin 1 3 5 6 9 13 16 20 21 22 23 25 28 29 30
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 389; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs06.html 1/12/06 CAP5510/CGS5166 1 Evaluation
More informationHidden Markov Models and some applications
Oleg Makhnin New Mexico Tech Dept. of Mathematics November 11, 2011 HMM description Application to genetic analysis Applications to weather and climate modeling Discussion HMM description Application to
More informationNatural Variation, Genomics and Biotechnology
Natural Variation, Genomics and Biotechnology My royal ancestry Stefan Jansson (1959-) - Professor Britt-Greta Jansson (1936-) Office clerk Karl Ljungberg (1911-1998) - Shopkeeper Olof Ljungberg (1884-1944)
More informationGenome Assembly Results, Protocol & Demo
Genome Assembly Results, Protocol & Demo Monday, March 7, 2016 BIOL 7210: Genome Assembly Group Aroon Chande, Cheng Chen, Alicia Francis, Alli Gombolay, Namrata Kalsi, Ellie Kim, Tyrone Lee, Wilson Martin,
More informationK-means-based Feature Learning for Protein Sequence Classification
K-means-based Feature Learning for Protein Sequence Classification Paul Melman and Usman W. Roshan Department of Computer Science, NJIT Newark, NJ, 07102, USA pm462@njit.edu, usman.w.roshan@njit.edu Abstract
More informationHapsembler version 2.1 ( + Encore & Scarpa) Manual. Nilgun Donmez Department of Computer Science University of Toronto
Hapsembler version 2.1 ( + Encore & Scarpa) Manual Nilgun Donmez Department of Computer Science University of Toronto January 13, 2013 Contents 1 Introduction.................................. 2 2 Installation..................................
More informationL3: Blast: Keyword match basics
L3: Blast: Keyword match basics Fa05 CSE 182 Silly Quiz TRUE or FALSE: In New York City at any moment, there are 2 people (not bald) with exactly the same number of hairs! Assignment 1 is online Due 10/6
More informationA DNA Sequence 2017/12/6 1
A DNA Sequence ccgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatcgtgtgg gtagtagctgatatgatgcgaggtaggggataggatagcaacagatgagc ggatgctgagtgcagtggcatgcgatgtcgatgatagcggtaggtagacttc gcgcataaagctgcgcgagatgattgcaaagragttagatgagctgatgcta
More informationBIOINFORMATICS. PILER: identification and classification of genomic repeats. Robert C. Edgar 1* and Eugene W. Myers 2 1 INTRODUCTION
BIOINFORMATICS Vol. 1 no. 1 2003 Pages 1 1 PILER: identification and classification of genomic repeats Robert C. Edgar 1* and Eugene W. Myers 2 1 195 Roque Moraes Drive, Mill Valley, CA, U.S.A., bob@drive5.com.
More informationMining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences
Mining Infrequent Patterns of Two Frequent Substrings from a Single Set of Biological Sequences Daisuke Ikeda Department of Informatics, Kyushu University 744 Moto-oka, Fukuoka 819-0395, Japan. daisuke@inf.kyushu-u.ac.jp
More informationSupplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation.
Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation. Inlets are labelled in blue, outlets are labelled in red, and static channels
More informationTopic 3: Genetics (Student) Essential Idea: Chromosomes carry genes in a linear sequence that is shared by members of a species.
Topic 3: Genetics (Student) 3.2 Essential Idea: Chromosomes carry genes in a linear sequence that is shared by members of a species. 3.2 Chromosomes 3.2.U1 Prokaryotes have one chromosome consisting of
More informationOverview of Research at Bioinformatics Lab
Overview of Research at Bioinformatics Lab Li Liao Develop new algorithms and (statistical) learning methods that help solve biological problems > Capable of incorporating domain knowledge > Effective,
More informationHidden Markov Models and some applications
Oleg Makhnin New Mexico Tech Dept. of Mathematics November 11, 2011 HMM description Application to genetic analysis Applications to weather and climate modeling Discussion HMM description Hidden Markov
More informationLinear-Space Alignment
Linear-Space Alignment Subsequences and Substrings Definition A string x is a substring of a string x, if x = ux v for some prefix string u and suffix string v (similarly, x = x i x j, for some 1 i j x
More informationRGP finder: prediction of Genomic Islands
Training courses on MicroScope platform RGP finder: prediction of Genomic Islands Dynamics of bacterial genomes Gene gain Horizontal gene transfer Gene loss Deletion of one or several genes Duplication
More informationMore Codon Usage Bias
.. CSC448 Bioinformatics Algorithms Alexander Dehtyar.. DA Sequence Evaluation Part II More Codon Usage Bias Scaled χ 2 χ 2 measure. In statistics, the χ 2 statstic computes how different the distribution
More informationOn the Sound Covering Cycle Problem in Paired de Bruijn Graphs
On the Sound Covering Cycle Problem in Paired de Bruijn Graphs Christian Komusiewicz 1 and Andreea Radulescu 2 1 Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Germany christian.komusiewicz@tu-berlin.de
More informationFitness constraints on horizontal gene transfer
Fitness constraints on horizontal gene transfer Dan I Andersson University of Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala, Sweden GMM 3, 30 Aug--2 Sep, Oslo, Norway Acknowledgements:
More informationFINISHING MY FOSMID XAAA113 Andrew Nett
FINISHING MY FOSMID XAAA113 Andrew Nett The assembly of fosmid XAAA113 initially consisted of three contigs with two gaps (Fig 1). This assembly was based on the sequencing reactions of subclones set up
More informationIntroduction to Sequence Alignment. Manpreet S. Katari
Introduction to Sequence Alignment Manpreet S. Katari 1 Outline 1. Global vs. local approaches to aligning sequences 1. Dot Plots 2. BLAST 1. Dynamic Programming 3. Hash Tables 1. BLAT 4. BWT (Burrow Wheeler
More informationY chromosome dynamics in Drosophila. Amanda Larracuente Department of Biology
Y chromosome dynamics in Drosophila Amanda Larracuente Department of Biology Sex chromosomes J. Graves X X X Y Sex chromosome evolution Autosomes Proto-sex chromosomes Sex determining Suppressed recombination
More informationRNA- seq read mapping
RNA- seq read mapping Pär Engström SciLifeLab RNA- seq workshop October 216 IniDal steps in RNA- seq data processing 1. Quality checks on reads 2. Trim 3' adapters (opdonal (for species with a reference
More informationGenome Sequencing & Assembly Michael Schatz. May 2, 2013 Human Microbiome Consortium
Genome Sequencing & Assembly Michael Schatz May 2, 2013 Human Microbiome Consortium Outline 1. Assembly theory 1. Assembly by analogy 2. De Bruijn and Overlap graph 3. Coverage, read length, errors, and
More informationA Browser for Pig Genome Data
A Browser for Pig Genome Data Thomas Mailund January 2, 2004 This report briefly describe the blast and alignment data available at http://www.daimi.au.dk/ mailund/pig-genome/ hits.html. The report describes
More informationWhole Genome Alignments and Synteny Maps
Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of
More informationCentrifuge: rapid and sensitive classification of metagenomic sequences
Centrifuge: rapid and sensitive classification of metagenomic sequences Daehwan Kim, Li Song, Florian P. Breitwieser, and Steven L. Salzberg Supplementary Material Supplementary Table 1 Supplementary Note
More informationComputer Algebra Systems Activity: Developing the Quadratic Formula
Computer Algebra Systems Activity: Developing the Quadratic Formula Topic: Developing the Quadratic Formula R. Meisel, Feb 19, 2005 rollym@vaxxine.com Ontario Expectations: (To be added when finalized
More informationRead Quality Assessment & Improvement. J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014
Read Quality ssessment & Improvement J Fass UCD Genome Center Bioinformatics Core Monday June 16, 2014 Error modes Each technology has unique error modes, depending on the physico-chemical processes involved
More informationGenome sequence of Plasmopara viticola and insight into the pathogenic mechanism
Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism Ling Yin 1,3,, Yunhe An 1,2,, Junjie Qu 3,, Xinlong Li 1, Yali Zhang 1, Ian Dry 5, Huijun Wu 2*, Jiang Lu 1,4** 1 College
More informationChing-Han Hsu, BMES, National Tsing Hua University c 2015 by Ching-Han Hsu, Ph.D., BMIR Lab. = a + b 2. b a. x a b a = 12
Lecture 5 Continuous Random Variables BMIR Lecture Series in Probability and Statistics Ching-Han Hsu, BMES, National Tsing Hua University c 215 by Ching-Han Hsu, Ph.D., BMIR Lab 5.1 1 Uniform Distribution
More informationApplications of genome alignment
Applications of genome alignment Comparing different genome assemblies Locating genome duplications and conserved segments Gene finding through comparative genomics Analyzing pathogenic bacteria against
More informationEncoding and Decoding 3D Genome Organization
Encoding and Decoding 3D Genome Organization Noam Kaplan Dekker Lab Program in Systems Biology University of Massachusetts Medical School Chromatin organization 10000nm Human genome: 2x3000 Mbp Human chromosome:
More informationIntroduction to microbiota data analysis
Introduction to microbiota data analysis Natalie Knox, PhD Head Bacterial Genomics, Bioinformatics Core National Microbiology Laboratory, Public Health Agency of Canada 2 National Microbiology Laboratory
More informationHidden Markov Models. Ron Shamir, CG 08
Hidden Markov Models 1 Dr Richard Durbin is a graduate in mathematics from Cambridge University and one of the founder members of the Sanger Institute. He has also held carried out research at the Laboratory
More informationGEP Annotation Report
GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:
More informationSupplementary Information for: The genome of the extremophile crucifer Thellungiella parvula
Supplementary Information for: The genome of the extremophile crucifer Thellungiella parvula Maheshi Dassanayake 1,9, Dong-Ha Oh 1,9, Jeffrey S. Haas 1,2, Alvaro Hernandez 3, Hyewon Hong 1,4, Shahjahan
More informationMicrobiome: 16S rrna Sequencing 3/30/2018
Microbiome: 16S rrna Sequencing 3/30/2018 Skills from Previous Lectures Central Dogma of Biology Lecture 3: Genetics and Genomics Lecture 4: Microarrays Lecture 12: ChIP-Seq Phylogenetics Lecture 13: Phylogenetics
More informationIs inversion symmetry of chromosomes a law of nature?
Is inversion symmetry of chromosomes a law of nature? David Horn TAU Safra bioinformatics retreat, 28/6/2018 Lecture based on Inversion symmetry of DNA k-mer counts: validity and deviations. Shporer S,
More informationNature Genetics: doi:0.1038/ng.2768
Supplementary Figure 1: Graphic representation of the duplicated region at Xq28 in each one of the 31 samples as revealed by acgh. Duplications are represented in red and triplications in blue. Top: Genomic
More informationEvaluation of the Number of Different Genomes on Medium and Identification of Known Genomes Using Composition Spectra Approach.
Evaluation of the Number of Different Genomes on Medium and Identification of Known Genomes Using Composition Spectra Approach Valery Kirzhner *1 & Zeev Volkovich 2 1 Institute of Evolution, University
More informationCharacterization of Structural Variants with Single Molecule and Hybrid Sequencing Approaches
Bioinformatics Advance Access published October 28, 2014 Characterization of Structural Variants with Single Molecule and Hybrid Sequencing Approaches Anna Ritz 1,,, Ali Bashir 2,3, Suzanne Sindi 4, David
More informationnatural development from this collection of knowledge: it is more reliable to predict the property
1 Chapter 1 Introduction As the basis of all life phenomena, the interaction of biomolecules has been under the scrutiny of scientists and cataloged meticulously [2]. The recent advent of systems biology
More informationPLNT2530 (2018) Unit 5 Genomes: Organization and Comparisons
PLNT2530 (2018) Unit 5 Genomes: Organization and Comparisons Unless otherwise cited or referenced, all content of this presenataion is licensed under the Creative Commons License Attribution Share-Alike
More informationBioinformatics course
Bioinformatics course Phylogeny and Comparative genomics 10/23/13 1 Contents-phylogeny Introduction-biology, life classificationtaxonomy Phylogenetic-tree of life, tree representation Why study phylogeny?
More informationTitle: Field-based species identification of closely-related plants using real-time nanopore sequencing.
Title: Field-based species identification of closely-related plants using real-time nanopore sequencing. Authors: Joe Parker 1 *, Dion Devey 1, Andrew J. Helmstetter 1, Tim Wilkinson 1 & Alexander S.T.
More informationEssentiality in B. subtilis
Essentiality in B. subtilis 100% 75% Essential genes Non-essential genes Lagging 50% 25% Leading 0% non-highly expressed highly expressed non-highly expressed highly expressed 1 http://www.pasteur.fr/recherche/unites/reg/
More informationSupplemental Materials
JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION, May 2013, p. 107-109 DOI: http://dx.doi.org/10.1128/jmbe.v14i1.496 Supplemental Materials for Engaging Students in a Bioinformatics Activity to Introduce Gene
More informationBiology 105/Summer Bacterial Genetics 8/12/ Bacterial Genomes p Gene Transfer Mechanisms in Bacteria p.
READING: 14.2 Bacterial Genomes p. 481 14.3 Gene Transfer Mechanisms in Bacteria p. 486 Suggested Problems: 1, 7, 13, 14, 15, 20, 22 BACTERIAL GENETICS AND GENOMICS We still consider the E. coli genome
More informationOn the Fixed Parameter Tractability and Approximability of the Minimum Error Correction problem
On the Fixed Parameter Tractability and Approximability of the Minimum Error Correction problem Paola Bonizzoni, Riccardo Dondi, Gunnar W. Klau, Yuri Pirola, Nadia Pisanti and Simone Zaccaria DISCo, computer
More informationGenome Sequencing & DNA Sequence Analysis
7.91 / 7.36 / BE.490 Lecture #1 Feb. 24, 2004 Genome Sequencing & DNA Sequence Analysis Chris Burge What is a Genome? A genome is NOT a bag of proteins What s in the Human Genome? Outline of Unit II: DNA/RNA
More informationFragment Assembly of DNA
Wright State University CORE Scholar Computer Science and Engineering Faculty Publications Computer Science and Engineering 2003 Fragment Assembly of DNA Dan E. Krane Wright State University - Main Campus,
More information