Discovery of Genomic Structural Variations with Next-Generation Sequencing Data


1 Discovery of Genomic Structural Variations with Next-Generation Sequencing Data. Advanced Topics in Computational Genomics. Slides from Marcel H. Schulz, Tobias Rausch (EMBL), and Kai Ye (Leiden University)

2 Computational Methods

3 Detecting Genomic Rearrangements: mate-pair or paired-end mapping abnormalities; split-read alignments; read depth signals. [Figure label: Reference] courtesy of Tobias Rausch (EMBL)

4 Detecting Genomic Rearrangements: unmapped or single-anchored reads; mate-pair or paired-end mapping abnormalities; split-read alignments; read depth signals; local assembly. [Figure label: Reference] courtesy of Tobias Rausch (EMBL)
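As a rough illustration of the paired-end signal listed above, the following Python sketch labels a read pair by comparing its mapped span on the reference with the library insert size. The function name, the threshold of three standard deviations, and the example numbers are assumptions made for illustration; they are not taken from the slides.

```python
def classify_pair(observed_span, mean_insert, sd_insert, n_sd=3.0):
    """Label one read pair by its mapped span relative to the library insert size.

    All parameters are hypothetical: observed_span is the distance between the
    mates on the reference, mean_insert and sd_insert describe the library.
    """
    if observed_span > mean_insert + n_sd * sd_insert:
        # Mates map farther apart than expected: the donor lacks reference sequence.
        return "deletion"
    if observed_span < mean_insert - n_sd * sd_insert:
        # Mates map closer together than expected: the donor carries extra sequence.
        return "insertion"
    return "concordant"


print(classify_pair(900, mean_insert=300, sd_insert=30))
```

Real callers typically cluster many discordant pairs before reporting a variant; a single pair is weak evidence.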

5-6 (figure-only slides) courtesy of Tobias Rausch (EMBL)

7 Insertions Deletions courtesy of Tobias Rausch (EMBL)

8 Lee et al. (2009) courtesy of Tobias Rausch (EMBL) Korbel et al. (2007)

9-12 (figure-only slides) courtesy of Tobias Rausch (EMBL)

13 [Figure labels: 1 copy, 1 copy, 0 copies, 2 copies, 2 copies] courtesy of Tobias Rausch (EMBL); Chiang et al. (2009)

14 Down syndrome, partial trisomy 21. courtesy of Tobias Rausch (EMBL); Xie et al. (2009)

15 Human cancer cell lines compared to normal cell lines (SegSeq algorithm: no fixed window size, multiple change points method). Chiang et al. (2009)
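The read-depth idea behind the preceding slides can be sketched in a few lines of Python. This is a deliberately simplified baseline that uses fixed windows and the median depth as the two-copy reference; it is not the SegSeq method, which avoids fixed windows. The function name and the example counts are assumptions.

```python
from statistics import median


def estimate_copy_number(window_counts, ploidy=2):
    """Estimate an integer copy number per window from read counts.

    Assumes most windows are at normal ploidy, so the median count
    represents `ploidy` copies. Hypothetical helper for illustration.
    """
    baseline = median(window_counts)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalise")
    return [round(ploidy * count / baseline) for count in window_counts]


# Made-up read counts for six equally sized windows.
print(estimate_copy_number([100, 98, 51, 0, 102, 149]))
```

In practice the counts would also need correction for GC content and mappability before any copy number is called.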

16 With reads of length bps are we able to find the exact breakpoint of a structural variation?

17 With reads of length bps are we able to find the exact breakpoint of a structural variation? Yes, using split-read mapping. [Figure labels: Donor, Reference] Example for a read of length 40: what is the expected number of random matches for a 12 bp read prefix in the human genome?
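The question on this slide can be answered with a back-of-the-envelope calculation. The sketch below assumes a genome of about 3 billion bp with independent, equally likely bases, which is a simplification (real genomes are repetitive); the constant and the function name are illustrative assumptions.

```python
GENOME_LENGTH = 3_000_000_000  # assumed approximate haploid human genome size in bp


def expected_random_matches(k, genome_length=GENOME_LENGTH, both_strands=False):
    """Expected number of chance occurrences of one fixed k-mer in a random genome."""
    positions = genome_length - k + 1      # possible start positions
    expected = positions / 4 ** k          # each position matches with probability 4^-k
    return 2 * expected if both_strands else expected


print(round(expected_random_matches(12)))  # about 179 on one strand
print(expected_random_matches(40))         # far below one for a full 40 bp read
```

Under these assumptions a 12 bp prefix is expected to match by chance well over a hundred times, which is why an unanchored split read of this length cannot be placed reliably.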


19 With reads of length bps are we able to find the exact breakpoint of a structural variation? Yes, using anchored split-read mapping: the mappable read mate provides an anchor that narrows down the search space. [Figure labels: Donor, Reference] Medvedev et al. (2009)

20 The Pindel algorithm (Deletions) How to do that? Ye et al. (2009)

21 The Pindel algorithm (Deletions): (1) Use the 3' end of the left read as anchor point. (2) Use pattern growth to search for minimum and maximum unique substrings from the 3' end of the unmapped read (search range <= 2x insert size). Ye et al. (2009)

22 String matching by pattern growth. ATGCA ATCAAGTATGCTTAGC courtesy of Kai Ye (Leiden U.)

23 String matching by pattern growth. ATGCA ATCAAGTATGCTTAGC courtesy of Kai Ye (Leiden U.)

24 String matching by pattern growth. ATGCA ATCAAGTATGCTTAGC courtesy of Kai Ye (Leiden U.)

25 String matching by pattern growth. ATGCA ATCAAGTATGCTTAGC courtesy of Kai Ye (Leiden U.)

26 String matching by pattern growth. ATGCA ATCAAGTATGCTTAGC Minimum unique substring: ATG. Maximum unique substring: ATGC. courtesy of Kai Ye (Leiden U.)
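The example on these slides can be reproduced with a naive Python sketch: grow a prefix of the pattern one base at a time and record the shortest and longest prefixes that occur exactly once in the reference string. This recounts occurrences at every step for clarity, whereas a real pattern-growth implementation would carry the matching positions forward; the function names are assumptions.

```python
def count_occurrences(reference, pattern):
    """Count (possibly overlapping) occurrences of pattern in reference."""
    count, start = 0, reference.find(pattern)
    while start != -1:
        count += 1
        start = reference.find(pattern, start + 1)
    return count


def min_max_unique_prefix(read, reference):
    """Return (minimum, maximum) unique prefixes of read, or (None, None)."""
    minimum = maximum = None
    for length in range(1, len(read) + 1):
        prefix = read[:length]
        hits = count_occurrences(reference, prefix)
        if hits == 0:
            break                  # longer prefixes cannot match either
        if hits == 1:
            if minimum is None:
                minimum = prefix   # first prefix that becomes unique
            maximum = prefix       # longest unique prefix seen so far
    return minimum, maximum


print(min_max_unique_prefix("ATGCA", "ATCAAGTATGCTTAGC"))  # ('ATG', 'ATGC')
```

Here "AT" still occurs twice, "ATG" occurs once, and "ATGCA" no longer matches, which gives the minimum and maximum unique substrings shown on the slide.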

27 The Pindel algorithm (Deletions): (1) Use the 3' end of the left read as anchor point. (2) Use pattern growth to search for minimum and maximum unique substrings from the 3' end of the unmapped read (search range <= 2x insert size). (3) Use pattern growth to search for minimum and maximum unique substrings from the 5' end of the unmapped read (search range: read length + Max_D), starting from the mapped end in step 2. Ye et al. (2009)

28 The Pindel algorithm (Deletions): (1) Use the 3' end of the left read as anchor point. (2) Use pattern growth to search for minimum and maximum unique substrings from the 3' end of the unmapped read (search range <= 2x insert size). (3) Use pattern growth to search for minimum and maximum unique substrings from the 5' end of the unmapped read (search range: read length + Max_D), starting from the mapped end in step 2. (4) Check whether the complete unmapped read can be combined from the 3' and 5' end substring matches. Ye et al. (2009)
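The final check in the steps above, whether the whole unmapped read can be rebuilt from two fragments that each match the reference, can be sketched as follows. This is a simplified stand-in rather than Pindel itself: it tries every split point instead of using pattern growth, and it assumes the `reference` argument is already the search window near the anchor. The names `find_all`, `call_deletion`, and `min_flank` are hypothetical.

```python
def find_all(text, pattern):
    """Return every start index at which pattern occurs in text."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def call_deletion(reference, read, min_flank=3):
    """Return (deletion_start, deletion_length) on the reference, or None.

    The read is split into a left and a right fragment; both must match
    uniquely and exactly, with a gap between them on the reference.
    """
    for split in range(min_flank, len(read) - min_flank + 1):
        left_hits = find_all(reference, read[:split])
        right_hits = find_all(reference, read[split:])
        if len(left_hits) == 1 and len(right_hits) == 1:
            left_end = left_hits[0] + split
            gap = right_hits[0] - left_end
            if gap > 0:
                return left_end, gap
    return None


reference = "TTGACCGATAGGGGGCTAACGTCA"
read = "CCGATACTAACG"          # donor read that lacks the GGGGG segment
print(call_deletion(reference, read))  # (10, 5)
```

The returned pair says that five reference bases starting at index 10 are missing from the read, which pins the breakpoint to single-base resolution.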

29 The Pindel algorithm (Insertions): (1) Use the 3' end of the left read as anchor point. (2) Use pattern growth to search for minimum and maximum unique substrings from the 3' end of the unmapped read (search range <= 2x insert size). (3) Use pattern growth to search for minimum and maximum unique substrings from the 5' end of the unmapped read (search range: read length - 1), starting from the mapped end in step 2. (4) Check whether the complete unmapped read can be combined from the 3' and 5' end substring matches. Ye et al. (2009)

30 The Pindel algorithm (Insertions): (1) Use the 3' end of the left read as anchor point. (2) Use pattern growth to search for minimum and maximum unique substrings from the 3' end of the unmapped read (search range <= 2x insert size). (3) Use pattern growth to search for minimum and maximum unique substrings from the 5' end of the unmapped read (search range: read length - 1), starting from the mapped end in step 2. (4) Check whether the complete unmapped read can be combined from the 3' and 5' end substring matches. In the initial Pindel version, exact matches to the reference were required. Ye et al. (2009)
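For insertions the two matching fragments sit next to each other on the reference, and the unmatched middle of the read is the inserted sequence. The Python sketch below illustrates that idea under the same simplifications as before: exhaustive splitting instead of pattern growth, exact and unique matches only, and a `reference` argument assumed to be the local search window. The names `find_all`, `call_insertion`, and `min_flank` are hypothetical.

```python
def find_all(text, pattern):
    """Return every start index at which pattern occurs in text."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def call_insertion(reference, read, min_flank=3):
    """Return (reference_position, inserted_sequence), or None.

    Looks for read = left + inserted + right where left and right match
    uniquely and are directly adjacent on the reference. The shortest
    inserted sequence is preferred.
    """
    best = None
    n = len(read)
    for i in range(min_flank, n - min_flank):
        left_hits = find_all(reference, read[:i])
        if len(left_hits) != 1:
            continue
        breakpoint = left_hits[0] + i
        for j in range(i + 1, n - min_flank + 1):
            right_hits = find_all(reference, read[j:])
            if right_hits == [breakpoint]:
                inserted = read[i:j]
                if best is None or len(inserted) < len(best[1]):
                    best = (breakpoint, inserted)
    return best


reference = "TTGACCGATACTAACGTCA"
read = "CCGATAGGGCTAACG"       # donor read carrying an extra GGG
print(call_insertion(reference, read))  # (10, 'GGG')
```

Because both flanks must fit inside one read, this approach only recovers insertions shorter than the read, consistent with the "read length - 1" search range in step 3.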

31 The Pindel algorithm (Real Data) Ye et al. (2009)

32 The Pindel algorithm (Real Data) Ye et al. (2009)

33 The Pindel algorithm for complex variants: a) large deletion, b) tandem duplication, c) inversion, d-f) same as a-c with non-template sequence (yellow part). Ye et al., Pindel manual

34 Acknowledgements: Tobias Rausch (EMBL), Kai Ye (Leiden University Medical Center), Anne-Katrin Emde (Freie Universität Berlin). References: Kai Ye, Marcel H. Schulz, Quan Long, Rolf Apweiler, and Zemin Ning. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics (2009) 25(21): Pindel homepage: SplazerS homepage:

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014

Going Beyond SNPs with Next Genera5on Sequencing Technology Personalized Medicine: Understanding Your Own Genome Fall 2014 Going Beyond SNPs with Next Genera5on Sequencing Technology 02-223 Personalized Medicine: Understanding Your Own Genome Fall 2014 Next Genera5on Sequencing Technology (NGS) NGS technology Discover more

More information

High-throughput sequence alignment. November 9, 2017

High-throughput sequence alignment. November 9, 2017 High-throughput sequence alignment November 9, 2017 a little history human genome project #1 (many U.S. government agencies and large institute) started October 1, 1990. Goal: 10x coverage of human genome,

More information

Genome Rearrangements In Man and Mouse. Abhinav Tiwari Department of Bioengineering

Genome Rearrangements In Man and Mouse. Abhinav Tiwari Department of Bioengineering Genome Rearrangements In Man and Mouse Abhinav Tiwari Department of Bioengineering Genome Rearrangement Scrambling of the order of the genome during evolution Operations on chromosomes Reversal Translocation

More information

The breakpoint distance for signed sequences

The breakpoint distance for signed sequences The breakpoint distance for signed sequences Guillaume Blin 1, Cedric Chauve 2 Guillaume Fertin 1 and 1 LINA, FRE CNRS 2729 2 LACIM et Département d'informatique, Université de Nantes, Université du Québec

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

Genomes Comparision via de Bruijn graphs

Genomes Comparision via de Bruijn graphs Genomes Comparision via de Bruijn graphs Student: Ilya Minkin Advisor: Son Pham St. Petersburg Academic University June 4, 2012 1 / 19 Synteny Blocks: Algorithmic challenge Suppose that we are given two

More information

Minimal Height and Sequence Constrained Longest Increasing Subsequence

Minimal Height and Sequence Constrained Longest Increasing Subsequence Minimal Height and Sequence Constrained Longest Increasing Subsequence Chiou-Ting Tseng, Chang-Biau Yang and Hsing-Yen Ann Department of Computer Science and Engineering National Sun Yat-sen University,

More information

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB I519 Introduction to Bioinformatics, 2011 Genome Comparison Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Whole genome comparison/alignment Build better phylogenies Identify polymorphism

More information

7 Multiple Genome Alignment

7 Multiple Genome Alignment 94 Bioinformatics I, WS /3, D. Huson, December 3, 0 7 Multiple Genome Alignment Assume we have a set of genomes G,..., G t that we want to align with each other. If they are short and very closely related,

More information

Multiple Whole Genome Alignment

Multiple Whole Genome Alignment Multiple Whole Genome Alignment BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 206 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB I519 Introduction to Bioinformatics, 2015 Genome Comparison Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Whole genome comparison/alignment Build better phylogenies Identify polymorphism

More information

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on: 17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.

More information

Paired-End Read Length Lower Bounds for Genome Re-sequencing

Paired-End Read Length Lower Bounds for Genome Re-sequencing 1/11 Paired-End Read Length Lower Bounds for Genome Re-sequencing Rayan Chikhi ENS Cachan Brittany PhD student in the Symbiose team, Irisa, France 2/11 NEXT-GENERATION SEQUENCING Next-gen vs. traditional

More information

Computational Genetics Winter 2013 Lecture 10. Eleazar Eskin University of California, Los Angeles

Computational Genetics Winter 2013 Lecture 10. Eleazar Eskin University of California, Los Angeles Computational Genetics Winter 2013 Lecture 10 Eleazar Eskin University of California, Los ngeles Pair End Sequencing Lecture 10. February 20th, 2013 (Slides from Ben Raphael) Chromosome Painting: Normal

More information

Characterization of Structural Variants with Single Molecule and Hybrid Sequencing Approaches

Characterization of Structural Variants with Single Molecule and Hybrid Sequencing Approaches Bioinformatics Advance Access published October 28, 2014 Characterization of Structural Variants with Single Molecule and Hybrid Sequencing Approaches Anna Ritz 1,,, Ali Bashir 2,3, Suzanne Sindi 4, David

More information

Nature Genetics: doi:0.1038/ng.2768

Nature Genetics: doi:0.1038/ng.2768 Supplementary Figure 1: Graphic representation of the duplicated region at Xq28 in each one of the 31 samples as revealed by acgh. Duplications are represented in red and triplications in blue. Top: Genomic

More information

Assembly improvement: based on Ragout approach. student: Anna Lioznova scientific advisor: Son Pham

Assembly improvement: based on Ragout approach. student: Anna Lioznova scientific advisor: Son Pham Assembly improvement: based on Ragout approach student: Anna Lioznova scientific advisor: Son Pham Plan Ragout overview Datasets Assembly improvements Quality overlap graph paired-end reads Coverage Plan

More information

arxiv: v1 [q-bio.gn] 5 Mar 2012

arxiv: v1 [q-bio.gn] 5 Mar 2012 CLEVER: Clique-Enumerating Variant Finder Tobias Marschall 1, Ivan Costa 2, Stefan Canzar 1, Markus Bauer 3, Gunnar Klau 1, Alexander Schliep 4, Alexander Schönhuth 1 arxiv:1203.0937v1 [q-bio.gn] 5 Mar

More information

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre

Chromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre PhD defense Chromosomal rearrangements in mammalian genomes : characterising the breakpoints Claire Lemaitre Laboratoire de Biométrie et Biologie Évolutive Université Claude Bernard Lyon 1 6 novembre 2008

More information

Linear-Space Alignment

Linear-Space Alignment Linear-Space Alignment Subsequences and Substrings Definition A string x is a substring of a string x, if x = ux v for some prefix string u and suffix string v (similarly, x = x i x j, for some 1 i j x

More information

RGP finder: prediction of Genomic Islands

RGP finder: prediction of Genomic Islands Training courses on MicroScope platform RGP finder: prediction of Genomic Islands Dynamics of bacterial genomes Gene gain Horizontal gene transfer Gene loss Deletion of one or several genes Duplication

More information

Local Alignment: Smith-Waterman algorithm

Local Alignment: Smith-Waterman algorithm Local Alignment: Smith-Waterman algorithm Example: a shared common domain of two protein sequences; extended sections of genomic DNA sequence. Sensitive to detect similarity in highly diverged sequences.

More information

Whole Genome Alignments and Synteny Maps

Whole Genome Alignments and Synteny Maps Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of

More information

Analysis and Design of Algorithms Dynamic Programming

Analysis and Design of Algorithms Dynamic Programming Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................

More information

MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance

MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance MACFP: Maximal Approximate Consecutive Frequent Pattern Mining under Edit Distance Jingbo Shang, Jian Peng, Jiawei Han University of Illinois, Urbana-Champaign May 6, 2016 Presented by Jingbo Shang 2 Outline

More information

Algorithms for Bioinformatics

Algorithms for Bioinformatics Adapted from slides by Alexandru Tomescu, Leena Salmela, Veli Mäkinen, Esa Pitkänen 582670 Algorithms for Bioinformatics Lecture 5: Combinatorial Algorithms and Genomic Rearrangements 1.10.2015 Background

More information

BIO GENETICS CHROMOSOME MUTATIONS

BIO GENETICS CHROMOSOME MUTATIONS BIO 390 - GENETICS CHROMOSOME MUTATIONS OVERVIEW - Multiples of complete sets of chromosomes are called polyploidy. Even numbers are usually fertile. Odd numbers are usually sterile. - Aneuploidy refers

More information

Efficient Polynomial-Time Algorithms for Variants of the Multiple Constrained LCS Problem

Efficient Polynomial-Time Algorithms for Variants of the Multiple Constrained LCS Problem Efficient Polynomial-Time Algorithms for Variants of the Multiple Constrained LCS Problem Hsing-Yen Ann National Center for High-Performance Computing Tainan 74147, Taiwan Chang-Biau Yang and Chiou-Ting

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays

Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays Hidden Markov Models for the Assessment of Chromosomal Alterations using High-throughput SNP Arrays Department of Biostatistics Johns Hopkins Bloomberg School of Public Health November 18, 2008 Acknowledgments

More information

1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine.

1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine. Protein Synthesis & Mutations RNA 1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine. RNA Contains: 1. Adenine 2.

More information

Implementing Approximate Regularities

Implementing Approximate Regularities Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 389; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs06.html 1/12/06 CAP5510/CGS5166 1 Evaluation

More information

Variant visualisation and quality control

Variant visualisation and quality control Variant visualisation and quality control You really should be making plots! 25/06/14 Paul Theodor Pyl 1 Classical Sequencing Example DNA.BAM.VCF Aligner Variant Caller A single sample sequencing run 25/06/14

More information

Perfect Sorting by Reversals and Deletions/Insertions

Perfect Sorting by Reversals and Deletions/Insertions The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 512 518 Perfect Sorting by Reversals

More information

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6) Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?

More information

Handling Rearrangements in DNA Sequence Alignment

Handling Rearrangements in DNA Sequence Alignment Handling Rearrangements in DNA Sequence Alignment Maneesh Bhand 12/5/10 1 Introduction Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome

More information

Chapter 10: Meiosis and Sexual Reproduction

Chapter 10: Meiosis and Sexual Reproduction Chapter 10: Meiosis and Sexual Reproduction AP Curriculum Alignment The preservation and continuity of genetic material that is being passed from generation to generation in sexually reproducing organisms

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 03: Edit distance and sequence alignment Slides adapted from Dr. Shaojie Zhang (University of Central Florida) KUMC visit How many of you would like to attend

More information

Cell Growth and Genetics

Cell Growth and Genetics Cell Growth and Genetics Cell Division (Mitosis) Cell division results in two identical daughter cells. The process of cell divisions occurs in three parts: Interphase - duplication of chromosomes and

More information

ADMM Fused Lasso for Copy Number Variation Detection in Human 3 March Genomes / 1

ADMM Fused Lasso for Copy Number Variation Detection in Human 3 March Genomes / 1 ADMM Fused Lasso for Copy Number Variation Detection in Human Genomes Yifei Chen and Jacob Biesinger 3 March 2011 ADMM Fused Lasso for Copy Number Variation Detection in Human 3 March Genomes 2011 1 /

More information

Bloom Filters, Minhashes, and Other Random Stuff

Bloom Filters, Minhashes, and Other Random Stuff Bloom Filters, Minhashes, and Other Random Stuff Brian Brubach University of Maryland, College Park StringBio 2018, University of Central Florida What? Probabilistic Space-efficient Fast Not exact Why?

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS

A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology

More information

Cancer: DNA Synthesis, Mitosis, and Meiosis

Cancer: DNA Synthesis, Mitosis, and Meiosis Chapter 5 Cancer: DNA Synthesis, Mitosis, and Meiosis Copyright 2007 Pearson Copyright Prentice Hall, 2007 Inc. Pearson Prentice Hall, Inc. 1 5.6 Meiosis Another form of cell division, meiosis, occurs

More information

Graduate Funding Information Center

Graduate Funding Information Center Graduate Funding Information Center UNC-Chapel Hill, The Graduate School Graduate Student Proposal Sponsor: Program Title: NESCent Graduate Fellowship Department: Biology Funding Type: Fellowship Year:

More information

An Integrated Approach for the Assessment of Chromosomal Abnormalities

An Integrated Approach for the Assessment of Chromosomal Abnormalities An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 26, 2007 Karyotypes Karyotypes General Cytogenetics

More information

Genome Sequencing and Structural Variation (2)

Genome Sequencing and Structural Variation (2) Genome Sequencing and Variation Analysis of matepairs for the identification of variants Institut für Medizinische Genetik und Humangenetik Charité Universitätsmedizin Berlin Genomics: Lecture #11 Today

More information

What happens to the replicated chromosomes? depends on the goal of the division

What happens to the replicated chromosomes? depends on the goal of the division Segregating the replicated chromosomes What happens to the replicated chromosomes? depends on the goal of the division - to make more vegetative cells: mitosis daughter cells chromosome set should be identical

More information

Course: Visual Analytics of largescale biological data. Kay Nieselt Center for Bioinformatics Tübingen University of Tübingen

Course: Visual Analytics of largescale biological data. Kay Nieselt Center for Bioinformatics Tübingen University of Tübingen Course: Visual Analytics of largescale biological data Kay Nieselt Center for Bioinformatics Tübingen University of Tübingen THE SUPERGENOME AND GENOMERING Overview A revolution in genomics Flood of genomes:

More information

CHAPTER 15 LECTURE SLIDES

CHAPTER 15 LECTURE SLIDES CHAPTER 15 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.

More information

Reducing storage requirements for biological sequence comparison

Reducing storage requirements for biological sequence comparison Bioinformatics Advance Access published July 15, 2004 Bioinfor matics Oxford University Press 2004; all rights reserved. Reducing storage requirements for biological sequence comparison Michael Roberts,

More information

List of Code Challenges. About the Textbook Meet the Authors... xix Meet the Development Team... xx Acknowledgments... xxi

List of Code Challenges. About the Textbook Meet the Authors... xix Meet the Development Team... xx Acknowledgments... xxi Contents List of Code Challenges xvii About the Textbook xix Meet the Authors................................... xix Meet the Development Team............................ xx Acknowledgments..................................

More information

A DNA Sequence 2017/12/6 1

A DNA Sequence 2017/12/6 1 A DNA Sequence ccgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatcgtgtgg gtagtagctgatatgatgcgaggtaggggataggatagcaacagatgagc ggatgctgagtgcagtggcatgcgatgtcgatgatagcggtaggtagacttc gcgcataaagctgcgcgagatgattgcaaagragttagatgagctgatgcta

More information

The algorithm of equal acceptance region for detecting copy number alterations: applications to next-generation sequencing data

The algorithm of equal acceptance region for detecting copy number alterations: applications to next-generation sequencing data University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Information Sciences 2011 The algorithm of equal acceptance region for

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

Patterns of Simple Gene Assembly in Ciliates

Patterns of Simple Gene Assembly in Ciliates Patterns of Simple Gene Assembly in Ciliates Tero Harju Department of Mathematics, University of Turku Turku 20014 Finland harju@utu.fi Ion Petre Academy of Finland and Department of Information Technologies

More information

De novo assembly and genotyping of variants using colored de Bruijn graphs

De novo assembly and genotyping of variants using colored de Bruijn graphs De novo assembly and genotyping of variants using colored de Bruijn graphs Iqbal et al. 2012 Kolmogorov Mikhail 2013 Challenges Detecting genetic variants that are highly divergent from a reference Detecting

More information

Supplementary Figure 1. Phenotype of the HI strain.

Supplementary Figure 1. Phenotype of the HI strain. Supplementary Figure 1. Phenotype of the HI strain. (A) Phenotype of the HI and wild type plant after flowering (~1month). Wild type plant is tall with well elongated inflorescence. All four HI plants

More information

Designing and Testing a New DNA Fragment Assembler VEDA-2

Designing and Testing a New DNA Fragment Assembler VEDA-2 Designing and Testing a New DNA Fragment Assembler VEDA-2 Mark K. Goldberg Darren T. Lim Rensselaer Polytechnic Institute Computer Science Department {goldberg, limd}@cs.rpi.edu Abstract We present VEDA-2,

More information

An Integrated Approach for the Assessment of Chromosomal Abnormalities

An Integrated Approach for the Assessment of Chromosomal Abnormalities An Integrated Approach for the Assessment of Chromosomal Abnormalities Department of Biostatistics Johns Hopkins Bloomberg School of Public Health June 6, 2007 Karyotypes Mitosis and Meiosis Meiosis Meiosis

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Describe the process of cell division in prokaryotic cells. The Cell Cycle

Describe the process of cell division in prokaryotic cells. The Cell Cycle The Cell Cycle Objective # 1 In this topic we will examine the cell cycle, the series of changes that a cell goes through from one division to the next. We will pay particular attention to how the genetic

More information

Introduction to Sequence Alignment. Manpreet S. Katari

Introduction to Sequence Alignment. Manpreet S. Katari Introduction to Sequence Alignment Manpreet S. Katari 1 Outline 1. Global vs. local approaches to aligning sequences 1. Dot Plots 2. BLAST 1. Dynamic Programming 3. Hash Tables 1. BLAT 4. BWT (Burrow Wheeler

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

Chapter 11 Chromosome Mutations. Changes in chromosome number Chromosomal rearrangements Evolution of genomes

Chapter 11 Chromosome Mutations. Changes in chromosome number Chromosomal rearrangements Evolution of genomes Chapter 11 Chromosome Mutations Changes in chromosome number Chromosomal rearrangements Evolution of genomes Aberrant chromosome constitutions of a normally diploid organism Name Designation Constitution

More information

Chapter 9 Sexual Reproduction and Meiosis
