Inferring positional homologs with common intervals of sequences
Guillaume Blin, Annie Chateau, Cedric Chauve, Yannick Gingras
CGL - Université du Québec à Montréal
Université de Marne la Vallée
Séminaire de Bioinformatique BIF

Outline
1 Ortholog assignment: Definitions, Importance, Automated Inference of Orthologs
2 Our approach: Common Intervals, Matching Extraction, Demo
3 Results: Human and Mouse, γ-proteobacteria
4 Conclusion: Discussion

1 Ortholog assignment

Definitions: orthologs, paralogs and homologs
[Figure; source:]

Why infer orthology?
- Inference of gene functions
- Phylogenomics: which copy of a gene do we compare?
- Gene order: nice to have a permutation
- etc.
[Figure; source: Wikipedia]

Mechanisms of evolution
- Duplications
- Mutations
- Losses
- Rearrangements
= Variable gene content

Assignment of orthologs: the easy part
- Make pairs with putative orthologs
- Single copy genes are paired together
- What about gene number 4?

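The easy part can be sketched in a few lines. This is a minimal illustration, assuming each genome is given as a list of gene family identifiers in chromosomal order; the function name `pair_single_copy` and the example genomes are hypothetical and not taken from the talk.

```python
from collections import Counter


def pair_single_copy(genome_a, genome_b):
    """Pair positions of gene families that occur exactly once in each genome.

    Genomes are lists of family identifiers (an assumed representation).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    count_a, count_b = Counter(genome_a), Counter(genome_b)
    # Last position of each family in genome_b; only read for single-copy families.
    position_b = {family: j for j, family in enumerate(genome_b)}
    return [
        (i, position_b[family])
        for i, family in enumerate(genome_a)
        if count_a[family] == 1 and count_b[family] == 1
    ]


# Family 4 has two copies in genome_b and family 5 is missing from it,
# so neither is paired by this rule.
genome_a = [1, 2, 3, 4, 5]
genome_b = [3, 4, 1, 4, 2]
pairs = pair_single_copy(genome_a, genome_b)
```

Families left unpaired here, such as family 4 in the example, are the ones the rest of the talk is concerned with.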
Harder: multiple copies
- Which solution is the best?

Mu
A monk once asked master Zhao Zhou, "Does a dog have Buddha-nature or not?" Zhao Zhou said, "Mu."

Exemplar Method
- Keep only one copy of each gene
- Choose the copy that optimizes a metric or a criterion
- Sankoff (1999): Minimize the breakpoint/reversal distance
- Bourque, Yacef, El-Mabrouk (2005): Maximize common/conserved intervals
- This is the Right Thing

Exemplar Method: the Right Thing is often hard to do
- Minimize the breakpoint/reversal distance: NP-Hard (Bryant 2000)
- Maximize common/conserved intervals: NP-Hard (Blin et al 2005, 2006)

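To make the exemplar idea concrete, here is a brute-force sketch that tries every way of keeping one copy per gene family and returns the smallest breakpoint count. It is only an illustration under simplifying assumptions: genes are unsigned, both genomes contain the same families, and all function names are hypothetical. The enumeration is exponential in the number of duplicated families, which is in line with the hardness results cited above.

```python
from itertools import product


def breakpoints(p, q):
    """Count adjacencies of p that are not adjacencies of q (unsigned genes)."""
    adjacencies_q = {frozenset(pair) for pair in zip(q, q[1:])}
    return sum(frozenset(pair) not in adjacencies_q for pair in zip(p, p[1:]))


def exemplars(genome):
    """Yield every genome obtained by keeping exactly one copy of each family."""
    positions = {}
    for i, family in enumerate(genome):
        positions.setdefault(family, []).append(i)
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]


def exemplar_breakpoint_distance(genome_a, genome_b):
    """Minimum breakpoint count over all pairs of exemplar genomes."""
    return min(
        breakpoints(p, q)
        for p in exemplars(genome_a)
        for q in exemplars(genome_b)
    )


# Keeping the first copy of family 1 gives [1, 2, 3], identical to the
# second genome, so no breakpoint remains.
d_zero = exemplar_breakpoint_distance([1, 2, 1, 3], [1, 2, 3])
# Both exemplars of [2, 1, 3, 1] break one adjacency of [1, 2, 3].
d_one = exemplar_breakpoint_distance([2, 1, 3, 1], [1, 2, 3])
```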
General Matching
- Keep one or more copies of each gene
- New pairs are separated from their family
- Minimize the number of rearrangements
- Uses breakpoint graph analysis
- Chen et al (2005): minimize reversal distance
- Fu et al (2006): extension to translocations
- Swenson et al (2005): maximize the number of cycles
- General problem is NP-Hard

Longest Common Substrings
- Find long conserved words (colinear segments, conserved segments)
- Greedy version: always make a complete assignment in the longest unmatched conserved word
- Swenson et al (2005)
- Blin et al (2005)
- Used in Chen et al (2005)

Longest Common Substrings: drawbacks
- Generates lots of false positives
- Misses local rearrangements

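The greedy version can be sketched as follows. This is a simplified illustration, not the implementation of the cited papers: genes are unsigned, reversed occurrences of a word are ignored, already matched positions simply act as barriers, and the function names are hypothetical.

```python
def longest_common_substring(a, b, used_a, used_b):
    """Return (length, end_a, end_b) of the longest word common to a and b
    that only uses unmatched positions. Ends are exclusive indices."""
    best = (0, 0, 0)
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1] and not used_a[i - 1] and not used_b[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best[0]:
                    best = (current[j], i, j)
        previous = current
    return best


def greedy_lcs_matching(a, b):
    """Repeatedly match every gene of the longest unmatched conserved word."""
    used_a, used_b = [False] * len(a), [False] * len(b)
    pairs = []
    while True:
        length, end_a, end_b = longest_common_substring(a, b, used_a, used_b)
        if length == 0:
            break
        for k in range(length):
            i, j = end_a - length + k, end_b - length + k
            used_a[i] = used_b[j] = True
            pairs.append((i, j))
    return sorted(pairs)


# The word 1 2 3 is matched first, then the remaining word 1 4.
matching = greedy_lcs_matching([1, 2, 3, 1, 4], [1, 4, 1, 2, 3])
```

Because a word must appear in the same order in both genomes, a small local shuffle such as 1 2 3 against 2 1 3 breaks the word into pieces, which is one way to read the second drawback listed above.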
2 Our approach

Our Approach: The Idea
"Orthologous gene copies are more likely to share the same genome positions and share the same gene neighbors."
Burgetz et al, Positional homology in bacterial genomes

Common Intervals
- Sets of genes with a contiguous occurrence on each genome
- {1, 2, 3, 4} is a common interval, {1, 3} is not
- Easy to detect: O(n^2) (Schmidt and Stoye 2004)
- Capture local rearrangements

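A direct check of the definition can be written as below. This is a naive sketch for illustration and not the O(n^2) algorithm of Schmidt and Stoye; the function names and the two example genomes are made up, chosen so that {1, 2, 3, 4} is a common interval and {1, 3} is not.

```python
def occurrences(genes, genome):
    """Return all (start, end) spans of genome, inclusive, whose gene content
    is exactly the set `genes`. Genomes may contain duplicated genes."""
    target = set(genes)
    spans = []
    for start in range(len(genome)):
        seen = set()
        for end in range(start, len(genome)):
            if genome[end] not in target:
                break  # a foreign gene interrupts the contiguous occurrence
            seen.add(genome[end])
            if seen == target:
                spans.append((start, end))
    return spans


def is_common_interval(genes, genomes):
    """True if the gene set has a contiguous occurrence on every genome."""
    return all(occurrences(genes, genome) for genome in genomes)


genome_1 = [1, 2, 3, 4, 5]
genome_2 = [5, 4, 2, 1, 3]
yes = is_common_interval({1, 2, 3, 4}, [genome_1, genome_2])
no = is_common_interval({1, 3}, [genome_1, genome_2])
```

Gene order inside an occurrence is free, which is why the shuffled block 4 2 1 3 still counts.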
Our Approach: Overview
- A common interval is a bunch of genes that stick together
- That's a good place to start a matching

Common Intervals: Box Representation
[Figure]

Common Intervals: Incompatibility
A common interval occurrence is incompatible with another one if we can't assign all the genes in both at the same time without conflicts

Common Intervals: Incompatible Boxes
Two boxes are incompatible if they can hit each other by a vertical or horizontal translation

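One way to read the translation rule is as a test on projections: two boxes can hit each other by a vertical or horizontal move exactly when their ranges overlap on one of the two axes. The sketch below assumes a box is a tuple `(x0, x1, y0, y1)` of inclusive positions, with x along the first genome and y along the second; this encoding and the function names are hypothetical. Note that a box nested inside another is also flagged by this test, and the slides do not say how that case is treated.

```python
def ranges_overlap(lo1, hi1, lo2, hi2):
    """True if the inclusive ranges [lo1, hi1] and [lo2, hi2] intersect."""
    return lo1 <= hi2 and lo2 <= hi1


def incompatible(box_a, box_b):
    """True if the boxes share positions on either genome axis.

    A box is (x0, x1, y0, y1) with inclusive bounds (assumed encoding).
    """
    ax0, ax1, ay0, ay1 = box_a
    bx0, bx1, by0, by1 = box_b
    return (ranges_overlap(ax0, ax1, bx0, bx1)
            or ranges_overlap(ay0, ay1, by0, by1))


# Shares positions 2..3 of the first genome with the big box.
clash = incompatible((0, 3, 0, 3), (2, 5, 4, 6))
# Disjoint on both axes, so both occurrences can be matched together.
fine = incompatible((0, 3, 0, 3), (5, 6, 5, 6))
```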
Our approach
Repeat as long as there is an unmatched common interval:
- Pick L, the largest box
- Filter out boxes incompatible with L
- Recurse on L

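The loop above can be sketched as a box selection procedure. Several choices here are assumptions rather than details from the talk: boxes are tuples `(x0, x1, y0, y1)` with inclusive bounds, "largest" is measured by area, "recurse on L" is read as applying the same procedure to the boxes nested inside L, and the function names are invented. The sketch only selects boxes; turning the selected boxes into an actual gene-to-gene matching is not shown.

```python
def area(box):
    """Number of cells covered by an inclusive box (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = box
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def nested(inner, outer):
    """True if inner lies entirely within outer on both axes."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and outer[2] <= inner[2] and inner[3] <= outer[3])


def shares_projection(a, b):
    """True if the boxes overlap on the x axis or on the y axis."""
    return (a[0] <= b[1] and b[0] <= a[1]) or (a[2] <= b[3] and b[2] <= a[3])


def select_boxes(boxes):
    """Greedy selection: take the largest box, recurse inside it,
    drop everything else that conflicts with it, and repeat."""
    chosen = []
    remaining = list(boxes)
    while remaining:
        largest = max(remaining, key=area)
        chosen.append(largest)
        others = [b for b in remaining if b != largest]
        inside = [b for b in others if nested(b, largest)]
        chosen.extend(select_boxes(inside))
        remaining = [b for b in others if not shares_projection(b, largest)]
    return chosen


boxes = [(0, 3, 0, 3), (1, 2, 1, 2), (2, 5, 4, 6), (5, 6, 5, 6)]
selected = select_boxes(boxes)
```

In the example, the box `(2, 5, 4, 6)` is discarded because it shares positions 2..3 of the first genome with the largest box without being nested in it.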
Demo
Let's try it!

3 Results

Results: Human and mouse ortholog assignment
- Assignment on whole genomes
- Homologs identified with MSOAR hit graph and clustering

                     MSOAR   LCS   Common Intervals
Matched pairs
True positives
False positives
% True positives     70%     69%   69%
% False positives    17%     18%   17%

- MSOAR: General Matching (Z. Fu, X. Chen, V. Vacic, P. Nan, Y. Zhong, T. Jiang, 2006)
- LCS: Longest Common Substrings (G. Blin, C. Chauve, G. Fertin, 2005)
- True positives: genes with the same UniProt name
- MSOAR is the most accurate
- Results are comparable
- Looking for large conserved structures is a valid approach

Results: assignment on bacterial genomes
Assignment of orthologs on 8 γ-proteobacteria:
- Buchnera aphidicola APS
- Escherichia coli K12
- Haemophilus influenzae Rd
- Pasteurella multocida Pm70
- Pseudomonas aeruginosa PAO1
- Salmonella typhimurium LT2
- Xylella fastidiosa 9a5c
- Yersinia pestis CO92
Homologs identified by BLAST and clustering
All 28 pairwise matchings

Consistent Components
- Gene matching defines a graph on the 8 genomes
- A connected component is consistent if it contains at most one gene in each genome
- A perfect component is a consistent component that contains only true positives

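These definitions translate directly into a small graph routine. The sketch below assumes a gene is identified by a `(genome, gene_id)` tuple and that reference names are available in a dictionary; the identifiers, the `names` mapping and the function names are all hypothetical. Since a component is connected, "only true positives" (every matched pair shares a name) is equivalent to all genes of the component sharing one name, which is what `is_perfect` checks.

```python
from collections import defaultdict


def connected_components(pairs):
    """Group genes linked, directly or transitively, by matched pairs."""
    neighbours = defaultdict(set)
    for u, v in pairs:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for node in neighbours:
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            component.add(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    return components


def is_consistent(component):
    """At most one gene per genome."""
    genomes = [genome for genome, _ in component]
    return len(genomes) == len(set(genomes))


def is_perfect(component, names):
    """Consistent, and every gene carries the same reference name."""
    return is_consistent(component) and len({names[g] for g in component}) == 1


pairs = [
    (("eco", "a1"), ("sty", "a1")), (("sty", "a1"), ("ype", "a1")),
    # The pairwise matchings chain two different E. coli genes together.
    (("eco", "b1"), ("sty", "b1")), (("sty", "b1"), ("ype", "b1")),
    (("ype", "b1"), ("eco", "b2")),
]
names = {gene: gene[1][0] for pair in pairs for gene in pair}  # toy names
components = connected_components(pairs)
flags = [is_consistent(c) for c in components]
```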
Results: assignment on bacterial genomes

                     LCS   Common Intervals
True positives
False positives
% True positives     86%   85%
Components
Consistent
% Consistent         85%   88%
TP in a CC
Perfect Comp.
% Perfect Comp.      53%   52%

- LCS: Longest Common Substrings (G. Blin, C. Chauve, G. Fertin, 2005)
- True positives: genes with the same UniProt name
- LCS is more accurate but less consistent

Note on low complexity
Gene content vs area

Results: assignment on bacterial genomes (with filtering)

                     LCS   CI    Filtered LCS   Filtered CI
True positives
False positives
% True positives     86%   85%   93%            90%
Components
Consistent
% Consistent         85%   88%   98%            97%
TP in a CC
Perfect Comp.
% Perfect Comp.      53%   52%   43%            58%

- Filter with side 3
- Consistency increases with filtering
- LCS has a lower perfect component ratio with filtering
- Common intervals have a higher perfect component ratio with filtering
- There are 263 perfect components of size 8 with filtering

4 Conclusion

Conclusion
- The Right Thing is hard
- Simple heuristics can't handle natural shuffling
- Segments with similar gene content are likely to be related
- Common intervals are an efficient technique to locate them
- Future work: handle gaps, smart filtering, etc.

Discussion
Note to self: stop here unless you have extra time!

Cigal: The program (1/3, 2/3, 3/3)
[Figures]

Human and mouse: with filtering
- Filtered vs raw results
- Minimum side of length 3

                     Common Intervals   Filtered Common Intervals
Matched pairs
True positives
% True positives     69%                71%
False positives
% False positives    17%                16%

                     LCS                Filtered LCS
Matched pairs
True positives
% True positives     69%                72%
False positives
% False positives    18%                15%
Molecular Biology-2018 1 Definitions: RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Heterologues: Genes or proteins that possess different sequences and activities. Homologues: Genes or proteins that
More informationInferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT
Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions
More informationSession 5: Phylogenomics
Session 5: Phylogenomics B.- Phylogeny based orthology assignment REMINDER: Gene tree reconstruction is divided in three steps: homology search, multiple sequence alignment and model selection plus tree
More informationComputational methods for predicting protein-protein interactions
Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational
More informationHandling Rearrangements in DNA Sequence Alignment
Handling Rearrangements in DNA Sequence Alignment Maneesh Bhand 12/5/10 1 Introduction Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome
More informationTests for gene clustering
Tests for gene clustering Dannie Durand Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA email: durand@cmu.edu David Sankoff Department of Mathematics and Statistics,
More informationA PROTEOMIC APPROACH FOR IDENTIFICATION OF BACTERIA USING TANDEM MASS SPECTROMETRY COMBINED WITH A TRANSLATOME DATABASE AND STATISTICAL SCORING
A PROTEOMIC APPROACH FOR IDENTIFICATION OF BACTERIA USING TANDEM MASS SPECTROMETRY COMBINED WITH A TRANSLATOME DATABASE AND STATISTICAL SCORING Jacek P. Dworzanski Geo-Centers, Inc., Aberdeen Proving Ground,
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationComparative Network Analysis
Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by
More information1 ATGGGTCTC 2 ATGAGTCTC
We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that
More informationHomology. and. Information Gathering and Domain Annotation for Proteins
Homology and Information Gathering and Domain Annotation for Proteins Outline WHAT IS HOMOLOGY? HOW TO GATHER KNOWN PROTEIN INFORMATION? HOW TO ANNOTATE PROTEIN DOMAINS? EXAMPLES AND EXERCISES Homology
More informationFigure S1. Pangenome plots of ten recombining bacterial species based on RAST annotated
Figure S1 Figure S2 Supplementary Figure legends Figure S1. Pangenome plots of ten recombining bacterial species based on RAST annotated genomes. To generate these plots all strains within a species were
More informationPhylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)
Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction Lesser Tenrec (Echinops telfairi) Goals: 1. Use phylogenetic experimental design theory to select optimal taxa to
More informationC3020 Molecular Evolution. Exercises #3: Phylogenetics
C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from
More informationSupplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss
Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Methods Identification of orthologues, alignment and evolutionary distances A preliminary set of orthologues was
More informationGenome Rearrangement Problems with Single and Multiple Gene Copies: A Review
Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review Ron Zeira and Ron Shamir June 27, 2018 Dedicated to Bernard Moret upon his retirement. Abstract Problems of genome rearrangement
More informationarxiv: v2 [cs.ds] 2 Dec 2013
arxiv:1305.4747v2 [cs.ds] 2 Dec 2013 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 Easy identification of generalized common and conserved nested intervals Fabien de Montgolfier
More informationNon-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore
Non-binary Tree Reconciliation Louxin Zhang Department of Mathematics National University of Singapore matzlx@nus.edu.sg Introduction: Gene Duplication Inference Consider a duplication gene family G Species
More informationNetwork Alignment 858L
Network Alignment 858L Terms & Questions A homologous h Interolog = B h Species 1 Species 2 Are there conserved pathways? What is the minimum set of pathways required for life? Can we compare networks
More informationA Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes
A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes Cedric Chauve 1, Eric Tannier 2,3,4,5 * 1 Department of Mathematics,
More informationIntroduction to protein alignments
Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare
More informationPhylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.
Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class
More informationSequence Alignment (chapter 6)
Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:
More informationGenômica comparativa. João Carlos Setubal IQ-USP outubro /5/2012 J. C. Setubal
Genômica comparativa João Carlos Setubal IQ-USP outubro 2012 11/5/2012 J. C. Setubal 1 Comparative genomics There are currently (out/2012) 2,230 completed sequenced microbial genomes publicly available
More informationIn order to compare the proteins of the phylogenomic matrix, we needed a similarity
Similarity Matrix Generation In order to compare the proteins of the phylogenomic matrix, we needed a similarity measure. Hamming distances between phylogenetic profiles require the use of thresholds for
More informationGenome Rearrangement Problems with Single and Multiple Gene Copies: A Review
Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review Ron Zeira and Ron Shamir August 9, 2018 Dedicated to Bernard Moret upon his retirement. Abstract Genome rearrangement problems
More informationBioinformatics tools for phylogeny and visualization. Yanbin Yin
Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and
More informationHorizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli
Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli Morgan N Price 1,2, Paramvir S Dehal 1,2 and Adam P Arkin 1,2,3 1 Physical Biosciences Division, Lawrence Berkeley
More informationFrom Phylogenetics to Phylogenomics: The Evolutionary Relationships of Insect Endosymbiotic γ-proteobacteria as a Test Case
Syst. Biol. 56(1):1 16, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150601109759 From Phylogenetics to Phylogenomics: The Evolutionary Relationships
More informationBiol478/ August
Biol478/595 29 August # Day Inst. Topic Hwk Reading August 1 M 25 MG Introduction 2 W 27 MG Sequences and Evolution Handouts 3 F 29 MG Sequences and Evolution September M 1 Labor Day 4 W 3 MG Database
More informationBioinformatics: Network Analysis
Bioinformatics: Network Analysis Comparative Network Analysis COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Biomolecular Network Components 2 Accumulation of Network Components
More informationBasic Local Alignment Search Tool
Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses
More informationGenomes Comparision via de Bruijn graphs
Genomes Comparision via de Bruijn graphs Student: Ilya Minkin Advisor: Son Pham St. Petersburg Academic University June 4, 2012 1 / 19 Synteny Blocks: Algorithmic challenge Suppose that we are given two
More informationarxiv: v1 [cs.ds] 21 May 2013
Easy identification of generalized common nested intervals Fabien de Montgolfier 1, Mathieu Raffinot 1, and Irena Rusu 2 arxiv:1305.4747v1 [cs.ds] 21 May 2013 1 LIAFA, Univ. Paris Diderot - Paris 7, 75205
More informationEvolutionary Analysis by Whole-Genome Comparisons
JOURNAL OF BACTERIOLOGY, Apr. 2002, p. 2260 2272 Vol. 184, No. 8 0021-9193/02/$04.00 0 DOI: 184.8.2260 2272.2002 Copyright 2002, American Society for Microbiology. All Rights Reserved. Evolutionary Analysis
More informationPhylogenetic Networks with Recombination
Phylogenetic Networks with Recombination October 17 2012 Recombination All DNA is recombinant DNA... [The] natural process of recombination and mutation have acted throughout evolution... Genetic exchange
More informationGraphs, permutations and sets in genome rearrangement
ntroduction Graphs, permutations and sets in genome rearrangement 1 alabarre@ulb.ac.be Universite Libre de Bruxelles February 6, 2006 Computers in Scientic Discovery 1 Funded by the \Fonds pour la Formation
More information1 Introduction. Abstract
CBS 530 Assignment No 2 SHUBHRA GUPTA shubhg@asu.edu 993755974 Review of the papers: Construction and Analysis of a Human-Chimpanzee Comparative Clone Map and Intra- and Interspecific Variation in Primate
More informationX X (2) X Pr(X = x θ) (3)
Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationDid you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden
Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1 Results Multiple Alignment with SP-score Star Alignment Tree Alignment (with given phylogeny) are NP-hard
More informationPairwise & Multiple sequence alignments
Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived
More informationLinear-Space Alignment
Linear-Space Alignment Subsequences and Substrings Definition A string x is a substring of a string x, if x = ux v for some prefix string u and suffix string v (similarly, x = x i x j, for some 1 i j x
More informationGenetic Basis of Variation in Bacteria
Mechanisms of Infectious Disease Fall 2009 Genetics I Jonathan Dworkin, PhD Department of Microbiology jonathan.dworkin@columbia.edu Genetic Basis of Variation in Bacteria I. Organization of genetic material
More information