Inferring positional homologs with common intervals of sequences

Guillaume Blin, Annie Chateau, Cedric Chauve, Yannick Gingras
CGL - Université du Québec à Montréal; Université de Marne la Vallée
Séminaire de Bioinformatique BIF

Outline
1 Ortholog assignment: Definitions; Importance; Automated inference of orthologs
2 Our approach: Common intervals; Matching extraction; Demo
3 Results: Human and mouse; γ-proteobacteria
4 Conclusion: Discussion

1 Ortholog assignment

Definitions: orthologs, paralogs and homologs
[Figure: orthologs, paralogs and homologs]
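Under the standard definitions, two homologous genes are orthologs when their last common ancestor in the gene tree is a speciation, and paralogs when it is a duplication. The Python sketch below applies that rule to a hypothetical, already reconciled gene tree; the tree, its node names and the helpers `ancestors`, `lca` and `relation` are illustrative assumptions, not part of the talk.

```python
# Hypothetical toy gene tree: an ancestral duplication "d" creates two copies,
# then speciations "s1" and "s2" split each copy between species A and B.
PARENT = {"A1": "s1", "B1": "s1", "A2": "s2", "B2": "s2", "s1": "d", "s2": "d"}
EVENT = {"s1": "speciation", "s2": "speciation", "d": "duplication"}


def ancestors(node):
    """Return the path from node up to the root, node included."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lca(x, y):
    """Lowest common ancestor of two nodes of the gene tree."""
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def relation(x, y):
    """Orthologs if the two genes diverged by speciation, paralogs if by duplication."""
    return "orthologs" if EVENT[lca(x, y)] == "speciation" else "paralogs"


print(relation("A1", "B1"))  # orthologs
print(relation("A1", "B2"))  # paralogs
```

Note that A1 and B2 live in different species yet are paralogs, because they descend from different copies of the duplicated gene.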

Why infer orthology?
- Inference of gene functions
- Phylogenomics: which copy of a gene do we compare?
- Gene order: nice to have a permutation
- etc.
(source: Wikipedia)

Mechanisms of evolution
- Duplications
- Mutations
- Losses
- Rearrangements
These result in variable gene content.

Assignment of orthologs: the easy part
- Make pairs of putative orthologs
- Single-copy genes are paired together

What about gene number 4?
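The easy part can be sketched in a few lines of Python, assuming genes are already clustered into families and each genome is given as a list of family labels; the function name `pair_single_copy` and the toy genomes are illustrative. Family 4 has two copies in genome A, so it is left unassigned, which is exactly the question raised above.

```python
from collections import Counter


def pair_single_copy(genome_a, genome_b):
    """Pair every gene family present exactly once in each genome.

    Returns a dict mapping each single-copy family to its
    (position in A, position in B).
    """
    count_a, count_b = Counter(genome_a), Counter(genome_b)
    return {
        gene: (genome_a.index(gene), genome_b.index(gene))
        for gene in count_a
        if count_a[gene] == 1 and count_b[gene] == 1  # missing keys count as 0
    }


# Hypothetical genomes: family 4 has two copies in genome A.
a = [1, 2, 4, 3, 4, 5]
b = [2, 1, 3, 4, 5]
print(pair_single_copy(a, b))  # {1: (0, 1), 2: (1, 0), 3: (3, 2), 5: (5, 4)}
```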

Harder: multiple copies
[Figure: several possible assignments between multiple gene copies]
Which solution is the best?

Mu
A monk once asked master Zhao Zhou, "Does a dog have Buddha-nature or not?" Zhao Zhou said, "Mu."

Exemplar method
- Keep only one copy of each gene
- Choose the copy that optimizes a metric or a criterion
- Sankoff (1999): minimize the breakpoint/reversal distance
- Bourque, Yacef, El-Mabrouk (2005): maximize common/conserved intervals
This is the Right Thing.
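A brute-force Python sketch of the exemplar idea, assuming unsigned genomes over the same gene families and using the number of breakpoints as the criterion (a simplification of Sankoff's signed setting). It enumerates every way of keeping one copy per family, so it is only usable on tiny inputs; the helper names are illustrative.

```python
from collections import defaultdict
from itertools import product


def exemplars(genome):
    """Yield every subsequence that keeps exactly one copy of each gene family."""
    positions = defaultdict(list)
    for i, gene in enumerate(genome):
        positions[gene].append(i)
    for choice in product(*positions.values()):
        keep = set(choice)
        yield [gene for i, gene in enumerate(genome) if i in keep]


def breakpoints(a, b):
    """Unsigned breakpoints: adjacencies of a that are not adjacencies of b."""
    adjacent = set()
    for x, y in zip(b, b[1:]):
        adjacent.update({(x, y), (y, x)})
    return sum((x, y) not in adjacent for x, y in zip(a, a[1:]))


def best_exemplar(a, b):
    """Try every pair of exemplars (exponential in the number of copies)."""
    return min(
        ((ea, eb) for ea in exemplars(a) for eb in exemplars(b)),
        key=lambda pair: breakpoints(*pair),
    )


ea, eb = best_exemplar([1, 2, 3, 2, 4], [1, 2, 3, 4])
print(ea, eb, breakpoints(ea, eb))  # [1, 2, 3, 4] [1, 2, 3, 4] 0
```

Keeping the first copy of gene 2 gives zero breakpoints, while keeping the second one gives [1, 3, 2, 4] with two.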

Exemplar method
The Right Thing is often hard to do:
- Minimizing the breakpoint/reversal distance: NP-hard (Bryant 2000)
- Maximizing common/conserved intervals: NP-hard (Blin et al. 2005, 2006)

General matching
- Keep one or more copies of each gene
- New pairs are separated from their family
- Minimize the number of rearrangements, using breakpoint graph analysis
- Chen et al. (2005): minimize the reversal distance
- Fu et al. (2006): extension to translocations
- Swenson et al. (2005): maximize the number of cycles
- The general problem is NP-hard
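Separating each matched pair from its family turns both genomes into permutations, so classical rearrangement measures apply. The Python sketch below shows that relabeling step, assuming unsigned genomes and a matching given as (position in A, position in B) pairs; it uses the unsigned breakpoint count only as a simple stand-in for the reversal and translocation distances used by Chen et al. and Fu et al. Function names and the example matching are hypothetical.

```python
def matching_to_permutation(pairs):
    """Relabel matched pairs (i, j) so that genome B reads 1..n.

    Each pair becomes its own gene, separated from its family; unmatched
    genes are dropped. Returns genome A as a permutation of 1..n.
    """
    by_b = sorted(pairs, key=lambda p: p[1])
    label = {i: rank for rank, (i, _) in enumerate(by_b, start=1)}
    return [label[i] for i in sorted(label)]


def breakpoints(perm):
    """Unsigned breakpoints against the identity, framed by 0 and n + 1."""
    framed = [0] + perm + [len(perm) + 1]
    return sum(abs(x - y) != 1 for x, y in zip(framed, framed[1:]))


# Hypothetical matching between positions of genome A and genome B.
perm = matching_to_permutation([(0, 1), (1, 0), (2, 2), (3, 3)])
print(perm, breakpoints(perm))  # [2, 1, 3, 4] 2
```

Different matchings of the same multi-copy genomes give different permutations, which is why the choice of matching is framed as an optimization over rearrangement scores.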

Longest Common Substrings
- Find long conserved words (colinear segments, conserved segments)
- Greedy version: always make a complete assignment in the longest unmatched conserved word
- Swenson et al. (2005); Blin et al. (2005); used in Chen et al. (2005)
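A Python sketch of the greedy version, assuming unsigned genomes and considering only words read in the same orientation in both genomes (reversed words are ignored here, unlike a fuller implementation). Each round recomputes, with the classic longest-common-substring dynamic program, the longest word made only of unmatched positions, then matches it completely. Function names are illustrative.

```python
def longest_unmatched_word(a, b, used_a, used_b):
    """Longest common substring of a and b using only unmatched positions.

    Returns (length, end_a, end_b) with exclusive end indices.
    """
    best = (0, 0, 0)
    run = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1] and i - 1 not in used_a and j - 1 not in used_b:
                run[i][j] = run[i - 1][j - 1] + 1
                if run[i][j] > best[0]:
                    best = (run[i][j], i, j)
    return best


def greedy_lcs_matching(a, b):
    """Repeatedly match the longest unmatched conserved word, position by position."""
    used_a, used_b, pairs = set(), set(), []
    while True:
        length, end_a, end_b = longest_unmatched_word(a, b, used_a, used_b)
        if length == 0:
            break
        for k in range(length):
            pa, pb = end_a - length + k, end_b - length + k
            used_a.add(pa)
            used_b.add(pb)
            pairs.append((pa, pb))
    return sorted(pairs)


print(greedy_lcs_matching([1, 2, 3, 4, 2], [4, 1, 2, 3, 2]))
# [(0, 1), (1, 2), (2, 3), (3, 0), (4, 4)]
```

The word 1 2 3 is matched first; the leftover copies of 4 and 2 are then matched as words of length one, whatever their neighborhood.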

Longest Common Substrings: drawbacks
- Generates many false positives
- Misses local rearrangements

2 Our approach

Our approach: the idea
Orthologous gene copies are more likely to share the same genome positions and the same gene neighbors.
Burgetz et al., Positional homology in bacterial genomes

Common intervals
- Sets of genes with a contiguous occurrence in each genome
- {1, 2, 3, 4} is a common interval; {1, 3} is not
- Easy to detect: O(n²) (Schmidt and Stoye 2004)
- Capture local rearrangements
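To make the definition concrete, here is a naive Python enumeration of common intervals of two gene sequences. It is a sketch, not the quadratic Schmidt and Stoye algorithm: it builds a set for every interval of each genome, which costs O(n³) time. The two toy genomes are hypothetical, chosen so that {1, 2, 3, 4} is a common interval and {1, 3} is not.

```python
def interval_gene_sets(genome):
    """Map each gene set to the (start, end) intervals of genome that contain exactly it."""
    occurrences = {}
    for i in range(len(genome)):
        genes = set()
        for j in range(i, len(genome)):
            genes.add(genome[j])
            occurrences.setdefault(frozenset(genes), []).append((i, j))
    return occurrences


def common_intervals(a, b):
    """Gene sets with a contiguous occurrence in both genomes, with their occurrences."""
    occ_a, occ_b = interval_gene_sets(a), interval_gene_sets(b)
    return {genes: (occ_a[genes], occ_b[genes]) for genes in occ_a if genes in occ_b}


ci = common_intervals([1, 2, 3, 4, 5], [3, 1, 4, 2, 5])
print(ci[frozenset({1, 2, 3, 4})])  # ([(0, 3)], [(0, 3)])
print(frozenset({1, 3}) in ci)  # False
```

Pairing one occurrence in A with one occurrence in B gives one box of the box representation.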

Our approach: overview
A common interval is a bunch of genes that stick together.
That's a good place to start a matching.

Common intervals: box representation
[Figure: common interval occurrences drawn as boxes, genome A on one axis and genome B on the other]

Common intervals: incompatibility
A common interval occurrence is incompatible with another one if we can't assign all the genes in both at the same time without conflicts.

Common intervals: incompatible boxes
Two boxes are incompatible if they can hit each other by a vertical or horizontal translation.
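In the box representation, an occurrence covering positions a1..a2 of genome A and b1..b2 of genome B is the rectangle [a1, a2] × [b1, b2]. A horizontal translation brings two boxes into contact exactly when their genome-B extents overlap, and a vertical one when their genome-A extents overlap, so the criterion above reduces to an interval-overlap test. The Python sketch below assumes inclusive integer positions; treating nested boxes as compatible is our reading, since the extraction step recurses inside the chosen box.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """One common-interval occurrence: positions a1..a2 in genome A, b1..b2 in genome B."""
    a1: int
    a2: int
    b1: int
    b2: int


def overlap(lo1, hi1, lo2, hi2):
    """True if the closed intervals [lo1, hi1] and [lo2, hi2] intersect."""
    return lo1 <= hi2 and lo2 <= hi1


def nested(p, q):
    """True if q lies inside p on both axes."""
    return p.a1 <= q.a1 and q.a2 <= p.a2 and p.b1 <= q.b1 and q.b2 <= p.b2


def incompatible(p, q):
    """Boxes collide under a horizontal or vertical translation iff their
    projections overlap on one genome axis; nested boxes are not flagged."""
    if nested(p, q) or nested(q, p):
        return False
    return overlap(p.a1, p.a2, q.a1, q.a2) or overlap(p.b1, p.b2, q.b1, q.b2)


big = Box(0, 2, 0, 2)
print(incompatible(big, Box(0, 0, 4, 4)))  # True: shares genome-A positions
print(incompatible(big, Box(3, 3, 3, 3)))  # False: disjoint on both axes
print(incompatible(big, Box(1, 1, 0, 0)))  # False: nested inside big
```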

Our approach
Repeat as long as there is an unmatched common interval:
- Pick L, the largest box
- Filter out boxes incompatible with L
- Recurse on L
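The loop above can be sketched in Python as follows. Assumptions, all ours: boxes are (a1, a2, b1, b2) tuples with inclusive positions; "largest" means largest area; the input lists every common-interval occurrence, including 1×1 boxes for single genes; and a chosen 1×1 box is reported as a matched pair of positions. Boxes whose projections overlap L on either genome axis, without lying inside L, are dropped; boxes inside L are handed to the recursive call. The box list in the example is hand-made and hypothetical.

```python
def area(box):
    a1, a2, b1, b2 = box
    return (a2 - a1 + 1) * (b2 - b1 + 1)


def inside(outer, box):
    """True if box lies within outer on both genome axes."""
    return (outer[0] <= box[0] and box[1] <= outer[1]
            and outer[2] <= box[2] and box[3] <= outer[3])


def clashes(outer, box):
    """Projections overlap on genome A or on genome B."""
    return ((box[0] <= outer[1] and outer[0] <= box[1])
            or (box[2] <= outer[3] and outer[2] <= box[3]))


def extract(boxes):
    """Pick the largest box, drop boxes incompatible with it, recurse inside it;
    1x1 boxes are reported as matched (position in A, position in B) pairs."""
    remaining = sorted(boxes, key=area, reverse=True)  # stable for equal areas
    pairs = []
    while remaining:
        largest = remaining.pop(0)
        sub = [b for b in remaining if inside(largest, b)]
        remaining = [b for b in remaining
                     if not inside(largest, b) and not clashes(largest, b)]
        if area(largest) == 1:
            pairs.append((largest[0], largest[2]))
        else:
            pairs.extend(extract(sub))
    return pairs


boxes = [(0, 2, 0, 2), (0, 1, 0, 1), (0, 0, 1, 1), (1, 1, 0, 0),
         (2, 2, 2, 2), (3, 3, 3, 3), (0, 0, 4, 4)]
print(extract(boxes))  # [(0, 1), (1, 0), (2, 2), (3, 3)]
```

The stray copy at (0, 0, 4, 4) shares a genome-A position with the largest box, so it is filtered out instead of producing a second match for that gene.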

Demo: Let's try it!

3 Results

Results: human and mouse ortholog assignment
Assignment on whole genomes; homologs identified with the MSOAR hit graph and clustering.

                     MSOAR   LCS    Common Intervals
Matched pairs        …       …      …
True positives       …       …      …
False positives      …       …      …
% True positives     70%     69%    69%
% False positives    17%     18%    17%

MSOAR: General Matching (Z. Fu, X. Chen, V. Vacic, P. Nan, Y. Zhong, T. Jiang, 2006)
LCS: Longest Common Substrings (G. Blin, C. Chauve, G. Fertin, 2005)
True positives: genes with the same UniProt name
- MSOAR is the most accurate
- Results are comparable
- Looking for large conserved structures is a valid approach

Results: assignment on bacterial genomes
Assignment of orthologs on 8 γ-proteobacteria:
- Buchnera aphidicola APS
- Escherichia coli K12
- Haemophilus influenzae Rd
- Pasteurella multocida Pm70
- Pseudomonas aeruginosa PAO1
- Salmonella typhimurium LT2
- Xylella fastidiosa 9a5c
- Yersinia pestis CO92
Homologs identified by BLAST and clustering; all 28 pairwise matchings.

Consistent components
- Gene matching defines a graph on the genes of the 8 genomes
- A connected component is consistent if it contains at most one gene from each genome
- A perfect component is a consistent component that contains only true positives
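These component statistics can be computed with a union-find over the matched pairs. The Python sketch below assumes each gene is a (genome, gene_id) tuple, each matched pair is an edge, and a true positive means both genes carry the same UniProt name, as on the results slides; the genome abbreviations, gene ids and names in the example are hypothetical.

```python
from collections import defaultdict


def components(edges):
    """Connected components of the matching graph (union-find with path halving)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return list(groups.values())


def is_consistent(component):
    """At most one gene from each genome."""
    genomes = [genome for genome, _ in component]
    return len(genomes) == len(set(genomes))


def is_perfect(component, name):
    """Consistent, and every gene carries the same name (only true positives)."""
    return is_consistent(component) and len({name[g] for g in component}) == 1


edges = [(("Eco", 1), ("Hin", 7)), (("Hin", 7), ("Ype", 3)),
         (("Eco", 2), ("Hin", 8)), (("Hin", 8), ("Eco", 5))]
names = {("Eco", 1): "dnaA", ("Hin", 7): "dnaA", ("Ype", 3): "dnaA"}
for comp in components(edges):
    print(sorted(comp), is_consistent(comp))
```

Since a component is connected, requiring all of its genes to share one name is the same as requiring every matched pair inside it to be a true positive.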

Results: assignment on bacterial genomes

                          LCS    Common Intervals
True positives            …      …
False positives           …      …
% True positives          86%    85%
Components                …      …
Consistent                …      …
% Consistent              85%    88%
TP in a consistent comp.  …      …
Perfect comp.             …      …
% Perfect comp.           53%    52%

LCS: Longest Common Substrings (G. Blin, C. Chauve, G. Fertin, 2005)
True positives: genes with the same UniProt name
LCS is more accurate but less consistent.

Note on low complexity
[Figure: gene content vs. area]

Results: assignment on bacterial genomes

                          LCS    CI     Filtered LCS   Filtered CI
True positives            …      …      …              …
False positives           …      …      …              …
% True positives          86%    85%    93%            90%
Components                …      …      …              …
Consistent                …      …      …              …
% Consistent              85%    88%    98%            97%
TP in a consistent comp.  …      …      …              …
Perfect comp.             …      …      …              …
% Perfect comp.           53%    52%    43%            58%

Filter: minimum side of length 3
- Consistency increases with filtering
- With filtering, LCS has a lower perfect-component ratio (53% to 43%)
- With filtering, common intervals have a higher perfect-component ratio (52% to 58%)
- With filtering, there are 263 perfect components of size 8

4 Conclusion

Conclusion
- The Right Thing is hard
- Simple heuristics can't handle natural shuffling
- Segments with similar gene content are likely to be related
- Common intervals are an efficient technique to locate them
- Future work: handle gaps, smart filtering, etc.

Discussion
Note to self: stop here unless you have extra time!

Cigal: The program (1/3)
Cigal: The program (2/3)
Cigal: The program (3/3)

Human and mouse: with filtering
Filtered vs. raw results; minimum side of length 3

                     Common Intervals   Filtered CI   LCS    Filtered LCS
Matched pairs        …                  …             …      …
True positives       …                  …             …      …
% True positives     69%                71%           69%    72%
False positives      …                  …             …      …
% False positives    17%                16%           18%    15%


More information

Whole Genome Alignments and Synteny Maps

Whole Genome Alignments and Synteny Maps Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of

More information

Local Search Based Approximation Algorithms. Vinayaka Pandit. IBM India Research Laboratory

Local Search Based Approximation Algorithms. Vinayaka Pandit. IBM India Research Laboratory Local Search Based Approximation Algorithms The k-median problem Vinayaka Pandit IBM India Research Laboratory joint work with Naveen Garg, Rohit Khandekar, and Vijay Arya The 2011 School on Approximability,

More information

Revisiting the Minimum Breakpoint Linearization Problem Theoretical Computer Science

Revisiting the Minimum Breakpoint Linearization Problem Theoretical Computer Science Revisiting the Minimum Breakpoint Linearization Problem Theoretical Computer Science Laurent Bulteau, Guillaume Fertin, Irena Rusu To cite this version: Laurent Bulteau, Guillaume Fertin, Irena Rusu. Revisiting

More information

AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM

AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM AN EXACT SOLVER FOR THE DCJ MEDIAN PROBLEM MENG ZHANG College of Computer Science and Technology, Jilin University, China Email: zhangmeng@jlueducn WILLIAM ARNDT AND JIJUN TANG Dept of Computer Science

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

More information

OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes

OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes Published online 2 May 2008 Nucleic Acids Research, 2008, Vol. 36, Web Server issue W475 W480 doi:10.1093/nar/gkn240 OGtree: a tool for creating genome trees of prokaryotes based on overlapping genes Li-Wei

More information

On Critical Path Selection Based Upon Statistical Timing Models -- Theory and Practice

On Critical Path Selection Based Upon Statistical Timing Models -- Theory and Practice On Critical Path Selection Based Upon Statistical Timing Models -- Theory and Practice Jing-Jia Liou, Angela Krstic, Li-C. Wang, and Kwang-Ting Cheng University of California - Santa Barbara Problem Find

More information

Scaffold Filling Under the Breakpoint and Related Distances

Scaffold Filling Under the Breakpoint and Related Distances 1 Scaffold Filling Under the Breakpoint and Related Distances Haitao Jiang School of Computer Science and Technology School of Maththematics and System Science Shandong University Jinan, China Email: htjiang@mail.sdu.edu.cn

More information

Comparative Genomics II

Comparative Genomics II Comparative Genomics II Advances in Bioinformatics and Genomics GEN 240B Jason Stajich May 19 Comparative Genomics II Slide 1/31 Outline Introduction Gene Families Pairwise Methods Phylogenetic Methods

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Molecular Biology-2018 1 Definitions: RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Heterologues: Genes or proteins that possess different sequences and activities. Homologues: Genes or proteins that

More information

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

More information

Session 5: Phylogenomics

Session 5: Phylogenomics Session 5: Phylogenomics B.- Phylogeny based orthology assignment REMINDER: Gene tree reconstruction is divided in three steps: homology search, multiple sequence alignment and model selection plus tree

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Handling Rearrangements in DNA Sequence Alignment

Handling Rearrangements in DNA Sequence Alignment Handling Rearrangements in DNA Sequence Alignment Maneesh Bhand 12/5/10 1 Introduction Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome

More information

Tests for gene clustering

Tests for gene clustering Tests for gene clustering Dannie Durand Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA email: durand@cmu.edu David Sankoff Department of Mathematics and Statistics,

More information

A PROTEOMIC APPROACH FOR IDENTIFICATION OF BACTERIA USING TANDEM MASS SPECTROMETRY COMBINED WITH A TRANSLATOME DATABASE AND STATISTICAL SCORING

A PROTEOMIC APPROACH FOR IDENTIFICATION OF BACTERIA USING TANDEM MASS SPECTROMETRY COMBINED WITH A TRANSLATOME DATABASE AND STATISTICAL SCORING A PROTEOMIC APPROACH FOR IDENTIFICATION OF BACTERIA USING TANDEM MASS SPECTROMETRY COMBINED WITH A TRANSLATOME DATABASE AND STATISTICAL SCORING Jacek P. Dworzanski Geo-Centers, Inc., Aberdeen Proving Ground,

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

Comparative Network Analysis

Comparative Network Analysis Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

1 ATGGGTCTC 2 ATGAGTCTC

1 ATGGGTCTC 2 ATGAGTCTC We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that

More information

Homology. and. Information Gathering and Domain Annotation for Proteins

Homology. and. Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline WHAT IS HOMOLOGY? HOW TO GATHER KNOWN PROTEIN INFORMATION? HOW TO ANNOTATE PROTEIN DOMAINS? EXAMPLES AND EXERCISES Homology

More information

Figure S1. Pangenome plots of ten recombining bacterial species based on RAST annotated

Figure S1. Pangenome plots of ten recombining bacterial species based on RAST annotated Figure S1 Figure S2 Supplementary Figure legends Figure S1. Pangenome plots of ten recombining bacterial species based on RAST annotated genomes. To generate these plots all strains within a species were

More information

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi) Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction Lesser Tenrec (Echinops telfairi) Goals: 1. Use phylogenetic experimental design theory to select optimal taxa to

More information

C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution. Exercises #3: Phylogenetics C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

More information

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Methods Identification of orthologues, alignment and evolutionary distances A preliminary set of orthologues was

More information

Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review

Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review Ron Zeira and Ron Shamir June 27, 2018 Dedicated to Bernard Moret upon his retirement. Abstract Problems of genome rearrangement

More information

arxiv: v2 [cs.ds] 2 Dec 2013

arxiv: v2 [cs.ds] 2 Dec 2013 arxiv:1305.4747v2 [cs.ds] 2 Dec 2013 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 Easy identification of generalized common and conserved nested intervals Fabien de Montgolfier

More information

Non-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore

Non-binary Tree Reconciliation. Louxin Zhang Department of Mathematics National University of Singapore Non-binary Tree Reconciliation Louxin Zhang Department of Mathematics National University of Singapore matzlx@nus.edu.sg Introduction: Gene Duplication Inference Consider a duplication gene family G Species

More information

Network Alignment 858L

Network Alignment 858L Network Alignment 858L Terms & Questions A homologous h Interolog = B h Species 1 Species 2 Are there conserved pathways? What is the minimum set of pathways required for life? Can we compare networks

More information

A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes

A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes A Methodological Framework for the Reconstruction of Contiguous Regions of Ancestral Genomes and Its Application to Mammalian Genomes Cedric Chauve 1, Eric Tannier 2,3,4,5 * 1 Department of Mathematics,

More information

Introduction to protein alignments

Introduction to protein alignments Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

Genômica comparativa. João Carlos Setubal IQ-USP outubro /5/2012 J. C. Setubal

Genômica comparativa. João Carlos Setubal IQ-USP outubro /5/2012 J. C. Setubal Genômica comparativa João Carlos Setubal IQ-USP outubro 2012 11/5/2012 J. C. Setubal 1 Comparative genomics There are currently (out/2012) 2,230 completed sequenced microbial genomes publicly available

More information

In order to compare the proteins of the phylogenomic matrix, we needed a similarity

In order to compare the proteins of the phylogenomic matrix, we needed a similarity Similarity Matrix Generation In order to compare the proteins of the phylogenomic matrix, we needed a similarity measure. Hamming distances between phylogenetic profiles require the use of thresholds for

More information

Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review

Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review Genome Rearrangement Problems with Single and Multiple Gene Copies: A Review Ron Zeira and Ron Shamir August 9, 2018 Dedicated to Bernard Moret upon his retirement. Abstract Genome rearrangement problems

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli

Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli Morgan N Price 1,2, Paramvir S Dehal 1,2 and Adam P Arkin 1,2,3 1 Physical Biosciences Division, Lawrence Berkeley

More information

From Phylogenetics to Phylogenomics: The Evolutionary Relationships of Insect Endosymbiotic γ-proteobacteria as a Test Case

From Phylogenetics to Phylogenomics: The Evolutionary Relationships of Insect Endosymbiotic γ-proteobacteria as a Test Case Syst. Biol. 56(1):1 16, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150601109759 From Phylogenetics to Phylogenomics: The Evolutionary Relationships

More information

Biol478/ August

Biol478/ August Biol478/595 29 August # Day Inst. Topic Hwk Reading August 1 M 25 MG Introduction 2 W 27 MG Sequences and Evolution Handouts 3 F 29 MG Sequences and Evolution September M 1 Labor Day 4 W 3 MG Database

More information

Bioinformatics: Network Analysis

Bioinformatics: Network Analysis Bioinformatics: Network Analysis Comparative Network Analysis COMP 572 (BIOS 572 / BIOE 564) - Fall 2013 Luay Nakhleh, Rice University 1 Biomolecular Network Components 2 Accumulation of Network Components

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

Genomes Comparision via de Bruijn graphs

Genomes Comparision via de Bruijn graphs Genomes Comparision via de Bruijn graphs Student: Ilya Minkin Advisor: Son Pham St. Petersburg Academic University June 4, 2012 1 / 19 Synteny Blocks: Algorithmic challenge Suppose that we are given two

More information

arxiv: v1 [cs.ds] 21 May 2013

arxiv: v1 [cs.ds] 21 May 2013 Easy identification of generalized common nested intervals Fabien de Montgolfier 1, Mathieu Raffinot 1, and Irena Rusu 2 arxiv:1305.4747v1 [cs.ds] 21 May 2013 1 LIAFA, Univ. Paris Diderot - Paris 7, 75205

More information

Evolutionary Analysis by Whole-Genome Comparisons

Evolutionary Analysis by Whole-Genome Comparisons JOURNAL OF BACTERIOLOGY, Apr. 2002, p. 2260 2272 Vol. 184, No. 8 0021-9193/02/$04.00 0 DOI: 184.8.2260 2272.2002 Copyright 2002, American Society for Microbiology. All Rights Reserved. Evolutionary Analysis

More information

Phylogenetic Networks with Recombination

Phylogenetic Networks with Recombination Phylogenetic Networks with Recombination October 17 2012 Recombination All DNA is recombinant DNA... [The] natural process of recombination and mutation have acted throughout evolution... Genetic exchange

More information

Graphs, permutations and sets in genome rearrangement

Graphs, permutations and sets in genome rearrangement ntroduction Graphs, permutations and sets in genome rearrangement 1 alabarre@ulb.ac.be Universite Libre de Bruxelles February 6, 2006 Computers in Scientic Discovery 1 Funded by the \Fonds pour la Formation

More information

1 Introduction. Abstract

1 Introduction. Abstract CBS 530 Assignment No 2 SHUBHRA GUPTA shubhg@asu.edu 993755974 Review of the papers: Construction and Analysis of a Human-Chimpanzee Comparative Clone Map and Intra- and Interspecific Variation in Primate

More information

X X (2) X Pr(X = x θ) (3)

X X (2) X Pr(X = x θ) (3) Notes for 848 lecture 6: A ML basis for compatibility and parsimony Notation θ Θ (1) Θ is the space of all possible trees (and model parameters) θ is a point in the parameter space = a particular tree

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden

Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1 Results Multiple Alignment with SP-score Star Alignment Tree Alignment (with given phylogeny) are NP-hard

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

Linear-Space Alignment

Linear-Space Alignment Linear-Space Alignment Subsequences and Substrings Definition A string x is a substring of a string x, if x = ux v for some prefix string u and suffix string v (similarly, x = x i x j, for some 1 i j x

More information

Genetic Basis of Variation in Bacteria

Genetic Basis of Variation in Bacteria Mechanisms of Infectious Disease Fall 2009 Genetics I Jonathan Dworkin, PhD Department of Microbiology jonathan.dworkin@columbia.edu Genetic Basis of Variation in Bacteria I. Organization of genetic material

More information