Beyond Galled Trees: Decomposition and Computation of Galled Networks

Beyond Galled Trees: Decomposition and Computation of Galled Networks
Daniel H. Huson & Tobias H. Kloepper
RECOMB 2007

Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be found at http://www.gnu.org/copyleft/fdl.html

Two Main Types of Phylogenetic Networks
Implicit networks: visualization of incompatible signals.
Explicit networks: explicit description of an evolutionary scenario involving reticulate events.

Implicit Networks
Visualization of incompatible signals, e.g. a split network computed from binary characters.
[Figure: haplotype data and the resulting split network. Data: Cassens et al., 2003, dusky dolphins.]

Explicit Networks
Explicitly describe an evolutionary scenario involving reticulate events such as hybridization, horizontal gene transfer (HGT) or recombination.
[Figure: rooted network on taxa A to H with reticulate nodes r1, r2 and r3.]

A Simple Model of Reticulate Evolution
[Figure: network on taxa A, B, H, C, D with ancestors P and Q, showing an ancestral genome, mutations, speciation events and reticulate events. Panel labels: hybridization, HGT, recombination; mixture of genomes, uneven mixture, order matters. Illustrations by C.R. Linder, http://www.pitt.edu/~heh1/research.html]

Data For Reticulate Networks: Hybridization or HGT
Different genes have different histories.
Input: two or more gene trees T1, T2, ...
Output: a network N that explains T1, T2, ...
[Figure: network N and gene trees T1 and T2 on taxa A to E.]

Data For Reticulate Networks: Recombination
Recombination of closely related sequences.
Input: an alignment M of binary sequences.
Output: a network N that explains M.

Alignment M:
A: 100110000000
B: 010101000000
R: 001101100000
C: 000000110100
D: 000000111010
O: 000000000000

[Figure: network N with additional annotation. Leaves carry the sequences above, internal nodes carry sequences such as 000101000000, 000100000000, 000000110000 and 000000100000, and edges carry the positions that mutate along them (1,5; 2; 3; 4; 6; 7; 8; 9,11; 10). The recombination node is labelled 000101 100000.]

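Combinatorial Approach
The problem can be formulated in terms of splits: every edge e of a tree T defines a split of the taxon set X.
[Figure: tree on taxa A to H in which edge e defines the split A,C,D,F,G vs B,E,H.]
Split encoding Σ(T): the set of all splits defined by the edges of T.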
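A minimal sketch of the split encoding Σ(T), assuming the tree is given as a list of undirected edges. The function name `tree_splits`, the internal node names x, y, z and the five-taxon example tree are hypothetical and do not come from the slides.

```python
def tree_splits(edges, taxa):
    """Return Sigma(T): one split {side, complement} per edge of the tree."""
    taxa = frozenset(taxa)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def side(start, blocked):
        # Taxa reachable from `start` without crossing the edge to `blocked`.
        seen, stack, found = {start, blocked}, [start], set()
        while stack:
            node = stack.pop()
            if node in taxa:
                found.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return frozenset(found)

    splits = set()
    for u, v in edges:
        part = side(v, u)
        splits.add(frozenset({part, taxa - part}))
    return splits


# Hypothetical example tree (not the tree shown on the slide).
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]
sigma_t = tree_splits(edges, "ABCDE")
```

In this toy tree the two internal edges give A,B vs C,D,E and A,B,C vs D,E, and each leaf edge gives a trivial split.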

Splits From Sequences
Every non-constant binary character induces a split of the taxon set X.

Alignment M:
A: 100110000010
B: 010101001000
C: 001101100010
D: 000000110110
E: 000000111000
F: 000000000010

Example: A,C,D,F vs B,E (induced by columns 9 and 11).
Multiple columns may map to the same split.
Define Σ(M) as the set of all splits induced by M.
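The definition of Σ(M) can be sketched in a few lines, using the alignment from this slide. The function name `splits_from_alignment` and the choice to represent a split as a frozenset of its two sides are assumptions made for illustration.

```python
def splits_from_alignment(alignment):
    """Return Sigma(M): the set of splits induced by non-constant columns."""
    taxa = frozenset(alignment)
    length = len(next(iter(alignment.values())))
    splits = set()
    for col in range(length):
        ones = frozenset(t for t, seq in alignment.items() if seq[col] == "1")
        zeros = taxa - ones
        if ones and zeros:  # skip constant columns
            splits.add(frozenset({ones, zeros}))
    return splits


alignment = {
    "A": "100110000010",
    "B": "010101001000",
    "C": "001101100010",
    "D": "000000110110",
    "E": "000000111000",
    "F": "000000000010",
}
sigma_m = splits_from_alignment(alignment)
```

Because splits are stored in a set, columns that induce the same split (such as columns 9 and 11 here) are counted once.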

Combinatorial Approach
Hybridization or HGT: the input trees T1, T2, ... are represented by the splits Σ := Σ(T1) ∪ Σ(T2) ∪ ...
(Information loss: which splits occur together in the same input tree?)
Recombination: the binary alignment M is represented by the splits Σ := Σ(M).
(Information loss: the order along the sequence.)

Reticulate Networks And Splits
For a reticulate network N, how should Σ(N) be defined?
Extract a tree by deleting one reticulate edge for each reticulate node.
For each tree edge e, obtain the split that e defines in the extracted tree.
[Figure: network on taxa A to H with reticulate nodes r1, r2 and r3, in which edge e gives the split A,B,C,D,E,H vs F,G.]
Σ(N): the set of all splits obtained in this way.

Parsimonious Reticulate Network Problem
Input: a set of splits Σ on a taxon set X.
Output: a reticulate network N such that:
1. Σ ⊆ Σ(N)
2. N contains a minimum number of reticulate nodes
Such an N always exists (Baroni & Steel, 2005).
Finding one is NP-hard in general (Wang et al., 2001; Bordewich & Semple, 2006).
Special case: N is a galled tree (Gusfield et al., 2003-2005).

The Galled Tree Property
Dan Gusfield et al. (2003-2005): if a solution exists that has the galled tree property, then it can be computed efficiently.

The Galled Tree Property
A reticulation is a gall if its cycle is disjoint from the cycles of all other reticulations.
[Figure: network on taxa A to F with reticulations P, Q and R.]
The reticulations at P and at Q are galls.
Adding R destroys the gall property for Q.
The gall property is fragile.

The Loose Gall Property
A reticulation is a loose gall if it has a cycle whose backbone consists only of tree edges.
[Figure: network on taxa A to F with reticulations P, Q and R.]
P, Q and R are loose galls.
The property is not fragile: adding taxa does not destroy it.

The Galled Network Property
New definition: a reticulate network is a galled network if all of its reticulations are loose galls.
How to compute them? The Decomposition Theorem.

Computing A Galled Network
Input: a set of splits Σ on X = {A, B, ..., I} that comes from a network, either via trees or via binary sequences.
[Figure: example network on taxa A to I.]

Computing A Galled Network
Assume we know that G, H and I are the reticulate taxa. Where should G, H and I be attached?
Induced splits: on X-{G,H,I}. Extended splits: on X-{H,I}.
Orient the edges to show where the splits place G.
Attach G to the ends of the target path.
[Figure: tree on taxa A to F with edges oriented towards the placement of G.]

Computing A Galled Network
Assume we know that G, H and I are the reticulate taxa. Where should G, H and I be attached?
Induced splits: on X-{G,H,I}. Extended splits: on X-{G,I}.
Orient the edges to show where the splits place H.
Attach H to the ends of the target path.
[Figure: tree on taxa A to F with edges oriented towards the placement of H.]

Computing A Galled Network
Assume we know that G, H and I are the reticulate taxa. Where should G, H and I be attached?
Induced splits: on X-{G,H,I}. Extended splits: on X-{G,H}.
Orient the edges to show where the splits place I.
Attach I to the ends of the target path.
[Figure: tree on taxa A to F with edges oriented towards the placement of I.]

Computing A Galled Network
Assume we know that G, H and I are the reticulate taxa. Where should G, H and I be attached?
[Figure: network N on taxa A to F with G, H and I attached.]
If Σ ⊆ Σ(N), then return N.

Algorithm
Input: a set of splits Σ on X and a parameter k.
In increasing order of size, up to k, consider each set of taxa R ⊆ X:
  If Σ restricted to X-R is compatible:
    Attempt to attach each r ∈ R to the tree T(Σ restricted to X-R).
    If successful, construct the network N.
    If Σ ⊆ Σ(N), return N.
Return fail.
The algorithm is fixed-parameter tractable (FPT) for a fixed maximum size k of R.
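The outer loop of this algorithm can be sketched as follows. This is only the enumeration of candidate sets R together with the compatibility test on X-R. The attachment step and the final check against Σ(N) are not shown, because the slides do not spell them out. The function names and the five-taxon example are hypothetical. Two splits are treated as compatible when at least one of the four pairwise intersections of their sides is empty.

```python
from itertools import combinations


def compatible(s, t):
    """Two splits are compatible if some pair of sides does not intersect."""
    (a1, a2), (b1, b2) = tuple(s), tuple(t)
    return any(not (p & q) for p in (a1, a2) for q in (b1, b2))


def restrict(splits, removed):
    """Induce the splits on X-R, dropping splits that lose a whole side."""
    induced = set()
    for s in splits:
        a, b = (side - removed for side in s)
        if a and b:
            induced.add(frozenset({a, b}))
    return induced


def candidate_reticulate_sets(splits, taxa, k):
    """Yield (R, induced splits) for every R of size <= k, smallest first,
    such that the splits induced on X-R are pairwise compatible."""
    for size in range(k + 1):
        for r in combinations(sorted(taxa), size):
            removed = frozenset(r)
            induced = restrict(splits, removed)
            if all(compatible(s, t) for s, t in combinations(induced, 2)):
                yield removed, induced


# Hypothetical example: two incompatible splits on five taxa.
taxa = frozenset("ABCDH")
splits = {
    frozenset({frozenset("AH"), frozenset("BCD")}),
    frozenset({frozenset("AB"), frozenset("CDH")}),
}
found = [r for r, _ in candidate_reticulate_sets(splits, taxa, 1)]
```

In this toy input, removing A, B or H each leaves a compatible set, so compatibility alone does not single out the reticulate taxon. That is why the algorithm on the slide goes on to attempt the attachment and to check the resulting network.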

Decomposition Conjecture (Dan Gusfield)
[Figure: input trees, the corresponding split network, and a minimal reticulate network.]

Decomposition Conjecture
Each incompatibility component can be considered independently (Gusfield et al., 2005; Huson et al., 2005).
[Figure: split network with its first and second incompatibility components marked.]
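A minimal sketch of how incompatibility components might be computed, assuming the usual construction: build a graph whose nodes are the splits, join two splits whenever they are incompatible, and take connected components. The function names and the six-taxon example are hypothetical, and a small union-find stands in for any other component search.

```python
from itertools import combinations


def compatible(s, t):
    """Two splits are compatible if some pair of sides does not intersect."""
    (a1, a2), (b1, b2) = tuple(s), tuple(t)
    return any(not (p & q) for p in (a1, a2) for q in (b1, b2))


def incompatibility_components(splits):
    """Group splits into connected components of the incompatibility graph."""
    splits = list(splits)
    parent = list(range(len(splits)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(splits)), 2):
        if not compatible(splits[i], splits[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(splits):
        groups.setdefault(find(i), set()).add(s)
    return list(groups.values())


def split(left, right):
    return frozenset({frozenset(left), frozenset(right)})


# Hypothetical example on taxa A to F.
sigma = {
    split("AB", "CDEF"), split("BC", "ADEF"),  # incompatible pair
    split("DE", "ABCF"), split("EF", "ABCD"),  # second incompatible pair
    split("ABC", "DEF"),                       # compatible with all others
}
components = incompatibility_components(sigma)
sizes = sorted(len(c) for c in components)
```

Components of size one hold splits that are compatible with every other split. The components with two or more splits are the ones that would be handled separately.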

The Decomposition Theorem
Let Σ be a set of splits. If there exists a galled network N with Σ ⊆ Σ(N), then there exists a minimal network N_min that has the decomposition property.
To compute N_min we can consider each component separately.

Proof
Easy, assuming the network is non-degenerate: for every tree node v there exists a path of tree edges from v down to some leaf w.
[Figure: rooted network on taxa A to H showing a non-degenerate node v and a degenerate node that has no tree path to a leaf.]

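Proof
Consider any reticulation cycle.
[Figure: reticulation cycle with parts A, X, Y and B, reticulation R, and edges a, b and e.]
Any split S ∈ Σ(e) is incompatible with all S' ∈ Σ(a) or with all S'' ∈ Σ(b):
S contains either AXR vs BY or AX vs BYR;
S' contains AR vs XYB;
S'' contains AXY vs BR.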

Implementation
Available in the latest version of SplitsTree4, an interactive program for phylogenetic analysis using trees and networks (Huson and Bryant, MBE, 2006).

Reticulate Network with 4 Reticulations
[Figure: reticulate network. Data: Kumar et al., 1998, restriction map of the rDNA cistron, culicine mosquitoes.]

Reticulate cladogram
[Figure: reticulate cladogram.]

Conclusion & Outlook
Galled networks go beyond galled trees.
A user-friendly implementation is available in the latest version of SplitsTree4.
The Decomposition Conjecture remains unsolved in general.
All current methods are based on combinatorics and are therefore sensitive to false-positive splits.
More robust methods for the computation of phylogenetic networks are required.

http://www.newton.cam.ac.uk/programmes/plg