Beyond Galled Trees: Decomposition and Computation of Galled Networks
Daniel H. Huson & Tobias H. Kloepper
RECOMB 2007

Copyright (c) 2008 Daniel Huson. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be found at http://www.gnu.org/copyleft/fdl.html
Two Main Types of Phylogenetic Networks
Implicit networks: visualization of incompatible signals
Explicit networks: explicitly describe an evolutionary scenario involving reticulate events

Implicit Networks
Visualization of incompatible signals, e.g. a split network computed from binary characters
[Figure: haplotype data and the resulting split network. Data: Cassens et al., 2003, dusky dolphins]

Explicit Networks
Explicitly describe an evolutionary scenario involving reticulate events such as hybridization, HGT or recombination
[Figure: rooted network on taxa A to H with reticulations r1, r2, r3]

A Simple Model of Reticulate Evolution
Hybridization: mixture of genomes
HGT: uneven mixture
Recombination: order matters
[Figure: an ancestral genome evolving through mutations, speciation events and reticulate events; node labels P, Q and taxa A, B, H, C, D. Images by C.R. Linder and from http://www.pitt.edu/~heh1/research.html]

Data For Reticulate Networks
Hybridization or HGT: different genes have different histories
Input: Two or more gene trees T1, T2, ...
Output: Network N that explains T1, T2, ...
[Figure: gene trees T1 and T2 on taxa A to E, and a network N on the same taxa]

Data For Reticulate Networks
Recombination: recombination of closely related sequences
Input: Alignment M of binary sequences
Output: Network N that explains M
Alignment M:
A: 100110000000
B: 010101000000
R: 001101100000
C: 000000110100
D: 000000111010
O: 000000000000
[Figure: network N with additional annotation; nodes are labeled by sequences (O, A, B, R, C, D and inner nodes) and edges by character positions]

Combinatorial Approach
Can be formulated in terms of splits: every edge e of a tree T defines a split of the taxon set X
[Figure: tree on taxa A to H; the marked edge e defines the split A,C,D,F,G vs B,E,H]
Split encoding Σ(T): the set of all splits defined by the edges of T

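The split encoding can be sketched in a few lines of Python. This is only an illustration: the nested-tuple tree format, the function names and the example tree are assumptions, and the tree is a hypothetical one chosen so that one of its edges yields the split A,C,D,F,G vs B,E,H from the slide.

```python
def leaf_set(node):
    """Leaves below a node; a leaf is a string, an inner node a tuple of children."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def tree_splits(tree):
    """Return the split encoding of a tree as a set of {part, complement} pairs."""
    taxa = leaf_set(tree)
    splits = set()

    def visit(node):
        below = leaf_set(node)
        rest = taxa - below
        if rest:  # the root itself has no edge above it
            splits.add(frozenset({below, rest}))
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return splits


# Hypothetical tree (not the one on the slide), built to contain the edge e.
T = ((("A", "C"), ("D", ("F", "G"))), (("B", "E"), "H"))
sigma_T = tree_splits(T)
e = frozenset({frozenset("ACDFG"), frozenset("BEH")})
print(e in sigma_T, len(sigma_T))
```

Each split is stored as an unordered pair of taxon sets, so the two edges below a degree-2 root collapse into one split, as they would in the unrooted tree.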
Splits From Sequences
Every non-constant binary character induces a split of the taxon set X
Alignment M:
A: 100110000010
B: 010101001000
C: 001101100010
D: 000000110110
E: 000000111000
F: 000000000010
Multiple columns may map to the same split: columns 9 and 11 both induce ACDF vs BE
Define Σ(M): set of all splits induced by M

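A minimal sketch of how Σ(M) could be computed from the alignment on this slide. The dictionary layout and the function name are assumptions made for illustration.

```python
def splits_from_alignment(alignment):
    """Return the set of splits induced by the non-constant columns.

    alignment maps each taxon to a binary string; all strings have equal length.
    """
    taxa = frozenset(alignment)
    length = len(next(iter(alignment.values())))
    splits = set()
    for col in range(length):
        ones = frozenset(t for t in taxa if alignment[t][col] == "1")
        zeros = taxa - ones
        if ones and zeros:  # constant columns induce no split
            splits.add(frozenset({ones, zeros}))
    return splits


M = {
    "A": "100110000010",
    "B": "010101001000",
    "C": "001101100010",
    "D": "000000110110",
    "E": "000000111000",
    "F": "000000000010",
}
sigma_M = splits_from_alignment(M)
print(frozenset({frozenset("ACDF"), frozenset("BE")}) in sigma_M)
print(len(sigma_M))
```

Because the result is a set, columns that induce the same split are stored only once: the twelve columns above yield nine distinct splits.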
Combinatorial Approach
Hybridization or HGT: input trees T1, T2, ... represented by splits Σ := Σ(T1) ∪ Σ(T2) ∪ ...
(Information loss: which splits occur together in the same input tree?)
Recombination: binary alignment M represented by splits Σ := Σ(M)
(Information loss: order along the sequence)

Reticulate Networks And Splits
For a reticulate network N, how to define Σ(N)?
Extract a tree by deleting one reticulate edge for each reticulate node
For each tree edge e: obtain a split from the tree
[Figure: network on taxa A to H with reticulations r1, r2, r3 and a root; the marked edge e gives the split A,B,C,D,E,H vs F,G]
Σ(N): set of all splits thus obtained

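The definition of Σ(N) can be sketched as follows, assuming the network is given as a list of directed (parent, child) edges and that a reticulate node is any node with more than one parent. The function name, the edge-list format and the small example network are hypothetical; the sketch collects the splits of every edge in each extracted tree.

```python
from itertools import product


def network_splits(edges, root):
    """Collect the splits of every tree obtained by keeping exactly one
    incoming edge at each reticulate node."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    inner = {u for u, _ in edges}
    taxa = frozenset(v for _, v in edges if v not in inner)
    reticulate = [v for v, ps in parents.items() if len(ps) > 1]

    splits = set()
    for choice in product(*(parents[r] for r in reticulate)):
        kept = dict(zip(reticulate, choice))
        children = {}
        for u, v in edges:
            if kept.get(v, u) == u:  # drop the reticulate edges not chosen
                children.setdefault(u, []).append(v)

        def cluster(node):
            if node in taxa:
                below = frozenset([node])
            else:
                below = frozenset().union(
                    *(cluster(c) for c in children.get(node, []))
                )
            if below and taxa - below:
                splits.add(frozenset({below, taxa - below}))
            return below

        cluster(root)
    return splits


# Hypothetical network: B hangs below the reticulate node h (parents p and s).
N = [("root", "p"), ("root", "q"), ("p", "A"), ("p", "h"),
     ("q", "s"), ("q", "D"), ("s", "h"), ("s", "C"), ("h", "B")]
sigma_N = network_splits(N, "root")
print(len(sigma_N))
```

In this example one extracted tree contributes AB vs CD and the other BC vs AD, two incompatible splits that no single tree could represent together.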
Parsimonious Reticulate Network Problem
Input: Set of splits Σ on a taxon set X
Output: A reticulate network N with:
1. Σ ⊆ Σ(N)
2. N contains a minimum number of reticulate nodes
Such an N always exists (Baroni & Steel, 2005)
Finding one is NP-hard in general (Wang et al., 2001; Bordewich & Semple, 2006)
Special case: N is a galled tree (Gusfield et al., 2003-2005)

The Galled Tree Property
Dan Gusfield et al. (2003-2005): If a solution exists that has the galled tree property, then it can be computed efficiently

The Galled Tree Property
A reticulation is a gall if its cycle is disjoint from the cycles of all other reticulations
[Figure: network on taxa A to F with reticulations P, Q and R]
The reticulations at P and at Q are galls
Adding R destroys the gall property for Q
The gall property is fragile

The Loose Gall Property
A reticulation is a loose gall if it has a cycle whose backbone consists only of tree edges
[Figure: network on taxa A to F with reticulations P, Q and R]
P, Q and R are loose galls
Not fragile: adding taxa doesn't destroy the property

The Galled Network Property
New definition: A reticulate network is a galled network if all reticulations are loose galls
How to compute them?
The Decomposition Theorem

Computing A Galled Network
Input: Set of splits Σ on X = {A, B, ..., I} that comes from a network, either via trees or binary sequences
[Figure: example network on taxa A to I]

Computing A Galled Network
Assume we know G, H, I are reticulate taxa. Where to attach G, H, I?
Induced splits: Σ restricted to X-{G,H,I}
Extended splits: Σ restricted to X-{H,I}
Orient edges to show where the splits place G
Attach G to the ends of the target path
[Figure: tree on taxa A to F, with G, H, I]

Computing A Galled Network
Assume we know G, H, I are reticulate taxa. Where to attach G, H, I?
Induced splits: Σ restricted to X-{G,H,I}
Extended splits: Σ restricted to X-{G,I}
Orient edges to show where the splits place H
Attach H to the ends of the target path
[Figure: tree on taxa A to F, with G, H, I]

Computing A Galled Network
Assume we know G, H, I are reticulate taxa. Where to attach G, H, I?
Induced splits: Σ restricted to X-{G,H,I}
Extended splits: Σ restricted to X-{G,H}
Orient edges to show where the splits place I
Attach I to the ends of the target path
[Figure: tree on taxa A to F, with G, H, I]

Computing A Galled Network
Assume we know G, H, I are reticulate taxa. Where to attach G, H, I?
[Figure: resulting network N on taxa A to F with G, H and I attached]
If Σ ⊆ Σ(N), then return N

Algorithm
Input: Set of splits Σ on X, parameter k
Consider each set of taxa R ⊆ X with at most k elements, in increasing order of size:
  If Σ restricted to X-R is compatible:
    Attempt to attach each r ∈ R to the tree T of the splits restricted to X-R
    If successful, construct network N
    If Σ ⊆ Σ(N), return N
Return fail
Fixed-parameter tractable (FPT) for a fixed maximum size k of R

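The outer loop of the algorithm can be sketched as below. This covers only the search for candidate sets R whose removal leaves compatible splits; the attachment step and the final check against Σ(N) are omitted. The sketch relies on the standard criterion that two splits are compatible exactly when one of the four pairwise intersections of their parts is empty. All function names and the example splits are assumptions.

```python
from itertools import combinations


def split(a, b):
    """Build a split from two strings of single-letter taxa."""
    return frozenset({frozenset(a), frozenset(b)})


def compatible(s, t):
    """Two splits are compatible if some part of s misses some part of t."""
    return any(not (a & b) for a in s for b in t)


def restrict(splits, removed):
    """Restrict splits to the taxa not in `removed`, dropping empty-sided ones."""
    result = set()
    for s in splits:
        parts = [part - removed for part in s]
        if all(parts):
            result.add(frozenset(parts))
    return result


def candidate_reticulate_sets(splits, taxa, k):
    """Yield sets R with |R| <= k, smallest first, whose removal leaves
    pairwise compatible splits."""
    for size in range(k + 1):
        for R in combinations(sorted(taxa), size):
            induced = restrict(splits, frozenset(R))
            if all(compatible(s, t) for s, t in combinations(induced, 2)):
                yield frozenset(R)


# Hypothetical input: AB|CD and BC|AD are incompatible on X = {A, B, C, D}.
sigma = {split("A", "BCD"), split("B", "ACD"), split("C", "ABD"),
         split("D", "ABC"), split("AB", "CD"), split("BC", "AD")}
first = next(candidate_reticulate_sets(sigma, "ABCD", 2))
print(sorted(first))
```

For this small input every single taxon is a candidate, so it is the omitted attachment and verification steps that decide which candidate leads to a network.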
Decomposition Conjecture (Dan Gusfield)
[Figure: input trees, their split network, and a minimal reticulate network]

Decomposition Conjecture
Each incompatibility component can be considered independently
[Figure: network with its first and second incompatibility components marked]
(Gusfield et al. 2005) (Huson et al. 2005)

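One way to make incompatibility components concrete is as connected components of the graph whose nodes are splits and whose edges join incompatible pairs. The sketch below assumes that reading; the function names and the example splits on taxa A to H are hypothetical.

```python
def split(a, b):
    """Build a split from two strings of single-letter taxa."""
    return frozenset({frozenset(a), frozenset(b)})


def compatible(s, t):
    """Two splits are compatible if some part of s misses some part of t."""
    return any(not (a & b) for a in s for b in t)


def incompatibility_components(splits):
    """Group splits into connected components of the incompatibility graph."""
    splits = list(splits)
    unseen = set(range(len(splits)))
    components = []
    while unseen:
        stack = [unseen.pop()]
        component = []
        while stack:
            i = stack.pop()
            component.append(splits[i])
            # Unvisited splits that are incompatible with split i.
            neighbours = {j for j in unseen
                          if not compatible(splits[i], splits[j])}
            unseen -= neighbours
            stack.extend(neighbours)
        components.append(component)
    return components


sigma = [
    split("AB", "CDEFGH"), split("BC", "ADEFGH"),  # incompatible pair
    split("EF", "ABCDGH"), split("FG", "ABCDEH"),  # incompatible pair
    split("ABCD", "EFGH"),                         # compatible with all
]
components = incompatibility_components(sigma)
print(sorted(len(c) for c in components))
```

A split that is compatible with every other split ends up alone in its component; only components with two or more splits carry incompatible signal.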
The Decomposition Theorem
Let Σ be a set of splits. If there exists a galled network N with Σ ⊆ Σ(N), then there exists a minimal network N_min that has the decomposition property.
To compute N_min we can consider each component separately

Proof
Easy, assuming non-degenerate: for every tree node v there exists a path of tree edges from v down to some leaf w
[Figure: network on taxa A to H with a root, showing a non-degenerate node v and a degenerate node with no tree path to a leaf]

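Proof
Consider any reticulation cycle
[Figure: reticulation cycle with taxon groups A, X, Y, B and R, reticulate edges a and b, and a cycle edge e]
Any split S ∈ Σ(e) is incompatible with all S' ∈ Σ(a) or with all S'' ∈ Σ(b):
S contains either AXR | BY or AX | BYR
S' contains AR | XYB
S'' contains AXY | BR
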
Implementation
Available in the latest version of SplitsTree4
Interactive program for phylogenetic analysis using trees and networks (Huson and Bryant, MBE, 2006)

Reticulate Network with 4 reticulations
Data: Kumar et al., 1998. Restriction map of the rDNA cistron, culicine mosquitoes

Reticulate cladogram

Conclusion & Outlook
Galled networks go beyond galled trees
A user-friendly implementation is available in the latest version of SplitsTree4
The Decomposition Conjecture is unsolved in general
All current methods are based on combinatorics and are thus sensitive to false-positive splits
More robust methods for the computation of phylogenetic networks are required

http://www.newton.cam.ac.uk/programmes/plg