Reconstructing Phylogenetic Networks Mareike Fischer, Leo van Iersel, Steven Kelk, Nela Lekić, Simone Linz, Celine Scornavacca, Leen Stougie Centrum Wiskunde & Informatica (CWI) Amsterdam MCW Prague, 3 July 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44
Definition Let X be a finite set. A (rooted) phylogenetic tree on X is a rooted tree with no indegree- outdegree- vertices whose leaves are bijectively labelled by the elements of X. 8 Mya 76 Mya 68 Mya 35 Mya Kiwi (New Zealand) Cassowary (New Guinea + Australia) Leo van Iersel (CWI) Emu (Australia) Ostrich Moa (Africa) (New Zealand) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44
Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44
Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44
Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 5 / 44
W.F. Doolittle et al. () Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 6 / 44
Definition Let X be a finite set. A (rooted) phylogenetic network on X is a rooted directed acyclic graph with no indegree- outdegree- vertices whose leaves are bijectively labelled by the elements of X. Parasitic Jaeger Pomarine Skua Great Skua Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 7 / 44
The first phylogenetic network (Buffon, 755) Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 8 / 44
Phylogenetic network for humans (Reich et al., ) Definition A reticulation is a vertex with indegree at least. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 9 / 44
Tree-based methods Compute trees from DNA sequences. Different parts of DNA might give different trees. Try to induce a phylogenetic network from the trees. Definition A phylogenetic tree T is displayed by a phylogenetic network N if T can be obtained from a subgraph of N by contracting edges. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44
Example: tree T is displayed by network N a h a h g g b c d e f N b c d e f T Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44
The other binary tree T displayed by network N a h a h g g b c d e f b c d e f N T Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44
Challenge: try to reconstruct the network from the trees a h g a h b c d e f g a b c d e f h g b c d e f Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44
Definition The reticulation number of a phylogenetic network N is d (v). Problem Minimum Reticulation v V \{root} Instance: phylogenetic trees T, T Solution: phylogenetic network that displays T and T Minimize: reticulation number of the network. Theorem There exists a constant factor approximation algorithm for Minimum Reticulation if and only if there exists a constant factor approximation algorithm for Directed Feedback Vertex Set. Open question: how to handle more than two trees (efficiently)? Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44
Reconstructing phylogenetic networks Tree-based methods Construct trees from DNA sequences. Find a network that displays the trees and has minimum reticulation number. Sequence-based methods Find a network directly from the DNA sequences. Optimize Parsimony or Likelihood score of network. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 5 / 44
Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the internal vertices in order to minimize the total number of mutations. ACCTG ATCTG ATCTC GTAAA TTACT Example input Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 6 / 44
Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the internal vertices in order to minimize the total number of mutations. TTCTA ATCTA ATCTG TTAAA ACCTG ATCTG ATCTC GTAAA TTACT Example labelling of internal vertices Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 7 / 44
Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the internal vertices in order to minimize the total number of mutations. ATCTA TTCTA ATCTG TTAAA ACCTG ATCTG ATCTC GTAAA TTACT Example of one mutation Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 8 / 44
Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the internal vertices in order to minimize the total number of mutations. TTCTA ATCTA ATCTG TTAAA ACCTG ATCTG ATCTC GTAAA TTACT All 9 mutations. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 9 / 44
Maximum Parsimony for trees Small parsimony problem: given a tree and a sequence for each leaf, assign sequences to the interior vertices in order to minimize the total number of mutations. Polynomial-time solvable: Consider each character separately. Use dynamic programming (Fitch, 97). Two possible extensions to networks: hardwired softwired Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44
Hardwired Maximum Parsimony on Networks A p-state character on X is a function α : X {,..., p}. The change c τ (e) on edge e = (u, v) w.r.t. a p-state character τ on V (N) is defined as: { if τ(u) = τ(v) c τ (e) = if τ(u) τ(v). The hardwired parsimony score of a phylogenetic network N and p-state character α is given by PS hw (N, α) = min c τ (e), τ e E(N) where the minimum is taken over all p-state characters τ on V (N) that extend α. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44
Example input: (N, α) 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 / 44
A 3-state character τ on V (N) that extends α. 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44
PS hw (N, α) = 4 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44
Softwired Maximum Parsimony on Networks The softwired parsimony score of a phylogenetic network N and p-state character α is given by PS sw (N, α) = min PS(T, α), T T (N) where T (N) is the set of trees on X displayed by N. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 5 / 44
One of the two trees on X displayed by the network 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 6 / 44
A 3-state character τ on V (T ) that extends α. 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 7 / 44
There are 3 changes 3 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 8 / 44
The other tree needs 4 changes 3 The minimum over the two trees is 3, so PS sw (N, α) = 3. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 9 / 44
Proposition PS hw (N, α) is not an o(n)-approximation of PS sw (N, α). Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44
Softwired Parsimony Score PS sw (N, α) = Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44
Hardwired Parsimony Score with r the number of reticulations. PS hw (N, α) = 4 = r + Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 3 / 44
Proposition Let G be the graph obtained from network N by merging all leaves x with α(x) = i into a single node γ i, for i =,..., p. Then, PS hw (N, α) equals the size of a minimum multiterminal cut in G with terminals γ,..., γ p. Corollary Computing the hardwired parsimony score of a phylogenetic network and a binary character is polynomial-time solvable. Corollary Computing the hardwired parsimony score of a phylogenetic network and a p-state character, for p 3, is NP-hard and APX-hard but fixed-parameter tractable (FPT) in the parsimony score, and there exists a polynomial-time.3438-approximation for all p and a -approximation for p = 3. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 33 / 44
Example Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 34 / 44
Merge -leaves and -leaves Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 35 / 44
Hardwired Parsimony Score is 4 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 36 / 44
Observation There exists a (trivial) X -approximation for computing the softwired parsimony score of a phylogenetic network. Theorem For every constant ɛ > there is no polynomial-time approximation algorithm that approximates PS sw (N, α) to a factor X ɛ, for a phylogenetic network N and a binary character α, unless P = NP. Definition A phylogenetic network is binary if the root has outdegree and all other vertices have total degree or 3. Theorem For every constant ɛ > there is no polynomial-time approximation algorithm that approximates PS sw (N, α) to a factor X 3 ɛ, for a binary phylogenetic network N and a binary character α, unless P = NP. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 37 / 44
Proof: reduction from 3SAT Ú s s variable gadget x :x y :y z :z.................. f(n,ï) copies f(n,ï) copies clause gadget... f(n,ï)... copies x _ :y :x _ y _ z f(n,ï) copies Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 38 / 44
Binary case x :x y :y z :z dx Îx dy Îy dz Îz dx Îz dx 3 x _ y x _ :y _ z x _ :z :x _ :z Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 39 / 44
Theorem There is no FPT algorithm for computing the softwired parsimony score, with the score as parameter, unless P = NP. Definition A phylogenetic network is level-k if each biconnected component has reticulation number at most k. Theorem There is an FPT algorithm for computing the softwired parsimony score, with the level of the network as parameter. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44
ILP for softwired parsimony score min c e e E s.t. x v,s = s P c e x u,s x v,s ( y e ) c e x v,s x u,s ( y e ) y (v,r) = v:(v,r) E y e = x v,α(v) = c e, y e {, } x v,s {, } for all v V for all e = (u, v) E, s P for all e = (u, v) E, s P for each reticulation r for each non-reticulate edge e for each leaf v for all e E for all v V, s P with P = {,..., p} and α(v) the given character state of a leaf v. Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44
Both parsimony scores can be computed quickly using ILP X Avg. Average computation time (s) num. of Hardwired PS Softwired PS retic. -state 3-state 4-state -state 3-state 4-state 5 7.......3 37.......6 5 54....6...8 7.8.....4.4 5 9.3.. 3.5..4. 3.6.. 5...6 3.7 Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 4 / 44
Future Work Are there approximation or FPT algorithms for computing the softwired parsimony score of restricted classes of networks? How to search for an optimal network? What if the different characters are not independent? Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 43 / 44
Thanks Mareike Fischer (Greifswald) Steven Kelk (Maastricht) Celine Scornavacca (Montpellier) Simone Linz (Christchurch) Leen Stougie (Amsterdam) Nela Lekić (Maastricht) Leo van Iersel (CWI) Reconstructing Phylogenetic Networks MCW Prague, 3 July 3 44 / 44