Weighted Quartets Phylogenetics
Yunan Luo
E. Avni, R. Cohen, and S. Snir. Weighted quartets phylogenetics. Systematic Biology, 2014. syu087
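Problem: quartet-based supertree
[Figure: example input quartets on taxa A to E and the output tree.]
- Input: a set Q of quartet trees (unrooted trees on four taxa).
- Output: a single tree on all the taxa.
- Definition: a set Q of quartets is compatible if there is a tree that induces every quartet in Q.
- Goal: find the largest compatible subset of the given quartet set. This problem is NP-hard.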
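A minimal Python sketch of what "a tree induces a quartet" means, assuming a tree is written as nested tuples of taxon names (an unrooted tree drawn from an arbitrary root) and a quartet ab|cd as the pair of pairs ((a, b), (c, d)). A tree induces ab|cd exactly when some edge of the tree, i.e., some split, puts a and b on one side and c and d on the other. The helper names `clusters`, `induces`, and `count_satisfied` are hypothetical.

```python
def clusters(tree, out):
    """Append the leaf set below every node of a nested-tuple tree to `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.append(leaves)
    return leaves


def induces(tree, quartet):
    """True if some split of `tree` separates {a, b} from {c, d} for quartet ab|cd."""
    (a, b), (c, d) = quartet
    found = []
    all_leaves = clusters(tree, found)
    for side in found:
        rest = all_leaves - side
        if ({a, b} <= side and {c, d} <= rest) or ({c, d} <= side and {a, b} <= rest):
            return True
    return False


def count_satisfied(tree, quartets):
    """Number of input quartets that the tree induces."""
    return sum(induces(tree, q) for q in quartets)


tree = (("A", "B"), "C", ("D", "E"))
quartets = [(("A", "B"), ("C", "D")), (("A", "B"), ("D", "E")), (("A", "C"), ("B", "D"))]
print(count_satisfied(tree, quartets))  # 2
```

The sketch only scores a given candidate tree. Searching over all trees for the one that satisfies the most quartets is the NP-hard problem stated above.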
Outline
- Background: Quartet MaxCut (QMC)
- Weighted Quartet MaxCut (wQMC)
- Results of wQMC
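Background: Quartet MaxCut (QMC)
Example: a cut in a graph
[Figure: weighted graph on nodes A, B, C, D with edge weights 2, 3, 1, and 5.]
- The cut C = ({A, B}, {C, D}) separates A and B from C and D.
- The weight of a cut is the total weight of the edges crossing it: w(C) = 3 + 1 = 4.
Snir, Sagi, and Satish Rao. "Quartet MaxCut: a fast algorithm for amalgamating quartet trees." Molecular Phylogenetics and Evolution 62.1 (2012): 1-8.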
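A short Python sketch of the cut weight. The figure's edge endpoints are not recoverable, so the edge list below is a hypothetical assignment of the weights 2, 3, 1, and 5 that reproduces w(C) = 4. The helper name `cut_weight` is illustrative.

```python
def cut_weight(edges, side):
    """Sum the weights of edges with exactly one endpoint in `side`."""
    side = set(side)
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))


# Hypothetical placement of the slide's weights on edges.
edges = {("A", "B"): 2, ("C", "D"): 5, ("A", "C"): 3, ("B", "D"): 1}
print(cut_weight(edges, {"A", "B"}))  # 4
```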
Quartet MaxCut (QMC): a heuristic method
Given a set of species (taxa) X and a set Q of quartets, QMC builds a quartet graph G(Q) = (V, E):
- Nodes: V = X.
- Edges: for every quartet q in Q, add an edge for every pair of leaves in q (six edges per quartet).
- Bad edges: the two edges linking sister leaves (for q = 12|34, the edges 1-2 and 3-4).
- Good edges: the other four pairs (1-3, 1-4, 2-3, 2-4).
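The construction can be sketched in Python as follows, assuming each quartet ab|cd is given as ((a, b), (c, d)) and edges are stored as unordered pairs with multiplicities; `quartet_graph` is a hypothetical name. The usage line puts the graphs of 12|34 and 13|24 together, so edge 1-3 ends up as a good edge of the first quartet and a bad edge of the second.

```python
from collections import Counter
from itertools import combinations


def quartet_graph(quartets):
    """Good/bad edge multiplicities for quartets given as ((a, b), (c, d)) = ab|cd."""
    good, bad = Counter(), Counter()
    for (a, b), (c, d) in quartets:
        sisters = {frozenset((a, b)), frozenset((c, d))}
        for x, y in combinations((a, b, c, d), 2):
            edge = frozenset((x, y))
            # Sister pairs are bad edges; the four cross pairs are good edges.
            if edge in sisters:
                bad[edge] += 1
            else:
                good[edge] += 1
    return good, bad


good, bad = quartet_graph([(("1", "2"), ("3", "4")), (("1", "3"), ("2", "4"))])
print(good[frozenset(("1", "3"))], bad[frozenset(("1", "3"))])  # 1 1
```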
Quartet graph
[Figure: the edges of quartets 12|34 and 13|24 put together into a single quartet graph on taxa 1, 2, 3, 4.]
Quartet MaxCut (QMC) algorithm
- Find a cut C in the quartet graph that maximizes the ratio between the numbers of good and bad edges crossing C.
- The cut defines a split (U, X\U) of the taxa set X.
- Apply recursively on U and X\U, until the subset size is at most 4.
- Every split defines an edge in the construction.
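The recursion can be sketched in Python for tiny inputs. This is not QMC itself: QMC finds the cut heuristically, while this sketch enumerates every split exhaustively, and it simply drops quartets that do not lie entirely inside a subset, which loses information that QMC keeps. The names `best_cut` and `qmc_sketch` are hypothetical, and the sketch stops at two taxa (or when no quartet applies) rather than at the slide's threshold, so that a 4-taxon example gets resolved.

```python
from collections import Counter
from itertools import combinations


def quartet_graph(quartets):
    """Good/bad edge multiplicities; each quartet ((a, b), (c, d)) means ab|cd."""
    good, bad = Counter(), Counter()
    for (a, b), (c, d) in quartets:
        bad[frozenset((a, b))] += 1
        bad[frozenset((c, d))] += 1
        for x in (a, b):
            for y in (c, d):
                good[frozenset((x, y))] += 1
    return good, bad


def best_cut(taxa, quartets):
    """Exhaustively find U maximizing good/bad crossing edges (small inputs only)."""
    good, bad = quartet_graph(quartets)
    first, *rest = sorted(taxa)
    best, best_score = None, -1.0
    # U always contains `first` and leaves at least one taxon on the other side.
    for k in range(len(rest)):
        for others in combinations(rest, k):
            U = {first, *others}
            g = sum(n for e, n in good.items() if len(e & U) == 1)
            b = sum(n for e, n in bad.items() if len(e & U) == 1)
            score = g / b if b else (float("inf") if g else 0.0)
            if score > best_score:
                best, best_score = U, score
    return best


def qmc_sketch(taxa, quartets):
    """Recursively split `taxa`; returns a nested-tuple tree."""
    taxa = set(taxa)
    if len(taxa) == 1:
        return next(iter(taxa))
    # Simplification: only quartets lying entirely inside this subset are used.
    local = [q for q in quartets if set(q[0]) | set(q[1]) <= taxa]
    if len(taxa) == 2 or not local:
        return tuple(sorted(taxa))  # no information left: leave the clade unresolved
    U = best_cut(taxa, local)
    return (qmc_sketch(U, quartets), qmc_sketch(taxa - U, quartets))


print(qmc_sketch({"1", "2", "3", "4"}, [(("1", "2"), ("3", "4"))]))  # (('1', '2'), ('3', '4'))
```

For the single quartet 12|34, the cut {1, 2} crosses all four good edges and no bad edge, so its ratio is unbounded and it wins.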
Contributions of this paper
- A weighted extension of QMC
- A scheme for associating weights with quartets
- A new measure of tree similarity
A weighted extension of QMC
- Recall QMC: find a cut C in the quartet graph that maximizes the ratio between the numbers of good and bad edges crossing C.
- Now suppose we are given a set of quartets with associated weights.
- Question: what is the natural extension of QMC to weighted quartets?
- Answer: each edge carries the weight of its quartet; find a cut C that maximizes the ratio between the total weight of good edges and the total weight of bad edges crossing C.
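A Python sketch of the weighted ratio, assuming each quartet is given as ((a, b), (c, d), w) and that every edge contributed by a quartet carries that quartet's weight. The helper names are hypothetical. The example pits a heavy quartet 12|34 (weight 1.0) against a light, conflicting quartet 13|24 (weight 0.1).

```python
from collections import defaultdict


def weighted_quartet_graph(quartets):
    """Each quartet is ((a, b), (c, d), w): topology ab|cd with weight w."""
    good, bad = defaultdict(float), defaultdict(float)
    for (a, b), (c, d), w in quartets:
        bad[frozenset((a, b))] += w
        bad[frozenset((c, d))] += w
        for x in (a, b):
            for y in (c, d):
                good[frozenset((x, y))] += w
    return good, bad


def cut_ratio(quartets, U):
    """Total good weight over total bad weight of edges with one endpoint in U."""
    good, bad = weighted_quartet_graph(quartets)
    U = set(U)
    g = sum(w for e, w in good.items() if len(e & U) == 1)
    b = sum(w for e, w in bad.items() if len(e & U) == 1)
    return g / b if b else float("inf")


heavy_light = [(("1", "2"), ("3", "4"), 1.0), (("1", "3"), ("2", "4"), 0.1)]
print(cut_ratio(heavy_light, {"1", "2"}), cut_ratio(heavy_light, {"1", "3"}))
```

With these weights the cut {1, 2} scores 4.2 / 0.2 = 21 and the cut {1, 3} scores 2.4 / 2.0 = 1.2, so the heavy quartet wins. With both weights set to 1.0, the two cuts tie at 6 / 2 = 3.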
Prioritizing between quartets
[Figure: four weighted quartets on taxa 1 to 5, two with weight 1.0 and two with weight 0.1, and the trees constructed without and with weights.]
- No tree satisfies all four quartets simultaneously, so some optimization criterion is necessary.
- Construction without weights: satisfies 3 quartets, with a total weight of 1.2.
- Construction with weights: satisfies 2 quartets, with a total weight of 2.0.
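The two criteria can be compared in a few lines of Python. The figure is not recoverable, so the labels q1 to q4 and the sets of quartets each construction satisfies are assumptions, chosen only to match the stated totals (3 quartets with weight 1.2 versus 2 quartets with weight 2.0).

```python
# Hypothetical labels q1..q4 for the slide's four quartets, with their weights.
weights = {"q1": 1.0, "q2": 1.0, "q3": 0.1, "q4": 0.1}

# Assumed satisfied sets, consistent with the slide's totals.
satisfied = {
    "without weights": {"q1", "q3", "q4"},
    "with weights": {"q1", "q2"},
}


def count_score(tree):
    """Unweighted criterion: number of satisfied quartets."""
    return len(satisfied[tree])


def weight_score(tree):
    """Weighted criterion: total weight of satisfied quartets."""
    return sum(weights[q] for q in satisfied[tree])


print(max(satisfied, key=count_score))   # without weights
print(max(satisfied, key=weight_score))  # with weights
```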
A scheme for associating weights with quartets
[Figure: quartet tree on taxa a, b, c, d.]
Let $d_1 = d_{ab} + d_{cd}$, $d_2 = d_{ac} + d_{bd}$, and $d_3 = d_{ad} + d_{bc}$, and assume $d_1 \le d_2 \le d_3$. The weight of quartet $q = ab|cd$ is defined as
$$w(q) = \frac{(d_3 - d_1)\, e^{-(d_3 - d_2)}}{d_3}.$$
Remarks:
- For tree-additive distances, $d_3 - d_1$ is twice the length of the internal edge. The quartet weight increases as the internal edge gets longer, i.e., as the split becomes more significant.
- The weight becomes 0 if the quartet is unresolved, i.e., $d_3 - d_1 = 0$.
- As $d_3 - d_2 \to 0$, the data are more reliable and the weight becomes larger.
- In a tree, $d_3 - d_2 = 0$, so $w(q) = 1 - d_1/d_3$.
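A minimal Python sketch of this weighting. It assumes pairwise distances are given in a dict keyed by frozenset pairs and uses $w(q) = (d_3 - d_1)e^{-(d_3 - d_2)}/d_3$, which is our reading of the slide's formula. The topology is taken from the pairing with the smallest sum $d_1$. `quartet_weight` is a hypothetical name.

```python
import math


def quartet_weight(taxa, dist):
    """Return (topology, weight) for four taxa from pairwise distances.

    `dist` maps frozenset({x, y}) to the distance between x and y.
    """
    a, b, c, d = taxa

    def D(x, y):
        return dist[frozenset((x, y))]

    # The three ways to pair four taxa, with their summed within-pair distances.
    pairings = sorted(
        [
            (D(a, b) + D(c, d), ((a, b), (c, d))),
            (D(a, c) + D(b, d), ((a, c), (b, d))),
            (D(a, d) + D(b, c), ((a, d), (b, c))),
        ],
        key=lambda p: p[0],
    )
    (d1, topology), (d2, _), (d3, _) = pairings  # d1 <= d2 <= d3
    if d3 == 0:
        return topology, 0.0
    return topology, (d3 - d1) * math.exp(-(d3 - d2)) / d3


# Additive distances from tree ab|cd: pendant edges of length 1, internal edge of length 2.
pairs = {("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 4, ("a", "d"): 4, ("b", "c"): 4, ("b", "d"): 4}
dist = {frozenset(p): v for p, v in pairs.items()}
print(quartet_weight(("a", "b", "c", "d"), dist))  # ((('a', 'b'), ('c', 'd')), 0.5)
```

Here $d_1 = 4$ and $d_2 = d_3 = 8$. So $d_3 - d_1 = 4$ is twice the internal edge length, and the weight equals $1 - d_1/d_3 = 0.5$, as the remarks predict. Raising $d_{ad}$ to 5 breaks additivity ($d_3 = 9$, $d_2 = 8$) and lowers the weight to $5e^{-1}/9 \approx 0.20$.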
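A new measure of tree similarity
- Existing measure: Qfit (Estabrook 1985), $\mathrm{Qfit} = \frac{\#\,\text{shared quartets}}{\#\,\text{all possible quartets}}$.
- New measure: wQfit (this paper).
- For quartets: $\mathrm{wQfit}_q(q_1, q_2) = \delta(q_1, q_2)\, w(q_1)\, w(q_2)$, where $\delta(q_1, q_2) = 1$ if $q_1 = q_2$ and $\delta(q_1, q_2) = -\tfrac{1}{2}$ if $q_1 \ne q_2$.
- For trees: $\mathrm{wQfit}_T(T_1, T_2) = \dfrac{2 \sum_s \mathrm{wQfit}_q(T_{1,s}, T_{2,s})}{\sum_s \mathrm{wQfit}_q(T_{1,s}, T_{1,s}) + \sum_s \mathrm{wQfit}_q(T_{2,s}, T_{2,s})}$, where $s$ ranges over the subsets of the input species $X$ with $|s| = 4$, and $T_{1,s}$ is the quartet of tree $T_1$ induced by $s$.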
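A Python sketch of Qfit and wQfit. It assumes $\delta = 1$ for equal quartets and $-\tfrac{1}{2}$ otherwise, and the normalization $2\sum / (\sum + \sum)$ given for trees, which is our reading of the slide's formulas. Each tree is represented, hypothetically, as a dict mapping every 4-taxon set to its induced (topology, weight). The toy input uses only two of the five 4-taxon subsets of {A, ..., E}.

```python
def topo(a, b, c, d):
    """Quartet ab|cd as an order-independent value."""
    return frozenset({frozenset({a, b}), frozenset({c, d})})


def wqfit_q(q1, q2):
    """delta(q1, q2) * w(q1) * w(q2), with delta = 1 if equal and -1/2 otherwise."""
    (t1, w1), (t2, w2) = q1, q2
    delta = 1.0 if t1 == t2 else -0.5
    return delta * w1 * w2


def wqfit_tree(T1, T2):
    """T1, T2 map each 4-taxon set s to its induced (topology, weight)."""
    num = 2 * sum(wqfit_q(T1[s], T2[s]) for s in T1)
    den = sum(wqfit_q(T1[s], T1[s]) for s in T1) + sum(wqfit_q(T2[s], T2[s]) for s in T2)
    return num / den


def qfit(T1, T2):
    """Fraction of 4-taxon sets on which the two trees induce the same quartet."""
    return sum(T1[s][0] == T2[s][0] for s in T1) / len(T1)


s1, s2 = frozenset("ABCD"), frozenset("ABCE")
T1 = {s1: (topo("A", "B", "C", "D"), 1.0), s2: (topo("A", "B", "C", "E"), 0.1)}
T2 = {s1: (topo("A", "B", "C", "D"), 1.0), s2: (topo("A", "C", "B", "E"), 0.1)}
print(qfit(T1, T2), wqfit_tree(T1, T2))
```

The two toy trees disagree only on the light quartet. Qfit drops to 0.5, but wQfit stays near 1 ($1.99 / 2.02 \approx 0.985$), because the disagreement is down-weighted by $0.1 \times 0.1$.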
Properties of wQfit (with wQfit = wQfit_T as defined for trees)
- Two trees satisfy $T_1 = T_2$ if and only if $\mathrm{wQfit}(T_1, T_2) = 1$.
- For any two trees $T_1$ and $T_2$ on the same input species $X$, $\mathrm{wQfit}(T_1, T_2) \le 1$.
- Given a weighted tree $T_1$, let $T_2$ be obtained by assigning a random permutation of the input species $X$ to the leaves of $T_1$. Then $E[\mathrm{wQfit}(T_1, T_2)] = 0$.
Performance of wQMC
- RF (Robinson and Foulds 1981): the number of splits that appear in one tree but not in the other.
- Rewire: randomly replace the topology of a quartet with one of its two incorrect topologies, to add noise to the input.
- qrt-num-factor: for a taxa set of size n, the number of input quartets is $n^k$, where k is called the qrt-num-factor.
- Observation: wQMC can reconstruct a tree that is highly similar to the original, even when it receives noisy input.
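The two evaluation tools can be sketched in Python, assuming trees are nested tuples of taxon names and quartets are ((a, b), (c, d)) pairs. `rf_distance` counts the splits found in exactly one tree (some tools report half of this number). `add_noise` and its `rate` parameter are hypothetical, a per-quartet rewiring probability used only for illustration.

```python
import random


def clusters(tree, out):
    """Append the leaf set below every node of a nested-tuple tree to `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(clusters(child, out) for child in tree))
    out.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions (both sides have >= 2 taxa), stored order-independently."""
    found = []
    all_leaves = clusters(tree, found)
    result = set()
    for side in found:
        rest = all_leaves - side
        if len(side) >= 2 and len(rest) >= 2:
            result.add(frozenset((side, rest)))
    return result


def rf_distance(t1, t2):
    """Number of splits that appear in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


def rewire(quartet, rng):
    """Replace ab|cd with one of its two other topologies, chosen uniformly."""
    (a, b), (c, d) = quartet
    return rng.choice([((a, c), (b, d)), ((a, d), (b, c))])


def add_noise(quartets, rate, rng):
    """Rewire each quartet independently with probability `rate` (hypothetical parameter)."""
    return [rewire(q, rng) if rng.random() < rate else q for q in quartets]


t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
print(rf_distance(t1, t2))  # 2
```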
Comparison between Qfit and wQfit
- Qfit: the fraction of quartets that are the same in both trees. It does not reflect confidence in the quality of the quartets.
- Example: 30% of the quartets disagree with the constructed tree, so the Qfit score is 70%. We expect this 30% to consist mainly of unreliable quartets, whose total weight should be smaller, e.g., 10% of the total weight. The wQfit score should then reflect the low confidence in the wrong quartets, e.g., wQfit = 90%.
- Observation: wQfit adds information to the score by separating quartets according to their quality.
Comparison between QMC and wQMC
- Observation: weights reflect confidence in the quartet data, which allows wQMC to prioritize correct quartets, especially for noisy data.
- Lightweight quartets are more likely to have a wrong topology.
Thank you!