Weighted Quartets Phylogenetics


Yunan Luo

Paper: E. Avni, R. Cohen, and S. Snir. Weighted quartets phylogenetics. Systematic Biology, 2014. syu087

Problem: quartet-based supertree
[Figure: input quartet trees over the taxa A to E and the output tree that combines them]
Def: a set Q of quartets is compatible if there is a tree that induces every quartet in Q.
Goal: find the largest compatible subset of the given quartet set. This problem is NP-hard.

Outline
- Background: Quartet MaxCut (QMC)
- Weighted Quartet MaxCut (wQMC)
- Results of wQMC

Background: Quartet MaxCut (QMC)
Example: a cut in a graph
[Figure: graph on the nodes A, B, C, D with edge weights 2, 3, 1, 5]
Cut C = ({A, B}, {C, D})
Weight of the cut: w(C) = 3 + 1 = 4
Snir, Sagi, and Satish Rao. "Quartet MaxCut: a fast algorithm for amalgamating quartet trees." Molecular Phylogenetics and Evolution 62.1 (2012): 1-8.
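
The cut weight in this example can be checked with a few lines of Python. This is a minimal sketch: the slide gives only the edge weights 2, 3, 1, 5 and the result w(C) = 3 + 1 = 4, so the assignment of weights to specific edges below is an assumption chosen to reproduce that result.

```python
# Assumed edge layout (a 4-cycle); only the weights and the cut value
# w(C) = 4 come from the slide, the endpoints are hypothetical.
edges = {
    ("A", "B"): 2,
    ("A", "C"): 3,
    ("B", "D"): 1,
    ("C", "D"): 5,
}

def cut_weight(edges, side):
    """Sum the weights of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))

print(cut_weight(edges, {"A", "B"}))  # 4
```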

Quartet MaxCut (QMC): a heuristic method
Given a set of species (taxa) X and a quartet set Q, QMC builds a graph G(Q) = (V, E).
Nodes: V = X
Edges: for every quartet q in Q, add to G an edge for every pair of leaves in q.
- bad edges: the two edges that link sibling leaves
- good edges: the other four pairs
[Figure: a quartet on the leaves 1, 2, 3, 4 with its good and bad edges]
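
The edge construction above can be sketched as follows. The representation of a quartet ab|cd as the pair of pairs `((a, b), (c, d))` and the function name `quartet_graph` are illustrative choices, not taken from the paper.

```python
from collections import Counter

def quartet_graph(quartets):
    """Return (good, bad) edge multisets for quartets given as ((a, b), (c, d))."""
    good, bad = Counter(), Counter()
    for (a, b), (c, d) in quartets:
        # Bad edges join the two leaves of each sibling pair.
        bad[frozenset((a, b))] += 1
        bad[frozenset((c, d))] += 1
        # Good edges join a leaf of one pair to a leaf of the other pair.
        for u in (a, b):
            for v in (c, d):
                good[frozenset((u, v))] += 1
    return good, bad

# Two hypothetical quartets on the same four leaves: 12|34 and 13|24.
good, bad = quartet_graph([((1, 2), (3, 4)), ((1, 3), (2, 4))])
print(sum(good.values()), sum(bad.values()))  # 8 4
```

Each quartet contributes four good edges and two bad edges, so an edge between the same two taxa can be good for one quartet and bad for another.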

Quartet graph
[Figure: the graphs of individual quartets on the leaves 1 to 4, put together into one quartet graph]

Quartet MaxCut (QMC) algorithm
- Find a cut C in the quartet graph that maximizes the ratio between the number of good and bad edges in C.
- The cut defines a split (U, X\U) over the taxa set X.
- Apply recursively on U and X\U until the subset size is <= 4.
- Every split defines an edge in the constructed tree.


Contribution of this paper
- A weighted extension of QMC
- A scheme for associating weights to quartets
- A new measure of tree similarity

A weighted extension of QMC
Recall QMC: find a cut C in the quartet graph that maximizes the ratio between the number of good and bad edges in C.
Now suppose we are given a set of quartets with associated weights.
Question: what is the natural extension of QMC to handle weighted quartets?
Answer: find a cut C in the quartet graph that maximizes the ratio between the total weight of good edges and the total weight of bad edges in C.
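
The weighted objective can be illustrated on a toy input. This is a sketch under stated assumptions: exhaustive search over splits stands in for the heuristic cut-finding step of QMC, which the slides do not detail, and the triple format `((a, b), (c, d), weight)` as well as the names `cut_weights` and `best_cut` are hypothetical.

```python
from itertools import combinations

def cut_weights(quartets, side):
    """Return (good, bad): total weight of good and bad edges crossing the cut."""
    good = bad = 0.0
    for (a, b), (c, d), w in quartets:
        crosses = lambda u, v: (u in side) != (v in side)
        bad += w * (crosses(a, b) + crosses(c, d))
        good += w * sum(crosses(u, v) for u in (a, b) for v in (c, d))
    return good, bad

def best_cut(taxa, quartets):
    """Exhaustively search splits with at least two taxa on each side."""
    first, *rest = sorted(taxa)
    best_ratio, best_side = -1.0, None
    for k in range(1, len(rest) - 1):
        for extra in combinations(rest, k):
            side = {first, *extra}
            good, bad = cut_weights(quartets, side)
            # A cut with no bad edges is treated as infinitely good.
            ratio = good / bad if bad > 0 else float("inf")
            if ratio > best_ratio:
                best_ratio, best_side = ratio, side
    return best_ratio, best_side

# Two conflicting quartets: 12|34 with weight 1.0 and 13|24 with weight 0.1.
quartets = [((1, 2), (3, 4), 1.0), ((1, 3), (2, 4), 0.1)]
ratio, side = best_cut({1, 2, 3, 4}, quartets)
print(side)  # {1, 2}
```

With both weights set to 1.0, the splits {1, 2} and {1, 3} reach the same ratio, so the weights are what lets the search prefer the heavier quartet. Setting every weight to 1.0 recovers the unweighted QMC objective.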

Prioritize between quartets
[Figure: four quartets over the taxa 1 to 5, with weights 1.0, 1.0, 0.1, and 0.1]
No tree satisfies them all simultaneously, so some optimization criterion is necessary.
- Construction without weights: satisfies 3 quartets, sum of weights 1.2
- Construction with weights: satisfies 2 quartets, sum of weights 2.0

A scheme for associating weights
For four taxa a, b, c, d with pairwise distances $d_{xy}$, let
$d_1 = d_{ab} + d_{cd}$, $d_2 = d_{ac} + d_{bd}$, $d_3 = d_{ad} + d_{bc}$.
We assume that $d_1 \le d_2 \le d_3$. The weight function of the quartet q = ab|cd is defined as
$$w(q) = \frac{d_3 - d_1}{\exp(d_3 - d_2)\, d_3}$$
Remarks:
- $d_3 - d_1$ is twice the length of the internal edge, so the quartet weight increases as the internal edge gets longer and the split becomes more significant.
- The weight becomes 0 if the quartet is unresolved, i.e., $d_3 - d_1 = 0$.
- As $d_3 - d_2 \to 0$, the data are more reliable and the weight becomes larger.
- In a tree, $d_3 - d_2 = 0$, so $w(q) = 1 - d_1 / d_3$.
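
A small numeric check of the weight function, assuming the form w(q) = (d3 - d1) / (exp(d3 - d2) * d3) with d1 <= d2 <= d3. The function name `quartet_weight`, its argument order, and the example distances are illustrative assumptions.

```python
import math

def quartet_weight(d_ab, d_cd, d_ac, d_bd, d_ad, d_bc):
    """Weight of the quartet supported by the smallest of the three pair sums."""
    d1, d2, d3 = sorted((d_ab + d_cd, d_ac + d_bd, d_ad + d_bc))
    if d3 == 0:  # all four taxa coincide; treated as unresolved (assumption)
        return 0.0
    return (d3 - d1) / (math.exp(d3 - d2) * d3)

# Hypothetical additive tree ab|cd: pendant edges 1.0, internal edge 0.5.
# Then d1 = 4 and d2 = d3 = 5, so d3 - d1 = 1 is twice the internal edge.
print(quartet_weight(2.0, 2.0, 2.5, 2.5, 2.5, 2.5))  # 0.2, i.e. 1 - d1/d3

# Star tree: all three sums are equal, so the quartet is unresolved.
print(quartet_weight(2.0, 2.0, 2.0, 2.0, 2.0, 2.0))  # 0.0
```

Sorting the three sums picks the topology with the smallest sum, which mirrors the relabeling implied by the assumption d1 <= d2 <= d3.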

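A new measure of tree similarity
Existing measure: Qfit (Estabrook 1985)
$$\mathrm{Qfit} = \frac{\#\ \text{shared quartets}}{\#\ \text{all possible quartets}}$$
New measure: wQfit (this paper)
For quartets $q_1, q_2$ on the same four taxa:
$$\mathrm{wQfit}_q(q_1, q_2) = \alpha \, w(q_1) \, w(q_2), \quad \text{where } \alpha = \begin{cases} 1 & q_1 = q_2 \\ -\tfrac{1}{2} & q_1 \ne q_2 \end{cases}$$
For trees:
$$\mathrm{wQfit}_T(T_1, T_2) = \frac{2 \sum_s \mathrm{wQfit}_q(T_{1,s}, T_{2,s})}{\sum_s \mathrm{wQfit}_q(T_{1,s}, T_{1,s}) + \sum_s \mathrm{wQfit}_q(T_{2,s}, T_{2,s})}$$
where s ranges over the subsets of the input species X with |s| = 4, and $T_{1,s}$ is the quartet of tree $T_1$ induced by s.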

Properties of wQfit
- Two trees satisfy $T_1 = T_2$ if and only if $\mathrm{wQfit}(T_1, T_2) = 1$.
- For any two trees $T_1$ and $T_2$ on the same input species X, $\mathrm{wQfit}(T_1, T_2) \le 1$.
- Given a weighted tree $T_1$, if $T_2$ is obtained by assigning a random permutation of the input species X to the leaves of $T_1$, then $E[\mathrm{wQfit}(T_1, T_2)] = 0$.
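
These properties can be checked on toy data. The sketch below assumes that the quartet-level score is w(q1) * w(q2) when the two topologies match and -w(q1) * w(q2) / 2 otherwise, which is one reading of the slide formula that is consistent with the zero-expectation property. The dictionary representation (a 4-taxon subset mapped to a topology label and a weight) and the example values are hypothetical.

```python
def wqfit_q(q1, q2):
    """Quartet-level score; each quartet is a (topology, weight) pair."""
    (top1, w1), (top2, w2) = q1, q2
    return w1 * w2 if top1 == top2 else -0.5 * w1 * w2

def wqfit_tree(t1, t2):
    """Tree-level score; t1 and t2 map each 4-taxon subset to a quartet."""
    num = 2 * sum(wqfit_q(t1[s], t2[s]) for s in t1)
    den = sum(wqfit_q(t1[s], t1[s]) for s in t1) + sum(wqfit_q(t2[s], t2[s]) for s in t2)
    return num / den

# Toy input: only two 4-taxon subsets are listed, not all subsets of a real tree.
t1 = {(1, 2, 3, 4): ("12|34", 0.8), (1, 2, 3, 5): ("12|35", 0.5)}
t2 = {(1, 2, 3, 4): ("12|34", 0.8), (1, 2, 3, 5): ("13|25", 0.5)}

print(wqfit_tree(t1, t1))  # 1.0
print(wqfit_tree(t1, t2) < 1.0)  # True

# A uniformly random topology matches with probability 1/3, so the mean is zero.
q = ("12|34", 0.8)
scores = [wqfit_q(q, (top, 0.5)) for top in ("12|34", "13|24", "14|23")]
print(abs(sum(scores)) < 1e-12)  # True
```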


Performance of wQMC
- RF (Robinson and Foulds 1981): the number of splits that differ between two trees.
- Rewire: randomly replace the topology of a quartet with one of its two incorrect topologies.
- qrt-num-factor: for a taxa set of size n, the number of input quartets is n^k, where k is called the qrt-num-factor.
Observations: wQMC can reconstruct a tree that is highly similar to the original, even when it receives noisy input.
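
The two evaluation ingredients defined above, RF and rewiring, can be sketched as follows. The split representation, the helper names, and the two example trees are assumptions made for illustration; they are not the experimental setup of the paper.

```python
import random

def rewire(quartet, rng):
    """Replace quartet ab|cd by one of its two incorrect topologies."""
    (a, b), (c, d) = quartet
    return rng.choice([((a, c), (b, d)), ((a, d), (b, c))])

def split(side, taxa, ref):
    """Canonical form of a bipartition: the side that excludes taxon `ref`."""
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side

def rf_distance(splits1, splits2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits1 ^ splits2)

taxa = {1, 2, 3, 4, 5}
tree_a = {split({1, 2}, taxa, 1), split({4, 5}, taxa, 1)}  # ((1,2),3,(4,5))
tree_b = {split({1, 3}, taxa, 1), split({4, 5}, taxa, 1)}  # ((1,3),2,(4,5))
print(rf_distance(tree_a, tree_b))  # 2

rng = random.Random(0)  # seeded so the noise is reproducible
noisy = rewire(((1, 2), (3, 4)), rng)
```

Only the nontrivial splits are stored, since the splits that separate a single leaf are shared by all trees on the same taxa.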

Comparison between Qfit and wQfit
- Qfit: the fraction of quartets that are equal in both trees. It does not reflect confidence in the quality of the quartets.
- Example: 30% of the quartets disagree with the constructed tree, so the Qfit score is 70%.
- We expect this 30% to be composed mainly of unreliable quartets, so their total weight should be smaller, e.g., 10%.
- We expect the wQfit score to reflect the low level of confidence in the wrong quartets, e.g., wQfit = 90%.
Observations: wQfit adds information to the score by segregating quartets according to quality.

Comparison between QMC and wQMC
Observations:
- Weights reflect confidence in the quartet data, allowing wQMC to prioritize correct quartets, especially for noisy data.
- Lightweight quartets are more prone to exhibit a wrong topology.

Thank you!