Notes 3 : Maximum Parsimony

MATH 833 - Fall 2012
Lecturer: Sebastien Roch
References: [SS03, Chapter 5], [DPV06, Chapter 8]

1 Beyond Perfect Phylogenies

Viewing (full, i.e., defined on all of $X$) binary characters as $X$-splits, the Splits-Equivalence Theorem and its proof via the Tree Popping procedure provide an algorithmic solution to the problem of checking whether a collection of binary characters is compatible and, if so, of constructing a minimal $X$-tree on which they are convex. (For a discussion of the more general non-binary, non-full problem, see [SS03, Chapters 4, 6].) However, typical data may not be compatible, and a more flexible approach is needed.

DEF 3.1 (Parsimony Score) Let $C$ be a character state space with $|C| \geq 2$, let $\chi$ be a (full) character on $X$, and let $\mathcal{T} = (T, \phi)$ be an $X$-tree with $T = (V, E)$. Let $\tilde\chi$ be an extension of $\chi$ to $V$. The changing number $\mathrm{ch}(\tilde\chi)$ is
$$\mathrm{ch}(\tilde\chi) = \left|\{\{u, v\} \in E : \tilde\chi(u) \neq \tilde\chi(v)\}\right|.$$
The parsimony score $l(\chi, \mathcal{T})$ of $\chi$ on $\mathcal{T}$ is the minimum value of $\mathrm{ch}(\tilde\chi)$ over all extensions $\tilde\chi$ of $\chi$ on $\mathcal{T}$. For a collection $\mathcal{C} = \{\chi_1, \ldots, \chi_k\}$ of characters, the parsimony score of $\mathcal{C}$ on $\mathcal{T}$ is
$$l(\mathcal{C}, \mathcal{T}) = \sum_{i=1}^k l(\chi_i, \mathcal{T}).$$
A maximum parsimony tree for $\mathcal{C}$ is an $X$-tree which minimizes $l(\mathcal{C}, \mathcal{T})$ over all $X$-trees. The corresponding parsimony score is denoted by $l(\mathcal{C})$.

A natural generalization of the parsimony score is obtained by considering a metric $\delta$ on $C$ and replacing $\mathrm{ch}(\tilde\chi)$ with
$$\sum_{e = \{u, v\} \in E} \delta(\tilde\chi(u), \tilde\chi(v)).$$
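To make Definition 3.1 concrete, here is a brute-force check on a toy quartet tree; the tree, the character, and all names below are illustrative choices, not data from the notes. We enumerate every extension of $\chi$ to the internal vertices and take the minimum changing number.

```python
from itertools import product

# Toy quartet tree on leaves a,b,c,d with internal vertices u,v,
# and a binary character chi on the leaf set X (hypothetical data).
edges = [("a", "u"), ("b", "u"), ("c", "v"), ("d", "v"), ("u", "v")]
internal = ["u", "v"]
states = [0, 1]
chi = {"a": 0, "b": 0, "c": 1, "d": 1}

def changing_number(ext):
    """ch: number of edges whose endpoints receive different states."""
    return sum(1 for x, y in edges if ext[x] != ext[y])

# Minimize ch over all extensions of chi to the internal vertices.
score = min(
    changing_number({**chi, **dict(zip(internal, assign))})
    for assign in product(states, repeat=len(internal))
)
print(score)  # the parsimony score l(chi, T); here 1
```

With two leaves in each state on opposite sides of the central edge, the extension $\tilde\chi(u) = 0$, $\tilde\chi(v) = 1$ places the single state change on that edge, so the score is 1.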

We then use the notation $l_\delta$.

Given a character $\chi$ on $X$, an $X$-tree $\mathcal{T}$, and a metric $\delta$ on $C$, one can compute the parsimony score $l_\delta(\chi, \mathcal{T})$ using a technique known as dynamic programming. Choose an arbitrary root $\rho$ of $T$. If $v = \phi(x)$ for some $x \in X$, for each $\alpha \in C$ let
$$l(v, \alpha) = \begin{cases} 0, & \text{if } \chi(x) = \alpha,\\ +\infty, & \text{otherwise.} \end{cases}$$
(By convention, we assume that the parsimony score is $+\infty$ if two different states are assigned to the same node of $\mathcal{T}$.) For all $v \notin \phi(X)$, let $v_1, \ldots, v_m$ be the children of $v$ (i.e., the immediate descendants of $v$ in the partial order defined by the above rooting of $T$) and for each $\alpha \in C$ define
$$l(v, \alpha) = \sum_{i=1}^m \min_{\beta \in C} \{\delta(\alpha, \beta) + l(v_i, \beta)\}.$$
Then it is straightforward to check by induction that
$$l_\delta(\chi, \mathcal{T}) = \min_{\alpha \in C} l(\rho, \alpha),$$
which can be computed recursively from the leaves up to the root. For a collection of characters $\mathcal{C}$, one can compute $l_\delta(\mathcal{C}, \mathcal{T})$ by computing the parsimony score of each character separately. Computing $l_\delta(\mathcal{C}, \mathcal{T})$ is known as the Fixed Tree Problem.

As it turns out, computing $l_\delta(\mathcal{C})$ is much harder and, as we now explain, no efficient procedure is likely to exist for it.

2 Computational Complexity: A Brief Overview

We will use the notation $g(n) = O(f(n))$ to indicate that there is $K > 0$ such that $g(n) \leq K f(n)$ for all $n \geq 1$. The following definitions are intentionally informal; for more details, see [Pap94].

In a search problem, we are given an instance I and we are asked to find a solution S, that is, an object that meets certain requirements (or to indicate that no such solution exists). For example, in the SAT problem, we are given a formula $f$ over a Boolean vector $x = (x_1, \ldots, x_n)$ and we are asked to find an assignment for $x$ such that $f(x)$ is TRUE, if such an assignment exists.
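Although finding a satisfying assignment may be hard, checking a proposed one is easy: one pass over the formula suffices. A small sketch for formulas in conjunctive normal form; the clause encoding below is an illustrative choice, not part of the notes.

```python
# A CNF formula is a list of clauses; a clause is a list of signed
# literals, where +i stands for x_i and -i for "not x_i" (1-indexed).

def satisfies(clauses, x):
    """Return True iff assignment x (a dict i -> bool) makes every clause true."""
    return all(
        any(x[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# f = (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
f = [[1, -2], [2, 3], [-1, -3]]
x = {1: True, 2: True, 3: False}
print(satisfies(f, x))  # True: every clause contains a true literal
```

The check runs in time linear in the length of $f$, which is exactly the kind of efficient verification that membership in NP asks for.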

An algorithm A for a search problem is said to be efficient if the number of elementary operations it performs on any instance I is bounded by a polynomial in the size of the input; that is, there is a constant $K > 0$ such that the running time of A on an input of size $n$ is $O(n^K)$.

EX 3.2 (Fixed Tree Problem: Dynamic Programming) Consider again the dynamic programming algorithm for solving the Fixed Tree Problem. For each vertex and each character state, we must perform a calculation which takes $O(m |C|)$, where $m$ is the number of children of that particular vertex. Summing over all vertices and character states, we get a running time of $O(|V| |C|^2)$. The input here is a character, a tree, and a metric, the size of which is $O(|X| + |V| + |C|^2)$. Hence, the dynamic programming procedure is efficient.

EX 3.3 (Maximum Parsimony: Exhaustive Search) Suppose we are given a collection of characters $\mathcal{C} = \{\chi_1, \ldots, \chi_k\}$ on $X$ and we seek to compute $l_\delta^{(2)}(\mathcal{C})$ for a metric $\delta$, where $l_\delta^{(2)}$ indicates the maximum parsimony score restricted to binary phylogenetic trees on $X$. The input size is $O(k |X| + |C|^2)$. If we perform an exhaustive search over all binary phylogenetic trees and use dynamic programming on each of them to compute its parsimony score, the running time is $O(b(|X|) |V| |C|^2)$, where $b(|X|)$ denotes the number of binary phylogenetic trees on $X$; this is not polynomial in the size of the input.

EXER 3.4 (Tree Popping) Show that the Tree Popping algorithm is efficient. What is its running time?

The class of all search problems for which there exists an efficient algorithm is called P. Another important class of search problems is NP, which is defined as those problems for which a solution can be verified efficiently. For example, SAT is in NP since, given a solution $x$, it is easy to check whether $f(x)$ is TRUE. (The standard definition involves decision problems, which we will not discuss here.) An important conjecture is that $P \neq NP$, that is, that there exist problems in NP for which there is no efficient algorithm.
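As an aside, the dynamic program whose running time EX 3.2 analyzed can be sketched in a few lines. This is a minimal illustration assuming the rooted tree is given as a child list; all names here are hypothetical, and $\delta$ is passed as a function on pairs of states.

```python
# Leaves-up dynamic program for one character on a fixed rooted tree.
INF = float("inf")

def parsimony_score(root, children, leaf_state, states, delta):
    """Compute l_delta(chi, T) via the recursion over l(v, alpha)."""
    def l(v):
        if v not in children or not children[v]:  # leaf: v = phi(x)
            return {a: (0 if leaf_state[v] == a else INF) for a in states}
        tables = [l(c) for c in children[v]]
        # l(v, a) = sum over children of min_b { delta(a, b) + l(child, b) }
        return {
            a: sum(min(delta(a, b) + t[b] for b in states) for t in tables)
            for a in states
        }
    return min(l(root).values())  # min over alpha of l(rho, alpha)

# A toy quartet rooted at the internal vertex u (hypothetical data).
children = {"u": ["a", "b", "v"], "v": ["c", "d"]}
chi = {"a": 0, "b": 0, "c": 1, "d": 1}
hamming = lambda a, b: 0 if a == b else 1
print(parsimony_score("u", children, chi, [0, 1], hamming))  # 1
```

Each internal vertex is processed in $O(m |C|)$ time per state, matching the count in EX 3.2.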
In particular, it is possible to identify a sub-class of NP consisting of the hardest problems within NP, in the sense that the existence of an efficient algorithm for any such problem would lead to an efficient algorithm for every problem in NP. Such problems are called NP-complete, and defining them requires the notion of a reduction. A reduction from a search problem A to a search problem B consists of an efficient algorithm f that transforms any instance I of A into an instance f(I) of B, together with another efficient algorithm h that maps any solution S of f(I) back into a solution h(S) of I.

See [DPV06] for several examples of reductions. Then the class of NP-complete problems is defined as follows: a search problem is NP-complete if all other search problems in NP reduce to it. It is well known that SAT is NP-complete.

3 Maximum Parsimony is NP-complete

A natural way to transform a minimization problem such as Maximum Parsimony into a search problem is to add to the input a threshold g and ask for a solution with objective value at most g. Then the following was shown by Graham and Foulds:

THM 3.5 (Complexity of Maximum Parsimony) The search problem corresponding to Maximum Parsimony is NP-complete.

In other words, it is unlikely that an efficient algorithm exists for Maximum Parsimony.

Further reading

The definitions and results discussed here were taken from Chapter 5 of [SS03] and Chapter 8 of [DPV06]. The rigorous theory of computational complexity is described at length in [Pap94]. The proof that Maximum Parsimony is NP-complete can be found in [GF82].

References

[DPV06] S. Dasgupta, C. Papadimitriou, and U. Vazirani. Algorithms. McGraw-Hill, 2006.

[GF82] R. L. Graham and L. R. Foulds. Unlikelihood that minimal phylogenies for a realistic biological study can be constructed in reasonable computational time. Math. Biosci., 60:133-142, 1982.

[Pap94] Christos H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, MA, 1994.

[SS03] Charles Semple and Mike Steel. Phylogenetics, volume 24 of Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, Oxford, 2003.