Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden

Size: px
Start display at page:

Download "Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden"

Transcription

1 Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1

2 Results Multiple Alignment with SP-score Star Alignment Tree Alignment (with given phylogeny) are NP-hard under all metrics! 2

3 Pairwise Alignment Mutations: substitutions, insertions, and deletions. Input: Two related strings s 1 and s 2. Output: The least number of mutations needed to s 1 s 2. s 1 = a a g a c t s 2 = a g t g c t 3

4 Pairwise Alignment Mutations: substitutions, insertions, and deletions. Input: Two related strings s 1 and s 2. Output: The least number of mutations needed to s 1 s 2. 2 l matrix A s 1 = a a g a c t s 2 = a g t g c t d A (s 1, s 2 ) = 3 mutations 3

5 Metric symbol distance Unit metric - binary alphabet (the edit distance) Σ

6 Metric symbol distance Unit metric - binary alphabet (the edit distance) Σ Insertions and deletions occur less frequently! 4

7 Metric symbol distance Unit metric - binary alphabet (the edit distance) Σ Insertions and deletions occur less frequently! General metric - α, β, γ have metric properties (identity, symmetry, triangle ineq.) Σ α β 1 α 0 γ β γ 0 4

8 Multiple Alignment When sequence similarity is low pairwise alignment may fail to find similarities. 5

9 Multiple Alignment When sequence similarity is low pairwise alignment may fail to find similarities. Considering several strings may be helpful. Input: k strings s 1... s k. Output: k l matrix A. a a g a - c t - - g a g c t a c g a g c - a - g t g c t 5

10 Multiple Alignment When sequence similarity is low pairwise alignment may fail to find similarities. Considering several strings may be helpful. Input: k strings s 1... s k. Output: k l matrix A. a a g a - c t - - g a g c t a c g a g c - a - g t g c t How do we score columns? 5

11 Sum of Pairs score (SP-score) Let the cost be the sum of costs for all pairs of rows: k i=1 k j=i d A (s i, s j ), where d A (s i, s j ) is the pairwise distance between the rows containing strings s i and s j. 6

12 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics

13 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics

14 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics

15 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics In particular it was unknown for the unit metric.

16 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics In particular it was unknown for the unit metric. Here: All binary or larger alphabets under all metrics!

17 Independent R3 Set Independent set in three regular graphs, i.e. all vertices have degree 3, is NP-hard. Input: A three regular graph G = (V, E) and a integer c. Output: Yes if there is an independent set V V of size c, otherwise No. 8

18 Independent R3 Set Independent set in three regular graphs, i.e. all vertices have degree 3, is NP-hard. Input: A three regular graph G = (V, E) and a integer c. Output: Yes if there is an independent set V V of size c, otherwise No. 8

19 Reduction Independent R3 Set G = (V, E) c set of size c V SP-score Set of strings K matrix of cost K A 9

20 Reduction Independent R3 Set G = (V, E) c set of size c V SP-score Set of strings K matrix of cost K A v 1 v 4 v 2 v 3 v 1 v 2 v 3 v

21 Gadgetering 1. Each vertex is represented by a column (n vertex columns). 2. c vertex columns are picked. 3. Each edge is represented by an edge string. v 1 v 2... v i... vn

22 Template strings Vertex columns We add very many template strings: T = (10 b ) n 1 1 v 1 v 2... v i... vn Fact: Identical strings are aligned identically in an optimal alignment. 11

23 Pick strings V V We add very many pick strings: P = 1 c. v 1 v 2... v i... vn Fact: c vertex columns are picked since this is the alignment with least missmatches. 12

24 Edge strings Property: If there is an edge (v i, v j ) then both v i and v j can not be part of the independent set. E ij = 0 s (00 b ) i 1 10 b s (00 b ) j i 1 10 b (00 b ) n j 1 00 s v 1 v i v j... vn good 0 s s good s s bad 0 s s s 13

25 Independent set c Alignment K ( K = (n c)γb 2 + b(n 1)βb 2 + (sβ + nα)bm + cα + (s + b(n ) 1) + n c 2)β + 2γ bm 3cb(α + γ β) + (sβ + 2α)m 2 14

26 Independent set c Alignment K 1. Remember three regular graph. So there are atmost three ones in each vertex column. 2. If a column with only two ones is picked then there is a missmatch extra for each pick string ( score > K). 14

27 Example 1. Atmost three ones in each vertex column. 2. Pick strings pick columns with most ones. v 1 v 4 v 2 v 3 v 1 v 2 v 3 v E E E E E E

28 Example 1. Atmost three ones in each vertex column. 2. Pick strings pick columns with most ones. v 1 v 4 v 2 v 3 v 1 v 2 v 3 v E E E E E E

29 Example 1. Atmost three ones in each vertex column. 2. Pick strings pick columns with most ones. v 1 v 4 v 2 v 3 v 1 v 2 v 3 v E E E E E E

30 Example 1. Atmost three ones in each vertex column. 2. Pick strings pick columns with most ones. v 1 v 4 v 2 v 3 v 1 v 2 v 3 v E E E E E E

31 Star Alignment SP-score not a good model of evolution. s 2 s 4 c s 1 s 3 16

32 Star Alignment SP-score not a good model of evolution. Input: k strings s 1... s k Output: string c minimizing d(c, s i ) i s 2 s 4? s 1 s 3 16

33 Star Alignment SP-score not a good model of evolution. Input: k strings s 1... s k Output: string c minimizing d(c, s i ) i s 2 s 4? s 1 s 3 Symbol distance metric = Pairwise alignment distance metric Star Alignment is a special case of Steiner Star in a metric space. 16

34 Earlier results - Star Alignment Wang and Jiang 94 - Non-metric APX-complete Li, Ma, Wang 99 - Constant number gaps, NP-hard and PTAS Sim and Park 01 - Specific metric NP-hard

35 Earlier results - Star Alignment Wang and Jiang 94 - Non-metric APX-complete Li, Ma, Wang 99 - Constant number gaps, NP-hard and PTAS Sim and Park 01 - Specific metric NP-hard

36 Earlier results - Star Alignment Wang and Jiang 94 - Non-metric APX-complete Li, Ma, Wang 99 - Constant number gaps, NP-hard and PTAS Sim and Park 01 - Specific metric NP-hard

37 Earlier results - Star Alignment Wang and Jiang 94 - Non-metric APX-complete Li, Ma, Wang 99 - Constant number gaps, NP-hard and PTAS Sim and Park 01 - Specific metric NP-hard Here: All binary or larger alphabets under all metrics!

38 Reduction Vertex Cover G = (V, E) minimum cover V Star Alignment Set of strings minimum string c = DDCDD 18

39 Reduction Vertex Cover G = (V, E) minimum cover V Star Alignment Set of strings minimum string c = DDCDD v 1 v 2 v 3... v n C =...?...?...?...?... 18

40 Reduction Vertex Cover G = (V, E) minimum cover V Star Alignment Set of strings minimum string c = DDCDD v 1 v 2 v 3... v n C V = C V - Only 1 s. 18

41 Reduction Vertex Cover G = (V, E) minimum cover V Star Alignment Set of strings minimum string c = DDCDD v 1 v 2 v 3... v n C = C V - Only 1 s. C - Only 0 s. 18

42 Construction Idea (E, G) c = DDCDD (E, S ab ) c = vertex cover G minimum cover S ab E E c G E S a b E G G String minimizing i d(c, s i) minimum cover 19

43 Canonical Structure E = DDC V DD G = DDC DD Differ only in the vertex positions. 20

44 Canonical Structure E = DDC V DD G = DDC DD Differ only in the vertex positions. Many such pairs will vote for the canonical structure. E G c E G 20

45 Canonical Structure E = DDC V DD G = DDC DD Differ only in the vertex positions. Many such pairs will vote for the canonical structure. Vote even in the vertex positions! E G c E G 20

46 Canonical Structure E = DDC V DD G = DDC DD Differ only in the vertex positions. Many such pairs will vote for the canonical structure. Vote even in the vertex positions! E G c E G c = DDCDD 20

47 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b v 1... v a... v b... v n C a =

48 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b v 1... v a... v b... v n C b =

49 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b E votes 1 in all vertex positions. 21

50 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b E votes 1 in all vertex positions. S ab votes 0 in all except vertex position a or b. D D C D D C a D C b C a D C b 21

51 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b E votes 1 in all vertex positions. S ab votes 0 in all except vertex position a or b. D D C D D C a D C b C a D C b Either vertex v a or vertex v b is part of the cover. 21

52 Minimal String Minimum Cover (E, G) c = DDCDD (E, S ab ) c = vertex cover S ab E E c G E S a b E G 22

53 Minimal String Minimum Cover (E, G) c = DDCDD (E, S ab ) c = vertex cover S ab E E c G E S a b E G G G = DDC DD penalizes each 1 in vertex positions. 22

54 Minimal String Minimum Cover (E, G) c = DDCDD (E, S ab ) c = vertex cover S ab E E c G E S a b E G G G = DDC DD penalizes each 1 in vertex positions. String minimizing i d(c, s i) minimum cover 22

55 Tree Alignment (given phylogeny) Star Alignment not a good model of evolution. p 1 p 2 p 3 s 1 s 2 s 3 s 4 23

56 Tree Alignment (given phylogeny) Star Alignment not a good model of evolution. Input: k strings s 1... s k phylogeny T Output: strings p 1... p k 1 min (a,b) E(T ) d(a, b)??? s 1 s 2 s 3 s 4 23

57 Tree Alignment (given phylogeny) Star Alignment not a good model of evolution. Input: k strings s 1... s k phylogeny T Output: strings p 1... p k 1 min (a,b) E(T ) d(a, b)??? s 1 s 2 s 3 s 4 Symbol distance metric = Pairwise alignment distance metric Tree Alignment is a special case of Steiner Tree in a metric space. 23

58 Construction Idea Same idea as for Star Alignment. r base comp. inbetween each branch E G E G E G r base comp. E G E E G r base comp. E G G E S ij m branches S i j 24

59 Overview and Open problems Problem Here Approx SP-score all metrics 2-approx [GPBL] Star Alignment all metrics 2-approx Tree Alignment all metrics PTAS [WJGL] (given phylogeny) 25

60 Overview and Open problems Problem Here Approx SP-score all metrics 2-approx [GPBL] Star Alignment all metrics 2-approx Tree Alignment all metrics PTAS [WJGL] (given phylogeny) Consensus Patterns NP-hard PTAS [LMW] Substring Parsimony NP-hard PTAS [ WJGL] 25

61 Acknowledgments My advisor Prof. Jens Lagergren Prof. Benny for hosting me Thanks! 26

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny

More information

ON THE APPROXIMABILITY OF

ON THE APPROXIMABILITY OF APPROX October 30, 2018 Overview 1 2 3 4 5 6 Closing remarks and questions What is an approximation algorithm? Definition 1.1 An algorithm A for an optimization problem X is an a approximation algorithm

More information

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17: Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:

More information

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55 Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise

More information

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction. Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal

More information

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary

CSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch

More information

Theory of Computation Time Complexity

Theory of Computation Time Complexity Theory of Computation Time Complexity Bow-Yaw Wang Academia Sinica Spring 2012 Bow-Yaw Wang (Academia Sinica) Time Complexity Spring 2012 1 / 59 Time for Deciding a Language Let us consider A = {0 n 1

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Combinatorial Optimization

Combinatorial Optimization Combinatorial Optimization Problem set 8: solutions 1. Fix constants a R and b > 1. For n N, let f(n) = n a and g(n) = b n. Prove that f(n) = o ( g(n) ). Solution. First we observe that g(n) 0 for all

More information

Evolutionary Tree Analysis. Overview

Evolutionary Tree Analysis. Overview CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

More information

Implementing Approximate Regularities

Implementing Approximate Regularities Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National

More information

Math.3336: Discrete Mathematics. Chapter 9 Relations

Math.3336: Discrete Mathematics. Chapter 9 Relations Math.3336: Discrete Mathematics Chapter 9 Relations Instructor: Dr. Blerina Xhabli Department of Mathematics, University of Houston https://www.math.uh.edu/ blerina Email: blerina@math.uh.edu Fall 2018

More information

Non-Approximability Results (2 nd part) 1/19

Non-Approximability Results (2 nd part) 1/19 Non-Approximability Results (2 nd part) 1/19 Summary - The PCP theorem - Application: Non-approximability of MAXIMUM 3-SAT 2/19 Non deterministic TM - A TM where it is possible to associate more than one

More information

CMPT307: Complexity Classes: P and N P Week 13-1

CMPT307: Complexity Classes: P and N P Week 13-1 CMPT307: Complexity Classes: P and N P Week 13-1 Xian Qiu Simon Fraser University xianq@sfu.ca Strings and Languages an alphabet Σ is a finite set of symbols {0, 1}, {T, F}, {a, b,..., z}, N a string x

More information

Integer Programming for Phylogenetic Network Problems

Integer Programming for Phylogenetic Network Problems Integer Programming for Phylogenetic Network Problems D. Gusfield University of California, Davis Presented at the National University of Singapore, July 27, 2015.! There are many important phylogeny problems

More information

Value-Ordering and Discrepancies. Ciaran McCreesh and Patrick Prosser

Value-Ordering and Discrepancies. Ciaran McCreesh and Patrick Prosser Value-Ordering and Discrepancies Maintaining Arc Consistency (MAC) Achieve (generalised) arc consistency (AC3, etc). If we have a domain wipeout, backtrack. If all domains have one value, we re done. Pick

More information

The Intractability of Computing the Hamming Distance

The Intractability of Computing the Hamming Distance The Intractability of Computing the Hamming Distance Bodo Manthey and Rüdiger Reischuk Universität zu Lübeck, Institut für Theoretische Informatik Wallstraße 40, 23560 Lübeck, Germany manthey/reischuk@tcs.uni-luebeck.de

More information

The Complexity of Probabilistic Lobbying

The Complexity of Probabilistic Lobbying Gábor Erdélyi 1, Henning Fernau 2, Judy Goldsmith 3 Nicholas Mattei 3, Daniel Raible 2, and Jörg Rothe 1 1 ccc.cs.uni-duesseldorf.de/~{erdelyi,rothe} 2 Univ. Trier, FB 4 Abteilung Informatik www.informatik.uni-trier.de/~{fernau,raible},

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

7 Distances. 7.1 Metrics. 7.2 Distances L p Distances

7 Distances. 7.1 Metrics. 7.2 Distances L p Distances 7 Distances We have mainly been focusing on similarities so far, since it is easiest to explain locality sensitive hashing that way, and in particular the Jaccard similarity is easy to define in regards

More information

Combinatorial optimization problems

Combinatorial optimization problems Combinatorial optimization problems Heuristic Algorithms Giovanni Righini University of Milan Department of Computer Science (Crema) Optimization In general an optimization problem can be formulated as:

More information

PCPs and Inapproximability Gap-producing and Gap-Preserving Reductions. My T. Thai

PCPs and Inapproximability Gap-producing and Gap-Preserving Reductions. My T. Thai PCPs and Inapproximability Gap-producing and Gap-Preserving Reductions My T. Thai 1 1 Hardness of Approximation Consider a maximization problem Π such as MAX-E3SAT. To show that it is NP-hard to approximation

More information

Consensus Optimizing Both Distance Sum and Radius

Consensus Optimizing Both Distance Sum and Radius Consensus Optimizing Both Distance Sum and Radius Amihood Amir 1, Gad M. Landau 2, Joong Chae Na 3, Heejin Park 4, Kunsoo Park 5, and Jeong Seop Sim 6 1 Bar-Ilan University, 52900 Ramat-Gan, Israel 2 University

More information

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming

More information

Structure-Based Comparison of Biomolecules

Structure-Based Comparison of Biomolecules Structure-Based Comparison of Biomolecules Benedikt Christoph Wolters Seminar Bioinformatics Algorithms RWTH AACHEN 07/17/2015 Outline 1 Introduction and Motivation Protein Structure Hierarchy Protein

More information

Hardness of Approximation

Hardness of Approximation Hardness of Approximation We have seen several methods to find approximation algorithms for NP-hard problems We have also seen a couple of examples where we could show lower bounds on the achievable approxmation

More information

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter

Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1

More information

Network Design and Game Theory Spring 2008 Lecture 6

Network Design and Game Theory Spring 2008 Lecture 6 Network Design and Game Theory Spring 2008 Lecture 6 Guest Lecturer: Aaron Archer Instructor: Mohammad T. Hajiaghayi Scribe: Fengming Wang March 3, 2008 1 Overview We study the Primal-dual, Lagrangian

More information

BBM402-Lecture 11: The Class NP

BBM402-Lecture 11: The Class NP BBM402-Lecture 11: The Class NP Lecturer: Lale Özkahya Resources for the presentation: http://ocw.mit.edu/courses/electrical-engineering-andcomputer-science/6-045j-automata-computability-andcomplexity-spring-2011/syllabus/

More information

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved. Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved. 1 Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should

More information

Multiple Sequence Alignment

Multiple Sequence Alignment Multiple Sequence Alignment Multiple Alignment versus Pairwise Alignment Up until now we have only tried to align two sequences. What about more than two? And what for? A faint similarity between two sequences

More information

Approximation algorithms and mechanism design for minimax approval voting

Approximation algorithms and mechanism design for minimax approval voting Approximation algorithms and mechanism design for minimax approval voting Ioannis Caragiannis Dimitris Kalaitzis University of Patras Vangelis Markakis Athens University of Economics and Business Outline

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas Forward Algorithm For Markov chains we calculate the probability of a sequence, P(x) How

More information

Lecture 18 Classification of Dynkin Diagrams

Lecture 18 Classification of Dynkin Diagrams 18745 Introduction to Lie Algebras October 21, 2010 Lecture 18 Classification of Dynkin Diagrams Prof Victor Kac Scribe: Benjamin Iriarte 1 Examples of Dynkin diagrams In the examples that follow, we will

More information

Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions

Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions Copyright c 2015 Andrew Klapper. All rights reserved. 1. For the functions satisfying the following three recurrences, determine which is the

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

Section Summary. Relations and Functions Properties of Relations. Combining Relations

Section Summary. Relations and Functions Properties of Relations. Combining Relations Chapter 9 Chapter Summary Relations and Their Properties n-ary Relations and Their Applications (not currently included in overheads) Representing Relations Closures of Relations (not currently included

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

k-means Clustering Raten der µi Neubestimmung µi Zuordnungsschritt Konvergenz

k-means Clustering Raten der µi Neubestimmung µi Zuordnungsschritt Konvergenz k-means Clustering Neubestimmung µi Konvergenz Raten der µi Zuordnungsschritt Sebastian Böcker Bioinformatik und Genomforschung Clustering Microarrays 1 Parameterized algorithms and FPT we want to find

More information

Introduction to spectral alignment

Introduction to spectral alignment SI Appendix C. Introduction to spectral alignment Due to the complexity of the anti-symmetric spectral alignment algorithm described in Appendix A, this appendix provides an extended introduction to the

More information

NP and Computational Intractability

NP and Computational Intractability NP and Computational Intractability 1 Polynomial-Time Reduction Desiderata'. Suppose we could solve X in polynomial-time. What else could we solve in polynomial time? don't confuse with reduces from Reduction.

More information

Definition: A binary relation R from a set A to a set B is a subset R A B. Example:

Definition: A binary relation R from a set A to a set B is a subset R A B. Example: Chapter 9 1 Binary Relations Definition: A binary relation R from a set A to a set B is a subset R A B. Example: Let A = {0,1,2} and B = {a,b} {(0, a), (0, b), (1,a), (2, b)} is a relation from A to B.

More information

Maths GCSE Langdon Park Foundation Calculator pack A

Maths GCSE Langdon Park Foundation Calculator pack A Maths GCSE Langdon Park Foundation Calculator pack A Name: Class: Date: Time: 96 minutes Marks: 89 marks Comments: Q1. The table shows how 25 students travel to school. Walk Bus Car Taxi 9 8 7 1 Draw a

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Comparing whole genomes

Comparing whole genomes BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

More information

1 Alphabets and Languages

1 Alphabets and Languages 1 Alphabets and Languages Look at handout 1 (inference rules for sets) and use the rules on some examples like {a} {{a}} {a} {a, b}, {a} {{a}}, {a} {{a}}, {a} {a, b}, a {{a}}, a {a, b}, a {{a}}, a {a,

More information

Algorithm Analysis Divide and Conquer. Chung-Ang University, Jaesung Lee

Algorithm Analysis Divide and Conquer. Chung-Ang University, Jaesung Lee Algorithm Analysis Divide and Conquer Chung-Ang University, Jaesung Lee Introduction 2 Divide and Conquer Paradigm 3 Advantages of Divide and Conquer Solving Difficult Problems Algorithm Efficiency Parallelism

More information

Approximation Algorithms and Hardness of Approximation. IPM, Jan Mohammad R. Salavatipour Department of Computing Science University of Alberta

Approximation Algorithms and Hardness of Approximation. IPM, Jan Mohammad R. Salavatipour Department of Computing Science University of Alberta Approximation Algorithms and Hardness of Approximation IPM, Jan 2006 Mohammad R. Salavatipour Department of Computing Science University of Alberta 1 Introduction For NP-hard optimization problems, we

More information

Bio nformatics. Lecture 3. Saad Mneimneh

Bio nformatics. Lecture 3. Saad Mneimneh Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per

More information

Bio nformatics. Lecture 16. Saad Mneimneh

Bio nformatics. Lecture 16. Saad Mneimneh Bio nformatics Lecture 16 DNA sequencing To sequence a DNA is to obtain the string of bases that it contains. It is impossible to sequence the whole DNA molecule directly. We may however obtain a piece

More information

Aside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n

Aside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n Aside: Golden Ratio Golden Ratio: A universal law. Golden ratio φ = lim n a n+b n a n = 1+ 5 2 a n+1 = a n + b n, b n = a n 1 Ruta (UIUC) CS473 1 Spring 2018 1 / 41 CS 473: Algorithms, Spring 2018 Dynamic

More information

Strong Mathematical Induction

Strong Mathematical Induction Strong Mathematical Induction Lecture 23 Section 5.4 Robb T. Koether Hampden-Sydney College Mon, Feb 24, 2014 Robb T. Koether (Hampden-Sydney College) Strong Mathematical Induction Mon, Feb 24, 2014 1

More information

Graph. Supply Vertices and Demand Vertices. Supply Vertices. Demand Vertices

Graph. Supply Vertices and Demand Vertices. Supply Vertices. Demand Vertices Partitioning Graphs of Supply and Demand Generalization of Knapsack Problem Takao Nishizeki Tohoku University Graph Supply Vertices and Demand Vertices Supply Vertices Demand Vertices Graph Each Supply

More information

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems

Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Rani M. R, Mohith Jagalmohanan, R. Subashini Binary matrices having simultaneous consecutive

More information

NP-Complete Problems. Complexity Class P. .. Cal Poly CSC 349: Design and Analyis of Algorithms Alexander Dekhtyar..

NP-Complete Problems. Complexity Class P. .. Cal Poly CSC 349: Design and Analyis of Algorithms Alexander Dekhtyar.. .. Cal Poly CSC 349: Design and Analyis of Algorithms Alexander Dekhtyar.. Complexity Class P NP-Complete Problems Abstract Problems. An abstract problem Q is a binary relation on sets I of input instances

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

NP-Completeness. Andreas Klappenecker. [based on slides by Prof. Welch]

NP-Completeness. Andreas Klappenecker. [based on slides by Prof. Welch] NP-Completeness Andreas Klappenecker [based on slides by Prof. Welch] 1 Prelude: Informal Discussion (Incidentally, we will never get very formal in this course) 2 Polynomial Time Algorithms Most of the

More information

X-MA2C01-1: Partial Worked Solutions

X-MA2C01-1: Partial Worked Solutions X-MAC01-1: Partial Worked Solutions David R. Wilkins May 013 1. (a) Let A, B and C be sets. Prove that (A \ (B C)) (B \ C) = (A B) \ C. [Venn Diagrams, by themselves without an accompanying logical argument,

More information

Efficient High-Similarity String Comparison: The Waterfall Algorithm

Efficient High-Similarity String Comparison: The Waterfall Algorithm Efficient High-Similarity String Comparison: The Waterfall Algorithm Alexander Tiskin Department of Computer Science University of Warwick http://go.warwick.ac.uk/alextiskin Alexander Tiskin (Warwick)

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

ABHELSINKI UNIVERSITY OF TECHNOLOGY

ABHELSINKI UNIVERSITY OF TECHNOLOGY Approximation Algorithms Seminar 1 Set Cover, Steiner Tree and TSP Siert Wieringa siert.wieringa@tkk.fi Approximation Algorithms Seminar 1 1/27 Contents Approximation algorithms for: Set Cover Steiner

More information

Local Alignment of RNA Sequences with Arbitrary Scoring Schemes

Local Alignment of RNA Sequences with Arbitrary Scoring Schemes Local Alignment of RNA Sequences with Arbitrary Scoring Schemes Rolf Backofen 1, Danny Hermelin 2, ad M. Landau 2,3, and Oren Weimann 4 1 Institute of omputer Science, Albert-Ludwigs niversität Freiburg,

More information

Phylogeny: traditional and Bayesian approaches

Phylogeny: traditional and Bayesian approaches Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent

More information

CSE 421 NP-Completeness

CSE 421 NP-Completeness CSE 421 NP-Completeness Yin at Lee 1 Cook-Levin heorem heorem (Cook 71, Levin 73): 3-SA is NP-complete, i.e., for all problems A NP, A p 3-SA. (See CSE 431 for the proof) So, 3-SA is the hardest problem

More information

Pairwise sequence alignment

Pairwise sequence alignment Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL

More information

Counting Strategies: Inclusion-Exclusion, Categories

Counting Strategies: Inclusion-Exclusion, Categories Counting Strategies: Inclusion-Exclusion, Categories Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ May 4, 2016 A scheduling problem In one

More information

Dynamic programming. Curs 2015

Dynamic programming. Curs 2015 Dynamic programming. Curs 2015 Fibonacci Recurrence. n-th Fibonacci Term INPUT: n nat QUESTION: Compute F n = F n 1 + F n 2 Recursive Fibonacci (n) if n = 0 then return 0 else if n = 1 then return 1 else

More information

Advanced Algorithms 南京大学 尹一通

Advanced Algorithms 南京大学 尹一通 Advanced Algorithms 南京大学 尹一通 LP-based Algorithms LP rounding: Relax the integer program to LP; round the optimal LP solution to a nearby feasible integral solution. The primal-dual schema: Find a pair

More information

Phylogenetic Reconstruction with Insertions and Deletions

Phylogenetic Reconstruction with Insertions and Deletions Phylogenetic Reconstruction with Insertions and Deletions Alexandr Andoni, Mark Braverman, Avinatan Hassidim April 7, 2010 Abstract We study phylogenetic reconstruction of evolutionary trees, with three

More information

CSE101: Design and Analysis of Algorithms. Ragesh Jaiswal, CSE, UCSD

CSE101: Design and Analysis of Algorithms. Ragesh Jaiswal, CSE, UCSD Greedy s Greedy s Shortest path Claim 2: Let S be a subset of vertices containing s such that we know the shortest path length l(s, u) from s to any vertex in u S. Let e = (u, v) be an edge such that 1

More information

High Dimensional Search Min- Hashing Locality Sensi6ve Hashing

High Dimensional Search Min- Hashing Locality Sensi6ve Hashing High Dimensional Search Min- Hashing Locality Sensi6ve Hashing Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata September 8 and 11, 2014 High Support Rules vs Correla6on of

More information

July 18, Approximation Algorithms (Travelling Salesman Problem)

July 18, Approximation Algorithms (Travelling Salesman Problem) Approximation Algorithms (Travelling Salesman Problem) July 18, 2014 The travelling-salesman problem Problem: given complete, undirected graph G = (V, E) with non-negative integer cost c(u, v) for each

More information

Lecture 4: An FPTAS for Knapsack, and K-Center

Lecture 4: An FPTAS for Knapsack, and K-Center Comp 260: Advanced Algorithms Tufts University, Spring 2016 Prof. Lenore Cowen Scribe: Eric Bailey Lecture 4: An FPTAS for Knapsack, and K-Center 1 Introduction Definition 1.0.1. The Knapsack problem (restated)

More information

Enumeration Schemes for Words Avoiding Permutations

Enumeration Schemes for Words Avoiding Permutations Enumeration Schemes for Words Avoiding Permutations Lara Pudwell November 27, 2007 Abstract The enumeration of permutation classes has been accomplished with a variety of techniques. One wide-reaching

More information

SAT, NP, NP-Completeness

SAT, NP, NP-Completeness CS 473: Algorithms, Spring 2018 SAT, NP, NP-Completeness Lecture 22 April 13, 2018 Most slides are courtesy Prof. Chekuri Ruta (UIUC) CS473 1 Spring 2018 1 / 57 Part I Reductions Continued Ruta (UIUC)

More information

Generalized Splines. Madeline Handschy, Julie Melnick, Stephanie Reinders. Smith College. April 1, 2013

Generalized Splines. Madeline Handschy, Julie Melnick, Stephanie Reinders. Smith College. April 1, 2013 Smith College April 1, 213 What is a Spline? What is a Spline? are used in engineering to represent objects. What is a Spline? are used in engineering to represent objects. What is a Spline? are used

More information

Linear Programming. Scheduling problems

Linear Programming. Scheduling problems Linear Programming Scheduling problems Linear programming (LP) ( )., 1, for 0 min 1 1 1 1 1 11 1 1 n i x b x a x a b x a x a x c x c x z i m n mn m n n n n! = + + + + + + = Extreme points x ={x 1,,x n

More information

CSCI3390-Second Test with Solutions

CSCI3390-Second Test with Solutions CSCI3390-Second Test with Solutions April 26, 2016 Each of the 15 parts of the problems below is worth 10 points, except for the more involved 4(d), which is worth 20. A perfect score is 100: if your score

More information

Scheduling on Unrelated Parallel Machines. Approximation Algorithms, V. V. Vazirani Book Chapter 17

Scheduling on Unrelated Parallel Machines. Approximation Algorithms, V. V. Vazirani Book Chapter 17 Scheduling on Unrelated Parallel Machines Approximation Algorithms, V. V. Vazirani Book Chapter 17 Nicolas Karakatsanis, 2008 Description of the problem Problem 17.1 (Scheduling on unrelated parallel machines)

More information

Chapter 2. Reductions and NP. 2.1 Reductions Continued The Satisfiability Problem (SAT) SAT 3SAT. CS 573: Algorithms, Fall 2013 August 29, 2013

Chapter 2. Reductions and NP. 2.1 Reductions Continued The Satisfiability Problem (SAT) SAT 3SAT. CS 573: Algorithms, Fall 2013 August 29, 2013 Chapter 2 Reductions and NP CS 573: Algorithms, Fall 2013 August 29, 2013 2.1 Reductions Continued 2.1.1 The Satisfiability Problem SAT 2.1.1.1 Propositional Formulas Definition 2.1.1. Consider a set of

More information

Lecture 2: Pairwise Alignment. CG Ron Shamir

Lecture 2: Pairwise Alignment. CG Ron Shamir Lecture 2: Pairwise Alignment 1 Main source 2 Why compare sequences? Human hexosaminidase A vs Mouse hexosaminidase A 3 www.mathworks.com/.../jan04/bio_genome.html Sequence Alignment עימוד רצפים The problem:

More information

Hidden Markov Models. x 1 x 2 x 3 x K

Hidden Markov Models. x 1 x 2 x 3 x K Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)

More information

Proving languages to be nonregular

Proving languages to be nonregular Proving languages to be nonregular We already know that there exist languages A Σ that are nonregular, for any choice of an alphabet Σ. This is because there are uncountably many languages in total and

More information

arxiv:cs/ v1 [cs.cc] 21 May 2002

arxiv:cs/ v1 [cs.cc] 21 May 2002 Parameterized Intractability of Motif Search Problems arxiv:cs/0205056v1 [cs.cc] 21 May 2002 Michael R. Fellows Jens Gramm Rolf Niedermeier Abstract We show that Closest Substring, one of the most important

More information

Space Complexity. The space complexity of a program is how much memory it uses.

Space Complexity. The space complexity of a program is how much memory it uses. Space Complexity The space complexity of a program is how much memory it uses. Measuring Space When we compute the space used by a TM, we do not count the input (think of input as readonly). We say that

More information

Coloring Vertices and Edges of a Path by Nonempty Subsets of a Set

Coloring Vertices and Edges of a Path by Nonempty Subsets of a Set Coloring Vertices and Edges of a Path by Nonempty Subsets of a Set P.N. Balister E. Győri R.H. Schelp April 28, 28 Abstract A graph G is strongly set colorable if V (G) E(G) can be assigned distinct nonempty

More information

Relations. Relations of Sets N-ary Relations Relational Databases Binary Relation Properties Equivalence Relations. Reading (Epp s textbook)

Relations. Relations of Sets N-ary Relations Relational Databases Binary Relation Properties Equivalence Relations. Reading (Epp s textbook) Relations Relations of Sets N-ary Relations Relational Databases Binary Relation Properties Equivalence Relations Reading (Epp s textbook) 8.-8.3. Cartesian Products The symbol (a, b) denotes the ordered

More information

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182 CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings

More information

arxiv: v1 [cs.cc] 15 Nov 2016

arxiv: v1 [cs.cc] 15 Nov 2016 Diploid Alignment is NP-hard Romeo Rizzi 1, Massimo Cairo 1, Veli Mäkinen 2, and Daniel Valenzuela 2 1 Department of Computer Science, University of Verona, Italy 2 Helsinki Institute for Information echnology,

More information

Coloring Vertices and Edges of a Path by Nonempty Subsets of a Set

Coloring Vertices and Edges of a Path by Nonempty Subsets of a Set Coloring Vertices and Edges of a Path by Nonempty Subsets of a Set P.N. Balister E. Győri R.H. Schelp November 8, 28 Abstract A graph G is strongly set colorable if V (G) E(G) can be assigned distinct

More information

Time to learn about NP-completeness!

Time to learn about NP-completeness! Time to learn about NP-completeness! Harvey Mudd College March 19, 2007 Languages A language is a set of strings Examples The language of strings of all a s with odd length The language of strings with

More information

Search. Search is a key component of intelligent problem solving. Get closer to the goal if time is not enough

Search. Search is a key component of intelligent problem solving. Get closer to the goal if time is not enough Search Search is a key component of intelligent problem solving Search can be used to Find a desired goal if time allows Get closer to the goal if time is not enough section 11 page 1 The size of the search

More information

CS6999 Probabilistic Methods in Integer Programming Randomized Rounding Andrew D. Smith April 2003

CS6999 Probabilistic Methods in Integer Programming Randomized Rounding Andrew D. Smith April 2003 CS6999 Probabilistic Methods in Integer Programming Randomized Rounding April 2003 Overview 2 Background Randomized Rounding Handling Feasibility Derandomization Advanced Techniques Integer Programming

More information

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony ioinformatics -- lecture 9 Phylogenetic trees istance-based tree building Parsimony (,(,(,))) rees can be represented in "parenthesis notation". Each set of parentheses represents a branch-point (bifurcation),

More information

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely

Let S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT

More information

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings

Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Markus L. Schmid Trier University, Fachbereich IV Abteilung Informatikwissenschaften, D-54286 Trier, Germany, MSchmid@uni-trier.de

More information