Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden
|
|
- Colin Nicholson
- 5 years ago
- Views:
Transcription
1 Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1
2 Results Multiple Alignment with SP-score Star Alignment Tree Alignment (with given phylogeny) are NP-hard under all metrics! 2
3 Pairwise Alignment Mutations: substitutions, insertions, and deletions. Input: Two related strings s 1 and s 2. Output: The least number of mutations needed to s 1 s 2. s 1 = a a g a c t s 2 = a g t g c t 3
4 Pairwise Alignment Mutations: substitutions, insertions, and deletions. Input: Two related strings s 1 and s 2. Output: The least number of mutations needed to s 1 s 2. 2 l matrix A s 1 = a a g a c t s 2 = a g t g c t d A (s 1, s 2 ) = 3 mutations 3
5 Metric symbol distance Unit metric - binary alphabet (the edit distance) Σ
6 Metric symbol distance Unit metric - binary alphabet (the edit distance) Σ Insertions and deletions occur less frequently! 4
7 Metric symbol distance Unit metric - binary alphabet (the edit distance) Σ Insertions and deletions occur less frequently! General metric - α, β, γ have metric properties (identity, symmetry, triangle ineq.) Σ α β 1 α 0 γ β γ 0 4
8 Multiple Alignment When sequence similarity is low pairwise alignment may fail to find similarities. 5
9 Multiple Alignment When sequence similarity is low pairwise alignment may fail to find similarities. Considering several strings may be helpful. Input: k strings s 1... s k. Output: k l matrix A. a a g a - c t - - g a g c t a c g a g c - a - g t g c t 5
10 Multiple Alignment When sequence similarity is low pairwise alignment may fail to find similarities. Considering several strings may be helpful. Input: k strings s 1... s k. Output: k l matrix A. a a g a - c t - - g a g c t a c g a g c - a - g t g c t How do we score columns? 5
11 Sum of Pairs score (SP-score) Let the cost be the sum of costs for all pairs of rows: k i=1 k j=i d A (s i, s j ), where d A (s i, s j ) is the pairwise distance between the rows containing strings s i and s j. 6
12 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics
13 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics
14 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics
15 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics In particular it was unknown for the unit metric.
16 Earlier NP results Wang and Jiang 94 - Non-metric Bonizzoni and Vedova 01 - Specific metric (we build on their construction) Just 01 - Many metrics In particular it was unknown for the unit metric. Here: All binary or larger alphabets under all metrics!
17 Independent R3 Set Independent set in three regular graphs, i.e. all vertices have degree 3, is NP-hard. Input: A three regular graph G = (V, E) and a integer c. Output: Yes if there is an independent set V V of size c, otherwise No. 8
18 Independent R3 Set Independent set in three regular graphs, i.e. all vertices have degree 3, is NP-hard. Input: A three regular graph G = (V, E) and a integer c. Output: Yes if there is an independent set V V of size c, otherwise No. 8
19 Reduction Independent R3 Set G = (V, E) c set of size c V SP-score Set of strings K matrix of cost K A 9
20 Reduction Independent R3 Set G = (V, E) c set of size c V SP-score Set of strings K matrix of cost K A v 1 v 4 v 2 v 3 v 1 v 2 v 3 v
21 Gadgetering 1. Each vertex is represented by a column (n vertex columns). 2. c vertex columns are picked. 3. Each edge is represented by an edge string. v 1 v 2... v i... vn
22 Template strings Vertex columns We add very many template strings: T = (10 b ) n 1 1 v 1 v 2... v i... vn Fact: Identical strings are aligned identically in an optimal alignment. 11
23 Pick strings V V We add very many pick strings: P = 1 c. v 1 v 2... v i... vn Fact: c vertex columns are picked since this is the alignment with least missmatches. 12
24 Edge strings Property: If there is an edge (v i, v j ) then both v i and v j can not be part of the independent set. E ij = 0 s (00 b ) i 1 10 b s (00 b ) j i 1 10 b (00 b ) n j 1 00 s v 1 v i v j... vn good 0 s s good s s bad 0 s s s 13
25 Independent set c Alignment K ( K = (n c)γb 2 + b(n 1)βb 2 + (sβ + nα)bm + cα + (s + b(n ) 1) + n c 2)β + 2γ bm 3cb(α + γ β) + (sβ + 2α)m 2 14
26 Independent set c Alignment K 1. Remember three regular graph. So there are atmost three ones in each vertex column. 2. If a column with only two ones is picked then there is a missmatch extra for each pick string ( score > K). 14
27 Example 1. Atmost three ones in each vertex column. 2. Pick strings pick columns with most ones. v 1 v 4 v 2 v 3 v 1 v 2 v 3 v E E E E E E
28 Example 1. Atmost three ones in each vertex column. 2. Pick strings pick columns with most ones. v 1 v 4 v 2 v 3 v 1 v 2 v 3 v E E E E E E
29 Example 1. Atmost three ones in each vertex column. 2. Pick strings pick columns with most ones. v 1 v 4 v 2 v 3 v 1 v 2 v 3 v E E E E E E
30 Example 1. Atmost three ones in each vertex column. 2. Pick strings pick columns with most ones. v 1 v 4 v 2 v 3 v 1 v 2 v 3 v E E E E E E
31 Star Alignment SP-score not a good model of evolution. s 2 s 4 c s 1 s 3 16
32 Star Alignment SP-score not a good model of evolution. Input: k strings s 1... s k Output: string c minimizing d(c, s i ) i s 2 s 4? s 1 s 3 16
33 Star Alignment SP-score not a good model of evolution. Input: k strings s 1... s k Output: string c minimizing d(c, s i ) i s 2 s 4? s 1 s 3 Symbol distance metric = Pairwise alignment distance metric Star Alignment is a special case of Steiner Star in a metric space. 16
34 Earlier results - Star Alignment Wang and Jiang 94 - Non-metric APX-complete Li, Ma, Wang 99 - Constant number gaps, NP-hard and PTAS Sim and Park 01 - Specific metric NP-hard
35 Earlier results - Star Alignment Wang and Jiang 94 - Non-metric APX-complete Li, Ma, Wang 99 - Constant number gaps, NP-hard and PTAS Sim and Park 01 - Specific metric NP-hard
36 Earlier results - Star Alignment Wang and Jiang 94 - Non-metric APX-complete Li, Ma, Wang 99 - Constant number gaps, NP-hard and PTAS Sim and Park 01 - Specific metric NP-hard
37 Earlier results - Star Alignment Wang and Jiang 94 - Non-metric APX-complete Li, Ma, Wang 99 - Constant number gaps, NP-hard and PTAS Sim and Park 01 - Specific metric NP-hard Here: All binary or larger alphabets under all metrics!
38 Reduction Vertex Cover G = (V, E) minimum cover V Star Alignment Set of strings minimum string c = DDCDD 18
39 Reduction Vertex Cover G = (V, E) minimum cover V Star Alignment Set of strings minimum string c = DDCDD v 1 v 2 v 3... v n C =...?...?...?...?... 18
40 Reduction Vertex Cover G = (V, E) minimum cover V Star Alignment Set of strings minimum string c = DDCDD v 1 v 2 v 3... v n C V = C V - Only 1 s. 18
41 Reduction Vertex Cover G = (V, E) minimum cover V Star Alignment Set of strings minimum string c = DDCDD v 1 v 2 v 3... v n C = C V - Only 1 s. C - Only 0 s. 18
42 Construction Idea (E, G) c = DDCDD (E, S ab ) c = vertex cover G minimum cover S ab E E c G E S a b E G G String minimizing i d(c, s i) minimum cover 19
43 Canonical Structure E = DDC V DD G = DDC DD Differ only in the vertex positions. 20
44 Canonical Structure E = DDC V DD G = DDC DD Differ only in the vertex positions. Many such pairs will vote for the canonical structure. E G c E G 20
45 Canonical Structure E = DDC V DD G = DDC DD Differ only in the vertex positions. Many such pairs will vote for the canonical structure. Vote even in the vertex positions! E G c E G 20
46 Canonical Structure E = DDC V DD G = DDC DD Differ only in the vertex positions. Many such pairs will vote for the canonical structure. Vote even in the vertex positions! E G c E G c = DDCDD 20
47 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b v 1... v a... v b... v n C a =
48 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b v 1... v a... v b... v n C b =
49 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b E votes 1 in all vertex positions. 21
50 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b E votes 1 in all vertex positions. S ab votes 0 in all except vertex position a or b. D D C D D C a D C b C a D C b 21
51 String Cover Edge (v a, v b ) add two strings E = DDC V DD S ab = C a DC b E votes 1 in all vertex positions. S ab votes 0 in all except vertex position a or b. D D C D D C a D C b C a D C b Either vertex v a or vertex v b is part of the cover. 21
52 Minimal String Minimum Cover (E, G) c = DDCDD (E, S ab ) c = vertex cover S ab E E c G E S a b E G 22
53 Minimal String Minimum Cover (E, G) c = DDCDD (E, S ab ) c = vertex cover S ab E E c G E S a b E G G G = DDC DD penalizes each 1 in vertex positions. 22
54 Minimal String Minimum Cover (E, G) c = DDCDD (E, S ab ) c = vertex cover S ab E E c G E S a b E G G G = DDC DD penalizes each 1 in vertex positions. String minimizing i d(c, s i) minimum cover 22
55 Tree Alignment (given phylogeny) Star Alignment not a good model of evolution. p 1 p 2 p 3 s 1 s 2 s 3 s 4 23
56 Tree Alignment (given phylogeny) Star Alignment not a good model of evolution. Input: k strings s 1... s k phylogeny T Output: strings p 1... p k 1 min (a,b) E(T ) d(a, b)??? s 1 s 2 s 3 s 4 23
57 Tree Alignment (given phylogeny) Star Alignment not a good model of evolution. Input: k strings s 1... s k phylogeny T Output: strings p 1... p k 1 min (a,b) E(T ) d(a, b)??? s 1 s 2 s 3 s 4 Symbol distance metric = Pairwise alignment distance metric Tree Alignment is a special case of Steiner Tree in a metric space. 23
58 Construction Idea Same idea as for Star Alignment. r base comp. inbetween each branch E G E G E G r base comp. E G E E G r base comp. E G G E S ij m branches S i j 24
59 Overview and Open problems Problem Here Approx SP-score all metrics 2-approx [GPBL] Star Alignment all metrics 2-approx Tree Alignment all metrics PTAS [WJGL] (given phylogeny) 25
60 Overview and Open problems Problem Here Approx SP-score all metrics 2-approx [GPBL] Star Alignment all metrics 2-approx Tree Alignment all metrics PTAS [WJGL] (given phylogeny) Consensus Patterns NP-hard PTAS [LMW] Substring Parsimony NP-hard PTAS [ WJGL] 25
61 Acknowledgments My advisor Prof. Jens Lagergren Prof. Benny for hosting me Thanks! 26
Page 1. Evolutionary Trees. Why build evolutionary tree? Outline
Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny
More informationON THE APPROXIMABILITY OF
APPROX October 30, 2018 Overview 1 2 3 4 5 6 Closing remarks and questions What is an approximation algorithm? Definition 1.1 An algorithm A for an optimization problem X is an a approximation algorithm
More informationMultiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:
Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More informationDynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.
Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal
More informationCSCI1950 Z Computa4onal Methods for Biology Lecture 4. Ben Raphael February 2, hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary
CSCI1950 Z Computa4onal Methods for Biology Lecture 4 Ben Raphael February 2, 2009 hhp://cs.brown.edu/courses/csci1950 z/ Algorithm Summary Parsimony Probabilis4c Method Input Output Sankoff s & Fitch
More informationTheory of Computation Time Complexity
Theory of Computation Time Complexity Bow-Yaw Wang Academia Sinica Spring 2012 Bow-Yaw Wang (Academia Sinica) Time Complexity Spring 2012 1 / 59 Time for Deciding a Language Let us consider A = {0 n 1
More informationTHEORY. Based on sequence Length According to the length of sequence being compared it is of following two types
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationCombinatorial Optimization
Combinatorial Optimization Problem set 8: solutions 1. Fix constants a R and b > 1. For n N, let f(n) = n a and g(n) = b n. Prove that f(n) = o ( g(n) ). Solution. First we observe that g(n) 0 for all
More informationEvolutionary Tree Analysis. Overview
CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based
More informationImplementing Approximate Regularities
Implementing Approximate Regularities Manolis Christodoulakis Costas S. Iliopoulos Department of Computer Science King s College London Kunsoo Park School of Computer Science and Engineering, Seoul National
More informationMath.3336: Discrete Mathematics. Chapter 9 Relations
Math.3336: Discrete Mathematics Chapter 9 Relations Instructor: Dr. Blerina Xhabli Department of Mathematics, University of Houston https://www.math.uh.edu/ blerina Email: blerina@math.uh.edu Fall 2018
More informationNon-Approximability Results (2 nd part) 1/19
Non-Approximability Results (2 nd part) 1/19 Summary - The PCP theorem - Application: Non-approximability of MAXIMUM 3-SAT 2/19 Non deterministic TM - A TM where it is possible to associate more than one
More informationCMPT307: Complexity Classes: P and N P Week 13-1
CMPT307: Complexity Classes: P and N P Week 13-1 Xian Qiu Simon Fraser University xianq@sfu.ca Strings and Languages an alphabet Σ is a finite set of symbols {0, 1}, {T, F}, {a, b,..., z}, N a string x
More informationInteger Programming for Phylogenetic Network Problems
Integer Programming for Phylogenetic Network Problems D. Gusfield University of California, Davis Presented at the National University of Singapore, July 27, 2015.! There are many important phylogeny problems
More informationValue-Ordering and Discrepancies. Ciaran McCreesh and Patrick Prosser
Value-Ordering and Discrepancies Maintaining Arc Consistency (MAC) Achieve (generalised) arc consistency (AC3, etc). If we have a domain wipeout, backtrack. If all domains have one value, we re done. Pick
More informationThe Intractability of Computing the Hamming Distance
The Intractability of Computing the Hamming Distance Bodo Manthey and Rüdiger Reischuk Universität zu Lübeck, Institut für Theoretische Informatik Wallstraße 40, 23560 Lübeck, Germany manthey/reischuk@tcs.uni-luebeck.de
More informationThe Complexity of Probabilistic Lobbying
Gábor Erdélyi 1, Henning Fernau 2, Judy Goldsmith 3 Nicholas Mattei 3, Daniel Raible 2, and Jörg Rothe 1 1 ccc.cs.uni-duesseldorf.de/~{erdelyi,rothe} 2 Univ. Trier, FB 4 Abteilung Informatik www.informatik.uni-trier.de/~{fernau,raible},
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More information7 Distances. 7.1 Metrics. 7.2 Distances L p Distances
7 Distances We have mainly been focusing on similarities so far, since it is easiest to explain locality sensitive hashing that way, and in particular the Jaccard similarity is easy to define in regards
More informationCombinatorial optimization problems
Combinatorial optimization problems Heuristic Algorithms Giovanni Righini University of Milan Department of Computer Science (Crema) Optimization In general an optimization problem can be formulated as:
More informationPCPs and Inapproximability Gap-producing and Gap-Preserving Reductions. My T. Thai
PCPs and Inapproximability Gap-producing and Gap-Preserving Reductions My T. Thai 1 1 Hardness of Approximation Consider a maximization problem Π such as MAX-E3SAT. To show that it is NP-hard to approximation
More informationConsensus Optimizing Both Distance Sum and Radius
Consensus Optimizing Both Distance Sum and Radius Amihood Amir 1, Gad M. Landau 2, Joong Chae Na 3, Heejin Park 4, Kunsoo Park 5, and Jeong Seop Sim 6 1 Bar-Ilan University, 52900 Ramat-Gan, Israel 2 University
More informationMultiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas
n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming
More informationStructure-Based Comparison of Biomolecules
Structure-Based Comparison of Biomolecules Benedikt Christoph Wolters Seminar Bioinformatics Algorithms RWTH AACHEN 07/17/2015 Outline 1 Introduction and Motivation Protein Structure Hierarchy Protein
More informationHardness of Approximation
Hardness of Approximation We have seen several methods to find approximation algorithms for NP-hard problems We have also seen a couple of examples where we could show lower bounds on the achievable approxmation
More informationBioinformatics for Computer Scientists (Part 2 Sequence Alignment) Sepp Hochreiter
Bioinformatics for Computer Scientists (Part 2 Sequence Alignment) Institute of Bioinformatics Johannes Kepler University, Linz, Austria Sequence Alignment 2. Sequence Alignment Sequence Alignment 2.1
More informationNetwork Design and Game Theory Spring 2008 Lecture 6
Network Design and Game Theory Spring 2008 Lecture 6 Guest Lecturer: Aaron Archer Instructor: Mohammad T. Hajiaghayi Scribe: Fengming Wang March 3, 2008 1 Overview We study the Primal-dual, Lagrangian
More informationBBM402-Lecture 11: The Class NP
BBM402-Lecture 11: The Class NP Lecturer: Lale Özkahya Resources for the presentation: http://ocw.mit.edu/courses/electrical-engineering-andcomputer-science/6-045j-automata-computability-andcomplexity-spring-2011/syllabus/
More informationChapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.
Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved. 1 Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should
More informationMultiple Sequence Alignment
Multiple Sequence Alignment Multiple Alignment versus Pairwise Alignment Up until now we have only tried to align two sequences. What about more than two? And what for? A faint similarity between two sequences
More informationApproximation algorithms and mechanism design for minimax approval voting
Approximation algorithms and mechanism design for minimax approval voting Ioannis Caragiannis Dimitris Kalaitzis University of Patras Vangelis Markakis Athens University of Economics and Business Outline
More informationHidden Markov Models
Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas Forward Algorithm For Markov chains we calculate the probability of a sequence, P(x) How
More informationLecture 18 Classification of Dynkin Diagrams
18745 Introduction to Lie Algebras October 21, 2010 Lecture 18 Classification of Dynkin Diagrams Prof Victor Kac Scribe: Benjamin Iriarte 1 Examples of Dynkin diagrams In the examples that follow, we will
More informationAlgorithm Design CS 515 Fall 2015 Sample Final Exam Solutions
Algorithm Design CS 515 Fall 2015 Sample Final Exam Solutions Copyright c 2015 Andrew Klapper. All rights reserved. 1. For the functions satisfying the following three recurrences, determine which is the
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More informationSection Summary. Relations and Functions Properties of Relations. Combining Relations
Chapter 9 Chapter Summary Relations and Their Properties n-ary Relations and Their Applications (not currently included in overheads) Representing Relations Closures of Relations (not currently included
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationk-means Clustering Raten der µi Neubestimmung µi Zuordnungsschritt Konvergenz
k-means Clustering Neubestimmung µi Konvergenz Raten der µi Zuordnungsschritt Sebastian Böcker Bioinformatik und Genomforschung Clustering Microarrays 1 Parameterized algorithms and FPT we want to find
More informationIntroduction to spectral alignment
SI Appendix C. Introduction to spectral alignment Due to the complexity of the anti-symmetric spectral alignment algorithm described in Appendix A, this appendix provides an extended introduction to the
More informationNP and Computational Intractability
NP and Computational Intractability 1 Polynomial-Time Reduction Desiderata'. Suppose we could solve X in polynomial-time. What else could we solve in polynomial time? don't confuse with reduces from Reduction.
More informationDefinition: A binary relation R from a set A to a set B is a subset R A B. Example:
Chapter 9 1 Binary Relations Definition: A binary relation R from a set A to a set B is a subset R A B. Example: Let A = {0,1,2} and B = {a,b} {(0, a), (0, b), (1,a), (2, b)} is a relation from A to B.
More informationMaths GCSE Langdon Park Foundation Calculator pack A
Maths GCSE Langdon Park Foundation Calculator pack A Name: Class: Date: Time: 96 minutes Marks: 89 marks Comments: Q1. The table shows how 25 students travel to school. Walk Bus Car Taxi 9 8 7 1 Draw a
More informationEffects of Gap Open and Gap Extension Penalties
Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See
More informationComparing whole genomes
BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will
More information1 Alphabets and Languages
1 Alphabets and Languages Look at handout 1 (inference rules for sets) and use the rules on some examples like {a} {{a}} {a} {a, b}, {a} {{a}}, {a} {{a}}, {a} {a, b}, a {{a}}, a {a, b}, a {{a}}, a {a,
More informationAlgorithm Analysis Divide and Conquer. Chung-Ang University, Jaesung Lee
Algorithm Analysis Divide and Conquer Chung-Ang University, Jaesung Lee Introduction 2 Divide and Conquer Paradigm 3 Advantages of Divide and Conquer Solving Difficult Problems Algorithm Efficiency Parallelism
More informationApproximation Algorithms and Hardness of Approximation. IPM, Jan Mohammad R. Salavatipour Department of Computing Science University of Alberta
Approximation Algorithms and Hardness of Approximation IPM, Jan 2006 Mohammad R. Salavatipour Department of Computing Science University of Alberta 1 Introduction For NP-hard optimization problems, we
More informationBio nformatics. Lecture 3. Saad Mneimneh
Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per
More informationBio nformatics. Lecture 16. Saad Mneimneh
Bio nformatics Lecture 16 DNA sequencing To sequence a DNA is to obtain the string of bases that it contains. It is impossible to sequence the whole DNA molecule directly. We may however obtain a piece
More informationAside: Golden Ratio. Golden Ratio: A universal law. Golden ratio φ = lim n = 1+ b n = a n 1. a n+1 = a n + b n, a n+b n a n
Aside: Golden Ratio Golden Ratio: A universal law. Golden ratio φ = lim n a n+b n a n = 1+ 5 2 a n+1 = a n + b n, b n = a n 1 Ruta (UIUC) CS473 1 Spring 2018 1 / 41 CS 473: Algorithms, Spring 2018 Dynamic
More informationStrong Mathematical Induction
Strong Mathematical Induction Lecture 23 Section 5.4 Robb T. Koether Hampden-Sydney College Mon, Feb 24, 2014 Robb T. Koether (Hampden-Sydney College) Strong Mathematical Induction Mon, Feb 24, 2014 1
More informationGraph. Supply Vertices and Demand Vertices. Supply Vertices. Demand Vertices
Partitioning Graphs of Supply and Demand Generalization of Knapsack Problem Takao Nishizeki Tohoku University Graph Supply Vertices and Demand Vertices Supply Vertices Demand Vertices Graph Each Supply
More informationClassical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems
Classical Complexity and Fixed-Parameter Tractability of Simultaneous Consecutive Ones Submatrix & Editing Problems Rani M. R, Mohith Jagalmohanan, R. Subashini Binary matrices having simultaneous consecutive
More informationNP-Complete Problems. Complexity Class P. .. Cal Poly CSC 349: Design and Analyis of Algorithms Alexander Dekhtyar..
.. Cal Poly CSC 349: Design and Analyis of Algorithms Alexander Dekhtyar.. Complexity Class P NP-Complete Problems Abstract Problems. An abstract problem Q is a binary relation on sets I of input instances
More informationPhylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches
Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell
More informationSara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)
Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline
More informationNP-Completeness. Andreas Klappenecker. [based on slides by Prof. Welch]
NP-Completeness Andreas Klappenecker [based on slides by Prof. Welch] 1 Prelude: Informal Discussion (Incidentally, we will never get very formal in this course) 2 Polynomial Time Algorithms Most of the
More informationX-MA2C01-1: Partial Worked Solutions
X-MAC01-1: Partial Worked Solutions David R. Wilkins May 013 1. (a) Let A, B and C be sets. Prove that (A \ (B C)) (B \ C) = (A B) \ C. [Venn Diagrams, by themselves without an accompanying logical argument,
More informationEfficient High-Similarity String Comparison: The Waterfall Algorithm
Efficient High-Similarity String Comparison: The Waterfall Algorithm Alexander Tiskin Department of Computer Science University of Warwick http://go.warwick.ac.uk/alextiskin Alexander Tiskin (Warwick)
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang
More informationEvolutionary Models. Evolutionary Models
Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationABHELSINKI UNIVERSITY OF TECHNOLOGY
Approximation Algorithms Seminar 1 Set Cover, Steiner Tree and TSP Siert Wieringa siert.wieringa@tkk.fi Approximation Algorithms Seminar 1 1/27 Contents Approximation algorithms for: Set Cover Steiner
More informationLocal Alignment of RNA Sequences with Arbitrary Scoring Schemes
Local Alignment of RNA Sequences with Arbitrary Scoring Schemes Rolf Backofen 1, Danny Hermelin 2, ad M. Landau 2,3, and Oren Weimann 4 1 Institute of omputer Science, Albert-Ludwigs niversität Freiburg,
More informationPhylogeny: traditional and Bayesian approaches
Phylogeny: traditional and Bayesian approaches 5-Feb-2014 DEKM book Notes from Dr. B. John Holder and Lewis, Nature Reviews Genetics 4, 275-284, 2003 1 Phylogeny A graph depicting the ancestor-descendent
More informationCSE 421 NP-Completeness
CSE 421 NP-Completeness Yin at Lee 1 Cook-Levin heorem heorem (Cook 71, Levin 73): 3-SA is NP-complete, i.e., for all problems A NP, A p 3-SA. (See CSE 431 for the proof) So, 3-SA is the hardest problem
More informationPairwise sequence alignment
Department of Evolutionary Biology Example Alignment between very similar human alpha- and beta globins: GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKL G+ +VK+HGKKV A+++++AH+D++ +++++LS+LH KL GNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKL
More informationCounting Strategies: Inclusion-Exclusion, Categories
Counting Strategies: Inclusion-Exclusion, Categories Russell Impagliazzo and Miles Jones Thanks to Janine Tiefenbruck http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ May 4, 2016 A scheduling problem In one
More informationDynamic programming. Curs 2015
Dynamic programming. Curs 2015 Fibonacci Recurrence. n-th Fibonacci Term INPUT: n nat QUESTION: Compute F n = F n 1 + F n 2 Recursive Fibonacci (n) if n = 0 then return 0 else if n = 1 then return 1 else
More informationAdvanced Algorithms 南京大学 尹一通
Advanced Algorithms 南京大学 尹一通 LP-based Algorithms LP rounding: Relax the integer program to LP; round the optimal LP solution to a nearby feasible integral solution. The primal-dual schema: Find a pair
More informationPhylogenetic Reconstruction with Insertions and Deletions
Phylogenetic Reconstruction with Insertions and Deletions Alexandr Andoni, Mark Braverman, Avinatan Hassidim April 7, 2010 Abstract We study phylogenetic reconstruction of evolutionary trees, with three
More informationCSE101: Design and Analysis of Algorithms. Ragesh Jaiswal, CSE, UCSD
Greedy s Greedy s Shortest path Claim 2: Let S be a subset of vertices containing s such that we know the shortest path length l(s, u) from s to any vertex in u S. Let e = (u, v) be an edge such that 1
More informationHigh Dimensional Search Min- Hashing Locality Sensi6ve Hashing
High Dimensional Search Min- Hashing Locality Sensi6ve Hashing Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata September 8 and 11, 2014 High Support Rules vs Correla6on of
More informationJuly 18, Approximation Algorithms (Travelling Salesman Problem)
Approximation Algorithms (Travelling Salesman Problem) July 18, 2014 The travelling-salesman problem Problem: given complete, undirected graph G = (V, E) with non-negative integer cost c(u, v) for each
More informationLecture 4: An FPTAS for Knapsack, and K-Center
Comp 260: Advanced Algorithms Tufts University, Spring 2016 Prof. Lenore Cowen Scribe: Eric Bailey Lecture 4: An FPTAS for Knapsack, and K-Center 1 Introduction Definition 1.0.1. The Knapsack problem (restated)
More informationEnumeration Schemes for Words Avoiding Permutations
Enumeration Schemes for Words Avoiding Permutations Lara Pudwell November 27, 2007 Abstract The enumeration of permutation classes has been accomplished with a variety of techniques. One wide-reaching
More informationSAT, NP, NP-Completeness
CS 473: Algorithms, Spring 2018 SAT, NP, NP-Completeness Lecture 22 April 13, 2018 Most slides are courtesy Prof. Chekuri Ruta (UIUC) CS473 1 Spring 2018 1 / 57 Part I Reductions Continued Ruta (UIUC)
More informationGeneralized Splines. Madeline Handschy, Julie Melnick, Stephanie Reinders. Smith College. April 1, 2013
Smith College April 1, 213 What is a Spline? What is a Spline? are used in engineering to represent objects. What is a Spline? are used in engineering to represent objects. What is a Spline? are used
More informationLinear Programming. Scheduling problems
Linear Programming Scheduling problems Linear programming (LP) ( )., 1, for 0 min 1 1 1 1 1 11 1 1 n i x b x a x a b x a x a x c x c x z i m n mn m n n n n! = + + + + + + = Extreme points x ={x 1,,x n
More informationCSCI3390-Second Test with Solutions
CSCI3390-Second Test with Solutions April 26, 2016 Each of the 15 parts of the problems below is worth 10 points, except for the more involved 4(d), which is worth 20. A perfect score is 100: if your score
More informationScheduling on Unrelated Parallel Machines. Approximation Algorithms, V. V. Vazirani Book Chapter 17
Scheduling on Unrelated Parallel Machines Approximation Algorithms, V. V. Vazirani Book Chapter 17 Nicolas Karakatsanis, 2008 Description of the problem Problem 17.1 (Scheduling on unrelated parallel machines)
More informationChapter 2. Reductions and NP. 2.1 Reductions Continued The Satisfiability Problem (SAT) SAT 3SAT. CS 573: Algorithms, Fall 2013 August 29, 2013
Chapter 2 Reductions and NP CS 573: Algorithms, Fall 2013 August 29, 2013 2.1 Reductions Continued 2.1.1 The Satisfiability Problem SAT 2.1.1.1 Propositional Formulas Definition 2.1.1. Consider a set of
More informationLecture 2: Pairwise Alignment. CG Ron Shamir
Lecture 2: Pairwise Alignment 1 Main source 2 Why compare sequences? Human hexosaminidase A vs Mouse hexosaminidase A 3 www.mathworks.com/.../jan04/bio_genome.html Sequence Alignment עימוד רצפים The problem:
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)
More informationProving languages to be nonregular
Proving languages to be nonregular We already know that there exist languages A Σ that are nonregular, for any choice of an alphabet Σ. This is because there are uncountably many languages in total and
More informationarxiv:cs/ v1 [cs.cc] 21 May 2002
Parameterized Intractability of Motif Search Problems arxiv:cs/0205056v1 [cs.cc] 21 May 2002 Michael R. Fellows Jens Gramm Rolf Niedermeier Abstract We show that Closest Substring, one of the most important
More informationSpace Complexity. The space complexity of a program is how much memory it uses.
Space Complexity The space complexity of a program is how much memory it uses. Measuring Space When we compute the space used by a TM, we do not count the input (think of input as readonly). We say that
More informationColoring Vertices and Edges of a Path by Nonempty Subsets of a Set
Coloring Vertices and Edges of a Path by Nonempty Subsets of a Set P.N. Balister E. Győri R.H. Schelp April 28, 28 Abstract A graph G is strongly set colorable if V (G) E(G) can be assigned distinct nonempty
More informationRelations. Relations of Sets N-ary Relations Relational Databases Binary Relation Properties Equivalence Relations. Reading (Epp s textbook)
Relations Relations of Sets N-ary Relations Relational Databases Binary Relation Properties Equivalence Relations Reading (Epp s textbook) 8.-8.3. Cartesian Products The symbol (a, b) denotes the ordered
More informationCSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182
CSE182-L7 Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding 10-07 CSE182 Bell Labs Honors Pattern matching 10-07 CSE182 Just the Facts Consider the set of all substrings
More informationarxiv: v1 [cs.cc] 15 Nov 2016
Diploid Alignment is NP-hard Romeo Rizzi 1, Massimo Cairo 1, Veli Mäkinen 2, and Daniel Valenzuela 2 1 Department of Computer Science, University of Verona, Italy 2 Helsinki Institute for Information echnology,
More informationColoring Vertices and Edges of a Path by Nonempty Subsets of a Set
Coloring Vertices and Edges of a Path by Nonempty Subsets of a Set P.N. Balister E. Győri R.H. Schelp November 8, 28 Abstract A graph G is strongly set colorable if V (G) E(G) can be assigned distinct
More informationTime to learn about NP-completeness!
Time to learn about NP-completeness! Harvey Mudd College March 19, 2007 Languages A language is a set of strings Examples The language of strings of all a s with odd length The language of strings with
More informationSearch. Search is a key component of intelligent problem solving. Get closer to the goal if time is not enough
Search Search is a key component of intelligent problem solving Search can be used to Find a desired goal if time allows Get closer to the goal if time is not enough section 11 page 1 The size of the search
More informationCS6999 Probabilistic Methods in Integer Programming Randomized Rounding Andrew D. Smith April 2003
CS6999 Probabilistic Methods in Integer Programming Randomized Rounding April 2003 Overview 2 Background Randomized Rounding Handling Feasibility Derandomization Advanced Techniques Integer Programming
More informationBioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony
ioinformatics -- lecture 9 Phylogenetic trees istance-based tree building Parsimony (,(,(,))) rees can be represented in "parenthesis notation". Each set of parentheses represents a branch-point (bifurcation),
More informationLet S be a set of n species. A phylogeny is a rooted tree with n leaves, each of which is uniquely
JOURNAL OF COMPUTATIONAL BIOLOGY Volume 8, Number 1, 2001 Mary Ann Liebert, Inc. Pp. 69 78 Perfect Phylogenetic Networks with Recombination LUSHENG WANG, 1 KAIZHONG ZHANG, 2 and LOUXIN ZHANG 3 ABSTRACT
More informationFinding Consensus Strings With Small Length Difference Between Input and Solution Strings
Finding Consensus Strings With Small Length Difference Between Input and Solution Strings Markus L. Schmid Trier University, Fachbereich IV Abteilung Informatikwissenschaften, D-54286 Trier, Germany, MSchmid@uni-trier.de
More information