Physical Mapping. Restriction Mapping. Lecture 12. A physical map of a DNA tells the location of precisely defined sequences along the molecule.

Transcription:

Computational Biology
Lecture 12

Physical Mapping

A physical map of a DNA molecule tells the location of precisely defined sequences along the molecule.

Restriction mapping: mapping the restriction sites of a cutting enzyme based on the lengths of fragments.
  Double Digest Problem (DDP)
  Partial Digest Problem (PDP)

Hybridization mapping: mapping clones based on hybridization data with probes.
  Non-unique probes
  Unique probes

Restriction Mapping

[Figure: enzyme A is applied to the uncut DNA, producing a set of fragment lengths.]

Ordering the fragments maps the restriction sites on the DNA. But the lengths of the fragments alone are not enough information: any order would satisfy the experimental data. Use two cutting enzymes.

Double Digest

[Figure: three experiments on the uncut DNA: enzyme A alone, enzyme B alone, and enzymes A and B together, each producing a set of fragment lengths.]

Double Digest Problem (DDP)

Given
  1. multiset A of lengths from enzyme A
  2. multiset B of lengths from enzyme B
  3. multiset C of lengths from enzymes A and B together
determine an ordering of the fragments that is consistent with the three experiments.

Example
  enzyme A:      A = {3, 6, 8, 10}
  enzyme B:      B = {4, 5, 7, 11}
  enzymes A + B: C = {1, 2, 3, 3, 5, 6, 7}

A consistent ordering:
  A:  3 8 6 10
  C:  3 1 5 2 6 3 7
  B:  4 5 11 7
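The consistency check in the example above can be sketched in a few lines of Python. This is only an illustration: the function names `cut_sites`, `double_digest` and `is_ddp_solution` are made up for this note, and the sketch assumes exact integer lengths with no measurement error.

```python
from itertools import accumulate

def cut_sites(order):
    """Positions of the cuts (including the right end) for fragments laid out left to right."""
    return list(accumulate(order))

def double_digest(a_order, b_order):
    """Fragment lengths, left to right, when the cuts of both orderings are applied together."""
    sites = sorted({0} | set(cut_sites(a_order)) | set(cut_sites(b_order)))
    return [right - left for left, right in zip(sites, sites[1:])]

def is_ddp_solution(a_order, b_order, c_multiset):
    """True if the two orderings reproduce the double-digest multiset C."""
    if sum(a_order) != sum(b_order):
        return False
    return sorted(double_digest(a_order, b_order)) == sorted(c_multiset)

a_order = [3, 8, 6, 10]
b_order = [4, 5, 11, 7]
print(double_digest(a_order, b_order))  # [3, 1, 5, 2, 6, 3, 7]
print(is_ddp_solution(a_order, b_order, [1, 2, 3, 3, 5, 6, 7]))  # True
```

Comparing sorted lists is a simple way to compare multisets; the order of the fragments within C is a by-product of the orderings of A and B.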

Number of solutions

The solution for a DDP might not be unique. The number of solutions grows exponentially.

  enzyme A:      A = {1, 2, 2, 3, 3, 4}
  enzyme B:      B = {1, 1, 2, 2, 4, 5}
  enzymes A + B: C = {1, 1, 1, 1, 1, 2, 2, 3, 3}

Two different solutions:
  A:  2 3 2 4 1 3
  C:  1 1 1 2 2 3 1 1 3
  B:  1 2 2 5 1 4

  A:  3 2 1 3 4 2
  C:  2 1 1 1 1 3 1 3 2
  B:  2 2 1 4 1 5

Equivalence of Solutions

Some different solutions might be equivalent. For instance, if $(a_1, a_2, \dots, a_m)$, $(b_1, b_2, \dots, b_n)$ is a solution, then $(a_m, a_{m-1}, \dots, a_1)$, $(b_n, b_{n-1}, \dots, b_1)$ is also a solution. This is a reflection. No fragment length data could possibly distinguish between the two; they differ only by orientation.
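For instances this small, the non-uniqueness can be checked by brute force. The following is a sketch, not an efficient method: it tries every distinct ordering of A and B (the helper names are hypothetical), so it is only usable for a handful of fragments.

```python
from itertools import accumulate, permutations

def double_digest(a_order, b_order):
    """Sorted fragment lengths produced by the combined cuts of both orderings."""
    sites = sorted({0} | set(accumulate(a_order)) | set(accumulate(b_order)))
    return sorted(right - left for left, right in zip(sites, sites[1:]))

def all_solutions(A, B, C):
    """Every pair of orderings (a, b) whose double digest equals C. Exponential time."""
    target = sorted(C)
    a_orders = set(permutations(A))  # set() removes duplicates caused by equal lengths
    b_orders = set(permutations(B))
    return {(a, b) for a in a_orders for b in b_orders
            if double_digest(a, b) == target}

A = [1, 2, 2, 3, 3, 4]
B = [1, 1, 2, 2, 4, 5]
C = [1, 1, 1, 1, 1, 2, 2, 3, 3]
solutions = all_solutions(A, B, C)

# Both orderings listed on the slide are found ...
print(((2, 3, 2, 4, 1, 3), (1, 2, 2, 5, 1, 4)) in solutions)  # True
print(((3, 2, 1, 3, 4, 2), (2, 2, 1, 4, 1, 5)) in solutions)  # True
# ... and the reflection of every solution is again a solution.
print(all((a[::-1], b[::-1]) in solutions for a, b in solutions))  # True
```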

Reflection

  A:  3 8 6 10
  C:  3 1 5 2 6 3 7
  B:  4 5 11 7

reflected:
  A:  10 6 8 3
  C:  7 3 6 2 5 1 3
  B:  7 11 5 4

Overlap Equivalence

Let's define a more general type of equivalence called overlap equivalence. Let $\{A_i\}$ be the set of fragments from A and $\{B_j\}$ be the set of fragments from B. A solution defines an overlap matrix $O$ such that $O_{ij} = 1$ if $A_i$ overlaps with $B_j$. Two solutions are overlap equivalent if they define the same overlap matrix $O$. Reflections are overlap equivalent.

Equivalence class

A solution together with all its overlap equivalent solutions forms an equivalence class (overlap equivalence is an equivalence relation).

Given a solution:
  What is the size of its equivalence class?
  Can we generate all solutions in the class?
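A small sketch of the overlap matrix may help here. It assumes that "overlaps" means the two fragments share an interval of positive length, and the function names are illustrative only. Because a reflection reverses the order of the fragments, the matrix of the reflected solution is the original matrix with rows and columns reversed, which is the same matrix once each fragment keeps its original label.

```python
from itertools import accumulate

def intervals(order):
    """(start, end) position of each fragment when laid out left to right."""
    ends = list(accumulate(order))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))

def overlap_matrix(a_order, b_order):
    """O[i][j] = 1 if fragment i of A and fragment j of B share a stretch of positive length."""
    return [[1 if max(s1, s2) < min(e1, e2) else 0
             for (s2, e2) in intervals(b_order)]
            for (s1, e1) in intervals(a_order)]

a_order, b_order = [3, 8, 6, 10], [4, 5, 11, 7]
O = overlap_matrix(a_order, b_order)
for row in O:
    print(row)

# Reflect the solution, then undo the relabeling by reversing rows and columns.
O_reflected = overlap_matrix(a_order[::-1], b_order[::-1])
print([row[::-1] for row in O_reflected[::-1]] == O)  # True
```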

Observations

If a solution has $t - 1$ coincident cut sites, then it has $t$ components. The components can be permuted in $t!$ ways without changing the overlap data. Each component can also be reflected without changing the overlap. Therefore, we can generate $2^t \, t!$ solutions.

  A:  2 3 2 4 1 3
  C:  1 1 1 2 2 3 1 1 3
  B:  1 2 2 5 1 4

permute:
  A:  2 4 1 3 2 3
  C:  2 3 1 1 3 1 1 1 2
  B:  5 1 4 1 2 2

reflect:
  A:  3 1 4 2 2 3
  C:  3 1 1 3 2 1 1 1 2
  B:  4 1 5 1 2 2

Another observation

Given a solution, let $\mathcal{A}_j = \{A_k : A_k \subseteq B_j\}$ and $\mathcal{B}_i = \{B_k : B_k \subseteq A_i\}$. Permuting the fragments within $\mathcal{A}_j$ or within $\mathcal{B}_i$ does not change the overlap data.
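The first observation can be made concrete with a short sketch that splits a solution into components at the coincident cut sites and then generates every permuted and reflected arrangement. The helper names are hypothetical, and the generator yields $2^t \, t!$ arrangements without removing duplicates.

```python
from itertools import accumulate, permutations, product

def double_digest(a_order, b_order):
    """Sorted fragment lengths produced by the combined cuts of both orderings."""
    sites = sorted({0} | set(accumulate(a_order)) | set(accumulate(b_order)))
    return sorted(right - left for left, right in zip(sites, sites[1:]))

def components(a_order, b_order):
    """Split a solution at cut sites shared by A and B into (a_part, b_part) pairs."""
    a_sites, b_sites = list(accumulate(a_order)), list(accumulate(b_order))
    shared = sorted(set(a_sites) & set(b_sites))  # includes the right end
    parts, ai, bi = [], 0, 0
    for site in shared:
        a_part, b_part = [], []
        while ai < len(a_order) and a_sites[ai] <= site:
            a_part.append(a_order[ai])
            ai += 1
        while bi < len(b_order) and b_sites[bi] <= site:
            b_part.append(b_order[bi])
            bi += 1
        parts.append((a_part, b_part))
    return parts

def rearrangements(parts):
    """All orderings obtained by permuting the components and reflecting any subset of them."""
    for perm in permutations(parts):
        for flips in product([False, True], repeat=len(parts)):
            a, b = [], []
            for (a_part, b_part), flip in zip(perm, flips):
                a += a_part[::-1] if flip else a_part
                b += b_part[::-1] if flip else b_part
            yield tuple(a), tuple(b)

a_order, b_order = [2, 3, 2, 4, 1, 3], [1, 2, 2, 5, 1, 4]
parts = components(a_order, b_order)
variants = list(rearrangements(parts))
print(len(parts), len(variants))  # 3 48
```

Every generated arrangement has the same double-digest multiset as the starting solution, so each one is again a solution of the same instance.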

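  A:  6 3 1 2 5
  C:  4 1 1 3 1 2 2 3
  B:  4 1 9 3

permuting the fragments of A contained in the fragment of length 9 of B:
  A:  6 2 3 1 5
  C:  4 1 1 2 3 1 2 3
  B:  4 1 9 3

Size of equivalence class

Is it $2^t \, t! \, \prod_j |\mathcal{A}_j|! \, \prod_i |\mathcal{B}_i|!$, where $\mathcal{A}_j$ is the set of fragments of A contained in $B_j$ and $\mathcal{B}_i$ is the set of fragments of B contained in $A_i$?

Not quite! If a component has only one fragment in either A or B, then a reflection is also a permutation. Let $s$ be the number of such components; then the size of the equivalence class is

$2^{t-s} \, t! \, \prod_j |\mathcal{A}_j|! \, \prod_i |\mathcal{B}_i|!$

Other Equivalences?

We can define other kinds of equivalences. Consider overlap size equivalence: two solutions are equivalent if they produce the same overlap sizes. Overlap equivalence implies overlap size equivalence, but not the other way around.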

[Figure: a switch of fragments in the solution below yields an overlap size equivalent solution.]

  A:  2 3 2 4 1 3
  C:  1 1 1 2 2 3 1 1 3
  B:  1 2 2 5 1 4

Here $A_2$ overlaps with $B_2$ and $B_3$.

Cassette Transformation Equivalence

Let $|C| = l$. For $1 \le i \le j \le l$, $I_C = \{C_k : i \le k \le j\}$ is the set of fragments from $C_i$ to $C_j$. The cassette defined by $I_C$ is the pair of sets of fragments $(I_A, I_B)$ that contain a fragment from $I_C$.

[Figure: an example solution with the cassette for $I_C = \{C_3, C_4, C_5, C_6, C_7\}$ marked.]

Cassette left and right overlap

The left overlap of a cassette is $m_A - m_B$, where $m_A$ and $m_B$ are the starting positions of the leftmost fragments of $I_A$ and $I_B$. The right overlap is defined in the same way from the ending positions of the rightmost fragments.

Cassette exchange

[Figure: cassette exchange example; the exchanged cassettes have left overlap = -2 and right overlap = 4.]

Two cassettes with the same left and right overlaps can be exchanged.

Cassette Reflection

Left overlap = -1, right overlap = 1:

  A:  1 3 12 3
  C:  1 1 2 2 6 3 1 2 1
  B:  2 4 6 3 3 1

after reflecting the cassette:
  A:  1 3 12 3
  C:  1 1 2 1 3 6 2 2 1
  B:  2 3 3 6 4 1

A cassette whose left and right overlaps have the same size but different signs can be reflected.

Cassette Equivalence

Two solutions are cassette equivalent if there exists a series of cassette transformations (exchanges and reflections) that takes one to the other.

What is the size of an equivalence class?

Alternating Euler Paths

Consider a graph with colored edges.
  An Euler path (cycle) is a path (cycle) that goes through every edge exactly once.
  An alternating Euler path (cycle) is an Euler path (cycle) such that consecutive edges on the path (cycle) have different colors.

Pevzner (1995) showed that, given a solution, we can construct a special bi-colored graph called the border block graph. Each cassette equivalent solution corresponds to an alternating Euler path (cycle) in the graph, and vice versa.

Fact

Let $d_i(v)$ be the number of edges of color $i$ incident to $v$. A connected graph with edges colored A or B in which $d_A(v) = d_B(v)$ for every vertex $v$ has an alternating Euler cycle.

Proof: Every vertex has even degree; therefore, the graph contains an Euler cycle. Construct the Euler cycle the usual way, but when passing through a vertex always leave along an edge whose color differs from that of the edge used to enter.

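Exchange

Consider an alternating path $\dots x \dots y \dots x \dots y \dots$ that visits vertices $x$ and $y$ twice each. It consists of 5 parts $F_1 F_2 F_3 F_4 F_5$.

The transformation $F_1 F_2 F_3 F_4 F_5 \to F_1 F_4 F_3 F_2 F_5$ is called an exchange if $F_1 F_4 F_3 F_2 F_5$ is an alternating path.

[Figure: illustration of an exchange; the parts $F_2$ and $F_4$ between $x$ and $y$ are swapped.]

Reflection

Consider an alternating path $\dots x \dots x \dots$ that visits vertex $x$ twice. It consists of 3 parts $F_1 F_2 F_3$.

The transformation $F_1 F_2 F_3 \to F_1 F_2^{-} F_3$ is called a reflection if $F_1 F_2^{-} F_3$ is an alternating path, where $F_2^{-}$ is the reverse of $F_2$.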

[Figure: illustration of a reflection; the part $F_2$ that starts and ends at $x$ is traversed in reverse.]

Fact

Every two alternating Euler cycles in a bi-colored graph can be transformed into each other by a series of exchanges and reflections.

Proof: Pevzner p. 29

The border blocks

Let $I(A_i) = \{C_k : C_k \subseteq A_i\}$.
Let $I(B_j) = \{C_k : C_k \subseteq B_j\}$.

If $|I(X)| > 1$, define the border blocks of $X$ to be the leftmost and rightmost blocks in $I(X)$.

$C_i$ is a border block if it is a border block for some fragment $X$.

  A:  1 3 12 3
  C:  1 1 2 2 6 3 1 2 1
  B:  2 4 6 3 3 1

$I(A_3) = \{C_4, C_5, C_6, C_7\}$
Border blocks of $A_3$: $C_4$ and $C_7$

Lemma

Each fragment $X$ with $|I(X)| > 1$ contains exactly two border blocks.

$|I(A_i) \cap I(B_j)| \le 1$

Assume no cuts in A and B coincide; then each border block is a border block for some $A_i$ and some $B_j$, except $C_1$ and $C_l$.

Border graph

Let $\mathcal{B} = \{C_k : C_k \text{ is a border block}\}$.

Vertices: $V = \{|C_k| : C_k \in \mathcal{B}\}$
Edges: $E = \{(|C_i|, |C_j|) : C_i \text{ and } C_j \text{ are the border blocks in } I(X) \text{ for some fragment } X\}$

Each edge is labeled by its $X$ and colored A if $X \in A$ and B if $X \in B$.
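The border blocks of the example can be computed mechanically. The sketch below is one possible reading of the definitions: a block $C_k$ belongs to $I(X)$ when its interval lies inside the interval of $X$, and blocks are reported as (index, length) pairs with 1-based indices. The function names are illustrative.

```python
from itertools import accumulate

def intervals(order):
    """(start, end) position of each fragment when laid out left to right."""
    ends = list(accumulate(order))
    return list(zip([0] + ends[:-1], ends))

def border_blocks(a_order, b_order):
    """1-based index and length of every border block of the solution (a_order, b_order)."""
    sites = sorted({0} | set(accumulate(a_order)) | set(accumulate(b_order)))
    c_intervals = list(zip(sites, sites[1:]))
    borders = set()
    for start, end in intervals(a_order) + intervals(b_order):
        # I(X): indices of the blocks C_k contained in fragment X
        inside = [k for k, (cs, ce) in enumerate(c_intervals)
                  if start <= cs and ce <= end]
        if len(inside) > 1:
            borders.update((inside[0], inside[-1]))
    return [(k + 1, c_intervals[k][1] - c_intervals[k][0]) for k in sorted(borders)]

blocks = border_blocks([1, 3, 12, 3], [2, 4, 6, 3, 3, 1])
print(blocks)
# [(1, 1), (2, 1), (3, 2), (4, 2), (7, 1), (8, 2), (9, 1)]
print([length for _, length in blocks])  # [1, 1, 2, 2, 1, 2, 1]
```

The second printed list is the sequence of border block lengths read from left to right, which is the vertex sequence of the path through the border graph for this solution.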

[Figure: border block graph for the solution A: 1 3 12 3, C: 1 1 2 2 6 3 1 2 1, B: 2 4 6 3 3 1; vertices are border block lengths and edges are labeled by fragment indices.]

Alternating Euler path in border block graph

Each vertex has an equal number of edges of each color, except possibly for $|C_1|$ and $|C_l|$. By adding one or two edges (depending on the colors) we can fix this. Therefore, the graph has an alternating Euler path or cycle.

Let $C_1 \dots C_m$ be the ordered set of border blocks; then $P = |C_1| \dots |C_m|$ is an alternating Euler path (cycle).

Result

Cassette transformations do not change the border graph.

Let $P$ be the alternating Euler path (cycle) corresponding to a solution $[A, B]$.

If a solution $[A', B']$ is obtained from $[A, B]$ by cassette exchange (reflection), it will have a path $P'$ that can be obtained from $P$ by an exchange (reflection).

Let $P'$ be an alternating Euler path (cycle) obtained from $P$ by exchange (reflection). Then there is a solution $[A', B']$ that can be obtained from $[A, B]$ by cassette exchange (reflection), where $P'$ corresponds to $[A', B']$.

  A:  1 3 12 3
  C:  1 1 2 2 6 3 1 2 1
  B:  2 4 6 3 3 1

This corresponds to the cycle (cycle 1):
  edge labels:  1 2 2 3 5 4
  vertices:     1 1 2 2 1 2 1

After the cassette reflection:
  A:  1 3 12 3
  C:  1 1 2 1 3 6 2 2 1
  B:  2 3 3 6 4 1

This corresponds to the cycle:
  edge labels:  1 4 5 3 2 2
  vertices:     1 1 2 1 2 2 1

[Figure: step-by-step transformation of cycle 1 into cycle 2 by exchanges and reflections in the border block graph, shown next to the corresponding solutions.]

DDP is NP-complete

Proof:
  DDP ∈ NP: a solution for DDP can be verified in polynomial time.
  The Set Partition problem (the classical one), which is NP-complete, reduces to DDP in polynomial time.

DDP ∈ NP

Given
  multiset A of lengths from enzyme A
  multiset B of lengths from enzyme B
  multiset C of lengths from enzymes A + B
and a solution: two sets of restriction sites, $a$ and $b$.

Verification:
  Sort $g = a \cup b \cup \{0, L\}$, where $L$ is the sum of all lengths in A.
  Compute the multiset $c = \{c_i : c_i = g_{i+1} - g_i,\ 0 < i < |g| \text{ and } g_{i+1} \ne g_i\}$.
  Sort $c$ and $C$ and compare them.
  (The same computation on $a \cup \{0, L\}$ and on $b \cup \{0, L\}$ checks the sites against A and B.)

Set Partition

Set Partition (SP): given a set $I$ of integers with total sum $J$, can we partition $I$ into two sets of sum $J/2$ each? SP is NP-complete.

Given an SP instance, construct the following DDP instance:
  A = $I$
  B = $\{J/2, J/2\}$
  C = $I$

DDP has a solution iff SP has a solution.
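Both halves of the proof can be illustrated with a short sketch. `verify_ddp` follows the verification steps above and also checks the sites against A and B; `set_partition_to_ddp` builds the reduction. The two brute-force deciders are exponential and exist only to compare answers on tiny instances. All names are hypothetical, and `Fraction` is used so that an odd total $J$ still yields exact values for $J/2$.

```python
from fractions import Fraction
from itertools import accumulate, combinations, permutations

def fragment_lengths(sites, total):
    """Sorted gaps between consecutive sites after adding the two ends 0 and total."""
    g = sorted(set(sites) | {0, total})
    return sorted(g[i + 1] - g[i] for i in range(len(g) - 1))

def verify_ddp(A, B, C, a_sites, b_sites):
    """Polynomial-time check of a proposed solution given as two sets of restriction sites."""
    total = sum(A)
    return (fragment_lengths(a_sites, total) == sorted(A)
            and fragment_lengths(b_sites, total) == sorted(B)
            and fragment_lengths(set(a_sites) | set(b_sites), total) == sorted(C))

def set_partition_to_ddp(I):
    """DDP instance (A, B, C) built from a Set Partition instance I."""
    half = Fraction(sum(I), 2)
    return list(I), [half, half], list(I)

def has_partition(I):
    """Brute force: does some subset of I sum to half of the total?"""
    return any(2 * sum(subset) == sum(I)
               for r in range(len(I) + 1) for subset in combinations(I, r))

def ddp_has_solution(A, B, C):
    """Brute force over all orderings of A and B."""
    return any(verify_ddp(A, B, C, list(accumulate(a)), list(accumulate(b)))
               for a in set(permutations(A)) for b in set(permutations(B)))

for I in ([3, 1, 1, 2, 2, 1], [1, 1, 4], [1, 2]):
    print(I, has_partition(I), ddp_has_solution(*set_partition_to_ddp(I)))
```

Since C = I, the single interior cut of B at $J/2$ must coincide with a cut of A, which happens exactly when some prefix of an ordering of $I$ sums to $J/2$.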

Partial Digest

[Figure: enzyme A is applied to the uncut DNA, producing a set of fragment lengths.]

One restriction enzyme only.
Obtain fragments for
  every pair of cuts
  every cut and a boundary
This can be achieved by missing a number of sites in every trial.

Partial Digest Problem (turnpike problem)

Partial Digest Problem (PDP): given a multiset $X$ of distances between every pair of $n$ points on the line, with

$|X| = \binom{n}{2} = \frac{n(n-1)}{2}$

reconstruct the points on the line.

PDP
  No polynomial time algorithm is known.
  Not known to be NP-complete.
  Practical backtracking algorithm due to Skiena et al. (1990).
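The input of PDP is easy to produce from a known set of points, which is handy for testing a reconstruction method. A minimal sketch follows; the point set used is an illustrative assumption, chosen because it generates the multiset $X = \{2, 2, 3, 3, 4, 5, 6, 7, 8, 10\}$.

```python
from itertools import combinations

def pairwise_distances(points):
    """Sorted multiset of distances between every pair of points on the line."""
    return sorted(abs(q - p) for p, q in combinations(points, 2))

points = [0, 2, 4, 7, 10]  # assumed example; any point set works
X = pairwise_distances(points)
print(X)       # [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
print(len(X))  # 10, which is n(n-1)/2 for n = 5
```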

The algorithm

Find the longest distance in $X$; this decides the two outermost points. Delete that distance from $X$.

Repeatedly position the longest remaining distance of $X$. Since the longest remaining distance must be realized from one of the two outermost points, we have two possible positions (left and right) for the new point.

For each of these two positions, check whether all the distances from the position to the points already positioned are in $X$. If they are, delete all those distances from $X$ and proceed.

Backtrack if this check fails for both positions.

Example: $X = \{2, 2, 3, 3, 4, 5, 6, 7, 8, 10\}$

[Figure: step-by-step placement of the points for this $X$, continued over two slides.]
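The steps above can be sketched as a recursive search. This is an illustrative implementation, not the authors' code: it keeps $X$ in a `collections.Counter` instead of a sorted array, fixes the leftmost point at 0, and collects every point set it finds (a solution and its mirror image both appear).

```python
from collections import Counter

def partial_digest(distances):
    """All point sets starting at 0 whose pairwise distances equal the given multiset."""
    X = Counter(distances)
    width = max(X)          # longest distance fixes the two outermost points
    X[width] -= 1
    found = set()
    _place(X, [0, width], width, found)
    return sorted(found)

def _place(X, points, width, found):
    remaining = +X          # unary plus drops entries whose count is zero
    if not remaining:
        found.add(tuple(sorted(points)))
        return
    y = max(remaining)      # longest remaining distance
    for candidate in {y, width - y}:   # measured from the left end or from the right end
        needed = Counter(abs(candidate - p) for p in points)
        if all(X[d] >= count for d, count in needed.items()):
            X.subtract(needed)         # delete the distances the new point explains
            points.append(candidate)
            _place(X, points, width, found)
            points.pop()               # backtrack
            X.update(needed)

solutions = partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10])
for s in solutions:
    print(s)
```

A counter makes membership tests and deletions straightforward; a sorted array with binary search, as assumed in the running-time analysis, would serve the same purpose.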

Worst case: Exponential

We could end up backtracking an exponential number of times in this decision tree.

It is not trivial to show that an exponential number of backtracking steps can occur. Zhang [1994] provided an example.

In practice

If we have real points in general position, then one of the two choices will be pruned with probability 1.

The algorithm runs in $O(n^2 \log n)$ expected time, since we have $O(n^2)$ positions and all operations can be done on $X$ using binary search, which takes $O(\log n^2) = O(\log n)$.

[Figure: double digest experiments with enzyme 1 and enzyme 2 applied to the uncut DNA, each producing a set of fragment lengths.]

  enzyme A:      A = {1, 2, 3, 5, 6}
  enzyme B:      B = {1, 3, 4, 9}
  enzymes A + B: C = {1, 1, 1, 2, 2, 3, 3, 4}

  A:  6 3 1 2 5
  C:  4 1 1 3 1 2 2 3
  B:  4 1 9 3