Bioinformatics. Lecture 16. Saad Mneimneh
1 Bioinformatics Lecture 16
2 DNA sequencing To sequence a DNA molecule is to obtain the string of bases that it contains. It is impossible to sequence the whole DNA molecule directly. We may, however, obtain a piece of a certain length cut at random and sequence it. This is called a fragment. By using cloning and cutting techniques we can obtain a large number of sequenced fragments. The goal is to reconstruct the DNA molecule based on how the fragments overlap.
3 Ideal case
We know the approximate length of the DNA (e.g. about 10 bases).
There are no errors in sequencing the fragments.
Fragments: ACCGT, CGTGC, TTAC, TACCGT
Align the sequences, ignoring end gaps:
    --ACCGT--
    ----CGTGC
    TTAC-----
    -TACCGT--
Find the consensus by majority voting:
    TTACCGTGC
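The majority vote can be sketched in a few lines of Python. This is only an illustration, not the lecture's code: the function name `consensus` and the convention that `-` pads every row to the same width are assumptions made here. End gaps cast no vote, gaps inside a fragment vote like a base, and a column won by a gap is dropped. Ties are broken arbitrarily.

```python
from collections import Counter

def consensus(rows):
    """Column-wise majority vote over aligned rows of equal length."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("rows must be padded to the same length")
    # For each row, the half-open column range actually covered by the fragment.
    spans = [(len(r) - len(r.lstrip("-")), len(r.rstrip("-"))) for r in rows]
    result = []
    for j in range(width):
        # Only rows whose fragment covers column j vote (end gaps are ignored).
        votes = Counter(r[j] for r, (start, end) in zip(rows, spans) if start <= j < end)
        if votes:
            winner = votes.most_common(1)[0][0]
            if winner != "-":          # a winning gap is discarded
                result.append(winner)
    return "".join(result)

ideal = ["--ACCGT--", "----CGTGC", "TTAC-----", "-TACCGT--"]
print(consensus(ideal))  # TTACCGTGC
```

The same function also handles alignments with internal gaps, such as those arising from insertion or deletion errors.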
4 Insertion errors
Fragments: ACCGT, CAGTGC, TTAC, TACCGT
    --ACC-GT--
    ----CAGTGC
    TTAC------
    -TACC-GT--
    TTACC-GTGC   (consensus)
An A was inserted in the second fragment. The gap in the consensus is discarded. In this example, reconstruction still works because of majority voting.
5 Deletion error
Fragments: ACCGT, CGTGC, TTAC, TACGT (the fourth fragment, TACCGT, was read with one C missing)
    --ACCGT--
    ----CGTGC
    TTAC-----
    -TAC-GT--
    TTACCGTGC   (consensus)
A C was deleted from the 4th fragment. The consensus still works.
6 Chimeric fragment
Two disjoint fragments join to form one fragment that is not originally part of the DNA.
Fragments: ACCGT, CGTGC, TTAC, TACCGT, TTATGC
    --ACCGT--
    ----CGTGC
    TTAC-----
    -TACCGT--
    TTACCGTGC   (consensus)
    TTA---TGC   (chimeric fragment)
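7 Unknown orientation
Which strand does a particular fragment belong to?
Fragments: CACGT, ACGT, ACTACG, GTACT, ACTGA, CTGA
    CACGT--------
    -ACGT--------
    --CGTAGT-----   (reverse complement of ACTACG)
    -----AGTAC---   (reverse complement of GTACT)
    --------ACTGA
    ---------CTGA
    CACGTAGTACTGA   (consensus)
With n fragments we have $2^n$ possible orientation assignments.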
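A small Python sketch of the reverse complement and of the $2^n$ orientation choices; the helper names `reverse_complement` and `orientations` are made up for this illustration.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Read the opposite strand: reverse the sequence and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def orientations(fragments):
    """Yield every way of keeping or flipping each fragment (2^n lists in total)."""
    for flips in product((False, True), repeat=len(fragments)):
        yield [reverse_complement(f) if flip else f
               for f, flip in zip(fragments, flips)]

print(reverse_complement("ACTACG"))  # CGTAGT
print(reverse_complement("GTACT"))   # AGTAC
```

Enumerating all orientations is only feasible for a handful of fragments, which is why orientation has to be handled inside the assembly model itself.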
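8 Repeats
    A X B X C X D
    A X C X B X D
Repeats of the form X ... X ... X: both layouts are consistent with the same fragments, so the segments between the copies of X (here B and C) can be swapped.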
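9 Repeats
    A X B Y C X D Y E
    A X D Y C X B Y E
Repeats of the form X ... Y ... X ... Y: both layouts are consistent with the same fragments, so the segments B and D can be swapped.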
10 Inverted repeats
A region X (e.g. CGA) reappears elsewhere as its reverse complement (TCG), i.e. as an inverted copy of X on the opposite strand.
[Figure: inverted repeat]
11 Lack of coverage
When some area of the DNA is not covered by any fragment, we have more than one contig.
[Figure: two contigs separated by an uncovered area]
12 Number of fragments It is important to know how many fragments we need to generate in order to achieve a certain coverage. Let $T$ denote the length of the DNA. Assume all fragments have length $l$ and that we can detect overlaps of at least $t$ bases. If we sample $n$ fragments at random, what is the expected number of contigs? $E[\#\,\text{contigs}] \approx n e^{-n(l-t)/T}$
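To get a feel for the formula, the following Python sketch evaluates $n e^{-n(l-t)/T}$ for a few sample sizes. The function name and the numeric values (a 45,000-base target, 500-base fragments, 50-base detectable overlap) are arbitrary choices for illustration, not values from the lecture.

```python
import math

def expected_contigs(n, l, t, T):
    """Expected number of contigs for n random fragments of length l,
    minimum detectable overlap t, and target length T."""
    return n * math.exp(-n * (l - t) / T)

for n in (50, 100, 200, 400, 800):
    print(n, round(expected_contigs(n, l=500, t=50, T=45_000), 2))
```

The expected count first grows with n (more fragments, still mostly isolated) and then falls as overlaps merge fragments; setting the derivative to zero shows that the peak is at n = T/(l - t).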
13 Alternative methods
Shortest common superstring (SCS): "An elegant theoretical abstraction, but fundamentally flawed" (R. Karp)
Generalized SCS: models errors and orientations
Multicontig: models errors, orientations, and coverage
14 SCS Given a set of fragments $F$, find the shortest string $s$ that contains every $f \in F$ as a substring. This is NP-hard. The SCS might not be what we really want.
15 Bad example (repeats)
[Figure: a target containing two copies of a repeat X, compared with what the shortest common superstring gives.]
16 Generalized SCS Given a set of fragments $F$, find the shortest string $s$ that contains either $f$ or $\bar{f}$ (the reverse complement of $f$) as a substring, for every $f \in F$. Now it models orientations.
17 Generalized SCS (cont.) Given a set of fragments $F$, $\varepsilon > 0$, and a distance function $d$, find the shortest string $s$ that contains, for every $f \in F$, a substring $x$ such that $\min[d(f,x), d(\bar{f},x)] \le \varepsilon |f|$. Now it models both orientations and errors.
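The feasibility condition can be checked directly for a candidate string. The sketch below assumes that d is the edit distance (the slide leaves d unspecified) and uses the standard dynamic program for approximate substring matching; the names `best_substring_distance` and `covers` are illustrative only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def best_substring_distance(f, s):
    """min over all substrings x of s of the edit distance d(f, x)."""
    prev = [0] * (len(s) + 1)              # a match may start anywhere in s for free
    for i in range(1, len(f) + 1):
        cur = [i] + [0] * len(s)
        for j in range(1, len(s) + 1):
            cost = 0 if f[i - 1] == s[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,   # match or substitution
                         prev[j] + 1,          # character of f left unmatched
                         cur[j - 1] + 1)       # extra character of s inside x
        prev = cur
    return min(prev)                        # a match may end anywhere in s

def covers(s, fragments, eps):
    """True if every fragment, in one of its two orientations, is within eps*|f| of a substring of s."""
    return all(
        min(best_substring_distance(f, s),
            best_substring_distance(reverse_complement(f), s)) <= eps * len(f)
        for f in fragments
    )

# CAGTGC carries an inserted A; GTAA is TTAC read from the other strand.
print(covers("TTACCGTGC", ["ACCGT", "CAGTGC", "GTAA", "TACCGT"], eps=0.2))  # True
```

This only verifies a given s; finding the shortest such s is the hard part.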
18 Multicontig For a given set of fragments $F$, a contig is a multiple alignment containing either $f$ or $\bar{f}$ for every $f \in F$. A contig has an $\varepsilon$-consensus iff each fragment $f$ (or $\bar{f}$) differs from its image in the consensus by at most $\varepsilon |f|$. A contig is a $t$-contig if the smallest overlap that is not contained in any fragment is at least $t$ ($t$ is a measure of coverage).
19 Multicontig Given a set of fragments $F$, $\varepsilon > 0$, and $t > 0$, partition $F$ into a minimum number of subsets such that each subset has a $t$-contig with $\varepsilon$-consensus. This is NP-hard. This models errors, orientations, and coverage.
20 Solving SCS We are going to consider a Hamiltonian path approach to solving the SCS problem
21 Overlap graph Consider the complete directed weighted graph $G = (V, E)$, called the overlap graph. $V = F$ (each fragment is a vertex). $(u,v) \in E$ with weight $-t$ iff $t$ is the length of the maximal suffix of $u$ that is a prefix of $v$. We allow self loops and zero-weight edges.
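A direct (quadratic, unoptimized) way to build this graph in Python is sketched below. The names `overlap` and `overlap_graph` are illustrative, and one choice is an assumption made here rather than part of the definition: for a self loop only proper overlaps are counted, so a fragment does not trivially overlap itself completely.

```python
def overlap(u, v):
    """Length of the longest suffix of u that is also a prefix of v."""
    limit = min(len(u), len(v))
    if u == v:
        limit -= 1                      # assumption: self loops use proper overlaps only
    for k in range(limit, 0, -1):       # try the longest candidate first
        if u.endswith(v[:k]):
            return k
    return 0

def overlap_graph(fragments):
    """Complete directed graph as a dict: (u, v) -> weight -overlap(u, v)."""
    return {(u, v): -overlap(u, v) for u in fragments for v in fragments}

graph = overlap_graph(["TACGA", "ACCC", "CTAAAG", "GACA"])
for (u, v), w in graph.items():
    if w != 0:
        print(u, "->", v, w)
```

For four fragments this produces 16 edges, most of them with weight 0.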
22 Example
a = TACGA, b = ACCC, c = CTAAAG, d = GACA
[Figure: overlap graph on a, b, c, d; zero-weight edges not shown]
Nonzero-weight edges: a→d (-2), a→b (-1), b→c (-1), c→d (-1), d→b (-1)
23 A path defines a superstring Every simple path $P$ in the overlap graph involving a set of vertices (fragments) $A$ defines a superstring $s(P)$ for the set $A$. Therefore, a Hamiltonian path in the overlap graph defines a superstring for the set of fragments $F$. A Hamiltonian path must exist because the graph is complete (how many do we have?).
24 Example
a = TACGA, b = ACCC, c = CTAAAG, d = GACA
[Figure: overlap graph on a, b, c, d; zero-weight edges not shown]
P = a d b c
    TACGA
       GACA
          ACCC
             CTAAAG
s(P) = TACGACACCCTAAAG
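Reading a superstring off a path amounts to appending, for each edge, the part of the next fragment that is not already covered by the overlap. A minimal Python sketch follows; the function names are illustrative, and it assumes that no fragment on the path is a substring of another.

```python
def overlap(u, v):
    """Length of the longest suffix of u that is also a prefix of v."""
    limit = min(len(u), len(v))
    if u == v:
        limit -= 1                      # assumption: proper overlaps only for self loops
    for k in range(limit, 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0

def superstring(path):
    """s(P): glue consecutive fragments of the path at their maximal overlaps."""
    s = path[0]
    for u, v in zip(path, path[1:]):
        s += v[overlap(u, v):]          # skip the characters already present
    return s

print(superstring(["TACGA", "GACA", "ACCC", "CTAAAG"]))  # TACGACACCCTAAAG
```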
25 Does a superstring define a path? We have seen that every Hamiltonian path corresponds to a superstring. Is the converse true? No: a superstring can contain arbitrary characters that are not present in any fragment. Does a shortest superstring correspond to a Hamiltonian path? Yes, if $F$ is substring-free, i.e. no fragment in $F$ is contained in another.
26 Example
a = AGC, b = G, c = CT
[Figure: overlap graph on a, b, c]
The shortest superstring is AGCT. There is no Hamiltonian path $P$ such that $s(P)$ = AGCT.
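27 Substring-free collection F Let $F$ be a substring-free set. Then for every shortest superstring $s$, there is a Hamiltonian path $P$ such that $s(P) = s$.
Proof: assume the fragments appear in $s$ in the order a, b, c, d, ... (there are no gaps, and no fragment can be contained in another). The overlap between consecutive fragments, e.g. between a and b, must be the maximal overlap between them; otherwise $s$ could be shortened.
[Figure: fragments a, b, c, d laid out along s with overlaps $t_1$, $t_2$, 0]
Hamiltonian path: a → b (weight $-t_1$), b → c (weight $-t_2$), c → d (weight 0), etc.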
28 Non substring-free F If $F$ is not substring-free, then we can remove from $F$ all fragments that are substrings of other fragments. We end up with a set $F'$. But any superstring of $F'$ is a superstring of $F$. Therefore, we can use $F'$.
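The reduction to a substring-free set is a simple filter. In this Python sketch (function name chosen for illustration) exact duplicates are also collapsed, since two equal fragments are substrings of each other.

```python
def remove_substrings(fragments):
    """Drop every fragment that occurs inside another fragment of the set."""
    unique = list(dict.fromkeys(fragments))   # remove duplicates, keep order
    return [f for f in unique
            if not any(f != g and f in g for g in unique)]

print(remove_substrings(["AGC", "G", "CT"]))  # ['AGC', 'CT']
```

On the earlier three-fragment example, G is removed because it occurs inside AGC, and the path AGC → CT then spells AGCT.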
29 Length of string vs. weight of path Let $P$ be a Hamiltonian path. Let $w(P)$ be the weight of $P$. Let $\|F\| = \sum_{a \in F} |a|$. Then $|s(P)| = \|F\| + w(P)$ [proof is simple]. Therefore, the shortest common superstring corresponds to the Hamiltonian path with minimum weight.
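As a check of this identity on the earlier four-fragment example (a = TACGA, b = ACCC, c = CTAAAG, d = GACA, path P = a d b c), writing $\|F\|$ for the total length of the fragments:

```latex
\begin{align*}
\|F\|  &= |a| + |b| + |c| + |d| = 5 + 4 + 6 + 4 = 19 \\
w(P)   &= w(a,d) + w(d,b) + w(b,c) = -2 - 1 - 1 = -4 \\
|s(P)| &= \|F\| + w(P) = 19 - 4 = 15 = |\texttt{TACGACACCCTAAAG}|
\end{align*}
```

Each edge of weight $-t$ on the path saves exactly $t$ characters relative to plain concatenation, which is where the identity comes from.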
30 Proof Let $P$ be a Hamiltonian path with minimum weight. Claim: $s(P)$ is a shortest superstring. Suppose not, and let $s$ be a shortest superstring with $|s| < |s(P)|$. Then there is a Hamiltonian path $P'$ such that $s = s(P')$, so $|s(P')| = \|F\| + w(P') < |s(P)| = \|F\| + w(P)$. Therefore $w(P') < w(P)$, a contradiction.
31 Hamiltonian path approach Finding a minimum-weight Hamiltonian path is NP-hard (you can reduce HAMPATH to it). Unfortunately, we cannot expect a fundamentally better exact approach to SCS, because SCS itself is NP-hard. Let's consider a greedy algorithm for finding a Hamiltonian path.
32 Greedy algorithm
Greedy: start with an empty path and repeatedly add the least-weight available edge until you get a Hamiltonian path.
Every time we add an edge (u,v), we need to check that:
    (u,v) does not create a cycle with the previously added edges
    u has no previously added outgoing edge
    v has no previously added incoming edge
33 Greedy algorithm
    sort edges by their weight: e_1, e_2, ..., e_|E|
    for all v ∈ V
        in(v) ← 0
        out(v) ← 0
    H ← ∅
    i ← 1
    while |H| < |F| - 1
        (u,v) ← e_i
        if out(u) = 0 and in(v) = 0 then
            if H ∪ {e_i} does not contain a cycle   [disjoint set data structure]
                H ← H ∪ {e_i}
                out(u) ← 1
                in(v) ← 1
        i ← i + 1
To build the graph: $O(n \|F\|)$, where $n = |F|$ is the number of fragments (could be done optimally in $O(n^2 + \|F\|)$ using suffix trees).
To run the algorithm: $O(n^2 \log n)$.
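The pseudocode translates fairly directly into Python. The sketch below is one possible rendering, not the lecture's implementation: it builds the overlaps naively, skips self loops (they can never be part of a path), breaks ties between equal weights by fragment index, and assumes a non-empty substring-free input. A small union-find plays the role of the disjoint set data structure.

```python
def overlap(u, v):
    """Length of the longest suffix of u that is also a prefix of v."""
    limit = min(len(u), len(v))
    if u == v:
        limit -= 1                      # assumption: proper overlaps only for equal strings
    for k in range(limit, 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0

def greedy_superstring(fragments):
    n = len(fragments)
    # Edges (weight, u, v) sorted so that the most negative weight comes first.
    edges = sorted((-overlap(fragments[i], fragments[j]), i, j)
                   for i in range(n) for j in range(n) if i != j)
    parent = list(range(n))             # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    succ = [None] * n                   # chosen outgoing edge of each vertex
    has_in = [False] * n                # whether a vertex already has an incoming edge
    chosen = 0
    for _, u, v in edges:
        if chosen == n - 1:
            break
        if succ[u] is None and not has_in[v] and find(u) != find(v):
            succ[u] = v
            has_in[v] = True
            parent[find(u)] = find(v)
            chosen += 1

    # Walk the Hamiltonian path from its unique start vertex and spell s(P).
    u = next(i for i in range(n) if not has_in[i])
    s = fragments[u]
    while succ[u] is not None:
        v = succ[u]
        s += fragments[v][overlap(fragments[u], fragments[v]):]
        u = v
    return s

result = greedy_superstring(["ATGC", "GCC", "TGCAT"])
print(result, len(result))
```

On the fragments ATGC, GCC, TGCAT this returns a superstring of length 9. Which length-9 string comes out depends on how ties among zero-weight edges are broken, so it need not be the one shown on the slide.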
34 Example
Fragments: ATGC, GCC, TGCAT
Greedy algorithm will choose: ATGC → TGCAT → GCC, giving ATGCATGCC
Optimal is: TGCAT → ATGC → GCC, giving TGCATGCC
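For an instance this small, the optimum can be confirmed by simply trying every Hamiltonian path. The following brute-force Python sketch (illustrative names, exponential running time, substring-free input assumed) does exactly that.

```python
from itertools import permutations

def overlap(u, v):
    """Length of the longest suffix of u that is also a prefix of v."""
    limit = min(len(u), len(v))
    if u == v:
        limit -= 1                      # assumption: proper overlaps only for equal strings
    for k in range(limit, 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0

def superstring(path):
    """Glue consecutive fragments of the path at their maximal overlaps."""
    s = path[0]
    for u, v in zip(path, path[1:]):
        s += v[overlap(u, v):]
    return s

def shortest_superstring_bruteforce(fragments):
    """Try all n! orderings; only usable for very small n."""
    return min((superstring(p) for p in permutations(fragments)), key=len)

print(shortest_superstring_bruteforce(["ATGC", "GCC", "TGCAT"]))  # TGCATGCC
```

The result has length 8, one character shorter than what the greedy choice of the heaviest overlap first can achieve on this input.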
35 Sequencing By Hybridization (SBH) Use all possible probes of length $l$ and obtain hybridization data with the DNA. If there are no errors, we have all substrings of length $l$. We would like to reconstruct the DNA from those substrings. We can formalize this as SCS and solve it as before. But we can simplify a little bit.
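In the error-free case, the hybridization data is just the set of length-l substrings of the DNA. A one-function Python sketch (the name `spectrum` is an assumption of this example):

```python
def spectrum(dna, l):
    """Set of all substrings of length l occurring in dna."""
    return {dna[i:i + l] for i in range(len(dna) - l + 1)}

print(sorted(spectrum("ATGCGTGGCA", 3)))
```

Because the result is a set, a substring that occurs several times in the DNA appears only once, which is one reason reconstruction from hybridization data can be ambiguous.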
36 SBH and SCS SBH is a special case of the SCS problem where all fragments of $F$ have the same length $l$. In the overlap graph, we will keep only the edges with weight equal to $-(l-1)$. By construction of these fragments, we know that there must be a Hamiltonian path in this modified overlap graph. All Hamiltonian paths now have the same weight, $-(n-1)(l-1)$, where $n$ is the number of fragments. Thus we only need to find a Hamiltonian path (still NP-complete).
37 Example
l = 3
Fragments: ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT
[Figure: modified overlap graph with the fragments as vertices]
38 Idea Instead of representing fragments as vertices, represent them as edges. Then, instead of looking for a Hamiltonian path (a path that goes through each vertex once), look for an Euler path (a path that goes through each edge once). An Euler path can be found in linear time.
39 Fragments as edges Construct a directed graph $G = (V, E)$. $V$: the fragments of length $l-1$ (these can be obtained from our set $F$ by considering the first and last $l-1$ characters of each fragment). $E$: a directed edge $(u,v)$ for each fragment in $F$ that starts with $u$ and ends with $v$.
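The construction is short enough to write out. In this Python sketch the graph is stored as an adjacency list keyed by the (l-1)-length prefix of each fragment; the function name `fragment_graph` is made up for the example.

```python
from collections import defaultdict

def fragment_graph(fragments):
    """One edge per fragment: from its first l-1 characters to its last l-1 characters."""
    graph = defaultdict(list)
    for f in fragments:
        graph[f[:-1]].append(f[1:])
    return dict(graph)

graph = fragment_graph(["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"])
for u, targets in graph.items():
    print(u, "->", targets)
```

Vertices with no outgoing edge (such as CA for these fragments) appear only as targets in this representation.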
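40 Example
l = 3
Fragments (edges): ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT
Vertices: AT, TG, GG, GC, GT, CG, CA
[Figure: graph with one directed edge per fragment, e.g. ATG is the edge AT → TG]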
41 Euler Cycle By construction of the fragments, we know that the graph will have all vertices balanced except possibly for two unbalanced vertices (each occurrence of an $(l-1)$-length fragment is shared by two $l$-length fragments, except possibly for the first and last one). By adding an edge between the two unbalanced vertices we can make the graph balanced. Then we can find an Euler cycle in the graph (since it is balanced and connected, there is one).
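A hedged Python sketch of the reconstruction follows. It uses Hierholzer's algorithm and, instead of adding an edge to close the cycle, starts the walk at the vertex with one extra outgoing edge, which finds the same Euler path directly. The function names are illustrative and the input is assumed to be a non-empty, error-free set of fragments.

```python
from collections import defaultdict

def euler_path(edges):
    """Vertex sequence of an Euler path through the directed edges (u, v)."""
    out = defaultdict(list)
    balance = defaultdict(int)          # out-degree minus in-degree
    for u, v in edges:
        out[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    starts = [x for x, b in balance.items() if b == 1]
    start = starts[0] if starts else edges[0][0]   # balanced graph: any vertex works
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out[u]:
            stack.append(out[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())    # dead end: vertex is finished
    path.reverse()
    if len(path) != len(edges) + 1:
        raise ValueError("graph has no Euler path")
    return path

def reconstruct(fragments):
    """Spell the DNA along an Euler path of the fragments-as-edges graph."""
    path = euler_path([(f[:-1], f[1:]) for f in fragments])
    return path[0] + "".join(v[-1] for v in path[1:])

print(reconstruct(["ATG", "TGG", "TGC", "GTG", "GGC", "GCA", "GCG", "CGT"]))
```

For these eight fragments the graph has two Euler paths, spelling ATGCGTGGCA and ATGGCGTGCA; both have exactly the same set of length-3 substrings, so the data alone cannot distinguish them.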
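42 Example
l = 3
Fragments (edges): ATG, TGG, TGC, GTG, GGC, GCA, GCG, CGT
Vertices: AT, TG, GG, GC, GT, CG, CA
[Figure: Euler path through the graph]
Reconstructed sequence: ATGCGTGGCA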
Lecture 15. Saad Mneimneh
Computat onal Biology Lecture 15 DNA sequencing Shortest common superstring SCS An elegant theoretical abstraction, but fundamentally flawed R. Karp Given a set of fragments F, Find the shortest string
More informationDNA sequencing. Bad example (repeats) Lecture 15. Shortest common superstring SCS. Given a set of fragments F,
Computat onal Biology Lecture 15 DNA sequencing Shortest common superstring SCS An elegant theoretical abstraction, but fundamentally flawed R. Karp Given a set of fragments F, Find the shortest string
More informationFragment Assembly of DNA
Wright State University CORE Scholar Computer Science and Engineering Faculty Publications Computer Science and Engineering 2003 Fragment Assembly of DNA Dan E. Krane Wright State University - Main Campus,
More informationFRAGMENT ASSEMBLY OF DNA
FRAGMENT ASSEMBLY OF DNA In Chapter 1 we saw the biological aspects of DNA sequencing. In this chapter we discuss the computational task involved in sequencing, which is called fragment assembly. The motivation
More informationGraph Algorithms in Bioinformatics
Graph Algorithms in Bioinformatics Outline 1. Introduction to Graph Theory 2. The Hamiltonian & Eulerian Cycle Problems 3. Basic Biological Applications of Graph Theory 4. DNA Sequencing 5. Shortest Superstring
More informationBio nformatics. Lecture 3. Saad Mneimneh
Bio nformatics Lecture 3 Sequencing As before, DNA is cut into small ( 0.4KB) fragments and a clone library is formed. Biological experiments allow to read a certain number of these short fragments per
More informationLecture 15: Realities of Genome Assembly Protein Sequencing
Lecture 15: Realities of Genome Assembly Protein Sequencing Study Chapter 8.10-8.15 1 Euler s Theorems A graph is balanced if for every vertex the number of incoming edges equals to the number of outgoing
More informationProblem: Shortest Common Superstring. The Greedy Algorithm for Shortest Common Superstrings. Overlap graphs. Substring-freeness
Problem: Shortest Common Superstring The Greedy Algorithm for Shortest Common Superstrings Course Discrete Biological Models (Modelli Biologici Discreti) Zsuzsanna Lipták Laurea Triennale in Bioinformatica
More informationPractical Bioinformatics
5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o
More informationPhysical Mapping. Restriction Mapping. Lecture 12. A physical map of a DNA tells the location of precisely defined sequences along the molecule.
Computat onal iology Lecture 12 Physical Mapping physical map of a DN tells the location of precisely defined sequences along the molecule. Restriction mapping: mapping of restriction sites of a cutting
More informationGel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1
Gel Electrophoresis Used to measure the lengths of DNA fragments. When voltage is applied to DNA, different size fragments migrate to different distances (smaller ones travel farther). 10/28/0310/21/2003
More informationCMPSCI 311: Introduction to Algorithms Second Midterm Exam
CMPSCI 311: Introduction to Algorithms Second Midterm Exam April 11, 2018. Name: ID: Instructions: Answer the questions directly on the exam pages. Show all your work for each question. Providing more
More informationApproximating Shortest Superstring Problem Using de Bruijn Graphs
Approximating Shortest Superstring Problem Using de Bruijn Advanced s Course Presentation Farshad Barahimi Department of Computer Science University of Lethbridge November 19, 2013 This presentation is
More informationA GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS *
A GREEDY APPROXIMATION ALGORITHM FOR CONSTRUCTING SHORTEST COMMON SUPERSTRINGS * 1 Jorma Tarhio and Esko Ukkonen Department of Computer Science, University of Helsinki Tukholmankatu 2, SF-00250 Helsinki,
More informationModelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics
582746 Modelling and Analysis in Bioinformatics Lecture 1: Genomic k-mer Statistics Juha Kärkkäinen 06.09.2016 Outline Course introduction Genomic k-mers 1-Mers 2-Mers 3-Mers k-mers for Larger k Outline
More informationPattern Matching (Exact Matching) Overview
CSI/BINF 5330 Pattern Matching (Exact Matching) Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Pattern Matching Exhaustive Search DFA Algorithm KMP Algorithm
More information10.4 The Kruskal Katona theorem
104 The Krusal Katona theorem 141 Example 1013 (Maximum weight traveling salesman problem We are given a complete directed graph with non-negative weights on edges, and we must find a maximum weight Hamiltonian
More informationData Structures in Java
Data Structures in Java Lecture 21: Introduction to NP-Completeness 12/9/2015 Daniel Bauer Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways
More informationSEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.
More informationSUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA
SUPPORTING INFORMATION FOR SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA Aik T. Ooi, Cliff I. Stains, Indraneel Ghosh *, David J. Segal
More informationNumber-controlled spatial arrangement of gold nanoparticles with
Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2016 Number-controlled spatial arrangement of gold nanoparticles with DNA dendrimers Ping Chen,*
More informationAnalysis and Design of Algorithms Dynamic Programming
Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................
More informationTrees. A tree is a graph which is. (a) Connected and. (b) has no cycles (acyclic).
Trees A tree is a graph which is (a) Connected and (b) has no cycles (acyclic). 1 Lemma 1 Let the components of G be C 1, C 2,..., C r, Suppose e = (u, v) / E, u C i, v C j. (a) i = j ω(g + e) = ω(g).
More informationSupplemental data. Pommerrenig et al. (2011). Plant Cell /tpc
Supplemental Figure 1. Prediction of phloem-specific MTK1 expression in Arabidopsis shoots and roots. The images and the corresponding numbers showing absolute (A) or relative expression levels (B) of
More informationCMSC 451: Lecture 7 Greedy Algorithms for Scheduling Tuesday, Sep 19, 2017
CMSC CMSC : Lecture Greedy Algorithms for Scheduling Tuesday, Sep 9, 0 Reading: Sects.. and. of KT. (Not covered in DPV.) Interval Scheduling: We continue our discussion of greedy algorithms with a number
More informationA PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS
A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology
More informationCrick s early Hypothesis Revisited
Crick s early Hypothesis Revisited Or The Existence of a Universal Coding Frame Ryan Rossi, Jean-Louis Lassez and Axel Bernal UPenn Center for Bioinformatics BIOINFORMATICS The application of computer
More informationHigh throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm
Electronic Supplementary Material (ESI) for Nanoscale. This journal is The Royal Society of Chemistry 2018 High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence
More informationRepeat resolution. This exposition is based on the following sources, which are all recommended reading:
Repeat resolution This exposition is based on the following sources, which are all recommended reading: 1. Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions,
More informationAdvanced topics in bioinformatics
Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib
More informationRegulatory Sequence Analysis. Sequence models (Bernoulli and Markov models)
Regulatory Sequence Analysis Sequence models (Bernoulli and Markov models) 1 Why do we need random models? Any pattern discovery relies on an underlying model to estimate the random expectation. This model
More information1 Some loose ends from last time
Cornell University, Fall 2010 CS 6820: Algorithms Lecture notes: Kruskal s and Borůvka s MST algorithms September 20, 2010 1 Some loose ends from last time 1.1 A lemma concerning greedy algorithms and
More informationNP-Complete Reductions 3
x 1 x 1 x 2 x 2 x 3 x 3 x 4 x 4 12 22 32 CS 4407 1 13 21 23 31 33 Algorithms NP-Complete Reductions 3 Prof. Gregory Provan Department of Computer Science University College Cork 1 HARDEST PROBLEMS IN NP
More informationTowards More Effective Formulations of the Genome Assembly Problem
Towards More Effective Formulations of the Genome Assembly Problem Alexandru Tomescu Department of Computer Science University of Helsinki, Finland DACS June 26, 2015 1 / 25 2 / 25 CENTRAL DOGMA OF BIOLOGY
More informationWhat we have done so far
What we have done so far DFAs and regular languages NFAs and their equivalence to DFAs Regular expressions. Regular expressions capture exactly regular languages: Construct a NFA from a regular expression.
More information1 More finite deterministic automata
CS 125 Section #6 Finite automata October 18, 2016 1 More finite deterministic automata Exercise. Consider the following game with two players: Repeatedly flip a coin. On heads, player 1 gets a point.
More information4. How to prove a problem is NPC
The reducibility relation T is transitive, i.e, A T B and B T C imply A T C Therefore, to prove that a problem A is NPC: (1) show that A NP (2) choose some known NPC problem B define a polynomial transformation
More informationLimitations of Algorithm Power
Limitations of Algorithm Power Objectives We now move into the third and final major theme for this course. 1. Tools for analyzing algorithms. 2. Design strategies for designing algorithms. 3. Identifying
More informationStructure-Based Comparison of Biomolecules
Structure-Based Comparison of Biomolecules Benedikt Christoph Wolters Seminar Bioinformatics Algorithms RWTH AACHEN 07/17/2015 Outline 1 Introduction and Motivation Protein Structure Hierarchy Protein
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 5 Greedy Algorithms Interval Scheduling Interval Partitioning Guest lecturer: Martin Furer Review In a DFS tree of an undirected graph, can there be an edge (u,v)
More informationHierarchical Overlap Graph
Hierarchical Overlap Graph B. Cazaux and E. Rivals LIRMM & IBC, Montpellier 8. Feb. 2018 arxiv:1802.04632 2018 B. Cazaux & E. Rivals 1 / 29 Overlap Graph for a set of words Consider the set P := {abaa,
More informationLecture 4: NP and computational intractability
Chapter 4 Lecture 4: NP and computational intractability Listen to: Find the longest path, Daniel Barret What do we do today: polynomial time reduction NP, co-np and NP complete problems some examples
More informationMore Approximation Algorithms
CS 473: Algorithms, Spring 2018 More Approximation Algorithms Lecture 25 April 26, 2018 Most slides are courtesy Prof. Chekuri Ruta (UIUC) CS473 1 Spring 2018 1 / 28 Formal definition of approximation
More informationCharacterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin
International Journal of Genetic Engineering and Biotechnology. ISSN 0974-3073 Volume 2, Number 1 (2011), pp. 109-114 International Research Publication House http://www.irphouse.com Characterization of
More informationApproximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs
Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs Haim Kaplan Tel-Aviv University, Israel haimk@post.tau.ac.il Nira Shafrir Tel-Aviv University, Israel shafrirn@post.tau.ac.il
More informationUC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 22 Lecturer: David Wagner April 24, Notes 22 for CS 170
UC Berkeley CS 170: Efficient Algorithms and Intractable Problems Handout 22 Lecturer: David Wagner April 24, 2003 Notes 22 for CS 170 1 NP-completeness of Circuit-SAT We will prove that the circuit satisfiability
More informationPairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55
Pairwise Alignment Guan-Shieng Huang shieng@ncnu.edu.tw Dept. of CSIE, NCNU Pairwise Alignment p.1/55 Approach 1. Problem definition 2. Computational method (algorithms) 3. Complexity and performance Pairwise
More informationNP-Completeness. Andreas Klappenecker. [based on slides by Prof. Welch]
NP-Completeness Andreas Klappenecker [based on slides by Prof. Welch] 1 Prelude: Informal Discussion (Incidentally, we will never get very formal in this course) 2 Polynomial Time Algorithms Most of the
More informationAlgorithms Exam TIN093 /DIT602
Algorithms Exam TIN093 /DIT602 Course: Algorithms Course code: TIN 093, TIN 092 (CTH), DIT 602 (GU) Date, time: 21st October 2017, 14:00 18:00 Building: SBM Responsible teacher: Peter Damaschke, Tel. 5405
More informationLecture 1 : Data Compression and Entropy
CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for
More information8.3 Hamiltonian Paths and Circuits
8.3 Hamiltonian Paths and Circuits 8.3 Hamiltonian Paths and Circuits A Hamiltonian path is a path that contains each vertex exactly once A Hamiltonian circuit is a Hamiltonian path that is also a circuit
More informationInformation Theory of DNA Shotgun Sequencing
Information Theory of DNA Shotgun Sequencing Abolfazl Motahari, Guy Bresler and David Tse arxiv:103.633v4 [cs.it] 14 Feb 013 Department of Electrical Engineering and Computer Sciences University of California,
More informationEasy Problems vs. Hard Problems. CSE 421 Introduction to Algorithms Winter Is P a good definition of efficient? The class P
Easy Problems vs. Hard Problems CSE 421 Introduction to Algorithms Winter 2000 NP-Completeness (Chapter 11) Easy - problems whose worst case running time is bounded by some polynomial in the size of the
More informationDynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.
Microsoft Research Asia September 5, 2005 1 2 3 4 Section I What is? Definition is a technique for efficiently recurrence computing by storing partial results. In this slides, I will NOT use too many formal
More informationThree new strategies for exact string matching
Three new strategies for exact string matching Simone Faro 1 Thierry Lecroq 2 1 University of Catania, Italy 2 University of Rouen, LITIS EA 4108, France SeqBio 2012 November 26th-27th 2012 Marne-la-Vallée,
More informationNetwork Design and Game Theory Spring 2008 Lecture 2
Network Design and Game Theory Spring 2008 Lecture 2 Instructor: Mohammad T. Hajiaghayi Scribe: Imdadullah Khan February 04, 2008 MAXIMUM COVERAGE In this lecture we review Maximum Coverage and Unique
More informationChapter 34: NP-Completeness
Graph Algorithms - Spring 2011 Set 17. Lecturer: Huilan Chang Reference: Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms, 2nd Edition, The MIT Press. Chapter 34: NP-Completeness 2. Polynomial-time
More informationCSE 431/531: Analysis of Algorithms. Dynamic Programming. Lecturer: Shi Li. Department of Computer Science and Engineering University at Buffalo
CSE 431/531: Analysis of Algorithms Dynamic Programming Lecturer: Shi Li Department of Computer Science and Engineering University at Buffalo Paradigms for Designing Algorithms Greedy algorithm Make a
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Sequence Assembly
CMPS 6630: Introduction to Computational Biology and Bioinformatics Sequence Assembly Why Genome Sequencing? Sanger (1982) introduced chaintermination sequencing. Main idea: Obtain fragments of all possible
More informationNP-Complete Problems and Approximation Algorithms
NP-Complete Problems and Approximation Algorithms Efficiency of Algorithms Algorithms that have time efficiency of O(n k ), that is polynomial of the input size, are considered to be tractable or easy
More informationTheoretical Computer Science
Theoretical Computer Science Zdeněk Sawa Department of Computer Science, FEI, Technical University of Ostrava 17. listopadu 15, Ostrava-Poruba 708 33 Czech republic September 22, 2017 Z. Sawa (TU Ostrava)
More informationTopics in Approximation Algorithms Solution for Homework 3
Topics in Approximation Algorithms Solution for Homework 3 Problem 1 We show that any solution {U t } can be modified to satisfy U τ L τ as follows. Suppose U τ L τ, so there is a vertex v U τ but v L
More informationCS 350 Algorithms and Complexity
CS 350 Algorithms and Complexity Winter 2019 Lecture 15: Limitations of Algorithmic Power Introduction to complexity theory Andrew P. Black Department of Computer Science Portland State University Lower
More informationAutomata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,
Automata & languages A primer on the Theory of Computation Laurent Vanbever www.vanbever.eu ETH Zürich (D-ITET) September, 24 2015 Last week was all about Deterministic Finite Automaton We saw three main
More informationCS 350 Algorithms and Complexity
1 CS 350 Algorithms and Complexity Fall 2015 Lecture 15: Limitations of Algorithmic Power Introduction to complexity theory Andrew P. Black Department of Computer Science Portland State University Lower
More informationNature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1
Supplementary Figure 1 Zn 2+ -binding sites in USP18. (a) The two molecules of USP18 present in the asymmetric unit are shown. Chain A is shown in blue, chain B in green. Bound Zn 2+ ions are shown as
More informationA Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus
A Polynomial Time Algorithm for Parsing with the Bounded Order Lambek Calculus Timothy A. D. Fowler Department of Computer Science University of Toronto 10 King s College Rd., Toronto, ON, M5S 3G4, Canada
More informationNP-Complete problems
NP-Complete problems NP-complete problems (NPC): A subset of NP. If any NP-complete problem can be solved in polynomial time, then every problem in NP has a polynomial time solution. NP-complete languages
More informationCMPSCI611: The Matroid Theorem Lecture 5
CMPSCI611: The Matroid Theorem Lecture 5 We first review our definitions: A subset system is a set E together with a set of subsets of E, called I, such that I is closed under inclusion. This means that
More informationSupplementary Information
Electronic Supplementary Material (ESI) for RSC Advances. This journal is The Royal Society of Chemistry 2014 Directed self-assembly of genomic sequences into monomeric and polymeric branched DNA structures
More informationCSE 202 Dynamic Programming II
CSE 202 Dynamic Programming II Chapter 6 Dynamic Programming Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 Algorithmic Paradigms Greed. Build up a solution incrementally,
More information1. Introduction Recap
1. Introduction Recap 1. Tractable and intractable problems polynomial-boundness: O(n k ) 2. NP-complete problems informal definition 3. Examples of P vs. NP difference may appear only slightly 4. Optimization
More informationDid you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden
Did you know that Multiple Alignment is NP-hard? Isaac Elias Royal Institute of Technology Sweden 1 Results Multiple Alignment with SP-score Star Alignment Tree Alignment (with given phylogeny) are NP-hard
More information10.3 Matroids and approximation
10.3 Matroids and approximation 137 10.3 Matroids and approximation Given a family F of subsets of some finite set X, called the ground-set, and a weight function assigning each element x X a non-negative
More informationLecture 18: More NP-Complete Problems