Allen Holder - Trinity University

1 Haplotyping
Allen Holder - Trinity University
Population Problems - joint with Courtney Davis, University of Utah
Single Individuals - joint with John Louie, Carrol College, and Lena Sherbakov, Williams University

2 Some Genetics
[Figure: inheritance of haplotypes. The mother has paired gene representation ABABBA / AAABBB (genotype representation AXABBX) and donates the haplotype ABABBA. The father has paired gene representation BBAABA / BBBAAB (genotype representation BBXAXX) and donates the haplotype BBAABA. The child has paired gene representation ABABBA / BBAABA and genotype representation XBAXBA. A second panel repeats the diagram for shorter sequences: mother ABA / AAA (AXA), father BBA / BAA (BXA), child ABA / BBA (XBA). Other labels: Physical Trait, SNP 2, SNP 7.]

3 Haplotyping a Population
Definition: The problem is to reconstruct the haplotypes donated by a previous population from the genotypes of the current population.
Why: Tracing genetic markers from generation to generation is needed to gauge a population's susceptibility to disease and in the design of patient-specific drugs.
Past Research: Investigations were started by Clark in 1990, and recent contributions were made by Istrail, Lancia, Gusfield, Pinotti, and Rizzi.

4 Clark's Rule
1. Start with an empty collection of haplotypes.
2. Choose a genotype.
3. Add as few haplotypes to the set as possible (you need to add either 1 or 2) so that the genotype can be formed from the collection of haplotypes.
4. Continue until all genotypes can be formed.
This technique mimics what happens in nature. Notice that it can be interpreted as an attempt to find the smallest collection of haplotypes, but the result depends on the order in which the genotypes are processed.
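The four steps above can be sketched in Python. This is a minimal illustration and not Clark's original implementation: the function names are hypothetical, a genotype that is already formed adds nothing, the first haplotype found that resolves a genotype is reused, and when nothing resolves a genotype its ambiguous sites are split arbitrarily into an all-A and an all-B completion.

```python
def mate(h1, h2):
    """Genotype formed by two haplotypes: equal letters are kept, unequal give X."""
    return "".join(a if a == b else "X" for a, b in zip(h1, h2))


def mate_for(h, g):
    """Return the haplotype h2 with mate(h, h2) == g, or None if h cannot resolve g."""
    out = []
    for a, c in zip(h, g):
        if c == "X":
            out.append("B" if a == "A" else "A")  # the mate must differ at an X
        elif a == c:
            out.append(a)                          # the mate must agree elsewhere
        else:
            return None                            # h contradicts g at this SNP
    return "".join(out)


def clark_rule(genotypes):
    """Process genotypes in the given order, adding as few haplotypes as possible."""
    haps = []
    for g in genotypes:
        candidates = [m for m in (mate_for(h, g) for h in haps) if m is not None]
        if any(m in haps for m in candidates):
            continue                               # g can already be formed
        if candidates:
            haps.append(candidates[0])             # add one haplotype
        else:
            h1, h2 = g.replace("X", "A"), g.replace("X", "B")  # arbitrary split
            haps.append(h1)
            if h2 != h1:
                haps.append(h2)                    # add a second haplotype
    return haps


def resolved(haps, genotypes):
    """True if every genotype is formed by some pair from haps."""
    formed = {mate(a, b) for a in haps for b in haps}
    return all(g in formed for g in genotypes)


result = clark_rule(["AAB", "AXB", "XXB"])
print(result, resolved(result, ["AAB", "AXB", "XXB"]))
```

Reordering the input list can change which haplotypes are collected, which is the order dependence noted on the slide.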

5 The Pure Parsimony Problem
Parsimony Problems: Haplotyping is a situation where simple explanations appear to be biologically relevant. So, finding small collections of haplotypes that can explain the genotypic information of the current population is important.
Pure Parsimony Problem: Finding a smallest collection of haplotypes that can reconstruct a set of genotypes is called the Pure Parsimony Problem. This problem is known to be APX-hard (Lancia, Pinotti, and Rizzi).

6 For parent haplotypes $h^1$ and $h^2$ and offspring genotype $g$, we have the following at each SNP $i$:
$g_i = A$ if and only if $h^1_i = h^2_i = A$.
$g_i = B$ if and only if $h^1_i = h^2_i = B$.
$g_i = X$ if and only if either $h^1_i = A$ and $h^2_i = B$, or $h^1_i = B$ and $h^2_i = A$.
We say that $h^1 \oplus h^2 = g$ provided that $h^1$, $h^2$, and $g$ adhere to these rules. For example, let $h^1 = AABAAB$ and $h^2 = ABBABB$. Then, $h^1 \oplus h^2 = g = AXBAXB$. It is easy to see that $\oplus$ is a binary operation with the property that $h^i \oplus h^j = h^i \oplus h^k$ implies $h^j = h^k$.
Parental haplotypes that contribute genetic information to the same offspring's genotype are called mates. That is, if $h^1 \oplus h^2 = g$, we say that $h^1$ mates with $h^2$ to form $g$. Furthermore, we say that $h^1$ resolves $g$ if $h^1 \oplus h^2 = g$ for some $h^2$. This concept is extended to sets, and we say that $H$ resolves $G$ if for each $g \in G$, there is an $h^1$ and $h^2$ in $H$ such that $h^1 \oplus h^2 = g$.
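The mating rules translate directly into code. The sketch below is illustrative only: the function names are hypothetical, haplotypes and genotypes are plain strings over A, B, and X, and the cancellation property is checked by brute force for haplotypes of length 3 rather than proved.

```python
from itertools import product


def mate(h1, h2):
    """Form the genotype of two haplotypes SNP by SNP (A/A -> A, B/B -> B, mixed -> X)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return "".join(a if a == b else "X" for a, b in zip(h1, h2))


def resolves(haplotypes, genotypes):
    """True if every genotype is formed by some pair of the given haplotypes."""
    hs = list(haplotypes)
    formed = {mate(a, b) for a in hs for b in hs}
    return all(g in formed for g in genotypes)


# The example from the slide.
print(mate("AABAAB", "ABBABB"))  # AXBAXB

# Cancellation: for a fixed h, distinct mates give distinct genotypes.
haps = ["".join(p) for p in product("AB", repeat=3)]
cancellation_holds = all(
    len({mate(h, k) for k in haps}) == len(haps) for h in haps
)
print(cancellation_holds)
```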

7 Diversity Graph
A bipartite graph $D = (H, G, E)$ is a diversity graph if $G$ is nonempty, each genotype in $G$ is resolved by some haplotype in $H$, and $E$ has the property that if $(h^1, g) \in E$, then there exists an $h^2 \in H$ such that $(h^2, g) \in E$ and $h^1 \oplus h^2 = g$. Notice that the definition is biological.

8 Bipartite Graphs that are not Diversity Graphs
A bipartite graph is a diversity graph if the nodes can be labeled to satisfy the definition. The definition requires that every node in the genotype set has even degree. There are graphs in which every node has even degree but that are not diversity graphs. As an example, K(2, 2) is not a diversity graph because it violates.

9 Some Definitions
The set of all haplotypes of length $n$ is denoted by $\mathcal{H}$. The largest edge set between the collection of genotypes $G$ and $\mathcal{H}$ is $\mathcal{E}$. Any subgraph of $(\mathcal{H}, G, \mathcal{E})$ that is a diversity graph and has the property that the subset of $\mathcal{H}$ is as small as possible is a solution to the Pure Parsimony problem. These subgraphs are denoted by $(H^*, G, E^*)$. There are typically several optimal subgraphs, which makes solving an IP formulation of the problem difficult.

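10 A Simple, but Useful Result
Theorem: If the elements of $\mathcal{H}$ are lexicographically ordered (where $A < B$), we have for $1 \le j \le 2^n$ that $h^j \oplus h^{(2^n - j + 1)} = XX \ldots X$.
The proof is simple. An example for $n = 3$ is below.
$$\begin{pmatrix} AAA \\ AAB \\ ABA \\ ABB \\ BAA \\ BAB \\ BBA \\ BBB \end{pmatrix} \oplus \begin{pmatrix} BBB \\ BBA \\ BAB \\ BAA \\ ABB \\ ABA \\ AAB \\ AAA \end{pmatrix} = \begin{pmatrix} XXX \\ XXX \\ XXX \\ XXX \\ XXX \\ XXX \\ XXX \\ XXX \end{pmatrix}$$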
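The lexicographic pairing can be checked numerically for small $n$. The following sketch is not part of the talk: the helper names are hypothetical, and it relies on `itertools.product` emitting the strings in lexicographic order when the alphabet is given as "AB".

```python
from itertools import product


def mate(h1, h2):
    """Equal letters are kept, unequal letters give X."""
    return "".join(a if a == b else "X" for a, b in zip(h1, h2))


def lexicographic_pairs_are_all_x(n):
    """Check that h^j mated with h^(2^n - j + 1) is XX...X for every j."""
    haps = ["".join(p) for p in product("AB", repeat=n)]  # ordered with A < B
    # Zero-based index j pairs with index 2^n - 1 - j, written haps[-1 - j].
    return all(
        mate(haps[j], haps[-1 - j]) == "X" * n for j in range(len(haps))
    )


print(lexicographic_pairs_are_all_x(3))
```

The check succeeds because the haplotype at the mirrored position is the letterwise complement, so every SNP is ambiguous.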

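11 Extending Bipartite Graphs to Diversity Graphs
Let $(V, W, E)$ be a bipartite graph. For each $w \in W$, define
$$T(w) = \bigcup_{w' \neq w} [N(w) \cap N(w')].$$
Let $\hat{V}(w)$ and $\hat{F}(w)$ be vertex sets such that
$$|\hat{V}(w)| = (2|T(w)| - |N(w)|)^+ \quad \text{and} \quad |\hat{F}(w)| = \begin{cases} 0, & |N(w) \cup \hat{V}(w)| \text{ is even} \\ 1, & |N(w) \cup \hat{V}(w)| \text{ is odd.} \end{cases}$$
[Figure: a node $w \in W$ with its neighborhood $N(w)$, the subset $T(w)$, and the added sets $\hat{V}(w)$ and $\hat{F}(w)$. Annotations: "Add enough to remove conflicts in N(w)." and "May need to add a node if N(w) is odd and $\hat{V}(w)$ is empty."]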

12 Some Extension Bounds
Lemma: The bipartite graph $(V, W, E)$ can be extended and labeled to become a diversity graph by adding no more than
$$\sum_{w \in W} \left( |\hat{F}(w)| + (2|T(w)| - |N(w)|)^+ \right)$$
nodes to $V$, provided that there are no isolated nodes.
Proof: This is a long constructive proof that uses the Lexicographic Theorem.
Theorem: Any bipartite graph $(V, W, E)$ can be extended and labeled to become a diversity graph by adding no more than
$$\sum_{w \in W} \left( |\hat{F}(w)| + (2|T(w)| - |N(w)|)^+ \right) + (M_V - M_W)^+$$
nodes to $V$, where $M_V$ and $M_W$ are the number of isolated nodes in $V$ and $W$, respectively.

13 Some Theory
Lemma: Suppose that $T(g) \neq \emptyset$ for some $g \in G$. Then, $H^*$ contains an element of $\bigcup_{g \in G} T(g)$.
This lemma says that if some haplotypes can resolve multiple genotypes, then a smallest collection of haplotypes contains some of these haplotypes.
Theorem: Assume every $g$ has one or more ambiguous SNPs. Then, $|H^*| = 2|G|$ if, and only if, the neighborhoods of the genotypes together with the set of isolated haplotypes partition $\mathcal{H}$.
This result classifies the graphs for which a smallest collection of haplotypes is as large as it can be.

14 Restricting Mating Structure
We now constrain our optimization problem so that the maximum number of mates that any haplotype can have is $m$. A smallest haplotype set that resolves $G$ with this restriction is denoted by $H_m$, and we let $\varphi(m) = |H_m|$.
If $m = 1$, each haplotype can mate with at most one other haplotype. Biologically this means each parent can donate one of two haplotypes to a unique child, so this haplotype cannot be used to form another child. So, for $m = 1$ the neighborhoods of the genotypes in an optimal subgraph are disjoint, and the smallest number of haplotypes that can resolve $G$ is $\varphi(1) = 2|G| - u$, where $u$ is the number of unambiguous genotypes.
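As a small worked instance of the $m = 1$ count, the sketch below evaluates $2|G| - u$ for a list of genotype strings. The function name is hypothetical, and it assumes that genotypes are distinct (duplicates are collapsed) and that a genotype is unambiguous exactly when it contains no X.

```python
def phi_1(genotypes):
    """Evaluate 2|G| - u, where u counts the genotypes without an X."""
    distinct = set(genotypes)                        # G is treated as a set
    u = sum(1 for g in distinct if "X" not in g)     # unambiguous genotypes
    return 2 * len(distinct) - u


# Two ambiguous genotypes need two haplotypes each; the unambiguous one needs one.
print(phi_1(["AXB", "XXB", "AAB"]))  # 5
```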

15 Properties of φ(m)
At some threshold, increasing $m$ does not change the cardinality of $H_m$. Hence, for some $m$, $\varphi(m) = \varphi(m + k)$ for every natural number $k$. The smallest $m$ such that $\varphi(m) = \varphi(m + k)$, for all $k \in \mathbb{N}$, is denoted by $m^*$. So, if $m \ge m^*$, we have that $\varphi(m) = \varphi(m^*)$. Calculating $\varphi(m^*)$ solves the Pure Parsimony problem and indicates the least amount of mating needed.
Increasing the number of mates that any haplotype is allowed never causes an increase in $|H_m|$. Thus, $\varphi(m) \ge \varphi(m + 1)$ for all $m$, and $\varphi$ is non-increasing.
No haplotype can mate with more than $|G|$ haplotypes, and hence, $m^* \le |G|$. If no haplotype reconciles more than one genotype, $m^* = 1$.

16 What if $m^*$ is at its Upper Bound
Theorem: If $m^* = |G|$, we have that
$$\varphi(m^*) = \begin{cases} |G|, & \text{if } h \oplus h = g \text{ for some } h \in H_{m^*} \\ |G| + 1, & \text{otherwise.} \end{cases}$$

17 Calculating φ(2)
Step 1: Set $v = 0$ and $(H_v, G_v) = (H, G)$.
Step 2: Find the longest path in $(H_v, G_v)$, say $P_v$. If no path exists, set $P_v = \emptyset$.
Step 3: If $P_v = \emptyset$, stop.
Step 4: Set $(H_{v+1}, G_{v+1}) = (H_v, G_v) \setminus P_v$.
Step 5: Increase $v$ by 1.
Step 6: Go to Step 2.
This greedy algorithm iteratively removes the longest paths in a diversity graph.

18 The Greedy Algorithm Works
Theorem: The greedy algorithm finds an optimal subgraph of the acyclic diversity graph $(H, G, E)$. Moreover, if $v$ is the number of paths found by the algorithm, $\varphi(2) = |G| + v$.
Proof: The proof follows by induction on $|G|$.

19 An Example
The paths through the genotypes
$g^1$ = AXBBBB
$g^2$ = XAXXBB
$g^3$ = BXAXBX
$g^4$ = BXXAXB
$g^5$ = BBBXAB
$g^6$ = BBXBBA
must pass through these genotypes as indicated below.
[Figure: graph on the genotype nodes $g^1, g^2, g^3, g^4, g^5, g^6$.]

20 Path Decompositions
First Path's Genotype Progression      Second Path's Genotype Progression
(g^1, g^2, g^3, g^4, g^5)              (g^6)
(g^1, g^2, g^4, g^5)                   (g^3, g^6)
(g^1, g^2, g^3, g^6)                   (g^4, g^5)
(g^6, g^3, g^4, g^5)                   (g^1, g^2)
The greedy algorithm finds the first solution in the table, as the first path is as long as possible. None of the other decompositions has this property, and so the algorithm is not capable of finding these solutions.

21 Future Directions
How fast does $\varphi(m)$ grow?
We see from Theorem that knowing $m^*$ can solve the Pure Parsimony problem in some cases. Moreover, knowing $m^*$ is beneficial in all cases as this removes many subgraphs from consideration. So, in an integer programming formulation of the Pure Parsimony problem, $m^*$ provides a cut that may help reduce solution times. Finding bounds on $m^*$ is an interesting area of future work.
Randomized coloring algorithms have been efficient on many classes of graphs, and it may be that finding longest paths and cycles can be thought of as a coloring problem. If so, then these techniques could be used to approximate the greedy algorithm, with the hope being that substantial biological models could be addressed.

22 Haplotyping an Individual
The hope in the future is to (partially) construct an individual's unique genetic information to design patient-specific drugs, screening methodologies, and other therapies.
DNA sequencing machines are not capable of sequencing the entire DNA strand at once, and instead, the DNA strand is replicated and sequenced in smaller fragments (1,000-30,000 individual nucleotides). This process is called shotgun sequencing.

23 Sequencing machines make errors (analysis is based on probabilistic determination) and return information similar to that in the following matrix.
[Matrix: rows are fragments and columns are SNPs; each entry is A, B, or a dash.]
The A's and B's represent heterozygous or homozygous SNPs, and a dash indicates that the sequencer was not capable of deciding between an A and a B.
NOTE: Errors can occur at any position.

24 The problem is to find a 2-set partition of the fragments (rows) so that the fragments within each set are not in conflict, i.e., they agree in all spots (dashes cannot cause conflicts). Correcting or removing errors is required to form the two haplotypes. One way is to remove the fewest fragments or SNPs until we can construct the haplotypes. These problems are known to be polynomial with gapless data and APX-hard otherwise (Lancia et al.).
[Matrix: the fragment matrix with colored entries.]
Instead of removing information, we address the problem of changing the fewest positions that allows the haplotypes to be constructed. The different haplotypes are indicated by red and blue. The green letters are changed to form the haplotypes.

25 Conflict Graphs
For a collection of haplotypes, $H$, the conflict graph, $CG(H)$, is the graph whose nodes are haplotypes and in which two haplotypes are connected by an edge if they are in conflict.
[Figure: the conflict graph for H = {AB A, A A, AAAA, BBA }, with Ĥ = {AB A, B A, AAAA, BBA }.]
The Conflict Graph Minimum Letter Flip (CGMLF) problem is to find $\hat{H}$ such that $CG(\hat{H})$ is a bipartite graph and $z(H, \hat{H})$ is as small as possible, where $z$ is a measure of the distance from $H$ to $\hat{H}$.
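A conflict graph and its bipartiteness test can be sketched as follows. This is an illustration under stated assumptions, not the method of the talk: the function names are hypothetical, an undecided SNP is written as "-", the example fragments are invented, and the code only tests whether a 2-set partition exists (it does not search for the minimum number of letter flips).

```python
from collections import deque


def in_conflict(f1, f2):
    """True if some SNP is A in one fragment and B in the other; '-' never conflicts."""
    return any(a != b and "-" not in (a, b) for a, b in zip(f1, f2))


def conflict_graph(fragments):
    """Adjacency sets over fragment indices, one edge per conflicting pair."""
    adj = {i: set() for i in range(len(fragments))}
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            if in_conflict(fragments[i], fragments[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def two_color(adj):
    """Breadth-first 2-coloring; returns {node: 0 or 1}, or None if an odd cycle exists."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None        # two conflicting fragments in one class
    return color


ok = two_color(conflict_graph(["AB-A", "A--A", "BA-B", "-AAB"]))
bad = two_color(conflict_graph(["AAAA", "BBA-", "AB-A"]))
print(ok, bad)
```

When a coloring exists, the two color classes are the two conflict-free sets of fragments; a result of `None` means some letters would have to be changed first.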

26 The P-Median Problem
The p-median problem is to find $p$ nodes in a (directed) graph with edge distances $d_{(i,j)}$ such that the aggregate distance from the nodes to their nearest medians is as small as possible.
[Figure: an undirected graph with medians and the edges associating them to nodes highlighted in red. Each edge has a distance, and this choice of medians minimizes the distance along the red edges.]
The p-median problem is known to be NP-hard, but for a fixed $p$ it is polynomial. In particular, the 2-median problem is $O(n^2)$.
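For intuition, the 2-median problem can be solved by brute force over all pairs of candidate medians. The sketch below is a plain enumeration with cubic running time, not the faster method behind the bound quoted on the slide; the function name and the example distances are hypothetical, and `dist[i][j]` is assumed to hold the distance from node `i` to candidate median `j`.

```python
from itertools import combinations


def two_median(dist):
    """Return (cost, (a, b)) minimizing the sum over nodes of the nearer median's distance."""
    n = len(dist)
    best = None
    for a, b in combinations(range(n), 2):
        # Each node is assigned to whichever of the two medians is closer.
        cost = sum(min(dist[i][a], dist[i][b]) for i in range(n))
        if best is None or cost < best[0]:
            best = (cost, (a, b))
    return best


# Five points on a line, forming two clusters.
points = [0, 1, 2, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
print(two_median(dist))  # (3, (1, 3))
```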

27 Distances
We let $d(h^i, h^j)$ be the number of SNPs where one is an A and the other is a B.
We let $l(h^i, h^j)$ be the nonsymmetric sum of the number of SNPs where the supposedly certain information of an A or a B in $h^i$ disagrees with the symbol in $h^j$.
Example: for $h^i$ = AB A and $h^j$ = B A, $l(h^i, h^j) = 2$.
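The two distances can be written out as short functions. This is a sketch of one reading of the definitions: the names `d_dist` and `l_dist` are hypothetical, an undecided SNP is written as "-", and `l_dist` counts every SNP where the first fragment holds an A or a B and the second fragment holds any different symbol, including a dash. The strings are illustrative and not taken from the slides.

```python
def d_dist(hi, hj):
    """Number of SNPs where one fragment has A and the other has B."""
    return sum(1 for a, b in zip(hi, hj) if {a, b} == {"A", "B"})


def l_dist(hi, hj):
    """Number of SNPs where hi is certain (A or B) and hj shows a different symbol."""
    return sum(1 for a, b in zip(hi, hj) if a in ("A", "B") and a != b)


hi, hj = "AB-A", "B--A"
print(d_dist(hi, hj))                   # 1
print(l_dist(hi, hj), l_dist(hj, hi))   # 2 1
```

The second line of output shows why the measure is called nonsymmetric: swapping the arguments changes which fragment's certain letters are being checked.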

28 Problem Statements
MLF: The 2-median problem on the complete graph $K_H = (H, H \times H)$, with edge distances defined by $d$. This problem is $O(m^3)$.
MLF: The 2-median problem on the graph $(\{A, B, -\}^n, H \times \{A, B, -\}^n)$, with edge distances defined by $d$. (This is not a very interesting problem because one median is always....)
DMLF: The 2-median problem on the directed complete graph $K_H = (H, H \times H)$, with edge distances defined by $l$. This problem is $O(m^3)$.
DMLF: The 2-median problem on the directed graph $(\{A, B, -\}^n, H \times \{A, B, -\}^n)$, with edge distances defined by $l$. This problem is $O(3^{3n})$.

29 Theoretical Results
Theorem: We have that MLF $\le$ MLF $\le$ DMLF = CGMLF $\le$ DMLF. Hence, the exponential problem of finding the minimum number of flips is bounded by polynomial problems. The proof is not complete.
Theorem: If there is one SNP for each median haplotype that is sampled correctly in all fragments, then MLF = DMLF.
So, CGMLF is solvable in polynomial time under normal practice, i.e., a sampling machine would need to misclassify some fragment for every SNP for the problem to fail to be polynomial.

30 Thank you for your time, please ask questions


More information

The Maximum Flow Problem with Disjunctive Constraints

The Maximum Flow Problem with Disjunctive Constraints The Maximum Flow Problem with Disjunctive Constraints Ulrich Pferschy Joachim Schauer Abstract We study the maximum flow problem subject to binary disjunctive constraints in a directed graph: A negative

More information

The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth

The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth The Mixed Chinese Postman Problem Parameterized by Pathwidth and Treedepth Gregory Gutin, Mark Jones, and Magnus Wahlström Royal Holloway, University of London Egham, Surrey TW20 0EX, UK Abstract In the

More information

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1. Theory of Computer Science March 20, 2017 C1. Formal Languages and Grammars Theory of Computer Science C1. Formal Languages and Grammars Malte Helmert University of Basel March 20, 2017 C1.1 Introduction

More information

On improving matchings in trees, via bounded-length augmentations 1

On improving matchings in trees, via bounded-length augmentations 1 On improving matchings in trees, via bounded-length augmentations 1 Julien Bensmail a, Valentin Garnero a, Nicolas Nisse a a Université Côte d Azur, CNRS, Inria, I3S, France Abstract Due to a classical

More information

Hanna Furmańczyk EQUITABLE COLORING OF GRAPH PRODUCTS

Hanna Furmańczyk EQUITABLE COLORING OF GRAPH PRODUCTS Opuscula Mathematica Vol. 6 No. 006 Hanna Furmańczyk EQUITABLE COLORING OF GRAPH PRODUCTS Abstract. A graph is equitably k-colorable if its vertices can be partitioned into k independent sets in such a

More information

Part V. Matchings. Matching. 19 Augmenting Paths for Matchings. 18 Bipartite Matching via Flows

Part V. Matchings. Matching. 19 Augmenting Paths for Matchings. 18 Bipartite Matching via Flows Matching Input: undirected graph G = (V, E). M E is a matching if each node appears in at most one Part V edge in M. Maximum Matching: find a matching of maximum cardinality Matchings Ernst Mayr, Harald

More information

ACM 116: Lecture 1. Agenda. Philosophy of the Course. Definition of probabilities. Equally likely outcomes. Elements of combinatorics

ACM 116: Lecture 1. Agenda. Philosophy of the Course. Definition of probabilities. Equally likely outcomes. Elements of combinatorics 1 ACM 116: Lecture 1 Agenda Philosophy of the Course Definition of probabilities Equally likely outcomes Elements of combinatorics Conditional probabilities 2 Philosophy of the Course Probability is the

More information

CS60007 Algorithm Design and Analysis 2018 Assignment 1

CS60007 Algorithm Design and Analysis 2018 Assignment 1 CS60007 Algorithm Design and Analysis 2018 Assignment 1 Palash Dey and Swagato Sanyal Indian Institute of Technology, Kharagpur Please submit the solutions of the problems 6, 11, 12 and 13 (written in

More information

CSCE 750 Final Exam Answer Key Wednesday December 7, 2005

CSCE 750 Final Exam Answer Key Wednesday December 7, 2005 CSCE 750 Final Exam Answer Key Wednesday December 7, 2005 Do all problems. Put your answers on blank paper or in a test booklet. There are 00 points total in the exam. You have 80 minutes. Please note

More information

AUTHORIZATION TO LEND AND REPRODUCE THE THESIS. Date Jong Wha Joanne Joo, Author

AUTHORIZATION TO LEND AND REPRODUCE THE THESIS. Date Jong Wha Joanne Joo, Author AUTHORIZATION TO LEND AND REPRODUCE THE THESIS As the sole author of this thesis, I authorize Brown University to lend it to other institutions or individuals for the purpose of scholarly research. Date

More information

Aphylogenetic network is a generalization of a phylogenetic tree, allowing properties that are not tree-like.

Aphylogenetic network is a generalization of a phylogenetic tree, allowing properties that are not tree-like. INFORMS Journal on Computing Vol. 16, No. 4, Fall 2004, pp. 459 469 issn 0899-1499 eissn 1526-5528 04 1604 0459 informs doi 10.1287/ijoc.1040.0099 2004 INFORMS The Fine Structure of Galls in Phylogenetic

More information

Solving the MWT. Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP:

Solving the MWT. Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP: Solving the MWT Recall the ILP for the MWT. We can obtain a solution to the MWT problem by solving the following ILP: max subject to e i E ω i x i e i C E x i {0, 1} x i C E 1 for all critical mixed cycles

More information

Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs

Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs Approximation Algorithms for Asymmetric TSP by Decomposing Directed Regular Multigraphs Haim Kaplan Tel-Aviv University, Israel haimk@post.tau.ac.il Nira Shafrir Tel-Aviv University, Israel shafrirn@post.tau.ac.il

More information

10.4 The Kruskal Katona theorem

10.4 The Kruskal Katona theorem 104 The Krusal Katona theorem 141 Example 1013 (Maximum weight traveling salesman problem We are given a complete directed graph with non-negative weights on edges, and we must find a maximum weight Hamiltonian

More information

Combinatorial Optimization

Combinatorial Optimization Combinatorial Optimization 2017-2018 1 Maximum matching on bipartite graphs Given a graph G = (V, E), find a maximum cardinal matching. 1.1 Direct algorithms Theorem 1.1 (Petersen, 1891) A matching M is

More information

A Tiling Approach to Chebyshev Polynomials

A Tiling Approach to Chebyshev Polynomials A Tiling Approach to Chebyshev Polynomials Daniel Walton Arthur T. Benjamin, Advisor Sanjai Gupta, Reader May, 2007 Department of Mathematics Copyright c 2007 Daniel Walton. The author grants Harvey Mudd

More information

Lesson 4: Understanding Genetics

Lesson 4: Understanding Genetics Lesson 4: Understanding Genetics 1 Terms Alleles Chromosome Co dominance Crossover Deoxyribonucleic acid DNA Dominant Genetic code Genome Genotype Heredity Heritability Heritability estimate Heterozygous

More information

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission. CSE 5 Homework Due: Monday October 9, 7 Instructions Upload a single file to Gradescope for each group. should be on each page of the submission. All group members names and PIDs Your assignments in this

More information

Preliminaries and Complexity Theory

Preliminaries and Complexity Theory Preliminaries and Complexity Theory Oleksandr Romanko CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 16, 2006 Introduction Book structure: 2 Part I Linear Algebra

More information

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT

THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT COMMUNICATIONS IN INFORMATION AND SYSTEMS c 2009 International Press Vol. 9, No. 4, pp. 295-302, 2009 001 THE THREE-STATE PERFECT PHYLOGENY PROBLEM REDUCES TO 2-SAT DAN GUSFIELD AND YUFENG WU Abstract.

More information

1 Primals and Duals: Zero Sum Games

1 Primals and Duals: Zero Sum Games CS 124 Section #11 Zero Sum Games; NP Completeness 4/15/17 1 Primals and Duals: Zero Sum Games We can represent various situations of conflict in life in terms of matrix games. For example, the game shown

More information

In English, there are at least three different types of entities: letters, words, sentences.

In English, there are at least three different types of entities: letters, words, sentences. Chapter 2 Languages 2.1 Introduction In English, there are at least three different types of entities: letters, words, sentences. letters are from a finite alphabet { a, b, c,..., z } words are made up

More information

Algorithms and Theory of Computation. Lecture 22: NP-Completeness (2)

Algorithms and Theory of Computation. Lecture 22: NP-Completeness (2) Algorithms and Theory of Computation Lecture 22: NP-Completeness (2) Xiaohui Bei MAS 714 November 8, 2018 Nanyang Technological University MAS 714 November 8, 2018 1 / 20 Set Cover Set Cover Input: a set

More information

Graph coloring, perfect graphs

Graph coloring, perfect graphs Lecture 5 (05.04.2013) Graph coloring, perfect graphs Scribe: Tomasz Kociumaka Lecturer: Marcin Pilipczuk 1 Introduction to graph coloring Definition 1. Let G be a simple undirected graph and k a positive

More information

The Minimum k-colored Subgraph Problem in Haplotyping and DNA Primer Selection

The Minimum k-colored Subgraph Problem in Haplotyping and DNA Primer Selection The Minimum k-colored Subgraph Problem in Haplotyping and DNA Primer Selection M.T. Hajiaghayi K. Jain K. Konwar L.C. Lau I.I. Măndoiu A. Russell A. Shvartsman V.V. Vazirani Abstract In this paper we consider

More information

Paths and cycles in extended and decomposable digraphs

Paths and cycles in extended and decomposable digraphs Paths and cycles in extended and decomposable digraphs Jørgen Bang-Jensen Gregory Gutin Department of Mathematics and Computer Science Odense University, Denmark Abstract We consider digraphs called extended

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Greedy Algorithms My T. UF

Greedy Algorithms My T. UF Introduction to Algorithms Greedy Algorithms @ UF Overview A greedy algorithm always makes the choice that looks best at the moment Make a locally optimal choice in hope of getting a globally optimal solution

More information

Inference of A Minimum Size Boolean Function by Using A New Efficient Branch-and-Bound Approach From Examples

Inference of A Minimum Size Boolean Function by Using A New Efficient Branch-and-Bound Approach From Examples Published in: Journal of Global Optimization, 5, pp. 69-9, 199. Inference of A Minimum Size Boolean Function by Using A New Efficient Branch-and-Bound Approach From Examples Evangelos Triantaphyllou Assistant

More information

Lecture 14 - P v.s. NP 1

Lecture 14 - P v.s. NP 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 27, 2018 Lecture 14 - P v.s. NP 1 In this lecture we start Unit 3 on NP-hardness and approximation

More information

Cost-Constrained Matchings and Disjoint Paths

Cost-Constrained Matchings and Disjoint Paths Cost-Constrained Matchings and Disjoint Paths Kenneth A. Berman 1 Department of ECE & Computer Science University of Cincinnati, Cincinnati, OH Abstract Let G = (V, E) be a graph, where the edges are weighted

More information

ACO Comprehensive Exam March 20 and 21, Computability, Complexity and Algorithms

ACO Comprehensive Exam March 20 and 21, Computability, Complexity and Algorithms 1. Computability, Complexity and Algorithms Part a: You are given a graph G = (V,E) with edge weights w(e) > 0 for e E. You are also given a minimum cost spanning tree (MST) T. For one particular edge

More information

arxiv: v1 [cs.ds] 2 Oct 2018

arxiv: v1 [cs.ds] 2 Oct 2018 Contracting to a Longest Path in H-Free Graphs Walter Kern 1 and Daniël Paulusma 2 1 Department of Applied Mathematics, University of Twente, The Netherlands w.kern@twente.nl 2 Department of Computer Science,

More information

Chapter 7 Matchings and r-factors

Chapter 7 Matchings and r-factors Chapter 7 Matchings and r-factors Section 7.0 Introduction Suppose you have your own company and you have several job openings to fill. Further, suppose you have several candidates to fill these jobs and

More information

1 Non-deterministic Turing Machine

1 Non-deterministic Turing Machine 1 Non-deterministic Turing Machine A nondeterministic Turing machine is a generalization of the standard TM for which every configuration may yield none, or one or more than one next configurations. In

More information

CMPSCI611: The Matroid Theorem Lecture 5

CMPSCI611: The Matroid Theorem Lecture 5 CMPSCI611: The Matroid Theorem Lecture 5 We first review our definitions: A subset system is a set E together with a set of subsets of E, called I, such that I is closed under inclusion. This means that

More information

k-blocks: a connectivity invariant for graphs

k-blocks: a connectivity invariant for graphs 1 k-blocks: a connectivity invariant for graphs J. Carmesin R. Diestel M. Hamann F. Hundertmark June 17, 2014 Abstract A k-block in a graph G is a maximal set of at least k vertices no two of which can

More information

CS675: Convex and Combinatorial Optimization Fall 2014 Combinatorial Problems as Linear Programs. Instructor: Shaddin Dughmi

CS675: Convex and Combinatorial Optimization Fall 2014 Combinatorial Problems as Linear Programs. Instructor: Shaddin Dughmi CS675: Convex and Combinatorial Optimization Fall 2014 Combinatorial Problems as Linear Programs Instructor: Shaddin Dughmi Outline 1 Introduction 2 Shortest Path 3 Algorithms for Single-Source Shortest

More information

Parameterized Complexity of the Arc-Preserving Subsequence Problem

Parameterized Complexity of the Arc-Preserving Subsequence Problem Parameterized Complexity of the Arc-Preserving Subsequence Problem Dániel Marx 1 and Ildikó Schlotter 2 1 Tel Aviv University, Israel 2 Budapest University of Technology and Economics, Hungary {dmarx,ildi}@cs.bme.hu

More information

Decomposing dense bipartite graphs into 4-cycles

Decomposing dense bipartite graphs into 4-cycles Decomposing dense bipartite graphs into 4-cycles Nicholas J. Cavenagh Department of Mathematics The University of Waikato Private Bag 3105 Hamilton 3240, New Zealand nickc@waikato.ac.nz Submitted: Jun

More information

Equitable and semi-equitable coloring of cubic graphs and its application in batch scheduling

Equitable and semi-equitable coloring of cubic graphs and its application in batch scheduling Equitable and semi-equitable coloring of cubic graphs and its application in batch scheduling Hanna Furmańczyk, Marek Kubale Abstract In the paper we consider the problems of equitable and semi-equitable

More information