Heterogeneous Graph Mining for Biological Pattern Discovery in Metabolic Pathways

Alexandra Zaharia, Bernard Labedan, Christine Froidevaux, Alain Denise
LRI, I2BC, Université Paris Sud, CNRS, Université Paris Saclay
SeqBio 2016, Nantes, November 17th, 2016

Biological motivation

Metabolic pathway (directed graph): vertices are reactions; arcs connect reactions sharing a metabolite.
Gene neighboring (undirected graph): vertices are genes; edges connect adjacent genes (same chromosome, same strand).
Objective: find consecutive reactions in a metabolic pathway that are catalyzed by products of neighboring genes.

Model

Metabolic pathway: D = (V, A).
Gene neighboring: G' = (U, E').
Correspondence graph: G = (V, E) [Babou, 2012]. Vertices are reactions; edges translate gene adjacency (in G') into reaction connectivity.
Correspondence function: $f : V \to 2^U$ associates to every vertex of D a subset of the vertices of G'.
[Figure: example graphs D and G', with a table listing f(v) for each vertex v.]
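The model above can be illustrated with a minimal sketch of how a correspondence graph might be derived. The slides do not spell out the exact construction, so the linking rule used here (two reactions are connected when they share a gene or when a gene of one is adjacent in G' to a gene of the other), the function name `correspondence_graph` and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def correspondence_graph(f, gene_edges):
    """Derive reaction-to-reaction edges from gene adjacency.

    f          -- dict mapping each reaction to its set of genes
                  (the correspondence function).
    gene_edges -- iterable of 2-tuples, the undirected edges of G'.

    Assumed rule (not taken from the slides): reactions r and s are
    linked if they share a gene, or if some gene of r is adjacent in
    G' to some gene of s.
    """
    adjacent = {frozenset(e) for e in gene_edges}
    edges = set()
    for r, s in combinations(sorted(f), 2):
        linked = any(
            a == b or frozenset((a, b)) in adjacent
            for a in f[r]
            for b in f[s]
        )
        if linked:
            edges.add((r, s))
    return edges


# Hypothetical toy data: genes g1, g2, g3 are consecutive; g4 is isolated.
f = {"R1": {"g1"}, "R2": {"g2", "g4"}, "R3": {"g3"}, "R4": {"g4"}}
gene_edges = [("g1", "g2"), ("g2", "g3")]
edges = correspondence_graph(f, gene_edges)
print(sorted(edges))
```

Storing each gene edge as a `frozenset` makes the adjacency test independent of the order in which the two genes are written.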


Problem formulation

Walk: ordered sequence of vertices such that any two consecutive vertices of the walk are connected by an arc.
Trail: walk with no repeated arcs.
Path: walk with no repeated vertices.
Every path is a trail, and every trail is a walk.
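The three definitions can be checked mechanically. The sketch below classifies a vertex sequence as the most specific of the three notions; the function name `classify` and the small example digraph are illustrative assumptions, not part of the talk.

```python
def classify(vertices, arcs):
    """Classify a vertex sequence in a digraph.

    arcs -- set of (tail, head) pairs.
    Returns "path", "trail", "walk", or None if some consecutive
    pair of vertices is not joined by an arc.
    """
    steps = list(zip(vertices, vertices[1:]))
    if any(step not in arcs for step in steps):
        return None  # not even a walk
    if len(set(vertices)) == len(vertices):
        return "path"  # no repeated vertex
    if len(set(steps)) == len(steps):
        return "trail"  # vertices repeat, arcs do not
    return "walk"


# Hypothetical digraph: a 3-cycle 1 -> 2 -> 3 -> 1 plus the arc 3 -> 4.
arcs = {(1, 2), (2, 3), (3, 1), (3, 4)}
print(classify([1, 2, 3, 4], arcs))
print(classify([3, 1, 2, 3, 4], arcs))
print(classify([1, 2, 3, 1, 2], arcs))
```

Because the tests run from the strictest condition to the loosest, the nesting of the three notions is reflected directly in the order of the checks.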

Longest Supported Path (LSP) [Fertin et al., 2012]
Input: a directed graph D = (V, A), an undirected graph G = (V, E).
Output: a longest path P in D such that G[V(P)] is connected.

LSP is NP-hard [Fertin et al., 2015]. A heuristic solution has been proposed for the case where D is a DAG [Fertin et al., 2012]. What should be done if D contains cycles?
Span: the number of distinct vertices in a trail.

Supported Trail of Maximum Span (STMS)
Input: a directed graph D = (V, A), an undirected graph G = (V, E), an arc (u, v) in D.
Output: a trail T of maximum span in D passing through (u, v) such that G[V(T)] is connected.

The solution can contain cycles. STMS is NP-hard. How can the trails of D be enumerated? Through the line graph of D.
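The two quantities that STMS relies on, the span of a trail and the connectivity of G[V(T)], are straightforward to compute. The following sketch shows one way to do so; the function names and the toy edge lists are assumptions for illustration.

```python
def span(trail):
    """Number of distinct vertices visited by a trail."""
    return len(set(trail))


def is_supported(trail, g_edges):
    """True if the vertices of the trail induce a connected subgraph of G."""
    nodes = set(trail)
    if not nodes:
        return False
    # Adjacency restricted to the vertices of the trail (induced subgraph).
    adj = {v: set() for v in nodes}
    for a, b in g_edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    # Depth-first search from an arbitrary vertex of the trail.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] - seen:
            seen.add(w)
            stack.append(w)
    return seen == nodes


# Hypothetical data: a trail that revisits vertex 5, and two candidate graphs G.
trail = [1, 4, 5, 6, 7, 5, 8, 3]
g_connected = [(1, 4), (4, 5), (5, 6), (6, 7), (5, 8), (8, 3)]
g_missing_3 = [(1, 4), (4, 5), (5, 6), (6, 7), (5, 8)]
print(span(trail), is_supported(trail, g_connected), is_supported(trail, g_missing_3))
```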

Let D = (V, A) be a directed graph. The line graph of D is the graph $L(D) = (A, A')$, where $(x, y) \in A' \iff x \in A,\ y \in A,\ x = (r, s),\ y = (s, t)$, with $r, s, t \in V$.
[Figure: a directed graph D and its line graph L(D).]

A trail in D (distinct arcs) corresponds to a path in L(D) (distinct vertices). T is the trail in D corresponding to the path P in L(D); notation: $T = L^{-1}(P)$.
Example: P = (1,4) (4,5) (5,6) (6,7) (7,5) (5,8) (8,3) corresponds to the trail T = 1, 4, 5, 6, 7, 5, 8, 3.
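The line graph definition and the mapping from a path P in L(D) back to its trail in D can be sketched in a few lines. The function names `line_graph` and `trail_of` are illustrative; the example path is the one given on the slide.

```python
def line_graph(arcs):
    """Arcs of L(D): (x, y) whenever the head of x is the tail of y."""
    arcs = list(arcs)
    return {(x, y) for x in arcs for y in arcs if x[1] == y[0]}


def trail_of(path):
    """Vertex sequence in D of a path in L(D), i.e. the inverse mapping."""
    trail = [path[0][0]]
    for tail, head in path:
        if tail != trail[-1]:
            raise ValueError("consecutive arcs do not chain")
        trail.append(head)
    return trail


# The path P from the slide, given as a sequence of arcs of D.
P = [(1, 4), (4, 5), (5, 6), (6, 7), (7, 5), (5, 8), (8, 3)]
LD = line_graph(P)
T = trail_of(P)
print(T)
```

The path P never repeats an arc, so it is a path in L(D), while its trail T visits vertex 5 twice.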

Supported Corresponding Trail of Maximum Span (CTMS)
Input: a directed graph D = (V, A), an undirected graph G = (V, E), an arc (u, v) in D.
Output: a path P in the line graph of D such that $L^{-1}(P)$ has maximum span, passes through arc (u, v), and $G[V(L^{-1}(P))]$ is connected.

Allowing for skipped vertices

The search should be flexible enough to skip a few reactions and/or genes [Boyer et al., 2005].
Gap parameters:
δ_G: how many genes can be skipped (default: δ_G = 0).
δ_D: how many reactions can be skipped (default: δ_D = 0).
[Figure: gene neighboring over genes A, B, X, C, D for δ_G = 0, δ_G = 1 and δ_G = 2.]
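One way to read the gene gap parameter is that two genes on the same strand count as neighbors when at most δ_G genes lie between them. The sketch below implements that reading; the interpretation, the function name `gene_neighbors` and the toy gene order are assumptions, and the analogous treatment of δ_D on the reaction side is not shown.

```python
def gene_neighbors(order, delta_g=0):
    """Neighboring-gene edges for genes listed in chromosomal order.

    order   -- list of genes on one strand, in positional order.
    delta_g -- maximum number of genes that may be skipped (assumed
               meaning of the gap parameter).
    """
    edges = set()
    for i, a in enumerate(order):
        # The next 1 + delta_g genes downstream are considered neighbors.
        for b in order[i + 1 : i + 2 + delta_g]:
            edges.add((a, b))
    return edges


# Hypothetical gene order on one strand.
order = ["A", "B", "X", "C", "D"]
for delta in (0, 1, 2):
    print(delta, sorted(gene_neighbors(order, delta)))
```

With δ_G = 1, for instance, B and C become neighbors because only X lies between them.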

Graph reduction

Cover set of a path P in D with respect to G [Fertin et al., 2012]: intuitively, if it exists, it is a maximal subset of vertices of D that could extend P to P' such that P' induces connected subgraphs in G and in the undirected graph underlying D.

Let D = (V, A) be a directed graph, G = (V, E) an undirected graph, (u, v) an arc in D, and S the cover set of (u, v) in D with respect to G.
We have shown that STMS(D, G, (u, v)) yields the same solution as STMS(D[S], G[S], (u, v)).
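The slide gives only the intuition behind the cover set, so the following is a sketch of one plausible reading rather than the definition from Fertin et al.: it computes the largest vertex set containing u and v that induces a connected subgraph both in G and in the undirected graph underlying D, by repeatedly intersecting connected components until nothing changes. The function names and the toy graphs are assumptions.

```python
def _component(nodes, edges, start):
    """Vertices reachable from start in the subgraph induced by nodes
    (edges are treated as undirected)."""
    adj = {x: set() for x in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen


def maximal_common_component(d_arcs, g_edges, arc):
    """Largest set S containing both ends of arc such that G[S] and the
    undirected version of D[S] are connected, or None if no such set exists.
    Assumed stand-in for the cover set of the arc."""
    u, v = arc
    nodes = {x for e in d_arcs for x in e} | {x for e in g_edges for x in e}
    while True:
        kept = _component(nodes, d_arcs, u) & _component(nodes, g_edges, u)
        if v not in kept:
            return None
        if kept == nodes:
            return nodes  # fixed point: both induced subgraphs are connected
        nodes = kept


# Hypothetical data: G lacks any edge reaching vertices 4 and 5 from 1.
d_arcs = [(1, 2), (2, 3), (3, 4), (4, 5)]
g_edges = [(1, 2), (2, 3), (4, 5)]
S = maximal_common_component(d_arcs, g_edges, (1, 2))
print(S)
```

Any trail through (u, v) whose vertices are connected in G must stay inside such a set, which is why restricting both graphs to it cannot discard a solution.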

Path finding in the line graph

[Figure: a directed graph D and its line graph L(D), whose strongly connected components are labeled C1 to C5.]
For every strongly connected component (SCC) of L(D):
- All possible entry and exit points are determined.
- Paths are enumerated between feasible pairs of entry and exit points.
- The best ones (in terms of the span of their corresponding trail in D) are retained.
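The first of these steps can be sketched as follows. The SCC routine below uses a simple forward and backward reachability test, which is adequate for small graphs but is not claimed to be what HNet uses. Treating entry and exit points as the vertices with an arc crossing the component boundary is also an assumption, as are all function names.

```python
def reachable(adj, start):
    """Set of vertices reachable from start following adj."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def sccs(nodes, edges):
    """Strongly connected components: v and w share a component when
    each one reaches the other."""
    fwd, bwd = {}, {}
    for a, b in edges:
        fwd.setdefault(a, set()).add(b)
        bwd.setdefault(b, set()).add(a)
    comps, assigned = [], set()
    for v in nodes:
        if v not in assigned:
            comp = reachable(fwd, v) & reachable(bwd, v)
            comps.append(frozenset(comp))
            assigned |= comp
    return comps


def entry_exit_points(comp, edges):
    """Vertices of comp with an incoming (entry) or outgoing (exit)
    arc that crosses the component boundary."""
    entries = {b for a, b in edges if b in comp and a not in comp}
    exits = {a for a, b in edges if a in comp and b not in comp}
    return entries, exits


# Hypothetical digraph D, given by its arcs, and its line graph.
arcs = [(1, 4), (4, 5), (5, 6), (6, 7), (7, 5), (5, 8), (8, 3)]
ld_edges = {(x, y) for x in arcs for y in arcs if x[1] == y[0]}
components = sccs(arcs, ld_edges)
cycle = next(c for c in components if len(c) > 1)
entries, exits = entry_exit_points(cycle, ld_edges)
print(len(components), sorted(cycle), entries, exits)
```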

HNet

HNet* provides an exact solution to CTMS. It uses:
- the graph reduction to the cover set of the input arc;
- path finding in the line graph.
* HNet stands for Heterogeneous Network Mining.

[Figure: the line graph L(D) with its strongly connected components C1 to C5, and the condensation graph C of L(D).]
- All paths in C are enumerated.
- Every path in C is translated to paths in L(D).
- A path in L(D) is a candidate if (a) it contains vertex (u, v) and (b) its corresponding trail in D induces a connected subgraph in G.
Solution to CTMS: the path in L(D) that fulfills (a) and (b) and whose corresponding trail in D has maximum span.
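To make conditions (a) and (b) concrete, the sketch below solves CTMS on a tiny instance by exhaustive search over the simple paths of L(D). This is a brute-force stand-in and not HNet: it skips the condensation graph and the cover set reduction entirely, and it runs in exponential time. The function name and the toy data are assumptions.

```python
def ctms_bruteforce(d_arcs, g_edges, arc):
    """Return a trail of maximum span through arc whose vertices induce a
    connected subgraph of G, or None. Exhaustive search, small inputs only."""
    arcs = list(d_arcs)
    # Successors in the line graph: y follows x when head(x) == tail(y).
    succ = {x: [y for y in arcs if x[1] == y[0]] for x in arcs}

    def connected(nodes):
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for a, b in g_edges:
                for p, q in ((a, b), (b, a)):
                    if p == v and q in nodes and q not in seen:
                        seen.add(q)
                        stack.append(q)
        return seen == nodes

    best = None

    def extend(path, used):
        nonlocal best
        if arc in used:  # condition (a): the path contains vertex (u, v)
            trail = [path[0][0]] + [head for _, head in path]
            better = best is None or len(set(trail)) > len(set(best))
            if better and connected(set(trail)):  # condition (b)
                best = trail
        for y in succ[path[-1]]:
            if y not in used:  # simple path in L(D), i.e. a trail in D
                extend(path + [y], used | {y})

    for x in arcs:
        extend([x], {x})
    return best


# Hypothetical instance: vertex 3 has no edge in G, so it cannot be covered.
d_arcs = [(1, 4), (4, 5), (5, 6), (6, 7), (7, 5), (5, 8), (8, 3)]
g_edges = [(1, 4), (4, 5), (5, 6), (6, 7), (5, 8)]
best = ctms_bruteforce(d_arcs, g_edges, (5, 6))
print(best)
```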

Application to biological data

Inputs: gene neighboring information, G' = (U, E'), and KEGG metabolic pathways, D = (V, A), combined through the correspondence graph G = (V, E).
HNet output, for every arc (u, v) of D: a trail of maximum span passing through (u, v) and inducing a connected subgraph in G.

Data set: metabolic pathways (on average 73) of 50 bacterial species, with δ_G and δ_D between 0 and 3. With a 5-minute timeout, 95% of the data set was analyzed.
Run-times*:
Strict neighboring (δ_G = δ_D = 0): ~11 minutes. Average time per organism: ~13 seconds; median time per organism: ~8 seconds.
One insertion allowed (δ_G = δ_D = 1): ~2 h 24 min. Average time per organism: ~43 seconds; median time per organism: ~15 seconds.
* Intel Core 2.5 GHz (6 MB cache), 16 GB 1600 MHz

Species shown:
Actinobacteria: sco, Streptomyces coelicolor A3(2); cgl, Corynebacterium glutamicum ATCC; bbv, Bifidobacterium breve ACS-071-V-Sch8b.
Firmicutes: sau, Staphylococcus aureus N315; lmo, Listeria monocytogenes EGD-e; bsu, Bacillus subtilis subsp. subtilis str. 168; snd, Streptococcus pneumoniae ST556; cpe, Clostridium perfringens str. 13.
[Figure: trails found per species over reactions including R03504, R03503, R03067, R00428, R05046, R05048, R04639 and R04620. Gap settings shown: sco δ_G = 1, δ_D = 0; cgl, sau, bbv, lmo, cpe, bsu δ_G = 0, δ_D = 0; snd δ_G = 1, δ_D = 0; cgl δ_G = 0, δ_D = 1.]

[Figure: reactions R05046, R05048, R04639, R00428, R03504, R03503, R03066, R03067 and R02237 shown against neighboring genes in Firmicutes and Actinobacteria. Gene identifiers shown: SA0472, SA0473, SA0474 (sau); SCO3400, SCO3401, SCO3402, SCO3403 (sco); lmo0224, lmo0225, lmo0226 (lmo); BSU00770, BSU00780, BSU00790 (bsu); NCgl2599, NCgl2600, NCgl2601, NCgl2602 (cgl); MYY_0368, MYY_0369, MYY_0370, MYY_0371 (snd); CPE1019, CPE1020, CPE1021, CPE… (cpe).]

Conclusion & perspectives

Conclusion:
- Proposed a new problem formulation for identifying consecutive reactions catalyzed by products of neighboring genes.
- Proposed an exact method (HNet).
- Integrated useful concepts into HNet: cover set [Fertin et al., 2012]; gap parameters [Boyer et al., 2005].
- HNet is quite fast in practice.
Perspectives:
- Analyze metabolic pathway variation (phylogenetic perspective).
- Infer ancestral bacterial metabolism (evolutionary perspective).
- Perform an extensive study of fungi.

Thank you! Questions?

References

Babou, H. M. (2012). Comparaison de réseaux biologiques (Doctoral dissertation, Université de Nantes).
Boyer, F., Morgat, A., Labarre, L., Pothier, J., & Viari, A. (2005). Syntons, metabolons and interactons: an exact graph-theoretical approach for exploring neighbourhood between genomic and functional data. Bioinformatics, 21(23).
Fertin, G., Babou, H. M., & Rusu, I. (2012, June). Algorithms for subnetwork mining in heterogeneous networks. In International Symposium on Experimental Algorithms. Springer Berlin Heidelberg.
Fertin, G., Komusiewicz, C., Mohamed-Babou, H., & Rusu, I. (2015). Finding Supported Paths in Heterogeneous Networks. Algorithms, 8(4).



More information

GROOLS: Reactive Graph Reasoning for Genome Annotation

GROOLS: Reactive Graph Reasoning for Genome Annotation GROOLS: Reactive Graph Reasoning for Genome Annotation Jonathan Mercier 123 and David Vallenet 123 1 Direction des Sciences du Vivant, CEA, Institut de Génomique, Genoscope, LABGeM, Evry, France 2 CNRS-UMR8030,

More information

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION

RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION RECOVERING NORMAL NETWORKS FROM SHORTEST INTER-TAXA DISTANCE INFORMATION MAGNUS BORDEWICH, KATHARINA T. HUBER, VINCENT MOULTON, AND CHARLES SEMPLE Abstract. Phylogenetic networks are a type of leaf-labelled,

More information

Network alignment and querying

Network alignment and querying Network biology minicourse (part 4) Algorithmic challenges in genomics Network alignment and querying Roded Sharan School of Computer Science, Tel Aviv University Multiple Species PPI Data Rapid growth

More information

Applying Bayesian networks in the game of Minesweeper

Applying Bayesian networks in the game of Minesweeper Applying Bayesian networks in the game of Minesweeper Marta Vomlelová Faculty of Mathematics and Physics Charles University in Prague http://kti.mff.cuni.cz/~marta/ Jiří Vomlel Institute of Information

More information

V14 Graph connectivity Metabolic networks

V14 Graph connectivity Metabolic networks V14 Graph connectivity Metabolic networks In the first half of this lecture section, we use the theory of network flows to give constructive proofs of Menger s theorem. These proofs lead directly to algorithms

More information

Additional file 1 for Structural correlations in bacterial metabolic networks by S. Bernhardsson, P. Gerlee & L. Lizana

Additional file 1 for Structural correlations in bacterial metabolic networks by S. Bernhardsson, P. Gerlee & L. Lizana Additional file 1 for Structural correlations in bacterial metabolic networks by S. Bernhardsson, P. Gerlee & L. Lizana Table S1 The species marked with belong to the Proteobacteria subset and those marked

More information

Finding a gene tree in a phylogenetic network Philippe Gambette

Finding a gene tree in a phylogenetic network Philippe Gambette LRI-LIX BioInfo Seminar 19/01/2017 - Palaiseau Finding a gene tree in a phylogenetic network Philippe Gambette Outline Phylogenetic networks Classes of phylogenetic networks The Tree Containment Problem

More information

Maximum Motif Problem in Vertex-Colored Graphs

Maximum Motif Problem in Vertex-Colored Graphs Maximum Motif Problem in Vertex-Colored Graphs Riccardo Dondi, Guillaume Fertin, Stéphane Vialette To cite this version: Riccardo Dondi, Guillaume Fertin, Stéphane Vialette. Maximum Motif Problem in Vertex-

More information

ACO Comprehensive Exam March 20 and 21, Computability, Complexity and Algorithms

ACO Comprehensive Exam March 20 and 21, Computability, Complexity and Algorithms 1. Computability, Complexity and Algorithms Part a: You are given a graph G = (V,E) with edge weights w(e) > 0 for e E. You are also given a minimum cost spanning tree (MST) T. For one particular edge

More information

A greedy, graph-based algorithm for the alignment of multiple homologous gene lists

A greedy, graph-based algorithm for the alignment of multiple homologous gene lists A greedy, graph-based algorithm for the alignment of multiple homologous gene lists Jan Fostier, Sebastian Proost, Bart Dhoedt, Yvan Saeys, Piet Demeester, Yves Van de Peer, and Klaas Vandepoele Bioinformatics

More information

Towards More Effective Formulations of the Genome Assembly Problem

Towards More Effective Formulations of the Genome Assembly Problem Towards More Effective Formulations of the Genome Assembly Problem Alexandru Tomescu Department of Computer Science University of Helsinki, Finland DACS June 26, 2015 1 / 25 2 / 25 CENTRAL DOGMA OF BIOLOGY

More information

Broadcasting With Side Information

Broadcasting With Side Information Department of Electrical and Computer Engineering Texas A&M Noga Alon, Avinatan Hasidim, Eyal Lubetzky, Uri Stav, Amit Weinstein, FOCS2008 Outline I shall avoid rigorous math and terminologies and be more

More information

Hamilton Cycles in Digraphs of Unitary Matrices

Hamilton Cycles in Digraphs of Unitary Matrices Hamilton Cycles in Digraphs of Unitary Matrices G. Gutin A. Rafiey S. Severini A. Yeo Abstract A set S V is called an q + -set (q -set, respectively) if S has at least two vertices and, for every u S,

More information

Balanced Allocation Through Random Walk

Balanced Allocation Through Random Walk Balanced Allocation Through Random Walk Alan Frieze Samantha Petti November 25, 2017 Abstract We consider the allocation problem in which m (1 ε)dn items are to be allocated to n bins with capacity d.

More information

Undirected Graphical Models

Undirected Graphical Models Undirected Graphical Models 1 Conditional Independence Graphs Let G = (V, E) be an undirected graph with vertex set V and edge set E, and let A, B, and C be subsets of vertices. We say that C separates

More information

On the Complexity of the Minimum Independent Set Partition Problem

On the Complexity of the Minimum Independent Set Partition Problem On the Complexity of the Minimum Independent Set Partition Problem T-H. Hubert Chan 1, Charalampos Papamanthou 2, and Zhichao Zhao 1 1 Department of Computer Science the University of Hong Kong {hubert,zczhao}@cs.hku.hk

More information

A new algorithm to construct phylogenetic networks from trees

A new algorithm to construct phylogenetic networks from trees A new algorithm to construct phylogenetic networks from trees J. Wang College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia, China Corresponding author: J. Wang E-mail: wangjuanangle@hit.edu.cn

More information

Hamiltonian problem on claw-free and almost distance-hereditary graphs

Hamiltonian problem on claw-free and almost distance-hereditary graphs Discrete Mathematics 308 (2008) 6558 6563 www.elsevier.com/locate/disc Note Hamiltonian problem on claw-free and almost distance-hereditary graphs Jinfeng Feng, Yubao Guo Lehrstuhl C für Mathematik, RWTH

More information

A SEQUENTIAL ELIMINATION ALGORITHM FOR COMPUTING BOUNDS ON THE CLIQUE NUMBER OF A GRAPH

A SEQUENTIAL ELIMINATION ALGORITHM FOR COMPUTING BOUNDS ON THE CLIQUE NUMBER OF A GRAPH A SEQUENTIAL ELIMINATION ALGORITHM FOR COMPUTING BOUNDS ON THE CLIQUE NUMBER OF A GRAPH Bernard Gendron Département d informatique et de recherche opérationnelle and Centre de recherche sur les transports

More information

Genome-Wide Detection and Analysis of Cell Wall-Bound Proteins with LPxTG-Like Sorting Motifs

Genome-Wide Detection and Analysis of Cell Wall-Bound Proteins with LPxTG-Like Sorting Motifs JOURNAL OF BACTERIOLOGY, July 2005, p. 4928 4934 Vol. 187, No. 14 0021-9193/05/$08.00 0 doi:10.1128/jb.187.14.4928 4934.2005 Copyright 2005, American Society for Microbiology. All Rights Reserved. Genome-Wide

More information

Multiple Whole Genome Alignment

Multiple Whole Genome Alignment Multiple Whole Genome Alignment BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 206 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

Prokaryotic phylogenies inferred from protein structural domains

Prokaryotic phylogenies inferred from protein structural domains Letter Prokaryotic phylogenies inferred from protein structural domains Eric J. Deeds, 1 Hooman Hennessey, 2 and Eugene I. Shakhnovich 3,4 1 Department of Molecular and Cellular Biology, Harvard University,

More information

Alternating cycles and paths in edge-coloured multigraphs: a survey

Alternating cycles and paths in edge-coloured multigraphs: a survey Alternating cycles and paths in edge-coloured multigraphs: a survey Jørgen Bang-Jensen and Gregory Gutin Department of Mathematics and Computer Science Odense University, Denmark Abstract A path or cycle

More information

Combinatorial Optimization

Combinatorial Optimization Combinatorial Optimization Problem set 8: solutions 1. Fix constants a R and b > 1. For n N, let f(n) = n a and g(n) = b n. Prove that f(n) = o ( g(n) ). Solution. First we observe that g(n) 0 for all

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Feasibility of Motion Planning on Directed Graphs

Feasibility of Motion Planning on Directed Graphs Feasibility of Motion Planning on Directed Graphs Zhilin Wu 1 and Stéphane Grumbach 2 1 CASIA-LIAMA, zlwu@liama.ia.ac.cn 2 INRIA-LIAMA, stephane.grumbach@inria.fr Abstract. Because of irreversibility of

More information

A necessary and sufficient condition for the existence of a spanning tree with specified vertices having large degrees

A necessary and sufficient condition for the existence of a spanning tree with specified vertices having large degrees A necessary and sufficient condition for the existence of a spanning tree with specified vertices having large degrees Yoshimi Egawa Department of Mathematical Information Science, Tokyo University of

More information

1 Matchings in Non-Bipartite Graphs

1 Matchings in Non-Bipartite Graphs CS 598CSC: Combinatorial Optimization Lecture date: Feb 9, 010 Instructor: Chandra Chekuri Scribe: Matthew Yancey 1 Matchings in Non-Bipartite Graphs We discuss matching in general undirected graphs. Given

More information

FINAL EXAM PRACTICE PROBLEMS CMSC 451 (Spring 2016)

FINAL EXAM PRACTICE PROBLEMS CMSC 451 (Spring 2016) FINAL EXAM PRACTICE PROBLEMS CMSC 451 (Spring 2016) The final exam will be on Thursday, May 12, from 8:00 10:00 am, at our regular class location (CSI 2117). It will be closed-book and closed-notes, except

More information

A Self-Stabilizing Algorithm for Finding a Minimal Distance-2 Dominating Set in Distributed Systems

A Self-Stabilizing Algorithm for Finding a Minimal Distance-2 Dominating Set in Distributed Systems JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 1709-1718 (2008) A Self-Stabilizing Algorithm for Finding a Minimal Distance-2 Dominating Set in Distributed Systems JI-CHERNG LIN, TETZ C. HUANG, CHENG-PIN

More information

Intuitionistic Fuzzy Estimation of the Ant Methodology

Intuitionistic Fuzzy Estimation of the Ant Methodology BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 9, No 2 Sofia 2009 Intuitionistic Fuzzy Estimation of the Ant Methodology S Fidanova, P Marinov Institute of Parallel Processing,

More information

Perfect Sorting by Reversals and Deletions/Insertions

Perfect Sorting by Reversals and Deletions/Insertions The Ninth International Symposium on Operations Research and Its Applications (ISORA 10) Chengdu-Jiuzhaigou, China, August 19 23, 2010 Copyright 2010 ORSC & APORC, pp. 512 518 Perfect Sorting by Reversals

More information