METABOLIC PATHWAY PREDICTION/ALIGNMENT

Similar documents
Biological Pathways Representation by Petri Nets and extension

BIOINFORMATICS: An Introduction

Overview of Kinetics

Bioinformatics. Dept. of Computational Biology & Bioinformatics

86 Part 4 SUMMARY INTRODUCTION

GOSAP: Gene Ontology Based Semantic Alignment of Biological Pathways

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Metabolic modelling. Metabolic networks, reconstruction and analysis. Esa Pitkänen Computational Methods for Systems Biology 1 December 2009

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

ATLAS of Biochemistry

Missouri Educator Gateway Assessments

V19 Metabolic Networks - Overview

V14 Graph connectivity Metabolic networks

V14 extreme pathways

BIOINFORMATICS LAB AP BIOLOGY

Graph Alignment and Biological Networks

Chapter 7: Metabolic Networks

Prediction of protein function from sequence analysis

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

BMD645. Integration of Omics

Genomic Comparison of Bacterial Species Based on Metabolic Characteristics

Computational Structural Bioinformatics

Activating Strategy. AP Lesson #10. EQ: What is metabolism and what role does energy play in metabolism? How does energy move through an environment?

Mestrado em Microbiologia Aplicada. FRM Fisiologia e Regulação Microbiana. IFRM Introdução à Fisiologia e Regulação Microbiana. Ano Lectivo 2017/2018

Comparative genomics: Overview & Tools + MUMmer algorithm

Curriculum Links. AQA GCE Biology. AS level

CSCE555 Bioinformatics. Protein Function Annotation

Biochemistry 3300 Problems (and Solutions) Metabolism I

International Journal of Scientific & Engineering Research, Volume 6, Issue 2, February ISSN

Predicting rice (Oryza sativa) metabolism

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

In-Depth Assessment of Local Sequence Alignment

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Mutation Selection on the Metabolic Pathway and the Effects on Protein Co-evolution and the Rate Limiting Steps on the Tree of Life

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Identifying Signaling Pathways

Course plan Academic Year Qualification MSc on Bioinformatics for Health Sciences. Subject name: Computational Systems Biology Code: 30180

SABIO-RK Integration and Curation of Reaction Kinetics Data Ulrike Wittig

Chapter 8: An Introduction to Metabolism

Chapter 3. Chemistry of Life

Updated: 10/11/2018 Page 1 of 5

Chapter 6- An Introduction to Metabolism*

Computational Molecular Biology (

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Supplementary Information

Introduction on metabolism & refresher in enzymology

AP Biology Essential Knowledge Cards BIG IDEA 1

Chapter 15 part 2. Biochemistry I Introduction to Metabolism Bioenergetics: Thermodynamics in Biochemistry. ATP 4- + H 2 O ADP 3- + P i + H +

Chapter 6: Energy Flow in the Life of a Cell

FCModeler: Dynamic Graph Display and Fuzzy Modeling of Regulatory and Metabolic Maps

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

2. In regards to the fluid mosaic model, which of the following is TRUE?

Stoichiometric analysis of metabolic networks

Chapter 8: An Introduction to Metabolism

EVOLUTIONARY DISTANCES

Integration of functional genomics data

Computational Biology: Basics & Interesting Problems

Application of Associative Matrices to Recognize DNA Sequences in Bioinformatics

BIOLOGY 111. CHAPTER 1: An Introduction to the Science of Life

Berg Tymoczko Stryer Biochemistry Sixth Edition Chapter 1:

Energy and Cellular Metabolism

Cell biology traditionally identifies proteins based on their individual actions as catalysts, signaling

Computational Biology 1

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

ADVANCED PLACEMENT BIOLOGY

Chapter 8: An Introduction to Metabolism

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms

Bioinformatics 2. Yeast two hybrid. Proteomics. Proteomics

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Range of Competencies

Comparative Network Analysis

A critical examination of stoichiometric and path-finding approaches to metabolic pathways Francisco J. Planes and John E. Beasley

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

AP Biology Review Chapters 6-8 Review Questions Chapter 6: Metabolism: Energy and Enzymes Chapter 7: Photosynthesis Chapter 8: Cellular Respiration

CHAPTER 15 Metabolism: Basic Concepts and Design

Unit 1: Chemistry of Life Guided Reading Questions (80 pts total)

RGP finder: prediction of Genomic Islands

Mapping and Filling Metabolic Pathways

Map of AP-Aligned Bio-Rad Kits with Learning Objectives

Integrated Knowledge-based Reverse Engineering of Metabolic Pathways

V15 Flux Balance Analysis Extreme Pathways

Types of biological networks. I. Intra-cellurar networks

Bioinformatics and BLAST

Support vector machines, Kernel methods, and Applications in bioinformatics

Introduction to Bioinformatics

Unravelling the biochemical reaction kinetics from time-series data

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Cellular Automata Approaches to Enzymatic Reaction Networks

Dr. Amira A. AL-Hosary

BIOLOGY STANDARDS BASED RUBRIC

Matter and Substances Section 3-1

Grundlagen der Systembiologie und der Modellierung epigenetischer Prozesse

2.1 The Nature of Matter

Cell and Molecular Biology

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

HOW TO USE MIKANA. 1. Decompress the zip file MATLAB.zip. This will create the directory MIKANA.

Transcription:

COMPUTATIONAL SYSTEMIC BIOLOGY METABOLIC PATHWAY PREDICTION/ALIGNMENT Hofestaedt R*, Chen M Bioinformatics / Medical Informatics, Technische Fakultaet, Universitaet Bielefeld Postfach 10 01 31, D-33501 Bielefeld, Germany * Corresponding author: e-mail: ralfhofestaedt@uni-bielefeldde Keywords: metabolic pathway, prediction, alignment, PathAligner Summary Motivation: Comparing metabolic pathways are important to identify conservation and variations among different biology system Alignment is a strong indicator of the biologically significant relationship Results: A definition of metabolic pathway is defined An alignment algorithm and a computational system are developed to reveal the similarities between metabolic pathways Availability: http://bibiservtechfakuni-bielefeldde/pathaligner Introduction Today a huge amount of molecular data about different organisms has been accumulated and systematically stored in specific databases (Collado-Vides, Hofestaedt, 2002) This rapid accumulation of biological data provides the possibility of studying metabolic pathways systematically Analysis of metabolic pathways is an essential topic in understanding the relationship between genotype to phenotype (Dandekar et al, 1999) Researches on genomic sequence alignment have been so far intensively conducted Applications and tools, such as FASTA [http://wwwebiacuk/fasta3] and BLAST [http://wwwncbinlmnihgov/ BLAST] have been developed to further understand the biological homology and estimate evolutionary distance Although the information provided by sequenced genomes can yield insights into their evolution and cellular metabolism, knowledge of the genome sequence alone is really only the start point of the real work Several approaches of metabolic pathway alignment are already made in the past years Forst CV and Schulten K (1999; 2001) extended the DNA sequence alignment methods to define distances between metabolic pathways by combining sequence information of involved genes Dandekar et al (1999) compared glycoslysis, Entner-Doudoroff pathway and pyruvate processing in 17 organisms based on the genomic and metabolic pathway data by aligning specific pathway related enzyme-encoding genes on the genomes Tohsato Y et al (2000) proposed a multiple (local) alignment algorithm by utilizing information content that was extended to symbols having a hierarchical structure EC numbers We are going to present a new pathway alignment strategy to analysis and fully characterize metabolic pathways in the cell Metabolic Pathway Definitions A biochemical pathway is defined by Mavrovouniotis ML (1995) as an abstraction of a subset of intricate networks in the soup of interacting biomolecules A prevailing definition is that a metabolic pathway is a special case of a metabolic network with distinct start and end points, initial and terminal vertices, respectively, and a unique path between them, ie a directed reaction graph with substrates as vertices and arcs denoting enzymatic reactions (Forst, Schulten, 1999) Typical metabolic pathways are given by the wall chart of Boehringer Mannheim [http://wwwexpasych/ tools/pathways/] and KEGG [http://wwwgenomeadjp/kegg/metabolismhtml], which have been verified with a number of printed and on-line sources Databases such as KEGG, WIT [http:// 68

witmcsanlgov/wit2/] represent metabolic pathway graphs with labeled arcs indicating the involved enzymes Traditionally metabolic pathways have been defined in the context of their historical discovery, often named after key molecules (eg glycolysis, urea cycle, pentose phosphate pathway and citric acid cycle and so on) Schuster et al (2000) provided a general definition of metabolic pathways based on the concept of elementary flux modes The basic strategy to represent and compute pathways is the reactant-product binary relation Properties of the pathway that rely upon the integration of two or more input molecules and unrelated output molecules and feedback effects are ignored Obviously, a metabolic pathway is a special part of complex network of reactants, products and enzymes with multiple interconnections representing reactions and regulation One is called a pathway only if they are linear and non-branched A pathway s substrates are usually the products of another pathway, and there are junctions where pathways meet or cross We consider that a metabolic pathway is a subset of reactions that describe the biochemical conversion of a given reactant to its desired end product Let M = { m1,, m n } be a set of metabolites in cells Let e i : M M be a function for enzymatic reactions taking place in the cells The fact that e i is a function from a set of substrates S (S M) into a set of products P (P M) It can be written as follows: e i : S P for all m 1, m 2, m 3 M, the following property holds: e 1 ) = m 2 and (m 2 ) = m 3 (e 1 )) = m 3 Let e 1 ) = m 2, (m 2 ) = m 3,, e k (m k ) = m k+1, we define e 1 e k ) = e k (e k-1 e 1 )) = m k+1 A new proposed definition of the metabolic pathway is presented and discussed in the following paragraphs Definition 1 Given e i : M M, a metabolic pathway is defined as a subset of successive enzymatic reaction events P = e 1 e k Each enzymatic reaction e i (1 i k) is catalyzed by a certain enzyme that is denoted as a unique EC number The EC number is expressed with a 4-level hierarchical scheme that has been developing by the International Union of Biochemistry and Molecular Biology (IUBMB) The 4-digit EC number, d 1 d 2 d 3 d 4 represents a sub-sub-subclass indication of biochemical reaction For instance, arginase is numbered by EC 3531, which indicates that the enzyme is a hydrolase (EC 3***), acts on the carbon-nitrogen bonds, other than peptide bonds (sub-class EC 35**) in linear amidines (sub-sub-class EC 353*) Thus we can adapt the EC number as a unique name for the responding enzyme catalyzed reaction Metabolic Pathway Alignment Theory Basics In order to score the similarity (percent identity) between two metabolic pathways, we define the similarity function Definition 2 Let E be a finite set of e functions, an edit operation is an ordered pair (α,β) (Ε {ε}) (Ε {ε})\{(ε,ε)} α and β denote 4-digit EC strings of enzymatic reaction function, eg α=e 1111 β=345, ε denotes the empty string for null function However, if α ε and β ε, then the edit operation (α,β) is identified with a pair of enzymatic reaction function An edit operation (α,β) is written as α β (we can simply written α, β as EC numbers) There are 69

three kinds of edit operations: α ε denotes the deletion of the enzymatic reaction function α, ε β denotes the insertion of the enzymatic reaction function β, α β denotes the replacement of the enzymatic reaction function α by the enzymatic reaction function β, notice that ε ε never happens Definition 3 Let E 1 =e 1 e m =e 1 e n be two metabolic pathways, an alignment of E 1 is a pair sequence (α 1 β 1 β h ) of edit operations such that E 1 =α 1 = β 1,, β h Example 1 The alignment A = (2423 2424, 3545 ε, 3135 3135, ε 2749) of the pathways 423 e 3545 e 3135 and 424 e 3135 749 is written as follows, one over the other: 2423 2424 3545 ε 3135 3135 ε 2749 Similarity Function Definition 4 A similarity function σ assigns to each edit operation (α,β) a nonnegative real number The similarity σ (α,ε) and σ (ε,β) of the deletion operation (α,ε) and insertion operation (ε,β) is 0 For all replacement operations (α,β) α ε, β ε, say, α = d 1 d 2 d 3 d 4 and β =d 1 d 2 d 3 d 4, then the similarity function σ (α,β) is defined by: 0, if (d 1 d 1 ); 025, if (d 1 =d 1 and d 2 d 2 ); σ (α,β) = 05, if (d 1 =d 1 and d 2 =d 2 and d 3 d 3 ); 075, if (d 1 =d 1 and d 2 =d 2 and d 3 = d 3 and d 4 d 4 ); 1, if (d 1 =d 1 and d 2 =d 2 and d 3 = d 3 and d 4 = d 4 ie α=β) The definition does not exclude the possibility that d 4, d 3 d 4, and d 2 d 3 d 4 can be respectively expressed as wide card symbols *, ** and *** which means no clear classification of the enzyme However single pair of EC string comparison just means to measure how different EC strings are Often it is additionally of interest to analyze the total difference between two strings into σ collection of individual elementary differences The most important mode of such analyses is an alignment of the pathways The function s can be extended to alignments in a straightforward way: the similarity σ (A) of an alignment A=(α 1 β 1 β h ) is the sum of the similarities of the edit operations A consists of h σ ( A) = σ( αi βi) i= 1 A alignment scoring scheme, Score(E 1,E 2 ) of two metabolic pathways is the minimal mean similarity of their alignment 1 Score( E 1,E2) = σ ( A) max( m,n), where m, n are the lengths of pathways 70

Algorithms and Implementation The pairwise alignment algorithm is as follows: 1 Initialize the set of unaligned EC number sequences, and the lengths; 2 Starting from both ends towards the middle, align one sequence to another and attempt to find all EC numbers with same 4-level hierarchical numbers Score the similarities Recall the alignment positions where EC number are identical and cut the sequences into more subsequences by removing the identical EC numbers; 3 Each pair of sub-sequences is initialized to begin a new round of 3-level hierarchical EC number matching till all pairs of subsequences are aligned A similarity score is calculated afterwards; 4 Apply the same rule again, find the similarities of rest unaligned sub-sub-sequences based on 2-level hierarchical EC number matching and then sub-sub-sub-sequences on 1-level matching if any The algorithm has been implemented into the PathAligner system (http://bibiservtechfakunibielefeldde/pathaligner) (Fig) Fig A screenshot of PathAligner Three web-based alignment interfaces are implemented: E-E Alignment, M-E-M Alignment and Multiple Alignment E-E Alignment uses the basic algorithm to align two linear metabolic pathways (represented as EC number sequences) User can also align any such a metabolic pathway against our pool database to find a list of hits M-E-M Alignment considers the differences of metabolites in two pathways, which are presented as Metabolite-EC number-metabolite patterns of sequence It is possible to pick up two such pathways and align them to identify whether they are alternative pathways or partially are Multiple Alignment allows the alignment of more than two metabolic pathways 71

Conclusion Identification and analysis of metabolic networks is a complex task due to the complexity of the metabolic system Abstract pathway defined as a linear reaction sequence is practical for our alignment algorithm We have presented an algorithm to study the problem of metabolic pathway alignment Our algorithm calculates the hierarchical similarities of EC numbers mapping from both ends of the sequences The algorithm has been successfully implemented into the PathAligner system References Collado-Vides J, Hofestädt R Gene regulation and Metabolism Post genomic Computational Approaches MIT Press, Cambridge, MA 2002 Dandekar T, Schuster S, Snel B et al Pathway alignment: application to the comparative analysis of glycolytic enzymes // Biochem J 1999 V 1 P 115 24 Forst CV, Schulten K Evolution of metabolisms: a new method for the comparison of metabolic pathways using genomics information // J Comput Biol 1999 V 6 P 343 360 Forst CV, Schulten K Phylogenetic analysis of metabolic pathways // J Mol Evol 2001 V 52 P 471 489 Mavrovouniotis ML Computational methods for complex metabolic systems: representation of multiple levels of detail // Bioinformatics & Genome Research / Eds HA Lim, CR Cantor World Scientific, 1995 P 265 273 Schuster S, Fell D, Dandekar T A general definition of metabolic pathways uuseful for systematic organization and analysis of complex metabolic networks // Nature Biotechnol 2000 V 18 P 326 332 Tohsato Y, Matsuda H, Hashimot A A multiple alignment algorithm for metabolic pathway analysis using enzyme hierarchy // Proc 8th International Conference on Intelligent Systems for Molecular Biology (ISMB 2000), 2000 P 376 383 72