Synthesizing and simplifying biological networks from pathway level information


Synthesizing and simplifying biological networks from pathway level information Bhaskar DasGupta Department of Computer Science University of Illinois at Chicago Chicago, IL 60607-7053 dasgupta@cs.uic.edu Joint works with Reka Albert, Piotr Berman, German Enciso, Sema Kachalo, Paola Vera-Licona, Eduardo Sontag, Kelly Westbrooks, Alexander Zelikovsky and Ranran Zhang Supported by NSF grants DBI-0543365 and IIS-0346973

Cellular Networks: Even a single cell is too complex for its functions to be understood completely. Various technologies have facilitated the monitoring of gene expression and protein activity. It remains difficult to find the causal relations and the overall structure of the network. http://www.nyas.org/ebriefreps/ebrief/000534/images/mendes2.gif

Reverse engineering issues. Given: partial knowledge about the process/network; access to suitable biological experiments. Question: how to gain more knowledge about the process/network? Constraint: effective use of resources (time, cost).

Reverse engineering Process of backward reasoning, requiring careful observation of inputs and outputs, to elucidate the structure of the system. http://www.computerworld.com/computerworld/records/images/story/46reverse-engineering.gif

Ingredients for reverse engineering of biological networks: appropriate mathematical models (here, a differential equation model); computational techniques, i.e., algorithms (here, set multicover); biological experiments (here, perturbation experiments).

Iterative process in systems biology

Difficulty with traditional perturbation experiments: a perturbation given to any gene or part of the network may quickly spread to the whole network; only global changes can be measured. http://www.cumc.columbia.edu/news/journal/journal-o/winter-2006/img/magnet-diagram.jpg

Differential Equation Model of Biological Systems: state variables evolve by (unknown) differential equations

$$\frac{\partial x_i}{\partial t} = f_i(x_1, \ldots, x_n, p_1, \ldots, p_m), \quad i = 1, \ldots, n, \qquad \text{i.e.,} \quad \frac{\partial x}{\partial t} = f(x, p)$$

where $x = (x_1(t), \ldots, x_n(t))$ are state variables over time $t$ that are measurable (e.g., activity levels of proteins); $p = (p_1, \ldots, p_m)$ are parameters that can be manipulated; and $f(x^*, p^*) = 0$, where $p^*$ is the wild-type (i.e., normal) condition of $p$ and $x^*$ is the corresponding steady-state condition.

Settings for the modular response analysis method: we do not know $f$, but prior information of the following type is available: parameter $p_j$ does or does not affect variable $x_i$ (i.e., $\partial f_i / \partial p_j \equiv 0$ or not). Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002

Experimental protocols (perturbation experiments): perturb one parameter, say $p_j$; let the system relax to steady state; for the perturbed $p$, measure the steady-state vector $x = \xi(p)$, i.e., measure each $x_i$ (western blots, microarrays, etc.); estimate the $n$ sensitivities

$$b_{ij} = \frac{\partial \xi_i}{\partial p_j}(p^*) \approx \frac{1}{p_j - p_j^*}\left(\xi_i\bigl(p^* + (p_j - p_j^*)\,e_j\bigr) - \xi_i(p^*)\right) \quad \text{for } i = 1, 2, \ldots, n,$$

where $e_j$ is the $j$-th canonical basis vector.
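The finite-difference estimate in the protocol above can be sketched in a few lines of Python. This is only an illustration: `steady_state` is a hypothetical toy map standing in for a real measurement, and the names `estimate_sensitivities`, `p_star` and `delta` are assumptions introduced here, not part of the original method description.

```python
def steady_state(p):
    """Hypothetical toy steady-state map xi(p) for a 2-variable system.

    It stands in for a real measurement (e.g., a western blot readout)
    and is an assumption made only so that the sketch can run.
    """
    p1, p2 = p
    x1 = 2.0 * p1
    x2 = 3.0 * p1 + 0.5 * p2
    return [x1, x2]


def estimate_sensitivities(xi, p_star, j, delta):
    """Finite-difference estimate of b_ij = d xi_i / d p_j at p_star.

    Only parameter j is perturbed (by `delta`); the perturbed and the
    reference steady states are then compared component by component.
    """
    x_ref = xi(p_star)
    p_pert = list(p_star)
    p_pert[j] += delta
    x_pert = xi(p_pert)
    return [(xp - xr) / delta for xp, xr in zip(x_pert, x_ref)]


p_star = [1.0, 2.0]  # assumed wild-type parameter values
column_0 = estimate_sensitivities(steady_state, p_star, j=0, delta=0.01)
column_1 = estimate_sensitivities(steady_state, p_star, j=1, delta=0.01)
```

Each call yields one column of sensitivities, one entry per measured variable, so one perturbation experiment per parameter fills in the whole matrix.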

Modeling Goal: the modeling goal can be at different levels:
1. Topology of connections only
2. Direction of the relationship
3. Information about stimulatory or inhibitory effects
4. Strength of relationship
[Figure: example network on nodes A, B, C and D whose edges carry signs (+/-) and numeric strengths.] Stark et al., Trends Biotechnology 21, pp. 290-293, 2003

Our very modest goal: obtain information about the sign of $\partial f_i / \partial x_j\,(x^*, p^*)$; e.g., if $\partial f_i / \partial x_j > 0$, then $x_j$ has a positive (catalytic) effect on the formation of $x_i$.

In a nutshell: after some combinatorics and linear algebra, one can quantify the additional prior knowledge necessary to reach the goal. Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002; Berman, DasGupta and Sontag, Discrete Applied Math, 2007; Berman, DasGupta and Sontag, Annals of NYAS, 2007
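The linear-algebra step can be sketched as follows. This is a reconstruction under stated assumptions rather than a quotation from the cited papers: $f$ is assumed differentiable, and the steady-state map $\xi$ is assumed to satisfy $f(\xi(p), p) = 0$ for all $p$ near $p^*$. Differentiating that identity with respect to $p_j$ gives:

```latex
\sum_{k=1}^{n} \frac{\partial f_i}{\partial x_k}(x^*, p^*)\, b_{kj}
  \;+\; \frac{\partial f_i}{\partial p_j}(x^*, p^*) \;=\; 0,
\qquad
b_{kj} = \frac{\partial \xi_k}{\partial p_j}(p^*).
```

So whenever prior knowledge says that $p_j$ does not affect $f_i$ (the second term vanishes), the unknown row vector $\bigl(\partial f_i / \partial x_1, \ldots, \partial f_i / \partial x_n\bigr)$ must be orthogonal to the measured column $(b_{1j}, \ldots, b_{nj})$. Enough such orthogonality constraints restrict that row, and hence the signs of its entries.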

But, assuming (near-)sufficient prior information, how does one determine a minimum or near-minimum number of perturbation experiments that will work? This now becomes an algorithmic/complexity issue...

After some effort, one can see that designing minimal sets of experiments leads to the set multi-cover problem
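To make the set multicover problem concrete, here is a minimal Python sketch. It uses a plain greedy heuristic, not the randomized algorithms mentioned elsewhere in these slides, and the modelling of each experiment as a set of covered items, as well as all names (`greedy_set_multicover`, `experiments`, `coverage`), are assumptions for illustration only.

```python
def greedy_set_multicover(universe, sets, coverage):
    """Greedy heuristic: pick sets until every element is covered `coverage` times.

    `sets` maps a set name to the elements it covers; each set is used at
    most once. Returns the chosen names in pick order, or raises ValueError
    if the coverage requirement cannot be met.
    """
    remaining = {element: coverage for element in universe}
    unused = dict(sets)
    chosen = []

    def gain(name):
        # Number of still-needy elements this set would help cover.
        return sum(1 for e in unused[name] if remaining.get(e, 0) > 0)

    while any(need > 0 for need in remaining.values()):
        # sorted() makes tie-breaking deterministic (first name wins).
        best = max(sorted(unused), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("coverage requirement cannot be satisfied")
        for e in unused.pop(best):
            if remaining.get(e, 0) > 0:
                remaining[e] -= 1
        chosen.append(best)
    return chosen


# Hypothetical experiments, each described by the items it covers.
experiments = {"e1": {1, 2}, "e2": {2, 3}, "e3": {1, 3}, "e4": {1, 2, 3}}
picked = greedy_set_multicover({1, 2, 3}, experiments, coverage=2)
```

The greedy rule gives no optimality guarantee on its own, but it shows the shape of the problem: each item must be covered several times, and the aim is to use as few sets (experiments) as possible.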

Overall high-level picture: Modular Response Analysis for the differential equations model; linear algebraic formulation; combinatorial formulation; combinatorial algorithms (randomized); selection of appropriate perturbation experiments.

In our biological application context, it means... we can provide a set of suggested experiments such that # of experiments minimum possible

Experimental validation of Modular Response Analysis (MRA) Method Growth factor-induced MAPK network topology shapes Erk response determining PC-12 cell fate by Silvia D. M. Santos, Peter J. Verveer, Philippe I. H. Bastiaens Nature Cell Biology 9, 324-330 (2007)

Experimental validation (continued): The MAPK pathway, involving the proteins Raf, Mek and Erk, is activated through the receptor tyrosine kinases TrkA and epidermal growth factor receptor (EGFR) by two different stimuli, NGF (neuronal growth factor) or EGF (epidermal growth factor). The MRA method was applied to determine the MAPK network architecture in the context of NGF and EGF stimulations.

Another ongoing work on reverse engineering (with Paola Vera-Licona (INRIA), Eduardo Sontag (Rutgers), Joe Dundas (UIC)) Comparison of reverse engineering methods to infer network topology from gene expression data

[Figure: two reverse-engineering pipelines (http://sts.bioengr.uic.edu/causal/). Inputs: steady-state profiles of perturbations of the network; expression data representing state-transition measurements for wild-type and perturbation data. Both pipelines pass through a hitting set (set cover) step, generalized to set multicover to introduce redundancy. Outputs: topology of the interconnection network; Boolean network.]

Synthesizing and Minimizing Signal Transduction Networks

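Overall Goal. Input: direct interactions (A promotes B, A inhibits B) and double-causal interactions (A promotes or inhibits the interaction between B and C), plus additional information. A FAST method (algorithms, software) turns these into a network of minimal complexity that is biologically relevant.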

Nature of experimental evidence: biochemical evidence (e.g., enzymatic activity, protein-protein interaction) gives a direct interaction; pharmacological evidence gives a double-causal interaction; genetic evidence of differential responses to a stimulus can be direct, but is most often double-causal.

We describe a method for synthesizing double-causal (path-level) information into a consistent network. Our method significantly expands the capability for incorporating indirect (pathway-level) information. Previous methods of synthesizing signal transduction networks only include direct biochemical interactions, and are therefore restricted by the incompleteness of the experimental knowledge on pairwise interactions.

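Direct interactions: "A promotes B" and "A inhibits B", each drawn as a single edge from A to B. Illustration of a double-causal interaction: "C promotes the process of A promoting B" is drawn with a pseudo-vertex placed on the path from A to B, and an edge from C into that pseudo-vertex.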
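One way to hold both kinds of evidence in a single signed graph is sketched below. The representation (edges as `(source, target, sign)` triples with sign +1 for "promotes" and -1 for "inhibits"), the placement of signs around the pseudo-vertex, and every function name are assumptions made for illustration; they are not taken from the NET-SYNTHESIS software.

```python
import itertools


def add_direct(edges, a, b, sign):
    """Record a direct interaction a -> b; sign is +1 (promotes) or -1 (inhibits)."""
    edges.add((a, b, sign))


def add_double_causal(edges, pseudo_vertices, c, a, b, sign_ab, sign_c, counter):
    """Encode 'c affects the process a -> b' through a fresh pseudo-vertex.

    The path a -> pseudo -> b carries the sign of the a-to-b relation on its
    second edge (an assumed convention), and c points into the pseudo-vertex.
    """
    pseudo = f"pseudo_{next(counter)}"
    pseudo_vertices.add(pseudo)
    edges.add((a, pseudo, +1))
    edges.add((pseudo, b, sign_ab))
    edges.add((c, pseudo, sign_c))
    return pseudo


edges, pseudo_vertices = set(), set()
counter = itertools.count()
add_direct(edges, "X", "Y", -1)  # X inhibits Y
# C promotes the process of A promoting B
p = add_double_causal(edges, pseudo_vertices, "C", "A", "B", +1, +1, counter)
```

Keeping pseudo-vertices in their own set matters later, because only they may be merged or removed during simplification.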

Critical edge (known direct interaction, part of input)

Main computational steps for network synthesis: pseudo-vertex collapse (PVC), which is not so hard; binary transitive reduction (BTR), which is hard and needs heuristics.

Pseudo-vertex collapse (PVC): Intuitively, the PVC problem is useful for reducing the pseudo-vertex set to the minimal set that keeps the graph consistent with all indirect experimental observations. If two pseudo-vertices u and v satisfy in(u) = in(v) and out(u) = out(v), they are collapsed into a new pseudo-vertex uv.
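The collapse rule can be sketched in Python as follows. This is a simplified illustration, assuming edges are `(source, target, sign)` triples and that in(u) and out(u) are compared together with their signs; instead of creating a vertex named uv, the sketch keeps one representative of each group, which yields the same graph up to renaming.

```python
def pseudo_vertex_collapse(edges, pseudo_vertices):
    """Merge pseudo-vertices that have identical in- and out-neighbourhoods.

    Returns the reduced edge set and the reduced pseudo-vertex set.
    """
    edges = set(edges)
    pseudo = set(pseudo_vertices)
    changed = True
    while changed:  # repeat, since one merge may enable another
        changed = False
        groups = {}
        for v in sorted(pseudo):
            ins = frozenset((a, s) for a, b, s in edges if b == v)
            outs = frozenset((b, s) for a, b, s in edges if a == v)
            groups.setdefault((ins, outs), []).append(v)
        for members in groups.values():
            if len(members) < 2:
                continue
            drop = set(members[1:])  # keep members[0] as the merged vertex
            edges = {(a, b, s) for a, b, s in edges
                     if a not in drop and b not in drop}
            pseudo -= drop
            changed = True
    return edges, pseudo


example_edges = {
    ("A", "P1", 1), ("P1", "B", 1), ("C", "P1", 1),
    ("A", "P2", 1), ("P2", "B", 1), ("C", "P2", 1),
    ("A", "P3", 1), ("P3", "D", 1),
}
reduced_edges, reduced_pseudo = pseudo_vertex_collapse(
    example_edges, {"P1", "P2", "P3"})
```

In the example, P1 and P2 have the same inputs and outputs and are merged, while P3 is left alone.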

Illustration of Binary Transitive Reduction (BTR): for each edge, ask "remove?". The answer is no if it is a critical edge, and yes if an alternate path exists. Intuitively, the BTR problem is useful for determining the sparsest graph consistent with a set of experimental observations.
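The "remove?" test can be turned into a simple greedy heuristic, sketched here in Python. It is an assumption-laden illustration, not the heuristic used by the authors: edges are `(source, target, sign)` triples, the sign of a path is taken to be the product of its edge signs, and a non-critical edge is dropped only if another path with the same overall sign remains.

```python
from collections import deque


def has_signed_path(edges, source, target, sign):
    """True if some nonempty path source ~> target has the given sign product."""
    seen = set()
    queue = deque([(source, +1)])
    while queue:
        node, acc = queue.popleft()
        for a, b, s in edges:
            if a != node:
                continue
            state = (b, acc * s)
            if state == (target, sign):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def greedy_btr(edges, critical):
    """Drop each non-critical edge that an alternate same-sign path can replace."""
    kept = set(edges)
    for edge in sorted(edges):
        if edge in critical:
            continue  # critical edges are part of the input and must stay
        a, b, s = edge
        trial = kept - {edge}
        if has_signed_path(trial, a, b, s):
            kept = trial
    return kept


triangle = {("A", "B", 1), ("B", "C", 1), ("A", "C", 1)}
sparse = greedy_btr(triangle, critical=set())
protected = greedy_btr(triangle, critical={("A", "C", 1)})
```

The result depends on the order in which edges are tried, which is one reason a greedy pass gives a sparse graph but not necessarily the sparsest one.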

High-level description of the network synthesis process: synthesize direct interactions; optimize (BTR); synthesize double-causal interactions; optimize (PVC, then BTR). Interaction with biologists accompanies the process.

Biological validation of the network synthesis approach

Plant signal transduction network: a consistent guard cell signal transduction network for ABA-induced stomatal closure, manually curated and described in S. Li, S. M. Assmann and R. Albert, Predicting Essential Components of Signal Transduction Networks: A Dynamic Model of Guard Cell Abscisic Acid Signaling, PLoS Biology, 4(10), October 2006. Our input is the list of experimentally observed causal relationships collected by Li et al. and published as Table S1. This table contains around 140 interactions and causal inferences, both of type "A promotes B" and of type "C promotes process (A promotes B)". We augment this list with critical edges drawn from biophysical/biochemical knowledge on enzymatic reactions and ion flows, and with simplifying hypotheses made by Li et al., both described in Text S1.

Arabidopsis thaliana is a small flowering plant that is widely used as a model organism in plant biology. Arabidopsis is a member of the mustard (Brassicaceae) family, which includes cultivated species such as cabbage and radish. Arabidopsis is not of major agronomic significance, but it offers important advantages for basic research in genetics and molecular biology (source: http://www.arabidopsis.org/portals/education/aboutarabidopsis.jsp)

Regulatory interactions between ABA signal transduction pathway components

Regulatory interactions between ABA signal transduction pathway components (continued) [Table residue: ERA1, (ABA, CalM); NO, GC; annotation: not critical and not enzymatic]

Some nodes in the network:
GCR1: putative G protein coupled receptor
OST1: protein
NO: nitric oxide
ABH1: RNA cap-binding protein
RAC1: small GTPase protein

(left) Guard cell signal transduction network for ABA-induced stomatal closure, manually curated by Li, Assmann and Albert [source: PLoS Biology, 4(10), 2006]. (right) Our automated network synthesis procedure produced a reduced network (fewer edges) while preserving all observed pathways [source: DasGupta's group, Journal of Computational Biology and Bioinformatics]

Summary of comparison of the two networks: The network of Li et al. has 54 vertices and 92 edges; our network has 57 vertices but only 84 edges. Both networks have an identical strongly connected component of vertices. All the paths present in Li et al.'s reconstruction are present in our network as well. The two networks have 71 edges in common. It took a few seconds to synthesize our network.

Summary of comparison of the two networks (continued): Thus the two networks are highly similar but diverge on a few edges. None of these discrepancies is due to algorithmic deficiencies; all of them stem from human decisions.

Software is available at: http://www.cs.uic.edu/~dasgupta/network-synthesis/ It runs on any machine with MS Windows (Win32): click, save the executable and run. For Linux/Unix fans, source files for a non-graphic version of the program, which can be compiled and run from the console, can be obtained by sending an email to the authors.

Data sources: Signal transduction pathway repositories such as TRANSPATH (http://www.generegulation.com/pub/databases.html#transpath) and protein interaction databases such as the Search Tool for the Retrieval of Interacting Proteins (http://string.embl.de) contain up to thousands of interactions, a large number of which are not supported by direct physical evidence. NET-SYNTHESIS can be used to filter redundant information while keeping all direct interactions.

Other applications of the software: Synthesizing a Network for T Cell Survival and Death in LGL Leukemia. Background: Large Granular Lymphocytes (LGL) are medium to large size cells with eccentric nuclei and abundant cytoplasm. They comprise 10%~15% of the total peripheral blood mononuclear cells. There are two major lineages: the CD3- natural-killer (NK) cell lineage (~85% of LGL cells) and the CD3+ lineage (~15% of LGL cells).

LGL leukemia disordered clonal expansion of LGL and their invasions in the marrow, spleen and liver

Ras: Background (continued). Ras is a small GTPase essential for controlling multiple essential signaling pathways, and its deregulation is frequently seen in human cancers. Activation of H-Ras requires its farnesylation, which can be blocked by farnesyltransferase inhibitors (FTIs). This makes FTIs candidate drugs for anti-cancer therapies, and several FTIs have entered early phase clinical trials. This observation, together with the finding that Ras is constitutively activated in leukemic LGL cells, leads to the hypothesis that Ras plays an important role in LGL leukemia and may function by influencing the Fas/FasL pathway.

To further understand the molecular mechanism(s) of the onset of LGL leukemia, we constructed the cell-survival/cell-death regulation-related signaling network, with special interest in the effect of Ras on the apoptosis response through the Fas/FasL pathway. Goal: to begin to understand the interactions between the Ras pathway and the Fas/FasL pathway, two of the major pathways that regulate the cell survival/death decision. Currently, there is no standard therapy for LGL leukemia, so understanding the mechanism of this disease is crucial for drug/therapy development. Proteins that modulate the Ras-apoptosis response can potentially serve as future references for drug design and the search for therapeutic target molecules, and this may not be restricted to LGL leukemia.

Synthesizing a Network for T Cell Survival and Death in Large Granular Lymphocyte Leukemia: We synthesized a cell-survival/cell-death regulation-related signaling network from the TRANSPATH 6.0 database, with additional information manually curated from a literature search. The 359 vertices of this network represent proteins/protein families and mRNAs participating in pro-survival and Fas-induced apoptosis pathways. The 1295 edges represent regulatory relationships between nodes, including protein interactions, catalytic reactions and transcriptional regulation (no double-causal interactions were known). Performing BTR with NET-SYNTHESIS reduced the total number of edges to 873.

To focus on pathways that involve the 33 known T-LGL deregulated proteins, we designated vertices that correspond to proteins with no evidence of being changed during T-LGL as pseudo-vertices, and deleted the label Y from those edges both of whose endpoints were pseudo-vertices. Recursively performing the "Reduction (faster)" (BTR) and "Collapse degree-2 pseudonodes" operations of NET-SYNTHESIS, until no edge or node could be removed any further, simplified the network to 267 nodes and 751 edges.

For further results, see R. Zhang, M. V. Shah, J. Yang, S. B. Nyland, X. Liu, J. K. Yun, R. Albert, and T. P. Loughran, Network Model of Survival Signaling in LGL Leukemia PNAS, 2008

Thank you for your attention! Questions?