Parameter estimation using simulated annealing for S-system models of biochemical networks. Orland Gonzalez

Outline
  S-systems quick review
  Definition of the problem
  Simulated annealing
  Perturbation function
  Demonstration
  General techniques
  Decoupling
  Structure identification
  The cadBA network in E. coli

S-systems review
  Coupled ODEs
  Power laws
  Special form of GMAs (generalized mass action models)
  Production and degradation fluxes are aggregated
  n: number of dependent variables
  m: number of independent variables
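
For reference, the general S-system form that these points describe is usually written as below. The slide's own equation is not in the transcript, so this is the standard form from Biochemical Systems Theory rather than a copy of the slide. Numbering the independent variables as X_{n+1}, ..., X_{n+m} is an assumption; the example that follows labels its independent variable X_0 instead.

```latex
\dot{X}_i \;=\; \alpha_i \prod_{j=1}^{n+m} X_j^{\,g_{ij}} \;-\; \beta_i \prod_{j=1}^{n+m} X_j^{\,h_{ij}},
\qquad i = 1, \dots, n
```

Here \alpha_i and \beta_i are non-negative rate constants, and g_{ij} and h_{ij} are real-valued kinetic orders for the aggregated production and degradation terms.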

S-system example
Very simple symbolic setup!
  X_1 production: $\dot{X}_1^{+} = \alpha_1 X_0^{g_{10}} X_2^{g_{12}}$
  X_1 degradation: $\dot{X}_1^{-} = \beta_1 X_1^{h_{11}}$
  Therefore: $\dot{X}_1 = \alpha_1 X_0^{g_{10}} X_2^{g_{12}} - \beta_1 X_1^{h_{11}}$
  where:
    $g_{10} > 0$ since X_1 is produced from X_0
    $g_{12} < 0$ since X_1 is inhibited by X_2
    $h_{11} > 0$ since we expect the current state of X_1 to affect its own degradation rate

S-system example
Very simple symbolic setup!
  Whole network:
    $\dot{X}_1 = \alpha_1 X_0^{g_{10}} X_2^{g_{12}} - \beta_1 X_1^{h_{11}}$
    $\dot{X}_2 = \alpha_2 X_1^{g_{21}} - \beta_2 X_2^{h_{22}}$
  where $\alpha_2 = \beta_1$ and $h_{11} = g_{21}$, since the amount of X_1 degraded is the amount of X_2 that is produced.
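
To make the two-equation network concrete, the following is a minimal simulation sketch in plain Python. The parameter values, the initial state, and the choice of a fixed-step fourth-order Runge-Kutta integrator are all illustrative assumptions, not values from the slides; the only constraints taken from the slide are alpha2 = beta1 and h11 = g21.

```python
def s_system_rhs(x1, x2, x0, p):
    """Right-hand side of the two-variable S-system example."""
    dx1 = p["alpha1"] * x0 ** p["g10"] * x2 ** p["g12"] - p["beta1"] * x1 ** p["h11"]
    dx2 = p["alpha2"] * x1 ** p["g21"] - p["beta2"] * x2 ** p["h22"]
    return dx1, dx2


def simulate(x1, x2, x0, p, dt=0.01, steps=3000):
    """Integrate with classical fourth-order Runge-Kutta; returns a list of (x1, x2)."""
    traj = [(x1, x2)]
    for _ in range(steps):
        a1, a2 = s_system_rhs(x1, x2, x0, p)
        b1, b2 = s_system_rhs(x1 + 0.5 * dt * a1, x2 + 0.5 * dt * a2, x0, p)
        c1, c2 = s_system_rhs(x1 + 0.5 * dt * b1, x2 + 0.5 * dt * b2, x0, p)
        d1, d2 = s_system_rhs(x1 + dt * c1, x2 + dt * c2, x0, p)
        x1 += dt * (a1 + 2 * b1 + 2 * c1 + d1) / 6
        x2 += dt * (a2 + 2 * b2 + 2 * c2 + d2) / 6
        traj.append((x1, x2))
    return traj


# Hypothetical parameter values; alpha2 == beta1 and h11 == g21 as on the slide.
params = {"alpha1": 2.0, "g10": 0.5, "g12": -0.5, "beta1": 1.0, "h11": 0.5,
          "alpha2": 1.0, "g21": 0.5, "beta2": 1.0, "h22": 0.5}
trajectory = simulate(x1=1.0, x2=1.0, x0=1.0, p=params)
```

With these particular values the system settles toward the steady state X_1 = X_2 = 2, which can be checked by setting both right-hand sides to zero. Concentrations must stay positive for the fractional powers to be defined.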

Parameter estimation
  From literature: K_m values and other enzyme data
  From perturbation experiments: starting from a system in steady state, a single perturbation is made
  From time series data (biochemical profiles): inverse problem, curve fitting

Inverse problem example
  Given the network (figure not included in the transcript)
  and the profile (figure not included in the transcript),
  Problem: find the parameters that fit the data.
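
"Fitting the data" needs a scalar measure of mismatch between a simulated profile and the measured one. The sketch below uses a plain sum of squared errors; this is an assumed, commonly used choice, since the error formula from the slides is not in the transcript. The function name and the sample numbers are hypothetical.

```python
def sum_squared_error(simulated, observed):
    """Sum of squared differences over all time points and all variables."""
    if len(simulated) != len(observed):
        raise ValueError("profiles must have the same number of time points")
    return sum((s - o) ** 2
               for sim_row, obs_row in zip(simulated, observed)
               for s, o in zip(sim_row, obs_row))


# Made-up profiles: one (X1, X2) pair per time point.
observed = [(1.0, 1.0), (1.5, 1.1), (1.8, 1.4)]
simulated = [(1.0, 1.0), (1.4, 1.2), (1.9, 1.4)]
error = sum_squared_error(simulated, observed)
```

The parameter search then amounts to minimizing this number over the rate constants and kinetic orders.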

Challenges
  Mathematical redundancy: multiple possible solutions
  Computational complexity: abundant local minima; extreme computational requirements due to numerical integration
  An early extreme example (Kikuchi et al. 2003, Bioinformatics):
    system with 5 variables
    artificial data
    cluster of 1040 Pentium-III 933 MHz processors
    each algorithmic loop took about 10 hours!

Simulated annealing
  Optimization technique roughly analogous to glass and metal production: melting, then slow cooling
  Properties:
    solutions less fit than the current one can be accepted (probabilistically)
    can therefore escape from local minima

Pseudo code
  create initial solution h (random)
  initialize temperature (melting)
  slow cooling:
    create new candidate solution h'
    probabilistically accept h'
    lower the temperature (gradually)
  Error: (formula not included in the transcript)
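
The pseudo code above can be sketched in Python roughly as follows. The slide only says "probabilistically accept"; the Metropolis rule used here (accept a worse candidate with probability exp(-delta / t)) is the usual choice and is an assumption, as are all default values, the toy error function, and the toy perturbation.

```python
import math
import random


def simulated_annealing(error, perturb, initial, t_start=1.0, t_end=1e-3,
                        cooling=0.95, moves_per_temp=50, rng=None):
    """Generic annealing loop; returns the best solution seen and its error."""
    rng = rng or random.Random()
    current, current_err = initial, error(initial)
    best, best_err = current, current_err
    t = t_start                                  # "melting": start hot
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = perturb(current, rng)    # create candidate h'
            cand_err = error(candidate)
            delta = cand_err - current_err
            # Assumed Metropolis rule: always accept improvements,
            # accept worse candidates with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_err = candidate, cand_err
                if current_err < best_err:
                    best, best_err = current, current_err
        t *= cooling                             # lower the temperature gradually
    return best, best_err


# Toy problem (hypothetical): minimum of a quadratic bowl at (3, -1).
def toy_error(h):
    return (h[0] - 3.0) ** 2 + (h[1] + 1.0) ** 2


def toy_perturb(h, rng):
    return [v + rng.gauss(0.0, 0.1) for v in h]


best, best_err = simulated_annealing(toy_error, toy_perturb, [0.0, 0.0],
                                     rng=random.Random(0))
```

For an S-system fit, `error` would integrate the model and compare it with the data, which is why each evaluation is expensive.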

2 aspects of SA
  Annealing schedule: temperature control, $t \leftarrow \alpha t$ where $\alpha < 1$
  Perturbation function: creation of candidate solution; initial form used (equation not included in the transcript), where c is a constant
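
As a small worked example of the geometric schedule, the temperature after k cooling steps and the number of steps needed to reach a chosen final temperature follow directly from the update rule. The numerical values of t_0, t_min, and \alpha below are illustrative assumptions, not settings from the slides.

```latex
t_k = \alpha^{k}\, t_0
\quad\Longrightarrow\quad
K = \left\lceil \frac{\ln\!\left(t_{\min}/t_0\right)}{\ln \alpha} \right\rceil

% Example: t_0 = 1,\; t_{\min} = 10^{-3},\; \alpha = 0.95
K = \left\lceil \frac{\ln 10^{-3}}{\ln 0.95} \right\rceil
  = \left\lceil \frac{-6.908}{-0.0513} \right\rceil
  = \lceil 134.7 \rceil = 135
```

A value of \alpha closer to 1 cools more slowly and therefore costs more error evaluations.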

Perturbation function
  Problems:
    does not converge if c is too large
    frequently gets trapped if c is too small
  What we did: convert c to a function of the error E

Construction of p-function
  error of a specific metabolite X_i and a specific time t
  first-order approximation of X_{i,t} given X_{i,t-1}
  substitution yields (equations not included in the transcript)

...cont'd
  Suppose it is possible to reduce E_{it} to a factor of p < 0 by perturbing a particular g_{ik} by Δg; this relationship is described by (equation not included in the transcript).
  Subtracting this from the original relationship, doing algebraic manipulations, shifting to logarithmic space, and isolating Δg yields (equation not included in the transcript).

Direct use of equation
  Problems:
    computationally expensive
    aggregation for each parameter over all possible t
    how to aggregate?
  Note: aggregation actually makes it possible to increase the overall error E.

Indirect use of equation
  Defines a roughly linear relationship between (quantities not included in the transcript)
  Note: consistent with the BST theory of linearization in logarithmic space.
  Perturbation function: (equation not included in the transcript), where l is externally optimized.

Graph of perturbation function
  Perturbation magnitudes go down with the error
  Encourages exploration of promising areas of the solution space
  The lower E gets, the smaller the average magnitudes are
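
The behaviour described above, with step sizes shrinking as the error shrinks, can be illustrated with a short sketch. The actual perturbation function from the slides is not in the transcript, so the form used here (a Gaussian step whose standard deviation is `scale * log(1 + E)`) is a hypothetical stand-in and not the author's formula; `scale` loosely plays the role of an externally tuned constant.

```python
import math
import random


def perturb(params, error, scale=0.1, rng=None):
    """Return a copy of params with one randomly chosen entry perturbed.

    Hypothetical form: the step's standard deviation is scale * log(1 + error),
    so perturbations shrink toward zero as the error approaches zero.
    """
    rng = rng or random.Random()
    sigma = scale * math.log1p(error)
    candidate = list(params)
    k = rng.randrange(len(candidate))      # pick one parameter to change
    candidate[k] += rng.gauss(0.0, sigma)
    return candidate


rng = random.Random(1)
base = [1.0, 2.0, 3.0]
far_from_fit = perturb(base, error=10.0, rng=rng)    # larger typical step
near_fit = perturb(base, error=0.01, rng=rng)        # much smaller typical step
```

Tying the step size to E keeps early moves exploratory and late moves local without a separately scheduled step size.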

Demonstration
  Generate artificial data.
  Pretend we do not know the true parameters.
  Try to estimate them using the generated artificial data set (4 sets in this example).

Parameter estimation
  18 free parameters
  Initial solution randomly created:
    [0, 90] for rate constants
    [-2, 2] for kinetic orders (based on general observations)
  10 hours on a 1.5 GHz Intel Celeron machine
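
Creating the random initial solution within the stated ranges might look like the sketch below. Uniform sampling is an assumption (the slide only says "randomly created"), and the split of the 18 free parameters into 8 rate constants and 10 kinetic orders is purely hypothetical, chosen so the counts add up.

```python
import random

RATE_CONSTANT_RANGE = (0.0, 90.0)    # range given on the slide
KINETIC_ORDER_RANGE = (-2.0, 2.0)    # range given on the slide


def random_initial_solution(n_rate_constants, n_kinetic_orders, rng=None):
    """Draw every parameter uniformly from its allowed range (assumed distribution)."""
    rng = rng or random.Random()
    rates = [rng.uniform(*RATE_CONSTANT_RANGE) for _ in range(n_rate_constants)]
    orders = [rng.uniform(*KINETIC_ORDER_RANGE) for _ in range(n_kinetic_orders)]
    return rates, orders


# Hypothetical 8 + 10 split of the 18 free parameters.
rates, orders = random_initial_solution(8, 10, rng=random.Random(42))
```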

Decoupling / slope estimation
  Approximate the slopes (finite differences, splines, etc.)
  Equate them directly to the ODEs
  Advantages:
    removes the need for numerical integration
    runtime dropped to under 2 minutes per iteration!
  Disadvantages:
    additional search bias
    susceptibility to noise
    not always applicable
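
A minimal sketch of the decoupling idea for the X_2 equation of the earlier example: slopes are estimated from the data by central finite differences and compared directly with the right-hand side, so no ODE integration is needed. The synthetic data, parameter values, and function names are assumptions made for illustration.

```python
import math


def central_slopes(times, values):
    """Central finite-difference slope estimates at the interior time points."""
    return [(values[k + 1] - values[k - 1]) / (times[k + 1] - times[k - 1])
            for k in range(1, len(values) - 1)]


def decoupled_error_x2(times, x1, x2, alpha2, g21, beta2, h22):
    """Squared mismatch between estimated slopes of X2 and its S-system equation."""
    slopes = central_slopes(times, x2)
    total = 0.0
    for k, slope in zip(range(1, len(times) - 1), slopes):
        predicted = alpha2 * x1[k] ** g21 - beta2 * x2[k] ** h22
        total += (slope - predicted) ** 2
    return total


# Synthetic, noise-free data: X1 held at 4, so dX2/dt = 1 * 4**0.5 - 1 * X2 = 2 - X2,
# whose exact solution from X2(0) = 1 is X2(t) = 2 - exp(-t).
times = [0.01 * k for k in range(200)]
x1 = [4.0] * len(times)
x2 = [2.0 - math.exp(-t) for t in times]

err_true = decoupled_error_x2(times, x1, x2, alpha2=1.0, g21=0.5, beta2=1.0, h22=1.0)
err_wrong = decoupled_error_x2(times, x1, x2, alpha2=1.5, g21=0.5, beta2=1.0, h22=1.0)
```

Each equation can be fitted separately this way. With noisy measurements the finite-difference slopes degrade quickly, which is the noise susceptibility listed among the disadvantages.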

Structure identification
  Try to identify structure:
    simple due to the canonical form of S-systems
    leave all or most parameters free
    let the optimization determine the structure
  Note: still in its infancy, but can be useful for limited applications,
  e.g. leaving only alternative parameters free (alternative regulations)

The cadBA network in E. coli
  Believed to be part of the coping mechanism for acidic environments
  Has regulatory (transcriptional) and metabolic elements

Existing cadBA model (CMA)
  Similarity: we used the same data set
  Difference: we tried to avoid speculating on the dynamics of components with no data

Problems & handling
  Data availability:
    CadC is constitutive and its active form is hard to measure
      bypassed by redirecting the regulatory signals from pH and lysine to the equation for the transcript
    No data for internal lysine and cadaverine
      decarboxylation and transport activities were coupled

Working model
  Network with estimated parameters (figure not included in the transcript)
  Model fit (figure not included in the transcript)
  Uncertain reason (data for lysine measured separately from the rest)

Working model
  Network with estimated parameters
  Play around: (neutral pH)
  Model fit (figure not included in the transcript)

cadBA model
  Challenges left: nature of the data
    CadA in arbitrary units (activity)
    cadBA in relative intensity
  Further work:
    include other players (lysP and LysP)
    determine correctness of the hypothesized structure

Possible SA refinements
  Annealing schedule:
    quick convergence heuristics
  Perturbation function:
    perturbation by parameter subsets (probably useful for handling larger problems)
    solution histories
    momentum terms
    automatic control
    direct use of derived equation

Thanks!