Parameter estimation using simulated annealing for S- system models of biochemical networks. Orland Gonzalez

Parameter estimation using simulated annealing for S- system models of biochemical networks Orland Gonzalez

Outline S-systems quick review Definition of the problem Simulated annealing Perturbation function Demonstration General techniques Decoupling Structure identification The cadba network in E. coli

Coupled ODEs Power-laws S-systems review Special form of GMAs Production and degradation fluxes are aggregated n: number of dependent variables m: number of independet variable

S-system example Very simple symbolic setup! X 1 production: & + 1 X = α X 1 X 1 degradation: X& 1 = β X 1 h 1 Therefore: g 0 11 10 X g 2 12 & = α g10 g12 X X X 1 1 0 2 β X 1 where: g 10 > 0 since X 1 is produced from X 0 g 12 < 0 since X 1 is inhibited by X 2 where: h 11 > 0 since we expect the current state of X 1 affects its own degradation rate. h 1 11

S-system example Very simple symbolic setup! Whole Network: X& 1 X& 2 = α X 1 = α X 2 g 0 10 g 1 21 X g 2 12 β X β X where α 2 = β 1 and h 11 = g 21 since the amount of X 1 degraded is the amount of X 2 that is produced. 2 1 h 2 22 h 1 11

Outline S-systems quick review Definition of the problem Simulated annealing Perturbation function Demonstration General techniques Decoupling Structure identification The cadba network in E. coli

Parameter Estimation From literature FromK m s and other enzyme data From perturbation experiments From a system in steady state: a single perturbation is made From time series data (biochemical profiles) Inverse problem Curve fitting

Inverse problem ex. Given the network: And the profile: Problem: Find the parameters that fit the data.

Challenges Mathematical redundancy Multiple possible solutions Computational complexity Abundant local minima Extreme computational requirements due to numerical integration An early EXTREME example (Kikuchi et. al. 2003, Bioinformatics) : System with 5 variables Artificial data Cluster of 1040 Pentium-III 933MHz processors Each algorithmic loop took about 10 hours!!!

Outline S-systems quick review Definition of the problem Simulated annealing Perturbation function Demonstration General techniques Decoupling Structure identification The cadba network in E. coli

Simulated annealing Optimization technique roughly analogous to glass and metal creation melting and then slow cooling Properties solutions less fit than the current one can be accepted (probabilistically) can escape from local minima

Pseudo code create initial solution h (random) initialize temperature (melting) slow cooling create new candidate solution h probabilistically accept h lower the temperature (gradually) Error:

2 Aspects of SA Annealing schedule temperature control t αt where α < 1 Perturbation function creation of candidate solution initial form used where c is a constant

Perturbation function Problems does not converge for too large c frequently gets trapped with too small c What we did convert c to a function of the error E

Construction of p-function error of a specific metabolite X i and a specific time t first order approx. of X it given X it-1 substitution yields

...cont d Suppose it is possible to reduce E it to a factor of p < 0 by perturbing a particular g ik by g, this relationship is described by Subtracting this from the orginal relationship, doing algebraic manipulations, shifting to logarithmic space, and isolating g yields

Direct use of equation Problems Computationally expensive Aggregation for each parameter over all possible t How to aggregate? Note: Aggregation actually makes it possible to increase the overall error E.

Indirect use of equation Defines a roughly linear relationship between Note: Consistent with BST theory of linearization in logarithmic space. Perturbation function: where l is externally optimized.

Graph of perturbation function Perturbation magnitudes go down with the error Encourages exploration of promising areas of the solution space The lower E gets, the smaller the average magnitudes are

Demonstration Generate artificial data Pretend we do not know the true parameters. Try to estimate them using the generated artificial data set (4 sets in this example)

Parameter estimation 18 free parameters Initial solution randomly created [0, 90] for rate constants [-2, 2] for kinetic orders (based on general observations) 10 hours on a 1.5GHz Intel Celeron machine

Outline S-systems quick review Definition of the problem Simulated annealing Perturbation function Demonstration General techniques Decoupling Structure identification The cadba network in E. coli

Decoupling / slope estimation Approximate slopes Finite difference, splines, etc... Equate them directly to the ODE s Advantages removes the need for numerical integration runtime dropped to under 2 mins per iteration! Disadvantages additional search bias susceptibility to noise not always applicable

Structure identification Try to identify structure simple due to canonical form of S-systems leave all/most parameters free let optimization determine structure Note: still in infancy but can be useful for limited applications ex. leaving only alternative parameters free (alternative regulations)

...cont d

Outline S-systems quick review Definition of the problem Simulated annealing Perturbation function Demonstration General techniques Decoupling Structure identification The cadba network in E. coli

The cadba network in E. coli Believed to be part of coping mechanism for acidic environments Has regulatory (transcriptional) and metabolic elements

Existing cadba model (CMA) Similarity: We used same data set Difference: We tried to avoid speculating on dynamics of components with no data.

Existing cadba model (CMA)

Problems & handling Data availability CadC is constitutive hard to measure active form bypassed by redirecting regulatory signals from ph and lysine to the equation for the transcript No data for internal lysine and cadaverine decarboxylation and transport activities were coupled

Working model Network with estimated parameters Model fit Uncertain reason (data for lysine measured separately from the rest)

Working model Network with estimated parameters play around: (neutral ph) Model fit

cadba model Challenges left nature of data CadA in arbitrary units (activity) cadba in relative intensity Further work include other players lysp and LysP determine correctness of hypothesized structure

Possible SA refinements Annealing schedule quick convergence heuristics Perturbation function perturbation by parameter subsets probably useful for handling larger problems solution histories momentum terms automatic control direct use of derived equation

Reference Thanks!!!!