BioControl - Week 3, Lecture 1

William Gordon

Elisa Franco, Caltech

Goals of this lecture
- Background on system identification
  - Fitting models
  - Selecting models

Suggested readings
- L. Ljung, System Identification: Theory for the User, Prentice-Hall

Need for system identification in biology
Example: MAPK signaling downstream of epidermal growth factor receptors
- Kinase cascade: phosphorylation, stochasticity, unknown dynamics
- Downstream phosphorylation of many different proteins
- Stimulates/represses gene expression
How can we validate a model? What can we measure? Steady-state vs. dynamic measurements?

System identification perspective
- Data: $d = \{d_1, \dots, d_N\}$
- Model: $d = M(\theta)$
- Unknown parameters: $\theta = \{\theta_1, \dots, \theta_M\}$

If $\theta$ is independent of time, the problem is parametric identification. If $\theta(t)$ depends on time and the data are $d = \{d(t_1), \dots, d(t_N)\}$, estimating $\theta(t)$ is called:
- prediction, for $t > t_N$ (e.g., predict tomorrow's traffic on the 405 based on historical data);
- filtering, for $t = t_N$ (e.g., the current average velocity on the 405 between Wilshire and Santa Monica Blvd);
- smoothing, for $t_1 < t < t_N$ (e.g., yesterday's average velocity distribution on the 405).

Core elements of system identification
- Data: the source of information; partial and noisy; sometimes experiments can be designed.
- Model class (the choice depends on the questions we want to answer!):
  - linear or nonlinear;
  - grey box (first principles) or black box (input/output);
  - parametric or nonparametric (functions);
  - $\mathcal{M} = \{M(\theta, d) : \theta \in \Theta \subset \mathbb{R}^Q\}$.
- Model selection criterion (the estimator): within the model class, choose the model that best fits the data according to a certain performance measure, $\hat\theta = \arg\min_{\theta \in \Theta} J(\theta)$.
- Validation criterion: depends on the model purpose (convergence, error variance, consistency with respect to new data, ...).

Core elements of system identification
Elements of the identification procedure: the system, experiment design, data, prior information, model selection, model parameter estimation, and model validation on separate validation data.
Most identification procedures consider a class of models that are linear, discrete-time, lumped-parameter, and single-output.

Estimators
Data are noisy, so estimated quantities are random variables. A good estimator should yield:
- Unbiased estimates: the distribution of the estimates is centered on the true value, $E[\hat\theta] = \theta_0$.
- Minimum variance: between two estimators, pick the one whose estimates have the smaller variance ($V_1 < V_2$).
- Estimates that converge in mean square: the more data we add to our set, the smaller the variance of the estimator output should be ($V_3 < V_2 < V_1$ for data set sizes $N_3 > N_2 > N_1$).
[Figure: distributions of the estimates for two estimators $\theta_1$, $\theta_2$, and for data sets of increasing size $N_1 < N_2 < N_3$.]
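To make the unbiasedness and convergence properties concrete, here is a small Monte Carlo sketch (the Gaussian data, true variance $\sigma^2 = 4$, and trial counts are arbitrary assumptions). It compares two estimators of a variance: dividing by $N$ gives a biased estimate, dividing by $N - 1$ gives an unbiased one, and the spread of the unbiased estimate shrinks as $N$ grows.

```python
import numpy as np

def variance_estimates(n, trials, rng, sigma=2.0):
    """Draw `trials` data sets of size n from N(0, sigma^2) and apply
    two estimators of sigma^2 to each data set."""
    d = rng.normal(0.0, sigma, size=(trials, n))
    biased = d.var(axis=1, ddof=0)    # divides by n:   E = (n-1)/n * sigma^2
    unbiased = d.var(axis=1, ddof=1)  # divides by n-1: E = sigma^2
    return biased, unbiased

rng = np.random.default_rng(0)
sigma2 = 4.0  # true variance (sigma = 2)
results = {}
for n in (5, 50, 500):
    b, u = variance_estimates(n, 5000, rng)
    results[n] = (b.mean(), u.mean(), u.var())
    print(f"N={n:3d}  mean(biased)={b.mean():.3f}  "
          f"mean(unbiased)={u.mean():.3f}  var(unbiased)={u.var():.4f}")
```

For small $N$ the biased estimator is visibly off ($E = 3.2$ for $N = 5$), while the variance of both estimators decreases with $N$.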

Limits to the precision of estimation
The variance of any estimator is limited by the statistical properties of the data source! Let $p(d, \theta)$ be the probability density function of the data, given a certain value of the parameters, and define the Fisher Information Matrix (FIM):
$$I_F(\theta) \doteq -E\left[\frac{\partial^2 \ln p(d, \theta)}{\partial \theta^2}\right], \qquad I^F_{ij}(\theta) \doteq -E\left[\frac{\partial^2 \ln p(d, \theta)}{\partial \theta_i \, \partial \theta_j}\right].$$
It can be proved that for any unbiased estimator $\hat\theta$ we pick (Cramér-Rao inequality):
$$\mathrm{Var}[\hat\theta] \ge (I_F)^{-1}.$$
Could we improve the FIM by experimental design?
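A sketch of the Cramér-Rao bound in the simplest case, assuming $N$ i.i.d. samples of $\mathcal{N}(\theta, \sigma^2)$ with known $\sigma$: then $\ln p(d,\theta) = \text{const} - \sum_i (d_i - \theta)^2 / (2\sigma^2)$, so $I_F = N/\sigma^2$ and any unbiased estimator of $\theta$ has variance at least $\sigma^2/N$. The simulation (values chosen arbitrarily) checks that the sample mean reaches this bound while the sample median, also unbiased here, does not.

```python
import numpy as np

sigma, n, trials = 1.0, 100, 20000
theta0 = 3.0                              # true mean (assumed for the simulation)
rng = np.random.default_rng(1)
d = rng.normal(theta0, sigma, size=(trials, n))

# n i.i.d. samples of N(theta, sigma^2):
# d^2 ln p / d theta^2 = -n / sigma^2  ->  I_F = n / sigma^2
fisher = n / sigma**2
crlb = 1.0 / fisher

var_mean = d.mean(axis=1).var()           # sample mean: unbiased
var_median = np.median(d, axis=1).var()   # sample median: unbiased for symmetric noise

print(f"CRLB={crlb:.5f}  Var[mean]={var_mean:.5f}  Var[median]={var_median:.5f}")
```

The median's variance comes out roughly $\pi/2$ times the bound, an example of the "minimum variance" criterion separating two unbiased estimators.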

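Example 1: parametric identification
Least squares (linear regression) has an analytical solution. Given the data $y(t)$, $t = 1, \dots, N$, and for each $t$ the regressor $u(t) = [u_1(t), \dots, u_M(t)]$ (typically an overdetermined data set), find $\theta = [\theta_1, \dots, \theta_M]^T$ such that $y(t) = u(t)\theta$.
Define the error $e(t) \doteq y(t) - u(t)\theta$, and stack the data into $Y$ and $U$ (row $t$ of $U$ is $u(t)$).
Model estimation criterion:
$$\min_\theta J(\theta) = \frac{1}{2}\sum_{t=1}^{N} e^2(t) = \frac{1}{2}(Y - U\theta)^T (Y - U\theta).$$
Minimum of the cost: $\frac{\partial J}{\partial \theta} = 0 \;\Rightarrow\; U^T(Y - U\theta) = 0 \;\Rightarrow\; \hat\theta = (U^T U)^{-1} U^T Y$, where $(U^T U)^{-1} U^T$ is the Moore-Penrose pseudoinverse of $U$ (for $U$ with full column rank).
Note: the model needs to be linear in the parameters, not necessarily in the independent variables! Example: fitting the function $f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2$ to data $d(x)$.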
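A minimal sketch of the quadratic fit $f(x) = \alpha_0 + \alpha_1 x + \alpha_2 x^2$, assuming NumPy and made-up coefficients and noise level. It checks that the normal equations, the pseudoinverse, and NumPy's `lstsq` solver give the same $\hat\theta$, and that the residual is orthogonal to the regressors.

```python
import numpy as np

def fit_least_squares(U, Y):
    """Minimize 0.5 * ||Y - U theta||^2. np.linalg.lstsq uses an SVD-based
    solver, numerically safer than forming (U^T U)^{-1} explicitly."""
    theta, *_ = np.linalg.lstsq(U, Y, rcond=None)
    return theta

rng = np.random.default_rng(2)
x = np.linspace(-2.0, 2.0, 41)
alpha_true = np.array([1.0, -0.5, 0.25])            # hypothetical true coefficients
d = alpha_true[0] + alpha_true[1] * x + alpha_true[2] * x**2 + rng.normal(0.0, 0.05, x.size)

# Regressors are nonlinear in x, but the model is linear in alpha.
U = np.column_stack([np.ones_like(x), x, x**2])
alpha_hat = fit_least_squares(U, d)
alpha_normal = np.linalg.solve(U.T @ U, U.T @ d)    # normal equations U^T (Y - U theta) = 0
alpha_pinv = np.linalg.pinv(U) @ d                  # Moore-Penrose pseudoinverse
print(alpha_hat)
```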

Example 2: Maximum Likelihood estimators
Given a likelihood function $L(\theta) = p(d, \theta)$, select $\theta^*$ such that $L(\theta^*) \ge L(\theta)$ for any possible $\theta$.
Recall the least squares example, now with added zero-mean Gaussian noise: $y = U\theta + v$, $v \sim G(0, V)$. Then
$$p(y, \theta) = \frac{1}{\sqrt{(2\pi)^N \det(V)}}\, e^{-\frac{1}{2}(y - U\theta)^T V^{-1} (y - U\theta)}.$$
Maximizing the likelihood function is equivalent to minimizing its negative log, which here reduces to
$$\min_\theta\, (y - U\theta)^T V^{-1} (y - U\theta).$$
We recover the weighted least squares solution, $\hat\theta = (U^T V^{-1} U)^{-1} U^T V^{-1} y$.
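A sketch of weighted least squares as a maximum likelihood estimate, assuming a diagonal $V$ (heteroscedastic noise; noise levels and parameters are invented for illustration). `neg_log_likelihood` is $-\ln p(y, \theta)$ for the Gaussian model above, so the weighted estimate should minimize it; ordinary least squares, which ignores $V$, is included for comparison.

```python
import numpy as np

def weighted_least_squares(U, y, V):
    """ML estimate for y = U theta + v, v ~ N(0, V):
    theta = (U^T V^{-1} U)^{-1} U^T V^{-1} y, with V^{-1} applied via solves."""
    Vinv_U = np.linalg.solve(V, U)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(U.T @ Vinv_U, U.T @ Vinv_y)

def neg_log_likelihood(theta, U, y, V):
    """-ln p(y, theta) for the Gaussian model."""
    r = y - U @ theta
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * (y.size * np.log(2.0 * np.pi) + logdet + r @ np.linalg.solve(V, r))

rng = np.random.default_rng(3)
N = 50
t = np.linspace(0.0, 1.0, N)
U = np.column_stack([np.ones(N), t])
theta_true = np.array([0.5, 2.0])                  # hypothetical parameters
std = np.where(t < 0.5, 0.05, 0.5)                 # hypothetical: noisier second half
V = np.diag(std**2)
y = U @ theta_true + rng.normal(0.0, std)

theta_ml = weighted_least_squares(U, y, V)
theta_ols = np.linalg.lstsq(U, y, rcond=None)[0]   # ignores V
print(theta_ml, theta_ols)
```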

Example 3: Kalman filter
Suppose we want to track an object in space, but we can only measure its position, and the measurement is noisy. Can we estimate its velocity? (An observability condition is required.)
Abstract problem formulation: given a dynamical system with partially measurable states and zero-mean Gaussian disturbances, we want to find the best linear estimator of $x$:
$$\dot x = Ax + Bu + R_v v, \qquad y = Cx + R_w w,$$
where $v$ and $w$ have covariances $V$ and $W$.
Estimator (prediction + correction): $\dot{\hat x} = A\hat x + Bu + L(y - C\hat x)$, with $L = ?$
The system is linear, so the mean will be zero or driven by the input $u$. We want to minimize the error covariance: with $e(t) = x - \hat x$,
$$\min E[e(t)e^T(t)] = \min P(t).$$
Since $\dot e = (A - LC)e + R_v v - L R_w w$,
$$\dot P = (A - LC)P + P(A - LC)^T + R_v V R_v^T + L R_w W R_w^T L^T.$$
The error covariance is minimized by the Kalman linear gain $L = P C^T (R_w W R_w^T)^{-1}$.
The KF can also be used to estimate time-varying parameters!
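A sketch of the position/velocity tracking example. Note the swap: this uses the standard discrete-time Kalman filter (predict with $A$, correct with gain $L = P^- C^T (C P^- C^T + R)^{-1}$) rather than the continuous-time equations above; the sampling time, the noise covariances `Q` and `R`, and the true velocity are assumptions. Only position is measured, yet the velocity estimate converges because the pair $(A, C)$ is observable.

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # state x = [position, velocity]
C = np.array([[1.0, 0.0]])              # only position is measured
Q = 1e-4 * np.eye(2)                    # process-noise covariance (assumed)
R = np.array([[1.0]])                   # measurement-noise covariance (assumed)

def kalman_step(x_hat, P, y):
    # Prediction
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + Q
    # Correction with gain L = P C^T (C P C^T + R)^{-1}
    S = C @ P_pred @ C.T + R
    L = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + L @ (y - C @ x_pred)
    P_new = (np.eye(2) - L @ C) @ P_pred
    return x_new, P_new

rng = np.random.default_rng(4)
true_velocity = 2.0
x_true = np.array([0.0, true_velocity])
x_hat = np.zeros(2)                     # start with no knowledge of the velocity
P = 10.0 * np.eye(2)
for k in range(500):
    x_true = A @ x_true
    y = C @ x_true + rng.normal(0.0, 1.0, size=1)   # noisy position measurement
    x_hat, P = kalman_step(x_hat, P, y)

print("estimated velocity:", x_hat[1], "variance:", P[1, 1])
```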

Identification in biology
Most identification procedures consider a class of models that are linear, discrete-time, lumped-parameter, and often single-output. However:
- most biological processes are nonlinear;
- the class of models is uncertain;
- the number of measurable quantities is limited.
In the context of biology, identification almost coincides with:
- off-line parameter estimation;
- model selection.
Objectives:
- gain insight into the system;
- simulation-aided design of experiments;
- biomolecular programming (good identification allows redesign of pathways).

Nonlinear parameter estimation
Given a set of data, calibrate the model to reproduce the experimental results as well as possible. Most often, this falls in the Nonlinear Programming (NLP) class of optimization problems. Given the data $d(t)$, define $e(\theta, t) \doteq d(t) - y(\theta, t)$.
NLP problem:
$$\min_\theta J(\theta, T) = \int_{t_0}^{T} e(\theta, t)^T W(t)\, e(\theta, t)\, dt$$
subject to:
- dynamic constraints: $\dot x = f(x, y, \theta, \psi, t)$, $x(t_0) = x_0$;
- trajectory constraints: $h(x, y, \theta, \psi) = 0$, $g(x, y, \theta, \psi) \le 0$;
- parameter constraints: $\theta^L \le \theta \le \theta^U$.
Every local minimum of an NLP problem is guaranteed to be a global minimum only when the cost functional and the feasible set are convex (Boyd SP and Vandenberghe L, Convex Optimization, Cambridge University Press; Bertsekas D, Nonlinear Programming, Athena Scientific). In practical cases, they are not. Numerical methods used to solve NLP problems must carefully handle local minima: simple gradient methods are not enough, because they stop at the first local minimum they reach.
Reference: Moles CG, Mendes P, Banga JR. Parameter Estimation in Biochemical Pathways: A Comparison of Global Optimization Methods. Genome Research, Nov 1; 13(11).
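To see why local methods are not enough, the sketch below fits the frequency $\theta$ of a hypothetical model $y(\theta, t) = \sin(\theta t)$ to noiseless data generated with $\theta = 2$, using $J(\theta) = \int_0^{10} e^2\,dt$ ($W = 1$, trapezoidal rule). The cost is multimodal in $\theta$. A downhill-only compass search (a simple derivative-free stand-in for a gradient method) started at $\theta = 0.5$ is expected to stop in a local minimum, while restarting it from a grid of initial points (multistart, the simplest global strategy) recovers $\theta = 2$. All names are illustrative.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 2001)
d = np.sin(2.0 * t)                       # hypothetical noiseless data, true theta = 2

def trapezoid(f, x):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def J(theta):
    e = d - np.sin(theta * t)             # e(theta, t) = d(t) - y(theta, t)
    return trapezoid(e * e, t)            # scalar weight W(t) = 1

def local_search(theta0, h=0.05, tol=1e-8):
    """Compass search: accept only downhill moves, halve the step when stuck."""
    theta, best = theta0, J(theta0)
    while h > tol:
        for cand in (theta + h, theta - h):
            val = J(cand)
            if val < best:
                theta, best = cand, val
                break
        else:
            h *= 0.5
    return theta, best

def multistart(starts):
    """Run the local search from every start and keep the lowest cost."""
    return min((local_search(s) for s in starts), key=lambda r: r[1])

theta_local, J_local = local_search(0.5)
theta_best, J_best = multistart(np.linspace(0.15, 3.95, 20))
print(f"single start: theta={theta_local:.3f}, J={J_local:.3f}")
print(f"multistart:   theta={theta_best:.6f}, J={J_best:.2e}")
```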

Parameter estimation: global optimization algorithm classes
Adaptive stochastic methods
1. Treat the independent variables as random variables.
2. Center the distribution of the random variables about the best search point found.
3. Adapt the search steps in the region.
Clustering methods
1. Sample points in the search domain.
2. Transform the sampled points to group them around the local minima.
3. Apply a clustering technique to identify groups that (hopefully) represent neighborhoods of local minima.
=> Minimize redundant local searches.
Genetic algorithms (evolutionary computation)
1. Initialize and evaluate the initial population.
2. Repeat: perform competitive selection; apply genetic operators to generate new solutions; evaluate the solutions in the population; until some convergence criterion is satisfied.
Simulated annealing
The cost function is treated as an energy landscape $E$. Repeat: pick a temperature $T$ and move in the parameter space; if $\Delta E \le 0$, keep the new parameters; if $\Delta E > 0$, keep them with probability $P(\Delta E) = e^{-\Delta E / (k_B T)}$. $T$ is initially large, so uphill jumps are allowed to escape local minima, and then decreases gradually for fine tuning.
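A minimal simulated annealing sketch following the loop above, with $k_B = 1$, a Gaussian move of fixed size, and geometric cooling $T \leftarrow 0.995\,T$ (all assumptions). The energy is a hypothetical 1-D test function with many local minima and global minimum $E(0) = 0$; the run starts far away, at $x = 4.5$.

```python
import math
import random

def energy(x):
    """Hypothetical multimodal test function (1-D Rastrigin): local minima
    near every integer, global minimum E(0) = 0."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

def simulated_annealing(E, x0, T0=10.0, cooling=0.995, n_iter=5000, step=0.5, seed=0):
    rng = random.Random(seed)
    x, e = x0, E(x0)
    best_x, best_e = x, e
    T = T0
    for _ in range(n_iter):
        x_new = x + rng.gauss(0.0, step)          # move in the parameter space
        e_new = E(x_new)
        dE = e_new - e
        # Metropolis rule: always keep downhill moves, uphill with prob exp(-dE/T)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        T *= cooling                              # gradual cooling for fine tuning
    return best_x, best_e

best_x, best_e = simulated_annealing(energy, x0=4.5)
print(f"best x = {best_x:.4f}, E = {best_e:.4f}")
```

Early on, large $T$ lets the walker climb over the barriers between neighboring minima; as $T$ shrinks, the acceptance rule becomes almost purely downhill.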

Model selection
Two candidate models, $\dot x = f_1(x, \theta_1)$ and $\dot x = f_2(x, \theta_2)$, are fitted to the same fluorescence data. Which is the best model?
[Figure: fluorescence vs. time $t$]
We need a tradeoff between accuracy and overfitting.

Akaike Information Criterion (AIC)
"We should not strive for the truth, but for reasonable approximations." (L. Ljung)
The truth: $y = G(z)$. Approximation (model): $M(z|\theta)$.
Kullback-Leibler distance:
$$I(G, M) = \int G(z) \ln \frac{G(z)}{M(z|\theta)}\, dz = \int G(z) \ln G(z)\, dz - \int G(z) \ln M(z|\theta)\, dz.$$
The first term does not depend on the model, so only the second term matters when comparing models.
Akaike approximation to the K-L distance:
$$\mathrm{AIC}(G, M) = -2 \ln L(\hat\theta | y) + 2P,$$
where $\ln L(\hat\theta|y)$ is the maximized log-likelihood function and $P$ is the number of parameters.
Model selection criterion:
$$\min_{M_i \in \mathcal{M}} \mathrm{AIC}(G, M_i) = -2 \ln L(\hat\theta_i | y) + 2P_i.$$
References:
- S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, Vol. 22.
- H. Bozdogan. Akaike's information criterion and recent developments in information complexity. Journal of Mathematical Psychology.
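A sketch of AIC-based selection among nested polynomial models, assuming Gaussian noise with unknown variance, for which the maximized log-likelihood is $\ln L = -\frac{N}{2}\left(\ln\frac{2\pi\,\mathrm{RSS}}{N} + 1\right)$. The residual sum of squares can only decrease as parameters are added, so accuracy alone would always favor the largest model; the $2P$ term penalizes that. The data and the range of degrees are invented.

```python
import numpy as np

def aic_polynomial(x, y, degree):
    """Least-squares polynomial fit and its AIC under Gaussian noise of unknown variance."""
    U = np.vander(x, degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(U, y, rcond=None)
    rss = float(np.sum((y - U @ theta) ** 2))
    n = y.size
    # Maximized log-likelihood with sigma^2 = rss / n
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)
    P = degree + 2   # polynomial coefficients + noise variance
    return -2.0 * log_lik + 2.0 * P, rss

rng = np.random.default_rng(5)
x = np.linspace(-3.0, 3.0, 100)
y = 1.0 - 0.5 * x + x**2 + rng.normal(0.0, 0.5, x.size)   # hypothetical quadratic data

scores = {deg: aic_polynomial(x, y, deg) for deg in range(6)}
for deg, (aic, rss) in scores.items():
    print(f"degree {deg}: RSS={rss:8.3f}  AIC={aic:8.2f}")
best_degree = min(scores, key=lambda deg: scores[deg][0])
print("selected degree:", best_degree)
```

Counting the noise variance in $P$ adds the same constant to every model, so it does not change the ranking.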

AIC: application
1. Fit each model's parameters with simulated annealing.
2. Select the model with the AIC.
[Figure: AIC table]
Dunlop MJ, Franco E, Murray RM, ACC 2007

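Model discrimination: let's compare some papers
Gadkar KG, Gunawan R, Doyle FJ. Iterative approach to model identification of biological networks. BMC Bioinformatics, 6, 2005.
Iterative approach:
- Possible measurements $\eta = [\eta_1, \dots, \eta_P]$; parameters $\theta = [\theta_1, \dots, \theta_N]$.
- Sensitivity matrix $J = \left.\frac{\partial \eta}{\partial \theta}\right|_{\eta = \eta_{meas}}$ and Fisher information matrix $I_F = J^T W J$, with $W$ the inverse of the measurement-noise covariance.
- Identifiability: maximize the set of identifiable parameters $\theta_{id}$.
- Parameter estimation on the discretized model $\dot x = Ax + Br + C$, $r = f(x, \theta)$, via quadratic programming and Bayesian iteration.
- Experiment design: $\max \det(I_F)$ subject to experimental feasibility.
- Model prediction error criterion.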
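To connect the FIM with experiment design, here is a D-optimal design sketch (maximize $\det I_F$) for a hypothetical decay model $\eta(t) = a e^{-kt}$ measured at two times with equal noise. This is a generic illustration, not the algorithm of Gadkar et al.; the parameter values, noise level, and time grid are assumptions. For two samples, $\det I_F \propto (t_2 - t_1)^2 e^{-2k(t_1+t_2)}$, which is maximized at $\{0, 1/k\}$, so the grid search should return $t = 0$ and $t = 2$ for $k = 0.5$.

```python
import itertools
import numpy as np

a, k = 1.0, 0.5      # nominal parameter values (assumed)
sigma = 0.1          # measurement-noise std, equal at every sample (assumed)

def sensitivity(times):
    """J = d eta / d theta for eta(t) = a * exp(-k t), theta = (a, k)."""
    t = np.asarray(times, dtype=float)
    return np.column_stack([np.exp(-k * t), -a * t * np.exp(-k * t)])

def fisher(times):
    """I_F = J^T W J with W = inverse noise covariance = I / sigma^2."""
    J = sensitivity(times)
    W = np.eye(len(times)) / sigma**2
    return J.T @ W @ J

grid = np.linspace(0.0, 10.0, 101)                  # candidate sampling times
best = max(itertools.combinations(grid, 2), key=lambda ts: np.linalg.det(fisher(ts)))
naive = (9.0, 10.0)                                 # hypothetical late-sampling design
print("D-optimal pair:", best)
print("det I_F:", np.linalg.det(fisher(best)), "vs naive:", np.linalg.det(fisher(naive)))
```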

Model discrimination through experimental design: Case 1
Apgar JF, Toettcher JE, Endy D, White FM, Tidor B. Stimulus design for model selection and validation in cell signaling. PLoS Comput Biol, Feb; 4(2): e30.
- Design a time-varying, model-based controller to achieve a desired output.
- The better the model, the smaller the experimental tracking error.
- Mass-action kinetics up to second order are considered; the model is linearized for controller design or gradient-based optimization.
ANY FLAW IN THIS METHOD?


Model discrimination through experimental design: Case 2
Georgiev D and Klavins E. Model Discrimination of Polynomial Systems via Stochastic Inputs. CDC 2008.
Discrete-time models with polynomial state transition functions.
- MDP (model discrimination problem): given a pair of candidate models with the same input and output spaces, find an input, called the disparity certificate, that yields different outputs for all possible disturbances.
- MIP (model invalidation problem): given the inputs and outputs of a series of executed experiments, find which candidate models map those inputs to outputs different from the measured ones for all possible disturbances, i.e., which models are invalidated by the data.
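A toy version of the model discrimination problem, assuming two hypothetical scalar polynomial models $x_{k+1} = f_i(x_k) + u + w_k$ with bounded disturbances $|w_k| \le \delta$ and a constant input $u$. An input is a disparity certificate when the sets of outputs the two models can reach are disjoint, so no disturbance realization can make them agree. This brute-force search with interval bounds is only an illustration, not the method of Georgiev and Klavins; the interval propagation is valid here because both maps are increasing on the states visited.

```python
import numpy as np

DELTA = 0.05   # disturbance bound |w_k| <= DELTA (assumed)
STEPS = 5      # experiment length (assumed)

def f1(x):     # hypothetical candidate model 1
    return 0.5 * x

def f2(x):     # hypothetical candidate model 2 (increasing for x > -5)
    return 0.5 * x + 0.05 * x**2

def reachable_interval(f, u, x0=0.0):
    """Interval of x_K for x_{k+1} = f(x_k) + u + w_k, |w_k| <= DELTA.
    Valid because f is increasing on the visited states."""
    lo = hi = x0
    for _ in range(STEPS):
        lo, hi = f(lo) + u - DELTA, f(hi) + u + DELTA
    return lo, hi

def is_disparity_certificate(u):
    lo1, hi1 = reachable_interval(f1, u)
    lo2, hi2 = reachable_interval(f2, u)
    return hi1 < lo2 or hi2 < lo1      # disjoint output sets

def find_certificate(candidates):
    for u in candidates:
        if is_disparity_certificate(u):
            return u
    return None

u_star = find_certificate(np.linspace(0.0, 2.0, 201))
print("smallest certificate on the grid:", u_star)
```

At $u = 0$ both models stay near zero and their output sets overlap, so a small input cannot discriminate them; a larger input excites the quadratic term enough to separate the outputs.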
