Parameter estimation for nonlinear models: Numerical approaches to solving the inverse problem. Lecture 10 03/25/2008. Sven Zenker

2 Review: Multiple Shooting homework, Method of Multipliers:

    function [x, lastlambda] = mom(fh, x0, terminationnorm, maxiter, lb, ub)
    % minimize function f: R^n -> R s.t. h: R^n -> R^m = 0 using Method of
    % Multipliers; fh must have the signature
    % function [fx, gradfx, hx, jachx] = fh(x)
    % typical values: terminationnorm = 1E-5 (terminate when constraint
    % violation < than this), maxiter = 25
    beta = 5; % factor by which to increase c in each iteration
    % first evaluation, find out dimensions
    [fx, gradfx, hx, jachx] = fh(x0);
    % --- initialization ---
    dimh = length(hx);
    c = 1;
    lambda = zeros(dimh, 1); % 0 as initial guess for Lagrange multiplier
    lastlambda = lambda;
    x = x0;
    iter = 0;
    opts = optimset('Display', 'iter', 'GradObj', 'on', 'MaxIter', 50);
    % --- main loop ---
    while norm(hx) > terminationnorm && iter < maxiter
        if isempty(lb) && isempty(ub)
            % minimize augmented Lagrangian with current parameter values
            [x, fx] = fminunc(@(thex) L(thex, c, lambda), x, opts);
        else
            % same, subject to the bounds lb <= x <= ub
            [x, fx] = fmincon(@(thex) L(thex, c, lambda), x, [], [], [], [], lb, ub, [], opts);
        end
        [fx, jacfx, hx, jachx] = fh(x); % find constraint values
        disp(sprintf('iteration %d: c=%d, f(x) = %d, norm(h(x)) = %d', iter, c, fx, norm(hx)));
        lastlambda = lambda;
        lambda = lambda + c*hx; % Method of Multipliers Lagrange multiplier update
        c = beta * c; % increase penalty weight
        iter = iter+1;
    end

        % --- objective function ---
        % augmented Lagrangian with quadratic penalty term; nested so that it can call fh
        function [Lx, gradLx] = L(x, c, lambda)
            [fx, gradfx, hx, jachx] = fh(x);
            Lx = fx + lambda' * hx + c/2*hx'*hx;
            gradLx = gradfx + (lambda' * jachx)' + c * jachx' * hx;
        end
    end
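
For readers who want to run the Method of Multipliers outside MATLAB, here is a minimal Python sketch (standard library only) that mirrors the structure of `mom`. It is an illustration under assumptions, not the homework solution: steepest descent with Armijo backtracking stands in for `fminunc`, bound constraints (the `fmincon` branch) are not handled, the helper names `minimize_gd` and `toy` and the test problem are hypothetical, and it returns the updated multiplier rather than `lastlambda`.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def minimize_gd(fun, x, tol=1e-5, maxiter=10000):
    """Steepest descent with Armijo backtracking (a stand-in for fminunc).

    fun(x) returns (value, gradient). Stops when the gradient norm drops below tol,
    when backtracking finds no acceptable step, or after maxiter iterations.
    """
    fx, g = fun(x)
    for _ in range(maxiter):
        gg = sum(gi * gi for gi in g)
        if math.sqrt(gg) < tol:
            break
        step = 1.0
        while step > 1e-20:
            xn = [xi - step * gi for xi, gi in zip(x, g)]
            fn, gn = fun(xn)
            if fn <= fx - 1e-4 * step * gg:   # sufficient-decrease (Armijo) test
                break
            step *= 0.5
        else:
            break                             # no acceptable step found
        x, fx, g = xn, fn, gn
    return x


def mom(fh, x0, terminationnorm=1e-5, maxiter=25, beta=5.0):
    """Method of Multipliers: minimize f(x) subject to h(x) = 0.

    fh(x) returns (f, grad_f, h, jac_h), with jac_h a list of rows (one per constraint).
    """
    fx, gradfx, hx, jachx = fh(x0)      # first evaluation, find out dimensions
    c = 1.0
    lam = [0.0] * len(hx)               # 0 as initial guess for the Lagrange multipliers
    x = list(x0)
    it = 0

    def L(z, c, lam):
        """Augmented Lagrangian f + lam'h + c/2 h'h and its gradient."""
        f, gf, h, jh = fh(z)
        val = f + sum(lm * hi for lm, hi in zip(lam, h)) + 0.5 * c * sum(hi * hi for hi in h)
        grad = list(gf)
        for row, lm, hi in zip(jh, lam, h):   # grad f + sum_i (lam_i + c h_i) grad h_i
            for j, dh in enumerate(row):
                grad[j] += (lm + c * hi) * dh
        return val, grad

    while norm(hx) > terminationnorm and it < maxiter:
        x = minimize_gd(lambda z: L(z, c, lam), x)      # minimize augmented Lagrangian
        fx, gradfx, hx, jachx = fh(x)                   # find constraint values
        lam = [lm + c * hi for lm, hi in zip(lam, hx)]  # multiplier update
        c *= beta                                       # increase penalty weight
        it += 1
    return x, lam


def toy(x):
    """Hypothetical test problem: f = x1^2 + x2^2, h = x1 + x2 - 1."""
    return (x[0] ** 2 + x[1] ** 2, [2 * x[0], 2 * x[1]],
            [x[0] + x[1] - 1.0], [[1.0, 1.0]])


x_opt, lam_opt = mom(toy, [0.0, 0.0])
print(x_opt, lam_opt)   # approximately [0.5, 0.5] and [-1.0]
```

For the toy problem the exact solution is $x^* = (0.5, 0.5)$ with multiplier $\lambda^* = -1$ (from $\nabla f + \lambda \nabla h = 0$), so the result can be checked by hand.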

6 Multiple shooting: initialization and solution

    function [opty0, optpars] = multishoot(tdata, data, odesol, p0, plower, pupper, nodeindices)
    if nodeindices(1) ~= 1
        nodeindices = [1; nodeindices]; % first interval must start at the first observation
    end
    numnodes = length(nodeindices);
    numdim = size(data, 1); % data in column vectors; since we assume a fully and directly observed system, equal to solution dimension
    numobs = size(data, 2);
    numpars = length(p0);
    % initialize initial guesses for initial conditions from the data at the nodes
    ics = zeros(numdim, numnodes);
    for i = 1:length(nodeindices)
        ics(:, i) = data(:, nodeindices(i));
    end
    % create initial guess vector
    x0 = [reshape(ics, [numnodes*numdim 1]); p0];
    % and run method of multipliers on this...
    % bounds: node initial conditions unconstrained, parameters in [plower, pupper] (e.g. positive)
    lb = [ones(numnodes*numdim, 1) * -Inf; plower];
    ub = [ones(numnodes*numdim, 1) * Inf; pupper];
    [x, lambda] = mom(@msobjfunction, x0, 1E-3, 25, lb, ub);
    opty0 = x(1:numdim); % initial conditions for first interval = overall initial conditions
    optpars = x(numdim*numnodes+1:end); % parameters
    % msobjfunction (next two slides) is nested here so that it shares
    % tdata, data, odesol, nodeindices, numnodes, numdim, numobs, numpars

7 Multiple shooting: objective function (1)

    function [fx, gradfx, hx, jachx] = msobjfunction(x)
    cics = reshape(x(1:numnodes*numdim), [numdim numnodes]); % extract initial conditions
    cp = x(numnodes*numdim+1:end); % and current parameter values
    % preallocate results
    vx = zeros(numdim, numobs); % residuals, for now in array format, will reshape later
    jacvx = zeros(numobs, numdim, numnodes*numdim + numpars); % Jacobian, will rearrange at the end
    hx = zeros((numnodes-1)*numdim, 1); % one constraint deviation for each interior node
    jachx = zeros((numnodes-1)*numdim, numnodes*numdim + numpars); % depends on everything...
    for cnode = 1:numnodes % run over all nodes
        if cnode < numnodes % all but last
            inds = nodeindices(cnode):nodeindices(cnode+1);
        else
            inds = nodeindices(cnode):length(tdata);
        end
        % get solution at observation times including next node
        [sol, jacsol] = odesol(tdata(inds), cics(:, cnode), cp);
        % columns of x this interval depends on: its own initial conditions and the parameters
        cols = [(cnode-1)*numdim+1:cnode*numdim, numnodes*numdim+1:length(x)];
        % deviation; assume solution is arranged by MATLAB solver convention, i.e., ntimes x ndim
        vx(:, inds(1:end-1)) = sol(1:end-1, :)' - data(:, inds(1:end-1));
        % Jacobian; assume it is arranged by sens_analysis convention, i.e., ntimes x ndim x pars
        jacvx(inds(1:end-1), :, cols) = jacsol(1:end-1, :, :);
        % now the constraints
        if cnode ~= numnodes % only for intervals which have a following interval
            rows = (cnode-1)*numdim+1:cnode*numdim;
            % deviation of shared point with next interval from initial condition of next interval
            hx(rows) = sol(end, :)' - x(cnode*numdim+1:(cnode+1)*numdim);
            jachx(rows, cols) = squeeze(jacsol(end, :, :));
            % effect of initial conditions of next interval on this constraint deviation
            jachx(rows, cnode*numdim+1:(cnode+1)*numdim) = -eye(numdim);
        else % last interval, final point matters
            vx(:, inds(end)) = sol(end, :)' - data(:, inds(end));
            jacvx(inds(end), :, cols) = jacsol(end, :, :);
        end
    end

8 Multiple shooting: objective function (2)

    % now reshape to obtain column vector of residuals
    % now realign everything, slow but sure version...
    ind = 1;
    vxn = zeros(numdim*numobs, 1);
    jacvxn = zeros(numdim*numobs, numnodes*numdim + numpars);
    for ii = 1:numobs
        for jj = 1:numdim
            vxn(ind) = vx(jj, ii);
            jacvxn(ind, :) = reshape(jacvx(ii, jj, :), 1, []);
            ind = ind+1;
        end
    end
    vx = vxn;
    jacvx = jacvxn;
    % compute squared residuals and their gradient
    fx = 1/2 * vx' * vx;
    gradfx = jacvx' * vx;
    end % msobjfunction
    end % multishoot
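
The bookkeeping in `msobjfunction` (which observations each interval owns, and which continuity constraint it produces) can be checked on a problem small enough to verify by hand. The following Python sketch is an assumption-laden toy, not the lecture's code: a scalar ODE $y' = -p\,y$ whose exact solution replaces `odesol`, 0-based indices instead of MATLAB's 1-based ones, no sensitivities (Jacobians), and hypothetical names `ms_residuals` and `objective`.

```python
import math


def odesol(times, y0, p):
    """Hypothetical stand-in for the ODE solver: exact solution of y' = -p*y
    started from y0 at times[0], evaluated at every entry of times."""
    t0 = times[0]
    return [y0 * math.exp(-p * (t - t0)) for t in times]


def ms_residuals(tdata, data, nodeindices, node_ics, p):
    """Residuals and continuity gaps for a scalar multiple-shooting problem.

    nodeindices: 0-based start index in tdata of each interval (first entry 0).
    node_ics:    one initial-condition value per node (the extra unknowns).
    """
    residuals = [0.0] * len(tdata)
    gaps = []                                   # one constraint value per interior node
    numnodes = len(nodeindices)
    for k, start in enumerate(nodeindices):
        last = (k == numnodes - 1)
        stop = len(tdata) - 1 if last else nodeindices[k + 1]
        inds = list(range(start, stop + 1))     # observation times incl. the next node
        sol = odesol([tdata[i] for i in inds], node_ics[k], p)
        owned = inds if last else inds[:-1]     # shared end point belongs to the next interval
        for j, i in enumerate(owned):
            residuals[i] = sol[j] - data[i]
        if not last:
            gaps.append(sol[-1] - node_ics[k + 1])   # end of trajectory vs next node's IC
    return residuals, gaps


def objective(residuals):
    """Half the sum of squared residuals, as in fx = 1/2 * vx' * vx."""
    return 0.5 * sum(r * r for r in residuals)


# Noise-free synthetic data (assumption): y(t) = 2 exp(-0.8 t), two intervals
tdata = [0.5 * i for i in range(7)]
data = [2.0 * math.exp(-0.8 * t) for t in tdata]
nodes = [0, 3]
r, h = ms_residuals(tdata, data, nodes, [data[0], data[3]], 0.8)
print(objective(r), h)     # both essentially zero
r2, h2 = ms_residuals(tdata, data, nodes, [data[0], 1.0], 0.8)
print(objective(r2), h2)   # nonzero: wrong guess for the second node's IC
```

A wrong node initial condition shows up twice, as misfit in its own interval and as a nonzero continuity gap for the preceding interval, which is exactly what the Method of Multipliers then drives to zero.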

9 Function handles, numerical integration in MATLAB, etc.

10 Review Lecture 9 Probability density function: For our purposes, a way of describing a probability distribution by a function of the vectors of possible values, $f_X: \mathbb{R}^n \to \mathbb{R}_{\ge 0}$, such that $P(X \in M) = \int_M f_X(x)\,dx$ for every (measurable) set $M \subseteq \mathbb{R}^n$.
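
As a quick numerical illustration of $P(X \in M) = \int_M f_X(x)\,dx$ in one dimension, the sketch below (Python, standard library; the Gaussian density and the midpoint-rule helper `prob_interval` are choices made for this example) integrates a standard normal PDF over $M = [-1, 1]$ and compares with the closed-form value from the error function.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian probability density function."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def prob_interval(pdf, a, b, n=10000):
    """P(a <= X <= b): integral of the PDF over M = [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(pdf(a + (k + 0.5) * h) for k in range(n))


p = prob_interval(normal_pdf, -1.0, 1.0)
exact = math.erf(1.0 / math.sqrt(2.0))          # Phi(1) - Phi(-1) for a standard normal
print(p, exact)                                 # both about 0.6827
print(prob_interval(normal_pdf, -10.0, 10.0))   # total probability, about 1
```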

11 Review Lecture 9 Marginal and conditional distributions Given a set $\{X_1, \dots, X_n\}$ of random variables, one can compute the probability densities for the marginal distribution of a subset of these variables indexed by a set $S$ of indices in $\{1, \dots, n\}$ as $f_{X_S}(x_S) = \int f_{X_1,\dots,X_n}(x_1, \dots, x_n) \prod_{j \notin S} dx_j$. Conditional probability in general is defined as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. For continuous random variables $X$ and $Y$ described by a joint PDF $f_{X,Y}(x, y)$, we have the following relationship between joint PDF, marginal PDFs, and conditional PDFs: $f_{X,Y}(x, y) = f_{X|Y}(x \mid y)\, f_Y(y) = f_{Y|X}(y \mid x)\, f_X(x)$, yielding $f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{f_Y(y)}$ (Bayes' theorem for PDFs). (Caveat: limits and metric; this is a sketch.)
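
The marginalization formula and $f_{Y|X} = f_{X,Y}/f_X$ can be checked numerically for a case with known answers. This sketch assumes a standard bivariate normal with correlation $\rho = 0.6$ (an arbitrary choice), whose $X$-marginal is $N(0, 1)$ and whose conditional $Y \mid X = x$ is $N(\rho x, 1 - \rho^2)$; the infinite integral is truncated to $[-10, 10]$ and evaluated by the midpoint rule.

```python
import math

RHO = 0.6  # correlation coefficient chosen for the example


def joint_pdf(x, y, rho=RHO):
    """Standard bivariate normal density f_{X,Y}(x, y) with correlation rho."""
    q = (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(1 - rho * rho))


def marginal_x(x, lo=-10.0, hi=10.0, n=4000):
    """f_X(x): integrate y out of the joint density (midpoint rule on [lo, hi])."""
    h = (hi - lo) / n
    return h * sum(joint_pdf(x, lo + (k + 0.5) * h) for k in range(n))


def conditional_y_given_x(y, x):
    """f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)."""
    return joint_pdf(x, y) / marginal_x(x)


x0, y0 = 0.7, 0.2
s = math.sqrt(1 - RHO ** 2)
marg_exact = math.exp(-0.5 * x0 ** 2) / math.sqrt(2 * math.pi)                     # N(0, 1)
cond_exact = math.exp(-0.5 * ((y0 - RHO * x0) / s) ** 2) / (math.sqrt(2 * math.pi) * s)
print(marginal_x(x0), marg_exact)
print(conditional_y_given_x(y0, x0), cond_exact)
```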

12 Transformation of random variables Consider the probability distribution of a random variable $X$ described by a PDF $f_X(x)$ defined on $\mathbb{R}^n$. How can we find the PDF $f_Y(y)$ of a new random variable $Y$ that we arrive at by applying an invertible function $T: \mathbb{R}^n \to \mathbb{R}^n$, $y = T(x)$, to the original one? Consider the probability of $y$ being in some subset $S$ of $\mathbb{R}^n$, $P(Y \in S) = \int_S f_Y(y)\,dy$, which we could compute if we knew $f_Y$. Since $T$ is invertible, we can express the above integral in terms of $f_X(x)$ as follows: $P(Y \in S) = \int_S f_Y(y)\,dy = \int_{T^{-1}(S)} f_X(x)\,dx$, and change variables to $y$ (with $x = T^{-1}(y)$) to obtain $P(Y \in S) = \int_S f_X(T^{-1}(y))\,\lvert\det DT^{-1}(y)\rvert\,dy$. Since this holds for every $S$, we see that $f_Y(y) = f_X(T^{-1}(y))\,\lvert\det DT^{-1}(y)\rvert$.
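
A one-dimensional check of the change-of-variables formula, assuming $X \sim N(0, 1)$ and $T(x) = e^x$ (so $T^{-1}(y) = \ln y$ and $\lvert dT^{-1}/dy \rvert = 1/y$): the probability of $S = [1, 3]$ computed from $f_Y$ on $S$ should match the probability of $T^{-1}(S) = [0, \ln 3]$ computed from $f_X$. The distribution, map, and helper names are illustrative choices.

```python
import math


def f_X(x):
    """Standard normal PDF of X."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def f_Y(y):
    """PDF of Y = T(X) = exp(X): f_X(T^{-1}(y)) * |d T^{-1}(y)/dy| = f_X(ln y) / y."""
    return f_X(math.log(y)) / y if y > 0 else 0.0


def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


p_y = integrate(f_Y, 1.0, 3.0)              # P(Y in S), S = [1, 3]
p_x = integrate(f_X, 0.0, math.log(3.0))    # P(X in T^{-1}(S)), T^{-1}(S) = [0, ln 3]
print(p_y, p_x)                             # agree to many digits
```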

13 Expected value For a random variable $X$ described by a probability density function $f_X(x)$, the expected value is $E(X) = \int x\, f_X(x)\, dx$ (discrete gambling example).
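
The same quadrature idea gives expected values directly from the definition. A minimal sketch, assuming an exponential distribution with rate 2 (mean 0.5) and truncating its support to $[0, 20]$, where the neglected tail mass is about $e^{-40}$:

```python
import math

RATE = 2.0  # rate of the example exponential distribution (mean 1 / RATE)


def exp_pdf(x):
    """Exponential density RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def expected_value(pdf, a, b, n=100000):
    """E(X) = integral of x * f_X(x) dx, midpoint rule on the truncated support [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += x * pdf(x)
    return h * total


mean_est = expected_value(exp_pdf, 0.0, 20.0)
print(mean_est)   # close to 1 / RATE = 0.5
```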

14 Sources of uncertainty in the forward and inverse problems, a more complete picture [Diagram] Forward problem: a single state vector (system states and parameters, the quantitative representation of the system) is mapped by the mathematical model of the system to a prediction; measurement error and model stochasticity (if present) introduce uncertainty, so the result is a probability density function on measurement space. Inverse problem: a single measurement vector (the measurement results) is mapped by inference to a probability density function on state and parameter space; measurement error, model stochasticity, and ill-posedness introduce uncertainty. Further diagram labels: Interpretation, Observation.

15 Bayesian inference for continuous variables Recall that $f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{f_Y(y)}$, while we also have $f_Y(y) = \int f_{X,Y}(x, y)\, dx = \int f_{Y|X}(y \mid x)\, f_X(x)\, dx$, so that $f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int f_{Y|X}(y \mid x)\, f_X(x)\, dx}$. $x$ and $y$ can be vector valued, as well. So far, this is just a statement about conditional probability density functions (with the corresponding caveats...). The idea in Bayesian inference is now to use this in a setting where we observe some data living in $y$ space and are interested in the distribution of parameters of some model living in $x$ space, conditional on these observations. The conditional probability density function $f_{Y|X}(y \mid x)$ is called the likelihood (and has been the object of our maximizing attempts so far...).
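
The posterior formula can be evaluated directly on a grid when $x$ is one-dimensional, which makes the role of the denominator (the normalizing integral) concrete. This Python sketch uses hypothetical data, a Gaussian likelihood with known $\sigma$, and a Gaussian prior $N(0, \tau_0^2)$, chosen because the posterior mean then has the closed form $\mu_n = \tau_n^2 \sum_i y_i / \sigma^2$ with $\tau_n^2 = (1/\tau_0^2 + n/\sigma^2)^{-1}$ for comparison.

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


ys = [1.2, 0.8, 1.5, 1.1]   # hypothetical observations y_i ~ N(theta, sigma^2)
sigma, tau0 = 1.0, 2.0      # known noise level; prior theta ~ N(0, tau0^2)


def likelihood(theta):
    """f_{Y|X}(y | theta) for independent Gaussian observations."""
    p = 1.0
    for y in ys:
        p *= normal_pdf(y, theta, sigma)
    return p


def prior(theta):
    """f_X(theta)."""
    return normal_pdf(theta, 0.0, tau0)


lo, hi, n = -10.0, 10.0, 20000
h = (hi - lo) / n
grid = [lo + (k + 0.5) * h for k in range(n)]
unnorm = [likelihood(t) * prior(t) for t in grid]   # numerator of Bayes' theorem
evidence = h * sum(unnorm)                          # denominator: integral over theta
post = [u / evidence for u in unnorm]               # posterior density on the grid
post_mean = h * sum(t * p for t, p in zip(grid, post))

tau_n2 = 1.0 / (1.0 / tau0 ** 2 + len(ys) / sigma ** 2)   # conjugate closed form
mu_n = tau_n2 * sum(ys) / sigma ** 2
print(post_mean, mu_n)   # agree closely
```

A grid like this is only feasible in one or two dimensions; the number of grid points grows exponentially with the dimension of $x$.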

16 Bayesian inference for continuous variables $f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int f_{Y|X}(y \mid x)\, f_X(x)\, dx}$ Likelihood: $f_{Y|X}(y \mid x)$

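17 Example If the measurement errors for $n$ measurements predictable by a model $M$ are assumed to be independently and normally distributed, one could set up a likelihood function like this: $f_{Y|X}(y \mid x) = L(x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(y_i - M_i(x))^2}{2\sigma_i^2}\right)$, where $M_i(x)$ is the model prediction for measurement $i$ and $\sigma_i$ its standard deviation.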
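
In practice one usually evaluates the logarithm of this product, since a product of many small Gaussian factors underflows quickly. A minimal Python sketch of $\log L(x)$, assuming a user-supplied function `model(x, t)` in the role of $M_i(x)$ (a hypothetical interface) and a linear toy model for the usage lines:

```python
import math


def log_likelihood(x, ys, ts, sigmas, model):
    """log L(x) = sum_i [ -log(sqrt(2 pi) sigma_i) - (y_i - M_i(x))^2 / (2 sigma_i^2) ].

    model(x, t) plays the role of M_i(x): the prediction for the measurement taken at t.
    """
    total = 0.0
    for y, t, s in zip(ys, ts, sigmas):
        r = y - model(x, t)
        total += -math.log(math.sqrt(2 * math.pi) * s) - r * r / (2 * s * s)
    return total


def line_model(x, t):
    """Hypothetical one-parameter model M(x) evaluated at time t."""
    return x * t


ts = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # noise-free data generated with x = 2
sigmas = [0.5, 0.5, 0.5]
ll_true = log_likelihood(2.0, ys, ts, sigmas, line_model)
ll_off = log_likelihood(2.1, ys, ts, sigmas, line_model)
print(ll_true, ll_off)        # the first is larger (zero residuals)
```

Maximizing this log-likelihood with fixed $\sigma_i$ is equivalent to minimizing the weighted sum of squared residuals, which connects it to the least-squares objectives used earlier.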

18 Bayesian inference for continuous variables $f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int f_{Y|X}(y \mid x)\, f_X(x)\, dx}$ Probability density function of the posterior distribution: $f_{X|Y}(x \mid y)$

19 Bayesian inference for continuous variables $f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int f_{Y|X}(y \mid x)\, f_X(x)\, dx}$ Probability density function of the prior distribution: $f_X(x)$

20 Bayesian inference The underlying idea in Bayesian statistics is to identify probabilities with (subjective) degrees of belief in uncertain events. This conflicts with the more restrictive viewpoint of the frequentist philosophy, which accepts probabilities only as the relative frequency of occurrence of an event in a well-defined random experiment. The extent to which this quantification of degree of belief is subjective is a matter of some debate.

21 Bayesian inference A key issue where the subjectivity problem manifests itself is the selection of prior distributions. In particular, a key question is how a total lack of information about the distribution of parameters can be represented.

22 Prior distributions This question may seem innocent at first, but is in fact rather tricky, and to my knowledge no true consensus exists at this point.

23 Prior distributions In the Bayesian spirit, priors can be used to implement the modeler's belief (hopefully based on domain expertise) about the distribution of parameters, e.g., along the lines of "All values are equally probable" (uniform distribution on some interval, otherwise improper), "The probability of each decade in the parameter range is equal" (hyperbolic), Gaussian, etc.

24 Prior distributions and reparametrization It is crucial to recognize that the shape of a prior distribution and the specific parametrization of the model are linked. Consider, for example, a model $y = M(x)$ of a single parameter $x$. Let's assume that our domain expertise leads us to believe that all values in the interval $[1, 2]$ are equally likely, i.e., the prior is $f_X(x) = \begin{cases} 1 & \text{if } 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$. Now consider a reparameterization of the model, e.g., by logarithmically transforming the independent variable: $\hat{x}(x) = \ln x$, $x(\hat{x}) = \exp(\hat{x})$, $\hat{M}(\hat{x}) := M(x(\hat{x}))$. What does our prior distribution look like for $\hat{X}$? $f_{\hat{X}}(\hat{x}) = f_X(x(\hat{x}))\,\left\lvert \frac{dx}{d\hat{x}} \right\rvert = f_X(\exp(\hat{x}))\exp(\hat{x}) = \begin{cases} \exp(\hat{x}) & \text{if } 0 \le \hat{x} \le \ln 2 \\ 0 & \text{otherwise} \end{cases}$
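
The effect of the reparametrization is easy to see by sampling. Assuming the uniform prior on $[1, 2]$ from above, this sketch draws samples of $X$, transforms them to $\hat{x} = \ln x$, and compares the fraction falling in the lower half of $[0, \ln 2]$ with $\int_0^{\ln 2 / 2} e^{\hat{x}}\,d\hat{x} = \sqrt{2} - 1 \approx 0.414$; a uniform prior on $\hat{x}$ would give 0.5. Sample size and seed are arbitrary.

```python
import math
import random

random.seed(0)
n = 200000
# Samples from the uniform prior on [1, 2], transformed to x_hat = ln x
xhat = [math.log(random.uniform(1.0, 2.0)) for _ in range(n)]

# Fraction in the lower half of [0, ln 2]; theory: sqrt(2) - 1 (about 0.414), not 0.5
frac = sum(1 for v in xhat if v <= math.log(2.0) / 2) / n
print(frac, math.sqrt(2.0) - 1.0)
```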

25 Prior distributions and reparametrization Conversely, if we were to assume a uniform prior density on, e.g., $[0, \ln 2]$ for the logarithmically transformed variable, the corresponding PDF for the original variable would be $f_X(x) = f_{\hat{X}}(\ln x)\,\frac{1}{x} = \begin{cases} \frac{1}{x \ln 2} & \text{if } 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$. This kind of hyperbolic prior (uniform on the logarithmically transformed variable) can be viewed as assigning equal probability to each decade of the parameter, since $P(a \le X \le ka) \propto \int_a^{ka} \frac{1}{x}\,dx = \ln(ka) - \ln a = \ln k$, independent of $a$.
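
A sampling view of the hyperbolic prior: drawing uniformly on the log scale and exponentiating produces samples with density $1/(x \ln(b/a))$ on $[a, b]$. The range $[1, 100]$ (two decades) is an assumption chosen so that "each decade" is meaningful; on $[1, 2]$ the same calculation applies to any factor $k$.

```python
import math
import random

random.seed(1)
a, b, n = 1.0, 100.0, 200000   # two decades, chosen for the example
# Uniform on [ln a, ln b], mapped back with exp: density 1 / (x ln(b/a)) on [a, b]
xs = [math.exp(random.uniform(math.log(a), math.log(b))) for _ in range(n)]
first_decade = sum(1 for x in xs if x <= 10.0) / n
print(first_decade, 1.0 - first_decade)   # both close to 0.5


def prob(lo, hi):
    """P(lo <= X <= hi) under the normalized hyperbolic prior: ln(hi/lo) / ln(b/a)."""
    return math.log(hi / lo) / math.log(b / a)


print(prob(1.0, 10.0), prob(10.0, 100.0))  # identical: only the ratio hi/lo matters
```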

26 Priors Invariance arguments can be brought into play to derive prior distributions that are claimed to be as uninformative as possible. The derivations are somewhat technical, and we will not go into detail here. Well-known examples include the Jeffreys prior for parametrized families of probability distributions and the so-called reference priors, each of which is not without issues (and may be expensive to compute).

27 Priors from a practical perspective If actual prior information is available, one should try to incorporate it. One needs to be aware of the interrelationship of model parametrization and the shape of the prior distribution. If the phenomena modeled are well understood, a canonical parametrization may be obvious on which the choice of, e.g., a uniform prior is physically meaningful. If sufficient data is available, the effect of the prior may be small. If insufficient data is available, the prior will dominate, that is, the inference results will primarily depend on the choice of the prior. Experimentation with different priors may (and should) reveal to what extent the conclusions drawn depend on the choice of the prior.
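
The points about sufficient versus insufficient data can be illustrated with the conjugate Gaussian model, where the posterior mean is available in closed form. The data (all observations equal to 1) and the two priors, $N(0, 1)$ and $N(5, 1)$, are hypothetical; with these choices the gap between the two posterior means works out to $5/(n+1)$.

```python
def posterior_mean(ys, sigma, mu0, tau0):
    """Posterior mean of theta for y_i ~ N(theta, sigma^2), prior theta ~ N(mu0, tau0^2)."""
    precision = 1.0 / tau0 ** 2 + len(ys) / sigma ** 2
    return (mu0 / tau0 ** 2 + sum(ys) / sigma ** 2) / precision


for n in (1, 10, 1000):
    ys = [1.0] * n                                   # hypothetical observations
    m_a = posterior_mean(ys, sigma=1.0, mu0=0.0, tau0=1.0)
    m_b = posterior_mean(ys, sigma=1.0, mu0=5.0, tau0=1.0)
    print(n, round(m_a, 4), round(m_b, 4), round(m_b - m_a, 4))   # gap shrinks with n
```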

28 Sampling to tackle high-dimensional problems Full evaluation or analysis of functions of the posterior density in high dimensions is intractable, since it involves high-dimensional integrals (e.g., a 1-D marginal requires computing an (n-1)-D volume integral, an expectation requires an n-D volume integral, and so on and so forth).

29 Sampling to tackle high-dimensional problems A way out: sample-based approximation... If we can obtain a set of samples $\{X_1, \dots, X_n\}$ from the posterior distribution $\pi(x)$, then $E(f(X)) = \int f(x)\, \pi(x)\, dx \approx \frac{1}{n} \sum_{i=1}^{n} f(X_i)$.
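
A minimal sketch of the sample-based estimate, assuming we can already draw independent samples (here from a standard normal standing in for the posterior, so that the exact answer $E(X^2) = 1$ is known). With independent samples the error decreases like $1/\sqrt{n}$ at a rate that does not depend on the dimension of $x$ (the constant, the standard deviation of $f(X)$, does); producing such samples from a posterior is the hard part.

```python
import random


def mc_expectation(f, sampler, n):
    """Approximate E(f(X)) by (1/n) * sum_i f(X_i), with X_i drawn by sampler()."""
    return sum(f(sampler()) for _ in range(n)) / n


random.seed(42)
# Stand-in "posterior": a standard normal, for which E(X^2) = 1 exactly
est = mc_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0), 100000)
print(est)   # close to 1; standard error about sqrt(2 / 100000), roughly 0.0045
```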

30 What will therefore occupy us in the future: $f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int f_{Y|X}(y \mid x)\, f_X(x)\, dx}$ How to sample from such a distribution given an implementation of the likelihood and the prior.

31 Assignment No. 8
1) Implement an (unnormalized) likelihood function corresponding to an arbitrary number of independent observations with Gaussian measurement noise for the unforced van der Pol oscillator, and plot the likelihood as a function of $\mu$ on [0.05, 5] for the following scenarios, using the parameters and initial conditions from homework no. 1 unless stated otherwise. Describe your observations. (Hint: it of course makes sense to implement a generic plotting routine that will handle all cases and then run through the various combinations programmatically.)
a) 5, 10, and 20 measurements of both states simultaneously, with measurements perturbed by additive Gaussian noise with standard deviations of 0.5, 1, and 2. Vary the actual additive noise in the measurements and the standard deviation you are using to compute your likelihood function independently a few times to observe their respective effects, but use the same values for both for the overall exploration.
b) Perform the same experiments as in a), but with observations of only the 1st and only the 2nd state, respectively (for a total of 27 plots; as mentioned previously, you may wish to automate this).
2) Modify your plotting routine from 1) to plot the likelihood as a function of $\mu \in [0.05, 3]$ and the initial condition for state 1 $\in [0, 4]$, using the surfc plotting function, and rerun the 9 scenarios where only state 2 is observed. Describe your observations.
