R Package FME : Inverse Modelling, Sensitivity, Monte Carlo Applied to a Steady-State Model

Similar documents
Solving partial differential equations, using R package ReacTran

R-package ReacTran : Reactive Transport Modelling in R

R Package ecolmod: figures and examples from Soetaert and Herman (2009)

Package detestset: testset for initial value problems of differential equations in R

MALA versus Random Walk Metropolis Dootika Vats June 4, 2017

Introduction to SparseGrid

R Package ecolmod: figures and examples from Soetaert and Herman (2009)

Package rootsolve : roots, gradients and steady-states in R

Package diffeq. February 19, 2015

First steps of multivariate data analysis

Hands on cusp package tutorial

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

Density Temp vs Ratio. temp

Reducing The Computational Cost of Bayesian Indoor Positioning Systems

Reminder of some Markov Chain properties:

Package ReacTran. August 16, 2017

Multilevel Statistical Models: 3 rd edition, 2003 Contents

A Re-Introduction to General Linear Models (GLM)

Chapter 5 Exercises 1

Why Bayesian approaches? The average height of a rare plant

Consider fitting a model using ordinary least squares (OLS) regression:

Metric Predicted Variable on Two Groups

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Multiple Regression Analysis

Modeling radiocarbon in the Earth System. Radiocarbon summer school 2012

Bayesian inference for a population growth model of the chytrid fungus Philipp H Boersch-Supan, Sadie J Ryan, and Leah R Johnson September 2016

Dispersion of a flow that passes a dynamic structure. 5*t 5*t. Time

Parameter Selection, Model Calibration, and Uncertainty Propagation for Physical Models

Consistent Downscaling of Seismic Inversions to Cornerpoint Flow Models SPE

Stat 5031 Quadratic Response Surface Methods (QRSM) Sanford Weisberg November 30, 2015

Finite Mixture Model Diagnostics Using Resampling Methods

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

Lecture XI. Approximating the Invariant Distribution

Chapter 8 Conclusion

Doing Bayesian Integrals

Nonstationary time series models

The lmm Package. May 9, Description Some improved procedures for linear mixed models

ST 740: Markov Chain Monte Carlo

Random Walks A&T and F&S 3.1.2

Interpreting Regression Results

Supplementary Note on Bayesian analysis

Introduction to Statistics and R

Package RcppSMC. March 18, 2018

Autonomous Navigation for Flying Robots

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Gov 2000: 9. Regression with Two Independent Variables

Chapter 5 Exercises 1. Data Analysis & Graphics Using R Solutions to Exercises (April 24, 2004)

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Statistical Prediction

Package lmm. R topics documented: March 19, Version 0.4. Date Title Linear mixed models. Author Joseph L. Schafer

Metric Predicted Variable on One Group

Physics 403. Segev BenZvi. Numerical Methods, Maximum Likelihood, and Least Squares. Department of Physics and Astronomy University of Rochester

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

The sbgcop Package. March 9, 2007

Univariate Statistics (One Variable) : the Basics. Linear Regression Assisted by ANOVA Using Excel.

Explore the data. Anja Bråthen Kristoffersen

BSCB 6520: Statistical Computing Homework 4 Due: Friday, December 5

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study

The Bayesian Approach to Multi-equation Econometric Model Estimation

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

Generalized Linear Models in R

An MCMC Package for R

R Output for Linear Models using functions lm(), gls() & glm()

Topic 23: Diagnostics and Remedies

Introduction to Maximum Likelihood Estimation

0.1 poisson.bayes: Bayesian Poisson Regression

Package bayeslm. R topics documented: June 18, Type Package

A quick guide to PLANOR, an R library for the automatic generation of regular fractional factorial designs

Katsuhiro Sugita Faculty of Law and Letters, University of the Ryukyus. Abstract

Markov Chain Monte Carlo

R Package glmm: Likelihood-Based Inference for Generalized Linear Mixed Models

Dynamic linear models (aka state-space models) 1

The evdbayes Package

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Multivariate Normal & Wishart

A Bayesian Approach to Phylogenetics

STAT 425: Introduction to Bayesian Analysis

Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Power analysis examples using R

Advanced numerical methods for nonlinear advectiondiffusion-reaction. Peter Frolkovič, University of Heidelberg

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

PIER HLM Course July 30, 2011 Howard Seltman. Discussion Guide for Bayes and BUGS

Dynamic Programming with Hermite Interpolation

Package esaddle. R topics documented: January 9, 2017

Technical note: Curve fitting with the R Environment for Statistical Computing

1 Computing the Invariant Distribution

Bayesian Inference for DSGE Models. Lawrence J. Christiano

The betareg Package. October 6, 2006

Using R in 200D Luke Sonnet

SOLVING BOUNDARY VALUE PROBLEMS IN THE OPEN SOURCE SOFTWARE R: PACKAGE bvpsolve. Francesca Mazzia, Jeff R. Cash, and Karline Soetaert

Label Switching and Its Simple Solutions for Frequentist Mixture Models

M. H. Gonçalves 1 M. S. Cabral 2 A. Azzalini 3

Lecture 17: Numerical Optimization October 2014

MCMC algorithms for fitting Bayesian models

Riemann Manifold Methods in Bayesian Statistics

Inferring biomarkers for Mycobacterium avium subsp. paratuberculosis infection and disease progression in cattle using experimental data

Bayesian Dynamic Modeling for Space-time Data in R

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Week 7 Multiple factors. Ch , Some miscellaneous parts

Transcription:

R Package FME : Inverse Modelling, Sensitivity, Monte Carlo Applied to a Steady-State Model Karline Soetaert NIOZ Yerseke The Netherlands Abstract Rpackage FME (Soetaert and Petzoldt 2010) contains functions for model calibration, sensitivity, identifiability, and Monte Carlo analysis of nonlinear models. This vignette, (vignette("fmesteady")), applies FME to a partial differential equation, solved with a steady-state solver from package rootsolve A similar vignette (vignette("fmedyna")), applies the functions to a dynamic similation model, solved with integration routines from package desolve A third vignette (vignette("fmeother")), applies the functions to a simple nonlinear model vignette("fmemcmc") tests the Markov chain Monte Carlo (MCMC) implementation Keywords: steady-state models, differential equations, fitting, sensitivity, Monte Carlo, identifiability, R. 1. A steady-state model of oxygen in a marine sediment This is a simple model of oxygen in a marine (submersed) sediment, diffusing along a spatial gradient, with imposed upper boundary concentration oxygen is consumed at maximal fixed rate, and including a monod limitation. See (Soetaert and Herman 2009) for a description of reaction-transport models. The constitutive equations are: O 2 t = Flux x O 2 cons O 2 k s Flux = D O 2 x O 2 (x = 0) = upo2 > par(mfrow=c(2, 2)) > require(fme) First the model parameters are defined... > pars <- c(upo2 = 360, # concentration at upper boundary, mmolo2/m3 cons = 80, # consumption rate, mmolo2/m3/day

2 FME Inverse Modelling, Sensitivity, Monte Carlo with a Steady-State Model ks = 1, # O2 half-saturation ct, mmolo2/m3 D = 1) # diffusion coefficient, cm2/d Next the sediment is vertically subdivided into 100 grid cells, each 0.05 cm thick. > n <- 100 # nr grid points > dx <- 0.05 #cm > dx <- c(dx/2, rep(dx, n-1), dx/2) # dispersion distances; half dx near boundaries > X <- seq(dx/2, len = n, by = dx) # distance from upper interface at middle of box The model function takes as input the parameter values and returns the steady-state condition of oxygen. Function steady.1d from package rootsolve ( (Soetaert 2009)) does this in a very efficient way (see (Soetaert and Herman 2009)). > O2fun <- function(pars) { derivs<-function(t, O2, pars) { with (as.list(pars),{ Flux <- -D* diff(c(upo2, O2, O2[n]))/dX do2 <- -diff(flux)/dx - cons*o2/(o2 ks) return(list(do2, UpFlux = Flux[1], LowFlux = Flux[n1])) }) } # Solve the steady-state conditions of the model ox <- steady.1d(y = runif(n), func = derivs, parms = pars, nspec = 1, positive = TRUE) data.frame(x = X, O2 = ox$y) } The model is run > ox <- O2fun(pars) and the results plotted... > > plot(ox$o2, ox$x, ylim = rev(range(x)), xlab = "mmol/m3", main = "Oxygen", ylab = "depth, cm", type = "l", lwd = 2) 2. Global sensitivity analysis : Sensitivity ranges The sensitivity of the oxygen profile to parameter cons, the consumption rate is estimated. We assume a normally distributed parameter, with mean = 80 (parmean), and a variance=100 (parcovar). The model is run 100 times (num).

Karline Soetaert 3 Oxygen depth, cm 5 4 3 2 1 0 0 50 100 150 200 250 300 350 mmol/m3 Figure 1: The modeled oxygen profile - see text for R-code > print(system.time( Sens2 <- sensrange(parms = pars, func = O2fun, dist = "norm", num = 100, parmean = c(cons = 80), parcovar = 100) )) user system elapsed 1.7 0.0 1.7 The results can be plotted in two ways: > par(mfrow = c(1, 2)) > plot(sens2, xyswap = TRUE, xlab = "O2", ylab = "depth, cm", main = "Sensitivity runs") > plot(summary(sens2), xyswap = TRUE, xlab = "O2", ylab = "depth, cm", main = "Sensitivity ranges") > par(mfrow = c(1, 1)) 3. Local sensitivity analysis : Sensitivity functions Local sensitivity analsysis starts by calculating the sensitivity functions > O2sens <- sensfun(func=o2fun,parms=pars) The summary of these functions gives information about which parameters have the largest effect (univariate sensitivity): > summary(o2sens)

4 FME Inverse Modelling, Sensitivity, Monte Carlo with a Steady-State Model Sensitivity runs depth, cm 5 4 3 2 1 0 0 50 100 150 200 250 300 350 O2 Figure 2: Results of the sensitivity run - left: all model runs, right: summary - see text for R-code value scale L1 L2 Mean Min Max N upo2 360 360 7.0 0.88 7.0 1.0e00 13.4176 100 cons 80 80 8.1 1.14-8.1-2.2e01-0.0084 100 ks 1 1 2.2 0.37 2.2 1.2e-04 9.6137 100 D 1 1 8.1 1.14 8.1 8.4e-03 22.0312 100 In bivariate sensitivity the pair-wise relationship and the correlation is estimated and/or plotted: > pairs(o2sens) > cor(o2sens[,-(1:2)]) upo2 cons ks D upo2 1.0000000-0.9787945 0.8375806 0.9787945 cons -0.9787945 1.0000000-0.9317287-1.0000000 ks 0.8375806-0.9317287 1.0000000 0.9317287 D 0.9787945-1.0000000 0.9317287 1.0000000 Multivariate sensitivity is done by estimating the collinearity between parameter sets (Brun, Reichert, and Kunsch 2001). > Coll <- collin(o2sens) > Coll upo2 cons ks D N collinearity 1 1 1 0 0 2 7.7 2 1 0 1 0 2 2.9

Karline Soetaert 5 20 10 5 0 0 5 10 15 20 upo2 2 4 6 8 12 20 10 5 0 0.98 cons 0.84 0.93 ks 0 2 4 6 8 0 5 10 15 20 0.98 1 0.93 D 2 4 6 8 12 0 2 4 6 8 Figure 3: pairs plot - see text for R-code 3 1 0 0 1 2 7.7 4 0 1 1 0 2 4.4 5 0 1 0 1 2 67108864.0 6 0 0 1 1 2 4.4 7 1 1 1 0 3 25.5 8 1 1 0 1 3 NaN 9 1 0 1 1 3 25.5 10 0 1 1 1 3 61616597.4 11 1 1 1 1 4 NaN > plot(coll, log = "y") 4. Fitting the model to the data Assume both the oxygen flux at the upper interface and a vertical profile of oxygen has been measured. These are the data: > O2dat <- data.frame(x = seq(0.1, 3.5, by = 0.1), y = c(279,260,256,220,200,203,189,179,165,140,138,127,116, 109,92,87,78,72,62,55,49,43,35,32,27,20,15,15,10,8,5,3,2,1,0)) > O2depth <- cbind(name = "O2", O2dat) # oxygen versus depth > O2flux <- c(upflux = 170) # measured flux First a function is defined that returns only the required model output. > O2fun2 <- function(pars) {

6 FME Inverse Modelling, Sensitivity, Monte Carlo with a Steady-State Model Collinearity Collinearity index 1e01 1e03 1e05 1e07 2 3 Number of parameters Figure 4: collinearity - see text for R-code derivs<-function(t, O2, pars) { with (as.list(pars),{ Flux <- -D*diff(c(upO2, O2, O2[n]))/dX do2 <- -diff(flux)/dx - cons*o2/(o2 ks) return(list(do2,upflux = Flux[1], LowFlux = Flux[n1])) }) } ox <- steady.1d(y = runif(n), func = derivs, parms = pars, nspec = 1, positive = TRUE, rtol = 1e-8, atol = 1e-10) list(data.frame(x = X, O2 = ox$y), UpFlux = ox$upflux) } The function used in the fitting algorithm returns an instance of type modcost. This is created by calling function modcost twice. First with the modeled oxygen profile, then with the modeled flux. > Objective <- function (P) { Pars <- pars Pars[names(P)]<-P modo2 <- O2fun2(Pars) # Model cost: first the oxygen profile Cost <- modcost(obs = O2depth, model = modo2[[1]],

Karline Soetaert 7 x = "x", y = "y") # then the flux modfl <- c(upflux = modo2$upflux) Cost <- modcost(obs = O2flux, model = modfl, x = NULL, cost = Cost) return(cost) } We first estimate the identifiability of the parameters, given the data: > print(system.time( sf<-sensfun(objective, parms = pars) )) user system elapsed 0.14 0.00 0.14 > summary(sf) value scale L1 L2 Mean Min Max N upo2 360 360 4.3 0.97 4.3 0.5069 13.3 36 cons 80 80 3.7 0.99-3.6-15.3722 0.5 36 ks 1 1 0.4 0.14 0.4-0.0069 3.1 36 D 1 1 3.7 0.99 3.7 0.0342 15.4 36 > collin(sf) upo2 cons ks D N collinearity 1 1 1 0 0 2 8.6 2 1 0 1 0 2 3.1 3 1 0 0 1 2 8.7 4 0 1 1 0 2 4.2 5 0 1 0 1 2 50.6 6 0 0 1 1 2 4.2 7 1 1 1 0 3 14.2 8 1 1 0 1 3 50.8 9 1 0 1 1 3 14.7 10 0 1 1 1 3 50.6 11 1 1 1 1 4 51.0 The collinearity of the full set is too high, but as the oxygen diffusion coefficient is well known, it is left out of the fitting. The combination of the three remaining parameters has a low enough collinearity to enable automatic fitting. The parameters are constrained to be >0 > collin(sf, parset = c("upo2", "cons", "ks"))

8 FME Inverse Modelling, Sensitivity, Monte Carlo with a Steady-State Model upo2 cons ks D N collinearity 1 1 1 1 0 3 14 > print(system.time( Fit <- modfit(p = c(upo2 = 360, cons = 80, ks = 1), f = Objective, lower = c(0, 0, 0)) )) user system elapsed 1.25 0.00 1.26 > (SFit<-summary(Fit)) Parameters: Estimate Std. Error t value Pr(> t ) upo2 292.937 2.104 139.245 <2e-16 *** cons 49.687 2.366 21.001 <2e-16 *** ks 1.297 1.362 0.953 0.348 --- Signif. codes: 0 Ś***Š 0.001 Ś**Š 0.01 Ś*Š 0.05 Ś.Š 0.1 Ś Š 1 Residual standard error: 4.401 on 33 degrees of freedom Parameter correlation: upo2 cons ks upo2 1.0000 0.5791 0.2975 cons 0.5791 1.0000 0.9011 ks 0.2975 0.9011 1.0000 We next plot the residuals > plot(objective(fit$par), xlab = "depth", ylab = "", main = "residual", legpos = "top") and show the best-fit model > Pars <- pars > Pars[names(Fit$par)] <- Fit$par > modo2 <- O2fun(Pars) > plot(o2depth$y, O2depth$x, ylim = rev(range(o2depth$x)), pch = 18, main = "Oxygen-fitted", xlab = "mmol/m3", ylab = "depth, cm") > lines(modo2$o2, modo2$x) 5. Running a Markov chain Monte Carlo We use the parameter covariances of previous fit to update parameters, while the mean squared residual of the fit is use as prior fo the model variance.

Karline Soetaert 9 residual 10 5 0 5 10 15 UpFlux O2 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 depth Figure 5: residuals - see text for R-code Oxygen fitted depth, cm 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0.0 0 50 100 150 200 250 mmol/m3 Figure 6: Best fit model - see text for R-code

10 FME Inverse Modelling, Sensitivity, Monte Carlo with a Steady-State Model > Covar <- SFit$cov.scaled * 2.4^2/3 > s2prior <- SFit$modVariance We run an adaptive Metropolis, making sure that ks does not become negative... > print(system.time( MCMC <- modmcmc(f = Objective, p = Fit$par, jump = Covar, niter = 1000, ntrydr = 2, var0 = s2prior, wvar0 = 1, updatecov = 100, lower = c(na, NA, 0)) )) number of accepted runs: 699 out of 1000 (69.9%) user system elapsed 27.78 0.02 28.34 > MCMC$count dr_steps Alfasteps num_accepted num_covupdate 684 2052 699 9 Plotting the results is similar to previous cases. > plot(mcmc,full=true) > hist(mcmc, Full = TRUE) > pairs(mcmc, Full = TRUE) or summaries can be created: > summary(mcmc) upo2 cons ks var_model mean 294.669801 53.960056 4.26541661 8.373713e02 sd 6.058268 6.329367 4.33510416 1.127736e04 min 275.596731 42.378560 0.02361273 1.982599e00 max 321.490783 93.961430 37.84672056 2.610703e05 q025 291.719704 49.888962 1.53816103 1.245861e01 q050 293.605504 52.477486 3.15966952 3.436431e01 q075 296.246191 56.415191 5.81336260 1.034256e02 > cor(mcmc$pars) upo2 cons ks upo2 1.0000000 0.7064404 0.3040173 cons 0.7064404 1.0000000 0.8218140 ks 0.3040173 0.8218140 1.0000000

Karline Soetaert 11 upo2 cons ks 280 290 300 310 320 50 60 70 80 90 0 10 20 30 0 200 600 1000 iter 0 200 600 1000 iter 0 200 600 1000 iter SSR var_model 1000 2000 3000 4000 5000 variance 1e01 1e03 1e05 0 200 600 1000 iter 0 200 600 1000 iter Figure 7: MCMC plot results - see text for R-code

12 FME Inverse Modelling, Sensitivity, Monte Carlo with a Steady-State Model upo2 cons ks 0.00 0.05 0.10 0.15 0.00 0.04 0.08 0.12 0.00 0.05 0.10 0.15 280 300 320 40 60 80 0 10 20 30 SSR error var_model 0.000 0.001 0.002 0.003 0.004 var_model 0e00 2e 04 4e 04 1000 3000 5000 0 100000 250000 Figure 8: MCMC histogram results - see text for R-code

Karline Soetaert 13 50 70 90 1000 3000 5000 50 70 90 1000 3000 5000 upo2 0.71 cons ks 0.3 0.82 0.64 0.62 0.35 0.14 0.081 SSR 0.018 0.11 var_model 280 300 320 0 10 20 30 0 100000 250000 280 300 320 0 10 20 30 0 100000 250000 Figure 9: MCMC pairs plot - see text for R-code

14 FME Inverse Modelling, Sensitivity, Monte Carlo with a Steady-State Model O2 y 5 4 3 2 1 0 Min Max Mean sd 0 50 100 150 200 250 300 x Figure 10: MCMC range plot - see text for R-code Note: we pass to sensrange the full parameter vector (parms) and the parameters sampled during the MCMC (parinput). > plot(summary(sensrange(parms = pars, parinput = MCMC$par, f = O2fun, num = 500)), xyswap = TRUE) > points(o2depth$y, O2depth$x) 6. Finally This vignette is made with Sweave (Leisch 2002). References Brun R, Reichert P, Kunsch H (2001). Practical Identifiability Analysis of Large Environmental Simulation Models. Water Resources Research, 37(4), 1015 1030.

Karline Soetaert 15 Leisch F(2002). Dynamic Generation of Statistical Reports Using Literate Data Analysis. In W Härdle, B Rönz (eds.), COMPSTAT 2002 Proceedings in Computational Statistics, pp. 575 580. Physica-Verlag, Heidelberg. Soetaert K (2009). rootsolve: Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations. R package version 1.6, URL http://cran. R-project.org/package=rootSolve. Soetaert K, Herman PMJ (2009). A Practical Guide to Ecological Modelling. Using R as a Simulation Platform. Springer-Verlag, New York. Soetaert K, Petzoldt T (2010). Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME. Journal of Statistical Software, 33(3), 1 28. URL http://www. jstatsoft.org/v33/i03/. Affiliation: Karline Soetaert Royal Netherlands Institute of Sea Research (NIOZ) 4401 NT Yerseke, Netherlands E-mail: k.soetaert@nioz.nl URL: http://www.nioz.nl