An adaptive kriging method for characterizing uncertainty in inverse problems

FU Shuai (1,2)
(1) University Paris-Sud & INRIA, Department of Mathematics, 91405 Orsay, France
E-mail: fshvip@gmail.com

Couplet Mathieu (2), Bousquet Nicolas (2)
(2) EDF R&D, Department of Industrial Risk Management, 6 quai Watier, 78401 Chatou, France
E-mail: mathieu.couplet@edf.fr, nicolas.bousquet@edf.fr

1. Industrial context of the model

Our present work is based on a simplified model of the risk of dyke overflow during a flood. We want to predict the water level in this situation, with the following variables available: the observed quantities are the water level at the dyke position Z_c, the speed of the river V and the observed flow of the river Q; the unobservable quantities are the value of the Strickler coefficient K_s and the river bed level at the dyke Z_v. In order to predict the future water level, we need to quantify the non-observed variables K_s and Z_v.

In this flooding model, if we define the data vector Y = (Z_c, V)^T ∈ R^2, the non-observed vector X = (K_s, Z_v)^T ∈ R^2 and the observed variable d = Q ∈ R, we can rewrite the measured observation Y as a function of the non-observed vector X, up to an error term U:

    Y = (Z_c, V)^T = H(K_s, Z_v; Q) + U.

In general, we consider the following model for the uncertainty treatment:

(1)    Y_i = H(X_i, d_i) + U_i,    1 ≤ i ≤ n,

with n the sample size, where (Y_i) ⊂ R^p denotes the data vectors; H is a known function from R^{q_1+q_2} to R^p which is expensive in CPU time and often regarded as a black box; (X_i) ⊂ R^{q_1} denotes the non-observed random data, assumed independent and identically distributed (iid); (d_i) ⊂ R^{q_2} denotes the observed variables related to the experimental conditions; and (U_i) ⊂ R^p denotes the measurement errors, assumed iid. Moreover, the variables (X_i) and (U_i) are assumed to be independent.
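For illustration, here is a minimal Python sketch of the data-generating model (1). The function toy_H is a purely illustrative stand-in for the expensive black-box simulator H, and all numerical values (means, covariances, flow range) are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_H(x, d):
    """Hypothetical stand-in for the CPU-expensive simulator H: R^{q1+q2} -> R^p."""
    ks, zv = x                                 # Strickler coefficient, river bed level
    q = d                                      # observed flow
    zc = zv + (q / ks) ** 0.6                  # crude water-level-like response
    v = q / (zc - zv)                          # crude speed-like response
    return np.array([zc, v])

n = 20
m_true = np.array([30.0, 50.0])                # illustrative mean of X_i = (K_s, Z_v)
C_true = np.diag([4.0, 0.25])                  # illustrative covariance of X_i
R = np.diag([0.05, 0.05])                      # known measurement-error covariance

X = rng.multivariate_normal(m_true, C_true, size=n)     # iid non-observed inputs
d = rng.uniform(500.0, 3000.0, size=n)                  # experimental conditions d_i = Q_i
U = rng.multivariate_normal(np.zeros(2), R, size=n)     # iid measurement errors U_i
Y = np.vstack([toy_H(X[i], d[i]) for i in range(n)]) + U    # observations, model (1)
```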

To solve this inverse problem we choose a Bayesian framework, and we assume the following prior distributions for the random variables X_i and U_i:

    X_i | m, C ~ N_q(m, C),    U_i ~ N_p(0, R),    1 ≤ i ≤ n,

which allows us to take the experts' prior knowledge into account as well as to reduce the identifiability problem. Moreover, assuming R known, the prior distributions of m and C are defined as follows:

    m | C ~ N_q(µ, C/a),    with a a hyperparameter to be specified;
    C ~ IW_q(Λ, ν) ∈ M_{q×q},

where IW denotes an Inverse-Wishart distribution. Typically, we want to estimate the posterior distribution of θ = (m, C) given the data Y = (Y_1, ..., Y_n), which characterizes the distribution of the X_i. We choose Gibbs sampling to approximate the Bayesian posterior distributions. In order to define a Gibbs sampler, writing X = (X_1, ..., X_n) and Y = (Y_1, ..., Y_n), we calculate the full conditional posterior distribution of each unknown quantity (m, C, X):

    m | C, Y, X, ρ ~ N_q( a/(n+a) · µ + n/(n+a) · X̄,  C/(n+a) ),

    C | m, Y, X, ρ ~ IW_q( Λ + Σ_{i=1}^{n} (m − X_i)(m − X_i)^T + a (m − µ)(m − µ)^T,  ν + n + 1 ),

(2)    π(X | Y, m, C, ρ) ∝ exp{ −(1/2) Σ_{i=1}^{n} [ (X_i − m)^T C^{−1} (X_i − m) + (Y_i − H(X_i, d_i))^T R^{−1} (Y_i − H(X_i, d_i)) ] },

with ρ the set of hyperparameters and X̄ the empirical mean of the X_i. As the last conditional distribution (2) of X has a complicated, non-standard form, a numerical simulation method such as Metropolis-Hastings is necessary.

2. New kriging version of the model

In general, the CPU time consumed by the function H is very large (the code behind H is complicated), so it is desirable to limit the number of calls to H. Here we propose a kriging approximation method, in which H is regarded as a realisation of a Gaussian process and approximated by a predictor Ĥ ≈ H. To apply this method, we choose a bounded hypercube Ω ⊂ R^Q (Q = q_1 + q_2) and generate a set of N_max points D = {z_1, ..., z_{N_max}} ⊂ Ω, with z_i = (x_i, d_i), applying a Latin Hypercube Sampling (LHS) maximin strategy; the set D is called a design. Our approximation is restricted to Ω and the number of calls to H is limited to N_max; we write H_D = {H(z_1), ..., H(z_{N_max})}. The predictor Ĥ is calculated as a conditional expectation at any point z* ∈ Ω,

    Ĥ(z*) = E[ H(z*) | H_D ],

and its conditional variance, denoted MSE(z*) (Mean Squared Error), can be seen as a measure of the prediction accuracy.
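The following sketch shows what Ĥ and its MSE look like for a single output component, under a zero-mean Gaussian process with a squared-exponential correlation. The paper relies on the DACE toolbox for this step, so the kernel choice, the zero-mean (simple kriging) assumption and the scale values here are illustrative, not the authors'.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscales, variance=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return variance * np.exp(-0.5 * np.sum(diff ** 2, axis=-1))

def krige(Z_design, H_design, Z_new, lengthscales, nugget=1e-8):
    """Kriging predictor Hhat(z*) = E[H(z*) | H_D] and its variance MSE(z*).

    Z_design : (N, Q) design points z_i = (x_i, d_i)
    H_design : (N,)   code outputs H(z_i), one output component at a time
    Z_new    : (M, Q) prediction points in Omega
    """
    K = sq_exp_kernel(Z_design, Z_design, lengthscales)
    K[np.diag_indices_from(K)] += nugget                   # numerical stabilisation
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, H_design))
    Ks = sq_exp_kernel(Z_new, Z_design, lengthscales)
    mean = Ks @ alpha                                      # conditional expectation
    v = np.linalg.solve(L, Ks.T)
    prior_var = np.diag(sq_exp_kernel(Z_new, Z_new, lengthscales))
    mse = np.maximum(prior_var - np.sum(v ** 2, axis=0), 0.0)  # conditional variance
    return mean, mse
```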

Following this idea, each Y_i can also be seen as a realisation of a new process corresponding to the Gaussian process H, and we rewrite the current model:

(3)    Y_i = Ĥ(X_i, d_i) + (H − Ĥ)(X_i, d_i) + U_i = Ĥ(X_i, d_i) + V_i(X_i, d_i),

with a new error term V_i(X_i, d_i) combining two sources of variance: the covariance R of the old error U_i and the MSE of the kriging method. In particular, we consider the whole sample Y = (Y_1, ..., Y_n), which permits us to account for the correlation between the predictions Ĥ(Z_i) and Ĥ(Z_j). Thus model (3) can be written (assuming p = 2 for simplicity):

(4)    Y = (Y_1^1, ..., Y_n^1, Y_1^2, ..., Y_n^2)^T = (Ĥ^1(Z_1), ..., Ĥ^1(Z_n), Ĥ^2(Z_1), ..., Ĥ^2(Z_n))^T + (V^1(Z_1), ..., V^1(Z_n), V^2(Z_1), ..., V^2(Z_n))^T = Ĥ(Z) + V(Z),

where the superscript indexes the output component. We deduce its distribution given the variables Z = (Z_1, ..., Z_n) and the function values H_D of the design:

(5)    Y | Z, H_D ~ N( Ĥ(Z), R̃ + MSE(Z) ),

where R̃ is the diagonal variance matrix

(6)    R̃ = diag( R_{11}, ..., R_{11}, R_{22}, ..., R_{22} ),

with each R_{ii}, the i-th diagonal component of R, repeated n times; and MSE(Z) is the block diagonal matrix

    MSE(Z) = diag( MSE^1(Z), MSE^2(Z) ),

where MSE^i(Z) is the variance-covariance matrix of Ĥ^i(Z). The conditional distribution of the vector X = (X_1, ..., X_n) given Y can then be specified:

(7)    π(X | Y, m, C, ρ, d, H_D) ∝ 1_Ω(Z) |R̃ + MSE(Z)|^{−1/2} exp{ −(1/2) [ (Y − Ĥ(Z))^T (R̃ + MSE(Z))^{−1} (Y − Ĥ(Z)) + Σ_{i=1}^{n} (X_i − m)^T C^{−1} (X_i − m) ] }.
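As a shape-level illustration of (5)-(6), the following sketch assembles the covariance R̃ + MSE(Z) for p = 2. The MSE blocks below are diagonal stand-ins; in the method itself they are the full kriging covariance matrices of Ĥ^1(Z) and Ĥ^2(Z).

```python
import numpy as np
from scipy.linalg import block_diag

def stacked_covariance(R, mse_blocks):
    """Covariance of the stacked vector Y in (5): Rtilde + MSE(Z).

    R          : (p, p) diagonal measurement-error covariance
    mse_blocks : list of p (n, n) kriging covariance matrices MSE^k(Z)
    """
    n = mse_blocks[0].shape[0]
    # Rtilde: R_11 repeated n times, then R_22 repeated n times, etc. -- eq. (6)
    Rtilde = np.diag(np.concatenate([np.full(n, R[k, k]) for k in range(R.shape[0])]))
    return Rtilde + block_diag(*mse_blocks)

# Toy usage with n = 4 and illustrative diagonal MSE blocks.
n = 4
R = np.diag([0.05, 0.02])
mse1, mse2 = 0.1 * np.eye(n), 0.2 * np.eye(n)      # stand-ins for MSE^1(Z), MSE^2(Z)
Sigma = stacked_covariance(R, [mse1, mse2])        # (2n, 2n) matrix entering (5) and (7)
```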

3. Hybrid MCMC algorithm

We make the Gibbs algorithm explicit as follows. Given (m^[r], C^[r], X^[r]), for r = 0, 1, 2, ...:

1. Generate C^[r+1] ~ IW( Λ + Σ_{i=1}^{n} (m^[r] − X_i^[r])(m^[r] − X_i^[r])^T + a (m^[r] − µ)(m^[r] − µ)^T,  ν + n + 1 );

2. Generate m^[r+1] ~ N( a/(n+a) · µ + n/(n+a) · X̄^[r],  C^[r+1]/(n+a) );

3. Generate X^[r+1], whose simulation consists of l iterations of a Metropolis-Hastings algorithm.

Consequently, our Gibbs sampling algorithm becomes a so-called hybrid MCMC algorithm, following Tierney (1994), which simultaneously utilizes both Gibbs sampling steps and Metropolis-Hastings steps. In detail, let X_0 = (X_{1,0}, ..., X_{n,0}) = X^[r]. For s = 1, ..., l, update X component by component (for i = 1, ..., n):

1. Generate X*_{i,s} ~ J(· | X_{i,s−1}), where J is a proposal distribution.

2. Let

(8)    α(X_{i,s−1}, X*_{i,s}) = min( [ π_Ĥ(X*_s | Y, θ^[r+1], ρ, d, H_D) J(X_{i,s−1} | X*_{i,s}) ] / [ π_Ĥ(X_s | Y, θ^[r+1], ρ, d, H_D) J(X*_{i,s} | X_{i,s−1}) ],  1 ),

where

    X*_s = (X_{1,s}, ..., X_{i−1,s}, X*_{i,s}, X_{i+1,s−1}, ..., X_{n,s−1}),
    X_s  = (X_{1,s}, ..., X_{i−1,s}, X_{i,s−1}, X_{i+1,s−1}, ..., X_{n,s−1}).

3. Accept X_{i,s} = X*_{i,s} with probability α(X_{i,s−1}, X*_{i,s}); otherwise set X_{i,s} = X_{i,s−1}.

4. Update X_s = (X_{1,s}, ..., X_{i,s}, X_{i+1,s−1}, ..., X_{n,s−1}); after the l iterations, set X^[r+1] = X_l.

In Step 2 of the Metropolis-Hastings algorithm, we consider a Gaussian proposal distribution J = N(m^[r+1], C^[r+1]).
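Below is a condensed sketch of this hybrid algorithm, assuming conjugate draws via scipy and, for brevity, an independent Gaussian approximation of (7) in which each X_i only sees its own pointwise kriging variance (the full joint covariance R̃ + MSE(Z) is left out). The callable emulate, returning the kriging mean and pointwise MSE, is a placeholder for the predictor of Section 2, and all settings are illustrative.

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal as mvn

def hybrid_mcmc(Y, d, emulate, mu, a, Lam, nu, R, n_iter=2000, l_mh=5, seed=1):
    """Metropolis-within-Gibbs sampler for (m, C, X) under the model above."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    X = np.tile(mu, (n, 1)).astype(float)      # crude initialisation of the X_i
    m = mu.astype(float)
    out_m, out_C = [], []
    for r in range(n_iter):
        # Gibbs step 1: C^[r+1] | m^[r], X^[r]  ~  IW(Lam + S, nu + n + 1)
        S = (m - X).T @ (m - X) + a * np.outer(m - mu, m - mu)
        C = invwishart.rvs(df=nu + n + 1, scale=Lam + S, random_state=rng)
        # Gibbs step 2: m^[r+1] | C^[r+1], X^[r]
        m = rng.multivariate_normal((a * mu + n * X.mean(axis=0)) / (n + a),
                                    C / (n + a))
        # Gibbs step 3: l Metropolis-Hastings sweeps over the components X_i,
        # with the independence proposal J = N(m^[r+1], C^[r+1]).
        def logpost(x, i):
            h, mse = emulate(x, d[i])          # kriging mean and pointwise MSE
            return (mvn.logpdf(Y[i], mean=h, cov=R + np.diag(mse))
                    + mvn.logpdf(x, mean=m, cov=C))
        for _ in range(l_mh):
            for i in range(n):
                prop = rng.multivariate_normal(m, C)
                log_alpha = (logpost(prop, i) - logpost(X[i], i)
                             + mvn.logpdf(X[i], mean=m, cov=C)
                             - mvn.logpdf(prop, mean=m, cov=C))
                if np.log(rng.uniform()) < log_alpha:
                    X[i] = prop
        out_m.append(m); out_C.append(C)
    return np.array(out_m), np.array(out_C)
```

Posterior summaries would then be computed from out_m and out_C after discarding a burn-in period, as in the experiments of Section 4.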

4. Construction of an adaptive design

An important point for a kriging approximation is the choice of the design of experiments (DoE). In order to fill the whole field, we use the classic strategy called LHS (Latin Hypercube Sampling) maximin. But this general method is not always ideal: sometimes the design points fall relatively far from the true value of X, and sometimes the MSE may explode at the edge of the experimental field. We propose to construct a design adaptively, by providing the additional information sequentially. The procedure is as follows:

1. Fix a calculation budget N_max and a proportion p of points.
2. Choose a hypercubic domain Ω where the proxy is valid.
3. Build an LHS-maximin design D with p·N_max points in Ω.
4. Decide whether the design is satisfactory (quality criterion). If it is, call the kriging predictor Ĥ and run the Gibbs (Metropolis-Hastings) algorithm.
5. If it is not, add the other (1−p)·N_max points sequentially in the indicated areas, following an adaptive procedure designed to improve the quality of the design.

The adaptive scheme requires several choices: the experimental field Ω; the proportion p of points selected for the LHS-maximin design; a criterion measuring the quality of a design; the adaptive strategy for adding points; etc.

A small experiment with different numbers N_max of points. The number N_max of points of the design D is very important: increasing it improves the accuracy of the kriging approximation, but, on the other hand, the computation requires more time. In the following numerical experiments we consider three cases: N_max = 100, 300 and 500. Figure 1 gives an illustration: as the number of points of the design D increases, there is an obvious decrease of the MSE.

[Figure 1: 5% quantile, median and 95% quantile of the MSE against the number of points of the design D.]

Now we consider three LHS-maximin designs with 100, 300 and 500 points, and we compare the posterior distributions of the two parameters m and C obtained with these three kriging approximations, based on a sample drawn from the simulated Monte Carlo chain after the burn-in period. We notice a great difference when we apply the 100-point kriging.

[Figures 2-5: posterior distributions of m_1, m_2, C_11 and C_22 under the 100-, 300- and 500-point kriging approximations.]
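All the designs above are built with the LHS-maximin strategy. Here is a minimal sketch, assuming the maximin design is obtained by brute force, keeping the Latin hypercube candidate whose smallest pairwise distance is largest; the candidate count and the Ω bounds for (K_s, Z_v, Q) are illustrative.

```python
import numpy as np
from scipy.stats import qmc

def lhs_maximin(n_points, bounds, n_candidates=200, seed=0):
    """LHS-maximin design over the hypercube Omega described by bounds (Q, 2)."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        sample = qmc.LatinHypercube(d=bounds.shape[0], seed=rng).random(n=n_points)
        dists = np.linalg.norm(sample[:, None] - sample[None, :], axis=-1)
        score = dists[np.triu_indices(n_points, k=1)].min()   # maximin criterion
        if score > best_score:
            best, best_score = sample, score
    return qmc.scale(best, bounds[:, 0], bounds[:, 1])        # map [0,1]^Q to Omega

# e.g. 90 points (p = 90% of N_max = 100) over an illustrative Omega
D = lhs_maximin(90, np.array([[10.0, 60.0], [46.0, 54.0], [500.0, 3000.0]]))
```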

How can the N_max design points be chosen adaptively? Here we give a detailed procedure:

1. Fix an experimental field Ω and generate a design D of p·N_max points (p = 90%, for example) in Ω according to the LHS-maximin strategy.

2. Grid the field Ω densely enough in x_1, x_2 and d.

3. At each grid point (x_1, x_2, d) = (x, d), calculate the criterion value C_D(x, d), of one of two possible types:

   (a) a weighted criterion,

       max_{(x,d) ∈ E} MSE(x, d)^α π(x)^{1−α},

   where E denotes the grid,

       π(x) ∝ [ 1 + (x − µ)^T ( (1 + 1/a) Λ )^{−1} (x − µ) ]^{−(ν+1)/2},

   and MSE(x, d) can be obtained easily with the package DACE;

   (b) a weighted criterion taking into account the y_i at iteration k,

       max_{(x,d) ∈ E} MSE(x, d)^α π(x | y_i, θ^[k])^{1−α},

   where π(x | y_i, θ^[k]) has to be updated at each iteration of the MCMC algorithm.

4. Complete the design sequentially with the (1−p)·N_max additional points which maximise the criterion quantity.

For the moment we use only the first criterion, in logarithmic form:

    C_D(x, d) = α log MSE(x, d) − (1 − α) · (ν + 1)/2 · log[ 1 + (x − µ)^T ( (1 + 1/a) Λ )^{−1} (x − µ) ],

where α can be chosen between 0 and 1. Taking p = 90% and N_max = 100, different values of α give the graphs in Figures 6, 7 and 8: Figure 6 represents the design D with 90 points generated by LHS-maximin plus 10 additional points with the biggest MSE values, while in Figures 7 and 8 we add points according to the MSE value as well as the prior distribution of X, whose prior mean µ is set to (40, 48), with weight α = 0.8 for Figure 7 and α = 0.5 for Figure 8. The strategy of adding points does allow us to diminish the kriging uncertainty. We can compare the final results, namely the posterior distributions of the mean parameter m in the three cases (Figures 9 and 10), using the 500-point kriging result as a reference. We find that in our example the simulation results are most satisfactory with α = 1 (where only the MSE impact is considered).

[Figures 6-8: designs made of 90% LHS-maximin points plus 10% sequentially added points, for α = 1 (Figure 6), α = 0.8 (Figure 7) and α = 0.5 (Figure 8).]

[Figures 9-10: posterior distributions of m_1 and m_2 for the sequential designs with α = 1, 0.8 and 0.5, against the 500-point reference.]

Why did we choose p = 90%? In order to make the best choice of p, we repeated the experiment with different values of p, obtaining Figure 11. Clearly, p = 80%-90% appears to be a good choice.

Remark: when we add the (1−p)·N_max points to D, we have to take their physical positions into account, to avoid adding all the points in the same area. We therefore construct the DoE sequentially, adding one new point and recalculating the criterion C_D each time.
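The following sketch implements this sequential enrichment with the logarithmic criterion, honouring the Remark: the criterion is recomputed after every added point. The callable mse_fn, which would re-evaluate the kriging MSE over the grid for the current design (the role played by DACE in the paper), is an assumed placeholder, and the grid is taken with the x coordinates in its first columns.

```python
import numpy as np

def log_prior_term(Xgrid, mu, a, Lam, nu):
    """Log of the Student-type prior weight pi(x), up to an additive constant."""
    M = np.linalg.inv((1.0 + 1.0 / a) * Lam)
    quad = np.einsum('ij,jk,ik->i', Xgrid - mu, M, Xgrid - mu)
    return -(nu + 1) / 2.0 * np.log1p(quad)

def enrich_design(D, grid, mse_fn, mu, a, Lam, nu, n_add, alpha=1.0):
    """Add n_add grid points one at a time, maximising
    C_D = alpha * log MSE - (1 - alpha) * (nu+1)/2 * log[1 + quadratic form]."""
    x_part = grid[:, :mu.shape[0]]                     # (x_1, x_2) coordinates
    prior = log_prior_term(x_part, mu, a, Lam, nu)     # fixed across iterations
    for _ in range(n_add):
        crit = alpha * np.log(mse_fn(D, grid) + 1e-300) + (1.0 - alpha) * prior
        D = np.vstack([D, grid[np.argmax(crit)]])      # point maximising C_D
    return D
```

With alpha=1.0, only the MSE term plays a role, which is the variant retained in the experiments above.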

[Figure 11: MSE against the percentage of points added sequentially to the initial LHS design.]

To illustrate the advantage of this adaptive idea, we compare the experimental results of the 100-point adaptive kriging method, again using the 500-point kriging method as a reference (Figures 12, 13, 14 and 15). There is a great improvement when we apply the adaptive kriging method with 100 points, especially for m_2 and C_22.

[Figures 12-15: posterior distributions of m_1, m_2, C_11 and C_22 for the sequential (adaptive) 100-point kriging, the non-adaptive 100-point designs, and the 500-point reference.]
