Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session STS2)

An adaptive kriging method for characterizing uncertainty in inverse problems

FU Shuai
University Paris-Sud & INRIA, Department of Mathematics, 91405 Orsay, France
E-mail: fshvip@gmail.com

Couplet Mathieu, Bousquet Nicolas
EDF R&D, Department of Industrial Risk Management, 6 quai Watier, 78401 Chatou, France
E-mail: mathieu.couplet@edf.fr, nicolas.bousquet@edf.fr

1 Industrial context of the model

Our present work is based on a simplified model related to the risk of dyke overflow during a flood. We want to predict the water level in this situation, with several variables available: the observed quantities are the water level at the dyke position Z_c, the speed of the river V and the observed river flow Q. The unobserved quantities are the value of the Strickler coefficient K_s as well as the river bed level at the dyke Z_v. In order to predict the future water level, we need to quantify the non-observed variables K_s and Z_v. In this flooding model, if we define the data vector Y = (Z_c, V)^T ∈ R^2, the non-observed vector X = (K_s, Z_v)^T ∈ R^2 and the observed variable d = Q ∈ R, we can rewrite the measured observation Y as a function of the non-observed vector X, adding an error U:

Y = (Z_c, V)^T = H(K_s, Z_v; Q) + U.

In general, we consider the following model for the uncertainty treatment:

(1)  Y_i = H(X_i, d_i) + U_i,   1 ≤ i ≤ n,

with n the sample size, where Y_i ∈ R^p denotes the data vectors; H denotes a known function from R^{q1+q2} to R^p which is expensive in CPU time and often regarded as a black box; X_i ∈ R^{q1} denotes the non-observed random data, assumed independent and identically distributed (i.i.d.); d_i ∈ R^{q2} denotes the observed variables related to the experimental conditions; U_i ∈ R^p denotes the measurement errors, which are assumed i.i.d. Besides, the variables (X_i) and (U_i) are assumed to be independent.
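To fix ideas, the generic model (1) can be sketched in a few lines of code. Everything below is illustrative: the function H_toy, the dimensions and all numerical values are hypothetical stand-ins for the real flooding code, not the model actually used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def H_toy(x, d):
    """Hypothetical stand-in for the expensive black box H: R^{q1+q2} -> R^p.
    Here x = (K_s, Z_v) and d = Q, loosely mimicking a flow / water-level code."""
    ks, zv = x
    level = zv + (d / (ks + 1.0)) ** 0.6    # fictitious water level Z_c
    speed = d / (50.0 * (ks + 1.0) ** 0.3)  # fictitious river speed V
    return np.array([level, speed])

n, p = 20, 2
m_true = np.array([30.0, 50.0])   # mean of the unobserved X_i = (K_s, Z_v)
C_true = np.diag([4.0, 0.25])     # covariance of X_i
R = np.diag([0.05, 0.01])         # known measurement-error covariance

d = rng.uniform(500.0, 3000.0, size=n)               # experimental conditions Q_i
X = rng.multivariate_normal(m_true, C_true, size=n)  # unobserved inputs
U = rng.multivariate_normal(np.zeros(p), R, size=n)  # measurement errors
Y = np.array([H_toy(X[i], d[i]) for i in range(n)]) + U  # model (1)

print(Y.shape)  # (n, p)
```

In the inverse problem below, only Y and d are observed; X, m_true and C_true are the quantities to be recovered.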
To solve this inverse problem we choose a Bayesian framework and we assume the following prior distributions for the random variables X_i and U_i:

X_i | m, C ~ N_q(m, C),   U_i ~ N_p(0, R),   1 ≤ i ≤ n,

which allows us to take into account the experts' prior knowledge as well as to mitigate the identifiability problem. Moreover, assuming a known R, the prior distributions of m and C can be defined as follows:

m | C ~ N_q(µ, C/a), where a is a hyperparameter to be specified;
C ~ IW_q(Λ, ν), a q × q matrix, where IW denotes an Inverse-Wishart distribution.

Typically, we want to estimate the posterior distribution of θ = (m, C) given the data Y = (Y_1, …, Y_n), which characterizes the distribution of X_i. We choose Gibbs sampling to approximate the Bayesian posterior distributions. In order to define a Gibbs sampler, writing X = (X_1, …, X_n) and Y = (Y_1, …, Y_n), we calculate the full conditional posterior distribution of each unknown quantity (m, C, X):

m | C, Y, X, ρ ~ N_q( (a µ + n X̄)/(n + a), C/(n + a) ),

C | m, Y, X, ρ ~ IW_q( Λ + Σ_{i=1}^n (m − X_i)(m − X_i)^T + a (m − µ)(m − µ)^T, ν + n + 1 ),

(2)  π(X | Y, m, C, ρ) ∝ exp{ −(1/2) Σ_{i=1}^n [ (X_i − m)^T C^{−1} (X_i − m) + (Y_i − H(X_i, d_i))^T R^{−1} (Y_i − H(X_i, d_i)) ] },

with ρ the set of hyperparameters and X̄ the empirical mean of the X_i. As the last conditional distribution (2) of X has a complicated, non-standard form, a numerical simulation method is required, for example Metropolis-Hastings.

2 New kriging version of the model

In general, the CPU time consumed by the function H is very large (because the code implementing H is complicated), so it is desirable to limit the number of calls to H. Here we propose a kriging approximation method where Ĥ ≈ H is regarded as a realisation of a Gaussian process H. To apply this method, we choose a bounded hypercube Ω ⊂ R^Q (Q = q1 + q2) and generate a set of N_max points D = {z_1, …, z_Nmax} ⊂ Ω, with z_i = (x_i, d_i), applying a Latin Hypercube Sampling (LHS) - maximin strategy; D is called a design. Our approximation is restricted to Ω
and the number of calls to H is limited to N_max; the corresponding function values are noted H_D = {H(z_1), …, H(z_Nmax)}. The predictor Ĥ is calculated as a conditional expectation for any point z* ∈ Ω,

Ĥ(z*) = E[ H(z*) | H_D ],

and its conditional variance, denoted MSE(z*) (Mean Squared Error), can be seen as a measure of the prediction accuracy. Following this idea, each Y_i can also be seen as a realisation of a new process
Y_i, which corresponds to the Gaussian process H; we rewrite the current model:

(3)  Y_i = Ĥ(X_i, d_i) + (H − Ĥ)(X_i, d_i) + U_i = Ĥ(X_i, d_i) + V_i(X_i, d_i),

with the new error term V_i(X_i, d_i) combining two sources of variance: the covariance R of the old error U_i and the MSE of the kriging method. We consider in particular the whole sample Y = (Y_1, …, Y_n), which permits us to take into account the correlation between the predictions Ĥ(Z_i) and Ĥ(Z_j). Thus model (3) can be written (assuming p = 2 for simplicity):

(4)  Y = (Y_1^1, …, Y_n^1, Y_1^2, …, Y_n^2)^T = (Ĥ^1(Z_1), …, Ĥ^1(Z_n), Ĥ^2(Z_1), …, Ĥ^2(Z_n))^T + (V^1(Z_1), …, V^1(Z_n), V^2(Z_1), …, V^2(Z_n))^T = Ĥ(Z) + V(Z),

where Y_i^j denotes the j-th component of Y_i. We deduce its distribution given the variables Z = (Z_1, …, Z_n), with Z_i = (X_i, d_i), and the function values H_D of the design:

(5)  Y | Z, H_D ~ N( Ĥ(Z), R̃ + MSE(Z) ),

where R̃ is a diagonal variance matrix:

(6)  R̃ = diag( R_11, …, R_11, R_22, …, R_22 ),

with R_11 and R_22 each repeated n times and R_ii the i-th diagonal component of R; and MSE(Z) is a block diagonal matrix composed of the blocks MSE^1(Z) and MSE^2(Z), each of size n × n, where MSE^j(Z) is the variance-covariance matrix of H^j(Z).

The conditional distribution of the vector X ∈ R^{q1×n} knowing Y can then be specified:

(7)  π(X | Y, m, C, ρ, d, H_D) ∝ 1_Ω(X) |R̃ + MSE(Z)|^{−1/2} exp{ −(1/2) (Y − Ĥ(Z))^T (R̃ + MSE(Z))^{−1} (Y − Ĥ(Z)) − (1/2) Σ_{i=1}^n (X_i − m)^T C^{−1} (X_i − m) }.
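The kriging predictor Ĥ(z*) and its MSE can be sketched as follows for a one-dimensional toy function. The squared-exponential kernel, its parameters and the nugget term are illustrative choices, not those of our study; at the design points the predictor interpolates H_D and the MSE is essentially zero.

```python
import numpy as np

def sq_exp_kernel(A, B, length=0.3, var=1.0):
    """Squared-exponential covariance between two sets of row points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length ** 2)

def krige(design, h_design, z_new, length=0.3, var=1.0, nugget=1e-8):
    """Simple-kriging predictor Hhat(z*) = E[H(z*) | H_D] and its MSE(z*)."""
    K = sq_exp_kernel(design, design, length, var) + nugget * np.eye(len(design))
    k_star = sq_exp_kernel(z_new, design, length, var)
    pred = k_star @ np.linalg.solve(K, h_design)       # conditional expectation
    v = np.linalg.solve(K, k_star.T)
    mse = var - np.einsum('ij,ji->i', k_star, v)       # conditional variance
    return pred, np.maximum(mse, 0.0)

# one-dimensional toy design of 6 points and toy responses H_D
design = np.linspace(0.0, 1.0, 6)[:, None]
h_design = np.sin(3.0 * design[:, 0])
pred, mse = krige(design, h_design, design)  # interpolation: MSE ~ 0 at design points
```

Away from the design points, mse grows and quantifies the surrogate uncertainty that enters the error term V_i of model (3).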
3 Hybrid MCMC Algorithm

We now write the Gibbs algorithm explicitly. Given (m^[r], C^[r], X^[r]), for r = 0, 1, 2, …:

1. Generate C^[r+1] ~ IW( Λ + Σ_{i=1}^n (m^[r] − X_i^[r])(m^[r] − X_i^[r])^T + a (m^[r] − µ)(m^[r] − µ)^T, ν + n + 1 );

2. Generate m^[r+1] ~ N( (a/(n+a)) µ + (n/(n+a)) X̄^[r], C^[r+1]/(n+a) );

3. Generate X^[r+1]? The simulation of X^[r+1] consists of l iterations of a Metropolis-Hastings algorithm. Consequently, our Gibbs sampling algorithm becomes a so-called hybrid MCMC algorithm, following Tierney (1994), which simultaneously utilizes both Gibbs sampling steps and Metropolis-Hastings steps.

In detail, let X_0 = (X_{1,0}, …, X_{n,0}) = X^[r]. For s = 1, …, l, update X component by component (for i = 1, …, n):

1. Generate X*_{i,s} ~ J(· | X_{i,s−1}), where J is a proposal distribution.

2. Let

(8)  α(X_{i,s−1}, X*_{i,s}) = min( [π_Ĥ(X*_s | Y, θ^[r+1], ρ, d, H_D) J(X_{i,s−1} | X*_{i,s})] / [π_Ĥ(X_s | Y, θ^[r+1], ρ, d, H_D) J(X*_{i,s} | X_{i,s−1})], 1 ),

where X*_s = (X_{1,s}, …, X_{i−1,s}, X*_{i,s}, X_{i+1,s−1}, …, X_{n,s−1}) and X_s = (X_{1,s}, …, X_{i−1,s}, X_{i,s−1}, X_{i+1,s−1}, …, X_{n,s−1}).

3. Accept X_{i,s} = X*_{i,s} with probability α(X_{i,s−1}, X*_{i,s}); otherwise set X_{i,s} = X_{i,s−1}.

4. Update X_s = (X_{1,s}, …, X_{i,s}, X_{i+1,s−1}, …, X_{n,s−1}).

Finally, set X^[r+1] = X_l. In the MH algorithm, we consider a Gaussian proposal distribution J = N(m^[r+1], C^[r+1]).
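The Metropolis-within-Gibbs update of X can be sketched as follows. The surrogate Hhat, the fixed values of (m, C, R) and the data are all toy stand-ins; the MSE correction of Section 2 is omitted for brevity, and J = N(m, C) is implemented as an independence proposal, in which case the Hastings ratio of (8) involves J evaluated at the current and proposed points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ingredients (all illustrative): a cheap surrogate Hhat, a fixed theta = (m, C),
# and the observation-noise covariance R; the real algorithm uses the kriging predictor.
Hhat = lambda x, d: np.array([x[1] + 0.1 * d / x[0], 0.02 * d / x[0]])
m = np.array([30.0, 50.0]); C = np.diag([4.0, 0.25]); Cinv = np.linalg.inv(C)
R = np.diag([0.05, 0.01]); Rinv = np.linalg.inv(R)

def log_target(x, y, d):
    """Unnormalised log of the full conditional of one X_i (cf. (2)/(7),
    without the MSE correction for simplicity)."""
    r = y - Hhat(x, d)
    return -0.5 * (x - m) @ Cinv @ (x - m) - 0.5 * r @ Rinv @ r

def mh_update(x, y, d):
    """One Metropolis-Hastings step with the independence proposal J = N(m, C)."""
    x_star = rng.multivariate_normal(m, C)
    logJ = lambda x_: -0.5 * (x_ - m) @ Cinv @ (x_ - m)  # log J up to a constant
    log_alpha = (log_target(x_star, y, d) - log_target(x, y, d)
                 + logJ(x) - logJ(x_star))
    return x_star if np.log(rng.uniform()) < log_alpha else x

# Metropolis-within-Gibbs sweep over a small sample of n hidden inputs
n = 5
d = rng.uniform(500.0, 3000.0, size=n)
Y = np.stack([Hhat(m, di) for di in d])  # toy data generated at the prior mean
X = np.tile(m, (n, 1)).astype(float)
for s in range(100):                     # l = 100 inner MH iterations
    for i in range(n):
        X[i] = mh_update(X[i], Y[i], d[i])
```

In the full hybrid sampler, this inner loop is nested inside the Gibbs updates of C^[r+1] and m^[r+1] given above.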
4 Construction of an adaptive design

An important point for a kriging approximation is the choice of the design of experiments (DoE). In order to fill the whole field, we use a classic strategy called Latin Hypercube Sampling (LHS) - maximin. But this general method is not always satisfactory: sometimes the design points are relatively far from the true value of X, and sometimes the MSE may explode at the edge of the experimental field. We propose to construct a design adaptively by providing the additional information sequentially. The procedure is as follows:

1. Fix N_max as the calculation budget and a proportion p1 of points.
2. Choose a hypercubic domain Ω where the proxy is valid.
3. Build an LHS-maximin design D with p1 · N_max points in Ω.
4. Decide whether the design is satisfactory (quality criterion). If it is, we call the kriging predictor Ĥ and we run the Gibbs algorithm (with M-H).
5. If it is not, we add the other (1 − p1) · N_max points sequentially in the indicated area, according to an adaptive procedure, to improve the quality of the design.

The adaptive scheme requires many choices: the experimental field Ω; the proportion p1 of points selected for the LHS-maximin design; a criterion to measure the quality of a design; the adaptive strategy for adding the points, etc.

Small experiment: varying the number N_max of points. The number N_max of points of the design D is very important. If we increase this number, we improve the accuracy of the kriging approximation; but on the other hand, the computation requires more time. In the following numerical experiments, we consider three different cases: N_max = 100, 300, 500. Figure 1 gives an illustration: when we increase the number of points of the design D from 100 to 200, there is an obvious decrease of the MSE.

[Figure 1: 5% quantile, median and 95% quantile of the MSE with an increasing number N_max of points of D.]

Now we consider three LHS-maximin designs
with 100 points, 300 points and 500 points, and we compare the posterior distributions of the two parameters m and C obtained with these three kriging methods, based on a sample of size 1,000 drawn from the simulated Monte Carlo chain after the burn-in period. We can notice a great difference when we apply the 100-point kriging.
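The LHS-maximin designs used in these comparisons can be generated as sketched below: draw many Latin Hypercube samples and keep the one whose smallest pairwise distance is largest. The candidate count, dimensions and unit-cube scaling are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def lhs(n_pts, dim):
    """One Latin Hypercube Sample in [0, 1]^dim: one point per stratum per axis."""
    u = np.empty((n_pts, dim))
    for j in range(dim):
        u[:, j] = (rng.permutation(n_pts) + rng.uniform(size=n_pts)) / n_pts
    return u

def lhs_maximin(n_pts, dim, n_candidates=200):
    """Keep, among n_candidates LHS draws, the one maximising the minimal
    pairwise distance (a simple approximation of the maximin criterion)."""
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        cand = lhs(n_pts, dim)
        dists = np.linalg.norm(cand[:, None, :] - cand[None, :, :], axis=-1)
        score = np.min(dists + 1e9 * np.eye(n_pts))  # mask the zero diagonal
        if score > best_score:
            best, best_score = cand, score
    return best

design = lhs_maximin(100, 3)  # e.g. 100 points for z = (x1, x2, d), to be rescaled to Omega
```

Dedicated implementations (e.g. optimisation-based maximin LHS) give better designs than this random search, but the principle is the same.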
[Figures 2, 3, 4 and 5: posterior distributions of m_1, m_2, C_11 and C_22 obtained with the 100-, 300- and 500-point krigings.]

How do we choose the N_max design points adaptively? Here we give a detailed procedure:

1. Fix an experimental field Ω and generate a design D of p1 · N_max points (p1 = 90%, for example) in Ω according to the LHS-maximin strategy.

2. Grid the field Ω densely enough (50 points in each of x_1, x_2 and d, for example).

3. At each grid point (x_1, x_2, d) = (x, d), calculate the criterion value C_D(x, d), with two possible types:

(a) weighted criterion:

max_{(x,d) ∈ E} MSE(x, d)^α π(x)^{1−α},

where E denotes the grid, π(x) ∝ [1 + (x − µ)^T ((1 + 1/a) Λ)^{−1} (x − µ)]^{−(ν+1)/2} is the marginal prior density of X, and MSE(x, d) can be obtained easily with the package DACE;

(b) weighted criterion taking into account the y_i (at iteration k):

max_{(x,d) ∈ E} MSE(x, d)^α π(x | y_i, θ^[k])^{1−α},

where π(x | y_i, θ^[k]) has to be updated at each iteration of the MCMC.
4. Complete the design sequentially with the (1 − p1) · N_max additional points which maximise the criterion quantity.

For the moment, we use only the first criterion, in logarithmic form:

C_D(x, d) = α log(MSE(x, d)) − ((ν + 1)/2) (1 − α) log[ 1 + (x − µ)^T ((1 + 1/a) Λ)^{−1} (x − µ) ],

where α can be chosen between 0 and 1. If we take p1 = 90% and N_max = 100, with different α we obtain the graphs in Figures 6, 7 and 8. Figure 6 represents the design D with 90 points generated by LHS-maximin plus 10 additional points with the biggest MSE value; in Figures 7 and 8, we add the 10 points according to the MSE value as well as the prior distribution of X, where the prior mean µ is set to [40, 48] in all cases, but with weight α = 0.8 for Figure 7 and α = 0.5 for Figure 8. The strategy of adding points does allow us to diminish the kriging uncertainty. We can compare the final result, the posterior distributions of the mean parameter m (Figures 9 and 10), in the three cases, using a 500-point kriging result as a reference. We find that in our example, with α = 1 (where we only consider the MSE impact), the simulation results satisfy us the most.

[Figures 6, 7 and 8: designs obtained with 90% LHS plus 10% adaptively added points, for α = 1, α = 0.8 and α = 0.5 respectively. Figures 9 and 10: posterior distributions of m_1 and m_2 for the different values of α, with the 500-point kriging as reference.]

Why did we choose p1 = 90%?
In fact, in order to make the best choice of p1, we repeat the experiment with different p1 and we obtain Figure 11. Obviously, p1 = 80%-90% seems to be a good solution.

Remark: when we add the (1 − p1) · N_max points to D, we have to take their physical positions into account to avoid adding all the points in the same area. So we construct the DoE sequentially, adding one new point and recalculating the criterion C_D each time.
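The sequential enrichment of the Remark can be sketched as follows, assuming the logarithmic criterion above with a distance-based proxy in place of the true kriging MSE (the real procedure would query the kriging variance, e.g. through DACE); the initial design, the grid resolution and all numerical settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative settings (all hypothetical): 2-D input x, prior mean mu,
# scale matrix Lambda, degrees of freedom nu, hyperparameter a, weight alpha.
mu = np.array([0.5, 0.5]); Lam = np.eye(2); nu, a, alpha = 5.0, 1.0, 0.8
Sinv = np.linalg.inv((1.0 + 1.0 / a) * Lam)

grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                            np.linspace(0, 1, 50)), -1).reshape(-1, 2)
design = rng.uniform(size=(90, 2))  # stand-in for the initial 90% LHS-maximin points

def mse_proxy(points, design):
    """Stand-in for the kriging MSE: squared distance to the nearest design point."""
    d2 = ((points[:, None, :] - design[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1) + 1e-12

def criterion(points, design):
    """Logarithmic form of the weighted criterion C_D(x, d)."""
    quad = np.einsum('ij,jk,ik->i', points - mu, Sinv, points - mu)
    return (alpha * np.log(mse_proxy(points, design))
            - 0.5 * (nu + 1) * (1 - alpha) * np.log1p(quad))

# add the remaining 10 points one at a time, re-evaluating C_D after each addition
for _ in range(10):
    best = grid[np.argmax(criterion(grid, design))]
    design = np.vstack([design, best])
```

Because the criterion collapses at a freshly added point, each of the 10 additions lands in a new region, which is exactly the behaviour the sequential recalculation is meant to enforce.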
[Figure 11: MSE as a function of the percentage of points added sequentially.]

To illustrate the advantage of this adaptive idea, we compare the experimental results of the 100-point adaptive kriging methods, always using a 500-point kriging method as a reference (Figures 12, 13, 14 and 15). There is a great improvement when we apply the adaptive kriging method with 100 points, especially for m_2 and C_22.

[Figures 12, 13, 14 and 15: posterior distributions of m_1, m_2, C_11 and C_22, comparing the adaptive and classic 100-point krigings with the 500-point reference.]

REFERENCES

[1] Robert, C.P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer.
[2] Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. (2004). Bayesian Data Analysis, Second Edition. Chapman & Hall/CRC.
[3] Celeux, G., Grimaud, A., Lefebvre, Y. and De Rocquigny, E. (2009). Identifying variability in multivariate systems through linearised inverse methods. Inverse Problems in Science & Engineering, to appear.
[4] Barbillon, P., Celeux, G., Grimaud, A., Lefebvre, Y. and De Rocquigny, E. (2009). Non linear methods for inverse statistical problems. INRIA research report.
[5] Bettinger, R. (2009). Inversion d'un système par krigeage [Inversion of a system by kriging]. PhD thesis, Nice-Sophia Antipolis University.
[6] Dubourg, V. and Deheeger, F. (2011). Une alternative à la substitution pour les méta-modèles en analyse de fiabilité [An alternative to substitution for meta-models in reliability analysis]. Journées Nationales de la Fiabilité, Toulouse.
[7] Picheny, V., Ginsbourger, D., Roustant, O., Haftka, R.T. and Kim, N.-H. Adaptive designs of experiments for accurate approximation of a target region.
[8] Williams, B., Santner, T. and Notz, W. (2000). Sequential design of computer experiments to minimize integrated response functions. Statistica Sinica.
[9] Scheidt, C. (2006). Analyse statistique d'expériences simulées : modélisation adaptative de réponses non-régulières par krigeage et plans d'expériences [Statistical analysis of simulated experiments: adaptive modelling of non-regular responses by kriging and designs of experiments]. PhD thesis, Louis Pasteur University, Strasbourg.
[10] Bastos, L.S. and O'Hagan, A. (2009). Diagnostics for Gaussian process emulators. University of Sheffield.