Introduction to Gaussian-process based Kriging models for metamodeling and validation of computer codes


1 Introduction to Gaussian-process based Kriging models for metamodeling and validation of computer codes. François Bachoc, Department of Statistics and Operations Research, University of Vienna (former PhD student at CEA-Saclay, DEN, DM2S, STMF, LGLS, F-91191 Gif-Sur-Yvette, France and LPMA, Université Paris 7). LRC MANON - INSTN Saclay - March 2014.

2 Outline: 1. Introduction to Gaussian processes; 2. Kriging prediction (conditional distribution and Gaussian conditioning theorem; Kriging prediction); 3. Application to metamodeling of the GERMINAL code; 4. Application to validation of the FLICA 4 thermal-hydraulic code.

3 Random variables and vectors. Remark: a random variable $X$ is a random number, defined by a probability density function $f_X : \mathbb{R} \to \mathbb{R}_+$ for which, for $a \leq b \in \mathbb{R}$: "probability of $a \leq X \leq b$" $= \int_a^b f_X(x)\,dx$. Similarly, a random vector $V = (V_1, \ldots, V_n)^t$ is a vector of random variables. It is also defined by a probability density function $f_V : \mathbb{R}^n \to \mathbb{R}_+$ for which, for $E \subset \mathbb{R}^n$: "probability of $V \in E$" $= \int_E f_V(v)\,dv$. Naturally we have $\int_{-\infty}^{+\infty} f_X(x)\,dx = \int_{\mathbb{R}^n} f_V(v)\,dv = 1$.

4 Mean, variance and covariance. The mean of a random variable $X$ with density $f_X$ is denoted $E(X)$ and is $E(X) = \int_{-\infty}^{+\infty} x f_X(x)\,dx$. The variance of $X$ is denoted $\mathrm{var}(X)$ and is $\mathrm{var}(X) = E\{(X - E(X))^2\}$. If $\mathrm{var}(X)$ is large, $X$ can be far from its mean (more uncertainty); if $\mathrm{var}(X)$ is small, $X$ is close to its mean (less uncertainty). Let $X, Y$ be two random variables. The covariance between $X$ and $Y$ is denoted $\mathrm{cov}(X, Y)$ and is $\mathrm{cov}(X, Y) = E\{(X - E(X))(Y - E(Y))\}$. If $|\mathrm{cov}(X, Y)| \approx \sqrt{\mathrm{var}(X)\mathrm{var}(Y)}$, $X$ and $Y$ are almost proportional to one another. If $|\mathrm{cov}(X, Y)| \ll \sqrt{\mathrm{var}(X)\mathrm{var}(Y)}$, $X$ and $Y$ are almost independent (when they are Gaussian).

5 Mean vector and covariance matrix. Let $V = (V_1, \ldots, V_n)^t$ be a random vector. The mean vector of $V$ is denoted $E(V)$ and is the $n \times 1$ vector defined by $(E(V))_i = E(V_i)$. The covariance matrix of $V$ is denoted $\mathrm{cov}(V)$ and is the $n \times n$ matrix defined by $(\mathrm{cov}(V))_{i,j} = \mathrm{cov}(V_i, V_j)$. The diagonal terms show which components are the most uncertain. The off-diagonal terms show the dependence between the components.

6 Stochastic processes. A stochastic process is a function $Z : \mathbb{R}^n \to \mathbb{R}$ such that each value $Z(x)$ is a random variable. Alternatively, a stochastic process is a function on $\mathbb{R}^n$ that is unknown, or that depends on underlying random phenomena. We make the randomness of $Z(x)$ explicit by writing it $Z(\omega, x)$ with $\omega$ in a probability space $\Omega$. For a given $\omega_0$, we call the function $x \mapsto Z(\omega_0, x)$ a realization of the stochastic process $Z$. Mean function: $M : x \mapsto M(x) = E(Z(x))$. Covariance function: $C : (x_1, x_2) \mapsto C(x_1, x_2) = \mathrm{cov}(Z(x_1), Z(x_2))$.

7 Gaussian variables and vectors. A random variable $X$ is a Gaussian variable with mean $\mu$ and variance $\sigma^2 > 0$ when its probability density function is $f_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. An $n$-dimensional random vector $V$ is a Gaussian vector with mean vector $m$ and invertible covariance matrix $R$ when its multidimensional probability density function is $f_{m,R}(v) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(R)}} \exp\left(-\frac{1}{2}(v-m)^t R^{-1}(v-m)\right)$. E.g. for Gaussian variables: $\mu$ and $\sigma^2$ are both the parameters of the probability density function and its mean and variance. That is, $\int_{-\infty}^{+\infty} x f_{\mu,\sigma^2}(x)\,dx = \mu$ and $\int_{-\infty}^{+\infty} (x-\mu)^2 f_{\mu,\sigma^2}(x)\,dx = \sigma^2$.

8 Gaussian processes. A stochastic process $Z$ on $\mathbb{R}^d$ is a Gaussian process when, for all $x_1, \ldots, x_n$, the random vector $(Z(x_1), \ldots, Z(x_n))$ is Gaussian. A Gaussian process is characterized by its mean and covariance functions. Why are Gaussian processes convenient? The Gaussian distribution is reasonable for modeling a large variety of random variables; Gaussian processes are simple to define and simulate; they are characterized by their mean and covariance functions; as we will see, Gaussian properties simplify the resolution of problems; and Gaussian processes have been the most studied theoretically.

9 Example of the Matérn 3/2 covariance function on $\mathbb{R}$. The Matérn 3/2 covariance function, for a Gaussian process on $\mathbb{R}$, is parameterized by a variance parameter $\sigma^2 > 0$ and a correlation length parameter $\ell > 0$. It is defined as $C(x_1, x_2) = \sigma^2 \left(1 + \frac{\sqrt{6}\,|x_1 - x_2|}{\ell}\right) e^{-\frac{\sqrt{6}\,|x_1 - x_2|}{\ell}}$. [Figure: covariance against $|x_1 - x_2|$ for $\ell = 0.5, 1, 2$.] Interpretation: the Matérn 3/2 function is stationary, $C(x_1 + h, x_2 + h) = C(x_1, x_2)$, so the behavior of the corresponding Gaussian process is invariant by translation. $\sigma^2$ corresponds to the order of magnitude of the functions that are realizations of the Gaussian process. $\ell$ corresponds to the speed of variation of the functions that are realizations of the Gaussian process.
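A minimal numerical sketch of this covariance function and of sampling realizations of the corresponding zero-mean Gaussian process (NumPy; the $\sqrt{6}$ parameterization follows the formula above, and the grid and seed are arbitrary):

```python
import numpy as np

def matern_32(x1, x2, sigma2=1.0, ell=1.0):
    """Matern 3/2 covariance between scalar inputs (sqrt(6) convention)."""
    h = np.sqrt(6.0) * np.abs(x1 - x2) / ell
    return sigma2 * (1.0 + h) * np.exp(-h)

# Covariance matrix on a grid, then one realization per correlation length,
# as illustrated on the next slide.
x = np.linspace(0.0, 1.0, 200)
rng = np.random.default_rng(0)
for ell in (0.5, 1.0, 2.0):
    K = matern_32(x[:, None], x[None, :], sigma2=1.0, ell=ell)
    # small jitter on the diagonal for numerical stability of the Cholesky factor
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(x)))
    z = L @ rng.standard_normal(len(x))   # one realization of the process
    print(ell, z[:3])
```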

10 The Matérn 3/2 covariance function on $\mathbb{R}$: illustration of $\ell$. [Figure: realizations of a Gaussian process having the Matérn 3/2 covariance function, for $\sigma^2 = 1$ and $\ell = 0.5, 1, 2$ from left to right.]

11 The Matérn 3/2 covariance function: generalization to $\mathbb{R}^d$. We now consider a Gaussian process on $\mathbb{R}^d$. The corresponding multidimensional Matérn 3/2 covariance function is parameterized by a variance parameter $\sigma^2 > 0$ and $d$ correlation length parameters $\ell_1 > 0, \ldots, \ell_d > 0$. It is defined as $C(x, y) = \sigma^2 \left(1 + \sqrt{6}\,\|x - y\|_{\ell_1,\ldots,\ell_d}\right) e^{-\sqrt{6}\,\|x - y\|_{\ell_1,\ldots,\ell_d}}$ with $\|x - y\|_{\ell_1,\ldots,\ell_d} = \sqrt{\sum_{i=1}^d \frac{(x_i - y_i)^2}{\ell_i^2}}$. Interpretation: still stationary; $\sigma^2$ still drives the order of magnitude of the realizations; $\ell_1, \ldots, \ell_d$ correspond to the speed of variation of the realizations $x \mapsto Z(\omega, x)$ when only the corresponding variable $x_1, \ldots, x_d$ varies. When $\ell_i$ is particularly small, the variable $x_i$ is particularly important: this gives a hierarchy of the input variables $x_1, \ldots, x_d$ according to their correlation lengths $\ell_1, \ldots, \ell_d$.
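The anisotropic multidimensional form can be sketched as follows (again a non-authoritative illustration; the helper name is invented and the $\sqrt{6}$ convention is taken from the formula above):

```python
import numpy as np

def matern_32_aniso(X, Y, sigma2=1.0, ells=None):
    """Anisotropic Matern 3/2 covariance matrix between two sets of points.

    X: (n, d) array, Y: (m, d) array, ells: length-d vector of correlation lengths.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    ells = np.ones(X.shape[1]) if ells is None else np.asarray(ells, dtype=float)
    diff = (X[:, None, :] - Y[None, :, :]) / ells        # scaled differences
    h = np.sqrt(6.0) * np.sqrt(np.sum(diff ** 2, axis=-1))
    return sigma2 * (1.0 + h) * np.exp(-h)

# A small correlation length in one direction makes that input more influential:
X = np.random.default_rng(1).uniform(size=(5, 2))
print(matern_32_aniso(X, X, sigma2=1.0, ells=[0.2, 2.0]).round(3))
```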

12 Summary. A Gaussian process can be seen as a random phenomenon yielding realizations, i.e. specific functions $\mathbb{R}^d \to \mathbb{R}$. The standard probability tools enable us to model and quantify the uncertainty we have on these realizations. The choice of the covariance function (e.g. Matérn 3/2) makes it possible to synthesize the information we have (or get) on the nature of the realizations with a small number of parameters.

13 Outline: 1. Introduction to Gaussian processes; 2. Kriging prediction (conditional distribution and Gaussian conditioning theorem; Kriging prediction); 3. Application to metamodeling of the GERMINAL code; 4. Application to validation of the FLICA 4 thermal-hydraulic code.

14 Conditional probability density function. Consider a partitioned random vector $(Y_1, Y_2)^t$ of size $(n_1 + 1) \times 1$, with probability density function $f_{Y_1,Y_2} : \mathbb{R}^{n_1+1} \to \mathbb{R}_+$. Then $Y_1$ has the probability density function $f_{Y_1}(y_1) = \int_{\mathbb{R}} f_{Y_1,Y_2}(y_1, y_2)\,dy_2$. The conditional probability density function of $Y_2$ given $Y_1 = y_1$ is then $f_{Y_2 | Y_1 = y_1}(y_2) = \frac{f_{Y_1,Y_2}(y_1, y_2)}{f_{Y_1}(y_1)}$. Interpretation: it is the continuous generalization of the Bayes formula $P(A|B) = \frac{P(A, B)}{P(B)}$.

15 Conditional mean. Consider a partitioned random vector $(Y_1, Y_2)^t$ of size $(n_1 + 1) \times 1$, with conditional probability density function of $Y_2$ given $Y_1 = y_1$ given by $f_{Y_2 | Y_1 = y_1}(y_2)$. Then the conditional mean of $Y_2$ given $Y_1 = y_1$ is $E(Y_2 | Y_1 = y_1) = \int_{\mathbb{R}} y_2 f_{Y_2 | Y_1 = y_1}(y_2)\,dy_2$. $E(Y_2 | Y_1 = y_1)$ is in fact a function of $y_1$; applied to the random $Y_1$, it is thus itself a random variable, which we emphasize by writing $E(Y_2 | Y_1)$. Hence $E(Y_2 | Y_1 = y_1)$ is a realization of $E(Y_2 | Y_1)$. Optimality: the function $y_1 \mapsto E(Y_2 | Y_1 = y_1)$ is the best prediction of $Y_2$ we can make when observing only $Y_1$. That is, for any function $f : \mathbb{R}^{n_1} \to \mathbb{R}$: $E\{(Y_2 - f(Y_1))^2\} \geq E\{(Y_2 - E(Y_2 | Y_1))^2\}$.

16 Conditional variance. Consider a partitioned random vector $(Y_1, Y_2)^t$ of size $(n_1 + 1) \times 1$, with conditional probability density function of $Y_2$ given $Y_1 = y_1$ given by $f_{Y_2 | Y_1 = y_1}(y_2)$. Then the conditional variance of $Y_2$ given $Y_1 = y_1$ is $\mathrm{var}(Y_2 | Y_1 = y_1) = \int_{\mathbb{R}} (y_2 - E(Y_2 | Y_1 = y_1))^2 f_{Y_2 | Y_1 = y_1}(y_2)\,dy_2$. Summary: the conditional mean $E(Y_2 | Y_1)$ is the best possible prediction of $Y_2$ given $Y_1$; the conditional probability density function $y_2 \mapsto f_{Y_2 | Y_1 = y_1}(y_2)$ gives the probability density function of the corresponding error (most probable value, probability of threshold exceedance, ...); the conditional variance $\mathrm{var}(Y_2 | Y_1 = y_1)$ summarizes the order of magnitude of the prediction error.

17 Gaussian conditioning theorem. Theorem: let $(Y_1, Y_2)^t$ be a $(n_1 + 1) \times 1$ Gaussian vector with mean vector $(m_1^t, \mu_2)^t$ and covariance matrix $\begin{pmatrix} R_1 & r_{1,2} \\ r_{1,2}^t & \sigma_2^2 \end{pmatrix}$. Then, conditionally on $Y_1 = y_1$, $Y_2$ is a Gaussian variable with mean $E(Y_2 | Y_1 = y_1) = \mu_2 + r_{1,2}^t R_1^{-1}(y_1 - m_1)$ and variance $\mathrm{var}(Y_2 | Y_1 = y_1) = \sigma_2^2 - r_{1,2}^t R_1^{-1} r_{1,2}$. Illustration: when $(Y_1, Y_2)^t$ is a $2 \times 1$ Gaussian vector with mean vector $(\mu_1, \mu_2)^t$ and covariance matrix $\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, then $E(Y_2 | Y_1 = y_1) = \mu_2 + \rho(y_1 - \mu_1)$ and $\mathrm{var}(Y_2 | Y_1 = y_1) = 1 - \rho^2$.
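A quick numerical sanity check of these conditioning formulas (a sketch with NumPy; the covariance matrix, mean vector, and observed value are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n1 = 3
# an arbitrary (n1+1) x (n1+1) positive-definite covariance matrix and mean vector
A = rng.standard_normal((n1 + 1, n1 + 1))
cov = A @ A.T + np.eye(n1 + 1)
mean = rng.standard_normal(n1 + 1)

R1, r12, s22 = cov[:n1, :n1], cov[:n1, n1], cov[n1, n1]
m1, mu2 = mean[:n1], mean[n1]

y1 = rng.standard_normal(n1)                    # an observed value of Y_1
cond_mean = mu2 + r12 @ np.linalg.solve(R1, y1 - m1)
cond_var = s22 - r12 @ np.linalg.solve(R1, r12)
print(cond_mean, cond_var)                      # conditional law of Y_2 given Y_1 = y1
```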

18 Outline: 1. Introduction to Gaussian processes; 2. Kriging prediction (conditional distribution and Gaussian conditioning theorem; Kriging prediction); 3. Application to metamodeling of the GERMINAL code; 4. Application to validation of the FLICA 4 thermal-hydraulic code.

19 A problem of function approximation. We want to approximate a deterministic function from a finite number of observed values of it. [Figure: observed points of an unknown function of $x$.] A possibility: deterministic approximation (polynomial regression, neural networks, splines, RKHS, ...), for which we can have a deterministic error bound. With a Kriging model: a stochastic method, which gives a stochastic error bound.

20 Kriging model with Gaussian process realizations. Kriging model: representing the deterministic and unknown function by a realization of a Gaussian process. [Figure: $y$ versus $x$.] Bayesian interpretation: in statistics, a Bayesian model generally consists in representing a deterministic and unknown number by the realization of a random variable (this enables us to incorporate expert knowledge, gives access to the Bayes formula, ...). Here, we do the same with functions.

21 Kriging prediction. We let $Y$ be the Gaussian process, on $\mathbb{R}^d$. $Y$ is observed at $x_1, \ldots, x_n \in \mathbb{R}^d$. We consider here that we know the covariance function $C$ of $Y$, and that the mean function of $Y$ is zero. Notations: let $Y_n = (Y(x_1), \ldots, Y(x_n))^t$ be the observation vector; it is a Gaussian vector. Let $R$ be the $n \times n$ covariance matrix of $Y_n$: $(R)_{i,j} = C(x_i, x_j)$. Let $x_{new} \in \mathbb{R}^d$ be a new input point for the Gaussian process $Y$; we want to predict $Y(x_{new})$. Let $r$ be the $n \times 1$ covariance vector between $Y_n$ and $Y(x_{new})$: $r_i = C(x_i, x_{new})$. Then the Gaussian conditioning theorem gives the conditional mean of $Y(x_{new})$ given the observed values in $Y_n$: $\hat{y}(x_{new}) := E(Y(x_{new}) | Y_n) = r^t R^{-1} Y_n$. We also have the conditional variance: $\hat{\sigma}^2(x_{new}) := \mathrm{var}(Y(x_{new}) | Y_n) = C(x_{new}, x_{new}) - r^t R^{-1} r$.
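These two formulas translate directly into code. A minimal sketch (NumPy, reusing a Matérn 3/2 covariance as above; the observed function is an arbitrary stand-in):

```python
import numpy as np

def matern_32(x1, x2, sigma2=1.0, ell=0.3):
    h = np.sqrt(6.0) * np.abs(x1 - x2) / ell
    return sigma2 * (1.0 + h) * np.exp(-h)

def krige(x_obs, y_obs, x_new, cov=matern_32):
    """Simple Kriging (known covariance, zero mean): conditional mean and variance."""
    R = cov(x_obs[:, None], x_obs[None, :])
    r = cov(x_obs[:, None], x_new[None, :])            # shape (n, n_new)
    alpha = np.linalg.solve(R, y_obs)                  # R^{-1} Y_n
    mean = r.T @ alpha
    var = cov(x_new, x_new) - np.sum(r * np.linalg.solve(R, r), axis=0)
    return mean, var

x_obs = np.linspace(0.0, 1.0, 8)
y_obs = np.sin(4.0 * x_obs)                            # stand-in for the unknown function
x_new = np.linspace(0.0, 1.0, 101)
mean, var = krige(x_obs, y_obs, x_new)
print(mean[:3], var[:3])
```

Evaluating this at the observation points returns the observed values with zero variance, and far from them the prediction reverts to the prior mean 0 with variance $\sigma^2$, matching the interpretation on the next slide.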

22 Kriging prediction: interpretation. Exact reproduction of known values: assume $x_{new} = x_1$. Then $R_{i,1} = C(x_i, x_1) = C(x_i, x_{new}) = r_i$, i.e. $r$ is the first column of $R$. Thus $r^t R^{-1} = (1, 0, \ldots, 0)$ and $r^t R^{-1} Y_n = (1, 0, \ldots, 0)(Y(x_1), \ldots, Y(x_n))^t = Y(x_1)$. Conservative extrapolation: let $x_{new}$ be far from $x_1, \ldots, x_n$. Then we generally have $r_i = C(x_i, x_{new}) \approx 0$. Thus $\hat{y}(x_{new}) = r^t R^{-1} Y_n \approx 0$ and, conservatively, $\hat{\sigma}^2(x_{new}) = C(x_{new}, x_{new}) - r^t R^{-1} r \approx C(x_{new}, x_{new})$.

23 Illustration of Kriging prediction. [Figure: observations.]

24 Illustration of Kriging prediction. [Figure: observations; realizations from the conditional distribution given $Y_n$.]

25 Illustration of Kriging prediction. [Figure: observations; realizations from the conditional distribution given $Y_n$; conditional mean $x_{new} \mapsto E(Y(x_{new}) | Y_n)$.]

26 Illustration of Kriging prediction. [Figure: observations; realizations from the conditional distribution given $Y_n$; conditional mean $x_{new} \mapsto E(Y(x_{new}) | Y_n)$; 95% confidence intervals.]

27 Kriging prediction with measurement error. It can be desirable not to reproduce the observed values exactly: when the observations come from experiments (variability of the response for a fixed input point), or, even when the response is fixed for a given input point, when it can vary strongly between very close input points. Observations with measurement error: we consider that at $x_1, \ldots, x_n$ we observe $Y(x_1) + \epsilon_1, \ldots, Y(x_n) + \epsilon_n$, where $\epsilon_1, \ldots, \epsilon_n$ are independent Gaussian variables with mean 0 and known variance $\sigma^2_{mes}$. We let $Y_n = (Y(x_1) + \epsilon_1, \ldots, Y(x_n) + \epsilon_n)^t$. Then the Gaussian conditioning theorem still gives the conditional mean of $Y(x_{new})$ given the observed values in $Y_n$: $\hat{y}(x_{new}) := E(Y(x_{new}) | Y_n) = r^t (R + \sigma^2_{mes} I_n)^{-1} Y_n$. We also have the conditional variance: $\hat{\sigma}^2(x_{new}) := \mathrm{var}(Y(x_{new}) | Y_n) = C(x_{new}, x_{new}) - r^t (R + \sigma^2_{mes} I_n)^{-1} r$.
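In code this amounts to adding the nugget term $\sigma^2_{mes} I_n$ to the covariance matrix of the observations. A sketch modifying the kriging function above (function and variable names are illustrative, not from the original):

```python
import numpy as np

def krige_with_noise(x_obs, y_obs, x_new, cov, sigma2_mes):
    """Kriging with a known Gaussian measurement-error variance (nugget term)."""
    R = cov(x_obs[:, None], x_obs[None, :]) + sigma2_mes * np.eye(len(x_obs))
    r = cov(x_obs[:, None], x_new[None, :])
    mean = r.T @ np.linalg.solve(R, y_obs)
    var = cov(x_new, x_new) - np.sum(r * np.linalg.solve(R, r), axis=0)
    return mean, var

# usage with a Matern 3/2 covariance and noisy observations of an arbitrary function
cov = lambda a, b: (1 + np.sqrt(6) * np.abs(a - b)) * np.exp(-np.sqrt(6) * np.abs(a - b))
x_obs = np.linspace(0.0, 1.0, 8)
y_obs = np.sin(4.0 * x_obs) + 0.05 * np.random.default_rng(0).standard_normal(8)
print(krige_with_noise(x_obs, y_obs, np.array([0.5]), cov, sigma2_mes=0.05**2))
```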

28 Illustration of Kriging prediction with measurement error. [Figure: observations.]

29 Illustration of Kriging prediction with measurement error. [Figure: observations; realizations from the conditional distribution given $Y_n$.]

30 Illustration of Kriging prediction with measurement error. [Figure: observations; realizations from the conditional distribution given $Y_n$; conditional mean $x_{new} \mapsto E(Y(x_{new}) | Y_n)$.]

31 Illustration of Kriging prediction with measurement error. [Figure: observations; realizations from the conditional distribution given $Y_n$; conditional mean $x_{new} \mapsto E(Y(x_{new}) | Y_n)$; 95% confidence intervals.]

32 Covariance function estimation. In most practical cases, the covariance function $(x_1, x_2) \mapsto C(x_1, x_2)$ is unknown, and it is important to choose it correctly. In practice, it is first constrained to a parametric family of the form $\{C_\theta, \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^p$, e.g. the multidimensional Matérn 3/2 covariance function model on $\mathbb{R}^d$, with $\theta = (\sigma^2, \ell_1, \ldots, \ell_d)$. Then, most classically, the covariance parameter $\theta$ is automatically selected by Maximum Likelihood. In the case without measurement errors: let $Y_n = (Y(x_1), \ldots, Y(x_n))^t$ be the $n \times 1$ observation vector, and let $R_\theta$ be the $n \times n$ covariance matrix of $Y_n$ under covariance parameter $\theta$: $(R_\theta)_{i,j} = C_\theta(x_i, x_j)$. The Maximum Likelihood estimator $\hat{\theta}_{ML}$ of $\theta$ is then $\hat{\theta}_{ML} \in \mathrm{argmax}_{\theta \in \Theta}\; \frac{1}{(2\pi)^{n/2}\sqrt{\det(R_\theta)}} \exp\left(-\frac{1}{2} Y_n^t R_\theta^{-1} Y_n\right)$. We maximize the Gaussian probability density function of the observation vector, as a function of the covariance parameter. This is a numerical optimization problem, where the cost function has a $O(n^3)$ computational cost.
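In practice one minimizes the negative log-likelihood. A hedged sketch with NumPy/SciPy (the anisotropic Matérn helper and the data are placeholders, not taken from the original presentation):

```python
import numpy as np
from scipy.optimize import minimize

def matern_32_aniso(X, Y, sigma2, ells):
    diff = (X[:, None, :] - Y[None, :, :]) / ells
    h = np.sqrt(6.0) * np.sqrt(np.sum(diff ** 2, axis=-1))
    return sigma2 * (1.0 + h) * np.exp(-h)

def neg_log_likelihood(log_theta, X, y):
    """-log Gaussian density of y (up to a constant), theta = (sigma2, l_1, ..., l_d)."""
    sigma2, ells = np.exp(log_theta[0]), np.exp(log_theta[1:])
    R = matern_32_aniso(X, X, sigma2, ells) + 1e-8 * np.eye(len(y))   # jitter
    L = np.linalg.cholesky(R)                        # O(n^3) cost per evaluation
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L)))

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2               # stand-in for observed values
res = minimize(neg_log_likelihood, x0=np.zeros(3), args=(X, y), method="L-BFGS-B")
print(np.exp(res.x))                                 # estimated (sigma2, l_1, l_2)
```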

33 Summary for Kriging prediction. Most classical method:
1. Gather the observed values in the vector $Y_n$.
2. Choose a covariance function family, parameterized by $\theta$, generally before investigating the observed values in detail and from a limited number of classical options (e.g. Matérn 3/2).
3. Optimize the Maximum Likelihood criterion w.r.t. $\theta$, yielding $\hat{\theta}_{ML}$ (numerical optimization: gradient, quasi-Newton, genetic algorithm, ...; potential condition-number problems).
4. In the sequel, proceed as if the estimated covariance function $C_{\hat{\theta}_{ML}}(x_1, x_2)$ were the true covariance function (plug-in method).
5. Compute the conditional mean $x_{new} \mapsto E(Y(x_{new}) | Y_n)$ and the conditional variance $x_{new} \mapsto \mathrm{var}(Y(x_{new}) | Y_n)$ with explicit matrix-vector formulas (Gaussian conditioning theorem).
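For reference, this plug-in workflow (steps 1 to 5 above) is also available in off-the-shelf libraries. A sketch using scikit-learn, assuming it is installed (the data here are placeholders; this is one possible implementation, not the one used in the presentation):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.standard_normal(50)

# Matern 3/2 kernel with one length scale per input, a variance factor and a nugget;
# the hyperparameters are fitted by maximum likelihood inside .fit()
kernel = ConstantKernel() * Matern(length_scale=[1.0, 1.0], nu=1.5) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_new = rng.uniform(size=(5, 2))
mean, std = gp.predict(X_new, return_std=True)       # conditional mean and std deviation
print(gp.kernel_)                                     # fitted covariance parameters
print(mean, std)
```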

34 Outline: 1. Introduction to Gaussian processes; 2. Kriging prediction (conditional distribution and Gaussian conditioning theorem; Kriging prediction); 3. Application to metamodeling of the GERMINAL code; 4. Application to validation of the FLICA 4 thermal-hydraulic code.

35 GERMINAL code: presentation. Context: the GERMINAL code simulates the thermal-mechanical impact of the irradiation on a nuclear fuel pin. Its utilization is part of a multi-physics and multi-objective optimization problem for reactor core design. In collaboration with Karim Ammar (PhD student, CEA, DEN).

36 GERMINAL code: inputs and outputs. 12 inputs $x_1, \ldots, x_{12} \in [0, 1]$ (normalization): $x_1, x_2$ are schedule parameters for the exploitation of the fuel pin; $x_3, \ldots, x_8$ are nature parameters of the fuel pin (geometry, plutonium concentration); $x_9, x_{10}, x_{11}$ are parameters for the characterization of the power map in the fuel pin; $x_{12}$ is the disposal volume for the fission gas produced in the fuel pin. 2 scalar variables of interest: $g_1$, the initial temperature (maximum, over space, of the temperature at the initial time; rather simple to approximate), and $g_2$, the fusion margin (minimum difference, over space and time, between the fusion temperature of the fuel and the current temperature; more difficult to approximate). General scheme: 12 scalar inputs -> GERMINAL run -> spatio-temporal maps -> 2 scalar outputs. We want to approximate 2 functions $g_1, g_2 : \mathbb{R}^{12} \to \mathbb{R}$.

37 GERMINAL code: data bases and measurement error. Data bases: for the first output $g_1$, we have a learning base of $n = 15722$ points (couples $(x_1, g_1(x_1)), \ldots, (x_n, g_1(x_n))$ with $x_i \in \mathbb{R}^{12}$) and a test base of $n_{test} = 6521$ elements. For the second output $g_2$, we have $n = 3807$ and $n_{test} = 1613$. Measurement errors: the GERMINAL computation scheme (GERMINAL + pre- and post-treatment) had not previously been used for so many inputs, which leads to numerical instabilities (some very close inputs can give significantly distant outputs). We incorporate the measurement error parameter $\sigma^2_{mes}$ to model the numerical instabilities (estimated by Maximum Likelihood, together with the covariance function parameters).

38 GERMINAL code: metamodels and performance indicators. Metamodel: a metamodel of $g$ is a function $\hat{g} : [0, 1]^{12} \to \mathbb{R}$ that is built using the learning base only. We consider 2 metamodels: the Kriging conditional mean (with Matérn 3/2 covariance function and measurement error variance estimated by Maximum Likelihood), and a neural-network method from the uncertainty platform URANIE. Once built, the cost of computing $\hat{g}(x_{new})$ for a new $x_{new} \in [0, 1]^{12}$ is very small compared to a GERMINAL run. Error indicator: Root Mean Square Error (RMSE) on the test base: $\mathrm{RMSE} = \sqrt{\frac{1}{n_{test}} \sum_{i=1}^{n_{test}} (\hat{g}(x_{test,i}) - g(x_{test,i}))^2}$.
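As a reminder of what this indicator computes, a tiny sketch (array names and values are illustrative, not the GERMINAL data):

```python
import numpy as np

def rmse(g_hat_test, g_test):
    """Root mean square error of metamodel predictions on a test base."""
    return float(np.sqrt(np.mean((np.asarray(g_hat_test) - np.asarray(g_test)) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```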

39 Results. For the initial temperature $g_1$ (standard deviation of 344): [Table: estimated $\sigma_{mes}$ and RMSE for the Kriging and neural-network metamodels; neural networks: 11.9]. For the fusion margin $g_2$ (standard deviation of 342): [Table: estimated $\sigma_{mes}$ and RMSE for the Kriging and neural-network metamodels; neural networks: 39.7]. Confirmation that output $g_2$ is more difficult to predict than $g_1$. In both cases, a significant part of the RMSE comes from the numerical instability, of order of magnitude $\sigma_{mes}$. The metamodels have overall quite good performance (3% and 10% relative error). The Kriging metamodel here has comparable or slightly better accuracy than the neural networks. On the other hand, the neural-network metamodel is significantly faster than Kriging (computational cost in $O(n)$ with $n$ large); nevertheless, both metamodels can be considered fast enough.

40 Outline: 1. Introduction to Gaussian processes; 2. Kriging prediction (conditional distribution and Gaussian conditioning theorem; Kriging prediction); 3. Application to metamodeling of the GERMINAL code; 4. Application to validation of the FLICA 4 thermal-hydraulic code.

41 Computer code and physical system. The computer code is represented by a function $f : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}$, $(x, \beta) \mapsto f(x, \beta)$. The physical system is represented by a function $Y_{real} : \mathbb{R}^d \to \mathbb{R}$, $x \mapsto Y_{real}(x)$, and the observations by $Y_{obs} : x \mapsto Y_{obs}(x) := Y_{real}(x) + \epsilon(x)$. The inputs in $x$ are the experimental conditions; the inputs in $\beta$ are the calibration parameters of the computer code; the outputs $f(x, \beta)$ and $Y_{real}(x)$ are the variables of interest; $\epsilon(x)$ is a measurement error.

42 Explaining discrepancies between simulation and experimental results. We have carried out experiments $Y_{obs}(x_1), \ldots, Y_{obs}(x_n)$. Discrepancies between the simulations $f(x_i, \beta)$ and the observations $Y_{obs}(x_i)$ can have 3 sources: misspecification of $\beta$; measurement errors on the observations $Y_{obs}(x_i)$; and errors on the specifications of the experimental conditions $x_i$. These 3 errors can be insufficient to explain the differences between simulations and experiments.

43 Gaussian process modeling of the model error. Gaussian process model: the unknown physical system is represented by a realization of a Gaussian process: $Y_{real}(x) = f(x, \beta) + Z(x)$ and $Y_{obs}(x) = Y_{real}(x) + \epsilon(x)$. $\beta$ is the calibration parameter (incorporation of expert knowledge with the Bayesian framework). $Z$ is the model error of the code; $Z$ is modeled as the realization of a Gaussian process.

44 Universal Kriging model. Linear approximation of the code: for all $x$, $f(x, \beta) = \sum_{i=1}^m h_i(x) \beta_i$ (small uncertainty on $\beta$). The observations then stem from a Gaussian process with a linearly parameterized mean function with unknown coefficients: a universal Kriging model. 3 steps, with similar matrix-vector formulas and interpretation as for the zero mean function case: estimation of the covariance function of $Z$; code calibration (conditional probability density function of $\beta$); prediction of the physical system (conditional mean $E(Y_{real}(x_{new}) | Y_n)$ and conditional variance $\mathrm{var}(Y_{real}(x_{new}) | Y_n)$).

45 Universal Kriging model: calibration and prediction formula (1/2). $Y_{real}$ is observed at $x_1, \ldots, x_n \in \mathbb{R}^d$. We consider here that we know the covariance function $C$ of the model error $Z$. Notations: let $Y_n = (Y_{real}(x_1), \ldots, Y_{real}(x_n))^t$ be the observation vector; it is a Gaussian vector. Let $R$ be the $n \times n$ covariance matrix of $(Z(x_1), \ldots, Z(x_n))$: $(R)_{i,j} = C(x_i, x_j)$. Let $x_{new} \in \mathbb{R}^d$ be a new input point; we want to predict $Y_{real}(x_{new})$. Let $r$ be the $n \times 1$ covariance vector between $Z(x_1), \ldots, Z(x_n)$ and $Z(x_{new})$: $r_i = C(x_i, x_{new})$. Let $H$ be the $n \times m$ matrix of partial derivatives of $f$ at $x_1, \ldots, x_n$: $H_{i,j} = h_j(x_i)$. Let $h$ be the $m \times 1$ vector of partial derivatives of $f$ at $x_{new}$: $h_i = h_i(x_{new})$. Let $\sigma^2_{mes}$ be the variance of the measurement error. Then the Gaussian conditioning theorem gives the conditional mean of $\beta$ given the observed values in $Y_n$: $\beta_{post} := E(\beta | Y_n) = \beta_{prior} + (Q_{prior}^{-1} + H^t (R + \sigma^2_{mes} I_n)^{-1} H)^{-1} H^t (R + \sigma^2_{mes} I_n)^{-1} (Y_n - H\beta_{prior})$.

46 Universal Kriging model: calibration and prediction formula (2/2). We also have the conditional mean of $Y_{real}(x_{new})$: $\hat{y}_{real}(x_{new}) := E(Y_{real}(x_{new}) | Y_n) = h^t \beta_{post} + r^t (R + \sigma^2_{mes} I_n)^{-1} (Y_n - H\beta_{post})$. The conditional variance of $Y_{real}(x_{new})$ is $\hat{\sigma}^2(x_{new}) := \mathrm{var}(Y_{real}(x_{new}) | Y_n) = C(x_{new}, x_{new}) - r^t (R + \sigma^2_{mes} I_n)^{-1} r + (h - H^t (R + \sigma^2_{mes} I_n)^{-1} r)^t (H^t (R + \sigma^2_{mes} I_n)^{-1} H + Q_{prior}^{-1})^{-1} (h - H^t (R + \sigma^2_{mes} I_n)^{-1} r)$. Interpretation: the prediction expression is decomposed into a calibration term and a Gaussian inference term for the model error. When the code has a small error on the $n$ observations, the prediction at $x_{new}$ uses almost only the calibrated code. The conditional variance is larger than when the mean function is known.
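A compact numerical sketch of these calibration and prediction formulas (NumPy; the basis functions, prior, covariance, and data below are illustrative placeholders, not the FLICA 4 setup):

```python
import numpy as np

def universal_kriging(x_obs, y_obs, x_new, cov, basis, beta_prior, Q_prior, sigma2_mes):
    """Bayesian calibration of beta and prediction of Y_real at a new point x_new."""
    S = cov(x_obs[:, None], x_obs[None, :]) + sigma2_mes * np.eye(len(x_obs))
    r = cov(x_obs, x_new)
    H = basis(x_obs)                      # n x m matrix, H_{i,j} = h_j(x_i)
    h = basis(np.atleast_1d(x_new))[0]    # m vector, h_i(x_new)
    Si = np.linalg.inv(S)
    A = np.linalg.inv(np.linalg.inv(Q_prior) + H.T @ Si @ H)
    beta_post = beta_prior + A @ H.T @ Si @ (y_obs - H @ beta_prior)
    mean = h @ beta_post + r @ Si @ (y_obs - H @ beta_post)
    u = h - H.T @ Si @ r
    var = cov(x_new, x_new) - r @ Si @ r + u @ A @ u
    return beta_post, mean, var

cov = lambda a, b: (1 + np.sqrt(6) * np.abs(a - b)) * np.exp(-np.sqrt(6) * np.abs(a - b))
basis = lambda x: np.column_stack([np.ones_like(x), x])     # h_1 = 1, h_2 = x
x_obs = np.linspace(0.0, 1.0, 10)
y_obs = 1.0 + 2.0 * x_obs + 0.1 * np.sin(10 * x_obs)
print(universal_kriging(x_obs, y_obs, 0.5, cov, basis,
                        beta_prior=np.zeros(2), Q_prior=np.eye(2), sigma2_mes=1e-4))
```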

47 Application to the FLICA 4 thermal-hydraulic code. The experiment consists in pressurized and possibly heated water passing through a cylinder; we measure the pressure drop between the two ends of the cylinder. Quantity of interest: the part of the pressure drop due to friction, $P_{fro}$. Two kinds of experimental conditions: system parameters (hydraulic diameter $D_h$, friction height $H_f$, channel width $e$) and environment variables (output pressure $P_s$, flowrate $G_e$, parietal heat flux $\Phi_p$, liquid enthalpy $h^l_e$, thermodynamic quality $X^e_{th}$, input temperature $T_e$). We have 253 experimental results.

48 Friction model of FLICA 4. The code is based on the (local) analytical model $P_{fro} = \frac{H}{2\rho D_h} G^2 f_{iso} f_h$, with $f_{iso}$ the isothermal friction model, parameterized by $a_t$ and $b_t$, and $f_h$ the monophasic (single-phase) correction model. Prior information case, with a given prior mean $\beta_{prior}$ and prior covariance matrix $Q_{prior}$ on the calibration parameters.

49 Results. We compare predictions to observations using Cross Validation. We have: the vector of posterior means $\hat{P}_{fro}$ of size $n$, and the vector of posterior variances $\sigma^2_{pred}$ of size $n$. 2 quantitative criteria: RMSE, $\sqrt{\frac{1}{n} \sum_{i=1}^n (P_{fro,i} - \hat{P}_{fro}(x_i))^2}$, and Confidence Interval Reliability, the proportion of observations that fall in the posterior 90% confidence interval.
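Both criteria are straightforward to compute from the cross-validation outputs. A sketch (the 1.645 quantile corresponds to a two-sided 90% Gaussian interval; the array names and values are illustrative):

```python
import numpy as np

def cv_criteria(p_obs, p_post_mean, p_post_var, level_quantile=1.645):
    """RMSE and 90% confidence-interval reliability from cross-validation outputs."""
    p_obs, p_post_mean = np.asarray(p_obs), np.asarray(p_post_mean)
    half_width = level_quantile * np.sqrt(np.asarray(p_post_var))
    rmse = np.sqrt(np.mean((p_obs - p_post_mean) ** 2))
    coverage = np.mean(np.abs(p_obs - p_post_mean) <= half_width)
    return rmse, coverage

print(cv_criteria([10.0, 12.0, 9.0], [10.5, 11.5, 9.2], [0.25, 0.36, 0.16]))
```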

50 Results with the thermal-hydraulic code FLICA 4 (2/2). RMSE and 90% Confidence Interval Reliability: nominal code, RMSE 661 Pa, reliability 234/253; Gaussian processes, RMSE 189 Pa, reliability 235/253. [Figure: prediction errors and 90% confidence intervals for the pressure drop, plotted against the observation index, for the nominal code and for the Gaussian process model.]

51 Conclusion. We can improve the predictions of a computer code by completing it with a Kriging model built from the experimental results. The number of experimental results needs to be sufficient, and there is no extrapolation. For more details: Bachoc F., Bois G., Garnier J. and Martinez J.M., Calibration and improved prediction of computer models by universal Kriging, Nuclear Science and Engineering 176(1) (2014).

52 General conclusion. Standard Kriging framework: a versatile and easy-to-use statistical model. We can incorporate a priori knowledge in the choice of the covariance function family; after this choice, the standard method is rather automatic. We associate confidence intervals to the predictions, and the Gaussian framework brings numerical criteria for the quality of the obtained model. Extensions: the Kriging model can be goal-oriented (optimization, code validation, estimation of failure regions, global sensitivity analysis, ...). The standard Kriging method can be computationally costly for large $n$; approximate Kriging prediction and covariance function estimation is a current research domain.

53 Some references (1/2). General books: C.E. Rasmussen and C.K.I. Williams, Gaussian Processes for Machine Learning, The MIT Press, Cambridge (2006); T.J. Santner, B.J. Williams and W.I. Notz, The Design and Analysis of Computer Experiments, Springer, New York (2003); M. Stein, Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York (1999). On covariance function estimation: F. Bachoc, Cross Validation and Maximum Likelihood Estimations of Hyper-parameters of Gaussian Processes with Model Misspecification, Computational Statistics and Data Analysis 66 (2013); F. Bachoc, Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes, Journal of Multivariate Analysis 125 (2014). Kriging for optimization: D. Ginsbourger and V. Picheny, Noisy kriging-based optimization methods: a unified implementation within the DiceOptim package, Computational Statistics and Data Analysis 71 (2013).

54 Some references (2/2). Kriging for failure-domain estimation: C. Chevalier, J. Bect, D. Ginsbourger, E. Vazquez, V. Picheny and Y. Richet, Fast parallel kriging-based stepwise uncertainty reduction with application to the identification of an excursion set, Technometrics (2014). Kriging with multifidelity computer codes: L. Le Gratiet, Bayesian Analysis of Hierarchical Multifidelity Codes, SIAM/ASA Journal on Uncertainty Quantification (2013); L. Le Gratiet and J. Garnier, Recursive co-kriging model for Design of Computer Experiments with multiple levels of fidelity, International Journal for Uncertainty Quantification. Kriging for global sensitivity analysis: A. Marrel, B. Iooss, S. da Veiga and M. Ribatet, Global sensitivity analysis of stochastic computer models with joint metamodels, Statistics and Computing 22 (2012). Kriging for code validation: S. Fu, Inversion probabiliste bayésienne en analyse d'incertitude, PhD thesis, Université Paris-Sud 11.

55 Thank you for your attention!


Spatial Inference of Nitrate Concentrations in Groundwater Spatial Inference of Nitrate Concentrations in Groundwater Dawn Woodard Operations Research & Information Engineering Cornell University joint work with Robert Wolpert, Duke Univ. Dept. of Statistical

More information

Gaussian Processes (10/16/13)

Gaussian Processes (10/16/13) STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

More information

Pattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods

Pattern Recognition and Machine Learning. Bishop Chapter 6: Kernel Methods Pattern Recognition and Machine Learning Chapter 6: Kernel Methods Vasil Khalidov Alex Kläser December 13, 2007 Training Data: Keep or Discard? Parametric methods (linear/nonlinear) so far: learn parameter

More information

Hierarchical sparse Bayesian learning for structural health monitoring. with incomplete modal data

Hierarchical sparse Bayesian learning for structural health monitoring. with incomplete modal data Hierarchical sparse Bayesian learning for structural health monitoring with incomplete modal data Yong Huang and James L. Beck* Division of Engineering and Applied Science, California Institute of Technology,

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Structural Reliability

Structural Reliability Structural Reliability Thuong Van DANG May 28, 2018 1 / 41 2 / 41 Introduction to Structural Reliability Concept of Limit State and Reliability Review of Probability Theory First Order Second Moment Method

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

Tokamak profile database construction incorporating Gaussian process regression

Tokamak profile database construction incorporating Gaussian process regression Tokamak profile database construction incorporating Gaussian process regression A. Ho 1, J. Citrin 1, C. Bourdelle 2, Y. Camenen 3, F. Felici 4, M. Maslov 5, K.L. van de Plassche 1,4, H. Weisen 6 and JET

More information

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas 0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity

More information

A Gaussian state-space model for wind fields in the North-East Atlantic

A Gaussian state-space model for wind fields in the North-East Atlantic A Gaussian state-space model for wind fields in the North-East Atlantic Julie BESSAC - Université de Rennes 1 with Pierre AILLIOT and Valï 1 rie MONBET 2 Juillet 2013 Plan Motivations 1 Motivations 2 Context

More information