Assessment of uncertainty in computer experiments: from Universal Kriging to Bayesian Kriging. Céline Helbert, Delphine Dupuy and Laurent Carraro


Historical context
- First introduced in the field of geostatistics (Matheron, 1960).
- More recently used as a response surface for computer experiments (Sacks, 1989; Santner, 2003).
- Gives both a prediction and an uncertainty on that prediction (Jones, 1998; Oakley, 2004).
- Two different approaches among practitioners:
  - Universal Kriging (UK): parameters are estimated (cross-validation, maximum likelihood).
  - Bayesian Kriging (BK): parameters are random variables.
- Goal: BK allows the UK uncertainty to be interpreted as a prediction variance.
- Application of BK: petroleum case study.

Outline
- Universal Kriging: limits
- Bayesian Kriging: pros and cons
- Case study

Universal Kriging

Probabilistic context

Assumptions:
$$Y(x) = f(x)^T \beta + Z(x)$$
where $Z$ is a Gaussian process (GP) with
$$E(Z(x)) = 0, \qquad \mathrm{Cov}(Z(x), Z(x+h)) = \sigma^2 R_\theta(h).$$
$n$ values, $Y = (y_1, \dots, y_n)^T$, are observed at points $X = (x_1, \dots, x_n)^T$.

Prediction and uncertainty, with $F$ the matrix whose rows are $f(x_i)^T$, $R_\theta$ the correlation matrix of the observations, and $r_\theta$ the vector of correlations between $Y(x_0)$ and the observations:

Case Simple Kriging (parameters are known):
$$\hat{Y}_{SK}(x_0) = f(x_0)^T \beta + r_\theta^T R_\theta^{-1} (Y - F\beta)$$
$$\sigma^2_{SK}(x_0) = \sigma^2 \left( 1 - r_\theta^T R_\theta^{-1} r_\theta \right)$$

Case Universal Kriging (parameters are estimated):
$$\hat{Y}_{UK}(x_0) = f(x_0)^T \hat\beta + r_{\hat\theta}^T R_{\hat\theta}^{-1} (Y - F\hat\beta)$$
$$\sigma^2_{UK}(x_0) = \hat\sigma^2 \left( 1 - r_{\hat\theta}^T R_{\hat\theta}^{-1} r_{\hat\theta} + \left( f(x_0)^T - r_{\hat\theta}^T R_{\hat\theta}^{-1} F \right) \left( F^T R_{\hat\theta}^{-1} F \right)^{-1} \left( f(x_0)^T - r_{\hat\theta}^T R_{\hat\theta}^{-1} F \right)^T \right)$$
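The universal kriging predictor and variance on this slide can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian correlation function `gauss_corr`, the constant default trend, the small jitter added to the correlation matrix, and the choice of the maximum-likelihood estimate for the variance are all assumptions made for the example, and the range `theta` is treated as fixed.

```python
import numpy as np

def gauss_corr(a, b, theta):
    """Gaussian correlation exp(-((a - b) / theta)^2) between 1-D point sets (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-(d / theta) ** 2)

def universal_kriging(X, Y, x0, theta, trend=lambda x: np.ones((len(x), 1))):
    """Return the UK mean and variance at the points x0 for a fixed range theta."""
    n = len(X)
    F = trend(X)                                      # n x p regression matrix
    f0 = trend(x0)                                    # m x p trend at prediction points
    R = gauss_corr(X, X, theta) + 1e-10 * np.eye(n)   # jitter for conditioning (assumption)
    r = gauss_corr(X, x0, theta)                      # n x m cross-correlations
    Ri_F = np.linalg.solve(R, F)
    Ri_r = np.linalg.solve(R, r)
    A = F.T @ Ri_F                                    # F^T R^-1 F
    beta_hat = np.linalg.solve(A, F.T @ np.linalg.solve(R, Y))  # generalized least squares
    resid = Y - F @ beta_hat
    sigma2_hat = resid @ np.linalg.solve(R, resid) / n          # ML estimate of sigma^2
    mean = f0 @ beta_hat + r.T @ np.linalg.solve(R, resid)
    u = f0 - r.T @ Ri_F                               # rows are f(x0)^T - r^T R^-1 F
    var = sigma2_hat * (1.0 - np.sum(r * Ri_r, axis=0)
                        + np.sum(u * np.linalg.solve(A, u.T).T, axis=1))
    return mean, np.maximum(var, 0.0)                 # clip tiny negative round-off
```

Calling `universal_kriging(X, Y, X, theta)` at the design points themselves should return the observed values with a variance close to zero, which is a quick sanity check of the interpolation property.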

Simple Kriging - example
Figure: simple kriging on [0, 1]; legend: output, SK, data. Model used: $E(Y(x)) = 0$, $\mathrm{Var}(Y(x)) = 4$, and an exponential-type correlation $\mathrm{Corr}(Y(x), Y(x+h))$ whose expression is truncated in the source.

Limits of Universal Kriging
- Difficulties due to estimation:
  - flat likelihood;
  - too few data to estimate the covariance function and the ranges;
  - sensitivity to the experimental design.
- Underestimation of uncertainty: $\sigma^2_{UK}(x_0)$ does not take into account the uncertainty due to the estimation of the variance $\sigma^2$ and of the range $\theta$.
- No probabilistic interpretation:
$$\hat{Y}_{UK}(x_0) = f(x_0)^T \hat\beta + r_{\hat\theta}^T R_{\hat\theta}^{-1} (Y - F\hat\beta) \neq E(Y(x_0) \mid Y)$$
$$\sigma^2_{UK}(x_0) = \hat\sigma^2 \left( 1 - r_{\hat\theta}^T R_{\hat\theta}^{-1} r_{\hat\theta} + \dots \right) \neq \mathrm{Var}(Y(x_0) \mid Y)$$

Bayesian Kriging

Model of Bayesian Kriging

Assumptions:
- $\beta$, $\theta$, $\sigma^2$ are random variables. Let $\pi$ be the prior distribution.
- For all $x \in D$, $Y(x) \mid \beta, \theta, \sigma^2 = f(x)^T \beta + Z(x)$, with the same assumptions for $Z(x)$.
- $n$ values, $Y = (y_1, \dots, y_n)^T$, are observed at points $X = (x_1, \dots, x_n)^T$.

Interpretation:
- A mixture of Gaussian processes ($Y(x)$ is not Gaussian).
- The weight of a given process in the mixture depends on its prior $\pi$.

Equations of Bayesian Kriging

Conditionally on the parameters, the predictive distribution is the simple kriging one:
$$Y(x_0) \mid Y, \beta, \theta, \sigma^2 \sim N\left( \hat{Y}_{SK}(x_0), \sigma^2_{SK}(x_0) \right)$$
$$E(Y(x_0) \mid Y) = \int E(Y(x_0) \mid Y, \beta, \theta, \sigma^2) \, \pi(\beta, \theta, \sigma^2 \mid Y) \, d\beta \, d\theta \, d\sigma^2$$
with the posterior distribution of the parameters
$$\pi(\beta, \theta, \sigma^2 \mid Y) = \frac{L(Y; \beta, \theta, \sigma^2) \, \pi(\beta, \theta, \sigma^2)}{\pi(Y)}.$$

Prediction:
$$\hat{Y}_{BK}(x_0) = E(Y(x_0) \mid Y)$$

Measure of uncertainty:
$$\sigma^2_{BK}(x_0) = \mathrm{Var}(Y(x_0) \mid Y)$$
obtained by simulating the distribution of $Y(x_0) \mid Y$.
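One way to read the Bayesian kriging prediction and variance is as moments of a mixture: each posterior draw of the parameters yields a simple kriging mean and variance at $x_0$, and the law of total variance combines them. The sketch below assumes equally weighted posterior draws; the function name `bk_moments` and its inputs are hypothetical and only illustrate the combination step, not how the draws are produced.

```python
def bk_moments(sk_means, sk_vars):
    """Combine per-draw SK moments at x0 into the BK mean and variance.

    sk_means[k], sk_vars[k]: simple kriging mean and variance at x0 computed with
    the k-th posterior draw of the parameters (draws assumed equally weighted).
    """
    k = len(sk_means)
    mean = sum(sk_means) / k
    # E[ Var(Y(x0) | Y, parameters) | Y ]: average of the conditional variances
    within = sum(sk_vars) / k
    # Var[ E(Y(x0) | Y, parameters) | Y ]: spread of the conditional means
    between = sum((m - mean) ** 2 for m in sk_means) / k
    return mean, within + between
```

The `between` term is what a plug-in approach leaves out: it is zero when all draws agree and grows with the posterior spread of the parameters.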

Particular case of prior distribution

Gaussian case for $\beta$ ($\theta$ and $\sigma^2$ are constant).

Prior distribution:
$$\pi(\beta) = N(\mu, \lambda\Sigma)$$

Posterior Gaussian distribution for $\beta$:
$$E(\beta \mid Y) = \mu + \lambda\Sigma F^T \left( \lambda F\Sigma F^T + \sigma^2 R_\theta \right)^{-1} (Y - F\mu)$$
$$\mathrm{Var}(\beta \mid Y) = \lambda\Sigma - \lambda\Sigma F^T \left( \lambda F\Sigma F^T + \sigma^2 R_\theta \right)^{-1} F \lambda\Sigma$$

Posterior Gaussian distribution for $Y(x_0)$:
$$E(Y(x_0) \mid Y) = \left( f(x_0)^T - r_\theta^T R_\theta^{-1} F \right) \left( \mu + \lambda\Sigma F^T \left( \lambda F\Sigma F^T + \sigma^2 R_\theta \right)^{-1} (Y - F\mu) \right) + r_\theta^T R_\theta^{-1} Y$$
$$\mathrm{Var}(Y(x_0) \mid Y) = \left( f(x_0)^T - r_\theta^T R_\theta^{-1} F \right) \left( \lambda\Sigma - \lambda\Sigma F^T \left( \lambda F\Sigma F^T + \sigma^2 R_\theta \right)^{-1} F \lambda\Sigma \right) \left( f(x_0)^T - r_\theta^T R_\theta^{-1} F \right)^T + \sigma^2 \left( 1 - r_\theta^T R_\theta^{-1} r_\theta \right)$$

Particular case: $\lambda \to +\infty$ (non informative prior for $\beta$).

Posterior Gaussian distribution for $\beta$:
$$E(\beta \mid Y) = \left( F^T R_\theta^{-1} F \right)^{-1} F^T R_\theta^{-1} Y = \hat\beta$$
$$\mathrm{Var}(\beta \mid Y) = \sigma^2 \left( F^T R_\theta^{-1} F \right)^{-1} = \mathrm{Var}(\hat\beta)$$

Posterior Gaussian distribution for $Y(x_0)$:
$$E(Y(x_0) \mid Y) = \hat{Y}_{UK}(x_0), \qquad \mathrm{Var}(Y(x_0) \mid Y) = \sigma^2_{UK}(x_0)$$
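The limit $\lambda \to +\infty$ can be checked numerically. The sketch below evaluates the posterior mean and covariance of $\beta$ under a Gaussian prior with mean `mu` and covariance `lam * Sigma`, with $\theta$ and $\sigma^2$ held fixed; the function name `beta_posterior` and its argument names are hypothetical. It uses an explicit matrix inverse for readability, which is acceptable for small illustrative problems but not a recommendation for large or ill-conditioned ones.

```python
import numpy as np

def beta_posterior(F, R, Y, sigma2, mu, Sigma, lam):
    """Posterior mean and covariance of beta for the prior N(mu, lam * Sigma).

    F: n x p regression matrix, R: n x n correlation matrix, Y: n observations,
    sigma2: process variance (held fixed), lam: prior scale factor.
    """
    S = lam * Sigma                       # prior covariance of beta
    G = F @ S @ F.T + sigma2 * R          # covariance of Y once beta is integrated out
    K = S @ F.T @ np.linalg.inv(G)        # gain applied to the residual Y - F mu
    mean = mu + K @ (Y - F @ mu)
    cov = S - K @ F @ S
    return mean, cov
```

For a large `lam`, the returned mean should approach the generalized least squares estimate and the covariance should approach `sigma2` times the inverse of `F.T @ inv(R) @ F`, in line with the non informative case on this slide.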

Bayesian Kriging - difficulties
- Simulating the posterior distribution of the parameters can be hard (simulation by a Markov chain Monte Carlo method).
- The choice of the prior is difficult:
  - Case of a flat prior for $\beta$, $\theta$ and $\sigma^2$: roughly equivalent to maximizing the likelihood function. Advantages: the prediction variance takes all sources of uncertainty into account, and the optimization problem disappears.
  - Case of an informative prior: which one? What impact?
- IDEA: use a simplified (faster) simulation to derive prior information. Example: petroleum field.
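To make the simulation step concrete, here is a generic random-walk Metropolis sampler for a scalar parameter. It is only a sketch of one Markov chain Monte Carlo method: the slides do not say which sampler was used, and the names `metropolis`, `log_post`, `step` and `seed` are hypothetical. `log_post` is any function returning the log of the unnormalized posterior density, for example a log-likelihood plus a log-prior for a range parameter.

```python
import math
import random

def metropolis(log_post, start, step, n_iter, seed=0):
    """Random-walk Metropolis for a scalar parameter; returns the visited states."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    chain = []
    for _ in range(n_iter):
        cand = x + rng.gauss(0.0, step)            # symmetric Gaussian proposal
        lp_cand = log_post(cand)
        # Accept with probability min(1, post(cand) / post(x)), on the log scale;
        # the tiny offset avoids log(0) when the uniform draw is exactly 0.
        if math.log(rng.random() + 1e-300) < lp_cand - lp:
            x, lp = cand, lp_cand
        chain.append(x)                            # a rejected move repeats the state
    return chain
```

In practice the first part of the chain is discarded as burn-in and `step` is tuned so that a reasonable share of proposals is accepted; both choices are left to the user here.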

Application
- Simulator: flow simulator 3DSL.
- 3 inputs on [-1, 1]: lmultkz (permeability), krwmax (relative permeability), lbhp (low bottom hole pressure).
- Output: field oil production total after 7000 days.
- Problem: uncertainty analysis.
- Method: metamodel and its uncertainty (Bayesian Kriging).

3DSL / degraded simulations
Idea: use a faster simulator to obtain prior information, that is, degraded simulations (NODESMAX, DTMAX, DVPMAX, etc.).

3DSL / degraded simulations

Correlation coefficient      ALL     lmultkz   krwmax   lbhp
3DSL / Degraded 1            0.95    0.61      0.97     0.98
3DSL / Degraded 2            0.80    -0.19     0.89     0.91

Note: calculations carried out on a grid of 1331 = 11^3 points.

3DSL / degraded simulations
Figure: FOPT (x 10^7) as a function of each input (panels LMULTKZ, KRWMAX, LBHP) over [-1, 1], for 3DSL, Degraded 1 and Degraded 2.

4 different strategies

Runs       UK     no-info BK   BK info 1   BK info 2
3DSL       20     20           17          18
Deg. 1     X      X            4           X
Deg. 2     X      X            X           4

Time       UK     no-info BK   BK info 1   BK info 2
3DSL       41     41           35          37
Deg. 1     X      X            4           X
Deg. 2     X      X            X           3
Total      41     41           39          40

Info 1: a no-info BK fitted on the 4 runs of Degraded 1 provides prior information on trend, variance and correlation.

4 different strategies

                                  UK        no-info BK   BK info 1   BK info 2
RMSE*                             05 04     04 04        166 770     3 845
Average standard deviation        99 457    141 573      188 411     58 394
Proportion of outside points*     40%       19%          8%          4%

* Calculations carried out on a grid of 1331 = 11^3 points. Values are reproduced as extracted; some digits are missing in the source.

Accuracy of prediction:
$$RMSE = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( Y(x_i) - \hat{Y}(x_i) \right)^2 }$$

Accuracy of uncertainty:
$$ASD = \frac{1}{N} \sum_{i=1}^{N} \sigma_{\hat{Y}}(x_i)$$
$$PR = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}_{ \left| Y(x_i) - \hat{Y}(x_i) \right| > \sigma_{\hat{Y}}(x_i) }$$
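The three criteria on this slide are straightforward to compute once the reference values, the predictions and the predicted standard deviations are available on the validation grid. The sketch below uses hypothetical function and argument names. The multiplier `k` applied to the standard deviation in `proportion_outside` is an assumption exposed as a required parameter, because any factor in the slide's formula is not legible in the source.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between reference values and predictions."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def average_std(std_pred):
    """Average predicted standard deviation over the validation points."""
    return sum(std_pred) / len(std_pred)

def proportion_outside(y_true, y_pred, std_pred, k):
    """Share of points with |y - y_hat| > k * std (k is a hypothetical multiplier)."""
    n = len(y_true)
    outside = sum(abs(y - p) > k * s for y, p, s in zip(y_true, y_pred, std_pred))
    return outside / n
```

A small RMSE together with a large proportion of outside points is the signature of an overconfident model: the predictions are reasonable but the stated uncertainty is too narrow.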

Conclusion

Advantages of Bayesian kriging:
- In the case of a non informative prior for $\beta$: $E(Y(x_0) \mid Y) = \hat{Y}_{UK}(x_0)$ and $\mathrm{Var}(Y(x_0) \mid Y) = \sigma^2_{UK}(x_0)$.
- Good estimation of the prediction variance, which takes into account all sources of uncertainty: on $\beta$, $\sigma^2$ and $\theta$.

Weaknesses of Bayesian kriging:
- MCMC simulations.
- Choice of the prior.

Universal Kriging - example
Figure: universal kriging compared with simple kriging on [0, 1]; legend: output, SK, UK, data. Parameters: $\theta = 0.2$ (known), $\hat\sigma^2 = 0.86$ ($\sigma^2 = 4$), $\hat\beta = 1.09$ ($\beta = 0$).

Sensitivity to the experimental design
Likelihood optimization: $f(\theta) = -\log L(\theta)$.
Figure: left panel, the output as a function of $x$ on [0, 1]; right panel, $f(\theta)$ as a function of $\theta$ for several experimental designs.
Variability on $\theta$: identification problem.
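The criterion $f(\theta) = -\log L(\theta)$ can be evaluated on a grid of ranges to see how flat it is and how much its minimizer moves from one design to another. The sketch below is an illustration under assumptions that the slide does not state: a constant trend, a Gaussian correlation function, a small jitter on the correlation matrix, and the trend and variance replaced by their estimates for each $\theta$ (concentrated likelihood). The function name is hypothetical.

```python
import numpy as np

def neg_log_likelihood(theta, X, Y):
    """Concentrated -log L(theta) for a constant trend and Gaussian correlation (assumed)."""
    n = len(X)
    d = X[:, None] - X[None, :]
    R = np.exp(-(d / theta) ** 2) + 1e-8 * np.eye(n)   # jitter keeps R invertible
    ones = np.ones(n)
    # Generalized least squares estimate of the constant trend for this theta
    beta_hat = (ones @ np.linalg.solve(R, Y)) / (ones @ np.linalg.solve(R, ones))
    resid = Y - beta_hat
    # Maximum-likelihood estimate of the variance for this theta
    sigma2_hat = resid @ np.linalg.solve(R, resid) / n
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * (n * np.log(sigma2_hat) + logdet + n * (1.0 + np.log(2.0 * np.pi)))
```

Evaluating this function over the same grid of `theta` values for several subsets of the design points gives one curve per design, which is the kind of comparison the slide describes.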

Prior and posterior distributions

Prior (from the degraded simulations):
                      beta0    beta1    beta2    beta3    sigma    theta1   theta2   theta3
Deg. 1   E(.|Y)       -0,40    -0,3     -0,5     -0,64    ,1       3,57     0,76     ,71
Deg. 1   Std(.|Y)     0,78     0,57     0,88     0,53     0,88     1,93     0,31     1,30
Deg. 2   E(.|Y)       -0,56    0,00     -0,63    -0,58    1,69     ,5       0,48     0,97
Deg. 2   Std(.|Y)     0,61     0,4      0,68     0,70     0,68     0,78     0,19     0,35

Posterior (17 points):
                      beta0    beta1    beta2    beta3    sigma    theta1   theta2   theta3
info 1   E(.|Y)       -0,41    0,19     -0,41    -0,47    1,88     3,74     0,87     3,3
info 1   Std(.|Y)     0,55     0,30     0,64     0,3      0,40     1,16     0,19     0,88
info 2   E(.|Y)       -0,40    0,16     -0,63    -0,36    0,97     ,89      0,69     1,45
info 2   Std(.|Y)     0,39     0,4      0,44     0,40     0,1      0,64     0,13     0,8
no info  E(.|Y)       -0,57    0,54     0,34     -0,35    ,64      3,86     1,08     4,36
no info  Std(.|Y)     1,06     0,55     1,05     0,4      0,80     ,00      0,8      1,73

Values are reproduced as extracted, with decimal commas; some digits are missing in the source.

Sensitivity to range variation
Figure: three fitted models plotted on [0, 1], with parameters (beta = 1.1, sigma = 0.94, teta = 0.01), (beta = 1.1, sigma = 0.98, teta = 0.16756) and (beta = 4.4, sigma = 66, teta = 0.5). A fourth panel shows the likelihood as a function of the scale parameters $\theta_1$ and $\theta_2$.