Assessment of uncertainty in computer experiments: from Universal Kriging to Bayesian Kriging. Céline Helbert, Delphine Dupuy and Laurent Carraro


Historical context
Kriging was first introduced in the field of geostatistics (Matheron, 1960) and has more recently been used as a response surface for computer experiments (Sacks, 1989; Santner, 2003), both for prediction and for the uncertainty attached to that prediction (Jones, 1998; Oakley, 2004).
Two different approaches coexist among practitioners:
- Universal Kriging (UK): parameters are estimated (cross-validation, maximum likelihood)
- Bayesian Kriging (BK): parameters are random variables
Goal: BK allows the interpretation of UK uncertainty as a prediction variance.
Application of BK: a petroleum case study.

Outline
- Limits of Universal Kriging
- Bayesian Kriging: pros and cons
- Case study

Universal Kriging

Probabilistic context
Assumptions: $Y(x) = f(x)^T \beta + Z(x)$, where $Z$ is a Gaussian process with
$E(Z(x)) = 0$, $\mathrm{Cov}(Z(x), Z(x+h)) = \sigma^2 R_\theta(h)$.
$n$ values $Y = (y_1, \dots, y_n)^T$ are observed at the points $X = (x_1, \dots, x_n)^T$.
Prediction and uncertainty:
First case, Simple Kriging (parameters are known):
$Y_{SK}(x_0) = f(x_0)^T \beta + r_\theta^T R_\theta^{-1} (Y - F\beta)$
$\sigma^2_{SK}(x_0) = \sigma^2 \big(1 - r_\theta^T R_\theta^{-1} r_\theta\big)$
Second case, Universal Kriging (parameters are estimated):
$Y_{UK}(x_0) = f(x_0)^T \hat\beta + r_{\hat\theta}^T R_{\hat\theta}^{-1} (Y - F\hat\beta)$
$\sigma^2_{UK}(x_0) = \hat\sigma^2 \big(1 - r_{\hat\theta}^T R_{\hat\theta}^{-1} r_{\hat\theta} + (f(x_0) - F^T R_{\hat\theta}^{-1} r_{\hat\theta})^T (F^T R_{\hat\theta}^{-1} F)^{-1} (f(x_0) - F^T R_{\hat\theta}^{-1} r_{\hat\theta})\big)$
(A numpy sketch of both predictors follows.)
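Below is a minimal numpy sketch of the two predictors above, assuming a one-dimensional input, a constant trend $f(x) = 1$ and a Gaussian correlation $R_\theta(h) = \exp(-(h/\theta)^2)$; the kernel choice and all names are illustrative rather than taken from the talk.

```python
# Minimal sketch of the Simple and Universal Kriging equations above,
# for a 1-D input, constant trend f(x) = 1 and an assumed Gaussian kernel.
import numpy as np

def corr(a, b, theta):
    """Correlation matrix R_theta between two sets of 1-D points."""
    return np.exp(-((a[:, None] - b[None, :]) / theta) ** 2)

def simple_kriging(x0, X, Y, beta, sigma2, theta):
    """SK predictor and variance with all parameters known."""
    R = corr(X, X, theta)
    r = corr(X, np.atleast_1d(x0), theta)[:, 0]
    w = np.linalg.solve(R, r)                     # R^{-1} r
    y_sk = beta + w @ (Y - beta)                  # constant trend F*beta = beta
    var_sk = sigma2 * (1.0 - r @ w)
    return y_sk, var_sk

def universal_kriging(x0, X, Y, sigma2_hat, theta_hat):
    """UK predictor and variance: beta replaced by its GLS estimate."""
    R = corr(X, X, theta_hat)
    F = np.ones((len(X), 1))                      # constant trend f(x) = 1
    Ri_F = np.linalg.solve(R, F)
    beta_hat = np.linalg.solve(F.T @ Ri_F, Ri_F.T @ Y).item()
    r = corr(X, np.atleast_1d(x0), theta_hat)[:, 0]
    w = np.linalg.solve(R, r)                     # R^{-1} r
    y_uk = beta_hat + w @ (Y - beta_hat)
    u = 1.0 - (F[:, 0] @ w)                       # f(x0) - F^T R^{-1} r
    var_uk = sigma2_hat * (1.0 - r @ w + u * u / (F.T @ Ri_F).item())
    return y_uk, var_uk
```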

Simple Kriging - example
[Figure: data and the Simple Kriging prediction on [0, 1].]
Model: $E(Y(x)) = 0$, $\mathrm{Var}(Y(x)) = 4$, $\mathrm{Corr}(Y(x), Y(x+h)) = \exp\big(-(h/0.2)^2\big)$.

Limits of Universal Kriging
Difficulties due to estimation: the likelihood can be flat (illustrated in the sketch below), there are too few data to estimate the covariance function and the ranges, and the result is sensitive to the experimental design.
Underestimation of uncertainty: $\sigma^2_{UK}(x_0)$ does not take into account the uncertainty due to the estimation of the variance $\sigma^2$ and of the range $\theta$.
No probabilistic interpretation:
$Y_{UK}(x_0) = f(x_0)^T \hat\beta + r_{\hat\theta}^T R_{\hat\theta}^{-1} (Y - F\hat\beta) \neq E(Y(x_0) \mid Y)$
$\sigma^2_{UK}(x_0) = \hat\sigma^2 \big(1 - r_{\hat\theta}^T R_{\hat\theta}^{-1} r_{\hat\theta} + \dots\big) \neq \mathrm{Var}(Y(x_0) \mid Y)$
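The following toy profile of $f(\theta) = -\log L(\theta)$ on a handful of points illustrates the flat-likelihood issue; the design, the data and the Gaussian correlation are assumptions made only for this sketch.

```python
# Toy profile of the negative log-likelihood in theta with few observations.
import numpy as np

def neg_log_likelihood(theta, X, Y, beta=0.0, sigma2=1.0):
    """-log L for the GP model with known trend and variance."""
    n = len(X)
    R = np.exp(-((X[:, None] - X[None, :]) / theta) ** 2) + 1e-10 * np.eye(n)
    e = Y - beta
    _, logdet = np.linalg.slogdet(sigma2 * R)
    return 0.5 * (n * np.log(2 * np.pi) + logdet
                  + e @ np.linalg.solve(sigma2 * R, e))

X = np.array([0.1, 0.35, 0.6, 0.9])               # very few design points
Y = np.sin(6.0 * X)
for theta in (0.05, 0.1, 0.2, 0.4, 0.8):          # nearly flat profile
    print(f"theta={theta:4.2f}  -logL={neg_log_likelihood(theta, X, Y):7.3f}")
```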

Bayesian Kriging

Model of Bayesian Kriging
Assumptions: $\beta$, $\theta$ and $\sigma$ are random variables; let $\pi$ be their prior distribution. For $x \in D$,
$Y(x) \mid \beta, \theta, \sigma = f(x)^T \beta + Z(x)$,
with the same assumptions on $Z(x)$ as before. $n$ values $Y = (y_1, \dots, y_n)^T$ are observed at the points $X = (x_1, \dots, x_n)^T$.
Interpretation: a mixture of Gaussian processes ($Y(x)$ itself is not Gaussian); the weight of a given process in the mixture depends on its prior $\pi$ (see the sampling sketch below).
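A hedged illustration of the mixture interpretation: draw $(\beta, \theta, \sigma^2)$ from a prior first, then a Gaussian-process path given those parameters, so the unconditional $Y(x)$ is a mixture of GPs. The specific priors below are arbitrary choices for the sketch.

```python
# Sample paths from the Bayesian Kriging mixture: parameters first, GP second.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)

def sample_path():
    beta = rng.normal(0.0, 1.0)                   # prior on the trend
    theta = rng.uniform(0.05, 0.5)                # prior on the range
    sigma2 = rng.gamma(2.0, 1.0)                  # prior on the variance
    R = (np.exp(-((x[:, None] - x[None, :]) / theta) ** 2)
         + 1e-10 * np.eye(len(x)))
    return beta + rng.multivariate_normal(np.zeros(len(x)), sigma2 * R)

paths = [sample_path() for _ in range(5)]         # unconditionally non-Gaussian
```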

Equations of Bayesian Kriging
Conditionally on the parameters,
$Y(x_0) \mid Y, \beta, \theta, \sigma \sim N\big(Y_{SK}(x_0), \sigma^2_{SK}(x_0)\big)$,
so that
$E(Y(x_0) \mid Y) = \int E(Y(x_0) \mid Y, \beta, \theta, \sigma)\, \pi(\beta, \theta, \sigma \mid Y)\, d\beta\, d\theta\, d\sigma$,
where the posterior distribution of the parameters is
$\pi(\beta, \theta, \sigma \mid Y) = \frac{L(Y; \beta, \theta, \sigma)\, \pi(\beta, \theta, \sigma)}{\pi(Y)}$.
Prediction: $Y_{BK}(x_0) = E(Y(x_0) \mid Y)$.
Measure of uncertainty: $\sigma^2_{BK}(x_0) = \mathrm{Var}(Y(x_0) \mid Y)$, obtained by simulating the distribution of $Y(x_0) \mid Y$ (a Monte Carlo sketch follows).
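A Monte Carlo sketch of these equations, reusing the `simple_kriging` helper from the earlier sketch: parameters are drawn with a toy random-walk Metropolis step, and the conditional SK moments are combined by the laws of total expectation and total variance. The flat priors on the log-parameters, the proposal scale and the omission of burn-in are all simplifying assumptions.

```python
# Monte Carlo approximation of Y_BK(x0) and sigma^2_BK(x0).
import numpy as np

rng = np.random.default_rng(0)

def log_post(p, X, Y):
    """Log-posterior = log-likelihood under flat priors on (beta, log theta, log sigma2)."""
    beta, theta, sigma2 = p[0], np.exp(p[1]), np.exp(p[2])
    R = (np.exp(-((X[:, None] - X[None, :]) / theta) ** 2)
         + 1e-10 * np.eye(len(X)))
    e = Y - beta
    _, logdet = np.linalg.slogdet(sigma2 * R)
    return -0.5 * (logdet + e @ np.linalg.solve(sigma2 * R, e))

def bayesian_kriging(x0, X, Y, n_iter=2000):
    p = np.array([0.0, np.log(0.2), 0.0])        # start: beta=0, theta=0.2, sigma2=1
    lp = log_post(p, X, Y)
    m_list, v_list = [], []
    for _ in range(n_iter):
        q = p + 0.1 * rng.standard_normal(3)     # random-walk Metropolis proposal
        lq = log_post(q, X, Y)
        if np.log(rng.uniform()) < lq - lp:
            p, lp = q, lq
        m, v = simple_kriging(x0, X, Y, p[0], np.exp(p[2]), np.exp(p[1]))
        m_list.append(m); v_list.append(v)
    m_arr = np.asarray(m_list)
    # Law of total expectation / total variance over the posterior draws:
    return m_arr.mean(), np.asarray(v_list).mean() + m_arr.var()
```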

Particular case of prior distribution: Gaussian case for $\beta$ ($\theta$ and $\sigma$ are constant)
Prior distribution: $\pi(\beta) = N(\mu, \lambda\Sigma)$.
Posterior Gaussian distribution for $\beta$:
$E(\beta \mid Y) = \mu + \lambda\Sigma F^T \big(\lambda F \Sigma F^T + \sigma^2 R_\theta\big)^{-1} (Y - F\mu)$
$\mathrm{Var}(\beta \mid Y) = \lambda\Sigma - \lambda\Sigma F^T \big(\lambda F \Sigma F^T + \sigma^2 R_\theta\big)^{-1} F \lambda\Sigma$
Posterior Gaussian distribution for $Y(x_0)$, writing $u(x_0)^T = f(x_0)^T - r_\theta^T R_\theta^{-1} F$:
$E(Y(x_0) \mid Y) = u(x_0)^T E(\beta \mid Y) + r_\theta^T R_\theta^{-1} Y$
$\mathrm{Var}(Y(x_0) \mid Y) = u(x_0)^T\, \mathrm{Var}(\beta \mid Y)\, u(x_0) + \sigma^2 \big(1 - r_\theta^T R_\theta^{-1} r_\theta\big)$
Particular case $\lambda \to +\infty$ (non-informative prior for $\beta$):
$E(\beta \mid Y) = (F^T R_\theta^{-1} F)^{-1} F^T R_\theta^{-1} Y = \hat\beta$, $\mathrm{Var}(\beta \mid Y) = \sigma^2 (F^T R_\theta^{-1} F)^{-1} = \mathrm{Var}(\hat\beta)$,
and the posterior distribution of $Y(x_0)$ reduces to Universal Kriging with $\sigma$ and $\theta$ fixed at their maximum-likelihood values:
$E(Y(x_0) \mid Y) = Y_{UK}(x_0)$, $\mathrm{Var}(Y(x_0) \mid Y) = \sigma^2_{UK}(x_0)$.
(A numerical check of this limit follows.)
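A quick numerical check that the Gaussian-prior posterior mean of $\beta$ tends to the GLS estimate $\hat\beta$ as the prior scale $\lambda$ grows; the design, $\mu$ and $\Sigma$ below are illustrative assumptions.

```python
# Check the lambda -> infinity limit of E(beta | Y) against the GLS estimate.
import numpy as np

X = np.array([0.1, 0.3, 0.55, 0.8, 0.95])
Y = np.sin(6.0 * X)
theta, sigma2 = 0.2, 1.0
R = (np.exp(-((X[:, None] - X[None, :]) / theta) ** 2)
     + 1e-10 * np.eye(len(X)))
F = np.ones((len(X), 1))                          # constant trend
mu, Sigma = np.zeros(1), np.eye(1)

for lam in (0.1, 10.0, 1e6):
    K = lam * (F @ Sigma @ F.T) + sigma2 * R      # lambda F Sigma F^T + sigma^2 R
    post_mean = mu + lam * (Sigma @ F.T) @ np.linalg.solve(K, Y - F @ mu)
    print(f"lambda={lam:>8}: E(beta|Y) = {post_mean[0]: .6f}")

Ri_F = np.linalg.solve(R, F)
beta_hat = np.linalg.solve(F.T @ Ri_F, F.T @ np.linalg.solve(R, Y))
print(f"GLS estimate beta_hat = {beta_hat[0]: .6f}")
```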

Bayesian Kriging - difficulties
The simulation of the posterior distribution of the parameters can be hard (it is carried out by a Markov chain Monte Carlo method), and the choice of the prior is difficult.
Case of a flat prior for $\beta$, $\theta$ and $\sigma$: roughly equivalent to maximizing the likelihood. Advantages: the prediction variance takes all sources of uncertainty into account, and the optimization problem disappears.
Case of an informative prior: which one, and with what impact?
IDEA: use a simplified, faster simulation to derive prior information. Example: a petroleum field.

Application
Simulator: the 3DSL flow simulator.
3 inputs on $[-1, 1]$: lmultkz (permeability), krwmax (relative permeability), lbhp (low bottom-hole pressure).
Output: field oil production total (FOPT) after 7000 days.
Problem: uncertainty analysis.
Method: a metamodel and its uncertainty (Bayesian Kriging).

3DSL / degraded simulations
Idea: use a faster simulator to get prior information, namely degraded simulations (NODESMAX, DTMAX, DVPMAX, etc.).

3DSL / degraded simulations

Correlation coefficient    ALL     lmultkz   krwmax   lbhp
3DSL / Degraded 1          0.95    0.61      0.97     0.98
3DSL / Degraded 2          0.80    -0.19     0.89     0.91

Note: calculations carried out on a grid of 1331 = 11^3 points.

3DSL / degraded simulations
[Figure: FOPT (about 3.2-3.5 x 10^7) as a function of each input (LMULTKZ, KRWMAX, LBHP on [-1, 1]) for 3DSL, Degraded 1 and Degraded 2.]

4 different strategies

Number of runs      UK    no-info BK   BK info 1   BK info 2
3DSL                20    20           17          18
Degraded 1          -     -            24          -
Degraded 2          -     -            -           24

Computing time      UK    no-info BK   BK info 1   BK info 2
3DSL                41    41           35          37
Degraded 1          -     -            4           -
Degraded 2          -     -            -           3
Total               41    41           39          40

Info 1 = no-info BK on the 24 runs of Degraded 1, which provides information on trend, variance and correlation (Info 2 likewise from Degraded 2).

4 different strategies

                                 UK        no-info BK   BK info 1   BK info 2
RMSE*                            205 204   204 204      166 770     23 845
Average standard deviation       99 457    141 573      188 411     58 394
Proportion of outside points*    40%       19%          8%          4%

Note: calculations carried out on a grid of 1331 = 11^3 points.

Accuracy of prediction:
$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \big(Y(x_i) - \hat Y(x_i)\big)^2}$
Accuracy of uncertainty:
$\mathrm{ASD} = \frac{1}{N} \sum_{i=1}^{N} \sigma_{\hat Y}(x_i)$,
$\mathrm{PR} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\big(|Y(x_i) - \hat Y(x_i)| > 2\,\sigma_{\hat Y}(x_i)\big)$
(These criteria are sketched in code below.)
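A short sketch of the three criteria, assuming arrays `y_true`, `y_pred` and `sd_pred` over the test grid; the 2-standard-deviation threshold in PR is an assumption, since the constant in the original slide was illegible.

```python
# The three validation metrics above, for arrays over the test grid.
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared prediction error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def avg_sd(sd_pred):
    """ASD: average predictive standard deviation."""
    return np.mean(sd_pred)

def prop_outside(y_true, y_pred, sd_pred, k=2.0):
    """PR: share of points with |error| beyond k predictive std devs."""
    return np.mean(np.abs(y_true - y_pred) > k * sd_pred)
```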

Conclusion
Advantages of Bayesian Kriging: in the case of a non-informative prior for $\beta$,
$E(Y(x_0) \mid Y) = Y_{UK}(x_0)$ and $\mathrm{Var}(Y(x_0) \mid Y) = \sigma^2_{UK}(x_0)$,
and it yields a good estimate of the prediction variance, one that takes all sources of uncertainty into account: on $\beta$, $\sigma$ and $\theta$.
Weaknesses of Bayesian Kriging: the MCMC simulations and the choice of the prior.

Universal Kriging - example
[Figure: Simple and Universal Kriging predictions with the data on [0, 1].]
$\theta = 0.2$ (known), $\hat\sigma^2 = 0.86$ (true $\sigma^2 = 4$), $\hat\beta = 1.09$ (true $\beta = 0$).

Experimental design sensitivity
[Figure: likelihood optimization, $f(\theta) = -\log L(\theta)$, profiled for several experimental designs; the minimizing $\theta$ changes markedly from one design to the next.]
Variability on $\theta$: an identification problem.

Prior and posterior distributions
[Table: prior values and posterior means $E(\cdot \mid Y)$ and standard deviations $\mathrm{Std}(\cdot \mid Y)$ of the trend coefficients $\beta_0, \dots, \beta_3$, the variance $\sigma$ and the ranges $\theta_1, \theta_2, \theta_3$, for the strategies Deg. 1, Deg. 2, no info, info 1 and info 2, computed from the 17 points.]

Sensitivity to range variation
[Figure: Universal Kriging predictions on [0, 1] for three parameter fits, ($\beta$ = 1.1, $\sigma$ = 0.94, $\theta$ = 0.01), ($\beta$ = 1.1, $\sigma$ = 0.98, $\theta$ = 0.16756) and ($\beta$ = 4.4, $\sigma$ = 66, $\theta$ = 0.5), together with a contour plot of the likelihood as a function of the range parameters $\theta_1$ and $\theta_2$.]