Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling


Kriging models with Gaussian processes: covariance function estimation and impact of spatial sampling
François Bachoc (former PhD advisor: Josselin Garnier; former CEA advisor: Jean-Marc Martinez)
Department of Statistics and Operations Research, University of Vienna (former PhD student at CEA-Saclay, DEN, DM2S, STMF, LGLS, F-91191 Gif-sur-Yvette, France, and LPMA, Université Paris Diderot)
JPS 2014, Forges-les-Eaux, March 2014

Outline
1. Kriging models with Gaussian processes
2. Asymptotic analysis of covariance function estimation and of spatial sampling impact

Kriging model with Gaussian processes
Kriging model: study of a single realization of a Gaussian process Y(x) on a domain X ⊂ R^d.
[Figure: one realization y(x) of the Gaussian process.]
Goal: predicting the continuous realization function from a finite number of observation points.

The Gaussian process
We consider that the Gaussian process is centered: for all x, E(Y(x)) = 0. The Gaussian process is hence characterized by its covariance function.
The covariance function: the function K : X² → R defined by K(x₁, x₂) = cov(Y(x₁), Y(x₂)).
In most classical cases:
- Stationarity: K(x₁, x₂) = K(x₁ − x₂)
- Continuity: K(x) is continuous ⇒ Gaussian process realizations are continuous
- Decrease: K(x) is decreasing for x ≥ 0 and lim_{|x| → +∞} K(x) = 0

Example of the Matérn 3/2 covariance function on R
The Matérn 3/2 covariance function, for a Gaussian process on R, is parameterized by
- a variance parameter σ² > 0
- a correlation length parameter ℓ > 0
It is defined as
C_{σ²,ℓ}(x₁, x₂) = σ² ( 1 + √6 |x₁ − x₂| / ℓ ) exp( −√6 |x₁ − x₂| / ℓ )
Interpretation: stationarity, continuity, decrease.
[Figure: covariance against |x₁ − x₂| for ℓ = 0.5, 1, 2.]
σ² corresponds to the order of magnitude of the functions that are realizations of the Gaussian process. ℓ corresponds to the speed of variation of the functions that are realizations of the Gaussian process.
Natural generalization on R^d.
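As an illustration (not part of the original slides), here is a minimal Python/NumPy sketch of this Matérn 3/2 covariance with the √6 parameterization above; the function name matern32_cov is just a label chosen here.

```python
import numpy as np

def matern32_cov(x1, x2, sigma2=1.0, ell=1.0):
    """Matern 3/2 covariance on R: sigma2 * (1 + sqrt(6)|x1-x2|/ell) * exp(-sqrt(6)|x1-x2|/ell)."""
    h = np.abs(np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float))
    return sigma2 * (1.0 + np.sqrt(6.0) * h / ell) * np.exp(-np.sqrt(6.0) * h / ell)

# Larger correlation length -> slower decay of the covariance with distance
x = np.linspace(0.0, 4.0, 5)
for ell in (0.5, 1.0, 2.0):
    print(ell, np.round(matern32_cov(0.0, x, sigma2=1.0, ell=ell), 3))
```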

Covariance function estimation
Parameterization: covariance function model {σ² K_θ, σ² ≥ 0, θ ∈ Θ} for the Gaussian process Y. σ² is the variance parameter, θ is the multidimensional correlation parameter, and K_θ is a stationary correlation function.
Observations: Y is observed at x₁, ..., x_n ∈ X, yielding the Gaussian vector y = (Y(x₁), ..., Y(x_n)).
Estimation objective: build estimators σ̂²(y) and θ̂(y).

Maximum Likelihood for estimation
The Gaussian likelihood function of the observation vector y is explicit.
Maximum Likelihood: define R_θ as the correlation matrix of y = (Y(x₁), ..., Y(x_n)) with correlation function K_θ and σ² = 1. The Maximum Likelihood estimator of (σ², θ) is
(σ̂²_ML, θ̂_ML) ∈ argmin_{σ² ≥ 0, θ ∈ Θ} (1/n) [ ln |σ² R_θ| + (1/σ²) yᵗ R_θ⁻¹ y ]
Numerical optimization, with an O(n³) criterion.
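A minimal sketch (not from the slides) of this criterion in Python, taking the one-dimensional Matérn 3/2 correlation from the previous sketch as the model K_θ with θ = ℓ; for fixed ℓ the variance is profiled out as σ̂² = yᵗ R_ℓ⁻¹ y / n, a standard device that the slide does not spell out. Names such as neg_profile_likelihood are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def matern32_corr(X, ell):
    """Correlation matrix R_theta of y at 1-D points X (sigma2 = 1), Matern 3/2 with length ell."""
    h = np.abs(X[:, None] - X[None, :])
    return (1.0 + np.sqrt(6.0) * h / ell) * np.exp(-np.sqrt(6.0) * h / ell)

def neg_profile_likelihood(ell, X, y):
    """ML criterion (1/n)[ln|sigma2 R| + y^T (sigma2 R)^{-1} y] with sigma2 = y^T R^{-1} y / n plugged in."""
    n = len(y)
    R = matern32_corr(X, ell) + 1e-10 * np.eye(n)     # small jitter for numerical stability
    L = np.linalg.cholesky(R)                         # O(n^3) factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    sigma2 = y @ alpha / n                            # profiled variance estimate
    log_det_R = 2.0 * np.sum(np.log(np.diag(L)))
    return np.log(sigma2) + log_det_R / n             # up to an additive constant

def fit_ml(X, y):
    """Numerical optimization of the (profiled) ML criterion over the correlation length."""
    res = minimize_scalar(neg_profile_likelihood, bounds=(1e-2, 1e2),
                          args=(X, y), method="bounded")
    ell_hat = res.x
    R = matern32_corr(X, ell_hat) + 1e-10 * np.eye(len(y))
    sigma2_hat = y @ np.linalg.solve(R, y) / len(y)
    return sigma2_hat, ell_hat

# Example: estimate (sigma2, ell) from 50 observations of a smooth function
X = np.sort(np.random.default_rng(0).uniform(0.0, 10.0, 50))
y = np.sin(X)
print(fit_ml(X, y))
```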

Cross Validation for estimation
Leave-One-Out predictions and variances:
ŷ_{θ,i,−i} = E_{σ²,θ}( Y(x_i) | y₁, ..., y_{i−1}, y_{i+1}, ..., y_n )
σ² c²_{θ,i,−i} = var_{σ²,θ}( Y(x_i) | y₁, ..., y_{i−1}, y_{i+1}, ..., y_n )
Leave-One-Out criteria we study:
θ̂_CV ∈ argmin_{θ ∈ Θ} (1/n) Σ_{i=1}^n ( y_i − ŷ_{θ,i,−i} )²
and
σ̂²_CV = (1/n) Σ_{i=1}^n ( y_i − ŷ_{θ̂_CV,i,−i} )² / c²_{θ̂_CV,i,−i}
Robustness: we show that Cross Validation can be preferable to Maximum Likelihood when the covariance function model is misspecified.
Bachoc F., Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification, Computational Statistics and Data Analysis 66 (2013) 55-69.

Virtual Leave-One-Out formulas
Let R_θ be the covariance matrix of y = (y₁, ..., y_n) with correlation function K_θ and σ² = 1.
Virtual Leave-One-Out:
y_i − ŷ_{θ,i,−i} = ( R_θ⁻¹ y )_i / ( R_θ⁻¹ )_{i,i}  and  c²_{θ,i,−i} = 1 / ( R_θ⁻¹ )_{i,i}
O. Dubrule, Cross Validation of Kriging in a Unique Neighborhood, Mathematical Geology, 1983.
Using the virtual Cross Validation formulas:
θ̂_CV ∈ argmin_{θ ∈ Θ} (1/n) yᵗ R_θ⁻¹ diag(R_θ⁻¹)⁻² R_θ⁻¹ y
and
σ̂²_CV = (1/n) yᵗ R_{θ̂_CV}⁻¹ diag(R_{θ̂_CV}⁻¹)⁻¹ R_{θ̂_CV}⁻¹ y
Same computational cost as ML.
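A small illustrative sketch of these virtual Leave-One-Out formulas; the helper names are ours, R is a correlation matrix built from any stationary correlation model (for instance the Matérn 3/2 above), and the explicit inverse is kept only to stay close to the formulas, with the same O(n³) cost as the ML criterion.

```python
import numpy as np

def virtual_loo(R, y):
    """Virtual LOO: residuals y_i - yhat_{theta,i,-i} = (R^{-1} y)_i / (R^{-1})_{ii}, variances c2 = 1/(R^{-1})_{ii}."""
    R_inv = np.linalg.inv(R)      # explicit inverse for clarity
    d = np.diag(R_inv)
    return (R_inv @ y) / d, 1.0 / d

def cv_criterion(R, y):
    """Leave-One-Out mean square prediction error (1/n) sum_i (y_i - yhat_{theta,i,-i})^2."""
    residuals, _ = virtual_loo(R, y)
    return np.mean(residuals ** 2)

def sigma2_cv(R_hat, y):
    """Variance estimator (1/n) sum_i (y_i - yhat_i)^2 / c2_i, with R_hat built at theta = theta_hat_CV."""
    residuals, c2 = virtual_loo(R_hat, y)
    return np.mean(residuals ** 2 / c2)

# Usage: build R_theta from a stationary correlation model and minimize cv_criterion over theta
# with a numerical optimizer, then plug the minimizer into sigma2_cv.
```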

Prediction with fixed covariance function
The Gaussian process Y is observed at x₁, ..., x_n and predicted at x_new. Let y = (Y(x₁), ..., Y(x_n))ᵗ. Once the covariance function has been estimated and fixed:
- R is the covariance matrix of Y at x₁, ..., x_n
- r is the covariance vector of Y between x₁, ..., x_n and x_new
Prediction: Ŷ(x_new) := E( Y(x_new) | Y(x₁), ..., Y(x_n) ) = rᵗ R⁻¹ y.
Predictive variance: var( Y(x_new) | Y(x₁), ..., Y(x_n) ) = E[ ( Y(x_new) − Ŷ(x_new) )² ] = var(Y(x_new)) − rᵗ R⁻¹ r.
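The matrix-vector formulas above translate directly into code; the sketch below (illustrative only, with a generic vectorized covariance function K passed in) computes the Kriging mean and predictive variance at one new point.

```python
import numpy as np

def krige(K, X, y, x_new):
    """Kriging prediction at x_new for a fixed covariance function K (vectorized over its arguments)."""
    R = K(X[:, None], X[None, :])                        # covariance matrix at x_1, ..., x_n
    r = K(X, x_new)                                      # covariance vector between x_1, ..., x_n and x_new
    mean = r @ np.linalg.solve(R, y)                     # r^T R^{-1} y
    var = K(x_new, x_new) - r @ np.linalg.solve(R, r)    # var(Y(x_new)) - r^T R^{-1} r
    return mean, var

# Example with a Matern 3/2 covariance (sigma2 = 1, ell = 1), as defined earlier
def K(a, b):
    h = np.abs(a - b)
    return (1.0 + np.sqrt(6.0) * h) * np.exp(-np.sqrt(6.0) * h)

X = np.array([0.0, 0.5, 1.2, 2.0, 3.1])
y = np.sin(X)
mean, var = krige(K, X, y, x_new=0.8)
print(mean, var)   # prediction mean and predictive variance at x_new = 0.8
```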

Illustration of prediction
[Figure: observations, prediction mean, 95% confidence intervals, and conditional realizations.]

Application to computer experiments
Computer model: a computer model, computing a given variable of interest, corresponds to a deterministic function R^d → R. Evaluations of this function are time consuming. Examples: simulation of a nuclear fuel pin, of thermal-hydraulic systems, of components of a car, of a plane...
Kriging model for computer experiments. Basic idea: representing the code function by a realization of a Gaussian process (a Bayesian framework on a fixed function).
What we obtain:
- a metamodel of the code: the Kriging prediction function approximates the code function, and its evaluation cost is negligible
- an error indicator with the predictive variance
- a full conditional Gaussian process, enabling goal-oriented iterative strategies for optimization, failure domain estimation, small probability problems, code calibration...

Conclusion
Kriging models:
- The covariance function characterizes the Gaussian process.
- It is estimated first; here we consider Maximum Likelihood and Cross Validation estimation.
- Then we can compute predictions and predictive variances with explicit matrix-vector formulas.
- Widely used for computer experiments.

Outline
1. Kriging models with Gaussian processes
2. Asymptotic analysis of covariance function estimation and of spatial sampling impact

Framework and objectives
Estimation: we do not make use of the distinction between σ² and θ. Hence we use the set {K_θ, θ ∈ Θ} of stationary covariance functions for the estimation.
Well-specified model: the true covariance function K of the Gaussian process belongs to the set {K_θ, θ ∈ Θ}. Hence K = K_{θ₀} for some θ₀ ∈ Θ.
Objectives:
- Study the consistency and asymptotic distribution of the Cross Validation estimator
- Confirm that, asymptotically, Maximum Likelihood is more efficient
- Study the influence of the spatial sampling on the estimation

Spatial sampling for covariance parameter estimation
Spatial sampling: the initial design of experiments for Kriging. It has been shown that irregular spatial sampling is often an advantage for covariance parameter estimation.
Stein M., Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York, 1999, Ch. 6.9.
Zhu Z., Zhang H., Spatial sampling design under the infill asymptotics framework, Environmetrics 17 (2006) 323-337.
Our question: can we confirm this finding in an asymptotic framework?

Two asymptotic frameworks for covariance parameter estimation
Asymptotics (number of observations n → +∞) is an active area of research (for the Maximum Likelihood estimator). Two main asymptotic frameworks:
- fixed-domain asymptotics: the observation points are dense in a bounded domain
- increasing-domain asymptotics: a minimum spacing exists between the observation points, on an infinite observation domain
[Figure: example samplings in the two frameworks.]

Choice of the asymptotic framework
Comments on the two asymptotic frameworks:
Fixed-domain asymptotics: studied from the 1980s-90s onwards; a fruitful theory.
Stein M., Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York, 1999.
However, when convergence in distribution is proved, the asymptotic distribution does not depend on the spatial sampling, so it is impossible to compare sampling techniques for estimation in this context.
Increasing-domain asymptotics: asymptotic normality proved for Maximum Likelihood (under conditions that are not simple to check); no results for CV.
Sweeting T., Uniform asymptotic normality of the maximum likelihood estimator, Annals of Statistics 8 (1980) 1375-1381.
Mardia K., Marshall R., Maximum likelihood estimation of models for residual covariance in spatial regression, Biometrika 71 (1984) 135-146.
⇒ We study increasing-domain asymptotics for ML and CV under irregular sampling.

The randomly perturbed regular grid that we study
Observation point i: v_i + ɛ X_i, where
- (v_i)_{i ∈ N}: regular square grid of step one in dimension d
- (X_i)_{i ∈ N}: iid with symmetric distribution on [−1, 1]^d
- ɛ ∈ (−1/2, 1/2) is the regularity parameter of the grid. ɛ = 0: regular grid. ɛ close to 1/2: irregularity is maximal.
[Figure: illustration with ɛ = 0, 1/8, 3/8 on an 8 × 8 grid.]
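A small sketch (illustrative only) of generating such a perturbed grid in Python; the uniform distribution on [−1, 1]^d is just one admissible choice of symmetric perturbation, and the helper name perturbed_grid is ours.

```python
import numpy as np

def perturbed_grid(n_per_dim, d, eps, seed=None):
    """Points v_i + eps * X_i: regular grid of step one in dimension d, perturbed by iid X_i on [-1, 1]^d."""
    rng = np.random.default_rng(seed)
    axes = [np.arange(1.0, n_per_dim + 1.0)] * d
    V = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, d)  # regular grid (v_i)
    X = rng.uniform(-1.0, 1.0, size=V.shape)                                 # symmetric perturbations
    return V + eps * X

# Grids of 8 x 8 points with increasing irregularity, as in the illustration (eps = 0, 1/8, 3/8)
for eps in (0.0, 1.0 / 8.0, 3.0 / 8.0):
    pts = perturbed_grid(8, 2, eps, seed=0)
    print(eps, pts.shape)
```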

Consistency and asymptotic normality
Under general summability, regularity and identifiability conditions, we show:
Proposition (for ML):
- a.s. convergence of the random Fisher information: the random trace (1/(2n)) Tr( R_{θ₀}⁻¹ (∂R_{θ₀}/∂θ_i) R_{θ₀}⁻¹ (∂R_{θ₀}/∂θ_j) ) converges a.s. to the element (I_ML)_{i,j} of a p × p deterministic matrix I_ML as n → +∞
- asymptotic normality: with Σ_ML = I_ML⁻¹, √n ( θ̂_ML − θ₀ ) → N(0, Σ_ML) in distribution
Proposition (for CV): same result, with more complex expressions for the asymptotic covariance matrix Σ_CV.

Main ideas for the proof
A central tool: because of the minimum distance between observation points, the eigenvalues of the random matrices involved are uniformly lower and upper bounded.
- For consistency: bounding from below the difference of M-estimator criteria between θ and θ₀ by the integrated square difference between K_θ and K_{θ₀}
- For almost-sure convergence of random traces: block-diagonal approximation of the random matrices involved and the Cauchy criterion
- For asymptotic normality of the criterion gradient: an almost-sure (with respect to the random perturbations) Lindeberg-Feller Central Limit Theorem
- Conclude with the classical M-estimator method

Analysis of the asymptotic covariance matrices
The asymptotic covariance matrices Σ_ML and Σ_CV depend only on the regularity parameter ɛ. In the sequel, we study the functions ɛ → Σ_ML and ɛ → Σ_CV.
Matérn model in dimension one:
K_{ℓ,ν}(x₁, x₂) = 1 / ( Γ(ν) 2^{ν−1} ) ( 2√ν |x₁ − x₂| / ℓ )^ν K_ν( 2√ν |x₁ − x₂| / ℓ ),
with Γ the Gamma function and K_ν the modified Bessel function of the second kind.
We consider:
- the estimation of ℓ when ν = ν₀ is known
- the estimation of ν when ℓ = ℓ₀ is known
⇒ We study scalar asymptotic variances.
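For reference, a hedged Python sketch of this Matérn correlation using scipy.special (kv is the modified Bessel function of the second kind); the function name matern_corr and the sanity check against the Matérn 3/2 form of the earlier slide are ours.

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(x1, x2, ell, nu):
    """Matern correlation (1 / (Gamma(nu) 2^(nu-1))) * (2 sqrt(nu)|x1-x2|/ell)^nu * K_nu(2 sqrt(nu)|x1-x2|/ell)."""
    t = 2.0 * np.sqrt(nu) * np.abs(np.asarray(x1, dtype=float) - x2) / ell
    t_safe = np.where(t > 0.0, t, 1.0)                 # avoid evaluating K_nu at 0
    val = t_safe ** nu * kv(nu, t_safe) / (gamma(nu) * 2.0 ** (nu - 1.0))
    return np.where(t > 0.0, val, 1.0)                 # the correlation equals 1 at distance 0

# Sanity check: nu = 3/2 recovers the Matern 3/2 correlation (1 + sqrt(6) h / ell) exp(-sqrt(6) h / ell)
h = 0.7
print(matern_corr(0.0, h, ell=1.0, nu=1.5),
      (1.0 + np.sqrt(6.0) * h) * np.exp(-np.sqrt(6.0) * h))
```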

Results for the Matérn model (1/2)
Estimation of ℓ when ν = ν₀ is known.
[Figure: level plots of the ratio Σ_ML,CV(ɛ = 0) / Σ_ML,CV(ɛ = 0.45) over (ℓ₀, ν₀), for ML (left) and CV (right).]
Perturbations of the regular grid are always beneficial for ML.

Results for the Matérn model (2/2)
Estimation of ν when ℓ = ℓ₀ is known.
[Figure: level plots of the ratio Σ_ML,CV(ɛ = 0) / Σ_ML,CV(ɛ = 0.45) over (ℓ₀, ν₀), for ML (left) and CV (right).]
Perturbations of the regular grid are always beneficial for ML and CV.

Conclusion on covariance function estimation and spatial sampling
- CV is consistent and has the same rate of convergence as ML
- We confirm (not presented here) that ML is more efficient
- Strong irregularity in the sampling is an advantage for covariance function estimation
- With ML, irregular sampling is more often an advantage than with CV
- We show, however, that regular sampling is better for prediction with a known covariance function ⇒ motivation for using space-filling samplings augmented with some clustered observation points
For further details:
Z. Zhu and H. Zhang, Spatial Sampling Design Under the Infill Asymptotics Framework, Environmetrics 17 (2006) 323-337.
L. Pronzato and W. G. Müller, Design of computer experiments: space filling and beyond, Statistics and Computing 22 (2012) 681-701.
F. Bachoc, Asymptotic analysis of the role of spatial sampling for covariance parameter estimation of Gaussian processes, Journal of Multivariate Analysis 125 (2014) 1-35.

Some perspectives
Ongoing work: asymptotic analysis of the case of a misspecified covariance function model with purely random sampling.
Other potential perspectives:
- Designing other CV procedures (LOO error weighting, decorrelation, penalty term) to reduce the variance
- Starting to study the fixed-domain asymptotics of CV, in the particular cases where it has been done for ML

French community
GDR MASCOT-NUM: www.gdr-mascotnum.fr
Consortium ReDICE: www.redice-project.org

Thank you for your attention!