arxiv: v1 [stat.ml] 29 Mar 2012

Similar documents
Fast Update of Conditional Simulation Ensembles

Fast Update of Conditional Simulation Ensembles

PARAMETER ESTIMATION FOR FRACTIONAL BROWNIAN SURFACES

Bayesian Transgaussian Kriging

Chapter 4 - Fundamentals of spatial processes Lecture notes

Approximating likelihoods for large spatial data sets

Warped Gaussian processes and derivative-based sequential design for functions with heterogeneous variations

Geostatistics in Hydrology: Kriging interpolation

Covariance to PCA. CS 510 Lecture #8 February 17, 2014

Geostatistics for Gaussian processes

Package KrigInv. R topics documented: September 11, Type Package

Étude de classes de noyaux adaptées à la simplification et à l interprétation des modèles d approximation.

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Towards GP-based optimization with finite time horizon

A full scale, non stationary approach for the kriging of large spatio(-temporal) datasets

Limit Kriging. Abstract

Methods for sparse analysis of high-dimensional data, II

What s for today. Continue to discuss about nonstationary models Moving windows Convolution model Weighted stationary model

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M.

STAT 100C: Linear models

Continuum Limit of Forward Kolmogorov Equation Friday, March 06, :04 PM

Data Preprocessing Tasks

Regression Analysis. Ordinary Least Squares. The Linear Model

Nonparametric Regression With Gaussian Processes

Midterm: CS 6375 Spring 2015 Solutions

Classification 2: Linear discriminant analysis (continued); logistic regression

TESTS ON REINFORCED CONCRETE LOW-RISE SHEAR WALLS UNDER STATIC CYCLIC LOADING

Independence of some multiple Poisson stochastic integrals with variable-sign kernels

Gaussian Processes (10/16/13)

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Simulation of a Gaussian random vector: A propagative approach to the Gibbs sampler

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling

Optimal Designs for Gaussian Process Models via Spectral Decomposition. Ofir Harari

Gaussian random variables inr n

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Stable Process. 2. Multivariate Stable Distributions. July, 2006

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Linear Algebra I. Ronald van Luijk, 2015

Multiple realizations: Model variance and data uncertainty

Coregionalization by Linear Combination of Nonorthogonal Components 1

Lecture 9: Introduction to Kriging

Gaussian vectors and central limit theorem

arxiv: v1 [math.gr] 8 Nov 2008

3 rd class Mech. Eng. Dept. hamdiahmed.weebly.com Fourier Series

Lecture 24: Weighted and Generalized Least Squares

Methods for sparse analysis of high-dimensional data, II

Lines, parabolas, distances and inequalities an enrichment class

A note on the Green s function for the transient random walk without killing on the half lattice, orthant and strip

Introduction to emulators - the what, the when, the why

CHAPTER V. Brownian motion. V.1 Langevin dynamics

A Short Note on the Proportional Effect and Direct Sequential Simulation

Assessing the covariance function in geostatistics

Correcting Variogram Reproduction of P-Field Simulation

Overfitting, Bias / Variance Analysis

Kalman Filter Computer Vision (Kris Kitani) Carnegie Mellon University

Dale L. Zimmerman Department of Statistics and Actuarial Science, University of Iowa, USA

Linear Regression (9/11/13)

Covariance function estimation in Gaussian process regression

Technical Details about the Expectation Maximization (EM) Algorithm

A Framework for Daily Spatio-Temporal Stochastic Weather Simulation

Decision-Oriented Environmental Mapping with Radial Basis Function Neural Networks

Harmonic Analysis. 1. Hermite Polynomials in Dimension One. Recall that if L 2 ([0 2π] ), then we can write as

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Ch. 12 Linear Bayesian Estimators

UC Berkeley Department of Electrical Engineering and Computer Sciences. EECS 126: Probability and Random Processes

PRINCIPAL COMPONENTS ANALYSIS

Sequential approaches to reliability estimation and optimization based on kriging

Erdős-Renyi random graphs basics

Covariance Matrix Simplification For Efficient Uncertainty Management

REVIEW (MULTIVARIATE LINEAR REGRESSION) Explain/Obtain the LS estimator () of the vector of coe cients (b)

LECTURE 15: SIMPLE LINEAR REGRESSION I

1 Correlation between an independent variable and the error

SELECTED TOPICS CLASS FINAL REPORT

Spatial Backfitting of Roller Measurement Values from a Florida Test Bed

Manifold Learning for Signal and Visual Processing Lecture 9: Probabilistic PCA (PPCA), Factor Analysis, Mixtures of PPCA

Multiple Linear Regression

An example of Bayesian reasoning Consider the one-dimensional deconvolution problem with various degrees of prior information.

Covariance to PCA. CS 510 Lecture #14 February 23, 2018

Linear Model Under General Variance Structure: Autocorrelation

RECURSIVE ESTIMATION AND KALMAN FILTERING

y(x) = x w + ε(x), (1)

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014

Gaussian Processes. 1. Basic Notions

Economics 620, Lecture 2: Regression Mechanics (Simple Regression)

STATISTICAL ORBIT DETERMINATION

[y i α βx i ] 2 (2) Q = i=1

Basics of Point-Referenced Data Models

1 Lecture 6 Monday 01/29/01

Dimension Reduction. David M. Blei. April 23, 2012

A Covariance Conversion Approach of Gamma Random Field Simulation

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.)

The Wishart distribution Scaled Wishart. Wishart Priors. Patrick Breheny. March 28. Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11

Ensemble Kalman filter assimilation of transient groundwater flow data via stochastic moment equations

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

1 Recurrence relations, continued continued

Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization

Time Series Prediction by Kalman Smoother with Cross-Validated Noise Density

Transcription:

Corrected Kriging update formulae for batch-sequential data assimilation Clément Chevalier, David Ginsbourger arxiv:1203.6452v1 [stat.ml] 29 Mar 2012 1 Introduction March 28th, 2012 Recently, a lot of effort has been paid to the efficient computation of Kriging predictors when observations are assimilated sequentially. In particular, Kriging update formulae enabling significant computational savings were derived in [1, 3, 2]. Taking advantage of the previous Kriging mean and variance calculations helps avoiding a costly (n+1) (n+1) matrix inversion when adding one observation to the n already available ones. In addition to traditional update formulae taking into account a single new observation, [2] also proposed formulae for the batch-sequential case, i.e. when r > 1 new observations are simultaneously assimilated. However, the Kriging variance and covariance formulae given without proof in [2] for the batch-sequential case are not correct. In this paper we fix this issue and establish corrected expressions for updated Kriging variances and covariances when assimilating several observations in parallel. 2 Kriging update formulae for the parallel case 2.1 notations and formulae In this section we give a counter-example for the Kriging variance update formula proposed in [2]. Let us first introduce the notations of this paper and recall these formulae. When n observations, at locations x 1,...,x n D, of a real valued random field Z indexed by D R d are available, [2] denotes respectively by Ẑ(x) n data and σ 2 n(x) the Kriging mean and variance at x D based on these n observations. The corresponding Kriging covariance is denoted by σ n : (x,y) D 2 σ n (x,y). If k > 1 new observations, at points x n+1,...,x n+k, are available, the parallel Kriging update formulae given in [2] Institut de Radioprotection et de Sûreté Nucléaire (IRSN), Fontenay-aux-Roses, France IMSV, Department of Mathematics and Statistics, University of Berne, Switzerland 1

follow: Ẑ(x) n+k data = Ẑ(x) n data + σ 2 n+k (x) = σ2 n(x) σ n+k (x,y) = σ n (x,y) ) λ n+i n+k (x) (Z(x n+i ) Ẑ(x n+i) n data, (1) λ 2 n+i n+k (x)σ2 n(x n+i ), (2) λ n+i n+k (x)λ n+i n+k (y)σn 2 (x n+i) (3) where λ n+i n+k (x) denotes the Kriging weight of Z(x n+i ) when predicting Z(x) relying on n + k observations. In [2], these formulae are proven only for k = 1. For k > 1, they are quickly justified with an argument based on Pythagoras theorem. In fact, the assumptions for using this theorem are not satisfied, as we will detail here. We now give a counter-example for Equation (2) and then propose fixed expressions. 2.2 A counterexample for Equation (2) Let us consider for the sake of simplicity a case where Z is a one-dimensional centered random process indexed by D = [0,1], with covariance kernel C(x,y) := σ 0 (x,y) = min(x, y) (Wiener process, or Brownian Motion). Note that even though this process is non-stationary, Simple Kriging equations are applicable to it (Written in the general case of a non-stationary covariance kernel, as allowed in [2] s results and notations). Assuming n = 0 initial observations and k = 2 new observations at the points x 1 = 1 2, x 2 = 1, the Kriging weights for a prediction at x = 3 4 write: (λ 1 2 (x),λ 2 2 (x)) = (C(x,x 1 ),C(x,x 2 )) ( ) 1 C(x1,x 1 ) C(x 1,x 2 ) = C(x 2,x 1 ) C(x 2,x 2 ) ( 1 2, 1 ) 2 leading to a Kriging mean Ẑ(x) n+k data = 1 2 Z(x 1)+ 1 2 Z(x 2), and to a Kriging variance: σ 2 2(x) = C(x,x) (λ 1 2 (x),λ 2 2 (x)) Now, using Eq. (2) would lead to ( ) C(x1,x 1 ) C(x 1,x 2 ) (λ C(x 2,x 1 ) C(x 2,x 2 ) 1 2 (x),λ 2 2 (x)) T = 1 8 σ 2 2 (x) = σ2 0 (x) λ2 1 2 (x)σ2 0 (x 1) λ 2 2 2 (x)σ2 0 (x 2) = 3 4 1 1 42 1 4 = 3 8 and we see that the two expressions do not match. More precisely, the conditional variance obtained by using Eq. (2) is too large, as if the contribution of x 1 and x 2 to decreasing the variance would have been underestimated. As we will see now, this is indeed the case, and the missing part is related to conditional covariances. 2

2.3 Corrected Kriging variance and covariance update formulae In this section, we establish corrected update formulae that may be used in the batchsequential case instead of the formulae (2) and (3). To improve the readability of the properties and their proofs, we adopt the following simplified notations: X old := {x 1,...,x n }, and X new := {x n+1,...,x n+k }, Z old := (Z(x 1 ),...,Z(x n )), and Z new := (Z(x n+1 ),...,Z(x n+k )), λ new,old (x) := (λ 1 n+k (x),...,λ n n+k (x)) T, λ new,new (x) := (λ n+1 n+k (x),...,λ n+k n+k (x)) T, σold 2 (x) := σ2 n(x), σnew(x) 2 := σn+k 2 (x), andsimilarlyfortheconditionalcovariances. The conditional covariance matrix of Z new knowing Z old is denoted by Σ new, For conciseness and coherence, Ẑ(x) n data and Ẑ(x) n+k data are also denoted by Ẑ(x) old and Ẑ(x) new, respectively. The corrected update formulae are given below: Proposition 1. (Corrected Kriging update equations for the parallel case) Ẑ(x) new = Ẑ(x) old +λ new,new (x) T (Z new Ẑ(X new) old ) (4) σ 2 new(x) = σ 2 old (x) λ new,new(x) T Σ new λ new,new (x) (5) σ new (x,y) = σ old (x,y) λ new,new (x) T Σ new λ new,new (y) (6) The proofs above use a Gaussian assumption on the field Z, enabling a convenient interpretation of the Kriging mean and variance in terms of conditional expectation and variance. A key property is that even though the conditional distribution interpretation doesn t hold in non-gaussian cases, the formulae for the Kriging weights remain valid whatever the assumed square-integrable distribution for the field; we are just using here that best linear prediction and conditioning coincide in the Gaussian case. Furthermore, the results hold as well for the cases of Ordinary and Universal Kriging (written in terms of covariances, for a square-integrable field, not with variograms), relying on their well-known Bayesian construction with an improper prior on the trend coefficient(s). Proof. The first equation follows from an application of the law of total expectation: Ẑ(x) old = E[Z(x) Z old ] = E[E[Z(x) Z old,z new ] Z old ] = E[λ new,old (x) T Z old +λ new,new (x) T Z new ] Z old ] = λ new,old (x) T Z old +λ new,new (x) T Ẑ(X new ) old = λ new,old (x) T Z old +λ new,new (x) T Z new λ new,new (x) T Z new +λ new,new (x) T Ẑ(X new ) old = Ẑ(x) new λ new,new (x) T (Z new Ẑ(X new ) old ) 3

Similarly, using the law of total variance delivers: σ 2 old (x) = Var[Z(x) Z old] = E[Var[Z(x) Z old,z new ] Z old ]+Var[E[Z(x) Z old,z new ] Z old ] = Var[Z(x) Z old,z new ]+Var[λ new,old (x) T Z old +λ new,new (x) T Z new Z old ] = σ 2 new(x)+λ new,new (x) T Σ new λ new,new (x) which proves the second equation. The third equation comes with the same method, using the law of total covariance. Proposition 2. (Kriging weights expressed in terms of conditional covariance) Σ new λ new,new (x) = σ old (X new,x) (7) Corollary 1. (Corrected Kriging update equations in terms of conditional covariance) Ẑ(x) new = Ẑ(x) old +σ old (X new,x) T Σ 1 new (Z new Ẑ(X new) old ) (8) σnew 2 (x) = σ2 old (x) σ old(x new,x) T Σ 1 new σ old(x new,x) (9) σ new (x,y) = σ old (x,y) σ old (X new,x) T Σ 1 new σ old(x new,y) (10) Proof. Using the orthogonal projection interpretation of the conditional expectation, {}}{ Z(x) = E(Z(x) Z old,z new )+ Z(x) E(Z(x) Z old,z new ) =:ε = λ new,old (x) Z old +λ new,new (x) Z new +ε, with ε centered, and independent of Z old and Z new. Let us now calculate the conditional covariance between Z(x) and Z new knowing the observations Z old : σ old (X new,x) := cov(z new,z(x) Z old ) = 0+cov (Z new,λ new,new (x) ) Z new Zold +cov(z new,ε Z old ) =Σ new λ new,new (x)+cov(ε,z new Z old ) Noting that cov(z new,ε Z old ) = 0, the latter equation proves Proposition 2. Eqs. 8, 9, and 9 of the corollary directly follow by plugging in Eq. 2 into Eqs. 4, 5, 6. 2.4 Interpretation on the counter-example The main difference between Equations 2 and 5 is that Eq. 5 takes into account conditional Kriging covariances. Coming back to the counter example, Eq. 5 delivers: σ 2 2(x) = σ 2 0(x) λ 2 1 2 (x)σ2 0(x 1 ) λ 2 2 2 (x)σ2 0(x 2 ) 2λ 1 2 (x)λ 2 2 (x)σ 0 (x 1,x 2 ) = 3 4 1 1 42 1 4 21 4 1 2 = 1 8 4

which is the correct variance, as obtained earlier with regular Simple Kriging equations. Note that Eq. 2 would be correct if the matrix Σ new were diagonal, which has no particular reason to happen in practical applications. The Pytagorean theorem invoked in [2], enabling to decompose a variance into a sum of (weighted) variances, cannot hence be applied in the general case because of non-independence between Kriging residuals. 3 Conclusion In this paper we derived corrected Kriging update formulae, fixing the incorrect formulae 2 and 3 given in [2], where a part of the update involving conditional covariances had been neglected. A simple interpretation of the new update formulae is that they correspond to Simple Kriging equations, where the underling covariance kernel would be the conditional covariance kernel before update, i.e σ old (.,.). These formulae enable important computational savings (avoiding a large matrix inversion) and may perfectly be adapted to further frameworks such as co-kriging. Acknowledgements: The authors would like to thank Dr. Julien Bect(Ecole Supérieure d Electricité) for drawing their attention to the conditional covariance interpretation of Kriging weights in the non-parallel case. Clément Chevalier acknowledges support from IRSN and the ReDICE consortium. David Ginsbourger acknowledges support from the IMSV, University of Berne. References [1] R. J. Barnes and A. Watson. Efficient updating of Kriging estimates and variances. Mathematical geology, 24(1):129 133, 1992. [2] X. Emery. The Kriging update equations and their application to the selection of neighboring data. Computational Geosciences, 13(1):211 219, 2009. [3] H. Gao, J. Wang, and P. Zhao. The updated Kriging variance and optimal sample design. Mathematical Geology, 28:295 313, 1996. 5