A full scale, non stationary approach for the kriging of large spatio(-temporal) datasets

Thomas Romary, Nicolas Desassis & Francky Fouedjio
Mines ParisTech, Centre de Géosciences, Equipe Géostatistique
Réseau RESSTE, 11 May 2015
T. Romary et al. (Mines ParisTech), RESSTE

Introduction
$Z(x, t) = \mu(x, t) + S(x, t)$, a second-order spatio(-temporal) random field
Inference: possible non-stationarity in the covariance structure
Prediction: complexity $O(n^3)$, storage $O(n^2)$

Introduction: Notations
$n$: size of the dataset
$z = \{z(x_1), \dots, z(x_n)\}$: data
$C$: covariance function
$\mathbf{C}$: covariance matrix associated with the data
$f$: deterministic covariates

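Introduction: Universal kriging system
Matrix form. We have
\[
f_0 = \begin{pmatrix} f_{00} \\ \vdots \\ f_{0L} \end{pmatrix}, \quad
F = \begin{pmatrix} f_{10} & \dots & f_{1L} \\ \vdots & & \vdots \\ f_{n0} & \dots & f_{nL} \end{pmatrix}, \quad
\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_L \end{pmatrix}, \quad
Z^* = \Lambda^t z
\]
Kriging system and variance:
\[
\begin{pmatrix} \mathbf{C} & F \\ F^t & 0 \end{pmatrix}
\begin{pmatrix} \Lambda \\ \mu \end{pmatrix}
= \begin{pmatrix} C_0 \\ f_0 \end{pmatrix}
\quad \text{and} \quad
\sigma^2_{UK} = C(0) - \Lambda^t C_0 - \mu^t f_0
\]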

Introduction: Universal kriging system
\[
Z^* = C_0^t \mathbf{C}^{-1} z + (f_0^t - C_0^t \mathbf{C}^{-1} F)(F^t \mathbf{C}^{-1} F)^{-1} F^t \mathbf{C}^{-1} z
\]
Idea: $\mathbf{C} = A + P B P^t$, where
$A$ is sparse
$P$ is an $n \times p$ matrix with $p \ll n$
Sherman-Woodbury-Morrison formula:
\[
\mathbf{C}^{-1} = A^{-1} - A^{-1} P (B^{-1} + P^t A^{-1} P)^{-1} P^t A^{-1}
\]
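The inversion identity above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a diagonal A (the simplest sparse case), and the sizes, the random P and B, and the name `woodbury_solve` are illustrative choices only.

```python
import numpy as np

def woodbury_solve(a_diag, P, B, rhs):
    """Solve (A + P B P^t) x = rhs with A = diag(a_diag), using the Woodbury identity."""
    Ainv_rhs = rhs / a_diag                    # A^{-1} rhs, O(n) because A is diagonal here
    Ainv_P = P / a_diag[:, None]               # A^{-1} P, O(n p)
    small = np.linalg.inv(B) + P.T @ Ainv_P    # p x p matrix B^{-1} + P^t A^{-1} P
    return Ainv_rhs - Ainv_P @ np.linalg.solve(small, P.T @ Ainv_rhs)

# Illustrative sizes and random inputs (assumptions, not taken from the slides)
rng = np.random.default_rng(0)
n, p = 500, 10
a_diag = rng.uniform(0.5, 1.5, size=n)         # diagonal of A
P = rng.normal(size=(n, p))                    # n x p basis matrix
G = rng.normal(size=(p, p))
B = G @ G.T + p * np.eye(p)                    # symmetric positive definite p x p matrix
z = rng.normal(size=n)

x_fast = woodbury_solve(a_diag, P, B, z)
x_dense = np.linalg.solve(np.diag(a_diag) + P @ B @ P.T, z)   # O(n^3) reference
max_abs_diff = float(np.max(np.abs(x_fast - x_dense)))
```

Only a p x p system is solved densely; with a general sparse A, the divisions by `a_diag` would be replaced by sparse solves.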


Introduction: Spatio-temporal framework
Satellite / remote sensing data: $\{Z(x, t) : x \in \mathcal{X} \subset \mathbb{R}^d,\ t = 1, \dots, T\}$, $d = 2, 3$
Spatio-temporal mixed effect model (Katzfuss and Cressie [2011]):
\[
Z(x, t) = \mu(x, t) + S(x, t) + \sum_{i=1}^{p} \xi_i(t) P_i(x)
\]
where
$\mu$: mean function
$S$: second-order random function
$\Xi_t = (\xi_1^t, \dots, \xi_p^t)$, with $\xi_i^t = \xi_i(t)$: centered, square-integrable random variables
$P_i(x)$: basis functions

Introduction: Spatio-temporal framework
Temporal structure:
$\mu(x, t) = f_t(x)^t \beta_t$
$\Xi_t = H_t \Xi_{t-1} + u_t$, such that $\mathrm{var}(\Xi_t \mid \Xi_{t-1}) = B_t$
$S(x, t) = \sigma(x, t)\, \varepsilon_{xt}$, where $\varepsilon$ is a Gaussian white noise
Inference by EM algorithm, see Katzfuss and Cressie [2011]
Prediction by kriging, filtering, smoothing
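To make the temporal structure concrete, here is a small simulation sketch of the state equation and the resulting field. Everything specific in it is an assumption for illustration: a zero mean, time-constant H and B, a constant noise level, and Gaussian bumps as basis functions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, T = 200, 5, 30                       # sites, basis functions, time steps (illustrative)

# Hypothetical basis: Gaussian bumps centred at random locations of the unit square
x = rng.uniform(size=(n, 2))
centers = rng.uniform(size=(p, 2))
d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
P = np.exp(-d2 / (2 * 0.2 ** 2))           # n x p matrix with entries P_i(x)

H = 0.8 * np.eye(p)                        # assumed time-constant propagator H_t = H
B = 0.5 * np.eye(p)                        # assumed innovation covariance B_t = B
sigma = 0.1                                # assumed constant sigma(x, t)
chol_B = np.linalg.cholesky(B)

state = np.zeros(p)
Z = np.zeros((T, n))
for t in range(T):
    state = H @ state + chol_B @ rng.normal(size=p)   # Xi_t = H Xi_{t-1} + u_t
    Z[t] = P @ state + sigma * rng.normal(size=n)     # Z = sum_i xi_i(t) P_i(x) + S, with mu = 0
```

Each row of `Z` is one simulated time slice at the n sites.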

Introduction: Spatial-only framework
Assume $E(Z(x)) = 0$, where
\[
Z(x) = S(x) + \sum_{i=1}^{p} \xi_i P_i(x) + \varepsilon(x)
\]
\[
\mathrm{cov}(Z(x), Z(y)) = C(x, y) = D(x, y) + P(x)^t B P(y) + \sigma_x^2 \delta_{xy}
\]
\[
\mathrm{cov}(S(x), S(y)) = D(x, y) = C_{NS}(x, y)\, C_{CS}(x, y)
\]
$C_{NS}$ is a non-stationary covariance function (Fouedjio et al. [2014])
$C_{CS}$ is a compactly supported covariance function (Furrer et al. [2006], Gneiting [2002]), with fixed range parameter

Inference
Several steps:
1. Fit $D$
2. Select $p$ basis functions from a dictionary by minimizing $\|z - P\beta\|_2^2 + \lambda \|\beta\|_1$
3. Estimate $B$

Inference: D term
Compute and fit a local kernel variogram estimator at a set of knots
Interpolate the parameters over the domain to get $D(x, y) = C_{NS}(x, y)\, C_{CS}(x, y)$
See Fouedjio et al. [2014] for more details

Inference: Selection of basis functions
Dictionary: multi-resolution basis made of columns of compactly supported covariance matrices (including anisotropic ones)
Selection by LASSO (glmnet), minimizing $\|z - P\beta\|_2^2 + \lambda \|\beta\|_1$
$\lambda$ is set by cross-validation
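The selection step can be sketched as follows. This is not glmnet: it is a plain coordinate-descent solver for the same penalized criterion, run on a random stand-in dictionary, with a fixed $\lambda$ instead of the cross-validated one. The function names, sizes and the value of `lam` are hypothetical.

```python
import numpy as np

def soft_threshold(v, thr):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(v) * max(abs(v) - thr, 0.0)

def lasso_cd(P, z, lam, n_iter=200):
    """Minimise ||z - P beta||_2^2 + lam * ||beta||_1 by cyclic coordinate descent."""
    m = P.shape[1]
    beta = np.zeros(m)
    col_sq = (P ** 2).sum(axis=0)              # squared norm of each dictionary column
    resid = z.astype(float).copy()             # residual z - P beta (beta = 0 at start)
    for _ in range(n_iter):
        for j in range(m):
            rho = P[:, j] @ resid + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam / 2.0) / col_sq[j]
            resid += P[:, j] * (beta[j] - new)  # keep the residual in sync with beta
            beta[j] = new
    return beta

# Stand-in dictionary with 30 candidate columns, 3 of which are truly active
rng = np.random.default_rng(2)
n, m = 200, 30
P = rng.normal(size=(n, m))
beta_true = np.zeros(m)
beta_true[[2, 11, 25]] = [3.0, -2.0, 1.5]
z = P @ beta_true + 0.1 * rng.normal(size=n)

beta_hat = lasso_cd(P, z, lam=40.0)            # lam fixed by hand, not cross-validated
selected = np.flatnonzero(beta_hat)            # indices of the retained basis functions
```

The columns indexed by `selected` would then form the matrix $P$ used in the remaining steps.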

Inference: Estimation of B
1. Compute a non-parametric estimator of the covariance at a set of knots:
\[
\hat{C}_K(x, y) = \frac{1}{\Sigma_{Kxy}} \sum_{i,j} K_{ij}\, z(x_i)\, z(x_j),
\]
where $K_{ij} = K_h(x - x_i, y - x_j)$ and $\Sigma_{Kxy} = \sum_{i,j} K_{ij}$
2. Compute $P = QR$
3. Compute $B = R^{-1} Q^T (\hat{C}_K - A_\theta) Q (R^{-1})^T$
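Steps 2 and 3 can be verified on an idealised case. The sketch below assumes that $A_\theta$ is the small-scale part of the covariance at the knots and replaces the kernel estimator $\hat{C}_K$ by the exact covariance, so that the formula should return $B$ exactly. The sizes and matrices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
k, p = 60, 6                                # number of knots and of basis functions (illustrative)
P = rng.normal(size=(k, p))                 # basis functions evaluated at the knots
G = rng.normal(size=(p, p))
B_true = G @ G.T + np.eye(p)                # symmetric positive definite target
A = 0.3 * np.eye(k)                         # stand-in for A_theta (assumed known here)

# Idealised input: exact covariance at the knots instead of the kernel estimator
C_hat = A + P @ B_true @ P.T

Q, R = np.linalg.qr(P)                      # reduced QR: Q is k x p, R is p x p
R_inv = np.linalg.inv(R)
B_hat = R_inv @ Q.T @ (C_hat - A) @ Q @ R_inv.T
```

Since $Q^T P = R$, the product reduces to $R^{-1} R B R^T (R^{-1})^T = B$; with a noisy $\hat{C}_K$ the same formula gives a least-squares type estimate.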

Example
Synthetic data on $[0, 1]^2$, Gaussian RF, centered, with covariance
\[
C(x, y) = 0.3\, \mathrm{sph}(\|x - y\|, 0.1) + P(x)^t B P(y) + 0.1\, \delta_{xy}
\]
where
$P$: 50 basis functions taken at random among 736
$B$: drawn from a Wishart distribution
Comparison between full model, small scale only, basis functions only, true model
Range of the small scale term = 0.05

Example: Reference realization [figure]

Example: Reference realization, sample (5000) [figure]

Example: Building the dictionary
Ingredients:
models: cubic, Wendland 2
ranges: 0.5, 0.3
angles: $0, \pi/6, \pi/3, \pi/2, 2\pi/3, 5\pi/6$
736 basis functions in total

Example: Selection [figure]

Example: Kriging map, full model [figure]

Example: Kriging map, short scale only [figure]

Example: Kriging map, basis only [figure]

Example: Kriging map, true model [figure]

Example: Results
Model            MSE    MSE (true)
Full             0.45   0.005
Taper only       0.48   0.03
Basis only (36)  0.35   0.17
True model       0.45   0

Discussion
Work in progress
Lots of tuning:
knots (basis functions, variogram estimation, covariance estimation)
parameters of the basis functions
bandwidth parameters
Prediction variances

References
S. Banerjee, A. E. Gelfand, A. O. Finley, and H. Sang. Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society. Series B, 70(4):825-848, 2008.
N. Cressie and G. Johannesson. Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society. Series B, 70:209-226, 2008.
F. Fouedjio, N. Desassis, and J. Rivoirard. A generalized convolution model and estimation for non-stationary random functions. arXiv preprint arXiv:1412.1373, 2014.
R. Furrer, M. G. Genton, and D. W. Nychka. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502-523, 2006.
T. Gneiting. Compactly supported correlation functions. Journal of Multivariate Analysis, 83(2):493-508, 2002.
M. Katzfuss and N. Cressie. Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets. Journal of Time Series Analysis, 32(4):430-446, 2011.
H. Sang and J. Z. Huang. A full scale approximation of covariance functions for large spatial data sets. Journal of the Royal Statistical Society. Series B, 74(1):111-132, 2012.
M. L. Stein. A modeling approach for large spatial datasets. Journal of the Korean Statistical Society, 37:3-10, 2008.

Sparse approximation: Covariance Tapering
Principle: $C_{tap}(x, y) = C(x, y)\, C_\theta(x, y)$, where $C_\theta(x, y)$ is a compactly supported kernel
$\Rightarrow$ $C_{tap}(x, y)$ generates a sparse matrix
Complexity $O(n)$, storage $O(n)$
See Furrer et al. [2006].
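A small numerical illustration of the tapering principle follows. The exponential covariance, its scale, the taper range and the Wendland-type taper $(1 - r)_+^4 (4r + 1)$ are assumptions chosen for the sketch. The matrix is held densely here only to count its non-zero entries; a practical implementation would use sparse storage.

```python
import numpy as np

def wendland1(r):
    """Wendland-type taper (1 - r)^4 (4 r + 1) for r < 1, and exactly 0 for r >= 1."""
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(size=(n, 2))                               # sites in the unit square
dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2))

C = np.exp(-dist / 0.2)                  # dense covariance (illustrative exponential model)
taper_range = 0.1                        # assumed range theta of the compactly supported kernel
C_tap = C * wendland1(dist / taper_range)                  # elementwise (Schur) product

fill = np.count_nonzero(C_tap) / n ** 2                    # fraction of non-zero entries
min_eig = float(np.linalg.eigvalsh(C_tap).min())           # should stay strictly positive
```

Only pairs of sites closer than `taper_range` keep a non-zero covariance, and the product of two valid covariance functions remains a valid covariance.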

Low rank approximation
Principle: $C_{LR}(x, y) = \sigma^2 \delta_{xy} + P(x)^t B P(y)$, where $\sigma^2$ is the variance of the nugget, $P$ is a vector of $p$ basis functions, and $B \in \mathbb{R}^{p \times p}$ with $p \ll n$
Complexity $O(n p^2)$, storage $O(n p)$
See Cressie and Johannesson [2008], Banerjee et al. [2008].

Combination
Principle: $C(x, y) \approx C_{tap}(x, y) + P(x)^t B P(y)$
Inversion by the Sherman-Morrison-Woodbury formula
Complexity $O(n p^2)$, storage $O(n p)$
See Stein [2008], Sang and Huang [2012].