Introduction to Geostatistics

Similar documents
Introduction to Spatial Data and Models

Introduction to Spatial Data and Models

Hierarchical Modelling for Univariate Spatial Data

Nearest Neighbor Gaussian Processes for Large Spatial Data

Introduction to Spatial Data and Models

Hierarchical Modelling for Univariate Spatial Data

CBMS Lecture 1. Alan E. Gelfand Duke University

Basics of Point-Referenced Data Models

Point-Referenced Data Models

Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Spatio-temporal Data

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Handbook of Spatial Statistics Chapter 2: Continuous Parameter Stochastic Process Theory by Gneiting and Guttorp

Hierarchical Modeling for Multivariate Spatial Data

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian Linear Models

Spatial statistics, addition to Part I. Parameter estimation and kriging for Gaussian random fields

spbayes: An R Package for Univariate and Multivariate Hierarchical Point-referenced Spatial Models

Introduction to Spatial Data and Models

Gaussian predictive process models for large spatial data sets.

Chapter 4 - Fundamentals of spatial processes Lecture notes

ESTIMATING THE MEAN LEVEL OF FINE PARTICULATE MATTER: AN APPLICATION OF SPATIAL STATISTICS

Spatial Statistics with Image Analysis. Lecture L02. Computer exercise 0 Daily Temperature. Lecture 2. Johan Lindström.

Hierarchical Modeling for non-gaussian Spatial Data

Hierarchical Modelling for Multivariate Spatial Data

A Framework for Daily Spatio-Temporal Stochastic Weather Simulation

Statistícal Methods for Spatial Data Analysis

On Gaussian Process Models for High-Dimensional Geostatistical Datasets

Bayesian Linear Models

Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP

Hierarchical Modelling for non-gaussian Spatial Data

Fluvial Variography: Characterizing Spatial Dependence on Stream Networks. Dale Zimmerman University of Iowa (joint work with Jay Ver Hoef, NOAA)

Bayesian Linear Models

Practicum : Spatial Regression

Hierarchical Modelling for Univariate and Multivariate Spatial Data

What s for today. Introduction to Space-time models. c Mikyoung Jun (Texas A&M) Stat647 Lecture 14 October 16, / 19

Hierarchical Modeling for Univariate Spatial Data

Introduction. Semivariogram Cloud

Chapter 4 - Fundamentals of spatial processes Lecture notes

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes 1. Schedule

Chapter 1. Summer School GEOSTAT 2014, Spatio-Temporal Geostatistics,

REML Estimation and Linear Mixed Models 4. Geostatistics and linear mixed models for spatial data

Spatial analysis is the quantitative study of phenomena that are located in space.

Bayesian Linear Regression

Types of Spatial Data

Spatial Data Mining. Regression and Classification Techniques

Bayesian Modeling and Inference for High-Dimensional Spatiotemporal Datasets

An Introduction to Bayesian Linear Regression

Overview of Spatial Statistics with Applications to fmri

Kriging Luc Anselin, All Rights Reserved

Concepts and Applications of Kriging

Spatial Backfitting of Roller Measurement Values from a Florida Test Bed

Covariance function estimation in Gaussian process regression

Model Selection for Geostatistical Models

Concepts and Applications of Kriging

What s for today. All about Variogram Nugget effect. Mikyoung Jun (Texas A&M) stat647 lecture 4 September 6, / 17

Regression with correlation for the Sales Data

Introduction. Spatial Processes & Spatial Patterns

Spatial analysis. Spatial descriptive analysis. Spatial inferential analysis:

Spatio-temporal prediction of site index based on forest inventories and climate change scenarios

Association studies and regression

Principles of Bayesian Inference

Concepts and Applications of Kriging. Eric Krause

Spatial Statistics or Why Spatial is Special?

Using Estimating Equations for Spatially Correlated A

PRODUCING PROBABILITY MAPS TO ASSESS RISK OF EXCEEDING CRITICAL THRESHOLD VALUE OF SOIL EC USING GEOSTATISTICAL APPROACH

Dale L. Zimmerman Department of Statistics and Actuarial Science, University of Iowa, USA

Spatial and Environmental Statistics

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

PAPER 206 APPLIED STATISTICS

Introduction to Bayes and non-bayes spatial statistics

Concepts and Applications of Kriging. Eric Krause Konstantin Krivoruchko

Spatio-Temporal Geostatistical Models, with an Application in Fish Stock

Adaptive Sampling of Clouds with a Fleet of UAVs: Improving Gaussian Process Regression by Including Prior Knowledge

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

Bayesian Thinking in Spatial Statistics

Practical 12: Geostatistics

Interpolation and 3D Visualization of Geodata

Space-time data. Simple space-time analyses. PM10 in space. PM10 in time

Analysing geoadditive regression data: a mixed model approach

Cross-covariance Functions for Tangent Vector Fields on the Sphere

Principles of Bayesian Inference

Likelihood-Based Methods

Multivariate spatial modeling

Hierarchical Modeling for Spatial Data

Some general observations.

arxiv: v1 [stat.me] 28 Dec 2017

Bayesian Transgaussian Kriging

Karhunen-Loeve Expansion and Optimal Low-Rank Model for Spatial Processes

The ProbForecastGOP Package

Spatial Statistics A Framework for Analyzing Geographically Referenced Data in Insurance Ratemaking

Multivariate Spatial Process Models. Alan E. Gelfand and Sudipto Banerjee

Practical 12: Geostatistics

On dealing with spatially correlated residuals in remote sensing and GIS

Spatial Correlation of Acceleration Spectral Ordinates from European Data

Transcription:

Introduction to Geostatistics Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland. 2 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles. 3 Departments of Forestry and Geography, Michigan State University, East Lansing, Michigan.

Course materials available at https://abhirupdatta.github.io/spatstatjsm2017/

What is spatial data? Any data with some geographical information 2

What is spatial data? Any data with some geographical information Common sources of spatial data: climatology, forestry, ecology, environmental health, disease epidemiology, real estate marketing etc have many important predictors and response variables are often presented as maps 2

What is spatial data? Any data with some geographical information Common sources of spatial data: climatology, forestry, ecology, environmental health, disease epidemiology, real estate marketing etc have many important predictors and response variables are often presented as maps Other examples where spatial need not refer to space on earth: Neuroimaging (data for each voxel in the brain) Genetics (position along a chromosome) 2

Point-referenced spatial data Each observation is associated with a location (point) Data represents a sample from a continuous spatial domain Also referred to as geocoded or geostatistical data Northing (km) 0 500 1000 1500 2000 2500 100 80 60 40 20 0 500 1000 1500 2000 2500 Easting (km) Figure: Pollutant levels in Europe in March, 2009 3

Point level modeling Point-level modeling refers to modeling of point-referenced data collected at locations referenced by coordinates (e.g., lat-long, Easting-Northing). Data from a spatial process {Y (s) : s D}, D is a subset in Euclidean space. Example: Y (s) is a pollutant level at site s Conceptually: Pollutant level exists at all possible sites Practically: Data will be a partial realization of a spatial process observed at {s 1,..., s n } Statistical objectives: Inference about the process Y (s); predict at new locations. Remarkable: Can learn about entire Y (s) surface. The key: Structured dependence 4

Exploratory data analysis (EDA): Plotting the data A typical setup: Data observed at n locations {s 1,..., s n } At each s i we observe the response y(s i ) and a p 1 vector of covariates x(s i ) Surface plots of the data often helps to understand spatial patterns y(s) x(s) Figure: Response and covariate surface plots for Dataset 1 5

What s so special about spatial? Linear regression model: y(s i ) = x(s i ) β + ɛ(s i ) ɛ(s i ) are iid N(0, τ 2 ) errors y = (y(s 1 ), y(s 2 ),..., y(s n )) ; X = (x(s 1 ), x(s 2 ),..., x(s n ) ) Inference: ˆβ = (X X) 1 X Y N(β, τ 2 (X X) 1 ) Prediction at new location s 0 : ŷ(s 0 ) = x(s 0 ) ˆβ Although the data is spatial, this is an ordinary linear regression model 6

Residual plots Surface plots of the residuals (y(s) y(s)) help to identify any spatial patterns left unexplained by the covariates Figure: Residual plot for Dataset 1 after linear regression on x(s) 7

Residual plots Surface plots of the residuals (y(s) y(s)) help to identify any spatial patterns left unexplained by the covariates Figure: Residual plot for Dataset 1 after linear regression on x(s) No evident spatial pattern in plot of the residuals The covariate x(s) seem to explain all spatial variation in y(s) Does a non-spatial regression model always suffice? 7

Western Experimental Forestry (WEF) data Data consist of a census of all trees in a 10 ha. stand in Oregon Response of interest: Diameter at breast height (DBH) Covariate: Tree species (Categorical variable) DBH Species Residuals 8

Western Experimental Forestry (WEF) data Data consist of a census of all trees in a 10 ha. stand in Oregon Response of interest: Diameter at breast height (DBH) Covariate: Tree species (Categorical variable) DBH Species Residuals Local spatial patterns in the residual plot Simple regression on species seems to be not sufficient 8

More EDA Besides eyeballing residual surfaces, how to do more formal EDA to identify spatial pattern? 9

More EDA Besides eyeballing residual surfaces, how to do more formal EDA to identify spatial pattern? First law of geography Everything is related to everything else, but near things are more related than distant things. Waldo Tobler 9

More EDA Besides eyeballing residual surfaces, how to do more formal EDA to identify spatial pattern? First law of geography Everything is related to everything else, but near things are more related than distant things. Waldo Tobler In general (Y (s + h) Y (s)) 2 roughly increasing with h will imply a spatial correlation Can this be formalized to identify spatial pattern? 9

Empirical semivariogram Binning: Make intervals I 1 = (0, m 1 ), I 2 = (m 1, m 2 ), and so forth, up to I K = (m K 1, m K ). Representing each interval by its midpoint t k, we define: N(t k ) = {(s i, s j ) : s i s j I k }, k = 1,..., K. Empirical semivariogram: γ(t k ) = 1 2 N(t k ) s i,s j N(t k ) (Y (s i ) Y (s j )) 2 For spatial data, the γ(t k ) is expected to roughly increase with t k A flat semivariogram would suggest little spatial variation 10

Empirical variogram: Data 1 y residuals Residuals display little spatial variation 11

Empirical variograms: WEF data Regression model: DBH Species DBH Residuals Variogram of the residuals confirm unexplained spatial variation 12

Modeling with the locations When purely covariate based models does not suffice, one needs to leverage the information from locations General model using the locations: y(s) = x(s) β + w(s) + ɛ(s) for all s D How to choose the function w( )? Since we want to predict at any location over the entire domain D, this choice will amount to choosing a surface w(s) How to do this? 13

Gaussian Processes (GPs) One popular approach to model w(s) is via Gaussian Processes (GP) The collection of random variables {w(s) s D} is a GP if it is a valid stochastic process all finite dimensional densities {w(s 1 ),..., w(s n )} follow multivariate Gaussian distribution A GP is completely characterized by a mean function m(s) and a covariance function C(, ) Advantage: Likelihood based inference. w = (w(s 1 ),..., w(s n )) N(m, C) where m = (m(s 1 ),..., m(s n )) and C = C(s i, s j ) 14

Valid covariance functions and isotropy C(, ) needs to be valid. For all n and all {s 1, s 2,..., s n }, the resulting covariance matrix C(s i, s j ) for (w(s 1 ), w(s 2 ),..., w(s n )) must be positive definite So, C(, ) needs to be a positive definite function Simplifying assumptions: Stationarity: C(s 1, s 2 ) only depends on h = s 1 s 2 (and is denoted by C(h)) Isotropic: C(h) = C( h ) Anisotropic: Stationary but not isotropic Isotropic models are popular because of their simplicity, interpretability, and because a number of relatively simple parametric forms are available as candidates for C. 15

Some common isotropic covariance functions Model Covariance function, C(t) = C( h ) [ 0 if t 1/φ Spherical C(t) = σ 2 1 3 2 φt + 1 2 (φt)3] if 0 < t 1/φ τ 2 + σ 2 otherwise { σ 2 exp( φt) if t > 0 Exponential C(t) = τ 2 + σ 2 otherwise { Powered σ 2 exp( φt p ) if t > 0 C(t) = exponential τ 2 + σ 2 otherwise { Matérn σ 2 (1 + φt) exp( φt) if t > 0 C(t) = at ν = 3/2 τ 2 + σ 2 otherwise 16

Notes on exponential model C(t) = { τ 2 + σ 2 if t = 0 σ 2 exp( φt) if t > 0. We define the effective range, t 0, as the distance at which this correlation has dropped to only 0.05. Setting exp( φt 0 ) equal to this value we obtain t 0 3/φ, since log(0.05) 3. The nugget τ 2 is often viewed as a nonspatial effect variance, The partial sill (σ 2 ) is viewed as a spatial effect variance. σ 2 + τ 2 gives the maximum total variance often referred to as the sill Note discontinuity at 0 due to the nugget. Intentional! To account for measurement error or micro-scale variability. 17

Covariance functions and semivariograms Recall: Empirical semivariogram: γ(t k ) = 1 2 N(t k ) s i,s j N(t k ) (Y (s i) Y (s j )) 2 For any stationary GP, E(Y (s + h) Y (s)) 2 /2 = C(0) C(h) = γ(h) γ(h) is the semivariogram corresponding to the covariance function C(h) Example: { For exponential GP, τ 2 + σ 2 (1 exp( φt)) if t > 0 γ(t) =, where t = h 0 if t = 0 18

Covariance functions and semivariograms 19

Covariance functions and semivariograms 19

The Matèrn covariance function The Matèrn is a very versatile family: C(t) = { σ 2 2 ν 1 Γ(ν) (2 νtφ) ν K ν (2 (ν)tφ) if t > 0 τ 2 + σ 2 if t = 0 K ν is the modified Bessel function of order ν (computationally tractable) ν is a smoothness parameter controlling process smoothness. Remarkable! ν = 1/2 gives the exponential covariance function 20

Kriging: Spatial prediction at new locations Goal: Given observations w = (w(s 1 ), w(s 2 ),..., w(s n )), predict w(s 0 ) for a new location s 0 If w(s) is modeled as a GP, then (w(s 0 ), w(s 1 ),..., w(s n )) jointly follow multivariate normal distribution w(s 0 ) w follows a normal distribution with Mean (kriging estimator): m(s 0 ) + c C 1 (w m) where m = E(w), C = Cov(w), c = Cov(w, w(s 0 )) Variance: C(s 0, s 0 ) c C 1 c The GP formulation gives the full predictive distribution of w(s 0 ) w 21

Modeling with GPs Spatial linear model y(s) = x(s) β + w(s) + ɛ(s) w(s) modeled as GP(0, C( θ)) (usually without a nugget) ɛ(s) iid N(0, τ 2 ) contributes to the nugget Under isotropy: C(s + h, s) = σ 2 R( h ; φ) w = (w(s 1 ),..., w(s n )) N(0, σ 2 R(φ)) where R(φ) = σ 2 (R( s i s j ; φ)) y = (y(s 1 ),..., y(s n )) N(Xβ, σ 2 R(φ) + τ 2 I) 22

Parameter estimation y = (y(s 1 ),..., y(s n )) N(Xβ, σ 2 R(φ) + τ 2 I) We can obtain MLEs of parameters β, τ 2, σ 2, φ based on the above model and use the estimates to krige at new locations In practice, the likelihood is often very flat with respect to the spatial covariance parameters and choice of initial values is important Initial values can be eyeballed from empirical semivariogram of the residuals from ordinary linear regression Estimated parameter values can be used for kriging 23

Model comparison For k total parameters and sample size n: AIC: 2k 2 log(l(y ˆβ, ˆθ, τˆ 2 )) BIC: log(n)k 2 log(l(y ˆβ, ˆθ, τ ˆ2 )) Prediction based approaches using holdout data: Root Mean Square Predictive Error (RMSPE): 1 nout n out i=1 (y i ŷ i ) 2 Coverage probability (CP): 1 n out nout i=1 I(y i (ŷ i,0.025, ŷ i,0.975 )) Width of 95% confidence interval (CIW): 1 nout n out i=1 (ŷ i,0.975 ŷ i,0.025 ) The last two approaches compares the distribution of y i instead of comparing just their point predictions 24

Back to WEF data Table: Model comparison Spatial Non-spatial AIC 4419 4465 BIC 4448 4486 RMSPE 18 21 CP 93 93 CIW 77 82 25

WEF data: Kriged surfaces DBH Estimates Standard errors 26

Summary Geostatistics Analysis of point-referenced spatial data Surface plots of data and residuals EDA with empirical semivariograms Modeling unknown surfaces with Gaussian Processes Kriging: Predictions at new locations Spatial linear regression using Gaussian Processes 27