Practical 12: Geostatistics

This practical introduces basic tools for geostatistics in R. You may first need to install and load a few packages. The packages sp and lattice contain useful functions and structures for managing spatially distributed data. The package gstat provides tools for the analysis of geostatistical data.

Meuse data

We consider a classical dataset in geostatistics, which is available in the sp package. The data set consists of 155 samples of topsoil heavy metal concentrations (ppm), along with a number of soil and landscape variables. The samples were collected in a flood plain of the river Meuse, near the village of Stein (The Netherlands).

    library(sp)
    data(meuse)
    head(meuse)
    coordinates(meuse) <- c('x','y')

You can see that the dataset reports the geographical coordinates (x and y) as well as the measurements associated with each data point. The coordinates function above tells R which columns hold the coordinates; this changes the nature of the dataset, which is now treated as a SpatialPointsDataFrame, i.e. a data frame with associated spatial locations. This is one of the data structures available in R for spatial data, and it is provided by the package sp. If we now plot the data, R gives us the locations of the data points.

    plot(meuse)

Other types of plots are available to explore the other variables in the data set, for example the quantity of zinc in the soil:

    spplot(meuse, 'zinc', do.log = TRUE)
    bubble(meuse, 'zinc', do.log = TRUE)

Can you interpret the output of these plots? (Check the help if needed.)

While the SpatialPointsDataFrame structure is usually preferred for geostatistical data, other types of structure are available in the sp package. For example, the river borders can be described using SpatialPolygons and plotted together with the meuse dataset for better data visualisation.

    data(meuse.riv)
    meuse.lst <- list(Polygons(list(Polygon(meuse.riv)), "meuse.riv"))
    meuse.sr <- SpatialPolygons(meuse.lst)
    plot(meuse.sr, col = "grey")
    plot(meuse, add = TRUE)

Looking at the geography, can you suggest any interpretation for the variation in the quantity of zinc?

To explore the spatial dependence, we can first plot the semivariogram cloud, i.e. the empirical semivariogram for all the distances observed in the dataset (we first transform the zinc concentration to the log scale).

    library(gstat)
    cld <- variogram(log(zinc) ~ 1, data = meuse, cloud = TRUE)
    plot(cld, main = 'Semivariogram cloud')

or, better, the binned semivariogram.

    svgm <- variogram(log(zinc) ~ 1, width = 100, data = meuse)
    plot(svgm, main = 'Binned Semivariogram', pch = 19)

The parameter width controls the width of the distance bins; try changing its value and see what happens. It is also possible to include covariates in the formula (in place of 1) to account for a drift term in the model.

Does the plot of the binned semivariogram suggest the presence of spatial dependence? If yes, does the process appear to be (second order) stationary?
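To see what the binned semivariogram measures, the following base-R sketch computes it by hand: for each distance bin, it averages half the squared difference between all pairs of observations whose separation falls in that bin. The function name binned_semivariogram and the toy coordinates and values are made up for illustration (they are not the Meuse data). The binning is also a simplification: gstat's variogram additionally applies a default cutoff distance and may treat pairs on bin boundaries differently.

```r
# Empirical semivariogram by hand (illustrative helper, not part of gstat):
# half the mean squared difference between pairs, grouped by distance bin.
binned_semivariogram <- function(coords, z, width) {
  d <- as.matrix(dist(coords))      # pairwise distances between locations
  g <- 0.5 * outer(z, z, "-")^2     # semivariance contribution of each pair
  keep <- upper.tri(d)              # count each pair once
  bin <- floor(d[keep] / width)     # bin index of each pair
  data.frame(
    dist  = as.numeric(tapply(d[keep], bin, mean)),   # mean distance in bin
    gamma = as.numeric(tapply(g[keep], bin, mean)),   # semivariance estimate
    np    = as.numeric(tapply(g[keep], bin, length))  # number of pairs
  )
}

# Toy illustration: four made-up points on a line
toy_xy <- cbind(x = c(0, 1, 2, 3), y = c(0, 0, 0, 0))
toy_z  <- c(1, 2, 4, 7)
sv <- binned_semivariogram(toy_xy, toy_z, width = 1.5)
print(sv)
```

With real data, a semivariogram that increases with distance and then levels off is the typical signature of spatial dependence; the np column is a reminder that bins with few pairs give noisy estimates.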

We can then fit a parametric semivariogram model to the binned sample semivariogram, via weighted least squares, using the vgm and fit.variogram functions. The vgm function provides the expressions for a variety of parametric semivariogram models. You can type vgm() for a complete list. Let us now fit a spherical semivariogram.

    sph.model <- fit.variogram(svgm, vgm(psill = 0.6, "Sph", range = 800, nugget = 0.2))
    plot(svgm, sph.model)
    sph.model
    #   model      psill    range
    # 1   Nug 0.06114813   0.0000
    # 2   Sph 0.58610673 933.4006

We needed to specify initial values for the model parameters in the optimization algorithm: the partial sill (0.6), the range (800) and the nugget (0.2). Reasonable choices for these starting values can be read off the binned variogram plot. In particular, the algorithm may fail if you choose a completely unreasonable starting value for the (effective) range. The estimated nugget is $\hat{\tau}^2 = 0.0611$, the estimated sill is 0.0611 + 0.586 = 0.647 (psill stands for partial sill; the partial sills of the model components add up to the sill) and the estimated range is $\hat{a} = 933$.

Try now to fit an exponential semivariogram model.

    exp.model <- fit.variogram(svgm, vgm(0.6, "Exp", 800, 0.2))
    plot(svgm, exp.model)
    exp.model
    #   model      psill    range
    # 1   Nug 0.01429689   0.0000
    # 2   Exp 0.71486260 477.2015

What are the estimated nugget, sill and effective range? Which one better fits the binned semivariogram?

Kriging

Let us consider again the Meuse dataset, now with the aim of reconstructing a smooth surface of the (logarithm of the) lead concentration. Let us start by assuming that the process is stationary and follows the spherical semivariogram model obtained above (note that sph.model was fitted to log(zinc); the code below reuses it for log(lead)). There are two alternative ways to obtain the kriging prediction: using the krige function, or fitting a model with gstat and then calling predict on that model. Both perform the same mathematical operations (in fact, krige is just a wrapper for gstat and predict).
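To make the underlying computation concrete, here is a minimal base-R sketch of ordinary kriging at a single location, written in terms of the semivariogram. It is not how gstat is implemented internally; the helper names sph_gamma and ok_predict, and the four toy observations on a 200 m square, are assumptions made for illustration. The spherical model uses the rounded parameter estimates reported above.

```r
# Spherical semivariogram: 0 at h = 0,
# nugget + psill * (1.5 h/a - 0.5 (h/a)^3) for 0 < h <= a,
# and nugget + psill beyond the range a.
sph_gamma <- function(h, nugget, psill, range) {
  r <- pmin(h / range, 1)
  ifelse(h == 0, 0, nugget + psill * (1.5 * r - 0.5 * r^3))
}

# Ordinary kriging at one target location x0 (illustrative helper)
ok_predict <- function(coords, z, x0, gamma_fun) {
  n  <- nrow(coords)
  G  <- gamma_fun(as.matrix(dist(coords)))            # data-to-data semivariances
  g0 <- gamma_fun(sqrt(colSums((t(coords) - x0)^2)))  # data-to-target semivariances
  A  <- rbind(cbind(G, 1), c(rep(1, n), 0))           # system with sum-to-one constraint
  sol <- solve(A, c(g0, 1))                           # weights and Lagrange multiplier
  w <- sol[1:n]
  list(pred = sum(w * z),                # kriging prediction
       var  = sum(sol * c(g0, 1)),       # kriging variance
       weights = w)
}

# Rounded estimates of the spherical model fitted above
gam <- function(h) sph_gamma(h, nugget = 0.0611, psill = 0.586, range = 933.4)

# Toy data: made-up values at the corners of a 200 m square
toy_xy <- cbind(x = c(0, 200, 0, 200), y = c(0, 0, 200, 200))
toy_z  <- c(4.2, 4.8, 5.1, 4.5)

centre <- ok_predict(toy_xy, toy_z, c(100, 100), gam)  # target at the centre
at_obs <- ok_predict(toy_xy, toy_z, toy_xy[1, ], gam)  # target at a data point
```

By symmetry, the four weights at the centre of the square are equal, so the prediction there is the plain average of the observations. At an observed location the predictor reproduces the observation with zero kriging variance, because the semivariogram is zero at distance zero.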
Let us consider the simple kriging prediction first (we pretend to know the true mean, even if we estimate it from the data).

    # grid for prediction
    data(meuse.grid)
    coordinates(meuse.grid) <- c('x','y')
    meuse.grid <- as(meuse.grid, 'SpatialPixelsDataFrame')
    beta.hat <- mean(log(meuse$lead))  # assumed known

    # simple kriging prediction
    lz.sk <- krige(log(lead) ~ 1, meuse, meuse.grid, sph.model, beta = beta.hat)
    plot(lz.sk)
    # spplot shows the prediction variance as well
    # (although the colour scale is not great)
    spplot(lz.sk, col.regions = bpy.colors(n = 100, cutoff.tails = 0.2))

The ordinary kriging prediction can be easily obtained by not providing a mean value:

    # ordinary kriging
    lz.ok <- krige(log(lead) ~ 1, meuse, meuse.grid, sph.model)
    plot(lz.ok)

Looking at the concentration of lead and the position of the observations with respect to the river, we see that the lead concentration appears to change with the distance from the river (which is one of the variables in the dataset, dist). Let us first consider the scatterplot of log(lead) against dist.

    plot(meuse$dist, log(meuse$lead))

This suggests fitting a universal kriging model where the mean is a function of the distance from the river (possibly a square root, looking at the scatterplot).

    lead.gstat <- gstat(id = 'lead', formula = log(lead) ~ sqrt(dist),
                        data = meuse, model = sph.model)
    lead.gstat
    lead.uk <- predict(lead.gstat, newdata = meuse.grid)
    plot(lead.uk)

The gstat function stores the formula, the data and the variogram model, which is treated as the variogram of the residuals. The predict function then estimates the drift coefficients by generalized least squares and computes the universal kriging predictor at the new locations. It is also possible to get an evaluation of the drift alone using the option BLUE = TRUE:

    lead.trend <- predict(lead.gstat, newdata = meuse.grid, BLUE = TRUE)
    plot(lead.trend, main = 'drift')

You can also get the predicted value of the field, or the estimated trend, at a specific new location (but note that you also need to provide the corresponding values of the predictors in the non-stationary mean model, in this case the distance from the river).

    new_obs <- data.frame(x = 179660, y = 331860, dist = 0.124805)
    coordinates(new_obs) <- c('x','y')
    predict(lead.gstat, newdata = new_obs)
    # [using universal kriging]
    #        coordinates lead.pred  lead.var
    # 1 (179660, 331860)   4.59488 0.1460198
    predict(lead.gstat, newdata = new_obs, BLUE = TRUE)
    # [generalized least squares trend estimation]
    #        coordinates lead.pred   lead.var
    # 1 (179660, 331860)   4.92977 0.03846024

Radioactivity data

The dataset radio reports information on 158 control units in the area around a nuclear power plant. At each site, the available data consist of: radioactivity level [Bq], longitude [Long], latitude [Lat] and type of soil [Soil] (a factor with two levels: U, urban, and V, vegetation).

    filepath <- "http://www.statslab.cam.ac.uk/~tw389/teaching/slp18/data/"
    filename <- "radioactivity"
    radio <- read.table(paste0(filepath, filename), header = TRUE)
    head(radio)
    coordinates(radio) <- c('Long','Lat')
    spplot(radio, 'Soil')
    spplot(radio, 'Bq')

Explore the data graphically and comment on how radioactivity and type of soil are related in space.

Fit a linear model with the soil type as predictor, assuming a spherical semivariogram for the errors. Try a semivariogram model both with and without nugget. Which one is preferable?

Write down the algebraic form of the fitted model. What are the estimates of the semivariogram parameters? What are the estimates of the coefficients of the linear model? (Hint: think about how they relate to the predicted drift.)

Using the chosen model, predict the radioactivity level at the location (Long = 78.59, Lat = 35.34), which is a parking lot. Estimate the variance of the prediction error at the same location.
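As a starting point for the algebraic form, one possible way to write the general model is sketched below. The symbols $\beta_0$, $\beta_1$, $\sigma^2$ and $\delta$ are notation assumed here, not taken from the practical, while $\tau^2$ and $a$ denote the nugget and the range as above. Taking U as the baseline level reflects R's default treatment coding of the factor Soil; the fitted values are left for you to fill in.

```latex
% Linear drift in the soil type plus a spatially correlated error
Z(s) = \beta_0 + \beta_1 \, \mathbb{1}\{\mathrm{Soil}(s) = \mathrm{V}\} + \delta(s)

% Spherical semivariogram of the error process \delta
\gamma(h) =
\begin{cases}
  0, & h = 0, \\
  \tau^2 + \sigma^2 \left( \dfrac{3h}{2a} - \dfrac{h^3}{2a^3} \right), & 0 < h \le a, \\
  \tau^2 + \sigma^2, & h > a.
\end{cases}
```

The model without nugget corresponds to setting $\tau^2 = 0$. Under this coding, the drift equals $\beta_0$ at urban sites and $\beta_0 + \beta_1$ at vegetation sites, which is what the hint about the predicted drift points to.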