GEOSTATISTICAL ANALYSIS OF SPATIAL DATA. Goovaerts, P. Biomedware, Inc. and PGeostat, LLC, Ann Arbor, Michigan, USA

Similar documents
PRODUCING PROBABILITY MAPS TO ASSESS RISK OF EXCEEDING CRITICAL THRESHOLD VALUE OF SOIL EC USING GEOSTATISTICAL APPROACH

A MultiGaussian Approach to Assess Block Grade Uncertainty

Spatiotemporal Analysis of Environmental Radiation in Korea

SPATIAL-TEMPORAL TECHNIQUES FOR PREDICTION AND COMPRESSION OF SOIL FERTILITY DATA

GEOINFORMATICS Vol. II - Stochastic Modelling of Spatio-Temporal Phenomena in Earth Sciences - Soares, A.

Formats for Expressing Acceptable Uncertainty

7 Geostatistics. Figure 7.1 Focus of geostatistics

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

Entropy of Gaussian Random Functions and Consequences in Geostatistics

Soil Moisture Modeling using Geostatistical Techniques at the O Neal Ecological Reserve, Idaho

Transiogram: A spatial relationship measure for categorical data

Optimizing Thresholds in Truncated Pluri-Gaussian Simulation

Multiple-Point Geostatistics: from Theory to Practice Sebastien Strebelle 1

Improving geological models using a combined ordinary indicator kriging approach

ENVIRONMENTAL MONITORING Vol. II - Geostatistical Analysis of Monitoring Data - Mark Dowdall, John O Dea GEOSTATISTICAL ANALYSIS OF MONITORING DATA

Advances in Locally Varying Anisotropy With MDS

Automatic Determination of Uncertainty versus Data Density

Conditional Distribution Fitting of High Dimensional Stationary Data

A robust statistically based approach to estimating the probability of contamination occurring between sampling locations

4th HR-HU and 15th HU geomathematical congress Geomathematics as Geoscience Reliability enhancement of groundwater estimations

Reservoir Uncertainty Calculation by Large Scale Modeling

Statistical Rock Physics

Acceptable Ergodic Fluctuations and Simulation of Skewed Distributions

A Case for Geometric Criteria in Resources and Reserves Classification

Investigation of Monthly Pan Evaporation in Turkey with Geostatistical Technique

The Proportional Effect of Spatial Variables

A Short Note on the Proportional Effect and Direct Sequential Simulation

Substitution Random Fields with Gaussian and Gamma Distributions: Theory and Application to a Pollution Data Set

Modeling of Atmospheric Effects on InSAR Measurements With the Method of Stochastic Simulation

Building Blocks for Direct Sequential Simulation on Unstructured Grids

DATA COLLECTION AND ANALYSIS METHODS FOR DATA FROM FIELD EXPERIMENTS

Dale L. Zimmerman Department of Statistics and Actuarial Science, University of Iowa, USA

What is Non-Linear Estimation?

Monte Carlo Simulation. CWR 6536 Stochastic Subsurface Hydrology

Best Practice Reservoir Characterization for the Alberta Oil Sands

Exploring the World of Ordinary Kriging. Dennis J. J. Walvoort. Wageningen University & Research Center Wageningen, The Netherlands

Spatial Interpolation & Geostatistics

Quantifying uncertainty of geological 3D layer models, constructed with a-priori

Correcting Variogram Reproduction of P-Field Simulation

The Snap lake diamond deposit - mineable resource.

Statistical Tools and Concepts

Combining geological surface data and geostatistical model for Enhanced Subsurface geological model

Simulating Spatial Distributions, Variability and Uncertainty of Soil Arsenic by Geostatistical Simulations in Geographic Information Systems

Geostatistics: Kriging

11/8/2018. Spatial Interpolation & Geostatistics. Kriging Step 1

Simulating error propagation in land-cover change analysis: The implications of temporal dependence

COLLOCATED CO-SIMULATION USING PROBABILITY AGGREGATION

Drill-Holes and Blast-Holes

Geostatistical Determination of Production Uncertainty: Application to Firebag Project

Isatis applications for soil pollution mapping and risk assessment

COLLOCATED CO-SIMULATION USING PROBABILITY AGGREGATION

Geostatistics for radiological characterization: overview and application cases

C. BRANQUINHO Museu Laboratório Faculdade de Ciências da Universidade de Lisboa,

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE

Using linear and non-linear kriging interpolators to produce probability maps

Tricks to Creating a Resource Block Model. St John s, Newfoundland and Labrador November 4, 2015

Conditional Standardization: A Multivariate Transformation for the Removal of Non-linear and Heteroscedastic Features

Extent of Radiological Contamination in Soil at Four Sites near the Fukushima Daiichi Power Plant, Japan (ArcGIS)

Facies Modeling in Presence of High Resolution Surface-based Reservoir Models

RC 3.3. Summary. Introduction

Estimation of direction of increase of gold mineralisation using pair-copulas

Linear inverse Gaussian theory and geostatistics a tomography example København Ø,

Kriging in the Presence of LVA Using Dijkstra's Algorithm

Maximizing the Capital Efficiency of Contaminated Upstream Oil and Gas Sites Assessments by Using Geostatistical Modeling Approach

Defining Geological Units by Grade Domaining

Stanford Exploration Project, Report 105, September 5, 2000, pages 41 53

NEW GEOLOGIC GRIDS FOR ROBUST GEOSTATISTICAL MODELING OF HYDROCARBON RESERVOIRS

Spatial Modeling of Land Use and Land Cover Change

Optimization of Sediment Sampling at a Tidally Influenced Site (ArcGIS)

Anomaly Density Estimation from Strip Transect Data: Pueblo of Isleta Example

Stepwise Conditional Transformation for Simulation of Multiple Variables 1

Geographic Information Systems In Petroleum Exploration And Development (AAPG Computer Applications In Geology, No. 4) (Aapg Computer Applications In

Time to Depth Conversion and Uncertainty Characterization for SAGD Base of Pay in the McMurray Formation, Alberta, Canada*

Landslide Hazard Assessment Methodologies in Romania

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

Quantifying Uncertainty in Mineral Resources with Classification Schemes and Conditional Simulations

Geostatistics in Geotechnical Engineering: Fad or Empowering? R.E. Hammah 1 and J.H. Curran 2

Multiple realizations using standard inversion techniques a

Toward an automatic real-time mapping system for radiation hazards

OFTEN we need to be able to integrate point attribute information

U.S. GEOLOGICAL SURVEY ASSESSMENT MODEL FOR UNDISCOVERED CONVENTIONAL OIL, GAS, AND NGL RESOURCES THE SEVENTH APPROXIMATION

Comparison of Methods for Deriving a Digital Elevation Model from Contours and Modelling of the Associated Uncertainty

Beta-Binomial Kriging: An Improved Model for Spatial Rates

A C O M P AR I S O N O F O R D I N AR Y AN D S I M P L E L I M B O F T H E B U S H V E L D C O M P L E X

Geostatistical Determination of Production Uncertainty: Application to Pogo Gold Project

Estimation or stochastic simulation in soil science?

Practical application of drill hole spacing analysis in coal resource estimation

Geostatistical applications in petroleum reservoir modelling

Simulating the spatial distribution of clay layer occurrence depth in alluvial soils with a Markov chain geostatistical approach

Markov Chain Modeling of Multinomial Land-Cover Classes

GLOBAL SYMPOSIUM ON SOIL ORGANIC CARBON, Rome, Italy, March 2017

Assessment of Three Spatial Interpolation Models to Obtain the Best One for Cumulative Rainfall Estimation (Case study: Ramsar District)

SASI Spatial Analysis SSC Meeting Aug 2010 Habitat Document 5

Spatiotemporal Analysis of Solar Radiation for Sustainable Research in the Presence of Uncertain Measurements

Basic Geostatistics: Pattern Description

Geostatistical characterisation of contaminated metals: methodology and illustrations

Combining Deterministic and Probabilistic Methods to Produce Gridded Climatologies

Estimating threshold-exceeding probability maps of environmental variables with Markov chain random fields

Ecological Informatics

ASPECTS REGARDING THE USEFULNESS OF GEOGRAPHICALLY WEIGHTED REGRESSION (GWR) FOR DIGITAL MAPPING OF SOIL PARAMETERS

Transcription:

GEOSTATISTICAL ANALYSIS OF SPATIAL DATA Goovaerts, P. Biomedware, Inc. and PGeostat, LLC, Ann Arbor, Michigan, USA Keywords: Semivariogram, kriging, spatial patterns, simulation, risk assessment Contents 1. Introduction 2. Description of Spatial Patterns 3. Modeling Spatial Variation 4. Spatial Prediction 5. Modeling the Local Uncertainty 6. Stochastic Simulation 7. Accounting for Uncertainty in Decision-making 8. Conclusions Acknowledgements Glossary Bibliography Biographical Sketch Summary This article presents an overview of the geostatistical tools available for processing spatial data and illustrates the different steps of a typical geostatistical analysis using an environmental data set. First, geostatistics provides descriptive tools such as semivariograms to characterize the spatial pattern of continuous and categorical soil attributes. Various interpolation (kriging) techniques capitalize on the spatial correlation between observations to predict attribute values at unsampled locations using information related to one or several attributes. An important contribution of geostatistics is the assessment of the uncertainty about unsampled values, which usually takes the form of a map of the probability of exceeding critical values, such as regulatory thresholds in pollution or criteria for soil quality. This uncertainty assessment can be combined with expert knowledge for decision making such as delineation of contaminated areas where remedial measures should be taken or fertile areas where specific management plans can be developed. Last, stochastic simulation allows one to generate several models (images) of the spatial distribution of attribute values, all of which are consistent with the information available. A given scenario (remediation process, land use policy) can be applied to the set of realizations, allowing the uncertainty of the response (remediation efficiency, soil productivity) to be assessed. 1. Introduction During the last decade, the development of computational resources and geoinformatics has fostered the use of numerical methods to process the large bodies of data that are

measured in the geosciences. A key feature of geoscience information is that each observation relates to a particular location in space. For example, Figure 1 shows the spatial distribution of five stratigraphic classes and of concentrations of two heavy metals recorded, respectively, at 359 and 259 locations in a 14.5km area in the Swiss Jura. Knowledge of an attribute value, say a pollutant concentration, is of little interest unless the location of the measurement is known and accounted for in the analysis. Another feature is that the information available is usually sparse which, in combination with the imperfect knowledge of underlying processes, leads to a large uncertainty about the actual spatial distribution of values. Such an uncertainty needs to be quantified and accounted for in decision-making, hence probabilistic (statistical) tools are increasingly preferred to a deterministic approach where a single (error-free) representation is sought. Figure 1. Locations of sampling sites superimposed on the geologic map, and concentrations in cadmium (Cd) and nickel (Ni) at 259 of these sites (units = mg kg -1 ). Different types of spatial data can be distinguished: lattice observations whose spatial locations are regularly spaced (e.g., gridded data, such as satellite sensor imagery or systematic soil survey), point patterns where the important variable to be analyzed is the location of events, and geostatistical data which can be measured continuously in space (e.g., soil properties). This article deals with the latter type of data, which are the most common in the geosciences. It is noteworthy that geostatistics is also widely used for the analysis of remotely sensed data. Geostatistics provides a set of statistical tools for incorporating the spatial and temporal coordinates of observations in data processing (see Stochastic Modeling of Spatiotemporal Phenomena in Earth Sciences). It is a relatively new discipline, which was

developed in the 1960s, primarily by mining engineers who were facing the problem of evaluating recoverable reserves in mining deposits. Priority was given to practicality, a current trademark of geostatistics that explains its success and application in such diverse fields as mining, petroleum, soil science, oceanography, hydrogeology, remote sensing, agriculture, and environmental sciences. Geostatistics can be used for three main purposes: 1) description of spatial patterns, 2) spatial interpolation, and 3) modeling of local and spatial uncertainty. In other words, looking at the example of Figure 1, here are some of the key issues that geostatistics allows one to address. What are the main features of the spatial patterns of heavy metals and how do they relate to the distribution of potential sources, such as rock types and human activities? What is the metal concentration that could be expected at an unsampled location? What is the probability that the regulatory threshold is exceeded at an unsampled location? Which areas should be remediated and what is the risk of making a wrong decision that is classifying as safe contaminated locations or classifying as contaminated safe locations? Where should additional observations be measured to increase the accuracy of the predictions and reduce the risk of misclassification? These few examples illustrate the potential of geostatistics for improving the understanding and characterization of the environment, leading to more accurate models of the spatial distribution of attribute values, and subsequently more informed decision-making. 2. Description of Spatial Patterns Analysis of spatial data typically starts with a spatial posting of data values such as in Figure 1. For both continuous (metal concentrations) and categorical (rock type) attributes, the spatial distribution of values is not random, in that observations close to each other on the ground tend to be more alike than those further apart. The presence of such a spatial structure is a prerequisite to the application of geostatistics, and its description is a preliminary step towards spatial prediction or modeling of uncertainty. Consider the problem of describing the spatial pattern of a continuous attribute z, say a pollutant concentration such as cadmium or nickel. The information available consists of values of the variable z at n locations: u α,{( z u α), α = 1,2,, n}, where u α is a vector of spatial coordinates in up to three dimensions. Spatial patterns are usually described using the experimental semivariogram ˆ( γ h ), which measures the average dissimilarity between data separated by a vector h. It is computed as half the average squared difference between the components of data pairs: N ( ) h α = 1 1 ˆ( γ h) = [ z( uα) z( uα + h)] 2 N( h) 2 (1)

where N( h ) is the number of data pairs within a given class of distance and direction. Figure 2 (the top graphs) shows the semivariograms computed from the data of Figure 1 using distance classes of 100~m. Data pairs in all directions were pooled, and such semivariograms are called omnidirectional. For both metals, semivariogram values increase with the separation distance, reflecting the intuitive feeling that two concentrations close to each other on the ground are more alike and, thus, their squared difference is smaller than those further apart. The two semivariograms stop increasing at a given distance, called the range, which can be interpreted as the distance of dependence, or zone of influence, of metal concentrations. The discontinuity at the origin of the semivariogram (i.e., at very small separation distances) is called the nugget effect and arises from measurement errors or sources of spatial variation at distances smaller than the shortest sampling interval or both. These graphs point out distinct spatial behaviors of the two metals: Ni concentrations appear to vary more continuously than Cd concentrations, as illustrated by the smaller nugget effect and larger range of its semivariogram. In combination with knowledge about the phenomenon and the study area, such a spatial description can enhance our understanding of the physical underlying mechanisms controlling spatial patterns. In the present example, the long-range structure of the semivariogram of Ni concentrations is probably related to the control asserted by rock type, while the short-range structure for cadmium suggests the impact of local human-induced contamination. Figure 2. Experimental omnidirectional semivariograms for Cd and Ni: original concentrations and indicator transforms using thresholds corresponding to the second- (- --), fifth- (--~--) and eighth-decile (-~-~-) of the sample histogram. Spatial patterns may differ depending on whether the attribute value is small, medium, or large. For example, in many environmental applications, a few random hot spots of large concentrations coexist with a background of small values that vary more continuously in space. Depending on whether large concentrations are clustered or

scattered in space, the interpretation of the physical processes controlling contamination and the decision for remediation may change. The characterization of the spatial distribution of z -values above or below a given threshold value z k requires a prior coding of each observation z( u α ) as an indicator datum i( u α ; z k ), defined as: 1 if z( uα ) zk i( u α ; zk ) = (2) 0 otherwise Indicator semivariograms can then be computed by substituting indicator data i( u α ; z k ) for z -data z( u ) in the equation (1): α N ( ) γ 1 zk = h h i uα zk i uα + h z Ι k 2 N( h) α = 1 ˆ ( ; ) [ ( ; ) ( ; )] The indicator variogram value 2 ˆ γ Ι ( h ; z k ) measures how often two z -values separated by a vector h are on opposite sides of the threshold value z k. In other words, 2 ˆ γ Ι ( h ; z k ) measures the transition frequency between two classes of z -values as a function of h. The greater is ˆ γ Ι ( h ; z k ), the less connected in space are the small or large values. Figure 2 (the bottom graphs) shows the omnidirectional indicator semivariograms computed for the second-, fifth- and eighth-decile of the distributions of cadmium and nickel concentrations. For both metals, indicator semivariograms for small concentrations have smaller nugget effect than those for larger concentrations, which suggests that homogeneous areas of small concentrations coexist within larger zones where large and medium concentrations are intermingled. Two clusters of small concentrations are indeed apparent on the location maps of Figure 1 and correspond to the Argovian rocks. 2 Many variables in geosciences, such as texture or land use classes, take only a limited number of states, which might be ordered or not. The spatial patterns of such categorical variables can also be described using geostatistics. Let S be a categorical attribute with K possible states sk, k = 1, 2,, K. The K states are exhaustive and mutually exclusive in the sense that one and only one state s k occurs at each location u α. The pattern of spatial variation of a category s k can be characterized by semivariograms of type (3) defined on an indicator coding of the presence or absence of that category: (3) 1 if s( uα ) = sk i( u α ; sk ) = (4) 0 otherwise

The indicator variogram value 2 ˆ γ Ι ( h ; ) measures how often two locations a vector h s k apart belong to different categories s k s k. The smaller is 2 ˆ γ ( ; s ) Ι h k, the more connected is category s k. The ranges and shapes of the directional indicator semivariograms reflect the geometric patterns of s k. Figure 3 shows the indicator semivariograms of two stratigraphic classes of Figure 1 computed in four directions with an angular tolerance of 22.5. For both classes the indicator semivariogram value equals zero at the first lag, which means that any two data locations less than 100~m apart belong to the same formation. The longer SW-NE range (larger dashed line) reflects the corresponding preferential orientation of these two lithologic formations. Figure 3. Experimental indicator semivariograms of Argovian and Sequanian rocks computed in four directions measured in degrees clockwise from North (---: 22.5, --~--: 67.5, -~-~- : 112.5, ~~: 157.5; angular tolerance = 22.5). Information in the geosciences is often multivariate, and the semivariogram can be generalized to the bivariate case, allowing one to investigate how the correlation between two attributes (e.g., Ni and Cd concentrations) varies in space. Threedimensional data can be analyzed using the same tools although the description and visualization of variability along the different dimensions becomes more complicated. - - - TO ACCESS ALL THE 19 PAGES OF THIS CHAPTER, Visit: http://www.eolss.net/eolss-sampleallchapter.aspx Bibliography Cressie N. (1993). Statistics for Spatial Data, 900pp. New-York: John Wiley & Sons. [This book provides statisticians with a comprehensive presentation of tools for analysing the three main types of spatial data].

Chiles, J.P., Delfiner, P. (1999). Geostatistics. Modelling Spatial Uncertainty, 695pp. New York, NY, USA: John Wiley & Sons. [This recent book provides a thorough exploration of a wide spectrum of theoretical and practical aspects of geostatistics]. Deutsch C.V., Journel A.G. (1998). GSLIB: Geostatistical Software Library and User's Guide. Second Edition, 369pp. New-York: Oxford University Press. [This represents the most widely used library of public-domain geostatistical codes]. Dungan J. (1998). Spatial prediction of vegetation quantities using ground and image data. International Journal of Remote Sensing 19, 267-285. [This work presents an application of geostatistics to the integration of remotely sensed and field data]. Goovaerts P. (1997). Geostatistics for Natural Resources Evaluation. 483pp. New York: Oxford University Press. [This book provides practitioners with a comprehensive presentation of the state-of-theart in geostatistics]. Isaaks E.H., Srivastava R.M. (1989). An Introduction to Applied Geostatistics. 561pp. New York: Oxford University Press. [This book is a primer for a self-taught introduction to geostatistics]. Journel A.G., Huijbregts C.J. (1978). Mining Geostatistics. 600pp. New York: Academic Press. [This is a seminal work in geostatistics, reflecting its mining origin]. Rossi R.E., Mulla D.J., Journel A.G., Franz E.H. (1992). Geostatistical tools for modeling and interpreting ecological spatial dependence. Ecology Monographs 62, 277-314. [This is one of the few review papers dealing with the application of geostatistics to ecological data]. Yarus, J.M., Chambers, R.L. (editors) (1994). Stochastic Modeling and Geostatistics. Principles, Methods, and Case Studies, Vol. 3, 379pp. Tulsa, OK, USA: The American Association of Petroleum Geologists (AAPG). [This book presents an overview of geostatistics as applied to the petroleum industry]. Biographical Sketch Pierre Goovaerts is currently Chief Scientist for the Research and Development company Biomedware, Inc and is the president of the consulting company PGeostat, LLC. He obtained his Ph.D. in agriculture engineering from the Catholic University of Louvain-la-Neuve, Belgium. His general expertise is in the sampling and geostatistical treatment (semivariogram analysis, spatial prediction and stochastic simulation) of data, with an emphasis on the assessment of uncertainty attached to prediction and its use in decision-making process, such as identification of locations for additional sampling or delineation of contaminated areas. He is the author of "Geostatistics for Natural Resources Evaluation", which has become a reference textbook in the field. Pierre Goovaerts is the author or co-author of more than 50 publications in refereed journals, and is a reviewer for more than 40 journals. He was the recipient of the 1999 Andrei Borisovich Vistelius Research Award, given by the International Association of Mathematical Geology for an original and outstanding contribution by a young scientist to the application of mathematics and informatics to the earth sciences.