Transiogram: A spatial relationship measure for categorical data


International Journal of Geographical Information Science, Vol. 20, No. 6, July 2006, 693-699
ISSN 1365-8816 print / ISSN 1362-3087 online, © 2006 Taylor & Francis, http://www.tandf.co.uk/journals, DOI: 10.1080/13658810600607816

Technical Note

WEIDONG LI*
Department of Geography, University of Wisconsin, Madison, WI 53706, USA
*Email: weidong6616@yahoo.com

(Received 28 February 2005)

1. Introduction

Categorical geographical variables are normally classified into mutually exclusive multinomial classes and visualized as area-class maps. Typical categorical variables such as soil types and land cover classes are multinomial and exhibit complex interclass relationships. These relationships may take three forms: cross-correlation (i.e. interdependency), neighbouring situation (i.e. juxtaposition), and directional asymmetry of class patterns. In a given space, some classes may be cross-correlated with apparent correlation ranges, whereas others may not be cross-correlated in the traditional sense. For example, if class A and class B occur in two separate subareas of a watershed, it may be difficult to say they are cross-correlated, but we can still define their interclass relationship as non-neighbouring. If this relationship is effectively incorporated into a geostatistical model, class A and class B will not occur as close neighbours in simulated results; if it is ignored in a simulation conditioned on sparse samples, they may occur as neighbours. In other words, every class has a relationship with every other class in the same space. Quantifying these spatial relationships and incorporating them into simulation models helps to generate realistic realizations of the real spatial distribution of multinomial classes and to decrease the spatial uncertainty associated with simulated results. To describe the auto-correlations within single classes and the relationships between different classes, we need practical spatial measures.

So far, indicator variograms have been widely used as two-point spatial measures for characterizing the spatial correlations of discrete geographic data in the geosciences (Chiles and Delfiner 1999). However, the physical meanings of indicator variograms, particularly indicator cross-variograms, are difficult to interpret. Variograms are widely used mainly because of the wide application and acceptance of kriging-based (or variogram-based) geostatistics as interpolation and simulation techniques for spatial variables, which normally use variograms as input parameters (Deutsch and Journel 1998). Recent studies (Li et al. 2004, 2005, Zhang and Li 2005) and further progress in the development of practical multidimensional (multi-d) Markov chain conditional simulation models and algorithms point towards a Markov chain-based geostatistics for simulating categorical variables. As the accompanying spatial measure for this new geostatistics, the author proposes the concept of the transiogram (i.e. 1-D transition probability diagram) and suggests using transiograms to replace Markov transition probability matrices (TPMs) as parameter inputs to Markov chain models, so that transition probabilities can be estimated from a variety of data types, and complex spatial variations of multinomial classes can be incorporated into simulation. Like variograms, transiograms can also be used as independent spatial measures for characterizing the spatial variability of discrete variables.

Idealized transiograms (i.e. 1-D transition probability diagrams derived from one-step TPMs based on the first-order Markovian assumption) have been used in describing the spatial variation of lithofacies (Schwarzacher 1969, Lou 1996) and in indicator kriging for modelling 3-D hydrofacies (Weissmann and Fogg 1999). Carle and Fogg (1996) discussed several properties of idealized transiograms. Ritzi (2000) explored the behaviour of an auto-transiogram from idealized data in relation to the variance in lengths of hydrofacies. However, so far, transiograms have been neither estimated from sampled point data nor effectively used as independent spatial measures for heterogeneity characterization in the way variograms have been. The major reason may be that Markov chains were never developed into geostatistical approaches for conditional simulation independent of kriging before the recent emergence of the 2-D Markov chain conditional simulation approach (Li et al. 2004, 2005). With the further development and application of Markov chain-based geostatistics, the estimation of transiograms from various data (e.g. points, lines, and patches) and their interpretation and application will become important issues. This note provides a brief introduction to the transiogram concept and some basic characteristics of typical transiograms. Complex issues such as the theoretical background, features of transiograms of complex categorical variables, and transiogram estimation and modelling from sparse data and expert knowledge will be addressed with case studies in Li (2006).
The objective of this note is to introduce a new spatial relationship measure, the transiogram, to readers in geographical information science as an alternative for describing the spatial variability of categorical variables.

2. Transiogram

2.1 Definition

A transiogram is defined as a diagram of 1-D Markov transition probabilities over the distance lag. The term is coined by analogy with the variogram (i.e. semivariance diagram) used in kriging geostatistics, since both are spatial measures representing spatial correlations. Theoretically, a transiogram can be represented as a two-point conditional probability function:

$$p_{ij}(h) = \Pr\big(Z(x+h) = j \mid Z(x) = i\big), \tag{1}$$

where $p_{ij}(h)$ represents the transition probability of random variable $Z$ from state $i$ to state $j$ over a distance lag $h$. As the lag $h$ increases from zero, the values of $p_{ij}(h)$ form a diagram: a transiogram. The lag $h$ may be an exact distance measure (e.g. metres) or the number of spatial steps (i.e. pixels or grid cells). Here, the random variable $Z$ is assumed to be second-order stationary, that is, the transition probability $p_{ij}(h)$ depends only on the lag $h$ and not on any specific location $x$, so that transiograms can be estimated from data pairs in a space. $p_{ii}(h)$ is called an auto-transiogram and $p_{ij}(h)$ ($i \neq j$) a cross-transiogram. Auto-transiograms represent auto-correlations of individual classes, and cross-transiograms represent cross-correlations (more accurately, interclass relationships) between different classes. Class $i$ in a transiogram $p_{ij}(h)$ is called the head class and class $j$ the tail class. The head class and the tail class in a cross-transiogram are not interchangeable because cross-transiograms are asymmetric. Transiograms may be estimated unidirectionally, multidirectionally, or omnidirectionally. If transiograms are estimated unidirectionally, a direction symbol $d$ may be added to the notation, as in $p^{d}_{ij}(h)$.

2.2 Basic properties

As transition probability diagrams, transiograms have the following basic properties: (1) they are non-negative; (2) at any specific lag, values of transiograms headed by the same class sum to 1; (3) for mutually exclusive classes, transiograms should not have nuggets, because $p_{ii}(0) = 1$ for auto-transiograms and $p_{ij}(0) = 0$ for cross-transiograms always hold. These basic properties are also constraint conditions for transiogram modelling in Markov chain simulation, because they may be violated during model fitting and the adjustment of transiogram models. To meet the third property, I suggest that the start point (i.e. point (0, 1) for auto-transiograms and point (0, 0) for cross-transiograms) should always be respected in transiogram modelling, so that a transiogram model always begins from the start point. To meet the second property, I suggest that when modelling experimental transiograms, one should always leave a single experimental transiogram among those headed by the same class unfitted by any mathematical model and infer its model as the remainder (i.e. 1 minus the sum of the values of the other fitted models at every lag). The first property normally follows if the second and third properties are met and the experimental transiograms are properly modelled.
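The modelling suggestion for the second property can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the exponential model form, the names `cross_model` and `residual_auto_model`, and all sills and ranges are hypothetical choices made for the example. Two cross-transiograms headed by the same class are given fitted models that start at (0, 0); the auto-transiogram is left unfitted and inferred as 1 minus their sum, so it starts at (0, 1) and the three curves sum to 1 at every lag.

```python
import numpy as np

def cross_model(h, sill, corr_range):
    """Hypothetical exponential cross-transiogram model; starts at (0, 0)."""
    h = np.asarray(h, dtype=float)
    return sill * (1.0 - np.exp(-3.0 * h / corr_range))

def residual_auto_model(h, fitted_models):
    """Infer the single unfitted transiogram as 1 minus the fitted ones."""
    return 1.0 - sum(model(h) for model in fitted_models)

lags = np.arange(0, 41)  # lags in pixels (assumed unit)

# Assumed fitted models for head class 0; parameter values are made up.
p_01 = lambda h: cross_model(h, sill=0.3, corr_range=10.0)
p_02 = lambda h: cross_model(h, sill=0.2, corr_range=15.0)

# The auto-transiogram p_00 is not fitted; it is the remainder.
p_00 = residual_auto_model(lags, [p_01, p_02])

# Check the three basic properties on the modelled curves.
curves = np.vstack([p_00, p_01(lags), p_02(lags)])
print("non-negative:", bool((curves >= 0).all()))
print("sum to 1 at every lag:", bool(np.allclose(curves.sum(axis=0), 1.0)))
print("start points:", curves[:, 0])
```

Which transiogram is left unfitted is a modelling choice; the auto-transiogram is used here only because its start point (0, 1) then follows automatically from cross models that start at (0, 0).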
2.3 Basic features and physical meanings

Transiograms may be calculated from one-step TPMs based on the first-order Markovian assumption, or estimated directly from observed data. The transiograms derived from a one-step TPM are called idealized transiograms because deriving them implicitly assumes that the data are spatially stationary and first-order Markovian, which is normally not true for real observed data. Idealized transiograms are smooth curves and can capture some basic spatial variation characteristics of discrete variables in a large area. Therefore, they were used in describing the spatial variation of lithofacies (Schwarzacher 1969) and modelling experimental transiograms estimated from borehole data (Weissmann and Fogg 1999). They were also implicitly used in previous multi-d Markov chain simulations (Li et al. 2004). To understand transiograms and interpret the spatial variation information they convey, we need to know the physical meanings of basic transiogram features.

An auto-transiogram represents the change in transition probabilities of a class from one location to another location with increasing lag. An idealized auto-transiogram $p_{ii}(h)$ starts from the point (0, 1) and gradually decreases to a stable value, the sill (figure 1(a)). This sill, for a sufficiently large area, should be equal to the proportion $p_i$ of that class in the area. The lag $h$ at which the auto-transiogram stably approaches its sill is called the auto-correlation range, denoted by $a_i$. While the auto-correlation range represents the distance of self-dependence of class $i$, it does not directly indicate the size of polygons (i.e. boundary spacing) of that class. From the start point (0, 1), we may draw a tangent of the auto-transiogram to the x-axis. The lag $h$ where the tangent crosses the x-axis is equal to the mean polygon size (i.e. mean boundary spacing) of the class, denoted by $l_i$ (also see figure 1(a)) (Carle and Fogg 1996).

Figure 1. Illustration of typical features of idealized transiograms. (a) Typical auto-transiogram. (b) Typical cross-transiogram. (c) Two classes that are frequent neighbours. (d) Two classes that are infrequent neighbours. Scales along the x-axis are numbers of pixels.

Cross-transiograms convey information about relationships between classes. A cross-transiogram represents the change in transition probabilities between two different classes from one location to another location with increasing lag. An idealized cross-transiogram $p_{ij}(h)$ starts from the point (0, 0) and gradually increases to a stable value, the sill (figure 1(b)). The sill, for a sufficiently large area, should be equal to the proportion $p_j$ of the tail class $j$. Similarly, we have a cross-correlation range $a_{ij}$ at the distance where the sill is stably reached (or approached), which represents the distance of interdependence of the two classes. Before a cross-transiogram stably approaches its sill, its shape may differ greatly depending on whether the two classes involved are frequent neighbours or not. If class $j$ frequently occurs adjacent to class $i$, the transiogram $p_{ij}(h)$ will have a peak first and then approach its sill (figure 1(c)); if class $j$ seldom occurs close to class $i$, the transiogram $p_{ij}(h)$ will normally have a low-value section first and then approach its sill (figure 1(d)).

Real transiograms (i.e. transiograms directly estimated from data) have nothing to do with the first-order Markovian assumption; therefore, they normally show a wealth of spatial variation information typical of classes in the real world.
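The features of idealized transiograms described above can be reproduced numerically. The sketch below assumes lags measured in whole steps (pixels), so that the lag-$h$ transition probabilities of a first-order Markov chain are the entries of the $h$-th power of the one-step TPM. The function name `idealized_transiograms` and the TPM values are made up for illustration, and the "tangent" from (0, 1) is approximated in this discrete setting by the line through (0, 1) and $(1, p_{ii}(1))$.

```python
import numpy as np

def idealized_transiograms(tpm, max_lag):
    """Return t with t[h, i, j] = p_ij(h) for h = 0..max_lag steps."""
    tpm = np.asarray(tpm, dtype=float)
    # The h-step transition matrix of a first-order chain is tpm ** h;
    # h = 0 gives the identity, i.e. the start points (0, 1) and (0, 0).
    return np.stack([np.linalg.matrix_power(tpm, h)
                     for h in range(max_lag + 1)])

# Illustrative one-step TPM for three classes (values are made up).
tpm = np.array([[0.80, 0.15, 0.05],
                [0.10, 0.70, 0.20],
                [0.10, 0.30, 0.60]])

t = idealized_transiograms(tpm, max_lag=60)

# Sills: at large lags every row converges to the same vector,
# which plays the role of the class proportions p_j.
sills = t[-1, 0, :]

# Mean class length in steps: where the line through (0, 1) and
# (1, p_ii(1)) crosses the x-axis, i.e. 1 / (1 - p_ii(1)).
mean_length = 1.0 / (1.0 - np.diag(tpm))

print("sills:", np.round(sills, 3))
print("mean lengths (steps):", np.round(mean_length, 2))
```

Because such curves depend only on the one-step TPM, they are always smooth and cannot show the irregular peaks and troughs of transiograms estimated directly from data.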
Real auto-transiograms may reveal a series of peaks and troughs as regular or irregular periodicities. In variograms this feature is called the hole effect, because this form of auto-variogram is commonly observed in drill-holes that penetrate layered deposits (Jones and Ma 2001). The hole effect reflects the cyclic occurrence of a lithology in the vertical direction. Such an effect may also reasonably appear in classes of categorical geographical variables on the ground surface, because changes in the landscape may be periodic, although not necessarily strong or regular. My study shows that such a hole effect exists but is not obvious in auto-transiograms of soil types and land-cover classes. However, it may be strong, for example, in auto-transiograms estimated from borehole data of alluvial soil textural layer classes in the lateral direction (figure 2(a)).

Strong and irregular peaks and troughs appear in cross-transiograms of typical categorical geographical variables such as soil types and land-cover classes over large areas. These irregular peaks and troughs should reflect the irregular alternate occurrence of different landscapes. The relative height of the low-lag section and the position of the first peak in cross-transiograms are especially valuable in reflecting the neighbouring relationship of two classes. Similar features in real cross-transiograms of soil layers are observed from alluvial soil borehole data in the lateral direction (soil boreholes are normally too shallow for vertical transiograms to show vertical rhythms) (figure 2(b)).

Figure 2. Experimental transiograms of alluvial soil textural layer classes in the lateral direction (west to east) estimated from borehole logs along a long soil transect, showing irregular periodicities. (a) Experimental auto-transiogram. (b) Experimental cross-transiogram. Scales along the x-axis are numbers of pixels.

It should be noted that, like variograms, real transiograms may not clearly show a stable sill and a correlation range within their maximum lags. This is normal, particularly when a study area is relatively small and the spatial distribution of classes is strongly nonstationary.
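For regularly spaced data along a line, a unidirectional experimental transiogram can be estimated by counting class pairs at each lag and normalizing by the number of pairs headed by each class. The sketch below is illustrative only: the function name `experimental_transiograms`, the integer class coding and the synthetic sequence are assumptions, and it ignores irregular sampling and lag tolerances that real data would require. A strictly cyclic sequence is used so that the peaks and troughs of the auto-transiogram are easy to see.

```python
import numpy as np

def experimental_transiograms(seq, n_classes, max_lag):
    """Estimate t[h, i, j] = p_ij(h) from a 1-D sequence of class codes.

    seq holds integer codes 0..n_classes-1 at equally spaced locations;
    pairs are counted in one direction only (head at x, tail at x + h).
    """
    seq = np.asarray(seq)
    out = np.full((max_lag + 1, n_classes, n_classes), np.nan)
    for h in range(max_lag + 1):
        head = seq[: len(seq) - h]
        tail = seq[h:]
        counts = np.zeros((n_classes, n_classes))
        np.add.at(counts, (head, tail), 1)      # count (head, tail) pairs
        totals = counts.sum(axis=1, keepdims=True)
        with np.errstate(invalid="ignore", divide="ignore"):
            out[h] = counts / totals            # NaN if a class has no pairs
    return out

# Synthetic cyclic sequence: three pixels of class 0, then two of class 1.
seq = np.tile([0, 0, 0, 1, 1], 20)
t = experimental_transiograms(seq, n_classes=2, max_lag=10)

# The auto-transiogram of class 0 peaks at multiples of the cycle length.
print(np.round(t[:, 0, 0], 2))
```

Real landscapes are rarely this regular, so the periodicity in real transiograms is usually weaker and irregular, as in figure 2.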
Even without a clear sill, the heights of real transiograms still reflect the proportions of the corresponding tail classes. In addition, in a small area, sills of transiograms, particularly cross-transiograms estimated unidirectionally, may not be equal to the proportions of the corresponding tail classes. This is because of the boundary effect of small areas. The boundary effect refers to the fact that a class may have statistically biased, lower frequencies of transitions if it has a higher chance of occurring at the boundaries of the study area, because boundary polygons are incomplete and have no transitions to other classes beyond the boundary. In a large study area, however, this problem is not apparent.

3. Conclusion

Transiograms provide an alternative spatial relationship measure for categorical geographical data. For describing complex relationships of multinomial classes, transiograms have several advantages over indicator variograms:

1. Cross-transiograms can detect the directional asymmetry of occurrence sequences of classes and the juxtaposition relationships between classes because of their asymmetric and unidirectionally irreversible properties.
2. Transiograms are more interpretable physically and intuitively as direct probability representations.
3. Transiograms have more explicit relationships with characteristic parameters of classes such as proportions and parcel mean lengths.

The application of transiograms, whether as independent tools or as parameter inputs in multi-d conditional simulation of categorical variables, will increase with the further development and application of practical Markov chain models. A special advantage of transiograms is that they can be simply derived from one-step TPMs when TPMs are available. Although such idealized transiograms can capture only some basic spatial variation characteristics of classes, they are helpful in interpreting real transiograms and may be used as simplified transiogram models. However, idealized transiograms have limited uses because they cannot effectively reflect the complex features of spatial relationships of multinomial classes, and they are not widely available because one-step TPMs are difficult to estimate from sparse samples. Therefore, it is preferable in applications to estimate transiograms directly from sample data or from expert knowledge.

Acknowledgements

I want to thank Prof. Mark Gahegan, Prof. Michael F. Goodchild, Prof. Jan Feyen, Dr C. Zhang, Prof. J. E. Burt, and many other colleagues for their encouragement and/or support on the development of Markov chain-based geostatistics. Insightful comments from Prof. Mark Gahegan and help from Mr Stephen D. Weaver on this technical note are greatly appreciated.

References

CARLE, S.F. and FOGG, G.E., 1996, Transition probability-based indicator geostatistics. Mathematical Geology, 28, pp. 453-477.
CHILES, J-P. and DELFINER, P., 1999, Geostatistics: Modeling Spatial Uncertainty (New York: Wiley).
DEUTSCH, C.V. and JOURNEL, A.G., 1998, GSLIB: Geostatistics Software Library and User's Guide (New York: Oxford University Press).
JONES, T.A. and MA, Y.Z., 2001, Teacher's aide: Geologic characteristics of hole-effect variograms calculated from lithology-indicator variables. Mathematical Geology, 33, pp. 615-629.
LI, W., 2006, Transiograms for characterizing spatial variability of soil classes. Soil Science Society of America Journal (in further review).
LI, W., ZHANG, C., BURT, J.E. and ZHU, A-X., 2005, A Markov chain-based probability vector approach for modeling spatial uncertainty of soil classes. Soil Science Society of America Journal, 69, pp. 1931-1942.
LI, W., ZHANG, C., BURT, J.E., ZHU, A-X. and FEYEN, J., 2004, Two-dimensional Markov chain simulation of soil type spatial distribution. Soil Science Society of America Journal, 68, pp. 1479-1490.
LOU, J., 1996, Transition probability approach to statistical analysis of spatial qualitative variables in geology. In Geologic Modeling and Mapping, edited by A. Forster and D.F. Merriam (New York: Plenum Press), pp. 281-299.
RITZI, R.W., 2000, Behavior of indicator variograms and transition probabilities in relation to the variance in lengths of hydrofacies. Water Resources Research, 36, pp. 3375-3381.
SCHWARZACHER, W., 1969, The use of Markov chains in the study of sedimentary cycles. Mathematical Geology, 1, pp. 17-39.
WEISSMANN, G.S. and FOGG, G.E., 1999, Multi-scale alluvial fan heterogeneity modeled with transition probability geostatistics in a sequence stratigraphic framework. Journal of Hydrology, 226, pp. 48-65.
ZHANG, C. and LI, W., 2005, Markov chain modeling of multinomial land-cover classes. GIScience and Remote Sensing, 42, pp. 1-18.