An Introduction to Spatial Statistics. Chunfeng Huang Department of Statistics, Indiana University



Microwave Sounding Unit (MSU) Anomalies (Monthly): 1979-2006.

Iron Ore (Cressie, 1986). [Figure: raw percent data.]

North Carolina sudden infant deaths, 1974-1978 and 1979-1984 (Cressie, 1993). [Figure: SIDS (sudden infant death syndrome) in North Carolina, map panels labelled 1974 and 1979.]

[Figure: point patterns cells, japanesepines and redwoodfull.]

Three types of spatial data
Geostatistics: variogram, kriging.
Lattice or areal data: Markov random field, conditional autoregressive (CAR) model.
Spatial point pattern: complete spatial randomness (CSR), K function, L function.

Geostatistics: quantifying spatial dependence, and assumptions
Second-order stationarity: $E(Z(s_i)) = \mu$, $\mathrm{cov}(Z(s_i), Z(s_j)) = C(s_i - s_j)$
Isotropy: $\mathrm{cov}(Z(s_i), Z(s_j)) = C(\|s_i - s_j\|)$
Intrinsic stationarity: $E(Z(s_i)) = \mu$, $\mathrm{var}(Z(s_i) - Z(s_j)) = 2\gamma(s_i - s_j)$
$\gamma(\cdot)$: semi-variogram, $2\gamma(\cdot)$: variogram.
Second-order stationarity implies intrinsic stationarity: $\gamma(s_i - s_j) = C(0) - C(s_i - s_j)$
Intrinsic stationarity does not necessarily imply second-order stationarity.
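The link between the semi-variogram and the covariance function under second-order stationarity follows from the variance of a difference. A short derivation, using only the definitions on this slide:

```latex
\begin{aligned}
2\gamma(s_i - s_j) &= \mathrm{var}\bigl(Z(s_i) - Z(s_j)\bigr) \\
&= \mathrm{var}\,Z(s_i) + \mathrm{var}\,Z(s_j) - 2\,\mathrm{cov}\bigl(Z(s_i), Z(s_j)\bigr) \\
&= 2C(0) - 2C(s_i - s_j).
\end{aligned}
```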

[Figure: vgm(1,"sph",1.5,nugget=0.2), semivariance against distance.]
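For readers who want to reproduce the plotted curve by hand, here is a minimal Python sketch of a spherical semi-variogram. It assumes the usual gstat reading of the arguments (partial sill 1, range 1.5, nugget 0.2, so a total sill of 1.2); the function name and argument names are illustrative and not part of gstat.

```python
def spherical_variogram(h, psill=1.0, vrange=1.5, nugget=0.2):
    """Spherical semi-variogram with a nugget.

    Defaults mirror the plotted model: partial sill 1, range 1.5, nugget 0.2.
    """
    if h < 0:
        raise ValueError("distance must be non-negative")
    if h == 0:
        return 0.0  # gamma(0) = 0; the nugget is a jump just above h = 0
    if h >= vrange:
        return nugget + psill  # total sill, reached at the range
    r = h / vrange
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


for h in (0.0, 0.75, 1.5, 3.0):
    print(h, spherical_variogram(h))
```

The curve rises from the nugget to the total sill and stays flat beyond the range, which matches the shape in the figure.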

Variogram (semi-variogram)
Sill, range, nugget.
A valid variogram is conditionally negative definite. (A computed variance may be negative if this is not satisfied.)
Parametric models.

[Figure: parametric variogram models, semivariance against distance. Panels: vgm(1,"nug",0), vgm(1,"exp",1), vgm(1,"sph",1), vgm(1,"gau",1), vgm(1,"exc",1), vgm(1,"mat",1), vgm(1,"ste",1), vgm(1,"cir",1), vgm(1,"lin",0), vgm(1,"bes",1), vgm(1,"pen",1), vgm(1,"per",1), vgm(1,"hol",1), vgm(1,"log",1), vgm(1,"pow",1), vgm(1,"spl",1), vgm(1,"leg",1).]

Estimation of variogram
Empirical variogram (method-of-moments estimator):
$$2\hat{\gamma}(h) = \frac{1}{|N(h)|} \sum_{(s_i, s_j) \in N(h)} \{Z(s_i) - Z(s_j)\}^2,$$
where $N(h)$ is the set of pairs with distance $h$ and $|N(h)|$ is the number of pairs.
Tolerance regions.
Fitting parametric models (weighted least squares, MLE, REML).
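The method-of-moments estimator can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the slides: distance bins of equal width stand in for the tolerance regions, and the function name, arguments and toy data are hypothetical.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Method-of-moments estimate of gamma(h) on distance bins.

    Pairs with distance in [k*bin_width, (k+1)*bin_width) fall in bin k.
    Returns a list of (mean distance, semivariance, number of pairs).
    """
    sums = [0.0] * n_bins    # sum of squared differences per bin
    dists = [0.0] * n_bins   # sum of pair distances per bin
    counts = [0] * n_bins    # |N(h)| per bin
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        k = int(d // bin_width)
        if k < n_bins:
            sums[k] += (values[i] - values[j]) ** 2
            dists[k] += d
            counts[k] += 1
    out = []
    for k in range(n_bins):
        if counts[k] > 0:
            # 2 * gamma_hat is the mean squared difference, so halve it
            out.append((dists[k] / counts[k], 0.5 * sums[k] / counts[k], counts[k]))
    return out


# Toy transect: four sites on a line, one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
for h, g, n in empirical_semivariogram(coords, values, bin_width=1.0, n_bins=4):
    print(f"h={h:.1f}  gamma={g:.3f}  pairs={n}")
```

Bins with few pairs give noisy estimates, which is one reason weighted least squares fits weight each bin by its pair count.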

[Figure: empirical variogram, semivariance against distance.]

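Kriging
Suppose we observe a spatial process $Z(s_1), Z(s_2), \ldots, Z(s_n)$. The best (in terms of minimizing mean squared prediction error) linear unbiased predictor of $Z(s_0)$ is (Cressie, 1993)
$$\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i Z(s_i),$$
where
$$(\lambda_1, \ldots, \lambda_n) = \left( \gamma + \mathbf{1}\, \frac{1 - \mathbf{1}^T \Gamma^{-1} \gamma}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}} \right)^T \Gamma^{-1},$$
$$\gamma = (\gamma(s_0 - s_1), \ldots, \gamma(s_0 - s_n))^T, \qquad \Gamma = \{\gamma(s_i - s_j)\}_{n \times n}.$$
Kriging variance:
$$\sigma_K^2(s_0) = \gamma^T \Gamma^{-1} \gamma - \frac{(\mathbf{1}^T \Gamma^{-1} \gamma - 1)^2}{\mathbf{1}^T \Gamma^{-1} \mathbf{1}}.$$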
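A small Python sketch may help make the kriging predictor concrete. Instead of the closed-form expression, it solves the equivalent augmented linear system [Γ 1; 1ᵀ 0][λ; m] = [γ; 1], where m is the Lagrange multiplier for the constraint that the weights sum to one; the kriging variance is then λᵀγ + m. The exponential semi-variogram, the helper names and the toy data are assumptions made for illustration only.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(coords, values, s0, gamma):
    """Return (prediction, kriging variance) at s0 for a semi-variogram gamma(h)."""
    n = len(coords)
    # Rows of Gamma with a trailing 1, then the unbiasedness constraint row.
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    g0 = [gamma(math.dist(s0, c)) for c in coords]
    sol = solve(a, g0 + [1.0])
    lam, mult = sol[:n], sol[n]
    pred = sum(l * z for l, z in zip(lam, values))
    var = sum(l * g for l, g in zip(lam, g0)) + mult
    return pred, var


def exp_gamma(h):
    """Assumed exponential semi-variogram with sill 1 and no nugget."""
    return 1.0 - math.exp(-h)


print(ordinary_kriging([(0.0, 0.0), (2.0, 0.0)], [1.0, 3.0], (1.0, 0.0), exp_gamma))
```

With no nugget the predictor interpolates exactly: predicting at an observed site returns the observed value with kriging variance zero, up to rounding.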

[Figure: kriging predictions of Z over x and y.]

[Figure: prediction MSE over x and y.]

[Figure: iron ore % summarised against Northing and Easting; o = median iron ore %, x = mean iron ore %.]

Areal data or lattice data
Simultaneous autoregressive (SAR) model:
$$\begin{aligned}
z(s_1) &= b_{12} z(s_2) + \cdots + b_{1n} z(s_n) + \epsilon_1 \\
z(s_2) &= b_{21} z(s_1) + \cdots + b_{2n} z(s_n) + \epsilon_2 \\
&\;\;\vdots \\
z(s_n) &= b_{n1} z(s_1) + b_{n2} z(s_2) + \cdots + \epsilon_n
\end{aligned}$$
or, in matrix form, $z = Bz + \epsilon$.
Explanatory variables.
One can show that $\epsilon$ is not independent of $z$.
What if $z$ is discrete?
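The dependence between the errors and the response in the SAR model can be seen in one step. The sketch below assumes, beyond what the slide states, that $I - B$ is invertible and that the errors have mean zero and covariance matrix $\Lambda$:

```latex
\begin{aligned}
z &= (I - B)^{-1}\epsilon, \\
\mathrm{cov}(\epsilon, z) &= E\bigl[\epsilon\,\epsilon^{T}\bigr](I - B)^{-T} = \Lambda\,(I - B)^{-T}, \\
\mathrm{var}(z) &= (I - B)^{-1}\Lambda\,(I - B)^{-T}.
\end{aligned}
```

Since $\Lambda (I - B)^{-T}$ is not the zero matrix, $\epsilon$ and $z$ are correlated.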

Conditional autoregressive (CAR) model
$$z(s_i) \mid \{z(s_j) : j \neq i\} \sim N\Bigl(\sum_{j=1}^{n} c_{ij} z(s_j),\; \tau_i^2\Bigr),$$
where $c_{ij}\tau_j^2 = c_{ji}\tau_i^2$ and $c_{ii} = 0$.
If we further assume that the conditional dependence acts only through the neighborhood $N(s_i)$ of $s_i$ (Markov random field (MRF)): $c_{ik} = 0$, $k \notin N(s_i)$.

Response variable falls in the exponential family: auto spatial models.
Response variable is normal: auto-Gaussian model.
Response variable is binary: auto-logistic model.
Response variable is Poisson: auto-Poisson model.
A SAR model can be represented as a CAR model, but not necessarily vice versa; see the example in Cressie (1993).
Markov random field theory is used to guide the step from the conditional distributions to a valid joint distribution.

Proximity matrix $W$. Some choices for $W_{ij}$ ($W_{ii} = 0$):
$W_{ij} = 1$ if $i$, $j$ share a common boundary.
$W_{ij}$ is an inverse distance between units.
$W_{ij} = 1$ if the distance between $i$ and $j$ is less than some fixed value.
$W_{ij} = 1$ for the $m$ nearest neighbors.
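Three of the listed choices can be built directly from unit centroids, as in the following Python sketch. The function names and the centroid coordinates are hypothetical; the shared-boundary choice is omitted because it needs polygon geometry rather than point coordinates.

```python
import math


def threshold_weights(coords, max_dist):
    """W_ij = 1 if the distance between units i and j is below max_dist."""
    n = len(coords)
    return [[1 if i != j and math.dist(coords[i], coords[j]) < max_dist else 0
             for j in range(n)] for i in range(n)]


def inverse_distance_weights(coords):
    """W_ij = 1 / d_ij for i != j, with W_ii = 0."""
    n = len(coords)
    return [[0.0 if i == j else 1.0 / math.dist(coords[i], coords[j])
             for j in range(n)] for i in range(n)]


def knn_weights(coords, m):
    """W_ij = 1 if j is among the m nearest neighbours of i (may be asymmetric)."""
    n = len(coords)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:m]:
            w[i][j] = 1
    return w


# Hypothetical centroids of three areal units on a line.
centroids = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
print(threshold_weights(centroids, 1.5))
print(knn_weights(centroids, 1))
print(inverse_distance_weights(centroids))
```

In this toy layout the nearest-neighbour matrix is not symmetric: unit 3 has unit 2 as its nearest neighbour, but unit 2 is closer to unit 1.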

CAR model
$$z \sim N\bigl(X\beta,\; (I - C)^{-1} M\bigr),$$
where $C = \{c_{ij}\}_{n \times n}$, $M = \tau^2 I$ and $C = \rho W$.
$\rho = 0$: spatial independence.
Condition on $\rho$: $I - \rho W$ is positive definite.
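The condition on ρ can be checked numerically for a given W. The sketch below assumes a symmetric W and tests positive definiteness by attempting a Cholesky factorisation; the helper names and the three-unit example are illustrative only.

```python
def is_positive_definite(a):
    """Attempt a Cholesky factorisation; it succeeds iff symmetric a is positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return True


def i_minus_rho_w(w, rho):
    """Return the matrix I - rho * W."""
    n = len(w)
    return [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)]
            for i in range(n)]


# Three units in a row: the middle unit borders the other two.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
for rho in (0.0, 0.5, 0.8):
    print(rho, is_positive_definite(i_minus_rho_w(W, rho)))
```

For this W the eigenvalues are 0 and ±√2, so I − ρW is positive definite exactly when |ρ| < 1/√2 ≈ 0.707; the values 0.5 and 0.8 fall on either side of that bound.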

Spatial point process
A spatial point process is said to have the complete spatial randomness (CSR) property if it is a homogeneous Poisson point process:
If $A_1, \ldots, A_r$ are disjoint, then $N(A_1), \ldots, N(A_r)$ are independent ($N(A)$: number of events in $A$).
$N(A) \sim \mathrm{Poisson}(\lambda |A|)$, where $|A|$ is the volume of $A$.
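The two defining properties suggest a direct way to simulate a CSR pattern on a rectangle: draw the number of events from Poisson(λ|A|), then place them independently and uniformly. The following Python sketch does this with the standard library only; the function names, the rectangular window and the intensity value are assumptions for illustration.

```python
import random


def poisson_draw(mean, rng):
    """Draw from Poisson(mean) by counting unit-rate exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_csr(lam, width, height, rng):
    """Homogeneous Poisson process with intensity lam on [0, width] x [0, height]."""
    n = poisson_draw(lam * width * height, rng)  # N(A) ~ Poisson(lam * |A|)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


rng = random.Random(1)
pts = simulate_csr(lam=50, width=1.0, height=1.0, rng=rng)
left = sum(1 for x, _ in pts if x < 0.5)
# Counts in the two halves of the window are independent Poisson(25) variables.
print(len(pts), left, len(pts) - left)
```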

Testing for CSR
$W$: distance from a randomly chosen event to its nearest event.
$X$: distance from a randomly chosen point to its nearest event.
Significance is assessed through a Monte Carlo test.
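A Monte Carlo test of this kind can be sketched as follows. The choices here are assumptions, not taken from the slides: the test statistic is the mean of the event-to-nearest-event distance W over all events, the window is the unit square, the simulations condition on the observed number of events, and the test is one-sided against regularity.

```python
import math
import random


def mean_nn_distance(points):
    """Average distance from each event to its nearest other event."""
    return sum(min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)) / len(points)


def csr_monte_carlo_pvalue(points, n_sim=99, seed=0):
    """One-sided Monte Carlo p-value for regularity on the unit square.

    The observed statistic is ranked among n_sim patterns of the same size
    with independent uniform locations (CSR given the number of events).
    """
    rng = random.Random(seed)
    observed = mean_nn_distance(points)
    n = len(points)
    sims = [mean_nn_distance([(rng.random(), rng.random()) for _ in range(n)])
            for _ in range(n_sim)]
    at_least = sum(1 for s in sims if s >= observed)
    return (at_least + 1) / (n_sim + 1)


# A regular 5 x 5 grid: events are more evenly spaced than under CSR.
grid = [(0.1 + 0.2 * i, 0.1 + 0.2 * j) for i in range(5) for j in range(5)]
print(csr_monte_carlo_pvalue(grid))
```

A clustered pattern would instead give an unusually small mean nearest-event distance, so a test against clustering would count simulations with a statistic at or below the observed one.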

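First-order intensity function:
$$\lambda(x) = \lim_{|dx| \to 0} \left\{ \frac{E[N(dx)]}{|dx|} \right\}$$
Second-order intensity function:
$$\lambda_2(x, y) = \lim_{|dx|, |dy| \to 0} \left\{ \frac{E[N(dx)N(dy)]}{|dx|\,|dy|} \right\}$$
Stationary: $\lambda(x) = \lambda$, $\lambda_2(x, y) = \lambda_2(x - y)$
Isotropy: $\lambda_2(x - y) = \lambda_2(\|x - y\|)$
Under CSR: $\lambda(x) = \lambda$, $\lambda_2(x, y) = \lambda^2$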

K function
$$K(t) = \frac{1}{\lambda} E[N_0(t)],$$
where $N_0(t)$ is the number of further events within distance $t$ of an arbitrary event. One can show that
$$\lambda_2(t) = \frac{\lambda^2 K'(t)}{2\pi t}, \qquad \lambda^2 K(t) = 2\pi \int_0^t \lambda_2(y)\, y\, dy.$$
For CSR: $K(t) = \pi t^2$.
L function: $L(t) = \sqrt{K(t)/\pi} - t$.
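The definition of K translates directly into a naive estimator: replace λ by n/|A| and the expectation by an average over events. The Python sketch below does exactly that and applies no edge correction, so it is biased near the window boundary; the function names and the toy pattern are assumptions for illustration.

```python
import math


def k_hat(points, t, area):
    """Naive estimate of K(t) without edge correction.

    Uses lambda_hat = n / area and averages, over events, the number of
    further events within distance t.
    """
    n = len(points)
    pairs = sum(1 for i, p in enumerate(points) for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= t)
    lam = n / area
    return (pairs / n) / lam


def l_hat(points, t, area):
    """L(t) = sqrt(K(t) / pi) - t, which is 0 under CSR."""
    return math.sqrt(k_hat(points, t, area) / math.pi) - t


# Four events at the corners of a unit square inside a window of area 4.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(k_hat(pts, 1.0, area=4.0), l_hat(pts, 1.0, area=4.0))
```

Values of the estimated L above zero point towards clustering at that distance and values below zero towards regularity, which is how envelope plots of K or L are usually read.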

[Figure: Kenvjap, K(r) against r, with curves K obs(r), K theo(r), K hi(r), K lo(r).]

[Figure: Kenvred, K(r) against r, with curves K obs(r), K theo(r), K hi(r), K lo(r).]

References
Cressie, N. (1993), Statistics for Spatial Data, rev. ed., Wiley.
Cressie, N. and Wikle, C. (2012), Statistics for Spatio-Temporal Data, Wiley.
Diggle, P. J. (1983), Statistical Analysis of Spatial Point Patterns, Academic Press.
Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004), Hierarchical Modeling and Analysis for Spatial Data, Chapman and Hall.
Bivand, R., Pebesma, E. and Gómez-Rubio, V. (2008), Applied Spatial Data Analysis with R, Springer-Verlag.
Stein, M. (1999), Interpolation of Spatial Data.
Cressie, N. (1986), Kriging nonstationary data, Journal of the American Statistical Association, 81, 625-634.