Hierarchical models for spatial data


Based on the book by Banerjee, Carlin and Gelfand, Hierarchical Modeling and Analysis for Spatial Data, 2004. We focus on Chapters 1, 2 and 5.

Geo-referenced data arise in agriculture, climatology, economics, epidemiology, transportation and many other areas.

What does geo-referenced mean? In a nutshell, we know the geographic location at which an observation was collected.

Why does it matter? Sometimes, relative location can provide information about an outcome beyond that provided by covariates.

Models for spatial data

Example: infant mortality is typically higher in high-poverty areas. Even after incorporating poverty as a covariate, residuals may still be spatially correlated due to other factors such as nearness to pollution sites, distance to pre- and post-natal care centers, etc.

Often data are also collected over time, so models that include spatial and temporal correlations of outcomes are needed. We focus on spatial, rather than spatio-temporal, models. We also focus on models for univariate rather than multivariate outcomes.

Types of spatial data

Point-referenced data: $Y(s)$ is a random outcome (perhaps vector-valued) at location $s$, where $s$ varies continuously over some region $D$. The location $s$ is typically two-dimensional (latitude and longitude) but may also include altitude. Known as geostatistical data.

Areal data: outcome $Y_i$ is an aggregate value over an areal unit with well-defined boundaries. Here, $D$ is divided into a finite collection of areal units. Known as lattice data even though lattices can be irregular.

Point-pattern data: outcome $Y(s)$ is the occurrence or not of an event, and the locations $s$ are random. Example: locations of trees of a species in a forest, or addresses of persons with a particular disease. Interest is often in deciding whether points occur independently in space or whether there is clustering.

Marked point process data: if covariate information is available, we talk about a marked point process. The covariate value at each site marks the site as belonging to a certain covariate batch or group.

Combinations: e.g. daily ozone levels collected at monitoring stations for which we know the precise location, and the number of children in a zip code reporting to the ER with respiratory problems on that day. These require data re-alignment so that outcomes and covariates obtained at different spatial resolutions can be combined in a model.

Models for point-level data

The basics

The location index $s$ varies continuously over the region $D$. We often assume that the covariance between two observations at locations $s_i$ and $s_{i'}$ depends only on the distance $d_{ii'}$ between the points.

The spatial covariance is often modeled as exponential:

$\mathrm{Cov}(Y(s_i), Y(s_{i'})) = C(d_{ii'}) = \sigma^2 e^{-\phi d_{ii'}}$,

where $(\sigma^2, \phi) > 0$ are the partial sill and decay parameters, respectively.

Covariogram: a plot of $C(d_{ii'})$ against $d_{ii'}$.

For $i = i'$, $d_{ii'} = 0$ and $C(d_{ii'}) = \mathrm{Var}(Y(s_i))$. Sometimes $\mathrm{Var}(Y(s_i)) = \tau^2 + \sigma^2$, with $\tau^2$ the nugget effect and $\tau^2 + \sigma^2$ the sill.

Covariance structure

Suppose that outcomes are normally distributed and that we choose an exponential model for the covariance matrix. Then

$Y \mid \mu, \theta \sim N(\mu, \Sigma(\theta))$,

with $Y = \{Y(s_1), Y(s_2), \dots, Y(s_n)\}$, $\Sigma(\theta)_{ii'} = \mathrm{Cov}(Y(s_i), Y(s_{i'}))$ and $\theta = (\tau^2, \sigma^2, \phi)$. Then

$\Sigma(\theta)_{ii'} = \sigma^2 \exp(-\phi d_{ii'}) + \tau^2 I_{i=i'}$,

with $(\tau^2, \sigma^2, \phi) > 0$.

This is an example of an isotropic covariance function: the spatial correlation is a function of the distance $d$ only.

Models for point-level data, details

Basic model:

$Y(s) = \mu(s) + w(s) + e(s)$,

where $\mu(s) = x(s)^T \beta$ and the residual is divided into two components: $w(s)$ is a realization of a zero-centered stationary Gaussian process and $e(s)$ is uncorrelated pure error.

The distribution of the $w(s)$ depends on the partial sill $\sigma^2$ and decay $\phi$ parameters. The $e(s)$ introduce the nugget effect $\tau^2$.

$\tau^2$ is interpreted as pure sampling variability or as microscale variability, i.e., spatial variability at distances smaller than the distance between two outcomes: the $e(s)$ are sometimes viewed as spatial processes with rapid decay.

The variogram and semivariogram

A spatial process is said to be:

Strictly stationary if the joint distribution of $(Y(s_1), \dots, Y(s_n))$ equals that of $(Y(s_1 + h), \dots, Y(s_n + h))$ for any $n$, any set of sites, and any separation vector $h$.

Weakly stationary if $\mu(s) = \mu$ and $\mathrm{Cov}(Y(s), Y(s + h)) = C(h)$ for all $h$.

Intrinsically stationary if $E[Y(s + h) - Y(s)] = 0$ and $E[Y(s + h) - Y(s)]^2 = \mathrm{Var}[Y(s + h) - Y(s)] = 2\gamma(h)$, which is defined for differences and depends only on the separation $h$.

$2\gamma(h)$ is the variogram and $\gamma(h)$ is the semivariogram.

The specific form of the semivariogram will depend on the assumptions of the model.
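As a rough illustration of the covariance structure above, the following NumPy sketch builds $\Sigma(\theta)$ for an exponential covariance with a nugget. Under weak stationarity, $\mathrm{Var}[Y(s+h) - Y(s)] = 2C(0) - 2C(h)$, so the semivariogram is $\gamma(d) = C(0) - C(d)$ for $d > 0$, and the sketch uses that relation. The function names, the three sites and the parameter values are assumptions made for the example, not part of the slides.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distances between all pairs of rows of coords (n x 2)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def exp_covariance(coords, sigma2, phi, tau2=0.0):
    """Sigma(theta)_{ii'} = sigma2 * exp(-phi * d_{ii'}) + tau2 * 1{i = i'}."""
    d = pairwise_distances(coords)
    return sigma2 * np.exp(-phi * d) + tau2 * np.eye(len(coords))

def exp_semivariogram(d, sigma2, phi, tau2=0.0):
    """gamma(d) = C(0) - C(d) for d > 0, with C(0) = tau2 + sigma2 (the sill); gamma(0) = 0."""
    d = np.asarray(d, dtype=float)
    return np.where(d > 0, (tau2 + sigma2) - sigma2 * np.exp(-phi * d), 0.0)

# Hypothetical sites and parameter values, chosen only for illustration.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Sigma = exp_covariance(coords, sigma2=2.0, phi=0.5, tau2=0.3)
print(Sigma)                                    # diagonal entries equal the sill
print(exp_semivariogram([0.0, 1.0, 5.0], 2.0, 0.5, 0.3))
```

The diagonal of `Sigma` equals the sill $\tau^2 + \sigma^2$, and the semivariogram rises from the nugget $\tau^2$ just above distance zero towards the sill at large distances.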

Stationarity

Strict stationarity implies weak stationarity (when second moments exist), but the converse is not true except for Gaussian processes.

Weak stationarity implies intrinsic stationarity, but the converse is not true in general.

Notice that intrinsic stationarity is defined on the differences between outcomes at two locations and thus says nothing about the joint distribution of outcomes.

Semivariogram

If $\gamma(h)$ depends on $h$ only through its length $\|h\|$, then the spatial process is isotropic. Otherwise it is anisotropic.

There are many choices for isotropic models. The exponential model is popular and has good properties. For $t = \|h\|$:

$\gamma(t) = \tau^2 + \sigma^2 (1 - \exp(-\phi t))$ if $t > 0$, and $\gamma(t) = 0$ otherwise.

See figures, page 24.

The powered exponential model has an extra parameter for smoothness:

$\gamma(t) = \tau^2 + \sigma^2 (1 - \exp(-|\phi t|^{\kappa}))$ if $t > 0$.

Another popular choice is the Gaussian variogram model, equal to the exponential except for the exponent term, which is $\exp(-\phi^2 t^2)$.

Fitting of the variogram has traditionally been done "by eye":

Plot an empirical estimate of the variogram, akin to the sample variance estimate or the autocorrelation function in time series.

Choose a theoretical functional form to fit the empirical $\gamma$.

Choose values for $(\tau^2, \sigma^2, \phi)$ that fit the data well.

If a distribution for the outcomes is assumed and a functional form for the variogram is chosen, the parameters can be estimated via some likelihood-based method.

Of course, we can also be Bayesians.

Point-level data

For point-referenced data, frequentists focus on spatial prediction using kriging.

Problem: given observations $\{Y(s_1), \dots, Y(s_n)\}$, how do we predict $Y(s_o)$ at a new site $s_o$?

Consider the model

$Y = X\beta + \epsilon$, where $\epsilon \sim N(0, \Sigma)$,

and where $\Sigma = \sigma^2 H(\phi) + \tau^2 I$. Here, $H(\phi)_{ii'} = \rho(\phi, d_{ii'})$.

Kriging consists of finding a function $f(y)$ of the observations that minimizes the MSE of prediction

$Q = E[(Y(s_o) - f(y))^2 \mid y]$.
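The first step of the "by eye" variogram fit, computing an empirical estimate, can be sketched as follows. The slides do not say which estimator is meant; this sketch assumes the usual method-of-moments (binned) estimator, which averages $\frac{1}{2}(y_i - y_j)^2$ over pairs of sites whose separation falls in each distance bin. The function name, the bin edges and the toy data are hypothetical.

```python
import numpy as np

def empirical_semivariogram(coords, y, bin_edges):
    """Binned method-of-moments estimate of gamma: for each distance bin, the
    mean of 0.5 * (y_i - y_j)^2 over site pairs whose separation is in the bin."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)              # every pair of sites once
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (y[i] - y[j]) ** 2
    n_bins = len(bin_edges) - 1
    gamma_hat = np.full(n_bins, np.nan)              # NaN marks an empty bin
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (d > bin_edges[b]) & (d <= bin_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma_hat[b] = half_sq[in_bin].mean()
    return gamma_hat, counts

# Toy data (not from the slides): four sites on a line with outcome y = x.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
y = [0.0, 1.0, 2.0, 3.0]
gamma_hat, counts = empirical_semivariogram(coords, y, bin_edges=[0.5, 1.5, 2.5, 3.5])
print(gamma_hat, counts)
```

Plotting `gamma_hat` against the bin midpoints gives the empirical curve against which a theoretical form such as the exponential model is compared. Bins with few pairs (see `counts`) give noisy estimates and are usually read with caution.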

Classical kriging

(Not a surprising!) result: the $f(y)$ that minimizes $Q$ is the conditional mean of $Y(s_o)$ given the observations $y$ (see pages 50-52 for proof):

$E[Y(s_o) \mid y] = x_o^T \hat\beta + \hat\gamma^T \hat\Sigma^{-1} (y - X\hat\beta)$

$\mathrm{Var}[Y(s_o) \mid y] = \hat\sigma^2 + \hat\tau^2 - \hat\gamma^T \hat\Sigma^{-1} \hat\gamma$,

where

$\hat\gamma = (\hat\sigma^2 \rho(\hat\phi, d_{o1}), \dots, \hat\sigma^2 \rho(\hat\phi, d_{on}))^T$

$\hat\beta = (X^T \hat\Sigma^{-1} X)^{-1} X^T \hat\Sigma^{-1} y$

$\hat\Sigma = \hat\sigma^2 H(\hat\phi) + \hat\tau^2 I$, the plug-in estimate of $\Sigma = \sigma^2 H(\phi) + \tau^2 I$.

The solution assumes that we have observed the covariates $x_o$ at the new site. If not, in the classical framework $Y(s_o)$ and $x_o$ are jointly estimated using an EM-type iterative algorithm.

Bayesian methods for estimation

The Gaussian isotropic kriging model is just a general linear model similar to those in Chapter 15 of the textbook. We just need to define the appropriate covariance structure.

For an exponential covariance structure with a nugget effect, the parameters to be estimated are $\theta = (\beta, \sigma^2, \tau^2, \phi)$.

Steps:

Choose priors and define the sampling distribution.

Obtain the posterior for all parameters, $p(\theta \mid y)$.

Bayesian kriging: get the posterior predictive distribution for the outcome at a new location, $p(y_o \mid y, X, x_o)$.

Sampling distribution (marginal data model):

$y \mid \theta \sim N(X\beta, \sigma^2 H(\phi) + \tau^2 I)$

Priors: typically chosen so that parameters are independent a priori. As in the linear model:

A non-informative prior for $\beta$ is uniform; a normal prior can be used too.

Conjugate priors for the variances $\sigma^2, \tau^2$ are inverse gamma priors.

For $\phi$, the appropriate prior depends on the covariance model. For the simple exponential, where $\rho(s_i - s_j; \phi) = \exp(-\phi \|s_i - s_j\|)$, a Gamma prior can be a good choice.

Be cautious with improper priors for anything but $\beta$.

Hierarchical representation of model

Hierarchical model representation: first condition on the spatial random effects $W = \{w(s_1), \dots, w(s_n)\}$:

$y \mid \theta, W \sim N(X\beta + W, \tau^2 I)$

$W \mid \phi, \sigma^2 \sim N(0, \sigma^2 H(\phi))$.

Model specification is then completed by choosing priors for $\beta, \tau^2$ and for $\phi, \sigma^2$ (hyperparameters).

Note that the hierarchical model has $n$ more parameters (the $w(s_i)$) than the marginal model.

Computation with the marginal model is preferable because $\sigma^2 H(\phi) + \tau^2 I$ tends to be better behaved than $\sigma^2 H(\phi)$ at small distances.
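The classical plug-in kriging formulas for $E[Y(s_o) \mid y]$ and $\mathrm{Var}[Y(s_o) \mid y]$ can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: an exponential correlation $\rho(\phi, d) = \exp(-\phi d)$, parameter estimates $(\hat\sigma^2, \hat\phi, \hat\tau^2)$ supplied by the caller rather than estimated here, and $\hat\Sigma$ taken to include the nugget term so that it matches the model covariance. The function name `krige` and the data are hypothetical.

```python
import numpy as np

def krige(coords, y, X, s_new, x_new, sigma2, phi, tau2):
    """Plug-in kriging mean and variance at one new site s_new,
    assuming an exponential correlation rho(phi, d) = exp(-phi * d)."""
    coords = np.asarray(coords, dtype=float)
    s_new = np.asarray(s_new, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    Sigma = sigma2 * np.exp(-phi * d) + tau2 * np.eye(len(y))
    # gamma_hat: covariances between the new site and each observed site
    gamma = sigma2 * np.exp(-phi * np.linalg.norm(coords - s_new, axis=1))
    # Generalized least squares estimate of beta
    Si_X = np.linalg.solve(Sigma, X)
    Si_y = np.linalg.solve(Sigma, y)
    beta = np.linalg.solve(X.T @ Si_X, X.T @ Si_y)
    Si_gamma = np.linalg.solve(Sigma, gamma)
    mean = x_new @ beta + Si_gamma @ (y - X @ beta)
    var = sigma2 + tau2 - gamma @ Si_gamma
    return mean, var

# Hypothetical data: four sites, intercept-only mean.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y = np.array([1.0, 2.0, 1.5, 3.0])
X = np.ones((4, 1))
mean, var = krige(coords, y, X, s_new=[1.0, 1.0], x_new=np.array([1.0]),
                  sigma2=1.0, phi=0.5, tau2=0.2)
print(mean, var)
```

Two sanity checks follow from the formulas: with $\tau^2 = 0$ the predictor reproduces the observed value at an observed site with zero variance, and far from all data the variance approaches the sill $\sigma^2 + \tau^2$. As in the formulas, the variance treats the parameter estimates as known and ignores their uncertainty.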

Estimation of spatial surface $W \mid y$

Interest is sometimes in estimating the spatial surface using $p(W \mid y)$.

If the marginal model is fitted, we can still get the marginal posterior for $W$ as

$p(W \mid y) = \int p(W \mid \theta, y)\, p(\theta \mid y)\, d\theta$.

Given draws $\theta^{(g)} = (\beta^{(g)}, \sigma^{2(g)}, \tau^{2(g)}, \phi^{(g)})$ from the Gibbs sampler on the marginal model, we can generate $W^{(g)}$ from $p(W \mid \theta^{(g)}, y)$, which is multivariate normal with covariance $V = (H^{-1}(\phi^{(g)})/\sigma^{2(g)} + I/\tau^{2(g)})^{-1}$ and mean $V (y - X\beta^{(g)})/\tau^{2(g)}$.

Analytical marginalization over $W$ is possible only if the model has Gaussian form.

Bayesian kriging

Let $Y_o = Y(s_o)$ and $x_o = x(s_o)$. Kriging is accomplished by obtaining the posterior predictive distribution

$p(y_o \mid x_o, X, y) = \int p(y_o, \theta \mid y, X, x_o)\, d\theta = \int p(y_o \mid \theta, y, x_o)\, p(\theta \mid y, X)\, d\theta$.

Since $(Y_o, Y)$ are jointly multivariate normal (see expressions 2.18 and 2.19 on page 51), $p(y_o \mid \theta, y, x_o)$ is a conditional normal distribution.

Given MCMC draws of the parameters $(\theta^{(1)}, \dots, \theta^{(G)})$ from the posterior distribution $p(\theta \mid y, X)$, we draw a value $y_o^{(g)}$ for each $\theta^{(g)}$ as

$y_o^{(g)} \sim p(y_o \mid \theta^{(g)}, y, x_o)$.

The draws $\{y_o^{(1)}, y_o^{(2)}, \dots, y_o^{(G)}\}$ are a sample from the posterior predictive distribution of the outcome at the new location $s_o$.

To predict $Y$ at a set of $m$ new locations $s_{o1}, \dots, s_{om}$, it is best to do joint prediction, so that the posterior association among the $m$ predictions can be estimated.

Beware of joint prediction at many new locations with WinBUGS. It can take forever.

Kriging example from WinBUGS

The data were first published by Davis (1973) and consist of heights at 52 locations in a 310-foot square area.

We have 52 $s = (x, y)$ coordinates and outcomes (heights).

The unit of distance is 50 feet and the unit of elevation is 10 feet.

The model is

height $= \beta + \epsilon$, where $\epsilon \sim N(0, \Sigma)$,

and where $\Sigma = \sigma^2 H(\phi)$. Here,

$H(\phi)_{ij} = \rho(s_i - s_j; \phi) = \exp(-(\phi \|s_i - s_j\|)^{\kappa})$.

Priors are placed on $(\beta, \phi, \kappa)$.

We predict elevations at 225 new locations.
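The composition-sampling step of Bayesian kriging, one draw $y_o^{(g)}$ per posterior draw $\theta^{(g)}$, can be sketched as follows. This is only a sketch under assumptions: an exponential correlation with a nugget, a single new site, and a list of "posterior draws" that is made up here purely so the code runs (in practice it would come from an MCMC fit, for example in WinBUGS). The function name `predictive_draws` and all numbers are hypothetical.

```python
import numpy as np

def predictive_draws(theta_draws, coords, y, X, s_new, x_new, rng):
    """For each theta^(g) = (beta, sigma2, tau2, phi), draw
    y_o^(g) ~ p(y_o | theta^(g), y, x_o), a conditional normal."""
    coords = np.asarray(coords, dtype=float)
    s_new = np.asarray(s_new, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_new = np.linalg.norm(coords - s_new, axis=1)
    out = np.empty(len(theta_draws))
    for g, (beta, sigma2, tau2, phi) in enumerate(theta_draws):
        Sigma = sigma2 * np.exp(-phi * d) + tau2 * np.eye(len(y))
        gamma = sigma2 * np.exp(-phi * d_new)
        w = np.linalg.solve(Sigma, gamma)            # Sigma^{-1} gamma
        mean = x_new @ beta + w @ (y - X @ beta)
        var = sigma2 + tau2 - gamma @ w
        out[g] = rng.normal(mean, np.sqrt(max(var, 0.0)))  # guard against rounding
    return out

# Hypothetical data and PLACEHOLDER posterior draws (not output of a real sampler).
rng = np.random.default_rng(0)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y = np.array([1.0, 2.0, 1.5, 3.0])
X = np.ones((4, 1))
G = 500
theta_draws = [(np.array([rng.normal(1.9, 0.1)]), 1.0, 0.2, rng.uniform(0.3, 0.7))
               for _ in range(G)]
y_o = predictive_draws(theta_draws, coords, y, X, [1.0, 1.0], np.array([1.0]), rng)
print(y_o.mean(), np.quantile(y_o, [0.025, 0.975]))
```

The empirical mean and quantiles of `y_o` then summarize the posterior predictive distribution at $s_o$. For joint prediction at $m$ sites, the scalar conditional normal would be replaced by an $m$-dimensional one so that the draws carry the association among the predictions.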