Areal data models. Spatial smoothers. Brook's lemma and Gibbs distribution. CAR models: Gaussian case, non-Gaussian case

Areal data models
- Spatial smoothers
- Brook's lemma and Gibbs distribution
- CAR models: Gaussian case, non-Gaussian case
- SAR models: Gaussian case, non-Gaussian case
- CAR vs. SAR
- STAR models

Inference for areal data
Note: This chapter is very important for hierarchical spatial modeling of any type of data using MCMC methods.
For areal units the inferential issues are:
- Is there a spatial pattern? How strong is it? A spatial pattern suggests that observations close to each other have more similar values than those far from each other.
- Do we want to smooth the data? How much?
- If we modify the areal units to new units (say, from zip codes to counties), what can we say about the new counts we expect for the latter given those for the former? This is the modifiable areal unit problem (MAUP).

Exploratory tools
Proximity matrix
$W$: proximity matrix. The entries of $W$ connect the different values of the process $Y_1, \ldots, Y_n$ in some fashion. Generally $w_{ii}$ is set to zero.
Examples:
- $w_{ij} = 1$ if $i$ and $j$ share a common boundary.
- $w_{ij}$ could be based on the distance between the centroids of regions $i$ and $j$.
- $w_{ij} = 1$ if $j$ is one of the $K$ nearest neighbors of $i$.
The first two choices give a symmetric $W$, but $W$ does not need to be symmetric; the $K$-nearest-neighbor choice, for instance, need not give one.
The $w_{ij}$ might be standardized by the row sums $\sum_j w_{ij} = w_{i+}$.
We can define distance intervals $(0, d_1]$, $(d_1, d_2]$, and so on. Then, we call:

- First-order neighbors of unit $i$: all units within distance $d_1$ of $i$.
- Second-order neighbors: all units within distance $d_2$ of $i$ but separated by more than $d_1$.
Analogous to $W$, we can define $W^{(1)}$ as the proximity matrix for the first-order neighbors. This means $w^{(1)}_{ij} = 1$ if $i$ and $j$ are first-order neighbors. And so on.
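The distance-band construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not part of the original slides: the centroid coordinates and the cutoffs `d1` and `d2` are made-up values, and the names `W1`, `W2` and `W1_std` are hypothetical.

```python
import numpy as np

# Hypothetical centroids of 5 areal units (illustrative coordinates only).
centroids = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
d1, d2 = 1.5, 2.5  # assumed distance cutoffs

# Pairwise Euclidean distances between centroids.
diff = centroids[:, None, :] - centroids[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# First-order neighbors: within d1 of each other (dist > 0 keeps w_ii = 0).
W1 = ((dist > 0) & (dist <= d1)).astype(float)
# Second-order neighbors: within d2 but separated by more than d1.
W2 = ((dist > d1) & (dist <= d2)).astype(float)

# Row standardization: divide row i by w_{i+}; units without neighbors keep a zero row.
row_sums = W1.sum(axis=1)
W1_std = np.divide(W1, row_sums[:, None], out=np.zeros_like(W1),
                   where=row_sums[:, None] > 0)
```

In this sketch `W1` is symmetric, while the row-standardized `W1_std` generally is not, which illustrates why $W$ does not need to be symmetric.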

Measures of spatial association
The standard statistics are Moran's I and Geary's C. They are the analogues for areal data of the empirical correlation function and the variogram.
Moran's I:
$$I = \frac{n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$$
$I$ is not supported on $[-1, 1]$. Under the hypothesis of independence, $I$ is asymptotically normal with mean $-1/(n-1)$.

Geary's C:
$$C = \frac{(n-1) \sum_i \sum_j w_{ij} (Y_i - Y_j)^2}{2 \left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$$
$C$ is never negative, and has mean 1 under the null model. Low values (between 0 and 1) indicate positive spatial association. Under the null hypothesis we have asymptotic normality. However, for testing it is preferable to use a Monte Carlo approach, permuting the values of the $Y_i$.
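As a rough illustration of the two statistics and of the permutation test, the sketch below computes Moran's I and Geary's C with NumPy on a small regular lattice. The helper names (`rook_lattice`, `morans_i`, `gearys_c`, `permutation_pvalue`), the 4 x 4 grid and the gradient surface `y` are all assumptions made for the example, not something taken from the slides.

```python
import numpy as np

def rook_lattice(nrow, ncol):
    """Binary adjacency matrix for cells sharing an edge on an nrow x ncol grid."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            if c + 1 < ncol:          # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < nrow:          # neighbor below
                W[i, i + ncol] = W[i + ncol, i] = 1.0
    return W

def morans_i(y, W):
    z = y - y.mean()
    return len(y) * (z @ W @ z) / (W.sum() * (z @ z))

def gearys_c(y, W):
    z = y - y.mean()
    sq_diff = (y[:, None] - y[None, :]) ** 2
    return (len(y) - 1) * (W * sq_diff).sum() / (2.0 * W.sum() * (z @ z))

def permutation_pvalue(stat, y, W, n_perm=999, seed=0):
    """One-sided Monte Carlo p-value for large values of stat, permuting the Y_i."""
    rng = np.random.default_rng(seed)
    observed = stat(y, W)
    sims = np.array([stat(rng.permutation(y), W) for _ in range(n_perm)])
    return (1 + np.sum(sims >= observed)) / (n_perm + 1)

W = rook_lattice(4, 4)
# Smooth gradient surface: value = row index + column index.
y = np.array([r + c for r in range(4) for c in range(4)], dtype=float)

I_obs = morans_i(y, W)
C_obs = gearys_c(y, W)
p_I = permutation_pvalue(morans_i, y, W)
```

For a smooth surface like this one, `I_obs` is positive and `C_obs` falls below 1, both pointing to positive spatial association. For Geary's C the relevant tail is the lower one, so the comparison inside `permutation_pvalue` would need to be reversed.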

The correlogram is a useful tool to study spatial association with areal data. Working with $I$, we can replace $w_{ij}$ with the previously defined $w^{(1)}_{ij}$ and obtain, say, $I^{(1)}$. Then, we replace it with $w^{(2)}_{ij}$ and obtain $I^{(2)}$. A plot of $I^{(r)}$ versus $r$ is called a correlogram. If there is a spatial pattern, we expect $I^{(r)}$ to decline in $r$ initially and then vary about 0.

Spatial smoothers
$W$ provides a spatial smoother. We can replace $Y_i$ by
$$\hat{Y}_i = \sum_j w_{ij} Y_j / w_{i+}.$$
This ensures that the value for an areal unit $i$ looks more like its neighbors. Alternatively, to take into account the actual value of $Y_i$, we can consider
$$\hat{Y}_i^{*} = (1 - \alpha) Y_i + \alpha \hat{Y}_i, \quad \alpha \in (0, 1).$$
This can be viewed as a filter. We will revisit this topic in the hierarchical modeling chapter.
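A minimal sketch of this smoother in NumPy follows. The function name `spatial_smooth` and the four-unit chain example are illustrative assumptions, and the code assumes every unit has at least one neighbor so that $w_{i+} > 0$.

```python
import numpy as np

def spatial_smooth(y, W, alpha=1.0):
    """Return (1 - alpha) * y_i + alpha * (sum_j w_ij y_j / w_i+) for every unit."""
    w_plus = W.sum(axis=1)            # assumes w_i+ > 0 for all i
    neighbor_mean = (W @ y) / w_plus
    return (1.0 - alpha) * y + alpha * neighbor_mean

# Four units on a line: 0 - 1 - 2 - 3.
W = np.diag(np.ones(3), 1)
W = W + W.T
y = np.array([0.0, 10.0, 0.0, 10.0])

y_full = spatial_smooth(y, W, alpha=1.0)   # pure neighbor average
y_half = spatial_smooth(y, W, alpha=0.5)   # halfway between data and neighbor average
```

With `alpha=1.0` each value is replaced by its neighbor average; intermediate values of `alpha` shrink each observation only part of the way toward that average.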

Brook's Lemma
Given $p(y_1, \ldots, y_n)$, the full conditional distributions $p(y_i \mid y_j, j \neq i)$, $i = 1, \ldots, n$, are uniquely determined. Brook's lemma proves the converse, and it enables us to retrieve the unique joint distribution determined by the conditionals.
However, we cannot write down an arbitrary set of conditionals and assert that they determine a joint distribution.
Example:
$$Y_1 \mid Y_2 \sim N(\alpha_0 + \alpha_1 Y_2, \sigma_1^2), \qquad Y_2 \mid Y_1 \sim N(\beta_0 + \beta_1 Y_1^3, \sigma_2^2)$$
The first conditional gives $E[Y_1] = \alpha_0 + \alpha_1 E[Y_2]$, so $E[Y_1]$ is linear in $E[Y_2]$, and hence $E[Y_2]$ is linear in $E[Y_1]$. However, it must also be the case that $E[Y_2] = \beta_0 + \beta_1 E[Y_1^3]$.

This cannot hold in general. Therefore, there is no joint distribution with these conditionals.
Also, $p(y_1, \ldots, y_n)$ might be improper even if the conditionals are proper.
Example: $p(y_1, y_2) \propto \exp\{-\tfrac{1}{2}(y_1 - y_2)^2\}$. Here $p(y_1 \mid y_2)$ is $N(y_2, 1)$ and $p(y_2 \mid y_1)$ is $N(y_1, 1)$, but $p(y_1, y_2)$ is improper.

Brook's Lemma
$$p(y_1, \ldots, y_n) = \frac{p(y_1 \mid y_2, \ldots, y_n)}{p(y_{10} \mid y_2, \ldots, y_n)} \cdot \frac{p(y_2 \mid y_{10}, y_3, \ldots, y_n)}{p(y_{20} \mid y_{10}, y_3, \ldots, y_n)} \cdots \frac{p(y_n \mid y_{10}, \ldots, y_{n-1,0})}{p(y_{n0} \mid y_{10}, \ldots, y_{n-1,0})} \cdot p(y_{10}, \ldots, y_{n0})$$
Here $y_0 = (y_{10}, \ldots, y_{n0})$ is any fixed point in the support of $p$. The joint distribution is determined up to a proportionality constant by the conditionals.
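As a worked check of the lemma for $n = 2$, take the conditionals $p(y_1 \mid y_2) = N(y_2, 1)$ and $p(y_2 \mid y_1) = N(y_1, 1)$ and choose the reference point $y_0 = (0, 0)$ (the choice of reference point is an assumption made here for convenience). The Gaussian normalizing constants cancel in each ratio, and the constant $p(y_{10}, y_{20})$ is absorbed into the proportionality:

```latex
\begin{align*}
p(y_1, y_2)
  &= \frac{p(y_1 \mid y_2)}{p(y_{10} \mid y_2)}
     \cdot \frac{p(y_2 \mid y_{10})}{p(y_{20} \mid y_{10})}
     \cdot p(y_{10}, y_{20}) \\
  &\propto \frac{\exp\{-\tfrac12 (y_1 - y_2)^2\}}{\exp\{-\tfrac12 y_2^2\}}
     \cdot \frac{\exp\{-\tfrac12 y_2^2\}}{\exp\{0\}} \\
  &= \exp\{-\tfrac12 (y_1 - y_2)^2\}.
\end{align*}
```

The result depends on $(y_1, y_2)$ only through the difference $y_1 - y_2$, so it does not integrate to a finite value over $\mathbb{R}^2$: the conditionals are proper but the joint is improper.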

Definitions
Markov random field (MRF): We specify a set of full conditional distributions for the $Y_i$ such that $p(y_i \mid y_j, j \neq i) = p(y_i \mid y_j, j \in \delta_i)$, where $\delta_i$ is the set of neighbors of $i$. The notion of using local specification to determine a joint distribution is referred to as a MRF.
Clique: A clique is a set of cells such that each element is a neighbor of every other element. We use the notation $i \sim j$ if $i$ is a neighbor of $j$ and $j$ is a neighbor of $i$.
Potential: A potential of order $k$ is a function of $k$ arguments that is exchangeable in these arguments. The arguments of the potential would be the values taken by variables associated with the cells for a clique of size $k$.
Example: for $k = 2$, we have $Y_i Y_j$ if $i$ and $j$ form a clique of size 2. This is a potential of order 2.

Gibbs distribution: $p(y_1, \ldots, y_n)$ is a Gibbs distribution if it is a function of the $Y_i$ only through potentials on cliques:
$$p(y_1, \ldots, y_n) \propto \exp\Bigl\{\gamma \sum_k \sum_{\alpha \in M_k} \phi^{(k)}(y_{\alpha_1}, \ldots, y_{\alpha_k})\Bigr\}$$
where $\phi^{(k)}$ is a potential of order $k$, $M_k$ is the collection of all subsets of size $k$, $\alpha$ indexes this set, and $\gamma > 0$ is a scale parameter.
Hammersley-Clifford Theorem: If we have a MRF, i.e. if the conditionals define a unique joint distribution, then this joint distribution is a Gibbs distribution.

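For continuous data on $\mathbb{R}$, a common choice for the joint distribution is
$$p(y_1, \ldots, y_n) \propto \exp\Bigl\{-\frac{1}{2\tau^2} \sum_{i,j} (y_i - y_j)^2 I(i \sim j)\Bigr\}$$
with each neighboring pair counted once in the sum. We will study distributions of this type next. They are Gibbs distributions on potentials of order 1 and 2, and their full conditionals are
$$p(y_i \mid y_j, j \neq i) = N\Bigl(\sum_{j \in \delta_i} y_j / m_i, \; \tau^2 / m_i\Bigr)$$
where $m_i$ is the number of neighbors of $i$.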
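The full conditional can be checked numerically from the precision matrix implied by the pairwise-difference form. The sketch below assumes a four-unit chain, an arbitrary value of $\tau^2$ and arbitrary data `y`; it uses the standard Gaussian identity that, for precision matrix $Q$, the conditional of $y_i$ given the rest has mean $-\sum_{j \neq i} Q_{ij} y_j / Q_{ii}$ and variance $1/Q_{ii}$.

```python
import numpy as np

# Four units on a line: 0 - 1 - 2 - 3 (illustrative neighbor structure).
W = np.diag(np.ones(3), 1)
W = W + W.T
tau2 = 2.0                      # assumed value of tau^2
m = W.sum(axis=1)               # number of neighbors m_i

# Precision implied by exp{-(1/(2 tau^2)) * sum over neighboring pairs (y_i - y_j)^2}.
Q = (np.diag(m) - W) / tau2

y = np.array([1.0, 4.0, 2.0, 7.0])   # arbitrary values
i = 1
others = [j for j in range(len(y)) if j != i]

# Gaussian full conditional read off the precision matrix.
cond_mean = -(Q[i, others] @ y[others]) / Q[i, i]
cond_var = 1.0 / Q[i, i]

neighbor_avg = (W[i] @ y) / m[i]     # average of y over the neighbors of i
```

Here `cond_mean` equals `neighbor_avg` and `cond_var` equals `tau2 / m[i]`, matching the stated full conditional.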

CAR models
Conditionally autoregressive (CAR) models. They are widely used in MCMC methods for fitting certain classes of hierarchical spatial models, as we will see later.
The Gaussian (autonormal) case
$$Y_i \mid y_j, j \neq i \sim N\Bigl(\sum_j b_{ij} y_j, \; \tau_i^2\Bigr), \quad i = 1, \ldots, n.$$
These full conditionals are compatible, and through Brook's lemma we obtain
$$p(y_1, \ldots, y_n) \propto \exp\bigl\{-\tfrac{1}{2} y^T D^{-1} (I - B) y\bigr\}$$
where $B = \{b_{ij}\}$ and $D$ is diagonal with $D_{ii} = \tau_i^2$. For $Y$ to be normal we first need $\Sigma_Y = (I - B)^{-1} D$ to be symmetric. The simple resulting conditions are

$$\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2} \quad \text{for all } i, j.$$
Thus, $B$ is in general not symmetric in this setting. Suppose we set $b_{ij} = w_{ij} / w_{i+}$ and $\tau_i^2 = \tau^2 / w_{i+}$. Then the condition is satisfied, and we have
$$p(y_1, \ldots, y_n) \propto \exp\Bigl\{-\frac{1}{2\tau^2} y^T (D_w - W) y\Bigr\}, \quad (1)$$
where $D_w$ is diagonal with $(D_w)_{ii} = w_{i+}$.

A second problem is that $(D_w - W) 1 = 0$, so $\Sigma_y^{-1}$ is singular and $\Sigma_y$ does not exist. Thus, this distribution is improper. We can rewrite (1) as
$$p(y_1, \ldots, y_n) \propto \exp\Bigl\{-\frac{1}{2\tau^2} \sum_{i < j} w_{ij} (y_i - y_j)^2\Bigr\}.$$
The impropriety of $p$ is clear, since we can add any constant to all the $Y_i$ and the distribution is unaffected. The $Y_i$ are not centered. A constraint such as $\sum_i Y_i = 0$ would solve the problem. This is the IAR (intrinsically autoregressive) model: a joint distribution that is improper but has all full conditionals proper.

The impropriety can be remedied in an obvious way. Redefine $\Sigma_y^{-1} = D_w - \rho W$ and choose $\rho$ to make $\Sigma_y^{-1}$ nonsingular. This is guaranteed if $\rho \in (1/\lambda_1, 1/\lambda_n)$, where $\lambda_1 < \cdots < \lambda_n$ are the ordered eigenvalues of $D_w^{-1/2} W D_w^{-1/2}$. The bounds can be simplified by replacing $W$ by $\tilde{W} = \mathrm{Diag}(1/w_{i+}) W$. Then $\Sigma_y^{-1} = D_w (I - \alpha \tilde{W})$, where $D_w$ is diagonal. If $|\alpha| < 1$, then $I - \alpha \tilde{W}$ is nonsingular.
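The singularity of the intrinsic model and the effect of $\rho$ can be seen numerically. The sketch below is only an illustration under assumed inputs: a four-unit chain for $W$, $\tau^2$ taken as 1, and $\rho = 0.9$ chosen arbitrarily inside the admissible interval.

```python
import numpy as np

# Four units on a line: 0 - 1 - 2 - 3 (illustrative neighbor structure).
n = 4
W = np.diag(np.ones(n - 1), 1)
W = W + W.T
w_plus = W.sum(axis=1)
D_w = np.diag(w_plus)

# Intrinsic (IAR) precision: every row sums to zero, so it is singular.
Q_iar = D_w - W
rows_sum_to_zero = np.allclose(Q_iar @ np.ones(n), 0.0)
smallest_eig_iar = np.linalg.eigvalsh(Q_iar)[0]

# Admissible range for rho from the eigenvalues of D_w^{-1/2} W D_w^{-1/2}.
scale = 1.0 / np.sqrt(w_plus)
S = scale[:, None] * W * scale[None, :]
lam = np.linalg.eigvalsh(S)            # returned in ascending order
lower, upper = 1.0 / lam[0], 1.0 / lam[-1]

# Proper CAR precision for a rho inside the interval.
rho = 0.9
Q_car = D_w - rho * W
smallest_eig_car = np.linalg.eigvalsh(Q_car)[0]
Sigma_car = np.linalg.inv(Q_car)       # exists because Q_car is positive definite
```

In this example the largest eigenvalue of $D_w^{-1/2} W D_w^{-1/2}$ is 1, so the upper bound for $\rho$ is 1, and $\rho = 1$ gives back the singular intrinsic precision.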

Interpretation of the $\rho$ parameter: When the additional parameter $\rho$ is zero, the $Y_i$ become independent. However, $\rho$ should not be interpreted as a parameter that explains the spatial dependency. For instance, in a simulation study, $\rho = .8$ gives $I = .1$ and $\rho = .9$ gives $I = .5$. An improper choice ($\rho = 1$) may enable wider scope for posterior spatial patterns, and might be preferable.

We may write the CAR model as $Y = BY + \epsilon$, or $(I - B) Y = \epsilon$. If $p(y)$ is proper, then $Y \sim N(0, (I - B)^{-1} D)$, so $\epsilon \sim N(0, D (I - B)^T)$, i.e. the components of $\epsilon$ are not independent. Also $\mathrm{cov}(\epsilon, Y) = D$.
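Both moments follow directly from $\epsilon = (I - B) Y$ and $\Sigma_Y = (I - B)^{-1} D$. A short worked derivation, using only the quantities defined above:

```latex
\begin{align*}
\operatorname{Var}(\epsilon)
  &= (I - B)\,\Sigma_Y\,(I - B)^T
   = (I - B)(I - B)^{-1} D\,(I - B)^T
   = D\,(I - B)^T, \\
\operatorname{Cov}(\epsilon, Y)
  &= (I - B)\,\Sigma_Y
   = (I - B)(I - B)^{-1} D
   = D.
\end{align*}
```

Since $D (I - B)^T$ is not diagonal unless $B = 0$, the components of $\epsilon$ are correlated.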

The non-Gaussian CAR
In many cases (e.g. binary data) the normality assumption might not be appropriate. We can start with any exponential family model:
$$p(y_i \mid y_j, j \neq i) \propto \exp\bigl(\psi(\theta_i y_i - \chi(\theta_i))\bigr)$$
where $\theta_i$ is the canonical parameter, e.g. $\theta_i = \sum_{j \neq i} b_{ij} y_j$, $\chi$ is some specific function, and $\psi$ is a non-negative dispersion parameter. If you write $\theta_i = x_i^T \beta + \sum_{j \neq i} b_{ij} y_j$ for some covariates $x_i$, then we have
$$p(y_i \mid y_j, j \neq i) \propto \exp\Bigl(x_i \tau + \psi \sum_{j \neq i} b_{ij} y_j\Bigr)$$

Autologistic model
When the $Y_i$ are binary, the previous model gives us the autologistic model
$$\log \frac{P(Y_i = 1)}{P(Y_i = 0)} = x_i^T \gamma + \psi \sum_{j \neq i} w_{ij} y_j,$$
where $w_{ij} = 1$ if $i \sim j$, and zero otherwise. The joint distribution (by Brook's lemma) is
$$p(y_1, \ldots, y_n) \propto \exp\Bigl(\gamma^T \Bigl(\sum_i y_i x_i\Bigr) + \psi \sum_{i < j} w_{ij} y_i y_j\Bigr)$$
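Because each full conditional is a Bernoulli distribution with the log-odds given above, the autologistic model can be simulated with a simple Gibbs sampler. The following is a minimal sketch, not the method of the slides: the function name `autologistic_gibbs`, the 5 x 5 lattice, the intercept-only design matrix and the parameter values are all assumptions made for illustration.

```python
import numpy as np

def autologistic_gibbs(W, X, gamma, psi, n_sweeps=200, seed=0):
    """Sweep through the units, drawing each y_i from its Bernoulli full conditional."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    y = rng.integers(0, 2, size=n).astype(float)   # random binary start
    for _ in range(n_sweeps):
        for i in range(n):
            # Conditional log-odds: x_i' gamma + psi * sum_j w_ij y_j (w_ii = 0).
            log_odds = X[i] @ gamma + psi * (W[i] @ y)
            p_one = 1.0 / (1.0 + np.exp(-log_odds))
            y[i] = float(rng.random() < p_one)
    return y

# Illustrative setup: 5 x 5 lattice with edge-sharing neighbors, intercept only.
side = 5
n = side * side
W = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        i = r * side + c
        if c + 1 < side:
            W[i, i + 1] = W[i + 1, i] = 1.0
        if r + 1 < side:
            W[i, i + side] = W[i + side, i] = 1.0
X = np.ones((n, 1))

y_draw = autologistic_gibbs(W, X, gamma=np.array([-1.0]), psi=0.5)
```

With $\psi = 0$ the sampler reduces to independent Bernoulli draws; positive $\psi$ makes a unit more likely to equal 1 when its neighbors do.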

SAR models
Simultaneous autoregressive (SAR) models. Remember that we may write the CAR model as $Y = BY + \epsilon$, or $(I - B) Y = \epsilon$. Suppose that, instead of letting $Y$ induce the distribution of $\epsilon$, we let $\epsilon$ induce a distribution for $Y$. Suppose $\epsilon \sim N(0, \tilde{D})$, where $\tilde{D}$ is diagonal with $(\tilde{D})_{ii} = \sigma_i^2$. Now
$$Y_i = \sum_j b_{ij} Y_j + \epsilon_i.$$
Therefore, if $(I - B)$ is full rank,
$$Y \sim N\bigl(0, \; (I - B)^{-1} \tilde{D} ((I - B)^{-1})^T\bigr)$$

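Also $\mathrm{cov}(\epsilon, Y) = \tilde{D} ((I - B)^{-1})^T$, which reduces to $\tilde{D} (I - B)^{-1}$ when $B$ is symmetric. If $\tilde{D} = \sigma^2 I$, then
$$Y \sim N\bigl(0, \; \sigma^2 (I - B)^{-1} ((I - B)^{-1})^T\bigr)$$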

Common choices for $B$:
- $B = \rho W$, where $W$ is called the contiguity matrix; $W$ has entries 1 or 0 according to whether or not $i$ and $j$ are neighbors (with $w_{ii} = 0$). Here $\rho$ is called a spatial autoregression parameter. To get $I - \rho W$ nonsingular, we need to impose $\rho \in (1/\lambda_1, 1/\lambda_n)$, where $\lambda_1 < \cdots < \lambda_n$ are the ordered eigenvalues of $W$.
- Alternatively, $W$ can be replaced by the row-standardized $\tilde{W}$, setting $B = \rho \tilde{W}$; then we need $|\rho| < 1$.
With point-referenced data, $B$ is taken to be $\rho W$, where $W$ is the matrix of inter-point distances.
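The SAR covariance for $B = \rho W$ and $\tilde{D} = \sigma^2 I$ can be built directly and compared with simulated draws of $Y = (I - B)^{-1} \epsilon$. This is a sketch under assumed inputs: a four-unit chain for $W$, $\rho = 0.2$, $\sigma^2 = 1$, and 20000 simulated replicates.

```python
import numpy as np

# Four units on a line: 0 - 1 - 2 - 3 (illustrative contiguity matrix).
n = 4
W = np.diag(np.ones(n - 1), 1)
W = W + W.T

# Admissible range for rho from the ordered eigenvalues of W.
lam = np.linalg.eigvalsh(W)
lower, upper = 1.0 / lam[0], 1.0 / lam[-1]

rho, sigma2 = 0.2, 1.0                 # assumed parameter values
A = np.eye(n) - rho * W                # I - B with B = rho * W
A_inv = np.linalg.inv(A)

# Theoretical SAR covariance: sigma^2 (I - B)^{-1} ((I - B)^{-1})^T.
Sigma_sar = sigma2 * A_inv @ A_inv.T

# Simulation: draw independent errors and solve (I - B) Y = eps for each replicate.
rng = np.random.default_rng(0)
eps = rng.normal(scale=np.sqrt(sigma2), size=(n, 20000))
Y = np.linalg.solve(A, eps)
Sigma_emp = np.cov(Y)                  # rows are units, columns are replicates
```

The empirical covariance `Sigma_emp` should be close to `Sigma_sar`, up to Monte Carlo error.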

A SAR model is usually used in a regression context, i.e. the residuals $U = Y - X\beta$ are assumed to follow a SAR model, rather than $Y$ itself. Then, $Y = BY + (I - B) X\beta + \epsilon$. SAR models are well suited to maximum likelihood estimation but not at all to MCMC fitting of Bayesian models, because it is difficult to introduce SAR random effects (in the CAR framework this is easy because of the hierarchical conditional representation).

CAR versus SAR
The CAR and SAR models are equivalent only if
$$(I - B)^{-1} D = (I - \tilde{B})^{-1} \tilde{D} ((I - \tilde{B})^{-1})^T$$
where the tilde indicates the SAR matrices. Any SAR model can be represented as a CAR model (because $D$ is diagonal), but the converse is not true.
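One way to see that a SAR model has a CAR representation is to read the CAR quantities off the SAR precision matrix $Q = \Sigma^{-1}$: take $D_{ii} = 1/Q_{ii}$ and $B = I - DQ$, so that $(I - B)^{-1} D = Q^{-1}$. The sketch below checks this numerically. It is an illustration under assumed inputs (a four-unit chain, $\tilde{B} = \rho W$ with $\rho = 0.3$, $\tilde{D} = I$), and the variable names are hypothetical.

```python
import numpy as np

# Four units on a line: 0 - 1 - 2 - 3 (illustrative contiguity matrix).
n = 4
W = np.diag(np.ones(n - 1), 1)
W = W + W.T

# SAR model with B~ = rho * W and D~ = I (assumed values).
rho_sar = 0.3
A = np.eye(n) - rho_sar * W
A_inv = np.linalg.inv(A)
Sigma_sar = A_inv @ A_inv.T

# CAR representation read off the precision matrix Q = Sigma^{-1}.
Q = np.linalg.inv(Sigma_sar)
D_car = np.diag(1.0 / np.diag(Q))      # conditional variances
B_car = np.eye(n) - D_car @ Q          # conditional regression weights, zero diagonal

# The CAR covariance (I - B)^{-1} D reproduces the SAR covariance.
Sigma_car = np.linalg.inv(np.eye(n) - B_car) @ D_car
```

In this example `B_car` has nonzero weights between second-order neighbors (for instance units 0 and 2), so the equivalent CAR model uses a wider neighborhood than the SAR contiguity matrix.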

Also, correlations among pairs can switch in nonintuitive ways as the $\rho$ parameter varies. For example, working with the adjacency relationships generated by the lower 48 contiguous US states, Wall (2004) finds that when $\rho = .49$ in a proper CAR model, corr(Alabama, Florida) = .2 and corr(Alabama, Georgia) = .16. But when $\rho = .975$, we instead get corr(Alabama, Florida) = .65 and corr(Alabama, Georgia) = .67.

STAR models
SAR models have been extended to handle spatiotemporal data. The measurements $Y_{it}$ are spatially associated at each fixed $t$, but we might also want to associate, say, $Y_{i2}$ with $Y_{i1}$ and $Y_{i3}$. Define $W_s$ to be a spatial contiguity matrix for the $Y$'s, and let $W_T$ define a temporal contiguity matrix for the $Y$'s. We can define in our SAR model $B = \rho_s W_s + \rho_t W_T$. We can also introduce $W_s \otimes W_T$ to incorporate interaction between space and time. These models are referred to as spatiotemporal autoregressive (STAR) models.
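The slides do not say how $W_s$ and $W_T$ are laid out for the stacked vector of all $Y_{it}$. One possible construction, offered only as an assumption, stacks the observations time by time and builds both matrices with Kronecker products. Everything in this sketch is hypothetical: the three-unit and three-period chains, the stacking order, the helper `path_adjacency`, and the parameter values.

```python
import numpy as np

def path_adjacency(n):
    """Binary adjacency matrix for n items arranged on a line."""
    W = np.diag(np.ones(n - 1), 1)
    return W + W.T

n_units, n_times = 3, 3
W_space = path_adjacency(n_units)      # neighbors among areal units
W_time = path_adjacency(n_times)       # adjacent time points

# Assumed stacking: all units at time 1, then all units at time 2, and so on.
W_s = np.kron(np.eye(n_times), W_space)    # spatial neighbors within the same time
W_T = np.kron(W_time, np.eye(n_units))     # same unit at adjacent times
W_st = np.kron(W_time, W_space)            # spatial neighbors at adjacent times

rho_s, rho_t, rho_st = 0.2, 0.2, 0.0       # assumed values; rho_st switches interaction on
B = rho_s * W_s + rho_t * W_T + rho_st * W_st

# SAR-type covariance with unit error variance: (I - B)^{-1} ((I - B)^{-1})^T.
A_inv = np.linalg.inv(np.eye(n_units * n_times) - B)
Sigma_star = A_inv @ A_inv.T
```

The parameter values must keep $I - B$ nonsingular; the small values chosen here do so for this layout.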