Areal data models. Spatial smoothers. Brook's Lemma and Gibbs distribution. CAR models: Gaussian case, non-Gaussian case

Stephanie Cook
5 years ago

1 Areal data models
- Spatial smoothers
- Brook's Lemma and Gibbs distribution
- CAR models: Gaussian case, non-Gaussian case
- SAR models: Gaussian case, non-Gaussian case
- CAR vs. SAR
- STAR models

2 Inference for areal data
Note: This chapter is very important for hierarchical spatial modeling of any type of data using MCMC methods.
For areal units the inferential issues are:
- Is there a spatial pattern? How strong is it? A spatial pattern suggests that observations close to each other have more similar values than those far from each other.
- Do we want to smooth the data? How much?
- If we change the areal units to new units (for example, from zip codes to counties), what can we say about the values we expect for the new units given those for the old ones? This is the modifiable areal unit problem (MAUP).

3 Exploratory tools
Proximity matrix. $W$ is the proximity matrix. Its entries $w_{ij}$ connect the values of the process $Y_1, \dots, Y_n$ in some fashion. Generally $w_{ii}$ is set to zero.
Examples:
- $w_{ij} = 1$ if $i$ and $j$ share a common boundary (symmetric).
- $w_{ij}$ could be based on the distance between the centroids of regions $i$ and $j$ (symmetric).
- $w_{ij} = 1$ if $j$ is one of the $K$ nearest neighbors of $i$ (generally not symmetric).
$W$ does not need to be symmetric. The $w_{ij}$ may be standardized by the row sums $w_{i+} = \sum_j w_{ij}$, that is, replaced by $w_{ij}/w_{i+}$.
We can also define distance intervals $(0, d_1]$, $(d_1, d_2]$, and so on. Then, we call:

4 First-order neighbors of unit $i$: all units within distance $d_1$ of $i$. Second-order neighbors: all units within distance $d_2$ of $i$ but separated from it by more than $d_1$.
Analogous to $W$, we can define $W^{(1)}$ as the proximity matrix for first-order neighbors: $w^{(1)}_{ij} = 1$ if $i$ and $j$ are first-order neighbors, and 0 otherwise. $W^{(2)}$, $W^{(3)}$, and so on are defined in the same way.
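As an illustration of the definitions above, the following minimal sketch builds a binary adjacency matrix and its row-standardized version. The four-unit layout (units in a row) and the function names are hypothetical, chosen only for this example.

```python
# Hypothetical layout: four areal units in a row, 0-1-2-3,
# where consecutive units share a boundary.

def adjacency_from_pairs(n, pairs):
    """Binary proximity matrix: w_ij = 1 if i and j share a boundary."""
    W = [[0] * n for _ in range(n)]
    for i, j in pairs:
        W[i][j] = 1
        W[j][i] = 1
    return W  # the diagonal stays 0, i.e. w_ii = 0


def row_standardize(W):
    """Replace w_ij by w_ij / w_{i+}; rows without neighbors stay all zero."""
    out = []
    for row in W:
        w_plus = sum(row)
        out.append([w / w_plus if w_plus else 0.0 for w in row])
    return out


def is_symmetric(M):
    n = len(M)
    return all(M[i][j] == M[j][i] for i in range(n) for j in range(n))


W = adjacency_from_pairs(4, [(0, 1), (1, 2), (2, 3)])
W_std = row_standardize(W)

print(is_symmetric(W))      # the binary adjacency matrix is symmetric
print(is_symmetric(W_std))  # the row-standardized matrix is not, in this layout
```

Row standardization makes every non-empty row sum to one, which is why symmetry is lost whenever units have different numbers of neighbors.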

5 Measures of spatial association
The standard statistics are Moran's I and Geary's C. They are analogues for areal data of the empirical correlation function and the variogram.
Moran's I:
$$I = \frac{n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$$
$I$ is not strictly supported on $[-1, 1]$. Under the hypothesis of independence, $I$ is asymptotically normal with mean $-1/(n-1)$.

6 Geary's C:
$$C = \frac{(n-1) \sum_i \sum_j w_{ij} (Y_i - Y_j)^2}{2 \left(\sum_{i \neq j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2}$$
$C$ is never negative, and has mean 1 under the null model. Low values (between 0 and 1) indicate positive spatial association. Under the null hypothesis we have asymptotic normality. For testing, however, it is preferable to use a Monte Carlo approach: permute the values of the $Y_i$ across the units and recompute the statistic for each permutation.
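The two statistics and the permutation test can be sketched as follows. This is a minimal illustration, not a reference implementation: the data, the four-unit adjacency matrix, and the function names are assumptions made for the example, and the p-value is one-sided (large values of the statistic count as extreme).

```python
import random


def morans_i(y, W):
    """Moran's I with weights W (w_ii is ignored)."""
    n = len(y)
    ybar = sum(y) / n
    d = [v - ybar for v in y]
    num = sum(W[i][j] * d[i] * d[j] for i in range(n) for j in range(n) if i != j)
    w_sum = sum(W[i][j] for i in range(n) for j in range(n) if i != j)
    return n * num / (w_sum * sum(v * v for v in d))


def gearys_c(y, W):
    """Geary's C with weights W (w_ii is ignored)."""
    n = len(y)
    ybar = sum(y) / n
    num = sum(W[i][j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(n) if i != j)
    w_sum = sum(W[i][j] for i in range(n) for j in range(n) if i != j)
    ss = sum((v - ybar) ** 2 for v in y)
    return (n - 1) * num / (2 * w_sum * ss)


def permutation_p_value(stat, y, W, n_perm=999, seed=0):
    """Monte Carlo p-value: share of permutations with stat >= observed."""
    rng = random.Random(seed)
    observed = stat(y, W)
    values = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if stat(values, W) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical data: four units in a row with a linear trend.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1, 2, 3, 4]

print(morans_i(y, W))   # positive: neighbors have similar values
print(gearys_c(y, W))   # below 1: positive spatial association
p_value = permutation_p_value(morans_i, y, W)
print(p_value)
```

For Geary's C, positive association corresponds to small values, so a test for it would count permutations with a statistic at or below the observed one instead.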

7 The correlogram is a useful tool to study spatial association with areal data. Working with $I$, we can replace $w_{ij}$ with the previously defined $w^{(1)}_{ij}$ and obtain, say, $I^{(1)}$. Then we replace it with $w^{(2)}_{ij}$ and obtain $I^{(2)}$, and so on. A plot of $I^{(r)}$ versus $r$ is called a correlogram. If there is spatial pattern, we expect $I^{(r)}$ to decline in $r$ initially and then vary about 0.
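A correlogram can be sketched by recomputing Moran's I with a separate proximity matrix for each distance band. The example below is a minimal illustration under assumed inputs: six hypothetical units on a line with unit spacing, a linear trend in the data, and distance cutoffs $d_1 = 1$, $d_2 = 2$, $d_3 = 3$.

```python
def morans_i(y, W):
    """Moran's I with weights W (w_ii is ignored)."""
    n = len(y)
    ybar = sum(y) / n
    d = [v - ybar for v in y]
    num = sum(W[i][j] * d[i] * d[j] for i in range(n) for j in range(n) if i != j)
    w_sum = sum(W[i][j] for i in range(n) for j in range(n) if i != j)
    return n * num / (w_sum * sum(v * v for v in d))


def band_matrix(coords, lower, upper):
    """W^(r): w_ij = 1 if lower < |coords[i] - coords[j]| <= upper."""
    n = len(coords)
    return [[1 if i != j and lower < abs(coords[i] - coords[j]) <= upper else 0
             for j in range(n)]
            for i in range(n)]


def correlogram(y, coords, cutoffs):
    """Return [I^(1), I^(2), ...] for the bands (0, d1], (d1, d2], ...

    Assumes every band contains at least one pair of units;
    an empty band would cause a division by zero in morans_i.
    """
    values = []
    lower = 0.0
    for upper in cutoffs:
        values.append(morans_i(y, band_matrix(coords, lower, upper)))
        lower = upper
    return values


coords = [0, 1, 2, 3, 4, 5]   # hypothetical centroid positions on a line
y = [1, 2, 3, 4, 5, 6]        # hypothetical data with a linear trend
I_r = correlogram(y, coords, [1, 2, 3])
print(I_r)
```

With this trended toy data the values fall steadily with $r$. Real data with short-range dependence would be expected to level off around zero instead.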

8 Spatial smoothers
$W$ provides a spatial smoother. We can replace $Y_i$ by
$$\hat{Y}_i = \sum_j w_{ij} Y_j / w_{i+}.$$
This ensures that the value for areal unit $i$ looks more like its neighbors. Alternatively, to take into account the actual value of $Y_i$, we can consider
$$\hat{Y}_i^{*} = (1 - \alpha) Y_i + \alpha \hat{Y}_i, \quad \alpha \in (0, 1).$$
This can be viewed as a filter. We will revisit this topic in the hierarchical modeling chapter.
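Both smoothers are easy to sketch in code. In the minimal example below, the data and adjacency matrix are hypothetical, and leaving a unit without neighbors unchanged is an assumption of this sketch, not something the slides specify.

```python
def neighbor_average(y, W):
    """Y_hat_i = sum_j w_ij * Y_j / w_{i+}."""
    out = []
    for i, row in enumerate(W):
        w_plus = sum(row)
        if w_plus:
            out.append(sum(w * v for w, v in zip(row, y)) / w_plus)
        else:
            out.append(y[i])  # assumption: isolated units keep their own value
    return out


def smooth(y, W, alpha):
    """Y_hat_star_i = (1 - alpha) * Y_i + alpha * Y_hat_i, with alpha in (0, 1)."""
    y_hat = neighbor_average(y, W)
    return [(1 - alpha) * yi + alpha * yh for yi, yh in zip(y, y_hat)]


# Hypothetical data: four units in a row with alternating values.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [0, 10, 0, 10]

print(neighbor_average(y, W))
print(smooth(y, W, 0.5))
```

Larger values of `alpha` pull each unit further toward its neighbor average, so `alpha` controls how much smoothing is applied.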

9 Brook's Lemma
Given the joint distribution $p(y_1, \dots, y_n)$, the full conditional distributions $p(y_i \mid y_j, j \neq i)$, $i = 1, \dots, n$, are uniquely determined. Brook's lemma proves the converse, and it enables us to retrieve the unique joint distribution determined by the conditionals.
However, we cannot write down an arbitrary set of conditionals and assert that they determine a joint distribution. Example:
$$Y_1 \mid Y_2 \sim N(\alpha_0 + \alpha_1 Y_2, \sigma_1^2), \qquad Y_2 \mid Y_1 \sim N(\beta_0 + \beta_1 Y_1^3, \sigma_2^2).$$
The first conditional gives $E[Y_1] = \alpha_0 + \alpha_1 E[Y_2]$, so $E[Y_1]$ is linear in $E[Y_2]$, and hence $E[Y_2]$ is linear in $E[Y_1]$. However, the second conditional also requires $E[Y_2] = \beta_0 + \beta_1 E[Y_1^3]$.

10 These two requirements cannot both hold in general. Therefore, there is no joint distribution with these conditionals.
Also, $p(y_1, \dots, y_n)$ might be improper even if the conditionals are proper. Example: $p(y_1, y_2) \propto \exp\{-\tfrac{1}{2}(y_1 - y_2)^2\}$. Here $p(y_1 \mid y_2)$ is $N(y_2, 1)$ and $p(y_2 \mid y_1)$ is $N(y_1, 1)$, but $p(y_1, y_2)$ is improper.

11 Brook's Lemma
$$p(y_1, \dots, y_n) = \frac{p(y_1 \mid y_2, \dots, y_n)}{p(y_{10} \mid y_2, \dots, y_n)} \cdot \frac{p(y_2 \mid y_{10}, y_3, \dots, y_n)}{p(y_{20} \mid y_{10}, y_3, \dots, y_n)} \cdots \frac{p(y_n \mid y_{10}, \dots, y_{n-1,0})}{p(y_{n0} \mid y_{10}, \dots, y_{n-1,0})} \; p(y_{10}, \dots, y_{n0})$$
Here $y_0 = (y_{10}, \dots, y_{n0})$ is any fixed point in the support of $p$. The joint distribution is determined up to a proportionality constant by the conditionals.
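Brook's lemma can be checked numerically in the simplest case, $n = 2$ with binary variables. The sketch below starts from a hypothetical joint table, computes the two full conditionals, and then recovers the joint from the conditionals alone, using $y_0 = (0, 0)$ as the fixed point. The table values and function names are illustrative assumptions.

```python
# Hypothetical joint distribution p(y1, y2) on {0, 1} x {0, 1}.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}


def cond_1_given_2(y1, y2):
    """Full conditional p(y1 | y2)."""
    return joint[(y1, y2)] / sum(joint[(a, y2)] for a in (0, 1))


def cond_2_given_1(y2, y1):
    """Full conditional p(y2 | y1)."""
    return joint[(y1, y2)] / sum(joint[(y1, b)] for b in (0, 1))


def brook_ratio(y1, y2, y10=0, y20=0):
    """p(y1, y2) / p(y10, y20), computed from the full conditionals only."""
    first = cond_1_given_2(y1, y2) / cond_1_given_2(y10, y2)
    second = cond_2_given_1(y2, y10) / cond_2_given_1(y20, y10)
    return first * second


# The ratios fix the joint up to a constant; normalizing recovers it.
ratios = {point: brook_ratio(*point) for point in joint}
total = sum(ratios.values())
recovered = {point: r / total for point, r in ratios.items()}

for point in sorted(joint):
    print(point, joint[point], recovered[point])
```

The normalizing step is where an improper joint would show up: if the ratios did not have a finite sum, no proper joint distribution would exist.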

12 Definitions
Markov Random Field (MRF): We specify a set of full conditional distributions for the $Y_i$ such that
$$p(y_i \mid y_j, j \neq i) = p(y_i \mid y_j, j \in \delta_i),$$
where $\delta_i$ is the set of neighbors of $i$. The notion of using local specification to determine a joint distribution is referred to as an MRF.
Clique: A clique is a set of cells such that each element is a neighbor of every other element. We use the notation $i \sim j$ if $i$ is a neighbor of $j$ and $j$ is a neighbor of $i$.
Potential: A potential of order $k$ is a function of $k$ arguments that is exchangeable in these arguments. The arguments of the potential are the values taken by the variables associated with the cells of a clique of size $k$. Example: for $k = 2$, we have $Y_i Y_j$ if $i$ and $j$ form a clique of size 2. This is a potential of order 2.

13 Gibbs distribution: $p(y_1, \dots, y_n)$ is a Gibbs distribution if it is a function of the $Y_i$ only through potentials on cliques:
$$p(y_1, \dots, y_n) \propto \exp\left\{ \gamma \sum_k \sum_{\alpha \in M_k} \phi^{(k)}(y_{\alpha_1}, \dots, y_{\alpha_k}) \right\}$$
Here $\phi^{(k)}$ is a potential of order $k$, $M_k$ is the collection of all subsets of size $k$, $\alpha$ indexes this set, and $\gamma > 0$ is a scale parameter.
Hammersley-Clifford Theorem: If we have an MRF, i.e. if the conditionals define a unique joint distribution, then this joint distribution is a Gibbs distribution.

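14 For continuous data on $\mathbb{R}$, a common choice for the joint distribution is
$$p(y_1, \dots, y_n) \propto \exp\left\{ -\frac{1}{2\tau^2} \sum_{i<j} (y_i - y_j)^2 \, I(i \sim j) \right\},$$
where the sum counts each pair of neighbors once. We will study this type of distribution next. It is a Gibbs distribution on potentials of order 1 and 2, and its full conditionals are
$$p(y_i \mid y_j, j \neq i) = N\left( \sum_{j \in \delta_i} y_j / m_i, \; \tau^2 / m_i \right),$$
where $m_i$ is the number of neighbors of $i$.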

15 CAR models
Conditionally autoregressive (CAR) models are widely used in MCMC methods for fitting certain classes of hierarchical spatial models, as we will see later.
The Gaussian (autonormal) case:
$$Y_i \mid y_j, j \neq i \sim N\left( \sum_j b_{ij} y_j, \; \tau_i^2 \right), \quad i = 1, \dots, n.$$
These full conditionals are compatible, and through Brook's lemma we obtain
$$p(y_1, \dots, y_n) \propto \exp\left\{ -\tfrac{1}{2} y^T D^{-1} (I - B) y \right\},$$
where $B = \{b_{ij}\}$ and $D$ is diagonal with $D_{ii} = \tau_i^2$. For $Y$ to be normal we first need $\Sigma_Y = (I - B)^{-1} D$ to be symmetric. The simple resulting conditions are

16 $$\frac{b_{ij}}{\tau_i^2} = \frac{b_{ji}}{\tau_j^2} \quad \text{for all } i, j.$$
Thus, $B$ need not be symmetric in this setting. Suppose we set $b_{ij} = w_{ij}/w_{i+}$ and $\tau_i^2 = \tau^2 / w_{i+}$, with $W$ symmetric. Then the condition is satisfied, and we have
$$p(y_1, \dots, y_n) \propto \exp\left\{ -\frac{1}{2\tau^2} y^T (D_w - W) y \right\}, \quad (1)$$
where $D_w$ is diagonal with $(D_w)_{ii} = w_{i+}$.
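The quadratic form in (1) can be expanded into pairwise differences. The short derivation below is a sketch that assumes $W$ is symmetric with $w_{ii} = 0$, as in the slides:

```latex
\begin{aligned}
y^{T}(D_w - W)\,y
  &= \sum_i w_{i+}\, y_i^2 - \sum_i \sum_j w_{ij}\, y_i y_j \\
  &= \tfrac{1}{2} \sum_i \sum_j w_{ij}\,(y_i^2 + y_j^2) - \sum_i \sum_j w_{ij}\, y_i y_j \\
  &= \tfrac{1}{2} \sum_{i \neq j} w_{ij}\,(y_i - y_j)^2
   = \sum_{i<j} w_{ij}\,(y_i - y_j)^2 .
\end{aligned}
```

The second line uses $w_{i+} = \sum_j w_{ij}$ together with the symmetry of $W$. The final form depends on $y$ only through differences, which is why adding the same constant to every $y_i$ leaves the density unchanged.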

17 A second problem is that $(D_w - W)\mathbf{1} = 0$, so $\Sigma_y^{-1}$ is singular and $\Sigma_y$ does not exist. Thus, this distribution is improper. We can rewrite (1) as
$$p(y_1, \dots, y_n) \propto \exp\left\{ -\frac{1}{2\tau^2} \sum_{i<j} w_{ij} (y_i - y_j)^2 \right\}.$$
The impropriety of $p$ is clear, since we can add any constant to all the $Y_i$ and the density is unaffected: the $Y_i$ are not centered. A constraint such as $\sum_i Y_i = 0$ would solve the problem. This is the intrinsically autoregressive (IAR) model: a joint distribution that is improper but has all full conditionals proper. The impropriety can be remedied in an obvious way.

18 Redefine $\Sigma_y^{-1} = D_w - \rho W$ and choose $\rho$ to make $\Sigma_y^{-1}$ nonsingular. This is guaranteed if $\rho \in (1/\lambda_1, 1/\lambda_n)$, where $\lambda_1 < \dots < \lambda_n$ are the ordered eigenvalues of $D_w^{-1/2} W D_w^{-1/2}$. The bounds can be simplified by replacing $W$ by $\tilde{W} = \mathrm{Diag}(1/w_{i+}) W$. Then $\Sigma_y^{-1} = D_w (I - \alpha \tilde{W})$, where $D_w$ is diagonal. If $|\alpha| < 1$, then $I - \alpha \tilde{W}$ is nonsingular.
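The difference between the improper IAR precision matrix and a proper CAR precision matrix can be seen on a small example. The sketch below builds $D_w - \rho W$ for a hypothetical four-unit adjacency matrix and uses an attempted Cholesky factorization as a positive-definiteness check. The function names and the tolerance are assumptions made for this illustration.

```python
import math


def car_precision(W, rho):
    """Return D_w - rho * W, where D_w is diagonal with entries w_{i+}."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - rho * W[i][j] for j in range(n)]
            for i in range(n)]


def is_positive_definite(Q, tol=1e-12):
    """Attempt a Cholesky factorization; it succeeds only if Q is positive definite."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # zero or negative pivot: not positive definite
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Hypothetical layout: four units in a row.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

Q_iar = car_precision(W, 1.0)   # rho = 1: the IAR case
Q_car = car_precision(W, 0.9)   # rho inside the admissible interval

print([sum(row) for row in Q_iar])   # rows sum to zero, so Q_iar is singular
print(is_positive_definite(Q_iar))
print(is_positive_definite(Q_car))
```

With $\rho = 1$ the vector of ones lies in the null space, so the factorization fails. With $\rho = 0.9$ the matrix is strictly diagonally dominant and the factorization goes through.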

19 Interpretation of the $\rho$ parameter: when the additional parameter $\rho$ is zero, the $Y_i$ become independent. However, $\rho$ should not be read directly as a measure of the strength of spatial dependence. For instance, in a simulation study $\rho = 0.8$ gives $I = 0.1$, and $\rho = 0.9$ gives $I = 0.5$. Moreover, the improper choice ($\rho = 1$) may enable wider scope for posterior spatial patterns, and might be preferable.

20 We may write the CAR model as $Y = BY + \epsilon$, or $(I - B)Y = \epsilon$. If $p(y)$ is proper, then $Y \sim N(0, (I - B)^{-1} D)$ and therefore $\epsilon \sim N(0, D(I - B)^T)$, i.e. the components of $\epsilon$ are not independent. Also $\mathrm{cov}(\epsilon, Y) = D$.

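21 The non-Gaussian CAR
In many cases (e.g. binary data) the normality assumption might not be appropriate. We can start with any exponential family model:
$$p(y_i \mid y_j, j \neq i) \propto \exp\{ \psi (\theta_i y_i - \chi(\theta_i)) \},$$
where $\theta_i$ is the canonical parameter, e.g. $\theta_i = \sum_{j \neq i} b_{ij} y_j$, $\chi$ is some specific function, and $\psi$ is a non-negative dispersion parameter. If we write $\theta_i = x_i^T \beta + \sum_{j \neq i} b_{ij} y_j$ for some covariates $x_i$, then the conditional becomes
$$p(y_i \mid y_j, j \neq i) \propto \exp\left\{ \psi \left[ \left( x_i^T \beta + \sum_{j \neq i} b_{ij} y_j \right) y_i - \chi(\theta_i) \right] \right\}.$$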

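22 Autologistic model
When the $Y_i$ are binary, the previous model gives us the autologistic model, with conditional log odds
$$\log \frac{P(Y_i = 1 \mid y_j, j \neq i)}{P(Y_i = 0 \mid y_j, j \neq i)} = x_i^T \gamma + \psi \sum_{j \neq i} w_{ij} y_j,$$
where $w_{ij} = 1$ if $i \sim j$, and zero otherwise. The joint distribution (by Brook's lemma) is
$$p(y_1, \dots, y_n) \propto \exp\left\{ \gamma^T \left( \sum_i y_i x_i \right) + \psi \sum_{i<j} w_{ij} y_i y_j \right\},$$
where the second sum counts each pair of neighbors once.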
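Because the autologistic model is specified through its full conditionals, it lends itself to Gibbs sampling. The following is a minimal sketch of one sweep, assuming the linear predictor $x_i^T \gamma$ has already been computed and stored in a list called `x_gamma`. That name, the toy adjacency matrix, and the parameter values are all hypothetical.

```python
import math
import random


def conditional_prob_one(i, y, W, x_gamma, psi):
    """P(Y_i = 1 | y_j, j != i): inverse logit of x_i' gamma + psi * sum_j w_ij y_j."""
    eta = x_gamma[i] + psi * sum(W[i][j] * y[j] for j in range(len(y)) if j != i)
    return 1.0 / (1.0 + math.exp(-eta))


def gibbs_sweep(y, W, x_gamma, psi, rng):
    """Update every Y_i in turn from its full conditional (modifies y in place)."""
    for i in range(len(y)):
        p_one = conditional_prob_one(i, y, W, x_gamma, psi)
        y[i] = 1 if rng.random() < p_one else 0
    return y


# Hypothetical setup: four units in a row, no covariate effect.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x_gamma = [0.0, 0.0, 0.0, 0.0]

# With psi = 0 the neighbors carry no information.
p_independent = conditional_prob_one(0, [0, 1, 0, 0], W, x_gamma, psi=0.0)
# With psi > 0 a neighbor equal to 1 raises the probability.
p_dependent = conditional_prob_one(0, [0, 1, 0, 0], W, x_gamma, psi=1.0)
print(p_independent, p_dependent)

rng = random.Random(0)
state = [0, 1, 0, 1]
for _ in range(100):
    gibbs_sweep(state, W, x_gamma, 1.0, rng)
print(state)
```

Each update conditions on the current values of the neighbors, including those already updated earlier in the same sweep.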

23 SAR models
Simultaneous autoregressive (SAR) models. Remember that we may write the CAR model as $Y = BY + \epsilon$, or $(I - B)Y = \epsilon$. Suppose that, instead of letting $Y$ induce the distribution of $\epsilon$, we let $\epsilon$ induce a distribution for $Y$. Suppose $\epsilon \sim N(0, \tilde{D})$, where $\tilde{D}$ is diagonal with $(\tilde{D})_{ii} = \sigma_i^2$. Now
$$Y_i = \sum_j b_{ij} Y_j + \epsilon_i.$$
Therefore, if $(I - B)$ is full rank,
$$Y \sim N\left( 0, \; (I - B)^{-1} \tilde{D} \left( (I - B)^{-1} \right)^T \right).$$

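24 Also $\mathrm{cov}(\epsilon, Y) = \tilde{D} \left( (I - B)^{-1} \right)^T$, which equals $\tilde{D} (I - B)^{-1}$ when $B$ is symmetric. If $\tilde{D} = \sigma^2 I$, then
$$Y \sim N\left( 0, \; \sigma^2 (I - B)^{-1} \left( (I - B)^{-1} \right)^T \right).$$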
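The SAR covariance results follow in a few lines once the model is solved for $Y$. The derivation below is a sketch that uses only the definitions on these slides, writing $\tilde{D}$ for the diagonal covariance matrix of $\epsilon$:

```latex
\begin{aligned}
Y &= (I - B)^{-1} \epsilon, \\
\operatorname{Var}(Y)
  &= (I - B)^{-1}\, \operatorname{Var}(\epsilon)\, \big( (I - B)^{-1} \big)^{T}
   = (I - B)^{-1}\, \tilde{D}\, \big( (I - B)^{-1} \big)^{T}, \\
\operatorname{cov}(\epsilon, Y)
  &= E\big[ \epsilon\, \epsilon^{T} \big] \big( (I - B)^{-1} \big)^{T}
   = \tilde{D}\, \big( (I - B)^{-1} \big)^{T} .
\end{aligned}
```

When $B$ is symmetric the transpose can be dropped, giving $\mathrm{cov}(\epsilon, Y) = \tilde{D}(I - B)^{-1}$. Unlike the CAR case, the errors $\epsilon$ are independent here, but they are correlated with $Y$.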

25 Common choices for $B$:
- $B = \rho W$, where $W$ is called the contiguity matrix and has entries 1 or 0 according to whether or not $i$ and $j$ are neighbors (with $w_{ii} = 0$). Here $\rho$ is called a spatial autoregression parameter. To make $I - \rho W$ nonsingular, we need to impose $\rho \in (1/\lambda_1, 1/\lambda_n)$, where $\lambda_1 < \dots < \lambda_n$ are the ordered eigenvalues of $W$.
- Alternatively, $W$ can be replaced by $\tilde{W}$, so that $B = \rho \tilde{W}$ with $|\rho| < 1$.
With point-referenced data, $B$ is taken to be $\rho W$, where $W$ is the matrix of inter-point distances.

26 A SAR model is usually used in a regression context, i.e. the residuals $U = Y - X\beta$ are assumed to follow a SAR model, rather than $Y$ itself. Substituting $U = Y - X\beta$ into $U = BU + \epsilon$ gives
$$Y = BY + (I - B) X \beta + \epsilon.$$
SAR models are well suited to maximum likelihood estimation but not at all to MCMC fitting of Bayesian models, because it is difficult to introduce SAR random effects. In the CAR framework this is easy because of the hierarchical conditional representation.

27 CAR versus SAR
The CAR and SAR models are equivalent only if
$$(I - B)^{-1} D = (I - \tilde{B})^{-1} \tilde{D} \left( (I - \tilde{B})^{-1} \right)^T,$$
where the tilde indicates the SAR matrices. Any SAR model can be represented as a CAR model (because $D$ is diagonal), but the converse is not true.

28 Also, correlations among pairs can switch in nonintuitive ways as the $\rho$ parameter varies. Example: working with the adjacency relationships generated by the lower 48 contiguous US states, Wall (2004) finds that when $\rho = 0.49$ in a proper CAR model, corr(Alabama, Florida) = 0.20 and corr(Alabama, Georgia) = 0.16. But when $\rho = 0.975$, we instead get corr(Alabama, Florida) = 0.65 and corr(Alabama, Georgia) = 0.67, so the ordering of the two correlations reverses.

29 STAR models
SAR models have been extended to handle spatiotemporal data. The measurements $Y_{it}$ are spatially associated at each fixed $t$, but we might also want to associate, say, $Y_{i2}$ with $Y_{i1}$ and $Y_{i3}$. Define $W_S$ to be a spatial contiguity matrix for the $Y$'s, and let $W_T$ be a temporal contiguity matrix for the $Y$'s. In our SAR model we can then define $B = \rho_s W_S + \rho_t W_T$. We can also introduce $W_S W_T$ to incorporate interaction between space and time. These models are referred to as spatiotemporal autoregressive (STAR) models.
