Spatial inference. Accounting for spatial correlation. Multivariate normal distributions


Spatial inference

© Philip M. Dixon (Iowa State Univ.), Spatial Data Analysis - Part 3b, Spring 2018

I will start with a simple model, using species diversity data.
There is strong spatial dependence: Moran's I = 0.79.
What is the mean diversity? How precise is our estimate?
Sampling discussion: the 64 squares are a systematic sample, and we only have 1 sample.
If we treat it as a simple random sample, the usual se is probably wrong.
So shift to model-based inference.
If we assume each square is independent: µ̂ = 6.09, se(µ̂) = 0.36.
But we know that model is wrong.
If we assume the observations are spatially correlated, what changes?

Spatial inference

If we recognize the spatial correlation between nearby squares: µ̂ = 5.69, se(µ̂) = 1.69.
The difference in estimates is not too big (a 7% change).
The difference in se's is huge (a 470% change).
This data set has 64 observations.
I find it very helpful to compare sample sizes for equal se's, rather than directly comparing se's.
So, under the correlated-data model, how many observations would be needed to get se = 0.36? A: about 1410! A huge number! Spatial correlation matters!
(BTW, the same holds for geostatistical models.)

Accounting for spatial correlation

How did I get estimates and se's that recognize the spatial correlation? Three common approaches:
1) Simultaneous Autoregressive (SAR) model
2) Conditional Autoregressive (CAR) model
3) geostatistical model
The geostatistical model can be used for either point or areal data.
We will consider each in turn. First we need to talk about:
describing correlated data (multivariate normal distributions),
regression using matrices,
regression with correlated data.

Multivariate normal distributions

Usual (Stat 401) setup: Y_i iid N(µ_i, σ²).
The mean µ_i for each observation may be constant, dependent on treatment, or unique to each observation (e.g., β_0 + β_1 X_i).
The variance is the same for each observation
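The se comparison above can be sketched numerically. This is a minimal illustration, not the slides' actual diversity data or fitted SAR model: it assumes an 8 x 8 grid with an invented exponential correlation function and σ² = 1, and compares the independence se of the mean with the GLS se under that correlation.

```python
import numpy as np

# Hypothetical setup: 8x8 grid with exponential spatial correlation.
# The range parameter and sigma^2 are invented for illustration.
n_side = 8
xy = np.array([(i, j) for i in range(n_side) for j in range(n_side)], float)
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)  # pairwise distances
sigma2 = 1.0
Sigma = sigma2 * np.exp(-d / 2.0)           # exponential correlation, range 2

n = len(xy)
one = np.ones(n)
se_indep = np.sqrt(sigma2 / n)              # usual se of the mean, iid model
Si1 = np.linalg.solve(Sigma, one)
se_gls = np.sqrt(1.0 / (one @ Si1))         # se of the GLS mean under Sigma

n_eff = sigma2 / se_gls**2                  # iid sample size giving the same se
print(se_indep, se_gls, n_eff)
```

Positive spatial correlation inflates the se of the mean, so the effective sample size is smaller than the number of grid cells.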
(no subscript on σ²), and the observations are independent.
Now we need to describe correlations among pairs of observations; Y_i ~ N(µ_i, σ²) is not sufficient.
Towards that goal, collect all the observations in a vector Y:
Y = (Y_1, Y_2, ..., Y_n)′ is a column vector; its transpose Y′ = [Y_1 Y_2 ... Y_n] is a row vector.

Multivariate normal distributions

Collect the means into a vector: µ′ = [µ_1 µ_2 ... µ_n].
The variance now becomes the variance-covariance (VC) matrix.
Consider 3 random variables X, Y and Z. The VC matrix is a 3 x 3 matrix:

Σ = [ σ²_X   σ_XY   σ_XZ
      σ_XY   σ²_Y   σ_YZ
      σ_XZ   σ_YZ   σ²_Z ]

With n values of Y, the VC matrix is n x n.

Variances and covariances

Diagonal values are variances: σ²_X = Var X = E(X − µ_X)².
Off-diagonal values are covariances: σ_XY = Cov(X, Y) = E(X − µ_X)(Y − µ_Y).
Cov(X, Y) = Cov(Y, X), so the VC matrix is symmetric.
The variance of X is the covariance of X with itself: E(X − µ_X)² = E(X − µ_X)(X − µ_X).
The correlation between X and Y is

Cor(X, Y) = Cov(X, Y) / sqrt(Var X · Var Y)

Cov = 0 means Cor = 0.
In general, independence implies Cov = 0. When the observations have a multivariate normal distribution, Cov = 0 also implies independence.

Multivariate normal distribution

So we can write the VC matrix for n independent, constant-variance observations:

Σ = [ σ²  0   ...  0
      0   σ²  ...  0
      ...
      0   0   ...  σ² ]  =  σ² I

I is the identity matrix, the matrix equivalent of the scalar 1.

Least squares regression, again

Model: Y_i = β_0 + β_1 X_i + ε_i.
We can write down (but won't) equations to estimate β_0 and β_1.
They do not generalize easily to more parameters, e.g. Z_i = β_0 + β_1 X_i + β_2 Y_i + ε_i.
We can use matrices to simplify everything:

X = [ 1  X_1          β = [ β_0        Xβ = [ β_0 + β_1 X_1
      1  X_2                β_1 ]             β_0 + β_1 X_2
      ...                                     ...
      1  X_n ]                                β_0 + β_1 X_n ]

Model: Y = Xβ + ε.
X can have many columns, or just 1.
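The matrix form of the regression model is easy to check numerically. A small sketch (the x values and coefficients are invented): the product Xβ reproduces β_0 + β_1 X_i row by row.

```python
import numpy as np

# Simple linear regression in matrix form, Y = X beta + eps.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])   # intercept column plus slope column
beta = np.array([2.0, 0.5])                 # beta_0, beta_1 (invented)

# X @ beta gives beta_0 + beta_1 * x_i for every row at once
print(X @ beta)
```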

Least squares regression, again

For any regression model with independent errors:

β̂_OLS = (X′X)^{-1} X′Y    (1)

( )^{-1} is the matrix inverse, the equivalent of the scalar reciprocal: A^{-1}A = I and A A^{-1} = I.
This is labeled OLS for ordinary least squares; we will see another type of LS soon (for correlated observations).
And Var β̂_OLS = σ² (X′X)^{-1}.

Least squares regression, again

Example: constant mean, Y_i = µ + ε_i.

X = [1, 1, ..., 1]′,  β = [µ]
X′X = 1 + 1 + ... + 1 = n,  so (X′X)^{-1} = 1/n
X′Y = Y_1 + Y_2 + ... + Y_n = ΣY

so β̂_OLS = (X′X)^{-1} X′Y = ΣY / n, and Var µ̂ = σ²/n.

Generalized Least Squares

When errors are independent, each observation is an additional piece of information.
Two positively correlated observations are less than 2 pieces of information.

[Figure: y vs. x, illustrating two positively correlated observations]

Generalized Least Squares

Model: Y = Xβ + ε, Var ε = Σ.
Estimate:

β̂_GLS = (X′ Σ^{-1} X)^{-1} X′ Σ^{-1} Y

What happens when Σ = σ²I?

β̂_GLS = (X′ Σ^{-1} X)^{-1} X′ Σ^{-1} Y
       = (X′ (σ²I)^{-1} X)^{-1} X′ (σ²I)^{-1} Y
       = ((1/σ²) X′X)^{-1} (1/σ²) X′Y
       = σ² (1/σ²) (X′X)^{-1} X′Y
       = β̂_OLS

The GLS estimator simplifies to the usual OLS estimator when observations are independent with constant variance.
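Both formulas can be verified with a quick numerical check on simulated data (all numbers invented): the GLS estimate with Σ = σ²I reproduces OLS, and with X a column of ones the OLS estimate is the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
X = np.column_stack([np.ones(20), x])
y = 1.0 + 2.0 * x + rng.normal(size=20)

# OLS: beta = (X'X)^{-1} X'Y, via a linear solve rather than an explicit inverse
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS with Sigma = sigma^2 I should give the identical answer
sigma2 = 1.7                                   # arbitrary
Si = np.linalg.inv(sigma2 * np.eye(20))
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Constant-mean model: X = column of ones, so beta_hat = mean(y)
ones = np.ones((20, 1))
mu_hat = np.linalg.solve(ones.T @ ones, ones.T @ y)
print(beta_ols, beta_gls, mu_hat, y.mean())
```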

Generalized Least Squares

Another way to think about GLS.
For most Σ we can find a square-root matrix, so that Σ = C′C.
Apply this idea to Σ^{-1} to get a square-root matrix for the inverse; call it B, so Σ^{-1} = B′B.
Multiply all terms in the model by B: BY = BXβ + Bε.
Look what happens to the errors:

Var(Bε) = B Σ B′ = (BC′)(CB′) = I · I = I

Pre-multiplying by a square root of the inverse covariance matrix removes the correlation: the transformed errors are now independent!

Generalized Least Squares

So, if we know the correlation matrix, we can compute new Y* = BY and new X* = BX and use OLS on X* and Y*:
Y* = X*β + ε*.
The regression coefficients from OLS on the transformed variables are

β̂ = (X*′X*)^{-1} (X*′Y*) = [(X′B′)(BX)]^{-1} [(X′B′)(BY)] = (X′ Σ^{-1} X)^{-1} (X′ Σ^{-1} Y)

which are the GLS estimates for the original model.

Generalized Least Squares

Take home: if you know Σ, you can compute the best estimates for any regression model using GLS.
A generalization (the Aitken model) says that all you need is the correlation part of Σ:
write Σ as σ²C where C is (now) a correlation matrix; you only need to know C and can estimate σ².
The practical problem is that the correlation part almost always has to be estimated: Σ (or C) depends on unknown parameters.

SAR models

General regression / ANOVA model: Y = Xβ + ε; if X = [1, 1, ..., 1]′, then Y = µ1 + ε.
Model spatial correlation by allowing ε to depend on the error values in neighboring regions:

ε(s_i) = Σ_{j=1}^{N} b_ij ε(s_j) + ν(s_i)

The b_ij are the elements of the spatial dependence matrix expressing dependence among regions.
ν(s_i) is an independent random disturbance for each region; usually assume ν(s_i) iid N(0, σ²_ν).
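The square-root idea can be checked numerically. A sketch with an arbitrary positive-definite Σ (all values invented): take B as the inverse of the Cholesky factor, verify Var(Bε) = BΣB′ = I, and confirm that OLS on the whitened variables matches the direct GLS formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 15
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)       # arbitrary symmetric positive-definite Sigma

L = np.linalg.cholesky(Sigma)         # Sigma = L L'
B = np.linalg.inv(L)                  # then B Sigma B' = I and B'B = Sigma^{-1}

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)

# OLS on the whitened variables Y* = BY, X* = BX
Xs, ys = B @ X, B @ y
beta_white = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

# direct GLS formula for comparison
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
print(beta_white, beta_gls)
```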

SAR models

What this model means; some examples.
In all of them, focus on the center observation (location s_5) of a 3 x 3 grid.
Assume row-standardized rook's neighbors, so the weights {b_ij} around the center are

0     0.25  0
0.25  0     0.25
0     0.25  0

ε(s_5) = 10. Is this value large or small?
When b_ij is not zero, that depends on the neighbors, so look at ν(s_5) for ε(s_5) = 10 and different values of the neighboring ε. Only the rook neighbors enter the sum.

Example 1: Σ b_ij ε(s_j) = 9.5, so ν(s_i) = 0.5.
A large error, but similar to the neighbors, so the independent contribution ν(s_i) is small.

Example 2: Σ b_ij ε(s_j) = 14.5, so ν(s_i) = −4.5.
A large error, but the neighbors are larger, so the independent contribution ν(s_i) is negative.

Example 3: Σ b_ij ε(s_j) = 1.75, so ν(s_i) = 8.25.
Much larger than the neighbor errors, so the independent contribution ν(s_i) is large.

Example 4: Σ b_ij ε(s_j) = −3.5, so ν(s_i) = 13.5.
The neighbors suggest a negative error, so the independent contribution ν(s_i) is very large.

SAR models

When part of the variation in error values can be explained by neighbors, the explained part is removed under the SAR model.
Both SAR and CAR models do this, but they define "explained" differently.
The SAR model applies to all locations simultaneously.
For one location: ε(s_i) = Σ_{j=1}^{N} b_ij ε(s_j) + ν(s_i).
For all locations: ε = Bε + ν, which means:

ν = ε − Bε = (I − B)ε
ε = (I − B)^{-1} ν
Y = Xβ + (I − B)^{-1} ν

where I is the N x N identity matrix, B is the matrix of connectivity coefficients, and ν is the vector of independent errors.
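The SAR error construction can be sketched directly: build a row-standardized rook weight matrix on a grid, pick a ρ (0.9 here, matching the value used on later slides; the grid size and σ²_ν = 1 are illustration choices), and map independent ν into correlated ε = (I − B)^{-1}ν.

```python
import numpy as np

def rook_W(n_side):
    """Row-standardized rook-neighbor weight matrix on an n_side x n_side grid."""
    n = n_side * n_side
    W = np.zeros((n, n))
    for i in range(n_side):
        for j in range(n_side):
            k = i * n_side + j
            for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                ii, jj = i + di, j + dj
                if 0 <= ii < n_side and 0 <= jj < n_side:
                    W[k, ii * n_side + jj] = 1.0
    return W / W.sum(axis=1, keepdims=True)   # row-standardize: rows sum to 1

W = rook_W(8)
rho = 0.9
B = rho * W
rng = np.random.default_rng(2)
nu = rng.normal(size=64)                      # independent disturbances
eps = np.linalg.solve(np.eye(64) - B, nu)     # eps = (I - B)^{-1} nu
print(eps[:3])
```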

SAR model in matrix terms

If we consider Y = Xβ + ε, then the VC matrix of ε is (I − B)^{-1} Σ_ν (I − B′)^{-1}.
B can be arbitrary, but it has N(N−1) parameters, so we usually simplify using a model.
Usually write B = ρW, where ρ is a coefficient of spatial dependence and W is the spatial weight matrix.
Note: when ρ = 0, locations are connected (elements of W > 0) but not spatially dependent.
What does this model imply about the VC matrix of the observations?
Consider ρ = 0.9, W rook's neighbors, row-standardized, Σ_ν = σ²I. Pictures on the next few slides.
Correlation declines with distance: good!
But Var Y is not constant: it is largest in the corners and smallest in the middle of the region.

[Figure: correlation with cell [1,1]]
[Figure: correlation with cell [4,4]]
[Figure: variance of Y, from about 4.5 in the middle to 7.0 in the corners]
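The implied covariance (I − B)^{-1} Σ_ν (I − B′)^{-1} can be computed for the ρ = 0.9, row-standardized rook case on an 8 x 8 grid (σ²_ν = 1 assumed). A sketch that checks the two qualitative claims: correlation falls with distance, and Var Y is not constant across the grid.

```python
import numpy as np

n_side = 8
n = n_side * n_side
W = np.zeros((n, n))
for i in range(n_side):
    for j in range(n_side):
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ii, jj = i + di, j + dj
            if 0 <= ii < n_side and 0 <= jj < n_side:
                W[i * n_side + j, ii * n_side + jj] = 1.0
W = W / W.sum(axis=1, keepdims=True)          # row-standardized rook weights

rho = 0.9
M = np.linalg.inv(np.eye(n) - rho * W)        # (I - B)^{-1} with Sigma_nu = I
V = M @ M.T                                   # Var Y = (I-B)^{-1} (I-B')^{-1}

sd = np.sqrt(np.diag(V))
R = V / np.outer(sd, sd)                      # correlation matrix

corner, adj, far = 0, 1, n - 1                # cells [1,1], [1,2], and [8,8]
print(R[corner, adj], R[corner, far])         # correlation declines with distance
print(sd.max() ** 2, sd.min() ** 2)           # variance is not constant
```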

SAR models

Notice that we are modeling the correlation between observations indirectly.
B describes how one error value depends on the values at neighboring locations.
We are describing connections between observations, not correlations.
Correlated errors are a consequence of those connections, as are unequal variances.
A geostatistical model directly describes correlations.
The preference is subject-matter dependent:
spatial econometrics and most areal data analysis: connections;
spatial statistics and geostatistical data: correlations.
The degree of non-constant variance depends on the magnitude of ρ. Similar plots for ρ = 0.5:

[Figure: variance of Y, ρ = 0.5 (range roughly 1.24 to 1.40)]
[Figure: correlation with cell [1,1], ρ = 0.5]
[Figure: correlation with cell [4,4], ρ = 0.5]

Estimation for SAR models

Reminder: the model is Y = Xβ + (I − B)^{-1} ν, where B = ρW and W is the known spatial weight matrix.
If ρ is known, then B is known, and we only need to estimate β̂. Easy:
1) calculate Σ_ε = (I − B)^{-1} Σ_ν (I − B′)^{-1} and use GLS, or
2) note that (I − B)Y = (I − B)Xβ + ν: transform the Y vector and X matrix, and you have an OLS problem.
The usual situation: ρ is unknown and needs to be estimated.
Use maximum likelihood, the general alternative to LS for any statistical problem.
We have already seen the VC matrix for ε: Σ_ε = (I − B)^{-1} Σ_ν (I − B′)^{-1}, where B = ρW.
The multivariate normal log-likelihood is

lnL = −(n/2) log 2π − (1/2) log |Σ_ε| − (1/2) (y − Xβ)′ Σ_ε^{-1} (y − Xβ)

Maximizing the mvn lnL

Iterative algorithm. The key insight is that given Σ_ε, the MLE of β is trivial (GLS).
Assume ρ = 0, find the OLS estimate of β.
Conditional on β̂, use numerical maximization to find ρ̂.
Conditional on ρ̂, find the GLS estimate of β for Σ_ε(ρ̂).
Repeat the last two steps until convergence.
The traditional frequentist approach is to estimate β and Var β conditional on ρ̂.
A Bayesian approach incorporates the uncertainty in ρ̂ into the uncertainty about β.
This is not an issue for simple problems (e.g. Σ_ε = σ²I) because there β̂ is independent of σ̂².
It is an issue in these models because ρ̂ and β̂ are not independent.
For the species diversity data, ρ̂ = 0.94. That is strong positive spatial dependence.

Useful things about likelihood

Test hypotheses using the Likelihood Ratio Test (LRT), e.g. test Ho: ρ = 0.
Calculate lnL given ρ = 0 (need to maximize over β): lnL_0.
Calculate lnL at the MLEs of all parameters: lnL_A.
lnL_A ≥ lnL_0, because ρ̂ is probably not exactly 0. But by how much?
If lnL_A − lnL_0 is near 0, the data are consistent with ρ = 0: accept Ho. But how much is too far from 0?
General result: when Ho is true, 2(lnL_A − lnL_0) ~ χ²_k, where k is the difference in the number of parameters between the two models.
This is an asymptotic result, but surprisingly effective in small samples.
Here k = 1, because we are testing a hypothesis about 1 parameter.
lnL_A = −12.37, lnL_0 = −158.306, so 2(lnL_A − lnL_0) = 291.9.
χ²_{1,0.95} = 3.84, so here p << 0.0001.
The LRT only works when one model is a simplification of the other.

Useful things about likelihood

Model selection, e.g. comparing different W matrices:
AIC = −2 lnL + 2k, where k is the number of parameters (spdep counts β, σ², and ρ).
Can compare models with the same number of parameters, or with different numbers of parameters.
Choose the model with the smallest AIC.
Here, 3 parameters (µ, σ², and ρ):
ρ = 0: AIC = 316.6 + 6 = 322.6
ρ = 0.94: AIC = 24.7 + 6 = 30.7
Choose the model with ρ = 0.94.
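The LRT arithmetic is easy to script; for k = 1 the χ² tail probability follows from the standard normal, since χ²₁ = Z² gives P(χ²₁ > x) = erfc(√(x/2)). A sketch using the log-likelihood values quoted above (the function names are mine):

```python
import math

def lrt_stat(lnL0, lnLA):
    """Likelihood ratio statistic 2(lnL_A - lnL_0)."""
    return 2.0 * (lnLA - lnL0)

def chi2_1_sf(x):
    """P(chi-square_1 > x), via the normal tail: chi2_1 = Z^2."""
    return math.erfc(math.sqrt(x / 2.0))

stat = lrt_stat(-158.306, -12.37)   # the slide's lnL_0 and lnL_A
print(stat, chi2_1_sf(stat))        # far beyond the critical value, tiny p
print(chi2_1_sf(3.84))              # about 0.05, the 95% cutoff
```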

Useful things about likelihood

Confidence interval for ρ by profile likelihood.
Concept: repeat the LRT for many values of ρ; include inside the 95% ci all values of ρ for which the test has p > 0.05.
Here: (0.8, 0.98).
Consequences of the choice of ρ for inference about β:
ρ = 0 (independence): µ̂ = 6.09, se = 0.36
ρ = 0.94: µ̂ = 5.69, se = 1.69
Two points:
we just demonstrated the non-independence of ρ̂ and β̂ in a SAR model;
the se of µ̂ is much larger when we account for the correlation: the observations are positively correlated, so there are fewer effective observations.

CAR models

SAR models all Y values simultaneously.
CAR models the distribution of each Y_i given the values of its neighbors.
The RHS is the same as in the SAR model:

Y_i | Y_{-i} = X_i β + Σ_{j=1}^{N} c_ij (Y_j − X_j β) + ν_i

The difference is that the Y_j are now treated as fixed values when specifying the distribution of Y_i.
Different VC matrix for the observations: Σ_ε = Var Y = (I − C)^{-1} Σ_ν.

CAR models

To be a valid VC matrix, this requires some conditions:
ρ cannot be too large; the definition depends on the weight matrix W.
C must be symmetric, c_ij = c_ji, so my example now uses binary weights.
Practical differences: the correlation pattern is similar (pictures on the next slides);
correlation between pairs in the middle is higher than between the middle and the edge;
similar to the SAR pattern, but the details are slightly different.
But the pattern of variances is quite different: the biggest variance is in the middle of the area (more connections to other points).
Fitting a CAR model gives: µ̂ = 5.0, se = 0.83, ρ̂ = 0.53.
0.53 doesn't seem large, but it is close to the maximum possible value.

[Figure: correlation with cell [1,1], CAR]
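The CAR covariance (I − ρC)^{-1} Σ_ν can be computed the same way, now with symmetric binary rook weights. The grid size, ρ, and σ²_ν = 1 below are invented, with ρ kept below 1 over the largest eigenvalue of C so the matrix is a valid covariance; the check is the slide's claim that the variance is now largest in the middle.

```python
import numpy as np

n_side = 5
n = n_side * n_side
C = np.zeros((n, n))
for i in range(n_side):
    for j in range(n_side):
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ii, jj = i + di, j + dj
            if 0 <= ii < n_side and 0 <= jj < n_side:
                C[i * n_side + j, ii * n_side + jj] = 1.0   # binary rook weights

rho = 0.2                                       # must stay below 1/max eigenvalue
assert rho < 1.0 / np.linalg.eigvalsh(C).max()
V = np.linalg.inv(np.eye(n) - rho * C)          # Var Y with Sigma_nu = I

center = (n_side // 2) * n_side + n_side // 2
corner = 0
print(V[center, center], V[corner, corner])     # middle variance exceeds corner
```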

[Figure: correlation with cell [4,4], CAR]
[Figure: variance of Y, CAR]

Different variance and correlation patterns: CAR and SAR are not the same models.
Which fits the data better? How will you answer this question?

Model comparison

I would use AIC:

model   AIC
SAR     4.54
CAR     6.38
Indep   30.6

Clear dominance of the SAR model.
Traditional interpretation:
AIC within 2 of the top: the worse model is a reasonable competitor;
AIC more than 10 from the top: the worse model is very unlikely.
For Bayesian analyses, CAR models are very, very popular.
They are easy to use in an MCMC chain, because they are conditional distributions.
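The rule of thumb above is easy to encode. A sketch (the function name is mine; the 2 and 10 cutoffs are the slide's, and the AIC values are as printed in the table):

```python
def delta_aic_verdict(aic, best):
    """Classify a model by its AIC gap from the best (smallest-AIC) model."""
    d = aic - best
    if d <= 2:
        return "reasonable competitor"
    if d > 10:
        return "very unlikely"
    return "some support"

aics = {"SAR": 4.54, "CAR": 6.38, "Indep": 30.6}   # values as printed on the slide
best = min(aics.values())
for model, a in aics.items():
    print(model, round(a - best, 2), delta_aic_verdict(a, best))
```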

Smoothing noisy areal data

[Figure: observed diversity (div) and CAR predictions (CarPred)]

Comparison of smoothing using CAR and SAR

[Figure: SAR predictions (SarPred) and CAR predictions (CarPred)]

More complicated models

What if you have X variables measured for each area? They are easy to include in the model.
My example: soil pH measured in each area.
The small quadrats cross from igneous parent material (lower pH) to limestone parent material (higher pH).
Residuals from the OLS regression are spatially correlated.
Moran's I for the residuals: 0.6, less than the I for the observations (0.79).
Pictures on the next three slides.

[Figure: map of soil pH, roughly 5.0 to 7.0]
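Moran's I for residuals can be computed directly from the usual formula I = (n/ΣW)·(e′We)/(e′e) on centered values. A sketch with row-standardized rook weights; the smooth east-west gradient used as test data is invented, chosen so that neighbors are similar and I comes out large and positive.

```python
import numpy as np

def morans_i(e, W):
    """Moran's I of values e under weight matrix W."""
    e = e - e.mean()
    return (len(e) / W.sum()) * (e @ W @ e) / (e @ e)

n_side = 8
n = n_side * n_side
W = np.zeros((n, n))
for i in range(n_side):
    for j in range(n_side):
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ii, jj = i + di, j + dj
            if 0 <= ii < n_side and 0 <= jj < n_side:
                W[i * n_side + j, ii * n_side + jj] = 1.0
W = W / W.sum(axis=1, keepdims=True)

# A smooth east-west gradient: neighbors are similar, so I should be large
gradient = np.array([[j for j in range(n_side)] for _ in range(n_side)], float).ravel()
print(morans_i(gradient, W))
```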

# species vs. soil pH

[Figure: number of species vs. soil pH]
[Figure: residuals from the linear regression]

Results from pH models

Model    β̂_pH   se     ρ̂      AIC
Indep.   3.56   0.43   -      75.06
SAR      .93    0.66   0.86   5.77
CAR      3.8    0.5    0.6    33.78

Pictures of the two fits on the next slide.

SAR and CAR fits

[Figure: fitted values from the SAR model (fitsar) and the CAR model (fitcar)]

SAR and CAR fits: trend component = Xβ̂

[Figure: trend components from the SAR model (trendsar) and the CAR model (trendcar)]

Why is the SAR slope for pH much lower?

When the X variable is spatially correlated, X and the spatial correlation fight to predict Y.
This is analogous to two correlated X variables: pH and space.
It is commonly seen when using spatially correlated errors: the effect you love disappears.
For these data:
SAR fit: smaller β̂_pH, higher correlation;
CAR fit: larger β̂_pH, smaller correlation.

Comparison of intercept-only and pH SAR models

[Figure: predicted values from the pH model vs. predicted values from the intercept-only model]
[Figure: maps of fitted values from the intercept-only (fitsar0) and pH (fitsar) SAR models]