Areal data. Infant mortality, Auckland NZ districts. Number of plant species in 20cm x 20 cm patches of alpine tundra. Wheat yield

Similar documents
Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Basics of Geographic Analysis in R

Spatial analysis of a designed experiment. Uniformity trials. Blocking

Temporal vs. Spatial Data

Nature of Spatial Data. Outline. Spatial Is Special

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

Types of Spatial Data

Spatial and Environmental Statistics

Global Spatial Autocorrelation Clustering

Space-time data. Simple space-time analyses. PM10 in space. PM10 in time

Areal Unit Data Regular or Irregular Grids or Lattices Large Point-referenced Datasets

Hypothesis Testing hypothesis testing approach

Spatial Autocorrelation (2) Spatial Weights

Spatial Analysis 2. Spatial Autocorrelation

The Study on Trinary Join-Counts for Spatial Autocorrelation

2/7/2018. Module 4. Spatial Statistics. Point Patterns: Nearest Neighbor. Spatial Statistics. Point Patterns: Nearest Neighbor

ENGRG Introduction to GIS

Spatial Data Mining. Regression and Classification Techniques

Statistics 910, #5 1. Regression Methods

OPEN GEODA WORKSHOP / CRASH COURSE FACILITATED BY M. KOLAK

Lecture 18. Models for areal data. Colin Rundel 03/22/2017

Basic Probability Reference Sheet

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case

Using Estimating Equations for Spatially Correlated A

An Introduction to Spatial Statistics. Chunfeng Huang Department of Statistics, Indiana University

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Proximity-Based Anomaly Detection using Sparse Structure Learning

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

Spatial Regression. 1. Introduction and Review. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Math 378 Spring 2011 Assignment 4 Solutions

Lab #3 Background Material Quantifying Point and Gradient Patterns

Contents. Learning Outcomes 2012/2/26. Lecture 6: Area Pattern and Spatial Autocorrelation. Dr. Bo Wu

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Introduction. Semivariogram Cloud

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

UMA Putnam Talk LINEAR ALGEBRA TRICKS FOR THE PUTNAM

Properties of the least squares estimates

Data Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Introduction. Spatial Processes & Spatial Patterns

Spatial Autocorrelation

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

7 Geostatistics. Figure 7.1 Focus of geostatistics

Variables and Variable De nitions

Dimension Reduction Techniques. Presented by Jie (Jerry) Yu

Discrete Math, Spring Solutions to Problems V

An Introduction to Pattern Statistics

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

11/8/2018. Spatial Interpolation & Geostatistics. Kriging Step 1

Linear Discriminant Analysis Based in part on slides from textbook, slides of Susan Holmes. November 9, Statistics 202: Data Mining

Concepts and Applications of Kriging. Eric Krause

I don t have much to say here: data are often sampled this way but we more typically model them in continuous space, or on a graph

Bayesian Areal Wombling for Geographic Boundary Analysis

Chi-Square. Heibatollah Baghi, and Mastee Badii

Solutions to Final STAT 421, Fall 2008

22m:033 Notes: 3.1 Introduction to Determinants

Peter Hoff Linear and multilinear models April 3, GLS for multivariate regression 5. 3 Covariance estimation for the GLM 8

Spatial Analysis II. Spatial data analysis Spatial analysis and inference

CS168: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA)

Part I. Linear Discriminant Analysis. Discriminant analysis. Discriminant analysis

Spatial Interpolation & Geostatistics

EVALUATION OF FOUR COVARIATE TYPES USED FOR ADJUSTMENT OF SPATIAL VARIABILITY

Psych 10 / Stats 60, Practice Problem Set 10 (Week 10 Material), Solutions

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Weighted Least Squares

Community Health Needs Assessment through Spatial Regression Modeling

Advanced Introduction to Machine Learning CMU-10715

Lecture 7 Autoregressive Processes in Space

Lecture: Face Recognition and Feature Reduction

Data Preprocessing Tasks

Chapter 1 Statistical Inference

MAA507, Power method, QR-method and sparse matrix representation.

Distances and similarities Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining

Naïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018

Interaction Analysis of Spatial Point Patterns

M 140 Test 1 B Name (1 point) SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

LECTURE 15: SIMPLE LINEAR REGRESSION I

Gaussian Models

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Lecture 5 Geostatistics

Disentangling spatial structure in ecological communities. Dan McGlinn & Allen Hurlbert.

Finding Hot Spots in ArcGIS Online: Minimizing the Subjectivity of Visual Analysis. Nicholas M. Giner Esri Parrish S.

Spatio-Temporal Modelling of Credit Default Data

Chapter 5 Matrix Approach to Simple Linear Regression

Measures of Agreement

CSE 190: Reinforcement Learning: An Introduction. Chapter 8: Generalization and Function Approximation. Pop Quiz: What Function Are We Approximating?

MATH 118 FINAL EXAM STUDY GUIDE

Y i = η + ɛ i, i = 1,...,n.

Lecture 13: Simple Linear Regression in Matrix Format

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

ANOVA: Analysis of Variation

Rate Maps and Smoothing

Correlation analysis. Contents

Spatial Effects and Externalities

Getting Started with Communications Engineering

GIST 4302/5302: Spatial Analysis and Modeling

Lecture: Face Recognition and Feature Reduction

Transcription:

Areal data Reminder about types of data Geostatistical data: Z(s) exists everyhere, varies continuously Can accommodate sudden changes by a model for the mean E.g., soil ph, two soil types with different ph model as mean that depends on soil type error = small scale variation, assumed continuous Areal data Spatial data measured and reported by regions Value constant within the region Can be abrupt change at region boundaries Unlike geostat data, location is arbitrary within region Some examples of areal data in pictures c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 1 / 3 Infant mortality, Auckland NZ districts 0 35 30 5 0 15 5 0 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3 Number of plant species in 0cm x 0 cm patches of alpine tundra 1 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 3 / 3 Wheat yield Wheat uniformity trial 5.0.5.0 3.5 3.0 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3

Incidence of flu in Iowa: made up data 0. 0.0 0.0 0.0 0.0 0.00 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 5 / 3 Areal data Areal data often arises by aggregation number of disease cases by county but doesn t have to: US states, coded by party of state Governor Distinction between Geostat and Areal can be blurred SST data: starts as geostatistical data displayed as single value per spatial grid cell (e.g., 1 x 1 area) To me: relative scale, size of measurement to span of study area Distinction between Areal and Point pattern can be blurred Disease cases: will treat as areal data but point pattern if have individual locations (household address) Again, relative scale matters. What is the location of an individual? c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3 Areal Data Goals: Show data Describe spatial dependence Predict values Less common to predict at new locations Assume observations are noisy, i.e., measurement error at each location, Want to smooth (reduce amount of noise) i.e., predict true values at observed locations Fit regression models while accounting for spatial dependence c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 7 / 3 Describing association Spatial association between what? Consider data on a grid (species diversity picture on next slide) 1 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3

Describing association Could describe in terms of distance 1 = up/down or across = diagonal, and so on = up/down or across But approximate, since distances between areas, not points What about irregular regions (e.g. Auckland, on next slide)? Use distance between centroids of regions - perhaps but depends on size of region Usual solution: pairwise connectivity : how well connected are two regions? c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 9 / 3 0 35 30 5 0 15 5 0 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3 Connectivity On a grid: two common definitions of connectivity rook s move: a square in the middle connects to the two on either side and the two above/below queen s move: a square in the middle connects to its neighbors (rooks + diagonals) squares on edge have 3/5 neighbors; those in the corners have only /3 Connections define the spatial proximity matrix, also called spatial weight or spatial connectivity matrix # rows = # columns = # regions 1 if a pair of regions connect 0 otherwise always 0 down the diagonals c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 11 / 3 Connectivity 1 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 1 / 3

Connectivity - options Irregular regions: two approaches 1 if two regions share a boundary very irregular neighbor count. Auckland: from 1 - neighbors, median Could also use % boundary shared very useful for modeling transport among regions economic flows, disease propagation, invasive spp Could also use distance. Options include: all regions with centroids within d of target (0/1) find the k nearest neighbors (k smallest distance between centroids) make weight a function of distance, d α Can use values as is, or row-standardize so row sum = 1 if have N s, each has connectivity 1/ if have N s, each has connectivity 1/ c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 13 / 3 Connectivity No standard way. If there is something that makes sense for the problem: use it! best if connectedness measure informed by subject-matter knowledge When no such insight, most common is: shared boundary = 0/1, perhaps with row standardization choice does have statistical consequences, especially when predicting/smoothing Some weight matrices are symmetrical 0/1 shared boundary d α others are not % shared boundary row-standardized matrices c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 1 / 3 Spatial dependence One very common measure, one less common Moran s I, dates to 1950 I = 1 s Σ i,j w ij (Y i Ȳ )(Y j Ȳ ) Σ i,j w ij s = 1 N Σ i(y i Ȳ ), i.e., mle, not usual unbiased est. w ij is the ij th element of the spatial weight matrix looks like a correlation coefficient between two variables, Y, Z: I ranges from +1 to -1 0: no spatial correlation 1: perfect positive correlation r = Σ(Y i Ȳ )(Z i Z) n Var Y Var Z c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 15 / 3 Moran s I Tests of correlation = 0: 1) If sufficient # regions, I N with mean and variance that can be computed E Î = 1 N 1, variance formula not insightful not clear what is sufficient, depends on W, but would like 0 or more regions ) permutation: randomly shuffle observed values over the regions, compute I each time enumerate all permutations: p = #more extreme #permutations sample (randomization test): p = #more extreme+1 #permutations+1 +1 accounts for the observed data (already included in all permutations) Usually one-sided test, only interested in positive spatial dependence c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 1 / 3

Moran s I example Example: species diversity data using Queen s move neighbors Î = 0.751 randomization test: 0.751 exceeds all 9,999 randomizations, p = 1/,000 = 0.0001 normal approximation: E Î = -0.0157, Var Î = 0.0051 Z = I E I = 11., p < 0.0001 Var I Next slide shows distribution of Moran s I values for randomized values c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 17 / 3 Density 0 1 3 5 0. 0.0 0. 0. 0. 0. c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 1 / 3 1 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 19 / 3 Geary s c Based on squared differences, not covariance c = N 1 Σ i,j w ij Σ i,j w ij (Y i Y j ) Σ i (Y i Ȳ ) similar in spirit to semivariance in geostats denominator scales to ±1 usually similar to but not same as Moran s I when there is a difference I is a more global indicator, because uses Ȳ c is more sensitive to differences in local neighborhoods Test Ho: no spatial dependence using permutations or normal approximation Notice that both I and c sum over all pairs of points. One number for entire region c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 0 / 3

Local indicators of spatial association rewrite I as: I = 1 s Σ i,j w ij Σ i (Y i Ȳ ) Σ j w ij (Y j Ȳ ) Calculate second sum separately for each region I i = 1 s (Y i Ȳ ) Σ j w ij (Y j Ȳ ) global statistic is then I = Σ i I i /Σ i,j w ij illustrate using species diversity data First local Moran s I i, then Moran s I i marking Z i > Blue areas are where that region is similar to neighboring regions c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 1 / 3 div Ii 1 0 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3 1.0 0. 0. 0. 0. 0.0 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 3 / 3 Moran correlogram Define different weight matrices using different distance classes (0, 1.5) using queen s move neighbors (, 3) regions slightly further apart (3, 5), and so on Or, use 1st nearest neighbor, nd NN, 3rd NN,... Calculate Moran s I for each weight matrix Plot I vs distance c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3

Moran's I 0. 0.0 0. 0. 0. 0. 1 3 5 Distance class c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 5 / 3 Join count statistics Moran s I and Geary s c are for continuous observations c: similar because (Y i Y j ) is small What about categorical data E.g., US states, record whether governor is Republican or Democrat Is there spatial correlation? i.e., If your state is Republican, are neighboring states more likely to be Republican? Usual approach is the Black-Black (BB) join count statistic BB = 1 Σ i,jw ij I i I j I i is 1 if the event happens in region i If w ij is 1 if neighbors, 0 otherwise, BB is the number of pairs where both region and neighbor are events c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3 Join count statistics Compare BB to expected value if no spatial dependence Either a Z-test (assume normal-distribution) Z = BB E BB Var BB or permutation: keep same number of event and non- event regions, keep neighbor structure, assign event / non- event randomly to regions. compute BB for each randomization Example: Species diversity plot, define rich as or more species Rich-rich, rooks neighbors: BB = 53, E BB = 35, Var BB =.95 Normal approximation: Z = (53 35)/.95 =.17, p < 0.0001 Permutation: 53 is larger than any of 9999 randomizations, p = 0.0001 c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 7 / 3 TRUE FALSE c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 / 3

Frequency 0 500 00 1500 000 500 5 30 35 0 5 50 55 BB for random assignments c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 9 / 3 Combining multiple sites Imagine a study of spatial dependence in Iowa that is repeated in Indiana. Methods above will describe that spatial dependence in each state What if you believed the spatial dependence was similar in the two, and wanted one result How can you combine information from both states? No problem, but should think about a few issues. c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 30 / 3 Combining multiple sites 1) One mean for both states, or separate means? the issue is the scale of spatial dependence (within state only or including between state) not an issue if the means are similar, but they are often are not Separate means evaluates pooled within state spatial dependence Separate means is most common ) Include only pairs of regions within states, or pairs crossing states? Same spatial scale issues. Only pairs within states is most common. c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 31 / 3 Combining multiple sites 3) Do the two states contribute equal amounts of information? Here, same within state sampling, same # regions, same weighting scheme (In my story) So, equal amounts of information use a simple average of both states (A and B) Î for both states: Î = (ÎA + ÎB)/ E I = (E I A + E I B )/ Var Î = (Var Î A + Var Î B )/ If unequal amounts of information, use a weighted average Many possible ways to weight, depends on study specifics c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 3 / 3

Combining multiple sites Hard part is getting estimate and its variance Test Ho: no within state spatial dependence Z score for overall study, or permuting observations within state. BTW, same ideas can be used for semivariograms for multiple sites Computing trick: artificially separate sites, make sure min distance between IA and IN larger than max within state distance specify max semivariogram distance so that all pairs are within a site. weights each region by number of pairs c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 33 / 3 Multiple sites: example Iowa: Î = 0.35, E Î = 0.0159, Var Î = 0.05 Indiana: Î = 0., E Î = 0.0159, Var Î = 0.05 Individually: Iowa: Z = 0.35 ( 0.0159) 0.05 = 1.5, p = 0. Indiana: Z = 0. ( 0.0159) 0.05 = 1.9, p = 0.07 Together: Î = (0.35 + 0.)/ = 0.35, E Î = 0.0159, Var Î = (0.05 + 0.05)/ = 0.05 Z = 0.35 ( 0.0159) 0.05 = 1.9, p = 0.0 Similar patterns in both areas, aggregate the two stronger evidence c Philip M. Dixon (Iowa State Univ.) Spatial Data Analysis - Part 3 Spring 01 3 / 3