Spatial Clusters of Rates

Spatial Clusters of Rates Luc Anselin http://spatial.uchicago.edu

Outline: concepts, EBI local Moran, scan statistics.

Concepts

Rates as Risk: moving from counts (spatially extensive) to rates (spatially intensive). Rate = number of events / population. The rate is a measure of risk (a probability). Crude rate: $O_i / P_i$. Relative risk: $O_i / E_i$, observed relative to expected.
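The two rate definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names and the sample numbers are hypothetical, and the expected count $E_i$ is assumed to be the overall rate times the local population, which the slide does not spell out.

```python
def crude_rates(events, populations):
    """Crude rate O_i / P_i for each areal unit."""
    return [o / p for o, p in zip(events, populations)]


def relative_risks(events, populations):
    """Relative risk O_i / E_i.

    Assumption: E_i = (total events / total population) * P_i,
    i.e. the expected count under a constant overall risk.
    """
    overall = sum(events) / sum(populations)
    expected = [overall * p for p in populations]
    return [o / e for o, e in zip(events, expected)]


# Hypothetical data for three areal units.
events = [2, 10, 30]
populations = [100, 1000, 2000]
print(crude_rates(events, populations))
print(relative_risks(events, populations))
```

A relative risk above 1 marks a unit with more events than the constant-risk expectation, below 1 a unit with fewer.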

The Problem with Rates: $r = O / P$, with $O$ the number of events and $P$ the population (at risk). $O$ is a random variable, $P$ is not. The variance of $r$ depends inversely on $P$.

Moments of the Binomial Variable: mean $E[O] = \pi P$, risk times population. Variance $\mathrm{Var}[O] = \pi (1 - \pi) P$, so the variance depends on the population $P$.

Moments of the Rate: $P$ is just a constant, so $E[r] = E[O] / P = \pi P / P = \pi$, and the crude rate is an unbiased estimator for the risk. $\mathrm{Var}[r] = \mathrm{Var}[O] / P^2 = \pi (1 - \pi) P / P^2 = \pi (1 - \pi) / P$.
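The moments of the rate can be checked numerically. The sketch below, with hypothetical function names, enumerates the binomial probability mass function for a given risk and population and compares the resulting mean and variance of $r = O / P$ with the closed-form expressions $\pi$ and $\pi (1 - \pi) / P$.

```python
from math import comb


def rate_moments_exact(pi, P):
    """Mean and variance of r = O / P with O ~ Binomial(P, pi),
    computed by summing over the full pmf."""
    pmf = [comb(P, k) * pi**k * (1 - pi) ** (P - k) for k in range(P + 1)]
    mean = sum(p * k / P for k, p in enumerate(pmf))
    var = sum(p * (k / P - mean) ** 2 for k, p in enumerate(pmf))
    return mean, var


def rate_moments_formula(pi, P):
    """Closed-form mean and variance of the crude rate."""
    return pi, pi * (1 - pi) / P


# Same risk, two population sizes: the variance shrinks as P grows.
for P in (20, 200):
    print(P, rate_moments_exact(0.1, P), rate_moments_formula(0.1, P))
```

With the risk held at 0.1, a tenfold larger population gives a tenfold smaller variance, which is the instability the following slides address.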

Non-Standard Features of Rate Variance: the variance depends on the mean (= risk) through the numerator $\pi (1 - \pi) = \pi - \pi^2$, so for $\pi < 0.5$ higher risk implies greater variance. The variance also depends inversely on the population $P$, which appears in the denominator, so smaller places (smaller $P$) have larger variance.

[Figure: crude rate map and Empirical Bayes (EB) smoothed map, showing the effect of variance instability on outliers (schools/population)]

Approaches: variance instability violates the basic assumption of a constant variance that underlies spatial autocorrelation analysis. Solutions: standardized local indicators of spatial autocorrelation (EBI LISA) and scan statistics.

EBI Local Moran

Correcting Variance Instability: NOT by smoothing rates and applying the standard Moran's I, since smoothing induces spatial correlation, BUT by adjusting the Moran's I statistic directly. Several proposals: the constant risk hypothesis (Walter 92), Tango's I (95), Oden's Ipop (95) and the Assunção-Reis EBI (99).

Empirical Bayes Index (EBI): standardize the rate variable using an Empirical Bayes (EB) logic, $z_i = (r_i - b) / s_i$, with $r_i$ the original rate ($x_i / p_i$), $b$ a mean and $s_i$ a standard deviation. Then use the local Moran with the standardized rates $z_i$.

EBI Adjustment: the mean is $b = \sum_i x_i / \sum_i p_i$, summing over all regions $i$, i.e., the total number of cases divided by the total population, not the mean of the rates. The variance is $s_i^2 = \left[ \sum_i p_i (r_i - b)^2 \right] / P_{tot} - b / P_{av}$, with $P_{tot} = \sum_i p_i$ and $P_{av} = P_{tot} / m$ the average population by region ($m$ regions). $s_i$ is the square root of the variance.
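The standardization can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation. It follows the published Assunção-Reis form, in which the expression given on the slide is a constant term $a$ and the region-specific variance is $v_i = a + b / p_i$. The fallback to $b / p_i$ when $v_i$ is not positive is likewise an assumption added here to keep the square root defined. The function name and sample data are hypothetical.

```python
from math import sqrt


def ebi_standardize(events, populations):
    """Empirical Bayes standardized rates z_i = (r_i - b) / sqrt(v_i)."""
    m = len(events)                       # number of regions
    p_tot = sum(populations)              # total population
    b = sum(events) / p_tot               # overall rate, not the mean of rates
    rates = [x / p for x, p in zip(events, populations)]
    # Population-weighted variance of the rates around b.
    s2 = sum(p * (r - b) ** 2 for p, r in zip(populations, rates)) / p_tot
    a = s2 - b / (p_tot / m)              # the expression shown on the slide
    z = []
    for r, p in zip(rates, populations):
        v = a + b / p                     # assumption: Assuncao-Reis v_i
        if v <= 0:                        # assumption: guard for sqrt
            v = b / p
        z.append((r - b) / sqrt(v))
    return z


# Hypothetical data for four regions.
print(ebi_standardize([2, 10, 30, 5], [100, 1000, 2000, 400]))
```

The resulting $z_i$ would then replace the crude rates as input to a local Moran routine.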

[Figure: local Moran for the crude rate vs. EBI local Moran (schools/population)]

Scan Statistics

Scan Statistics: count events within a given shape, typically a circle based on centroids. Count until a given number of events is reached: Besag-Newell. Count until a given aggregate population is reached: Kulldorff.

Besag-Newell

Principle: aggregate areal units until a chosen number of events has been reached, then carry out a hypothesis test with the Poisson expected count as the null. The question is: what is the probability that the observed count in the aggregated areal units comes from a Poisson distribution with the expected count implied by the average rate? The aggregate with the highest significance (lowest p-value) is a cluster.

Implementation: typically carried out using the centroids of the areal units. Sort the neighbors in order of increasing distance, then add the number of events until the critical threshold ($k$) is exceeded.
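The procedure can be sketched in plain Python. This is a minimal illustration under several assumptions: units are represented by centroid coordinates, accumulation stops as soon as the running count is at least $k$, the expected count is the overall rate times the accumulated population, and $k$ does not exceed the total number of events. All names and data are hypothetical.

```python
from math import exp, hypot


def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    term, cdf = exp(-mu), 0.0
    for j in range(k):
        cdf += term
        term *= mu / (j + 1)
    return 1.0 - cdf


def besag_newell(centroids, events, populations, k):
    """For each unit, add nearest neighbors until k events are reached,
    then test the accumulated population against a Poisson null.
    Returns (p_value, center_index, member_indices), lowest p-value first."""
    overall = sum(events) / sum(populations)
    results = []
    for i, (xi, yi) in enumerate(centroids):
        order = sorted(
            range(len(centroids)),
            key=lambda j: hypot(centroids[j][0] - xi, centroids[j][1] - yi),
        )
        members, cases, pop = [], 0, 0
        for j in order:
            members.append(j)
            cases += events[j]
            pop += populations[j]
            if cases >= k:        # assumption: stop once k is reached
                break
        results.append((poisson_sf(k, overall * pop), i, members))
    return sorted(results)


# Hypothetical data: four units on a line.
centroids = [(0, 0), (1, 0), (2, 0), (10, 0)]
events = [5, 4, 0, 1]
populations = [100, 100, 100, 100]
for p_value, center, members in besag_newell(centroids, events, populations, 8):
    print(center, members, round(p_value, 4))
```

Because every unit serves as a center in turn, the same unit can show up in several candidate clusters, which is why the p-values need careful interpretation.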

[Figure: Besag-Newell clusters, cluster 1 and cluster 2 (schools/population)]

Interpretation: care is needed to interpret the p-values because of multiple comparisons and sequential tests. Clusters are overlapping: the same areal unit can appear in multiple clusters.

Kulldorff Scan Statistic

Principle: aggregate areal units until a target population is reached. Apply a likelihood ratio test of events within the cluster against events outside of the cluster. The null hypothesis is a Poisson distribution with expected counts. Select the cluster with the maximum likelihood ratio.

Likelihood Ratio Test: $T = \max \, (O_i / E_i)^{O_i} (O_o / E_o)^{O_o}$ for $O_i / E_i > O_o / E_o$, comparing the count within the region ($i$) to the count outside ($o$). $O_i$ and $O_o$ are the observed counts in and out, $E_i$ and $E_o$ the expected counts in and out. Inference is based on randomization: $T_r$ is computed for each simulation under constant risk, and the reference distribution of $T_r$ is compared to the observed $T$. The pseudo p-value is the proportion of $T_r$ that exceeds $T$.
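The statistic and its pseudo p-value can be sketched as below. This is an illustration rather than a full scan implementation: it works on the log scale, takes candidate zones as precomputed (observed, expected) pairs, and leaves the simulation of $T_r$ under constant risk to the caller. The function names, the choice to count ties as exceeding, and the sample numbers are assumptions.

```python
from math import log


def log_lr(o_in, e_in, o_tot, e_tot):
    """Log of (O_i/E_i)^O_i * (O_o/E_o)^O_o, or 0 when the inside
    rate does not exceed the outside rate."""
    o_out, e_out = o_tot - o_in, e_tot - e_in
    if e_in <= 0 or e_out <= 0 or o_in / e_in <= o_out / e_out:
        return 0.0
    llr = o_in * log(o_in / e_in)
    if o_out > 0:                 # 0 * log(0) is treated as 0
        llr += o_out * log(o_out / e_out)
    return llr


def scan_statistic(zones, o_tot, e_tot):
    """Maximum log likelihood ratio over candidate zones,
    each given as an (observed, expected) pair."""
    return max(log_lr(o, e, o_tot, e_tot) for o, e in zones)


def pseudo_p_value(t_obs, t_sims):
    """Proportion of simulated statistics at least as large as t_obs
    (assumption: ties count as exceeding)."""
    return sum(t >= t_obs for t in t_sims) / len(t_sims)


# Hypothetical candidate zones with 20 observed and 20 expected overall.
zones = [(10, 5.0), (5, 10.0), (12, 9.0)]
t_obs = scan_statistic(zones, 20, 20.0)
print(t_obs)
print(pseudo_p_value(t_obs, [0.4, 1.1, 3.2, 0.9]))
```

A common variant adds one to both the numerator and the denominator of the pseudo p-value so that it can never be exactly zero.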

[Figure: Kulldorff scan clusters, cluster 1 and cluster 2 (schools/population)]

Interpretation: the most likely cluster has the highest log-likelihood ratio, with a p-value based on Monte Carlo simulation. Other clusters are ranked in order of log-likelihood ratio. Their p-values suffer from multiple comparisons and sequential testing.