Spatial Clusters of Rates
Luc Anselin
http://spatial.uchicago.edu
Outline: concepts, EBI local Moran, scan statistics
Concepts
Rates as Risk
  from counts (spatially extensive) to rates (spatially intensive)
  rate = number of events / population
  rate as a measure of risk (a probability)
  crude rate: $O_i / P_i$
  relative rate: $O_i / E_i$, observed relative to expected
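The two rate definitions on this slide can be illustrated with a minimal sketch. It assumes, as one common choice not stated on the slide, that the expected count $E_i$ is the overall rate (total events / total population) times the local population; the function name and the toy data are hypothetical.

```python
def crude_and_relative_rates(events, pops):
    """Return crude rates O_i / P_i, expected counts E_i, and O_i / E_i.

    Assumption (not from the slide): E_i = overall rate * P_i,
    i.e. the expected count under constant risk everywhere.
    """
    overall_rate = sum(events) / sum(pops)
    crude = [o / p for o, p in zip(events, pops)]
    expected = [overall_rate * p for p in pops]
    relative = [o / e for o, e in zip(events, expected)]
    return crude, expected, relative


# Toy data: three areas with different population sizes.
events = [5, 20, 75]
pops = [1000, 2000, 7000]
crude, expected, relative = crude_and_relative_rates(events, pops)
for c, e, r in zip(crude, expected, relative):
    print(f"crude={c:.4f}  expected={e:.1f}  relative={r:.3f}")
```

A relative rate below 1 marks an area with fewer events than expected under constant risk, a value above 1 marks more.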
The Problem with Rates
  $r = O / P$
    $O$: number of events
    $P$: population (at risk)
  $O$ is a random variable, $P$ is not
  variance of $r$ depends inversely on $P$
Moments of the Binomial Variable
  mean: $E[O] = \pi P$, risk times population
  variance: $\mathrm{Var}[O] = \pi (1 - \pi) P$
  variance depends on population $P$
Moments of the Rate
  $P$ is just a constant
  $E[r] = E[O] / P = \pi P / P = \pi$
  crude rate is an unbiased estimator for risk
  $\mathrm{Var}[r] = \mathrm{Var}[O] / P^2 = \pi (1 - \pi) P / P^2 = \pi (1 - \pi) / P$
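As a rough check of these two moments, the sketch below simulates binomial event counts and compares the simulated mean and variance of the crude rate with $\pi$ and $\pi (1 - \pi) / P$. The values of the risk, the population, and the number of replications are arbitrary illustrations, not from the slides.

```python
import random


def simulate_crude_rates(pi, pop, n_sims=2000, seed=0):
    """Draw n_sims binomial counts O ~ Bin(pop, pi) and return O / pop."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sims):
        # One binomial draw as a sum of pop Bernoulli(pi) trials.
        events = sum(1 for _ in range(pop) if rng.random() < pi)
        rates.append(events / pop)
    return rates


def mean_and_variance(values):
    """Sample mean and (population-style) variance of a list."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / n
    return m, v


pi, pop = 0.1, 200  # illustrative risk and population
sim_mean, sim_var = mean_and_variance(simulate_crude_rates(pi, pop))
theory_var = pi * (1 - pi) / pop
print(f"mean: simulated {sim_mean:.4f} vs risk {pi}")
print(f"variance: simulated {sim_var:.6f} vs theory {theory_var:.6f}")
```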
Non-Standard Features of Rate Variance
  variance depends on the mean (= risk)
    numerator $\pi (1 - \pi) = \pi - \pi^2$
    higher risk implies greater variance (for $\pi < 0.5$)
  variance depends inversely on population $P$
    $P$ in the denominator
    smaller places (smaller $P$) have larger variance
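To make the population effect concrete, the following sketch evaluates the standard error of the crude rate, $\sqrt{\pi (1 - \pi) / P}$, for the same risk and a range of population sizes. The risk value and the population sizes are illustrative assumptions.

```python
import math


def rate_std_error(pi, pop):
    """Standard error of the crude rate: sqrt(pi * (1 - pi) / pop)."""
    return math.sqrt(pi * (1 - pi) / pop)


# Same underlying risk, very different precision.
for pop in (100, 1000, 10000):
    print(f"P={pop:>6}: standard error = {rate_std_error(0.1, pop):.4f}")
```

A hundredfold increase in population shrinks the standard error by a factor of ten, which is why extreme crude rates tend to show up in small places.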
[Figure: crude rate map and Empirical Bayes (EB) smoothed map; effect of variance instability on outliers (schools/population)]
Approaches
  variance instability violates the constant-variance assumption underlying spatial autocorrelation analysis
  solutions
    standardized local indicators of spatial autocorrelation (EBI LISA)
    scan statistics
EBI Local Moran
Correcting Variance Instability
  NOT by smoothing rates and applying the standard Moran's I
    smoothing induces spatial correlation
  BUT by adjusting the Moran's I statistic directly
  several proposals: constant risk hypothesis (Walter 92), Tango's I (95), Oden's Ipop (95), and Assunção-Reis EBI (99)
Empirical Bayes Index - EBI
  standardize the rate variable using an Empirical Bayes (EB) logic
  $z_i = (r_i - b) / s_i$
    $r_i$: the original rate ($x_i / p_i$)
    $b$: a mean
    $s_i$: a standard deviation
  use the local Moran with the standardized rates $z_i$
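The last step, a local Moran computed on already standardized values, might look like the sketch below. It assumes row-standardized contiguity weights given as neighbor lists, and it re-centers the values and scales by their second moment, which is one common form of the local Moran. The function name and the four-area example are hypothetical.

```python
def local_moran(z, neighbors):
    """Local Moran I_i = d_i * (average of d_j over neighbors of i) / m2.

    z: list of values (e.g. EB-standardized rates).
    neighbors: dict mapping each index to a list of neighbor indices
               (row-standardized weights are implied by the averaging).
    """
    n = len(z)
    mean = sum(z) / n
    dev = [v - mean for v in z]            # deviations from the mean
    m2 = sum(d * d for d in dev) / n       # second moment, used for scaling
    result = []
    for i in range(n):
        nb = neighbors[i]
        lag = sum(dev[j] for j in nb) / len(nb) if nb else 0.0
        result.append(dev[i] * lag / m2)
    return result


# Four areas on a line: 0-1-2-3, high values on the left, low on the right.
z = [1.0, 1.0, -1.0, -1.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(local_moran(z, neighbors))
```

Positive values flag areas surrounded by similar values (high-high or low-low); inference would still require a permutation step, which is left out here.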
EBI Adjustment
  mean: $b = \sum_i x_i / \sum_i p_i$, for $i = 1, \dots, R$
    i.e., total sum of cases / total population, not the mean of the rates
  variance: $\mathrm{variance}_i = \left\{ \left[ \sum_i p_i (r_i - b)^2 \right] / P_{tot} \right\} - b / P_{av}$
    $P_{tot} = \sum_i p_i$ and $P_{av} = P_{tot} / m$, the average population by region
  $s_i$ = square root of the variance
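A minimal sketch of this standardization follows. The mean $b$ and the term built from $P_{tot}$ and $P_{av}$ are taken from the slide (called `a` in the code). Two further steps are assumptions not stated on the slide: the region-specific variance is taken as `a + b / p_i`, following the Assunção-Reis formulation as commonly described, and when that value is not positive the code falls back to `b / p_i`. The function name and the toy data are hypothetical.

```python
import math


def ebi_standardize(events, pops):
    """Return (b, a, z) with z_i = (r_i - b) / s_i."""
    m = len(events)                       # number of regions
    p_tot = sum(pops)
    p_av = p_tot / m                      # average population by region
    b = sum(events) / p_tot               # total cases / total population
    rates = [x / p for x, p in zip(events, pops)]
    # Term from the slide: population-weighted variance minus b / P_av.
    a = sum(p * (r - b) ** 2 for p, r in zip(pops, rates)) / p_tot - b / p_av
    z = []
    for r, p in zip(rates, pops):
        v = a + b / p                     # ASSUMPTION: region-specific variance
        if v <= 0:
            v = b / p                     # ASSUMPTION: fallback when v is not positive
        z.append((r - b) / math.sqrt(v))
    return b, a, z


events = [2, 10, 30]
pops = [100, 400, 1000]
b, a, z = ebi_standardize(events, pops)
print(f"b={b:.4f}  a={a:.6f}")
print([round(v, 3) for v in z])
```

The sketch requires at least one event overall, since `b = 0` would leave the variance undefined.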
[Figure: local Moran for crude rate vs. EBI local Moran (schools/population); panels: crude rate, EBI local Moran]
Scan Statistics
Scan Statistics
  count events within a given shape
    typically based on centroids and circles
  count until a given number of events is reached: Besag-Newell
  count until a given aggregate population is reached: Kulldorff
Besag-Newell
Principle
  aggregate areal units until a chosen number of events has been reached
  then carry out a hypothesis test with the Poisson expected count as the null
    what is the probability that the observed count in the aggregated areal units comes from a Poisson distribution with the expected (average) count as its mean?
  the aggregate with the highest significance (lowest p-value) is a cluster
Implementation
  typically carried out using the centroids of areal units
  sort the neighbors in order of increasing distance
  add the number of events until the critical threshold ($k$) is exceeded
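The steps above can be sketched as follows. The sketch reads "exceeded" as "reached or exceeded" (at least $k$ events), takes the expected count as the overall rate times the aggregated population, and uses the Poisson upper tail $P(X \ge k)$ as the p-value. These choices, the function names, and the toy data are assumptions for illustration.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)          # P(X = 0)
    cdf = 0.0
    for x in range(k):             # accumulate P(X = 0), ..., P(X = k - 1)
        cdf += term
        term *= lam / (x + 1)
    return max(0.0, 1.0 - cdf)


def besag_newell(coords, events, pops, k):
    """For each centroid, add nearest units until at least k events.

    Returns a list of (p_value, start_index, member_indices), lowest p first.
    """
    overall_rate = sum(events) / sum(pops)
    results = []
    for i, (xi, yi) in enumerate(coords):
        # Units sorted by increasing distance from centroid i (i itself first).
        order = sorted(
            range(len(coords)),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        members, obs, pop = [], 0, 0
        for j in order:
            members.append(j)
            obs += events[j]
            pop += pops[j]
            if obs >= k:
                break
        if obs < k:
            continue               # k is never reached from this centroid
        expected = overall_rate * pop
        results.append((poisson_upper_tail(k, expected), i, members))
    results.sort()
    return results


coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
events = [5, 4, 0, 1]
pops = [100, 100, 100, 100]
for p_value, start, members in besag_newell(coords, events, pops, k=8):
    print(f"start={start} members={members} p={p_value:.4f}")
```

The p-values come from overlapping, sequentially built aggregates, so they are best read as a ranking device rather than as exact significance levels.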
[Figure: Besag-Newell clusters (schools/population); cluster 1, cluster 2]
Interpretation
  care is needed to interpret the p-values
    multiple comparisons
    sequential tests
  clusters are overlapping
    same areal unit can appear in multiple clusters
Kulldorff Scan Statistic
Principle
  aggregate areal units until a target population is reached
  likelihood ratio test of events within the cluster against events outside of the cluster
  null hypothesis is a Poisson distribution with expected counts
  select the cluster with the maximum likelihood ratio
Likelihood Ratio Test
  $T = \max \, (O_i / E_i)^{O_i} (O_o / E_o)^{O_o}$ for $O_i / E_i > O_o / E_o$
    count within region ($i$) versus outside ($o$)
    $O_{i/o}$: observed in/out, $E_{i/o}$: expected in/out
  inference based on randomization
    $T_r$ computed for each simulation under constant risk
    compare the reference distribution of $T_r$ to the observed $T$
    pseudo p-value = proportion of $T_r$ that exceeds $T$
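A compact sketch of this test follows, working with the logarithm of $T$. Several choices are assumptions rather than details from the slides: candidate zones are circles grown around each centroid up to 50% of the total population, expected counts are proportional to population and scaled to the total number of events, simulated data sets allocate events to areas with probability proportional to population, and the pseudo p-value uses the common (count + 1) / (simulations + 1) convention, a slight variation on the plain proportion. All names and data are hypothetical.

```python
import math
import random


def log_lr(o_in, e_in, o_tot):
    """log of (O_i/E_i)^O_i (O_o/E_o)^O_o, or 0 unless O_i/E_i > O_o/E_o."""
    o_out, e_out = o_tot - o_in, o_tot - e_in
    if e_in <= 0 or e_out <= 0 or o_in / e_in <= o_out / e_out:
        return 0.0
    value = o_in * math.log(o_in / e_in)
    if o_out > 0:
        value += o_out * math.log(o_out / e_out)
    return value


def candidate_zones(coords, pops, max_pop_share=0.5):
    """Nested nearest-neighbor zones around each centroid (assumed design)."""
    total, zones = sum(pops), []
    for xi, yi in coords:
        order = sorted(
            range(len(coords)),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        members, pop = [], 0
        for j in order:
            if pop + pops[j] > max_pop_share * total:
                break
            members.append(j)
            pop += pops[j]
            zones.append(tuple(members))
    return zones


def max_stat(zones, events, pops):
    """Largest log likelihood ratio over all zones, with its zone."""
    o_tot, p_tot = sum(events), sum(pops)
    best = (0.0, ())
    for zone in zones:
        o_in = sum(events[j] for j in zone)
        e_in = o_tot * sum(pops[j] for j in zone) / p_tot
        t = log_lr(o_in, e_in, o_tot)
        if t > best[0]:
            best = (t, zone)
    return best


def scan(coords, events, pops, n_sims=99, seed=0):
    """Observed statistic, most likely cluster, and pseudo p-value."""
    zones = candidate_zones(coords, pops)
    t_obs, zone = max_stat(zones, events, pops)
    rng = random.Random(seed)
    n, o_tot = len(pops), sum(events)
    exceed = 0
    for _ in range(n_sims):
        sim = [0] * n
        # Constant risk: each event lands in an area with prob. ~ population.
        for j in rng.choices(range(n), weights=pops, k=o_tot):
            sim[j] += 1
        if max_stat(zones, sim, pops)[0] >= t_obs:
            exceed += 1
    return t_obs, zone, (exceed + 1) / (n_sims + 1)


coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
events = [20, 18, 2, 1, 3, 2]
pops = [100] * 6
t_obs, zone, p_value = scan(coords, events, pops)
print(f"log LR={t_obs:.2f} zone={zone} pseudo p={p_value:.2f}")
```

Taking the maximum over zones inside every simulation is what lets the reference distribution account for the search across many candidate clusters.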
[Figure: Kulldorff scan clusters (schools/population); cluster 1, cluster 2]
Interpretation
  most likely cluster has the highest log-likelihood ratio
    p-value based on Monte Carlo simulation
  other clusters ranked in order of log-likelihood ratio
    p-values suffer from multiple comparisons and sequential testing