Cluster Analysis using SaTScan
Summary 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic Methods Probability based models (Bernoulli & Poisson) Focused tests
Outline 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic
Spatial Epidemiology Knowledge about the spatial distribution: Are there areas with higher risk than others? Possible relationship with environmental factors: Is the spatial distribution of the disease somehow related to the spatial distribution of a risk factor measured at the same aggregation level? Location of the high risk areas: Are high risk areas geographically clustered or randomly spread?
Spatial Epidemiology Disease mapping. Ecological regression (geographical correlation studies). Assessment of risk in relation to a point source. Cluster detection.
Spatial Epidemiology: cluster detection Brooker, S et al. Spatial clustering of malaria and associated risk factors during an epidemic in a highland area of western Kenya. Trop Med Int Health. 2004 Jul;9(7):757-66. The epidemiology of malaria over small areas remains poorly understood, and this is particularly true for malaria during epidemics in highland areas of Africa, where transmission intensity is low andcharacterized by acute within and between year variations. We report an analysis of the spatial distribution of clinical malaria during an epidemic and investigate putative risk factors. Active case surveillance was undertaken in three schools in Nandi District, Western Kenya for 10 weeks during a malaria outbreak in May July 2002. Household surveys of cases and age-matched controls were conducted to collect information on household construction, exposure factors and socio-economic status. Household geographical location and altitude were determined using a hand-held geographical positioning system and landcover types were determined using high spatial resolution satellite sensor data. Among 129 cases identified during the surveillance, which were matched to 155 controls, we identified significant spatial clusters of malaria cases as determined using the spatial scan statistic. Conditional multiple logistic regression analysis showed that the risk of malaria was higher in children who were underweight, who lived at lower altitudes, and who lived in households where drugs were not kept at home.
Outline 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic
What is a cluster?
What is a cluster? 1. An unusual collection of events. 2. Unusual aggregation of real or perceived health events. 3. Aggregation of relatively uncommon events or diseases in pace and/or time in amounts that are believed or perceived to be greater than could be expected by chance (Last, 2001) 4. A geographically and/or temporally bounded collection of occurrences: of a disease already known to occur typically in clusters, or of sufficient size and concentration to be unlikely to have occurred by chance, or related to each other through some social or biological mechanism, or having a common relationship with some other event or circumstance (Knox, 1989)
Cluster detection: Few issues Over what scale does any clustering occur? Are clusters merely a result of some obvious a priori heterogeneity in the study region? Are they associated with proximity to other features of interest, such as transport arteries or possible point sources of pollution? Are events that cluster in space also clustered in time?
Cluster detection: Global and Local Tests Global tests detect the presence or absence of clustering over the whole study region without specifying the spatial location. Local tests additionally specify the location and if extended to consider temporal patterns, can specify spatio-temporal clusters. A special case of local tests is the focused test which is used to detect raised incidence of disease around some pre-specified source, such as an pollutant source.
Cluster detection: Point data
Cluster detection: Area data (aggregated)
Cluster detection: Statistical Methods Global Measures Point based methods Ripley s K function (and its variants) Area based methods Moran s I Local Measures Kernel density estimation Kuldorff s spatial scan statistic Local Moran s I Kuldorff s spatial scan statistic
Outline 1. Statistical methods for spatial epidemiology 2. Cluster Detection 3. Spatial and spatio-temporal Scan Statistic Method Probability model based (Bernoulli & Poisson) Focused tests
SaTScan. Introduction Analyzes spatial, temporal and spatio-temporal data through a scan statistic Quick assessment of potential clusters Developed by Martin Kuldorff for the National Cancer Institute Freely available from www.satscan.org
SatScan. Objectives To perform geographical surveillance of disease, to detect spatial or space-time disease clusters, and to see if they are statistically significant. To test whether a disease is randomly distributed over space, over time or over space and time. To evaluate reported spatial or space-time disease clusters, to see if they are statistically significant. To perform repeated time-periodic disease surveillance for the early detection of disease outbreaks.
SatScan. Co-ordinates for area data
SatScan. The method 1. A circular scanning window is placed at different coordinates with radii that vary from 0 to some set upper limit. 2. For each location and size of window: H A = elevated risk within window as compared to outside of window
SatScan. The method
SatScan. The method Likelihood function is created depending on model selected (more later..) Likelihood = hypothetical probability that an event that has already occurred would yield a specific outcome Function is maximized over all window locations and sizes The one with the maximum likelihood is most likely cluster (least likely to have occurred by chance) Likelihood ratio for this window becomes maximum likelihood ratio test statistic A p-value is obtained for the cluster by Monte Carlo hypothesis testing
SatScan. The method Indicates whether there is clustering Shows us where it is (text description) Evaluates its statistical significance Produces a relative risk for the cluster Which can be low or high risk Does not rely on a priori knowledge of any cluster
SatScan. The method Types of Analysis & Models
SatScan. Poisson models Should be used when background population reflects a certain risk mass such as total persons in an area Expected number of cases is directly proportional to the total number of persons Example # cases of cancer per 100,000 persons
SatScan. Poisson models Likelihood function: C = total number of cases c = observed number of cases in window E[c] = covariate adjusted expected number of cases in window I() = indicator function
SatScan. Poisson models Need to import this into a GIS for visualization
SatScan. Poisson models
SatScan. Bernoulli models Bernoulli process is a discrete-time stochastic process based on Bernoulli trials An experiment whose outcome is random and can be either of two possible outcomes, success and failure Values expressed as 0 or 1 (non-cases or cases)
SatScan. Bernoulli models
SatScan. Bernoulli models Scan similar to Poisson, visiting each event Likelihood function: C = total number of cases in dataset c = observed number of cases in window n = total # of cases and controls in window N = total number of cases and controls in dataset
SatScan. Focused tests Focused tests can be conducted by providing the coordinates of one or more point sources. In these instances, the scan will only evaluate clusters around these sources and bypass the rest of the coordinates
Summary 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic Methods Probability based models (Bernoulli & Poisson) Focused tests
Case Study 1. Legionella 2. Leukemia 3. TB 4. Hepatitis A Find space and spaciotemporal cluster. Give possible causes
Case Study There are four scenarios to be analyzed The cases of disease have been simulated from the population of Madrid (Spain). The four diseases to be studied are legionella, leukemia, tuberculosis and hepatitis A. Each participant is expected to carry out a cluster analysis of just one of these diseases.
Case Study For each scenario, there is a folder containing the information required: centroid.txt contains the geographic coordinates population.txt contains the inhabitants of each census tract of Madrid aggregated_1.xls contains cases_1.xls contains Folder Shapes contains the files needed to map the area of study and the results obtained.