Cluster Analysis using SaTScan

Similar documents
Cluster Analysis using SaTScan. Patrick DeLuca, M.A. APHEO 2007 Conference, Ottawa October 16 th, 2007

An Introduction to SaTScan

Outline. Practical Point Pattern Analysis. David Harvey s Critiques. Peter Gould s Critiques. Global vs. Local. Problems of PPA in Real World

USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES

A spatial scan statistic for multinomial data

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Applications of GIS in Health Research. West Nile virus

Railway suicide clusters: how common are they and what predicts them? Lay San Too Jane Pirkis Allison Milner Lyndal Bugeja Matthew J.

Bayesian Hierarchical Models

FleXScan User Guide. for version 3.1. Kunihiko Takahashi Tetsuji Yokoyama Toshiro Tango. National Institute of Public Health

Multivariate Count Time Series Modeling of Surveillance Data

Interactive GIS in Veterinary Epidemiology Technology & Application in a Veterinary Diagnostic Lab

SaTScan TM. User Guide. for version 7.0. By Martin Kulldorff. August

Inclusion of Non-Street Addresses in Cancer Cluster Analysis

Spatio-temporal modeling of weekly malaria incidence in children under 5 for early epidemic detection in Mozambique

Temporal and spatial mapping of hand, foot and mouth disease in Sarawak, Malaysia

Predicting Long-term Exposures for Health Effect Studies

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

Lecture 4. Spatial Statistics

OUTBREAK INVESTIGATION AND SPATIAL ANALYSIS OF SURVEILLANCE DATA: CLUSTER DATA ANALYSIS. Preliminary Programme / Draft_2. Serbia, Mai 2013

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

Applied Spatial Analysis in Epidemiology

In matrix algebra notation, a linear model is written as

Spatio-Temporal Cluster Detection of Point Events by Hierarchical Search of Adjacent Area Unit Combinations

Applied Spatial Analysis in Epidemiology

Everything is related to everything else, but near things are more related than distant things.

Statistical Analysis of Spatio-temporal Point Process Data. Peter J Diggle

School on Modelling Tools and Capacity Building in Climate and Public Health April Point Event Analysis

A nonparametric spatial scan statistic for continuous data

EpiMAN-TB, a decision support system using spatial information for the management of tuberculosis in cattle and deer in New Zealand

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Detection of Clustering in Spatial Data

Rapid detection of spatiotemporal clusters

Spatial and Temporal Geovisualisation and Data Mining of Road Traffic Accidents in Christchurch, New Zealand

Detecting Emerging Space-Time Crime Patterns by Prospective STSS

THE EFFECT OF GEOGRAPHY ON SEXUALLY TRANSMITTED

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Core Courses for Students Who Enrolled Prior to Fall 2018

Measurement Error in Spatial Modeling of Environmental Exposures

Spatial Variation in Hospitalizations for Cardiometabolic Ambulatory Care Sensitive Conditions Across Canada

Jun Tu. Department of Geography and Anthropology Kennesaw State University

Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms

Lecture 4: Generalized Linear Mixed Models

Extended Follow-Up and Spatial Analysis of the American Cancer Society Study Linking Particulate Air Pollution and Mortality

Shape and scale in detecting disease clusters

Agro Ecological Malaria Linkages in Uganda, A Spatial Probit Model:

Geographical Information System (GIS)-based maps for monitoring of entomological risk factors affecting transmission of chikungunya in Sri Lanka

ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2

Analyzing the Geospatial Rates of the Primary Care Physician Labor Supply in the Contiguous United States

Long Island Breast Cancer Study and the GIS-H (Health)

WHO lunchtime seminar Mapping child growth failure in Africa between 2000 and Professor Simon I. Hay March 12, 2018

A SPACE-TIME SCAN STATISTIC FOR DETECTION OF TB OUTBREAKS IN THE SAN FRANCISCO HOMELESS POPULATION

ACCELERATING THE DETECTION VECTOR BORNE DISEASES

Dengue Forecasting Project

Spatial Pattern Analysis: Mapping Trends and Clusters. Lauren M. Scott, PhD Lauren Rosenshein Bennett, MS

Geographical Visualization Approach to Perceive Spatial Scan Statistics: An Analysis of Dengue Fever Outbreaks in Delhi

Detection of Clustering in Spatial Data

ESRI 2008 Health GIS Conference

Lecture 5: Sampling Methods

Combining Incompatible Spatial Data

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

Outline. 15. Descriptive Summary, Design, and Inference. Descriptive summaries. Data mining. The centroid

Pumps, Maps and Pea Soup: Spatio-temporal methods in environmental epidemiology

Land Use/Cover Changes & Modeling Urban Expansion of Nairobi City

Lecture 01: Introduction

Modelling spatio-temporal patterns of disease

An Introduction to Pattern Statistics

College Of Science Engineering Health: GIS at UWF

DISCRETE PROBABILITY DISTRIBUTIONS

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

MATH/STAT 395. Introduction to Probability Models. Jan 7, 2013

INTRODUCTION. In March 1998, the tender for project CT.98.EP.04 was awarded to the Department of Medicines Management, Keele University, UK.

GIS in Locating and Explaining Conflict Hotspots in Nepal

Mapping Malaria Risk in Dakar, Senegal. Marion Borderon, Sébastien Oliveau

Random Variable. Discrete Random Variable. Continuous Random Variable. Discrete Random Variable. Discrete Probability Distribution

Assessing the Regional Vulnerability of Large Scale Mining in Ghana: An Application of Multi Criteria Analysis

This report details analyses and methodologies used to examine and visualize the spatial and nonspatial

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Roger S. Bivand Edzer J. Pebesma Virgilio Gömez-Rubio. Applied Spatial Data Analysis with R. 4:1 Springer

M statistic commands: interpoint distance distribution analysis with Stata

Linkage Methods for Environment and Health Analysis General Guidelines

Temporal and Spatial Autocorrelation Statistics of Dengue Fever

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Success scans in a sequence of trials

Fundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur

Spatial Pattern Analysis: Mapping Trends and Clusters

Chapter 6 Spatial Analysis

GIST 4302/5302: Spatial Analysis and Modeling

Applying Health Outcome Data to Improve Health Equity

GIST 4302/5302: Spatial Analysis and Modeling

Temporal vs. Spatial Data

A Comparison of Three Exploratory Methods for Cluster Detection in Spatial Point Patterns

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Using Geographic Information Systems for Exposure Assessment

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

15: Regression. Introduction

Medical GIS: New Uses of Mapping Technology in Public Health. Peter Hayward, PhD Department of Geography SUNY College at Oneonta

Approach to identifying hot spots for NCDs in South Africa

Transcription:

Cluster Analysis using SaTScan

Summary 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic Methods Probability based models (Bernoulli & Poisson) Focused tests

Outline 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic

Spatial Epidemiology Knowledge about the spatial distribution: Are there areas with higher risk than others? Possible relationship with environmental factors: Is the spatial distribution of the disease somehow related to the spatial distribution of a risk factor measured at the same aggregation level? Location of the high risk areas: Are high risk areas geographically clustered or randomly spread?

Spatial Epidemiology Disease mapping. Ecological regression (geographical correlation studies). Assessment of risk in relation to a point source. Cluster detection.

Spatial Epidemiology: cluster detection Brooker, S et al. Spatial clustering of malaria and associated risk factors during an epidemic in a highland area of western Kenya. Trop Med Int Health. 2004 Jul;9(7):757-66. The epidemiology of malaria over small areas remains poorly understood, and this is particularly true for malaria during epidemics in highland areas of Africa, where transmission intensity is low andcharacterized by acute within and between year variations. We report an analysis of the spatial distribution of clinical malaria during an epidemic and investigate putative risk factors. Active case surveillance was undertaken in three schools in Nandi District, Western Kenya for 10 weeks during a malaria outbreak in May July 2002. Household surveys of cases and age-matched controls were conducted to collect information on household construction, exposure factors and socio-economic status. Household geographical location and altitude were determined using a hand-held geographical positioning system and landcover types were determined using high spatial resolution satellite sensor data. Among 129 cases identified during the surveillance, which were matched to 155 controls, we identified significant spatial clusters of malaria cases as determined using the spatial scan statistic. Conditional multiple logistic regression analysis showed that the risk of malaria was higher in children who were underweight, who lived at lower altitudes, and who lived in households where drugs were not kept at home.

Outline 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic

What is a cluster?

What is a cluster? 1. An unusual collection of events. 2. Unusual aggregation of real or perceived health events. 3. Aggregation of relatively uncommon events or diseases in pace and/or time in amounts that are believed or perceived to be greater than could be expected by chance (Last, 2001) 4. A geographically and/or temporally bounded collection of occurrences: of a disease already known to occur typically in clusters, or of sufficient size and concentration to be unlikely to have occurred by chance, or related to each other through some social or biological mechanism, or having a common relationship with some other event or circumstance (Knox, 1989)

Cluster detection: Few issues Over what scale does any clustering occur? Are clusters merely a result of some obvious a priori heterogeneity in the study region? Are they associated with proximity to other features of interest, such as transport arteries or possible point sources of pollution? Are events that cluster in space also clustered in time?

Cluster detection: Global and Local Tests Global tests detect the presence or absence of clustering over the whole study region without specifying the spatial location. Local tests additionally specify the location and if extended to consider temporal patterns, can specify spatio-temporal clusters. A special case of local tests is the focused test which is used to detect raised incidence of disease around some pre-specified source, such as an pollutant source.

Cluster detection: Point data

Cluster detection: Area data (aggregated)

Cluster detection: Statistical Methods Global Measures Point based methods Ripley s K function (and its variants) Area based methods Moran s I Local Measures Kernel density estimation Kuldorff s spatial scan statistic Local Moran s I Kuldorff s spatial scan statistic

Outline 1. Statistical methods for spatial epidemiology 2. Cluster Detection 3. Spatial and spatio-temporal Scan Statistic Method Probability model based (Bernoulli & Poisson) Focused tests

SaTScan. Introduction Analyzes spatial, temporal and spatio-temporal data through a scan statistic Quick assessment of potential clusters Developed by Martin Kuldorff for the National Cancer Institute Freely available from www.satscan.org

SatScan. Objectives To perform geographical surveillance of disease, to detect spatial or space-time disease clusters, and to see if they are statistically significant. To test whether a disease is randomly distributed over space, over time or over space and time. To evaluate reported spatial or space-time disease clusters, to see if they are statistically significant. To perform repeated time-periodic disease surveillance for the early detection of disease outbreaks.

SatScan. Co-ordinates for area data

SatScan. The method 1. A circular scanning window is placed at different coordinates with radii that vary from 0 to some set upper limit. 2. For each location and size of window: H A = elevated risk within window as compared to outside of window

SatScan. The method

SatScan. The method Likelihood function is created depending on model selected (more later..) Likelihood = hypothetical probability that an event that has already occurred would yield a specific outcome Function is maximized over all window locations and sizes The one with the maximum likelihood is most likely cluster (least likely to have occurred by chance) Likelihood ratio for this window becomes maximum likelihood ratio test statistic A p-value is obtained for the cluster by Monte Carlo hypothesis testing

SatScan. The method Indicates whether there is clustering Shows us where it is (text description) Evaluates its statistical significance Produces a relative risk for the cluster Which can be low or high risk Does not rely on a priori knowledge of any cluster

SatScan. The method Types of Analysis & Models

SatScan. Poisson models Should be used when background population reflects a certain risk mass such as total persons in an area Expected number of cases is directly proportional to the total number of persons Example # cases of cancer per 100,000 persons

SatScan. Poisson models Likelihood function: C = total number of cases c = observed number of cases in window E[c] = covariate adjusted expected number of cases in window I() = indicator function

SatScan. Poisson models Need to import this into a GIS for visualization

SatScan. Poisson models

SatScan. Bernoulli models Bernoulli process is a discrete-time stochastic process based on Bernoulli trials An experiment whose outcome is random and can be either of two possible outcomes, success and failure Values expressed as 0 or 1 (non-cases or cases)

SatScan. Bernoulli models

SatScan. Bernoulli models Scan similar to Poisson, visiting each event Likelihood function: C = total number of cases in dataset c = observed number of cases in window n = total # of cases and controls in window N = total number of cases and controls in dataset

SatScan. Focused tests Focused tests can be conducted by providing the coordinates of one or more point sources. In these instances, the scan will only evaluate clusters around these sources and bypass the rest of the coordinates

Summary 1. Statistical methods for spatial epidemiology 2. Cluster Detection What is a cluster? Few issues 3. Spatial and spatio-temporal Scan Statistic Methods Probability based models (Bernoulli & Poisson) Focused tests

Case Study 1. Legionella 2. Leukemia 3. TB 4. Hepatitis A Find space and spaciotemporal cluster. Give possible causes

Case Study There are four scenarios to be analyzed The cases of disease have been simulated from the population of Madrid (Spain). The four diseases to be studied are legionella, leukemia, tuberculosis and hepatitis A. Each participant is expected to carry out a cluster analysis of just one of these diseases.

Case Study For each scenario, there is a folder containing the information required: centroid.txt contains the geographic coordinates population.txt contains the inhabitants of each census tract of Madrid aggregated_1.xls contains cases_1.xls contains Folder Shapes contains the files needed to map the area of study and the results obtained.