GEOGRAPHIC INFORMATION SYSTEMS FOR SPATIAL DISEASE CLUSTER DETECTION, SPATIO-TEMPORAL DISEASE MAPPING, AND HEALTH SERVICE PLANNING PING YIN


by PING YIN
(Under the Direction of Lan Mu and Marguerite Madden)

ABSTRACT

Geographic information systems (GIS) are increasingly recognized as an effective and efficient tool to deal with geographic questions in health studies. The overarching research question of this dissertation asks how GIS and spatial analysis can be used to facilitate public health studies. Three aspects of health studies are included: spatial disease cluster detection, spatio-temporal disease mapping, and health service planning. New methods or models are proposed and implemented with GIS in this dissertation to address an important problem in each of the three aspects.

First, a redesigned spatial scan statistic (RSScan) is proposed to quickly detect disease clusters in arbitrary shapes. The experimental results indicate that the RSScan method generally has higher power and accuracy than three existing methods for detecting clusters with irregular shapes.

Second, to explore the spatio-temporal patterns of lung cancer incidence risks in Georgia between 2000 and 2007, a total of seven hierarchical Bayesian models are developed and compared at the census tract level using a two-year time period as the temporal unit. The study shows that the northwest region of Georgia has stably elevated lung cancer incidence risks for all the population groups by race and sex. It also shows that there are strong inverse relationships between socioeconomic status and lung cancer incidence risk in males and weak inverse relationships in females in Georgia.

Finally, two transportation models that address the modular capacitated maximal covering location problem (MCMCLP) are proposed and used to optimally site ambulances for Emergency Medical Services (EMS) Region 10 in Georgia. As a component of the location-allocation problems for health service planning, spatial demand representation is discussed, and three representation approaches are empirically compared in both problem complexity and representation error.

Results of this dissertation contribute to the advancement of geospatial analysis in disease surveillance and health service decision making. Future research could include using GIS and spatial analysis to improve the accuracy of detected clusters, explore the environmental factors related to the spatio-temporal patterns of lung cancer incidence risks in Georgia, and integrate population movement in health service planning.

INDEX WORDS: GIS, Public health, Cluster detection, Disease mapping, Health planning

PING YIN
B.E., Tsinghua University, China, 2002
M.E., Tsinghua University, China, 2005

A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree

DOCTOR OF PHILOSOPHY

ATHENS, GEORGIA
2012

© 2012 Ping Yin
All Rights Reserved

Major Professor: Lan Mu, Marguerite Madden
Committee: Xiaobai Yao, Thomas Jordan, John Vena

Electronic Version Approved:
Maureen Grasso
Dean of the Graduate School
The University of Georgia
August 2012

ACKNOWLEDGEMENTS

My five years of Ph.D. study in the Department of Geography at the University of Georgia (UGA) have been a great experience. I am grateful to all those who supported and helped me to finish my dissertation research.

First and foremost, my deepest gratitude goes to my major professors, Dr. Lan Mu and Dr. Marguerite Madden, for their excellent guidance and full support. Without their endless input, timely feedback, and great inspiration, I could not have finished my research. I really appreciate their dedication and generous help with my research and other academic activities.

I thank Dr. John Vena in the Department of Epidemiology and Biostatistics at UGA for providing the health data for my research. His invaluable advice from an epidemiological perspective greatly improved my research. I also thank Dr. Xiaobai Yao and Dr. Thomas Jordan for their insightful advice and suggestions on this research and other academic areas. I want to thank Dr. Andrew Herod, who made me realize how important correct citations are in academic writing.

The institutions that sponsored my research deserve special notice: the UGA Research Foundation, and the UGA Graduate School through the Dean's Award in Social Sciences and the Dissertation Completion Award.

Finally, I deeply thank my parents and my wife, Jing. It is their unconditional love and endless patience that encouraged me to finish my dissertation.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 INTRODUCTION AND LITERATURE REVIEW
    Background
    Research Objectives
    Literature Review
    Dissertation Structure
    References
2 DETECTING DISEASE CLUSTERS IN ARBITRARY SHAPES WITH A REDESIGNED SPATIAL SCAN STATISTIC
    Abstract
    Introduction
    Existing Methods for Detection of Disease Clusters
    Redesigned Spatial Scan Method (RSScan)
    Performance Evaluation
    Application: Georgia Lung Cancer,
    Discussion and Conclusions
    References
3 HIERARCHICAL BAYESIAN MODELING OF THE SPATIO-TEMPORAL PATTERNS OF LUNG CANCER INCIDENCE RISKS IN GEORGIA,
    Abstract
    Introduction
    Study Area and Data
    Methods
    Results
    Discussions
    Conclusions
    References
4 MODULAR CAPACITATED MAXIMAL COVERING LOCATION PROBLEM FOR THE OPTIMAL SITING OF EMERGENCY VEHICLES
    Abstract
    Introduction
    Modular Capacitated Maximal Covering Location Problem (MCMCLP)
    Spatial Demand Representation
    Applications: Optimal Siting of Ambulances
    Discussion
    Conclusion
    References
5 AN EMPIRICAL COMPARISON OF SPATIAL DEMAND REPRESENTATIONS IN MAXIMAL COVERAGE MODELING
    Abstract
    Introduction
    Representation Error in Covering Location Modeling
    The MCLP Model and Problem Complexity
    Service Area Spatial Demand Representation
    Experimental Design
    Results and Discussions
    Conclusions
    References
6 CONCLUSIONS
    Summary and Conclusions
    Future Research
    References
APPENDICES
    I LIST OF ACRONYMS

LIST OF TABLES

Table 2.1: Test statistics and search strategies of four spatial scan methods
Table 2.2: Information of simulated cluster models
Table 2.3: Estimated power of four spatial scan methods (significance level=0.05)
Table 2.4: Contingency table for detected cluster estimates and true clusters
Table 2.5: KIAs between the most likely clusters and true clusters for four spatial scan methods
Table 2.6: Average Type I error of four spatial scan methods
Table 3.1: Total number of cases of individuals over 20 years old and the percentage of included cases in the analyses by sex and race
Table 3.2: Variables incorporated in the modified Darden-Kamel Composite Index
Table 3.3: Components of logarithms of RRs in the seven Bayesian spatio-temporal models
Table 3.4: DICs of the seven models
Table 3.5: Posterior median (95% CI) of the shared temporal components and differential temporal components
Table 3.6: Posterior median (95% CI) of the RRs for SES quintile
Table 3.7: Correlations between the posterior median RRs using model 2 with two different types of hyperpriors
Table 4.1: Information for roads
Table 4.2: Count of the facilities with varied numbers of ambulances
Table 5.1: Numbers of demand objects in 45 SASDRs
Table 5.2: Numbers of demand objects in all demand representations for comparison
Table 5.3: Minimum numbers of facilities reported by models for covering 100% demand
Table 5.4: Cost and optimality errors between grid-point-based demand representations and SASDRs
Table 5.5: Cost and optimality errors between grid-rectangle-based demand representations and SASDRs

LIST OF FIGURES

Figure 1.1: GIS functions and GIS applications in public health
Figure 1.2: Logical structure of the dissertation research
Figure 2.1: Graph-based representation of a region map
Figure 2.2: Population 2000 by counties in GA in the United States
Figure 2.3: Locations of simulated clusters: (a) circular shape (b) linear shape (c) trifurcate shape
Figure 2.4: Estimated average power of four spatial scan methods
Figure 2.5: Average KIAs of four spatial scan methods
Figure 2.6: SIRs and the detected cluster of lung cancer incidence in GA,
Figure 3.1: Population density by census tract and the 10 most populous cities in Georgia
Figure 3.2: Quintile map of SES in Georgia
Figure 3.3: Maps of crude standardized incidence ratios (SIRs) by race and sex during
Figure 3.4: Maps of the posterior median RRs for white males in each time period
Figure 3.5: Maps of the posterior median RRs for white females in each time period
Figure 3.6: Maps of the posterior median RRs for black males in each time period
Figure 3.7: Maps of the posterior median RRs for black females in each time period
Figure 3.8: Maps of elevated RR frequency by race and sex during
Figure 3.9: Maps of the posterior median of the shared spatial component and differential spatial components
Figure 4.1: Illustration of three demand types: unallocated demand (d_a and d_b), covered allocated demand (d_c), and uncovered allocated demand (d_d)
Figure 4.2: Example of the SASDR with circular facility service area (a) demand space U (the square) and two potential service areas S_1 and S_2 (the circles) (b) four demand objects in the SASDR result of demand space U partitioned by service areas S_1 and S_2
Figure 4.3: Population density of Georgia EMS Region 10 (study area) by census block group and existing ambulance facility locations
Figure 4.4: Road network in EMS Region 10 in GA
Figure 4.5: Eight-minute service areas (non-white polygons) of all potential ambulance facility sites (red points) based on the road network
Figure 4.6: SASDR result for the study area with demand (population) distribution
Figure 4.7: Results of the MCMCLP models siting 58 ambulances in 82 potential facility locations with w= (the facility location is rendered in the same color as its allocation area) (a) the MCMCLP-NFC model (b) the MCMCLP-FC model with 20 facilities
Figure 5.1: Examples of spatial demand representations with (a) census blocks or their centroids, and (b) rectangle grid or its centroids
Figure 5.2: Illustration of overlay operation A B: (a) set A and set B (b) the result from A B
Figure 5.3: The SASDR with circular facility service area: (a) demand space U and two potential service areas S_1 and S_2, (b) the partition of demand space U with service area S_1, and (c) the partition of demand space U with both service areas S_1 and S_2
Figure 5.4: Three modes of potential facility sites: (a) regular grid points with spacing R, (b) centroids of census blocks, and (c) intersections of major roads
Figure 5.5: Examples of grid-point-based and grid-rectangle-based demand representations for comparison with SASDR
Figure 5.6: Relationship between Site-Service Index and demand object density in SASDR with circular service coverage
Figure 5.7: Percentages of covered demand reported by the MCLP models with 3 types of demand representations when the configuration of potential facility sites include: (a) 66 grid points, (b) 272 grid points (c) 66 block centroids, and (d) 272 block centroids

CHAPTER 1

INTRODUCTION AND LITERATURE REVIEW

1.1 Background

Because all fields change over time, debate continues over the definitions and scopes of subfields such as medical geography, health geography, and spatial epidemiology (Brown et al. 2010). However, it cannot be denied that researchers in health, geography, and other fields are paying more and more attention to the geographic component of health, i.e., the question of where. Where are populations at risk? Where are hotspot areas with elevated disease risks? Where can we intervene to eliminate or reduce disease risks? Where can we locate healthcare facilities to improve health services delivery? Geographic information systems (GIS), which were originally used within the formal discipline of geography, are increasingly recognized as an effective and efficient tool to deal with these geographic questions in research and practice in epidemiology and public health (Rushton 2003, Najafabadi 2009, Nykiforuk and Flaman 2011, Cromley and McLafferty 2012).

In fact, more than 150 years ago, early public health professionals learned that maps could be used to explore patterns of diseases and relationships between diseases and risk factors. In 1840, Robert Cowan used a map to show the relationship between fever and overcrowding in Glasgow (Melnick 2002). The famous story of John Snow, one of the fathers of modern epidemiology, is often used in current textbooks on epidemiology, disease mapping, and GIS to illustrate one of the first uses of a map to identify a disease source (Melnick 2002, Koch 2005, Longley et al. 2005). In 1854, John Snow plotted a map showing the cholera deaths in the Soho district of London, by which he demonstrated the association between these deaths and contaminated water supplies from a public water pump in the center of the outbreak.

Since the development of the first real GIS, the Canada Geographic Information System, in the mid-1960s, the functions of GIS have expanded rapidly and improved greatly, building on advances in computer science, cartography, computational geometry, and spatial statistics. Cromley and McLafferty (2012) define GIS as computer-based systems for the integration and analysis of geographic data. They classify GIS functions into three broad categories based on what people want to do with spatial data: 1) spatial database management; 2) visualization and mapping; and 3) spatial analysis. In the past, GIS was regarded as a technology, as discussed above. Nowadays, GIS carries multiple labels, such as GIS software, GIS data, GIS community, and doing GIS (Longley et al. 2005). Goodchild (1992) coined the term GIScience to refer to the research field concerned with the fundamental principles and questions underlying the use of GIS as a technology.

Nykiforuk and Flaman (2011) reviewed GIS applications in public health and classified them into four content categories, in order of descending prevalence in the literature: disease surveillance, risk analysis, health access and planning, and community health profiling. Disease surveillance is the compilation and tracking of data on the incidence, prevalence, and spread of disease (Wall and Devine 2000). Cluster detection, disease mapping, and disease modeling are several interrelated components of disease surveillance. Cluster detection is an analysis process that aims to identify hotspot areas with elevated disease risks. Disease mapping is used to understand the distribution of disease or disease risk in the past or present. Disease modeling extends disease mapping to identify factors associated with disease risks in order to predict the future spread of disease. These components of disease surveillance, which are important for disease prevention and control, can be conducted in spatial or spatio-temporal dimensions. Risk analysis includes some aspect(s) of risk assessment, management, communication, or monitoring relative to impacts on health (Nykiforuk and Flaman 2011). Health access and planning evaluates and improves health services delivery. Community health profiling is the compilation and mapping of information regarding the health of a population in a community. These four categories overlap. For example, risk analyses could also be conducted in a disease mapping application.

Figure 1.1 shows GIS functions and GIS applications in public health based on the classifications of Cromley and McLafferty (2012) and Nykiforuk and Flaman (2011) discussed above. It is impossible to describe completely all GIS functions and how they can be used in public health studies, because the use of GIS functions is usually application-dependent and both GIS and health studies keep evolving. Here, we only briefly list several aspects to show how GIS can greatly facilitate health studies, including population estimation, data integration, exposure assessment, healthcare access evaluation, and communication.

(1) Population estimation

It is important for health studies to understand the distribution of a population at risk. Because of the economic and social processes that structure residential development, age, sex and race-ethnicity of the population are usually not uniform throughout the region of settlement (Cromley and McLafferty 2012). GIS makes it possible to view residential distributions in great detail. In addition to residence, GIS can help to model people's activity in space and their migration processes to understand the exposures people have experienced, which is important for studies of diseases with a long latency period, such as cancers. When population data are not available for some regions or time periods, GIS can be used to interpolate or model the population from the data available for other regions or time periods.

Figure 1.1. GIS functions and GIS applications in public health. [Diagram: GIS functions comprise the spatial database (store, join, query, edit, delete), visualization and mapping (tables, graphs, maps, statistics), and spatial analysis (measurement, topological analysis, network analysis, surface analysis, spatial statistics). Public health studies comprise disease surveillance (cluster detection, disease mapping, disease modeling), risk analysis (assessment, management, communication, monitoring), health access and planning (market segmentation, client catchment areas, market utilization, location-allocation modeling), and community health profiling (mapping health and setting variables in a community; multilevel, ecological links between people and settings).]

(2) Data integration

The strong spatial data management capability of GIS makes it easy to integrate multiple geographic datasets on health outcomes and on environmental, socioeconomic, and behavioral factors based on geographic information (location). These spatial data may be collected by different local, state, or federal agencies, public and private, using different devices or technologies. Linking all of these data gives a more comprehensive context for the disease of interest, which is essential for identifying relationships between diseases and all kinds of factors and for developing etiological hypotheses.

(3) Exposure assessment

Accurate estimation and mapping of exposures is clearly vital if valid inferences are to be drawn either about the spatial distribution of risk factors, or about their geographic relationship with health outcome (Elliott et al. 2000). Suitable measures, such as biomarkers, tend to be costly and invasive. Therefore, especially for population-based research, it is common to estimate exposure based on environmental monitoring data, such as air pollutant concentrations, or using proxy measures of exposure, such as distance from source. These indirect methods can be easily conducted in GIS using interpolation methods and measuring functions.

(4) Healthcare access evaluation

Evaluating the current status of health service delivery is important for health policy making and the utilization of resources. The network analysis functions in GIS provide convenient ways to calculate client catchment areas of healthcare facilities and the shortest distance from a population to healthcare facilities. Some measures of healthcare accessibility, such as the two-step floating catchment area method (2SFCA) for assessing the local availability of services in relation to population need (Luo and Wang 2003), can easily be implemented in GIS using join and sum functions.

(5) Communication

Preparing and displaying maps of health information are among the most important functions of public health GIS (Cromley and McLafferty 2012). By portraying the results of analysis on a map, GIS technology gives communities an easily understandable visual picture of community health (Melnick 2002). Maps are recognized as one of the most important communication tools among researchers, decision makers, and the public. With the development of Internet GIS, health information can be quickly published through interactive web mapping to anyone with access to the Internet (Theseira 2002, Boulos 2003, Boulos 2005).

Based on the above examples of GIS applications in health, we can see that GIS can be used as a natural and effective means to approach a variety of program, policy, and planning issues in health promotion and public health (Nykiforuk and Flaman 2011).

1.2 Research Objectives

The overarching research question of this dissertation asks how GIS and spatial analysis can be used to facilitate public health studies. Understanding health status and then providing health care services effectively and efficiently are necessary to promote public health. Therefore, this research involves three aspects of health studies related to health surveillance and health service planning: spatial disease cluster detection, spatio-temporal disease mapping, and optimal siting of health facilities. The first two are both techniques used to describe the distribution of a disease. Spatial disease cluster detection quickly identifies hotspot areas with elevated risks. Usually, it requires only health outcome data and basic population data. It is very useful for health departments that maintain surveillance of disease outbreaks. However, it cannot provide detailed information on the spatial patterns of disease risks within hotspot areas and other areas of interest. Spatio-temporal disease mapping can complement cluster detection analysis. It can provide the spatio-temporal patterns of disease risks across the whole study area and time period. These health patterns can be linked to all kinds of factors to develop etiological hypotheses.

Knowing the patterns of disease risks is not the end. The goal of health study is to prevent and control the spread of disease and to promote public health. Given the patterns of disease risks obtained from disease mapping analyses, we can easily identify areas with high health service needs. Then, based on the spatial distribution of those needs, health services can be planned more effectively and efficiently.

This dissertation research includes three main objectives, each of which addresses an important problem in one of the three aspects of health studies by developing new methods or models that are implemented with GIS and spatial analysis. More specifically, these three objectives are:

(1) To develop a new method to detect disease clusters in arbitrary shapes with higher statistical power and more accurate geographic boundaries;

(2) To develop hierarchical Bayesian models to explore the spatio-temporal patterns of lung cancer incidence risks by race and sex in Georgia (2000-2007) at a fine spatio-temporal scale;

(3) To develop a new location-allocation model to optimally site ambulances so that emergency medical services (EMS) can be delivered more effectively and efficiently.

In the study of the location-allocation model for health service planning, a sub-problem, spatial demand representation, is worth discussing because it is closely related to modeling errors and problem complexity. Therefore, this dissertation research also empirically compares three existing spatial demand representation approaches to offer guidance on choosing an appropriate one for a specific application. Figure 1.2 shows the logical structure of the dissertation research.

Figure 1.2. Logical structure of the dissertation research. [Diagram: GIS for public health studies has two components, health surveillance and health service planning. Health surveillance comprises spatial disease cluster detection and spatio-temporal disease mapping; health service planning comprises optimal siting of health facilities, with spatial demand representation as a sub-problem. The corresponding research topics are a new method for detection of clusters with irregular shapes, spatio-temporal Bayesian models for Georgia lung cancer mapping at fine scales, a new location-allocation model for ambulance siting, and a comparison of three spatial demand representations.]

1.3 Literature Review

Detection of Irregular Disease Clusters

Detection of disease clusters in time, space, or space-time has generated considerable interest within the disciplines of geography and public health for many decades (Besag and Newell 1991, Maheswaran and Craglia 2004, Lawson 2006). The shape of the geographic area of a true disease cluster may be arbitrary. For example, air pollution diffusing from an incinerator may cause an arbitrarily shaped disease cluster because of wind strength and direction. To detect clusters with irregular shapes, several methods have been proposed (Duczmal and Assunção 2004, Tango and Takahashi 2005, Aldstadt and Getis 2006, Duczmal et al. 2006, Kulldorff et al. 2006, Yiannakoulias et al. 2007, Duczmal et al. 2008, Duczmal et al. 2009, Cançado et al. 2010). The search for methods that detect irregularly shaped clusters with higher statistical power and more accurate geographic boundaries remains an active topic in current health research.

Spatio-temporal Mapping of Disease Risks

Lung cancer is not only the second most commonly diagnosed cancer in men and women, but also the leading cause of cancer-related death in Georgia (Georgia Department of Public Health 2008). However, as far as we know, lung cancer studies in Georgia are very few, and most of them focus on descriptive analyses using crude rates at a coarse spatio-temporal scale, such as 5-year incidence rates at the health district or county level. Such analyses are not useful for assessing the health of diverse communities, and could introduce inferential biases into etiological hypotheses. In addition, they can only provide limited help for healthcare performance assessment and health policy making to improve the efficiency of interventions and the distribution of resources.

The low reliability of disease rates for small population areas is one of the challenges for mapping disease risk at a fine spatio-temporal scale. Recently, hierarchical Bayesian models have been widely used to map disease risk spatially or spatio-temporally to overcome or mitigate the small number problem (Bernardinelli et al. 1995, Waller et al. 1997, Xia and Carlin 1998, Knorr-Held 2000, Mollié 2001, Wakefield et al. 2001, Best et al. 2005, Richardson et al. 2006, Abellan et al. 2008, Lawson 2009, Fortunato et al. 2011). When mapping one disease for multiple population groups, or multiple diseases that have common risk factors, a joint modeling framework can be used (Knorr-Held and Best 2001, Held et al. 2005, Richardson et al. 2006, Downing et al. 2008). In this modeling framework, a set of shared random components exists in each model.

Capacitated Maximal Covering Location Problems

Given a covering standard for a service, such as a distance or travel-time maximum, the objective of the maximal covering location problem (MCLP) is to locate a fixed number of facilities to provide the service to cover as many demands as possible. MCLP modeling, since being put forward by Church and ReVelle (1974), has been a powerful and widely used tool in many planning processes to optimally distribute limited resources to maximize social and economic benefits. Chung et al. (1983) and Current and Storbeck (1988) published two early papers dealing with capacitated versions of the MCLP, in which the demand allocated to a facility cannot exceed the capacity of that facility. In all capacitated MCLP models, only one fixed capacity level of the facility is considered for each potential facility site. However, many situations arise where each potential facility site could have several possible maximum capacity levels for a facility to choose.
For example, the capacity limit of an emergency facility (e.g., ambulance base or fire station) can be assumed to be determined by its stationed emergency vehicles (e.g., ambulances or fire trucks). Therefore, varied numbers of emergency vehicles will provide a series of possible maximum capacity levels for the emergency facility to choose.

Spatial Demand Representations

For covering location modeling, it is common to assume that aggregated or continuous spatial demand is concentrated on a set of points or uniformly distributed within areal units. In contrast to the traditional area-based representations that use census units or regular polygons, such as triangles or rectangles, as demand objects, Cromley et al. (2012) proposed a new area-based demand representation that partitions a continuous demand space into a set of least common demand coverage units (LCDCUs) by overlaying the demand coverage areas of potential facility sites. This representation approach, without complicated model formulations, could reduce or eliminate some errors associated with the traditional point-based and area-based representations.

Many covering location models, such as the maximal covering location problem (MCLP), have been proven to be nondeterministic polynomial time (NP)-hard (Megiddo et al. 1981), which means that no algorithm has yet been discovered that solves them in polynomial time in the worst case. The size of a covering location problem is closely related to the demand representation it adopts. Therefore, even if a demand representation approach theoretically reduces or eliminates some representation errors in a problem, it may make the problem difficult, if not impossible, to solve using exact methods in current optimization software. Relying on heuristic algorithms to solve such a complicated problem may introduce other errors into the modeling results. It is worth noting that the complexity of problems associated with demand representations is rarely discussed in the current literature.
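To make the covering objective and the growth in problem size concrete, the following is a minimal Python sketch of the uncapacitated MCLP solved by exhaustive enumeration. It is an illustration under stated assumptions, not the formulation or solver used in this dissertation: the function name solve_mclp_bruteforce, the toy demand weights, and the coverage sets are all hypothetical, and capacity limits are ignored.

```python
from itertools import combinations
from math import comb


def solve_mclp_bruteforce(demand, coverage, p):
    """Exhaustively solve a tiny maximal covering location problem.

    demand   -- dict: demand object id -> demand weight (e.g., population)
    coverage -- dict: candidate site id -> set of demand ids that lie within
                the covering standard (distance or travel time) of that site
    p        -- number of facilities to locate
    Returns (best_sites, covered_demand).
    """
    best_sites, best_covered = (), -1
    # Enumerate every way to choose p sites; feasible only for toy instances.
    for sites in combinations(sorted(coverage), p):
        covered_ids = set().union(*(coverage[s] for s in sites))
        covered = sum(demand[i] for i in covered_ids)
        if covered > best_covered:
            best_sites, best_covered = sites, covered
    return best_sites, best_covered


# Hypothetical toy instance: five demand objects, four candidate sites.
demand = {"d1": 100, "d2": 250, "d3": 80, "d4": 300, "d5": 120}
coverage = {
    "A": {"d1", "d2"},
    "B": {"d2", "d3"},
    "C": {"d4"},
    "D": {"d3", "d5"},
}

sites, covered = solve_mclp_bruteforce(demand, coverage, p=2)
print(sites, covered)  # prints ('A', 'C') 650

# Number of site subsets an exhaustive search would have to examine when
# choosing 20 facilities from 82 candidate sites (the sizes are borrowed from
# the caption of Figure 4.7 purely to show scale).
print(comb(82, 20))
```

Exhaustive search is shown only because it keeps the objective transparent. The number of subsets grows combinatorially with the number of candidate sites, and the work per subset grows with the number of demand objects, which is why the choice of demand representation affects whether an exact method remains practical.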

1.4 Dissertation Structure

The dissertation is organized into six chapters. Chapter 1 is a brief introduction to the background and objectives of the dissertation research, together with a literature review of the topics covered in this dissertation, including the detection of irregular disease clusters, spatio-temporal mapping of disease risks, capacitated maximal covering location problems, and spatial demand representations. The following four chapters are separate papers published in or to be submitted to journals. In Chapter 2, a redesigned spatial scan statistic is proposed to detect disease clusters with irregular shapes. Chapter 3 develops seven hierarchical Bayesian models under separate and joint modeling frameworks to explore the spatio-temporal patterns of lung cancer incidence risks in Georgia (2000-2007) at the census tract level with a two-year temporal unit. Chapter 4 develops modular capacitated maximal covering location problem (MCMCLP) models to optimally site emergency vehicles (e.g., ambulances). In Chapter 5, three spatial demand representation approaches are compared in both representation error and problem complexity using the MCLP as an example. Chapter 6 provides the conclusions of this dissertation and outlines future work.

27 References Abellan, J.J., Richardson, S. & Best, N., Use of space time models to investigate the stability of patterns of disease. Environmental health perspectives, 116 (8), Aldstadt, J. & Getis, A., Using amoeba to create a spatial weights matrix and identify spatial clusters. Geographical analysis, 38 (4), Bernardinelli, L., Clayton, D., Pascutto, C., Montomoli, C., Ghislandi, M. & Songini, M., Bayesian analysis of space time variation in disease risk. Statistics in Medicine, 14 (21 22), Besag, J. & Newell, J., The detection of clusters in rare diseases. Journal of the Royal Statistical Society. Series A (Statistics in Society), 154 (1), Best, N., Richardson, S. & Thomson, A., A comparison of bayesian spatial models for disease mapping. Statistical Methods in Medical Research, 14 (1), 35. Boulos, M.N.K., The use of interactive graphical maps for browsing medical/health internet information resources. International Journal Of Health Geographics, 2 (1), 1. Boulos, M.N.K., Web gis in practice iii: Creating a simple interactive map of england's strategic health authorities using google maps api, google earth kml, and msn virtual earth map control. International Journal Of Health Geographics, 4 (1), 22. Brown, T., Mclafferty, S. & Moon, G. eds A companion to health and medical geography, Chichester, UK: Wiley-Blackwell. Cançado, A.L.F., Duarte, A.R., Duczmal, L.H., Ferreira, S.J., Fonseca, C.M. & Gontijo, E.C.D.M., Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters. International Journal of Health Geographics, 9 (1), 55. Chung, C., Schilling, D. & Carbone, R., Year. The capacitated maximal covering problem: A heuristiced.^eds. Proceedings of the Fourteenth Annual Pittsburgh Conference on Modeling and Simulation, Church, R. & Revelle, C., The maximal covering location problem. Papers in regional science, 32 (1),

Cromley, E.K. & Mclafferty, S.L., GIS and public health, 2nd ed. New York: The Guilford Press.
Cromley, R.G., Lin, J. & Merwin, D.A., Evaluating representation and scale error in the maximal covering location problem using GIS and intelligent areal interpolation. International Journal of Geographical Information Science, 26 (3).
Current, J. & Storbeck, J., Capacitated covering models. Environment and Planning B, 15.
Downing, A., Forman, D., Gilthorpe, M., Edwards, K. & Manda, S., Joint disease mapping using six cancers in the Yorkshire region of England. International Journal of Health Geographics, 7 (1), 41.
Duczmal, L. & Assunção, R., A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Computational Statistics & Data Analysis, 45 (2).
Duczmal, L., Cançado, A.L.F. & Takahashi, R.H.C., Geographic delineation of disease clusters through multi-objective optimization. Journal of Computational & Graphical Statistics, 17.
Duczmal, L., Duarte, A.R. & Tavares, R., Extensions of the scan statistic for the detection and inference of spatial clusters. Scan Statistics.
Duczmal, L., Kulldorff, M. & Huang, L., Evaluation of spatial scan statistics for irregularly shaped clusters. Journal of Computational and Graphical Statistics, 15 (2).
Elliott, P., Wakefield, J.C., Best, N.G. & Briggs, D.J., Spatial epidemiology: Methods and applications. In Elliott, P., Wakefield, J.C., Best, N.G. & Briggs, D.J. eds. Spatial epidemiology: Methods and applications. New York: Oxford University Press.
Fortunato, L., Abellan, J.J., Beale, L., Lefevre, S. & Richardson, S., Spatio-temporal patterns of bladder cancer incidence in Utah ( ) and their association with the presence of toxic release inventory sites. International Journal of Health Geographics, 10 (1), 16.
Georgia Department of Public Health, Cancer program and data summary. Atlanta, GA.

Goodchild, M.F., Geographical information science. International Journal of Geographical Information Systems, 6 (1).
Held, L., Natário, I., Fenton, S.E., Rue, H. & Becker, N., Towards joint disease mapping. Statistical Methods in Medical Research, 14 (1).
Knorr-Held, L., Bayesian modelling of inseparable space-time variation in disease risk. Statistics in Medicine, 19 (17-18).
Knorr-Held, L. & Best, N.G., A shared component model for detecting joint and selective clustering of two diseases. Journal of the Royal Statistical Society: Series A (Statistics in Society), 164 (1).
Koch, T., Cartographies of disease: Maps, mapping, and medicine. Redlands, California: ESRI Press.
Kulldorff, M., Huang, L., Pickle, L. & Duczmal, L., An elliptic spatial scan statistic. Statistics in Medicine, 25 (22).
Lawson, A., Statistical methods in spatial epidemiology, 2nd ed. Chichester, England; Hoboken, NJ: Wiley.
Lawson, A.B., Bayesian disease mapping: Hierarchical modeling in spatial epidemiology. Chapman & Hall/CRC.
Longley, P.A., Goodchild, M.F., Maguire, D.J. & Rhind, D.W., Geographic information systems and science, 2nd ed. John Wiley & Sons, Ltd.
Luo, W. & Wang, F., Measures of spatial accessibility to health care in a GIS environment: Synthesis and a case study in the Chicago region. Environment and Planning B, 30 (6).
Maheswaran, R. & Craglia, M., GIS in public health practice. Boca Raton: CRC Press.
Megiddo, N., Zemel, E. & Hakimi, S.L., The maximum coverage location problem. Northwestern University.

Melnick, A.L., Introduction to geographic information systems in public health. Gaithersburg, Maryland: Aspen Publishers.
Mollié, A., Bayesian mapping of Hodgkin's disease in France. Spatial Epidemiology, 1 (9).
Najafabadi, A.T., Applications of GIS in health sciences. Shiraz E Medical Journal, 10 (4).
Nykiforuk, C.I.J. & Flaman, L.M., Geographic information systems (GIS) for health promotion and public health: A review. Health Promotion Practice, 12 (1).
Richardson, S., Abellan, J. & Best, N., Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer risks in Yorkshire (UK). Statistical Methods in Medical Research, 15 (4), 385.
Rushton, G., Public health, GIS and spatial analytic tools. Annual Review of Public Health, 24.
Tango, T. & Takahashi, K., A flexibly shaped spatial scan statistic for detecting clusters. International Journal of Health Geographics, 4.
Theseira, M., Using internet GIS technology for sharing health and health related data for the West Midlands region. Health & Place, 8 (1).
Wakefield, J., Best, N. & Waller, L., Bayesian approaches to disease mapping. Spatial Epidemiology, 1 (9).
Wall, P.A. & Devine, O.J., Interactive analysis of the spatial distribution of disease using a geographic information system. Journal of Geographical Systems, 2 (3), 243.
Waller, L., Carlin, B., Xia, H. & Gelfand, A., Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association.
Xia, H. & Carlin, B., Spatio-temporal models with errors in covariates: Mapping Ohio lung cancer mortality. Statistics in Medicine, 17 (18).

Yiannakoulias, N., Rosychuk, R.J. & Hodgson, J., Adaptations for finding irregularly shaped disease clusters. International Journal of Health Geographics, 6 (1).

CHAPTER 2

DETECTING DISEASE CLUSTERS IN ARBITRARY SHAPES WITH A REDESIGNED SPATIAL SCAN STATISTIC¹

¹ Yin, P. and Mu, L. To be submitted to Geographical Analysis.

Abstract

Detection and surveillance of spatial disease clusters in arbitrary shapes have generated considerable interest within the disciplines of geography and public health. However, most existing methods have drawbacks such as enormous computing workloads, detection of peculiarly shaped clusters, and the multiple testing problem, among others. In this study, Kulldorff's commonly used circular spatial scan statistic (CSScan) was redesigned to quickly detect spatial disease clusters in arbitrary shapes by using Tango's restricted likelihood ratio as the test statistic combined with Assunção et al.'s dynamic Minimum Spanning Tree (dmst) search strategy. Six cluster models and two non-cluster scenarios were designed, and five hundred replications of each were simulated to test and compare the performance of the redesigned spatial scan statistic method (RSScan) with Tango's method, Assunção et al.'s method, and Kulldorff's CSScan method in detecting statistically significant clusters and identifying the boundaries of clusters. Besides the metric of power, the Kappa Index of Agreement (KIA) was used to indicate the degree of match between a cluster estimate and the true cluster. The results of the performance experiment indicate that the RSScan method with appropriate parameters, which were explored in this study, generally has a higher or similar capability to rapidly detect spatial disease clusters in arbitrary shapes compared with the other three methods. The RSScan method was then applied to detect the cluster of lung cancer in the State of Georgia in the United States for the period of 1998 to 2005. Limitations of the RSScan method are also discussed.

Keywords: Spatial scan statistic, Restricted likelihood ratio, Disease cluster, Arbitrary shape, Dynamic Minimum Spanning Tree

2.1 Introduction

Detection of disease clusters in time, space, or space-time has generated considerable interest within the disciplines of geography and public health for many decades (Besag and Newell 1991, Maheswaran and Craglia 2004, Lawson 2006). Lawson (2006) described a disease cluster as any area within the study region with significantly elevated risk of a particular disease. It is also referred to as a hot-spot cluster. The causes of disease clusters may include the communicability of some diseases; adverse effects from the physical, socioeconomic, or psychosocial environment; lifestyles that are commonly considered harmful to health, such as smoking; and poor accessibility to healthcare (Maheswaran and Craglia 2004). Detecting disease clusters not only aids the analysis of disease etiology, but also enables public health departments to improve their surveillance, distribute funding and other resources, and control possible disease outbreaks.

It is well accepted that the spatial variation of disease incidence is highly related to the background population at risk. For example, the occurrence of a disease may be higher in an urban area than in a rural area only because of the larger population in the urban area. If two cities have the same population size, but the proportion of the population over age 60 is much higher in the first city than in the second, it is not surprising that the incidence of cardiovascular disease is higher in the first city. In addition, the geographic shape of a true disease cluster may be arbitrary. For example, air pollution diffusing from an incinerator may cause an arbitrarily shaped disease cluster because of the wind strength and direction. Therefore, detection of spatial disease clusters should not only take account of the spatial variation of the population at risk, but also be able to capture the arbitrary shapes of disease clusters.

In the following sections, Section 2 briefly reviews several well-known methods for detecting spatial disease clusters. Section 3 proposes a redesigned spatial scan method (RSScan) that uses Tango's (2008) restricted likelihood ratio as the test statistic combined with Assunção et al.'s (2006) dynamic Minimum Spanning Tree (dmst) search strategy to quickly detect spatial disease clusters in arbitrary shapes. Section 4 tests the performance of RSScan with simulated data, which is followed by an application in Section 5 that uses RSScan to detect the cluster of lung cancer in Georgia from 1998 to 2005. Section 6 concludes the paper.

2.2 Existing Methods for Detection of Disease Clusters

Local Moran's I is an index that has been widely used to identify clusters (Anselin 1995, Jacquez and Greiling 2003, Rogerson and Yamada 2009, Goovaerts 2010). However, there are several issues with using Local Moran's I to detect disease clusters. Because Local Moran's I is designed to test the similarity of attribute values between the region of interest and its neighbors, the clusters it detects may not be areas with significantly elevated disease risk. Local Moran's I is incapable of detecting clusters that involve only a single region. Conducting a separate statistical test with Local Moran's I for each region in the study area results in a multiple testing problem: some clusters may be detected just by chance even if the real pattern of disease incidence is random (Rogerson and Yamada 2009). In addition, crude rates, such as the Standardized Incidence Ratio (SIR), are usually used directly as the attribute in Local Moran's I to detect disease clusters (Jacquez and Greiling 2003, Rogerson and Yamada 2009), which may make the test unstable because disease rates are unreliable when the population at risk is small.

Unlike Local Moran's I, Openshaw et al.'s (1987) Geographical Analysis Machine (GAM) is an exploratory and graphical method that can detect clusters with significantly elevated disease risk. A fine regular lattice is laid over the study region, and many circles of various radii are constructed on each lattice point. The number of disease cases in each circle is then counted and compared with the number of cases that would be expected under the null hypothesis that all disease incidences are spatially distributed randomly within the underlying structure of the population at risk. The distribution of the number of cases in each circle under the null hypothesis is generated by Monte Carlo simulation (Dwass 1957); if the null hypothesis is rejected, the corresponding circle is drawn on the map. Finally, an idea of where and how large the disease clusters may be can be obtained by looking at the plotted circles. Each circle is regarded as having a significantly elevated risk. Since there are usually thousands of circles with various radii tested simultaneously, the multiple testing problem and the enormous computational workload need to be addressed.

Turnbull et al. (1990) proposed the Cluster Evaluation Permutation Procedure (CEPP), which tests only the circle with the maximum count of disease cases among all moving circles covering the same predefined population. This method solves the multiple testing problem, but the input threshold, a predefined population, may be hard to determine.

Based on Openshaw et al.'s (1987) and Turnbull et al.'s (1990) methods, Kulldorff and Nagarwalla (1995) developed a circular spatial scan statistic, denoted as the CSScan method in the remainder of this chapter. A circular scan window with various radii is constructed and moved over the study area. The null hypothesis is that the probability of being a case inside the circle, p, is the same as that in the rest of the study region, q. The alternative hypothesis is p > q. Given the number of cases and the population inside and outside the circle, the maximum likelihood ratio between these two hypotheses is selected as the test statistic, which can be derived under two stochastic models, Bernoulli and Poisson (Kulldorff 1997). The circular window with the maximum test statistic is regarded as the most likely cluster. Its significance is then tested using the Monte Carlo testing method (Dwass 1957). The spatial scan statistic λ based on the Poisson model is shown below (Equation 2.1, Kulldorff 1997):

\[
\lambda = \sup_{z \in Z}
\begin{cases}
\left(\dfrac{n(z)}{e(z)}\right)^{n(z)} \left(\dfrac{n - n(z)}{n - e(z)}\right)^{n - n(z)} & \text{if } \dfrac{n(z)}{e(z)} > \dfrac{n - n(z)}{n - e(z)} \\[2ex]
1 & \text{otherwise}
\end{cases}
\qquad \text{(Equation 2.1)}
\]

where sup denotes the supremum (least upper bound), z denotes the zone within the circular scan window, which is included in the zone set Z, n(z) and e(z) denote the actual number of disease cases and the null expected number of cases within the specified zone z, respectively, and n is the total count of disease cases in the study area.

The CSScan method remains one of the most widely used methods for cluster detection, possibly because it addresses the problems of methods such as Local Moran's I, GAM, and CEPP. In addition, the latest version of the tool for this method, SaTScan™, can be easily accessed over the Internet (Kulldorff and Information Management Services Inc. 2010).

Since Kulldorff's CSScan uses a circular window to scan the study region, it has difficulty detecting clusters of irregular shapes. To solve this problem, many methods have been developed that mainly modify the search strategy of the scan window or the construction of the test statistic. Duczmal and Assunção (2004) proposed a simulated annealing search strategy for the detection of arbitrarily shaped spatial clusters. In this method, however, the choice among the four strategies, which have different levels of randomness, for the successor of the current subgraph at each step tends to be arbitrary. Tango and Takahashi (2005) proposed a flexibly shaped spatial scan statistic that exhaustively searches all cluster candidates within a given radius of any area. However, the running time of their algorithm increases exponentially with the search radius. Several penalty parameters have been incorporated into the maximum likelihood ratio function in different methods, either to enable the method to find irregularly shaped clusters, such as the eccentricity penalty in Kulldorff et al. (2006) for elliptical clusters, or to penalize detected clusters that are very irregular in shape, such as the non-compactness penalty in Duczmal et al. (2006) and the non-connectivity penalty in Yiannakoulias et al. (2007). In spite of all these efforts, the methods are still plagued by a large dose of subjectivity in the penalty parameters.

2.3 Redesigned Spatial Scan Method (RSScan)

From the review of existing methods in the previous section, it can be summarized that spatial scan methods mainly consist of two components: a search strategy and a test statistic such as the spatial scan statistic λ. The objective of a spatial scan is to find the zone z that maximizes the test statistic over all zones in the set Z and to identify it as the most likely cluster (Duczmal and Assunção 2004). A search strategy mainly defines the zone set Z and in turn determines the possible shape of a cluster estimate and the running time of the algorithm. A test statistic, combined with the search strategy, determines the performance of the method. In order to rapidly detect arbitrarily shaped spatial disease clusters for count data, and at the same time to address the issues identified in the above-mentioned methods, we redesigned Kulldorff's CSScan method by using Assunção et al.'s (2006) dmst method as the search strategy and Tango's (2008) restricted likelihood ratio as the test statistic in our RSScan method, which are described in the following subsections (2.3.2 and 2.3.1, respectively). Table 2.1 shows the test statistics and search strategies used in four spatial scan methods: our RSScan method, Tango's method, Assunção et al.'s method, and Kulldorff's CSScan method.

Table 2.1. Test statistics and search strategies of four spatial scan methods

Method | Test statistic | Search strategy
RSScan | Tango's restricted likelihood ratio | Assunção et al.'s dmst
Tango's method | Tango's restricted likelihood ratio | Circular scan window
Assunção et al.'s method | Kulldorff's maximum likelihood ratio | Assunção et al.'s dmst
CSScan | Kulldorff's maximum likelihood ratio | Circular scan window

Tango (2008) mentioned that the restricted likelihood ratio could be used with a non-circular scan window, and the latest version of his software, FleXScan v3.1 (Takahashi et al. 2010), released just after this study was finished, allows the restricted likelihood ratio to be combined with his flexible scan method. Nevertheless, the current literature lacks work that tests and discusses this kind of combination. Tango (2008) designed four cluster models to test the statistical power of the restricted likelihood ratio with circular scan windows. However, those models are not sufficient to explain how the restricted likelihood ratio performs as a test statistic in other situations, such as different levels of disease cases in the study area or various cluster shapes. The choice of the screening level α₁ in the restricted likelihood ratio also needs to be explored when it is combined with a non-circular scan window such as the dmst search strategy in our RSScan method.
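As a reference point for the test statistics compared in Table 2.1, the Poisson likelihood ratio of Equation 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `poisson_log_lr` and `most_likely_zone`, the dictionary-based inputs, and the use of the logarithm of λ (which has the same maximizer) are assumptions made here and are not taken from SaTScan or from the original study.

```python
import math

def poisson_log_lr(n_z, e_z, n_total):
    """Log of the Poisson likelihood ratio of Equation 2.1 for one zone.

    Returns 0.0 (log of 1) when the zone shows no excess risk.
    """
    if e_z <= 0 or e_z >= n_total:
        return 0.0
    inside = n_z / e_z
    outside = (n_total - n_z) / (n_total - e_z)
    if inside <= outside:
        return 0.0
    log_lr = n_z * math.log(inside)
    if n_z < n_total:  # skip the 0 * log(0) term when every case is inside
        log_lr += (n_total - n_z) * math.log(outside)
    return log_lr

def most_likely_zone(zones, cases, expected):
    """Return the zone (a set of region ids) with the largest log-likelihood ratio."""
    n_total = sum(cases.values())
    best_zone, best_stat = None, 0.0
    for zone in zones:
        n_z = sum(cases[r] for r in zone)
        e_z = sum(expected[r] for r in zone)
        stat = poisson_log_lr(n_z, e_z, n_total)
        if stat > best_stat:
            best_zone, best_stat = zone, stat
    return best_zone, best_stat

# Hypothetical toy data: three regions with equal expected counts.
cases = {"a": 10, "b": 2, "c": 3}
expected = {"a": 5.0, "b": 5.0, "c": 5.0}
zone, stat = most_likely_zone([{"a"}, {"b"}, {"a", "b"}], cases, expected)
```

Working on the log scale avoids overflow for large case counts; the zone set passed in would come from whichever search strategy is used.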

2.3.1 Test Statistic

It is reasonable to think that not only should a disease cluster be an area of significantly elevated risk as a whole, but also the risks of the individual regions within the cluster should not be very low. Therefore, we adopt the restricted likelihood ratio proposed by Tango (2008) as the test statistic λ_T in our RSScan method (Equation 2.2, Tango 2008):

\[
\lambda_T = \sup_{z \in Z}
\left(\frac{n(z)}{e(z)}\right)^{n(z)}
\left(\frac{n - n(z)}{n - e(z)}\right)^{n - n(z)}
I\!\left(\frac{n(z)}{e(z)} > \frac{n - n(z)}{n - e(z)}\right)
\prod_{i \in z} I(p_i < \alpha_1)
\qquad \text{(Equation 2.2)}
\]

where I(·) is an indicator function. The only difference between Tango's restricted likelihood ratio function (Equation 2.2) and Kulldorff's maximum likelihood ratio function (Equation 2.1) is the product of indicator functions \(\prod_{i \in z} I(p_i < \alpha_1)\), in which α₁ is a screening level specified by users for the risk of any individual region, and p_i is the one-tailed mid-p value of region i under the test of the null hypothesis H₀: E(N_i) = e_i, which is defined as follows (Equation 2.3, Tango 2008):

\[
p_i = \Pr\{N_i \ge n_i + 1 \mid N_i \sim \mathrm{Pois}(e_i)\} + \frac{1}{2} \Pr\{N_i = n_i \mid N_i \sim \mathrm{Pois}(e_i)\}
\qquad \text{(Equation 2.3)}
\]

where N_i is a random variable that denotes the number of disease cases in region i, and n_i and e_i denote the actual number of cases and the null expected number of cases in region i, respectively. In Tango's restricted likelihood ratio function, if the one-tailed mid-p value of a region is less than the prespecified screening level α₁, the region is regarded as being of elevated risk. Otherwise, the region is not considered in the disease cluster estimate. It should be noted that Kulldorff's maximum likelihood ratio is the special case of the restricted likelihood ratio when the screening level α₁ = 1.

Although the problem of noninterpretability in the parameters is addressed and the cluster size is effectively controlled with the restricted likelihood ratio function, the choice of the screening level α₁ is left entirely to users. Tango (2008) provides a guideline regarding the choice of α₁ for a test at the nominal α level of 0.05, and recommends α₁ = 0.2 as a default value. However, this guideline is derived only from test results with four simulated cluster models using a circular scan window. The recommended α₁ value in our RSScan method for detecting clusters in arbitrary shapes is explored in Section 2.4.

2.3.2 Search Strategy

In order to detect arbitrarily shaped clusters and guarantee spatial contiguity, we use a graph G(V, E) to represent a region map, where V is a set of n vertices (each representing a region such as a census tract or county), and E is a set of edges (each connecting a unique pair of adjacent regions) (Figure 2.1).

Figure 2.1. Graph-based representation of a region map
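The screening step of Section 2.3.1 (Equation 2.3 and the indicator I(p_i < α₁)) can be illustrated with a short Python sketch. The helper names `poisson_pmf`, `mid_p_value`, and `screen_regions` and the toy counts are hypothetical; only the standard library is used, and expected counts are assumed to be strictly positive.

```python
import math

def poisson_pmf(k, mu):
    """Pr{N = k} for N ~ Pois(mu), computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def mid_p_value(n_i, e_i):
    """One-tailed mid-p value of Equation 2.3: Pr{N >= n_i + 1} + 0.5 * Pr{N = n_i}."""
    cdf = sum(poisson_pmf(k, e_i) for k in range(n_i + 1))  # Pr{N <= n_i}
    return (1.0 - cdf) + 0.5 * poisson_pmf(n_i, e_i)

def screen_regions(cases, expected, alpha1=0.2):
    """Regions whose mid-p value is below the screening level alpha1."""
    return {r for r in cases if mid_p_value(cases[r], expected[r]) < alpha1}

# Hypothetical toy data: only region "a" has a clear excess of cases.
cases = {"a": 10, "b": 2, "c": 5}
expected = {"a": 5.0, "b": 5.0, "c": 5.0}
kept = screen_regions(cases, expected, alpha1=0.2)
kept_all = screen_regions(cases, expected, alpha1=1.0)
```

With α₁ = 1 every region passes the screen, because the mid-p value is always strictly below 1; this mirrors the remark that Kulldorff's statistic is the special case α₁ = 1.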

The exclusion of low-risk regions in the restricted likelihood ratio function is realized by removing all edges of those regions from the graph. This screening step also reduces the amount of calculation in the algorithm. The final cluster estimate therefore includes only regions that are connected in the graph. Similar to Kulldorff's CSScan method, the RSScan method finds the most likely cluster, the one with the largest value of the test statistic, to address the multiple testing problem.

Assunção et al.'s (2006) dmst method is used as the search strategy in our RSScan method. Given a graph G and an empty collection T, for any vertex u, the steps can be described as follows:

1) Put vertex u into T.
2) Among all the vertices not in T but adjacent to any vertex in T, identify the vertex v whose addition gives T the largest value of the test statistic at the current step, and then put vertex v into T. All vertices currently in T constitute one zone (i.e., a potential cluster) for the scan.
3) Repeat step 2 until all vertices connected to vertex u in graph G have been added to T.

These steps are executed for each vertex that is not isolated in the graph G, which yields the zone set Z; the zone with the maximum test statistic is regarded as the most likely cluster. In order to reduce the computational intensity, a search radius K is set so that at most K-1 nearest neighboring vertices are included in the zones when scanning each vertex.

2.4 Performance Evaluation

2.4.1 Experimental Design

An experiment was designed with six single-cluster models based on simulated data in order to evaluate the performance of the RSScan method. For each cluster model, the disease cluster was first located in the study area, and then a relative risk r > 1 was assigned to the regions within the disease cluster and r = 1 to the remaining regions. Given the total number of disease cases in the study area, the numbers of disease cases in the regions follow a multinomial distribution in which the probability for region i is

\[
\frac{r_i \, p_i}{\sum_{j=1}^{m} r_j \, p_j}
\]

where r_i and p_i are the relative risk and the population at risk in region i, respectively, and m is the total number of regions in the study area. Based on the criterion used by Kulldorff et al. (2003), the relative risk for all regions that constitute a cluster is determined using a one-sided binomial test with a significance level of 0.05 such that the null hypothesis is rejected with probability of when the alternative is a cluster with unknown risk but with known location. This choice of relative risks provides an upper limit of for the power attainable by any test.

Three types of shapes are designed for the simulated cluster models: round, line, and trifurcate. The study area (Figure 2.2) is the State of Georgia (GA) in the United States, which includes 159 counties with a total population of 9,210,790 (year 2000). Three locations in this area (Figure 2.3) are chosen for the simulated clusters. Two levels of disease case numbers are designed: Low (500 cases) and High (5000 cases). Combining the levels of disease cases and the cluster shapes gives six cluster models in total. A code of the form X_Shape is used to label these cluster models, where X indicates the level of disease case numbers, with L for low and H for high. Table 2.2 lists the detailed information for each cluster model. We also simulated a scenario with no cluster for each level of disease case numbers (all regions have a relative risk r = 1) so that the capability of the method to control Type I error could be tested.
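The case-allocation step described above can be sketched as follows. This is a minimal illustration, assuming a dictionary of populations keyed by region id; the function name `simulate_cases`, the toy populations, and the relative risk of 2.0 are hypothetical and do not reproduce the values used in the experiment.

```python
import random
from collections import Counter

def simulate_cases(population, cluster, relative_risk, total_cases, rng):
    """Draw one replication of regional case counts.

    Each of the total_cases cases is assigned to region i with probability
    r_i * p_i / sum_j(r_j * p_j), where r_i is relative_risk inside the
    cluster and 1 elsewhere. Assigning cases one at a time in this way is
    equivalent to a single multinomial draw.
    """
    regions = list(population)
    weights = [(relative_risk if r in cluster else 1.0) * population[r]
               for r in regions]
    draws = rng.choices(regions, weights=weights, k=total_cases)
    counts = Counter(draws)
    return {r: counts.get(r, 0) for r in regions}

# Hypothetical toy study area with three regions; region "a" is the cluster.
population = {"a": 1000, "b": 2000, "c": 3000}
replication = simulate_cases(population, {"a"}, 2.0, 500, random.Random(42))
```

Passing an explicit `random.Random` instance keeps each replication reproducible, which is convenient when 500 replications per model have to be regenerated.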

Figure 2.2. Population 2000 by counties in GA in the United States

Figure 2.3. Locations of simulated clusters: (a) circular shape (b) linear shape (c) trifurcate shape
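Returning to the search strategy of Section 2.3.2, the greedy growth in steps 1 to 3 can be sketched in Python as shown below. This is a simplified illustration rather than Assunção et al.'s implementation: the adjacency dictionary is assumed to have been screened already (edges of low-risk regions removed), `max_size` plays the role of the search radius K, candidates are compared with the Poisson log-likelihood ratio, and all names and toy data are hypothetical.

```python
import math

def log_lr(n_z, e_z, n_total):
    """Poisson log-likelihood ratio of a zone; 0.0 when there is no excess risk."""
    if e_z <= 0 or e_z >= n_total:
        return 0.0
    inside, outside = n_z / e_z, (n_total - n_z) / (n_total - e_z)
    if inside <= outside:
        return 0.0
    stat = n_z * math.log(inside)
    if n_z < n_total:
        stat += (n_total - n_z) * math.log(outside)
    return stat

def grow_zones(adjacency, cases, expected, max_size):
    """Grow a zone greedily from every non-isolated seed and keep the best one."""
    n_total = sum(cases.values())
    best_zone, best_stat = frozenset(), 0.0
    for seed in sorted(adjacency):
        if not adjacency[seed]:  # isolated vertices are skipped, as in the text
            continue
        zone, n_z, e_z = {seed}, cases[seed], expected[seed]
        while True:
            stat = log_lr(n_z, e_z, n_total)
            if stat > best_stat:
                best_zone, best_stat = frozenset(zone), stat
            if len(zone) >= max_size:
                break
            frontier = sorted({v for u in zone for v in adjacency[u]} - zone)
            if not frontier:
                break
            # Step 2: add the neighbor that maximizes the statistic of the grown zone.
            v = max(frontier, key=lambda c: log_lr(n_z + cases[c],
                                                   e_z + expected[c], n_total))
            zone.add(v)
            n_z += cases[v]
            e_z += expected[v]
    return best_zone, best_stat

# Hypothetical toy map: four regions on a line, a - b - c - d.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cases = {"a": 9, "b": 8, "c": 2, "d": 1}
expected = {"a": 5.0, "b": 5.0, "c": 5.0, "d": 5.0}
zone, stat = grow_zones(adjacency, cases, expected, max_size=4)
```

Each seed produces a nested sequence of zones, so the number of zones evaluated grows roughly with the number of regions times the search radius rather than exponentially.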

Table 2.2. Information of simulated cluster models

Cluster ID | Cluster code | Shape type | Count of cases | Population in cluster | Cluster size (count of counties) | Relative risk
1 | L_Round | Round | 500 | ,802,970 | 7 |
2 | H_Round | Round | 5000 | ,802,970 | 7 |
3 | L_Line | Line | 500 | ,721,370 | 5 |
4 | H_Line | Line | 5000 | ,721,370 | 5 |
5 | L_Tri | Trifurcate shape | 500 | | |
6 | H_Tri | Trifurcate shape | 5000 | | | 1.33

For each type of cluster and non-cluster scenario, 500 replications were simulated, each of which has the same cluster location and the same total number of disease cases over the whole study area but different numbers of disease cases in each region. The nominal significance level was set to 0.05, which means that clusters with p-values larger than 0.05 are considered not significant. The Monte Carlo testing method (Dwass 1957) with 999 repetitions was used to test the significance of the observed test statistic, so the p-value can be calculated from the rank of the observed test statistic among the total of 1000 tests. In order to explore the effect of the screening level α₁ in the restricted likelihood ratio function, five different values were set: 0.05, 0.1, 0.2, 0.3, and 0.4. Since the RSScan method is a hybrid of Tango's (2008) method and Assunção et al.'s (2006) method, these two methods were chosen for comparison in the experiment. Because Kulldorff's CSScan method is probably the most widely used method for detecting spatial clusters, it was also added to the comparison. An upper limit of 20% of the population in the study region was set for the circular scan window in the CSScan method, and the search radius K in the other three methods was correspondingly set to 30 counties.

2.4.2 Experimental Results

Power is the most important evaluation criterion for cluster detection tests; it indicates how effective methods are in identifying the presence of statistically noteworthy clusters (Kulldorff et al. 2003, Tango and Takahashi 2005, Assunção et al. 2006, Tango 2008). In order to understand how well these methods identify the correct boundaries of a cluster, the Kappa Index of Agreement (KIA, De Smith et al. 2007) is chosen as a complementary metric to power in this study, since it not only shows the degree of match between the detected cluster estimates and the true clusters, but also excludes the probability that the cluster regions are detected by chance. In this way, the KIA reduces the impact on the evaluation of differences in cluster model properties, such as study region size and cluster size. In order to easily compare the performance of different methods, or of different screening level values in RSScan and Tango's method, the results of the six cluster models were averaged by level of disease cases and by cluster shape.

Estimated Power of Methods

The power in this study is defined as the ratio of the number of statistically significant clusters detected (significance level = 0.05) to the number of replications for each cluster model (500). The results of the power analysis for the four spatial scan methods are shown in Table 2.3. The highest value for each scenario (column in the table) is in bold. The test statistics in Assunção et al.'s method and the CSScan method can be regarded as the restricted likelihood ratio with α₁ = 1. All four methods have higher power to detect significant clusters with the lower level of disease cases (L_Cas) than with the higher level of disease cases (H_Cas). As α₁ increases from 0.05 to 0.4, the RSScan method detects significant clusters more easily in the order of linear (Line), round (Round), and trifurcate (Tri) shapes, while Tango's method detects them more easily from linear (Line) to round (Round) shapes but has more difficulty with trifurcate clusters (Tri), whatever the value of α₁. Assunção et al.'s method and the CSScan method both have their highest power for trifurcate clusters (Tri). However, Assunção et al.'s method has more difficulty detecting significant round clusters (Round), while the CSScan method has its lowest power for linear clusters (Line).

Table 2.3. Estimated power of four spatial scan methods (significance level = 0.05). Columns: number of cases (H_Cas, L_Cas), cluster shape (Line, Round, Tri), and average. Rows: RSScan and Tango's method at α₁ = 0.05, 0.1, 0.2, 0.3, and 0.4; Assunção et al.'s method and CSScan at α₁ = 1.

Figure 2.4 shows the estimated average power of each method over all scenarios. The figure shows that Assunção et al.'s method has the highest average power (0.876) among the four methods for clusters with any level of disease cases and any type of shape. The RSScan method has good power, especially when α₁ is large, such as 0.4 (0.835). The CSScan method has relatively low power (0.789), and Tango's method has the lowest power whatever the value of α₁.

Figure 2.4. Estimated average power of the four spatial scan methods (x-axis: screening level α₁; y-axis: power; series: RSScan, Assunção's, Tango's, CSScan)

Kappa Index of Agreement

In order to evaluate the agreement between the most likely cluster detected and the true cluster, and thus to understand how well these methods identify the correct boundaries of a cluster, KIA was used as another metric of the performance of the four methods. One advantage of KIA is that it excludes the probability that cluster regions are detected merely by chance. There are two categories of regions: inside the cluster and outside the cluster. Given the study area size (S), the true cluster size (T), the detected cluster estimate size (D), and the size of the intersection between the cluster estimate and the true cluster (I), Table 2.4 shows the contingency table for detected cluster estimates and true clusters.

Table 2.4. Contingency table for detected cluster estimates and true clusters

True cluster | Cluster estimate: inside cluster | Cluster estimate: outside cluster | Total
Inside cluster | I | T-I | T
Outside cluster | D-I | S-T-D+I | S-T
Total | D | S-D | S

Based on the above contingency table, the KIA equation can be derived for this study (Equation 2.4):

\[
\kappa = \frac{O - E}{1 - E}, \qquad
O = \frac{I + (S - T - D + I)}{S}, \qquad
E = \frac{D \cdot T + (S - D)(S - T)}{S^{2}}
\qquad \text{(Equation 2.4)}
\]

where O is the observed proportion of matching values (the contingency table diagonal) and E is the expected proportion of matches in this diagonal assuming that the two categories of the true cluster are independent of the two categories of the cluster estimate. KIA is at most 1, which indicates perfect agreement; a value of 0 indicates agreement no better than chance.

With the highest KIA value for each scenario (column in the table) in bold, Table 2.5 indicates that all methods perform better or similarly in identifying the correct boundaries of a cluster when there is a relatively low level of disease cases in the study region (L_Cas). As α₁ increases from 0.05 to 0.4, both RSScan and Tango's method are good at identifying the boundaries of clusters in shapes ranging from line (Line) to round (Round). The boundaries of trifurcate clusters (Tri) are difficult for both methods to identify correctly. Assunção et al.'s method is relatively better for trifurcate clusters (Tri) than for other shapes, and the CSScan method is good for round clusters (Round). Figure 2.5 shows the average KIA value of each method over all scenarios. The figure indicates that the RSScan method performs better than the other three methods in detecting the boundaries of clusters in various shapes and peaks when α₁ is 0.2 (0.614). The performance of Tango's method peaks when α₁ is 0.4, where it has a KIA value similar to that of the CSScan method (about 0.47). Assunção et al.'s method has a relatively low KIA value (0.435), possibly because many low-risk regions are included in its cluster estimates.
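Equation 2.4 is straightforward to compute from the four sizes in Table 2.4. The following Python sketch is illustrative only; the function name `kappa_index` and the example sizes are assumptions, and the degenerate case E = 1 (for example, when both the true cluster and the estimate are empty) is not handled.

```python
def kappa_index(S, T, D, I):
    """Kappa Index of Agreement of Equation 2.4.

    S: number of regions in the study area
    T: size of the true cluster
    D: size of the detected cluster estimate
    I: size of the intersection of the estimate and the true cluster
    """
    observed = (I + (S - T - D + I)) / S
    expected = (D * T + (S - D) * (S - T)) / S ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical examples on a 159-region study area with a 7-region true cluster.
perfect = kappa_index(159, 7, 7, 7)   # estimate coincides with the true cluster
partial = kappa_index(159, 7, 10, 5)  # estimate overlaps 5 of the 7 true regions
disjoint = kappa_index(159, 7, 7, 0)  # estimate misses the true cluster entirely
```

The disjoint case yields a slightly negative value, which shows that an estimate can agree with the truth less often than chance would predict.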

Table 2.5. KIAs between the most likely clusters and true clusters for four spatial scan methods (methods by screening level: RSScan and Tango's at α₁ = 0.05, 0.1, 0.2, 0.3, and 0.4, Assunção's and CSScan at α₁ = 1; scenarios: number of cases H_Cas and L_Cas, cluster shapes Line, Round, and Tri; Average)

Figure 2.5. Average KIAs of four spatial scan methods (KIA plotted against screening level α₁ for RSScan, Tango's, Assunção's, and CSScan)

Non-cluster Scenario Results

For the non-cluster scenario, Table 2.6 shows that, on average, all methods detected clusters in about 5% of the 500 non-clustered replications. Given the significance level of 0.05 used for these tests, the results indicate that all methods control the Type I error well.

Table 2.6. Average Type I error of four spatial scan methods (RSScan, Tango's, Assunção's, and CSScan, by screening level α₁)

Application: Georgia Lung Cancer, 1998-2005

Based on the above experimental results, the RSScan method with an appropriate screening level α₁ was found to usually have a higher capability than the other three methods to detect significant clusters and identify the boundaries of clusters in arbitrary shapes; 0.2 could be recommended as the default α₁ value. We use the RSScan method to detect the cluster of lung cancer diagnosed in GA in the period 1998-2005. The health data from the Georgia Comprehensive Cancer Registry show that the lung cancer cases in GA from 1998 to 2005 total 42,521, of which 25,615 are male cases and 16,906 are female cases. The expected number of cases for county i is calculated based on the GA population in 2000 (Figure 2.2) and adjusted by age and sex. Figure 2.6 shows the standardized incidence ratio (SIR) for each county in GA and the cluster detected using the RSScan method with screening level α₁ = 0.2. The detected cluster is located in northwestern GA and includes a total of 8 counties: Bartow, Gordon, Haralson, Murray, Polk, Walker, Whitfield, and Paulding. The p-value of the cluster is 0.002, and a total of 3,177 cases occurred within the cluster area during that time. The SIR of the cluster is

Figure 2.6. SIRs and the detected cluster of lung cancer incidence in GA, 1998-2005

Discussion and Conclusions

It should be noted that the performance of the RSScan method and of the other three methods varies under different situations, such as the number of disease incidence cases and the cluster shape. This finding corresponds well with the power analysis given by Waller and Gotway (2004), which shows that most tests to detect clusters have spatially heterogeneous power. The high estimated power in the experiment indicates that these methods could be competent in exploratory studies that flag questionable areas for further study. However, the relatively low KIA values indicate that these methods may be inappropriate for applications that require accurate cluster boundaries, such as the analysis of the change of spatial clusters over time. To gain deeper insight into the spatio-temporal disease risk pattern, disease risk modeling, such as spatio-temporal multilevel models, may be a better approach.

Tango's restricted likelihood ratio has good interpretability and strong power in detecting disease clusters with a circular scan window (Tango 2008). To our knowledge, however, no previous work discusses its performance in detecting clusters in arbitrary shapes with other search strategies. For the first time, this study implements and tests the restricted likelihood ratio combined with Assunção et al.'s dMST search strategy to quickly detect disease clusters in arbitrary shapes. To understand the performance of this redesigned hybrid method in various situations, more cluster models than in Tango (2008) and Assunção et al. (2006) were designed for this performance test, which includes six cluster models and two non-cluster scenarios. These cluster models consider different numbers of disease cases in a study area and various cluster shapes. The choice of the screening level α₁ in the restricted likelihood ratio is also explored when it is combined with Assunção et al.'s dMST search strategy in the RSScan method. Besides the metric of power, this study proposes using KIA to evaluate and compare how well cluster detection methods identify the boundaries of clusters, in order to avoid effects due to the different properties of the cluster models.
Finally, the RSScan method was applied to detect the cluster of lung cancer in Georgia between 1998 and 2005.

The experimental results indicate that the RSScan method with an appropriate screening level α₁ generally has a higher or similar capability to quickly detect statistically significant disease clusters and identify the boundaries of clusters compared with Tango's method, Assunção et al.'s method, and Kulldorff's CSScan method under the same situation, especially for clusters in irregular shapes. Based on the results of this study, 0.2 is recommended as a default for the screening level α₁ in the RSScan method.

References

Anselin, L., Local indicators of spatial association - LISA. Geographical Analysis, 27 (2).
Assunção, R., Costa, M., Tavares, A. & Ferreira, S., Fast detection of arbitrarily shaped disease clusters. Statistics in Medicine, 25 (5).
Besag, J. & Newell, J., The detection of clusters in rare diseases. Journal of the Royal Statistical Society. Series A (Statistics in Society), 154 (1).
De Smith, M., Goodchild, M. & Longley, P., Geospatial analysis: A comprehensive guide to principles, techniques and software tools. Troubador Publishing.
Duczmal, L. & Assunção, R., A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Computational Statistics & Data Analysis, 45 (2).
Duczmal, L., Kulldorff, M. & Huang, L., Evaluation of spatial scan statistics for irregularly shaped clusters. Journal of Computational and Graphical Statistics, 15 (2).
Dwass, M., Modified randomization tests for nonparametric hypotheses. Annals of Mathematical Statistics, 28 (1).
Goovaerts, P., Geostatistical analysis of county level lung cancer mortality rates in the southeastern United States. Geographical Analysis, 42 (1).
Jacquez, G. & Greiling, D., Local clustering in breast, lung and colorectal cancer in Long Island, New York. International Journal of Health Geographics, 2 (1), 3.
Kulldorff, M., A spatial scan statistic. Communications in Statistics - Theory and Methods, 26 (6).
Kulldorff, M., Huang, L., Pickle, L. & Duczmal, L., An elliptic spatial scan statistic. Statistics in Medicine, 25 (22).

Kulldorff, M. & Information Management Services Inc., SaTScan v9.1: Software for the spatial and space-time scan statistics.
Kulldorff, M. & Nagarwalla, N., Spatial disease clusters - detection and inference. Statistics in Medicine, 14 (8).
Kulldorff, M., Tango, T. & Park, P.J., Power comparisons for disease clustering tests. Computational Statistics & Data Analysis, 42 (4).
Lawson, A., Statistical methods in spatial epidemiology, 2nd ed. Chichester, England; Hoboken, NJ: Wiley.
Maheswaran, R. & Craglia, M., GIS in public health practice. Boca Raton: CRC Press.
Openshaw, S., Charlton, M., Wymer, C. & Craft, A., A mark 1 geographical analysis machine for the automated analysis of point data sets. International Journal of Geographical Information Systems, 1 (4).
Rogerson, P. & Yamada, I., Statistical detection and surveillance of geographic clusters. Boca Raton: CRC Press.
Takahashi, K., Yokoyama, T. & Tango, T., FleXScan v3.1: Software for the flexible scan statistic.
Tango, T., A spatial scan statistic with a restricted likelihood ratio. Japanese Journal of Biometrics, 29 (2).
Tango, T. & Takahashi, K., A flexibly shaped spatial scan statistic for detecting clusters. International Journal of Health Geographics, 4.
Turnbull, B.W., Iwano, E.J., Burnett, W.S., Howe, H.L. & Clark, L.C., Monitoring for clusters of disease - application to leukemia incidence in upstate New York. American Journal of Epidemiology, 132 (1), S136-S143.
Waller, L. & Gotway, C., Applied spatial statistics for public health data. Wiley-Interscience.

Yiannakoulias, N., Rosychuk, R.J. & Hodgson, J., Adaptations for finding irregularly shaped disease clusters. International Journal of Health Geographics, 6 (1).

CHAPTER 3

HIERARCHICAL BAYESIAN MODELING OF THE SPATIO-TEMPORAL PATTERNS OF LUNG CANCER INCIDENCE RISKS IN GEORGIA, 2000-2007

Yin, P., Mu, L., Madden, M. and Vena, J. To be submitted to International Journal of Health Geographics.

Abstract

Lung cancer is the second most commonly diagnosed cancer in men and women in Georgia. However, studies of the patterns of lung cancer in Georgia at a fine spatio-temporal scale are very limited. In this study, hierarchical Bayesian models are used to explore the spatio-temporal patterns of lung cancer incidence risks by race and sex in Georgia for the period 2000 to 2007. With the census tract as the spatial scale and two-year aggregation as the temporal scale, we propose and compare a total of seven Bayesian spatio-temporal models, including two under the separate modeling framework and five under the joint modeling framework. One of these models is finally chosen, and its results clearly show that the northwest region of Georgia has stably elevated lung cancer incidence risks for all population groups during the study period. By showing more detailed and reliable variations of lung cancer incidence risks in space and time, our study aims to better support assessing healthcare performance, establishing etiological hypotheses, and making effective and efficient health policies. In addition, our study shows strong inverse relationships between socioeconomic status (SES) and lung cancer incidence risk in Georgia males, especially white males, and weak inverse relationships in both white and black Georgia females. The study results are expected to lead to further studies; in particular, the spatial and temporal random effects in the models may provide some implications on potential disease risk factors for further ecological studies. The limitations of this study, including the lack of smoking data and population estimation error, are also discussed at the end.

Keywords: Bayesian model, Spatio-temporal pattern, Lung cancer, Socioeconomic status, Georgia

3.1 Introduction

Lung cancer is not only the second most commonly diagnosed cancer in men and women, but also the leading cause of cancer-related death in Georgia in the United States (Georgia Department of Public Health 2008). However, as far as we know, there are very few lung cancer studies in Georgia, and most of them focus on descriptive analyses using crude rates at a coarse spatio-temporal scale, such as 5-year incidence rates at the health district or county level. Such analytical results usually obscure the detailed variations of lung cancer risks in space and time, and could introduce inferential biases into etiological hypotheses. In addition, they provide only limited help for healthcare performance assessment and for health policy making aimed at improving the efficiency of interventions and the distribution of resources.

The small number problem is one of the challenges in mapping lung cancer risk at a fine spatio-temporal scale. For rare diseases such as cancers, the total counts of cases can become very sparse at fine spatio-temporal scales, especially when more demographic dimensions, such as sex, age, and race, are also considered. When counts are sparse, some traditional estimates of disease risk or relative risk, such as the Standardized Incidence Ratio (SIR), can become unreliable and may give a badly distorted picture of the true disease risk because of high sampling variability.

Recently, hierarchical Bayesian models have been widely used to map disease risk spatially or spatio-temporally (Bernardinelli et al. 1995, Waller et al. 1997, Xia and Carlin 1998, Knorr-Held 2000, Mollié 2001, Wakefield et al. 2001, Best et al. 2005, Richardson et al. 2006, Abellan et al. 2008, Lawson 2009, Fortunato et al. 2011). For sparse count data, combining data fit with subjective prior information allows Bayesian models to mitigate the inferential biases of frequentist methods that depend entirely on data fit.
In addition, it is easy to develop model-based spatial and spatio-temporal smoothing methods under the Bayesian framework that not only consider the effects of disease risk factors, but also borrow strength from neighboring areas and/or time periods.

In this study, we use hierarchical Bayesian models to explore the spatio-temporal patterns of lung cancer incidence risks in Georgia. The analyses are conducted for four population groups stratified by sex and race at the census tract level over four two-year periods from 2000 to 2007. A total of seven spatio-temporal models under two modeling frameworks are proposed and compared. One framework models the relative risks (RRs) of each population group separately, and the other jointly models the RRs of each population group under the assumption that some common disease risk factors exist in all population groups. One model is finally chosen based on a model comparison criterion and its results are interpreted. The aim of the study is to obtain reliable spatio-temporal patterns of lung cancer incidence risks by sex and race in Georgia at a fine scale, which are expected to identify the spatio-temporal hot-spots of disease risk for a specific population group for further study, and to help facilitate related health policy making in Georgia. In addition, the effects of area-based socioeconomic status (SES) on the lung cancer incidence risks in each population group are evaluated in the modeling. Understanding the socioeconomic gradients in lung cancer incidence risks by race and sex could provide some implications on how to reduce lung cancer disparities in Georgia.

This paper is organized as follows. In the next section, the study area and data sources are described. Then, the method for population estimation, the area-based SES measure, and the seven Bayesian spatio-temporal models under the two modeling frameworks are explained. Next, the modeling results and discussions are given, followed by some conclusions.

3.2 Study Area and Data

Our study area is the state of Georgia, with 1,618 census tracts in 2000. Figure 3.1 shows the distribution of population density by census tract in Georgia in 2000. The 10 most populous cities in 2000 are also shown in this map. We can see that the population is mainly concentrated in the north region of Georgia, especially in the metropolitan Atlanta area that includes the cities of Atlanta, Sandy Springs, Roswell, and Marietta. All of the population data and socioeconomic data come from the U.S. Census.

Figure 3.1. Population density by census tract and the 10 most populous cities in Georgia in 2000

The lung cancer data (primary site codes C340-C349 in ICD-O-3) are extracted from the Georgia Comprehensive Cancer Registry (Georgia Department of Public Health 2011). A total of 44,671 lung cancer cases were diagnosed in Georgia from 2000 to 2007. In this study, we only consider the cases among white and black individuals over 20 years old, which total 43,504. A total of 3,219 cases are excluded from the analyses because their spatial accuracy is lower than the census tract level. Therefore, 40,285 cases are finally included and aggregated to the 1,618 census tracts in the geography of Census 2000. Table 3.1 shows the total number of cases of individuals over 20 years old and the percentage of included cases in the analyses by sex and race. We can see that the lowest percentage of included cases is 89.81% for black males.

Table 3.1. Total number of cases of individuals over 20 years old and the percentage of included cases in the analyses by sex and race (columns: total cases and included cases (%) for white and for black individuals; rows: Male, Female)

To avoid a high level of sparseness while keeping the temporal dimension, cases are aggregated to four two-year periods, 2000-2001, 2002-2003, 2004-2005, and 2006-2007, for the analyses. The average number of cases per census tract per two-year period is 2.9 for white males, 2.1 for white females, 0.77 for black males, and 0.48 for black females. The expected numbers of cases by census tract, two-year period, sex, and race are calculated from the reference rates, which are the average age-specific incidence rates by sex and race across the whole of Georgia over the period 2000-2007. In the calculation of the reference rates, a total of 10 age groups are considered: the age groups 20-39 and 40-49, seven five-year age groups from 50-54 to 80-84, and one group of 85 and over.
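The expected-count calculation described above is a form of indirect standardization. A minimal Python sketch follows, assuming that rates are expressed per person-year and that tract populations are supplied as person-years at risk by age group; all function names and numbers are hypothetical.

```python
def reference_rates(cases_by_age, person_years_by_age):
    """Age-specific reference rates (cases per person-year) for one sex-race group."""
    return [c / py for c, py in zip(cases_by_age, person_years_by_age)]


def expected_cases(rates_by_age, tract_person_years_by_age):
    """Expected cases in one tract and period: sum of rate times person-years."""
    return sum(r * py for r, py in zip(rates_by_age, tract_person_years_by_age))


def crude_sir(observed, expected):
    """Crude standardized incidence ratio: observed over expected cases."""
    return observed / expected


# Hypothetical two-age-group example (the study uses 10 age groups).
rates = reference_rates([10, 40], [10_000, 20_000])  # statewide rates
expected = expected_cases(rates, [1_000, 500])       # one tract, one period
print(expected, crude_sir(3, expected))
```

With average counts per tract and period as low as 0.48, a single additional case changes such a ratio substantially, which is the small number problem the Bayesian models are meant to address.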

3.3 Methods

3.3.1 Population Estimation for Intercensal Years

The population at risk is needed to accurately calculate expected cases and estimate disease risk. However, census population data at the tract level are only available for census years (e.g., 2000 and 2010). The geographic boundaries of census tracts also change from census to census. For example, there are a total of 1,618 tracts in Census 2000, while a total of 1,969 tracts exist in Census 2010. At the county level, the Census Bureau (Population Estimates Program 2011) provides estimates of population by race, sex and age for each intercensal year. In this study, the boundaries of census tracts in 2000 are used as the standard geography for the whole study period. With the census population data currently available, one of the interpolation methods proposed by Best and Wakefield (1999) is used to estimate the population by race, sex and age at the census tract level for each intercensal year.

The steps of the population estimation are as follows. First, we use the overlay function in the geographic information system (GIS) ArcGIS™ (ESRI, Inc.) and the areal weighting interpolation method (Goodchild and Lam 1980) to estimate the population in 2010 using the geography of the 2000 census tracts. To improve the accuracy, we use the 2010 population data at the block level instead of the tract level, since blocks are at a finer spatial scale. Then, for each population group by race, sex and age in a county, we assume the population N is multinomially distributed to the census tracts in that county with a vector of apportionment probabilities $p = (p_1, \ldots, p_I)^T$, where I denotes the number of census tracts in that county and $p_i$ is the proportion of the county population N that lives in census tract i. The probabilities p for each intercensal year are estimated via simple linear interpolation between the censuses (i.e., 2000 and 2010).
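The interpolation step can be illustrated with a short Python sketch. It assumes that the tract proportions at both censuses are already expressed on the 2000 tract geography, and it uses the multinomial mean N·p_i as the tract estimate, which is a simplification of the multinomial apportionment described above. The function names and numbers are hypothetical.

```python
def interpolate_proportions(p_start, p_end, year, start_year=2000, end_year=2010):
    """Linearly interpolate tract apportionment proportions between two censuses."""
    if not start_year <= year <= end_year:
        raise ValueError("year must lie between the two census years")
    w = (year - start_year) / (end_year - start_year)
    return [(1 - w) * a + w * b for a, b in zip(p_start, p_end)]


def expected_tract_populations(county_population, proportions):
    """Multinomial mean N * p_i for each tract (assumption: use the mean, not a draw)."""
    return [county_population * p for p in proportions]


# Hypothetical county with two tracts and an intercensal county estimate of 1,000.
p_2005 = interpolate_proportions([0.5, 0.5], [0.3, 0.7], 2005)
print(p_2005, expected_tract_populations(1000, p_2005))
```

Because both endpoint vectors sum to 1, every interpolated vector also sums to 1, so the tract estimates always add up to the county estimate for that year.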

Based on the population estimates, the reference rates for all population groups are then calculated. Using the U.S. standard population for standardization, the direct age-adjusted (over 20 years old) lung cancer incidence annual rates (per population) in Georgia ( ) are for white males, 75.3 for white females, for black males, and 54.5 for black females.

3.3.2 Area-based SES Measure

Owing to the relative homogeneity of census tracts, the area-based SES measure at the census tract level can be a good surrogate for individual SES in a health study when individual SES is unavailable (Krieger 1992). Detailed discussions of area-based SES measures can be found in the literature (Krieger et al. 1997, Carstairs 2001, Krieger et al. 2002, Darden et al. 2009). Various single-variable or composite measures can capture different aspects of socioeconomic characteristics. In this study, we use the modified Darden-Kamel Composite Index (Darden et al. 2009) to measure the SES at the census tract level, and evaluate its relationships with the lung cancer incidence risks by race and sex. The modified Darden-Kamel Composite Index is the average Z score of nine socioeconomic variables in U.S. census data (Table 3.2).

Table 3.2. Variables incorporated in the modified Darden-Kamel Composite Index

1. Percentage of residents with university degrees
2. Median household income
3. Percentage of managerial and professional positions
4. Median value of dwelling
5. Median gross rent of dwelling
6. Percentage of homeownership
7. Percentage below poverty
8. Unemployment rate
9. Percentage of households with vehicle
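An average-Z-score index of this kind can be sketched in a few lines of Python. Two details below are assumptions rather than statements from the text: Z scores use the population standard deviation, and variables where a larger value indicates lower SES (for example, percentage below poverty and unemployment rate) are sign-reversed before averaging. The function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one variable across tracts (population standard deviation)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("variable has no variation across tracts")
    return [(v - m) / s for v in values]


def composite_index(variables, reverse=()):
    """Average Z score per tract.

    variables: dict mapping variable name to a list of tract values.
    reverse:   names whose Z scores are negated (assumed lower-is-better items).
    """
    n_tracts = len(next(iter(variables.values())))
    totals = [0.0] * n_tracts
    for name, values in variables.items():
        sign = -1.0 if name in reverse else 1.0
        for i, z in enumerate(z_scores(values)):
            totals[i] += sign * z
    return [t / len(variables) for t in totals]


# Hypothetical three-tract example with two of the nine variables.
index = composite_index(
    {"median_income": [1.0, 2.0, 3.0], "pct_poverty": [30.0, 20.0, 10.0]},
    reverse={"pct_poverty"},
)
print(index)
```

Under these assumptions, a larger index value corresponds to a higher SES, in line with the interpretation used for the quintile grouping.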

Based on Census 2000 data, the modified Darden-Kamel Composite Indexes for the census tracts in Georgia are calculated and their range is from to A larger value means a higher SES. Based on the index, the census tracts in Georgia are divided into five SES groups with equal numbers of census tracts. Group 1 has the highest SES and group 5 has the lowest. Figure 3.2 shows the distribution of the SES by census tract. We can see that the higher SES regions are mainly concentrated in the large cities in Georgia.

Figure 3.2. Quintile map of SES in Georgia
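The equal-count grouping into five SES groups can be sketched as a rank-based split. This is a minimal illustration in Python; the tie handling (ties keep their sort order) is an assumption, and the function name is hypothetical.

```python
def ses_groups(index_values, n_groups=5):
    """Assign tracts to equal-count groups; group 1 holds the highest index values."""
    n = len(index_values)
    # Tract positions ordered from highest to lowest index value.
    order = sorted(range(n), key=lambda i: index_values[i], reverse=True)
    groups = [0] * n
    for rank, tract in enumerate(order):
        groups[tract] = rank * n_groups // n + 1
    return groups


# Hypothetical index values for ten tracts.
print(ses_groups([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]))
```

When the number of tracts is not a multiple of five, the group sizes differ by at most one tract.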

3.3.3 Bayesian Spatio-temporal Models

Bayesian models have naturally hierarchical structures. At the first level, the number of observed cases $y_{itk}$ for census tract $i = 1, \ldots, 1618$, time period $t = 1, \ldots, 4$ and population group by race and sex $k = 1, \ldots, 4$ is assumed to follow a Poisson distribution with mean $E_{itk} R_{itk}$, where $E_{itk}$ and $R_{itk}$ are respectively the known expected number of cases and the unknown RR compared to the corresponding reference risk (measured by the reference rate of the specific population group) in census tract i, time period t and population group k. At the second level, the logarithms of the RRs are decomposed into fixed effects for measured risk factors such as the SES, and random effects for unmeasured or unobserved risk factors. In Bayesian spatio-temporal models, three random effects are usually considered: a spatial random main effect, a temporal random main effect, and a spatio-temporal interaction random effect. Both the spatial and the temporal random main effects can be further divided into a structured component and an unstructured component, which reflect the dependent and heterogeneous variations of risks in space and time, respectively. In the Bayesian paradigm, prior distributions need to be assigned to the model parameters and the random effects. Then, inferences are made based on the posterior distributions of the parameters and random effects derived from simulations.

In this study, we model the RR of each population group individually under two modeling frameworks. The first framework is separate modeling, where each population group has an independent set of random effects. The second framework is joint modeling, where there are shared random effects representing some common unmeasured or unknown risk factors among all the population groups. This joint modeling framework has been used to map one disease for multiple population groups or multiple diseases that have common risk factors (Knorr-Held and Best 2001, Held et al. 2005, Richardson et al. 2006, Downing et al. 2008). We compare a total of seven models, including two separate models and five joint models. Table 3.3 shows the components of the logarithms of RRs in each model.

Table 3.3. Components of logarithms of RRs in the seven Bayesian spatio-temporal models

| Model Type | Model # | Logarithms of RRs |
|---|---|---|
| Separate | Model 1 | $\log(R_{itk}) = \alpha_k + \beta_k^T x_i + \lambda_{ik} + \xi_{tk}$ |
| Separate | Model 2 | $\log(R_{itk}) = \alpha_k + \beta_k^T x_i + \lambda_{ik} + \xi_{tk} + \upsilon_{itk}$ |
| Joint | Model 3 | $\log(R_{itk}) = \alpha_k + \beta_k^T x_i + \delta_{1,k}\phi_i + \delta_{2,k}\varsigma_t + \omega_{itk}$ |
| Joint | Model 4 | $\log(R_{itk}) = \alpha_k + \beta_k^T x_i + \delta_{1,k}\phi_i + \delta_{2,k}\varsigma_t + \lambda_{ik} + \xi_{tk}$ |
| Joint | Model 5 | $\log(R_{itk}) = \alpha_k + \beta_k^T x_i + \delta_{1,k}\phi_i + \delta_{2,k}\varsigma_t + \theta_{it} + \lambda_{ik} + \xi_{tk}$ |
| Joint | Model 6 | $\log(R_{itk}) = \alpha_k + \beta_k^T x_i + \delta_{1,k}\phi_i + \delta_{2,k}\varsigma_t + \lambda_{ik} + \xi_{tk} + \omega_{itk}$ |
| Joint | Model 7 | $\log(R_{itk}) = \alpha_k + \beta_k^T x_i + \delta_{1,k}\phi_i + \delta_{2,k}\varsigma_t + \theta_{it} + \lambda_{ik} + \xi_{tk} + \omega_{itk}$ |

In each model, $\alpha_k$ is the overall log-RR for population group k across the whole study area over the whole study period, and $\beta_k$ are the coefficients associated with the SES group vector $x_i$ for population group k. The difference among the seven models is in the components of random effects. Separate models 1 and 2 both have a spatial random main effect $\lambda_{ik}$ for population group k in census tract i and a temporal random main effect $\xi_{tk}$ for population group k in time period t. Model 2 also considers the spatio-temporal interaction $\upsilon_{itk}$ in census tract i and time period t for population group k. In addition to population-group-specific random effects like those in separate models 1 and 2, joint models 3-7 also consider shared random effects across the four population groups by race and sex. In these shared components of the joint models, $\phi_i$ represents the shared spatial component in census tract i, and $\varsigma_t$ represents the shared temporal component in time period t. The coefficients $\delta_{1,k}$ and $\delta_{2,k}$ allow gradients of the shared spatial and temporal components among all the population groups.
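The two model levels described above can be made concrete with a small Python sketch that evaluates the log-RR of one observation as a sum of fixed and random effects and then the Poisson log-likelihood of the observed count. It illustrates the model structure only, not the WinBUGS implementation used in the study; treating $x_i$ as a vector of indicator variables, and all numeric values, are assumptions.

```python
import math


def log_relative_risk(alpha_k, beta_k, x_i, random_effects):
    """log(R_itk): intercept + SES fixed effects + sum of the model's random effects."""
    fixed = sum(b * x for b, x in zip(beta_k, x_i))
    return alpha_k + fixed + sum(random_effects)


def poisson_log_likelihood(y, expected, log_rr):
    """Log-likelihood of y observed cases under Poisson(mean = E_itk * R_itk)."""
    mu = expected * math.exp(log_rr)
    return y * math.log(mu) - mu - math.lgamma(y + 1)


# Hypothetical values; the random effects stand for lambda_ik and xi_tk in model 1.
log_rr = log_relative_risk(0.1, [0.2, 0.3], [1, 0], [0.05, -0.05])
print(log_rr, poisson_log_likelihood(2, 2.0, log_rr))
```

Moving from one model in Table 3.3 to another only changes which terms are passed in `random_effects`; the Poisson level stays the same.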
In models 5 and 7, a shared spatio-temporal interaction $\theta_{it}$ is also considered. With respect to the population-group-specific random effects, model 3 only considers a spatio-temporal interaction random effect $\omega_{itk}$ for population group k, and models 4 and 5 only consider the specific spatial and temporal random main effects $\lambda_{ik}$ and $\xi_{tk}$. For the two components $\lambda_{ik}$ and $\xi_{tk}$ in models 4-7, we set them equal to 0 in the white male models (k = 1), so that these two components in the other population group models (k = 2, 3 and 4) are the differentials of the spatial and temporal random main effects between that population group and white males.

Some early experiments show that considering only structured components in the spatial and temporal random main effects gives better modeling results than considering both structured and unstructured components. Therefore, the widely used Gaussian intrinsic conditional autoregressive (CAR normal) prior proposed by Besag et al. (1991) is assigned to the spatial random main effects $\lambda_{ik}$ and $\phi_i$ and the temporal random main effects $\xi_{tk}$ and $\varsigma_t$ to represent the dependent variations of RRs over space and time. For a spatial random effect in an area, CAR normal specifies that its conditional distribution, given all other spatial effects, is a normal distribution with mean equal to the average spatial effect of its neighboring areas and variance inversely proportional to the number of these neighbors. In this study, two areas are defined as spatial neighbors if they share a border or a vertex. For a temporal random effect in a time period, CAR normal smooths it towards the temporal effects of its temporal neighbors (i.e., the previous and the next time periods).

Due to the lack of strong prior knowledge, vague prior distributions are used for the other parameters in the models based on current literature. We assign a flat prior on the overall log-RR terms, $\alpha_k$, and assign independent Normal (0, 10 5 ) prior distributions to the fixed effects $\beta_k$. The logarithms of the scaling parameters $\delta_{1,k}$ and $\delta_{2,k}$ are assigned independent Normal (0, 5) prior distributions (Downing et al. 2008).
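The conditional distribution that defines the CAR normal prior can be written out for a single area or time period. The Python sketch below computes its mean and variance, assuming a precision parameter `tau` so that the conditional variance is 1 / (tau × number of neighbors); the function name, neighbor structure and values are hypothetical.

```python
def car_conditional(effects, neighbors, i, tau):
    """Conditional mean and variance of effect i given all other effects.

    effects:   list of current random effect values
    neighbors: dict mapping each index to the list of its neighbor indices
    tau:       precision parameter of the CAR normal prior (assumed notation)
    """
    nbrs = neighbors[i]
    if not nbrs:
        raise ValueError("the intrinsic CAR prior needs at least one neighbor")
    cond_mean = sum(effects[j] for j in nbrs) / len(nbrs)
    cond_variance = 1.0 / (tau * len(nbrs))
    return cond_mean, cond_variance


# Temporal use: period 1 has the previous (0) and the next (2) period as neighbors.
temporal_neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(car_conditional([0.0, 1.0, 2.0], temporal_neighbors, 1, tau=2.0))  # (1.0, 0.25)
```

The same function covers the spatial case when `neighbors` lists the tracts that share a border or a vertex, so tracts with many neighbors are smoothed more strongly than tracts with few.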
With respect to the spatio-temporal interaction random effects, independent normal prior distributions with means equal to 0 and precisions $\tau_{\upsilon k}$, $k = 1, \ldots, 4$, are assigned to $\upsilon_{itk}$ in model 2 for each population group, independent normal prior distributions with means equal to 0 and precision $\tau_\theta$ are assigned to $\theta_{it}$ in models 5 and 7, and a multivariate normal prior distribution with covariance matrix $\Sigma$ is assigned to $\omega_{itk}$ in models 3, 6 and 7 to allow correlations amongst the population groups (Richardson et al. 2006, Downing et al. 2008). Following previous studies (Kelsall and Wakefield 1999, Best et al. 2005, Downing et al. 2008), independent conjugate hyperpriors Gamma (0.5, ) are assigned to all of the precision parameters in the normal priors for the shared components, $\tau_\phi$, $\tau_\varsigma$, $\tau_\theta$, and for the population-group-specific components, $\tau_{\lambda k}$, $\tau_{\xi k}$, $\tau_{\upsilon k}$, $k = 1, \ldots, 4$. The covariance matrix $\Sigma$ in the multivariate normal prior is assigned a Wishart (Q, 4) distribution, where Q is set to be a diagonal matrix with 0.01s (Richardson et al. 2006).

All of the models are constructed and run using WinBUGS software (Lunn et al. 2000). For each model, two independent chains are run. The first 50,000 iterations are discarded as burn-in to make sure inferences are based on converged simulations of the models. Then, 10,000 iterations are run and every 10th is kept for inference. Therefore, the modeling results are based on thinned samples of 2,000. Brooks-Gelman-Rubin diagnostics (Brooks and Gelman 1998) and visual checks are used to assess convergence. As in the joint mapping of male and female lung cancer risks by Richardson et al. (2006), the scaling parameters $\delta_{2,k}$ are difficult to converge during the data fitting of the models. This could be because four time periods do not provide enough information to differentiate the shared and specific temporal patterns. So, we fixed $\delta_{2,k} = 1$ for all joint models. We use the deviance information criterion (DIC) to compare the seven models and choose the best one to interpret.
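The sample-size arithmetic and the idea behind a between-chain convergence check can be sketched in Python. The second function is the basic potential scale reduction factor, used here as a simplified stand-in for the Brooks-Gelman-Rubin diagnostic reported by WinBUGS, not a reimplementation of it; the function names and chain values are hypothetical.

```python
from statistics import mean, variance


def thinned_sample_count(n_chains, iterations_after_burn_in, thin):
    """Number of retained draws when every `thin`-th iteration is kept per chain."""
    return n_chains * (iterations_after_burn_in // thin)


def potential_scale_reduction(chains):
    """Basic potential scale reduction factor for equal-length chains.

    Values near 1 suggest the chains agree; values well above 1 suggest
    that they have not converged to the same distribution.
    """
    n = len(chains[0])
    within = mean([variance(c) for c in chains])
    between = n * variance([mean(c) for c in chains])
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


print(thinned_sample_count(2, 10_000, 10))  # 2000
print(potential_scale_reduction([[0, 1, 0, 1], [10, 11, 10, 11]]))
```

The first call reproduces the 2,000 retained samples stated above: two chains, each contributing 1,000 draws after thinning.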
The DIC was proposed by Spiegelhalter et al. (2002) as the sum of $\bar{D}$ and $p_D$, where $\bar{D}$ is the posterior mean of the deviance, measuring the goodness-of-fit of a model, and $p_D$ is the effective number of model parameters, measuring model complexity. The model with a smaller DIC is preferred.

3.4 Results

From Table 3.4, we can see that joint model 6 has the smallest DIC value among the seven models. Model 7 has the best data fit and model 4 is the simplest model. All of the joint models except for model 3 are better than the separate models based on their DICs. In the following, we choose the results of model 6 to interpret. In model 6, both the shared and the specific components include the spatial and temporal random main effects, and the specific spatio-temporal interaction random effect is also considered.

Table 3.4. DICs of the seven models (columns: model type, model number, $\bar{D}$, $p_D$, DIC; rows: separate models 1-2 and joint models 3-7)

The crude standardized incidence ratio (SIR), the ratio of the number of observed cases to the number of expected cases, is the maximum likelihood estimate of the RR in frequentist methods. For comparison, Figure 3.3 shows the spatial patterns of crude SIRs by race and sex in the first period, 2000-2001. Due to the uneven population distribution and possible missing data, many census tracts in these SIR maps, especially for black males and black females, have zero observed cases in that time period, which causes zero SIRs. However, it is impossible that there are no disease risks in these census tracts in reality. In addition, it is obvious that the SIR surfaces are not smooth across the whole area, since most of the RRs fall into either a very high or a very low category.

Figure 3.3. Maps of crude standardized incidence ratios (SIRs) by race and sex during 2000-2001
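The DIC values compared in Table 3.4 are built from posterior deviance samples. The Python sketch below assumes the standard definition of the effective number of parameters, $p_D = \bar{D} - D(\bar{\theta})$, where $D(\bar{\theta})$ is the deviance evaluated at the posterior means of the parameters; the function name and numbers are hypothetical and unrelated to the values in Table 3.4.

```python
from statistics import mean


def dic(deviance_samples, deviance_at_posterior_mean):
    """Return (D_bar, p_D, DIC) from posterior deviance samples.

    D_bar: posterior mean deviance (goodness of fit)
    p_D:   D_bar minus the deviance at the posterior mean (model complexity)
    DIC:   D_bar + p_D
    """
    d_bar = mean(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar, p_d, d_bar + p_d


# Hypothetical deviance draws for one model.
print(dic([100.0, 102.0, 104.0], 98.0))  # (102.0, 4.0, 106.0)
```

A model can therefore have the best fit (smallest $\bar{D}$) and still lose on DIC if its effective number of parameters is large, which is consistent with model 7 fitting best while model 6 is preferred.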

Figures 3.4-3.7 show the maps of posterior median RRs by race and sex in the four time periods. Compared to the crude SIRs in Figure 3.3, the posterior median RRs show much smoother spatial patterns without RRs equal to 0. For white males and white females, the high RRs are mainly concentrated in the northwest, southeast, and middle regions of Georgia. For black males, the high RRs are mainly concentrated in the northwest, east, and south of Georgia. The high RRs for black females are mainly concentrated in the northwest of Georgia. Comparing the maps of different time periods, we can see that, for white males and black males, more census tracts with moderate and low RRs emerge and the number of census tracts with high RRs decreases over time, while the opposite holds for white females and black females.

Following Richardson et al.'s (2004) study evaluating the sensitivity and specificity of Bayesian hierarchical disease mapping models, we use a cut-off of 0.8 on the posterior probability that an area has an RR greater than 1 to pick out the areas with truly elevated RRs. Figure 3.8 shows maps indicating how many times each census tract has a truly elevated RR during the 4 time periods based on the rule prob(RR > 1) > 0.8. The frequency associated with each census tract reflects the stability of the elevated RR in that area over the whole time period. From these maps, we can see that the northwest of Georgia and the area near Augusta have stably elevated RRs for all population groups. White males have the largest number of census tracts with stably elevated RRs, and black females have the smallest number. These results could be helpful in establishing etiological hypotheses.
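The exceedance rule prob(RR > 1) > 0.8 and the frequency shown in Figure 3.8 can be computed from posterior samples as in the following Python sketch. It assumes the posterior draws of the RR are available per tract and time period; the function names and sample values are hypothetical.

```python
def exceedance_probability(rr_samples, threshold=1.0):
    """Share of posterior RR draws above the threshold, i.e. prob(RR > threshold)."""
    return sum(rr > threshold for rr in rr_samples) / len(rr_samples)


def elevated_period_count(samples_by_period, cutoff=0.8):
    """Number of time periods in which a tract satisfies prob(RR > 1) > cutoff."""
    return sum(exceedance_probability(s) > cutoff for s in samples_by_period)


# Hypothetical posterior draws for one tract in two periods.
period_1 = [1.2, 1.3, 0.9, 1.1, 1.4]   # prob = 0.8, not above the cut-off
period_2 = [1.2, 1.3, 1.05, 1.1, 1.4]  # prob = 1.0, elevated
print(elevated_period_count([period_1, period_2]))  # 1
```

Because the rule uses a strict inequality, a tract with exactly 80% of its draws above 1 is not counted as elevated.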

Figure 3.4. Maps of the posterior median RRs for white males in each time period

Figure 3.5. Maps of the posterior median RRs for white females in each time period

Figure 3.6. Maps of the posterior median RRs for black males in each time period

Figure 3.7. Maps of the posterior median RRs for black females in each time period

Figure 3.8. Maps of elevated RR frequency by race and sex during

Figure 3.9 shows clearer spatial patterns of RRs through maps of the posterior median of the shared spatial component and the differential spatial components. Taking white males as the reference, with the scaling parameter equal to 1 for the shared spatial component, the posterior medians of the scaling parameters for white females, black males, and black females are 0.743, 0.538, and 0.571, respectively. The white female-white male differential and the black male-white male differential are relatively flat (less contrast) across the whole area, which indicates that the pattern of the shared spatial component captures well the variations of the spatial effects on RRs for both white females and white males. The strong contrast of the black female-white male differential reflects an obvious difference in the patterns of spatial effects on RR between white males and black females.

Figure 3.9. Maps of the posterior median of the shared spatial component and differential spatial components

Table 3.5 shows the posterior medians and 95% credible intervals (CIs) of the shared temporal component and the differential temporal components. We can see that the shared temporal trend stays flat at first and then decreases slightly. This trend captures well the temporal trend in the RRs of black males, but differs from those of white females and black females.

Table 3.5. Posterior median (95% CI) of the shared temporal components and differential temporal components

| Time period | Shared temporal components | White female-white male differential | Black male-white male differential | Black female-white male differential |
|  | (1.02, 1.07) | 0.93 (0.90, 0.97) | 1.01 (0.98, 1.06) | 0.92 (0.86, 0.98) |
|  | (1.01, 1.06) | 0.97 (0.94, 1.00) | 1.00 (0.97, 1.04) | 0.97 (0.92, 1.02) |
|  | (0.96, 1.00) | 1.02 (0.99, 1.05) | 1.00 (0.97, 1.04) | 1.03 (0.98, 1.08) |
|  | (0.92, 0.97) | 1.09 (1.05, 1.13) | 0.98 (0.94, 1.02) | 1.09 (1.03, 1.16) |

To understand the relationships between SES and RR by race and sex, Table 3.6 shows the posterior medians of the RRs by SES quintile. The highest SES group is taken as the reference. The general trend for all population groups is that lower SES is associated with a higher RR. However, the gradients of SES effects on the RRs in males, especially white males, are larger than those in females. That is, the socioeconomic disparities in lung cancer RR are more pronounced in males in Georgia. We also note that the RRs of SES groups 2 and 3 in black females are not significantly different from that of SES group 1.

Bayesian modeling is sensitive to the choice of priors and hyperpriors. Following Downing et al.'s (2008) work, we perform a sensitivity analysis using an alternative hyperprior distribution, Gamma (1, 1), to replace Gamma (0.5, ) for the precision parameters in model 2. The Gamma (0.5, ) distribution makes the variances (inverse of precision) have a 99% probability of lying between and 6.25 with a mode at For the Gamma (1, 1) distribution, the 99% probability range of the variances is from to 100 and the mode is at 0.5. Table 3.7 shows the correlations between the posterior median RRs using model 2 with the two types of hyperpriors. The two sets of results show good concordance in general, but the correlations for black individuals are slightly lower than those for white individuals. These differences may be due to the different degrees of sparseness of counts between races.

Table 3.6. Posterior median (95% CI) of the RRs for SES quintile

| SES group | White males | White females | Black males | Black females |
| 1 (highest) |  |  |  |  |
| 2 | (1.20, 1.36) | 1.11 (1.04, 1.18) | 1.19 (1.04, 1.36) | 1.01 (0.87, 1.19) |
| 3 | (1.41, 1.62) | 1.20 (1.12, 1.30) | 1.42 (1.24, 1.63) | 1.13 (0.96, 1.33) |
| 4 | (1.46, 1.70) | 1.16 (1.07, 1.26) | 1.51 (1.32, 1.72) | 1.23 (1.06, 1.44) |
| 5 (lowest) | 1.76 (1.61, 1.92) | 1.32 (1.20, 1.44) | 1.73 (1.52, 1.98) | 1.41 (1.22, 1.65) |

Table 3.7. Correlations between the posterior median RRs using model 2 with two different types of hyperpriors

| Time period | White males | White females | Black males | Black females |

3.5 Discussions

One of the limitations of this study is the lack of suitable smoking data at a fine spatial scale. It is well known that an individual's smoking behavior is an important risk factor for lung cancer. To some extent, the random effects in our hierarchical Bayesian spatio-temporal models can approximate the total effects of unmeasured or unknown risk factors, including smoking. However, we believe that integrating suitable smoking data into the models could greatly improve their accuracy.

For diseases with a long latency period, such as cancers, lifetime exposures could be important. In this study, we measure area-based SES with Census 2000 data and assume that it reflects individual SES during the long latency period. This assumption could introduce biases into the model inferences. In addition, the analysis of the relationship between disease RR and SES is subject to the modifiable areal unit problem (Openshaw and Taylor 1981): inferences based on analyses at the current scale and/or unit definition may not generalize to other scales and/or unit definitions.

Estimation of population in small areas has recently been an active research topic in geography and statistics. In our study, we use an apportionment method to estimate the population by race, sex, and age in each census tract in each intercensal year. Improved population estimation models could greatly benefit the disease mapping models.

3.6 Conclusions

Given the limited number of lung cancer studies in Georgia, especially at a fine spatio-temporal scale, we use hierarchical Bayesian models to explore the spatio-temporal patterns of lung cancer incidence risks in Georgia over the study period. The study is conducted at the census tract level using a two-year time period as the temporal unit. The fine spatial and temporal scales enable the study to show more detailed variations of lung cancer incidence risks in space and time, which can better support healthcare performance assessment, the establishment of potential etiological hypotheses, and the making of effective and efficient health policies. Compared to the crude SIR, the Bayesian spatio-temporal model can provide a more reliable estimate of disease risk at a fine spatio-temporal scale. The study also shows that there are strong inverse relationships between SES and lung cancer incidence risk in males and weak inverse relationships in females in Georgia. This could lead to further studies on the underlying reasons, such as occupational risk factors.

A total of seven Bayesian spatio-temporal models under the separate and joint modeling frameworks are proposed and compared. In this study, the joint models generally perform better than the separate models using DIC as the criterion. Currently, our study focuses primarily on mapping the patterns of disease risks. However, the spatial and temporal random effects in these disease mapping models may offer some indications of potential disease risk factors for further ecological studies.

References

Abellan, J.J., Richardson, S. & Best, N., Use of space-time models to investigate the stability of patterns of disease. Environmental Health Perspectives, 116 (8).
Bernardinelli, L., Clayton, D., Pascutto, C., Montomoli, C., Ghislandi, M. & Songini, M., Bayesian analysis of space-time variation in disease risk. Statistics in Medicine, 14 (21-22).
Besag, J., York, J. & Mollié, A., Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43 (1).
Best, N. & Wakefield, J., Accounting for inaccuracies in population counts and case registration in cancer mapping studies. Journal of the Royal Statistical Society: Series A (Statistics in Society), 162 (3).
Best, N., Richardson, S. & Thomson, A., A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research, 14 (1), 35.
Brooks, S.P. & Gelman, A., Alternative methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7.
Carstairs, V., Socio-economic factors at areal level and their relationship with health. Spatial Epidemiology, 1 (9).
Darden, J., Rahbar, M., Jezierski, L., Li, M. & Velie, E., The measurement of neighborhood socioeconomic characteristics and black and white residential segregation in metropolitan Detroit: Implications for the study of social disparities in health. Annals of the Association of American Geographers, 100 (1).
Downing, A., Forman, D., Gilthorpe, M., Edwards, K. & Manda, S., Joint disease mapping using six cancers in the Yorkshire region of England. International Journal of Health Geographics, 7 (1), 41.
Fortunato, L., Abellan, J.J., Beale, L., Lefevre, S. & Richardson, S., Spatio-temporal patterns of bladder cancer incidence in Utah ( ) and their association with the presence of Toxic Release Inventory sites. International Journal of Health Geographics, 10 (1).

Georgia Department of Public Health, Cancer program and data summary. Atlanta, GA.
Georgia Department of Public Health, Georgia comprehensive cancer registry [online]. [Accessed 2011].
Goodchild, M.F. & Lam, N.S., Areal interpolation: A variant of the traditional spatial problem. Geo-Processing, 1.
Held, L., Natário, I., Fenton, S.E., Rue, H. & Becker, N., Towards joint disease mapping. Statistical Methods in Medical Research, 14 (1).
Kelsall, J. & Wakefield, J., Discussion of 'Bayesian models for spatially correlated disease and exposure data', by Best et al. In Bernardo, J., Berger, J., Dawid, A. & Smith, A. eds. Bayesian statistics 6. Oxford, UK: Oxford University Press, 151.
Knorr-Held, L., Bayesian modelling of inseparable space-time variation in disease risk. Statistics in Medicine, 19 (17-18).
Knorr-Held, L. & Best, N.G., A shared component model for detecting joint and selective clustering of two diseases. Journal of the Royal Statistical Society: Series A (Statistics in Society), 164 (1).
Krieger, N., Overcoming the absence of socioeconomic data in medical records: Validation and application of a census-based methodology. American Journal of Public Health, 82 (5), 703.
Krieger, N., Chen, J.T., Waterman, P.D., Soobader, M.J., Subramanian, S. & Carson, R., Geocoding and monitoring of US socioeconomic inequalities in mortality and cancer incidence: Does the choice of area-based measure and geographic level matter? American Journal of Epidemiology, 156 (5), 471.
Krieger, N., Williams, D.R. & Moss, N.E., Measuring social class in US public health research: Concepts, methodologies, and guidelines. Annual Review of Public Health, 18 (1).
Lawson, A.B., Bayesian disease mapping: Hierarchical modeling in spatial epidemiology. Chapman & Hall/CRC.

Lunn, D.J., Thomas, A., Best, N. & Spiegelhalter, D., WinBUGS - a Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10 (4).
Mollié, A., Bayesian mapping of Hodgkin's disease in France. Spatial Epidemiology, 1 (9).
Openshaw, S. & Taylor, P.J., The modifiable areal unit problem. In Wrigley, N. & Bennett, R. eds. Quantitative geography: A British view. London and Boston: Routledge and Kegan Paul.
Population Estimates Program, County intercensal estimates ( ) [online]. [Accessed 2012].
Richardson, S., Abellan, J. & Best, N., Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer risks in Yorkshire (UK). Statistical Methods in Medical Research, 15 (4), 385.
Richardson, S., Thomson, A., Best, N. & Elliott, P., Interpreting posterior relative risk estimates in disease-mapping studies. Environmental Health Perspectives, 112 (9).
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. & Van Der Linde, A., Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64 (4).
Wakefield, J., Best, N. & Waller, L., Bayesian approaches to disease mapping. Spatial Epidemiology, 1 (9).
Waller, L., Carlin, B., Xia, H. & Gelfand, A., Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association.
Xia, H. & Carlin, B., Spatio-temporal models with errors in covariates: Mapping Ohio lung cancer mortality. Statistics in Medicine, 17 (18).

CHAPTER 4

MODULAR CAPACITATED MAXIMAL COVERING LOCATION PROBLEM FOR THE OPTIMAL SITING OF EMERGENCY VEHICLES³

³ Yin, P. and Mu, L. Applied Geography 34. Reprinted here with permission of the publisher.

Abstract

To improve the application of the maximal covering location problem (MCLP), several capacitated MCLP models have been proposed to consider the capacity limits of facilities. However, most of these models assume only one fixed capacity level for the facility at each potential site. This assumption may limit the application of the capacitated MCLP. In this article, a modular capacitated maximal covering location problem (MCMCLP) is proposed and formulated to allow several possible capacity levels for the facility at each potential site. To optimally site emergency vehicles, this new model also considers allocations of the demands beyond the service covering standard. Two situations of the model are discussed: the MCMCLP-facility-constraint (FC), which fixes the total number of facilities to be located, and the MCMCLP-non-facility-constraint (NFC), which does not. In addition to the model formulations, one important aspect of location modeling, spatial demand representation, is included in the analysis and discussion. As an example, the MCMCLP is applied with Geographic Information System (GIS) and optimization software packages to optimally site ambulances for the Emergency Medical Services (EMS) Region 10 in the State of Georgia. The limitations of the model are also discussed.

Keywords: Modular capacitated MCLP, Spatial demand representation, GIS, Emergency vehicle

4.1 Introduction

Given a covering standard for a service, such as a distance or travel-time maximum, the objective of the maximal covering location problem (MCLP) is to locate a fixed number of facilities to provide the service to cover as many demands as possible. MCLP modeling, after being put forward by Church and ReVelle (1974), has been a powerful and widely used tool in many planning processes to optimally distribute limited resources to maximize social and economic benefits, such as the placement of emergency warning sirens (Current and O'Kelly 1992), fire stations (Indriasari et al. 2010), distribution centers for humanitarian relief (Balcik and Beamon 2008), health centers (Bennett et al. 1982, Verter and Lapierre 2002, Griffin et al. 2008, Ratick et al. 2009), and ecological reserves (Church et al. 1996).

A basic assumption underlying many of the MCLP models that have been proposed is that the facilities to be sited are uncapacitated. Under this assumption, a demand will be served as long as it is within the service covering standard of any facility. However, this assumption of uncapacitated facilities severely limits the application of covering models (Current and Storbeck 1988). Many service facilities have finite capacities to ensure an acceptable level of service and spatial equity (Murray and Gerrard 1997, Liao and Guo 2008). For example, an ambulance base can only respond to a limited number of demands within its service covering standard (e.g., 8-min driving distance) at one time because of the availability status of the ambulances stationed at the base. Therefore, the capacity limit, the main constraint addressed in this article, is an important consideration in location problems, especially for the siting of emergency facilities.

Chung et al. (1983) and Current and Storbeck (1988) published two early papers dealing with the capacitated versions of the MCLP.
Both groups of authors added maximum capacity constraints to the mathematical formulations of the MCLP to ensure that the demands allocated to a facility will not exceed the capacity of that facility. However, these two capacitated MCLP models only consider the allocation of the demands within the service covering standard of facilities. Many systems, particularly public services, are typically available to all demands within their jurisdiction. For example, even if a demand is located in an area where no ambulance can reach it within a time standard, the demand must still be responded to and be counted as part of some facility's workload. Therefore, Pirkul and Schilling (1991) proposed an extension of the capacitated MCLP where all demands are assigned to facilities, regardless of whether a demand lies within the service covering standard. This idea of allocating all demands to facilities also appears in some uncapacitated MCLP models, such as the generalized maximal covering location problem of Berman and Krass (2002). Following the work of Pirkul and Schilling (1991), Haghani (1996) proposed a multi-objective capacitated MCLP model where the objective function maximizes the weighted covered demand while simultaneously minimizing the average distance from the uncovered demands to the located facilities. He showed how to ensure that the maximization of the weighted covered demand is the primary objective in the model by adjusting its weight in the objective function.

In all of the above capacitated MCLP models, only one fixed capacity level of the facility is considered for each potential facility site. However, many situations arise where each potential facility site could have several possible maximum capacity levels from which a facility can choose. For example, the capacity limit of an emergency facility (e.g., ambulance base or fire station) can be assumed to be determined by its stationed emergency vehicles (e.g., ambulances or fire trucks). Therefore, varied numbers of emergency vehicles provide a series of possible maximum capacity levels for the emergency facility.
Correia and Captivo (2003) called the location problems with such capacity constraints modular capacitated location problems. However, their model is an extension of the capacitated plant location problem, the objective of which is to minimize total costs, including fixed costs and operating costs, associated with plant and transportation costs, among others. For emergency services, the objective is often stated as the minimization of losses to the public, which is equivalent to the maximization of benefits (Indriasari et al. 2010). Cost is usually not the first consideration in these services. Therefore, the capacitated MCLP is more suitable than the capacitated plant location problem for emergency services. Although Griffin et al. (2008) considered three capability levels for each type of health care facility in their capacitated MCLP model, there is no composing relationship for the capacity levels of facilities, such as that between emergency vehicles and emergency facilities. In addition, their model did not consider the allocation of demands outside the service covering standard.

To apply the capacitated MCLP model to the emergency facility siting problem in which an emergency facility could have different possible capacity levels with varied numbers of stationed emergency vehicles, we propose an extension of the MCLP called the modular capacitated maximal covering location problem (MCMCLP). Similar to the multi-objective function in the model of Haghani (1996), the MCMCLP aims to maximize the weighted covered demand while simultaneously minimizing the average distance from the uncovered demands to the located facilities.

The remainder of this article is organized as follows: In the next section, the concepts, formulations, and related issues of the MCMCLP are introduced and discussed in terms of two situations. The first situation involves a fixed total number of facilities to be located; in the second situation, the total number of facilities is not fixed.
Subsequently, we briefly review the approaches for spatial demand representation that could influence the accuracy of the problem solutions. The method called service area spatial demand representation (SASDR) is briefly described. Next, the MCMCLP and the SASDR are applied to the optimal siting of ambulances for the Emergency Medical Services (EMS) Region 10 in the State of Georgia (GA). Finally, a discussion and conclusions are provided.

4.2 Modular Capacitated Maximal Covering Location Problem (MCMCLP)

Because of the capacity limit of a facility, the allocation problem (i.e., how to allocate demands to facilities) sometimes must be solved in conjunction with the location problem (i.e., where to site facilities) (Haghani 1996). Under the assumption that one demand can be allocated to at most one facility, we define three demand types and use them in the remainder of this article:
1) unallocated demand, which is not allocated to any facility (e.g., the demands $d_a$ and $d_b$ in Figure 4.1);
2) covered allocated demand, which is located within the service covering standard of a facility and is allocated to that facility (e.g., the demand $d_c$ in Figure 4.1);
3) uncovered allocated demand, which is located beyond the service covering standard of a facility but is allocated to that facility (e.g., the demand $d_d$ in Figure 4.1).

Figure 4.1. Illustration of three demand types: unallocated demand ($d_a$ and $d_b$), covered allocated demand ($d_c$), and uncovered allocated demand ($d_d$)
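The three demand types can be expressed as a small classification rule. The sketch below is only illustrative: the function name, the facility label "f", the travel times, and the 8-minute standard are assumptions chosen to mirror the layout of Figure 4.1, not values from the study.

```python
def classify_demand(assigned_site, distance_to_site, standard):
    """Return the demand type of one demand.

    assigned_site: label of the facility the demand is allocated to, or None
    distance_to_site: dict mapping facility label to travel distance or time
    standard: service covering standard (maximum distance or time)
    """
    if assigned_site is None:
        return "unallocated"
    if distance_to_site[assigned_site] <= standard:
        return "covered allocated"
    return "uncovered allocated"

# Hypothetical setting: one facility "f" and an 8-minute covering standard
standard = 8.0
travel_time = {
    "d_a": {"f": 15.0},
    "d_b": {"f": 11.0},
    "d_c": {"f": 5.0},
    "d_d": {"f": 12.0},
}
allocation = {"d_a": None, "d_b": None, "d_c": "f", "d_d": "f"}

demand_types = {
    name: classify_demand(allocation[name], travel_time[name], standard)
    for name in allocation
}
```

In this setting, d_d is farther from f than the standard allows but is still allocated to it, which is exactly the case that the earlier capacitated MCLP models leave out.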

Following the work of Pirkul and Schilling (1991) and Haghani (1996), and taking a different perspective on the capacitated plant location problem of Correia and Captivo (2003), we present an extension to the capacitated MCLP called the MCMCLP and use it for siting emergency services. In addition to the basic concept of the MCLP that the covered allocated demands should be maximized by optimally siting a fixed number of facilities, the MCMCLP also includes the following considerations:
1) the facility at each potential site has a maximum capacity, which is chosen from a finite and discrete set of available capacity levels;
2) all demands need to be allocated to facilities (i.e., no unallocated demands exist), and the uncovered allocated demands can be assigned on the basis of their proximity to facilities;
3) the demands within a demand object, which is a spatial point or areal unit derived by abstracting or partitioning continuous demand space, may be divided and allocated to multiple facilities.

An area with a larger population usually has a higher frequency of calls for emergency service than an area with a smaller population. In addition, one emergency vehicle can only respond to one call at a time and will be available only after that task is finished. Therefore, the larger the population an ambulance serves, the higher the probability that it is busy, the longer the average response time for a call, and the poorer the service it provides. To ensure an acceptable average response time for a call, each emergency vehicle can be thought of as having a maximum population that it can serve. In this article, we take population as demand, and the upper limit of the population served by an emergency vehicle is defined as the capacity of that vehicle.
In fact, the calculation of an emergency vehicle's capacity needs to consider multiple factors, including the requirement for the average response time, the average frequency of calls in the population that it will serve, and the average treatment time for a task, among others. For simplicity, in this article, all emergency vehicles are assumed to have the same capacity, and the capacity of a facility is taken to be the total capacity of all vehicles stationed at that facility. For example, if at most p vehicles can be stationed at a facility, there are p possible capacity levels from which to choose. A facility will not be established at a location unless at least one emergency vehicle needs to be stationed there.

There are two situations for the MCMCLP. If there is no constraint on the total number of emergency facilities that will be established to station vehicles, then we call such a non-facility-constraint problem MCMCLP-NFC. This situation mainly focuses on how to allocate a given number of vehicles to a set of predefined potential facility sites. If the total number of facilities is fixed, such a facility-constraint problem is termed MCMCLP-FC. This situation requires selecting the sites for a given number of facilities and then allocating a given number of vehicles to these facilities.

Consider the following notation:

$I$ = the set of demand objects $\{1, \ldots, i, \ldots, m\}$;
$J$ = the set of potential facility sites $\{1, \ldots, j, \ldots, n\}$;
$S$ = the service covering standard of a facility (i.e., maximum distance or time);
$d_{ij}$ = the travel distance or time from potential facility site $j$ to demand object $i$;
$J_i$ = the set of potential facility sites $j$ within whose service covering standard demand object $i$ lies, i.e., $\{ j \mid d_{ij} \le S \}$;
$a_i$ = the amount of service demand at demand object $i$;
$p$ = the total number of emergency vehicles to be located;
$c$ = the capacity of one emergency vehicle (assuming all vehicles have the same capacity);
$w$ = the weight associated with all the uncovered allocated demands;

$x_j$ = the number of emergency vehicles stationed at potential facility site $j$; a facility is located on site $j$ when $x_j > 0$;
$y_{ij}$ = the percentage of demands at demand object $i$ that is allocated to the facility on site $j$.

The formulation of the MCMCLP-NFC is

$$\text{Maximize} \quad \sum_{i \in I} \sum_{j \in J_i} a_i y_{ij} - w \sum_{i \in I} \sum_{j \notin J_i} d_{ij} a_i y_{ij} \tag{4.1}$$

Subject to:

$$\sum_{i \in I} a_i y_{ij} \le c x_j \quad \forall j \in J \tag{4.2}$$

$$\sum_{j \in J} x_j = p \tag{4.3}$$

$$\sum_{j \in J} y_{ij} = 1 \quad \forall i \in I \tag{4.4}$$

$$x_j = 0, 1, 2, \ldots, p \quad \forall j \in J \tag{4.5}$$

$$0 \le y_{ij} \le 1 \quad \forall i \in I, \; \forall j \in J \tag{4.6}$$

Among Equations 4.1 to 4.6, Equation 4.1 is a multi-objective function that seeks to maximize the amount of the covered allocated demands ($\sum_{i \in I} \sum_{j \in J_i} a_i y_{ij}$) while simultaneously minimizing the total distance between the uncovered allocated demands and the sites to which they are assigned ($\sum_{i \in I} \sum_{j \notin J_i} d_{ij} a_i y_{ij}$). In this function, the weight $w \ge 0$ can be varied to adjust the preference for each objective. Constraints 4.2 ensure that the demands allocated to any facility cannot exceed the maximum capacity of that facility (i.e., the total capacity of the emergency vehicles stationed there). If no facility (i.e., no vehicle) is located on a site, no demand will be allocated to that site. Constraint 4.3 specifies the total number of emergency vehicles to be located. Constraints 4.4 ensure that all demands at each demand object will be allocated to a facility. Constraints 4.5 indicate that the decision variable $x_j$ is a non-negative integer. Constraints 4.6 restrict the continuous decision variable $y_{ij}$ to the range from 0 to 1.

We use $\min\{p, n\}$ to denote the smaller value between the total number of emergency vehicles, $p$, and the total number of potential facility sites, $n$. In the MCMCLP-NFC, emergency vehicles can be stationed at as many as $\min\{p, n\}$ facilities, whereas the MCMCLP-FC fixes the total number of facilities to be sited. To present the formulation of the MCMCLP-FC, we need to introduce additional notation:

$q$ = the total number of facilities to be sited;
$K$ = the set of possible facility sizes (i.e., the number of vehicles) on each potential facility site $\{1, \ldots, k, \ldots, p\}$;
$x_{jk}$ = 1 if a facility with $k$ vehicles is located on potential facility site $j$, and 0 otherwise.

The MCMCLP-FC has the same objective function (Equation 4.1) and constraints 4.4 and 4.6 as the MCMCLP-NFC formulation. The other constraints include:

$$\sum_{k \in K} x_{jk} \le 1 \quad \forall j \in J \tag{4.7}$$

$$\sum_{i \in I} a_i y_{ij} \le \sum_{k \in K} k c x_{jk} \quad \forall j \in J \tag{4.8}$$

97 j J k K j J k K kx = p Equation 4.9 jk x = q Equation 4.10 jk x jk { 1} j J k K 0,, Equation 4.11 Constraints 4.7 ensure that no more than one facility can be located on each potential facility site. Constraints 4.8 ensure that all the demands allocated to a facility cannot exceed the maximum capacity of that facility. Constraint 4.9 specifies the total number of emergency vehicles to be stationed. Constraint 4.10 specifies the total number of facilities to be sited. Constraints 4.11 impose integrality restriction on the decision variable x jk. In objective function Equation 4.1 for both MCMCLP models, the weight w associated with uncovered allocated demands can be varied to trade off the two objectives: the maximization of covered allocated demands and the minimization of the total distance of uncovered allocated demands to facilities. When w = 0, the model considers only the former objective, and the service level for the uncovered allocated demands will not be assured because they may be allocated to a further facility instead of to a nearer one. With w increases, the service level for the uncovered allocated demands will improve because more preference is given to the latter objective while the covered allocated demands may not be maximized by as many as demands as when w = 0. In general, maximization of the covered allocated demands would be the primary objective in emergency service planning, which means that, for a model with an appropriate weight w, the optimal solution will provide as good or better coverage of the covered allocated demand than any other feasible solutions (Haghani 1996). With the similar proof given by Haghani (1996), we can prove that, to ensure maximization of the covered allocated demands 83

98 is the primary objective, the weight w must meet the following condition when assuming integer demands: 1 0 w Equation 4.12 A d ( d ) max min where A is the total demands i I a, and d max and d min are the maximum and minimum distances, i respectively, between any pairs of demand object i and potential facility site j. 4.3 Spatial Demand Representation Taking residents as demands, the aggregated census data may be the spatial information of demands that we can easily obtain. When information on individual activity or tracking data is not available, a practical consideration is to assume that the demands are distributed continuously within the census units. For such continuous area demands, some spatial demand representation has to be adopted so that the MCLP model can be applied. The widely used point-based abstractions may be prone to measurement and coverage errors (Murray and O'Kelly 2002, Tong and Murray 2009). The areal representations with census units or grids of regular polygons often complicate the model because of the explicit processing of partial coverage caused by the mismatch between the boundaries of service covering areas and the demand areal units. To maintain both the simplicity and the high degree of accuracy of the maximal coverage model, the SASDR, which was proposed by Yin and Mu (2011), is used in this article to represent demand space. The SASDR is a polygon-overlay-based representation for continuously spatial demands. In this representation, the demand objects are created by using the service areas of all potential facility sites to partition the whole demand space. Figure 4.2(a) shows an example where a 84

99 square demand space U will be partitioned into the SASDR by two potential facilities f 1 and f 2 with circular service areas S 1 and S 2. Figure 4.2(b) shows the four resulting demand objects in the final SASDR, which includes U ( S 1 S 2 ), ( U S 1 ) S2, ( U S 2 ) S1, and U S 1 S 2. The biggest advantage of the SASDR is that all the demand objects lie either within or beyond the service covering standard of any potential facility site, which can avoid partial coverage in the model. With the basic functions in GIS software packages, such as buffer, overlay and network analysis, the SASDR can be easily realized. (a) (b) Figure 4.2. Example of the SASDR with circular facility service area (a) demand space U (the square) and two potential service areas S 1 and S 2 (the circles) (b) four demand objects in the SASDR result of demand space U partitioned by service areas S 1 and S Applications: Optimal Siting of Ambulances Because of its important social and economic objectives, the ambulance location problem has been widely studied over the past 40 years (Eaton et al. 1985, Adenso-Díaz and Rodríguez 1997, Brotcorne et al. 2003, Daskin and Dean 2005, Henderson and Mason 2005). Because ambulances are usually stationed in fire departments or parking lots with little additional 85

construction or administrative costs, it is unnecessary to limit the total number of facilities to be sited. Given this practical consideration, the MCMCLP-NFC model may be more appropriate than the MCMCLP-FC model. However, to better compare the performance of these two models, we apply both the MCMCLP-NFC and the MCMCLP-FC to the optimal siting of ambulances for EMS Region 10 in GA.

4.4.1 Study Area and Data

EMS Region 10 is one of the 10 EMS regions in GA. It lies in the northeastern section of the state and is composed of 10 counties (Figure 4.3). The region serves 405,231 people (2000 census data) in a total area of 3,006 square miles with 13 licensed ambulance services and 58 vehicles (OEMS 2006). The population in 2010 was 460,189, and the quartile map of the population density (persons/km²) by census block group is shown in Figure 4.3. The population data, boundary maps of census units, and street map are all taken from US 2010 census data because we need population data at a relatively low spatial aggregation level, such as the block group or block level, to reflect the variation in demand across the study area, and such data are available only in census years. The Georgia EMS stations data from 2005 to 2007 are the only EMS data that we can obtain thus far; these data come from the Homeland Security Infrastructure Program (HSIP) and were downloaded from the website of the Georgia Department of Community Affairs (DCA 2011). These data describe the locations where EMS personnel are stationed or based, or where the equipment that such personnel use in performing their jobs is stored for ready use. According to these data, a total of 82 EMS stations provide ambulance service in our study area (Figure 4.3). Among these stations, only two (Madison County Emergency Medical Services Station 4 and Greene County Emergency Medical Service) are not located in fire departments. The count of EMS stations (82) is

larger than the count of ambulances (58). This discrepancy may be due to the inconsistency in the time periods for which the data were collected. In addition, it is common for ambulances to be periodically relocated among facilities to ensure good coverage at all times, which is an important difference between the operations of emergency medical services and other emergency services, such as those of fire departments or police departments (Brotcorne et al. 2003). Therefore, some EMS stations may not house vehicles all the time. Although the population data and EMS data come from different time periods, the time interval between them is short; the time inconsistency is therefore ignored in this application until better-quality data become available. This data input is not the critical part of the models and should not significantly influence the illustration and validation of our models and their applications.

Figure 4.3. Population density of Georgia EMS Region 10 (study area) by census block group and existing ambulance facility locations

4.4.2 Tasks

To test the application of the MCMCLP for emergency services, a total of 58 ambulances will be allocated to maximize the covered allocated demands within an 8-min drive of the facilities. The locations of the 82 existing EMS stations are regarded as the potential facility sites. The demands are represented by the 2010 census population by census block group. To ensure the existence of a feasible solution to the problem, we define the capacity of each ambulance as 8000 persons so that the 58 ambulances have a total capacity of 464,000, which exceeds the total demand of 460,189. We assume that a capacity of 8000 persons per ambulance can meet the requirement on the average response time to calls for service in this region. In the MCMCLP-NFC model, the 58 vehicles could be allocated to, at most, 58 facility sites. In the MCMCLP-FC model, only 20 potential facility sites will be chosen, and the 58 vehicles will be allocated to these 20 sites. ArcGIS v9.3.1 is used to realize the SASDR. Programming with Visual Basic for Applications (VBA) for ArcObjects in ArcGIS v9.3.1 is used to structure the optimization model files. The optimization problems are then solved using the commercial mixed integer programming (MIP) software package CPLEX v12.2. All analyses are performed on a personal computer equipped with an Intel Core Quad 2.4 GHz CPU and 3 GB of RAM.

4.4.3 Results

Realization of SASDR

In the realization of SASDR, three types of roads are used to create the road network and then to create the 8-min service area for each potential facility site. The information for roads is listed in Table 4.1 and includes the MAF/TIGER Feature Class Codes (MTFCC) defined in the census data, road descriptions and hypothetical speed limits. Figure 4.4 shows the road network in the study area.

Table 4.1. Information for roads

MTFCC   Description                                        Speed limit (miles/hour)
S1100   Primary Road                                       70
S1200   Secondary Road                                     55
S1400   Local Neighborhood Road, Rural Road, City Street   40

Figure 4.4. Road network in EMS Region 10 in GA

After the road network is created, a service layer that includes the 8-min service polygons for the 82 potential facility sites is created from the road network using the network-analysis functions in ArcGIS (Figure 4.5). The white areas indicate that no vehicles can reach these locations within 8 minutes from any potential facility location. Each service polygon is identified by the ID of its corresponding facility site.
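To make the role of the hypothetical speed limits concrete, the following minimal sketch converts the values in Table 4.1 into travel times. The function names and the example route are illustrative assumptions; the actual service polygons in this study were produced by the network-analysis functions in ArcGIS, not by this code.

```python
# Hypothetical speed limits (miles/hour) by MTFCC, copied from Table 4.1.
SPEED_MPH = {"S1100": 70, "S1200": 55, "S1400": 40}


def reach_miles(speed_mph, minutes=8):
    """Distance (miles) drivable in `minutes` at a constant speed."""
    return speed_mph * minutes / 60


def travel_minutes(segments):
    """Travel time (minutes) along a route given as (MTFCC, length_miles) pairs."""
    return sum(60 * length / SPEED_MPH[code] for code, length in segments)


# Illustrative route: 2 miles of local road followed by 3.5 miles of primary road.
route = [("S1400", 2.0), ("S1100", 3.5)]
within_standard = travel_minutes(route) <= 8
```

Under these assumptions, a location is inside a facility's service polygon when the fastest route to it takes no more than 8 minutes.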

Figure 4.5. Eight-minute service areas (non-white polygons) of all potential ambulance facility sites (red points) based on the road network

With the polygon overlay tool Identity in ArcGIS, the service layer is used to partition the study area to derive the partition layer, which includes all intersecting units among the service polygons and the study area. Because of possible overlap among the service polygons, the partition layer may include duplicate intersecting units that have the same location and shape but different facility site IDs. A new field, DO_ID, is created in the partition layer, and the Field Calculator function in ArcGIS with VBScript is used to compare the centroid coordinates and the area of each unit to identify the duplicate units. All units that represent the same demand object are assigned the same demand object ID in the field DO_ID. In the attribute table of the partition layer, both the facility site ID and the demand object ID now exist in each record. The facility site j in the record of the demand object i indicates that the demand object i can be

completely covered by the service from the potential facility site j. This information will later be used to construct the model input file for CPLEX to solve the problem. A total of 2,721 demand objects are obtained for the study area. We export them from the partition layer to create the demand object boundary layer. The next step in the realization of SASDR is to calculate the amount of demand in each demand object, which is interpolated from the census block group population data and assumed to be distributed uniformly within the demand object. When the polygon overlay tool Intersect in ArcGIS is used to overlay the layer of population density by block group on the demand object boundary layer, many intersecting units emerge. The population in each unit is calculated by multiplying its population density by the area of that unit. Finally, the population of the intersecting units is aggregated to the demand objects. Figure 4.6 shows the final SASDR result for the study area with the demand (i.e., population) distribution. Because of round-off error, a total aggregated population of 460,219 in the study area is obtained, which is then used as the total amount of demand in the subsequent model. There are 623 demand objects with no people because of their small sizes and low population densities. These zero-population demand objects are first excluded from the optimization problem to reduce the computing complexity. After the optimization problem is solved by CPLEX, these demand objects are brought back and allocated to their nearest facilities.
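The partition and interpolation steps described above can be sketched with a simple raster stand-in. This is an assumption-laden illustration rather than the ArcGIS workflow used in the study: small grid cells replace the polygon overlay, circular service areas (as in Figure 4.2) replace the network-based 8-min polygons, and the names `sasdr_demand_objects`, `cells` and `service_areas` are hypothetical. Each cell is assigned to the demand object defined by the set of facility sites that cover it, and its population is estimated as density times area.

```python
from collections import defaultdict


def sasdr_demand_objects(cells, service_areas):
    """Group cells by the set of sites covering them and sum their population.

    cells: {cell_id: (x, y, area, density)} with (x, y) the cell centre.
    service_areas: {site_id: (cx, cy, radius)} circular service areas.
    Returns {frozenset(site_ids): population}; the empty set is the uncovered part.
    """
    objects = defaultdict(float)
    for x, y, area, density in cells.values():
        signature = frozenset(
            j for j, (cx, cy, r) in service_areas.items()
            if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2
        )
        # Uniform-density assumption: population = density * area.
        objects[signature] += density * area
    return dict(objects)


# Illustrative data: a 4 x 4 square of unit cells with density 10 everywhere.
cells = {
    (col, row): (col + 0.5, row + 0.5, 1.0, 10.0)
    for col in range(4) for row in range(4)
}
service_areas = {"f1": (1.5, 2.0, 1.2), "f2": (2.5, 2.0, 1.2)}
demand_objects = sasdr_demand_objects(cells, service_areas)
```

With two overlapping service areas, the sketch yields the same four kinds of demand objects as Figure 4.2(b): covered by neither site, by only one of them, or by both. Because every cell carries a complete coverage signature, no demand object is partially covered.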

Figure 4.6. SASDR result for the study area with demand (population) distribution

Model Construction and Solution

The distance between a demand object and a facility location is measured in kilometers from the centroid of the demand object to the facility location point. The maximum distance in this study area is km and the minimum distance is km. According to Equation 4.12, the value of weight w should be within the range [0, ] to ensure that the maximization of the covered allocated demands is the primary objective. In fact, as long as the value of weight w falls in this range and does not equal zero, the solutions of each model will be the same, irrespective of the weight w. Therefore, we set $w = 6 \times 10^{-8}$ for both the MCMCLP-NFC and MCMCLP-FC models.

The model input files were constructed with the VBA program of ArcObjects in ArcGIS. These models were then solved in CPLEX, which uses a branch-and-cut technique to find the optimal solution (CPLEX Help 2011). The run time is 3,361 seconds for the MCMCLP-NFC model and 706 seconds for the MCMCLP-FC model. The solutions obtained from CPLEX were finally visualized as maps in ArcGIS. Figure 4.7 shows the results of the two MCMCLP models as choropleth maps overlaid with the selected facility sites. In these maps, a facility and the demands allocated to it are shown in the same color, and larger facility symbols indicate more ambulances. With such maps, the location-allocation patterns of the problem solution can be easily understood. For demand objects whose demands are divided and allocated to more than one facility, the strategy here is to split the demand object into multiple parts. For each facility that partially serves the demand object, one part of the demand object is placed as close as possible to that facility, and its size is proportional to the percentage of demands served by that facility. In Figure 4.7(a), in which the MCMCLP-NFC is applied, a total of 51 out of 82 potential sites are chosen to set up the facilities, and 402,365 demands (87.4% of total demands) are covered within the 8-min service covering standard. In Figure 4.7(b), in which the MCMCLP-FC is applied, 20 facilities are required by the problem specification, and 358,477 demands (77.9% of total demands) are covered within the service covering standard. As expected, the amount of the covered allocated demands obtained by the MCMCLP-NFC is greater than that obtained by the MCMCLP-FC because more facilities in the MCMCLP-NFC provide greater flexibility for siting the ambulances.

Because the proximity of the uncovered allocated demands to the facilities is considered in both models (i.e., $w = 6 \times 10^{-8}$), the demands allocated to a facility are generally distributed more compactly and continuously than those in the models with w = 0 (results not

shown). However, the allocations of many facilities are still dispersed into several parts that may be far away from one another. For example, there are two major demand patches with varied sizes (filled with diagonals) allocated to the facility at site 13 in Figure 4.7(a). One reason for this allocation is that the primary objective of the models is to maximize the covered allocated demands instead of the proximity of the uncovered allocated demands to the facilities. The splitting of the demand objects to represent partial coverage could also cause the noncontinuous demand allocations in the maps. Because of the smaller number of facilities established, the MCMCLP-FC shows a more compact and continuous distribution of the demands than the MCMCLP-NFC shows. Table 4.2 shows the counts of the facilities with varied numbers of ambulances in these two models. The maximum number of ambulances in a facility is 3 (site 45 in Figure 4.7(a)) in the MCMCLP-NFC model and 12 (site 35 in Figure 4.7(b)) in the MCMCLP-FC model.

Figure 4.7. Results of the MCMCLP models siting 58 ambulances in 82 potential facility locations with $w = 6 \times 10^{-8}$ (the facility location is rendered in the same color as its allocation area): (a) the MCMCLP-NFC model; (b) the MCMCLP-FC model with 20 facilities

Table 4.2. Count of the facilities with varied numbers of ambulances

Number of ambulances in a facility   Count of facilities (MCMCLP-NFC)   Count of facilities (MCMCLP-FC)
Total

4.5 Discussion

Several assumptions are made in this article to apply the MCMCLP models to optimally site emergency vehicles such as ambulances. One assumption is that a facility has a capacity that is related to the vehicles stationed there. This assumption is simple but reasonable. If the population in the jurisdiction of a facility is too large, one of the important indicators of emergency service quality, the average response time to calls for emergency service, will be too long. When the population exceeds a limit, the quality of the emergency service provided by that facility will be unacceptable. Given a requirement on the average response time to calls, a facility with more vehicles may serve a greater population. In our application, for simplicity, we assume that each vehicle has the same capacity and that the capacity of a facility is equal to the total capacity of the vehicles located there. Admittedly, this is a very restrictive assumption because the capacity of an emergency vehicle actually depends on multiple factors, including the requirement on the average response time, the average frequency of calls in the population it will serve, and the average treatment time for a task, among others. A discussion of this problem exceeds the scope of this article. However, if the possible capacity levels of the facility at each potential site can be estimated and taken as a group of constants, the MCMCLP model can be easily modified to accommodate the situation. The location problems of emergency vehicles are,

in reality, complex. The MCMCLP is a static model that does not consider dynamic factors such as daily population movement. Accounting for such factors will be the focus of our future work. The MCLP has been proven to be nondeterministic polynomial time (NP)-hard (Megiddo et al. 1981), which means that no algorithm has yet been discovered to solve it in polynomial time in the worst case. As an extension to the MCLP, the MCMCLP is also NP-hard. Therefore, the use of exact methods (e.g., enumeration or linear programming with branch-and-bound) to solve a large-scale MCMCLP will be difficult. Seeking heuristic methods (e.g., genetic algorithm or Lagrangian relaxation) is important for promoting the applications of the MCMCLP. A potential heuristic method for solving the MCMCLP is a two-phase procedure, in which the locations of the facilities and the demand allocation are first determined under the assumption that the facilities are uncapacitated; the emergency vehicles are then allocated to each facility depending on the allocated demands. We note that this two-phase procedure does not consider that the second phase may change the demand allocation determined by the first phase, so the configuration of facility locations determined by the first phase is not necessarily optimal for the whole problem.

Although model formulation and the optimization of algorithms are always the focus in location modeling, many other aspects of the location problem, such as the representation of spatial demands, also influence the accuracy of the modeling solutions and require attention. An effective visualization of the problem solutions helps in understanding the location-allocation patterns and in making decisions by comparing different modeling results. One problem that we need to address for our MCMCLP models in the future is how to better represent on the map the demand objects served by multiple facilities.
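The two-phase procedure mentioned in the discussion can be sketched as follows. This is only one possible reading, under stated assumptions: phase 1 uses a simple greedy coverage rule as a stand-in for any uncapacitated location-allocation method, each demand object is allocated to its nearest chosen site, and phase 2 assigns the smallest number of equal-capacity vehicles that can carry each site's allocated demand. The function name `two_phase` and all input data are hypothetical, and the sketch assumes p does not exceed the number of candidate sites.

```python
import math


def two_phase(demand, dist, S, p, capacity):
    """Heuristic sketch: uncapacitated siting first, vehicle allocation second.

    demand: {i: amount}; dist: {i: {j: distance}}; S: covering standard;
    p: number of sites to choose; capacity: demand served per vehicle.
    """
    sites = sorted({j for i in dist for j in dist[i]})
    chosen, covered = [], set()

    # Phase 1a: greedy uncapacitated siting (assumed stand-in for an exact solver).
    for _ in range(p):
        def gain(j):
            return sum(demand[i] for i in demand
                       if i not in covered and dist[i][j] <= S)
        best = max((j for j in sites if j not in chosen), key=gain)
        chosen.append(best)
        covered |= {i for i in demand if dist[i][best] <= S}

    # Phase 1b: allocate every demand object to its nearest chosen site.
    allocation = {i: min(chosen, key=lambda j: dist[i][j]) for i in demand}

    # Phase 2: vehicles needed at each site for its allocated demand.
    load = {j: 0 for j in chosen}
    for i, j in allocation.items():
        load[j] += demand[i]
    vehicles = {j: math.ceil(load[j] / capacity) for j in chosen}
    return chosen, allocation, vehicles


demand = {"a": 9000, "b": 5000, "c": 3000}
dist = {"a": {"s1": 1, "s2": 9}, "b": {"s1": 2, "s2": 8}, "c": {"s1": 9, "s2": 1}}
chosen, allocation, vehicles = two_phase(demand, dist, S=3, p=2, capacity=8000)
```

As noted in the text, nothing in phase 2 feeds back into phase 1, so the resulting vehicle counts may exceed the available fleet and the site configuration is not guaranteed to be optimal for the capacitated problem.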

In the MCMCLP model, GIS plays an important role. It is used to manage and organize the spatial data, to realize the spatial demand representation, to help construct the model input file for optimization software packages, and to visualize the problem solution with maps. In addition to these important functions, GIS also facilitates theoretical advances in current location science (Church 2002, Murray 2010).

4.6 Conclusion

The MCMCLP that we propose in this article is an extension of the capacitated MCLP to accommodate situations where the facilities to be sited have several possible capacity levels. For the optimal siting of emergency vehicles, the MCMCLP considers the modular capacity levels of a facility, the allocation of all demands, and the proximity of the uncovered allocated demands to facilities. Two variants, the MCMCLP-NFC and the MCMCLP-FC, can be used depending on the circumstances of the facility. In cases where the cost of a facility is low and maximization of the covered allocated demands is the main purpose, such as establishing bases for ambulances that are not always based in a building but are often at a very rudimentary location such as a parking lot (Brotcorne et al. 2003), the MCMCLP-NFC may be more useful because it generally yields more covered allocated demands than the MCMCLP-FC. If the cost of facilities is also an important consideration, such as with fire stations for fire trucks, the MCMCLP-FC may be better because we can incorporate information about how many facilities we can build in the location modeling.

References

Adenso-Díaz, B. & Rodríguez, F., A simple search heuristic for the MCLP: Application to the location of ambulance bases in a rural region. Omega, 25 (2).
Balcik, B. & Beamon, B.M., Facility location in humanitarian relief. International Journal of Logistics: Research & Applications, 11 (2).
Bennett, V.L., Eaton, D.J. & Church, R.L., Selecting sites for rural health workers. Social Science & Medicine, 16 (1).
Berman, O. & Krass, D., The generalized maximal covering location problem. Computers & Operations Research, 29 (6).
Brotcorne, L., Laporte, G. & Semet, F., Ambulance location and relocation models. European Journal of Operational Research, 147 (3).
Chung, C., Schilling, D. & Carbone, R., The capacitated maximal covering problem: A heuristic. Proceedings of the Fourteenth Annual Pittsburgh Conference on Modeling and Simulation.
Church, R. & Revelle, C., The maximal covering location problem. Papers in Regional Science, 32 (1).
Church, R.L., Geographical information systems and location science. Computers & Operations Research, 29 (6).
Church, R.L., Stoms, D.M. & Davis, F.W., Reserve selection as a maximal covering location problem. Biological Conservation, 76 (2).
Correia, I. & Captivo, M.E., A Lagrangean heuristic for a modular capacitated location problem. Annals of Operations Research, 122 (1).
CPLEX Help, Branch and cut [online]. l# [Accessed 2011].

Current, J. & O'Kelly, M., Locating emergency warning sirens. Decision Sciences, 23 (1).
Current, J. & Storbeck, J., Capacitated covering models. Environment and Planning B, 15.
Daskin, M. & Dean, L., Location of health care facilities. Operations Research and Health Care.
DCA, Data and maps for planning [online]. [Accessed 2011].
Eaton, D.J., Daskin, M.S., Simmons, D., Bulloch, B. & Jansma, G., Determining emergency medical service vehicle deployment in Austin, Texas. Interfaces.
Griffin, P.M., Scherrer, C.R. & Swann, J.L., Optimization of community health center locations and service offerings with statistical need estimation. IIE Transactions, 40 (9).
Haghani, A., Capacitated maximum covering location models: Formulations and solution procedures. Journal of Advanced Transportation, 30 (3).
Henderson, S. & Mason, A., Ambulance service planning: Simulation and data visualisation. Operations Research and Health Care.
Indriasari, V., Mahmud, A.R., Ahmad, N. & Shariff, A.R.M., Maximal service area problem for optimal siting of emergency facilities. International Journal of Geographical Information Science, 24 (2).
Liao, K. & Guo, D., A clustering based approach to the capacitated facility location problem. Transactions in GIS, 12 (3).
Megiddo, N., Zemel, E. & Hakimi, S.L., The maximum coverage location problem: Northwestern University.
Murray, A.T., Advances in location modeling: GIS linkages and contributions. Journal of Geographical Systems, 12 (3).

Murray, A.T. & Gerrard, R.A., Capacitated service and regional constraints in location-allocation modeling. Location Science, 5 (2).
Murray, A.T. & O'Kelly, M.E., Assessing representation error in point-based coverage modeling. Journal of Geographical Systems, 4 (2).
OEMS, Office of emergency medical services/trauma operating report.
Pirkul, H. & Schilling, D.A., The maximal covering location problem with capacities on total workload. Management Science, 37 (2).
Ratick, S.J., Osleeb, J.P. & Hozumi, D., Application and extension of the Moore and ReVelle hierarchical maximal covering model. Socio-Economic Planning Sciences, 43 (2).
Tong, D. & Murray, A.T., Maximising coverage of spatial demand for service. Papers in Regional Science, 88 (1).
Verter, V. & Lapierre, S.D., Location of preventive health care facilities. Annals of Operations Research, 110 (1).
Yin, P. & Mu, L., Service area spatial demand representation in maximal coverage modeling. Manuscript submitted for publication.

CHAPTER 5

AN EMPIRICAL COMPARISON OF SPATIAL DEMAND REPRESENTATIONS IN MAXIMAL COVERAGE MODELING [4]

[4] Yin, P. and Mu, L. To be submitted to Environment and Planning B.

Abstract

Operationally representing spatial demand is necessary to apply location models to planning processes and is closely related to the efficiency of modeling solutions. A spatial demand representation should not only minimize representation error but also keep the complexity of the model as low as possible. Most current research, however, focuses primarily on assessing and reducing or eliminating representation error while ignoring the modeling complexity associated with demand representation. In this study, we use expressions of set theory to formalize a polygon-overlay-based demand representation called service area spatial demand representation (SASDR). Using the maximal covering location problem (MCLP) as an example, we empirically compare SASDR to widely used point-based and regular-area-based demand representations in terms of both problem complexity and representation error. Our study shows that, although the use of SASDR can eliminate some errors associated with other demand representations, problem complexity with SASDR could become extremely high as the number of potential facility sites increases, and the problem could become computationally intractable for exact methods in current optimization software. Point-based demand representation with fine granularity is sometimes a good alternative to SASDR because it can provide similarly effective modeling solutions while avoiding the extensive computation in GIS required to realize SASDR. Regular-area-based demand representation is not strongly recommended because of its poor performance compared to point-based demand representation with a similar problem complexity.

Keywords: MCLP, Spatial demand representation, Representation error, Problem complexity, GIS

5.1 Introduction

The fact that different scale and/or unit definitions in geographic analyses produce different results is known as the modifiable areal unit problem (MAUP) (Openshaw and Taylor 1981). The MAUP is important not only in general areas of geographic analysis, but also in location modeling, where the MAUP is manifested in aggregation and representation errors (Cromley et al. 2012). There has been a long history of study on aggregation error in location modeling, including p-median problems and covering location problems (Hillsman and Rhoda 1978, Goodchild 1979, Current and Schilling 1987, Daskin et al. 1989, Current and Schilling 1990, Hodgson and Neuman 1993, Bowerman et al. 1999, Francis et al. 2009, Cromley et al. 2012). More recently, representation error in location modeling, especially in covering location models, has started to receive more attention (Murray and O'Kelly 2002, Murray et al. 2008, Tong and Murray 2009, Cromley et al. 2012). For covering location modeling, it is common to assume that aggregated or continuous spatial demand is concentrated on a set of points or uniformly distributed within areal units. With respect to these point-based and area-based demand representations, several studies have focused on assessing the associated representation errors (Murray and O'Kelly 2002, Murray et al. 2008). Several other studies have tried to reduce or eliminate the representation errors through new covering model formulations (Murray 2005, Tong and Murray 2009). Unlike the traditional area-based representations that use census units or regular polygons, such as triangles or rectangles, as demand objects, Cromley et al. (2012) proposed a new area-based demand representation that partitions a continuous demand space using polygon overlay methods into a set of areal units called the least common demand coverage units (LCDCUs). This representation

approach, without complicated model formulations, could reduce or eliminate some errors associated with the traditional point-based and area-based representations. Current studies of spatial demand representations primarily focus on the evaluation of representation errors and how to reduce or eliminate these errors. However, the complexity of problems associated with demand representations is rarely discussed. Many covering location models, such as the maximal covering location problem (MCLP), have been proven to be nondeterministic polynomial time (NP)-hard (Megiddo et al. 1981), which means that no algorithm has been discovered yet to solve them in polynomial time in the worst case. The size of a covering location problem is closely related to the demand representation it adopts. Therefore, even if a demand representation approach may theoretically reduce or eliminate some representation errors in a problem, it could make the problem difficult, if not impossible, to solve using exact methods in current optimization software. Relying on heuristic algorithms to solve such a complicated problem may introduce other errors in the modeling results. Because Cromley et al.'s (2012) spatial demand representation with LCDCUs is based on the service area of a facility at each potential facility site, we refer to this representation as service area spatial demand representation (SASDR). In this paper, we use the MCLP as an example to empirically compare SASDR to the traditional point-based and regular-area-based representations, considering representation error and problem complexity simultaneously. Specifically, we evaluate the problem complexity associated with these three types of demand representations and compare their representation errors given similar degrees of problem complexity. This comparison is expected to provide some insight into how to choose appropriate demand representations in practical applications.

Although the question of how to realize

SASDR with GIS was briefly described in text by Cromley et al. (2012), it is worth formalizing the process of its realization for better precision and clarity. In the following two sections, more details about representation error and problem complexity in the MCLP are reviewed. Next, the formalization of SASDR is given and explained. Experimental designs for understanding the problem complexity and modeling errors associated with the three types of demand representations are then described, followed by the experimental results and discussions. Finally, some conclusions are offered.

5.2 Representation Error in Covering Location Modeling

In covering location modeling, aggregation and representation errors are related but fundamentally different. Murray and O'Kelly (2002) have noted that the aggregation of spatial information assumes there is one true lowest level of data. For example, the population at any higher level in the census hierarchy is an aggregation of the population at a lower level such as the census block level. Aggregation error occurs in any analysis conducted above the level of the individual or whenever a scale change occurs (Cromley et al. 2012). Compared with demand aggregation, demand representation usually has no hierarchy like that in census data. Individual demand is usually represented by the location point of that demand. Any aggregated or continuous demand is often assumed to be concentrated on a set of points or uniformly distributed within areal units. With different point or areal tessellations representing the same aggregated or continuous demand in a region, some modeling errors could occur. Such representation error is usually measured by comparing modeling results obtained with one spatial demand representation to those obtained with another at the same aggregation level.

It is a long-held tradition that continuous demand is represented by a set of discrete weighted points, where the weight represents the amount of demand for service at that point.

Many location models, including the MCLP, were proposed based on this kind of demand representation. Along with the development of GIS in location science, areal units have been used to represent continuous demand because of the 2-dimensional nature of demand space and the strong capability of GIS to manipulate 2-dimensional spatial objects (Miller 1996, Kim and Murray 2008, Murray et al. 2008, Tong et al. 2009, Tong and Murray 2009, Alexandris and Giannikos 2010). Figure 5.1 shows four examples of the traditional point-based and area-based representations for the demand in a region with three polygons. In Figure 5.1(a), the demand in each polygon is assumed to be concentrated on the centroid of that polygon or uniformly distributed within that polygon. Figure 5.1(b) shows a rectangular grid or its centroids used to represent the demand space, where the demand in each rectangle is assumed to be uniformly distributed or concentrated on its centroid. When the demand within each demand object cannot be obtained directly, which is very common, it may need to be estimated using areal interpolation techniques from other available demand data whose unit boundaries are inconsistent with those of the demand representation. In particular, intelligent areal interpolation methods, which are based on the principles of dasymetric mapping, can usually provide better estimates of the spatial heterogeneity of demand within areal units than simple areal interpolation methods do (Cromley et al. 2012).

Figure 5.1. Examples of spatial demand representations with (a) census blocks or their centroids, and (b) rectangle grid or its centroids

In many covering location models, the demand of a demand object has only a binary status: completely covered by a facility or not covered at all. In Figure 5.1, we assume a facility (the star) with circular service coverage is located in the region. According to the point-based demand representation in Figure 5.1(a), the demand within polygon C is considered covered by the facility because its centroid is within the service coverage. No demand in polygons A and B is considered covered because both of their centroids are outside the service coverage. In reality, however, a portion of the demand within polygon C is not covered, while a portion of the demand within polygons A and B is covered. Based on the area-based representation in Figure 5.1(a), no demand in the whole region is considered covered because none of these three polygons is completely within the service coverage. In fact, a portion of the demand in each of these three polygons is covered. A similar situation occurs when using the point-based or area-based demand representations in Figure 5.1(b). Assuming the demand estimate within each areal unit is real, we can see that point-based demand representation could either underestimate or overestimate the amount of real demand covered, whereas traditional area-based demand

123 representation could underestimate the amount of real demand covered. Such underestimation or overestimation will lead to modeling errors in both the total amount of covered demand estimated by the objective functions of models and the configuration of facilities given by the decision variables in model results. Based on the discussions by Casillas (1983) and Cromley et al. (2012), representation error is defined as the difference between the objective function values optimized for the same study area with two different demand representations. We use Cromley et al. s (2012) terminology and consider the following notation: f a is an objective function using representation a f b is an objective function using representation b x a is the optimal solution to the problem using representation a x b is the optimal solution to the problem using representation b Taking representation b as the reference, representation error is defined as follow: Representation error ( xa ) f b ( xb ) f ( x ) [ f a = Equation 5.1 b b ] Representation error can be decomposed into cost error and optimality error. Cost error is the difference between the objective function values of the same solution measured with two different demand representations, which is shown as follow: Cost error ( xa ) f b ( xa ) f ( x ) [ f a = Equation 5.2 b b ] 109

Optimality error is the difference between the objective function values of two optimal solutions measured with the same demand representation. It is defined as follows:

$$\text{Optimality error} = \frac{f_b(x_a) - f_b(x_b)}{f_b(x_b)} \qquad \text{(Equation 5.3)}$$

5.3 The MCLP Model and Problem Complexity

Given a covering standard for a service, such as a maximum distance or travel time, the objective of the MCLP is to locate a fixed number of facilities so as to provide service coverage for as much spatial demand as possible. Consider the following notation:

$I$ = the set of demand objects ($i$ as demand object index)
$J$ = the set of potential facility sites ($j$ as facility site index)
$d_{ij}$ = the travel distance or time from potential facility site $j$ to demand object $i$
$S$ = the distance or time beyond which a demand object is considered uncovered
$w_i$ = the demand for service at $i$
$p$ = the total number of facilities to be located
$x_j$ = 1 if facility site $j$ is selected, 0 otherwise
$y_i$ = 1 if demand $i$ is covered (or served), 0 otherwise
$a_{ij}$ = 1 if facility site $j$ is capable of serving demand $i$ (i.e., $d_{ij} \le S$), 0 otherwise
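The error definitions in Equations 5.1 to 5.3 and the MCLP quantities defined above can be tied together in a small numerical sketch. The example below solves a toy MCLP by exhaustive enumeration under a coarse representation a and a finer reference representation b, and then computes the three errors. Exhaustive search is used here only because the instance is tiny; it is not the solution method of this study. All weights, coverage sets, and function names are hypothetical, and `cover[i]` holds the site indices j for which a_ij = 1.

```python
from fractions import Fraction
from itertools import combinations


def covered_demand(sites, weights, cover):
    """Objective value: total weight of demand objects served by a chosen site.

    cover[i] is the set of site indices j with a_ij = 1.
    """
    chosen = set(sites)
    return sum(w for w, c in zip(weights, cover) if c & chosen)


def solve_mclp(n_sites, p, weights, cover):
    """Exhaustive search over all p-subsets of candidate sites (toy sizes only)."""
    return max(combinations(range(n_sites), p),
               key=lambda s: covered_demand(s, weights, cover))


# Hypothetical instance: 3 candidate sites, p = 1, total demand 100 in both cases.
N_SITES, P = 3, 1
# Reference representation b: four fine demand objects.
w_b = [30, 20, 40, 10]
cover_b = [{0}, {0, 1}, {1}, {2}]
# Coarser representation a: two aggregated demand objects.
w_a = [70, 30]
cover_a = [{0}, {2}]

x_a = solve_mclp(N_SITES, P, w_a, cover_a)  # optimum under representation a
x_b = solve_mclp(N_SITES, P, w_b, cover_b)  # optimum under representation b

f_a_xa = covered_demand(x_a, w_a, cover_a)
f_b_xa = covered_demand(x_a, w_b, cover_b)
f_b_xb = covered_demand(x_b, w_b, cover_b)

# Exact fractions avoid floating-point noise when checking the decomposition.
representation_error = Fraction(f_a_xa - f_b_xb, f_b_xb)
cost_error = Fraction(f_a_xa - f_b_xa, f_b_xb)
optimality_error = Fraction(f_b_xa - f_b_xb, f_b_xb)
```

In this toy instance the coarse representation overstates the covered demand of its own solution (positive cost error) while selecting a site that is suboptimal under the reference representation (negative optimality error); the two terms sum to the representation error.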

The formulation of the MCLP (Church and ReVelle 1974) is

$$\text{Maximize} \quad \sum_{i \in I} w_i y_i \qquad \text{(Equation 5.4)}$$

Subject to

$$\sum_{j \in J} a_{ij} x_j \ge y_i \quad \forall i \in I \qquad \text{(Equation 5.5)}$$

$$\sum_{j \in J} x_j = p \qquad \text{(Equation 5.6)}$$

$$x_j \in \{0, 1\} \quad \forall j \in J \qquad \text{(Equation 5.7)}$$

$$y_i \in \{0, 1\} \quad \forall i \in I \qquad \text{(Equation 5.8)}$$

The objective (Equation 5.4) seeks to maximize the amount of weighted demand covered. Constraints 5.5 require that demand $i$ can be covered only if at least one facility is located at a site whose service can cover demand $i$. Constraint 5.6 specifies the total number of facilities to be located. Constraints 5.7 and 5.8 impose integrality conditions on the decision variables.

The complexity of the MCLP mainly depends on the number of demand constraints (Equation 5.5) and the number of integrality constraints on decision variables (Equations 5.7 and 5.8). For each demand object (e.g., point or areal unit), if its demand weight is larger than 0 and it can be covered by a facility at a potential location, there will be a demand constraint and an integrality constraint associated with this demand object in the MCLP model. Each potential facility site also contributes an integrality constraint to the model. Therefore, the complexity of the MCLP is closely related to the spatial demand representation and the number of potential facility sites in an application. When census units or their centroids are used to represent demand, the number of demand objects is equal to the number of census units in the study area. However, when a point grid or a regular area grid is used to represent demand, the number of demand objects depends on the grid design, which is often arbitrary. In applications of the MCLP model, the size of the census unit or regular areal unit used for demand representation is usually smaller than the service coverage of a facility, for better accuracy of modeling results. Analysis based on a demand representation with finer granularity (i.e., smaller demand objects) is also expected to lead to smaller representation errors because more demand objects can be completely covered within the service coverage of a facility.

With respect to predefined potential facility sites, we need to consider multiple factors including cost, site availability, proximity to demand, and access to other services, which may vary widely within a region. More potential facility sites provide more facility configurations to choose from, which in turn can improve the optimality of the amount of demand covered by a given number of facilities. However, when more demand objects and potential facility sites are used to improve modeling results, the model can become dramatically more complex and pose a computational challenge for the exact methods in current commercial optimization software. Heuristic methods, such as genetic algorithms, provide alternative approaches to solving such complex location problems. However, they cannot guarantee optimal solutions, which could introduce other errors into modeling results, and they also require sophisticated algorithmic strategies and strong programming skills.

5.4 Service Area Spatial Demand Representation

SASDR was originally described by Cromley et al. (2012) as an area-based demand representation, with or without intelligent areal interpolation, that was compared with the census-centroid-based demand representation in terms of representation and scale error. In this section, we use expressions of set theory to formalize the realization of SASDR, which makes it easier to understand and to implement in different GIS software packages. In addition, we discuss both the representation error and the problem complexity of SASDR based on its concept.

The map overlay process has been used for approximately 50 years, and its multiple forms are important spatial analysis methods in GIS (McHarg and American Museum of Natural History 1969, Longley et al. 2005). SASDR is based on one of the map overlay operations. Considering two sets $A$ (rectangle) and $B$ (circle) in Figure 5.2(a), the overlay operation $A \otimes B$ is defined as below:

$$A \otimes B = \{X \mid X \in I \text{ and } X \neq \emptyset\}, \quad I = \{A \setminus B,\ A \cap B\} \qquad \text{(Equation 5.9)}$$

where $I$ is a two-member set in which, as shown in Figure 5.2(b), member $A \setminus B$ is the set of all elements that are members of $A$ but not members of $B$, and member $A \cap B$ is the set of all elements that are members of both $A$ and $B$. $A \otimes B$ is the set whose members are the non-empty members of $I$. Therefore, $A \otimes B$ can be a two-member set $\{A \setminus B, A \cap B\}$ when $A \not\subseteq B$ and $A \cap B \neq \emptyset$; a one-member set $\{A \setminus B\}$ when $A \neq \emptyset$, $A \not\subseteq B$, and $A \cap B = \emptyset$; a one-member set $\{A \cap B\}$ when $A \subseteq B$ and $A \cap B \neq \emptyset$; or the empty set $\emptyset$ when $A = \emptyset$.

Figure 5.2. Illustration of overlay operation $A \otimes B$: (a) set $A$ and set $B$, and (b) the result from $A \otimes B$

For a set of sets $C = \{C_i,\ i = 1, 2, 3, \ldots, n\}$ and a set $D$, the overlay operation $C \otimes D$ is defined as below:

$$C \otimes D = \bigcup_{i=1}^{n} (C_i \otimes D) \qquad \text{(Equation 5.10)}$$

Therefore, $C \otimes D$ is a set of sets consisting of all members of the sets obtained by conducting the overlay operation on each member $C_i$ of set $C$ with set $D$.

Because the set of potential facility sites and the service standard are given in our case, the service area at each potential facility site can be determined. Consider the following notation:

$U$ = the whole demand space
$S_j$ = the service area at potential facility site $j$ ($j = 1, 2, 3, \ldots, m$)

SASDR is defined as the partition of demand space $U$ into a finite demand object set $SA\_DOS$:

$$SA\_DOS = U \otimes S_1 \otimes S_2 \otimes S_3 \otimes \cdots \otimes S_m \qquad \text{(Equation 5.11)}$$

Each element $D \in SA\_DOS$ is defined as a demand object, also called an LCDCU following Cromley et al.'s (2012) terminology. The demand objects are mutually disjoint, and $\bigcup_{D \in SA\_DOS} D = U$. Figure 5.3(a) shows an example in which a rectangular demand space $U$ is partitioned into a SASDR by two potential facility sites $f_1$ and $f_2$ with circular service areas $S_1$ and $S_2$. First, demand space $U$ is partitioned by service area $S_1$, creating two demand objects $U \otimes S_1 = \{U \setminus S_1,\ U \cap S_1\}$ (Figure 5.3(b)). Then, service area $S_2$ is used to continue to partition the demand space $U$. A total of four demand objects

$$U \otimes S_1 \otimes S_2 = \{(U \cap S_1) \cap S_2,\ (U \cap S_1) \setminus S_2,\ (U \setminus S_1) \cap S_2,\ (U \setminus S_1) \setminus S_2\}$$

are created in the final SASDR (Figure 5.3(c)). Demand objects $(U \cap S_1) \cap S_2$ and $(U \cap S_1) \setminus S_2$ can be completely covered if a facility is located at site $f_1$, and demand objects $(U \cap S_1) \cap S_2$ and $(U \setminus S_1) \cap S_2$ can be completely covered if a facility is located at site $f_2$. Neither of the services can completely or partially cover demand object $(U \setminus S_1) \setminus S_2$. Although a simple circular shape is used in this demonstration, the facility service area could be any shape.

We can see that SASDR is fundamentally a simple map overlay-based approach. Compared to point-based demand representations, it uses areal demand units that can reduce the potential measurement and coverage errors caused by aggregating continuous demand to discrete point demands. Compared to traditional area-based demand representations using census units or a regular area grid, it has the advantage that every demand object is either completely covered or not covered at all by the service from any potential facility site. Without the partial coverage problem, modeling is more efficient than in approaches where partial coverage must be handled explicitly in the model to reduce modeling errors, such as those proposed by Murray (2005) and Tong and Murray (2009).
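The overlay operation and the resulting partition of U can be sketched with ordinary finite sets standing in for spatial regions. The example below is an illustrative sketch, not the GIS implementation used in this study: regions are modeled as sets of cell identifiers, and the function names, the one-dimensional demand space, and the two service areas are hypothetical.

```python
from functools import reduce


def overlay(a, b):
    """Overlay of set a with set b: the non-empty members of {a - b, a & b}."""
    parts = (frozenset(a) - frozenset(b), frozenset(a) & frozenset(b))
    return {part for part in parts if part}


def overlay_family(c, d):
    """Overlay of a set of sets c with a set d: union of overlay(c_i, d)."""
    result = set()
    for c_i in c:
        result |= overlay(c_i, d)
    return result


def sasdr(u, service_areas):
    """Partition demand space u by applying each service area in turn."""
    return reduce(overlay_family, service_areas, {frozenset(u)})


# Hypothetical demand space of ten cells and two overlapping service areas.
U = set(range(10))
S1 = {2, 3, 4, 5}
S2 = {4, 5, 6, 7}
demand_objects = sasdr(U, [S1, S2])

# Demand objects completely covered by a facility at each site.
covered_by_f1 = {d for d in demand_objects if d <= S1}
covered_by_f2 = {d for d in demand_objects if d <= S2}
```

With two overlapping service areas this yields four demand objects, matching the example in Figure 5.3: one covered by both sites, one by each site alone, and one covered by neither. Because every demand object is a subset of each service area or disjoint from it, the subset test `d <= S1` is enough to build the coverage indicators.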

Figure 5.3. The SASDR with circular facility service areas: (a) demand space U and two potential service areas S1 and S2, (b) the partition of demand space U with service area S1, and (c) the partition of demand space U with both service areas S1 and S2

Unlike point-based and traditional area-based demand representations, where the number of demand objects is independent of the configuration of potential facility sites, the number and arrangement of demand objects in SASDR are completely determined by the service standard and the configuration of potential facility sites in an application. In other words, the complexity of an MCLP model using SASDR is a function of the combination of the service standard and the configuration of potential facility sites. This could be a problem when a high density of potential facility sites is needed.

5.5 Experimental Design

Unlike previous studies, in which comparisons of spatial demand representations focus only on representation error, we also consider the problem complexity associated with spatial demand representations. An increase in the number of demand objects or potential facility sites is expected to reduce representation error and improve the optimality of modeling solutions. In our experiments, we focus on the following two questions: (1) How does the complexity of a problem using SASDR change when the service standard and the configuration of potential facility sites vary? (2) Given similar degrees of problem complexity, is there a large representation error between SASDR and other types of demand representations, including point-based and traditional area-based approaches?

The study area in the experiments is the City of Decatur, Georgia, which has an area of approximately 4.2 square miles. The 2010 U.S. Census population data at the block level are used to estimate the demand of each spatial object in all representations. To improve the accuracy of the demand estimation, we use the 2010 land use data showing developed and undeveloped areas as ancillary data and overlay it on the census population data so that all population is constrained within the developed areas. The 2010 land use data were downloaded from the website of the Atlanta Regional Commission (ARC 2012).

To address question 1, we design three modes for potential facility sites, including one regular pattern and two irregular patterns, as shown in Figure 5.4. Figure 5.4(a) shows regular grid points with spacing R. Figure 5.4(b) shows the centroids of all census blocks, and Figure 5.4(c) shows all intersections of major roads in the study area. The GIS data for both census blocks and major roads came from the 2010 Census data. For the mode of regular grid points in Figure 5.4(a), we set the spacing R to five values (500 m, 400 m, 300 m, 250 m, and 200 m), which produce 42, 66, 116, 177, and 272 potential facility sites, respectively. Then, the same numbers of potential facility sites are randomly chosen from the centroids of census blocks in Figure 5.4(b) and from the intersections of major roads in Figure 5.4(c). In total, we have 15 configurations of potential facility sites, combining three modes (regular grid points, centroids of census blocks, and intersections of roads) and five different numbers of sites (42, 66, 116, 177, and 272). With respect to the service standard of facilities, we define circular service coverage with three different radii: 300 m, 650 m, and 1000 m. With each combination of service standard and configuration of potential facility sites, we create a SASDR and record the number of demand objects.

Figure 5.4. Three modes of potential facility sites: (a) regular grid points with spacing R, (b) centroids of census blocks, and (c) intersections of major roads
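The way the number of SASDR demand objects responds to the service radius and the site configuration can be explored with a raster approximation. The sketch below is an assumption-laden illustration rather than the GIS overlay workflow used in the experiments: it samples a hypothetical rectangular study area (not the City of Decatur) at cell centers, records for each cell the set of sites whose circular service area contains it, and counts the distinct sets, each of which corresponds to one demand object under the set-based definition of SASDR. Small slivers narrower than the cell size can be missed, and the function names and dimensions are hypothetical.

```python
def grid_sites(width, height, spacing):
    """Regular grid of candidate sites with the given spacing (meters)."""
    return [(x, y)
            for x in range(0, width + 1, spacing)
            for y in range(0, height + 1, spacing)]


def count_demand_objects(width, height, sites, radius, cell=50):
    """Approximate the number of SASDR demand objects on a raster of cell centers.

    Each distinct set of covering sites (including the empty set) is counted
    as one demand object.
    """
    signatures = set()
    r2 = radius ** 2
    for cx in range(cell // 2, width, cell):
        for cy in range(cell // 2, height, cell):
            signature = frozenset(
                j for j, (sx, sy) in enumerate(sites)
                if (cx - sx) ** 2 + (cy - sy) ** 2 <= r2
            )
            signatures.add(signature)
    return len(signatures)


# Hypothetical 2000 m x 2000 m study area; spacings and radii echo the text.
WIDTH = HEIGHT = 2000
counts = {
    (spacing, radius): count_demand_objects(
        WIDTH, HEIGHT, grid_sites(WIDTH, HEIGHT, spacing), radius)
    for spacing in (500, 400)
    for radius in (300, 650, 1000)
}
```

Each entry of `counts` maps a (spacing, radius) pair to the approximate number of demand objects, which is the quantity recorded for each combination in the experiments.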


More information

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE CO-282 POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE KYRIAKIDIS P. University of California Santa Barbara, MYTILENE, GREECE ABSTRACT Cartographic areal interpolation

More information

Applying Health Outcome Data to Improve Health Equity

Applying Health Outcome Data to Improve Health Equity Applying Health Outcome Data to Improve Health Equity Devon Williford, MPH, Health GIS Specialist Lorraine Dixon-Jones, Policy Analyst CDPHE Health Equity and Environmental Justice Collaborative Mile High

More information

Comparison of spatial methods for measuring road accident hotspots : a case study of London

Comparison of spatial methods for measuring road accident hotspots : a case study of London Journal of Maps ISSN: (Print) 1744-5647 (Online) Journal homepage: http://www.tandfonline.com/loi/tjom20 Comparison of spatial methods for measuring road accident hotspots : a case study of London Tessa

More information

An online data and consulting resource of THE UNIVERSITY OF TOLEDO THE JACK FORD URBAN AFFAIRS CENTER

An online data and consulting resource of THE UNIVERSITY OF TOLEDO THE JACK FORD URBAN AFFAIRS CENTER An online data and consulting resource of THE JACK FORD URBAN AFFAIRS CENTER THE CENTER FOR GEOGRAPHIC INFORMATION SCIENCE AND APPLIED GEOGRAPHICS DEPARTMENT OF GEOGRAPHY AND PLANNING THE UNIVERSITY OF

More information

Linkage Methods for Environment and Health Analysis General Guidelines

Linkage Methods for Environment and Health Analysis General Guidelines Health and Environment Analysis for Decision-making Linkage Analysis and Monitoring Project WORLD HEALTH ORGANIZATION PUBLICATIONS Linkage Methods for Environment and Health Analysis General Guidelines

More information

A Comparison of Three Exploratory Methods for Cluster Detection in Spatial Point Patterns

A Comparison of Three Exploratory Methods for Cluster Detection in Spatial Point Patterns A. Stewart Fotheringham and F. Benjamin Zhan A Comparison of Three Exploratory Methods for Cluster Detection in Spatial Point Patterns This paper compares the performances of three explorato y methods

More information

ASSESSING AND EVALUATING RECREATION RESOURCE IMPACTS: SPATIAL ANALYTICAL APPROACHES. Yu-Fai Leung

ASSESSING AND EVALUATING RECREATION RESOURCE IMPACTS: SPATIAL ANALYTICAL APPROACHES. Yu-Fai Leung ASSESSING AND EVALUATING RECREATION RESOURCE IMPACTS: SPATIAL ANALYTICAL APPROACHES by Yu-Fai Leung Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial

More information

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall Announcement New Teaching Assistant Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall 209 Email: michael.harrigan@ttu.edu Guofeng Cao, Texas Tech GIST4302/5302, Lecture 2: Review of Map Projection

More information

Using Geographic Information Systems for Exposure Assessment

Using Geographic Information Systems for Exposure Assessment Using Geographic Information Systems for Exposure Assessment Ravi K. Sharma, PhD Department of Behavioral & Community Health Sciences, Graduate School of Public Health, University of Pittsburgh, Pittsburgh,

More information

John Laznik 273 Delaplane Ave Newark, DE (302)

John Laznik 273 Delaplane Ave Newark, DE (302) Office Address: John Laznik 273 Delaplane Ave Newark, DE 19711 (302) 831-0479 Center for Applied Demography and Survey Research College of Human Services, Education and Public Policy University of Delaware

More information

Medical GIS: New Uses of Mapping Technology in Public Health. Peter Hayward, PhD Department of Geography SUNY College at Oneonta

Medical GIS: New Uses of Mapping Technology in Public Health. Peter Hayward, PhD Department of Geography SUNY College at Oneonta Medical GIS: New Uses of Mapping Technology in Public Health Peter Hayward, PhD Department of Geography SUNY College at Oneonta Invited research seminar presentation at Bassett Healthcare. Cooperstown,

More information

Applications of GIS in Health Research. West Nile virus

Applications of GIS in Health Research. West Nile virus Applications of GIS in Health Research West Nile virus Outline Part 1. Applications of GIS in Health research or spatial epidemiology Disease Mapping Cluster Detection Spatial Exposure Assessment Assessment

More information

Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms

Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms Arthur Getis* and Jared Aldstadt** *San Diego State University **SDSU/UCSB

More information

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME:

GEOGRAPHY 350/550 Final Exam Fall 2005 NAME: 1) A GIS data model using an array of cells to store spatial data is termed: a) Topology b) Vector c) Object d) Raster 2) Metadata a) Usually includes map projection, scale, data types and origin, resolution

More information

Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones

Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones Prepared for consideration for PAA 2013 Short Abstract Empirical research

More information

Victor C. NNAM, Bernard O. EKPETE and Obinna C. D. ANEJIONU, Nigeria

Victor C. NNAM, Bernard O. EKPETE and Obinna C. D. ANEJIONU, Nigeria IMPROVING STREET GUIDE MAPPING OF ENUGU SOUTH URBAN AREA THROUGH COMPUTER AIDED CARTOGRAPHY By Victor C. NNAM, Bernard O. EKPETE and Obinna C. D. ANEJIONU, Nigeria Presented at FIG Working Week 2012 Knowing

More information

Using Geospatial Methods with Other Health and Environmental Data to Identify Populations

Using Geospatial Methods with Other Health and Environmental Data to Identify Populations Using Geospatial Methods with Other Health and Environmental Data to Identify Populations Ellen K. Cromley, PhD Consultant, Health Geographer ellen.cromley@gmail.com Purpose and Outline To illustrate the

More information

Spatial Thinking and Modeling of Network-Based Problems

Spatial Thinking and Modeling of Network-Based Problems Spatial Thinking and Modeling of Network-Based Problems Presentation at the SPACE Workshop Columbus, Ohio, July 1, 25 Shih-Lung Shaw Professor Department of Geography University of Tennessee Knoxville,

More information

Content Area: Social Studies Standard: 1. History Prepared Graduates: Develop an understanding of how people view, construct, and interpret history

Content Area: Social Studies Standard: 1. History Prepared Graduates: Develop an understanding of how people view, construct, and interpret history Standard: 1. History Develop an understanding of how people view, construct, and interpret history 1. Organize and sequence events to understand the concepts of chronology and cause and effect in the history

More information

Scalable Bayesian Event Detection and Visualization

Scalable Bayesian Event Detection and Visualization Scalable Bayesian Event Detection and Visualization Daniel B. Neill Carnegie Mellon University H.J. Heinz III College E-mail: neill@cs.cmu.edu This work was partially supported by NSF grants IIS-0916345,

More information

Shape and scale in detecting disease clusters

Shape and scale in detecting disease clusters University of Iowa Iowa Research Online Theses and Dissertations 2008 Shape and scale in detecting disease clusters Soumya Mazumdar University of Iowa Copyright 2008 Soumya Mazumdar This dissertation is

More information

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX Well, it depends on where you're born: A practical application of geographically weighted regression to the study of infant mortality in the U.S. P. Johnelle Sparks and Corey S. Sparks 1 Introduction Infant

More information

ENV208/ENV508 Applied GIS. Week 1: What is GIS?

ENV208/ENV508 Applied GIS. Week 1: What is GIS? ENV208/ENV508 Applied GIS Week 1: What is GIS? 1 WHAT IS GIS? A GIS integrates hardware, software, and data for capturing, managing, analyzing, and displaying all forms of geographically referenced information.

More information

Are You Maximizing The Value Of All Your Data?

Are You Maximizing The Value Of All Your Data? Are You Maximizing The Value Of All Your Data? Using The SAS Bridge for ESRI With ArcGIS Business Analyst In A Retail Market Analysis SAS and ESRI: Bringing GIS Mapping and SAS Data Together Presented

More information

Modelling Accessibility to General Hospitals in Ireland

Modelling Accessibility to General Hospitals in Ireland Modelling Accessibility to General Hospitals in Ireland Stamatis Kalogirou 1,*, Ronan Foley 2 1. National Centre for Geocomputation, John Hume Building, NUI Maynooth, Maynooth, Co. Kildare, Ireland, Tel:

More information

How GIS can be used for improvement of literacy and CE programmes

How GIS can be used for improvement of literacy and CE programmes How GIS can be used for improvement of literacy and CE programmes Training Workshop for Myanmar Literacy Resource Center (MLRC) ( Yangon, Myanmar, 11 20 October 2000 ) Presented by U THEIN HTUT GEOCOMP

More information

This report details analyses and methodologies used to examine and visualize the spatial and nonspatial

This report details analyses and methodologies used to examine and visualize the spatial and nonspatial Analysis Summary: Acute Myocardial Infarction and Social Determinants of Health Acute Myocardial Infarction Study Summary March 2014 Project Summary :: Purpose This report details analyses and methodologies

More information

Analyzing the Geospatial Rates of the Primary Care Physician Labor Supply in the Contiguous United States

Analyzing the Geospatial Rates of the Primary Care Physician Labor Supply in the Contiguous United States Analyzing the Geospatial Rates of the Primary Care Physician Labor Supply in the Contiguous United States By Russ Frith Advisor: Dr. Raid Amin University of W. Florida Capstone Project in Statistics April,

More information

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB SPACE Workshop NSF NCGIA CSISS UCGIS SDSU Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB August 2-8, 2004 San Diego State University Some Examples of Spatial

More information

GEOGRAPHIC INFORMATION SYSTEMS Session 8

GEOGRAPHIC INFORMATION SYSTEMS Session 8 GEOGRAPHIC INFORMATION SYSTEMS Session 8 Introduction Geography underpins all activities associated with a census Census geography is essential to plan and manage fieldwork as well as to report results

More information

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS Srinivasan R and Venkatesan P Dept. of Statistics, National Institute for Research Tuberculosis, (Indian Council of Medical Research),

More information

An Introduction to SaTScan

An Introduction to SaTScan An Introduction to SaTScan Software to measure spatial, temporal or space-time clusters using a spatial scan approach Marilyn O Hara University of Illinois moruiz@illinois.edu Lecture for the Pre-conference

More information

Urban GIS for Health Metrics

Urban GIS for Health Metrics Urban GIS for Health Metrics Dajun Dai Department of Geosciences, Georgia State University Atlanta, Georgia, United States Presented at International Conference on Urban Health, March 5 th, 2014 People,

More information

Mapping and Analysis for Spatial Social Science

Mapping and Analysis for Spatial Social Science Mapping and Analysis for Spatial Social Science Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign http://sal.agecon.uiuc.edu Outline

More information

The Scope and Growth of Spatial Analysis in the Social Sciences

The Scope and Growth of Spatial Analysis in the Social Sciences context. 2 We applied these search terms to six online bibliographic indexes of social science Completed as part of the CSISS literature search initiative on November 18, 2003 The Scope and Growth of Spatial

More information

Role of GIS in Tracking and Controlling Spread of Disease

Role of GIS in Tracking and Controlling Spread of Disease Role of GIS in Tracking and Controlling Spread of Disease For Dr. Baqer Al-Ramadan By Syed Imran Quadri CRP 514: Introduction to GIS Introduction Problem Statement Objectives Methodology of Study Literature

More information

Introduction to Geographic Information Science. Updates/News. Last Lecture 1/23/2017. Geography 4103 / Spatial Data Representations

Introduction to Geographic Information Science. Updates/News. Last Lecture 1/23/2017. Geography 4103 / Spatial Data Representations Geography 4103 / 5103 Introduction to Geographic Information Science Spatial Data Representations Updates/News Waitlisted students First graded lab this week: skills learning Instructional labs vs. independence

More information

Rapid detection of spatiotemporal clusters

Rapid detection of spatiotemporal clusters Rapid detection of spatiotemporal clusters Markus Loecher, Berlin School of Economics and Law July 2nd, 2015 Table of contents Motivation Spatial Plots in R RgoogleMaps Spatial cluster detection Spatiotemporal

More information

Geographic Information Systems A GIS Primer for Public Health. Capacity Building Workshop October 19, 2009

Geographic Information Systems A GIS Primer for Public Health. Capacity Building Workshop October 19, 2009 Geographic Information Systems A for Public Health Capacity Building Workshop October 19, 2009 Agenda Welcome (:10) OAHPP GIS Workshop Series (:10) GIS: A Four Letter Word (:10) The Fundamentals (:30)

More information

GIS and Spatial Statistics: One World View or Two? Michael F. Goodchild University of California Santa Barbara

GIS and Spatial Statistics: One World View or Two? Michael F. Goodchild University of California Santa Barbara GIS and Spatial Statistics: One World View or Two? Michael F. Goodchild University of California Santa Barbara Location as attribute The data table Census summary table What value is location as an explanatory

More information

Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS

Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS Study Guide: Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS This guide presents some study questions with specific referral to the essential

More information

Using GIS to Brief New York City Public Officials after September 11

Using GIS to Brief New York City Public Officials after September 11 Using GIS to Brief New York City Public Officials after September 11 Presented by Zvia Segal Naphtali, Ph.D. and Leonard M. Naphtali, Ph.D. Presented at the ESRI International Health GIS Conference, May

More information

Tracey Farrigan Research Geographer USDA-Economic Research Service

Tracey Farrigan Research Geographer USDA-Economic Research Service Rural Poverty Symposium Federal Reserve Bank of Atlanta December 2-3, 2013 Tracey Farrigan Research Geographer USDA-Economic Research Service Justification Increasing demand for sub-county analysis Policy

More information

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit 2011/04 LEUKAEMIA IN WALES 1994-2008 Welsh Cancer Intelligence and Surveillance Unit Table of Contents 1 Definitions and Statistical Methods... 2 2 Results 7 2.1 Leukaemia....... 7 2.2 Acute Lymphoblastic

More information

METHODS FOR STATISTICS

METHODS FOR STATISTICS DYNAMIC CARTOGRAPHIC METHODS FOR VISUALIZATION OF HEALTH STATISTICS Radim Stampach M.Sc. Assoc. Prof. Milan Konecny Ph.D. Petr Kubicek Ph.D. Laboratory on Geoinformatics and Cartography, Department of

More information