ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2

Similar documents
Community Health Needs Assessment through Spatial Regression Modeling

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

This report details analyses and methodologies used to examine and visualize the spatial and nonspatial

Bayesian Hierarchical Models

Sidestepping the box: Designing a supplemental poverty indicator for school neighborhoods

Maggie M. Kovach. Department of Geography University of North Carolina at Chapel Hill

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX

Medical GIS: New Uses of Mapping Technology in Public Health. Peter Hayward, PhD Department of Geography SUNY College at Oneonta

DEVELOPING DECISION SUPPORT TOOLS FOR THE IMPLEMENTATION OF BICYCLE AND PEDESTRIAN SAFETY STRATEGIES

Defining Statistically Significant Spatial Clusters of a Target Population using a Patient-Centered Approach within a GIS

Overview of Statistical Analysis of Spatial Data

Do the Causes of Poverty Vary by Neighborhood Type?

Aggregated cancer incidence data: spatial models

Hierarchical Modeling and Analysis for Spatial Data

NEW YORK DEPARTMENT OF SANITATION. Spatial Analysis of Complaints

Spatial Variation in Hospitalizations for Cardiometabolic Ambulatory Care Sensitive Conditions Across Canada

FleXScan User Guide. for version 3.1. Kunihiko Takahashi Tetsuji Yokoyama Toshiro Tango. National Institute of Public Health

Neighborhood social characteristics and chronic disease outcomes: does the geographic scale of neighborhood matter? Malia Jones

Spatial Disparities in the Distribution of Parks and Green Spaces in the United States

Rural Pennsylvania: Where Is It Anyway? A Compendium of the Definitions of Rural and Rationale for Their Use

Measuring community health outcomes: New approaches for public health services research

Jun Tu. Department of Geography and Anthropology Kennesaw State University

A Geostatistical Approach to Linking Geographically-Aggregated Data From Different Sources

emerge Network: CERC Survey Survey Sampling Data Preparation

GRADUATE CERTIFICATE PROGRAM

Analysis of Marked Point Patterns with Spatial and Non-spatial Covariate Information

In matrix algebra notation, a linear model is written as

Approaches for Multiple Disease Mapping: MCAR and SANOVA

Spatial Variation in Local Road Pedestrian and Bicycle Crashes

Exploratory Spatial Data Analysis (ESDA)

Environmental Analysis, Chapter 4 Consequences, and Mitigation

ESRI 2008 Health GIS Conference

Urban GIS for Health Metrics

Frontier and Remote (FAR) Area Codes: A Preliminary View of Upcoming Changes John Cromartie Economic Research Service, USDA

Integrating GIS into Food Access Analysis

Nature of Spatial Data. Outline. Spatial Is Special

Cluster Analysis using SaTScan

Deriving Spatially Refined Consistent Small Area Estimates over Time Using Cadastral Data

Geospatial Analysis of Job-Housing Mismatch Using ArcGIS and Python

Administrative Data Research Facility Linked HMDA and ACS Database

Understanding China Census Data with GIS By Shuming Bao and Susan Haynie China Data Center, University of Michigan

Spatial and Socioeconomic Analysis of Commuting Patterns in Southern California Using LODES, CTPP, and ACS PUMS

Variables and Variable De nitions

Combining Incompatible Spatial Data

Big-Geo-Data EHR Infrastructure Development for On-Demand Analytics

BROOKINGS May

Announcing a Course Sequence: March 9 th - 10 th, March 11 th, and March 12 th - 13 th 2015 Historic Charleston, South Carolina

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM

A Comprehensive Method for Identifying Optimal Areas for Supermarket Development. TRF Policy Solutions April 28, 2011

OBESITY AND LOCATION IN MARION COUNTY, INDIANA MIDWEST STUDENT SUMMIT, APRIL Samantha Snyder, Purdue University

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

emerge Network: CERC Survey Survey Sampling Data Preparation

Spatial Trends of unpaid caregiving in Ireland

Known unknowns : using multiple imputation to fill in the blanks for missing data

Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

MODULE 12: Spatial Statistics in Epidemiology and Public Health Lecture 7: Slippery Slopes: Spatially Varying Associations

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Long Island Breast Cancer Study and the GIS-H (Health)

GIS Spatial Statistics for Public Opinion Survey Response Rates

Mapping Accessibility Over Time

Outline. 15. Descriptive Summary, Design, and Inference. Descriptive summaries. Data mining. The centroid

Michael Harrigan Office hours: Fridays 2:00-4:00pm Holden Hall

Spotlight on Population Resources for Geography Teachers. Pat Beeson, Education Services, Australian Bureau of Statistics

Varieties of Count Data

The Use of Spatial Weights Matrices and the Effect of Geometry and Geographical Scale

Inclusion of Non-Street Addresses in Cancer Cluster Analysis

Analyzing spatial autoregressive models using Stata

GEO 463-Geographic Information Systems Applications. Lecture 1

SPATIAL ANALYSIS. Transformation. Cartogram Central. 14 & 15. Query, Measurement, Transformation, Descriptive Summary, Design, and Inference

The Scope and Growth of Spatial Analysis in the Social Sciences

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Analysis of Bank Branches in the Greater Los Angeles Region

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Models for Count and Binary Data. Poisson and Logistic GWR Models. 24/07/2008 GWR Workshop 1

WEB application for the analysis of spatio-temporal data

Census Geography, Geographic Standards, and Geographic Information

Speakers: Jeff Price, Federal Transit Administration Linda Young, Center for Neighborhood Technology Sofia Becker, Center for Neighborhood Technology

Commuting in Northern Ireland: Exploring Spatial Variations through Spatial Interaction Modelling

Small-Area Population Forecasting Using a Spatial Regression Approach

Gaussian Process Regression Model in Spatial Logistic Regression

Spatial Regression. 1. Introduction and Review. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Demographic Data in ArcGIS. Harry J. Moore IV

Tracey Farrigan Research Geographer USDA-Economic Research Service

Dengue Forecasting Project

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

GIS for the Non-Expert

Scottish Atlas of Variation

Chapter 6 Spatial Analysis

Encapsulating Urban Traffic Rhythms into Road Networks

2009 ESRI User Conference San Diego, CA

ArcGIS Online Analytics. Mike Flanagan

Bayesian Spatial Health Surveillance

WU Weiterbildung. Linear Mixed Models

Finding Hot Spots in ArcGIS Online: Minimizing the Subjectivity of Visual Analysis. Nicholas M. Giner Esri Parrish S.

Mining and Visualizing Spatial Interaction Patterns for Pandemic Response

Multivariate spatial modeling

Bayesian Estimation of Prediction Error and Variable Selection in Linear Regression

Lecture 3: Exploratory Spatial Data Analysis (ESDA) Prof. Eduardo A. Haddad

Transcription:

ARIC Manuscript Proposal # 1186 PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2 1.a. Full Title: Comparing Methods of Incorporating Spatial Correlation in Models of the Association Between Hospitalized Myocardial Infarction and the Burden of Socioeconomic Status in Communities b. Abbreviated Title (Length 26 characters): Modeling Neighborhood SES 2. Writing Group: Kuo-Ping Li, Gerardo Heiss, Kathryn Rose, C. Suchindran, Eric Whitsel, Joy Wood. We welcome others who are interested. I, the first author, confirm that all the coauthors have given their approval for this manuscript proposal. X [please confirm with your initials electronically or in writing] First author: Kuo-Ping Li, PhD Address: Dept. of Biostatistics CB# 7420, McGavran-Greenberg Hall University of North Carolina Chapel Hill, NC 27599-7420 Phone: 919-843-6296 Fax: 919-843-4630 Email: kpli@email.unc.edu Corresponding/senior author (if different from first author correspondence will be sent to both the first author & the corresponding author): Kathryn Rose Cardiovascular Disease Program 137 E Franklin St, Suite 306 Chapel Hill NC 27514 E-mail: kathryn_rose@unc.edu

3. Timeline: Analyses to begin in Summer 2006. Draft of manuscript is expected during Fall 2006. 5. Rationale: Regression models are widely used in public health studies to relate explanatory variables to the outcomes. The data used in regression analysis are often (perhaps too often) assumed to be independently distributed. However, this assumption can be an oversimplification. This is especially true for spatial data as it is well known that observations that are close to each other are more similar than those that are far apart [1]. Thus, models that consider spatial correlation are more desirable for spatial data. In general, spatial models offer two types of benefit. The first considers the loss of information introduced by spatial correlation relative to independent samples of the same size. In this situation, incorporating the correlation into the model improves the accuracy of statistical inference. Another benefit of spatial models derives from the view that spatial correlation itself is an extra source of information. For example, when analyzing incidence of a rare disease, spatial correlation can be used to stabilize low counts in a region by borrowing strength from nearby regions. In ARIC ancillary study # 2004.05 (The Burden of SES in Communities), addresses abstracted from hospital records and death certificates for ARIC surveillance events have been geocoded so that neighborhood (census tract level) socioeconomic characteristics can be linked with event records. This allows an examination of the association between neighborhood socioeconomic characteristics and MI occurrence, treatment, and survival. Through georeferencing it becomes possible to incorporate spatial correlation into the analysis. One of the aims of this study is to examine whether the variation in socioeconomic characteristics of different spatial regions contributes to the observed variation in the rates of hospitalized MI events in these regions. As mentioned above, including spatial correlations in the model can improve the model accuracy as well as stabilize the observed event counts which can be quite low in many regions. Some methodological issues have to be considered in order to treat the spatial correlation properly. Three types of spatial models are commonly used, i.e., (1) fixed effect general linear models with spatial covariance structures (spatial GLM) [2-4], (2) mixed effect models with spatial random effects (spatial GLMM) [5], and (3) computational Bayesian spatial models [6-7]. In general, the choice of spatial model can greatly affect the results of statistical estimation. The first purpose of this manuscript is to compare these three modeling approaches and examine their strengths and shortcomings in the context of ARIC community surveillance. Second, all the spatial models mentioned above rely on user-specified neighbor structure, that is, a description of geographical relationships of spatial regions under consideration. The neighbor structure can be defined based on region-to-region distances, the adjacency of regions, or some weighted combination of these (for example, weighted

by population size). The spatial units considered in our models are the census tracts in the four ARIC community surveillance study sites. These census tracts vary widely in several ways: land area, number of neighbors within a given distance, number of nearest neighbors, etc. For example, the land area of the 75 census tracts in Forsyth County, NC varies from 0.8 to 78 km 2, and the number of neighboring tracts within a radius of 10 km varies from 3 to 45. There is less variation in the number of tracts adjacent to a tract, which ranges from 2 to 8. Moreover, these differently sized tracts are not randomly located but are highly clustered tracts in downtown areas tend to be small and to have many neighbors; suburban and rural tracts are large and have fewer neighbors. This implies that the neighbor structure based on tract-tract distances would be very different from neighbor structure based on adjacency of tracts. There is currently no consensus on the best approach to construct the neighbor structure in a given context. Thus, the second purpose of this manuscript is to construct the expressions of distance- and adjacencybased neighbor structures of ARIC study sites, compare the results of the two approaches, and to discuss their uses for spatial analysis in the ARIC study. 5. Main Hypothesis/Study Questions: We will consider the following three types of models: 1. Spatial GLM, 2. Spatial GLMM, 3. Computational Bayesian spatial models. Two ways of constructing the neighbor structure will be used to describe the spatial relationship of tracts: 1. Based on tract-tract distance, 2. Based on adjacency of tracts. We will attempt to answer the following questions: 1. Do different model-neighbor structure combinations give different estimates of model parameters? If so, what factors contribute to the differences? 2. Within the context of ARIC community surveillance, what would be the strengths and shortcomings of each modeling approach? Data: 1. The outcomes: aggregated hospitalized MI event counts for the period 1999-2001 of census tracts within the ARIC study sites, according to the patients places of residence, sex, race (black, white), and 5-year age groups. Expected event counts are also calculated using internal standardization. 2. Covariates: Census tract-level socioeconomic status (SES) data from the year 2000, including median household income, sex ratio of population, ratio of black population to white population, and age distribution. 3. Spatial information of census tracts: the coordinate (latitude, longitude) of census tract centroids

Exclusions: We will include hospitalized, definite or probable MI surveillance events. We will limit cases to those occurring in 1993 and later, as addresses were not abstracted prior to this time. We will exclude ungeocodable events, geocoded events outside ARIC community boundaries, as well as Blacks in Washington County and Minneapolis. Analyses: We will use Poisson regression as the common form to model the observed incident counts in terms of the expected counts, SES covariates, and tracts neighbor structures. For spatial GLM, we will construct a model in which the outcomes are treated as independent, and all spatial variations on the outcomes are described by the fixed, location-dependent covariate effect and spatially correlated residues. For spatial GLMM, we will construct a model similar to spatial GLM but with spatial random effects with non-spatial residues. For the computational Bayesian model, we will construct a Bayesian counterpart of the spatial GLMM, and use computational Bayesian techniques to estimate the model parameters. In all three models, we will use conditional autoregressive (CAR) models to describe the spatial correlation. The neighbor structure has to be specified for each CAR model as follows: For the distance-based neighbor structure, we will designate all tracts whose centroids are within a given radius from the centroid of the tract under consideration as its neighbors. Different radii (2 km, 5 km, 10 km) will be used for comparison. For the adjacency-based neighbor structure, we will include tracts that share common boundaries with the tract under consideration as its neighbors. For computation, suitable matrix representations are needed for the neighbor structures. The GeoBUGS software package will be used to obtain the matrix representation of the adjacency-based neighbor structure. The process of constructing a distance-based neighbor structure is more complicated, and currently there is no convenient software for doing this. We will write our own software to obtain the matrix representation of the distance-based neighbor structure. The GLM and GLMM will be evaluated by general-purpose statistical packages such as SAS and R. For the Bayesian model, WinBUGS will be used to carry out the numerical computation. To answer our study questions, the estimated model parameters and their confidence intervals (GLM and GLMM) and credible sets (Bayesian models) will be examined. The residues will also be checked. Criteria such as AIC and DIC will be used to compare different models. 7.a. Will the data be used for non-cvd analysis in this manuscript? x No Yes

b. If Yes, is the author aware that the file ICTDER02 must be used to exclude persons with a value RES_OTH = CVD Research for non-dna analysis, and for DNA analysis RES_DNA = CVD Research would be used? Yes No (This file ICTDER02 has been distributed to ARIC PIs, and contains the responses to consent updates related to stored sample use for research.) 8.a. Will the DNA data be used in this manuscript? _x No Yes 8.b. If yes, is the author aware that either DNA data distributed by the Coordinating Center must be used, or the file ICTDER02 must be used to exclude those with value RES_DNA = No use/storage DNA? n/a Yes No 9.The lead author of this manuscript proposal has reviewed the list of existing ARIC Study manuscript proposals and has found no overlap between this proposal and previously approved manuscript proposals either published or still in active status. ARIC Investigators have access to the publications lists under the Study Members Area of the web site at: http://www.cscc.unc.edu/aric/search.php X Yes No 10. What are the most related manuscript proposals in ARIC (authors are encouraged to contact lead authors of these proposals for comments on the new proposal or collaboration)? MS 1102, MS1103 11. a. Is this manuscript proposal associated with any ARIC ancillary studies or use any ancillary study data? x Yes No 11.b. If yes, is the proposal X_ A. primarily the result of an ancillary study (list number*) AS 2004.05 B. primarily based on ARIC data with ancillary data playing a minor role (usually control variables; list number(s)* ) *ancillary studies are listed by number at http://www.cscc.unc.edu/aric/forms/ 12. Manuscript preparation is expected to be completed in one to three years. If a manuscript is not submitted for ARIC review at the end of the 3-years from the date of the approval, the manuscript proposal will expire.

References 1. Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234-240. 2. Cliff, A. D. and J. K. Ord (1981). Spatial Processes: Models and Applications. London: Pion. 3. Haining, R. (1990). Spatial data analysis in the social and environmental sciences. Cambridge: Cambridge University Press. 4. Cressie, N. A. C. (1993). Statistics for spatial data, rev. ed., pp. 14-15. New York: John Wiley & Sons. 5. Breslow, N. E. and D. G. Clayton (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 9-25. 6. Carlin, B. P. and S. Banerjee (2003). Hierarchical multivariate CAR models for spatial-temporally correlated survival data (with discussion). In Bayesian Statistics 7, eds. J.M. Bernardo, et al. Oxford: Oxford University Press, pp 45-63. 7. Gelfand, A. E. and P. Vounatsou (2003). Proper multivariate conditional autoregressive models for spatial data analysis. Biostatistics, 4, 11-25.