Community Health Needs Assessment through Spatial Regression Modeling

Glen D. Johnson, PhD, CUNY School of Public Health (glen.johnson@lehman.cuny.edu)

Objectives:
- Assess community needs with respect to particular health outcomes, to assist planning and evaluation of public health programs.
- Incorporate community-level factors that are associated with the health outcome of interest.
- Estimate the overall burden (expected number of cases) along with rates.

Data Integration, Analysis and Visualization Framework
- Source data (examples): vital statistics, hospital discharges, census data, environmental data, etc.
- GeoDatabase / Geographic Information System, drawing on many GIS data layers (mostly public domain)
- Outputs:
  - statistical analyses: tables, reports, graphs
  - maps: printed or served (e.g., ArcGIS Online)
  - exported portable maps (KML files, ArcGIS layers) for interactive viewing in free virtual-globe products (e.g., Google Earth)

Other approaches to indices of community needs or community deprivation typically take a multivariate statistical approach, such as cluster analysis or principal components analysis. Our objective is to incorporate multiple community covariables but have the index driven by particular health outcomes. We also need to account for residual spatial autocorrelation among the geographic units of observation.

Solution: model the health outcome as case counts, offset by the population at risk and aggregated within a suitable geographic resolution, as a function of selected community-level covariables and a random effect associated with geographic location (a spatial generalized linear mixed model, GLMM).

Many options exist for spatial GLMMs. Consider:
1. How is the response variable distributed? How rare is the outcome?
2. How should a random effect for residual spatial autocorrelation be defined?
3. Frequentist or Bayesian solution?

Response variable
- Discrete: are the events rare (many 0 values)?
  - No: Poisson, binomial, negative binomial
  - Yes: zero-inflated models (e.g., ZIP models)
- Continuous: Gaussian

Spatial random effect
- Random addition to the intercept
- Simultaneous autoregressive (SAR) term: spatial error and spatial lag
- Conditional autoregressive (CAR) term: local neighborhood definition (queen vs. rook; 1st order, 2nd order?)
- Geostatistical term: defined by variogram-based distance-decay weighting of neighbors
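The queen-versus-rook choice can be made concrete on a toy grid. This minimal sketch assumes a regular grid of square cells standing in for geographic units (real ZIP code polygons would need a GIS or a contiguity-building tool), and the function names are hypothetical. Rook contiguity counts only cells that share an edge, queen contiguity also counts cells that share a corner, and second-order neighbors are neighbors of neighbors.

```python
def grid_neighbors(nrows, ncols, rule="rook"):
    """Neighbor sets for cells of a regular nrows x ncols grid (row-major indices).

    rook:  cells sharing an edge (up, down, left, right)
    queen: cells sharing an edge or a corner (adds the four diagonals)
    """
    if rule == "rook":
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    nbrs = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs[r * ncols + c] = {
                (r + dr) * ncols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < nrows and 0 <= c + dc < ncols
            }
    return nbrs


def second_order(nbrs):
    """Neighbors of neighbors, excluding the unit itself and its first-order neighbors."""
    return {
        i: set().union(*(nbrs[j] for j in first)) - first - {i}
        for i, first in nbrs.items()
    }


rook = grid_neighbors(3, 3, "rook")
queen = grid_neighbors(3, 3, "queen")
print(sorted(rook[4]))                # center cell, rook:  [1, 3, 5, 7]
print(sorted(queen[4]))               # center cell, queen: [0, 1, 2, 3, 5, 6, 7, 8]
print(sorted(second_order(rook)[0]))  # corner cell, 2nd-order rook: [2, 4, 6]
```

On the 3 x 3 grid the center cell has 4 rook neighbors but 8 queen neighbors, so the contiguity rule directly changes which units receive nonzero weights in a CAR term.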

Frequentist vs. Bayesian estimation
- Frequentist solution: pseudo-likelihood estimation. Model parameters and predicted values are point estimates with associated standard errors.
- Bayesian solution: Markov chain Monte Carlo (MCMC) or integrated nested Laplace approximation (INLA).

Case Study: an Adolescent Sexual Health Community Needs Index, to help guide funding allocation in a competitive RFP process to support a Comprehensive Adolescent Pregnancy Prevention Program in New York State (a state-funded, community-based public health program).

For each ZIP code:
- Response (e.g., teen pregnancy cases)
- Population at risk
- Predictors:
  - % of population older than 24 with a 4-year college degree or higher
  - % single-parent households, out of households with at least one child under 18 years old
  - % of total population that is Black alone
  - % of total population that is Hispanic, regardless of race
  - % of total population that is a foreign-born naturalized citizen
  - % of total population with income below poverty
- County (crude indicator of location effect)

Random addition to the intercept model

For $i = 1, \dots, n$ ZIP codes, let
- $y_i$ = observed caseload, assumed to follow a Poisson or, more generally, a negative binomial distribution;
- $n_i$ = population at risk;
- $\{x_1, \dots, x_p\}_i$ = community predictors;
- $\{\beta_1, \dots, \beta_p\}$ = coefficients;
- $L_i$ = location effect, arising from a random process such that $L_i \sim N(0, \sigma_L^2)$.

Then the expected value of $Y_i$, given $\{x_1, \dots, x_p, L\}_i$, is

$$E[Y_i \mid x_{1i}, \dots, x_{pi}, L_i] = n_i \exp(\beta_1 x_{1i} + \cdots + \beta_p x_{pi} + L_i),$$

where the term $\exp(\beta_1 x_{1i} + \cdots + \beta_p x_{pi} + L_i)$ is the relative risk of ZIP code $i$.
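The mean structure of this model can be checked numerically. This is a minimal sketch of the slide's formula using hypothetical inputs (not the study's data or estimates); `expected_caseload` is an illustrative name. It confirms two properties that make the model easy to interpret: the location effect L_i multiplies the expected caseload by exp(L_i), and a one-unit increase in a predictor x_k multiplies it by exp(beta_k).

```python
import math
import random


def expected_caseload(n_i, x_i, beta, L_i):
    """E[Y_i | x_i, L_i] = n_i * exp(beta_1*x_1i + ... + beta_p*x_pi + L_i).

    Returns (expected caseload, relative-risk term exp(...)).
    """
    eta = sum(b * x for b, x in zip(beta, x_i)) + L_i
    rr = math.exp(eta)
    return n_i * rr, rr


# Hypothetical inputs for one ZIP code (illustration only, not study data).
n_i = 1000.0            # population at risk
x_i = [2.0, -1.0]       # two predictor values
beta = [0.10, 0.25]     # coefficients

base, _ = expected_caseload(n_i, x_i, beta, L_i=0.0)

# Draw a location effect L_i ~ N(0, sigma_L^2); sigma_L = 0.3 is hypothetical.
rng = random.Random(42)
L_i = rng.gauss(0.0, 0.3)
shifted, _ = expected_caseload(n_i, x_i, beta, L_i)

# The location effect acts multiplicatively: this ratio equals exp(L_i).
print(shifted / base, math.exp(L_i))

# Raising x_1 by one unit multiplies the expectation by exp(beta_1).
bumped, _ = expected_caseload(n_i, [x_i[0] + 1, x_i[1]], beta, 0.0)
print(bumped / base, math.exp(beta[0]))
```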

[Figure: illustration of a random addition to the intercept]

Teen pregnancy: Poisson regression with a random addition to the intercept

Effect             Estimate    t Value   Pr > |t|
College %          -0.02165     -22.01     <.0001
Single %            0.009872      9.38     <.0001
Foreign-born %      0.006376      5.77     <.0001
Black %             0.003025      4.70     <.0001
Hispanic %          0.007236      8.24     <.0001
Below Poverty %     0.01359      10.23     <.0001

Pseudo-likelihood estimation with SAS PROC GLIMMIX
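Because the model is on the log scale, each coefficient converts to a multiplicative rate ratio, exp(beta x delta), for a delta-percentage-point increase in that predictor with the other predictors (and the location effect) held fixed. This short sketch applies that conversion to the GLIMMIX estimates in the teen pregnancy table; the 10-point step is an arbitrary choice for illustration.

```python
import math

# Poisson coefficients (log scale, per 1 percentage point) from the teen
# pregnancy table: random addition to the intercept, SAS PROC GLIMMIX.
estimates = {
    "College %": -0.02165,
    "Single %": 0.009872,
    "Foreign-born %": 0.006376,
    "Black %": 0.003025,
    "Hispanic %": 0.007236,
    "Below Poverty %": 0.01359,
}


def rate_ratio(beta, delta=10.0):
    """Multiplicative change in the expected rate for a `delta`-point increase."""
    return math.exp(beta * delta)


for name, beta in estimates.items():
    print(f"{name:16s} rate ratio per 10 points = {rate_ratio(beta):.3f}")
```

For College %, exp(-0.2165) is about 0.81, i.e. roughly a 19% lower expected teen pregnancy rate for each additional 10 percentage points of college-educated adults, all else equal.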

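Conditional autoregressive (CAR) model

$$E[Y_i] = n_i \exp(\mathbf{x}_i^\top \boldsymbol{\beta} + s_i), \qquad \mathbf{x}_i^\top \boldsymbol{\beta} = \beta_1 x_{1i} + \cdots + \beta_p x_{pi},$$

where the error term $s_i$ is defined conditionally on the local neighborhood of errors, such that

$$[\, s_i \mid s_j,\ j \neq i \,] \sim N(\mu_i, \sigma_i^2), \qquad \mu_i = \frac{\sum_{j \neq i} w_{ij} s_j}{\sum_{j \neq i} w_{ij}}, \qquad \sigma_i^2 = \frac{\tau^2}{\sum_{j \neq i} w_{ij}}.$$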

A conditional autoregressive (CAR) model defines neighbors as units that share a common border, which is much less arbitrary than, for example, grouping all units in the same county. For each unit $i$, neighbors are defined by weights $w_{ij}$ for every other unit $j$. The simplest specification is $w_{ij} = 1$ if unit $j$ is adjacent to unit $i$ and $w_{ij} = 0$ otherwise.
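With binary adjacency weights, the CAR conditional distribution of one unit's spatial effect can be computed by hand. This sketch assumes the common intrinsic CAR form: s_i, given all other s_j, is normal with mean equal to the weighted average of its neighbors' effects and variance tau² divided by the sum of its weights. The four-unit chain, the values of s, and tau² are hypothetical, and the helper names are illustrative.

```python
def binary_weights(n, edges):
    """Symmetric 0/1 weight matrix: w_ij = 1 if units i and j share a border."""
    W = [[0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = 1
    return W


def car_conditional(i, s, W, tau2):
    """Mean and variance of s_i given all s_j (j != i) under an intrinsic CAR prior.

    mean     = sum_j w_ij * s_j / sum_j w_ij   (weighted average of neighbors)
    variance = tau2 / sum_j w_ij               (more neighbors -> smaller variance)
    """
    w_sum = sum(w for j, w in enumerate(W[i]) if j != i)
    if w_sum == 0:
        raise ValueError(f"unit {i} has no neighbors")
    mean = sum(w * s[j] for j, w in enumerate(W[i]) if j != i) / w_sum
    return mean, tau2 / w_sum


# Hypothetical example: four units in a chain, 0 - 1 - 2 - 3.
W = binary_weights(4, [(0, 1), (1, 2), (2, 3)])
s = [0.2, -0.1, 0.4, 0.0]   # current spatial effects (hypothetical)
tau2 = 0.5                  # hypothetical CAR variance parameter

print(car_conditional(1, s, W, tau2))  # mean of 0.2 and 0.4; variance 0.5 / 2
print(car_conditional(0, s, W, tau2))  # one neighbor: mean -0.1; variance 0.5
```

The variance shrinks as a unit gains neighbors, which is how the CAR term borrows more strength for well-connected areas.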

Teen pregnancy: conditional autoregressive model
Quantiles of model coefficients, based on 10,000 simulated values

Effect             Median     2.50%     97.50%
College %          -0.0204   -0.0231   -0.0177
Single %            0.0093    0.0057    0.0127
Foreign-born %      0.0029   -0.0023    0.0087
Black %             0.0044    0.0017    0.0074
Hispanic %          0.0126    0.0084    0.017
Below Poverty %     0.0144    0.0084    0.0202
tau² *              0.5897    0.4868    0.7093
sigma²              0.0039    0.0005    0.0162

* tau² = measure of the local spatial clustering effect
MCMC solution through CARBayes in R
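The table summarizes each parameter by the median and the 2.5% and 97.5% quantiles of its simulated values. A minimal sketch of that summary follows, using the linear-interpolation quantile rule (R's default; an assumption about how the quantiles were computed), together with a check of which 95% intervals in the table exclude zero. The helper names are hypothetical, and the draws are synthetic.

```python
import math
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of pre-sorted values (R's default, type 7)."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarize(draws):
    """Posterior median and central 95% interval from simulated draws."""
    v = sorted(draws)
    return quantile(v, 0.5), quantile(v, 0.025), quantile(v, 0.975)


def excludes_zero(lower, upper):
    return lower > 0 or upper < 0


# Synthetic draws, for illustration only (not output from the study's model).
rng = random.Random(0)
draws = [rng.gauss(0.0093, 0.0018) for _ in range(10_000)]
print("median, 2.5%, 97.5%:", summarize(draws))

# 95% interval bounds copied from the CAR table.
car_intervals = {
    "College %": (-0.0231, -0.0177),
    "Single %": (0.0057, 0.0127),
    "Foreign-born %": (-0.0023, 0.0087),
    "Black %": (0.0017, 0.0074),
    "Hispanic %": (0.0084, 0.017),
    "Below Poverty %": (0.0084, 0.0202),
}
spans_zero = [k for k, (lo, hi) in car_intervals.items() if not excludes_zero(lo, hi)]
print("95% interval includes zero:", spans_zero)  # ['Foreign-born %']
```

Only Foreign-born % has an interval spanning zero in the CAR fit, although the random-intercept Poisson fit reported it with Pr > |t| < .0001, so that predictor's apparent effect is less certain once local spatial clustering is modeled.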

Teen STD incidence: Poisson regression with a random addition to the intercept

Effect             Estimate    t Value   Pr > |t|
College %           0.001307      1.71     0.0878
Single %            0.01340      15.15     <.0001
Foreign-born %      0.003374      4.53     <.0001
Black %             0.01766      34.99     <.0001
Hispanic %          0.009081     13.25     <.0001
Below Poverty %     0.008598      8.24     <.0001

Pseudo-likelihood estimation with SAS PROC GLIMMIX

The ASHNI: estimated per-annum caseload of teen pregnancy and teen STD incidence for the years 2011-13

[Map: teen pregnancy per-annum rates for the years 2011-13, computed for each ZIP code as the median of 10,000 values simulated from the posterior distribution; labeled areas: Buffalo, Rochester, NYC]
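Per-ZIP rates like those mapped are summaries over posterior simulations. This sketch assumes each ZIP code has a list of simulated log-rates (synthetic here) and returns the median rate per 1,000 persons at risk; the per-1,000 scale, the centering values, and the function name are illustrative choices, not the study's.

```python
import math
import random
import statistics


def posterior_median_rates(log_rate_draws, per=1000.0):
    """Median of exp(draw) for each ZIP code, scaled to a rate per `per` persons."""
    return [statistics.median(math.exp(d) for d in draws) * per
            for draws in log_rate_draws]


# Synthetic posterior draws for three hypothetical ZIP codes (10,000 each).
rng = random.Random(7)
center_rates = [0.010, 0.025, 0.040]
log_rate_draws = [[rng.gauss(math.log(r), 0.15) for _ in range(10_000)]
                  for r in center_rates]

for z, rate in enumerate(posterior_median_rates(log_rate_draws)):
    print(f"ZIP {z}: median rate = {rate:.1f} per 1,000")
```

Because the population at risk is fixed for each ZIP code, multiplying its median rate (per person) by that population gives the median expected caseload.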

This needs index is essentially a community-level, risk-adjusted estimate of the caseload for each ZIP code area, based on a larger reference population (i.e., statewide).

Current research aims to improve the STD modelling. The Poisson random-addition-to-the-intercept model yields sensible predicted values, but its model fit diagnostics are not acceptable. A negative binomial random-addition-to-the-intercept model yields acceptable fit diagnostics but unacceptable predicted values.
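The slides do not say which fit diagnostics were used. One common check, assumed here for illustration, is the Pearson chi-square statistic divided by its degrees of freedom: values well above 1 suggest the variance exceeds what the assumed distribution allows (overdispersion). This sketch computes it for a Poisson fit (variance = mu) and for a negative binomial fit under the parameterization variance = mu + alpha * mu²; the counts, fitted means, and alpha are hypothetical.

```python
def pearson_dispersion(y, mu, n_params, alpha=None):
    """Pearson chi-square / df.

    alpha=None -> Poisson variance mu
    alpha>0    -> negative binomial variance mu + alpha * mu**2
    n_params counts estimated fixed parameters (a simplification for mixed models).
    """
    def var(m):
        return m if alpha is None else m + alpha * m * m

    chi2 = sum((yi - mi) ** 2 / var(mi) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Hypothetical observed counts and fitted means for eight ZIP codes.
y = [0, 0, 1, 7, 0, 12, 2, 0]
mu = [1.5, 0.8, 2.0, 3.1, 1.2, 4.5, 2.2, 0.9]

print("Poisson:", round(pearson_dispersion(y, mu, n_params=2), 2))
print("NegBin :", round(pearson_dispersion(y, mu, n_params=2, alpha=0.8), 2))
```

The extra alpha * mu² term absorbs variability the Poisson cannot, which is consistent with a negative binomial model passing a dispersion check that a Poisson model fails.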

The STD data are very sparse: nearly 40% of ZIP codes have 0 cases. A zero-inflated model (zero-inflated Poisson) is therefore needed. A zero-inflated Poisson regression with a conditional autoregressive spatial random effect can be fit with the newer INLA method (R-INLA).
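The zero-inflated Poisson (ZIP) distribution mixes a point mass at zero with a Poisson count: with probability pi a unit is a structural zero; otherwise its count is Poisson(lambda). This sketch, with hypothetical pi and lambda, compares the share of zeros it implies with that of a plain Poisson having the same mean, (1 - pi) * lambda, which shows how a ZIP model can absorb excess zeros that a Poisson cannot.

```python
import math


def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: structural zero with probability pi, else Poisson(lam)."""
    p = (1 - pi) * poisson_pmf(y, lam)
    return p + pi if y == 0 else p


lam, pi = 2.0, 0.3          # hypothetical parameters
mean = (1 - pi) * lam       # ZIP mean = 1.4

print(f"P(Y=0), ZIP:     {zip_pmf(0, lam, pi):.3f}")   # 0.395
print(f"P(Y=0), Poisson: {poisson_pmf(0, mean):.3f}")  # 0.247
```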

A ZIP-CAR model is in the works...