Community Health Needs Assessment through Spatial Regression Modeling

Glen D. Johnson, PhD, CUNY School of Public Health, glen.johnson@lehman.cuny.edu

Objectives:
- Assess community needs with respect to particular health outcomes, to assist planning and evaluation of public health programs.
- Incorporate community-level factors that are associated with the health outcome of interest.
- Estimate overall burden (expected number of cases), along with rates.

Data Integration, Analysis and Visualization Framework
- Source data (examples): vital statistics, hospital discharges, census data, environmental data, etc.
- Many GIS data layers (mostly public domain)
- GeoDatabase within a Geographic Information System
- Outputs: statistical analyses; tables / reports / graphs; maps, either printed or served (e.g. ArcGIS Online)
- Export of portable maps (kml files, ArcGIS layers) to free virtual globe products for interactive viewing (e.g. Google Earth)

Other approaches to indices of community needs / community deprivation typically take a multivariate statistics approach, such as cluster analysis or principal components analysis. Our objective is to incorporate multiple community covariables, but have the index driven by particular health outcomes. We also need to account for residual spatial autocorrelation among geographic units of observation.

Solution: Model the health outcome as case counts, offset by the population at risk, aggregated within some suitable geographic resolution, as a function of select community-level covariables and a random effect associated with geographic location (a spatial Generalized Linear Mixed Model, or GLMM).

Many options for spatial GLMMs. Consider:
1. How is the response variable distributed? How rare is the outcome?
2. How should a random effect for residual spatial autocorrelation be defined?
3. Frequentist or Bayesian solutions?

Response Variable
- Discrete: Poisson, Binomial, Negative Binomial
- Continuous: Gaussian
- Rare events (many 0 values)? If yes, consider zero-inflated models (e.g., ZIP models)

Spatial Random Effect
- Random addition to the intercept
- Simultaneous autoregressive term: spatial error and spatial lag
- Conditional autoregressive term
- Local neighborhood definition: Queen vs Rook; 1st order, 2nd order?
- Geostatistical term: defined by variogram-based distance-decay weighting of neighbors

Frequentist vs Bayesian Estimation
- The frequentist solution is through pseudo-likelihood estimation. Model parameters and predicted values are point estimates with associated standard errors.
- The Bayesian solution is through Markov chain Monte Carlo (MCMC) or Integrated Nested Laplace Approximation (INLA).

Case Study: an Adolescent Sexual Health Community Needs Index To help guide funding allocation in a competitive RFP process to support a Comprehensive Adolescent Pregnancy Prevention Program in New York State. - a state-funded community-based public health program

For each ZIP code:
- Response (e.g. teen pregnancy cases)
- Population at risk
- Predictors:
  - % of pop. > age 24 with a 4-year or greater college degree
  - % single-parent households out of households with at least one child < 18 years old
  - % of total pop. that is Black alone
  - % of total pop. that is Hispanic, regardless of race
  - % of total pop. that is a foreign-born naturalized citizen
  - % of total pop. with income below poverty
- County (crude indicator of location effect)

Random addition to the intercept model

For $i = 1, \dots, n$ ZIP codes, let
- $y_i$ = observed caseload, assumed to follow a Poisson or more general negative binomial distribution,
- $n_i$ = population at risk,
- $\{x_1, \dots, x_p\}_i$ = community predictors,
- $\{\beta_1, \dots, \beta_p\}$ = coefficients, and
- $L_i$ = location effect, arising from a random process such that $L_i \sim N(0, \sigma_L^2)$.

Then the expected value of $Y_i$, given $\{x_1, \dots, x_p, L\}_i$, is

$$E[Y_i \mid \{x_1, \dots, x_p, L\}_i] = n_i \exp(\beta_1 x_{1i} + \dots + \beta_p x_{pi} + L_i),$$

where the exponential term is the relative risk of ZIP code $i$.
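The mean structure above can be sketched numerically. This is a minimal sketch, assuming the log-linear form with a population offset as stated; the function name `expected_cases` and every input value are hypothetical and chosen only for illustration, not estimates from this study.

```python
import math

def expected_cases(population, covariates, coefficients, location_effect=0.0):
    """Expected caseload n_i * exp(sum_k beta_k * x_ki + L_i) for one ZIP code."""
    if len(covariates) != len(coefficients):
        raise ValueError("covariates and coefficients must have the same length")
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    relative_risk = math.exp(linear_predictor + location_effect)
    return population * relative_risk

# Hypothetical ZIP code: 1,200 persons at risk and two covariates (percentages).
cases = expected_cases(
    population=1200,
    covariates=[25.0, 30.0],      # hypothetical covariate values
    coefficients=[-0.02, 0.01],   # hypothetical coefficients
    location_effect=0.1,          # hypothetical draw of L_i
)
```

Setting all coefficients and the location effect to zero returns the population itself, which is a quick way to check that the offset enters as a multiplier.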

Illustration of random addition to the intercept

Teen Pregnancy, Poisson regression with a random addition to the intercept.

Effect             Estimate    t Value    Pr > |t|
College %          -0.02165    -22.01     <.0001
Single %            0.009872     9.38     <.0001
Foreign-born %      0.006376     5.77     <.0001
Black %             0.003025     4.70     <.0001
Hispanic %          0.007236     8.24     <.0001
Below Poverty %     0.01359     10.23     <.0001

Pseudo-likelihood estimation with SAS Proc GLIMMIX
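Because the model uses a log link, exponentiating a coefficient gives the multiplicative change in the expected rate per one-percentage-point increase in that covariate, holding the other covariates and the location effect fixed. The sketch below applies this to the teen pregnancy estimates reported above; the helper name `rate_ratio` is hypothetical. For example, the College % coefficient corresponds to an expected rate roughly 2% lower per additional percentage point.

```python
import math

# Coefficients copied from the teen pregnancy results reported above
# (log scale, per one percentage point of each covariate).
estimates = {
    "College %": -0.02165,
    "Single %": 0.009872,
    "Foreign-born %": 0.006376,
    "Black %": 0.003025,
    "Hispanic %": 0.007236,
    "Below Poverty %": 0.01359,
}

def rate_ratio(coefficient, change=1.0):
    """Multiplicative change in the expected rate for a given covariate change."""
    return math.exp(coefficient * change)

rate_ratios = {name: rate_ratio(b) for name, b in estimates.items()}

# Effect of a 10-percentage-point difference in college attainment.
college_10pt = rate_ratio(estimates["College %"], change=10.0)
```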

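Conditional autoregressive (CAR) model

$$E[Y_i] = n_i \exp(\mathbf{x}_i \boldsymbol{\beta} + s_i),$$

where the error term, $s_i$, is defined as conditional on the local neighborhood, $\delta_i$, of errors, such that

$$[s_i \mid \delta_i] \sim N(\bar{s}_i, \sigma_i^2), \quad \text{where} \quad \bar{s}_i = \frac{\sum_{j \in \delta_i} w_{ij} s_j}{\sum_{j \in \delta_i} w_{ij}} \quad \text{and} \quad \sigma_i^2 = \frac{\tau^2}{\sum_{j \in \delta_i} w_{ij}}.$$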
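The conditional specification can be illustrated with a small calculation. This is a minimal sketch, assuming the standard intrinsic CAR form in which the conditional mean is the weighted average of neighboring errors and the conditional variance is a variance parameter (here called `tau2`, an assumed name) divided by the sum of the neighbor weights. The function name, the three-unit layout and all values are hypothetical.

```python
def car_conditional(i, s, weights, tau2):
    """Conditional mean and variance of s[i] given its neighbors (intrinsic CAR form)."""
    neighbors = weights[i]                      # dict: neighbor index j -> w_ij
    total_weight = sum(neighbors.values())
    if total_weight == 0:
        raise ValueError("unit %d has no neighbors" % i)
    mean = sum(w * s[j] for j, w in neighbors.items()) / total_weight
    variance = tau2 / total_weight
    return mean, variance

# Hypothetical layout: unit 0 borders units 1 and 2; units 1 and 2 border only unit 0.
weights = {0: {1: 1, 2: 1}, 1: {0: 1}, 2: {0: 1}}
s = [0.0, 0.2, -0.4]                            # hypothetical spatial errors
mean_0, var_0 = car_conditional(0, s, weights, tau2=0.5)
```

A unit with more neighbors has a smaller conditional variance, so its error is pulled more strongly toward the local average.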

A Conditional Autoregressive (CAR) model defines neighbors as those units sharing a common border (much less arbitrary than something like all units in the same county). For each unit, $i$, neighbors are defined by weights, $w_{ij}$, for every other unit $j$. The simplest specification is $w_{ij} = 1$ if unit $j$ is adjacent to unit $i$ and 0 otherwise.
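The binary weight specification, and the Queen vs Rook choice of neighborhood, can be sketched on a toy layout. This sketch assumes a regular grid of cells purely for illustration; real ZIP code areas are irregular polygons whose adjacency would come from a GIS. The function name `grid_adjacency` is hypothetical.

```python
def grid_adjacency(n_rows, n_cols, rule="rook"):
    """Binary weights w_ij for cells of a regular grid, numbered row by row."""
    if rule == "rook":        # neighbors share an edge
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":     # neighbors share an edge or a corner
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1   # unit j is adjacent to unit i
    return w

rook = grid_adjacency(3, 3, rule="rook")
queen = grid_adjacency(3, 3, rule="queen")
center = 4                                        # middle cell of the 3 x 3 grid
n_rook, n_queen = sum(rook[center]), sum(queen[center])
```

On this grid the center cell has four rook neighbors and eight queen neighbors, which shows how the neighborhood definition changes the weights that enter the CAR term.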

Teen Pregnancy, conditional autoregressive model. Quantiles of model coefficients, based on 10,000 simulated values.

                   Median     2.50%      97.50%
College %          -0.0204    -0.0231    -0.0177
Single %            0.0093     0.0057     0.0127
Foreign-born %      0.0029    -0.0023     0.0087
Black %             0.0044     0.0017     0.0074
Hispanic %          0.0126     0.0084     0.0170
Below Poverty %     0.0144     0.0084     0.0202
tau^2 *             0.5897     0.4868     0.7093
sigma^2             0.0039     0.0005     0.0162

* tau^2 = measure of local spatial clustering effect

MCMC solution through CARBayes in R
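Summaries of this kind are read directly off the simulated values: the median and the 2.5% and 97.5% quantiles of the draws for each parameter. The sketch below shows the calculation with Python's standard library. The draws are generated from a normal distribution purely as a stand-in (an assumption for illustration); they are not the posterior samples from this analysis, and `summarize_draws` is a hypothetical helper name.

```python
import random
import statistics

def summarize_draws(draws):
    """Median and 2.5% / 97.5% quantiles of a list of simulated values."""
    # n=40 returns 39 cut points; cut point k (1-based) is the k/40 quantile,
    # so index 0 is 2.5%, index 19 is 50% and index 38 is 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return {"median": cuts[19], "q2.5": cuts[0], "q97.5": cuts[38]}

# Stand-in draws (hypothetical): 10,000 values for a single coefficient.
random.seed(0)
draws = [random.gauss(-0.02, 0.0014) for _ in range(10000)]
summary = summarize_draws(draws)

# A 95% interval that excludes zero suggests a non-zero association.
excludes_zero = summary["q2.5"] > 0 or summary["q97.5"] < 0
```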

Teen STD incidence, Poisson regression with a random addition to the intercept.

Effect             Estimate    t Value    Pr > |t|
College %           0.001307     1.71     0.0878
Single %            0.01340     15.15     <.0001
Foreign-born %      0.003374     4.53     <.0001
Black %             0.01766     34.99     <.0001
Hispanic %          0.009081    13.25     <.0001
Below Poverty %     0.008598     8.24     <.0001

Pseudo-likelihood estimation with SAS Proc GLIMMIX

The ASHNI: estimated per annum caseload of teen pregnancy and STD incidence for the years 2011-13

Teen pregnancy per annum rates for the years 2011-13, as the median of 10,000 values simulated from the posterior distribution for each ZIP code (map insets: Buffalo, Rochester, NYC)

This needs index is essentially a community-level, risk-adjusted estimate of the caseload for each ZIP code area, based on a larger reference population (e.g. statewide).

Current research aims to improve the STD modelling.
- The Poisson random addition to the intercept model yields sensible predicted values, but model fit diagnostics are not acceptable.
- A negative binomial random addition to the intercept model yields acceptable model fit diagnostics but unacceptable predicted values.

STD data are too sparse, with nearly 40% of ZIP codes yielding 0 cases, so a zero-inflated model (zero-inflated Poisson, ZIP) is needed. A zero-inflated Poisson regression with a conditional autoregressive spatial random effect can be fit with the newer INLA method (R-INLA).
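For reference, a zero-inflated Poisson distribution mixes a point mass at zero with an ordinary Poisson count, so zeros can arise from either component. The sketch below shows only this probability mass function; it does not include covariates or the CAR spatial term, and the parameter names `mu` (Poisson mean) and `pi` (extra-zero probability) and their values are hypothetical.

```python
import math

def zip_pmf(y, mu, pi):
    """P(Y = y) for a zero-inflated Poisson with Poisson mean mu and extra-zero probability pi."""
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    if y == 0:
        # Zeros come from the point mass (pi) or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

# Hypothetical parameters for one ZIP code.
mu, pi = 2.0, 0.3
p_zero_zip = zip_pmf(0, mu, pi)
p_zero_poisson = math.exp(-mu)       # zero probability under a plain Poisson
```

Comparing `p_zero_zip` with `p_zero_poisson` shows how the inflation component raises the share of zeros beyond what a plain Poisson with the same `mu` would allow.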

A ZIP-CAR model is in the works...