MODULE 12: Spatial Statistics in Epidemiology and Public Health Lecture 7: Slippery Slopes: Spatially Varying Associations

Similar documents
Modeling spatial variation in the relationships between violence and alcohol and drug

Bayesian Hierarchical Models

Simultaneous Coefficient Penalization and Model Selection in Geographically Weighted Regression: The Geographically Weighted Lasso

GeoDa-GWR Results: GeoDa-GWR Output (portion only): Program began at 4/8/2016 4:40:38 PM

Linear Regression Models P8111

ARIC Manuscript Proposal # PC Reviewed: _9/_25_/06 Status: A Priority: _2 SC Reviewed: _9/_25_/06 Status: A Priority: _2

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Community Health Needs Assessment through Spatial Regression Modeling

Models for Count and Binary Data. Poisson and Logistic GWR Models. 24/07/2008 GWR Workshop 1

Spatial Smoothing in Stan: Conditional Auto-Regressive Models

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Fully Bayesian Spatial Analysis of Homicide Rates.

Hierarchical Additive Modeling of Nonlinear Association with Spatial Correlations

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Aggregated cancer incidence data: spatial models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Part 8: GLMs and Hierarchical LMs and GLMs

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Spatial Analysis of Incidence Rates: A Bayesian Approach

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

WU Weiterbildung. Linear Mixed Models

Lecture 1 Introduction to Multi-level Models

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Generalized Linear Models for Non-Normal Data

Lattice Data. Tonglin Zhang. Spatial Statistics for Point and Lattice Data (Part III)

Analysis of Violent Crime in Los Angeles County

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

Statistics: A review. Why statistics?

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

Lecture 8. Poisson models for counts

Multi-level Models: Idea

Linear Regression (9/11/13)

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

Generalized Linear Models. Kurt Hornik

Spatial Variation in Infant Mortality with Geographically Weighted Poisson Regression (GWPR) Approach

MA 575 Linear Models: Cedric E. Ginestet, Boston University Mixed Effects Estimation, Residuals Diagnostics Week 11, Lecture 1

Linear Methods for Prediction

Introduction (Alex Dmitrienko, Lilly) Web-based training program

Open Problems in Mixed Models

Challenges in modelling air pollution and understanding its impact on human health

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

For more information about how to cite these materials visit

Approaches for Multiple Disease Mapping: MCAR and SANOVA

Quasi-likelihood Scan Statistics for Detection of

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Using Estimating Equations for Spatially Correlated A

Generalized Additive Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

General Regression Model

Measurement Error in Spatial Modeling of Environmental Exposures

Model Estimation Example

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017

Outline of GLMs. Definitions

STA 450/4000 S: January

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011

Part 6: Multivariate Normal and Linear Models

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Using Spatial Statistics Social Service Applications Public Safety and Public Health

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Gibbs Sampling in Endogenous Variables Models

Asymptotic standard errors of MLE

Penalized Loss functions for Bayesian Model Choice

Combining Incompatible Spatial Data

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Figure 36: Respiratory infection versus time for the first 49 children.

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Bayesian Thinking in Spatial Statistics

Hierarchical Modelling for Univariate and Multivariate Spatial Data

UNIVERSITY OF TORONTO Faculty of Arts and Science

Modelling geoadditive survival data

Bayesian Regression Linear and Logistic Regression

A Geostatistical Approach to Linking Geographically-Aggregated Data From Different Sources

Exam Applied Statistical Regression. Good Luck!

Multivariate spatial modeling

Semiparametric Generalized Linear Models

Exploratory Spatial Data Analysis (ESDA)

A Micro-Spatial Analysis of Violent Crime

Multilevel spatio-temporal dual changepoint models for relating alcohol outlet destruction and changes in neighbourhood rates of assaultive violence

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Survival Analysis I (CHL5209H)

Modeling Overdispersion

Poisson regression: Further topics

Generalized common spatial factor model

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li.

Chapter 22: Log-linear regression for Poisson counts

Model comparison: Deviance-based approaches

Transcription:

MODULE 12: Spatial Statistics in Epidemiology and Public Health Lecture 7: Slippery Slopes: Spatially Varying Associations Jon Wakefield and Lance Waller 1 / 53

What are we doing? Alcohol Illegal drugs Violent crimes Regression Breaking the rules (what if the associations change in space?) 2 / 53

Fix it Model it SVC 3 / 53

Acknowledgements and References Collaborators: Paul Gruenewald, Dennis Gorman, Li Zhu, Carol Gotway, and David Wheeler References: Waller et al. (2008) Quantifying geographical associations between alcohol distribution and violence... Stoch Environ Res Risk Assess 21: 573-588. Wheeler and Caldor (2009) As assessment of coefficient accuracy...j Geogr Systems 9: 573-588. Wheeler and Waller (2009) Comparing spatially varying coefficient models... J Geogr Systems 11: 1-22. Finley (2011) Comparing spatially-varying coefficient models...methods in Ecology and Evolution 2: 143-154. 4 / 53

What do we want to do? Quantify associations between outcomes and covariates as observed in data. What if strength of association varies across space? 5 / 53

Motivating example Outcome: Number of violent crime reports by census tract Covariates: Alcohol sales, illegal drug arrests (also by census tract). Interaction of people and place (social disorganization, routine activities, crime potential) Q: What if the association depends on location? 6 / 53

Fix it Model it What analytic tools do we have? Linear regression: Y i = X i β + ɛ i ɛ i ind N(0, σ 2 ) (alternately, Y i ind N(X i β, σ2 )) Assumptions: independence, Gaussian errors, constant variance, linear association, constant β. We will break all but one of these. 7 / 53

Fix it Model it How do you measure association? β = (X X) 1 X Y from OLS (= MLE if Gaussian) What if an assumption doesn t hold? Two common strategies: * Fix it (adjustments to LS) * Model it (adjustments to ML) 8 / 53

Fix it Model it Fix it approach Not Gaussian? Transformation (log, Box-Cox,...). σ 2 not constant? WLS: β WLS = (X WX) 1 X WY ɛ i (Y i ) not independent? GLS: β GLS = (X ΣX) 1 X ΣY Fixing multiple problems? 9 / 53

Fix it Model it Fixing multiple problems one at a time 10 / 53

Fix it Model it Model it approach Random effects (mixed models). Modeling non-independence hierarchically: Y ij b j ind N(X β + b j, σ 2 ), b j N(0, τ 2 ), where b j = random intercept for elements of jth subgroup. 11 / 53

y Outline Fix it Model it Example of subgroup random intercepts 2 0 2 4 6 10 5 0 5 10 15 x 12 / 53

Fix it Model it Impact of random effects Impact: Adds extra noise, adds (positive) correlation within subgroups. Borrow information across subgroups. Allows modeler to specify variance-covariance structure. Fit with SAS PROC MIXED or R. What about a random intercept for each census tract with spatial correlation? b MVN(0, Σ). 13 / 53

Fix it Model it Inference for hierarchical models [Y β, σ 2, b][b τ 2 ][β][τ 2 ] = A B C D. Bayesian: A = (conditionally independent) likelihood, B, C = priors, D = hyperprior, inference based on [β, σ 2, b Y] A B C D. Classical: A B = (correlated) likelihood, C, D N/A, inference based on likelihood. Both computationally intensive. 14 / 53

SVC What about spatially varying associations? Fix it: Geographically weighted regression (GWR) Fotheringham et al. (2002) Model it: Spatially varying coefficient (SVC) models Leyland et al. (2000), Assuncao et al. (2003), Gelfand et al. (2003), Gamerman et al. (2003), Congdon (2003, 2006) 15 / 53

SVC Geographically weighted regression OLS: ˆβ = (X X) 1 X Y. WLS: ˆβ WLS = (X WX) 1 X WY. GWR: ˆβ GWR (s) = (X W(s)X) 1 X W(s)Y. s represents any spatial location in the study area. W(s) = diagonal matrix of weights for each observation (Y i ) with higher weights given to observations closer to s. 16 / 53

SVC GWR GWR: β GWR (s) = (X W(s)X) 1 X W(s)Y. Similar to local regression smoothing (e.g., loess), but here local means in geographic space, not covariate space. W(s) typically defined by a distance-decay kernel function such as the Gaussian kernel ( W(s) i = exp 1 ( ) ) s 2 si, 2 bw where s i = location of observation Y i and bw = bandwidth (smoothing parameter). Larger bw, smoother β GWR (s) surface. 17 / 53

SVC Example of Gaussian kernel 0.0 0.2 0.4 0.6 0.8 1.0 y 0 2 4 6 8 10 x 18 / 53

SVC Notes Fotheringham et al. (2002) (and their software) suggest using bw which minimizes the AIC. GWR a descriptive technique. Inference is complicated. Observation Y i receives different weight for each s. Provides a smooth estimate of the β GWR (s) surface, but not a probability model of Y as a function of β(s). Testing (e.g., β(s) = 0 at particular locations) difficult due to correlation induced by kernel function between separate local model fits (no global model). Most testing approaches based on ad hoc adjustments. 19 / 53

SVC Spatially varying coefficient models How could we do this with random effects? Consider a single covariate model Y i b 0,i ind N(β 0 + x 1,i β 1 + b 0,i, σ 2 ) b MVN(0, Σ) and Σ defines a spatial correlation matrix. β 0 + b 0,i sort of like β 0 (s) in GWR, but... GWR: β0 (s) is a smooth surface in GWR with smoothness defined by bw. SVC: Estimate β 0 and elements of Σ from data. These parameters induce spatial correlation between intercept associated with Y i and that of its neighbors. 20 / 53

SVC SVC Now, let s add spatial variation to β 1. Y i b 0,i, b 1,i ind N(β 0 + x 1,i β 1 + b 0,i + x 1,i b 1,i, σ 2 ) b 0 MVN(0, Σ b0 ) b 1 MVN(0, Σ b1 ) 21 / 53

SVC Notes Good news: [b 1 Y] gives inference we want (but at a computational cost). Not-so-good news: Identifiability Often complicated to fit via MCMC (hence the flurry of papers in 2003). 22 / 53

SVC So... What if the data are non-gaussian? Model it: GLMs Y =0/1 or proportions, logistic regression. Y = counts or rates, Poisson regression. Some additional baggage (mean and variance related) Estimate β via iteratively reweighted least squares. GWR for Poisson regression (Nakaya et al. 2005). GLMM for adding random effects (Agresti et al. 2000). 23 / 53

SVC My background: Spatial epidemiology Poisson regression with spatially correlated random effects. Outcomes: Counts of (incident or prevalent) cases of disease in small areas. Covariates: Environmental exposures. Potential confounders: Demographics. Data often from different sources (health department, EPA, census), linked by location via geographic information systems (GISs). Spatial components: Adjusting estimation for residual spatial correlation. 24 / 53

SVC Our data for today Outcome: Rates (number of cases per person per year) of violent crimes (police/sheriff reports). Covariates: Alcohol distribution (licenses and sales), illegal drug arrests (police/sheriff reports). Potential confounders: Sociodemographics (census). Linked to common spatial framework (census tracts) via GIS. 25 / 53

SVC Translation complications When are crime data like disease data? Counts from small areas. Per person rate of interest. When are crime data not like disease data? Outcome not as rare. Police vs. medical records. Residents not only ones at risk. 26 / 53

Background: Alcohol, drug arrests, and violent crime Large body of research showing association between alcohol distribution and the incidence of violence. Usually focuses on characteristics of: People (social normative, social disorganization theories) Places (routine activities theory) Interactions of people and places (crime potential, ecology of crime) Alcohol distribution of interest since it is regulated and we have data on what and how much is sold where. 27 / 53

Three questions and who asks them What population characteristics are associated with increased incidence of violence? (Criminologists, Sampson and Lauritsen 1994). In what places is violence more likely? (Routine activity theorists, Felson et al. 1997). What spatial interactions link people and place characteristics to violence? (Environmental criminologists, Brantingham and Brantingham 1993, 1999). 28 / 53

Data description Spatial support: 439 census tracts (2000 Census). Violent crime (murder, robbery, rape, aggravated assault) first reports for year 2000 from City of Houston Police Department website. Gorman et al. (2005, Drug Alcohol Rev) report less than 5% discrepancy with 2000 Uniform Crime Reports. 98% of reports geocoded to the census tract level. Alcohol data (locations of active distribution sites in 2000) from Texas Alcoholic Beverage Commission (6,609 outlets), 99.5% geocoded to the tract level. Drug law violations (also from City of Houston police data). 98% geocoded to the tract level. 29 / 53

Violent Crime reporting rates, Houston, 2000 Violent Crimes per Person, 2000 0.000-0.002 0.002-0.007 0.007-0.011 0.011-0.017 0.017-13.33 30 / 53

Standarized log(drug arrests), Houston, 2000 Standardized Log(Drug Arrests), 2000-1.83 - -0.83-0.83 - -0.20-0.20-0.31 0.31-0.79 0.79-4.63 31 / 53

Standarized log(alcohol sales), Houston, 2000 Standardized Log(Alcohol Sales), 2000 Standardized Log(Total Sales) -2.39 - -0.65-0.65 - -0.13-0.13-0.21 0.21-0.64 0.64-6.16 32 / 53

Data features 7 of 439 tracts have extremely small population sizes: 1, 3, 4, 16, 34, 116, and 246. Tracts typically have 3,000-5,000 residents. Local rates for such tracts are extremely unstable (e.g., 40 reports, 3 residents). Actually a motivating a reason for including the spatially varying intercept: borrow information across regions. 33 / 53

Low population tracts and high rates Low population size tracts Tract population less than 250 Per person rate of violent crime reports 0.00-1.00 1.00-13.33 34 / 53

Basic Poisson regression Let Y i = number of reports in tract i, i = 1,..., 439. Suppose Y i Poisson(E i exp(µ i )), where E i = the expected number of reports under some null model. Typically, E i = n i R where all n i individuals in region i are equally likely to report. exp(µ i ) = relative risk of outcome in region i. We add covariates in linear format (within exp( )): µ i = β 0 + β 1 x alc,i + β 2 x drug,i. Same skeleton for both GWR and SVC. 35 / 53

Why do we have E i? E i = n i R represents an offset in the model and lets us use Poisson regression to model rates as well as counts. E[Y i ] = E i exp(β 0 + β 1 x alc,i + β 2 x drug,i ) = exp(ln(e i ) + β 0 + β 1 x alc,i + β 2 x drug,i ) = exp(ln(n i ) + ln(r) + β 0 + β 1 x alc,i + β 2 x drug,i ) log(e[y i ]) = ln(n i ) + ln(r) + β 0 + β1x alc,i + β 2 x drug,i GWR offset: ln(n i ), SVC offset: ln(n i ) + ln(r). 36 / 53

GWPR (Nakaya et al., 2005) βgwpr = (X W(s)A(s)X) 1 X W(s)A(s)Z(s). A(s) = diagonal matrix of Fisher scores. Z(s) = Taylor-series approximation to transformed outcomes. Update A(s), Z(s) and β GWPR until convergence. 37 / 53

Fitting in R Waller et al. (2007) use GWR 3.0 software. Now can use R. maptools will read in ArcGIS-formatted shapefile (files) into R. spgwr fits linear GWR and GLM-type GWR. Let s try it out! 38 / 53

SVC µ i = β 0 + β 1 x alc,i + β 2 x drug,i + b 1,i x alc,i + b 2,i x drug,i + φ i + θ i. β 0, β 1, β 2 Uniform. Random intercept has 2 components (Besag et al. 1991): N(0, τ 2 ) ( j φ i φ j N w ) ijφ j 1 j w, ij λ j w. ij θ i ind where w ij defines neighbors, and λ controls spatial similarity. θ i allows overdispersion (smoothing to global mean). φ i follows conditionally autoregressive distribution (smoothing to local mean), generates MVN but more convenient for MCMC. 39 / 53

Defining the SVCs b 1, b 2 also given spatial priors and allowed to be correlated with one another. We use a formulation by Leyland et al. (2000) which defines. (b 1,i, b 2,i ) MVN((0, 0), Σ) 40 / 53

Fitting in WinBUGS Define the model in WinBUGS. MCMC fit. Note: Runs sloooooooowly. 41 / 53

Implementation Waller et al. (2007): GWR3.0 used to fit the GWR Poisson model. Converged to estimate in 100 iterations. Minutes. Example code using spgwr library in R. WinBUGS 1.4.1 used to fit SVC model. Converged to distribution in 2,000 iterations. 8,000 additional iterations used for inference. Hours. Fit several versions of SVC model and compared fit via deviance information criterion (Spiegelhalter et al., 2003). Best fit included spatial varying coefficients, random intercept, and correlation between alcohol and drug effects. 42 / 53

Frequency 0 50 100 150 200 Frequency 0 50 100 150 200 Frequency 0 50 100 150 200 Outline : Intercept WinBUGS: Posterior medians of beta0 + phi -6.5-6.0-5.5-5.0-4.5-4.0-3.5 Tract-specific beta0 + phi WinBUGS: Posterior medians of beta0 + phi + theta -6.5-6.0-5.5-5.0-4.5-4.0-3.5 Tract-specific beta0 + phi + theta GWR: Smoothed estimates of beta0-6.5-6.0-5.5-5.0-4.5-4.0-3.5 Tract-specific beta0 43 / 53

Frequency 0 50 100 150 200 Frequency 0 50 100 150 200 Frequency 0 50 100 150 200 Frequency 0 50 100 150 200 Outline : Alcohol sales and drug arrests WinBUGS: Posterior medians of beta1 WinBUGS: Posterior medians of beta2-1.0 0.0 0.5 1.0 1.5 2.0 Tract-specific beta1 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Tract-specific beta2 GWR: Smoothed estimates of beta1 GWR: Smoothed estimates of beta2-1.0 0.0 0.5 1.0 1.5 2.0 Tract-specific beta1 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Tract-specific beta2 44 / 53

Estimated effects GWR: Local Estimate of Intercept WinBUGS: Local Estimate of Intercept -6.200 - -4.914-4.914 - -4.805-4.805 - -4.727-4.727 - -4.66-4.66 - -4.554-6.168 - -5.237-5.237 - -5.014-5.014 - -4.862-4.862 - -4.665-4.665 - -4.021 GWR: Local Estimate of Beta 1 WinBUGS: Local Estimate of Beta 1-0.186-0.076 0.076-0.13 0.13-0.178 0.178-0.235 0.235-1.385-0.273-0.069 0.069-0.13 0.13-0.182 0.182-0.234 0.234-0.933 GWR: Local Estimate of Beta 2 WinBUGS: Local Estimate of Beta 2 0.498-0.67 0.67-0.723 0.723-0.784 0.784-0.903 0.903-1.69 0.181-0.687 0.687-0.814 0.814-0.923 0.923-1.083 1.083-2.278 45 / 53

Similarities Alcohol: Increased impact in western, south-central, and southeastern parts of city. Illegal drug: Increased impact on periphery, lower influence in central and southwestern parts of city. Intercept: Increased risk of violence in central area, above and beyond that predicted by alcohol sales and illegal drug arrests. But, associations not too close... 46 / 53

GWR estimate 0.0 1.0 2.0 3.0 GWR estimate -6.5-5.5-4.5-3.5 GWR estimate -1.0 0.0 1.0 2.0 Outline : tract-by-tract Local estimates of Intercept Local estimates of beta1-6.5-5.5-4.5-3.5 WinBUGS estimate -1.0 0.0 1.0 2.0 WinBUGS estimate Local estimates of beta2 0.0 1.0 2.0 3.0 WinBUGS estimate 47 / 53

Differences GWR much smoother based on global best fit for bw. SVC used adjacency-based smoothing and a different amount of smoothing for each covariate. GWR: collineary between surfaces (Wheeler and Tiefelsdorf, 2005). SVC: Model based approach removes (or at least reduces) collinearity. 48 / 53

GWR: Collinearity? 49 / 53

SVC: Collinearity? 50 / 53

Bayes ind: beta 2 0.5 1.0 1.5 2.0 Outline SVC: No prior correlation 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Bayes ind: beta 1 51 / 53

Let s try it out! Houston data on violent crime, alcohol sales, and illegal drug arrests. ArcGIS shapefile. Required R libraries: map tools (to read in shape file), RColorBrewer (to set colors), classint (to set intervals of values for mapping), and spgwr (for GWR). 52 / 53

GWR and SVC very different approaches to the same problem. Qualitatively similar in results, but not directly transformable. GWR fixed problems within somewhat of a black box. SVC allows probability model-based inference with lots of flexibility but at a computational cost (both in set-up and implementation). Ongoing work: Wheeler and Waller (2009): Attempt to set up SVC model to more closely mirror amount of smoothing in GWR. Collinearity ribbons. Griffith (2002) eigenvector spatial filtering to adjust collinearity. Interpretability? 53 / 53