Estimating the long-term health impact of air pollution using spatial ecological studies. Duncan Lee

Similar documents
Challenges in modelling air pollution and understanding its impact on human health

A Bayesian localised conditional autoregressive model for estimating the health. effects of air pollution

Air Quality Modelling under a Future Climate

Controlling for unmeasured confounding and spatial misalignment in long-term air pollution and health studies

A rigorous statistical framework for estimating the long-term health impact of air pollution

K. (2017) A 18 (2) ISSN

Handbook of Spatial Epidemiology. Andrew Lawson, Sudipto Banerjee, Robert Haining, Lola Ugarte

Air Quality Modelling for Health Impacts Studies

arxiv: v2 [stat.ap] 18 Apr 2016

Bayesian Hierarchical Models

Extended Follow-Up and Spatial Analysis of the American Cancer Society Study Linking Particulate Air Pollution and Mortality

Statistical and epidemiological considerations in using remote sensing data for exposure estimation

Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution

Pumps, Maps and Pea Soup: Spatio-temporal methods in environmental epidemiology

Flexible Spatio-temporal smoothing with array methods

Measurement Error in Spatial Modeling of Environmental Exposures

A spatio-temporal process-convolution model for quantifying health inequalities in respiratory prescription rates in Scotland

Bayesian Spatial Health Surveillance

Causal Modeling in Environmental Epidemiology. Joel Schwartz Harvard University

Aggregated cancer incidence data: spatial models

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Developments in operational air quality forecasting at the Met Office

Package CARBayesdata

Statistical integration of disparate information for spatially-resolved PM exposure estimation

An application of the GAM-PCA-VAR model to respiratory disease and air pollution data

Roger S. Bivand Edzer J. Pebesma Virgilio Gömez-Rubio. Applied Spatial Data Analysis with R. 4:1 Springer

Deposited on: 07 September 2010

The Use of Spatial Exposure Predictions in Health Effects Models: An Application to PM Epidemiology

Predicting Long-term Exposures for Health Effect Studies

Extending clustered point process-based rainfall models to a non-stationary climate

Health-exposure modeling and the ecological fallacy

Spatial bias modeling with application to assessing remotely-sensed aerosol as a proxy for particulate matter

Linkage Methods for Environment and Health Analysis General Guidelines

INTRODUCTION. In March 1998, the tender for project CT.98.EP.04 was awarded to the Department of Medicines Management, Keele University, UK.

Health-Exposure Modelling and the Ecological Fallacy

P -spline ANOVA-type interaction models for spatio-temporal smoothing

Flexible modelling of the cumulative effects of time-varying exposures

the university of british columbia department of statistics technical report # 217

A spatial causal analysis of wildfire-contributed PM 2.5 using numerical model output. Brian Reich, NC State

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

Uncertainty in merged radar - rain gauge rainfall products

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit

Regression Modelling. Dr. Michael Schulzer. Centre for Clinical Epidemiology and Evaluation

Geographical Inequalities and Population Change in Britain,

Combining Incompatible Spatial Data

Nuoo-Ting (Jassy) Molitor, Nicky Best, Chris Jackson and Sylvia Richardson Imperial College UK. September 30, 2008

Disease mapping with Gaussian processes

Analysis of Marked Point Patterns with Spatial and Non-spatial Covariate Information

Applications of GIS in Health Research. West Nile virus

Spatio-Temporal Threshold Models for Relating UV Exposures and Skin Cancer in the Central United States

Package CARBayesST. April 17, Type Package

Characterisation of Near-Earth Magnetic Field Data for Space Weather Monitoring

Using sensor data and inversion techniques to systematically reduce dispersion model error

Hierarchical Modeling for Spatio-temporal Data

Alexina Mason. Department of Epidemiology and Biostatistics Imperial College, London. 16 February 2010

Space-time modelling of air pollution with array methods

Combining multiple observational data sources to estimate causal eects

American Journal of EPIDEMIOLOGY

Measurement error effects on bias and variance in two-stage regression, with application to air pollution epidemiology

BAYESIAN HIERARCHICAL MODELS FOR MISALIGNED DATA: A SIMULATION STUDY

Brent Coull, Petros Koutrakis, Joel Schwartz, Itai Kloog, Antonella Zanobetti, Joseph Antonelli, Ander Wilson, Jeremiah Zhu.

Bayesian hierarchical modelling for data assimilation of past observations and numerical model forecasts

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Radar-raingauge data combination techniques: A revision and analysis of their suitability for urban hydrology

Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency

Impact of Cliff and Ord (1969, 1981) on Spatial Epidemiology

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Dirichlet process Bayesian clustering with the R package PReMiuM

Spatio-temporal modeling of weekly malaria incidence in children under 5 for early epidemic detection in Mozambique

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

Spatio-temporal modelling of daily air temperature in Catalonia

BAYESIAN MODEL FOR SPATIAL DEPENDANCE AND PREDICTION OF TUBERCULOSIS

SPATIO-TEMPORAL REGRESSION MODELS FOR DEFORESTATION IN THE BRAZILIAN AMAZON

Points. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Statistical Analysis of Spatio-temporal Point Process Data. Peter J Diggle

A comparison of fully Bayesian and two-stage imputation strategies for missing covariate data

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

A Geostatistical Approach to Linking Geographically-Aggregated Data From Different Sources

A general mixed model approach for spatio-temporal regression data

Lognormal Measurement Error in Air Pollution Health Effect Studies

ABSTRACT. to serious health problems, including mortality. PM 2.5 has five main components: sulfate,

Fully Bayesian Spatial Analysis of Homicide Rates.

Cross-Sectional Regression after Factor Analysis: Two Applications

On case-crossover methods for environmental time series data

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

SPATIAL ANALYSIS & MORE

Modelling air pollution and personal exposure in Bangkok and Mexico City using a land use regression model

Statistics: A review. Why statistics?

Spatial Analysis of Incidence Rates: A Bayesian Approach

USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES

Estimating direct effects in cohort and case-control studies

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

Cluster Analysis using SaTScan

Introduction to Spatial Data and Models

Using Geographic Information Systems for Exposure Assessment

Statistics 360/601 Modern Bayesian Theory

Community Health Needs Assessment through Spatial Regression Modeling

QUANTIFYING THE SPATIAL INEQUALITY AND TEMPORAL TRENDS IN MATERNAL SMOKING RATES IN GLASGOW

Railway suicide clusters: how common are they and what predicts them? Lay San Too Jane Pirkis Allison Milner Lyndal Bugeja Matthew J.

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Transcription:

Estimating the long-term health impact of air pollution using spatial ecological studies Duncan Lee EPSRC and RSS workshop 12th September 2014

Acknowledgements This is joint work with Alastair Rushworth and Richard Mitchell from the University of Glasgow and Sujit Sahu and Sabyasachi Mukhopadhyay from the University of Southampton, and Paul Agnew, Christophe Sarran, Fiona O Connor and Rachel McInnes from the Met Office. The work is funded by the EPSRC grants EP/J017442/1 and EP/J017485/1. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 2/28

1. Introduction Air pollution has long been known to adversely affect public health, in both the developed and developing world. A recent report by the UK government estimates that particulate matter alone reduces life expectancy by 6 months, with a health cost of 19 billion per year. Epidemiological studies into the effects of air pollution have been conducted since the 1990s, with one of the first being that conducted by Schwartz and Marcus (1990) in London. Since 1990 a large number of studies have been conducted, which collectively have investigated the short-term and long-term health impact of air pollution. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 3/28

Study designs Pollution legislation continues to be informed by epidemiological studies investigating both the short-term and long-term health effects of air pollution exposure. Acute studies investigate the effects resulting from a few days of high exposure. e.g. NMMAPS in the USA, Dominici et al (2002) and APHEA in Europe, Katsouyanni et al (2001). Chronic studies investigate the effects of cumulative (long-term) exposure over months and years. e.g. Dockery et al (1993) in six US cities, and Elliot et al (2007) in the UK. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 4/28

Chronic studies There are two main study designs when investigating the effects of long-term exposure to air pollution. Cohort studies e.g. The Six Cities study by Dockery et al (1993) and the American Cancer study by Pope et al (2002), which relate average air pollution concentrations to the health status of a large pre-defined cohort of people. Spatial ecological studies e.g. Elliot et al (2007) and Lee et al (2009), which relate yearly average air pollution concentrations in a set of contiguous areas (such as electoral wards), against yearly numbers of health events from the population living in that area. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 5/28

2. Spatial ecological studies Spatial ecological studies relate to populations living in a set of n non-overlapping areal units, rather than to individuals. Examples of such studies include Jerrett et al. (2005), Elliott et al. (2007), Lee et al. (2009) and Greven et al. (2011). The health data are denoted by Y = (Y 1,..., Y n ) and E = (E 1,..., E n ), which are the observed and expected numbers of disease cases in each areal unit over a year. The expected numbers of cases are computed using external standardisation, based on age and sex specific disease rates. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 6/28

Example - 323 local authorities in England Respiratory hospitalisation risk in 2010 - SIR k = Y k /E k. 2.5 2.0 Northing 1.5 1.0 Easting 0 100km 0.5 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 7/28

Pollution data Pollution data comes from two distinct sources. Observed data from the AURN network at single fixed geographical points located throughout the study region. These data are known to be measured with little error but do not provide complete spatial coverage of England. Estimated background concentrations over 12 Kilometre grid cells from the Air Quality in the Unified Model (AQUM) run by the Met Office. These model estimates provide complete spatial coverage of the study region, but are known to contain biases and are less accurate then the monitoring data. Both data sets can be available at an hourly resolution. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 8/28

Northing PM2.5 in local authorities in England 0 100km Easting How best to combine these two sources of data to estimate annual average pollution levels in each local authority? 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 9/28

A common statistical model for these data Y k Poisson(E k R k ), log(r k ) = z T k β z + x k β x + φ k, where R k quantifies disease risk in area k, so R k = 1.2 means a 20% increased risk of disease. z k is a vector of other covariates influencing health risk in the kth areal unit, such as measures of deprivation. x k is the estimated annual average pollution concentrations in the kth areal unit, and β x is the log-risk of air pollution on health. φ = (φ 1,..., φ n ) are random effects to model residual spatial autocorrelation not captured by the covariates. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 10/28

Residual spatial autocorrelation Spatial autocorrelation occurs when observations geographically close are more similar than those further apart. The residuals from fitting a regression model to the health data with just (z T k, x k) are typically spatially autocorrelated, requiring the random effect φ k to account for it. The autocorrelation is typically caused by unmeasured confounding, namely the presence of important spatially smooth risk factors that have been omitted from the regression model. Ignoring this autocorrelation is known to bias β x. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 11/28

Modelling spatial correlation Conditional Autoregressive (CAR, Besag et al. (1991)) models are typically specified to capture the spatial autocorrelation in φ, and can be written as a set of n univariate full conditional distributions f (φ k φ k ) for k = 1,..., n as: ( n φ k φ k, τ 2 i=1, W N w kiφ i n i=1 w, ki ) τ 2 n i=1 w. ki Here W = (w ki ) is a binary n n neighbourhood matrix, with w ki = 1, denoted k i if areal units (k, i) share a common border and w ki = 0 otherwise. Here w kk = 0. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 12/28

Interpreting β x The relative risk for a 1µgm 3 increase in pollution concentrations measures the proportional increase in health risk from increasing pollution by 1µgm 3, and is calculated as RR(β x, 1) = E k exp(z T k β z + (x k + 1)β x + φ k ) E k exp(z T k β z + x k β x + φ k ) = exp(1 β x ). Hence a relative risk of 1.03 means a 3% increase in disease risk when the pollution level increases by 1µgm 3. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 13/28

Limitations - spatial autocorrelation The CAR prior forces the random effects (φ 1,..., φ n ) to be globally spatially smooth everywhere. This causes two problems: Collinearity with covariates that are also globally smooth such as air pollution, as was illustrated by Clayton et al. (1993) and Hughes and Haran (2013). The spatial autocorrelation in the data remaining after accounting for the covariates is unlikely to be globally spatially smooth, because the disease data (e.g. the SIR) are not globally smooth so the residuals after removing covariate effects are also unlikely to be. Both these may result in biased estimates of β x. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 14/28

Limitations - Pollution data The pollution concentration for area k, x k is typically taken to be the average value from the set of modelled concentrations lying in area k. However, this has the following limitations: 1 The modelled AQUM data are known to contain biases, so the estimate of the average pollution concentration in each unit may be biased. In contrast, the AURN monitoring data are likely to measure with little error, but do not cover the entire study region. 2 The concentration for area k, x k is assumed to be a true known measurement when estimating its health effect. However, the true average is unknown and x k is an estimate and is subject to error and uncertainty. Ignoring these two issues may result in biased estimates of β x. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 15/28

3. Modelling This project aims to address all of these statistical shortcomings, by: 1 Developing a spatial regression model linking the monitoring and modelled pollution data, thus allowing average pollution concentrations x k to be predicted for each areal unit with an associated measure of uncertainty. 2 Extending the health model so that it allows for the uncertainty in x k. 3 Extending the health model so that it models the spatial autocorrelation (a prior for (φ 1,..., φ n )) more flexibly than the global CAR model, allowing for local rather than global patterns of spatial structure. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 16/28

3.1. A model for the pollution data A spatial regression model was developed, which has the monitoring data as the response and the modelled data as a covariate. This model allows for spatial autocorrelation in the pollution data, which is used to predict the monitoring values at unmeasured locations. Predictions were made at each 12 kilometre grid square (to align with the modelled data), and the predictions were averaged over each local authority to give the predicted average concentration of pollution in each area. A Bayesian modelling approach was taken, allowing a set of N predictions to be made for each local authority average concentration x k. That is, x k is predicted by the set (x (1),..., x (N) ), which quantifies its uncertainty. k k 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 17/28

Average PM 2.5 concentrations in 2010 16 15 14 13 Northing 12 11 10 9 0 100km Easting 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 18/28

Standard deviation in PM 2.5 concentrations 4.0 3.5 3.0 Northing 2.5 2.0 1.5 1.0 0 100km 0.5 Easting 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 19/28

3.2. Allowing for uncertainty in x k The standard regression model assumes x k is a known value measured without error, which is unrealistic in this setting. Two main statistical approaches have been proposed for correcting this. 1 Measurement error models - This approach assumes the predicted values (x (1) k,..., x (N) k ) are error prone measurements of the single true but unknown concentration x k. 2 Ecological bias models - This approach assumes the predicted values (x (1) k,..., x (N) k ) represent the range of concentrations in area k, allowing for within area variation in the concentrations. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 20/28

Ecological bias models As the health data Y k summarises disease burden over a population, the standard model naively assumes the pollution concentration x k is the same for each individual within area k. Work by Wakefield and Salway (2001) and others has shown that when there is within area variation in pollution concentrations, the estimated β x from the ecological level model does not equal the individual level association β I. They show this by taking a hypothesised individual level models and aggregating it to the population level, and showing that is has a different mathematical form to the standard ecological model. E[exp(x (i) k β I)] exp(e[x (i) k ]β x) 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 21/28

A model for ecological bias Richardson et al (1987) show that one solution to overcome this problem is to change the model for R k from to R k = exp(z T k β z + µ k β x + φ k ), R k = exp(z T k β z + µ k β x + σ 2 kβ 2 x /2 + φ k ), thus explicitly incorporating the variation in x k. Here (µ k, σk 2) respectively denote the mean and variance of (x (1) k,..., x (N) k ). This change is based on assuming the within area exposure distribution is Gaussian, and comes direct from the moment generating function E[exp(x (i) k β I)]. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 22/28

The impact of ecological bias The difference between the naive ecological model and the corrected model is the term σ 2 k β2 x /2, which is likely to be small if: β x is small. σ 2 k is small. The former is likely to be true and the latter may or may not be, and preliminary analyses show that the impact of ecological bias may be small for these studies. We are currently investigating the exact extent of this problem. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 23/28

3.3. The problem with CAR models for φ Recall that in the CAR spatial autocorrelation model for (φ 1,..., φ n ), spatial autocorrelation is induced by the binary neighbourhood matrix W, where if areas (k, i) share a common border (are spatially close) then w ki = 1. This induces spatial autocorrelation between (φ k, φ i ) as can be seen from their partial correlations: Corr[φ k, φ i φ ki ] = w ki ( n j=1 w kj)(. n l=1 w il) Hence all pairs of areas sharing a common border (w ki = 1) will be correlated. This assumes the same level of spatial smoothness across (φ 1,..., φ n ) and is not realistic. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 24/28

Alternatives? Ignore the spatial autocorrelation entirely and let φ k = 0 for all areas k. Replace the random effects (φ 1,..., φ n ) with an alternative spatial smoothing component that is forced to be orthogonal (unrelated) to the air pollution covariate, so that spatial confounding cannot occur. Such an approach was proposed by Hughes and Haran in (2013). Extend the CAR model to make it more flexible and allow for localised smoothness in the random effects, so that geographically adjacent values can be modelled as similar or very different. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 25/28

Locally smooth CAR models A number of approaches have been proposed to extend the standard CAR model to allow for localised spatial smoothness. They can be broken into two main approaches: Treat each w ki relating to neighbouring areas as binary random quantities, so that if w ki is estimated as one then (φ k, φ i ) are spatially smoothed, while if w ki is estimated as zero they are not. Examples include Lee and Mitchell (2013) and Lee, Rushworth and Sahu (2014). Augment the spatially smooth random effects with a piecewise constant jump component with different mean levels, so that if two areas close together have different mean levels their residuals will not be similar. Examples include Lawson and Clark (2002), Lee et al (2014). 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 26/28

Results The choice of residual spatial autocorrelation model can make a large difference on the estimated pollution-health relationship. The estimated relative risks and 95% uncertainty intervals for a 1µgm 3 increase in PM 2.5 were: Ignore correlation - 1.032 (1.005, 1.060). CAR model - 1.065 (1.043, 1.091). Orthogonal model - 1042 (1.040, 1.043). Localised CAR model - 1.046 (1.033, 1.060). Both the estimates and uncertainty intervals can differ substantially. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 27/28

5. Conclusions 1 Developing a statistically valid approach for estimating the health effects of air pollution using spatial ecological data is a challenging task, and requires complex models for both the pollution and health data. 2 Using an inappropriate statistical model results in estimated health effects that are likely to be biased. 3 Future work will extend this model into the spatio-temporal domain, and the replication of the spatial data over time will enable more precise estimation of the air pollution effects. 4 The effects of air pollution will be investigated across Scotland, to see if the England results are replicated here. 1. Introduction 2. Spatial ecological Studies 3. Modelling 5. Conclusions 28/28