Flexible modelling of the cumulative effects of time-varying exposures

Applications in environmental, cancer and pharmaco-epidemiology
Antonio Gasparrini
Department of Medical Statistics, London School of Hygiene and Tropical Medicine
Centre for Statistical Methodology, 28 November 2014

Outline
1 Introduction
2 Conceptual model
3 Statistical model
4 Examples
5 Software
6 Extensions
7 Discussion

Temporal aspects
The relationship between a risk factor and the associated health effect always implies a temporal dependency, a common problem in biomedical research.
This issue affects both study design and statistical modelling, for example:
  Tobacco smoke and CVD risk
  Occupational exposure and incidence of cancer
  Drug intake and beneficial or side effects
  Short-term temperature variation and mortality
A topic (somewhat) neglected in methodological research.

Previous research
Standard statistical approaches do not directly characterize this temporal structure.
Challenge: modelling (potentially complex) temporal patterns of risk due to time-varying exposures.
Models were previously proposed in cancer epidemiology (Thomas 1988, Hauptmann 2000, Richardson 2009) and pharmaco-epidemiology (Abrahamowicz 2012).

Limitations
Incomplete statistical development: e.g. no measures of uncertainty.
Poor software implementation: ad hoc routines, computational issues, convergence problems.
Lack of a consistent conceptual and interpretational framework.

Distributed lag models
DLMs were proposed by Almon (Econometrica 1965) in econometrics for time series data, then applied in environmental epidemiology by Schwartz (Epidemiology 2000).
Armstrong (Epidemiology 2006) extended them to distributed lag non-linear models (DLNMs), applicable to non-linear exposure-response associations.
A far more developed statistical framework, but only applicable to time series data.

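Conceptual representation: single exposure event (forward perspective)
[Figure: effect over time of a single exposure at time t, followed forward over times t, t+1, t+2, ..., t+L]

Conceptual representation: multiple exposure events (backward perspective)
[Figure: effect at time t of exposures at times t-L, ..., t-2, t-1, t]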

Assumptions
Under specific assumptions, these two perspectives can be merged:
  assumption of identical effects (fundamental)
  assumption of independence
These conditions underpin the conceptual framework for defining and modelling DLNMs.

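Conceptual representation: new lag dimension
[Figure: effect plotted along a lag axis from 0 to L, combining the forward perspective (time axis from t_0 to t_0 + L, with t_e marked) and the backward perspective]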

Exposure-lag-response associations
The risk is represented by a function $s(x_{t-\ell_0}, \ldots, x_{t-L})$ defined in terms of both the intensity and the timing of a series of past exposures, expressed through:
  an exposure-response function $f(x)$ for exposure $x$
  a lag-response function $w(\ell)$ for lag $\ell$
Together these generate a bi-dimensional exposure-lag-response function $f \cdot w(x, \ell)$, whose integral over the lag period provides:
$$s(x_{t-\ell_0}, \ldots, x_{t-L}) = \int_{\ell_0}^{L} f \cdot w(x_{t-\ell}, \ell)\, d\ell \approx \sum_{\ell=\ell_0}^{L} f \cdot w(x_{t-\ell}, \ell)$$
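As a rough numerical illustration of the sum above, the following R sketch evaluates s for a single exposure history. The functions f and w, the exposure values and the lag range are all invented for illustration, and the bi-dimensional function is assumed to factor as f(x) times w(l), which is a simplification rather than part of the general definition.

```r
# Hypothetical exposure-response: linear in exposure intensity
f <- function(x) 0.01 * x
# Hypothetical lag-response: exponentially decaying weight over lags
w <- function(l) exp(-l / 5)

l0 <- 0; L <- 10
lags <- l0:L

# Exposure history at time t: x_{t-l0}, ..., x_{t-L} (most recent first)
x_hist <- c(20, 20, 15, 15, 10, 10, 5, 5, 0, 0, 0)

# Cumulative effect: sum over lags of f(x_{t-l}) * w(l)
s <- sum(f(x_hist) * w(lags))
s
```

Each past exposure contributes according to both its intensity and how long ago it occurred, so the same total dose can yield different values of s depending on its timing.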

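Distributed lag models (DLMs)
Given an exposure history at time $t$ for lags $\ell = \ell_0, \ldots, L$:
$$\mathbf{q}_{x,t} = [x_{t-\ell_0}, \ldots, x_{t-\ell}, \ldots, x_{t-L}]^\top$$
and assuming a linear exposure-response, we can write:
$$s(\mathbf{q}_{x,t}; \boldsymbol{\eta}) = \mathbf{q}_{x,t}^\top \mathbf{C} \boldsymbol{\eta} = \mathbf{w}_{x,t}^\top \boldsymbol{\eta}$$
where $\mathbf{C}$ is obtained from the lag vector $\boldsymbol{\ell} = [\ell_0, \ldots, \ell, \ldots, L]^\top$ by applying a specific basis transformation.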
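The DLM construction can be sketched in base R as follows. The natural cubic spline basis, the exposure values and the coefficients eta are assumptions chosen only to make the algebra concrete; they are not taken from any of the analyses in this talk.

```r
library(splines)  # ships with base R

l0 <- 0; L <- 20
lags <- l0:L

# C: basis matrix for the lag dimension (natural cubic spline, 4 columns)
C <- ns(lags, df = 4, intercept = TRUE)

# q: one exposure history, ordered from lag l0 to lag L (made-up values)
set.seed(1)
q <- round(runif(length(lags), 0, 30), 1)

# w = C'q: the transformed variables that enter the regression model
w <- drop(crossprod(C, q))

# With hypothetical coefficients eta, the cumulative effect is w'eta,
# and C %*% eta is the implied lag-response curve per unit of exposure
eta <- c(0.02, 0.01, -0.005, 0)
s <- sum(w * eta)
lag_curve <- drop(C %*% eta)

# Both routes give the same cumulative effect
all.equal(s, sum(q * lag_curve))
```

The point of the basis transformation is that the model only has to estimate as many coefficients as C has columns (four here) rather than one per lag.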

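Distributed lag non-linear models (DLNMs)
First, the matrix $\mathbf{R}_{x,t}$ is obtained by applying a second basis transformation to $\mathbf{q}_{x,t}$.
Then we define a tensor product:
$$\mathbf{A}_{x,t} = (\mathbf{1}_{v_\ell}^\top \otimes \mathbf{R}_{x,t}) \odot (\mathbf{C} \otimes \mathbf{1}_{v_x}^\top)$$
which forms the cross-basis:
$$s(\mathbf{q}_{x,t}; \boldsymbol{\eta}) = (\mathbf{1}_{L-\ell_0+1}^\top \mathbf{A}_{x,t}) \boldsymbol{\eta} = \mathbf{w}_{x,t}^\top \boldsymbol{\eta}$$
where $v_x$ and $v_\ell$ are the dimensions of the two bases and the vector of ones sums $\mathbf{A}_{x,t}$ over the $L - \ell_0 + 1$ lags.
The problem reduces to choosing a basis for $\mathbf{q}_{x,t}$ and for $\boldsymbol{\ell}$, defining the exposure-response and lag-response functions, respectively.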
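A minimal base-R sketch of this tensor product for one exposure history is given below. The spline choices, knot positions and exposure values are assumptions made for illustration; in practice the dlnm package builds the cross-basis internally.

```r
library(splines)

l0 <- 0; L <- 10
lags <- l0:L
set.seed(2)
q <- runif(length(lags), 0, 30)        # exposure history (made-up values)

# Helper: strip spline attributes so we work with plain numeric matrices
plain <- function(m) matrix(as.numeric(m), nrow(m), ncol(m))

# R: basis for the exposure dimension, evaluated at each past exposure
R <- plain(bs(q, degree = 2, knots = 15, Boundary.knots = c(0, 30)))
# C: basis for the lag dimension, evaluated at each lag
C <- plain(ns(lags, df = 3, intercept = TRUE))
vx <- ncol(R); vl <- ncol(C)

# Row-wise tensor product: one row per lag, vx * vl columns
A <- kronecker(matrix(1, 1, vl), R) * kronecker(C, matrix(1, 1, vx))

# Summing over lags gives the cross-basis variables for this history
w <- colSums(A)

# Cross-check against an explicit loop over pairs of basis functions
w_check <- as.vector(sapply(seq_len(vl), function(k)
  sapply(seq_len(vx), function(j) sum(R[, j] * C[, k]))))
all.equal(w, w_check)
```

Each element of w pairs one exposure basis function with one lag basis function, which is why the number of coefficients is the product of the two basis dimensions.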

Alternative study designs
[Figure: exposure histories for subjects A, B and C and the times at which they are evaluated, shown for four designs: time series, case-control, cohort and longitudinal]

First example: temperature and all-cause mortality
Research area where DLNMs were originally proposed.
Time series data with daily death counts and temperature measurements between 1 January 1993 and 31 December 2006 in London (845,215 deaths in total).
In this setting, exposure histories are simply derived by lagging the temperature series.
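Deriving exposure histories by lagging a series can be sketched with base R's embed(). The temperature values and the maximum lag below are made up, and this is only one way to build the lag matrix.

```r
# Made-up daily temperature series
temp <- c(12, 14, 15, 13, 10, 8, 9, 11)
L <- 3   # maximum lag (days)

# embed() returns, for each day t > L, the row [x_t, x_{t-1}, ..., x_{t-L}]
Q <- embed(temp, L + 1)

# Pad with NA so that row t lines up with day t of the original series
Q <- rbind(matrix(NA, L, L + 1), Q)
colnames(Q) <- paste0("lag", 0:L)
Q
```

Row t of Q is the exposure history for day t; the first L days have incomplete histories and are left missing.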

Quasi-Poisson GLM
Analysis with a generalized linear model with quasi-Poisson family, controlling for trends and day of the week:
$$\log(\mu_t) = \alpha + s_x(\mathbf{q}_{x,t}; \boldsymbol{\eta}) + \sum_{p=1}^{P} s_z(z_t; \boldsymbol{\beta}_z)$$
Here spline functions are used to specify both $f(x)$ and $w(\ell)$.
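The structure of such a model can be sketched in base R on simulated data. Everything below (the simulated series, spline choices, maximum lag and degrees of freedom for the trend) is an assumption for illustration and does not reproduce the London analysis; with the dlnm package the cross-basis matrix W would normally come from crossbasis().

```r
library(splines)

set.seed(3)
n <- 500; L <- 7
temp <- 15 + 8 * sin(2 * pi * (1:n) / 365) + rnorm(n, sd = 3)  # simulated
Q <- rbind(matrix(NA, L, L + 1), embed(temp, L + 1))           # lag matrix

plain <- function(m) matrix(as.numeric(m), nrow(m), ncol(m))
kn <- median(temp); bk <- range(temp)
C <- plain(ns(0:L, df = 3, intercept = TRUE))   # lag basis, 3 columns
vx <- 3                                         # exposure basis: degree 2 + 1 knot

# Cross-basis variables for one exposure history q (NA if incomplete)
cb_row <- function(q) {
  if (anyNA(q)) return(rep(NA_real_, vx * ncol(C)))
  R <- plain(bs(q, degree = 2, knots = kn, Boundary.knots = bk))
  colSums(kronecker(matrix(1, 1, ncol(C)), R) *
          kronecker(C, matrix(1, 1, ncol(R))))
}
W <- t(apply(Q, 1, cb_row))

# Simulated daily counts with a small day-of-week pattern
dow <- factor((1:n) %% 7)
deaths <- rpois(n, exp(4 + 0.05 * (as.integer(dow) == 1)))

# Quasi-Poisson GLM: cross-basis plus day of week and a smooth time trend
fit <- glm(deaths ~ W + dow + ns(1:n, df = 4), family = quasipoisson())
summary(fit)$dispersion
```

Because the cross-basis enters as ordinary columns of the design matrix, the model is fitted with a standard regression routine and days with incomplete histories are dropped automatically.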

Exposure-lag-response
[Figure: 3-D surface of RR by temperature (°C) and lag]

Summaries
[Figure: four panels showing the exposure-lag-response surface, the lag-response at temperature 22°C, the exposure-response at lag 4, and the overall cumulative exposure-response; RR plotted against temperature (°C) and lag]

Second example: radon exposure and lung cancer mortality
3,347 subjects working in the Colorado Plateau mines between 1950 and 1960, with 258 lung cancer deaths.
Yearly exposure history to radon (WLM) and tobacco smoke (packs × 100) reconstructed from 5-year age periods.

Proportional hazards model
Analysis with a Cox proportional hazards model using age as the time axis, controlling for smoking and calendar year. For subject $i$:
$$\log[h_i(t)] = \log[h_0(t)] + s_x(\mathbf{q}_{x,i,t}; \boldsymbol{\eta}_x) + s_z(\mathbf{q}_{z,i,t}; \boldsymbol{\eta}_z) + \gamma u_{it}$$
Different functions are used to specify $f(x)$ and $w(\ell)$: constant, piecewise constant, quadratic B-spline.

Exposure-lag-response: linear-by-constant
[Figure: 3-D surface of HR by WLM/year and lag (years)]

Exposure-lag-response: spline-by-constant
[Figure: 3-D surface of HR by WLM/year and lag (years)]

Exposure-lag-response: linear-by-spline
[Figure: 3-D surface of HR by WLM/year and lag (years)]

Exposure-lag-response: step-by-step
[Figure: 3-D surface of HR by WLM/year and lag (years)]

Exposure-lag-response: spline-by-spline
[Figure: 3-D surface of HR by WLM/year and lag (years)]

Lag-response curves from DLNMs
[Figure: RR for 100 WLM/year against lag (years), comparing the spline-by-spline and spline-by-piecewise models]

Exposure-responses at different lags
[Figure: RR against WLM/year, with curves for lag 5, lag 15 and lag 25]

Third example: MMR vaccine and ITP risk
Data from 35 children who received the MMR (measles, mumps, rubella) vaccine and were admitted to hospital for idiopathic thrombocytopenic purpura (ITP) at 12-24 months of age.
Replicating and extending a previous analysis using the self-controlled case series design (Whitaker 2006).

Conditional Poisson regression
Analysis with conditional Poisson regression controlling for age. For subject $i$ at age $a$:
$$\log(\lambda_{iat}) = \alpha_i + s_x(\mathbf{q}_{x,i,t}; \boldsymbol{\eta}_x) + f(a_{it}; \boldsymbol{\gamma})$$
Single exposure event modelled with a binary variable.
Exposure-response assumed linear, lag-response modelled with spline or piecewise constant functions.

Lag-response
[Figure: IRR against lag (days), comparing the spline and piecewise constant lag-response functions]

Fourth example: tobacco and lung cancer incidence
1,479 cases and 1,918 controls from three case-control studies within the Synergy network.
Yearly exposure history to tobacco smoke (cigarettes/day) reconstructed from questionnaires.

Logistic regression
Analysis with logistic regression controlling for sex:
$$\operatorname{logit}(\mu_i) = \alpha + s_x(\mathbf{q}_{x,i}; \boldsymbol{\eta}_x) + \gamma u_i$$
Different functions are used to specify $f(x)$ and $w(\ell)$: log, piecewise constant, quadratic B-spline.

Exposure-lag-response: log-by-spline
[Figure: 3-D surface of OR by cig/day/year and lag (years)]

Lag-response curves from DLNMs
[Figure: OR for 20 cig/day/year against lag (years), comparing the spline and step lag-response functions]

Dynamic prediction of risk
[Figure: cumulative OR over 60 years for a smoking history (cig/day/year) with quit, relapse and cessation events marked]

Fifth example: trial on the effect of a drug
50 subjects followed for 4 weeks.
Time-varying treatment randomly allocated in two of the four weeks, each with a different dose selected at random.
Outcome measured at the end of the 28 days.

Linear regression
Analysis with linear regression controlling for sex:
$$y_i = \alpha + s_x(\mathbf{q}_{x,i}; \boldsymbol{\eta}_x) + \gamma u_i + \epsilon_i$$
Exposure-response assumed linear.
Lag-response modelled with spline or decay functions.
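One way to picture a decay lag-response in this setting is the base-R sketch below. The simulated trial, the dose levels, the exponential form exp(-l/tau) and the grid search over tau are all assumptions made for illustration; they are not the data or the estimation method used in the talk.

```r
set.seed(4)
n <- 50; days <- 28

# Each subject is treated in two of the four weeks, each with a random dose
dose_day <- t(replicate(n, {
  wk <- rep(0, 4)
  wk[sample(4, 2)] <- sample(c(10, 25, 50, 100), 2, replace = TRUE)
  rep(wk, each = 7)                      # daily doses for days 1..28
}))

# Exposure history at day 28: column 1 is lag 0, column 28 is lag 27
Q <- dose_day[, days:1]
lags <- 0:(days - 1)

# Simulated outcome from an exponential decay with scale 5 days (assumed)
sex <- rbinom(n, 1, 0.5)
y <- drop(Q %*% (0.01 * exp(-lags / 5))) + 0.5 * sex + rnorm(n, sd = 0.5)

# Profile the decay scale tau on a grid; for fixed tau the model is linear
taus <- seq(1, 15, by = 0.5)
rss <- sapply(taus, function(tau) {
  x_w <- drop(Q %*% exp(-lags / tau))    # decay-weighted cumulative dose
  sum(resid(lm(y ~ x_w + sex))^2)
})
tau_hat <- taus[which.min(rss)]

# Refit at the selected tau
x_hat <- drop(Q %*% exp(-lags / tau_hat))
fit <- lm(y ~ x_hat + sex)
coef(fit)
```

A decay function uses a single shape parameter for the lag dimension, in contrast with a spline, which spends several coefficients to allow more flexible lag-response shapes.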

Exposure-lag-response: linear-by-spline
[Figure: 3-D surface of effect by dose and lag (days)]

Lag-response: spline function
[Figure: effect at dose 60 against lag (days)]

Lag-response: decay function
[Figure: effect at dose 60 against lag (days)]

Software implementation
The framework is fully implemented in the R package dlnm, available from CRAN (Gasparrini JSS 2011).
The package contains a new vignette focusing on applications beyond time series data.

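The R package dlnm
Example of code:
    library(dlnm)
    library(survival)   # provides Surv() and coxph()
    # Cross-basis for the exposure histories in q, over lags 2 to 40 years
    cb <- crossbasis(q, lag=c(2,40),
      argvar=list(fun="bs", degree=2, knots=59.4),
      arglag=list(fun="bs", degree=2, knots=13.3, intercept=FALSE))
    # Cox model on the age time axis, adjusting for smoking and calendar time
    model <- coxph(Surv(agest,ageexit,ind) ~ cb + smoke + caltime, data)
    # Predictions at 0 to 250 WLM/year; in current dlnm versions the
    # reference value (cen=0) is set here rather than in argvar
    pred <- crosspred(cb, model, at=0:25*10, cen=0)
    plot(pred, "3d", xlab="WLM/year", ylab="Lag (years)", zlab="RR")
    plot(pred, var=100, xlab="Lag (years)", ylab="RR")
    plot(pred, lag=15, xlab="WLM/year", ylab="RR")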

Simulations
[Figure: three simulated exposure-lag-response scenarios (linear-constant, plateau-decay, exponential-peak), each shown as an HR surface over exposure and lag, with lag-response and exposure-response curves comparing the true association, the AIC average and AIC samples]

Penalized DLNMs
Currently, the bi-dimensional exposure-lag-response function $f \cdot w(x, \ell)$ is specified using completely parametric methods.
However, simple DLMs have also been proposed in Bayesian (Welty 2008) or penalized versions (Zanobetti 2000, Rushworth 2013, Obermeier 2015).
An obvious extension is to develop a semi-parametric version of DLNMs through penalized splines.
The development may be facilitated by embedding the R package mgcv in dlnm, exploiting the existing GAM implementation.

Interactions in DLNMs
Interactions in DLNMs would allow the exposure-lag-response association to vary depending on the value of other predictors (see also Rushworth 2013).
This corresponds to relaxing the assumption of identical effects.
This development extends the framework to a wide range of new applications.
However, it entails non-trivial methodological problems.

Time-varying DLNMs
[Figure: four panels of RR against summer temperature percentile, for Japan (1985 and 2012), Spain (1990 and 2010), UK (1993 and 2006) and USA (1985 and 2006)]

Some advantages
DLNMs offer a flexible way to model exposure-lag-response associations.
Unified framework based on a general conceptual and statistical definition, applicable in various study designs.
Complete software implementation: models can be fitted with standard regression routines.

Some limitations
The DLNM framework is only applicable to time-varying (non-constant) exposures.
It requires the availability of exposure histories (possibly reconstructed).
Model selection procedures are still under-developed.

Main references
Gasparrini A. Modeling exposure-lag-response associations with distributed lag non-linear models. Statistics in Medicine. 2014;33(5):881-899.
Gasparrini A & Armstrong B. The R package dlnm. http://cran.r-project.org/web/packages/dlnm/index.html
E-mail: antonio.gasparrini@lshtm.ac.uk

Other references (I)
Abrahamowicz et al (2006). Modeling cumulative dose and exposure duration provided insights regarding the associations between benzodiazepines and injuries. Journal of Clinical Epidemiology, 59(4):393-403.
Abrahamowicz et al (2012). Comparison of alternative models for linking drug exposure with adverse effects. Statistics in Medicine, 31:1014-1030.
Almon S (1965). The distributed lag between capital appropriations and expenditures. Econometrica, 33(1):178-196.
Armstrong (2006). Models for the relationship between ambient temperature and daily mortality. Epidemiology, 17(6):624-631.
Berhane et al (2008). Using tensor product splines in modeling exposure-time-response relationships: application to the Colorado Plateau Uranium Miners cohort. Statistics in Medicine, 27(26):5484-96.
Heaton et al (2014). Extending distributed lag models to higher degrees. Biostatistics, 15(2):398-412.
Gasparrini et al (2010). Distributed lag non-linear models. Statistics in Medicine, 29(21):2224-2234.
Gasparrini (2011). Distributed lag linear and non-linear models in R: the package dlnm. Journal of Statistical Software, 43(8):1-20.
Hauptmann et al (2000). Analysis of exposure-time-response relationships using a spline weight function. Biometrics, 56(4):1105-8.
Langholz et al (1999). Latency analysis in epidemiologic studies of occupational exposures: application to the Colorado Plateau uranium miners cohort. American Journal of Industrial Medicine, 35(3):246-56.

Other references (II)
Leffondre et al (2002). Modeling smoking history: a comparison of different approaches. American Journal of Epidemiology, 156(9):813.
Obermeier et al (2015). Flexible distributed lag models and their application to geophysical data. Journal of the Royal Statistical Society: Series B, ahead of print.
Richardson (2009). Latency models for analyses of protracted exposures. Epidemiology, 20:395-399.
Rushworth et al (2013). Distributed lag models for hydrological data. Biometrics, 69:537-544.
Schwartz (2000). The distributed lag between air pollution and daily deaths. Epidemiology, 11(3):320-326.
Sylvestre & Abrahamowicz (2009). Flexible modeling of the cumulative effects of time-dependent exposures on the hazard. Statistics in Medicine, 28(27):3437-53.
Thomas (1983). Statistical methods for analyzing effects of temporal patterns of exposure on cancer risks. Scand J Work Environ Health, 9(4):353-366.
Thomas (1988). Models for exposure-time-response relationships with applications to cancer epidemiology. Annual Review of Public Health, 9:451-82.
Welty et al (2008). Bayesian distributed lag models: estimating effects of particulate matter air pollution on daily mortality. Biometrics, 65:282-291.
Whitaker et al (2006). Tutorial in biostatistics: The self-controlled case series method. Statistics in Medicine, 25:1768-1797.