Multiple imputation to account for measurement error in marginal structural models

Similar documents
An Empirical Comparison of Multiple Imputation Approaches for Treating Missing Data in Observational Studies

Does low participation in cohort studies induce bias? Additional material

Lecture 7 Time-dependent Covariates in Cox Regression

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

IP WEIGHTING AND MARGINAL STRUCTURAL MODELS (CHAPTER 12) BIOS IPW and MSM

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

Federated analyses. technical, statistical and human challenges

Logistic regression model for survival time analysis using time-varying coefficients

MAS3301 / MAS8311 Biostatistics Part II: Survival

Personalized Treatment Selection Based on Randomized Clinical Trials. Tianxi Cai Department of Biostatistics Harvard School of Public Health

Ignoring the matching variables in cohort studies - when is it valid, and why?

Multi-state Models: An Overview

Machine Learning Linear Classification. Prof. Matteo Matteucci

SAS Analysis Examples Replication C11

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

Extensions of Cox Model for Non-Proportional Hazards Purpose

Analysis of competing risks data and simulation of data following predened subdistribution hazards

The MIANALYZE Procedure (Chapter)

Marginal Structural Cox Model for Survival Data with Treatment-Confounder Feedback

Known unknowns : using multiple imputation to fill in the blanks for missing data

MISSING or INCOMPLETE DATA

TMA 4275 Lifetime Analysis June 2004 Solution

Biostatistics Advanced Methods in Biostatistics IV

Package threg. August 10, 2015

A Survival Analysis of GMO vs Non-GMO Corn Hybrid Persistence Using Simulated Time Dependent Covariates in SAS

GEE for Longitudinal Data - Chapter 8

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

Survival Analysis with Time- Dependent Covariates: A Practical Example. October 28, 2016 SAS Health Users Group Maria Eberg

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Extending causal inferences from a randomized trial to a target population

Survival Regression Models

Advantages of Mixed-effects Regression Models (MRM; aka multilevel, hierarchical linear, linear mixed models) 1. MRM explicitly models individual

Standardization methods have been used in epidemiology. Marginal Structural Models as a Tool for Standardization ORIGINAL ARTICLE

Effects of multiple interventions

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

Case-control studies

On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data

Section 9c. Propensity scores. Controlling for bias & confounding in observational studies

Robustifying Trial-Derived Treatment Rules to a Target Population

Multiple imputation in Cox regression when there are time-varying effects of covariates

One-stage dose-response meta-analysis

First Aid Kit for Survival. Hypoxia cohort. Goal. DFS=Clinical + Marker 1/21/2015. Two analyses to exemplify some concepts of survival techniques

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.

β j = coefficient of x j in the model; β = ( β1, β2,

Missing Data in Longitudinal Studies: Mixed-effects Pattern-Mixture and Selection Models

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Lab 8. Matched Case Control Studies

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013

SAS/STAT 13.1 User s Guide. The MIANALYZE Procedure

A fast routine for fitting Cox models with time varying effects

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Equivalence of random-effects and conditional likelihoods for matched case-control studies

More Statistics tutorial at Logistic Regression and the new:

Growth Mixture Model

Generating survival data for fitting marginal structural Cox models using Stata Stata Conference in San Diego, California

Master s Written Examination - Solution

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

General Regression Model

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

MAS3301 / MAS8311 Biostatistics Part II: Survival

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

TGDR: An Introduction

Beyond GLM and likelihood

Some Relatively Simple Ways to Get Good Answers

arm, we estimate the time to initial suppression using the nonparametric Kaplan Meier

A note on R 2 measures for Poisson and logistic regression models when both models are applicable

arxiv: v5 [stat.me] 13 Feb 2018

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

An Introduction to Causal Inference, with Extensions to Longitudinal Data

Digital Southern. Georgia Southern University. Varadan Sevilimedu Georgia Southern University. Fall 2017

PubH 7405: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION

Statistical Methods for Alzheimer s Disease Studies

Introduction to Statistical Analysis

A discrete semi-markov model for the effect of need-based treatments on the disease states Ekkehard Glimm Novartis Pharma AG

University of California, Berkeley

PASS Sample Size Software. Poisson Regression

Causal Modeling in Environmental Epidemiology. Joel Schwartz Harvard University

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Statistics 262: Intermediate Biostatistics Model selection

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

Extensions of Cox Model for Non-Proportional Hazards Purpose

Lecture 01: Introduction

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

Part IV Statistics in Epidemiology

Online supplement. Absolute Value of Lung Function (FEV 1 or FVC) Explains the Sex Difference in. Breathlessness in the General Population

Survival Analysis. Stat 526. April 13, 2018

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Logistic regression and linear classifiers COMS 4771

Machine Learning for Signal Processing Bayes Classification

ANALYSIS OF CORRELATED DATA SAMPLING FROM CLUSTERS CLUSTER-RANDOMIZED TRIALS

STA6938-Logistic Regression Model

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

Time-varying proportional odds model for mega-analysis of clustered event times

Building a Prognostic Biomarker

Simple logistic regression

Transcription:

Multiple imputation to account for measurement error in marginal structural models Supplementary material A. Standard marginal structural model We estimate the parameters of the marginal structural model in equation 2 using a 2-step process: First, we estimate inverse probability of exposure weights W i (t) and then we maximize the weighted partial likelihood N exp{θ 1 s + θ 2 a(y i ) + θ 3 s a(y i )} L(θ) = { N I(y j y i ) exp{θ 1 s + θ 2 a(y i ) + θ 3 s a(y i )} w j (y i ) i=1 j=1 δ i w i (y i ) } (3) where I(y j y i ) is an indicator of patient j being in the risk set at time y i. Weights in the standard analysis were estimated for each month as W (t) = W S W A(S )(t), or the product of time-fixed stabilized inverse-probability of (providerreported) smoking weights and time-varying stabilized inverse-probability of treatment weights. Specifically, the smoking weight for each individual was estimated as W S = f(s ) f(s L), (4) where L is a vector time-fixed covariates including age, sex, race, ethnicity, and year, HIV transmission risk factor (indicators of being a man who has sex with men and injection drug use), CD4 cell count, and viral load at study entry. Numerator and denominator were estimated using logistic regression. Each patient s weight for therapy initiation was the inverse probability receiving the treatment history that patient received by time t, where time is measured in months since

study entry for each patient. Stabilized weights for therapy initiation were estimated for each patient at each time t in the standard analysis as t W A(S ) (t) = f{a(k) A (k 1), S } f{a(k) A (k 1), L, Z (k), S, (5) } k=0 where A (k 1) represents treatment history, L represents the time-fixed covariates described above, Z (k) represents the history of a vector of time-varying covariates, including CD4 cell count and an indicator of detectable viral load (>500 copies/ml). To summarize treatment history, we let A (k 1) = 1 if the patient had initiated treatment through time k 1, and A (k 1) = 0 otherwise. Covariate history was summarized as the value of each covariate at time k. Note that the treatment weights include the patient s error-prone smoking status S. The numerator and denominator of the treatment weights were estimated using pooled logistic regression. Continuous variables in both sets of weights were modeled using restricted quadratic splines with 4 knots placed at the 5 th, 35 th, 65 rd, and 95 th percentiles 20.

B. SAS code to implement multiple imputation to account for measurement error in an inverse-probability-weighted marginal structural model The SAS code below is adapted from Cole et al (2006) 25. /********************notation****************** p is a dataset with one record per person and the following variables: x = true exposure available only for people in validation study w = misclassified exposure (available for all participants) delta = outcome indicator y = time from study entry to first of outcome or censoring logy = ln(y) z = confounder wd = w * delta dlogy = delta*logy **********************************************/ %let m=20; data m; set _null_; run; proc logistic data=p descending covout outest=b(keep=_name_ intercept w delta wd logy z dlogy) noprint; model x = w d wd logy z dlogy; data bb; set b; if _name_="x"; bh0=intercept; bh1=w; bh2=delta; bh3=wd; bh4=logy; bh5=z; bh6=dlogy; keep bh0-bh6; data cov; set b; if name_^="x"; keep intercept w delta wd logy z dlogy; proc iml; use cov; read all into cov; *variance-covariance matrix; use bb; read all into mu; mu=mu`;*means; v=nrow(cov); *number of variables; n=&m; *number of imputations; seed=222; l=t(root(cov)); *cholesky root of cov matrix; z=normal(j(v,n,seed)); *generate nvars*samplesize normals x=l*z; *premultiply by cholesky root; x=repeat(mu,1,n)+x; *add in the means; tx=t(x); create m from tx; *write out sample data to sas dataset; append from tx; quit; data m ;set m; array col {7} col1-col7; array b {7} b0-b6; retain _imputation_ 0; _imputation_=_imputation_+1; do u=1 to 7; b[u]=col[u]; end; drop col1-col7 u; run; data aa; set p ; do _imputation_=1 to &m; output; end; run; proc sort data=m; by _imputation_; run; proc sort data=aa; by _imputation_; run; data mi; merge aa m; by _imputation_;

data mi; set mi; call streaminit(320); if x=. then x=rand("bernoulli",1/(1+exp(-(b0+b1*w+b2*delta+b3*wd+ b4*logy+b5*z+b6*dlogy)))); run; proc logistic data=mi desc noprint; by _imputation_; model x= ; output out=num(keep=_imputation_ j n) p=n; proc logistic data=mi desc noprint; by _imputation_; model x = z; output out=den(keep=_imputation_ j d) p=d; data w; merge mi num den; by _imputation_ id; num=(n*(x=1)+(1-n)*(x=0)); den=(d*(x=1)+(1-d)*(x=0)); sw=num/den; run; proc phreg data=w covs covout outest=imp noprint; by _imputation_; model y*delta(0)=x; weight sw; proc mianalyze data=imp; modeleffects x; run;

C. Simulation details In each scenario, let i index hypothetical patients from 1 to 2,000 in each of 2,000 simulated cohorts. Measured confounder Z = 1 for 30% of patients, and patients with Z = 1 were half as likely to be exposed to the time-fixed exposure (X = 1) as patients with Z = 0. The outcome time (T i ) was generated for each patient based on X i and Z i, and time to censoring for each patient (C i ) was generated based on X i. The observed time for each patient was represented by Y i = T i ^ C i. A marginal structural Cox model estimated by inverse probability weighting of the true exposure X yielded a hazard ratio of 1.75. A misclassified version of the exposure, V i, was generated for each patient based on X i and misclassification probabilities reflecting the misclassification probabilities for smoking status in the application described above. When X i = 1, V i was drawn from a Bernoulli distribution with probability equal to the sensitivity, which was set to 70%. When X i = 0, V i was drawn from a Bernoulli distribution with probability 1 specificity, where specificity was 80%. For each patient, R i was an indicator of being included in the validation subgroup. R i was drawn from a Bernoulli distribution with probability of 0.3 to represent a 30% validation subgroup. In each simulated cohort, the standard analysis fit a marginal structural Cox model using inverse probability weighting of exposure V i. Weights were stabilized by the marginal probability of exposure so that they took the form w i = P(V i = v i ) P(V i = v i Z i ). The hazard ratio in the standard analysis was estimated as exp(ζ) in the marginal structural Cox model λ T v(t) = λ 0 (t) exp(ζv), where ζ was obtained by maximizing the weighted partial likelihood given in text as equation 3.

The analysis accounting for exposure measurement error first imputed the gold standard exposure for individuals outside the validation subgroup based on the observed exposure V i and covariate Z i in each of M imputations. Weights were stabilized by the marginal probability of exposure and were estimated in each imputation m as w i m = P(X i m = x i m )/P(X i m = x i m Z i ). The hazard ratio in each imputation was estimated as exp (ψ m ) in the marginal structural Cox model λ m T x(t) = λ m 0 (t)exp(ψ m x m ), where ψ was obtained by maximizing the weighted partial likelihood given in the text as equation 7. The results were combined across imputations using Rubin s rules so that the summary hazard ratio was exp(ψ ) = exp(m 1 M m=1 V(ψ ) = M 1 M m=1 V(ψ m ) + (1 + M 1 )(M 1) 1 M (ψ m ψ ) 2 ψ m ) and the variance was given by m=1. Bias was defined as the average difference between the ln(hr) estimated using the standard marginal structural model or the marginal structural model using multiple imputation to account for measurement error and the true ln(hr), as given by the marginal structural model fit using the true exposure. Coverage was the proportion of simulated cohorts in each scenario in which the 95% confidence intervals included the true HR, and power was the proportion of cohorts in which the 95% confidence interval excluded the null value. The bias-variance tradeoff was evaluated using mean squared error, which was the sum of the squared bias and the squared standard deviation of the bias.