Global Sensitivity Analysis for Repeated Measures Studies with Informative Drop-out: A Semi-Parametric Approach

Similar documents
36 th Annual Meeting Preconference Workshop P3 Handout

Global Sensitivity Analysis of Randomized Trials with Missing Data

Global Sensitivity Analysis of Randomized Trials with Missing Data

Global Sensitivity Analysis for Repeated Measures Studies with. Informative Drop-out: A Semi-Parametric Approach

Global Sensitivity Analysis for Repeated Measures Studies with Informative Drop-Out: A Semi-Parametric Approach

Global Sensitivity Analysis for Repeated Measures Studies with. Informative Drop-out: A Fully Parametric Approach

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A discrete semi-markov model for the effect of need-based treatments on the disease states Ekkehard Glimm Novartis Pharma AG

Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal

Whether to use MMRM as primary estimand.

Methods for Handling Missing Data

Bounds on Causal Effects in Three-Arm Trials with Non-compliance. Jing Cheng Dylan Small

Personalized Treatment Selection Based on Randomized Clinical Trials. Tianxi Cai Department of Biostatistics Harvard School of Public Health

7 Sensitivity Analysis

On the Analysis of Tuberculosis Studies with Intermittent Missing Sputum Data

A Sampling of IMPACT Research:

Extending causal inferences from a randomized trial to a target population

Selection on Observables: Propensity Score Matching.

Causal Inference Basics

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Targeted Maximum Likelihood Estimation in Safety Analysis

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Bootstrapping Sensitivity Analysis

Case Study in the Use of Bayesian Hierarchical Modeling and Simulation for Design and Analysis of a Clinical Trial

6 Pattern Mixture Models

Semiparametric Generalized Linear Models

On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Multi-state Models: An Overview

Lecture 9: Learning Optimal Dynamic Treatment Regimes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Sample Size Determination

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Robustness to Parametric Assumptions in Missing Data Models

Analysing longitudinal data when the visit times are informative

A Clinical Trial Simulation System, Its Applications, and Future Challenges. Peter Westfall, Texas Tech University Kuenhi Tsai, Merck Research Lab

Individualized Treatment Effects with Censored Data via Nonparametric Accelerated Failure Time Models

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

14.30 Introduction to Statistical Methods in Economics Spring 2009

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

2 Naïve Methods. 2.1 Complete or available case analysis

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Missing Data Issues in the Studies of Neurodegenerative Disorders: the Methodology

University of California, Berkeley

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Semi-Nonparametric Inferences for Massive Data

Gov 2002: 5. Matching

Group Sequential Designs: Theory, Computation and Optimisation

The Nonparametric Bootstrap

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Machine Learning Linear Classification. Prof. Matteo Matteucci

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Tutorial 3: Power and Sample Size for the Two-sample t-test with Equal Variances. Acknowledgements:

Classification. Chapter Introduction. 6.2 The Bayes classifier

Longitudinal analysis of ordinal data

Lecture 7 Time-dependent Covariates in Cox Regression

Bayesian Linear Regression

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Comparative effectiveness of dynamic treatment regimes

Use of Matching Methods for Causal Inference in Experimental and Observational Studies. This Talk Draws on the Following Papers:

Does low participation in cohort studies induce bias? Additional material

Sample Size and Power Considerations for Longitudinal Studies

Estimation of Treatment Effects under Essential Heterogeneity

Plausible Values for Latent Variables Using Mplus

Implementing Response-Adaptive Randomization in Multi-Armed Survival Trials

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Investigating mediation when counterfactuals are not metaphysical: Does sunlight exposure mediate the effect of eye-glasses on cataracts?

Data Integration for Big Data Analysis for finite population inference

Introduction to mtm: An R Package for Marginalized Transition Models

Combining multiple observational data sources to estimate causal eects

Imputation Algorithm Using Copulas

Introduction An approximated EM algorithm Simulation studies Discussion

Econ 582 Nonparametric Regression

Analysis of Incomplete Non-Normal Longitudinal Lipid Data

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Dose-response modeling with bivariate binary data under model uncertainty

Basics of Modern Missing Data Analysis

Bias Variance Trade-off

Comparing two independent samples

Estimating Causal Effects of Organ Transplantation Treatment Regimes

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

P Values and Nuisance Parameters

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

BMA-Mod : A Bayesian Model Averaging Strategy for Determining Dose-Response Relationships in the Presence of Model Uncertainty

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Introduction to Crossover Trials

University of California, Berkeley

Introduction. Chapter 1

Latent Variable Model for Weight Gain Prevention Data with Informative Intermittent Missingness

BIOSTATISTICAL METHODS

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

Flexible mediation analysis in the presence of non-linear relations: beyond the mediation formula.

Adaptive Trial Designs

Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models

Accepted Manuscript. Comparing different ways of calculating sample size for two independent means: A worked example

BLINDED EVALUATIONS OF EFFECT SIZES IN CLINICAL TRIALS: COMPARISONS BETWEEN BAYESIAN AND EM ANALYSES

Transcription:

Global for Repeated Measures Studies with Informative Drop-out: A Semi-Parametric Approach Daniel Aidan McDermott Ivan Diaz Johns Hopkins University Ibrahim Turkoz Janssen Research and Development September 18, 2014

Andrei Yakovlev Colloquium Dr. Jack Hall said that Dr. Yakovlev enjoyed discovering major flaws in widely used methodology and creating innovative methods to overcome them.

Focus Restrict consideration to follow-up randomized study designs that prescribe that measurements of an outcome of interest are to be taken on each study participant at fixed time-points. Focus on monotone missing data pattern Consider the case where interest is focused on a comparison of treatment arm means at the last scheduled visit.

Asssumptions Inference about the treatment arm means requires two types of assumptions: (i) unverifiable assumptions about the distribution of outcomes among those with missing data and (ii) additional testable assumptions that serve to increase the efficiency of estimation.

Type (i) assumptions are necessary to identify the treatment-specific means. Since type (i) assumptions are not testable, it is essential to conduct a sensitivity analysis, whereby the data analysis is repeated under different type (i) assumptions. There are an infinite number of ways of positing type (i) assumptions. Ultimately, these assumptions prescribe how missing outcomes should be imputed.

Types of Ad-hoc Try a bunch of different methods. Local Explore sensitivity in a small neighborhood around a benchmark assumption. Global Explore sensitivity in a much larger neighborhood around a benchmark assumption.

Notation K scheduled post-baseline assessments. There are (K + 1) patterns representing each of the visits an individual might last be seen, i.e., 0,..., K. The (K + 1)st pattern represents individuals who complete the study. Let Y k be the outcome scheduled to be measured at visit k, with visit 0 denoting the baseline measure (assumed to be observed). Let Y k = (Y 0,..., Y k ) and Y + k = (Y k+1,..., Y K ).

Notation Let R k be the indicator of being on study at visit k R 0 = 1; R k = 1 implies that R k 1 = 1. Let C be the last visit that the patient is on-study. We focus inference separately for each treatment arm. The observed data for an individual is O = (C, Y C ). We want to estimate µ = E[Y K ].

Benchmark Assumption: Missing at Random R k+1 Y + k Type (i) Assumption R k = 1, Y k

Class of Type (i) Assumptions For k = 0,..., K 1, logit P[R k+1 = 0 R k = 1, Y K ] = h k(y k ) + αr(y k+1) where h k (Y k ) = logit P[R k+1 = 0 R k = 1, Y k ] log{e[exp(αr(y k+1 )) R k = 1, Y k ]} r(y k+1 ) is a specified increasing function of Y k+1 α is a sensitivity analysis parameter.

Class of Type (i) Assumptions α = 0 is Missing at Random α quantifies the influence of Y k+1 on the decision to drop-out before visit k + 1, among those on study at visit k with observed history Y k.

Identification Formula [ ] µ(p R K Y K ) = E K 1 k=0 (1 + exp(h k(y k ) + αr(y k+1))) 1 where P is the true distribution of the observed data, characterized by P[R k+1 = 0 R k = 1, Y k ] f (Y k+1 R k+1 = 1, Y k ) and f (Y 0) These conditional distributions can t be estimated at fast enough rates so a plug-in estimator of µ will converge at n rates.

Type (ii) Assumptions First-order Markov assumptions: P[R k+1 = 0 R k = 1, Y k ] = P[R k+1 = 0 R k = 1, Y k ] and f (Y k+1 R k+1 = 1, Y k ) = f (Y k+1 R k+1 = 1, Y k ) Non-parametric smoothing with respect to the covariate Y k using a Gaussian kernel. Estimate optimal smoothing parameters using a weighted squared-error loss function and 10-fold cross validation.

Estimation of µ Plug-in estimator, µ( P), can suffer from non-standard asymptotics. To correct this problem, we use a one-step estimator: plug-in + average of estimated influence function

Aside Consider a parametric submodel indexed by a finite dimensional parameter, say θ, that passes through P P. A parametric submodel is a collection of distributions {P θ : θ Θ} P where, WLOG, P θ=0 = P. An asymptotically linear estimator of µ(p) with (mean zero) influence function, ψ P (O), will be regular at P if and only if, for all parametric submodels, µ(p θ ) θ θ=0 = E P [ψ P (O)S θ (O)] (1) where S θ (O) = log dp θ θ θ=0.

Aside This implies that µ(p θ ) µ(p) = E Pθ [ψ P (O)] + O( θ 2 ) (2) for all parametric submodels. This implies that µ(q) µ(p) = E Q [ψ P (O)] + O( Q P 2 ), (3) where Q is some other distribution in P.

Aside With P = P, and Q = P, (3) becomes µ( P) µ = E P [ψ P(O)] + O P ( P P 2 ) (4) Adding and subtracting terms, we obtain µ( P) µ = E n [ψ P (O)] E n [ψ P(O)] + {ψ P(o) ψ P (o)}{dp n (o) dp (o)} + O P ( P P 2 )

Aside Assuming P P 2 = o P (n 1/2 ) and additional regularity conditions, µ( P) µ = E n [ψ P (O)] E n [ψ P(O)] + o P (n 1/2 ) Consider the one-step estimator Then µ = µ( P) + E n [ψ P(O)] n( µ µ ) = 1 n n ψ P (O i ) + o P (1) That is, µ is asymptotically linear with influence function ψ P (O). i=1

Aside If no testable restrictions are placed on P, then ψ P (O) satisfying (1) will be unique: ψ np P (O). If testable restrictions are placed on P, then ψ P (O) satisfying (1) will not generally be unique. The influence function that yields the smallest asymptotic variance, ψ sp P (O), is the projection of ψ np P (O) onto the tangent space of the model P. The tangent space of a parametric submodel passing through P P is a space of random variables that can be expressed as linear combinations of the components of S θ (O). The tangent space of the model P is the smallest, closed space that contains all the parametric submodel tangent spaces.

Uncertainty An influence function-based 95% confidence interval takes the form µ ± 1.96ŝe( µ), where ŝe( µ) = E n [ψ sp P (O)2 ]/n. In studentized bootstrap, the confidence interval takes the form [ µ + t 0.025 { ŝe( µ), µ + t 0.975 ŝe( µ)], } where t q is the µ qth quantile of (b) µ : b = 1,..., B and ŝe( µ (b) ) ŝe( µ (b) )

Uncertainty - Double Bootstrap For the bth bootstrapped dataset, n observed patient records are repeatedly re-sampled with replacement to create S new datasets. For each of these datasets the entire estimation procedure is executed to obtain parameter estimates { µ (b,s) : s = 1,..., S}. Let t (b) q to be qth quantile of Solve for q such that 1 B B b=1 { µ (b,s) µ (b) ŝe( µ (b,s) ) I ( µ [ µ (b) + t q (b) ŝe( µ (b) ), µ (b) + t (b) } : s = 1,..., S 1 qŝe( µ(b) )]) 0.95 is minimized; denote the solution by q. The 95% double bootstrap confidence interval takes the form [ µ + t q ŝe( µ), µ + t 1 q ŝe( µ)].

Uncertainty - Fast Double Bootstrap The drawback of double bootstrap is that it is computationally intensive. To address this issue, set S = 1 and defined t (b) above to be qth quantile of { µ (b,1) µ (b) ŝe( µ (b,1) ) q = t q : b = 1,..., B }.

Case Study: SCA-3004 Randomized trial designed to evaluate the efficacy and safety of once-monthly, injectable paliperidone palmitate (PP1M) relative to placebo (PBO) in delaying the time to relapse in subjects with schizoaffective disorder. Open-label phase consisting of a flexible-dose, lead-in period and a fixed-dose, stabilization period. Stable subjects entered a 15-month relapse-prevention phase and were randomized to receive PP1M or placebo injections at baseline (Visit 0) and every 28 days (Visits 1-15). Additional clinic visit (Visit 16) scheduled for 28 days after the last scheduled injection. 170 and 164 subjects were randomized to the PBO and PP1M arms.

Case Study: SCA-3004 Research question: Are functional outcomes better in patients with schizoaffective disorder better maintained if they continue on treatment or are withdrawn from treatment and given placebo instead? An ideal study would follow all randomized subjects through Visit 16 while maintaining them on their randomized treatment and examine symptomatic and functional outcomes at that time point. Since clinical relapse can have a major negative impact, the study design required that patients who had signs of relapse were discontinued from the study. In addition, some patients discontinued due to adverse events, withdrew consent or were lost to follow-up. 38% and 60% of patients in the PBO and PP1M arms were followed through Visit 16 (p=0.0001).

Case Study: SCA-3004 Cumulative Probability 0.0 0.2 0.4 0.6 0.8 1.0 PP1M Placebo 0 5 10 15 Last Visit On Study

Case Study: SCA-3004 Focus: Patient function as measured by the Personal and Social Performance (PSP) scale. The PSP scale is scored from 1 to 100 with higher scores indicating better functioning based on evaluation of 4 domains (socially useful activities, personal/social relationships, self-care, and disturbing/aggressive behaviors). Estimate treatment-specific mean PSP at Visit 16 in the counterfactual world in which all patients who are followed to Visit 16. The mean PSP score among completers was 76.05 and 76.96 in the PBO and PP1M arms; the estimated difference is -0.91 (95%: -3.98:2.15).

Case Study: SCA-3004 L n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0 3 70.7 1 8 67.6 65.3 2 8 76.3 74.3 60.6 3 9 70.9 71.8 68.7 58.1 4 5 75.2 75.2 75.6 67.6 64.6 5 8 74.8 77.3 75.1 76.4 78.9 74.9 6 3 72.7 74.7 73.7 73.0 74.3 73.0 68.7 7 2 72.0 68.5 68.5 71.0 72.5 72.5 72.5 68.5 8 2 80.5 79.5 74.0 73.0 71.5 72.0 72.5 71.5 63.5 9 4 69.8 69.0 70.3 71.8 73.3 72.8 71.8 73.5 70.5 59.8 10 4 74.3 71.8 73.3 72.5 73.5 74.0 73.8 78.0 78.0 78.0 67.3 11 2 72.0 71.0 70.0 71.5 69.5 72.0 75.0 71.0 72.5 76.5 75.5 74.0 12 4 76.5 78.0 72.8 74.5 74.0 74.0 74.5 77.5 76.8 76.3 75.5 78.3 72.0 15 4 69.8 70.8 70.0 69.8 70.8 72.8 71.5 72.0 68.0 67.3 67.0 68.3 68.0 66.0 67.0 70.3 16 98 73.0 73.8 73.7 74.4 74.9 75.3 74.9 75.0 75.5 75.9 76.3 76.6 76.8 76.8 76.6 77.0 77.0 T 164 72.9 73.3 72.5 72.9 74.3 74.8 74.4 74.8 74.9 75.1 75.6 76.3 76.3 76.3 76.2 76.7 77.0

Case Study: SCA-3004 L n 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 0 2 67.5 1 12 68.3 60.2 2 12 67.3 66.0 57.4 3 15 67.2 67.7 68.1 60.1 4 6 73.3 75.7 75.0 79.7 63.7 5 14 69.9 72.3 72.2 72.1 71.9 60.9 6 7 70.9 71.6 69.4 68.6 70.0 70.7 65.7 7 6 69.7 71.5 70.8 68.8 69.8 71.5 72.0 59.7 8 6 79.0 80.0 80.7 80.5 79.5 79.2 78.2 79.3 74.8 9 9 72.3 73.4 72.9 73.2 74.3 74.0 73.2 72.6 74.1 58.0 10 3 73.3 75.0 75.7 75.7 80.0 79.7 80.0 72.7 75.7 76.3 52.3 11 4 72.5 71.0 71.0 68.5 70.0 68.8 70.5 72.5 70.3 67.8 64.3 60.8 12 1 62.0 62.0 62.0 62.0 62.0 62.0 62.0 62.0 62.0 62.0 63.0 63.0 62.0 13 3 81.7 75.0 73.7 78.3 76.3 75.7 77.7 71.3 78.3 78.7 77.3 73.0 70.0 55.3 14 2 77.0 79.5 74.5 76.5 80.0 74.0 81.0 81.0 82.5 77.0 81.5 75.5 74.5 75.0 65.0 15 3 65.7 65.7 65.3 66.0 66.3 66.0 67.0 68.0 67.3 67.3 68.7 70.0 68.7 68.7 67.3 65.3 16 65 72.1 73.0 73.2 73.3 73.2 73.3 73.5 74.3 74.6 75.3 75.3 75.3 76.0 76.3 76.0 76.5 76.0 T 170 71.1 71.2 71.3 71.8 72.7 71.8 73.2 73.2 74.4 73.0 73.7 74.1 75.3 75.1 75.3 76.0 76.0

Case Study: SCA-3004 Conditional probability of dropout (simulated data) 0.10 0.08 0.06 0.04 0.02 0.00 Placebo arm Active arm 0.00 0.02 0.04 0.06 0.08 0.10 Conditional probability of dropout (actual data)

Case Study: SCA-3004 0.04 Kolmogorov-Smirnov Statistic 0.03 0.02 0.01 0.00 0 5 10 15 visit Placebo Active

Case Study: SCA-3004 Under MAR (i.e., α = 0), the estimated means of interest are 69.60 and 74.37 for the PBO and PP1M arms. The estimated treatment difference is 4.77 (95% CI: -10.89 to 0.09).

Case Study: SCA-3004 r(y) 0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100 y

Case Study: SCA-3004 y k+1 yk+1 Log Odds Ratio 30 20 α 0.01 40 30 α 0.18 50 40 α 0.40 60 50 α 0.30 70 60 α 0.09 80 700 α 0.01

Case Study: SCA-3004 80 PP1M Estimated PSP score at last visit 80 Placebo Estimated PSP score at last visit 75 75 70 70 PSP 65 PSP 65 60 60 55 bias corrected one-step 55 bias corrected one-step one step one step 50 50-10 -5 0 5 10-10 -5 0 5 10 α α

Case Study: SCA-3004

Simulation Study PP1M PBO α Estimator µ Bias MSE µ Bias MSE -10 µ( P) 73.64 0.43 1.41 69.06 2.04 7.47 µ 0.33 1.29 1.53 6.47 µ bc 0.07 2.28 0.55 9.03-5 µ( P) 74.25 0.29 1.17 70.23 1.55 5.12 µ 0.19 1.08 1.13 4.54 µ bc -0.00 1.98 0.38 6.86-1 µ( P) 74.59 0.20 1.04 71.47 0.94 3.05 µ 0.09 0.96 0.59 2.84 µ bc -0.07 1.82 0.08 4.98 0 µ( P) 74.63 0.19 1.03 71.70 0.82 2.75 µ 0.08 0.95 0.50 2.61 µ bc -0.07 1.82 0.04 4.68 1 µ( P) 74.67 0.18 1.01 71.90 0.72 2.52 µ 0.07 0.94 0.43 2.42 µ bc -0.07 1.79 0.01 4.44 5 µ( P) 74.77 0.16 0.99 72.41 0.48 2.04 µ 0.06 0.92 0.27 2.03 µ bc -0.07 1.75-0.03 3.87 10 µ( P) 74.84 0.15 0.97 72.74 0.34 1.80 µ 0.06 0.91 0.20 1.82 µ bc -0.06 1.73-0.01 3.54

Simulation Study PP1M PBO α Procedure Coverage Coverage -10 IF 88.6% 65.8% SB 93.6% 90.8% FDB 94.3% 93.9% -5 IF 91.3% 72.3% SB 94.2% 91.4% FDB 94.6% 93.9% -1 IF 92.7% 81.6% SB 94.4% 92.2% FDB 94.8% 94.1% 0 IF 92.8% 83.1% SB 94.4% 92.6% FDB 94.8% 94.2% 1 IF 92.9% 84.2% SB 94.5% 92.8% FDB 94.9% 94.1% 5 IF 93.0% 87.0% SB 94.6% 93.5% FDB 94.7% 94.6% 10 IF 93.1% 88.7% SB 94.6% 94.1% FDB 94.8% 94.8%

Software - SAMON www.missingdatamatters.org

Discussion Among patients on study at visit k with observed history Y k, our model does not allow unmeasured predictors of R k+1 and Y k+1. logit P[R k+1 = 0 R k = 1, Y k, Y K] = h k (Y k ) + αr(y K) Incorporate auxiliary covariates. Intermittent missing data.