Modeling and Analysis of Recurrent Event Data

Similar documents
Semiparametric Estimation for a Generalized KG Model with R. Model with Recurrent Event Data

Modelling and Analysis of Recurrent Event Data

Recurrent Event Data: Models, Analysis, Efficiency

Time-to-Event Modeling and Statistical Analysis

Modeling and Non- and Semi-Parametric Inference with Recurrent Event Data

Dynamic Models in Reliability and Survival Analysis

Introduction to repairable systems STK4400 Spring 2011

e 4β e 4β + e β ˆβ =0.765

Multistate models and recurrent event models

Multistate models and recurrent event models

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

1 Glivenko-Cantelli type theorems

Estimating Load-Sharing Properties in a Dynamic Reliability System. Paul Kvam, Georgia Tech Edsel A. Peña, University of South Carolina

Lecture 5 Models and methods for recurrent event data

Lecture 22 Survival Analysis: An Introduction

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Lecture 3. Truncation, length-bias and prevalence sampling

Multivariate Survival Analysis

Chapter 2 Inference on Mean Residual Life-Overview

Non- and Semi-parametric Bayesian Inference with Recurrent Events and Coherent Systems Data

Frailty Models and Copulas: Similarities and Differences

Survival Times (in months) Survival Times (in months) Relative Frequency. Relative Frequency

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA


Survival Times (in months) Survival Times (in months) Relative Frequency. Relative Frequency

STAT Sample Problem: General Asymptotic Results

ST495: Survival Analysis: Maximum likelihood

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Multistate Modeling and Applications

Concepts and Tests for Trend in Recurrent Event Processes

Statistical Inference and Methods

A Regression Model For Recurrent Events With Distribution Free Correlation Structure

Frailty Modeling for clustered survival data: a simulation study

Step-Stress Models and Associated Inference

Analysing geoadditive regression data: a mixed model approach

Survival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0,

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

Optimal goodness-of-fit tests for recurrent event data

Models for Multivariate Panel Count Data

From semi- to non-parametric inference in general time scale models

CTDL-Positive Stable Frailty Model

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data

Beyond GLM and likelihood

A general mixed model approach for spatio-temporal regression data

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

Harvard University. Harvard University Biostatistics Working Paper Series

MAS3301 / MAS8311 Biostatistics Part II: Survival

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

FRAILTY MODELS FOR MODELLING HETEROGENEITY

ST745: Survival Analysis: Nonparametric methods

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Censoring mechanisms

Survival Analysis Math 434 Fall 2011

MAS3301 / MAS8311 Biostatistics Part II: Survival

Tests of independence for censored bivariate failure time data

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

On the generalized maximum likelihood estimator of survival function under Koziol Green model

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

,..., θ(2),..., θ(n)

Efficiency of Profile/Partial Likelihood in the Cox Model

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

A generalized Brown-Proschan model for preventive and corrective maintenance. Laurent DOYEN.

Joint Modeling of Longitudinal Item Response Data and Survival

Can we do statistical inference in a non-asymptotic way? 1

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Likelihood Construction, Inference for Parametric Survival Distributions

Figure 36: Respiratory infection versus time for the first 49 children.

Bayesian estimation of the discrepancy with misspecified parametric models

Hacettepe Journal of Mathematics and Statistics Volume 45 (5) (2016), Abstract

Exercises. (a) Prove that m(t) =

Qualifying Exam in Probability and Statistics.

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

Survival Analysis for Case-Cohort Studies

BIOS 312: Precision of Statistical Inference

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

CHAPTER 1 A MAINTENANCE MODEL FOR COMPONENTS EXPOSED TO SEVERAL FAILURE MECHANISMS AND IMPERFECT REPAIR

Propensity Score Weighting with Multilevel Data

11 Survival Analysis and Empirical Likelihood

Stock Sampling with Interval-Censored Elapsed Duration: A Monte Carlo Analysis

Using Estimating Equations for Spatially Correlated A

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.

ON THE FAILURE RATE ESTIMATION OF THE INVERSE GAUSSIAN DISTRIBUTION

TABLE OF CONTENTS CHAPTER 1 COMBINATORIAL PROBABILITY 1

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Modelling the risk process

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

University of California, Berkeley

Previous lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.

GOODNESS-OF-FIT TEST FOR RANDOMLY CENSORED DATA BASED ON MAXIMUM CORRELATION. Ewa Strzalkowska-Kominiak and Aurea Grané (1)

Statistical Analysis of Competing Risks With Missing Causes of Failure

Session 9: Introduction to Sieve Analysis of Pathogen Sequences, for Assessing How VE Depends on Pathogen Genomics Part I

Survival Analysis. Stat 526. April 13, 2018

Quantile Regression for Residual Life and Empirical Likelihood

Transcription:

Edsel A. Peña (pena@stat.sc.edu) Department of Statistics University of South Carolina Columbia, SC 29208 New Jersey Institute of Technology Conference May 20, 2012

Historical Perspective: Random Censorship Model (RCM) Random Observables: T 1, T 2,...,T n IID F C 1, C 2,...,C n IID G F and G not related {T i } {C i } (Z 1, δ 1 ), (Z 2, δ 2 ),...,(Z n, δ n ) Z i = T i C i and δ i = I {T i C i } Goal: To make inference on the distribution F or the hazard Λ = df/ F, or functionals (mean, median, etc.).

Nonparametric Inference F F = space of continuous distributions Λ = log F; F = 1 F = exp( Λ); λ = Λ N(s) = i I {Z i s; δ i = 1} and Y (s) = i I {Z i s} NAE: ˆΛ = dn Y and PLE: ˆ F = [ 1 dn Y ] Properties of ˆΛ and ˆF well-known, e.g., biased; consistent; and when normalized, weakly convergent to Gaussian processes. Avar(ˆ F(t)) = 1 n F(t) 2 t 0 df(s) F(s) 2Ḡ(s)

An Informative RCM Koziol-Green (KG) Model (1976): β 0, Ḡ(t) = F(t) β Lehmann-type alternatives Proportional hazards: Z i = min(t i, C i ) F β+1 Λ G = βλ F δ i = I {T i C i } BER(1/(β + 1)) An important characterizing property: Z i δ i

MMC Data: First Event Times and Tau s Migratory Motor Complex (MMC) data set from Aalen and Husebye, Stat Med, 1991. MMC Data Set Unit Number 0 5 10 15 20 0 100 200 300 400 500 600 700 Calendar Time

MMC Data: Hazard Estimates KG model holds if and only if Λ τ Λ T1 MMC Data Set Cumulative Hazard Estimate 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0 100 200 300 400 500 600 700 First Event Time and Tau

KG Model s Utility Chen, Hollander and Langberg (JASA, 1982): exploited independence between Z i and δ i to obtain the exact bias, variance, and MSE functions of PLE. Comparisons with asymptotic results. Cheng and Lin (1987): exploited semiparametric nature of KG model to obtain a more efficient estimator of F compared to the PLE. Hollander and Peña (1988): obtained better confidence bands for F under KG model. Csorgo & Faraway (1998): KG model not practically viable, but serves as a mathematical specimen (a lá yeast) for examining exact properties of procedures and for providing pinpoint assessment of efficiency losses/gains.

Recurrent Events: Some Examples admission to hospital due to chronic disease tumor re-occurrence migraine attacks alcohol or drug (eg cocaine) addiction machine failure or discovery of a bug in a software commission of a criminal act by a delinquent minor! major disagreements between a couple non-life insurance claim drop of 200 points in DJIA during trading day publication of a research paper by a professor

Full MMC Data: With Recurrences n = 19 subjects; event = end of migratory motor complex cycle; random length of monitoring period per subject. MMC Data Set Unit Number 0 5 10 15 20 0 100 200 300 400 500 600 700 Calendar Time

Data Accrual: One Subject

Some Aspects in Recurrent Data random monitoring length (τ). random # of events (K) and sum-quota constraint: k K K = max k : T j τ with T j τ < j=1 j=1 Basic Observable: (K, τ, T 1, T 2,...,T K, τ S K ) always a right-censored observation. dependent and informative censoring. K+1 effects of covariates, frailties, interventions after each event, and accumulation of events. j=1 T j

Simplest Model: One Subject T 1, T 2,... IID F: (renewal model) perfect interventions after each event τ G F and G not related no covariates (X) no frailties (Z) F could be parametric or nonparametric Peña, Strawderman and Hollander (JASA, 01): nonparametric estimation of F.

Nonparametric Estimation of F Y (t) = N(t) = n K i I {T ij t} i=1 j=1 n K i I {T ij t} + I {τ i S iki t} i=1 j=1 t dn(w) GNAE : Λ(t) = 0 Y (w) t [ GPLE : F(t) = 1 dn(w) ] Y (w) 0

Main Asymptotic Result k kth Convolution: F (k) (t) = Pr T j t Renewal Function: ρ(t) = ν(t) = 1 Ḡ(t) σ 2 (t) = F(t) 2 t 0 t j=1 F (k) (t) k=1 ρ(w t)dg(w) df(w) F(w) 2Ḡ(w)[1 + ν(w)] Theorem (JASA, 01): n( F(t) F(t)) GP(0, σ 2 (t))

Extending KG Model: Recurrent Setting Wanted: a tractable model with monitoring time informative about F. Potential to refine analysis of efficiency gains/losses. Idea: Why not simply generalize the KG model for the RCM. Generalized KG Model (GKG) for Recurrent Events: β > 0, Ḡ(t) = F(t) β with β unknown, and F the common inter-event time distribution function. Remark: τ may also represent system failure/death, while the recurrent event could be shocks to the system. Remark: Association (within unit) could be modeled through a frailty.

Estimation Issues and Some Questions How to semiparametrically estimate β, Λ, and F? Parametric estimation in Adekpedjou, Peña, and Quiton (2010, JSPI). How much efficiency loss is incurred when the informative monitoring model structure is ignored? How much penalty is incurred with Single-event analysis relative to Recurrent-event analysis? In particular, what is the efficiency loss for estimating F when using the nonparametric estimator in PSH (2001) relative to the semiparametric estimator that exploits the informative monitoring structure?

Basic Processes S ij = j k=1 T ik N i (s) = I {S ij s} j=1 Y i (s) = I {τ i s} R i (s) = s S in = backward recurrence time i (s ) A i (s) = s 0 Y i (v)λ[r i (v)]dv N τ i (s) = I {τ i s} Y τ i (s) = I {τ i s}

Transformed Processes Z i (s, t) = I {R i (s) t} N i (s, t) = Y i (s, t) = A i (s, t) = s 0 N i (s ) j=1 s 0 N Z i (v, t)n i (s)) i (dv) = I {T ij t} j=1 I {T ij t} + I {(s τ i ) S in i (s ) t} t Z i (v, t)a i (dv) = Y i (s, w)λ(w)dw 0

Aggregated Processes N(s, t) = Y (s, t) = A(s, t) = n N i (s, t) i=1 n Y i (s, t) i=1 n A i (s, t) i=1 N τ (s) = Y τ (s) = n Ni τ (s) i=1 n i=1 Y τ i (s)

First, Assume β Known Via Method-of-Moments Approach, estimator of Λ: ˆΛ(s, t β) = t 0 { N(s, dw) + N τ } (dw) Y (s, w) + βy τ (w) Using product-integral representation of F in terms of Λ, estimator of F: ˆ F(s, t β) = t w=0 { 1 N(s, dw) + } Nτ (dw) Y (s, w) + βy τ (w)

Estimating β: Profile Likelihood MLE Profile Likelihood: L P (s ; β) = β Nτ (s ) {[ n s { i=1 Estimator of β: [ s v=0 { v=0 1 Y (s, v) + βy τ (v) 1 Y (s, v) + βy τ (v) ˆβ = arg max β L P(s ; β) } N τ i ( v) ] } Ni (s, v) ]}

Is It A Legitimate Profile Likelihood? Let C be the space of all cumulative hazard functions defined on [0, ). Theorem Let L(s ; Λ, β) be the full likelihood function given by L = { n s i=1 v=0 exp { } [ ] N Y i (v) i (v)λ[r i (v)]dv [Y τ i (v)βλ(v)dv] Nτ i (v) [ n s i=1 0 ]} s Y i (v)λ[r i (v)]dv + Yi τ (v)βλ(v)dv. 0 then L P (s ; β) = max Λ C L(s ; Λ, β).

Estimators of Λ and F Estimator of Λ: ˆΛ(s, t) = ˆΛ(s, t ˆβ) = Estimator of F: ˆ F(s, t) = ˆ F(s, t ˆβ) = t 0 { } N(s, dw) + N τ (dw) Y (s, w) + ˆβY τ (w) { } t 1 N(s, dw) + N τ (dw) Y (s, w) + ˆβY τ (w) w=0

Illustrative Data (n = 30): GKG[Wei(2,.1), β =.2] Recurrent Event Data Subject 0 5 10 15 20 25 30 0.0 0.1 0.2 0.3 0.4 Calendar Time

Estimates of β and F ˆβ =.2331 Estimates of SF for Illustrative Data (Blue=GKG; Red=PSH; Green=TRUE) Estimate Survivor Probability 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.1 0.2 0.3 0.4 Time

Properties of Estimators G s (w) = G(w)I {w < s} + I {w s} E{Y 1 (s, t)} y(s, t) = F(t)Ḡ s (t) + F(t) E{Y τ 1 (t)} y τ (t) = F(t) β t ρ(w t)dg s (w) True Values = (F 0, Λ 0, β 0 ) y 0 (s, t) = y(s, t; Λ 0, β 0 ) y τ 0 (s) = y τ (s; Λ 0, β 0 )

Existence, Consistency, Normality Theorem There is a sequence of ˆβ that is consistent, and ˆΛ(s, ) and ˆ F(s, ) are both uniformly strongly consistent. Theorem As n, we have n(ˆβ β 0 ) N(0, [I P (s ; Λ 0, β 0 )] 1 ) with I P (s ; Λ 0, β 0 ) = 1 β 0 s 0 y τ 0 (v)y 0(s, v) y 0 (s, v) + β 0 y τ 0 (v)λ 0(v)dv.

Weak Convergence of ˆΛ(s, ) Theorem As n, { n[ˆλ(s, t) Λ 0 (t)] : t [0, t ]} converges weakly to a zero-mean Gaussian process with variance function σ 2ˆΛ (s, t) = [ s 0 [ t 0 t 0 Λ 0 (dv) y 0 (s, v) + β 0 y τ 0 (v)+ y 0 (s, v)y τ 0 (v) β 0 [y 0 (s, v) + β 0 y τ 0 (v)]λ 0(dv) y τ 0 (v) y 0 (s, v) + β 0 y τ 0 (v)λ 0(dv) ] 2. ] 1 Remark: The last product term is the effect of estimating β. It inflates the asymptotic variance.

Weak Convergence of ˆ F(s, ) and F(s, ) Corollary As n, { n[ˆ F(s, t) F 0 (t)] : t [0, t ]} converges weakly to a zero-mean Gaussian process whose variance function is σ 2ˆF (s, t) = F 0 (t) 2 σ 2ˆΛ (s, t) F 0 (t) 2 σ 2 Λ(s, t). Recall/Compare! Theorem (PSH, 2001) As n, { n[ F(s, t) F 0 (t)] : t [0, t ]} converges weakly to a zero-mean Gaussian process whose variance function is σ 2 F(s, t) = F 0 (t) 2 t 0 Λ 0 (dv) y 0 (s, v).

Asymptotic Relative Efficiency: β 0 Known If we know β 0 : ARE{ F(s, t) : ˆ F(s, t β 0 )} = { t } 1 Λ 0 (dw) 0 y 0 (s, w) { t } Λ 0 (dw) y 0 (s, w) + β 0 y0 τ(w) 0 Clearly, less than or equal to unity, as is to be expected.

Case of Exponential F: β 0 Known Theorem If F 0 (t) = exp{ θ 0 t} for t 0 and s, then Also, t 0, ARE{ F(, t) : ˆ F(, t β 0 )} = { } 1 1 du F 0 (t) (1 + β 0 )u 2+β 0 { } 1 du (1 + β 0 )u 2+β 0 + β 2 0 u 1+β. 0 F 0 (t) ARE{ F(, t) : ˆ F(, t; β 0 )} 1 + β 0 1 + β 0 + β0 2.

ARE-Plots; β 0 {.1,.3,.5,.7,.9, 1.0, 1.5} Known; F = Exponential Efficiency(PSH:GKG0) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Case of β 0 Unknown As to be expected, if β 0 is known, then the estimator exploiting the GKG structure is more efficient. Question: Does this dominance hold true still if β 0 is now estimated? Theorem Under the GKG model, for all ( F 0, β 0 ) with β 0 > 0, F(s, t) is asymptotically dominated by ˆ F(s, t) in the sense that ARE( F(s, t) : ˆ F(s, t)) 1. Proof. Neat application of Cauchy-Schwartz Inequality.

ARE-Plots; β 0 {.1,.3,.5,.7,.9, 1.0, 1.5} Unknown; F = Exponential Efficiency(PSH:GKG) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Assessing Asymptotic Approximations under Exponential n = 50 n = 100 β 0 Mean SE ASE Mean SE ASE 0.1.1014.0245.0234.1005.0169.0165 0.3.3048.0624.0596.3018.0430.0422 0.5.5119.1038.0983.5045.0708.0695 0.7.7176.1518.1406.7075.1032.0994 0.9.9213.1964.1864.9093.1356.1318 1.0 1.0294.2298.2107 1.0172.1579.1490 1.5 1.5615.3963.3445 1.5316.2572.2436 Mean: mean of the simulated values of ˆβ. SE: standard error of the simulated values of ˆβ. ASE: asymptotic standard error obtained from earlier formula. 10000 replications were performed.

Simulations under Weibull F: ˆβ Results n = 30,α = 2 n = 50,α = 2 n = 30,α =.9 n = 50,α =.9 β 0 Mean SE Mean SE Mean SE Mean SE 0.1.10.03.10.02.10.03.10.02 0.3.31.09.30.07.30.07.30.06 0.5.52.16.51.12.51.13.50.10 0.7.73.23.72.17.72.20.71.14 0.9.95.32.92.23.94.26.92.19 1.0 1.06.36 1.03.26 1.04.30 1.03.22 1.5 1.64.63 1.58.43 1.60.55 1.56.38

Simulated Rel Eff of F : ˆ F under a Weibull F with α = 2 Simulated Efficiencies of PLEPSH relative to PLEGKG Effi(PSH:GKG) 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Simulated Rel Eff of F : ˆ F under a Weibull F with α = 0.9 Simulated Efficiencies of PLEPSH relative to PLEGKG Effi(PSH:GKG) 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Mis-Specified Model: F 0 is Exp(1) and G 0 is Uniform[0, θ] Simulated Efficiencies of PLEPSH relative to PLEGKG Effi(PSH:GKG) 0.0 0.5 1.0 1.5 2.0 2.5 Theta = 1 Theta = 2 Theta = 4 Theta = 8 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Mis-Specified Model: F 0 is Weibull(2, 1) and G 0 is Exp(θ) Simulated Efficiencies of PLEPSH relative to PLEGKG Effi(PSH:GKG) 1 2 3 4 5 6 7 Theta = 1.2 Theta = 1 Theta = 0.5 Theta = 0.25 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Briefly: On More General Recurrent Event Models Must be able to model the following: comparing treatment groups. effect of covariates or regressor variables. effect of interventions after each event. effect of accumulating event occurrences. modeling association of inter-event times within a unit or subject.

Recall: Data Accrual for One Subject

Bladder Cancer Data Set: Groups [control (red), thiotepa (blue)]; Covariates; in WLW (89) Subject 0 20 40 60 80 0 10 20 30 40 50 60 Calendar Time

A General Class of Dynamic Recurrent Event Models P{dN i (s) = 1 F s, Z i } = Z i λ 0 [E i (s)]ρ[(s, N i (s ));α]ψ[x i(s)β]ds λ 0 ( ): baseline hazard rate function; nonparametric. E i ( ): predictable effective (real) age process; effect of interventions. Ei (s) = s: minimal intervention. Ei (s) = s S in i (s ): perfect intervention. ρ(, ; α): known function; effect of event occurrences. ψ( ): known link function; effect of covariate X i. Z i : unobserved frailty variable; distribution H( ; η). Model Parameters: (λ 0 ( ), α, β, η).

Some Inferential Aspects for General Model Peña and Hollander (2004): proposed and discussed model, especially its generality. Peña, Slate and Gonzalez (2007): semiparametric estimation of model parameters. Stocker and Peña (2007): parametric estimation of model parameters. Peña (2012?): asymptotic properties of the estimators of semiparametric estimators of model parameters. Other inference methods: goodness-of-fit; group comparisons; model validation; estimating parameters in effective age process; Bayesian Approach. Challenge: need to gather effective age or real age of subjects/units/components in real studies.

Some Concluding Thoughts Revisited RCM with one event and Koziol-Green (KG) model. Revisited nonparametric estimation of inter-event distribution with recurrent event data. Extended KG model to recurrent event settings. Semiparametric estimation under this GKG recurrent event model. Examined properties of estimators under GKG structure. Efficiency losses of fully nonparametric estimator relative to GKG-based estimator. How about a Bayesian Approach to Inference? Touched on general dynamic models for recurrent event data.

Acknowledgements Work on GKG model joint with Akim Adekpedjou (Missouri University of Science and Technology). Research support from the National Science Foundation and National Institutes of Health. Thanks to Professor Sundar Subramanian and the organizers for inviting me to this conference.