Recurrent Event Data: Models, Analysis, Efficiency

Recurrent Event Data: Models, Analysis, Efficiency
Edsel A. Peña
Department of Statistics, University of South Carolina, Columbia, SC 29208
Talk at Brown University, March 10, 2008

Some (Recurrent) Events
Submission of a manuscript for publication.
Occurrence of tumor.
Onset of depression.
Patient hospitalization.
Machine/system failure.
Occurrence of a natural disaster.
Non-life insurance claim.
Change in job.
Onset of economic recession.
A decrease of at least 200 points in the DJIA.
Marital disagreement.

Event Times and Distributions
$T$: the time to the occurrence of an event of interest.
$F(t) = \Pr\{T \le t\}$: the distribution function of $T$.
$S(t) = \bar{F}(t) = 1 - F(t)$: survivor/reliability function.
Hazard rate/probability and cumulative hazards:
Continuous: $\lambda(t)\,dt \approx \Pr\{T \le t + dt \mid T \ge t\} = \frac{f(t)}{S(t-)}\,dt$
Discrete: $\lambda(t_j) = \Pr\{T = t_j \mid T \ge t_j\} = \frac{f(t_j)}{S(t_j-)}$
Cumulative: $\Lambda(t) = \int_0^t \lambda(w)\,dw$ or $\Lambda(t) = \sum_{t_j \le t} \lambda(t_j)$

Representation/Relationships
Partition: $0 < t_1 < \cdots < t_M = t$ (with $t_0 = 0$), $M(t) = \max_i |t_i - t_{i-1}| = o(1)$.
$S(t) = \Pr\{T > t\} = \prod_{i=1}^{M} \Pr\{T > t_i \mid T \ge t_{i-1}\} \approx \prod_{i=1}^{M} \left[1 - \{\Lambda(t_i) - \Lambda(t_{i-1})\}\right]$.
Identities:
$S(t) = \prod_{w \le t} \left[1 - \Lambda(dw)\right]$
$\Lambda(t) = \int_0^t \frac{dF(w)}{1 - F(w-)}$

Estimating F
A classic problem: given $T_1, T_2, \ldots, T_n$ IID $F$, obtain an estimator $\hat{F}$ of $F$.
Importance?
A functional $\theta(F)$ of $F$ (e.g., mean, median, variance) is estimated via $\hat{\theta} = \theta(\hat{F})$.
Prediction of time-to-event for new units.
Comparing groups, e.g., through a statistic $Q = \int W(t)\, d\left[\hat{F}_1(t) - \hat{F}_2(t)\right]$, where $W(t)$ is some weight function.

Gastroenterology Data: Aalen and Husebye ('91)
Migratory Motor Complex (MMC) times for 19 subjects.
[Figure: MMC data set; unit number versus calendar time.]
Question: how to estimate the MMC period distribution, $F$?

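Parametric Approach
Assume a model: $\mathcal{F} = \{F(t; \theta) : \theta \in \Theta \subset \mathbb{R}^p\}$.
Given $t_1, t_2, \ldots, t_n$, $\theta$ is estimated by $\hat{\theta}$, such as the ML estimator.
$L(\theta) = \prod_{i=1}^{n} f(t_i; \theta) = \prod_{i=1}^{n} \lambda(t_i; \theta) \exp\{-\Lambda(t_i; \theta)\}$
$\hat{\theta} = \arg\max_{\theta} L(\theta)$
$\hat{F}_{pa}(t) = F(t; \hat{\theta})$
$I(\theta) = \mathrm{Var}\left\{\frac{\partial}{\partial\theta} \log f(T_1; \theta)\right\}$
$\hat{\theta} \sim AN\left(\theta, \frac{1}{n} I(\theta)^{-1}\right)$
$\dot{F}(t; \theta) = \frac{\partial}{\partial\theta} F(t; \theta)$
$\hat{F}_{pa}(t) \sim AN\left(F(t; \theta), \frac{1}{n} \dot{F}(t; \theta)' I(\theta)^{-1} \dot{F}(t; \theta)\right)$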

Nonparametric Approach
No assumptions are made regarding the family of distributions to which the unknown df $F$ belongs.
Empirical Distribution Function (EDF): $\hat{F}_{np}(t) = \frac{1}{n} \sum_{i=1}^{n} I\{T_i \le t\}$
$\hat{F}_{np}(\cdot)$ is a nonparametric MLE of $F$.
Since $I\{T_i \le t\}$, $i = 1, 2, \ldots, n$, are IID $\mathrm{Ber}(F(t))$, by the Central Limit Theorem,
$\hat{F}_{np}(t) \sim AN\left(F(t), \frac{1}{n} F(t)[1 - F(t)]\right)$.

An Efficiency Comparison
Assume: $F \in \mathcal{F} = \{F(t; \theta) : \theta \in \Theta\}$.
$\hat{F}_{pa}$ and $\hat{F}_{np}$ are asymptotically unbiased.
Efficiency = ratio of asymptotic variances:
$\mathrm{Eff}(\hat{F}_{pa}(t) : \hat{F}_{np}(t)) = \frac{F(t; \theta)[1 - F(t; \theta)]}{\dot{F}(t; \theta)' I(\theta)^{-1} \dot{F}(t; \theta)}$.
When $\mathcal{F} = \{F(t; \theta) = 1 - \exp\{-\theta t\} : \theta > 0\}$:
$\mathrm{Eff}(\hat{F}_{pa}(t) : \hat{F}_{np}(t)) = \frac{\exp\{\theta t\} - 1}{(\theta t)^2}$
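The exponential-case efficiency can be checked numerically. The sketch below is only an illustration: it builds the ratio from the two asymptotic variances (using $\dot{F}(t;\theta) = t e^{-\theta t}$ and $I(\theta) = 1/\theta^2$ for the exponential model) and compares it with the closed form. The function names are hypothetical and not part of the talk.

```python
import math

def eff_from_variances(theta: float, t: float) -> float:
    """Nonparametric over parametric asymptotic variance, F(t; theta) = 1 - exp(-theta t)."""
    F = 1.0 - math.exp(-theta * t)
    var_np = F * (1.0 - F)                  # n times the asymptotic variance of the EDF
    F_dot = t * math.exp(-theta * t)        # derivative of F with respect to theta
    fisher = 1.0 / theta ** 2               # Fisher information of the exponential model
    var_pa = F_dot * (1.0 / fisher) * F_dot # n times the asymptotic variance of F(t; theta_hat)
    return var_np / var_pa

def eff_closed_form(theta: float, t: float) -> float:
    """Closed form (exp(theta t) - 1) / (theta t)^2."""
    x = theta * t
    return math.expm1(x) / x ** 2

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 5.0):
        print(x, eff_from_variances(1.0, x), eff_closed_form(1.0, x))
```

The ratio depends on $\theta$ and $t$ only through $\theta t$, which is why the efficiency can be plotted against that single quantity.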

Efficiency: Parametric/Nonparametric
Asymptotic efficiency of parametric versus nonparametric estimators under a correct negative exponential family model.
[Figure: efficiency of the parametric relative to the nonparametric estimator in the exponential case; efficiency (in percent) versus the value of $\theta t$.]

Whither Nonparametrics?
A misspecified model: the fitted model is the negative exponential family, but the true model is the gamma family.
Under the wrong model, with $\bar{T} = \frac{1}{n} \sum_{i=1}^{n} T_i$ the sample mean, the parametric estimator of $F$ is
$\hat{F}_{pa}(t) = 1 - \exp\{-t/\bar{T}\}$.
Under a gamma with shape $\alpha$ and scale $1/\theta$ (rate $\theta$), and since $\bar{T} \sim AN(\alpha/\theta, \alpha/(n\theta^2))$, by the $\delta$-method
$\hat{F}_{pa}(t) \sim AN\left(1 - \exp\left\{-\frac{\theta t}{\alpha}\right\}, \frac{1}{n} \frac{(\theta t)^2}{\alpha^3} \exp\left\{-\frac{2(\theta t)}{\alpha}\right\}\right)$.
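The $\delta$-method step can be written out as follows. This is a worked sketch using only the stated mean $\alpha/\theta$ and variance $\alpha/(n\theta^2)$ of the sample mean; the function $g$ is introduced here for illustration.

```latex
\begin{align*}
g(m) &= 1 - \exp\{-t/m\}, \qquad g'(m) = -\frac{t}{m^{2}} \exp\{-t/m\}, \\
g(\alpha/\theta) &= 1 - \exp\{-\theta t/\alpha\}, \\
\left[g'(\alpha/\theta)\right]^{2} \cdot \frac{\alpha}{n\theta^{2}}
  &= \frac{t^{2}\theta^{4}}{\alpha^{4}} \exp\{-2\theta t/\alpha\} \cdot \frac{\alpha}{n\theta^{2}}
   = \frac{1}{n}\,\frac{(\theta t)^{2}}{\alpha^{3}} \exp\{-2\theta t/\alpha\}.
\end{align*}
```

The limit $1 - \exp\{-\theta t/\alpha\}$ differs from the true gamma df unless $\alpha = 1$, so the parametric estimator is asymptotically biased under this misspecification.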

Efficiency: Under a Mis-specified Model
Simulated efficiency: MSE(nonparametric)/MSE(parametric) under a mis-specified exponential family model. True family of model: gamma family.
[Figure: efficiency of parametric versus nonparametric estimator under the misspecified model; efficiency (%) versus the value of $t$, for n = 10, 30, 100, 500.]

MMC Data: Censoring Aspect
For each unit, the red mark is the potential termination time.
[Figure: MMC data set; unit number versus calendar time.]
Remark: all 19 MMC times completely observed.

Estimation of F: With Censoring
Right-censoring variables: $C_1, C_2, \ldots, C_n$ IID $G$.
Observables: $(Z_i, \delta_i)$, $i = 1, 2, \ldots, n$, with $Z_i = \min\{T_i, C_i\}$ and $\delta_i = I\{T_i \le C_i\}$.
Problem: given the $(Z_i, \delta_i)$s, estimate the df $F$ or the hazard function $\Lambda$ of the $T_i$s.
Nonparametric approaches:
Nonparametric MLE (Kaplan-Meier).
Martingale and method-of-moments.
Pioneers: Kaplan & Meier; Efron; Nelson; Breslow; Breslow & Crowley; Aalen; Gill.

Product-Limit Estimator
Counting and at-risk processes:
$N(t) = \sum_{i=1}^{n} I\{Z_i \le t; \delta_i = 1\}$; $Y(t) = \sum_{i=1}^{n} I\{Z_i \ge t\}$
Hazard probability estimate at $t$:
$\hat{\Lambda}(dt) = \frac{\Delta N(t)}{Y(t)} = \frac{\text{\# of observed failures at } t}{\text{\# at-risk at } t}$
Product-Limit Estimator (PLE):
$1 - \hat{F}(t) = \hat{S}(t) = \prod_{w \le t} \left[1 - \frac{\Delta N(w)}{Y(w)}\right]$
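A minimal sketch of the product-limit computation, assuming the data arrive as two parallel lists `z` (observed times) and `delta` (failure indicators). The function name and the toy data are illustrative only.

```python
from collections import Counter

def product_limit(z, delta):
    """Return [(t, S_hat(t))] at each distinct observed failure time t."""
    # Jump of the counting process N at each failure time.
    failures = Counter(t for t, d in zip(z, delta) if d == 1)
    surv = 1.0
    curve = []
    for t in sorted(failures):
        at_risk = sum(1 for zi in z if zi >= t)   # Y(t): units with Z_i >= t
        surv *= 1.0 - failures[t] / at_risk       # factor 1 - dN(t) / Y(t)
        curve.append((t, surv))
    return curve

if __name__ == "__main__":
    z = [2, 3, 3, 5, 8]
    delta = [1, 0, 1, 1, 0]
    for t, s in product_limit(z, delta):
        print(t, round(s, 4))
```

With no censoring (all `delta` equal to 1) the estimate reduces to one minus the empirical distribution function.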

Stochastic Process Approach
A martingale $M$ is a zero-mean process which models a fair game. With $\mathcal{H}_t$ = history up to $t$:
$E\{M(s + t) \mid \mathcal{H}_t\} = M(t)$.
$M(t) = N(t) - \int_0^t Y(w) \Lambda(dw)$ is a martingale, so with $J(t) = I\{Y(t) > 0\}$ and stochastic integration,
$E\left\{\int_0^t \frac{J(w)}{Y(w)}\, dN(w)\right\} = E\left\{\int_0^t J(w) \Lambda(dw)\right\}$.
Nelson-Aalen estimator of $\Lambda$, and PLE:
$\hat{\Lambda}(t) = \int_0^t \frac{dN(w)}{Y(w)}$, so $\hat{S}(t) = \prod_{w \le t} [1 - \hat{\Lambda}(dw)]$.

Asymptotic Properties
NAE: $\sqrt{n}[\hat{\Lambda}(t) - \Lambda(t)] \Rightarrow Z_1(t)$, with $\{Z_1(t) : t \ge 0\}$ a zero-mean Gaussian process with
$d_1(t) = \mathrm{Var}(Z_1(t)) = \int_0^t \frac{\Lambda(dw)}{S(w)\bar{G}(w-)}$.
PLE: $\sqrt{n}[\hat{F}(t) - F(t)] \Rightarrow Z_2(t) \stackrel{st}{=} S(t) Z_1(t)$, so
$d_2(t) = \mathrm{Var}(Z_2(t)) = S(t)^2 \int_0^t \frac{\Lambda(dw)}{S(w)\bar{G}(w-)}$.
If $\bar{G}(w) \equiv 1$ (no censoring), $d_2(t) = F(t)S(t)$!
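The no-censoring reduction can be verified in one line. The sketch below assumes, for simplicity, that $F$ is continuous, so that $\Lambda(dw) = dF(w)/S(w)$ and $S(w-) = S(w)$; that continuity assumption is added here and is not stated on the slide.

```latex
\begin{align*}
d_2(t) &= S(t)^{2} \int_0^t \frac{\Lambda(dw)}{S(w)}
        = S(t)^{2} \int_0^t \frac{dF(w)}{S(w)^{2}}
        = S(t)^{2} \left[\frac{1}{S(t)} - 1\right] \\
       &= S(t)\,[1 - S(t)] = F(t)\,S(t),
\end{align*}
```

which is the asymptotic variance of the empirical distribution function.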

Regression Models
Covariates: temperature, degree of usage, stress level, age, blood pressure, race, etc.
How to account for covariates to improve knowledge of time-to-event.
Modelling approaches:
Log-linear models: $\log(T) = \beta' x + \sigma\epsilon$. The accelerated failure-time model. Error distribution to use? Normal errors not appropriate.
Hazard-based models: Cox proportional hazards (PH) model; Aalen's additive hazards model.

Cox ('72) PH Model: Single Event
Conditional on $x$, the hazard rate of $T$ is:
$\lambda(t \mid x) = \lambda_0(t) \exp\{\beta' x\}$.
$\hat{\beta}$ maximizes the partial likelihood function of $\beta$:
$L_P(\beta) \propto \prod_{i=1}^{n} \prod_{t < \infty} \left[\frac{\exp(\beta' x_i)}{\sum_{j=1}^{n} Y_j(t) \exp(\beta' x_j)}\right]^{\Delta N_i(t)}$.
Aalen-Breslow semiparametric estimator of $\Lambda_0(\cdot)$:
$\hat{\Lambda}_0(t) = \int_0^t \frac{\sum_{i=1}^{n} dN_i(w)}{\sum_{i=1}^{n} Y_i(w) \exp(\hat{\beta}' x_i)}$.

MMC Data: Recurrent Aspect
Aalen and Husebye ('91) full data.
[Figure: MMC data set; unit number versus calendar time.]
Problem: estimate the inter-event time distribution.

Representation: One Subject
[Figure]

Observables: One Subject
$X(s)$ = covariate vector, possibly time-dependent
$T_1, T_2, T_3, \ldots$ = inter-event or gap times
$S_1, S_2, S_3, \ldots$ = calendar times of event occurrences
$\tau$ = end of observation period; assume $\tau \sim G$
$K = \max\{k : S_k \le \tau\}$ = number of events in $[0, \tau]$
$Z$ = unobserved frailty variable
$N^{\dagger}(s)$ = number of events in $[0, s]$
$Y^{\dagger}(s) = I\{\tau \ge s\}$ = at-risk indicator at time $s$
$\mathfrak{F} = \{\mathcal{F}_s : s \ge 0\}$ = filtration: information that includes interventions, covariates, etc.

Aspect of Sum-Quota Accrual
Observed number of events: $K = \max\left\{k : \sum_{j=1}^{k} T_j \le \tau\right\}$
Induced constraint: $(T_1, T_2, \ldots, T_K)$ satisfies
$\sum_{j=1}^{K} T_j \le \tau < \sum_{j=1}^{K+1} T_j$.
$(K, T_1, T_2, \ldots, T_K)$ are all random and dependent, and $K$ is informative about $F$.
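The sum-quota constraint is easy to see in a small simulation. The sketch below is an illustration, not the talk's code: `observe_unit` and the `draw_gap` callback are hypothetical names, and the exponential gap distribution in the usage lines is chosen only as an example.

```python
import random

def observe_unit(tau, draw_gap, rng):
    """Accrue gap times until the running sum would exceed tau.

    Returns the fully observed gaps (T_1, ..., T_K) and the right-censored
    remainder tau - S_K of the (K+1)-th gap.
    """
    gaps, total = [], 0.0
    while True:
        t = draw_gap(rng)
        if total + t > tau:          # next event would fall after tau
            return gaps, tau - total
        gaps.append(t)
        total += t

if __name__ == "__main__":
    rng = random.Random(0)
    gaps, remainder = observe_unit(5.0, lambda r: r.expovariate(1.0), rng)
    print("K =", len(gaps), "sum of gaps =", sum(gaps), "censored remainder =", remainder)
```

The number of complete gaps `K` is itself determined by the gap times, which is the dependence the slide points to.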

Recurrent Event Models: IID Case
Parametric models:
HPP: $T_{i1}, T_{i2}, T_{i3}, \ldots$ IID $\mathrm{EXP}(\lambda)$.
IID renewal model: $T_{i1}, T_{i2}, T_{i3}, \ldots$ IID $F$, where $F \in \mathcal{F} = \{F(\cdot; \theta) : \theta \in \Theta \subset \mathbb{R}^p\}$; e.g., Weibull family; gamma family; etc.
Non-parametric model: $T_{i1}, T_{i2}, T_{i3}, \ldots$ IID $F$, which is some df.
With frailty: for each unit $i$, there is an unobservable $Z_i$ from some distribution $H(\cdot; \xi)$, and $(T_{i1}, T_{i2}, T_{i3}, \ldots)$, given $Z_i$, are IID with survivor function $[1 - F(t)]^{Z_i}$.

A General Class of Full Models
Peña and Hollander (2004) model.
$N^{\dagger}(s) = A^{\dagger}(s \mid Z) + M^{\dagger}(s \mid Z)$
$M^{\dagger}(s \mid Z) \in \mathcal{M}_0^2$ = square-integrable martingales
Intensity process: $A^{\dagger}(s \mid Z) = \int_0^s Y^{\dagger}(w) \lambda(w \mid Z)\, dw$
$\lambda(s \mid Z) = Z\, \lambda_0[E(s)]\, \rho[N^{\dagger}(s-); \alpha]\, \psi[\beta^t X(s)]$

Effective Age Process: $E(s)$
[Figure]

Effective Age Process, $E(s)$
PERFECT intervention: $E(s) = s - S_{N^{\dagger}(s-)}$.
MINIMAL intervention: $E(s) = s$.
IMPERFECT intervention (BP 83; BBS 85): $E(s) = s - S_{\Gamma_{\eta(s-)}}$, where, with $I_1, I_2, \ldots$ IID $\mathrm{BER}(p)$, $\eta(s) = \sum_{i=1}^{N^{\dagger}(s)} I_i$ and $\Gamma_k = \min\{j > \Gamma_{k-1} : I_j = 1\}$.
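The three effective-age forms on this slide can be sketched as small functions of the calendar time `s` and the sorted event times. This is an illustrative sketch: the function names and the `perfect` list (standing in for the Bernoulli indicators $I_1, I_2, \ldots$) are hypothetical.

```python
from bisect import bisect_left

def backward_recurrence_age(s, event_times):
    """E(s) = s - S_{N(s-)}: time since the last event strictly before s."""
    k = bisect_left(event_times, s)              # N(s-): number of events with S_j < s
    last = event_times[k - 1] if k > 0 else 0.0
    return s - last

def calendar_age(s, event_times):
    """E(s) = s: the effective age is the calendar time."""
    return s

def age_since_last_flagged_event(s, event_times, perfect):
    """E(s) = s - S_{Gamma_{eta(s-)}}: time since the last event before s with I_j = 1."""
    k = bisect_left(event_times, s)
    for j in range(k - 1, -1, -1):               # scan back through events before s
        if perfect[j]:
            return s - event_times[j]
    return s                                     # no flagged event yet: age is calendar time

if __name__ == "__main__":
    events = [2.0, 5.0, 9.0]
    flags = [1, 0, 1]
    for s in (1.0, 3.5, 7.0, 10.0):
        print(s, backward_recurrence_age(s, events), calendar_age(s, events),
              age_since_last_flagged_event(s, events, flags))
```

Because the effective age uses $N^{\dagger}(s-)$, evaluating the first function exactly at an event time $S_j$ returns the gap time $T_j$.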

Semi-Parametric Estimation: No Frailty
Observed data for $n$ subjects:
$\{(X_i(s), N_i^{\dagger}(s), Y_i^{\dagger}(s), E_i(s)) : 0 \le s \le s^*\}$, $i = 1, \ldots, n$
$N_i^{\dagger}(s)$ = # of events in $[0, s]$ for the $i$th unit
$Y_i^{\dagger}(s)$ = at-risk indicator at $s$ for the $i$th unit
with the model for the signal being
$A_i^{\dagger}(s) = \int_0^s Y_i^{\dagger}(v)\, \rho[N_i^{\dagger}(v-); \alpha]\, \psi[\beta^t X_i(v)]\, \lambda_0[E_i(v)]\, dv$
where $\lambda_0(\cdot)$ is an unspecified baseline hazard rate function.

Processes and Notations
Calendar/gap time processes:
$N_i(s, t) = \int_0^s I\{E_i(v) \le t\}\, N_i^{\dagger}(dv)$
$A_i(s, t) = \int_0^s I\{E_i(v) \le t\}\, A_i^{\dagger}(dv)$
Notational reductions:
$E_{ij-1}(v) \equiv E_i(v)\, I_{(S_{ij-1}, S_{ij}]}(v)\, I\{Y_i^{\dagger}(v) > 0\}$
$\varphi_{ij-1}(w \mid \alpha, \beta) \equiv \frac{\rho(j-1; \alpha)\, \psi\{\beta^t X_i[E_{ij-1}^{-1}(w)]\}}{E_{ij-1}'[E_{ij-1}^{-1}(w)]}$

Generalized At-Risk Process
$Y_i(s, w \mid \alpha, \beta) \equiv \sum_{j=1}^{N_i^{\dagger}(s-)} I_{(E_{ij-1}(S_{ij-1}),\, E_{ij-1}(S_{ij})]}(w)\, \varphi_{ij-1}(w \mid \alpha, \beta) + I_{(E_{iN_i^{\dagger}(s-)}(S_{iN_i^{\dagger}(s-)}),\, E_{iN_i^{\dagger}(s-)}(s \wedge \tau_i)]}(w)\, \varphi_{iN_i^{\dagger}(s-)}(w \mid \alpha, \beta)$
For the IID renewal model (PSH, 01) this simplifies to:
$Y_i(s, w) = \sum_{j=1}^{N_i^{\dagger}(s-)} I\{T_{ij} \ge w\} + I\{(s \wedge \tau_i) - S_{iN_i^{\dagger}(s-)} \ge w\}$

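Estimation of $\Lambda_0$
$A_i(s, t \mid \alpha, \beta) = \int_0^t Y_i(s, w \mid \alpha, \beta)\, \Lambda_0(dw)$
$S_0(s, t \mid \alpha, \beta) = \sum_{i=1}^{n} Y_i(s, t \mid \alpha, \beta)$
$J(s, t \mid \alpha, \beta) = I\{S_0(s, t \mid \alpha, \beta) > 0\}$
Generalized Nelson-Aalen estimator:
$\hat{\Lambda}_0(s, t \mid \alpha, \beta) = \int_0^t \left\{\frac{J(s, w \mid \alpha, \beta)}{S_0(s, w \mid \alpha, \beta)}\right\} \left\{\sum_{i=1}^{n} N_i(s, dw)\right\}$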

Estimation of $\alpha$ and $\beta$
Partial likelihood (PL) process:
$L_P(s^* \mid \alpha, \beta) = \prod_{i=1}^{n} \prod_{j=1}^{N_i^{\dagger}(s^*)} \left[\frac{\rho(j-1; \alpha)\, \psi[\beta^t X_i(S_{ij})]}{S_0[s^*, E_i(S_{ij}) \mid \alpha, \beta]}\right]^{\Delta N_i^{\dagger}(S_{ij})}$
PL-MLE: $\hat{\alpha}$ and $\hat{\beta}$ are maximizers of the mapping
$(\alpha, \beta) \mapsto L_P(s^* \mid \alpha, \beta)$
Iterative procedures. Implemented in an R package called gcmrec (González, Slate, Peña 04).

Estimation of $\bar{F}_0$
G-NAE of $\Lambda_0(\cdot)$: $\hat{\Lambda}_0(s^*, t) \equiv \hat{\Lambda}_0(s^*, t \mid \hat{\alpha}, \hat{\beta})$
G-PLE of $\bar{F}_0(t)$:
$\hat{\bar{F}}_0(s^*, t) = \prod_{w=0}^{t} \left[1 - \frac{\sum_{i=1}^{n} N_i(s^*, dw)}{S_0(s^*, w \mid \hat{\alpha}, \hat{\beta})}\right]$
For the IID renewal model with $E_i(s) = s - S_{iN_i^{\dagger}(s-)}$, $\rho(k; \alpha) = 1$, and $\psi(w) = 1$, the generalized product-limit estimator in PSH (2001, JASA) obtains.

First Application: MMC Data Set
Aalen and Husebye (1991) data.
[Figure: estimates of the distribution of the MMC period; survivor probability estimate versus Migrating Motor Complex (MMC) time, in minutes; curves labelled IIDPLE, WCPLE, FRMLE.]

Second Application: Bladder Data Set
Bladder cancer data pertaining to times to recurrence for n = 85 subjects studied in Wei, Lin and Weissfeld ('89).
[Figure: subject versus calendar time.]

Results and Comparisons
Estimates from different methods for the bladder data (standard errors in parentheses).

Covariate   Parameter   AG           WLW (Marginal)   PWP (Conditional)   General Model, Perfect (a)   General Model, Minimal (b)
log N(t-)   α           -            -                -                   .98 (.07)                    .79
Frailty     ξ           -            -                -                                                .97
rx          β1          -.47 (.20)   -.58 (.20)       -.33 (.21)          -.32 (.21)                   -.57
Size        β2          -.04 (.07)   -.05 (.07)       -.01 (.07)          -.02 (.07)                   -.03
Number      β3          .18 (.05)    .21 (.05)        .12 (.05)           .14 (.05)                    .22

(a) Effective age is backward recurrence time ($E(s) = s - S_{N^{\dagger}(s-)}$).
(b) Effective age is calendar time ($E(s) = s$).
Details: Peña, Slate, and Gonzalez (2007). JSPI.

Sum-Quota Effect: IID Renewal
Generalized product-limit estimator $\hat{\bar{F}}$ of the common gap-time df $F$ presented in PSH (2001, JASA).
$\sqrt{n}\left(\hat{\bar{F}}(\cdot) - \bar{F}(\cdot)\right) \Rightarrow GP(0, \sigma^2(\cdot))$
$\sigma^2(t) = \bar{F}(t)^2 \int_0^t \frac{d\Lambda(w)}{\bar{F}(w)\bar{G}(w-)[1 + \nu(w)]}$
$\nu(w) = \frac{1}{\bar{G}(w-)} \int_w^{\infty} \rho(v - w)\, dG(v)$
$\rho(\cdot) = \sum_{j=1}^{\infty} F^{\star j}(\cdot)$ = renewal function

Efficiency: Some Questions
Is it worth using the additional event recurrences in the analysis? How much do we gain in efficiency?
Impact of $G$, the distribution of the $\tau_i$s, when $G$ is related to the inter-event time distribution?
Loss if this informative monitoring structure is ignored?
(In)Efficiency of the GPLE relative to an estimator which exploits the informative monitoring structure?
What is a reasonable informative monitoring model for examining these questions?
Could we extend similar studies that were performed for the PLE using the so-called Koziol-Green model?

Koziol-Green Model
Koziol & Green (1976); Chen, Hollander & Langberg (1982); Cheng & Lin (1989)
$T \sim F$ and $C \sim G$, with $T$ the failure time and $C$ the right-censoring time.
Assumption: $1 - G = (1 - F)^{\beta}$ for some $\beta \ge 0$.
$Z = \min(T, C)$ and $\delta = I\{T \le C\}$ are independent; $Z \sim H$ with $\bar{H} = \bar{F}^{\beta + 1}$; $\delta \sim \mathrm{Ber}(1/(1 + \beta))$; $\beta$ = censoring parameter.
CHL: exact properties of the PLE: mean, variance, mean-squared error.
CL: efficiency of the PLE relative to an estimator that exploits the KG assumption.
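The stated properties of $(Z, \delta)$ can be checked by simulation in a convenient special case. The sketch below assumes $F$ is exponential with rate `theta`, so that the KG assumption makes $C$ exponential with rate `beta * theta`; that choice, the function name, and the parameter values are illustrative assumptions (and the sketch needs $\beta > 0$).

```python
import random

def simulate_kg(n, theta, beta, seed=0):
    """Draw (Z_i, delta_i) under the Koziol-Green model with exponential F."""
    rng = random.Random(seed)
    zs, ds = [], []
    for _ in range(n):
        t = rng.expovariate(theta)          # failure time, survivor exp(-theta t)
        c = rng.expovariate(beta * theta)   # censoring time, survivor exp(-theta t)^beta
        zs.append(min(t, c))
        ds.append(1 if t <= c else 0)
    return zs, ds

if __name__ == "__main__":
    theta, beta = 2.0, 0.5
    zs, ds = simulate_kg(20000, theta, beta, seed=1)
    print("mean delta:", sum(ds) / len(ds), "target:", 1 / (1 + beta))
    print("mean Z:", sum(zs) / len(zs), "target:", 1 / ((1 + beta) * theta))
    z1 = [z for z, d in zip(zs, ds) if d == 1]
    z0 = [z for z, d in zip(zs, ds) if d == 0]
    print("mean Z given delta = 1:", sum(z1) / len(z1), "given delta = 0:", sum(z0) / len(z0))
```

Similar conditional means of $Z$ in the two $\delta$ groups are consistent with the independence of $Z$ and $\delta$, though a comparison of means alone does not establish it.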

Generalized KG: Recurrent Events
For a unit or subject:
Inter-event times $T_j$s are IID $F$;
End-of-monitoring time $\tau$ has distribution $G$.
Assumption: $1 - G = (1 - F)^{\beta}$ for some $\beta \ge 0$.
Remark: the independence property that allowed exact derivations in right-censored single-event settings does not play a role in this recurrent event setting.
Efficiency comparisons performed via asymptotic analysis and through computer simulations.
Two cases: (i) $F$ is exponential, and (ii) $F$ is two-parameter Weibull.

Ignoring Informative Structure
When $F$ is exponential, so the model is HPP.
$\hat{\theta}_n$: estimator that exploits informative monitoring.
$\tilde{\theta}_n$: estimator that ignores informative monitoring.
Efficiency result: $\mathrm{ARE}(\hat{\theta}_n : \tilde{\theta}_n) = 0$.
Surprising result!? It turns out that in the exponential setting, these two estimators are identical. The estimators are:
$\hat{\theta}_n = \tilde{\theta}_n = \frac{\sum_{i=1}^{n} K_i}{\sum_{i=1}^{n} \tau_i}$
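The common estimator is the total event count divided by the total monitoring time. The sketch below simulates HPP units whose monitoring times follow the generalized KG assumption with exponential $F$, so that $\tau$ is exponential with rate `beta * theta`; the function names and parameter values are illustrative assumptions.

```python
import random

def hpp_rate_estimate(counts, taus):
    """theta_hat = sum of K_i divided by sum of tau_i."""
    return sum(counts) / sum(taus)

def simulate_hpp_kg(n, theta, beta, seed=0):
    """Simulate (K_i, tau_i) for n units: HPP(theta) events, tau ~ Exp(beta * theta)."""
    rng = random.Random(seed)
    counts, taus = [], []
    for _ in range(n):
        tau = rng.expovariate(beta * theta)
        k, s = 0, rng.expovariate(theta)   # s is the calendar time of the next event
        while s <= tau:
            k += 1
            s += rng.expovariate(theta)
        counts.append(k)
        taus.append(tau)
    return counts, taus

if __name__ == "__main__":
    counts, taus = simulate_hpp_kg(5000, theta=1.5, beta=0.5, seed=2)
    print("theta_hat:", hpp_rate_estimate(counts, taus))
    print("mean events per unit:", sum(counts) / len(counts))
```

In this setup the mean number of events per unit is $\theta E(\tau) = 1/\beta$.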

Single-Event Analysis
Single-event analysis: only the first, possibly right-censored, observations are used in the statistical analysis.
$\check{\theta}_n$: depends only on the first event times.
When $F$ is exponential, we have
$\mathrm{ARE}(\hat{\theta}_n : \check{\theta}_n) = \frac{1}{\beta}$.
$1/\beta$: (approximate) expected number of events per unit.

Inefficiency of GPLE
How inefficient is the generalized PLE of $F$ compared to the parametric estimator that exploits the informative monitoring structure?
$\hat{F}_n$: parametric estimator that exploits the informative structure.
$\tilde{F}_n$: generalized PLE in PSH (JASA, 01).
When $F$ is exponential,
$\mathrm{ARE}(\tilde{F}_n(t) : \hat{F}_n(t)) = \frac{[(1 + \beta)t]^2}{\exp[(1 + \beta)t] - 1}$, $t \ge 0$.
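The ARE expression is simple to evaluate. The sketch below takes the formula exactly as written on the slide (which appears to measure $t$ in units where the exponential rate is 1, an interpretation assumed here) and scans a grid to locate its largest value; the function name is hypothetical.

```python
import math

def are_gple_vs_parametric(t: float, beta: float) -> float:
    """[(1 + beta) t]^2 / (exp[(1 + beta) t] - 1), for t > 0."""
    x = (1.0 + beta) * t
    return x * x / math.expm1(x)

if __name__ == "__main__":
    beta = 0.5
    grid = [0.01 * i for i in range(1, 1001)]
    best_t = max(grid, key=lambda t: are_gple_vs_parametric(t, beta))
    print("largest ARE on the grid:", are_gple_vs_parametric(best_t, beta), "at t =", best_t)
```

The expression depends on $t$ and $\beta$ only through $(1 + \beta)t$, and on this grid it stays below 0.65, so the GPLE is never close to fully efficient relative to the parametric estimator in this setting.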

Efficiency Plot: Exponential F
GPLE ($\tilde{F}$) versus parametric estimator ($\hat{F}$).
[Figure: ARE(p) versus $p = \Pr(\min(T, \tau) > t)$.]

Efficiencies: Weibull F

n    θ1    θ2   β     MeanEvs   Eff($\hat{\theta} : \tilde{\theta}$)   Eff($\hat{\theta} : \check{\theta}$)
50   0.9   1    0.3   3.90      1.26                                   30.27
50   0.9   1    0.5   2.25      1.41                                   13.48
50   0.9   1    0.7   1.57      1.66                                   7.96
50   1.0   1    0.3   3.34      1.30                                   23.29
50   1.0   1    0.5   2.00      1.53                                   9.91
50   1.0   1    0.7   1.42      1.71                                   6.69
50   1.5   1    0.3   1.97      1.49                                   9.53
50   1.5   1    0.5   1.34      1.75                                   5.55
50   1.5   1    0.7   1.02      2.11                                   3.82

Efficiency Plot: Weibull F
GPLE ($\tilde{F}$) versus parametric estimator ($\hat{F}$).
[Figure: relative efficiency ($\tilde{F} : \hat{F}$) versus $F(t)$, for β = 0.3, 0.5, 0.7; simulation parameters n = 50, θ1 = 1.5, θ2 = 1.]

On Marginal Modeling: WLW and PWP
$k_0$ specified (usually the maximum value of the observed $K$s).
Assume a Cox PH-type model for each $S_k$, $k = 1, \ldots, k_0$.
Counting processes ($k = 1, 2, \ldots, k_0$):
$N_k(s) = I\{S_k \le s; S_k \le \tau\}$
At-risk processes ($k = 1, 2, \ldots, k_0$):
$Y_k^{WLW}(s) = I\{S_k \ge s; \tau \ge s\}$
$Y_k^{PWP}(s) = I\{S_{k-1} < s \le S_k; \tau \ge s\}$

Working Model Specifications
WLW model:
$\left\{N_k(s) - \int_0^s Y_k^{WLW}(v)\, \lambda_{0k}^{WLW}(v) \exp\{\beta_k^{WLW} X(v)\}\, dv\right\}$
PWP model:
$\left\{N_k(s) - \int_0^s Y_k^{PWP}(v)\, \lambda_{0k}^{PWP}(v) \exp\{\beta_k^{PWP} X(v)\}\, dv\right\}$
are assumed to be zero-mean martingales (in $s$).

Parameter Estimation
See Therneau & Grambsch's book Modeling Survival Data: Extending the Cox Model.
$\hat{\beta}_k^{WLW}$ and $\hat{\beta}_k^{PWP}$ obtained via partial likelihood (Cox (72) and Andersen and Gill (82)).
Overall $\beta$-estimate:
$\hat{\beta}^{WLW} = \sum_{k=1}^{k_0} \hat{c}_k \hat{\beta}_k^{WLW}$;
the $c_k$s being optimal weights. See the WLW paper.
$\hat{\Lambda}_{0k}^{WLW}(\cdot)$ and $\hat{\Lambda}_{0k}^{PWP}(\cdot)$: Aalen-Breslow-Nelson type estimators.

Two Relevant Questions
Question 1: When one assumes marginal models for the $S_k$s that are of the Cox PH-type, does there exist a full model that actually induces such PH-type marginal models?
Answer: YES, by a very nice paper by Yang and Ying (Biometrika, 2001). BUT, the joint model obtained is rather limited.
Question 2: If one assumes Cox PH-type marginal models for the $S_k$s (or $T_k$s), but the true full model does not induce such PH-type marginal models [which may usually be the case in practice], what are the consequences?

Case of the HPP Model
True full model: for a unit with covariate $X = x$, events occur according to an HPP model with rate:
$\lambda(t \mid x) = \theta \exp(\beta x)$.
For this unit, inter-event times $T_k$, $k = 1, 2, \ldots$ are IID exponential with mean time $1/\lambda(t \mid x)$.
Assume also that $X \sim \mathrm{BER}(p)$ and $\mu_\tau = E(\tau)$.
The main goal is to infer about the regression coefficient $\beta$, which relates the covariate $X$ to the event occurrences.

Full Model Analysis
$\hat{\beta}$ solves
$\frac{\sum X_i K_i}{\sum K_i} = \frac{\sum \tau_i X_i \exp(\beta X_i)}{\sum \tau_i \exp(\beta X_i)}$.
$\hat{\beta}$ does not directly depend on the $S_{ij}$s. Why?
Sufficiency: the $(K_i, \tau_i)$s contain all information on $(\theta, \beta)$.
$(S_{i1}, S_{i2}, \ldots, S_{iK_i}) \mid (K_i, \tau_i) \stackrel{d}{=} \tau_i (U_{(1)}, U_{(2)}, \ldots, U_{(K_i)})$.
Asymptotics:
$\hat{\beta} \sim AN\left(\beta, \frac{1}{n} \frac{(1 - p) + p e^{\beta}}{\mu_\tau \theta\, p(1 - p) e^{\beta}}\right)$.
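With a binary covariate the estimating equation has a closed-form root: writing $A = \sum \tau_i X_i$ and $B = \sum \tau_i (1 - X_i)$, the right-hand side equals $e^{\beta} A / (e^{\beta} A + B)$, so $e^{\hat{\beta}}$ is the ratio of the two groups' event rates. The sketch below works under that binary-covariate assumption (and needs at least one event and positive monitoring time in each group); the function names and toy data are illustrative.

```python
import math

def hpp_beta_hat(x, k, tau):
    """Closed-form root of the estimating equation when every x_i is 0 or 1."""
    k1 = sum(ki for xi, ki in zip(x, k) if xi == 1)    # events in the X = 1 group
    k0 = sum(ki for xi, ki in zip(x, k) if xi == 0)    # events in the X = 0 group
    a = sum(ti for xi, ti in zip(x, tau) if xi == 1)   # monitoring time, X = 1
    b = sum(ti for xi, ti in zip(x, tau) if xi == 0)   # monitoring time, X = 0
    return math.log((k1 / a) / (k0 / b))               # log ratio of event rates

def estimating_gap(beta, x, k, tau):
    """Left side minus right side of the estimating equation at a given beta."""
    lhs = sum(xi * ki for xi, ki in zip(x, k)) / sum(k)
    num = sum(ti * xi * math.exp(beta * xi) for xi, ti in zip(x, tau))
    den = sum(ti * math.exp(beta * xi) for xi, ti in zip(x, tau))
    return lhs - num / den

if __name__ == "__main__":
    x = [1, 1, 0, 0]
    k = [4, 6, 3, 2]
    tau = [2.0, 3.0, 4.0, 1.0]
    b = hpp_beta_hat(x, k, tau)
    print("beta_hat:", b, "gap at the root:", estimating_gap(b, x, k, tau))
```

Only the counts and monitoring times enter the computation, which matches the sufficiency remark on the slide.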

Some Questions
Under WLW or the PWP: how are $\beta_k^{WLW}$ and $\beta_k^{PWP}$ related to $\theta$ and $\beta$? Impact of event position $k$?
Are we ignoring that the $K_i$s are informative? Why not also a marginal model on the $K_i$s?
Are we violating the Sufficiency Principle?
Results simulation-based: Therneau & Grambsch book ('01) and Metcalfe & Thompson (SMMR, 07).
Comment by D. Oakes that PWP estimates are less biased than WLW estimates.

Properties of $\hat{\beta}_k^{WLW}$
Let $\hat{\beta}_k^{WLW}$ be the partial likelihood MLE of $\beta$ based on the at-risk process $Y_k^{WLW}(v)$.
Question: does $\hat{\beta}_k^{WLW}$ converge to $\beta$?
$g_k(w) = w^{k-1} e^{-w} / \Gamma(k)$: standard gamma pdf.
$\bar{G}_k(v) = \int_v^{\infty} g_k(w)\, dw$: standard gamma survivor function.
$\bar{G}(\cdot)$: survivor function of $\tau$.
$E(\cdot)$: denotes expectation with respect to $X$.

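Limit Value (LV) of $\hat{\beta}_k^{WLW}$
Limit value $\beta_k^* = \beta_k^*(\theta, \beta)$ of $\hat{\beta}_k^{WLW}$: solution in $\beta^*$ of
$\int_0^{\infty} E\left(X \theta e^{\beta X} g_k(v \theta e^{\beta X})\right) \bar{G}(v)\, dv = \int_0^{\infty} e_k^{WLW}(v; \theta, \beta, \beta^*)\, E\left(\theta e^{\beta X} g_k(v \theta e^{\beta X})\right) \bar{G}(v)\, dv$
where
$e_k^{WLW}(v; \theta, \beta, \beta^*) = \frac{E\left(X e^{\beta^* X} \bar{G}_k(v \theta e^{\beta X})\right)}{E\left(e^{\beta^* X} \bar{G}_k(v \theta e^{\beta X})\right)}$
Asymptotic bias of $\hat{\beta}_k^{WLW}$ = $\beta_k^* - \beta$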

Bias Plots for WLW Estimator
Colors pertain to the value of $k$, the event position: k = 1: black; k = 2: red; k = 3: green; k = 4: dark blue; k = 5: light blue.
[Figure, two panels: "Theoretical" (asymptotic bias versus β) and "Simulated" (simulated bias of WLW versus true β).]

On PWP Estimators
Main difference between WLW and PWP:
$E(Y_k^{WLW}(v) \mid X) = \bar{G}(v)\, \bar{G}_k(v \theta \exp(\beta X))$;
$E(Y_k^{PWP}(v) \mid X) = \bar{G}(v)\, g_k(v \theta \exp(\beta X))$.
Leads to: $u_k^{PWP}(s; \theta, \beta) = 0$ for $k = 1, 2, \ldots$.
The $\hat{\beta}_k^{PWP}$ are asymptotically unbiased for $\beta$ for each $k$ (at least in this HPP model)!
Theoretical result consistent with observed results from simulation studies and D. Oakes' observation.

Concluding Remarks
Recurrent events prevalent in many areas.
Dynamic models: accommodate unique aspects.
More research in inference for dynamic models.
Current limitation: tracking effective age.
Efficiency gains with recurrences.
Caution: informative aspects of model.
Caution: marginal modeling approaches.
Dynamic recurrent event modeling: challenging and still fertile!

Acknowledgements
Thanks to NIH grants that partially support research.
Research collaborators: Myles Hollander, Rob Strawderman, Elizabeth Slate, Juan Gonzalez, Russ Stocker, Akim Adekpedjou, and Jonathan Quiton.
Thanks to current students: Alex McLain, Laura Taylor, Josh Habiger, and Wensong Wu.
Thanks to all of you!