Semiparametric Estimation for a Generalized KG Model with R. Model with Recurrent Event Data

Similar documents
Modeling and Analysis of Recurrent Event Data

Modelling and Analysis of Recurrent Event Data

Recurrent Event Data: Models, Analysis, Efficiency

Dynamic Models in Reliability and Survival Analysis

Time-to-Event Modeling and Statistical Analysis

Modeling and Non- and Semi-Parametric Inference with Recurrent Event Data

Introduction to repairable systems STK4400 Spring 2011

e 4β e 4β + e β ˆβ =0.765

Estimating Load-Sharing Properties in a Dynamic Reliability System. Paul Kvam, Georgia Tech Edsel A. Peña, University of South Carolina

Multivariate Survival Analysis

Lecture 5 Models and methods for recurrent event data

Multistate models and recurrent event models

Lecture 3. Truncation, length-bias and prevalence sampling

Survival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University

Multistate models and recurrent event models

Chapter 2 Inference on Mean Residual Life-Overview

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

Multistate Modeling and Applications

Lecture 22 Survival Analysis: An Introduction

log T = β T Z + ɛ Zi Z(u; β) } dn i (ue βzi ) = 0,

Non- and Semi-parametric Bayesian Inference with Recurrent Events and Coherent Systems Data

1 Glivenko-Cantelli type theorems

Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data


Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

On the generalized maximum likelihood estimator of survival function under Koziol Green model

Concepts and Tests for Trend in Recurrent Event Processes

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Step-Stress Models and Associated Inference

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

Survival Times (in months) Survival Times (in months) Relative Frequency. Relative Frequency

Tests of independence for censored bivariate failure time data

A Regression Model For Recurrent Events With Distribution Free Correlation Structure

Survival Times (in months) Survival Times (in months) Relative Frequency. Relative Frequency

Hacettepe Journal of Mathematics and Statistics Volume 45 (5) (2016), Abstract

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

ST495: Survival Analysis: Maximum likelihood

Optimal goodness-of-fit tests for recurrent event data

MAS3301 / MAS8311 Biostatistics Part II: Survival

Efficiency Comparison Between Mean and Log-rank Tests for. Recurrent Event Time Data

Frailty Models and Copulas: Similarities and Differences

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Frailty Modeling for clustered survival data: a simulation study

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

Survival Analysis Math 434 Fall 2011

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Joint Modeling of Longitudinal Item Response Data and Survival

Survival Analysis. Lu Tian and Richard Olshen Stanford University

CHAPTER 1 A MAINTENANCE MODEL FOR COMPONENTS EXPOSED TO SEVERAL FAILURE MECHANISMS AND IMPERFECT REPAIR

Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation

STAT331. Cox s Proportional Hazards Model

STAT Sample Problem: General Asymptotic Results

Analysing geoadditive regression data: a mixed model approach

From semi- to non-parametric inference in general time scale models

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where

Harvard University. Harvard University Biostatistics Working Paper Series

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

Multivariate Survival Data With Censoring.

A Bivariate Weibull Regression Model

Propensity Score Weighting with Multilevel Data

A general mixed model approach for spatio-temporal regression data

University of California, Berkeley

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.

11 Survival Analysis and Empirical Likelihood

Censoring mechanisms

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

MAS3301 / MAS8311 Biostatistics Part II: Survival

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

JOINT REGRESSION MODELING OF TWO CUMULATIVE INCIDENCE FUNCTIONS UNDER AN ADDITIVITY CONSTRAINT AND STATISTICAL ANALYSES OF PILL-MONITORING DATA

Rank Regression Analysis of Multivariate Failure Time Data Based on Marginal Linear Models

ST745: Survival Analysis: Nonparametric methods

Survival Analysis for Case-Cohort Studies

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistical Inference and Methods

Modeling of Dependence Between Critical Failure and Preventive Maintenance: The Repair Alert Model

4. Comparison of Two (K) Samples

Using Estimating Equations for Spatially Correlated A

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution

Hybrid Censoring; An Introduction 2

Stat 5101 Lecture Notes

Missing Covariate Data in Matched Case-Control Studies

Two Likelihood-Based Semiparametric Estimation Methods for Panel Count Data with Covariates

A Very Brief Summary of Statistical Inference, and Examples

Survival Distributions, Hazard Functions, Cumulative Hazards

Exercises. (a) Prove that m(t) =

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Survival Analysis. Stat 526. April 13, 2018

Models for Multivariate Panel Count Data

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

1 Delayed Renewal Processes: Exploiting Laplace Transforms

CTDL-Positive Stable Frailty Model

Goodness-of-fit test for the Cox Proportional Hazard Model

Attributable Risk Function in the Proportional Hazards Model

Beyond GLM and likelihood

Goodness-of-fit tests for randomly censored Weibull distributions with estimated parameters

A New Two Sample Type-II Progressive Censoring Scheme

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Transcription:

Semiparametric Estimation for a Generalized KG Model with Recurrent Event Data Edsel A. Peña (pena@stat.sc.edu) Department of Statistics University of South Carolina Columbia, SC 29208 MMR Conference Beijing, China June 2011

Historical Perspective: Random Censorship Model (RCM) Random Observables: T 1,T 2,...,T n IID F C 1,C 2,...,C n IID G F and G not related {T i } {C i } (Z 1,δ 1 ),(Z 2,δ 2 ),...,(Z n,δ n ) Z i = T i C i and δ i = I{T i C i } Goal: To make inference on the distribution F or the hazard Λ = df/ F, or functionals (mean, median, etc.).

Nonparametric Inference F F = space of continuous distributions Λ = log F; F = 1 F = exp( Λ);λ = Λ N(s) = i I{Z i s;δ i = 1} and Y(s) = i I{Z i s} NAE: ˆΛ = dn Y and PLE: ˆ F = [ 1 dn Y ] Properties of ˆΛ and ˆF well-known, e.g., biased; consistent; and when normalized, weakly convergent to Gaussian processes. 1 t Avar(ˆ F(t)) = n F(t) 2 0 df(s) F(s) 2Ḡ(s)

An Informative RCM Koziol-Green (KG) Model (1976): β 0, Ḡ(t) = F(t) β Lehmann-type alternatives Proportional hazards: Z i = min(t i,c i ) F β+1 Λ G = βλ F δ i = I{T i C i } BER(1/(β +1)) An important characterizing property: Z i δ i

MMC Data: First Event Times and Tau s Migratory Motor Complex (MMC) data set from Aalen and Husebye, Stat Med, 1991. MMC Data Set Unit Number 0 5 10 15 20 0 100 200 300 400 500 600 700 Calendar Time

MMC Data: Hazard Estimates KG model holds if and only if Λ τ Λ T1 MMC Data Set Cumulative Hazard Estimate 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0 100 200 300 400 500 600 700 First Event Time and Tau

KG Model s Utility Chen, Hollander and Langberg (JASA, 1982): exploited independence between Z i and δ i to obtain the exact bias, variance, and MSE functions of PLE. Comparisons with asymptotic results. Cheng and Lin (1987): exploited semiparametric nature of KG model to obtain a more efficient estimator of F compared to the PLE. Hollander and Peña (1988): obtained better confidence bands for F under KG model. Csorgo & Faraway (1998): KG model not practically viable, but serves as a mathematical specimen (a la yeast) for examining exact properties of procedures and for providing pinpoint assessment of efficiency losses/gains.

Recurrent Events: Some Examples admission to hospital due to chronic disease tumor re-occurrence migraine attacks alcohol or drug (eg cocaine) addiction machine failure or discovery of a bug in a software commission of a criminal act by a delinquent minor! major disagreements between a couple non-life insurance claim drop of 200 points in DJIA during trading day publication of a research paper by a professor

Full MMC Data: With Recurrences n = 19 subjects; event = end of migratory motor complex cycle; random length of monitoring period per subject. MMC Data Set Unit Number 0 5 10 15 20 0 100 200 300 400 500 600 700 Calendar Time

Bladder Cancer Data Set: Groups [control (red), thiotepa (blue)]; Covariates; in WLW (89) Subject 0 20 40 60 80 0 10 20 30 40 50 60 Calendar Time

Data Accrual: One Subject

Some Aspects in Recurrent Data random monitoring length (τ). random # of events (K) and sum-quota constraint: k K K = max k : T j τ with T j τ < j=1 j=1 Basic Observable: (K,τ,T 1,T 2,...,T K,τ S K ) always a right-censored observation. dependent and informative censoring. K+1 effects of covariates, frailties, interventions after each event, and accumulation of events. j=1 T j

Approaches: Recurrent Event Analysis First-event analysis. Inefficient. How much do we gain by utilizing the recurrences? Full modeling approach: AG (82) and in PH (04), PSG (07), book by ABG (08). Full modeling: harder to implement and requires intensity process specification. Marginal modeling approach: Wei, Lin, and Weissfeld (89). Conditional modeling approach: Prentice, Williams, and Peterson (81). Marginal and conditional approaches: quite popular, but are there foundational issues?

Simplest Model: One Subject T 1,T 2,... IID F: (renewal model) perfect interventions after each event τ G F and G not related no covariates (X) no frailties (Z) F could be parametric or nonparametric Peña, Strawderman and Hollander (JASA, 01): nonparametric estimation of F.

Nonparametric Estimation of F Y(t) = N(t) = n K i I{T ij t} i=1 j=1 n K i I{T ij t}+i{τ i S iki t} i=1 j=1 t dn(w) GNAE : Λ(t) = 0 Y(w) t [ GPLE : F(t) = 1 dn(w) ] Y(w) 0

Main Asymptotic Result k kth Convolution: F (k) (t) = Pr T j t Renewal Function: ρ(t) = ν(t) = 1 Ḡ(t) σ 2 (t) = F(t) 2 t 0 t j=1 F (k) (t) k=1 ρ(w t)dg(w) df(w) F(w) 2Ḡ(w)[1+ν(w)] Theorem (JASA, 01): n( F(t) F(t)) GP(0,σ 2 (t))

Extending KG Model: Recurrent Setting Wanted: a tractable model with monitoring time informative about F. Potential to refine analysis of efficiency gains/losses. Idea: Why not simply generalize the KG model for the RCM. Generalized KG Model (GKG) for Recurrent Events: β > 0, Ḡ(t) = F(t) β with β unknown, and F the common inter-event time distribution function. Remark: τ may also represent system failure/death, while the recurrent event could be shocks to the system. Remark: Association (within unit) could be modeled through a frailty.

Estimation Issues and Some Questions How to semiparametrically estimate β, Λ, and F? Parametric estimation in Adekpedjou, Peña, and Quiton (2010, JSPI). How much efficiency loss is incurred when the informative monitoring model structure is ignored? How much penalty is incurred with Single-event analysis relative to Recurrent-event analysis? In particular, what is the efficiency loss for estimating F when using the nonparametric estimator in PSH (2001) relative to the semiparametric estimator that exploits the informative monitoring structure?

Basic Processes S ij = j k=1 T ik N i (s) = I{S ij s} j=1 Y i (s) = I{τ i s} R i (s) = s S in = backward recurrence time i (s ) A i (s) = s 0 Y i (v)λ[r i (v)]dv N τ i (s) = I{τ i s} Y τ i (s) = I{τ i s}

Transformed Processes Z i (s,t) = I{R i (s) t} N i (s,t) = Y i (s,t) = A i (s,t) = s 0 N i (s ) j=1 s 0 N Z i (v,t)n i (s)) i (dv) = I{T ij t} j=1 I{T ij t}+i{(s τ i ) S in t} i (s ) t Z i (v,t)a i (dv) = Y i (s,w)λ(w)dw 0

Aggregated Processes N(s,t) = Y(s,t) = A(s,t) = n N i (s,t) i=1 n Y i (s,t) i=1 n A i (s,t) i=1 N τ (s) = Y τ (s) = n Ni τ (s) i=1 n i=1 Y τ i (s)

First, Assume β Known Via Method-of-Moments Approach, estimator of Λ: ˆΛ(s,t β) = t 0 { N(s,dw)+N τ } (dw) Y(s,w)+βY τ (w) Using product-integral representation of F in terms of Λ, estimator of F: ˆ F(s,t β) = t w=0 { } 1 N(s,dw)+Nτ (dw) Y(s,w)+βY τ (w)

Estimating β: Profile Likelihood MLE Profile Likelihood: L P (s ;β) = β Nτ (s ) {[ n s { i=1 Estimator of β: [ s v=0 { v=0 1 Y(s,v)+βY τ (v) 1 Y(s,v)+βY τ (v) ˆβ = argmax β L P(s ;β) } N τ i (Δv) ] } Ni (s,δv) ]} Computational Aspect: in R, we used optimize to get good seed for the Newton-Raphson iteration.

Estimators of Λ and F Estimator of Λ: ˆΛ(s,t) = ˆΛ(s,t ˆβ) = Estimator of F: ˆ F(s,t) = ˆ F(s,t ˆβ) = t 0 { } N(s,dw)+N τ (dw) Y(s,w)+ ˆβY τ (w) { } t 1 N(s,dw)+N τ (dw) Y(s,w)+ ˆβY τ (w) w=0

Illustrative Data (n = 30): GKG[Wei(2,.1), β =.2] Recurrent Event Data Subject 0 5 10 15 20 25 30 0.0 0.1 0.2 0.3 0.4 Calendar Time

Estimates of β and F ˆβ =.2331 Estimates of SF for Illustrative Data (Blue=GKG; Red=PSH; Green=TRUE) Estimate Survivor Probability 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.1 0.2 0.3 0.4 Time

Properties of Estimators G s (w) = G(w)I{w < s}+i{w s} E{Y 1 (s,t)} y(s,t) = F(t)Ḡs(t)+ F(t) E{Y1 τ (t)} yτ β (t) = F(t) t ρ(w t)dg s (w) True Values = (F 0,Λ 0,β 0 ) y 0 (s,t) = y(s,t;λ 0,β 0 ) y τ 0 (s) = yτ (s;λ 0,β 0 )

Existence, Consistency, Normality Theorem There is a sequence of ˆβ that is consistent, and ˆΛ(s, ) and ˆ F(s, ) are both uniformly strongly consistent. Theorem As n, we have n(ˆβ β0 ) N(0,[I P (s ;Λ 0,β 0 )] 1 ) with I P (s ;Λ 0,β 0 ) = 1 β 0 s 0 y τ 0 (v)y 0(s,v) y 0 (s,v)+β 0 y τ 0 (v)λ 0(v)dv.

Weak Convergence of ˆΛ(s, ) Theorem As n, { n[ˆλ(s,t) Λ 0 (t)] : t [0,t ]} converges weakly to a zero-mean Gaussian process with variance function σ 2ˆΛ(s,t) = [ s 0 [ t 0 t 0 Λ 0 (dv) y 0 (s,v)+β 0 y τ 0 (v)+ y 0 (s,v)y τ 0 (v) β 0 [y 0 (s,v)+β 0 y τ 0 (v)]λ 0(dv) y τ 0 (v) y 0 (s,v)+β 0 y τ 0 (v)λ 0(dv) ] 2. ] 1 Remark: The last product term is the effect of estimating β. It inflates the asymptotic variance.

Weak Convergence of ˆ F(s, ) and F(s, ) Corollary As n, { n[ˆ F(s,t) F 0 (t)] : t [0,t ]} converges weakly to a zero-mean Gaussian process whose variance function is σ 2ˆF (s,t) = F 0 (t) 2 σ 2ˆΛ (s,t) F 0 (t) 2 σ 2 Λ (s,t). Recall/Compare! Theorem (PSH, 2001) As n, { n[ F(s,t) F 0 (t)] : t [0,t ]} converges weakly to a zero-mean Gaussian process whose variance function is σ 2 F(s,t) = F 0 (t) 2 t 0 Λ 0 (dv) y 0 (s,v).

Asymptotic Relative Efficiency: β 0 Known If we know β 0 : ARE{ F(s,t) : ˆ F(s,t β 0 )} = { t } 1 Λ 0 (dw) 0 y 0 (s,w) { t } Λ 0 (dw) y 0 (s,w)+β 0 y0 τ(w) 0 Clearly, this could not exceed unity, as is to be expected.

Case of Exponential F: β 0 Known Theorem If F 0 (t) = exp{ θ 0 t} for t 0 and s, then Also, t 0, ARE{ F(,t) : ˆ F(,t β0 )} = { } 1 1 du F 0 (t) (1+β 0 )u 2+β 0 { } 1 du (1+β 0 )u 2+β 0 +β 2 0 u 1+β. 0 F 0 (t) ARE{ F(,t) : ˆ F(,t;β0 )} 1+β 0 1+β 0 +β0 2.

ARE-Plots; β 0 {.1,.3,.5,.7,.9,1.0,1.5} Known; F = Exponential Efficiency(PSH:GKG0) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Case of β 0 Unknown As to be expected, if β 0 is known, then the estimator exploiting the GKG structure is more efficient. Question: Does this dominance hold true still if β 0 is now estimated? Theorem Under the GKG model, for all ( F 0,β 0 ) with β 0 > 0, F(s,t) is asymptotically dominated by ˆ F(s,t) in the sense that ARE( F(s,t) : ˆ F(s,t)) 1. Proof. Neat application of Cauchy-Schwartz Inequality.

ARE-Plots; β 0 {.1,.3,.5,.7,.9,1.0,1.5} Unknown; F = Exponential Efficiency(PSH:GKG) 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Assessing Asymptotic Approximations under Exponential n = 50 n = 100 β 0 Mean SE ASE Mean SE ASE 0.1.1014.0245.0234.1005.0169.0165 0.3.3048.0624.0596.3018.0430.0422 0.5.5119.1038.0983.5045.0708.0695 0.7.7176.1518.1406.7075.1032.0994 0.9.9213.1964.1864.9093.1356.1318 1.0 1.0294.2298.2107 1.0172.1579.1490 1.5 1.5615.3963.3445 1.5316.2572.2436 Mean: mean of the simulated values of ˆβ. SE: standard error of the simulated values of ˆβ. ASE: asymptotic standard error obtained from earlier formula. 10000 replications were performed.

Simulations under Weibull F: ˆβ Results n = 30,α = 2 n = 50,α = 2 n = 30,α =.9 n = 50,α =.9 β 0 Mean SE Mean SE Mean SE Mean SE 0.1.10.03.10.02.10.03.10.02 0.3.31.09.30.07.30.07.30.06 0.5.52.16.51.12.51.13.50.10 0.7.73.23.72.17.72.20.71.14 0.9.95.32.92.23.94.26.92.19 1.0 1.06.36 1.03.26 1.04.30 1.03.22 1.5 1.64.63 1.58.43 1.60.55 1.56.38

Simulated Rel Eff of F : ˆ F under a Weibull F with α = 2 Simulated Efficiencies of PLEPSH relative to PLEGKG Effi(PSH:GKG) 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Simulated Rel Eff of F : ˆ F under a Weibull F with α = 0.9 Simulated Efficiencies of PLEPSH relative to PLEGKG Effi(PSH:GKG) 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 F(t)

Some Concluding Thoughts Revisited the random censorship model with one event and the Koziol-Green model. Revisited nonparametric estimation of inter-event distribution in the presence of recurrent event data. Extended the Koziol-Green model to the situation with recurrent event. Considered the semiparametric estimation of the inter-event distribution under this generalized recurrent event data. Examined finite- and asymptotic properties of the estimators under this GKG structure. Assessed efficiency losses of the fully nonparametric estimator of F relative to that exploiting GKG structure.

Acknowledgements Joint work with Akim Adekpedjou (Missouri University of Science and Technology). Research support from the National Science Foundation and National Institutes of Health. Thanks to Bo Lindqvist for inviting me to participate in this MMR and to Lirong Cui and his colleagues for organizing this MMR.