Understanding product integration. A talk about teaching survival analysis.

Similar documents
A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

Multi-state models: prediction

Multistate Modeling and Applications

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Empirical Processes & Survival Analysis. The Functional Delta Method

A multi-state model for the prognosis of non-mild acute pancreatitis

Robust estimates of state occupancy and transition probabilities for Non-Markov multi-state models

STAT Sample Problem: General Asymptotic Results

Philosophy and Features of the mstate package

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

A multi-state model for the prognosis of non-mild acute pancreatitis

Exercises. (a) Prove that m(t) =

Asymptotic Distributions for the Nelson-Aalen and Kaplan-Meier estimators and for test statistics.

STAT331 Lebesgue-Stieltjes Integrals, Martingales, Counting Processes

Censoring and Truncation - Highlighting the Differences

9 Estimating the Underlying Survival Distribution for a

Statistical Analysis of Competing Risks With Missing Causes of Failure

Estimation for Modified Data

A multistate additive relative survival semi-markov model

Survival Analysis Math 434 Fall 2011

Multistate models STK4080 H Competing risk setting 2. Illness-Death setting 3. General event histories and Aalen-Johansen estimator

Product-limit estimators of the survival function with left or right censored data

Multi-state Models: An Overview

DAGStat Event History Analysis.

Estimating transition probabilities for the illness-death model The Aalen-Johansen estimator under violation of the Markov assumption Torunn Heggland

Pricing and Risk Analysis of a Long-Term Care Insurance Contract in a non-markov Multi-State Model

A General Kernel Functional Estimator with Generalized Bandwidth Strong Consistency and Applications

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n

Analysis of competing risks data and simulation of data following predened subdistribution hazards

ST745: Survival Analysis: Nonparametric methods

Resampling methods for randomly censored survival data

9. Estimating Survival Distribution for a PH Model

Survival Regression Models

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Cox s proportional hazards model and Cox s partial likelihood

STAT 331. Martingale Central Limit Theorem and Related Results

Introduction to BGPhazard

11 Survival Analysis and Empirical Likelihood

Statistical Inference and Methods

Survival Analysis I (CHL5209H)

The Wild Bootstrap for Multivariate Nelson-Aalen Estimators

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

A note on the decomposition of number of life years lost according to causes of death

MAS3301 / MAS8311 Biostatistics Part II: Survival

Kernel density estimation in R

CHAPTER 1 A MAINTENANCE MODEL FOR COMPONENTS EXPOSED TO SEVERAL FAILURE MECHANISMS AND IMPERFECT REPAIR


Nonparametric Model Construction

1 Glivenko-Cantelli type theorems

Multi-state models: a flexible approach for modelling complex event histories in epidemiology

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Quantifying the Predictive Accuracy of Time to Event Models in the Presence of Competing Risks

Direct likelihood inference on the cause-specific cumulative incidence function: a flexible parametric regression modelling approach

Right-truncated data. STAT474/STAT574 February 7, / 44

Reliability Engineering I

Multistate models and recurrent event models

Survival Analysis. Stat 526. April 13, 2018

Analysis of Time-to-Event Data: Chapter 2 - Nonparametric estimation of functions of survival time

Product-limit estimators of the gap time distribution of a renewal process under different sampling patterns

Lecture 7. Poisson and lifetime processes in risk analysis

Introduction to Reliability Theory (part 2)

Definitions and examples Simple estimation and testing Regression models Goodness of fit for the Cox model. Recap of Part 1. Per Kragh Andersen

Residuals and model diagnostics

UNIVERSITY OF CALIFORNIA, SAN DIEGO

( t) Cox regression part 2. Outline: Recapitulation. Estimation of cumulative hazards and survival probabilites. Ørnulf Borgan

A GENERALIZED ADDITIVE REGRESSION MODEL FOR SURVIVAL TIMES 1. By Thomas H. Scheike University of Copenhagen

Multistate models and recurrent event models

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

PhD course in Advanced survival analysis. One-sample tests. Properties. Idea: (ABGK, sect. V.1.1) Counting process N(t)

Survival analysis in R

SAMPLE SIZE ESTIMATION FOR SURVIVAL OUTCOMES IN CLUSTER-RANDOMIZED STUDIES WITH SMALL CLUSTER SIZES BIOMETRICS (JUNE 2000)

Efficient Semiparametric Estimators via Modified Profile Likelihood in Frailty & Accelerated-Failure Models

Longitudinal + Reliability = Joint Modeling

Consistency of bootstrap procedures for the nonparametric assessment of noninferiority with random censorship

Package Rsurrogate. October 20, 2016

Survival Analysis: Weeks 2-3. Lu Tian and Richard Olshen Stanford University

Empirical Likelihood in Survival Analysis

arxiv: v1 [stat.me] 5 Feb 2016

The changelos Package

Assessing the effect of a partly unobserved, exogenous, binary time-dependent covariate on -APPENDIX-

On the generalized maximum likelihood estimator of survival function under Koziol Green model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Lecture 5 Models and methods for recurrent event data

Journal of Statistical Software

Journal of Statistical Software

Technische Universität München. Estimating LTC Premiums using. GEEs for Pseudo-Values

On the Breslow estimator

4. Comparison of Two (K) Samples

Multivariate Survival Data With Censoring.

STAT331. Combining Martingales, Stochastic Integrals, and Applications to Logrank Test & Cox s Model

Notes largely based on Statistical Methods for Reliability Data by W.Q. Meeker and L. A. Escobar, Wiley, 1998 and on their class notes.

1 The problem of survival analysis

A nonparametric test for path dependence in discrete panel data

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

Survival analysis in R

Efficiency of Profile/Partial Likelihood in the Cox Model

Modelling geoadditive survival data

STAT Section 2.1: Basic Inference. Basic Definitions

57:022 Principles of Design II Midterm Exam #2 Solutions

Transcription:

Understanding product integration. A talk about teaching survival analysis. Jan Beyersmann, Arthur Allignol, Martin Schumacher. Freiburg, Germany DFG Research Unit FOR 534 jan@fdm.uni-freiburg.de It is product integration that switches from hazards to probabilities. Product integration is not unusually difficult, but notoriously neglected. This talk: Use R for approaching product integration. One R function for approximating the true survival function and for computing Kaplan-Meier. Generalizes to more complex models; e.g. useful for numerical approximation and simulation with time-dependent covariates. 1

Survival analysis is hazard-based. Alive Dead Survival time T, censoring time C: T C, 1(T C) The hazard is undisturbed by censoring: cumulative hazard A(t), hazard A(dt) = P (T dt T t) = P (T C dt, T C T C t) A(dt) estimated by increments of the Nelson-Aalen estimator: Â(dt) = # observed alive dead transitions at t # observed to be alive just prior t Kaplan-Meier is a deterministic function of the Nelson-Aalen estimator Â(dt), and we have ( 1 Â(dt i) ) ( P t ) exp A(du) = P (T > t) 0 t i t The convergence statement is not very intuitive. 2

Product integration π Recall A(du) = P (T < u + du T u). 1 A(du) = P (T u + du T u) Survival function P (T > t) = P (T t + dt) should be an infinite product over [0, t] of 1 A(du)-terms: S(t) = π t 0 (1 A(du)) K k=1 (1 A(t k )) for a partition (t k ) of [0, t] K k=1 P (T > t k T > t k 1 ), P (T > t) = exp ( t 0 A(du) ) : solution of a product integral. Kaplan-Meier is a product integral of the empirical hazards. Roadmap: Check this via R. Use exactly the same code for true survival function and Kaplan-Meier. 3

A simple R function for product integration Pass partition of [0, t] and cumulative hazard to prodint prodint <- function(time.points,a){ prod(1-diff(apply(x=matrix(times), MARGIN=1, FUN=A))) } E.g. exponential distribution with cumulative hazard A(t) = 0.9 t A.exp <- function(time.point){return(0.9*time.point)} on the time interval [0, 1]: > times <- seq(0,1,0.001) > prodint(times,a.exp);exp(-0.9*max(times)) [1] 0.4064049 [1] 0.4065697 The vector of time points does not have to be equally spaced: > prodint(runif(n=1000, min=0, max=1), A.exp) [1] 0.4063475 Conclusion: K k=1 (1 A(t k )) approaches S(t) and we write π t 0 (1 da(u)) for the limit. Can be tailored to return a survival function. 4

From Nelson-Aalen to Kaplan-Meier via product integration Recall: empirical hazard Â(dt) = # observed alive dead transitions at t # observed to be alive just prior t Nelson-Aalen estimator Â(dt) of the cumulative hazard. Kaplan-Meier is the product integral of one minus Nelson- Aalen: π Ŝ(t) = t ( 0 1 Â(du)) = ( 1 Â(dt k) ) t k t Continuous mapping theorem: Ŝ(t) = π t 0 ( 1 Â(du)) P π t 0 (1 A(du)) = S(t) Kaplan-Meier can be computed by prodint applied to Â(dt). 5

prodint computes Kaplan-Meier. 100 event times exp 0.9: event.times <- rexp(100,0.9) 100 censoring times cens.times u[0, 5]: runif(100,0,5) Observed times obs.times <- pmin(event.times, cens.times) About 24% of the observations censored. Compute Nelson-Aalen with mvna or fit.surv <- survfit(surv(obs.times,c(event.times<=cens.times))) A <- function(time.point){ sum(fit.surv$n.event[fit.surv$time <= time.point]/ fit.surv$n.risk[fit.surv$time <= time.point]) } and estimate the survival function at, e.g., time 1 > prodint(obs.times[obs.times<=1],a) [1] 0.4370994 Value of fit.surv$surv for time 1 is 0.4370994. 6

Why is product integration useful? Survival analysis is hazard-based. It is product integration that recovers both the underlying and the empirical distribution function. Properties of Nelson-Aalen estimator are easiest to study. Properties of product integration (continuity, Hadamard-differentiability) allow to transfer results to Kaplan-Meier: consistency, asymptotic distribution. Generalizes to quite complex models where Kaplan-Meier and the exp( cumulative hazard)-formula fail, but are often erroneously applied. 7

Matrix-valued product integration for multivariate hazards. Transient 1 One hazard per arrow! Transient 0 2 Absorbing Closed formulae for transition probabilities usually not available. Can be approximated using product integration. Can be estimated by applying product integration to multivariate Nelson-Aalen: Aalen-Johansen. R: packages mvna, etm, matrix-valued function prodint E.g. useful for time-dependent covariates: estimation, simulation. Standard assumptions: time-inhomogeneous Markov or random censoring. 8

A brief summary and some references Move from hazards to probabilities thru product integration both in the modelling and the empirical world. We can and should do this teaching survival analysis. Works in more complex models (incl. competing risks), avoiding hypothetical quantities. R. Gill and S. Johansen. A survey of product-integration with a view towards application in survival analysis. Annals of Statistics, 18(4):1501 1555, 1990. O. Aalen and S. Johansen, An empirical transition matrix for non-homogeneous Markov chains based on censored observations, Scand J Stat vol. 5 pp. 141 150, 1978. P. Andersen, Ø. Borgan, R. Gill, and N. Keiding.Statistical models based on counting processes. Springer, 1993. J. Beyersmann, T. Gerds, and M. Schumacher. Letter to the editor: comment on Illustrating the impact of a time-varying covariate with an extended Kaplan-Meier estimator by Steven Snapinn, Qi Jiang, and Boris Iglewicz in the November 2005 issue of The American Statistician. The American Statistician, 60(30):295 296, 2006. Arthur s talk on mvna. 9