Lecture 2: Martingale theory for univariate survival analysis


In this lecture T is assumed to be a continuous failure time. The core question is how to develop asymptotic properties for statistical methods applied to univariate survival data. Two main tools are available:
- Empirical process approach: a general tool for asymptotic theory.
- Martingale theory: enjoys some advantages, notably variance simplifications for right-censored data; widely used!

2.1 Notation

- f(t): density function of T
- F(t) = P(T ≤ t) = ∫_0^t f(u)du: cumulative distribution function
- S(t) = 1 − F(t) = ∫_t^∞ f(u)du: survival function of T
- S(t) = exp{−Λ(t)}, where Λ(t) = −log S(t) is the cumulative hazard function
- λ(t) = Λ'(t): hazard function
- C: censoring time

More notation:

- i = 1, 2, ..., n: index for subjects
- X_i = min(T_i, C_i): observed survival time (possibly censored)
- δ_i = I(T_i ≤ C_i): censoring indicator
- X_(1), X_(2), ..., X_(k): ordered uncensored times
- R(t) = {(X_j, δ_j) : X_j ≥ t, j = 1, 2, ..., n}: risk set at t
- Y_i(t) = I(X_i ≥ t): at-risk indicator
- Y(t) = Σ_{i=1}^n Y_i(t): total number of subjects at risk at t
- Z_i or Z_i(t): (possibly time-varying) covariates
- N_i(t) = I(X_i ≤ t, δ_i = 1): counts subject i's failure event up to time t
- N(t) = Σ_{i=1}^n N_i(t): total number of observed events up to time t
- {(X_i, δ_i, Z_i) : i = 1, 2, ..., n}: collected data
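To fix ideas, here is a minimal sketch (my own, not from the lecture) that realizes this notation on simulated data, assuming T ~ Exp(1) and C ~ Exp(1/2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
T = rng.exponential(1.0, n)       # failure times, hazard lambda(t) = 1
C = rng.exponential(2.0, n)       # censoring times, C ~ Exp(rate 1/2)
X = np.minimum(T, C)              # observed (possibly censored) times
delta = (T <= C).astype(int)      # censoring indicator

def Y(t):
    """Total number at risk at t: sum_i I(X_i >= t)."""
    return np.sum(X >= t)

def N(t):
    """Total observed failures up to t: sum_i I(X_i <= t, delta_i = 1)."""
    return np.sum((X <= t) & (delta == 1))

print(Y(1.0), N(1.0))
```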

2.2 Martingale theory: Initial setting

Probability space (Ω, F, P):
- Ω: sample space
- F: a class of subsets of Ω; the class is a σ-algebra
- P: a probability measure on (Ω, F)

Conditional expectation E[Y|X]:
1. E[Y|X] is σ(X)-measurable
2. E[Y|X] = E[Y|σ(X)]: the average prediction of Y given all the information on X
3. E[Y] = E[E[Y|X]]

Stochastic process {X(t); 0 ≤ t ≤ τ} on (Ω, F, P):
1. a collection of random variables indexed by t ∈ [0, τ]
2. for a fixed sample point ω ∈ Ω, X(t; ω) is a sample path as a function of t; for convenience, we usually write X(t) instead of X(t; ω)

- Define F_t as the σ-field generated by {X(u) : 0 ≤ u ≤ t}: the information about X(·) from 0 to t.
- {W(t); t ≥ 0} is adapted to F_t if W(t) is F_t-measurable.
- Define F as the collection of F_t, 0 ≤ t ≤ τ. It is called the filtration of the underlying probability space.

A stochastic process M(t) is a martingale with respect to the stochastic process X(t), 0 ≤ t ≤ τ, if
- M(t) is adapted to the filtration F: for each t, the random variable M(t) is F_t-measurable
- E|M(t)| < ∞
- for s ≥ 0, E[M(t + s) | F_t] = M(t) (fair game)

Under the same setting, for s ≥ 0:
- M(t) is a submartingale if E[M(t + s) | F_t] ≥ M(t) (winning)
- M(t) is a supermartingale if E[M(t + s) | F_t] ≤ M(t) (losing)

If M(t) is a martingale, then
- E[dM(t)] = 0
- if E[M(0)] = 0, then E[M(s)] = 0 for all s ≥ 0

Predictable process W(t):
- W(t) is F_{t-}-measurable
- E[W(t) | F_{t-}] = W(t)
- a left-continuous process adapted to F_t is predictable
- example: the at-risk process satisfies E[Y(t) | F_{t-}] = Y(t)

2.3 Doob-Meyer Decomposition Theorem

Doob-Meyer Decomposition Theorem. If X(t) is an adapted, right-continuous, non-negative submartingale, then there exists a unique left-continuous, increasing, predictable process A(t) such that A(0) = 0, E[A(t)] < ∞, and

Q(t) = X(t) − A(t) is a martingale.

A(t) is called the compensator, and dA(t) = E[dX(t) | F_{t-}].

Example.
- X(t) = N_i(t) is a submartingale.
- Y_i(t) = I(X_i ≥ t) is left-continuous with right-hand limits and is a predictable process.
- Define the predictable process A(t) = ∫_0^t Y_i(u)λ(u)du.
- Then dA(t) = Y_i(t)λ(t)dt = E[dN_i(t) | F_{t-}].
- Q(t) = M_i(t) = N_i(t) − ∫_0^t Y_i(u)λ(u)du is a martingale.
- A more direct proof that M_i(t) is a martingale is given in the next few slides; a simulation check appears below.
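As a sanity check on the compensator, the following small Monte Carlo sketch (my own illustration, assuming the Exp(1)/Exp(1/2) setup introduced earlier) verifies that E[M_i(t)] ≈ 0 at several time points:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
T = rng.exponential(1.0, n)
C = rng.exponential(2.0, n)
X = np.minimum(T, C)
delta = (T <= C).astype(int)

# With lambda(u) = 1, the compensator int_0^t Y_i(u) du equals min(X_i, t).
for t in [0.5, 1.0, 2.0]:
    N_t = ((X <= t) & (delta == 1)).astype(float)   # N_i(t)
    A_t = np.minimum(X, t)                          # compensator A_i(t)
    print(f"t = {t}: mean of M_i(t) = {np.mean(N_t - A_t):+.4f}")  # ~ 0
```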

Consider a Hilbert space H, a complete metric space with respect to the distance induced by the inner product ⟨X, Y⟩ = E[XY]. The norm of X ∈ H based on the inner product ⟨·,·⟩ is ‖X‖ = ⟨X, X⟩^{1/2} = (E[X^2])^{1/2}.

- martingale: M(t) = N(t) − ∫_0^t Y(u)λ(u)du
- variance: var[M(t)] = E[M(t)^2] = E[⟨M, M⟩(t)]
- define: d⟨M, M⟩(t) = E[(dM(t))^2 | F_{t-}]

2.4 Variance and covariance of martingales

Calculation of E[dN_i(t) | F_{t-}] and E[dM_i(t) | F_{t-}]:

dN_i(t) = 0 or 1, so E[dN_i(t) | F_{t-}] = Pr(dN_i(t) = 1 | F_{t-}).
- Given F_{t-}, Y_i(t) is known (predictable).
- If Y_i(t) = 0 (i.e., subject i has failed or been censored before t), then Pr(dN_i(t) = 1 | F_{t-}) = 0.
- If Y_i(t) = 1 (i.e., subject i is at risk at t), then Pr(dN_i(t) = 1 | F_{t-}) = λ(t)dt under independent censoring. Exercise: prove this result.

Therefore

E[dN_i(t) | F_{t-}] = Y_i(t)λ(t)dt,
E[{dN_i(t) − Y_i(t)λ(t)dt} | F_{t-}] = E[dM_i(t) | F_{t-}] = 0.

Claim: M_i(t) = N_i(t) − ∫_0^t Y_i(u)λ(u)du is a martingale.

Proof.
- M_i(t) is adapted to F.
- E|M_i(t)| < ∞.
- For s ≥ 0,

E[M_i(t + s) | F_t]
= E[ ∫_t^{t+s} {dN_i(u) − Y_i(u)λ(u)du} | F_t ] + M_i(t)
= ∫_t^{t+s} E[ dN_i(u) − Y_i(u)λ(u)du | F_t ] + M_i(t)
= ∫_t^{t+s} E[ E[dN_i(u) − Y_i(u)λ(u)du | F_{u-}] | F_t ] + M_i(t)
= ∫_t^{t+s} E[0 | F_t] + M_i(t) = M_i(t).

By the same argument, M(t) = N(t) − ∫_0^t Y(u)λ(u)du is a martingale.

From now on, we use the martingales

M_i(t) = N_i(t) − ∫_0^t Y_i(u)λ(u)du,
M(t) = N(t) − ∫_0^t Y(u)λ(u)du.

Note that
- E[M_i(t)] = 0 for 0 ≤ t ≤ τ; E[dM_i(t) | F_{t-}] = 0 for 0 < t ≤ τ
- E[M(t)] = 0 for 0 ≤ t ≤ τ; E[dM(t) | F_{t-}] = 0 for 0 < t ≤ τ

Calculation of variance and covariance: d⟨M_i, M_j⟩(t).

d⟨M_i, M_j⟩(t) = E[d(M_i(t)M_j(t)) | F_{t-}]

d[M_i(t)M_j(t)] = M_i(t)M_j(t) − M_i(t-)M_j(t-)
= [M_i(t-) + dM_i(t)][M_j(t-) + dM_j(t)] − M_i(t-)M_j(t-)
= M_j(t-)dM_i(t) + M_i(t-)dM_j(t) + dM_i(t)dM_j(t)

Thus,

E[d(M_i(t)M_j(t)) | F_{t-}] = E[dM_i(t)dM_j(t) | F_{t-}] = cov[dM_i(t), dM_j(t) | F_{t-}].

Note: when M_i(t) = M_j(t),

d⟨M_i, M_i⟩(t) = E[(dM_i(t))^2 | F_{t-}] = var[dM_i(t) | F_{t-}].

Claim: d⟨M_i, M_i⟩(t) = Y_i(t)λ(t)dt + o_p(dt).

- dM_i(t) = dN_i(t) − Y_i(t)λ(t)dt
- Given F_{t-}, Y_i(t)λ(t)dt is a constant term, so

var[dM_i(t) | F_{t-}] = var[dN_i(t) | F_{t-}]
= E[(dN_i(t))^2 | F_{t-}] − {E[dN_i(t) | F_{t-}]}^2
= E[dN_i(t) | F_{t-}] − {E[dN_i(t) | F_{t-}]}^2    (since dN_i(t) = 0 or 1)
= Y_i(t)λ(t)dt − Y_i(t)[λ(t)dt]^2
= Y_i(t)λ(t)dt + o_p(dt).

Thus d⟨M_i, M_i⟩(t) = var[dM_i(t) | F_{t-}] = Y_i(t)λ(t)dt + o_p(dt); this is the predictable variation. (Remark: o_p(dt)/dt → 0 as dt → 0.)

Claim: if j ≠ k, d⟨M_j, M_k⟩(t) = o_p(dt).

With M_j(t) = N_j(t) − ∫_0^t Y_j(u)λ(u)du and M_k(t) = N_k(t) − ∫_0^t Y_k(u)λ(u)du,

d⟨M_j, M_k⟩(t) = cov[dM_j(t), dM_k(t) | F_{t-}]
= E[(dN_j(t) − Y_j(t)λ(t)dt)(dN_k(t) − Y_k(t)λ(t)dt) | F_{t-}]
= E[dN_j(t)dN_k(t) | F_{t-}] − Y_j(t)λ(t)dt · E[dN_k(t) | F_{t-}] − E[dN_j(t) | F_{t-}] · Y_k(t)λ(t)dt + Y_j(t)Y_k(t)(λ(t)dt)^2
= E[dN_j(t)dN_k(t) | F_{t-}] − Y_j(t)Y_k(t)(λ(t)dt)^2
= E[dN_j(t)dN_k(t) | F_{t-}] + o_p(dt).

Note that E[dN_j(t)dN_k(t) | F_{t-}] = 0 as long as N_j(t) and N_k(t) do not jump at the same t (with positive probability). Thus, for continuous failure time models, E[dN_j(t)dN_k(t) | F_{t-}] = 0.

2.5 Martingale Central Limit Theorem (MCLT)

Property. Suppose M(t) is a martingale and H(t) is predictable. Then L(t) = ∫_0^t H(u)dM(u) is a martingale with respect to F_t.

Recall the Nelson-Aalen estimator ˆΛ(t) = ∫_0^t dN(u)/Y(u). Then

U_n(t) = √n [ˆΛ(t) − Λ(t)]
= ∫_0^t (√n / Y(u)) Σ_{i=1}^n [dN_i(u) − Y_i(u)λ(u)du]
= ∫_0^t (√n / Y(u)) Σ_{i=1}^n dM_i(u)
= ∫_0^t (√n / Y(u)) dM(u)

for t < τ = sup_t {t : Pr(X ≥ t) > 0}.
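A direct transcription of ˆΛ(t) into code; this is my own sketch under the same simulated Exp(1)/Exp(1/2) setup, not part of the lecture:

```python
import numpy as np

def nelson_aalen(X, delta, t):
    """Lambda_hat(t) = sum over event times u <= t of dN(u) / Y(u)."""
    event_times = np.unique(X[(delta == 1) & (X <= t)])
    return sum(np.sum((X == u) & (delta == 1)) / np.sum(X >= u)
               for u in event_times)

rng = np.random.default_rng(2)
T = rng.exponential(1.0, 2000)
C = rng.exponential(2.0, 2000)
X, delta = np.minimum(T, C), (T <= C).astype(int)
print(nelson_aalen(X, delta, 1.0))   # close to Lambda(1) = 1 for Exp(1)
```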

What is the limiting distribution of U_n(t)? An application of the MCLT.

Assumptions:
1. convergent variance: ⟨U_n, U_n⟩(t) → v(t)
2. U_n(t) is smooth (condition skipped)

Results:
1. U_n(t) converges weakly to U(t)
2. U(t) is a (time-transformed) Brownian motion:
   - E[U(t)] = 0
   - var[U(t)] = v(t) = lim_{n→∞} ⟨U_n, U_n⟩(t)
   - independent increments: U(s) and U(t) − U(s) are independent for s ≤ t

Using the variance and covariance calculations above:

- if u < v,
E[ (√n/Y(u))dM_i(u) · (√n/Y(v))dM_j(v) | F_{v-} ]
= (√n/Y(u))dM_i(u) · E[ (√n/Y(v))dM_j(v) | F_{v-} ] = 0

- E[ (√n/Y(u))dM_i(u) · (√n/Y(u))dM_i(u) | F_{u-} ]
= (n/Y(u)^2) var[dM_i(u) | F_{u-}] = (n/Y(u)^2) Y_i(u)λ(u)du

Also,

⟨U_n, U_n⟩(t) = ∫_0^t (n/Y(u)^2) Σ_{i=1}^n Y_i(u)λ(u)du
= ∫_0^t (n/Y(u)^2) Y(u)λ(u)du
= ∫_0^t λ(u)du / (Y(u)/n)
→ ∫_0^t λ(u)du / E[Y_1(u)] = v(t),

where v(t) = ∫_0^t λ(u)du / {S_C(u)S(u)}, since Y(u)/n → E[Y_1(u)] = Pr(X ≥ u) = S_C(u)S(u).
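A Monte Carlo check of this variance formula (my own sketch; it assumes T ~ Exp(1) and C ~ Exp(1/2), for which v(t) = ∫_0^t e^{1.5u}du = (e^{1.5t} − 1)/1.5 ≈ 2.32 at t = 1):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, t = 200, 1000, 1.0
stats = []
for _ in range(reps):
    T = rng.exponential(1.0, n)
    C = rng.exponential(2.0, n)
    X, delta = np.minimum(T, C), (T <= C).astype(int)
    u = np.sort(X[(delta == 1) & (X <= t)])
    na = np.sum(1.0 / np.array([np.sum(X >= s) for s in u]))  # Lambda_hat(t)
    stats.append(np.sqrt(n) * (na - t))     # Lambda(t) = t for Exp(1)
print(np.var(stats), (np.exp(1.5 * t) - 1.0) / 1.5)  # both roughly 2.32
```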

Thus √n[ˆΛ(t) − Λ(t)] converges weakly to a Brownian motion U(t) with E[U(t)] = 0 and var[U(t)] = ∫_0^t λ(u)du / {S_C(u)S(u)}.

Note that S(t) = e^{−Λ(t)}, and the Kaplan-Meier estimator satisfies

Ŝ(t) = e^{−∫_0^t dN(u)/Y(u)} + o_p(n^{−1/2})

(reference: Breslow and Crowley, 1974). By the functional delta method,

√n[Ŝ(t) − S(t)] = √n[e^{−ˆΛ(t)} − e^{−Λ(t)}] + o_p(1) →_d −S(t)·U(t) (weak convergence).
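The Breslow-Crowley approximation is easy to see numerically. A sketch of my own (same simulated setup as before) comparing the product-limit estimator with exp(−Nelson-Aalen):

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 2000, 1.0
T = rng.exponential(1.0, n)
C = rng.exponential(2.0, n)
X, delta = np.minimum(T, C), (T <= C).astype(int)

u = np.unique(X[(delta == 1) & (X <= t)])                   # event times <= t
d = np.array([np.sum((X == s) & (delta == 1)) for s in u])  # dN(u)
Y = np.array([np.sum(X >= s) for s in u])                   # Y(u)
km = np.prod(1.0 - d / Y)        # product-limit (Kaplan-Meier)
na = np.exp(-np.sum(d / Y))      # exp(-Nelson-Aalen)
print(km, na, np.exp(-t))        # both near S(1) = e^{-1}
```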

Results:
1. √n[Ŝ(t) − S(t)] converges weakly to V(t) = −S(t)·U(t)
2. V(t) is a mean-zero Gaussian process:
   E[V(t)] = 0,
   var[V(t)] = [S(t)]^2 ∫_0^t λ(u)du / {S_C(u)S(u)}

Remark:

1. Estimation of var[V(t)]. Let F_U*(v) = Pr(X ≤ v, δ = 1) denote the subdistribution function of the uncensored times and S_X(v) = S(v)S_C(v) the survival function of X. Then

var[V(t)] = [S(t)]^2 ∫_0^t dF_U*(v) / {S_X(v) S_C(v)S(v)} = [S(t)]^2 ∫_0^t dF_U*(v) / S_X^2(v).

Thus, by plugging in the Kaplan-Meier and empirical distribution estimators, var[V(t)] can be estimated by

vâr[V(t)] = [Ŝ(t)]^2 ∫_0^t dF̂_U*(v) / Ŝ_X^2(v),

which is approximately Greenwood's formula:

ṽar(V(t)) = [Ŝ(t)]^2 ∫_0^t n^{−1}dN(u) / { (Y(u)/n) · ((Y(u) − ΔN(u))/n) } = [Ŝ(t)]^2 ∫_0^t dF̂_U*(u) / {Ŝ_X(u−)Ŝ_X(u+)}.
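A short sketch of my own implementing Greenwood's formula on the V(t) = √n(Ŝ(t) − S(t)) scale; the n factor reflects that scaling, and the reported standard error of Ŝ(t) itself is √(var_V/n):

```python
import numpy as np

def km_greenwood(X, delta, t):
    """Kaplan-Meier estimate S_hat(t) and the Greenwood estimate of
    var[V(t)] for V(t) = sqrt(n) * (S_hat(t) - S(t))."""
    n = len(X)
    u = np.unique(X[(delta == 1) & (X <= t)])                   # event times
    d = np.array([np.sum((X == s) & (delta == 1)) for s in u])  # dN(u)
    Y = np.array([np.sum(X >= s) for s in u])                   # Y(u)
    S = np.prod(1.0 - d / Y)
    var_V = n * S**2 * np.sum(d / (Y * (Y - d)))
    return S, var_V

rng = np.random.default_rng(6)
T = rng.exponential(1.0, 1000)
C = rng.exponential(2.0, 1000)
X, delta = np.minimum(T, C), (T <= C).astype(int)
S_hat, var_V = km_greenwood(X, delta, 1.0)
print(S_hat, np.sqrt(var_V / len(X)))   # estimate and SE of S_hat(1)
```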

2. Both the survival and the censoring distributions play roles in var[V(t)]. Note: the censoring distribution does not appear in the likelihood function, but it does play a role, implicitly, in estimation and inference.

2.6 Proportional hazards model

Now consider the proportional hazards model

λ(t|Z) = λ_0(t) exp(βZ),

where Z is a one-dimensional covariate. For simplicity of notation we consider time-independent covariates only; the results extend to the time-dependent-covariate case.

The full likelihood factors as L = ∏_(i) p(Z_(i) | H(x_(i)), x_(i)) × {a residual likelihood}, where H(t−) denotes the data history prior to t. The partial likelihood keeps only the first factor:

L_p = ∏_(i) p(Z_(i) | H(x_(i)), x_(i)) = ∏_{i=1}^n ( e^{βZ_i} / Σ_{j=1}^n Y_j(X_i)e^{βZ_j} )^{δ_i}.

Partial score equation:

S_n(β) = Σ_{i=1}^n ∫ {Z_i − Z̄(u; β)} dN_i(u), where
Z̄(u; β) = Σ_i Z_iY_i(u)exp(βZ_i) / Σ_i Y_i(u)exp(βZ_i)

(the expected value of Z_i in the risk set at u). The partial likelihood estimator ˆβ is derived by solving S_n(β) = 0; see the sketch below.
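A sketch of my own (not the lecture's code) of the partial score S_n(β) = Σ_{i: δ_i=1} {Z_i − Z̄(X_i; β)}, with ˆβ found by bracketing the root; the bracket (−3, 3) and the simulated design are assumptions of the illustration:

```python
import numpy as np
from scipy.optimize import brentq

def partial_score(beta, X, delta, Z):
    s = 0.0
    for i in np.where(delta == 1)[0]:
        at_risk = X >= X[i]                         # risk set at X_i
        w = np.exp(beta * Z[at_risk])
        zbar = np.sum(Z[at_risk] * w) / np.sum(w)   # Zbar(X_i; beta)
        s += Z[i] - zbar
    return s

rng = np.random.default_rng(8)
n, beta_true = 300, 0.5
Z = rng.integers(0, 2, n)
T = rng.exponential(np.exp(-beta_true * Z))     # lambda(t|Z) = e^{beta Z}
C = rng.exponential(2.0, n)
X, delta = np.minimum(T, C), (T <= C).astype(int)
print(brentq(lambda b: partial_score(b, X, delta, Z), -3.0, 3.0))  # ~ 0.5
```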

A representation of S_n(β):

S_n(t; β) = Σ_{i=1}^n ∫_0^t {Z_i − Z̄(u; β)} dN_i(u)
= Σ_{i=1}^n ∫_0^t {Z_i − Z̄(u; β)} [dN_i(u) − Y_i(u)exp(βZ_i)λ_0(u)du]
  + Σ_{i=1}^n ∫_0^t {Z_i − Z̄(u; β)} Y_i(u)exp(βZ_i)λ_0(u)du
= Σ_{i=1}^n ∫_0^t {Z_i − Z̄(u; β)} dM_i(u; β)
  + ∫_0^t { Σ_{i=1}^n Z_iY_i(u)exp(βZ_i) − Σ_{i=1}^n Y_i(u)exp(βZ_i) · Z̄(u; β) } λ_0(u)du
= Σ_{i=1}^n ∫_0^t {Z_i − Z̄(u; β)} dM_i(u; β),    (*)

since the second term vanishes by the definition of Z̄(u; β).

Let U_n(t) = n^{−1/2} S_n(t; β). Then under suitable regularity conditions,

⟨U_n, U_n⟩(t) = (1/n) Σ_{i=1}^n ∫_0^t {Z_i − Z̄(u)}^2 Y_i(u)exp(βZ_i)λ_0(u)du
→ ∫_0^t E[{Z − μ_Z(u)}^2 Y(u)exp(βZ)] λ_0(u)du,

where μ_Z(u) denotes the limit in probability of Z̄(u; β).

Properties of S_n(t; β):
1. E[S_n(t; β)] = 0
2. using representation (*), the independent-increment property: for s ≤ t, cov[S_n(s; β), S_n(t; β) − S_n(s; β)] = 0
3. var[n^{−1/2} S_n(t; β)] = E[⟨U_n, U_n⟩(t)]

The partial likelihood estimator ˆβ_n is consistent (proof skipped).

Asymptotic normality. Recall the partial score S_n(t; β) = Σ_i ∫_0^t {Z_i − Z̄(u; β)} dN_i(u). A Taylor (mean value) expansion gives, for β_n* lying between ˆβ_n and β,

0 = S_n(t; ˆβ_n) = S_n(t; β) + [∂S_n(β)/∂β]_{β=β_n*} (ˆβ_n − β).

Thus

n^{1/2}(ˆβ_n − β) = { −n^{−1} [∂S_n(β)/∂β]_{β=β_n*} }^{−1} · n^{−1/2} S_n(β),

and n^{−1/2} S_n(β) →_d N(0, σ^2).

Slope:

∂S_n(β)/∂β = − Σ_{i=1}^n ∫ [ Σ_{j=1}^n {Z_j − Z̄(u; β)}^2 Y_j(u)exp(βZ_j) / Σ_{j=1}^n Y_j(u)exp(βZ_j) ] dN_i(u).

Since β_n* → β,

−n^{−1} [∂S_n(β)/∂β]_{β=β_n*} → ∫ ( E[{Z − μ_Z(u)}^2 Y(u)exp(βZ)] / E[Y(u)exp(βZ)] ) dP(X ≤ u, δ = 1) = σ^2.

Thus, by likelihood-type arguments,

√n(ˆβ_n − β) →_d N(0, σ^{−2}).

Therefore, the standardized version is

(ˆβ_n − β) · [ Σ_{i=1}^n ∫ ( Σ_{j=1}^n {Z_j − Z̄(u; ˆβ_n)}^2 exp(ˆβ_nZ_j)Y_j(u) / Σ_{j=1}^n exp(ˆβ_nZ_j)Y_j(u) ) dN_i(u) ]^{1/2} →_D N(0, 1).

Estimation of σ^2:

σ̂^2 = n^{−1} Σ_{i=1}^n δ_i · Σ_{j=1}^n {Z_j − Z̄(X_i; ˆβ_n)}^2 Y_j(X_i)exp(ˆβ_nZ_j) / Σ_{j=1}^n Y_j(X_i)exp(ˆβ_nZ_j).

Denote

S_Z(u; β) = Σ_{j=1}^n {Z_j − Z̄(u)}^2 exp(βZ_j)Y_j(u) / Σ_{j=1}^n exp(βZ_j)Y_j(u).

A 95% confidence interval for β is

ˆβ_n ± 1.96 [ Σ_{i=1}^n ∫ S_Z(u; ˆβ_n) dN_i(u) ]^{−1/2}.
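A sketch (mine) of this plug-in variance: the observed information Î(β) = Σ_i ∫ S_Z(u; β) dN_i(u) reduces to summing S_Z(X_i; β) over the observed failures, and the 95% CI is then ˆβ_n ± 1.96 · Î(ˆβ_n)^{−1/2}:

```python
import numpy as np

def observed_information(beta, X, delta, Z):
    """I_hat(beta) = sum_i int S_Z(u; beta) dN_i(u)."""
    info = 0.0
    for i in np.where(delta == 1)[0]:
        at_risk = X >= X[i]
        w = np.exp(beta * Z[at_risk])
        zbar = np.sum(Z[at_risk] * w) / np.sum(w)
        info += np.sum((Z[at_risk] - zbar) ** 2 * w) / np.sum(w)  # S_Z(X_i)
    return info

# With beta_hat from the score sketch above:
#   half_width = 1.96 / np.sqrt(observed_information(beta_hat, X, delta, Z))
#   ci = (beta_hat - half_width, beta_hat + half_width)
```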

Hypothesis testing of H_0: β = β_0; reject H_0 at 5% type-I error:

Wald's test:

|ˆβ_n − β_0| · [ Σ_{i=1}^n ∫ S_Z(u; ˆβ_n) dN_i(u) ]^{1/2} ≥ 1.96.

The partial likelihood score test:

|n^{−1/2} S_n(β_0)| / {var[n^{−1/2} S_n(β_0)]}^{1/2} ≥ 1.96, i.e.,

| Σ_{i=1}^n ∫ {Z_i − Z̄(u; β_0)} dN_i(u) | / [ Σ_{i=1}^n ∫ S_Z(u; β_0) dN_i(u) ]^{1/2} ≥ 1.96.

What if β_0 = 0 and Z ∈ {0, 1} in the partial likelihood score test

| Σ_{i=1}^n ∫ {Z_i − Z̄(u; β_0)} dN_i(u) | / [ Σ_{i=1}^n ∫ S_Z(u; β_0) dN_i(u) ]^{1/2} ?

Then

S_Z(u; β_0) = Σ_{j=1}^n {Z_j − Z̄(u)}^2 exp(β_0Z_j)Y_j(u) / Σ_{j=1}^n exp(β_0Z_j)Y_j(u) = Z̄(u){1 − Z̄(u)},

where Z̄(u) is the fraction of treated subjects in the risk set at u.

Note: The partial likelihood score test is the log-rank test with a more general variance estimator. This variance estimator coincides with that of the log-rank test when the failure time T is continuous. A two-sample sketch follows below.
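A two-sample sketch of my own: with binary Z and β_0 = 0, the standardized score below is the log-rank statistic, and Z̄(u) is simply the fraction of treated subjects at risk (the simulated treatment effect is an assumption of the example):

```python
import numpy as np

def logrank_score_test(X, delta, Z):
    """Standardized partial likelihood score at beta_0 = 0, binary Z."""
    num, var = 0.0, 0.0
    for i in np.where(delta == 1)[0]:
        at_risk = X >= X[i]
        zbar = Z[at_risk].mean()       # fraction of treated at risk
        num += Z[i] - zbar
        var += zbar * (1.0 - zbar)     # S_Z(u; 0) for binary Z
    return num / np.sqrt(var)          # compare |statistic| with 1.96

rng = np.random.default_rng(4)
n = 400
Z = rng.integers(0, 2, n)
T = rng.exponential(np.where(Z == 1, 0.7, 1.0))  # treated fail faster
C = rng.exponential(2.0, n)
X, delta = np.minimum(T, C), (T <= C).astype(int)
print(logrank_score_test(X, delta, Z))
```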

Proportional hazards model for multivariate covariates Z ∈ R^p:

λ(t|Z) = λ_0(t) exp(β^T Z), where β = (β_1, ..., β_p)^T is a p-vector.

Partial score function:

S_n(β) = Σ_{i=1}^n ∫ {Z_i − Z̄(u; β)} dN_i(u),

where Z_i = (Z_{i1}, Z_{i2}, ..., Z_{ip})^T and Z̄(u; β) = (Z̄_1(u; β), Z̄_2(u; β), ..., Z̄_p(u; β))^T are p × 1 vectors.

n^{−1/2} S_n(β) →_D N(0, Σ(β)), where Σ(β) is the p × p limit of

n^{−1} Σ_{i=1}^n ∫ [ Σ_{j=1}^n {Z_j − Z̄}^{⊗2} exp(β^TZ_j)Y_j(u) / Σ_{j=1}^n exp(β^TZ_j)Y_j(u) ] dN_i(u),

where v^{⊗0} = 1, v^{⊗1} = v, v^{⊗2} = vv^T.

Partial derivative: −n^{−1} [∂S_n(β)/∂β]_{β=β_n*} → Σ(β), so

n^{1/2}(ˆβ_n − β) →_D N(0, Σ^{−1}(β)).

Estimation of the baseline hazard function (Breslow estimator):

ˆΛ_0(t; ˆβ_n) = ∫_0^t Σ_i dN_i(u) / Σ_i Y_i(u)exp(ˆβ_n^T Z_i) = ∫_0^t dN(u) / Σ_i Y_i(u)exp(ˆβ_n^T Z_i).

It is consistent. Asymptotic normality: n^{1/2}[ˆΛ_0(t; ˆβ_n) − Λ_0(t)] equals

n^{1/2}[ˆΛ_0(t; β) − Λ_0(t)] + n^{1/2}[ˆΛ_0(t; ˆβ_n) − ˆΛ_0(t; β)] = Term I + Term II.
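A Breslow-estimator sketch of my own for the one-dimensional model; the usage comment assumes the simulated data and ˆβ from the earlier sketches:

```python
import numpy as np

def breslow(beta_hat, X, delta, Z, t):
    """Lambda0_hat(t; beta_hat) = sum over failures X_i <= t of
    dN(X_i) / sum_j Y_j(X_i) * exp(beta_hat * Z_j)."""
    lam0 = 0.0
    for i in np.where((delta == 1) & (X <= t))[0]:
        lam0 += 1.0 / np.sum(np.exp(beta_hat * Z[X >= X[i]]))
    return lam0

# usage: with data generated under lambda(t|Z) = e^{beta Z} (baseline Exp(1)),
# breslow(beta_hat, X, delta, Z, 1.0) should be close to Lambda_0(1) = 1.
```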

Term I:

n^{1/2} Σ_{i=1}^n ∫_0^t dM_i(u; β) / Σ_j exp(β^TZ_j)Y_j(u),

which is a sum of martingale residuals with predictable variation

n Σ_{i=1}^n ∫_0^t Y_i(u)exp(β^TZ_i)λ_0(u)du / [Σ_j exp(β^TZ_j)Y_j(u)]^2 → ∫_0^t λ_0(u)du / E[exp(β^TZ)Y(u)] = σ_1^2(t).

Term II: a mean value expansion gives

n^{1/2}[ˆΛ_0(t; ˆβ_n) − ˆΛ_0(t; β)]
= −n^{1/2}(ˆβ_n − β) ∫_0^t [ Σ_i exp(β_n*^TZ_i)Z_iY_i(u) / {Σ_i exp(β_n*^TZ_i)Y_i(u)}^2 ] dN(u),

and the integral converges to a deterministic function μ(t).

By representation (*),

n^{1/2}(ˆβ_n − β) ≈ σ^{−2} · n^{−1/2} Σ_i ∫ {Z_i − Z̄(u)} dM_i(u; β) →_D N(0, σ^{−2}).

The covariance between Term I and Term II therefore reduces to a predictable covariation:

⟨ Σ_i ∫_0^t dM_i(u; β) / Σ_j exp(β^TZ_j)Y_j(u),  Σ_i ∫_0^t {Z_i − Z̄(u)} dM_i(u; β) ⟩
= Σ_i ∫_0^t {Z_i − Z̄(u)} exp(β^TZ_i)Y_i(u) λ_0(u)du / Σ_j exp(β^TZ_j)Y_j(u) = 0,

since Σ_i {Z_i − Z̄(u)}exp(β^TZ_i)Y_i(u) = 0 by the definition of Z̄(u).

Hence Term I and Term II are asymptotically uncorrelated, and

n^{1/2}[ˆΛ_0(t; ˆβ_n) − Λ_0(t)] →_D N(0, σ_1^2(t) + μ(t)^2 σ^{−2}),

where σ^{−2} is the asymptotic variance of n^{1/2}(ˆβ_n − β).

Appendix: Consistency of the K-M estimator

We consider the general case in which the failure time T may be partly continuous and partly discrete. Recall S(t) = P{T ≥ t}; define S̄(t) = 1 − S(t) = P{T < t}, and similarly S̄_C(t) = 1 − S_C(t). Let Y = min(T, C) denote the observed time, with survival function

S*(t) = S_Y(t) = P{Y ≥ t} = [S(t)][S_C(t)].

Define the subsurvival functions

S_U*(t) = P{Y ≥ t, δ = 1} = ∫_t^∞ [S_C(u)] dS̄(u),
S_C*(t) = P{Y ≥ t, δ = 0} = ∫_t^∞ [S(u)] dS̄_C(u).

Then S*(t) = S_U*(t) + S_C*(t).

We will show that S(t) can be expressed as a function of S_U*(t) and S_C*(t).

(i) Suppose S_U* is continuous at all u ≤ t. Then

∫_0^t dS_U*(u) / {S_U*(u) + S_C*(u)}
= −∫_0^t S_C(u) dS̄(u) / {S(u)S_C(u)}
= ∫_0^t dS(u) / S(u)
= log S(u) |_0^t = log S(t).

Therefore,

S(t) = exp{ ∫_0^t dS_U*(u) / [S_U*(u) + S_C*(u)] }.

(ii) Suppose S_U* has a jump at t, but S_C* is continuous at t. Then

log [ {S_U*(t+) + S_C*(t+)} / {S_U*(t−) + S_C*(t−)} ]
= log [ S(t+)S_C(t+) / {S(t−)S_C(t−)} ]
= log [ S(t+) / S(t−) ].

(The second equality follows from the fact that S_C is continuous at t, so S_C(t+) = S_C(t−).)

Therefore,

S(t+) / S(t) = S(t+) / S(t−) = exp{ log [ {S_U*(t+) + S_C*(t+)} / {S_U*(t−) + S_C*(t−)} ] }.

(The first equality is due to the left continuity of the survival function.)

If the underlying distributions S and S_C have no common jumps, then from (i) and (ii),

S(t) = exp{ ∫_c dS_U*(u) / [S_U*(u) + S_C*(u)] + Σ_{d, u<t} log [ {S_U*(u+) + S_C*(u+)} / {S_U*(u−) + S_C*(u−)} ] },

where ∫_c denotes integration over the continuity intervals of S_U* (restricted to u < t) and Σ_d denotes summation over the discrete jumps of S_U* before t. The above expression is called Peterson's representation; it shows that S(t) can be represented as a function of S_U*, S_C*, and t, that is, S(t) = ψ(S_U*, S_C*; t).

Peterson's representation gives a proof that the Kaplan-Meier estimator Ŝ(t) is consistent. The proof proceeds as follows. Define the empirical subsurvival functions

Ŝ_U*(t) = n^{−1} Σ_{i=1}^n I(y_i > t, δ_i = 1),
Ŝ_C*(t) = n^{−1} Σ_{i=1}^n I(y_i > t, δ_i = 0).

It can be seen that the product-limit (PL) estimator is Ŝ(t) = ψ(Ŝ_U*, Ŝ_C*; t), provided any ties between uncensored and censored observations are interpreted as uncensored observations preceding censored ones.

Notice that since Ŝ_U* is discrete, ψ(Ŝ_U*, Ŝ_C*; t) involves only the summation over the discrete jumps of Ŝ_U*. By the Glivenko-Cantelli theorem,

Ŝ_U*(t) →_{a.s.} S_U*(t),  Ŝ_C*(t) →_{a.s.} S_C*(t),  uniformly in t.

(The notation →_{a.s.} denotes "converges almost surely to".) Also, ψ is a continuous function of (S_U*, S_C*) in the sup norm: if ‖S_U*′ − S_U*‖ = sup_t |S_U*′(t) − S_U*(t)| → 0 and ‖S_C*′ − S_C*‖ → 0, then ψ(S_U*′, S_C*′; t) → ψ(S_U*, S_C*; t). Therefore,

Ŝ(t) = ψ(Ŝ_U*, Ŝ_C*; t) →_{a.s.} ψ(S_U*, S_C*; t) = S(t).
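A numerical sketch of my own illustrating this: for data without ties, ψ applied to the empirical subsurvival functions reduces to a product over the uncensored points u < t, each factor being {Ŝ_U*(u+) + Ŝ_C*(u+)} / {Ŝ_U*(u−) + Ŝ_C*(u−)} = #{y > u} / #{y ≥ u}, which recovers the Kaplan-Meier estimator:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
T = rng.exponential(1.0, n)
C = rng.exponential(2.0, n)
y, delta = np.minimum(T, C), (T <= C).astype(int)

def psi_hat(t):
    """Peterson's psi applied to the empirical subsurvival functions.
    With no ties, each jump factor is #{y > u} / #{y >= u}."""
    S = 1.0
    for u in y[(delta == 1) & (y < t)]:     # jumps of S*_U-hat below t
        S *= np.sum(y > u) / np.sum(y >= u)
    return S

print(psi_hat(1.0), np.exp(-1.0))   # compare with S(1) = e^{-1}
```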

REFERENCE

Peterson, A. V., Jr. (1977). Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions. Journal of the American Statistical Association, 72, 854-858.

Functional delta method

Suppose that n^{1/2}(ˆΛ(·) − Λ(·)) →_d U(·). If φ(·) is compactly differentiable, then

n^{1/2}[φ(ˆΛ(·)) − φ(Λ(·))] →_d dφ(Λ(·); U(·)),

that is, n^{1/2}[φ(ˆΛ(·)) − φ(Λ(·))] behaves like dφ(Λ(·); n^{1/2}(ˆΛ(·) − Λ(·))).

Note: dφ(F; G) denotes the derivative of φ at F in the direction of G.
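A quick numerical illustration of my own of the directional derivative used above, with φ(Λ) = exp(−Λ): here dφ(Λ; G) = −exp(−Λ)·G, which is exactly the −S(t)·U(t) term in the Kaplan-Meier limit.

```python
import numpy as np

Lambda, G, eps = 1.0, 0.3, 1e-6      # a point and a fixed direction
lhs = (np.exp(-(Lambda + eps * G)) - np.exp(-Lambda)) / eps
rhs = -np.exp(-Lambda) * G
print(lhs, rhs)                      # agree up to O(eps)
```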