Survival Analysis for Interval Censored Data - Nonparametric Estimation Seminar of Statistics, ETHZ Group 8, 02. May 2011 Martina Albers, Nanina Anderegg, Urs Müller
Overview Examples Nonparametric MLE for Current Status Censored Data Existence? Uniqueness? Characterization? Asymptotic Behavior?
Goals Different types of censoring Terminology Examples INTRODUCTION
Censored data: Right censoring – Current status censoring – Interval censoring case k – Mixed case interval censoring – Bivariate interval censored data
Right censoring
1. X: failure time, X ~ F
2. C: censoring time, C ~ G
3. C independent of X
4. n observations: iid copies of the variable of interest (U, Δ) = (min(X, C), 1{X ≤ C})
Goal: estimate the distribution function F(x) = P[X ≤ x]
Censored data: Right censoring – Current status censoring – Interval censoring case k – Mixed case interval censoring – Bivariate interval censored data
Current status censoring
1. X: failure time, X ~ F
2. T: observation time, T ~ G
3. X independent of T
4. n observations: iid copies of the variable of interest (T, Δ) = (T, 1{X ≤ T})
Goal: estimate the distribution function F(x) = P[X ≤ x]
Right censoring vs. current status censoring
Right censoring, observe: (U, Δ) = (min(X, C), 1{X ≤ C})
Current status censoring, observe: (T, Δ) = (T, 1{X ≤ T})
Example: Lung cancer data. Time of onset of lung cancer (non-lethal). 144 (special) mice (likely to get cancer); sacrifice each mouse at a random time and determine its current status: cancer/no cancer. 2 groups: conventional environment (96), germfree environment (48). Time to onset of lung cancer: [0 (start), onset of cancer].
Time to onset of cancer. At time of sacrifice a tumor is found: onset occurred at some earlier point. At time of sacrifice no tumor is found: onset will occur later or never. Goal: estimate the distribution of time to onset! (Analyze the two groups separately.)
Mathematical formulation. X: time to onset of lung cancer. T: time of sacrifice — a random time (chosen randomly!). Both are measured from the beginning of the study.
Mathematical formulation: (x, t, δ) is a realization of (X, T, Δ).
Cancer observed at time of sacrifice (x ≤ t): observe (t, 1{x ≤ t}) = (t, 1); conclusion: x ∈ (0, t].
Cancer not observed at time of sacrifice (x > t): observe (t, 1{x ≤ t}) = (t, 0); conclusion: x ∈ (t, ∞).
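The observation scheme above can be sketched in a few lines of Python. This is a minimal illustrative simulation, not part of the seminar material: the function name `simulate_current_status` and the samplers passed to it are my own assumptions.

```python
import random

def simulate_current_status(n, failure_sampler, obs_sampler, seed=0):
    """Simulate n current-status observations (t_i, delta_i).

    For each subject a latent failure time x ~ F and an independent
    observation time t ~ G are drawn; only t and the indicator
    delta = 1{x <= t} are recorded -- x itself is discarded.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = failure_sampler(rng)   # latent: never observed directly
        t = obs_sampler(rng)       # time of sacrifice / inspection
        data.append((t, 1 if x <= t else 0))
    return data

# Illustrative choices: exponential failure times, uniform inspection times.
obs = simulate_current_status(
    5,
    failure_sampler=lambda r: r.expovariate(1.0),
    obs_sampler=lambda r: r.uniform(0.0, 3.0),
)
print(obs)  # list of (t_i, delta_i) pairs
```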
Right censoring vs. current status censoring
Right censoring, observe: (U, Δ) = (min(X, C), 1{X ≤ C}) — either know x exactly or know x ∈ (c, ∞).
Current status censoring, observe: (T, Δ) = (T, 1{X ≤ T}) — NEVER know x exactly, i.e. only x ∈ (0, t] or x ∈ (t, ∞).
[Figure: right censoring — either x is observed exactly, or only x ∈ (c, ∞) is known]
[Figure: current status censoring (interval censoring case 1) — either x ∈ (0, t] or x ∈ (t, ∞) is known]
The data Conventional environment (96) Germfree environment (48)
The data. With tumor, e.g. 381: after 381 days, a tumor was found in the mouse. No tumor, e.g. 45: after 45 days, no tumor was found in the mouse.
Censored data: Right censoring – Current status censoring – Interval censoring case k – Mixed case interval censoring – Bivariate interval censored data
From 2 intervals to (k+1) intervals!
Theoretical example: HIV infection Age at HIV infection Representative sample Follow each person for 2 years 2 HIV tests during this period Determine: negative/positive Time to infection: [0 (start) HIV infection]
Interval censoring, case k = 2
1. X: failure time, X ~ F
2. (T₁, T₂): observation times, (T₁, T₂) ~ G
3. X independent of (T₁, T₂)
4. n observations: iid copies of the variable of interest (T₁, T₂, Δ₁, Δ₂, Δ₃) = (T₁, T₂, 1{X ≤ T₁}, 1{T₁ < X ≤ T₂}, 1{T₂ < X})
Goal: estimate the distribution function F(x) = P[X ≤ x]
[Figure: 2 observation times T₁ < T₂ split the time axis into three intervals, with indicators Δ₁, Δ₂, Δ₃.] Nota bene: Δ₃ = 1 − Δ₁ − Δ₂.
Interval censoring, case 1 vs. interval censoring, case k
Case 1 — 1 observation per subject: (T, Δ) = (T, 1{X ≤ T})
Case k — k observations per subject: (T, Δ) = ((T₁, T₂, …, T_k), (1{X ≤ T₁}, 1{T₁ < X ≤ T₂}, …, 1{T_k < X}))
Mathematical formulation. X: age at HIV infection. (T₁, T₂): ages at the times of the HIV tests — random times (chosen randomly!). All times are measured from the beginning of the study.
Mathematical formulation: (x, t₁, t₂, δ₁, δ₂, δ₃) is a realization of (X, T₁, T₂, Δ₁, Δ₂, Δ₃).
1st test positive (x ≤ t₁): observe (t₁, t₂, 1{x ≤ t₁}, 1{t₁ < x ≤ t₂}, 1{t₂ < x}) = (t₁, t₂, 1, 0, 0); conclusion: x ∈ (0, t₁].
1st test negative, 2nd test positive (t₁ < x ≤ t₂): observe (t₁, t₂, 0, 1, 0); conclusion: x ∈ (t₁, t₂].
1st and 2nd test negative (x > t₂): observe (t₁, t₂, 0, 0, 1); conclusion: x ∈ (t₂, ∞).
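The case-2 encoding above is easy to make concrete. A small sketch (the function name `case2_observation` is hypothetical, chosen for illustration):

```python
def case2_observation(x, t1, t2):
    """Return the case-2 interval-censored record (t1, t2, d1, d2, d3)
    for a latent failure time x and ordered inspection times t1 < t2."""
    assert t1 < t2
    d1 = 1 if x <= t1 else 0        # positive at the first test
    d2 = 1 if t1 < x <= t2 else 0   # conversion between the tests
    d3 = 1 if x > t2 else 0         # still negative at the second test
    assert d1 + d2 + d3 == 1        # exactly one region; d3 = 1 - d1 - d2
    return (t1, t2, d1, d2, d3)

print(case2_observation(x=1.5, t1=1.0, t2=2.0))  # -> (1.0, 2.0, 0, 1, 0)
```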
Censored data: Right censoring – Current status censoring – Interval censoring case k – Mixed case interval censoring – Bivariate interval censored data
Fixed k → variable K! So far, every patient had k observations (fixed!). Now let the number of observations vary from patient to patient: define the random variable K as the number of visits.
HIV infection
Patient A: K = 2, observes (t₁, t₂, δ₁, δ₂, δ₃) = (t₁, t₂, 1, 0, 0), i.e. x ∈ (0, t₁].
Patient B: K = 3, observes (t₁, t₂, t₃, δ₁, δ₂, δ₃, δ₄) = (t₁, t₂, t₃, 0, 0, 1, 0), i.e. x ∈ (t₂, t₃].
Patient C: K = 1, observes (t₁, δ₁, δ₂) = (t₁, 0, 1), i.e. x ∈ (t₁, ∞).
Example: Breast cosmesis data. Time to breast retraction. 94 patients; clinic visits to determine the current status: retraction/no retraction. The number of visits varies from patient to patient! 2 groups: radiotherapy (46), radiotherapy + chemotherapy (48). Time to breast retraction: [0 (start), retraction].
The data
The data. (6, 10]: at 6 months the patient showed no deterioration in the initial cosmetic result; by 10 months retraction was present. (45, _]: at 45 months the patient showed no deterioration in the initial cosmetic result; "_" means no second test (right censored).
Censored data: Right censoring – Current status censoring – Interval censoring case k – Mixed case interval censoring – Bivariate interval censored data
Bivariate interval censoring
1. (X, Y): failure times, (X, Y) ~ F
2. U = (U₁, U₂), V = (V₁, V₂): observation times, (U, V) ~ G
3. (X, Y) independent of (U, V)
4. n observations: iid copies of the variables of interest (U, V, Δ), where Δ = (Δ₁₁, Δ₁₂, Δ₁₃, Δ₂₁, Δ₂₂, Δ₂₃, Δ₃₁, Δ₃₂, Δ₃₃)
Goal: estimate the distribution function F(x, y) = P[X ≤ x, Y ≤ y]
Bivariate interval censoring: Δᵢⱼ = 1{(X, Y) ∈ Rᵢⱼ}, with
R₁₁ = (0, U₁] × (0, V₁], R₁₂ = (U₁, U₂] × (0, V₁], R₁₃ = (U₂, ∞) × (0, V₁],
R₂₁ = (0, U₁] × (V₁, V₂], R₂₂ = (U₁, U₂] × (V₁, V₂], R₂₃ = (U₂, ∞) × (V₁, V₂],
R₃₁ = (0, U₁] × (V₂, ∞), R₃₂ = (U₁, U₂] × (V₂, ∞), R₃₃ = (U₂, ∞) × (V₂, ∞).
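The 3×3 region indicators can be sketched directly; this is an illustrative helper (the name `bivariate_deltas` is my own), with the slide's 1-based indices mapped to 0-based Python lists:

```python
def bivariate_deltas(x, y, u, v):
    """Indicator matrix (delta_ij) for bivariate interval censoring.

    u = (u1, u2) and v = (v1, v2) cut each axis into three cells;
    following the listing above, i indexes the V-band of y and j the
    U-cell of x, so delta_ij = 1 iff (x, y) lies in R_ij.
    """
    def cell(z, c1, c2):
        # 0: z <= c1, 1: c1 < z <= c2, 2: z > c2
        if z <= c1:
            return 0
        if z <= c2:
            return 1
        return 2

    delta = [[0, 0, 0] for _ in range(3)]
    delta[cell(y, *v)][cell(x, *u)] = 1
    return delta

d = bivariate_deltas(x=1.5, y=5.0, u=(1.0, 2.0), v=(3.0, 4.0))
print(d)  # x in (u1, u2], y in (v2, inf)  ->  delta_32 = 1
```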
Sketch on blackboard
Example: ACTG 181 Time to shedding of a cytomegalovirus (CMV) Time to colonization of mycobacterium avium complex (MAC) 204 patients test for CMV shedding and MAC colonization at least once during trial (no prior CMV/MAC diagnosis) Tested at regular monthly intervals
Time to CMV shedding and MAC colonization. Patient didn't miss any clinical visit: record the month of the 1st positive test (→ discrete failure time data). Patient missed some visits and tested positive after a missed visit: record the time interval (→ discrete interval censored failure time data). Goal: estimate the distributions of time to shedding and time to colonization!
Mathematical formulation. X: time to CMV shedding. Y: time to MAC colonization. (U₁, U₂): times of tests for CMV shedding. (V₁, V₂): times of tests for MAC colonization. The model is not perfect here! The number of observations varies from patient to patient; the model should allow a different or random number of observation times.
The data, 204 subjects: 68 interval censored times of CMV shedding, 10 interval censored times of MAC colonization; 89 right censored times of CMV shedding, 190 right censored times of MAC colonization; 46 left censored times of CMV shedding, 4 left censored times of MAC colonization. Tests every 12 weeks, i.e. quarterly screens; visits rounded to the closest quarter.
[Table: the data, an excerpt — number of patients per pair of CMV/MAC observation intervals]
The data. (0, 6], 15: at the start no CMV shedding; controls missed during the following 6 months; after 6 months, first CMV shedding; MAC colonization after 15 months. (3, 3]: the actual left and right endpoints are rounded to the same number.
Derive the likelihood for different censoring models THE LIKELIHOOD
Nonparametric MLE. Completely nonparametric: we do not assume that the failure time distribution follows any parametric model. Only assumption: the variable of interest (X) is independent of the censoring resp. observation times (C resp. T).
Recall: right censored data
Likelihood of n iid observations (u₁, δ₁), (u₂, δ₂), …, (u_n, δ_n):
∏_{i=1}^n f(uᵢ)^{δᵢ} (1 − F(uᵢ))^{1−δᵢ} (1 − G(uᵢ))^{δᵢ} g(uᵢ)^{1−δᵢ}
The nonparametric MLE F̂ maximizes
L_n(F) = ∏_{i=1}^n F({uᵢ})^{δᵢ} (1 − F(uᵢ))^{1−δᵢ}, where F({u}) = F(u) − F(u−).
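For right-censored data the nonparametric MLE is the classical Kaplan–Meier product-limit estimator. A compact self-contained sketch (the function name `kaplan_meier` is mine):

```python
def kaplan_meier(data):
    """Product-limit (Kaplan-Meier) estimate of S(t) = 1 - F(t) from
    right-censored pairs (u_i, delta_i), delta_i = 1 for an observed
    failure. Returns [(t, S(t))] at each distinct failure time."""
    data = sorted(data)
    n_at_risk = len(data)
    surv, out, k = 1.0, [], 0
    while k < len(data):
        t = data[k][0]
        deaths = at_t = 0
        while k < len(data) and data[k][0] == t:
            at_t += 1
            deaths += data[k][1]
            k += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk  # product-limit step
            out.append((t, surv))
        n_at_risk -= at_t
    return out

print(kaplan_meier([(1, 1), (2, 0), (3, 1), (4, 1)]))
# -> [(1, 0.75), (3, 0.375), (4, 0.0)]
```

The censored observation at t = 2 contributes no factor but shrinks the risk set, which is exactly how the NPMLE places mass only at observed failure times.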
Other representation: observed sets R₁, R₂, …, R_n. They contain the unobservable realization of X! I.e.
Rᵢ = {uᵢ} if δᵢ = 1, Rᵢ = (uᵢ, ∞) if δᵢ = 0.
The nonparametric MLE F̂ maximizes
L_n(F) = ∏_{i=1}^n F({uᵢ})^{δᵢ} (1 − F(uᵢ))^{1−δᵢ} = ∏_{i=1}^n P_F(Rᵢ),
where P_F(Rᵢ) is the probability under the distribution F that X ∈ Rᵢ.
Current status data. Recall: n observations, iid copies of (T, Δ) = (T, 1{X ≤ T}) with X ~ F, T ~ G, i.e. (t₁, δ₁), (t₂, δ₂), …, (t_n, δ_n). Likelihood = product of the densities of the observations. Consider δ = 0 and δ = 1 separately, then combine.
Current status data. Likelihood of n iid observations (t₁, δ₁), (t₂, δ₂), …, (t_n, δ_n):
∏_{i=1}^n F(tᵢ)^{δᵢ} (1 − F(tᵢ))^{1−δᵢ} g(tᵢ)
The nonparametric MLE F̂ maximizes
L_n(F) = ∏_{i=1}^n F(tᵢ)^{δᵢ} (1 − F(tᵢ))^{1−δᵢ}.
Other representation: observed sets R₁, R₂, …, R_n. Recall: they contain the unobservable realization of X, i.e.
Rᵢ = (0, tᵢ] if δᵢ = 1, Rᵢ = (tᵢ, ∞) if δᵢ = 0.
The nonparametric MLE F̂ maximizes
L_n(F) = ∏_{i=1}^n F(tᵢ)^{δᵢ} (1 − F(tᵢ))^{1−δᵢ} = ∏_{i=1}^n P_F(Rᵢ),
where P_F(Rᵢ) is the probability under the distribution F that X ∈ Rᵢ.
General form of the likelihood. Representation in terms of the observed sets R₁, R₂, …, R_n:
L_n(F) = ∏_{i=1}^n P_F(Rᵢ),
where the Rᵢ change depending on the censoring model!
Reducing the Optimization Problem. Likelihood: L_n(F) = ∏_{i=1}^n P_F(Rᵢ). Optimization problem:
sup_{F ∈ 𝓕} l_n(F), with l_n(F) = log L_n(F) = Σ_{i=1}^n log P_F(Rᵢ),
where 𝓕 is the space of all distribution functions on the appropriate space.
Reducing the Optimization Problem. Problem: an infinite-dimensional optimization problem. Solution: maximal intersections A₁, …, A_m of the observed sets R₁, …, R_n.
Definition: A_j is a maximal intersection iff A_j = ∩_{i ∈ β_j} Rᵢ for some subset β_j ⊆ {1, …, n}, and there is no strict superset β*_j ⊋ β_j such that A_j ⊆ ∩_{i ∈ β*_j} Rᵢ.
Reducing the Optimization Problem. Maximize Σ_{i=1}^n log P_F(Rᵢ), i.e. distribute probability mass (total amount 1) on the probability space.
Properties of the MLE:
- no mass outside the sets R₁, …, R_n
- some positive mass in each set R₁, …, R_n
- mass only in the maximal intersections A₁, …, A_m
- indifferent to the distribution of mass within the maximal intersections (representational non-uniqueness)
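For one-dimensional observed sets of the form (l, r], the maximal intersections are the classical "innermost" (Turnbull) intervals and can be found mechanically. A small sketch under that half-open-interval assumption (function name mine; right-censored sets (u, ∞) can be passed with r = float('inf')):

```python
def maximal_intersections(intervals):
    """Innermost (Turnbull) intervals for observed sets (l_i, r_i].

    A maximal intersection is (l, r] where l is some left endpoint,
    r is some right endpoint, and no other endpoint lies strictly
    inside (l, r); the NPMLE can put mass only on these sets.
    """
    lefts = sorted({l for l, _ in intervals})
    rights = sorted({r for _, r in intervals})
    endpoints = set(lefts) | set(rights)
    out = []
    for l in lefts:
        for r in rights:
            if l < r and not any(l < e < r for e in endpoints):
                out.append((l, r))
    return out

print(maximal_intersections([(0, 2), (1, 3), (2, 4)]))  # -> [(1, 2), (2, 3)]
```

Because each candidate (l, r] contains no endpoint strictly inside, it is automatically contained in every observed set it meets, which is why the mass of the NPMLE concentrates there.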
Reducing the Optimization Problem. α₁, …, α_m: masses in the corresponding maximal intersections. C: (m × n) clique matrix, with C_{ji} = 1{A_j ⊆ Rᵢ}. Likelihood in terms of α:
l_n(α) = Σ_{i=1}^n log P_α(Rᵢ) = Σ_{i=1}^n log ( Σ_{j=1}^m α_j 1{A_j ⊆ Rᵢ} ) = Σ_{i=1}^n log (Cᵀα)ᵢ
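Given the clique matrix, one standard way to maximize l_n(α) is Turnbull's self-consistency (EM) iteration — shown here as an illustrative sketch, not necessarily the algorithm used in the seminar; the function name `turnbull_em` is mine, and it assumes every observed set contains at least one maximal intersection:

```python
def turnbull_em(C, iters=500):
    """Maximize l_n(alpha) = sum_i log (C^T alpha)_i by Turnbull's
    self-consistency (EM) iteration.

    C[j][i] = 1 iff maximal intersection A_j is contained in the
    observed set R_i. Returns the mass vector alpha over the A_j.
    """
    m, n = len(C), len(C[0])
    alpha = [1.0 / m] * m
    for _ in range(iters):
        new = [0.0] * m
        for i in range(n):
            # expected share of observation i attributed to each A_j
            denom = sum(alpha[j] * C[j][i] for j in range(m))
            for j in range(m):
                if C[j][i]:
                    new[j] += alpha[j] / denom
        alpha = [a / n for a in new]
    return alpha

# Two maximal intersections; three observations compatible with
# A_1 only, A_2 only, and both, respectively.
C = [[1, 0, 1],
     [0, 1, 1]]
print(turnbull_em(C))  # -> masses close to [0.5, 0.5]
```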
Existence and (non-)uniqueness of the MLE
Notation: l_n(α̂) = max_{α ∈ A} l_n(α), where A = { α ∈ ℝᵐ : α_j ≥ 0, j = 1, …, m; Σ_{j=1}^m α_j = 1 }.
Theorem 3.1: The MLE exists.
Theorem 3.2: The log-likelihood Σ_{i=1}^n log P_F(Rᵢ) is strictly concave in (P_F(R₁), …, P_F(R_n)). Thus the MLE estimates the probabilities P_F(R₁), …, P_F(R_n) of the observation rectangles uniquely.
Proof: cf. blackboard.
Existence and (non-)uniqueness of the MLE. Remarks: the log-likelihood is (not strictly) concave in F — two different functions F₁, F₂ ∈ 𝓕 can yield the same vector (P_F(R₁), …, P_F(R_n)). The estimation of F is therefore not necessarily unique. The same holds for α (mixture non-uniqueness).
(Non-)uniqueness of the MLE. Example:
max_{α ∈ ℝ⁴} Σ_{i=1}^4 log P_α(Rᵢ) = max { log(α₁ + α₂) + log(α₂ + α₃) + log(α₃ + α₄) + log(α₄ + α₁) }
subject to α_j ≥ 0, j = 1, …, 4, and α₁ + α₂ + α₃ + α₄ = 1.
Solution: α̂ is not unique!
α̂₁ = α̂₃ = x, α̂₂ = α̂₄ = ½ − x, for some x ∈ [0, ½].
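The flatness of the likelihood along this family is easy to check numerically; a short sketch of the example above (helper name `loglik` is mine):

```python
import math

def loglik(a):
    """l(alpha) = sum of log P_alpha(R_i) for the four sets in the
    example: P(R1)=a1+a2, P(R2)=a2+a3, P(R3)=a3+a4, P(R4)=a4+a1."""
    a1, a2, a3, a4 = a
    return sum(math.log(p) for p in (a1 + a2, a2 + a3, a3 + a4, a4 + a1))

# Every member of the family a1 = a3 = x, a2 = a4 = 1/2 - x gives the
# same value 4*log(1/2) -- the maximizer is not unique.
for x in (0.1, 0.25, 0.4):
    print(round(loglik((x, 0.5 - x, x, 0.5 - x)), 6))  # -> -2.772589 each time
```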
Non-uniqueness of the MLE. Representational non-uniqueness: where to put the masses within the maximal intersections. Mixture non-uniqueness: how to assign the masses to the intersections.
(Non-)uniqueness of the MLE. Handling the representational non-uniqueness: a lower bound for the MLE by assigning all mass to the upper right corners of the maximal intersections; an upper bound for the MLE by assigning all mass to the lower left corners of the maximal intersections.
Sufficient condition for mixture uniqueness. Theorem 3.3: The MLE α̂ is unique if the clique matrix C has rank m. Proof: cf. blackboard.
Basic Characterization of the MLE CURRENT STATUS DATA
Problem setting. Recall:
1. X: failure time, X ~ F
2. T: observation time, T ~ G
3. X independent of T
4. n observations: iid copies of (T, Δ) = (T, 1{X ≤ T})
Goal: estimate the distribution function F(x) = P[X ≤ x]
Problem setting. Notation: T_(1), …, T_(n): order statistics of T₁, …, T_n; Δ_(1), …, Δ_(n): corresponding values, i.e. Δ_(i) = Δ_j if T_(i) = T_j. Recall:
L_n(F) = ∏_{i=1}^n F(T_(i))^{Δ_(i)} (1 − F(T_(i)))^{1−Δ_(i)}
l_n(F) = Σ_{i=1}^n [ Δ_(i) log F(T_(i)) + (1 − Δ_(i)) log(1 − F(T_(i))) ]
Basic characterization. Let Y = { y ∈ ℝⁿ : 0 ≤ y₁ ≤ … ≤ y_n ≤ 1 }. Define an estimator ŷ ∈ Y of F by setting ŷᵢ = F̂_n(T_(i)).
Basic characterization. Corollary 1.2: The vector ŷ is the MLE iff for all j ∈ {1, …, n}:
Σ_{i < j} (Δ_(i) − ŷᵢ) ≥ 0,
and, whenever ŷ_j > ŷ_{j−1} (with ŷ₀ = 0, ŷ_{n+1} = 1):
Σ_{i < j} (Δ_(i) − ŷᵢ) = 0.
Basic characterization. Proposition 1.3: Let P_j = (j, Σ_{i ≤ j} Δ_(i)), j = 0, …, n, and let H be the greatest convex minorant of the points P₀, …, P_n. Then ŷ is the MLE iff for i = 1, …, n: ŷᵢ equals the left derivative of H at i.
Definition: H is called the greatest convex minorant of V if H is the largest convex function with H(t) ≤ V(t) for all t; in particular H(t) = V(t) holds wherever H has a change of slope.
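The left derivatives of the greatest convex minorant of the cumulative sum diagram are exactly what the pool adjacent violators algorithm (PAVA) computes, so the current status NPMLE fits in a few lines. An illustrative sketch (the function name `current_status_mle` is mine):

```python
def current_status_mle(pairs):
    """NPMLE for current status data via the greatest convex minorant.

    pairs: (t_i, delta_i). Sort by t, form the cumulative sum diagram
    P_j = (j, sum_{i<=j} delta_(i)), and return the left derivatives
    of its greatest convex minorant, computed here by the pool
    adjacent violators algorithm (PAVA).
    """
    deltas = [d for _, d in sorted(pairs)]
    blocks = []  # list of [mean value, weight], merged while decreasing
    for d in deltas:
        blocks.append([float(d), 1])
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            v2, w2 = blocks.pop()
            v1, w1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2])
    y = []
    for v, w in blocks:
        y.extend([v] * w)
    return y  # y_i = F_hat(T_(i)), nondecreasing in [0, 1]

print(current_status_mle([(1, 0), (2, 1), (3, 0), (4, 1)]))
# -> [0.0, 0.5, 0.5, 1.0]
```

Pooling the "violating" observations (2, 1) and (3, 0) into a common value 0.5 is precisely the greatest convex minorant flattening the corresponding stretch of the cumulative sum diagram.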
Asymptotic theory. Consistency: «Does the MLE converge to the actual distribution?» Rate of convergence: «How fast does F̂_n converge?» Limiting distribution: «What does F̂_n converge to?»
Global and local consistency of our estimator CONSISTENCY
Consistency. «Does the MLE converge to the actual distribution?» Definition: An estimator T_n of a parameter θ is said to be consistent if T_n → θ in probability as n → ∞.
Consistency Global Consistency Local Consistency
Global consistency. For current status data one can show (L¹(G)-)consistency:
∫ |F̂_n(t) − F(t)| dG(t) → 0 a.s.
Why «global»? What is the role of G?
Local consistency. From the global consistency one can derive:
Proposition: Let F be continuous at t₀ and let G be continuously differentiable at t₀ with strictly positive derivative. Then we can choose δ > 0 such that
sup_{t ∈ [t₀ − δ, t₀ + δ]} |F̂_n(t) − F(t)| → 0 a.s.
Consistency. «Does the MLE converge to the actual distribution?» «Yes, F̂_n converges to F globally and locally.»
Global and local rate of convergence of our estimator. RATE OF CONVERGENCE
Rate of convergence. «How fast does F̂_n converge?»
Rate of convergence. Global rate of convergence. Local rate of convergence.
Global rate of convergence. For current status data one can show:
n^{1/3} ∫ |F̂_n(t) − F(t)| dG(t) = O_P(1), i.e. the global rate is n^{1/3}!
Definition: A sequence of r.v. X₁, X₂, … is said to be of order O_P(1) if: ∀ ε > 0 there exist M, N such that P(|X_n| > M) < ε for all n > N.
Local rate of convergence. Theorem: Let 0 < F(t₀) < 1 and let F and G be continuously differentiable at t₀ with positive derivatives f(t₀) resp. g(t₀). Then
n^{1/3} ( F̂_n(t₀) − F(t₀) ) = O_P(1), i.e. the local rate is n^{1/3}!
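The n^{1/3} local rate can be illustrated with a small Monte Carlo sketch. All distributional choices here (X ~ Exp(1), T ~ Uniform(0, 3), t₀ = 1) and the function names are my own illustrative assumptions; the NPMLE is computed with PAVA as in the characterization section:

```python
import math
import random

def pava(deltas):
    """Pool adjacent violators: isotonic (nondecreasing) fit of deltas."""
    blocks = []
    for d in deltas:
        blocks.append([float(d), 1])
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            v2, w2 = blocks.pop()
            v1, w1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2])
    return [v for v, w in blocks for _ in range(w)]

def mc_error(n, t0=1.0, reps=200, seed=0):
    """Monte Carlo mean |F_hat(t0) - F(t0)| for current status data,
    X ~ Exp(1), T ~ Uniform(0, 3), so F(t0) = 1 - exp(-t0)."""
    rng = random.Random(seed)
    target = 1 - math.exp(-t0)
    total = 0.0
    for _ in range(reps):
        ts = sorted(rng.uniform(0.0, 3.0) for _ in range(n))
        deltas = [1 if rng.expovariate(1.0) <= t else 0 for t in ts]
        y = pava(deltas)  # y_i = F_hat(T_(i))
        # take F_hat at t0 as its value at the largest T_(i) <= t0
        i = max((k for k, t in enumerate(ts) if t <= t0), default=0)
        total += abs(y[i] - target)
    return total / reps

for n in (100, 400, 1600):
    print(n, round(mc_error(n), 3))  # error shrinks, roughly like n**(-1/3)
```

Note how slow this is compared to the usual n^{1/2}: quadrupling n only shrinks the error by a factor of about 4^{1/3} ≈ 1.6.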
Rate of convergence. «How fast does F̂_n converge?» «It converges at rate n^{1/3}, globally and locally.»
The limiting distribution of the MLE. LOCAL LIMITING DISTRIBUTION
Local limiting distribution. «What does n^{1/3}(F̂_n(t₀) − F(t₀)) converge to?»
Local limiting distribution.
Definition: Let W(t) be a two-sided Brownian motion with mean 0 and variance 1 (W(0) = 0). For a, b > 0, consider the process a W(t) + b t², and let S be the slope process of its greatest convex minorant (see blackboard).
Definition: A Brownian motion W(t) is a stochastic process with: W(0) = 0; W continuous a.s.; W has independent increments with W(t) − W(s) ~ N(0, t − s).
Local limiting distribution. Theorem: Let 0 < F(t₀) < 1 and let F and G be continuously differentiable at t₀ with continuous positive derivatives f(t₀) resp. g(t₀). Then
n^{1/3} ( F̂_n(t₀) − F(t₀) ) → S(0) in distribution,
where S is the slope process of the greatest convex minorant of a W(t) + b t², with a = (F(t₀)(1 − F(t₀)) / g(t₀))^{1/2} and b = f(t₀)/2.
Local limiting distribution. «What does n^{1/3}(F̂_n(t₀) − F(t₀)) converge to?» «To the slope at zero of the greatest convex minorant of a Brownian motion process plus a parabola.»
Likelihood ratio test. LR test:
2 log λ_n → D = ∫ ( S(t)² − S⁰(t)² ) dt in distribution.
S: slope process of the greatest convex minorant of a two-sided Brownian motion plus a parabola.
S⁰: slope process of the greatest convex minorant of a two-sided Brownian motion plus a parabola, under the constraints: slopes ≥ 0 for t > 0, slopes ≤ 0 for t < 0.
Likelihood ratio test. Remarks: D does not follow a χ² distribution; it depends on the slope process of a Brownian motion; it is independent of the parameters of the problem.
Pointwise confidence intervals. For testing H₀: F(t₀) = θ vs. H₁: F(t₀) ≠ θ, denote the LR test statistic by λ_n(θ). For 0 < α < 1, let d_α be such that P(D > d_α) = α. Approximate 1 − α confidence set:
C_{n,α} := { θ : 2 log λ_n(θ) ≤ d_α }.
It can be shown that the C_{n,α} are closed intervals in (0, 1), given observations with X < t₀ and with T > t₀.
Pointwise confidence intervals. Proposition 4.1: Let F and G have densities f and g, both positive and continuous in a neighborhood of t₀. Then, for n → ∞:
P_{F,G}( F(t₀) ∈ C_{n,α} ) → P( D ≤ d_α ) = 1 − α.
Thank you for your attention!