SEMIPARAMETRIC APPROACHES TO INFERENCE IN JOINT MODELS FOR LONGITUDINAL AND TIME-TO-EVENT DATA

Marie Davidian and Anastasios A. Tsiatis
http://www.stat.ncsu.edu/~davidian/
http://www.stat.ncsu.edu/~tsiatis/

In part joint work with Xiao Song, Department of Biostatistics, University of Washington

Semiparametric Inference in Joint Models

OUTLINE

1. Introduction and example
2. Joint models
3. Conditional score approach
4. Semiparametric likelihood approach
5. Simulation evidence
6. Example, revisited
7. Discussion

1. INTRODUCTION AND EXAMPLE

Longitudinal studies in medical research:
- Time-to-event, e.g., death, progression to AIDS (possibly censored)
- Intermittent longitudinal measurements (time-dependent)
- Time-independent covariates, e.g., treatment group, age at baseline, gender

Questions of interest: Interrelationships between these variables

AIDS Clinical Trials Group (ACTG) 175:
- Compare 4 antiretroviral regimens in asymptomatic HIV-infected patients
- 2467 subjects, 1991-1994
- Primary objective: Compare on the basis of time to AIDS or death
- CD4 counts measured approximately every 12 weeks
- Subsequent objective 1: Characterize within-subject patterns of CD4 change
- Subsequent objective 2: Characterize the relationship between features of CD4 profiles and survival

[Figure: CD4 profiles for 10 randomly chosen subjects; log CD4 versus week.]

Ideally: To address these objectives, we would like to have, for all subjects $i$:
- $T_i$ = time to event
- $X_i(u)$ = longitudinal trajectory for all times $u \ge 0$
- $Z_i$ = vector of time-independent covariates

Objective 1: Characterize within-subject patterns of CD4 change
- Estimate aspects of average longitudinal trajectories, e.g., $E\{X_i(u)\}$, $\mathrm{var}\{X_i(u)\}$
- Trajectory of the "average person"

Difficulties/complications:
- Data are collected intermittently and are possibly measured with error. Observe only $W_i(t_{ij})$, $j = 1, \ldots, m_i$, $t_{ij} \le T_i$, where $W_i(t_{ij}) = X_i(t_{ij}) + e(t_{ij})$
- Informative censoring: no measurement is available after death [e.g., Wu and Carroll (1988), Hogan and Laird (1997)]

Objective 2: Characterize the relationship between features of CD4 profiles and survival
- Establish relationships between $T_i$ and $X_i(u)$, $u \ge 0$, and $Z_i$
- History: $X_i^H(t) = \{X_i(u),\ u \le t\}$
- Proportional hazards model:
  $$\lambda_i(u) = \lim_{du \to 0} du^{-1}\,\mathrm{pr}\{u \le T_i < u + du \mid T_i \ge u, X_i^H(u), Z_i\} = \lambda_0(u)\exp\{\gamma X_i(u) + \eta^T Z_i\}$$
- Interest in estimating $\gamma$ and $\eta$ $(q \times 1)$
- Semiparametric model (baseline hazard $\lambda_0(u)$ is unspecified)

Focus here: Objective 2

Popular focus: Surrogate marker problem
- Can we replace the time-to-event endpoint by CD4 in assessing treatment efficacy?
- Roughly speaking (Prentice conditions):
  - Treatment is effective on time-to-event
  - Treatment has an effect on the marker, e.g., CD4 is higher on treatment than on placebo
  - Effect of treatment should manifest through the marker; i.e., risk of event given a specific marker trajectory should be independent of treatment
- For example, with hazard $\lambda_0(u)\exp\{\gamma X_i(u) + \eta^T Z_i\}$: if $\gamma < 0$ and $\eta = 0$, then $X_i(u)$ is a surrogate marker

Complication 1: Time-to-event may be censored
- Underlying censoring variable $C_i$; $(T_i, C_i)$ = (survival, censoring) time for subject $i$
- Observe $V_i = \min(T_i, C_i)$, $\Delta_i = I(T_i \le C_i)$
- Cox (1972, 1975) partial likelihood: maximize
  $$\prod_{i=1}^n \left[ \frac{\exp\{\gamma X_i(V_i) + \eta^T Z_i\}}{\sum_{j=1}^n \exp\{\gamma X_j(V_i) + \eta^T Z_j\}\, I(V_j \ge V_i)} \right]^{\Delta_i}$$

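The partial likelihood above can be sketched numerically as follows. This is a minimal illustration only, assuming a scalar $Z_i$, no tied event times, and that the true values $X_j(V_i)$ are somehow available in an $n \times n$ array. The names `cox_partial_loglik` and `X_at_events` are hypothetical and do not come from the slides.

```python
import numpy as np

def cox_partial_loglik(gamma, eta, V, delta, Z, X_at_events):
    """Log of the Cox partial likelihood with a time-dependent covariate.

    X_at_events[j, i] holds X_j(V_i), the covariate of subject j evaluated
    at subject i's observed time. Assumed known here (the "ideal" case);
    in practice X is only observed intermittently and with error.
    """
    V = np.asarray(V, dtype=float)
    delta = np.asarray(delta, dtype=float)
    Z = np.asarray(Z, dtype=float)
    X = np.asarray(X_at_events, dtype=float)

    # lin[j, i] = gamma * X_j(V_i) + eta * Z_j
    lin = gamma * X + eta * Z[:, None]
    # at_risk[j, i] = I(V_j >= V_i)
    at_risk = V[:, None] >= V[None, :]
    log_denom = np.log(np.sum(np.exp(lin) * at_risk, axis=0))
    # Only subjects with an observed event (delta_i = 1) contribute
    return float(np.sum(delta * (np.diag(lin) - log_denom)))
```

With two subjects, both failing, and $\gamma = \eta = 0$, the first event has a risk set of size 2 and the second a risk set of size 1, so the log partial likelihood is $-\log 2$.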
Complication 2: The partial likelihood presumes $X_i(u)$, $i = 1, \ldots, n$, is available at all failure times
- $X_i(u)$ is only available intermittently
- Subject to error and intra-subject variation

Naive approach:
- Extrapolate $X_i(u)$ at failure times $u$ using the available longitudinal data, and substitute in the partial likelihood
- E.g., use the most recent observed, error-prone covariate value (in place of the unobserved, true covariate): Last Value Carried Forward (LVCF)
- Such substitution leads to biased estimation (Prentice, 1982)

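For concreteness, LVCF for a single subject might look like the sketch below. It assumes measurement times sorted in increasing order, and the function name `lvcf` is hypothetical. It illustrates the naive substitution that the slide warns leads to bias, not a recommended method.

```python
import numpy as np

def lvcf(t, w, u):
    """Return the most recent observed value W(t_j) with t_j <= u.

    t must be sorted in increasing order. Returns nan when no measurement
    was taken at or before u.
    """
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=float)
    k = np.searchsorted(t, u, side="right") - 1  # index of the last t_j <= u
    return float(w[k]) if k >= 0 else float("nan")
```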
2. JOINT MODELS

Popular approach: Joint longitudinal data-survival model
- True longitudinal data process: mixed effects model (+ stochastic process)
- Normal measurement error
- Proportional hazards model depending on the true longitudinal data process

Longitudinal data process: Popular models
- $X_i(u) = \alpha_{0i} + \alpha_{1i}u$, $\alpha_i = (\alpha_{0i}, \alpha_{1i})^T$
- $X_i(u) = \alpha_{0i} + \alpha_{1i}u + \alpha_{2i}u^2 + \cdots + \alpha_{pi}u^p$
- $X_i(u) = \alpha_{0i} + \alpha_{1i}u + U_i(u)$
  - $U_i(u)$ stationary Gaussian process (Henderson, Diggle, Zeger, et al.)
  - $U_i(u)$ integrated Ornstein-Uhlenbeck (IOU) process (Taylor, DeGruttola, et al.)
  - Introduces within-subject autocorrelation, allows a time-varying trend

Ingredients of joint model: Two linked model components

Longitudinal data: At times $t_i = \{t_{ij},\ j = 1, \ldots, m_i\}$
- $W_i(t_{ij}) = X_i(t_{ij}) + e(t_{ij})$
- $e(t_{ij}) \sim N(0, \sigma^2)$, independent of $\alpha_i$, $U_i(u)$
- $\alpha_i$ usually assumed normal (more later...)

Time-to-event data: Proportional hazards model
$$\lambda_i(u) = \lim_{du \to 0} du^{-1}\,\mathrm{pr}\{u \le T_i < u + du \mid T_i \ge u, X_i^H(u), Z_i\} = \lambda_0(u)\exp\{\gamma X_i(u) + \eta^T Z_i\}$$
- Dependence on $\alpha_i$ through $X_i(u)$

Philosophical interlude:
- From a biological point of view, the form of $X_i(u)$ may be dictated by beliefs about underlying biological mechanisms
  - E.g., $X_i(u) = \alpha_{0i} + \alpha_{1i}u$: smooth trend is the predominant feature associated with prognosis
  - E.g., $X_i(u) = \alpha_{0i} + \alpha_{1i}u + U_i(u)$: local "good" or "bad" periods are the predominant feature associated with prognosis
- From an empirical point of view, represent the hazard in terms of relevant features of the longitudinal process
  - Higher-order polynomials, splines versus stochastic process
  - Ease of implementation
- From a practical point of view, intermittency motivates the assumption that within-subject autocorrelation is negligible, so that $U_i(u)$ is absorbed into $e_i(u)$

Observed data: $(V_i, \Delta_i, W_i, Z_i, t_i)$, $i = 1, \ldots, n$, where $V_i = \min(T_i, C_i)$, $\Delta_i = I(T_i \le C_i)$, $W_i = \{W_i(t_{i1}), \ldots, W_i(t_{im_i})\}^T$

Implementation: Assuming normal $\alpha_i$
- Two-stage ("regression calibration"): (i) EBLUPs from a normal mixed model fit at each survival time, (ii) insert in the Cox partial likelihood. Reduces but doesn't eliminate bias (Pawitan & Self, 1993; Tsiatis et al., 1995; Dafni & Tsiatis, 1998; Lavalley & DeGruttola, 1996; Bycott & Taylor, 1998)
- Likelihood: DeGruttola & Tu (1994), Wulfsohn & Tsiatis (1997), Henderson et al. (2000)
- Bayesian: Faucett & Thomas (1996), Xu & Zeger (2001), Wang & Taylor (2001), Brown & Ibrahim (2003)

Issues:
- Sensitivity to assumptions
- Computational complexity

ACTG 175: Assuming $X_i(u)$ = log CD4, $X_i(u) = \alpha_{0i} + \alpha_{1i}u$

[Figure: Histograms of raw individual OLS estimates; panels: intercept, slope.]

3. CONDITIONAL SCORE APPROACH

Our objective: For $X_i(u) = f(u)^T\alpha_i$, e.g., $X_i(u) = \alpha_{0i} + \alpha_{1i}u$
- A simple method (computationally and conceptually straightforward) that yields a consistent, asymptotically normal estimator for $\gamma$, $\eta$...
- ...and that furthermore makes no distributional assumption about the underlying random effects $\alpha_i$
- Semiparametric model/method ($\lambda_0(u)$ and the distribution of $\alpha_i$ unspecified)

Our approach:
- Exploit the conditional score idea of Stefanski & Carroll (1987) for GLIMs
- "Condition away" the random effects $\alpha_i$

Assume: $\sigma^2$ known for now

Notation:
- $\hat{X}_i(u)$ = OLS estimator for $X_i(u)$ based on $t_i(u) = (t_{ij} < u)$ (requires at least 2 measurements prior to $u$)
- $dN_i(u) = I(u \le V_i < u + du,\ \Delta_i = 1,\ t_{i2} \le u)$ puts point mass at $u$ for an observed death time if it occurs after the 2nd measurement
- $Y_i(u) = I(V_i \ge u,\ t_{i2} \le u)$ = at risk with at least 2 measurements at $u$

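As a rough sketch of the OLS step for the straight-line model $X_i(u) = \alpha_{0i} + \alpha_{1i}u$, the function below returns the prediction $\hat{X}_i(u)$ together with the factor $\theta_i(u) = f(u)^T(F^TF)^{-1}f(u)$, so that the prediction variance is $\sigma^2\theta_i(u)$. The name `ols_prediction` is hypothetical. The sketch assumes distinct measurement times and uses only measurements strictly before $u$.

```python
import numpy as np

def ols_prediction(t, w, u):
    """OLS estimate of X(u) = a0 + a1*u and its variance factor theta(u).

    Uses only measurements taken strictly before u. Returns None when fewer
    than 2 such measurements exist (the subject then has Y_i(u) = 0).
    The variance of the estimate is sigma^2 * theta.
    """
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=float)
    keep = t < u
    if keep.sum() < 2:
        return None
    F = np.column_stack([np.ones(keep.sum()), t[keep]])  # design matrix
    f_u = np.array([1.0, u])                              # f(u) for a line
    FtF_inv = np.linalg.inv(F.T @ F)
    alpha_hat = FtF_inv @ F.T @ w[keep]                   # OLS coefficients
    x_hat = float(f_u @ alpha_hat)
    theta = float(f_u @ FtF_inv @ f_u)
    return x_hat, theta
```

Note that $\theta_i(u)$ depends only on the measurement times, not on the responses, which is why the slides describe $\sigma^2\theta_i(u)$ as depending on $t_i(u)$.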
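Assume: Hazard relationship satisfies
$$\begin{aligned}
\lambda_i(u) &= \lim_{du \to 0} du^{-1}\,\mathrm{pr}\{u \le T_i < u + du \mid T_i \ge u, C_i \ge u, t_i(u), \tilde{W}_i(u), \alpha_i, Z_i\} \\
&= \lim_{du \to 0} du^{-1}\,\mathrm{pr}\{u \le T_i < u + du \mid T_i \ge u, C_i \ge u, t_i(u), \tilde{e}_i(u), \alpha_i, Z_i\} \\
&= \lim_{du \to 0} du^{-1}\,\mathrm{pr}\{u \le T_i < u + du \mid T_i \ge u, \alpha_i, Z_i\} \\
&= \lambda_0(u)\exp\{\gamma X_i(u) + \eta^T Z_i\}
\end{aligned}$$
where $\tilde{W}_i(u) = \{W_i(t_{ij}) : t_{ij} < u\}$, $\tilde{e}_i(u) = \{e_i(t_{ij}) : t_{ij} < u\}$

Also assume: The distribution of $e_i(t_{ij})$ given a measurement at $t_{ij}$, at risk at $t_{ij}$, measurement history prior to $t_{ij}$, $\alpha_i$, and $Z_i$ is $N(0, \sigma^2)$

May be shown, with additional assumptions on censoring and timing of measurements:
$$\hat{X}_i(u) \mid Y_i(u) = 1, t_i(u), \alpha_i, Z_i \sim N\{X_i(u), \sigma^2\theta_i(u)\}$$
where $\sigma^2\theta_i(u)$ = prediction variance [depends on $t_i(u)$]
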
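At any time $u$: Conditional on being at risk at $u$,
$$\begin{aligned}
&\mathrm{pr}\{dN_i(u) = r, \hat{X}_i(u) = x \mid Y_i(u) = 1, \alpha_i, Z_i, t_i(u)\} \\
&\quad = \mathrm{pr}\{dN_i(u) = r \mid Y_i(u) = 1, \hat{X}_i(u) = x, \alpha_i, Z_i, t_i(u)\} \times \mathrm{pr}\{\hat{X}_i(u) = x \mid Y_i(u) = 1, \alpha_i, Z_i, t_i(u)\}
\end{aligned}$$
- 1st piece: Bernoulli$[\lambda_0(u)\,du\,\exp\{\gamma X_i(u) + \eta^T Z_i\}]$
- 2nd piece: $N\{X_i(u), \sigma^2\theta_i(u)\}$

At time $u$, the likelihood of $\{dN_i(u), \hat{X}_i(u)\}$ given $Y_i(u) = 1, \alpha_i, Z_i, t_i(u)$ is, to order $du$,
$$\exp\left[ X_i(u)\left\{ \frac{\gamma\sigma^2\theta_i(u)\,dN_i(u) + \hat{X}_i(u)}{\sigma^2\theta_i(u)} \right\} \right] \frac{\{\lambda_0(u)\exp(\eta^T Z_i)\,du\}^{dN_i(u)}}{\{2\pi\sigma^2\theta_i(u)\}^{1/2}} \exp\left\{ -\frac{\hat{X}_i^2(u) + X_i^2(u)}{2\sigma^2\theta_i(u)} \right\}$$
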
Thus: Sufficient statistic for $\alpha_i$ (conditional on $Y_i(u) = 1$):
$$S_i(u, \gamma, \sigma^2) = \gamma\sigma^2\theta_i(u)\,dN_i(u) + \hat{X}_i(u)$$

Suggestion:
- Conditioning on $S_i(u, \gamma, \sigma^2)$ would remove the dependence of the conditional distribution on (the "nuisance parameter") $\alpha_i$
- Form estimating equations in the same spirit as the partial likelihood score equations

Usual partial likelihood equations: $X_i(u)$ known

Intensity:
$$\lim_{du \to 0} du^{-1}\,\mathrm{pr}\{dN_i(u) = 1 \mid X_i(u), Z_i, Y_i(u)\} = \lambda_0(u)\exp\{\gamma X_i(u) + \eta^T Z_i\}Y_i(u) = \lambda_0(u)E_{0i}(u, \gamma, \eta)$$

Estimating equations:
$$\sum_{i=1}^n \int \{X_i(u), Z_i^T\}^T \{dN_i(u) - E_{0i}(u, \gamma, \eta)\lambda_0(u)\,du\} = 0$$
$$\sum_{i=1}^n \{dN_i(u) - E_{0i}(u, \gamma, \eta)\lambda_0(u)\,du\} = 0$$
With $E_0(u, \gamma, \eta) = \sum_{j=1}^n E_{0j}(u, \gamma, \eta)$, the second equation gives $\hat{\lambda}_0(u)\,du = \sum_{j=1}^n dN_j(u) / E_0(u, \gamma, \eta)$

Substitute $\hat{\lambda}_0(u)\,du$, rearrange, and solve in $(\gamma, \eta)$:
$$\sum_{i=1}^n \int \left[ \{X_i(u), Z_i^T\}^T - \frac{E_1(u, \gamma, \eta)}{E_0(u, \gamma, \eta)} \right] dN_i(u) = 0$$
where $E_{1j}(u, \gamma, \eta) = \{X_j(u), Z_j^T\}^T E_{0j}(u, \gamma, \eta)$ and $E_1(u, \gamma, \eta) = \sum_{j=1}^n E_{1j}(u, \gamma, \eta)$

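Conditional score estimating equations: By analogy

Conditional intensity (can show):
$$\lim_{du \to 0} du^{-1}\,\mathrm{pr}\{dN_i(u) = 1 \mid S_i(u, \gamma, \sigma^2), Z_i, t_i(u), Y_i(u)\} = \lambda_0(u)\exp\{\gamma S_i(u, \gamma, \sigma^2) - \gamma^2\sigma^2\theta_i(u)/2 + \eta^T Z_i\}Y_i(u) = \lambda_0(u)E^*_{0i}(u, \gamma, \eta, \sigma^2)$$

Estimating equations:
$$\sum_{i=1}^n \int \{S_i(u, \gamma, \sigma^2), Z_i^T\}^T \{dN_i(u) - E^*_{0i}(u, \gamma, \eta, \sigma^2)\lambda_0(u)\,du\} = 0$$
$$\sum_{i=1}^n \{dN_i(u) - E^*_{0i}(u, \gamma, \eta, \sigma^2)\lambda_0(u)\,du\} = 0$$
With $E^*_0(u, \gamma, \eta, \sigma^2) = \sum_{j=1}^n E^*_{0j}(u, \gamma, \eta, \sigma^2)$, the second equation gives $\hat{\lambda}_0(u)\,du = \sum_{j=1}^n dN_j(u) / E^*_0(u, \gamma, \eta, \sigma^2)$
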
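One way to see where the conditional intensity comes from is sketched below. It works under the assumptions already stated on the slides and to order $du$. Given $S_i(u, \gamma, \sigma^2) = s$, an event ($dN_i(u) = 1$) forces $\hat{X}_i(u) = s - \gamma\sigma^2\theta_i(u)$, while no event forces $\hat{X}_i(u) = s$. Taking the ratio of the joint likelihood at these two points, every factor involving $X_i(u)$, and hence $\alpha_i$, cancels:

```latex
\begin{align*}
\frac{\mathrm{pr}\{dN_i(u) = 1 \mid S_i = s, Y_i(u) = 1, \alpha_i, Z_i, t_i(u)\}}
     {\mathrm{pr}\{dN_i(u) = 0 \mid S_i = s, Y_i(u) = 1, \alpha_i, Z_i, t_i(u)\}}
&= \lambda_0(u)\,du\,\exp(\eta^T Z_i)\,
   \exp\left[ -\frac{\{s - \gamma\sigma^2\theta_i(u)\}^2 - s^2}{2\sigma^2\theta_i(u)} \right] \\
&= \lambda_0(u)\,du\,\exp\{\gamma s - \gamma^2\sigma^2\theta_i(u)/2 + \eta^T Z_i\}.
\end{align*}
```

Because the probability of an event in $[u, u + du)$ is itself of order $du$, this ratio equals the conditional event probability to order $du$, which matches the form $\lambda_0(u)E^*_{0i}(u, \gamma, \eta, \sigma^2)$ for a subject at risk.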
Substitute $\hat{\lambda}_0(u)\,du$, rearrange, and solve in $(\gamma, \eta)$:
$$\sum_{i=1}^n \int \left[ \{S_i(u, \gamma, \sigma^2), Z_i^T\}^T - \frac{E^*_1(u, \gamma, \eta, \sigma^2)}{E^*_0(u, \gamma, \eta, \sigma^2)} \right] dN_i(u) = 0$$
where $E^*_{1j}(u, \gamma, \eta, \sigma^2) = \{S_j(u, \gamma, \sigma^2), Z_j^T\}^T E^*_{0j}(u, \gamma, \eta, \sigma^2)$ and $E^*_1(u, \gamma, \eta, \sigma^2) = \sum_{j=1}^n E^*_{1j}(u, \gamma, \eta, \sigma^2)$

Remarks:
- Easy to compute
- Can substitute $\hat{\sigma}^2$ based on OLS residuals
- Consistent, asymptotically normal, SEs via sandwich
- Reduces to the usual partial likelihood equations when $\sigma^2 = 0$ [so $\hat{X}_i(u) = X_i(u)$]
- Tsiatis and Davidian (2001, Biometrika); Song, Davidian, and Tsiatis (2002, Biostatistics)

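The left-hand side of the conditional score equation can be evaluated with a short sketch like the one below. It is only an illustration under simplifying assumptions that are not from the slides: a straight-line trajectory, a scalar $Z_i$, no tied event times, and measurements strictly before $u$ used for $\hat{X}_j(u)$. The function names are hypothetical. A root-finder applied to this function in $(\gamma, \eta)$ would give the estimator.

```python
import numpy as np

def ols_prediction(t, w, u):
    """OLS prediction of a0 + a1*u from measurements before u, plus theta(u)."""
    t, w = np.asarray(t, dtype=float), np.asarray(w, dtype=float)
    keep = t < u
    if keep.sum() < 2:
        return None  # fewer than 2 measurements: not at risk
    F = np.column_stack([np.ones(keep.sum()), t[keep]])
    f_u = np.array([1.0, u])
    FtF_inv = np.linalg.inv(F.T @ F)
    return float(f_u @ FtF_inv @ F.T @ w[keep]), float(f_u @ FtF_inv @ f_u)

def conditional_score(gamma, eta, sigma2, V, delta, Z, times, W):
    """Sum over observed events of {S_i, Z_i} - E1*/E0* (a length-2 vector)."""
    n = len(V)
    total = np.zeros(2)
    for i in range(n):
        u = V[i]
        # dN_i(u) = 1 requires an observed event after at least 2 measurements
        if delta[i] != 1 or ols_prediction(times[i], W[i], u) is None:
            continue
        E0, E1, own = 0.0, np.zeros(2), None
        for j in range(n):
            pred = ols_prediction(times[j], W[j], u)
            if V[j] < u or pred is None:
                continue  # Y_j(u) = 0
            x_hat, theta = pred
            dN = 1.0 if j == i else 0.0  # no ties assumed
            S = gamma * sigma2 * theta * dN + x_hat
            e0 = np.exp(gamma * S - 0.5 * gamma**2 * sigma2 * theta + eta * Z[j])
            E0 += e0
            E1 += np.array([S, Z[j]]) * e0
            if j == i:
                own = np.array([S, Z[j]])
        total += own - E1 / E0
    return total
```

Setting `sigma2 = 0` with error-free `W` recovers the usual partial likelihood score, in line with the remark on the slide.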
4. SEMIPARAMETRIC LIKELIHOOD APPROACH

Potential drawbacks of the conditional score approach:
- Possible inefficiency
- Includes any distribution for $\alpha_i$

Alternative approach: Likelihood formulation
- Take $X_i(u) = f^T(u)\alpha_i$ as before
- Make a realistic but not overly restrictive assumption on $\alpha_i$

Assume: $\alpha_i$ have density $p(\alpha_i)$ in a class of "smooth" densities $\mathcal{H}$ (Gallant and Nychka, 1987)
- Densities in $\mathcal{H}$ do not have jumps, kinks, or oscillations, but may be skewed, multi-modal, or fat- or thin-tailed relative to the normal (and the normal is in $\mathcal{H}$)
- $\alpha_i = g(\mu, Z_i) + Rz_i$, with $R$ lower triangular and $z_i$ having density $h \in \mathcal{H}$

Representation of $h \in \mathcal{H}$:
- $h(z) = P_\infty^2(z)\varphi_q(z)$ + small lower bound for tail behavior
- $P_\infty(z)$ is an infinite-dimensional polynomial
- $\varphi_q(z)$ is the $q$-variate standard normal density

Practical approximation: Truncate
- $h_K(z) = P_K^2(z)\varphi_q(z)$
- $P_K(z)$ is a $K$th order polynomial; e.g., for $K = 2$ (with $q = 2$),
  $$P_K(z) = a_{00} + a_{10}z_1 + a_{01}z_2 + a_{20}z_1^2 + a_{02}z_2^2 + a_{11}z_1z_2$$
- Vector of coefficients $a$ must satisfy $\int h_K(z)\,dz = 1$
- $K = 0$ is standard normal, so $\alpha_i \sim N\{g(\mu, Z_i), RR^T\}$
- "SemiNonParametric" (SNP)

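To make the truncated representation concrete, here is a minimal univariate ($q = 1$) sketch of $h_K(z) = P_K^2(z)\varphi(z)$. It enforces $\int h_K(z)\,dz = 1$ by rescaling the coefficients with the closed-form standard normal moments. That normalization is a simplification chosen for illustration, and the names `normal_moment` and `snp_density` are hypothetical.

```python
import numpy as np

def normal_moment(m):
    """E(z^m) for z ~ N(0, 1): 0 for odd m, (m - 1)!! for even m."""
    if m % 2:
        return 0.0
    result = 1.0
    for k in range(m - 1, 0, -2):
        result *= k
    return result

def snp_density(z, a):
    """Univariate SNP density P_K(z)^2 * phi(z), with K = len(a) - 1.

    The coefficients a = (a_0, ..., a_K) are rescaled so the density
    integrates to 1; with K = 0 this is the standard normal density.
    """
    a = np.asarray(a, dtype=float)
    K = len(a) - 1
    # integral of P_K^2 * phi equals a' M a, with M[j, k] = E(z^(j + k))
    M = np.array([[normal_moment(j + k) for k in range(K + 1)]
                  for j in range(K + 1)])
    a = a / np.sqrt(a @ M @ a)
    poly = np.polynomial.polynomial.polyval(z, a)  # a_0 + a_1 z + ... + a_K z^K
    phi = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return poly**2 * phi
```

Plotting `snp_density` for a few coefficient vectors shows how modest values of $K$ already produce skewed or bimodal shapes.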
As before: Proportional hazards model assumption and normal errors, same as for the conditional score approach

Implementation issues:
- Polar coordinate reparameterization in terms of a vector of angles $\phi$ to enforce $\int h_K(z)\,dz = 1$ and for numerical stability
- Parameters of interest: $\Omega = \{\lambda_0(\cdot), \gamma, \eta, \sigma^2, \mu, R, \phi\}$
- $K$ controls the degree of flexibility and must be chosen somehow
- Need a likelihood function

Likelihood: Under certain assumptions on the nature of censoring and the timing of measurements, the likelihood as a function of $\Omega$ is
$$\prod_{i=1}^n \int \left[ \lambda_0(V_i)\exp\{\gamma X_i(V_i) + \eta^T Z_i\} \right]^{\Delta_i} \exp\left[ -\int_0^{V_i} \lambda_0(u)\exp\{\gamma X_i(u) + \eta^T Z_i\}\,du \right] \frac{1}{(2\pi\sigma^2)^{m_i/2}} \exp\left[ -\sum_{j=1}^{m_i} \frac{\{W_i(t_{ij}) - X_i(t_{ij})\}^2}{2\sigma^2} \right] h_K(z_i)\,dz_i$$

Implementation: EM algorithm
- E-step: Intractable integration handled by quadrature
- M-step: Maximization in $(\mu, R, \phi)$ and $(\gamma, \eta, \sigma^2, \lambda_0)$ separates; one-step Newton-Raphson update for $(\gamma, \eta)$
- SEs and CIs by profile likelihood

Choice of $K$: Adaptive, based on inspection of information criteria
- If $l_K(\hat{\Omega})$ is the maximized loglikelihood for fixed $K$, $N$ = total number of observations, and $\Omega$ is $(p \times 1)$, minimize
  $$\{-l_K(\hat{\Omega}) + pC(N)\}/N$$
- AIC: $C(N) = 1$; BIC: $C(N) = \log N / 2$; Hannan-Quinn (HQ): $C(N) = \log\log N$
- AIC prefers larger models, BIC smaller, HQ intermediate

Song, Davidian, and Tsiatis (2002), Biometrics

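The selection rule is simple enough to sketch directly. The code below assumes the criterion $\{-l_K(\hat{\Omega}) + pC(N)\}/N$ with the three penalties listed on the slide. The function names and the example loglikelihood values are hypothetical and are not results from the talk.

```python
import math

def info_criterion(loglik, p, N, kind):
    """Return {-loglik + p * C(N)} / N for kind in {"AIC", "BIC", "HQ"}."""
    penalties = {
        "AIC": 1.0,
        "BIC": math.log(N) / 2.0,
        "HQ": math.log(math.log(N)),
    }
    return (-loglik + p * penalties[kind]) / N

def choose_K(fits, N, kind):
    """Pick the K minimizing the criterion; fits maps K -> (loglik, p)."""
    return min(fits, key=lambda K: info_criterion(*fits[K], N, kind))

# Hypothetical fits: K = 1 improves the loglikelihood by 10 at a cost of
# 3 extra parameters.
fits = {0: (100.0, 3), 1: (110.0, 6)}
print(choose_K(fits, 1000, "AIC"), choose_K(fits, 1000, "BIC"))
```

In this made-up example AIC and HQ select the larger model while BIC selects the smaller one, consistent with the ordering of the penalties for $N = 1000$.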
5. SIMULATION EVIDENCE

Simple situation:
- $\eta = 0$, $X_i(u) = \alpha_{0i} + \alpha_{1i}u$
- $E(\alpha_i) = (4.173, 0.0103)^T$, $\gamma = 1.0$, $\lambda_0(u) = 1$
- $C_i \sim \exp(1/110)$, additional censoring at 80 weeks, $\sigma^2 = 0.6$
- Nominal $t_{ij} = (0, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80)$, 10% missing
- Case 1: $\alpha_i$ bivariate mixture of normals
- Case 2: $\alpha_i$ bivariate normal

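Event times for a design of this kind can be generated by inverting the cumulative hazard, which has a closed form when the trajectory is linear and $\lambda_0(u) = 1$. The sketch below is one possible way to do this and is not necessarily how the authors' simulations were run. The function names `event_time` and `observe` are hypothetical, `z` is a scalar covariate, and `e` is a draw from a unit exponential.

```python
import math

def event_time(alpha0, alpha1, gamma, eta, z, e):
    """Invert the cumulative hazard for X(u) = alpha0 + alpha1*u, lambda_0 = 1.

    Solves H(T) = e, where H(t) = integral_0^t exp{gamma*X(u) + eta*z} du
    = a * (exp(c*t) - 1) / c with a = exp(gamma*alpha0 + eta*z) and
    c = gamma*alpha1. Returns math.inf if the cumulative hazard never
    reaches e (possible when c < 0).
    """
    a = math.exp(gamma * alpha0 + eta * z)  # hazard at time 0
    c = gamma * alpha1                      # slope of the log hazard
    if c == 0.0:
        return e / a                        # constant hazard
    arg = 1.0 + c * e / a
    if arg <= 0.0:
        return math.inf
    return math.log(arg) / c

def observe(T, C):
    """Observed time V = min(T, C) and event indicator I(T <= C)."""
    return min(T, C), int(T <= C)
```

In a full simulation, `e` would be drawn with `random.expovariate(1.0)` and the censoring time from the stated censoring distribution, followed by generation of the error-prone measurements up to the observed time.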
Methods:
- I: "Ideal," $X_i(u)$ known for all $u$
- LV: LVCF
- CS: Conditional score, $\sigma^2$ estimated
- K = 0: Likelihood with normal $\alpha_i$
- SNP: Likelihood with $K$ chosen by HQ

Next slide: 200 Monte Carlo replications, $n = 200$, $\sigma^2 = 0.60$
- SD = Monte Carlo standard deviation, SE = average estimated standard error, CP = coverage probability of 95% Wald confidence interval

Case 1: Mixture Scenario

        I      LV     CS     K=0    SNP
  γ     1.02   0.82   1.05   1.01   1.01
  SD    0.09   0.07   0.17   0.10   0.10
  SE    0.09   0.07   0.14   0.11   0.11
  CP    0.96   0.31   0.94   0.97   0.95

Case 2: Normal Scenario

        I      LV     CS     K=0    SNP
  γ     1.01   0.81   1.04   1.00   1.00
  SD    0.08   0.07   0.15   0.10   0.10
  SE    0.08   0.07   0.13   0.10   0.10
  CP    0.96   0.25   0.93   0.96   0.95

Observations:
- Conditional score is inefficient (but much faster...)
- Interesting robustness of the likelihood approach to the specification of the distribution of $\alpha_i$
- Not shown: Likelihood inference on aspects of $\alpha_i$ is compromised by an incorrect distributional assumption
- Advantage of likelihood inference: Insight on the distribution of $\alpha_i$, which can suggest model refinements

[Figure: True and Monte Carlo average density estimates; joint density of $(\alpha_{i0}, \alpha_{i1})$.]

[Figure: Intercept marginal density, average and raw estimates; density versus $\alpha_{i0}$.]

6. EXAMPLE, REVISITED

Model:
- $W_i(t_{ij}) = X_i(t_{ij}) + e(t_{ij})$, $X_i(u) = \alpha_{0i} + \alpha_{1i}u$, $e(t_{ij}) \sim N(0, \sigma^2)$
- $\alpha_i = \mu_0(1 - Z_i) + \mu_1 Z_i + Rz_i$, $Z_i = I(\text{Trt} = \text{ZDV})$
- $\lambda_i(u) = \lambda_0(u)\exp\{\gamma X_i(u) + \eta Z_i\}$
- 2467 subjects

Conditional score: Estimate $(\gamma, \eta, \sigma^2)$ with no assumptions on $\alpha_i$ (so no model for $\alpha_i$)

Likelihood: Assume $h \in \mathcal{H}$ and estimate $\{\gamma, \eta, \sigma^2, \mu, R, \lambda_0(u)\}$ with $K = 0, 1, 2, 3, 4$; all criteria chose $K = 3$ or $4$

ACTG 175 estimates (standard errors in parentheses):

            CS        K=0        K=2        K=3        K=4
  γ         2.214     2.487      2.487      2.490      2.498
            (0.207)   (0.091)    (0.092)    (0.092)    (0.092)
  η         0.145     0.003      0.002      0.001      0.007
            (0.264)   (0.132)    (0.132)    (0.132)    (0.131)
  loglike             8558.465   9018.782   9310.945   9347.163
  AIC                 0.412      0.435      0.449      0.450
  HQ                  0.397      0.418      0.432      0.433
  BIC                 0.364      0.385      0.397      0.396

[Figure: Estimated joint density of $(\alpha_{i0}, \alpha_{i1})$, $K = 4$.]

[Figure: Estimated slope marginal density, $K = 0, 2, 3, 4$; density versus $\alpha_{i1}$.]

7. DISCUSSION

- Naive methods yield biased inferences
- Regression calibration may also yield bias
- Likelihood or methods like conditional score are required and yield unbiased inferences
- Conditional score is easy to compute, readily extends to multiple longitudinal processes, and does not require as restrictive assumptions on censoring and timing of measurements
- But conditional score is inefficient, does not accommodate an additional stochastic process, and only permits inference on hazard parameters
- Full likelihood approaches are computationally intensive; robustness to violation of assumptions is still unclear
- Philosophical issue: Modeling the longitudinal process $X_i(u)$