SEMIPARAMETRIC APPROACHES TO INFERENCE IN JOINT MODELS FOR LONGITUDINAL AND TIME-TO-EVENT DATA

Similar documents
Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

JOINT MODELING OF LONGITUDINAL AND TIME-TO-EVENT DATA: AN OVERVIEW

Joint Modeling of Longitudinal and Time-to-Event Data: An Overview

Some New Methods for Latent Variable Models and Survival Analysis. Latent-Model Robustness in Structural Measurement Error Models.

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

Longitudinal + Reliability = Joint Modeling

Review Article Analysis of Longitudinal and Survival Data: Joint Modeling, Inference Methods, and Issues

Modelling Survival Events with Longitudinal Data Measured with Error

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL

Joint modelling of longitudinal measurements and event time data

2 X. Song et al. 1. Introduction In longitudinal studies, information is collected on a time-to-event (e.g. `survival') and timedependent and time-ind

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Latent-model Robustness in Joint Models for a Primary Endpoint and a Longitudinal Process

Mixed model analysis of censored longitudinal data with flexible random-effects density

STAT331. Cox s Proportional Hazards Model

Parametric Joint Modelling for Longitudinal and Survival Data

Power and Sample Size Calculations with the Additive Hazards Model

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Lecture 5 Models and methods for recurrent event data

Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

On Estimating the Relationship between Longitudinal Measurements and Time-to-Event Data Using a Simple Two-Stage Procedure

Multivariate Survival Analysis

Survival Analysis for Case-Cohort Studies

Estimating the Mean Response of Treatment Duration Regimes in an Observational Study. Anastasios A. Tsiatis.

Joint Modeling of Longitudinal Item Response Data and Survival

Semiparametric Regression

Conditional Estimation for Generalized Linear Models When Covariates Are Subject-specific Parameters in a Mixed Model for Longitudinal Measurements

Personalized Treatment Selection Based on Randomized Clinical Trials. Tianxi Cai Department of Biostatistics Harvard School of Public Health

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Lecture 22 Survival Analysis: An Introduction

A Sampling of IMPACT Research:

Statistical Methods for Joint Analysis of Survival Time and Longitudinal Data

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Multi-state Models: An Overview

FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS FOR LONGITUDINAL AND SURVIVAL DATA

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

Joint Modeling of Survival and Longitudinal Data: Likelihood Approach Revisited

Semiparametric Generalized Linear Models

Modelling geoadditive survival data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Bayesian Semiparametric Methods for Joint Modeling Longitudinal and Survival Data

6 Pattern Mixture Models

Web-based Supplementary Materials for A Robust Method for Estimating. Optimal Treatment Regimes

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes

Separate and Joint Modeling of Longitudinal and Event Time Data Using Standard Computer Packages

Bayesian Methods for Machine Learning

Lecture 3. Truncation, length-bias and prevalence sampling

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University

On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data

Joint Modelling of Accelerated Failure Time and Longitudinal Data

Integrated likelihoods in survival models for highlystratified

University of California, Berkeley

Multilevel Statistical Models: 3 rd edition, 2003 Contents

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Estimation with clustered censored survival data with missing covariates in the marginal Cox model

Part III Measures of Classification Accuracy for the Prediction of Survival Times

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Chapter 2 Inference on Mean Residual Life-Overview

USING TRAJECTORIES FROM A BIVARIATE GROWTH CURVE OF COVARIATES IN A COX MODEL ANALYSIS

Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources

A penalized likelihood approach to joint modeling of longitudinal measurements and time-to-event data

Likelihood-based inference with missing data under missing-at-random

Introduction to Statistical Analysis

Group Sequential Tests for Delayed Responses. Christopher Jennison. Lisa Hampson. Workshop on Special Topics on Sequential Methodology

Latent Variable Model for Weight Gain Prevention Data with Informative Intermittent Missingness

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Survival Prediction Under Dependent Censoring: A Copula-based Approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Alzheimer s Disease Studies

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Contents. Part I: Fundamentals of Bayesian Inference 1

Partly Conditional Survival Models for Longitudinal Data

UNIVERSITY OF CALIFORNIA, SAN DIEGO

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

Multistate Modeling and Applications

Analysing repeated measurements whilst accounting for derivative tracking, varying within-subject variance and autocorrelation: the xtiou command

Stat 579: Generalized Linear Models and Extensions

18 Bivariate normal distribution I

Estimating subgroup specific treatment effects via concave fusion

Bayesian Quantile Regression for Longitudinal Studies with Nonignorable Missing Data

Models for Multivariate Panel Count Data

Residuals and model diagnostics

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Integrated approaches for analysis of cluster randomised trials

PQL Estimation Biases in Generalized Linear Mixed Models

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Generalized Linear Models with Functional Predictors

Censoring mechanisms

DESIGN CONSIDERATIONS FOR COMPLEX SURVIVAL MODELS

A Study of Joinpoint Models for Longitudinal Data

Approximate Bayesian Computation

Statistical Distribution Assumptions of General Linear Models

Transcription:

SEMIPARAMETRIC APPROACHES TO INFERENCE IN JOINT MODELS FOR LONGITUDINAL AND TIME-TO-EVENT DATA Marie Davidian and Anastasios A. Tsiatis http://www.stat.ncsu.edu/ davidian/ http://www.stat.ncsu.edu/ tsiatis/ In part joint work with Xiao Song, Department of Biostatistics, University of Washington Semparametric Inference in Joint Models 1

OUTLINE 1. Introduction and example 2. Joint models 3. Conditional score approach 4. Semiparametric likelihood approach 5. Simulation evidence 6. Example, revisited 7. Discussion Semparametric Inference in Joint Models 2

1. INTRODUCTION AND EXAMPLE Longitudinal studies in medical research: Time-to-event, e.g. death, progression to AIDS, etc. (possibly censored) Intermittent longitudinal measurements (time-dependent) Time-independent covariates, e.g. treatment group, age at baseline, gender, etc Questions of interest: Interrelationships between these variables Semparametric Inference in Joint Models 3

AIDS Clinical Trials Group (ACTG) 175: Compare 4 antiretroviral regimens in asymptomatic HIV-infected patients 2467 subjects, 1991 1994 Primary objective: Compare on basis of time to AIDS or death CD4 counts approximately every 12 weeks Subsequent objective 1: Characterize within-subject patterns of CD4 change Subsequent objective 2: Characterize relationship between features of CD4 profiles and survival Semparametric Inference in Joint Models 4

CD4 profiles: 10 randomly chosen subjects log CD4 2.0 2.2 2.4 2.6 2.8 3.0 0 50 100 150 week Semparametric Inference in Joint Models 5

Ideally: To address these objectives, would like to have for all subjects i T i = time to event X i (u) = longitudinal trajectory for all times u 0 Z i = vector of time-independent covariates Semparametric Inference in Joint Models 6

Objective 1: Characterize within-subject patterns of CD4 change Estimate aspects of average longitudinal trajectories; e.g. E{X i (u)}, var{x i (u)} Trajectory of the average person Difficulties/complications: Data are collected intermittently Possibly measured with error Informative censoring: no measurement available after death! [E.g., Wu and Carroll (1988), Hogan and Laird (1997)] Observe only W i (t ij ), j =1,...,m i,t ij T i W i (t ij )=X i (t ij )+e(t ij ) Semparametric Inference in Joint Models 7

Objective 2: Characterize relationship between features of CD4 profiles and survival Establish relationships between T i and X i (u), u 0, and Z i History X H i (t) ={X i(u),u t} Proportional hazards model: λ i (u) = lim du 0 du 1 pr{u T i <u+ du T i u, Xi H (u),z i } = λ 0 (u)exp{γx i (u)+η T Z i } Interest in estimating γ, η, (q 1). Semiparametric model (baseline hazard λ 0 (u) isunspecified) Focus here: Objective 2 Semparametric Inference in Joint Models 8

Popular focus: Surrogate marker problem Can we replace the time-to-event endpoint by CD4 in assessing treatment efficacy? Roughly speaking (Prentice conditions) Treatment is effective on time-to-event Treatment has effect on marker; e.g. CD4 more on treatment than placebo Effect of treatment should manifest through the marker; i.e., risk of event given specific marker trajectory should be of treatment For example, λ 0 (u)exp{γx i (u)+η T Z i }, If γ<0andη =0? X i (u) isasurrogate marker Semparametric Inference in Joint Models 9

Complication 1: Time-to-event may be censored Underlying censoring variable C i (T i,c i ) = (survival,censoring) time for i Observe V i =min(t i,c i ), i = I(T i C i ) Cox (1972, 1975) partial likelihood: maximize [ n exp{γx i (V i )+η T Z i } n j=1 exp{γx j(v i )+η T Z j }I(V j V i ) i=1 ] i Semparametric Inference in Joint Models 10

Complication 2: Presumes X i (u),i=1,...,n is available at all failure times X i (u) only available intermittently Subject to error and intra-subject variation Naive approach: Extrapolate X i (u) at failure times u using available longitudinal data, substitute in partial likelihood E.g. use most recent observed, error-prone covariate value (in place of unobserved, true covariate) Last Value Carried Forward (LVCF) Such substitution biased estimation (Prentice, 1982) Semparametric Inference in Joint Models 11

2. JOINT MODELS Popular approach: Joint longitudinal data-survival model True longitudinal data process: mixed effects model (+ stochastic process) Normal measurement error Proportional hazards model depending on true longitudinal data process Semparametric Inference in Joint Models 12

Longitudinal data process: Popular models X i (u) =α 0i + α 1i u, α i =(α 0i,α 1i ) T X i (u) =α 0i + α 1i u + α 2i u 2 + + α pi u p X i (u) =α 0i + α 1i u + U i (u) U i (u) stationary Gaussian process (Henderson, Diggle, Zeger, et al.) U i (u) integrated Ornstein-Uhlenbeck (IOU) process (Taylor, DeGruttola, et al.) Introduce within-subject autocorrelation, allow time-varying trend Semparametric Inference in Joint Models 13

Ingredients of joint model: Two linked model components Longitudinal data: At times t i = {t ij, j =1,...,m i } W i (t ij )=X i (t ij )+e(t ij ) e(t ij ) N(0,σ 2 ) of α i, U i (u) α i usually assumed normal (more later... ) Time-to-event data: Proportional hazards model λ i (u) = lim du 0 du 1 pr{u T i <u+ du T i u, Xi H (u),z i } = λ 0 (u)exp{γx i (u)+η T Z i } Dependence on α i through X i (u) Semparametric Inference in Joint Models 14

Philosophical interlude: From biological point of view, form of X i (u) may be dictated by beliefs in underlying biological mechanisms E.g., X i (u) =α 0i + α 1i u smooth trend is predominant feature associated with prognosis E.g., X i (u) =α 0i + α 1i u + U i (u) local good or bad periods predominant feature associated with prognosis From empirical point of view, represent hazard in terms of relevant features of longitudinal process Higher-order polynomials, splines versus stochastic process Ease of implementation From practical point of view intermittency motivates assumption that within-subject autocorrelation negligible U i (u) absorbed intoe i (u) Semparametric Inference in Joint Models 15

Observed data: (V i, i,w i,z i,t i ), i =1,...,m V i =min(t i,c i ), i = I(T i C i ),W i = {W (t i1 ),...,W(t imi )} T Implementation: Assuming normal α i Two-stage ( Regression calibration ): (i) EBLUPs from normal mixed model fit at each survival time, (ii) insert in Cox partial likelihood reduces but doesn t eliminate bias (Pawitan & Self, 1993; Tsiatis et al., 1995; Dafni & Tsiatis, 1998; Lavalley and DeGruttola, 1996; Bycott & Taylor, 1998) Likelihood: DeGruttola & Tu (1994), Wulfsohn & Tsiatis (1997), Henderson et al. (2000) Bayesian: Faucett & Thomas (1996), Xu and Zeger (2001), Wang and Taylor (2001), Brown & Ibrahim (2003) Issues: Sensitivity to assumptions Computational complexity Semparametric Inference in Joint Models 16

ACTG 175: Assuming X i (u) = log CD4, X i (u) =α 0i + α 1i u Raw individual OLS estimates 0.0 1.0 2.0 3.0 1.5 2.0 2.5 3.0 intercept 0 50 100 150 200 250-0.02-0.01 0.0 0.01 slope Semparametric Inference in Joint Models 17

3. CONDITIONAL SCORE APPROACH Our objective: X i (u) =f(u) T α i ; e.g., X i (u) =α 0i + α 1i u Simple method (computationally and conceptually straightforward) that yields consistent, asymptotically normal estimator for γ, η... And that furthermore makes no distributional assumption about the underlying random effects α i Semiparametric model/method (λ 0 (u) and distribution of α i unspecified) Our approach: Exploit the conditional score idea of Stefanski & Carroll (1987) for GLIMs Condition away the random effects α i Semparametric Inference in Joint Models 18

Assume: σ 2 known for now Notation: ˆX i (u) OLS estimator for X i (u) based on t i (u) =(t ij <u) (requires 2 measurements prior to u) dn i (u) =I(u V i <u+ du, i =1,t i2 u) puts point mass at u for observed death time if after 2nd measurement Y i (u) =I(V i u, t i2 u) =at risk with 2 measurements at u Semparametric Inference in Joint Models 19

Assume: Hazard relationship satisfies λ i (u) = lim i <u+ du T i u, C i u, t i (u), W i (u),α i,z i ) du 0 = lim i <u+ du T i u, C i u, t i (u), ẽ i (u),α i,z i ) du 0 = lim i <u+ du T i u, α i,z i ) du 0 = λ 0 (u)exp{γx i (u)+η T Z i }. W i (u) ={W i (t ij ): t ij <u}, ẽ i (u) ={e i (t ij ): t ij <u} Also assume: Distribution of e i (t ij ) given measurement at t ij,at risk at t ij, measurement history prior to t ij, α i, Z i is N (0,σ 2 ) May be shown with additional assumptions on censoring, timing of measurements ˆX i (u) Y i (u) =1,t i (u),α i,z i N{X i (u),σ 2 θ i (u)} σ 2 θ i (u) =prediction variance [depends on t i (u)] Semparametric Inference in Joint Models 20

At any time u: Conditional on being at risk at u pr{dn i (u) =r, ˆX i (u) =x Y i (u) =1,α i,z i,t i (u)} =pr{dn i (u) =r Y i (u) =1, ˆX i (u) =x, α i,z i,t i (u)} pr{ ˆX i (u) =x Y i (u) =1,α i,z i,t i (u)} 1st piece: Bernoulli[λ 0 (u)du exp{γx i (u)+η T Z i }] 2nd piece: N{X i (u),σ 2 θ i (u)} At time u, likelihood {dn i (u), ˆX i (u)} Y i (u) =1,α i,z i,t i (u): To order du [ { γσ 2 θ i (u)dn i (u)+ =exp X i (u) ˆX }] i (u) σ 2 θ i (u) { {λ 0(u)exp(η T Z i )du} dn i(u) exp ˆX } i 2(u)+X2 i (u) {2πσ 2 θ i (u)} 1/2 2σ 2 θ i (u) Semparametric Inference in Joint Models 21

Thus: Sufficient statistic for α i (conditional on Y i (u) =1) S i (u, γ, σ 2 )=γσ 2 θ i (u)dn i (u)+ ˆX i (u) Suggestion: Conditioning on S i (u, γ, σ 2 ) would remove the dependence of the conditional distribution on (the nuisance parameter ) α i Form estimating equations in same spirit as the partial likelihood score equations Semparametric Inference in Joint Models 22

Usual partial likelihood equations: X i (u) known Intensity lim du 0 du 1 pr{dn i (u) =1 X i (u),z i,y i (u)} = λ 0 (u)exp{γx i (u)+η T Z i }Y i (u) =λ 0 (u)e 0i (u, γ, η) Estimating equations n {X i (u),zi T } T {dn i (u) E 0i (u, γ, η, σ 2 )λ 0 (u)du} =0 i=1 n {dn i (u) E 0i (u, γ, η, σ 2 )λ 0 (u)du} =0 i=1 E 0 (u, γ, η) = n j=1 E 0j(u, γ, η), ˆλ 0 (u)du = n j=1 dn j(u)/e 0 (u, γ, η) Substitute ˆλ 0 (u)du, rearrange, solve in (γ,η) n [ {X i (u),zi T } T E ] 1(u, γ, η) dn i (u) =0 E 0 (u, γ, η) i=1 E 1j (u, γ, η) ={X i (u),zi T } T E 0j (u, γ, η), E 1 (u, γ, η) = n j=1 E 1j(u, γ, η) Semparametric Inference in Joint Models 23

Conditional score estimating equations: By analogy Conditional intensity canshow lim du 0 du 1 pr{dn i (u) =1 S i (u, γ, σ 2 ),Z i,t i (u),y i (u)} = λ 0 (u)exp{γs i (u, γ, σ 2 ) γ 2 σ 2 θ i (u)/2+η T Z i }Y i (u) = λ 0 (u)e 0i(u, γ, η, σ 2 ) Estimating equations n {S i (u, γ, σ 2 ),Zi T } T {dn i (u) E0i(u, γ, η, σ 2 )λ 0 (u)du} =0 i=1 n {dn i (u) E0i(u, γ, η, σ 2 )λ 0 (u)du} =0 i=1 E 0 (u, γ, η, σ 2 )= n j=1 E 0j(u, γ, η, σ 2 ), ˆλ 0 (u)du = n j=1 dn j(u)/e 0 (u, γ, η, σ 2 ) Semparametric Inference in Joint Models 24

Substitute ˆλ 0 (u)du, rearrange solve in (γ,η) n [ {S i (u, γ, σ 2 ),Zi T } T E 1(u, γ, η, σ 2 ] ) E0 (u, γ, η, dn i (u) =0 σ2 ) i=1 E1j(u, γ, η, σ 2 )={S j (u, γ, σ 2 ),Zj T } T E0j(u, γ, η, σ 2 ), E1 (u, γ, η, σ 2 )= n j=1 E 1j(u, γ, η, σ 2 ) Remarks: Easy to compute Can substitute ˆσ 2 based on OLS residuals Consistent, asymptotically normal, SEs via sandwich Reduces to usual partial likelihood equations when σ 2 =0[so ˆX i (u) =X i (u)] Tsiatis and Davidian (2001, Biometrika), Song, Davidian, and Tsiatis (2002, Biostatistics) Semparametric Inference in Joint Models 25

4. SEMIPARAMETRIC LIKELIHOOD APPROACH Potential drawbacks of conditional score approach: Possible inefficiency Includes any distribution for α i Alternative approach: Likelihood formulation Take X i (u) =f T (u)α i as before Make realistic but not overly restrictive assumption on α i Assume: α i have density p(α i )inaclassof smooth densities H (Gallant and Nychka, 1987) Densities in H do not have jumps, kinks or oscillations but may be skewed, multi-modal, fat- or thin-tailed relative to the normal (and the normal is H) α i = g(µ, Z i )+Rz i, R lower triangular, z i has density h H Semparametric Inference in Joint Models 26

Representation of h H: h(z) =P (z)ϕ 2 q (z)+small lower bound for tail behavior P (z) is an infinite-dimensional polynomial ϕ q (z) isq-variate standard normal density Practical approximation: Truncate h K (z) =PK(z)ϕ 2 q (z) P K (z) iskth order polynomial; e.g., for K =2 P K (z) =a 00 + a 10 z 1 + a 01 z 2 + a 20 z1 2 + a 02 z2 2 + a 11 z 1 z 2 Vector of coefficients a must satisfy h K (z) dz =1 K =0isstandard normal α i N{g(µ, Z i ),RR T } SemiNonParametric (SNP) Semparametric Inference in Joint Models 27

n i=1 As before: Proportional hazard model assumption, normal errors same as for conditional score approach Implementation issues: Polar coordinate reparameterization in terms of vector of angles φ to enforce h K (z) dz = 1, numerical stability Parameters of interest: Ω = {λ 0 ( ),γ,η,σ 2,µ,R,φ} K controls degree of flexibility, must be chosen somehow Need a likelihood function Likelihood: Under certain assumptions on nature of censoring, timing of measurements, likelihood as function of Ω [ λ 0 (V i )exp{γx i (V i )+η T Z i }] i exp [ [ 1 (2πσ 2 ) exp m i/2 m i j=1 Vi 0 ] λ 0 (u)exp{γx i (u)+η T Z i } du ] {W i (t ij ) X i (t ij )} 2 h 2σ 2 K (z i ) dz i Semparametric Inference in Joint Models 28

Implementation: EM algorithm E-step: Intractable integration by quadrature M-step: Maximization in (µ, R, φ) and(γ,η,σ 2,λ 0 ) separates; one-step NR update for (γ,η) SEs, CIs by profile likelihood Choice of K: Adaptive based on inspection of information criteria If l K ( Ω) is maximized loglikelihood for fixed K, N = total number of observations, Ω (p 1), minimize { l K ( Ω) + pc(n)}/n AIC, C(N) = 1; BIC, C(N) = log N/2; Hannan-Quinn (HQ), C(N) = log log N AIC prefers larger models, BIC smaller, HQ intermediate Song, Davidian, and Tsiatis (2002), Biometrics Semparametric Inference in Joint Models 29

5. SIMULATION EVIDENCE Simple situation: η =0 X i (u) =α 0i + α 1i u E(α i )=(4.173, 0.0103) T, γ = 1.0, λ 0 (u) =1, C i exp(1/110), additional censoring at 80 weeks, σ 2 =0.6 Nominal t ij =(0, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80), 10% missing Case 1: α i bivariate mixture of normals Case 2: α i bivariate normal Semparametric Inference in Joint Models 30

Methods: I, Ideal, X i (u) known for all u LV, LVCF CS, Conditional score, σ 2 estimated K = 0, likelihood with normal α i SNP, K chosen by HQ Next slide: 200 Monte Carlo replications, n = 200, σ 2 =0.60 MC SD, Average estimated SE, CP of 95% Wald CI Semparametric Inference in Joint Models 31

I LV CS K =0 SNP Case 1: Mixture Scenario γ 1.02 0.82 1.05 1.01 1.01 SD 0.09 0.07 0.17 0.10 0.10 SE 0.09 0.07 0.14 0.11 0.11 CP 0.96 0.31 0.94 0.97 0.95 Case 2: Normal Scenario γ 1.01 0.81 1.04 1.00 1.00 SD 0.08 0.07 0.15 0.10 0.10 SE 0.08 0.07 0.13 0.10 0.10 CP 0.96 0.25 0.93 0.96 0.95 Semparametric Inference in Joint Models 32

Observations: Conditional score is inefficient (but much faster... ) Interesting robustness to specification of distribution of α i Not shown: Likelihood inference on aspects of α i compromised by incorrect distributional assumption Advantage of likelihood inference: Insight on distribution of α i model refinements Semparametric Inference in Joint Models 33

0-0.5-1 -5 0 5 10 15 αi0 1 0.5 αi1-0.5-1 -5 0 5 10 15 αi0 1 0.5 0 αi1 density 0 0.20.40.6 0.8 1 density 0 0.20.40.6 0.8 1 IWSM2003 True and MC average density estimates: Semparametric Inference in Joint Models 34

Intercept marginal Average and raw estimates: density 0.0 0.10 0.20 0.30-5 0 5 10 15 α i0 density 0.0 0.10 0.20 0.30 5 0 5 10 15 α i0 Semparametric Inference in Joint Models 35

6. EXAMPLE, REVISITED Model: W (t ij )=X i (t ij )+e(t ij ), X i (u) =α 0i + α 1i u, e(t ij ) N(0,σ 2 ) α i = µ 0 (1 Z i )+µ 1 Z i + Rz i, Z i = I(Trt=ZDV) λ i (u) =λ 0 (u)exp{γx i (u)+ηz i } 2467 subjects Conditional Score: Estimate (γ,η,σ 2 ) with no assumptions on α i (so no model for α i ) Likelihood: Assume h Hand estimate {γ,η,σ 2,µ,R,λ 0 (u)} with K =0, 1, 2, 3, 4; all criteria chose K =3or4 Semparametric Inference in Joint Models 36

CS K =0 K =2 K =3 K =4 γ 2.214 2.487 2.487 2.490 2.498 (0.207) (0.091) (0.092) (0.092) (0.092) η 0.145 0.003 0.002 0.001 0.007 (0.264) (0.132) (0.132) (0.132) (0.131) loglike 8558.465 9018.782 9310.945 9347.163 AIC 0.412 0.435 0.449 0.450 HQ 0.397 0.418 0.432 0.433 BIC 0.364 0.385 0.397 0.396 Semparametric Inference in Joint Models 37

-0.02 1.5 2 2.5 3 αi0 0-0.01 αi1 0 200400600 density 800 IWSM2003 Estimated joint density, K =4: Semparametric Inference in Joint Models 38

Estimated slope marginal, K =0, 2, 3, 4: density 0 50 100 150 200 250-0.02-0.01 0.0 0.01 α i1 Semparametric Inference in Joint Models 39

7. DISCUSSION Naive methods yield biased inferences Regression calibration may also yield bias Likelihood or methods like conditional score are required, yield unbiased inferences Conditional score is easy to compute, readily extends to multiple longitudinal processes, and does not require as restrictive assumptions on censoring and timing of measurements But is inefficient, does not accommodate additional stochastic process, only permits inference on hazard parameters Full likelihood approaches are computationally intensive; robustness to violation of assumptions still unclear Philosophical issue: Modeling the longitudinal process X i (u) Semparametric Inference in Joint Models 40