CASE STUDY: Bayesian Incidence Analyses from Cross-Sectional Data with Multiple Markers of Disease Severity

Outline:
1. NIEHS Uterine Fibroid Study
   Design of Study
   Scientific Questions
   Difficulties
2. General Problem and Earlier Approaches
3. Bayesian Modeling Framework
   Stochastic structure
   Prior Elicitation
   MCMC Scheme
4. Application to Fibroid Data & Results
5. Discussion

Studying Uterine Leiomyoma (Fibroids)
Background:
  Uterine fibroids are smooth muscle tumors
  Fibroids can cause bleeding, pain, infertility, and pregnancy complications
  Fibroids can lead to hysterectomy
  Fibroids typically regress after menopause
  African Americans have a higher clinical rate
Interests: Inferences on black-white differences in
  Age-specific rate of preclinical onset
  Rate of progression after onset

NIEHS Uterine Fibroid Study (Donna Baird, PI)
Design: Cross-Sectional Screening Study
Participants: Sample from D.C. HMO
  840 African American, 524 whites
  Aged 35-49, pre- and postmenopausal
Data:
  Clinical history (age at first diagnosis)
  Current presence of detectable fibroids
  Severity (length of uterus, size/number of tumors)
  Age at myomectomy, hysterectomy or menopause

Goals:
1. Estimate cumulative incidence
2. Compare black & white incidence
3. Assess differences in preclinical progression
Problem:
  Data are cross-sectional
  Age at onset is interval censored
  Current severity depends on age at onset
  Informatively missing data
What can we do?

Multistate Modeling
Figure 1. Progressive three-state model of the onset and diagnosis process.
  State 1: Disease-free (or not detectable)
  State 2: Preclinical detectable disease, entered at age $R$ with rate $\lambda(t)$
  State 3: Clinical disease, entered at age $S$ with rate $\alpha(t)$

Summary of NIEHS uterine fibroid data

                                               Surrogate data (a)
Race    Current status        State   Number   Uterus (cm)    Tumor rank (b)
Black   No fibroids           1       130
        Preclinical           2       185      9.07 (2.17)    1.81 (.76)
        History of fibroids   3       420
        No history, missing   ?       105
        All black                     840
White   No fibroids           1       190
        Preclinical           2       140      8.62 (1.48)    1.49 (.65)
        History of fibroids   3       125
        No history, missing   ?        69
        All white                     524

(a) Mean (sd) among women with leiomyoma detected at screening.
(b) Ordinal measure of size/number of tumors was averaged.

Earlier Approaches
  Dunson and Baird (2001, Biometrics): Incorporates diagnostic history in estimating incidence.
  Ryan and Orav (1988, Biometrika): Tumor severity incorporated as covariates in modeling rate of natural death (tumorigenicity experiments).
  Chen et al. (2000, Biometrics): Time-homogeneous Markov modeling approach that allows progression between states within the preclinical stage.
  Craig et al. (1999, Stats in Med): Bayesian analysis of interval-censored disease data, with severity data incorporated as covariates.
A new approach is needed for inference on progression.

Discrete-Time Stochastic Modeling Approach
  $R$ = Age at Preclinical Onset (Entry State 2)
  $S$ = Age at Clinical Diagnosis (Entry State 3)
  $I_j = (t_{j-1}, t_j]$ for $j = 1, \ldots, J$, with $t_0 = 0$
Transition Rates:
  $\lambda_j = \Pr(R \in I_j \mid R > t_{j-1})$   Onset Rate
  $\alpha_j = \Pr(S \in I_j \mid S > t_{j-1}, R \le t_j)$   Diagnosis Rate
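
The onset rates above determine the cumulative incidence curve directly, since Pr(R in I_j) is lambda_j times the probability of remaining disease-free through interval j-1. The sketch below illustrates that arithmetic only; the function names and the example rates are hypothetical and are not taken from the study.

```python
def onset_interval_probs(onset_rates):
    """Pr(R in I_j) = lambda_j * prod_{h<j} (1 - lambda_h), for j = 1..J."""
    probs = []
    still_disease_free = 1.0  # Pr(R > t_{j-1})
    for lam in onset_rates:
        if not 0.0 <= lam <= 1.0:
            raise ValueError("each onset rate must lie in [0, 1]")
        probs.append(still_disease_free * lam)
        still_disease_free *= 1.0 - lam
    return probs


def cumulative_incidence(onset_rates):
    """Pr(R <= t_j) = 1 - prod_{h<=j} (1 - lambda_h), for j = 1..J."""
    total = 0.0
    curve = []
    for p in onset_interval_probs(onset_rates):
        total += p
        curve.append(total)
    return curve


# Hypothetical onset rates for three intervals (not estimates from the study).
example_rates = [0.20, 0.05, 0.05]
print(cumulative_incidence(example_rates))
```

Applied to posterior draws of the onset rates, the same calculation would give a posterior sample of the cumulative incidence curve.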

Marker Process:
  $Z_k$ = $k$th Measure of Severity at $T$ (Screen)
  $Z_k^*$ = Normal variable underlying $Z_k$
  $Z_k = g_k(Z_k^*; \tau_k)$   Link model
Underlying severity model:
  $(Z_k^* \mid R \in I_j, T \in I_l, j \le l, S > t_l) \sim N\left(\sum_{h=1}^{l-j} \mu_{hk}, 1\right)$
  Conditional expectation of $Z_k^*$ is 0 when $j = l$
  Expectation depends on waiting time in disease state
  Accommodates discrete and continuous measures
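
As a rough illustration of the underlying severity model, the sketch below computes the conditional mean of the latent marker as the sum of the first l - j increments and draws a unit-variance normal around it. The function names, the list layout for the increments (position h-1 holds mu_hk), and the example values are assumptions made for this sketch.

```python
import random


def latent_mean(mu_k, j, l):
    """Sum of mu_hk for h = 1..(l - j): the mean of Z*_k given onset in I_j and screen in I_l."""
    if j > l:
        raise ValueError("onset interval j must not exceed screening interval l")
    waiting = l - j  # number of intervals spent in the preclinical state
    if waiting > len(mu_k):
        raise ValueError("not enough increments supplied for this waiting time")
    return sum(mu_k[:waiting])


def draw_latent_marker(mu_k, j, l, rng):
    """Draw Z*_k from N(latent_mean, 1)."""
    return rng.gauss(latent_mean(mu_k, j, l), 1.0)


# Hypothetical increments (not estimates from the study).
mu_example = [0.1, 0.2, 0.3]
rng = random.Random(1)
print(latent_mean(mu_example, 3, 3))  # onset and screen in the same interval
print(draw_latent_marker(mu_example, 1, 3, rng))
```

When j equals l the sum is empty, so the mean is zero, which matches the statement that the conditional expectation is 0 in that case.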

Bayesian Semiparametric Analysis
Regression models for the rate parameters:
  $\lambda_{ij} = h_1(\omega_j + x_{ij}'\beta)$   Onset Rate
  $\alpha_{ij} = h_2(\nu_j + x_{ij}'\psi)$   Diagnosis Rate
  $\mu_{ihk} = u_h'\upsilon_k + x_{ih}'\kappa_k + \sigma\xi_i$   Progression, $\xi_i \sim N(0, 1)$
where
  $\omega_j, \nu_j, \upsilon_k$ = Baseline parameters
  $\beta, \psi, \kappa_k$ = Regression parameters
  $\sigma\xi_i$ = Subject-specific effect
Prior distributions:
  Normal for $\beta$, $\psi$, $\{\upsilon_k, \kappa_k\}$ and $\sigma$.
  For the baseline rate parameters, let
    $\omega_j - \tfrac{1}{2}(\omega_{j-1} + \omega_{j+1}) \sim \mathrm{Gamma}(c_1, d_1)$
    $\nu_j - \tfrac{1}{2}(\nu_{j-1} + \nu_{j+1}) \sim \mathrm{Gamma}(c_2, d_2)$
  The degree of smoothing towards local linearity depends on $c$, $d$ for small samples
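
To make "smoothing towards local linearity" concrete, the sketch below computes, for each interior interval, the gap between a baseline parameter and the average of its two neighbours, which is the quantity the smoothing prior is placed on as the slide is read here. The function name and example sequences are hypothetical.

```python
def local_linearity_deviations(omega):
    """Return omega_j - (omega_{j-1} + omega_{j+1}) / 2 for each interior j."""
    return [
        omega[j] - 0.5 * (omega[j - 1] + omega[j + 1])
        for j in range(1, len(omega) - 1)
    ]


# A perfectly linear baseline has zero deviation everywhere.
print(local_linearity_deviations([0.0, 1.0, 2.0, 3.0]))
# A spike in the middle produces a large deviation at that interval.
print(local_linearity_deviations([0.0, 2.0, 0.0]))
```

A prior that concentrates these deviations near zero pulls the baseline sequence towards a locally linear shape.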

Posterior Computation using MCMC
1. For subjects with disease, sample interval of entry.
2. Sample underlying variables $\{Z_{ik}^*\}$ and link parameters $\tau$.
3. Update the parameters $\{\omega_j\}$, $\beta$, $\{\nu_j\}$, $\psi$ in onset and diagnosis process.
4. Update the parameters $\{\upsilon_k, \kappa_k\}$, $\sigma$ and latent variables $\{\xi_i\}$ in disease progression process.
5. Repeat steps 1-4.
Conditional on onset interval, likelihood is simple
MCMC algorithm is straightforward to implement
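
Step 1 amounts to drawing a latent onset interval from a discrete distribution over the intervals up to the screening interval. The sketch below is a deliberately simplified illustration, not the authors' sampler: it weights each candidate interval by the onset probability and the normal density of a single latent marker, and it omits the diagnosis-rate factor that a full conditional would also include. All names and numbers are hypothetical.

```python
import math
import random


def onset_log_weights(onset_rates, mu_k, z_star, l):
    """Unnormalised log weights for onset in interval j = 1..l (simplified, see lead-in)."""
    weights = []
    log_disease_free = 0.0  # log prod_{h<j} (1 - lambda_h)
    for j in range(1, l + 1):
        lam = onset_rates[j - 1]  # assumed strictly between 0 and 1
        mean = sum(mu_k[: l - j])  # latent marker mean after l - j intervals
        log_onset = log_disease_free + math.log(lam)
        log_marker = -0.5 * (z_star - mean) ** 2  # N(mean, 1) up to a constant
        weights.append(log_onset + log_marker)
        log_disease_free += math.log1p(-lam)
    return weights


def sample_interval(log_weights, rng):
    """Draw a 1-based interval index with probability proportional to exp(log weight)."""
    top = max(log_weights)  # subtract the maximum for numerical stability
    probs = [math.exp(w - top) for w in log_weights]
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


# Hypothetical inputs (not estimates from the study).
rng = random.Random(0)
log_w = onset_log_weights([0.1, 0.1, 0.1], [0.5, 0.5], z_star=1.0, l=3)
print(sample_interval(log_w, rng))
```

Once the onset interval is imputed, the waiting time in the preclinical state is known, which is why the remaining updates become simple.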

Application to NIEHS Uterine Fibroid Data
Interest: Black-white differences in onset/progression
Assumption: Preclinical disease menopause
Transition Rate Models:
  $\log\{-\log(1 - \lambda_{ij})\} = \omega_j + x_i \left(1_{(j = 1)}, 1_{(2 \le j \le 10)}, 1_{(11 \le j \le 14)}\right)\beta$,
  $\log\{-\log(1 - \alpha_{ij})\} = \omega_j (1 - x_i) + \beta_j x_i$,
  $x_i = 1$ for blacks, $x_i = 0$ for whites
Age is divided into $J = 14$ intervals: $(0, 36], (36, 37], (37, 38], \ldots, (48, 49]$.
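
The onset model uses a complementary log-log link, so a linear predictor eta maps back to a rate through 1 - exp(-exp(eta)). The sketch below shows that mapping together with one reading of the race effect, in which separate coefficients apply to interval 1, intervals 2 to 10, and intervals 11 to 14. The function names and parameter values are hypothetical, and the grouping of intervals is an interpretation of the slide.

```python
import math


def cloglog(p):
    """Complementary log-log transform: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def inv_cloglog(eta):
    """Inverse transform: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def age_period(j):
    """Index of the race coefficient that applies to interval j (assumed grouping)."""
    if j == 1:
        return 0
    if 2 <= j <= 10:
        return 1
    if 11 <= j <= 14:
        return 2
    raise ValueError("interval index must be between 1 and 14")


def onset_rate(omega_j, beta, j, x_i):
    """Onset rate for interval j; x_i is 1 for blacks and 0 for whites."""
    eta = omega_j + x_i * beta[age_period(j)]
    return inv_cloglog(eta)


# Hypothetical parameter values (not estimates from the study).
beta_example = [0.5, 0.1, -0.1]
print(onset_rate(-2.0, beta_example, j=1, x_i=0))
print(onset_rate(-2.0, beta_example, j=1, x_i=1))
```

With this link, a positive coefficient raises the onset rate for blacks relative to whites in the corresponding age period.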

Disease Progression Model
  $Z_1$ = Length of uterus
  $Z_2$ = 1-3 ranking of size/number of tumors
Link Model:
  $Z_{i1} = \tau_{11} + \tau_{21} Z_{i1}^*$ and
  $Z_{i2} = \begin{cases} 1 & Z_{i2}^* \le \tau_{12} \\ 2 & \tau_{12} < Z_{i2}^* \le \tau_{22} \\ 3 & \tau_{22} < Z_{i2}^* \end{cases}$
  $\tau$ are nuisance link parameters
Underlying Variable Model:
  $\mu_{ihk} = \upsilon_k + x_i \kappa_k + \sigma\xi_i$ for $k = 1, 2$ and $h = 1, \ldots, 13$,
  $\upsilon_k$ = Underlying rate of change in marker $k$ for whites
  $\kappa_k$ = Black-white difference in underlying rate
  $\sigma\xi_i$ = Woman-specific factor
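
The two link functions can be written out as a short sketch: a linear map for the continuous marker and a pair of thresholds for the ordinal one. The function names and the example parameter values are hypothetical and chosen only for illustration.

```python
def uterine_length(z_star, tau11, tau21):
    """Continuous marker: linear function of the underlying normal variable."""
    return tau11 + tau21 * z_star


def tumor_rank(z_star, tau12, tau22):
    """Ordinal marker: 1, 2 or 3 according to where z_star falls relative to the thresholds."""
    if tau12 >= tau22:
        raise ValueError("thresholds must satisfy tau12 < tau22")
    if z_star <= tau12:
        return 1
    if z_star <= tau22:
        return 2
    return 3


# Hypothetical link parameters (illustrative only).
print(uterine_length(0.5, tau11=8.0, tau21=1.5))
print([tumor_rank(z, tau12=0.1, tau22=1.4) for z in (-1.0, 0.5, 2.0)])
```

Because both markers are functions of normal variables on a common latent scale, markers measured on different scales can share the same progression model.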

Bayesian Analysis
  Prior distributions were chosen
  30,000 MCMC iterates collected (5000 burn-in)
  Posterior Summaries

Progression Parameters: Posterior Summaries

Surrogate                  Parameter     Mean     90% Credible Interval
Uterine Length ($Z_1$)     $\tau_{11}$   8.196    (7.909, 8.495)
                           $\tau_{21}$   1.465    (1.334, 1.607)
                           $\upsilon_1$  0.090    (0.028, 0.155)
                           $\kappa_1$    0.074    (0.014, 0.136)
Tumor Size/Number ($Z_2$)  $\tau_{12}$   0.149    (-0.269, 0.591)
                           $\tau_{22}$   1.399    (0.884, 1.968)
                           $\upsilon_2$  0.030    (0.001, 0.095)
                           $\kappa_2$    0.064    (0.002, 0.138)
Shared Parameter           $\sigma$      0.124    (0.094, 0.155)

Conclusion: African Americans have a higher rate of preclinical growth of uterine fibroids

Higher incidence rate among blacks $\le 35$ ($\Pr(\beta_1 > 0) > 0.99$, $\Pr(\beta_2 > 0) = 0.58$, $\Pr(\beta_3 > 0) = 0.32$)

Estimated Cumulative Incidence Curves
[Figure: cumulative incidence of leiomyoma (0.0 to 1.0) by age (36 to 48 years), with separate curves for African American and white women. Estimates not incorporating disease severity data are denoted by +.]

SUMMARY
Bayesian approach for inference on disease incidence and progression from cross-sectional data.
  Three-state model of preclinical onset and clinical diagnosis
  Multiple markers are linked to underlying normal variables
  Accounts for dependency among markers with different scales
  Approach can be adapted for different data structures

Discussion
This type of multistate modeling approach with surrogates for the waiting times in the different states can be adapted for many different applications. Often, data do not consist of a simple right-censored survival time. When there are several states an individual can progress through and the exact transition times are unknown, the likelihood can be complex. The problem can be simplified by working in discrete time and considering the interval-censored transition times as latent variables. Often, identifiability is in question and one may need some assumptions.

Common Assumptions
  Markov: transition rates are independent of the history of the process
  Homogeneity: transition rates are independent of time
  Semi-Markov: transition rates are independent of time given waiting time in the current state