Multistate Modeling and Applications

Yang Yang, Department of Statistics, University of Michigan, Ann Arbor
IBM Research Graduate Student Workshop: Statistics for a Smarter Planet
September 7, 00

Use of Multistate Models
Multistate models focus on multiple events with inherent dependence, such as stages of degradation or rehabilitation. They offer a flexible and general structure.
Figure: usual failure model (Alive, Fail); illness-death model (Healthy, Disabled, Dead); twin survival model (Both Alive, Twin Dead, Both Dead)
Markov models have been used extensively, especially in epidemiology, reliability, and risk analysis. Semi-Markov models are becoming popular.

Time-to-Failure: From a Process Point of View
Traditional time-to-failure analysis focuses merely on the failure event:
Parametric models: exponential, Weibull, log-normal, inverse Gaussian.
Nonparametric inference: Kaplan-Meier, Nelson-Aalen.
Regression modeling: Cox proportional hazards, accelerated failure time.
From a process point of view, failure is the end point of some underlying process: the time to reach the absorbing state after moving among the other states in the system.
Figure: phase-type distribution (Markov), ending in the absorbing Failure state
It is more informative to model the entire process.
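To make the phase-type view concrete, here is a minimal Python sketch that simulates a continuous-time Markov chain until it reaches an absorbing Failure state, recording the whole path rather than only the failure time. The 4-state structure, the intensities in `RATES`, and the name `simulate_path` are hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical 4-state progressive chain: states 0-2 are transient and
# state 3 ("Failure") is absorbing. RATES[i][j] is the intensity q_ij.
RATES: Dict[int, Dict[int, float]] = {
    0: {1: 0.5, 3: 0.1},
    1: {2: 0.4, 3: 0.2},
    2: {3: 0.8},
}
FAILURE = 3


def simulate_path(rates: Dict[int, Dict[int, float]], rng: random.Random,
                  start: int = 0, absorbing: int = FAILURE) -> List[Tuple[float, int]]:
    """Simulate a continuous-time Markov chain until absorption.

    Returns (entry time, state) pairs; the last entry time is the phase-type
    time-to-failure, and the earlier pairs are the path that led there."""
    t, state = 0.0, start
    path = [(t, state)]
    while state != absorbing:
        out = rates[state]
        total = sum(out.values())
        t += rng.expovariate(total)             # exponential holding time with rate q_i
        u, acc = rng.random() * total, 0.0
        for nxt, q in out.items():              # next state j with probability q_ij / q_i
            acc += q
            if u < acc:
                break
        state = nxt
        path.append((t, state))
    return path
```

For these rates the expected absorption time from state 0 is 3.75 (solving $m_i = 1/q_i + \sum_j (q_{ij}/q_i)\, m_j$ backwards from state 2), which a long simulation should reproduce. The recorded path also shows which intermediate states were visited, information that a failure-only analysis discards.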

An Illustrative Application
Product: home equity line of credit.
Natural states: current, 1-month-delinquent, 2-month-delinquent, 3-month-delinquent, default.
Table: monthly state by months on book (MOB 1 to 18) for four example accounts A1 to A4, marking an example default time Y and censoring time C
Of interest: 1) predicting time-to-default for various vintages (FICO, balance, HPI, ...), or from x-month delinquency to default; 2) whether people who move from delinquency back to current behave the same as people who are always current, etc.
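A minimal sketch of how one account's monthly state sequence might be reduced to a default time Y or a censoring time C, plus a flag for question 2 (moving from delinquency back to current). The integer state coding and the function names are assumptions for illustration, not the format of the original data.

```python
from typing import List, Tuple

# Hypothetical coding (an assumption): 0 = current, 1/2/3 = 1-/2-/3-month delinquent, 4 = default.
DELINQUENT = (1, 2, 3)
DEFAULT = 4


def default_time(status: List[int]) -> Tuple[int, bool]:
    """status[k] is the state at months on book k + 1.

    Returns (Y, True) if the account defaults at MOB Y,
    or (C, False) if it is censored at the last observed MOB C."""
    for mob, state in enumerate(status, start=1):
        if state == DEFAULT:
            return mob, True
    return len(status), False


def cured(status: List[int]) -> bool:
    """True if the account ever moved from a delinquent state back to current."""
    return any(a in DELINQUENT and b == 0 for a, b in zip(status, status[1:]))
```

For instance, `default_time([0, 0, 1, 2, 3, 4])` gives `(6, True)`, while `[0, 1, 0, 0]` is censored at MOB 4 and counts as cured.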

Estimating the Default Distribution: Two Approaches
Consider a general semi-Markov process $J(t)$ for the multistate data. Let $T_d$ be the time-to-failure (absorption or default), with hazard function $\lambda_d(t) = P(T_d = t \mid T_d \ge t)$.
Two approaches to estimate $\lambda_d(t)$:
i. Traditional estimator, e.g. Kaplan-Meier or Nelson-Aalen: ignores the underlying multistate process.
ii. Exploit the underlying structure: estimate the initial distribution $\pi$ and the transition probabilities $P(t) = \{P_{ij}(t)\}$, with $P_{ij}(t) \equiv P(J(t) = j \mid J(0) = i)$, nonparametrically, and use these to estimate $\lambda_d$.
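Approach ii can be illustrated in its simplest special case, a time-homogeneous discrete-time Markov chain (an assumption; the slides treat a general semi-Markov process). Because $d$ is absorbing, $P(T_d \ge t) = 1 - P(J(t-1) = d)$ and $P(T_d = t) = \sum_{i \ne d} P(J(t-1) = i)\, P_{id}$, so the hazard follows from propagating $\pi$ through the one-step matrix $P$. The function name `process_hazard` is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def process_hazard(pi: Sequence[float], P: Matrix, d: int, horizon: int) -> List[float]:
    """lambda_d(t) = P(T_d = t) / P(T_d >= t) for t = 1, ..., horizon, computed from
    the initial distribution pi and a one-step transition matrix P (time-homogeneous
    discrete-time Markov chain with absorbing state d; an illustrative assumption)."""
    n = len(P)
    dist = list(pi)                                   # distribution of J(t - 1)
    hazards = []
    for _ in range(horizon):
        at_risk = 1.0 - dist[d]                       # P(T_d >= t) = P(J(t - 1) != d)
        enter = sum(dist[i] * P[i][d] for i in range(n) if i != d)   # P(T_d = t)
        hazards.append(enter / at_risk if at_risk > 0 else float("nan"))
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]   # one step
    return hazards
```

As a sanity check, `process_hazard([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], 1, 5)` returns a constant hazard of 0.3 (up to rounding), the hazard of a geometric time-to-default.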

Comparison of the Two Estimators
Simulated data from a 5-state (Markov) model; state 5 is absorbing.
Figure: cumulative hazard and hazard versus time. $\hat\Lambda_P$ and $\hat\lambda_P$ (blue), $\hat\Lambda_T$ and $\hat\lambda_T$ with 95% pointwise confidence intervals (black), and the MLE (red)
The traditional estimate $\hat\lambda_T$ is close initially but becomes highly variable in the tail, as the number at risk becomes small. The estimate $\hat\lambda_P$, which exploits the underlying process, is more stable and closer to the MLE, and can be shown to be much more efficient.
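The comparison can be mimicked on a small scale. This sketch simulates a hypothetical 5-state progressive discrete-time chain (the matrix `P_TRUE` is made up; index 4 plays the role of absorbing state 5), then computes the traditional estimate $\hat\lambda_T(t)$ = events at $t$ / number at risk at $t$ and a plug-in estimate $\hat\lambda_P$ built from the empirical transition matrix. It assumes no censoring before the horizon and only illustrates the idea; it is not the estimator derived in the talk.

```python
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical 5-state progressive chain; index 4 is the absorbing state.
P_TRUE: Matrix = [
    [0.80, 0.20, 0.00, 0.00, 0.00],
    [0.00, 0.80, 0.20, 0.00, 0.00],
    [0.00, 0.00, 0.80, 0.15, 0.05],
    [0.00, 0.00, 0.00, 0.70, 0.30],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]
D, HORIZON = 4, 80


def simulate(P: Matrix, n: int, horizon: int, rng: random.Random) -> List[List[int]]:
    """n paths started in state 0 and observed at every t = 0, ..., horizon."""
    states = range(len(P))
    paths = []
    for _ in range(n):
        path = [0]
        for _ in range(horizon):
            path.append(rng.choices(states, weights=P[path[-1]])[0])
        paths.append(path)
    return paths


def traditional_hazard(paths: List[List[int]], d: int, horizon: int) -> List[float]:
    """lambda_T(t) = (# absorbed exactly at t) / (# at risk at t)."""
    out = []
    for t in range(1, horizon + 1):
        at_risk = [p for p in paths if p[t - 1] != d]
        events = sum(1 for p in at_risk if p[t] == d)
        out.append(events / len(at_risk) if at_risk else float("nan"))
    return out


def estimate_P(paths: List[List[int]], k: int) -> Matrix:
    """Empirical one-step transition matrix from transition counts."""
    counts = [[0] * k for _ in range(k)]
    for p in paths:
        for a, b in zip(p, p[1:]):
            counts[a][b] += 1
    return [[c / sum(row) if sum(row) else float(i == j) for j, c in enumerate(row)]
            for i, row in enumerate(counts)]


def process_hazard(pi: List[float], P: Matrix, d: int, horizon: int) -> List[float]:
    """Plug-in lambda_P(t) = P(T_d = t) / P(T_d >= t) computed from pi and P."""
    n, dist, out = len(P), list(pi), []
    for _ in range(horizon):
        at_risk = 1.0 - dist[d]
        enter = sum(dist[i] * P[i][d] for i in range(n) if i != d)
        out.append(enter / at_risk if at_risk > 0 else float("nan"))
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return out
```

Late in the horizon `traditional_hazard` rests on the few subjects still at risk, while `process_hazard(pi, estimate_P(...), ...)` pools every observed transition, which is the intuition behind the stability shown in the figure.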

Inference for Multistate Models with Panel Data
In practice, multistate data are only observed at discrete times. Instead of the exact transition times $\tau_s$ and visited states $i_s$, only the states $x_k$ at discrete times $t_k$ are observed.
Figure: $Z = \{(\tau_s, i_s); s = 1, \dots, S\}$, the latent complete history (where transitions occur); $Y = \{(t_k, x_k); k = 1, \dots, K\}$, the observed censored history
To make inference on the parameters $\theta$ that characterize $J(t)$, the likelihood function is
$L(\theta \mid Y) = P(J(t_0) = x_0, J(t_1) = x_1, J(t_2) = x_2, J(t_3) = x_3)$.
When $J(t)$ is a Markov model, the likelihood can be evaluated (numerically). However, this is not true in general, even for semi-Markov models.
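For a time-homogeneous Markov model with intensity matrix $Q$, the Markov property factorizes the panel likelihood as $L(\theta \mid Y) = P(J(t_0) = x_0) \prod_{k \ge 1} P_{x_{k-1} x_k}(t_k - t_{k-1})$ with $P(\Delta) = \exp(Q\Delta)$. The sketch below evaluates the log of the product (conditioning on $J(t_0) = x_0$) using a small scaling-and-squaring matrix exponential in plain Python; the helper names are assumptions.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(Q: Matrix, t: float, terms: int = 20) -> Matrix:
    """exp(Q t) via scaling and squaring with a truncated Taylor series."""
    n = len(Q)
    norm = max(sum(abs(x) for x in row) for row in Q) * abs(t)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    A = [[x * t / 2 ** s for x in row] for row in Q]              # norm of A <= 1/2
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]  # A^k / k!
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = matmul(result, result)                           # undo the scaling
    return result


def panel_loglik(Q: Matrix, times: Sequence[float], states: Sequence[int]) -> float:
    """log L(theta | Y) for a Markov model, conditional on J(t_0) = x_0:
    the sum over k of log P_{x_{k-1} x_k}(t_k - t_{k-1})."""
    ll = 0.0
    for k in range(1, len(times)):
        P = expm(Q, times[k] - times[k - 1])
        ll += math.log(P[states[k - 1]][states[k]])
    return ll
```

For a two-state model with `Q = [[-0.5, 0.5], [0.0, 0.0]]`, observing state 0 at times 0 and 1 and state 1 at time 2 gives $-0.5 + \log(1 - e^{-0.5})$. Maximizing `panel_loglik` over the rates in $Q$ would give the MLE; for semi-Markov models no such closed-form transition matrix exists.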

Challenges with Panel Data
Figure: underlying $J(t)$
It is hard to derive information about $Z = \{(\tau_s, i_s); s = 1, \dots, S\}$: if $x_0 = 1, x_1 = 2, x_2 = 3, x_3 = 4$, the number of transitions $S$ is possibly infinite! We assume that $J(t)$ is acyclic/progressive (by removing the red transitions in the figure).
The likelihood function $L(\theta \mid Y)$ involves a high-dimensional integral. If $x_0 = 1, x_1 = 2, x_2 = 3, x_3 = 4$, and $S = K$, $i_k = x_k$, then
$L(\theta \mid Y) = \int\!\!\int\!\!\int \mathbf{1}_{g(y)}\, q_{12}(\tau_1)\, q_{23}(\tau_2)\, q_{34}(\tau_3)\, d\tau_1\, d\tau_2\, d\tau_3$,
where $g(y) = \{\tau_1 \in (t_0, t_1],\ \tau_1 + \tau_2 \in (t_1, t_2],\ \tau_1 + \tau_2 + \tau_3 \in (t_2, t_3]\}$.
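When each state on the progressive path has a single exit, the triple integral is the probability that the cumulative sojourn times land in the observation intervals, so it can be estimated by plain Monte Carlo. A minimal sketch, assuming $q_{12}, q_{23}, q_{34}$ are sojourn densities we can sample from and that the process starts at time 0; the sampler functions are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence

Sampler = Callable[[random.Random], float]


def mc_panel_likelihood(samplers: Sequence[Sampler], obs_times: Sequence[float],
                        n_draws: int = 100_000, rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of L(theta | Y) = integral of 1{g(y)} * prod_s q_s(tau_s).

    samplers[s - 1] draws the s-th sojourn time tau_s from its density q_s;
    obs_times = (t_0, ..., t_S). The event g(y) requires tau_1 + ... + tau_s
    to lie in (t_{s-1}, t_s] for every s."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_draws):
        total, inside = 0.0, True
        for s, draw in enumerate(samplers, start=1):
            total += draw(rng)
            if not obs_times[s - 1] < total <= obs_times[s]:
                inside = False               # path misses the s-th interval
                break
        hits += inside
    return hits / n_draws
```

For example, `mc_panel_likelihood([lambda r: r.weibullvariate(1.0, 1.5)] * 3, [0.0, 1.0, 2.0, 3.0])` estimates the slide's integral with Weibull sojourns (`weibullvariate(alpha, beta)` takes scale, then shape). With two Exp(1) sojourns and intervals $(0, 1]$, $(1, 2]$ the exact value is $e^{-1} - e^{-2} \approx 0.2325$. The hit rate shrinks as the number of intervals grows, which is part of why direct evaluation is hard in general.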

Likelihood-Based Inference via Stochastic Approximation and MCMC Sampling
For general semi-Markov models:
Observed (incomplete) $Y$: $L(\theta \mid Y)$ and $h(\theta, Y) \equiv \nabla_\theta \log L(\theta \mid Y)$ are hard to evaluate.
Unknown (complete) $Z$: $L(\theta \mid Z)$ and $H(\theta, Z) \equiv \nabla_\theta \log L(\theta \mid Z)$ have explicit forms.
Note that $h(\theta, Y) = E_{Z \mid \theta, Y}\, H(\theta, Z)$. If $Z_i \sim p(\cdot \mid \theta, Y)$, then $h(\theta, Y)$ can be estimated empirically by $\bar H(\theta, Z) = \frac{1}{m} \sum_{i=1}^{m} H(\theta, Z_i)$.
To obtain the MLE, we use stochastic approximation: pick a positive sequence $\gamma_n$ such that $\sum_n \gamma_n = \infty$ and $\sum_n \gamma_n^2 < \infty$, and update the estimate by
$\theta_n = \theta_{n-1} + \gamma_n \bar H(\theta_{n-1}, Z^{(n)})$,   (1)
where $Z^{(n)} = \{Z_1^{(n)}, Z_2^{(n)}, \dots, Z_m^{(n)}\}$ and each $Z_i^{(n)}$ is sampled from $p(\cdot \mid \theta_{n-1}, Y)$ by MCMC. We actually use an improved version of (1) proposed in Gu and Kong (PNAS, 1998).
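Update (1) can be tried on a toy missing-data problem where everything is explicit: exponential lifetimes with rate $\lambda = e^\theta$ observed only as intervals $(a_i, b_i]$. The complete-data score is $\sum_i (1 - e^\theta \tau_i)$, and each latent $\tau_i$ can be drawn exactly from a truncated exponential. This sketch therefore uses exact draws in place of MCMC and the plain update (1) rather than the Gu and Kong refinement. The score is averaged over subjects, which rescales the step but leaves the root unchanged. A golden-section maximizer of the observed log-likelihood is included only as a check; all names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def sample_truncated_exp(lam: float, a: float, b: float, rng: random.Random) -> float:
    """Exact draw of tau ~ Exp(lam) conditioned on a < tau <= b (inverse CDF)."""
    u = rng.random()
    return a - math.log(1.0 - u * (1.0 - math.exp(-lam * (b - a)))) / lam


def sa_mle(intervals: List[Tuple[float, float]], n_iter: int = 1000, seed: int = 0) -> float:
    """Stochastic approximation for lambda = exp(theta) from interval-censored data."""
    rng = random.Random(seed)
    mids = [(a + b) / 2 for a, b in intervals]
    theta = -math.log(sum(mids) / len(mids))           # crude start: midpoint imputation
    for n in range(1, n_iter + 1):
        lam = math.exp(theta)
        taus = [sample_truncated_exp(lam, a, b, rng) for a, b in intervals]   # Z^(n)
        h_bar = sum(1.0 - lam * t for t in taus) / len(taus)  # averaged complete-data score
        gamma = 2.0 / (n + 2)                    # sum of gamma_n = inf, sum of squares < inf
        theta += gamma * h_bar                   # update (1)
    return math.exp(theta)


def observed_loglik(lam: float, intervals: List[Tuple[float, float]]) -> float:
    """log L(lambda | Y) = sum_i log(exp(-lam * a_i) - exp(-lam * b_i))."""
    return sum(math.log(math.exp(-lam * a) - math.exp(-lam * b)) for a, b in intervals)


def golden_max(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-9) -> float:
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(c) > f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2
```

Because $h(\theta, Y)$ is the expected complete-data score, the iteration settles where the observed score vanishes, so `sa_mle(intervals)` should agree with `golden_max(lambda l: observed_loglik(l, intervals), 0.01, 5.0)` up to Monte Carlo noise.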

Sampling Z from $p(\cdot \mid \theta, Y)$
Figure: sample paths of the process between the observation times $t_0, \dots, t_3$
$\delta$ denotes the sample path of the process. Note: the dimension of the unknown $Z = (\delta, \tau)$ can vary.
Two sampling schemes:
i. Calculate the conditional probabilities of each path $\delta$, given $\theta_n$ and $Y$, then use Gibbs sampling to sample the multiple transition times $\tau$;
ii. Use reversible jump MCMC to handle the varying dimension directly, and sample $\delta$ and $\tau$ together.
In our experience, reversible jump MCMC is computationally faster.
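Scheme i has two parts: choosing the path $\delta$, then sampling the times $\tau$ given $\delta$. The sketch below covers only the second part, for a fixed progressive path with exactly one transition in each observation interval (the $S = K$ case) and a process starting at time 0. It updates each cumulative transition time with a random-walk Metropolis step inside a Gibbs sweep. This Metropolis-within-Gibbs step is a swapped-in choice, since the slides do not say how the Gibbs step is carried out. The Weibull sojourn densities and function names are hypothetical, and reversible jump (scheme ii) is not shown.

```python
import math
import random
from typing import Callable, List, Sequence

LogDensity = Callable[[float], float]


def weibull_logpdf(shape: float, scale: float) -> LogDensity:
    """Log density of a Weibull(shape, scale) sojourn time (a hypothetical choice of q)."""
    def logf(x: float) -> float:
        if x <= 0:
            return -math.inf
        z = x / scale
        return math.log(shape / scale) + (shape - 1) * math.log(z) - z ** shape
    return logf


def sample_transition_times(logq: Sequence[LogDensity], obs_times: Sequence[float],
                            n_sweeps: int, step: float = 0.3, seed: int = 0) -> List[List[float]]:
    """Metropolis-within-Gibbs for cumulative transition times S_1 < ... < S_K of a
    fixed progressive path with exactly one transition in each (t_{s-1}, t_s]."""
    rng = random.Random(seed)
    K = len(logq)
    S = [(obs_times[s] + obs_times[s + 1]) / 2 for s in range(K)]   # start at midpoints

    def local(s: int, value: float) -> float:
        # Terms of sum_s log q_s(S_s - S_{s-1}) that involve S_s (with S_0 = 0).
        prev = S[s - 1] if s > 0 else 0.0
        out = logq[s](value - prev)
        if s + 1 < K:
            out += logq[s + 1](S[s + 1] - value)
        return out

    draws = []
    for _ in range(n_sweeps):
        for s in range(K):
            prop = S[s] + rng.gauss(0.0, step)
            if not obs_times[s] < prop <= obs_times[s + 1]:
                continue                                   # inconsistent with Y: reject
            log_ratio = local(s, prop) - local(s, S[s])
            if rng.random() < math.exp(min(0.0, log_ratio)):
                S[s] = prop
        draws.append(list(S))
    return draws
```

Each draw is a set of transition times consistent with the observed panel; averaging the complete-data score over such draws gives the $\bar H$ needed in the stochastic approximation update.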

Illustration for a Markov Model
Model: a 4-state progressive Markov model. Data: 00 interval-censored observations.
Figure: SA estimates of $\lambda_{12}, \lambda_{13}, \lambda_{14}, \lambda_{23}, \lambda_{24}, \lambda_{34}$ versus iteration (0 to 3000); black dash: MLE, red: conditional sampling, green: RJMCMC

Simulation Study
Figure: 3-state semi-Markov model with sojourn distributions $F_{12}$, $F_{13}$, $F_{23}$
Setting: n = 00 and $t_k - t_{k-1} = 0.3$, with
(1) $F_{12}$, $F_{13}$, $F_{23}$ log-normal (LN), p = 0.6;
(2) $F_{12} = W(0.8, \dots)$, $F_{13} = W(0.8, \dots)$, $F_{23}$ Weibull, p = 0.4;
(3) $F_{12} = W(0.8, \dots)$, $F_{13}$ log-normal, $F_{23}$ exponential, p = 0.6.
Table: Computation time (seconds) of 3000 iterations in SA
Model (1): Conditional 9099, RJMCMC 6
Model (2): Conditional 84, RJMCMC 687
Model (3): Conditional 57, RJMCMC 675
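Panel data for a study like this can be generated as in the following sketch. It assumes that p is the probability of moving from state 1 to state 2 rather than directly to state 3, which the slide does not state explicitly. The sojourn samplers in the usage below are placeholders, not the parameter values of Models (1) to (3), and `simulate_panel` is a hypothetical name.

```python
import random
from typing import Callable, List, Optional, Tuple

Sampler = Callable[[random.Random], float]


def simulate_panel(p: float, f12: Sampler, f13: Sampler, f23: Sampler,
                   dt: float, t_max: float, rng: random.Random) -> List[Tuple[float, int]]:
    """Simulate one path of the 3-state semi-Markov model, recorded at t_k = k * dt.

    Assumption: from state 1 the process moves to state 2 with probability p
    (sojourn ~ F_12) and to state 3 otherwise (sojourn ~ F_13); from state 2
    it moves to 3 after a sojourn ~ F_23. State 3 is absorbing."""
    t2: Optional[float] = None
    if rng.random() < p:
        t2 = f12(rng)
        t3 = t2 + f23(rng)
    else:
        t3 = f13(rng)
    obs = []
    k = 0
    while k * dt <= t_max:
        t = k * dt
        if t >= t3:
            state = 3
        elif t2 is not None and t >= t2:
            state = 2
        else:
            state = 1
        obs.append((t, state))
        if state == 3:
            break                          # absorbed: later visits add no information
        k += 1
    return obs
```

A call such as `simulate_panel(0.6, lambda r: r.weibullvariate(1.2, 0.8), lambda r: r.lognormvariate(0.0, 1.0), lambda r: r.expovariate(1.0), 0.3, 30.0, random.Random(0))` returns the observed history $Y$; a visit to state 2 that starts and ends between two panel times leaves no trace, which is exactly the ambiguity the sampling schemes have to resolve.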

Improvement in Inference and Prediction
Model (3) with $F_{12} = W(0.9, \dots)$, $F_{13}$ log-normal, $F_{23}$ exponential, p = 0.6:
θ_MLE = c(0.76, .7, 0.84, 0.99, .7, 0.5)
θ_Heuristic = c(.03, .6, 0., 0.85, 0.53, 0.)
Figure: hazards of the transitions 1 → 2, 1 → 3, and 2 → 3, and the MRF until absorption, versus time; red: from the MLE; black: from the heuristic estimates

Work in Progress
Time-to-failure from a process point of view:
derived the nonparametric estimator $\hat\lambda_P$ that exploits the underlying process;
comparison with the traditional estimator $\hat\lambda_T$;
asymptotic properties of $\hat\lambda_P$.
Inference for semi-Markov multistate models with panel data:
derived a computationally intensive but general algorithm to compute the MLE;
can be readily extended to include (time-invariant) covariates, and non-progressive models are possible in principle;
study of other parametric estimators, based on pseudo-likelihood or on the method of moments;
semi-parametric and nonparametric estimators;
non-semi-Markov models.