Regression Modeling of Time to Event Data Using the Ornstein-Uhlenbeck Process


Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Roger Alan Erich, M.S.
Graduate Program in Biostatistics
The Ohio State University
2012

Dissertation Committee:
Professor Michael L. Pennell, Advisor
Professor Thomas J. Santner
Professor Dennis K. Pearl

© Copyright by Roger Alan Erich

Abstract

In this research, we develop innovative regression models for survival analysis that model time to event data using a latent health process which stabilizes around an equilibrium point, a characteristic often observed in biological systems. Regression modeling in survival analysis is typically accomplished using Cox regression, which requires the assumption of proportional hazards. An alternative model, which does not require proportional hazards, is the First Hitting Time (FHT) model, in which a subject's health is modeled using a latent stochastic process. In this modeling framework, an event occurs once the process hits a predetermined boundary. The parameters of the process are related to covariates through generalized link functions, thereby providing regression coefficients with clinically meaningful interpretations.

In this dissertation, we present an FHT model based on the Ornstein-Uhlenbeck (OU) process, a modified Wiener process which drifts from the starting value of the process toward a state of equilibrium or homeostasis present in many biological applications. We extend previous OU process models to allow the process to change according to covariate values. We also discuss extensions of our methodology to include random effects accounting for unmeasured covariates. In addition, we present a mixture model with a cure rate using the OU process to model the latent health status of those subjects susceptible to experiencing the event under study. We apply these methods to survival data collected on melanoma patients and to another survival data set pertaining to carcinoma of the oropharynx.

This document is dedicated to my family and to those brave men and women of the Armed Forces who gave their lives to protect our country's freedom during the completion of this PhD.

Acknowledgments

Without the support, patience and guidance of the following people, this study would not have been completed. It is to them that I owe my deepest gratitude.

Dr. Michael Pennell, my advisor, who guided me through the entire process of this research. Without his expertise, this would not have been possible.

Dove Erich, my dear wife, without whom this effort would have been worth nothing. Your love, support, and sacrifice helped tremendously during this trying time, and I will be forever grateful.

Ashley and Ellie Erich, my girls, who have sacrificed so much time with me.

Arnold and Doris Erich, my parents, who have always believed in me.

My dissertation committee, which provided me with valuable guidance and feedback to make this research stronger and more viable.

All of the faculty and staff at The Air Force Institute of Technology who supported me through this longer than anticipated PhD program.

Dr. Bill Baker, who provided valuable mathematical insight to help me get unstuck in my research, which allowed me to complete this degree.

Last, but not least, I am ever grateful to God who makes all things possible.

Vita

June  St Marys Area High School
B.S. Mathematics, Pennsylvania State University
M.S. Applied Mathematics, A.F. Inst. of Technology
2007 to  Graduate Student, Department of Biostatistics, The Ohio State University

Fields of Study

Major Field: Biostatistics

Table of Contents

Abstract
Dedication
Acknowledgments
List of Figures
List of Tables

1. Introduction
2. Threshold Regression: A First Hitting Time Regression Model
   Gamma Process and Inverse Gamma First Hitting Time
   Wiener Process Models
      The Wiener Process
      The Inverse Gaussian Distribution
      The Inverse Gaussian Distribution and Survival Analysis
      Previous Work using the FHT model with an Underlying Wiener Stochastic Process
      Strengths and Limitations of Using the Wiener Process Model
   Survival Models Based on the Ornstein-Uhlenbeck Process
      First Hitting Time for the Ornstein-Uhlenbeck Process
      The Shape of the Hazard Function
      Modeling the Hazard as the Square of an Ornstein-Uhlenbeck Process
      The Ornstein-Uhlenbeck Process in Biostatistical Applications
   Operational Time
3. The Ornstein-Uhlenbeck Model With Initial State Dependent On Covariates
   Proposed OU Threshold Regression Model
   Simulation Study
   Application of OU-TR Model to Overall Survival of Patients with Carcinoma of the Oropharynx
   Discussion
4. The Ornstein-Uhlenbeck Mixture Model
   Proposed OU Threshold Regression Mixture Model
   Simulation Study
   Application of OU-TR Mixture Model to Time to Relapse Data from Patients with Melanoma
   Discussion
5. The Ornstein-Uhlenbeck Random Effects Model for Survival Data with Unmeasured Covariates
   OU-TR Random Effects Model
      Proposed Model
      Simulation Study
      Application of the OU-TR Random Effects Model to Overall Survival of Patients with Carcinoma of the Oropharynx
   OU-TR Random Effects Mixture Model
      Proposed Model
      Simulation Study
      Application of OU-TR Random Effects Mixture Model to Time to Relapse Data from Patients with Melanoma
   Discussion
6. Conclusion

Bibliography

Appendices
A. Standard Error Derivations for OU-TR Model
B. Standard Error Derivations for OU-TR Mixture Model
C. Random Effects Density Function and Survival Function Derivations
D. Simplification of the Likelihood Function Under the OU-TR Random Effects Model
E. Standard Error Derivations for OU-TR Random Effects Model
F. Standard Error Derivations for OU-TR Random Effects Mixture Model
G. Newly Developed Matlab Functions for Fitting OU-TR Models to Data
   G.1 OU-TR Model
   G.2 OU-TR Mixture Model
   G.3 OU-TR Random Effects Model
   G.4 OU-TR Random Effects Mixture Model

List of Figures

2.1 Inverse Gaussian Densities with τ = 1 for Several Values of λ
2.2 Inverse Gaussian Densities with λ = 1 for Several Values of τ
Sample Paths of OU Process and Wiener Process
Hazard function of time to absorption (parameter values: a = 0, b = 1, σ² = 2)
Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model
Goodness of Fit of Second Best BIC Carcinoma of the Oropharynx Model for Subjects with a Disability
Goodness of Fit of Second Best BIC Carcinoma of the Oropharynx Model for Subjects without a Disability
Comparing Goodness of Fit of Model with Interaction Between Disability Status and Tumor Size (Int) with the Best BIC Main Effects Model (No Int)
Goodness of Fit of Best and Second Best Melanoma Models (in terms of BIC) for Nodal Categories 0 and
Goodness of Fit of Best and Second Best Melanoma Models (in terms of BIC) for Nodal Categories 2 and
Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 0.25 (Scenario 1 from Table 5.6)
5.2 Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 0.5 (Scenario 2 from Table 5.6)
Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 1 (Scenario 3 from Table 5.6)
Estimated Survival Curves for OU-TR and OU-TR Random Effects Models When Psi = 2 (Scenario 4 from Table 5.6)
Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model (OU-TR and OU-TR Random Effects) for Subjects with Disability
Goodness of Fit of Best BIC Carcinoma of the Oropharynx Model (OU-TR and OU-TR Random Effects) for Subjects with No Disability
Comparison of Survival Estimates When Bias is Present in $\hat{\psi}$
Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 0.25 (Scenario 1 from Table 5.13)
Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 0.5 (Scenario 2 from Table 5.13)
Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 1 (Scenario 3 from Table 5.13)
Estimated Survival Curves for OU-TR Mixture Model and OU-TR Random Effects Mixture Model When Psi = 2 (Scenario 4 from Table 5.13)
Goodness of Fit of Best BIC Melanoma Model (Mixture Model and Random Effects Mixture Model) for Nodal Categories 0 and
Goodness of Fit of Best BIC Melanoma Model (Mixture Model and Random Effects Mixture Model) for Nodal Categories 2 and

List of Tables

3.1 Results of Simulation Study Based on 1000 Data Sets of Size
Summary Statistics of Variables Considered in Modeling the Oropharynx Data
Final Model for Death from Carcinoma of the Oropharynx
Results of Mixture Model Simulation Study Based on 1000 Data Sets of Size
Summary Statistics of Variables Considered in Modeling the Melanoma Data
Stage 1 of OU-TR Mixture Model Building for Melanoma Data With Relapse as Event
Stage 2 of OU-TR Mixture Model Building for Melanoma Data With Relapse as Event
Final Model for Relapse from Melanoma
Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi =
Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi =
Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi =
5.4 Simulation Results for OU-TR Random Effects Model Based on 1000 Data Sets of Size 200 with Psi =
Simulation Results for OU-TR RE Model Based on 1000 Data Sets of Size 300 with Psi =
Simulation Results Examining Effect of Ignoring Random Effect (RE) in the OU-TR Model. Results are Based on 1000 Data Sets of Size
OU-TR and OU-TR Random Effects Models for Carcinoma of the Oropharynx Data
Simulation Results Based on 1000 Data Sets of Size 300 with Psi =
Simulation Results Based on 1000 Data Sets of Size 300 with Psi =
Simulation Results Based on 1000 Data Sets of Size 300 with Psi =
Simulation Results Based on 1000 Data Sets of Size 300 with Psi =
Simulation Results Based on 1000 Data Sets of Size 1000 with Varying True Values of Psi
Simulation Results Based on 1000 Data Sets of Size 300 for OU-TR Random Effects Mixture Model Comparison to OU-TR Mixture Model without Random Effects
Final OU-TR Mixture Model and OU-TR Random Effects Mixture Model for the Melanoma Data

Chapter 1: Introduction

In biostatistical research, the goal is often to determine important factors affecting a subject's survival time or time to development of disease. Numerous models are used to identify these factors. One popular choice is the Cox proportional hazards model (Cox, 1972). This model has many attractive features: no distribution needs to be assumed for the baseline hazard function, the regression parameters are interpreted in terms of relative risk, and estimation uses the partial likelihood function. However, this model provides erroneous results if the proportional hazards assumption is violated due to time varying covariate effects. For example, the effectiveness of a drug treatment may increase or decrease over time. A doctor may prescribe an antibiotic which loses its ability to fight infection over time. Thus, another treatment may be prescribed that builds up in the system, eliminating the infection. If we look at a patient's health post-surgery, there usually is an initial increase in mortality risk immediately following surgery before a beneficial health effect is observed. Unexplained heterogeneity in a subject's risk, or frailty, may also result in non-proportional hazards (Hougaard, 1991; Keiding, 1997). To combat this phenomenon, a shared frailty model may be fit to the data (Vaupel et al., 1979). However, within each cluster of subjects with the shared frailty value, the proportional hazards assumption must still be met in order to draw sound inferences. Other methods are available to remedy problems associated with non-proportional hazards under the Cox model. They include using time dependent covariate effects in the model or simply stratifying on the covariate that is introducing the non-proportional hazards (Klein and Moeschberger, 2003).

Another useful, though less frequently used, approach for identifying important prognostic variables of survival is a First Hitting Time (FHT) model. This approach does not require the proportional hazards assumption. In an FHT model, a stochastic process represents patient health, with failure occurring once the process hits a boundary (Lee and Whitmore, 2006). For example, we may model a subject's health status using a Wiener process, resulting in an FHT with an inverse Gaussian distribution (Chhikara and Folks, 1989). Since death or disease is the outcome of a series of genetic and physiological events where a subject's health deteriorates until it reaches a boundary, the FHT model is theoretically an attractive choice (Pennell et al., 2010). Take, for instance, a subject who has been diagnosed with lung cancer. This cancer has stages 0 through 5, with higher stages indicating more extensive disease. If left untreated, subjects will transition from one stage to the next until they ultimately die from the disease. In this context, an FHT model would be well suited to analyze time to event data and highlight important variables that have an impact on survival.

In threshold regression, covariate information is integrated into the parameters of FHT models via generalized link functions (Lee and Whitmore, 2006). For example, the initial state and variance of the Wiener process have been related to covariates using a log-link, and an identity link has been used for the drift parameter (cf. Lee et al., 2000, 2004; Aalen and Gjessing, 2001; Aalen et al., 2008). The proportional hazards assumption is avoided in these models given that the effects on the hazard vary with time.

In a 2004 paper, Aalen and Gjessing discuss survival models based on the Ornstein-Uhlenbeck (OU) process. The OU process is a modification of a Wiener process to include drift toward an equilibrium state. Many biological processes have the property, termed homeostasis, of diffusing back and forth while simultaneously tending to stabilize around a certain point. Examples of homeostatic biological processes include body temperature regulation in warm-blooded animals and blood pH regulation in the human body (Blessing, 1997). Also, the urinary system in the human body removes salt, excess ions and waste from plasma, which is vital in the homeostatic regulation of the ionic composition, volume and pH of the internal environment (Chiras, 2005).

In this dissertation, we developed new statistical methodologies for analyzing time to event data based on the OU process. We have extended previous models based on the OU process to incorporate available covariate information. To demonstrate the usefulness of these methodologies, we applied the methods to real data from biomedical studies and assessed model fit by comparing our estimated OU survival curve to the Kaplan-Meier curve generated from the same data.

The first data set consists of 192 subjects from a clinical trial in the treatment of carcinoma of the oropharynx found in Kalbfleisch and Prentice (1980). Patients were randomly assigned to one of two treatments: radiation therapy alone or radiation therapy in conjunction with chemotherapy. An objective of this study was to compare the two treatments with respect to patient survival. Covariates considered in this OU model approach included age, sex, treatment, patient physical condition, tumor site, tumor grade, tumor T-stage and tumor N-stage. Time until death from cancer of the oropharynx was recorded.

The second data set comes from a clinical trial which includes 713 melanoma patients who, after definitive surgery, were randomly assigned to treatment or observation groups. We applied our model to data from 315 subjects who did not receive treatment during the study in order to analyze the natural progression of the disease. Covariate information for each subject, including age, sex, treatment, nodal category, and Breslow score, was available for analysis. Time until relapse and/or death from melanoma were recorded.

We also broaden our threshold regression approach to model data using a mixture model which introduces a cure rate. Finally, we extended our approach by including subject specific random effects which account for unexplained heterogeneity in initial health status.

The remainder of this dissertation is organized as follows. First, we describe the concept of the threshold regression model and provide some examples of these models. Next, we detail two specific types of threshold regression models: the Wiener process model and the Ornstein-Uhlenbeck process model. Then, we perform a simulation study using the OU process with covariates incorporated into the initial health status (called the OU-TR model). Following this section, we apply the OU-TR model to the carcinoma of the oropharynx clinical trial data and present results. In the next chapter, we describe the use of a mixture model incorporating the OU process to model those subjects susceptible to experiencing the event under study. A simulation study is conducted using this OU-TR mixture model, and the model is applied to the melanoma study data. Following this chapter, an explanation is given for the OU process models in which random effects are incorporated to capture unexplained heterogeneity between subjects. Simulation studies of this random effect modeling method are conducted for both the OU-TR random effects model and the OU-TR random effects mixture model, and these models are applied to the carcinoma of the oropharynx and the melanoma clinical trial data, respectively. Finally, we highlight future work that may enhance the capabilities of the OU process models described in this dissertation.

Chapter 2: Threshold Regression: A First Hitting Time Regression Model

There are two basic components to the FHT model as described in Lee and Whitmore (2006). The first is a parent stochastic process $\{X_t, t \in \mathcal{T}, X_t = x \in \mathcal{X}\}$ with initial value $X_0 = x_0$, where $\mathcal{T}$ is the time space and $\mathcal{X}$ is the state space of the process. The second component consists of a boundary set or threshold $B$, where $B \subset \mathcal{X}$. $X_t$ may have many different properties, such as one or more dimensions, the Markov property, a continuous or discrete state, or a monotonic sample path. In the context of medical applications, $X_t$ is often latent and describes the health status of the subject. In epidemiological applications, $X_t$ frequently describes the unobservable status of the disease under investigation.

As described in Lee and Whitmore (2006), if we take $x_0$ to lie outside of $B$, the first hitting time of $B$ is the random variable $S = \inf\{t : X_t \in B\}$. Therefore, the time when the stochastic process first encounters $B$ is the first hitting time. The threshold state is the first state encountered by the process in the boundary set, $X_S \in B$. Thus, a stopping condition is defined by the boundary set. If the parent process is latent, we cannot observe the FHT event in the state space of the process directly. For example, liver transplant patients have several factors used to determine initial health status after transplant. These factors may include type of transplant, age and weight, to name a few. The boundary can be set as death due to complications from the liver transplant. Thus, the process models the decline in health from the initial point to death, when the process hits the threshold.

First hitting time (FHT) models have been applied in an array of fields such as engineering, economics, business and medicine. They have been used to model labour turnover (Whitmore, 1979), the onset time for a cancer induced by occupational exposure (Lee et al., 2004), length of a hospital stay (Eaton and Whitmore, 1977) and strike duration (Linden, 2000). What makes FHT models valuable in applications is the capability to include regression structures. This allows effects of covariates to account for natural dispersion of the data, thereby explaining variability and sharpening inferences. Regression structures also provide scientific insights into potential causal roles of covariates in the underlying processes, boundary sets and time scales (Lee and Whitmore, 2006). As described by Lee and Whitmore (2006), there are several possible choices for the stochastic process $X_t$, including a Bernoulli process, Poisson process, Markov chain, Wiener process, gamma process and an Ornstein-Uhlenbeck (OU) process.

2.1 Gamma Process and Inverse Gamma First Hitting Time

In the gamma process model, described in Lee and Whitmore (2006), the parent process is $\{X_t, t \geq 0\}$ with initial value $X_0 = x_0 > 0$ and $X_t = x_0 - G_t$, where $\{G_t, t \geq 0\}$ is a gamma process with $G_0 = 0$. The gamma process, described in Kyprianou (2006, pages 7-8), has increments $G_{t-s} = G_t - G_s$, where $0 \leq s < t < \infty$, that are stationary, independent and gamma distributed with shape parameter $\alpha$ and scale parameter $\beta$, which are constants. Thus the pdf of the gamma($\alpha$, $\beta$) distribution is

$$f(g_{t-s} \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,(g_{t-s})^{\alpha-1} e^{-\beta g_{t-s}}$$

where $g_{t-s} \in (0, \infty)$, $\alpha > 0$ and $\beta > 0$. Some authors have considered generalizations of the gamma process in which the shape or scale parameter varies monotonically with time. For instance, Kalbfleisch (1978) used a gamma process with shape parameter $\alpha(t) \propto \Lambda_0^{*}(t)$ as the prior for the cumulative baseline hazard function ($\Lambda_0(t)$) in a Bayesian analysis of the Cox model, where $\Lambda_0^{*}(t)$ is a parametric cumulative baseline hazard function representing one's best a priori guess at $\Lambda_0(t)$.

Since a gamma process has monotonic sample paths, the first hitting time of the parent process ($X_t = 0$) has an inverse gamma distribution. An advantage of this model is that computational routines for the gamma distribution are readily available. In Singpurwalla (1995), the use of this model is motivated by the fact that item wear is nondecreasing and failure of many components or systems of components is more likely due to wear than to a traumatic event. In Lawless and Crowder (2004), Singpurwalla's gamma process model is extended to incorporate covariates to better explain reliability of items with certain characteristics. Random effects are also included to explain heterogeneity between these items not accounted for by the observed covariates.

Lawless and Crowder (2004) set up the gamma process model by defining $G_t$ to be gamma($\alpha$, $\eta(t)$), where $\eta(t)$ is a given monotone increasing function of time. Covariates are incorporated in the gamma process by changing $\alpha$ to $\alpha(v)$, where $v$ is a vector of covariate values, which allows rescaling of $G_t$ without changing the shape parameter of its gamma distribution. To incorporate random effects into this model, $\alpha$ is altered further by using $r\alpha(v)$, where $r$ is the random effect. In their paper, an application is presented involving metal fatigue crack growth data with random effects specific to each unit but no covariates. However, it is suggested that $\alpha(v) = \exp(\beta v')$ be used as the regression function specification when covariates are involved. They define the monotone increasing function of time to be

$$\eta(t) = \beta_0 \left(1 - y_0^{\beta_2}\,\beta_1 \beta_2\, t\right)^{-1/\beta_2}$$

where $y_0$ is the initial crack length and the $\beta$'s are parameters that vary randomly across units.

A possible use of the gamma process model in a biological application is given in Lee and Whitmore (2006). Here, they define the process as $X_t = x_0$ with probability $1 - p$ and $X_t = x_0 - Z_t$ with probability $p$, where $p$ is a susceptibility probability and $Z_t$ is a gamma process. For example, a patient can have a benign form of disease with probability $1 - p$ or a malignant form with probability $p$. Thus, the malignant form of the disease advances monotonically en route to death from the disease. In contrast, the gamma process model may not be a good choice in applications where health does not decline consistently over time, for example, diseases with long latency periods.

2.2 Wiener Process Models

2.2.1 The Wiener Process

We begin by defining the Wiener process (Prabhu, 1965, Section 3, p. 10), $X_t$, with drift $\mu \in (-\infty, \infty)$ and variance $\sigma^2 > 0$. The process has the following properties for any $t_1 < t_2 < t_3 < t_4$:

1. $X_t$ has independent increments; $X_{t_2} - X_{t_1}$ and $X_{t_4} - X_{t_3}$ are independent.
2. $X_{t_2} - X_{t_1}$ has a normal distribution with mean $\mu(t_2 - t_1)$ and variance $\sigma^2(t_2 - t_1)$, where $t_1 < t_2$.
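The two properties above can be illustrated with a short simulation. The following is only a minimal sketch, not code from the dissertation: the function name `simulate_wiener_path` and the parameter values are hypothetical, and the path is built by adding independent normal increments with mean $\mu\,\Delta t$ and variance $\sigma^2\,\Delta t$.

```python
import math
import random

def simulate_wiener_path(x0, mu, sigma2, times, rng):
    """Sample a Wiener process at the given increasing times, starting at x0 at time 0.

    Hypothetical helper for illustration: each increment is drawn independently
    (property 1) from a normal distribution with mean mu * dt and variance
    sigma2 * dt (property 2).
    """
    path = []
    x, t_prev = x0, 0.0
    for t in times:
        dt = t - t_prev
        if dt <= 0:
            raise ValueError("times must be positive and strictly increasing")
        x += rng.gauss(mu * dt, math.sqrt(sigma2 * dt))
        path.append(x)
        t_prev = t
    return path

# Illustrative values only: start at x0 = 5 with negative drift toward a lower threshold.
rng = random.Random(2012)
print(simulate_wiener_path(5.0, -1.0, 4.0, [0.5, 1.0, 1.5, 2.0], rng))
```

Averaging the simulated value at a fixed time over many paths should approach $x_0 + \mu t$, with variance close to $\sigma^2 t$, which is one way to sanity-check such a simulator.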

Under these conditions, the probability density function (pdf) for $X_t = x$ given that the process started at $x_0$ is

$$f(x \mid x_0, t) = \frac{1}{\sigma\sqrt{2\pi t}} \exp\left(-\frac{(x - x_0 - \mu t)^2}{2\sigma^2 t}\right). \tag{2.1}$$

Further details on the Wiener process can be found in Chhikara and Folks (1989) and Prabhu (1965). Next we focus on the first passage time $T$ of $X_t$ to $a < x_0$, where $a$ is the predetermined threshold value. The conditions $X_0 = x_0$, $X_t > a$ for $0 < t < T$, and $X_T = a$ are necessary for $T$ to be the first passage time. If $T$ is finite, the density function of $T$ is derived by finding the Laplace transform (Prabhu, 1965). Details of these derivations can be found in Chhikara and Folks (1989). The resulting first hitting time distribution is inverse Gaussian, which is explained in the following section.

2.2.2 The Inverse Gaussian Distribution

The probability density function (pdf) of an inverse Gaussian random variable $X$ is

$$f(x \mid \tau, \lambda) = \sqrt{\frac{\lambda}{2\pi}}\; x^{-3/2} \exp\left[-\frac{\lambda (x - \tau)^2}{2\tau^2 x}\right], \quad x > 0 \tag{2.2}$$

where $\tau$ and $\lambda$ are greater than zero. The mean of the distribution is $\tau$ and the scale parameter is $\lambda$. If we define $\phi = \lambda/\tau$, the shape of the distribution depends on $\phi$ only. The inverse Gaussian distribution represents a broad class of distributions, varying from a highly skewed to a symmetrical distribution as $\phi$ goes from 0 to $\infty$ (Chhikara and Folks, 1989). Since $\phi^{-1} = \tau/\lambda$, the inverse Gaussian distribution moves closer to normal when $\phi$ is increased. As shown in Chhikara and Folks (1989), the density curves in Figures 2.1 and 2.2 illustrate the wide range of shapes possible when using the inverse Gaussian distribution.

[Figure 2.1: Inverse Gaussian Densities with τ = 1 for Several Values of λ. Horizontal axis: Time.]

[Figure 2.2: Inverse Gaussian Densities with λ = 1 for Several Values of τ. Horizontal axis: Time.]
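As a rough numerical check on the inverse Gaussian density in (2.2), the sketch below evaluates the pdf and integrates it with a simple midpoint rule. The function names, the integration limits and the values $\tau = 1$, $\lambda = 0.5$ are assumptions chosen for illustration; they are not taken from the dissertation's Matlab code.

```python
import math

def inverse_gaussian_pdf(x, tau, lam):
    """Density (2.2) of an inverse Gaussian variable with mean tau and scale lam."""
    if x <= 0 or tau <= 0 or lam <= 0:
        raise ValueError("x, tau and lam must all be positive")
    return math.sqrt(lam / (2.0 * math.pi)) * x ** -1.5 * math.exp(
        -lam * (x - tau) ** 2 / (2.0 * tau ** 2 * x)
    )

def midpoint_integral(func, upper, steps):
    """Midpoint-rule approximation of the integral of func over (0, upper)."""
    h = upper / steps
    return h * sum(func((k + 0.5) * h) for k in range(steps))

# Illustrative parameter values (assumed): tau = 1, lam = 0.5.
tau, lam = 1.0, 0.5
total_mass = midpoint_integral(lambda x: inverse_gaussian_pdf(x, tau, lam), 200.0, 200_000)
mean = midpoint_integral(lambda x: x * inverse_gaussian_pdf(x, tau, lam), 200.0, 200_000)
print(total_mass, mean)  # total mass should be close to 1 and the mean close to tau
```

Evaluating `inverse_gaussian_pdf` over a grid of times for several values of $\lambda$ (with $\tau$ fixed) or several values of $\tau$ (with $\lambda$ fixed) gives curves of the kind described for Figures 2.1 and 2.2.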

2.2.3 The Inverse Gaussian Distribution and Survival Analysis

When studying subject survival or time to disease occurrence/recurrence, the inverse Gaussian model has some useful properties. The hazard function for the inverse Gaussian tends to initially increase, then decrease, and approach a constant value as the lifetime becomes infinite. This property is frequently found when lifetimes are dominated by early event times (Chhikara and Folks, 1989), such as in studies involving organ and bone marrow transplants (Klein and Moeschberger, 2003). Another important property is that the family of inverse Gaussian distributions is rather broad. This distribution can range from highly skewed to almost normal (Chhikara and Folks, 1989).

Suppose $F(t)$ denotes the cdf of a subject's survival time. Then, the subject's survival function $S(t)$ at time $t$ is the probability of experiencing the event after time $t$. Therefore, $S(t) = 1 - F(t)$. The cdf of $t$ in terms of the standard normal distribution function, given by Schuster (1968), is

$$F(t) = \Phi\left[\sqrt{\frac{\lambda}{t}}\left(\frac{t}{\tau} - 1\right)\right] + \exp(2\lambda/\tau)\,\Phi\left[-\sqrt{\frac{\lambda}{t}}\left(1 + \frac{t}{\tau}\right)\right]. \tag{2.3}$$

Therefore, the survival function for the inverse Gaussian is

$$S(t) = \Phi\left[\sqrt{\frac{\lambda}{t}}\left(1 - \frac{t}{\tau}\right)\right] - \exp(2\lambda/\tau)\,\Phi\left[-\sqrt{\frac{\lambda}{t}}\left(1 + \frac{t}{\tau}\right)\right]. \tag{2.4}$$

As mentioned in Section 2.2.1, the first hitting time ($T$) of a Wiener process follows an inverse Gaussian distribution. The drift of the Wiener process may be positive, negative or zero. If the Wiener process has negative drift ($\mu < 0$), then there is a propensity to drift toward the threshold ($a$) (Whitmore, 1979); i.e., $S(\infty) = 0$. With $\mu < 0$, we obtain, from derivations explained in Chhikara and Folks (1989), a proper inverse Gaussian distribution IG($\tau$, $\lambda$) of the first hitting times with parameters defined as follows:

$$\tau = \frac{x_0 - a}{|\mu|} \quad \text{and} \quad \lambda = \frac{(a - x_0)^2}{\sigma^2}. \tag{2.5}$$

Thus, the pdf of the inverse Gaussian first hitting time distribution when $\mu < 0$ in terms of the Wiener process parameters is

$$f(t \mid \mu, \sigma^2) = \frac{x_0 - a}{\sqrt{2\pi\sigma^2}}\; t^{-3/2} \exp\left[-\frac{\left[\mu\left(t + \frac{x_0 - a}{\mu}\right)\right]^2}{2\sigma^2 t}\right], \quad t > 0,\ \sigma^2 > 0. \tag{2.6}$$

The corresponding cdf is

$$F(t) = \Phi\left[\sqrt{\frac{(x_0 - a)^2}{\sigma^2 t}}\left(\frac{t|\mu|}{x_0 - a} - 1\right)\right] + \exp\left(\frac{2(x_0 - a)|\mu|}{\sigma^2}\right)\Phi\left[-\sqrt{\frac{(x_0 - a)^2}{\sigma^2 t}}\left(1 + \frac{t|\mu|}{x_0 - a}\right)\right]. \tag{2.7}$$

In a process that has positive drift ($\mu > 0$), hitting a preset threshold ($a$) theoretically may never occur (Whitmore, 1979); i.e., it has a cure rate ($S(\infty)$ is not necessarily 0). For example, in a clinical trial where a patient's health was modeled using the Wiener process with positive drift, the subject may never experience the event under study (they are cured). With $\mu > 0$, the resulting improper distribution of the first hitting time is inverse Gaussian IG($-\tau$, $\lambda$) and the cure rate is $S(\infty) = 1 - \exp(-2x_0\mu/\sigma^2)$.

If $\mu = 0$, the process also has a propensity to drift toward $a$ (Whitmore, 1979). However, as seen in Chhikara and Folks (1989), the FHT is not inverse Gaussian when $\mu = 0$; rather, it has a stable distribution with index 1/2 (see Feller 1966, Section 6.1, p. 170) with probability density function

$$f(t \mid \sigma^2) = \frac{1}{2\sigma\sqrt{2\pi t^3}} \exp\left(-\frac{1}{8t\sigma^2}\right).$$
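The first hitting time quantities for the Wiener process can be sketched in a few lines of code. This is a hedged illustration under stated assumptions, not the dissertation's implementation: the function names are hypothetical, the density and cdf follow (2.6) and (2.7) for $\mu < 0$, and the cure rate is written with $x_0 - a$ in place of $x_0$, which reduces to the expression in the text when the threshold is $a = 0$.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fht_pdf(t, x0, a, mu, sigma2):
    """Density (2.6) of the first hitting time of threshold a < x0."""
    d = x0 - a
    # [mu * (t + d / mu)]^2 simplifies to (mu * t + d)^2.
    return d / math.sqrt(2.0 * math.pi * sigma2 * t ** 3) * math.exp(
        -(mu * t + d) ** 2 / (2.0 * sigma2 * t)
    )

def fht_cdf(t, x0, a, mu, sigma2):
    """Distribution function (2.7); only valid for negative drift (mu < 0)."""
    if mu >= 0:
        raise ValueError("fht_cdf assumes negative drift")
    d = x0 - a
    root = d / math.sqrt(sigma2 * t)   # sqrt(lambda / t) with lambda from (2.5)
    ratio = t * abs(mu) / d            # t / tau with tau from (2.5)
    return std_normal_cdf(root * (ratio - 1.0)) + math.exp(
        2.0 * d * abs(mu) / sigma2
    ) * std_normal_cdf(-root * (1.0 + ratio))

def cure_rate(x0, a, mu, sigma2):
    """S(infinity): zero for mu <= 0, otherwise 1 - exp(-2 (x0 - a) mu / sigma2).

    Assumption: the text states the cure rate with x0 only, i.e. with a = 0.
    """
    if mu <= 0:
        return 0.0
    return 1.0 - math.exp(-2.0 * (x0 - a) * mu / sigma2)

# Illustrative values only (assumed): x0 = 3, a = 0, sigma^2 = 1.
print(fht_cdf(2.0, 3.0, 0.0, -1.0, 1.0))
print(cure_rate(3.0, 0.0, 0.5, 1.0))
```

One consistency check is that a numerical derivative of `fht_cdf` agrees with `fht_pdf`, and that `fht_cdf` approaches 1 for large $t$ when the drift is negative, in line with $S(\infty) = 0$.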

28 Inthresholdregression,theprocess{X t }andboundarysetb haveparametersthat are dependent on covariates differing between individuals (Lee and Whitmore 2006). Using appropriate regression link functions, these parameters are joined to linear combinations of covariates, such as g θ (θ i ) = z i γ for θ. Here g θ is the link function, the parameter θ i is the value of the parameter θ for individual i, z i = (1,z i1,z i2,...,z ik ) is the covariate vector of individual i and γ is the associated vector of regression coefficients. Normally, the link function will be chosen to map the parameter space into the real line. Likewise, covariates and their mathematical forms in the regression function zγ must be chosen appropriately, as is the case in a conventional regression analysis. An attractive feature of the threshold regression model is that we are able to relate subject characteristics to clinically meaningful parameters. We illustrate an example of this using the Wiener process FHT model. The Wiener process has mean (drift) parameter µ and variance parameter σ 2 initial process level parameter x 0 and the boundary set that includes the threshold a. However, the survival function only depends on these three parameters via x 0 /σ and µ/σ. Hence, when analyzing right censored data, there are essentially only two free parameters. Thus, we can arbitrarily set σ 2 = 1 without loss of generality (Aalen and Gjessing, 2001). In the railroad worker case-control study presented in Lee et al. (2009), covariates, such as smoking status, asbestos exposure and whether or not the subject worked as an engineer, were incorporated into the Wiener process model to determine their effect on survival. In this case-control study, the following link functions were used: µ = β 0 +β 1 y 1 + +β k y k. ln(x 0 ) = γ 0 +γ 1 y 1 + +γ k y k. 14

29 where y = (y 1,y 2,...,y k ) is a vector of regression covariates. The underlying process was assumed to be a Wiener process with negative drift. Therefore, the first hitting time distribution was IG( τ,λ), with τ and λ as defined in (2.5), and maximum likelihoodtechniqueswereusedtofindtheestimatesofβ andγ fromthelinkfunctions above. In this model setup, β represents the covariate effects on the initial health status and γ represents the covariate effects on the rate of decline in health status Previous Work using the FHT model with an Underlying Wiener Stochastic Process Several authors have utilized the Wiener process FHT model in survival, reliability and economic applications. In this section, we will summarize several specific uses of this model. The first example comes from research conducted in Lee et al. (2009). An FHT model with the Wiener process as the underlying stochastic process was used to analyze data, detailed in Garshick et al. (2004), from a case-control study which includes 3641 railroad workers where 1256 died from lung cancer (cases) and 2385 workers from the same population that did not die of lung cancer, suicide, accident or unknown cause (controls). Since 1959, the rail industry used diesel power for their locomotives. Thus, railroad workers began to be exposed to diesel exhaust. For this case-control study, diesel exhaust exposure was captured by breaking down jobs with the railroad into three categories. The first category contains engineers, brakemen, firemen, conductors and hostlers. The second category consists of railroad shop workers and the third includes all other workers such as ticket and station agents, clerks and rail car repair workers. Since this data comes from a case-control study, each case subject contributes an observed lifetime from the reference date to the year 15

of death and each control subject contributes a censored survival time (censored by some other cause of death) measured from the same reference date (Lee et al., 2009). Covariates were incorporated into the model via the drift parameter and the initial health status. Operational time was also used in this study and is explained at the end of this chapter.

In Pennell et al. (2010), a Bayesian methodology was used in a Wiener process FHT model that accounts for unmeasured covariates in both the initial health status and the drift. To accomplish this, a random effect was included in the drift component and each subject's initial health status, $x_{0i}$, was modeled as a truncated normal random variable. This methodology was applied to data from malignant melanoma patients where non-proportional hazards and unexplained heterogeneity were present. The results are compared to previous studies conducted on these data using Cox regression and fitting a similar FHT model without random effects.

Research conducted by Lee, Whitmore and Rosner (2010) explored threshold regression for survival data in longitudinal studies involving time-varying covariates. To handle this type of data, the authors suggest breaking up longitudinal data into intervals and modeling time to event over each interval using threshold regression with a latent Wiener process under a Markov assumption. This method was illustrated using data from a nurses' health study of lung cancer risk, with completion times of surveys defining the different time intervals.

Lee, Chang and Whitmore (2008) conducted research using a threshold regression mixture model for assessing treatment efficacy in a multiple myeloma clinical trial. The subjects in this study were initially randomized to receive either Velcade or a high-dose Dexamethasone treatment. Based on the subject's response, they were

switched to the other treatment if necessary. A mixture of two Wiener process FHT models was fit to the survival data since there was evidence of a bimodal FHT distribution in each treatment group. The mixing parameter in this model is the proportion of patients receiving one of the two treatments. A composite time scale was used to distinguish the rate of disease progression before and after switching treatments (see Section 2.4 at the end of this chapter). Covariates were incorporated in the model via the drift parameter.

An extension of the univariate Wiener process model can be found in research conducted by Whitmore, Crowder and Lawless (1998). Here, a bivariate Wiener process was used to jointly model a latent process and an observable marker process. This technique was demonstrated with a simulated example and applied to a data set obtained from an aluminum production process. The data set contained failure age in days of the reduction cells (used to perform electrolysis of molten alumina and cryolite) and failure data on two markers: the percentage iron contamination level and the horizontal distortion of the cell in inches.

The bivariate Wiener process model of Whitmore et al. (1998) was extended by Tong et al. (2008) to the case when only current status data are available. Current status data are also known as interval censored data where only one observation on each subject is available and the failure time is either smaller or larger than the observed time. This type of data can be found in cross-sectional studies and animal studies examining time to appearance of internal tumors.

Horrocks and Thompson (2004) proposed a Wiener process model for competing risks data. The model is based on the time, $T$, that a Wiener process hits one of two boundaries which represent two possible competing outcomes. Covariates

were incorporated into the model via the drift component of the underlying Wiener process. The model was used on a subset of data from the Utah Department of Health representing all hospital discharges in [...]. The two competing outcomes were healthy discharge and death in hospital. The upper and lower thresholds were modeled as a linear function of covariates. Horrocks and Thompson discussed an extension of their model for length of stay that accounts for the presence of heterogeneity in the population (accomplished through use of a mixture model).

Another competing risks application of the Wiener process FHT model appears in research conducted by Lindqvist and Skogsrud (2009). They considered a competing risks framework for a component that will experience either failure or a preventive maintenance procedure to avoid failure. A novel approach is presented that models component degradation using the Wiener process, with failure associated with hitting a predetermined threshold. In addition, a potential time for maintenance, associated with hitting a threshold before the failure threshold, is accounted for in the model.

A final example of an application involving the Wiener process comes from research conducted by Saebo et al. (2005) on genetic evaluation of mastitis resistance in cows. In the model setup, it is assumed that each cow is in a unique state of health at any given time that is a certain distance from onset of disease. The latent physiological battle against the disease can be modeled by a Wiener process with drift toward the disease threshold. Two risk patterns are associated with development of mastitis: physical changes known to start in the days leading up to calving, and the cow's environment, such as milking technique and hygiene. Thus, these two risk patterns invite a model setup involving two latent Wiener processes.
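The Wiener process threshold regression setup described in this chapter (unit variance, an identity link for the drift and a log link for the initial level) can be sketched in code. The following is a minimal illustration, not the implementation used in any of the cited studies. It assumes a boundary at zero and uses the standard inverse Gaussian first hitting time density and survival function for a Wiener process started at $x_0 > 0$; the parametrization in (2.5) is not reproduced here, and all function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def linear_predictor(z, coef):
    """Inner product of a covariate vector z (leading 1 = intercept) and coefficients."""
    return sum(zj * cj for zj, cj in zip(z, coef))


def wiener_fht_logpdf(t, x0, mu):
    """Log density of the first time a Wiener process with sigma^2 = 1,
    drift mu and start x0 > 0 hits the boundary 0 (inverse Gaussian form)."""
    return (math.log(x0) - 0.5 * math.log(2.0 * math.pi * t ** 3)
            - (x0 + mu * t) ** 2 / (2.0 * t))


def wiener_fht_survival(t, x0, mu):
    """P(T > t) for the same process. With mu > 0 this tends to
    1 - exp(-2 * x0 * mu) as t grows, i.e. a cure fraction."""
    rt = math.sqrt(t)
    return (norm_cdf((x0 + mu * t) / rt)
            - math.exp(-2.0 * x0 * mu) * norm_cdf((mu * t - x0) / rt))


def log_likelihood(beta, gamma, times, events, covariates):
    """Right-censored log-likelihood with mu = z'beta and ln(x0) = z'gamma.
    events[i] is 1 for an observed event and 0 for a censored time."""
    total = 0.0
    for t, d, z in zip(times, events, covariates):
        mu = linear_predictor(z, beta)              # identity link for the drift
        x0 = math.exp(linear_predictor(z, gamma))   # log link keeps x0 positive
        if d:
            total += wiener_fht_logpdf(t, x0, mu)
        else:
            # Floor guards against log(0) when the survival probability underflows.
            total += math.log(max(wiener_fht_survival(t, x0, mu), 1e-300))
    return total
```

In practice this function would be passed (negated) to a numerical optimizer to obtain maximum likelihood estimates of $\beta$ and $\gamma$. The plain-Python loop is chosen for readability; very large values of $-2 x_0 \mu$ would need a log-scale evaluation of the second survival term.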

2.2.5 Strengths and Limitations of Using the Wiener Process Model

The Wiener process is widely used as the underlying stochastic process in research involving first hitting time models in survival and reliability applications. The first hitting time distribution, when using this process, is inverse Gaussian. As explained in Chhikara and Folks (1989), this distribution is very flexible and can represent skewed as well as approximately normal data. Thus, the Wiener process can be an important tool when modeling data characterized by early incidence of events. Also, when using the Wiener process, the inverse Gaussian density and the survival function for the first hitting times are easily computable and provide an efficient means of finding maximum likelihood estimates of model parameters. Another useful feature is the ability to model cure rates, since the drift parameter can be either positive or negative. Finally, the ease of incorporating covariates into the model via the drift parameter or the applicable threshold makes the Wiener process a viable choice in biostatistical research.

A limitation of the Wiener process model, originating in the defining properties of the process, is that increments over disjoint time intervals are independent. This can cause problems when modeling, for example, movements of an organism or a patient's health status that logically depend on the state in the previous time increment of the process. This virtually eliminates the capability to model homeostasis in the underlying process, as the Wiener process models rapidly fluctuating phenomena (Horsthemke and Lefever, 1984). A possible solution to this deficiency is the incorporation of the Ornstein-Uhlenbeck process, a modification of the Wiener process, to allow adequate modeling

of the homeostatic properties of many biological processes. The OU process is described in the next section.

2.3 Survival Models Based on the Ornstein-Uhlenbeck Process

In a 2004 paper, Aalen and Gjessing discussed survival models based on the Ornstein-Uhlenbeck (OU) process. This process is a mean-reverting modification of a Wiener process in that it has a propensity to drift in the direction of a fixed equilibrium level. Homeostasis, defined as simultaneously diffusing back and forth while stabilizing around a certain point, is a characteristic found in many natural processes. Thus, the OU process is natural to consider in a biological context. An example of a biological process that exhibits homeostasis is kidney function. Kidneys get rid of extra water and ions from blood through passage of urine. Thus, the kidneys carry out homeostatic regulation by removing waste or excess products from the body.

For the purposes of this research, we consider two concepts when modeling with the OU process. If the event under study is a positive one, such as time to discharge from the hospital or ICU, the threshold, in the FHT model context, will be regarded as a healthy homeostasis. Alternatively, in another modeling situation, subjects can be pulled from a healthy status toward an unhealthy homeostasis or threshold representing death or disease. In the following sections, details of the OU process are explained and an example is given describing a previous use of this model in the literature.

First Hitting Time for the Ornstein-Uhlenbeck Process

Aalen and Gjessing (2004) state that the Wiener process, represented by $W_t$, is well known for modeling random processes with continuous sample paths. Its time steps

over an interval are normally distributed with mean 0 and variance proportional to the interval length. The OU process, represented by $X_t$, is a modified Wiener process with a drift toward a state of equilibrium. The OU process can be defined by the stochastic differential equation (Cox and Miller, 1965, Section 5.8, p. 226)

$$dX_t = (a - bX_t)\,dt + \sigma\,dW_t, \tag{2.8}$$

where $-\infty < a < \infty$, $b > 0$ and $\sigma > 0$. According to Aalen and Gjessing (2004), $X_0$ is typically modeled as Gaussian or treated as a constant. This equation tells us that for small time intervals $(t, t + \Delta t)$, the change in $X_t$ has drift toward $a/b$, but is agitated by the Gaussian noise contained in $dW_t$ (often called white noise). This process is attracted to the equilibrium point $a/b$. This attraction is known as the mean-reverting property of the OU process.

As shown in Aalen and Gjessing (2004), $X$ is Gaussian, and $E X_t = a/b + (x_0 - a/b)\exp(-bt)$, which converges to $a/b$ as $t \to \infty$. Also, $\mathrm{Var}(X_t) = [\sigma^2/(2b)](1 - \exp(-2bt))$, which converges to $\sigma^2/(2b)$ as $t \to \infty$, and $\mathrm{Cov}(X_s, X_t) = [\sigma^2/(2b)][\exp(-b|s - t|) - \exp(-b(s + t))]$. If we ignore the initial fluctuations at the start of the process due to $X_0 \neq a/b$, the OU process is stationary and Gaussian with an autocorrelation function that decays exponentially over time (Aalen and Gjessing, 2004). Details regarding the OU process are also available in Aalen et al. (2008).

To demonstrate the tendency for the OU process to reach a state of equilibrium in contrast to the Wiener process, corresponding sample paths are generated for the two processes and shown in Figure 2.3. The initial value, $X_0$, of all processes was set to 4, $\sigma^2$ was set to 2, the mean of the OU process was set to 0 ($a = 0$, $b = 1$) and the drift parameter for the Wiener process was set to $-2$, 0 and 2. In this plot we see the Wiener process with positive drift tends to move away from $X_0$ in a positive direction, and the Wiener process with negative drift tends to

move away from $X_0$ in a negative direction. For the Wiener process with zero drift, the path tends to stay close to the starting point of the process. The OU sample path moves toward the process mean of 0 and stabilizes. This behavior illustrates the mean-reverting property of the OU process. The variance parameter of the Wiener processes shown in this plot is 2, while the variance of the OU process converges to 1 as $t$ approaches infinity.

[Figure 2.3: Sample Paths of OU Process and Wiener Process. Plot title: Wiener Process (drift = -2, 0, and 2) and OU Process (mean = 0) Paths with Xo = 4. Horizontal axis: Time (years). Vertical axis: Health Status.]

From this point on, it is assumed that the process is absorbed once it hits zero. The OU process $X_t$ is a process describing the latent progression of a subject toward a health-related event. Suppose we let $X_t$ correspond to a subject's disease development, which may be latent. Then, at the time the development reaches a particular level, an event occurs for that subject. Therefore, we define the subject's event time as the first time the latent process $X_t$ hits a threshold. We assume the OU process, $X_t$,

starts at a deterministic positive value $x_0$. We define $T = \inf\{t : X_t = 0\}$, where $T \geq 0$, to be the event time (Aalen and Gjessing, 2004). We can define the hazard rate for continuous $T$ by

$$h(t) = -\frac{\frac{d}{dt}P(T > t)}{P(T > t)}. \tag{2.9}$$

It can be shown that when $t$ approaches $\infty$, the hazard rate $h(t)$, the rate of the first passage across the boundary, converges to a constant $h_0$ (Aalen and Gjessing, 2004).

The Shape of the Hazard Function

Exact formulas for the hazard rate exist in the symmetric case ($a = 0$ in equation (2.8)). Thus, in this situation, we are modeling time to homeostasis since the mean and the threshold of the process are both equal to 0. Unfortunately, under the general OU model, there is no closed form for the hazard rate. In Finch (2004), an attempt was made to find the general formula in closed form, but only a numerical solution is available. Also, Aalen and Gjessing (2004) state that a closed-form symbolic inversion is hardly possible in general for the Laplace transforms required to find formulas for the density and survival functions.

In Ricciardi and Sato (1988, p. 46) the probability density of time to event when starting in $X_0$ (parameter values $a = 0$, $b = 1$, $\sigma^2 = 2$) is given by

$$f(t) = \sqrt{\frac{2}{\pi}}\,\frac{X_0 e^{2t}}{(e^{2t} - 1)^{3/2}} \exp\left(-\frac{X_0^2}{2(e^{2t} - 1)}\right) \tag{2.10}$$

and the corresponding survival function is

$$S(t) = 2\Phi\left(\frac{X_0}{\sqrt{e^{2t} - 1}}\right) - 1, \tag{2.11}$$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. The corresponding hazard rate is calculated as $h(t) = f(t)/S(t)$. For the OU process with parameter values $a = 0$, $b = 1$, $\sigma^2 = 2$, the hazard rate starts at 0 and then converges toward an equilibrium level. In the case of these specific model parameters, the hazard rate converges to 1. According to Aalen and Gjessing (2004), this convergence indicates the advancement of the underlying distribution toward quasi-stationarity on the state space (see also Aalen and Gjessing, 2001).

Aalen and Gjessing (2004) discuss how the shape of the hazard function changes with $X_0$; Figure 2.4 is a redrawing of their Figure 1. The hazard rate is generally increasing if $X_0$ is far from 0. When $X_0$ moves closer to zero, we get a more unimodal hazard. For $X_0$ close to 0, we obtain a generally decreasing hazard rate. Note that the hazard function corresponding to $X_0 = 0.2$ in Figure 2.4 starts out at zero, has a strong initial increase and then is generally decreasing. Therefore, the shape of the hazard rate is driven by the distance of $X_0$ from the threshold (Aalen and Gjessing, 2001).
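The closed-form quantities in this section can be evaluated directly, which gives a quick numerical check on the limiting behavior described above. The sketch below codes the OU mean and variance formulas together with (2.10) and (2.11) for the special case $a = 0$, $b = 1$, $\sigma^2 = 2$. The function names are hypothetical. The survival function is written as $\mathrm{erf}(X_0/\sqrt{2(e^{2t} - 1)})$, which equals $2\Phi(\cdot) - 1$ algebraically and is used here only as an assumption-free numerical convenience for large $t$.

```python
import math


def ou_mean(t, x0, a, b):
    """E[X_t] for the OU process dX = (a - bX)dt + sigma dW started at x0."""
    return a / b + (x0 - a / b) * math.exp(-b * t)


def ou_var(t, b, sigma2):
    """Var(X_t) for the same process; tends to sigma2 / (2b) as t grows."""
    return sigma2 / (2.0 * b) * (1.0 - math.exp(-2.0 * b * t))


def ou_survival(t, x0):
    """S(t) from (2.11); valid only for a = 0, b = 1, sigma^2 = 2.
    Uses 2*Phi(z) - 1 = erf(z / sqrt(2)) to avoid cancellation for small z."""
    u = math.expm1(2.0 * t)  # e^{2t} - 1
    return math.erf(x0 / math.sqrt(2.0 * u))


def ou_density(t, x0):
    """f(t) from (2.10); same parameter restriction as ou_survival."""
    u = math.expm1(2.0 * t)
    return (math.sqrt(2.0 / math.pi) * x0 * math.exp(2.0 * t) / u ** 1.5
            * math.exp(-x0 ** 2 / (2.0 * u)))


def ou_hazard(t, x0):
    """h(t) = f(t) / S(t) for the symmetric case."""
    return ou_density(t, x0) / ou_survival(t, x0)


if __name__ == "__main__":
    # Tabulate the hazard for starting values near and far from the threshold.
    for x0 in (0.2, 1.0, 3.0):
        row = ", ".join(f"{ou_hazard(t, x0):.4f}" for t in (0.05, 0.5, 2.0, 10.0))
        print(f"X0 = {x0}: h(t) at t = 0.05, 0.5, 2, 10 -> {row}")
```

Evaluating `ou_hazard` on a grid of times for several starting values is one way to redraw the qualitative shapes discussed above; the general case $a \neq 0$ is not covered because no closed form is available there.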
