Prediction Error Estimation for Cure Probabilities in Cure Models and Its Applications


Prediction Error Estimation for Cure Probabilities in Cure Models and Its Applications

by

Haoyu Sun

A thesis submitted to the Department of Mathematics and Statistics in conformity with the requirements for the degree of Master of Science

Queen's University
Kingston, Ontario, Canada
September 2014

Copyright Haoyu Sun, 2014

Abstract

Cure models are often used to describe survival data with a cure fraction. Many researchers have proposed different methods for model fitting; however, little research has been conducted on the assessment of prediction error for cure models. This report proposes an estimate of the expected Brier score as a measurement of prediction error for mixture cure models, with particular regard to the prediction of cure status. Both resubstitution and cross-validation methods were used to calculate the value of the proposed estimate. Simulation studies demonstrated that both of these methods work well in terms of assessing prediction error, especially when the sample size is large. The proposed prediction error estimates are shown to be able to detect differences between prediction models in the presence of model misspecification. Application of the proposed estimate to a data set of bone marrow transplant patients demonstrates the usefulness of this method in practice.

Acknowledgments

First and foremost, I would like to acknowledge and thank my supervisors Dr. Wenyu Jiang and Dr. Paul Peng for all their guidance. No matter how busy they were, Dr. Jiang and Dr. Peng always made themselves available to discuss and review my master's project. I am grateful for their advice and their care for my studies. I am thankful to have had them as my mentors, and my gratitude cannot be captured in words alone. I would also like to thank Dr. Dongsheng Tu, who introduced me to the area of biostatistics and inspired me to pursue my master's degree in this area. I am also grateful for his instruction in the course Advanced Biostatistics, which taught me how to apply statistical methods to real problems in clinical trials. A number of individuals have supported me in finishing this master's program. The department graduate assistant Jennifer Read helped me greatly with daily routines so that I could stay focused on my study. I would like to thank my friends and colleagues from both the Department of Mathematics and Statistics and the Department of Public Health for making me feel at home from day one. My undergraduate classmate Jiadong Mao from the University of Melbourne and Benyu Wang from George Washington University helped me with some coding issues in my project. Last but not least, I would like to thank my parents, who have been my constant

support throughout my life. Their support encouraged me whenever I came across any problems in my study or my life. Their trust in me has been a great encouragement for me to overcome all the difficulties and finish this master's program.

Contents

Abstract
Acknowledgments
Contents
List of Tables
List of Figures

Chapter 1: Introduction
  1.1 Background
  1.2 Contribution
  1.3 Organization of the Report

Chapter 2: Review of Mixture Cure Model and Prediction Error Measurement
  2.1 Chapter Overview
  2.2 Notation and Problem Setting for Survival Data
  2.3 Mixture Cure Model
  2.4 Prediction Error of a Survival Model
      Expected Brier Score

Chapter 3: Prediction Error Measurement for Mixture Cure Model
  3.1 Chapter Overview
  3.2 Measurement of Prediction Error for Mixture Cure Models
  3.3 Other Error Estimates

Chapter 4: Simulation Study
  4.1 Chapter Overview
  4.2 Data Generation
  4.3 Resubstitution Method
  4.4 Cross-Validation Method
  4.5 Model Misspecification
      Model with Redundant Variables
      Model with Variable Left Out
  4.6 Model Comparison

Chapter 5: Application to Bone Marrow Transplant Data
  5.1 Chapter Overview
  5.2 Data Description and Problem Setting
  5.3 Analysis
  5.4 Conclusion

Chapter 6: Summary and Future Work
  6.1 Summary
  6.2 Future Work

Appendix A: R Codes
  A.1 Codes for Simulation Studies
  A.2 Codes for Model Comparison
  A.3 Codes for the Bone Marrow Transplant Data

List of Tables

4.1 Key Features of the Simulated Data
4.2 Prediction Error Measurement Using Resubstitution Method
4.3 Prediction Error Measurement Using Cross-Validation Method
4.4 Prediction Error Measurement for Models with Redundant Variables
4.5 Prediction Error Measurement for Models with Missing Variable
4.6 Results for Model Comparison
4.7 Results for Model Comparison with a Narrower Censoring Distribution
5.1 Variable Description
5.2 Prediction Error for Each Model
5.3 Results from Model 1 and Model 2
5.4 Results from Model 3 and Model 4

List of Figures

4.1 K-M Plot of the Simulated Data
4.2 Plots with Different Sample Size using Resubstitution Method
4.3 Plots with Different Sample Size using Cross-Validation Method
4.4 Plots of Prediction Error for Models with Redundant Variables
4.5 Plots of Prediction Error for Models with Missing Variable
4.6 K-M Plot of the Simulated Data with a Narrower Censoring Distribution
5.1 K-M Plot of the Entire Data
5.2 K-M Plot for Each Disease Group

Chapter 1

Introduction

1.1 Background

In some cancer clinical trial settings, where participants are grouped and receive different treatments, survival data with a sizable cure fraction are commonly encountered. The cure fraction is a useful measure for monitoring trends in survival of curable diseases. Traditional survival analysis techniques, such as the Cox proportional hazards model, provide no direct estimation of the cure fraction. If it is believed that a proportion of individuals will not experience the event of interest, then it may be appropriate to fit models that explicitly allow for the cure fraction to be estimated and directly modeled (Lambert et al., 2007). For this purpose, cure rate models, or cure models, are often used. Because of the mixture nature of patients in the trials, the most popular type of cure model is the mixture model (Peng, 2003). For example, Weston et al. (2004) used a cure model to study leukemia in children and found the proportion of children cured with the ET-2 protocol to be 70% among those who had a nonpelvic primary site for less than 10 years without metastatic disease. Li et al.

(2010) analyzed smoking cessation data with a novel cure model and found a positive but non-significant association between the lapse and recovery frailties. Yilmaz et al. (2013) applied cure models in the analysis of molecular genetic prognostic factors for disease-free survival and time to disease recurrence in a cohort of patients with axillary lymph node-negative breast cancer. Many researchers have proposed different estimation methods for cure models, such as parametric models (Farewell, 1986; Yamaguchi, 1992; Peng et al., 1998; Wileyto et al., 2013), semiparametric models (Peng, 2003; Niu and Peng, 2013; Zhang et al., 2013), and nonparametric models (Peng and Dear, 2000). In addition, Corbiere and Joly (2007) and Cai et al. (2012) have developed a SAS macro and an R package, respectively, for fitting these models. It is of great interest to assess the performance of a mixture cure model in terms of its prediction error, which evaluates the extent to which the predicted event outcomes agree with the observed event outcomes for future patients. The performance of point predictors and predictive distributions is often assessed through loss functions and their expectations, which are termed prediction error or prediction accuracy (Lawless and Yuan, 2010). For the purpose of evaluating risk prediction models in survival analysis, time-dependent residuals are defined as the difference between the time-dependent survival status and the predicted survival probabilities. Korn and Simon (1990) introduced a general loss function approach for assessing survival models. However, one assumption underlying their approach is that the survival models are correctly specified. This is problematic when one compares different survival models, because when models are misspecified, the bias depends on the degree of misspecification. Graf et al. (1999) proposed estimators of the expected Brier score that avoid any dependence on the assumed survival model by using the

inverse probability-of-censoring weight (IPCW). The estimator was later proved to be consistent with the expected Brier score (Gerds and Schumacher, 2006). However, little research has been conducted concerning the estimation of prediction error for a mixture cure model. In this report, I will propose a measurement of the prediction error and its estimation methods for a mixture cure model. Since the availability of future data is always an issue, we usually need to estimate the prediction error of a model based on current data, by allocating one part of the data as the training data set to build the prediction model and the other part as the testing data set to assess the model. For example, resubstitution or cross-validation methods are often used when estimating the prediction error for survival data (Lawless and Yuan, 2010). In this report, I used both resubstitution and cross-validation methods to calculate the proposed measurement of prediction error for mixture cure models.

1.2 Contribution

This project was inspired by the estimators proposed by Graf et al. (1999). It proposes an estimator of prediction error for the mixture cure model. The proposed estimator can be applied to multicovariate cases and can be used to compare different cure models. A simulation study shows that this measurement converges to the expected Brier score when the sample size is large, and that it works well for moderate to large sample sizes, which are typical in application areas involving survival predictions.

1.3 Organization of the Report

In Chapter 2, I will review the mixture cure model and prediction error measurement, particularly the assessment method proposed by Graf et al. (1999). Chapter 3 describes the details of the newly proposed prediction error measurement and some methods to assess the measurement. Results of simulation studies, with both resubstitution and cross-validation methods, are reported and discussed in Chapter 4. Chapter 5 presents an application of the proposed method to a data set of bone marrow transplant patients extracted from Klein and Moeschberger (2003). In Chapter 6, some conclusions are made regarding the use and properties of the proposed measurement of prediction error for mixture cure models. Future work is also discussed.

Chapter 2

Review of Mixture Cure Model and Prediction Error Measurement

2.1 Chapter Overview

This chapter reviews the mixture cure model and prediction error measurement for survival models. Specifically, Section 2.2 presents notation and the problem setting. Section 2.3 reviews the mixture cure model. Section 2.4 reviews prediction error measurement for survival models.

2.2 Notation and Problem Setting for Survival Data

Suppose that there is a random sample of n subjects from an underlying population. Let T_i and C_i denote the potential failure time and the potential censoring time, respectively. For example, T_i could be the time from receiving treatment to the relapse of a particular disease, and C_i could be the time the subject leaves the study, or the end of the study for those disease-free subjects. Assume that T_i and C_i are independent given covariates x_i and z_i, where x is a covariate vector that contains covariates that affect the survival time of the uncured subjects, and z is a covariate

vector that affects the uncure probabilities of the subjects. Let T̃_i = min(T_i, C_i) denote the observed time for the i-th individual, let δ_i be the censoring indicator: δ_i = 1 if T_i ≤ C_i and δ_i = 0 otherwise, and let U_i be an indicator of uncured status, i.e., U_i = 1 if the patient is not cured and U_i = 0 otherwise. Obviously, U_i is a latent variable that is only observable for uncensored subjects. Then, the observed data form the set D = {T̃_i, δ_i, x_i, z_i}_{i=1}^n.

In survival analysis, the hazard function is defined as

    λ(t | Z) = lim_{ε→0} Pr{t ≤ T ≤ t + ε | Z} / (ε Pr{T ≥ t | Z}),    (2.1)

for a given covariate vector Z. The proportional hazards model was proposed by Cox (1972). It assumes that the hazard function is affected by the covariates Z in the form λ(t) = λ_0(t) exp{g(Z)}, where λ_0(·) is the baseline hazard and g(Z) reflects the covariate effects. Generally, g(Z) = β^T Z, where β is an unknown parameter vector. This leads to the usual Cox proportional hazards model

    λ(t) = λ_0(t) exp{β^T Z}.    (2.2)

2.3 Mixture Cure Model

In some clinical studies, a substantial proportion of patients who respond favorably to treatment may turn out to be free of any signs or symptoms of the disease and may be considered cured, while the remaining patients may eventually relapse. Long-term censored survival times usually appear in such data (Peng et al., 1998). Farewell (1986) presented a typical case in a study of breast cancer. The problems of interest in that study included the proportion of patients that could be cured and

the effects of the treatment methods and other factors on the cure rate and on the failure time of uncured patients. Given the notation in the previous section, the mixture cure model can be presented as follows:

    S(t | x, z) = π(z) S_u(t | x) + 1 − π(z),    (2.3)

where S(t | x, z) is the unconditional survival function of T for the entire population, S_u(t | x) = P(T > t | U = 1, x) is the survival function for uncured patients given the covariate vector x = (x_1, ..., x_m), and π(z) = P(U = 1 | z) is the probability of being uncured given the covariate vector z = (z_1, ..., z_q). The probability of being uncured is often expressed in a logistic form as follows:

    logit[π(z)] = log[π(z) / (1 − π(z))] = (1, z′)γ,    (2.4)

where γ describes the effect of z on the probability π(z). Let h_u(t | x) denote the hazard function of an uncured patient with covariate x at time t. If the proportional hazards model is considered for h_u(t | x), then h_u(t | x) = h_u0(t) exp(ζ), where h_u0(t) is the baseline hazard function and ζ = β′x. The proportional hazards mixture cure model can be written as

    S(t | x, z) = π(z) S_u0(t)^{exp(ζ)} + 1 − π(z),    (2.5)

where S_u0(t) = exp{−∫_0^t h_u0(w) dw} is the baseline survival function. The likelihood method is often applied for parameter estimation. For proportional hazards mixture cure models, given the cure statuses u = (u_1, ..., u_n), the complete likelihood function for equation (2.5) is

    ∏_{i=1}^n π(z_i)^{u_i} {1 − π(z_i)}^{1 − u_i} h_u(t_i | x_i)^{δ_i} S_u(t_i | x_i)^{u_i}.
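To make the mixture formulation concrete, here is a small Python sketch (the thesis's own code is in R; this translation and its parameter choices are illustrative only) that evaluates the logistic uncure probability (2.4) and the proportional hazards mixture survival function (2.5) with an exponential baseline S_u0(t) = e^{−t}:

```python
import math

def uncure_prob(z, gamma):
    """pi(z) from the logistic form (2.4): logit[pi(z)] = (1, z') gamma."""
    eta = gamma[0] + sum(g * zj for g, zj in zip(gamma[1:], z))
    return 1.0 / (1.0 + math.exp(-eta))

def mixture_survival(t, x, z, beta, gamma):
    """S(t|x,z) = pi(z) * S_u0(t)^exp(beta'x) + 1 - pi(z), with S_u0(t) = exp(-t)."""
    pi = uncure_prob(z, gamma)
    s_u0 = math.exp(-t)                          # exponential baseline survival
    s_u = s_u0 ** math.exp(sum(b * xj for b, xj in zip(beta, x)))
    return pi * s_u + (1.0 - pi)

# parameter values borrowed from the simulation design of Chapter 4
gamma = [2.0, -1.0, -1.5]
beta = [math.log(0.5), math.log(0.4)]
pi0 = uncure_prob([0, 0], gamma)                 # ~0.88: most such subjects uncured
s_inf = mixture_survival(50.0, [0, 0], [0, 0], beta, gamma)
# as t grows, S(t|x,z) approaches the cure fraction 1 - pi(z)
```

Note the defining feature of a cure model: the survival function plateaus at 1 − π(z) rather than decaying to zero.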

Peng (2003) proposed that the EM algorithm could be used to fit the semiparametric cure model. The E-step in the (r+1)-th iteration calculates the conditional expectation of the complete log-likelihood function given the estimates at the r-th iteration, β^(r), γ^(r), and S_u0^(r)(t), which is the sum of the following two functions:

    L_1(γ) = log ∏_{i=1}^n π(z_i)^{p_i^(r)} {1 − π(z_i)}^{1 − p_i^(r)},    (2.6)

    L_2(β, S_u0(t)) = log ∏_{i=1}^n [h_u0(t_i) exp(ζ_i)]^{δ_i} S_u0(t_i)^{p_i^(r) exp(ζ_i)},    (2.7)

where p_i^(r) = E{u_i | γ^(r), β^(r), S_u0^(r)(t)} = P{u_i = 1 | γ^(r), β^(r), S_u0^(r)(t)} is given by

    p_i^(r) = δ_i + (1 − δ_i) · π^(r)(z_i) S_u0^(r)(t_i)^{exp(ζ_i^(r))} / [1 − π^(r)(z_i) + π^(r)(z_i) S_u0^(r)(t_i)^{exp(ζ_i^(r))}],

with logit[π^(r)(z_i)] = (1, z_i′)γ^(r) and ζ_i^(r) = x_i′β^(r). The M-step in the (r+1)-th iteration maximizes equations (2.6) and (2.7) separately to obtain γ^(r+1), β^(r+1), and S_u0^(r+1)(t). The algorithm is iterated until it converges.

A log-normal mixture cure model is also considered in this work, where for individuals who are uncured (U = 1), the time to the event is modeled with a log-normal distribution, with density function

    f(t | U = 1, x) = φ((log(t) − μ)/σ) / (σt),    (2.8)

where φ(·) is the probability density function of the standard normal distribution, σ > 0 is a shape parameter, and μ = β′x. The log-normal mixture cure model can then be
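The E-step weight p_i^(r) has a simple closed form, and computing it is the heart of the algorithm: observed events are known to be uncured, while censored subjects receive a fractional uncured probability. The sketch below (illustrative Python, not the thesis's R code) computes this weight for a single subject given the current parameter estimates:

```python
def e_step_uncured_prob(delta, pi, s_u):
    """E-step weight: p_i = delta_i + (1 - delta_i) * pi*S_u / (1 - pi + pi*S_u).

    delta: censoring indicator (1 = event observed, so the subject is uncured);
    pi:    current estimate of the uncure probability pi^(r)(z_i);
    s_u:   current estimate S_u0^(r)(t_i)^exp(zeta_i^(r)) of conditional survival.
    """
    if delta == 1:
        return 1.0                  # observed events are certainly uncured
    num = pi * s_u
    return num / (1.0 - pi + num)

# hypothetical values: a subject censored late (S_u near 0) vs. censored early
p_late = e_step_uncured_prob(0, 0.8, 0.01)   # long event-free follow-up -> likely cured
p_early = e_step_uncured_prob(0, 0.8, 0.9)   # censored early -> likely still uncured
```

The weight shrinks toward zero as the censored follow-up time grows, which is exactly the intuition that long event-free survival is evidence of cure.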

written as

    S(t | x, z) = π(z) [1 − Φ((log(t) − μ)/σ)] + 1 − π(z),    (2.9)

where Φ(·) is the cumulative distribution function of the standard normal distribution. The likelihood method is also applied to obtain the estimates of the parameters.

When the survival time for uncured patients is assumed to follow an exponential distribution, an exponential mixture cure model is used to fit the data. The density function for uncured patients in exponential mixture cure models is

    f(t | U = 1, x) = λ e^{−λt},    (2.10)

where λ > 0 is called an inverse scale parameter. The exponential distribution is a log location-scale distribution of the form

    Y = log(T) = μ + W,    (2.11)

where μ = β′x and W ~ EV(0, 1). Thus,

    S_T(t | x) = P(Y ≥ log(t)) = e^{−e^{log(t) − μ(x)}} = [e^{−e^{log(t)}}]^{e^{−μ(x)}} = [S_0(t)]^{e^{−μ(x)}} = [S_0(t)]^{e^{−β′x}},

where S_0(t) is the baseline survival function with x = 0. Thus, the exponential distribution satisfies the proportional hazards assumption, and the exponential mixture cure model is a special case of the proportional hazards mixture cure model.
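As an illustration, the log-normal mixture survival function (2.9) can be evaluated with nothing more than the error function. The helper names below are hypothetical, and π and μ are passed directly rather than computed from fitted coefficients:

```python
from math import erf, exp, log, sqrt

def std_normal_cdf(x):
    """Phi(x) via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def lognormal_mixture_survival(t, pi, mu, sigma):
    """S(t|x,z) = pi * [1 - Phi((log t - mu)/sigma)] + 1 - pi, as in (2.9)."""
    s_u = 1.0 - std_normal_cdf((log(t) - mu) / sigma)
    return pi * s_u + (1.0 - pi)

# at t = exp(mu), the uncured survival is exactly 0.5,
# so S = 0.6 * 0.5 + 0.4 = 0.7 for these illustrative values
s_med = lognormal_mixture_survival(exp(0.5), pi=0.6, mu=0.5, sigma=1.0)
```

As in the proportional hazards version, S(t | x, z) levels off at the cure fraction 1 − π(z) for large t.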

2.4 Prediction Error of a Survival Model

Let Y be the actual value of the survival time and Ŷ the predicted value of Y. The error of a prediction Ŷ of Y can be quantified using a loss function L(Y, Ŷ), assumed to be nonnegative with L(Y, Y) = 0 (Lawless and Yuan, 2010). Some of the most common loss functions are: squared error loss (Y − Ŷ)², absolute error loss |Y − Ŷ|, and misclassification error loss I(W_t ≠ Ŵ_t), where W_t is the actual survival status and Ŵ_t is the estimate of W_t. One may also consider the squared and absolute error losses on the log scale, i.e., taking the logarithm of Y, because the distribution of log(Y) is typically more symmetric. The performance of a predictor Ŷ = Ĝ(z) is measured by the expected loss, or prediction error:

    P = E{L(Y, Ĝ(z))}.    (2.12)

Expected Brier Score

The Brier index, or Brier score (Brier, 1950), has been widely used to assess the quality of probability estimates. Suppose that on each of n occasions an event can occur in only one of r possible classes or categories, and that on each occasion the probability that the event will occur in class j is f_ij, i = 1, ..., n, j = 1, ..., r. The Brier score is defined as

    BS = (1/n) ∑_{j=1}^r ∑_{i=1}^n (f_ij − E_ij)²,    (2.13)

where E_ij indicates whether the event of occasion i occurred in class j or not: E_ij = 1 if the event of occasion i occurred in class j and E_ij = 0 otherwise.
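Equation (2.13) is straightforward to compute; a minimal sketch (illustrative Python, with made-up forecasts):

```python
def brier_score(f, e):
    """BS = (1/n) * sum_j sum_i (f_ij - E_ij)^2, as in (2.13).

    f: n x r matrix of forecast probabilities f_ij;
    e: n x r matrix of outcome indicators E_ij (1 if occasion i fell in class j).
    """
    n = len(f)
    return sum((fij - eij) ** 2
               for fi, ei in zip(f, e)
               for fij, eij in zip(fi, ei)) / n

# two occasions, two classes: a confident correct forecast and a 50/50 miss
f = [[0.9, 0.1], [0.5, 0.5]]
e = [[1, 0], [0, 1]]
bs = brier_score(f, e)   # ((0.1^2 + 0.1^2) + (0.5^2 + 0.5^2)) / 2 = 0.26
```

A perfectly confident and correct forecaster scores 0; larger values indicate worse probability estimates.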

This score quantifies the accuracy of a set of judgments by comparing the expressed probabilities to the actual outcomes (Redelmeier et al., 1991). In survival analysis, time-dependent residuals are defined as the difference between the time-dependent survival status and the predicted survival probabilities; i.e., at a fixed time t*, the time-dependent residuals are defined as I(T > t*) − S(t*). For observed data D, the Brier score at time t* can be defined as

    BS(t*) = (1/n) ∑_{i=1}^n [I(T_i > t*) − S_D(t* | x_i)]²,    (2.14)

where S_D(t* | x_i) is the survival probability for subject i at time t*, given the observed covariates x_i in data set D, if all T_i's are observed, and I(T_i > t*) indicates whether subject i is still alive at time t*. The expected Brier score at time t* is defined as

    EBS(t*) = E{[I(T > t*) − S_D(t* | x)]²}.    (2.15)

It can be interpreted as a mean squared error of prediction when the survival probabilities S_D(t* | x) are viewed as predictions of the event status at time t*; the expectation is taken over the data set D on which the prediction model is built, and over T and x from the underlying distribution.

Censoring often occurs in survival data, which makes it difficult to calculate the Brier score (2.14) directly. Graf et al. (1999) proposed an estimator of the expected Brier score as the measure of prediction error for right-censored survival data with prognostic classification schemes, which was later proved to be consistent with the

expected Brier score (Gerds and Schumacher, 2006). They proposed that, given observed covariates x_i, the prediction error for right-censored survival data at time point t* can be measured as follows:

    BS_c(t*) = (1/n) ∑_{i=1}^n { (0 − Ŝ_D(t* | x_i))² I(T̃_i ≤ t*, δ_i = 1)(1/Ĝ(T̃_i)) + (1 − Ŝ_D(t* | x_i))² I(T̃_i > t*)(1/Ĝ(t*)) },    (2.16)

where Ŝ_D(t* | x_i) are the estimated event-free probabilities, and Ĝ(t) denotes the Kaplan-Meier estimate of the censoring distribution G, i.e., the Kaplan-Meier estimate based on (T̃_i, 1 − δ_i), i = 1, ..., n. The contributions of the observations in the data set to the Brier score can be classified into three categories:

Category 1: T̃_i ≤ t* and δ_i = 1;
Category 2: T̃_i > t* (δ_i = 1 or δ_i = 0);
Category 3: T̃_i ≤ t* and δ_i = 0.

For an uncensored observation in category 1, the event occurred before t*, and the event status at t* is I(T_i > t*) = 0; thus the contribution to the Brier score is (0 − Ŝ_D(t* | x_i))². In category 2, the observed event status at t* is equal to 1, since all of these patients are known to be event-free at t*; the resulting contribution to the Brier score is (1 − Ŝ_D(t* | x_i))². For a censored observation in category 3, the censoring occurred before t*, and the event status at t* is unknown; thus this observation does not contribute to the Brier score.

Since some censored observations do not contribute to the Brier score calculation, the remaining individual contributions have to be reweighted to compensate for the loss of information due to the exclusion of some censored observations. Observations

in category 1 get the weight 1/Ĝ(T̃_i), where G(T̃_i) = P(C_i > T̃_i), which corresponds to the situation of category 1; those in category 2 get the weight 1/Ĝ(t*), where G(t*) = P(C_i > t*), which corresponds to the situation of category 2; and those in category 3 get weight zero. This weighting scheme is called the inverse probability-of-censoring weight (IPCW), which was first proposed by Robins et al. (1995). Gerds and Schumacher (2006) proved that this weighting scheme does not depend on the estimated event-free probabilities Ŝ_D(t* | x_i), and hence the estimator is robust against misspecification of the survival model.
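The following Python sketch puts the pieces of (2.16) together: a Kaplan-Meier estimate of the censoring survival function Ĝ computed from (T̃_i, 1 − δ_i), and the category-based IPCW sum. It is an illustrative reimplementation (the thesis uses R), and it handles ties naively:

```python
def km_censoring_survival(times, delta):
    """Kaplan-Meier estimate of the censoring survival function G(t),
    treating censored observations (delta = 0) as the 'events' for G."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    steps = []                        # (time, value of G just after that time)
    g = 1.0
    for i in order:
        if delta[i] == 0:             # a censoring counts as an event for G
            g *= 1.0 - 1.0 / at_risk
        steps.append((times[i], g))
        at_risk -= 1
    def G(t):
        value = 1.0
        for time, g_val in steps:
            if time <= t:
                value = g_val
            else:
                break
        return value
    return G

def ipcw_brier(times, delta, surv_pred, t_star):
    """Graf et al.'s estimator BS_c(t*) of (2.16); surv_pred[i] plays the
    role of the model-based event-free probability S_D(t* | x_i)."""
    G = km_censoring_survival(times, delta)
    total = 0.0
    for ti, di, si in zip(times, delta, surv_pred):
        if ti <= t_star and di == 1:      # category 1: event observed before t*
            total += (0.0 - si) ** 2 / G(ti)
        elif ti > t_star:                 # category 2: known event-free at t*
            total += (1.0 - si) ** 2 / G(t_star)
        # category 3: censored before t* -> weight zero
    return total / len(times)
```

A useful sanity check: with no censoring at all, Ĝ ≡ 1 and BS_c(t*) reduces to the ordinary Brier score (2.14).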

Chapter 3

Prediction Error Measurement for Mixture Cure Model

3.1 Chapter Overview

This chapter describes a method for measuring prediction error for mixture cure models. Specifically, Section 3.2 describes the details of the method for prediction error measurement for mixture cure models. Section 3.3 introduces some other assessment methods for prediction error to compare with the proposed estimate.

3.2 Measurement of Prediction Error for Mixture Cure Models

In the setting of mixture cure models, the expected Brier score is defined as

    EBS(z) = E{[U − π̂(z | D)]²},    (3.1)

where π̂(z | D) represents the estimated uncure probability based on the covariate vector z, given data D. The expectation is taken over the data D and over the covariate vector z and uncure status U of future patients. The expected Brier score can be interpreted

as a mean squared error of prediction when the estimated probabilities π̂_i are viewed formally as predictions of the uncured status. It can be used as a measurement of prediction error for the cure rate in mixture cure models.

Using the estimated Brier score has a remarkable advantage. At the baseline time point, the survival status at a later time t is unknown. This means that the predictions are made in terms of predictive values of a diagnostic test, i.e., probabilities of a positive or negative cure status, instead of classifying the patient as cured or uncured. It has been documented in the literature that, even in the diagnostic setting, the predictive value associated with the result of a diagnostic test is more relevant than the test result itself (Graf et al., 1999). Thus, to judge the quality of classification, the Brier score, which measures the average discrepancy between the true uncure status and the estimated predictive values, may be preferable to the misclassification rate, which only considers the proportion of observations allocated to the incorrect group.

Resubstitution and cross-validation are often used when estimating the prediction error for survival data. In this report, I use both methods to estimate the Brier score for the mixture cure model in the presence of censored subjects. Let L denote the proposed estimate of prediction error, and let L_R and L_CV denote the value of L calculated by the resubstitution and cross-validation methods, respectively.

In resubstitution, one simply builds a mixture cure model based on the entire data set, and then calculates the value of the estimate of prediction error using the predicted uncure rates from the model on the same data set. That is, the prediction error

for the mixture cure model, using the resubstitution method, can be calculated as

    L_R = (1/n) ∑_{i=1}^n { (1 − π̂(z_i | D))² I(t_i ≤ c_i)(1/Ĝ_D(t_i)) + (0 − π̂(z_i | D))² I(t_i > t*)(1/Ĝ_D(t*)) },    (3.2)

where π̂(z_i | D) are the estimated uncure probabilities based on the covariate vectors z_i in data D; t_i and c_i are, respectively, the observed failure time and censoring time; t* is the largest observed failure time; and Ĝ_D(·) is the Kaplan-Meier estimate of the survival function for the censoring time based on data D. The idea underlying this estimate is that the subjects fall into the following categories:

Category 1: δ_i = 1;
Category 2: t_i > t*;
Category 3: t_i ≤ t* and δ_i = 0.

All patients in Category 1 are uncured, since their failure times are observed; thus their contribution to the Brier score is (1 − π̂(z_i | D))². It is assumed that those censored after t* are cured, so all patients in Category 2 are assumed to be cured, and the resulting contribution of this category to the Brier score is (0 − π̂(z_i | D))². However, the cure status of patients in Category 3 is not known, so the contribution of this category to the Brier score is not included. I use the same weighting scheme as in Graf et al.'s paper to recover the loss of information due to censoring. Divided by n, these weights sum up to 1. Patients in Categories 1 and 2 have C_i > T_i and C_i > t*, respectively, and they are assigned the weights 1/(n Ĝ_D(T̃_i)) and 1/(n Ĝ_D(t*)) according to the IPCW scheme. Category 3 is dropped because of its unknown status.
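A direct transcription of (3.2) into Python (illustrative; the thesis's implementation is in R, and Ĝ_D is passed in as a function, e.g. the output of a Kaplan-Meier fit):

```python
def resub_prediction_error(obs_times, delta, pi_hat, G_D, t_star):
    """Resubstitution estimate L_R of equation (3.2).

    obs_times[i]: observed time min(t_i, c_i); delta[i] = I(t_i <= c_i);
    pi_hat[i]: estimated uncure probability pi-hat(z_i | D);
    G_D: estimate of the censoring survival function;
    t_star: largest observed failure time."""
    total = 0.0
    for ti, di, pi in zip(obs_times, delta, pi_hat):
        if di == 1:                  # category 1: event observed -> uncured
            total += (1.0 - pi) ** 2 / G_D(ti)
        elif ti > t_star:            # category 2: censored after t* -> treated as cured
            total += (0.0 - pi) ** 2 / G_D(t_star)
        # category 3: censored at or before t* -> unknown status, weight zero
    return total / len(obs_times)

# toy example with no censoring before t*, so G_D is identically 1:
# one observed event predicted uncured (0.9), one late censoring predicted cured (0.1)
L_R = resub_prediction_error([1.0, 5.0], [1, 0], [0.9, 0.1],
                             lambda t: 1.0, t_star=1.0)
# ((1 - 0.9)^2 + (0 - 0.1)^2) / 2 = 0.01
```

In practice `pi_hat` would come from the fitted mixture cure model and `G_D` from the Kaplan-Meier fit on (T̃_i, 1 − δ_i).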

However, when assessing the prediction accuracy of a model on the same data on which the model is built, over-fitting may occur; that is, the model adapts to the particular data at hand more closely than the underlying population warrants. Generally, when over-fitting occurs, the prediction error will be underestimated. Cross-validation techniques are commonly used to overcome this issue. These techniques do not use the entire data set when building a model: some cases are removed before the data are modeled. The removed cases are often called a testing set, and the remaining cases are called a training set. Once the model has been built using the training set, the testing set can be used to test the performance of the model on unseen data.

There are several strategies for conducting cross-validation, such as split-half cross-validation, leave-one-out cross-validation (LOOCV), bootstrapped LOOCV, etc. In this report, I used the K-fold cross-validation method. This method splits the data into K folds of the same size, denoted by D_1, ..., D_K, and repeatedly leaves one fold out as the testing data set while using the remaining K − 1 folds as the training data set. The cross-validation estimate of L is given by

    L_CV = (1/n) ∑_{k=1}^K ∑_{l∈F_k} { (1 − π̂(z_l | D_−k))² I(t_l ≤ c_l)(1/Ĝ_−k(t_l)) + (0 − π̂(z_l | D_−k))² I(t_l > t*_−k)(1/Ĝ_−k(t*_−k)) },    (3.3)

where F_k is the collection of patients in fold k, D_−k denotes the data set with fold k left out, and π̂(z_l | D_−k) is the predicted uncure rate for patient l in fold k, calculated from the coefficients of the mixture cure model built on the training data set and the covariates of patient l in the testing data set. Ĝ_−k and t*_−k are, respectively, the Kaplan-Meier estimate of the survival function for the censoring
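The fold bookkeeping behind (3.3) can be sketched generically (illustrative Python; `fit` and `score` are hypothetical stand-ins for fitting a mixture cure model on the training folds and accumulating the weighted squared errors on the left-out fold):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition subject indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]

def cv_prediction_error(data, k, fit, score):
    """K-fold scheme of (3.3): for each fold, fit on the data with that fold
    left out, accumulate the fold's total score, and divide by n at the end."""
    folds = k_fold_indices(len(data), k)
    total = 0.0
    for fold in folds:
        hold_out = set(fold)
        train = [data[i] for i in range(len(data)) if i not in hold_out]
        test = [data[i] for i in fold]
        total += score(fit(train), test)
    return total / len(data)

# toy check: the 'model' is the training mean, the score is a sum of squared errors
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
err = cv_prediction_error(values, 5,
                          fit=lambda tr: sum(tr) / len(tr),
                          score=lambda m, te: sum((x - m) ** 2 for x in te))
```

The same skeleton yields L_CV when `fit` builds the mixture cure model and `score` computes the inner sum of (3.3) with the fold-specific Ĝ_−k and t*_−k.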

time and the largest observed failure time in the data set with fold k left out. In this report, I set K = 5 in the calculations.

3.3 Other Error Estimates

To assess the performance of the proposed estimate of prediction error L, I compare it with two error estimators:

    BS_1 = (1/n) ∑_{i=1}^n (U_i − π_i)²    (3.4)

and

    BS_2 = (1/n) ∑_{i=1}^n (U_i − π̂(z_i | D))²,    (3.5)

where U_i are the true uncure statuses of the subjects in the generated data, and π_i and π̂(z_i | D) are, respectively, the true uncure rates and the estimated uncure rates from the mixture cure models. BS_1 is based on the true uncure probabilities and represents the mean squared difference between cure status and cure rate without any estimation error, while BS_2 is a prediction error based on π̂(z_i | D) when the cure statuses are available. Neither BS_1 nor BS_2 can be calculated for real data, though; they are proposed for the simulation study in the next chapter to investigate the properties of the proposed estimator (3.2).

The expectation of BS_1 can be calculated as follows:

    E(BS_1) = E[(1/n) ∑_{i=1}^n (U_i − π_i)²] = E[(U − π)²] = E[E[(U − π)² | π]] = E[π(1 − π)] = E(π) − E(π²),    (3.6)

where π is a random variable depending on the covariate vector z. Similarly, the variance of BS_1 can be calculated as follows:

    Var(BS_1) = Var[(1/n) ∑_{i=1}^n (U_i − π_i)²] = (1/n) Var[(U − π)²] = (1/n) {E[(U − π)⁴] − (E[(U − π)² | π])²}.    (3.7)

Given π, U follows a binary distribution with rate π, and E[(U − π)⁴ | π] is the fourth central moment of U:

    E[(U − π)⁴ | π] = π(1 − π) − 3π²(1 − π)² = π − 4π² + 6π³ − 3π⁴.

Therefore,

    Var(BS_1) = (1/n) {E(π) − 5E(π²) + 8E(π³) − 4E(π⁴)}.

BS_2 calculated by the cross-validation method is given by

    BS_2^CV = (1/n) ∑_{k=1}^K ∑_{l∈F_k} (U_l − π̂(z_l | D_−k))²,    (3.8)

where F_k is the collection of patients in fold k, U_l is the true uncure status of patient l in fold k, and π̂(z_l | D_−k) is calculated from the covariates of the subjects in fold k (the testing data set) and the coefficients of the cure model built on the data set with fold k left out (the training data set).

Though it cannot be observed, the value of BS_1 represents the true value of the prediction error, measured by the expected Brier score. Thus, the difference between L_R (or L_CV) and the expectation of BS_1 measures how accurately the proposed estimate assesses the prediction error of the model. Since we assume that the cure statuses of all subjects are known when calculating BS_2 (or BS_2^CV), there is no loss of information due to censoring. Thus, the comparison between L_R and BS_2, or that between L_CV
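The identity E(BS_1) = E(π) − E(π²) in (3.6) is easy to verify numerically. The Monte Carlo sketch below (illustrative Python with made-up uncure rates π_i) draws U_i ~ Bernoulli(π_i) repeatedly and compares the average of BS_1 with the theoretical value E[π(1 − π)]:

```python
import random

def simulate_bs1(pis, seed):
    """One realization of BS_1 = (1/n) sum_i (U_i - pi_i)^2, U_i ~ Bernoulli(pi_i)."""
    rng = random.Random(seed)
    return sum((int(rng.random() < p) - p) ** 2 for p in pis) / len(pis)

pis = [0.2, 0.5, 0.7, 0.9] * 250                       # n = 1000 hypothetical rates
theory = sum(p * (1.0 - p) for p in pis) / len(pis)    # E(pi) - E(pi^2) = 0.1775
mc = sum(simulate_bs1(pis, seed=s) for s in range(200)) / 200
# mc should agree with theory up to Monte Carlo error
```

The agreement illustrates why BS_1 serves as the benchmark in the simulations: its expectation is known exactly from the design.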

and BS_2^CV, examines the appropriateness of the weighting scheme for dealing with right censoring.

The prediction error measures the accuracy of the prediction model built on the observed data and should ideally be assessed on a large independent test set. This ideal assessment is reflected by L_New, although in applications it is not possible because of the lack of an independent test set. L_New can be calculated as

    L_New = (1/m) ∑_{j=1}^m { (1 − π̂(z_j^New | D))² I(t_j^New ≤ c_j^New)(1/Ĝ_D(t_j^New)) + (0 − π̂(z_j^New | D))² I(t_j^New > t*)(1/Ĝ_D(t*)) },    (3.9)

where π̂(z_j^New | D) are the predicted uncure probabilities for patients in the new data set, calculated from the coefficients of the mixture cure model built on data D and the covariate vector z_j^New in the new test set, and t_j^New and c_j^New represent the failure time and censoring time in the test set, respectively.

Similarly, the value of BS_2 for the independent test set can be calculated as

    BS_2^New = (1/m) ∑_{j=1}^m (U_j − π̂(z_j^New | D))².    (3.10)

L_New and BS_2^New are only computable for simulated data and are used for comparison purposes.

Chapter 4

Simulation Study

4.1 Chapter Overview

This chapter investigates the properties of the proposed measurement of prediction error for mixture cure models via simulation. Specifically, section 4.2 describes the data generation for the simulation study. Sections 4.3 and 4.4 present the results using the resubstitution and cross-validation methods, respectively. Results for the proposed estimate under model misspecification are presented in section 4.5. Section 4.6 presents the results of model comparison using the measurement of prediction error proposed in this report; it compares the prediction errors of a proportional hazards mixture cure model and a log-normal mixture cure model built on the same data. The R code for all the simulations in this chapter can be found in Appendix A.

4.2 Data Generation

Two binary covariates z_1 and z_2 are generated, each following a Bernoulli distribution with rate 0.5. For the cure part of the model, we set γ_0 = 2, γ_1 = −1, and γ_2 = −1.5, which corresponds to the logistic form of the uncure rate π:

logit(π) = 2 − z_1 − 1.5 z_2    (4.1)

i.e.,

π(z_1, z_2) = exp(2 − z_1 − 1.5 z_2) / (1 + exp(2 − z_1 − 1.5 z_2))    (4.2)

The uncure status U_i for each subject i is generated from a Bernoulli distribution with rate π_i given by equation (4.2). For the uncured subjects, the standard exponential distribution is used as the baseline distribution. The coefficients of z_1 and z_2 that affect the survival time are β_1 = log(0.5) and β_2 = log(0.4), respectively. Thus, the survival function for the uncured subjects is:

S_u(t) = S_u0(t)^{exp{log(0.5) z_1 + log(0.4) z_2}}    (4.3)

where S_u0(t) = e^{−t}. Referring to equation (2.5), the survival function for all subjects can be written as:

S(t | z_1, z_2) = π(z_1, z_2) S_u0(t)^{exp{log(0.5) z_1 + log(0.4) z_2}} + 1 − π(z_1, z_2)    (4.4)

The censoring times are generated from a uniform distribution on [0, 60]. Under these assumptions, 200 data sets (each with sample size 100) are generated using R and fitted using the mixcure package written by Yingwei Peng.

Since both z_1 and z_2 are binary variables with rate 0.5, the subjects in each data set
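The generating mechanism above can be sketched in code. The thesis does this in R (Appendix A, with the mixcure package); below is a Python sketch of the data-generation step only — the function name and the seed are ours.

```python
import numpy as np

rng = np.random.default_rng(2014)               # seed is ours, for reproducibility

def generate(n):
    """One simulated data set from the mixture cure model of section 4.2."""
    z1 = rng.binomial(1, 0.5, n)
    z2 = rng.binomial(1, 0.5, n)
    # incidence part: logit(pi) = 2 - z1 - 1.5*z2
    pi = 1.0 / (1.0 + np.exp(-(2.0 - z1 - 1.5 * z2)))
    U = rng.binomial(1, pi)                     # uncure status
    # latency part: Exp(1) baseline, PH effect exp(log(0.5)*z1 + log(0.4)*z2)
    rate = np.exp(np.log(0.5) * z1 + np.log(0.4) * z2)
    T = np.where(U == 1, rng.exponential(1.0 / rate), np.inf)  # cured never fail
    C = rng.uniform(0.0, 60.0, n)               # censoring times
    return z1, z2, U, np.minimum(T, C), (T <= C).astype(int)

z1, z2, U, time, delta = generate(1000)
```

The mean of U estimates E(π) ≈ 0.653 under this design; cured subjects (U = 0) are always censored at their uniform censoring time.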

can be split into four groups:

Group 1: z_1 = 0, z_2 = 0; S(t) = exp(−t)
Group 2: z_1 = 1, z_2 = 0; S(t) = exp(−0.5t)
Group 3: z_1 = 0, z_2 = 1; S(t) = exp(−0.4t)
Group 4: z_1 = 1, z_2 = 1; S(t) = exp(−0.2t)

After the data sets are generated, Kaplan-Meier survival curves for all the data sets are plotted in Figure 4.1, together with the theoretical survival curve for each group. Some key features of the generated data are described in Table 4.1.

Figure 4.1: K-M Plot of the Simulated Data
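Per equation (4.4), each group's overall survival curve levels off at its cure fraction 1 − π(z_1, z_2). A quick numerical check (Python; the function names are ours, and the group hazard rates come from the list above):

```python
import numpy as np

def pi_fn(z1, z2):
    """Uncure probability from equation (4.2)."""
    return np.exp(2 - z1 - 1.5 * z2) / (1 + np.exp(2 - z1 - 1.5 * z2))

def S(t, z1, z2, lam):
    """Overall survival from equation (4.4), with Exp(lam) latency."""
    p = pi_fn(z1, z2)
    return p * np.exp(-lam * t) + (1 - p)       # uncured part + cure fraction

groups = [(0, 0, 1.0), (1, 0, 0.5), (0, 1, 0.4), (1, 1, 0.2)]
plateaus = [S(1e6, z1, z2, lam) for z1, z2, lam in groups]
# plateaus ≈ [0.1192, 0.2689, 0.3775, 0.6225]: the cure fractions in Table 4.1
```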

Table 4.1: Key Features of the Simulated Data

Group     Cure Rate   1st Quartile of T   Median of T   3rd Quartile of T
Group 1   (0.1192)    (0.2877)            (0.7297)      (1.3402)
Group 2   (0.2689)    (0.6895)            (1.2824)      (2.2959)
Group 3   (0.3775)    (0.8252)            (1.4562)      (2.5349)
Group 4   (0.6225)    (0.9205)            (1.6858)      (2.9941)

Note: The values in brackets are from the theoretical distributions in each group.

Graphically, one feature that makes the data suitable for a mixture cure model is that the survival curves level off toward a positive asymptote. Figure 4.1 and Table 4.1 demonstrate that: 1) the data sets are correctly generated, since the survival functions of the generated data are quite close to the theoretical survival functions for each group; and 2) both the generated and the theoretical survival curves have a flat tail, which means the data are suitable for a mixture cure model. The key features presented in Table 4.1 also demonstrate that the cure rate of the generated data is very close to the theoretical value E[1 − π(z_1, z_2)] = 0.3470.

From equation (3.6), under the assumptions in this report, the expectation of BS_1 can be calculated as follows:

E(BS_1) = E(π) − E(π^2)

Since π = exp(2 − z_1 − 1.5 z_2) / (1 + exp(2 − z_1 − 1.5 z_2)) and both z_1 and z_2 can be 0 or 1 with probability 0.5, π takes the values 0.8808, 0.7311, 0.6225, or 0.3775, each with probability 0.25. Hence

E(π) = (0.8808 + 0.7311 + 0.6225 + 0.3775)/4 = 0.6530;
E(π^2) = (0.8808^2 + 0.7311^2 + 0.6225^2 + 0.3775^2)/4 = 0.4601;
E(BS_1) = 0.6530 − 0.4601 = 0.1929.
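The arithmetic behind E(BS_1) can be verified directly (Python sketch, ours):

```python
import numpy as np

# the four equally likely covariate patterns and their uncure probabilities
patterns = [(0, 0), (1, 0), (0, 1), (1, 1)]
pi = np.array([np.exp(2 - a - 1.5 * b) / (1 + np.exp(2 - a - 1.5 * b))
               for a, b in patterns])
E_pi, E_pi2 = pi.mean(), np.mean(pi ** 2)
E_BS1 = E_pi - E_pi2
# pi ≈ [0.8808, 0.7311, 0.6225, 0.3775]; E_BS1 ≈ 0.1929
```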

From equation (3.7), the variance of BS_1 can be calculated as follows:

Var(BS_1) = (1/n) {E(π) − 5E(π^2) + 8E(π^3) − 4E(π^4)}

With E(π) = 0.6530, E(π^2) = 0.4601, E(π^3) = 0.3423, and E(π^4) = 0.2645,

Var(BS_1) = (1/n) {0.6530 − 5(0.4601) + 8(0.3423) − 4(0.2645)} ≈ 0.0328 (1/n),

so SD(BS_1) ≈ 0.1810/√n.

4.3 Resubstitution Method

I first used the resubstitution method to calculate the proposed estimate of prediction error. The resubstitution estimates of the prediction error, L^R, were calculated as in equation (3.2). To check whether over-fitting is a problem when calculating L by the resubstitution method, an independent data set with the same sample size is generated as a test set. The prediction error measured on the test set was calculated as L^New in equation (3.9). BS_2^R and BS_2^New were calculated as in equations (3.5) and (3.10), respectively. The comparison plots and the main statistical features of the different statistics are shown in Figure 4.2 and Table 4.2. Different sample sizes are used in the plots and calculations in order to identify the value to which L^R and L^New converge. The dashed lines in the plots indicate the theoretical value of E(BS_1). The red line in each plot has intercept 0 and slope 1; it indicates perfect agreement between the statistics on the horizontal and vertical axes — if the two statistics were exactly the same, the simulated circles would lie on the red line.

Results in Figure 4.2 show that the black circles in the plots of L^R vs. BS_2^R and of L^New vs. BS_2^New agree well with the ideal red line. Thus, the weighting scheme in the proposed estimate is appropriate for right-censored data. Besides, when sample
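The moment arithmetic for Var(BS_1) can be checked numerically (Python sketch, ours):

```python
import numpy as np

# moments of pi over the four equally likely covariate patterns
pi = np.array([np.exp(2 - a - 1.5 * b) / (1 + np.exp(2 - a - 1.5 * b))
               for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]])
m1, m2, m3, m4 = (np.mean(pi ** k) for k in (1, 2, 3, 4))
n_var = m1 - 5 * m2 + 8 * m3 - 4 * m4        # n * Var(BS_1)
# n_var ≈ 0.0328, so SD(BS_1) = sqrt(n_var / n) ≈ 0.181 / sqrt(n)
```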

size increases, the black circles in both the plots of L^R vs. BS_2^R and L^New vs. BS_2^New become more concentrated around the intersection of the two dashed lines, which means that L^R, L^New, BS_2^R, and BS_2^New all converge to E(BS_1) as the sample size increases. In addition, from the mean values of L^R and L^New in Table 4.2 we can conclude that L^R is always smaller than L^New; that is, the value of L calculated by the resubstitution method slightly underestimates the prediction error. The comparison between L^R and BS_2^New also demonstrates this underestimation, especially when the sample size is small.

Table 4.2: Prediction Error Measurement Using the Resubstitution Method

The table reports the mean and standard deviation of L^R, L^New, BS_2^R, and BS_2^New for sample sizes n = 60, 80, 100, 200, 500, and 1000.
Note: E(BS_1) = 0.1929; SD(BS_1) ≈ 0.1810/√n.

Figure 4.2: Plots with Different Sample Sizes Using the Resubstitution Method

4.4 Cross-Validation Method

The values of L^CV were calculated as in equation (3.3), and the values of BS_2^CV were calculated as in equation (3.8) with K = 5. The results are shown in Figure 4.3 and Table 4.3.

Table 4.3: Prediction Error Measurement Using the Cross-Validation Method

The table reports the mean and standard deviation of L^CV and BS_2^CV, together with the comparison statistics involving BS_2^New and L^New, for sample sizes n = 60, 80, 100, 200, 500, and 1000.
Note: E(BS_1) = 0.1929; SD(BS_1) ≈ 0.1810/√n.

Figure 4.3 presents results similar to those in Figure 4.2. The black circles in the plots of L^CV vs. BS_2^CV generally lie around the ideal red line, which means the weighting scheme still works well with the cross-validation method. As the sample size increases, the black circles concentrate more tightly around the ideal line and converge to E(BS_1). Table 4.3 compares the values of L calculated by the resubstitution and cross-validation methods on the same data sets. The results demonstrate that the value of L calculated by the cross-validation method is slightly larger than L^New; that is, the cross-validation method may tend to overestimate the prediction error. This is because each training set contains fewer patients than the whole observed data set. The difference between L^CV

and BS_2^New is small, especially when the sample size is large. This implies that the cross-validation method is quite accurate. Thus, in the following sections, I present values calculated by both the resubstitution and cross-validation methods.

Figure 4.3: Plots with Different Sample Sizes Using the Cross-Validation Method

4.5 Model Misspecification

The sensitivity of the proposed prediction error measurement to model misspecification is also of interest. In this section, I examine the performance of L^R and L^CV both with redundant variables and with a variable left out. The values calculated under misspecification are compared with the values from correctly specified models to assess the sensitivity of the proposed method. Similarly, I also compare the values calculated under misspecification with BS_1 and BS_2 (or BS_2^CV) to check the accuracy of the estimate and the performance of the IPCW scheme.

4.5.1 Model with Redundant Variables

Two redundant covariates z_3 and z_4 are generated, with z_3 ~ Bernoulli(0.5) and z_4 ~ N(0, 1). Neither z_3 nor z_4 is correlated with the uncure rate or the survival time in the simulation, but both are included when building the mixture cure models. The values of z_1 and z_2 in the data sets are exactly the same as those used in the previous two sections. Both the resubstitution and cross-validation methods are used to calculate the estimate of prediction error; the former is denoted L_R^R and the latter L_R^CV. BS_2R and BS_2R^CV represent the values of BS_2 for models with redundant variables calculated by the resubstitution and cross-validation methods, respectively. L_R^New and BS_2R^New denote the values of L and BS_2 for models with redundant variables calculated on a new test set. The results are presented in Figure 4.4 and Table 4.4.

Compared with the results in sections 4.3 and 4.4, we find that the weighting scheme still works well in models with redundant variables. All of L_R^R, L_R^CV, BS_2R, and BS_2R^CV converge to E(BS_1). Both the difference between L_R^R and L^R, and the

difference between L_R^CV and L^CV, are small, especially when the sample size is large. This means that models with redundant variables are not very different from the correctly specified models in terms of prediction error. In addition, the comparison between L_R^New and L_R^R demonstrates that, in models with redundant variables, the value of L calculated by the resubstitution method still underestimates the prediction error, while the value calculated by the cross-validation method remains accurate. The difference between L_R^R and L_R^New in Table 4.4 is larger than the difference between L^R and L^New in Table 4.2. This suggests that the resubstitution method underestimates the prediction error more severely when redundant variables are included in the model.

Table 4.4: Prediction Error Measurement for Models with Redundant Variables

The table reports the mean and standard deviation of L_R^CV, BS_2R^CV, L_R^R, BS_2R, L_R^New, and BS_2R^New, together with the comparisons L_R^CV − L^CV and L_R^R − L^R, for sample sizes n = 60, 80, 100, 200, 500, and 1000.
Note: E(BS_1) = 0.1929; SD(BS_1) ≈ 0.1810/√n.

Figure 4.4: Plots of Prediction Error for Models with Redundant Variables

4.5.2 Model with a Variable Left Out

Consider the same data sets as in sections 4.3 and 4.4. When building the prediction models, I omit one covariate (z_2) from the model. The values of the estimate calculated by the resubstitution and cross-validation methods are denoted L_M^R and L_M^CV, respectively. BS_2M and BS_2M^CV represent the values of BS_2 calculated by the resubstitution and cross-validation methods, respectively. L_M^New and BS_2M^New denote the values of L and BS_2 for models with the variable left out, calculated on a new test set. The results are presented in Figure 4.5 and Table 4.5.

Table 4.5: Prediction Error Measurement for Models with a Missing Variable

The table reports the mean and standard deviation of L_M^CV, BS_2M^CV, L_M^R, BS_2M, L_M^New, and BS_2M^New, together with the comparisons L_M^CV − L^CV and L_M^R − L^R, for sample sizes n = 60, 80, 100, 200, 500, and 1000.
Note: E(BS_1) = 0.1929; SD(BS_1) ≈ 0.1810/√n.

The comparison between L_M^R and L_M^New shows that when models are built with a covariate left out, the value of L calculated by the resubstitution method still underestimates the prediction error. The comparison between L_M^CV and BS_2M^New demonstrates that the value of L calculated by the cross-validation method overcomes the underestimation issue. Figure 4.5 demonstrates that when an important variable is missing, the

weighting scheme still works well, as the black circles in all the plots lie around the red line. This is similar to the finding in Graf et al. (1999) that the weighting scheme is robust against model misspecification. However, as the sample size increases, all the statistics converge to a value greater than E(BS_1). The bias is caused by the omission of important information (z_2) when building the cure models. The results also indicate that mixture cure models with a missing covariate tend to have larger prediction errors.

The results in sections 4.5.1 and 4.5.2 indicate that when models are misspecified, the values of L calculated by the resubstitution method underestimate the prediction error, while the values calculated by the cross-validation method overcome the underestimation problem. The IPCW weighting scheme works well even when models are misspecified. When models are built with redundant variables, the values of L, whether calculated by the resubstitution or the cross-validation method, are similar to those of the correct model, which means the models are not very different in terms of prediction error. However, when models are built with an important variable left out, both L^R and L^CV detect a larger prediction error in the misspecified models. Thus, the proposed estimate can be used to assess the importance of a variable in a model and to compare prediction models: if the prediction error of the model including a specific variable is smaller than that of the model without it, the variable is important and should be included in the prediction model.

Figure 4.5: Plots of Prediction Error for Models with a Missing Variable

4.6 Model Comparison

In this section, we compare the prediction errors of two models: one a semiparametric PH mixture cure model, the other a log-normal mixture cure model. The data are generated from the exponential mixture cure model described in section 4.2. Model 1 is built by fitting a semiparametric cure model in which the survival times of the uncured subjects follow Cox's proportional hazards model. Model 2 is built by fitting a parametric model that assumes the survival times of the uncured subjects follow a log-normal distribution. From the assumptions used in data generation, model 1 is a correct model and model 2 is a wrong model for these data. The values of L^R and L^CV calculated under model 1 are denoted L_m1^R and L_m1^CV, while those calculated under model 2 are denoted L_m2^R and L_m2^CV. Results under different sample sizes are presented in Table 4.6.

Table 4.6: Results for Model Comparison

The table reports L_m1^R, L_m2^R, L_m1^CV, and L_m2^CV, together with the differences L_m2^R − L_m1^R and L_m2^CV − L_m1^CV, for sample sizes n = 60, 80, 100, 200, 500, and 1000.

The results do not show any difference between the two models in terms of prediction error at any sample size. The reason is that both models produce similar cure rate estimates because of the long follow-up in the data. When the follow-up time is long, it is clear which subjects in the data set are cured and which are uncured. The models are

built on the uncured subjects in the data set. Thus, whatever distribution we assume for the survival times of the uncured subjects, the coefficients of the two models are similar, which in turn gives similar estimates of the cure rates.

To further compare the prediction error estimates of the two models when the follow-up in the data is shorter, I reduced the range of the censoring distribution of the simulated data to U[0, 13]. The censoring rate for the newly generated data is 47%. The K-M plot of the newly generated data is presented in Figure 4.6, and the results of the model comparison are presented in Table 4.7.

Figure 4.6: K-M Plot of the Simulated Data with a Narrower Censoring Distribution


Tied survival times; estimation of survival probabilities Tied survival times; estimation of survival probabilities Patrick Breheny November 5 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Tied survival times Introduction Breslow approximation

More information

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016 Statistics 255 - Survival Analysis Presented March 3, 2016 Motivating Dan Gillen Department of Statistics University of California, Irvine 11.1 First question: Are the data truly discrete? : Number of

More information

Multistate models and recurrent event models

Multistate models and recurrent event models Multistate models Multistate models and recurrent event models Patrick Breheny December 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Multistate models In this final lecture,

More information

Residuals and model diagnostics

Residuals and model diagnostics Residuals and model diagnostics Patrick Breheny November 10 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/42 Introduction Residuals Many assumptions go into regression models, and the Cox proportional

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

More information

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes

Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes Semiparametric Models for Joint Analysis of Longitudinal Data and Counting Processes by Se Hee Kim A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial

More information

Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

More information

Quantile Regression for Residual Life and Empirical Likelihood

Quantile Regression for Residual Life and Empirical Likelihood Quantile Regression for Residual Life and Empirical Likelihood Mai Zhou email: mai@ms.uky.edu Department of Statistics, University of Kentucky, Lexington, KY 40506-0027, USA Jong-Hyeon Jeong email: jeong@nsabp.pitt.edu

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Multistate models in survival and event history analysis

Multistate models in survival and event history analysis Multistate models in survival and event history analysis Dorota M. Dabrowska UCLA November 8, 2011 Research supported by the grant R01 AI067943 from NIAID. The content is solely the responsibility of the

More information

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

Multi-state models: prediction

Multi-state models: prediction Department of Medical Statistics and Bioinformatics Leiden University Medical Center Course on advanced survival analysis, Copenhagen Outline Prediction Theory Aalen-Johansen Computational aspects Applications

More information

Statistical aspects of prediction models with high-dimensional data

Statistical aspects of prediction models with high-dimensional data Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston A new strategy for meta-analysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials

More information

On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data

On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data Yehua Li Department of Statistics University of Georgia Yongtao

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Instrumental variables estimation in the Cox Proportional Hazard regression model

Instrumental variables estimation in the Cox Proportional Hazard regression model Instrumental variables estimation in the Cox Proportional Hazard regression model James O Malley, Ph.D. Department of Biomedical Data Science The Dartmouth Institute for Health Policy and Clinical Practice

More information

For right censored data with Y i = T i C i and censoring indicator, δ i = I(T i < C i ), arising from such a parametric model we have the likelihood,

For right censored data with Y i = T i C i and censoring indicator, δ i = I(T i < C i ), arising from such a parametric model we have the likelihood, A NOTE ON LAPLACE REGRESSION WITH CENSORED DATA ROGER KOENKER Abstract. The Laplace likelihood method for estimating linear conditional quantile functions with right censored data proposed by Bottai and

More information

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals John W. Mac McDonald & Alessandro Rosina Quantitative Methods in the Social Sciences Seminar -

More information

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n

Chapter 4 Fall Notations: t 1 < t 2 < < t D, D unique death times. d j = # deaths at t j = n. Y j = # at risk /alive at t j = n Bios 323: Applied Survival Analysis Qingxia (Cindy) Chen Chapter 4 Fall 2012 4.2 Estimators of the survival and cumulative hazard functions for RC data Suppose X is a continuous random failure time with

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

Missing Covariate Data in Matched Case-Control Studies

Missing Covariate Data in Matched Case-Control Studies Missing Covariate Data in Matched Case-Control Studies Department of Statistics North Carolina State University Paul Rathouz Dept. of Health Studies U. of Chicago prathouz@health.bsd.uchicago.edu with

More information

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Guosheng Yin Department of Statistics and Actuarial Science The University of Hong Kong Joint work with J. Xu PSI and RSS Journal

More information

Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring

Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring Noname manuscript No. (will be inserted by the editor) Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring Thomas A. Gerds 1, Michael W Kattan

More information

Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS BIRS 2016 1 NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS Malka Gorfine Tel Aviv University, Israel Joint work with Danielle Braun and Giovanni

More information

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 7 Time-dependent Covariates in Cox Regression Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

Multistate models and recurrent event models

Multistate models and recurrent event models and recurrent event models Patrick Breheny December 6 Patrick Breheny University of Iowa Survival Data Analysis (BIOS:7210) 1 / 22 Introduction In this final lecture, we will briefly look at two other

More information

POWER AND SAMPLE SIZE DETERMINATIONS IN DYNAMIC RISK PREDICTION. by Zhaowen Sun M.S., University of Pittsburgh, 2012

POWER AND SAMPLE SIZE DETERMINATIONS IN DYNAMIC RISK PREDICTION. by Zhaowen Sun M.S., University of Pittsburgh, 2012 POWER AND SAMPLE SIZE DETERMINATIONS IN DYNAMIC RISK PREDICTION by Zhaowen Sun M.S., University of Pittsburgh, 2012 B.S.N., Wuhan University, China, 2010 Submitted to the Graduate Faculty of the Graduate

More information

3003 Cure. F. P. Treasure

3003 Cure. F. P. Treasure 3003 Cure F. P. reasure November 8, 2000 Peter reasure / November 8, 2000/ Cure / 3003 1 Cure A Simple Cure Model he Concept of Cure A cure model is a survival model where a fraction of the population

More information

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,

More information

Classification: Linear Discriminant Analysis

Classification: Linear Discriminant Analysis Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based

More information

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Analysis of competing risks data and simulation of data following predened subdistribution hazards Analysis of competing risks data and simulation of data following predened subdistribution hazards Bernhard Haller Institut für Medizinische Statistik und Epidemiologie Technische Universität München 27.05.2013

More information

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Introduction This course is an applied course,

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Machine Learning. Module 3-4: Regression and Survival Analysis Day 2, Asst. Prof. Dr. Santitham Prom-on

Machine Learning. Module 3-4: Regression and Survival Analysis Day 2, Asst. Prof. Dr. Santitham Prom-on Machine Learning Module 3-4: Regression and Survival Analysis Day 2, 9.00 16.00 Asst. Prof. Dr. Santitham Prom-on Department of Computer Engineering, Faculty of Engineering King Mongkut s University of

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 248 Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Kelly

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview

Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Proportional hazards model for matched failure time data

Proportional hazards model for matched failure time data Mathematical Statistics Stockholm University Proportional hazards model for matched failure time data Johan Zetterqvist Examensarbete 2013:1 Postal address: Mathematical Statistics Dept. of Mathematics

More information

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

More information

Extending causal inferences from a randomized trial to a target population

Extending causal inferences from a randomized trial to a target population Extending causal inferences from a randomized trial to a target population Issa Dahabreh Center for Evidence Synthesis in Health, Brown University issa dahabreh@brown.edu January 16, 2019 Issa Dahabreh

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine Statistics 255 - Survival Analysis Presented February 23, 2016 Dan Gillen Department of Statistics University of California, Irvine 9.1 Survival analysis involves subjects moving through time Hazard may

More information

STAT 6350 Analysis of Lifetime Data. Probability Plotting

STAT 6350 Analysis of Lifetime Data. Probability Plotting STAT 6350 Analysis of Lifetime Data Probability Plotting Purpose of Probability Plots Probability plots are an important tool for analyzing data and have been particular popular in the analysis of life

More information