# Extensions of Cox Model for Non-Proportional Hazards Purpose


PhUSE 2013, Paper SP07

Jadwiga Borucka, PAREXEL, Warsaw, Poland

## ABSTRACT

The Cox proportional hazards model is one of the most common methods used in the analysis of time-to-event data. The idea of the model is to treat the hazard as a dependent variable explained by a time-related component (the so-called baseline hazard) and a covariates-related component. The model is based on several restrictive assumptions which need to be carefully verified before the parameter estimates are interpreted. One of them is the proportional hazards assumption, which results directly from the model formula and means that the hazard ratio needs to be constant over time. If this assumption is violated, however, it does not necessarily prevent the analyst from using the Cox model. The current paper presents two ways of modifying the model in the case of non-proportional hazards: introducing interactions of selected covariates with a function of time, and the stratified model. Both are easily applicable with the PHREG procedure in SAS. The paper consists of an introduction, two sections dedicated to methods of handling non-proportional hazards, and conclusions. References, acknowledgements and contact information are included at the end of the article.

## INTRODUCTION

The Cox proportional hazards model is one of the most common methods used in the analysis of time-to-event data. The idea of the model is to treat the hazard as a dependent variable explained by a time-related component (the so-called baseline hazard) and a covariates-related component. The model is defined as follows:

λ(t, x) = λ₀(t) exp(βx)

where:

- λ(t, x) is the hazard function, which depends on timepoint t and the vector of covariates x,
- λ₀(t) is the baseline hazard function, which depends on time only,
- exp(βx) is the covariates-related component.

The Cox model is based on several restrictive assumptions.
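The model formula can be illustrated numerically. Below is a small Python sketch (the paper itself works in SAS; the baseline hazard value and the coefficients here are hypothetical, chosen only for illustration):

```python
import math

def cox_hazard(baseline_hazard_t, beta, x):
    """Cox model hazard: lambda(t, x) = lambda0(t) * exp(beta'x)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard_t * math.exp(linear_predictor)

# Hypothetical values: baseline hazard 0.01 at some timepoint t,
# beta = (0.2, -0.7) for covariates (age_centered, site_indicator).
h = cox_hazard(0.01, [0.2, -0.7], [5.0, 1.0])  # 0.01 * exp(0.3)
```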
One of them is the proportional hazards assumption, to which the name of the model refers and which results directly from the model formula:

HR = λ(t, x₁) / λ(t, x₂) = [λ₀(t) exp(βx₁)] / [λ₀(t) exp(βx₂)] = exp(βx₁) / exp(βx₂) = exp[β(x₁ − x₂)]

where:

- HR is the hazard ratio,
- x₁ is the vector of covariates of subject I,
- x₂ is the vector of covariates of subject II.

The assumption states that the hazard ratio for two subjects characterized by different sets of covariates depends only on the values of those covariates and not on time. In other words, the hazard ratio is constant over time, which means that the effect of a given covariate on the hazard level is the same at all timepoints.

There are various opinions on the importance of this assumption with regard to parameter interpretation. Some authors state that a violation is not extremely problematic, as in such cases the parameter for a covariate for which the assumption is not satisfied can be understood as the average effect over the timepoints observed in the dataset (Allison, 1995). Others, however, underline the importance of the assumption (Hosmer, Lemeshow 1999) and suggest modifying the model if the hazard ratio turns out not to be constant over time for some covariates. While in some situations measuring the average effect of a covariate for which the proportional hazards assumption is not satisfied might be enough, there are cases where this approach is not satisfactory. Hosmer and Lemeshow (1999) discuss this issue and give the example of randomized clinical trials, in which site is relatively often used as a covariate in a Cox model. Such an approach assumes that baseline hazards are proportional across study sites, which is not necessarily justified. In such cases it would be worth taking this fact into account and estimating a model that adjusts for the potentially time-varying effect of study site, rather than stating that the parameter estimate for site expresses its average effect on the hazard level.

There are several methods that enable verification of the proportional hazards assumption. Firstly, one can consider a graphical method based on the plot of the log-negative-log of the Kaplan-Meier estimator of the survival function, presented separately for each group defined by the values of the covariate for which the assumption is being verified (Hosmer, Lemeshow 1999). If the assumption is satisfied, the plot should present several curves whose distance from each other does not change over time. One possible disadvantage of this method is that it is quite difficult to visually assess how far these lines are from parallel, especially for small samples. The second graphical method employs Schoenfeld residuals, which are expected to have mean equal to 0; this can be assessed on a plot (the residuals are expected to show no trend over time). The third method adds to the model an interaction of the covariate of interest with time; if this variable turns out to be statistically significant, it indicates that the proportional hazards assumption might be violated. The last method is not only a way to verify the assumption but also a potential solution to the problem of its violation, which will be discussed further in the next section.

As soon as it is established that the proportional hazards assumption is not satisfied for a covariate, it should be decided which approach to take.
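The hazard-ratio identity at the start of this section (the baseline hazard cancels, leaving exp[β(x₁ − x₂)]) can be checked numerically. A minimal Python sketch with a hypothetical coefficient:

```python
import math

def hazard_ratio(beta, x1, x2):
    """HR between two subjects: exp(beta'(x1 - x2)).
    The baseline hazard lambda0(t) cancels, so HR does not depend on t."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x1, x2)))

# Hypothetical binary covariate with beta = ln(2): flipping it from 0 to 1
# doubles the hazard at every timepoint, which is exactly what the
# proportional hazards assumption asserts.
hr = hazard_ratio([math.log(2.0)], [1.0], [0.0])
```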
As mentioned before, one can think of the parameter estimate as the average strength of the covariate's impact on the hazard rate. If this is sufficient, nothing more needs to be done in terms of Cox model construction. If, however, the influence varies over time and this changing impact is also of interest, then one of the available methods of modifying the Cox model for non-proportional hazards can be applied. Two methods are considered most often: adding to the model a covariate defined as the interaction of a particular covariate with a function of the time variable, and the stratified model. The next two sections present these methods, including an example application on a dataset containing data for 60 subjects from an open-label clinical trial: age at screening (in years), site (coded as SITE = 1, corresponding to site B, or SITE = 2, corresponding to site A), time from the beginning of the study to death/censoring (in days), and censoring information (coded as CENSOR = 1 for subjects who experience the event, here death, and CENSOR = 0 otherwise). The Cox model is used to analyze time to death among subjects enrolled in the study, with regard to patient age and study site. For presentation purposes, a simple Cox model is considered, including age and site as explanatory variables. All calculations are performed with statistical and graphical procedures in SAS Base 9.3.

## INTERACTION WITH TIME

The first method uses interactions with time for covariates for which the assumption is not satisfied. This method is in fact both a way to identify such covariates in the model and a solution to the problem at the same time (Allison, 1995). After adding an interaction of some function of the time variable with a covariate included in the initial model, its statistical significance is verified. If the newly added variable turns out to be significant, it indicates that the proportional hazards assumption is not satisfied for the given covariate, which means that its effect changes over time. Including the interaction in the model enables an interpretation of the parameters that takes into consideration the fact that the covariate's influence on the hazard level is not constant. As far as the function of time is concerned, some authors suggest using the logarithm rather than any other function (Quantin et al., 1996); others, however, point out that there is no theoretical reason to choose the logarithm, as this approach is rather a technical solution to avoid numerical problems (Allison, 1995). The PHREG procedure is said to be robust to such problems (Allison, 1995), thus a simple linear function is chosen here.

The sample dataset is used to estimate the Cox proportional hazards model, where the event is defined as death of the patient, time is measured from the beginning of the open-label study, and age and site are included as covariates. The exact method according to Kalbfleisch and Prentice (1980) is used in model estimation in order to account for the presence of tied events in the dataset. The following piece of code estimates the model, using TIME as the time variable, CENSOR as the censoring indicator (0 means no event), and AGE and SITE as covariates.

```sas
/*Initial Cox model estimation - site and age as covariates*/
proc phreg data = a.site;
   format site site.;
   model time*censor(0) = age site / ties = exact;
   output out = schoen ressch = age_s site_s;
run;
```

Model 1: parameter estimates for AGE and SITE.

The convergence criterion is satisfied and both covariates are significant at the 0.1 level. The proportional hazards assumption is verified for both variables, using interactions with time and Schoenfeld residual plots. For the categorical variable SITE, a log-negative-log plot is additionally presented.

Proportional hazards assumption verification for SITE:

(1) Plot of the log-negative-log of the survival function. These plots are obtained with the LIFETEST procedure and a STRATA statement, as follows:

```sas
/*Lifetables estimation according to the Kaplan-Meier formula;
  generation of the 'log-negative-log' survival plot separately
  for each site -> STRATA statement*/
proc lifetest data = a.site plots = (s, lls);
   format site site.;
   strata site;
   time time*censor(0);
run;
```

(2) Plot of Schoenfeld residuals. To obtain Schoenfeld residual plots, it is necessary to estimate the Cox model with PROC PHREG and save the Schoenfeld residuals in a separate dataset via the OUTPUT OUT= statement. The GPLOT procedure then generates the plot of the residuals as a function of time.

```sas
/*Cox model estimation - age and site as covariates;
  saving Schoenfeld residuals in an output dataset*/
proc phreg data = a.site;
   format site site.;
   model time*censor(0) = age site / ties = exact;
   output out = schoen ressch = age_s site_s;
run;
```
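The log-negative-log diagnostic that PROC LIFETEST produces can also be computed by hand: estimate S(t) by Kaplan-Meier within each group and plot log(−log S(t)) against time. A small self-contained Python sketch of that computation (an illustration on toy data, not the SAS implementation):

```python
import math

def km_curve(times, events):
    """Kaplan-Meier estimate of S(t); events: 1 = event, 0 = censored.
    Returns (event_time, survival) pairs at distinct event times."""
    data = sorted(zip(times, events))
    s, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        at_risk = sum(1 for tt, _ in data if tt >= t)
        deaths = sum(e for tt, e in data if tt == t)
        if deaths > 0:
            s *= 1.0 - deaths / at_risk
            curve.append((t, s))
        i += sum(1 for tt, _ in data if tt == t)  # skip duplicates of t
    return curve

def log_neg_log(s):
    """Transform plotted on the vertical axis of the 'lls' plot."""
    return math.log(-math.log(s))

curve = km_curve([1, 2, 3, 4], [1, 1, 0, 1])  # toy data, one censored at t=3
```

Under proportional hazards, the log-neg-log curves of the groups differ by a constant vertical shift, which is why parallel curves indicate that the assumption holds.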

```sas
/*Generation of the plot of Schoenfeld residuals for site
  as a function of time*/
axis1 label = ('Time');
axis2 label = (a = 90 'Schoenfeld Residual for Site');
symbol1 v = dot c = red width = 1 i = sm80s;
proc gplot data = schoen;
   plot site_s*time / haxis = axis1 vaxis = axis2;
run;
quit;
```

(3) Interaction of SITE with the TIME variable. Adding an interaction with time to the Cox model is quite simple in the PHREG procedure: the additional variable is named in the MODEL statement and then defined using the programming statements available within the procedure.

```sas
/*Cox model estimation - age and site as covariates,
  with the site by time interaction added*/
proc phreg data = a.site;
   format site site.;
   model time*censor(0) = age site site_t / ties = exact;
   site_t = site*time;
   test: test site_t;
run;
```

Model 2: parameter estimates for AGE, SITE and site_t.

Proportional hazards assumption verification for AGE:

(1) Plot of Schoenfeld residuals.

(2) Interaction of AGE with the TIME variable.

Model 3: parameter estimates for AGE, SITE and age_t.

On the basis of the above results it can be stated that the proportional hazards assumption seems to be satisfied for AGE: the interaction of AGE and TIME is not statistically significant at any acceptable level, and the Schoenfeld residuals show no trend on the plot (the smoothed line has an approximately zero slope). For SITE, however, the conclusion is quite the opposite: the interaction with time is statistically significant at the 0.1 level. The plot of Schoenfeld residuals does not give a straightforward answer, but it might suggest that hazards are not proportional across study sites, which can also be concluded from the log-negative-log plot, as the distance between the lines is not constant over all values of TIME. This leads to the conclusion that the proportional hazards assumption is violated for the SITE variable and that the effect of this variable might be changing over time. Thus, it is worth considering the model that includes the interaction of SITE and TIME.

Comparing Model 1 (the initial one) and Model 2 (with the TIME by SITE interaction added), the information criteria have lower values in Model 2. The likelihood ratio test in this case is in fact a test of the significance of the TIME by SITE interaction, as it is the only covariate added, and it leads to rejection of the null hypothesis that the added variable is not significant. It seems then that adding the new variable resolves the problem of the violated assumption and improves the fit statistics. Thus, of the two models presented above, Model 2 should be chosen rather than Model 1.

Let us focus on the interpretation of the parameters for the SITE variable in both models. In Model 1 the interpretation is quite straightforward: the parameter estimate yields a hazard ratio for the binary variable of 0.48, which means that subjects from site A are approximately 52% less likely to die than subjects from site B.
Due to the fact that the proportional hazards assumption is violated for SITE, this interpretation might only refer to the average effect of SITE, as suggested by Allison. Now let us take into account the fact that the effect of this covariate changes over time. In order to calculate the hazard ratio between site A and site B on the basis of the Model 2 estimation, the following equation is derived:

HR = λ(t, age = a, site = 2) / λ(t, age = a, site = 1) = [λ₀(t) exp(β₁a + 2β₂ + 2β₃t)] / [λ₀(t) exp(β₁a + β₂ + β₃t)] = exp(β₂ + β₃t)

where:

- β₁ is the parameter estimate for age,
- β₂ is the parameter estimate for site,
- β₃ is the parameter estimate for the interaction of site and time.

As can be seen, the hazard ratio depends on time, thus the interpretation of the site effect on the hazard level should incorporate the value of the time variable as well. The plot below presents how the hazard ratio between two subjects of the same age but from different sites changes over time.
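Given estimates of β₂ and β₃, the time-varying hazard ratio exp(β₂ + β₃t) is easy to evaluate, and the day on which it crosses 1 is t = −β₂/β₃. A Python sketch with hypothetical coefficients (the paper's actual estimates are not used here; these values are chosen only so that the crossing lands near the 148th day discussed in the text):

```python
import math

def hr_site(t, b_site, b_site_time):
    """Time-varying hazard ratio between site A and site B:
    HR(t) = exp(beta2 + beta3 * t)."""
    return math.exp(b_site + b_site_time * t)

# Hypothetical coefficients for illustration only.
beta2, beta3 = -2.96, 0.02
t_cross = -beta2 / beta3  # HR(t_cross) = 1, here day 148
```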

On the basis of the plot it can be stated that the hazard ratio is very low up to the 80th day, which means that for relatively short survival times subjects from site B are much more likely to die than subjects from site A; however, the hazard ratio constantly increases, so this difference becomes smaller and smaller. On the 130th day subjects from site A are approximately 50% less likely to die than subjects from site B. The hazard ratio reaches the value of 1 on the 148th day, which means that the chances of dying are then equal for subjects treated at both sites. After that time the hazard ratio rapidly increases and exceeds 3 after the 176th day, meaning that eventually subjects from site A are as much as three times more likely to die than subjects treated at site B. It should be mentioned that the first event in the whole sample was recorded on the 92nd day, thus drawing conclusions for the period of 0-92 days might not be reliable, and the interpretation should rather focus on values of TIME greater than or equal to 92. In general, subjects from site B seem to experience the event relatively early, but if they survive long enough, they are not likely to die in the later phase. Subjects from site A are more likely to live longer, but after some time point they seem to experience the event approximately as often as subjects from site B. To be more specific, 14 subjects from site B die between the 92nd and 160th day, while the 17 subjects who die at site A have survival times between the 125th and 188th day of the study.
Compared with the results of Model 1, where it is stated that subjects from site A are on average approximately 52% less likely to experience the event, it is worth mentioning that after accounting for the fact that the effect of SITE is not constant over time, this difference turns out to be much larger at some time points and, more importantly, the hazard ratio is lower than 1 in the early phase (meaning that subjects from site B are the riskier group) but exceeds 1 after the 148th day, indicating that subjects from site A become more likely to die.

## STRATIFIED MODEL

The second method that enables handling of non-proportional hazards is stratification. The main idea is to split the whole sample into subgroups on the basis of a categorical variable, called the stratification variable, and re-estimate the model, letting the baseline hazard function differ between these subgroups. It makes sense to choose a categorical covariate as the stratification variable if it interacts with time (i.e. the proportional hazards assumption is not satisfied for this covariate) and it is not of primary interest, as stratifying the model automatically excludes the stratification variable from the set of explanatory variables. As far as the coefficient estimates are concerned, the basic form of the stratified model assumes that they are constant across strata; it is, however, possible to include an interaction of the stratification variable with another covariate in order to allow for different slopes (Hosmer, Lemeshow 1999). In general, the stratified model for stratum s is defined as follows:

λₛ(t, x) = λ₀ₛ(t) exp(βx)

where s = 1, 2, ..., S and S is the total number of subgroups created on the basis of the stratification variable. The partial likelihood function formula is similar to the function proposed by Cox, with an additional subscript indicating the stratum number.
By multiplying the partial likelihood functions of the strata, the full partial likelihood function is obtained (for details, please refer to Hosmer, Lemeshow 1999). After the stratified model is estimated, it is possible to obtain estimates of the baseline survival and baseline cumulative hazard functions for each stratum. Additionally, one can estimate covariate-adjusted survival and cumulative hazard functions. The BASELINE statement in the PHREG procedure enables calculation of these estimates for each stratum. Calculations are performed for each set of covariates specified by the user. If no input dataset containing specified sets of covariates is defined, then SAS calculates the survival and cumulative hazard functions for each value of the stratification variable, taking the mean of the continuous variables within each stratum.

Let us consider SITE, for which it is known that the proportional hazards assumption is violated, as the stratification variable. It will not be possible to obtain a parameter estimate for SITE; however, the BASELINE statement enables estimation of the survival and cumulative hazard functions for each site separately, adjusting for age. The Cox model is re-estimated in the modified formulation, using the STRATA statement in the PHREG procedure.

```sas
/*Cox model estimation - AGE as a covariate, SITE as the stratification
  variable; saving estimates of the survival and cumulative hazard
  functions in the BASE dataset*/
proc phreg data = a.site;
   baseline out = base survival = surv cumhaz = cumhaz;
   format site site.;
   strata site;
   model time*censor(0) = age / ties = exact;
run;
```
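The stratified partial likelihood described above (risk sets formed within each stratum, so every stratum keeps its own baseline hazard, while β is shared) can be sketched for a single covariate as follows. This is an illustrative Breslow-style implementation on toy data, not the paper's SAS estimation, and it ignores ties for simplicity:

```python
import math

def stratified_partial_loglik(beta, strata):
    """Partial log-likelihood summed over strata.
    strata: dict mapping stratum label -> list of (time, event, x) tuples
    with a scalar covariate x. The risk set at each event time is formed
    from that stratum only, which is what lets the baseline hazard differ
    between strata while beta stays common."""
    ll = 0.0
    for rows in strata.values():
        for t_i, e_i, x_i in rows:
            if e_i != 1:
                continue  # censored rows contribute only through risk sets
            risk = [x for (t, _, x) in rows if t >= t_i]
            ll += beta * x_i - math.log(sum(math.exp(beta * x) for x in risk))
    return ll
```

Maximizing this function over β gives the stratified estimate; SAS performs the same maximization internally when a STRATA statement is present.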

Model 4: summary of the numbers of event and censored values per stratum (Site A, Site B, Total) and the parameter estimate for AGE.

Results of the model estimation are presented above. The convergence criterion is satisfied, and AGE, the only covariate in the stratified model, is significant at any acceptable level. As can be noticed, the hazard ratio for AGE does not differ to a large extent from Model 1 and is equal to 1.23, which means that each additional year of age increases the risk of dying by 23%. It should be taken into account, however, that the hazard ratio calculated by SAS concerns the comparison of subjects with a one-year age difference. When comparing subjects with a larger age difference, e.g. 20 years, the hazard ratios are equal to 66 for the stratified model and 63 for the initial model, which makes the difference between the initial model and the stratified model more visible. Additionally, plots of the cumulative hazard and survival functions are presented below.

```sas
/*Plots of the covariate-adjusted cumulative hazard and survival
  functions obtained on the basis of the stratified model*/
axis1 label = ('Time');
axis2 label = (a = 90 'Cumulative Hazard Function');
symbol1 v = dot c = red width = 1 i = sm50s;
symbol2 v = dot c = blue width = 1 i = sm50s;
proc gplot data = base;
   plot cumhaz*time = site / haxis = axis1 vaxis = axis2;
run;
quit;
```
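The remark about the 20-year age difference follows directly from the model form: a per-year hazard ratio of r implies a hazard ratio of r to the power k for a k-year difference. A quick check with the rounded estimate (the paper's value of 66 for the stratified model presumably comes from the unrounded coefficient):

```python
hr_per_year = 1.23            # rounded hazard ratio for AGE
hr_20_years = hr_per_year ** 20
# 1.23**20 is roughly 63, in line with the value quoted for the initial
# model; note how small rounding in the per-year HR is amplified
# heavily over a 20-year difference.
```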

As expected on the basis of the previous results, subjects from site A tend to have relatively longer survival times than subjects from site B. The cumulative hazard function is higher for site B at almost all time points, which means that the expected number of events up to a given time point is usually higher for subjects treated at that site.

On the basis of the models presented above it can be stated that accounting for non-proportional hazards (where they exist) provides a more detailed interpretation. Not only is it possible to state which group is more likely to experience the event, but it is also possible to analyze a hazard ratio that changes over time and corresponds to the varying effect of the given covariate. At the same time, it is hard to define a general rule saying which of the two methods should be chosen. On the one hand, the stratified model is easier to implement and requires less computational resources (Allison, 1995). On the other hand, if the analysis is performed on a relatively small dataset or on a powerful computer, resources are not that important and the interaction with time might be introduced into the model. The latter approach makes it possible to obtain a parameter estimate for the covariate for which the proportional hazards assumption is violated, as well as to analyze how the hazard ratio changes over time, which is impossible if the stratified model is chosen.

To compare the two models accounting for non-proportional hazards presented above, i.e. Model 2 including the SITE by TIME interaction and Model 4 based on stratification, the information criteria and the partial likelihood function values might be compared. Statistics corresponding to each model are presented in the table below. Additionally, fit statistics for the initial model including SITE and AGE as covariates are included.
Fit statistics (-2lnL, AIC, SBC) for the model with the SITE by TIME interaction, the model with SITE as the stratification variable, and the initial model.

It can be noticed that the initial model (Model 1, including SITE and AGE as covariates), which neglects the violation of the proportional hazards assumption for the SITE variable, has the highest values of all three fit statistics. Comparing Model 2 and Model 4, all criteria have lower values for the stratified model, which speaks in favor of this approach. The likelihood ratio test, however, cannot be performed in this case, as the models being compared are not nested. Additionally, in order to perform an overall assessment of both models, the linear predictor is calculated and ten binary variables are created on the basis of its percentiles. Nine of the ten binary variables are introduced into the models and their statistical significance is verified. The piece of SAS code presented below is used to perform these steps.

```sas
/*Overall assessment - model with the site by time interaction*/
data over1;
   set a.site;
   xbeta = *age *site *site*time; /* linear predictor from the Model 2 estimates */
   count = 1;
run;

proc univariate data = over1 noprint;
   var xbeta;
   output out = perc pctlpre = p pctlpts = 10 to 90 by 10;
run;

data perc;
   set perc;
   count = 1;
run;

data over1a;
   merge over1 perc;
   by count;
run;

%macro ret;
data over1a;
   set over1a;
   if xbeta <= p10 then x10 = 1;
   else x10 = 0;
   if xbeta > p90 then x100 = 1;
   else x100 = 0;
   %do i = 10 %to 80 %by 10;
      %let j = %eval(&i + 10);
      if xbeta > p&i and xbeta <= p&j then x&j = 1;
      else x&j = 0;
   %end;
run;
%mend;
%ret;

proc phreg data = over1a;
   model time*censor(0) = age site site_t x10 x20 x30 x40 x50 x60 x70 x80 x90 / ties = exact;
   site_t = site*time;
run;
```

Model 2 (including the SITE by TIME interaction): significance tests for AGE, SITE, site_t and the nine decile indicators x10-x90.

Model 4 (SITE as the stratification variable): significance tests for AGE and the nine decile indicators.

In the stratified model, none of the newly added variables is statistically significant at any acceptable level. As for the model with the TIME by SITE interaction, two of the variables for which parameter estimates were obtained are significant, which suggests a poorer fit of that model. Thus, the comparison of information criteria as well as the linear predictor method would suggest using the stratified model rather than introducing the interaction with time. It should be noted, however, that these tests should not be treated as an oracle: they give suggestions rather than an unequivocal determination of which model should be chosen.
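The decile-based overall assessment can be restated compactly: rank subjects by the linear predictor, cut into ten risk groups at the 10th-90th percentiles, and enter nine 0/1 indicators into the model (the tenth group serves as the reference). A Python sketch of the grouping step (rank-based deciles on toy values; ties are ignored for simplicity, unlike the percentile cutpoints used in the SAS code):

```python
def decile_indicators(xbeta):
    """Split linear-predictor values into ten rank-based groups.
    Returns (group labels 0..9, nine 0/1 indicator columns); group 9,
    the highest-risk decile, is the reference and gets no indicator."""
    n = len(xbeta)
    order = sorted(range(n), key=lambda i: xbeta[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = min(rank * 10 // n, 9)
    indicators = [[1 if group[i] == g else 0 for i in range(n)]
                  for g in range(9)]
    return group, indicators

groups, cols = decile_indicators(list(range(100, 120)))  # 20 toy values
```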

## CONCLUSIONS

Although the proportional hazards assumption is one of the most important features of the Cox model, its violation should definitely not prevent the analyst from using this statistical tool. The current paper presents two methods developed to take into account the effect of a covariate that varies through time. The calculations performed give evidence that stratification, or including an interaction with time, results in better model fit in the case of covariates for which the proportional hazards assumption is not satisfied. Additionally, more detailed results and interpretations are obtained with the presented methods. It would be hard, however, to define a general rule for handling non-proportional hazards. The analyst can consider one of three possibilities: keeping all covariates in the model and neglecting the violation of the proportional hazards assumption, introducing the TIME by SITE interaction, or estimating the stratified model. Each of these approaches has its pros and cons. What is more, it is hard to compare these models, especially the stratified model with the non-stratified models, as they differ in their construction. Thus, as soon as non-proportional hazards are identified in the model, this fact should definitely be taken into consideration, but the choice of method needs to be adjusted to the particular case.

## REFERENCES

Allison, P. D., 1995, Survival Analysis Using SAS: A Practical Guide, Cary.
Hosmer, D., Lemeshow, S., 1999, Applied Survival Analysis: Regression Modeling of Time to Event Data, New York.
Kalbfleisch, J. D., Prentice, R. L., 1980, The Statistical Analysis of Failure Time Data, New York.
Quantin, C., Moreau, T., Asselain, B., Maccario, J., Lellouch, J., 1996, A regression survival model for testing the proportional hazards hypothesis, Biometrics, 52.
Therneau, T. M., Grambsch, P. M., 2000, Modeling Survival Data, Berlin.
## ACKNOWLEDGEMENTS

Special thanks to Marzena Marcinowska and Kirill Skouibine for their help and support.

## CONTACT INFORMATION

Jadwiga Borucka, PAREXEL International, Marconich str. 6, Warsaw, Poland


### STAT331. Cox s Proportional Hazards Model

STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

### Survival analysis in R

Survival analysis in R Niels Richard Hansen This note describes a few elementary aspects of practical analysis of survival data in R. For further information we refer to the book Introductory Statistics

### Understanding the Cox Regression Models with Time-Change Covariates

Understanding the Cox Regression Models with Time-Change Covariates Mai Zhou University of Kentucky The Cox regression model is a cornerstone of modern survival analysis and is widely used in many other

### Neuendorf Logistic Regression. The Model: Assumptions:

1 Neuendorf Logistic Regression The Model: Y Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV... bear in mind that

### Generalized Linear Models for Count, Skewed, and If and How Much Outcomes

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes Today s Class: Review of 3 parts of a generalized model Models for discrete count or continuous skewed outcomes Models for two-part

### Models for Binary Outcomes

Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

### Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

### 7. Assumes that there is little or no multicollinearity (however, SPSS will not assess this in the [binary] Logistic Regression procedure).

1 Neuendorf Logistic Regression The Model: Y Assumptions: 1. Metric (interval/ratio) data for 2+ IVs, and dichotomous (binomial; 2-value), categorical/nominal data for a single DV... bear in mind that

### SAS/STAT 13.2 User s Guide. Introduction to Survey Sampling and Analysis Procedures

SAS/STAT 13.2 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 13.2 User s Guide. The correct bibliographic citation for the complete

### Models for Ordinal Response Data

Models for Ordinal Response Data Robin High Department of Biostatistics Center for Public Health University of Nebraska Medical Center Omaha, Nebraska Recommendations Analyze numerical data with a statistical

### Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

### Frailty Modeling for clustered survival data: a simulation study

Frailty Modeling for clustered survival data: a simulation study IAA Oslo 2015 Souad ROMDHANE LaREMFiQ - IHEC University of Sousse (Tunisia) souad_romdhane@yahoo.fr Lotfi BELKACEM LaREMFiQ - IHEC University

### Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data

Dynamic Prediction of Disease Progression Using Longitudinal Biomarker Data Xuelin Huang Department of Biostatistics M. D. Anderson Cancer Center The University of Texas Joint Work with Jing Ning, Sangbum

### SAS/STAT 13.1 User s Guide. The PHREG Procedure

SAS/STAT 13.1 User s Guide The PHREG Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

### COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

### Analyzing Residuals in a PROC SURVEYLOGISTIC Model

Paper 1477-2017 Analyzing Residuals in a PROC SURVEYLOGISTIC Model Bogdan Gadidov, Herman E. Ray, Kennesaw State University ABSTRACT Data from an extensive survey conducted by the National Center for Education

### STA6938-Logistic Regression Model

Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

### PASS Sample Size Software. Poisson Regression

Chapter 870 Introduction Poisson regression is used when the dependent variable is a count. Following the results of Signorini (99), this procedure calculates power and sample size for testing the hypothesis

### Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable

### Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

### Longitudinal Modeling with Logistic Regression

Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

### Hazards, Densities, Repeated Events for Predictive Marketing. Bruce Lund

Hazards, Densities, Repeated Events for Predictive Marketing Bruce Lund 1 A Proposal for Predicting Customer Behavior A Company wants to predict whether its customers will buy a product or obtain service

### Business Statistics. Lecture 9: Simple Regression

Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

### A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes Ritesh Ramchandani Harvard School of Public Health August 5, 2014 Ritesh Ramchandani (HSPH) Global Rank Test for Multiple Outcomes

### Chapter 19: Logistic regression

Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

### Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs Christopher Jennison Department of Mathematical Sciences, University of Bath http://people.bath.ac.uk/mascj

### 9 Generalized Linear Models

9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

### 8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

### Lecture 5 Models and methods for recurrent event data

Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.

### Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

### Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

### Guideline on adjustment for baseline covariates in clinical trials

26 February 2015 EMA/CHMP/295050/2013 Committee for Medicinal Products for Human Use (CHMP) Guideline on adjustment for baseline covariates in clinical trials Draft Agreed by Biostatistics Working Party

### Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract

### Logistic regression model for survival time analysis using time-varying coefficients

Logistic regression model for survival time analysis using time-varying coefficients Accepted in American Journal of Mathematical and Management Sciences, 2016 Kenichi SATOH ksatoh@hiroshima-u.ac.jp Research

### Tied survival times; estimation of survival probabilities

Tied survival times; estimation of survival probabilities Patrick Breheny November 5 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Tied survival times Introduction Breslow approximation

### CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

### LOGISTICS REGRESSION FOR SAMPLE SURVEYS

4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

### Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012

### Model Estimation Example

Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

### Comparison of Two Samples

2 Comparison of Two Samples 2.1 Introduction Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation

### Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY

Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions Alan J Xiao, Cognigen Corporation, Buffalo NY ABSTRACT Logistic regression has been widely applied to population

### Integrated likelihoods in survival models for highlystratified

Working Paper Series, N. 1, January 2014 Integrated likelihoods in survival models for highlystratified censored data Giuliana Cortese Department of Statistical Sciences University of Padua Italy Nicola

### A note on R 2 measures for Poisson and logistic regression models when both models are applicable

Journal of Clinical Epidemiology 54 (001) 99 103 A note on R measures for oisson and logistic regression models when both models are applicable Martina Mittlböck, Harald Heinzl* Department of Medical Computer

### Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction

Outline CHL 5225H Advanced Statistical Methods for Clinical Trials: Survival Analysis Prof. Kevin E. Thorpe Defining Survival Data Mathematical Definitions Non-parametric Estimates of Survival Comparing

### A fast routine for fitting Cox models with time varying effects

Chapter 3 A fast routine for fitting Cox models with time varying effects Abstract The S-plus and R statistical packages have implemented a counting process setup to estimate Cox models with time varying

### Comparison of Predictive Accuracy of Neural Network Methods and Cox Regression for Censored Survival Data

Comparison of Predictive Accuracy of Neural Network Methods and Cox Regression for Censored Survival Data Stanley Azen Ph.D. 1, Annie Xiang Ph.D. 1, Pablo Lapuerta, M.D. 1, Alex Ryutov MS 2, Jonathan Buckley

### Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

9.1 Scatter Plots and Linear Correlation Answers 1. A high school psychologist wants to conduct a survey to answer the question: Is there a relationship between a student s athletic ability and his/her

### SPSS LAB FILE 1

SPSS LAB FILE www.mcdtu.wordpress.com 1 www.mcdtu.wordpress.com 2 www.mcdtu.wordpress.com 3 OBJECTIVE 1: Transporation of Data Set to SPSS Editor INPUTS: Files: group1.xlsx, group1.txt PROCEDURE FOLLOWED:

### Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

### 5 Transfer function modelling

MSc Further Time Series Analysis 5 Transfer function modelling 5.1 The model Consider the construction of a model for a time series (Y t ) whose values are influenced by the earlier values of a series

### , (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression

### Package ICBayes. September 24, 2017

Package ICBayes September 24, 2017 Title Bayesian Semiparametric Models for Interval-Censored Data Version 1.1 Date 2017-9-24 Author Chun Pan, Bo Cai, Lianming Wang, and Xiaoyan Lin Maintainer Chun Pan

### Survival Distributions, Hazard Functions, Cumulative Hazards

BIO 244: Unit 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution

### 0.1 weibull: Weibull Regression for Duration Dependent

0.1 weibull: Weibull Regression for Duration Dependent Variables Choose the Weibull regression model if the values in your dependent variable are duration observations. The Weibull model relaxes the exponential

### TESTING FOR CO-INTEGRATION

Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

### 11. Generalized Linear Models: An Introduction

Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

### SESUG 2011 ABSTRACT INTRODUCTION BACKGROUND ON LOGLINEAR SMOOTHING DESCRIPTION OF AN EXAMPLE. Paper CC-01

Paper CC-01 Smoothing Scaled Score Distributions from a Standardized Test using PROC GENMOD Jonathan Steinberg, Educational Testing Service, Princeton, NJ Tim Moses, Educational Testing Service, Princeton,

### Kernel density estimation in R

Kernel density estimation in R Kernel density estimation can be done in R using the density() function in R. The default is a Guassian kernel, but others are possible also. It uses it s own algorithm to

### Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

### Subgroup analysis using regression modeling multiple regression. Aeilko H Zwinderman

Subgroup analysis using regression modeling multiple regression Aeilko H Zwinderman who has unusual large response? Is such occurrence associated with subgroups of patients? such question is hypothesis-generating:

### a = 4 levels of treatment A = Poison b = 3 levels of treatment B = Pretreatment n = 4 replicates for each treatment combination

In Box, Hunter, and Hunter Statistics for Experimenters is a two factor example of dying times for animals, let's say cockroaches, using 4 poisons and pretreatments with n=4 values for each combination

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Title stata.com stcrreg postestimation Postestimation tools for stcrreg Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

### Introduction to Generalized Models

Introduction to Generalized Models Today s topics: The big picture of generalized models Review of maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical

### DAGStat Event History Analysis.

DAGStat 2016 Event History Analysis Robin.Henderson@ncl.ac.uk 1 / 75 Schedule 9.00 Introduction 10.30 Break 11.00 Regression Models, Frailty and Multivariate Survival 12.30 Lunch 13.30 Time-Variation and

### Sigmaplot di Systat Software

Sigmaplot di Systat Software SigmaPlot Has Extensive Statistical Analysis Features SigmaPlot is now bundled with SigmaStat as an easy-to-use package for complete graphing and data analysis. The statistical

### Generalized Linear Modeling - Logistic Regression

1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

### Full likelihood inferences in the Cox model: an empirical likelihood approach

Ann Inst Stat Math 2011) 63:1005 1018 DOI 10.1007/s10463-010-0272-y Full likelihood inferences in the Cox model: an empirical likelihood approach Jian-Jian Ren Mai Zhou Received: 22 September 2008 / Revised:

### Likelihood Construction, Inference for Parametric Survival Distributions

Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make

### Kaplan-Meier in SAS. filename foo url "http://math.unm.edu/~james/small.txt"; data small; infile foo firstobs=2; input time censor; run;

Kaplan-Meier in SAS filename foo url "http://math.unm.edu/~james/small.txt"; data small; infile foo firstobs=2; input time censor; run; proc print data=small; run; proc lifetest data=small plots=survival;

### ABSTRACT INTRODUCTION. SESUG Paper

SESUG Paper 140-2017 Backward Variable Selection for Logistic Regression Based on Percentage Change in Odds Ratio Evan Kwiatkowski, University of North Carolina at Chapel Hill; Hannah Crooke, PAREXEL International