Comparison of Hazard, Odds and Risk Ratio in the Two-Sample Survival Problem


Western Michigan University
ScholarWorks at WMU
Dissertations, Graduate College
8-2014

Comparison of Hazard, Odds and Risk Ratio in the Two-Sample Survival Problem
Benedict P. Dormitorio, Western Michigan University, b.dormitorio@gmail.com

Follow this and additional works at: https://scholarworks.wmich.edu/dissertations
Part of the Probability Commons, Statistical Methodology Commons, and the Statistical Models Commons

Recommended Citation
Dormitorio, Benedict P., "Comparison of Hazard, Odds and Risk Ratio in the Two-Sample Survival Problem" (2014). Dissertations. 304. https://scholarworks.wmich.edu/dissertations/304

This Dissertation-Open Access is brought to you for free and open access by the Graduate College at ScholarWorks at WMU. It has been accepted for inclusion in Dissertations by an authorized administrator of ScholarWorks at WMU. For more information, please contact maira.bundza@wmich.edu.

COMPARISON OF HAZARD, ODDS AND RISK RATIO IN THE TWO-SAMPLE SURVIVAL PROBLEM

by Benedict P. Dormitorio

A dissertation submitted to the Graduate College in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Statistics
Western Michigan University
August 2014

Doctoral Committee:
Joshua Naranjo, Ph.D., Chair
Rajib Paul, Ph.D.
Jung Chao Wang, Ph.D.
Mark Schauer, MD

COMPARISON OF HAZARD, ODDS AND RISK RATIO IN THE TWO-SAMPLE SURVIVAL PROBLEM

Benedict P. Dormitorio, Ph.D.
Western Michigan University, 2014

Cox proportional hazards is the standard method for analyzing treatment efficacy when time-to-event data are available. In the absence of time-to-event data, investigators may use logistic regression, which requires only relative frequencies of events, or Poisson regression, which requires only interval-summarized frequency tables of time-to-event. When event frequencies are used instead of time-to-event data, does it always result in a loss in power? We investigate the relative performance of the three methods. In particular, we compare the power of tests based on the respective effect-size estimates: (1) hazard ratio (HR), (2) odds ratio (OR), and (3) risk ratio (RR). We use a variety of survival distributions and cutoff points representing length of study. We show that the relative performance of OR against HR depends on the relative early-or-late separation of the two survival curves, and that OR and HR performed better than RR. We propose diagnostics based on the maximum separation to help investigators choose between OR and HR.

© 2014 Benedict P. Dormitorio

ACKNOWLEDGEMENTS

I would like to thank everyone who has vastly contributed to my education. To my advisor, Dr. Naranjo, for his guidance and constant encouragement - thank you. To my committee members, Dr. Paul, Dr. Wang and Dr. Schauer, thank you for the review and recommendations. To those I have worked with over the years, Dr. Toledo and the rest of WMED, Dr. Paul, Dr. Kothari and Dr. Curtis at HDREAM, Dr. Tersptra at the consulting lab and colleagues at MPI Research - thank you. To my family and friends who have kept me sane and grounded, thank you. Lastly, to God - thank you.

Benedict P. Dormitorio

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
   1.1 Background and Motivation
   1.2 Basic Definition
       1.2.1 Two-Sample Survival Problem
       1.2.2 Data Types
       1.2.3 Survival Distributions
       1.2.4 Measures of Risk

2. HAZARD VERSUS ODDS RATIO
   2.1 Time-to-Event Data and Hazard Ratio
       2.1.1 Model
       2.1.2 Estimation
       2.1.3 Asymptotic Normality
       2.1.4 Power for Testing $H_0: \beta = 0$
   2.2 Binary Data and Odds Ratio
       2.2.1 Model
       2.2.2 Estimation
       2.2.3 Asymptotic Normality
       2.2.4 Power for Testing $H_0: \beta = 0$
   2.3 Comparison
   2.4 Asymptotic Variance Under $H_0$
   2.5 Power Rates Under the Alternative

3. SENSITIVITY TO LENGTH OF STUDY
   3.1 HR Estimate
   3.2 OR Estimate
   3.3 HR versus OR
   3.4 Discussion

4. METHOD SELECTION DIAGNOSTICS
   4.1 Cutoff Time $C$
   4.2 Maximum Separation
       4.2.1 Q-Test
       4.2.2 $Q_m$, Modified Q-Test
   4.3 Selection Diagnostics
       4.3.1 $\hat{P}_1$ and Q-Test
       4.3.2 $\hat{P}_1$ and $Q_m$-Test
       4.3.3 Example

5. POISSON RISK RATIO
   5.1 Grouped Data and Poisson Risk Ratio
       5.1.1 Model
       5.1.2 Estimation
       5.1.3 Asymptotic Normality
       5.1.4 Power
   5.2 Sensitivity to Increasing Periods
       5.2.1 Analysis at Cutoff
       5.2.2 Analysis at Each Period
   5.3 RR versus HR and OR

6. CONCLUSION

APPENDICES
   A. Approximation of $P(\delta = 1)\,EV(X \mid T, \delta)$
   B. Comparison of Theoretical and Simulated Power

REFERENCES

LIST OF TABLES

1.1 Time-to-Event Data for the AML Trial
1.2 Binary Data for the AML Trial
4.1 Power of $Q_m$ for Detecting Maximum after $C$
5.1 Grouped Data for AML Summarized Every 10 Weeks
5.2 Simulated Power with Weibull Survival Times under Proportional Hazards. $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.25)$. RR Estimates are Based on Grouped Data on 1, 2, 3, 4 and $m_{1t} + m_{2t}$ Intervals (left to right) and Increasing Study Length (top to bottom).
5.3 Simulated Power with Weibull Survival Times under Proportional Hazards. $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.5)$. RR Estimates are Based on Grouped Data on 1, 2, 3, 4 and $m_{1t} + m_{2t}$ Intervals (left to right) and Increasing Study Length (top to bottom).
5.4 Simulated Power with Weibull Survival Times under Proportional Hazards. $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$. RR Estimates are Based on Grouped Data on 1, 2, 3, 4 and $m_{1t} + m_{2t}$ Intervals (left to right) and Increasing Study Length (top to bottom).
5.5 Simulated Power with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.2)$
5.6 Simulated Power with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.5)$
5.7 Simulated Power with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$

LIST OF FIGURES

1.1 AML Trial with 2 Groups, Maintained Group (Group 1, Straight Lines) and Nonmaintained Group (Group 2, Dashed Lines), and a Cutoff Time of 30 Weeks
1.2 Survival Probability, $S(t)$, of Four Different Weibull Times: $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$, $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$, $T_3 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_4 \sim \mathrm{Weibull}(a = 2, b = 2)$
2.1 Simulated Power of HR for Testing $\beta = 0$ for $\hat\beta$ from CPH where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times such that $F(C) = 0.30, 0.50, 0.70$
2.2 Simulated Power of OR for Testing $\beta = 0$ for $\hat\beta$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times, $F(C) = 0.30, 0.50, 0.70$
2.3 Simulated Power of HR and OR at $F_1(C) = 0.30$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$
2.4 Simulated Power of HR and OR at $F_1(C) = 0.50$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$
2.5 Simulated Power of HR and OR at $F_1(C) = 0.70$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$
3.1 Survival Plots of Case 1, 2, 3 and 4
3.2 Simulated Power of HR under Case 1. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.2)$
3.3 Simulated Power of HR under Case 2. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.5)$
3.4 Simulated Power of HR under Case 3. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.3)$
3.5 Simulated Power of HR under Case 4. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.5)$
3.6 Simulated Power of OR under Case 1. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.2)$
3.7 Simulated Power of HR under Case 2. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.5)$
3.8 Simulated Power of OR under Case 3. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.3)$
3.9 Simulated Power of OR under Case 4. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.5)$
3.10 Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.2)$
3.11 Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.5)$
3.12 Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.3)$
3.13 Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.5)$
4.1 Example of Maximum Separation between Two Weibulls
4.2 Diagnostic Using $\hat{P}_1$ and $Q$
4.3 Diagnostic Using $\hat{P}_1$ and $Q_m$
4.4 Survival Plot of the AML data
5.1 Simulated Power of the Test of RR for $H_0: \beta = 0$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times such that $F(C) = 0.30, 0.50, 0.70$
5.2 Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.25)$ and Increasing Cutoffs
5.3 Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.5)$ and Increasing Cutoffs
5.4 Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$ and Increasing Cutoffs
5.5 Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$ and Increasing Periods
5.6 Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.5)$ and Increasing Periods
5.7 Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$ and Increasing Periods
A.1 Illustration of $V(X \mid T, \delta)$
B.1 Theoretical versus Simulated Power of HR
B.2 Theoretical versus Simulated Power of OR
B.3 Theoretical versus Simulated Power of RR

CHAPTER 1

INTRODUCTION

1.1 Background and Motivation

In survival experiments, the primary goal is to investigate the effect of a treatment on the risk of an event. For example, a researcher may be interested in the efficacy of a new drug in reducing the risk of an unwanted event such as death or complications, or in increasing the chance of a wanted event such as recovery from illness or surgery.

The most widely used data type and analysis method for survival experiments are time-to-event data and the Cox proportional hazards (CPH) model (Cox 1972, 1975), respectively. In this setup, subjects are assigned to one of the groups and followed until the event occurs or the study ends. The time until the event occurs is recorded for each subject. The hazard ratio (HR) is then estimated using CPH to evaluate the relative risk among the groups.

An alternative approach is to collect only dichotomous (binary) data and use logistic regression (LR) for the analysis. Here, subjects are assigned to one of the groups and evaluated at the end of the study. The evaluation only considers whether the event occurred or not. The odds ratio (OR) is then estimated using LR to assess the relative risk among the groups.

When the data collected vary, the statistical methods that can be used to estimate and quantify the risk also vary. When summarized or binary data are used instead of time-to-event data in a survival experiment, does it always result in a loss in power? If so, how much power is lost? If not, under what situations?

The literature comparing statistical methods for survival analysis is extensive. For example, the optimality of the log-rank and Wilcoxon tests for testing differences in survivability or risk of event is well studied. Various testing schemes, such as those of Martinez and Naranjo [2010] and Darilay and Naranjo [2011], were proposed to take advantage of the known optimal scenarios for the log-rank and Wilcoxon tests. However, research on the various data types and analysis methods for survival analysis is limited. The use of Poisson regression (PR) for grouped data was proposed by Prentice and Gloeckler [1978], Holford [1980], Frome [1983] and Laird and Olivier [1981] as an alternative to CPH, mainly due to computational limitations. PR and LR as approximations to CPH were also studied by Callas et al. [1998], et. al. [1989], Walter [2000] and Symons and Moore [2002]. On the other hand, the optimal scenarios for the performance of CPH, PR and LR as a consequence of data collection have not been explored.

In this research, we investigate the performance of the Cox proportional hazards model, logistic regression and Poisson regression as a consequence of the data type collected, using their statistical power. Specifically, we explore their sensitivity to the length of study and the sensitivity of PR to an increasing number of summary periods under different distributions. Lastly, we propose diagnostics and recommendations on when to use the three methods.

1.2 Basic Definition

1.2.1 Two-Sample Survival Problem

Two-sample survival experiments are conducted to investigate the relative risk of an event between two groups. We denote the two groups as group 1 and group 2 in a general setting. A total of $n = n_1 + n_2$ subjects are assigned, $n_1$ to group 1 and $n_2$ to group 2, and observed over the study period, which ends at the cutoff time.

As an example, we adapt a data set from Miller [1981]. The efficacy of maintained chemotherapy for patients with acute myelogenous leukemia (AML) was studied in a clinical trial. The patients underwent initial chemotherapy until a state of remission was attained. Then they were assigned to either a group with maintained chemotherapy or a group with no maintained chemotherapy. The goal is to assess whether maintained chemotherapy reduces the risk of relapse.

We consider the data with a follow-up period of only 30 weeks. Figure 1.1 is an illustration of the AML trial. The x-axis is the time in weeks and the y-axis is the subject number. The straight and dashed lines denote the two groups, maintained and non-maintained. The length of each line shows the time to relapse. For example, 4 subjects in the maintained group had a relapse in less than 30 weeks, while there were 7 in the non-maintained group. Subject 1, from the maintained chemotherapy group, had a relapse after 9 weeks, and subject 5, from the non-maintained chemotherapy group, after 4 weeks.

Figure 1.1: AML Trial with 2 Groups, Maintained Group (Group 1, Straight Lines) and Nonmaintained Group (Group 2, Dashed Lines), and a Cutoff Time of 30 Weeks.

1.2.2 Data Types

Data collected from survival experiments can vary and depend on available resources. A well-funded experiment, for example, may allow for a thorough follow-up of each subject. In contrast, a study with limited funding faces constraints and may only allow limited follow-ups. Availability of data also puts some restrictions on data collection, as in epidemiological or retrospective studies where the available data are already summarized.

Time-to-Event

Time-to-event data can be collected when an experiment is conducted such that the time it took for the event to occur is observed for each subject. Time-to-event data come as a pair of variables: (1) a time and (2) an indicator of the event. The event time is recorded for each subject with an event, while the cutoff time with a + sign is recorded when the subject has not yet had the event during the study.

Table 1.1 shows the time-to-event data from the AML trial. The column Weeks to Relapse is the time and the column Relapse? indicates the event. The Treatment column shows which treatment group the patient belongs to.

Table 1.1: Time-to-Event Data for the AML Trial

Weeks to Relapse   Relapse?   Treatment
9                  Yes        Maintained
13                 Yes        Maintained
18                 Yes        Maintained
22                 Yes        Maintained
4                  Yes        Nonmaintained
5                  Yes        Nonmaintained
7                  Yes        Nonmaintained
8                  Yes        Nonmaintained
12                 Yes        Nonmaintained
23                 Yes        Nonmaintained
27                 Yes        Nonmaintained
30+                No         Maintained
30+                No         Maintained
30+                No         Maintained
30+                No         Maintained
30+                No         Maintained
30+                No         Nonmaintained
30+                No         Nonmaintained
30+                No         Nonmaintained

Binary Data

When the observations made on the subjects only indicate whether the event occurred or not, dichotomous (binary) data are recorded. The advantage of this data type is simplicity, since the experimenters are only required to collect the data at the end of the study. For example, medical doctors may find it easier to evaluate whether a patient has had complications by a pre-specified follow-up time than to determine the actual time when the complications occurred.

Table 1.2: Binary Data for the AML Trial

Relapse?   Treatment
Yes        Maintained
Yes        Maintained
Yes        Maintained
Yes        Maintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
No         Maintained
No         Maintained
No         Maintained
No         Maintained
No         Maintained
No         Nonmaintained
No         Nonmaintained
No         Nonmaintained

Table 1.2 shows the binary data collected from the AML trial. The column Relapse? indicates whether the patient had a relapse within the 30-week observation period. The column Treatment indicates the treatment the patient received.
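The relationship between the two data types can be sketched in a few lines of Python. The snippet below is only an illustration: it encodes the rows of Table 1.1, drops the time column to obtain the binary data of Table 1.2, and tallies events per group. The names `aml`, `to_binary` and `two_by_two` are hypothetical and not part of the original analysis.

```python
# Table 1.1 as (weeks, relapsed, treatment) records.
# A record with weeks = 30 and relapsed = False stands for the "30+" rows.
aml = (
    [(w, True, "Maintained") for w in (9, 13, 18, 22)]
    + [(w, True, "Nonmaintained") for w in (4, 5, 7, 8, 12, 23, 27)]
    + [(30, False, "Maintained")] * 5
    + [(30, False, "Nonmaintained")] * 3
)


def to_binary(records):
    """Drop the time column, keeping only (relapsed, treatment) as in Table 1.2."""
    return [(relapsed, treatment) for _, relapsed, treatment in records]


def two_by_two(binary):
    """Count events and non-events within each treatment group."""
    counts = {}
    for relapsed, treatment in binary:
        cell = counts.setdefault(treatment, {"events": 0, "non_events": 0})
        cell["events" if relapsed else "non_events"] += 1
    return counts


counts = two_by_two(to_binary(aml))
for treatment, cell in counts.items():
    print(treatment, cell)
```

The binary data retain only the per-group event counts, so the ordering of the relapse times within the 30 weeks is lost in this reduction.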

1.2.3 Survival Distributions

Weibull Distribution

Let $T$ be the time-to-event with probability density function $f(t)$. The probability density function of a Weibull distribution with shape parameter $a > 0$ and scale parameter $b > 0$ is

$$f(t) = \frac{b}{a}\left(\frac{t}{a}\right)^{b-1} e^{-(t/a)^b}.$$

The mean and variance of the Weibull distribution are

$$\text{mean} = a\,\Gamma(1 + 1/b) \quad\text{and}\quad \text{variance} = a^2\left[\Gamma(1 + 2/b) - \left(\Gamma(1 + 1/b)\right)^2\right],$$

where $\Gamma(\cdot)$ is the Gamma function. The distribution of $T$ can also be described by its cumulative probability

$$P(T < t) = F(t) = \int_0^t f(s)\,ds = 1 - e^{-(t/a)^b}$$

or its survival probability

$$S(t) = 1 - F(t) = e^{-(t/a)^b}.$$

Figure 1.2 shows the survival probability $S(t)$ over time for 4 different time-to-events in two plots. The first plot shows $T_1$ and $T_2$, with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$, and the second plot shows $T_3$ and $T_4$, with $T_3 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_4 \sim \mathrm{Weibull}(a = 2, b = 2)$.

Figure 1.2: Survival Probability, $S(t)$, of Four Different Weibull Times: $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$, $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$, $T_3 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_4 \sim \mathrm{Weibull}(a = 2, b = 2)$

As shown in Figure 1.2, the survival probability is 1 at time 0 and goes to 0 as time increases. Under equal shape parameters, $a_1 = a_2$, of two Weibull times, a difference in the scale parameters, $b_1$ and $b_2$, indicates that $S_2(t) \neq S_1(t)$ for all $t$ except at 0 and $\infty$.
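The Weibull quantities of Section 1.2.3 can be evaluated numerically with a short Python sketch. The functions below follow the displayed formulas literally, so `a` is the parameter that divides `t` and `b` is the exponent; if a simulation uses a different convention for the two arguments, they would need to be swapped. The helper `cutoff_for_event_fraction`, which inverts $F$ to find a cutoff $C$ with $F(C) = p$, is an added convenience and an assumption of this sketch, as are all function names.

```python
import math


def weibull_pdf(t, a, b):
    """Density f(t) = (b/a) (t/a)^(b-1) exp(-(t/a)^b), as displayed in Section 1.2.3."""
    return (b / a) * (t / a) ** (b - 1) * math.exp(-((t / a) ** b))


def weibull_cdf(t, a, b):
    """Cumulative probability F(t) = 1 - exp(-(t/a)^b)."""
    return 1.0 - math.exp(-((t / a) ** b))


def weibull_survival(t, a, b):
    """Survival probability S(t) = exp(-(t/a)^b)."""
    return math.exp(-((t / a) ** b))


def weibull_mean(a, b):
    """Mean a * Gamma(1 + 1/b)."""
    return a * math.gamma(1 + 1 / b)


def weibull_variance(a, b):
    """Variance a^2 [Gamma(1 + 2/b) - Gamma(1 + 1/b)^2]."""
    return a ** 2 * (math.gamma(1 + 2 / b) - math.gamma(1 + 1 / b) ** 2)


def cutoff_for_event_fraction(p, a, b):
    """Cutoff C with F(C) = p, obtained by solving 1 - exp(-(C/a)^b) = p for C."""
    return a * (-math.log(1 - p)) ** (1 / b)


# Cutoffs at which 30%, 50% and 70% of subjects have had the event.
for p in (0.30, 0.50, 0.70):
    c = cutoff_for_event_fraction(p, a=2.0, b=1.0)
    print(p, c, weibull_cdf(c, a=2.0, b=1.0))
```

Expressing the cutoff through $F(C)$ rather than calendar time makes study lengths comparable across distributions, which is how the cutoffs are labeled in the power figures.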

1.2.4 Measures of Risk

"Risk", for now, denotes a general notion of the chance that the event will occur. A popular approach to comparing the risks of two groups is to use the ratio of the two risks, or risk ratio. Denote the risk ratio by $\Theta$. It can be expressed as

$$\Theta = \frac{\text{risk of event for treatment group}}{\text{risk of event for control group}}.$$

Two different measures of risk will be used as a consequence of using either time-to-event or binary data. Let $T$ be the time-to-event, following a distribution $f(t)$. The first measure of risk is the hazard ratio (HR), which is a ratio of two hazards. Define the hazard at time $t$ as the "chance of instantaneous event at time $t$ given that the event has not occurred prior to $t$". Formally, the hazard at time $t$ is

$$h(t) = \frac{f(t)}{S(t)}.$$

Thus, the hazard ratio between 2 groups is

$$HR = h_2(t)/h_1(t). \qquad (1.1)$$

The second measure of risk is the odds ratio (OR). The odds of an event is another measure of risk, defined as

$$\frac{F(t)}{1 - F(t)} = \frac{F(t)}{S(t)}.$$

The odds ratio between 2 groups is

$$OR = \frac{F_2(t)/(1 - F_2(t))}{F_1(t)/(1 - F_1(t))} = \frac{F_2(t)/S_2(t)}{F_1(t)/S_1(t)}. \qquad (1.2)$$
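As a hedged illustration of (1.1) and (1.2), the Python sketch below evaluates both ratios at a few time points. It assumes the survival and density functions exactly as written in Section 1.2.3, and the two parameter pairs are illustrative choices rather than a scenario taken from the simulations: with exponent $b = 1$ in both groups the hazards are constant, so HR does not change with $t$, while OR does.

```python
import math


def survival(t, a, b):
    """S(t) = exp(-(t/a)^b), following the formula in Section 1.2.3."""
    return math.exp(-((t / a) ** b))


def density(t, a, b):
    """f(t) = (b/a) (t/a)^(b-1) exp(-(t/a)^b)."""
    return (b / a) * (t / a) ** (b - 1) * math.exp(-((t / a) ** b))


def hazard(t, a, b):
    """h(t) = f(t) / S(t)."""
    return density(t, a, b) / survival(t, a, b)


def odds(t, a, b):
    """Odds of the event by time t: F(t) / S(t)."""
    s = survival(t, a, b)
    return (1.0 - s) / s


def hazard_ratio(t, group1, group2):
    """Equation (1.1): h_2(t) / h_1(t)."""
    return hazard(t, *group2) / hazard(t, *group1)


def odds_ratio(t, group1, group2):
    """Equation (1.2): [F_2(t)/S_2(t)] / [F_1(t)/S_1(t)]."""
    return odds(t, *group2) / odds(t, *group1)


# Illustrative (a, b) pairs; these are assumptions for the example only.
group1 = (1.0, 1.0)
group2 = (2.0, 1.0)
for t in (0.5, 1.0, 2.0):
    print(t, hazard_ratio(t, group1, group2), odds_ratio(t, group1, group2))
```

In this example HR summarizes the whole follow-up with one number, whereas OR depends on the time at which the binary outcome is assessed.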

CHAPTER 2

HAZARD VERSUS ODDS RATIO

Hazard and odds ratios, via Cox proportional hazards and logistic regression respectively, represent two ends of the spectrum of methods for survival analysis. On one end is HR, which uses the time-to-event of each subject. On the other end is OR, which only requires one overview look at all the subjects at the end of the study.

The choice between HR and OR for the analysis of survival data was driven mainly by computational restrictions in the years before 1990. A direct comparison of LR to CPH can be found in et. al. [1989]. Similar comparisons can be found in Symons and Moore [2002], Walter [2000], Callas et al. [1998] and Peduzzi et al. [1987]. Their comparisons are based on using OR to approximate HR, judged by the closeness of the values. Their recommendations are to use OR as an alternative when the event is rare and the length of study is short. Computational limitations do not apply today, and the preference of experimenters for OR over HR is due to its simplicity and convenience.

2.1 Time-to-Event Data and Hazard Ratio

The go-to analysis method for time-to-event data is the use of HR in the CPH model. It was first proposed by Cox [1972] as a regression method for censored observations, and further work on estimation and asymptotic properties was presented by Cox [1975], Efron [1977] and Andersen and Gill [1982].

2.1.1 Model

The CPH model is specified using the hazard of an event occurring. Let $h_i(t)$ be the hazard of subject $i$ at time $t$. The two-sample model is

$$h_i(t) = h_0(t)e^{X_i\beta} = \begin{cases} h_0(t) & \text{if } X_i = 0 \\ h_0(t)e^{\beta} & \text{if } X_i = 1 \end{cases} \qquad (2.1)$$

where $h_0(t)$ is the baseline hazard at time $t$, $X_i$ is the group indicator, and $e^{\beta}$ is the treatment effect on the hazard. The treatment effect in CPH is measured as the hazard ratio, defined as

$$HR = \frac{h_0(t)e^{\beta}}{h_0(t)} = e^{\beta}. \qquad (2.2)$$

The model specifies a multiplicative effect of the covariates or treatment on the hazard of the event. Also, the baseline hazard, $h_0(t)$, cancels out when HR is computed. This implies that the hazards of two subjects are proportional, with ratio $e^{\beta}$, hence the name of the model.

2.1.2 Estimation

The estimation of the coefficient $\beta$ is via a partial likelihood (Cox [1975]). Let $h_i$ be the hazard corresponding to the ordered survival times. The partial likelihood, $PL(\beta)$, is

$$PL(\beta) = \prod_{i=1}^{n}\left(\frac{h_i}{\sum_{j \in R_i} h_j}\right)^{\delta_i} = \prod_{i=1}^{n}\left(\frac{h_0 e^{X_i\beta}}{\sum_{j \in R_i} h_0 e^{X_j\beta}}\right)^{\delta_i} = \prod_{i=1}^{n}\left(\frac{e^{X_i\beta}}{\sum_{j \in R_i} e^{X_j\beta}}\right)^{\delta_i} \qquad (2.3)$$

where
$\delta_i$ is the indicator of the event,
$X_i$ is the covariate of subject $i$ (here, the group indicator), and
$R_i$ denotes the risk set (i.e. the subjects at risk) when subject $i$ dies.

This method is referred to as semiparametric since $\beta$ can be estimated from the likelihood without specifying the baseline hazard, $h_0(t)$. The estimation of the coefficient is carried out in a similar fashion to maximum likelihood: $\hat\beta$ is the value of $\beta$ that maximizes (2.3) or, equivalently, the solution obtained by setting the derivative of the log partial likelihood to 0,

$$\frac{\partial}{\partial\beta}\log L(\beta) = \sum_{i=1}^{n}\delta_i\left[X_i - \frac{\sum_{j \in R_i} X_j e^{X_j\beta}}{\sum_{j \in R_i} e^{X_j\beta}}\right] = 0.$$

2.1.3 Asymptotic Normality

The asymptotic normality of the coefficient estimate is

$$\frac{\hat\beta - \beta}{I(\hat\beta)^{-1/2}} \rightarrow N(0, 1)$$

where $I(\hat\beta)$ is the Fisher information,

$$I(\hat\beta) = E\left[-\frac{\partial^2}{\partial\beta^2}\log L(\beta)\right].$$

In a two-sample problem, $\frac{1}{n}I(\hat\beta)$ is obtained from

$$-\frac{1}{n}\frac{\partial^2}{\partial\beta^2}\log L(\beta) = \frac{1}{n}\sum_{i=1}^{n}\delta_i\left[\frac{\sum_{j \in R_i} e^{X_j\beta}\sum_{j \in R_i} X_j^2 e^{X_j\beta} - \left(\sum_{j \in R_i} X_j e^{X_j\beta}\right)^2}{\left(\sum_{j \in R_i} e^{X_j\beta}\right)^2}\right] = \frac{1}{n}\sum_{i=1}^{n}\delta_i\frac{Y_{1i}Y_{2i}e^{\beta}}{(Y_{1i} + Y_{2i}e^{\beta})^2} \qquad (2.4)$$

so that

$$E\left[-\frac{1}{n}\frac{\partial^2}{\partial\beta^2}\log L(\beta)\right] = P(\delta = 1)\,E(V(X \mid T, \delta = 1)) \qquad (2.5)$$

where
$Y_{1i}$ and $Y_{2i}$ are the numbers at risk in groups 1 and 2 at the event time of subject $i$,
$P(\delta = 1)$ is the probability of the event over the study duration, and
$E(V(X \mid T, \delta = 1))$ is the expected variance of $X$ conditional on the observations $T$ and $\delta$.
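The estimation steps above can be sketched for the two-sample case in plain Python. This is a minimal illustration, not the software used for the simulations: it applies Newton-Raphson to the score equation, uses the two-sample information sum that appears in (2.4), and runs on the AML data of Table 1.1. The coding $X = 0$ for the maintained group and $X = 1$ for the non-maintained group, the absence of tied event times, and all function names are assumptions of this sketch.

```python
import math

# (time, event indicator delta, group indicator X).
# Assumed coding: X = 0 maintained (group 1), X = 1 non-maintained (group 2).
data = (
    [(t, 1, 0) for t in (9, 13, 18, 22)] + [(30, 0, 0)] * 5
    + [(t, 1, 1) for t in (4, 5, 7, 8, 12, 23, 27)] + [(30, 0, 1)] * 3
)


def risk_set_counts(data):
    """For each observed event return (X_i, Y_1i, Y_2i): the subject's group and
    the numbers still at risk in group 1 (X = 0) and group 2 (X = 1)."""
    out = []
    for t_i, delta_i, x_i in data:
        if delta_i == 1:
            y1 = sum(1 for t, _, x in data if t >= t_i and x == 0)
            y2 = sum(1 for t, _, x in data if t >= t_i and x == 1)
            out.append((x_i, y1, y2))
    return out


def score_and_information(beta, counts):
    """Score (derivative of the log partial likelihood) and observed information
    (negative second derivative) for a single 0/1 covariate."""
    score, info = 0.0, 0.0
    for x_i, y1, y2 in counts:
        w = y2 * math.exp(beta)
        score += x_i - w / (y1 + w)
        info += y1 * w / (y1 + w) ** 2
    return score, info


def fit_two_sample_cox(data, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations beta <- beta + score / information, started at 0."""
    counts = risk_set_counts(data)
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, counts)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    score, info = score_and_information(beta, counts)
    return beta, info, score


beta_hat, info_hat, score_hat = fit_two_sample_cox(data)
std_error = info_hat ** -0.5
print("beta_hat:", beta_hat, "HR:", math.exp(beta_hat), "SE:", std_error)
```

The information returned here is the sum in (2.4) without the $1/n$ factor, so its inverse square root serves directly as the estimated standard error of $\hat\beta$.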

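The standard error of $\hat\beta$ depends on the probability that the event occurs. As a consequence, the CPH becomes less efficient at shorter cutoffs. For example, more subjects will have had the event by the end of a 12-month study than by the end of a 6-month study, so the standard error at 12 months will be smaller than the standard error at 6 months. Specifically, holding $E(V(X \mid T, \delta = 1))$ fixed, a 12-month study is

\[
\frac{P(\text{event at 12 months})}{P(\text{event at 6 months})}
\]

times as efficient as a 6-month study.

$E(V(X \mid T, \delta = 1))$ can only be computed after $T$ and $\delta$ are observed. For simplicity, Jeong and Oakes [2007] used $E(V(X \mid T, \delta = 1)) = 0.25$ in comparing CPH to other parametric models for time-to-event data. Alternatively, we used a numerical approach to approximate the conditional variance using an idealized sample. Appendix A shows the details of the numerical approach for computing $P(\delta = 1)E(V(X \mid T, \delta = 1))$.

A more rigorous proof of the asymptotic normality of $\hat\beta$ was given by Andersen and Gill [1982] using martingale central limit theory. $\frac{1}{n}I(\hat\beta)$ was shown to converge in probability:

\[
\lim_{n\to\infty}\frac{1}{n}I(\hat\beta) = \Sigma(\hat\beta) = \int_0^{\tau}V(\beta, t)\,S^{(0)}(\beta, t)\,\lambda(t)\,dt
\]

where

\[
\begin{aligned}
S^{(0)}(\beta, t) &= \frac{1}{n}\sum_i Y_i(t)e^{X_i\beta} \\
S^{(1)}(\beta, t) &= \frac{1}{n}\sum_i X_i Y_i(t)e^{X_i\beta} \\
S^{(2)}(\beta, t) &= \frac{1}{n}\sum_i X_i^2 Y_i(t)e^{X_i\beta} \\
E(\beta, t) &= \frac{S^{(1)}(\beta, t)}{S^{(0)}(\beta, t)} \\
V(\beta, t) &= \frac{S^{(2)}(\beta, t)}{S^{(0)}(\beta, t)} - E(\beta, t)^2
\end{aligned}
\]

and $Y_i(t)$ indicates whether subject $i$ is at risk at time $t$.

2.1.4 Power for Testing $H_0: \beta = 0$

Recall that $HR = e^{\beta}$ under the CPH model. It follows that $\beta = \log(HR)$, and testing $\beta = 0$ is equivalent to testing $HR = 1$ since exponentiation is a monotonic transformation. The test for $\beta = 0$ using CPH rejects $H_0$ if

\[
\frac{|\hat\beta - 0|}{I(\hat\beta)^{-1/2}} > Z_{1-\alpha/2}
\tag{2.6}
\]

where
$\hat\beta$ is the coefficient estimate from CPH,
$I(\hat\beta)^{-1/2}$ is the inverse square root of the Fisher information, i.e. the standard error $\sigma(\hat\beta)$, and
$Z_{1-\alpha/2}$ is the critical value from a standard normal distribution, with $Z_{\alpha/2} = -Z_{1-\alpha/2}$.

The power function is

\[
\begin{aligned}
\pi(\hat\beta, \sigma(\hat\beta))
&= P\left(\frac{\hat\beta - 0}{\sigma(\hat\beta)} > Z_{1-\alpha/2}\right) + P\left(\frac{\hat\beta - 0}{\sigma(\hat\beta)} < Z_{\alpha/2}\right) \\
&= P\left(\frac{\hat\beta - \beta}{\sigma(\hat\beta)} > Z_{1-\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) + P\left(\frac{\hat\beta - \beta}{\sigma(\hat\beta)} < Z_{\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) \\
&\approx 1 - \Phi\left(Z_{1-\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) + \Phi\left(Z_{\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) \\
&= 1 - \Phi\left(Z_{1-\alpha/2} - \beta\,[n P(\delta = 1)E(V(X \mid T, \delta = 1))]^{1/2}\right) + \Phi\left(Z_{\alpha/2} - \beta\,[n P(\delta = 1)E(V(X \mid T, \delta = 1))]^{1/2}\right)
\end{aligned}
\tag{2.7}
\]

where the last line uses $\sigma(\hat\beta) = I(\hat\beta)^{-1/2}$ together with (2.5).

Figure 2.1 shows power curves for $\hat\beta$ at three different cutoff times: $F_1(C) = 0.30$, $0.50$ and $0.70$. The distribution of $T_1$ is Weibull($a_1 = 2$, $b_1 = 1$) and that of $T_2$ is Weibull($a_2 = 2$, $b_2$) where $b_2 \in \{1, 1.1, 1.2, \ldots, 2\}$. Increasing the scale parameter $b_2$ corresponds to an increasing effect size of HR. Figure 2.1 shows that power increases with increasing HR and cutoff time.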
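The power function (2.7) can be evaluated directly. The sketch below is illustrative only: it assumes the standard error is $[n\,P(\delta = 1)\,E(V)]^{-1/2}$ as implied by (2.5), uses the simplification $E(V) = 0.25$ attributed above to Jeong and Oakes, and treats `p_event` as a single overall event probability. The function names, the sample size of 200 and the hazard ratio of 1.5 are hypothetical choices, not values from the simulations.

```python
import math
from statistics import NormalDist

def wald_power(beta, se, alpha=0.05):
    """Two-sided asymptotic power of the Wald test of beta = 0,
    given the true beta and the standard error of its estimate."""
    z_upper = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    phi = NormalDist().cdf
    return 1.0 - phi(z_upper - beta / se) + phi(-z_upper - beta / se)

def cox_standard_error(n, p_event, cond_var=0.25):
    """Approximate se of beta_hat from (2.5): I = n * P(event) * E(V)."""
    return 1.0 / math.sqrt(n * p_event * cond_var)

# Hypothetical design: n = 200 subjects in total, true HR = 1.5
beta = math.log(1.5)
for p_event in (0.30, 0.50, 0.70):
    se = cox_standard_error(200, p_event)
    print(p_event, round(wald_power(beta, se), 3))
```

At $\beta = 0$ the function returns the significance level, and for fixed $\beta \neq 0$ the power grows with the event probability, which is the pattern described for the three cutoffs.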

Figure 2.1: Simulated Power of HR for Testing $\beta = 0$ for $\hat\beta$ from CPH where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times such that $F(C) = 0.30, 0.50, 0.70$

The power curves show the response of HR to the time-to-event distribution of $T_2$. The x-axis is the scale parameter of the Weibull distribution of $T_2$. A scale parameter of 1 denotes the null, while scale parameters greater than 1 denote the alternatives. The curves show that power increases with HR and cutoff time.

2.2 Binary Data and Odds Ratio

2.2.1 Model

Let $\delta_i$ be the indicator of the event and let

\[
P_i(\delta_i = 1) = P_i = P(T_i < C) = F_i(C) = \int_0^{C}f_i(t)\,dt
\tag{2.8}
\]

be the cumulative probability of an event from time 0 to $C$. Logistic regression (LR) uses a logit link function to link the binary response to the treatment groups. The logit link is

\[
\log(\text{odds}_i) = \log\left(\frac{P_i}{1 - P_i}\right) = \alpha + X_i\beta
\quad\text{or equivalently}\quad
P_i = \frac{e^{\alpha + X_i\beta}}{1 + e^{\alpha + X_i\beta}}
\tag{2.9}
\]

where
$e^{\alpha}$ is the baseline odds of the event,
$X_i$ is the group indicator variable, and
$e^{\beta}$ is the treatment effect on the odds of the event.

The method also assumes that each observation is a Bernoulli trial, taking the value 1 if the event occurs and 0 otherwise. The Bernoulli distribution is

\[
P_i^{\delta_i}(1 - P_i)^{1 - \delta_i}
\]

The odds ratio in a two-sample problem is

\[
OR = \frac{P_2/(1 - P_2)}{P_1/(1 - P_1)} = e^{\beta}
\tag{2.10}
\]

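2.2.2 Estimation

The coefficients are estimated by maximum likelihood (GLM). The likelihood and log-likelihood functions are

\[
\begin{aligned}
L(X_i, \alpha, \beta, \delta_i)
&= \prod_{i=1}^{n}P_i^{\delta_i}(1 - P_i)^{1 - \delta_i} \\
&= \prod_{i=1}^{n}\left(\frac{P_i}{1 - P_i}\right)^{\delta_i}(1 - P_i) \\
&= \frac{\prod_{i=1}^{n}\left(e^{\alpha + X_i\beta}\right)^{\delta_i}}{\prod_{i=1}^{n}\left(1 + e^{\alpha + X_i\beta}\right)}
\end{aligned}
\]

and

\[
\log L(X_i, \alpha, \beta, \delta_i) = \sum_{i=1}^{n}\delta_i(\alpha + X_i\beta) - \sum_{i=1}^{n}\log\left(1 + e^{\alpha + X_i\beta}\right)
\]

The estimates of the coefficients, $\hat\alpha$ and $\hat\beta$, are the values that maximize the likelihood or, equivalently, the solution to $\nabla\log L(X, \alpha, \beta) = 0$. The estimation is typically carried out numerically.

2.2.3 Asymptotic Normality

The coefficient estimate is asymptotically normal:

\[
\frac{\hat\beta - \beta}{I(\hat\beta)^{-1/2}} \xrightarrow{d} N(0, 1)
\]

where $I(\hat\beta)$ is the Fisher information evaluated at $\hat\beta$. The inverse $I^{-1}(\hat\beta)$ is the estimator of the variance of $\hat\beta$, and for a two-sample problem $I^{-1}(\hat\beta)$ is

\[
(n_1P_1(1 - P_1))^{-1} + (n_2P_2(1 - P_2))^{-1}
\]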

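The asymptotic variance of $\hat\beta$ is the $(2, 2)$th element of the inverse of the Fisher information matrix. It has been shown that $I^{-1}(\hat\alpha, \hat\beta)$ converges in probability to $\Sigma^{-1}$; see, for example, Agresti [2002].

\[
\begin{aligned}
I^{-1}(\hat\alpha, \hat\beta) \to \Sigma^{-1}
&= \left(X'\,\mathrm{diag}(F(C)S(C))\,X\right)^{-1} \\
&= \left(
\begin{bmatrix} 1_{n_1}' & 1_{n_2}' \\ 0_{n_1}' & 1_{n_2}' \end{bmatrix}
\begin{bmatrix} I_{n_1}P_1(1 - P_1) & 0 \\ 0 & I_{n_2}P_2(1 - P_2) \end{bmatrix}
\begin{bmatrix} 1_{n_1} & 0_{n_1} \\ 1_{n_2} & 1_{n_2} \end{bmatrix}
\right)^{-1} \\
&= \begin{bmatrix}
n_1P_1(1 - P_1) + n_2P_2(1 - P_2) & n_2P_2(1 - P_2) \\
n_2P_2(1 - P_2) & n_2P_2(1 - P_2)
\end{bmatrix}^{-1} \\
&= \begin{bmatrix}
(n_1P_1(1 - P_1))^{-1} & -(n_1P_1(1 - P_1))^{-1} \\
-(n_1P_1(1 - P_1))^{-1} & (n_1P_1(1 - P_1))^{-1} + (n_2P_2(1 - P_2))^{-1}
\end{bmatrix}
\end{aligned}
\]

Thus, $I^{-1}(\hat\beta) = (n_1P_1(1 - P_1))^{-1} + (n_2P_2(1 - P_2))^{-1}$.

2.2.4 Power for Testing $H_0: \beta = 0$

Recall that $\beta = \log(OR)$. The test for $OR = 1$ is equivalent to the test for $\beta = 0$ since exponentiation is a monotonic transformation. The asymptotic power, $\pi$, for testing $\beta = 0$ is

\[
\begin{aligned}
\pi(\hat\beta, \sigma(\hat\beta))
&= P\left(\frac{\hat\beta - 0}{\sigma(\hat\beta)} > Z_{1-\alpha/2}\right) + P\left(\frac{\hat\beta - 0}{\sigma(\hat\beta)} < Z_{\alpha/2}\right) \\
&= P\left(\frac{\hat\beta - \beta}{\sigma(\hat\beta)} > Z_{1-\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) + P\left(\frac{\hat\beta - \beta}{\sigma(\hat\beta)} < Z_{\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) \\
&\approx 1 - \Phi\left(Z_{1-\alpha/2} - \frac{\beta}{\left[(n_1P_1(1 - P_1))^{-1} + (n_2P_2(1 - P_2))^{-1}\right]^{1/2}}\right)
+ \Phi\left(Z_{\alpha/2} - \frac{\beta}{\left[(n_1P_1(1 - P_1))^{-1} + (n_2P_2(1 - P_2))^{-1}\right]^{1/2}}\right)
\end{aligned}
\]

Figure 2.2 shows three power curves for testing $\beta = 0$. Each curve corresponds to one of three cutoff times, $F_1(C) = 0.30$, $0.50$ and $0.70$. $T_1$ is a Weibull survival time with shape and scale parameters equal to 2 and 1, while $T_2$ is a Weibull survival time with shape parameter 2 and scale parameters from 1 to 2 in increments of 0.1. The figure shows that power increases as the difference between $T_1$ and $T_2$ increases.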
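The matrix inversion above is easy to check numerically. The following Python sketch builds the $2 \times 2$ information matrix $X'\,\mathrm{diag}(P_i(1 - P_i))\,X$ for a two-sample design and confirms that the $(2, 2)$ element of its inverse equals $(n_1P_1(1 - P_1))^{-1} + (n_2P_2(1 - P_2))^{-1}$. The function names and the input values are hypothetical and chosen only for illustration.

```python
def logistic_information(n1, p1, n2, p2):
    """X' W X for the design with an intercept and a group indicator,
    where W holds the Bernoulli variances P_i (1 - P_i)."""
    a = n1 * p1 * (1.0 - p1)  # contribution of group 1 (X = 0)
    b = n2 * p2 * (1.0 - p2)  # contribution of group 2 (X = 1)
    return [[a + b, b], [b, b]]

def invert_2x2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Hypothetical inputs: 100 subjects per group, P1 = 0.5, P2 = 0.6
n1, p1, n2, p2 = 100, 0.5, 100, 0.6
inverse = invert_2x2(logistic_information(n1, p1, n2, p2))
closed_form = 1.0 / (n1 * p1 * (1 - p1)) + 1.0 / (n2 * p2 * (1 - p2))

print("variance of beta_hat from the inverse:", inverse[1][1])
print("closed form:", closed_form)
```

The off-diagonal elements of the inverse are negative, which reflects the negative correlation between $\hat\alpha$ and $\hat\beta$ in this parametrization.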

Figure 2.2: Simulated Power of OR for Testing $\beta = 0$ for $\hat\beta$ where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times, $F(C) = 0.30, 0.50, 0.70$

2.3 Comparison

Under $H_0$, the two groups have the same survivorship. Thus $HR = OR = 1$ or, equivalently, both coefficients are equal to 0, and a direct comparison of standard errors is valid. Under the alternative hypothesis, however, the coefficient parameters are not equivalent.

2.4 Asymptotic Variance Under $H_0$

The relative performance of the effect sizes also depends on their asymptotic variances. Recall that the asymptotic variances of the estimators are

HR:
\[
\frac{1}{n\,P(\text{event})\,E[V(X \mid T)]}
\]

where $n = n_1 + n_2$, $P(\text{event}) = F(C)$ is the total chance of the event over the interval, and $E[V(X \mid T)]$ is the expected variance of $X$ conditional on the observations $T$, with a maximum value of 0.25.

LR:
\[
\frac{1}{n_1F_1(C)S_1(C)} + \frac{1}{n_2F_2(C)S_2(C)}
\]

Under the null hypothesis, HR and OR are equal to one. Equivalently, the coefficient parameters $\beta$ associated with each are zero since $\log(1) = 0$. Thus, a direct comparison of the asymptotic variances is valid. Taking $V(X \mid T) = 0.25$ and equal group sizes $n_1 = n_2 = n/2$, the ARE of the odds ratio against the hazard ratio under the null is

\[
ARE_{H_0}(HR, OR)
= \frac{\dfrac{1}{n\,P(\text{event})\,V(X \mid T)}}{2 \cdot \dfrac{2}{n\,P(\text{event})(1 - P(\text{event}))}}
= \frac{4/(n\,P(\text{event}))}{4/(n\,P(\text{event})(1 - P(\text{event})))}
= 1 - P(\text{event})
\tag{2.11}
\]

where $P(\text{event})$ denotes the combined probability of the event for the groups. The asymptotic variances of HR and OR have ratio $1 - P(\text{event})$. This implies that the asymptotic variance of OR is larger than that of HR by a factor of $1/(1 - P(\text{event}))$.

2.5 Power Rates Under the Alternative

A direct comparison of the asymptotic variances of the estimates is invalid under the alternative $S_1(t) \neq S_2(t)$, because the two coefficients are different parameters and do not coincide. Therefore, we will use the power of HR and OR to assess their relative performance.
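As a quick numerical check of (2.11), the sketch below evaluates both null variances and their ratio for a few event probabilities. It assumes equal group sizes and $E[V(X \mid T)] = 0.25$, as in the derivation of (2.11); the function names and the total sample size of 200 are hypothetical.

```python
def var_hr_null(n, p_event, cond_var=0.25):
    """Null asymptotic variance of the log hazard ratio: 1 / (n * P * E[V])."""
    return 1.0 / (n * p_event * cond_var)

def var_or_null(n, p_event):
    """Null asymptotic variance of the log odds ratio with n/2 per group."""
    per_group = (n / 2.0) * p_event * (1.0 - p_event)
    return 1.0 / per_group + 1.0 / per_group

def are_null(n, p_event):
    """Ratio of the HR variance to the OR variance under the null."""
    return var_hr_null(n, p_event) / var_or_null(n, p_event)

for p_event in (0.30, 0.50, 0.70):
    print(p_event, round(are_null(200, p_event), 4))
```

The ratio does not depend on $n$ and equals $1 - P(\text{event})$, so the loss from dichotomizing grows as more of the sample experiences the event.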

Figure 2.3: Simulated Power of HR and OR at $F_1(C) = 0.30$ where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1 \text{ to } 2)$

Figure 2.4: Simulated Power of HR and OR at $F_1(C) = 0.50$ where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1 \text{ to } 2)$

Figure 2.5: Simulated Power of HR and OR at $F_1(C) = 0.70$ where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1 \text{ to } 2)$

CHAPTER 3

SENSITIVITY TO LENGTH OF STUDY

In this chapter, we study the performance of HR and OR under different cutoff times. As the cutoff varies, the difference between $P_1$ and $P_2$ also changes. Simulations from the previous chapter show that the responses of HR and OR at three fixed cutoffs, $F_1(C) = 0.30$, $0.50$ and $0.70$, are similar. Is this the case for any cutoff? What if the survival distributions have non-proportional hazards?

To compare their performance, we simulate two-sample data with $n_i = 50$ and $100$ under different Weibull distributions. Four general cases were simulated.

Case 1
$T_1 \sim \text{Weibull}(a = 2, b = 1)$
$T_2 \sim \text{Weibull}(a = 2, b = 1.2)$
This is a case where the survival times have proportional hazards and a small effect size.

Case 2
$T_1 \sim \text{Weibull}(a = 2, b = 1)$
$T_2 \sim \text{Weibull}(a = 2, b = 1.5)$
This is a case where the survival times have proportional hazards and a moderate effect size.

Case 3
$T_1 \sim \text{Weibull}(a = 2, b = 1)$
$T_2 \sim \text{Weibull}(a = 3, b = 1.2)$
This is a case where the survival times have non-proportional hazards and a small effect size.

Case 4
$T_1 \sim \text{Weibull}(a = 2, b = 1)$
$T_2 \sim \text{Weibull}(a = 3, b = 1.5)$
This is a case where the survival times have non-proportional hazards and a moderate effect size.

Figure 3.1 shows the survival functions for each of the cases.
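A simulation of this kind can be sketched as follows. This is an illustrative Python outline rather than the code behind the reported figures. It assumes the Weibull parametrization $S(t) = \exp(-b\,t^{a})$, under which equal shapes give a hazard ratio of $b_2/b_1$; the document does not state its parametrization, so this is an assumption. The sketch estimates only the power of the OR Wald test, computed from the $2 \times 2$ event counts, and counts any replicate with an empty cell as a non-rejection. The function names, seed and number of replicates are hypothetical.

```python
import math
import random

def weibull_time(rng, a, b):
    """Draw T with S(t) = exp(-b * t**a) by inverse transform (assumed form)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return (-math.log(u) / b) ** (1.0 / a)

def cutoff_for(p, a, b):
    """Cutoff C such that F(C) = p when S(t) = exp(-b * t**a)."""
    return (-math.log(1.0 - p) / b) ** (1.0 / a)

def or_wald_rejects(e1, n1, e2, n2, z_crit=1.959964):
    """Two-sided Wald test of log(OR) = 0 from event counts.
    A table with an empty cell is counted as a non-rejection (a simplification)."""
    cells = (e1, n1 - e1, e2, n2 - e2)
    if min(cells) == 0:
        return False
    log_or = math.log(e2 * (n1 - e1) / (e1 * (n2 - e2)))
    se = math.sqrt(sum(1.0 / c for c in cells))
    return abs(log_or / se) > z_crit

def simulated_or_power(p1, a1, b1, a2, b2, n=100, reps=500, seed=1):
    """Share of replicates in which the OR test rejects, with the cutoff
    chosen so that F_1(C) = p1 and n subjects per group."""
    rng = random.Random(seed)
    c = cutoff_for(p1, a1, b1)
    rejections = 0
    for _ in range(reps):
        e1 = sum(weibull_time(rng, a1, b1) <= c for _ in range(n))
        e2 = sum(weibull_time(rng, a2, b2) <= c for _ in range(n))
        rejections += or_wald_rejects(e1, n, e2, n)
    return rejections / reps

# Case 2 of the list above (a = 2 in both groups, b = 1 versus b = 1.5)
for p1 in (0.30, 0.50, 0.70):
    print(p1, simulated_or_power(p1, 2, 1.0, 2, 1.5))
```

Only the event indicators at the cutoff enter the OR test, so the event times are simulated here simply to mirror the design; a Cox fit on the same replicates would additionally use the times and censoring at $C$.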

Figure 3.1: Survival Plots of Cases 1, 2, 3 and 4. Panels: (a) Case 1, (b) Case 2, (c) Case 3, (d) Case 4.

3.1 HR Estimate

Figure 3.2 shows the simulated power of HR under proportional hazards and a small effect size. Power increases with increasing cutoff time.

Figure 3.2: Simulated Power of HR under Case 1. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.2)$

Figure 3.3 shows the simulated power of HR under proportional hazards and a moderate effect size. Power increases with increasing cutoff time.

Figure 3.3: Simulated Power of HR under Case 2. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.5)$

Figure 3.4 shows the simulated power of HR under non-proportional hazards and a small effect size. Power initially increases with increasing cutoff time, reaches a maximum and then starts to decrease. The decrease begins at the cutoff for which $F_1(C) = 0.60$.

Figure 3.4: Simulated Power of HR under Case 3. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.3)$

Figure 3.5 shows the simulated power of HR under non-proportional hazards and a moderate effect size. Power increases with increasing cutoff time.

Figure 3.5: Simulated Power of HR under Case 4. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.5)$

3.2 OR Estimate

Figure 3.6 shows the simulated power of OR under proportional hazards and a small effect size. Power increases with increasing cutoff time up to a certain value of $F_1(C)$ and decreases afterwards.

Figure 3.6: Simulated Power of OR under Case 1. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.2)$

Figure 3.7 shows the simulated power of OR under proportional hazards and a moderate effect size. Power increases with increasing cutoff time.

Figure 3.7: Simulated Power of OR under Case 2. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.5)$

Figure 3.8 shows the simulated power of OR under non-proportional hazards and a small effect size. Power initially increases with increasing cutoff time, reaches a maximum and then starts to decrease. The decrease begins at the cutoff for which $F_1(C) = 0.60$.

Figure 3.8: Simulated Power of OR under Case 3. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.3)$

Figure 3.9 shows the simulated power of OR under non-proportional hazards and a moderate effect size. Power increases with increasing cutoff time but starts to decrease at $F_1(C) = 0.70$.

Figure 3.9: Simulated Power of OR under Case 4. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.5)$

3.3 HR versus OR

Figure 3.10 compares the performance of HR and OR at increasing cutoffs. Their performance is on par up to $F_1(C) = 0.70$. Afterwards HR beats OR, and the gap becomes more apparent as $F_1(C)$ increases beyond 0.70.

Figure 3.10: Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.2)$

Figure 3.11 compares the performance of HR and OR at increasing cutoffs. Their performance is on par up to $F_1(C) = 0.70$. Afterwards HR beats OR, and the gap becomes more apparent as $F_1(C)$ increases beyond 0.70.

Figure 3.11: Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.5)$

Figure 3.12 shows that the power of HR and OR seems to be equal in studies with a short cutoff time. After $P_1(C) = 0.30$, the power curves start to separate. The difference in power is small from $P_1(C) = 0.30$ up to $0.60$ and becomes apparent for $P_1(C) > 0.60$, favoring HR.

Figure 3.12: Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.3)$

Figure 3.13 shows that the power of HR and OR seems to remain equal up to a longer cutoff time: HR and OR perform equally up to $P_1(C) = 0.40$. HR has slightly better power from $P_1(C) = 0.40$ up to $0.70$, and a clear separation favoring HR is observed afterwards.

Figure 3.13: Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.5)$

3.4 Discussion

In comparing HR with OR, cutoff time seems to be the key factor that determines their relative performance. Under the proportional hazards assumption, the performance of HR increases with increasing cutoffs. Under non-proportional hazards, the performance of HR does not necessarily increase with the cutoff.

Another observation is that their performance is determined by the maximum separation. This is seen in the performance of OR: its power increases and seems to peak at the point where the maximum difference occurs. After the maximum, its performance deteriorates.

As the cutoff time increases up to the maximum difference, the performance of OR and HR is almost equal. When the maximum separation occurs earlier than the cutoff and the hazards are non-proportional, the performance of both HR and OR deteriorates. Moreover, the performance of OR seems to deteriorate more quickly than that of HR.

CHAPTER 4

METHOD SELECTION DIAGNOSTICS

In this chapter, we propose method selection diagnostics for choosing between HR and OR. The objective is to help experimenters validate their choice of method for the analysis. Two major factors are considered in the diagnostics: (1) cutoff time and (2) maximum separation.

4.1 Cutoff Time C

Cutoff time, $C$, has been shown to be one of the major factors affecting the performance of HR and OR. For HR, the cutoff time $C$ has a direct effect on the standard error of $\hat\beta$. The information, $P(\delta = 1)E(V(X \mid T, \delta = 1))$, increases with cutoff time, so the standard error decreases. Hence, longer study times have, in general, a positive effect on the performance of HR.

For OR, the cutoff time $C$ affects both $\hat\beta$ and the standard error of $\hat\beta$. Recall that

\[
OR = \frac{P_2/(1 - P_2)}{P_1/(1 - P_1)} = e^{\beta}
\]

where $P_i = P(T_i \le C)$. The estimated OR (and thus $\hat\beta$) changes with the cutoff time. Its maximum performance with respect to cutoff time is somewhere in the middle, where $P_1(C)$ is bounded away from 0 and 1 (see Chapter 3).

Therefore, one of the underlying aspects of the diagnostic is to know the cutoff time and evaluate whether it corresponds to a short, moderate or long study length. To measure the length of follow-up, we take $P_1(C)$ to be the lower of the two groups' event probabilities. We define the length of study as

short when $P_1(C) \le 0.30$,
moderate when $0.30 < P_1(C) \le 0.70$, and
long when $P_1(C) > 0.70$.

For example, suppose $P_{\text{treatment}} = 0.45$ and $P_{\text{control}} = 0.75$. Since the lower of the two is $P_{\text{treatment}} = 0.45$, we set $P_1(C) = 0.45$.

4.2 Maximum Separation

A second factor in the relative performance of HR versus OR is the maximum separation. Suppose that $T_1$ and $T_2$ are the times-to-event of the two groups. The difference in the proportion of events between the two groups changes over time. Maximum separation is achieved when the difference in the proportions is largest. Early and late separation refer to where the maximum difference, $P_2(t) - P_1(t)$, occurs.

Survival distributions with late separation include all proportional hazards distributions but may also include non-proportional hazards distributions. Recall that under the CPH model,

\[
h_2(t) = h_1(t)e^{\beta}
\tag{4.1}
\]

Since $S(t) = e^{-H(t)}$, where $H(t) = \int_0^t h(u)\,du$ is the cumulative hazard, this implies

\[
S_2(t) = S_1(t)^{e^{\beta}}
\tag{4.2}
\]

Equation 4.2 is often referred to as the Lehmann alternative, which is an alternative representation of the CPH model. It can be shown that under the Lehmann alternative, the maximum difference between the survival functions $S_1(t)$ and $S_2(t)$ occurs at $t$ such that $S_1(t) < 0.4$ (see Martinez and Naranjo [2010]).
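The location of the maximum separation under (4.2) can be worked out in closed form. Writing $\theta = e^{\beta}$ and $s = S_1(t)$, the gap $|s^{\theta} - s|$ is maximized where $\theta s^{\theta - 1} = 1$, that is at $s = \theta^{1/(1 - \theta)}$. The Python sketch below evaluates this expression and checks it against a grid search. It takes $\theta < 1$, meaning that group 1 has the higher hazard; that reading is an assumption made here because it is the one under which the bound is stated for $S_1$. The function names are hypothetical.

```python
def s1_at_max_separation(theta):
    """Value of S1 at which |S1**theta - S1| is largest (theta > 0, theta != 1).
    Setting theta * s**(theta - 1) = 1 gives s = theta**(1 / (1 - theta))."""
    return theta ** (1.0 / (1.0 - theta))

def grid_argmax(theta, steps=100000):
    """Brute-force check: the grid value of s in (0, 1) with the largest gap."""
    best_s, best_gap = 0.0, -1.0
    for k in range(1, steps):
        s = k / steps
        gap = abs(s ** theta - s)
        if gap > best_gap:
            best_s, best_gap = s, gap
    return best_s

# Assumed case: theta = e^beta < 1, so S2(t) = S1(t)**theta lies above S1(t)
for theta in (0.2, 0.5, 0.8):
    print(theta, round(s1_at_max_separation(theta), 4), round(grid_argmax(theta), 4))
```

As $\theta$ approaches 1 from below the maximizing value approaches $e^{-1} \approx 0.368$, so for every $\theta < 1$ the maximum separation falls where $S_1(t)$ is below 0.4.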

Figure 4.1: Example of Maximum Separation between Two Weibulls

By extension to late treatment differences, the maximum difference between $P_1$ and $P_2$ occurs at $t$ such that $P_1(t) > 0.6$. Thus, by using some information on $S(t)$ or $P(t)$, a test can be devised to detect an early or late effect.

4.2.1 Q-Test

The Q-test (Martinez and Naranjo, 2010) uses $\hat S_i$ to detect early or late treatment differences. The test statistic is

\[
Q = [\hat S_2(t_{0.6}) - \hat S_1(t_{0.6})] - [\hat S_2(t_{0.2}) - \hat S_1(t_{0.2})]
\tag{4.3}
\]

where
$\hat S_i(t)$ is the survival estimate at time $t$ for group $i$,
$t_{0.6}$ is the time such that $\hat S_1(t) = 0.60$, and
$t_{0.2}$ is the time such that $\hat S_1(t) = 0.20$.

The test concludes a late effect if $Q < 0$ and an early effect otherwise. The Q-test, however, requires $\hat S_1(t_{0.2})$, which is not available if a study ends when $S_1(C) > 0.20$. For example, suppose a study is designed such that $S_1(C) = 0.40$ and $S_1(C) > S_2(C)$. In this scenario, the Q-test cannot be computed.

The choice of 0.6 and 0.2 in the Q-test is arbitrary as long as the maximum difference at 0.4 is captured between them and the chosen quantiles are bounded away from 0 and 1. Hence, the test can be modified to accommodate the restrictions of our survival experiment setup. Instead of 0.2, for example, any value of 0.4 or lower can be chosen, and instead of 0.6, any value higher than 0.4 can be chosen. Therefore, pairs such as (0.75, 0.25) or (0.6, 0.3) can be used to detect an early or late effect.

4.2.2 $Q_m$, Modified Q-Test

We propose a modified Q-test to discriminate whether the maximum separation occurred before or after the cutoff. Let

\[
Q_m = [\hat S_2(t_{P_1/2}) - \hat S_1(t_{P_1/2})] - [\hat S_2(t_{P_1}) - \hat S_1(t_{P_1})]
\tag{4.4}
\]

where
$\hat S_i(t)$ is the survival estimate at time $t$ for group $i$,
$t_{P_1}$ is the maximum time observed for group 1, and
$t_{P_1/2}$ is the time such that $P_1(t_{P_1/2}) = P_1(C)/2$.

A value $Q_m < 0$ indicates that the difference between $S_1$ and $S_2$ is greater at time $C$ than before $C$. To test whether the maximum separation occurs before the cutoff or not:

if $Q_m \ge 0$, then the maximum separation is before the cutoff time;
if $Q_m < 0$, then the maximum separation is after the cutoff time.

A simulation study is conducted to test the power of the $Q_m$ test in detecting whether the maximum separation occurs after the cutoff. Table 4.1 shows the results of the simulation for 2 cases with cutoffs from $F_1(C) = 0.20$ up to $F_1(C) = 0.80$.

1. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.3)$
2. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.5)$

Table 4.1: Power of $Q_m$ for Detecting Maximum after C

    F_1(C)   Early Separation   Late Separation
    0.20     0.9431             0.8060
    0.30     0.9366             0.8630
    0.40     0.9160             0.8927
    0.50     0.8602             0.9049
    0.60     0.7499             0.9083
    0.70     0.5420             0.8966
    0.80     0.2825             0.8697

The simulation shows that $Q_m$ detects a maximum separation after the cutoff more often when there is a late separation than when there is an early separation.

4.3 Selection Diagnostics

4.3.1 $\hat P_1$ and Q-Test

In this section, we prepare a diagnostic that helps the experimenter choose between HR and OR by answering the following questions:

Does the study have a short, medium or long study length?
Does the maximum separation occur early or late?

Figure 4.2 summarizes our proposed diagnostic, which (i) uses $\hat P_1$ to measure the length of the study and (ii) uses the Q-test to determine an early or late effect.

Figure 4.2: Diagnostic Using $\hat P_1$ and Q

The x-axis indicates when the maximum separation occurred, and the Q-test can be used to answer this question formally. The y-axis indicates the length of the study: $\hat F_1 \le 0.30$ denotes a short cutoff time, $0.30 < \hat F_1 \le 0.70$ denotes a medium-length cutoff time, and $\hat F_1 > 0.70$ denotes a long cutoff time. Figure 4.2 maps out which case the experiment falls into.

1. Late effect and short study ($F_1(C) \le 0.30$). Under this scenario, HR and OR perform equally well.
2. Late effect and moderate study ($0.30 < F_1(C) \le 0.70$). Under this scenario, HR and OR perform equally well.
3. Late effect and long study ($F_1(C) > 0.70$). Under this scenario, HR outperforms OR.
4. Early effect and short study ($F_1(C) \le 0.30$). Under this scenario, HR and OR perform equally well.
5. Early effect and moderate study ($0.30 < F_1(C) \le 0.70$). Under this scenario, HR moderately outperforms OR.
6. Early effect and long study ($F_1(C) > 0.70$). Under this scenario, HR outperforms OR.
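The six scenarios listed above amount to a small lookup table. The Python sketch below encodes them, using the study-length thresholds of Section 4.1 and the sign rule of the Q-test (late effect if $Q < 0$, early otherwise). The function and variable names are hypothetical, and the sketch only restates the scheme; it does not compute $Q$ or $\hat F_1$ from data.

```python
def study_length(p1):
    """Classify the study length from the lower event probability P_1(C)."""
    if p1 <= 0.30:
        return "short"
    if p1 <= 0.70:
        return "moderate"
    return "long"

# The six scenarios of the diagnostic, keyed by (effect timing, study length)
RECOMMENDATION = {
    ("late", "short"): "HR and OR perform equally well",
    ("late", "moderate"): "HR and OR perform equally well",
    ("late", "long"): "HR outperforms OR",
    ("early", "short"): "HR and OR perform equally well",
    ("early", "moderate"): "HR moderately outperforms OR",
    ("early", "long"): "HR outperforms OR",
}

def diagnose(q_statistic, p1):
    """Map a Q statistic and P_1(C) to the scenario described in the text."""
    effect = "late" if q_statistic < 0 else "early"
    length = study_length(p1)
    return effect, length, RECOMMENDATION[(effect, length)]

# Illustrative inputs: a positive Q with a moderate study length
print(diagnose(0.033, 0.444))
```

Keeping the thresholds in one function makes it straightforward to see how a borderline value such as $P_1(C) = 0.30$ or $0.70$ is classified.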

4.3.2 $\hat P_1$ and $Q_m$-Test

In this subsection, the experimenter evaluates the relative performance of HR and OR by answering the following questions:

Does the study have a short, medium or long study length?
Did the maximum separation occur earlier than the cutoff?

Figure 4.3 summarizes the diagnostics using $\hat P_1$ and $Q_m$.

Figure 4.3: Diagnostic Using $\hat P_1$ and $Q_m$

Although this scheme is similar to the first selection scheme, the main difference lies in how it determines when the maximum separation occurs. Here, we instead ask the question "Does the maximum occur before or after the end of the study?". Using the scheme, the experimenter can identify which case the experiment falls into. Ideally, an optimal experiment is designed so that the cutoff is around the maximum. If the maximum occurs before the cutoff $C$, then HR beats OR; if the maximum occurs after $C$, then HR and OR are almost equivalent.

4.3.3 Example

We illustrate the use of the scheme on the AML data. Figure 4.4 plots the estimated survival curves of the data, and the estimates are shown below. The failure estimates are computed by taking $F = 1 - S$.

    x=maintained
    time  n.risk  n.event  survival  failure
       9       9        1     0.889    0.111
      13       8        1     0.778    0.222
      18       7        1     0.667    0.333
      23       6        1     0.556    0.444

    x=nonmaintained
    time  n.risk  n.event  survival  failure
       5      10        2     0.8      0.2
       8       8        2     0.6      0.4
      12       6        1     0.5      0.5
      23       5        1     0.4      0.6
      27       4        1     0.3      0.7

Figure 4.4: Survival Plot of the AML data

By the end of the 30-week period, 0.444 and 0.7 of the patients have had a relapse in the maintained and non-maintained groups, respectively. Since the lower of the two is 0.444, we set $F_1(30) = 0.444$, which corresponds to a medium cutoff. If we use the Q-test with (0.5, 0.3) as points of comparison, then

\[
Q = [0.889 - 0.6] - [0.556 - 0.3] = 0.033
\]

If we use the $Q_m$ test, then

\[
Q_m = [(1 - (1 - 0.556)/2) - 0.5] - [0.556 - 0.3] = [0.778 - 0.5] - [0.556 - 0.3] = 0.022
\]

The Q-test concludes that there is an early effect, while the $Q_m$-test concludes that the maximum occurred before week 30. Since the study has a moderate length and an early effect, or a maximum occurring before the cutoff, the performance of OR is slightly worse than that of HR.
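The two statistics in this example can be reproduced from the tabulated survival estimates. The Python sketch below treats each survival curve as a step function and follows the sign convention used in the worked example, that is the maintained curve minus the non-maintained curve. The early comparison time of week 9 for Q is inferred from the values 0.889 and 0.6 appearing in the example and is an assumption, as are all function names.

```python
# Survival estimates (time, S(t)) copied from the AML tables in the text
MAINTAINED = [(9, 0.889), (13, 0.778), (18, 0.667), (23, 0.556)]
NONMAINTAINED = [(5, 0.8), (8, 0.6), (12, 0.5), (23, 0.4), (27, 0.3)]
CUTOFF = 30

def surv_at(steps, t):
    """Right-continuous step-function lookup of the survival estimate at t."""
    s = 1.0
    for time, surv in steps:
        if time <= t:
            s = surv
    return s

def first_time_reaching(steps, target, eps=1e-9):
    """First tabulated time at which the survival estimate is <= target."""
    for time, surv in steps:
        if surv <= target + eps:
            return time
    return None

def separation_stat(upper, lower, t_early, t_late):
    """[gap at t_early] - [gap at t_late], with gap = S_upper - S_lower."""
    early = surv_at(upper, t_early) - surv_at(lower, t_early)
    late = surv_at(upper, t_late) - surv_at(lower, t_late)
    return early - late

# Q with the early point taken at week 9 (inferred from the worked example)
q = separation_stat(MAINTAINED, NONMAINTAINED, 9, CUTOFF)

# Q_m: early point where the maintained failure estimate reaches P_1(C) / 2
p1_c = 1.0 - surv_at(MAINTAINED, CUTOFF)
t_half = first_time_reaching(MAINTAINED, 1.0 - p1_c / 2.0)
q_m = separation_stat(MAINTAINED, NONMAINTAINED, t_half, CUTOFF)

print("Q =", round(q, 3), " t_half =", t_half, " Q_m =", round(q_m, 3))
```

Both statistics come out positive, which is the basis for the early-effect and maximum-before-cutoff conclusions drawn in the example.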

The output below shows the results of using CPH and logistic regression to estimate HR and OR, respectively. As expected, the p-value for HR is smaller than that for OR, although neither is significant at the 0.05 level of significance.

    coxph(formula = Surv(time, status) ~ x, data = aml3)

                     coef  exp(coef)  se(coef)     z     p
    xnonmaintained  0.779       2.18     0.629  1.24  0.22

    glm(formula = status ~ x, family = binomial, data = aml3)

    Coefficients:
                    Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)      -0.2231      0.6708   -0.333     0.739
    xnonmaintained    1.0704      0.9624    1.112     0.266
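For a single binary covariate, the logistic regression estimates have a closed form in the $2 \times 2$ table of events, so the `glm` figures above can be cross-checked by hand. The Python sketch below does this using event counts read off the `n.risk` and `n.event` columns of the AML tables (4 events among 9 maintained subjects and 7 among 10 non-maintained subjects); reading the group sizes from the first `n.risk` entry is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_by_two_logit(e1, n1, e2, n2):
    """Closed-form logistic estimates for a binary covariate.
    Returns intercept, its se, slope (log odds ratio), its se, z and p-value."""
    intercept = math.log(e1 / (n1 - e1))
    se_intercept = math.sqrt(1.0 / e1 + 1.0 / (n1 - e1))
    slope = math.log(e2 / (n2 - e2)) - intercept
    se_slope = math.sqrt(1.0 / e1 + 1.0 / (n1 - e1) + 1.0 / e2 + 1.0 / (n2 - e2))
    z = slope / se_slope
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return intercept, se_intercept, slope, se_slope, z, p_value

# Counts by week 30: maintained 4 of 9, non-maintained 7 of 10
intercept, se_intercept, slope, se_slope, z, p_value = two_by_two_logit(4, 9, 7, 10)

print("intercept =", round(intercept, 4), " se =", round(se_intercept, 4))
print("slope =", round(slope, 4), " se =", round(se_slope, 4))
print("z =", round(z, 3), " p =", round(p_value, 3))
```

The values agree with the `glm` coefficients, standard errors, z value and p-value shown above to the printed precision, which also confirms that the logistic analysis uses only the event indicators at the cutoff and not the event times.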