Comparison of Hazard, Odds and Risk Ratio in the Two-Sample Survival Problem


Western Michigan University
ScholarWorks at WMU
Dissertations, Graduate College

Comparison of Hazard, Odds and Risk Ratio in the Two-Sample Survival Problem
Benedict P. Dormitorio, Western Michigan University

Part of the Probability Commons, Statistical Methodology Commons, and the Statistical Models Commons

Recommended Citation: Dormitorio, Benedict P., "Comparison of Hazard, Odds and Risk Ratio in the Two-Sample Survival Problem" (2014). Dissertations.

This Dissertation-Open Access is brought to you for free and open access by the Graduate College at ScholarWorks at WMU. It has been accepted for inclusion in Dissertations by an authorized administrator of ScholarWorks at WMU.

COMPARISON OF HAZARD, ODDS AND RISK RATIO IN THE TWO-SAMPLE SURVIVAL PROBLEM

by Benedict P. Dormitorio

A dissertation submitted to the Graduate College in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Statistics, Western Michigan University, August 2014

Doctoral Committee: Joshua Naranjo, Ph.D., Chair; Rajib Paul, Ph.D.; Jung Chao Wang, Ph.D.; Mark Schauer, MD

Benedict P. Dormitorio, Ph.D.
Western Michigan University, 2014

The Cox proportional hazards model is the standard method for analyzing treatment efficacy when time-to-event data are available. In the absence of time-to-event data, investigators may use logistic regression, which requires only relative frequencies of events, or Poisson regression, which requires only interval-summarized frequency tables of time-to-event. When event frequencies are used instead of time-to-event data, does it always result in a loss of power? We investigate the relative performance of the three methods. In particular, we compare the power of tests based on the respective effect-size estimates: (1) hazard ratio (HR), (2) odds ratio (OR), and (3) risk ratio (RR). We use a variety of survival distributions and cutoff points representing the length of study. We show that the relative performance of OR against HR depends on whether the two survival curves separate early or late, and that OR and HR perform better than RR. We propose diagnostics based on the maximum separation to help investigators choose between OR and HR.

© 2014 Benedict P. Dormitorio

ACKNOWLEDGEMENTS

I would like to thank everyone who has contributed so much to my education. To my advisor, Dr. Naranjo, for his guidance and constant encouragement: thank you. To my committee members, Dr. Paul, Dr. Wang and Dr. Schauer, thank you for the review and recommendations. To those I have worked with over the years, Dr. Toledo and the rest of WMED, Dr. Paul, Dr. Kothari and Dr. Curtis at HDREAM, Dr. Tersptra at the consulting lab and colleagues at MPI Research: thank you. To my family and friends who have kept me sane and grounded, thank you. Lastly, to God: thank you.

Benedict P. Dormitorio

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
    Background and Motivation
    Basic Definition
        Two-Sample Survival Problem
        Data Types
        Survival Distributions
        Measures of Risk

2. HAZARD VERSUS ODDS RATIO
    Time-to-Event Data and Hazard Ratio
        Model
        Estimation
        Asymptotic Normality
        Power for Testing $H_0: \beta = 0$
    Binary Data and Odds Ratio
        Model
        Estimation
        Asymptotic Normality
        Power for Testing $H_0: \beta = 0$
    Comparison
    Asymptotic Variance Under $H_0$
    Power Rates Under the Alternative

3. SENSITIVITY TO LENGTH OF STUDY
    HR Estimate
    OR Estimate
    HR versus OR
    Discussion

4. METHOD SELECTION DIAGNOSTICS
    Cutoff Time C
    Maximum Separation
    Q-Test
    $Q_m$, Modified Q-Test
    Selection Diagnostics
    $\hat{P}_1$ and Q-Test
    $\hat{P}_1$ and $Q_m$-Test
    Example

5. POISSON RISK RATIO
    Grouped Data and Poisson Risk Ratio
        Model
        Estimation
        Asymptotic Normality
        Power
    Sensitivity to Increasing Periods
        Analysis at Cutoff
        Analysis at Each Period
    RR versus HR and OR

6. CONCLUSION

APPENDICES
    A. Approximation of $P(\delta = 1)\,E\,V(X \mid T, \delta)$
    B. Comparison of Theoretical and Simulated Power

REFERENCES

LIST OF TABLES

1.1 Time-to-Event Data for the AML Trial
1.2 Binary Data for the AML Trial
Power of $Q_m$ for Detecting Maximum after C
Grouped Data for AML Summarized Every 10 Weeks
Simulated Power with Weibull Survival Times under Proportional Hazards. $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.25)$. RR Estimates are Based on Grouped Data on 1, 2, 3, 4 and $m_{1t} + m_{2t}$ Intervals (left to right) and Increasing Study Length (top to bottom).
Simulated Power with Weibull Survival Times under Proportional Hazards. $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.5)$. RR Estimates are Based on Grouped Data on 1, 2, 3, 4 and $m_{1t} + m_{2t}$ Intervals (left to right) and Increasing Study Length (top to bottom).
Simulated Power with Weibull Survival Times under Proportional Hazards. $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$. RR Estimates are Based on Grouped Data on 1, 2, 3, 4 and $m_{1t} + m_{2t}$ Intervals (left to right) and Increasing Study Length (top to bottom).
Simulated Power with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.2)$
Simulated Power with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.5)$
Simulated Power with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$

LIST OF FIGURES

1.1 AML Trial with 2 Groups, Maintained Group (Group 1, Straight Lines) and Nonmaintained Group (Group 2, Dashed Lines), and a Cutoff Time of 30 Weeks
1.2 Survival Probability, $S(t)$, of Four Different Weibull Times: $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$, $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$, $T_3 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_4 \sim \mathrm{Weibull}(a = 2, b = 2)$
2.1 Simulated Power of HR for Testing $\beta = 0$ for $\hat\beta$ from CPH where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times such that $F(C) = 0.30, 0.50, 0.70$
2.2 Simulated Power of OR for Testing $\beta = 0$ for $\hat\beta$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times, $F(C) = 0.30, 0.50, 0.70$
2.3 Simulated Power of HR and OR at $F_1(C) = 0.30$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$
2.4 Simulated Power of HR and OR at $F_1(C) = 0.50$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$
2.5 Simulated Power of HR and OR at $F_1(C) = 0.70$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$
3.1 Survival Plots of Case 1, 2, 3 and 4
3.2 Simulated Power of HR under Case 1. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.2)$
3.3 Simulated Power of HR under Case 2. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.5)$
3.4 Simulated Power of HR under Case 3. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.3)$
3.5 Simulated Power of HR under Case 4. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.5)$
3.6 Simulated Power of OR under Case 1. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.2)$
3.7 Simulated Power of OR under Case 2. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.5)$
3.8 Simulated Power of OR under Case 3. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.3)$
Simulated Power of OR under Case 4. $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.5)$
Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.2)$
Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1.5)$
Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.3)$
Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 3, b = 1.5)$
Example of Maximum Separation between Two Weibulls
Diagnostic Using $\hat{P}_1$ and $Q$
Diagnostic Using $\hat{P}_1$ and $Q_m$
Survival Plot of the AML Data
Simulated Power of the Test of RR for $H_0: \beta = 0$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times such that $F(C) = 0.30, 0.50, 0.70$
Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.25)$ and Increasing Cutoffs
5.3 Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.5)$ and Increasing Cutoffs
Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$ and Increasing Cutoffs
Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$ and Increasing Periods
Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 1.5)$ and Increasing Periods
Simulated Power of RR against HR and OR with $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$ and Increasing Periods
A.1 Illustration of $V(X \mid T, \delta)$
B.1 Theoretical versus Simulated Power of HR
B.2 Theoretical versus Simulated Power of OR
B.3 Theoretical versus Simulated Power of RR

CHAPTER 1

INTRODUCTION

1.1 Background and Motivation

In survival experiments, the primary goal is to investigate the effect of a treatment on the risk of an event. For example, a researcher may be interested in the efficacy of a new drug in reducing the risk of an unwanted event, such as death or complications, or in increasing the chance of a wanted event, such as recovery from illness or surgery.

The most widely used data type and analysis method for survival experiments are time-to-event data and the Cox proportional hazards (CPH) model (Cox 1972, 1975), respectively. In this setup, subjects are assigned to one of the groups and followed until the event occurs or the study ends. The time until the event is recorded for each subject. The hazard ratio (HR) is then estimated using CPH to evaluate the relative risk among the groups.

An alternative approach is to collect only dichotomous, or binary, data and use logistic regression (LR) for the analysis. Here, subjects are assigned to one of the groups and evaluated at the end of the study. The evaluation considers only whether the event occurred. The odds ratio (OR) is then estimated using LR to assess the relative risk among the groups.

When the data collected vary, the statistical methods that can be used to estimate and quantify the risk also vary. When summarized or binary data are used instead of time-to-event data in a survival experiment, does it always result in a loss of power? If so, how much power is lost? If not, under what situations?

The literature comparing statistical methods for survival analysis is extensive. For example, the optimality of the log-rank and Wilcoxon tests for testing differences in survival or risk of an event is well studied. Various testing schemes, such as those of Martinez and Naranjo [2010] and Darilay and Naranjo [2011], were proposed to take advantage of the known optimal scenarios for the log-rank and Wilcoxon tests. However, research on the various data types and analysis methods for survival analysis is limited. The use of Poisson regression (PR) for grouped data was proposed by Prentice and Gloeckler [1978], Holford [1980], Frome [1983] and Laird and Olivier [1981] as an alternative to CPH, mainly because of computational limitations. PR and LR as approximations to CPH were also studied by Callas et al. [1998], et al. [1989], Walter [2000] and Symons and Moore [2002]. On the other hand, the optimal scenarios for the performance of CPH, PR and LR as a consequence of data collection have not been explored.

In this research, we investigate the performance of the Cox proportional hazards model, logistic regression and Poisson regression as a consequence of the data type collected, using their statistical power. Specifically, we explore their sensitivity to the length of study and the sensitivity of PR to an increasing number of summary intervals under different distributions. Lastly, we propose diagnostics and recommendations on when to use the three methods.

1.2 Basic Definition

1.2.1 Two-Sample Survival Problem

Two-sample survival experiments are conducted to investigate the relative risk of an event between two groups. We denote the two groups as group 1 and group 2 in a general setting. A total of $n = n_1 + n_2$ subjects are assigned, $n_1$ to group 1 and $n_2$ to group 2, and observed until the end of the study period, called the cutoff time.

As an example, we adapt a data set from Miller [1981]. The efficacy of maintained chemotherapy for patients with acute myelogenous leukemia (AML) was studied in a clinical trial. The patients underwent initial chemotherapy until a state of remission was attained. They were then assigned to either a group with maintained chemotherapy or a group with no maintained chemotherapy. The goal is to assess whether maintained chemotherapy reduces the risk of relapse.
We consider the data with a follow-up period of only 30 weeks. Figure 1.1 illustrates the AML trial. The x-axis is the time in weeks and the y-axis is the subject number. The straight and dashed lines denote the two groups, maintained and non-maintained, and the length of each line shows the time to relapse. For example, 4 subjects in the maintained group had a relapse in less than 30 weeks, compared with 7 in the non-maintained group. Subject 1, from the maintained chemotherapy group, had a relapse after 9 weeks, and subject 5, from the non-maintained chemotherapy group, after 4 weeks.

Figure 1.1: AML Trial with 2 Groups, Maintained Group (Group 1, Straight Lines) and Nonmaintained Group (Group 2, Dashed Lines), and a Cutoff Time of 30 Weeks

1.2.2 Data Types

Data collected from survival experiments can vary and depend on available resources. A well-funded experiment, for example, may allow for a thorough follow-up of each subject. In contrast, a study with limited funding faces constraints and may allow only limited follow-ups. Data availability can also restrict data collection, as in epidemiological or retrospective studies where the available data are already summarized.

Time-to-Event

Time-to-event data can be collected when an experiment is conducted such that the time until the event occurs is recorded for each subject. Time-to-event data come as a pair of variables: (1) the time and (2) an indicator of the event. The time is recorded for each subject with an event, while the cutoff time with a + sign is recorded for a subject who has not had the event by the end of the study. Table 1.1 shows the time-to-event data from the AML trial. The column Weeks to Relapse is the time and the column Relapse? indicates the event. The Treatment column shows which treatment group the patient belongs to.

Table 1.1: Time-to-Event Data for the AML Trial

Weeks to Relapse   Relapse?   Treatment
9                  Yes        Maintained
13                 Yes        Maintained
18                 Yes        Maintained
22                 Yes        Maintained
4                  Yes        Nonmaintained
5                  Yes        Nonmaintained
7                  Yes        Nonmaintained
8                  Yes        Nonmaintained
12                 Yes        Nonmaintained
23                 Yes        Nonmaintained
27                 Yes        Nonmaintained
30+                No         Maintained
30+                No         Maintained
30+                No         Maintained
30+                No         Maintained
30+                No         Maintained
30+                No         Nonmaintained
30+                No         Nonmaintained
30+                No         Nonmaintained

Binary Data

When the observations made on the subjects indicate only whether the event occurred, dichotomous or binary data are recorded. The advantage of this data type is simplicity, since the experimenters are only required to collect the data at the end of the study. For example, medical doctors may find it easier to evaluate whether a patient has complications at a pre-specified follow-up time than to determine the actual time when the complications occurred.

Table 1.2: Binary Data for the AML Trial

Relapse?   Treatment
Yes        Maintained
Yes        Maintained
Yes        Maintained
Yes        Maintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
Yes        Nonmaintained
No         Maintained
No         Maintained
No         Maintained
No         Maintained
No         Maintained
No         Nonmaintained
No         Nonmaintained
No         Nonmaintained

Table 1.2 shows binary data collected from the AML trial. The column Relapse? is the indicator variable showing whether the patient had a relapse within the 30-week observation period. The column Treatment indicates the treatment the patient received.
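As a minimal sketch of how the two data types relate, the following Python snippet encodes Table 1.1 as (weeks, relapse, treatment) records and collapses it to the binary form of Table 1.2. The variable names and the tuple layout are illustrative assumptions and are not part of the original study.

```python
# Time-to-event records from Table 1.1: (weeks, relapsed, treatment).
# Subjects without a relapse are recorded at the 30-week cutoff ("30+").
CUTOFF = 30
maintained_events = [9, 13, 18, 22]
nonmaintained_events = [4, 5, 7, 8, 12, 23, 27]

time_to_event = (
    [(t, True, "Maintained") for t in maintained_events]
    + [(t, True, "Nonmaintained") for t in nonmaintained_events]
    + [(CUTOFF, False, "Maintained")] * 5
    + [(CUTOFF, False, "Nonmaintained")] * 3
)

# Binary data (Table 1.2): drop the time, keep only the event indicator.
binary = [(relapsed, group) for _, relapsed, group in time_to_event]


def count(group, relapsed):
    """Number of subjects in `group` with the given relapse status."""
    return sum(1 for r, g in binary if g == group and r == relapsed)


for g in ("Maintained", "Nonmaintained"):
    print(g, "relapse:", count(g, True), "no relapse:", count(g, False))
```

The binary form keeps only the 2 x 2 table of counts, which is all that logistic regression needs; the event times are discarded.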

1.2.3 Survival Distributions

Weibull Distribution

Let $T$ be the time-to-event with probability density function $f(t)$. The probability density function of a Weibull distribution with shape parameter $a > 0$ and scale parameter $b > 0$ is

$$f(t) = \frac{a}{b}\left(\frac{t}{b}\right)^{a-1} e^{-(t/b)^{a}}.$$

The mean and variance of the Weibull distribution are

$$\text{mean} = b\,\Gamma(1 + 1/a) \quad\text{and}\quad \text{variance} = b^{2}\left[\Gamma(1 + 2/a) - \left(\Gamma(1 + 1/a)\right)^{2}\right],$$

where $\Gamma(\cdot)$ is the Gamma function. The distribution of $T$ can also be described by its cumulative probability

$$P(T < t) = F(t) = \int_0^t f(s)\,ds = 1 - e^{-(t/b)^{a}}$$

or its survival probability

$$S(t) = 1 - F(t) = e^{-(t/b)^{a}}.$$

With this parameterization, two Weibull times with the same shape $a$ have proportional hazards. Figure 1.2 shows the survival probability $S(t)$ over time for four different time-to-event distributions in two plots. The first plot shows $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$, and the second plot shows $T_3 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_4 \sim \mathrm{Weibull}(a = 2, b = 2)$.

Figure 1.2: Survival Probability, $S(t)$, of Four Different Weibull Times: $T_1 \sim \mathrm{Weibull}(a = 1, b = 1)$, $T_2 \sim \mathrm{Weibull}(a = 1, b = 2)$, $T_3 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_4 \sim \mathrm{Weibull}(a = 2, b = 2)$

As shown in Figure 1.2, the survival probability is 1 at time 0 and goes to 0 as time increases. For two Weibull times with equal shape parameters, $a_1 = a_2$, a difference in the scale parameters $b_1$ and $b_2$ implies that $S_2(t) \neq S_1(t)$ for all $t$ except at 0 and $\infty$.
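The survival curves in Figure 1.2 can be traced numerically. The sketch below assumes that $a$ is the Weibull shape and $b$ the scale, as labeled in the text, so that $S(t) = e^{-(t/b)^a}$; the function names are hypothetical.

```python
import math


def weibull_survival(t, a, b):
    """S(t) = exp(-(t/b)^a) for shape a and scale b (assumed labeling)."""
    return math.exp(-((t / b) ** a))


def weibull_cdf(t, a, b):
    """F(t) = 1 - S(t)."""
    return 1.0 - weibull_survival(t, a, b)


def weibull_mean(a, b):
    """Mean b * Gamma(1 + 1/a) under the same labeling."""
    return b * math.gamma(1.0 + 1.0 / a)


# T1 ~ Weibull(a=1, b=1) and T2 ~ Weibull(a=1, b=2), as in the first plot
for t in (0.5, 1.0, 2.0, 4.0):
    s1 = weibull_survival(t, 1, 1)
    s2 = weibull_survival(t, 1, 2)
    print(f"t={t}: S1={s1:.4f}  S2={s2:.4f}")
```

Under this labeling the curve with the larger scale lies above the other at every positive time, and both start at 1 and decay to 0.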

1.2.4 Measures of Risk

"Risk", for now, denotes a general notion of the chance that the event will occur. A popular approach to comparing the risks of two subjects is to use a ratio of the two risks, or risk ratio. Denote the "risk ratio" by $\Theta$. It can be expressed as

$$\Theta = \frac{\text{risk of event for treatment group}}{\text{risk of event for control group}}.$$

Two different measures of risk will be used, depending on whether time-to-event or binary data are available. Let $T$ be the time-to-event with density $f(t)$.

The first measure of risk is the hazard ratio (HR), which is a ratio of two hazards. Define the hazard at time $t$ as the "chance of instantaneous event at time $t$ given that the event has not occurred prior to $t$". Formally, the hazard at time $t$ is

$$h(t) = \frac{f(t)}{S(t)}.$$

Thus, the hazard ratio between two groups is

$$HR = \frac{h_2(t)}{h_1(t)}. \tag{1.1}$$

The second measure of risk is the odds ratio (OR). The odds of an event is another measure of risk, defined as

$$\frac{F(t)}{1 - F(t)} = \frac{F(t)}{S(t)}.$$

The odds ratio between two groups is

$$OR = \frac{F_2(t)/(1 - F_2(t))}{F_1(t)/(1 - F_1(t))} = \frac{F_2(t)/S_2(t)}{F_1(t)/S_1(t)}. \tag{1.2}$$
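To make the two measures concrete, the following sketch evaluates the hazard ratio and the odds ratio for two Weibull groups at several time points. It assumes $a$ is the shape and $b$ the scale, as labeled in the text; the parameter values and function names are illustrative only.

```python
import math


def hazard(t, a, b):
    """Weibull hazard h(t) = f(t)/S(t) = (a/b) * (t/b)^(a-1)."""
    return (a / b) * (t / b) ** (a - 1)


def odds(t, a, b):
    """Odds of an event by time t: F(t)/S(t)."""
    s = math.exp(-((t / b) ** a))
    return (1.0 - s) / s


def hazard_ratio(t, g1, g2):
    """h2(t) / h1(t); each group is a (shape, scale) pair."""
    return hazard(t, *g2) / hazard(t, *g1)


def odds_ratio(t, g1, g2):
    """[F2(t)/S2(t)] / [F1(t)/S1(t)]."""
    return odds(t, *g2) / odds(t, *g1)


group1 = (2, 1.0)   # (shape a, scale b), hypothetical values
group2 = (2, 1.5)
for t in (0.25, 0.5, 1.0, 2.0):
    print(t, round(hazard_ratio(t, group1, group2), 4),
          round(odds_ratio(t, group1, group2), 4))
```

With equal shapes the hazard ratio is the same at every time point, while the odds ratio changes with the time at which it is evaluated. This is one way to see why the cutoff time matters for OR.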

CHAPTER 2

HAZARD VERSUS ODDS RATIO

Hazard and odds ratios, via Cox proportional hazards and logistic regression respectively, represent two ends of the spectrum of methods for survival analysis. On one end is HR, which uses the time-to-event of each subject. On the other end is OR, which requires only one overview look at all the subjects at the end of the study. The choice between HR and OR for the analysis of survival data was driven mainly by computational restrictions in earlier years. A direct comparison of LR to CPH can be found in et al. [1989]. Similar comparisons can be found in Symons and Moore [2002], Walter [2000], Callas et al. [1998] and Peduzzi et al. [1987]. Their comparisons are based on using OR to approximate HR, judged by the closeness of the values. Their recommendation is to use OR as an alternative when the event is rare and the length of study is short. Computational limitations do not apply today, and experimenters who prefer OR over HR do so for its simplicity and convenience.

2.1 Time-to-Event Data and Hazard Ratio

The go-to analysis method for time-to-event data is the use of HR in the CPH model. It was first proposed by Cox [1972] as a regression method for censored observations, and further work on estimation and asymptotic properties was presented by Cox [1975], Efron [1977] and Andersen and Gill [1982].

2.1.1 Model

The CPH model is specified using the hazard of an event occurring. Let $h_i(t)$ be the hazard of subject $i$ at time $t$. The two-sample model is

$$h_i(t) = h_0(t)\,e^{X_i\beta} = \begin{cases} h_0(t) & \text{if } X_i = 0 \\ h_0(t)\,e^{\beta} & \text{if } X_i = 1 \end{cases} \tag{2.1}$$

where $h_0(t)$ is the baseline hazard at time $t$, $X_i$ is the group indicator, and $e^{\beta}$ is the treatment effect on the hazard. The treatment effect in CPH is measured as the hazard ratio, defined as

$$HR = \frac{h_0(t)\,e^{\beta}}{h_0(t)} = e^{\beta}. \tag{2.2}$$

The model specifies a multiplicative effect of the covariates, or treatment, on the hazard of the event. Also, the baseline hazard $h_0(t)$ cancels out when HR is computed. This implies that the hazards of two subjects are proportional, with constant ratio $e^{\beta}$, hence the name of the model.

2.1.2 Estimation

The coefficient $\beta$ is estimated via a partial likelihood (Cox [1975]). Let $h_i$ be the hazard corresponding to the ordered survival times. The partial likelihood, $PL(\beta)$, is

$$PL(\beta) = \prod_{i=1}^{n}\left(\frac{h_i}{\sum_{j \in R_i} h_j}\right)^{\delta_i} = \prod_{i=1}^{n}\left(\frac{h_0(t_i)\,e^{X_i\beta}}{\sum_{j \in R_i} h_0(t_i)\,e^{X_j\beta}}\right)^{\delta_i} = \prod_{i=1}^{n}\left(\frac{e^{X_i\beta}}{\sum_{j \in R_i} e^{X_j\beta}}\right)^{\delta_i} \tag{2.3}$$

where $\delta_i$ is the indicator of event, $X_i$ is the covariate (here the group indicator) of subject $i$, and $R_i$ denotes the risk set (i.e. the subjects at risk) when subject $i$ dies. This method is referred to as semi-nonparametric since $\beta$ can be estimated from the likelihood without specifying the baseline hazard $h_0(t)$.

The estimation of the coefficient is carried out in a similar fashion to maximum likelihood: $\hat\beta$ is the value of $\beta$ that maximizes (2.3) or, equivalently, the solution obtained by setting the derivative of the log partial likelihood to zero,

$$\frac{\partial}{\partial\beta}\log L(\beta) = \sum_{i=1}^{n}\delta_i\left[X_i - \frac{\sum_{j \in R_i} X_j\,e^{X_j\beta}}{\sum_{j \in R_i} e^{X_j\beta}}\right] = 0.$$

2.1.3 Asymptotic Normality

The asymptotic normality of the coefficient estimate is

$$\frac{\hat\beta - \beta}{I(\hat\beta)^{-1/2}} \to N(0, 1)$$

where $I(\hat\beta)$ is the Fisher information,

$$I(\hat\beta) = E\left[-\frac{\partial^2}{\partial\beta^2}\log L(\beta)\right],$$

so that

$$-\frac{1}{n}\frac{\partial^2}{\partial\beta^2}\log L(\beta) = \frac{1}{n}\sum_{i=1}^{n}\delta_i\,\frac{\left(\sum_{j \in R_i} e^{X_j\beta}\right)\left(\sum_{j \in R_i} X_j^2\,e^{X_j\beta}\right) - \left(\sum_{j \in R_i} X_j\,e^{X_j\beta}\right)^2}{\left(\sum_{j \in R_i} e^{X_j\beta}\right)^2}.$$

In a two-sample problem, $\frac{1}{n}I(\hat\beta)$ is

$$\frac{1}{n}\sum_{i=1}^{n}\delta_i\,\frac{Y_{1i}\,Y_{2i}\,e^{\beta}}{\left(Y_{1i} + Y_{2i}\,e^{\beta}\right)^2} \tag{2.4}$$

and

$$E\left[-\frac{1}{n}\frac{\partial^2}{\partial\beta^2}\log L(\beta)\right] = P(\delta = 1)\,E\big(V(X \mid T, \delta = 1)\big) \tag{2.5}$$

where $Y_{1i}$ and $Y_{2i}$ are the numbers at risk in groups 1 and 2 at the observed time of subject $i$, $P(\delta = 1)$ is the probability of the event over the study duration, and $E(V(X \mid T, \delta = 1))$ is the expected variance of $X$ conditional on the observations $T$ and $\delta$.
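The estimation steps above can be sketched for the AML data of Table 1.1. The code below is an illustration under stated assumptions, not the software used in this work: the coding $X = 1$ for the nonmaintained group is assumed, no event times are tied within the 30-week window, and the score and the information of (2.4) are used in a plain Newton-Raphson iteration.

```python
import math

# AML data from Table 1.1: (time, event indicator, x), with x = 1 for the
# nonmaintained group and x = 0 for the maintained group (assumed coding).
data = (
    [(t, 1, 0) for t in (9, 13, 18, 22)] + [(30, 0, 0)] * 5
    + [(t, 1, 1) for t in (4, 5, 7, 8, 12, 23, 27)] + [(30, 0, 1)] * 3
)


def risk_counts(t):
    """Numbers at risk (Y1, Y2) in groups x = 0 and x = 1 at time t."""
    y1 = sum(1 for s, _, x in data if s >= t and x == 0)
    y2 = sum(1 for s, _, x in data if s >= t and x == 1)
    return y1, y2


def log_partial_likelihood(beta):
    """Log of (2.3) for the two-sample problem."""
    total = 0.0
    for t, d, x in data:
        if d:
            y1, y2 = risk_counts(t)
            total += x * beta - math.log(y1 + y2 * math.exp(beta))
    return total


def score_and_information(beta):
    """First derivative of the log partial likelihood and n times (2.4)."""
    score, info = 0.0, 0.0
    for t, d, x in data:
        if d:
            y1, y2 = risk_counts(t)
            denom = y1 + y2 * math.exp(beta)
            score += x - y2 * math.exp(beta) / denom
            info += y1 * y2 * math.exp(beta) / denom ** 2
    return score, info


beta = 0.0
for _ in range(50):                      # Newton-Raphson iterations
    u, info = score_and_information(beta)
    if abs(u) < 1e-10:
        break
    beta += u / info

_, info = score_and_information(beta)
print("beta_hat =", round(beta, 4), " HR =", round(math.exp(beta), 4),
      " SE =", round(info ** -0.5, 4))
```

The standard error is taken as the inverse square root of the observed information at the solution, matching the normalization in the asymptotic normality statement.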

The standard error of $\hat\beta$ depends on the probability that the event occurs. As a consequence, CPH becomes less efficient at shorter cutoffs. For example, if more subjects have the event in a 12-month study than in a 6-month study, the standard error at 12 months will be smaller than the standard error at 6 months. Specifically, a 12-month study is

$$\frac{P(\text{event by 12 months})}{P(\text{event by 6 months})}$$

times as efficient as a 6-month study.

$E(V(X \mid T, \delta = 1))$ is the expected variance of $X$ conditional on the observations $T$ and $\delta$. It can only be computed after $T$ and $\delta$ are observed. For simplicity, Jeong and Oakes [2007] used $E(V(X \mid T, \delta = 1)) = 0.25$ in comparing CPH to other parametric models for time-to-event data. Alternatively, we used a numerical approach to approximate the conditional variance using an idealized sample. Appendix A shows the details of the numerical approach for computing $P(\delta = 1)\,E(V(X \mid T, \delta = 1))$.

A more rigorous proof of the asymptotic normality of $\hat\beta$ was given by Andersen and Gill [1982] using martingale central limit theory. $\frac{1}{n}I(\hat\beta)$ was shown to converge in probability to

$$\lim_{n \to \infty}\frac{1}{n}I(\hat\beta) = \Sigma(\hat\beta) = \int_0^{\tau} V(\beta, t)\,S^{0}(\beta, t)\,\lambda(t)\,dt$$

where $\lambda(t)$ is the baseline hazard and

$$S^{0}(\beta, t) = \frac{1}{n}\sum_i Y_i(t)\,e^{X_i\beta}, \quad S^{1}(\beta, t) = \frac{1}{n}\sum_i X_i\,Y_i(t)\,e^{X_i\beta}, \quad S^{2}(\beta, t) = \frac{1}{n}\sum_i X_i^2\,Y_i(t)\,e^{X_i\beta},$$

$$E(\beta, t) = \frac{S^{1}(\beta, t)}{S^{0}(\beta, t)}, \quad V(\beta, t) = \frac{S^{2}(\beta, t)}{S^{0}(\beta, t)} - E(\beta, t)^2.$$

2.1.4 Power for Testing $H_0: \beta = 0$

Recall that $HR = e^{\beta}$ under the CPH model. It implies that $\beta = \log(HR)$, and testing $\beta = 0$ is equivalent to testing $HR = 1$ since exponentiation is a monotonic transformation. The test for $\beta = 0$ using CPH is to reject $H_0$ if

$$\left|\frac{\hat\beta - 0}{I(\hat\beta)^{-1/2}}\right| > Z_{1-\alpha/2} \tag{2.6}$$

where $\hat\beta$ is the coefficient estimate from CPH, $I(\hat\beta)^{-1/2}$ is the inverse square root of the Fisher information (the standard error $\sigma(\hat\beta)$), and $Z_{1-\alpha/2}$ is the critical value from a standard normal distribution. The power function is

$$\begin{aligned}
\pi(\beta, \sigma(\hat\beta)) &= P\left(\frac{\hat\beta - 0}{\sigma(\hat\beta)} > Z_{1-\alpha/2}\right) + P\left(\frac{\hat\beta - 0}{\sigma(\hat\beta)} < Z_{\alpha/2}\right) \\
&= P\left(\frac{\hat\beta - \beta}{\sigma(\hat\beta)} > Z_{1-\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) + P\left(\frac{\hat\beta - \beta}{\sigma(\hat\beta)} < Z_{\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) \\
&\approx 1 - \Phi\left(Z_{1-\alpha/2} - \beta\,\big(n\,P(\delta = 1)\,E(V(X \mid T, \delta = 1))\big)^{1/2}\right) + \Phi\left(Z_{\alpha/2} - \beta\,\big(n\,P(\delta = 1)\,E(V(X \mid T, \delta = 1))\big)^{1/2}\right)
\end{aligned} \tag{2.7}$$

where the last line uses $\sigma(\hat\beta)^{-1} = I(\hat\beta)^{1/2} \approx \big(n\,P(\delta = 1)\,E(V(X \mid T, \delta = 1))\big)^{1/2}$ from (2.5).

Figure 2.1 shows three power curves for $\hat\beta$ at three different cutoff times: $F_1(C) = 0.30$, $0.50$ and $0.70$. The distribution of $T_1$ is $\mathrm{Weibull}(a_1 = 2, b_1 = 1)$ and $T_2$ is $\mathrm{Weibull}(a_2 = 2, b_2)$ where $b_2 \in \{1, 1.1, 1.2, \ldots, 2\}$. Increments of the scale parameter $b_2$ signify an increasing effect size of HR. Figure 2.1 shows that power increases with increasing HR effect size and cutoff time.
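The power function (2.7) is straightforward to evaluate numerically. The sketch below uses only the Python standard library; the sample size, the effect size and the function names are hypothetical, and the default $E(V) = 0.25$ follows the simplification attributed to Jeong and Oakes [2007] in the text.

```python
from statistics import NormalDist


def wald_power(beta, sigma, alpha=0.05):
    """Asymptotic power of the two-sided Wald test of H0: beta = 0.

    sigma is the standard error of beta_hat under the alternative.
    """
    z = NormalDist()
    z_hi = z.inv_cdf(1 - alpha / 2)
    z_lo = z.inv_cdf(alpha / 2)
    shift = beta / sigma
    return 1 - z.cdf(z_hi - shift) + z.cdf(z_lo - shift)


def cph_sigma(n, p_event, ev=0.25):
    """Standard error (n * P(delta = 1) * E[V])^(-1/2), with ev assumed 0.25."""
    return (n * p_event * ev) ** -0.5


# Hypothetical setting: beta = -0.5, n = 200, three event probabilities
for p in (0.30, 0.50, 0.70):
    print(p, round(wald_power(-0.5, cph_sigma(200, p)), 3))
```

At $\beta = 0$ the function returns the nominal level $\alpha$, and for a fixed $\beta$ the power grows with the event probability, in line with the role of the cutoff time described above.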

Figure 2.1: Simulated Power of HR for Testing $\beta = 0$ for $\hat\beta$ from CPH where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times such that $F(C) = 0.30, 0.50, 0.70$

The power curves show the response of HR to the time-to-event distribution of $T_2$. The x-axis is the scale parameter of the Weibull distribution for $T_2$. A scale parameter of 1 denotes the null, while scale parameters greater than 1 denote the alternatives. The curves show that power increases with HR effect size and cutoff time.

2.2 Binary Data and Odds Ratio

2.2.1 Model

Let $\delta_i$ be the indicator of event and

$$P(\delta_i = 1) = P_i = P(T_i < C) = F_i(C) = \int_0^{C} f_i(t)\,dt \tag{2.8}$$

be the cumulative probability of an event from time 0 to $C$. LR uses a logit-link function to link the binary response to the treatment groups. The logit link is

$$\log(\text{odds}_i) = \log\left(\frac{P_i}{1 - P_i}\right) = \alpha + X_i\beta$$

or, equivalently,

$$P_i = \frac{e^{\alpha + X_i\beta}}{1 + e^{\alpha + X_i\beta}} \tag{2.9}$$

where $e^{\alpha}$ is the baseline odds of event, $X_i$ is the group indicator variable, and $e^{\beta}$ is the treatment effect on the odds of event.

The method also assumes that each observation is a Bernoulli trial, taking a value of 1 if the event occurs and 0 otherwise. The Bernoulli distribution is

$$P_i^{\delta_i}(1 - P_i)^{1 - \delta_i}.$$

The odds ratio in a two-sample problem is

$$OR = \frac{P_2/(1 - P_2)}{P_1/(1 - P_1)} = e^{\beta}. \tag{2.10}$$
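A short numerical check of (2.9) and (2.10): the sketch below maps hypothetical coefficients $\alpha$ and $\beta$ to event probabilities through the inverse logit and confirms that the resulting odds ratio equals $e^{\beta}$. The coefficient values are illustrative assumptions.

```python
import math


def event_probability(alpha, beta, x):
    """Inverse logit: P = exp(alpha + x*beta) / (1 + exp(alpha + x*beta))."""
    eta = alpha + x * beta
    return math.exp(eta) / (1.0 + math.exp(eta))


def odds(p):
    """Odds p / (1 - p) of an event with probability p."""
    return p / (1.0 - p)


alpha, beta = math.log(0.8), math.log(2.0)   # illustrative values
p1 = event_probability(alpha, beta, 0)       # group 1 (X = 0)
p2 = event_probability(alpha, beta, 1)       # group 2 (X = 1)
print(p1, p2, odds(p2) / odds(p1), math.exp(beta))
```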

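2.2.2 Estimation

The coefficients are estimated via maximum likelihood (GLM). The likelihood and log-likelihood functions are

$$L(X_i, \alpha, \beta, \delta_i) = \prod_{i=1}^{n} P_i^{\delta_i}(1 - P_i)^{1 - \delta_i} = \prod_{i=1}^{n}\left(\frac{P_i}{1 - P_i}\right)^{\delta_i}(1 - P_i) = \frac{\prod_{i=1}^{n}\left(e^{\alpha + X_i\beta}\right)^{\delta_i}}{\prod_{i=1}^{n}\left(1 + e^{\alpha + X_i\beta}\right)}$$

and

$$\log L(X_i, \alpha, \beta, \delta_i) = \sum_{i=1}^{n}\delta_i(\alpha + X_i\beta) - \sum_{i=1}^{n}\log\left(1 + e^{\alpha + X_i\beta}\right).$$

The estimates of the coefficients, $\hat\alpha$ and $\hat\beta$, are the values that maximize the likelihood or, equivalently, the solution obtained by setting the derivatives of $\log L$ with respect to $\alpha$ and $\beta$ to zero. The estimation is typically carried out numerically.

2.2.3 Asymptotic Normality

The asymptotic normality of the coefficient estimate is

$$\frac{\hat\beta - \beta}{I(\hat\beta)^{-1/2}} \to N(0, 1)$$

where $I(\hat\beta)$ is the Fisher information evaluated at $\hat\beta$. $I^{-1}(\hat\beta)$ is the estimator of the variance of $\hat\beta$ and, for a two-sample problem, $I^{-1}(\hat\beta)$ is

$$\big(n_1 P_1(1 - P_1)\big)^{-1} + \big(n_2 P_2(1 - P_2)\big)^{-1}.$$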
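In the two-sample case the logistic model is saturated, so the maximum likelihood estimates have a closed form: $\hat\alpha$ is the log-odds in group 1 and $\hat\beta$ is the sample log odds ratio. The sketch below applies this to the counts of Table 1.2, treating the maintained group as group 1 (an assumed coding), and evaluates the variance formula above at the sample proportions.

```python
import math

# Event counts at the 30-week cutoff (Table 1.2); group 1 = maintained.
events_1, n_1 = 4, 9
events_2, n_2 = 7, 10

p1, p2 = events_1 / n_1, events_2 / n_2
alpha_hat = math.log(p1 / (1 - p1))                 # baseline log-odds
beta_hat = math.log(p2 / (1 - p2)) - alpha_hat      # log odds ratio
var_beta = 1 / (n_1 * p1 * (1 - p1)) + 1 / (n_2 * p2 * (1 - p2))


def log_likelihood(alpha, beta):
    """Two-sample logistic log-likelihood written in terms of group counts."""
    ll = 0.0
    for x, d, n in ((0, events_1, n_1), (1, events_2, n_2)):
        eta = alpha + x * beta
        ll += d * eta - n * math.log(1 + math.exp(eta))
    return ll


print("OR =", round(math.exp(beta_hat), 4),
      " SE =", round(math.sqrt(var_beta), 4))
```

For models with additional covariates no closed form exists and the likelihood is maximized iteratively, which is the numerical estimation mentioned in the text.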

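The asymptotic variance of $\hat\beta$ is the $(2, 2)$th element of the inverse of the Fisher information matrix. It has been shown that $I^{-1}(\hat\alpha, \hat\beta)$ converges in probability to $\Sigma^{-1}$; see, for example, Agresti [2002]. Writing $w_i = n_i P_i(1 - P_i)$,

$$\begin{aligned}
I^{-1}(\hat\alpha, \hat\beta) \to \Sigma^{-1} &= \Big(X'\,\mathrm{diag}\big(F(C)S(C)\big)\,X\Big)^{-1} \\
&= \left[\begin{pmatrix} 1_{n_1}' & 1_{n_2}' \\ 0_{n_1}' & 1_{n_2}' \end{pmatrix}\begin{pmatrix} I_{n_1}P_1(1 - P_1) & 0 \\ 0 & I_{n_2}P_2(1 - P_2) \end{pmatrix}\begin{pmatrix} 1_{n_1} & 0_{n_1} \\ 1_{n_2} & 1_{n_2} \end{pmatrix}\right]^{-1} \\
&= \begin{pmatrix} w_1 + w_2 & w_2 \\ w_2 & w_2 \end{pmatrix}^{-1} \\
&= \begin{pmatrix} w_1^{-1} & -w_1^{-1} \\ -w_1^{-1} & w_1^{-1} + w_2^{-1} \end{pmatrix}.
\end{aligned}$$

Thus, $I^{-1}(\hat\beta) = \big(n_1 P_1(1 - P_1)\big)^{-1} + \big(n_2 P_2(1 - P_2)\big)^{-1}$.

2.2.4 Power for Testing $H_0: \beta = 0$

Recall that $\beta = \log(OR)$. The test for $OR = 1$ is equivalent to the test for $\beta = 0$ since exponentiation is a monotonic transformation. The asymptotic power, $\pi$, for testing $\beta = 0$ is

$$\begin{aligned}
\pi(\beta, \sigma(\hat\beta)) &= P\left(\frac{\hat\beta - 0}{\sigma(\hat\beta)} > Z_{1-\alpha/2}\right) + P\left(\frac{\hat\beta - 0}{\sigma(\hat\beta)} < Z_{\alpha/2}\right) \\
&= P\left(\frac{\hat\beta - \beta}{\sigma(\hat\beta)} > Z_{1-\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) + P\left(\frac{\hat\beta - \beta}{\sigma(\hat\beta)} < Z_{\alpha/2} - \frac{\beta}{\sigma(\hat\beta)}\right) \\
&\approx 1 - \Phi\left(Z_{1-\alpha/2} - \frac{\beta}{\sqrt{(n_1 P_1(1 - P_1))^{-1} + (n_2 P_2(1 - P_2))^{-1}}}\right) + \Phi\left(Z_{\alpha/2} - \frac{\beta}{\sqrt{(n_1 P_1(1 - P_1))^{-1} + (n_2 P_2(1 - P_2))^{-1}}}\right).
\end{aligned}$$

Figure 2.2 shows three power curves for testing $\beta = 0$, one for each of three cutoff times, $F_1(C) = 0.30$, $0.50$ and $0.70$. $T_1$ is a Weibull survival time with shape and scale parameters equal to 2 and 1, while $T_2$ follows a Weibull survival time with shape parameter 2 and scale parameters from 1 to 2 in increments of 0.1. It shows that power increases with the degree of difference between $T_1$ and $T_2$.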
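The asymptotic power of the OR test can be evaluated for the Weibull setting of Figure 2.2. The sketch below is an illustration, not the simulation behind the figure: it assumes $a$ is the shape and $b$ the scale, uses a hypothetical sample size of 50 per group, and solves $F_1(C) = $ target for the cutoff in closed form.

```python
import math
from statistics import NormalDist


def weibull_cdf(t, a, b):
    """F(t) = 1 - exp(-(t/b)^a) for shape a and scale b (assumed labeling)."""
    return 1 - math.exp(-((t / b) ** a))


def or_power(n1, n2, p1, p2, alpha=0.05):
    """Asymptotic power of the two-sided Wald test of log(OR) = 0."""
    beta = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
    sigma = math.sqrt(1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2)))
    z = NormalDist()
    return (1 - z.cdf(z.inv_cdf(1 - alpha / 2) - beta / sigma)
            + z.cdf(z.inv_cdf(alpha / 2) - beta / sigma))


a, b1 = 2, 1.0
for target in (0.30, 0.50, 0.70):
    cutoff = b1 * (-math.log(1 - target)) ** (1 / a)   # solves F1(C) = target
    for b2 in (1.0, 1.5, 2.0):
        p1, p2 = weibull_cdf(cutoff, a, b1), weibull_cdf(cutoff, a, b2)
        print(target, b2, round(or_power(50, 50, p1, p2), 3))
```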

Figure 2.2: Simulated Power of OR for Testing $\beta = 0$ for $\hat\beta$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$ and Three Cutoff Times, $F(C) = 0.30, 0.50, 0.70$

2.3 Comparison

Under $H_0$, the two groups have the same survivorship. Thus $HR = OR = 1$ or, equivalently, the coefficients are equal to 0, and a direct comparison of standard errors is reasonable. Under the alternative hypothesis, however, the coefficient parameters are not equivalent.

2.4 Asymptotic Variance Under $H_0$

The relative performance of the effect-size estimates also depends on their asymptotic variances. Recall that the asymptotic variances of the estimators are

$$\text{HR:}\quad \frac{1}{n\,P(\text{event})\,E[V(X \mid T)]}$$

where $P(\text{event}) = F(C)$ is the total chance of the event over the interval, while $E[V(X \mid T)]$ is the expected variance of $X$ conditional on the observations $T$, with a maximum value of 0.25, and

$$\text{LR:}\quad \frac{1}{n_1 F_1(C) S_1(C)} + \frac{1}{n_2 F_2(C) S_2(C)}.$$

Under the null hypothesis, HR and OR are equal to one. Equivalently, the coefficient parameters, $\beta$, associated with each are zero since $\log(1) = 0$. Thus, a direct comparison of the asymptotic variances is valid. With equal group sizes, $n_1 = n_2 = n/2$, a common event probability $F_1(C) = F_2(C) = P(\text{event})$ and $V(X \mid T) = 0.25$, the ARE of the odds ratio against the hazard ratio under the null is

$$ARE_{H_0}(HR, OR) = \frac{\dfrac{1}{n\,P(\text{event})\,V(X \mid T)}}{\dfrac{2}{n}\cdot\dfrac{1}{P(\text{event})(1 - P(\text{event}))}\cdot 2} = 1 - P(\text{event}). \tag{2.11}$$

$P(\text{event})$ denotes the combined probability of event for the groups. The asymptotic variances of HR and OR have a ratio of $1 - P(\text{event})$. This implies that the asymptotic variance of OR is larger than that of HR by a factor of $1/(1 - P(\text{event}))$.

2.5 Power Rates Under the Alternative

A direct comparison of the asymptotic variances of the estimates is invalid under the alternative that $S_1(t) \neq S_2(t)$, because the coefficients of the two models are then different parameters and do not coincide. Therefore, we use the power of HR and OR to assess their relative performance.
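The relation in (2.11) can be checked numerically. The sketch below assumes equal group sizes and $E[V(X \mid T)] = 0.25$, and compares the variance ratio with $1 - P(\text{event})$ for a few event probabilities; the function names and the value of $n$ are illustrative.

```python
def var_hr(n, p_event, ev=0.25):
    """Asymptotic variance of the CPH coefficient, 1 / (n * P * E[V])."""
    return 1.0 / (n * p_event * ev)


def var_or(n1, n2, p1, p2):
    """Asymptotic variance of the logistic regression coefficient."""
    return 1.0 / (n1 * p1 * (1 - p1)) + 1.0 / (n2 * p2 * (1 - p2))


n = 200   # hypothetical total sample size, split equally
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    are = var_hr(n, p) / var_or(n // 2, n // 2, p, p)
    print(p, round(are, 6), round(1 - p, 6))
```

The two printed columns agree, so under the null the efficiency loss of OR relative to HR grows as the event becomes more common, which is consistent with the recommendation to use OR when the event is rare.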

Figure 2.3: Simulated Power of HR and OR at $F_1(C) = 0.30$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$

Figure 2.4: Simulated Power of HR and OR at $F_1(C) = 0.50$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$

Figure 2.5: Simulated Power of HR and OR at $F_1(C) = 0.70$ where $T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$ and $T_2 \sim \mathrm{Weibull}(a = 2, b = 1 \text{ to } 2)$

CHAPTER 3

SENSITIVITY TO LENGTH OF STUDY

In this chapter, we study the performance of HR and OR under different cutoff times. Under varying cutoffs, the difference between $P_1$ and $P_2$ also changes. Simulations from the previous chapter show that the responses of HR and OR at three fixed cutoffs, $F_1(C) = 0.30$, $0.50$ and $0.70$, are similar. Is this the case for any cutoff? What if the survival distributions have non-proportional hazards? To compare their performance, we simulate two-sample data with $n_i = 50$ and $100$ under different Weibull distributions. Four general cases were simulated.

Case 1
$T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$
$T_2 \sim \mathrm{Weibull}(a = 2, b = 1.2)$
This is a case where the survival times have proportional hazards and a small effect size.

Case 2
$T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$
$T_2 \sim \mathrm{Weibull}(a = 2, b = 1.5)$
This is a case where the survival times have proportional hazards and a moderate effect size.

Case 3
$T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$
$T_2 \sim \mathrm{Weibull}(a = 3, b = 1.2)$
This is a case where the survival times have non-proportional hazards and a small effect size.

Case 4
$T_1 \sim \mathrm{Weibull}(a = 2, b = 1)$
$T_2 \sim \mathrm{Weibull}(a = 3, b = 1.5)$
This is a case where the survival times have non-proportional hazards and a moderate effect size.

Figure 3.1 shows the survival functions for each of the cases.
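A simulation of this kind can be sketched as follows. The code is a hedged illustration rather than the program used for the results in this chapter: it assumes $a$ is the shape and $b$ the scale, draws censored Weibull samples with Python's `random.weibullvariate` (whose arguments are scale, then shape), and estimates the power of the Wald test for OR at a cutoff chosen so that $F_1(C)$ hits a target value. The number of replications, the seed and the handling of empty cells are arbitrary choices.

```python
import math
import random


def simulate_group(n, a, b, cutoff, rng):
    """Draw n Weibull(shape a, scale b) times, censored at the cutoff."""
    times = [rng.weibullvariate(b, a) for _ in range(n)]   # (scale, shape)
    return [(min(t, cutoff), t < cutoff) for t in times]


def or_wald_rejects(g1, g2, z_crit=1.959964):
    """Two-sided Wald test of log(OR) = 0 from event counts at the cutoff."""
    d1, d2 = sum(e for _, e in g1), sum(e for _, e in g2)
    n1, n2 = len(g1), len(g2)
    if min(d1, d2, n1 - d1, n2 - d2) == 0:
        return False        # estimate undefined; counted as no rejection
    log_or = math.log(d2 / (n2 - d2)) - math.log(d1 / (n1 - d1))
    se = math.sqrt(1 / d1 + 1 / (n1 - d1) + 1 / d2 + 1 / (n2 - d2))
    return abs(log_or / se) > z_crit


def simulated_power(a1, b1, a2, b2, f1_at_cutoff, n=50, reps=2000, seed=1):
    """Fraction of simulated trials in which the OR test rejects H0."""
    rng = random.Random(seed)
    cutoff = b1 * (-math.log(1 - f1_at_cutoff)) ** (1 / a1)
    hits = sum(
        or_wald_rejects(simulate_group(n, a1, b1, cutoff, rng),
                        simulate_group(n, a2, b2, cutoff, rng))
        for _ in range(reps)
    )
    return hits / reps


# Case 2: T1 ~ Weibull(a=2, b=1), T2 ~ Weibull(a=2, b=1.5)
for f in (0.3, 0.5, 0.7):
    print(f, simulated_power(2, 1.0, 2, 1.5, f))
```

Setting the two distributions equal gives the empirical size of the test, which is a useful sanity check before comparing power across cutoffs. The same simulated samples could be passed to a Cox model fit to obtain the corresponding power for HR.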

Figure 3.1: Survival Plots of Cases 1, 2, 3 and 4. Panels: (a) Case 1, (b) Case 2, (c) Case 3, (d) Case 4.
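The four cases can be examined directly from their survival and hazard functions. The following sketch is illustrative only: it assumes that $a$ is the Weibull shape and $b$ the scale, so that $S(t) = \exp(-(t/b)^a)$, a parameterization that is not confirmed by this part of the text, and all function names are hypothetical. It computes the cutoff $C$ for a target $F_1(C)$ and the hazard ratio $h_2(t)/h_1(t)$, which is constant over time only when the hazards are proportional.

```python
import math

# Assumed parameterization: a = shape, b = scale, S(t) = exp(-(t / b) ** a).
CASES = {
    1: ((2.0, 1.0), (2.0, 1.2)),
    2: ((2.0, 1.0), (2.0, 1.5)),
    3: ((2.0, 1.0), (3.0, 1.2)),
    4: ((2.0, 1.0), (3.0, 1.5)),
}


def weibull_surv(t, a, b):
    """Survival function S(t) under the assumed parameterization."""
    return math.exp(-((t / b) ** a))


def weibull_hazard(t, a, b):
    """Hazard function h(t) = (a / b) * (t / b) ** (a - 1)."""
    return (a / b) * (t / b) ** (a - 1.0)


def cutoff_time(f_target, a, b):
    """Time C at which the failure probability F(C) equals f_target."""
    return b * (-math.log(1.0 - f_target)) ** (1.0 / a)


def hazard_ratio(t, case):
    """Ratio h2(t) / h1(t) for one of the four simulated cases."""
    g1, g2 = CASES[case]
    return weibull_hazard(t, *g2) / weibull_hazard(t, *g1)


if __name__ == "__main__":
    for case, (g1, g2) in CASES.items():
        for f1 in (0.30, 0.50, 0.70):
            c = cutoff_time(f1, *g1)
            f2 = 1.0 - weibull_surv(c, *g2)
            print(case, f1, round(c, 3), round(f2, 3), round(hazard_ratio(c, case), 3))
```

With equal shapes (Cases 1 and 2) the ratio does not depend on $t$, while with unequal shapes (Cases 3 and 4) it changes with $t$, which is the sense in which those cases are non-proportional.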

3.1 HR Estimate

Figure 3.2 shows the simulated power of HR under proportional hazards and a small effect size. It shows that power increases with increasing cutoff times.

Figure 3.2: Simulated Power of HR under Case 1. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.2)$

Figure 3.3 shows the simulated power of HR under proportional hazards and a moderate effect size. It shows that power increases with increasing cutoff times.

Figure 3.3: Simulated Power of HR under Case 2. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.5)$

Figure 3.4 shows the simulated power of HR under non-proportional hazards and a small effect size. It shows that power initially increases with increasing cutoff times, reaches a maximum and then starts to decrease. The decrease is observed when the cutoff is such that $F_1(C) =$ […].

Figure 3.4: Simulated Power of HR under Case 3. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.3)$

Figure 3.5 shows the simulated power of HR under non-proportional hazards and a moderate effect size. It shows that power increases with increasing cutoff times.

Figure 3.5: Simulated Power of HR under Case 4. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.5)$

3.2 OR Estimate

Figure 3.6 shows the simulated power of OR under proportional hazards and a small effect size. It shows that power increases with increasing cutoff times until $F_1(C) =$ […], after which it decreases.

Figure 3.6: Simulated Power of OR under Case 1. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.2)$

Figure 3.7 shows the simulated power of OR under proportional hazards and a moderate effect size. It shows that power increases with increasing cutoff times.

Figure 3.7: Simulated Power of OR under Case 2. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.5)$

Figure 3.8 shows the simulated power of OR under non-proportional hazards and a small effect size. It shows that power initially increases with increasing cutoff times, reaches a maximum and then starts to decrease. The decrease is observed when the cutoff is such that $F_1(C) =$ […].

Figure 3.8: Simulated Power of OR under Case 3. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.3)$

Figure 3.9 shows the simulated power of OR under non-proportional hazards and a moderate effect size. It shows that power increases with increasing cutoff times but starts to decrease at $F_1(C) =$ […].

Figure 3.9: Simulated Power of OR under Case 4. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.5)$

3.3 HR versus OR

Figure 3.10 shows a comparison of the performance of HR and OR at increasing cutoffs. Their performance is on par up to $F_1(C) =$ […]. Afterwards, HR beats OR, and the difference becomes more apparent as $F_1(C)$ increases from […].

Figure 3.10: Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times, where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.2)$

Figure 3.11 shows a comparison of the performance of HR and OR at increasing cutoffs. Their performance is on par up to $F_1(C) =$ […]. Afterwards, HR beats OR, and the difference becomes more apparent as $F_1(C)$ increases from […].

Figure 3.11: Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times, where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.5)$

Figure 3.12 shows that the power of HR and OR appears to be equal in studies with a short cutoff time. After $P_1(C) = 0.30$, the power curves start to separate. A small difference in power occurs from $P_1(C) = 0.30$ up to 0.60, while the difference is apparent for $P_1(C) > 0.60$, favoring HR.

Figure 3.12: Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times, where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.3)$

Figure 3.13 shows that the power of HR and OR remains equal up to a longer cutoff time. HR and OR perform equally up to $P_1(C) =$ […]. Slightly better power for HR is observed from $P_1(C) = 0.40$ up to 0.70, and a clear separation favoring HR is observed afterwards.

Figure 3.13: Power Curves of HR (solid line) versus OR (dashed line) Over Increasing Study Cutoff Times, where $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.5)$

3.4 Discussion

In comparing HR with OR, cutoff time seems to be the key factor that determines their relative performance. Under the proportional hazards assumption, HR's performance increases with increasing cutoffs. Under non-proportional hazards, HR's performance does not necessarily increase with the cutoff. Another observation is that their performance is determined by the maximum separation. This is seen in OR's performance: its power increases and seems to peak at the point where the maximum difference occurs. After the maximum, its performance deteriorates.

As the cutoff time increases up to the maximum difference, the performance of OR and HR is almost equal. When the maximum separation occurs earlier than the cutoff and the hazards are non-proportional, the performance of both HR and OR deteriorates. Moreover, OR's performance seems to deteriorate more quickly than HR's.
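The kind of comparison discussed in this chapter can be sketched with a small simulation. This is an illustration under stated assumptions, not the code used for the reported figures: the log-rank test (the score test of the Cox model with a single group indicator) stands in for the CPH-based test of HR, the OR test is a Wald test on the 2x2 table with 0.5 added to every cell, $a$ is taken as the Weibull shape and $b$ as the scale, all subjects are censored only at the cutoff, and every function name is hypothetical.

```python
import math
import random


def draw_weibull(rng, n, a, b):
    # random.weibullvariate takes (scale, shape); here a = shape, b = scale.
    return [rng.weibullvariate(b, a) for _ in range(n)]


def logrank_z(times1, times2, cutoff):
    """Log-rank z statistic with administrative censoring at the cutoff."""
    events = sorted([(t, 1) for t in times1 if t <= cutoff]
                    + [(t, 2) for t in times2 if t <= cutoff])
    at_risk1, at_risk2 = len(times1), len(times2)
    obs_minus_exp, variance = 0.0, 0.0
    for _, group in events:
        p2 = at_risk2 / (at_risk1 + at_risk2)  # share of group 2 in the risk set
        obs_minus_exp += (1.0 if group == 2 else 0.0) - p2
        variance += p2 * (1.0 - p2)
        if group == 2:
            at_risk2 -= 1
        else:
            at_risk1 -= 1
    return obs_minus_exp / math.sqrt(variance) if variance > 0 else 0.0


def log_or_z(times1, times2, cutoff):
    """Wald z statistic for the log odds ratio of an event by the cutoff."""
    d1 = sum(t <= cutoff for t in times1)
    d2 = sum(t <= cutoff for t in times2)
    # Adding 0.5 to every cell avoids division by zero in sparse tables.
    a, b = d2 + 0.5, len(times2) - d2 + 0.5
    c, d = d1 + 0.5, len(times1) - d1 + 0.5
    log_or = math.log((a * d) / (b * c))
    return log_or / math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def simulate_power(g1, g2, f1_cutoff, n=100, reps=200, seed=1):
    """Rejection rates (log-rank, OR) at the two-sided 0.05 level."""
    rng = random.Random(seed)
    cutoff = g1[1] * (-math.log(1.0 - f1_cutoff)) ** (1.0 / g1[0])
    hits_hr = hits_or = 0
    for _ in range(reps):
        t1 = draw_weibull(rng, n, *g1)
        t2 = draw_weibull(rng, n, *g2)
        hits_hr += abs(logrank_z(t1, t2, cutoff)) > 1.96
        hits_or += abs(log_or_z(t1, t2, cutoff)) > 1.96
    return hits_hr / reps, hits_or / reps


if __name__ == "__main__":
    for f1 in (0.30, 0.50, 0.70, 0.90):
        print(f1, simulate_power((2.0, 1.0), (3.0, 1.5), f1))
```

Calling `simulate_power` over a grid of `f1_cutoff` values for each case gives power curves of the kind compared in Section 3.3; a larger `reps` reduces the Monte Carlo noise at the cost of run time.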

CHAPTER 4

METHOD SELECTION DIAGNOSTICS

In this chapter, we propose method selection diagnostics for choosing between HR and OR. The objective is to help experimenters validate their choice of method for the analysis. Two major factors are considered in the diagnostics: (1) cutoff time and (2) maximum separation.

4.1 Cutoff Time C

Cutoff time, $C$, has been shown to be one of the major factors affecting the performance of HR and OR. For HR, the cutoff time $C$ has a direct effect on the standard error of $\hat\beta$. The standard error depends on $P(\delta = 1)\,E[V(X \mid T, \delta = 1)]$, which increases with cutoff time, so the standard error decreases as the study lengthens. Hence, longer study times have, in general, a positive effect on HR's performance.

For OR, the cutoff time $C$ affects both $\hat\beta$ and the standard error of $\hat\beta$. Recall that

$$OR = \frac{P_2/(1 - P_2)}{P_1/(1 - P_1)} = e^{\beta}$$

where $P_i = P(T_i \le C)$. The estimated OR (and thus $\hat\beta$) changes with the cutoff time. Its maximum performance with respect to cutoff time is somewhere in the middle, where $P_1(C)$ is bounded away from 0 and 1 (see Chapter 3).

Therefore, one of the underlying aspects of the diagnostic is to know the cutoff time and evaluate whether it corresponds to a short, moderate or long study length. To measure the length of follow-up, we take $P_1(C)$ to be the lower of the two groups' event proportions. We define the length of study as

short when $P_1(C) \le 0.30$,
moderate when $0.30 < P_1(C) \le 0.70$,
long when $P_1(C) > 0.70$.

For example, suppose $P_{\text{treatment}} = 0.45$ and $P_{\text{control}} =$ […]. Since the lower of the two is $P_{\text{treatment}} = 0.45$, we denote it by $P_1(C) = 0.45$.

4.2 Maximum Separation

A second factor in the relative performance of HR versus OR is the maximum separation. Suppose that $T_1$ and $T_2$ are the times-to-event of the two groups. The difference in the proportion of events between the two groups changes over time. Maximum separation is achieved when the difference in the proportions is largest. Early and late separation refer to where the maximum difference, $P_2(t) - P_1(t)$, occurs. Survival distributions with late separation include all proportional hazards distributions but may also include non-proportional hazards distributions. Recall that under the CPH model,

$$h_2(t) = h_1(t)\,e^{\beta} \qquad (4.1)$$

This implies that, since $S(t) = e^{-H(t)}$ with $H(t)$ the cumulative hazard,

$$S_2(t) = S_1(t)^{e^{\beta}} \qquad (4.2)$$

Equation 4.2 is often referred to as the Lehmann alternative, which is an alternative representation of the CPH model. It can be shown that under the Lehmann alternative, the maximum difference between the survival functions $S_1(t)$ and $S_2(t)$ occurs at $t$ such that $S_1(t) < 0.4$ (see Martinez and Naranjo [2010]).
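The location of the maximum under the Lehmann alternative can be worked out in closed form. Writing $s = S_1(t)$ and $\theta = e^{\beta}$, the difference $s^{\theta} - s$ has derivative $\theta s^{\theta - 1} - 1$, which vanishes at $s^{*} = \theta^{1/(1-\theta)}$. The sketch below assumes $0 < \theta < 1$ (group 2 has the lower hazard), which is an assumption of this illustration and not a statement from the cited reference; the function names are hypothetical. It compares the closed form with a brute-force grid search.

```python
import math


def s1_at_max_separation(theta):
    """Value of S1(t) where S1(t)**theta - S1(t) is largest, for 0 < theta < 1."""
    if not 0.0 < theta < 1.0:
        raise ValueError("this sketch assumes 0 < theta < 1")
    return theta ** (1.0 / (1.0 - theta))


def s1_at_max_separation_grid(theta, steps=10000):
    """Brute-force check of the closed form on a grid of S1 values."""
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=lambda s: s ** theta - s)


if __name__ == "__main__":
    # 1/1.5**2 and 1/1.2**2 are the hazard ratios of Cases 2 and 1 when
    # a is the Weibull shape and b the scale (an assumed parameterization).
    for theta in (0.2, 1 / 1.5 ** 2, 1 / 1.2 ** 2, 0.9):
        print(round(theta, 3), round(s1_at_max_separation(theta), 4))
```

For every $\theta$ in $(0, 1)$ the closed form stays below $e^{-1} \approx 0.368$, because $\ln\theta < \theta - 1$, which is consistent with the bound $S_1(t) < 0.4$ quoted above.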

Figure 4.1: Example of Maximum Separation between Two Weibulls

By extension of the late treatment differences, the maximum difference between $P_1$ and $P_2$ occurs at $t$ such that $P_1(t) > 0.6$. Thus, by using some information on $S(t)$ or $P(t)$, a test can be devised to detect an early or late effect.

4.2.1 Q-Test

The Q-test (Martinez and Naranjo, 2010) uses $\hat S_i$ to detect early or late treatment differences. The test statistic is

$$Q = [\hat S_2(t_{0.6}) - \hat S_1(t_{0.6})] - [\hat S_2(t_{0.2}) - \hat S_1(t_{0.2})] \qquad (4.3)$$

where $\hat S_i(t)$ is the survival estimate at time $t$ for group $i$, $t_{0.6}$ is the time such that $\hat S_1(t) = 0.60$, and $t_{0.2}$ is the time such that $\hat S_1(t) = 0.20$.

The test concludes a late effect if $Q < 0$ and an early effect otherwise. The Q-test, however, requires $\hat S_1(t_{0.2})$, which is not available if a study ends when $S_1(C) > 0.20$. For example, suppose a study is designed such that $S_1(C) = 0.40$ and $S_1(C) > S_2(C)$. In this scenario, the Q-test cannot be computed.

The choice of 0.6 and 0.2 in the Q-test is arbitrary, as long as the point of maximum difference at 0.4 is captured between them and the chosen quantiles are bounded away from 0 and 1. Hence, the test can be modified to accommodate the restrictions in our survival experiment setup. Instead of 0.2, for example, any value of 0.4 or lower can be chosen, and instead of 0.6, any value higher than 0.4 can be chosen. Therefore, pairs such as (0.75, 0.25) or (0.6, 0.3) can be used to test for an early or late effect.

4.2.2 $Q_m$, Modified Q-Test

We propose a modified Q-test to discriminate whether the maximum separation occurred before or after the cutoff. Let

$$Q_m = [\hat S_2(t_{P_1/2}) - \hat S_1(t_{P_1/2})] - [\hat S_2(t_{P_1}) - \hat S_1(t_{P_1})] \qquad (4.4)$$

where $\hat S_i(t)$ is the survival estimate at time $t$ for group $i$, $t_{P_1}$ is the maximum time observed for group 1, and $t_{P_1/2}$ is the time such that $P_1(t_{P_1/2}) = P_1(C)/2$.

When $Q_m < 0$, the $Q_m$-test indicates that the difference between $S_1$ and $S_2$ is greater at time $C$ than before $C$. To test whether the maximum separation occurs before the cutoff or not:

if $Q_m \ge 0$, then the maximum separation is before the cutoff time;
if $Q_m < 0$, then the maximum separation is after the cutoff time.
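The two statistics in Equations 4.3 and 4.4 can be illustrated on known survival curves. The sketch below is a population-level illustration only: it plugs true Weibull survival functions in place of the estimates $\hat S_i$, uses the cutoff $C$ in place of the maximum observed time $t_{P_1}$, and assumes $a$ is the shape and $b$ the scale. The function names are hypothetical.

```python
import math


def surv(t, a, b):
    """Weibull survival function, assuming a = shape and b = scale."""
    return math.exp(-((t / b) ** a))


def time_at_surv(s, a, b):
    """Time t at which the survival function equals s."""
    return b * (-math.log(s)) ** (1.0 / a)


def q_stat(g1, g2, hi=0.6, lo=0.2):
    """Population version of Q: difference at S1 = hi minus difference at S1 = lo."""
    t_hi = time_at_surv(hi, *g1)
    t_lo = time_at_surv(lo, *g1)
    return (surv(t_hi, *g2) - hi) - (surv(t_lo, *g2) - lo)


def qm_stat(g1, g2, f1_cutoff):
    """Population version of Q_m for a cutoff C with F1(C) = f1_cutoff."""
    t_c = time_at_surv(1.0 - f1_cutoff, *g1)           # stands in for t_{P1}
    t_half = time_at_surv(1.0 - f1_cutoff / 2.0, *g1)  # F1(t) = F1(C) / 2
    diff_half = surv(t_half, *g2) - surv(t_half, *g1)
    diff_c = surv(t_c, *g2) - surv(t_c, *g1)
    return diff_half - diff_c


if __name__ == "__main__":
    g1 = (2.0, 1.0)
    for label, g2 in (("a=3, b=1.3", (3.0, 1.3)), ("a=2, b=1.5", (2.0, 1.5))):
        print(label, "Q =", round(q_stat(g1, g2), 4))
        for f1 in (0.3, 0.5, 0.8):
            print("  F1(C) =", f1, "Q_m =", round(qm_stat(g1, g2, f1), 4))
```

A positive value of `q_stat` corresponds to an early effect and a negative value to a late effect; likewise, `qm_stat` is negative when the curves are still separating at the cutoff. In practice the estimates $\hat S_i$ would replace the true curves, so the sign is subject to sampling variability.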

A simulation study is conducted to test the power of the $Q_m$ test in detecting whether the maximum separation occurs after the cutoff. Table 4.1 shows the result of the simulation for two cases with cutoffs from $F_1(C) = 0.20$ up to $F_1(C) =$ […]:

1. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 3, b = 1.3)$
2. $T_1 \sim \text{Weibull}(a = 2, b = 1)$ and $T_2 \sim \text{Weibull}(a = 2, b = 1.5)$

Table 4.1: Power of $Q_m$ for Detecting Maximum after C
Columns: $F_1(C)$, Early Separation, Late Separation

The simulation shows that $Q_m$ detects a maximum separation after the cutoff more often when there is a late separation than when there is an early separation.

4.3 Selection Diagnostics

4.3.1 $\hat P_1$ and Q-Test

In this section, we present a diagnostic that helps the experimenter choose between HR and OR by answering the following questions:

Does the study have a short, medium or long study length?
Does the maximum separation occur early or late?

Figure 4.2 summarizes our proposed diagnostic, which (i) uses $\hat P_1$ to measure the length of the study and (ii) uses the Q-test to determine an early or late effect.

Figure 4.2: Diagnostic Using $\hat P_1$ and Q

The x-axis indicates when the maximum separation occurred; the Q-test can be used to answer this question formally. The y-axis indicates the length of the study: $\hat F_1(C) \le 0.30$ denotes a short cutoff time, $0.30 < \hat F_1(C) \le 0.70$ a medium-length cutoff time, and $\hat F_1(C) > 0.70$ a long cutoff time. Figure 4.2 maps out which case the experiment falls into.

1. Late effect and short study ($F_1(C) \le 0.30$). Under this scenario, HR and OR perform equally well.
2. Late effect and moderate study ($0.30 < F_1(C) \le 0.70$). Under this scenario, HR and OR perform equally well.
3. Late effect and long study ($F_1(C) > 0.70$). Under this scenario, HR outperforms OR.
4. Early effect and short study ($F_1(C) \le 0.30$). Under this scenario, HR and OR perform equally well.
5. Early effect and moderate study ($0.30 < F_1(C) \le 0.70$). Under this scenario, HR moderately outperforms OR.
6. Early effect and long study ($F_1(C) > 0.70$). Under this scenario, HR outperforms OR.
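The six scenarios above amount to a small lookup table. The following sketch encodes them as stated; the function and variable names are hypothetical, and the handling of the boundaries at 0.30 and 0.70 follows the study-length definition of Section 4.1 as an assumption.

```python
def study_length(f1_c):
    """Classify the study length from the lower event proportion at the cutoff."""
    if f1_c <= 0.30:
        return "short"
    if f1_c <= 0.70:
        return "moderate"
    return "long"


# Scenario table: (timing of effect, study length) -> relative performance.
RECOMMENDATION = {
    ("late", "short"): "HR and OR perform equally well",
    ("late", "moderate"): "HR and OR perform equally well",
    ("late", "long"): "HR outperforms OR",
    ("early", "short"): "HR and OR perform equally well",
    ("early", "moderate"): "HR moderately outperforms OR",
    ("early", "long"): "HR outperforms OR",
}


def diagnose(f1_c, q):
    """Map the cutoff proportion and the Q statistic to one of the six scenarios."""
    effect = "late" if q < 0 else "early"  # the Q-test concludes late when Q < 0
    return RECOMMENDATION[(effect, study_length(f1_c))]


if __name__ == "__main__":
    print(diagnose(0.25, -0.10))
    print(diagnose(0.50, 0.05))
    print(diagnose(0.80, -0.10))
```

Only the sign of the Q statistic enters the lookup, so the sketch treats the Q-test as a classifier and ignores its sampling uncertainty.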

4.3.2 $\hat P_1$ and $Q_m$-Test

In this subsection, the experimenter evaluates the relative performance of HR and OR by answering the following questions:

Does the study have a short, medium or long study length?
Did the maximum separation occur earlier than the cutoff?

Figure 4.3 summarizes the diagnostics using $\hat P_1$ and $Q_m$.

Figure 4.3: Diagnostic Using $\hat P_1$ and $Q_m$

Although similar to the first selection scheme, the main difference is in how it determines when the maximum separation occurs. Here, we instead ask the question "Does the maximum occur before or after the study cutoff?" Using the scheme, the experimenter can map which case the experiment falls into. Ideally, an optimal experiment is designed so that the cutoff is around the maximum. If the maximum occurs before the cutoff, then HR beats OR, and if the maximum occurs after $C$, then HR and OR are almost equivalent.

4.3.3 Example

We illustrate the use of the scheme on the AML data. Figure 4.4 plots the estimated survival curves of the data, and the estimates are shown below. The failure estimates are computed by taking $F = 1 - S$.

    x=maintained
    time  n.risk  n.event  survival  failure

    x=nonmaintained
    time  n.risk  n.event  survival  failure

Figure 4.4: Survival Plot of the AML data

By the end of the 30-week period, 0.444 and 0.7 of the patients have had a relapse in the maintained and non-maintained groups, respectively. Since the lower of the two is 0.444, we denote $F_1(30) = 0.444$, and hence the study has a medium cutoff. Using the Q-test with (0.5, 0.3) as the points of comparison, Q = […] − […] = […]. Using the $Q_m$-test, where the first comparison point satisfies $\hat S_1 = 1 - (1 - 0.556)/2$, $Q_m$ = […] − […] = […]. The Q-test concludes that there is an early effect, while the $Q_m$-test concludes that the maximum occurred before week 30. Since the study has a moderate length and an early effect, or a maximum occurring before the cutoff, OR's performance is slightly lower than HR's.

The output below shows the results of using CPH and logistic regression to estimate HR and OR, respectively. As expected, the p-value for HR is smaller than that for OR, although neither is significant at the 0.05 level.

    coxph(formula = Surv(time, status) ~ x, data = aml3)
                    coef  exp(coef)  se(coef)  z  p
    xnonmaintained

    glm(formula = status ~ x, family = binomial, data = aml3)
    Coefficients:
                    Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)
    xnonmaintained


More information

On a connection between the Bradley-Terry model and the Cox proportional hazards model

On a connection between the Bradley-Terry model and the Cox proportional hazards model On a connection between the Bradley-Terry model and the Cox proportional hazards model Yuhua Su and Mai Zhou Department of Statistics University of Kentucky Lexington, KY 40506-0027, U.S.A. SUMMARY This

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/25 Residuals for the

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

GOV 2001/ 1002/ E-2001 Section 10 1 Duration II and Matching

GOV 2001/ 1002/ E-2001 Section 10 1 Duration II and Matching GOV 2001/ 1002/ E-2001 Section 10 1 Duration II and Matching Mayya Komisarchik Harvard University April 13, 2016 1 Heartfelt thanks to all of the Gov 2001 TFs of yesteryear; this section draws heavily

More information

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data Outline Frailty modelling of Multivariate Survival Data Thomas Scheike ts@biostat.ku.dk Department of Biostatistics University of Copenhagen Marginal versus Frailty models. Two-stage frailty models: copula

More information

Statistical Methods for Alzheimer s Disease Studies

Statistical Methods for Alzheimer s Disease Studies Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

5. Parametric Regression Model

5. Parametric Regression Model 5. Parametric Regression Model The Accelerated Failure Time (AFT) Model Denote by S (t) and S 2 (t) the survival functions of two populations. The AFT model says that there is a constant c > 0 such that

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample

More information

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

More information

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor

More information

Likelihood Construction, Inference for Parametric Survival Distributions

Likelihood Construction, Inference for Parametric Survival Distributions Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make

More information

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study

TECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,

More information

Step-Stress Models and Associated Inference

Step-Stress Models and Associated Inference Department of Mathematics & Statistics Indian Institute of Technology Kanpur August 19, 2014 Outline Accelerated Life Test 1 Accelerated Life Test 2 3 4 5 6 7 Outline Accelerated Life Test 1 Accelerated

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Non-parametric Mediation Analysis for direct effect with categorial outcomes

Non-parametric Mediation Analysis for direct effect with categorial outcomes Non-parametric Mediation Analysis for direct effect with categorial outcomes JM GALHARRET, A. PHILIPPE, P ROCHET July 3, 2018 1 Introduction Within the human sciences, mediation designates a particular

More information

Multistate Modeling and Applications

Multistate Modeling and Applications Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Pubh 8482: Sequential Analysis

Pubh 8482: Sequential Analysis Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 10 Class Summary Last time... We began our discussion of adaptive clinical trials Specifically,

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/25 Right censored

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL The Cox PH model: λ(t Z) = λ 0 (t) exp(β Z). How do we estimate the survival probability, S z (t) = S(t Z) = P (T > t Z), for an individual with covariates

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

Survival Distributions, Hazard Functions, Cumulative Hazards

Survival Distributions, Hazard Functions, Cumulative Hazards BIO 244: Unit 1 Survival Distributions, Hazard Functions, Cumulative Hazards 1.1 Definitions: The goals of this unit are to introduce notation, discuss ways of probabilistically describing the distribution

More information

Building a Prognostic Biomarker

Building a Prognostic Biomarker Building a Prognostic Biomarker Noah Simon and Richard Simon July 2016 1 / 44 Prognostic Biomarker for a Continuous Measure On each of n patients measure y i - single continuous outcome (eg. blood pressure,

More information

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data 1 Part III. Hypothesis Testing III.1. Log-rank Test for Right-censored Failure Time Data Consider a survival study consisting of n independent subjects from p different populations with survival functions

More information

Multistate models in survival and event history analysis

Multistate models in survival and event history analysis Multistate models in survival and event history analysis Dorota M. Dabrowska UCLA November 8, 2011 Research supported by the grant R01 AI067943 from NIAID. The content is solely the responsibility of the

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status

Ph.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 25 April 2013 www.biostat.ku.dk/~pka/regrmodels13 Per Kragh Andersen Regression models The distribution of one outcome variable

More information

7.1 The Hazard and Survival Functions

7.1 The Hazard and Survival Functions Chapter 7 Survival Models Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

PubH 7405: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION

PubH 7405: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION PubH 745: REGRESSION ANALYSIS INTRODUCTION TO LOGISTIC REGRESSION Let Y be the Dependent Variable Y taking on values and, and: π Pr(Y) Y is said to have the Bernouilli distribution (Binomial with n ).

More information

General Regression Model

General Regression Model Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical

More information

Modeling Arbitrarily Interval-Censored Survival Data with External Time-Dependent Covariates

Modeling Arbitrarily Interval-Censored Survival Data with External Time-Dependent Covariates University of Northern Colorado Scholarship & Creative Works @ Digital UNC Dissertations Student Research 12-9-2015 Modeling Arbitrarily Interval-Censored Survival Data with External Time-Dependent Covariates

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information