# Typical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction


## Transcription

**CHL 5225H Advanced Statistical Methods for Clinical Trials: Survival Analysis**

Prof. Kevin E. Thorpe, Dalla Lana School of Public Health, University of Toronto

### Outline

- Defining Survival Data
- Mathematical Definitions
- Non-parametric Estimates of Survival
- Comparing Survival
- Measuring Treatment Effect
- Parametric Estimates of Survival
- References

### Defining Survival Data

#### Introduction

Interest centres on patients who are at risk for some event and on the time to the occurrence of that event.

Examples:

- Time to death
- Time to stroke
- Time to disease recurrence

#### Definition of Failure Time

Three requirements must be met to determine failure time precisely:

- an unambiguous time origin
- a scale for measuring the passage of time
- a failure event whose meaning is entirely clear

#### Censoring

Patients who are not observed to failure are called censored.

Some causes:

- the study concludes before all patients destined to fail have actually failed
- a patient withdraws from the study before failure is observed

#### Typical Survival Data Arising From a Clinical Trial

[Figure: typical survival data arising from a clinical trial]

### Mathematical Definitions

#### Introduction

Let the failure time be represented by a non-negative random variable $T$. Usually, we consider continuous distributions. Discrete and mixed discrete-continuous distributions can also be handled.

#### The Survivor Function

The survivor function $F(t)$ is defined to be

$$F(t) = \Pr(T \ge t)$$

It is sometimes useful to consider the cumulative risk of an event. The cumulative risk function is defined by

$$R(t) = 1 - F(t)$$

Note that this definition of $F(t)$ leads to a left-continuous CDF, $R(t) = \Pr(T < t)$, rather than the usual right-continuous one.
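The effect of defining the survivor function with $\ge$ rather than $>$ can be seen in a small sketch. The failure times below are hypothetical and uncensored, and the helper names `survivor` and `cum_risk` are illustrative only.

```r
# Hypothetical failure times (no censoring), for illustration only.
times <- c(2, 5, 5, 8, 12)

# Empirical survivor function F(t) = Pr(T >= t); note ">=" rather than ">".
survivor <- function(t, x) mean(x >= t)

# Cumulative risk R(t) = 1 - F(t) = Pr(T < t).
cum_risk <- function(t, x) 1 - survivor(t, x)

survivor(5, times)      # 0.8: the two failures at t = 5 still count at t = 5
survivor(5.001, times)  # 0.4: they drop out immediately afterwards
cum_risk(5, times)      # 0.2
```

The jump in the survivor function happens just after each failure time, which is the left-continuity noted above.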

#### Probability Density Function

Continuous $T$:

$$f(t) = -F'(t) = \lim_{\Delta \to 0^+} \frac{\Pr(t \le T < t + \Delta)}{\Delta}$$

which gives

$$F(t) = \int_t^\infty f(u)\,du$$

Discrete $T$: usually handled by assigning to the density a component $f_j\,\delta(t - a_j)$ for an atom $f_j$ at $a_j$, where $\delta(\cdot)$ is the Dirac delta function.

#### Hazard Function: Continuous Distributions

The hazard function (also called the age-specific failure rate or force of mortality) is defined by

$$h(t) = \lim_{\Delta \to 0^+} \frac{\Pr(t \le T < t + \Delta \mid t \le T)}{\Delta}$$

By the definition of conditional probability,

$$h(t) = f(t)/F(t)$$

Sometimes the hazard is denoted by $\lambda(t)$.

#### Hazard Function: Continuous Distributions (continued)

By the definition of $f(t)$ and the fact that $F(0) = 1$ we have

$$h(t) = -F'(t)/F(t) = -\frac{d \log F(t)}{dt}$$

and

$$F(t) = \exp\left(-\int_0^t h(u)\,du\right) = \exp[-H(t)]$$

where $H(\cdot)$ is called the integrated hazard. Finally,

$$f(t) = h(t)\exp[-H(t)]$$

#### Hazard Function: Discrete Distributions

If there is an atom $f_j$ of probability at time $a_j$ then

$$h_j = f_j / F(a_j) = f_j / (f_j + f_{j+1} + \cdots)$$

With this definition it can be shown that

$$F(t) = \prod_{a_j < t} (1 - h_j)$$

By defining the integrated hazard as

$$H(t) = -\sum_{a_j < t} \log(1 - h_j)$$

we still have

$$F(t) = \exp[-H(t)]$$
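The discrete-hazard identities can be checked numerically. The following is a minimal sketch with a made-up distribution; the atoms `a`, the probabilities `f`, and the function names are assumptions for illustration.

```r
# Hypothetical discrete failure-time distribution: atoms f_j at times a_j.
a <- c(1, 2, 4, 7)
f <- c(0.1, 0.3, 0.4, 0.2)   # probabilities sum to 1

# Survivor function at each atom: F(a_j) = Pr(T >= a_j) = f_j + f_{j+1} + ...
F_at_atoms <- rev(cumsum(rev(f)))

# Discrete hazard h_j = f_j / F(a_j).
h <- f / F_at_atoms

# Product formula: F(t) = prod over a_j < t of (1 - h_j).
surv_product <- function(t) prod(1 - h[a < t])

# Integrated hazard: H(t) = -sum over a_j < t of log(1 - h_j).
int_hazard <- function(t) -sum(log(1 - h[a < t]))

# Direct definition for comparison: F(t) = Pr(T >= t).
surv_direct <- function(t) sum(f[a >= t])

t0 <- 4
c(product = surv_product(t0),
  exp_H   = exp(-int_hazard(t0)),
  direct  = surv_direct(t0))
```

All three entries should agree, which is the sense in which $F(t) = \exp[-H(t)]$ carries over from the continuous case.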

#### The Likelihood Function

We assume each patient has a failure time $t_i$ and a censoring time $c_i$ associated with them. In practice, we observe $x_i = \min(t_i, c_i)$. Writing $u$ for the uncensored and $c$ for the censored observations,

$$L = \prod_u f(t_i; \phi) \prod_c F(c_i; \phi)$$

$$l = \sum_u \log f(t_i; \phi) + \sum_c \log F(c_i; \phi)$$

$$l = \sum_u \log h(x_i; \phi) + \sum \log F(x_i; \phi)$$

where the last sum runs over all observations.

#### Survival Models

Suppose that, in addition to survival, some covariates $z$ are measured with the intent of modelling their association with survival. For some baseline hazard function $h_0(t)$ (or survivor function $F_0(t)$) we have the following commonly used models.

The Accelerated Life model is defined by

$$F(t; z) = F_0[t\psi(z)] \quad \text{or} \quad h(t; z) = h_0[t\psi(z)]\,\psi(z)$$

The choice of $F_0$ is dictated by a choice of parametric model.

The Proportional Hazard model is defined by

$$F(t; z) = [F_0(t)]^{\psi(z)} \quad \text{or} \quad h(t; z) = \psi(z)\,h_0(t)$$

The baseline hazard does not need to be explicitly known for this model.

### Non-parametric Estimates of Survival

#### The Likelihood Function

We begin by assuming a discrete distribution with atoms of probability at $g$ unique failure times $t_1, t_2, \ldots, t_g$. It can be shown that the log-likelihood is

$$l = \sum_j \left[ d_j \log h_j + (r_j - d_j) \log(1 - h_j) \right]$$

where $r_j$ is the number of patients in view at time $t_j$ and $d_j$ is the number who fail at $t_j$. By convention, $r_j$ includes individuals who are censored at $t_j$.

This is identical to the log-likelihood for $g$ independent binomials with $r_j$ trials, $d_j$ failures and probability of failure $h_j$. This fact is used in the derivation of the results on the next slide.

#### Product-Limit or Kaplan-Meier Estimate

Suppose $g$ unique failure times are observed. Put them in order from smallest to largest: $t_1, t_2, \ldots, t_g$. The survivor function is estimated at time $t_j$ by

$$\hat F(t_j) = \prod_{i=1}^{j} \left(1 - \frac{d_i}{r_i}\right) = \prod_{i=1}^{j} \left(\frac{r_i - d_i}{r_i}\right)$$

Note the similarity between the first expression above and the expression for a discrete survivor function, $F(t) = \prod_{a_j < t} (1 - h_j)$, given previously.

The variance of $\hat F(t)$ at time $t_j$ is (Greenwood's formula)

$$V(\hat F(t_j)) = [\hat F(t_j)]^2 \sum_{i=1}^{j} \frac{d_i}{r_i (r_i - d_i)}$$
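The product-limit estimate and Greenwood's formula can be computed directly from their definitions. The sketch below uses base R only; the data and the function name `km_by_hand` are hypothetical and not part of the course material.

```r
# Hypothetical data: observed time x_i = min(t_i, c_i);
# status 1 = failure observed, 0 = censored.
x      <- c(3, 5, 5, 8, 10, 12, 15, 15, 20, 22)
status <- c(1, 1, 0, 1,  0,  1,  1,  0,  1,  0)

km_by_hand <- function(x, status) {
  tj <- sort(unique(x[status == 1]))                       # unique failure times
  r  <- sapply(tj, function(t) sum(x >= t))                # at risk; includes those censored at t_j
  d  <- sapply(tj, function(t) sum(x == t & status == 1))  # failures at t_j
  surv <- cumprod(1 - d / r)                               # product-limit estimate
  se   <- surv * sqrt(cumsum(d / (r * (r - d))))           # Greenwood standard error
  data.frame(time = tj, n.risk = r, n.event = d, surv = surv, se = se)
}

km <- km_by_hand(x, status)
km
```

One caveat of this sketch: if the largest observation is a failure, then $r_g = d_g$ and the last Greenwood term is undefined, so the example data end with a censored observation.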

Suppose we had the following data (+ indicates censored observations).

[Data listing not preserved in the transcription]

| $t_j$ | $r_j$ | $d_j$ | $P(t_j) = 1 - d_j/r_j$ | $\hat F(t_j)$ | SE |
|---|---|---|---|---|---|
| | 27 | | | | |
| | 25 | | | | |
| | 21 | | | | |
| | 19 | | | | |
| | 17 | | | | |
| | 13 | | | | |
| | 12 | | | | |
| | 11 | | | | |
| | 10 | | | | |
| | 6 | | | | |
| | 4 | | | | |

[Figure: Kaplan-Meier Survivor Function with Greenwood 95% Confidence Interval; x-axis: Days, y-axis: Proportion Event Free]

#### R Code

    > library(survival)
    > eg1a.km <- survfit(Surv(days, status) ~ 1, data = eg1a)
    > summary(eg1a.km)
    > plot(eg1a.km, mark.time = FALSE, xlab = "Days",
    +     ylab = "Proportion Event-Free",
    +     main = "Kaplan-Meier Survivor Function\n with Greenwood 95% Confidence Interval")

#### Actuarial Estimator

Suppose that instead of exact failure and censoring times we know only the numbers of failures and censored observations within time intervals. For each interval $(t_{j-1}, t_j]$ we define $r_{j-1}$ to be the number at risk at $t_{j-1}$, $d_j$ to be the number who fail during the interval and $m_j$ to be the number censored during the interval. Define the adjusted number at risk $r'_j = r_{j-1} - m_j/2$, which counts patients censored during the interval as at risk for half of it. Then the actuarial estimator of survival is

$$\tilde F(t_j) = \prod_{k \le j} \left(1 - \frac{d_k}{r'_k}\right)$$

### Comparing Survival

There are a variety of methods that can be used to compare survival in various ways.

- Point in time comparison (two groups)
- Mantel-Haenszel or Log Rank Test (two or more survival curves)
- Model-based tests (two or more groups)

#### Point in Time Comparison

The object is to compare survival between two treatments at a pre-specified time point $t_0$. From the Kaplan-Meier procedure and Greenwood's formula we have

- Treatment: $\hat F_T(t_0)$ and $V[\hat F_T(t_0)]$
- Control: $\hat F_C(t_0)$ and $V[\hat F_C(t_0)]$

Then, to conduct an approximate test of the null hypothesis $H_0: F_T(t_0) = F_C(t_0)$, we calculate the Z statistic

$$Z = \frac{\hat F_T(t_0) - \hat F_C(t_0)}{\sqrt{V[\hat F_T(t_0)] + V[\hat F_C(t_0)]}}$$

Under $H_0$, $Z \sim N(0, 1)$.

#### Mantel-Haenszel Test

Consider the two survival curves as a series of 2x2 contingency tables constructed at each time point where one or more events occur. At a time point $t_j$ we have a table of the form:

| | Treatment | Control | Total |
|---|---|---|---|
| Dead | $d_{Tj}$ | $d_{Cj}$ | $d_j$ |
| Alive | $n_{Tj} - d_{Tj}$ | $n_{Cj} - d_{Cj}$ | $N_j - d_j$ |
| Total | $n_{Tj}$ | $n_{Cj}$ | $N_j$ |

Then, from the hypergeometric distribution,

$$E(d_{Tj}) = \frac{n_{Tj}\, d_j}{N_j}$$

$$V(d_{Tj}) = \frac{n_{Tj}\, n_{Cj}\, (N_j - d_j)\, d_j}{N_j^2 (N_j - 1)}$$
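The hypergeometric mean and variance used by the Mantel-Haenszel test can be verified numerically for a single table. This is a small sketch with hypothetical counts; `dhyper()` is the base R hypergeometric density.

```r
# Hypothetical 2x2 table at one event time t_j.
n_T <- 12          # at risk, treatment
n_C <- 10          # at risk, control
d   <- 3           # total deaths at t_j
N   <- n_T + n_C

# Formulas from the text.
E_dT <- n_T * d / N
V_dT <- n_T * n_C * (N - d) * d / (N^2 * (N - 1))

# Check against the hypergeometric distribution: d_Tj is the number of
# treatment patients among the d deaths drawn from the N at risk.
k     <- 0:d
p     <- dhyper(k, m = n_T, n = n_C, k = d)
E_chk <- sum(k * p)
V_chk <- sum((k - E_chk)^2 * p)

c(E_dT = E_dT, E_chk = E_chk, V_dT = V_dT, V_chk = V_chk)
```

The closed-form values and the values computed from the distribution should coincide up to rounding error.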

#### Mantel-Haenszel Test (continued)

Define the following quantities.

$$O_T = \sum d_{Tj} \qquad E_T = \sum E(d_{Tj})$$

$$O_C = \sum d_j - O_T \qquad E_C = \sum d_j - E_T$$

$$V = \sum V(d_{Tj})$$

Now, the null hypothesis $H_0: F_T(t) = F_C(t)$ is usually tested with a $\chi^2(1)$ test, but may also be tested with a Z test.

$$\chi^2 = \frac{(O_T - E_T)^2}{V} \quad \text{and} \quad Z = \frac{O_T - E_T}{\sqrt{V}}$$

This is often called the Log Rank Test.

#### Comparing Survival in R

In R the `survdiff()` function is used to compare survival curves. This performs tests in a family described by Harrington and Fleming (Biometrika, 1982) where the weights on each event time are $[F(t)]^\rho$. The `survdiff()` function accepts an argument `rho`. If `rho = 0` (the default) you get the Mantel-Haenszel test and if `rho = 1` you get the Peto & Peto modification of the Gehan-Wilcoxon test.

    > survdiff(Surv(days, status) ~ rx, data = eg1b)
    Call:
    survdiff(formula = Surv(days, status) ~ rx, data = eg1b)

                 N Observed Expected (O-E)^2/E (O-E)^2/V
    rx=control
    rx=treatment

     Chisq= 0.9  on 1 degrees of freedom, p=

[Figure: survival curves by treatment group; x-axis: Days, y-axis: Proportion Event Free]
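To make the quantities $O_T$, $E_T$ and $V$ concrete, the sketch below accumulates them over the event times of a small hypothetical two-group data set, using base R only. The data and the function name `logrank_by_hand` are assumptions for illustration; the calculation is intended to mirror the unweighted (`rho = 0`) case.

```r
# Hypothetical two-group data (status 1 = event, 0 = censored).
days   <- c(6, 7, 10, 15, 19, 25, 5, 8, 8, 12, 16, 23)
status <- c(1, 0,  1,  1,  0,  1, 1, 1, 1,  1,  0,  1)
rx     <- rep(c("Treatment", "Control"), each = 6)

logrank_by_hand <- function(days, status, group, treated) {
  tj  <- sort(unique(days[status == 1]))   # distinct event times
  O_T <- 0; E_T <- 0; V <- 0
  for (t in tj) {
    at_risk <- days >= t
    N   <- sum(at_risk)                    # total at risk at t
    n_T <- sum(at_risk & group == treated) # at risk in the treated group
    n_C <- N - n_T
    d   <- sum(days == t & status == 1)    # total events at t
    d_T <- sum(days == t & status == 1 & group == treated)
    O_T <- O_T + d_T
    E_T <- E_T + n_T * d / N
    # The variance term is zero when a single patient remains at risk.
    if (N > 1) V <- V + n_T * n_C * (N - d) * d / (N^2 * (N - 1))
  }
  chisq <- (O_T - E_T)^2 / V
  list(O_T = O_T, E_T = E_T, V = V, chisq = chisq,
       z = (O_T - E_T) / sqrt(V),
       p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

lr <- logrank_by_hand(days, status, rx, treated = "Treatment")
lr
```

The Z statistic squared equals the chi-squared statistic, so the two forms of the test give the same two-sided p-value.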

### Measuring the Treatment Effect

Among other things, clinical trials are intended to provide an estimate of the treatment effect. For a survival outcome, two types of estimates are commonly used.

- Point in time: Risk Ratio or Risk Reduction.
- All time or whole curve: effect measures by means of Hazard Ratios.

#### Point in Time Treatment Effect

Suppose a trial is comparing a treatment $T$ to a control $C$. The usual expectation would be that the risk of the event is lower in the treatment group than in the control group. There may be situations where some clinically meaningful time point, $t_0$ say, is of interest. Consider three point in time estimates of risk reduction.

$$\text{Risk Difference} = R_C(t_0) - R_T(t_0)$$

$$\text{Risk Ratio} = R_T(t_0) / R_C(t_0)$$

$$\text{Relative Risk Reduction} = \frac{R_C(t_0) - R_T(t_0)}{R_C(t_0)} = 1 - \frac{R_T(t_0)}{R_C(t_0)}$$

#### Estimating Risk Difference

Estimate the T and C survival curves using the Kaplan-Meier procedure and compute

$$RD = \hat R_C(t_0) - \hat R_T(t_0) = (1 - \hat F_C(t_0)) - (1 - \hat F_T(t_0)) = \hat F_T(t_0) - \hat F_C(t_0)$$

Calculate $V(\hat F_T(t_0))$ and $V(\hat F_C(t_0))$ using Greenwood's formula (i.e. $SE^2$). Then, a 95% CI on the RD is (approximately):

$$RD \pm 1.96 \sqrt{V(\hat F_T(t_0)) + V(\hat F_C(t_0))}$$

#### Estimating Risk Ratio

Estimate the T and C survival curves using the Kaplan-Meier procedure and compute

$$RR = \hat R_T(t_0) / \hat R_C(t_0) = \frac{1 - \hat F_T(t_0)}{1 - \hat F_C(t_0)}$$

Calculate $V(\log \hat R_T(t_0))$ and $V(\log \hat R_C(t_0))$ using the following formula.

$$V(\log \hat R(t_0)) = \frac{V(\hat F(t_0))}{(\hat R(t_0))^2}$$

Then, a 95% CI on RR is (approximately):

$$RR \cdot \exp\left(\pm 1.96 \sqrt{V(\log \hat R_T(t_0)) + V(\log \hat R_C(t_0))}\right)$$
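The point in time effect measures and their approximate intervals follow mechanically from the two Kaplan-Meier estimates and their Greenwood variances. The sketch below uses made-up inputs; all numeric values are hypothetical.

```r
# Hypothetical Kaplan-Meier estimates at t0 and their Greenwood variances.
F_T <- 0.80; V_T <- 0.0040   # treatment
F_C <- 0.65; V_C <- 0.0055   # control

z <- qnorm(0.975)            # approximately 1.96

# Risk difference: RD = R_C - R_T = F_T - F_C.
RD    <- F_T - F_C
RD_ci <- RD + c(-1, 1) * z * sqrt(V_T + V_C)

# Risk ratio: RR = R_T / R_C, with V(log R) = V(F) / R^2.
R_T <- 1 - F_T
R_C <- 1 - F_C
RR  <- R_T / R_C
se_log_RR <- sqrt(V_T / R_T^2 + V_C / R_C^2)
RR_ci <- RR * exp(c(-1, 1) * z * se_log_RR)

# Relative risk reduction: RRR = 1 - RR, with the RR limits reversed.
RRR    <- 1 - RR
RRR_ci <- rev(1 - RR_ci)

rbind(RD = c(RD, RD_ci), RR = c(RR, RR_ci), RRR = c(RRR, RRR_ci))
```

Working on the log scale for the risk ratio keeps the interval positive and makes it asymmetric around the point estimate.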

#### Estimating Relative Risk Reduction

Estimate the T and C survival curves using the Kaplan-Meier procedure and compute

$$RRR = 1 - \frac{\hat R_T(t_0)}{\hat R_C(t_0)} = 1 - RR$$

Determine the CI for RR. Say this is $(RR_L, RR_U)$. Then the CI for RRR is $(1 - RR_U, 1 - RR_L)$.

#### Estimating Hazard Ratio

The Mantel-Haenszel test calculations enable estimation of the hazard ratio.

$$\hat\psi = \frac{O_T / E_T}{O_C / E_C}$$

$$V[\log \hat\psi] = \frac{1}{E_T} + \frac{1}{E_C}$$

Various model-based estimates are preferred where appropriate.

### Introduction to Parametric Survival Distributions

We saw earlier that the Kaplan-Meier estimate of survival is not particularly smooth. By assuming a particular parametric form for the survival distribution, smooth estimates of survival can be obtained. Some commonly used distributions in survival analysis are:

- Exponential
- Weibull
- Log-normal
- Log-logistic

We will look at the exponential and Weibull distributions in more detail.

#### The Exponential Distribution

For the exponential distribution with parameter $\rho$ and mean $1/\rho$, we have

$$F(t) = e^{-\rho t}, \quad f(t) = \rho e^{-\rho t}, \quad h(t) = \rho, \quad H(t) = \rho t$$

A plot of $H(t) = -\log(\hat F(t))$, where $\hat F(t)$ is the Kaplan-Meier estimate, versus $t$ will be linear if the exponential distribution is appropriate.

The MLE for $\rho$ is $\hat\rho = d / \sum x_i$ where $d$ is the total number of failures.
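The exponential MLE is simply the number of failures divided by the total follow-up time. The following sketch simulates censored exponential data to illustrate this; the true hazard, the censoring distribution and the seed are arbitrary assumptions.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n   <- 200
rho <- 0.05                        # hypothetical true hazard

t_i <- rexp(n, rate = rho)         # failure times
c_i <- runif(n, 0, 60)             # hypothetical censoring times
x_i <- pmin(t_i, c_i)              # observed time x_i = min(t_i, c_i)
status <- as.numeric(t_i <= c_i)   # 1 = failure observed, 0 = censored

d       <- sum(status)             # total number of failures
rho_hat <- d / sum(x_i)            # MLE: failures per unit of follow-up

# Fitted survivor function and integrated hazard.
F_hat <- function(t) exp(-rho_hat * t)
H_hat <- function(t) rho_hat * t

c(d = d, rho_hat = rho_hat)
```

Because $H(t) = -\log F(t) = \rho t$, the fitted integrated hazard is a straight line through the origin with slope `rho_hat`.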

#### The Exponential Likelihood

For exponential failure times with $d$ failures we get

$$l(\rho; x) = \sum_u \log h(x_i; \phi) + \sum \log F(x_i; \phi) = \sum_u \log \rho - \rho \sum x_i = d \log \rho - \rho \sum x_i$$

$$U(\rho) = d/\rho - \sum x_i$$

$$I(\rho) = d/\rho^2$$

It is easy to show that solving $U(\rho) = 0$ gives $\hat\rho = d / \sum x_i$ as the MLE, and $\sqrt{1/I(\hat\rho)}$ is an estimate of its standard error, although not the best to use in practice. Confidence intervals based on the likelihood ratio statistic or from a fitted model are preferred.

#### Likelihood Ratio Based Confidence Interval

The likelihood ratio statistic for the null hypothesis $\rho = \rho_0$ is

$$W(\rho_0) = W = 2[l(\hat\rho) - l(\rho_0)]$$

This has a nice simple form in the exponential case and is distributed approximately as a chi-squared variate with 1 degree of freedom. A $1 - \alpha$ confidence region is

$$\{\rho : W(\rho) \le c_{1,\alpha}\}$$

where $c_{1,\alpha}$ is the upper $\alpha$ point of the chi-squared distribution with 1 degree of freedom.

#### Cumulative Hazard Plot

[Figure: cumulative hazard plot; x-axis: Time, y-axis: H(t)]

Direct computation of hazard and symmetric 95% CI.

    > rhohat <- sum(eg1a$status)/sum(eg1a$days)
    > se.rhohat <- sqrt(rhohat^2/sum(eg1a$status))
    > symci <- rhohat + c(-1, 1) * qnorm(0.975) * se.rhohat
    > c(rhohat, symci)
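A likelihood ratio interval for $\rho$ can be found by solving $W(\rho) = c_{1,\alpha}$ on each side of the MLE. The sketch below does this with `uniroot()` for hypothetical summary data (`d` failures over total follow-up `sum_x`); the search brackets are an arbitrary choice that happens to contain the roots here.

```r
# Hypothetical summary data: d failures over a total follow-up of sum_x.
d     <- 20
sum_x <- 400

loglik  <- function(rho) d * log(rho) - rho * sum_x
rho_hat <- d / sum_x

# Likelihood ratio statistic W(rho) = 2 [l(rho_hat) - l(rho)].
W <- function(rho) 2 * (loglik(rho_hat) - loglik(rho))

# 95% region {rho : W(rho) <= crit}, crit = upper 5% point of chi-squared(1).
crit  <- qchisq(0.95, df = 1)
lower <- uniroot(function(r) W(r) - crit,
                 c(rho_hat / 10, rho_hat), tol = 1e-10)$root
upper <- uniroot(function(r) W(r) - crit,
                 c(rho_hat, rho_hat * 10), tol = 1e-10)$root

# Symmetric interval for comparison, using I(rho_hat) = d / rho_hat^2.
sym <- rho_hat + c(-1, 1) * qnorm(0.975) * sqrt(rho_hat^2 / d)

rbind(likelihood = c(lower, upper), symmetric = sym)
```

The likelihood-based interval is not symmetric about $\hat\rho$: it extends further above the estimate than below it, reflecting the skewness of the log-likelihood.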

[Figure: Exponential Log Likelihood; x-axis: ρ, y-axis: log likelihood]

Model based estimate.

    > eg1a.sr <- survreg(Surv(days, status) ~ 1, data = eg1a,
    +     dist = "exponential")
    > summary(eg1a.sr)
    Call:
    survreg(formula = Surv(days, status) ~ 1, data = eg1a, dist = "exponenti
                Value Std. Error z p
    (Intercept)                    e-63
    Scale fixed at 1
    Exponential distribution
    Loglik(model)= -70   Loglik(intercept only)= -70
    Number of Newton-Raphson Iterations: 5
    n=
    > exp(-coef(eg1a.sr))
    (Intercept)

Comparing confidence intervals (numeric values not preserved in the transcription):

- Symmetric
- Likelihood based
- Model based

#### The Two Group Problem

Suppose you have exponentially distributed failure data for a treatment group $T$ and a control group $C$. Estimate

$$\hat\rho_T = d_T / \sum x_{Ti}, \quad \hat\rho_C = d_C / \sum x_{Ci} \quad \text{and} \quad \hat\psi = \hat\rho_T / \hat\rho_C$$

It turns out that

$$V(\log \hat\psi) \approx 1/d_T + 1/d_C$$

which permits approximate Z-tests and confidence intervals to be calculated.
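The approximate Z-test and confidence interval for the two-group exponential problem can be sketched from group summaries alone. The failure counts and follow-up totals below are hypothetical.

```r
# Hypothetical summaries: failures and total follow-up time per group.
d_T <- 12; sum_x_T <- 900    # treatment
d_C <- 18; sum_x_C <- 800    # control

rho_T <- d_T / sum_x_T
rho_C <- d_C / sum_x_C
psi   <- rho_T / rho_C                  # estimated hazard ratio

se_log_psi <- sqrt(1 / d_T + 1 / d_C)   # from V(log psi) ~ 1/d_T + 1/d_C
z  <- log(psi) / se_log_psi             # approximate Z for H0: psi = 1
p  <- 2 * pnorm(-abs(z))                # two-sided p-value
ci <- exp(log(psi) + c(-1, 1) * qnorm(0.975) * se_log_psi)

c(psi = psi, lower = ci[1], upper = ci[2], z = z, p = p)
```

Note that the standard error depends only on the numbers of failures, not on the amount of follow-up, so precision is driven by events rather than by patients enrolled.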

#### Direct Calculation

    > rhohats <- tapply(eg1b$status, eg1b$rx, sum)/tapply(eg1b$days,
    +     eg1b$rx, sum)
    > hr <- rhohats[2]/rhohats[1]
    > lhrse <- sqrt(sum(1/tapply(eg1b$status, eg1b$rx,
    +     sum)))
    > c(hr, log(hr), lhrse)
    Treatment Treatment

Model based.

    > eg1b.sr <- survreg(Surv(days, status) ~ rx, data = eg1b,
    +     dist = "exponential")
    > summary(eg1b.sr)
    Call:
    survreg(formula = Surv(days, status) ~ rx, data = eg1b, dist = "exponent
                Value Std. Error z p
    (Intercept)                    e-63
    rxtreatment                    e-01
    Scale fixed at 1
    Exponential distribution
    Loglik(model)=    Loglik(intercept only)=
            Chisq= 0.71 on 1 degrees of freedom, p= 0.4
    Number of Newton-Raphson Iterations: 5
    n= 54

Model based (continued).

    > exp(-coef(eg1b.sr)[2])
    rxtreatment

#### The Weibull Distribution

The Weibull distribution has the following:

$$F(t) = \exp[-(\rho t)^\kappa]$$

$$f(t) = \kappa\rho(\rho t)^{\kappa - 1} \exp[-(\rho t)^\kappa]$$

$$h(t) = \kappa\rho(\rho t)^{\kappa - 1}$$

Note that if $\kappa = 1$ we are left with an exponential distribution with a hazard of $\rho$.

A plot of $\log(-\log(\hat F(t)))$ versus $\log(t)$ will be linear if the Weibull distribution is appropriate.

The biggest difficulty is that the coefficient estimates don't have nice, straightforward interpretations like those of other models we've used. A scale parameter is estimated and $e^{-\beta_i/\text{scale}}$ gives a hazard ratio.
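Why $e^{-\beta/\text{scale}}$ is a hazard ratio can be checked numerically. The sketch below assumes the log-linear form $\log T = \mu + \beta z + \text{scale} \cdot W$ that `survreg()` uses for the Weibull, with the mapping $\kappa = 1/\text{scale}$ and $\rho = \exp[-(\mu + \beta z)]$; the parameter values are hypothetical and no model is fitted.

```r
# Weibull hazard in the text's parameterisation:
# h(t) = kappa * rho * (rho * t)^(kappa - 1).
weib_haz <- function(t, rho, kappa) kappa * rho * (rho * t)^(kappa - 1)

# Hypothetical log-linear fit: log T = mu + beta * z + scale * W.
# Assumed mapping: kappa = 1 / scale, rho = exp(-(mu + beta * z)).
mu    <- 5.0
beta  <- 0.4    # coefficient for treatment (z = 1) versus control (z = 0)
scale <- 0.7

kappa   <- 1 / scale
rho_ctl <- exp(-mu)
rho_trt <- exp(-(mu + beta))

# Hazard ratio (treatment / control) at several times: constant in t.
t  <- c(10, 50, 200)
hr <- weib_haz(t, rho_trt, kappa) / weib_haz(t, rho_ctl, kappa)

hr
exp(-beta / scale)   # closed form quoted in the text
```

The ratio does not depend on $t$, which is the sense in which the Weibull model is both an accelerated life and a proportional hazards model.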

[Figure: log(H(t)) versus log(t)]

#### Fitting a Weibull in R

    > eg2b.sr <- survreg(Surv(days, status) ~ rx, data = eg1b,
    +     dist = "weibull")
    > summary(eg2b.sr)
    Call:
    survreg(formula = Surv(days, status) ~ rx, data = eg1b, dist = "weibull"
                Value Std. Error z p
    (Intercept)                    e-176
    rxtreatment                    e-01
    Log(scale)                     e-03
    Scale=
    Weibull distribution
    Loglik(model)=    Loglik(intercept only)=
            Chisq= 0.83 on 1 degrees of freedom, p= 0.36
    Number of Newton-Raphson Iterations: 8
    n= 54

The hazard ratio for treatment is given by:

    > exp(-coef(eg2b.sr)[2]/eg2b.sr$scale)
    rxtreatment

[Figure: survival curves; x-axis: Days, y-axis: Survival]

#### Log-logistic and Log-normal Distributions

These distributions are very similar. Both result in non-proportional hazards models.

- If $\log T$ has a logistic distribution, we say $T$ is log-logistic.
- If $\log T$ has a normal distribution, we say $T$ is log-normal.

If $\log h(t)$ is non-monotonic in $t$ then these distributions are possible.

#### Computing log hazard in R

    > kmf <- survfit(Surv(days, status) ~ rx, data = eg1b)
    > H1 <- -log(kmf[1]$surv)
    > H2 <- -log(kmf[2]$surv)
    > Ht1 <- kmf[1]$time
    > Ht2 <- kmf[2]$time
    > dH1 <- diff(H1)/diff(Ht1)
    > dH2 <- diff(H2)/diff(Ht2)
    > loghaz <- data.frame(ldh = log(c(dH1, dH2)), times = c(Ht1[-1],
    +     Ht2[-1]), rx = factor(c(rep(1, length(dH1)),
    +     rep(2, length(dH2))), labels = c("control",
    +     "Treatment")))

#### Plotting log hazard in R

    > library(lattice)
    > xyplot(ldh ~ times, data = loghaz, groups = rx,
    +     type = c("p", "smooth"), ylab = "log hazard",
    +     xlab = "Time")

[Figure: log hazard by treatment group; x-axis: Time, y-axis: log hazard]
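The non-monotonic hazard of the log-logistic distribution can be illustrated with base R, building the hazard from the logistic distribution of $\log T$. The location `mu` and scale `s` below are hypothetical, and `llog_haz` is an illustrative helper, not a function from any package.

```r
# Log-logistic hazard built from the logistic distribution of log T.
# mu and s are the (hypothetical) location and scale of log T.
llog_haz <- function(t, mu, s) {
  z    <- (log(t) - mu) / s
  dens <- dlogis(z) / (s * t)             # density of T by change of variables
  surv <- plogis(z, lower.tail = FALSE)   # survivor function of T
  dens / surv                             # hazard h(t) = f(t) / F(t)
}

t <- c(5, 50, 500, 5000)
h <- llog_haz(t, mu = log(100), s = 0.5)
round(h, 5)
```

With a scale below 1 the hazard first rises and then falls, so no constant hazard ratio between groups can hold over the whole time axis in general.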

#### Fitting a log-logistic Model in R

    > llog.fit <- survreg(Surv(days, status) ~ rx, data = eg1b,
    +     dist = "loglogistic")
    > summary(llog.fit)
    Call:
    survreg(formula = Surv(days, status) ~ rx, data = eg1b, dist = "loglogistic")
                Value Std. Error z p
    (Intercept)                    e
    rxtreatment                    e-01
    Log(scale)                     e-05
    Scale= 0.48
    Log logistic distribution
    Loglik(model)=    Loglik(intercept only)=
            Chisq= 0.71 on 1 degrees of freedom, p= 0.4
    Number of Newton-Raphson Iterations: 5
    n= 54

#### Plotting the Result

    > pct <- 0:99/100
    > llog.pred <- predict(llog.fit, newdata = data.frame(rx = factor(1:2,
    +     labels = c("control", "Treatment"))), type = "quantile",
    +     p = pct)
    > plot(kmf, mark.time = FALSE, lty = 1:2)
    > matlines(t(llog.pred), cbind(1 - pct, 1 - pct),
    +     col = "red", lty = 1:2)

#### Adding Weibull

    > weib.pred <- predict(eg2b.sr, newdata = data.frame(rx = factor(1:2,
    +     labels = c("control", "Treatment"))), type = "quantile",
    +     p = pct)
    > matlines(t(weib.pred), cbind(1 - pct, 1 - pct),
    +     col = "blue", lty = 1:2)

### References

- Cox, D.R. and Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall.
- Kalbfleisch, J.D. and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.


### Constant Stress Partially Accelerated Life Test Design for Inverted Weibull Distribution with Type-I Censoring

Algorithms Research 013, (): 43-49 DOI: 10.593/j.algorithms.01300.0 Constant Stress Partially Accelerated Life Test Design for Mustafa Kamal *, Shazia Zarrin, Arif-Ul-Islam Department of Statistics & Operations

### A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes

A Generalized Global Rank Test for Multiple, Possibly Censored, Outcomes Ritesh Ramchandani Harvard School of Public Health August 5, 2014 Ritesh Ramchandani (HSPH) Global Rank Test for Multiple Outcomes

### A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

### Statistical Prediction Based on Censored Life Data

Editor s Note: Results following from the research reported in this article also appear in Chapter 12 of Meeker, W. Q., and Escobar, L. A. (1998), Statistical Methods for Reliability Data, New York: Wiley.

### Multivariate Survival Data With Censoring.

1 Multivariate Survival Data With Censoring. Shulamith Gross and Catherine Huber-Carol Baruch College of the City University of New York, Dept of Statistics and CIS, Box 11-220, 1 Baruch way, 10010 NY.

### Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

### Maximum Likelihood Based Estimation of Hazard Function under Shape. Restrictions and Related Statistical Inference.

Maximum Likelihood Based Estimation of Hazard Function under Shape Restrictions and Related Statistical Inference by Desale Habtzghi (Under the direction of Somnath Datta and Mary Meyer ) Abstract The

### 5 Introduction to the Theory of Order Statistics and Rank Statistics

5 Introduction to the Theory of Order Statistics and Rank Statistics This section will contain a summary of important definitions and theorems that will be useful for understanding the theory of order

### E509A: Principle of Biostatistics. (Week 11(2): Introduction to non-parametric. methods ) GY Zou.

E509A: Principle of Biostatistics (Week 11(2): Introduction to non-parametric methods ) GY Zou gzou@robarts.ca Sign test for two dependent samples Ex 12.1 subj 1 2 3 4 5 6 7 8 9 10 baseline 166 135 189

### A note on R 2 measures for Poisson and logistic regression models when both models are applicable

Journal of Clinical Epidemiology 54 (001) 99 103 A note on R measures for oisson and logistic regression models when both models are applicable Martina Mittlböck, Harald Heinzl* Department of Medical Computer

### Journal of Statistical Software

JSS Journal of Statistical Software January 2011, Volume 38, Issue 2. http://www.jstatsoft.org/ Analyzing Competing Risk Data Using the R timereg Package Thomas H. Scheike University of Copenhagen Mei-Jie

### The Type I Generalized Half Logistic Survival Model

International Journal of Theoretical and Applied Mathematics 2016; 2(2): 74-78 http://www.sciencepublishinggroup.com/j/ijtam doi: 10.11648/j.ijtam.20160202.17 The Type I Generalized Half Logistic urvival

### time = c(4.5, 3.9, 1.0, 2.3, 4.8, 0.1, 0.3, 2.0, 0.8, 3.0, 0.3, 2.1) answer = c( 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0)

Chapter 15. Censored Data Models Suppose you are given a small data set from a call center on how long the customers stayed on the phone, until either hanging up or being answered. The data set looks like

### Linear Models: Comparing Variables. Stony Brook University CSE545, Fall 2017

Linear Models: Comparing Variables Stony Brook University CSE545, Fall 2017 Statistical Preliminaries Random Variables Random Variables X: A mapping from Ω to ℝ that describes the question we care about

### Binary choice. Michel Bierlaire

Binary choice Michel Bierlaire Transport and Mobility Laboratory School of Architecture, Civil and Environmental Engineering Ecole Polytechnique Fédérale de Lausanne M. Bierlaire (TRANSP-OR ENAC EPFL)

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Integrated likelihoods in survival models for highlystratified

Working Paper Series, N. 1, January 2014 Integrated likelihoods in survival models for highlystratified censored data Giuliana Cortese Department of Statistical Sciences University of Padua Italy Nicola

### Modeling and inference for an ordinal effect size measure

STATISTICS IN MEDICINE Statist Med 2007; 00:1 15 Modeling and inference for an ordinal effect size measure Euijung Ryu, and Alan Agresti Department of Statistics, University of Florida, Gainesville, FL

### Randomization Tests for Regression Models in Clinical Trials

Randomization Tests for Regression Models in Clinical Trials A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at George Mason University By Parwen

### Estimation of Conditional Kendall s Tau for Bivariate Interval Censored Data

Communications for Statistical Applications and Methods 2015, Vol. 22, No. 6, 599 604 DOI: http://dx.doi.org/10.5351/csam.2015.22.6.599 Print ISSN 2287-7843 / Online ISSN 2383-4757 Estimation of Conditional

### Chapter 4 Parametric Families of Lifetime Distributions

Chapter 4 Parametric Families of Lifetime istributions In this chapter, we present four parametric families of lifetime distributions Weibull distribution, gamma distribution, change-point model, and mixture

### FRAILTY MODELS FOR MODELLING HETEROGENEITY

FRAILTY MODELS FOR MODELLING HETEROGENEITY By ULVIYA ABDULKARIMOVA, B.Sc. A Thesis Submitted to the School of Graduate Studies in Partial Fulfillment of the Requirements for the Degree Master of Science

### Attributable Risk Function in the Proportional Hazards Model

UW Biostatistics Working Paper Series 5-31-2005 Attributable Risk Function in the Proportional Hazards Model Ying Qing Chen Fred Hutchinson Cancer Research Center, yqchen@u.washington.edu Chengcheng Hu

### Sample Size and Power I: Binary Outcomes. James Ware, PhD Harvard School of Public Health Boston, MA

Sample Size and Power I: Binary Outcomes James Ware, PhD Harvard School of Public Health Boston, MA Sample Size and Power Principles: Sample size calculations are an essential part of study design Consider

### Lecture 8 Stat D. Gillen

Statistics 255 - Survival Analysis Presented February 23, 2016 Dan Gillen Department of Statistics University of California, Irvine 8.1 Example of two ways to stratify Suppose a confounder C has 3 levels

### Logistic regression: Miscellaneous topics

Logistic regression: Miscellaneous topics April 11 Introduction We have covered two approaches to inference for GLMs: the Wald approach and the likelihood ratio approach I claimed that the likelihood ratio

### Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at

The Analysis of Exponentially Distributed Life-Times with Two Types of Failure Author(s): D. R. Cox Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 21, No. 2 (1959), pp.

### A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

A new strategy for meta-analysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials

### Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data

Journal of Modern Applied Statistical Methods Volume 13 Issue 2 Article 22 11-2014 Double Bootstrap Confidence Interval Estimates with Censored and Truncated Data Jayanthi Arasan University Putra Malaysia,

### ANALYSIS OF COMPETING RISKS DATA WITH MISSING CAUSE OF FAILURE UNDER ADDITIVE HAZARDS MODEL

Statistica Sinica 18(28, 219-234 ANALYSIS OF COMPETING RISKS DATA WITH MISSING CAUSE OF FAILURE UNDER ADDITIVE HAZARDS MODEL Wenbin Lu and Yu Liang North Carolina State University and SAS Institute Inc.

### Stat 475 Life Contingencies I. Chapter 2: Survival models

Stat 475 Life Contingencies I Chapter 2: Survival models The future lifetime random variable Notation We are interested in analyzing and describing the future lifetime of an individual. We use (x) to denote

### Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

### MODULE 6 LOGISTIC REGRESSION. Module Objectives:

MODULE 6 LOGISTIC REGRESSION Module Objectives: 1. 147 6.1. LOGIT TRANSFORMATION MODULE 6. LOGISTIC REGRESSION Logistic regression models are used when a researcher is investigating the relationship between

### Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

### INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

Pak. J. Statist. 2017 Vol. 33(1), 37-61 INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION A. M. Abd AL-Fattah, A.A. EL-Helbawy G.R. AL-Dayian Statistics Department, Faculty of Commerce, AL-Azhar