Continuous case Discrete case General case. Hazard functions. Patrick Breheny. August 27. Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21

Similar documents
Censoring mechanisms

The Weibull Distribution

Survival Analysis I (CHL5209H)

Semiparametric Regression

1 Delayed Renewal Processes: Exploiting Laplace Transforms

MAS3301 / MAS8311 Biostatistics Part II: Survival

Poisson Processes. Particles arriving over time at a particle detector. Several ways to describe most common model.

Residuals and model diagnostics

Multistate models and recurrent event models

Accelerated Failure Time Models

Notes largely based on Statistical Methods for Reliability Data by W.Q. Meeker and L. A. Escobar, Wiley, 1998 and on their class notes.

Lectures prepared by: Elchanan Mossel Yelena Shvets

Multistate models and recurrent event models

Lecture 7. Poisson and lifetime processes in risk analysis

Empirical Processes & Survival Analysis. The Functional Delta Method

Proportional hazards regression

Time-dependent coefficients

Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 4 - Survival Models

errors every 1 hour unless he falls asleep, in which case he just reports the total errors

Lecture 22 Survival Analysis: An Introduction

Two-sample inference: Continuous data

ST495: Survival Analysis: Maximum likelihood

Basics of Stochastic Modeling: Part II

Survival Distributions, Hazard Functions, Cumulative Hazards

Cox regression: Estimation

Survival Analysis. Lu Tian and Richard Olshen Stanford University

Lecture for Week 2 (Secs. 1.3 and ) Functions and Limits

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

Theoretical results for lasso, MCP, and SCAD

Stat 512 Homework key 2

Stat 475 Life Contingencies I. Chapter 2: Survival models

Lab #11. Variable B. Variable A Y a b a+b N c d c+d a+c b+d N = a+b+c+d

Lecture 5 Models and methods for recurrent event data

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

STAT 380 Markov Chains

Engineering Risk Benefit Analysis

Step-Stress Models and Associated Inference

Two-sample inference: Continuous data

Survival Analysis. Stat 526. April 13, 2018

Introduction to Reliability Theory (part 2)

3 Continuous Random Variables

MATH 56A: STOCHASTIC PROCESSES CHAPTER 6

Relative efficiency. Patrick Breheny. October 9. Theoretical framework Application to the two-group problem

Math Lecture 4 Limit Laws

UNIVERSITY OF CALIFORNIA, SAN DIEGO

Dynamic Models Part 1

CAS Exam MAS-1. Howard Mahler. Stochastic Models

PROBABILITY DISTRIBUTIONS

The normal distribution

Statistical Quality Control IE 3255 Spring 2005 Solution HomeWork #2

Lecture 4a: Continuous-Time Markov Chain Models

Two-sample Categorical data: Testing

Duration Analysis. Joan Llull

PHYSICS 653 HOMEWORK 1 P.1. Problems 1: Random walks (additional problems)

a j x j. j=0 The number R (possibly infinite) which Theorem 1 guarantees is called the radius of convergence of the power series.

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

Optional Stopping Theorem Let X be a martingale and T be a stopping time such

The exponential distribution and the Poisson process

STAT Sample Problem: General Asymptotic Results

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

Multiple samples: Modeling and ANOVA

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS

MAT137 - Term 2, Week 2

DAGStat Event History Analysis.

MAS3301 / MAS8311 Biostatistics Part II: Survival

Point Process Control

Practical Applications of Reliability Theory

Fundamentals of Reliability Engineering and Applications

One-sample categorical data: approximate inference

7.1 The Hazard and Survival Functions

Resonance and response

The random counting variable. Barbara Russo

Examination paper for TMA4275 Lifetime Analysis

Modelling data networks stochastic processes and Markov chains

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Slides 8: Statistical Models in Simulation

ASM Study Manual for Exam P, First Edition By Dr. Krzysztof M. Ostaszewski, FSA, CFA, MAAA Errata

Simulating events: the Poisson process

Right-truncated data. STAT474/STAT574 February 7, / 44

STAT 430/510 Probability Lecture 12: Central Limit Theorem and Exponential Distribution

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Multiple Random Variables

The lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

Statistics 262: Intermediate Biostatistics Non-parametric Survival Analysis

Exponential Distribution and Poisson Process

DS-GA 1002 Lecture notes 2 Fall Random variables

Exam 3, Math Fall 2016 October 19, 2016

Lecture 7 Time-dependent Covariates in Cox Regression

Pump failure data. Pump Failures Time

Honors Differential Equations

Math Spring Practice for the Second Exam.

EE/CpE 345. Modeling and Simulation. Fall Class 5 September 30, 2002

Lecture 10. Riemann-Stieltjes Integration

Department of Mathematics

MA 0090 Section 21 - Slope-Intercept Wednesday, October 31, Objectives: Review the slope of the graph of an equation in slope-intercept form.

Random variables. DS GA 1002 Probability and Statistics for Data Science.

For those of you who are taking Calculus AB concurrently with AP Physics, I have developed a

In case (1) 1 = 0. Then using and from the previous lecture,

Transcription:

Hazard functions Patrick Breheny August 27 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21

Introduction Continuous case Let T be a nonnegative random variable representing the time to an event The distribution of T can be specified in a variety of ways, three of which are special to survival analysis in the sense that they come up often in the field, but are encountered only rarely outside of it: The survival function, S(t) The hazard function, λ(t) The cumulative hazard function, Λ(t) We will begin by discussing the case where T follows a continuous distribution, and come back to the discrete and general cases toward the end of lecture Patrick Breheny Survival Data Analysis (BIOS 7210) 2/21

Survival function Continuous case Special case: Constant hazard The survival function of T, denoted S(t), is defined as for t > 0 S(t) = P(T > t) Note that S(t) is related to the distribution function F (t) and the density function f(t) in the following ways: S(t) = 1 F (t) f(t) = S (t) S(t) = t f(s)ds Note that S(t) uniquely defines a distribution Patrick Breheny Survival Data Analysis (BIOS 7210) 3/21

Hazard function Continuous case Special case: Constant hazard The hazard function, λ(t), is the instantaneous rate of failure at time t, given that an individual has survived until at least time t: In addition, note that λ(t) = lim h 0 + P(t T < t + h T t) h = f(t)/s(t) λ(t) = d log S(t) dt Note that hazard functions are nonnegative and, like S(t), uniquely define a distribution (under the assumption that f(t) is continuous) Patrick Breheny Survival Data Analysis (BIOS 7210) 4/21

Cumulative hazard function Special case: Constant hazard Finally, the cumulative hazard function is simply the accumulated hazard up until time t: Note that Λ(t) = t 0 Λ(t) = log S(t) λ(s)ds S(t) = exp{ Λ(t)} f(t) = λ(t) exp{ Λ(t)} and once again, Λ(t) uniquely defines a distribution Patrick Breheny Survival Data Analysis (BIOS 7210) 5/21

Mean residual life Continuous case Special case: Constant hazard It is worth mentioning that there is an interesting connection between the mean and the survival function Specifically, for any nonnegative continuous variable, the expected residual life, r(t) = E(T t T t), is equal to r(t) = t S(u)du S(t) In particular, E(T ) = 0 S(u)du, which you are asked to show as a homework exercise Patrick Breheny Survival Data Analysis (BIOS 7210) 6/21

Constant hazard Continuous case Special case: Constant hazard Let s consider the simplest meaningful survival distribution, that of the constant hazard rate: λ(t) λ 0 0 Obviously, this is overly simplistic for many situations, but is still a very convenient special case to work with t Patrick Breheny Survival Data Analysis (BIOS 7210) 7/21

Special case: Constant hazard Cumulative hazard, survival, and density functions For the special case of constant hazard, the cumulative hazard is Λ(t) = λt the survival function is therefore and the density function is S(t) = e λt f(t) = λe λt Thus, by assuming a constant hazard, we arrive at the exponential distribution Patrick Breheny Survival Data Analysis (BIOS 7210) 8/21

Discrete distributions Continuous case Example: Human lifetime hazards Let s consider the corresponding quantities and relationships for the analogous case where T has a discrete distribution: i.e., T = t j with probability f(t j ) for a set of values t 1 < t 2 < In principle, survival data should always be continuous because time is a continuous quantity For a variety of reasons, however, the discrete case comes up often in survival analysis: Nonparametric approaches typically condition on the observed failure times, resulting in discrete distributions Data is not always recorded at precise times, but only at the level of day/month/year Patrick Breheny Survival Data Analysis (BIOS 7210) 9/21

Survival function in the discrete case Example: Human lifetime hazards The survival function is simply S(t) = t j >t f(t j ) Note that some authors define S(t) as P(T t), while others define it as P(T > t); we will use the latter definition to be consistent with the book Patrick Breheny Survival Data Analysis (BIOS 7210) 10/21

Hazards in the discrete case Example: Human lifetime hazards The hazard function is defined as in the continuous case: where λ j = P(T = t j T t j ) = f(t j )/S(t j ), S(t j ) = lim t t j S(t) As in the continuous case, there is a relationship between S and λ: S(t) = t j t{1 λ j } Patrick Breheny Survival Data Analysis (BIOS 7210) 11/21

Example: Human lifetime hazards Relationship between discrete and continuous cases On the surface, this relationship seems different than what we had for the continuous case However, consider discretizing a continuous hazard by dividing its range into intervals of equal length (i.e., failure at time t j refers to failure in the jth interval) Homework: Show that, for a distribution with constant hazard λ(t) = λ, taking the limit of S(t) = t j t {1 λ j} as the length of the intervals goes to zero yields the exponential distribution survival function This can be shown not just for constant hazards but for any continuous hazard function, although the proof is considerably longer in the general case Patrick Breheny Survival Data Analysis (BIOS 7210) 12/21

Example: Human lifetime hazards Mass function and hazard in the discrete case Finally, we have the following relationships between the probability mass function and hazard in the discrete case: f(t k ) = λ k {1 λ j } j<k Patrick Breheny Survival Data Analysis (BIOS 7210) 13/21

U.S. mortality data Continuous case Example: Human lifetime hazards In the previous lecture, we discussed human life expectancy and the subtle relationship between distributions and hazards Let s look at some actual data from the Human Mortality Database at what the real distribution and hazard functions look like The data come from death counts by year for the United States; it is worth mentioning that I have made no effort to adjust for the fact that the size of the U.S. population was not constant over this time, so these results are clearly biased, but still serve to illustrate the general idea Patrick Breheny Survival Data Analysis (BIOS 7210) 14/21

Example: Human lifetime hazards Distribution: Age at death for people who died in 1960 100 80 Deaths (thousands) 60 40 20 0 0 20 40 60 80 100 Age Patrick Breheny Survival Data Analysis (BIOS 7210) 15/21

Example: Human lifetime hazards Hazard function (based on 1960 deaths) 0.3 Hazard 0.2 0.1 0.0 0 20 40 60 80 100 Age Patrick Breheny Survival Data Analysis (BIOS 7210) 16/21

Example: Human lifetime hazards Log(Hazard function) (based on 1960 deaths) log(hazard) 1 2 3 4 5 6 7 0 20 40 60 80 100 Age Patrick Breheny Survival Data Analysis (BIOS 7210) 17/21

: Introduction Finally, let us consider the general case, without assuming that T is either continuous or discrete For the most part, everything we will do in this class falls into either the continuous or discrete cases, but seeing the general results are useful for a few reasons: General results can help one to see things from a broader, more universal perspective We need general results to deal with mixed distributions that have both continuous and discrete components It provides a good exposure to some extensions of calculus that may be new to you Patrick Breheny Survival Data Analysis (BIOS 7210) 18/21

Differential increments Continuous case Since F and Λ are not necessarily differentiable, expressions such as f(t) = d dtf (t) are not valid Instead, we must work in terms of differential increments: df (t) = P{T [t, t + dt)} = f j 1{t = t j } + f(t)dt, where f j = P{T = t j } and f(t) is the density of the continuous component of T Similarly, the differential increment of Λ(t) is dλ(t) = λ j 1{t = t j } + λ(t)dt Patrick Breheny Survival Data Analysis (BIOS 7210) 19/21

Stieltjes integrals Continuous case F and Λ can then be reconstructed from their differential increments using an extension of integration called Stieltjes integration Stieltjes integrals are written in terms of the differential increments as: Λ(t) = t 0 dλ = lim n n Λ(t i ) Λ(t i 1 ), i=1 where 0 = t 0 < t 1 < < t n = t Obviously, we re glossing over some technical ideas here (does this limit exist? Does it depend on the partition we choose?, etc.), but hopefully the basic idea makes sense Patrick Breheny Survival Data Analysis (BIOS 7210) 20/21

Product integrals Continuous case Finally, we can relate hazard functions and survival functions, but we need something called a product integral (basically, a product integral is to products what the integral is to sums): t S(t) = {1 dλ} 0 lim n k=1 t = exp n {1 [Λ(t k ) Λ(t k 1 )]} { 0 } λ(u)du t j <t (1 λ j ) Next week, we ll start talking about inference how to infer things about λ(t) and S(t) from data particularly in the presence of censoring Patrick Breheny Survival Data Analysis (BIOS 7210) 21/21