PhD course in Advanced survival analysis

Non-parametric hypothesis tests. Parametric models.
1 October 2012, Per Kragh Andersen

One-sample tests. (ABGK, sect. V.1.1)

Counting process $N(t)$ with intensity process $\lambda(t) = \alpha(t)Y(t)$ satisfying Aalen's multiplicative intensity model. Usually, $N(t)$ is obtained by aggregating individual counting processes, $\alpha(t)$ is the hazard function, and $Y(t)$ is the number at risk.

We will test the null hypothesis $H_0: \alpha(t) = \alpha_0(t)$, with $\alpha_0(t)$ a known hazard function, e.g. a population mortality.

Idea: compare the Nelson-Aalen increments

$$d\hat A(s) = \frac{dN(s)}{Y(s)}$$

with $\alpha_0(s)\,ds$. Formally: introduce predictable, non-negative weights $K(s)$ and consider the process

$$Z(t) = \int_0^t K(s)\,\bigl(d\hat A(s) - \alpha_0(s)\,ds\bigr).$$

Then $Z(t)$ tends to be large (small) if $\alpha(s)$ is larger (smaller) than $\alpha_0(s)$ for all $s \in [0, t]$. Note that we always assume that $K(s) = 0$ whenever $Y(s) = 0$.

Properties.

If $H_0$ is true then $dN(s) = \alpha_0(s)Y(s)\,ds + dM(s)$ and $Z(t)$ equals the stochastic integral

$$Z(t) = \int_0^t \frac{K(s)}{Y(s)}\,dM(s).$$

In particular, $Z(t)$ is a mean zero martingale with (under $H_0$)

$$\langle Z\rangle(t) = \int_0^t \Bigl(\frac{K(s)}{Y(s)}\Bigr)^2 d\langle M\rangle(s) = \int_0^t \frac{K^2(s)}{Y(s)}\,\alpha_0(s)\,ds.$$

Conditions can be found under which, by the martingale CLT, the test statistic $Z(t)/\sqrt{\langle Z\rangle(t)}$ is asymptotically standard normal under $H_0$.

Choice of weights.

A common choice is $K(t) = Y(t)$, cf. Ex. V.1.1. Then

$$Z(t) = \int_0^t Y(s)\Bigl(\frac{dN(s)}{Y(s)} - \alpha_0(s)\,ds\Bigr) = N(t) - E(t)$$

with

$$E(t) = \int_0^t Y(s)\,\alpha_0(s)\,ds, \qquad \langle Z\rangle(t) = \int_0^t \frac{Y^2(s)}{Y(s)}\,\alpha_0(s)\,ds = E(t).$$

The one-sample logrank test statistic is then

$$\frac{N(t) - E(t)}{\sqrt{E(t)}}$$

and $E(t)$ is the expected number of events in $[0, t]$ under $H_0$ (since then $N(t) - E(t)$ is a martingale).

Example: Fyn diabetics.

For female diabetics (Ex. V.1.3) we have, for $t = 10$ years, $N(t) = 237$ and using published Danish life-tables we get $E(t) = 80.1$. The one-sample logrank statistic takes the (highly significant) value

$$\frac{237 - 80.1}{\sqrt{80.1}} = 17.5.$$

k-sample tests. (ABGK, sect. V.2.1)

k-variate counting process $(N_1(t), \ldots, N_k(t))$ where, typically, each component $N_h(t)$ is obtained by aggregating individual processes. Intensity process $(\lambda_1(t), \ldots, \lambda_k(t))$ of the form

$$\lambda_h(t) = \alpha_h(t)Y_h(t), \quad h = 1, \ldots, k.$$

We will test the null hypothesis

$$H_0: \alpha_1(t) = \ldots = \alpha_k(t) \text{ for all } t,$$

e.g., if mortality is the same in $k$ groups.

Properties of test statistic.

If $H_0$ is true then $N_\bullet(t) = \sum_{h=1}^k N_h(t)$ is a univariate counting process with intensity process

$$\lambda_\bullet(t) = \sum_{h=1}^k \lambda_h(t) = \alpha(t)Y_\bullet(t),$$

where $\alpha(t)$ is the common value of the $\alpha_h(t)$'s under $H_0$ and $Y_\bullet(t) = \sum_{h=1}^k Y_h(t)$. The idea is now to compare the Nelson-Aalen increments a priori,

$$d\hat A_h(t) = \frac{dN_h(t)}{Y_h(t)},$$

with those under $H_0$:

$$d\hat A(t) = \frac{dN_\bullet(t)}{Y_\bullet(t)}.$$
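Returning to the one-sample logrank statistic: the Fyn diabetics value can be checked numerically. The sketch below is only an illustration; it assumes the expected count is read as E(t) = 80.1 (the reading that reproduces the reported 17.5), and the function names, as well as the constant-hazard helper for E(t), are hypothetical and not part of the course material.

```python
import math


def one_sample_logrank(observed, expected):
    """Return (N(t) - E(t)) / sqrt(E(t)), the one-sample logrank statistic."""
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    return (observed - expected) / math.sqrt(expected)


def expected_events_constant_hazard(exit_times, alpha0, t):
    """E(t) = integral_0^t Y(s) * alpha0 ds for a constant known hazard alpha0.

    Hypothetical simplification: subject i is at risk on [0, exit_times[i]],
    so its contribution to the integral is alpha0 * min(exit_times[i], t).
    """
    return alpha0 * sum(min(x, t) for x in exit_times)


# Fyn diabetics, females: N(t) = 237, E(t) assumed to be 80.1.
z = one_sample_logrank(237, 80.1)
print(round(z, 1))  # 17.5
```

With a time-varying population hazard, E(t) would instead be accumulated from life-table rates over each subject's time at risk.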

Formally, follow the approach from the one-sample situation: introduce non-negative weights $K(s)$ and consider the processes (for $h = 1, \ldots, k$):

$$Z_h(t) = \int_0^t K(s)Y_h(s)\,\bigl(d\hat A_h(s) - d\hat A(s)\bigr) = \int_0^t K(s)\,dN_h(s) - \int_0^t K(s)\frac{Y_h(s)}{Y_\bullet(s)}\,dN_\bullet(s),$$

where the first integral is the observed (weighted) and the second the expected (weighted) number of events. Note that $\sum_{h=1}^k Z_h(t) = 0$. We assume that $K(s) = 0$ whenever $Y_\bullet(s) = 0$.

Under $H_0$, the $Z_h(t)$'s are mean zero martingales. From their predictable (co-)variance processes, we obtain the (co-)variance estimators

$$\hat\sigma_{hj}(t) = \int_0^t K^2(s)\,\frac{Y_h(s)}{Y_\bullet(s)}\Bigl(\delta_{hj} - \frac{Y_j(s)}{Y_\bullet(s)}\Bigr)\,dN_\bullet(s).$$

Introduce $Z(t) = (Z_1(t), \ldots, Z_{k-1}(t))^T$ and $\hat\Sigma(t) = (\hat\sigma_{hj}(t))_{h,j=1}^{k-1}$ and use the quadratic form

$$Z(t)^T\,\hat\Sigma(t)^{-1}Z(t)$$

as test statistic, which is asymptotically $\chi^2_{k-1}$ under $H_0$ under suitable regularity conditions (Th. V.2.1).

Choices of weights.

General choices:
Log-rank (Savage): $K(s) = I(Y_\bullet(s) > 0)$.
Gehan-Breslow (Wilcoxon, Kruskal-Wallis): $K(s) = Y_\bullet(s)$.
Tarone-Ware: $K(s) = Y_\bullet(s)^\rho$.

Survival data:
Peto-Prentice (Wilcoxon, Kruskal-Wallis): $K(s) = \prod_{u<s}\Bigl(1 - \frac{\Delta N_\bullet(u)}{Y_\bullet(u)+1}\Bigr)\frac{Y_\bullet(s)}{Y_\bullet(s)+1}$.
Harrington-Fleming: $K(s) = I(Y_\bullet(s) > 0)\,\hat S(s-)^\rho$.

The log-rank test is optimal for proportional hazards alternatives.

[Figure: Nelson-Aalen estimates by tumor thickness. Categories: 0-2, 2-5, 5+ mm.]
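The weight choices listed above can be tabulated directly from a pooled risk table. The following is a minimal sketch, assuming the pooled numbers at risk and pooled event counts are given at the ordered distinct event times, and taking the survival estimate in the Harrington-Fleming weight to be the pooled Kaplan-Meier estimate just before each event time. The function name, the `kind` labels and the input layout are hypothetical.

```python
def weights(at_risk, events, kind="logrank", rho=1.0):
    """Weights K(s_j) at ordered distinct event times s_1 < s_2 < ...

    at_risk[j] : pooled number at risk just before s_j
    events[j]  : pooled number of events at s_j
    """
    out = []
    km_left = 1.0    # pooled Kaplan-Meier estimate just before s_j (assumption)
    peto_left = 1.0  # product over u < s_j of (1 - events(u) / (at_risk(u) + 1))
    for y, d in zip(at_risk, events):
        if y <= 0:
            out.append(0.0)  # K(s) = 0 whenever nobody is at risk
            continue
        if kind == "logrank":
            k = 1.0
        elif kind == "gehan":
            k = float(y)
        elif kind == "tarone-ware":
            k = float(y) ** rho
        elif kind == "peto-prentice":
            k = peto_left * y / (y + 1)
        elif kind == "harrington-fleming":
            k = km_left ** rho
        else:
            raise ValueError("unknown weight: " + kind)
        out.append(k)
        # Update the left-continuous products for the next event time.
        km_left *= 1.0 - d / y
        peto_left *= 1.0 - d / (y + 1)
    return out


# Hypothetical risk table with three event times.
print(weights([10, 8, 5], [1, 2, 1], kind="gehan"))
```

Early event times get relatively more weight under the Gehan-Breslow, Peto-Prentice and Harrington-Fleming choices, since all three decrease as the risk set shrinks.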

Melanoma data.

Compare mortality for tumor thickness groups: below 2 mm, 2-5 mm, above 5 mm.

Log-rank test (at $t = 14$ years): $X^2(t) = 31.6$, d.f. = 2.
Harrington-Fleming test (at $t = 14$ years, $\rho = 1$): $X^2(t) = 33.9$, d.f. = 2.

The two-sample case.

When $k = 2$ the test statistic takes the form

$$X^2(t) = \frac{(Z_1(t))^2}{\hat\sigma_{11}(t)}$$

and an alternative, standard normally distributed (under $H_0$), statistic is

$$U(t) = \frac{Z_1(t)}{\sqrt{\hat\sigma_{11}(t)}}.$$

Note that $U(t)$ has a direction.

Example: melanoma data, compare males and females.
Log-rank: $U(t) = 2.54$, $P = 0.011$.
Gehan-Breslow: $U(t) = 2.72$, $P = 0.0065$.
Peto-Prentice: $U(t) = 2.66$, $P = 0.0077$.

[Figure: Nelson-Aalen estimates by sex. Males: dashed, females: solid.]

Exercise.

1. Show that the 2-sample logrank test (i.e., $K(s) = I(Y_\bullet(s) > 0)$) can be written in the form

$$Z_1(t) = \int_0^t \frac{Y_1(s)Y_2(s)}{Y_1(s) + Y_2(s)}\,\bigl(d\hat A_1(s) - d\hat A_2(s)\bigr).$$

2. Show that, under $H_0: \alpha_1(t) = \alpha_2(t)$, $Z_1$ is a martingale.
3. Find the predictable variation process $\langle Z_1\rangle$ and the variance estimate $\hat\sigma_{11}(t)$.
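For the two-sample case, the statistic U(t) can be computed from right-censored data by summing over the observed event times. This is a sketch under stated assumptions: all subjects enter at time 0, the variance estimator is the one given for the k-sample test (no separate tie correction), and only the log-rank and Gehan-Breslow weights are implemented. The function name and the toy data are hypothetical.

```python
import math


def two_sample_test(times, events, groups, weight="logrank"):
    """Return (Z_1, sigma_11, U) with U = Z_1 / sqrt(sigma_11).

    times[i]  : observed time (event or censoring) of subject i
    events[i] : 1 if an event was observed, 0 if censored
    groups[i] : 1 or 2
    """
    z1 = 0.0
    var11 = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for s in event_times:
        y = sum(1 for t in times if t >= s)                       # pooled at risk
        y1 = sum(1 for t, g in zip(times, groups) if t >= s and g == 1)
        d = sum(1 for t, e in zip(times, events) if t == s and e == 1)
        d1 = sum(1 for t, e, g in zip(times, events, groups)
                 if t == s and e == 1 and g == 1)
        k = 1.0 if weight == "logrank" else float(y)              # else: Gehan-Breslow
        p1 = y1 / y
        z1 += k * (d1 - p1 * d)                 # observed minus expected in group 1
        var11 += k * k * p1 * (1.0 - p1) * d    # increment of sigma_11
    u = z1 / math.sqrt(var11) if var11 > 0 else float("nan")
    return z1, var11, u


# Hypothetical toy data: six subjects, one censored.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
groups = [1, 2, 1, 2, 1, 2]
print(two_sample_test(times, events, groups))
```

Swapping the group labels changes the sign of U(t) but not its magnitude, which is the sense in which the statistic has a direction.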

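Parametric models.

Set-up: $N_i(t)$, $i = 1, \ldots, n$, are counting processes with intensity processes $\lambda_i(t)$, $i = 1, \ldots, n$, w.r.t. a filtration $(\mathcal F_t)$. Recall that

$$\lambda_i(t)\,dt \approx P(dN_i(t) = 1 \mid \mathcal F_{t-}).$$

Examples (mainly survival data) of parametric models (where $Y_i(t)$ is the at-risk indicator):

$\lambda_i(t) = \alpha Y_i(t)$, constant hazard
$\lambda_i(t) = \alpha\rho(\alpha t)^{\rho-1} Y_i(t)$, Weibull
$\lambda_i(t) = \theta\mu_i(t)Y_i(t)$, relative mortality
$\lambda_i(t) = (\gamma + \mu_i(t))Y_i(t)$, excess mortality
$\lambda_i(t) = \alpha_j Y_i(t)$, $t_j < t \le t_{j+1}$, piecewise constant hazard

Likelihood: Jacod's formula. (ABGK, p.58)

Consider a general parametric model $\lambda_i(t) = \alpha_i(t; \theta)Y_i(t)$, where $\theta = (\theta_1, \ldots, \theta_p)$ is the parameter vector. To derive the likelihood for $\theta$, let $N_\bullet = \sum_i N_i$, $Y_\bullet = \sum_i Y_i$, $\lambda_\bullet = \sum_i \lambda_i$ and note that

$$P(dN_\bullet(t) = 0 \mid \mathcal F_{t-}) \approx 1 - \lambda_\bullet(t; \theta)\,dt.$$

With $\tau$ being the terminal time of observation we have

$$P(\text{data}) = \prod_{0<t\le\tau} P(dN_1(t), \ldots, dN_n(t) \mid \mathcal F_{t-})\,P(\text{other events at } t \mid dN(t), \mathcal F_{t-}) \propto \prod_{0<t\le\tau} P(dN_1(t), \ldots, dN_n(t) \mid \mathcal F_{t-}).$$

Likelihood.

$$L(\theta) = P(\text{data}) = \prod_{0<t\le\tau}\Bigl(\lambda_1(t,\theta)^{dN_1(t)} \cdots \lambda_n(t,\theta)^{dN_n(t)}\,\bigl(1 - \lambda_\bullet(t,\theta)\,dt\bigr)^{1-dN_\bullet(t)}\Bigr) = \Bigl(\prod_{0<t\le\tau}\prod_{i=1}^n \lambda_i(t,\theta)^{dN_i(t)}\Bigr)\exp\Bigl(-\int_0^\tau \lambda_\bullet(u,\theta)\,du\Bigr).$$

Here, $dN_i(t) = 1$ if $N_i$ jumps at $t$ and 0 otherwise.

Score.

One may show that the scores $U_j(\theta) = \frac{\partial}{\partial\theta_j}\log L(\theta)$ are stochastic integrals w.r.t. the counting process martingales when evaluated at the true parameter $\theta_0$:

$$\log L(\theta) = \sum_{i=1}^n \int_0^\tau \log\lambda_i(t,\theta)\,dN_i(t) - \int_0^\tau \lambda_\bullet(t,\theta)\,dt,$$

$$\frac{\partial}{\partial\theta_j}\log L(\theta) = \sum_{i=1}^n \int_0^\tau \frac{\lambda_i^j(t,\theta)}{\lambda_i(t,\theta)}\,dN_i(t) - \int_0^\tau \lambda_\bullet^j(t,\theta)\,dt, \qquad \lambda_i^j = \frac{\partial\lambda_i}{\partial\theta_j},\quad \lambda_\bullet^j = \sum_{i=1}^n \lambda_i^j.$$

Use the Doob-Meyer decomposition $dN_i(t) = \lambda_i(t,\theta_0)\,dt + dM_i(t)$ to show that the score evaluated at $\theta_0$ equals

$$\frac{\partial}{\partial\theta_j}\log L(\theta_0) = \sum_{i=1}^n \int_0^\tau \frac{\lambda_i^j(t,\theta_0)}{\lambda_i(t,\theta_0)}\,dM_i(t).$$

Thereby, using the martingale CLT, asymptotic normality of the scores may be established, leading to a proof that the MLE for $\theta = (\theta_1, \ldots, \theta_p)$ enjoys the usual large sample properties.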
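To make the log-likelihood concrete, the sketch below evaluates it for the Weibull model with right-censored survival data. It assumes every subject enters at time 0 and is at risk until its observed time, so the integral of the intensity reduces to the cumulative hazard at that time. The function names and the toy data are hypothetical, and the score is only approximated by a finite difference.

```python
import math


def weibull_loglik(alpha, rho, times, events):
    """Log-likelihood for hazard alpha*rho*(alpha*t)**(rho-1), right-censored data.

    Each subject contributes log(hazard at t_i) if an event is observed,
    minus the cumulative hazard (alpha*t_i)**rho over its time at risk.
    """
    ll = 0.0
    for t, d in zip(times, events):
        if d:
            ll += math.log(alpha * rho) + (rho - 1.0) * math.log(alpha * t)
        ll -= (alpha * t) ** rho
    return ll


def score_alpha(alpha, rho, times, events, h=1e-6):
    """Central-difference approximation to d/d(alpha) of the log-likelihood."""
    return (weibull_loglik(alpha + h, rho, times, events)
            - weibull_loglik(alpha - h, rho, times, events)) / (2.0 * h)


# Hypothetical data: with rho = 1 the model is the constant hazard model,
# and the score vanishes at the occurrence/exposure rate.
times = [2.0, 3.5, 1.2, 4.0, 0.7]
events = [1, 0, 1, 1, 0]
alpha_hat = sum(events) / sum(times)
print(score_alpha(alpha_hat, 1.0, times, events))
```

Setting the shape parameter to 1 recovers the constant hazard log-likelihood, which gives a simple consistency check on the formula.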

Sufficiency.

In most of the examples above, $\lambda_i(t; \theta) = \alpha(t; \theta)Y_i(t)$, in which case the likelihood is (proportional to)

$$L(\theta) = \Bigl(\prod_{0<t\le\tau}\alpha(t;\theta)^{dN_\bullet(t)}\Bigr)\exp\Bigl(-\int_0^\tau \alpha(u;\theta)Y_\bullet(u)\,du\Bigr).$$

It follows that $(N_\bullet, Y_\bullet)$ is sufficient for $\theta$.

Constant hazard.

$(N_\bullet, Y_\bullet)$ is sufficient when the hazard is assumed constant, i.e. $\lambda_i(t; \alpha) = \alpha Y_i(t)$. If we let $R(t) = \int_0^t Y_\bullet(u)\,du$ be the total time at risk in $[0, t]$ then

$$L(\alpha) = \alpha^{N_\bullet(\tau)}\exp(-\alpha R(\tau))$$

and the MLE is the occurrence/exposure rate

$$\hat\alpha = \frac{N_\bullet(\tau)}{R(\tau)} \quad\text{and}\quad \widehat{SD}(\hat\alpha) = \Bigl(-\frac{\partial^2}{\partial\alpha^2}\log L(\hat\alpha)\Bigr)^{-1/2} = \sqrt{\frac{\hat\alpha}{R(\tau)}}.$$

NOTE: The Nelson-Aalen estimator (and thereby, by properties of the product-integral, also the Kaplan-Meier and Aalen-Johansen estimators, to be defined later) may be given a MLE interpretation.

Example: melanoma data.

Here we have, for females: $\hat\alpha = 28/786.9 = 0.0356$ per year with $\widehat{SD} = 0.0067$ while, for males, we have $\hat\alpha = 29/420.8 = 0.0689$ per year with $\widehat{SD} = 0.0128$. We can compare the two using the approximately $\chi^2_1$ LR statistic, which takes the value 6.15 ($P = 0.01$).

The constant hazard hypothesis may be evaluated using confidence bands for the Nelson-Aalen estimates or by embedding the model in the larger Weibull model. For the Weibull model the estimated shape parameter for females is $\hat\rho = 1.197\ (0.22)$ while for males it is $\hat\rho = 1.17\ (0.166)$, and for both sexes the Wald statistic $(\hat\rho - 1)/\widehat{SD}$ is quite insignificant.

[Figure: Nelson-Aalen estimates for male and female melanoma patients. Males: dashed, females: solid.]
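The constant hazard calculations for the melanoma example can be reproduced in a few lines. This sketch assumes the event counts and person-years are 28 events over 786.9 years for females and 29 events over 420.8 years for males (the readings consistent with the reported rates and standard deviations). The function names are hypothetical.

```python
import math


def occurrence_exposure(events, person_time):
    """Return the occurrence/exposure rate and its estimated standard deviation."""
    rate = events / person_time
    sd = math.sqrt(rate / person_time)   # equals rate / sqrt(events)
    return rate, sd


def lr_equal_rates(d1, r1, d2, r2):
    """Likelihood ratio statistic for equal constant hazards in two groups.

    Requires at least one event in each group; approximately chi-square
    with 1 d.f. under the hypothesis of equal hazards.
    """
    def max_loglik(d, r):
        return d * math.log(d / r) - d   # log-likelihood at the MLE d / r
    return 2.0 * (max_loglik(d1, r1) + max_loglik(d2, r2)
                  - max_loglik(d1 + d2, r1 + r2))


rate_f, sd_f = occurrence_exposure(28, 786.9)   # assumed female totals
rate_m, sd_m = occurrence_exposure(29, 420.8)   # assumed male totals
print(round(rate_f, 4), round(sd_f, 4))  # 0.0356 0.0067
print(round(rate_m, 4), round(sd_m, 4))  # 0.0689 0.0128
print(lr_equal_rates(28, 786.9, 29, 420.8))
```

With these inputs the likelihood ratio statistic comes out close to the reported 6.15; a small difference in the second decimal is to be expected from rounded person-years.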

[Figure: Nelson-Aalen estimate for male melanoma patients with bands.]

Piecewise constant hazards.

The constant hazard model is often quite restrictive. A simple device offers a lot of flexibility: relax the assumption to piecewise constant hazards,

$$\lambda_i(t) = \alpha_j Y_i(t), \quad t_j < t \le t_{j+1},$$

where $0 = t_0 < t_1 < \ldots < t_{k-1} < t_k = +\infty$ are pre-specified interval cut-points. Define occurrence, $D_j = N_\bullet(t_{j+1}) - N_\bullet(t_j)$, and exposure, $R_j = \int_{t_j}^{t_{j+1}} Y_\bullet(u)\,du$. Then

$$L(\alpha) = \Bigl(\prod_j \alpha_j^{D_j}\Bigr)\exp\Bigl(-\sum_j \alpha_j R_j\Bigr),$$

and the MLE simply becomes

$$\hat\alpha_j = \frac{D_j}{R_j}, \qquad \widehat{SD}(\hat\alpha_j) = \sqrt{\frac{\hat\alpha_j}{R_j}},$$

with the $\hat\alpha_j$ asymptotically independent.

Example: suicides among Danish non-manual workers. (Ex. I.3.3, p.17)

Four categories of non-manual workers:
1. academics
2. advanced non-academic training
3. extensive practical training
4. other non-manual workers

Suicides and person-years tabulated by age group and work category (excerpt):

Suicides
Sex      Age     Group 1   Group 2   Group 3   Group 4
Males    20-24         2        14        31        51
Males    25-29        10        48        46        46
Males    ...
Males    60-64         6         1        21        17
Females  20-24         0        12        29        61
Females  25-29         5         1        27        65
Females  ...
Females  60-64         2         3         7        17

Person-years
Sex      Age     Group 1   Group 2   Group 3   Group 4
Males    20-24     13439     67623    153241     22563
Males    25-29     52787
Males    ...
Males    60-64     22965      3716      8327     45255
Females  20-24      2366     79773    199778      6134
Females  25-29      1447
Females  ...
Females  60-64      4866     18858      3275     69418
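The occurrence and exposure in each interval can be computed from individual follow-up data as sketched below. The sketch assumes all subjects enter at time 0 and leave the risk set at their observed time; the function name, the input layout and the toy data are hypothetical.

```python
import math


def piecewise_rates(exit_times, events, cuts):
    """Return a list of (D_j, R_j, rate_j, sd_j), one tuple per interval.

    exit_times[i] : time at which subject i leaves the risk set (entry at 0)
    events[i]     : 1 if the exit is an event, 0 if censored
    cuts          : increasing cut-points starting at 0; the last interval
                    extends to infinity
    Intervals are (cuts[j], cuts[j+1]], matching t_j < t <= t_{j+1}.
    """
    bounds = list(cuts) + [math.inf]
    result = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Occurrence: events falling in (lo, hi].
        d = sum(1 for t, e in zip(exit_times, events) if e == 1 and lo < t <= hi)
        # Exposure: time each subject spends at risk inside the interval.
        r = sum(max(0.0, min(t, hi) - lo) for t in exit_times)
        rate = d / r if r > 0 else float("nan")
        sd = math.sqrt(rate / r) if r > 0 else float("nan")
        result.append((d, r, rate, sd))
    return result


# Hypothetical data: five subjects, cut-points at 0, 1 and 3.
for row in piecewise_rates([0.5, 1.5, 2.5, 3.0, 4.0], [1, 1, 0, 1, 1], [0, 1, 3]):
    print(row)
```

The interval exposures add up to the total time at risk, so a single interval reproduces the constant hazard occurrence/exposure rate.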

Example: suicides among Danish non-manual workers.

Age-specific suicide rates are occurrence/exposure rates, i.e. suicides divided by person-years (e.g., male academics: 2 suicides at ages 20-24, rate 1.5; 10 suicides at ages 25-29, rate 1.9):

Suicide rates per 100,000 years (SD)
Age      Males        Females
20-24    1.5 (1.1)    0 (0)
25-29    1.9 (0.6)    3.6 (1.6)
30-34    4.9 (1.0)    4.9 (1.9)
35-39    4.0 (0.9)    7.6 (2.9)
40-44    5.2 (1.0)    6.4 (2.6)
45-49    4.1 (0.9)    4.9 (2.2)
50-54    5.1 (1.1)    4.6 (2.3)
55-59    5.8 (1.3)    9.0 (3.7)
60-64    2.6 (1.1)    4.1 (2.9)

The standardized mortality ratio.

Recall the model for relative mortality, $\lambda_i(t) = \theta\mu_i(t)Y_i(t)$, where $\mu_i(t)$ is a known mortality rate, e.g. an age-, sex-, and calendar time-specific population mortality for individual $i$ at time $t$. If $\theta = 1$ we have

$$N_\bullet(\tau) = \sum_{i=1}^n \int_0^\tau \mu_i(u)Y_i(u)\,du + M_\bullet(\tau),$$

and thereby $\mathrm{E}(N_\bullet(\tau)) = \mathrm{E}\bigl(\sum_{i=1}^n \int_0^\tau \mu_i(u)Y_i(u)\,du\bigr)$. Therefore,

$$E(\tau) = \sum_{i=1}^n \int_0^\tau \mu_i(u)Y_i(u)\,du$$

can be thought of as the (under $H_0: \theta = 1$) expected number of deaths.

The MLE is

$$\hat\theta = \frac{N_\bullet(\tau)}{E(\tau)},$$

the standardized mortality ratio, with

$$\widehat{SD}(\hat\theta) = \sqrt{\frac{\hat\theta}{E(\tau)}}.$$

Example: suicides (continued). Let $\mu_i(t) = \mu_{\text{sex}(i)}(t)$ at age $t$ be defined as occ/exp rates from combined data (treated as known). Then we get for the academics

Males: $\hat\theta = 1.2\ (0.1)$
Females: $\hat\theta = 1.57\ (0.25)$

both marginally significantly (judged from a Wald test) larger than 1.
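The standardized mortality ratio and its Wald test can be sketched as follows. The observed and expected counts used here (144 and 120) are purely hypothetical illustration values, chosen only because they give a ratio of 1.2 with standard deviation 0.1; they are not taken from the suicide data. The function names are likewise hypothetical.

```python
import math


def expected_deaths(person_years, reference_rates):
    """Sum of reference rate times person-years over strata (e.g. age groups)."""
    return sum(r * m for r, m in zip(person_years, reference_rates))


def smr(observed, expected):
    """Return (theta_hat, sd, wald) for the hypothesis theta = 1."""
    theta = observed / expected
    sd = math.sqrt(theta / expected)   # equals theta / sqrt(observed)
    wald = (theta - 1.0) / sd
    return theta, sd, wald


# Hypothetical counts, for illustration only.
theta, sd, wald = smr(144, 120.0)
print(round(theta, 2), round(sd, 2), round(wald, 2))  # 1.2 0.1 2.0
```

A Wald statistic around 2 corresponds to the "marginally significant" situation described for the academics.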