Introduction to the rstpm2 package

Similar documents
Modelling excess mortality using fractional polynomials and spline functions. Modelling time-dependent excess hazard

Generalized Survival Models as a Tool for Medical Research

Modelling geoadditive survival data

The nltm Package. July 24, 2006

Introduction to the predictnl function

Simulating complex survival data

Direct likelihood inference on the cause-specific cumulative incidence function: a flexible parametric regression modelling approach

flexsurv: flexible parametric survival modelling in R. Supplementary examples

Instantaneous geometric rates via Generalized Linear Models

3003 Cure. F. P. Treasure

Lecture 7 Time-dependent Covariates in Cox Regression

Instantaneous geometric rates via generalized linear models

Package GORCure. January 13, 2017

Joint longitudinal and time-to-event models via Stan

Direct likelihood inference on the cause-specific cumulative incidence function: a flexible parametric regression modelling approach

Analysing geoadditive regression data: a mixed model approach

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine

Parametric multistate survival analysis: New developments

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013

Classification. Chapter Introduction. 6.2 The Bayes classifier

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

Package SimSCRPiecewise

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Package ICGOR. January 13, 2017

A general mixed model approach for spatio-temporal regression data

The coxvc_1-1-1 package

Accelerated Failure Time Models

One-stage dose-response meta-analysis

Survival Analysis I (CHL5209H)

Package JointModel. R topics documented: June 9, Title Semiparametric Joint Models for Longitudinal and Counting Processes Version 1.

Multivariable Fractional Polynomials

mboost - Componentwise Boosting for Generalised Regression Models

Lecture 8 Stat D. Gillen

Regularization in Cox Frailty Models

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

Package CorrMixed. R topics documented: August 4, Type Package

ST495: Survival Analysis: Maximum likelihood

Harvard University. Harvard University Biostatistics Working Paper Series. Survival Analysis with Change Point Hazard Functions

Multivariable Fractional Polynomials

NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS

Reduced-rank hazard regression

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

MAS3301 / MAS8311 Biostatistics Part II: Survival

Time-dependent coefficients

lqmm: Estimating Quantile Regression Models for Independent and Hierarchical Data with R

Part [1.0] Measures of Classification Accuracy for the Prediction of Survival Times

Meta-analysis of epidemiological dose-response studies

Quantile Regression for Residual Life and Empirical Likelihood

Survival Regression Models

Survival Analysis. Stat 526. April 13, 2018

Estimating compound expectation in a regression framework with the new cereg command

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Marginal Survival Modeling through Spatial Copulas

SSUI: Presentation Hints 2 My Perspective Software Examples Reliability Areas that need work

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

Point-wise averaging approach in dose-response meta-analysis of aggregated data

ST745: Survival Analysis: Cox-PH!

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

for Time-to-event Data Mei-Ling Ting Lee University of Maryland, College Park

Multistate models and recurrent event models

Flexible parametric alternatives to the Cox model, and more

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Survival Analysis. 732G34 Statistisk analys av komplexa data. Krzysztof Bartoszek

Case-control studies

Multivariate Survival Analysis

Frailty models. November 30, Programme

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

stcrmix and Timing of Events with Stata

Multistate models and recurrent event models

Extensions of Cox Model for Non-Proportional Hazards Purpose

CTDL-Positive Stable Frailty Model

Lecture 10. Diagnostics. Statistics Survival Analysis. Presented March 1, 2016

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.

Maximum likelihood estimation of a log-concave density based on censored data

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

A brief note on the simulation of survival data with a desired percentage of right-censored datas

Residuals and model diagnostics

Hierarchical Hurdle Models for Zero-In(De)flated Count Data of Complex Designs

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

BEGINNING BAYES IN R. Comparing two proportions

Package ICBayes. September 24, 2017

β j = coefficient of x j in the model; β = ( β1, β2,

Working with Stata Inference on the mean

Package spef. February 26, 2018

General Regression Model

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

MAS3301 / MAS8311 Biostatistics Part II: Survival

Error distribution function for parametrically truncated and censored data

parfm: Parametric Frailty Models in R

Package threg. August 10, 2015

Available online Journal of Scientific and Engineering Research, 2018, 5(6): Research Article

Longitudinal + Reliability = Joint Modeling

Research Design: Topic 18 Hierarchical Linear Modeling (Measures within Persons) 2010 R.C. Gardner, Ph.d.

Lecture 7. Proportional Hazards Model - Handling Ties and Survival Estimation Statistics Survival Analysis. Presented February 4, 2016

RESEARCH ARTICLE. Detecting Multiple Change Points in Piecewise Constant Hazard Functions

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

Package penmsm. R topics documented: February 20, Type Package

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

Transcription:

Introduction to the rstpm2 package Mark Clements Karolinska Institutet Abstract This vignette outlines the methods and provides some examples for link-based survival models as implemented in the R rstpm2 package. Keywords: survival, splines. 1. Background and theory Link-based survival models provide a flexible and general approach to modelling survival or time-to-event data. The survival function S(t x) to time t for covariates x is defined in terms of a link function G and a linear prediction η(t, x), such that S(t x) = G(η(t, x)) where η is a function of both time t and covariates x. The linear predictor can be constructed in a flexible manner. Royston and Parmar (2003) focused on time being modelled using natural splines for log-time, including left truncation and relative survival. We have implemented the Royston-Parmar model class and extended it in several ways, allowing for: (i) general parametric models for η(t, x), including B-splines and natural splines for different transformations of time; (ii) general semi-parametric models for η(t, x) including penalised smoothers together with unpenalised parametric functions; (iii) interval censoring; and (iv) frailties using Gamma and log-normal distributions. Fully parametric models are estimated using maximum likelihood, while the semi-parametric models are estimated using maximum penalised likelihood with smoothing parameters selected using A more detailed theoretical development is available from the paper by Liu, Pawitan and Clements (available on request). Why would you want to use these models? 2. Mean survival This has a useful interpretation for causal inference. E Z (S(t Z, X = 1)) E Z (S(t Z, X = 0)) fit <- stpm2(...) predict(fit,type="meansurv",newdata=data)

2 Introduction to the rstpm2 package 3. Cure models For cure, we use the melanoma dataset used by Andersson and colleagues for cure models with Stata s stpm2 (see http://www.pauldickman.com/survival/). Initially, we merge the patient data with the all cause mortality rates. > popmort2 <- transform(rstpm2::popmort,exitage=age,exityear=year,age=null,year=null) > colon2 <- within(rstpm2::colon, { + status <- ifelse(surv_mm>120.5,1,status) + tm <- pmin(surv_mm,120.5)/12 + exit <- dx+tm*365.25 + sex <- as.numeric(sex) + exitage <- pmin(floor(age+tm),99) + exityear <- floor(yydx+tm) + ##year8594 <- (year8594=="diagnosed 85-94") + }) > colon2 <- merge(colon2,popmort2) For comparisons, we fit the relative survival model without and with cure. > fit0 <- stpm2(surv(tm,status %in% 2:3)~I(year8594=="Diagnosed 85-94"), + data=colon2, + bhazard=colon2$rate, df=5) > summary(fit <- stpm2(surv(tm,status %in% 2:3)~I(year8594=="Diagnosed 85-94"), + data=colon2, + bhazard=colon2$rate, + df=5,cure=true)) Maximum likelihood estimation Call: mle2(minuslogl = negll, start = coef, eval.only = TRUE, vecpar = TRUE, gr = function (beta) { localargs <- args localargs$init <- beta localargs$return_type <- "gradient" return(.call("model_output", localargs, PACKAGE = "rstpm2")) }, control = list(parscale = c(`(intercept)` = 1, `I(year8594 == "Diagnosed 85-94")TRUE `nsx(log(tm), df = 5, cure = TRUE)1` = 1, `nsx(log(tm), df = 5, cure = TRUE)2` = 1, `nsx(log(tm), df = 5, cure = TRUE)3` = 1, `nsx(log(tm), df = 5, cure = TRUE)4` = 1, `nsx(log(tm), df = 5, cure = TRUE)5` = 1), maxit = 300), lower = -Inf, upper = Inf) Coefficients: Estimate Std. Error z value Pr(z)

Mark Clements 3 (Intercept) -3.977663 0.054782-72.6093 < 2.2e-16 I(year8594 == "Diagnosed 85-94")TRUE -0.155511 0.025089-6.1984 5.704e-10 nsx(log(tm), df = 5, cure = TRUE)1 3.323382 0.053169 62.5058 < 2.2e-16 nsx(log(tm), df = 5, cure = TRUE)2 3.628899 0.053163 68.2597 < 2.2e-16 nsx(log(tm), df = 5, cure = TRUE)3 1.634974 0.022466 72.7752 < 2.2e-16 nsx(log(tm), df = 5, cure = TRUE)4 6.592489 0.111512 59.1192 < 2.2e-16 nsx(log(tm), df = 5, cure = TRUE)5 3.371954 0.042789 78.8036 < 2.2e-16 (Intercept) *** I(year8594 == "Diagnosed 85-94")TRUE *** nsx(log(tm), df = 5, cure = TRUE)1 *** nsx(log(tm), df = 5, cure = TRUE)2 *** nsx(log(tm), df = 5, cure = TRUE)3 *** nsx(log(tm), df = 5, cure = TRUE)4 *** nsx(log(tm), df = 5, cure = TRUE)5 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05. 0.1 1-2 log L: 42190.77 > predict(fit,head(colon2),se.fit=true) 1 0.8611043 0.8543119 0.8676050 2 0.7934962 0.7850418 0.8016614 3 0.6967834 0.6863627 0.7069356 4 0.8611043 0.8543119 0.8676050 5 0.8221508 0.8143497 0.8296593 6 0.8611043 0.8543119 0.8676050 The estimate for the year parameter from the model without cure is within three significant figures with that in Stata. For the predictions, the Stata model gives: +---------------------------------+ surv surv_lci surv_uci --------------------------------- 1..86108264.8542898.8675839 2..79346526.7850106.8016309 3..69674037.6863196.7068927 4..86108264.8542898.8675839 5..82212425.8143227.8296332 --------------------------------- 6..86108264.8542898.8675839 +---------------------------------+ We can estimate the proportion of failures prior to the last event time:

4 Introduction to the rstpm2 package > newdata.eof <- data.frame(year8594 = unique(colon2$year8594), + tm=10) > 1-predict(fit0, newdata.eof, type="surv", se.fit=true) 1 0.6060950 0.6208814 0.5913491 2 0.5512519 0.5658463 0.5367742 > 1-predict(fit, newdata.eof, type="surv", se.fit=true) 1 0.5912976 0.6054691 0.5771835 2 0.5350852 0.5485412 0.5217471 > predict(fit, newdata.eof, type="haz", se.fit=true) 1 1.253896e-06 1.092818e-06 1.438717e-06 2 1.073307e-06 9.334234e-07 1.234153e-06 We can plot the predicted survival estimates: > tms=seq(0,10,length=301)[-1] > plot(fit0,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), ylim=0:1, + xlab="time since diagnosis (years)", ylab="relative survival") > plot(fit0,newdata=data.frame(year8594 = "Diagnosed 75-84",tm=tms), + add=true,line.col="red",rug=false) > ## warnings: Predicted hazards less than zero for cure > plot(fit,newdata=data.frame(year8594 = "Diagnosed 85-94",tm=tms), + add=true,ci=false,lty=2,rug=false) > plot(fit,newdata=data.frame(year8594="diagnosed 75-84",tm=tms), + add=true,rug=false,line.col="red",ci=false,lty=2) > legend("topright",c("85-94 without cure","75-84 without cure", + "85-94 with cure","75-84 with cure"), + col=c(1,2,1,2), lty=c(1,1,2,2), bty="n")

Mark Clements 5 Relative survival 0.0 0.2 0.4 0.6 0.8 1.0 85 94 without cure 75 84 without cure 85 94 with cure 75 84 with cure 0 2 4 6 8 10 Time since diagnosis (years) And the hazard curves: > plot(fit0,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), + ylim=c(0,0.5), type="hazard", + xlab="time since diagnosis (years)",ylab="excess hazard") > plot(fit0,newdata=data.frame(year8594 = "Diagnosed 75-84", tm=tms), + type="hazard", + add=true,line.col="red",rug=false) > plot(fit,newdata=data.frame(year8594 = "Diagnosed 85-94", tm=tms), + type="hazard", + add=true,ci=false,lty=2,rug=false) > plot(fit,newdata=data.frame(year8594="diagnosed 75-84", tm=tms), + type="hazard", + add=true,rug=false,line.col="red",ci=false,lty=2) > legend("topright",c("85-94 without cure","75-84 without cure", + "85-94 with cure","75-84 with cure"), + col=c(1,2,1,2), lty=c(1,1,2,2), bty="n")

6 Introduction to the rstpm2 package Excess hazard 0.0 0.1 0.2 0.3 0.4 0.5 85 94 without cure 75 84 without cure 85 94 with cure 75 84 with cure 0 2 4 6 8 10 Time since diagnosis (years) Affiliation: Mark Clements Department of Medical Epidemiology and Biostatistics Karolinska Institutet Email: mark.clements@ki.se