Springer Series in Statistics

Series editors
Peter Bickel, CA, USA
Peter Diggle, Lancaster, UK
Stephen E. Fienberg, Pittsburgh, PA, USA
Ursula Gather, Dortmund, Germany
Ingram Olkin, Stanford, CA, USA
Scott Zeger, Baltimore, MD, USA
More information about this series at http://www.springer.com/series/692
Gerhard Tutz, Matthias Schmid

Modeling Discrete Time-to-Event Data
Gerhard Tutz
LMU Munich
Munich, Germany

Matthias Schmid
University of Bonn
Bonn, Germany

ISSN 0172-7397
ISSN 2197-568X (electronic)
Springer Series in Statistics
ISBN 978-3-319-28156-8
ISBN 978-3-319-28158-2 (ebook)
DOI 10.1007/978-3-319-28158-2
Library of Congress Control Number: 2016942538

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland
Preface

In recent years, a large variety of textbooks dealing with time-to-event analysis has been published. Most of these books focus on the statistical analysis of observations in continuous time. In practice, however, one often observes discrete event times, either because of grouping effects or because event times are intrinsically measured on a discrete scale. Statistical methodology for discrete event times has mainly been presented in journal articles and a few book chapters. In this book we introduce basic concepts and give several extensions that allow discrete time data to be modeled adequately. In particular, modeling discrete time-to-event data profits strongly from the smoothing and regularization methods that have been developed in recent decades. The presented approaches include methods that yield much more flexible models than were available in the early days of survival modeling.

The book is aimed at applied statisticians, students of statistics and researchers from areas like biometrics, social sciences and econometrics. The mathematical level is moderate; instead, we focus on basic concepts and data analysis.

Objectives

The main aims of the book are to provide a thorough introduction to basic and advanced concepts of discrete hazard modeling, to exploit the relationship between hazard models and generalized linear models, to demonstrate how existing statistical software can be used to fit discrete time-to-event models, and to illustrate the statistical methodology for discrete time-to-event models by considering applications from the social sciences, economics and biomedical sciences.
Special Topics

This book provides a comprehensive treatment of statistical methodology for discrete time-to-event models. Special topics include non-parametric modeling of survival (e.g., by using smooth baseline hazards and/or smooth predictor effects), methods for the evaluation of model fit and prediction accuracy of discrete time-to-event models, regularized estimation techniques for predictor selection in high-dimensional covariate spaces, and tree-based methods for discrete time-to-event analysis. In addition, each section of the book contains a set of exercises on the respective topics.

Implementation and Software

All numerical results presented in this book were obtained by using the R System for Statistical Computing (R Core Team 2015). Hence readers are able to reproduce all the results by using freely available software. Various functions and tools for the analysis of discrete time-to-event data are collected in the R package discsurv (Welchowski and Schmid 2015).

We are grateful to many colleagues for valuable discussions and suggestions, in particular to Kaveh Bashiri, Moritz Berger, Jutta Gampe, Andreas Groll, Wolfgang Hess, Stephanie Möst, Vito M. R. Mugeo, Margret Oelker, Hein Putter, Micha Schneider and Steffen Unkel. Silke Janitza carefully read preliminary versions of the book and helped to reduce the number of mistakes. We also thank Helmut Küchenhoff for late but substantial suggestions. Special thanks go to Thomas Welchowski for his excellent programming work and to Pia Oberschmidt for assisting us in compiling the subject index.

München, Germany: Gerhard Tutz
Bonn, Germany: Matthias Schmid
April 2015
Contents

1 Introduction
  1.1 Survival and Time-to-Event Data
  1.2 Continuous Versus Discrete Survival
  1.3 Overview
  1.4 Examples

2 The Life Table
  2.1 Life Table Estimates
    2.1.1 Distributional Aspects
    2.1.2 Smooth Life Table Estimators
    2.1.3 Heterogeneous Intervals
  2.2 Kaplan-Meier Estimator
  2.3 Life Tables in Demography
  2.4 Literature and Further Reading
  2.5 Software
  2.6 Exercises

3 Basic Regression Models
  3.1 The Discrete Hazard Function
  3.2 Parametric Regression Models
    3.2.1 Logistic Discrete Hazards: The Proportional Continuation Ratio Model
    3.2.2 Alternative Models
  3.3 Discrete and Continuous Hazards
    3.3.1 Concepts for Continuous Time
    3.3.2 The Proportional Hazards Model
  3.4 Estimation
    3.4.1 Standard Errors
  3.5 Time-Varying Covariates
  3.6 Continuous Versus Discrete Proportional Hazards
  3.7 Subject-Specific Interval Censoring
  3.8 Literature and Further Reading
  3.9 Software
  3.10 Exercises

4 Evaluation and Model Choice
  4.1 Relevance of Predictors: Tests
  4.2 Residuals and Goodness-of-Fit
    4.2.1 No Censoring
    4.2.2 Deviance in the Case of Censoring
    4.2.3 Martingale Residuals
  4.3 Measuring Predictive Performance
    4.3.1 Predictive Deviance and R² Coefficients
    4.3.2 Prediction Error Curves
    4.3.3 Discrimination Measures
  4.4 Choice of Link Function and Flexible Links
    4.4.1 Families of Response Functions
    4.4.2 Nonparametric Estimation of Link Functions
  4.5 Literature and Further Reading
  4.6 Software
  4.7 Exercises

5 Nonparametric Modeling and Smooth Effects
  5.1 Smooth Baseline Hazard
    5.1.1 Estimation
    5.1.2 Smooth Life Table Estimates
  5.2 Additive Models
  5.3 Time-Varying Coefficients
    5.3.1 Penalty for Smooth Time-Varying Effects and Selection
    5.3.2 Time-Varying Effects and Additive Models
  5.4 Inclusion of Calendar Time
  5.5 Literature and Further Reading
  5.6 Software
  5.7 Exercises

6 Tree-Based Approaches
  6.1 Recursive Partitioning
  6.2 Recursive Partitioning Based on Covariate-Free Discrete Hazard Models
  6.3 Recursive Partitioning with Binary Outcome
  6.4 Ensemble Methods
    6.4.1 Bagging
    6.4.2 Random Forests
  6.5 Literature and Further Reading
  6.6 Software
  6.7 Exercises

7 High-Dimensional Models: Structuring and Selection of Predictors
  7.1 Penalized Likelihood Approaches
  7.2 Boosting
    7.2.1 Generic Boosting Algorithm for Arbitrary Outcomes
    7.2.2 Application to Discrete Hazard Models
  7.3 Extension to Additive Predictors
  7.4 Literature and Further Reading
  7.5 Software
  7.6 Exercises

8 Competing Risks Models
  8.1 Parametric Models
    8.1.1 Multinomial Logit Model
    8.1.2 Ordered Target Events
    8.1.3 General Form
    8.1.4 Separate Modeling of Single Targets
  8.2 Maximum Likelihood Estimation
  8.3 Variable Selection
  8.4 Literature and Further Reading
  8.5 Software
  8.6 Exercises

9 Frailty Models and Heterogeneity
  9.1 Discrete Hazard Frailty Model
    9.1.1 Individual and Population Level Hazard
    9.1.2 Basic Frailty Model Including Covariates
    9.1.3 Modeling with Frailties
  9.2 Estimation of Frailty Models
  9.3 Extensions to Additive Models Including Frailty
  9.4 Variable Selection in Frailty Models
  9.5 Fixed-Effects Model
  9.6 Finite Mixture Models
    9.6.1 Extensions to Covariate-Dependent Mixture Probabilities
    9.6.2 The Cure Model
    9.6.3 Estimation for Finite Mixtures
  9.7 Sequential Models in Item Response Theory
  9.8 Literature and Further Reading
  9.9 Software
  9.10 Exercises

10 Multiple-Spell Analysis
  10.1 Multiple Spells
    10.1.1 Estimation
  10.2 Multiple Spells as Repeated Measurements
  10.3 Generalized Estimation Approach to Repeated Measurements
  10.4 Literature and Further Reading
  10.5 Software
  10.6 Exercises

References
List of Examples
Subject Index
Author Index