Springer Series in Statistics

Series editors: Peter Bickel, CA, USA; Peter Diggle, Lancaster, UK; Stephen E. Fienberg, Pittsburgh, PA, USA; Ursula Gather, Dortmund, Germany; Ingram Olkin, Stanford, CA, USA; Scott Zeger, Baltimore, MD, USA

More information about this series at http://www.springer.com/series/692

Gerhard Tutz, Matthias Schmid

Modeling Discrete Time-to-Event Data

Gerhard Tutz
LMU Munich
Munich, Germany

Matthias Schmid
University of Bonn
Bonn, Germany

ISSN 0172-7397
ISSN 2197-568X (electronic)
Springer Series in Statistics
ISBN 978-3-319-28156-8
ISBN 978-3-319-28158-2 (ebook)
DOI 10.1007/978-3-319-28158-2
Library of Congress Control Number: 2016942538

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature. The registered company is Springer International Publishing AG Switzerland.

Preface

In recent years, a large variety of textbooks dealing with time-to-event analysis has been published. Most of these books focus on the statistical analysis of observations in continuous time. In practice, however, one often observes discrete event times, either because of grouping effects or because event times are intrinsically measured on a discrete scale. Statistical methodology for discrete event times has mainly been presented in journal articles and a few book chapters.

In this book we introduce basic concepts and give several extensions that make it possible to model discrete time data adequately. In particular, modeling discrete time-to-event data profits strongly from the smoothing and regularization methods that have been developed in recent decades. The presented approaches include methods that yield much more flexible models than were available in the early days of survival modeling.

The book is aimed at applied statisticians, students of statistics and researchers from areas like biometrics, social sciences and econometrics. The mathematical level is moderate; instead, we focus on basic concepts and data analysis.

Objectives

The main aims of the book are to provide a thorough introduction to basic and advanced concepts of discrete hazard modelling; to exploit the relationship between hazard models and generalized linear models; to demonstrate how existing statistical software can be used to fit discrete time-to-event models; and to illustrate the statistical methodology for discrete time-to-event models by considering applications from the social sciences, economics and biomedical sciences.
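As an illustration of the relationship between hazard models and binary-response models mentioned in the objectives, the following is a minimal sketch, not taken from the book's own software. It assumes the common person-period expansion: a subject observed up to discrete period t contributes one binary row for each period at risk, with a 1 only in the period where the event occurs. The censoring convention (a subject censored in period t still counts as at risk in t), the function names and the toy data are all assumptions made for this example. For a model without covariates, the fitted hazard in each period is simply the share of events among those at risk.

```python
from collections import Counter


def to_person_period(times, events):
    """Expand (time, event) pairs into one binary row per period at risk.

    Assumed convention: a subject observed until period t contributes rows
    (1, 0), ..., (t - 1, 0), (t, y), where y = 1 if the event occurred in
    period t and y = 0 if the observation was censored there.
    """
    rows = []
    for t, d in zip(times, events):
        for s in range(1, t + 1):
            rows.append((s, 1 if (s == t and d == 1) else 0))
    return rows


def hazard_and_survival(rows):
    """Covariate-free hazard per period (events / at risk) and survival."""
    at_risk = Counter(s for s, _ in rows)
    n_events = Counter(s for s, y in rows if y == 1)
    hazard, survival, surv = {}, {}, 1.0
    for s in sorted(at_risk):
        hazard[s] = n_events[s] / at_risk[s]
        surv *= 1.0 - hazard[s]  # S(s) = product of (1 - hazard) up to s
        survival[s] = surv
    return hazard, survival


# Hypothetical toy data: discrete times and indicators (1 = event, 0 = censored).
times = [1, 2, 2, 3, 3, 3]
events = [1, 1, 0, 1, 0, 1]

rows = to_person_period(times, events)
hazard, survival = hazard_and_survival(rows)

for s in sorted(hazard):
    print(f"period {s}: hazard = {hazard[s]:.3f}, survival = {survival[s]:.3f}")
```

The expanded rows have the form of ordinary binary regression data, so any software for binary-response generalized linear models could in principle be applied to them with period indicators and covariates as predictors; that step is left out of this sketch.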

Special Topics

This book provides a comprehensive treatment of statistical methodology for discrete time-to-event models. Special topics include non-parametric modeling of survival (e.g., by using smooth baseline hazards and/or smooth predictor effects), methods for the evaluation of model fit and prediction accuracy of discrete time-to-event models, regularized estimation techniques for predictor selection in high-dimensional covariate spaces, and tree-based methods for discrete time-to-event analysis. In addition, each section of the book contains a set of exercises on the respective topics.

Implementation and Software

All numerical results presented in this book were obtained by using the R System for Statistical Computing (R Core Team 2015). Hence readers are able to reproduce all the results by using freely available software. Various functions and tools for the analysis of discrete time-to-event data are collected in the R package discsurv (Welchowski and Schmid 2015).

We are grateful to many colleagues for valuable discussions and suggestions, in particular to Kaveh Bashiri, Moritz Berger, Jutta Gampe, Andreas Groll, Wolfgang Hess, Stephanie Möst, Vito M. R. Mugeo, Margret Oelker, Hein Putter, Micha Schneider and Steffen Unkel. Silke Janitza carefully read preliminary versions of the book and helped to reduce the number of mistakes. We also thank Helmut Küchenhoff for late but substantial suggestions. Special thanks go to Thomas Welchowski for his excellent programming work and to Pia Oberschmidt for assisting us in compiling the subject index.

Gerhard Tutz, München, Germany
Matthias Schmid, Bonn, Germany
April 2015

Contents

1 Introduction
  1.1 Survival and Time-to-Event Data
  1.2 Continuous Versus Discrete Survival
  1.3 Overview
  1.4 Examples
2 The Life Table
  2.1 Life Table Estimates
    2.1.1 Distributional Aspects
    2.1.2 Smooth Life Table Estimators
    2.1.3 Heterogeneous Intervals
  2.2 Kaplan-Meier Estimator
  2.3 Life Tables in Demography
  2.4 Literature and Further Reading
  2.5 Software
  2.6 Exercises
3 Basic Regression Models
  3.1 The Discrete Hazard Function
  3.2 Parametric Regression Models
    3.2.1 Logistic Discrete Hazards: The Proportional Continuation Ratio Model
    3.2.2 Alternative Models
  3.3 Discrete and Continuous Hazards
    3.3.1 Concepts for Continuous Time
    3.3.2 The Proportional Hazards Model
  3.4 Estimation
    3.4.1 Standard Errors
  3.5 Time-Varying Covariates
  3.6 Continuous Versus Discrete Proportional Hazards
  3.7 Subject-Specific Interval Censoring
  3.8 Literature and Further Reading
  3.9 Software
  3.10 Exercises
4 Evaluation and Model Choice
  4.1 Relevance of Predictors: Tests
  4.2 Residuals and Goodness-of-Fit
    4.2.1 No Censoring
    4.2.2 Deviance in the Case of Censoring
    4.2.3 Martingale Residuals
  4.3 Measuring Predictive Performance
    4.3.1 Predictive Deviance and R² Coefficients
    4.3.2 Prediction Error Curves
    4.3.3 Discrimination Measures
  4.4 Choice of Link Function and Flexible Links
    4.4.1 Families of Response Functions
    4.4.2 Nonparametric Estimation of Link Functions
  4.5 Literature and Further Reading
  4.6 Software
  4.7 Exercises
5 Nonparametric Modeling and Smooth Effects
  5.1 Smooth Baseline Hazard
    5.1.1 Estimation
    5.1.2 Smooth Life Table Estimates
  5.2 Additive Models
  5.3 Time-Varying Coefficients
    5.3.1 Penalty for Smooth Time-Varying Effects and Selection
    5.3.2 Time-Varying Effects and Additive Models
  5.4 Inclusion of Calendar Time
  5.5 Literature and Further Reading
  5.6 Software
  5.7 Exercises
6 Tree-Based Approaches
  6.1 Recursive Partitioning
  6.2 Recursive Partitioning Based on Covariate-Free Discrete Hazard Models
  6.3 Recursive Partitioning with Binary Outcome
  6.4 Ensemble Methods
    6.4.1 Bagging
    6.4.2 Random Forests
  6.5 Literature and Further Reading
  6.6 Software
  6.7 Exercises
7 High-Dimensional Models: Structuring and Selection of Predictors
  7.1 Penalized Likelihood Approaches
  7.2 Boosting
    7.2.1 Generic Boosting Algorithm for Arbitrary Outcomes
    7.2.2 Application to Discrete Hazard Models
  7.3 Extension to Additive Predictors
  7.4 Literature and Further Reading
  7.5 Software
  7.6 Exercises
8 Competing Risks Models
  8.1 Parametric Models
    8.1.1 Multinomial Logit Model
    8.1.2 Ordered Target Events
    8.1.3 General Form
    8.1.4 Separate Modeling of Single Targets
  8.2 Maximum Likelihood Estimation
  8.3 Variable Selection
  8.4 Literature and Further Reading
  8.5 Software
  8.6 Exercises
9 Frailty Models and Heterogeneity
  9.1 Discrete Hazard Frailty Model
    9.1.1 Individual and Population Level Hazard
    9.1.2 Basic Frailty Model Including Covariates
    9.1.3 Modeling with Frailties
  9.2 Estimation of Frailty Models
  9.3 Extensions to Additive Models Including Frailty
  9.4 Variable Selection in Frailty Models
  9.5 Fixed-Effects Model
  9.6 Finite Mixture Models
    9.6.1 Extensions to Covariate-Dependent Mixture Probabilities
    9.6.2 The Cure Model
    9.6.3 Estimation for Finite Mixtures
  9.7 Sequential Models in Item Response Theory
  9.8 Literature and Further Reading
  9.9 Software
  9.10 Exercises
10 Multiple-Spell Analysis
  10.1 Multiple Spells
    10.1.1 Estimation
  10.2 Multiple Spells as Repeated Measurements
  10.3 Generalized Estimation Approach to Repeated Measurements
  10.4 Literature and Further Reading
  10.5 Software
  10.6 Exercises
References
List of Examples
Subject Index
Author Index