Variable Selection and Model Choice in Survival Models with Time-Varying Effects


Variable Selection and Model Choice in Survival Models with Time-Varying Effects
Boosting Survival Models

Benjamin Hofner
Department of Medical Informatics, Biometry and Epidemiology (IMBE), Friedrich-Alexander-Universität Erlangen-Nürnberg
benjamin.hofner@imbe.med.uni-erlangen.de

joint work with Thomas Kneib and Torsten Hothorn
Department of Statistics, Ludwig-Maximilians-Universität München

useR! 2008

Introduction

Cox PH model:

  λ_i(t) = λ(t, x_i) = λ_0(t) exp(x_i'β)

with
  λ_i(t)  hazard rate of observation i [i = 1, ..., n]
  λ_0(t)  baseline hazard rate
  x_i     vector of covariates for observation i [i = 1, ..., n]
  β       vector of regression coefficients

Problem: a restrictive model, not allowing for
  non-proportional hazards (e.g., time-varying effects)
  non-linear effects
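As a concrete starting point, a minimal R sketch (not part of the original slides) of the standard Cox PH fit and a check of the proportional hazards assumption, using the survival package and its bundled lung data:

  ## fit a Cox proportional hazards model
  library(survival)
  fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
  summary(fit)

  ## Schoenfeld-residual test: small p-values hint at non-proportional
  ## hazards, i.e., effects that vary over time
  cox.zph(fit)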

Additive Hazard Regression

Generalisation: additive hazard regression (Kneib & Fahrmeir, 2007)

  λ_i(t) = exp(η_i(t))   with   η_i(t) = Σ_{j=1}^J f_j(x_i(t))

Generic representation of covariate effects f_j(x_i(t)):
  a) linear effects:       f_j(x_i(t)) = f_linear(x_i) = x_i β
  b) smooth effects:       f_j(x_i(t)) = f_smooth(x_i)
  c) time-varying effects: f_j(x_i(t)) = f_smooth(t) x_i

where x_i denotes one element of x_i(t).
Note: c) includes the log-baseline hazard for x_i ≡ 1.

P-Splines

Flexible terms can be represented using P-splines (Eilers & Marx, 1996) with model term (x can be either x_i or t):

  f_j(x) = Σ_{m=1}^M β_jm B_jm(x)   (j = 1, ..., J)

penalty:

  pen_j(β_j) = κ_j β_j' K β_j   in cases b), c)
  pen_j(β_j) = 0                in case a)

K = D'D (i.e., the cross product of the difference matrix D), e.g. for second-order differences

  D = ( 1 -2  1  0 ... )
      ( 0  1 -2  1 ... )
      ( ...            )

κ_j smoothing parameter (larger κ_j ⇒ more penalization ⇒ smoother fit)
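A minimal R sketch of such a P-spline term: a cubic B-spline design matrix with a second-order difference penalty K = D'D. The number of basis functions (M = 20) and the value of κ are arbitrary illustration choices:

  library(splines)

  x <- seq(-1, 1, length.out = 200)
  M <- 20; degree <- 3                       # M basis functions, cubic splines
  knots <- seq(min(x), max(x), length.out = M - degree + 1)
  h <- diff(knots)[1]                        # equidistant knot spacing
  knots <- c(min(knots) - (degree:1) * h, knots, max(knots) + (1:degree) * h)

  B <- splineDesign(knots, x, ord = degree + 1)  # n x M B-spline design matrix
  D <- diff(diag(M), differences = 2)            # 2nd-order difference matrix
  K <- crossprod(D)                              # penalty matrix K = D'D

  ## penalized least-squares fit for a given smoothing parameter kappa
  kappa <- 10
  y <- sin(2 * x) + rnorm(length(x), sd = 0.2)
  beta <- solve(crossprod(B) + kappa * K, crossprod(B, y))
  f_hat <- B %*% beta                            # larger kappa => smoother fit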

Inference

Penalized likelihood criterion (NB: this is the full log-likelihood):

  L_pen(β) = Σ_{i=1}^n [ δ_i η_i(t_i) − ∫_0^{t_i} exp(η_i(t)) dt ] − Σ_j pen_j(β_j)

with
  T_i                   true survival time
  C_i                   censoring time
  t_i = min(T_i, C_i)   observed survival time (right censoring)
  δ_i = 1(T_i ≤ C_i)    indicator for non-censoring

Problem: estimation and, in particular, model choice

CoxflexBoost

Aim: maximization of a (potentially) high-dimensional log-likelihood with different modeling alternatives

Thus, we use:
  an iterative algorithm
  likelihood-based boosting
  component-wise base-learners

Therefore: use one base-learner g_j(·) for each covariate (or each model component) [j ∈ {1, ..., J}]

⇒ component-wise boosting as a means of estimation and variable selection, combined with model choice


CoxflexBoost Algorithm

(i) Initialization: iteration index m := 0.
    Function estimates (for all j ∈ {1, ..., J}): f̂_j^[0](·) ≡ 0
    Offset (MLE for a constant log hazard):

      η̂^[0] ≡ log( Σ_{i=1}^n δ_i / Σ_{i=1}^n t_i )

(ii) Estimation: m := m + 1.
     Fit all (linear/P-spline) base-learners separately by penalized MLE, i.e.,

       ĝ_j = g_j(·; β̂_j),  j ∈ {1, ..., J},  with  β̂_j = argmax_β L_j,pen^[m](β)

     and the penalized log-likelihood (analogously to the above)

       L_j,pen^[m](β) = Σ_{i=1}^n [ δ_i (η̂_i^[m−1](t_i) + g_j(x_i(t_i); β))
                          − ∫_0^{t_i} exp{ η̂_i^[m−1](t̃) + g_j(x_i(t̃); β) } dt̃ ] − pen_j(β),

     i.e., the additive predictor η_i is split into the estimate from the previous iteration, η̂_i^[m−1], and the current base-learner g_j(·; β).

(iii) Selection: choose the base-learner ĝ_{j*} with

        j* = argmax_{j ∈ {1, ..., J}} L_j,unpen^[m](β̂_j)

(iv) Update:
     Function estimates (for all j ∈ {1, ..., J}):

       f̂_j^[m] = f̂_j^[m−1] + ν · ĝ_j   if j = j*
       f̂_j^[m] = f̂_j^[m−1]             otherwise

     Additive predictor (= fit): η̂^[m] = η̂^[m−1] + ν · ĝ_{j*},
     with step-length ν ∈ (0, 1] (here: ν = 0.1)

(v) Stopping rule: continue iterating steps (ii) to (iv) until m = m_stop
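A schematic but runnable R illustration of the component-wise scheme in steps (i)-(v). To keep it short, it uses least-squares base-learners and squared-error loss instead of the penalized survival log-likelihood (an assumption of this sketch); the selection and update logic is the same:

  set.seed(1)
  n <- 200; J <- 5
  X <- matrix(rnorm(n * J), n, J)
  y <- 2 * X[, 1] - 1.5 * X[, 3] + rnorm(n)      # only x_1 and x_3 informative

  nu <- 0.1; m_stop <- 100
  beta_hat <- numeric(J)                 # one linear base-learner per covariate
  eta <- rep(mean(y), n)                 # (i) offset

  for (m in seq_len(m_stop)) {
    res <- y - eta                       # working response, given current fit
    ## (ii) fit each base-learner separately
    coefs <- vapply(seq_len(J),
                    function(j) sum(X[, j] * res) / sum(X[, j]^2), numeric(1))
    ## (iii) select the base-learner that improves the fit most
    rss <- vapply(seq_len(J),
                  function(j) sum((res - X[, j] * coefs[j])^2), numeric(1))
    j_star <- which.min(rss)
    ## (iv) update the selected component with step-length nu
    beta_hat[j_star] <- beta_hat[j_star] + nu * coefs[j_star]
    eta <- eta + nu * X[, j_star] * coefs[j_star]
  }                                      # (v) stop at m = m_stop

  beta_hat                               # the informative covariates dominate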

Some Aspects of CoxflexBoost

Estimation:    full penalized MLE, damped by the step-length ν
Selection:     based on the unpenalized log-likelihood L_j,unpen^[m]
Base-learners: specified by (initial) degrees of freedom df_j

Likelihood-based boosting (in general): see, e.g., Tutz and Binder (2006)
For the above aspects, cf. model-based boosting (Bühlmann & Hothorn, 2007)

Degrees of Freedom

Specifying df is more intuitive than specifying the smoothing parameter κ
Makes flexible terms comparable to other modeling components, e.g., linear effects
Problem: df are not constant over the (boosting) iterations
But simulation studies showed: no big deviation from the initial df_j

[Figure: estimated degrees of freedom for the flexible base-learner bbs(x_3), traced over the boosting iterations (in 200 replicates), with the initially specified degrees of freedom as a dashed line.]
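A minimal sketch of translating an initial df value into a smoothing parameter κ via the trace of the smoother matrix, df(κ) = tr(B (B'B + κK)^(-1) B'); B and K are taken from the P-spline sketch above, and the target df is an arbitrary example:

  df_of_kappa <- function(kappa, B, K) {
    ## effective degrees of freedom = trace of the smoother (hat) matrix
    S <- B %*% solve(crossprod(B) + kappa * K, t(B))
    sum(diag(S))
  }

  kappa_for_df <- function(df_target, B, K, upper = 1e8) {
    ## invert df(kappa) numerically; df decreases monotonically in kappa
    uniroot(function(k) df_of_kappa(k, B, K) - df_target,
            interval = c(1e-8, upper))$root
  }

  kappa_for_df(4, B, K)   # kappa yielding (initial) df = 4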

Model Choice

Recall from the generic representation: f_j(x_i(t)) can be a
  a) linear effect:       f_j(x_i(t)) = f_linear(x_i) = x_i β
  b) smooth effect:       f_j(x_i(t)) = f_smooth(x_i)
  c) time-varying effect: f_j(x_i(t)) = f_smooth(t) x_i

We see: x_i can enter the model in 3 different ways. But which one?
⇒ Add all three possibilities as base-learners to the model.
⇒ Boosting can then choose between the possibilities.
But the df must be comparable! Otherwise, more flexible base-learners are preferred.


For higher-order differences (d ≥ 2): df > 1 as κ → ∞, since a polynomial of order d − 1 remains unpenalized.

Solution: decomposition (based on Kneib, Hothorn, & Tutz, 2008)

  g(x) = β_0 + β_1 x + ... + β_{d−1} x^{d−1}  +  g_centered(x)
         [unpenalized, parametric part]          [deviation from the polynomial]

Add the unpenalized part as separate, parametric base-learners.
Assign df = 1 to the centered effect (and add it as a P-spline base-learner).
Proceed analogously for time-varying effects.

Technical realization (see Fahrmeir, Kneib, & Lang, 2004): decompose the vector of regression coefficients β into (β_unpen, β_pen), utilizing a spectral decomposition of the penalty matrix.
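A minimal R sketch of this reparameterization via the spectral decomposition of the penalty matrix (K again taken from the P-spline sketch above); the tolerance for identifying the null space is an ad-hoc choice:

  eig <- eigen(K, symmetric = TRUE)
  tol <- 1e-8
  pen   <- eig$values > tol          # penalized directions
  unpen <- !pen                      # null space: polynomial of order d - 1

  ## transformed design matrices: the unpenalized, parametric part and a
  ## penalized part whose coefficients carry an identity (ridge) penalty
  B_unpen <- B %*% eig$vectors[, unpen, drop = FALSE]
  B_pen   <- B %*% eig$vectors[, pen, drop = FALSE] %*%
             diag(1 / sqrt(eig$values[pen]))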

Early Stopping

1. Run the algorithm m_stop times (previously defined).
2. Determine a new m_stop,opt ≤ m_stop
   ... based on an out-of-bag sample (easy to use in simulations)
   ... based on an information criterion, e.g., AIC

This prevents the algorithm from stopping in a local maximum (of the log-likelihood), and early stopping prevents overfitting (see the sketch below).
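A minimal sketch of early stopping on an out-of-bag sample, continuing the least-squares illustration above (it reuses n, J, X, y, nu, and m_stop from that sketch):

  set.seed(2)
  oob <- sample(n, n %/% 4)                    # hold out 25% as out-of-bag sample
  eta_tr  <- rep(mean(y[-oob]), n - length(oob))
  eta_oob <- rep(mean(y[-oob]), length(oob))
  risk <- numeric(m_stop)

  for (m in seq_len(m_stop)) {
    res <- y[-oob] - eta_tr
    coefs <- vapply(seq_len(J), function(j)
      sum(X[-oob, j] * res) / sum(X[-oob, j]^2), numeric(1))
    rss <- vapply(seq_len(J), function(j)
      sum((res - X[-oob, j] * coefs[j])^2), numeric(1))
    j_star <- which.min(rss)
    eta_tr  <- eta_tr  + nu * X[-oob, j_star] * coefs[j_star]
    eta_oob <- eta_oob + nu * X[oob,  j_star] * coefs[j_star]
    risk[m] <- mean((y[oob] - eta_oob)^2)      # out-of-bag risk after m steps
  }

  m_stop_opt <- which.min(risk)                # early-stopping iteration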

Variable Selection and Model Choice

... are achieved by the selection of base-learners (in step (iii) of CoxflexBoost), i.e., component-wise boosting, together with early stopping.

Simulation results (in short):
  good variable selection strategy
  good model choice strategy if only linear and smooth effects are used
  selection bias in favor of time-varying base-learners (if present); standardizing time could be a solution
  estimates are better if model choice is performed

Computational Aspects

CoxflexBoost is implemented in R.

Crucial computation: the integral in L_j,pen^[m](β),

  ∫_0^{t_i} exp{ η̂_i^[m−1](t̃) + g_j(x_i(t̃); β) } dt̃

It is time consuming and evaluated very often (during the maximization of L_j,pen^[m](β)), and the R function integrate() is slow in this context.
⇒ A specialized, vectorized trapezoid integration was implemented: about 100 times quicker (see the sketch below).
Efficient storage of matrices and recycling of results further reduce the computational burden.

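A minimal sketch of such a vectorized trapezoid integration; the log hazard used in the example is made up for illustration:

  trapezoid <- function(grid, values) {
    ## trapezoid rule for a function tabulated at 'grid' with values 'values'
    h <- diff(grid)
    sum(h * (values[-1] + values[-length(values)]) / 2)
  }

  ## example: cumulative hazard up to t_i = 2 for a log hazard eta(t) = -1 + 0.3 t
  grid <- seq(0, 2, length.out = 101)
  trapezoid(grid, exp(-1 + 0.3 * grid))
  ## reference value from integrate()
  integrate(function(t) exp(-1 + 0.3 * t), 0, 2)$value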

Summary & Outlook

CoxflexBoost ...
  ... allows for variable selection and model choice.
  ... allows for flexible modeling:
        flexible, non-linear effects
        time-varying effects (i.e., non-proportional hazards)
  ... provides functions to manipulate and show results (summary(), plot(), subset(), ...)

To be continued ...
  a formula for the AIC (for boosting in survival models)
  inclusion of mandatory covariates (updated in each step)
  a measure of variable importance, based, e.g., on f̂_j^[m_stop](·)


Literature

Bühlmann, P., & Hothorn, T. (2007). Boosting algorithms: Regularization, prediction and model fitting. Statistical Science, 22(4), 477-505.

Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11(2), 89-121.

Fahrmeir, L., Kneib, T., & Lang, S. (2004). Penalized structured additive regression: A Bayesian perspective. Statistica Sinica, 14, 731-761.

Kneib, T., & Fahrmeir, L. (2007). A mixed model approach for geoadditive hazard regression. Scandinavian Journal of Statistics, 34, 207-228.

Kneib, T., Hothorn, T., & Tutz, G. (2008). Variable selection and model choice in geoadditive regression. Biometrics (accepted).

Tutz, G., & Binder, H. (2006). Generalized additive modelling with implicit variable selection by likelihood-based boosting. Biometrics, 62, 961-971.

[Figure: estimated effects on the log(hazard rate) for x_1 and a second covariate, fitted with model choice (left column) and without model choice (right column).]

[Figure: estimated time-varying effects on the log(hazard rate) over time, with and without model choice.]