Modelling geoadditive survival data

Similar documents
A general mixed model approach for spatio-temporal regression data

Analysing geoadditive regression data: a mixed model approach

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Regularization in Cox Frailty Models

LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape

Beyond Mean Regression

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

PENALIZED STRUCTURED ADDITIVE REGRESSION FOR SPACE-TIME DATA: A BAYESIAN PERSPECTIVE

Multivariate Survival Analysis

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Estimating Timber Volume using Airborne Laser Scanning Data based on Bayesian Methods J. Breidenbach 1 and E. Kublin 2

Contents. Part I: Fundamentals of Bayesian Inference 1

Stat 5101 Lecture Notes

Structured Additive Regression Models: An R Interface to BayesX

Aggregated cancer incidence data: spatial models

Modeling Real Estate Data using Quantile Regression

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Disease mapping with Gaussian processes

General Regression Model

Variable Selection and Model Choice in Structured Survival Models

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Semiparametric Regression

Fahrmeir: Discrete failure time models

Scaling intrinsic Gaussian Markov random field priors in spatial modelling

A short introduction to INLA and R-INLA

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

Estimating prediction error in mixed models

Bayesian Methods for Machine Learning

TGDR: An Introduction

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

for function values or parameters, which are neighbours in the domain of a metrical covariate or a time scale, or in space. These concepts have been u

Reduced-rank hazard regression

Longitudinal + Reliability = Joint Modeling

ST745: Survival Analysis: Cox-PH!

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting

Modelling spatial patterns of distribution and abundance of mussel seed using Structured Additive Regression models

Technical Vignette 5: Understanding intrinsic Gaussian Markov random field spatial models, including intrinsic conditional autoregressive models

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Spatially Adaptive Smoothing Splines

Bayesian linear regression

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

Lecture 3. Truncation, length-bias and prevalence sampling

SSUI: Presentation Hints 2 My Perspective Software Examples Reliability Areas that need work

CTDL-Positive Stable Frailty Model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET

Lecture 5 Models and methods for recurrent event data

Philosophy and Features of the mstate package

You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?

Building a Prognostic Biomarker

mboost - Componentwise Boosting for Generalised Regression Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Power and Sample Size Calculations with the Additive Hazards Model

Classification. Chapter Introduction. 6.2 The Bayes classifier

CURE MODEL WITH CURRENT STATUS DATA

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

Disease mapping with Gaussian processes

Subject CS1 Actuarial Statistics 1 Core Principles

Outline. Frailty modelling of Multivariate Survival Data. Clustered survival data. Clustered survival data

Semiparametric Generalized Linear Models

Frailty Modeling for clustered survival data: a simulation study

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions

3 Joint Distributions 71

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

Package SimSCRPiecewise

Bayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

Working Papers in Economics and Statistics

Sparse Linear Models (10/7/13)

Statistical Inference and Methods

Econometric Analysis of Cross Section and Panel Data

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Accelerated Failure Time Models

Index. Regression Models for Time Series Analysis. Benjamin Kedem, Konstantinos Fokianos Copyright John Wiley & Sons, Inc. ISBN.

Approximation of Survival Function by Taylor Series for General Partly Interval Censored Data

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

ECE521 week 3: 23/26 January 2017

Bayesian mixture modeling for spectral density estimation

On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data

Survival Analysis. Stat 526. April 13, 2018

STAT 740: B-splines & Additive Models

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Survival Analysis Math 434 Fall 2011

Outline of GLMs. Definitions

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Problem Set 3: Bootstrap, Quantile Regression and MCMC Methods. MIT , Fall Due: Wednesday, 07 November 2007, 5:00 PM

arxiv: v1 [stat.co] 16 May 2011

Model Selection in Bayesian Survival Analysis for a Multi-country Cluster Randomized Trial

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Selection of Effects in Cox Frailty Models by Regularization Methods

Assessment of time varying long term effects of therapies and prognostic factors

STATISTICS-STAT (STAT)

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

Introduction to BGPhazard

Nonparametric Bayesian Methods - Lecture I

Poisson regression 1/15

Transcription:

Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model based inference 4. Results 5. Software 6. Discussion 22.11.2004

Leukemia survival data Survival time of adults after diagnosis of acute myeloid leukemia. Leukemia survival data 1,043 cases diagnosed between 1982 and 1998 in Northwest England. 16 % (right) censored. Continuous and categorical covariates: age age at diagnosis, wbc white blood cell count at diagnosis, sex sex of the patient, tpi Townsend deprivation index. Spatial information in different resolution. Modelling geoadditive survival data 1

Classical Cox proportional hazards model: Leukemia survival data λ(t; x) = λ 0 (t) exp(x γ). Baseline-hazard λ 0 (t) is a nuisance parameter and remains unspecified. Estimate γ based on the partial likelihood. Questions / Limitations: Estimate the baseline simultaneously with covariate effects. Flexible modelling of covariate effects (e.g. nonlinear effects, interactions). Spatially correlated survival times. Non-proportional hazards models / time-varying effects. Structured hazard regression models. Modelling geoadditive survival data 2

Structured hazard regression Structured hazard regression Replace usual parametric predictor with a flexible semiparametric predictor λ(t; ) = λ 0 (t) exp[f 1 (age) + f 2 (wbc) + f 3 (tpi) + f spat (s i ) + γ 1 sex] and absorb the baseline where λ(t; ) = exp[f 0 (t) + f 1 (age) + f 2 (wbc) + f 3 (tpi) + f spat (s i ) + γ 1 sex] f 0 (t) = log(λ 0 (t)) is the log-baseline-hazard, f 1, f 2, f 3 are nonparametric functions of age, white blood cell count and deprivation, and f spat is a spatial function. Modelling geoadditive survival data 3

Structured hazard regression f 0 (t), f 1 (age), f 2 (wbc), f 3 (tpi): P-splines Approximate f j by a B-spline of a certain degree (basis function approach). Penalize differences between parameters of adjacent basis functions to ensure smoothness. Alternatives: Random walks, more general autoregressive priors. f spat (s): District-level analysis Markov random field approach. Generalization of a first order random walk to two dimensions. Consider two districts as neighbors if they share a common boundary. Assume that the expected value of f spat (s) is the average of the function evaluations of adjacent sites. Modelling geoadditive survival data 4

Structured hazard regression f spat (s): Individual-level analysis Stationary Gaussian random field (kriging). Spatial effect follows a zero mean stationary Gaussian stochastic process. Correlation of two arbitrary sites is defined by an intrinsic correlation function. Low-rank approximations to Gaussian random fields. Extensions Cluster-specific frailties. Surface smoothers based on two-dimensional P-splines. Varying coefficient terms with continuous or spatial effect modifiers. Time-varying effects based on varying coefficient terms with survival time as effect modifier. Structured hazard regression handles all model terms in a unified way. Modelling geoadditive survival data 5

Structured hazard regression Express f j as the product of a design matrix Z j and regression coefficients β j. Rewrite the model in matrix notation as log(λ(t; )) = Z 0 (t)β 0 + Z 1 β 1 + Z 2 β 2 + Z 3 β 3 + Z spat β spat + Uγ. Bayesian approach: Assign an appropriate prior to β j. Frequentist approach: Assume (correlated) random effects distribution for β j. All priors can be cast into the general form ( p(β j τ 2 j ) exp 1 2τ 2 j β jk j β j ) where K j is a penalty matrix and τ 2 j is a smoothing parameter. Type of the covariate and prior beliefs about the smoothness of f j determine special Z j and K j. Modelling geoadditive survival data 6

Mixed model based inference Mixed model based inference Each parameter vector β j can be partitioned into an unpenalized part (with flat prior) and a penalized part (with i.i.d. Gaussian prior), i.e. β j = Z unp j β unp j This yields a variance components model + Z pen j β pen j η = X unp β unp + X pen β pen with and p(β unp ) const β pen N(0, Λ) Λ = blockdiag(τ 2 0 I,..., τ 2 spati). Regression coefficients are estimated via a Newton-Raphson-algorithm. Numerical integration has to be used to evaluate the log-likelihood and its derivatives. Modelling geoadditive survival data 7

Mixed model based inference The variance components representation with proper priors allows for restricted maximum likelihood / marginal likelihood estimation of the variance components: L(Λ) = L(β unp, β pen, Λ)p(β pen )dβ pen dβ unp max Λ. The marginal likelihood can not be derived analytically. Some approximations lead to a simple Fisher-scoring-algorithm. Proved to work well in simulations and applications. We obtain empirical Bayes / posterior mode estimates. Modelling geoadditive survival data 8

Results Results log(baseline) 4 Log-baseline hazard. 1-2 -5 0 7 14 time in years 1.5 effect of age.75 0 -.75 Effect of age at diagnosis. -1.5 14 27 40 53 66 79 92 age in years Modelling geoadditive survival data 9

Results effect of white blood cell count 2 Effect of white blood cell count. 1.5 1.5 0 -.5 0 125 250 375 500 white blood cell count effect of townsend deprivation index.5.25 0 -.25 Effect of deprivation. -.5-6 -2 2 6 10 townsend deprivation index Modelling geoadditive survival data 10

Results -0.31 0 0.24 District-level analysis 0.44 0.3 Individual-level analysis Modelling geoadditive survival data 11

Software Software Estimation was carried out using BayesX, a public domain software package for Bayesian inference. Available from http://www.stat.uni-muenchen.de/~lang/bayesx Modelling geoadditive survival data 12

Software Features (within a mixed model setting): Responses: Gaussian, Gamma, Poisson, Binomial, ordered and unordered multinomial, Cox models. Nonparametric estimation of the log-baseline and time-varying effects based on P-splines. Continuous covariates and time scales: Random Walks, P-splines, autoregressive priors for seasonal components. Spatial Covariates: Markov random fields, stationary Gaussian random fields, two-dimensional P-Splines. Interactions: Two-dimensional P-splines, varying coefficient models with continuous and spatial effect modifiers. Random intercepts and random slopes (frailties). Modelling geoadditive survival data 13

Discussion Discussion Comparison with fully Bayesian approach based on MCMC (Hennerfeind et al., 2003): Cons: Credible intervals rely on asymptotic normality. Only plug-in estimates for functionals. Approximations for marginal likelihood estimation. Pros: No questions concerning mixing and convergence. No sensitivity with respect to prior assumptions on variance parameters. Somewhat better point estimates (in simulations). Numerical integration is required less often. Modelling geoadditive survival data 14

Future work: Discussion More general censoring / truncation schemes. Event history / competing risks models Modelling geoadditive survival data 15

References References Kneib, T. and Fahrmeir, L. (2004): A mixed model approach for structured hazard regression. SFB 386 Discussion Paper 400, University of Munich. Fahrmeir, L., Kneib, T. and Lang, S. (2004): Penalized structured additive regression for space-time data: A Bayesian perspective. Statistica Sinica, 14, 715-745. Available from http://www.stat.uni-muenchen.de/~kneib Modelling geoadditive survival data 16