Analysing geoadditive regression data: a mixed model approach

Similar documents
A general mixed model approach for spatio-temporal regression data

Modelling geoadditive survival data

mboost - Componentwise Boosting for Generalised Regression Models

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Variable Selection and Model Choice in Survival Models with Time-Varying Effects

Beyond Mean Regression

Regularization in Cox Frailty Models

Estimating Timber Volume using Airborne Laser Scanning Data based on Bayesian Methods J. Breidenbach 1 and E. Kublin 2

Aggregated cancer incidence data: spatial models

Working Papers in Economics and Statistics

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Structured Additive Regression Models: An R Interface to BayesX

Modelling spatial patterns of distribution and abundance of mussel seed using Structured Additive Regression models

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Geoadditive Latent Variable Modelling of Child Morbidity and Malnutrition in Nigeria

Multivariate Survival Analysis

Lecture 5 Models and methods for recurrent event data

Bayesian linear regression

Bayesian non-parametric model to longitudinally predict churn

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Modeling Real Estate Data using Quantile Regression

LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape

for function values or parameters, which are neighbours in the domain of a metrical covariate or a time scale, or in space. These concepts have been u

Frailty Modeling for clustered survival data: a simulation study

Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Variable Selection and Model Choice in Structured Survival Models

Longitudinal breast density as a marker of breast cancer risk

Gibbs Sampling in Latent Variable Models #1

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Semiparametric Generalized Linear Models

A short introduction to INLA and R-INLA

Beyond MCMC in fitting complex Bayesian models: The INLA method

STAT 518 Intro Student Presentation

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Estimating prediction error in mixed models

Conditional Transformation Models

Statistical Inference and Methods

Disease mapping with Gaussian processes

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

CIMAT Taller de Modelos de Capture y Recaptura Known Fate Survival Analysis

Efficient adaptive covariate modelling for extremes

PENALIZED STRUCTURED ADDITIVE REGRESSION FOR SPACE-TIME DATA: A BAYESIAN PERSPECTIVE

Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i,

Gaussian Process Regression Model in Spatial Logistic Regression

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Spatially Adaptive Smoothing Splines

Assessment of time varying long term effects of therapies and prognostic factors

Analysis of competing risks data and simulation of data following predened subdistribution hazards

Stat 5101 Lecture Notes

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Luke B Smith and Brian J Reich North Carolina State University May 21, 2013

Flexible Regression Modeling using Bayesian Nonparametric Mixtures

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

[Part 2] Model Development for the Prediction of Survival Times using Longitudinal Measurements

Markov Chain Monte Carlo methods

Introduction to Statistical Analysis

STA 216, GLM, Lecture 16. October 29, 2007

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Regression models for multivariate ordered responses via the Plackett distribution

Joint Modeling of Longitudinal Item Response Data and Survival

A Fully Nonparametric Modeling Approach to. BNP Binary Regression

Part IV Extensions: Competing Risks Endpoints and Non-Parametric AUC(t) Estimation

Lecture 3. Truncation, length-bias and prevalence sampling

Bayesian Linear Regression

Extreme Value Analysis and Spatial Extremes

Fahrmeir: Recent Advances in Semiparametric Bayesian Function Estimation

CASE STUDY: Bayesian Incidence Analyses from Cross-Sectional Data with Multiple Markers of Disease Severity. Outline:

P -spline ANOVA-type interaction models for spatio-temporal smoothing

Variable Selection for Generalized Additive Mixed Models by Likelihood-based Boosting

Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data

A spatio-temporal model for extreme precipitation simulated by a climate model

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

11 Survival Analysis and Empirical Likelihood

CTDL-Positive Stable Frailty Model

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

Experimental Design and Data Analysis for Biologists

Fahrmeir: Discrete failure time models

Additive Isotonic Regression

Bayes methods for categorical data. April 25, 2017

Spatial point processes in the modern world an

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

12 - Nonparametric Density Estimation

NORGES TEKNISK-NATURVITENSKAPELIGE UNIVERSITET

R-INLA. Sam Clifford Su-Yun Kang Jeff Hsieh. 30 August Bayesian Research and Analysis Group 1 / 14

Continuous Time Survival in Latent Variable Models

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

Part 8: GLMs and Hierarchical LMs and GLMs

Bayesian Semiparametric Methods for Joint Modeling Longitudinal and Survival Data

A Framework for Daily Spatio-Temporal Stochastic Weather Simulation

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Hierarchical Modeling for Multivariate Spatial Data

Multistate Modeling and Applications

Package threg. August 10, 2015

Chapter 4 - Fundamentals of spatial processes Lecture notes

Transcription:

Analysing geoadditive regression data: a mixed model approach Institut für Statistik, Ludwig-Maximilians-Universität München Joint work with Ludwig Fahrmeir & Stefan Lang 25.11.2005

Spatio-temporal regression data Spatio-temporal regression data Regression in a general sense: Generalised linear models, Multivariate (categorical) generalised linear models, Regression models for survival times (Cox-type models, AFT models). Common structure: Model a quantity of interest in terms of categorical and continuous covariates, e.g. or E(y u) = h(u γ) λ(t u) = λ 0 (t) exp(u γ) (GLM) (Cox model) Spatio-temporal data: Temporal and spatial information as additional covariates. Analysing geoadditive regression data: a mixed model approach 1

Spatio-temporal regression models should allow Spatio-temporal regression data to account for spatial and temporal correlations, for time- and space-varying effects, for non-linear effects of continuous covariates, for flexible interactions, to account for unobserved heterogeneity. Analysing geoadditive regression data: a mixed model approach 2

Example I: Forest health data Example I: Forest health data Yearly forest health inventories carried out from 1983 to 2004. 83 beeches within a 15 km times 10 km area. Response: defoliation degree of beech i in year t, measured in three ordered categories: y it = 1 no defoliation, y it = 2 defoliation 25% or less, y it = 3 defoliation above 25%. Covariates: t s i a it u it calendar time, site of the beech, age of the tree in years, further (mostly categorical) covariates. Analysing geoadditive regression data: a mixed model approach 3

Example I: Forest health data 0.0 0.2 0.4 0.6 0.8 1.0 no damage medium damage severe damage Empirical time trends. 1985 1990 1995 2000 calendar time Empirical spatial effect. 0 1 Analysing geoadditive regression data: a mixed model approach 4

Cumulative probit model: ) P (y it r) = Φ (θ (r) η it Example I: Forest health data with standard normal cdf Φ, thresholds = θ (0) < θ (1) < θ (2) < θ (3) = and η it = f 1 (t) + f 2 (age it ) + f 3 (t, age it ) + f spat (s i ) + u itγ θ (1) θ (2) η θ (3) θ (1) θ (2) η θ (3) Analysing geoadditive regression data: a mixed model approach 5

Example I: Forest health data Analysing geoadditive regression data: a mixed model approach 6

Example I: Forest health data 6 3 0 3 5 30 55 80 105 130 155 180 205 230 age in years 1.5 1.0 0.5 0.0 2 1 0 1 2 0.5 1.0 1.5 200 150 age in years 100 50 1985 1990 1995 calendar time 2000 1983 1990 1997 2004 calendar time Analysing geoadditive regression data: a mixed model approach 7

Category-specific trends: P (y it r) = Φ [ ] θ (r) f (r) 1 (t) f 2(age it ) f spat (s i ) u itγ Example I: Forest health data More complicated constraints: < θ (1) f (1) 1 (t) < θ(2) f (2) 1 (t) < for all t. time trend 1 time trend 2 2 1 0 1 2 1983 1990 1997 2004 calendar time 2 1 0 1 2 1983 1990 1997 2004 calendar time Analysing geoadditive regression data: a mixed model approach 8

Structured additive regression Structured additive regression General Idea: Replace usual parametric predictor with a flexible semiparametric predictor containing Nonparametric effects of time scales and continuous covariates, Spatial effects, Interaction surfaces, Varying coefficient terms (continuous and spatial effect modifiers), Random intercepts and random slopes. All effects can be cast into one general framework. Analysing geoadditive regression data: a mixed model approach 9

Penalised splines. Structured additive regression Approximate f(x) by a weighted sum of B-spline basis functions. Employ a large number of basis functions to enable flexibility. Penalise differences between parameters of adjacent basis functions to ensure smoothness. 2 1 0 1 2 3 1.5 0 1.5 3 2 1 0 1 2 3 1.5 0 1.5 3 2 1 0 1 2 3 1.5 0 1.5 3 Analysing geoadditive regression data: a mixed model approach 10

Bivariate penalised splines. Structured additive regression Varying coefficient models. Effect of covariate x varies smoothly over the domain of a second covariate z: f(x, z) = x g(z) Spatial effect modifier Geographically weighted regression. Analysing geoadditive regression data: a mixed model approach 11

Spatial effect for regional data: Markov random fields. Structured additive regression Bivariate extension of a first order random walk on the real line. Define appropriate neighbourhoods for the regions. Assume that the expected value of f spat (s) is the average of the function evaluations of adjacent sites. f(t+1) E[f(t) f(t 1),f(t+1)] f(t 1) τ 2 2 t 1 t t+1 Analysing geoadditive regression data: a mixed model approach 12

Structured additive regression Spatial effect for point-referenced data: Stationary Gaussian random fields. Well-known as Kriging in the geostatistics literature. Spatial effect follows a zero mean stationary Gaussian stochastic process. Correlation of two arbitrary sites is defined by an intrinsic correlation function. Can be interpreted as a basis function approach with radial basis functions. Analysing geoadditive regression data: a mixed model approach 13

Mixed model based inference Mixed model based inference Each term in the predictor is associated with a vector of regression coefficients with multivariate Gaussian prior / random effects distribution: p(ξ j τ 2 j ) exp ( 1 2τ 2 j ξ jk j ξ j ) K j is a penalty matrix, τ 2 j a smoothing parameter. In most cases K j is rank-deficient. Reparametrise the model to obtain a mixed model with proper distributions. Analysing geoadditive regression data: a mixed model approach 14

Decompose Mixed model based inference where ξ j = X j β j + Z j b j, p(β j ) const and b j N(0, τ 2 j I). β j is a fixed effect and b j is an i.i.d. random effect. This yields the variance components model where in turn η = x β + z b, p(β) const and b N(0, Q). Analysing geoadditive regression data: a mixed model approach 15

Mixed model based inference Obtain empirical Bayes estimates / penalised likelihood estimates via iterating Penalised maximum likelihood for the regression coefficients β and b. Restricted Maximum / Marginal likelihood for the variance parameters in Q: L(Q) = L(β, b, Q)p(b)dβdb max Q. Analysing geoadditive regression data: a mixed model approach 16

Software Software Implemented in the software package BayesX. Available from http://www.stat.uni-muenchen.de/~bayesx Analysing geoadditive regression data: a mixed model approach 17

Childhood mortality in Nigeria Childhood mortality in Nigeria Data from the 2003 Demographic and Health Survey (DHS) in Nigeria. Retrospective questionnaire on the health status of women in reproductive age and their children. Survival time of n = 5323 children. Numerous covariates including spatial information. Analysis based on the Cox model: λ(t; u) = λ 0 (t) exp(u γ). Analysing geoadditive regression data: a mixed model approach 18

Limitations of the classical Cox model: Childhood mortality in Nigeria Restricted to right censored observations. Post-estimation of the baseline hazard. Proportional hazards assumption. Parametric form of the predictor. No spatial correlations. Geoadditive hazard regression. Analysing geoadditive regression data: a mixed model approach 19

Interval censored survival times Interval censored survival times In theory, survival times should be available in days. Retrospective questionnaire most uncensored survival times are rounded (Heaping). 0 50 100 150 200 250 300 0 6 12 18 24 30 36 42 48 54 In contrast: censoring times are given in days. Treat survival times as interval censored. Analysing geoadditive regression data: a mixed model approach 20

Interval censored survival times interval censored right censored uncensored 0 C T lower T T upper Analysing geoadditive regression data: a mixed model approach 21

Likelihood contributions: Interval censored survival times P (T > C) = S(C) [ = exp C 0 λ(t)dt ]. P (T [T lower, T upper ]) = S(T lower ) S(T upper ) [ ] = exp Tlower 0 λ(t)dt [ exp Tupper 0 λ(t)dt ]. Derivatives of the log-likelihood become much more complicated for interval censored survival times. Numerical integration techniques have to be used in both cases. Piecewise constant time-varying covariates and left truncation can easily be included. Analysing geoadditive regression data: a mixed model approach 22

Interval censored survival times Spatial effect without covariates. 0.6 0 0.5 Spatial effect including covariates. 0.05 0 0.05 Analysing geoadditive regression data: a mixed model approach 23

Interval censored survival times 2 1 0 1 2 15 25 35 45 age of the mother at birth Body mass index of the mother. Age of the mother at birth. 2 1 0 1 2 10 20 30 40 50 body mass index of the mother Analysing geoadditive regression data: a mixed model approach 24

Interval censored survival times log(baseline) 50 40 30 20 10 without interval censoring with interval censoring 0 500 1000 1500 survival time in days Analysing geoadditive regression data: a mixed model approach 25

Discussion Discussion Empirical Bayesian treatment of complex geoadditive regression models: Based on mixed model representation. Applicable for a wide range of regression models. Does not rely on MCMC simulation techniques. No questions on convergence and mixing of Markov chains, no hyperpriors. Closely related to penalised likelihood estimation in a frequentist setting. Future work: Extended modelling for categorical responses, e.g. based on correlated latent utilities. Multi state models. Interval censoring for multi state models. Analysing geoadditive regression data: a mixed model approach 26

References References Fahrmeir, L., Kneib, T. & Lang, S. (2004): Penalized structured additive regression for space-time data: A Bayesian perspective. Statistica Sinica, 14, 715-745. Kneib, T. & Fahrmeir, L. (2005): Structured additive regression for categorical space-time data: A mixed model approach. Biometrics, to appear. Kneib, T. (2005): Geoadditive hazard regression for interval censored survival times. SFB 386 Discussion Paper 447, University of Munich. Software and preprints: http://www.stat.uni-muenchen.de/~kneib Analysing geoadditive regression data: a mixed model approach 27