Beyond MCMC in fitting complex Bayesian models: The INLA method


Valeska Andreozzi, Centre of Statistics and Applications of Lisbon University (valeska.andreozzi at fc.ul.pt). European Congress of Epidemiology, Porto, Portugal, September 2012.

Summary: This talk focuses on an alternative method for performing Bayesian inference in complex models, namely the integrated nested Laplace approximation (INLA). The technique is illustrated with a generalized linear/additive mixed model applied to data from a cohort study. An R interface to the INLA program (the r-inla package) is demonstrated.

Outline:
- Describe statistical modelling
- Give an illustrative example (a cohort study)
- Describe Bayesian inference
- Summarize MCMC methods
- Present a brief introduction to INLA
- Illustrate the software available to implement INLA (the r-inla package)
- Provide some references with more details about INLA
- Concluding remarks

Complex models in health: By nature, health is a complex system. Applied statisticians increasingly face model complexity when trying to represent health problems appropriately.

Statistical modelling: Set a hypothesis; collect data ($y$); posit a trend plus error; specify a statistical model for the data, that is, a joint probability distribution $f(y \mid \theta)$.

Statistical modelling: The joint probability distribution $f(y \mid \theta)$ for the data depends on the values of the unknown parameters $\theta$. In practice, when $f(y \mid \theta)$ is viewed as a function of $\theta$ rather than of $y$, it is known as the likelihood of the data. The likelihood under the proposed model represents how likely it is to observe the data $y$ we have collected, given specific values of the parameters $\theta$.
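To make the distinction concrete, here is a minimal R sketch (with a hypothetical toy sample) that evaluates the same normal density as a function of theta, that is, as a likelihood:

y <- c(4.2, 5.1, 4.8)                       # hypothetical toy observations
loglik <- function(theta) sum(dnorm(y, mean = theta, sd = 1, log = TRUE))
theta_grid <- seq(3, 7, by = 0.1)           # candidate parameter values
ll <- sapply(theta_grid, loglik)            # log-likelihood over the grid
theta_grid[which.max(ll)]                   # the theta most compatible with y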

Illustrative example: Let's introduce an example to illustrate the methods that will be discussed. The goal is to investigate the rate of infant weight gain, using an observational cohort study of infants attending the Public Health System of the city of Rio de Janeiro. Project: Quality assessment of care for infants under six months of age at the Public Health System of the city of Rio de Janeiro. Coordinator: Doctor Maria do Carmo Leal (ENSP/Fiocruz/Brazil).

Infant weight gain in Rio de Janeiro, data: The data consist of repeated measures of infant weight collected from the child health handbook. [Figure: weight (kg) plotted against age, 0 to 25 months.] Covariates available: infant characteristics (age, birth weight, gender, type of breastfeeding, whether attending nursery school, whether the child has been hospitalized) and mother characteristics (age, level of education, marital status, number of pregnancies, number of people in the household, number of children under 5 years in the household, Kotelchuck index).

Infant weight gain in Rio de Janeiro, model formulation: The simplest model (likelihood $f(y \mid \theta)$) for the data $y_i$ is the normal distribution. To take into account the longitudinal structure of the data (within-child correlation among the repeated measures of weight), random effects are included. The resulting model for infant weight gain can be expressed as
$$y_i = X_i \beta + Z_i \omega_i + \epsilon_i$$

Infant weight gain in Rio de Janeiro, model formulation (cont.): In a hierarchical model formulation,
$$y_i \sim N(\mu_i, \sigma^{-1}), \qquad \mu_i = X_i \beta + Z_i \omega_i, \qquad \omega_i \sim N(0, D^{-1})$$
Using the previous notation, the data $y$ are assumed to have a normal likelihood $f(y \mid \theta)$, and the vector of unknown parameters is $\theta = (\beta, \sigma, D)$.
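A short simulation sketch of this random-intercept model may help fix ideas (all names and values are hypothetical, not the study data):

set.seed(1)
n_child <- 50; n_obs <- 6
id    <- rep(seq_len(n_child), each = n_obs)        # child identifier
age   <- rep(seq(0, 25, length.out = n_obs), times = n_child)
omega <- rnorm(n_child, 0, 1)                       # omega_i ~ N(0, D^{-1}), here D = 1
beta  <- c(3, 0.25)                                 # beta_0 (kg) and a linear age effect
y     <- beta[1] + beta[2] * age + omega[id] +      # X beta + Z omega
         rnorm(length(id), 0, 0.5)                  # within-child error epsilon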

Model inference, maximum likelihood estimation (MLE): To find good estimates of $\theta$, one can choose the value $\hat\theta$ that maximizes the likelihood $f(y \mid \theta)$, the MLE. Maximum likelihood methods are fine, but if the form of the likelihood $f(y \mid \theta)$ is complex and/or the number of individual parameters involved is large, the approach may prove very difficult or even infeasible to implement.
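For this particular model, ML estimation is in fact tractable. The talk does not use lme4, but as a hedged illustration, the simulated data above could be fitted by ML as follows:

library(lme4)
d <- data.frame(y = y, age = age, id = factor(id))  # from the sketch above
fit_ml <- lmer(y ~ age + (1 | id), data = d, REML = FALSE)  # ML, not REML
fixef(fit_ml)     # estimates of beta
VarCorr(fit_ml)   # variance components, the inverses of the precisions sigma and D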

Bayesian inference: In this scenario, Bayesian inference is an appealing approach. Start by reviewing Bayes' theorem. The posterior probability distribution of the parameters given the observed data is
$$P(\theta \mid y) = \frac{P(\theta) f(y \mid \theta)}{f(y)} = \frac{P(\theta) f(y \mid \theta)}{\int_\theta P(\theta) f(y \mid \theta)\, d\theta}$$
This result shows that:

Posterior ∝ Prior × Likelihood. The prior probability distribution $P(\theta)$ expresses the uncertainty about $\theta$ before taking the data into account; usually it is chosen to be non-informative. The posterior probability distribution $P(\theta \mid y)$ expresses the uncertainty about $\theta$ after observing the data. So in Bayesian inference all parameter information comes from the posterior distribution. For example, the posterior mean is
$$\hat\theta = E(\theta \mid y) = \int_\theta \theta\, P(\theta \mid y)\, d\theta = \frac{\int_\theta \theta\, f(y \mid \theta) P(\theta)\, d\theta}{\int_\theta f(y \mid \theta) P(\theta)\, d\theta}$$
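For a single parameter this integral can be evaluated by brute force on a grid; a toy sketch (prior and data are hypothetical):

y     <- c(4.2, 5.1, 4.8)                          # toy data, y_i ~ N(theta, 1)
grid  <- seq(-5, 15, by = 0.01)                    # grid of theta values
prior <- dnorm(grid, 0, 10)                        # vague N(0, 10^2) prior
lik   <- sapply(grid, function(th) prod(dnorm(y, th, 1)))
post  <- prior * lik / sum(prior * lik * 0.01)     # normalize so it integrates to 1
sum(grid * post * 0.01)                            # E(theta | y), close to mean(y)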

Computing the posterior distribution: Until recently, model solutions were very hard to obtain because the integrals defining the posterior distribution cannot, in general, be evaluated analytically. Solution: draw sample values from the posterior distribution and approximate any characteristic of it by the corresponding characteristic of these samples. But how?

MCMC: Use Monte Carlo integration with Markov chains, i.e. Markov chain Monte Carlo (MCMC). The algorithm constructs a Markov chain whose stationary distribution is identical to the posterior and uses values from the chain, after a sufficiently long burn-in, as simulated samples from the posterior. An attractive way to implement an MCMC algorithm is Gibbs sampling.

Gibbs sampling: Suppose the set of full conditional distributions
$$[\theta_1 \mid \theta_2, \ldots, \theta_p, \text{data}], \quad [\theta_2 \mid \theta_1, \theta_3, \ldots, \theta_p, \text{data}], \quad \ldots, \quad [\theta_p \mid \theta_1, \ldots, \theta_{p-1}, \text{data}]$$
The idea behind Gibbs sampling is that we can set up a Markov chain simulation algorithm for the joint posterior distribution by successively simulating individual parameters from this set of $p$ conditional distributions. Under general conditions, draws from the simulation algorithm will converge to the target distribution of interest (the joint posterior distribution of $\theta$).
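A minimal, self-contained Gibbs sampler for a normal model with unknown mean mu and precision tau (the conjugate priors below are illustrative assumptions, not from the talk): mu ~ N(0, 1/tau0), tau ~ Gamma(a, b).

set.seed(2)
y <- rnorm(100, mean = 5, sd = 2)                  # toy data
n <- length(y); tau0 <- 0.001; a <- 0.01; b <- 0.01
n_iter <- 5000
mu <- numeric(n_iter); tau <- numeric(n_iter)
mu[1] <- mean(y); tau[1] <- 1 / var(y)             # crude starting values
for (t in 2:n_iter) {
  v      <- 1 / (tau0 + n * tau[t - 1])            # [mu | tau, y] is normal
  mu[t]  <- rnorm(1, v * tau[t - 1] * sum(y), sqrt(v))
  tau[t] <- rgamma(1, a + n / 2,                   # [tau | mu, y] is gamma
                   b + sum((y - mu[t])^2) / 2)
}
mean(mu[-(1:1000)])   # posterior mean of mu after discarding a burn-in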

MCMC estimation: Although MCMC is very flexible for programming complex models, one has to monitor the performance of the algorithm to decide, after a (how long?) run, whether the simulated sample provides a reasonable approximation to the posterior density. Issues to address to ensure good estimates: convergence (a burn-in is required) and mixing (a sufficient number of samples after convergence is required).
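These checks are routinely done with the coda package; a sketch, applied to the mu draws from the Gibbs sampler above:

library(coda)
chain <- mcmc(mu[-(1:1000)], start = 1001)  # drop the first 1000 draws as burn-in
traceplot(chain)                            # visual check of convergence
effectiveSize(chain)                        # mixing: effective number of samples
autocorr.diag(chain)                        # autocorrelation at increasing lags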

MCMC estimation: It can be difficult to construct an MCMC scheme that converges in a reasonable amount of time (it can take hours or even days to deliver correct results), and specifying prior distributions can be difficult too. In practice, the handicap of data analysis using MCMC is the large computational burden. How can this efficiency problem be overcome?

Alternative method: Use an approximation method as an alternative to MCMC, namely the integrated nested Laplace approximation (INLA). It is designed for a class of hierarchical models called latent Gaussian models. Examples where INLA can be applied:
- Generalized linear mixed models
- Generalized additive mixed models
- Spline smoothing models
- Disease mapping (including ecological studies)
- Spatial and spatio-temporal models
- Dynamic generalized linear models
- Survival models

Latent Gaussian models: A latent Gaussian model is a hierarchical model defined in three stages:
1. Observation model: $y_i \mid \theta \sim f(y_i \mid \theta)$
2. Latent Gaussian field (parameter model): $\theta \mid \gamma \sim N(\mu(\gamma), Q(\gamma)^{-1})$
3. Hyperparameter model: $\gamma \sim f(\gamma)$

Latent Gaussian models: The observation $y_i$ belongs to the exponential family, with mean $\mu_i$ linked to a structured additive predictor $\eta_i$ through a link function $g(\mu_i) = \eta_i$, where
$$\eta_i = \beta_0 + \sum_{k=1}^{n_\beta} \beta_k x_{ki} + \sum_{j=1}^{n_h} h^{(j)}(z_{ji}) + \epsilon_i$$
The $\beta_k$ represent the linear effects of covariates $x$. The $h^{(j)}$ are unknown functions of covariates $z$ and can represent non-linear effects, time trends, seasonal effects, random effects, or spatial random effects. The $\epsilon_i$ are unstructured random effects.

Latent Gaussian models: Observation model $y_i \mid \theta \sim f(y_i \mid \theta)$; latent Gaussian field $\theta \mid \gamma \sim N(\mu(\gamma), Q(\gamma)^{-1})$; hyperparameters $\gamma \sim f(\gamma)$. The models are assumed to satisfy two properties (see the sketch after this list):
1. The latent Gaussian field $\theta$, usually of large dimension, has conditional independence structure, so its precision matrix $Q(\gamma)$ is sparse (full of zeros, because many pairs of parameters are conditionally uncorrelated).
2. The dimension of the hyperparameter vector $\gamma$ is small (≤ 6).
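Property 1 can be made tangible. For instance, the structure matrix of the RW2 smoothness prior used later in the talk is banded; a base-R sketch (scaling by the precision parameter is omitted, and the matrix is rank-deficient, as usual for intrinsic models):

n <- 8
D <- matrix(0, n - 2, n)
for (i in 1:(n - 2)) D[i, i:(i + 2)] <- c(1, -2, 1)  # second differences
Q <- crossprod(D)   # RW2 structure matrix: banded, zeros beyond 2 off-diagonals
Q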

INLA: With Gaussian parameters and a sparse precision matrix, a multivariate Gaussian distribution can be assumed in the ordinary scenario (a normally distributed response variable). In this Gaussian setting, inference is an easy task: the posterior distribution of the parameters is easy to calculate. For more complex models (a non-Gaussian response variable, for example), an approximation method for integrating the posterior density has to be used: the Laplace approximation.

INLA: The marginal posterior density of $\theta_i$ is given by
$$P(\theta_i \mid y) = \int_\gamma P(\theta_i \mid \gamma, y)\, P(\gamma \mid y)\, d\gamma$$
and INLA approximates this by
$$\tilde P(\theta_i \mid y) = \sum_k \tilde P(\theta_i \mid \gamma_k, y)\, \tilde P(\gamma_k \mid y)\, \Delta_k$$
where Laplace approximations are used to carry out the integrations required to evaluate $\tilde P$. So no simulation is needed to estimate $\theta$. The INLA output is the set of marginal posterior distributions, which can be summarized by means, variances and quantiles. DIC and predictive measures (CPO, PIT) are also available.
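A toy version of this sum, for one Gaussian latent parameter and one precision hyperparameter, where the Laplace step happens to be exact (an illustrative sketch, not the actual INLA algorithm): $y_i \sim N(\theta, 1/\gamma)$, $\theta \sim N(0, 1)$, $\gamma \sim \text{Gamma}(1, 1)$. For each grid value $\gamma_k$, $P(\theta \mid \gamma_k, y)$ is exactly Gaussian, and $P(\gamma_k \mid y)$ is obtained from the identity $P(\gamma \mid y) \propto P(y \mid \theta, \gamma) P(\theta) P(\gamma) / P(\theta \mid \gamma, y)$, evaluated at the conditional mean.

y <- c(4.2, 5.1, 4.8); n <- length(y)              # toy data again
g_grid <- seq(0.05, 5, by = 0.05)                  # grid over the hyperparameter
w <- cm <- cv <- numeric(length(g_grid))
for (k in seq_along(g_grid)) {
  g     <- g_grid[k]
  cv[k] <- 1 / (1 + n * g)                         # Var(theta | gamma_k, y)
  cm[k] <- cv[k] * g * sum(y)                      # E(theta | gamma_k, y)
  w[k]  <- exp(sum(dnorm(y, cm[k], 1 / sqrt(g), log = TRUE)) +
               dnorm(cm[k], 0, 1, log = TRUE) +
               dgamma(g, 1, 1, log = TRUE) -
               dnorm(cm[k], cm[k], sqrt(cv[k]), log = TRUE))
}
w <- w / sum(w)                                    # weights P(gamma_k | y)
th <- seq(2, 7, by = 0.01)
post_theta <- sapply(th, function(t) sum(w * dnorm(t, cm, sqrt(cv))))
# post_theta is the approximate marginal posterior density of theta on 'th'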

Results: Models fitted for infant weight, $\text{weight}_i = \eta_i = \beta_0 + \sum_{k=1}^{n_\beta} \beta_k x_{ki} + \sum_{j=1}^{n_h} h^{(j)}(z_{ji}) + \epsilon_i$, with total processing time and DIC:

Structured additive predictor $\eta_i$                                       | Total time (s) | DIC
$\beta_0 + \beta_1 age_i + \beta_2 age_i^2 + h^{(1)}(u_i) + \epsilon_i$       |  9.70          | 4647.21
$\beta_0 + \beta_1 age_i + \beta_2 age_i^2 + h^{(1)}(u_i) + \epsilon_i$ + cov | 15.62          | 4631.66
$\beta_0 + h^{(1)}(u_i) + h^{(2)}(age_i) + \epsilon_i$                        | 15.46          | 4641.46
$\beta_0 + h^{(1)}(u_i) + h^{(2)}(age_i) + \epsilon_i$ + cov                  | 19.52          | 4627.81

Here $h^{(1)}(u_i) \sim N(0, \tau_u^{-1})$ with $\tau_u \sim \text{Gamma}(a, b)$, and $h^{(2)}(age_i)$ is a random walk smoothness prior with precision $\tau_2$.

Results, model hyperparameters:

Model / precision                                                      | mean    | sd      | 0.025 quant | 0.975 quant
$\beta_0+\beta_1 age_i+\beta_2 age_i^2+h^{(1)}(u_i)+\epsilon_i$
  within child ($\epsilon$)                                            | 5.54    | 0.16    | 5.23        | 5.87
  intercept ($u$)                                                      | 3.14    | 0.16    | 2.83        | 3.47
$\beta_0+\beta_1 age_i+\beta_2 age_i^2+h^{(1)}(u_i)+\epsilon_i$ + cov
  within child ($\epsilon$)                                            | 5.55    | 0.16    | 5.24        | 5.87
  intercept ($u$)                                                      | 3.55    | 0.19    | 3.19        | 3.95
$\beta_0+h^{(1)}(u_i)+h^{(2)}(age_i)+\epsilon_i$
  within child ($\epsilon$)                                            | 5.56    | 0.16    | 5.25        | 5.89
  intercept ($u$)                                                      | 3.14    | 0.16    | 2.82        | 3.47
  age                                                                  | 3710.58 | 2207.72 | 949.28      | 9313.81
$\beta_0+h^{(1)}(u_i)+h^{(2)}(age_i)+\epsilon_i$ + cov
  within child ($\epsilon$)                                            | 5.56    | 0.17    | 5.21        | 5.88
  intercept ($u$)                                                      | 3.57    | 0.20    | 3.15        | 3.95
  age                                                                  | 3756.31 | 2187.72 | 909.55      | 9182.95

Results: Posterior density estimates of the hyperparameters. [Figure: posterior densities of the precision for the Gaussian observations, the precision for id (child), and the precision for age.]

Results: Posterior estimate of the non-linear effect of age from the model adjusted for mother and infant covariates. [Figure: posterior mean with 0.025, 0.5 and 0.975 quantiles of the age effect over 0 to 25 months.]

Results: Posterior density estimates of the covariate effects. [Figure: one panel per coefficient: (Intercept), genderfem, education, kcindex (inadequado, intermediário, adequado, mais que adequado), parity2+, nbornalive, seindex, nhouse, n5house, blength, feedingnomore, feedingsupplemented, nurseryschoolsim, hospitalsim; each panel is annotated with its posterior mean and SD.]

INLA in R:

> library(INLA)
> model <- inla(y ~ x1 + x2 + f(age, model = "rw2") + f(id, model = "iid"),
+               data = dataframe, family = "gaussian",
+               control.compute = list(dic = TRUE, cpo = TRUE))
> summary(model)
> plot(model)
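Once the model object exists, the marginal summaries and the model-choice measures mentioned earlier can be inspected directly; a sketch using fields documented for r-inla model objects:

> round(model$summary.fixed, 3)   # posterior mean, sd and quantiles of the betas
> model$summary.hyperpar          # precisions: Gaussian observations, id, age
> model$dic$dic                   # deviance information criterion
> hist(model$cpo$pit)             # PIT values for predictive checking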

www.r-inla.org

Concluding remarks: [Figure: number of PubMed articles matching "Integrated Nested Laplace Approximation(s)", by year, 2010 to 2012.] Explore it!

Concluding remarks: The main aim of this presentation was twofold: to highlight that MCMC sampling is not a simple tool for routine analysis, due to convergence and computational-time issues; and to show that applied research can benefit, especially computationally, when an alternative method such as INLA is adopted for Bayesian inference.

Concluding remarks: The lessons are: at the end of the day, there is no free lunch. INLA does not replace MCMC; it is designed for a specific class of models, albeit a very flexible one. It complements MCMC and makes the (Bayesian) statistical analysis of health problems more straightforward. And beyond any doubt: "If you want r-inla to have a particular feature, observation model or prior model, you need to ask us!" (Simpson D., Lindgren F., Rue H.)

Selected references:
Rue H., Martino S. and Chopin N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B, 71, 319-392.
Fong Y., Rue H. and Wakefield J. (2010). Bayesian inference for generalized linear mixed models. Biostatistics, 11, 397-412. (R code and supplementary material)
Simpson D., Lindgren F. and Rue H. (2011). Fast approximate inference with INLA: the past, the present and the future. Technical report at arxiv.org.
Held L., Schrödle B. and Rue H. (2009). Posterior and cross-validatory predictive checks: a comparison of MCMC and INLA. In: Statistical Modelling and Regression Structures (eds Tutz G. and Kneib T.). Physica-Verlag, Heidelberg.
Martino S. and Rue H. (2010). Case studies in Bayesian computation using INLA. To appear in: Complex Data Modeling and Computationally Intensive Statistical Methods. (R code)
A complete list can be found at www.r-inla.org/papers

This research has been partially supported by National Funds through FCT (Fundação para a Ciência e a Tecnologia), projects PTDC/MAT/118335/2010 and PEst-OE/MAT/UI0006/2011. Thank you very much for your attention!