gamboostLSS: boosting generalized additive models for location, scale and shape

Benjamin Hofner
Joint work with Andreas Mayr, Nora Fenske, Thomas Kneib and Matthias Schmid
Department of Medical Informatics, Biometry and Epidemiology, FAU Erlangen-Nürnberg, Germany
useR! 2011

Motivation: Munich rental guide

Aim: Provide precise point predictions and prediction intervals for the net rent of flats in the city of Munich.
Data: 3016 flats; 325 (mostly) categorical covariates, 2 continuous covariates and 1 spatial covariate.
Problem: Heteroscedasticity found in the data.
Idea: Model not only the expected mean but also the variance: GAMLSS.

The GAMLSS model class

Generalized Additive Models for Location, Scale and Shape:

$$g_1(\mu) = \eta_\mu = \beta_{0\mu} + \sum_{j=1}^{p_1} f_{j\mu}(x_j) \quad \text{(location)}$$
$$g_2(\sigma) = \eta_\sigma = \beta_{0\sigma} + \sum_{j=1}^{p_2} f_{j\sigma}(x_j) \quad \text{(scale)}$$

Introduced by Rigby and Stasinopoulos (2005). A flexible alternative to generalized additive models (GAMs): up to four distribution parameters are regressed on the covariates, and every distribution parameter is modeled by its own predictor and an associated link function $g_k(\cdot)$.

Current fitting algorithm: the gamlss package

Fitting algorithms for a large number of distribution families are provided by the R package gamlss (Stasinopoulos and Rigby, 2007). Estimation is based on a penalized likelihood approach; modified versions of back-fitting (as for conventional GAMs) are used. These algorithms work remarkably well in many applications, but:
- They are not feasible for high-dimensional data (p ≫ n).
- No spatial effects are implemented.
- Variable selection is based on the generalized AIC, which is known to be unstable; "more work needs to be done here" (Stasinopoulos and Rigby, 2007).

Optimization problem for GAMLSS

The task is to model the distribution parameters of the conditional density $f_{\text{dens}}(y \mid \mu, \sigma, \nu, \tau)$. The optimization problem can be formulated as

$$(\hat\mu, \hat\sigma, \hat\nu, \hat\tau) = \operatorname*{argmin}_{\eta_\mu, \eta_\sigma, \eta_\nu, \eta_\tau} \; \mathbb{E}_{Y,X}\left[\rho\big(Y, \eta_\mu(X), \eta_\sigma(X), \eta_\nu(X), \eta_\tau(X)\big)\right]$$

with loss function $\rho = -l$, i.e., the negative log-likelihood of the response distribution:

$$-l = -\sum_{i=1}^n \log f_{\text{dens}}(y_i \mid \theta_i) = -\sum_{i=1}^n \log f_{\text{dens}}(y_i \mid \mu_i, \sigma_i, \nu_i, \tau_i)$$

This corresponds to a maximum likelihood approach.
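To make the loss concrete, here is a minimal sketch (plain R, illustrative only; the function name and the choice of a location-scale t-density, matching the data example later in the talk, are assumptions) of the negative log-likelihood as empirical risk:

## Empirical risk = negative log-likelihood, here for a location-scale
## t-distribution with density f(y) = dt((y - mu)/sigma, df)/sigma.
neg_loglik_t <- function(y, mu, sigma, df) {
    -sum(dt((y - mu) / sigma, df = df, log = TRUE) - log(sigma))
}

Evaluated at the fitted parameter vectors, this is exactly the risk that boosting minimizes iteratively below.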

Alternative to ML: component-wise boosting

Boosting minimizes the empirical risk (e.g., the negative log-likelihood) in an iterative fashion via functional gradient descent (FGD). In boosting iteration m + 1:
- Compute the negative gradient of the loss function and plug in the current estimate:
$$u_i^{[m+1]} = -\left.\frac{\partial \rho(y_i, \eta)}{\partial \eta}\right|_{\eta = \hat\eta_i^{[m]}}$$
- Estimate $u_i^{[m+1]}$ via base-learners (i.e., simple regression models).
- Update: use only the best-fitting base-learner; add a small fraction ν of this estimated base-learner (e.g., 10%) to the model.
Variable selection thus happens intrinsically within the fitting process; a toy implementation is sketched below.
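A self-contained toy version of component-wise boosting with squared-error loss and univariate linear base-learners (without intercepts, on centered covariates); this is a sketch for intuition, not how mboost is implemented:

## Toy component-wise L2-boosting: the negative gradient of the squared-error
## loss is simply the residual vector; base-learners are univariate linear fits.
boost_l2 <- function(y, X, mstop = 100, nu = 0.1) {
    X <- scale(X, center = TRUE, scale = FALSE)  # centered covariates
    fit <- rep(mean(y), length(y))               # offset
    coefs <- numeric(ncol(X))
    for (m in seq_len(mstop)) {
        u <- y - fit                             # negative gradient (L2 loss)
        b <- colSums(X * u) / colSums(X^2)       # least-squares fit per covariate
        rss <- colSums((u - t(t(X) * b))^2)      # RSS of every base-learner
        j <- which.min(rss)                      # best-fitting base-learner
        coefs[j] <- coefs[j] + nu * b[j]         # add a small fraction nu
        fit <- fit + nu * b[j] * X[, j]
    }
    list(coef = coefs, fitted = fit)
}

With early stopping, covariates that are never selected keep a coefficient of exactly zero, which is precisely the intrinsic variable selection mentioned above.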

Boosting for GAMLSS models

Boosting was recently extended to risk functions with multiple components (Schmid et al., 2010).

Idea:
- Use partial derivatives instead of the gradient.
- Specify a set of base-learners, one base-learner per covariate.
- Fit each of the base-learners separately to the partial derivatives.
- Cycle through the partial derivatives within each boosting step:

$$\frac{\partial \rho}{\partial \eta_\mu}(y_i, \hat\mu^{[m]}, \hat\sigma^{[m]}, \hat\nu^{[m]}, \hat\tau^{[m]}) \xrightarrow{\text{update best-fitting BL}} \hat\eta_\mu^{[m+1]} \Rightarrow \hat\mu^{[m+1]},$$
$$\frac{\partial \rho}{\partial \eta_\sigma}(y_i, \hat\mu^{[m+1]}, \hat\sigma^{[m]}, \hat\nu^{[m]}, \hat\tau^{[m]}) \xrightarrow{\text{update best-fitting BL}} \hat\eta_\sigma^{[m+1]} \Rightarrow \hat\sigma^{[m+1]},$$
$$\frac{\partial \rho}{\partial \eta_\nu}(y_i, \hat\mu^{[m+1]}, \hat\sigma^{[m+1]}, \hat\nu^{[m]}, \hat\tau^{[m]}) \xrightarrow{\text{update best-fitting BL}} \hat\eta_\nu^{[m+1]} \Rightarrow \hat\nu^{[m+1]},$$
$$\frac{\partial \rho}{\partial \eta_\tau}(y_i, \hat\mu^{[m+1]}, \hat\sigma^{[m+1]}, \hat\nu^{[m+1]}, \hat\tau^{[m]}) \xrightarrow{\text{update best-fitting BL}} \hat\eta_\tau^{[m+1]} \Rightarrow \hat\tau^{[m+1]}.$$

Note that each partial derivative is evaluated at the already-updated estimates of the parameters that precede it in the cycle; a runnable toy version follows below.
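A toy version of this cyclic scheme for a two-parameter Gaussian GAMLSS (µ with identity link, σ with log link), again with univariate linear base-learners; purely illustrative, gamboostLSS implements the general four-parameter case:

## Cyclic boosting for y ~ N(mu, sigma^2): within each iteration, first update
## eta_mu using the partial derivative w.r.t. eta_mu, then update eta_sigma
## using the partial derivative evaluated at the already-updated eta_mu.
cyclic_boost <- function(y, X, mstop = 100, step = 0.1) {
    X <- scale(X, center = TRUE, scale = FALSE)
    eta_mu <- rep(mean(y), length(y))
    eta_sg <- rep(log(sd(y)), length(y))
    fit_best <- function(u) {                    # best univariate linear BL
        b <- colSums(X * u) / colSums(X^2)
        rss <- colSums((u - t(t(X) * b))^2)
        j <- which.min(rss)
        list(j = j, b = b[j])
    }
    for (m in seq_len(mstop)) {
        u <- (y - eta_mu) / exp(eta_sg)^2        # -d rho / d eta_mu
        bl <- fit_best(u)
        eta_mu <- eta_mu + step * bl$b * X[, bl$j]
        u <- (y - eta_mu)^2 / exp(eta_sg)^2 - 1  # -d rho / d eta_sigma
        bl <- fit_best(u)
        eta_sg <- eta_sg + step * bl$b * X[, bl$j]
    }
    list(mu = eta_mu, sigma = exp(eta_sg))
}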

Variable selection and shrinkage

The main tuning parameters are the stopping iterations m_stop,k. They control variable selection and the amount of shrinkage:
- If boosting is stopped before convergence, only the most important variables are included in the final model. Variables that have never been selected in the update step are excluded.
- Due to the small increments added in the update step, boosting incorporates shrinkage of effect sizes (comparable to the LASSO), leading to more stable predictions.
- For large m_stop,k, boosting converges to the same solution as the original algorithm (in low-dimensional settings).
The selection of m_stop,k is normally based on resampling methods, optimizing the predictive risk (see the sketch below).
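Current versions of the package support this resampling step directly; a sketch (make.grid() and the cvrisk() method for fitted models exist in recent gamboostLSS releases, but the exact arguments shown here are assumptions and may differ from the 2011 version used in this talk):

## Tune the stopping iterations on a multidimensional, logarithmic mstop grid
## via bootstrap resampling of the empirical risk.
grid <- make.grid(max = c(mu = 1000, sigma = 500, df = 500),
                  length.out = 10, log = TRUE)
cvr <- cvrisk(mod, folds = cv(model.weights(mod), type = "bootstrap"),
              grid = grid)
mod[mstop(cvr)]  # set the model to the risk-optimal stopping iterations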

Data example: Munich rental guide

To deal with heteroscedasticity, we chose a three-parameter t-distribution with $\mathbb{E}(y) = \mu$ and $\operatorname{Var}(y) = \sigma^2 \frac{df}{df - 2}$. For each of the parameters µ, σ, and df, we consider the candidate predictors

$$\eta_{\mu_i} = \beta_{0\mu} + x_i^\top \beta_\mu + f_{1,\mu}(\text{size}_i) + f_{2,\mu}(\text{year}_i) + f_{\text{spat},\mu}(s_i),$$
$$\eta_{\sigma_i} = \beta_{0\sigma} + x_i^\top \beta_\sigma + f_{1,\sigma}(\text{size}_i) + f_{2,\sigma}(\text{year}_i) + f_{\text{spat},\sigma}(s_i),$$
$$\eta_{df_i} = \beta_{0df} + x_i^\top \beta_{df} + f_{1,df}(\text{size}_i) + f_{2,df}(\text{year}_i) + f_{\text{spat},df}(s_i).$$

Base-learners: categorical variables: simple linear models; continuous variables: P-splines; spatial variable: Gaussian MRF (Markov random field).
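As a quick sanity check of the variance formula, one can simulate from this location-scale t-distribution (plain R, illustrative):

## For y = mu + sigma * T with T ~ t_df: Var(y) = sigma^2 * df / (df - 2).
set.seed(1)
mu <- 10; sigma <- 2; df <- 5
y <- mu + sigma * rt(1e6, df = df)
c(empirical = var(y), theoretical = sigma^2 * df / (df - 2))  # both approx. 6.67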

Package gamboostLSS

Boosting for GAMLSS models is implemented in the R package gamboostLSS (now available on CRAN). The package relies on the well-tested and mature boosting package mboost, and a lot of the mboost infrastructure is available in gamboostLSS as well (e.g., base-learners and convenience functions).

Now let's start and have a short look at some code!

Package gamboostLSS: installation

> ## Install package mboost (we use the R-Forge version, as the
> ## bmrf base-learner is not yet included in the CRAN version):
> install.packages("mboost",
+                  repos = "http://r-forge.r-project.org")
> ## Install and load package gamboostLSS:
> install.packages("gamboostLSS")
> library("gamboostLSS")

(Simplified) code to fit the model

> ## Load the data first, and load the boundary file for spatial effects.
> ## Now set up the formula:
> form <- paste(names(data)[1], " ~ ",
+     paste(names(data)[-c(1, 327, 328, 329)], collapse = " + "),
+     " + bbs(wfl) + bbs(bamet) + bmrf(region, bnd = bound)")
> form <- as.formula(form)
> form
nmqms ~ erstbezg + dienstwg + gebmeist + gebgruen + hzkohojn + ... +
    bbs(wfl) + bbs(bamet) + bmrf(region, bnd = bound)
> ## Fit the model with (initially) 100 boosting steps
> mod <- gamboostLSS(formula = form, families = StudentTLSS(),
+     control = boost_control(mstop = 100, trace = TRUE),
+     baselearner = bols, data = data)
[  1] ... -- risk: 3294.323
[ 41] ... -- risk: 3091.206
[ 81] ...
Final risk: 3038.919

(Simplified) code to fit the model (ctd.)

> ## The optimal numbers of boosting iterations, found by 3-dimensional
> ## cross-validation on a logarithmic grid, were
> ## 750 (mu), 108 (sigma) and 235 (df) steps;
> ## let the model run until these values:
> mod[c(750, 108, 235)]
>
> ## Let's look at the number of variables per parameter:
> sel <- selected(mod)
> lapply(sel, function(x) length(unique(x)))
$mu
[1] 115
$sigma
[1] 31
$df
[1] 7
> ## (Very) sparse model (only 115, 31 and 7 base-learners out of 328)

(Simplified) code to fit the model (ctd.)

> ## Now we can look at the estimated parameters,
> ## e.g., the effect of a roof terrace on the mean:
> coef(mod, which = "dterasn", parameter = "mu")
$`bols(dterasn)`
 (Intercept)      dterasn
-0.004254606  0.293792997
> ## We can also easily plot the estimated smooth effects:
> plot(mod, which = "bbs(wfl)", parameter = "mu",
+      xlab = "flat size (in square meters)", type = "l")

[Figure: estimated partial effect of flat size (in square meters) on µ]

Results: spatial effects

[Figure: estimated spatial effects for the distribution parameters µ and σ]

Estimated spatial effects obtained for the high-dimensional GAMLSS for the distribution parameters µ and σ. For the third parameter, df, the corresponding variable was not selected.

Results: prediction intervals

[Figure: 95% prediction intervals for GAMLSS (left) and GAM (right), net rent / m²]

95% prediction intervals based on the quantiles of the modeled conditional distribution. Coverage probability GAMLSS: 93.93% (92.07-95.80); coverage probability GAM: 92.23% (89.45-94.32).
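Such intervals follow directly from the fitted distribution parameters via the quantile function of the location-scale t-distribution; a sketch (the parameter/type arguments of fitted() are assumptions about the accessor interface, the rest is base R):

## 95% prediction intervals: y_q = mu + sigma * qt(q, df)
mu    <- fitted(mod, parameter = "mu",    type = "response")
sigma <- fitted(mod, parameter = "sigma", type = "response")
df    <- fitted(mod, parameter = "df",    type = "response")
lower <- mu + sigma * qt(0.025, df = df)
upper <- mu + sigma * qt(0.975, df = df)
mean(data$nmqms >= lower & data$nmqms <= upper)  # empirical coverage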

Summary

As gamboostLSS relies on mboost, we have a well-tested, mature back end. The base-learners offer great flexibility with respect to the type of effect (linear, non-linear, spatial, random, monotonic, ...). Boosting is feasible even if p ≫ n. Variable selection is included in the fitting process, and the additional shrinkage leads to more stable results. The algorithm is implemented in the R add-on package gamboostLSS, now available on CRAN.

Further literature

Mayr, A., Fenske, N., Hofner, B., Kneib, T. and Schmid, M. (2010). GAMLSS for high-dimensional data - a flexible approach based on boosting. Journal of the Royal Statistical Society, Series C (Applied Statistics), accepted.
Hofner, B., Mayr, A., Fenske, N. and Schmid, M. (2010). gamboostLSS: Boosting Methods for GAMLSS Models. R package version 1.0-1.
Bühlmann, P. and Hothorn, T. (2007). Boosting algorithms: Regularization, prediction and model fitting. Statistical Science, 22, 477-522.
Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape. Applied Statistics, 54, 507-554.
Schmid, M., Potapov, S., Pfahlberg, A. and Hothorn, T. (2010). Estimation and regularization techniques for regression models with multidimensional prediction functions. Statistics and Computing, 20, 139-150.