mboost - Componentwise Boosting for Generalised Regression Models

Thomas Kneib & Torsten Hothorn
Department of Statistics, Ludwig-Maximilians-University Munich
13.8.2008

Boosting in a Nutshell

Boosting is a simple but versatile iterative stepwise gradient descent algorithm.

Versatility: estimation problems are described in terms of a loss function ρ (e.g. the negative log-likelihood).

Simplicity: estimation reduces to iteratively fitting base-learners (e.g. regression trees) to residuals.

Componentwise boosting yields a structured model fit (interpretable results), model choice and variable selection.

Example: Estimation of a generalised linear model

$E(y \mid \eta) = h(\eta)$ with linear predictor $\eta = \beta_0 + x_1\beta_1 + \dots + x_p\beta_p$.

Employ the negative log-likelihood as the loss function ρ.

Componentwise boosting algorithm:

(i) Initialise the parameters (e.g. $\hat\beta_j \equiv 0$); set $m = 0$.

(ii) Compute the negative gradients ("residuals")
$$u_i = -\left.\frac{\partial}{\partial \eta}\,\rho(y_i, \eta)\right|_{\eta = \hat\eta^{[m-1]}}, \qquad i = 1, \dots, n.$$

(iii) Fit least-squares base-learning procedures for all parameters,
$$b_j = (X_j' X_j)^{-1} X_j' u,$$
and find the best-fitting one
$$j^\ast = \operatorname*{argmin}_{1 \le j \le p} \sum_{i=1}^n (u_i - x_{ij} b_j)^2.$$

(iv) Update the estimates via
$$\hat\beta_{j^\ast}^{[m]} = \hat\beta_{j^\ast}^{[m-1]} + \nu\, b_{j^\ast}, \qquad \hat\beta_j^{[m]} = \hat\beta_j^{[m-1]} \;\text{ for all } j \neq j^\ast.$$

(v) If $m < m_{\text{stop}}$, increase $m$ by 1 and go back to step (ii).
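To make steps (i)-(v) concrete, here is a minimal R sketch of componentwise boosting for the L2 loss, where the negative gradient is simply the ordinary residual. The function and variable names are illustrative and not part of mboost.

## Minimal sketch of componentwise boosting with the L2 loss (negative gradient = residual).
componentwise_boost <- function(X, y, mstop = 100, nu = 0.1) {
  p <- ncol(X)
  beta <- rep(0, p)             # (i) initialise all coefficients at zero
  eta  <- rep(0, length(y))     #     current predictor
  for (m in seq_len(mstop)) {
    u <- y - eta                # (ii) negative gradient ("residuals")
    b   <- sapply(seq_len(p), function(j) sum(X[, j] * u) / sum(X[, j]^2))  # (iii) componentwise LS fits
    rss <- sapply(seq_len(p), function(j) sum((u - X[, j] * b[j])^2))
    jstar <- which.min(rss)     #     best-fitting component
    beta[jstar] <- beta[jstar] + nu * b[jstar]         # (iv) damped update of the selected coefficient
    eta <- eta + nu * b[jstar] * X[, jstar]
  }
  beta                          # (v) stop after mstop iterations
}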

The reduction factor ν turns the base-learner into a weak learning procedure (it avoids too large steps along the gradient in the boosting algorithm).

The componentwise strategy yields a structured model fit, since each update refers to a single regression coefficient.

Most crucial point: determine the optimal stopping iteration $m_{\text{stop}}$. The most frequent strategies are AIC reduction and cross-validation.

When the algorithm is stopped early, redundant covariate effects will never have been selected as the best-fitting component; these drop completely out of the model. Componentwise boosting with early stopping therefore implements model choice and variable selection.
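As a rough illustration of early stopping with mboost, the sketch below boosts a linear model on simulated data and chooses the stopping iteration by cross-validated empirical risk via cvrisk(); the data and variable names are made up for the example, and AIC(mod) would be the AIC-based route for families where it is available.

library(mboost)
set.seed(1)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
dat <- data.frame(y = X[, 1] - 0.5 * X[, 3] + rnorm(n), X)   # only X1 and X3 are informative
mod <- glmboost(y ~ ., data = dat,
                control = boost_control(mstop = 500, nu = 0.1))
cvm <- cvrisk(mod)        # cross-validated empirical risk (bootstrap samples by default)
mod <- mod[mstop(cvm)]    # stop at the iteration minimising the cross-validated risk
coef(mod)                 # covariates never selected before m_stop are absent from the fit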

mboost

mboost implements a variety of base-learners and boosting algorithms for generalised regression models.

Examples of loss functions: L2, L1, exponential family log-likelihoods, Huber, etc.

Three model types:
glmboost for models with a linear predictor.
blackboost for prediction-oriented black-box models.
gamboost for models with additive predictors.
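The following sketch shows the three interfaces on a small artificial binary data set (all names are illustrative); the family argument selects the loss, e.g. Binomial() for the negative binomial log-likelihood, Gaussian() for L2, Laplace() for L1, or Huber().

library(mboost)
set.seed(2)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$y <- factor(rbinom(200, 1, plogis(dat$x1)))          # binary response as a two-level factor
m_lin  <- glmboost(y ~ x1 + x2, data = dat, family = Binomial())            # linear predictor
m_tree <- blackboost(y ~ x1 + x2, data = dat, family = Binomial())          # tree-based black box
m_add  <- gamboost(y ~ bbs(x1) + bbs(x2), data = dat, family = Binomial())  # additive predictor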

Various base-learning procedures:

bbs: penalised B-splines for univariate smoothing and varying coefficients.
bspatial: penalised tensor product splines for spatial effects and interaction surfaces.
brandom: ridge regression for random intercepts and slopes.
btree: stumps for one or two variables.
Further univariate smoothing base-learners: bss, bns.
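These base-learners can be combined freely within one gamboost() call. The sketch below assumes the base-learner names listed above behave as on the slide; the simulated data and the variable names (x, z, lon, lat, id) are purely illustrative.

library(mboost)
set.seed(3)
n <- 200
dat <- data.frame(x = runif(n), z = runif(n), lon = runif(n), lat = runif(n),
                  id = factor(sample(1:10, n, replace = TRUE)))
dat$y <- sin(2 * pi * dat$x) + 0.5 * dat$z + rnorm(n, sd = 0.3)
mod <- gamboost(y ~ bbs(x) +              # penalised B-spline: univariate smooth effect of x
                    bbs(x, by = z) +      # varying coefficient: effect of z varying smoothly in x
                    bspatial(lon, lat) +  # penalised tensor product spline surface
                    brandom(id),          # ridge-penalised random intercept
                data = dat)
## btree(z) could be added analogously for a tree stump in z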

Penalised Least Squares Base-Learners

Several of mboost's base-learning procedures are based on penalised least-squares fits, characterised by the hat matrix
$$S_\lambda = X(X'X + \lambda K)^{-1} X'$$
with smoothing parameter λ and penalty matrix K.

Crucial: choose the smoothing parameter appropriately. To avoid biased selection towards more flexible effects, all base-learners should be assigned comparable degrees of freedom
$$\mathrm{df}(\lambda) = \operatorname{trace}\big(X(X'X + \lambda K)^{-1} X'\big).$$
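To illustrate the two formulas, the sketch below computes df(λ) as the trace of the penalised least-squares hat matrix for a B-spline design with a second-order difference penalty; all names are illustrative and the computation is done naively via solve() rather than as inside mboost.

library(splines)
df_lambda <- function(X, K, lambda) {
  S <- X %*% solve(crossprod(X) + lambda * K, t(X))   # S_lambda = X (X'X + lambda K)^{-1} X'
  sum(diag(S))                                        # df(lambda) = trace(S_lambda)
}
x <- seq(0, 1, length.out = 100)
X <- bs(x, df = 20, intercept = TRUE)                 # 20 B-spline basis functions
D <- diff(diag(ncol(X)), differences = 2)             # second-order difference penalty
K <- crossprod(D)
df_lambda(X, K, lambda = 10)   # between 2 and 20; approaches 2 as lambda grows, 20 as lambda -> 0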

In many cases, a reparameterisation is required to achieve suitable values for the degrees of freedom.

Example: a linear effect remains unpenalised with penalised spline smoothing and a second-derivative penalty, so $\mathrm{df}(\lambda) \ge 2$.

Decompose $f(x)$ into a linear component and the deviation from the linear component, and assign separate base-learners (with df = 1) to the linear effect and the deviation.

Additional advantage: this allows one to decide whether a non-linear effect is required at all.
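In mboost, one way to write this decomposition is a plain linear base-learner plus a centred P-spline deviation, each with one degree of freedom; the simulated data and names below are illustrative only, assuming the bols()/bbs() arguments behave as in current mboost versions.

library(mboost)
set.seed(4)
dat <- data.frame(x = runif(200) - 0.5)                   # centred covariate
dat$y <- 1 + 2 * dat$x + rnorm(200, sd = 0.2)             # truly linear effect
mod <- gamboost(y ~ bols(x, intercept = FALSE) +          # linear component, df = 1
                    bbs(x, center = TRUE, df = 1),        # deviation from linearity, df = 1
                data = dat)
table(selected(mod))   # if essentially only the bols() component is selected, no non-linear effect is needed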

Forest Health Example: Geoadditive Regression

Aim of the study: identify factors influencing the health status of trees.

Database: yearly visual forest health inventories carried out from 1983 to 2004 in a northern Bavarian forest district; 83 observation plots of beeches within a 15 km × 10 km area.

Response: binary defoliation indicator $y_{it}$ of plot i in year t (1 = defoliation higher than 25%).

Spatially structured longitudinal data.

Covariates:

Continuous:
- average age of trees at the observation plot
- elevation above sea level in meters
- inclination of slope in percent
- depth of soil layer in centimeters
- pH value in 0-2 cm depth
- density of forest canopy in percent

Categorical:
- thickness of humus layer in 5 ordered categories
- base saturation in 4 ordered categories

Binary:
- type of stand
- application of fertilisation

Specification of a logit model

$$P(y_{it} = 1) = \frac{\exp(\eta_{it})}{1 + \exp(\eta_{it})}$$
with geoadditive predictor $\eta_{it}$.

All continuous covariates are included with penalised spline base-learners decomposed into a linear component and the orthogonal deviation, i.e. $g(x) = x\beta + g_{\text{centered}}(x)$.

An interaction effect between age and calendar time is included in addition (centered around the constant effect).

The spatial effect is included both as a plot-specific random intercept and as a bivariate surface of the coordinates (centered around the constant effect).

Categorical and binary covariates are included as least-squares base-learners.
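A gamboost() call for a predictor of this structure might look roughly as follows. The forest health data are not public, so a small artificial stand-in data set is simulated and all variable names (defol, age, year, canopy, stand, fert, x, y, id) are hypothetical; only the structure of the base-learners mirrors the description above.

library(mboost)
set.seed(5)
n <- 400
forest <- data.frame(
  defol  = factor(rbinom(n, 1, 0.3)),                  # binary defoliation indicator
  age    = runif(n, 10, 200),                          # average age of trees
  year   = sample(1983:2004, n, replace = TRUE),       # calendar time
  canopy = runif(n),                                   # canopy density
  stand  = factor(sample(c("deciduous", "mixed"), n, replace = TRUE)),
  fert   = factor(sample(c("no", "yes"), n, replace = TRUE)),
  x = runif(n), y = runif(n),                          # plot coordinates
  id = factor(sample(1:83, n, replace = TRUE))         # observation plot
)
mod <- gamboost(defol ~
    bols(stand) + bols(fert) +                                             # parametric effects
    bols(age, intercept = FALSE) + bbs(age, center = TRUE, df = 1) +       # decomposed smooth effect of age
    bols(canopy, intercept = FALSE) + bbs(canopy, center = TRUE, df = 1) + # analogously for the other
                                                                           # continuous covariates
    bspatial(age, year, center = TRUE, df = 1) +                           # age x calendar time interaction
    bspatial(x, y, center = TRUE, df = 1) +                                # spatial surface
    brandom(id),                                                           # plot-specific random intercept
  family = Binomial(), data = forest)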

Results:

No effects of pH value, inclination of slope and elevation above sea level.

Parametric effects for type of stand, fertilisation, thickness of humus layer, and base saturation.

Nonparametric effects for canopy density and soil depth.

Both a spatially structured effect (surface) and an unstructured (random) effect, with a clear domination of the latter.

Interaction effect between age and calendar time.

[Figure: estimated non-linear effects of canopy density and depth of soil layer; correlated spatial effect (surface) and uncorrelated random effect.]

[Figure: estimated interaction surface of age of the tree and calendar year.]

Summary

Boosting provides both a structured model fit and a possibility for model choice and variable selection in generalised regression models.

Simple approach based on iterative fitting of negative gradients.

Flexible class of base-learners based on penalised least squares.

Implemented in the R package mboost (Hothorn & Bühlmann, with contributions by Kneib & Schmid).

References:

Kneib, T., Hothorn, T. and Tutz, G. (2008): Model Choice and Variable Selection in Geoadditive Regression. To appear in Biometrics.

Bühlmann, P. and Hothorn, T. (2007): Boosting Algorithms: Regularization, Prediction and Model Fitting. Statistical Science, 22, 477-505.

Find out more: http://www.stat.uni-muenchen.de/~kneib