LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape
LASSO-Type Penalization in the Framework of Generalized Additive Models for Location, Scale and Shape
Nikolaus Umlauf
Overview
Joint work with Andreas Groll, Julien Hambuckers and Thomas Kneib.
1 Introduction
2 Model specification
3 Model fitting
4 L1-type penalization
5 A simulation study
6 An application on the Munich rent data
European Meeting of Statisticians, Helsinki, 2017/07/27
Introduction
Helsinki daily maximum temperature data (1993/ /06): T ~ N(µ, σ²).

Introduction
Helsinki daily maximum temperature data (1993/ /06): T ~ N(µ = f(T_{t−1}), log(σ²) = β₀).

Introduction
Helsinki daily maximum temperature data (1993/ /06): T ~ N(µ = f(T_{t−1}), log(σ²) = f(T_{t−1})).
Model specification
Any parameter of a population distribution D may be modeled by explanatory variables:
y ~ D(h₁(θ₁) = η₁, h₂(θ₂) = η₂, …, h_K(θ_K) = η_K).
Each parameter is linked to a structured additive predictor
h_k(θ_k) = η_k = η_k(x; β_k) = f_{1k}(x; β_{1k}) + … + f_{J_k k}(x; β_{J_k k}),
with j = 1, …, J_k and k = 1, …, K, where the h_k(·) are known monotonic link functions.
Vector of function evaluations:
f_{jk} = (f_{jk}(x₁; β_{jk}), …, f_{jk}(x_n; β_{jk}))′ = f_{jk}(X_{jk}; β_{jk}).
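As a toy illustration (not the bamlss implementation), the structured additive predictor above is just a sum of per-term design-matrix/coefficient products. A minimal Python sketch, assuming each term f_jk is represented by a design matrix X_jk and a coefficient vector β_jk (all names here are mine):

```python
import numpy as np

def eval_predictor(design_mats, coefs):
    """Evaluate eta_k = f_1k + ... + f_{J_k,k}, each term being X_jk @ beta_jk."""
    eta = np.zeros(design_mats[0].shape[0])
    for X, beta in zip(design_mats, coefs):
        eta += X @ beta
    return eta

# Two terms for n = 3 observations: an intercept and one linear effect.
X1 = np.ones((3, 1))
X2 = np.array([[0.0], [1.0], [2.0]])
eta = eval_predictor([X1, X2], [np.array([0.5]), np.array([2.0])])
# The parameter theta_k is then obtained via the inverse link, e.g. exp(eta).
```
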
Model specification (figure slide; image not reproduced)
Model specification
For simple linear effects X_{jk}β_{jk}: p_{jk}(β_{jk}) ∝ const.
For the smooth terms:
p_{jk}(β_{jk}; τ_{jk}, α_{jk}) = d_{β_{jk}}(β_{jk} | τ_{jk}; α_{β_{jk}}) · d_{τ_{jk}}(τ_{jk} | α_{τ_{jk}}).
Using a basis function approach, a common choice is
d_{β_{jk}}(β_{jk} | τ_{jk}, α_{β_{jk}}) ∝ |P_{jk}(τ_{jk})|^{1/2} exp(−½ β′_{jk} P_{jk}(τ_{jk}) β_{jk}),
with precision matrix P_{jk}(τ_{jk}) derived from prespecified penalty matrices α_{β_{jk}} = {K_{1jk}, …, K_{Ljk}}.
The variance parameters τ_{jk} are equivalent to the inverse smoothing parameters in a frequentist approach.
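For intuition, the Gaussian smoothing prior above is cheap to evaluate. A hedged sketch (function name mine; the normalizing constant ½ log|P| is dropped, and a single penalty matrix K with precision P(τ) = K/τ is assumed):

```python
import numpy as np

def log_prior_smooth(beta, K, tau):
    # log d(beta | tau) up to an additive constant: a (possibly rank-deficient)
    # Gaussian with precision matrix P(tau) = K / tau.
    P = K / tau
    return -0.5 * beta @ P @ beta

# First-order random-walk penalty matrix for two coefficients.
K = np.array([[1.0, -1.0], [-1.0, 1.0]])
val = log_prior_smooth(np.array([1.0, -1.0]), K, tau=1.0)
```

Large τ flattens the prior (weak smoothing), small τ pulls adjacent coefficients together, mirroring the inverse-smoothing-parameter interpretation above.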
Regularization in the GAMLSS framework
- A gradient boosting approach is provided by Mayr et al. (2012).
- Allows for variable selection within the GAMLSS framework.
- Corresponding R package gamboostLSS (Hofner et al., 2016).
- Provides a large number of pre-specified distributions.
- New: an alternative gradient boosting approach is implemented in the R package bamlss (Umlauf et al., 2017b), which
  - embeds many different approaches suggested in the literature and software,
  - serves as a unified conceptual Lego toolbox for complex regression models.
Regularization in the GAMLSS framework
New model terms f_{jk}(x; β_{jk}) with LASSO-type penalties. (figure slides; images not reproduced)
Model fitting
The main building block of regression model algorithms is the probability density function d_y(y | θ₁, …, θ_K). Estimation typically requires evaluating the log-likelihood
ℓ(β; y, X) = Σ_{i=1}^n log d_y(y_i; θ_{i1} = h₁^{−1}(η_{i1}(x_i, β₁)), …, θ_{iK} = h_K^{−1}(η_{iK}(x_i, β_K))),
with β = (β′₁, …, β′_K)′ and X = (X₁, …, X_K).
The log-posterior is
log π(β, τ; y, X, α) ∝ ℓ(β; y, X) + Σ_{k=1}^K Σ_{j=1}^{J_k} log p_{jk}(β_{jk}; τ_{jk}, α_{jk}),
where τ = (τ′₁, …, τ′_K)′ = (τ₁₁, …, τ_{J₁ 1}, …, τ_{1K}, …, τ_{J_K K})′ (in frequentist terms, a penalized log-likelihood).
Model fitting
Posterior mode estimation; fortunately, partitioned updating is possible:
β₁^{(t+1)} = U₁(β₁^{(t)}, β₂^{(t)}, …, β_K^{(t)})
β₂^{(t+1)} = U₂(β₁^{(t+1)}, β₂^{(t)}, …, β_K^{(t)})
⋮
β_K^{(t+1)} = U_K(β₁^{(t+1)}, β₂^{(t+1)}, …, β_K^{(t)}).
E.g., Newton-Raphson type updating:
β_k^{(t+1)} = U_k(β_k^{(t)}, ·) = β_k^{(t)} − H_{kk}(β_k^{(t)})^{−1} s(β_k^{(t)}).
This can be further partitioned for each function within parameter block k. Moreover, using a basis function approach yields IWLS updates
β_{jk}^{(t+1)} = (X′_{jk} W_{kk} X_{jk} + G_{jk}(τ_{jk}))^{−1} X′_{jk} W_{kk} (z_k − η_{k,−j}^{(t)}).
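The IWLS update above is a single weighted, penalized least-squares solve. A minimal numpy sketch (illustrative only; W is taken as a vector of diagonal working weights, G plays the role of the penalty matrix G_jk(τ_jk)):

```python
import numpy as np

def iwls_update(X, w, z, eta_rest, G):
    """One IWLS step: (X'WX + G)^{-1} X'W (z - eta_rest)."""
    XtW = X.T * w                      # scale columns by the diagonal weights
    return np.linalg.solve(XtW @ X + G, XtW @ (z - eta_rest))

# Unweighted, unpenalized intercept-only example: the update is the mean of z.
X = np.ones((3, 1))
beta_new = iwls_update(X, np.ones(3), np.array([1.0, 2.0, 3.0]),
                       np.zeros(3), np.zeros((1, 1)))
```
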
Model fitting
A simple generic algorithm for distributional regression models:

while(eps > ε & t < maxit) {
  for(k in 1:K) {
    for(j in 1:J[k]) {
      Compute η_{k,−j} = η_k − f_{jk}.
      Obtain new (β_{jk}, τ_{jk}) = U_{jk}(X_{jk}, y, η, β_{jk}^{[t]}, τ_{jk}^{[t]}).
      Update η_k.
    }
  }
  t = t + 1
  Compute new eps.
}

The functions U_{jk}(·) could either return updates from an optimizing algorithm or proposals from an MCMC sampler.
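The generic loop can be mimicked in a few lines for a single Gaussian mean predictor. A sketch under simplifying assumptions (least-squares updates, no smoothing parameters, a single parameter block; all names are mine):

```python
import numpy as np

def backfit(terms, y, maxit=100, tol=1e-8):
    """Cyclically update each term f_jk against its partial residual,
    as in the inner j-loop of the algorithm above."""
    eta = sum(X @ beta for X, beta in terms)
    for _ in range(maxit):
        eps = 0.0
        for j, (X, beta) in enumerate(terms):
            eta_rest = eta - X @ beta                       # eta_k minus f_jk
            beta_new = np.linalg.lstsq(X, y - eta_rest, rcond=None)[0]
            eps = max(eps, float(np.max(np.abs(beta_new - beta))))
            eta = eta_rest + X @ beta_new                   # update eta_k
            terms[j] = (X, beta_new)
        if eps < tol:                                       # convergence check
            break
    return terms, eta

# Recover y = 1 + 2*x with an intercept term and a linear term.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
terms = [(np.ones((4, 1)), np.zeros(1)), (x[:, None], np.zeros(1))]
terms, eta = backfit(terms, y)
```

In the talk's setting U_jk would instead be an IWLS step or an MCMC proposal, but the control flow is the same.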
L1-type penalization
Idea: depending on the type of covariate effects, subtract a combination of (parts of) the following penalty terms τ^{−1} J(β) from the log-likelihood.
Classical LASSO (Tibshirani, 1996): for a metric covariate x_{jk} use
J_m(β_{jk}) = |β_{jk}|.
Group LASSO (Meier et al., 2008): for a (dummy-encoded) categorical covariate x_{jk} use
J_g(β_{jk}) = ‖β_{jk}‖₂,
with the vector β_{jk} collecting all corresponding coefficients.
L1-type penalization
Alternatively, for categorical covariates, clustering of categories with implicit factor selection is often desirable.
Fused LASSO (Gertheiss and Tutz, 2010): depending on the nominal (left) or ordinal (right) scale level of the covariate, use
J_f(β_{jk}) = Σ_{l>m} w_{lm}^{(jk)} |β_{jkl} − β_{jkm}|   or   J_f(β_{jk}) = Σ_{l=1}^{c_{jk}} w_l^{(jk)} |β_{jkl} − β_{jk,l−1}|,
where c_{jk} is the number of levels of the categorical predictor x_{jk} and w_{lm}^{(jk)}, w_l^{(jk)} denote suitable weights. Choosing l = 0 as the reference, β_{jk0} = 0 is fixed.
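The three penalty variants are easy to write down directly. An illustrative Python sketch with unit weights by default (function names are mine), keeping the reference level's effect fixed at 0 as above:

```python
import numpy as np

def lasso_pen(beta):
    """Classical LASSO for a metric covariate: |beta|."""
    return float(np.abs(beta).sum())

def group_pen(beta):
    """Group LASSO: Euclidean norm of one dummy group's coefficients."""
    return float(np.sqrt(np.sum(np.asarray(beta) ** 2)))

def fused_pen_ordinal(beta, w=None):
    """Ordinal fused LASSO: weighted absolute differences of adjacent
    levels, with the reference effect beta_0 = 0 prepended."""
    b = np.concatenate(([0.0], beta))
    d = np.abs(np.diff(b))
    w = np.ones_like(d) if w is None else np.asarray(w)
    return float(w @ d)

def fused_pen_nominal(beta, w=None):
    """Nominal fused LASSO: all pairwise absolute differences (l > m)."""
    b = np.concatenate(([0.0], beta))
    d = np.array([abs(b[l] - b[m]) for l in range(len(b)) for m in range(l)])
    w = np.ones_like(d) if w is None else np.asarray(w)
    return float(w @ d)

pen = fused_pen_ordinal(np.array([1.0, 1.0, 2.0]))  # |1-0| + |1-1| + |2-1|
```

The middle difference being exactly zero is the mechanism behind category fusion: levels 1 and 2 are merged.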
L1-type penalization
Quadratic approximations of the penalties (compare Oelker & Tutz, 2017):
J_{jk}(β_{jk}) ≈ J_{jk}(β_{jk}^{(t)}) + ½ (β′_{jk} P_{jk}(β_{jk}^{(t)}) β_{jk} − (β_{jk}^{(t)})′ P_{jk}(β_{jk}^{(t)}) β_{jk}^{(t)}),
with a precision matrix of the form
P_{jk}(β_{jk}^{(t)}) = Σ_q N′_{jk}(|a′_q β_{jk}^{(t)}|) / |a′_q β_{jk}^{(t)}| · a_q a′_q,
where the a_q are the (difference) vectors defining the individual penalty terms. E.g., |β| is approximated by √(β² + c); hence, IWLS-based updating functions U_{jk}(·) are relatively easy to implement.
L1-type penalization
Example of the approximation of the L₁ norm: usually, setting the constant to c = 10⁻⁵ works well. (figure not reproduced)
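The surrogate |β| ≈ √(β² + c) can be checked numerically; a small sketch showing that with c = 10⁻⁵ the worst-case error (at β = 0) is about √c ≈ 0.003:

```python
import numpy as np

def abs_approx(beta, c=1e-5):
    # Differentiable surrogate for |beta|; exact up to O(sqrt(c)) near zero.
    return np.sqrt(beta ** 2 + c)

b = np.linspace(-1.0, 1.0, 101)
max_err = float(np.max(np.abs(abs_approx(b) - np.abs(b))))  # worst at beta = 0
```
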
R package bamlss
The package is available at

Development version, in R simply type

R> install.packages("bamlss",
+   repos = "
R package bamlss
Building blocks: formula, family, data → model.frame → optimizer / transformer / sampler → results → summary, plot, select, predict.
In principle, the setup is not restricted to any specific type of engine (Bayesian or frequentist).
R package bamlss

Type         Function
Parser       bamlss.frame()
Transformer  bamlss.engine.setup(), randomize()
Optimizer    bfit(), opt(), cox.mode(), jm.mode(), boost(), lasso()
Sampler      GMCMC(), JAGS(), STAN(), BayesX(), cox.mcmc(), jm.mcmc()
Results      results.bamlss.default()

To implement new engines, only the building block functions have to be exchanged.
R package bamlss
Work in progress...

Function               Distribution
beta_bamlss()          Beta distribution
binomial_bamlss()      Binomial distribution
cnorm_bamlss()         Censored normal distribution
cox_bamlss()           Continuous time Cox model
gaussian_bamlss()      Gaussian distribution
gamma_bamlss()         Gamma distribution
gpareto_bamlss()       Generalized Pareto distribution
jm_bamlss()            Continuous time joint model
multinomial_bamlss()   Multinomial distribution
mvn_bamlss()           Multivariate normal distribution
poisson_bamlss()       Poisson distribution
...

New families only require density, distribution, random number generator, quantile, score and hess functions.
R package bamlss
Wrapper function:

R> f <- list(y ~ la(id, fuse = 2), sigma ~ la(id, fuse = 1))
R> b <- bamlss(f, family = "gaussian", sampler = FALSE,
+     optimizer = lasso, criterion = "BIC", multiple = TRUE)

Standard extractor and plotting functions: summary(), plot(), fitted(), residuals(), predict(), coef(), logLik(), DIC(), samples(), ...
Simulation setting
- Y ~ N(µ(x), σ(x)²)
- µ(x) = β₀µ + x′₁β₁µ + … + x′₄β₄µ,  σ(x) = exp(β₀σ + x′₁β₁σ + … + x′₄β₄σ)
- x₁, x₂ nominal factors; x₃, x₄ ordinal factors.
- Intercepts: β₀µ = 1, β₀σ = 0.5.
- β₁µ = (0, 0.5, 0.5, 0.5, 0.5, 0.2, 0.2), β₂µ = (0, 1, 1)
- β₃µ = (0, 0.5, 0.5, 1, 1, 2, 2), β₄µ = (0, 0.3, 0.3)
- β₁σ = (0, 0.5, 0.4, 0, 0.5, 0.4, 0), β₂σ = (0.4, 0, 0.4)
- β₃σ = (0, 0, 0.4, 0.4, 0.4, 0.8, 0.8), β₄σ = (0, 0.5, 0.5)
- Additional noise variables: x₅, x₆ nominal factors; x₇, x₈ ordinal factors.
- n_train = 1000, with 100 simulation runs.
Model comparison
The following models are compared w.r.t. goodness-of-fit:
- (Unregularized) maximum likelihood.
- Gradient boosting with m_stop determined by BIC (bamlss).
- Gradient boosting with m_stop determined by out-of-sample prediction error (glmboostLSS / gamboostLSS).
- Fused LASSO with a global λ determined by BIC.
- Fused LASSO with λ_µ, λ_σ determined by BIC.
- Fused LASSO with separate λs for each predictor term.
- Combination of gradient boosting and (fused) LASSO.
Goodness-of-fit measures
The different models are compared (amongst others) w.r.t. the following criteria:
- MSE_βµ = ‖β_µ − β̂_µ‖², MSE_βσ = ‖β_σ − β̂_σ‖²
- False positives of differences.
- False positives of pure noise variables.
- False negatives of non-noise variables.
Results: MSE_βµ (boxplots comparing the methods; figure not reproduced)
Results: MSE_βσ (boxplots comparing the methods; figure not reproduced)
Results: false positives of differences, per parameter (µ, σ) and scale level (nominal, ordinal) (figure not reproduced)
Results: false positives of pure noise variables, per parameter and scale level (figure not reproduced)
Results: false negatives of non-noise variables, per parameter and scale level (figure not reproduced)
The Munich rent data 2007
- Munich rent standard data: used as a reference for the average rent of a flat, depending on its characteristics and spatial features.
- n = 3015 households.
- Response: monthly rent per square meter (in €).
- Covariates: out of a large set we incorporate a selection of 9 factors (ordered and nominal/binary; similar to Gertheiss and Tutz, 2010).
- Model: Gaussian GAMLSS, using for both distribution parameters, i.e. µ and σ, a combination of the two different fused LASSO penalties.
- Optimal tuning parameters: via BIC on a 2-dimensional grid.
The Munich rent data 2007
Marginal BIC curves for µ and σ (in each case fixing the other tuning parameter at its respective BIC minimum). (figure not reproduced)

The Munich rent data 2007
Ordinal fused coefficient paths for the year of construction, for parameters µ (left) and σ (right). (figure not reproduced)

The Munich rent data 2007
Nominal fused coefficient paths for the district effect, for parameters µ (left) and σ (right). (figure not reproduced)
Summary & Conclusions
- Different LASSO-type penalties for the GAMLSS framework have been proposed.
- In particular, the fused LASSO yields good results when clustering of categories is desirable/necessary.
- Boosting methods turned out to be problematic when categories are clustered.
- Reasonable results for the application on the Munich rent data.
- Implementation available in the R package bamlss.
- Outlook: further investigation of the combination of boosting and LASSO.
References & Software
Gertheiss, J. & Tutz, G. (2010). Sparse modeling of categorial explanatory variables. The Annals of Applied Statistics, 4(4).
Hofner, B., Mayr, A., & Schmid, M. (2016). gamboostLSS: An R package for model building and variable selection in the GAMLSS framework. Journal of Statistical Software, 74(1).
Mayr, A., Fenske, N., Hofner, B., Kneib, T., & Schmid, M. (2012). Generalized additive models for location, scale and shape for high-dimensional data: a flexible approach based on boosting. Journal of the Royal Statistical Society, Series C, 61(3).
Meier, L., van de Geer, S., & Bühlmann, P. (2008). The group lasso for logistic regression. Journal of the Royal Statistical Society, Series B, 70(1).
Oelker, M.-R. & Tutz, G. (2017). A uniform framework for the combination of penalties in generalized structured models. Advances in Data Analysis and Classification, 11(1).
Rigby, R. A. & Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape. Journal of the Royal Statistical Society, Series C (Applied Statistics), 54(3).
Stasinopoulos, D. M. & Rigby, R. A. (2007). Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, 23(7).
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B.
Umlauf, N., Klein, N., & Zeileis, A. (2017a). BAMLSS: Bayesian additive models for location, scale and shape (and beyond). Working Paper, Faculty of Economics and Statistics, University of Innsbruck.
Umlauf, N., Klein, N., Zeileis, A., & Köhler, M. (2017b). bamlss: Bayesian additive models for location, scale and shape (and beyond). R package version 0.1-2.
Thank you for your attention!
Nikolaus Umlauf
Appendix: Weights for fused LASSO
Along the lines of Gertheiss and Tutz (2010), for the fused LASSO the following weights are used.
For nominal factors:
w_{lm}^{(jk)} = 2 / ((c_{jk} + 1) c_{jk}) · |β̂_{jkl}^{ML} − β̂_{jkm}^{ML}|^{−1} · √((n_l^{(jk)} + n_m^{(jk)}) / n).
For ordinal factors:
w_l^{(jk)} = (1 / c_{jk}) · |β̂_{jkl}^{ML} − β̂_{jk,l−1}^{ML}|^{−1} · √((n_l^{(jk)} + n_{l−1}^{(jk)}) / n).
Here, n_l^{(jk)} denotes the number of observations on level l of predictor x_{jk}.
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationTreatment Effects Beyond the Mean: A Practical Guide. Using Distributional Regression
Treatment Effects Beyond the Mean: A Practical Guide Using Distributional Regression Maike Hohberg 1, Peter Pütz 1, and Thomas Kneib 1 1 University of Goettingen October 24, 2017 Abstract This paper introduces
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationLikelihood-Based Methods
Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)
More informationLinear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman
Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne
More informationParameter Norm Penalties. Sargur N. Srihari
Parameter Norm Penalties Sargur N. srihari@cedar.buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationBiostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences
Biostatistics-Lecture 16 Model Selection Ruibin Xi Peking University School of Mathematical Sciences Motivating example1 Interested in factors related to the life expectancy (50 US states,1969-71 ) Per
More informationPackage effectfusion
Package November 29, 2016 Title Bayesian Effect Fusion for Categorical Predictors Version 1.0 Date 2016-11-21 Author Daniela Pauger [aut, cre], Helga Wagner [aut], Gertraud Malsiner-Walli [aut] Maintainer
More informationThe OSCAR for Generalized Linear Models
Sebastian Petry & Gerhard Tutz The OSCAR for Generalized Linear Models Technical Report Number 112, 2011 Department of Statistics University of Munich http://www.stat.uni-muenchen.de The OSCAR for Generalized
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationLinear Regression. CSL603 - Fall 2017 Narayanan C Krishnan
Linear Regression CSL603 - Fall 2017 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis Regularization
More informationGradient boosting for distributional regression: faster tuning and improved variable selection via noncyclical updates
DOI 1.17/s11-17-975- Gradient boosting for distributional regression: faster tuning and improved variable selection via non updates Janek Thomas 1 Andreas Mayr,3 Bernd Bischl 1 Matthias Schmid 3 Adam Smith
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationLinear Regression. CSL465/603 - Fall 2016 Narayanan C Krishnan
Linear Regression CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Outline Univariate regression Multivariate regression Probabilistic view of regression Loss functions Bias-Variance analysis
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationAn Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong
More informationEstimating Sparse High Dimensional Linear Models using Global-Local Shrinkage
Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage Daniel F. Schmidt Centre for Biostatistics and Epidemiology The University of Melbourne Monash University May 11, 2017 Outline
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationChapter 3. Linear Models for Regression
Chapter 3. Linear Models for Regression Wei Pan Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu PubH 7475/8475 c Wei Pan Linear
More informationGeneralized Bayesian Inference with Sets of Conjugate Priors for Dealing with Prior-Data Conflict
Generalized Bayesian Inference with Sets of Conjugate Priors for Dealing with Prior-Data Conflict Gero Walter Lund University, 15.12.2015 This document is a step-by-step guide on how sets of priors can
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationCross-sectional space-time modeling using ARNN(p, n) processes
Cross-sectional space-time modeling using ARNN(p, n) processes W. Polasek K. Kakamu September, 006 Abstract We suggest a new class of cross-sectional space-time models based on local AR models and nearest
More informationLogistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20
Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More informationA Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models
A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationA New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables
A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,
More information