PENALIZING YOUR MODELS


PENALIZING YOUR MODELS: AN OVERVIEW OF THE GENERALIZED REGRESSION PLATFORM. Michael Crotty & Clay Barker, Research Statisticians, JMP Division, SAS Institute. Copyright 2012, SAS Institute Inc. All rights reserved.

OUTLINE
Motivation
Generalized Linear Models
Penalization Methods
New features in JMP 11.1, 11.2, and 12
Examples: heart disease data, bowl game data
Conclusions
References

MOTIVATION: REGRESSION
Basic regression model: E[y_i] = β_0 + β_1 x_{1i} + β_2 x_{2i} + …
Why do we build regression models?
1. To better understand our data and make decisions
2. To make predictions for new observations
The Generalized Regression platform in JMP Pro can help users with both of these goals.

MOTIVATION
The Generalized Regression platform (a personality of Fit Model) allows users to fit penalized generalized linear models.
"Penalized" means that we use a penalized likelihood to fit the model, which enables variable selection and shrinkage.
"Generalized" means that the response doesn't have to be normally distributed. Not everything is normal: counts, yes/no responses, skewed data, outliers, and so on.

GENERALIZED LINEAR MODELS
Many regression tools assume the response variable is approximately normally distributed, but that isn't always an appropriate assumption:
Insurance claims are often skewed (gamma distribution)
Presence of heart disease is yes/no (binomial distribution)
Number of horse-kick victims is a count (Poisson distribution)
JMP 11 supports 7 distributions for the response; JMP 12 supports 14.

GENERALIZED LINEAR MODELS: RESPONSE DISTRIBUTIONS
The following distributions are available in the Generalized Regression platform.
Continuous: Normal, Cauchy, Exponential, Gamma, Beta
Discrete: Binomial, Beta Binomial, Poisson, Negative Binomial
Zero-inflated: ZI Binomial, ZI Beta Binomial, ZI Poisson, ZI Negative Binomial, ZI Gamma
Note: The distributions added in JMP 12 were shown in italics on the original slide.

GENERALIZED LINEAR MODELS: BRIEF DETAILS
Linear model: E[Y] = μ, where μ = Xβ
Generalized linear model (in three parts):
Mean component: E[Y] = μ
Systematic component: linear predictor η = Σ_{j=1}^{p} x_j β_j
Link function: η = g(μ)
Then y | x, β follows distribution f with link function g.
The most common GLM is probably logistic regression.
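For readers who want to try the same three-part structure outside of JMP, here is a minimal sketch using Python's statsmodels. The data (X, y) and coefficients are made up purely for illustration; only the GLM call itself reflects the idea on the slide.

```python
# Minimal GLM sketch (outside of JMP), using statsmodels.
# Assumes a binary response y and a predictor matrix X; both are simulated here.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                   # inputs for the systematic component
eta = 0.5 + X @ np.array([1.0, -2.0, 0.0])      # linear predictor eta = X beta
p = 1 / (1 + np.exp(-eta))                      # inverse logit link: mu = g^{-1}(eta)
y = rng.binomial(1, p)                          # binomial response distribution

# Logistic regression = binomial family + logit link
model = sm.GLM(y, sm.add_constant(X), family=sm.families.Binomial())
result = model.fit()
print(result.summary())
```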

PENALIZATION METHODS: BENEFITS
A penalty is not a bad thing; rather, it's a great thing!
Maximum likelihood estimation gives us the model that best fits the observed data. If we optimize a penalized likelihood instead:
We will tend to predict better on new data
We should get a model that is easier to interpret
We can overcome problems with our data: not enough observations (more columns than rows, n < p) and correlated predictors (multicollinearity)

PENALIZATION METHODS: THE PENALTY
We want our model to fit well, but not get too complex.
Objective function: Q(β) = -log L(β) + λ Σ_{j=1}^{r} p(β_j)
λ controls the stiffness of the penalty.
Method and its penalty p(β_j):
Lasso: |β_j|
Ridge: β_j²
Elastic Net: a mix of |β_j| and β_j²
Many penalties have been proposed, but these three are available in JMP Pro.
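As a rough illustration outside of JMP, the three penalties correspond to scikit-learn's Lasso, Ridge, and ElasticNet estimators for a normal response. The synthetic data and penalty strengths below are arbitrary choices, not anything from the talk.

```python
# Illustrative sketch of the three penalties for a normal-response model,
# using scikit-learn (not JMP). Data and penalty strengths are arbitrary.
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
beta = np.array([3, -2, 0, 0, 1.5, 0, 0, 0, 0, 0], dtype=float)
y = X @ beta + rng.normal(scale=0.5, size=100)

fits = {
    "lasso (l1 penalty)": Lasso(alpha=0.1),
    "ridge (l2 penalty)": Ridge(alpha=1.0),
    "elastic net (l1 + l2)": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in fits.items():
    model.fit(X, y)
    print(name, np.round(model.coef_, 2))   # lasso/elastic net zero out weak terms
```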

PENALIZATION METHODS: KEY IDEA
What is special about these penalized estimates?
Key idea: accept some bias in order to reduce variance.
Figure: Two different estimators of Pr(coin lands on heads)
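A quick way to see the bias/variance trade-off, sketched here with a made-up simulation: compare the usual sample-proportion estimator of Pr(heads) with a shrunken estimator that pulls toward 0.5. The shrinkage constant is an arbitrary choice for illustration.

```python
# Simulated comparison of two estimators of Pr(heads): the MLE (sample
# proportion) versus an estimator shrunken toward 0.5. The shrinkage constant
# is arbitrary; this just illustrates trading a little bias for less variance.
import numpy as np

rng = np.random.default_rng(2)
p_true, n, reps, a = 0.7, 20, 10_000, 3.0

heads = rng.binomial(n, p_true, size=reps)
mle = heads / n                        # unbiased, higher variance
shrunk = (heads + a * 0.5) / (n + a)   # biased toward 0.5, lower variance

for name, est in [("MLE", mle), ("shrunken", shrunk)]:
    bias = est.mean() - p_true
    var = est.var()
    print(f"{name:8s} bias={bias:+.3f} var={var:.4f} mse={bias**2 + var:.4f}")
```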

PENALIZATION METHODS: OVERFITTING
What if we have tons of predictors? Don't use them all!
Overfitting means that our model fits the observed data very well but performs poorly on new observations.
The Lasso and Elastic Net shrink some coefficients exactly to zero, providing variable selection.
Selection and shrinkage help us avoid overfitting.
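A small sketch of the overfitting problem, again with scikit-learn rather than JMP and with invented data: ordinary least squares versus the lasso when most predictors are pure noise and the column count is close to the row count.

```python
# Overfitting sketch (scikit-learn, not JMP): OLS vs. the lasso when most
# predictors are noise and there are nearly as many columns as rows.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)
n, p = 60, 50
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(size=n)        # only 2 real signals
X_new = rng.normal(size=(n, p))
y_new = 2 * X_new[:, 0] - 3 * X_new[:, 1] + rng.normal(size=n)

for name, model in [("OLS", LinearRegression()), ("lasso", Lasso(alpha=0.2))]:
    model.fit(X, y)
    print(f"{name:5s} train R2={r2_score(y, model.predict(X)):.2f} "
          f"test R2={r2_score(y_new, model.predict(X_new)):.2f}")
```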

PENALIZATION METHODS: HOW TO CHOOSE AN ESTIMATION METHOD?
No shrinkage, no selection: Maximum Likelihood
No shrinkage, selection: Forward Selection
Shrinkage, no selection: Ridge
Shrinkage and selection: Lasso & Elastic Net
The Lasso tends to give you a more parsimonious model than the Elastic Net; the Elastic Net can better handle collinearity.

PENALIZATION METHODS: SOLUTION PATH GRAPH
The solution path is crucial to understanding what happened in the model-fitting process.
Each line in the plot represents the value of one parameter in the model.
As the penalty is relaxed, more terms enter the model (left to right).
The MLE is at the far right of the solution path (in most cases).
Parameter estimates can change sign as they travel along the path.
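A comparable plot can be produced outside of JMP with scikit-learn's lasso_path; this sketch uses made-up data and is only meant to show the shape of a solution path, not to reproduce the platform's graph.

```python
# Sketch of a lasso solution path with scikit-learn (not the JMP plot itself).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 8))
y = X @ np.array([4, -3, 2, 0, 0, 0, 0, 0]) + rng.normal(size=100)

alphas, coefs, _ = lasso_path(X, y)     # coefs has shape (n_features, n_alphas)
for j in range(coefs.shape[0]):
    # Plot against -log10(alpha) so the penalty relaxes from left to right.
    plt.plot(-np.log10(alphas), coefs[j], label=f"x{j}")
plt.xlabel("-log10(penalty)")
plt.ylabel("coefficient")
plt.legend(ncol=2)
plt.show()
```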

PENALIZATION METHODS: ADAPTIVE LASSO AND ELASTIC NET
What if we know which predictors are good and/or bad? The l1 penalty can be made adaptive.
The adaptive forms of the Lasso and Elastic Net penalties use weights generated from the original maximum likelihood estimates, so predictors with large ML estimates are penalized less.
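One common way to implement an adaptive lasso, sketched here with scikit-learn (this is a generic recipe, not the platform's internal algorithm): rescale each column by its ML estimate, fit an ordinary lasso, then rescale the coefficients back.

```python
# Adaptive-lasso sketch (scikit-learn, not JMP's implementation): columns are
# weighted by the absolute ML (here OLS) estimates, so strong predictors get a
# lighter penalty. Data and the penalty value are arbitrary.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 6))
y = X @ np.array([5, 0, 0, 1, 0, 0]) + rng.normal(size=150)

w = np.abs(LinearRegression().fit(X, y).coef_)   # weights from the ML estimates
lasso = Lasso(alpha=0.1).fit(X * w, y)           # ordinary lasso on rescaled columns
beta_adaptive = lasso.coef_ * w                  # map back to the original scale
print(np.round(beta_adaptive, 2))
```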

PENALIZATION METHODS: VALIDATION METHODS
The platform has many options for validation methods, which are used to find the optimal value of the penalty:
KFold
Holdback
Leave-One-Out
BIC (not available for Ridge)
AIC (not available for Ridge)
None (only available for Maximum Likelihood)
Validation Column
Note: Maximum Likelihood can only use the last two methods (there is no penalty to optimize).
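Outside of JMP, the closest analogues in scikit-learn are cross-validation (LassoCV with a KFold or leave-one-out splitter) and information criteria (LassoLarsIC). This is a rough sketch on made-up data; the point is only how each method picks a penalty value.

```python
# Sketch of choosing the penalty by validation (scikit-learn, not JMP):
# k-fold cross-validation, leave-one-out, and AIC/BIC.
import numpy as np
from sklearn.linear_model import LassoCV, LassoLarsIC
from sklearn.model_selection import KFold, LeaveOneOut

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 8))
y = X @ np.array([3, -2, 0, 0, 1, 0, 0, 0]) + rng.normal(size=120)

kfold = LassoCV(cv=KFold(n_splits=5, shuffle=True, random_state=0)).fit(X, y)
loo = LassoCV(cv=LeaveOneOut()).fit(X, y)
aic = LassoLarsIC(criterion="aic").fit(X, y)
bic = LassoLarsIC(criterion="bic").fit(X, y)

for name, model in [("5-fold", kfold), ("LOO", loo), ("AIC", aic), ("BIC", bic)]:
    print(f"{name:6s} selected penalty = {model.alpha_:.4f}")
```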

NEW IN JMP 11.1
Improved computation time (sometimes drastically).
Lines in the solution path became selectable.
Early stopping rules were added (another way to speed up fitting).
Parameter Estimates for Original Predictors added to the report output.
Bug fixes.

NEW IN JMP 11.2
Improved computation time.
Better drawing of the solution path (no longer selectable on the y = 0 line).
Added support for validation columns with a Test set (training/validation/test).
Bug fixes.

NEW IN JMP 12
Speed (a new, faster optimizer for fitting models).
New response distributions.
Interactive solution path.
Advanced controls for estimation options.
New diagnostic plots.
Inverse prediction.
Multiple comparisons.
Ability to force specific terms into the model.
Forward selection.
Bug fixes.

NEW IN JMP 12: DISTRIBUTIONS FOR THE RESPONSE
Many more distributions were added in JMP 12 (see the list of response distributions earlier in this deck).

NEW IN JMP 12: INTERACTIVE SOLUTION PATH
Allows you to explore different models by changing the penalty setting.
For KFold, Leave-One-Out, BIC, and AIC validation, you also get some guidance about equivalent models.

NEW IN JMP 12: ADVANCED CONTROLS
Advanced controls allow setting:
Number of grid points
Scale of the grid points
Minimum penalty fraction (p)
Elastic Net alpha (the balance between the l1 and l2 penalties)

NEW IN JMP 12: ADVANCED GRID CONTROLS
The fitting routine searches over a grid of λ values to find the optimal penalty. JMP 12 allows the user to control:
Number of grid points (default is 150 points)
Scale of the grid points: Linear, Log, or Square Root (default is Linear)
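For comparison outside of JMP, a custom penalty grid on a linear or log scale can be passed to scikit-learn's LassoCV through its alphas argument. The grid sizes and bounds below are arbitrary and not JMP's defaults.

```python
# Sketch of supplying a custom lambda (alpha) grid for cross-validation,
# on either a linear or a log scale (scikit-learn, not JMP's grid controls).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 6))
y = X @ np.array([2, -1, 0, 0, 0.5, 0]) + rng.normal(size=100)

linear_grid = np.linspace(1e-3, 1.0, 150)   # 150 evenly spaced penalty values
log_grid = np.logspace(-3, 0, 150)          # 150 penalty values on a log scale

for name, grid in [("linear", linear_grid), ("log", log_grid)]:
    fit = LassoCV(alphas=grid, cv=5).fit(X, y)
    print(f"{name:6s} grid: best alpha = {fit.alpha_:.4f}")
```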

NEW IN JMP 12: OTHER ADVANCED CONTROLS
Minimum penalty fraction (p): p = 0 means no penalty (the MLE). If the MLE is overfitting, the solution path can be distorted!
Elastic Net alpha: balances between the l1 and l2 penalties and avoids searching over a second grid.
α = 0 is Ridge (all l2 penalty); α = 1 is Lasso (all l1 penalty).
Default is α = 0.9 (variable selection with some singularity handling).
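The same mixing idea exists in scikit-learn's ElasticNet as l1_ratio (scikit-learn confusingly calls the overall penalty strength alpha, which corresponds to λ above). A small sketch on made-up data with two highly correlated columns:

```python
# Sketch of the elastic net mixing parameter (scikit-learn, not JMP).
# l1_ratio plays the role of the slide's alpha: values near 0 behave like
# ridge, 1 is the lasso. The overall penalty strength is fixed arbitrarily.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 6))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=100)   # two highly correlated columns
y = X[:, 0] + X[:, 1] + rng.normal(size=100)

for mix in [0.1, 0.5, 0.9, 1.0]:
    fit = ElasticNet(alpha=0.1, l1_ratio=mix).fit(X, y)
    print(f"l1_ratio={mix:.1f} coefficients = {np.round(fit.coef_, 2)}")
```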

EXAMPLES Heart Disease Data Bowl Game Data

CONCLUSIONS
The Generalized Regression platform allows users to build models that are easy to interpret and that predict well, even when the response isn't normally distributed, even when the predictors are correlated, and even when we don't have enough observations!
Many new features and improved computation time will be included in JMP Pro version 12.

REFERENCES
Burnham, K. and Anderson, D. 2002. Model Selection and Multimodel Inference. New York, NY: Springer.
Gregg, X. "Best teams for college bowl attendance & TV ratings," http://blogs.sas.com/content/jmp/2013/12/06/most-and-least-desirable-bowl-teams/ (accessed 15 Aug 2014).
Hastie, T., Tibshirani, R., and Friedman, J. 2009. The Elements of Statistical Learning, Second Edition. New York, NY: Springer.
Hesterberg, T., Choi, N., Meier, L., and Fraley, C. 2008. "Least angle and l1 penalized regression: A review," Statistical Surveys, 2, pp. 61-93.
Hoerl, A. and Kennard, R. 1970. "Ridge Regression: Biased Estimation for Nonorthogonal Problems," Technometrics, 12:1, pp. 55-67.
McCullagh, P. and Nelder, J. 1989. Generalized Linear Models, Second Edition. London: Chapman & Hall.
SAS Institute Inc. 2015. JMP 12 Fitting Linear Models. Cary, NC: SAS Institute Inc.
Tibshirani, R. "Model selection and validation 2: Model assessment, more cross-validation," http://www.stat.cmu.edu/~ryantibs/datamining/lectures/19-val2.pdf (accessed 15 Aug 2014).
Tibshirani, R. 1996. "Regression Shrinkage and Selection via the Lasso," JRSSB, 58:1, pp. 267-288.
von Bortkiewicz, L. 1898. Das Gesetz der kleinen Zahlen. Leipzig: Teubner.
Zou, H. and Hastie, T. 2005. "Regularization and variable selection via the elastic net," JRSSB, 67:2, pp. 301-320.

THANK YOU! Michael Crotty & Clay Barker, Research Statisticians, JMP Division, SAS Institute. Copyright 2012, SAS Institute Inc. All rights reserved. www.sas.com