ESL Chap3. Some extensions of lasso


Outline

- Consistency of lasso for model selection
- Adaptive lasso
- Elastic net
- Group lasso

Consistency of lasso for model selection

A number of authors have studied the ability of the lasso and related procedures to recover the correct model as N and p grow (in contrast to achieving low prediction error, which is an easier goal). Many of the results in this area assume an irrepresentability condition on the design matrix of the form

$$\left\| (X_S^T X_S)^{-1} X_S^T X_{S^c} \right\|_\infty \le (1 - \epsilon) \quad \text{for some } \epsilon \in (0, 1]. \qquad (1)$$

Here $S$ indexes the subset of features with non-zero coefficients in the true underlying model, and $X_S$ are the columns of $X$ corresponding to those features. Similarly, $S^c$ indexes the features with true coefficients equal to zero, and $X_{S^c}$ the corresponding columns. The condition says that the least-squares coefficients for the columns of $X_{S^c}$ regressed on $X_S$ are not too large; that is, the "good" variables $S$ are not too highly correlated with the nuisance variables $S^c$.
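A minimal numerical sketch (not from the slides) of how a condition of the form (1) can be checked on a simulated design matrix. The data and the choice of support $S$ are illustrative assumptions, and the exact matrix-norm convention varies across references; here we bound the $\ell_1$ norm of each nuisance column's coefficient vector on $X_S$, one common form of the condition.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 10
X = rng.standard_normal((N, p))

S = [0, 1, 2]                                  # assumed support of the true model
Sc = [j for j in range(p) if j not in S]
XS, XSc = X[:, S], X[:, Sc]

# Least-squares coefficients of each nuisance column regressed on X_S
C = np.linalg.solve(XS.T @ XS, XS.T @ XSc)     # shape (|S|, |S^c|)

# Largest l1 norm of these coefficient vectors over the nuisance columns
print("irrepresentability statistic:", np.abs(C).sum(axis=0).max())
```

A value below 1 leaves room for some $\epsilon > 0$ in (1); a value of 1 or more indicates the condition fails for this support.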

Adaptive lasso

The adaptive lasso uses a weighted penalty of the form $\sum_{j=1}^p w_j |\beta_j|$, where $w_j = 1/|\hat\beta_j|^\nu$, $\hat\beta_j$ is the ordinary least squares estimate, and $\nu > 0$. The adaptive lasso yields consistent estimates of the parameters while retaining the attractive convexity property of the lasso. The idea is to favor predictors with univariate strength, to avoid spurious selection of noise predictors. It can be fit via the LARS algorithm. When $p > N$, the univariate regression coefficients can be used in place of the full least squares estimates. The adaptive lasso recovers the correct model under milder conditions than does the lasso.
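A minimal sketch (not from the slides) of one way to fit the adaptive lasso: rescale each feature by its weight and run an ordinary lasso. The simulated data, the choice $\nu = 1$, and the penalty level are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
N, p = 200, 10
X = rng.standard_normal((N, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.standard_normal(N)

# Step 1: pilot estimates (OLS here; univariate coefficients if p > N)
beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
nu = 1.0
w = 1.0 / np.abs(beta_ols) ** nu        # adaptive weights w_j = 1 / |beta_hat_j|^nu

# Step 2: lasso on the rescaled features X_j / w_j, then undo the scaling
Xw = X / w
fit = Lasso(alpha=0.1, fit_intercept=False).fit(Xw, y)
beta_adaptive = fit.coef_ / w
print(np.round(beta_adaptive, 2))
```

The rescaling trick works because the lasso penalty on the rescaled coefficients $\gamma_j = w_j \beta_j$ is exactly the weighted penalty $\sum_j w_j |\beta_j|$.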

Elastic net

In genomic applications there are often strong correlations among the variables; genes tend to operate in molecular pathways. The lasso penalty is somewhat indifferent to the choice among a set of strong but correlated variables. The ridge penalty, on the other hand, tends to shrink the coefficients of correlated variables toward each other. The elastic net penalty is a compromise, and has the form

$$\sum_{j=1}^p \left( \alpha |\beta_j| + (1 - \alpha) \beta_j^2 \right). \qquad (2)$$

The second term encourages highly correlated features to be averaged, while the first term encourages a sparse solution in the coefficients of these averaged features. Unlike the lasso, more than $\min(N, p)$ coefficients can be nonzero. See Figure 18.5 in the text.
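A minimal sketch (not from the slides) contrasting the lasso and the elastic net on two nearly identical predictors, using scikit-learn's ElasticNet, whose l1_ratio plays the role of $\alpha$ in (2) up to scaling. The simulated data and penalty levels are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(1)
N = 100
z = rng.standard_normal(N)
x1 = z + 0.01 * rng.standard_normal(N)   # x1 and x2 are nearly identical
x2 = z + 0.01 * rng.standard_normal(N)
x3 = rng.standard_normal(N)              # unrelated noise predictor
X = np.column_stack([x1, x2, x3])
y = 2.0 * z + rng.standard_normal(N)

# The lasso tends to put its weight on one of the correlated pair, while the
# elastic net tends to share the coefficient between them (the grouping effect).
print(Lasso(alpha=0.1).fit(X, y).coef_)
print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)
```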

Group lasso

In some problems the predictors belong to pre-defined groups; for example, genes that belong to the same biological pathway, or collections of indicator (dummy) variables representing the levels of a categorical predictor. In this situation it may be desirable to shrink and select the members of a group together. The group lasso is one way to achieve this.

Group lasso (continued)

Suppose that the $p$ predictors are divided into $L$ groups, with $p_l$ the number in group $l$. For ease of notation we use a matrix $X_l$ to represent the predictors corresponding to the $l$th group, with corresponding coefficient vector $\beta_l$. The group lasso minimizes the convex criterion

$$\min_{\beta \in \mathbb{R}^p} \left\| y - \beta_0 \mathbf{1} - \sum_{l=1}^L X_l \beta_l \right\|_2^2 + \lambda \sum_{l=1}^L \sqrt{p_l}\, \|\beta_l\|_2, \qquad (3)$$

where the $\sqrt{p_l}$ terms account for the varying group sizes, and $\|\cdot\|_2$ is the Euclidean norm (not squared). Since the Euclidean norm of a vector $\beta_l$ is zero only if all of its components are zero, this procedure encourages sparsity at the group level.
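A minimal sketch (not from the slides) of one way to minimize criterion (3), via proximal gradient descent with block soft-thresholding. The simulated data, groups, $\lambda$, step size, and iteration count are illustrative assumptions, and the intercept is omitted.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize ||y - X beta||_2^2 + lam * sum_l sqrt(p_l) * ||beta_l||_2."""
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ beta - y)
        z = beta - step * grad
        for g in groups:                            # block soft-thresholding per group
            thresh = step * lam * np.sqrt(len(g))
            norm_g = np.linalg.norm(z[g])
            z[g] = 0.0 if norm_g <= thresh else (1 - thresh / norm_g) * z[g]
        beta = z
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(100)
groups = [[0, 1, 2], [3, 4, 5]]                     # two pre-defined groups
print(np.round(group_lasso(X, y, groups, lam=5.0), 2))
```

The key operation is the block soft-threshold, which sets an entire group to zero when the norm of its gradient-step update falls below the threshold.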

The standard group lasso algorithm uses coordinate descent, assumes that the design matrix in each group is orthonormal (not just orthogonal), and uses simple soft-thresholding. This is a restrictive assumption: e.g., if we have a categorical predictor coded by indicator variables, we can orthonormalize without changing the problem only if the numbers of observations in each category are the same. For general design matrices the computation is more intricate; see "A note on the group lasso and a sparse group lasso" (Friedman et al.) on Tibshirani's website.
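The soft-thresholding update referred to here is not written out on the slide; under the orthonormal assumption $X_l^T X_l = I$, a standard derivation gives the blockwise minimizer of (3), with $r_l = y - \beta_0 \mathbf{1} - \sum_{k \neq l} X_k \hat\beta_k$ the partial residual for group $l$:

$$\hat\beta_l = \left( 1 - \frac{\lambda \sqrt{p_l}}{2\, \| X_l^T r_l \|_2} \right)_{\!+} X_l^T r_l .$$

The whole group is set to zero when $\|X_l^T r_l\|_2 \le \lambda \sqrt{p_l} / 2$; the factor of 2 comes from the un-halved squared-error term in (3).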