Group exponential penalties for bi-level variable selection


Department of Biostatistics, Department of Statistics, University of Kentucky
July 31, 2011

Introduction

In regression, variables can often be thought of as grouped:
- Indicator variables for a single categorical variable
- Basis functions derived from a single continuous variable
- Grouped by exterior knowledge (e.g., genetic variants by gene, genes by pathway)

In the penalized regression framework, this issue was first addressed by Yuan and Lin (2006), who proposed the group lasso:

Q_{\text{glasso}}(\beta) = \frac{1}{2n}\|y - X\beta\|^2 + \lambda \sum_{j=1}^{J} \sqrt{K_j}\,\|\beta_j\|

Group penalization framework

One limitation of the group lasso is that it selects entire groups at once; i.e., either an entire group of coefficients is zero or they are all nonzero. To extend the group lasso idea and allow for bi-level selection, Breheny and Huang (2009) proposed a general framework of group penalization, in which the penalty applied to the jth group consists of an outer penalty f_O and an inner penalty f_I:

f_O\left( \sum_{k=1}^{K_j} f_I(|\beta_{jk}|) \right)

This framework encompasses existing approaches such as the group lasso, group bridge (Huang et al. 2009), and group SCAD (Wang, Chen, and Li 2007), and also suggests a new approach, group MCP.
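To make the composition concrete, here is a minimal numeric sketch (illustrative notation, not the authors' code) showing how the group lasso and group bridge penalties arise as choices of f_O and f_I; the specific scalings below are the forms these penalties are commonly written in:

```python
import numpy as np

def composite_penalty(beta_j, f_outer, f_inner):
    """Bi-level group penalty: f_O applied to the sum of f_I over the group."""
    return f_outer(sum(f_inner(abs(b)) for b in beta_j))

lam = 1.0
beta_j = np.array([0.5, -1.0, 2.0])
K = len(beta_j)

# Group lasso: f_I(x) = x^2, f_O(s) = lam * sqrt(K) * sqrt(s)
#   => lam * sqrt(K) * ||beta_j||_2
glasso = composite_penalty(beta_j,
                           lambda s: lam * np.sqrt(K) * np.sqrt(s),
                           lambda x: x**2)
assert np.isclose(glasso, lam * np.sqrt(K) * np.linalg.norm(beta_j))

# Group bridge (Huang et al. 2009): f_I(x) = x, f_O(s) = lam * s**gamma
gamma = 0.5
gbridge = composite_penalty(beta_j, lambda s: lam * s**gamma, lambda x: x)
assert np.isclose(gbridge, lam * np.sum(np.abs(beta_j))**gamma)
```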

Overview

In this talk, I will propose a new class of group penalties (also fitting into the aforementioned framework) based on exponential penalties, and:
- Illustrate why they are attractive in theory
- Demonstrate that they perform well in simulation
- Apply them to the problem of genetic association studies of rare variants

Exponential penalty

To begin, let us define the exponential penalty:

p(x \mid \lambda, \tau) = \frac{\lambda^2}{\tau}\left\{ 1 - \exp\left( -\frac{\tau}{\lambda}|x| \right) \right\}

[Figure: the penalty derivative p'(β) versus β for the lasso, exponential (Exp), and MCP penalties, with axis marks at λ and γ.]
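A small numeric sketch of the definition above and its derivative for x > 0; note that the derivative starts at λ (lasso-like near the origin) and the penalty saturates at λ²/τ:

```python
import numpy as np

def p_exp(x, lam, tau):
    """Exponential penalty: (lam^2/tau) * {1 - exp(-tau |x| / lam)}."""
    return (lam**2 / tau) * (1 - np.exp(-tau * np.abs(x) / lam))

def dp_exp(x, lam, tau):
    """Derivative for x > 0: lam * exp(-tau x / lam)."""
    return lam * np.exp(-tau * x / lam)

lam, tau = 1.0, 0.5
assert p_exp(0.0, lam, tau) == 0.0
assert np.isclose(dp_exp(0.0, lam, tau), lam)            # lasso-like at the origin
assert np.isclose(p_exp(100.0, lam, tau), lam**2 / tau)  # saturates at lam^2 / tau
```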

Coupling

Note that the partial derivative with respect to the jkth covariate is

f_O'\left( \sum_{k=1}^{K_j} f_I(|\beta_{jk}|) \right) f_I'(|\beta_{jk}|)

Thus, if we choose the outer penalty to be exponential, we can control the rate at which the penalty applied to a coefficient is relaxed when that coefficient is located in an important group. We refer to this notion (coefficients are more likely to be selected if they are in an important group) as coupling, and consequently refer to τ as the coupling parameter, as it controls the magnitude of this phenomenon. In principle, any penalty could be chosen as the inner penalty; we focus here on the lasso (gelasso) and MCP (gemcp).
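The coupling effect can be checked numerically. For a group of two covariates with an exponential outer penalty and lasso inner penalty, the threshold applied to β₂ at β₂ = 0 is f_O'(|β₁|), which relaxes as |β₁| grows (a sketch under the notation above):

```python
import numpy as np

lam, tau = 1.0, 0.5

def threshold_beta2(beta1, lam, tau):
    """Selection threshold for beta_2 when beta_2 = 0: f_O'(|beta_1|),
    with exponential outer penalty and lasso inner penalty."""
    return lam * np.exp(-tau * np.abs(beta1) / lam)

b1 = np.linspace(0, 3, 7)
th = threshold_beta2(b1, lam, tau)
assert np.isclose(th[0], lam)   # beta_1 = 0: ordinary lasso threshold
assert np.all(np.diff(th) < 0)  # threshold relaxes as |beta_1| grows
```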

Coupling plots

As an illustration for a group of two covariates, consider the effect of increasing β₁ on the selection threshold for β₂:

[Figure: selection threshold for β₂ plotted against β₁ for the lasso, group lasso, group MCP, and gelasso (at several values of τ and group size K), with axis marks at λ and γ.]

τ: Connections and convexity

It can be shown that

\frac{\lambda^2}{\tau}\left[ 1 - \exp\left( -\frac{\tau}{\lambda^2} \sum_{k=1}^{K_j} f_I(|\beta_{jk}|) \right) \right] = \sum_{k=1}^{K_j} f_I(|\beta_{jk}|) + O(\tau)

i.e., as τ → 0, gelasso is equivalent to the lasso, gemcp is equivalent to MCP, and so on. Furthermore, for the objective function to remain convex in every coordinate dimension, there are also upper bounds on τ:

\tau \le 1 \quad \text{(gelasso)}; \qquad \tau + \frac{1}{\gamma} \le 1 \quad \text{(gemcp)}
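The τ → 0 equivalence can be verified numerically: applying the exponential outer penalty (with the λ² scaling shown above) to a fixed sum s returns s itself up to an error that shrinks linearly in τ (a sketch, not the authors' code):

```python
import numpy as np

def outer_exp(s, lam, tau):
    """Exponential outer penalty: (lam^2/tau) * {1 - exp(-tau s / lam^2)}."""
    return (lam**2 / tau) * (1 - np.exp(-tau * s / lam**2))

lam, s = 1.5, 2.0
errors = [abs(outer_exp(s, lam, tau) - s) for tau in (0.1, 0.01, 0.001)]
assert errors[0] > errors[1] > errors[2]  # error shrinks with tau
assert errors[2] < 1e-2                   # ~ O(tau) behaviour
```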

Local coordinate descent

Group exponential penalty models can be fit via the local coordinate descent (LCD) algorithm, introduced in Breheny and Huang (2009); this approach has the following attractive property:

Proposition: At every step of the LCD algorithm for both gelasso and gemcp,

Q(\beta^{(m+1)}) \le Q(\beta^{(m)}) \quad (1)

i.e., the algorithm decreases the objective function at every iteration.
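A minimal sketch of local coordinate descent for gelasso (illustrative only; the actual implementation lives in the grpreg package). Each coordinate update soft-thresholds the partial residual at the locally linearized penalty level p'(Σ_k |β_jk|); because the outer penalty is concave in that sum, each update also decreases the true objective, matching the proposition above:

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lcd_gelasso(X, y, groups, lam=0.2, tau=0.3, n_iter=30):
    """Local coordinate descent for gelasso on a standardized design
    (columns scaled so that x_k'x_k / n = 1). Returns (beta, objective path)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                                        # residual y - X beta
    path = []
    group_ids = np.unique(groups)
    for _ in range(n_iter):
        for j in group_ids:
            for k in np.where(groups == j)[0]:
                s_j = np.sum(np.abs(beta[groups == j]))
                lam_jk = lam * np.exp(-tau * s_j / lam)  # p'(s_j): local weight
                z = X[:, k] @ r / n + beta[k]            # partial residual
                new = soft(z, lam_jk)
                r += X[:, k] * (beta[k] - new)
                beta[k] = new
        penalty = sum((lam**2 / tau) *
                      (1 - np.exp(-tau * np.sum(np.abs(beta[groups == j])) / lam))
                      for j in group_ids)
        path.append((r @ r) / (2 * n) + penalty)
    return beta, path

# Toy usage: standardized Gaussian design, two groups of five covariates
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)                          # so x_k'x_k / n = 1
beta_true = np.r_[np.ones(3), np.zeros(7)]
y = X @ beta_true + rng.standard_normal(n)
groups = np.repeat([0, 1], 5)
beta_hat, path = lcd_gelasso(X, y, groups)
assert all(a >= b - 1e-10 for a, b in zip(path, path[1:]))  # monotone decrease
```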

Simulation study

To examine the statistical properties of group exponential penalties, we carried out a simulation study with the following design:
- n = p = 100, with 10 groups and 10 covariates in each group
- True coefficients were heterogeneous, so that there were strongly predictive and weakly predictive groups, as well as strongly and weakly predictive covariates within a group, but the overall SNR was always 1
- There were three nonzero groups, but the number of nonzero covariates within a group varied
- γ = 4 for gmcp and gemcp; τ = 1/3 for gelasso and gemcp; BIC was used to select λ

Simulation results: MSE and model size

[Figure: MSE (left panel) and model size (right panel) versus the number of nonzero covariates per group (2-10), for glasso, gmcp, gelasso, and gemcp.]

Simulation results: variable selection

[Figure: FDR at the covariate level (left panel) and at the group level (right panel) versus the number of nonzero covariates per group (2-10), for glasso, gmcp, gelasso, and gemcp.]

Rare variants

One important potential application of this method is to the problem of rare variants in genetic association studies. Briefly, the issue in such studies is that, while the collective effect of multiple rare genetic variants may be large, each variant is sufficiently rare that association studies have low power to detect it. Currently, most methods for addressing this problem involve collapsing the rare variants into a single test. This approach has the considerable disadvantage of assuming that every variant has precisely the same effect on the outcome.

Description of GAW 2010 data

An alternative approach is to use group penalization, with the variants grouped by the gene they belong to. To test this approach on real(istic) data, we analyzed the data set from the 2010 Genetic Analysis Workshop (GAW). The data set contains real data from the 1000 Genomes Project on 697 unrelated individuals and 24,487 genetic variants, grouped into 3,205 genes. 200 independent sets of outcomes were simulated by the organizers of the workshop according to a plausible genetic model of variant-disease association.

GAW results

There were 39 causal variants; each method was allowed to select 39 covariates.

Method          Variants selected   Genes selected   Variants correct (%)   Genes correct (%)
Lasso                 39.2               35.9               10.4                  5.7
glasso               248.8                2.2                0.1                  1.1
glasso (√K_j)         41.0               34.8                1.1                  1.3
gmcp                  42.6               39.5                9.1                  5.0
gelasso               39.6                6.9               28.3                 21.6
gemcp                 44.1                7.7               25.0                 19.2

Conclusions

- Group exponential penalties are both theoretically attractive and practically useful
- The framework allows control over the degree of grouping (coupling)
- The idea extends readily to GLMs
- The method is publicly available via the grpreg package