Convex relaxation for Combinatorial Penalties

Guillaume Obozinski, Equipe Imagine, Laboratoire d'Informatique Gaspard Monge, Ecole des Ponts - ParisTech. Joint work with Francis Bach.
Fête Parisienne in Computation, Inference and Optimization, IHES, March 20th, 2013.

From sparsity...
Empirical risk: for $w \in \mathbb{R}^d$, $L(w) = \frac{1}{2n} \sum_{i=1}^n (y_i - x_i^\top w)^2$.
Support of the model: $\mathrm{Supp}(w) = \{i \mid w_i \neq 0\}$, with $|\mathrm{Supp}(w)| = \sum_{i=1}^d 1_{\{w_i \neq 0\}}$.
Penalization for variable selection: $\min_{w \in \mathbb{R}^d} L(w) + \lambda\,|\mathrm{Supp}(w)|$
[Figures: graphs of the $\ell_0$ penalty $t \mapsto 1_{\{t \neq 0\}}$ and of its convex surrogate $t \mapsto |t|$]
Lasso: $\min_{w \in \mathbb{R}^d} L(w) + \lambda \|w\|_1$
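The $\ell_1$-penalized problem above is the classical Lasso. As a minimal sketch of how it is typically solved, here is proximal gradient descent (ISTA) using soft-thresholding as the proximal map of $\lambda\|\cdot\|_1$; the step size, iteration count, and synthetic data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrinks each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by proximal gradient."""
    n, d = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)  # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20); w_true[:3] = [2.0, -1.5, 1.0]   # sparse ground truth
y = X @ w_true + 0.1 * rng.standard_normal(100)
print(np.round(lasso_ista(X, y, lam=0.1), 2))           # mostly zeros
```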

... to Structured Sparsity
The support is not only sparse; in addition, we have prior information about its structure.
Examples:
- The variables should be selected in groups.
- The variables lie in a hierarchy.
- The variables lie on a graph or network, and the support should be localized or densely connected on the graph.

Difficult inverse problem in Brain Imaging
[Figure: sparse activation maps estimated from fMRI data; Jenatton et al. (2012)]

Hierarchical Dictionary Learning
[Figure: hierarchical dictionary of image patches] [Figure: hierarchical topic model]
Mairal, Jenatton, Obozinski and Bach (2010)

Ideas in structured sparsity

Group Lasso and $\ell_1/\ell_p$ norm (Yuan and Lin, 2006)
Group Lasso: given a partition $\mathcal{G} = \{A_1, \ldots, A_m\}$ of $V := \{1, \ldots, d\}$, consider
$\|w\|_{\ell_1/\ell_p} = \sum_{A \in \mathcal{G}} \delta_A \|w_A\|_p$
Overlapping groups: direct extension of Jenatton et al. (2011). Interesting induced structures:
- induces patterns of rooted subtrees
- induces convex patterns on a grid
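A minimal sketch of the norm just defined, computing $\sum_{A \in \mathcal{G}} \delta_A \|w_A\|_p$ over a partition of the indices; the groups, weights, and test vector are illustrative assumptions.

```python
import numpy as np

def group_norm(w, groups, weights=None, p=2):
    """l1/lp group norm: sum over groups A of delta_A * ||w_A||_p."""
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(d * np.linalg.norm(w[list(A)], ord=p)
               for d, A in zip(weights, groups))

w = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
partition = [[0, 1], [2, 3], [4]]   # a partition of {0,...,4}
print(group_norm(w, partition))     # sqrt(5) + 0 + 3
```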

Hierarchical Norms (Zhao et al., 2009; Bach, 2008; Jenatton, Mairal, Obozinski and Bach, 2010a)
[Figure: a tree with nodes 1-6]
A covariate can only be selected after its ancestors: structure on the parameters $w$.
Hierarchical penalization: $\Omega(w) = \sum_{g \in \mathcal{G}} \|w_g\|_2$, where the groups $g \in \mathcal{G}$ are the sets of descendants of the nodes of a tree.
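To make the group construction concrete, here is a small sketch that builds one group per node (the node together with all its descendants) from a toy tree and evaluates the penalty; the tree, the uniform weights, and the parameter vector are made-up illustrations.

```python
import numpy as np

def descendant_groups(children, root=0):
    """One group per node: the node together with all its descendants."""
    groups = {}
    def collect(u):
        acc = [u]
        for v in children.get(u, []):
            acc += collect(v)
        groups[u] = acc
        return acc
    collect(root)
    return list(groups.values())

children = {0: [1, 2], 1: [3, 4], 2: [5]}   # toy tree on nodes 0..5
w = np.array([0.5, 1.0, 0.0, 2.0, 0.0, 0.0])
# each group ties a node to its whole subtree, so a variable can only be
# nonzero if all its ancestors are
omega = sum(np.linalg.norm(w[g]) for g in descendant_groups(children))
print(omega)
```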

A new approach based on combinatorial functions

General framework
Let $V = \{1, \ldots, d\}$. Given a set function $F : 2^V \to \mathbb{R}_+$, consider
$\min_{w \in \mathbb{R}^d} L(w) + F(\mathrm{Supp}(w))$
Examples of combinatorial functions:
- functions using recursivity or counts of structures (e.g. trees), computed with dynamic programming
- block-coding (Huang et al., 2011): $G(A) = \min_{B_1, \ldots, B_k} F(B_1) + \ldots + F(B_k)$ s.t. $B_1 \cup \ldots \cup B_k \supseteq A$
- submodular functions (work on convex relaxations by Bach, 2010)
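As an illustration of block-coding, a brute-force sketch that evaluates $G(A)$ by enumerating collections of admissible blocks covering $A$; the block set and unit costs are toy assumptions, and the enumeration is exponential, so this is only meant for tiny examples.

```python
from itertools import combinations

def block_coding(F, blocks, A):
    """G(A) = min F(B_1) + ... + F(B_k) over block collections covering A."""
    best = float("inf")
    for r in range(1, len(blocks) + 1):
        for cover in combinations(blocks, r):
            if A <= frozenset().union(*cover):          # the blocks cover A
                best = min(best, sum(F(B) for B in cover))
    return best

blocks = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]
F = lambda B: 1.0                       # unit cost per block
print(block_coding(F, blocks, {0, 3}))  # 2.0: two blocks are needed
```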

A relaxation for F...?
How to solve $\min_{w \in \mathbb{R}^d} L(w) + F(\mathrm{Supp}(w))$?
- Greedy algorithms
- Non-convex methods
- Relaxation: just as $L(w) + \lambda\,|\mathrm{Supp}(w)|$ relaxes to $L(w) + \lambda \|w\|_1$, what should $L(w) + \lambda\,F(\mathrm{Supp}(w))$ relax to?

Previous relaxation result
Bach (2010) showed that if $F$ is a submodular function, it is possible to construct the tightest convex relaxation of the penalty $F$ for vectors $w \in \mathbb{R}^d$ such that $\|w\|_\infty \leq 1$.
Limitations and open issues:
- The relaxation is defined on the unit $\ell_\infty$ ball, which seems to implicitly assume that the $w$ to be estimated lies in a fixed $\ell_\infty$ ball.
- The choice of $\ell_\infty$ seems arbitrary.
- The $\ell_\infty$ relaxation induces undesirable clustering artifacts in the absolute values of the coefficients.
- What happens in the non-submodular case?

Penalizing and regularizing...
Given a function $F : 2^V \to \mathbb{R}_+$, consider for $\nu, \mu > 0$ the combined penalty
$\mathrm{pen}(w) = \mu\,F(\mathrm{Supp}(w)) + \nu\,\|w\|_p^p$.
Motivations:
- a compromise between variable selection and smooth regularization
- required for functions $F$ allowing large supports, such as $A \mapsto 1_{\{A \neq \varnothing\}}$
- leads to a penalty which is coercive, so that a convex relaxation on $\mathbb{R}^d$ will not be trivial.

A convex and homogeneous relaxation
We look for a convex relaxation of $\mathrm{pen}(w)$, and require as well that it be positively homogeneous, for scale invariance.
Definition (Homogeneous extension of a function $g$): $g_h : x \mapsto \inf_{\lambda > 0} \frac{1}{\lambda}\, g(\lambda x)$.
Proposition: The tightest convex positively homogeneous lower bound of a function $g$ is the convex envelope of $g_h$.
This leads us to consider
$\mathrm{pen}_h(w) = \inf_{\lambda > 0} \frac{1}{\lambda}\big(\mu\,F(\mathrm{Supp}(\lambda w)) + \nu\,\|\lambda w\|_p^p\big) \;\propto\; \Theta(w) := \|w\|_p\,F(\mathrm{Supp}(w))^{1/q}, \quad \text{with } \tfrac{1}{p} + \tfrac{1}{q} = 1.$
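The proportionality in the last display follows from a short optimization over $\lambda$, filled in here for completeness (assuming $p > 1$). Since $\mathrm{Supp}(\lambda w) = \mathrm{Supp}(w)$ for all $\lambda > 0$, write $F = F(\mathrm{Supp}(w))$ and $\theta = \|w\|_p$:
$$\mathrm{pen}_h(w) = \inf_{\lambda > 0}\; \mu F\,\lambda^{-1} + \nu\,\theta^p\,\lambda^{p-1}.$$
Setting the derivative in $\lambda$ to zero gives $\lambda_*^p = \frac{\mu F}{(p-1)\,\nu\,\theta^p}$, and substituting back,
$$\mathrm{pen}_h(w) = q\,\big((p-1)\,\nu\big)^{1/p}\,\mu^{1/q}\;\theta\,F^{1/q} \;\propto\; \|w\|_p\,F(\mathrm{Supp}(w))^{1/q}.$$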

Envelope of the homogeneous penalty $\Theta$
Consider the norm $\Omega_p$ whose dual norm is
$\Omega_p^*(s) = \max_{A \subseteq V,\, A \neq \varnothing} \frac{\|s_A\|_q}{F(A)^{1/q}},$
with unit ball $B_{\Omega_p^*} := \big\{ s \in \mathbb{R}^d \mid \forall A \subseteq V,\ \|s_A\|_q^q \leq F(A) \big\}.$
Proposition: The norm $\Omega_p$ is the convex envelope (tightest convex lower bound) of the function $w \mapsto \|w\|_p\,F(\mathrm{Supp}(w))^{1/q}$.
Proof. Denote $\Theta(w) = \|w\|_p\,F(\mathrm{Supp}(w))^{1/q}$. Then
$\Theta^*(s) = \max_{w \in \mathbb{R}^d} w^\top s - \|w\|_p\,F(\mathrm{Supp}(w))^{1/q} = \max_{A \subseteq V}\ \max_{w_A \in \mathbb{R}^A} s_A^\top w_A - \|w_A\|_p\,F(A)^{1/q} = \max_{A \subseteq V}\ \iota_{\{\|s_A\|_q \leq F(A)^{1/q}\}} = \iota_{\{\Omega_p^*(s) \leq 1\}},$
where $\iota_C$ is $0$ on $C$ and $+\infty$ outside, so that $\Theta^{**} = \Omega_p$.
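A minimal sketch evaluating this dual norm by enumerating all nonempty subsets $A \subseteq V$, feasible only for small $d$; the set function used here (a concave function of the cardinality) is an illustrative assumption.

```python
import numpy as np
from itertools import combinations

def dual_omega(s, F, q=2.0):
    """Omega_p^*(s) = max over nonempty A of ||s_A||_q / F(A)^(1/q)."""
    d = len(s)
    best = 0.0
    for r in range(1, d + 1):
        for A in combinations(range(d), r):
            best = max(best,
                       np.linalg.norm(s[list(A)], ord=q) / F(set(A)) ** (1.0 / q))
    return best

F = lambda A: len(A) ** 0.5             # a toy nondecreasing set function
s = np.array([3.0, -1.0, 0.5, 2.0])
print(dual_omega(s, F, q=2.0))
```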

Graphs of the different penalties for $w \in \mathbb{R}^2$
[Figures: $F(\mathrm{Supp}(w))$; $\mathrm{pen}(w) = \mu\,F(\mathrm{Supp}(w)) + \nu\,\|w\|_2^2$; $\Theta(w) = F(\mathrm{Supp}(w))^{1/2}\,\|w\|_2$; the relaxation $\Omega_F(w)$]

A large latent group Lasso (Jacob et al., 2009)
Let $\mathcal{V} = \big\{ v = (v^A)_{A \subseteq V} \in (\mathbb{R}^V)^{2^V} \ \text{s.t.}\ \mathrm{Supp}(v^A) \subseteq A \big\}$. Then
$\Omega_p(w) = \min_{v \in \mathcal{V}} \sum_{A \subseteq V} F(A)^{1/q}\,\|v^A\|_p \quad \text{s.t.} \quad w = \sum_{A \subseteq V} v^A.$
[Figure: $w$ written as a sum of latent vectors $v^{\{1\}}, v^{\{2\}}, \ldots, v^{\{1,2\}}, \ldots, v^{\{1,2,3,4\}}$, each supported on its group]

Some simple examples
F → $\Omega_p$:
- $F(A) = |A|$ → $\Omega_p(w) = \|w\|_1$
- $F(A) = 1_{\{A \neq \varnothing\}}$ → $\Omega_p(w) = \|w\|_p$
- If $\mathcal{G}$ is a partition of $\{1, \ldots, d\}$: $F(A) = \sum_{B \in \mathcal{G}} 1_{\{A \cap B \neq \varnothing\}}$ → $\Omega_p(w) = \sum_{B \in \mathcal{G}} \|w_B\|_p$
When $p = \infty$ and $F$ is submodular, our relaxation coincides with that of Bach (2010).
However, when $\mathcal{G}$ is not a partition and $p < \infty$, $\Omega_p$ is in general not an $\ell_1/\ell_p$-norm! This yields new norms, e.g. the k-support norm of Argyriou et al. (2012).

Example
Consider $V = \{1, 2, 3\}$ and $\mathcal{G} = \{\{1,2\}, \{1,3\}, \{2,3\}\}$, with $F(\{1,2\}) = F(\{1,3\}) = F(\{2,3\}) = 1$ and $F(A) = +\infty$ otherwise, or defined by block-coding.

How tight is the relaxation? Example: the range function
Consider $V = \{1, \ldots, p\}$ and the function $F(A) = \mathrm{range}(A) = \max(A) - \min(A) + 1$, which leads to the selection of interval patterns.
What is its convex relaxation? $\Omega_p^F(w) = \|w\|_1$: the relaxation fails!
The new concept of lower combinatorial envelope provides a tool to analyze the tightness of the relaxation.
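A quick numerical sanity check of this failure, not from the slides: for $F = \mathrm{range}$, every nonempty $A$ satisfies $\|s_A\|_q^q \leq |A|\,\|s\|_\infty^q \leq \mathrm{range}(A)\,\|s\|_\infty^q$, so the dual ball collapses to the $\ell_\infty$ ball and hence $\Omega_p = \|\cdot\|_1$. The brute-force dual norm from the earlier sketch is redefined here so the snippet stands alone.

```python
import numpy as np
from itertools import combinations

def dual_omega(s, F, q=2.0):
    """Brute-force dual norm: max over nonempty A of ||s_A||_q / F(A)^(1/q)."""
    d = len(s)
    return max(np.linalg.norm(s[list(A)], ord=q) / F(A) ** (1.0 / q)
               for r in range(1, d + 1) for A in combinations(range(d), r))

range_F = lambda A: max(A) - min(A) + 1     # F(A) = range(A) on 0-based indices
rng = np.random.default_rng(0)
for _ in range(100):
    s = rng.standard_normal(5)
    assert np.isclose(dual_omega(s, range_F), np.abs(s).max())
print("Omega_p^* = l_inf on random draws, i.e. Omega_p = l_1 for F = range")
```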

Submodular penalties
A function $F : 2^V \to \mathbb{R}$ is submodular if
$\forall A, B \subseteq V, \quad F(A) + F(B) \geq F(A \cup B) + F(A \cap B). \qquad (1)$
For these functions, $\Omega_\infty^F(w) = f(|w|)$ for $f$ the Lovász extension of $F$.
Properties of submodular functions:
- $f$ is computed efficiently (via the so-called greedy algorithm)
- decomposition ("weak separability") properties
- $F$ and $f$ can be minimized in polynomial time.
... which leads to properties of the corresponding submodular norms:
- regularized empirical risk minimization problems can be solved efficiently
- statistical guarantees in terms of consistency and support recovery.
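A minimal sketch of the greedy algorithm mentioned above, assuming a normalized set function ($F(\varnothing) = 0$): sort the entries in decreasing order and weight each coordinate by the marginal gain of adding it. The example set function is a toy choice (a concave function of the cardinality, which is submodular).

```python
import numpy as np

def lovasz_extension(F, x):
    """Greedy evaluation of the Lovasz extension, for F(empty) = 0:
    f(x) = sum_k x_{sigma(k)} * (F(S_k) - F(S_{k-1})),
    where sigma sorts x in decreasing order and S_k = {sigma(1..k)}."""
    order = np.argsort(-x)
    value, S, prev = 0.0, set(), 0.0
    for i in order:
        S.add(int(i))
        cur = F(S)
        value += x[i] * (cur - prev)   # marginal gain of coordinate i
        prev = cur
    return value

F = lambda S: len(S) ** 0.5            # submodular: concave in |S|
w = np.array([0.5, -2.0, 1.0])
print(lovasz_extension(F, np.abs(w)))  # equals Omega_inf^F(w) for submodular F
```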

Consistency for the Lasso (Bickel et al., 2009)
Assume that $y = Xw^* + \sigma\varepsilon$, with $\varepsilon \sim \mathcal{N}(0, \mathrm{Id}_n)$. Let $Q = \frac{1}{n} X^\top X \in \mathbb{R}^{d \times d}$ and denote $J = \mathrm{Supp}(w^*)$.
Assume the $\ell_1$-Restricted Eigenvalue condition:
$\forall \Delta \ \text{s.t.}\ \|\Delta_{J^c}\|_1 \leq 3\,\|\Delta_J\|_1, \quad \Delta^\top Q\,\Delta \geq \kappa\,\|\Delta_J\|_2^2.$
Then we have
$\frac{1}{n}\,\|X\hat{w} - Xw^*\|_2^2 \;\leq\; \frac{72\,|J|\,\sigma^2}{\kappa}\cdot\frac{2\log p + t^2}{n},$
with probability larger than $1 - \exp(-t^2)$.

Support Recovery for the Lasso (Wainwright, 2009)
Assume $y = Xw^* + \sigma\varepsilon$, with $\varepsilon \sim \mathcal{N}(0, \mathrm{Id}_n)$. Let $Q = \frac{1}{n} X^\top X \in \mathbb{R}^{d \times d}$ and denote $J = \mathrm{Supp}(w^*)$. Define $\nu = \min_{j : w^*_j \neq 0} |w^*_j| > 0$.
Assume $\kappa = \lambda_{\min}(Q_{JJ}) > 0$, and assume the Irrepresentability Condition, i.e., that for some $\eta > 0$,
$\|Q_{J^c J}\,Q_{JJ}^{-1}\|_{\infty} \leq 1 - \eta.$
Then, if $\frac{2}{\eta}\sqrt{\frac{2\sigma^2\log p}{n}} < \lambda < \frac{\kappa\,\nu}{\sqrt{|J|}}$, the minimizer $\hat{w}$ is unique and has support equal to $J$, with probability larger than $1 - 4\exp(-c_1 n\lambda^2)$.

An example: penalizing the range
Structured prior on the support (Jenatton et al., 2011): the support is an interval of $\{1, \ldots, d\}$.
Natural associated penalization: $F(A) = \mathrm{range}(A) = i_{\max}(A) - i_{\min}(A) + 1$.
$F$ is not submodular, and its lower combinatorial envelope is $G(A) = |A|$.
But $\tilde{F}(A) := d - 1 + \mathrm{range}(A)$ is submodular! In fact, $\tilde{F}(A) = \sum_{B \in \mathcal{B}} 1_{\{A \cap B \neq \varnothing\}}$ for groups $\mathcal{B}$ of the form: [Figure: the prefixes $\{1, \ldots, k\}$ and suffixes $\{k, \ldots, d\}$]
Jenatton et al. (2011) considered $\Omega(w) = \sum_{B \in \mathcal{B}} d_B\,\|w_B\|_2$.

Experiments
[Figure: the five signals on $\{1, \ldots, 256\}$ - S1: constant; S2: triangular shape; S3: $x \mapsto \sin(x)\sin(5x)$; S4: a slope pattern; S5: i.i.d. Gaussian pattern]
Compared methods:
- Lasso
- Elastic Net
- naive $\ell_2$ group Lasso
- $\Omega_2$ for $F(A) = d - 1 + \mathrm{range}(A)$
- $\Omega_\infty$ for $F(A) = d - 1 + \mathrm{range}(A)$
- the weighted $\ell_2$ group Lasso of Jenatton et al. (2011)

Constant signal
[Figure: best Hamming distance vs. sample size $n$ for EN, GL+w, GL, L1, Sub $p=\infty$, Sub $p=2$; $d = 256$, $k = 160$, $\sigma = 0.5$]

Triangular signal
[Figure: best Hamming distance vs. $n$ for the same methods; $d = 256$, $k = 160$, $\sigma = 0.5$, signal S2, cov = Id]

$(x_1, x_2) \mapsto \sin(x_1)\sin(5x_1)\sin(x_2)\sin(5x_2)$ signal in 2D
[Figure: best Hamming distance vs. $n$ for the same methods; $d = 256$, $k = 160$, $\sigma = 1.0$]

i.i.d. random signal in 2D
[Figure: best Hamming distance vs. $n$ for the same methods; $d = 256$, $k = 160$, $\sigma = 1.0$]

Summary
- A convex relaxation for penalties combining (a) a general set function of the support and (b) the $\ell_p$ norm of the parameter vector $w$.
- Principled construction of known norms, like the group Lasso and $\ell_1/\ell_p$-norms, and of many new sparsity-inducing norms.
- Caveat: the relaxation can fail to capture the structure (e.g. the range function).
- For submodular functions we obtain efficient algorithms, and theoretical results such as consistency and support recovery guarantees.

References I
Argyriou, A., Foygel, R., and Srebro, N. (2012). Sparse prediction with the k-support norm. In Advances in Neural Information Processing Systems 25, pages 1466-1474.
Bach, F. (2008). Exploring large feature spaces with hierarchical multiple kernel learning. In Advances in Neural Information Processing Systems.
Bach, F. (2010). Structured sparsity-inducing norms through submodular functions. In Advances in Neural Information Processing Systems.
Bickel, P., Ritov, Y., and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics, 37(4):1705-1732.
Huang, J., Zhang, T., and Metaxas, D. (2011). Learning with structured sparsity. Journal of Machine Learning Research, 12:3371-3412.
Jacob, L., Obozinski, G., and Vert, J.-P. (2009). Group lasso with overlap and graph lasso. In ICML.
Jenatton, R., Audibert, J.-Y., and Bach, F. (2011). Structured variable selection with sparsity-inducing norms. Journal of Machine Learning Research, 12:2777-2824.
Jenatton, R., Gramfort, A., Michel, V., Obozinski, G., Eger, E., Bach, F., and Thirion, B. (2012). Multiscale mining of fMRI data with hierarchical structured sparsity. SIAM Journal on Imaging Sciences, 5(3):835-856.
Mairal, J., Jenatton, R., Obozinski, G., and Bach, F. (2011). Convex and network flow optimization for structured sparsity. Journal of Machine Learning Research, 12:2681-2720.

References II
Wainwright, M. J. (2009). Sharp thresholds for noisy and high-dimensional recovery of sparsity using $\ell_1$-constrained quadratic programming. IEEE Transactions on Information Theory, 55:2183-2202.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49-67.
Zhao, P., Rocha, G., and Yu, B. (2009). Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468-3497.