Group lasso for genomic data


Group lasso for genomic data. Jean-Philippe Vert, Mines ParisTech / Curie Institute / Inserm. Machine Learning: Theory and Computation workshop, IMA, Minneapolis, March 26-30, 2012.

Outline: 1. Motivations; 2. Finding multiple change-points in a single profile; 3. Finding multiple change-points shared by many signals; 4. Learning molecular classifiers with network information; 5. Conclusion.


Chromosomal aberrations in cancer.

Comparative Genomic Hybridization (CGH). Motivation: CGH data measure the DNA copy number along the genome. They are very useful, in particular in cancer research, to systematically observe variants in DNA content. [Figure: log-ratio copy-number profile along the genome.]

Can we identify breakpoints and "smooth" each profile? [Figure: a single noisy copy-number profile.]

Can we detect frequent breakpoints? [Figure: a collection of bladder tumour copy-number profiles.]

DNA → RNA → protein. CGH shows the (static) DNA; cancer cells also have abnormal (dynamic) gene expression (= transcription).

Can we identify the cancer subtype? (diagnosis)

Can we predict the future evolution? (prognosis)

Outline: 1. Motivations; 2. Finding multiple change-points in a single profile; 3. Finding multiple change-points shared by many signals; 4. Learning molecular classifiers with network information; 5. Conclusion.

The problem. [Figure: a noisy copy-number profile.] Let $Y \in \mathbb{R}^p$ be the signal. We want to find a piecewise-constant approximation $\hat{U} \in \mathbb{R}^p$ with at most $k$ change-points.


An optimal solution? We can define an "optimal" piecewise-constant approximation $\hat{U} \in \mathbb{R}^p$ as the solution of
$$\min_{U \in \mathbb{R}^p} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} \mathbf{1}\left(U_{i+1} \neq U_i\right) \leq k\,.$$
This is an optimization problem over the $\binom{p}{k}$ partitions. Dynamic programming finds the solution in $O(p^2 k)$ time and $O(p^2)$ memory, but it does not scale to $p = 10^6$–$10^9$.

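For concreteness, here is a minimal dynamic-programming sketch of this segmentation problem (a toy illustration, not the code behind the slides; function and variable names are ours):

    import numpy as np

    def optimal_segmentation(y, k):
        """Best piecewise-constant approximation of y with at most k
        change-points (k+1 segments), by dynamic programming:
        O(p^2 k) time, O(p k) memory for the DP table."""
        p = len(y)
        s1 = np.concatenate(([0.0], np.cumsum(y)))        # prefix sums
        s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))   # prefix sums of squares

        def sse(i, j):
            """Squared error of the best constant fit on y[i..j] (inclusive)."""
            n, s = j - i + 1, s1[j + 1] - s1[i]
            return (s2[j + 1] - s2[i]) - s * s / n

        cost = np.full((k + 2, p), np.inf)   # cost[m, j]: best fit of y[0..j] with m segments
        back = np.zeros((k + 2, p), dtype=int)
        for j in range(p):
            cost[1, j] = sse(0, j)
        for m in range(2, k + 2):
            for j in range(m - 1, p):
                for i in range(m - 2, j):
                    c = cost[m - 1, i] + sse(i + 1, j)
                    if c < cost[m, j]:
                        cost[m, j], back[m, j] = c, i
        m = 1 + int(np.argmin(cost[1:, p - 1]))  # best number of segments <= k+1
        cps, j = [], p - 1
        while m > 1:                             # backtrack the change-point positions
            i = back[m, j]
            cps.append(i + 1)                    # change-point between indices i and i+1
            j, m = i, m - 1
        return sorted(cps)

    # tiny example: a signal with two obvious jumps
    y = np.array([0.0, 0.1, -0.1, 3.0, 3.1, 2.9, 1.0, 1.1, 0.9])
    print(optimal_segmentation(y, 2))   # expected: [3, 6]

The cubic-like cost of the triple loop is exactly why this exhaustive approach breaks down for genome-length signals, motivating the convex relaxations below.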

Promoting sparsity with the $\ell_1$ penalty. The $\ell_1$ penalty (Tibshirani, 1996; Chen et al., 1998): if $R(\beta)$ is convex and "smooth", the solution of
$$\min_{\beta \in \mathbb{R}^p} R(\beta) + \lambda \sum_{i=1}^{p} |\beta_i|$$
is usually sparse.
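As an illustration of where the exact zeros come from (a standard fact, not stated on the slide): the proximal operator of the $\ell_1$ penalty is coordinate-wise soft-thresholding, which maps small coefficients exactly to zero.

    import numpy as np

    def soft_threshold(beta, t):
        """Proximal operator of t * ||.||_1: shrink towards zero, and set
        coordinates with magnitude below t exactly to zero."""
        return np.sign(beta) * np.maximum(np.abs(beta) - t, 0.0)

    print(soft_threshold(np.array([2.5, -0.3, 0.8]), 1.0))  # [1.5, -0.0, 0.0]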

Promoting piecewise-constant profiles: the total variation / variable fusion penalty. If $R(\beta)$ is convex and "smooth", the solution of
$$\min_{\beta \in \mathbb{R}^p} R(\beta) + \lambda \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i|$$
is usually piecewise constant (Rudin et al., 1992; Land and Friedman, 1996). Proof sketch: with the change of variable $u_1 = \beta_1$ and $u_i = \beta_i - \beta_{i-1}$ for $i \geq 2$, we obtain a Lasso problem in $u \in \mathbb{R}^p$; $u$ sparse means $\beta$ piecewise constant.
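Spelling the reduction out (a routine step consistent with the slide, not quoted from it): since $\beta_i = \sum_{j \leq i} u_j$, the fused penalty becomes an ordinary $\ell_1$ penalty on $(u_2, \dots, u_p)$,
$$\min_{u \in \mathbb{R}^p} \; R\Big(\big(\textstyle\sum_{j \leq i} u_j\big)_{i=1,\dots,p}\Big) \;+\; \lambda \sum_{i=2}^{p} |u_i|\,,$$
so sparsity of $(u_2, \dots, u_p)$ is exactly piecewise constancy of $\beta$.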

TV signal approximator:
$$\min_{\beta \in \mathbb{R}^p} \|Y - \beta\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i| \leq \mu\,.$$
Adding further constraints does not change the change-points: $\sum_{i=1}^{p} |\beta_i| \leq \nu$ (Tibshirani et al., 2005; Tibshirani and Wang, 2008) or $\sum_{i=1}^{p} \beta_i^2 \leq \nu$ (Mairal et al.).

Solving the TV signal approximator
$$\min_{\beta \in \mathbb{R}^p} \|Y - \beta\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i| \leq \mu\,:$$
- QP with sparse linear constraints in $O(p^2)$: minutes for $p = 10^5$ (Tibshirani and Wang, 2008)
- Coordinate descent-like method, $O(p)$?: seconds for $p = 10^5$ (Friedman et al., 2007)
- For all $\mu$, with the LARS, in $O(pK)$ (Harchaoui and Lévy-Leduc, 2008)
- For all $\mu$ in $O(p \ln p)$ (Hoefling, 2009)
- For the first $K$ change-points in $O(p \ln K)$ (Bleakley and V.)

TV signal approximator as dichotomic segmentation. We investigate different choices of $I_L(I)$, $I_R(I)$ and of the gain function $\Delta(I)$. The dichotomic segmentation method, presented in the algorithm below, proceeds as follows: starting from the full interval $[1, n]$ as a trivial partition into one interval, iteratively refine the current partition $P$ of $[1, n]$ by splitting the interval $I^\star \in P$ with maximal $\Delta(I^\star)$ into the two intervals $I_L(I^\star)$ and $I_R(I^\star)$.

Algorithm: greedy dichotomic segmentation.
Require: $k$, the number of intervals; $\Delta(I)$, the gain function to split an interval $I$ into $I_L(I)$, $I_R(I)$.
1: $I_1$ represents the interval $[1, n]$
2: $P \leftarrow \{I_1\}$
3: for $i = 1$ to $k$ do
4:   $I^\star \leftarrow \arg\max_{I \in P} \Delta(I)$
5:   $P \leftarrow P \setminus \{I^\star\}$
6:   $P \leftarrow P \cup \{I_L(I^\star), I_R(I^\star)\}$
7: end for
8: return $P$

Theorem: the TV signal approximator performs "greedy" dichotomic segmentation (V. and Bleakley; see also Hoefling, 2009).
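For intuition, a minimal sketch of greedy dichotomic (binary) segmentation with a squared-error gain; this illustrates the generic algorithm above, but the specific choice of $\Delta$ and all names are ours, not the authors' implementation:

    import numpy as np

    def best_split(y, lo, hi):
        """Best single split of y[lo:hi] and its gain (decrease in squared
        error); one natural choice of the gain Delta(I) discussed above."""
        seg = y[lo:hi]
        n = len(seg)
        if n < 2:
            return None, -np.inf
        tot, sq = seg.sum(), (seg ** 2).sum()
        sse_full = sq - tot ** 2 / n
        left = np.cumsum(seg)[:-1]           # prefix sums for split positions 1..n-1
        nl = np.arange(1, n)
        sse_split = sq - left ** 2 / nl - (tot - left) ** 2 / (n - nl)
        j = int(np.argmin(sse_split))
        return lo + j + 1, sse_full - sse_split[j]

    def greedy_dichotomic_segmentation(y, k):
        """Split k times, always refining the interval with the largest gain."""
        bounds = [0, len(y)]                 # interval boundaries (change-points)
        for _ in range(k):
            splits = [best_split(y, bounds[i], bounds[i + 1])
                      for i in range(len(bounds) - 1)]
            pos, _ = max(splits, key=lambda s: s[1])
            if pos is None:
                break
            bounds = sorted(bounds + [pos])
        return bounds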

Speed trial. [Figure: running time (seconds) as a function of signal length, for $K = 1, 10, 10^2, 10^3, 10^4, 10^5$ change-points; a signal of length $p = 10^7$ is segmented in seconds.]

Outline: 1. Motivations; 2. Finding multiple change-points in a single profile; 3. Finding multiple change-points shared by many signals; 4. Learning molecular classifiers with network information; 5. Conclusion.

The problem. [Figure: several noisy copy-number profiles.] Let $Y \in \mathbb{R}^{p \times n}$ be $n$ signals of length $p$. We want to find a piecewise-constant approximation $\hat{U} \in \mathbb{R}^{p \times n}$ with at most $k$ change-points.


"Optimal" segmentation by dynamic programming.5.5.5 2 3 4 5 6 7 8 9.5.5.5 2 3 4 5 6 7 8 9.5.5.5 2 3 4 5 6 7 8 9 Define the "optimal" piecewise constant approximation Û Rp n of Y as the solution of min Y U 2 such U Rp n that p ( ) U i+, U i, k i= DP finds the solution in O(p 2 kn) in time and O(p 2 ) in memory But: does not scale to p = 6 9... J.P Vert (ParisTech) Group lasso in genomics IMA 22 / 5

Selecting pre-defined groups of variables: the group lasso (Yuan & Lin, 2006). If groups of covariates are likely to be selected together, the $\ell_1/\ell_2$-norm induces sparse solutions at the group level:
$$\Omega_{\text{group}}(w) = \sum_g \|w_g\|_2\,, \qquad \Omega(w_1, w_2, w_3) = \|(w_1, w_2)\|_2 + \|w_3\|_2 = \sqrt{w_1^2 + w_2^2} + \sqrt{w_3^2}\,.$$
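A small illustration (not from the slides) of why the $\ell_1/\ell_2$ penalty zeroes out whole groups: its proximal operator is block soft-thresholding, sketched below for non-overlapping groups.

    import numpy as np

    def group_soft_threshold(w, groups, t):
        """Proximal operator of t * sum_g ||w_g||_2 for non-overlapping groups
        (each group is a list of coordinate indices): block soft-thresholding."""
        w = w.astype(float).copy()
        for g in groups:
            idx = np.asarray(g)
            nrm = np.linalg.norm(w[idx])
            w[idx] = 0.0 if nrm <= t else (1.0 - t / nrm) * w[idx]
        return w

    # example with the grouping of the slide: Omega(w) = ||(w1, w2)||_2 + ||w3||_2
    print(group_soft_threshold(np.array([3.0, 4.0, 0.5]), [[0, 1], [2]], 1.0))
    # -> [2.4, 3.2, 0.0]: the small third group is removed entirely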

TV approximator for many signals. Replace
$$\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} \mathbf{1}\left(U_{i+1,\bullet} \neq U_{i,\bullet}\right) \leq k$$
by
$$\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} w_i \,\|U_{i+1,\bullet} - U_{i,\bullet}\| \leq \mu\,.$$
Questions. Practice: can we solve it efficiently? Theory: does it benefit from increasing p (for n fixed)?

TV approximator as a group lasso problem. Make the change of variables $\gamma = U_{1,\bullet}$ and $\beta_{i,\bullet} = w_i \left(U_{i+1,\bullet} - U_{i,\bullet}\right)$ for $i = 1, \dots, p-1$. The TV approximator is then equivalent to the following group lasso problem (Yuan and Lin, 2006):
$$\min_{\beta \in \mathbb{R}^{(p-1) \times n}} \|\bar{Y} - X\beta\|^2 + \lambda \sum_{i=1}^{p-1} \|\beta_{i,\bullet}\|\,,$$
where $\bar{Y}$ is the centered signal matrix and $X$ is a particular design matrix.
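Written out explicitly (a routine reconstruction consistent with the change of variables above; the paper's conventions may differ in signs or scaling):
$$U_{i,\bullet} = \gamma + \sum_{j=1}^{i-1} \frac{\beta_{j,\bullet}}{w_j}\,, \qquad X_{i,j} = \begin{cases} 1/w_j & \text{if } i > j,\\ 0 & \text{otherwise,} \end{cases} \qquad X \in \mathbb{R}^{p \times (p-1)}\,,$$
and minimizing over $\gamma$ amounts to centering each signal, which gives the group lasso problem above.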

TV approximator implementation. Theorem: the problem
$$\min_{\beta \in \mathbb{R}^{(p-1) \times n}} \|\bar{Y} - X\beta\|^2 + \lambda \sum_{i=1}^{p-1} \|\beta_{i,\bullet}\|$$
can be solved efficiently: approximately, with the group LARS, in $O(npk)$ time and $O(np)$ memory; exactly, with a block coordinate descent + active set method, in $O(np)$ memory.

Proof: computational tricks. Although $X$ is a large $p \times (p-1)$ matrix:
- For any $R \in \mathbb{R}^{p \times n}$, we can compute $C = X^\top R$ in $O(np)$ operations and memory.
- For any two subsets of indices $A = (a_1, \dots, a_{|A|})$ and $B = (b_1, \dots, b_{|B|})$ in $[1, p-1]$, we can compute $X_{\bullet,A}^\top X_{\bullet,B}$ in $O(|A|\,|B|)$ time and memory.
- For any $A = (a_1, \dots, a_{|A|})$, a set of distinct indices with $a_1 < \dots < a_{|A|} \leq p-1$, and for any $|A| \times n$ matrix $R$, we can compute $C = (X_{\bullet,A}^\top X_{\bullet,A})^{-1} R$ in $O(|A|\,n)$ time and memory.
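As an illustration of the first trick (a sketch under the assumption that $X_{i,j} = \mathbf{1}(i>j)/w_j$, as written out above; the original implementation and conventions may differ):

    import numpy as np

    def xt_times_r(R, w):
        """Compute C = X^T R in O(np) time and memory, where X is the assumed
        p x (p-1) cumulative-sum design X[i, j] = 1/w[j] for i > j.
        R is p x n, w has length p-1."""
        tail = np.cumsum(R[::-1], axis=0)[::-1]   # tail[j] = sum_{i >= j} R[i, :]
        return tail[1:] / w[:, None]              # row j: (1/w[j]) * sum_{i > j} R[i, :]

    # sanity check against a dense X (small p only):
    # X = np.tril(np.ones((p, p - 1)), -1) / w
    # np.allclose(X.T @ R, xt_times_r(R, w))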

Speed trial. [Figure 2: speed trials for group fused LARS and group fused Lasso; running time as a function of n (left), p (center) and k (right), log-log axes, results averaged over several trials.] The Lasso version scales well when n or p increases (the optimization problem appears to become relatively easier to solve, converging quickly to its final set of change-points). The main difference with LARS appears when the number of change-points increases: with empirical complexities respectively cubic and linear in k, as predicted by the theoretical analysis, the Lasso version is orders of magnitude slower than LARS when many change-points are sought.

Consistency for a single change-point. Suppose a single change-point at position $u = \alpha p$, with increments $(\beta_i)_{i=1,\dots,n}$ such that $\bar{\beta}^2 = \lim_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} \beta_i^2$, corrupted by i.i.d. Gaussian noise of variance $\sigma^2$. [Figure: several noisy signals sharing a single change-point.] Does the TV approximator correctly estimate the first change-point as p increases?

Consistency of the unweighted TV approximator
$$\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} \|U_{i+1,\bullet} - U_{i,\bullet}\| \leq \mu\,.$$
Theorem: the unweighted TV approximator finds the correct change-point with probability tending to 1 (resp. 0) as $n \to +\infty$ if $\sigma^2 < \bar{\sigma}_\alpha^2$ (resp. $\sigma^2 > \bar{\sigma}_\alpha^2$), where $\bar{\sigma}_\alpha^2$ is an explicit threshold depending on $\alpha$, $p$ and $\bar{\beta}^2$. Consequences: correct estimation on $[p\epsilon, p(1-\epsilon)]$ with $\epsilon = \sqrt{\sigma^2 / (2p\bar{\beta}^2)} + o(p^{-1/2})$; wrong estimation near the boundaries.

Consistency of the weighted TV approximator
$$\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} w_i\,\|U_{i+1,\bullet} - U_{i,\bullet}\| \leq \mu\,.$$
Theorem: the weighted TV approximator with weights $w_i = \sqrt{i(p-i)/p}$, for $i \in [1, p-1]$, correctly finds the first change-point with probability tending to 1 as $n \to +\infty$. We see the benefit of increasing n, and the benefit of adding position-dependent weights to the TV penalty.

Proof sketch. The first change-point $\hat{i}$ found by the TV approximator maximizes $F_i = \|\hat{c}_{i,\bullet}\|^2$, where $\hat{c} = X^\top \bar{Y} = X^\top X \beta + X^\top \bar{W}$. $\hat{c}$ is Gaussian, and $F_i$ follows a non-central $\chi^2$ distribution with
$$G_i = \mathbb{E}F_i = \frac{i(p-i)}{p\,w_i^2}\,\sigma^2 + \frac{\bar{\beta}^2}{p^2\,w_i^2 w_u^2} \times \begin{cases} i^2(p-u)^2 & \text{if } i \leq u,\\ u^2(p-i)^2 & \text{otherwise.} \end{cases}$$
We then just check when $G_u = \max_i G_i$.

Consistency for a single change-point. [Figure 3: single change-point accuracy for the group fused Lasso, as a function of the number of profiles p, for change-points placed at several positions u (left and centre: unweighted and weighted group fused Lasso; right: weighted, with the change-point location varying by ±2 around each position).] The estimate remains robust against fluctuations in the exact change-point location.

Estimation of more change-points? [Figure 4: multiple change-point accuracy for the unweighted/weighted LARS and Lasso versions, as a function of the number of profiles p, with change-points placed at nine equally spaced positions and three increasing values of the centered Gaussian noise variance σ².] Note that we do not assume that all profiles share exactly the same change-points; the joint segmentation is merely seen as an adaptive way to reduce the dimension and remove noise from the data.

Outline: 1. Motivations; 2. Finding multiple change-points in a single profile; 3. Finding multiple change-points shared by many signals; 4. Learning molecular classifiers with network information; 5. Conclusion.

Molecular diagnosis / prognosis / theragnosis.

Gene networks, gene groups. [Figure: a gene network annotated with functional modules, e.g. N-glycan biosynthesis; glycolysis / gluconeogenesis; porphyrin and chlorophyll metabolism; riboflavin metabolism; sulfur metabolism; protein kinases; nitrogen and asparagine metabolism; folate biosynthesis; biosynthesis of steroids and ergosterol metabolism; DNA and RNA polymerase subunits; lysine biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; oxidative phosphorylation and TCA cycle; purine metabolism.]

Structured feature selection. Basic biological functions usually involve the coordinated action of several proteins: formation of protein complexes; activation of metabolic, signalling or regulatory pathways. How can we perform structured feature selection, such that the selected genes belong to only a few groups, or form a small number of connected components on the graph? [Figure: the annotated gene network again.]

Selecting pre-defined groups of variables: the group lasso (Yuan & Lin, 2006). If groups of covariates are likely to be selected together, the $\ell_1/\ell_2$-norm induces sparse solutions at the group level:
$$\Omega_{\text{group}}(w) = \sum_g \|w_g\|_2\,, \qquad \Omega(w_1, w_2, w_3) = \|(w_1, w_2)\|_2 + \|w_3\|_2\,.$$

Group lasso with overlapping groups. Idea 1: shrink groups to zero (Jenatton et al., 2009). $\Omega_{\text{group}}(w) = \sum_g \|w_g\|_2$ sets entire groups to 0; one variable is selected if and only if all the groups to which it belongs are selected (here $\|w_{g_2}\|_2 = \|w_{g_3}\|_2 = 0$). Selecting IGF thus forces the selection of unwanted groups, and the removal of any group containing a gene forces the weight of that gene to 0.

Group lasso with overlapping groups. Idea 2: the latent group lasso (Jacob et al., 2009):
$$\Omega^{\mathcal{G}}_{\text{latent}}(w) = \min_{(v_g)_{g \in \mathcal{G}}} \sum_{g \in \mathcal{G}} \|v_g\|_2 \quad \text{such that} \quad w = \sum_{g \in \mathcal{G}} v_g\,, \ \operatorname{supp}(v_g) \subseteq g\,.$$
Properties: the resulting support is a union of groups in $\mathcal{G}$; it is possible to select one variable without selecting all the groups containing it; it is equivalent to the group lasso when there is no overlap.
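One standard way to work with $\Omega^{\mathcal{G}}_{\text{latent}}$ (a known reduction, not necessarily how the authors implemented it) is to duplicate each variable once per group containing it and solve an ordinary group lasso in the expanded space; a minimal proximal-gradient sketch with hypothetical names:

    import numpy as np

    def latent_group_lasso(X, y, groups, lam, n_iter=500):
        """Latent (overlapping) group lasso via variable duplication: each column
        of X is replicated once per group containing it, and an ordinary group
        lasso is solved in the expanded space by proximal gradient descent.
        groups: list of lists of column indices. Returns w = sum_g v_g."""
        cols = [j for g in groups for j in g]
        Xe = X[:, cols]                                   # expanded design
        starts = np.cumsum([0] + [len(g) for g in groups])
        v = np.zeros(Xe.shape[1])
        step = 1.0 / (np.linalg.norm(Xe, 2) ** 2)         # 1 / Lipschitz constant
        for _ in range(n_iter):
            v = v - step * (Xe.T @ (Xe @ v - y))          # gradient step on 1/2 ||y - Xe v||^2
            for s, e in zip(starts[:-1], starts[1:]):     # prox: block soft-thresholding
                nrm = np.linalg.norm(v[s:e])
                v[s:e] = 0.0 if nrm <= step * lam else (1 - step * lam / nrm) * v[s:e]
        w = np.zeros(X.shape[1])
        for (s, e), g in zip(zip(starts[:-1], starts[1:]), groups):
            w[np.asarray(g)] += v[s:e]                    # recover w = sum_g v_g
        return w

Because each copy can be zeroed independently, a variable survives as long as at least one group containing it is kept, which is exactly the union-of-groups support described above.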

Overlap and group unit balls. [Figure: unit balls for $\Omega^{\mathcal{G}}_{\text{group}}$ (middle) and $\Omega^{\mathcal{G}}_{\text{latent}}$ (right) for the groups $\mathcal{G} = \{\{1,2\},\{2,3\}\}$, with $w_2$ as the vertical coordinate; left: group lasso with $\mathcal{G} = \{\{1,2\},\{3\}\}$, for comparison.]

Theoretical results: consistency in group support (Jacob et al., 2009). Let $\bar{w}$ be the true parameter vector. Assume that there exists a unique decomposition $\bar{w} = \sum_g \bar{v}_g$ such that $\Omega^{\mathcal{G}}_{\text{latent}}(\bar{w}) = \sum_g \|\bar{v}_g\|_2$, and consider the regularized empirical risk minimization problem $\min_w L(w) + \lambda \Omega^{\mathcal{G}}_{\text{latent}}(w)$. Then, under appropriate mutual incoherence conditions on $X$, as $n \to \infty$, with very high probability the optimal solution $\hat{w}$ admits a unique decomposition $(\hat{v}_g)_{g \in \mathcal{G}}$ such that $\{g \in \mathcal{G} : \hat{v}_g \neq 0\} = \{g \in \mathcal{G} : \bar{v}_g \neq 0\}$.


Experiments on synthetic data with overlapping groups: 10 groups of 10 variables, with 2 variables of overlap between two successive groups: $\{1,\dots,10\}, \{9,\dots,18\}, \dots, \{73,\dots,82\}$. The support of the true vector is the union of the 4th and 5th groups; we learn from a limited number of training points. [Figure: frequency of selection of each variable with the lasso (left) and with $\Omega^{\mathcal{G}}_{\text{latent}}$ (middle); comparison of the RMSE of both methods (right).]

Graph lasso. Two solutions:
$$\Omega^{\mathcal{G}}_{\text{group}}(\beta) = \sum_{i \sim j} \sqrt{\beta_i^2 + \beta_j^2}\,, \qquad \Omega^{\mathcal{G}}_{\text{latent}}(\beta) = \sup_{\alpha \in \mathbb{R}^p:\ \forall i \sim j,\ \alpha_i^2 + \alpha_j^2 \leq 1} \alpha^\top \beta\,.$$
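For concreteness, the first (group) penalty is straightforward to evaluate on a gene network given as an edge list; a small sketch, with names of our choosing:

    import numpy as np

    def graph_group_penalty(beta, edges):
        """Omega_group for a network: sum over edges (i, j) of sqrt(beta_i^2 + beta_j^2)."""
        return sum(np.hypot(beta[i], beta[j]) for i, j in edges)

    # toy example on the path graph 0 - 1 - 2
    print(graph_group_penalty(np.array([1.0, 0.0, 2.0]), [(0, 1), (1, 2)]))  # 3.0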

Preliminary results on breast cancer data: gene expression data for 295 breast cancer tumours; canonical pathways from MSigDB containing 639 groups of genes, 637 of which involve genes from our study; a graph on the genes.

Method | $\ell_1$ (lasso) | $\Omega^{\mathcal{G}}_{\text{latent}}$
Error | 0.38 ± 0.04 | 0.36 ± 0.03
Mean # pathways | 3 | 3

Method | $\ell_1$ (lasso) | $\Omega_{\text{graph}}$
Error | 0.39 ± 0.04 | 0.36 ± 0.01
Av. size of connected components | 1.3 | 1.3

Lasso signature. [Figure: the gene signature selected by the lasso.]

Graph lasso signature. [Figure: the gene signature selected by the graph lasso.]

Outline: 1. Motivations; 2. Finding multiple change-points in a single profile; 3. Finding multiple change-points shared by many signals; 4. Learning molecular classifiers with network information; 5. Conclusion.

Conclusions. Feature / pattern selection in high dimension is central to many applications. Convex sparsity-inducing penalties are useful, with efficient implementations and consistency results. Joint work with Kevin Bleakley (INRIA), Laurent Jacob (UC Berkeley) and Guillaume Obozinski (INRIA). Post-docs available in Paris!