Group lasso for genomic data


1 Group lasso for genomic data
Jean-Philippe Vert, Mines ParisTech / Curie Institute / Inserm
Machine Learning: Theory and Computation workshop, IMA, Minneapolis, March 26-30, 2012

2 Outline
1 Motivations
2 Finding multiple change-points in a single profile
3 Finding multiple change-points shared by many signals
4 Learning molecular classifiers with network information
5 Conclusion

3 Outline
1 Motivations
2 Finding multiple change-points in a single profile
3 Finding multiple change-points shared by many signals
4 Learning molecular classifiers with network information
5 Conclusion

4 Chromosomal aberrations in cancer

5 Comparative Genomic Hybridization (CGH)
Motivation: comparative genomic hybridization (CGH) data measure the DNA copy number along the genome. This is very useful, in particular in cancer research, to systematically observe variants in DNA content.
[Figure: log-ratio copy-number profile plotted along the chromosome.]

6 Can we identify breakpoints and "smooth" each profile?

7 Can we detect frequent breakpoints? A collection of bladder tumour copy number profiles.

8 DNA → RNA → protein. CGH shows the (static) DNA; cancer cells also have abnormal (dynamic) gene expression (= transcription).

9 Can we identify the cancer subtype? (diagnosis)

10 Can we predict the future evolution? (prognosis)

11 Outline
1 Motivations
2 Finding multiple change-points in a single profile
3 Finding multiple change-points shared by many signals
4 Learning molecular classifiers with network information
5 Conclusion

12 The problem
Let $Y \in \mathbb{R}^p$ be the signal. We want to find a piecewise-constant approximation $\hat U \in \mathbb{R}^p$ with at most $k$ change-points.


14 An optimal solution?
We can define an "optimal" piecewise-constant approximation $\hat U \in \mathbb{R}^p$ as the solution of
$$\min_{U \in \mathbb{R}^p} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} \mathbf{1}\left(U_{i+1} \neq U_i\right) \leq k.$$
This is an optimization problem over the $\binom{p-1}{k}$ possible partitions... Dynamic programming finds the solution in $O(p^2 k)$ in time and $O(p^2)$ in memory. But: it does not scale to very large $p$.
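For concreteness, here is a minimal Python sketch (not the slides' implementation) of this dynamic program, assuming a 1-D numpy signal; cumulative sums make the cost of fitting a constant on any segment $O(1)$, so the overall cost is $O(p^2 k)$ in time (with $O(pk)$ memory for the tables used here).

```python
import numpy as np

def dp_segmentation(y, k):
    """Best piecewise-constant fit of y with k change-points, by dynamic programming."""
    p = len(y)
    s1, s2 = np.cumsum(y), np.cumsum(y ** 2)

    def sse(i, j):  # squared error of the best constant fit on y[i:j]
        tot = s1[j - 1] - (s1[i - 1] if i > 0 else 0.0)
        tot2 = s2[j - 1] - (s2[i - 1] if i > 0 else 0.0)
        return tot2 - tot ** 2 / (j - i)

    # cost[m, j] = best error for y[:j] using m change-points
    cost = np.full((k + 1, p + 1), np.inf)
    back = np.zeros((k + 1, p + 1), dtype=int)
    cost[0, 1:] = [sse(0, j) for j in range(1, p + 1)]
    for m in range(1, k + 1):
        for j in range(m + 1, p + 1):
            cands = [cost[m - 1, t] + sse(t, j) for t in range(m, j)]
            t_best = int(np.argmin(cands))
            cost[m, j], back[m, j] = cands[t_best], m + t_best
    # backtrack the change-point positions for exactly k change-points
    cps, j = [], p
    for m in range(k, 0, -1):
        j = back[m, j]
        cps.append(j)
    return sorted(cps), cost[k, p]
```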


18 Promoting sparsity with the ℓ1 penalty
The ℓ1 penalty (Tibshirani, 1996; Chen et al., 1998). If $R(\beta)$ is convex and "smooth", the solution of
$$\min_{\beta \in \mathbb{R}^p} R(\beta) + \lambda \sum_{i=1}^{p} |\beta_i|$$
is usually sparse.

19 Promoting piecewise constant profiles
The total variation / variable fusion penalty. If $R(\beta)$ is convex and "smooth", the solution of
$$\min_{\beta \in \mathbb{R}^p} R(\beta) + \lambda \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i|$$
is usually piecewise constant (Rudin et al., 1992; Land and Friedman, 1996).
Proof: change of variable $u_i = \beta_{i+1} - \beta_i$, $u_0 = \beta_1$. We obtain a Lasso problem in $u \in \mathbb{R}^p$; $u$ sparse means $\beta$ piecewise constant.
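To make this change of variables concrete, here is a minimal sketch (not from the slides; the data and regularization level are arbitrary) that instantiates $R(\beta) = \|Y - \beta\|^2$ and solves the TV-penalized problem as a plain Lasso, using the step-function design matrix implied by $u_i = \beta_{i+1} - \beta_i$:

```python
import numpy as np
from sklearn.linear_model import Lasso  # any Lasso solver would do

# Toy piecewise-constant signal with noise (hypothetical example data)
rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(40), 2 * np.ones(30), -np.ones(30)]) + 0.3 * rng.standard_normal(100)
p = len(y)

# Change of variables: beta = intercept + X @ u, where column i of X is a step
# starting at position i+1, so u_i = beta_{i+1} - beta_i (the jumps).
X = np.tril(np.ones((p, p - 1)), k=-1)  # X[j, i] = 1 if j > i

lasso = Lasso(alpha=0.05, fit_intercept=True, max_iter=50000)
lasso.fit(X, y)
jumps = np.flatnonzero(lasso.coef_)            # sparse u  <=>  few change-points
beta_hat = lasso.intercept_ + X @ lasso.coef_  # piecewise-constant fit
print("estimated change-point positions:", jumps + 1)
```

Sparsity of $u$ translates directly into a small number of estimated change-points; the regularization level alpha controls how many.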

20 TV signal approximator
$$\min_{\beta \in \mathbb{R}^p} \|Y - \beta\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i| \leq \mu$$
Adding additional constraints does not change the change-points:
$\sum_{i=1}^{p} |\beta_i| \leq \nu$ (Tibshirani et al., 2005; Tibshirani and Wang, 2008)
$\sum_{i=1}^{p} \beta_i^2 \leq \nu$ (Mairal et al., 2010)

21 Solving TV signal approximator
$$\min_{\beta \in \mathbb{R}^p} \|Y - \beta\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i| \leq \mu$$
QP with sparse linear constraints in $O(p^2)$: about 35 min for $p = 10^5$ (Tibshirani and Wang, 2008)
Coordinate descent-like method, $O(p)$?: a few tens of seconds for $p = 10^5$ (Friedman et al., 2007)
For all $\mu$, with the LARS, in $O(pK)$ (Harchaoui and Lévy-Leduc, 2008)
For all $\mu$, in $O(p \ln p)$ (Hoefling, 2009)
For the first $K$ change-points, in $O(p \ln K)$ (Bleakley and V., 2011)

22 TV signal approximator as dichotomic segmentation
We will investigate different functions $I_L(I)$, $I_R(I)$ and $\gamma(I)$. The dichotomic segmentation method, presented in Algorithm 1, proceeds as follows: starting from the full interval $[1,n]$ as a trivial partition of $[1,n]$ into one interval, iteratively refine any partition $P$ of $[1,n]$ into $p$ intervals $P = \{I_1, \ldots, I_p\}$ by splitting the interval $I^* \in P$ with maximal $\gamma(I^*)$ into the two intervals $I_L(I^*)$ and $I_R(I^*)$.
Algorithm 1: Greedy dichotomic segmentation
Require: $k$, the number of intervals; $\gamma(I)$, a gain function to split an interval $I$ into $I_L(I)$, $I_R(I)$
1: $I_0$ represents the interval $[1,n]$
2: $P \leftarrow \{I_0\}$
3: for $i = 1$ to $k-1$ do
4: $\quad I^* \leftarrow \arg\max_{I \in P} \gamma(I)$
5: $\quad P \leftarrow P \setminus \{I^*\}$
6: $\quad P \leftarrow P \cup \{I_L(I^*), I_R(I^*)\}$
7: end for
8: return $P$
Theorem: TV signal approximator performs "greedy" dichotomic segmentation (V. and Bleakley, 2011; see also Hoefling, 2009).
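As an illustration, here is a minimal Python sketch of greedy dichotomic segmentation using the classical squared-error gain; it is not the exact procedure proved equivalent in the theorem, which relies on specific choices of $I_L$, $I_R$ and $\gamma$.

```python
import numpy as np

def best_split(y, lo, hi):
    """Best single split of y[lo:hi]: (gain, split index) maximizing the
    decrease in squared error of a piecewise-constant fit."""
    seg = y[lo:hi]
    n = len(seg)
    if n < 2:
        return 0.0, None
    csum = np.cumsum(seg)
    total = csum[-1]
    left = np.arange(1, n)  # sizes of the left part
    gains = (csum[:-1] - left * total / n) ** 2 * n / (left * (n - left))
    j = int(np.argmax(gains))
    return float(gains[j]), lo + j + 1

def greedy_dichotomic_segmentation(y, k):
    """Greedily split [0, len(y)) until k intervals are produced."""
    intervals = [(0, len(y))]
    for _ in range(k - 1):
        scored = [(best_split(y, lo, hi), (lo, hi)) for lo, hi in intervals]
        (gain, split), (lo, hi) = max(scored, key=lambda s: s[0][0])
        if split is None:
            break
        intervals.remove((lo, hi))
        intervals += [(lo, split), (split, hi)]
    return sorted(intervals)
```

Each split is chosen greedily, so the result need not be the globally optimal $k$-segmentation; the point of the theorem is that the TV signal approximator behaves like such a greedy scheme, which explains its speed.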

23 Speed trial: about 2 s for $K = 100{,}000$ change-points on a signal of length $p = 10^7$.
[Figure: running time in seconds versus signal length, for $K$ ranging from 1 to $10^5$.]

24 Outline
1 Motivations
2 Finding multiple change-points in a single profile
3 Finding multiple change-points shared by many signals
4 Learning molecular classifiers with network information
5 Conclusion

25 The problem
Let $Y \in \mathbb{R}^{p \times n}$ be $n$ signals of length $p$. We want to find a piecewise-constant approximation $\hat U \in \mathbb{R}^{p \times n}$ with at most $k$ change-points.


27 "Optimal" segmentation by dynamic programming Define the "optimal" piecewise constant approximation Û Rp n of Y as the solution of min Y U 2 such U Rp n that p ( ) U i+, U i, k i= DP finds the solution in O(p 2 kn) in time and O(p 2 ) in memory But: does not scale to p = J.P Vert (ParisTech) Group lasso in genomics IMA 22 / 5

28 Selecting pre-defined groups of variables
Group lasso (Yuan & Lin, 2006). If groups of covariates are likely to be selected together, the $\ell_1/\ell_2$-norm induces sparse solutions at the group level:
$$\Omega_{group}(w) = \sum_g \|w_g\|_2$$
For example, $\Omega(w_1, w_2, w_3) = \|(w_1, w_2)\|_2 + \|w_3\|_2 = \sqrt{w_1^2 + w_2^2} + |w_3|$.
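For intuition about why the $\ell_1/\ell_2$ norm zeroes whole groups, here is a minimal sketch (not from the slides) of its proximal operator, block soft-thresholding, which is the basic step of most group-lasso solvers:

```python
import numpy as np

def group_soft_threshold(w, groups, t):
    """Prox of t * sum_g ||w_g||_2: shrinks each group's norm by t,
    zeroing the whole group when its norm is below t."""
    out = np.zeros_like(w, dtype=float)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > t:
            out[g] = (1 - t / norm) * w[g]
    return out

w = np.array([0.3, -0.2, 1.5, 2.0, 0.1])
print(group_soft_threshold(w, [[0, 1], [2, 3], [4]], t=0.5))
# groups {0,1} and {4} are zeroed entirely; group {2,3} is only shrunk
```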

29 TV approximator for many signals
Replace
$$\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} \mathbf{1}\left(U_{i+1,\bullet} \neq U_{i,\bullet}\right) \leq k$$
by
$$\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} w_i \left\|U_{i+1,\bullet} - U_{i,\bullet}\right\| \leq \mu.$$
Questions. Practice: can we solve it efficiently? Theory: does it benefit from increasing p (for n fixed)?

30 TV approximator as a group Lasso problem
Make the change of variables: $\gamma = U_{1,\bullet}$, $\beta_{i,\bullet} = w_i \left(U_{i+1,\bullet} - U_{i,\bullet}\right)$ for $i = 1, \ldots, p-1$.
The TV approximator is then equivalent to the following group Lasso problem (Yuan and Lin, 2006):
$$\min_{\beta \in \mathbb{R}^{(p-1) \times n}} \|\bar{Y} - \bar{X}\beta\|^2 + \lambda \sum_{i=1}^{p-1} \|\beta_{i,\bullet}\|,$$
where $\bar{Y}$ is the centered signal matrix and $\bar{X}$ is a particular $p \times (p-1)$ design matrix.
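Since the groups here are the rows $\beta_{i,\bullet}$, any multi-task ($\ell_1/\ell_2$) solver can be used to sketch this reformulation. Below is a minimal example, not the slides' implementation (which uses the weights $w_i$, centered matrices, and the dedicated algorithms of the next slide), relying on scikit-learn's MultiTaskLasso, whose penalty is a sum of row norms of the coefficient matrix:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso  # l1/l2 penalty over rows of beta

# Toy data: n noisy signals of length p sharing two change-points (hypothetical example)
rng = np.random.default_rng(0)
p, n = 120, 20
U = np.zeros((p, n))
U[40:, :] += rng.standard_normal(n)   # shared jump at 40, profile-specific heights
U[80:, :] += rng.standard_normal(n)   # shared jump at 80
Y = U + 0.5 * rng.standard_normal((p, n))

# Step design: column i is a step starting at position i+1 (weights omitted here;
# the slides additionally scale the columns with the weights w_i)
X = np.tril(np.ones((p, p - 1)), k=-1)

model = MultiTaskLasso(alpha=0.1, fit_intercept=True, max_iter=10000)
model.fit(X, Y)                        # one column (task) per profile
B = model.coef_.T                      # shape (p-1, n): row i = jumps of all profiles at i
shared_cps = np.flatnonzero(np.linalg.norm(B, axis=1)) + 1
print("shared change-points:", shared_cps)
```

Rows of $\beta$ with nonzero norm correspond to change-points shared by all profiles.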

31 TV approximator implementation
$$\min_{\beta \in \mathbb{R}^{(p-1) \times n}} \|\bar{Y} - \bar{X}\beta\|^2 + \lambda \sum_{i=1}^{p-1} \|\beta_{i,\bullet}\|$$
Theorem: the TV approximator can be solved efficiently:
approximately, with the group LARS, in $O(npk)$ in time and $O(np)$ in memory;
exactly, with a block coordinate descent + active set method, in $O(np)$ in memory.

32 Proof: computational tricks...
Although $\bar{X}$ is $p \times (p-1)$:
For any $R \in \mathbb{R}^{p \times n}$, we can compute $C = \bar{X}^\top R$ in $O(np)$ operations and memory.
For any two subsets of indices $A = (a_1, \ldots, a_{|A|})$ and $B = (b_1, \ldots, b_{|B|})$ in $[1, p-1]$, we can compute $\bar{X}_{\bullet,A}^\top \bar{X}_{\bullet,B}$ in $O(|A| \cdot |B|)$ in time and memory.
For any $A = (a_1, \ldots, a_{|A|})$, a set of distinct indices with $a_1 < \ldots < a_{|A|} \leq p-1$, and for any $|A| \times n$ matrix $R$, we can compute $C = \left(\bar{X}_{\bullet,A}^\top \bar{X}_{\bullet,A}\right)^{-1} R$ in $O(|A| n)$ in time and memory.
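A minimal sketch of the first trick, under the assumption that (as in the change of variables above) each column of the design is a step function of constant height starting at a different position; the slides' $\bar{X}$ is the centered version, which only adds a rank-one correction that is also $O(np)$. The key point is that $X^\top R$ reduces to reversed cumulative sums instead of an explicit matrix product:

```python
import numpy as np

def xt_r(R, h):
    """C = X^T R in O(np), where column i of X is a step of height h[i]
    starting at row i+1 (X[j, i] = h[i] for j > i, 0 otherwise)."""
    tail_sums = np.cumsum(R[::-1], axis=0)[::-1][1:]  # row i: sum of R[i+1:], shape (p-1, n)
    return tail_sums * h[:, None]

# sanity check against the explicit product on a small example
p, n = 6, 3
rng = np.random.default_rng(1)
R = rng.standard_normal((p, n))
h = rng.random(p - 1) + 0.5
X = np.zeros((p, p - 1))
for i in range(p - 1):
    X[i + 1:, i] = h[i]
assert np.allclose(xt_r(R, h), X.T @ R)
```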

33 Speed trial
Figure 2: speed trials for group fused LARS (top row) and group fused Lasso (bottom row). Left column: varying $n$, with $p$ and $k$ fixed; center column: varying $p$, with $n$ and $k$ fixed; right column: varying $k$, with $n$ and $p$ fixed. Figure axes are log-log; results are averaged over multiple trials.
The Lasso variant seems to become relatively easier to solve when $n$ or $p$ increases, as it converges quickly to its final set of change-points. The main difference between the Lasso and LARS performance appears when the number of change-points increases: with respective empirical complexities cubic and linear in $k$, as predicted by the theoretical analysis, the Lasso becomes far slower than LARS when many change-points are sought.
Next, the accuracy of the group fused Lasso for detecting a single change-point was tested empirically, on simulated multidimensional profiles with a single shared jump at a position $u$.

34 Consistency for a single change-point
Suppose a single change-point: at position $u = \alpha p$; with increments $(\beta_i)_{i=1,\ldots,n}$ such that $\bar{\beta}^2 = \lim_{n \to +\infty} \frac{1}{n}\sum_{i=1}^{n} \beta_i^2$; corrupted by i.i.d. Gaussian noise of variance $\sigma^2$.
Does the TV approximator correctly estimate the first change-point as p increases?

35 Consistency of the unweighted TV approximator
Theorem. Consider
$$\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} \left\|U_{i+1,\bullet} - U_{i,\bullet}\right\| \leq \mu.$$
The unweighted TV approximator finds the correct change-point with probability tending to 1 (resp. 0) as $n \to +\infty$ if $\sigma^2 < \bar{\sigma}_\alpha^2$ (resp. $\sigma^2 > \bar{\sigma}_\alpha^2$), where $\bar{\sigma}_\alpha^2$ is an explicit threshold depending on $\alpha$, $p$ and $\bar{\beta}^2$.
Consequences: correct estimation on $[p\epsilon, \, p(1-\epsilon)]$ with $\epsilon = \sqrt{\sigma^2 / (2p\bar{\beta}^2)} + o(p^{-1/2})$; wrong estimation near the boundaries.

36 Consistency of the weighted TV approximator
Theorem. Consider
$$\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2 \quad \text{such that} \quad \sum_{i=1}^{p-1} w_i \left\|U_{i+1,\bullet} - U_{i,\bullet}\right\| \leq \mu.$$
The weighted TV approximator with weights
$$w_i = \sqrt{\frac{i(p-i)}{p}}, \quad i \in [1, p-1],$$
correctly finds the first change-point with probability tending to 1 as $n \to +\infty$.
We see the benefit of increasing n; we see the benefit of adding weights to the TV penalty.

37 Proof sketch
The first change-point $\hat{\imath}$ found by the TV approximator maximizes $F_i = \|\hat{c}_{i,\bullet}\|^2$, where $\hat{c} = \bar{X}^\top \bar{Y} = \bar{X}^\top \bar{X}\beta + \bar{X}^\top \bar{W}$.
$\hat{c}$ is Gaussian, and $F_i$ follows a non-central $\chi^2$ distribution whose mean $G_i = \mathbb{E} F_i$ has an explicit closed form involving $i$, $u$, $p$, $\sigma^2$, $\bar{\beta}^2$ and the weights, with different expressions for $i \leq u$ and $i > u$.
We then just check when $G_u = \max_i G_i$.
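A small simulation (not from the slides) illustrating these quantities: the first change-point selected is the maximizer of $F_i = \|\hat{c}_{i,\bullet}\|^2$. Assuming the design implied by the change of variables above (column $i$ is a step of height $1/w_i$, one consistent reading of slide 30), the weights of the previous slide exactly normalize the centered columns, which removes the bias of the noise term toward the middle of the signal:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, u, sigma = 200, 2000, 20, 3.0   # change-point near the boundary, strong noise
beta = rng.standard_normal(n)         # per-profile jump heights
Y = np.zeros((p, n))
Y[u:, :] = beta                       # single change-point shared by the n profiles
Y += sigma * rng.standard_normal((p, n))

def first_changepoint(Y, w):
    p = Y.shape[0]
    X = np.tril(np.ones((p, p - 1)), k=-1) / w  # column i: step of height 1/w_i
    Xbar = X - X.mean(axis=0)                   # centered design
    Ybar = Y - Y.mean(axis=0)                   # centered signals
    C = Xbar.T @ Ybar                           # c-hat from the proof sketch
    F = (C ** 2).sum(axis=1)                    # F_i = ||c-hat_{i,.}||^2
    return int(np.argmax(F)) + 1                # estimated change-point position

i = np.arange(1, p)
print("unweighted:", first_changepoint(Y, np.ones(p - 1)))            # typically pulled toward the middle
print("weighted:  ", first_changepoint(Y, np.sqrt(i * (p - i) / p)))  # typically at or very near u = 20
```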

38 Consistency for a single change-point
Figure 3: single change-point accuracy for the group fused Lasso, as a function of the number of profiles, when the change-point is placed at a variety of positions from u = 50 to u = 90 (left and centre plots: unweighted and weighted group fused Lasso, respectively), or at u = 50 ± 2 to u = 90 ± 2 (right plot: weighted, with varying change-point location), for a signal of length 100. The estimator remains robust against fluctuations in the exact change-point location.

39 Estimation of more change-points?
Figure 4: multiple change-point accuracy (unweighted and weighted LARS, unweighted and weighted Lasso), as a function of the number of profiles, when change-points are placed at the nine positions {10, 20, ..., 90} in a profile of length 100 and the variance $\sigma^2$ of the centered Gaussian noise takes a different value in each panel.
The loss on each interval of the segmentation is computed by summarizing each profile by its average value on the interval. Note that we do not assume that all profiles share exactly the same change-points, but merely see the joint segmentation as an adaptive way to reduce the dimension and remove noise from the data.

40 Outline
1 Motivations
2 Finding multiple change-points in a single profile
3 Finding multiple change-points shared by many signals
4 Learning molecular classifiers with network information
5 Conclusion

41 Molecular diagnosis / prognosis / theragnosis

42 Gene networks, gene groups
[Figure: a gene/metabolic network whose labelled modules include N-glycan biosynthesis; glycolysis / gluconeogenesis; porphyrin and chlorophyll metabolism; riboflavin metabolism; sulfur metabolism; protein kinases; nitrogen and asparagine metabolism; folate biosynthesis; biosynthesis of steroids and ergosterol metabolism; DNA and RNA polymerase subunits; lysine biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; oxidative phosphorylation and the TCA cycle; purine metabolism.]

43 Structured feature selection
Basic biological functions usually involve the coordinated action of several proteins: formation of protein complexes; activation of metabolic, signalling or regulatory pathways.
How to perform structured feature selection, such that the selected genes belong to only a few groups, or form a small number of connected components on the graph?
[Figure: the same pathway network as on the previous slide.]

44 Selecting pre-defined groups of variables
Group lasso (Yuan & Lin, 2006). If groups of covariates are likely to be selected together, the $\ell_1/\ell_2$-norm induces sparse solutions at the group level:
$$\Omega_{group}(w) = \sum_g \|w_g\|_2, \qquad \text{e.g. } \Omega(w_1, w_2, w_3) = \|(w_1, w_2)\|_2 + \|w_3\|_2.$$

45 Group lasso with overlapping groups
Idea 1: shrink groups to zero (Jenatton et al., 2009). $\Omega_{group}(w) = \sum_g \|w_g\|_2$ sets groups to 0. One variable is selected $\Leftrightarrow$ all the groups to which it belongs are selected. In the example, $\|w_{g_2}\| = \|w_{g_3}\| = 0$: selecting the IGF genes forces the selection of unwanted groups. Removal of any group containing a gene $\Rightarrow$ the weight of the gene is 0.

46 Group lasso with overlapping groups
Idea 2: latent group Lasso (Jacob et al., 2009).
$$\Omega^{\mathcal{G}}_{latent}(w) = \min_{v} \sum_{g \in \mathcal{G}} \|v_g\|_2 \quad \text{such that} \quad w = \sum_{g \in \mathcal{G}} v_g, \quad \mathrm{supp}(v_g) \subseteq g.$$
Properties: the resulting support is a union of groups in $\mathcal{G}$; it is possible to select one variable without selecting all the groups containing it; equivalent to the group lasso when there is no overlap.
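A small worked example (not on the slide, using the same groups as the unit balls of the next slide): take $\mathcal{G} = \{\{1,2\},\{2,3\}\}$ and $w = (1,1,0)$. Then $\Omega^{\mathcal{G}}_{group}(w) = \|(w_1,w_2)\|_2 + \|(w_2,w_3)\|_2 = \sqrt{2} + 1$, and the only way to make it small is to shrink $w_2$, which belongs to both groups. In contrast, $\Omega^{\mathcal{G}}_{latent}(w) = \sqrt{2}$, attained by the decomposition $v_{\{1,2\}} = (1,1,0)$, $v_{\{2,3\}} = 0$: the support $\{1,2\}$ is a union of groups, and variable 2 is selected without selecting group $\{2,3\}$.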

47 Overlap and group unit balls
Balls for $\Omega^{\mathcal{G}}_{group}(\cdot)$ (middle) and $\Omega^{\mathcal{G}}_{latent}(\cdot)$ (right) for the groups $\mathcal{G} = \{\{1,2\},\{2,3\}\}$, where $w_2$ is represented as the vertical coordinate. Left: group lasso ($\mathcal{G} = \{\{1,2\},\{3\}\}$), for comparison.

48 Theoretical results
Consistency in group support (Jacob et al., 2009). Let $\bar{w}$ be the true parameter vector. Assume that there exists a unique decomposition $(\bar{v}_g)$ such that $\bar{w} = \sum_g \bar{v}_g$ and $\Omega^{\mathcal{G}}_{latent}(\bar{w}) = \sum_g \|\bar{v}_g\|_2$. Consider the regularized empirical risk minimization problem $\min_w L(w) + \lambda \Omega^{\mathcal{G}}_{latent}(w)$. Then, under appropriate mutual incoherence conditions on $X$, as $n \to \infty$, with very high probability, the optimal solution $\hat{w}$ admits a unique decomposition $(\hat{v}_g)_{g \in \mathcal{G}}$ such that $\{g \in \mathcal{G} : \hat{v}_g \neq 0\} = \{g \in \mathcal{G} : \bar{v}_g \neq 0\}$.


50 Experiments
Synthetic data: overlapping groups. 10 groups of 10 variables, with an overlap of 2 variables between two successive groups: $\{1,\ldots,10\}, \{9,\ldots,18\}, \ldots, \{73,\ldots,82\}$. Support: union of the 4th and 5th groups. Learn from a limited number of training points.
[Figure: frequency of selection of each variable with the lasso (left) and with $\Omega^{\mathcal{G}}_{latent}(\cdot)$ (middle); comparison of the RMSE of both methods (right).]

51 Graph lasso
Two solutions:
$$\Omega^{\mathcal{G}}_{group}(\beta) = \sum_{i \sim j} \sqrt{\beta_i^2 + \beta_j^2}, \qquad \Omega^{\mathcal{G}}_{latent}(\beta) = \sup_{\alpha \in \mathbb{R}^p : \ \forall i \sim j, \ \alpha_i^2 + \alpha_j^2 \leq 1} \alpha^\top \beta.$$
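As a tiny illustration (with hypothetical values), the first of these two penalties is straightforward to evaluate from the edge list of the gene network:

```python
import numpy as np

def graph_group_penalty(beta, edges):
    """Omega_group for a gene network: sum over edges (i, j) of ||(beta_i, beta_j)||_2."""
    return sum(np.hypot(beta[i], beta[j]) for i, j in edges)

print(graph_group_penalty(np.array([1.0, 0.0, 2.0]), edges=[(0, 1), (1, 2)]))  # 1 + 2 = 3
```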

52 Preliminary results
Breast cancer data: gene expression data for 8,141 genes in 295 breast cancer tumors. Canonical pathways from MSigDB containing 639 groups of genes, 637 of which involve genes from our study. Graph on the genes.
Error: 0.38 ± 0.04 for ℓ1 vs. 0.36 ± 0.03 for $\Omega^{\mathcal{G}}_{latent}(\cdot)$; error: 0.39 ± 0.04 for ℓ1 vs. 0.36 for $\Omega_{graph}(\cdot)$. Also reported: the mean number of pathways covered by the selected genes, and the average size of the connected components formed by the selected genes on the graph.

53 Lasso signature

54 Graph Lasso signature

55 Outline
1 Motivations
2 Finding multiple change-points in a single profile
3 Finding multiple change-points shared by many signals
4 Learning molecular classifiers with network information
5 Conclusion

56 Conclusions
Feature / pattern selection in high dimension is central for many applications. Convex sparsity-inducing penalties are useful: efficient implementations + consistency results.
Acknowledgements: Kevin Bleakley (INRIA), Laurent Jacob (UC Berkeley), Guillaume Obozinski (INRIA). Post-docs available in Paris!
