Group lasso for genomic data
1 Group lasso for genomic data Jean-Philippe Vert Mines ParisTech / Curie Institute / Inserm Machine Learning: Theory and Computation workshop, IMA, Minneapolis, March 26-30, 2012 J.P. Vert (ParisTech), Group lasso in genomics, IMA
2 Outline 1 Motivations 2 Finding multiple change-points in a single profile 3 Finding multiple change-points shared by many signals 4 Learning molecular classifiers with network information 5 Conclusion
4 Chromosomal aberrations in cancer
5 Comparative Genomic Hybridization (CGH) Motivation: CGH data measure the DNA copy number along the genome. Very useful, in particular in cancer research, to systematically observe variants in DNA content. [Figure: log-ratio copy-number profile along a chromosome]
6 Can we identify breakpoints and "smooth" each profile?
7 Can we detect frequent breakpoints? A collection of bladder tumour copy number profiles.
8 DNA → RNA → protein. CGH shows the (static) DNA. Cancer cells also have abnormal (dynamic) gene expression (= transcription).
9 Can we identify the cancer subtype? (diagnosis)
10 Can we predict the future evolution? (prognosis)
11 Outline 1 Motivations 2 Finding multiple change-points in a single profile 3 Finding multiple change-points shared by many signals 4 Learning molecular classifiers with network information 5 Conclusion
12 The problem Let $Y \in \mathbb{R}^p$ be the signal. We want to find a piecewise constant approximation $\hat{U} \in \mathbb{R}^p$ with at most $k$ change-points.
14 An optimal solution? We can define an "optimal" piecewise constant approximation $\hat{U} \in \mathbb{R}^p$ as the solution of $\min_{U \in \mathbb{R}^p} \|Y - U\|^2$ such that $\sum_{i=1}^{p-1} 1(U_{i+1} \neq U_i) \le k$. This is an optimization problem over the $\binom{p-1}{k}$ partitions... Dynamic programming finds the solution in $O(p^2 k)$ in time and $O(p^2)$ in memory. But: does not scale to very large $p$.
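As an illustration of the dynamic program mentioned above, here is a minimal sketch in Python/numpy (all function and variable names are mine, not from the talk): the squared-error cost of any segment is obtained in O(1) from cumulative sums, and the recursion over segment boundaries runs in the O(p²k) time stated on the slide.

```python
import numpy as np

def dp_segmentation(Y, k):
    """Best piecewise-constant fit of Y with at most k change-points,
    by dynamic programming over segment boundaries (O(p^2 k) time)."""
    p = len(Y)
    # Cumulative sums of Y and Y^2: the squared error of fitting Y[i:j]
    # by its mean is sum(y^2) - (sum y)^2 / (j - i), in O(1) per query.
    s1 = np.concatenate([[0.0], np.cumsum(np.asarray(Y, float))])
    s2 = np.concatenate([[0.0], np.cumsum(np.asarray(Y, float) ** 2)])
    def cost(i, j):  # squared error of segment Y[i:j], j > i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)
    INF = float("inf")
    # best[m][j] = minimal error for Y[:j] split into m segments
    best = [[INF] * (p + 1) for _ in range(k + 2)]
    arg = [[0] * (p + 1) for _ in range(k + 2)]
    for j in range(1, p + 1):
        best[1][j] = cost(0, j)
    for m in range(2, k + 2):          # m segments = m - 1 change-points
        for j in range(m, p + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j], arg[m][j] = c, i
    # backtrack the k change-point positions
    cps, j = [], p
    for m in range(k + 1, 1, -1):
        j = arg[m][j]
        cps.append(j)
    return sorted(cps), best[k + 1][p]

cps, err = dp_segmentation([0, 0, 0, 5, 5, 5, 1, 1], 2)
```

On this toy signal the recovered change-points are at positions 3 and 6, with zero residual error.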
18 Promoting sparsity with the $\ell_1$ penalty The $\ell_1$ penalty (Tibshirani, 1996; Chen et al., 1998): if $R(\beta)$ is convex and "smooth", the solution of $\min_{\beta \in \mathbb{R}^p} R(\beta) + \lambda \sum_{i=1}^p |\beta_i|$ is usually sparse.
19 Promoting piecewise constant profiles The total variation / variable fusion penalty: if $R(\beta)$ is convex and "smooth", the solution of $\min_{\beta \in \mathbb{R}^p} R(\beta) + \lambda \sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i|$ is usually piecewise constant (Rudin et al., 1992; Land and Friedman, 1996). Proof: the change of variable $u_1 = \beta_1$, $u_i = \beta_i - \beta_{i-1}$ for $i \ge 2$ gives a Lasso problem in $u \in \mathbb{R}^p$; $u$ sparse means $\beta$ piecewise constant.
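The change of variable in this proof is easy to check numerically; a small numpy sketch (variable names are mine):

```python
import numpy as np

# Change of variables: u_1 = beta_1, u_i = beta_i - beta_{i-1} for i >= 2.
# Then beta is the cumulative sum of u, and the total-variation penalty
# sum_i |beta_{i+1} - beta_i| equals the l1 norm of (u_2, ..., u_p):
# a sparse u (few nonzero increments) is a piecewise-constant beta.
beta = np.array([2.0, 2.0, 2.0, 5.0, 5.0, 1.0])
u = np.diff(beta, prepend=0.0)        # u[0] = beta[0], then the increments
assert np.allclose(np.cumsum(u), beta)
tv = np.abs(np.diff(beta)).sum()      # total variation of beta
l1 = np.abs(u[1:]).sum()              # l1 norm of the increments
```

Here both quantities equal 7, since the only nonzero increments are +3 and -4.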
20 TV signal approximator $\min_{\beta \in \mathbb{R}^p} \|Y - \beta\|^2$ such that $\sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i| \le \mu$. Adding further constraints does not change the change-points: $\sum_{i=1}^p |\beta_i| \le \nu$ (Tibshirani et al., 2005; Tibshirani and Wang, 2008), $\sum_{i=1}^p \beta_i^2 \le \nu$ (Mairal et al., 2010).
21 Solving TV signal approximator $\min_{\beta \in \mathbb{R}^p} \|Y - \beta\|^2$ such that $\sum_{i=1}^{p-1} |\beta_{i+1} - \beta_i| \le \mu$. QP with sparse linear constraints in $O(p^2)$: about 35 min for $p = 10^5$ (Tibshirani and Wang, 2008). Coordinate descent-like method, $O(p)$?: a few seconds for $p = 10^5$ (Friedman et al., 2007). For all $\mu$ with the LARS in $O(pK)$ (Harchaoui and Lévy-Leduc, 2008). For all $\mu$ in $O(p \ln p)$ (Hoefling, 2009). For the first $K$ change-points in $O(p \ln K)$ (Bleakley and V., 2011).
22 TV signal approximator as dichotomic segmentation
Algorithm (greedy dichotomic segmentation)
Require: $k$, number of intervals; $\Delta(I)$, gain function to split an interval $I$ into $I_L(I)$, $I_R(I)$
1: $I \leftarrow [1, n]$
2: $P \leftarrow \{I\}$
3: for $i = 1$ to $k$ do
4: $I^\star \leftarrow \arg\max_{I \in P} \Delta(I)$
5: $P \leftarrow P \setminus \{I^\star\}$
6: $P \leftarrow P \cup \{I_L(I^\star), I_R(I^\star)\}$
7: end for
8: return $P$
We investigate different functions $I_L(I)$, $I_R(I)$ and $\Delta(I)$: starting from the full interval $[1, n]$ as a trivial partition of $[1, n]$, the method iteratively refines a partition $P = \{I_1, \ldots, I_p\}$ of $[1, n]$ by splitting the interval $I^\star \in P$ with maximal $\Delta(I^\star)$ into the two intervals $I_L(I^\star)$ and $I_R(I^\star)$.
Theorem: TV signal approximator performs "greedy" dichotomic segmentation (V. and Bleakley, 2011; see also Hoefling, 2009).
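A runnable sketch of the greedy dichotomic segmentation above, taking as gain function $\Delta(I)$ the classical reduction in squared error of the best single split (one concrete choice; the slide considers several gain functions, and all names here are mine):

```python
import numpy as np

def sse(y):
    """Squared error of approximating segment y by its mean."""
    return float(((y - y.mean()) ** 2).sum()) if len(y) else 0.0

def best_split(Y, i, j):
    """Best single split point of interval [i, j) and its gain in error."""
    candidates = [(sse(Y[i:j]) - sse(Y[i:m]) - sse(Y[m:j]), m)
                  for m in range(i + 1, j)]
    return max(candidates) if candidates else (0.0, None)

def greedy_segmentation(Y, k):
    """k times, split the interval whose best split most reduces the
    squared error (greedy dichotomic segmentation)."""
    Y = np.asarray(Y, dtype=float)
    P = [(0, len(Y))]                 # current partition into intervals [i, j)
    for _ in range(k):
        scored = [(best_split(Y, i, j), (i, j)) for (i, j) in P]
        (gain, m), (i, j) = max(scored, key=lambda t: t[0][0])
        if m is None or gain <= 0:
            break                     # no useful split left
        P.remove((i, j))
        P += [(i, m), (m, j)]
    return sorted(P)

segments = greedy_segmentation([0, 0, 0, 5, 5, 5, 1, 1], k=2)
```

On the toy signal, two greedy splits recover the three constant segments.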
23 Speed trial: about 2 s for $K = 100$, $p = 10^7$. [Figure: running time in seconds vs. signal length, for $K = 1, 10, 10^2, 10^3, 10^4, 10^5$]
24 Outline 1 Motivations 2 Finding multiple change-points in a single profile 3 Finding multiple change-points shared by many signals 4 Learning molecular classifiers with network information 5 Conclusion
25 The problem Let $Y \in \mathbb{R}^{p \times n}$ be $n$ signals of length $p$. We want to find a piecewise constant approximation $\hat{U} \in \mathbb{R}^{p \times n}$ with at most $k$ change-points.
27 "Optimal" segmentation by dynamic programming Define the "optimal" piecewise constant approximation $\hat{U} \in \mathbb{R}^{p \times n}$ of $Y$ as the solution of $\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2$ such that $\sum_{i=1}^{p-1} 1(U_{i+1,\bullet} \neq U_{i,\bullet}) \le k$. DP finds the solution in $O(p^2 k n)$ in time and $O(p^2)$ in memory. But: does not scale to very large $p$.
28 Selecting pre-defined groups of variables Group lasso (Yuan & Lin, 2006): if groups of covariates are likely to be selected together, the $\ell_1/\ell_2$-norm induces sparse solutions at the group level: $\Omega_{\text{group}}(w) = \sum_g \|w_g\|_2$. Example: $\Omega(w_1, w_2, w_3) = \|(w_1, w_2)\|_2 + \|w_3\|_2 = \sqrt{w_1^2 + w_2^2} + \sqrt{w_3^2}$.
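The group-lasso penalty above is straightforward to compute directly; a minimal numpy sketch (function and variable names are mine):

```python
import numpy as np

def omega_group(w, groups):
    """Group-lasso penalty: sum of the l2 norms of the coefficient blocks."""
    return sum(float(np.linalg.norm(w[list(g)])) for g in groups)

w = np.array([3.0, 4.0, 2.0])
# Groups {w1, w2} and {w3} (0-based indices below):
# Omega = ||(3, 4)||_2 + ||(2)||_2 = 5 + 2 = 7
val = omega_group(w, [(0, 1), (2,)])
```

Minimizing a loss plus this penalty drives entire blocks $w_g$ to zero at once, which is the group-level sparsity described on the slide.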
29 TV approximator for many signals Replace $\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2$ such that $\sum_{i=1}^{p-1} 1(U_{i+1,\bullet} \neq U_{i,\bullet}) \le k$ by $\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2$ such that $\sum_{i=1}^{p-1} w_i \|U_{i+1,\bullet} - U_{i,\bullet}\| \le \mu$. Questions. Practice: can we solve it efficiently? Theory: does it benefit from increasing $n$ (for $p$ fixed)?
30 TV approximator as a group Lasso problem Make the change of variables $\gamma = U_{1,\bullet}$, $\beta_{i,\bullet} = w_i \left( U_{i+1,\bullet} - U_{i,\bullet} \right)$ for $i = 1, \ldots, p-1$. The TV approximator is then equivalent to the following group Lasso problem (Yuan and Lin, 2006): $\min_{\beta \in \mathbb{R}^{(p-1) \times n}} \|\bar{Y} - X\beta\|^2 + \lambda \sum_{i=1}^{p-1} \|\beta_{i,\bullet}\|$, where $\bar{Y}$ is the centered signal matrix and $X$ is a particular $(p-1) \times (p-1)$ design matrix.
31 TV approximator implementation Theorem: $\min_{\beta \in \mathbb{R}^{(p-1) \times n}} \|\bar{Y} - X\beta\|^2 + \lambda \sum_{i=1}^{p-1} \|\beta_{i,\bullet}\|$ can be solved efficiently: approximately, with the group LARS, in $O(npk)$ in time and $O(np)$ in memory; exactly, with a block coordinate descent + active set method, in $O(np)$ in memory.
32 Proof: computational tricks... Although $X$ is $(p-1) \times (p-1)$: for any $R \in \mathbb{R}^{(p-1) \times n}$, we can compute $C = X^\top R$ in $O(np)$ operations and memory; for any two subsets of indices $A = (a_1, \ldots, a_{|A|})$ and $B = (b_1, \ldots, b_{|B|})$ in $[1, p-1]$, we can compute $X_{\bullet,A}^\top X_{\bullet,B}$ in $O(|A| \cdot |B|)$ in time and memory; for any $A = (a_1, \ldots, a_{|A|})$, set of distinct indices with $a_1 < \ldots < a_{|A|} \le p-1$, and for any $|A| \times n$ matrix $R$, we can compute $C = \left( X_{\bullet,A}^\top X_{\bullet,A} \right)^{-1} R$ in $O(|A| n)$ in time and memory.
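A sketch of the first trick, in the simplified unweighted case where $X_{i,j} = 1$ for $i > j$ (the actual design matrix in the paper is a weighted variant, so this is only illustrative): each row of $X^\top R$ is a tail sum of the rows of $R$, so the whole product reduces to one reverse cumulative sum in $O(np)$, without ever forming $X$.

```python
import numpy as np

# Row j of X^T R is the tail sum over i > j of R[i, :] when X[i, j] = 1_{i > j},
# so all rows follow from one reverse cumulative sum, in O(np) instead of the
# O(p^2 n) dense matrix product, and X never needs to be materialized.
rng = np.random.default_rng(0)
p, n = 7, 3
R = rng.normal(size=(p - 1, n))
X = (np.arange(p - 1)[:, None] > np.arange(p - 1)[None, :]).astype(float)
naive = X.T @ R                                # O(p^2 n) reference
fast = np.cumsum(R[::-1], axis=0)[::-1] - R    # strict tail sums, O(pn)
assert np.allclose(naive, fast)
```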
33 Speed trial [Figure 2: speed trials for group fused LARS (top row) and group fused Lasso (bottom row). Left column: varying $n$, with $p$ and $k$ fixed; center column: varying $p$, with $n$ and $k$ fixed; right column: varying $k$, with $n$ and $p$ fixed. Figure axes are log-log; results are averaged over trials.] We suggest that the good scaling of the Lasso version with respect to the LARS version may be due to the Lasso optimization problem becoming relatively easier to solve when $n$ or $p$ increases, as we observed that the Lasso algorithm converged quickly to its final set of change-points. The main difference between the Lasso and LARS performance appears when the number of change-points increases: with respective empirical complexities cubic and linear in $k$, as predicted by the theoretical analysis, the Lasso is already about 1,000 times slower than LARS when we seek many change-points.
34 Consistency for a single change-point Suppose a single change-point at position $u = \alpha p$, with increments $(\beta_i)_{i=1,\ldots,n}$ s.t. $\bar{\beta}^2 = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \beta_i^2$, corrupted by i.i.d. Gaussian noise of variance $\sigma^2$. Does the TV approximator correctly estimate the first change-point as $n$ increases?
35 Consistency of the unweighted TV approximator Theorem: consider $\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2$ such that $\sum_{i=1}^{p-1} \|U_{i+1,\bullet} - U_{i,\bullet}\| \le \mu$. The unweighted TV approximator finds the correct change-point with probability tending to 1 (resp. 0) as $n \to +\infty$ if $\sigma^2 < \bar{\sigma}_\alpha^2$ (resp. $\sigma^2 > \bar{\sigma}_\alpha^2$), for an explicit threshold $\bar{\sigma}_\alpha^2$ depending on $\alpha$, $p$ and $\bar{\beta}^2$. Consequence: correct estimation on $[p\epsilon, p(1-\epsilon)]$ with $\epsilon = \frac{\sigma^2}{2 p \bar{\beta}^2} + o(p^{-1/2})$; wrong estimation near the boundaries.
36 Consistency of the weighted TV approximator Theorem: consider $\min_{U \in \mathbb{R}^{p \times n}} \|Y - U\|^2$ such that $\sum_{i=1}^{p-1} w_i \|U_{i+1,\bullet} - U_{i,\bullet}\| \le \mu$. The weighted TV approximator with weights $w_i = \sqrt{\frac{i(p-i)}{p}}$ for $i \in [1, p-1]$ correctly finds the first change-point with probability tending to 1 as $n \to +\infty$. We see the benefit of increasing $n$; we see the benefit of adding weights to the TV penalty.
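A quick numerical sanity check of these weights, assuming the form $w_i = \sqrt{i(p-i)/p}$ stated in the theorem: they are symmetric around the middle of the signal, and they exactly equalize the noise term $\sigma^2 \, i(p-i)/(p w_i^2)$ that appears in the proof sketch on the next slide.

```python
import numpy as np

p = 100
i = np.arange(1, p)              # candidate change-point positions 1..p-1
w = np.sqrt(i * (p - i) / p)     # weights from the theorem (assumed form)

# Symmetric: position i and position p - i get the same weight.
assert np.allclose(w, w[::-1])

# With these weights the noise contribution sigma^2 * i(p-i) / (p w_i^2)
# becomes a constant sigma^2, uniform over positions.
sigma2 = 2.0
noise_term = sigma2 * i * (p - i) / (p * w ** 2)
assert np.allclose(noise_term, sigma2)
```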
37 Proof sketch The first change-point $\hat{i}$ found by the TV approximator maximizes $F_i = \|\hat{c}_{i,\bullet}\|^2$, where $\hat{c} = X^\top \bar{Y} = X^\top X \beta + X^\top \bar{W}$. $\hat{c}$ is Gaussian, and $F_i$ follows a non-central $\chi^2$ distribution with $G_i = \mathbb{E} F_i = \frac{i(p-i)}{p w_i^2} \sigma^2 + \frac{\bar{\beta}^2}{w_i^2 w_u^2 p^2} \times \begin{cases} i^2 (p-u)^2 & \text{if } i \le u, \\ u^2 (p-i)^2 & \text{otherwise.} \end{cases}$ We then just check when $G_u = \max_i G_i$.
38 Consistency for a single change-point [Figure 3: single change-point accuracy for the group fused Lasso, as a function of the number of profiles, when the change-point is placed at a variety of positions $u = 50$ to $u = 90$ (left and centre plots, resp. unweighted and weighted group fused Lasso), or $u = 50 \pm 2$ to $u = 90 \pm 2$ (right plot, weighted, with varying change-point location), for a signal of length 100.] The weighted group fused Lasso remains robust against fluctuations in the exact change-point location.
39 Estimation of more change-points? [Figure 4: multiple change-point accuracy as a function of the number of profiles, with change-points placed at the nine positions $\{10, 20, \ldots, 90\}$ and centered Gaussian noise of variance $\sigma^2 = 0.05$ (left), $0.2$ (center) and $1$ (right); profile length 100. Curves: unweighted and weighted LARS, unweighted and weighted Lasso.] Note that we do not assume that all profiles share exactly the same change-points, but merely see the joint segmentation as an adaptive way to reduce the dimension and remove noise from the data.
40 Outline 1 Motivations 2 Finding multiple change-points in a single profile 3 Finding multiple change-points shared by many signals 4 Learning molecular classifiers with network information 5 Conclusion
41 Molecular diagnosis / prognosis / theragnosis
42 Gene networks, gene groups [Figure: gene network annotated with functional modules: N-glycan biosynthesis; glycolysis / gluconeogenesis; porphyrin and chlorophyll metabolism; riboflavin metabolism; sulfur metabolism; protein kinases; nitrogen, asparagine metabolism; folate biosynthesis; biosynthesis of steroids, ergosterol metabolism; DNA and RNA polymerase subunits; lysine biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; oxidative phosphorylation, TCA cycle; purine metabolism]
43 Structured feature selection Basic biological functions usually involve the coordinated action of several proteins: formation of protein complexes; activation of metabolic, signalling or regulatory pathways. How can we perform structured feature selection, such that the selected genes belong to only a few groups, or form a small number of connected components on the graph? [Figure: gene network annotated with functional modules]
44 Selecting pre-defined groups of variables Group lasso (Yuan & Lin, 2006): if groups of covariates are likely to be selected together, the $\ell_1/\ell_2$-norm induces sparse solutions at the group level: $\Omega_{\text{group}}(w) = \sum_g \|w_g\|_2$. Example: $\Omega(w_1, w_2, w_3) = \|(w_1, w_2)\|_2 + \|w_3\|_2$.
45 Group lasso with overlapping groups Idea 1: shrink groups to zero (Jenatton et al., 2009). $\Omega_{\text{group}}(w) = \sum_g \|w_g\|_2$ sets groups to 0. One variable is selected $\Leftrightarrow$ all the groups to which it belongs are selected ($\|w_{g_2}\|_2 = \|w_{g_3}\|_2 = 0$). IGF selection $\Rightarrow$ selection of unwanted groups. Removal of any group containing a gene $\Rightarrow$ the weight of the gene is 0.
46 Group lasso with overlapping groups Idea 2: latent group Lasso (Jacob et al., 2009). $\Omega_{\text{latent}}^G(w) = \min_v \sum_{g \in G} \|v_g\|_2$ such that $w = \sum_{g \in G} v_g$ and $\mathrm{supp}(v_g) \subseteq g$. Properties: the resulting support is a union of groups in $G$; it is possible to select one variable without selecting all the groups containing it; equivalent to the group lasso when there is no overlap.
47 Overlap and group unit balls Balls for $\Omega_{\text{group}}^G(\cdot)$ (middle) and $\Omega_{\text{latent}}^G(\cdot)$ (right) for the groups $G = \{\{1, 2\}, \{2, 3\}\}$, where $w_2$ is represented as the vertical coordinate. Left: group lasso ($G = \{\{1, 2\}, \{3\}\}$), for comparison.
48 Theoretical results Consistency in group support (Jacob et al., 2009). Let $\bar{w}$ be the true parameter vector. Assume that there exists a unique decomposition $\bar{v}_g$ such that $\bar{w} = \sum_g \bar{v}_g$ and $\Omega_{\text{latent}}^G(\bar{w}) = \sum_g \|\bar{v}_g\|_2$. Consider the regularized empirical risk minimization problem $\min_w L(w) + \lambda \Omega_{\text{latent}}^G(w)$. Then, under appropriate mutual incoherence conditions on $X$, as $n \to \infty$, with very high probability, the optimal solution $\hat{w}$ admits a unique decomposition $(\hat{v}_g)_{g \in G}$ such that $\{g \in G : \hat{v}_g \neq 0\} = \{g \in G : \bar{v}_g \neq 0\}$.
50 Experiments Synthetic data: overlapping groups. 10 groups of 10 variables with 2 variables of overlap between two successive groups: $\{1, \ldots, 10\}$, $\{9, \ldots, 18\}$, ..., $\{73, \ldots, 82\}$. Support: union of the 4th and 5th groups. Learn from 100 training points. [Figure: frequency of selection of each variable with the lasso (left) and $\Omega_{\text{latent}}^G(\cdot)$ (middle); comparison of the RMSE of both methods (right).]
51 Graph lasso Two solutions: $\Omega_{\text{group}}^G(\beta) = \sum_{i \sim j} \sqrt{\beta_i^2 + \beta_j^2}$ and $\Omega_{\text{latent}}^G(\beta) = \sup \left\{ \alpha^\top \beta : \alpha \in \mathbb{R}^p, \; \forall i \sim j, \; \alpha_i^2 + \alpha_j^2 \le 1 \right\}$.
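A minimal sketch of the first penalty, $\Omega_{\text{group}}^G$, with one group per edge of the gene network (function and variable names are mine):

```python
import numpy as np

def omega_graph_group(beta, edges):
    """Graph group-lasso penalty: sum over edges (i, j) of ||(beta_i, beta_j)||_2."""
    return sum(float(np.hypot(beta[i], beta[j])) for (i, j) in edges)

beta = np.array([3.0, 4.0, 0.0])
# Chain graph 1 - 2 - 3 (0-based edge indices below):
# Omega = ||(3, 4)||_2 + ||(4, 0)||_2 = 5 + 4 = 9
val = omega_graph_group(beta, [(0, 1), (1, 2)])
```

Because each edge contributes the joint norm of its two endpoint coefficients, zeroing a gene requires zeroing it jointly with its neighbours, which favours selections that form few connected components.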
52 Preliminary results Breast cancer data: gene expression data for 8,141 genes in 295 breast cancer tumors. Canonical pathways from MSigDB, containing 639 groups of genes, 637 of which involve genes from our study. Graph on the genes. [Table: error rates $\ell_1$: $0.38 \pm 0.04$ vs. $\Omega_{\text{latent}}^G(\cdot)$: $0.36 \pm 0.03$, with the mean number of pathways used; $\ell_1$: $0.39 \pm 0.04$ vs. $\Omega_{\text{graph}}(\cdot)$: $0.36 \pm 0.01$, with an average connected-component size of 1.3 for both.]
53 Lasso signature
54 Graph Lasso signature
55 Outline 1 Motivations 2 Finding multiple change-points in a single profile 3 Finding multiple change-points shared by many signals 4 Learning molecular classifiers with network information 5 Conclusion
56 Conclusions Feature / pattern selection in high dimension is central for many applications. Convex sparsity-inducing penalties are useful: efficient implementations + consistency results. Acknowledgements: Kevin Bleakley (INRIA), Laurent Jacob (UC Berkeley), Guillaume Obozinski (INRIA). Post-docs available in Paris!
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More information(k, q)-trace norm for sparse matrix factorization
(k, q)-trace norm for sparse matrix factorization Emile Richard Department of Electrical Engineering Stanford University emileric@stanford.edu Guillaume Obozinski Imagine Ecole des Ponts ParisTech guillaume.obozinski@imagine.enpc.fr
More informationResearch Statement on Statistics Jun Zhang
Research Statement on Statistics Jun Zhang (junzhang@galton.uchicago.edu) My interest on statistics generally includes machine learning and statistical genetics. My recent work focus on detection and interpretation
More informationPredicting Workplace Incidents with Temporal Graph-guided Fused Lasso
Predicting Workplace Incidents with Temporal Graph-guided Fused Lasso Keerthiram Murugesan 1 and Jaime Carbonell 1 1 Language Technologies Institute Carnegie Mellon University Pittsburgh, USA CMU-LTI-15-??
More informationHomogeneity Pursuit. Jianqing Fan
Jianqing Fan Princeton University with Tracy Ke and Yichao Wu http://www.princeton.edu/ jqfan June 5, 2014 Get my own profile - Help Amazing Follow this author Grace Wahba 9 Followers Follow new articles
More informationSCMA292 Mathematical Modeling : Machine Learning. Krikamol Muandet. Department of Mathematics Faculty of Science, Mahidol University.
SCMA292 Mathematical Modeling : Machine Learning Krikamol Muandet Department of Mathematics Faculty of Science, Mahidol University February 9, 2016 Outline Quick Recap of Least Square Ridge Regression
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationGaussian Graphical Models and Graphical Lasso
ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf
More informationMetabolic networks: Activity detection and Inference
1 Metabolic networks: Activity detection and Inference Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group Advanced microarray analysis course, Elsinore, Denmark, May 21th,
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationSegmentation of the mean of heteroscedastic data via cross-validation
Segmentation of the mean of heteroscedastic data via cross-validation 1 UMR 8524 CNRS - Université Lille 1 2 SSB Group, Paris joint work with Sylvain Arlot GDR Statistique et Santé Paris, October, 21 2009
More informationApproximation. Inderjit S. Dhillon Dept of Computer Science UT Austin. SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina.
Using Quadratic Approximation Inderjit S. Dhillon Dept of Computer Science UT Austin SAMSI Massive Datasets Opening Workshop Raleigh, North Carolina Sept 12, 2012 Joint work with C. Hsieh, M. Sustik and
More informationA Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data
A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data Yoonkyung Lee Department of Statistics The Ohio State University http://www.stat.ohio-state.edu/ yklee May 13, 2005
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationMultiple Change-Point Detection and Analysis of Chromosome Copy Number Variations
Multiple Change-Point Detection and Analysis of Chromosome Copy Number Variations Yale School of Public Health Joint work with Ning Hao, Yue S. Niu presented @Tsinghua University Outline 1 The Problem
More informationRegularized Regression A Bayesian point of view
Regularized Regression A Bayesian point of view Vincent MICHEL Director : Gilles Celeux Supervisor : Bertrand Thirion Parietal Team, INRIA Saclay Ile-de-France LRI, Université Paris Sud CEA, DSV, I2BM,
More informationGrouping of correlated feature vectors using treelets
Grouping of correlated feature vectors using treelets Jing Xiang Department of Machine Learning Carnegie Mellon University Pittsburgh, PA 15213 jingx@cs.cmu.edu Abstract In many applications, features
More informationAn Introduction to Sparse Approximation
An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,
More informationVariable Selection for High-Dimensional Data with Spatial-Temporal Effects and Extensions to Multitask Regression and Multicategory Classification
Variable Selection for High-Dimensional Data with Spatial-Temporal Effects and Extensions to Multitask Regression and Multicategory Classification Tong Tong Wu Department of Epidemiology and Biostatistics
More informationLASSO Review, Fused LASSO, Parallel LASSO Solvers
Case Study 3: fmri Prediction LASSO Review, Fused LASSO, Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 3, 2016 Sham Kakade 2016 1 Variable
More informationA Penalized Regression Model for the Joint Estimation of eqtl Associations and Gene Network Structure
A Penalized Regression Model for the Joint Estimation of eqtl Associations and Gene Network Structure Micol Marchetti-Bowick Machine Learning Department Carnegie Mellon University micolmb@cs.cmu.edu Abstract
More informationLecture 14: Variable Selection - Beyond LASSO
Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More informationA New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables
A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,
More informationA Study of Relative Efficiency and Robustness of Classification Methods
A Study of Relative Efficiency and Robustness of Classification Methods Yoonkyung Lee* Department of Statistics The Ohio State University *joint work with Rui Wang April 28, 2011 Department of Statistics
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationAn Efficient Proximal Gradient Method for General Structured Sparse Learning
Journal of Machine Learning Research 11 (2010) Submitted 11/2010; Published An Efficient Proximal Gradient Method for General Structured Sparse Learning Xi Chen Qihang Lin Seyoung Kim Jaime G. Carbonell
More informationStructured Sparsity. group testing & compressed sensing a norm that induces structured sparsity. Mittagsseminar 2011 / 11 / 10 Martin Jaggi
Structured Sparsity group testing & compressed sensing a norm that induces structured sparsity (ariv.org/abs/1110.0413 Obozinski, G., Jacob, L., & Vert, J.-P., October 2011) Mittagssear 2011 / 11 / 10
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationOverlapping Variable Clustering with Statistical Guarantees and LOVE
with Statistical Guarantees and LOVE Department of Statistical Science Cornell University WHOA-PSI, St. Louis, August 2017 Joint work with Mike Bing, Yang Ning and Marten Wegkamp Cornell University, Department
More informationIEOR 265 Lecture 3 Sparse Linear Regression
IOR 65 Lecture 3 Sparse Linear Regression 1 M Bound Recall from last lecture that the reason we are interested in complexity measures of sets is because of the following result, which is known as the M
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationSTANDARDIZATION AND THE GROUP LASSO PENALTY
Statistica Sinica (0), 983-00 doi:http://dx.doi.org/0.5705/ss.0.075 STANDARDIZATION AND THE GROUP LASSO PENALTY Noah Simon and Robert Tibshirani Stanford University Abstract: We re-examine the original
More informationStandardization and the Group Lasso Penalty
Standardization and the Group Lasso Penalty Noah Simon and Rob Tibshirani Corresponding author, email: nsimon@stanfordedu Sequoia Hall, Stanford University, CA 9435 March, Abstract We re-examine the original
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationA Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression
A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent
More informationStatistical learning theory, Support vector machines, and Bioinformatics
1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.
More informationNikhil Rao, Miroslav Dudík, Zaid Harchaoui
THE GROUP k-support NORM FOR LEARNING WITH STRUCTURED SPARSITY Nikhil Rao, Miroslav Dudík, Zaid Harchaoui Technicolor R&I, Microsoft Research, University of Washington ABSTRACT Several high-dimensional
More informationKernels to detect abrupt changes in time series
1 UMR 8524 CNRS - Université Lille 1 2 Modal INRIA team-project 3 SSB group Paris joint work with S. Arlot, Z. Harchaoui, G. Rigaill, and G. Marot Computational and statistical trade-offs in learning IHES
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationGeneralized Concomitant Multi-Task Lasso for sparse multimodal regression
Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Mathurin Massias https://mathurinm.github.io INRIA Saclay Joint work with: Olivier Fercoq (Télécom ParisTech) Alexandre Gramfort
More informationStochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure Alberto Bietti Julien Mairal Inria Grenoble (Thoth) March 21, 2017 Alberto Bietti Stochastic MISO March 21,
More informationThe knockoff filter for FDR control in group-sparse and multitask regression
The knockoff filter for FDR control in group-sparse and multitask regression Ran Dai Department of Statistics, University of Chicago, Chicago IL 6637 USA Rina Foygel Barber Department of Statistics, University
More informationIn this section you will learn the following : 40.1Double integrals
Module 14 : Double Integrals, Applilcations to Areas and Volumes Change of variables Lecture 40 : Double integrals over rectangular domains [Section 40.1] Objectives In this section you will learn the
More informationLinear and logistic regression
Linear and logistic regression Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Linear and logistic regression 1/22 Outline 1 Linear regression 2 Logistic regression 3 Fisher discriminant analysis
More information