Generalized Concomitant Multi-Task Lasso for sparse multimodal regression


1 Generalized Concomitant Multi-Task Lasso for sparse multimodal regression Mathurin Massias INRIA Saclay Joint work with: Olivier Fercoq (Télécom ParisTech) Alexandre Gramfort (INRIA Saclay) Joseph Salmon (Télécom ParisTech)

2 Table of Contents: Motivation - problem setup; Joint estimators of the noise level; General noise model; Block homoscedastic model

3 "One" motivation: M/EEG inverse problem sensors: magneto- and electro-encephalogram measurements during a cognitive experiment sources: brain locations

4 The M/EEG inverse problem: modelling. Assume Gaussian variables: E ∼ N(0, Σ_E), X ∼ N(0, Σ_X). Additive model: M = GX + E. M and G are known; X is obtained by maximizing the likelihood, which leads to X̂ = argmin_X ‖M − GX‖²_{Σ_E^{-1}} + ‖X‖²_{Σ_X^{-1}}, i.e., X̂ = Σ_X G^⊤ (G Σ_X G^⊤ + Σ_E)^{-1} M. (Figure: illustration of the covariances Σ_E, Σ_X and of the dimensions of M, G and X.)
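
To make the closed-form likelihood solution above concrete, here is a minimal NumPy sketch under the reconstructed Gaussian model; all sizes and covariance choices below are illustrative assumptions, not the slide's actual data.

    import numpy as np

    rng = np.random.default_rng(0)
    n_sensors, n_sources, n_times = 100, 500, 50   # illustrative sizes (assumption)
    G = rng.standard_normal((n_sensors, n_sources))            # forward (gain) matrix
    cov_x = np.eye(n_sources)                                   # source covariance Sigma_X
    cov_e = 0.1 * np.eye(n_sensors)                             # noise covariance Sigma_E
    X = rng.standard_normal((n_sources, n_times))
    M = G @ X + np.sqrt(0.1) * rng.standard_normal((n_sensors, n_times))

    # MAP / minimum-norm estimate: X_hat = Sigma_X G^T (G Sigma_X G^T + Sigma_E)^{-1} M
    X_hat = cov_x @ G.T @ np.linalg.solve(G @ cov_x @ G.T + cov_e, M)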

5 The M/EEG inverse problem: modelling. Multi-task regression: n observations, q tasks, p features. Y ∈ R^{n×q}: observation matrix; X ∈ R^{n×p}: forward matrix. Model: Y = XB + E, where B ∈ R^{p×q} is the true source activity matrix and E ∈ R^{n×q} is additive white noise.

6 ℓ2,1 regularization. Penalty: Group-Lasso (ℓ2,1): ‖B‖_{2,1} = Σ_{j=1}^p ‖B_j‖₂ (B_j: j-th row of B). We solve: B̂ ∈ argmin_{B ∈ R^{p×q}} (1/2)‖Y − XB‖²_F + λ‖B‖_{2,1}. (Figure: estimated source activity B̂ ∈ R^{p×q}, rows indexed by sources, columns by time.) A.k.a. Multiple Measurement Vector (MMV) in signal processing, or multi-task Lasso in ML [Obozinski et al., 2010].
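
For illustration, this ℓ2,1-penalized problem can be solved with scikit-learn's MultiTaskLasso. This is only a sketch on synthetic data (an assumption of this note, not the solver used in the talk); sklearn's alpha corresponds to λ up to the 1/(2n) scaling of its quadratic term.

    import numpy as np
    from sklearn.linear_model import MultiTaskLasso

    rng = np.random.default_rng(0)
    n, p, q = 100, 300, 20
    X = rng.standard_normal((n, p))
    B_true = np.zeros((p, q))
    B_true[:10] = rng.standard_normal((10, q))          # 10 active rows, shared across tasks
    Y = X @ B_true + 0.1 * rng.standard_normal((n, q))

    clf = MultiTaskLasso(alpha=0.1, fit_intercept=False).fit(X, Y)
    B_hat = clf.coef_.T                                  # sklearn stores coef_ with shape (q, p)
    print("active rows recovered:", int((np.linalg.norm(B_hat, axis=1) != 0).sum()))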

7 Table of Contents: Motivation - problem setup; Joint estimators of the noise level; General noise model; Block homoscedastic model

8 Choice of λ and noise level. The noise E is assumed white Gaussian, with i.i.d. entries E_{i,t} ∼ N(0, σ²). For optimal performance of the Lasso, λ should be proportional to the noise level: λ ∝ σ √(log p / n) [Bickel et al., 2009]. Yet σ is unknown in practice!

9 Joint estimation of β and σ (illustrated on the Lasso for simplicity: model is y = Xβ + ε). Intuitive idea: run the Lasso with some λ, get β̂; estimate σ from the residuals: σ̂ = ‖y − Xβ̂‖ / √n; relaunch the Lasso with λ ∝ σ̂; etc. Note: this is the original implementation proposed for the Scaled-Lasso [Sun and Zhang, 2012].
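
A minimal sketch of this alternating scheme, assuming an off-the-shelf Lasso solver and an illustrative choice of the proportionality constant in λ (not the authors' exact settings):

    import numpy as np
    from sklearn.linear_model import Lasso

    def iterative_scaled_lasso(X, y, n_iter=10):
        n, p = X.shape
        sigma = np.linalg.norm(y) / np.sqrt(n)           # crude initial noise estimate
        beta = np.zeros(p)
        for _ in range(n_iter):
            lam = sigma * np.sqrt(2 * np.log(p) / n)     # lambda proportional to sigma
            beta = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
            sigma = np.linalg.norm(y - X @ beta) / np.sqrt(n)   # residual-based estimate
        return beta, sigma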

10 Concomitant Lasso. Concomitant Lasso [Owen, 2007] (inspired by Huber [1981]): (β̂, σ̂) ∈ argmin_{β ∈ R^p, σ > 0} ‖y − Xβ‖² / (2nσ) + σ/2 + λ‖β‖₁. Also equivalent to the Square-root/Scaled Lasso [Belloni et al., 2011, Sun and Zhang, 2012]. Note: the σ/2 term acts as a penalty over the noise level.

11 Solving the Concomitant Lasso. A jointly convex formulation (w.r.t. both β and σ): argmin_{β ∈ R^p, σ > 0} ‖y − Xβ‖² / (2nσ) + σ/2 + λ‖β‖₁. β update: smooth + separable function, make CD steps [Friedman et al., 2007].

12 Solving the Concomitant Lasso. A jointly convex formulation (w.r.t. both β and σ): argmin_{β ∈ R^p, σ > 0} ‖y − Xβ‖² / (2nσ) + σ/2 + λ‖β‖₁. β update: smooth + separable function, make CD steps [Friedman et al., 2007]. σ update: 1D optimization problem, closed form: σ = ‖y − Xβ‖ / √n.


14 Solving the Concomitant Lasso. A jointly convex formulation (w.r.t. both β and σ): argmin_{β ∈ R^p, σ > 0} ‖y − Xβ‖² / (2nσ) + σ/2 + λ‖β‖₁. β update: smooth + separable function, make CD steps [Friedman et al., 2007]. σ update: 1D optimization problem, closed form: σ = ‖y − Xβ‖ / √n. But what if we hit y = Xβ?

15 Smoothed Concomitant Lasso. Smoothed Concomitant Lasso [Ndiaye et al., 2017]: argmin_{β ∈ R^p, σ ≥ σ̲} ‖y − Xβ‖² / (2nσ) + σ/2 + λ‖β‖₁, where σ̲ > 0 is a pre-set lower bound on the noise level.

16 Smoothed Concomitant Lasso. Smoothed Concomitant Lasso [Ndiaye et al., 2017]: argmin_{β ∈ R^p, σ ≥ σ̲} ‖y − Xβ‖² / (2nσ) + σ/2 + λ‖β‖₁. Called "smoothed" following the terminology of Nesterov [2005], because it amounts to smoothing in the dual: θ̂ = argmax_{θ ∈ Δ_X} ⟨y, λθ⟩ + σ̲ (1/2 − nλ²‖θ‖²/2), with Δ_X = {θ ∈ R^n : ‖X^⊤θ‖_∞ ≤ 1, ‖θ‖ ≤ 1/(λ√n)}. State-of-the-art solvers: as fast as for the Lasso.
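
Below is a minimal coordinate-descent sketch of the alternate minimization for the smoothed concomitant formulation above (β update by soft-thresholding, σ update in closed form, clipped at σ̲). It is an illustrative reimplementation under the stated objective, not the authors' optimized solver.

    import numpy as np

    def smoothed_concomitant_cd(X, y, lam, sigma_min, n_iter=100):
        n, p = X.shape
        beta = np.zeros(p)
        resid = y.copy()
        lipschitz = (X ** 2).sum(axis=0)                 # ||X_j||^2 for each coordinate
        sigma = max(sigma_min, np.linalg.norm(y) / np.sqrt(n))
        for _ in range(n_iter):
            for j in range(p):
                resid += X[:, j] * beta[j]               # put coefficient j back in the residual
                zj = X[:, j] @ resid / lipschitz[j]
                thresh = n * lam * sigma / lipschitz[j]  # soft-threshold level for fixed sigma
                beta[j] = np.sign(zj) * max(abs(zj) - thresh, 0.0)
                resid -= X[:, j] * beta[j]
            sigma = max(sigma_min, np.linalg.norm(resid) / np.sqrt(n))   # closed-form sigma step
        return beta, sigma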

17 Table of Contents: Motivation - problem setup; Joint estimators of the noise level; General noise model; Block homoscedastic model

18 What about more complex noise models? What can we do if the noise is not white? Smoothed Generalized Concomitant Lasso (SGCL): (B̂, Σ̂) ∈ argmin_{B ∈ R^{p×q}, Σ ∈ S^n_{++}, Σ ⪰ Σ̲} ‖Y − XB‖²_{Σ^{-1}} / (2nq) + Tr(Σ)/(2n) + λ‖B‖_{2,1}, with ‖Z‖²_A = Tr(Z^⊤ A Z) and Σ̲ = σ̲ Id.

19 What about more complex noise models? What can we do if the noise is not white? Smoothed Generalized Concomitant Lasso (SGCL): (B̂, Σ̂) ∈ argmin_{B ∈ R^{p×q}, Σ ∈ S^n_{++}, Σ ⪰ Σ̲} ‖Y − XB‖²_{Σ^{-1}} / (2nq) + Tr(Σ)/(2n) + λ‖B‖_{2,1}, with ‖Z‖²_A = Tr(Z^⊤ A Z) and Σ̲ = σ̲ Id. In the case Σ = σ Id, we recover the Smoothed Concomitant.

20 What about more complex noise models? What can we do if the noise is not white? Smoothed Generalized Concomitant Lasso (SGCL): (B̂, Σ̂) ∈ argmin_{B ∈ R^{p×q}, Σ ∈ S^n_{++}, Σ ⪰ Σ̲} ‖Y − XB‖²_{Σ^{-1}} / (2nq) + Tr(Σ)/(2n) + λ‖B‖_{2,1}, with ‖Z‖²_A = Tr(Z^⊤ A Z) and Σ̲ = σ̲ Id. In the case Σ = σ Id, we recover the Smoothed Concomitant. Note: the noise penalty is now on the sum of the eigenvalues of Σ.

21 Solving the SGCL. Jointly convex formulation: we can still use alternate minimization, with the duality gap as a stopping criterion. Σ fixed: smooth + ℓ1-type penalty, BCD works.

22 Solving the SGCL. Jointly convex formulation: we can still use alternate minimization, with the duality gap as a stopping criterion. B fixed: with the current residuals R = Y − XB, the problem is: argmin_{Σ ∈ S^n_{++}, Σ ⪰ Σ̲} (1/(2nq)) Tr[R^⊤ Σ^{-1} R] + (1/(2n)) Tr(Σ). Closed-form solution: if U diag(s_1, …, s_n) U^⊤ is the spectral decomposition of (RR^⊤/q)^{1/2}, then Σ̂ = U diag(max(σ̲, s_1), …, max(σ̲, s_n)) U^⊤.
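
A short NumPy sketch of this closed-form Σ step, assuming the reconstruction above (unconstrained minimizer (RR^⊤/q)^{1/2}, eigenvalues clipped at σ̲):

    import numpy as np

    def update_sigma(R, q, sigma_min):
        """Minimizer of Tr(R^T Sigma^{-1} R)/(2nq) + Tr(Sigma)/(2n) s.t. Sigma >= sigma_min * Id."""
        eigvals, U = np.linalg.eigh(R @ R.T / q)
        s = np.sqrt(np.maximum(eigvals, 0.0))            # eigenvalues of (R R^T / q)^{1/2}
        return (U * np.maximum(s, sigma_min)) @ U.T      # clip the spectrum at sigma_min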

23 Alternate minimization.
Algorithm: Alternate min. for multi-task SGCL
  input: X, Y, Σ̲, λ, f, T
  init: B = 0_{p,q}, Σ^{-1} = Σ̲^{-1}, R = Y
  for iter = 1, …, T do
    if iter ≡ 1 (mod f) then
      Σ ← Ψ(R, Σ̲)                                  // closed-form sol. of minimization in Σ
      for j = 1, …, p do L_j = X_j^⊤ Σ^{-1} X_j     // Lipschitz constants
    for j = 1, …, p do
      R ← R + X_j B_j                               // partial residual update
      B_j ← BST(X_j^⊤ Σ^{-1} R / L_j, λnq / L_j)    // coef. update
      R ← R − X_j B_j                               // residual update
  return B, Σ
Complexity? OK if we store Σ^{-1}X and Σ^{-1}R instead of R.
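
For concreteness, here is a minimal Python sketch of the loop above (block soft-thresholding BST, Σ refreshed every f inner passes). It is an illustrative reimplementation under the reconstructed updates, not the authors' solver.

    import numpy as np

    def block_soft_threshold(u, tau):
        """BST(u, tau): shrink the row vector u towards 0 by tau in l2 norm."""
        norm_u = np.linalg.norm(u)
        return np.zeros_like(u) if norm_u <= tau else (1.0 - tau / norm_u) * u

    def sgcl(X, Y, lam, sigma_min, f=10, n_iter=100):
        n, p = X.shape
        q = Y.shape[1]
        B = np.zeros((p, q))
        R = Y.copy()
        for it in range(n_iter):
            if it % f == 0:
                # closed-form Sigma step: eigenvalues of (R R^T / q)^{1/2}, clipped at sigma_min
                eigvals, U = np.linalg.eigh(R @ R.T / q)
                s = np.maximum(np.sqrt(np.maximum(eigvals, 0.0)), sigma_min)
                Sigma = (U * s) @ U.T
                Sigma_inv_X = np.linalg.solve(Sigma, X)              # store Sigma^{-1} X
                L = np.einsum('ij,ij->j', X, Sigma_inv_X)            # L_j = X_j^T Sigma^{-1} X_j
            for j in range(p):
                R += np.outer(X[:, j], B[j])                         # partial residual update
                B[j] = block_soft_threshold(Sigma_inv_X[:, j] @ R / L[j], lam * n * q / L[j])
                R -= np.outer(X[:, j], B[j])                         # residual update
        return B, Sigma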

24 Main drawbacks. The Σ update is OK for M/EEG because n = 300, but the O(n³) SVD is problematic otherwise. O(n²) parameters to infer for Σ, with only nq observations: works only when q ≳ 10n.

25 Table of Contents: Motivation - problem setup; Joint estimators of the noise level; General noise model; Block homoscedastic model

26 Block Homoscedastic model. In our case we record 3 different types of signals: electrodes measure electric potentials, magnetometers measure the magnetic field, gradiometers measure the gradient of the magnetic field. Different physical natures, different noise levels. Observations are divided into 3 blocks and the partition is known.

27 Block Homoscedastic model. K groups of observations, stacked by blocks of rows: X = [X^1; …; X^K], Y = [Y^1; …; Y^K], E = [E^1; …; E^K], and Σ = diag(σ_1 Id_{n_1}, …, σ_K Id_{n_K}). For each block, a homoscedastic model with white noise: Y^k = X^k B + σ_k E^k, with entries of E^k i.i.d. N(0, 1).

28 Smoothed Block Homoscedastic Concomitant (SBHCL). Plugging a diagonal Σ (constant over consecutive blocks) into the general model yields the Block Homoscedastic Concomitant: argmin_{B ∈ R^{p×q}, (σ_1, …, σ_K) ∈ R^K_{++}, σ_k ≥ σ̲_k ∀k ∈ [K]} Σ_{k=1}^K ( ‖Y^k − X^k B‖² / (2nqσ_k) + n_k σ_k / (2n) ) + λ‖B‖_{2,1}. Σ goes down from n(n+1)/2 parameters (hopeless without more structure) to K.

29 Solving the SBHCL. The (block) coordinate descent steps remain the same. Computing Σ^{-1}R for the BCD is easier.

30 Solving the SBHCL. The (block) coordinate descent steps remain the same. Computing Σ^{-1}R for the BCD is easier. The σ_k updates are simple and can even be performed at each B_j update (as for the Concomitant).


32 Alternate minimization.
Algorithm: Alternate min. for multi-task SBHCL
  input: X^1, …, X^K, Y^1, …, Y^K, σ̲_1, …, σ̲_K, λ, T
  init: B = 0_{p,q}; for k ∈ [K]: σ_k = ‖Y^k‖ / √(n_k q), R^k = Y^k; for k ∈ [K], j ∈ [p]: L_{k,j} = ‖X_j^k‖²₂
  for iter = 1, …, T do
    for j = 1, …, p do
      for k = 1, …, K do R^k ← R^k + X_j^k B_j                                  // partial residual update
      B_j ← BST( Σ_{k=1}^K X_j^{k⊤} R^k / σ_k , λnq ) / ( Σ_{k=1}^K L_{k,j} / σ_k )   // coef. update
      for k = 1, …, K do
        R^k ← R^k − X_j^k B_j                                                   // residual update
        σ_k ← max(σ̲_k, ‖R^k‖ / √(n_k q))                                       // smart std dev update
  return B, σ_1, …, σ_K
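
A minimal Python sketch of this block-homoscedastic variant, with per-block residuals and noise levels (illustrative reimplementation; X_blocks and Y_blocks are lists of the per-block arrays X^k, Y^k):

    import numpy as np

    def block_soft_threshold(u, tau):
        norm_u = np.linalg.norm(u)
        return np.zeros_like(u) if norm_u <= tau else (1.0 - tau / norm_u) * u

    def sbhcl(X_blocks, Y_blocks, lam, sigma_mins, n_iter=100):
        K = len(X_blocks)
        p, q = X_blocks[0].shape[1], Y_blocks[0].shape[1]
        n = sum(Xk.shape[0] for Xk in X_blocks)
        B = np.zeros((p, q))
        R = [Yk.copy() for Yk in Y_blocks]
        L = np.array([(Xk ** 2).sum(axis=0) for Xk in X_blocks])     # L[k, j] = ||X_j^k||^2
        sigma = np.array([max(sm, np.linalg.norm(Yk) / np.sqrt(Yk.shape[0] * q))
                          for sm, Yk in zip(sigma_mins, Y_blocks)])
        for _ in range(n_iter):
            for j in range(p):
                for k in range(K):
                    R[k] += np.outer(X_blocks[k][:, j], B[j])        # partial residual update
                corr = sum(X_blocks[k][:, j] @ R[k] / sigma[k] for k in range(K))
                lip = sum(L[k, j] / sigma[k] for k in range(K))
                B[j] = block_soft_threshold(corr, lam * n * q) / lip  # coef. update
                for k in range(K):
                    R[k] -= np.outer(X_blocks[k][:, j], B[j])        # residual update
                    n_k = X_blocks[k].shape[0]
                    sigma[k] = max(sigma_mins[k], np.linalg.norm(R[k]) / np.sqrt(n_k * q))
        return B, sigma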

33 In practice. Design: (n, p, q) = (300, 1000, 100). X Toeplitz-correlated: Cov(X_i, X_j) = ρ^{|i − j|}, ρ ∈ ]0, 1[. 3 blocks, with noise levels in ratio 1, 2, 5.
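
A small sketch generating data following this design (Toeplitz-correlated features and three noise blocks in ratio 1:2:5); the number of active rows, ρ value and seed are illustrative assumptions:

    import numpy as np
    from scipy.linalg import toeplitz

    rng = np.random.default_rng(0)
    n, p, q, rho = 300, 1000, 100, 0.5
    cov = toeplitz(rho ** np.arange(p))                      # Cov(X_i, X_j) = rho^|i - j|
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    B_true = np.zeros((p, q))
    support = rng.choice(p, size=20, replace=False)
    B_true[support] = rng.standard_normal((20, q))           # 20 active rows (assumption)
    noise_std = np.repeat([1.0, 2.0, 5.0], n // 3)           # 3 blocks, noise ratio 1:2:5
    Y = X @ B_true + noise_std[:, None] * rng.standard_normal((n, q))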

34 Support recovery. (Figure) ROC curves of true support recovery (false positive rate vs. true positive rate), for SBHCL, MTL and SCL on all blocks, and for MTL and SCL on the least noisy block only. Top: SNR = 1, ρ = 0.1; Bottom: SNR = 1, ρ = 0.9.

35 Prediction performance. (Figure) log10(RMSE / RMSE_oracle) as a function of λ/λ_max, for SBHCL and SCL on each of the 3 blocks.

36 Take home message. More general noise models: it is possible to estimate a full covariance if there are enough tasks. If more noise structure is known (e.g., block homoscedastic model): not more costly than the multi-task Lasso (MTL). Taking into account multiple noise levels helps, both for prediction and support identification. Using additional (though noisier) data helps! Python code is available online. This work was funded by ERC Starting Grant SLAB ERC-YStG. Slides powered by MooseTeX.

37 References I
G. Obozinski, B. Taskar, and M. I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2), 2010.
P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37(4), 2009.
T. Sun and C.-H. Zhang. Scaled sparse linear regression. Biometrika, 99(4), 2012.
A. B. Owen. A robust hybrid of lasso and ridge regression. Contemporary Mathematics, 443:59-72, 2007.
P. J. Huber. Robust Statistics. John Wiley & Sons Inc., 1981.
A. Belloni, V. Chernozhukov, and L. Wang. Square-root Lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 2011.
J. Friedman, T. J. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Ann. Appl. Stat., 1(2), 2007.
E. Ndiaye, O. Fercoq, A. Gramfort, V. Leclère, and J. Salmon. Efficient smoothed concomitant Lasso estimation for high dimensional regression. In NCMIP, 2017.
Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1), 2005.
