Sparsity Regularization

1 Sparsity Regularization. Bangti Jin. Course: Inverse Problems & Imaging.

2 Outline
1 Motivation: sparsity?
2 Mathematical preliminaries
3 $\ell_1$ solvers

3 Problem setup: finite-dimensional formulation
$x \in \mathbb{R}^p$: the unknown signal
$b = Ax + \eta$, $\eta \in \mathbb{R}^n$: additive Gaussian noise; $\epsilon = \|\eta\|$: noise level
$A \in \mathbb{R}^{n \times p}$, $p \gg n$, with normalized columns, i.e., $\|A_i\| = 1$
The problem has infinitely many solutions (if it has one); which one shall we take?
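As a concrete reference for this setup, here is a minimal numpy sketch (not from the slides; all names and sizes are illustrative) that builds a synthetic instance $b = Ax + \eta$ with normalized columns and a sparse ground truth:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 64, 256, 5                 # measurements, unknowns, sparsity level
A = rng.standard_normal((n, p))
A /= np.linalg.norm(A, axis=0)       # normalize columns so ||A_i|| = 1

x_true = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
x_true[support] = rng.standard_normal(s)

sigma = 0.01
eta = sigma * rng.standard_normal(n)
b = A @ x_true + eta                 # noisy data
eps = np.linalg.norm(eta)            # noise level
```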

4 Insights from exact data. Toy example: find a reasonable solution to the problem
$x_1 + 2x_2 = 5$
There are infinitely many solutions, e.g., $(1,2)$. Which one shall we take?
Convention: least squares (Gauss 1809): $\min_x \|x\|_2^2$ s.t. $x_1 + 2x_2 = 5$
Generalized minimum-energy solution: $\min_x |x_1|^p + |x_2|^p$, $0 < p \le 2$, s.t. $x_1 + 2x_2 = 5$

5 [Figure: level sets of $|x_1|^p + |x_2|^p$ meeting the line $x_1 + 2x_2 = 5$ for $p = 2$, $3/2$, $1$, $1/2$; the least-squares solution is $(1,2)$.]

6 In case of noisy data: Tikhonov regularization
$\tfrac12\|Ax - b\|^2 + \alpha\,\psi(x)$
Two possible choices of $\psi(x)$ (convexity...):
classical Tikhonov regularization: $\psi(x) = \tfrac12\|x\|_2^2 := \tfrac12\sum_i x_i^2$
sparsity regularization (and general analogues): $\psi(x) = \tfrac1p\|x\|_p^p := \tfrac1p\sum_i |x_i|^p$, $p \in [0,1]$

7 In case of noisy data: Tikhonov regularization
$\tfrac12\|Ax - b\|^2 + \alpha\,\psi(x)$
Assumption: i.i.d. additive Gaussian noise on the data, $b_i = (Ax)_i + \xi_i$, $\xi_i \sim N(0,\sigma^2)$, so the likelihood is $p(b \mid x) \propto e^{-\frac{1}{2\sigma^2}\|Ax - b\|^2}$
Assumption: prior knowledge, $p(x) \propto e^{-\lambda\psi(x)}$
classical Tikhonov regularization: Gaussian prior distribution; sparsity regularization: Laplace prior distribution
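The correspondence between penalty and prior comes from maximum a posteriori estimation; the short derivation below fills in this step, using the identification $\alpha = \lambda\sigma^2$ (an assumption of this sketch, not stated on the slide):
$-\log p(x \mid b) = -\log p(b \mid x) - \log p(x) + \text{const} = \tfrac{1}{2\sigma^2}\|Ax - b\|^2 + \lambda\,\psi(x) + \text{const}$
so minimizing the negative log-posterior is, up to the scaling $\alpha = \lambda\sigma^2$, the Tikhonov functional above.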

8 [Figure: Gaussian vs. Laplace probability densities.]

9 [Figure: Gaussian vs. Laplace, continued.]

10 The penalty can be more general: $\psi(x) = \psi(Wx)$ under a certain transformation $W$, e.g., wavelet, framelet, curvelet, shearlet... The discussion below extends to these more complex cases.

11 A natural idea for a sparse solution is to penalize the number of nonzero unknowns:
$\tfrac12\|Ax - b\|^2 + \alpha\|x\|_0$, where $\|x\|_0 = \#(\text{nonzeros in } x)$
Conceptually intuitive, but computationally very challenging. Approximations:
bridge penalty: $\|x\|_q^q = \sum_i |x_i|^q$, $q \in (0,1)$
$\ell_1$ penalty: $\|x\|_1 = \sum_i |x_i|$
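A tiny numerical illustration (assuming numpy; the vector is made up for illustration) of how the bridge penalties approach the counting penalty $\|x\|_0$ as $q \to 0$:

```python
import numpy as np

x = np.array([0.0, 3.0, 0.0, -0.5, 0.0, 1e-3])

print("l0 :", np.count_nonzero(x))           # ||x||_0
print("l1 :", np.abs(x).sum())               # ||x||_1
for q in (0.5, 0.1, 0.01):
    print(f"l{q}:", (np.abs(x) ** q).sum())  # ||x||_q^q, tends to ||x||_0 as q -> 0
```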


13 The $\ell_0$ penalty is the genuine choice, but VERY challenging; there are different ways to approximate it. $\ell_q$ is one approximation, and there are many others. The $\ell_1$ penalty is especially popular, since it is convex and, further, there is a solid theory.

14 Notation: $S = \{1,\dots,p\}$.
For $I \subseteq S$, $x_I$: subvector consisting of the entries of $x$ indexed by $i \in I$.
For $I \subseteq S$, $A_I$: submatrix consisting of the columns of $A$ indexed by $i \in I$.

15 Restricted isometry property (RIP): $A$ satisfies RIP of order $s$ if there is a $\delta \in (0,1)$ s.t.
$(1-\delta)\|c\|^2 \le \|A_I c\|^2 \le (1+\delta)\|c\|^2$ for all $I \subseteq S$ with $|I| \le s$,
with $\delta_s$ being the smallest constant for which RIP holds:
$\delta_s := \inf\{\delta : (1-\delta)\|c\|^2 \le \|A_I c\|^2 \le (1+\delta)\|c\|^2 \ \forall\, |I| \le s,\ c \in \mathbb{R}^{|I|}\}$
denoted by RIP$(s,\delta_s)$.
RIP$(s,\delta_s)$ implies $1-\delta_s \le \lambda_{\min}(A_I^\top A_I) \le \lambda_{\max}(A_I^\top A_I) \le 1+\delta_s$, i.e., the submatrix $A_I$ is fairly well conditioned.
RIP is difficult to compute.
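Although the RIP constant is intractable to compute in general, for tiny $p$ and $s$ one can evaluate $\delta_s$ directly from the definition by enumerating supports; a brute-force sketch (assuming numpy and itertools, names illustrative):

```python
import itertools
import numpy as np

def rip_constant(A, s):
    # Brute-force delta_s; exponential in s, only for small examples.
    # Checking |I| = s suffices: smaller supports give principal submatrices
    # of some A_I^T A_I, whose eigenvalues lie within the same range.
    p = A.shape[1]
    delta = 0.0
    for I in itertools.combinations(range(p), s):
        cols = A[:, list(I)]
        eig = np.linalg.eigvalsh(cols.T @ cols)   # eigenvalues of A_I^T A_I, ascending
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 30)) / np.sqrt(20)   # entries N(0, 1/n): near-unit columns
print(rip_constant(A, s=2))
```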

16 Matrices satisfying RIP:
random matrices with i.i.d. Gaussian/Bernoulli entries with zero mean and variance $1/n$: RIP holds with overwhelming probability if $s \le c\,n/\log(p/n)$
random matrices from the Fourier ensemble (select $n$ rows of the $p \times p$ discrete Fourier transform matrix uniformly at random, then re-normalize): RIP holds with overwhelming probability if $s \le c\,n/\log^6 p$
The DFT matrix is used in magnetic resonance imaging.

17 Under certain conditions on the matrix $A$ and the true solution $x^\dagger$:
$\|x_\alpha - x^\dagger\| \le C\epsilon$
The result holds under $\delta_{3s} + 3\delta_{4s} < 2$; $n$ is nearly of order $s$, i.e., $n \sim s$ up to log factors.
The reconstruction error is of the same order as the data error $\epsilon$, much better than for classical inverse problems, where only sublinear rates hold under much stronger conditions.
There are other methods that achieve similar errors.

18 [Figure: an example from radar imaging, (c) Potter et al. 2010; passive imaging, backhoe data; conventional vs. sparsity reconstruction.]

19 Convex functions: $f(x)$ is convex over its domain $\mathrm{dom}(f)$ if
$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda)f(x_2)$ for all $\lambda \in [0,1]$, $x_1, x_2 \in \mathrm{dom}(f)$.
$f$ is concave if $-f$ is convex.
$f$ is strictly convex if $f(\lambda x_1 + (1-\lambda)x_2) < \lambda f(x_1) + (1-\lambda)f(x_2)$ for all $\lambda \in (0,1)$, $x_1 \ne x_2 \in \mathrm{dom}(f)$.
If $f$ is differentiable, $f(x_2) \ge f(x_1) + (\nabla f(x_1), x_2 - x_1)$: the first-order Taylor expansion is a global under-estimator.
How to verify: by definition; if $f$ is twice differentiable: convex $\iff \nabla^2 f \succeq 0$.

20 The $\ell_1$ term is not differentiable, but a generalized derivative exists: a vector $g \in \mathbb{R}^n$ is a subgradient of a convex function $f : \mathbb{R}^n \to \mathbb{R}$ at $x_0$ if
$f(x) - f(x_0) \ge \langle x - x_0, g \rangle$ for all $x \in \mathrm{dom}(f)$,
i.e., $f(x) \ge f(x_0) + \langle x - x_0, g \rangle$ for all $x \in \mathrm{dom}(f)$.
The set of subgradients at $x_0$ is denoted by $\partial f(x_0)$; if $f$ is differentiable at $x_0$, then $\partial f(x_0) = \{\nabla f(x_0)\}$.
Property: $x^*$ is a minimizer of $f$ if and only if $0 \in \partial f(x^*)$.
Sum rules hold (under certain mild conditions).

21 The subdifferential of $f(t) = |t|$:
at $t \ne 0$, $f$ is differentiable, so $\partial f(t) = \{f'(t)\} = \{\mathrm{sign}(t)\}$;
at $t = 0$, $f$ is not differentiable: any constant $c$ s.t. $|t| = f(t) \ge f(0) + c(t - 0) = ct$ for all $t \in \mathbb{R}$ works, i.e., $-1 \le c \le 1$, so $\partial(|\cdot|)(0) = [-1,1]$.
Hence
$\partial(|t|) = \begin{cases} 1, & t > 0, \\ -1, & t < 0, \\ [-1,1], & t = 0. \end{cases}$

22 One-dimensional example: for fixed $t$,
$f(s) = \tfrac12(t-s)^2 + \alpha|s|$
The function is strictly convex, so it has a unique minimizer.
$f(s) = \tfrac12(t-s)^2 + \alpha|s| = \begin{cases} \tfrac12(t-s)^2 + \alpha s, & s > 0, \\ \tfrac12(t-s)^2 - \alpha s, & s \le 0. \end{cases}$
Suppose $t > 0$. If the minimum is achieved at $s^* > 0$, then $s^* = t - \alpha > 0$ and $f(s^*) = \tfrac12\alpha^2 + \alpha(t-\alpha)$; at $s^* = 0$, $f(s^*) = \tfrac12 t^2$. Hence $t - \alpha \ge 0 \Rightarrow s^* = t - \alpha$, and $t - \alpha < 0 \Rightarrow s^* = 0$.

23 $s^* = S_\alpha(t) = \begin{cases} t - \alpha, & t > \alpha, \\ 0, & |t| \le \alpha, \\ t + \alpha, & t < -\alpha. \end{cases}$
The optimality condition is $0 \in (s - t) + \alpha\,\partial|s|$, i.e., $t \in s + \alpha\,\partial|s|$.
Soft-thresholding operator: $S_\alpha(t) = (\alpha\,\partial|\cdot| + I)^{-1}(t)$.
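A one-line numpy sketch of the soft-thresholding operator, applied componentwise (illustrative; the function name is not from the slides):

```python
import numpy as np

def soft_threshold(t, alpha):
    # S_alpha(t): shrink towards zero by alpha, set to zero if |t| <= alpha
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)
```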

24 [Figure: the soft-thresholding operator $S_\alpha(t)$ as a function of $t$.] It shrinks the value, and zeroes it if it is small.

25 Convex approach: the $\ell_1$ penalty. Popular approach: basis pursuit, or lasso (Chen et al. 1998, Tibshirani 1996):
$\min_{x \in \mathbb{R}^p} J_\alpha(x) = \tfrac12\|Ax - b\|^2 + \alpha\|x\|_1$
How can we obtain such nice pictures numerically?

26 Iterative soft thresholding (Daubechies-Defrise-De Mol 2005): an iterative algorithm for computing the solution by a surrogate function approach (majorization-minimization, optimization transfer).
Observations on $J_\alpha(x) = \tfrac12\|Ax - b\|^2 + \alpha\|x\|_1$:
if $A = I$, the problem is easy: $J_\alpha(x) = \sum_i \left(\tfrac12(x_i - b_i)^2 + \alpha|x_i|\right)$ decouples into one-dimensional problems;
if $A$ is orthonormal, i.e., $A^\top A = I$, then $J_\alpha(x) = \tfrac12\|x - A^\top b\|^2 + \alpha\|x\|_1$ up to a constant.

27 The presence of an operator $A$ couples the components; use a surrogate function. Expanding around the current guess $x^k$,
$f(x) = \tfrac12\|Ax - b\|^2 = \tfrac12\|A(x - x^k) + Ax^k - b\|^2 = \tfrac12\|A(x - x^k)\|^2 + \langle A(x - x^k), Ax^k - b \rangle + \tfrac12\|Ax^k - b\|^2$
$\le \tfrac{\tau_k}{2}\|x - x^k\|^2 + \langle x - x^k, A^\top(Ax^k - b) \rangle + \tfrac12\|Ax^k - b\|^2 =: Q(x, x^k)$
It is easy to verify that $Q(x^k, x^k) = f(x^k)$, $\nabla Q(x^k, x^k) = \nabla f(x^k)$, and further $Q(x, x^k) \ge f(x)$ if $\tau_k \ge \|A\|^2$.
Algorithm: solve the simplified minimization problem $x^{k+1} \in \arg\min_x Q(x, x^k) + \alpha\|x\|_1$.

28 Approximate minimization problem: $x^{k+1} = \arg\min_x Q(x, x^k) + \alpha\|x\|_1$. Up to constants,
$Q(x, x^k) + \alpha\|x\|_1 = \tfrac{\tau_k}{2}\|x - x^k\|^2 + \langle A^\top(Ax^k - b), x - x^k \rangle + \alpha\|x\|_1$
$= \tfrac{\tau_k}{2}\|x - (x^k - \tau_k^{-1}A^\top(Ax^k - b))\|^2 + \alpha\|x\|_1 - \tfrac{1}{2\tau_k}\|A^\top(Ax^k - b)\|^2$
Let $\bar x^{k+1} = x^k - \tau_k^{-1}A^\top(Ax^k - b)$; then
$Q(x, x^k) + \alpha\|x\|_1 = \tfrac{\tau_k}{2}\|x - \bar x^{k+1}\|^2 + \alpha\|x\|_1 + \text{cnst} = \sum_i \left(\tfrac{\tau_k}{2}(x_i - \bar x^{k+1}_i)^2 + \alpha|x_i|\right) + \text{cnst}$

29 The one-dimensional optimization problem $\min_s \tfrac12(s - t)^2 + \alpha|s|$: the solution $S_\alpha(t)$ is given by
$S_\alpha(t) = \begin{cases} t - \alpha, & t > \alpha, \\ 0, & |t| \le \alpha, \\ t + \alpha, & t < -\alpha. \end{cases}$
It shrinks the value, and zeroes it if it is small.

30 Iterative soft thresholding (Daubechies-Defrise-De Mol, 2005): given an initial guess $x^0$, update the solution iteratively by
$\bar x^{k+1} = x^k - \tau^{-1}A^\top(Ax^k - b)$ (gradient descent)
$x^{k+1} = S_{\tau^{-1}\alpha}(\bar x^{k+1})$ (thresholding)
The iterative thresholding iteration is a (nonlinear) gradient descent method; its convergence is slow...
An adaptive choice of step size can improve convergence...
Primal dual active set (PDAS) algorithm: PDAS = Newton method, for a class of convex optimization problems.
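A minimal numpy sketch of the iteration with the constant step $\tau = \|A\|^2$, which guarantees the majorization $Q \ge f$ above (function and parameter names are illustrative):

```python
import numpy as np

def ista(A, b, alpha, n_iter=500):
    # Iterative soft thresholding for min 0.5*||Ax-b||^2 + alpha*||x||_1
    tau = np.linalg.norm(A, 2) ** 2               # spectral norm squared
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x_bar = x - A.T @ (A @ x - b) / tau       # gradient descent step
        x = np.sign(x_bar) * np.maximum(np.abs(x_bar) - alpha / tau, 0.0)  # S_{alpha/tau}
    return x
```

With the synthetic instance sketched earlier, a call such as `ista(A, b, alpha=0.1)` should place its large entries on the true support, though `alpha` needs tuning to the noise level.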

31 Choice I: Cauchy step size (Cauchy 1847)
$\tau_k = \arg\min_{\tau > 0} \|A(x^k - \tau A^\top(Ax^k - b)) - b\|^2$
i.e., with $d^k = A^\top(Ax^k - b)$,
$\tau_k = \dfrac{\|A^\top(Ax^k - b)\|^2}{\|AA^\top(Ax^k - b)\|^2} = \dfrac{\|d^k\|^2}{\|Ad^k\|^2}$

32 Choice II: Barzilai-Borwein rule (Barzilai-Borwein 1988): use the preceding two iterates to decide the step size.
General quasi-Newton method: $x^{k+1} = x^k - (B^k)^{-1}g^k$ with $B^k(x^k - x^{k-1}) = g^k - g^{k-1}$ (quasi-Newton relation).
Select $D^k = \tau_k I$ and $x^{k+1} = x^k - D^k g^k$ to mimic the quasi-Newton method (in a least-squares sense):
$\min_\tau \|(x^k - x^{k-1}) - \tau(g^k - g^{k-1})\|^2 \;\Rightarrow\; \tau_k = \dfrac{\langle x^k - x^{k-1}, g^k - g^{k-1} \rangle}{\|g^k - g^{k-1}\|^2}$
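Minimal sketches of the two step-size rules above (assuming numpy; $g$ denotes the gradient $A^\top(Ax - b)$, names illustrative):

```python
import numpy as np

def cauchy_step(A, x, b):
    d = A.T @ (A @ x - b)                  # gradient direction d^k
    return (d @ d) / (np.linalg.norm(A @ d) ** 2)

def bb_step(x, x_prev, g, g_prev):
    s, y = x - x_prev, g - g_prev          # differences of iterates and gradients
    return (s @ y) / (y @ y)               # tau_k = <s, y> / ||y||^2
```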

33 Fast iterative shrinkage-thresholding algorithm (FISTA) (Nesterov 1980s, Beck-Teboulle 2008): set $x^1 = x^0$, $z^1 = x^0$, $t_1 = 1$, and for $k \ge 1$, with the extrapolated point $z^k$:
$x^k = S_\alpha(z^k - A^\top(Az^k - b))$
$t_{k+1} = \dfrac{1 + \sqrt{1 + 4t_k^2}}{2}$
$z^{k+1} = x^k + \dfrac{t_k - 1}{t_{k+1}}(x^k - x^{k-1})$
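A minimal numpy sketch of FISTA with step size $1/L$, $L = \|A\|^2$ (a common safeguard assumed here, not stated on the slide):

```python
import numpy as np

def fista(A, b, alpha, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2
    x = x_prev = np.zeros(A.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        w = z - A.T @ (A @ z - b) / L                               # gradient step at z^k
        x = np.sign(w) * np.maximum(np.abs(w) - alpha / L, 0.0)     # soft thresholding
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + (t - 1.0) / t_next * (x - x_prev)                   # extrapolation
        x_prev, t = x, t_next
    return x
```

The extrapolation step is what distinguishes FISTA from plain iterative soft thresholding and yields the faster $O(1/k^2)$ decay of the objective.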

34 Observation: the one-dimensional problem can be solved easily! The presence of the operator $A$ messes things up, so we update the solution componentwise (theory: P. Tseng, 2001):
$x^k_1 \in \arg\min_{x_1} J_\alpha(x_1, x^{k-1}_2, \dots, x^{k-1}_p)$
$x^k_2 \in \arg\min_{x_2} J_\alpha(x^k_1, x_2, x^{k-1}_3, \dots, x^{k-1}_p)$
$\vdots$
$x^k_p \in \arg\min_{x_p} J_\alpha(x^k_1, x^k_2, \dots, x^k_{p-1}, x_p)$
The sequence has a subsequence converging to a minimizer, and the sequence of function values converges to the minimum.
Revived interest in statistics: Friedman et al. 2007.

35 Simple case $\alpha = 0$, $f(x) = \tfrac12\|Ax - b\|^2$: minimizing over $x_i$, with all $x_j$, $j \ne i$, fixed, i.e.,
$0 = \partial_i f(x) = A_i^\top(Ax - b) = A_i^\top(A_i x_i + A_{-i}x_{-i} - b) \;\Rightarrow\; x_i = \dfrac{A_i^\top(b - A_{-i}x_{-i})}{A_i^\top A_i}$
Coordinate descent repeats this for $i = 1, 2, \dots$; equivalently,
$x_i = \dfrac{A_i^\top r}{\|A_i\|^2} + x_i^{\mathrm{old}}$ with $r = b - Ax$,
an $O(n)$ operation per update.

36 $\ell_1$ problem: minimization over $x_i$, with $x_j$, $j \ne i$, fixed:
$0 \in A_i^\top A_i x_i + A_i^\top(A_{-i}x_{-i} - b) + \alpha s_i$, $s_i \in \partial|x_i|$
$\Rightarrow\; x_i = S_{\alpha/\|A_i\|^2}\!\left(\dfrac{A_i^\top(b - A_{-i}x_{-i})}{\|A_i\|^2}\right)$
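A minimal numpy sketch of coordinate descent for the $\ell_1$ problem, keeping the residual $r = b - Ax$ up to date so that each coordinate update costs $O(n)$ (names illustrative):

```python
import numpy as np

def cd_lasso(A, b, alpha, n_cycles=100):
    n, p = A.shape
    x = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0)                  # ||A_i||^2
    r = b - A @ x                                  # residual
    for _ in range(n_cycles):
        for i in range(p):
            rho = A[:, i] @ r + col_sq[i] * x[i]   # A_i^T (b - A_{-i} x_{-i})
            x_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[i]
            r += A[:, i] * (x[i] - x_new)          # keep residual consistent
            x[i] = x_new
    return x
```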

37 Iteratively reweighted least squares (IRLS). Another viewpoint: recall that for the quadratic penalty
$\tfrac12\|Ax - b\|^2 + \tfrac{\alpha}{2}\|x\|^2$
the optimal solution $x_\alpha$ satisfies the optimality system
$A^\top(Ax_\alpha - b) + \alpha x_\alpha = 0$, i.e., $(A^\top A + \alpha I)x_\alpha = A^\top b$.
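For reference, a numpy sketch of the closed-form Tikhonov (ridge) solution via these normal equations:

```python
import numpy as np

def tikhonov(A, b, alpha):
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(p), A.T @ b)   # (A^T A + alpha I) x = A^T b
```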

38 To take advantage of the quadratic structure, we rewrite the $\ell_1$ problem (given the current estimate $x^k$) as
$\tfrac12\|Ax - b\|^2 + \alpha\, x^\top W^k x$, with $W^k = \mathrm{diag}(|x^k_i|^{-1})$.
This gives the iterative scheme (with regularization by a small $\epsilon > 0$):
$x^{k+1} = \left(A^\top A + 2\alpha\,\mathrm{diag}\big((|x^k_i| + \epsilon)^{-1}\big)\right)^{-1} A^\top b$.
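A minimal numpy sketch of the resulting IRLS iteration; the factor $2\alpha$ in the weights comes from differentiating $\alpha\, x^\top W^k x$, and the small `eps` is the smoothing parameter mentioned above (names illustrative):

```python
import numpy as np

def irls_l1(A, b, alpha, n_iter=50, eps=1e-6):
    p = A.shape[1]
    x = np.linalg.solve(A.T @ A + alpha * np.eye(p), A.T @ b)   # start from the ridge solution
    for _ in range(n_iter):
        w = 2.0 * alpha / (np.abs(x) + eps)                     # reweighting from current iterate
        x = np.linalg.solve(A.T @ A + np.diag(w), A.T @ b)
    return x
```

Each step solves a weighted ridge problem, so the cost per iteration is one linear solve.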

39 What lies beyond:
Newton-type methods
nonlinear forward operators (many medical imaging problems)
structured sparsity patterns
total variation regularization
nonconvex penalties (which can be solved efficiently)

40 [Figure: probability of exact recovery of the true support vs. sparsity level, for the $\ell_0$, $\ell_{1/2}$, $\ell_1$, SCAD, MCP and capped-$\ell_1$ penalties.] Setting: Gaussian $\Psi$ and noise, DR $= 10^3$, $\sigma = 0.01$, with 500 data points. The $\ell_1$ penalty allows exact support recovery only if the solution is very sparse; nonconvex models allow recovering far more nonzeros.
