Sparsity Regularization
1 Sparsity Regularization. Bangti Jin. Course: Inverse Problems & Imaging.
2 Outline: 1. Motivation: sparsity? 2. Mathematical preliminaries 3. l1 solvers
3 Problem setup (finite-dimensional formulation): x ∈ R^p is the unknown signal; b = Ax + η, where η ∈ R^n is additive Gaussian noise with noise level ε = ‖η‖; A ∈ R^{n×p} with p ≫ n and normalized columns, i.e., ‖A_i‖ = 1. The problem has infinitely many solutions (if it has one); which one shall we take?
4 Insights from exact data. Toy example: find a reasonable solution to x_1 + 2x_2 = 5. There are infinitely many solutions; which one shall we take? Convention: least squares (Gauss 1809): min_x ‖x‖_2² s.t. x_1 + 2x_2 = 5, whose solution is (1, 2). Generalized minimum-energy solution: min_x |x_1|^p + |x_2|^p, 0 < p ≤ 2, s.t. x_1 + 2x_2 = 5.
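The toy example can be checked numerically. A minimal sketch with numpy (variable names are ours): the minimum l2-norm solution comes from the pseudoinverse, and the minimum l1-norm solution is found by a grid search along the constraint line, for illustration only:

```python
import numpy as np

# Constraint: A x = b with A = [1, 2], b = 5.
A = np.array([[1.0, 2.0]])
b = np.array([5.0])

# Minimum l2-norm solution: x = A^+ b (pseudoinverse).
x_l2 = np.linalg.pinv(A) @ b          # least-squares solution (1, 2)

# Minimum l1-norm solution: parametrize the line x2 = (5 - x1)/2
# and minimize |x1| + |x2| over a fine grid.
x1 = np.linspace(-10.0, 10.0, 200001)
x2 = (5.0 - x1) / 2.0
i = np.argmin(np.abs(x1) + np.abs(x2))
x_l1 = np.array([x1[i], x2[i]])       # sparse solution (0, 2.5)
```

The l2 convention spreads the mass over both components, while the l1 convention concentrates it on one component, which is the sparsity-promoting behavior the slides are after.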
5 [Figure: minimum l_p-norm solutions of x_1 + 2x_2 = 5 for p = 2, 3/2, 1, 1/2; the l_2 solution is (1, 2), while p ≤ 1 selects a sparse solution.]
6 In case of noisy data: Tikhonov regularization, min_x (1/2)‖Ax − b‖² + α ψ(x). Two possible choices of ψ(x) (convexity...): classical Tikhonov regularization, ψ(x) = (1/2)‖x‖_2² = (1/2) Σ_i |x_i|²; sparsity regularization, ψ(x) = (1/p)‖x‖_p^p = (1/p) Σ_i |x_i|^p, p ∈ [0, 1], and general analogues...
7 In case of noisy data: Tikhonov regularization, (1/2)‖Ax − b‖² + α ψ(x). Assumption on the data: i.i.d. additive Gaussian noise, b_i = b̄_i + ξ_i with ξ_i ∼ N(0, σ²), so the likelihood is p(b|x) ∝ e^{−(1/2σ²)‖Ax − b‖²}. Assumption on the prior knowledge: p(x) ∝ e^{−λψ(x)}. Classical Tikhonov regularization corresponds to a Gaussian prior distribution; sparsity regularization corresponds to a Laplace distribution.
8 [Figure: Gaussian vs. Laplace densities; the Laplace density is more sharply peaked at zero.]
9 [Figure: Gaussian vs. Laplace priors.]
10 The energy can be more general: ψ(x) = ψ(Wx) under a certain transformation W, e.g., wavelet, framelet, curvelet, shearlet... The discussion below extends to these more complex cases.
11 A natural idea for a sparse solution is to penalize the number of unknowns: (1/2)‖Ax − b‖² + α‖x‖_0, where ‖x‖_0 = #(nonzeros in x). Conceptually intuitive, but computationally very challenging. Approximations: the bridge penalty ‖x‖_q^q = Σ_i |x_i|^q, q ∈ (0, 1), and the l1 penalty ‖x‖_1 = Σ_i |x_i|.
13 The l0 penalty is the genuine choice, but VERY challenging; there are different ways to approximate it. The lq penalty is one approximation, and there are many others. The l1 penalty is especially popular, since l1 is convex; further, there is a solid theory.
14 Notation: S = {1, ..., p}. For I ⊆ S, x_I denotes the subvector consisting of the entries of x indexed by i ∈ I, and A_I denotes the submatrix consisting of the columns of A indexed by i ∈ I.
15 Restricted isometry property (RIP): A satisfies RIP of order s if there is a δ ∈ (0, 1) such that (1 − δ)‖c‖² ≤ ‖A_I c‖² ≤ (1 + δ)‖c‖² for all I ⊆ S with |I| ≤ s. With δ_s the smallest constant for which RIP holds, δ_s := inf{δ : (1 − δ)‖c‖² ≤ ‖A_I c‖² ≤ (1 + δ)‖c‖² for all |I| ≤ s, c ∈ R^{|I|}}, this is denoted RIP(s, δ_s). RIP(s, δ_s) implies 1 − δ_s ≤ λ_min(A_I^T A_I) ≤ λ_max(A_I^T A_I) ≤ 1 + δ_s, i.e., every submatrix A_I is fairly well-conditioned. The constant δ_s is difficult to compute.
16 Matrices satisfying RIP: random matrices with i.i.d. Gaussian/Bernoulli entries with zero mean and variance 1/n (so that the columns are normalized in expectation): RIP holds with overwhelming probability if s ≤ c n / log(p/n). Random matrices from the Fourier ensemble (select n rows from the p × p discrete Fourier transform matrix uniformly at random, then re-normalize): RIP holds with overwhelming probability if s ≤ c n / (log p)^6. The DFT matrix is used in magnetic resonance imaging.
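Although δ_s is infeasible to compute exactly (it involves all supports of size s), sampling random supports gives a cheap lower bound on it. A sketch under the Gaussian model above, with our own variable names:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, s = 64, 256, 4
A = rng.standard_normal((n, p)) / np.sqrt(n)   # i.i.d. N(0, 1/n) entries

# delta_s involves all C(p, s) supports; sampling a few yields a lower bound.
delta = 0.0
for _ in range(200):
    I = rng.choice(p, size=s, replace=False)
    G = A[:, I].T @ A[:, I]                    # Gram matrix of the submatrix A_I
    w = np.linalg.eigvalsh(G)                  # eigenvalues in ascending order
    delta = max(delta, abs(1.0 - w[0]), abs(w[-1] - 1.0))

print(delta)   # typically well below 1 for s much smaller than n
```

Each sampled support bounds δ_s from below by how far the extreme eigenvalues of A_I^T A_I stray from 1, which is exactly the well-conditioning statement on the slide.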
17 Under certain conditions on the matrix A and the true solution x†, the reconstruction error satisfies ‖x† − x_α‖ ≤ Cε. The result holds under, e.g., δ_{3s} + 3δ_{4s} < 2; n is nearly of order s, i.e., n ∼ s up to log factors. The reconstruction error is of the same order as the data error ε, much better than for classical inverse problems, where the rates are sublinear in ε and require much stronger conditions. There are other methods that achieve similar errors.
18 An example from radar imaging (© Potter et al 2010): passive imaging, backhoe data; conventional vs. sparsity reconstruction.
19 Convex functions: f(x) is convex over its domain dom(f) if f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2) for all λ ∈ [0, 1] and x_1, x_2 ∈ dom(f). f is concave if −f is convex. f is strictly convex if f(λx_1 + (1 − λ)x_2) < λf(x_1) + (1 − λ)f(x_2) for all λ ∈ (0, 1) and x_1 ≠ x_2 ∈ dom(f). If f is differentiable, f(x_2) ≥ f(x_1) + (∇f(x_1), x_2 − x_1): the first-order Taylor expansion is a global under-estimator. How to verify convexity: by definition, or, if f is twice differentiable: f convex ⟺ ∇²f ⪰ 0.
20 The l1 term is not differentiable, but a generalized derivative exists. A vector g ∈ R^n is a subgradient of a convex function f : R^n → R at x_0 if f(x) − f(x_0) ≥ ⟨x − x_0, g⟩ for all x ∈ dom(f), i.e., f(x) ≥ f(x_0) + ⟨x − x_0, g⟩ for all x ∈ dom(f). The set of subgradients at x_0 is denoted ∂f(x_0). If f is differentiable at x_0, then ∂f(x_0) coincides with {∇f(x_0)}. Property: x* is a minimizer of f if and only if 0 ∈ ∂f(x*). Sum rules hold (under certain mild conditions).
21 The subdifferential of f(t) = |t|: at t ≠ 0, f is differentiable, so ∂f(t) = {f′(t)}, i.e., ∂f(t) = {sign(t)} for t ≠ 0. At t = 0, f is not differentiable: the subgradients are those constants c such that |t| = f(t) ≥ f(0) + c(t − 0) = ct for all t ∈ R, i.e., −1 ≤ c ≤ 1, so ∂(|·|)(0) = [−1, 1]. Hence ∂|t| = {1} for t > 0, {−1} for t < 0, and [−1, 1] for t = 0.
22 One-dimensional example: for fixed t, minimize f(s) = (1/2)(t − s)² + α|s|. The function is strictly convex, so there is a unique minimizer. Explicitly, f(s) = (1/2)(t − s)² + αs for s > 0, and f(s) = (1/2)(t − s)² − αs for s ≤ 0. Suppose t > 0: if the minimum is achieved at some s* > 0, then s* = t − α > 0 and f(s*) = (1/2)α² + α(t − α); if s* = 0, then f(s*) = (1/2)t². Hence if t − α ≥ 0, then s* = t − α; if t − α < 0, then s* = 0.
23 Altogether, s* = S_α(t) = t − α for t > α, 0 for |t| ≤ α, and t + α for t < −α. The optimality condition is 0 ∈ (s − t) + α ∂|s|, i.e., t ∈ s + α ∂|s|, so the soft thresholding operator is S_α(t) = (α ∂|·| + I)^{−1}(t).
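The closed-form operator derived above is one line of numpy; a minimal sketch (the function name is ours):

```python
import numpy as np

def soft_threshold(t, alpha):
    """Soft thresholding S_alpha(t): shrink toward zero by alpha, zero out |t| <= alpha."""
    return np.sign(t) * np.maximum(np.abs(t) - alpha, 0.0)

# With alpha = 1: 3.0 -> 2.0, 0.5 -> 0.0, -2.0 -> -1.0
out = soft_threshold(np.array([3.0, 0.5, -2.0]), 1.0)
```

The formula combines all three cases of the piecewise definition at once: the sign handles t < −α vs. t > α, and the maximum handles the dead zone |t| ≤ α.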
24 [Figure: graph of S_α(t) against t; it shrinks the value, and zeros it if |t| is small.]
25 Convex approach: the l1 penalty. A popular approach is basis pursuit, or the lasso (Chen et al 1998, Tibshirani 1996): min_{x ∈ R^p} J_α(x) = (1/2)‖Ax − b‖² + α‖x‖_1. How can we obtain such nice pictures numerically?
26 Iterative soft thresholding (Daubechies-Defrise-De Mol 2005): an iterative algorithm for computing the solution by the surrogate function approach (majorization-minimization, optimization transfer). Observations: J_α(x) = (1/2)‖Ax − b‖² + α‖x‖_1. If A = I, the problem is easy: J_α(x) = Σ_i ((1/2)(x_i − b_i)² + α|x_i|), so the problem decouples into p one-dimensional problems. If A is orthonormal, i.e., A^T A = I, then J_α(x) = (1/2)‖x − A^T b‖² + α‖x‖_1 up to an additive constant.
27 The presence of a general operator A couples the components; use a surrogate function. Let f(x) = (1/2)‖Ax − b‖². Given the current guess x^k,
f(x) = (1/2)‖A(x − x^k) + Ax^k − b‖² = (1/2)‖A(x − x^k)‖² + ⟨A(x − x^k), Ax^k − b⟩ + (1/2)‖Ax^k − b‖²
≤ (τ_k/2)‖x − x^k‖² + ⟨x − x^k, A^T(Ax^k − b)⟩ + (1/2)‖Ax^k − b‖² =: Q(x, x^k).
It is easy to verify that Q(x^k, x^k) = f(x^k) and ∇_x Q(x^k, x^k) = ∇f(x^k), and further that Q(x, x^k) ≥ f(x) if τ_k ≥ ‖A‖². Algorithm: solve the simplified minimization problem x^{k+1} ∈ arg min_x Q(x, x^k) + α‖x‖_1.
28 Approximate minimization problem: x^{k+1} = arg min_x Q(x, x^k) + α‖x‖_1. Completing the square,
Q(x, x^k) + α‖x‖_1 = (τ_k/2)‖x − x^k‖² + ⟨A^T(Ax^k − b), x − x^k⟩ + α‖x‖_1 + const
= (τ_k/2)‖x − (x^k − τ_k^{−1} A^T(Ax^k − b))‖² + α‖x‖_1 − (1/(2τ_k))‖A^T(Ax^k − b)‖² + const.
Let x̄^{k+1} = x^k − τ_k^{−1} A^T(Ax^k − b); then
Q(x, x^k) + α‖x‖_1 = (τ_k/2)‖x − x̄^{k+1}‖² + α‖x‖_1 + const = Σ_i ((τ_k/2)(x_i − x̄_i^{k+1})² + α|x_i|) + const.
29 The one-dimensional optimization problem min_s (1/2)(s − t)² + α|s| has the solution S_α(t) = t − α for t > α, 0 for |t| ≤ α, and t + α for t < −α. It shrinks the value, and zeros it if |t| is small.
30 Iterative soft thresholding (Daubechies-Defrise-De Mol 2005): given an initial guess x^0, update the solution iteratively by x̄^{k+1} = x^k − τ^{−1} A^T(Ax^k − b) (gradient descent) and x^{k+1} = S_{τ^{−1}α}(x̄^{k+1}) (thresholding). The iterative thresholding iteration is a (nonlinear) gradient descent method, and its convergence is slow... An adaptive choice of step size can improve convergence... Primal dual active set (PDAS) algorithm: PDAS = Newton method, for a class of convex optimization problems.
31 Choice I: Cauchy step size (Cauchy 1847): τ_k = arg min_{τ>0} ‖A(x^k − τ A^T(Ax^k − b)) − b‖², i.e., with d^k = A^T(Ax^k − b), τ_k = ‖A^T(Ax^k − b)‖² / ‖AA^T(Ax^k − b)‖² = ‖d^k‖² / ‖Ad^k‖².
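The exact line-search formula above costs only two extra matrix-vector products; a minimal sketch (the function name is ours):

```python
import numpy as np

def cauchy_step(A, x, b):
    """Exact line-search step along the negative gradient of (1/2)||Ax-b||^2."""
    d = A.T @ (A @ x - b)      # gradient direction d^k
    Ad = A @ d
    return (d @ d) / (Ad @ Ad)
```

By construction, moving x ← x − τ_k d minimizes the residual along the gradient ray, so no shorter or longer step in that direction can do better.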
32 Choice II: Barzilai-Borwein rule (Barzilai-Borwein 1988): use the preceding two iterates to decide the step size. General quasi-Newton method: x^{k+1} = x^k − (B^k)^{−1} g^k with B^k(x^k − x^{k−1}) = g^k − g^{k−1} (quasi-Newton relation). Select D^k = τ_k I and x^{k+1} = x^k − D^k g^k to mimic the quasi-Newton method in the least-squares sense: min_τ ‖(x^k − x^{k−1}) − τ(g^k − g^{k−1})‖², which gives τ_k = ⟨x^k − x^{k−1}, g^k − g^{k−1}⟩ / ‖g^k − g^{k−1}‖².
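For the smooth least-squares part, the BB rule plugs into plain gradient descent as follows (a hedged sketch with our own names; the guard avoids a 0/0 step once the gradient has converged):

```python
import numpy as np

def bb_gradient_descent(A, b, iters=100):
    """Gradient descent for (1/2)||Ax-b||^2 with Barzilai-Borwein step sizes."""
    x = np.zeros(A.shape[1])
    g = A.T @ (A @ x - b)                      # current gradient
    tau = 1.0 / np.linalg.norm(A, 2) ** 2      # safe first step
    for _ in range(iters):
        x_new = x - tau * g
        g_new = A.T @ (A @ x_new - b)
        s, y = x_new - x, g_new - g
        x, g = x_new, g_new
        if y @ y < 1e-30:                      # converged: avoid 0/0 in the BB ratio
            break
        tau = (s @ y) / (y @ y)                # BB rule: argmin_tau ||s - tau*y||^2
    return x

# Demo on an overdetermined least-squares problem.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
b = A @ rng.standard_normal(10)
x = bb_gradient_descent(A, b, iters=200)
```

On quadratics like this one, the BB step typically converges far faster than a fixed small step, at the cost of nonmonotone objective values along the way.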
33 Fast iterative shrinkage-thresholding algorithm (FISTA) (Nesterov 1980s, Beck-Teboulle 2008): set x^1 = x^0, z^1 = x^0, t^1 = 1, and for k ≥ 1 threshold at the extrapolated point z^k: x^k = S_α(z^k − A^T(Az^k − b)), t^{k+1} = (1 + √(1 + 4(t^k)²))/2, z^{k+1} = x^k + ((t^k − 1)/t^{k+1})(x^k − x^{k−1}).
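The momentum wrapping around the thresholded gradient step can be sketched as follows (our names; we scale the gradient step by 1/‖A‖², as for the plain iterative thresholding, since the slide's unit step presumes ‖A‖ ≤ 1):

```python
import numpy as np

def fista(A, b, alpha, iters=500):
    """FISTA for (1/2)||Ax-b||^2 + alpha*||x||_1 (Beck-Teboulle style)."""
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the gradient
    x = x_old = np.zeros(A.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(iters):
        xb = z - A.T @ (A @ z - b) / L           # gradient step at extrapolated point
        x = np.sign(xb) * np.maximum(np.abs(xb) - alpha / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_new) * (x - x_old)   # momentum extrapolation
        x_old, t = x, t_new
    return x

# Same sparse-recovery demo as for plain iterative thresholding.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 40)) / np.sqrt(20)
x_true = np.zeros(40)
x_true[[3, 17]] = [2.0, -1.5]
b = A @ x_true
x = fista(A, b, alpha=1e-3, iters=1000)
```

The only differences from the unaccelerated method are the extrapolated point z^k and the t-sequence, yet the objective decays at rate O(1/k²) instead of O(1/k).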
34 Observation: the one-dimensional problem can be solved easily! The presence of the operator A messes things up, so we update the solution componentwise (theoretically justified by P. Tseng, 2001):
x_1^k ∈ arg min_{x_1} J_α(x_1, x_2^{k−1}, ..., x_p^{k−1}),
x_2^k ∈ arg min_{x_2} J_α(x_1^k, x_2, x_3^{k−1}, ..., x_p^{k−1}),
...,
x_p^k ∈ arg min_{x_p} J_α(x_1^k, x_2^k, ..., x_{p−1}^k, x_p).
The sequence has a subsequence converging to a minimizer, and the sequence of function values converges to the minimum. Revived interest in statistics (Friedman et al 2007).
35 Simple case α = 0, f(x) = (1/2)‖Ax − b‖². Minimizing over x_i with all x_j, j ≠ i, fixed:
0 = ∂_i f(x) = A_i^T(Ax − b) = A_i^T(A_i x_i + A_{−i} x_{−i} − b), i.e., x_i = A_i^T(b − A_{−i} x_{−i}) / (A_i^T A_i).
Coordinate descent repeats this for i = 1, 2, ..., p, 1, 2, ...; with the residual r = b − Ax, the update reads x_i ← A_i^T r / ‖A_i‖² + x_i^old, an O(n) operation per coordinate update.
36 l1 problem: minimization over x_i, with x_j, j ≠ i, fixed:
0 ∈ A_i^T A_i x_i + A_i^T(A_{−i} x_{−i} − b) + α s_i, s_i ∈ ∂|x_i|,
which gives x_i = S_{α/‖A_i‖²}(A_i^T(b − A_{−i} x_{−i}) / ‖A_i‖²).
37 Iteratively reweighted least-squares (IRLS). Another viewpoint: recall that for the quadratic penalty (1/2)‖Ax − b‖² + (α/2)‖x‖², the optimal solution x_α satisfies the optimality system A^T(Ax_α − b) + α x_α = 0, i.e., (A^T A + αI) x_α = A^T b.
38 To take advantage of the quadratic structure, we rewrite the l1 problem (given the current estimate x^k) as (1/2)‖Ax − b‖² + (α/2) x^T W^k x with W^k = diag(|x_i^k|^{−1}), using the majorization |x_i| ≤ x_i²/(2|x_i^k|) + |x_i^k|/2, with equality at x = x^k. This gives the iterative scheme (with regularization by a small ε > 0): W^k = diag((|x_i^k| + ε)^{−1}), x^{k+1} = (A^T A + α W^k)^{−1} A^T b.
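The reweighting scheme above can be sketched as follows (our names; the small ε guards the weights when entries reach zero, and each iteration is a single linear solve):

```python
import numpy as np

def irls_l1(A, b, alpha, iters=50, eps=1e-6):
    """Iteratively reweighted least squares for (1/2)||Ax-b||^2 + alpha*||x||_1."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]        # minimum-norm starting point
    for _ in range(iters):
        W = np.diag(alpha / (np.abs(x) + eps))      # alpha * diag(1/(|x_i^k| + eps))
        x = np.linalg.solve(A.T @ A + W, A.T @ b)   # regularized normal equations
    return x

# Sparse-recovery demo, as for the other solvers.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 40)) / np.sqrt(20)
x_true = np.zeros(40)
x_true[[3, 17]] = [2.0, -1.5]
b = A @ x_true
x = irls_l1(A, b, alpha=1e-3)
```

As entries shrink, their weights α/(|x_i| + ε) grow, driving small coefficients toward zero while the large coefficients are barely penalized.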
39 What lies beyond: Newton type methods; nonlinear forward operators (many medical imaging problems); structured sparsity patterns; total variation regularization; nonconvex penalties (which can be efficiently solved).
40 [Figure: probability of exact recovery of the true support vs. sparsity level, for the l0, l1/2, l1, SCAD, MCP, and capped-l1 penalties; setting: Gaussian Ψ and noise, DR = 10³, σ = 0.01, with 500 data points.] The l1 penalty allows exact support recovery only if the solution is very sparse; nonconvex models allow recovering far more nonzeros.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization John Duchi, Elad Hanzan, Yoram Singer Vicente L. Malave February 23, 2011 Outline Notation minimize a number of functions φ
More informationMachine Learning for Signal Processing Sparse and Overcomplete Representations
Machine Learning for Signal Processing Sparse and Overcomplete Representations Abelino Jimenez (slides from Bhiksha Raj and Sourish Chaudhuri) Oct 1, 217 1 So far Weights Data Basis Data Independent ICA
More informationSparse Optimization Lecture: Dual Methods, Part I
Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration
More informationRidge Regression. Mohammad Emtiyaz Khan EPFL Oct 1, 2015
Ridge Regression Mohammad Emtiyaz Khan EPFL Oct, 205 Mohammad Emtiyaz Khan 205 Motivation Linear models can be too limited and usually underfit. One way is to use nonlinear basis functions instead. Nonlinear
More informationRecent Developments in Compressed Sensing
Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline
More informationMLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT
MLCC 2018 Variable Selection and Sparsity Lorenzo Rosasco UNIGE-MIT-IIT Outline Variable Selection Subset Selection Greedy Methods: (Orthogonal) Matching Pursuit Convex Relaxation: LASSO & Elastic Net
More informationUsing ADMM and Soft Shrinkage for 2D signal reconstruction
Using ADMM and Soft Shrinkage for 2D signal reconstruction Yijia Zhang Advisor: Anne Gelb, Weihong Guo August 16, 2017 Abstract ADMM, the alternating direction method of multipliers is a useful algorithm
More informationIEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior
More informationStrengthened Sobolev inequalities for a random subspace of functions
Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)
More informationLarge-Scale L1-Related Minimization in Compressive Sensing and Beyond
Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March
More informationInverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology
Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 27 Introduction Fredholm first kind integral equation of convolution type in one space dimension: g(x) = 1 k(x x )f(x
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via convex relaxations Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationIterative regularization of nonlinear ill-posed problems in Banach space
Iterative regularization of nonlinear ill-posed problems in Banach space Barbara Kaltenbacher, University of Klagenfurt joint work with Bernd Hofmann, Technical University of Chemnitz, Frank Schöpfer and
More informationMachine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013
Machine Learning for Signal Processing Sparse and Overcomplete Representations Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 1 Key Topics in this Lecture Basics Component-based representations
More informationSparsity in Underdetermined Systems
Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2
More informationIEOR 265 Lecture 3 Sparse Linear Regression
IOR 65 Lecture 3 Sparse Linear Regression 1 M Bound Recall from last lecture that the reason we are interested in complexity measures of sets is because of the following result, which is known as the M
More information