Sparse Regularization via Convex Analysis
1 Sparse Regularization via Convex Analysis
Ivan Selesnick
Electrical and Computer Engineering, Tandon School of Engineering
New York University, Brooklyn, New York, USA
2 Convex or non-convex: Which is better for inverse problems?
Benefits of convex optimization:
1. Absence of suboptimal local minima
2. Continuity of the solution as a function of the input data
3. Algorithms guaranteed to converge to a global optimum
4. Regularization parameters are easier to set
But convex regularization tends to under-estimate signal values. Non-convex regularization often performs better!
Can we design non-convex sparsity-inducing penalties that maintain the convexity of the cost function to be minimized?
3 Convex function
[Figure: graph of a convex function]
4 Non-convex function
[Figure: graph of a non-convex function]
5 Goal
Goal: Find a sparse approximate solution to a linear system $y = Ax$. Minimize a cost function:
$J(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1$
or
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\,\psi(x)$
Question: How to define $\psi$? Let us allow $\psi$ to be non-convex such that $F$ is convex. This is the Convex Non-Convex (CNC) approach.
6 Biosensor Signal
[Figure: biosensor data vs. time $n$]
7 Linear Filter
Given noisy data $y \in \mathbb{R}^N$, perform smoothing via
$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \Big\{ \sum_{n} |y(n) - x(n)|^2 + \lambda \sum_{n} |x(n) - x(n-1)|^2 \Big\}$
which can be written as
$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \big\{ \|y - x\|_2^2 + \lambda\|Dx\|_2^2 \big\}$
where $\|x\|_2^2 := \sum_n |x(n)|^2$ and $D$ is the first-order difference matrix
$D = \begin{bmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}$
Solution: $\hat{x} = (I + \lambda D^T D)^{-1} y$
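The closed-form solution above can be sketched in a few lines. This is a minimal illustration (the helper name `linear_filter` is hypothetical), assuming SciPy's sparse machinery; since $I + \lambda D^T D$ is tridiagonal, the solve is fast.

```python
import numpy as np
from scipy.sparse import eye, diags
from scipy.sparse.linalg import spsolve

def linear_filter(y, lam):
    """Quadratic smoothing: solve (I + lam*D'D) x = y, where D is the
    (N-1) x N first-order difference matrix."""
    N = len(y)
    D = diags([-np.ones(N - 1), np.ones(N - 1)], [0, 1], shape=(N - 1, N))
    A = eye(N) + lam * (D.T @ D)      # tridiagonal system matrix
    return spsolve(A.tocsc(), y)

# Example: smoothing a noisy ramp; larger lam gives a smoother result.
rng = np.random.default_rng(0)
y = np.linspace(0, 1, 200) + 0.1 * rng.standard_normal(200)
x = linear_filter(y, lam=25.0)
```

The returned `x` satisfies the normal equations $x + \lambda D^T D x = y$ exactly (up to solver precision).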
8 Biosensor Signal
[Figure: biosensor data and the linear-filter output vs. time $n$]
9 Total Variation Denoising (Nonlinear Filter)
Given noisy data $y \in \mathbb{R}^N$, perform smoothing via
$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \Big\{ \tfrac12\sum_{n} |y(n) - x(n)|^2 + \lambda \sum_{n} |x(n) - x(n-1)| \Big\}$
which can be written as
$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \big\{ \tfrac12\|y - x\|_2^2 + \lambda\|Dx\|_1 \big\}$
where $\|x\|_2^2 := \sum_n |x(n)|^2$, $\|x\|_1 := \sum_n |x(n)|$, and $D$ is the first-order difference matrix as before.
Solution? There is no closed-form solution. Use an iterative algorithm... but note the cost function is not differentiable!
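One standard iterative scheme (not necessarily the one used in the talk) is projected gradient descent on the dual problem, a Chambolle-style sketch; the function name `tv_denoise_1d` is hypothetical, and direct non-iterative 1-D solvers also exist (e.g. Condat's algorithm).

```python
import numpy as np

def tv_denoise_1d(y, lam, n_iter=5000):
    """1-D TV denoising, min_x 0.5||y-x||^2 + lam*||Dx||_1, via projected
    gradient on the dual: min_z 0.5||y - D'z||^2 s.t. |z_n| <= lam,
    with the primal solution recovered as x = y - D'z."""
    y = np.asarray(y, dtype=float)
    z = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        x = y + np.diff(z, prepend=0, append=0)         # x = y - D'z
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)   # step 0.25 <= 1/||DD'||
    return y + np.diff(z, prepend=0, append=0)

# Example: denoise a noisy piecewise-constant signal.
rng = np.random.default_rng(0)
clean = np.repeat([0.0, 2.0, -1.0], 60)
y = clean + 0.3 * rng.standard_normal(clean.size)
x = tv_denoise_1d(y, lam=1.0)
```

The step size 0.25 is safe because the spectral norm of $DD^T$ is at most 4 for the first-order difference matrix.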
10 Biosensor Signal
[Figure: biosensor data and the total variation denoising output vs. time $n$]
11 Two Penalties
The $\ell_1$ norm induces sparsity, unlike the sum of squares.
[Figure: $|x|$ and $x^2$ vs. $x$]
12 Combine Quadratic and Sparse Regularization
$\arg\min_{u,v\in\mathbb{R}^N} \big\{ \tfrac12\|y - u - v\|_2^2 + \lambda_1\|Du\|_1 + \tfrac{\lambda_2}{2}\|Dv\|_2^2 \big\}, \qquad \hat{x} = \hat{u} + \hat{v}$
13 Combine Quadratic and Sparse Regularization
$\arg\min_{u,v\in\mathbb{R}^N} \big\{ \tfrac12\|y - u - v\|_2^2 + \lambda_1\|Du\|_1 + \tfrac{\lambda_2}{2}\|Dv\|_2^2 \big\}$
Solving for $v$ gives
$v = (I + \lambda_2 D^T D)^{-1}(y - u)$
$x = u + v = (I + \lambda_2 D^T D)^{-1}(y + \lambda_2 D^T D u)$
Substituting $v$ back into the cost function:
$J(u) = \tfrac{\lambda_2}{2}(y-u)^T D^T (I + \lambda_2 D D^T)^{-1} D (y-u) + \lambda_1\|Du\|_1$
or
$J(u) = \tfrac{\lambda_2}{2}\big\|R^{-1} D(y-u)\big\|_2^2 + \lambda_1\|Du\|_1, \qquad R R^T = I + \lambda_2 D D^T$
($R$ is a banded matrix.) Since the cost depends on $Du$, not $u$ directly, define $g = Du$. So we need to minimize
$F(g) = \tfrac{\lambda_2}{2}\big\|R^{-1}(Dy - g)\big\|_2^2 + \lambda_1\|g\|_1$
14 Biosensor Signal
[Figure: biosensor data and SIPS denoising output vs. time $n$]
15 Scalar case
$\hat{x} = \arg\min_x \big\{ \tfrac12(y - x)^2 + \lambda|x| \big\} = \operatorname{soft}(y, \lambda) := \operatorname{sign}(y)\max(|y| - \lambda,\, 0)$
[Figure: the soft-threshold function $\hat{x}$ vs. $y$, with threshold $\lambda$]
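The scalar solution above is soft thresholding; a minimal sketch, with a brute-force grid search as a sanity check (the grid and test values are illustrative, not from the slides):

```python
import numpy as np

def soft(y, lam):
    """Soft threshold: the exact minimizer of 0.5*(y-x)^2 + lam*|x|."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Sanity check against brute-force minimization on a fine grid.
y, lam = 1.3, 0.5
grid = np.linspace(-3, 3, 120001)
cost = 0.5 * (y - grid) ** 2 + lam * np.abs(grid)
x_bf = grid[np.argmin(cost)]   # agrees with soft(y, lam) to grid resolution
```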
16 Non-convex scalar penalty functions (alternatives to the $\ell_1$ norm)
Log: $\phi_a(x) = \tfrac{1}{a}\log(1 + a|x|)$
Rat: $\phi_a(x) = \dfrac{|x|}{1 + a|x|/2}$
Exp: $\phi_a(x) = \tfrac{1}{a}\big(1 - e^{-a|x|}\big)$
MC: $\phi_a(x) = \begin{cases} |x| - \tfrac{a}{2}x^2, & |x| \le 1/a \\[2pt] \tfrac{1}{2a}, & |x| \ge 1/a \end{cases}$
The penalties are parameterized such that $\phi_a'(0^+) = 1$ and $\phi_a''(0^+) = -a$.
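The four penalties and their common normalization can be checked numerically; a small sketch (function names are mine), verifying $\phi_a(0)=0$, $\phi_a'(0^+)=1$, and $\phi_a''(0^+)=-a$ by finite differences:

```python
import numpy as np

# The four penalties from the slide, vectorized in x (assumes a > 0).
def phi_log(x, a): return np.log1p(a * np.abs(x)) / a
def phi_rat(x, a): return np.abs(x) / (1 + a * np.abs(x) / 2)
def phi_exp(x, a): return (1 - np.exp(-a * np.abs(x))) / a
def phi_mc(x, a):
    ax = np.abs(x)
    return np.where(ax <= 1 / a, ax - a * ax**2 / 2, 1 / (2 * a))

# All satisfy phi(0) = 0, phi'(0+) = 1, phi''(0+) = -a
# (verified by one-sided finite differences in the test below).
```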
17 Non-convex scalar penalty functions
[Figure: the Log, Rat, Exp, and MC penalty functions with $a = 1$, compared to $|x|$]
18 Logarithmic penalty
[Figure: the logarithmic penalty ($a = 1$) and its second derivative]
19 MC penalty
[Figure: the MC penalty ($a = 1$) and its second derivative]
20 $\ell_p$ penalty, $0 < p < 1$ (precluded)
[Figure: $|x|^p$ with $p = 0.5$ and its second derivative, which is unbounded near $x = 0$]
21 Scalar MC penalty
We consider henceforth only the minimax-concave (MC) penalty function
$\phi_a(x) = \begin{cases} |x| - \tfrac{a}{2}x^2, & |x| \le 1/a \\[2pt] \tfrac{1}{2a}, & |x| \ge 1/a \end{cases}$
[Figure: $\phi_a(x)$ vs. $x$; the penalty is constant for $|x| \ge 1/a$]
22 Scalar MC penalty
The parameter $a$ controls the non-convexity of $\phi_a$.
[Figure: the MC penalty $\phi_a(x)$ for several values of $a$ (e.g. $a = 0.25$, $0.5$, $1$); as $a \to 0$, $\phi_a(x) \to |x|$]
23 Scalar case using MC penalty
$\hat{x} = \arg\min_x \big\{ \tfrac12(y - x)^2 + \lambda\,\phi_a(x) \big\}$
[Figure: $\hat{x}$ as a function of $y$ (the firm threshold): $\hat{x} = 0$ for $|y| \le \lambda$, $\hat{x} = y$ for $|y| \ge 1/a$, linear in between]
$\hat{x}$ is a continuous function of $y$ when $a < 1/\lambda$.
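In the convex regime $a\lambda < 1$ the scalar minimizer is the firm threshold; a sketch (the helper name `firm` is mine), checked against brute-force grid minimization:

```python
import numpy as np

def firm(y, lam, a):
    """Minimizer of 0.5*(y-x)^2 + lam*phi_a(x) for the scalar MC penalty,
    assuming a*lam < 1 (the convex regime): the 'firm' threshold."""
    ay = np.abs(y)
    x = np.where(ay <= lam, 0.0,
        np.where(ay >= 1 / a, ay, (ay - lam) / (1 - a * lam)))
    return np.sign(y) * x

def phi_mc(x, a):
    ax = np.abs(x)
    return np.where(ax <= 1 / a, ax - a * ax**2 / 2, 1 / (2 * a))

# Brute-force check at a few points (a*lam = 0.5 < 1, so the cost is convex).
grid = np.linspace(-4, 4, 160001)
lam, a = 0.5, 1.0
for y0 in (0.3, 0.8, 2.5, -1.7):
    cost = 0.5 * (y0 - grid) ** 2 + lam * phi_mc(grid, a)
    # the grid minimizer agrees with firm(y0, lam, a) to grid resolution
```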
24 Scalar case using MC penalty
Consider $f(x) = \tfrac12(y - x)^2 + \lambda\phi_a(x)$. For what values of $a$ is $f$ a convex function?
$f(x) = \underbrace{\tfrac12 x^2 + \lambda\phi_a(x)}_{f_0(x)} + \underbrace{\tfrac12 y^2 - yx}_{\text{affine in } x}$
It is sufficient to consider the convexity of $f_0(x) = \tfrac12 x^2 + \lambda\phi_a(x)$.
25 Scalar case using MC penalty
$f_0(x) = \tfrac12 x^2 + \lambda\phi_a(x)$
[Figure: $\tfrac12 x^2$ and $\lambda\phi_a(x)$]
Is $f_0$ convex? $\phi_a$ is not differentiable, so we cannot simply check that the second derivative of $f_0$ is positive.
26 Scalar case using MC penalty
Let us write $\phi_a(x) = |x| - s_a(x)$.
[Figure: $\phi_a(x) = |x| - s_a(x)$]
We see the Huber function:
$s_a(x) = \begin{cases} \tfrac{a}{2}x^2, & |x| \le 1/a \\[2pt] |x| - \tfrac{1}{2a}, & |x| \ge 1/a \end{cases}$
27 Scalar case using MC penalty
Writing $\phi_a$ as $\phi_a(x) = |x| - s_a(x)$, we have
$f(x) = \tfrac12(y - x)^2 + \lambda\phi_a(x) = \tfrac12(y - x)^2 + \lambda\big[|x| - s_a(x)\big] = \underbrace{\tfrac12 x^2 - \lambda s_a(x)}_{g(x)} + \underbrace{\lambda|x| + \tfrac12 y^2 - yx}_{\text{convex in } x}$
$g$ convex $\Rightarrow$ $f$ convex.
Note that $g$ is differentiable, unlike $f$. Is $g$ convex? It depends on $a$ and $\lambda$.
28 Scalar case using MC penalty
$g(x) = \tfrac12 x^2 - \lambda s_a(x)$
[Figure: $\tfrac12 x^2$, $s_a(x)$, and $g(x) = \tfrac12 x^2 - \lambda s_a(x)$]
For $a = 1$, $\lambda = 0.8$: $g$ is convex.
29 Scalar case using MC penalty
$g(x) = \tfrac12 x^2 - \lambda s_a(x)$
[Figure: $\tfrac12 x^2$, $s_a(x)$, and $g(x) = \tfrac12 x^2 - \lambda s_a(x)$]
For $a = 1$, $\lambda = 1.3$: $g$ is not convex.
30 Scalar case using MC penalty
The Huber function is differentiable, but not twice differentiable.
[Figure: $s_a(x)$, $s_a'(x)$, and $s_a''(x)$]
31 Scalar case using MC penalty
$g(x) = \tfrac12 x^2 - \lambda s_a(x)$
When is $g$ convex? We cannot check the second derivative of $g$ because it is not twice differentiable (see the previous slide). How can we ensure $g$ (and hence $f$) is convex?
32 Huber function as an infimal convolution
The Huber function can be written as
$s_a(x) = \min_v \big\{ \tfrac{a}{2}(x - v)^2 + |v| \big\}$
As an infimal convolution:
$s_a(x) = \big( |\cdot| \,\square\, \tfrac{a}{2}(\cdot)^2 \big)(x)$
where the infimal convolution (Moreau-Yosida regularization) is defined as
$(f \,\square\, g)(x) := \min_v \big\{ f(v) + g(x - v) \big\}$
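The identity above is easy to verify numerically: evaluate the infimal convolution by minimizing over a fine grid of $v$ and compare with the closed-form Huber function (a sketch; the grid and test points are illustrative):

```python
import numpy as np

def huber(x, a):
    """Closed-form Huber function s_a(x)."""
    ax = np.abs(x)
    return np.where(ax <= 1 / a, a * ax**2 / 2, ax - 1 / (2 * a))

# s_a(x) = min_v { 0.5*a*(x - v)^2 + |v| }, evaluated by grid search over v.
a = 2.0
v = np.linspace(-5, 5, 200001)
for x0 in (-3.0, -0.2, 0.1, 1.5):
    inf_conv = np.min(0.5 * a * (x0 - v) ** 2 + np.abs(v))
    # matches huber(x0, a) to grid resolution
```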
33 Huber function as an infimal convolution
[Figure: $\tfrac{a}{2}x^2$, $|x|$, and the Huber function $\big(|\cdot| \,\square\, \tfrac{a}{2}(\cdot)^2\big)(x)$]
34 Epigraph
The epigraph of a function is the set comprising the points on and above its graph.
The epigraph of an infimal convolution is the sum of the epigraphs:
$\operatorname{epi}\{f \,\square\, g\} = \operatorname{epi}\{f\} + \operatorname{epi}\{g\}$
35 Scalar case using MC penalty
The Huber function can be written as $s_a(x) = \min_v \big\{ \tfrac{a}{2}(x - v)^2 + |v| \big\}$. When is $g$ convex?
$g(x) = \tfrac12 x^2 - \lambda s_a(x)$
$\;= \tfrac12 x^2 - \lambda \min_v \big\{ \tfrac{a}{2}(x - v)^2 + |v| \big\}$
$\;= \tfrac12 x^2 - \lambda \min_v \big\{ \tfrac{a}{2}(x^2 - 2xv + v^2) + |v| \big\}$
$\;= \tfrac12 x^2 - \tfrac{a\lambda}{2}x^2 + \lambda \max_v \big\{ \underbrace{\tfrac{a}{2}(2xv - v^2) - |v|}_{\text{affine in } x} \big\}$
$\;= \tfrac12(1 - a\lambda)x^2 + \text{convex function}$
(the pointwise maximum of functions affine in $x$ is convex in $x$).
The function $g$ is convex if $1 - a\lambda$ is non-negative, i.e., $a \le 1/\lambda$.
We do not need derivatives!
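The threshold $a\lambda \le 1$ can be seen numerically: second differences of $g$ on a grid are nonnegative exactly when $a\lambda \le 1$ (a sketch; the grid and parameter values are illustrative, matching the two earlier slides):

```python
import numpy as np

def g(x, a, lam):
    """g(x) = 0.5*x^2 - lam*s_a(x), with s_a the Huber function."""
    ax = np.abs(x)
    s = np.where(ax <= 1 / a, a * ax**2 / 2, ax - 1 / (2 * a))
    return 0.5 * x**2 - lam * s

# Convexity check via second differences on a fine grid.
x = np.linspace(-3, 3, 6001)
d2_cvx = np.diff(g(x, a=1.0, lam=0.8), 2)    # a*lam = 0.8: all nonnegative
d2_ncvx = np.diff(g(x, a=1.0, lam=1.3), 2)   # a*lam = 1.3: dips negative
```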
36 Scalar MC penalty
The MC penalty can be written as
$\phi_a(x) = |x| - s_a(x) = |x| - \underbrace{\min_v \big\{ \tfrac{a}{2}(x - v)^2 + |v| \big\}}_{\text{Huber function}} = |x| - \big(|\cdot| \,\square\, \tfrac{a}{2}(\cdot)^2\big)(x)$
[Figure: $\phi_a(x) = |x| - s_a(x)$]
37 Multivariate case
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\,\psi(x)$
How can we set $\psi$ so that $F$ is convex and promotes sparsity of $x$? We can generalize the scalar case.
38 Separable penalty (precluded)
Conventional penalty: additive (separable)
$\phi(x) = \phi(x_1) + \phi(x_2)$
[Figure: surface and contours of $\phi(x_1) + \phi(x_2)$]
39 Generalized Huber function
Let $B \in \mathbb{R}^{M\times N}$. We define the generalized Huber function
$S_B(x) := \min_{v\in\mathbb{R}^N} \big\{ \tfrac12\|B(x - v)\|_2^2 + \|v\|_1 \big\}$
In the notation of infimal convolution, we have
$S_B(x) = \big( \|\cdot\|_1 \,\square\, \tfrac12\|B\,\cdot\|_2^2 \big)(x)$
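Although $S_B$ has no closed form in general, it can be evaluated numerically by solving the inner minimization, which is a standard smooth-plus-$\ell_1$ problem; a sketch using ISTA (helper names are mine, not from the talk). For $B = I$, $S_B$ separates into scalar Huber functions with $a = 1$, which makes a convenient check.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def S_B(x, B, n_iter=500):
    """Generalized Huber S_B(x) = min_v { 0.5*||B(x-v)||^2 + ||v||_1 },
    evaluated by ISTA on the inner problem (a sketch, not the fastest way)."""
    BtB = B.T @ B
    t = 1.0 / np.linalg.norm(BtB, 2)          # ISTA step: 1/||B'B||
    v = np.zeros_like(x)
    for _ in range(n_iter):
        v = soft(v - t * (BtB @ (v - x)), t)  # prox of l1 after a gradient step
    return 0.5 * np.sum((B @ (x - v)) ** 2) + np.sum(np.abs(v))

# For B = I the value is the sum of scalar Huber functions s_1(x_i).
x = np.array([-2.0, 0.3, 1.4])
val = S_B(x, np.eye(3))
```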
40 Example 1. Generalized Huber function
$S_B(x) := \min_v \big\{ \tfrac12\|B(x - v)\|_2^2 + \|v\|_1 \big\}$
[Figure: surface and contours of $S_B(x)$, $x \in \mathbb{R}^2$, for an example $B$]
41 Example 2. Generalized Huber function
[Figure: surface and contours of $S_B$ for another example $B$]
42 Example 3. Generalized Huber function
[Figure: surface and contours of $S_B$ for a diagonal $B$]
If $B$ is diagonal, then $S_B$ is separable.
43 Generalized Huber function
The generalized Huber function is differentiable. Its gradient is given by
$\nabla S_B(x) = B^T B\Big( x - \arg\min_{v\in\mathbb{R}^N}\big\{ \tfrac12\|B(x - v)\|_2^2 + \|v\|_1 \big\}\Big)$
Neither the generalized Huber function nor its gradient has a simple closed-form expression. But we will still be able to use them...
When $B = I$ we recover a well-known identity:
$\nabla S_I(x) = x - \arg\min_{v\in\mathbb{R}^N}\big\{ \tfrac12\|x - v\|_2^2 + \|v\|_1 \big\}$
44 Generalized MC penalty
We define the generalized MC (GMC) penalty
$\psi_B(x) := \|x\|_1 - S_B(x) = \|x\|_1 - \min_{v\in\mathbb{R}^N}\big\{ \tfrac12\|B(x - v)\|_2^2 + \|v\|_1 \big\}$
45 Example 1. Generalized MC penalty
[Figure: surfaces and contours of $\|x\|_1$, $S_B(x)$, and $\psi_B(x) = \|x\|_1 - S_B(x)$]
46 Example 2. Generalized MC penalty
[Figure: the same three surfaces and contours for a second example $B$]
47 Example 3. Generalized MC penalty
[Figure: the same three surfaces and contours for a third example $B$]
48 Convexity
Theorem. The function
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\,\psi_B(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\big[\|x\|_1 - S_B(x)\big] = \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1 - \lambda\min_{v\in\mathbb{R}^N}\big\{\tfrac12\|B(x - v)\|_2^2 + \|v\|_1\big\}$
is convex if
$B^T B \preceq \tfrac{1}{\lambda} A^T A$
even when $\psi_B$ is non-convex.
49 Convexity condition proof
Write $F$ as
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\big[\|x\|_1 - S_B(x)\big] = \underbrace{\tfrac12\|Ax\|_2^2 - \lambda S_B(x)}_{G(x)} + \underbrace{\tfrac12\|y\|_2^2 - y^T Ax + \lambda\|x\|_1}_{\text{convex in } x}$
$G$ convex $\Rightarrow$ $F$ convex.
50 Convexity condition proof
Write $G$ as
$G(x) = \tfrac12\|Ax\|_2^2 - \lambda S_B(x)$
$\;= \tfrac12\|Ax\|_2^2 - \lambda\min_v\big\{\tfrac12\|B(x - v)\|_2^2 + \|v\|_1\big\}$
$\;= \tfrac12\|Ax\|_2^2 - \lambda\min_v\big\{\tfrac12\|Bx\|_2^2 - v^T B^T B x + \tfrac12\|Bv\|_2^2 + \|v\|_1\big\}$
$\;= \tfrac12\|Ax\|_2^2 - \tfrac{\lambda}{2}\|Bx\|_2^2 + \lambda\max_v\big\{\underbrace{v^T B^T B x - \tfrac12\|Bv\|_2^2 - \|v\|_1}_{\text{affine in } x}\big\}$
$\;= \tfrac12 x^T\big(A^T A - \lambda B^T B\big)x + \text{convex function}$
The function $G$ (hence $F$) is convex if $A^T A - \lambda B^T B$ is positive semidefinite, i.e., $B^T B \preceq \tfrac{1}{\lambda}A^T A$.
51 Convexity condition
A straightforward choice of $B$ satisfying $B^T B \preceq \tfrac{1}{\lambda}A^T A$ is
$B = \sqrt{\gamma/\lambda}\; A$
for some $\gamma$ with $0 \le \gamma \le 1$.
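This choice gives $B^T B = (\gamma/\lambda)A^T A$, so the condition holds for any $0 \le \gamma \le 1$; a quick numerical sanity check (the matrix sizes and parameter values are illustrative):

```python
import numpy as np

# With B = sqrt(gamma/lam)*A, B'B = (gamma/lam) A'A, so
# (1/lam) A'A - B'B = ((1 - gamma)/lam) A'A is positive semidefinite.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
lam, gamma = 2.0, 0.8
B = np.sqrt(gamma / lam) * A
M = A.T @ A / lam - B.T @ B
min_eig = np.linalg.eigvalsh(M).min()   # >= 0 up to rounding
```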
52 Optimization Algorithm
We can use the forward-backward splitting (FBS) algorithm. Write
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\psi_B(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1 - \lambda S_B(x) = \underbrace{\tfrac12\|y - Ax\|_2^2 - \lambda S_B(x)}_{f_1(x):\ \text{convex, differentiable}} + \underbrace{\lambda\|x\|_1}_{f_2(x):\ \text{convex}}$
The FBS algorithm is given by:
$w^{(i)} = x^{(i)} - \mu\,\nabla f_1(x^{(i)})$
$x^{(i+1)} = \arg\min_x \big\{\tfrac12\|w^{(i)} - x\|_2^2 + \mu f_2(x)\big\} = \operatorname{prox}_{\mu f_2}(w^{(i)})$
53 Optimization Algorithm
Let $0 \le \gamma \le 1$ and $B = \sqrt{\gamma/\lambda}\,A$. Then a minimizer of the objective function $F$ is obtained by the iteration:
$v^{(i)} = \arg\min_{v\in\mathbb{R}^N}\big\{\tfrac{\gamma}{2}\|A(x^{(i)} - v)\|_2^2 + \lambda\|v\|_1\big\}$
$z^{(i)} = \gamma A(x^{(i)} - v^{(i)})$
$x^{(i+1)} = \arg\min_{x\in\mathbb{R}^N}\big\{\tfrac12\|y + z^{(i)} - Ax\|_2^2 + \lambda\|x\|_1\big\}$
Interpretation: an iteratively adjusted additive data perturbation of the $\ell_1$-norm regularized problem.
54 Biosensor Signal
[Figure: biosensor data and SIPS denoising output vs. time $n$]
55 Biosensor Signal
[Figure: biosensor data and SIPS denoising with the non-separable non-convex penalty vs. time $n$]
56 ECG Signal
[Figure: ECG data and the linear-filter output vs. time $n$]
57 ECG Signal
[Figure: ECG data and total variation denoising vs. time $n$]
58 ECG Signal
[Figure: ECG data and SIPS denoising vs. time $n$]
59 ECG Signal
[Figure: ECG data and SIPS denoising with the non-separable non-convex penalty vs. time $n$]
60 Generalizations (TV, nuclear norm, etc.)
Consider the function
$J(x) := \tfrac12\|Ax - b\|_2^2 + \lambda R(x) \qquad (1)$
where $R$ is a convex regularizer of the form $R(x) = \Phi(G(Lx))$, where $L$ is a linear operator, $G$ is possibly nonlinear, and $\Phi$ promotes sparsity.
For example, total variation regularization is expressed as $R(x) = \sum_i g_i(Lx)$ with $L = \begin{bmatrix} D_h \\ D_v \end{bmatrix}$ and
$g_i(Lx) = |(D_h x)_i| + |(D_v x)_i|$ (anisotropic TV) or $g_i(Lx) = \sqrt{(D_h x)_i^2 + (D_v x)_i^2}$ (isotropic TV)
61 Generalizations (TV, nuclear norm, etc.)
Suppose $R$ satisfies:
A1) $R(\cdot) = \Phi(G(L\,\cdot))$ is convex and bounded from below by zero;
A2) $\Phi(G(\cdot))$ is a proper, lower semicontinuous, and coercive function.
Consider the function with non-convex regularizer
$J_B(x) := \tfrac12\|Ax - b\|_2^2 + \lambda R_B(x) \qquad (2)$
with
$R_B(x) := R(x) - \min_{v\in\mathbb{R}^N}\big\{\tfrac12\|B(x - v)\|_2^2 + R(v)\big\}$
The function $J_B$ is convex (strongly convex) if $A^T A - \lambda B^T B \succeq 0$ ($\succ 0$).
Reference: A. Lanza, S. Morigi, I. Selesnick, and F. Sgallari. Sparsity-inducing Non-convex Non-separable Regularization for Convex Image Processing. 2017. Submitted.
62 Numerical Examples
1-D total variation denoising.
[Figure: clean signal; noisy signal; TV denoising ($\ell_1$ norm, best $\lambda$) with its RMSE; TV denoising (GMC penalty, best $\lambda$) with its RMSE]
Reference: I. Selesnick, A. Lanza, S. Morigi, and F. Sgallari. Total Variation Signal Denoising via Convex Analysis. 2017. Submitted.
63 Summary
We show how to construct non-convex regularizers that preserve the convexity of cost functions for sparse-regularized linear least squares.
The approach generalizes the $\ell_1$ norm. It can be used in conjunction with other convex non-smooth regularizers (TV, nuclear norm, etc.). The resulting regularizers are non-separable. Optimization is implementable using proximal algorithms (as for the $\ell_1$ norm).
64 References
A. Lanza, S. Morigi, I. Selesnick, and F. Sgallari. Sparsity-inducing Non-convex Non-separable Regularization for Convex Image Processing. 2017. Submitted.
I. Selesnick, A. Lanza, S. Morigi, and F. Sgallari. Total Variation Signal Denoising via Convex Analysis. 2017. Submitted.
A. Lanza, S. Morigi, I. Selesnick, and F. Sgallari. Nonconvex nonsmooth optimization via convex-nonconvex majorization-minimization. Numerische Mathematik, 136(2):343-381, 2017.
I. Selesnick. Sparse Regularization via Convex Analysis. IEEE Trans. Signal Process., 65(17):4481-4494, September 2017.
I. Selesnick. Total Variation Denoising via the Moreau Envelope. IEEE Signal Processing Letters, 24(2):216-220, February 2017.
A. Lanza, S. Morigi, and F. Sgallari. Convex Image Denoising via Non-convex Regularization with Parameter Selection. J. Math. Imaging and Vision, 56(2):195-220, 2016.
A. Lanza, S. Morigi, and F. Sgallari. Convex Image Denoising via Non-Convex Regularization. In Scale Space and Variational Methods in Computer Vision, 2015.
I. W. Selesnick, A. Parekh, and I. Bayram. Convex 1-D Total Variation Denoising with Non-convex Regularization. IEEE Signal Processing Letters, 22(2):141-144, February 2015.
65 Optimization Algorithm (saddle-point algorithm)
With
$J_B(x) = \tfrac12\|Ax - b\|_2^2 + \lambda R_B(x)$ and $R_B(x) = R(x) - \min_v\big\{\tfrac12\|B(x - v)\|_2^2 + R(v)\big\}$
we have
$J_B(x) = \tfrac12\|Ax - b\|_2^2 + \lambda R(x) - \lambda\min_v\big\{\tfrac12\|B(x - v)\|_2^2 + R(v)\big\}$
$\;= \tfrac12\|Ax - b\|_2^2 + \lambda R(x) + \lambda\max_v\big\{-\tfrac12\|B(x - v)\|_2^2 - R(v)\big\}$
and
$\hat{x} = \arg\min_x J_B(x) = \arg\min_x \Big\{\tfrac12\|Ax - b\|_2^2 + \lambda R(x) + \lambda\max_v\big\{-\tfrac12\|B(x - v)\|_2^2 - R(v)\big\}\Big\}$
or, as a saddle-point problem,
$(\hat{x}, \hat{v}) = \arg\min_x \max_v \big\{\tfrac12\|Ax - b\|_2^2 + \lambda R(x) - \tfrac{\lambda}{2}\|B(x - v)\|_2^2 - \lambda R(v)\big\}$
66 Optimization Algorithm (saddle-point algorithm)
This saddle-point problem is an instance of a monotone inclusion problem. Hence, the solution can be obtained using the forward-backward (FB) algorithm:
for $k = 0, 1, 2, \ldots$
$\quad w_k = x_k - \mu\big[A^T(Ax_k - b) + \lambda B^T B(v_k - x_k)\big]$
$\quad u_k = v_k - \mu\lambda B^T B(v_k - x_k)$
$\quad x_{k+1} = \arg\min_{x\in\mathbb{R}^n}\big\{R(x) + \tfrac{1}{2\mu\lambda}\|x - w_k\|_2^2\big\}$
$\quad v_{k+1} = \arg\min_{v\in\mathbb{R}^n}\big\{R(v) + \tfrac{1}{2\mu\lambda}\|v - u_k\|_2^2\big\}$
end
This algorithm reduces to ISTA for $B = 0$ and $R(x) = \|x\|_1$.
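The iteration above can be sketched for the simplest case: denoising ($A = I$) with $R = \ell_1$ norm and $B = \sqrt{\gamma/\lambda}\,I$, so both inner minimizations are soft thresholding. The helper names and the conservative step-size rule are my assumptions, not from the slides; with $\gamma = 0$ the penalty reduces to the $\ell_1$ norm and the iteration recovers plain soft thresholding.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def gmc_denoise(y, lam, gamma, n_iter=1000):
    """Sketch of the FB saddle-point iteration for A = I, R = l1 norm,
    B = sqrt(gamma/lam)*I, assuming 0 <= gamma < 1."""
    BtB = (gamma / lam) * np.eye(len(y))        # B'B for this choice of B
    mu = 1.9 / max(1.0, gamma / (1.0 - gamma))  # conservative step (||A'A|| = 1)
    x = np.zeros_like(y)
    v = np.zeros_like(y)
    for _ in range(n_iter):
        w = x - mu * ((x - y) + lam * (BtB @ (v - x)))
        u = v - mu * lam * (BtB @ (v - x))
        x = soft(w, mu * lam)                   # prox of lam*||.||_1, scaled by mu
        v = soft(u, mu * lam)
    return x

y = np.array([3.0, -0.5, 0.9, -2.2, 0.1])
x_l1 = gmc_denoise(y, lam=1.0, gamma=0.0)   # equals soft(y, 1.0)
x_gmc = gmc_denoise(y, lam=1.0, gamma=0.8)  # less bias on the large entries
```

With $\gamma = 0.8$ the result is the coordinate-wise firm threshold: large entries pass through unshrunk, small entries are set to zero, illustrating the reduced bias of the non-convex penalty.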
More informationShould penalized least squares regression be interpreted as Maximum A Posteriori estimation?
Should penalized least squares regression be interpreted as Maximum A Posteriori estimation? Rémi Gribonval To cite this version: Rémi Gribonval. Should penalized least squares regression be interpreted
More informationBayesian Models for Regularization in Optimization
Bayesian Models for Regularization in Optimization Aleksandr Aravkin, UBC Bradley Bell, UW Alessandro Chiuso, Padova Michael Friedlander, UBC Gianluigi Pilloneto, Padova Jim Burke, UW MOPTA, Lehigh University,
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationAdaptive Primal Dual Optimization for Image Processing and Learning
Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University
More informationDual and primal-dual methods
ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method
More informationChapter 13. Convex and Concave. Josef Leydold Mathematical Methods WS 2018/19 13 Convex and Concave 1 / 44
Chapter 13 Convex and Concave Josef Leydold Mathematical Methods WS 2018/19 13 Convex and Concave 1 / 44 Monotone Function Function f is called monotonically increasing, if x 1 x 2 f (x 1 ) f (x 2 ) It
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationIn collaboration with J.-C. Pesquet A. Repetti EC (UPE) IFPEN 16 Dec / 29
A Random block-coordinate primal-dual proximal algorithm with application to 3D mesh denoising Emilie CHOUZENOUX Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est, France Horizon Maths 2014
More informationSolution-driven Adaptive Total Variation Regularization
1/15 Solution-driven Adaptive Total Variation Regularization Frank Lenzen 1, Jan Lellmann 2, Florian Becker 1, Stefania Petra 1, Johannes Berger 1, Christoph Schnörr 1 1 Heidelberg Collaboratory for Image
More informationImage restoration: numerical optimisation
Image restoration: numerical optimisation Short and partial presentation Jean-François Giovannelli Groupe Signal Image Laboratoire de l Intégration du Matériau au Système Univ. Bordeaux CNRS BINP / 6 Context
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationConvex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version
Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationSequential convex programming,: value function and convergence
Sequential convex programming,: value function and convergence Edouard Pauwels joint work with Jérôme Bolte Journées MODE Toulouse March 23 2016 1 / 16 Introduction Local search methods for finite dimensional
More informationarxiv: v3 [math.oc] 29 Jun 2016
MOCCA: mirrored convex/concave optimization for nonconvex composite functions Rina Foygel Barber and Emil Y. Sidky arxiv:50.0884v3 [math.oc] 9 Jun 06 04.3.6 Abstract Many optimization problems arising
More informationBASICS OF CONVEX ANALYSIS
BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,
More informationA function(al) f is convex if dom f is a convex set, and. f(θx + (1 θ)y) < θf(x) + (1 θ)f(y) f(x) = x 3
Convex functions The domain dom f of a functional f : R N R is the subset of R N where f is well-defined. A function(al) f is convex if dom f is a convex set, and f(θx + (1 θ)y) θf(x) + (1 θ)f(y) for all
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationNonlinear Diffusion. Journal Club Presentation. Xiaowei Zhou
1 / 41 Journal Club Presentation Xiaowei Zhou Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology 2009-12-11 2 / 41 Outline 1 Motivation Diffusion process
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationPrimal-dual subgradient methods for convex problems
Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different
More informationA Probability Review
A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in
More informationTight Rates and Equivalence Results of Operator Splitting Schemes
Tight Rates and Equivalence Results of Operator Splitting Schemes Wotao Yin (UCLA Math) Workshop on Optimization for Modern Computing Joint w Damek Davis and Ming Yan UCLA CAM 14-51, 14-58, and 14-59 1
More informationSequential Unconstrained Minimization: A Survey
Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.
More informationEnhanced Compressive Sensing and More
Enhanced Compressive Sensing and More Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Nonlinear Approximation Techniques Using L1 Texas A & M University
More informationA user s guide to Lojasiewicz/KL inequalities
Other A user s guide to Lojasiewicz/KL inequalities Toulouse School of Economics, Université Toulouse I SLRA, Grenoble, 2015 Motivations behind KL f : R n R smooth ẋ(t) = f (x(t)) or x k+1 = x k λ k f
More informationMath (P)refresher Lecture 8: Unconstrained Optimization
Math (P)refresher Lecture 8: Unconstrained Optimization September 2006 Today s Topics : Quadratic Forms Definiteness of Quadratic Forms Maxima and Minima in R n First Order Conditions Second Order Conditions
More informationInverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology
Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 27 Introduction Fredholm first kind integral equation of convolution type in one space dimension: g(x) = 1 k(x x )f(x
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationAUGMENTED LAGRANGIAN METHODS FOR CONVEX OPTIMIZATION
New Horizon of Mathematical Sciences RIKEN ithes - Tohoku AIMR - Tokyo IIS Joint Symposium April 28 2016 AUGMENTED LAGRANGIAN METHODS FOR CONVEX OPTIMIZATION T. Takeuchi Aihara Lab. Laboratories for Mathematics,
More informationQUADRATIC MAJORIZATION 1. INTRODUCTION
QUADRATIC MAJORIZATION JAN DE LEEUW 1. INTRODUCTION Majorization methods are used extensively to solve complicated multivariate optimizaton problems. We refer to de Leeuw [1994]; Heiser [1995]; Lange et
More informationNumerical Optimization
Numerical Optimization Unit 2: Multivariable optimization problems Che-Rung Lee Scribe: February 28, 2011 (UNIT 2) Numerical Optimization February 28, 2011 1 / 17 Partial derivative of a two variable function
More information