The proximal mapping

http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html

Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes.

Outline

1. Closed function
2. Conjugate function
3. Proximal mapping

Closed set

a set $C$ is closed if it contains its boundary:
$$x_k \in C,\ x_k \to x \quad \Longrightarrow \quad x \in C$$

operations that preserve closedness
- the intersection of (finitely or infinitely many) closed sets is closed
- the union of a finite number of closed sets is closed
- inverse image under a linear mapping: $\{x \mid Ax \in C\}$ is closed if $C$ is closed

Image under linear mapping

the image of a closed set under a linear mapping is not necessarily closed

example ($C$ is closed, $AC = \{Ax \mid x \in C\}$ is open):
$$C = \{(x_1, x_2) \in \mathbf{R}^2_+ \mid x_1 x_2 \ge 1\}, \qquad A = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad AC = \mathbf{R}_{++}$$

sufficient condition: $AC$ is closed if $C$ is closed and convex and $C$ does not have a recession direction in the nullspace of $A$, i.e.,
$$Ay = 0,\ \hat{x} \in C,\ \hat{x} + \alpha y \in C\ \forall \alpha \ge 0 \quad \Longrightarrow \quad y = 0$$

in particular, this holds for any $A$ if $C$ is bounded

Closed function

definition: a function is closed if its epigraph is a closed set

examples
- $f(x) = -\log(1 - x^2)$ with $\operatorname{dom} f = \{x \mid |x| < 1\}$
- $f(x) = x \log x$ with $\operatorname{dom} f = \mathbf{R}_+$ and $f(0) = 0$
- indicator function of a closed set $C$: $f(x) = 0$ if $x \in C = \operatorname{dom} f$

not closed
- $f(x) = x \log x$ with $\operatorname{dom} f = \mathbf{R}_{++}$, or with $\operatorname{dom} f = \mathbf{R}_+$ and $f(0) = 1$
- indicator function of a set $C$ if $C$ is not closed

Properties

sublevel sets: $f$ is closed if and only if all its sublevel sets are closed

minimum: if $f$ is closed with bounded sublevel sets, then it has a minimizer

Weierstrass: suppose the set $D \subseteq E$ ($E$ a finite-dimensional vector space over $\mathbf{R}$) is nonempty and closed, and that all sublevel sets of the continuous function $f : D \to \mathbf{R}$ are bounded; then $f$ has a global minimizer

common operations on convex functions that preserve closedness
- sum: $f + g$ is closed if $f$ and $g$ are closed (and $\operatorname{dom} f \cap \operatorname{dom} g \neq \emptyset$)
- composition with an affine mapping: $f(Ax + b)$ is closed if $f$ is closed
- supremum: $\sup_\alpha f_\alpha(x)$ is closed if each function $f_\alpha$ is closed

Outline

1. Closed function
2. Conjugate function
3. Proximal mapping

Conjugate function

the conjugate of a function $f$ is
$$f^*(y) = \sup_{x \in \operatorname{dom} f} \left( y^T x - f(x) \right)$$

$f^*$ is closed and convex even if $f$ is not

Fenchel's inequality:
$$f(x) + f^*(y) \ge x^T y \quad \forall x, y$$
(extends the inequality $x^T x / 2 + y^T y / 2 \ge x^T y$ to non-quadratic convex $f$)
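As a quick numerical sanity check (an added illustration, not from the original slides), the sketch below verifies Fenchel's inequality for a strictly convex quadratic $f(x) = \frac{1}{2} x^T A x$, whose conjugate $f^*(y) = \frac{1}{2} y^T A^{-1} y$ is the $b = 0$, $c = 0$ case of the formula on the next slide; the dimension, seed, and tolerance are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                 # positive definite

f = lambda x: 0.5 * x @ A @ x
f_star = lambda y: 0.5 * y @ np.linalg.solve(A, y)

# Fenchel: f(x) + f*(y) >= x^T y, with equality at x = A^{-1} y
for _ in range(1000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    assert f(x) + f_star(y) >= x @ y - 1e-9
```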

Quadratic function

$$f(x) = \frac{1}{2} x^T A x + b^T x + c$$

strictly convex case ($A \succ 0$):
$$f^*(y) = \frac{1}{2} (y - b)^T A^{-1} (y - b) - c$$

general convex case ($A \succeq 0$):
$$f^*(y) = \frac{1}{2} (y - b)^T A^\dagger (y - b) - c, \qquad \operatorname{dom} f^* = \operatorname{range}(A) + b$$

Negative entropy and negative logarithm

negative entropy:
$$f(x) = \sum_{i=1}^n x_i \log x_i, \qquad f^*(y) = \sum_{i=1}^n e^{y_i - 1}$$

negative logarithm:
$$f(x) = -\sum_{i=1}^n \log x_i, \qquad f^*(y) = -\sum_{i=1}^n \log(-y_i) - n$$

matrix logarithm ($\operatorname{dom} f = S^n_{++}$):
$$f(X) = -\log \det X, \qquad f^*(Y) = -\log \det(-Y) - n$$

Indicator function and norm

indicator of a convex set $C$: the conjugate is the support function of $C$
$$f(x) = \begin{cases} 0, & x \in C \\ +\infty, & x \notin C \end{cases} \qquad f^*(y) = \sup_{x \in C} y^T x$$

norm: the conjugate is the indicator of the unit dual norm ball
$$f(x) = \|x\|, \qquad f^*(y) = \begin{cases} 0, & \|y\|_* \le 1 \\ +\infty, & \|y\|_* > 1 \end{cases}$$
(see next page)

proof: recall the definition of the dual norm:
$$\|y\|_* = \sup_{\|x\| \le 1} x^T y$$

to evaluate $f^*(y) = \sup_x (y^T x - \|x\|)$ we distinguish two cases

- if $\|y\|_* \le 1$, then (by definition of the dual norm) $y^T x \le \|x\|$ for all $x$, and equality holds if $x = 0$; therefore $\sup_x (y^T x - \|x\|) = 0$
- if $\|y\|_* > 1$, there exists an $x$ with $\|x\| \le 1$, $x^T y > 1$; then
$$f^*(y) \ge y^T (t x) - \|t x\| = t (y^T x - \|x\|)$$
and the right-hand side goes to infinity as $t \to \infty$

The second conjugate

$$f^{**}(x) = \sup_{y \in \operatorname{dom} f^*} \left( x^T y - f^*(y) \right)$$

$f^{**}$ is closed and convex

from Fenchel's inequality ($x^T y - f^*(y) \le f(x)$ for all $y$ and $x$):
$$f^{**}(x) \le f(x) \quad \forall x$$
equivalently, $\operatorname{epi} f \subseteq \operatorname{epi} f^{**}$ (for any $f$)

if $f$ is closed and convex, then
$$f^{**}(x) = f(x) \quad \forall x$$
equivalently, $\operatorname{epi} f = \operatorname{epi} f^{**}$ (if $f$ is closed and convex); proof on the next page

proof ($f^{**} = f$ if $f$ is closed and convex): by contradiction

suppose $(x, f^{**}(x)) \notin \operatorname{epi} f$; then there is a strict separating hyperplane:
$$\begin{bmatrix} a \\ b \end{bmatrix}^T \begin{bmatrix} z - x \\ s - f^{**}(x) \end{bmatrix} \le c < 0 \quad \forall (z, s) \in \operatorname{epi} f$$
for some $a$, $b$, $c$ with $b \le 0$ ($b > 0$ gives a contradiction as $s \to \infty$)

- if $b < 0$, define $y = a / (-b)$ and maximize the left-hand side over $(z, s) \in \operatorname{epi} f$:
$$f^*(y) - y^T x + f^{**}(x) \le c / (-b) < 0$$
this contradicts Fenchel's inequality

- if $b = 0$, choose $\hat{y} \in \operatorname{dom} f^*$ and add a small multiple of $(\hat{y}, -1)$ to $(a, b)$:
$$\begin{bmatrix} a + \epsilon \hat{y} \\ -\epsilon \end{bmatrix}^T \begin{bmatrix} z - x \\ s - f^{**}(x) \end{bmatrix} \le c + \epsilon \left( f^*(\hat{y}) - x^T \hat{y} + f^{**}(x) \right) < 0$$
now apply the argument for $b < 0$

Conjugates and subgradients

if $f$ is closed and convex, then
$$y \in \partial f(x) \quad \Longleftrightarrow \quad x \in \partial f^*(y) \quad \Longleftrightarrow \quad x^T y = f(x) + f^*(y)$$

proof: if $y \in \partial f(x)$, then $f^*(y) = \sup_u (y^T u - f(u)) = y^T x - f(x)$; hence, for all $v$,
$$f^*(v) = \sup_u (v^T u - f(u)) \ge v^T x - f(x) = x^T (v - y) - f(x) + y^T x = f^*(y) + x^T (v - y)$$
therefore, $x$ is a subgradient of $f^*$ at $y$ ($x \in \partial f^*(y)$)

the reverse implication $x \in \partial f^*(y) \Rightarrow y \in \partial f(x)$ follows from $f^{**} = f$

Some calculus rules

separable sum:
$$f(x_1, x_2) = g(x_1) + h(x_2), \qquad f^*(y_1, y_2) = g^*(y_1) + h^*(y_2)$$

scalar multiplication (for $\alpha > 0$):
$$f(x) = \alpha g(x), \qquad f^*(y) = \alpha g^*(y / \alpha)$$

addition to an affine function:
$$f(x) = g(x) + a^T x + b, \qquad f^*(y) = g^*(y - a) - b$$

infimal convolution:
$$f(x) = \inf_{u + v = x} (g(u) + h(v)), \qquad f^*(y) = g^*(y) + h^*(y)$$

Outline

1. Closed function
2. Conjugate function
3. Proximal mapping

Proximal mapping

$$\operatorname{prox}_f(x) = \operatorname*{argmin}_u \left( f(u) + \frac{1}{2} \|u - x\|_2^2 \right)$$

if $f$ is closed and convex, then $\operatorname{prox}_f(x)$ exists and is unique for all $x$
- existence: $f(u) + (1/2) \|u - x\|_2^2$ is closed with bounded sublevel sets
- uniqueness: $f(u) + (1/2) \|u - x\|_2^2$ is strictly (in fact, strongly) convex

subgradient characterization:
$$u = \operatorname{prox}_f(x) \quad \Longleftrightarrow \quad x - u \in \partial f(u)$$

we are interested in functions $f$ for which $\operatorname{prox}_{tf}$ is inexpensive

Examples

quadratic function ($A \succeq 0$):
$$f(x) = \frac{1}{2} x^T A x + b^T x + c, \qquad \operatorname{prox}_{tf}(x) = (I + tA)^{-1} (x - tb)$$

Euclidean norm: $f(x) = \|x\|_2$
$$\operatorname{prox}_{tf}(x) = \begin{cases} (1 - t / \|x\|_2)\, x, & \|x\|_2 \ge t \\ 0, & \text{otherwise} \end{cases}$$

logarithmic barrier: $f(x) = -\sum_{i=1}^n \log x_i$
$$\operatorname{prox}_{tf}(x)_i = \frac{x_i + \sqrt{x_i^2 + 4t}}{2}, \quad i = 1, \dots, n$$
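These closed forms can be checked numerically (an added illustration, not from the slides) against the optimality condition $u - x + t \nabla f(u) = 0$, valid wherever $f$ is differentiable; the test data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 4, 0.7
x = rng.standard_normal(n)

# quadratic f(u) = (1/2) u^T A u + b^T u + c
M = rng.standard_normal((n, n))
A, b = M @ M.T + np.eye(n), rng.standard_normal(n)
u = np.linalg.solve(np.eye(n) + t * A, x - t * b)
assert np.allclose(u - x + t * (A @ u + b), 0)

# Euclidean norm f(u) = ||u||_2 (first branch, assuming ||x||_2 > t)
nx = np.linalg.norm(x)
if nx > t:
    u = (1 - t / nx) * x
    assert np.allclose(u - x + t * u / np.linalg.norm(u), 0)

# log barrier f(u) = -sum_i log u_i  (shift x to be positive for the test)
xp = np.abs(x) + 0.1
u = (xp + np.sqrt(xp**2 + 4 * t)) / 2
assert np.allclose(u - xp - t / u, 0)       # u solves u^2 - x u - t = 0
```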

Some simple calculus rules

separable sum:
$$f\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = g(x) + h(y), \qquad \operatorname{prox}_f\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = \begin{bmatrix} \operatorname{prox}_g(x) \\ \operatorname{prox}_h(y) \end{bmatrix}$$

scaling and translation of the argument (with $\lambda \neq 0$):
$$f(x) = g(\lambda x + a), \qquad \operatorname{prox}_f(x) = \frac{1}{\lambda} \left( \operatorname{prox}_{\lambda^2 g}(\lambda x + a) - a \right)$$

scaling of the function (with $\lambda > 0$):
$$f(x) = \lambda g(x / \lambda), \qquad \operatorname{prox}_f(x) = \lambda \operatorname{prox}_{\lambda^{-1} g}(x / \lambda)$$

Addition to linear or quadratic function

linear function:
$$f(x) = g(x) + a^T x, \qquad \operatorname{prox}_f(x) = \operatorname{prox}_g(x - a)$$

quadratic function (with $\mu > 0$):
$$f(x) = g(x) + \frac{\mu}{2} \|x - a\|_2^2, \qquad \operatorname{prox}_f(x) = \operatorname{prox}_{\theta g}(\theta x + (1 - \theta) a), \quad \theta = \frac{1}{1 + \mu}$$

Moreau decomposition

$$x = \operatorname{prox}_f(x) + \operatorname{prox}_{f^*}(x) \quad \forall x$$

follows from the properties of conjugates and subgradients:
$$u = \operatorname{prox}_f(x) \ \Longleftrightarrow\ x - u \in \partial f(u) \ \Longleftrightarrow\ u \in \partial f^*(x - u) \ \Longleftrightarrow\ x - u = \operatorname{prox}_{f^*}(x)$$

generalizes the decomposition by orthogonal projection onto subspaces:
$$x = P_L(x) + P_{L^\perp}(x)$$
if $L$ is a subspace and $L^\perp$ its orthogonal complement (this is the Moreau decomposition with $f = I_L$, $f^* = I_{L^\perp}$)
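For a concrete check (an added illustration), take $f = \|\cdot\|_1$: its prox is soft-thresholding and $f^*$ is the indicator of the unit $\infty$-norm ball (both facts appear on later pages), so $\operatorname{prox}_{f^*}$ is the projection onto that ball; the test vector is arbitrary.

```python
import numpy as np

x = np.array([3.0, -0.4, 0.9, -2.5])
prox_f = np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)   # soft-thresholding
prox_fstar = np.clip(x, -1.0, 1.0)                       # projection onto {||y||_inf <= 1}
assert np.allclose(prox_f + prox_fstar, x)               # Moreau decomposition
```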

Extended Moreau decomposition

for $\lambda > 0$:
$$x = \operatorname{prox}_{\lambda f}(x) + \lambda \operatorname{prox}_{\lambda^{-1} f^*}(x / \lambda) \quad \forall x$$

proof: apply the Moreau decomposition to $\lambda f$:
$$x = \operatorname{prox}_{\lambda f}(x) + \operatorname{prox}_{(\lambda f)^*}(x) = \operatorname{prox}_{\lambda f}(x) + \lambda \operatorname{prox}_{\lambda^{-1} f^*}(x / \lambda)$$
the second step uses $(\lambda f)^*(y) = \lambda f^*(y / \lambda)$

Composition with affine mapping

for general $A$, the prox-operator of $f(x) = g(Ax + b)$ does not follow easily from the prox-operator of $g$

however, if $AA^T = (1/\alpha) I$, we have
$$\operatorname{prox}_f(x) = (I - \alpha A^T A) x + \alpha A^T \left( \operatorname{prox}_{\alpha^{-1} g}(Ax + b) - b \right)$$

example: $f(x_1, \dots, x_m) = g(x_1 + x_2 + \cdots + x_m)$
$$\operatorname{prox}_f(x_1, \dots, x_m)_i = x_i - \frac{1}{m} \left( \sum_{j=1}^m x_j - \operatorname{prox}_{mg}\Big( \sum_{j=1}^m x_j \Big) \right)$$
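A small sketch of the sum example (an added illustration): with the arbitrary choice $g(s) = s^2 / 2$ on scalars, $\operatorname{prox}_{mg}(z) = z / (1 + m)$ is explicit, and the result can be checked against the optimality condition $x_i - u_i = g'(\sum_j u_j)$.

```python
import numpy as np

m = 3
x = np.array([1.0, -2.0, 0.5])
s = x.sum()
prox_mg = s / (1 + m)                  # prox of m*g at s, for g(s) = s^2/2
u = x - (s - prox_mg) / m              # formula from the slide
assert np.allclose(x - u, u.sum())     # optimality: x_i - u_i = g'(sum_j u_j)
```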

proof: $u = \operatorname{prox}_f(x)$ is the solution of the optimization problem with variables $u$, $y$:
$$\min_{u, y}\ g(y) + \frac{1}{2} \|u - x\|_2^2 \qquad \text{s.t.}\ Au + b = y$$

eliminate $u$ using the expression
$$u = x + A^T (AA^T)^{-1} (y - b - Ax) = (I - \alpha A^T A) x + \alpha A^T (y - b)$$

the optimal $y$ minimizes
$$g(y) + \frac{\alpha^2}{2} \|A^T (y - b - Ax)\|_2^2 = g(y) + \frac{\alpha}{2} \|y - b - Ax\|_2^2$$
the solution is $y = \operatorname{prox}_{\alpha^{-1} g}(Ax + b)$

Projection on affine sets

hyperplane: $C = \{x \mid a^T x = b\}$ (with $a \neq 0$)
$$P_C(x) = x + \frac{b - a^T x}{\|a\|_2^2}\, a$$

affine set: $C = \{x \mid Ax = b\}$ (with $A \in \mathbf{R}^{p \times n}$ and $\operatorname{rank}(A) = p$)
$$P_C(x) = x + A^T (AA^T)^{-1} (b - Ax)$$

inexpensive if $p \ll n$, or if $AA^T = I$, ...
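Both formulas are one-liners in numpy; the sketch below (an added check with arbitrary random data) verifies feasibility of the projected points.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3
x = rng.standard_normal(n)

# hyperplane {z : a^T z = b}
a, b = rng.standard_normal(n), 1.5
Px = x + (b - a @ x) / (a @ a) * a
assert np.isclose(a @ Px, b)

# affine set {z : A z = c}, A full row rank (almost surely for random A)
A, c = rng.standard_normal((p, n)), rng.standard_normal(p)
Px = x + A.T @ np.linalg.solve(A @ A.T, c - A @ x)
assert np.allclose(A @ Px, c)
```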

Projection on simple polyhedral sets

halfspace: $C = \{x \mid a^T x \le b\}$ (with $a \neq 0$)
$$P_C(x) = \begin{cases} x + \dfrac{b - a^T x}{\|a\|_2^2}\, a, & a^T x > b \\ x, & a^T x \le b \end{cases}$$

rectangle: $C = [l, u] = \{x \mid l \preceq x \preceq u\}$
$$P_C(x)_i = \begin{cases} l_i, & x_i \le l_i \\ x_i, & l_i \le x_i \le u_i \\ u_i, & x_i \ge u_i \end{cases}$$

nonnegative orthant: $C = \mathbf{R}^n_+$
$$P_C(x) = x_+ \qquad (x_+ \text{ is the componentwise maximum of } 0 \text{ and } x)$$

probability simplex: $C = \{x \mid \mathbf{1}^T x = 1, x \succeq 0\}$
$$P_C(x) = (x - \lambda \mathbf{1})_+$$
where $\lambda$ is the solution of the equation
$$\mathbf{1}^T (x - \lambda \mathbf{1})_+ = \sum_{k=1}^n \max\{0, x_k - \lambda\} = 1$$

intersection of a hyperplane and a rectangle: $C = \{x \mid a^T x = b, l \preceq x \preceq u\}$
$$P_C(x) = P_{[l,u]}(x - \lambda a)$$
where $\lambda$ is the solution of $a^T P_{[l,u]}(x - \lambda a) = b$
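The equation for $\lambda$ is piecewise linear and decreasing in $\lambda$, so bisection solves it; the sketch below (an added illustration; a sort-based $O(n \log n)$ method also exists) uses simple valid bracket endpoints and an arbitrary iteration count.

```python
import numpy as np

def proj_simplex(x, iters=60):
    # sum((x - lam)_+) is >= n >= 1 at lam = min(x) - 1 and 0 at lam = max(x)
    lo, hi = x.min() - 1.0, x.max()
    for _ in range(iters):
        lam = (lo + hi) / 2
        if np.maximum(x - lam, 0).sum() > 1.0:
            lo = lam            # sum too large: lambda must increase
        else:
            hi = lam
    return np.maximum(x - (lo + hi) / 2, 0)

p = proj_simplex(np.array([0.8, -0.3, 1.4, 0.1]))
assert np.isclose(p.sum(), 1.0) and (p >= 0).all()
```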

Projection on norm balls

Euclidean ball: $C = \{x \mid \|x\|_2 \le 1\}$
$$P_C(x) = \begin{cases} \dfrac{1}{\|x\|_2}\, x, & \|x\|_2 > 1 \\ x, & \|x\|_2 \le 1 \end{cases}$$

1-norm ball: $C = \{x \mid \|x\|_1 \le 1\}$
$$P_C(x)_k = \begin{cases} x_k - \lambda, & x_k > \lambda \\ 0, & -\lambda \le x_k \le \lambda \\ x_k + \lambda, & x_k < -\lambda \end{cases}$$
$\lambda = 0$ if $\|x\|_1 \le 1$; otherwise $\lambda$ is the solution of the equation
$$\sum_{k=1}^n \max\{|x_k| - \lambda, 0\} = 1$$
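The same bisection idea handles the 1-norm ball; a sketch (an added illustration with an arbitrary test point and iteration count):

```python
import numpy as np

def proj_l1_ball(x, iters=60):
    if np.abs(x).sum() <= 1:
        return x                                  # lambda = 0 case
    lo, hi = 0.0, np.abs(x).max()                 # brackets the root of the lambda equation
    for _ in range(iters):
        lam = (lo + hi) / 2
        if np.maximum(np.abs(x) - lam, 0).sum() > 1:
            lo = lam
        else:
            hi = lam
    lam = (lo + hi) / 2
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0)

p = proj_l1_ball(np.array([2.0, -1.0, 0.3]))
assert np.isclose(np.abs(p).sum(), 1.0)
```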

Projection on simple cones

second-order cone: $C = \{(x, t) \in \mathbf{R}^{n+1} \mid \|x\|_2 \le t\}$
$$P_C(x, t) = \begin{cases} (x, t), & \|x\|_2 \le t \\ (0, 0), & \|x\|_2 \le -t \\ \dfrac{t + \|x\|_2}{2 \|x\|_2} \begin{bmatrix} x \\ \|x\|_2 \end{bmatrix}, & \|x\|_2 > |t| \end{cases}$$

positive semidefinite cone: $C = S^n_+$
$$P_C(X) = \sum_{i=1}^n \max\{0, \lambda_i\}\, q_i q_i^T$$
if $X = \sum_{i=1}^n \lambda_i q_i q_i^T$ is the eigenvalue decomposition of $X$
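The semidefinite-cone projection is a direct transcription of the eigenvalue formula; a sketch (an added check on an arbitrary symmetric test matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
X = (M + M.T) / 2                                 # symmetric, generally indefinite
lam, Q = np.linalg.eigh(X)
P = Q @ np.diag(np.maximum(lam, 0)) @ Q.T         # clip negative eigenvalues
assert np.all(np.linalg.eigvalsh(P) >= -1e-12)    # result is positive semidefinite
```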

Support function

the conjugate of the support function of a closed convex set $C$ is the indicator function:
$$f(x) = S_C(x) = \sup_{y \in C} x^T y, \qquad f^*(y) = I_C(y)$$

prox-operator of the support function: apply the extended Moreau decomposition
$$\operatorname{prox}_{tf}(x) = x - t \operatorname{prox}_{t^{-1} f^*}(x / t) = x - t P_C(x / t)$$

example: $f(x)$ is the sum of the $r$ largest components of $x$:
$$f(x) = x_{[1]} + \cdots + x_{[r]} = S_C(x), \qquad C = \{y \mid 0 \preceq y \preceq \mathbf{1},\ \mathbf{1}^T y = r\}$$
the prox-operator of $f$ is easily evaluated via projection on $C$

Norms

the conjugate of a norm is the indicator function of the unit dual norm ball:
$$f(x) = \|x\|, \qquad f^*(y) = I_B(y) \qquad (B = \{y \mid \|y\|_* \le 1\})$$

prox-operator of a norm: apply the extended Moreau decomposition
$$\operatorname{prox}_{tf}(x) = x - t P_B(x / t) = x - P_{tB}(x)$$

a useful formula for $\operatorname{prox}_{tf}$ when projection on $tB = \{x \mid \|x\|_* \le t\}$ is cheap

examples: $\|\cdot\|_1$, $\|\cdot\|_2$
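For the 2-norm, which is its own dual, the formula $x - P_{tB}(x)$ reproduces the shrinkage expression from the Examples slide; a quick sketch (an added check with an arbitrary test point):

```python
import numpy as np

x, t = np.array([3.0, -4.0, 0.0]), 2.0
nx = np.linalg.norm(x)                       # ||x||_2 = 5
proj_tB = x if nx <= t else (t / nx) * x     # projection onto {||z||_2 <= t}
prox = x - proj_tB                           # prox of t * ||.||_2 at x
assert np.allclose(prox, (1 - t / nx) * x)   # matches shrinkage since ||x||_2 >= t
```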

Distance to a point

distance (in a general norm): $f(x) = \|x - a\|$

prox-operator: from the translation rule on the earlier calculus slide, with $g(x) = \|x\|$:
$$\operatorname{prox}_{tf}(x) = a + \operatorname{prox}_{tg}(x - a) = a + (x - a) - t P_B\left( \frac{x - a}{t} \right) = x - P_{tB}(x - a)$$
where $B$ is the unit ball for the dual norm

Euclidean distance to a set

Euclidean distance (to a closed convex set $C$):
$$d(x) = \inf_{y \in C} \|x - y\|_2$$

prox-operator of the distance:
$$\operatorname{prox}_{td}(x) = \theta P_C(x) + (1 - \theta) x, \qquad \theta = \begin{cases} t / d(x), & d(x) \ge t \\ 1, & \text{otherwise} \end{cases}$$

prox-operator of the squared distance: $f(x) = d(x)^2 / 2$
$$\operatorname{prox}_{tf}(x) = \frac{1}{1 + t}\, x + \frac{t}{1 + t}\, P_C(x)$$
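A sketch of $\operatorname{prox}_{td}$ (an added illustration) with $C$ taken to be the unit Euclidean ball, where $P_C$ is explicit; the test point is chosen so that $d(x) > t$ and the first branch applies.

```python
import numpy as np

def prox_dist(x, t):
    Pc = x / max(np.linalg.norm(x), 1.0)      # projection onto the unit ball
    d = np.linalg.norm(x - Pc)
    if d <= t:
        return Pc                             # theta = 1 branch
    th = t / d
    return th * Pc + (1 - th) * x

x = np.array([3.0, 4.0])                      # ||x||_2 = 5, so d(x) = 4
u = prox_dist(x, 1.0)
assert np.isclose(np.linalg.norm(u), 4.0)     # x moved exactly t = 1 toward C
```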

proof (expression for $\operatorname{prox}_{td}(x)$):

if $u = \operatorname{prox}_{td}(x) \notin C$, then from the definition and the subgradient of $d$,
$$x - u = \frac{t}{d(u)} \left( u - P_C(u) \right)$$
this implies $P_C(u) = P_C(x)$, $d(x) \ge t$, and that $u$ is a convex combination of $x$ and $P_C(x)$

if $u \in C$ minimizes $d(u) + (1/(2t)) \|u - x\|_2^2$, then $u = P_C(x)$

proof (expression for $\operatorname{prox}_{tf}(x)$ when $f(x) = d(x)^2 / 2$):

$$\operatorname{prox}_{tf}(x) = \operatorname*{argmin}_u \left( \frac{1}{2} d(u)^2 + \frac{1}{2t} \|u - x\|_2^2 \right) = \operatorname*{argmin}_u \inf_{v \in C} \left( \frac{1}{2} \|u - v\|_2^2 + \frac{1}{2t} \|u - x\|_2^2 \right)$$

the optimal $u$ as a function of $v$ is
$$u = \frac{t}{t + 1}\, v + \frac{1}{t + 1}\, x$$

the optimal $v$ minimizes
$$\frac{1}{2} \left\| \frac{t}{t + 1}\, v + \frac{1}{t + 1}\, x - v \right\|_2^2 + \frac{1}{2t} \left\| \frac{t}{t + 1}\, v + \frac{1}{t + 1}\, x - x \right\|_2^2 = \frac{1}{2(1 + t)} \|v - x\|_2^2$$
over $C$, i.e., $v = P_C(x)$

References

P. L. Combettes and V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling and Simulation (2005).

P. L. Combettes and J.-Ch. Pesquet, Proximal splitting methods in signal processing, Fixed-Point Algorithms for Inverse Problems in Science and Engineering (2011).

N. Parikh and S. Boyd, Proximal algorithms (2013).