EE 546, Univ of Washington, Spring 2012

6. Proximal mapping

• introduction
• review of conjugate functions
• proximal mapping

Proximal mapping 6-1

Proximal mapping

the proximal mapping (prox-operator) of a convex function $h$ is
\[
\mathrm{prox}_h(x) = \operatorname*{argmin}_u \Big( h(u) + \frac{1}{2}\|u - x\|_2^2 \Big)
\]

examples

• $h(x) = 0$: $\mathrm{prox}_h(x) = x$

• $h(x) = I_C(x)$ (indicator function of $C$): $\mathrm{prox}_h$ is projection on $C$
\[
\mathrm{prox}_h(x) = \operatorname*{argmin}_{u \in C} \|u - x\|_2^2 = P_C(x)
\]

• $h(x) = t\|x\|_1$: $\mathrm{prox}_h$ is the soft-threshold (shrinkage) operation
\[
\mathrm{prox}_h(x)_i = \begin{cases} x_i - t & x_i \ge t \\ 0 & -t \le x_i \le t \\ x_i + t & x_i \le -t \end{cases}
\]

Proximal mapping 6-2
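
a minimal numpy sketch of the second and third examples (the function names are illustrative, not from the slides):

```python
import numpy as np

def prox_soft_threshold(x, t):
    """prox of h(x) = t*||x||_1: componentwise shrinkage toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_box(x, l, u):
    """prox of the indicator of the box [l, u]: Euclidean projection."""
    return np.clip(x, l, u)

x = np.array([1.5, -0.3, 0.8])
print(prox_soft_threshold(x, 0.5))   # [1.  -0.   0.3]
print(prox_box(x, -1.0, 1.0))        # [1.  -0.3  0.8]
```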

Proximal gradient method

unconstrained problem with cost function split in two components
\[
\text{minimize} \quad f(x) = g(x) + h(x)
\]

• $g$ convex, differentiable, with $\operatorname{dom} g = \mathbf{R}^n$
• $h$ convex, possibly nondifferentiable, with inexpensive prox-operator

proximal gradient algorithm
\[
x^{(k)} = \mathrm{prox}_{t_k h}\big( x^{(k-1)} - t_k \nabla g(x^{(k-1)}) \big)
\]
$t_k > 0$ is step size, constant or determined by line search

Proximal mapping 6-3
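
a sketch of the iteration in numpy for a lasso-type instance, $g(x) = \frac{1}{2}\|Ax - b\|_2^2$ and $h(x) = \lambda\|x\|_1$, with constant step $t = 1/\sigma_{\max}(A)^2$ (the problem data and helper names here are illustrative assumptions):

```python
import numpy as np

def proximal_gradient(grad_g, prox_h, x0, t, num_iters=500):
    """x(k) = prox_{t*h}(x(k-1) - t * grad g(x(k-1))), constant step size t."""
    x = x0.copy()
    for _ in range(num_iters):
        x = prox_h(x - t * grad_g(x), t)
    return x

rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((20, 10)), rng.standard_normal(20), 0.5
grad_g = lambda x: A.T @ (A @ x - b)                      # gradient of g
prox_h = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - lam * s, 0.0)
t = 1.0 / np.linalg.norm(A, 2) ** 2                       # 1/L, L = sigma_max(A)^2
x_star = proximal_gradient(grad_g, prox_h, np.zeros(10), t)
```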

Interpretation
\[
x^+ = \mathrm{prox}_{th}(x - t\nabla g(x))
\]
from the definition of the proximal operator:
\[
\begin{aligned}
x^+ &= \operatorname*{argmin}_u \Big( h(u) + \frac{1}{2t}\|u - x + t\nabla g(x)\|_2^2 \Big) \\
&= \operatorname*{argmin}_u \Big( h(u) + g(x) + \nabla g(x)^T (u - x) + \frac{1}{2t}\|u - x\|_2^2 \Big)
\end{aligned}
\]
$x^+$ minimizes $h(u)$ plus a simple quadratic local model of $g(u)$ around $x$

Proximal mapping 6-4

Examples
\[
\text{minimize} \quad g(x) + h(x)
\]

gradient method: $h(x) = 0$, i.e., minimize $g(x)$
\[
x^+ = x - t\nabla g(x)
\]

gradient projection method: $h(x) = I_C(x)$, i.e., minimize $g(x)$ over $C$
\[
x^+ = P_C(x - t\nabla g(x))
\]
(figure: the gradient step $x - t\nabla g(x)$ leaves $C$ and is projected back onto $C$ to give $x^+$)

Proximal mapping 6-5

soft-thresholding: $h(x) = \|x\|_1$, i.e., minimize $g(x) + \|x\|_1$
\[
x^+ = \mathrm{prox}_{th}(x - t\nabla g(x))
\]
where
\[
\mathrm{prox}_{th}(u)_i = \begin{cases} u_i - t & u_i \ge t \\ 0 & -t \le u_i \le t \\ u_i + t & u_i \le -t \end{cases}
\]
(figure: graph of $\mathrm{prox}_{th}(u)_i$ versus $u_i$, flat at zero on $[-t, t]$)

Proximal mapping 6-6

Outline

• introduction
• review of conjugate functions
• proximal mapping

Closed convex function

a function with a closed convex epigraph

examples
\[
f(x) = x, \qquad f(x) = -\log(1 - x^2), \qquad f(x) = \begin{cases} x \log x & x > 0 \\ 0 & x = 0 \\ +\infty & x < 0 \end{cases}
\]

convex functions that are not closed
\[
f(x) = \begin{cases} x & |x| < 1 \\ +\infty & |x| \ge 1 \end{cases}, \qquad f(x) = \begin{cases} x \log x & x > 0 \\ 1 & x = 0 \\ +\infty & x < 0 \end{cases}
\]

Proximal mapping 6-7

Conjugate function

the conjugate of a function $f$ is
\[
f^*(y) = \sup_{x \in \operatorname{dom} f} \big( y^T x - f(x) \big)
\]
(figure: $f(x)$ and the affine function $xy - f^*(y)$, the tightest affine lower bound on $f$ with slope $y$; its intercept is the point $(0, -f^*(y))$)

• $f^*$ is closed and convex (even if $f$ is not)

• Fenchel's inequality:
\[
f(x) + f^*(y) \ge x^T y \quad \text{for all } x, y
\]

Proximal mapping 6-8

Examples

negative logarithm: $f(x) = -\log x$
\[
f^*(y) = \sup_{x > 0} (xy + \log x) = \begin{cases} -1 - \log(-y) & y < 0 \\ +\infty & \text{otherwise} \end{cases}
\]

quadratic function: $f(x) = (1/2)x^T Q x$ with $Q \in \mathbf{S}^n_{++}$
\[
f^*(y) = \sup_x \Big( y^T x - \frac{1}{2} x^T Q x \Big) = \frac{1}{2} y^T Q^{-1} y
\]

Proximal mapping 6-9
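
the quadratic case is easy to verify numerically; the sketch below uses scipy's general-purpose minimizer to compute the supremum for a randomly generated $Q \succ 0$ and compares it with the closed form (data chosen arbitrarily):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4 * np.eye(4)                  # a positive definite Q
y = rng.standard_normal(4)

# f*(y) = sup_x (y^T x - 0.5 x^T Q x), computed by minimizing the negative
obj = lambda x: -(y @ x - 0.5 * x @ Q @ x)
numeric = -minimize(obj, np.zeros(4)).fun
closed_form = 0.5 * y @ np.linalg.solve(Q, y)
print(numeric, closed_form)                  # agree to solver tolerance
```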

Conjugate of closed functions

for $f$ closed and convex: $f^{**} = f$, and
\[
y \in \partial f(x) \quad \text{if and only if} \quad x \in \partial f^*(y)
\]

proof: if $\hat{y} \in \partial f(\hat{x})$, then $f(z) \ge f(\hat{x}) + \hat{y}^T (z - \hat{x})$ for all $z$, or
\[
z^T \hat{y} - f(z) \le \hat{x}^T \hat{y} - f(\hat{x}) \quad \text{for all } z
\]
so $z^T \hat{y} - f(z)$ reaches its maximum over $z$ at $\hat{x}$; therefore $\hat{x} \in \partial f^*(\hat{y})$ (see page 4-15, on the subgradient of a supremum of functions); this shows $\hat{y} \in \partial f(\hat{x}) \implies \hat{x} \in \partial f^*(\hat{y})$; the reverse implication follows by applying the same argument to $f^*$, using $f^{**} = f$

hence, for closed convex $f$, the following three statements are equivalent:
\[
x^T y = f(x) + f^*(y), \qquad y \in \partial f(x), \qquad x \in \partial f^*(y)
\]

Proximal mapping 6-10

Indicator function

the indicator function of a set $C$ is
\[
I_C(x) = \begin{cases} 0 & x \in C \\ +\infty & \text{otherwise} \end{cases}
\]
(figures: $I_{[-1,1]}(x)$ and $I_{(-1,1)}(x)$, zero on the interval and $+\infty$ outside)

the indicator function of a (closed) convex set is a (closed) convex function

Proximal mapping 6-11

Subgradients and conjugate of indicator function

subdifferential is the normal cone to $C$ at $x$ (notation: $N_C(x)$):
\[
\partial I_C(x) = N_C(x) = \{ s \mid s^T (y - x) \le 0 \ \text{for all } y \in C \}
\]
(figure: a set $C$, a boundary point $x$, and the normal cone $N_C(x)$)

conjugate is the support function of $C$ (notation: $S_C(y)$):
\[
I_C^*(y) = \sup_x \big( y^T x - I_C(x) \big) = \sup_{x \in C} y^T x
\]

Proximal mapping 6-12

Dual norm

the dual norm of a norm $\|\cdot\|$ is
\[
\|y\|_* = \sup_{\|x\| \le 1} y^T x
\]
(the support function of the unit ball of $\|\cdot\|$)

common pairs of dual vector norms
\[
\|x\|_2 \leftrightarrow \|y\|_2, \qquad \|x\|_1 \leftrightarrow \|y\|_\infty, \qquad \sqrt{x^T Q x} \leftrightarrow \sqrt{y^T Q^{-1} y} \quad (Q \succ 0)
\]

common pairs of dual matrix norms (for the inner product $\operatorname{tr}(X^T Y)$)
\[
\|X\|_F \leftrightarrow \|Y\|_F, \qquad \|X\|_2 = \sigma_{\max}(X) \leftrightarrow \|Y\|_* = \sum_i \sigma_i(Y)
\]

Proximal mapping 6-13
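
for the pair $\|\cdot\|_1 \leftrightarrow \|\cdot\|_\infty$, the supremum in the definition is attained at a signed coordinate vector, which makes the duality easy to see numerically (values arbitrary):

```python
import numpy as np

# sup_{||x||_1 <= 1} y^T x is attained at x = sign(y_k) e_k, k = argmax_i |y_i|
y = np.array([0.3, -2.0, 1.1])
k = np.argmax(np.abs(y))
x_star = np.zeros_like(y)
x_star[k] = np.sign(y[k])                    # ||x_star||_1 = 1
print(y @ x_star, np.abs(y).max())           # both equal ||y||_inf = 2.0
```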

Conjugate of norm

the conjugate of $f(x) = \|x\|$ is the indicator function of the dual norm unit ball:
\[
f^*(y) = \sup_x \big( y^T x - \|x\| \big) = \begin{cases} 0 & \|y\|_* \le 1 \\ +\infty & \text{otherwise} \end{cases}
\]

proof

• if $\|y\|_* \le 1$, then by definition of the dual norm, $y^T x \le \|x\|$ for all $x$, and equality holds at $x = 0$; therefore $\sup_x (y^T x - \|x\|) = 0$

• if $\|y\|_* > 1$, there exists an $x$ with $\|x\| \le 1$, $y^T x > 1$; then
\[
f^*(y) \ge y^T (tx) - \|tx\| = t\,(y^T x - \|x\|) \to \infty \quad \text{as } t \to \infty
\]

Proximal mapping 6-14

Outline

• introduction
• review of conjugate functions
• proximal mapping

Proximal mapping

definition: the proximal mapping associated with a closed convex $h$ is
\[
\mathrm{prox}_h(x) = \operatorname*{argmin}_u \Big( h(u) + \frac{1}{2}\|u - x\|_2^2 \Big)
\]
it can be shown that $\mathrm{prox}_h(x)$ exists and is unique for all $x$

subgradient characterization, from the optimality conditions of the minimization in the definition:
\[
u = \mathrm{prox}_h(x) \quad \Longleftrightarrow \quad x - u \in \partial h(u)
\]

Proximal mapping 6-15

Projection

the proximal mapping of the indicator function $I_C$ is the Euclidean projection on $C$:
\[
\mathrm{prox}_{I_C}(x) = \operatorname*{argmin}_{u \in C} \|u - x\|_2^2 = P_C(x)
\]

subgradient characterization: $u = P_C(x)$ satisfies
\[
x - u \in \partial I_C(u) = N_C(u)
\]
in other words,
\[
(x - u)^T (y - u) \le 0 \quad \text{for all } y \in C
\]
(figure: $x$, its projection $u = P_C(x)$, and the set $C$)

we will see that proximal mappings have many properties of projections

Proximal mapping 6-16

Nonexpansiveness

if $u = \mathrm{prox}_h(x)$, $v = \mathrm{prox}_h(y)$, then
\[
(u - v)^T (x - y) \ge \|u - v\|_2^2
\]
i.e., $\mathrm{prox}_h$ is firmly nonexpansive, or co-coercive with constant 1

this follows from the characterization of p. 6-15 and monotonicity (p. 4-8):
\[
x - u \in \partial h(u), \quad y - v \in \partial h(v) \quad \Longrightarrow \quad (x - u - y + v)^T (u - v) \ge 0
\]

it implies (via the Cauchy-Schwarz inequality)
\[
\|\mathrm{prox}_h(x) - \mathrm{prox}_h(y)\|_2 \le \|x - y\|_2
\]
i.e., $\mathrm{prox}_h$ is nonexpansive, or Lipschitz continuous with constant 1

Proximal mapping 6-17
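
both inequalities are easy to test on random points; a quick numeric check for the soft-threshold operator (the prox of $\|\cdot\|_1$, p. 6-2):

```python
import numpy as np

def prox_l1(x, t=1.0):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(3)
x, y = rng.standard_normal(6), rng.standard_normal(6)
u, v = prox_l1(x), prox_l1(y)
print((u - v) @ (x - y) >= np.sum((u - v) ** 2))       # firm nonexpansiveness
print(np.linalg.norm(u - v) <= np.linalg.norm(x - y))  # nonexpansiveness
```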

Moreau decomposition
\[
\mathrm{prox}_{h^*}(x) = x - \mathrm{prox}_h(x)
\]

proof: define $u = \mathrm{prox}_h(x)$, $v = x - u$

• from the subgradient characterization on p. 6-15: $v \in \partial h(u)$
• hence (from p. 6-10), $u \in \partial h^*(v)$
• therefore (again from p. 6-15), $v = \mathrm{prox}_{h^*}(x)$

interpretation: decomposition of $x$ in two components
\[
x = \mathrm{prox}_h(x) + \mathrm{prox}_{h^*}(x)
\]

Proximal mapping 6-18
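
for $h = \|\cdot\|_1$ the decomposition can be checked directly: $h^*$ is the indicator of the $\infty$-norm unit ball (p. 6-14), so $\mathrm{prox}_{h^*}$ is componentwise clipping to $[-1, 1]$; a short numpy check:

```python
import numpy as np

# Moreau: x = prox_h(x) + prox_{h*}(x), with h = ||.||_1
x = np.array([2.0, -0.4, 1.3, -3.1])
prox_h = np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)   # soft threshold
prox_hstar = np.clip(x, -1.0, 1.0)                       # projection on [-1,1]^n
print(prox_h + prox_hstar)                               # recovers x exactly
```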

example: $h(u) = I_L(u)$, $L$ a subspace of $\mathbf{R}^n$

the conjugate is the indicator function of the orthogonal complement $L^\perp$:
\[
h^*(v) = \sup_{u \in L} v^T u = \begin{cases} 0 & v \in L^\perp \\ +\infty & \text{otherwise} \end{cases} = I_{L^\perp}(v)
\]

the Moreau decomposition is the orthogonal decomposition
\[
x = P_L(x) + P_{L^\perp}(x)
\]

Proximal mapping 6-19

Projection on affine sets

hyperplane: $C = \{x \mid a^T x = b\}$ (with $a \ne 0$)
\[
P_C(x) = x + \frac{b - a^T x}{\|a\|_2^2}\, a
\]

affine set: $C = \{x \mid Ax = b\}$ (with $A \in \mathbf{R}^{p \times n}$ and $\operatorname{rank}(A) = p$)
\[
P_C(x) = x + A^T (A A^T)^{-1} (b - Ax)
\]
inexpensive if $p \ll n$, or $A A^T = I$, . . .

Proximal mapping 6-20
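
a direct numpy transcription of the affine-set formula (assuming, as above, that $A$ has full row rank):

```python
import numpy as np

def project_affine(x, A, b):
    """Euclidean projection on {x : Ax = b}, for A with full row rank."""
    return x + A.T @ np.linalg.solve(A @ A.T, b - A @ x)

rng = np.random.default_rng(4)
A, b = rng.standard_normal((3, 8)), rng.standard_normal(3)
p = project_affine(rng.standard_normal(8), A, b)
print(np.allclose(A @ p, b))                 # True: the projection is feasible
```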

Projection on simple polyhedral sets

halfspace: $C = \{x \mid a^T x \le b\}$ (with $a \ne 0$)
\[
P_C(x) = x + \frac{b - a^T x}{\|a\|_2^2}\, a \;\; \text{if } a^T x > b, \qquad P_C(x) = x \;\; \text{if } a^T x \le b
\]

rectangle: $C = [l, u] = \{x \mid l \preceq x \preceq u\}$
\[
P_C(x)_i = \begin{cases} l_i & x_i \le l_i \\ x_i & l_i \le x_i \le u_i \\ u_i & x_i \ge u_i \end{cases}
\]

nonnegative orthant: $C = \mathbf{R}^n_+$
\[
P_C(x) = x_+ \qquad (x_+ \text{ is the componentwise maximum of } 0 \text{ and } x)
\]

Proximal mapping 6-21

probability simplex: $C = \{x \mid \mathbf{1}^T x = 1, \; x \succeq 0\}$
\[
P_C(x) = (x - \lambda \mathbf{1})_+
\]
where $\lambda$ is the solution of the equation
\[
\mathbf{1}^T (x - \lambda \mathbf{1})_+ = \sum_{k=1}^n \max\{0, x_k - \lambda\} = 1
\]

intersection of hyperplane and rectangle: $C = \{x \mid a^T x = b, \; l \preceq x \preceq u\}$
\[
P_C(x) = P_{[l,u]}(x - \lambda a)
\]
where $\lambda$ is the solution of $a^T P_{[l,u]}(x - \lambda a) = b$

Proximal mapping 6-22
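
the equation for $\lambda$ in the simplex projection is piecewise linear and can be solved by sorting; the sketch below uses the standard sort-based breakpoint search (one common way to find $\lambda$; the method itself is not spelled out on the slide):

```python
import numpy as np

def project_simplex(x):
    """Euclidean projection on {x : 1^T x = 1, x >= 0}."""
    u = np.sort(x)[::-1]                            # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(x) + 1)
    rho = np.max(np.where(u - css / k > 0)[0]) + 1  # number of positive entries
    lam = css[rho - 1] / rho
    return np.maximum(x - lam, 0.0)

p = project_simplex(np.array([0.9, 0.7, -0.2, 0.1]))
print(p, p.sum())                            # [0.6 0.4 0.  0. ], sums to 1
```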

Projection on norm balls

Euclidean ball: $C = \{x \mid \|x\|_2 \le 1\}$
\[
P_C(x) = \frac{1}{\|x\|_2}\, x \;\; \text{if } \|x\|_2 > 1, \qquad P_C(x) = x \;\; \text{if } \|x\|_2 \le 1
\]

1-norm ball: $C = \{x \mid \|x\|_1 \le 1\}$
\[
P_C(x)_k = \begin{cases} x_k - \lambda & x_k > \lambda \\ 0 & -\lambda \le x_k \le \lambda \\ x_k + \lambda & x_k < -\lambda \end{cases}
\]
$\lambda = 0$ if $\|x\|_1 \le 1$; otherwise $\lambda$ is the solution of the equation
\[
\sum_{k=1}^n \max\{|x_k| - \lambda, 0\} = 1
\]

Proximal mapping 6-23

Projection on simple cones

second-order cone: $C = \{(x, t) \in \mathbf{R}^n \times \mathbf{R} \mid \|x\|_2 \le t\}$
\[
P_C(x, t) = (x, t) \;\; \text{if } \|x\|_2 \le t, \qquad P_C(x, t) = (0, 0) \;\; \text{if } \|x\|_2 \le -t
\]
and
\[
P_C(x, t) = \frac{t + \|x\|_2}{2\|x\|_2} \begin{bmatrix} x \\ \|x\|_2 \end{bmatrix} \;\; \text{if } -\|x\|_2 < t < \|x\|_2, \; x \ne 0
\]

positive semidefinite cone: $C = \mathbf{S}^n_+$
\[
P_C(X) = \sum_{i=1}^n \max\{0, \lambda_i\}\, q_i q_i^T
\]
if $X = \sum_{i=1}^n \lambda_i q_i q_i^T$ is the eigenvalue decomposition of $X$

Proximal mapping 6-24
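
the semidefinite-cone formula in numpy (eigendecompose, clip the negative eigenvalues, reassemble; the symmetrization of the input is an added safeguard):

```python
import numpy as np

def project_psd(X):
    """Euclidean projection on the positive semidefinite cone."""
    lam, Q = np.linalg.eigh((X + X.T) / 2)   # eigendecomposition of sym. part
    return (Q * np.maximum(lam, 0.0)) @ Q.T  # sum_i max(0, lam_i) q_i q_i^T

X = np.array([[1.0, 2.0],
              [2.0, -3.0]])
P = project_psd(X)
print(np.linalg.eigvalsh(P))                 # all eigenvalues >= 0
```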

Other examples of proximal mappings

quadratic function: $h(x) = \frac{1}{2} x^T A x + b^T x + c$ (with $A \succeq 0$)
\[
\mathrm{prox}_{th}(x) = (I + tA)^{-1} (x - tb)
\]

Euclidean norm: $h(x) = \|x\|_2$
\[
\mathrm{prox}_{th}(x) = \begin{cases} (1 - t/\|x\|_2)\, x & \|x\|_2 \ge t \\ 0 & \text{otherwise} \end{cases}
\]

logarithmic barrier: $h(x) = -\sum_{i=1}^n \log x_i$
\[
\mathrm{prox}_{th}(x)_i = \frac{x_i + \sqrt{x_i^2 + 4t}}{2}, \quad i = 1, \ldots, n
\]

Proximal mapping 6-25
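
the log-barrier formula comes from setting the gradient of $-t\log u_i + \frac{1}{2}(u_i - x_i)^2$ to zero and taking the positive root of the resulting quadratic in $u_i$; a numeric cross-check against a general-purpose solver (data arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

t, x = 0.7, np.array([0.2, 1.5, -0.3])
closed = (x + np.sqrt(x ** 2 + 4 * t)) / 2   # closed-form prox_{th}(x)

# prox_{th}(x) = argmin_u  t*h(u) + 0.5*||u - x||^2,  h(u) = -sum_i log u_i
obj = lambda u: -t * np.sum(np.log(u)) + 0.5 * np.sum((u - x) ** 2)
numeric = minimize(obj, np.ones(3), bounds=[(1e-9, None)] * 3).x
print(closed, numeric)                       # agree to solver tolerance
```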

Some calculus rules

separable sum: $h(x_1, x_2) = h_1(x_1) + h_2(x_2)$
\[
\mathrm{prox}_h(x_1, x_2) = \big( \mathrm{prox}_{h_1}(x_1), \; \mathrm{prox}_{h_2}(x_2) \big)
\]

scaling and translation of argument: $h(x) = f(\lambda x + a)$ with $\lambda \ne 0$
\[
\mathrm{prox}_h(x) = \frac{1}{\lambda} \big( \mathrm{prox}_{\lambda^2 f}(\lambda x + a) - a \big)
\]

Proximal mapping 6-26
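
the scaling rule can be sanity-checked numerically; below $f = \|\cdot\|_1$, so $\mathrm{prox}_{\lambda^2 f}$ is soft-thresholding with threshold $\lambda^2$, compared against a derivative-free minimization of the defining objective (values illustrative):

```python
import numpy as np
from scipy.optimize import minimize

lam, a = 2.0, np.array([0.3, -0.1])
x = np.array([1.0, -0.8])
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# rule: prox_h(x) = (1/lam)(prox_{lam^2 f}(lam*x + a) - a), h(x) = ||lam*x + a||_1
rule = (soft(lam * x + a, lam ** 2) - a) / lam

# brute force: minimize h(u) + 0.5*||u - x||^2 directly (nonsmooth: Nelder-Mead)
obj = lambda u: np.abs(lam * u + a).sum() + 0.5 * np.sum((u - x) ** 2)
numeric = minimize(obj, x, method="Nelder-Mead").x
print(rule, numeric)                         # agree to solver tolerance
```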

prox-operator of conjugate: for $t > 0$,
\[
\mathrm{prox}_{th^*}(x) = x - t\, \mathrm{prox}_{h/t}(x/t)
\]

proof: the conjugate of $g(u) = h(u)/t$ is
\[
g^*(v) = \sup_u \big( v^T u - h(u)/t \big) = \frac{1}{t} \sup_u \big( (tv)^T u - h(u) \big) = \frac{1}{t} h^*(tv)
\]
by the Moreau decomposition and the second property on page 6-26,
\[
\mathrm{prox}_g(y) = y - \mathrm{prox}_{g^*}(y) = y - \frac{1}{t}\, \mathrm{prox}_{th^*}(ty)
\]
a change of variables $x = ty$ gives
\[
\mathrm{prox}_{h/t}(x/t) = \frac{1}{t} x - \frac{1}{t}\, \mathrm{prox}_{th^*}(x)
\]

Proximal mapping 6-27

Support function

support function of a closed convex set $C$:
\[
h(x) = S_C(x) = \sup_{y \in C} x^T y
\]

prox-operator: $h^*(y) = I_C(y)$ is the indicator function of $C$; so from p. 6-27,
\[
\mathrm{prox}_{th}(x) = x - t\, \mathrm{prox}_{h^*/t}(x/t) = x - t P_C(x/t)
\]

example: $h(x)$ is the sum of the $r$ largest components of $x$:
\[
h(x) = x_{[1]} + \cdots + x_{[r]} = S_C(x), \qquad C = \{y \mid 0 \preceq y \preceq \mathbf{1}, \; \mathbf{1}^T y = r\}
\]
the prox-operator of $h$ is easily evaluated via projection on $C$

Proximal mapping 6-28

Norms

$h(x) = \|x\|$, with conjugate $h^*(y) = I_B(y)$, where $B = \{y \mid \|y\|_* \le 1\}$ is the unit ball of the dual norm (p. 6-13)

prox-operator: from page 6-27,
\[
\mathrm{prox}_{th}(x) = x - t\, \mathrm{prox}_{h^*/t}(x/t) = x - t P_B(x/t)
\]
a useful formula for $\mathrm{prox}_{th}$ when the projection $P_B$ is inexpensive

examples

• $h(x) = \|x\|_1$, $B = \{y \mid \|y\|_\infty \le 1\}$ (gives the soft-threshold operator; p. 6-2)
• $h(x) = \|x\|_2$, $B = \{y \mid \|y\|_2 \le 1\}$ (gives the formula of page 6-25)

Proximal mapping 6-29
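
for $h = \|\cdot\|_2$ this recipe reproduces the closed form of page 6-25, since the Euclidean norm is its own dual; a short sketch (helper names are mine):

```python
import numpy as np

def project_l2_ball(y):
    """projection on the Euclidean unit ball"""
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n

def prox_l2_norm(x, t):
    """prox_{t||.||_2}(x) = x - t * P_B(x/t), B the Euclidean unit ball"""
    return x - t * project_l2_ball(x / t)

x, t = np.array([3.0, 4.0]), 2.0
print(prox_l2_norm(x, t))    # (1 - t/||x||_2) x = 0.6 * [3, 4] = [1.8, 2.4]
```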

Distance to a point

distance (in a general norm) to a point $a$: $h(x) = \|x - a\|$

prox-operator: from p. 6-26, with $g(x) = \|x\|$,
\[
\mathrm{prox}_{th}(x) = a + \mathrm{prox}_{tg}(x - a) = a + x - a - t P_B\!\Big(\frac{x - a}{t}\Big) = x - t P_B\!\Big(\frac{x - a}{t}\Big)
\]
where $B$ is the unit ball of the dual norm

Proximal mapping 6-30

Euclidean distance to a set

Euclidean distance to a closed convex set $C$:
\[
d(x) = \inf_{y \in C} \|x - y\|_2
\]

prox-operator:
\[
\mathrm{prox}_{td}(x) = \theta P_C(x) + (1 - \theta)\, x, \qquad \theta = \begin{cases} t/d(x) & d(x) \ge t \\ 1 & \text{otherwise} \end{cases}
\]

prox-operator of the squared distance $h(x) = d(x)^2/2$:
\[
\mathrm{prox}_{th}(x) = \frac{1}{1+t}\, x + \frac{t}{1+t}\, P_C(x)
\]

Proximal mapping 6-31
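
a sketch of $\mathrm{prox}_{td}$ for a set with an inexpensive projection ($C$ the Euclidean unit ball here, chosen only for illustration):

```python
import numpy as np

def prox_dist(x, t, P_C):
    """prox of t*d(x), d = Euclidean distance to a closed convex set C."""
    p = P_C(x)
    d = np.linalg.norm(x - p)
    theta = t / d if d >= t else 1.0         # theta from the formula above
    return theta * p + (1.0 - theta) * x

P_ball = lambda y: y if np.linalg.norm(y) <= 1 else y / np.linalg.norm(y)
print(prox_dist(np.array([3.0, 0.0]), 0.5, P_ball))  # [2.5 0.]: moves 0.5 toward C
```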

proof of the expression for $\mathrm{prox}_{td}(x)$

if $u = \mathrm{prox}_{td}(x) \notin C$, then from page 6-15 and page 5-15,
\[
x - u = t\, \nabla d(u) = \frac{t}{d(u)} \big( u - P_C(u) \big)
\]
this implies $P_C(u) = P_C(x)$, $d(x) \ge t$, and that $u$ is a convex combination of $x$ and $P_C(x)$
(figure: $x$, $u$, and $P_C(u)$ on a line segment meeting the set $C$)

if $u = \mathrm{prox}_{td}(x) \in C$ (i.e., the minimizer of $d(u) + (1/(2t))\|u - x\|_2^2$ lies in $C$), then $u = P_C(x)$

Proximal mapping 6-32

proof of the expression for $\mathrm{prox}_{th}(x)$ when $h(x) = d(x)^2/2$
\[
\mathrm{prox}_{th}(x) = \operatorname*{argmin}_u \Big( \frac{1}{2} d(u)^2 + \frac{1}{2t} \|u - x\|_2^2 \Big) = \operatorname*{argmin}_u \inf_{v \in C} \Big( \frac{1}{2} \|u - v\|_2^2 + \frac{1}{2t} \|u - x\|_2^2 \Big)
\]
the optimal $u$ as a function of $v$ is
\[
u = \frac{t}{t+1}\, v + \frac{1}{t+1}\, x
\]
substituting, the optimal $v$ minimizes
\[
\frac{1}{2} \Big\| \frac{t}{t+1} v + \frac{1}{t+1} x - v \Big\|_2^2 + \frac{1}{2t} \Big\| \frac{t}{t+1} v + \frac{1}{t+1} x - x \Big\|_2^2 = \frac{1}{2(1+t)} \|v - x\|_2^2
\]
over $C$, i.e., $v = P_C(x)$

Proximal mapping 6-33

References

this lecture is a slightly modified version of:

• L. Vandenberghe, Lecture Notes for EE236C: Optimization Methods for Large-Scale Systems (Spring 2011), UCLA.
• P. L. Combettes and V.-R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling and Simulation (2005).
• P. L. Combettes and J.-Ch. Pesquet, Proximal splitting methods in signal processing, arxiv.org/abs/0912.3522v4 (2010).

Proximal mapping 6-34