EE 546, Univ of Washington, Spring 2012

6. Proximal mapping

• introduction
• review of conjugate functions
• proximal mapping

Proximal mapping 6-1
Proximal mapping

the proximal mapping (prox-operator) of a convex function h is

    prox_h(x) = argmin_u ( h(u) + (1/2) ||u - x||_2^2 )

examples

• h(x) = 0:  prox_h(x) = x

• h(x) = I_C(x) (indicator function of C):  prox_h is projection on C

      prox_h(x) = argmin_{u ∈ C} ||u - x||_2^2 = P_C(x)

• h(x) = t ||x||_1:  prox_h is the soft-threshold (shrinkage) operation

      prox_h(x)_i = { x_i - t,   x_i >= t
                    { 0,         |x_i| <= t
                    { x_i + t,   x_i <= -t

Proximal mapping 6-2
Proximal gradient method

unconstrained problem with cost function split in two components

    minimize  f(x) = g(x) + h(x)

• g convex, differentiable, with dom g = R^n
• h convex, possibly nondifferentiable, with inexpensive prox-operator

proximal gradient algorithm

    x^(k) = prox_{t_k h} ( x^(k-1) - t_k ∇g(x^(k-1)) )

t_k > 0 is step size, constant or determined by line search

Proximal mapping 6-3
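As a concrete illustration of the iteration above, here is a minimal sketch in Python with numpy (the names `proximal_gradient` and `soft_threshold` are my own, not from the slides), applied to a small lasso problem minimize (1/2)||Ax - b||_2^2 + lam ||x||_1:

```python
import numpy as np

def proximal_gradient(grad_g, prox_h, x0, t, iters=500):
    """Proximal gradient iteration x+ = prox_{t h}(x - t * grad g(x)).

    grad_g: gradient of the smooth term g
    prox_h: function (v, t) -> prox_{t h}(v)
    t:      constant step size (t < 2/L for L-Lipschitz grad g)
    """
    x = x0
    for _ in range(iters):
        x = prox_h(x - t * grad_g(x), t)
    return x

# soft-threshold is the prox-operator of lam*||.||_1 (see p. 6-2)
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# small lasso instance with a diagonal A, so the exact answer is easy to check
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, -2.0])
lam = 0.1
grad_g = lambda x: A.T @ (A @ x - b)
prox_h = lambda v, t: soft_threshold(v, lam * t)
x_star = proximal_gradient(grad_g, prox_h, np.zeros(2), t=0.2)
# coordinate-wise optimality gives x_star = (0.9, -0.975)
```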
Interpretation

    x^+ = prox_{th} ( x - t ∇g(x) )

from the definition of the proximal operator:

    x^+ = argmin_u ( h(u) + (1/(2t)) ||u - x + t ∇g(x)||_2^2 )
        = argmin_u ( h(u) + g(x) + ∇g(x)^T (u - x) + (1/(2t)) ||u - x||_2^2 )

x^+ minimizes h(u) plus a simple quadratic local model of g(u) around x

Proximal mapping 6-4
Examples

    minimize  g(x) + h(x)

gradient method: h(x) = 0, i.e., minimize g(x)

    x^+ = x - t ∇g(x)

gradient projection method: h(x) = I_C(x), i.e., minimize g(x) over C

    x^+ = P_C ( x - t ∇g(x) )

(figure: the gradient step x - t ∇g(x) leaves C; projecting it back onto C gives x^+)

Proximal mapping 6-5
soft-thresholding: h(x) = ||x||_1, i.e., minimize g(x) + ||x||_1

    x^+ = prox_{th} ( x - t ∇g(x) )

and

    prox_{th}(u)_i = { u_i - t,   u_i >= t
                     { 0,         -t <= u_i <= t
                     { u_i + t,   u_i <= -t

(figure: graph of prox_{th}(u)_i versus u_i, flat on the interval [-t, t])

Proximal mapping 6-6
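The componentwise formula above can be sanity-checked against the defining minimization. A small sketch (the helper name `soft_threshold` is mine) compares it with a brute-force one-dimensional search over a fine grid:

```python
import numpy as np

def soft_threshold(u, t):
    """prox of t*||.||_1: shrink each component toward zero by t."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

# check against the definition prox(u) = argmin_v ( t|v| + (1/2)(v - u)^2 ),
# solved by brute force on a fine grid for a few scalar inputs
t = 0.5
grid = np.linspace(-3, 3, 60001)
for u in [-2.0, -0.3, 0.0, 0.4, 1.7]:
    brute = grid[np.argmin(t * np.abs(grid) + 0.5 * (grid - u) ** 2)]
    assert abs(soft_threshold(np.array([u]), t)[0] - brute) < 1e-3
```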
Outline

• introduction
• review of conjugate functions
• proximal mapping
Closed convex function

a function with a closed convex epigraph

examples

    f(x) = x,    f(x) = -log(1 - x^2),    f(x) = { x log x,  x > 0
                                                 { 0,        x = 0
                                                 { +∞,       x < 0

convex functions that are not closed

    f(x) = { x,   x < 1          f(x) = { x log x,  x > 0
           { +∞,  x >= 1                { 1,        x = 0
                                        { +∞,       x < 0

Proximal mapping 6-7
Conjugate function

the conjugate of a function f is

    f*(y) = sup_{x ∈ dom f} ( y^T x - f(x) )

(figure: the affine function x ↦ xy - f*(y) is the tightest lower bound on f with slope y; its vertical intercept is at (0, -f*(y)))

• f* is closed and convex (even if f is not)

• Fenchel's inequality:

    f(x) + f*(y) >= x^T y    for all x, y

Proximal mapping 6-8
Examples

negative logarithm: f(x) = -log x

    f*(y) = sup_{x > 0} ( xy + log x ) = { -1 - log(-y),  y < 0
                                         { +∞,            otherwise

quadratic function: f(x) = (1/2) x^T Q x with Q ∈ S^n_{++}

    f*(y) = sup_x ( y^T x - (1/2) x^T Q x ) = (1/2) y^T Q^{-1} y

Proximal mapping 6-9
Conjugate of closed functions

for f closed and convex: f** = f, and

    y ∈ ∂f(x)  if and only if  x ∈ ∂f*(y)

proof: if ŷ ∈ ∂f(x̂), then f(z) >= f(x̂) + ŷ^T (z - x̂) for all z, or

    z^T ŷ - f(z) <= x̂^T ŷ - f(x̂)    for all z

so z^T ŷ - f(z) reaches its maximum over z at x̂; therefore x̂ ∈ ∂f*(ŷ) (see page 4-15, on the subgradient of a sup of functions); this shows ŷ ∈ ∂f(x̂) ⟹ x̂ ∈ ∂f*(ŷ); the reverse implication follows from f** = f

hence, for closed convex f the following three statements are equivalent:

    x^T y = f(x) + f*(y),    y ∈ ∂f(x),    x ∈ ∂f*(y)

Proximal mapping 6-10
Indicator function

the indicator function of a set C is

    I_C(x) = { 0,   x ∈ C
             { +∞,  otherwise

(figure: I_[-1,1] is closed; I_(-1,1), which jumps to +∞ at the open endpoints, is not)

the indicator function of a (closed) convex set is a (closed) convex function

Proximal mapping 6-11
Subgradients and conjugate of indicator function

• subdifferential is the normal cone to C at x (notation: N_C(x))

    ∂I_C(x) = N_C(x) = { s : s^T (y - x) <= 0 for all y ∈ C }

(figure: the normal cone N_C(x) at a boundary point x of C)

• conjugate is the support function of C (notation: S_C(y))

    I_C*(y) = sup_x ( y^T x - I_C(x) ) = sup_{x ∈ C} y^T x

Proximal mapping 6-12
Dual norm

the dual norm of a norm ||·|| is

    ||y||_* = sup_{||x|| <= 1} y^T x

(the support function of the unit ball of ||·||)

common pairs of dual vector norms

    ||x||_2 and ||y||_2,    ||x||_1 and ||y||_∞,    sqrt(x^T Q x) and sqrt(y^T Q^{-1} y)  (Q ≻ 0)

common pairs of dual matrix norms (for inner product tr(X^T Y))

    ||X||_F and ||Y||_F,    ||X||_2 = σ_max(X) and ||Y||_* = Σ_i σ_i(Y)

Proximal mapping 6-13
Conjugate of norm

the conjugate of f(x) = ||x|| is the indicator function of the dual norm unit ball

    f*(y) = sup_x ( y^T x - ||x|| ) = { 0,   ||y||_* <= 1
                                      { +∞,  otherwise

proof

• if ||y||_* <= 1, then by definition of the dual norm,

    y^T x <= ||x||    for all x

and equality holds if x = 0; therefore sup_x ( y^T x - ||x|| ) = 0

• if ||y||_* > 1, there exists an x with ||x|| <= 1, y^T x > 1; then

    f*(y) >= y^T (tx) - ||tx|| = t ( y^T x - ||x|| ) → ∞    as t → ∞

Proximal mapping 6-14
Outline

• introduction
• review of conjugate functions
• proximal mapping
Proximal mapping

definition: the proximal mapping associated with a closed convex h is

    prox_h(x) = argmin_u ( h(u) + (1/2) ||u - x||_2^2 )

it can be shown that prox_h(x) exists and is unique for all x

subgradient characterization: from the optimality conditions of the minimization in the definition,

    u = prox_h(x)  ⟺  x - u ∈ ∂h(u)

Proximal mapping 6-15
Projection

the proximal mapping of the indicator function I_C is the Euclidean projection on C

    prox_{I_C}(x) = argmin_{u ∈ C} ||u - x||_2^2 = P_C(x)

subgradient characterization: u = P_C(x) satisfies

    x - u ∈ ∂I_C(u) = N_C(u)

in other words,

    (x - u)^T (y - u) <= 0    for all y ∈ C

(figure: x - u lies in the normal cone to C at u)

we will see that proximal mappings have many properties of projections

Proximal mapping 6-16
Nonexpansiveness

if u = prox_h(x), v = prox_h(y), then

    (u - v)^T (x - y) >= ||u - v||_2^2

i.e., prox_h is firmly nonexpansive, or co-coercive with constant 1

this follows from the characterization of p. 6-15 and monotonicity (p. 4-8):

    x - u ∈ ∂h(u),  y - v ∈ ∂h(v)  ⟹  (x - u - y + v)^T (u - v) >= 0

and implies (from the Cauchy–Schwarz inequality)

    ||prox_h(x) - prox_h(y)||_2 <= ||x - y||_2

so prox_h is nonexpansive, or Lipschitz continuous with constant 1

Proximal mapping 6-17
Moreau decomposition

    prox_{h*}(x) = x - prox_h(x)

proof: define u = prox_h(x), v = x - u

• from the subgradient characterization on p. 6-15:  v ∈ ∂h(u)
• hence (from p. 6-10),  u ∈ ∂h*(v)
• therefore (again from p. 6-15),  v = prox_{h*}(x)

interpretation: decomposition of x in two components

    x = prox_h(x) + prox_{h*}(x)

Proximal mapping 6-18
example: h(u) = I_L(u), L a subspace of R^n

the conjugate is the indicator function of the orthogonal complement L⊥:

    h*(v) = sup_{u ∈ L} v^T u = { 0,   v ∈ L⊥
                                { +∞,  otherwise }  = I_{L⊥}(v)

the Moreau decomposition is the orthogonal decomposition

    x = P_L(x) + P_{L⊥}(x)

Proximal mapping 6-19
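The subspace example above makes the Moreau decomposition easy to check numerically. A small sketch with numpy (the projector construction is mine; L is taken as the range of a random matrix):

```python
import numpy as np

# Moreau decomposition for h = I_L, L a subspace: prox_h = P_L and
# prox_{h*} = P_{L^perp}, so x = P_L(x) + P_{Lperp}(x) is orthogonal.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))          # L = range(A), a 2-dim subspace of R^5
P = A @ np.linalg.solve(A.T @ A, A.T)    # orthogonal projector onto L

x = rng.standard_normal(5)
u = P @ x        # P_L(x) = prox_{I_L}(x)
v = x - u        # Moreau: prox_{(I_L)*}(x) = x - prox_{I_L}(x)

assert np.allclose(u + v, x)
assert abs(u @ v) < 1e-10                # the two components are orthogonal
```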
Projection on affine sets

hyperplane: C = { x : a^T x = b } (with a ≠ 0)

    P_C(x) = x + ( (b - a^T x) / ||a||_2^2 ) a

affine set: C = { x : Ax = b } (with A ∈ R^{p×n} and rank(A) = p)

    P_C(x) = x + A^T (A A^T)^{-1} (b - Ax)

inexpensive if p ≪ n, or if A A^T = I, ...

Proximal mapping 6-20
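Both formulas above are direct to implement; here is a minimal sketch with numpy (the function names are my own):

```python
import numpy as np

def project_hyperplane(x, a, b):
    """P_C for C = {x : a^T x = b}, with a != 0."""
    return x + ((b - a @ x) / (a @ a)) * a

def project_affine(x, A, b):
    """P_C for C = {x : A x = b}, A (p x n) with full row rank p."""
    return x + A.T @ np.linalg.solve(A @ A.T, b - A @ x)

# the projected point satisfies the constraint exactly
a = np.array([1.0, 2.0])
p = project_hyperplane(np.array([0.0, 0.0]), a, 3.0)
assert abs(a @ p - 3.0) < 1e-12
```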
Projection on simple polyhedral sets

halfspace: C = { x : a^T x <= b } (with a ≠ 0)

    P_C(x) = x + ( (b - a^T x) / ||a||_2^2 ) a   if a^T x > b
    P_C(x) = x                                   if a^T x <= b

rectangle: C = [l, u] = { x : l <= x <= u }

    P_C(x)_i = { l_i,  x_i <= l_i
               { x_i,  l_i <= x_i <= u_i
               { u_i,  x_i >= u_i

nonnegative orthant: C = R^n_+

    P_C(x) = x_+    (x_+ is the componentwise max of 0 and x)

Proximal mapping 6-21
probability simplex: C = { x : 1^T x = 1, x ⪰ 0 }

    P_C(x) = (x - λ1)_+

where λ is the solution of the equation

    1^T (x - λ1)_+ = Σ_{k=1}^n max{0, x_k - λ} = 1

intersection of hyperplane and rectangle: C = { x : a^T x = b, l <= x <= u }

    P_C(x) = P_{[l,u]}(x - λa)

where λ is the solution of  a^T P_{[l,u]}(x - λa) = b

Proximal mapping 6-22
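The scalar equation for λ is monotone, so it can be solved by bisection. A minimal sketch of the simplex projection in numpy (the function name and the bisection tolerance are my choices):

```python
import numpy as np

def project_simplex(x):
    """P_C for the probability simplex C = {x : 1^T x = 1, x >= 0}:
    P_C(x) = (x - lam)_+ with lam chosen so the components sum to 1."""
    # g(lam) = sum((x - lam)_+) is nonincreasing in lam; bisect for g(lam) = 1
    lo, hi = x.min() - 1.0, x.max()
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.maximum(x - lam, 0.0).sum() > 1.0:
            lo = lam
        else:
            hi = lam
    return np.maximum(x - lam, 0.0)

p = project_simplex(np.array([0.9, 0.6, -1.0]))
assert abs(p.sum() - 1.0) < 1e-9 and (p >= 0).all()
```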
Projection on norm balls

Euclidean ball: C = { x : ||x||_2 <= 1 }

    P_C(x) = x / ||x||_2   if ||x||_2 > 1,      P_C(x) = x   if ||x||_2 <= 1

1-norm ball: C = { x : ||x||_1 <= 1 }

    P_C(x)_k = { x_k - λ,  x_k > λ
               { 0,        -λ <= x_k <= λ
               { x_k + λ,  x_k < -λ

λ = 0 if ||x||_1 <= 1; otherwise λ is the solution of the equation

    Σ_{k=1}^n max{ |x_k| - λ, 0 } = 1

Proximal mapping 6-23
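As with the simplex, λ for the 1-norm ball solves a monotone scalar equation, so bisection works; a sketch in numpy (the function name is mine):

```python
import numpy as np

def project_l1_ball(x):
    """P_C for C = {x : ||x||_1 <= 1}: soft-threshold by lam, where lam = 0
    if ||x||_1 <= 1 and otherwise solves sum_k max(|x_k| - lam, 0) = 1."""
    if np.abs(x).sum() <= 1.0:
        return x
    lo, hi = 0.0, np.abs(x).max()
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.maximum(np.abs(x) - lam, 0.0).sum() > 1.0:
            lo = lam
        else:
            hi = lam
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

p = project_l1_ball(np.array([2.0, -0.5]))
assert abs(np.abs(p).sum() - 1.0) < 1e-9
```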
Projection on simple cones

second-order cone: C = { (x, t) ∈ R^{n+1} : ||x||_2 <= t }

    P_C(x, t) = (x, t)    if ||x||_2 <= t
    P_C(x, t) = (0, 0)    if ||x||_2 <= -t

and, if -||x||_2 < t < ||x||_2 (x ≠ 0),

    P_C(x, t) = ( (t + ||x||_2) / (2 ||x||_2) ) ( x, ||x||_2 )

positive semidefinite cone: C = S^n_+

    P_C(X) = Σ_{i=1}^n max{0, λ_i} q_i q_i^T

if X = Σ_{i=1}^n λ_i q_i q_i^T is the eigenvalue decomposition of X

Proximal mapping 6-24
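The semidefinite-cone formula is one eigendecomposition plus a clip of the eigenvalues; a minimal numpy sketch (the function name is mine):

```python
import numpy as np

def project_psd(X):
    """P_C for C = S^n_+: clip negative eigenvalues of symmetric X to zero."""
    lam, Q = np.linalg.eigh(X)   # eigendecomposition X = Q diag(lam) Q^T
    return Q @ np.diag(np.maximum(lam, 0.0)) @ Q.T

# example: eigenvalues of X are -2 and 2; the -2 part is removed
X = np.array([[0.0, 2.0], [2.0, 0.0]])
P = project_psd(X)
assert np.allclose(P, np.array([[1.0, 1.0], [1.0, 1.0]]))
```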
Other examples of proximal mappings

quadratic function: h(x) = (1/2) x^T A x + b^T x + c

    prox_{th}(x) = (I + tA)^{-1} (x - tb)

Euclidean norm: h(x) = ||x||_2

    prox_{th}(x) = { (1 - t/||x||_2) x,  ||x||_2 >= t
                   { 0,                  otherwise

logarithmic barrier: h(x) = -Σ_{i=1}^n log x_i

    prox_{th}(x)_i = ( x_i + sqrt(x_i^2 + 4t) ) / 2,    i = 1, ..., n

Proximal mapping 6-25
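The logarithmic-barrier formula follows from the per-component stationarity condition u_i - x_i - t/u_i = 0, which a short numpy sketch can verify directly (the function name is mine):

```python
import numpy as np

def prox_log_barrier(x, t):
    """prox of t*h with h(x) = -sum_i log x_i:
    prox_{th}(x)_i = (x_i + sqrt(x_i^2 + 4t)) / 2."""
    return (x + np.sqrt(x * x + 4.0 * t)) / 2.0

x = np.array([-1.0, 0.5, 2.0])
t = 0.3
u = prox_log_barrier(x, t)
assert np.all(u > 0)                     # result lies in dom h
assert np.allclose(u - x, t / u)         # stationarity: u - x = t/u
```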
Some calculus rules

separable sum: h(x_1, x_2) = h_1(x_1) + h_2(x_2)

    prox_h(x_1, x_2) = ( prox_{h_1}(x_1), prox_{h_2}(x_2) )

scaling and translation of argument: h(x) = f(λx + a) with λ ≠ 0

    prox_h(x) = (1/λ) ( prox_{λ² f}(λx + a) - a )

Proximal mapping 6-26
prox-operator of conjugate: for t > 0,

    prox_{th*}(x) = x - t prox_{h/t}(x/t)

proof: the conjugate of g(u) = h(u)/t is

    g*(v) = sup_u ( v^T u - h(u)/t ) = (1/t) sup_u ( (tv)^T u - h(u) ) = (1/t) h*(tv)

by the Moreau decomposition and the second property on page 6-26,

    prox_g(y) = y - prox_{g*}(y) = y - (1/t) prox_{th*}(ty)

a change of variables x = ty gives

    prox_{h/t}(x/t) = (1/t) ( x - prox_{th*}(x) )

Proximal mapping 6-27
Support function

support function of a closed convex set C:

    h(x) = S_C(x) = sup_{y ∈ C} x^T y

prox-operator: h* = I_C is the indicator function of C; so from p. 6-27,

    prox_{th}(x) = x - t prox_{h*/t}(x/t) = x - t P_C(x/t)

example: h(x) is the sum of the largest r components of x

    h(x) = x_[1] + ··· + x_[r] = S_C(x),    C = { y : 0 <= y <= 1, 1^T y = r }

the prox-operator of h is easily evaluated via projection on C

Proximal mapping 6-28
Norms

    h(x) = ||x||,    h*(y) = I_B(y)

where B = { y : ||y||_* <= 1 } is the unit ball of the dual norm (p. 6-14)

prox-operator: from page 6-27,

    prox_{th}(x) = x - t prox_{h*/t}(x/t) = x - t P_B(x/t)

a useful formula for prox_{th} when the projection P_B is inexpensive

examples

• h(x) = ||x||_1, B = { y : ||y||_∞ <= 1 } (gives the soft-threshold operator; p. 6-2)
• h(x) = ||x||_2, B = { y : ||y||_2 <= 1 } (gives the formula of page 6-25)

Proximal mapping 6-29
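For the 1-norm, the identity above can be checked numerically: projecting x/t onto the ∞-norm ball is a componentwise clip, and the result should agree with direct soft-thresholding. A small numpy sketch (the function names are mine):

```python
import numpy as np

def soft_threshold(x, t):
    """Direct formula for prox of t*||.||_1 (p. 6-2)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_l1_via_moreau(x, t):
    """prox_{t||.||_1}(x) = x - t * P_B(x/t), B the l_inf unit ball."""
    return x - t * np.clip(x / t, -1.0, 1.0)

x = np.array([3.0, -0.2, 0.7])
t = 0.5
assert np.allclose(prox_l1_via_moreau(x, t), soft_threshold(x, t))
```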
Distance to a point

distance in a general norm: h(x) = ||x - a||

prox-operator: from p. 6-26, with g(x) = ||x||,

    prox_{th}(x) = a + prox_{tg}(x - a)
                 = a + (x - a) - t P_B( (x - a)/t )
                 = x - t P_B( (x - a)/t )

where B is the unit ball of the dual norm

Proximal mapping 6-30
Euclidean distance to a set

Euclidean distance (C is a closed convex set)

    d(x) = inf_{y ∈ C} ||x - y||_2

prox-operator

    prox_{td}(x) = θ P_C(x) + (1 - θ) x,    θ = { t/d(x),  d(x) >= t
                                                { 1,       otherwise

prox-operator of the squared distance: h(x) = d(x)²/2

    prox_{th}(x) = (1/(1+t)) x + (t/(1+t)) P_C(x)

Proximal mapping 6-31
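The distance prox moves x a distance t toward its projection (or all the way when d(x) <= t). A minimal numpy sketch with C the nonnegative orthant (the function names are mine):

```python
import numpy as np

def prox_dist(x, t, proj):
    """prox of t*d, where d(x) is the Euclidean distance to a closed
    convex set C and proj computes P_C."""
    p = proj(x)
    dist = np.linalg.norm(x - p)
    theta = t / dist if dist >= t else 1.0
    return theta * p + (1.0 - theta) * x

proj_orthant = lambda x: np.maximum(x, 0.0)
x = np.array([1.0, -2.0])            # distance to R^2_+ is 2
u = prox_dist(x, t=0.5, proj=proj_orthant)
assert np.allclose(u, [1.0, -1.5])   # moved 0.5 toward P_C(x) = (1, 0)
```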
proof of expression for prox_{td}(x)

• if u = prox_{td}(x) ∉ C, then from page 6-15 and page 5-15,

    x - u = (t/d(u)) ( u - P_C(u) )

this implies P_C(u) = P_C(x), d(x) >= t, and u is a convex combination of x and P_C(x)

(figure: u on the segment between P_C(u) and x)

• if u ∈ C minimizes d(u) + (1/(2t)) ||u - x||_2^2, then u = P_C(x)

Proximal mapping 6-32
proof of expression for prox_{th}(x) when h(x) = d(x)²/2

    prox_{th}(x) = argmin_u ( (1/2) d(u)² + (1/(2t)) ||u - x||_2^2 )
                 = argmin_u inf_{v ∈ C} ( (1/2) ||u - v||_2^2 + (1/(2t)) ||u - x||_2^2 )

the optimal u as a function of v is

    u = (t/(t+1)) v + (1/(t+1)) x

the optimal v minimizes

    (1/2) || (t/(t+1)) v + (1/(t+1)) x - v ||_2^2 + (1/(2t)) || (t/(t+1)) v + (1/(t+1)) x - x ||_2^2
        = (1/(2(1+t))) ||v - x||_2^2

over C, i.e., v = P_C(x)

Proximal mapping 6-33
References

this lecture is a slightly modified version of:

• L. Vandenberghe, Lecture notes for EE236C - Optimization Methods for Large-Scale Systems (Spring 2011), UCLA.

• P. L. Combettes and V.-R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling and Simulation (2005).

• P. L. Combettes and J.-Ch. Pesquet, Proximal splitting methods in signal processing, arxiv.org/abs/0912.3522v4 (2010).

Proximal mapping 6-34