EE 546, Univ of Washington, Spring 2012

6. Proximal mapping

• introduction
• review of conjugate functions
• proximal mapping

Proximal mapping 6-1
Proximal mapping

the proximal mapping (prox-operator) of a convex function h is

    prox_h(x) = argmin_u ( h(u) + (1/2) ||u - x||_2^2 )

examples

• h(x) = 0:  prox_h(x) = x

• h(x) = I_C(x) (indicator function of C):  prox_h is projection on C

      prox_h(x) = argmin_{u ∈ C} ||u - x||_2^2 = P_C(x)

• h(x) = t ||x||_1:  prox_h is the soft-threshold (shrinkage) operation

      prox_h(x)_i = { x_i - t,   x_i >= t
                    { 0,         |x_i| <= t
                    { x_i + t,   x_i <= -t

Proximal mapping 6-2
Proximal gradient method

unconstrained problem with cost function split in two components

    minimize  f(x) = g(x) + h(x)

• g convex, differentiable, with dom g = R^n
• h convex, possibly nondifferentiable, with inexpensive prox-operator

proximal gradient algorithm

    x^(k) = prox_{t_k h} ( x^(k-1) - t_k ∇g(x^(k-1)) )

t_k > 0 is step size, constant or determined by line search

Proximal mapping 6-3
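As a concrete illustration of the iteration above, here is a minimal sketch in Python with numpy (the names `proximal_gradient` and `soft_threshold` are my own, not from the slides), applied to a small lasso problem minimize (1/2)||Ax - b||_2^2 + lam ||x||_1:

```python
import numpy as np

def proximal_gradient(grad_g, prox_h, x0, t, iters=500):
    """Proximal gradient iteration x+ = prox_{t h}(x - t * grad g(x)).

    grad_g: gradient of the smooth term g
    prox_h: function (v, t) -> prox_{t h}(v)
    t:      constant step size (t < 2/L for L-Lipschitz grad g)
    """
    x = x0
    for _ in range(iters):
        x = prox_h(x - t * grad_g(x), t)
    return x

# soft-threshold is the prox-operator of lam*||.||_1 (see p. 6-2)
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# small lasso instance with a diagonal A, so the exact answer is easy to check
A = np.array([[1.0, 0.0], [0.0, 2.0]])
b = np.array([1.0, -2.0])
lam = 0.1
grad_g = lambda x: A.T @ (A @ x - b)
prox_h = lambda v, t: soft_threshold(v, lam * t)
x_star = proximal_gradient(grad_g, prox_h, np.zeros(2), t=0.2)
# coordinate-wise optimality gives x_star = (0.9, -0.975)
```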
Interpretation

    x^+ = prox_{th} ( x - t ∇g(x) )

from the definition of the proximal operator:

    x^+ = argmin_u ( h(u) + (1/(2t)) ||u - x + t ∇g(x)||_2^2 )
        = argmin_u ( h(u) + g(x) + ∇g(x)^T (u - x) + (1/(2t)) ||u - x||_2^2 )

x^+ minimizes h(u) plus a simple quadratic local model of g(u) around x

Proximal mapping 6-4
Examples

    minimize  g(x) + h(x)

gradient method: h(x) = 0, i.e., minimize g(x)

    x^+ = x - t ∇g(x)

gradient projection method: h(x) = I_C(x), i.e., minimize g(x) over C

    x^+ = P_C ( x - t ∇g(x) )

(figure: the gradient step x - t ∇g(x) leaves C; projecting it back onto C gives x^+)

Proximal mapping 6-5
soft-thresholding: h(x) = ||x||_1, i.e., minimize g(x) + ||x||_1

    x^+ = prox_{th} ( x - t ∇g(x) )

and

    prox_{th}(u)_i = { u_i - t,   u_i >= t
                     { 0,         -t <= u_i <= t
                     { u_i + t,   u_i <= -t

(figure: graph of prox_{th}(u)_i versus u_i, flat on the interval [-t, t])

Proximal mapping 6-6
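The componentwise formula above can be sanity-checked against the defining minimization. A small sketch (the helper name `soft_threshold` is mine) compares it with a brute-force one-dimensional search over a fine grid:

```python
import numpy as np

def soft_threshold(u, t):
    """prox of t*||.||_1: shrink each component toward zero by t."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

# check against the definition prox(u) = argmin_v ( t|v| + (1/2)(v - u)^2 ),
# solved by brute force on a fine grid for a few scalar inputs
t = 0.5
grid = np.linspace(-3, 3, 60001)
for u in [-2.0, -0.3, 0.0, 0.4, 1.7]:
    brute = grid[np.argmin(t * np.abs(grid) + 0.5 * (grid - u) ** 2)]
    assert abs(soft_threshold(np.array([u]), t)[0] - brute) < 1e-3
```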
Outline

• introduction
• review of conjugate functions
• proximal mapping
Closed convex function

a function with a closed convex epigraph

examples

    f(x) = x,    f(x) = -log(1 - x^2),    f(x) = { x log x,  x > 0
                                                 { 0,        x = 0
                                                 { +∞,       x < 0

convex functions that are not closed

    f(x) = { x,   x < 1          f(x) = { x log x,  x > 0
           { +∞,  x >= 1                { 1,        x = 0
                                        { +∞,       x < 0

Proximal mapping 6-7
Conjugate function

the conjugate of a function f is

    f*(y) = sup_{x ∈ dom f} ( y^T x - f(x) )

(figure: the affine function x ↦ xy - f*(y) is the tightest lower bound on f with slope y; its vertical intercept is at (0, -f*(y)))

• f* is closed and convex (even if f is not)

• Fenchel's inequality:

    f(x) + f*(y) >= x^T y    for all x, y

Proximal mapping 6-8
Examples

negative logarithm: f(x) = -log x

    f*(y) = sup_{x > 0} ( xy + log x ) = { -1 - log(-y),  y < 0
                                         { +∞,            otherwise

quadratic function: f(x) = (1/2) x^T Q x with Q ∈ S^n_{++}

    f*(y) = sup_x ( y^T x - (1/2) x^T Q x ) = (1/2) y^T Q^{-1} y

Proximal mapping 6-9
Conjugate of closed functions

for f closed and convex: f** = f, and

    y ∈ ∂f(x)  if and only if  x ∈ ∂f*(y)

proof: if ŷ ∈ ∂f(x̂), then f(z) >= f(x̂) + ŷ^T (z - x̂) for all z, or

    z^T ŷ - f(z) <= x̂^T ŷ - f(x̂)    for all z

so z^T ŷ - f(z) reaches its maximum over z at x̂; therefore x̂ ∈ ∂f*(ŷ) (see page 4-15, on the subgradient of a sup of functions); this shows ŷ ∈ ∂f(x̂) ⟹ x̂ ∈ ∂f*(ŷ); the reverse implication follows from f** = f

hence, for closed convex f the following three statements are equivalent:

    x^T y = f(x) + f*(y),    y ∈ ∂f(x),    x ∈ ∂f*(y)

Proximal mapping 6-10
Indicator function

the indicator function of a set C is

    I_C(x) = { 0,   x ∈ C
             { +∞,  otherwise

(figure: I_[-1,1] is closed; I_(-1,1), which jumps to +∞ at the open endpoints, is not)

the indicator function of a (closed) convex set is a (closed) convex function

Proximal mapping 6-11
Subgradients and conjugate of indicator function

• subdifferential is the normal cone to C at x (notation: N_C(x))

    ∂I_C(x) = N_C(x) = { s : s^T (y - x) <= 0 for all y ∈ C }

(figure: the normal cone N_C(x) at a boundary point x of C)

• conjugate is the support function of C (notation: S_C(y))

    I_C*(y) = sup_x ( y^T x - I_C(x) ) = sup_{x ∈ C} y^T x

Proximal mapping 6-12
Dual norm

the dual norm of a norm ||·|| is

    ||y||_* = sup_{||x|| <= 1} y^T x

(the support function of the unit ball of ||·||)

common pairs of dual vector norms

    ||x||_2 and ||y||_2,    ||x||_1 and ||y||_∞,    sqrt(x^T Q x) and sqrt(y^T Q^{-1} y)  (Q ≻ 0)

common pairs of dual matrix norms (for inner product tr(X^T Y))

    ||X||_F and ||Y||_F,    ||X||_2 = σ_max(X) and ||Y||_* = Σ_i σ_i(Y)

Proximal mapping 6-13
Conjugate of norm

the conjugate of f(x) = ||x|| is the indicator function of the dual norm unit ball

    f*(y) = sup_x ( y^T x - ||x|| ) = { 0,   ||y||_* <= 1
                                      { +∞,  otherwise

proof

• if ||y||_* <= 1, then by definition of the dual norm,

    y^T x <= ||x||    for all x

and equality holds if x = 0; therefore sup_x ( y^T x - ||x|| ) = 0

• if ||y||_* > 1, there exists an x with ||x|| <= 1, y^T x > 1; then

    f*(y) >= y^T (tx) - ||tx|| = t ( y^T x - ||x|| ) → ∞    as t → ∞

Proximal mapping 6-14
Outline

• introduction
• review of conjugate functions
• proximal mapping
Proximal mapping

definition: the proximal mapping associated with a closed convex h is

    prox_h(x) = argmin_u ( h(u) + (1/2) ||u - x||_2^2 )

it can be shown that prox_h(x) exists and is unique for all x

subgradient characterization: from the optimality conditions of the minimization in the definition,

    u = prox_h(x)  ⟺  x - u ∈ ∂h(u)

Proximal mapping 6-15
Projection

the proximal mapping of the indicator function I_C is the Euclidean projection on C

    prox_{I_C}(x) = argmin_{u ∈ C} ||u - x||_2^2 = P_C(x)

subgradient characterization: u = P_C(x) satisfies

    x - u ∈ ∂I_C(u) = N_C(u)

in other words,

    (x - u)^T (y - u) <= 0    for all y ∈ C

(figure: x - u lies in the normal cone to C at u)

we will see that proximal mappings have many properties of projections

Proximal mapping 6-16
Nonexpansiveness

if u = prox_h(x), v = prox_h(y), then

    (u - v)^T (x - y) >= ||u - v||_2^2

i.e., prox_h is firmly nonexpansive, or co-coercive with constant 1

this follows from the characterization of p. 6-15 and monotonicity (p. 4-8):

    x - u ∈ ∂h(u),  y - v ∈ ∂h(v)  ⟹  (x - u - y + v)^T (u - v) >= 0

and implies (from the Cauchy–Schwarz inequality)

    ||prox_h(x) - prox_h(y)||_2 <= ||x - y||_2

so prox_h is nonexpansive, or Lipschitz continuous with constant 1

Proximal mapping 6-17
Moreau decomposition

    prox_{h*}(x) = x - prox_h(x)

proof: define u = prox_h(x), v = x - u

• from the subgradient characterization on p. 6-15:  v ∈ ∂h(u)
• hence (from p. 6-10),  u ∈ ∂h*(v)
• therefore (again from p. 6-15),  v = prox_{h*}(x)

interpretation: decomposition of x in two components

    x = prox_h(x) + prox_{h*}(x)

Proximal mapping 6-18
example: h(u) = I_L(u), L a subspace of R^n

the conjugate is the indicator function of the orthogonal complement L⊥:

    h*(v) = sup_{u ∈ L} v^T u = { 0,   v ∈ L⊥
                                { +∞,  otherwise }  = I_{L⊥}(v)

the Moreau decomposition is the orthogonal decomposition

    x = P_L(x) + P_{L⊥}(x)

Proximal mapping 6-19
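The subspace example above makes the Moreau decomposition easy to check numerically. A small sketch with numpy (the projector construction is mine; L is taken as the range of a random matrix):

```python
import numpy as np

# Moreau decomposition for h = I_L, L a subspace: prox_h = P_L and
# prox_{h*} = P_{L^perp}, so x = P_L(x) + P_{Lperp}(x) is orthogonal.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))          # L = range(A), a 2-dim subspace of R^5
P = A @ np.linalg.solve(A.T @ A, A.T)    # orthogonal projector onto L

x = rng.standard_normal(5)
u = P @ x        # P_L(x) = prox_{I_L}(x)
v = x - u        # Moreau: prox_{(I_L)*}(x) = x - prox_{I_L}(x)

assert np.allclose(u + v, x)
assert abs(u @ v) < 1e-10                # the two components are orthogonal
```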
Projection on affine sets

hyperplane: C = { x : a^T x = b } (with a ≠ 0)

    P_C(x) = x + ( (b - a^T x) / ||a||_2^2 ) a

affine set: C = { x : Ax = b } (with A ∈ R^{p×n} and rank(A) = p)

    P_C(x) = x + A^T (A A^T)^{-1} (b - Ax)

inexpensive if p ≪ n, or if A A^T = I, ...

Proximal mapping 6-20
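Both formulas above are direct to implement; here is a minimal sketch with numpy (the function names are my own):

```python
import numpy as np

def project_hyperplane(x, a, b):
    """P_C for C = {x : a^T x = b}, with a != 0."""
    return x + ((b - a @ x) / (a @ a)) * a

def project_affine(x, A, b):
    """P_C for C = {x : A x = b}, A (p x n) with full row rank p."""
    return x + A.T @ np.linalg.solve(A @ A.T, b - A @ x)

# the projected point satisfies the constraint exactly
a = np.array([1.0, 2.0])
p = project_hyperplane(np.array([0.0, 0.0]), a, 3.0)
assert abs(a @ p - 3.0) < 1e-12
```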
Projection on simple polyhedral sets

halfspace: C = { x : a^T x <= b } (with a ≠ 0)

    P_C(x) = x + ( (b - a^T x) / ||a||_2^2 ) a   if a^T x > b
    P_C(x) = x                                   if a^T x <= b

rectangle: C = [l, u] = { x : l <= x <= u }

    P_C(x)_i = { l_i,  x_i <= l_i
               { x_i,  l_i <= x_i <= u_i
               { u_i,  x_i >= u_i

nonnegative orthant: C = R^n_+

    P_C(x) = x_+    (x_+ is the componentwise max of 0 and x)

Proximal mapping 6-21
probability simplex: C = { x : 1^T x = 1, x ⪰ 0 }

    P_C(x) = (x - λ1)_+

where λ is the solution of the equation

    1^T (x - λ1)_+ = Σ_{k=1}^n max{0, x_k - λ} = 1

intersection of hyperplane and rectangle: C = { x : a^T x = b, l <= x <= u }

    P_C(x) = P_{[l,u]}(x - λa)

where λ is the solution of  a^T P_{[l,u]}(x - λa) = b

Proximal mapping 6-22
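The scalar equation for λ is monotone, so it can be solved by bisection. A minimal sketch of the simplex projection in numpy (the function name and the bisection tolerance are my choices):

```python
import numpy as np

def project_simplex(x):
    """P_C for the probability simplex C = {x : 1^T x = 1, x >= 0}:
    P_C(x) = (x - lam)_+ with lam chosen so the components sum to 1."""
    # g(lam) = sum((x - lam)_+) is nonincreasing in lam; bisect for g(lam) = 1
    lo, hi = x.min() - 1.0, x.max()
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.maximum(x - lam, 0.0).sum() > 1.0:
            lo = lam
        else:
            hi = lam
    return np.maximum(x - lam, 0.0)

p = project_simplex(np.array([0.9, 0.6, -1.0]))
assert abs(p.sum() - 1.0) < 1e-9 and (p >= 0).all()
```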
Projection on norm balls

Euclidean ball: C = { x : ||x||_2 <= 1 }

    P_C(x) = x / ||x||_2   if ||x||_2 > 1,      P_C(x) = x   if ||x||_2 <= 1

1-norm ball: C = { x : ||x||_1 <= 1 }

    P_C(x)_k = { x_k - λ,  x_k > λ
               { 0,        -λ <= x_k <= λ
               { x_k + λ,  x_k < -λ

λ = 0 if ||x||_1 <= 1; otherwise λ is the solution of the equation

    Σ_{k=1}^n max{ |x_k| - λ, 0 } = 1

Proximal mapping 6-23
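As with the simplex, λ for the 1-norm ball solves a monotone scalar equation, so bisection works; a sketch in numpy (the function name is mine):

```python
import numpy as np

def project_l1_ball(x):
    """P_C for C = {x : ||x||_1 <= 1}: soft-threshold by lam, where lam = 0
    if ||x||_1 <= 1 and otherwise solves sum_k max(|x_k| - lam, 0) = 1."""
    if np.abs(x).sum() <= 1.0:
        return x
    lo, hi = 0.0, np.abs(x).max()
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.maximum(np.abs(x) - lam, 0.0).sum() > 1.0:
            lo = lam
        else:
            hi = lam
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

p = project_l1_ball(np.array([2.0, -0.5]))
assert abs(np.abs(p).sum() - 1.0) < 1e-9
```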
Projection on simple cones

second-order cone: C = { (x, t) ∈ R^{n+1} : ||x||_2 <= t }

    P_C(x, t) = (x, t)    if ||x||_2 <= t
    P_C(x, t) = (0, 0)    if ||x||_2 <= -t

and, if -||x||_2 < t < ||x||_2 (x ≠ 0),

    P_C(x, t) = ( (t + ||x||_2) / (2 ||x||_2) ) ( x, ||x||_2 )

positive semidefinite cone: C = S^n_+

    P_C(X) = Σ_{i=1}^n max{0, λ_i} q_i q_i^T

if X = Σ_{i=1}^n λ_i q_i q_i^T is the eigenvalue decomposition of X

Proximal mapping 6-24
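The semidefinite-cone formula is one eigendecomposition plus a clip of the eigenvalues; a minimal numpy sketch (the function name is mine):

```python
import numpy as np

def project_psd(X):
    """P_C for C = S^n_+: clip negative eigenvalues of symmetric X to zero."""
    lam, Q = np.linalg.eigh(X)   # eigendecomposition X = Q diag(lam) Q^T
    return Q @ np.diag(np.maximum(lam, 0.0)) @ Q.T

# example: eigenvalues of X are -2 and 2; the -2 part is removed
X = np.array([[0.0, 2.0], [2.0, 0.0]])
P = project_psd(X)
assert np.allclose(P, np.array([[1.0, 1.0], [1.0, 1.0]]))
```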
Other examples of proximal mappings

quadratic function: h(x) = (1/2) x^T A x + b^T x + c

    prox_{th}(x) = (I + tA)^{-1} (x - tb)

Euclidean norm: h(x) = ||x||_2

    prox_{th}(x) = { (1 - t/||x||_2) x,  ||x||_2 >= t
                   { 0,                  otherwise

logarithmic barrier: h(x) = -Σ_{i=1}^n log x_i

    prox_{th}(x)_i = ( x_i + sqrt(x_i^2 + 4t) ) / 2,    i = 1, ..., n

Proximal mapping 6-25
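The logarithmic-barrier formula follows from the per-component stationarity condition u_i - x_i - t/u_i = 0, which a short numpy sketch can verify directly (the function name is mine):

```python
import numpy as np

def prox_log_barrier(x, t):
    """prox of t*h with h(x) = -sum_i log x_i:
    prox_{th}(x)_i = (x_i + sqrt(x_i^2 + 4t)) / 2."""
    return (x + np.sqrt(x * x + 4.0 * t)) / 2.0

x = np.array([-1.0, 0.5, 2.0])
t = 0.3
u = prox_log_barrier(x, t)
assert np.all(u > 0)                     # result lies in dom h
assert np.allclose(u - x, t / u)         # stationarity: u - x = t/u
```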
Some calculus rules

separable sum: h(x_1, x_2) = h_1(x_1) + h_2(x_2)

    prox_h(x_1, x_2) = ( prox_{h_1}(x_1), prox_{h_2}(x_2) )

scaling and translation of argument: h(x) = f(λx + a) with λ ≠ 0

    prox_h(x) = (1/λ) ( prox_{λ² f}(λx + a) - a )

Proximal mapping 6-26
prox-operator of conjugate: for t > 0,

    prox_{th*}(x) = x - t prox_{h/t}(x/t)

proof: the conjugate of g(u) = h(u)/t is

    g*(v) = sup_u ( v^T u - h(u)/t ) = (1/t) sup_u ( (tv)^T u - h(u) ) = (1/t) h*(tv)

by the Moreau decomposition and the second property on page 6-26,

    prox_g(y) = y - prox_{g*}(y) = y - (1/t) prox_{th*}(ty)

a change of variables x = ty gives

    prox_{h/t}(x/t) = (1/t) ( x - prox_{th*}(x) )

Proximal mapping 6-27
Support function

support function of a closed convex set C:

    h(x) = S_C(x) = sup_{y ∈ C} x^T y

prox-operator: h* = I_C is the indicator function of C; so from p. 6-27,

    prox_{th}(x) = x - t prox_{h*/t}(x/t) = x - t P_C(x/t)

example: h(x) is the sum of the largest r components of x

    h(x) = x_[1] + ··· + x_[r] = S_C(x),    C = { y : 0 <= y <= 1, 1^T y = r }

the prox-operator of h is easily evaluated via projection on C

Proximal mapping 6-28
Norms

    h(x) = ||x||,    h*(y) = I_B(y)

where B = { y : ||y||_* <= 1 } is the unit ball of the dual norm (p. 6-14)

prox-operator: from page 6-27,

    prox_{th}(x) = x - t prox_{h*/t}(x/t) = x - t P_B(x/t)

a useful formula for prox_{th} when the projection P_B is inexpensive

examples

• h(x) = ||x||_1, B = { y : ||y||_∞ <= 1 } (gives the soft-threshold operator; p. 6-2)
• h(x) = ||x||_2, B = { y : ||y||_2 <= 1 } (gives the formula of page 6-25)

Proximal mapping 6-29
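For the 1-norm, the identity above can be checked numerically: projecting x/t onto the ∞-norm ball is a componentwise clip, and the result should agree with direct soft-thresholding. A small numpy sketch (the function names are mine):

```python
import numpy as np

def soft_threshold(x, t):
    """Direct formula for prox of t*||.||_1 (p. 6-2)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_l1_via_moreau(x, t):
    """prox_{t||.||_1}(x) = x - t * P_B(x/t), B the l_inf unit ball."""
    return x - t * np.clip(x / t, -1.0, 1.0)

x = np.array([3.0, -0.2, 0.7])
t = 0.5
assert np.allclose(prox_l1_via_moreau(x, t), soft_threshold(x, t))
```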
Distance to a point

distance in a general norm: h(x) = ||x - a||

prox-operator: from p. 6-26, with g(x) = ||x||,

    prox_{th}(x) = a + prox_{tg}(x - a)
                 = a + (x - a) - t P_B( (x - a)/t )
                 = x - t P_B( (x - a)/t )

where B is the unit ball of the dual norm

Proximal mapping 6-30
Euclidean distance to a set

Euclidean distance (C is a closed convex set)

    d(x) = inf_{y ∈ C} ||x - y||_2

prox-operator

    prox_{td}(x) = θ P_C(x) + (1 - θ) x,    θ = { t/d(x),  d(x) >= t
                                                { 1,       otherwise

prox-operator of the squared distance: h(x) = d(x)²/2

    prox_{th}(x) = (1/(1+t)) x + (t/(1+t)) P_C(x)

Proximal mapping 6-31
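The distance prox moves x a distance t toward its projection (or all the way when d(x) <= t). A minimal numpy sketch with C the nonnegative orthant (the function names are mine):

```python
import numpy as np

def prox_dist(x, t, proj):
    """prox of t*d, where d(x) is the Euclidean distance to a closed
    convex set C and proj computes P_C."""
    p = proj(x)
    dist = np.linalg.norm(x - p)
    theta = t / dist if dist >= t else 1.0
    return theta * p + (1.0 - theta) * x

proj_orthant = lambda x: np.maximum(x, 0.0)
x = np.array([1.0, -2.0])            # distance to R^2_+ is 2
u = prox_dist(x, t=0.5, proj=proj_orthant)
assert np.allclose(u, [1.0, -1.5])   # moved 0.5 toward P_C(x) = (1, 0)
```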
proof of expression for prox_{td}(x)

• if u = prox_{td}(x) ∉ C, then from page 6-15 and page 5-15,

    x - u = (t/d(u)) ( u - P_C(u) )

this implies P_C(u) = P_C(x), d(x) >= t, and u is a convex combination of x and P_C(x)

(figure: u on the segment between P_C(u) and x)

• if u ∈ C minimizes d(u) + (1/(2t)) ||u - x||_2^2, then u = P_C(x)

Proximal mapping 6-32
proof of expression for prox_{th}(x) when h(x) = d(x)²/2

    prox_{th}(x) = argmin_u ( (1/2) d(u)² + (1/(2t)) ||u - x||_2^2 )
                 = argmin_u inf_{v ∈ C} ( (1/2) ||u - v||_2^2 + (1/(2t)) ||u - x||_2^2 )

the optimal u as a function of v is

    u = (t/(t+1)) v + (1/(t+1)) x

the optimal v minimizes

    (1/2) || (t/(t+1)) v + (1/(t+1)) x - v ||_2^2 + (1/(2t)) || (t/(t+1)) v + (1/(t+1)) x - x ||_2^2
        = (1/(2(1+t))) ||v - x||_2^2

over C, i.e., v = P_C(x)

Proximal mapping 6-33
References

this lecture is a slightly modified version of:

• L. Vandenberghe, Lecture notes for EE236C - Optimization Methods for Large-Scale Systems (Spring 2011), UCLA.

• P. L. Combettes and V.-R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling and Simulation (2005).

• P. L. Combettes and J.-Ch. Pesquet, Proximal splitting methods in signal processing, arxiv.org/abs/0912.3522v4 (2010).

Proximal mapping 6-34