The proximal mapping

1 The proximal mapping
Acknowledgement: these slides are based on Prof. Lieven Vandenberghe's lecture notes.

2 Outline
1. Closed function
2. Conjugate function
3. Proximal mapping

3 Closed set
a set $C$ is closed if it contains its boundary:
$$x^k \in C, \quad x^k \to \bar{x} \quad \Longrightarrow \quad \bar{x} \in C$$
operations that preserve closedness
- the intersection of (finitely or infinitely many) closed sets is closed
- the union of a finite number of closed sets is closed
- inverse under linear mapping: $\{x \mid Ax \in C\}$ is closed if $C$ is closed

4 Image under linear mapping
the image of a closed set under a linear mapping is not necessarily closed
example ($C$ is closed, $AC = \{Ax \mid x \in C\}$ is open):
$$C = \{(x_1, x_2) \in \mathbf{R}^2_+ \mid x_1 x_2 \geq 1\}, \qquad A = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad AC = \mathbf{R}_{++}$$
sufficient condition: $AC$ is closed if $C$ is closed and convex and $C$ does not have a recession direction in the nullspace of $A$, i.e.,
$$Ay = 0, \quad \hat{x} \in C, \quad \hat{x} + \alpha y \in C \;\; \forall \alpha \geq 0 \quad \Longrightarrow \quad y = 0$$
in particular, this holds for any $A$ if $C$ is bounded

5 Closed function
definition: a function is closed if its epigraph is a closed set
examples
- $f(x) = -\log(1 - x^2)$ with $\mathrm{dom}\, f = \{x \mid |x| < 1\}$
- $f(x) = x \log x$ with $\mathrm{dom}\, f = \mathbf{R}_+$ and $f(0) = 0$
- indicator function of a closed set $C$: $f(x) = 0$ if $x \in C = \mathrm{dom}\, f$
not closed
- $f(x) = x \log x$ with $\mathrm{dom}\, f = \mathbf{R}_{++}$, or with $\mathrm{dom}\, f = \mathbf{R}_+$ and $f(0) = 1$
- indicator function of a set $C$ if $C$ is not closed

6 Properties
sublevel sets: $f$ is closed if and only if all its sublevel sets are closed
minimum: if $f$ is closed with bounded sublevel sets then it has a minimizer
Weierstrass: suppose that the set $D \subseteq E$ (with $E$ a finite-dimensional vector space over $\mathbf{R}$) is nonempty and closed, and that all sublevel sets of the continuous function $f : D \to \mathbf{R}$ are bounded. Then $f$ has a global minimizer.
common operations on convex functions that preserve closedness
- sum: $f + g$ is closed if $f$ and $g$ are closed (and $\mathrm{dom}\, f \cap \mathrm{dom}\, g \neq \emptyset$)
- composition with affine mapping: $f(Ax + b)$ is closed if $f$ is closed
- supremum: $\sup_\alpha f_\alpha(x)$ is closed if each function $f_\alpha$ is closed

7 Outline
1. Closed function
2. Conjugate function
3. Proximal mapping

8 Conjugate function
the conjugate of a function $f$ is
$$f^*(y) = \sup_{x \in \mathrm{dom}\, f} \left( y^T x - f(x) \right)$$
$f^*$ is closed and convex even if $f$ is not
Fenchel's inequality
$$f(x) + f^*(y) \geq x^T y \quad \forall x, y$$
(extends the inequality $x^T x/2 + y^T y/2 \geq x^T y$ to non-quadratic convex $f$)

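9 Quadratic function
$$f(x) = \tfrac{1}{2} x^T A x + b^T x + c$$
strictly convex case ($A \succ 0$)
$$f^*(y) = \tfrac{1}{2} (y - b)^T A^{-1} (y - b) - c$$
general convex case ($A \succeq 0$)
$$f^*(y) = \tfrac{1}{2} (y - b)^T A^{\dagger} (y - b) - c, \qquad \mathrm{dom}\, f^* = \mathrm{range}(A) + b$$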
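As a numerical sanity check of the strictly convex case, the sketch below compares the closed-form conjugate of a scalar quadratic with a brute-force grid evaluation of the supremum, and evaluates the gap in Fenchel's inequality. The function names, the coefficients a = 2, b = 1, c = 0.5, and the grid bounds are illustrative assumptions, not part of the original slides.

```python
# Illustrative coefficients (assumed): f(x) = (1/2) a x^2 + b x + c with a > 0.
A_COEF, B_COEF, C_COEF = 2.0, 1.0, 0.5


def f(x):
    """Scalar strictly convex quadratic."""
    return 0.5 * A_COEF * x * x + B_COEF * x + C_COEF


def f_conj(y):
    """Closed-form conjugate: f*(y) = (y - b)^2 / (2a) - c."""
    return (y - B_COEF) ** 2 / (2.0 * A_COEF) - C_COEF


def f_conj_grid(y, lo=-50.0, hi=50.0, steps=200001):
    """Brute-force sup_x (y x - f(x)) over a uniform grid (comparison only)."""
    best = float("-inf")
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        best = max(best, y * x - f(x))
    return best


def fenchel_gap(x, y):
    """f(x) + f*(y) - x y, which Fenchel's inequality says is nonnegative."""
    return f(x) + f_conj(y) - x * y
```

For this quadratic the gap equals (a/2)(x - (y - b)/a)^2, so it vanishes exactly when y = ax + b, the gradient of f at x.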

10 Negative entropy and negative logarithm
negative entropy
$$f(x) = \sum_{i=1}^n x_i \log x_i, \qquad f^*(y) = \sum_{i=1}^n e^{y_i - 1}$$
negative logarithm
$$f(x) = -\sum_{i=1}^n \log x_i, \qquad f^*(y) = -\sum_{i=1}^n \log(-y_i) - n$$
matrix logarithm
$$f(X) = -\log\det X \quad (\mathrm{dom}\, f = \mathbf{S}^n_{++}), \qquad f^*(Y) = -\log\det(-Y) - n$$

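11 Indicator function and norm
indicator of convex set $C$: conjugate is the support function of $C$
$$f(x) = \begin{cases} 0, & x \in C \\ +\infty, & x \notin C \end{cases} \qquad f^*(y) = \sup_{x \in C} y^T x$$
norm: conjugate is the indicator of the unit dual norm ball
$$f(x) = \|x\|, \qquad f^*(y) = \begin{cases} 0, & \|y\|_* \leq 1 \\ +\infty, & \|y\|_* > 1 \end{cases}$$
(see next page)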

12 proof: recall the definition of dual norm:
$$\|y\|_* = \sup_{\|x\| \leq 1} x^T y$$
to evaluate $f^*(y) = \sup_x \left( y^T x - \|x\| \right)$ we distinguish two cases
- if $\|y\|_* \leq 1$, then (by definition of dual norm) $y^T x \leq \|x\|$ for all $x$, and equality holds if $x = 0$; therefore $\sup_x \left( y^T x - \|x\| \right) = 0$
- if $\|y\|_* > 1$, there exists an $x$ with $\|x\| \leq 1$, $x^T y > 1$; then
$$f^*(y) \geq y^T (tx) - \|tx\| = t \left( y^T x - \|x\| \right)$$
and the r.h.s. goes to infinity as $t \to \infty$

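13 The second conjugate
$$f^{**}(x) = \sup_{y \in \mathrm{dom}\, f^*} \left( x^T y - f^*(y) \right)$$
- $f^{**}$ is closed and convex
- from Fenchel's inequality ($x^T y - f^*(y) \leq f(x)$ for all $y$ and $x$):
$$f^{**}(x) \leq f(x) \quad \forall x$$
equivalently, $\mathrm{epi}\, f \subseteq \mathrm{epi}\, f^{**}$ (for any $f$)
- if $f$ is closed and convex, then
$$f^{**}(x) = f(x) \quad \forall x$$
equivalently, $\mathrm{epi}\, f = \mathrm{epi}\, f^{**}$ (if $f$ is closed convex); proof on next page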

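14 proof ($f^{**} = f$ if $f$ is closed and convex): by contradiction
suppose $(x, f^{**}(x)) \notin \mathrm{epi}\, f$; then there is a strict separating hyperplane:
$$\begin{bmatrix} a \\ b \end{bmatrix}^T \begin{bmatrix} z - x \\ s - f^{**}(x) \end{bmatrix} \leq c < 0 \quad \forall (z, s) \in \mathrm{epi}\, f$$
for some $a$, $b$, $c$ with $b \leq 0$ ($b > 0$ gives a contradiction as $s \to \infty$)
- if $b < 0$, define $y = a/(-b)$ and maximize the l.h.s. over $(z, s) \in \mathrm{epi}\, f$:
$$f^*(y) - y^T x + f^{**}(x) \leq c/(-b) < 0$$
this contradicts Fenchel's inequality
- if $b = 0$, choose $\hat{y} \in \mathrm{dom}\, f^*$ and add a small multiple of $(\hat{y}, -1)$ to $(a, b)$:
$$\begin{bmatrix} a + \epsilon \hat{y} \\ -\epsilon \end{bmatrix}^T \begin{bmatrix} z - x \\ s - f^{**}(x) \end{bmatrix} \leq c + \epsilon \left( f^*(\hat{y}) - x^T \hat{y} + f^{**}(x) \right) < 0$$
now apply the argument for $b < 0$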

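15 Conjugates and subgradients
if $f$ is closed and convex, then
$$y \in \partial f(x) \quad \Longleftrightarrow \quad x \in \partial f^*(y) \quad \Longleftrightarrow \quad x^T y = f(x) + f^*(y)$$
proof: if $y \in \partial f(x)$, then $f^*(y) = \sup_u \left( y^T u - f(u) \right) = y^T x - f(x)$, and
$$\begin{aligned} f^*(v) &= \sup_u \left( v^T u - f(u) \right) \\ &\geq v^T x - f(x) \\ &= x^T (v - y) - f(x) + y^T x \\ &= f^*(y) + x^T (v - y) \end{aligned}$$
for all $v$; therefore, $x$ is a subgradient of $f^*$ at $y$ ($x \in \partial f^*(y)$)
the reverse implication $x \in \partial f^*(y) \Longrightarrow y \in \partial f(x)$ follows from $f^{**} = f$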

16 Some calculus rules
separable sum
$$f(x_1, x_2) = g(x_1) + h(x_2), \qquad f^*(y_1, y_2) = g^*(y_1) + h^*(y_2)$$
scalar multiplication (for $\alpha > 0$)
$$f(x) = \alpha g(x), \qquad f^*(y) = \alpha g^*(y/\alpha)$$
addition to affine function
$$f(x) = g(x) + a^T x + b, \qquad f^*(y) = g^*(y - a) - b$$
infimal convolution
$$f(x) = \inf_{u + v = x} \left( g(u) + h(v) \right), \qquad f^*(y) = g^*(y) + h^*(y)$$

17 Outline
1. Closed function
2. Conjugate function
3. Proximal mapping

18 Proximal mapping
$$\mathrm{prox}_f(x) = \operatorname*{argmin}_u \left( f(u) + \tfrac{1}{2} \|u - x\|_2^2 \right)$$
if $f$ is closed and convex then $\mathrm{prox}_f(x)$ exists and is unique for all $x$
- existence: $f(u) + (1/2)\|u - x\|_2^2$ is closed with bounded sublevel sets
- uniqueness: $f(u) + (1/2)\|u - x\|_2^2$ is strictly (in fact, strongly) convex
subgradient characterization
$$u = \mathrm{prox}_f(x) \quad \Longleftrightarrow \quad x - u \in \partial f(u)$$
we are interested in functions $f$ for which $\mathrm{prox}_{tf}$ is inexpensive
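To make the definition concrete, here is a minimal sketch that approximates the prox-operator of a scalar convex function by direct minimization. It relies on the strict convexity noted above, which makes ternary search valid. The name `prox_numeric`, the search interval, and the iteration count are illustrative assumptions; it uses the equivalent scaling prox_{tf}(x) = argmin_u f(u) + (1/(2t))(u - x)^2.

```python
def prox_numeric(f, x, t=1.0, lo=-100.0, hi=100.0, iters=200):
    """Approximate prox_{tf}(x) for a scalar convex f by ternary search.

    Assumes the minimizer lies in [lo, hi]. The objective
    f(u) + (u - x)^2 / (2t) is strictly convex, hence unimodal.
    """
    def objective(u):
        return f(u) + (u - x) ** 2 / (2.0 * t)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2  # the minimizer cannot lie to the right of m2
        else:
            lo = m1  # the minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)
```

With f = abs the result can be compared against the subgradient characterization: for x = 3 the minimizer is u = 2 and x - u = 1 is the derivative of |u| at u = 2; for x = 0.5 the minimizer is u = 0 and x - u = 0.5 lies in the subdifferential [-1, 1] of |u| at 0.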

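19 Examples
quadratic function ($A \succeq 0$)
$$f(x) = \tfrac{1}{2} x^T A x + b^T x + c, \qquad \mathrm{prox}_{tf}(x) = (I + tA)^{-1} (x - tb)$$
Euclidean norm: $f(x) = \|x\|_2$
$$\mathrm{prox}_{tf}(x) = \begin{cases} (1 - t/\|x\|_2)\, x, & \|x\|_2 \geq t \\ 0, & \text{otherwise} \end{cases}$$
logarithmic barrier
$$f(x) = -\sum_{i=1}^n \log x_i, \qquad \mathrm{prox}_{tf}(x)_i = \frac{x_i + \sqrt{x_i^2 + 4t}}{2}, \quad i = 1, \ldots, n$$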
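The three closed-form prox-operators on this slide can be sketched in a few lines. To stay within the standard library, the quadratic case below is restricted to a scalar variable (an assumption; the matrix case needs a linear solve). All function names are illustrative, and t > 0 is assumed throughout.

```python
import math


def prox_quadratic_scalar(x, t, a, b):
    """prox_{tf}(x) for scalar f(x) = (1/2) a x^2 + b x + c with a >= 0."""
    return (x - t * b) / (1.0 + t * a)


def prox_euclidean_norm(x, t):
    """prox_{tf}(x) for f(x) = ||x||_2: shrink the vector toward the origin."""
    nrm = math.sqrt(sum(v * v for v in x))
    if nrm >= t and nrm > 0.0:
        scale = 1.0 - t / nrm
        return [scale * v for v in x]
    return [0.0] * len(x)


def prox_log_barrier(x, t):
    """prox_{tf}(x) for f(x) = -sum_i log x_i, applied componentwise."""
    return [(v + math.sqrt(v * v + 4.0 * t)) / 2.0 for v in x]
```

Each formula can be checked against the optimality condition x - u in t * (subdifferential of f at u). For the barrier, for instance, u - x - t/u = 0 at u = prox_log_barrier([x], t)[0].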

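20 Some simple calculus rules
separable sum
$$f\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = g(x) + h(y), \qquad \mathrm{prox}_f\left( \begin{bmatrix} x \\ y \end{bmatrix} \right) = \begin{bmatrix} \mathrm{prox}_g(x) \\ \mathrm{prox}_h(y) \end{bmatrix}$$
scaling and translation of argument: with $\lambda \neq 0$
$$f(x) = g(\lambda x + a), \qquad \mathrm{prox}_f(x) = \frac{1}{\lambda} \left( \mathrm{prox}_{\lambda^2 g}(\lambda x + a) - a \right)$$
"right" scalar multiplication: with $\lambda > 0$
$$f(x) = \lambda g(x/\lambda), \qquad \mathrm{prox}_f(x) = \lambda\, \mathrm{prox}_{\lambda^{-1} g}(x/\lambda)$$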

21 Addition to linear or quadratic function
linear function
$$f(x) = g(x) + a^T x, \qquad \mathrm{prox}_f(x) = \mathrm{prox}_g(x - a)$$
quadratic function: with $u > 0$
$$f(x) = g(x) + \frac{u}{2} \|x - a\|_2^2, \qquad \mathrm{prox}_f(x) = \mathrm{prox}_{\theta g}(\theta x + (1 - \theta) a),$$
where $\theta = 1/(1 + u)$

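22 Moreau decomposition
$$x = \mathrm{prox}_f(x) + \mathrm{prox}_{f^*}(x) \quad \forall x$$
follows from properties of conjugates and subgradients:
$$\begin{aligned} u = \mathrm{prox}_f(x) \quad &\Longleftrightarrow \quad x - u \in \partial f(u) \\ &\Longleftrightarrow \quad u \in \partial f^*(x - u) \\ &\Longleftrightarrow \quad x - u = \mathrm{prox}_{f^*}(x) \end{aligned}$$
generalizes decomposition by orthogonal projection on subspaces:
$$x = P_L(x) + P_{L^\perp}(x)$$
if $L$ is a subspace and $L^\perp$ its orthogonal complement (this is the Moreau decomposition with $f = I_L$, $f^* = I_{L^\perp}$)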
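A quick numerical illustration of the Moreau decomposition, taking f to be the 1-norm as an example chosen here (not in the slides at this point). Its prox-operator is componentwise soft-thresholding, and its conjugate is the indicator of the unit infinity-norm ball, whose prox-operator is componentwise clipping. The function names are illustrative assumptions.

```python
def soft_threshold(x, t=1.0):
    """prox_{tf}(x) for f = ||.||_1: shrink every component toward zero by t."""
    return [max(abs(v) - t, 0.0) * (1.0 if v > 0 else -1.0) for v in x]


def project_inf_ball(x, radius=1.0):
    """Projection on {y : ||y||_inf <= radius}, i.e. prox of f* for f = ||.||_1."""
    return [min(max(v, -radius), radius) for v in x]


def moreau_sum(x):
    """prox_f(x) + prox_{f*}(x), which should reproduce x."""
    return [p + q for p, q in zip(soft_threshold(x), project_inf_ball(x))]
```

For x = [3.0, -0.4, 0.0, -2.5] the two parts are approximately [2, 0, 0, -1.5] and [1, -0.4, 0, -1], and they add back up to x.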

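23 Extended Moreau decomposition
for $\lambda > 0$
$$x = \mathrm{prox}_{\lambda f}(x) + \lambda\, \mathrm{prox}_{\lambda^{-1} f^*}(x/\lambda) \quad \forall x$$
proof: apply the Moreau decomposition to $\lambda f$
$$\begin{aligned} x &= \mathrm{prox}_{\lambda f}(x) + \mathrm{prox}_{(\lambda f)^*}(x) \\ &= \mathrm{prox}_{\lambda f}(x) + \lambda\, \mathrm{prox}_{\lambda^{-1} f^*}(x/\lambda) \end{aligned}$$
the second line uses $(\lambda f)^*(y) = \lambda f^*(y/\lambda)$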

24 Composition with affine mapping
for general $A$, the prox-operator of
$$f(x) = g(Ax + b)$$
does not follow easily from the prox-operator of $g$
however, if $AA^T = (1/\alpha) I$, we have
$$\mathrm{prox}_f(x) = (I - \alpha A^T A)\, x + \alpha A^T \left( \mathrm{prox}_{\alpha^{-1} g}(Ax + b) - b \right)$$
example: $f(x_1, \ldots, x_m) = g(x_1 + x_2 + \cdots + x_m)$
$$\mathrm{prox}_f(x_1, \ldots, x_m)_i = x_i - \frac{1}{m} \sum_{j=1}^m x_j + \frac{1}{m}\, \mathrm{prox}_{mg}\left( \sum_{j=1}^m x_j \right)$$

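25 proof: $u = \mathrm{prox}_f(x)$ is the solution of the optimization problem, with variables $u$, $y$,
$$\min_{u, y} \;\; g(y) + \tfrac{1}{2} \|u - x\|_2^2 \quad \text{s.t.} \quad Au + b = y$$
eliminate $u$ using the expression
$$u = x + A^T (AA^T)^{-1} (y - b - Ax) = (I - \alpha A^T A)\, x + \alpha A^T (y - b)$$
the optimal $y$ is the minimizer of
$$g(y) + \frac{\alpha^2}{2} \|A^T (y - b - Ax)\|_2^2 = g(y) + \frac{\alpha}{2} \|y - b - Ax\|_2^2$$
the solution is $y = \mathrm{prox}_{\alpha^{-1} g}(Ax + b)$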
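The sum example can be sketched for scalar blocks with g taken to be the absolute value, a choice made here purely for illustration. In that case A is a row of ones, AA^T = m, so alpha = 1/m and prox_{mg} is soft-thresholding with threshold m. The function names are assumptions.

```python
import math


def prox_abs(z, t):
    """prox_{tg}(z) for scalar g = |.| (soft-thresholding)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def prox_abs_of_sum(x):
    """prox_f(x) for f(x) = |x_1 + ... + x_m| with scalar x_i.

    Uses prox_f(x)_i = x_i - s/m + prox_{m g}(s)/m with s = sum(x).
    """
    m = len(x)
    s = sum(x)
    p = prox_abs(s, float(m))
    return [v - s / m + p / m for v in x]
```

For x = [3, 1, 2] this gives u = [2, 0, 1]; then x - u = [1, 1, 1], which is the gradient of |u_1 + u_2 + u_3| at a point with positive sum, as the subgradient characterization requires.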

26 Projection on affine sets
hyperplane: $C = \{x \mid a^T x = b\}$ (with $a \neq 0$)
$$P_C(x) = x + \frac{b - a^T x}{\|a\|_2^2}\, a$$
affine set: $C = \{x \mid Ax = b\}$ (with $A \in \mathbf{R}^{p \times n}$ and $\mathrm{rank}(A) = p$)
$$P_C(x) = x + A^T (AA^T)^{-1} (b - Ax)$$
inexpensive if $p \ll n$, or $AA^T = I$, ...

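27 Projection on simple polyhedral sets
halfspace: $C = \{x \mid a^T x \leq b\}$ (with $a \neq 0$)
$$P_C(x) = \begin{cases} x + \dfrac{b - a^T x}{\|a\|_2^2}\, a, & a^T x > b \\ x, & a^T x \leq b \end{cases}$$
rectangle: $C = [l, u] = \{x \mid l \leq x \leq u\}$
$$P_C(x)_i = \begin{cases} l_i, & x_i \leq l_i \\ x_i, & l_i \leq x_i \leq u_i \\ u_i, & x_i \geq u_i \end{cases}$$
nonnegative orthant: $C = \mathbf{R}^n_+$
$$P_C(x) = x_+$$
($x_+$ is the componentwise maximum of $0$ and $x$)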
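These three projections translate almost directly into code. The following is a minimal list-based sketch; the function names are illustrative assumptions, and the halfspace routine assumes a is nonzero.

```python
def project_halfspace(x, a, b):
    """Projection on {x : a^T x <= b}; assumes a != 0."""
    ax = sum(ai * xi for ai, xi in zip(a, x))
    if ax <= b:
        return list(x)  # already feasible
    step = (b - ax) / sum(ai * ai for ai in a)
    return [xi + step * ai for ai, xi in zip(a, x)]


def project_box(x, l, u):
    """Projection on the rectangle [l, u]: clip each component."""
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, l, u)]


def project_nonneg(x):
    """Projection on the nonnegative orthant: componentwise max(0, x_i)."""
    return [max(xi, 0.0) for xi in x]
```

For example, projecting [2, 2] on the halfspace x_1 + x_2 <= 2 moves the point along -a to [1, 1].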

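28 probability simplex: $C = \{x \mid \mathbf{1}^T x = 1, \; x \geq 0\}$
$$P_C(x) = (x - \lambda \mathbf{1})_+$$
where $\lambda$ is the solution of the equation
$$\mathbf{1}^T (x - \lambda \mathbf{1})_+ = \sum_{k=1}^n \max\{0, x_k - \lambda\} = 1$$
intersection of hyperplane and rectangle: $C = \{x \mid a^T x = b, \; l \leq x \leq u\}$
$$P_C(x) = P_{[l,u]}(x - \lambda a)$$
where $\lambda$ is the solution of
$$a^T P_{[l,u]}(x - \lambda a) = b$$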
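The scalar equation for lambda in the simplex case has a left-hand side that is continuous and nonincreasing in lambda, so one simple way to solve it is bisection. This is a sketch under that choice (sorting-based methods are a common alternative); the function name, tolerance, and iteration cap are assumptions.

```python
def project_simplex(x, tol=1e-12, max_iter=200):
    """Projection on {x : sum(x) = 1, x >= 0} by bisection on lambda."""
    lo = max(x) - 1.0  # here sum_k max(0, x_k - lo) >= 1
    hi = max(x)        # here sum_k max(0, x_k - hi) = 0 < 1
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        total = sum(max(0.0, v - lam) for v in x)
        if total > 1.0:
            lo = lam  # lambda is too small
        else:
            hi = lam  # lambda is large enough
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [max(0.0, v - lam) for v in x]
```

For x = [0.2, 0.9, 0.5] the equation gives lambda = 0.2 and the projection is approximately [0, 0.7, 0.3].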

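29 Projection on norm balls
Euclidean ball: $C = \{x \mid \|x\|_2 \leq 1\}$
$$P_C(x) = \begin{cases} \dfrac{1}{\|x\|_2}\, x, & \|x\|_2 > 1 \\ x, & \|x\|_2 \leq 1 \end{cases}$$
1-norm ball: $C = \{x \mid \|x\|_1 \leq 1\}$
$$P_C(x)_k = \begin{cases} x_k - \lambda, & x_k > \lambda \\ 0, & -\lambda \leq x_k \leq \lambda \\ x_k + \lambda, & x_k < -\lambda \end{cases}$$
$\lambda = 0$ if $\|x\|_1 \leq 1$; otherwise $\lambda$ is the solution of the equation
$$\sum_{k=1}^n \max\{|x_k| - \lambda, 0\} = 1$$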
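Both ball projections can be sketched with the standard library. For the 1-norm ball, lambda is found here by bisection, which is one possible way to solve the scalar equation and is an assumption of this sketch, as are the function names and tolerances.

```python
import math


def project_l2_ball(x):
    """Projection on the unit Euclidean ball."""
    nrm = math.sqrt(sum(v * v for v in x))
    if nrm > 1.0:
        return [v / nrm for v in x]
    return list(x)


def project_l1_ball(x, tol=1e-12, max_iter=200):
    """Projection on the unit 1-norm ball: soft-threshold with level lambda."""
    if sum(abs(v) for v in x) <= 1.0:
        return list(x)  # lambda = 0
    lo, hi = 0.0, max(abs(v) for v in x)
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        total = sum(max(abs(v) - lam, 0.0) for v in x)
        if total > 1.0:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in x]
```

For x = [3, -1] the threshold is lambda = 2, so the second component is set to zero and the projection is approximately [1, 0].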

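30 Projection on simple cones
second order cone: $C = \{(x, t) \in \mathbf{R}^n \times \mathbf{R} \mid \|x\|_2 \leq t\}$
$$P_C(x, t) = (x, t) \;\; \text{if } \|x\|_2 \leq t, \qquad P_C(x, t) = (0, 0) \;\; \text{if } \|x\|_2 \leq -t$$
and
$$P_C(x, t) = \frac{t + \|x\|_2}{2 \|x\|_2} \begin{bmatrix} x \\ \|x\|_2 \end{bmatrix} \quad \text{if } -\|x\|_2 < t < \|x\|_2, \; x \neq 0$$
positive semidefinite cone: $C = \mathbf{S}^n_+$
$$P_C(X) = \sum_{i=1}^n \max\{0, \lambda_i\}\, q_i q_i^T$$
if $X = \sum_{i=1}^n \lambda_i q_i q_i^T$ is the eigenvalue decomposition of $X$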
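The three cases of the second-order cone projection map onto a short function. This is a minimal sketch with an assumed name and return convention (a pair consisting of the vector part and the scalar part); the semidefinite cone is left out because it needs an eigenvalue decomposition from a numerical library.

```python
import math


def project_soc(x, t):
    """Projection of (x, t) on the second-order cone {(x, t) : ||x||_2 <= t}.

    Returns the pair (projected vector as a list, projected scalar).
    """
    nrm = math.sqrt(sum(v * v for v in x))
    if nrm <= t:
        return list(x), t            # already in the cone
    if nrm <= -t:
        return [0.0] * len(x), 0.0   # in the polar cone: project to the origin
    # Remaining case: -nrm < t < nrm, hence nrm > 0.
    scale = (t + nrm) / (2.0 * nrm)
    return [scale * v for v in x], scale * nrm
```

In the third case the projected point lies on the boundary of the cone: the norm of the vector part equals the scalar part.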

31 Support function
the conjugate of the support function of a closed convex set is the indicator function
$$f(x) = S_C(x) = \sup_{y \in C} x^T y, \qquad f^*(y) = I_C(y)$$
prox-operator of support function: apply the Moreau decomposition
$$\mathrm{prox}_{tf}(x) = x - t\, \mathrm{prox}_{t^{-1} f^*}(x/t) = x - t P_C(x/t)$$
example: $f(x)$ is the sum of the largest $r$ components of $x$
$$f(x) = x_{[1]} + \cdots + x_{[r]} = S_C(x), \qquad C = \{y \mid 0 \leq y \leq 1, \; \mathbf{1}^T y = r\}$$
the prox-operator of $f$ is easily evaluated via projection on $C$
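For the sum-of-largest-components example, a possible sketch is to project on C by clipping a shifted vector to [0, 1] and finding the shift by bisection (the hyperplane-and-rectangle form of projection with a = 1, l = 0, u = 1, b = r), then apply the Moreau formula. The function names, tolerance, and the use of bisection are assumptions of this sketch; it assumes t > 0 and an integer r between 1 and the length of x.

```python
def project_capped_simplex(z, r, tol=1e-12, max_iter=200):
    """Projection on {y : 0 <= y <= 1, sum(y) = r} by bisection on the shift."""
    lo = min(z) - 1.0  # every clipped entry equals 1, so the sum is n >= r
    hi = max(z)        # every clipped entry equals 0, so the sum is 0 <= r
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        total = sum(min(max(v - lam, 0.0), 1.0) for v in z)
        if total > r:
            lo = lam
        else:
            hi = lam
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [min(max(v - lam, 0.0), 1.0) for v in z]


def prox_sum_largest(x, t, r):
    """prox_{tf}(x) for f(x) = sum of the r largest entries of x.

    Uses prox_{tf}(x) = x - t * P_C(x / t).
    """
    p = project_capped_simplex([v / t for v in x], r)
    return [v - t * pi for v, pi in zip(x, p)]
```

Two easy checks: with r equal to the length of x, f is the plain sum and the prox-operator subtracts t from every entry; with r = 1, f is the maximum, and for x = [3, 1], t = 1 only the largest entry is reduced, giving approximately [2, 1].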

32 Norms
the conjugate of a norm is the indicator function of the dual norm ball:
$$f(x) = \|x\|, \qquad f^*(y) = I_B(y) \quad (B = \{y \mid \|y\|_* \leq 1\})$$
prox-operator of norm: apply the Moreau decomposition
$$\mathrm{prox}_{tf}(x) = x - t\, \mathrm{prox}_{t^{-1} f^*}(x/t) = x - t P_B(x/t) = x - P_{tB}(x)$$
useful formula for $\mathrm{prox}_{tf}$ when projection on $tB = \{x \mid \|x\|_* \leq t\}$ is cheap
examples: $\|\cdot\|_1$, $\|\cdot\|_2$

33 Distance to a point
distance (in general norm)
$$f(x) = \|x - a\|$$
prox-operator: from page 20, with $g(x) = \|x\|$
$$\mathrm{prox}_{tf}(x) = a + \mathrm{prox}_{tg}(x - a) = a + x - a - t P_B\left( \frac{x - a}{t} \right) = x - P_{tB}(x - a)$$
$B$ is the unit ball for the dual norm

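34 Euclidean distance to a set
Euclidean distance (to a closed convex set $C$)
$$d(x) = \inf_{y \in C} \|x - y\|_2$$
prox-operator of distance
$$\mathrm{prox}_{td}(x) = \theta P_C(x) + (1 - \theta)\, x, \qquad \theta = \begin{cases} t/d(x), & d(x) \geq t \\ 1, & \text{otherwise} \end{cases}$$
prox-operator of squared distance: $f(x) = d(x)^2/2$
$$\mathrm{prox}_{tf}(x) = \frac{1}{1 + t}\, x + \frac{t}{1 + t}\, P_C(x)$$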

35 proof (expression for $\mathrm{prox}_{td}(x)$)
- if $u = \mathrm{prox}_{td}(x) \notin C$, then from the definition and the subgradient for $d$
$$x - u = \frac{t}{d(u)} \left( u - P_C(u) \right)$$
this implies $P_C(u) = P_C(x)$, $d(x) \geq t$, and $u$ is a convex combination of $x$ and $P_C(x)$
- if $u \in C$ minimizes $d(u) + (1/(2t)) \|u - x\|_2^2$, then $u = P_C(x)$

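36 proof (expression for $\mathrm{prox}_{tf}(x)$ when $f(x) = d(x)^2/2$)
$$\begin{aligned} \mathrm{prox}_{tf}(x) &= \operatorname*{argmin}_u \left( \frac{1}{2} d(u)^2 + \frac{1}{2t} \|u - x\|_2^2 \right) \\ &= \operatorname*{argmin}_u \inf_{v \in C} \left( \frac{1}{2} \|u - v\|_2^2 + \frac{1}{2t} \|u - x\|_2^2 \right) \end{aligned}$$
the optimal $u$ as a function of $v$ is
$$u = \frac{t}{t + 1}\, v + \frac{1}{t + 1}\, x$$
the optimal $v$ minimizes
$$\frac{1}{2} \left\| \frac{t}{t + 1} v + \frac{1}{t + 1} x - v \right\|_2^2 + \frac{1}{2t} \left\| \frac{t}{t + 1} v + \frac{1}{t + 1} x - x \right\|_2^2 = \frac{1}{2(1 + t)} \|v - x\|_2^2$$
over $C$, i.e., $v = P_C(x)$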
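The two prox formulas for the Euclidean distance and the half squared distance only need a projection routine for C. The sketch below takes the projection as a callable and uses the unit box as an example set; the function names and the choice of set are illustrative assumptions, and t > 0 is assumed.

```python
import math


def project_unit_box(x):
    """Example projection P_C for C = [0, 1]^n (assumed here for illustration)."""
    return [min(max(v, 0.0), 1.0) for v in x]


def prox_distance(x, t, project):
    """prox_{td}(x) for d(x) = dist(x, C): move toward P_C(x) by at most t."""
    p = project(x)
    d = math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))
    theta = t / d if (d >= t and d > 0.0) else 1.0
    return [theta * pi + (1.0 - theta) * xi for xi, pi in zip(x, p)]


def prox_half_squared_distance(x, t, project):
    """prox_{tf}(x) for f(x) = d(x)^2 / 2: fixed convex combination of x and P_C(x)."""
    p = project(x)
    return [(xi + t * pi) / (1.0 + t) for xi, pi in zip(x, p)]
```

With x = [3, 0.5], the projection on the unit box is [1, 0.5] and d(x) = 2. For t = 1 the distance prox moves one unit toward the set, giving [2, 0.5]; for t = 5 (larger than d(x)) it lands on the projection itself.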

