Lecture Notes, HCI, 3.11.2011
Chapter 6: Convex Optimization. Conjugate, Subdifferential, Proximation
Bastian Goldlücke, Computer Vision Group, Technical University of Munich
Overview

1. Conjugate functionals; Subdifferential calculus
2. Moreau's theorem; Fixed points; Subgradient descent
3. Summary
Affine functions

In this tutorial, we interpret elements of the dual space in a very geometric way, as slopes of affine functions.

Definition. Let ϕ ∈ V* and c ∈ R. Then an affine function on V is given by

    h_{ϕ,c} : v ↦ ⟨v, ϕ⟩ − c.

We call ϕ the slope and c the intercept of h_{ϕ,c}.

(Figure: the graph of h_{ϕ,c} over V, a hyperplane with slope ϕ which intersects the R axis at −c.)
The largest affine function below f

We would like to find the largest affine function with a given slope ϕ below f. For this, consider for each x ∈ V the affine function with slope ϕ which passes through (x, f(x)):

    h_{ϕ,c}(x) = f(x)  ⟺  ⟨x, ϕ⟩ − c = f(x)  ⟺  c = ⟨x, ϕ⟩ − f(x).

(Figure: the affine function h_{ϕ, ⟨x,ϕ⟩−f(x)} touching the point (x, f(x)) on the boundary of epi(f).)

To get the largest affine function below f, we have to pass to the supremum over x. The intercept of this function is called the conjugate functional of f.
Conjugate functionals

Definition. Let f ∈ conv(V). Then the conjugate functional f* : V* → R ∪ {+∞} is defined as

    f*(ϕ) := sup_{x ∈ V} [⟨x, ϕ⟩ − f(x)].

An immediate consequence of the definition is

Fenchel's inequality. Let f ∈ conv(V). Then for all x ∈ V and ϕ ∈ V*,

    ⟨x, ϕ⟩ ≤ f(x) + f*(ϕ).

Equality holds if and only if ϕ belongs to the subdifferential ∂f(x).
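As a quick numerical sanity check (my own sketch, not part of the lecture), the conjugate on V = R can be approximated by taking the supremum over a finite grid. For f(x) = x²/2 the conjugate is again f*(ϕ) = ϕ²/2, and Fenchel's inequality holds at every sample point. All function names here are ad hoc.

```python
# Illustration only: grid approximation of the conjugate on V = R.

def conjugate(f, phi, grid):
    """Approximate f*(phi) = sup_x [x*phi - f(x)] by a maximum over a grid."""
    return max(x * phi - f(x) for x in grid)

f = lambda x: 0.5 * x * x                        # f(x) = x^2/2 is self-conjugate
grid = [i / 100.0 for i in range(-500, 501)]     # sample points in [-5, 5]

for phi in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    fstar = conjugate(f, phi, grid)
    assert abs(fstar - 0.5 * phi * phi) < 1e-3   # exact conjugate is phi^2/2
    for x in [-1.0, 0.0, 2.0]:                   # Fenchel: <x,phi> <= f(x) + f*(phi)
        assert x * phi <= f(x) + fstar + 1e-12
```

The grid supremum is of course only an approximation of the true supremum over V; it is accurate here because the maximizer x = ϕ lies inside the sampled interval.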
Geometric interpretation of the conjugate

(Figure: the affine function h_{ϕ, f*(ϕ)} with slope ϕ supports epi(f) from below; its intercept on the R axis is −f*(ϕ).)
Example: conjugate of an indicator function

Let K ⊂ V be convex, and δ_K be its indicator function. Then

    δ_K*(ϕ) = sup_{x ∈ V} [⟨x, ϕ⟩ − δ_K(x)]
            = sup_{x ∈ K} ⟨x, ϕ⟩
            = σ_K(ϕ),

i.e. the conjugate of an indicator function is the support functional.
Properties of the conjugate functional

The conjugate f* has the following remarkable property. The proof is not difficult (exercise).

Theorem. Let f ∈ conv(V). Then f* is closed and convex.

This will ultimately lead to a scenario similar to the one we had for the minimum norm problem: the dual of a convex optimization problem always attains its extremum, even if the primal problem does not.
The epigraph of f*

By definition, we have

    (ϕ, t) ∈ epi(f*)  ⟺  t ≥ sup_{x ∈ V} [⟨x, ϕ⟩ − f(x)].

If we define for each ϕ ∈ V* and t ∈ R the affine functional h_{ϕ,t}(x) = ⟨x, ϕ⟩ − t, then the epigraph of f* can be written as

    epi(f*) = {(ϕ, t) ∈ V* × R : f ≥ h_{ϕ,t}}.

In other words, the epigraph of f* consists of all pairs (ϕ, t) such that the affine function h_{ϕ,t} lies below f. This insight will yield the interesting relationship f** = f for closed convex functionals.
Second conjugate

The epigraph of f* consists of all pairs (ϕ, c) such that h_{ϕ,c} lies below f. It almost completely characterizes f; the reason for the "almost" is that you can recover f only up to closure.

Theorem. Let f ∈ conv(V) be closed and V be reflexive, i.e. V** = V. Then f** = f.

For the proof, note that

    f(x) = sup_{h_{ϕ,c} ≤ f} h_{ϕ,c}(x) = sup_{(ϕ,c) ∈ epi(f*)} h_{ϕ,c}(x)
         = sup_{ϕ ∈ V*} [⟨x, ϕ⟩ − f*(ϕ)] = f**(x),

where the third equality holds because the supremum over all c ≥ f*(ϕ) is attained at c = f*(ϕ). The first equality is intuitive, but surprisingly difficult to show: it is a consequence of the theorem of Hahn-Banach applied to the epigraph of f.
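The identity f** = f can also be observed numerically (my own illustration, with ad hoc names): for the closed convex function f(x) = |x| on V = R, a brute-force grid computation of the second conjugate reproduces f at the sample points. The conjugate f* is the indicator of [−1, 1], which the grid approximation only captures approximately outside the sampled interval, so the check is restricted to moderate arguments.

```python
# Illustration only: f** = f for the closed convex f(x) = |x| on V = R,
# with both suprema approximated by maxima over finite grids.

def sup_grid(g, grid):
    return max(g(t) for t in grid)

f = lambda x: abs(x)
X = [i / 100.0 for i in range(-500, 501)]    # grid for x in [-5, 5]
P = [i / 100.0 for i in range(-300, 301)]    # grid for phi in [-3, 3]

def fstar(phi):
    # f*(phi) = sup_x [x*phi - |x|]; exactly 0 on [-1, 1], +inf outside
    return sup_grid(lambda x: x * phi - f(x), X)

def fstarstar(x):
    # f**(x) = sup_phi [x*phi - f*(phi)]
    return sup_grid(lambda phi: x * phi - fstar(phi), P)

for x in [-2.0, -0.3, 0.0, 1.5]:
    assert abs(fstarstar(x) - f(x)) < 1e-2   # f** agrees with f
```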
Example: second conjugate of an indicator function

Directly from the definition, we get the following

Proposition. Let K ⊂ V be convex and δ_K its indicator function. Then the support functional of K is the conjugate of δ_K, i.e. σ_K = δ_K*. In addition, if K is closed, then δ_K** = δ_K, i.e. σ_K* = δ_K.

The latter is correct because obviously, if K is closed, then so is K × [0, ∞) = epi(δ_K).
The conjugate of J

Let K ⊂ L²(Ω) be the following closed convex set:

    K = cl{div(ξ) : ξ ∈ C¹_c(Ω, Rⁿ), ‖ξ‖_∞ ≤ 1}.

Note that the space L²(Ω) is a Hilbert space, thus K is also a subset of its dual space.

Proposition. For every u, v ∈ L²(Ω),

    J(u) := σ_K(u) = sup_{v ∈ K} ⟨u, v⟩ = δ_K*(u),

    J*(v) = σ_K*(v) = δ_K**(v) = δ_K(v) = { 0 if v ∈ K, +∞ otherwise.
The subdifferential

Definition. Let f ∈ conv(V). A vector ϕ ∈ V* is called a subgradient of f at x ∈ V if

    f(y) ≥ f(x) + ⟨y − x, ϕ⟩ for all y ∈ V.

The set of all subgradients of f at x is called the subdifferential ∂f(x).

Geometrically speaking, ϕ is a subgradient if the graph of the affine function h(y) = f(x) + ⟨y − x, ϕ⟩ lies below the epigraph of f. Note that also h(x) = f(x), so it touches the epigraph.
Example: the subdifferential of f : x ↦ |x| at x = 0 is ∂f(0) = [−1, 1].
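This example can be checked directly from the subgradient inequality (my own numerical sketch, not part of the lecture): ϕ is a subgradient of |x| at 0 exactly when |y| ≥ ϕ·y for all y, i.e. when ϕ ∈ [−1, 1].

```python
# Illustration only: verify the subgradient inequality for f(x) = |x| at x = 0
# on a grid of test points y.

f = lambda x: abs(x)
Y = [i / 50.0 for i in range(-100, 101)]     # test points y in [-2, 2]

def is_subgradient(phi, x=0.0):
    """Check f(y) >= f(x) + <y - x, phi> for all sampled y."""
    return all(f(y) >= f(x) + (y - x) * phi - 1e-12 for y in Y)

# Every phi in [-1, 1] is a subgradient at 0 ...
assert all(is_subgradient(phi) for phi in [-1.0, -0.5, 0.0, 0.7, 1.0])
# ... and anything with |phi| > 1 is not.
assert not is_subgradient(1.2) and not is_subgradient(-1.1)
```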
Subdifferential and derivatives

The subdifferential is a generalization of the Fréchet derivative (or the gradient in finite dimensions), in the following sense.

Theorem (subdifferential and Fréchet derivative). Let f ∈ conv(V) be Fréchet differentiable at x ∈ V. Then ∂f(x) = {df(x)}.

The proof of the theorem is surprisingly involved: it requires relating the subdifferential to one-sided directional derivatives. We will not explore these relationships in this lecture.
Relationship between subgradient and conjugate

(Figure: the affine function h_{ϕ, f*(ϕ)} touches the graph of f at x exactly when ϕ ∈ ∂f(x).)

Here, we can see the equivalence

    ϕ ∈ ∂f(x)  ⟺  h_{ϕ, f*(ϕ)}(y) = f(x) + ⟨y − x, ϕ⟩  ⟺  f*(ϕ) = ⟨x, ϕ⟩ − f(x).
The subdifferential and duality

The previously seen relationship between subgradients and the conjugate functional can be summarized in the following theorem.

Theorem. Let f ∈ conv(V) and x ∈ V. Then the following conditions on a vector ϕ ∈ V* are equivalent:
- ϕ ∈ ∂f(x).
- x ∈ argmax_{y ∈ V} [⟨y, ϕ⟩ − f(y)].
- f(x) + f*(ϕ) = ⟨x, ϕ⟩.

If, furthermore, f is closed, then more conditions can be added to this list:
- x ∈ ∂f*(ϕ).
- ϕ ∈ argmax_{ψ ∈ V*} [⟨x, ψ⟩ − f*(ψ)].
Formal proof of the theorem

The equivalences are easy to see. Rewriting the subgradient definition, one sees that ϕ ∈ ∂f(x) means

    ⟨x, ϕ⟩ − f(x) ≥ ⟨y, ϕ⟩ − f(y) for all y ∈ V.

This implies the first equivalence. Since the supremum over all y ∈ V on the right-hand side is f*(ϕ), we get the second equivalence together with Fenchel's inequality. If f is closed, then f** = f, thus we also get f**(x) + f*(ϕ) = ⟨x, ϕ⟩. This is equivalent to the last two conditions by applying the same arguments as above to the conjugate functional.
Variational principle for convex functionals

As a corollary of the previous theorem, we obtain a generalized variational principle for convex functionals. It is a necessary and sufficient condition for the (global) extremum.

Corollary (variational principle for convex functionals). Let f ∈ conv(V). Then x̂ is a global minimum of f if and only if 0 ∈ ∂f(x̂). Furthermore, if f is closed, then x̂ is a global minimum if and only if x̂ ∈ ∂f*(0), i.e. minimizing a functional is the same as computing the subdifferential of the conjugate functional at 0.

To see this, just set ϕ = 0 in the previous theorem.
Moreau's theorem

For the remainder of the lecture, we will assume that the underlying space is a Hilbert space H, for example L²(Ω).

Theorem (geometric Moreau). Let f be convex and closed on the Hilbert space H, which we identify with its dual. Then for every z ∈ H there is a unique decomposition

    z = x̂ + ϕ with ϕ ∈ ∂f(x̂),

and the unique x̂ in this decomposition can be computed with the proximation

    prox_f(z) := argmin_{x ∈ H} { (1/2) ‖x − z‖²_H + f(x) }.

This is a corollary to Theorem 31.5 in Rockafellar, page 339. The actual theorem has somewhat more content, but is very technical and quite hard to digest; the above is the essential consequence.
Proof of Moreau's theorem

The correctness of the theorem is not too hard to see: if x̂ = prox_f(z), then

    x̂ ∈ argmin_{x ∈ H} { (1/2) ‖x − z‖²_H + f(x) }
    ⟺  0 ∈ x̂ − z + ∂f(x̂)
    ⟺  z ∈ x̂ + ∂f(x̂).

Existence and uniqueness of the proximation follow because the functional is closed, strictly convex and coercive.
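The proximation can be explored concretely (my own illustration, not from the lecture): for f(x) = λ|x| on H = R, the proximation has the well-known closed form of soft thresholding, which a brute-force grid argmin of (1/2)(x − z)² + f(x) confirms.

```python
# Illustration only: prox of f(x) = lam*|x| on H = R is soft thresholding.

def prox(f, z, grid):
    """Grid approximation of prox_f(z) = argmin_x [0.5*(x - z)**2 + f(x)]."""
    return min(grid, key=lambda x: 0.5 * (x - z) ** 2 + f(x))

lam = 0.5
f = lambda x: lam * abs(x)
grid = [i / 1000.0 for i in range(-3000, 3001)]   # grid on [-3, 3]

def soft(z, lam):
    """Closed-form prox of lam*|.|: shrink |z| by lam, clip at 0."""
    return max(abs(z) - lam, 0.0) * (1 if z > 0 else -1 if z < 0 else 0)

for z in [-2.0, -0.3, 0.0, 0.4, 1.7]:
    assert abs(prox(f, z, grid) - soft(z, lam)) < 1e-2
```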
The geometry of the graph of ∂f

We will see in a moment that prox_f is continuous. In particular, the map

    z ↦ (prox_f(z), z − prox_f(z))

is a continuous map from H into the graph of ∂f,

    graph(∂f) := {(x, ϕ) : x ∈ H, ϕ ∈ ∂f(x)} ⊂ H × H,

with continuous inverse (x, ϕ) ↦ x + ϕ. The theorem of Moreau now says that this map is one-to-one. In particular, H ≅ graph(∂f), i.e. the sets are homeomorphic; in particular, graph(∂f) is always connected.

Another corollary of Moreau's theorem is that

    z = prox_f(z) + prox_{f*}(z).
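The decomposition z = prox_f(z) + prox_{f*}(z) can be checked in a concrete case (my own sketch, not from the lecture): for f = λ|·| on H = R, the conjugate f* is the indicator of [−λ, λ], whose proximation is simply the projection onto that interval, so soft thresholding and the projection must sum to z.

```python
# Illustration only: Moreau decomposition for f(x) = lam*|x| on H = R.

lam = 0.8

def prox_f(z):
    """Prox of f = lam*|.|: soft thresholding."""
    return max(abs(z) - lam, 0.0) * (1 if z > 0 else -1 if z < 0 else 0)

def prox_fstar(z):
    """f* is the indicator of [-lam, lam]; its prox is the projection onto it."""
    return max(-lam, min(lam, z))

for z in [-3.0, -0.5, 0.0, 0.2, 2.4]:
    assert abs(prox_f(z) + prox_fstar(z) - z) < 1e-12
```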
The proximation operator is continuous

Proposition. Let f be a convex and closed functional on the Hilbert space H. Then prox_f is Lipschitz with constant 1, i.e. for all z₀, z₁ ∈ H,

    ‖prox_f(z₀) − prox_f(z₁)‖_H ≤ ‖z₀ − z₁‖_H.

We will prove this in an exercise.
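The 1-Lipschitz property (nonexpansiveness) is easy to observe numerically (my own sketch, not from the lecture), again using soft thresholding as the proximation of λ|·| on H = R:

```python
# Illustration only: prox of lam*|.| is nonexpansive on sampled pairs.
import itertools

lam = 0.6

def prox_f(z):
    """Soft thresholding, the prox of lam*|.| on R."""
    return max(abs(z) - lam, 0.0) * (1 if z > 0 else -1 if z < 0 else 0)

Z = [i / 10.0 for i in range(-30, 31)]       # sample points in [-3, 3]
for z0, z1 in itertools.product(Z, Z):
    assert abs(prox_f(z0) - prox_f(z1)) <= abs(z0 - z1) + 1e-12
```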
Fixed points of the proximation operator

Proposition. Let f be closed and convex on the Hilbert space H, and let ẑ be a fixed point of the proximation operator prox_f, i.e. ẑ = prox_f(ẑ). Then ẑ is a minimizer of f. In particular, it also follows that ẑ ∈ (I − prox_f)⁻¹(0).

To prove this, just note that by Moreau's theorem, if ẑ is a fixed point,

    ẑ ∈ prox_f(ẑ) + ∂f(prox_f(ẑ)) = ẑ + ∂f(ẑ),

hence 0 ∈ ∂f(ẑ).
Subgradient descent

Let λ > 0, z ∈ H and x = prox_{λf}(z). Then

    z ∈ x + λ ∂f(x)  ⟺  x ∈ z − λ ∂f(x).

In particular, we have the following interesting observation: the proximation operator prox_{λf} computes an implicit subgradient descent step of step size λ for the functional f. "Implicit" here means that the subgradient is not evaluated at the original, but at the new location; this improves the stability of the descent.

Note that if subgradient descent converges, then it converges to a fixed point ẑ of prox_{λf}; in particular, ẑ is a minimizer of the functional f.
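Iterating the proximation operator (the proximal point method) therefore minimizes f. A minimal sketch (my own, not from the lecture, with ad hoc names) on H = R for f(x) = |x − 1|, whose proximation is a shifted soft threshold: each implicit step moves by at most λ toward the minimizer x = 1, which is reached exactly and is then a fixed point.

```python
# Illustration only: implicit subgradient descent z <- prox_{lam*f}(z)
# for f(x) = |x - 1| on H = R, converging to the minimizer x = 1.

lam = 0.5

def f(x):
    return abs(x - 1.0)                       # unique minimizer at x = 1

def prox_lam_f(z):
    """Prox of lam*|. - 1|: soft thresholding applied to z - 1, shifted back."""
    d = z - 1.0
    return 1.0 + max(abs(d) - lam, 0.0) * (1 if d > 0 else -1 if d < 0 else 0)

z = 5.0
for _ in range(20):                           # implicit descent iterations
    z = prox_lam_f(z)

assert abs(z - 1.0) < 1e-12                   # reached the minimizer ...
assert abs(prox_lam_f(z) - z) < 1e-12         # ... which is a fixed point
```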
Summary

Convex optimization deals with finding minima of convex functionals, which may be non-differentiable.

The generalization of the variational principle to a convex functional is the condition that at a minimum, zero must be an element of the subdifferential.

Efficient optimization methods rely heavily on the concepts of duality and the conjugate functional. We will see that they allow us to transform convex minimization problems into saddle point problems, which are sometimes easier to handle.

Implicit subgradient descent for convex functionals can be computed by evaluating the proximation operator, which means solving another minimization problem.
References

Boyd and Vandenberghe, Convex Optimization, Cambridge University Press 2004. Excellent recent introduction to convex optimization. Reads very well, available online for free.

Rockafellar, Convex Analysis, Princeton University Press 1970. Classical introduction to convex analysis and optimization. Somewhat technical and not too easy to read, but very exhaustive.