Convex Optimization: Conjugate, Subdifferential, Proximation

1 Lecture Notes, HCI, 3.11.2011. Chapter 6, Convex Optimization: Conjugate, Subdifferential, Proximation. Bastian Goldlücke, Computer Vision Group, Technical University of Munich.

2 Overview: 1. Conjugate functionals; Subdifferential calculus. 2. Moreau's theorem; Fixed points; Subgradient descent. 3. Summary.

4 Affine functions. In this tutorial, we interpret elements of the dual space in a very geometric way, as the slopes of affine functions. Definition: Let $\varphi \in V^*$ and $c \in \mathbb{R}$. Then an affine function on $V$ is given by $h_{\varphi,c} : v \mapsto \langle v, \varphi \rangle - c$. We call $\varphi$ the slope and $c$ the intercept of $h_{\varphi,c}$. [Figure: the graph of $h_{\varphi,c}$ over $V$, with intercept $c$ marked on the $\mathbb{R}$-axis and normal direction $[\varphi, -1]$.]

5 Affine functions. We would like to find the largest affine function below $f$. For this, consider for each $x \in V$ the affine function of slope $\varphi$ which passes through $(x, f(x))$:
$$h_{\varphi,c}(x) = f(x) \iff \langle x, \varphi \rangle - c = f(x) \iff c = \langle x, \varphi \rangle - f(x).$$
[Figure: $\operatorname{epi}(f)$ and the affine function $h_{\varphi,\, \langle x, \varphi\rangle - f(x)}$ touching the graph of $f$ at $(x, f(x))$.] To get the largest affine function below $f$, we have to pass to the supremum over $x$. The intercept of this function is called the conjugate functional of $f$.

6 Conjugate functionals. Definition: Let $f \in \operatorname{conv}(V)$. Then the conjugate functional $f^* : V^* \to \mathbb{R} \cup \{\infty\}$ is defined as
$$f^*(\varphi) := \sup_{x \in V}\, [\langle x, \varphi \rangle - f(x)].$$
An immediate consequence of the definition is Fenchel's inequality: Let $f \in \operatorname{conv}(V)$. Then for all $x \in V$ and $\varphi \in V^*$,
$$\langle x, \varphi \rangle \le f(x) + f^*(\varphi).$$
Equality holds if and only if $\varphi$ belongs to the subdifferential $\partial f(x)$.
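A standard worked example (added here for illustration, not on the original slide): the functional $f(x) = \frac{1}{2}\|x\|_H^2$ on a Hilbert space $H$ is its own conjugate. The supremand $\langle x, \varphi \rangle - \frac{1}{2}\|x\|^2$ is concave in $x$ and maximized at $x = \varphi$, so
$$f^*(\varphi) = \langle \varphi, \varphi \rangle - \tfrac{1}{2}\|\varphi\|^2 = \tfrac{1}{2}\|\varphi\|^2.$$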

7 Geometric interpretation of the conjugate. [Figure: $\operatorname{epi}(f)$ together with the largest affine function $h_{\varphi,\, f^*(\varphi)}$ of slope $\varphi$ below $f$; its intercept on the $\mathbb{R}$-axis is $-f^*(\varphi)$, and its normal direction is $[\varphi, -1]$.]

8 Example: conjugate of an indicator function. Let $K \subset V$ be convex, and $\delta_K$ be its indicator function. Then
$$\delta_K^*(\varphi) = \sup_{x \in V}\, \{\langle x, \varphi \rangle - \delta_K(x)\} = \sup_{x \in K}\, \langle x, \varphi \rangle = \sigma_K(\varphi),$$
i.e. the conjugate of an indicator function is the support functional.
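For concreteness (an added example): if $K = \{x \in H : \|x\| \le 1\}$ is the closed unit ball of a Hilbert space, then by Cauchy-Schwarz $\sigma_K(\varphi) = \sup_{\|x\| \le 1} \langle x, \varphi \rangle = \|\varphi\|$, attained at $x = \varphi / \|\varphi\|$; the conjugate of the indicator of the unit ball is the norm.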

9 Properties of the conjugate functional. If $f$ is convex, then $f^*$ has the following remarkable property; the proof is not difficult (exercise). Theorem: Let $f \in \operatorname{conv}(V)$. Then $f^*$ is closed and convex. This will ultimately lead to a scenario similar to the one we had for the minimum norm problem: the dual problem of a convex optimization problem always attains its extremum, even if the primal problem does not.

10 The epigraph of $f^*$. By definition, we have
$$(\varphi, t) \in \operatorname{epi}(f^*) \iff t \ge \sup_{x \in V}\, [\langle x, \varphi \rangle - f(x)].$$
If we define for each $\varphi \in V^*$ and $t \in \mathbb{R}$ the affine functional $h_{\varphi,t}(x) = \langle x, \varphi \rangle - t$, then the epigraph of $f^*$ can be written as
$$\operatorname{epi}(f^*) = \{(\varphi, t) \in V^* \times \mathbb{R} : h_{\varphi,t} \le f\}.$$
In other words, the epigraph of $f^*$ consists of all pairs $(\varphi, t)$ such that the affine function $h_{\varphi,t}$ lies below $f$. This insight will yield the interesting relationship $f^{**} = f$ for closed convex functionals.

11 Second conjugate. The epigraph of $f^*$ consists of all pairs $(\varphi, c)$ such that $h_{\varphi,c}$ lies below $f$. It almost completely characterizes $f$; the reason for the "almost" is that you can recover $f$ only up to closure. Theorem: Let $f \in \operatorname{conv}(V)$ be closed and $V$ be reflexive, i.e. $V^{**} = V$. Then $f^{**} = f$. For the proof, note that
$$f(x) = \sup_{h_{\varphi,c} \le f} h_{\varphi,c}(x) = \sup_{(\varphi,c) \in \operatorname{epi}(f^*)} h_{\varphi,c}(x) = \sup_{\varphi \in V^*}\, [\langle x, \varphi \rangle - f^*(\varphi)] = f^{**}(x).$$
The first equality is intuitive, but surprisingly difficult to show: it is a consequence of the Hahn-Banach theorem applied to the epigraph of $f$.

12 Example: second conjugate of an indicator function. Directly from the definition, we get the following Proposition: Let $K \subset V$ be convex and $\delta_K$ its indicator function. Then the support function of $K$ is the conjugate of $\delta_K$, i.e. $\sigma_K = \delta_K^*$. In addition, if $K$ is closed, then $\delta_K^{**} = \delta_K$, i.e. $\sigma_K^* = \delta_K$. The latter is correct because, obviously, if $K$ is closed, then so is $K \times \mathbb{R}_+ = \operatorname{epi}(\delta_K)$.

13 The conjugate of $J$. Let $K \subset L^2(\Omega)$ be the following closed convex set:
$$K = \operatorname{cl}\,\{\operatorname{div}(\xi) : \xi \in C_c^1(\Omega, \mathbb{R}^n),\ \|\xi\|_\infty \le 1\}.$$
Note that the space $L^2(\Omega)$ is a Hilbert space, thus $K$ is also a subset of its dual space. Proposition: For every $u, v \in L^2(\Omega)$,
$$J(u) := \sigma_K(u) = \sup_{v \in K}\, \langle u, v \rangle = \delta_K^*(u), \qquad J^*(v) = \sigma_K^*(v) = \delta_K^{**}(v) = \delta_K(v) = \begin{cases} 0 & \text{if } v \in K, \\ \infty & \text{otherwise.} \end{cases}$$

14 The subdifferential. Definition: Let $f \in \operatorname{conv}(V)$. A vector $\varphi \in V^*$ is called a subgradient of $f$ at $x \in V$ if
$$f(y) \ge f(x) + \langle y - x, \varphi \rangle \quad \text{for all } y \in V.$$
The set of all subgradients of $f$ at $x$ is called the subdifferential $\partial f(x)$. Geometrically speaking, $\varphi$ is a subgradient if the graph of the affine function $h(y) = f(x) + \langle y - x, \varphi \rangle$ lies below the epigraph of $f$. Note that also $h(x) = f(x)$, so it touches the epigraph.

15 The subdifferential. Example: the subdifferential of $f : x \mapsto |x|$ at $0$ is $\partial f(0) = [-1, 1]$.
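To verify this (a one-line check, added for completeness): $\varphi \in \partial f(0)$ requires $|y| \ge \varphi y$ for all $y \in \mathbb{R}$; testing $y = \pm 1$ gives $|\varphi| \le 1$, and conversely every $\varphi \in [-1, 1]$ satisfies $\varphi y \le |y|$. For $x \ne 0$, $f$ is differentiable there and $\partial f(x) = \{\operatorname{sign}(x)\}$.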

16 Subdifferential and derivatives. The subdifferential is a generalization of the Fréchet derivative (or of the gradient in finite dimensions), in the following sense. Theorem (subdifferential and Fréchet derivative): Let $f \in \operatorname{conv}(V)$ be Fréchet differentiable at $x \in V$. Then $\partial f(x) = \{df(x)\}$. The proof of the theorem is surprisingly involved: it requires relating the subdifferential to one-sided directional derivatives. We will not explore these relationships in this lecture.

17 Relationship between subgradient and conjugate. [Figure: $\operatorname{epi}(f)$ touched at $(x, f(x))$ by the affine function $h_{\varphi,\, f^*(\varphi)}$.] Here, we can see the equivalence
$$\varphi \in \partial f(x) \iff h_{\varphi,\, f^*(\varphi)}(y) = f(x) + \langle y - x, \varphi \rangle \iff f^*(\varphi) = \langle x, \varphi \rangle - f(x).$$

18 The subdifferential and duality. The previously seen relationship between subgradients and the conjugate functional can be summarized in the following theorem. Theorem: Let $f \in \operatorname{conv}(V)$ and $x \in V$. Then the following conditions on a vector $\varphi \in V^*$ are equivalent:
- $\varphi \in \partial f(x)$;
- $x \in \operatorname{argmax}_{y \in V}\, [\langle y, \varphi \rangle - f(y)]$;
- $f(x) + f^*(\varphi) = \langle x, \varphi \rangle$.
If, furthermore, $f$ is closed, then more conditions can be added to this list:
- $x \in \partial f^*(\varphi)$;
- $\varphi \in \operatorname{argmax}_{\psi \in V^*}\, [\langle x, \psi \rangle - f^*(\psi)]$.
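A quick sanity check of these equivalences (added example, not on the slide): on $V = \mathbb{R}$ take $f(x) = \frac{1}{2}x^2$, so $f^*(\varphi) = \frac{1}{2}\varphi^2$. Then $\varphi \in \partial f(x)$ means $\varphi = x$; the map $y \mapsto y\varphi - \frac{1}{2}y^2$ is indeed maximized at $y = \varphi = x$; and $f(x) + f^*(\varphi) = \frac{1}{2}x^2 + \frac{1}{2}x^2 = x \cdot x = \langle x, \varphi \rangle$.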

19 Formal proof of the theorem. The equivalences are easy to see. Rewriting the subgradient definition, one sees that $\varphi \in \partial f(x)$ means
$$\langle x, \varphi \rangle - f(x) \ge \langle y, \varphi \rangle - f(y) \quad \text{for all } y \in V.$$
This implies the first equivalence. Since the supremum of the right-hand side over all $y \in V$ is $f^*(\varphi)$, we get the second equivalence together with Fenchel's inequality. If $f$ is closed, then $f^{**} = f$, thus we get $f^{**}(x) + f^*(\varphi) = \langle x, \varphi \rangle$. This is equivalent to the last two conditions by the same arguments as above, applied to the conjugate functional.

20 Variational principle for convex functionals. As a corollary of the previous theorem, we obtain a generalized variational principle for convex functionals. It is a necessary and sufficient condition for the (global) extremum. Corollary (variational principle for convex functionals): Let $f \in \operatorname{conv}(V)$. Then $\hat{x}$ is a global minimum of $f$ if and only if $0 \in \partial f(\hat{x})$. Furthermore, if $f$ is closed, then $\hat{x}$ is a global minimum if and only if $\hat{x} \in \partial f^*(0)$, i.e. minimizing a functional is the same as computing the subdifferential of the conjugate functional at $0$. To see this, just set $\varphi = 0$ in the previous theorem.
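A worked instance (added for illustration): minimize $f(x) = \frac{1}{2}(x - 2)^2 + |x|$ on $\mathbb{R}$. The condition $0 \in \partial f(\hat{x}) = \{\hat{x} - 2\} + \partial|\cdot|(\hat{x})$ has no solution with $\hat{x} \le 0$ (since $\partial|\cdot|(\hat{x}) \subset [-1, 1]$, it would force $\hat{x} \in [1, 3]$), while for $\hat{x} > 0$ it reads $0 = \hat{x} - 2 + 1$, so the global minimum is $\hat{x} = 1$.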

22 Moreau's theorem. For the remainder of the lecture, we will assume that the underlying space is a Hilbert space $H$, for example $L^2(\Omega)$. Theorem (geometric Moreau): Let $f$ be convex and closed on the Hilbert space $H$, which we identify with its dual. Then for every $z \in H$ there is a unique decomposition $z = \hat{x} + \varphi$ with $\varphi \in \partial f(\hat{x})$, and the unique $\hat{x}$ in this decomposition can be computed with the proximation
$$\operatorname{prox}_f(z) := \operatorname{argmin}_{x \in H} \left\{ \tfrac{1}{2}\|x - z\|_H^2 + f(x) \right\}.$$
This is a corollary to Theorem 31.5 in Rockafellar, page 339. The actual theorem has somewhat more content, but is very technical and quite hard to digest; the above is the essential consequence.
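As a concrete illustration (my own sketch, not from the slides): for $f = \lambda\|\cdot\|_1$ on $\mathbb{R}^n$, the proximation has the well-known closed form of componentwise soft-thresholding, and the decomposition $z = \hat{x} + \varphi$ with $\varphi \in \partial f(\hat{x})$ can be checked numerically.

```python
import numpy as np

def prox_l1(z, lam=1.0):
    # prox of f = lam*||.||_1: argmin_x 0.5*||x - z||^2 + lam*||x||_1,
    # solved componentwise by soft-thresholding.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([2.0, -0.3, 0.7, -1.5])
x_hat = prox_l1(z, lam=1.0)
phi = z - x_hat  # by Moreau's theorem, phi lies in lam * ∂||.||_1(x_hat)
# Check: phi_i = lam*sign(x_hat_i) where x_hat_i != 0, and |phi_i| <= lam otherwise.
print(x_hat)  # [ 1.  -0.   0.  -0.5]
print(phi)    # [ 1.  -0.3  0.7 -1. ]
```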

23 Proof of Moreau's theorem. The correctness of the theorem is not too hard to see: if $\hat{x} = \operatorname{prox}_f(z)$, then
$$\hat{x} \in \operatorname{argmin}_{x \in H} \left\{ \tfrac{1}{2}\|x - z\|_H^2 + f(x) \right\} \iff 0 \in \hat{x} - z + \partial f(\hat{x}) \iff z \in \hat{x} + \partial f(\hat{x}).$$
Existence and uniqueness of the proximation follow because the functional is closed, strictly convex and coercive.

24 The geometry of the graph of $\partial f$. We will see in a moment that $\operatorname{prox}_f$ is continuous. In particular, the map $z \mapsto (\operatorname{prox}_f(z), z - \operatorname{prox}_f(z))$ is a continuous map from $H$ into the graph of $\partial f$,
$$\operatorname{graph}(\partial f) := \{(x, \varphi) : x \in H,\ \varphi \in \partial f(x)\} \subset H \times H,$$
with continuous inverse $(x, \varphi) \mapsto x + \varphi$. Moreau's theorem now says that this map is one-to-one. In particular, $H \cong \operatorname{graph}(\partial f)$, i.e. the sets are homeomorphic; consequently, $\operatorname{graph}(\partial f)$ is always connected. Another corollary of Moreau's theorem is that $z = \operatorname{prox}_f(z) + \operatorname{prox}_{f^*}(z)$.
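A numerical check of the last identity (again with the illustrative choice $f = \|\cdot\|_1$, not from the slides): the conjugate $f^*$ is the indicator of the unit sup-norm ball, so $\operatorname{prox}_{f^*}$ is the projection onto $[-1, 1]^n$, and soft-thresholding plus clipping must reproduce $z$.

```python
import numpy as np

def prox_l1(z):
    # prox of f = ||.||_1 (soft-thresholding with lam = 1)
    return np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)

def prox_l1_conj(z):
    # f* is the indicator of {phi : ||phi||_inf <= 1}, so prox_{f*} is the projection
    return np.clip(z, -1.0, 1.0)

z = np.random.default_rng(0).normal(size=5)
assert np.allclose(prox_l1(z) + prox_l1_conj(z), z)  # z = prox_f(z) + prox_{f*}(z)
```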

25 The proximation operator is continuous. Proposition: Let $f$ be a convex and closed functional on the Hilbert space $H$. Then $\operatorname{prox}_f$ is Lipschitz continuous with constant $1$, i.e. for all $z_0, z_1 \in H$,
$$\|\operatorname{prox}_f(z_0) - \operatorname{prox}_f(z_1)\|_H \le \|z_0 - z_1\|_H.$$
We will prove this in an exercise.
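A quick numerical sanity check of the 1-Lipschitz property, using the soft-thresholding prox of $\|\cdot\|_1$ from the sketches above (illustrative only, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(1000):
    z0, z1 = rng.normal(size=(2, 8))
    p0 = np.sign(z0) * np.maximum(np.abs(z0) - 1.0, 0.0)  # prox of ||.||_1 at z0
    p1 = np.sign(z1) * np.maximum(np.abs(z1) - 1.0, 0.0)  # prox of ||.||_1 at z1
    assert np.linalg.norm(p0 - p1) <= np.linalg.norm(z0 - z1) + 1e-12
```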

26 Fixed points of the proximation operator. Proposition: Let $f$ be closed and convex on the Hilbert space $H$, and let $\hat{z}$ be a fixed point of the proximation operator $\operatorname{prox}_f$, i.e. $\hat{z} = \operatorname{prox}_f(\hat{z})$. Then $\hat{z}$ is a minimizer of $f$. In particular, it also follows that $\hat{z} \in (I - \operatorname{prox}_f)^{-1}(0)$. To prove this, just note that by Moreau's theorem, $\hat{z} \in \operatorname{prox}_f(\hat{z}) + \partial f(\operatorname{prox}_f(\hat{z}))$; if $\hat{z}$ is a fixed point, this reads $0 \in \partial f(\hat{z})$.

27 Subgradient descent. Let $\lambda > 0$, $z \in H$ and $x = \operatorname{prox}_{\lambda f}(z)$. Then
$$z \in x + \lambda\, \partial f(x) \iff x \in z - \lambda\, \partial f(x).$$
In particular, we have the following interesting observation: the proximation operator $\operatorname{prox}_{\lambda f}$ computes an implicit subgradient descent step of step size $\lambda$ for the functional $f$. "Implicit" here means that the subgradient is evaluated not at the original but at the new location, which improves the stability of the descent. Note that if subgradient descent converges, then it converges to a fixed point $\hat{z}$ of $I - \lambda\, \partial f$; in particular, $\hat{z}$ is a minimizer of the functional $f$.
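A minimal sketch of this observation (my own example, with $f = \|\cdot\|_1$ and the soft-thresholding prox from above): iterating $x_{k+1} = \operatorname{prox}_{\lambda f}(x_k)$ performs implicit subgradient descent and drives the iterate to a minimizer of $f$, here the zero vector.

```python
import numpy as np

def prox_l1(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Proximal point iteration: each step is an implicit subgradient descent
# step of size lam for f = ||.||_1.
x = np.array([3.0, -2.0, 0.5])
lam = 0.25
for _ in range(20):
    x = prox_l1(x, lam)
print(x)  # converges to the minimizer of ||.||_1, i.e. the zero vector
```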

28 Summary. Convex optimization deals with finding minima of convex functionals, which can be non-differentiable. The generalization of the variational principle to a convex functional is the condition that at a minimum, zero must be an element of the subdifferential. Efficient optimization methods rely heavily on the concepts of duality and the conjugate functional; we will see that they allow us to transform convex minimization problems into saddle-point problems, which are sometimes easier to handle. An implicit subgradient descent step for a convex functional can be computed by evaluating the proximation operator, which means solving another minimization problem.

29 References. Boyd and Vandenberghe, Convex Optimization, Cambridge University Press 2004. Excellent recent introduction to convex optimization. Reads very well, available online for free. Rockafellar, Convex Analysis, Princeton University Press 1970. Classical introduction to convex analysis and optimization. Somewhat technical and not too easy to read, but very exhaustive.