ANDREW TULLOCH
CONVEX OPTIMIZATION
TRINITY COLLEGE, THE UNIVERSITY OF CAMBRIDGE

Contents

1 Introduction
  1.1 Setup
  1.2 Convexity
2 Convexity
3 Cones and Generalized Inequalities
4 Conjugate Functions
  4.1 The Legendre-Fenchel Transform
  4.2 Duality Correspondences
5 Duality in Optimization
6 First-order Methods
7 Interior Point Methods
8 Support Vector Machines
  8.1 Machine Learning
  8.2 Linear Classifiers
  8.3 Kernel Trick
9 Total Variation
  9.1 Meyer's G-norm
  9.2 Non-local Regularization
    9.2.1 How to choose w(x, y)?
10 Relaxation
  10.1 Mumford-Shah model
11 Bibliography

1 Introduction

1.1 Setup

The setup consists of an energy f : U × Ω → R̄, an input image i, and a reconstructed candidate image u. We seek a minimizer of the problem

    min_{u ∈ U} f(u, i)    (1.1)

where f is typically of the form

    f(u, i) = l(u, i) + r(u)    (1.2)

with l the data cost and r the regulariser.

Example 1.1 (Rudin-Osher-Fatemi).

    min_u (1/2) ∫_Ω (u − I)² dx + ∫_Ω |∇u| dx    (1.3)

This model will reduce contrast, but will not introduce new jumps.

Example 1.2 (Total variation (TV) L1).

    min_u (1/2) ∫_Ω |u − I| dx + λ ∫_Ω |∇u| dx    (1.4)

In general this does not cause contrast loss.
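As a quick companion to the discretized setup (1.1)-(1.3), here is a minimal numpy sketch, not part of the notes: a forward-difference gradient and the resulting discrete ROF-type energy with an explicit weight lam on the TV term. The image I, the weight lam, and the helper names are illustrative assumptions.

    import numpy as np

    def grad(u):
        """Forward-difference gradient of a 2-D image (Neumann boundary)."""
        gx = np.zeros_like(u); gy = np.zeros_like(u)
        gx[:-1, :] = u[1:, :] - u[:-1, :]
        gy[:, :-1] = u[:, 1:] - u[:, :-1]
        return gx, gy

    def rof_energy(u, I, lam):
        """Discrete Rudin-Osher-Fatemi energy (1/2)||u - I||^2 + lam * TV(u)."""
        gx, gy = grad(u)
        return 0.5 * ((u - I)**2).sum() + lam * np.sqrt(gx**2 + gy**2).sum()

    I = np.random.rand(32, 32)
    print(rof_energy(I.copy(), I, lam=0.1))   # data term vanishes, only the TV term remains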

One can show that if I = 1_{B_r(0)} (the indicator of a disc of radius r), then

    u = I  if r ≥ λ/2,   u = c (constant)  if r < λ/2.    (1.5)

Example 1.3 (Inpainting).

    min_u (1/2) ∫_{Ω\A} (u − I)² dx + λ ∫_Ω |∇u| dx    (1.6)

1.2 Convexity

Theorem 1.4. If a function f is convex, every local optimizer is a global optimizer.

Theorem 1.5. If a function f is convex, it can (very often) be efficiently optimized (polynomial time in the number of bits of the input).

In computer vision, problems are:
- usually large-scale (10^5 to 10^7 variables),
- usually non-differentiable.

Definition 1.6. f is lower semicontinuous at x̄ if

    f(x̄) ≤ lim inf_{x→x̄} f(x) := min{α ∈ R̄ | ∃ (x^k) → x̄ with f(x^k) → α}.    (1.7)

Theorem 1.7. Let f : R^n → R̄. Then the following are equivalent:
(i) f is lower semicontinuous;
(ii) epi f is closed in R^n × R;
(iii) the level sets lev_{≤α} f = {x ∈ R^n | f(x) ≤ α} are closed for all α ∈ R̄.

Proof. ((i) ⇒ (ii)) Take (x^k, α^k) ∈ epi f with (x^k, α^k) → (x, α) ∈ R^n × R. Then f(x) ≤ lim inf_k f(x^k) ≤ lim inf_k α^k = α, so (x, α) ∈ epi f.
((ii) ⇒ (iii)) epi f closed implies epi f ∩ (R^n × {α}) is closed for all α ∈ R, which holds if and only if the level set

    {x ∈ R^n | f(x) ≤ α}    (1.8)

is closed, for each α ∈ R.

For α = +∞, lev_{≤+∞} f = R^n is closed. For α = −∞, lev_{≤−∞} f = ∩_{k∈N} lev_{≤−k} f is an intersection of closed sets, hence closed.
((iii) ⇒ (i)) For x^j → x̄, take a subsequence x^k → x̄ with

    f(x^k) → lim inf_{x→x̄} f(x) =: c.    (1.9)

If c ∈ R, then for every ε > 0 there is K = K(ε) such that f(x^k) ≤ c + ε for all k > K. Hence x^k ∈ lev_{≤c+ε} f, and by closedness x̄ ∈ lev_{≤c+ε} f. As ε > 0 was arbitrary, x̄ ∈ lev_{≤c} f, i.e. f(x̄) ≤ c = lim inf_{x→x̄} f(x). If c = +∞ we are done, since f(x̄) ≤ +∞ = lim inf. If c = −∞, the same argument with f(x^k) ≤ −k, k ∈ N, gives f(x̄) = −∞.

Example 1.8. f(x) = x is lower semicontinuous, but does not have a minimizer.

Definition 1.9. f : R^n → R̄ is level-bounded if and only if lev_{≤α} f is bounded for all α ∈ R. This is also known as coercivity.

Example 1.10. f(x) = x² is level-bounded. f(x) = 1/x is not level-bounded.

Theorem 1.11. f : R^n → R̄ is level-bounded if and only if f(x^k) → ∞ whenever ‖x^k‖ → ∞.

Proof. Exercise.

Theorem 1.12. Let f : R^n → R̄ be lower semicontinuous, level-bounded, and proper. Then inf_x f(x) ∈ (−∞, +∞) and argmin f = {x | f(x) ≤ inf f} is nonempty and compact.

Proof.

    argmin f = {x | f(x) ≤ inf_x f(x)}    (1.10)
             = {x | f(x) ≤ inf f + 1/k for all k ∈ N}    (1.11)
             = ∩_{k∈N} lev_{≤ inf f + 1/k} f.    (1.12)

If inf f is −∞, just replace inf f + 1/k by α_k > −∞ with α_k = −k, k ∈ N.

These level sets are bounded (by level-boundedness), closed (since f is lower semicontinuous), and non-empty (since 1/k > 0). They are therefore compact and nested, so their intersection argmin f is non-empty and compact.

We still need inf f > −∞: if inf f = −∞, there would exist x ∈ argmin f with f(x) = −∞, which cannot happen since f is proper. Thus inf f > −∞.

Remark 1.13. For Theorem 1.12, it suffices to have lev_{≤α} f bounded and non-empty for at least one α ∈ R.

Proposition 1.14. We have the following properties for lower semicontinuity.
(i) If f, g are lower semicontinuous, then f + g is lower semicontinuous.
(ii) If f is lower semicontinuous and λ ≥ 0, then λf is lower semicontinuous.
(iii) If f : R^n → R̄ is lower semicontinuous and g : R^m → R^n is continuous, then f ∘ g is lower semicontinuous.

2 Convexity

Definition 2.1.
(i) f : R^n → R̄ is convex if

    f((1 − τ)x + τy) ≤ (1 − τ) f(x) + τ f(y)    (2.1)

for all x, y ∈ R^n, τ ∈ (0, 1).
(ii) A set C ⊆ R^n is convex if and only if its indicator function I_C is convex.
(iii) f is strictly convex if and only if (2.1) holds strictly whenever x ≠ y and f(x), f(y) ∈ R.

Remark 2.2. C ⊆ R^n is convex if and only if for all x, y ∈ C the connecting line segment is contained in C.

Exercise 2.3. Show that

    {x | aᵀx + b ≤ 0}    (2.2)

is convex, and that

    f(x) = aᵀx + b    (2.3)

is convex.

Definition 2.4. For x_0, ..., x_m ∈ R^n and λ_0, ..., λ_m ≥ 0 with Σ_{i=0}^m λ_i = 1, the point Σ_i λ_i x_i is a convex combination of the x_i.

Theorem 2.5. f : R^n → R̄ is convex if and only if

    f(Σ_{i=1}^n λ_i x_i) ≤ Σ_{i=1}^n λ_i f(x_i)    (2.4)

for all convex combinations Σ_i λ_i x_i.

C ⊆ R^n is convex if and only if C contains all convex combinations of its elements.

Proof.

    Σ_{i=1}^m λ_i x_i = λ_m x_m + (1 − λ_m) Σ_{i=1}^{m−1} (λ_i / (1 − λ_m)) x_i,    (2.5)

which writes the sum as a convex combination of two points; proceed by induction on m. The set version follows by applying the function version to I_C.

Proposition 2.6. If f : R^n → R̄ is convex, then dom f is convex.

Proposition 2.7. f : R^n → R̄ is convex if and only if epi f is convex in R^n × R, if and only if the strict epigraph

    {(x, α) ∈ R^n × R | f(x) < α}    (2.6)

is convex in R^n × R.

Proof. epi f is convex if and only if for all τ ∈ (0, 1) and all (x, α), (y, β) ∈ epi f (so α ≥ f(x), β ≥ f(y)) we have

    f((1 − τ)x + τy) ≤ (1 − τ)α + τβ,    (2.7)

which holds if and only if for all τ ∈ (0, 1), x, y,

    f((1 − τ)x + τy) ≤ (1 − τ) f(x) + τ f(y),    (2.8)

i.e. if and only if f is convex.    (2.9)

Proposition 2.8. If f : R^n → R̄ is convex, then lev_{≤α} f is convex for all α ∈ R̄.

Proof. If f(x) ≤ α and f(y) ≤ α, then f((1 − τ)x + τy) ≤ (1 − τ) f(x) + τ f(y) ≤ α, so lev_{≤α} f is convex. If α = +∞, then lev_{≤α} f = R^n, which is convex.

Theorem 2.9. Let f : R^n → R̄ be convex. Then

(i) argmin f is convex;
(ii) if x is a local minimizer of f, then x is a global minimizer of f;
(iii) if f is strictly convex and proper, then f has at most one global minimizer.

Proof.
(i) If f ≡ +∞, then argmin f = ∅, which is convex. Otherwise argmin f = lev_{≤ inf f} f, which is convex by the previous proposition.
(ii) Assume x is a local minimizer and there exists y with f(y) < f(x). Then

    f((1 − τ)x + τy) ≤ (1 − τ) f(x) + τ f(y) < f(x),

and taking τ → 0 shows that x cannot be a local minimizer; thus no such y exists.
(iii) Assume x ≠ y are both global minimizers, so f(x) = f(y). Then

    f(½x + ½y) < ½ f(x) + ½ f(y) = f(x) = f(y),

contradicting that x and y are global minimizers.

Proposition 2.10. To construct convex functions:
(i) If f_i, i ∈ I, are convex, then

    f(x) = sup_{i∈I} f_i(x)    (2.10)

is convex.
(ii) If f_i, i ∈ I, are strictly convex and I is finite, then

    sup_{i∈I} f_i(x)    (2.11)

is strictly convex.
(iii) If C_i, i ∈ I, are convex sets, then

    ∩_{i∈I} C_i    (2.12)

is convex.

(iv) If f_k, k ∈ N, are convex, then

    lim sup_{k→∞} f_k(x)    (2.13)

is convex.

Proof. Exercise.

Example 2.11.
(i) C, D convex does not imply C ∪ D convex (e.g. take them disjoint).
(ii) If f : R^n → R̄ is convex and C is convex, then f + I_C is convex.
(iii) f(x) = |x| = max{x, −x} is convex.
(iv) f(x) = ‖x‖_p, p ≥ 1, is convex, since

    ‖x‖_p = sup_{‖y‖_q ≤ 1} ⟨x, y⟩,  1/p + 1/q = 1.    (2.14)

Theorem 2.12. Let C ⊆ R^n be open and convex and let f : C → R be differentiable. Then the following are equivalent:
(i) f is [strictly] convex;
(ii) ⟨y − x, ∇f(y) − ∇f(x)⟩ ≥ 0 for all x, y ∈ C [with > 0 for x ≠ y];
(iii) f(x) + ⟨∇f(x), y − x⟩ ≤ f(y) for all x, y ∈ C [with < for x ≠ y];
(iv) if f is additionally twice differentiable: ∇²f(x) is positive semidefinite for all x ∈ C.
Condition (ii) is monotonicity of ∇f.

Proof. Exercise (reduce to n = 1, then use that a function is convex on R^n iff it is convex along every line).

Remark 2.13. The strict version of (iv) is not necessary for strict convexity; for example f(x) = x⁴ is strictly convex although f''(0) = 0.

Proposition 2.14. The following hold for convex functions.
(i) If f_1, ..., f_m : R^n → R̄ are convex and λ_1, ..., λ_m ≥ 0, then

    f = Σ_i λ_i f_i    (2.15)

is convex, and strictly convex if there exists i such that λ_i > 0 and f_i is strictly convex.

(ii) If f_i : R^n → R̄ are convex, then f(x_1, ..., x_m) = Σ_i f_i(x_i) is convex, and strictly convex if all f_i are strictly convex.
(iii) If f : R^n → R̄ is convex, A ∈ R^{n×n}, b ∈ R^n, then g(x) = f(Ax + b) is convex.

Remark 2.15. As a supremum of convex functions, the operator norm ‖M‖ = sup_{‖x‖ ≤ 1} ‖Mx‖ is convex in M.

Proposition 2.16.
(i) C_1, ..., C_m convex implies C_1 × ⋯ × C_m convex.
(ii) If C ⊆ R^n is convex, A ∈ R^{m×n}, b ∈ R^m, then L(C) is convex, where L(x) = Ax + b.
(iii) If C ⊆ R^m is convex, then L^{−1}(C) is convex.
(iv) f, g convex implies f + g convex.
(v) f convex, λ ≥ 0 implies λf convex.

Definition 2.17. For S ⊆ R^n and y ∈ R^n, define the projection of y onto S as

    Π_S(y) = argmin_{x∈S} ‖x − y‖_2.    (2.16)

Proposition 2.18. If C ⊆ R^n is convex, closed, and non-empty, then Π_C is single-valued, i.e. the projection is unique.

Proof. Write

    Π_C(y) = argmin_x ½‖x − y‖²_2 + I_C(x) =: argmin_x f(x).    (2.17)

Uniqueness: ½‖x − y‖² is strictly convex (∇²(½‖x − y‖²) = I ≻ 0), so f is strictly convex; f is proper (since C ≠ ∅), and thus f has at most one minimizer.
Existence: f is proper and lower semicontinuous (the quadratic part is continuous, the indicator is lsc since C is closed), and level-bounded since ½‖x − y‖²_2 → ∞ as ‖x‖_2 → ∞ and I_C ≥ 0. Thus argmin f ≠ ∅.
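The projection of Definition 2.17 has a closed form for some simple sets. A small sketch, not from the notes; the box and the Euclidean ball are my own illustrative choices.

    import numpy as np

    def project_box(y, lo, hi):
        """Projection onto the box {x : lo <= x <= hi} (componentwise clipping)."""
        return np.clip(y, lo, hi)

    def project_ball(y, radius=1.0):
        """Projection onto the Euclidean ball of the given radius centred at 0."""
        norm = np.linalg.norm(y)
        return y if norm <= radius else (radius / norm) * y

    y = np.array([2.0, -0.5, 3.0])
    print(project_box(y, 0.0, 1.0))   # [1. 0. 1.]
    print(project_ball(y))            # y rescaled to unit norm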

Definition 2.19. Let S ⊆ R^n be arbitrary. Then

    con S = ∩_{C convex, C ⊇ S} C    (2.18)

is the convex hull of S.

Remark 2.20. con S is the smallest convex set that contains S.

Theorem 2.21. Let S ⊆ R^n. Then

    con S = { Σ_{i=0}^n λ_i x_i | x_i ∈ S, λ_i ≥ 0, Σ_{i=0}^n λ_i = 1 } =: D.    (2.19)

Proof. D is convex and contains S: if x, y ∈ D, then (1 − τ)x + τy ∈ D. Thus con S ⊆ D. Conversely, by a previous theorem, con S is convex and hence contains all convex combinations of points of S, so D ⊆ con S. Thus con S = D.

Definition 2.22. For a set C ⊆ R^n, define the closure

    cl C = {x ∈ R^n | for all open neighborhoods N of x, N ∩ C ≠ ∅},    (2.20)

the interior

    int C = {x ∈ R^n | there exists an open neighborhood N of x with N ⊆ C},    (2.21)

and the boundary

    ∂C = cl C \ int C.    (2.22)

Remark 2.23.

    cl G = ∩_{S closed, S ⊇ G} S.    (2.23)
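To close the chapter, a small numerical companion to Definition 2.1, Theorem 2.5, and Theorem 2.12, not part of the notes: sampling random convex combinations to hunt for violations of the convexity inequality, and sampling the gradient-monotonicity condition (ii). The candidate functions are my own examples.

    import numpy as np

    rng = np.random.default_rng(0)

    def violates_jensen(f, dim=3, n_points=4, trials=2000):
        """Search for a random violation of f(sum lam_i x_i) <= sum lam_i f(x_i)."""
        for _ in range(trials):
            xs = rng.normal(size=(n_points, dim))
            lam = rng.random(n_points); lam /= lam.sum()     # convex weights
            if f(lam @ xs) > sum(l * f(x) for l, x in zip(lam, xs)) + 1e-9:
                return True
        return False

    def gradient_monotone(grad, dim=3, trials=2000):
        """Sample condition (ii) of Theorem 2.12: <y - x, grad(y) - grad(x)> >= 0."""
        for _ in range(trials):
            x, y = rng.normal(size=(2, dim))
            if np.dot(y - x, grad(y) - grad(x)) < -1e-9:
                return False
        return True

    print(violates_jensen(lambda x: np.dot(x, x)))          # False: ||x||^2 is convex
    print(violates_jensen(lambda x: -np.linalg.norm(x)))    # True:  -||x|| is not convex
    print(gradient_monotone(lambda x: 2 * x))               # True:  grad of ||x||^2 is monotone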

3 Cones and Generalized Inequalities

Definition 3.1. K ⊆ R^n is a cone if and only if 0 ∈ K and λx ∈ K for all x ∈ K, λ ≥ 0.

Definition 3.2. A cone K is pointed if and only if, for x_1, ..., x_n ∈ K,

    x_1 + ⋯ + x_n = 0 ⟹ x_1 = ⋯ = x_n = 0.    (3.1)

Example 3.3.
(i) R^n is a cone, but it is not pointed (for n ≥ 1).
(ii) {(x_1, x_2) ∈ R² | x_2 > 0} is not a cone.
(iii) {(x_1, x_2) ∈ R² | x_2 ≥ 0} is a cone, but is not pointed.
(iv) {x ∈ R^n | x_i ≥ 0, i = 1, ..., n} is a pointed cone.

Proposition 3.4. Let K ⊆ R^n be any set. Then the following are equivalent:
(i) K is a convex cone.
(ii) K is a cone and K + K ⊆ K.
(iii) K ≠ ∅ and Σ_{i=0}^n α_i x_i ∈ K for all x_i ∈ K, α_i ≥ 0.

Proof. Exercise.

Proposition 3.5. If K is a convex cone, then K is pointed if and only if K ∩ (−K) = {0}.

Proof. Exercise.

Definition 3.6. The subdifferential of f at x is

    ∂f(x) = {v ∈ R^n | f(x) + ⟨v, y − x⟩ ≤ f(y)  ∀y},    (3.2)

and the directional derivative is

    f'(x; w) = lim_{t↓0} (f(x + tw) − f(x)) / t.    (3.3)

Proposition 3.7. Let f, g : R^n → R̄ be convex. Then
(i) f differentiable at x implies ∂f(x) = {∇f(x)};
(ii) f differentiable at x, g(x) ∈ R implies

    ∂(f + g)(x) = ∇f(x) + ∂g(x).    (3.4)

Proof.
(i) is equivalent to (ii) with g = 0.
(ii) To show RHS ⊆ LHS: if v ∈ ∂g(x), then ⟨v, y − x⟩ + g(x) ≤ g(y); adding the gradient inequality (Theorem 2.12 (iii))

    ⟨∇f(x), y − x⟩ + f(x) ≤ f(y)    (3.5)

gives

    ⟨v + ∇f(x), y − x⟩ + (f + g)(x) ≤ (f + g)(y),    (3.6)

so

    v + ∇f(x) ∈ ∂(f + g)(x).    (3.7)

For the other direction, let v ∈ ∂(f + g)(x). Then

    lim inf_{z→x} [f(z) + g(z) − f(x) − g(x) − ⟨v, z − x⟩] / ‖z − x‖ ≥ 0,    (3.8)

and since f is differentiable at x,

    lim inf_{z→x} [g(z) − g(x) − ⟨v − ∇f(x), z − x⟩] / ‖z − x‖ ≥ 0.    (3.9)

Taking z(t) = (1 − t)x + ty and t ↓ 0,

    lim inf_{t↓0} [g(z(t)) − g(x) − t⟨v − ∇f(x), y − x⟩] / (t‖y − x‖) ≥ 0,    (3.10)

and since g(z(t)) − g(x) ≤ t(g(y) − g(x)) by convexity,

    lim inf_{t↓0} [t(g(y) − g(x)) − t⟨v − ∇f(x), y − x⟩] / (t‖y − x‖) ≥ 0.    (3.11)

Hence

    g(y) − g(x) − ⟨v − ∇f(x), y − x⟩ ≥ 0  for all y,    (3.12)

i.e.

    v − ∇f(x) ∈ ∂g(x),  so  v ∈ ∇f(x) + ∂g(x).    (3.13)

Theorem 3.8. Let f : R^n → R̄ be proper. Then

    x ∈ argmin f ⟺ 0 ∈ ∂f(x).    (3.14)

Proof.

    0 ∈ ∂f(x) ⟺ ⟨0, y − x⟩ + f(x) ≤ f(y)  ∀y ⟺ f(x) ≤ f(y)  ∀y.    (3.15)

Definition 3.9. For a convex set C ⊆ R^n and a point x ∈ C,

    N_C(x) = {v ∈ R^n | ⟨v, y − x⟩ ≤ 0  ∀y ∈ C}    (3.16)

is the normal cone of C at x. By convention, N_C(x) = ∅ for all x ∉ C.

Proposition 3.10. Let C be convex and C ≠ ∅. Then

    ∂I_C(x) = N_C(x).    (3.17)

Proof. For x ∈ C, we have

    ∂I_C(x) = {v | I_C(x) + ⟨v, y − x⟩ ≤ I_C(y)  ∀y} = {v | ⟨v, y − x⟩ ≤ 0  ∀y ∈ C} = N_C(x).    (3.18)

For x ∉ C, both sides are empty.

Proposition 3.11. Let C ⊆ R^n be closed, convex, and non-empty, and x ∈ R^n. Then

    y = Π_C(x) ⟺ x − y ∈ N_C(y).    (3.19)

Proof.

    y = Π_C(x) ⟺ y minimizes g(y') := ½‖y' − x‖² + I_C(y'),    (3.20)

if and only if 0 ∈ ∂g(y), if and only if 0 ∈ y − x + N_C(y), i.e. x − y ∈ N_C(y).

Proposition 3.12. Let f : R^n → R̄ be convex and proper. Then

    ∂f(x) = ∅  for x ∉ dom f,   ∂f(x) = {v ∈ R^n | (v, −1) ∈ N_{epi f}(x, f(x))}  for x ∈ dom f.    (3.21)

If x ∈ dom f,

    N_{dom f}(x) = {v ∈ R^n | (v, 0) ∈ N_{epi f}(x, f(x))}.    (3.22)

Example 3.13. The subdifferential of f : R → R, f(x) = |x|, is

    ∂f(x) = {sign x}  if x ≠ 0,   ∂f(0) = [−1, 1].    (3.23)

Definition 3.14. For C ⊆ R^n, define the affine hull as

    aff(C) = ∩_{A affine, C ⊆ A} A    (3.24)

and the relative interior as

    rint(C) = {x ∈ R^n | there exists an open neighborhood N of x with N ∩ aff(C) ⊆ C}.    (3.25)

Example 3.15.

    aff(R^n) = R^n    (3.26)
    aff([0, 1]²) = R²    (3.27)
    rint([0, 1]²) = (0, 1)²    (3.28)
    rint([0, 1] × {0}) = (0, 1) × {0}    (3.29)

Exercise 3.16. We know

    int(A ∩ B) = int A ∩ int B.    (3.30)

Does

    rint(A ∩ B) = rint A ∩ rint B?    (3.31)

Proposition 3.17. Let f : R^n → R̄ be convex. Then
(i) Elementary transformations:
    g(x) = f(x + y) ⟹ ∂g(x) = ∂f(x + y);
    g(x) = f(λx) ⟹ ∂g(x) = λ ∂f(λx);
    g(x) = λ f(x), λ > 0 ⟹ ∂g(x) = λ ∂f(x).
(ii) If f : R^n → R̄ is proper and convex and A ∈ R^{n×m} is such that

    {Ay | y ∈ R^m} ∩ rint dom f ≠ ∅,    (3.32)

then for x ∈ dom(f ∘ A) we have

    ∂(f ∘ A)(x) = Aᵀ ∂f(Ax).    (3.33)

(iii) Let f_1, ..., f_m : R^n → R̄ be proper and convex with rint dom f_1 ∩ ⋯ ∩ rint dom f_m ≠ ∅. Then

    ∂(f_1 + ⋯ + f_m)(x) = ∂f_1(x) + ⋯ + ∂f_m(x).    (3.34)
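As an illustration of Theorem 3.8, Example 3.13, and the sum rule (3.34), here is a sketch of the subgradient method, not from the notes: at each step an element of the subdifferential of a non-differentiable convex objective is used in place of a gradient. The lasso-type objective and the diminishing step-size rule are my own illustrative choices.

    import numpy as np

    def subgrad_l1(x):
        """One subgradient of ||x||_1: sign(x) componentwise (0 at x = 0, cf. Example 3.13)."""
        return np.sign(x)

    def subgradient_method(A, b, lam, steps=500):
        """Minimize (1/2)||Ax - b||^2 + lam*||x||_1 with diminishing steps 1/k."""
        x = np.zeros(A.shape[1])
        for k in range(1, steps + 1):
            g = A.T @ (A @ x - b) + lam * subgrad_l1(x)   # element of the subdifferential, via (3.34)
            x -= (1.0 / k) * g
        return x

    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 5)); b = rng.normal(size=20)
    print(subgradient_method(A, b, lam=0.5))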

4 Conjugate Functions

4.1 The Legendre-Fenchel Transform

Definition 4.1. For f : R^n → R̄, define the convex hull of f as

    con f(x) = sup_{g ≤ f, g convex} g(x).    (4.1)

Proposition 4.2. con f is the greatest convex function majorized by f.

Definition 4.3. For f : R^n → R̄, the (lower) closure of f is defined as

    cl f(x) = lim inf_{y→x} f(y).    (4.2)

Proposition 4.4. For f : R^n → R̄,

    epi(cl f) = cl(epi f).    (4.3)

In particular, if f is convex, then cl f is convex.

Proof. Exercise.

Proposition 4.5. If f : R^n → R̄, then

    (cl f)(x) = sup_{g ≤ f, g lsc} g(x).    (4.4)

Proof. [Fill in.]

Theorem 4.6. Let C ⊆ R^n be closed and convex. Then

    C = ∩_{(b,β) s.t. C ⊆ H_{b,β}} H_{b,β},    (4.5)

where

    H_{b,β} = {x ∈ R^n | ⟨x, b⟩ − β ≤ 0}.    (4.6)

Proof. Omitted.

Theorem 4.7. Let f : R^n → R̄ be proper, lsc, and convex. Then

    f(x) = sup_{g ≤ f, g affine} g(x).    (4.7)

Proof. Omitted (this is quite an involved proof).

Definition 4.8. For f : R^n → R̄, define f* : R^n → R̄ by

    f*(v) = sup_{x∈R^n} {⟨v, x⟩ − f(x)},    (4.8)

the conjugate of f. The mapping f ↦ f* is the Legendre-Fenchel transform.

Remark 4.9. For v ∈ R^n,

    f*(v) = sup_{x∈R^n} {⟨v, x⟩ − f(x)},    (4.9)

so

    f*(v) ≥ ⟨v, x⟩ − f(x)  ∀x ∈ R^n  ⟺  f(x) ≥ ⟨v, x⟩ − f*(v)  ∀x ∈ R^n.    (4.10)

Thus x ↦ ⟨v, x⟩ − f*(v) is the largest affine function with gradient v majorized by f.

Theorem 4.10. Assume f : R^n → R̄. Then
(i) f* = (con f)* = (cl f)* = (cl con f)*, and f ≥ f** = (f*)*, the biconjugate of f.
(ii) If con f is proper, then f*, f** are proper, lower semicontinuous, and convex, and f** = cl con f.
(iii) If f : R^n → R̄ is proper, lower semicontinuous, and convex, then

    f** = f.    (4.11)

Proof. We have

    (v, β) ∈ epi f* ⟺ β ≥ ⟨v, x⟩ − f(x)  ∀x ∈ R^n ⟺ f(x) ≥ ⟨v, x⟩ − β  ∀x ∈ R^n.    (4.12)

We claim that for an affine function h we have h ≤ f ⟺ h ≤ con f ⟺ h ≤ cl f ⟺ h ≤ cl con f. Indeed, con f is the largest convex function less than or equal to f and h is convex; the same argument works for cl, cl con, etc. Thus in (4.12) we can replace f by con f, cl f, cl con f, which gives f* = (con f)* = (cl f)* = (cl con f)*. We also have

    f**(y) = sup_{v∈R^n} {⟨v, y⟩ − sup_{x∈R^n} {⟨v, x⟩ − f(x)}}    (4.13)
           ≤ sup_{v∈R^n} {⟨v, y⟩ − ⟨v, y⟩ + f(y)}    (4.14)
           = f(y).    (4.15)

For the second part, con f is proper, and we claim cl con f is proper, lower semicontinuous, and convex. Lower semicontinuity is immediate, convexity follows from the previous proposition (f convex implies cl f convex), and properness is to be shown in an exercise. Applying the previous theorem,

    cl con f(x) = sup_{g ≤ cl con f, g affine} g(x)    (4.16)
                = sup_{(v,β)∈epi f*} {⟨v, x⟩ − β}    (4.17)
                = sup_{(v,β)∈epi f*} {⟨v, x⟩ − f*(v)}    (4.18)
                = sup_{v∈dom f*} {⟨v, x⟩ − f*(v)}    (4.19)
                = sup_{v∈R^n} {⟨v, x⟩ − f*(v)} = f**(x),    (4.20)

where the affine minorants are g(x) = ⟨v, x⟩ − β with (v, β) ∈ epi (cl con f)* = epi f*.

To show f* is proper, lower semicontinuous, and convex: epi f* is an intersection of closed convex sets (one half-space for each x), therefore closed and convex, and hence f* is lower semicontinuous and convex.

To show properness: con f proper implies there exists x ∈ R^n with con f(x) < ∞, so f*(v) = sup_{x∈R^n} {⟨v, x⟩ − f(x)} > −∞ for every v. If f* ≢ +∞, then cl con f = f** = sup_v {⟨v, x⟩ − f*(v)} is proper, and hence f* is proper, lower semicontinuous, and convex. Applying the same to f*: we need con f* proper (which holds by the previous result), and thus f** is proper, lower semicontinuous, and convex.

For part (iii), apply part (ii): f convex implies con f = f, and con f is proper (as f is proper); f is lsc and convex, and thus f = cl con f = f**.

4.2 Duality Correspondences

Theorem 4.11. Let f : R^n → R̄ be proper, lower semicontinuous, and convex. Then:
(i) ∂f* = (∂f)^{−1};
(ii) v ∈ ∂f(x) ⟺ f(x) + f*(v) = ⟨v, x⟩ ⟺ x ∈ ∂f*(v);
(iii)

    ∂f(x) = argmax_v {⟨v, x⟩ − f*(v)},    (4.21)
    ∂f*(v) = argmax_x {⟨v, x⟩ − f(x)}.    (4.22)

Proof. (i) follows from (ii). For (ii),

    f(x) + f*(v) = ⟨v, x⟩    (4.24)
    ⟺ f*(v) = ⟨v, x⟩ − f(x)    (4.25)
    ⟺ x ∈ argmax_{x'} {⟨v, x'⟩ − f(x')}    (4.26)
    ⟺ ...    (4.27)

[Finish off proof.]

Proposition 4.12. Let f : R^n → R̄ be proper, lower semicontinuous, and convex. Then

    (f(·) − ⟨a, ·⟩)* = f*(· + a),    (4.28)
    (f(· + b))* = f*(·) − ⟨·, b⟩,    (4.29)
    (f(·) + c)* = f*(·) − c,    (4.30)
    (λ f(·))* = λ f*(·/λ),  λ > 0,    (4.31)
    (λ f(·/λ))* = λ f*(·).    (4.32)

Proof. Exercise.

Proposition 4.13. Let f_i : R^n → R̄, i = 1, ..., m, be proper and f(x_1, ..., x_m) = Σ_i f_i(x_i). Then f*(v_1, ..., v_m) = Σ_i f_i*(v_i).

Definition 4.14. For any set S ⊆ R^n, define the support function

    G_S(v) = sup_{x∈S} ⟨v, x⟩ = (δ_S)*(v).    (4.33)

Definition 4.15. A function h : R^n → R̄ is positively homogeneous if 0 ∈ dom h and h(λx) = λh(x) for all λ > 0, x ∈ R^n.

Proposition 4.16. The set of positively homogeneous proper lower semicontinuous convex functions and the set of closed convex nonempty sets are in one-to-one correspondence through the Legendre-Fenchel transform:

    δ_C ↔ G_C    (4.34)

and

    x ∈ ∂G_C(v) ⟺ x ∈ C and G_C(v) = ⟨v, x⟩    (4.35)
               ⟺ v ∈ N_C(x) = ∂δ_C(x).    (4.36)

The set of closed convex cones is in one-to-one correspondence with itself:

    δ_K ↔ δ_{K°},    (4.37)
    K° = {v ∈ R^n | ⟨v, x⟩ ≤ 0  ∀x ∈ K},    (4.38)

and

    x ∈ N_{K°}(v) ⟺ x ∈ K, v ∈ K°, ⟨x, v⟩ = 0    (4.39)
                 ⟺ v ∈ N_K(x).    (4.40)
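As a numerical sketch of Definition 4.8 and Theorem 4.10 (iii), not from the notes: approximating f* and f** on a grid for f(x) = x²/2, whose conjugate is v²/2 and whose biconjugate recovers f. The grid and test points are illustrative assumptions.

    import numpy as np

    xs = np.linspace(-10, 10, 4001)                 # the sup is taken over this grid only

    def conjugate(values, grid, v):
        """f*(v) = sup_x <v, x> - f(x), approximated over the grid."""
        return np.max(v * grid - values)

    f = 0.5 * xs**2
    for v in (-2.0, 0.0, 1.5):
        print(v, conjugate(f, xs, v), 0.5 * v**2)   # grid value vs exact conjugate v^2/2

    vs = np.linspace(-10, 10, 4001)
    f_star = np.array([conjugate(f, xs, v) for v in vs])
    print(conjugate(f_star, vs, 1.0), 0.5 * 1.0**2) # f**(1) = f(1) for proper lsc convex f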

5 Duality in Optimization

Definition 5.1. For f : R^n × R^m → R̄ proper, lower semicontinuous, and convex, we define the primal and dual problems

    inf_{x∈R^n} φ(x),  φ(x) = f(x, 0),    (5.1)
    sup_{y∈R^m} ψ(y),  ψ(y) = −f*(0, y),    (5.2)

and the inf-projections

    p(u) = inf_{x∈R^n} f(x, u),    (5.3)
    q(v) = inf_{y∈R^m} f*(v, y).    (5.4)

f is the perturbation function for φ, and p is the associated inf-projection.

As an example, consider the problem

    inf_x ½‖x − z‖² + δ_{{0}}(Ax − b) = inf_x φ(x)    (5.5)

and the perturbed problem

    f(x, u) = ½‖x − z‖² + δ_{{0}}(Ax − b + u).    (5.6)

Proposition 5.2. Assume f satisfies the assumptions in Definition 5.1. Then
(i) φ and −ψ are convex and lower semicontinuous;
(ii) p, q are convex;

(iii) p(0) = inf_x φ(x);
(iv) p**(0) = sup_y ψ(y);
(v) inf_x φ(x) < ∞ ⟺ 0 ∈ dom p;
(vi) sup_y ψ(y) > −∞ ⟺ 0 ∈ dom q.

Proof.
(i) φ = f(·, 0) is clearly convex and lsc. For ψ: f* is lower semicontinuous and convex, which implies −ψ = f*(0, ·) is lower semicontinuous and convex.
(ii) Look at the strict epigraph of p:

    E = {(u, α) ∈ R^m × R | p(u) < α}    (5.7)
      = {(u, α) ∈ R^m × R | ∃x : f(x, u) < α}    (5.8)
      = A {(x, u, α) ∈ R^n × R^m × R | f(x, u) < α},    (5.9)
      where A(x, u, α) = (u, α),    (5.10)

i.e. the image of a convex set under a linear map. As E is convex, p is convex. Similarly for q.
(iii), (iv) For p(0) the claim holds by definition. For p**(0),

    p*(y) = sup_u {⟨y, u⟩ − p(u)}    (5.11)
          = sup_{u,x} {⟨y, u⟩ − f(x, u)}    (5.12)
          = f*(0, y),    (5.13)

and

    p**(0) = sup_y {⟨0, y⟩ − p*(y)}    (5.14)
           = sup_y −f*(0, y)    (5.15)
           = sup_y ψ(y).    (5.16)

(v) By definition, 0 ∈ dom p ⟺ p(0) = inf_x φ(x) < ∞. [Complete proof.]

Theorem 5.3. Let f be as in Definition 5.1. Then weak duality holds:

    p(0) = inf_x φ(x) ≥ sup_y ψ(y) = p**(0),    (5.17)

and under certain conditions the inf and sup are equal and finite (strong duality): p(0) ∈ R and p is lower semicontinuous at 0 if and only if inf_x φ(x) = sup_y ψ(y) ∈ R.

Definition 5.4.

    inf φ − sup ψ    (5.18)

is the duality gap.

Proof. (Weak duality) p ≥ cl p ≥ p**, hence p(0) ≥ p**(0) = sup ψ.
(Strong duality) cl p is lower semicontinuous and convex; if p(0) ∈ R and p is lsc at 0, then cl p is proper and

    sup ψ = p**(0) = (cl p)**(0) = cl p(0) = p(0) = inf φ.

Proposition 5.5. [Fill in notes from lecture.]
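A numerical sanity check of (5.5)-(5.6) and Theorem 5.3, not from the notes: for the equality-constrained least-squares problem both the primal and the dual optimum have closed forms and coincide. The expression used for the dual value below is my own derivation under this convention, an assumption rather than a formula stated in the lecture.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 6)); b = rng.normal(size=3); z = rng.normal(size=6)

    # primal: min (1/2)||x - z||^2  s.t. Ax = b  (projection of z onto the affine set)
    y_opt = np.linalg.solve(A @ A.T, A @ z - b)       # dual variable maximizing psi
    x_opt = z - A.T @ y_opt                           # primal minimizer
    primal = 0.5 * np.sum((x_opt - z) ** 2)

    # dual: psi(y) = -<y, b> + <A^T y, z> - (1/2)||A^T y||^2
    dual = -y_opt @ b + (A.T @ y_opt) @ z - 0.5 * np.sum((A.T @ y_opt) ** 2)

    print(primal, dual, np.allclose(A @ x_opt, b))    # equal values and feasible x: zero gap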

6 First-order Methods

The idea is to do gradient descent on the objective function f with step size τ_k.

Definition 6.1. Let f : R^n → R̄. Then
(i) the forward (explicit) step is

    F_{τ_k f}(x^k) = x^k − τ_k ∇f(x^k) = (I − τ_k ∇f)(x^k),    (6.1)

(ii) the backward (implicit) step is

    B_{τ_k f}(x^k) = (I + τ_k ∂f)^{−1}(x^k),    (6.2)

so ...    (6.3)

[Fill in.]

Proposition 6.2. If f : R^n → R̄ is proper, lower semicontinuous, and convex, then for τ_k > 0,

    B_{τ_k f}(x) = argmin_y { (1/(2τ_k)) ‖y − x‖²_2 + f(y) }.    (6.4)

Proof.

    y ∈ argmin_y {...} ⟺ 0 ∈ (1/τ_k)(y − x) + ∂f(y)    (6.5)
                       ⟺ x ∈ y + τ_k ∂f(y) ⟺ y = (I + τ_k ∂f)^{−1}(x).    (6.6)

[Missed lecture material.]
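A minimal sketch of how the forward step (6.1) and the backward step (6.2)/(6.4) combine in practice, not from the notes: forward-backward splitting (ISTA) for an l1-regularized least-squares problem. Soft-thresholding is the backward step for f = λ‖·‖_1; the problem data and step-size rule are illustrative.

    import numpy as np

    def prox_l1(x, t):
        """Backward step (proximal mapping) of t*||.||_1: soft-thresholding."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def forward_backward(A, b, lam, steps=500):
        """Minimize (1/2)||Ax - b||^2 + lam*||x||_1 by forward-backward splitting."""
        tau = 1.0 / np.linalg.norm(A, 2) ** 2          # step size <= 1/L with L = ||A||^2
        x = np.zeros(A.shape[1])
        for _ in range(steps):
            grad = A.T @ (A @ x - b)                   # forward step on the smooth part
            x = prox_l1(x - tau * grad, tau * lam)     # backward step on the l1 part
        return x

    rng = np.random.default_rng(0)
    A = rng.normal(size=(30, 10)); b = rng.normal(size=30)
    print(forward_backward(A, b, lam=1.0))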

7 Interior Point Methods

Theorem 7.1 (Problem). Consider the problem

    inf_x ⟨c, x⟩ + δ_K(Ax − b)    (7.1)

with K a closed convex cone, i.e. the constraint Ax ⪰_K b.

Notation. The idea is to replace δ_K by a smooth approximation F, with F(x) → ∞ as x → ∂K, and solve inf ⟨c, x⟩ + (1/t) F(x), equivalently minimize t⟨c, x⟩ + F(x).

Proposition 7.2. Let F be the canonical barrier for K. Then F is smooth on dom F = int K and strictly convex, with

    F(tx) = F(x) − Θ_F ln t,    (7.2)

and
(i) −∇F(x) ∈ dom F;
(ii) ⟨∇F(x), x⟩ = −Θ_F;
(iii) ∇F(−∇F(x)) = −x;
(iv) ∇F(tx) = (1/t) ∇F(x).

For the problem

    inf_x ⟨c, x⟩    (7.3)

such that Ax ⪰_K b, with K a closed, convex, self-dual cone, we have the primal-dual pair

    inf_x ⟨c, x⟩ + δ_K(Ax − b)    (7.4)
    sup_y ⟨b, y⟩ − δ_K(y) − δ_{{0}}(Aᵀy − c)    (7.5)
    = sup_y ⟨b, y⟩  s.t. y ⪰_K 0, Aᵀy = c    (7.6)

(using self-duality K* = K), and the corresponding barrier problems

    inf_x ⟨c, x⟩ + F(Ax − b)    (7.7)
    sup_y ⟨b, y⟩ − F(y) − δ_{Aᵀy=c}(y).    (7.8)

[More conditions on F, etc.]

Proposition 7.3. For t > 0, define

    x(t) = argmin_x {t⟨c, x⟩ + F(Ax − b)},    (7.9)
    y(t) = argmin_y {−t⟨b, y⟩ + F(y) + δ_{Aᵀy=c}(y)},    (7.10)
    z(t) = (x(t), y(t)).    (7.11)

These central paths exist, are unique, and

    (x, y) = (x(t), y(t)) ⟺ Aᵀy = c and ty + ∇F(Ax − b) = 0.    (7.12)

Proof. The first part follows by definition. For (7.12), the primal optimality condition is 0 = tc + Aᵀ∇F(Ax − b). The dual optimality condition is 0 ∈ −tb + ∇F(y) + N_{{z | Aᵀz = c}}(y), which is

    Aᵀy = c,  0 ∈ −tb + ∇F(y) + range A.    (7.13)

Assume x satisfies the primal optimality condition and define y by ty = −∇F(Ax − b). The dual condition to verify is

    0 ∈ −tb + ∇F(y) + range A,  Aᵀy = c.    (7.14)

From the primal condition,

    tc + Aᵀ∇F(Ax − b) = 0 ⟹ tc + Aᵀ(−ty) = 0    (7.15)
    ⟹ Aᵀy = c.    (7.16)

[A few more steps...]

Proposition 7.4. If x, y are feasible, then

    φ(x) − ψ(y) = ⟨y, Ax − b⟩.    (7.17)

Moreover, for (x(t), y(t)) on the central path,

    φ(x(t)) − ψ(y(t)) = Θ_F / t.    (7.18)

Proof.

    φ(x) − ψ(y) = ⟨c, x⟩ − ⟨b, y⟩    (7.19)
               = ⟨Aᵀy, x⟩ − ⟨b, y⟩    (7.20)
               = ⟨Ax − b, y⟩.    (7.21)

For the second part,

    φ(x(t)) − ψ(y(t)) = ⟨y(t), Ax(t) − b⟩    (7.22)
                      = ⟨−(1/t)∇F(Ax(t) − b), Ax(t) − b⟩    (7.23)
                      = Θ_F / t,    (7.24)

using Proposition 7.2 (ii).

[Fill in from lecture notes.]
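A compact sketch of the path-following idea for K = R^n_+ (a linear program), not from the notes: Newton steps on t⟨c, x⟩ + F(Ax − b) with the barrier F(s) = −Σ_i log s_i, then a geometric increase of t, so that the duality gap shrinks like Θ_F / t as in Proposition 7.4. A strictly feasible starting point is assumed, and the toy problem and all parameters are illustrative.

    import numpy as np

    def barrier_method(c, A, b, x0, t0=1.0, mu=4.0, outer=12, newton=30):
        """Minimize <c, x> s.t. Ax >= b via the log-barrier F(s) = -sum(log s), s = Ax - b."""
        x, t = x0.astype(float), t0
        for _ in range(outer):
            for _ in range(newton):                           # Newton's method on t<c,x> + F(Ax-b)
                s = A @ x - b
                grad = t * c - A.T @ (1.0 / s)
                hess = A.T @ np.diag(1.0 / s**2) @ A
                dx = np.linalg.solve(hess, -grad)
                step = 1.0
                while np.any(A @ (x + step * dx) - b <= 0):   # stay strictly feasible
                    step *= 0.5
                x = x + step * dx
            t *= mu                                           # tighten the barrier
        return x

    # toy LP: min x1 + x2  s.t. x >= 0, x1 + x2 >= 1  (optimal value 1)
    c = np.array([1.0, 1.0])
    A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    b = np.array([0.0, 0.0, 1.0])
    print(barrier_method(c, A, b, x0=np.array([2.0, 2.0])))   # approaches (0.5, 0.5)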

8 Support Vector Machines

8.1 Machine Learning

Given a sample set X = {x_1, ..., x_n}, x_i ∈ F ⊆ R^m, with F our feature space, and C a set of classes, we seek a map

    h_θ : F → C,  θ ∈ Θ,    (8.1)

with Θ our parameter space. The task is to find θ such that h_θ is the best mapping from X into C.

(i) Unsupervised learning (only X is known, usually C is not known):
- clustering,
- outlier detection,
- mapping to a lower-dimensional subspace.
(ii) Supervised learning (C known). We have training data T = {(x_1, y_1), ..., (x_n, y_n)} with y_i ∈ C, and we seek

    θ = argmin_{θ∈Θ} f(θ, T) = Σ_{i=1}^n g(y_i, h_θ(x_i)) + R(θ).    (8.2)

8.2 Linear Classifiers

The idea is to consider

    h_θ : R^m → {−1, 1},  θ = (w, b),    (8.3)
    h_θ(x) = sign(⟨w, x⟩ + b).    (8.4)

We want maximum-margin classifiers, i.e.

    max_{w,b, w≠0} min_i y_i (⟨w, x_i⟩ + b) / ‖w‖,    (8.5)

which can be rewritten as

    max_{w,b} c    (8.6)

such that

    c ≤ y_i (⟨w, x_i⟩ + b) / ‖w‖  for all i,    (8.7)-(8.9)

or just

    min_{w,b} ½‖w‖²    (8.10)

such that

    1 ≤ y_i (⟨w, x_i⟩ + b)  for all i.    (8.11)

Definition 8.1. In standard form,

    inf_{w,b} k(w, b) + h(M(w, b) − e),    (8.12)

where M is the matrix with rows y_i (x_iᵀ, 1) and e = (1, ..., 1)ᵀ, so that M(w, b) − e ≥ 0 encodes the constraints (8.11).

The conjugates are

    k(w, b) = ½‖w‖²,    (8.13)
    k*(u, c) = ½‖u‖² + δ_{{0}}(c),    (8.14)
    h(z) = δ_{≥0}(z),  h*(v) = δ_{≤0}(v).    (8.15)

In saddle-point form,

    inf_{w,b} sup_z ½‖w‖²_2 + ⟨M(w, b) − e, z⟩ − δ_{≥0}(z).    (8.16)

The dual problem is

    sup_z ⟨e, z⟩ − δ_{≥0}(z) − ½‖Σ_{i=1}^n y_i x_i z_i‖²_2 − δ_{{0}}(⟨y, z⟩),    (8.17)

i.e.

    inf_z ½‖Σ_i y_i x_i z_i‖²_2 − ⟨e, z⟩    (8.18)

such that z ≥ 0, ⟨y, z⟩ = 0. The optimality conditions are ...    (8.19)

[Fill in the rest of the optimality conditions and the explanation of support vectors.]

We use the fact that if k(x) + h(Ax + b) is our primal, then the dual is sup_y ⟨b, y⟩ − k*(−Aᵀy) − h*(y).

8.3 Kernel Trick

The idea is to embed our features into a higher-dimensional space via a mapping φ. The decision function then takes the form

    h_θ(x) = sign(⟨φ(x), w⟩ + b).    (8.20)
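An illustrative sketch of the dual problem (8.17)-(8.18), not from the notes: projected gradient ascent on z ≥ 0. To keep it short I assume a classifier without the bias b, which removes the constraint ⟨y, z⟩ = 0; the data are toy examples. Replacing the Gram matrix K below by a kernel matrix K_ij = κ(x_i, x_j) gives the kernelized variant of Section 8.3.

    import numpy as np

    def svm_dual_no_bias(X, y, steps=2000, lr=0.01):
        """Hard-margin SVM dual without bias: max <e, z> - (1/2)||sum_i z_i y_i x_i||^2, z >= 0."""
        K = (y[:, None] * X) @ (y[:, None] * X).T     # Gram matrix of the y_i x_i
        z = np.zeros(X.shape[0])
        for _ in range(steps):
            grad = np.ones_like(z) - K @ z            # gradient of the dual objective
            z = np.maximum(z + lr * grad, 0.0)        # projected ascent step onto z >= 0
        w = X.T @ (z * y)                             # recover the primal weights
        return w, z

    X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w, z = svm_dual_no_bias(X, y)
    print(w, np.sign(X @ w))                          # predicted signs match y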

9 Total Variation

9.1 Meyer's G-norm

Idea: a regulariser that favours textured regions,

    ‖u‖_G = inf{‖v‖_∞ | div v = u, v ∈ L^∞(R^d, R^d)}.    (9.1)

Discretized, with gradient operator G (so that Gᵀ plays the role of −div) and

    C = {v | |v(x)|_2 ≤ 1 for all x},    (9.3)

we have

    ‖u‖_G = inf_v {‖v‖_{∞,2} + δ_{Gᵀv=u}(v)},    (9.2)

where ‖v‖_{∞,2} = max_x |v(x)|_2 is the gauge of C. Dualizing the constraint,

    ‖u‖_G = inf_v sup_w ‖v‖_{∞,2} + ⟨w, u − Gᵀv⟩    (9.4)
          = sup_w ⟨w, u⟩ − sup_v {⟨Gw, v⟩ − ‖v‖_{∞,2}}    (9.5)
          = sup_w ⟨w, u⟩ − δ_B(Gw),  B = {p | Σ_x |p(x)|_2 ≤ 1}    (9.6)
          = sup_w {⟨w, u⟩ − δ_{TV(·)≤1}(w)}    (9.7)
          = sup{⟨w, u⟩ | TV(w) ≤ 1},    (9.8)

and so ‖·‖_G is dual to TV:

    ‖·‖_G = (δ_{B_TV})*,    (9.9)

where B_TV = {u | TV(u) ≤ 1}. Similarly,

    sup{⟨u, w⟩ | ‖w‖_G ≤ 1}    (9.10)
    = sup{⟨u, w⟩ | ∃v : w = Gᵀv, ‖v‖_∞ ≤ 1}    (9.11)
    = sup{⟨u, Gᵀv⟩ | ‖v‖_∞ ≤ 1}    (9.12)
    = TV(u).    (9.13)

Why is ‖·‖_G good at separating noise? Consider

    argmin_u {½‖u − g‖²_2  s.t. ‖u‖_G ≤ λ}    (9.14)
    = Π_{λB_G}(g),  and  g − Π_{λB_G}(g) = prox_{λTV}(g),    (9.15)

so the constrained projection captures the texture/noise part of g, while the residual is the TV-regularized (cartoon) part.

9.2 Non-local Regularization

In real-world images, a large gradient ∇u does not always mean noise.

Definition 9.1. Let Ω = {1, ..., n}. Given u ∈ R^n, x, y ∈ Ω, and weights w : Ω² → R_{≥0}, set

    ∇_y u(x) = (u(y) − u(x)) w(x, y),    (9.16)
    ∇_w u(x) = (∇_y u(x))_{y∈Ω}.    (9.17)

A suitable divergence,

    div_w v(x) = Σ_{y∈Ω} (v(x, y) − v(y, x)) w(x, y),

is adjoint to ∇_w with respect to the Euclidean inner products:

    ⟨div_w v, u⟩ = −⟨v, ∇_w u⟩.    (9.18)

Non-local regularisers are of the form

    J(u) = Σ_{x∈Ω} g(∇_w u(x)),    (9.19)

with

    TV^g_NL(u) = Σ_{x∈Ω} |∇_w u(x)|_2,    (9.20)
    TV^d_NL(u) = Σ_{x∈Ω} Σ_{y∈Ω} |∇_y u(x)|.    (9.21)

This reduces to classical TV if the weights are chosen constant (w(x, y) = 1/h).

9.2.1 How to choose w(x, y)?

(i) Large if the neighborhoods of x and y are similar with respect to a distance metric.
(ii) Sparse, since otherwise we have n² terms in the regulariser (n is the number of pixels).

A possible choice:

    d_u(x, y) = ∫ K_G(t) (u(y + t) − u(x + t))² dt,    (9.22)
    A(x) = argmin_{A ⊆ S(x), |A| = k} Σ_{y∈A} d_u(x, y),    (9.23)
    w(x, y) = 1 if y ∈ A(x), x ∈ A(y); 0 otherwise,    (9.24)

with K_G a Gaussian kernel of variance G².
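A rough numpy sketch of the patch-distance weights (9.22)-(9.24) for a 1-D signal, not from the notes: Gaussian-weighted patch distances, k most similar neighbours inside a search window, and a symmetrized 0/1 weight matrix. Window, patch size, and the symmetrization rule are my own illustrative choices.

    import numpy as np

    def nonlocal_weights(u, patch=3, window=10, k=4):
        """0/1 non-local weights for a 1-D signal from Gaussian-weighted patch distances."""
        n = len(u)
        pad = np.pad(u, patch, mode='edge')
        offsets = np.arange(-patch, patch + 1)
        kernel = np.exp(-offsets**2 / (2.0 * patch**2))          # K_G, Gaussian in the offset t

        def d(x, y):                                             # d_u(x, y), discretized (9.22)
            return np.sum(kernel * (pad[y + patch + offsets] - pad[x + patch + offsets])**2)

        A = []
        for x in range(n):
            cand = [y for y in range(max(0, x - window), min(n, x + window + 1)) if y != x]
            cand.sort(key=lambda y: d(x, y))
            A.append(set(cand[:k]))                              # k most similar neighbours (9.23)

        w = np.zeros((n, n))
        for x in range(n):
            for y in A[x]:
                w[x, y] = w[y, x] = 1.0                          # symmetrized weights (9.24)
        return w

    u = np.concatenate([np.zeros(20), np.ones(20)]) + 0.1 * np.random.default_rng(0).normal(size=40)
    print(nonlocal_weights(u).sum(axis=1)[:10])                  # neighbour count per pixel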

10 Relaxation

Example 10.1 (Segmentation). Find C ⊆ Ω that fits the given data and prior knowledge about typical shapes.

Theorem 10.2 (Chan-Vese). Given g : Ω → R, Ω ⊆ R^d, consider

    inf_{C ⊆ Ω, c_1, c_2 ∈ R} f_CV(C, c_1, c_2),    (10.1)
    f_CV(C, c_1, c_2) = ∫_C (g − c_1)² dx + ∫_{Ω\C} (g − c_2)² dx + λ H^{d−1}(∂C).    (10.2)

[Confirm form of the regulariser.]
Thus c_1 is fitted to the shape, c_2 to the outside of the shape, and C is the region of the shape.

10.1 Mumford-Shah model

    inf_{K ⊆ Ω closed, u ∈ C¹(Ω\K)} f(K, u),    (10.3)
    f(K, u) = ∫_Ω (g − u)² dx + λ ∫_{Ω\K} ‖∇u‖²_2 dx + ν H^{d−1}(K).    (10.4)

Chan-Vese is a special case of this, obtained by forcing u = c_1 1_C + c_2 (1 − 1_C).

The Chan-Vese problem

    inf_{C ⊆ Ω, c_1, c_2 ∈ R} ∫_C (g − c_1)² dx + ∫_{Ω\C} (g − c_2)² dx + λ H^{d−1}(∂C)    (10.5)

can be rewritten as

    inf_{u ∈ BV(Ω,{0,1}), c_1, c_2 ∈ R} ∫_Ω u (g − c_1)² + (1 − u)(g − c_2)² dx + λ TV(u).    (10.6)

Fix c_1, c_2. Then we obtain

    inf_{u ∈ BV(Ω,{0,1})} ∫_Ω u ((g − c_1)² − (g − c_2)²) dx + λ TV(u),    (C)

where we write S := (g − c_1)² − (g − c_2)². Replacing {0, 1} by its convex hull, we obtain

    inf_{u ∈ BV(Ω,[0,1])} ∫_Ω u S dx + λ TV(u).    (R)

This is a convex relaxation:
(i) replace the non-convex energy by a convex approximation;
(ii) replace non-convex f by con f.

Can the minima of (R) have values not in {0, 1}? Yes: assume u_1, u_2 ∈ BV(Ω, {0, 1}) both minimize (R) and (C), with u_1 ≠ u_2. Then

    ½u_1 + ½u_2 ∉ BV(Ω, {0, 1})    (10.7)

is also a minimizer of (R).

Proposition 10.3. Let c_1, c_2 be fixed. If u minimizes (R) and u ∈ BV(Ω, {0, 1}), then u minimizes (C), and (with u = 1_C) C minimizes f_CV(·, c_1, c_2).

Proof. Let C ⊆ Ω. Then 1_C ∈ BV(Ω, [0, 1]), so

    f(u) ≤ f(1_C)    (10.8)

for all C ⊆ Ω, i.e. u minimizes (C).

Definition 10.4. Let L = BV(Ω, [0, 1]). For u ∈ L and α ∈ [0, 1],

    u_α(x) = 1_{{u > α}}(x) = 1 if u(x) > α, 0 if u(x) ≤ α.    (10.9)

Then f : L → R̄ satisfies the generalized coarea condition (gcc) if and only if

    f(u) = ∫_0^1 f(u_α) dα.    (10.10)

[Upper bound?]

Proposition 10.5. Let S ∈ L^∞(Ω), Ω bounded. Then f as in (R) satisfies the gcc.

Proof. If f_1, f_2 satisfy the gcc, then f_1 + f_2 satisfies the gcc. λTV satisfies the gcc by the coarea formula for the total variation. For the data term we have

    ∫_Ω u(x) S(x) dx = ∫_Ω (∫_0^1 1_{{u(x) > α}} S(x) dα) dx    (10.11)
                     = ∫_0^1 ∫_Ω 1_{{u(x) > α}} S(x) dx dα.    (10.12)

Theorem 10.6. Assume f : BV(Ω, [0, 1]) → R̄ satisfies the gcc and let

    u* ∈ argmin_{u∈BV(Ω,[0,1])} f(u).    (10.13)

Then for almost every α ∈ [0, 1], u*_α is a minimizer of f over BV(Ω, {0, 1}),

    u*_α ∈ argmin_{u∈BV(Ω,{0,1})} f(u).    (10.14)

Proof. Let

    S = {α | f(u*_α) ≠ f(u*)}.    (10.15)

If L(S) = 0, then for a.e. α,

    f(u*_α) = f(u*) = inf_{u∈BV(Ω,[0,1])} f(u).    (10.16)

If L(S) > 0, then there exists ε > 0 such that

    L(S_ε) > 0,  S_ε = {α | f(u*_α) ≥ f(u*) + ε},    (10.17)

which implies

    f(u*) = ∫_0^1 f(u*_α) dα = ∫_{[0,1]\S_ε} f(u*_α) dα + ∫_{S_ε} f(u*_α) dα    (10.18)
          ≥ ∫_0^1 f(u*) dα + ε L(S_ε)    (10.19)
          = f(u*) + ε L(S_ε) > f(u*),    (10.20)

which contradicts the gcc.

Remark 10.7 (Discretization). Consider

    min_{u∈R^n} f(u),  f(u) = ⟨s, u⟩ + λ‖Gu‖_2,  such that 0 ≤ u ≤ 1,    (10.21)

versus

    min_{u∈R^n} f(u),  f(u) = ⟨s, u⟩ + λ‖Gu‖_2,  such that u ∈ {0, 1}^n.    (10.22)

If u solves the relaxation (10.21), does u_α solve the combinatorial problem (10.22)? Only if the discretized energy f satisfies the gcc.
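A small numerical illustration of the relaxation (R) and the thresholding of Theorem 10.6 in a discrete 1-D setting, not from the notes: projected subgradient descent on ⟨u, S⟩ + λ TV(u) over u ∈ [0, 1]^n, followed by thresholding at α = 0.5. The data, parameters, and solver are my own illustrative choices; the thresholded binary solution should attain (nearly) the same energy as the relaxed one.

    import numpy as np

    def energy(u, S, lam):
        """Discrete relaxed Chan-Vese energy <u, S> + lam * TV(u) (1-D, forward differences)."""
        return np.dot(u, S) + lam * np.sum(np.abs(np.diff(u)))

    def solve_relaxed(S, lam, steps=5000, lr=0.01):
        """Projected subgradient descent for the relaxation (R): u constrained to [0, 1]^n."""
        u = np.full(len(S), 0.5)
        for _ in range(steps):
            d = np.sign(np.diff(u))                  # subgradient of the TV term
            g = S + lam * (np.concatenate([[0.0], d]) - np.concatenate([d, [0.0]]))
            u = np.clip(u - lr * g, 0.0, 1.0)        # project back onto [0, 1]^n
        return u

    # S = (g - c1)^2 - (g - c2)^2 for a noisy step signal, with c1 = 1, c2 = 0
    rng = np.random.default_rng(0)
    g = np.concatenate([np.zeros(30), np.ones(30)]) + 0.3 * rng.normal(size=60)
    S = (g - 1.0) ** 2 - (g - 0.0) ** 2
    u_relaxed = solve_relaxed(S, lam=0.5)
    u_binary = (u_relaxed > 0.5).astype(float)       # threshold as in Theorem 10.6
    print(energy(u_relaxed, S, 0.5), energy(u_binary, S, 0.5))   # nearly equal energies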

11 Bibliography