Andrew Tulloch

Convex Optimization

Trinity College, The University of Cambridge
Contents

1 Introduction 5
  1.1 Setup 5
  1.2 Convexity 6
2 Convexity 9
3 Cones and Generalized Inequalities 15
4 Conjugate Functions 21
  4.1 The Legendre-Fenchel Transform 21
  4.2 Duality Correspondences 24
5 Duality in Optimization 27
6 First-order Methods 31
7 Interior Point Methods 33
8 Support Vector Machines 37
  8.1 Machine Learning 37
  8.2 Linear Classifiers 38
  8.3 Kernel Trick 39
9 Total Variation 41
  9.1 Meyer's G-norm 41
  9.2 Non-local Regularization 42
10 Relaxation 45
Bibliography 49
1 Introduction

1.1 Setup

A setup consists of an energy function f, an input image I, and a candidate reconstructed image u; we seek the minimizer

    min_{u ∈ U} f(u, I)    (1.1)

where f is typically of the form

    f(u, I) = l(u, I) + r(u)    (1.2)

with l the cost (data) term and r the regulariser.

Example 1.1 (Rudin-Osher-Fatemi).

    min_u (1/2) ∫_Ω (u − I)² dx + λ ∫_Ω |∇u| dx    (1.3)

Will reduce contrast, but will not introduce new jumps.

Example 1.2 (Total variation (TV) L1).

    min_u (1/2) ∫_Ω |u − I| dx + λ ∫_Ω |∇u| dx    (1.4)

In general does not cause contrast loss.
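As a concrete illustration of the ROF model (1.3), here is a minimal one-dimensional sketch using gradient descent on a smoothed TV term; the signal, smoothing parameter eps, step size, and iteration count are illustrative choices of mine, not from the notes:

```python
import numpy as np

def rof_denoise_1d(I, lam=0.5, eps=0.1, tau=0.05, iters=500):
    """Minimize 0.5*||u - I||^2 + lam * sum_i sqrt((u_{i+1}-u_i)^2 + eps^2)
    by gradient descent (eps smooths the non-differentiable absolute value)."""
    u = I.copy()
    for _ in range(iters):
        d = np.diff(u)                      # forward differences
        w = d / np.sqrt(d**2 + eps**2)      # derivative of smoothed |d|
        # gradient of the smoothed TV term (negative discrete divergence of w)
        grad_tv = np.concatenate(([-w[0]], w[:-1] - w[1:], [w[-1]]))
        u -= tau * ((u - I) + lam * grad_tv)
    return u

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(50), np.ones(50)])   # a single jump
noisy = clean + 0.2 * rng.standard_normal(100)
denoised = rof_denoise_1d(noisy)
```

The denoised signal keeps the jump (TV regularisation does not smooth edges away) while the noise is reduced, at the price of a slight contrast loss, as remarked above.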
Can show that if I = 1_{B_r(0)}, then the ROF minimizer is a scaled copy of I:

    u = (1 − 2λ/r) I  if r ≥ 2λ,   u = 0  if r < 2λ.    (1.5)

Example 1.3 (Inpainting).

    min_u (1/2) ∫_{Ω\A} (u − I)² dx + λ ∫_Ω |∇u| dx    (1.6)

1.2 Convexity

Theorem 1.4. If a function f is convex, every local minimizer is a global minimizer.

Theorem 1.5. If a function f is convex, it can (very often) be efficiently optimized (polynomial time in the number of bits of the input).

In computer vision, problems are:

- usually large-scale,
- usually non-differentiable.

Definition 1.6. f is lower semicontinuous (lsc) if

    f(x̄) ≤ liminf_{x→x̄} f(x) := min{α ∈ R̄ | ∃ x^k → x̄ with f(x^k) → α}.    (1.7)

Theorem 1.7. Let f : R^n → R̄. Then the following are equivalent:

(i) f is lower semicontinuous,
(ii) epi f is closed in R^n × R,
(iii) the level sets lev_{≤α} f = {x ∈ R^n | f(x) ≤ α} are closed for all α ∈ R.

Proof. ((i) ⇒ (ii)) Take (x^k, α^k) ∈ epi f with (x^k, α^k) → (x, α) ∈ R^n × R. Then by (1.7)

    f(x) ≤ liminf_k f(x^k) ≤ lim_k α^k = α,

so (x, α) ∈ epi f.

((ii) ⇒ (iii)) epi f closed implies epi f ∩ (R^n × {α}) closed for all α ∈ R, and this set equals

    {x ∈ R^n | f(x) ≤ α} × {α},    (1.8)
so lev_{≤α} f is closed for all α ∈ R. If α = +∞, lev_{≤α} f = R^n. If α = −∞, lev_{≤−∞} f = ∩_{k∈N} lev_{≤−k} f, which is an intersection of closed sets.

((iii) ⇒ (i)) For x^j → x̄, take a subsequence x^k → x̄ with

    f(x^k) → liminf_{x→x̄} f(x) = c.    (1.9)

If c ∈ R, then for every ε > 0 there is K(ε) = K such that f(x^k) ≤ c + ε for all k > K. Then x^k ∈ lev_{≤c+ε} f, so x̄ ∈ lev_{≤c+ε} f by closedness. As ε was arbitrary, x̄ ∈ lev_{≤c} f, i.e. f(x̄) ≤ c = liminf_{x→x̄} f(x). If c = +∞, we are done, as f(x̄) ≤ +∞ = liminf. If c = −∞, the same argument applies with f(x^k) ≤ −k, k ∈ N. □

Example 1.8. f(x) = x is lower semicontinuous, but does not have a minimizer.

Definition 1.9. f : R^n → R̄ is level-bounded if and only if lev_{≤α} f is bounded for all α ∈ R. This is also known as coercivity.

Example 1.10. f(x) = x² is level-bounded. f(x) = 1/x is not level-bounded.

Theorem 1.11. f : R^n → R̄ is level-bounded if and only if f(x^k) → ∞ whenever ‖x^k‖ → ∞.

Proof. Exercise.

Theorem 1.12. Let f : R^n → R̄ be lower semicontinuous, level-bounded, and proper. Then inf_x f(x) ∈ (−∞, +∞) and argmin f = {x | f(x) ≤ inf_x f(x)} is nonempty and compact.

Proof.

    argmin f = {x | f(x) ≤ inf f}    (1.10)
             = {x | f(x) ≤ inf f + 1/k, k ∈ N}    (1.11)
             = ∩_{k∈N} lev_{≤ inf f + 1/k} f.    (1.12)

If inf f = −∞, just replace inf f + 1/k by α_k with α_k → −∞, e.g. α_k = −k, k ∈ N.
These level sets are bounded (by level-boundedness), closed (by f being lower semicontinuous), and non-empty (since 1/k > 0). They are nested and compact, so their intersection is non-empty: picking a point in each level set gives a bounded sequence, which has a convergent subsequence whose limit lies in every level set. Hence argmin f is non-empty and compact.

We still need to show inf f > −∞. If inf f = −∞, the argument above yields x ∈ argmin f with f(x) = −∞. Since f is proper, no such x exists. Thus inf f > −∞.

Remark 1.13. For Theorem 1.12, it suffices to have lev_{≤α} f bounded and non-empty for at least one α ∈ R.

Proposition 1.14. We have the following properties for lower semicontinuity:

(i) If f, g are lower semicontinuous, then f + g is lower semicontinuous.
(ii) If f is lower semicontinuous and λ ≥ 0, then λf is lower semicontinuous.
(iii) If f : R^n → R̄ is lower semicontinuous and g : R^m → R^n is continuous, then f ∘ g is lower semicontinuous.
2 Convexity

Definition 2.1.

(i) f : R^n → R̄ is convex if

    f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y)    (2.1)

for all x, y ∈ R^n, τ ∈ (0, 1).

(ii) A set C ⊆ R^n is convex if and only if its indicator function δ_C is convex.

(iii) f is strictly convex if and only if (2.1) holds strictly whenever x ≠ y and f(x), f(y) ∈ R.

Remark 2.2. C ⊆ R^n is convex if and only if for all x, y ∈ C the connecting line segment is contained in C.

Exercise 2.3. Show that the halfspace

    {x | a^T x + b ≤ 0}    (2.2)

is convex, and that the affine function

    f(x) = a^T x + b    (2.3)

is convex.

Definition 2.4. For x_0, ..., x_m ∈ R^n and λ_0, ..., λ_m ≥ 0 with Σ_{i=0}^m λ_i = 1, the point Σ_i λ_i x_i is a convex combination of the x_i.

Theorem 2.5. f : R^n → R̄ is convex if and only if

    f(Σ_{i=1}^n λ_i x_i) ≤ Σ_{i=1}^n λ_i f(x_i)    (2.4)

for all convex combinations.
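The inequality (2.1) can be checked numerically on sampled points. A small sketch (the test functions and grids are my own illustrative choices):

```python
import numpy as np

def violates_convexity(f, pts, taus, tol=1e-9):
    """Search for a violation of f((1-t)x + t*y) <= (1-t)f(x) + t*f(y)."""
    for x in pts:
        for y in pts:
            for t in taus:
                if f((1 - t) * x + t * y) > (1 - t) * f(x) + t * f(y) + tol:
                    return True   # found x, y, t violating the inequality
    return False

pts = np.linspace(-3.0, 3.0, 13)
taus = np.linspace(0.05, 0.95, 10)
square_ok = not violates_convexity(lambda x: x ** 2, pts, taus)  # x^2 is convex
sine_bad = violates_convexity(np.sin, pts, taus)                 # sin is not
```

A passing check on samples is of course only evidence, not a proof of convexity.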
C ⊆ R^n is convex if and only if C contains all convex combinations of its elements.

Proof. Write

    Σ_{i=1}^m λ_i x_i = λ_m x_m + (1 − λ_m) Σ_{i=1}^{m−1} (λ_i / (1 − λ_m)) x_i,    (2.5)

a convex combination of two points, and proceed by induction on m. The set version is proven by applying this to δ_C. □

Proposition 2.6. If f : R^n → R̄ is convex, then the domain of f is convex.

Proposition 2.7. f : R^n → R̄ is convex if and only if epi f is convex in R^n × R, if and only if the strict epigraph

    {(x, α) ∈ R^n × R | f(x) < α}    (2.6)

is convex in R^n × R.

Proof. epi f is convex if and only if for all τ ∈ (0, 1), x, y, α ≥ f(x), β ≥ f(y), i.e. (x, α), (y, β) ∈ epi f, we have

    f((1 − τ)x + τy) ≤ (1 − τ)α + τβ    (2.7)

⟺ for all τ ∈ (0, 1), x, y:  f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y)    (2.8)

⟺ f is convex.    (2.9)

□

Proposition 2.8. If f : R^n → R̄ is convex, then lev_{≤α} f is convex for all α ∈ R̄.

Proof. For x, y ∈ lev_{≤α} f, f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y) ≤ α, which implies lev_{≤α} f is convex. If α = +∞, then lev_{≤α} f = R^n, which is convex.

Theorem 2.9. Let f : R^n → R̄ be convex. Then
(i) argmin f is convex,
(ii) if x is a local minimizer of f, then x is a global minimizer of f,
(iii) if f is strictly convex and proper, then f has at most one global minimizer.

Proof. (i) If argmin f = ∅, it is convex. Otherwise argmin f = lev_{≤ inf f} f, which is convex by the previous proposition.

(ii) Assume x is a local minimizer and there exists y with f(y) < f(x). Then for τ ∈ (0, 1),

    f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y) < f(x).

Taking τ → 0 shows that x cannot be a local minimizer, and thus no such y can exist.

(iii) Assume x ≠ y both minimize f, which implies f(x) = f(y). Then

    f((1/2)x + (1/2)y) < (1/2)f(x) + (1/2)f(y) = f(x) = f(y),

which contradicts x, y being global minimizers. □

Proposition 2.10. To construct convex functions:

(i) Let f_i, i ∈ I, be convex; then

    f(x) = sup_{i∈I} f_i(x)    (2.10)

is convex.

(ii) Let f_i, i ∈ I, be strictly convex, I finite; then

    sup_{i∈I} f_i(x)    (2.11)

is strictly convex.

(iii) Let C_i, i ∈ I, be convex sets; then

    ∩_{i∈I} C_i    (2.12)

is convex.
(iv) Let f_k, k ∈ N, be convex; then

    limsup_{k→∞} f_k(x)    (2.13)

is convex.

Proof. Exercise.

Example 2.11.

(i) C, D convex does not imply C ∪ D convex (e.g. disjoint sets).
(ii) f : R^n → R̄ convex, C convex implies f + δ_C convex.
(iii) f(x) = |x| = max{x, −x} is convex.
(iv) f(x) = ‖x‖_p, p ≥ 1, is convex, as

    ‖x‖_p = sup_{‖y‖_q ≤ 1} ⟨x, y⟩,  1/p + 1/q = 1.    (2.14)

Theorem 2.12. Let C ⊆ R^n be open and convex, and let f : C → R be differentiable. Then the following are equivalent:

(i) f is [strictly] convex,
(ii) ⟨y − x, ∇f(y) − ∇f(x)⟩ ≥ 0 for all x, y ∈ C [with > 0 for x ≠ y],
(iii) f(x) + ⟨∇f(x), y − x⟩ ≤ f(y) for all x, y ∈ C [with strict inequality for x ≠ y],
(iv) if f is additionally twice differentiable: ∇²f(x) is positive semidefinite for all x ∈ C.

Property (ii) is monotonicity of ∇f.

Proof. Exercise (reduce to n = 1, using that f is convex on C if and only if its restriction to every line segment in C is convex). □

Remark 2.13. For strict convexity the converse of (iv) with positive definiteness does not hold: f(x) = x⁴ is strictly convex, but f''(0) = 0.

Proposition 2.14. The following hold for convex functions:

(i) If f_1, ..., f_m : R^n → R̄ are convex and λ_1, ..., λ_m ≥ 0, then

    f = Σ_i λ_i f_i    (2.15)

is convex, and strictly convex if there exists i such that λ_i > 0 and f_i is strictly convex.
(ii) f_i : R^n → R̄ convex implies f(x_1, ..., x_m) = Σ_i f_i(x_i) is convex, and strictly convex if all f_i are strictly convex.

(iii) f : R^n → R̄ convex, A ∈ R^{n×n}, b ∈ R^n implies g(x) = f(Ax + b) is convex.

Remark 2.15. The operator norm

    ‖M‖ = sup_{‖x‖ ≤ 1} ‖Mx‖

is convex in M, as a supremum of convex functions.

Proposition 2.16.

(i) C_1, ..., C_m convex implies C_1 × ⋯ × C_m convex.
(ii) C ⊆ R^n convex, A ∈ R^{m×n}, b ∈ R^m implies L(C) is convex, where L(x) = Ax + b.
(iii) C ⊆ R^m convex implies the preimage L^{−1}(C) is convex.
(iv) f, g convex implies f + g is convex.
(v) f convex, λ ≥ 0 implies λf is convex.

Definition 2.17. For S ⊆ R^n, x ∈ R^n, define the projection of x onto S as

    Π_S(y) = argmin_{x∈S} ‖x − y‖_2.    (2.16)

Proposition 2.18. If C ⊆ R^n is convex, closed, and non-empty, then Π_C is single-valued; that is, the projection exists and is unique.

Proof. Write

    Π_C(y) = argmin_x (1/2)‖x − y‖² + δ_C(x).    (2.17)

To show uniqueness: (1/2)‖x − y‖² is strictly convex, since ∇²_x (1/2)‖x − y‖² = I ≻ 0; hence the objective f is strictly convex. f is proper (C ≠ ∅), and thus f has at most one minimizer.

To show existence: f is proper and lower semicontinuous (the left part is continuous, the right part is lsc since C is closed). f is level-bounded, as ‖x − y‖² → ∞ when ‖x‖₂ → ∞ and δ_C ≥ 0. Thus argmin f ≠ ∅. □
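Projections onto two standard closed convex sets have simple closed forms, which this sketch checks against the optimality inequality ⟨y − p, x − p⟩ ≤ 0 for all x ∈ C that characterizes projections onto convex sets (set choices and sample sizes are illustrative):

```python
import numpy as np

def proj_box(y, lo, hi):
    """Projection onto the box [lo, hi]^n: clamp each coordinate."""
    return np.clip(y, lo, hi)

def proj_ball(y, r):
    """Projection onto the Euclidean ball {x : ||x|| <= r}: radial shrinkage."""
    n = np.linalg.norm(y)
    return y if n <= r else (r / n) * y

rng = np.random.default_rng(0)
y = np.array([2.0, -3.0, 0.5])
p = proj_ball(y, 1.0)

# sample points x inside the unit ball and check <y - p, x - p> <= 0
xs = rng.uniform(-1.0, 1.0, (1000, 3))
xs = xs[np.linalg.norm(xs, axis=1) <= 1.0]
inequality_holds = bool(np.all((xs - p) @ (y - p) <= 1e-9))
```

Both projections are unique here, exactly as Proposition 2.18 predicts for closed convex non-empty sets.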
Definition 2.19. Let S ⊆ R^n be arbitrary. Then

    con S = ∩_{C convex, C ⊇ S} C    (2.18)

is the convex hull of S.

Remark 2.20. con S is the smallest convex set that contains S.

Theorem 2.21. Let S ⊆ R^n. Then

    con S = {Σ_{i=0}^n λ_i x_i | x_i ∈ S, λ_i ≥ 0, Σ_{i=0}^n λ_i = 1} =: D.    (2.19)

Proof. D is convex and contains S: if x, y ∈ D, then (1 − τ)x + τy ∈ D. Thus con S ⊆ D. Conversely, con S is convex, so by Theorem 2.5 it contains all convex combinations of points in S; thus D ⊆ con S. Hence con S = D. □

Definition 2.22. For a set C ⊆ R^n, define the closure

    cl C = {x ∈ R^n | for all open neighborhoods N of x, N ∩ C ≠ ∅},    (2.20)

the interior

    int C = {x ∈ R^n | there exists an open neighborhood N of x with N ⊆ C},    (2.21)

and the boundary

    ∂C = cl C \ int C.    (2.22)

Remark 2.23.

    cl G = ∩_{S closed, S ⊇ G} S.    (2.23)
3 Cones and Generalized Inequalities

Definition 3.1. K ⊆ R^n is a cone if and only if 0 ∈ K and λx ∈ K for all x ∈ K, λ > 0.

Definition 3.2. A cone is pointed if and only if

    x_1 + ⋯ + x_n = 0, x_i ∈ K  ⟹  x_1 = ⋯ = x_n = 0.    (3.1)

Example 3.3.

(i) R^n is a cone (not pointed for n ≥ 1).
(ii) {(x_1, x_2) ∈ R² | x_2 > 0} is not a cone (it does not contain 0).
(iii) {(x_1, x_2) ∈ R² | x_2 ≥ 0} is a cone, but is not pointed.
(iv) {x ∈ R^n | x_i ≥ 0, i ∈ 1, ..., n} is a pointed cone.

Proposition 3.4. Let K ⊆ R^n be any set. Then the following are equivalent:

(i) K is a convex cone.
(ii) K is a cone and K + K ⊆ K.
(iii) K ≠ ∅ and Σ_{i=0}^n α_i x_i ∈ K for all x_i ∈ K, α_i ≥ 0.

Proof. Exercise.

Proposition 3.5. If K is a convex cone, then K is pointed if and only if K ∩ (−K) = {0}.

Proof. Exercise.

Definition 3.6. The subdifferential of f at x is

    ∂f(x) = {v ∈ R^n | f(x) + ⟨v, y − x⟩ ≤ f(y) ∀y}.    (3.2)
The one-sided directional derivative is

    f′(x; w) = lim_{t↓0} (f(x + tw) − f(x))/t.    (3.3)

Proposition 3.7. Let f, g : R^n → R̄ be convex. Then

(i) f differentiable at x implies ∂f(x) = {∇f(x)}.
(ii) f differentiable at x and g(x) ∈ R imply

    ∂(f + g)(x) = ∇f(x) + ∂g(x).    (3.4)

Proof. (i) is the special case of (ii) with g = 0.

(ii) To show RHS ⊆ LHS: if v ∈ ∂g(x), then ⟨v, y − x⟩ ≤ g(y) − g(x), which together with the gradient inequality f(x) + ⟨∇f(x), y − x⟩ ≤ f(y) (Theorem 2.12 (iii)) gives

    (f + g)(x) + ⟨v + ∇f(x), y − x⟩ ≤ (f + g)(y),    (3.5)-(3.6)

so v + ∇f(x) ∈ ∂(f + g)(x).    (3.7)

For the other direction, let v ∈ ∂(f + g)(x). The subgradient inequality gives

    liminf_{z→x} (f(z) + g(z) − f(x) − g(x) − ⟨v, z − x⟩)/‖z − x‖ ≥ 0.    (3.8)

Subtracting the first-order expansion of f at x,

    liminf_{z→x} (g(z) − g(x) − ⟨v − ∇f(x), z − x⟩)/‖z − x‖ ≥ 0.    (3.9)

Take z(t) = (1 − t)x + ty for fixed y; by convexity g(z(t)) − g(x) ≤ t(g(y) − g(x)), so

    liminf_{t↓0} (t(g(y) − g(x)) − t⟨v − ∇f(x), y − x⟩)/(t‖y − x‖) ≥ 0,    (3.10)-(3.11)

and hence

    g(y) − g(x) − ⟨v − ∇f(x), y − x⟩ ≥ 0 ∀y  ⟺  v − ∇f(x) ∈ ∂g(x),    (3.12)

i.e. v ∈ ∇f(x) + ∂g(x).    (3.13)

□
Theorem 3.8 (Fermat's rule). Let f : R^n → R̄ be proper. Then

    x ∈ argmin f  ⟺  0 ∈ ∂f(x).    (3.14)

Proof.

    0 ∈ ∂f(x)  ⟺  f(x) = f(x) + ⟨0, y − x⟩ ≤ f(y) ∀y.    (3.15)

□

Definition 3.9. For a convex set C ⊆ R^n and a point x ∈ C,

    N_C(x) = {v ∈ R^n | ⟨v, y − x⟩ ≤ 0 ∀y ∈ C}    (3.16)

is the normal cone of C at x. By convention, N_C(x) = ∅ for all x ∉ C.

Proposition 3.10. Let C be convex and C ≠ ∅. Then

    ∂δ_C(x) = N_C(x).    (3.17)

Proof. For x ∈ C, we have

    ∂δ_C(x) = {v | δ_C(x) + ⟨v, y − x⟩ ≤ δ_C(y) ∀y} = {v | ⟨v, y − x⟩ ≤ 0 ∀y ∈ C} = N_C(x).    (3.18)

For x ∉ C, δ_C(x) = +∞ and the subgradient inequality fails for every v (take any y ∈ C), so ∂δ_C(x) = ∅ = N_C(x). □

Proposition 3.11. Let C ⊆ R^n be closed, convex and non-empty, x ∈ R^n. Then

    y = Π_C(x)  ⟺  x − y ∈ N_C(y).    (3.19)

Proof.

    y = Π_C(x)  ⟺  y minimizes g(y′) = (1/2)‖y′ − x‖² + δ_C(y′),    (3.20)

if and only if 0 ∈ ∂g(y) = y − x + N_C(y), if and only if x − y ∈ N_C(y). □

Proposition 3.12. Let f : R^n → R̄ be convex and proper. Then

    ∂f(x) = ∅  if x ∉ dom f,
    ∂f(x) = {v ∈ R^n | (v, −1) ∈ N_{epi f}(x, f(x))}  if x ∈ dom f.    (3.21)
If x ∈ dom f,

    N_{dom f}(x) = {v ∈ R^n | (v, 0) ∈ N_{epi f}(x, f(x))}.    (3.22)

Example 3.13. The subdifferential of f : R → R, f(x) = |x| is

    ∂f(x) = {sign x}  for x ≠ 0,   ∂f(0) = [−1, 1].    (3.23)

Definition 3.14. For C ⊆ R^n, define the affine hull

    aff(C) = ∩_{A affine, C ⊆ A} A    (3.24)

and the relative interior

    rint(C) = {x ∈ R^n | there exists an open neighborhood N of x such that N ∩ aff(C) ⊆ C}.    (3.25)

Example 3.15.

    aff(R^n) = R^n    (3.26)
    aff([0, 1]²) = R²    (3.27)
    rint([0, 1]²) = (0, 1)²    (3.28)
    rint([0, 1] × {0}) = (0, 1) × {0}    (3.29)

Exercise 3.16. We know

    int(A × B) = int A × int B.    (3.30)

Does

    rint(A × B) = rint A × rint B?    (3.31)

Proposition 3.17. Let f : R^n → R̄ be convex.

(i) (a) g(x) = f(x + y) implies ∂g(x) = ∂f(x + y);
    (b) g(x) = f(λx) implies ∂g(x) = λ∂f(λx);
    (c) g(x) = λf(x), λ > 0, implies ∂g(x) = λ∂f(x).

(ii) Let f : R^n → R̄ be proper and convex, A ∈ R^{n×m} such that

    {Ay | y ∈ R^m} ∩ rint dom f ≠ ∅.    (3.32)
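The subdifferential of the absolute value and Fermat's rule can be checked numerically: every v ∈ [−1, 1] satisfies the subgradient inequality at 0, while any v outside fails it. The sample grid is an illustrative choice:

```python
import numpy as np

def is_subgradient(f, x, v, ys, tol=1e-12):
    """Check f(x) + v*(y - x) <= f(y) on sample points ys (the subgradient inequality)."""
    return all(f(x) + v * (y - x) <= f(y) + tol for y in ys)

ys = np.linspace(-2.0, 2.0, 41)
inside = [is_subgradient(abs, 0.0, v, ys) for v in (-1.0, -0.5, 0.0, 1.0)]
outside = is_subgradient(abs, 0.0, 1.5, ys)   # 1.5 is not in [-1, 1]
```

Since 0 ∈ [−1, 1] is a subgradient of |·| at 0, Fermat's rule confirms that x = 0 minimizes |x|.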
Then for x ∈ dom(f ∘ A) we have

    ∂(f ∘ A)(x) = A^T ∂f(Ax).    (3.33)

(iii) Let f_1, ..., f_m : R^n → R̄ be proper, convex, with rint dom f_1 ∩ ⋯ ∩ rint dom f_m ≠ ∅. Then for

    f(x) = f_1(x) + ⋯ + f_m(x)    (3.34)

we have ∂f(x) = ∂f_1(x) + ⋯ + ∂f_m(x).
4 Conjugate Functions

4.1 The Legendre-Fenchel Transform

Definition 4.1. For f : R^n → R̄,

    con f(x) = sup_{g ≤ f, g convex} g(x)    (4.1)

is the convex hull of f.

Proposition 4.2. con f is the greatest convex function majorized by f.

Definition 4.3. For f : R^n → R̄, the (lower) closure of f is defined as

    cl f(x) = liminf_{y→x} f(y).    (4.2)

Proposition 4.4. For f : R^n → R̄,

    epi(cl f) = cl(epi f).    (4.3)

In particular, if f is convex, then cl f is convex.

Proof. Exercise.

Proposition 4.5. If f : R^n → R̄, then

    (cl f)(x) = sup_{g ≤ f, g lsc} g(x).    (4.4)

Proof. cl f is lower semicontinuous and minorizes f, and any lsc g ≤ f satisfies g ≤ cl f. □
Theorem 4.6. Let C ⊆ R^n be closed and convex. Then

    C = ∩_{(b,β) s.t. C ⊆ H_{b,β}} H_{b,β},    (4.5)

where

    H_{b,β} = {x ∈ R^n | ⟨x, b⟩ − β ≤ 0}.    (4.6)

Theorem 4.7. Let f : R^n → R̄ be proper, lsc, and convex. Then

    f(x) = sup_{g ≤ f, g affine} g(x).    (4.7)

(This is quite an involved proof; it is omitted here.)

Definition 4.8. For f : R^n → R̄, let f* : R^n → R̄ be defined by

    f*(v) = sup_{x∈R^n} {⟨v, x⟩ − f(x)},    (4.8)

the conjugate of f. The mapping f ↦ f* is the Legendre-Fenchel transform.

Remark 4.9. For v ∈ R^n,

    f*(v) = sup_{x∈R^n} {⟨v, x⟩ − f(x)}    (4.9)
    ⟺ f*(v) ≥ ⟨v, x⟩ − f(x) ∀x ∈ R^n  ⟺  f(x) ≥ ⟨v, x⟩ − f*(v) ∀x ∈ R^n.    (4.10)

Thus x ↦ ⟨v, x⟩ − f*(v) is the largest affine function with gradient v majorized by f.

Theorem 4.10. Assume f : R^n → R̄. Then

(i) f* = (con f)* = (cl f)* = (cl con f)*, and f ≥ f** = (f*)*, the biconjugate of f.
(ii) If con f is proper, then f*, f** are proper, lower semicontinuous, convex, and f** = cl con f.
(iii) If f : R^n → R̄ is proper, lower semicontinuous, and convex, then

    f** = f.    (4.11)
Proof. We have

    (v, β) ∈ epi f*  ⟺  β ≥ ⟨v, x⟩ − f(x) ∀x ∈ R^n  ⟺  f(x) ≥ ⟨v, x⟩ − β ∀x ∈ R^n.    (4.12)

We claim that for an affine function h we have h ≤ f ⟺ h ≤ con f ⟺ h ≤ cl f ⟺ h ≤ cl con f. This holds since con f is the largest convex function less than or equal to f and h is convex; same for cl and cl con. Thus in (4.12) we can replace f by con f, cl f, or cl con f without changing epi f*, which gives the first part of (i). We also have

    f**(y) = sup_{v∈R^n} {⟨v, y⟩ − sup_{x∈R^n} {⟨v, x⟩ − f(x)}}    (4.13)
           ≤ sup_{v∈R^n} {⟨v, y⟩ − ⟨v, y⟩ + f(y)}    (4.14)
           = f(y).    (4.15)

For the second part, we have con f proper, and claim that cl con f is proper, lower semicontinuous, and convex. Lower semicontinuity is immediate. Convexity is given by the previous proposition (f convex implies cl f convex). Properness is to be shown in an exercise. Applying Theorem 4.7,

    cl con f(x) = sup_{g ≤ cl con f, g affine} g(x)    (4.16)
                = sup_{(v,β)∈epi f*} {⟨v, x⟩ − β}    (4.17)
                = sup_{(v,β)∈epi f*} {⟨v, x⟩ − f*(v)}    (4.18)
                = sup_{v∈dom f*} {⟨v, x⟩ − f*(v)}    (4.19)
                = sup_{v∈R^n} {⟨v, x⟩ − f*(v)} = f**(x),    (4.20)

where we used that an affine g(x) = ⟨v, x⟩ − β satisfies g ≤ cl con f ⟺ (v, β) ∈ epi f*.

To show f* is lower semicontinuous and convex: epi f* is an intersection of closed convex sets (one closed halfspace per x in (4.12)), therefore closed and convex, and hence f* is lower semicontinuous and convex.
To show properness: con f proper implies there exists x ∈ R^n with con f(x) < ∞. Then f*(v) = sup_x {⟨v, x⟩ − f(x)} > −∞ for every v. If f* ≡ +∞, then f** ≡ −∞, contradicting f** = cl con f proper; so f* is proper, lower semicontinuous, and convex. Applying the same to f* (con f* = f* is proper by the previous result), f** is proper, lower semicontinuous, and convex.

For part (iii), apply (ii): f is convex, which implies con f = f, and con f is proper (as f is proper); f is lsc and convex, and thus

    f** = cl con f = cl f = f.

□

4.2 Duality Correspondences

Theorem 4.11. Let f : R^n → R̄ be proper, lower semicontinuous, and convex. Then:

(i) ∂f* = (∂f)^{−1},
(ii) v ∈ ∂f(x) ⟺ f(x) + f*(v) = ⟨v, x⟩ ⟺ x ∈ ∂f*(v),
(iii)

    ∂f(x) = argmax_{v′} {⟨v′, x⟩ − f*(v′)},    (4.21)
    ∂f*(v) = argmax_{x′} {⟨v, x′⟩ − f(x′)}.    (4.22)

Proof. (i) follows directly from (ii). For (ii):

    v ∈ ∂f(x)  ⟺  f(y) ≥ f(x) + ⟨v, y − x⟩ ∀y    (4.24)
              ⟺  ⟨v, x⟩ − f(x) ≥ ⟨v, y⟩ − f(y) ∀y
              ⟺  f*(v) = ⟨v, x⟩ − f(x)    (4.25)
              ⟺  x ∈ argmax_{x′} {⟨v, x′⟩ − f(x′)}.    (4.26)

Since f** = f, the same computation applied to f* gives f(x) + f*(v) = ⟨v, x⟩ ⟺ x ∈ ∂f*(v), and (iii) restates (4.26). □
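The Legendre-Fenchel transform and the biconjugation result f** = f can be explored on a grid. The sketch below approximates f* by a discrete maximum and checks f** ≈ f for the self-conjugate function f(x) = x²/2; the grid sizes are illustrative, and the approximation is only accurate where the maximizer stays inside the grids:

```python
import numpy as np

def conjugate(f_vals, xs, vs):
    """Discrete Legendre-Fenchel transform: f*(v) = max_x (v*x - f(x)) over the grid xs."""
    return np.array([np.max(v * xs - f_vals) for v in vs])

xs = np.linspace(-3.0, 3.0, 601)
vs = np.linspace(-2.0, 2.0, 201)
f = 0.5 * xs**2
fstar = conjugate(f, xs, vs)          # approximates v^2/2 (x^2/2 is self-conjugate)
fstarstar = conjugate(fstar, vs, xs)  # biconjugate, approximates f for |x| <= 2
```

For a non-convex f the same computation approximates cl con f instead, illustrating part (ii) of the theorem.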
Proposition 4.12. Let f : R^n → R̄ be proper, lower semicontinuous, and convex. Then

    (f(·) − ⟨a, ·⟩)* = f*(· + a)    (4.28)
    (f(· + b))* = f*(·) − ⟨·, b⟩    (4.29)
    (f(·) + c)* = f*(·) − c    (4.30)
    (λf(·))* = λf*(·/λ),  λ > 0    (4.31)
    (λf(·/λ))* = λf*(·)    (4.32)

Proof. Exercise.

Proposition 4.13. Let f_i : R^n → R̄, i = 1, ..., m, be proper and f(x_1, ..., x_m) = Σ_i f_i(x_i). Then f*(v_1, ..., v_m) = Σ_i f_i*(v_i).

Definition 4.14. For any set S ⊆ R^n, define the support function

    G_S(v) = sup_{x∈S} ⟨v, x⟩ = (δ_S)*(v).    (4.33)

Definition 4.15. A function h : R^n → R̄ is positively homogeneous if 0 ∈ dom h and h(λx) = λh(x) for all λ > 0, x ∈ R^n.

Proposition 4.16. The set of positively homogeneous proper lower semicontinuous convex functions and the set of closed convex non-empty sets are in one-to-one correspondence through the Legendre-Fenchel transform:

    δ_C ↔ G_C,    (4.34)

with

    x ∈ ∂G_C(v)  ⟺  x ∈ C and G_C(v) = ⟨v, x⟩  ⟺  v ∈ N_C(x) = ∂δ_C(x).    (4.35)-(4.36)

The set of closed convex cones is in one-to-one correspondence with itself:

    δ_K ↔ δ_{K°},    (4.37)

where

    K° = {v ∈ R^n | ⟨v, x⟩ ≤ 0 ∀x ∈ K}    (4.38)

is the polar cone,
and

    x ∈ N_{K°}(v)  ⟺  x ∈ K, v ∈ K°, ⟨x, v⟩ = 0  ⟺  v ∈ N_K(x).    (4.39)-(4.40)
5 Duality in Optimization

Definition 5.1. For f : R^n × R^m → R̄ proper, lower semicontinuous, and convex, we define the primal and dual problems

    inf_{x∈R^n} φ(x),  φ(x) = f(x, 0),    (5.1)
    sup_{y∈R^m} ψ(y),  ψ(y) = −f*(0, y),    (5.2)

and the inf-projections

    p(u) = inf_{x∈R^n} f(x, u),    (5.3)
    q(v) = inf_{y∈R^m} f*(v, y).    (5.4)

f is the perturbation function for φ; p is the associated projection function.

Example. Consider the problem

    inf_x (1/2)‖x − z‖² + δ_{{0}}(Ax − b) = inf_x φ(x)    (5.5)

and the perturbed problem

    f(x, u) = (1/2)‖x − z‖² + δ_{{0}}(Ax − b + u).    (5.6)

Proposition 5.2. Assume f satisfies the assumptions in Definition 5.1. Then

(i) φ and −ψ are convex and lower semicontinuous,
(ii) p, q are convex,
(iii) p(0) = inf_x φ(x),
(iv) p**(0) = sup_y ψ(y),
(v) inf_x φ(x) < ∞ ⟺ 0 ∈ dom p,
(vi) sup_y ψ(y) > −∞ ⟺ 0 ∈ dom q.
Proof. (i) φ is clearly convex and lower semicontinuous. For ψ: f* is lower semicontinuous and convex, which implies −ψ = f*(0, ·) is lower semicontinuous and convex.

(ii) Look at the strict epigraph of p:

    E = {(u, α) ∈ R^m × R | p(u) < α}    (5.7)
      = {(u, α) ∈ R^m × R | ∃x : f(x, u) < α}    (5.8)
      = A({(x, u, α) ∈ R^n × R^m × R | f(x, u) < α}),  A(x, u, α) = (u, α),    (5.9)-(5.10)

the image of a convex set under a linear map. As E is convex, p must be convex. Similarly with q.

(iii) p(0) = inf_x f(x, 0) = inf_x φ(x) by definition.

(iv) For p**(0),

    p*(y) = sup_u {⟨y, u⟩ − p(u)}    (5.11)
          = sup_{u,x} {⟨y, u⟩ − f(x, u)}    (5.12)
          = f*(0, y),    (5.13)

and

    p**(0) = sup_y {⟨0, y⟩ − p*(y)}    (5.14)
           = sup_y −f*(0, y)    (5.15)
           = sup_y ψ(y).    (5.16)

(v) By definition, 0 ∈ dom p ⟺ p(0) = inf_x φ(x) < ∞.

(vi) Analogously, 0 ∈ dom q ⟺ q(0) = inf_y f*(0, y) = −sup_y ψ(y) < ∞. □
Theorem 5.3. Let f be as in Definition 5.1. Then weak duality holds:

    p(0) = inf_x φ(x) ≥ sup_y ψ(y) = p**(0),    (5.17)

and under certain conditions the inf and sup are equal and finite (strong duality): p(0) ∈ R and p lower semicontinuous at 0 if and only if inf_x φ(x) = sup_y ψ(y) ∈ R.

Definition 5.4.

    inf φ − sup ψ ≥ 0    (5.18)

is the duality gap.

Proof. (≥) p** = cl con p ≤ p, so sup ψ = p**(0) ≤ p(0) = inf φ.

(⟺) cl p is lower semicontinuous and convex. If p(0) ∈ R and p is lsc at 0, then cl p(0) = p(0) ∈ R, so cl p is proper, and

    sup ψ = p**(0) = (cl p)(0) = p(0) = inf φ.

□

Proposition 5.5. [Notes missing from lecture.]
6 First-order Methods

The idea is to perform gradient descent on the objective function f, with step size τ_k.

Definition 6.1. Let f : R^n → R̄. Define

(i) the forward (explicit) step

    F_{τ_k f}(x^k) = x^k − τ_k ∇f(x^k) = (I − τ_k ∇f)(x^k),    (6.1)

(ii) the backward (implicit) step

    B_{τ_k f}(x^k) = (I + τ_k ∂f)^{−1}(x^k).    (6.2)

Proposition 6.2. If f : R^n → R̄ is proper, lower semicontinuous, and convex, then for τ_k > 0,

    B_{τ_k f}(x) = argmin_y {(1/(2τ_k))‖y − x‖² + f(y)}.    (6.4)

Proof.

    y ∈ argmin_{y′} {(1/(2τ_k))‖y′ − x‖² + f(y′)}  ⟺  0 ∈ (1/τ_k)(y − x) + ∂f(y)    (6.5)
    ⟺  x ∈ y + τ_k ∂f(y)  ⟺  y = (I + τ_k ∂f)^{−1}(x).    (6.6)

□

[Remaining lecture material missing.]
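For f = λ|·| the backward step (6.2)/(6.4) has the closed form of soft-thresholding. The sketch below compares it against a brute-force grid minimization of (6.4); the grid and parameter values are illustrative choices:

```python
import numpy as np

def prox_abs(x, lam):
    """Backward step B_{lam f}(x) for f = |.|: soft-thresholding (closed form)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_numeric(x, lam, grid):
    """argmin_y (1/(2*lam)) * (y - x)^2 + |y| over a grid, cf. (6.4)."""
    vals = (grid - x) ** 2 / (2 * lam) + np.abs(grid)
    return grid[np.argmin(vals)]

grid = np.linspace(-5.0, 5.0, 100001)
pairs = [(prox_abs(x, 0.5), prox_numeric(x, 0.5, grid))
         for x in (-2.0, -0.3, 0.0, 0.7, 3.0)]
```

Alternating x ← B_{τf}(F_{τg}(x)) for an objective g + f with g smooth is exactly the proximal-gradient (forward-backward) scheme built from these two steps.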
7 Interior Point Methods

Theorem 7.1 (Problem). Consider the problem

    inf_x ⟨c, x⟩ + δ_K(Ax − b)    (7.1)

with K a closed convex cone; thus the constraint reads Ax ⪰_K b.

The idea is to replace δ_K by a smooth approximation F (a barrier), with F(x) → ∞ as x approaches the boundary of K, and solve inf_x ⟨c, x⟩ + (1/t)F(Ax − b), equivalent to solving

    inf_x t⟨c, x⟩ + F(Ax − b).

Proposition 7.2. Let F be the canonical barrier for K. Then F is smooth on dom F = int K and strictly convex, with

    F(λx) = F(x) − Θ_F log λ,  λ > 0,    (7.2)

and

(i) −∇F(x) ∈ dom F,
(ii) ⟨∇F(x), x⟩ = −Θ_F,
(iii) ∇F(−∇F(x)) = −x,
(iv) ∇F(tx) = (1/t)∇F(x).

For the problem

    inf_x ⟨c, x⟩    (7.3)
such that Ax ⪰_K b, with K a closed, convex, self-dual cone, we have the primal problem

    inf_x ⟨c, x⟩ + δ_K(Ax − b),    (7.4)

whose dual is

    sup_y ⟨b, y⟩  s.t.  y ⪰_K 0, A^T y = c.    (7.6)

The corresponding barrier problems are

    inf_x t⟨c, x⟩ + F(Ax − b)    (7.7)

and

    sup_y t⟨b, y⟩ − F(y) − δ_{{A^T y = c}}(y).    (7.8)

Proposition 7.3. For t > 0, define the central path

    x(t) = argmin_x {t⟨c, x⟩ + F(Ax − b)},    (7.9)
    y(t) = argmin_y {−t⟨b, y⟩ + F(y) + δ_{{A^T y = c}}(y)},    (7.10)
    z(t) = (x(t), y(t)).    (7.11)

These paths exist, are unique, and

    (x, y) = (x(t), y(t))  ⟺  A^T y = c and ty + ∇F(Ax − b) = 0.    (7.12)

Proof. The first part follows by definition (strict convexity of F gives uniqueness). For (7.12), the primal optimality condition is 0 = tc + A^T ∇F(Ax − b). The dual optimality condition is 0 ∈ −tb + ∇F(y) + N_{{A^T y = c}}(y), which is

    A^T y = c,  0 ∈ −tb + ∇F(y) + range A.    (7.13)

Assume x satisfies the primal optimality condition, and set y = −(1/t)∇F(Ax − b). Then

    tc + A^T ∇F(Ax − b) = 0  ⟹  tc + A^T(−ty) = 0  ⟹  A^T y = c,    (7.14)-(7.16)

and ty + ∇F(Ax − b) = 0 holds by construction; the dual condition then follows from the barrier properties in Proposition 7.2. □
Proposition 7.4. If x, y are feasible, then

    φ(x) − ψ(y) = ⟨y, Ax − b⟩.    (7.17)

Moreover, for (x(t), y(t)) on the central path,

    φ(x(t)) − ψ(y(t)) = Θ_F / t.    (7.18)

Proof.

    φ(x) − ψ(y) = ⟨c, x⟩ − ⟨b, y⟩    (7.19)
                = ⟨A^T y, x⟩ − ⟨b, y⟩    (7.20)
                = ⟨Ax − b, y⟩.    (7.21)

For the second part, using ty(t) = −∇F(Ax(t) − b) and Proposition 7.2 (ii),

    φ(x(t)) − ψ(y(t)) = ⟨y(t), Ax(t) − b⟩    (7.22)
                      = −(1/t)⟨∇F(Ax(t) − b), Ax(t) − b⟩    (7.23)
                      = Θ_F / t.    (7.24)

□
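A one-dimensional sanity check of Propositions 7.3 and 7.4: take K = R_+, A = 1, b = 0, so the problem is inf{cx : x ≥ 0} with canonical barrier F(x) = −log x and Θ_F = 1. The central path solves tc − 1/x = 0, i.e. x(t) = 1/(tc), and the primal suboptimality c·x(t) − 0 equals Θ_F/t. All numerical values below are illustrative:

```python
import numpy as np

def central_path_point(c, t, grid):
    """Minimize t*c*x + F(x), F(x) = -log(x), over a positive grid."""
    vals = t * c * grid - np.log(grid)
    return grid[np.argmin(vals)]

c = 2.0
grid = np.linspace(1e-4, 5.0, 200001)
points = {t: central_path_point(c, t, grid) for t in (1.0, 10.0, 100.0)}
# closed form: d/dx (t*c*x - log x) = t*c - 1/x = 0  =>  x(t) = 1/(t*c)
```

As t grows, the central path point approaches the constrained optimum x = 0 with gap Θ_F/t = 1/t; interior point methods follow this path while increasing t.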
8 Support Vector Machines

8.1 Machine Learning

Given a sample set X = {x_1, ..., x_n}, x_i ∈ F ⊆ R^m, with F our feature space, and a set of classes C, we seek a map

    h_θ : F → C,  θ ∈ Θ,    (8.1)

with Θ our parameter space. Our task is to find θ such that h_θ is the best mapping from X into C.

(i) Unsupervised learning (only X is known, usually C is not known):
  - clustering,
  - outlier detection,
  - mapping to a lower-dimensional subspace.

(ii) Supervised learning (C known). We have training data T = {(x_1, y_1), ..., (x_n, y_n)}, with y_i ∈ C. We seek

    θ* = argmin_{θ∈Θ} f(θ, T) = Σ_{i=1}^n g(y_i, h_θ(x_i)) + R(θ).    (8.2)
8.2 Linear Classifiers

The idea is to consider

    h_θ : R^m → {−1, 1},  θ = (w, b),    (8.3)
    h_θ(x) = sign(⟨w, x⟩ + b).    (8.4)

We want to consider maximum-margin classifiers, satisfying

    max_{w,b, w≠0} min_i y_i (⟨w, x_i⟩ + b)/‖w‖,    (8.5)

which can be rewritten as

    max_{w,b} c    (8.6)

such that

    y_i (⟨w/‖w‖, x_i⟩ + b/‖w‖) ≥ c  ∀i.    (8.7)

Fixing the scaling c = 1/‖w‖, this becomes

    min_{w,b} (1/2)‖w‖²    (8.10)

such that

    1 ≤ y_i (⟨w, x_i⟩ + b)  ∀i.    (8.11)

Definition 8.1. In standard form,

    inf_{w,b} k(w, b) + h(M(w; b) − e),    (8.12)

where M is the matrix with rows y_i (x_i^T, 1), so that (M(w; b))_i = y_i(⟨w, x_i⟩ + b), and e = (1, ..., 1)^T.
The conjugates are

    k(w, b) = (1/2)‖w‖²,    (8.13)
    k*(u, c) = (1/2)‖u‖² + δ_{{0}}(c),    (8.14)
    h(z) = δ_{≥0}(z),  h*(v) = δ_{≤0}(v).    (8.15)

In saddle-point form,

    inf_{w,b} sup_z (1/2)‖w‖² − ⟨M(w; b) − e, z⟩ − δ_{≥0}(z).    (8.16)

Using the fact that if inf_x k(x) + h(Ax + b) is our primal, then the dual is sup_y ⟨b, y⟩ − k*(−A^T y) − h*(y), and substituting z ↦ −z, the dual problem is

    sup_z ⟨e, z⟩ − δ_{≥0}(z) − (1/2)‖Σ_{i=1}^n y_i x_i z_i‖² − δ_{{0}}(⟨y, z⟩),    (8.17)

and thus

    inf_z (1/2)‖Σ_i y_i x_i z_i‖² − ⟨e, z⟩    (8.18)

such that z ≥ 0, ⟨y, z⟩ = 0.

The optimality conditions give

    w = Σ_i z_i y_i x_i;    (8.19)

the training points x_i with z_i > 0 are the support vectors: only they enter the decision function.

8.3 Kernel Trick

The idea is to embed our features into a higher-dimensional space via a mapping function φ. Then our decision function takes the form

    h_θ(x) = sign(⟨φ(x), w⟩ + b).    (8.20)
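The hard-margin program above is usually handed to a QP solver. As a hedged sketch, here is the soft-margin variant (hinge loss plus (λ/2)‖w‖², a standard relaxation that is not derived in these notes) minimized by plain subgradient descent; data and hyperparameters are illustrative:

```python
import numpy as np

def svm_subgradient(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Minimize lam/2*||w||^2 + (1/n)*sum_i max(0, 1 - y_i*(<w, x_i> + b))
    by subgradient descent on (w, b)."""
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0             # points inside or violating the margin
        gw = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        gb = -y[active].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = svm_subgradient(X, y)
```

On separable data with small λ, sign(⟨w, x⟩ + b) recovers a separating hyperplane; the points that stay active near the solution play the role of the support vectors.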
9 Total Variation

9.1 Meyer's G-norm

Idea: a regulariser that favours textured regions.

    ‖u‖_G = inf{‖v‖_∞ | div v = u, v ∈ L^∞(R^d, R^d)}.    (9.1)

Discretized, we have

    ‖u‖_G = inf_v {δ_C(v) + δ_{{G^T v = u}}(v)},    (9.2)
    C = {v | ‖v(x)‖_2 ≤ 1 ∀x},    (9.3)

where G is the discrete gradient operator. A Lagrangian duality computation (dualizing the constraint G^T v = u with a multiplier w, and using δ_C*(Gw) = Σ_x ‖(Gw)(x)‖_2 = TV(w)) identifies ‖·‖_G as the support function of the TV unit ball:

    ‖u‖_G = sup{⟨w, u⟩ | TV(w) ≤ 1},    (9.8)

so the G-norm is dual to TV:

    ‖·‖_G = (δ_{B_TV})*,    (9.9)
where B_TV = {u | TV(u) ≤ 1}. Similarly, in the other direction,

    sup{⟨u, w⟩ | ‖w‖_G ≤ 1}    (9.10)
    = sup{⟨u, w⟩ | ∃v : w = G^T v, ‖v‖_∞ ≤ 1}    (9.11)
    = sup{⟨u, G^T v⟩ | ‖v‖_∞ ≤ 1}    (9.12)
    = TV(u).    (9.13)

Why is the G-norm good at separating noise? Consider

    argmin_u (1/2)‖u − g‖²  s.t. ‖u‖_G ≤ λ    (9.14)
    = Π_{λB_G}(g) = g − prox_{λTV}(g),    (9.15)

by Moreau's decomposition (since (δ_{λB_G})* = λTV): the projection onto the G-ball is exactly the residual that ROF denoising with parameter λ removes, i.e. the noise/texture component.

9.2 Non-local Regularization

In real-world images, large ‖∇u‖ does not always mean noise.

Definition 9.1. Let Ω = {1, ..., n}. Given u ∈ R^n, x, y ∈ Ω, and weights w : Ω² → R_{≥0}, define the non-local partial derivative

    ∂_y u(x) = (u(y) − u(x)) w(x, y)    (9.16)

and the non-local gradient

    ∇_w u(x) = (∂_y u(x))_{y∈Ω}.    (9.17)

A suitable divergence

    div_w v(x) = Σ_{y∈Ω} (v(x, y) − v(y, x)) w(x, y)

is adjoint to ∇_w with respect to the Euclidean inner products:

    ⟨div_w v, u⟩ = −⟨v, ∇_w u⟩.    (9.18)

Non-local regularizers are

    J(u) = Σ_{x∈Ω} g(∇_w u(x))    (9.19)
with, for example,

    TV^g_NL(u) = Σ_{x∈Ω} ‖∇_w u(x)‖_2,    (9.20)
    TV^d_NL(u) = Σ_{x∈Ω} Σ_{y∈Ω} |∂_y u(x)|.    (9.21)

This reduces to classical TV if the weights are chosen constant (w(x, y) = 1/h for neighbouring pixels).

How to choose w(x, y)?

(i) Large if the neighbourhoods of x and y are similar with respect to a distance metric.
(ii) Sparse, otherwise we have n² terms in the regulariser (n is the number of pixels).

A possible choice:

    d_u(x, y) = ∫ K_G(t)(u(y + t) − u(x + t))² dt,    (9.22)
    A(x) = argmin_{A ⊆ S(x), |A| = k} Σ_{y∈A} d_u(x, y),    (9.23)
    w(x, y) = 1 if y ∈ A(x) or x ∈ A(y), 0 otherwise,    (9.24)

with K_G a Gaussian kernel of variance σ_G² and S(x) a search window around x.
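The reduction to classical TV for constant weights can be checked directly in one dimension: with nearest-neighbour weights w(x, y) = 1 for |x − y| = 1 and 0 otherwise, the directional non-local TV (9.21) counts every adjacent pair twice, giving exactly 2·TV(u). The signal here is an illustrative choice:

```python
import numpy as np

def tv_nl_directional(u, w):
    """TV_d_NL(u) = sum_x sum_y |(u(y) - u(x)) * w(x, y)|, cf. (9.16) and (9.21)."""
    diffs = u[None, :] - u[:, None]     # diffs[x, y] = u(y) - u(x)
    return np.abs(diffs * w).sum()

def tv_classical(u):
    return np.abs(np.diff(u)).sum()

n = 50
u = np.cumsum(np.random.default_rng(1).standard_normal(n))
w = (np.abs(np.subtract.outer(np.arange(n), np.arange(n))) == 1).astype(float)
ratio = tv_nl_directional(u, w) / tv_classical(u)   # expect 2.0
```

With patch-based weights as in (9.22)-(9.24) the same function instead measures variation only between similar patches, which is what lets texture survive the regularisation.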
10 Relaxation

Example 10.1 (Segmentation). Find C ⊆ Ω that fits the given data and prior knowledge about typical shapes.

Theorem 10.2 (Chan-Vese). Given g : Ω → R, Ω ⊆ R^d, consider

    inf_{C⊆Ω, c_1,c_2∈R} f_CV(C, c_1, c_2),    (10.1)
    f_CV(C, c_1, c_2) = ∫_C (g − c_1)² dx + ∫_{Ω\C} (g − c_2)² dx + λ H^{d−1}(∂C).    (10.2)

Thus c_1 is fitted to the shape, c_2 to the outside of the shape, and C is the region of the shape.

Mumford-Shah model:

    inf_{K⊂Ω closed, u∈C¹(Ω\K)} f(K, u),    (10.3)
    f(K, u) = ∫_Ω (g − u)² dx + λ ∫_{Ω\K} ‖∇u‖²_2 dx + ν H^{d−1}(K).    (10.4)

Chan-Vese is a special case of this, obtained by forcing u = c_1 1_C + c_2 (1 − 1_C).
Rewriting with u = 1_C,

    inf_{C⊆Ω, c_1,c_2∈R} ∫_C (g − c_1)² dx + ∫_{Ω\C} (g − c_2)² dx + λ H^{d−1}(∂C)    (10.5)
    = inf_{u∈BV(Ω,{0,1}), c_1,c_2∈R} ∫_Ω u(g − c_1)² + (1 − u)(g − c_2)² dx + λ TV(u).    (10.6)

Fix c_1, c_2. Then we obtain the combinatorial problem

    inf_{u∈BV(Ω,{0,1})} ∫_Ω u s dx + λ TV(u),  s := (g − c_1)² − (g − c_2)².    (C)

Replacing {0, 1} by its convex hull [0, 1], we obtain

    inf_{u∈BV(Ω,[0,1])} ∫_Ω u s dx + λ TV(u).    (R)

This is a convex relaxation:

(i) replace the non-convex energy by a convex approximation,
(ii) i.e. replace the non-convex f by con f.

Can the minima of (R) have values ∉ {0, 1}? Yes: assume u_1, u_2 ∈ BV(Ω, {0, 1}) both minimize (R) (and (C)), with u_1 ≠ u_2. Then

    (1/2)u_1 + (1/2)u_2 ∉ BV(Ω, {0, 1})    (10.7)

is also a minimizer of (R).

Proposition 10.3. Let c_1, c_2 be fixed. If u* minimizes (R) and u* ∈ BV(Ω, {0, 1}), then u* minimizes (C), and (with u* = 1_C) C minimizes f_CV(·, c_1, c_2).

Proof. Let C ⊆ Ω. Then 1_C ∈ BV(Ω, [0, 1]), so

    f(u*) ≤ f(1_C)    (10.8)

for all C ⊆ Ω. □

Definition 10.4. Let L = BV(Ω, [0, 1]). For u ∈ L, α ∈ [0, 1], define the thresholded function

    u_α(x) = 1_{{u > α}}(x) = 1 if u(x) > α, 0 if u(x) ≤ α.    (10.9)
Then f : L → R̄ satisfies the generalized coarea condition (gcc) if and only if

    f(u) = ∫_0^1 f(u_α) dα.    (10.10)

Proposition 10.5. Let s ∈ L^∞(Ω), Ω bounded. Then f as in (R) satisfies the gcc.

Proof. If f_1, f_2 satisfy the gcc, then f_1 + f_2 satisfies the gcc. λ TV satisfies the gcc by the coarea formula for the total variation. For the linear part, using u(x) = ∫_0^1 1_{{u(x) > α}} dα,

    ∫_Ω u(x) s(x) dx = ∫_Ω (∫_0^1 1_{{u(x) > α}} s(x) dα) dx    (10.11)
                     = ∫_0^1 ∫_Ω u_α(x) s(x) dx dα.    (10.12)

□

Theorem 10.6. Assume f : BV(Ω, [0, 1]) → R̄ satisfies the gcc and

    u* ∈ argmin_{u∈BV(Ω,[0,1])} f(u).    (10.13)

Then for almost every α ∈ [0, 1], u*_α is a minimizer of f over BV(Ω, {0, 1}):

    u*_α ∈ argmin_{u∈BV(Ω,{0,1})} f(u).    (10.14)

Proof. Let

    S = {α | f(u*_α) ≠ f(u*)}.    (10.15)

If the Lebesgue measure L(S) = 0, then for a.e. α,

    f(u*_α) = f(u*) = inf_{u∈BV(Ω,[0,1])} f(u),    (10.16)

and we are done (note f(u*_α) ≥ f(u*) always, since u*_α ∈ BV(Ω, [0, 1])). If L(S) > 0, then there exists ε > 0 such that

    L(S_ε) > 0,  S_ε = {α | f(u*_α) ≥ f(u*) + ε}.    (10.17)
This implies

    f(u*) = ∫_0^1 f(u*_α) dα = ∫_{[0,1]\S_ε} f(u*_α) dα + ∫_{S_ε} f(u*_α) dα    (10.18)
          ≥ f(u*) + ε L(S_ε) > f(u*),    (10.19)-(10.20)

which contradicts the gcc. □

Remark 10.7 (Discretization). Consider

    min_{u∈R^n} f(u),  f(u) = ⟨s, u⟩ + λ Σ_i ‖(Gu)_i‖_2,    (10.21)

subject to 0 ≤ u ≤ 1, versus the combinatorial problem

    min_{u∈{0,1}^n} f(u).    (10.22)

If u* solves the relaxation, does u*_α solve the combinatorial problem? Only if the discretized energy f satisfies the gcc.
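The generalized coarea condition for a one-dimensional discretization of (10.21) (with the anisotropic TV, which satisfies the discrete coarea formula) can be verified numerically by integrating over thresholds α. The grid sizes and random data are illustrative choices:

```python
import numpy as np

def energy(u, s, lam):
    """Discrete 1-D energy f(u) = <s, u> + lam * sum_i |u_{i+1} - u_i|."""
    return s @ u + lam * np.abs(np.diff(u)).sum()

rng = np.random.default_rng(2)
n = 30
u = rng.random(n)             # u in [0, 1]^n
s = rng.standard_normal(n)
lam = 0.7

m = 20000
alphas = (np.arange(m) + 0.5) / m     # midpoint rule on (0, 1)
layer_avg = np.mean([energy((u > a).astype(float), s, lam) for a in alphas])
```

layer_avg approximates the layer integral of f over the thresholded functions u_α and matches f(u), so thresholding a minimizer of the relaxation at almost any α yields a combinatorial minimizer, as in the thresholding theorem above.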
Bibliography
More informationLECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE
LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization
More informationIntegral Jensen inequality
Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian
More informationBASICS OF CONVEX ANALYSIS
BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,
More informationSubgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives
Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality
More informationOptimality Conditions for Constrained Optimization
72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)
More informationg 2 (x) (1/3)M 1 = (1/3)(2/3)M.
COMPACTNESS If C R n is closed and bounded, then by B-W it is sequentially compact: any sequence of points in C has a subsequence converging to a point in C Conversely, any sequentially compact C R n is
More informationOn duality theory of conic linear problems
On duality theory of conic linear problems Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 3332-25, USA e-mail: ashapiro@isye.gatech.edu
More informationEE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1
EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex
More informationDual methods for the minimization of the total variation
1 / 30 Dual methods for the minimization of the total variation Rémy Abergel supervisor Lionel Moisan MAP5 - CNRS UMR 8145 Different Learning Seminar, LTCI Thursday 21st April 2016 2 / 30 Plan 1 Introduction
More informationTHE INVERSE FUNCTION THEOREM
THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)
More informationConvex Analysis and Optimization Chapter 4 Solutions
Convex Analysis and Optimization Chapter 4 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationContents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.
Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological
More information3.1 Convexity notions for functions and basic properties
3.1 Convexity notions for functions and basic properties We start the chapter with the basic definition of a convex function. Definition 3.1.1 (Convex function) A function f : E R is said to be convex
More informationL p Spaces and Convexity
L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function
More informationA SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06
A SET OF LECTURE NOTES ON CONVEX OPTIMIZATION WITH SOME APPLICATIONS TO PROBABILITY THEORY INCOMPLETE DRAFT. MAY 06 CHRISTIAN LÉONARD Contents Preliminaries 1 1. Convexity without topology 1 2. Convexity
More informationSemi-infinite programming, duality, discretization and optimality conditions
Semi-infinite programming, duality, discretization and optimality conditions Alexander Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205,
More informationUTILITY OPTIMIZATION IN A FINITE SCENARIO SETTING
UTILITY OPTIMIZATION IN A FINITE SCENARIO SETTING J. TEICHMANN Abstract. We introduce the main concepts of duality theory for utility optimization in a setting of finitely many economic scenarios. 1. Utility
More informationA Dual Condition for the Convex Subdifferential Sum Formula with Applications
Journal of Convex Analysis Volume 12 (2005), No. 2, 279 290 A Dual Condition for the Convex Subdifferential Sum Formula with Applications R. S. Burachik Engenharia de Sistemas e Computacao, COPPE-UFRJ
More informationSWFR ENG 4TE3 (6TE3) COMP SCI 4TE3 (6TE3) Continuous Optimization Algorithm. Convex Optimization. Computing and Software McMaster University
SWFR ENG 4TE3 (6TE3) COMP SCI 4TE3 (6TE3) Continuous Optimization Algorithm Convex Optimization Computing and Software McMaster University General NLO problem (NLO : Non Linear Optimization) (N LO) min
More informationSemicontinuous functions and convexity
Semicontinuous functions and convexity Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto April 3, 2014 1 Lattices If (A, ) is a partially ordered set and S is a subset
More informationNonsymmetric potential-reduction methods for general cones
CORE DISCUSSION PAPER 2006/34 Nonsymmetric potential-reduction methods for general cones Yu. Nesterov March 28, 2006 Abstract In this paper we propose two new nonsymmetric primal-dual potential-reduction
More informationAppendix B Convex analysis
This version: 28/02/2014 Appendix B Convex analysis In this appendix we review a few basic notions of convexity and related notions that will be important for us at various times. B.1 The Hausdorff distance
More informationSome Properties of the Augmented Lagrangian in Cone Constrained Optimization
MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented
More informationVictoria Martín-Márquez
A NEW APPROACH FOR THE CONVEX FEASIBILITY PROBLEM VIA MONOTROPIC PROGRAMMING Victoria Martín-Márquez Dep. of Mathematical Analysis University of Seville Spain XIII Encuentro Red de Análisis Funcional y
More informationDivision of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45
Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.
More informationPractice Exam 1: Continuous Optimisation
Practice Exam : Continuous Optimisation. Let f : R m R be a convex function and let A R m n, b R m be given. Show that the function g(x) := f(ax + b) is a convex function of x on R n. Suppose that f is
More informationLectures on Geometry
January 4, 2001 Lectures on Geometry Christer O. Kiselman Contents: 1. Introduction 2. Closure operators and Galois correspondences 3. Convex sets and functions 4. Infimal convolution 5. Convex duality:
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationCHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.
1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function
More informationSet, functions and Euclidean space. Seungjin Han
Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,
More informationOptimization for Machine Learning
Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html
More informationOn the use of semi-closed sets and functions in convex analysis
Open Math. 2015; 13: 1 5 Open Mathematics Open Access Research Article Constantin Zălinescu* On the use of semi-closed sets and functions in convex analysis Abstract: The main aim of this short note is
More informationMath The Laplacian. 1 Green s Identities, Fundamental Solution
Math. 209 The Laplacian Green s Identities, Fundamental Solution Let be a bounded open set in R n, n 2, with smooth boundary. The fact that the boundary is smooth means that at each point x the external
More informationOn duality gap in linear conic problems
On duality gap in linear conic problems C. Zălinescu Abstract In their paper Duality of linear conic problems A. Shapiro and A. Nemirovski considered two possible properties (A) and (B) for dual linear
More informationSubgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus
1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality
More informationCONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS
CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS A Dissertation Submitted For The Award of the Degree of Master of Philosophy in Mathematics Neelam Patel School of Mathematics
More informationMAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9
MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended
More informationLecture: Duality of LP, SOCP and SDP
1/33 Lecture: Duality of LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html wenzw@pku.edu.cn Acknowledgement:
More informationUSING FUNCTIONAL ANALYSIS AND SOBOLEV SPACES TO SOLVE POISSON S EQUATION
USING FUNCTIONAL ANALYSIS AND SOBOLEV SPACES TO SOLVE POISSON S EQUATION YI WANG Abstract. We study Banach and Hilbert spaces with an eye towards defining weak solutions to elliptic PDE. Using Lax-Milgram
More informationDUALITY, OPTIMALITY CONDITIONS AND PERTURBATION ANALYSIS
1 DUALITY, OPTIMALITY CONDITIONS AND PERTURBATION ANALYSIS Alexander Shapiro 1 School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA, E-mail: ashapiro@isye.gatech.edu
More informationLecture 8. Strong Duality Results. September 22, 2008
Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation
More informationZERO DUALITY GAP FOR CONVEX PROGRAMS: A GENERAL RESULT
ZERO DUALITY GAP FOR CONVEX PROGRAMS: A GENERAL RESULT EMIL ERNST AND MICHEL VOLLE Abstract. This article addresses a general criterion providing a zero duality gap for convex programs in the setting of
More informationConvex Optimization and Modeling
Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual
More informationLECTURE 12 LECTURE OUTLINE. Subgradients Fenchel inequality Sensitivity in constrained optimization Subdifferential calculus Optimality conditions
LECTURE 12 LECTURE OUTLINE Subgradients Fenchel inequality Sensitivity in constrained optimization Subdifferential calculus Optimality conditions Reading: Section 5.4 All figures are courtesy of Athena
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationConvex Stochastic optimization
Convex Stochastic optimization Ari-Pekka Perkkiö January 30, 2019 Contents 1 Practicalities and background 3 2 Introduction 3 3 Convex sets 3 3.1 Separation theorems......................... 5 4 Convex
More informationFunctional Analysis I
Functional Analysis I Course Notes by Stefan Richter Transcribed and Annotated by Gregory Zitelli Polar Decomposition Definition. An operator W B(H) is called a partial isometry if W x = X for all x (ker
More informationReview of Multi-Calculus (Study Guide for Spivak s CHAPTER ONE TO THREE)
Review of Multi-Calculus (Study Guide for Spivak s CHPTER ONE TO THREE) This material is for June 9 to 16 (Monday to Monday) Chapter I: Functions on R n Dot product and norm for vectors in R n : Let X
More informationMAT 578 FUNCTIONAL ANALYSIS EXERCISES
MAT 578 FUNCTIONAL ANALYSIS EXERCISES JOHN QUIGG Exercise 1. Prove that if A is bounded in a topological vector space, then for every neighborhood V of 0 there exists c > 0 such that tv A for all t > c.
More informationChapter 1. Optimality Conditions: Unconstrained Optimization. 1.1 Differentiable Problems
Chapter 1 Optimality Conditions: Unconstrained Optimization 1.1 Differentiable Problems Consider the problem of minimizing the function f : R n R where f is twice continuously differentiable on R n : P
More informationSubdifferential representation of convex functions: refinements and applications
Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential
More informationConvex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE
Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the
More informationMath 273a: Optimization Convex Conjugacy
Math 273a: Optimization Convex Conjugacy Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Convex conjugate (the Legendre transform) Let f be a closed proper
More informationConvexity in R n. The following lemma will be needed in a while. Lemma 1 Let x E, u R n. If τ I(x, u), τ 0, define. f(x + τu) f(x). τ.
Convexity in R n Let E be a convex subset of R n. A function f : E (, ] is convex iff f(tx + (1 t)y) (1 t)f(x) + tf(y) x, y E, t [0, 1]. A similar definition holds in any vector space. A topology is needed
More informationSecond Order Elliptic PDE
Second Order Elliptic PDE T. Muthukumar tmk@iitk.ac.in December 16, 2014 Contents 1 A Quick Introduction to PDE 1 2 Classification of Second Order PDE 3 3 Linear Second Order Elliptic Operators 4 4 Periodic
More informationLecture 2: Convex Sets and Functions
Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are
More informationExtreme Abridgment of Boyd and Vandenberghe s Convex Optimization
Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The
More informationLinear Programming. Larry Blume Cornell University, IHS Vienna and SFI. Summer 2016
Linear Programming Larry Blume Cornell University, IHS Vienna and SFI Summer 2016 These notes derive basic results in finite-dimensional linear programming using tools of convex analysis. Most sources
More information1 Inner Product Space
Ch - Hilbert Space 1 4 Hilbert Space 1 Inner Product Space Let E be a complex vector space, a mapping (, ) : E E C is called an inner product on E if i) (x, x) 0 x E and (x, x) = 0 if and only if x = 0;
More informationMath 341: Convex Geometry. Xi Chen
Math 341: Convex Geometry Xi Chen 479 Central Academic Building, University of Alberta, Edmonton, Alberta T6G 2G1, CANADA E-mail address: xichen@math.ualberta.ca CHAPTER 1 Basics 1. Euclidean Geometry
More information(c) For each α R \ {0}, the mapping x αx is a homeomorphism of X.
A short account of topological vector spaces Normed spaces, and especially Banach spaces, are basic ambient spaces in Infinite- Dimensional Analysis. However, there are situations in which it is necessary
More informationRepresentation of the polar cone of convex functions and applications
Representation of the polar cone of convex functions and applications G. Carlier, T. Lachand-Robert October 23, 2006 version 2.1 Abstract Using a result of Y. Brenier [1], we give a representation of the
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More informationLecture 7 Monotonicity. September 21, 2008
Lecture 7 Monotonicity September 21, 2008 Outline Introduce several monotonicity properties of vector functions Are satisfied immediately by gradient maps of convex functions In a sense, role of monotonicity
More informationIn English, this means that if we travel on a straight line between any two points in C, then we never leave C.
Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from
More informationIntroduction to Convex and Quasiconvex Analysis
Introduction to Convex and Quasiconvex Analysis J.B.G.Frenk Econometric Institute, Erasmus University, Rotterdam G.Kassay Faculty of Mathematics, Babes Bolyai University, Cluj August 27, 2001 Abstract
More informationLECTURE SLIDES ON BASED ON CLASS LECTURES AT THE CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS.
LECTURE SLIDES ON CONVEX ANALYSIS AND OPTIMIZATION BASED ON 6.253 CLASS LECTURES AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS http://web.mit.edu/dimitrib/www/home.html
More informationConvex Optimization M2
Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization
More informationON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction
J. Korean Math. Soc. 38 (2001), No. 3, pp. 683 695 ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE Sangho Kum and Gue Myung Lee Abstract. In this paper we are concerned with theoretical properties
More informationMaths 212: Homework Solutions
Maths 212: Homework Solutions 1. The definition of A ensures that x π for all x A, so π is an upper bound of A. To show it is the least upper bound, suppose x < π and consider two cases. If x < 1, then
More informationEE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17
EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko
More informationConvex Optimization Boyd & Vandenberghe. 5. Duality
5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized
More information2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.
Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following
More informationA SHORT INTRODUCTION TO BANACH LATTICES AND
CHAPTER A SHORT INTRODUCTION TO BANACH LATTICES AND POSITIVE OPERATORS In tis capter we give a brief introduction to Banac lattices and positive operators. Most results of tis capter can be found, e.g.,
More informationPROBLEMS. (b) (Polarization Identity) Show that in any inner product space
1 Professor Carl Cowen Math 54600 Fall 09 PROBLEMS 1. (Geometry in Inner Product Spaces) (a) (Parallelogram Law) Show that in any inner product space x + y 2 + x y 2 = 2( x 2 + y 2 ). (b) (Polarization
More informationContinuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces
Laboratoire d Arithmétique, Calcul formel et d Optimisation UMR CNRS 6090 Continuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces Emil Ernst Michel Théra Rapport de recherche n 2004-04
More informationInequality Constraints
Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the
More informationEconomics 204 Fall 2011 Problem Set 2 Suggested Solutions
Economics 24 Fall 211 Problem Set 2 Suggested Solutions 1. Determine whether the following sets are open, closed, both or neither under the topology induced by the usual metric. (Hint: think about limit
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationCONVEX OPTIMIZATION VIA LINEARIZATION. Miguel A. Goberna. Universidad de Alicante. Iberian Conference on Optimization Coimbra, November, 2006
CONVEX OPTIMIZATION VIA LINEARIZATION Miguel A. Goberna Universidad de Alicante Iberian Conference on Optimization Coimbra, 16-18 November, 2006 Notation X denotes a l.c. Hausdorff t.v.s and X its topological
More informationIterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem
Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences
More information