Shiqian Ma, MAT-258A: Numerical Optimization

Chapter 4: Subgradients
4.1. Subgradients
- definition
- subgradient calculus
- duality and optimality conditions
Basic inequality

recall the basic inequality for a convex differentiable f:

    f(y) ≥ f(x) + ∇f(x)^T (y − x)

- the first-order approximation of f at x is a global lower bound
- ∇f(x) defines a non-vertical supporting hyperplane to epi f at (x, f(x)):

    (∇f(x), −1)^T ((y, t) − (x, f(x))) ≤ 0,  ∀(y, t) ∈ epi f

what if f is not differentiable?
Subgradient

g is a subgradient of a convex function f at x ∈ dom f if

    f(y) ≥ f(x) + g^T (y − x),  ∀y ∈ dom f

properties
- f(x) + g^T (y − x) is a global lower bound on f(y)
- g defines a non-vertical supporting hyperplane to epi f at (x, f(x)):

    (g, −1)^T ((y, t) − (x, f(x))) ≤ 0,  ∀(y, t) ∈ epi f

- if f is convex and differentiable, then ∇f(x) is a subgradient of f at x
applications
- algorithms for nondifferentiable convex optimization
- unconstrained optimality: x* minimizes f(x) iff 0 ∈ ∂f(x*)
- KKT conditions with nondifferentiable functions
Example

f(x) = max{f_1(x), f_2(x)},  with f_1, f_2 convex and differentiable
- if f_1(x̂) > f_2(x̂), the subgradient of f at x̂ is ∇f_1(x̂)
- if f_1(x̂) < f_2(x̂), the subgradient of f at x̂ is ∇f_2(x̂)
- if f_1(x_0) = f_2(x_0), the subgradients at x_0 form the line segment
  [∇f_1(x_0), ∇f_2(x_0)]
4.1.1. Subdifferential

the subdifferential ∂f(x) of f at x is the set of all subgradients:

    ∂f(x) = {g | g^T (y − x) ≤ f(y) − f(x), ∀y ∈ dom f}

properties
- ∂f(x) is a closed convex set (possibly empty)
  (follows from the definition: ∂f(x) is an intersection of halfspaces)
- if x ∈ int dom f, then ∂f(x) is nonempty and bounded
  (see the next two pages for the proof)
proof: we show that ∂f(x) is nonempty when x ∈ int dom f

(x, f(x)) is in the boundary of the convex set epi f; therefore there exists a
supporting hyperplane to epi f at (x, f(x)):

    (a, b) ≠ 0,   (a, b)^T ((y, t) − (x, f(x))) ≤ 0,  ∀(y, t) ∈ epi f

- b > 0 gives a contradiction as t → ∞
- b = 0 gives a contradiction for y = x + εa with small ε > 0
- therefore b < 0, and g = a/|b| is a subgradient of f at x
proof: ∂f(x) is bounded when x ∈ int dom f

for small r > 0, define a set of 2n points

    B = {x ± r e_k | k = 1, ..., n} ⊆ dom f

and define M = max_{y ∈ B} f(y) < ∞

for every nonzero g ∈ ∂f(x), there is a point y ∈ B with

    f(y) ≥ f(x) + g^T (y − x) = f(x) + r ‖g‖_∞

(choose an index k with |g_k| = ‖g‖_∞, and take y = x + r sign(g_k) e_k)

therefore ∂f(x) is bounded:

    sup_{g ∈ ∂f(x)} ‖g‖_∞ ≤ (M − f(x)) / r
Examples

absolute value f(x) = |x|:

    ∂f(x) = {1}       if x > 0
            [−1, 1]   if x = 0
            {−1}      if x < 0

Euclidean norm f(x) = ‖x‖_2:

    ∂f(x) = {x / ‖x‖_2}       if x ≠ 0
    ∂f(x) = {g | ‖g‖_2 ≤ 1}   if x = 0
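These two examples are easy to check numerically. A minimal NumPy sketch (the function names are ours): each function returns one subgradient, picking g = 0 at the kink, which lies in the stated subdifferential.

```python
import numpy as np

def subgrad_abs(x):
    # one subgradient of f(x) = |x|: sign(x); at x = 0 this returns 0 ∈ [-1, 1]
    return float(np.sign(x))

def subgrad_l2(x):
    # one subgradient of f(x) = ||x||_2: x/||x||_2 for x != 0; g = 0 is valid at x = 0
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 0 else np.zeros_like(x)

# sanity check of the subgradient inequality f(y) >= f(x) + g^T (y - x)
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
g = subgrad_l2(x)
for _ in range(100):
    y = rng.standard_normal(3)
    assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-12
```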
Monotonicity

the subdifferential of a convex function is a monotone operator:

    (u − v)^T (x − y) ≥ 0,  ∀x, y, u ∈ ∂f(x), v ∈ ∂f(y)

proof: by definition,

    f(y) ≥ f(x) + u^T (y − x),   f(x) ≥ f(y) + v^T (x − y)

adding the two inequalities shows monotonicity
Examples of non-subdifferentiable functions

the following functions are not subdifferentiable at x = 0
- f: R → R, dom f = R_+, with f(x) = 1 if x = 0, f(x) = 0 if x > 0
- f: R → R, dom f = R_+, with f(x) = −√x

in both cases, the only supporting hyperplane to epi f at (0, f(0)) is vertical
Subgradients and sublevel sets

if g is a subgradient of f at x, then

    f(y) ≤ f(x)  ⟹  g^T (y − x) ≤ 0

nonzero subgradients at x define supporting hyperplanes to the sublevel set
{y | f(y) ≤ f(x)}
4.1.2. Subgradient calculus

weak subgradient calculus: rules for finding one subgradient
- sufficient for most nondifferentiable convex optimization algorithms
- if you can evaluate f(x), you can usually compute a subgradient

strong subgradient calculus: rules for finding ∂f(x) (all subgradients)
- some algorithms, optimality conditions, etc., need the entire subdifferential
- can be quite complicated

we will assume that x ∈ int dom f
Basic rules

differentiable functions: ∂f(x) = {∇f(x)} if f is differentiable at x

nonnegative combination: if h(x) = α_1 f_1(x) + α_2 f_2(x) with α_1, α_2 ≥ 0, then

    ∂h(x) = α_1 ∂f_1(x) + α_2 ∂f_2(x)

(r.h.s. is addition of sets)

affine transformation of variables: if h(x) = f(Ax + b), then

    ∂h(x) = A^T ∂f(Ax + b)
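The affine rule is one line of code. As a quick illustration (a sketch with our own helper names), take h(x) = ‖Ax + b‖_1, for which sign(·) gives one subgradient of the inner ℓ_1-norm:

```python
import numpy as np

def affine_subgrad(A, b, subgrad_f, x):
    # h(x) = f(Ax + b)  =>  A^T g is a subgradient of h at x, for any g in ∂f(Ax + b)
    return A.T @ subgrad_f(A @ x + b)

# example: h(x) = ||Ax + b||_1, using sign(.) as a subgradient of ||.||_1
A = np.array([[2.0, 0.0], [0.0, 3.0]])
b = np.array([-1.0, 0.0])
x = np.array([1.0, 1.0])
g = affine_subgrad(A, b, np.sign, x)   # = A^T sign(Ax + b)
```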
Pointwise maximum

    f(x) = max{f_1(x), ..., f_m(x)}

define I(x) = {i | f_i(x) = f(x)}, the set of active functions at x

weak result: to compute a subgradient at x, choose any k ∈ I(x), and any
subgradient of f_k at x

strong result:

    ∂f(x) = conv ∪_{i ∈ I(x)} ∂f_i(x)

- the convex hull of the union of subdifferentials of active functions at x
- if the f_i's are differentiable, ∂f(x) = conv{∇f_i(x) | i ∈ I(x)}
Example: piecewise-linear function

    f(x) = max_{i=1,...,m} (a_i^T x + b_i)

the subdifferential at x is a polyhedron

    ∂f(x) = conv{a_i | i ∈ I(x)},   with I(x) = {i | a_i^T x + b_i = f(x)}
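The weak rule for this example is essentially one line of NumPy: evaluate all affine pieces and return the a_k of any maximizing index (a sketch; the function name is ours).

```python
import numpy as np

def pwl_subgrad(A, b, x):
    # one subgradient of f(x) = max_i (a_i^T x + b_i): row a_k for any active index k
    k = int(np.argmax(A @ x + b))
    return A[k]

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 1.0])
g = pwl_subgrad(A, b, x)   # piece a_1 = (1, 0) is active at x
```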
Example: ℓ_1-norm

    f(x) = ‖x‖_1 = max_{s ∈ {−1,1}^n} s^T x

the subdifferential is a product of intervals

    ∂f(x) = J_1 × ... × J_n,   J_k = [−1, 1]   if x_k = 0
                                     {1}       if x_k > 0
                                     {−1}      if x_k < 0
Pointwise supremum

    f(x) = sup_{α ∈ A} f_α(x),   f_α(x) convex in x for every α

weak result: to find a subgradient at x̂,
- find any β for which f(x̂) = f_β(x̂) (assuming the supremum is attained)
- choose any g ∈ ∂f_β(x̂)

(partial) strong result: define I(x) = {α ∈ A | f_α(x) = f(x)}, then

    conv ∪_{α ∈ I(x)} ∂f_α(x) ⊆ ∂f(x)

equality requires extra conditions (e.g., A compact, f_α continuous in α)
Exercise: maximum eigenvalue

problem: explain how to find a subgradient of

    f(x) = λ_max(A(x)) = sup_{‖y‖_2 = 1} y^T A(x) y

where A(x) = A_0 + x_1 A_1 + ... + x_n A_n with symmetric coefficients A_i

solution: to find a subgradient at x̂,
- choose any unit eigenvector y with eigenvalue λ_max(A(x̂))
- the gradient of y^T A(x) y at x̂ is a subgradient of f:

    (y^T A_1 y, ..., y^T A_n y) ∈ ∂f(x̂)
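A numerical version of this exercise, sketched with numpy.linalg.eigh (the function name is ours): eigh returns eigenvalues of a symmetric matrix in ascending order, so the last column of the eigenvector matrix is a unit eigenvector for λ_max.

```python
import numpy as np

def lmax_subgrad(A0, As, x):
    # one subgradient of f(x) = lambda_max(A0 + x_1 A_1 + ... + x_n A_n)
    Ax = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    w, V = np.linalg.eigh(Ax)    # symmetric eigendecomposition, ascending eigenvalues
    y = V[:, -1]                 # unit eigenvector for lambda_max
    return np.array([y @ Ai @ y for Ai in As])

A0 = np.zeros((2, 2))
As = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
g = lmax_subgrad(A0, As, np.array([2.0, 1.0]))   # A(x) = diag(2, 1)
```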
Minimization

    f(x) = inf_y h(x, y),   h jointly convex in (x, y)

weak result: to find a subgradient at x̂,
- find ŷ that minimizes h(x̂, y) (assuming the minimum is attained)
- find a subgradient (g, 0) ∈ ∂h(x̂, ŷ)

proof: for all x, y,

    h(x, y) ≥ h(x̂, ŷ) + g^T (x − x̂) + 0^T (y − ŷ) = f(x̂) + g^T (x − x̂)

therefore f(x) = inf_y h(x, y) ≥ f(x̂) + g^T (x − x̂)
Exercise: Euclidean distance to a convex set

problem: explain how to find a subgradient of

    f(x) = inf_{y ∈ C} ‖x − y‖_2

where C is a closed convex set

solution: to find a subgradient at x̂,
- if f(x̂) = 0 (that is, x̂ ∈ C), take g = 0
- if f(x̂) > 0, find the projection ŷ = P(x̂) onto C and take

    g = (x̂ − ŷ) / ‖x̂ − ŷ‖_2 = (x̂ − P(x̂)) / ‖x̂ − P(x̂)‖_2
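For a concrete C with a closed-form projection, e.g. the unit Euclidean ball, the recipe above becomes a few lines (a sketch; the helper names are ours):

```python
import numpy as np

def proj_ball(x, r=1.0):
    # projection of x onto C = {y : ||y||_2 <= r}
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x

def dist_subgrad(x, proj):
    # one subgradient of f(x) = inf_{y in C} ||x - y||_2
    y = proj(x)
    d = np.linalg.norm(x - y)
    return np.zeros_like(x) if d == 0 else (x - y) / d

g = dist_subgrad(np.array([3.0, 4.0]), proj_ball)   # (x - P(x)) normalized
```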
Composition

    f(x) = h(f_1(x), ..., f_k(x)),   h convex and nondecreasing, f_i convex

weak result: to find a subgradient at x̂,
- find z ∈ ∂h(f_1(x̂), ..., f_k(x̂)) and g_i ∈ ∂f_i(x̂)
- then g = z_1 g_1 + ... + z_k g_k ∈ ∂f(x̂)

reduces to the standard chain rule for differentiable h, f_i

proof:

    f(x) ≥ h(f_1(x̂) + g_1^T (x − x̂), ..., f_k(x̂) + g_k^T (x − x̂))
         ≥ h(f_1(x̂), ..., f_k(x̂)) + z^T (g_1^T (x − x̂), ..., g_k^T (x − x̂))
         = f(x̂) + g^T (x − x̂)
Optimal value function

define h(u, v) as the optimal value of the convex problem

    min  f_0(x)
    s.t. f_i(x) ≤ u_i,  i = 1, ..., m
         Ax = b + v

(the functions f_i are convex; the optimization variable is x)

weak result: suppose h(û, v̂) is finite and strong duality holds with the dual

    max  inf_x ( f_0(x) + Σ_i λ_i (f_i(x) − û_i) + ν^T (Ax − b − v̂) )
    s.t. λ ≥ 0

if λ̂, ν̂ are optimal dual variables (for r.h.s. û, v̂), then (−λ̂, −ν̂) ∈ ∂h(û, v̂)
proof: by weak duality for the problem with r.h.s. u, v,

    h(u, v) ≥ inf_x ( f_0(x) + Σ_i λ̂_i (f_i(x) − u_i) + ν̂^T (Ax − b − v) )
            = inf_x ( f_0(x) + Σ_i λ̂_i (f_i(x) − û_i) + ν̂^T (Ax − b − v̂) )
              − λ̂^T (u − û) − ν̂^T (v − v̂)
            = h(û, v̂) − λ̂^T (u − û) − ν̂^T (v − v̂)
Expectation

    f(x) = E h(x, u),   u random, h convex in x for every u

weak result: to find a subgradient at x̂,
- choose a function u ↦ g(u) with g(u) ∈ ∂_x h(x̂, u)
- then g = E_u g(u) ∈ ∂f(x̂)

proof: by convexity of h and the definition of g(u),

    f(x) = E h(x, u) ≥ E ( h(x̂, u) + g(u)^T (x − x̂) ) = f(x̂) + g^T (x − x̂)
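When the expectation has no closed form, g = E_u g(u) can be estimated by sampling. A sketch on an example of our own choosing: f(x) = E|x − u| with u ~ Uniform[0, 1], whose per-sample subgradient is sign(x − u) and whose exact subgradient at x ∈ (0, 1) is P(u < x) − P(u > x) = 2x − 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_subgrad(x, n=200_000):
    # Monte Carlo estimate of g = E_u g(u), with g(u) = sign(x - u) in ∂_x |x - u|
    u = rng.uniform(0.0, 1.0, n)
    return float(np.mean(np.sign(x - u)))

g = expected_subgrad(0.75)   # exact subgradient is 2(0.75) - 1 = 0.5
```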
4.1.3. Duality and Optimality Conditions
Optimality conditions: unconstrained

x* minimizes f(x) if and only if 0 ∈ ∂f(x*)

proof: by definition,

    f(y) ≥ f(x*) + 0^T (y − x*) for all y  ⟺  0 ∈ ∂f(x*)
Example: piecewise-linear minimization

    f(x) = max_{i=1,...,m} (a_i^T x + b_i)

optimality condition:

    0 ∈ conv{a_i | i ∈ I(x*)}   (where I(x) = {i | a_i^T x + b_i = f(x)})

in other words, x* is optimal if and only if there is a λ with

    λ ≥ 0,  e^T λ = 1,  Σ_{i=1}^m λ_i a_i = 0,  λ_i = 0 for i ∉ I(x*)

these are the optimality conditions for the equivalent linear program and its dual

    min  t                       max  b^T λ
    s.t. Ax + b ≤ t e            s.t. A^T λ = 0
                                      λ ≥ 0,  e^T λ = 1
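The condition 0 ∈ conv{a_i | i ∈ I(x*)} can be tested numerically as a small feasibility LP in λ. A sketch using scipy.optimize.linprog (the function name and tolerance are ours):

```python
import numpy as np
from scipy.optimize import linprog

def pwl_optimal(A, b, x, tol=1e-8):
    # is 0 in conv{a_i : i active at x}?  look for lambda >= 0 with
    # e^T lambda = 1 and sum_i lambda_i a_i = 0 over the active rows
    vals = A @ x + b
    Aact = A[vals >= vals.max() - tol]
    m, n = Aact.shape
    A_eq = np.vstack([Aact.T, np.ones((1, m))])
    b_eq = np.append(np.zeros(n), 1.0)
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return bool(res.success)

# f(x) = max(x, -x) = |x| on R: optimal at x = 0, not at x = 1
A = np.array([[1.0], [-1.0]])
b = np.zeros(2)
```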
Optimality conditions: constrained

    min  f_0(x)
    s.t. f_i(x) ≤ 0,  i = 1, ..., m

from Lagrange duality: if strong duality holds, then x*, λ* are primal, dual
optimal if and only if
1. x* is primal feasible
2. λ* ≥ 0
3. λ*_i f_i(x*) = 0 for i = 1, ..., m
4. x* is a minimizer of

       L(x, λ*) = f_0(x) + Σ_{i=1}^m λ*_i f_i(x)
Karush-Kuhn-Tucker conditions (if dom f_i = R^n)

conditions 1, 2, 3 and

    0 ∈ ∂_x L(x*, λ*) = ∂f_0(x*) + Σ_{i=1}^m λ*_i ∂f_i(x*)

this generalizes the condition

    0 = ∇f_0(x*) + Σ_{i=1}^m λ*_i ∇f_i(x*)

for differentiable f_i