Shiqian Ma, MAT-258A: Numerical Optimization
Chapter 4. Subgradient


4.1. Subgradients
- definition
- subgradient calculus
- duality and optimality conditions

Basic inequality
recall the basic inequality for a convex differentiable f:
  f(y) ≥ f(x) + ∇f(x)^T (y − x)
- the first-order approximation of f at x is a global lower bound
- ∇f(x) defines a non-vertical supporting hyperplane to epi f at (x, f(x)):
  [∇f(x); −1]^T ([y; t] − [x; f(x)]) ≤ 0 for all (y, t) ∈ epi f
what if f is not differentiable?

Subgradient
g is a subgradient of a convex function f at x ∈ dom f if
  f(y) ≥ f(x) + g^T (y − x) for all y ∈ dom f
properties:
- f(x) + g^T (y − x) is a global lower bound on f(y)
- g defines a non-vertical supporting hyperplane to epi f at (x, f(x)):
  [g; −1]^T ([y; t] − [x; f(x)]) ≤ 0 for all (y, t) ∈ epi f
- if f is convex and differentiable, then ∇f(x) is a subgradient of f at x
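A quick numerical sanity check of the defining inequality (a minimal sketch; the choice f(x) = |x| and the test grid are illustrative, not from the slides): every g ∈ [−1, 1] satisfies the inequality at x = 0, while g = 1.5 does not.

    import numpy as np

    # f(x) = |x| is convex; the subgradient inequality at x = 0 reads
    #     f(y) >= f(0) + g * (y - 0)   for all y,
    # which holds exactly for g in [-1, 1].
    f = np.abs
    ys = np.linspace(-2.0, 2.0, 401)
    for g in (-1.0, -0.3, 0.0, 0.7, 1.0):      # valid subgradients at x = 0
        assert np.all(f(ys) >= g * ys - 1e-12)
    # g = 1.5 is not a subgradient: the affine lower bound fails for some y > 0
    assert np.any(f(ys) < 1.5 * ys)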

Applications
- algorithms for nondifferentiable convex optimization
- unconstrained optimality: x minimizes f(x) iff 0 ∈ ∂f(x)
- KKT conditions with nondifferentiable functions

Example
f(x) = max{f_1(x), f_2(x)}, with f_1, f_2 convex and differentiable
- at a point x_0 with f_1(x_0) = f_2(x_0), the subgradients form the line segment [∇f_1(x_0), ∇f_2(x_0)]
- if f_1(x̂) > f_2(x̂), the unique subgradient of f at x̂ is ∇f_1(x̂)
- if f_1(x̂) < f_2(x̂), the unique subgradient of f at x̂ is ∇f_2(x̂)

4.1.1. Subdifferential
the subdifferential ∂f(x) of f at x is the set of all subgradients:
  ∂f(x) = {g | g^T (y − x) ≤ f(y) − f(x) for all y ∈ dom f}
properties:
- ∂f(x) is a closed convex set (possibly empty)
  (follows from the definition: ∂f(x) is an intersection of halfspaces)
- if x ∈ int dom f, then ∂f(x) is nonempty and bounded
  (see the next two pages for the proof)

proof: we show that ∂f(x) is nonempty when x ∈ int dom f
- (x, f(x)) is in the boundary of the convex set epi f
- therefore there exists a supporting hyperplane to epi f at (x, f(x)): some (a, b) ≠ 0 with
  [a; b]^T ([y; t] − [x; f(x)]) ≤ 0 for all (y, t) ∈ epi f
- b > 0 gives a contradiction as t → ∞
- b = 0 gives a contradiction for y = x + εa with small ε > 0
- therefore b < 0, and g = a/|b| is a subgradient of f at x

proof: ∂f(x) is bounded when x ∈ int dom f
- for small r > 0, define a set of 2n points
  B = {x ± r e_k | k = 1, ..., n} ⊆ dom f
  and define M = max_{y ∈ B} f(y) < ∞
- for every nonzero g ∈ ∂f(x), there is a point y ∈ B with
  f(y) ≥ f(x) + g^T (y − x) = f(x) + r ‖g‖_∞
  (choose an index k with |g_k| = ‖g‖_∞, and take y = x + r sign(g_k) e_k)
- therefore ∂f(x) is bounded:
  sup_{g ∈ ∂f(x)} ‖g‖_∞ ≤ (M − f(x)) / r

Examples
absolute value f(x) = |x|:
  ∂f(x) = {1} if x > 0, [−1, 1] if x = 0, {−1} if x < 0
Euclidean norm f(x) = ‖x‖_2:
  ∂f(x) = {x / ‖x‖_2} if x ≠ 0, ∂f(x) = {g | ‖g‖_2 ≤ 1} if x = 0
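One subgradient of the Euclidean norm can be computed directly from these formulas (a minimal sketch; the tolerance and the test points are arbitrary choices):

    import numpy as np

    def subgrad_euclidean_norm(x, tol=1e-12):
        """Return one subgradient of f(x) = ||x||_2.

        For x != 0 the function is differentiable with gradient x / ||x||_2;
        at x = 0 the subdifferential is the unit ball, so g = 0 is a valid choice.
        """
        nrm = np.linalg.norm(x)
        return x / nrm if nrm > tol else np.zeros_like(x)

    print(subgrad_euclidean_norm(np.array([3.0, -4.0])))   # [ 0.6 -0.8]
    print(subgrad_euclidean_norm(np.zeros(2)))             # [ 0.  0.]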

Monotonicity
the subdifferential of a convex function is a monotone operator:
  (u − v)^T (x − y) ≥ 0 for all x, y, u ∈ ∂f(x), v ∈ ∂f(y)
proof: by definition,
  f(y) ≥ f(x) + u^T (y − x),  f(x) ≥ f(y) + v^T (x − y)
combining the two inequalities shows monotonicity

Examples of non-subdifferentiable functions
the following functions are not subdifferentiable at x = 0:
- f : R → R, dom f = R_+, with f(0) = 1 and f(x) = 0 for x > 0
- f : R → R, dom f = R_+, with f(x) = −√x
in each case the only supporting hyperplane to epi f at (0, f(0)) is vertical

Subgradients and sublevel sets
if g is a subgradient of f at x, then
  f(y) ≤ f(x)  ⟹  g^T (y − x) ≤ 0
nonzero subgradients at x define supporting hyperplanes to the sublevel set {y | f(y) ≤ f(x)}

4.1.2. Subgradient calculus
weak subgradient calculus: rules for finding one subgradient
- sufficient for most nondifferentiable convex optimization algorithms
- if you can evaluate f(x), you can usually compute a subgradient
strong subgradient calculus: rules for finding ∂f(x) (all subgradients)
- some algorithms, optimality conditions, etc., need the entire subdifferential
- can be quite complicated
we will assume that x ∈ int dom f

Basic rules
differentiable functions: ∂f(x) = {∇f(x)} if f is differentiable at x
nonnegative combination: if h(x) = α_1 f_1(x) + α_2 f_2(x) with α_1, α_2 ≥ 0, then
  ∂h(x) = α_1 ∂f_1(x) + α_2 ∂f_2(x)
(the right-hand side is an addition of sets)
affine transformation of variables: if h(x) = f(Ax + b), then
  ∂h(x) = A^T ∂f(Ax + b)

Pointwise maximum
f(x) = max{f_1(x), ..., f_m(x)}
define I(x) = {i | f_i(x) = f(x)}, the set of active functions at x
weak result: to compute a subgradient at x, choose any k ∈ I(x) and any subgradient of f_k at x
strong result:
  ∂f(x) = conv ∪_{i ∈ I(x)} ∂f_i(x)
the convex hull of the union of subdifferentials of the active functions at x
if the f_i are differentiable, ∂f(x) = conv{∇f_i(x) | i ∈ I(x)}

Example: piecewise-linear function
f(x) = max_{i=1,...,m} (a_i^T x + b_i)
the subdifferential at x is a polyhedron
  ∂f(x) = conv{a_i | i ∈ I(x)}
with I(x) = {i | a_i^T x + b_i = f(x)}
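A minimal sketch of the weak rule for this example (the data A, b and the test point are illustrative choices): evaluate all pieces and return the coefficient vector of any active one.

    import numpy as np

    def pwl_subgrad(A, b, x):
        """One subgradient of f(x) = max_i (a_i^T x + b_i) at x.

        Weak rule: any a_k with k active, i.e. a_k^T x + b_k = f(x).
        """
        vals = A @ x + b
        k = int(np.argmax(vals))          # an active index
        return vals[k], A[k]              # f(x) and a subgradient

    A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    b = np.array([0.0, 0.0, 1.0])
    print(pwl_subgrad(A, b, np.array([0.2, 0.5])))   # (0.5, array([0., 1.]))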

Example: ℓ_1-norm
f(x) = ‖x‖_1 = max_{s ∈ {−1,1}^n} s^T x
the subdifferential is a product of intervals:
  ∂f(x) = J_1 × ... × J_n,  J_k = {1} if x_k > 0, [−1, 1] if x_k = 0, {−1} if x_k < 0
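A one-line implementation of the weak rule for the ℓ_1-norm (a sketch; np.sign returns 0 for zero components, which lies in [−1, 1] and is therefore a valid choice):

    import numpy as np

    def subgrad_l1(x):
        """One subgradient of f(x) = ||x||_1.

        Componentwise: sign(x_k) if x_k != 0; any value in [-1, 1] if x_k = 0.
        """
        return np.sign(x)

    print(subgrad_l1(np.array([1.5, 0.0, -2.0])))   # [ 1.  0. -1.]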

Pointwise supremum
f(x) = sup_{α ∈ A} f_α(x), with f_α(x) convex in x for every α
weak result: to find a subgradient at x̂,
- find any β for which f(x̂) = f_β(x̂) (assuming the supremum is attained)
- choose any g ∈ ∂f_β(x̂)
(partial) strong result: define I(x) = {α ∈ A | f_α(x) = f(x)}; then
  conv ∪_{α ∈ I(x)} ∂f_α(x) ⊆ ∂f(x)
equality requires extra conditions (e.g., A compact, f_α continuous in α)

Exercise: maximum eigenvalue
problem: explain how to find a subgradient of
  f(x) = λ_max(A(x)) = sup_{‖y‖_2 = 1} y^T A(x) y
where A(x) = A_0 + x_1 A_1 + ... + x_n A_n with symmetric coefficients A_i
solution: to find a subgradient at x̂, choose any unit eigenvector y of A(x̂) with eigenvalue λ_max(A(x̂)); the gradient of y^T A(x) y at x̂ is a subgradient of f:
  (y^T A_1 y, ..., y^T A_n y) ∈ ∂f(x̂)
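A minimal numerical sketch of this recipe (the matrices A_0, A_1, A_2 and the point x̂ below are illustrative): form A(x̂), take a unit eigenvector for the largest eigenvalue, and assemble the vector of quadratic forms.

    import numpy as np

    def subgrad_lambda_max(A_list, x):
        """One subgradient of f(x) = lambda_max(A_0 + x_1 A_1 + ... + x_n A_n).

        A_list = [A_0, A_1, ..., A_n], all symmetric. With y a unit eigenvector
        for the largest eigenvalue of A(x), (y^T A_1 y, ..., y^T A_n y) works.
        """
        A0, rest = A_list[0], A_list[1:]
        Ax = A0 + sum(xi * Ai for xi, Ai in zip(x, rest))
        w, V = np.linalg.eigh(Ax)     # ascending eigenvalues, orthonormal eigenvectors
        y = V[:, -1]                  # unit eigenvector for lambda_max
        g = np.array([y @ Ai @ y for Ai in rest])
        return w[-1], g               # f(x) and a subgradient

    A0 = np.diag([1.0, 0.0]); A1 = np.array([[0.0, 1.0], [1.0, 0.0]]); A2 = np.eye(2)
    print(subgrad_lambda_max([A0, A1, A2], np.array([0.5, -0.2])))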

Minimization
f(x) = inf_y h(x, y), with h jointly convex in (x, y)
weak result: to find a subgradient at x̂,
- find ŷ that minimizes h(x̂, y) (assuming the minimum is attained)
- find a subgradient (g, 0) ∈ ∂h(x̂, ŷ)
proof: for all x, y,
  h(x, y) ≥ h(x̂, ŷ) + g^T (x − x̂) + 0^T (y − ŷ) = f(x̂) + g^T (x − x̂)
therefore
  f(x) = inf_y h(x, y) ≥ f(x̂) + g^T (x − x̂)
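A small worked instance of this rule, under the illustrative assumption h(x, y) = ‖x − y‖_2^2 + ‖y‖_1 (not from the slides): the inner minimizer ŷ is a componentwise soft-thresholding of x̂, and g = 2(x̂ − ŷ) satisfies (g, 0) ∈ ∂h(x̂, ŷ).

    import numpy as np

    def subgrad_partial_min(x):
        """One subgradient of f(x) = min_y ( ||x - y||_2^2 + ||y||_1 ).

        The inner minimizer is y_hat = sign(x) * max(|x| - 1/2, 0); by the
        weak rule, g = 2 * (x - y_hat) is a subgradient of f at x.
        """
        y_hat = np.sign(x) * np.maximum(np.abs(x) - 0.5, 0.0)
        fval = np.sum((x - y_hat) ** 2) + np.sum(np.abs(y_hat))
        return fval, 2.0 * (x - y_hat)

    print(subgrad_partial_min(np.array([1.0, 0.2, -3.0])))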

Exercise: Euclidean distance to a convex set
problem: explain how to find a subgradient of
  f(x) = inf_{y ∈ C} ‖x − y‖_2
where C is a closed convex set
solution: to find a subgradient at x̂,
- if f(x̂) = 0 (that is, x̂ ∈ C), take g = 0
- if f(x̂) > 0, find the projection ŷ = P(x̂) of x̂ onto C and take
  g = (x̂ − ŷ) / ‖x̂ − ŷ‖_2 = (x̂ − P(x̂)) / ‖x̂ − P(x̂)‖_2
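A minimal sketch for the special case where C is a Euclidean ball (an illustrative choice of C with an analytic projection):

    import numpy as np

    def subgrad_dist_to_ball(x, center, radius):
        """One subgradient of f(x) = dist(x, C), C = {y : ||y - center||_2 <= radius}."""
        d = x - center
        nrm = np.linalg.norm(d)
        if nrm <= radius:                     # x in C: f(x) = 0, take g = 0
            return 0.0, np.zeros_like(x)
        proj = center + radius * d / nrm      # projection of x onto C
        g = (x - proj) / np.linalg.norm(x - proj)
        return nrm - radius, g

    print(subgrad_dist_to_ball(np.array([3.0, 4.0]), np.zeros(2), 1.0))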

Composition
f(x) = h(f_1(x), ..., f_k(x)), with h convex and nondecreasing, f_i convex
weak result: to find a subgradient at x̂,
- find z ∈ ∂h(f_1(x̂), ..., f_k(x̂)) and g_i ∈ ∂f_i(x̂)
- then g = z_1 g_1 + ... + z_k g_k ∈ ∂f(x̂)
this reduces to the standard chain rule for differentiable h, f_i
proof:
  f(x) ≥ h(f_1(x̂) + g_1^T (x − x̂), ..., f_k(x̂) + g_k^T (x − x̂))    (h nondecreasing, subgradient inequality for each f_i)
       ≥ h(f_1(x̂), ..., f_k(x̂)) + z^T (g_1^T (x − x̂), ..., g_k^T (x − x̂))    (subgradient inequality for h)
       = f(x̂) + g^T (x − x̂)
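A minimal sketch of the composition rule for the illustrative choice h(u) = Σ_i max(u_i, 0) (convex and nondecreasing) with affine f_i, so that f(x) = Σ_i max(a_i^T x + b_i, 0); the data below are arbitrary.

    import numpy as np

    def subgrad_sum_of_hinges(A, b, x):
        """One subgradient of f(x) = sum_i max(a_i^T x + b_i, 0).

        z_i = 1 if f_i(x) > 0, z_i = 0 if f_i(x) < 0, and any value in [0, 1]
        if f_i(x) = 0 (we use 0); then g = sum_i z_i * a_i = A^T z.
        """
        r = A @ x + b
        z = (r > 0).astype(float)     # one valid choice of z in dh(r)
        return np.sum(np.maximum(r, 0.0)), A.T @ z

    A = np.array([[1.0, 2.0], [-1.0, 1.0], [0.0, -1.0]])
    b = np.array([-1.0, 0.0, 2.0])
    print(subgrad_sum_of_hinges(A, b, np.array([1.0, 1.0])))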

Optimal value function
define h(u, v) as the optimal value of the convex problem
  min f_0(x)
  s.t. f_i(x) ≤ u_i, i = 1, ..., m
       Ax = b + v
(the functions f_i are convex; the optimization variable is x)
weak result: suppose h(û, v̂) is finite and strong duality holds with the dual
  max_{λ, ν} inf_x ( f_0(x) + Σ_i λ_i (f_i(x) − û_i) + ν^T (Ax − b − v̂) )
  s.t. λ ≥ 0
if λ̂, ν̂ are optimal dual variables (for right-hand sides û, v̂), then (−λ̂, −ν̂) ∈ ∂h(û, v̂)

proof: by weak duality for the problem with right-hand sides u, v,
  h(u, v) ≥ inf_x ( f_0(x) + Σ_i λ̂_i (f_i(x) − u_i) + ν̂^T (Ax − b − v) )
          = inf_x ( f_0(x) + Σ_i λ̂_i (f_i(x) − û_i) + ν̂^T (Ax − b − v̂) ) − λ̂^T (u − û) − ν̂^T (v − v̂)
          = h(û, v̂) − λ̂^T (u − û) − ν̂^T (v − v̂)
(the last equality uses strong duality for the problem with right-hand sides û, v̂); this is exactly the statement that (−λ̂, −ν̂) ∈ ∂h(û, v̂)

Expectation
f(x) = E h(x, u), with u random and h(x, u) convex in x for every u
weak result: to find a subgradient at x̂,
- choose a function u ↦ g(u) with g(u) ∈ ∂_x h(x̂, u)
- then g = E_u g(u) ∈ ∂f(x̂)
proof: by convexity of h and the definition of g(u),
  f(x) = E h(x, u) ≥ E ( h(x̂, u) + g(u)^T (x − x̂) ) = f(x̂) + g^T (x − x̂)
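A minimal sketch for the illustrative case f(x) = E |x − u| with u uniform on [0, 1] (my choice, not from the slides): per-sample subgradients sign(x − u) are averaged, which gives a Monte Carlo estimate of the exact subgradient E sign(x − u) = 2x − 1 on (0, 1).

    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_subgrad(x, n_samples=100_000):
        """Monte Carlo estimate of g = E sign(x - u), a subgradient of
        f(x) = E |x - u| with u ~ Uniform[0, 1]; each sign(x - u) is a
        subgradient of |x - u| in x, and the weak rule averages them."""
        u = rng.random(n_samples)
        return np.mean(np.sign(x - u))

    # exact value on (0, 1): E sign(x - u) = P(u < x) - P(u > x) = 2x - 1
    print(stochastic_subgrad(0.3), 2 * 0.3 - 1)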

4.1.3. Duality and Optimality Conditions

Optimality conditions: unconstrained
x* minimizes f(x) if and only if 0 ∈ ∂f(x*)
proof: by definition,
  f(y) ≥ f(x*) + 0^T (y − x*) for all y  ⟺  0 ∈ ∂f(x*)

Example: piecewise-linear minimization
f(x) = max_{i=1,...,m} (a_i^T x + b_i)
optimality condition:
  0 ∈ conv{a_i | i ∈ I(x*)}, where I(x) = {i | a_i^T x + b_i = f(x)}
in other words, x* is optimal if and only if there is a λ with
  λ ≥ 0,  e^T λ = 1,  Σ_{i=1}^m λ_i a_i = 0,  λ_i = 0 for i ∉ I(x*)
(e denotes the vector of ones)
these are the optimality conditions for the equivalent linear program
  min t  s.t. Ax + b ≤ t e          (primal)
  max b^T λ  s.t. A^T λ = 0, λ ≥ 0, e^T λ = 1          (dual)
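The equivalent primal linear program can be solved directly; the sketch below (assuming SciPy is available; the data A, b are illustrative) minimizes t subject to Ax + b ≤ t e in the variables (x, t) and reports the active set, whose coefficient vectors a_i should contain 0 in their convex hull.

    import numpy as np
    from scipy.optimize import linprog

    # minimize f(x) = max_i (a_i^T x + b_i) via the LP: min t s.t. A x + b <= t e
    A = np.array([[1.0, 1.0], [-1.0, 2.0], [0.0, -1.0]])
    b = np.array([0.0, 1.0, 2.0])
    m, n = A.shape

    c = np.zeros(n + 1); c[-1] = 1.0                 # objective: t
    A_ub = np.hstack([A, -np.ones((m, 1))])          # A x - t e <= -b
    res = linprog(c, A_ub=A_ub, b_ub=-b,
                  bounds=[(None, None)] * (n + 1), method="highs")
    x_star, f_star = res.x[:n], res.x[-1]
    active = np.isclose(A @ x_star + b, f_star, atol=1e-8)
    print(x_star, f_star, np.flatnonzero(active))    # 0 lies in conv{a_i : i active}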

Optimality conditions: constrained
  min f_0(x)  s.t. f_i(x) ≤ 0, i = 1, ..., m
from Lagrange duality: if strong duality holds, then x*, λ* are primal and dual optimal if and only if
1. x* is primal feasible
2. λ* ≥ 0
3. λ*_i f_i(x*) = 0 for i = 1, ..., m
4. x* is a minimizer of L(x, λ*) = f_0(x) + Σ_{i=1}^m λ*_i f_i(x)

Karush-Kuhn-Tucker conditions (if dom f_i = R^n)
conditions 1, 2, 3 and
  0 ∈ ∂_x L(x*, λ*) = ∂f_0(x*) + Σ_{i=1}^m λ*_i ∂f_i(x*)
this generalizes the condition
  0 = ∇f_0(x*) + Σ_{i=1}^m λ*_i ∇f_i(x*)
for differentiable f_i
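A small worked instance of the generalized condition (an illustrative problem, not from the slides): minimize f_0(x) = |x| over x ∈ R subject to f_1(x) = x − 1 ≤ 0. Take x* = 0 and λ*_1 = 0: x* is feasible, λ* ≥ 0, λ*_1 f_1(x*) = 0, and
  0 ∈ ∂f_0(x*) + λ*_1 ∂f_1(x*) = [−1, 1] + {0},
so all conditions hold and x* = 0 is optimal, even though ∇f_0(x*) does not exist and the differentiable form of the condition cannot be written at this point.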