
Convex Optimization (EE227A: UC Berkeley)
Lecture 4: Conjugates, subdifferentials
Suvrit Sra
31 Jan, 2013

Organizational
HW1 due: 14 Feb 2013, in class.
- Please LaTeX your solutions (contact the TA if this is an issue)
- Discussion with classmates is OK
- Each person must submit his/her individual solutions
- Acknowledge any help you receive
- Do not copy! Make sure you understand the solution you submit
- Cite any source that you use
- Have fun solving problems!

Recap
- Eigenvalues, singular values, positive definiteness
- Convex sets: $\theta_1 x + \theta_2 y \in C$ whenever $\theta_1 + \theta_2 = 1$, $\theta_i \ge 0$
- Convex functions, midpoint convexity, recognizing convexity
- Norms, mixed norms, matrix norms, dual norms
- Indicator, distance function, minimum of jointly convex functions
- Brief mention of other forms of convexity
- $f\bigl(\tfrac{x+y}{2}\bigr) \le \tfrac12[f(x) + f(y)]$ plus continuity $\Rightarrow$ $f$ is cvx
- $\nabla^2 f(x) \succeq 0$ implies $f$ is cvx

Fenchel Conjugate

Dual norms
Def. Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$. Its dual norm is
  $\|u\|_* := \sup\,\{ u^T x : \|x\| \le 1 \}$.
- Exercise: Verify that we may write $\|u\|_* = \sup_{x \ne 0} \frac{u^T x}{\|x\|}$.
- Exercise: Verify that $\|u\|_*$ is a norm. (For the triangle inequality: $\|u + v\|_* = \sup\,\{ (u+v)^T x : \|x\| \le 1 \}$, and $\sup (A + B) \le \sup A + \sup B$.)
- Exercise: Let $1/p + 1/q = 1$, where $p, q \ge 1$. Show that $\|\cdot\|_q$ is dual to $\|\cdot\|_p$. In particular, the $\ell_2$-norm is self-dual. Hint: use Hölder's inequality: $u^T v \le \|u\|_p \|v\|_q$.
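A quick numerical sanity check of the $\ell_p/\ell_q$ duality (a sketch; `dual_norm_lp` is my own helper, and random search only lower-bounds the sup, so the estimate approaches $\|u\|_q$ from below):

```python
import numpy as np

def dual_norm_lp(u, p, n_samples=200000, seed=0):
    """Estimate ||u||_* = sup { u^T x : ||x||_p <= 1 } by random search.

    Only a lower bound on the sup, but with many samples it approaches
    ||u||_q for 1/p + 1/q = 1 (Hoelder duality).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, u.size))
    # Project every sample onto the unit l_p sphere.
    x /= np.linalg.norm(x, ord=p, axis=1, keepdims=True)
    return np.max(x @ u)

u = np.array([3.0, -1.0, 2.0])
p, q = 1.5, 3.0                       # 1/1.5 + 1/3 = 1
print(dual_norm_lp(u, p))             # estimate, slightly below ...
print(np.linalg.norm(u, ord=q))       # ... the exact dual norm ||u||_q
```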

Fenchel conjugate
Def. The Fenchel conjugate of a function $f$ is
  $f^*(z) := \sup_{x \in \operatorname{dom} f}\; x^T z - f(x)$.
Note: $f^*$ is a pointwise (over $x$) sup of linear functions of $z$, so it is cvx!

Example. Let $f(x) = \|x\|$. We have $f^*(z) = \mathbb{I}_{\{\|z\|_* \le 1\}}(z)$. That is, the conjugate of a norm is the indicator function of the dual-norm unit ball.
Consider two cases: (i) $\|z\|_* > 1$; (ii) $\|z\|_* \le 1$.
- Case (i): by definition of the dual norm (a sup over $z^T u$) there is a $u$ s.t. $\|u\| \le 1$ and $z^T u > 1$. In $f^*(z) = \sup_x\, x^T z - f(x)$, rewrite $x = \alpha u$ and let $\alpha \to \infty$. Then $z^T x - \|x\| = \alpha z^T u - \alpha\|u\| = \alpha(z^T u - \|u\|) \to \infty$.
- Case (ii): since $z^T x \le \|x\|\,\|z\|_*$, we get $x^T z - \|x\| \le \|x\|(\|z\|_* - 1) \le 0$. Here $x = 0$ maximizes $\|x\|(\|z\|_* - 1)$, hence $f^*(z) = 0$.
Thus $f^*(z) = +\infty$ in case (i) and $0$ in case (ii), as desired.
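A small numerical illustration of this dichotomy (a sketch, using $f = \|\cdot\|_1$, whose dual norm is $\ell_\infty$; `conjugate_of_l1_box` is my own helper that computes the sup exactly over a growing box, since the objective is separable across coordinates):

```python
import numpy as np

def conjugate_of_l1_box(z, r):
    """Exact sup of z^T x - ||x||_1 over the box [-r, r]^n.

    The objective is separable: each coordinate contributes
    sup over |x_i| <= r of (x_i z_i - |x_i|) = r * max(|z_i| - 1, 0).
    """
    return r * np.sum(np.maximum(np.abs(z) - 1.0, 0.0))

z_in = np.array([0.5, -0.8])    # ||z||_inf <= 1: value stays 0 for every r
z_out = np.array([1.5, 0.2])    # ||z||_inf > 1: value grows linearly in r
for r in (1.0, 10.0, 100.0):
    print(r, conjugate_of_l1_box(z_in, r), conjugate_of_l1_box(z_out, r))
# As r -> infinity: f*(z_in) = 0 but f*(z_out) = +infinity,
# i.e. the indicator of the dual (l_inf) unit ball.
```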

Fenchel conjugate
Example. $f(x) = ax + b$; then $f^*(z) = \sup_x\, zx - (ax + b) = +\infty$ if $z - a \ne 0$. Thus $\operatorname{dom} f^* = \{a\}$, and $f^*(a) = -b$.

Example. Let $a \ge 0$, and set $f(x) = -\sqrt{a^2 - x^2}$ if $|x| \le a$, and $+\infty$ otherwise. Then $f^*(z) = a\sqrt{1 + z^2}$.

Example. $f(x) = \frac12 x^T A x$, where $A \succ 0$. Then $f^*(z) = \frac12 z^T A^{-1} z$.

Exercise: If $f(x) = \max(0,\, 1 - x)$, then $\operatorname{dom} f^*$ is $[-1, 0]$, and within this domain, $f^*(z) = z$. Hint: analyze the cases $\max(0, 1-x) = 0$ and $\max(0, 1-x) = 1 - x$.
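The quadratic example can be checked numerically: the inner problem $\sup_x\, x^T z - \frac12 x^T A x$ is solved by $x^\star = A^{-1} z$ (set the gradient to zero). A minimal sketch, with a randomly generated positive definite $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = B @ B.T + 3 * np.eye(3)        # a random A > 0
z = rng.standard_normal(3)

# Maximizer of x^T z - 0.5 x^T A x is x* = A^{-1} z, so f*(z) = 0.5 z^T A^{-1} z.
x_star = np.linalg.solve(A, z)
f_star = x_star @ z - 0.5 * x_star @ A @ x_star
print(f_star)
print(0.5 * z @ np.linalg.solve(A, z))   # the two values agree
```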

Subdifferentials

First order global underestimator
If $f$ is convex and differentiable, its first-order Taylor expansion is a global underestimator:
  $f(x) \ge f(y) + \langle \nabla f(y),\, x - y \rangle$ for all $x$.
[Figure: a convex graph with its tangent line at $y$ lying below the graph.]
If $f$ is not differentiable at $y$, the same picture holds with $\nabla f(y)$ replaced by a vector $g$:
  $f(x) \ge f(y) + \langle g,\, x - y \rangle$.
[Figure: several lines $f(y) + \langle g_i,\, x - y \rangle$ through $(y, f(y))$, all below the graph.]

Subgradients
[Figure: lines $f(y) + \langle g_i,\, x - y \rangle$ through $(y, f(y))$; the vectors $g_1, g_2, g_3$ are subgradients at $y$.]

Subgradients: basic facts
- If $f$ is convex and differentiable: $\nabla f(y)$ is the unique subgradient at $y$.
- A vector $g$ is a subgradient at a point $y$ if and only if $f(y) + \langle g,\, x - y \rangle$ globally underestimates $f(x)$.
- Usually, one subgradient costs approximately as much to compute as $f(x)$.
- Determining all subgradients at a given point is difficult. Subgradient calculus is a great achievement in convex analysis.
- Without convexity, things become wild! (advanced course)

Subgradients: example
$f(x) := \max(f_1(x), f_2(x))$; both $f_1$, $f_2$ convex and differentiable.
[Figure: graphs of $f_1$ and $f_2$ and their pointwise max $f$, with a kink at the crossing point $y$.]
- $f_1(x) > f_2(x)$: the unique subgradient of $f$ is $\nabla f_1(x)$.
- $f_1(x) < f_2(x)$: the unique subgradient of $f$ is $\nabla f_2(x)$.
- $f_1(y) = f_2(y)$: the subgradients form the segment $[\nabla f_1(y),\, \nabla f_2(y)]$ (imagine all supporting lines turning about the point $y$).
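A sketch of this case analysis in code, with illustrative choices $f_1(x) = x^2$ and $f_2(x) = (x-1)^2$ (my own example functions, not from the slides):

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with smooth convex pieces (illustrative choices).
f1, df1 = lambda x: x**2, lambda x: 2*x
f2, df2 = lambda x: (x - 1)**2, lambda x: 2*(x - 1)

def subdiff_of_max(x, tol=1e-12):
    """Subdifferential of max(f1, f2) at x, returned as an interval [lo, hi]."""
    if f1(x) > f2(x) + tol:
        g = df1(x); return (g, g)        # f1 active: unique subgradient df1(x)
    if f2(x) > f1(x) + tol:
        g = df2(x); return (g, g)        # f2 active: unique subgradient df2(x)
    g1, g2 = df1(x), df2(x)              # tie: the whole segment [g1, g2]
    return (min(g1, g2), max(g1, g2))

print(subdiff_of_max(0.5))   # f1 = f2 at x = 1/2: the segment [-1, 1]
print(subdiff_of_max(2.0))   # f1 > f2: the single slope 4.0
```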

Subgradients
Def. A vector $g \in \mathbb{R}^n$ is called a subgradient of $f$ at a point $y$ if for all $x \in \operatorname{dom} f$ it holds that
  $f(x) \ge f(y) + \langle g,\, x - y \rangle$.

Subdifferential
Def. The set of all subgradients at $y$ is denoted $\partial f(y)$. This set is called the subdifferential of $f$ at $y$.
If $f$ is convex, $\partial f(x)$ is nice:
- If $x$ is in the relative interior of $\operatorname{dom} f$, then $\partial f(x)$ is nonempty.
- If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$.
- Conversely, if $\partial f(x) = \{g\}$ is a singleton, then $f$ is differentiable at $x$ and $g = \nabla f(x)$.

Subdifferential example
$f(x) = |x|$.
[Figure: graph of $|x|$ with slopes $-1$ and $+1$.]
  $\partial f(x) = \{-1\}$ for $x < 0$; $\{+1\}$ for $x > 0$; $[-1, 1]$ at $x = 0$.

More examples
Example. $f(x) = \|x\|_2$. Then
  $\partial f(x) = \{ x/\|x\|_2 \}$ if $x \ne 0$, and $\{ z : \|z\|_2 \le 1 \}$ if $x = 0$.
Proof (case $x = 0$): $g \in \partial f(0)$ means $\|z\|_2 \ge \|0\|_2 + \langle g,\, z - 0 \rangle$, i.e., $\|z\|_2 \ge \langle g, z \rangle$ for all $z$, which holds iff $\|g\|_2 \le 1$.
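A quick randomized check of the $x = 0$ case (a sketch; `is_subgradient_at_zero` is my own helper, and testing the inequality on sample points is only a necessary condition, but it illustrates the claim):

```python
import numpy as np

rng = np.random.default_rng(0)

def is_subgradient_at_zero(g, n_trials=10000):
    """Check ||z||_2 >= <g, z> on random test points z (necessary condition)."""
    z = rng.standard_normal((n_trials, g.size))
    return bool(np.all(np.linalg.norm(z, axis=1) >= z @ g - 1e-12))

g_ok = np.array([0.6, 0.8])    # ||g||_2 = 1: a valid subgradient of ||.||_2 at 0
g_bad = np.array([0.9, 0.9])   # ||g||_2 > 1: not a subgradient
print(is_subgradient_at_zero(g_ok))    # True
print(is_subgradient_at_zero(g_bad))   # False with high probability
                                       # (e.g. z = g_bad itself violates it)
```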

More examples
Example. A convex function need not be subdifferentiable everywhere. Let
  $f(x) := -(1 - \|x\|_2^2)^{1/2}$ if $\|x\|_2 \le 1$, and $+\infty$ otherwise.
$f$ is differentiable for all $x$ with $\|x\|_2 < 1$, but $\partial f(x) = \emptyset$ whenever $\|x\|_2 \ge 1$ (the slope blows up at the boundary).

Calculus

Recall basic calculus
If $f$ and $k$ are differentiable, we know that
- Addition: $\nabla(f + k)(x) = \nabla f(x) + \nabla k(x)$
- Scaling: $\nabla(\alpha f)(x) = \alpha \nabla f(x)$
Chain rule. If $f : \mathbb{R}^n \to \mathbb{R}^m$ and $k : \mathbb{R}^m \to \mathbb{R}^p$, let $h : \mathbb{R}^n \to \mathbb{R}^p$ be the composition $h(x) = (k \circ f)(x) = k(f(x))$. Then
  $Dh(x) = Dk(f(x))\, Df(x)$.
Example. If $f : \mathbb{R}^n \to \mathbb{R}$ and $k : \mathbb{R} \to \mathbb{R}$, then using the fact that $\nabla h(x) = [Dh(x)]^T$, we obtain $\nabla h(x) = k'(f(x))\, \nabla f(x)$.
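A finite-difference sanity check of this smooth chain rule, with hypothetical choices $f(x) = x_1^2 + 3x_2$ and $k(t) = \sin t$ (my own example functions):

```python
import numpy as np

# h(x) = k(f(x)) with f: R^2 -> R and k: R -> R.
f  = lambda x: x[0]**2 + 3*x[1]
df = lambda x: np.array([2*x[0], 3.0])     # gradient of f
k, dk = np.sin, np.cos

x = np.array([0.7, -0.2])
grad_h = dk(f(x)) * df(x)                  # chain rule: grad h = k'(f(x)) grad f(x)

# Central finite-difference approximation of grad h for comparison.
eps = 1e-6
fd = np.array([(k(f(x + eps*e)) - k(f(x - eps*e))) / (2*eps) for e in np.eye(2)])
print(np.allclose(grad_h, fd, atol=1e-6))  # True
```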

Subgradient calculus
- Finding one subgradient within $\partial f(x)$
- Determining the entire subdifferential $\partial f(x)$ at a point $x$
- Do we have a chain rule? Usually not easy!

Subgradient calculus
- If $f$ is differentiable: $\partial f(x) = \{\nabla f(x)\}$.
- Scaling: for $\alpha > 0$, $\partial(\alpha f)(x) = \alpha\, \partial f(x) = \{\alpha g : g \in \partial f(x)\}$.
- Addition: $\partial(f + k)(x) = \partial f(x) + \partial k(x)$ (set addition).
- Chain rule (affine): let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $f : \mathbb{R}^m \to \mathbb{R}$, and $h : \mathbb{R}^n \to \mathbb{R}$ be given by $h(x) = f(Ax + b)$. Then $\partial h(x) = A^T \partial f(Ax + b)$.
- Chain rule (composition): $h = f \circ k$, where $k : X \to Y$ is differentiable. Then $\partial h(x) = [Dk(x)]^T \partial f(k(x))$.
- Max function: if $f(x) := \max_{1 \le i \le m} f_i(x)$, then
  $\partial f(x) = \operatorname{conv}\,\{ \partial f_i(x) : f_i(x) = f(x) \}$,
  the convex hull of the subdifferentials of the active functions at $x$.
- Conjugation: $z \in \partial f(x)$ if and only if $x \in \partial f^*(z)$.
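A sketch of the affine chain rule in code, using $f = \|\cdot\|_1$ (so $\operatorname{sign}(y) \in \partial f(y)$, choosing $0$ at kinks, since $0 \in [-1,1]$) and checking the subgradient inequality at random points; the helper names are my own:

```python
import numpy as np

def subgrad_l1(y):
    """One subgradient of ||.||_1 at y: sign(y), with 0 chosen at kinks."""
    return np.sign(y)

def subgrad_affine_composition(A, b, x):
    """One subgradient of h(x) = ||Ax + b||_1 via dh(x) = A^T df(Ax + b)."""
    return A.T @ subgrad_l1(A @ x + b)

# Check the subgradient inequality h(z) >= h(x) + g^T (z - x) at random z.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((4, 3)), rng.standard_normal(4)
x = rng.standard_normal(3)
g = subgrad_affine_composition(A, b, x)
h = lambda v: np.linalg.norm(A @ v + b, ord=1)
zs = rng.standard_normal((1000, 3))
print(all(h(z) >= h(x) + g @ (z - x) - 1e-9 for z in zs))  # True
```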

Examples
It can happen that $\partial(f_1 + f_2)(x) \supsetneq \partial f_1(x) + \partial f_2(x)$.
Example. Define $f_1$ and $f_2$ by
  $f_1(x) := -2\sqrt{x}$ if $x \ge 0$, $+\infty$ if $x < 0$;  and  $f_2(x) := +\infty$ if $x > 0$, $-2\sqrt{-x}$ if $x \le 0$.
Then $f_1 + f_2 = \max\{f_1, f_2\} = \mathbb{I}_{\{0\}}$, whereby $\partial(f_1 + f_2)(0) = \mathbb{R}$. But $\partial f_1(0) = \partial f_2(0) = \emptyset$ (the one-sided slopes are infinite at $0$).
However, $\partial f_1(x) + \partial f_2(x) \subseteq \partial(f_1 + f_2)(x)$ always holds.

Examples
Example. $f(x) = \|x\|_\infty$. Then $\partial f(0) = \operatorname{conv}\,\{\pm e_1, \ldots, \pm e_n\}$, where $e_i$ is the $i$-th canonical basis vector (column of the identity matrix).
To prove this, notice that $f(x) = \max_{1 \le i \le n} |e_i^T x|$.

Example (S. Boyd)
Example. Let $f(x) = \max\,\{ s^T x : s_i \in \{-1, 1\} \}$ (a max over $2^n$ members); this is exactly $\|x\|_1$.
[Figure, $n = 2$: $\partial f$ at $x = (0,0)$ is the square $\operatorname{conv}\{(\pm 1, \pm 1)\}$; at $x = (1,0)$ it is the segment $\{1\} \times [-1, +1]$; at $x = (1,1)$ it is the single point $(1,1)$.]
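Since this $f$ is the $\ell_1$ norm, $\partial f(x)$ is a product of coordinate-wise intervals; a sketch (with my own helper `subdiff_l1`) that reproduces the three sets in the figure:

```python
import numpy as np

def subdiff_l1(x, tol=1e-12):
    """Coordinate-wise subdifferential of ||x||_1 as a list of intervals:
    {sign(x_i)} where x_i != 0, and [-1, 1] where x_i = 0."""
    return [(np.sign(xi), np.sign(xi)) if abs(xi) > tol else (-1.0, 1.0)
            for xi in x]

for pt in ([0.0, 0.0], [1.0, 0.0], [1.0, 1.0]):
    print(pt, subdiff_l1(np.array(pt)))
# (0,0): [-1,1] x [-1,1], the square; (1,0): {1} x [-1,1], a segment;
# (1,1): {(1,1)}, a single point.
```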

Rules for subgradients

Subgradient for pointwise sup
  $f(x) := \sup_{y \in Y} h(x, y)$
Getting $\partial f(x)$ is complicated! A simple way to obtain some $g \in \partial f(x)$:
- Pick any $y^\star$ for which $h(x, y^\star) = f(x)$
- Pick any subgradient $g \in \partial_x h(x, y^\star)$
- This $g \in \partial f(x)$:
  $h(z, y^\star) \ge h(x, y^\star) + g^T(z - x)$
  $h(z, y^\star) \ge f(x) + g^T(z - x)$
  $f(z) \ge h(z, y^\star)$ (because of the sup)
  $f(z) \ge f(x) + g^T(z - x)$.

Example
Suppose $a_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$, and $f(x) := \max_{1 \le i \le n} (a_i^T x + b_i)$.
- This $f$ is a max (in fact, over a finite number of terms).
- Suppose $f(x) = a_k^T x + b_k$ for some index $k$.
- Here the active function is $f_k(x) = a_k^T x + b_k$, and $\partial f_k(x) = \{\nabla f_k(x)\} = \{a_k\}$.
- Hence, $g = a_k \in \partial f(x)$ works!
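This recipe is essentially one line of code for the max-affine case (a sketch; any maximizing index gives a valid subgradient, and `subgrad_max_affine` is my own helper):

```python
import numpy as np

def subgrad_max_affine(A, b, x):
    """One subgradient of f(x) = max_i (a_i^T x + b_i): the a_k of an active term."""
    k = np.argmax(A @ x + b)   # any index attaining the max works
    return A[k]

rng = np.random.default_rng(0)
A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
x = rng.standard_normal(3)
g = subgrad_max_affine(A, b, x)
f = lambda v: np.max(A @ v + b)
z = rng.standard_normal(3)
print(f(z) >= f(x) + g @ (z - x) - 1e-12)  # subgradient inequality holds: True
```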

Subgradient of expectation
Suppose $f = \mathbb{E}\, f(x, u)$, where $f(\cdot, u)$ is convex in $x$ for each $u$ (a random variable):
  $f(x) := \int f(x, u)\, p(u)\, du$.
- For each $u$, choose any $g(x, u) \in \partial_x f(x, u)$.
- Then $g = \int g(x, u)\, p(u)\, du = \mathbb{E}\, g(x, u) \in \partial f(x)$.
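In practice the integral is replaced by sampling, which is the basis of stochastic subgradient methods. A sketch with the hypothetical choice $f(x, u) = |x - u|$, $u \sim N(0,1)$ (my own example), where the averaged subgradient $\mathbb{E}\,\operatorname{sign}(x - u) = 2\Phi(x) - 1$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_subgradient(x, n_samples=100000):
    """Monte Carlo estimate of a subgradient of F(x) = E|x - u|, u ~ N(0,1):
    average of sign(x - u), each of which is a subgradient of |. - u| at x."""
    u = rng.standard_normal(n_samples)
    return np.mean(np.sign(x - u))

print(sampled_subgradient(0.0))   # near 0 (= 2*Phi(0) - 1)
print(sampled_subgradient(1.0))   # near 2*Phi(1) - 1 ~ 0.683
```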

Subgradient of composition
Suppose $h : \mathbb{R}^n \to \mathbb{R}$ is cvx and nondecreasing, and each $f_i$ is cvx:
  $f(x) := h(f_1(x), f_2(x), \ldots, f_n(x))$.
To find a vector $g \in \partial f(x)$, we may:
- For $i = 1$ to $n$, compute $g_i \in \partial f_i(x)$
- Compute $u \in \partial h(f_1(x), \ldots, f_n(x))$
- Set $g = u_1 g_1 + u_2 g_2 + \cdots + u_n g_n$; this $g \in \partial f(x)$
Compare with the smooth chain rule $\nabla f(x) = J^T \nabla h(f_1(x), \ldots, f_n(x))$, where $J$ is the matrix whose rows are the $\nabla f_i(x)^T$.
Exercise: Verify $g \in \partial f(x)$ by showing $f(z) \ge f(x) + g^T(z - x)$.
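A sketch of this recipe with $h = \max$ (convex and nondecreasing) and illustrative scalar $f_i$ of my own choosing; the weight vector $u$ is taken as the indicator of one active coordinate, which lies in $\partial h$:

```python
import numpy as np

# f(x) = h(f_1(x), f_2(x), f_3(x)) with h = max (cvx, nondecreasing).
fs = [lambda x: (x - 1)**2, lambda x: (x + 1)**2, lambda x: abs(x)]
gs = [lambda x: 2*(x - 1),  lambda x: 2*(x + 1),  lambda x: np.sign(x)]  # g_i in df_i

def subgrad_composition(x):
    vals = np.array([f(x) for f in fs])
    u = np.zeros(len(fs))
    u[np.argmax(vals)] = 1.0      # u in dh(vals): indicator of an active coordinate
    return sum(ui * gi(x) for ui, gi in zip(u, gs))   # g = u_1 g_1 + ... + u_n g_n

print(subgrad_composition(0.5))   # f_2 = (x+1)^2 is active here, so g = 2*(1.5) = 3.0
```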

References
1. R. T. Rockafellar. Convex Analysis.
2. S. Boyd (Stanford). EE364b Lecture Notes.