Math 273a: Optimization
Subgradients of convex functions

Made by: Damek Davis
Edited by: Wotao Yin
Department of Mathematics, UCLA
Fall 2015

Online discussions on piazza.com

1 / 20

Subgradients: Assumptions and Notation

1. $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is a closed proper convex function, i.e. $\operatorname{epi} f = \{(x, t) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le t\}$ is closed and convex.
2. The effective domain of $f$ is $\operatorname{dom} f = \{x \in \mathbb{R}^n : f(x) < \infty\}$.
3. The function $f$ is proper, i.e. $\operatorname{dom} f \neq \emptyset$.
4. A raised star (e.g., $x^*$) denotes a global minimizer of some function.

2 / 20

$C^1$ convex functions

Recall: a convex function $f \in C^1$ obeys
$$f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle, \qquad \forall x, y \in \mathbb{R}^n.$$

[Figure taken from Boyd and Vandenberghe, Convex Optimization.]

3 / 20
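As a quick sanity check (my own illustration, not part of the slides), the sketch below samples random pairs and verifies the first-order inequality for a hand-picked smooth convex function; the particular function, dimension, and tolerance are arbitrary choices.

```python
import numpy as np

# A smooth convex function and its gradient:
# f(x) = ||x||^2 / 2 + log(1 + exp(a^T x))   (an illustrative choice)
rng = np.random.default_rng(0)
a = rng.standard_normal(5)

def f(x):
    return 0.5 * x @ x + np.log1p(np.exp(a @ x))

def grad_f(x):
    return x + a * (1.0 / (1.0 + np.exp(-(a @ x))))

# Check f(y) >= f(x) + <grad f(x), y - x> on random pairs (x, y).
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-10
print("first-order inequality holds on all sampled pairs")
```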

Non-$C^1$ convex functions

For each $\bar{x} \in \mathbb{R}^n$, the subdifferential of $f$ at $\bar{x}$ is
$$\partial f(\bar{x}) := \{g \in \mathbb{R}^n : f(y) \ge f(\bar{x}) + \langle g,\, y - \bar{x} \rangle,\ \forall y \in \mathbb{R}^n\}.$$
Each $g \in \partial f(\bar{x})$ is called a subgradient of $f$ at $\bar{x}$.

4 / 20

The (convex) subgradient is defined via a global property, not by taking limits. In contrast, the set of regular subgradients is defined as
$$\hat{\partial} f(x) = \{g : \exists\, \delta > 0 \text{ such that } f(y) \ge f(x) + \langle g,\, y - x \rangle + o(\|y - x\|),\ \forall y \in B_\delta(x)\},$$
and $g$ is a general subgradient of $f$ at $x$ if there is a sequence $x_i \to x$ and $g_i \in \hat{\partial} f(x_i)$ with $g_i \to g$.

Reference: Variational Analysis by Rockafellar and Wets, Definition 8.3.

5 / 20

Which functions have subgradients?

If $f \in C^1$, then $\nabla f(x) \in \partial f(x)$. In fact, $\partial f(x) = \{\nabla f(x)\}$.

Proof: if $g$ is a subgradient, then for any $y \in \mathbb{R}^n$
$$\langle \nabla f(x), y \rangle = \lim_{t \downarrow 0} \frac{f(x + ty) - f(x)}{t} \quad \text{(by def. of } \nabla f\text{)} \ \ge\ \lim_{t \downarrow 0} \frac{\langle g,\, x + ty - x \rangle}{t} \quad \text{(by subgrad. def.)} \ = \langle g, y \rangle.$$
Change $y$ to $-y$, and the inequality still holds, so $\langle \nabla f(x), y \rangle = \langle g, y \rangle$. Plugging in the standard basis vectors $\Rightarrow \nabla f(x) = g$.

Next, the general case.

6 / 20

Which functions have subgradients?

Theorem (Nesterov '03, Thm 3.1.13). Let $f$ be a closed convex function and $x_0 \in \operatorname{int}(\operatorname{dom}(f))$. Then $\partial f(x_0)$ is a nonempty bounded set.

The proof uses supporting hyperplanes of the epigraph to show existence, and local Lipschitz continuity of convex functions to show boundedness.

[Figure taken from Boyd and Vandenberghe, http://see.stanford.edu/materials/lsocoee364b/01-subgradients_notes.pdf]

7 / 20

The converse

Lemma (Nesterov '03, Lm 3.1.6). If $\partial f(x) \neq \emptyset$ for all $x \in \operatorname{dom}(f)$, then $f$ is convex.

Proof. Let $x, y \in \operatorname{dom}(f)$, $\alpha \in [0, 1]$, $y_\alpha = (1 - \alpha)x + \alpha y = x + \alpha(y - x)$, and $g \in \partial f(y_\alpha)$. Then
$$f(y) \ge f(y_\alpha) + \langle g,\, y - y_\alpha \rangle = f(y_\alpha) + (1 - \alpha)\langle g,\, y - x \rangle \qquad (1)$$
$$f(x) \ge f(y_\alpha) + \langle g,\, x - y_\alpha \rangle = f(y_\alpha) - \alpha \langle g,\, y - x \rangle \qquad (2)$$
Multiply equation (1) by $\alpha$ and equation (2) by $(1 - \alpha)$, and add them together to get
$$\alpha f(y) + (1 - \alpha) f(x) \ge f(y_\alpha).$$

8 / 20

Technicality: int(dom(f))

We cannot relax the assumption $x \in \operatorname{int}(\operatorname{dom}(f))$ to $x \in \operatorname{dom}(f)$.

Example: $f : [0, +\infty) \to \mathbb{R}$, $f(x) = -\sqrt{x}$. Then $\operatorname{dom}(f) = [0, +\infty)$ but $\partial f(0) = \emptyset$.

9 / 20
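To see why no subgradient exists at $0$ here, note that a candidate $g$ would need $-\sqrt{y} \ge g\, y$ for all $y \ge 0$, i.e. $g \le -1/\sqrt{y}$ for all $y > 0$, which no finite $g$ satisfies. The short sketch below (my own illustration, assuming the example is $f(x) = -\sqrt{x}$) refutes a few candidate slopes numerically.

```python
import numpy as np

f = lambda x: -np.sqrt(x)   # assumed example: f(x) = -sqrt(x) on [0, +inf)

# For g to be a subgradient at x = 0 we would need f(y) >= f(0) + g*y = g*y
# for every y >= 0.  Any fixed g is violated once y is small enough.
for g in [-1.0, -10.0, -1e3, -1e6]:
    y = np.logspace(0, -16, 17)           # y = 1, 0.1, ..., 1e-16
    violated = np.any(f(y) < g * y - 1e-15)
    print(f"g = {g:>10}: subgradient inequality violated at some y? {violated}")
```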

Compute subgradients: general rules

Smooth functions: $\partial f(x) = \{\nabla f(x)\}$.

Composition with an affine mapping: $\varphi(x) = f(Ax + b)$ gives $\partial \varphi(x) = A^T \partial f(Ax + b)$.

Positive sums: for $\alpha, \beta > 0$ and $f(x) = \alpha f_1(x) + \beta f_2(x)$,
$$\partial f(x) = \alpha\, \partial f_1(x) + \beta\, \partial f_2(x).$$

Maximums: for $f(x) = \max_{i \in \{1, \dots, n\}} f_i(x)$,
$$\partial f(x) = \operatorname{conv}\{\partial f_i(x) \mid f_i(x) = f(x)\}.$$

10 / 20
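As a hedged illustration (my own example, not from the slides), the sketch below combines the affine-composition and maximum rules for $f(x) = \max_i (a_i^T x + b_i)$: each active index $i$ contributes the subgradient $a_i$, and any convex combination of the active $a_i$, e.g. their average, is again a subgradient.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))   # rows a_i
b = rng.standard_normal(4)

def f(x):
    return np.max(A @ x + b)

def a_subgradient(x, tol=1e-12):
    """One subgradient of f(x) = max_i (a_i^T x + b_i).

    By the maximum rule, any convex combination of the rows a_i attaining
    the max is a subgradient; here we average the active rows.
    """
    vals = A @ x + b
    active = vals >= vals.max() - tol
    return A[active].mean(axis=0)

# Sanity check of the defining inequality on random pairs.
x = rng.standard_normal(3)
g = a_subgradient(x)
for _ in range(1000):
    y = rng.standard_normal(3)
    assert f(y) >= f(x) + g @ (y - x) - 1e-10
print("subgradient of a max of affine functions verified on samples")
```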

Examples

$f(x) = |x|$.
$$\partial f(x) = \begin{cases} \{\operatorname{sign}(x)\} & x \neq 0; \\ [-1, 1] & \text{otherwise.} \end{cases}$$

As seen, $\partial f$ is a relation, or a point-to-set mapping.

[Figure taken from Boyd and Vandenberghe, http://see.stanford.edu/materials/lsocoee364b/01-subgradients_notes.pdf]

11 / 20

Examples

$f(x) = \sum_{i=1}^n |\langle a_i, x \rangle - b_i|$. Define
$$I_-(x) = \{i : \langle a_i, x \rangle - b_i < 0\}, \quad I_+(x) = \{i : \langle a_i, x \rangle - b_i > 0\}, \quad I_0(x) = \{i : \langle a_i, x \rangle - b_i = 0\}.$$
Then
$$\partial f(x) = \sum_{i \in I_+(x)} a_i \;-\; \sum_{i \in I_-(x)} a_i \;+\; \sum_{i \in I_0(x)} [-a_i, a_i]$$
(the last sum is the Minkowski sum).

$\operatorname{minimize}_x\ \|Ax - b\|_1$ is known as robust fitting, which is more robust to outliers than the least-squares problem.

12 / 20
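A convenient member of this subdifferential is $A^T \operatorname{sign}(Ax - b)$, obtained by picking $0$ from each interval $[-a_i, a_i]$ on the tie set $I_0(x)$. The minimal sketch below (my own code) computes it and checks the subgradient inequality numerically on random data.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def f(x):
    return np.abs(A @ x - b).sum()

def a_subgradient(x):
    # Use sign(r_i) * a_i on the strict sets and 0 on the tie set I_0(x);
    # this choice always lies in the subdifferential formula above.
    r = A @ x - b
    return A.T @ np.sign(r)

x = rng.standard_normal(5)
g = a_subgradient(x)
for _ in range(1000):
    y = rng.standard_normal(5)
    assert f(y) >= f(x) + g @ (y - x) - 1e-9
print("subgradient of ||Ax - b||_1 verified on samples")
```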

$f(x) = \max_{i \in \{1, \dots, n\}} x_i$. Then
$$\partial f(x) = \operatorname{conv}\{e_i \mid x_i = f(x)\}, \qquad \partial f(0) = \operatorname{conv}\{e_i \mid i \in \{1, \dots, n\}\}.$$

Note: $\operatorname{conv}$ denotes the convex hull:
$$\operatorname{conv}\{x_i\} := \Big\{ \sum_i \alpha_i x_i : \alpha_i \ge 0,\ \sum_i \alpha_i = 1 \Big\}.$$

13 / 20
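A small sketch (my own) that returns a uniform convex combination of the active vertices $e_i$, which is therefore a point of $\operatorname{conv}\{e_i \mid x_i = f(x)\}$, and checks it against the subgradient inequality.

```python
import numpy as np

def f(x):
    return np.max(x)

def a_subgradient(x, tol=1e-12):
    # Uniform average over {e_i : x_i = max(x)}, a member of the convex hull.
    active = x >= np.max(x) - tol
    g = np.zeros_like(x, dtype=float)
    g[active] = 1.0 / active.sum()
    return g

rng = np.random.default_rng(3)
x = np.array([1.0, 3.0, 3.0, -2.0])
g = a_subgradient(x)               # [0, 0.5, 0.5, 0]
for _ in range(1000):
    y = rng.standard_normal(4)
    assert f(y) >= f(x) + g @ (y - x) - 1e-10
print("g =", g)
```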

Examples

$f(x) = \|x\|_2$. $f$ is differentiable away from $0$, so
$$\partial f(x) = \Big\{ \frac{x}{\|x\|_2} \Big\}, \quad \text{where } x \neq 0.$$
At $0$, go back to the subgradient inequality:
$$\|y\|_2 \ge 0 + \langle g,\, y - 0 \rangle.$$
Thus, $g \in \partial f(0)$ if and only if $\frac{\langle g, y \rangle}{\|y\|_2} \le 1$ for all $y \neq 0$. Thus, $g$ is in the dual ball: $\partial f(0) = B_2(0, 1)$. This is a common pattern!

14 / 20
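A minimal sketch (my own) implementing exactly this case split: $x/\|x\|_2$ away from the origin, and at the origin any vector of 2-norm at most $1$, of which $0$ is the simplest choice.

```python
import numpy as np

def a_subgradient_l2norm(x):
    """One subgradient of f(x) = ||x||_2.

    Away from 0 the function is differentiable with gradient x / ||x||_2;
    at 0 the subdifferential is the unit 2-norm ball, and we return 0.
    """
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 0 else np.zeros_like(x)

f = lambda x: np.linalg.norm(x)
rng = np.random.default_rng(4)
for x in [np.zeros(3), np.array([1.0, -2.0, 2.0])]:
    g = a_subgradient_l2norm(x)
    for _ in range(1000):
        y = rng.standard_normal(3)
        assert f(y) >= f(x) + g @ (y - x) - 1e-10
print("subgradients of the 2-norm verified on samples")
```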

How to compute subgradients: Examples

$f(x) = \|x\|_\infty = \max_{i \in \{1,\dots,n\}} |x_i|$. For $x \neq 0$,
$$\partial f(x) = \operatorname{conv}\{\, g_i : g_i \in \partial |x_i|,\ |x_i| = f(x) \,\}.$$
Going back to the subgradient inequality:
$$\|y\|_\infty \ge 0 + \langle g,\, y - 0 \rangle.$$
Thus, $g \in \partial f(0)$ if and only if $\frac{\langle g, y \rangle}{\|y\|_\infty} \le 1$ for all $y \neq 0$. Thus, $\partial f(0)$ is the dual ball to the $\ell_\infty$ norm: $B_1(0, 1)$.

15 / 20
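A hedged sketch (my own code): for $x \neq 0$ it returns $\operatorname{sign}(x_i)\, e_i$ for one index attaining the maximal modulus, which lies in the convex hull above; at $0$ it returns $0$, a point of the dual ball $B_1(0, 1)$.

```python
import numpy as np

def a_subgradient_linf(x):
    """One subgradient of f(x) = ||x||_inf."""
    if np.all(x == 0):
        return np.zeros_like(x, dtype=float)   # any g with ||g||_1 <= 1 works here
    i = int(np.argmax(np.abs(x)))              # an index attaining the max modulus
    g = np.zeros_like(x, dtype=float)
    g[i] = np.sign(x[i])
    return g

f = lambda x: np.abs(x).max()
rng = np.random.default_rng(5)
for x in [np.zeros(4), np.array([0.5, -3.0, 3.0, 1.0])]:
    g = a_subgradient_linf(x)
    for _ in range(1000):
        y = rng.standard_normal(4)
        assert f(y) >= f(x) + g @ (y - x) - 1e-10
print("subgradients of the inf-norm verified on samples")
```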

Examples

$f(x) = \|x\|_1 = \sum_{i=1}^n |x_i|$. Then
$$\partial f(x) = \sum_{x_i > 0} e_i \;-\; \sum_{x_i < 0} e_i \;+\; \sum_{x_i = 0} [-e_i, e_i] \quad \text{for all } x.$$
Then
$$\partial f(0) = \sum_{i=1}^n [-e_i, e_i] = B_\infty(0, 1).$$

16 / 20
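The coordinatewise structure makes one subgradient especially easy to write down: $\operatorname{sign}(x)$, which picks $0$ from every interval $[-e_i, e_i]$. A short sketch (my own):

```python
import numpy as np

def a_subgradient_l1(x):
    # sign(x) chooses sign(x_i) on nonzero coordinates and 0 on zero ones,
    # so it always lies in the subdifferential formula above.
    return np.sign(x)

f = lambda x: np.abs(x).sum()
rng = np.random.default_rng(6)
x = np.array([2.0, 0.0, -1.5])
g = a_subgradient_l1(x)             # [1, 0, -1]
for _ in range(1000):
    y = rng.standard_normal(3)
    assert f(y) >= f(x) + g @ (y - x) - 1e-10
print("g =", g)
```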

Semi-continuity

Definition (upper semi-continuity). A point-to-set mapping $M$ is upper semi-continuous at $x^*$ if any convergent sequence $(x^k, s^k) \to (x^*, s^*)$ satisfying $s^k \in M(x^k)$ for each $k$ also obeys $s^* \in M(x^*)$.

Interpretation: if (i) $x^k \to x^*$ and (ii) for each $x^k$ you can find $s^k \in M(x^k)$ so that $s^k \to s^*$, then $s^* \in M(x^*)$.

Lower semi-continuity is essentially the opposite.

Definition (lower semi-continuity, lsc). A point-to-set mapping $M$ is lower semi-continuous at $x^*$ if for any convergent sequence $x^k \to x^*$ and any $s^* \in M(x^*)$, there exists a sequence $s^k \in M(x^k)$ such that $s^k \to s^*$.

17 / 20

Lemma. Let $f$ be a proper convex function. Then $\partial f$ is upper semi-continuous, and $\partial f(x)$ is a convex set.

Proof: For the first part, take the limit of $f(y) \ge f(x^k) + \langle s^k,\, y - x^k \rangle$, where $s^k \in \partial f(x^k)$. The second part is a direct result of the linearity of $\langle \cdot,\, y - x \rangle$.

However, if $f(x) = |x|$, then $\partial f$ is not lower semi-continuous at $x = 0$.

18 / 20

Directional derivative versus subgradient

Assume that $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is proper, closed (thus lsc), and convex. Then

1. the directional derivative is well defined for every $d \in \mathbb{R}^n$:
$$f'(x; d) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha d) - f(x)}{\alpha};$$
2. the directional derivatives at $x$ bound all the subgradient projections:
$$\partial f(x) = \{p \in \mathbb{R}^n : f'(x; d) \ge \langle p, d \rangle,\ \forall d \in \mathbb{R}^n\};$$
3. directional derivatives are extreme subgradients:
$$f'(x; d) = \max\{\langle p, d \rangle : p \in \partial f(x)\}.$$

19 / 20
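For $f(x) = \|x\|_1$ the maximum in item 3 has a closed form: the best $p \in \partial f(x)$ picks $\operatorname{sign}(x_i)$ on nonzero coordinates and $\operatorname{sign}(d_i)$ on zero ones, so $f'(x; d) = \sum_{x_i \neq 0} \operatorname{sign}(x_i)\, d_i + \sum_{x_i = 0} |d_i|$. The sketch below (my own, with an arbitrary step size) compares that formula against a forward finite-difference quotient.

```python
import numpy as np

f = lambda x: np.abs(x).sum()          # f(x) = ||x||_1

def dir_deriv_via_subgradients(x, d):
    # max{ <p, d> : p in the subdifferential of ||.||_1 at x }:
    # p_i = sign(x_i) if x_i != 0, and the best p_i in [-1, 1] is sign(d_i) otherwise.
    nz = x != 0
    return np.sign(x[nz]) @ d[nz] + np.abs(d[~nz]).sum()

rng = np.random.default_rng(7)
x = np.array([1.0, 0.0, -2.0, 0.0])
for _ in range(5):
    d = rng.standard_normal(4)
    alpha = 1e-7                        # small enough that no sign of x changes
    fd = (f(x + alpha * d) - f(x)) / alpha
    print(dir_deriv_via_subgradients(x, d), "~", fd)
```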

0 subgradient = minimum

Suppose that $0 \in \partial f(x)$.

If $f$ is smooth and convex, $0 \in \partial f(x) = \{\nabla f(x)\} \Rightarrow \nabla f(x) = 0$.

In general: if $0 \in \partial f(x)$, then
$$f(y) \ge f(x) + \langle 0,\, y - x \rangle = f(x) \quad \text{for all } y \in \mathbb{R}^n$$
$\Rightarrow$ $x$ is a minimizer!

The converse is also true: if $x^*$ is a minimizer, then
$$f(y) \ge f(x^*) + 0 = f(x^*) + \langle 0,\, y - x^* \rangle,$$
so $0 \in \partial f(x^*)$.

20 / 20
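As a concrete check of this optimality condition (my own toy example, not from the slides): for $f(x) = |x - 1| + 2|x + 2|$ on $\mathbb{R}$, the subdifferential is an interval at each point, and $0 \in \partial f(x)$ exactly at the minimizer $x = -2$, where $\partial f(-2) = -1 + 2[-1, 1] = [-3, 1]$.

```python
import numpy as np

# Toy problem: minimize f(x) = |x - 1| + 2|x + 2| over the reals.
f = lambda x: abs(x - 1) + 2 * abs(x + 2)

def subdifferential(x):
    """Return the interval [lo, hi] equal to the subdifferential of f at x."""
    def sub_abs(t):                     # subdifferential of |.| at t
        return (np.sign(t), np.sign(t)) if t != 0 else (-1.0, 1.0)
    lo1, hi1 = sub_abs(x - 1)
    lo2, hi2 = sub_abs(x + 2)
    return lo1 + 2 * lo2, hi1 + 2 * hi2

for x in [-3.0, -2.0, 0.0, 1.0, 2.0]:
    lo, hi = subdifferential(x)
    print(f"x = {x:>4}: subdifferential = [{lo}, {hi}], contains 0: {lo <= 0 <= hi}")
# Only x = -2 satisfies 0 in the subdifferential, and indeed f(-2) = 3 is the minimum.
```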