Fenchel Duality between Strong Convexity and Lipschitz Continuous Gradient

Similar documents
Optimization and Optimal Control in Banach Spaces

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Math 273a: Optimization Subgradients of convex functions

BASICS OF CONVEX ANALYSIS

It follows from the above inequalities that for c C 1

Convex Optimization Notes

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction

Math 273a: Optimization Subgradient Methods

Lecture 2: Convex Sets and Functions

Dual Decomposition.

L p Spaces and Convexity

Math 273a: Optimization Subgradients of convex functions

Lecture 4 Lebesgue spaces and inequalities

Introduction to Convex Analysis Microeconomics II - Tutoring Class

Online Convex Optimization

It follows from the above inequalities that for c C 1

6. Proximal gradient method

Lecture: Duality of LP, SOCP and SDP

Math 273a: Optimization Convex Conjugacy

Solving Dual Problems

Convex Optimization Theory. Chapter 4 Exercises and Solutions: Extended Version

Lecture: Smoothing.

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto

8. Conjugate functions

Convex Functions and Optimization

Convex Optimization Conjugate, Subdifferential, Proximation

Problem set 5, Real Analysis I, Spring, otherwise. (a) Verify that f is integrable. Solution: Compute since f is even, 1 x (log 1/ x ) 2 dx 1

Convex Analysis and Economic Theory AY Elementary properties of convex functions

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

Lecture 16: FTRL and Online Mirror Descent

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Fixed point theory for nonlinear mappings in Banach spaces and applications

Optimality Conditions for Nonsmooth Convex Optimization

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Subdifferential representation of convex functions: refinements and applications

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers

ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS

Convex Sets and Functions (part III)

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Chapter 1 Preliminaries

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Epiconvergence and ε-subgradients of Convex Functions

LECTURE 9 LECTURE OUTLINE. Min Common/Max Crossing for Min-Max

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

CONSTRAINT QUALIFICATIONS, LAGRANGIAN DUALITY & SADDLE POINT OPTIMALITY CONDITIONS

Convex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version

Convex Analysis and Optimization Chapter 4 Solutions

Solution of the 8 th Homework

Econ Slides from Lecture 1

Analysis II - few selective results

Finite Dimensional Optimization Part III: Convex Optimization 1

Convergent Iterative Algorithms in the 2-inner Product Space R n

Conditional Gradient (Frank-Wolfe) Method

Elements of Convex Optimization Theory

212a1214Daniell s integration theory.

Convex Functions. Pontus Giselsson

LECTURE 12 LECTURE OUTLINE. Subgradients Fenchel inequality Sensitivity in constrained optimization Subdifferential calculus Optimality conditions

PROPERTIES OF A CLASS OF APPROXIMATELY SHRINKING OPERATORS AND THEIR APPLICATIONS

6. Proximal gradient method

Lecture 6: September 17

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Logarithmic Regret Algorithms for Strongly Convex Repeated Games

GARP and Afriat s Theorem Production

AD for PS Functions and Relations to Generalized Derivative Concepts

Dual and primal-dual methods

QUADRATIC MAJORIZATION 1. INTRODUCTION

M2PM1 Analysis II (2008) Dr M Ruzhansky List of definitions, statements and examples Preliminary version

Monotone Operator Splitting Methods in Signal and Image Recovery

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1

MTH 503: Functional Analysis

A. Derivation of regularized ERM duality

1/37. Convexity theory. Victor Kitov

Limited Memory Kelley s Method Converges for Composite Convex and Submodular Objectives

14 Lecture 14 Local Extrema of Function

CHAPTER III THE PROOF OF INEQUALITIES

Gauge optimization and duality

9. Dual decomposition and dual algorithms

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Nonlinear Functional Analysis and its Applications

Lecture Note 5: Semidefinite Programming for Stability Analysis

Subgradient Method. Ryan Tibshirani Convex Optimization

Econ Lecture 3. Outline. 1. Metric Spaces and Normed Spaces 2. Convergence of Sequences in Metric Spaces 3. Sequences in R and R n

Convex Optimization M2

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS

Radial Subgradient Descent

4. We accept without proofs that the following functions are differentiable: (e x ) = e x, sin x = cos x, cos x = sin x, log (x) = 1 sin x

Lecture 3. Econ August 12

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

Convex Optimization and Modeling

Convergence Rates in Regularization for Nonlinear Ill-Posed Equations Involving m-accretive Mappings in Banach Spaces

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Majorization Minimization - the Technique of Surrogate

arxiv: v1 [math.oc] 7 Dec 2018

Appendix PRELIMINARIES 1. THEOREMS OF ALTERNATIVES FOR SYSTEMS OF LINEAR CONSTRAINTS

Transcription:

Fenchel Duality between Strong Convexity and Lipschitz Continuous Gradient Xingyu Zhou The Ohio State University zhou.2055@osu.edu December 5, 2017 Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 1 / 47

Main Result Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 2 / 47

Main Result Main Result Theorem (i) If f is closed and strong convex with parameter µ, then f has a Lipschitz continuous gradient with parameter 1 µ. (ii) If f is convex and has a Lipschitz continuous gradient with parameter L, then f is strong convex with parameter 1 L. We provide a very simple proof for this theorem (two-line). To this end, we first present equivalent conditions for strong convexity and Lipschitz continuous gradient. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 3 / 47

Review of Convex Function Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 4 / 47

Review of Convex Function Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 5 / 47

Review of Convex Function (Convexity) Suppose f : R n R with the extended-value extension. Then, f is said to be a convex function if for any α [0, 1], for any x and y. f (αx + (1 α)y) αf (x) + (1 α)f (y) Note: This definition does not require the differentiability of f. We are familiar that this definition is equivalent to the first-order condition and monotonicity condition assuming f is differentiable. In the following, we will first show that this equivalence still holds for a non-smooth function. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 6 / 47

Review of Convex Function Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 7 / 47

Review of Convex Function for Convexity Lemma (Equivalence for Convexity) Suppose f : R n R with the extended-value extension. Then, the following statements are equivalent: (i) (Jensen s inequality): f is convex. (ii) (First-order): f (y) f (x) + s T x (y x) for all x, y and any s x f (x). (iii) (Monotonicity of subgradient): (s y s x ) T (y x) 0 for all x, y and any s x f (x), s y f (y). Note: The lemma is a non-trivial extension as we will see later in the proof. Moreover, this lemma will be extremely useful as we later deal with strong convexity. Specifically, we will prove this lemma by showing (i) = (ii) = (iii) = (i). Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 8 / 47

Review of Convex Function Proof (i) = (ii): Since f is convex, by definition, we have for any 0 < α 1 f (x + α(y x)) αf (y) + (1 α)f (x) f (x + α(y x)) f (x) f (y) f (x) α (a) = f (x, y x) f (y) f (x) (b) = s T x (y x) f (y) f (x) s x f (x) (a) is obtained by letting α 0 and the definition of directional derivate. Note that the limit always exists for a convex f by the monotonicity of difference quotient. (b) follows from the fact that f (x, v) = sup s f (x) s T v for a convex f. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 9 / 47

Review of Convex Function Proof (Cont d) (ii) = (iii): From (ii), we have Adding together directly yields (iii). f (y) f (x) + s T x (y x) f (x) f (y) + s T y (x y) Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 10 / 47

Review of Convex Function Proof (Cont d) (iii) = (i): To this end, we will show (iii) = (ii) and (ii) = (i). First, (iii) = (ii): Let φ(α) := f (x + α(y x)) and x α := x + α(y x), then f (y) f (x) = φ(1) φ(0) = 1 0 s T α (y x)dα (1) where s α f (x α ) for t [0, 1]. Then by (iii), for any s x f (x), we have s T α (y x) s T x (y x) which combined with inequality above directly implies (ii). Second, (ii) = (i): By (ii), we have f (y) f (x α ) + s T t (y x α ) (2) f (x) f (x α ) + s T t (x x α ) (3) Multiplying (2) by α and (3) by 1 α, and adding together yields (i). Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 11 / 47

Strong Convexity Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 12 / 47

Strong Convexity Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 13 / 47

Strong Convexity Strongly Convex Function (Strong Convexity) Suppose f : R n R with the extended-value extension. Then, f is said to be strongly convex with parameter µ if is convex. g(x) = f (x) µ 2 x T x Note: As before, this definition does not require the differentiability of f. In the following, by applying the previous equivalent conditions of convexity to g, we can easily establish equivalence between strong convexity. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 14 / 47

Strong Convexity Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 15 / 47

Strong Convexity for Strong Convexity Lemma (Equivalence for Strong Convexity) Suppose f : R n R with the extended-value extension. Then, the following statements are equivalent: (i) f is strongly convex with parameter µ. (ii) f (αx + (1 α)y) αf (x) + (1 α)f (y) µ 2 α(1 α) y x 2 for any x, y. (iii) f (y) f (x) + s T x (y x) + µ 2 y x 2 for all x, y and any s x f (x). (iv) (s y s x ) T (y x) µ y x 2 for all x, y and any s x f (x), s y f (y). Proof: (i) (ii): It follows from the definition of convexity for g(x) = f (x) µ 2 x T x. (i) (iii): It follows from the equivalent first-order condition of convexity for g. (i) (iv): It follows from the equivalent monotonicity of (sub)-gradient of convexity for g. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 16 / 47

Strong Convexity Implications of Strong Convexity Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 17 / 47

Strong Convexity Implications of Strong Convexity Conditions Implied by Strong Convexity (SC) Lemma (Implications of SC) Suppose f : R n R with the extended-value extension. The following conditions are all implied by strong convexity with parameter µ: (i) 1 2 s x 2 µ(f (x) f ), x and s x f (x). (ii) s y s x µ y x x, y and any s x f (x), s y f (y). (iii) f (y) f (x) + s T x (y x) + 1 2µ s y s x 2 x, y and any s x f (x), s y f (y). (iv) (s y s x ) T (y x) 1 µ s y s x 2 x, y and any s x f (x), s y f (y). Note: In the case of smooth function f, (i) reduced to Polyak-Lojasiewicz (PL) inequality, which is more general than SC and is extremely useful for linear convergence. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 18 / 47

Strong Convexity Implications of Strong Convexity Proof (i) 1 2 s x 2 µ(f (x) f ) (SC) = (i): By equivalence of strong convexity in the previous lemma, we have f (y) f (x) + sx T (y x) + µ y x 2 2 Taking minimization with respect to y on both sides, yields Re-arranging it yields (i). f f (x) 1 2µ s x 2 Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 19 / 47

Strong Convexity Implications of Strong Convexity Proof (Cont d) (ii) s y s x µ y x SC = (ii): By equivalence of strong convexity in the previous lemma, we have (s y s x ) T (y x) µ y x 2 Applying Cauchy-Schwartz inequality to the left-hand-side, yields (ii). Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 20 / 47

Strong Convexity Implications of Strong Convexity Proof (Cont d) (iii) f (y) f (x) + s T x (y x) + 1 2µ s y s x 2 SC = (iii): For any given s x f (x), let φ x (z) := f (z) s T x z. First, since (s φ z 1 s φ z 2 )(z 1 z 2 ) = (s f z 1 s f z 2 )(z 1 z 2 ) µ z 1 z 2 2 which implies that φ x (z) is also strong convexity with parameter µ. Then, applying PL-inequality to φ x (z), yields Re-arranging it yields (iii). φ x = f (x) sx T x φ x (y) 1 sy φ 2 2µ = f (y) sx T y 1 2µ s y s x 2. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 21 / 47

Strong Convexity Implications of Strong Convexity Proof (Cont d) (iv) (s y s x ) T (y x) 1 µ s y s x 2 SC = (iv): From previous result, we have Adding together yields (iv). f (y) f (x) + s T x (y x) + 1 2µ s y s x 2 f (x) f (y) + s T y (x y) + 1 2µ s x s y 2 Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 22 / 47

Lipschitz Continuous Gradient Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 23 / 47

Lipschitz Continuous Gradient Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 24 / 47

Lipschitz Continuous Gradient A differentiable function f is said to have an L-Lipschitz continuous gradient if for some L > 0 f (x) f (y) L x y, x, y. Note: The definition does not require the convexity of f. In the following, we will see different conditions that are equivalent or implied by Lipschitz continuous gradient condition. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 25 / 47

Lipschitz Continuous Gradient Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 26 / 47

Lipschitz Continuous Gradient Relationship [0] f (x) f (y) L x y, x, y. [1] g(x) = L 2 x T x f (x) is convex, x [2] f (y) f (x) + f (x) T (y x) + L 2 y x 2, x, y. [3] ( f (x) f (y) T (x y) L x y 2, x, y. α(1 α)l [4] f (αx + (1 α)y) αf (x) + (1 α)f (y) x y 2, x, y and α [ 2 [5] f (y) f (x) + f (x) T (y x) + 1 2L f (y) f (x) 2, x, y. [6] ( f (x) f (y) T (x y) 1 L f (x) f (y) 2, x, y. [7] f (αx + (1 α)y) αf (x) + (1 α)f (y) α(1 α) f (x) f (y) 2, x, y 2L Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 27 / 47

Lipschitz Continuous Gradient Relationship (Cont d) Lemma For a function f with a Lipschitz continuous gradient over R n, the following relations hold: [5] [7] = [6] = [0] = [1] [2] [3] [4] (4) If the function f is convex, then all the conditions [0]-[7] are equivalent. Note: From (5), we can see that conditions [0]-[7] can be grouped into four classes. Class A: [1]-[4] Class B: [5],[7] Class C: [6] Class D: [0] Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 28 / 47

Lipschitz Continuous Gradient Proof Class A [1] g(x) = L 2 x T x f (x) is convex [2] f (y) f (x) + f (x) T (y x) + L y x 2 2 [3] ( f (x) f (y) T (x y) L x y 2 [4] f (αx + (1 α)y) αf (x) + (1 α)f (y) α(1 α)l x y 2 2 [1] [2]: It follows from the first-order equivalence of convexity. [1] [3]: It follows from the monotonicity of gradient equivalence of convexity. [1] [4]: It follows from the definition of convexity. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 29 / 47

Lipschitz Continuous Gradient Proof (Cont d) Class B [5] f (y) f (x) + f (x) T (y x) + 1 2L f (y) f (x) 2. [7] f (αx + (1 α)y) αf (x) + (1 α)f (y) [5] = [7]: Let x α := αx + (1 α)y, then α(1 α) f (x) f (y) 2. 2L f (x) f (x α ) + f (x α ) T (x x α ) + 1 2L f (x) f (x α) 2 f (y) f (x α ) + f (z) T (y x α ) + 1 2L f (y) f (x α) 2 Multiplying the first inequality with α and second inequality with 1 α, and adding them together and using α x 2 + (1 α) y 2 α(1 α) x y 2, yields [7]. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 30 / 47

Lipschitz Continuous Gradient Proof (Cont d) Class B [5] f (y) f (x) + f (x) T (y x) + 1 2L f (y) f (x) 2. [7] f (αx + (1 α)y) αf (x) + (1 α)f (y) [7] = [5]: Interchanging x and y in [7] and re-writing it as f (y) f (x) + Letting α 0, yields [5]. f (x + α(y x)) f (x) α α(1 α) f (x) f (y) 2. 2L + 1 α f (x) f (y) 2 2L Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 31 / 47

Lipschitz Continuous Gradient Relationship (Cont d) Lemma For a function f with a Lipschitz continuous gradient over R n, the following relations hold: [5] [7] = [6] = [0] = [1] [2] [3] [4] (5) If the function f is convex, then all the conditions [0]-[7] are equivalent. Note: From (5), we can see that conditions [0]-[7] can be grouped into four classes. Class A: [1]-[4] Class B: [5],[7] Class C: [6] Class D: [0] Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 32 / 47

Lipschitz Continuous Gradient Proof (Cont d) Class B ([5]) = Class C ([6]) = Class D ([0]) = Class A ([3]) [5] f (y) f (x) + f (x) T (y x) + 1 2L f (y) f (x) 2. [6] ( f (x) f (y)) T (x y) 1 f (x) f (y) 2 L [0] f (x) f (y) L x y [3] ( f (x) f (y)) T (x y) L x y 2 [5] = [6]: Interchanging x and y and adding together. [6] = [0]: It follows directly from Cauchy-Schwartz inequality. [0] = [3]: It following directly from Cauchy-Schwartz inequality. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 33 / 47

Lipschitz Continuous Gradient Proof (Cont d) Class A ([3]) = Class B ([5]) for convex f [3] ( f (x) f (y)) T (x y) L x y 2. [5] f (y) f (x) + f (x) T (y x) + 1 2L f (y) f (x) 2. [3] = [5]: Let φ x (z) := f (z) f (x) T z. Note that since f is convex, φ x (z) attains its minimum at z = x. By [3], we have which implies that ( φ x (z 1 ) φ x (z 2 ) T (z 1 z 2 ) L z 1 z 2 2 φ x (z) φ x (y) + φ x (y) T (z y) + L 2 z y 2. Taking minimization over z, yields [5]. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 34 / 47

Conjugate Function Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 35 / 47

Conjugate Function Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 36 / 47

Conjugate Function (Conjugate) The conjugate of a function f is f (s) = sup (s T x f (x)) x domf Note: f is convex and closed (even f is not). (Subgradient) s is a subgradient of f at x if Note: f need not be convex. f (y) f (x) + s T (y x) for all y Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 37 / 47

Conjugate Function Useful Results Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 38 / 47

Conjugate Function Useful Results Useful Results Lemma (Fenchel-Young s Equality) Consider the following conditions for a general f : [1] f (s) = s T x f (x) [2] s f (x) [3] x f (s) Then, we have [1] [2] = [3] Further, if f is closed and convex, then all these conditions are equivalent. Lemma (Differentiability) For a closed and strictly convex f, f (s) = argmax x (s T x f (x)). Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 39 / 47

Conjugate Function Useful Results Proof [1] f (s) = s T x f (x) [2] s f (x) [1] [2]: s f (x) f (y) f (x) + s T (y x) y s T x f (x) s T y f (y) y s T x f (x) f (s) Since f (s) s T x f (x) always holds, [1] [2]. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 40 / 47

Conjugate Function Useful Results Proof (Cont d) [2] s f (x) [3] x f (s) [2] = [3]: If [2] holds, then by previous result we have f (s) = s T x f (x). Thus, f (z) = sup(z T u f (u)) u z T x f (x) = s T x f (x) + x T (z s) = f (s) + x T (z s) This holds for all z, which implies x f (s). For a closed and convex function, we have [3] = [2]: This follows from f = f. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 41 / 47

Conjugate Function Useful Results Proof (Differentiability) Suppose x 1 f (s) and x 2 f (s). By closeness and convexity, we have [3] = [2], which gives s f x 1 and s f x 2 By previous result ([2] = [1]), we have x 1 = argmax(s T x f (x)) and x 2 = argmax(s T x f (x)) By strictly convexity, s T x f (x) has a unique maximizer for every s. Thus, x := x 1 = x 2 Thus f (s) = {x}, which implies f is differentiable at s and f (s) = argmax x (s T x f (x)). Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 42 / 47

Proof of Main Result Outline 1 Main Result 2 Review of Convex Function 3 Strong Convexity Implications of Strong Convexity 4 Lipschitz Continuous Gradient 5 Conjugate Function Useful Results 6 Proof of Main Result Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 43 / 47

Proof of Main Result Proof of Main Result Theorem (i) If f is closed and strong convex with parameter µ, then f has a Lipschitz continuous gradient with parameter 1 µ. (ii) If f is convex and has a Lipschitz continuous gradient with parameter L, then f is strong convex with parameter 1 L. Proof: (1): By implication of strong convexity, we have which implies s x s y µ x y s x f (x), s y f (y) s x s y µ f (s x ) f (s y ) Hence, f has a Lipschitz continuous gradient with 1 µ. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 44 / 47

Proof of Main Result Proof (Cont d) (2): By implication of Lipschitz continuous gradient for convex f, we have which implies ( f (x) f (y) T (x y) 1 f (x) f (y) 2 L (s x s y ) T (x y) 1 L (s x s y ) x f (s x ), y f (s y ) Hence, f is strongly convex with parameter 1 L. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 45 / 47

Proof of Main Result Summary A very simple proof of Fenchel duality between strong convexity and Lipschitz continuous gradient. A summary of conditions for strong convexity. A summary of conditions for Lipschitz continuous gradient Common tricks: Taking limit (α 0). Taking minimization over both sides. Interchanging x and y, and adding together. Applying mean-value theorem. Applying Cauchy-Schwartz inequality. Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 46 / 47

Proof of Main Result Thank you! Q & A Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 47 / 47