Lecture 2: Convex Sets and Functions

Hyang-Won Lee, Dept. of Internet & Multimedia Eng., Konkuk University
Network Optimization, Fall 2015

Optimization Problems

Optimization problems are generally expressed as

    min f(x) subject to x ∈ X,

where X is a subset of R^n. The function f : X → R is called the objective function of the optimization problem, and the condition x ∈ X is called the constraint(s) of the problem. The set X is called the feasible set, and a vector x is said to be feasible if x ∈ X. The goal is to find a vector x ∈ X that minimizes the objective function. When the objective function or the feasible set is nonlinear, the optimization problem is called a nonlinear optimization problem or nonlinear program. If both the objective function and the feasible set are linear, it is called a linear optimization problem or linear program.
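As a quick illustration (not from the slides), here is a minimal sketch of this generic form in Python, assuming scipy is available; the quadratic objective and the box feasible set are made up for illustration:

```python
# Minimal sketch of "min f(x) subject to x in X" with a hypothetical
# quadratic objective and a box feasible set X = [0,2] x [0,2].
import numpy as np
from scipy.optimize import minimize

def f(x):                      # objective function f : X -> R
    return (x[0] - 1.0)**2 + (x[1] - 2.5)**2

bounds = [(0, 2), (0, 2)]      # feasible set X as box constraints
res = minimize(f, x0=np.zeros(2), bounds=bounds)
print(res.x)                   # approximately [1.0, 2.0]
```

Note how the unconstrained minimizer (1, 2.5) is infeasible, so the solution lands on the boundary of X.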

Convex Optimization

When the objective function and the feasible set are both convex (we will soon study the convexity of sets and functions), the optimization problem is called a convex optimization problem or convex program, and it can be solved efficiently (under some conditions).

Examples

Linear regression: min_w ‖Xw − y‖, where (X, y) is the measurement setup and observed data pair, and w is the vector of (linear) model parameters.

Classification (logistic regression or SVM):

    min_w ∑_{i=1}^n log(1 + exp(−y_i x_i^T w))

or

    min_{w,ξ} ‖w‖^2 + C ∑_{i=1}^n ξ_i   s.t.   ξ_i ≥ 1 − y_i x_i^T w,  ξ_i ≥ 0.
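A minimal numeric sketch of the linear regression example, with synthetic data standing in for the measurement pair (X, y); np.linalg.lstsq solves min_w ‖Xw − y‖ in the least-squares sense:

```python
# Sketch: solving the linear regression example on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))        # measurement matrix
w_true = np.array([1.0, -2.0, 0.5])      # ground-truth parameters (illustrative)
y = X @ w_true + 0.01 * rng.standard_normal(100)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||Xw - y||
print(w_hat)                             # close to w_true
```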

Examples (contd.)

Maximum likelihood estimation:

    max_θ ∑_{i=1}^n log p_θ(x_i)

k-means:

    min_{µ_1,...,µ_k} ∑_{j=1}^k ∑_{i ∈ C_j} ‖x_i − µ_j‖^2
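For concreteness, a small sketch evaluating the k-means objective for a fixed cluster assignment; the points, centers, and assignment below are illustrative only:

```python
# Sketch: the k-means objective sum_j sum_{i in C_j} ||x_i - mu_j||^2.
import numpy as np

def kmeans_objective(points, centers, assignment):
    # assignment[i] = index j of the cluster C_j containing point i
    return sum(np.sum((points[i] - centers[assignment[i]])**2)
               for i in range(len(points)))

points = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0], [3.2, 2.9]])
centers = np.array([[0.05, 0.1], [3.1, 2.95]])
print(kmeans_objective(points, centers, [0, 0, 1, 1]))
```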

Course Outline

- Convexity of sets and functions
- Unconstrained optimization
  - Optimality conditions and algorithms
- Constrained optimization
  - Optimality conditions and algorithms
  - Lagrange multiplier theory
  - Lagrange multiplier algorithms
- Duality and convex programming
  - Dual methods
- First order methods for large scale problems
- Robust optimization

Convex Sets

A convex combination of points x_1, ..., x_m ∈ R^n is defined as

    α_1 x_1 + ⋯ + α_m x_m,  where α_1 + ⋯ + α_m = 1 and α_i ≥ 0 for all i.

Note that the set of convex combinations of two points is the line segment connecting the two points.

Definition of Convex Set
A subset C of R^n is said to be convex if

    αx + (1 − α)y ∈ C,  ∀x, y ∈ C, ∀α ∈ [0, 1].

That is, for a convex set C and any two points in C, the line segment connecting the two points is contained in C.

Examples on the board
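A quick numeric sanity check of the definition, using the unit ball as a convex set; the sampling scheme is illustrative:

```python
# Sketch: every combination alpha*x + (1-alpha)*y of two points of the
# unit ball (a convex set) stays inside the ball.
import numpy as np

def in_unit_ball(p):
    return np.linalg.norm(p) <= 1.0 + 1e-12   # small tolerance for float error

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    if in_unit_ball(x) and in_unit_ball(y):   # keep only feasible pairs
        a = rng.uniform()
        assert in_unit_ball(a * x + (1 - a) * y)
print("no violations found")
```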

Illustration of Convex Sets

[Figure: examples of convex and nonconvex sets; a set is convex exactly when the segment αx + (1 − α)y, 0 ≤ α ≤ 1, between any two of its points x, y stays inside the set ([Bertsekas, p.688]).]

Convex Functions

Definition of Convex Function
Let C be a convex subset of R^n. A function f : C → R is said to be convex if

    f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y),  ∀x, y ∈ C, ∀α ∈ [0, 1].

The function f is called concave if −f is convex. The function f is called strictly convex if the above inequality is strict for all x, y ∈ C with x ≠ y, and all α ∈ (0, 1).

Examples
Convex functions: x^2, e^x
Concave functions: log(x), √x
Neither convex nor concave: x^3
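The defining inequality can be spot-checked numerically; the sketch below confirms it for x^2 and finds violations for x^3 (the sample ranges are arbitrary):

```python
# Sketch: testing the convexity inequality on random triples (x, y, alpha).
import numpy as np

def violates_convexity(f, x, y, alpha):
    lhs = f(alpha * x + (1 - alpha) * y)
    rhs = alpha * f(x) + (1 - alpha) * f(y)
    return lhs > rhs + 1e-9                  # tolerance for float error

rng = np.random.default_rng(2)
samples = [(rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform())
           for _ in range(1000)]
print(any(violates_convexity(lambda t: t**2, *s) for s in samples))  # False
print(any(violates_convexity(lambda t: t**3, *s) for s in samples))  # True
```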

Illustration of Convex Functions

[Figure: for a convex function, the chord value αf(x) + (1 − α)f(y) lies above the graph value f(αx + (1 − α)y) at every point αx + (1 − α)y between x and y in C; illustration of the definition of a convex function ([Bertsekas, p.689]).]

Convex Sets and Functions

Proposition
(a) For any collection {C_i | i ∈ I} of convex sets, the intersection ∩_{i∈I} C_i is convex.
(b) The vector sum of two convex sets C_1 and C_2 is convex.
(c) The image of a convex set under a linear transformation is convex.
(d) For a convex set C and a convex function f : C → R, the level sets {x ∈ C | f(x) ≤ α} and {x ∈ C | f(x) < α} are convex for all scalars α.

Exercises
Is the union of convex sets convex? (A numeric counterexample follows below.)
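One way to see the answer to the exercise: the union of the convex intervals [0, 1] and [2, 3] fails the definition, as the tiny check below shows:

```python
# The union of convex sets is generally not convex: the midpoint of
# x in [0,1] and y in [2,3] lands in the gap between the intervals.
x, y = 1.0, 2.0
mid = 0.5 * x + 0.5 * y              # = 1.5
in_union = (0 <= mid <= 1) or (2 <= mid <= 3)
print(in_union)                      # False, so the union is not convex
```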

Convex Functions

Jensen's Inequality
For a convex set C and a convex function f : C → R,
(a) f(∑_{i=1}^m α_i x_i) ≤ ∑_{i=1}^m α_i f(x_i), for all x_1, ..., x_m ∈ C and all α_1, ..., α_m ≥ 0 with ∑_{i=1}^m α_i = 1.
(b) f(∫_C x w(x) dx) ≤ ∫_C f(x) w(x) dx, where w : C → R_+ is such that ∫_C w(x) dx = 1.

Exercises
Apply Jensen's inequality to prove that (x_1 ⋯ x_n)^{1/n} ≤ (x_1 + ⋯ + x_n)/n for nonnegative numbers x_1, ..., x_n. (Hint: use the convexity of e^x; a worked sketch follows below.)
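A worked sketch of the exercise (assuming each x_i > 0; if some x_i = 0 the left side vanishes and the inequality is immediate), written here in LaTeX:

```latex
% Sketch: write x_i = e^{t_i}; the middle step is Jensen's inequality,
% part (a), applied to the convex function f(t) = e^t with equal
% weights alpha_i = 1/n.
\[
(x_1 \cdots x_n)^{1/n}
  = \exp\Bigl(\tfrac{1}{n}\sum_{i=1}^{n} t_i\Bigr)
  \le \tfrac{1}{n}\sum_{i=1}^{n} e^{t_i}
  = \frac{x_1 + \cdots + x_n}{n}.
\]
```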

Jensen's Inequality

Part (b) can be viewed as an extension of part (a) in the sense that the nonnegative weight function w corresponds to the nonnegative weights α_i and integrates to 1, in analogy to the condition that the weights α_i add up to 1.

Jensen's inequality is one of the most used inequalities in applied mathematics and probability theory. For a random variable X and a convex function f, part (b) leads to

    f(E[X]) ≤ E[f(X)],

where E denotes the expectation. For example, we have E[X^2] ≥ (E[X])^2, which is clear from the definition of variance.

Part (a) can be proved by mathematical induction (assume WLOG that α_i > 0 for all i):

    f(∑_{i=1}^m α_i x_i) ≤ α_m f(x_m) + (1 − α_m) f(∑_{i=1}^{m−1} (α_i / (1 − α_m)) x_i)    (why?)
                        ≤ α_m f(x_m) + (1 − α_m) ∑_{i=1}^{m−1} (α_i / (1 − α_m)) f(x_i)    (by the induction hypothesis)
                        = ∑_{i=1}^m α_i f(x_i)
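A quick numeric check of f(E[X]) ≤ E[f(X)] for f(x) = x^2; the exponential distribution here is an arbitrary choice:

```python
# Sketch: (E[X])^2 <= E[X^2] on samples; the gap is exactly Var(X).
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=2.0, size=100_000)
print(np.mean(X)**2 <= np.mean(X**2))   # True
```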

Convex Functions

Properties of Convex Functions
(a) A linear function is convex.
(b) Any vector norm is convex.
(c) The weighted sum of convex functions, with positive weights, is convex.
(d) If I is an index set, C is a convex subset of R^n, and f_i : C → R is convex for each i ∈ I, then the function h : C → (−∞, ∞] defined by

    h(x) = sup_{i∈I} f_i(x)

is also convex.

Exercises
Let f(x) = 2x + 3 and g(x) = −2x + 1. Draw the function h(x) = max{f(x), g(x)} and check its convexity. (A numeric check follows below.)
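A numeric check for the exercise, under the assumption (made in this transcription) that the second line was meant to be g(x) = −2x + 1:

```python
# Sketch: the pointwise max of two affine (hence convex) functions
# satisfies the convexity inequality on random samples.
import numpy as np

f = lambda x: 2 * x + 3
g = lambda x: -2 * x + 1
h = lambda x: max(f(x), g(x))

rng = np.random.default_rng(4)
ok = True
for _ in range(1000):
    x, y, a = rng.uniform(-10, 10), rng.uniform(-10, 10), rng.uniform()
    ok = ok and h(a * x + (1 - a) * y) <= a * h(x) + (1 - a) * h(y) + 1e-9
print(ok)  # True, as property (d) predicts
```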

Differentiable Convex Functions

Characterization of Differentiable Convex Functions
Let C be a convex subset of R^n and let f : R^n → R be differentiable over R^n.
(a) f is convex over C if and only if

    f(z) ≥ f(x) + (z − x)^T ∇f(x),  ∀x, z ∈ C.

(b) f is strictly convex over C if and only if the above inequality is strict whenever x ≠ z.

Proof: (⇐) Assume the inequality holds. Choose any x, y ∈ C and α ∈ [0, 1], and let z = αx + (1 − α)y. Using the inequality twice, we obtain

    f(x) ≥ f(z) + (x − z)^T ∇f(z),
    f(y) ≥ f(z) + (y − z)^T ∇f(z).

Multiply the first inequality by α, the second by (1 − α), and add them to obtain

    αf(x) + (1 − α)f(y) ≥ f(z) + (αx + (1 − α)y − z)^T ∇f(z) = f(z),

which proves that f is convex. Conversely, assume that f is convex, let x and z be any vectors in C with x ≠ z, and consider the function

    g(α) = (f(x + α(z − x)) − f(x)) / α,  α ∈ (0, 1].

Differentiable Convex Functions

Proof (contd.): Consider any α_1, α_2 with 0 < α_1 < α_2 < 1, and let

    ᾱ = α_1 / α_2,  z̄ = x + α_2 (z − x).

By convexity, we have

    f(x + ᾱ(z̄ − x)) ≤ ᾱ f(z̄) + (1 − ᾱ) f(x),

i.e., (f(x + ᾱ(z̄ − x)) − f(x)) / ᾱ ≤ f(z̄) − f(x). Note that the above inequality is strict if f is strictly convex. Since x + ᾱ(z̄ − x) = x + α_1(z − x), it follows that

    (f(x + α_1(z − x)) − f(x)) / α_1 ≤ (f(x + α_2(z − x)) − f(x)) / α_2,

i.e., g(α_1) ≤ g(α_2), with strict inequality if f is strictly convex. This shows that g is monotonically increasing in α, and thus

    (z − x)^T ∇f(x) = lim_{α↓0} g(α) ≤ g(1) = f(z) − f(x),

which is the desired inequality.

Exercises
Check the above characterization by drawing the convex function x^2 and its outer linear approximation. (A numeric check follows below.)
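The exercise can also be checked numerically: for f(x) = x^2 with f'(x) = 2x, the linear approximation at any point lies below the graph everywhere:

```python
# Sketch: the first-order bound f(z) >= f(x) + f'(x)(z - x) for f(x) = x**2,
# checked on a grid (tangent lines support the graph from below).
import numpy as np

f = lambda x: x**2
grad = lambda x: 2 * x
xs = np.linspace(-3, 3, 61)
zs = np.linspace(-3, 3, 61)
ok = all(f(z) >= f(x) + grad(x) * (z - x) - 1e-9 for x in xs for z in zs)
print(ok)  # True
```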

Second Order Characterization

Conditions for Convexity
Let C be a convex subset of R^n and let f : R^n → R be twice continuously differentiable over R^n.
(a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex over C.
(b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex over C.
(c) If C is open and f is convex over C, then ∇²f(x) is positive semidefinite for all x ∈ C.
(d) If f(x) = x^T Q x, where Q is a symmetric matrix, then f is convex if and only if Q is positive semidefinite. Furthermore, f is strictly convex if and only if Q is positive definite.

Proof of (a): For all x, y ∈ C, we have

    f(y) = f(x) + (y − x)^T ∇f(x) + (1/2)(y − x)^T ∇²f(x + α(y − x))(y − x)

for some α ∈ [0, 1]. By the positive semidefiniteness of ∇²f, we have

    f(y) ≥ f(x) + (y − x)^T ∇f(x),

which shows that f is convex over C, by the first-order characterization.
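A sketch of how part (a) can be tested numerically, by checking the Hessian's eigenvalues on sample points; the objective f(x_1, x_2) = x_1^2 + x_2^2 + e^{x_1 + x_2} is made up for illustration:

```python
# Sketch: the Hessian is PSD everywhere iff its eigenvalues are >= 0;
# here we sample points and check with np.linalg.eigvalsh.
import numpy as np

def hessian(x):
    # Hessian of f(x1, x2) = x1**2 + x2**2 + exp(x1 + x2)
    e = np.exp(x[0] + x[1])
    return np.array([[2 + e, e], [e, 2 + e]])

rng = np.random.default_rng(5)
psd_everywhere = all(
    np.all(np.linalg.eigvalsh(hessian(rng.uniform(-2, 2, 2))) >= -1e-9)
    for _ in range(100)
)
print(psd_everywhere)  # True, so f is convex (in fact strictly convex here)
```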

Second Order Characterization

Exercises
Find an example of a strictly convex function f such that the Hessian ∇²f is not positive definite. This shows that the converse of part (b) is not true in general.
Check whether the following functions are (strictly) convex:
(a) f(x_1, x_2) = x_1^2 + x_2^2 over R^2
(b) f(x) = log(x) over R_++
(c) f(x, y, z) = e^{x+y+z}
(d) f(x, y, z) = e^{x^2+y+z} − log(x + y) + 3z^2

Strong Convexity

We now consider a strengthened form of convexity, which is the key property in proving the linear convergence of first order methods for solving convex optimization problems. A continuously differentiable function f : C → R, where C is a convex set, is said to be strongly convex if, for some α > 0,

    (∇f(x) − ∇f(y))^T (x − y) ≥ α ‖x − y‖^2,  ∀x, y ∈ C.

Strong Convexity
Let C be an open convex subset of R^n, and let f : C → R be a function that is continuously differentiable over C. If f is strongly convex, then f is strictly convex. Furthermore, if f is twice continuously differentiable over C, then f satisfies the strong convexity condition if and only if the matrix ∇²f(x) − αI, where I is the identity, is positive semidefinite for every x ∈ C.
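A numeric check of the strong convexity condition for a quadratic f(x) = x^T Q x with Q positive definite, where ∇f(x) = 2Qx and one may take α = 2λ_min(Q):

```python
# Sketch: (grad f(x) - grad f(y))^T (x - y) >= alpha ||x - y||^2 on samples.
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])        # symmetric positive definite
grad = lambda x: 2 * Q @ x
alpha = 2 * np.linalg.eigvalsh(Q)[0]          # smallest eigenvalue of 2Q

rng = np.random.default_rng(6)
ok = all(
    (grad(x) - grad(y)) @ (x - y) >= alpha * np.sum((x - y)**2) - 1e-9
    for x, y in [(rng.standard_normal(2), rng.standard_normal(2))
                 for _ in range(1000)]
)
print(ok)  # True
```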

Exercises

Assume that ‖·‖ is the ℓ_2-norm in this slide. Consider a twice continuously differentiable function f defined over an open convex set C. Prove that, for a positive constant α, the following are equivalent:
(a) f(x) − (α/2)‖x‖^2 is convex.
(b) f(cx + (1 − c)y) ≤ cf(x) + (1 − c)f(y) − c(1 − c)(α/2)‖x − y‖^2, ∀x, y ∈ C, ∀c ∈ [0, 1].
Prove that f is strongly convex if either (a) or (b) holds.

Convex and Affine Hull

Let X be a subset of R^n. Recall that a convex combination of elements of X is a vector of the form ∑_{i=1}^m α_i x_i, where x_1, ..., x_m ∈ X and α_1, ..., α_m are nonnegative and add up to 1. The convex hull of X, denoted conv(X), is the set of all convex combinations of elements of X. In particular, if X contains a finite number of elements x_1, ..., x_m, then

    conv(X) = { ∑_{i=1}^m α_i x_i | α_i ≥ 0, i = 1, ..., m, ∑_{i=1}^m α_i = 1 }.

Caratheodory's Theorem
Let X be a subset of R^n. Every element of conv(X) can be represented as a convex combination of no more than n + 1 elements of X.
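As a sketch (not from the slides), membership in conv(X) for finite X can be tested as a linear feasibility problem, assuming scipy is available:

```python
# Sketch: p is in conv(X) iff there exist weights a_i >= 0 with
# sum_i a_i x_i = p and sum_i a_i = 1; we test feasibility via linprog.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, p):
    m, n = points.shape
    A_eq = np.vstack([points.T, np.ones(m)])   # combination + weights sum to 1
    b_eq = np.append(p, 1.0)
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
    return res.success

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(in_convex_hull(X, np.array([0.5, 0.5])))    # True
print(in_convex_hull(X, np.array([1.5, 0.5])))    # False
```

A vertex (basic feasible) solution of this LP has at most n + 1 nonzero weights, which matches Caratheodory's theorem.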

Local and Global Minima

Let X ⊂ R^n and let f : X → R be a function. A vector x ∈ X is called a local minimum of f if there exists some ɛ > 0 such that f(x) ≤ f(y) for every y ∈ X satisfying ‖x − y‖ ≤ ɛ, where ‖·‖ is some vector norm. A vector x is called a global minimum if f(x) ≤ f(y), ∀y ∈ X. Under convexity assumptions, local minima are equivalent to global minima.

Local Min = Global Min under Convexity
If C is a convex subset of R^n and f : C → R is a convex function, then a local minimum of f is also a global minimum. If in addition f is strictly convex, then there exists at most one global minimum of f.

Projection Theorem

Projection Theorem
Let C be a closed convex set and let ‖·‖ be the Euclidean norm.
(a) For every x ∈ R^n, there exists a unique vector z ∈ C that minimizes ‖z − x‖ over all z ∈ C. This vector is called the projection of x on C, and is denoted by [x]^+, i.e.,

    [x]^+ = arg min_{z∈C} ‖z − x‖.

(b) Given some x ∈ R^n, a vector z ∈ C is equal to [x]^+ if and only if

    (y − z)^T (x − z) ≤ 0,  ∀y ∈ C.

(c) The mapping f : R^n → C defined by f(x) = [x]^+ is continuous and nonexpansive, i.e.,

    ‖[x]^+ − [y]^+‖ ≤ ‖x − y‖,  ∀x, y ∈ R^n.
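A sketch of the theorem for a box C = [l, u]^n, where the projection is simply coordinate-wise clipping; the checks of parts (b) and (c) below use random illustrative points:

```python
# Sketch: projection onto a box via np.clip, with numeric checks of the
# variational inequality (b) and nonexpansiveness (c).
import numpy as np

l, u = -1.0, 1.0
proj = lambda x: np.clip(x, l, u)            # [x]^+ for the box [l, u]^n

rng = np.random.default_rng(7)
x, y = rng.uniform(-3, 3, 5), rng.uniform(-3, 3, 5)

# (b) (y' - [x]^+)^T (x - [x]^+) <= 0 for any y' in C
z = proj(x)
yp = rng.uniform(l, u, 5)                    # an arbitrary point of C
print((yp - z) @ (x - z) <= 1e-9)            # True

# (c) ||[x]^+ - [y]^+|| <= ||x - y||
print(np.linalg.norm(proj(x) - proj(y)) <= np.linalg.norm(x - y) + 1e-12)  # True
```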