Lecture 2: Convex Sets and Functions
Hyang-Won Lee
Dept. of Internet & Multimedia Eng., Konkuk University
Network Optimization, Fall 2015
Optimization Problems

Optimization problems are generally expressed as

    min f(x)  subject to  x ∈ X,

where X is a subset of Rⁿ. The function f : X → R is called the objective function of the optimization problem, and the condition x ∈ X is called the constraint of the problem. The set X is called the feasible set, and a vector x is said to be feasible if x ∈ X. The goal is to find a feasible vector that minimizes the objective function. When the objective function or the feasible set is nonlinear, the optimization problem is called a nonlinear optimization problem or nonlinear program. If both the objective function and the feasible set are linear, it is called a linear optimization problem or linear program.
Convex Optimization

When the objective function and the feasible set are both convex (we will soon study the convexity of sets and functions), the optimization problem is called a convex optimization problem or convex program, and it can be solved efficiently (under some conditions).

Examples
- Linear regression: min_w ‖Xw − y‖², where (X, y) is the measurement-matrix and observed-data pair, and w is the vector of (linear) model parameters.
- Classification (logistic regression or SVM):
    min_w Σᵢ₌₁ⁿ log(1 + exp(−yᵢ xᵢᵀw))
  or
    min_{w,ξ} ‖w‖² + C Σᵢ₌₁ⁿ ξᵢ  s.t.  ξᵢ ≥ 1 − yᵢ xᵢᵀw,  ξᵢ ≥ 0.
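Because the least-squares objective is convex, even plain gradient descent finds its global minimizer. A minimal numeric sketch (all data and step sizes below are made up for illustration, in the scalar case):

```python
# Sketch: minimize the convex least-squares loss sum_i (w*x_i - y_i)^2
# over a scalar parameter w by gradient descent. Data is fabricated.

def fit_least_squares(xs, ys, lr=0.01, steps=2000):
    """Gradient descent on the scalar least-squares objective."""
    w = 0.0
    for _ in range(steps):
        # gradient of sum_i (w*x_i - y_i)^2 with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2x, so the minimizer is w = 2
w = fit_least_squares(xs, ys)
print(round(w, 3))  # → 2.0
```

Convexity is what guarantees this iteration cannot get trapped at a non-global stationary point.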
Examples (contd.)

- Maximum likelihood estimation: max_θ Σᵢ₌₁ⁿ log p_θ(xᵢ)
- k-means: min_{μ₁,…,μ_k} Σⱼ₌₁ᵏ Σ_{i∈Cⱼ} ‖xᵢ − μⱼ‖²
Course Outline

- Convexity of sets and functions
- Unconstrained optimization: optimality conditions and algorithms
- Constrained optimization: optimality conditions and algorithms
- Lagrange multiplier theory
- Lagrange multiplier algorithms
- Duality and convex programming
- Dual methods
- First order methods for large scale problems
- Robust optimization
Convex Sets

A convex combination of points x₁, …, x_m ∈ Rⁿ is defined as α₁x₁ + ⋯ + α_m x_m, where α₁ + ⋯ + α_m = 1 and αᵢ ≥ 0 for all i. Note that the convex combinations of two points form the line segment connecting the two points.

Definition of Convex Set
A subset C of Rⁿ is said to be convex if
    αx + (1 − α)y ∈ C,  for all x, y ∈ C and all α ∈ [0, 1].
That is, for a convex set C and any two points in C, the line segment connecting the two points is contained in C.

Examples on the board
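The definition can be probed numerically. A small sketch (the unit disk is a made-up choice of convex set; the sampling scheme is ours):

```python
# Sketch: the unit disk {p : ||p|| <= 1} is convex, so every convex
# combination of two of its points must stay inside it.
import math
import random

def in_disk(p):
    return math.hypot(p[0], p[1]) <= 1.0

def sample_disk():
    # rejection sampling from the unit disk
    while True:
        p = (random.uniform(-1, 1), random.uniform(-1, 1))
        if in_disk(p):
            return p

def convex_comb(x, y, a):
    return (a * x[0] + (1 - a) * y[0], a * x[1] + (1 - a) * y[1])

random.seed(0)
ok = all(in_disk(convex_comb(sample_disk(), sample_disk(), random.random()))
         for _ in range(1000))
print(ok)  # → True
```

A check like this can never prove convexity, but a single failing sample disproves it.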
Illustration of Convex Sets

[Figure: examples of convex and nonconvex sets, showing the segment αx + (1 − α)y, 0 ≤ α ≤ 1, between points x and y ([Bertsekas, p.688])]
Convex Functions

Definition of Convex Function
Let C be a convex subset of Rⁿ. A function f : C → R is said to be convex if
    f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y),  for all x, y ∈ C and α ∈ [0, 1].
The function f is called concave if −f is convex. The function f is called strictly convex if the above inequality is strict for all x, y ∈ C with x ≠ y, and all α ∈ (0, 1).

Examples
- Convex functions: x², eˣ
- Concave functions: log(x), √x
- Neither convex nor concave: x³
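The defining inequality is easy to test on sampled points. A sketch (sample ranges and tolerance are made up) contrasting x², which satisfies it, with x³, which does not on all of R:

```python
# Sketch: test f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) on random samples.
# x^2 is convex on R; x^3 is not (it is concave on the negative half-line).
import random

def satisfies_ineq(f, x, y, a, tol=1e-9):
    return f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + tol

random.seed(1)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5), random.random())
       for _ in range(1000)]

square_ok = all(satisfies_ineq(lambda t: t * t, x, y, a) for x, y, a in pts)
cube_ok = all(satisfies_ineq(lambda t: t ** 3, x, y, a) for x, y, a in pts)
print(square_ok, cube_ok)
```

With this many samples the x³ check finds a violating pair on the negative axis, while x² passes every sample.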
Illustration of Convex Functions

[Figure: the chord αf(x) + (1 − α)f(y) lies above the graph value f(αx + (1 − α)y) for x, y in a convex set C ([Bertsekas, p.689])]
Convex Sets and Functions

Proposition
(a) For any collection {Cᵢ | i ∈ I} of convex sets, the intersection ∩_{i∈I} Cᵢ is convex.
(b) The vector sum of two convex sets C₁ and C₂ is convex.
(c) The image of a convex set under a linear transformation is convex.
(d) For a convex set C and a convex function f : C → R, the level sets {x ∈ C | f(x) ≤ α} and {x ∈ C | f(x) < α} are convex for all scalars α.

Exercises
- Is the union of convex sets convex?
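For the exercise, a one-line counterexample settles it. A sketch (the two intervals are made up): the union of the convex intervals [0, 1] and [2, 3] is not convex, because the midpoint of 1 and 2 escapes it.

```python
# Sketch: C1 = [0, 1] and C2 = [2, 3] are each convex, but their union
# is not: the convex combination of 1 and 2 with a = 0.5 lies in a gap.
def in_union(t):
    return 0 <= t <= 1 or 2 <= t <= 3

x, y = 1.0, 2.0            # both points lie in the union
mid = 0.5 * x + 0.5 * y    # their convex combination, 1.5
print(in_union(x), in_union(y), in_union(mid))  # → True True False
```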
Convex Functions

Jensen's Inequality
For a convex set C and a convex function f : C → R,
(a) f(Σᵢ₌₁ᵐ αᵢxᵢ) ≤ Σᵢ₌₁ᵐ αᵢf(xᵢ), for all x₁, …, x_m ∈ C and all α₁, …, α_m ≥ 0 with Σᵢ₌₁ᵐ αᵢ = 1.
(b) f(∫_C x w(x) dx) ≤ ∫_C f(x) w(x) dx, where w : C → R₊ is such that ∫_C w(x) dx = 1.

Exercises
- Apply Jensen's inequality to prove that (x₁ ⋯ x_n)^{1/n} ≤ (x₁ + ⋯ + x_n)/n for nonnegative numbers x₁, …, x_n. (Hint: use the convexity of eˣ.)
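Before proving the AM-GM exercise, it can be sanity-checked numerically. A sketch with fabricated data (the sample sizes and ranges are arbitrary choices):

```python
# Sketch: for nonnegative numbers, geometric mean <= arithmetic mean,
# the AM-GM inequality that follows from Jensen applied to e^x.
import math
import random

def geo_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

def arith_mean(xs):
    return sum(xs) / len(xs)

random.seed(2)
samples = ([random.uniform(0, 10) for _ in range(5)] for _ in range(200))
ok = all(geo_mean(xs) <= arith_mean(xs) + 1e-9 for xs in samples)
print(ok)  # → True
```

Writing xᵢ = e^{tᵢ} turns the geometric mean into e raised to the average of the tᵢ, which is exactly where Jensen's inequality for eˣ enters.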
Jensen's Inequality

Part (b) can be viewed as an extension of part (a): the nonnegative weight function w corresponds to the nonnegative weights αᵢ, and the condition that w integrates to 1 is the analogue of the condition that the weights αᵢ add up to 1.

Jensen's inequality is one of the most used inequalities in applied mathematics and probability theory. For a random variable X and a convex function f, part (b) leads to f(E[X]) ≤ E[f(X)], where E denotes expectation. For example, we have E[X²] ≥ (E[X])², which is clear from the definition of variance.

Part (a) can be proved by mathematical induction (assume WLOG αᵢ > 0 for all i):

    f(Σᵢ₌₁ᵐ αᵢxᵢ) ≤ α_m f(x_m) + (1 − α_m) f(Σᵢ₌₁ᵐ⁻¹ αᵢ/(1 − α_m) xᵢ)   (why?)
                 ≤ α_m f(x_m) + (1 − α_m) Σᵢ₌₁ᵐ⁻¹ αᵢ/(1 − α_m) f(xᵢ)   (by induction hypothesis)
                 = Σᵢ₌₁ᵐ αᵢf(xᵢ).
Convex Functions

Properties of Convex Functions
(a) A linear function is convex.
(b) Any vector norm is convex.
(c) A weighted sum of convex functions, with positive weights, is convex.
(d) If I is an index set, C is a convex subset of Rⁿ, and fᵢ : C → R is convex for each i ∈ I, then the function h : C → (−∞, ∞] defined by
        h(x) = sup_{i∈I} fᵢ(x)
    is also convex.

Exercises
- Let f(x) = 2x + 3 and g(x) = −2x + 1. Draw the function h(x) = max{f(x), g(x)} and check its convexity.
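Property (d) can be illustrated numerically. A sketch (the three linear pieces below are our own made-up choices, not the exercise's): the pointwise maximum of finitely many linear functions is convex, which we verify through the defining inequality on sampled points.

```python
# Sketch: h = max of the linear (hence convex) functions t, -t, 2t - 1.
# By property (d), h is convex; we spot-check the chord inequality.
import random

def h(t):
    return max(t, -t, 2 * t - 1)

random.seed(3)
triples = ((random.uniform(-4, 4), random.uniform(-4, 4), random.random())
           for _ in range(1000))
ok = all(h(a * x + (1 - a) * y) <= a * h(x) + (1 - a) * h(y) + 1e-9
         for x, y, a in triples)
print(ok)  # → True
```

This "max of linear pieces" construction is how piecewise-linear convex objectives arise in practice.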
Differentiable Convex Functions

Characterization of Differentiable Convex Functions
Let C be a convex subset of Rⁿ and let f : Rⁿ → R be differentiable over Rⁿ.
(a) f is convex over C if and only if
        f(z) ≥ f(x) + (z − x)ᵀ∇f(x),  for all x, z ∈ C.
(b) f is strictly convex over C if and only if the above inequality is strict whenever x ≠ z.

Proof: (⇐) Assume the inequality holds. Choose any x, y ∈ C and α ∈ [0, 1], and let z = αx + (1 − α)y. Using the inequality twice, we obtain
    f(x) ≥ f(z) + (x − z)ᵀ∇f(z),
    f(y) ≥ f(z) + (y − z)ᵀ∇f(z).
Multiply the first inequality by α, the second by (1 − α), and add them to obtain
    αf(x) + (1 − α)f(y) ≥ f(z) + (αx + (1 − α)y − z)ᵀ∇f(z) = f(z),
which proves that f is convex. (⇒) Conversely, assume that f is convex, let x and z be any vectors in C with x ≠ z, and consider the function
    g(α) = (f(x + α(z − x)) − f(x)) / α,  α ∈ (0, 1].
Differentiable Convex Functions

Proof (contd.): Consider any α₁, α₂ with 0 < α₁ < α₂ < 1, and let
    ᾱ = α₁/α₂,  z̄ = x + α₂(z − x).
We have
    f(x + ᾱ(z̄ − x)) = f(ᾱz̄ + (1 − ᾱ)x) ≤ ᾱf(z̄) + (1 − ᾱ)f(x),
so that
    (f(x + ᾱ(z̄ − x)) − f(x)) / ᾱ ≤ f(z̄) − f(x).
Note that the above inequality is strict if f is strictly convex. Since x + ᾱ(z̄ − x) = x + α₁(z − x), it follows that
    (f(x + α₁(z − x)) − f(x)) / α₁ ≤ (f(x + α₂(z − x)) − f(x)) / α₂,
i.e., g(α₁) ≤ g(α₂), with strict inequality if f is strictly convex. This shows that g is monotonically increasing in α, and thus
    (z − x)ᵀ∇f(x) = lim_{α↓0} g(α) ≤ g(1) = f(z) − f(x),
which is the desired inequality.

Exercises
- Check the above characterization by drawing the convex function x² and its outer linear approximation (a tangent line).
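The exercise can also be checked numerically. A sketch for f(x) = x² (sample ranges and tolerance are made up): every tangent line lies below the graph, exactly the inequality in part (a).

```python
# Sketch: for f(t) = t^2, verify f(z) >= f(x) + (z - x) * f'(x),
# i.e., the tangent line at x under-approximates f everywhere.
import random

def f(t):
    return t * t

def fprime(t):
    return 2 * t

random.seed(4)
pairs = ((random.uniform(-10, 10), random.uniform(-10, 10))
         for _ in range(1000))
ok = all(f(z) >= f(x) + (z - x) * fprime(x) - 1e-9 for x, z in pairs)
print(ok)  # → True
```

For this particular f the gap is (z − x)², so the inequality even holds exactly, with equality only at z = x.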
Second Order Characterization

Conditions for Convexity
Let C be a convex subset of Rⁿ and let f : Rⁿ → R be twice continuously differentiable over Rⁿ.
(a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex over C.
(b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex over C.
(c) If C is open and f is convex over C, then ∇²f(x) is positive semidefinite for all x ∈ C.
(d) If f(x) = xᵀQx, where Q is a symmetric matrix, then f is convex if and only if Q is positive semidefinite. Furthermore, f is strictly convex if and only if Q is positive definite.

Proof of (a): For all x, y ∈ C, we have
    f(y) = f(x) + (y − x)ᵀ∇f(x) + ½(y − x)ᵀ∇²f(x + α(y − x))(y − x)
for some α ∈ [0, 1]. By the positive semidefiniteness of ∇²f, we have
    f(y) ≥ f(x) + (y − x)ᵀ∇f(x),
which, by the first order characterization, shows that f is convex over C.
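Part (d) can be illustrated on a small made-up example. A sketch: the symmetric matrix Q = [[2, 1], [1, 2]] has eigenvalues 1 and 3, both positive, so f(x) = xᵀQx should be (strictly) convex; we spot-check the chord inequality.

```python
# Sketch: f(x) = x^T Q x with Q = [[2, 1], [1, 2]] (positive definite:
# eigenvalues 1 and 3), so f should satisfy the convexity inequality.
import random

Q = [[2.0, 1.0], [1.0, 2.0]]

def f(v):
    x1, x2 = v
    return Q[0][0] * x1 * x1 + (Q[0][1] + Q[1][0]) * x1 * x2 + Q[1][1] * x2 * x2

def comb(x, y, a):
    return (a * x[0] + (1 - a) * y[0], a * x[1] + (1 - a) * y[1])

def rand_pt():
    return (random.uniform(-3, 3), random.uniform(-3, 3))

random.seed(5)
ok = all(f(comb(x, y, a)) <= a * f(x) + (1 - a) * f(y) + 1e-9
         for x, y, a in ((rand_pt(), rand_pt(), random.random())
                         for _ in range(1000)))
print(ok)  # → True
```

Flipping a sign to make Q indefinite (e.g., Q = [[2, 1], [1, -2]]) would make the same check fail.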
Second Order Characterization

Exercises
- Find an example of a strictly convex function f such that the Hessian ∇²f is not positive definite everywhere. This shows that the converse of part (b) is not true in general.
- Check whether the following functions are (strictly) convex:
  (a) f(x₁, x₂) = x₁² + x₂² over R²
  (b) f(x) = −log(x) over R₊₊
  (c) f(x, y, z) = e^{x+y+z}
  (d) f(x, y, z) = e^{x²+y+z} − log(x + y) + 3z²
Strong Convexity

We now consider a strengthened form of convexity, which is the key property in proving the linear convergence of first order methods for solving convex optimization problems. A continuously differentiable function f : C → R, where C is a convex set, is said to be strongly convex if for some α > 0,
    (∇f(x) − ∇f(y))ᵀ(x − y) ≥ α‖x − y‖²,  for all x, y ∈ C.

Strong Convexity
Let C be an open convex subset of Rⁿ, and let f : C → R be continuously differentiable over C. If f is strongly convex, then f is strictly convex. Furthermore, if f is twice continuously differentiable over C, then f satisfies the strong convexity condition if and only if the matrix ∇²f(x) − αI, where I is the identity, is positive semidefinite for every x ∈ C.
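A scalar sketch of the condition (the test points and the choice α = 2 are ours): for f(x) = x², the gradient is f′(x) = 2x and (f′(x) − f′(y))(x − y) = 2(x − y)², so f is strongly convex with parameter α = 2, whereas f(x) = x⁴ fails the condition near 0 for every fixed α > 0.

```python
# Sketch: evaluate the strong-convexity gap
#   (f'(x) - f'(y)) * (x - y) - alpha * (x - y)^2,
# which must be >= 0 for all x, y if f is strongly convex with parameter alpha.
def grad_sq(t):        # gradient of f(t) = t^2
    return 2 * t

def grad_quartic(t):   # gradient of f(t) = t^4
    return 4 * t ** 3

def strong_gap(grad, x, y, alpha):
    return (grad(x) - grad(y)) * (x - y) - alpha * (x - y) ** 2

gap_sq = strong_gap(grad_sq, 3.0, -1.5, 2.0)        # zero: tight at alpha = 2
gap_quartic = strong_gap(grad_quartic, 0.01, -0.01, 0.1)  # negative near 0
print(gap_sq, gap_quartic)
```

The quartic example also shows why strong convexity is strictly stronger than strict convexity: x⁴ is strictly convex yet not strongly convex on R.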
Exercises

Assume that ‖·‖ is the ℓ₂-norm in this slide. Consider a twice continuously differentiable function f defined over an open convex set C. Prove that, for some positive constant α, the following are equivalent:
(a) f(x) − (α/2)‖x‖² is convex.
(b) f(cx + (1 − c)y) ≤ cf(x) + (1 − c)f(y) − c(1 − c)(α/2)‖x − y‖², for all c ∈ [0, 1].
Prove that f is strongly convex if either (a) or (b) holds.
Convex and Affine Hull

Let X be a subset of Rⁿ. Recall that a convex combination of elements of X is a vector of the form Σᵢ₌₁ᵐ αᵢxᵢ, where x₁, …, x_m ∈ X and α₁, …, α_m are nonnegative and add up to 1. The convex hull of X, denoted conv(X), is the set of all convex combinations of elements of X. In particular, if X contains a finite number of elements x₁, …, x_m, then
    conv(X) = { Σᵢ₌₁ᵐ αᵢxᵢ | αᵢ ≥ 0, i = 1, …, m, Σᵢ₌₁ᵐ αᵢ = 1 }.

Caratheodory's Theorem
Let X be a subset of Rⁿ. Every element of conv(X) can be represented as a convex combination of no more than n + 1 elements of X.
Local and Global Minima

Let X ⊆ Rⁿ and let f : X → R be a function. A vector x ∈ X is called a local minimum of f if there exists some ε > 0 such that f(x) ≤ f(y) for every y ∈ X satisfying ‖x − y‖ ≤ ε, where ‖·‖ is some vector norm. A vector x is called a global minimum if f(x) ≤ f(y) for all y ∈ X. Under convexity assumptions, a local minimum is equivalent to a global minimum.

Local Min = Global Min under Convexity
If C is a convex subset of Rⁿ and f : C → R is a convex function, then a local minimum of f is also a global minimum. If in addition f is strictly convex, then there exists at most one global minimum of f.
Projection Theorem

Projection Theorem
Let C be a closed convex set and let ‖·‖ be the Euclidean norm.
(a) For every x ∈ Rⁿ, there exists a unique vector z ∈ C that minimizes ‖z − x‖ over all z ∈ C. This vector is called the projection of x on C, and is denoted by [x]⁺, i.e.,
        [x]⁺ = arg min_{z∈C} ‖z − x‖.
(b) Given some x ∈ Rⁿ, a vector z ∈ C is equal to [x]⁺ if and only if
        (y − z)ᵀ(x − z) ≤ 0,  for all y ∈ C.
(c) The mapping f : Rⁿ → C defined by f(x) = [x]⁺ is continuous and nonexpansive, i.e.,
        ‖[x]⁺ − [y]⁺‖ ≤ ‖x − y‖,  for all x, y ∈ Rⁿ.
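For some sets the projection is available in closed form. A sketch (the box [0, 1]ⁿ, the dimension, and the sample ranges are made-up choices): projecting onto a box is coordinatewise clamping, and we spot-check the nonexpansiveness claim of part (c).

```python
# Sketch: projection onto the box [lo, hi]^n is coordinatewise clamping;
# we verify numerically that the mapping is nonexpansive (part (c)).
import math
import random

def project_box(x, lo=0.0, hi=1.0):
    return [min(max(t, lo), hi) for t in x]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

random.seed(6)
ok = True
for _ in range(500):
    x = [random.uniform(-3, 3) for _ in range(4)]
    y = [random.uniform(-3, 3) for _ in range(4)]
    ok = ok and dist(project_box(x), project_box(y)) <= dist(x, y) + 1e-9
print(ok)  # → True
```

Nonexpansiveness is what makes projected-gradient methods stable: the projection step never increases the distance between iterates.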