Math 273a: Optimization. Basic concepts. Instructor: Wotao Yin, Department of Mathematics, UCLA. Spring 2015. Slides based on Chong-Zak, 4th Ed.
Goals of this lecture

The general form of optimization:

minimize f(x) subject to x ∈ Ω.

We study the following topics:
- terminology
- types of minimizers
- optimality conditions
Unconstrained vs constrained optimization

minimize f(x) subject to x ∈ Ω.

Suppose x ∈ Rⁿ. Ω is called the feasible set.
- If Ω = Rⁿ, then the problem is called unconstrained.
- Otherwise, the problem is called constrained.
In general, more sophisticated techniques are needed to solve constrained problems.
(off the topic) Later, we will study some nonsmooth analysis and algorithms that allow f to take the extended value +∞. Then, we can write any constrained problem in the unconstrained form

minimize f(x) + ι_Ω(x),

where the indicator function is

ι_Ω(x) = { 0, if x ∈ Ω; +∞, if x ∉ Ω.

The objective function f(x) + ι_Ω(x) is nonsmooth.
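A minimal sketch of this reformulation in code, using an assumed example set Ω = {x : x ≥ 0} (the nonnegative orthant) and an assumed quadratic objective, neither taken from the slides:

```python
import numpy as np

def indicator(x):
    """iota_Omega(x): 0 if x is in Omega = {x : x >= 0}, +inf otherwise."""
    return 0.0 if np.all(np.asarray(x) >= 0) else np.inf

def f(x):
    # Assumed example objective: f(x) = sum_i (x_i - 1)^2.
    x = np.asarray(x, dtype=float)
    return float(np.sum((x - 1.0) ** 2))

# f + iota_Omega assigns +inf to every infeasible point, so minimizing
# the sum over all of R^n is equivalent to minimizing f over Omega.
print(f([1.0, 1.0]) + indicator([1.0, 1.0]))    # feasible point: finite
print(f([1.0, -1.0]) + indicator([1.0, -1.0]))  # infeasible point: inf
```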
Types of solutions

- x* is a local minimizer if there is ε > 0 such that f(x) ≥ f(x*) for all x ∈ Ω \ {x*} with ‖x − x*‖ < ε.
- x* is a global minimizer if f(x) ≥ f(x*) for all x ∈ Ω \ {x*}.

If ≥ is replaced with >, then they are a strict local minimizer and a strict global minimizer, respectively.

[Figure] x1: strict global minimizer; x2: strict local minimizer; x3: local minimizer
Convexity and global minimizers

- A set Ω is convex if λx + (1 − λ)y ∈ Ω for any x, y ∈ Ω and λ ∈ [0, 1].
- A function f is convex if f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for any x, y ∈ Ω and λ ∈ [0, 1].
- A function is convex if and only if its epigraph (the set of points on or above its graph) is convex.
- An optimization problem is convex if both the objective function and the feasible set are convex.

Theorem: Any local minimizer of a convex optimization problem is a global minimizer.
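The theorem admits a short standard proof, sketched here for completeness:

```latex
% Proof sketch (standard argument, not spelled out in the slides).
Suppose $x^\star$ is a local minimizer but not global: there is $y \in \Omega$
with $f(y) < f(x^\star)$. For $\lambda \in (0,1]$, convexity of $\Omega$ gives
$z_\lambda := \lambda y + (1-\lambda) x^\star \in \Omega$, and convexity of $f$ gives
\[
  f(z_\lambda) \le \lambda f(y) + (1-\lambda) f(x^\star) < f(x^\star).
\]
As $\lambda \to 0$, $z_\lambda \to x^\star$, so every neighborhood of $x^\star$
contains feasible points with strictly smaller objective value, contradicting
local minimality. \qed
```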
Derivatives

First-order derivative of f: the row vector

Df(x) = (∂f/∂x1, ..., ∂f/∂xn).

Gradient of f: ∇f = (Df)ᵀ, which is a column vector.

A gradient represents the slope of the tangent to the graph of the function. It gives the linear approximation of f at a point. It points in the direction of the greatest rate of increase.
Hessian (i.e., second derivative) of f:

(F(x))ᵢⱼ = ∂²f/∂xᵢ∂xⱼ = ∂²f/∂xⱼ∂xᵢ,

which is a symmetric matrix (for f ∈ C²).

For a one-dimensional function f(x) where x ∈ R, it reduces to f''(x). F(x) is the Jacobian of ∇f(x), that is, F(x) = J(∇f(x)). Alternative notation: H(x) and ∇²f(x) are also used for the Hessian.

A Hessian gives a quadratic approximation of f at a point.

Gradient and Hessian are local properties that help us recognize local solutions and determine a direction to move toward the next point.
Example

Consider

f(x1, x2) = x1³ + x1² − x1x2 + x2² + 5x1 + 8x2 + 4.

Then,

∇f(x) = [ 3x1² + 2x1 − x2 + 5 ; −x1 + 2x2 + 8 ] ∈ R²

and

F(x) = [ 6x1 + 2, −1 ; −1, 2 ] ∈ R^{2×2}.

Observation: if f is a quadratic function (remove x1³ in the above example), ∇f(x) is an affine vector function and F(x) is a constant symmetric matrix for any x.
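As an optional sanity check, the gradient and Hessian above can be verified against central finite differences; a sketch in NumPy:

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**3 + x1**2 - x1*x2 + x2**2 + 5*x1 + 8*x2 + 4

def grad(x):
    # Analytic gradient from the slide.
    x1, x2 = x
    return np.array([3*x1**2 + 2*x1 - x2 + 5, -x1 + 2*x2 + 8])

def hess(x):
    # Analytic Hessian from the slide (symmetric, depends only on x1).
    x1, _ = x
    return np.array([[6*x1 + 2, -1.0], [-1.0, 2.0]])

# Compare against central finite differences at an arbitrary point.
x = np.array([0.7, -1.3])
h = 1e-6
fd_grad = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(2)])
assert np.allclose(fd_grad, grad(x), atol=1e-5)

fd_hess = np.array([(grad(x + h*e) - grad(x - h*e)) / (2*h) for e in np.eye(2)])
assert np.allclose(fd_hess, hess(x), atol=1e-4)
```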
Taylor expansion

Suppose φ ∈ C^m (m times continuously differentiable). The Taylor expansion of φ at a point a is

φ(a + h) = φ(a) + φ'(a)h + (φ''(a)/2!)h² + ... + (φ⁽ᵐ⁾(a)/m!)hᵐ + o(hᵐ).¹

There are other ways to write the last two terms.

Example: Consider x, d ∈ Rⁿ and f ∈ C². Define φ(α) = f(x + αd). Then,

φ'(α) = ∇f(x + αd)ᵀd,
φ''(α) = dᵀF(x + αd)d.

Hence,

f(x + αd) = f(x) + (∇f(x)ᵀd)α + o(α)
          = f(x) + (∇f(x)ᵀd)α + (dᵀF(x)d/2)α² + o(α²).

¹ o(α) collects the term(s) that are asymptotically smaller than α near 0, that is, o(α)/α → 0 as α → 0.
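The second-order expansion can be verified numerically; a sketch using the cubic example function from the previous slide, with an arbitrarily chosen point x and direction d (the ratio error/α² should shrink toward 0 as α → 0):

```python
import numpy as np

def f(x):
    x1, x2 = x
    return x1**3 + x1**2 - x1*x2 + x2**2 + 5*x1 + 8*x2 + 4

def grad(x):
    x1, x2 = x
    return np.array([3*x1**2 + 2*x1 - x2 + 5, -x1 + 2*x2 + 8])

def hess(x):
    x1, _ = x
    return np.array([[6*x1 + 2, -1.0], [-1.0, 2.0]])

x = np.array([1.0, 2.0])
d = np.array([0.6, 0.8])   # any fixed direction

def model(alpha):
    # Second-order model: f(x) + alpha grad^T d + (alpha^2/2) d^T F(x) d.
    return f(x) + alpha * (grad(x) @ d) + 0.5 * alpha**2 * (d @ hess(x) @ d)

# The remainder is o(alpha^2): error / alpha^2 -> 0 as alpha -> 0.
for alpha in [1e-1, 1e-2, 1e-3]:
    err = abs(f(x + alpha*d) - model(alpha))
    print(alpha, err / alpha**2)
```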
Feasible direction

A vector d ∈ Rⁿ is a feasible direction at x ∈ Ω if d ≠ 0 and x + αd ∈ Ω for all sufficiently small α > 0. (It is possible that d is an infeasible step, that is, x + d ∉ Ω. But if there is some room in Ω to move from x in the direction d, then d is a feasible direction.)

[Figure] d1 is feasible, d2 is infeasible

If Ω = Rⁿ or x lies in the interior of Ω, then any d ∈ Rⁿ \ {0} is a feasible direction.

Feasible directions are introduced to establish optimality conditions, especially for points on the boundary of the feasible set of a constrained problem.
First-order necessary condition

Let C¹ be the set of continuously differentiable functions.

Theorem 6.1 (FONC): If x* is a local minimizer of f ∈ C¹ over Ω, then for every feasible direction d at x*, we have dᵀ∇f(x*) ≥ 0.

Proof: Let d be any feasible direction. First-order Taylor expansion:

f(x* + αd) = f(x*) + αdᵀ∇f(x*) + o(α).

If dᵀ∇f(x*) < 0, which does not depend on α, then f(x* + αd) < f(x*) for all sufficiently small α > 0 (that is, all α ∈ (0, ᾱ) for some ᾱ > 0). This is a contradiction since x* is a local minimizer.
Corollary (interior case): If x* is a local minimizer of f ∈ C¹ and x* is an interior point of Ω (for example, Ω = Rⁿ), then ∇f(x*) = 0.

Proof: Since any d ∈ Rⁿ \ {0} is a feasible direction, we can set d = −∇f(x*). From Theorem 6.1, we have dᵀ∇f(x*) = −‖∇f(x*)‖² ≥ 0. Since ‖∇f(x*)‖² ≥ 0, we have ‖∇f(x*)‖² = 0 and thus ∇f(x*) = 0.

Comment: This condition also reduces the problem minimize f(x) to solving the equation ∇f(x*) = 0.
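The comment above can be illustrated on a quadratic objective; a minimal sketch, assuming f(x) = ½xᵀQx − bᵀx with Q symmetric positive definite (an example chosen here, not taken from the slides), where ∇f(x) = Qx − b and solving ∇f = 0 is a linear system:

```python
import numpy as np

# Assumed example: f(x) = 0.5 x^T Q x - b^T x, Q symmetric positive
# definite, so the unique minimizer solves grad f(x) = Qx - b = 0.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 4.0])

x_star = np.linalg.solve(Q, b)        # minimization reduced to Qx = b
grad_at_min = Q @ x_star - b
assert np.allclose(grad_at_min, 0.0)  # the FONC holds at x_star
```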
[Figure] x1 fails to satisfy the FONC; x2 satisfies the FONC
Second-order necessary condition

Under the FONC, there are two possibilities:
- dᵀ∇f(x*) > 0;
- dᵀ∇f(x*) = 0.
In the first case, f(x* + αd) > f(x*) for all sufficiently small α > 0. In the second case, the vanishing dᵀ∇f(x*) allows us to check higher-order derivatives.
Let C² be the set of twice continuously differentiable functions.

Theorem (SONC): If x* is a local minimizer of f ∈ C² over Ω, then for every feasible direction d at x* with dᵀ∇f(x*) = 0, we have dᵀF(x*)d ≥ 0.

Proof: Assume there is a feasible direction d with dᵀ∇f(x*) = 0 and dᵀF(x*)d < 0. By the 2nd-order Taylor expansion (with a vanishing 1st-order term), we have

f(x* + αd) = f(x*) + (dᵀF(x*)d/2)α² + o(α²),

where by our assumption dᵀF(x*)d < 0. Hence, for all sufficiently small α > 0, we have f(x* + αd) < f(x*), which contradicts that x* is a local minimizer.
The necessary conditions are not sufficient

Counterexamples:
- f(x) = x³: f'(x) = 3x², f''(x) = 6x, so x = 0 satisfies both the FONC and the SONC, yet 0 is not a local minimizer.
- f(x) = x1² − x2²: 0 is a saddle point: ∇f(0) = 0, but 0 is neither a local minimizer nor a local maximizer. By the SONC, 0 is not a local minimizer!
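A quick numerical way to see that 0 fails the SONC for f(x1, x2) = x1² − x2² is to inspect the Hessian's eigenvalues; a sketch in NumPy:

```python
import numpy as np

# Hessian of f(x1, x2) = x1^2 - x2^2 (constant everywhere, including 0).
F = np.array([[2.0, 0.0], [0.0, -2.0]])

print(np.linalg.eigvalsh(F))  # one negative eigenvalue: F is indefinite

# SONC fails: d = (0, 1) is a feasible direction (unconstrained problem)
# with d^T grad f(0) = 0 but d^T F d < 0, so 0 is not a local minimizer.
d = np.array([0.0, 1.0])
assert d @ F @ d < 0
```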
Second-order sufficient condition

Theorem (SOSC): Suppose f ∈ C², x* is an interior point of Ω, ∇f(x*) = 0, and F(x*) is positive definite. Then x* is a strict local minimizer.

Comments:
- Positive definite means: dᵀF(x*)d > 0 for all d ≠ 0.
- The condition is not necessary for a strict local minimizer (e.g., f(x) = x⁴ at x = 0).

Proof: For any d with ‖d‖ = 1, we have dᵀF(x*)d ≥ λ_min(F(x*)) > 0. Use the 2nd-order Taylor expansion:

f(x* + αd) = f(x*) + (α²/2)dᵀF(x*)d + o(α²) ≥ f(x*) + (α²/2)λ_min(F(x*)) + o(α²).

Then there exists ᾱ > 0, independent of d, such that f(x* + αd) > f(x*) for all α ∈ (0, ᾱ).
[Figure] Graph of f(x) = x1² + x2². The point 0 satisfies the SOSC.
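The claim in the figure can be checked numerically via the smallest Hessian eigenvalue, mirroring the λ_min argument in the proof; a minimal sketch in NumPy:

```python
import numpy as np

# SOSC check for f(x1, x2) = x1^2 + x2^2 at x* = 0: the gradient
# grad f(0) = (2*0, 2*0) = 0 vanishes, and the Hessian F(0) = 2*I.
F = np.array([[2.0, 0.0], [0.0, 2.0]])

# Positive definiteness via the smallest eigenvalue, as in the proof.
lam_min = np.linalg.eigvalsh(F)[0]
assert lam_min > 0   # SOSC holds, so 0 is a strict local minimizer
```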
Roles of optimality conditions

- Recognize a solution: given a candidate solution, check the optimality conditions to verify that it is a solution.
- Measure the quality of an approximate solution: measure how close a point is to being a solution.
- Develop algorithms: reduce an optimization problem to solving a (nonlinear) equation (finding a root of the gradient).

Later, we will see other forms of optimality conditions and how they lead to equivalent subproblems, as well as algorithms.
Quiz questions

1. Show that for Ω = {x ∈ Rⁿ : Ax = b}, d ≠ 0 is a feasible direction at x ∈ Ω if and only if Ad = 0.
2. Show that for any unconstrained quadratic program, which has the form minimize f(x) := (1/2)xᵀQx − bᵀx, if x satisfies the second-order necessary condition, then x is a global minimizer.
3. Show that for any unconstrained quadratic program with Q ⪰ 0 (Q is symmetric and positive semi-definite), x is a global minimizer if and only if x satisfies the first-order necessary condition. That is, the problem is equivalent to solving Qx = b.
4. Consider minimize cᵀx subject to x ∈ Ω. Suppose that c ≠ 0 and the problem has a global minimizer. Can the minimizer lie in the interior of Ω?