OPER 627: Nonlinear Optimization
Lecture 2: Math Background and Optimality Conditions
Department of Statistical Sciences and Operations Research, Virginia Commonwealth University
Aug 28, 2013
Quiz
What is my son's name? Where was my best picture taken?
Yes/No questions:
- A convergent sequence can have only one accumulation point
- The sequence $\{k\pi - \lfloor k\pi \rfloor\}_{k=1}^{\infty}$ has many, but only a finite number of, accumulation points
- We will study optimization problems with a nondifferentiable objective function
- A finite union of open sets is open
- An infinite union of closed sets is closed
- Lipschitz continuous functions must be continuous
Today's Outline
- More math background
- Optimality conditions
Cal-Cool-Less
Gradient of a function $f: \mathbb{R}^n \to \mathbb{R}$:
$$\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$
Hessian of a function $f$: the matrix of second partial derivatives:
$$\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$
- The Hessian matrix $\nabla^2 f(x)$ is symmetric if $f$ is twice continuously differentiable
- Chain rule: $\nabla [h(g(x))] = h'(g(x)) \, \nabla g(x)$
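Not part of the slides: a minimal numpy sketch (the quadratic test function is my own choice) that checks an analytic gradient against finite differences and confirms the Hessian's symmetry.

```python
import numpy as np

def f(x):
    # f(x) = x1^2 + 3*x1*x2 + 2*x2^2, a simple quadratic for illustration
    return x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2

def grad_f(x):
    # analytic gradient: [2*x1 + 3*x2, 3*x1 + 4*x2]
    return np.array([2*x[0] + 3*x[1], 3*x[0] + 4*x[1]])

def hess_f(x):
    # analytic Hessian: constant for a quadratic, symmetric as expected
    return np.array([[2.0, 3.0], [3.0, 4.0]])

x = np.array([1.0, -1.0])
# central finite-difference check of the gradient
eps = 1e-6
fd = np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(2)])
print(grad_f(x), fd)                        # the two should agree to ~1e-9
print(np.allclose(hess_f(x), hess_f(x).T))  # True: the Hessian is symmetric
```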
Mean-value theorem (first-order expansion)
Let $f: \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable over an open set $I$. Then for all $x \in I$ and $p \in \mathbb{R}^n$:
1. There exists $t \in (0,1)$ such that $f(x+p) = f(x) + \nabla f(x+tp)^\top p$
2. $f(x+p) = f(x) + \nabla f(x)^\top p + o(\|p\|)$
Notation $o(\cdot)$: $h(\alpha)$ is $o(\alpha)$ if $\lim_{\alpha \to 0} h(\alpha)/\alpha = 0$
Geometry: the gradient $\nabla f(x)$ captures the local trend of $f$ at the point $x$
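A numerical illustration of statement 2, not from the slides (the test function is my own choice): the first-order remainder $f(x+p) - f(x) - \nabla f(x)^\top p$ is $o(\|p\|)$, so the ratio remainder$/\|p\|$ shrinks to zero.

```python
import numpy as np

# f(x) = sin(x1) * x2 with its analytic gradient
f = lambda x: np.sin(x[0]) * x[1]
grad = lambda x: np.array([np.cos(x[0]) * x[1], np.sin(x[0])])

x = np.array([0.3, 2.0])
d = np.array([1.0, 1.0]) / np.sqrt(2.0)   # unit direction
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    p = t * d
    rem = f(x + p) - f(x) - p @ grad(x)
    print(t, abs(rem) / np.linalg.norm(p))  # ratio tends to 0 as p -> 0
```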
Second-order expansion
Let $f: \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable over an open set $I$. Then for all $y$ with $x, x+y \in I$:
1. $\nabla f(x+y) = \nabla f(x) + \int_0^1 \nabla^2 f(x+ty) \, y \, dt$
2. There exists $\alpha \in (0,1)$ such that $f(x+y) = f(x) + y^\top \nabla f(x) + \frac{1}{2} y^\top \nabla^2 f(x+\alpha y) \, y$
3. $f(x+y) = f(x) + y^\top \nabla f(x) + \frac{1}{2} y^\top \nabla^2 f(x) \, y + o(\|y\|^2)$
When $\nabla f = 0$, $\nabla^2 f$ becomes the principal factor.
That's it! I promise we will only see $\nabla f$ and $\nabla^2 f$ in this course!
In general, Taylor's expansion (for $f: \mathbb{R} \to \mathbb{R}$): $f(x+y) = f(x) + \sum_{k=1}^{n} \frac{1}{k!} f^{(k)}(x) \, y^k + o(|y|^n)$
People make fun of optimization: optimization = Taylor's expansion with $n = 2$
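A companion check for statement 3, again my own sketch rather than slide material: the second-order remainder should vanish faster than $\|y\|^2$.

```python
import numpy as np

# f(x) = exp(x1) + x1*x2^2, an arbitrary smooth test function
def f(x):
    return np.exp(x[0]) + x[0]*x[1]**2

def grad(x):
    return np.array([np.exp(x[0]) + x[1]**2, 2*x[0]*x[1]])

def hess(x):
    return np.array([[np.exp(x[0]), 2*x[1]],
                     [2*x[1],       2*x[0]]])

x = np.array([0.5, 1.0])
d = np.array([1.0, -2.0])
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    y = t * d
    model = f(x) + y @ grad(x) + 0.5 * y @ hess(x) @ y
    print(t, abs(f(x + y) - model) / np.dot(y, y))  # ratio -> 0
```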
PD and PSD
A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is:
- Positive definite (PD) if $x^\top A x > 0$ for all $x \neq 0$
- Positive semidefinite (PSD) if $x^\top A x \geq 0$ for all $x \in \mathbb{R}^n$
Some useful results on PD/PSD:
- For any $A \in \mathbb{R}^{m \times n}$, $A^\top A$ is PSD; $A^\top A$ is PD if and only if $A$ has full column rank
- If $A \in \mathbb{R}^{n \times n}$, $A^\top A$ is PD $\iff$ $A$ is nonsingular
- $A$ is PD $\iff$ $A^{-1}$ is PD
- $A$ is PD $\iff$ $\mathrm{eig}(A) > 0$; $A$ is PSD $\iff$ $\mathrm{eig}(A) \geq 0$
- If $A$ is PD with smallest eigenvalue $\lambda_A$ and largest eigenvalue $\Lambda_A$, then $\lambda_A \|x\|_2^2 \leq x^\top A x \leq \Lambda_A \|x\|_2^2$
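A quick numpy sanity check of two of these facts, assuming a randomly drawn tall matrix (which has full column rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # tall matrix, full column rank (a.s.)
G = A.T @ A                       # Gram matrix A'A

eigs = np.linalg.eigvalsh(G)      # eigvalsh: eigenvalues of a symmetric matrix
print(eigs)                       # all > 0 here, so G is PD

# Rayleigh-quotient bounds: lam_min * ||x||^2 <= x'Gx <= lam_max * ||x||^2
x = rng.standard_normal(3)
q = x @ G @ x
print(eigs[0] * x @ x <= q <= eigs[-1] * x @ x)   # True
```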
Ideas for optimization from Cal-Cool-Less?
- The gradient and Hessian provide tools for studying the local properties of a function
- A Hessian with special structure (PD, PSD) gives those local properties a nice shape
- Looking for the minimum of a function? Combine them!
Key concepts
- Stationary points: $\nabla f(x) = 0$
- Local minimum $\hat{x}$: $f(\hat{x}) \leq f(x)$ for all $x \in N(\hat{x}) \cap S$
- Global minimum $x^*$: $f(x^*) \leq f(x)$ for all $x \in S$
A sad but true fact: the algorithms in this course will only guarantee a local minimum
Optimality conditions: Necessary conditions
Necessary conditions for an unconstrained local minimizer $x^*$ of $f$:
- FONC: Assume $f \in C^1$ on an open set $I$; then $\nabla f(x^*) = 0$
  - We call such an $x^*$ a stationary point
  - Originally formulated by Fermat (1637)
  - What if $I$ is a closed set? We will see that in the 2nd half of the course
  - Q: why is this not sufficient?
- SONC: Assume $f \in C^2$ on an open set $I$; then $\nabla^2 f(x^*)$ is PSD
Optimality conditions: Sufficient conditions
First-order information alone is clearly not sufficient to certify a local minimizer.
SOSC: Let $f \in C^2$ on an open set $I$, and suppose $x^* \in I$ satisfies:
- $\nabla f(x^*) = 0$
- $\nabla^2 f(x^*)$ is PD
Then $x^*$ is a strict unconstrained local minimizer. Furthermore, there exist $\delta > 0$ and $\epsilon > 0$ such that $f(x) \geq f(x^*) + \frac{\delta}{2} \|x - x^*\|^2$ for all $x$ with $\|x - x^*\| \leq \epsilon$.
The gap between SONC and SOSC is very narrow! But there is no necessary and sufficient optimality condition in general
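Standard one-dimensional examples (not on the slide) show the gap is real:
$$f(x) = x^4: \quad f'(0) = 0,\ f''(0) = 0 \text{ (PSD but not PD), so SOSC fails at } x = 0, \text{ yet } 0 \text{ is a strict local minimizer};$$
$$f(x) = -x^4: \quad f'(0) = 0,\ f''(0) = 0, \text{ so FONC and SONC both hold at } x = 0, \text{ yet } 0 \text{ is a maximizer}.$$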
How we shall use the optimality conditions
1. Use them directly (see the sketch after this list):
   - First find all points satisfying $\nabla f(x) = 0$
   - Then check SONC: see if $\nabla^2 f(x)$ is PSD
   - Finally, for the remaining points, check if $\nabla^2 f(x)$ is PD
   - Q: What do we get after all these?
2. Alternatively: find all points satisfying FONC (stationary points), and pick the one with the minimum objective value
   - Q: Is it a global minimum?
3. In general, solving $\nabla f(x) = 0$ could be as hard as solving the original optimization problem!
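A worked illustration of recipe 1 (my own example, not from the slides): $f(x) = x_1^3/3 - x_1 + x_2^2$ has two stationary points, and the Hessian tests classify them.

```python
import numpy as np

def grad(x):
    # gradient of f(x) = x1^3/3 - x1 + x2^2
    return np.array([x[0]**2 - 1.0, 2.0*x[1]])

def hess(x):
    return np.array([[2.0*x[0], 0.0],
                     [0.0,      2.0]])

# Step 1 (FONC): solve grad f = 0 by hand -> x1 = +/-1, x2 = 0
candidates = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]

for x in candidates:
    eigs = np.linalg.eigvalsh(hess(x))
    if np.all(eigs > 0):        # SOSC: Hessian PD
        verdict = "strict local minimizer"
    elif np.all(eigs >= 0):     # SONC only: PSD, test inconclusive
        verdict = "passes SONC, inconclusive"
    else:                       # SONC fails: cannot be a local minimizer
        verdict = "not a local minimizer (a saddle here)"
    print(x, eigs, verdict)
```

Note that this $f$ is unbounded below as $x_1 \to -\infty$, so even the stationary point with the smallest objective value is not a global minimizer, which speaks to the question in recipe 2.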
Optimality conditions and optimization algorithms
1. Algorithms check whether a solution satisfies the optimality conditions, and terminate if so (see the sketch below)
2. The behavior of many algorithms in the neighborhood of a local minimum depends on whether certain optimality conditions hold
   - If SOSC holds, then algorithms typically converge very fast locally
Keep them in mind! We will see them a lot soon
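A minimal sketch of point 1 (my illustration, not an algorithm from the slides): gradient descent with a fixed step size, terminating when the FONC residual $\|\nabla f(x)\|$ falls below a tolerance.

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10000):
    """Fixed-step gradient descent; stops at approximate stationarity."""
    x = x0.astype(float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # FONC residual small: terminate
            return x, k
        x -= step * g
    return x, max_iter

# Quadratic test: f(x) = 0.5 x'Ax - b'x with PD A; the unique minimizer solves Ax = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star, iters = gradient_descent(lambda x: A @ x - b, np.zeros(2))
print(x_star, iters, np.linalg.solve(A, b))  # the two solutions should match
```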
All we have is the LOCAL result...
In the accent of a nonlinear optimizer: "solve" = "find a local optimum"
Which local optimum we find depends on the starting point.
Key concept: Convexity
From now on, convexity is your best friend
- A set $S$ is convex if for all $x, y \in S$ and $\alpha \in [0,1]$, $\alpha x + (1-\alpha) y \in S$
- A function $f$ is convex if:
  (1) its domain $S$ (where it is defined) is a convex set, and
  (2) $f(\alpha x + (1-\alpha) y) \leq \alpha f(x) + (1-\alpha) f(y)$ for all $\alpha \in [0,1]$ and $x, y \in S$
A \$1,000,000,000 result: when optimizing a convex function over a convex set, ANY locally optimal solution is globally optimal!
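A quick numerical sanity check of the defining inequality (my sketch, using $f(x) = \|x\|^2$, a standard convex function):

```python
import numpy as np

# Verify f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) at random points
rng = np.random.default_rng(1)
f = lambda x: np.dot(x, x)   # f(x) = ||x||^2 is convex

ok = True
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    a = rng.uniform()
    ok &= f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) + 1e-12  # rounding tolerance
print(ok)  # True
```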
Next time
- Optimization under convexity: optimization utopia
- No class next Monday (Labor Day)
- Homework 1 out, due on Sept. 9th