IE 5531: Engineering Optimization I
Lecture 14: Unconstrained optimization
Prof. John Gunnar Carlsson
October 27, 2010
Administrivia

- Midterms returned 11/01
- 11/01 office hours moved
- PS5 posted this evening
Recap: Applications of KKT conditions

- Portfolio optimization
- Public good allocation
- Communication channel power allocation (water-filling)
- Fisher's exchange market
Today

Algorithms for unconstrained minimization:

- Introduction
- Bisection search
- Golden section search
- Line search
- Wolfe and Goldstein conditions
- Gradient method (steepest descent)
Introduction

- Today's lecture is focused on solving the unconstrained problem: minimize $f(x)$ for $x \in \mathbb{R}^n$
- Ideally, we would like to find a global minimizer, i.e. a point $x^*$ such that $f(x^*) \le f(x)$ for all $x \in \mathbb{R}^n$
- In general, as we have seen with the KKT conditions, we have to settle for a local minimizer, i.e. a point $x^*$ such that $f(x^*) \le f(x)$ for all $x$ in a local neighborhood $N(x^*)$
- If $f(x)$ is convex, these two notions are the same
Necessary and sufficient conditions

- If $x^*$ is a local minimizer, then there must be no descent direction, i.e. no direction $d$ such that $\nabla f(x^*)^T d < 0$
- This immediately implies that $\nabla f(x^*) = 0$
- We also need to distinguish between local maximizers and local minimizers, so we also require that $H \succeq 0$, where $h_{ij} = \partial^2 f(x^*) / \partial x_i \partial x_j$
- The stronger condition $H \succ 0$ is a sufficient condition for $x^*$ to be a minimizer
- Again, if $f(x)$ is convex (and continuously differentiable), then $\nabla f(x^*) = 0$ is a necessary and sufficient condition for a global minimizer
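As an illustration (my own sketch, not from the slides), these conditions can be checked numerically by finite differences; the test function, step sizes, and tolerances below are arbitrary choices.

```python
import numpy as np

def grad_fd(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hess_fd(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        H[:, i] = (grad_fd(f, x + e) - grad_fd(f, x - e)) / (2 * h)
    return 0.5 * (H + H.T)  # symmetrize

# Example: f(x) = x0^2 + 2*x1^2 has its unique minimizer at the origin
f = lambda x: x[0]**2 + 2 * x[1]**2
x_star = np.array([0.0, 0.0])
print(np.linalg.norm(grad_fd(f, x_star)))      # ~0: first-order condition holds
print(np.linalg.eigvalsh(hess_fd(f, x_star)))  # all eigenvalues > 0: H is positive definite
```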
Overview

- Optimization algorithms tend to be iterative procedures: starting at a given point $x_0$, they generate a sequence $\{x_k\}$ of iterates
- This sequence terminates either when no more progress can be made (out of memory, etc.) or when a solution point has been approximated satisfactorily
- At any given iterate $x_k$, we generally want $x_{k+1}$ to satisfy $f(x_{k+1}) < f(x_k)$
- Furthermore, we want our sequence to converge to a local minimizer $x^*$
- The general approach is a line search: at any given iterate $x_k$, choose a direction $d_k$, and then set $x_{k+1} = x_k + \alpha_k d_k$ for some scalar $\alpha_k > 0$
Convergent sequences

Definition. Let $\{x_k\}$ be a sequence of real numbers. Then $\{x_k\}$ converges to $x^*$ if and only if for all real numbers $\epsilon > 0$, there exists a positive integer $K$ such that $|x_k - x^*| < \epsilon$ for all $k \ge K$.

Examples of convergent sequences:

- $x_k = 1/k$
- $x_k = (1/2)^k$
- $x_k = \left[\frac{1}{\log(k+1)}\right]^k$
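A quick sketch (my own, not from the slides) printing a few terms of each example sequence, all of which converge to 0:

```python
import math

for k in range(1, 30, 5):
    a = 1 / k                            # converges like O(1/k)
    b = 0.5 ** k                         # geometric convergence
    c = (1 / math.log(k + 1)) ** k       # super-geometric once log(k+1) > 1
    print(f"k={k:2d}  1/k={a:.4f}  (1/2)^k={b:.2e}  [1/log(k+1)]^k={c:.2e}")
```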
Searching in one variable: root-finding

- Intermediate value theorem: given a continuous single-variable function $f(x)$ and a pair of points $x_l$ and $x_r$ such that $f(x_l) < 0$ and $f(x_r) > 0$, there exists a point $x^* \in [x_l, x_r]$ such that $f(x^*) = 0$
- A simpler question to motivate what follows: how can we find $x^*$ (or a point within $\epsilon$ of $x^*$)?
Bisection

1. Choose $x_{mid} = \frac{x_l + x_r}{2}$ and evaluate $f(x_{mid})$
2. If $f(x_{mid}) = 0$, then $x^* = x_{mid}$ and we're done
3. Otherwise,
   - If $f(x_{mid}) < 0$, then set $x_l = x_{mid}$
   - If $f(x_{mid}) > 0$, then set $x_r = x_{mid}$
4. If $x_r - x_l < \epsilon$, we're done; otherwise, go to step 1

The algorithm above divides the search interval in half at every iteration; thus, to approximate $x^*$ to within $\epsilon$ we require at most $\log_2 \frac{x_r - x_l}{\epsilon}$ iterations.
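A minimal Python sketch of this procedure (my own illustration; the test function and tolerance are arbitrary):

```python
def bisect(f, xl, xr, eps=1e-8):
    """Find a root of f in [xl, xr], assuming f(xl) < 0 < f(xr)."""
    assert f(xl) < 0 < f(xr)
    while xr - xl >= eps:
        xmid = (xl + xr) / 2
        if f(xmid) == 0:
            return xmid
        if f(xmid) < 0:
            xl = xmid   # root lies in [xmid, xr]
        else:
            xr = xmid   # root lies in [xl, xmid]
    return (xl + xr) / 2

print(bisect(lambda x: x**2 - 2, 0.0, 2.0))  # ~1.41421356, i.e. sqrt(2)
```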
Golden section search

- Consider a unimodal function $f(x)$ defined on an interval $[x_l, x_r]$
- Unimodal: $f(x)$ has only one local minimizer $x^*$ in $[x_l, x_r]$
- How can we find $x^*$ (or a point within $\epsilon$ of $x^*$)?
- Hint: we can do this without derivatives
- Hint: we need to sample two points $x_l', x_r'$ in $[x_l, x_r]$
Golden section search

Assume without loss of generality that $x_l = 0$ and $x_r = 1$; set $\psi = \frac{3 - \sqrt{5}}{2}$.

1. Set $x_l' = x_l + \psi(x_r - x_l)$ and $x_r' = x_l + (1 - \psi)(x_r - x_l)$
2. If $f(x_l') < f(x_r')$, then the minimizer must lie in the interval $[x_l, x_r']$, so set $x_r = x_r'$
3. Otherwise, the minimizer must lie in the interval $[x_l', x_r]$, so set $x_l = x_l'$
4. If $x_r - x_l < \epsilon$, we're done; otherwise, go to step 1

By setting $\psi = \frac{3 - \sqrt{5}}{2}$ we decrease the search interval by a constant factor $1 - \psi \approx 0.618$ at every iteration.
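A Python sketch of the procedure above (my own; for simplicity it re-evaluates both interior points each pass, whereas a tuned implementation would reuse one of them):

```python
import math

PSI = (3 - math.sqrt(5)) / 2  # ~0.382; interval shrinks by 1 - PSI ~ 0.618

def golden_section(f, xl, xr, eps=1e-8):
    """Minimize a unimodal f on [xl, xr] without derivatives."""
    while xr - xl >= eps:
        a = xl + PSI * (xr - xl)         # x_l'
        b = xl + (1 - PSI) * (xr - xl)   # x_r'
        if f(a) < f(b):
            xr = b   # minimizer lies in [xl, x_r']
        else:
            xl = a   # minimizer lies in [x_l', xr]
    return (xl + xr) / 2

print(golden_section(lambda x: (x - 0.3)**2, 0.0, 1.0))  # ~0.3
```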
Line search: step length

- Consider the multi-dimensional problem: minimize $f(x)$ for $x \in \mathbb{R}^n$
- At each iterate $x_k$ we set $d_k = -\nabla f(x_k)$ and set $x_{k+1} = x_k + \alpha_k d_k$, for appropriately chosen $\alpha_k$
- Ideally, we would like $\alpha_k$ to be the minimizer of the univariate function $\phi(\alpha) := f(x_k + \alpha d_k)$, but this is time-consuming
- In the big picture, we want $\alpha_k$ to give us a sufficient reduction in $f(x)$ without spending too much time finding it
- Two conditions we can impose are the Wolfe and Goldstein conditions
Armijo condition

- Clearly the step length $\alpha_k$ should guarantee a sufficient decrease in $f(x)$, so we require

  $\phi(\alpha) = f(x_k + \alpha d_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T d_k$

  with $c_1 \in (0, 1)$
- The right-hand side is linear in $\alpha$
- Note that this is satisfied for all $\alpha$ that are sufficiently small
- In practice, we often set $c_1 \approx 10^{-4}$
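One common way to use the Armijo condition is backtracking, which the slides do not spell out: start with a trial step and shrink it geometrically until the condition holds. A minimal sketch (my own; the shrink factor and test function are arbitrary):

```python
import numpy as np

def armijo_backtracking(f, grad_f, x, d, c1=1e-4, beta=0.5, alpha0=1.0):
    """Shrink alpha geometrically until the Armijo condition holds."""
    alpha = alpha0
    fx = f(x)
    slope = grad_f(x) @ d   # grad f(x)^T d, negative for a descent direction
    while f(x + alpha * d) > fx + c1 * alpha * slope:
        alpha *= beta
    return alpha

f = lambda x: x @ x
g = lambda x: 2 * x
x = np.array([1.0, -2.0])
d = -g(x)
print(armijo_backtracking(f, g, x, d))  # a step giving sufficient decrease
```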
Curvature condition

- The preceding condition is not sufficient on its own, because arbitrarily small $\alpha$ satisfy it, which means that $\{x_k\}$ may not converge to a minimizer
- One way to get around this is to impose the additional condition

  $\phi'(\alpha) = \nabla f(x_k + \alpha d_k)^T d_k \ge c_2 \nabla f(x_k)^T d_k$

  where $c_2 \in (c_1, 1)$
- This condition just says that the slope $\phi'(\alpha)$ has to be at least $c_2$ times the slope $\phi'(0)$
- Typically we choose $c_2 \approx 0.9$
- If the slope $\phi'(\alpha)$ were still strongly negative, it would mean that our step size wasn't chosen very well (we could continue in that direction and decrease the function further)
- The Armijo condition and the curvature condition, when combined, are called the Wolfe conditions
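A small sketch (mine, not from the slides) that tests whether a given step length satisfies both Wolfe conditions:

```python
def satisfies_wolfe(f, grad_f, x, d, alpha, c1=1e-4, c2=0.9):
    """Check the Armijo (sufficient decrease) and curvature conditions."""
    slope0 = grad_f(x) @ d
    armijo = f(x + alpha * d) <= f(x) + c1 * alpha * slope0
    curvature = grad_f(x + alpha * d) @ d >= c2 * slope0
    return armijo and curvature
```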
Goldstein conditions

- An alternative to the Wolfe conditions is the Goldstein conditions:

  $f(x_k) + (1 - c)\,\alpha \nabla f(x_k)^T d_k \le f(x_k + \alpha d_k) \le f(x_k) + c\,\alpha \nabla f(x_k)^T d_k$

  with $c \in (0, 1/2)$
- The second inequality is just the sufficient decrease condition
- The first inequality bounds the step length from below
- One disadvantage is that the local minimizers of $\phi(\alpha)$ may be excluded in this search
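And the analogous check for the Goldstein conditions (again my own sketch; the value of c is an arbitrary choice in (0, 1/2)):

```python
def satisfies_goldstein(f, grad_f, x, d, alpha, c=0.25):
    """Check both Goldstein inequalities for step length alpha."""
    slope0 = grad_f(x) @ d
    phi = f(x + alpha * d)
    return f(x) + (1 - c) * alpha * slope0 <= phi <= f(x) + c * alpha * slope0
```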
Steepest (gradient) descent example

- Recall that in the method of steepest descent, we set $d_k = -\nabla f(x_k)$
- Consider the case where we want to minimize $f(x) = c^T x + \frac{1}{2} x^T Q x$, where $Q$ is a symmetric positive definite matrix
- Clearly, the unique minimizer lies where $\nabla f(x^*) = 0$, which occurs precisely when $Qx = -c$
- The descent direction will be $d = -\nabla f(x) = -(c + Qx)$
Steepest descent example

- The iteration scheme $x_{k+1} = x_k + \alpha_k d_k$ is given by $x_{k+1} = x_k - \alpha_k (c + Q x_k)$
- We need to choose a step size $\alpha_k$, so we consider $\phi(\alpha) = f(x_k - \alpha(c + Q x_k))$
Steepest descent example

- Note that we don't even need the Wolfe or Goldstein conditions, as we can find the optimal $\alpha$ analytically:

  $\phi(\alpha) = f(x_k - \alpha(c + Q x_k)) = c^T (x_k - \alpha(c + Q x_k)) + \frac{1}{2} (x_k - \alpha(c + Q x_k))^T Q (x_k - \alpha(c + Q x_k))$

- Since $\phi(\alpha)$ is a strictly convex quadratic function of $\alpha$, it is not hard to see that its minimizer occurs where

  $c^T d_k + x_k^T Q d_k + \alpha\, d_k^T Q d_k = 0$

  and thus we set

  $\alpha_k = \frac{d_k^T d_k}{d_k^T Q d_k}$

  with $d_k = -(c + Q x_k)$
Steepest descent example

The recursion for the steepest descent method is therefore

$x_{k+1} = x_k + \left(\frac{d_k^T d_k}{d_k^T Q d_k}\right) d_k$, where $d_k = -(c + Q x_k)$
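A Python sketch of this recursion on a small quadratic (my own illustration; the matrix, vector, and stopping rule are arbitrary choices):

```python
import numpy as np

def steepest_descent_quadratic(Q, c, x0, tol=1e-10, max_iter=10_000):
    """Minimize f(x) = c^T x + 0.5 x^T Q x with the exact line search step."""
    x = x0.astype(float)
    for _ in range(max_iter):
        d = -(c + Q @ x)                 # steepest descent direction
        if np.linalg.norm(d) < tol:      # gradient ~ 0: stationary point reached
            break
        alpha = (d @ d) / (d @ Q @ d)    # exact minimizer of phi(alpha)
        x = x + alpha * d
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
c = np.array([-1.0, -2.0])
x = steepest_descent_quadratic(Q, c, np.zeros(2))
print(x, np.linalg.solve(Q, -c))         # iterate vs. exact solution of Qx = -c
```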
Convergence of steepest descent

Theorem. Let $f(x)$ be a given continuously differentiable function. Let $x_0 \in \mathbb{R}^n$ be a point for which the sub-level set $X_0 = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded. Let $\{x_k\}$ be a sequence of points generated by the steepest descent method initiated at $x_0$, using either the Wolfe or Goldstein line search conditions. Then $\{x_k\}$ converges to a stationary point of $f(x)$.

- The above theorem gives what is called the global convergence property of the steepest-descent method
- No matter how far away $x_0$ is, the steepest descent method must converge to a stationary point
- The steepest descent method may, however, be very slow to reach that point