IE 5531: Engineering Optimization I
Lecture 19: Midterm 2 Review
Prof. John Gunnar Carlsson
November 22, 2010
Administrivia
Midterm 2 on 11/24
Covers lectures 10-17
Open book, open notes
Lecture 10: Introduction to nonlinear methods
A global minimizer for the problem
    minimize f(x) s.t. x ∈ F
is a vector x̄ ∈ F such that f(x̄) ≤ f(x) for all x ∈ F
Unlike linear programming, sometimes we must settle for a local minimizer x̄ ∈ F, which is only locally optimal
A local minimizer x̄ ∈ F is a vector satisfying f(x̄) ≤ f(x) for all x ∈ F ∩ N(x̄), where N(x̄) is called a neighborhood of x̄
Typically N(x̄) is an open ball centered at x̄ with sufficiently small radius δ > 0
Lecture 10: Introduction to nonlinear methods
Suppose that f(x) is a differentiable function; if x̄ ∈ R^n and there exists a vector d such that ∇f(x̄)^T d < 0, then there exists a scalar τ̄ > 0 such that f(x̄ + τd) < f(x̄) for all τ ∈ (0, τ̄)
The vector d is called a descent direction at x̄
The most obvious descent direction is d = −∇f(x̄)
Sometimes a better descent direction is d = −H⁻¹∇f(x̄), where H is the Hessian matrix of f (this captures more global behavior of the function)
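As a quick sanity check of the two directions above, here is a minimal Python sketch on a made-up quadratic f(x) = (x₁² + 10x₂²)/2, whose Hessian H = diag(1, 10) is constant; both directions satisfy ∇f(x̄)^T d < 0, but the Newton direction points straight at the minimizer 0:

```python
# Steepest-descent direction d = -grad f(x) vs. Newton direction
# d = -H^{-1} grad f(x) on the example f(x) = 0.5*(x1^2 + 10*x2^2),
# whose Hessian H = diag(1, 10) is constant.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def grad(x):
    return [x[0], 10.0 * x[1]]

def steepest_direction(x):
    g = grad(x)
    return [-g[0], -g[1]]

def newton_direction(x):
    # H = diag(1, 10), so H^{-1} grad f = (g1/1, g2/10)
    g = grad(x)
    return [-g[0] / 1.0, -g[1] / 10.0]

x = [1.0, 1.0]
d_sd = steepest_direction(x)  # [-1, -10]: skewed toward the steep axis
d_nt = newton_direction(x)    # [-1, -1]: points at the minimizer (0, 0)

# Both satisfy the descent condition grad(x)^T d < 0
print(dot(grad(x), d_sd) < 0, dot(grad(x), d_nt) < 0)
```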
Lecture 10: Introduction to nonlinear methods
Clearly a necessary condition for optimality of a feasible point x̄ is that there be no feasible descent direction; that is, D(x̄; F) ∩ D(x̄; f) = ∅
In an unconstrained problem we have F = R^n and therefore every direction is feasible
It follows that if x̄ is optimal, then we must have D(x̄; f) = ∅
This means that for all vectors d we must have ∇f(x̄)^T d ≥ 0, which can only happen if ∇f(x̄) = 0
Lecture 12: Nonlinear methods, continued
Theorem
A necessary condition for optimality at a point x̄ of the linearly-constrained problem (LEP)
    minimize f(x) s.t. Ax = b
is that ∇f(x̄) = A^T ȳ for some vector ȳ ∈ R^m.
The geometric interpretation is that the gradient vector must be perpendicular to the constraint hyperplanes
Lecture 12: Nonlinear methods, continued
Theorem
Let x̄ be a local minimizer of the problem
    minimize f(x) s.t. g_i(x) = 0, i ∈ {1, ..., m}
If the functions f(x) and g_i(x) are continuously differentiable at x̄ and the Jacobian matrix ∇g(x̄) has rank m, then there exist scalars ȳ_1, ..., ȳ_m such that
    ∇f(x̄) = Σ_{i=1}^m ȳ_i ∇g_i(x̄)
where the ȳ_i's are called Lagrange multipliers.
Lecture 12: Nonlinear methods, continued
Theorem
A necessary condition for optimality at a point x̄ for the problem
    minimize f(x) s.t. Ax ≥ b
is that ∇f(x̄) = A^T ȳ for some vector ȳ ∈ R^m with ȳ ≥ 0. Furthermore, we must have ȳ_i = 0 if A_i x̄ > b_i.
Lecture 12: Nonlinear methods, continued
Theorem (KKT Conditions)
If x̄ is a local minimizer for the problem
    minimize f(x) s.t. c(x) ≥ 0, h(x) = 0
with c(x) ∈ R^m and h(x) ∈ R^p, and certain (technical) constraint qualifications are satisfied at x̄, then there exist scalars ȳ_1, ..., ȳ_m and z̄_1, ..., z̄_p such that
    ∇f(x̄) = Σ_{i=1}^m ȳ_i ∇c_i(x̄) + Σ_{i=1}^p z̄_i ∇h_i(x̄)
    ȳ_i ≥ 0 for all i
    ȳ_i c_i(x̄) = 0 for all i
Lecture 12: Nonlinear methods, continued
In all of the preceding problems, if the objective function and the feasible sets are convex, then the necessary optimality conditions are also sufficient!
Lecture 13: Applications of KKT conditions
Consider an economy consisting of n = 2 agents, each of whom consumes a public good x (national defense) and a private good y (cars)
A public good has two important properties:
It is nonrival: consumption of the good by one agent does not reduce the amount available to another agent
It is non-excludable: no agent can prevent another agent from using it
The agents each have initial endowments y_1, y_2 which can be spent on the public good (price p) or the private good (price 1, say)
The agents' utility functions are given by u_1(x, z_1) and u_2(x, z_2), where
x: total amount of the public good that agents 1 and 2 pay for
z_1, z_2: amount of the private good that agents 1 and 2 have
Lecture 13: Applications of KKT conditions
At an optimal allocation,
    (∂u_1/∂x)/(∂u_1/∂z_1) + (∂u_2/∂x)/(∂u_2/∂z_2) = p
which is called the Samuelson condition
The quantity (∂u_i/∂x)/(∂u_i/∂z_i) is called the marginal rate of substitution between the public and the private good; it says how many units of the private good the consumer will give up for one extra unit of x
This says that, when public goods are allocated optimally, the unit cost p of the public good should be equal to the sum of the benefits to all agents from the public good
Lecture 13: Applications of KKT conditions
Power allocation: we have a collection of n communication channels and we need to decide how much power to allocate to each of them
The capacity (communication rate) of channel i is log(α_i + x_i) with given α_i > 0, and we have a budget constraint 1^T x = 1, x ≥ 0
The optimization problem is
    maximize Σ_{i=1}^n log(α_i + x_i) s.t. 1^T x = 1, x ≥ 0
Lecture 13: Applications of KKT conditions
The optimal solution is x_i = max{0, 1/ν − α_i} for some ν:
    x_i = 1/ν − α_i if ν < 1/α_i, and x_i = 0 otherwise
Think of a set of patches with heights α_i; we then flood the region with water of height 1/ν
The total amount of water is Σ_{i=1}^n max{0, 1/ν − α_i}
The water-filling algorithm is precisely to fill the region with water until an amount 1 has been used!
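The water level 1/ν can be found by simple bisection: raise the level until exactly the budget has been used. A minimal Python sketch (the α values below are made-up example data):

```python
# Water-filling sketch: maximize sum_i log(alpha_i + x_i)
# s.t. sum_i x_i = budget, x >= 0, by bisecting on the water
# level 1/nu; the optimal allocation is x_i = max(0, level - alpha_i).

def water_fill(alphas, budget=1.0, tol=1e-10):
    lo, hi = min(alphas), max(alphas) + budget  # brackets the water level
    while hi - lo > tol:
        level = 0.5 * (lo + hi)
        used = sum(max(0.0, level - a) for a in alphas)
        if used > budget:
            hi = level
        else:
            lo = level
    return [max(0.0, lo - a) for a in alphas]

x = water_fill([0.1, 0.5, 1.0])
print(x, sum(x))  # allocations sum to the budget; weakest channel gets most
```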
Lecture 13: Applications of KKT conditions
Buyers have money w_i to buy j different goods and maximize their individual (linear) utility functions
Producers sell their goods for money
An equilibrium price is an assignment of prices to goods so that the market clears when every buyer buys his optimal set of goods
Each buyer's strategy is
    maximize u_i^T x_i s.t. p^T x_i ≤ w_i, x_i ≥ 0
Lecture 13: Applications of KKT conditions
    maximize Σ_i w_i log(u_i^T x_i) s.t. Σ_i x_i = 1, x_i ≥ 0 for all i
Theorem (Eisenberg and Gale 1959)
The optimal Lagrange multiplier for the equality constraints in the above NLP is an equilibrium price vector.
Lecture 14: Unconstrained optimization
Optimization algorithms tend to be iterative procedures: starting at a given point x_0, they generate a sequence {x_k} of iterates
This sequence terminates when either no more progress can be made (out of memory, etc.) or when a solution point has been approximated satisfactorily
At any given iterate x_k, we generally want x_{k+1} to satisfy f(x_{k+1}) < f(x_k)
Furthermore, we want our sequence to converge to a local minimizer x̄
The general approach is a line search: at any given iterate x_k, choose a direction d_k, and then set x_{k+1} = x_k + α_k d_k for some scalar α_k > 0
Lecture 14: Unconstrained optimization
Root-finding methods:
Bisection
Golden section search
Lecture 14: Unconstrained optimization
Consider the multi-dimensional problem for x ∈ R^n
    minimize f(x)
At each iteration we set d_k = −∇f(x_k) and set x_{k+1} = x_k + α_k d_k, for appropriately chosen α_k
In the big picture, we want α_k to give us a sufficient reduction in f(x), without spending too much time on it
Two conditions we can impose are the Wolfe and Goldstein conditions
Lecture 14: Unconstrained optimization
The Goldstein Conditions
Lecture 14: Unconstrained optimization
The Wolfe Conditions
Lecture 14: Unconstrained optimization
Theorem
Let f(x) be a given continuously differentiable function. Let x_0 ∈ R^n be a point for which the sub-level set X_0 = {x ∈ R^n : f(x) ≤ f(x_0)} is bounded. Let {x_k} be a sequence of points generated by the steepest descent method initiated at x_0, using either the Wolfe or Goldstein line search conditions. Then {x_k} converges to a stationary point of f(x).
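The steepest descent scheme can be sketched with a simple backtracking line search that enforces the sufficient-decrease (Armijo) part of the conditions above; the quadratic objective below is an arbitrary example, and the constants c1 and the halving factor are conventional choices:

```python
# Steepest descent with backtracking (Armijo) line search:
# accept the first step length alpha (halving from 1) for which
# f(x + alpha*d) <= f(x) + c1*alpha*grad(x)^T d.

def grad_descent(grad, f, x0, iters=2000, c1=1e-4):
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        d = [-gi for gi in g]                    # steepest descent direction
        gTd = sum(gi * di for gi, di in zip(g, d))
        alpha = 1.0
        while f([xi + alpha * di for xi, di in zip(x, d)]) > f(x) + c1 * alpha * gTd:
            alpha *= 0.5                         # backtrack
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

f = lambda x: (x[0] - 1.0) ** 2 + 4.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 8.0 * (x[1] + 2.0)]
print(grad_descent(grad, f, [0.0, 0.0]))  # converges toward [1, -2]
```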
Lecture 15: Nonlinear optimization
Minimizing a function f(x) can be thought of as finding a solution to the nonlinear system of equations ∇f(x) = 0
Suppose we begin at a point x_0 that is thought to be close to a minimizer x̄
We may consider the problem of finding a solution to ∇f(x) = 0 that is close to x_0 (we're assuming that there aren't any maximizers that are closer to x_0)
Newton's method is a general method for solving a system of equations g(x) = 0 (to minimize/maximize, set g(x) := ∇f(x))
Lecture 15: Nonlinear optimization
Newton's method is an iterative method that follows the following scheme:
1. At a given iterate x_k, make a linear approximation L(x) to g(x) at x_k by differentiating g(x)
2. Set x_{k+1} to be the solution to the linear system of equations L(x) = 0
It is not hard to show that, in the univariate case, the iteration is
    x_{k+1} = x_k − g(x_k)/g′(x_k)
which is well-defined provided g′(x_k) exists and is nonzero at each step
Note that the iteration terminates if g(x_k) = 0
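The univariate iteration fits in a few lines of Python; g(x) = x² − 2, whose root is √2, is an example choice:

```python
# Univariate Newton iteration x_{k+1} = x_k - g(x_k)/g'(x_k).

def newton(g, gprime, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) < tol:        # terminate once g(x_k) = 0 (to tolerance)
            break
        x = x - gx / gprime(x)   # requires gprime(x) nonzero
    return x

root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
print(root)  # ~ 1.41421356...
```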
Lecture 15: Nonlinear optimization
Consider the problem of solving g(x) = 0
Define the Jacobian matrix J = ∇g by [J]_{ij} = ∂g_i(x)/∂x_j (the rows of J are just the gradient vectors ∇g_i(x))
The iterations are
    x_{k+1} = x_k − J⁻¹ g(x_k)
where J is constructed at the point x_k
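A minimal sketch of the multivariate iteration on an example 2×2 system (the Newton step J d = g(x_k) is solved with Cramer's rule to keep the code self-contained):

```python
# Multivariate Newton on the example system
#   g1(x) = x0^2 + x1^2 - 1,  g2(x) = x0 - x1,
# whose solution near the start point is (1/sqrt(2), 1/sqrt(2)).

def g(x):
    return [x[0] ** 2 + x[1] ** 2 - 1.0, x[0] - x[1]]

def jacobian(x):
    # rows are the gradients of g1 and g2
    return [[2.0 * x[0], 2.0 * x[1]],
            [1.0, -1.0]]

def newton_step(x):
    gx, J = g(x), jacobian(x)
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    # Solve J d = g(x) by Cramer's rule, then step x - d
    d0 = (gx[0] * J[1][1] - J[0][1] * gx[1]) / det
    d1 = (J[0][0] * gx[1] - gx[0] * J[1][0]) / det
    return [x[0] - d0, x[1] - d1]

x = [1.0, 0.5]
for _ in range(20):
    x = newton_step(x)
print(x)  # ~ [0.7071..., 0.7071...]
```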
Lecture 15: Nonlinear optimization
The ellipsoid method is best introduced by considering the problem of finding an element of a solution set X given by a system of linear inequalities:
    X = {x ∈ R^n : a_i^T x ≤ b_i, i = 1, ..., m}
An ellipsoid is just a set of the form
    E_k = {x ∈ R^n : (x − x_k)^T B_k⁻¹ (x − x_k) ≤ 1}
where
x_k is the center of the ellipsoid
B_k is a symmetric positive definite matrix of dimension n
Lecture 15: Nonlinear optimization
At a given iteration k with x_k and E_k, we construct E_{k+1} as follows: define
    τ = 1/(n + 1); δ = n²/(n² − 1); σ = 2τ
We set
    x_{k+1} = x_k − τ B_k a_j / √(a_j^T B_k a_j)
    B_{k+1} = δ (B_k − σ B_k a_j a_j^T B_k / (a_j^T B_k a_j))
Lecture 15: Nonlinear optimization
Theorem
The ellipsoid E_{k+1} defined in the preceding slide is the minimum volume ellipsoid that contains the half-ellipsoid E_k^half := {x ∈ E_k : a_j^T x ≤ a_j^T x_k}. Moreover,
    vol(E_{k+1}) / vol(E_k) = (n²/(n² − 1))^{(n−1)/2} (n/(n + 1)) < exp(−1/(2(n + 1))) < 1
This establishes that the volume of the ellipsoid decreases by a constant factor at each iteration. It can be shown that the ellipsoid method solves linear programs in O(n² log(R/ɛ)) iterations.
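The stated volume bound is easy to check numerically for a few dimensions n:

```python
import math

# Check the ellipsoid volume-reduction bound:
# (n^2/(n^2-1))^((n-1)/2) * (n/(n+1)) < exp(-1/(2(n+1))) < 1.

def vol_ratio(n):
    return (n * n / (n * n - 1.0)) ** ((n - 1) / 2.0) * (n / (n + 1.0))

for n in [2, 5, 10, 100]:
    bound = math.exp(-1.0 / (2.0 * (n + 1.0)))
    assert vol_ratio(n) < bound < 1.0
    print(n, vol_ratio(n), bound)
```

Note that the bound approaches 1 as n grows, which is why the iteration count scales with n².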
Lecture 16: Complexity theory, interior point methods
If we determine that we can solve problem P in no more than α f(N) operations, where α is a constant and f(N) is some function, then we say that algorithm A solves problem P in running time O(f(N))
If f(N) is bounded above by a polynomial, e.g. O(mn²), O(m³n), etc., then we say that algorithm A solves problem P in polynomial running time (a desirable property)
An undesirable property: solving problem P in exponential or factorial time
Long-standing question (solved a long time ago): does the simplex method solve an LP in polynomial running time? (answer: NO)
Lecture 17: Interior point methods
Consider the linearly-constrained problem
    minimize f(x) s.t. Ax = b, x ≥ 0
The KKT conditions are
    Xs = 0
    Ax = b
    ∇f(x) − A^T y − s = 0
    x, s ≥ 0
Lecture 17: Interior point methods
We can treat the constraint x ≥ 0 in a soft way by applying a penalty, or barrier function:
    minimize f(x) − μ Σ_{j=1}^n log x_j s.t. Ax = b
as x_j → 0, the barrier term increases
The KKT conditions are
    ∇f(x) − μX⁻¹1 − A^T y = 0
    Ax = b
    x > 0
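A one-variable sketch of the barrier idea, on the made-up example problem minimize (x + 1)² s.t. x ≥ 0: the barrier subproblem minimize (x + 1)² − μ log x has stationarity condition 2(x + 1) − μ/x = 0, i.e. 2x² + 2x − μ = 0, which can be solved in closed form, and the barrier minimizer x(μ) approaches the constrained minimizer 0 as μ → 0:

```python
import math

# Barrier minimizer of (x + 1)^2 - mu*log(x): the positive root of
# 2*x^2 + 2*x - mu = 0, by the quadratic formula.

def x_of_mu(mu):
    return (-1.0 + math.sqrt(1.0 + 2.0 * mu)) / 2.0

for mu in [1.0, 0.1, 0.01, 0.001]:
    print(mu, x_of_mu(mu))  # x(mu) -> 0, the constrained minimizer
```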
Lecture 17: Interior point methods
The central path C of the barrier problem is defined by
    C = {(x(μ), y(μ), s(μ)) : 0 < μ < ∞}, with x(μ) > 0 and s(μ) > 0
It turns out that as μ → 0, the central path converges to the minimizer of the original constrained problem
Interior point methods solve convex optimization problems (including LPs) with running time that is both theoretically and practically fast
Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I November 22, 2010