Applications of Linear Programming
Lecturer: András London
University of Szeged, Institute of Informatics, Department of Computational Optimization
Lecture 9
Non-linear programming

In the case of LP, the goal was to maximize or minimize a linear function subject to linear constraints. In many interesting maximization and minimization problems, however, the objective function or some of the constraints are not linear. Such an optimization problem is called a nonlinear programming problem (NLP).
Example

If K units of capital and L units of labor are used, a company can produce KL units of a manufactured good. Capital can be purchased at $4/unit and labor at $1/unit. A total of $8 is available to purchase capital and labor. How can the firm maximize the quantity of the good that can be manufactured?
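A minimal numeric sketch (not from the slides): substituting the budget constraint L = 8 − 4K into the objective reduces the problem to one variable, which we can maximize by scanning K on a fine grid.

```python
# Maximize K*L subject to 4K + L = 8: substitute L = 8 - 4K and scan K.
# The analytic optimum is K = 1, L = 4, producing KL = 4 units.
best_K = max((k / 1000.0 for k in range(0, 2001)),   # K in [0, 2]
             key=lambda K: K * (8 - 4 * K))
best_L = 8 - 4 * best_K
print(best_K, best_L, best_K * best_L)  # 1.0 4.0 4.0
```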
NLP problem

We are finding the maximum of a function f, where f : Rⁿ → R:

    max f(x),  x ∈ Rⁿ

Often constraints are also given, in the form g_i(x) ≤ 0, i = 1, ..., m, and/or g_j(x) = 0, j = 1, ..., k. Here f and the g_i can be linear (LP) or non-linear (NLP), but we assume that they are continuous; there is no further restriction on x. Let S be the set of feasible solutions.
NLP problem

When is x* a solution?
- x* is a local maximum (optimum) if there exists δ > 0 such that f(x*) ≥ f(x) for all x ∈ S with ‖x* − x‖ < δ.
- x* is a global maximum (optimum) if f(x*) ≥ f(x) for all x ∈ S.

Remark: it would be nice if every local optimum were also a global optimum. This is true if the function is concave (or convex).
NLP problem: convex and concave functions
Gradient, Hessian

The gradient of a function f is defined as

    ∇f(x) = [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n]

The Hessian (matrix) of f is

    H(x) = ∇(∇f(x)) = ∇²f(x),  with entries h_ij = ∂²f / ∂x_i ∂x_j,

therefore H is a symmetric n × n matrix.

Remark: we generally suppose that f is smooth enough, i.e. continuously differentiable (the derivative function is also differentiable) as many times as necessary.
Gradient, Hessian: example

Consider f(x, y) = x² + y³ + xy. Then the gradient is

    ∇f(x, y) = [2x + y, 3y² + x]

and the Hessian is

    H(x, y) = [ 2    1 ]
              [ 1   6y ]
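The analytic gradient and Hessian above can be sanity-checked numerically with central finite differences; a small sketch (the evaluation point (1, 2) and step h are arbitrary choices):

```python
# Finite-difference check of grad f = [2x + y, 3y^2 + x] and the mixed
# second derivative f_xy = 1 for f(x, y) = x^2 + y^3 + x*y.
f = lambda x, y: x**2 + y**3 + x*y
x0, y0, h = 1.0, 2.0, 1e-5

num_fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2*h)        # ~ 2*x0 + y0 = 4
num_fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2*h)        # ~ 3*y0**2 + x0 = 13
num_fxy = (f(x0 + h, y0 + h) - f(x0 + h, y0 - h)
           - f(x0 - h, y0 + h) + f(x0 - h, y0 - h)) / (4*h*h)  # ~ 1

print(num_fx, num_fy, num_fxy)
```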
Necessary and sufficient condition for optimality

Theorem (necessary condition). If x* is a maximum (minimum) of f(x), then ∇f(x*) = 0.

This is not sufficient, see e.g. f(x) = x³ at x = 0.

Theorem (sufficient condition). If ∇f(x*) = 0 and H(x*) is negative definite (positive definite), then x* is a local maximum (local minimum) of f.¹

¹ Check how to decide whether a matrix is positive or negative (semi)definite!
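One standard way to answer the footnote's question (an illustration, not from the slides): a symmetric matrix is positive semidefinite iff all its eigenvalues are ≥ 0, and negative semidefinite iff all are ≤ 0. Here we test the Hessian from the example above, evaluated at y = 1.

```python
import numpy as np

# Hessian of f(x, y) = x^2 + y^3 + x*y at y = 1
H = np.array([[2.0, 1.0],
              [1.0, 6.0]])

eigs = np.linalg.eigvalsh(H)       # eigenvalues of a symmetric matrix, ascending
is_psd = bool(np.all(eigs >= 0))   # positive semidefinite?
is_nsd = bool(np.all(eigs <= 0))   # negative semidefinite?
print(eigs, is_psd, is_nsd)        # both eigenvalues positive -> PSD, not NSD
```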
Gradient descent method

The necessary condition

    ∇f(x) = [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n] = 0

leads to an equation system that may not be solved easily. Instead, we can use an iterative procedure.

Local search method: let x_0 be an initial solution. The iteration step is

    x_{i+1} = x_i + μ d_i,   i = 0, 1, 2, ...

where d_i is a direction in which f is increasing (in case of maximization) and μ is the step size. If ‖x_{i+1} − x_i‖ ≤ ε: STOP.

We know that ∇f(x) points in the direction of greatest increase. Thus d_i = ∇f(x_i) is an appropriate choice. The step size μ can be determined by solving max_μ f(x_i + μ d_i), or we can approximate it.
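The iteration above can be sketched in a few lines (a minimal illustration with a fixed step size μ; the toy objective f(x, y) = −x² − 2y² is an assumption, not from the slides):

```python
import math

# Gradient ascent x_{i+1} = x_i + mu * grad_f(x_i) with the stopping rule
# ||x_{i+1} - x_i|| <= eps from the slide.
def gradient_ascent(grad_f, x0, mu=0.1, eps=1e-8, max_iter=10000):
    x = list(x0)
    for _ in range(max_iter):
        g = grad_f(x)
        x_new = [xi + mu * gi for xi, gi in zip(x, g)]
        if math.dist(x, x_new) <= eps:   # step became tiny -> STOP
            return x_new
        x = x_new
    return x

# f(x, y) = -x^2 - 2y^2, grad f = (-2x, -4y); the maximizer is (0, 0)
x_star = gradient_ascent(lambda v: [-2*v[0], -4*v[1]], [1.0, 1.0])
print(x_star)  # close to (0, 0)
```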
Gradient descent method: example

Example. Find the maximum of f(x, y) = −(x − 3)² − 3(y − 1)².

    ∇f = (∂f/∂x, ∂f/∂y) = (−2(x − 3), −6(y − 1)) = (6 − 2x, 6 − 6y)

Start from x_0 = (0, 0):

    x_1 = (0, 0) + μ(6, 6) = (6μ, 6μ),

thus we are finding max_μ f(6μ, 6μ):

    max_μ f(6μ, 6μ) = max_μ [−(6μ − 3)² − 3(6μ − 1)²] = max_μ [−144(μ − 1/4)² − 3]

hence μ = 1/4 and x_1 = (6/4, 6/4).
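The exact line search of this step can be reproduced numerically; a sketch where the "exact" search is approximated by scanning μ on a fine grid:

```python
# One gradient-ascent step for f(x, y) = -(x-3)^2 - 3*(y-1)^2 from (0, 0):
# the ascent direction is grad f(0,0) = (6, 6), and we pick mu maximizing
# f(6*mu, 6*mu) = -144*(mu - 1/4)^2 - 3.
f = lambda x, y: -(x - 3)**2 - 3*(y - 1)**2

mu_best = max((m / 10000.0 for m in range(0, 10001)),   # mu in [0, 1]
              key=lambda m: f(6*m, 6*m))
x1 = (6*mu_best, 6*mu_best)
print(mu_best, x1)  # 0.25 (1.5, 1.5)
```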
(Figure: iterates x_1, x_2, x_3 of the gradient method approaching the optimum x*.)
Constrained optimization: Lagrange duality

Given the problem

    max_{x ∈ Rⁿ} f(x)
    s.t. g_i(x) ≤ 0,  i = 1, ..., l
         g_j(x) = 0,  j = l + 1, ..., m

let

    L(x, λ, ν) = f(x) − Σ_{i=1}^{l} λ_i g_i(x) − Σ_{j=l+1}^{m} ν_j g_j(x)

Then the Lagrange dual function is

    g(λ, ν) = max_{x ∈ Rⁿ} L(x, λ, ν).

Then the dual optimization problem is defined as:

    min_{(λ,ν)} g(λ, ν)   subject to λ_i ≥ 0, i = 1, ..., l.
Lagrange duality

λ_i and ν_j are called Lagrange multipliers. For the existence of a maximum of L(x, λ, ν) it is necessary that:

1. ∂L/∂x = ∇f(x) − Σ_i λ_i ∇g_i(x) − Σ_j ν_j ∇g_j(x) = 0
2. ∂L/∂λ_i = −g_i(x) = 0,  i = 1, ..., l
3. ∂L/∂ν_j = −g_j(x) = 0,  j = l + 1, ..., m

Especially, if for instance ν = 0 (there is no equality condition), then x is a solution of the original problem if g_i(x) ≤ 0, i = 1, ..., l, and ∇f(x) = Σ_i λ_i ∇g_i(x). This is only a necessary condition!
Lagrange duality: example

Find the maximum of f(x, y) = x + y subject to x² + y² = 1.

Solution. The Lagrangian is

    L(x, y, λ) = x + y − λ(x² + y² − 1)

and its gradient is

    ∇L(x, y, λ) = (1 − 2λx, 1 − 2λy, −(x² + y² − 1)).

The necessity condition ∇L(x, y, λ) = 0 gives

    1 − 2λx = 0,  1 − 2λy = 0,  x² + y² − 1 = 0.

Solving it we get x = y = 1/(2λ) with λ = ±1/√2, thus (x, y) = (√2/2, √2/2) or (x, y) = (−√2/2, −√2/2). It follows that f(√2/2, √2/2) = √2 is a maximum and f(−√2/2, −√2/2) = −√2 is a minimum.

Remark: we should have checked the sufficiency condition (but here we know that we are finished. Why?)
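A quick numeric check of the stationarity system above (a sketch; it simply verifies the two solutions found by hand):

```python
import math

# Check 1 - 2*lam*x = 0, 1 - 2*lam*y = 0, x^2 + y^2 = 1 for lam = +-1/sqrt(2).
for lam in (1/math.sqrt(2), -1/math.sqrt(2)):
    x = y = 1 / (2 * lam)
    assert abs(1 - 2*lam*x) < 1e-12          # stationarity in x (and y)
    assert abs(x*x + y*y - 1) < 1e-12        # feasibility on the circle
    print(lam, (x, y), x + y)                # f = +sqrt(2) at max, -sqrt(2) at min
```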
Lagrange duality: example
Lagrange duality and LP

An LP in standard form is given as

    max_{x ∈ Rⁿ} f(x) = cᵀx
    s.t. Ax ≤ b, x ≥ 0.

The Lagrange function is then

    L(x, λ) = cᵀx − λᵀ(Ax − b)

and

    g(λ) = max_x L(x, λ).

We need the partial derivatives to find the maximum (necessary condition!).
Lagrange duality and LP

The partial derivatives are

    ∂L/∂x = cᵀ − λᵀA = 0   and   ∂L/∂λ = −(Ax − b) = 0.

The Lagrange function can be rewritten as

    L(x, λ) = cᵀx − λᵀ(Ax − b) = (cᵀ − λᵀA)x + λᵀb,

then

    g(λ) = max_x L(x, λ) = { λᵀb   if cᵀ − λᵀA ≤ 0
                           { +∞    otherwise,

and the dual problem is:

    min λᵀb
    s.t. Aᵀλ ≥ c, λ ≥ 0.
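A small numeric illustration of this primal–dual pair (the data c, A, b and the optimal points are assumed for the example, not from the slides): the primal max cᵀx s.t. Ax ≤ b, x ≥ 0 and the dual min bᵀλ s.t. Aᵀλ ≥ c, λ ≥ 0 attain the same value at their optima.

```python
# Primal: max 3x1 + 2x2  s.t.  x1 + x2 <= 4, x1 <= 2, x >= 0  -> x* = (2, 2)
# Dual:   min 4l1 + 2l2  s.t.  l1 + l2 >= 3, l1 >= 2, l >= 0  -> lam* = (2, 1)
c, b = (3.0, 2.0), (4.0, 2.0)
A = ((1.0, 1.0), (1.0, 0.0))

x = (2.0, 2.0)       # primal optimal point (known analytically for this data)
lam = (2.0, 1.0)     # dual optimal point

# feasibility: Ax <= b and A^T lam >= c (componentwise)
assert all(sum(A[i][j] * x[j] for j in range(2)) <= b[i] + 1e-9 for i in range(2))
assert all(sum(A[i][j] * lam[i] for i in range(2)) >= c[j] - 1e-9 for j in range(2))

primal_val = sum(ci * xi for ci, xi in zip(c, x))
dual_val = sum(bi * li for bi, li in zip(b, lam))
print(primal_val, dual_val)  # 10.0 10.0 -> strong duality holds
```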
Lagrange duality

Theorem (weak duality). Let p* = max_x f(x) be a primal optimum. Then for any λ ≥ 0 and any ν,

    p* ≤ g(λ, ν).

Theorem (strong duality via the Slater condition). Let p* = max_x f(x) be the primal optimum and d* = min_{(λ,ν)} g(λ, ν) the dual optimum. Suppose there exists x_0 ∈ S that is an inner point of the feasible region, i.e. g_i(x_0) < 0 for i = 1, ..., l and g_j(x_0) = 0 for j = l + 1, ..., m (this is called the Slater condition). Then p* = d*.

Remark: compare the two theorems with the duality theorems we learned in the case of LP.
The log-barrier method

Motivation: instead of solving a constrained optimization problem, we move the constraints into the objective function (similarly to what we did using the Lagrange multipliers).

Idea: suppose that we have an initial feasible solution x_0 ∈ S. Add a barrier at the border of S. The task is

    max_{x ∈ Rⁿ} f(x)   subject to g_i(x) ≤ 0, i = 1, ..., m.

The Lagrange function:

    L(x, λ) = f(x) − λᵀg(x) = f(x) − Σ_{i=1}^{m} λ_i g_i(x)
The log-barrier method

The barrier function is defined as

    B_μ(x) = f(x) + μφ(x),

where φ is usually given in the form

    φ(x) = Σ_{i=1}^{m} log(−g_i(x))   (logarithmic barrier)

or

    φ(x) = Σ_{i=1}^{m} 1/g_i(x)   (reciprocal barrier).

Parameter μ controls the strength of the barrier:
- μ large: gradual barrier
- μ small: sharp barrier
A log-barrier method

    φ(x) = Σ_{i=1}^{m} log(−g_i(x))        φ(x) = Σ_{i=1}^{m} 1/g_i(x)

(Figure: the barrier functions for μ large (blue) and μ small (brown).)
The log-barrier method: example

Example. max x + y subject to x² + y² ≤ 1.

    B_μ = x + y + μ/(x² + y² − 1)

Start from an initial feasible solution x_0 and solve the problem max B_μ. Then decrease μ gradually and solve max B_μ again and again.
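The scheme above can be sketched as follows (an illustration under assumptions: backtracking gradient ascent as the inner solver and a particular μ-schedule are choices of this sketch, not prescribed by the slides). The true constrained optimum is (√2/2, √2/2) with value √2.

```python
import math

def B(x, y, mu):
    # reciprocal-barrier objective B_mu = x + y + mu / (x^2 + y^2 - 1)
    g = x*x + y*y - 1.0              # constraint value, negative inside the disk
    return x + y + mu / g if g < 0 else -math.inf

def grad_B(x, y, mu):
    g = x*x + y*y - 1.0
    d = -2.0 * mu / (g * g)          # d/dg (mu/g) = -mu/g^2, chain rule adds 2x, 2y
    return 1.0 + d * x, 1.0 + d * y

x, y = 0.0, 0.0                      # strictly feasible starting point
for mu in (1.0, 0.1, 0.01, 1e-3, 1e-4):     # decrease mu gradually
    for _ in range(2000):
        gx, gy = grad_B(x, y, mu)
        step = 0.1
        # backtrack until the step stays feasible and improves B_mu
        while step > 1e-14 and B(x + step*gx, y + step*gy, mu) <= B(x, y, mu):
            step *= 0.5
        if step > 1e-14:
            x, y = x + step*gx, y + step*gy

print(x, y, x + y)  # approaches (0.707..., 0.707...), value near sqrt(2)
```

Note how every iterate stays strictly inside the feasible region: the barrier term tends to −∞ at the boundary, so an improving step can never leave the disk.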
The log-barrier method: remarks

- The solutions of max B_μ converge to the solution of the original problem as μ → 0, but cannot reach it if g_i(x*) = 0, since the barrier term is unbounded on the boundary for any fixed μ > 0.
- It can be applied only in the case of inequality constraints; with equality constraints it has no feasible solution.
- If μ is small, the barrier function is ill-conditioned, meaning that it is hard to optimize with numerical methods.
- If the method leaves the feasible region in an iteration, the objective function is undefined (logarithmic barrier), or the method gives a bad solution (reciprocal barrier).
Simplex vs. interior point method Applying the log-barrier method to LP we obtain an important interior point method!
Penalty function

What can we do if there is no initial feasible point (i.e. we cannot find an inner point)?

Idea: let the penalty function be π(x, ρ) = f(x) + ρψ(x), where ρ < 0 is the penalty parameter and

    ψ(x) = 0 if x is a feasible solution, ψ(x) > 0 otherwise.

Goal: solve the series of problems max_x π(x, ρ_i), where ρ_i → −∞. More concretely: make the objective function value small for non-feasible solutions.
Penalty function

Consider the problem

    max_{x ∈ Rⁿ} f(x)   subject to g_j(x) = 0, j = 1, ..., m,

given with equality constraints. A squared penalty function is

    π(x, ρ) = f(x) + (1/2) ρ Σ_{j=1}^{m} g_j(x)².

Calculating the derivatives:

    ∇π = ∇f(x) + ρ Σ_{j=1}^{m} g_j(x) ∇g_j(x) = 0.

Comparing with the Lagrange condition of the original problem,

    0 = ∇f(x) − Σ_{j=1}^{m} λ_j ∇g_j(x),

we get λ_j = −ρ g_j(x).
Penalty function: example

Example. max x + y subject to x² + y² ≤ 1.

    P_ρ = x + y + ρ max{0, x² + y² − 1}²
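A sketch of the penalty scheme for this example (assumptions of the sketch: backtracking gradient ascent as the inner solver and the ρ-schedule; neither is prescribed by the slides). Unlike the barrier method, the iterates may be slightly infeasible and approach the constrained optimum (√2/2, √2/2) from outside as |ρ| grows.

```python
import math

def P(x, y, rho):
    # penalized objective P_rho = x + y + rho * max(0, x^2 + y^2 - 1)^2, rho < 0
    v = max(0.0, x*x + y*y - 1.0)    # constraint violation, 0 if feasible
    return x + y + rho * v * v

def grad_P(x, y, rho):
    v = max(0.0, x*x + y*y - 1.0)
    return 1.0 + 4.0 * rho * v * x, 1.0 + 4.0 * rho * v * y

x, y = 0.0, 0.0                      # no feasibility needed to start
for rho in (-10.0, -100.0, -1000.0):         # rho_i -> -infinity
    for _ in range(2000):
        gx, gy = grad_P(x, y, rho)
        step = 0.1
        # backtrack until the step improves P_rho
        while step > 1e-14 and P(x + step*gx, y + step*gy, rho) <= P(x, y, rho):
            step *= 0.5
        if step > 1e-14:
            x, y = x + step*gx, y + step*gy

print(x, y, x + y)  # slightly outside the disk, value just above sqrt(2)
```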
Penalty function: example
Penalty function

- We do not know the appropriate value of ρ < 0 in advance. If |ρ| is too large, the problem is ill-conditioned.
- Decrease the value of ρ step by step (ρ_i → −∞).
- Start each optimum search from the previous solution (as starting point).
- If the objective function converges: STOP.
- Unfortunately, the method does not always give a feasible solution.
Barrier vs. Penalty

Example. max eˣ s.t. x ≤ 0
Barrier vs. Penalty

(Figure: barrier functions vs. penalty functions.)