Penalty and Barrier Methods

General classical constrained minimization problem:

    minimize f(x)
    subject to g(x) ≤ 0
               h(x) = 0

Penalty methods are motivated by the desire to use unconstrained optimization techniques to solve constrained problems. This is achieved either by adding a penalty for infeasibility, forcing the solution toward feasibility and then to the optimum, or by adding a barrier that prevents a feasible solution from ever becoming infeasible.

Optimization in Engineering Design, Georgia Institute of Technology, Systems Realization Laboratory
Penalty Methods

Penalty methods use a mathematical function that increases the objective for any constraint violation. General transformation of the constrained problem into an unconstrained one:

    min T(x) = f(x) + r_k P(x)

where
- f(x) is the objective function of the constrained problem
- r_k is a scalar called the penalty or controlling parameter
- P(x) is a function that imposes penalties for infeasibility (note that the weight of P(x) is controlled by r_k)
- T(x) is the (pseudo-)transformed objective

Two approaches exist in the choice of transformation algorithms:
1) Sequential penalty transformations
2) Exact penalty transformations
Sequential Penalty Transformations

Sequential penalty transformations are the oldest penalty methods, also known as Sequential Unconstrained Minimization Techniques (SUMT), based on the work of Fiacco and McCormick (1968).

Consider the following frequently used general-purpose penalty function:

    T(x) = f(x) + r_k { Σ_{i=1}^{m} (max[0, g_i(x)])² + Σ_{j=1}^{l} [h_j(x)]² }

Sequential transformations work as follows: choose P(x) and a sequence r_k such that as k goes to infinity, the solution x* is found. For example, for k = 1, set r_1 = 1 and solve the problem. For the second iteration, r_k is increased by a factor of 10 and the problem is re-solved starting from the previous solution. Note that an increasing value of r_k increases the effect of the penalties on T(x). The process is terminated when no improvement in T(x) is found and all constraints are satisfied.
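The SUMT loop described above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the toy problem min (x − 2)² s.t. g(x) = x − 1 ≤ 0 (solution x* = 1), the starting point, and the Newton inner solve are all my own choices.

```python
# Minimal SUMT sketch on the toy problem
#   min (x - 2)^2   s.t.   g(x) = x - 1 <= 0,      solution x* = 1.
# T(x) = (x - 2)^2 + r * max(0, x - 1)^2, minimized by Newton for each r.

def T_grad(x, r):
    return 2.0 * (x - 2.0) + 2.0 * r * max(0.0, x - 1.0)

def T_hess(x, r):
    return 2.0 + (2.0 * r if x > 1.0 else 0.0)

def sumt(r=1.0, factor=10.0, outer=8):
    x = 3.0                                  # infeasible start: exterior-point method
    for _ in range(outer):
        for _ in range(50):                  # inner solve: min T(x; r)
            step = T_grad(x, r) / T_hess(x, r)
            x -= step
            if abs(step) < 1e-12:
                break
        r *= factor                          # amplify the penalty and re-solve
    return x

print(sumt())   # approaches the constrained solution x* = 1 from outside
```

Each inner minimizer sits at x = 1 + 1/(1 + r), slightly infeasible; as r grows by the factor of 10 per outer iteration, the iterates are driven onto the constraint.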
Two Classes of Sequential Methods

Two major classes exist in sequential methods:
1) The first class uses a sequence of infeasible points; feasibility is obtained only at the optimum. These are referred to as penalty function or exterior-point penalty function methods.
2) The second class is characterized by the property of preserving feasibility at all times. These are referred to as barrier function methods or interior-point penalty function methods.

General barrier function transformation:

    T(x) = f(x) + r_k B(x)

where B(x) is a barrier function and r_k the penalty parameter, which is driven to zero as k approaches infinity. Typical barrier functions are the inverse or the logarithmic, that is (for constraints g_i(x) ≤ 0):

    B(x) = −Σ_{i=1}^{m} 1/g_i(x)   or   B(x) = −Σ_{i=1}^{m} ln[−g_i(x)]
What to Choose?

Some prefer barrier methods because even if they do not converge, you still have a feasible solution. Others prefer penalty function methods because:
- You are less likely to get stuck in a feasible pocket containing only a local minimum.
- Penalty methods are more robust, because in practice you often have an infeasible starting point.

However, penalty functions typically require more function evaluations. The choice becomes simple if you have equality constraints. (Why?)
Exact Penalty Methods

Theoretically, an infinite sequence of penalty problems must be solved in the sequential penalty function approach, because the effect of the penalty has to be amplified ever more strongly by the penalty parameter as the iterations proceed. Exact transformation methods avoid this long sequence by constructing penalty functions that are exact, in the sense that the solution of the penalty problem yields the exact solution to the original problem for a finite value of the penalty parameter. However, it can be shown that such exact functions are not differentiable. Exact methods are newer and less well established than sequential methods.
Closing Remarks

Typically, you will encounter sequential approaches. Various penalty functions P(x) exist in the literature, as do various approaches to selecting the penalty parameter sequence; the simplest is to keep it constant during all iterations. Always ensure that the penalty does not dominate the objective function during the initial iterations of an exterior-point method.
Numerical Optimization

Unit 9: Penalty Method and Interior Point Method
Unit 10: Filter Method and the Maratos Effect

Che-Rung Lee
Scribe: May 1, 2011

(UNIT 9,10) Numerical Optimization, May 1, 2011
Penalty method

The idea is to add penalty terms to the objective function, which turns a constrained optimization problem into an unconstrained one.

Example (quadratic penalty function, for equality constraints)

    min x₁ + x₂   subject to   x₁² + x₂² − 2 = 0      (solution x* = (−1, −1))

Define

    Q(x, μ) = x₁ + x₂ + (μ/2)(x₁² + x₂² − 2)²

For μ = 1,

    ∇Q(x, 1) = ( 1 + 2(x₁² + x₂² − 2)x₁ , 1 + 2(x₁² + x₂² − 2)x₂ ) = (0, 0)ᵀ,   x ≈ (−1.1, −1.1)

For μ = 10,

    ∇Q(x, 10) = ( 1 + 20(x₁² + x₂² − 2)x₁ , 1 + 20(x₁² + x₂² − 2)x₂ ) = (0, 0)ᵀ,   x ≈ (−1.01, −1.01)
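The example can be checked numerically. This is a sketch of my own: by symmetry the minimizer of Q satisfies x₁ = x₂ = t, so stationarity reduces to a scalar equation that Newton's method solves quickly.

```python
# Solve grad Q = 0 for the example min x1 + x2 s.t. x1^2 + x2^2 = 2.
# By symmetry x1 = x2 = t, and stationarity reduces to the scalar equation
#   1 + 2*mu*(2*t^2 - 2)*t = 0, solved here by Newton's method.

def penalty_solution(mu, t=-1.5):
    for _ in range(100):
        g = 1.0 + 2.0 * mu * (2.0 * t * t - 2.0) * t   # scalar gradient
        h = 2.0 * mu * (6.0 * t * t - 2.0)             # scalar second derivative
        step = g / h
        t -= step
        if abs(step) < 1e-14:
            break
    return t

for mu in (1.0, 10.0, 100.0):
    print(mu, penalty_solution(mu))   # drifts toward the true solution t = -1
```

As μ grows, the penalty minimizer approaches the constrained solution from outside the feasible circle, matching the values quoted on the slide.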
Size of μ

It seems that the larger μ is, the better the solution. But when μ is large, the Hessian of

    Q(x, μ) = f(x) + (μ/2)(c(x))²,
    ∇Q = ∇f + μ c ∇c,
    ∇²Q = ∇²f + μ ∇c ∇cᵀ + μ c ∇²c

is dominated by the term μ ∇c ∇cᵀ and becomes ill-conditioned.

μ cannot be too small either.

Example

    min_x −5x₁² + x₂²   s.t.   x₁ = 1.

    Q(x, μ) = −5x₁² + x₂² + (μ/2)(x₁ − 1)².

For μ < 10, the problem min Q(x, μ) is unbounded.
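The unboundedness claim is easy to verify by evaluating Q far from the origin. A quick check (the evaluation points are arbitrary choices of mine):

```python
def Q(x1, x2, mu):
    # Q(x, mu) = -5*x1^2 + x2^2 + (mu/2)*(x1 - 1)^2
    return -5.0 * x1**2 + x2**2 + 0.5 * mu * (x1 - 1.0)**2

# Along x2 = 0 the x1^2 coefficient is (mu/2 - 5): negative for mu < 10,
# so Q can be driven to -infinity; for mu > 10 it is bounded below.
print(Q(1e3, 0.0, 8.0))    # large negative: an unbounded descent direction
print(Q(1e3, 0.0, 12.0))   # large positive: the penalty term dominates
```

The threshold μ = 10 is exactly where the penalty curvature μ/2 cancels the negative curvature −5 of the objective along x₁.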
Quadratic penalty function

Pick a proper initial guess of μ and gradually increase it.

Algorithm: Quadratic penalty function
1 Given μ₀ > 0 and x₀
2 For k = 0, 1, 2, ...
  1 Solve min_x Q(x; μ_k) = f(x) + (μ_k/2) Σ_{i∈E} c_i²(x)
  2 If converged, stop
  3 Increase μ_{k+1} > μ_k and find a new x_{k+1}

Problem: the solution is not exact for any finite μ.
Augmented Lagrangian method

Use the Lagrangian function to cure the inexactness problem. Let

    L_A(x, ρ, μ) = f(x) − Σ_{i∈E} ρ_i c_i(x) + (μ/2) Σ_{i∈E} c_i²(x)

Then

    ∇L_A = ∇f − Σ_{i∈E} (ρ_i − μ c_i(x)) ∇c_i,

where the factor λ_i := ρ_i − μ c_i(x) estimates the Lagrange multiplier. At the optimal solution,

    c_i(x*) = (1/μ)(ρ_i − λ_i*).

If we can approximate ρ_i ≈ λ_i*, then μ_k need not be increased indefinitely. Multiplier update:

    λ_i^{k+1} = λ_i^k − μ_k c_i(x_k)

Algorithm: update ρ_i at each iteration.
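A sketch of the augmented Lagrangian loop on the earlier circle example, min x₁ + x₂ s.t. x₁² + x₂² − 2 = 0. The symmetric reduction x₁ = x₂ = t, the fixed μ = 10, and the starting point are my own choices for illustration; note that μ stays fixed while ρ is updated.

```python
# Augmented Lagrangian sketch for min x1 + x2  s.t.  c(x) = x1^2 + x2^2 - 2 = 0.
# By symmetry x1 = x2 = t; the inner problem min_t L_A(t; rho, mu) is solved by
# Newton, and rho is updated by rho <- rho - mu * c between inner solves.

def inner_solve(rho, mu, t):
    for _ in range(100):
        c = 2.0 * t * t - 2.0
        g = 2.0 - 4.0 * rho * t + 4.0 * mu * t * c        # dL_A/dt
        h = -4.0 * rho + 4.0 * mu * (6.0 * t * t - 2.0)   # d^2L_A/dt^2
        step = g / h
        t -= step
        if abs(step) < 1e-14:
            break
    return t

def augmented_lagrangian(mu=10.0, outer=10):
    rho, t = 0.0, -1.5
    for _ in range(outer):
        t = inner_solve(rho, mu, t)
        rho -= mu * (2.0 * t * t - 2.0)    # multiplier update
    return t, rho

t, rho = augmented_lagrangian()
print(t, rho)   # t -> -1 (so x* = (-1, -1)), rho -> lambda* = -0.5, mu fixed
```

Unlike the pure quadratic penalty, the constraint is satisfied essentially exactly here even though μ never grows: the multiplier estimate absorbs the inexactness.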
Inequality constraints

There are two approaches to handling inequality constraints:
1 Make the objective function nonsmooth (non-differentiable at some points).
2 Add slack variables to turn the inequality constraints into equality constraints:

    c_i(x) − s_i = 0,   s_i ≥ 0

But then we have bound constraints on the slack variables. We will focus on the second approach here.
Inequality constraints

Suppose the augmented Lagrangian method is used and all inequality constraints have been converted to bound constraints. For fixed μ and λ:

    min_x L_A(x, λ, μ) = f(x) − Σ_{i=1}^{m} λ_i c_i(x) + (μ/2) Σ_{i=1}^{m} c_i²(x)
    s.t.  l ≤ x ≤ u

The first-order necessary condition for x to be a solution of the above problem is

    x = P(x − ∇_x L_A(x, λ, μ), l, u)

where, for all i = 1, 2, ..., n,

    P(g, l, u)_i = l_i  if g_i ≤ l_i;   g_i  if l_i < g_i < u_i;   u_i  if g_i ≥ u_i.
Nonlinear gradient projection method

Sequential quadratic programming + trust region method to solve

    min_x f(x)   s.t.   l ≤ x ≤ u

Algorithm: Nonlinear gradient projection method
1 At each iteration, build a quadratic model

    q(x) = (1/2)(x − x_k)ᵀ B_k (x − x_k) + ∇f_kᵀ (x − x_k)

  where B_k is an SPD approximation of ∇²f(x_k).
2 For some Δ_k, use the gradient projection method to solve

    min_x q(x)   s.t.   max(l, x_k − Δ_k) ≤ x ≤ min(u, x_k + Δ_k)

3 Update Δ_k and repeat 1–3 until converged.
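A minimal gradient projection sketch for the box-constrained quadratic subproblem. The example matrix, box, and fixed step size are made up for illustration; a practical solver would use piecewise line searches along the projected path rather than a constant step.

```python
import numpy as np

def gradient_projection(B, g, l, u, x0, step, iters=500):
    """Minimize q(x) = 0.5 x^T B x + g^T x over the box l <= x <= u
    by projected gradient steps x <- P(x - step * grad q(x), l, u)."""
    x = x0.copy()
    for _ in range(iters):
        x = np.clip(x - step * (B @ x + g), l, u)   # np.clip is the projection P
    return x

B = np.array([[2.0, 0.0], [0.0, 2.0]])   # SPD model Hessian B_k
g = np.array([-2.0, -6.0])               # unconstrained minimizer would be (1, 3)
l = np.zeros(2)
u = np.full(2, 2.0)
x = gradient_projection(B, g, l, u, x0=np.zeros(2), step=0.4)
print(x)   # the box clips the second coordinate: solution (1, 2)
```

The projection is exactly the componentwise operator P from the previous slide, which for box constraints is a simple clip.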
Interior point method

Consider the problem

    min_{x,s} f(x)
    s.t. C_E(x) = 0
         C_I(x) − s = 0
         s ≥ 0

where s are slack variables. The interior point method starts at a point inside the feasible region and builds walls on the boundary of the feasible region: a barrier function goes to infinity as its argument approaches zero, e.g. −log x → ∞ as x → 0⁺. With barrier parameter μ:

    min_{x,s} f(x) − μ Σ_{i=1}^{m} log(s_i)
    s.t. C_E(x) = 0                                   (1)
         C_I(x) − s = 0
An example

Example (min −x + 1, s.t. x ≤ 1)

    min_x  −x + 1 − μ ln(1 − x)

    μ = 1:     x ≈ 0.00005
    μ = 0.1:   x ≈ 0.89999
    μ = 0.01:  x ≈ 0.989999
    μ = 10⁻⁵:  x ≈ 0.99993

(The figure plots the barrier objective against x; as μ decreases, the minimizer approaches the constraint boundary x = 1.)

The Lagrangian of (1) is

    L(x, s, y, z) = f(x) − μ Σ_{i=1}^{m} log(s_i) − yᵀ C_E(x) − zᵀ (C_I(x) − s)

1 Vector y is the Lagrange multiplier of the equality constraints.
2 Vector z is the Lagrange multiplier of the inequality constraints.
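The table of minimizers can be reproduced numerically. This sketch brackets the stationarity condition −1 + μ/(1 − x) = 0 with bisection (my own helper; the closed form is x(μ) = 1 − μ):

```python
def barrier_argmin(mu, iters=200):
    # phi(x) = -x + 1 - mu*ln(1-x) is convex on x < 1;
    # phi'(x) = -1 + mu/(1-x) changes sign exactly once, at x = 1 - mu.
    lo, hi = -9.0, 1.0 - 1e-15          # phi'(lo) < 0 < phi'(hi) for mu < 10
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if -1.0 + mu / (1.0 - mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for mu in (1.0, 0.1, 0.01, 1e-5):
    print(mu, barrier_argmin(mu))   # approaches the constraint x = 1 as mu -> 0
```

The sequence of minimizers x(μ) = 1 − μ traces the path that the interior point method follows as the barrier parameter is driven to zero.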
The KKT conditions

The KKT conditions for (1):

    ∇_x L = 0:   ∇f − A_Eᵀ y − A_Iᵀ z = 0
    ∇_s L = 0:   SZe − μe = 0                          (2)
    ∇_y L = 0:   C_E(x) = 0
    ∇_z L = 0:   C_I(x) − s = 0

Matrix S = diag(s) and matrix Z = diag(z). Matrix A_E is the Jacobian of C_E and matrix A_I is the Jacobian of C_I.
Newton's step

Let

    F = ( ∇_x L, ∇_s L, ∇_y L, ∇_z L ) = ( ∇f − A_Eᵀy − A_Iᵀz,  SZe − μe,  C_E(x),  C_I(x) − s )

The interior point method uses Newton's method to solve F = 0, with

    ∇F = [ ∇²_xx L    0    −A_Eᵀ(x)  −A_Iᵀ(x) ]
         [    0       Z       0         S     ]
         [ A_E(x)     0       0         0     ]
         [ A_I(x)    −I       0         0     ]

Newton's step (p_x, p_s, p_y, p_z) solves ∇F p = −F, and the iterates are updated by

    x_{k+1} = x_k + α_x p_x,   s_{k+1} = s_k + α_s p_s,
    y_{k+1} = y_k + α_y p_y,   z_{k+1} = z_k + α_z p_z
Algorithm: Interior point method (IPM)

1 Given initial x₀, s₀, y₀, z₀, and μ₀
2 For k = 0, 1, 2, ... until converged
  (a) Compute p_x, p_s, p_y, p_z and α_x, α_s, α_y, α_z
  (b) (x_{k+1}, s_{k+1}, y_{k+1}, z_{k+1}) = (x_k, s_k, y_k, z_k) + (α_x p_x, α_s p_s, α_y p_y, α_z p_z)
  (c) Adjust μ_{k+1} < μ_k
Some comments on the interior point method

1 The complementarity slackness condition says s_i z_i = 0 at the optimal solution. Hence the parameter μ (with SZe = μe) needs to decrease to zero as the current solution approaches the optimal one.
2 Why can't we set μ to zero, or very small, at the beginning? Because that would push x_k to the nearest constraint, and the entire process would move along constraint by constraint, which again becomes an exponential algorithm.
3 To keep x_k (or s and z) from getting too close to any constraint, IPM also limits the step sizes for s and z:

    α_s^max = max{α ∈ (0, 1] : s + α p_s ≥ (1 − τ) s}
    α_z^max = max{α ∈ (0, 1] : z + α p_z ≥ (1 − τ) z}
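The step-size limit (the "fraction to the boundary" rule) can be computed directly. A sketch with a hypothetical helper; τ = 0.995 is a typical value, not one stated on the slides:

```python
import numpy as np

def fraction_to_boundary(v, dv, tau=0.995):
    """Largest alpha in (0, 1] with v + alpha*dv >= (1 - tau)*v, for v > 0.
    Rearranging per component with dv_i < 0 gives alpha <= -tau * v_i / dv_i."""
    shrinking = dv < 0                      # only shrinking components bind
    if not shrinking.any():
        return 1.0
    return min(1.0, float(np.min(-tau * v[shrinking] / dv[shrinking])))

s  = np.array([1.0, 2.0])
ps = np.array([-2.0, 1.0])                  # a full step would make s[0] negative
alpha = fraction_to_boundary(s, ps)
print(alpha, s + alpha * ps)                # s stays strictly positive
```

With τ < 1 the iterate can approach a constraint but never reach it, which is what keeps the Newton system on the next iteration well defined.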
Interior point method for linear programming

We will use linear programming to illustrate the details of IPM.

The primal:   min_x cᵀx   s.t.  Ax = b, x ≥ 0.
The dual:     max_λ bᵀλ   s.t.  Aᵀλ + s = c, s ≥ 0.

KKT conditions:

    Aᵀλ + s = c
    Ax = b
    x_i s_i = 0,   i = 1, 2, ..., n
    x ≥ 0, s ≥ 0
Solve the problem

Let X = diag(x₁, x₂, ..., x_n), S = diag(s₁, s₂, ..., s_n), and

    F = ( Aᵀλ + s − c,  Ax − b,  XSe − μe )

The problem is to solve F = 0.
Newton's method

Using Newton's method, solve ∇F (p_x, p_λ, p_s) = −F, where

    ∇F = [ 0   Aᵀ  I ]
         [ A   0   0 ]
         [ S   0   X ]

and update

    x_{k+1} = x_k + α_x p_x,   λ_{k+1} = λ_k + α_λ p_λ,   s_{k+1} = s_k + α_s p_s

How to decide μ_k? The quantity μ_k = (1/n) x_kᵀ s_k is called the duality measure.
The central path

The central path is the set of points p(τ) = (x_τ, λ_τ, s_τ) defined by the equations

    Aᵀλ + s = c
    Ax = b
    x_i s_i = τ,   i = 1, 2, ..., n
    x, s > 0
Algorithm: The interior point method for solving linear programming

1 Given an interior point x₀ and the initial guess of slack variables s₀
2 For k = 0, 1, ...
  (a) Solve, for some σ_k ∈ [0, 1],

      [ 0    Aᵀ   I   ] [ Δx_k ]   [ c − Aᵀλ_k − s_k        ]
      [ A    0    0   ] [ Δλ_k ] = [ b − Ax_k               ]
      [ S_k  0    X_k ] [ Δs_k ]   [ −X_k S_k e + σ_k μ_k e ]

  (b) Compute (α_x, α_λ, α_s) such that

      (x_{k+1}, λ_{k+1}, s_{k+1}) = (x_k + α_x Δx_k, λ_k + α_λ Δλ_k, s_k + α_s Δs_k)

      lies in the neighborhood of the central path

      N(θ) = { (x, λ, s) : ‖XSe − μe‖ ≤ θμ }   for some θ ∈ (0, 1].
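The whole algorithm fits in a short script for a small LP. This is an illustrative sketch: the problem data, the fixed centering parameter σ = 0.5, and the dense solve of the full Newton system are my own choices (a practical code would eliminate blocks and exploit sparsity).

```python
import numpy as np

def step_len(v, dv, tau):
    """Fraction-to-the-boundary step: keeps v + alpha*dv strictly positive."""
    neg = dv < 0
    if not neg.any():
        return 1.0
    return min(1.0, float(np.min(-tau * v[neg] / dv[neg])))

def ipm_lp(A, b, c, sigma=0.5, tau=0.995, iters=100):
    """Primal-dual path-following IPM for min c^T x s.t. Ax = b, x >= 0."""
    m, n = A.shape
    x, lam, s = np.ones(n), np.zeros(m), np.ones(n)
    for _ in range(iters):
        mu = x @ s / n                        # duality measure
        if mu < 1e-10:
            break
        J = np.block([                        # Newton system on the KKT residuals
            [np.zeros((n, n)), A.T,              np.eye(n)],
            [A,                np.zeros((m, m)), np.zeros((m, n))],
            [np.diag(s),       np.zeros((n, m)), np.diag(x)],
        ])
        rhs = np.concatenate([c - A.T @ lam - s,      # dual residual
                              b - A @ x,              # primal residual
                              -x * s + sigma * mu])   # centering term
        p = np.linalg.solve(J, rhs)
        dx, dlam, ds = p[:n], p[n:n + m], p[n + m:]
        a_p, a_d = step_len(x, dx, tau), step_len(s, ds, tau)
        x, lam, s = x + a_p * dx, lam + a_d * dlam, s + a_d * ds
    return x, lam, s

# max x1 + 2*x2  s.t.  x1 + x2 <= 4, x1 <= 2   (slacks x3, x4 appended)
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0, 1.0]])
b = np.array([4.0, 2.0])
c = np.array([-1.0, -2.0, 0.0, 0.0])
x, lam, s = ipm_lp(A, b, c)
print(x[:2], c @ x)    # optimum x = (0, 4), objective -8
```

Each iteration solves one Newton system aimed at the central-path point with complementarity σμ, then damps the step so x and s stay strictly positive.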
Filter method

There are two goals in constrained optimization:
1 Minimize the objective function.
2 Satisfy the constraints.

Suppose the problem is

    min_x f(x)
    s.t. c_i(x) = 0 for i ∈ E
         c_i(x) ≥ 0 for i ∈ I

Define h(x), a penalty function of the constraints:

    h(x) = Σ_{i∈E} |c_i(x)| + Σ_{i∈I} [c_i(x)]⁻,   where [z]⁻ = max{0, −z}.

The goals become: min f(x) and min h(x).
The filter method

A pair (f_k, h_k) dominates (f_l, h_l) if f_k < f_l and h_k < h_l. A filter is a list of pairs (f_k, h_k) such that no pair dominates any other. The filter method only accepts steps that are not dominated by other pairs.

Algorithm: The filter method
1 Given initial x₀ and an initial trust region Δ₀.
2 For k = 0, 1, 2, ... until converged
  1 Compute a trial x⁺ by solving a local quadratic programming model
  2 If (f⁺, h⁺) is accepted to the filter:
      Set x_{k+1} = x⁺, add (f⁺, h⁺) to the filter, and remove pairs dominated by it.
    Else:
      Set x_{k+1} = x_k and decrease Δ_k.
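The dominance test and the acceptance rule can be sketched directly. A toy container of my own; practical filter codes additionally require a small margin (envelope) around each stored pair rather than strict inequality:

```python
def dominates(a, b):
    """Pair a = (f_a, h_a) dominates b = (f_b, h_b): better in both measures."""
    return a[0] < b[0] and a[1] < b[1]

class Filter:
    def __init__(self):
        self.pairs = []

    def accept(self, pair):
        if any(dominates(p, pair) for p in self.pairs):
            return False                       # trial step rejected
        # keep the filter free of entries dominated by the new pair
        self.pairs = [p for p in self.pairs if not dominates(pair, p)]
        self.pairs.append(pair)
        return True

flt = Filter()
print(flt.accept((1.0, 2.0)))   # True: filter is empty
print(flt.accept((2.0, 3.0)))   # False: dominated by (1.0, 2.0)
print(flt.accept((0.5, 2.5)))   # True: trades worse h for better f
```

The third call shows the key idea: a step may be accepted for progress in either measure, which is less conservative than a single merit function with a fixed penalty weight.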
The Maratos effect

Example (the Maratos effect shows that the filter method can reject a good step)

    min_{x₁,x₂} f(x₁, x₂) = 2(x₁² + x₂² − 1) − x₁
    s.t. x₁² + x₂² − 1 = 0

The optimal solution is x* = (1, 0). Suppose

    x_k = ( cos θ, sin θ )ᵀ,   p_k = ( sin²θ, −sin θ cos θ )ᵀ,

    x_{k+1} = x_k + p_k = ( cos θ + sin²θ, sin θ (1 − cos θ) )ᵀ
Reject a good step

    ‖x_k − x*‖ = ‖( cos θ − 1, sin θ )‖ = √(cos²θ − 2cos θ + 1 + sin²θ) = √(2(1 − cos θ))

    ‖x_{k+1} − x*‖ = ‖( cos θ(1 − cos θ), sin θ(1 − cos θ) )‖
                   = √(cos²θ(1 − cos θ)² + sin²θ(1 − cos θ)²) = 1 − cos θ

Therefore

    ‖x_{k+1} − x*‖ / ‖x_k − x*‖² = 1/2,

so this step gives quadratic convergence. However, the filter method will reject this step, because

    f(x_k) = −cos θ,   c(x_k) = 0,
    f(x_{k+1}) = sin²θ − cos θ > f(x_k),
    c(x_{k+1}) = sin²θ > 0 = c(x_k).
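These identities are easy to verify numerically. A quick check (θ = 0.1 is an arbitrary choice):

```python
import math

theta = 0.1
xk  = (math.cos(theta), math.sin(theta))
pk  = (math.sin(theta)**2, -math.sin(theta) * math.cos(theta))
xk1 = (xk[0] + pk[0], xk[1] + pk[1])

f = lambda x: 2.0 * (x[0]**2 + x[1]**2 - 1.0) - x[0]
c = lambda x: x[0]**2 + x[1]**2 - 1.0
dist = lambda x: math.hypot(x[0] - 1.0, x[1])     # distance to x* = (1, 0)

print(dist(xk1) / dist(xk)**2)        # 0.5: the step is quadratically convergent
print(f(xk1) > f(xk), c(xk1) > c(xk)) # True True: yet both f and h get worse
```

Both the objective and the constraint violation increase along this excellent step, so the pair (f, h) is dominated and a filter (or any merit function) rejects it; this is exactly the pathology the second-order correction addresses.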
The second order correction

The second-order correction can help to solve this problem. Instead of the linearized constraint ∇c(x_k)ᵀ p_k + c(x_k) = 0, use the quadratic approximation

    c(x_k) + ∇c(x_k)ᵀ d_k + (1/2) d_kᵀ ∇²_xx c(x_k) d_k = 0.      (3)

Suppose d_k − p_k is small. Use a Taylor expansion to approximate the quadratic term:

    c(x_k + p_k) ≈ c(x_k) + ∇c(x_k)ᵀ p_k + (1/2) p_kᵀ ∇²_xx c(x_k) p_k,

so

    (1/2) d_kᵀ ∇²_xx c(x_k) d_k ≈ (1/2) p_kᵀ ∇²_xx c(x_k) p_k ≈ c(x_k + p_k) − c(x_k) − ∇c(x_k)ᵀ p_k.

Equation (3) can then be rewritten as

    ∇c(x_k)ᵀ d_k + c(x_k + p_k) − ∇c(x_k)ᵀ p_k = 0.

So we use the corrected linearized constraint

    ∇c(x_k)ᵀ p + c(x_k + p_k) − ∇c(x_k)ᵀ p_k = 0

(the original linearized constraint is ∇c(x_k)ᵀ p + c(x_k) = 0).