Continuous Optimisation, Chapter 6: Solution Methods for Constrained Optimisation
Peter J.C. Dickinson
DMMP, University of Twente
p.j.c.dickinson@utwente.nl
http://dickinson.website/teaching/2017co.html
version: 06/11/17, Monday 6th November 2017
Problem

$$\min_x\ f(x) \quad \text{s.t.}\quad g_j(x) \le 0 \ \text{for all } j = 1,\dots,m, \quad x \in \mathbb{R}^n. \tag{C}$$

Here $f, g_1, \dots, g_m \in C^1$ with $f, g_1, \dots, g_m : \mathbb{R}^n \to \mathbb{R}$, and

$$\mathcal{F} := \{x \in \mathbb{R}^n : g_j(x) \le 0 \text{ for all } j = 1,\dots,m\}.$$

We will not make any convexity assumptions.
Table of Contents
1 Introduction
2 Feasible descent method
   - Basic idea
   - Naive choice of direction
   - Alternative choice of direction
3 Unconstrained optimisation
4 Penalty method
5 Barrier method
Basic idea

1. Start at a point $x_0 \in \mathcal{F}$ ($k = 0$).
2. If $x_k$ is a John point then STOP.
3. If it is not a John point then there is a strictly feasible descent direction $d_k$.
4. Line search: find $\lambda_k = \arg\min_\lambda \{f(x_k + \lambda d_k) : \lambda \in \mathbb{R},\ x_k + \lambda d_k \in \mathcal{F}\}$ (or just some $\lambda_k$ with $f(x_k + \lambda_k d_k) < f(x_k)$ and $x_k + \lambda_k d_k \in \mathcal{F}$).
5. Let $x_{k+1} = x_k + \lambda_k d_k \in \mathcal{F}$ and $k \leftarrow k + 1$.
6. If the stopping criteria are satisfied then STOP, else go to step 2.
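This loop can be sketched in a few lines of Python. Here `direction` and `line_search` are assumed user-supplied helpers (a possible choice of `direction` is the LP two slides on), so this is a schematic outline rather than a definitive implementation.

```python
import numpy as np

def feasible_descent(x0, direction, line_search, tol=1e-8, max_iter=100):
    """Generic feasible descent loop. `direction(x)` returns (d, z) with
    z < 0 exactly when d is a strictly feasible descent direction at x;
    `line_search(x, d)` returns a step lambda keeping x + lambda*d in F."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d, z = direction(x)
        if z >= -tol:            # no strictly feasible descent direction:
            return x             # x is (approximately) a John point
        x = x + line_search(x, d) * d
    return x
```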
Choosing $d_k$: Naive method

If there is a strictly feasible descent direction, then the following problem will provide one:

$$\min_{d,z}\ z \quad \text{s.t.}\quad \nabla f(x_k)^T d \le z, \quad \nabla g_i(x_k)^T d \le z \ \text{for all } i \text{ s.t. } g_i(x_k) = 0, \quad -1 \le d_j \le 1 \ \text{for all } j = 1,\dots,n.$$

Remark 6.1
(+) This is a relatively simple method for choosing $d_k$.
(−) It ignores constraints with $g_i(x_k) < 0$ but $g_i(x_k) \approx 0$. This can lead to bad convergence, and possibly even convergence to points which are not John points.
Choosing $d_k$: Topkis and Veinott method

Ex. 6.1
Consider the following optimisation problem:

$$\min_{d,z}\ z \quad \text{s.t.}\quad \nabla f(x_k)^T d \le z, \quad \nabla g_i(x_k)^T d \le z - g_i(x_k) \ \text{for all } i = 1,\dots,m, \quad -1 \le d_j \le 1 \ \text{for all } j = 1,\dots,n.$$

1. Prove that if $(d^*, z^*)$ is an optimal solution to the problem above with $z^* < 0$, then $d^*$ is a strictly feasible descent direction in (C).
2. Prove that if there is a strictly feasible descent direction in (C), then the optimal value of the problem above is strictly negative.

(+) Relatively simple method for choosing $d_k$.
(+) All constraints are taken into account.
(+) If there is an $\bar x \in \mathcal{F}$ such that a subsequence of the solutions tends towards $\bar x$, then $\bar x$ is a John point. [FKS, Th. 12.5]
(−) This gives a first order method (only gradients are taken into account), and such methods generally have slow convergence.
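The subproblem is a linear program in the variables $(d, z)$. Below is a sketch using scipy.optimize.linprog, where `grad_f`, `G` and `g` are assumed to hold $\nabla f(x_k)$, the rows $\nabla g_i(x_k)^T$ and the values $g_i(x_k)$. The naive subproblem of the previous slide is the same LP with the right-hand sides $-g_i(x_k)$ replaced by 0 and only the active rows of `G` kept.

```python
import numpy as np
from scipy.optimize import linprog

def tv_direction(grad_f, G, g):
    """Topkis--Veinott direction LP: min z s.t. grad_f^T d <= z,
    grad g_i^T d <= z - g_i, -1 <= d_j <= 1. Returns (d, z)."""
    m, n = G.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                   # objective: minimise z
    A_ub = np.vstack([np.append(grad_f, -1.0),    # grad_f^T d - z <= 0
                      np.hstack([G, -np.ones((m, 1))])])
    b_ub = np.concatenate([[0.0], -g])            # grad g_i^T d - z <= -g_i
    bounds = [(-1.0, 1.0)] * n + [(None, None)]   # d in a box, z free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n], res.x[-1]
```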
Table of Contents
1 Introduction
2 Feasible descent method
3 Unconstrained optimisation
   - Newton's method
   - Interpretations
4 Penalty method
5 Barrier method
Newton's method

To minimise $f : C \to \mathbb{R}$, $f \in C^2$, we do the following:
1. Start at a point $x_0 \in C$ ($k = 0$).
2. If $\nabla f(x_k) = 0$ then STOP.
3. Assuming $\nabla^2 f(x_k) \succ O$, let $h_k = -(\nabla^2 f(x_k))^{-1} \nabla f(x_k)$.
4. Let $x_{k+1} = x_k + h_k$ and $k \leftarrow k + 1$.
5. If the stopping criteria are satisfied then STOP, else go to step 2.

Remark 6.2
We could penalise moving too far away from $x_k$ by exchanging $f(x)$ for $f_{k,\mu}(x) = f(x) + \mu \|x - x_k\|_2^2$, with parameter $\mu > 0$. Then
$$\nabla f_{k,\mu}(x) = \nabla f(x) + 2\mu(x - x_k), \qquad \nabla f_{k,\mu}(x_k) = \nabla f(x_k),$$
$$\nabla^2 f_{k,\mu}(x) = \nabla^2 f(x) + 2\mu I, \qquad \nabla^2 f_{k,\mu}(x_k) = \nabla^2 f(x_k) + 2\mu I.$$
For $\mu$ high enough we then have $\nabla^2 f_{k,\mu}(x) \succ O$.
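A minimal sketch of this iteration, including the regularisation of Remark 6.2 (the callables `grad` and `hess` and the default `mu=0.0` are assumptions of the sketch, not part of the slides):

```python
import numpy as np

def newton(x0, grad, hess, mu=0.0, tol=1e-10, max_iter=50):
    """Newton's method on f, or on f_{k,mu} if mu > 0 (Remark 6.2).
    Note grad f_{k,mu}(x_k) = grad f(x_k), so only the Hessian changes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        gval = grad(x)
        if np.linalg.norm(gval) <= tol:
            break
        H = hess(x) + 2.0 * mu * np.eye(x.size)  # mu shifts eigenvalues up
        x = x - np.linalg.solve(H, gval)         # h_k = -H^{-1} grad f(x_k)
    return x
```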
Interpretations

Interpretation 1
We want to find $h$ in order to minimise $f(x_k + h)$. We have
$$f(x_k + h) \approx f(x_k) + \nabla f(x_k)^T h + \tfrac{1}{2} h^T (\nabla^2 f(x_k)) h.$$
Assuming $\nabla^2 f(x_k) \succ O$ and considering the RHS of the above as a function of $h$, it is minimised at $h = -(\nabla^2 f(x_k))^{-1} \nabla f(x_k)$.

Interpretation 2
We want to find $h$ such that $\nabla f(x_k + h) = 0$. We have
$$\nabla f(x_k + h) \approx \nabla f(x_k) + \nabla^2 f(x_k) h.$$
Assuming $\nabla^2 f(x_k)$ is nonsingular, the RHS of the above is equal to 0 if and only if $h = -(\nabla^2 f(x_k))^{-1} \nabla f(x_k)$.
Table of Contents
1 Introduction
2 Feasible descent method
3 Unconstrained optimisation
4 Penalty method
   - Basic idea
   - Basic results
   - (Dis)advantages
   - Choices for p
   - Example
   - Implementation
5 Barrier method
Basic idea

Definition 6.3
$p : \mathbb{R}^n \to \mathbb{R}$ is a penalty function with respect to $\mathcal{F}$ if
- $p \in C^0$,
- $p(x) = 0$ for all $x \in \mathcal{F}$,
- $p(x) > 0$ for all $x \in \mathbb{R}^n \setminus \mathcal{F}$.

Penalty method
In the penalty method we solve the following unconstrained optimisation problem (for a suitable parameter $r > 0$ and penalty function $p$):
$$\min_x \{f(x) + r\, p(x)\}.$$
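A sketch of one penalty subproblem, using the quadratic penalty $p(x) = \sum_j (g_j^+(x))^2$ (introduced a few slides on) and scipy.optimize.minimize; `f` and the list `gs` of constraint functions are assumed callables:

```python
import numpy as np
from scipy.optimize import minimize

def penalty_solve(f, gs, x0, r):
    """Minimise f(x) + r * sum_j max(0, g_j(x))^2 from the start point x0."""
    def p(x):
        return sum(max(0.0, g(x)) ** 2 for g in gs)
    return minimize(lambda x: f(x) + r * p(x), x0).x

# e.g. min{x : x >= 1}, written as f(x) = x, g(x) = 1 - x <= 0:
x_r = penalty_solve(lambda x: x[0], [lambda x: 1.0 - x[0]], np.array([0.0]), r=10.0)
# x_r is approximately 0.95 = 1 - 1/(2r): slightly infeasible, as expected
```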
Basic results

Lemma 6.4
For $r > 0$ we have
$$\min_x \{f(x) + r\,p(x)\} \le \min_x \{f(x) + r\,p(x) : x \in \mathcal{F}\} = \mathrm{val}(C).$$
If $\mathcal{F} \cap \arg\min_x \{f(x) + r\,p(x)\} \ne \emptyset$ then we have equality above.

Theorem 6.5
Suppose we have $\{r_k : k \in \mathbb{N}\} \subseteq \mathbb{R}_{++}$ with $\lim_{k\to\infty} r_k = \infty$ and $x_k \in \arg\min_x \{f(x) + r_k\,p(x)\}$ for all $k \in \mathbb{N}$, such that $\bar x = \lim_{k\to\infty} x_k$ for some $\bar x \in \mathbb{R}^n$. Then $\bar x \in \mathcal{F}$ and $\bar x$ is a global minimiser of (C).
(Dis)advantages
(+) This is an unconstrained problem, and thus we can use our methods from unconstrained optimisation.
(+) The optimal value for a given $r > 0$ gives a lower bound on the optimal value of (C).
(+) If for some $r > 0$ we have an optimal solution to the penalty problem in $\mathcal{F}$, then this is also an optimal solution to the original problem. (Under some conditions we can guarantee this happens for $r$ large enough.)
(+) If $\bar x$ is a limiting point of a subsequence of optimal solutions $x_r$ as $r \to \infty$, then $\bar x \in \mathcal{F}$ and $\bar x$ is a global minimiser of (C).
(−) In general we will get optimal solutions $x_r \notin \mathcal{F}$.
Choice of p

Letting $g_j^+(x) = \max\{0, g_j(x)\}$, two common choices are:
$$p(x) = \sum_{j=1}^m g_j^+(x), \qquad \text{and} \qquad p(x) = \sum_{j=1}^m \big(g_j^+(x)\big)^2.$$

Ex. 6.2
Show that:
1. if $g$ is convex then $g^+$ is also convex;
2. if $g$ is convex then $(g^+(x))^2$ is also convex.

If $g \in C^1$ then $(g^+(x))^2$ also has a continuous derivative. In general $g^+ \notin C^1$.

If LICQ is satisfied at a local minimiser $\bar x \in \mathcal{F}$ of (C), and $\bar y \in \mathbb{R}^m_+$ are the KKT multipliers, then for $p(x) = \sum_{j=1}^m g_j^+(x)$ and $r > \max\{\bar y_j : j \in J_{\bar x}\}$, we have that $\bar x$ is a local minimiser of the penalty problem. [FKS, Th. 12.10]
Example
Example: https://ggbm.at/szsqwcpu

Ex. 6.3
Consider the problems $\min_x \{x : x \ge 1\}$ and $\min_x \{x^3 : x \ge 1\}$. For each of these problems:
1. What is the global minimiser, denoted $x^*$, and the optimal value of this problem?
2. For $p(x) = \sum_{j=1}^m g_j^+(x)$ and $p(x) = \sum_{j=1}^m (g_j^+(x))^2$:
   1. For $r > 0$, is the derivative of $f_r(x) := f(x) + r\,p(x)$ with respect to $x$ continuous or not?
   2. Find all the stationary points of $f_r(x)$, as a function of $r$.
   3. Find the optimal value and solution of $\min_x \{f_r(x) : x \in \mathbb{R}\}$ as a function of $r > 0$.
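As a check on the flavour of answer to expect (my computation for one sub-case: the first problem with the quadratic penalty, writing $g(x) = 1 - x$):
$$f_r(x) = x + r\big(\max\{0, 1 - x\}\big)^2, \qquad f_r'(x) = \begin{cases} 1 - 2r(1 - x), & x < 1,\\ 1, & x \ge 1,\end{cases}$$
so the unique stationary point is $x_r = 1 - \tfrac{1}{2r}$, with $f_r(x_r) = 1 - \tfrac{1}{4r} \to 1 = \mathrm{val}(C)$ as $r \to \infty$.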
Implementation
One implementation would be to solve the penalty problem once for $r$ very large. Alternatively, we could note that we are only interested in the limit as $r \to \infty$, and not the solutions to the penalty problem for any fixed $r > 0$. We could thus use something like Newton's method to attempt to find a solution to the penalty problem, and in each iteration increase $r$.
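One possible shape for this continuation scheme (a sketch: `penalty_solve` is the helper sketched earlier, and the schedule $r_{k+1} = 10\,r_k$ is an arbitrary choice, not prescribed by the slides):

```python
import numpy as np

def penalty_path(f, gs, x0, r0=1.0, factor=10.0, rounds=6):
    """Solve a sequence of penalty problems, warm-starting each one
    from the previous solution while increasing r."""
    x, r = np.asarray(x0, dtype=float), r0
    for _ in range(rounds):
        x = penalty_solve(f, gs, x, r)  # warm start from previous iterate
        r *= factor                     # tighten the penalty
    return x
```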
Table of Contents
1 Introduction
2 Feasible descent method
3 Unconstrained optimisation
4 Penalty method
5 Barrier method
   - Basic idea
   - Basic results
   - (Dis)advantages
   - Frisch's barrier function
   - Implementation
Basic idea

We will let $\mathcal{F}^\circ = \{x \in \mathbb{R}^n : g_i(x) < 0 \text{ for all } i\}$ and assume $\mathcal{F} = \mathrm{cl}\,\mathcal{F}^\circ$.

Lemma 6.6
$$\inf_x \{f(x) : x \in \mathcal{F}\} = \inf_x \{f(x) : x \in \mathcal{F}^\circ\}.$$

Definition 6.7
$b : \mathcal{F}^\circ \to \mathbb{R}$, $b \in C^0$, is a barrier function for (C) if for all $\bar x \in \mathrm{bd}\,\mathcal{F}$ we have $\lim_{x \to \bar x} b(x) = \infty$.

Barrier method
In the barrier method we solve the following unconstrained optimisation problem (for a suitable parameter $\rho > 0$ and a suitable barrier function $b$):
$$\min_x \{f(x) + \rho\, b(x) : x \in \mathcal{F}^\circ\}.$$
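A sketch of one barrier subproblem, using Frisch's barrier $b(x) = -\sum_i \ln(-g_i(x))$ (introduced below); `f` and `gs` are assumed callables, `x0` must lie in $\mathcal{F}^\circ$, and the derivative-free Nelder-Mead method is an arbitrary choice that tolerates the infinite values used here to encode $\mathcal{F}^\circ$:

```python
import numpy as np
from scipy.optimize import minimize

def barrier_solve(f, gs, x0, rho):
    """Minimise f(x) + rho * b(x) over the strictly feasible set."""
    def obj(x):
        vals = np.array([g(x) for g in gs])
        if np.any(vals >= 0.0):
            return np.inf                        # outside F°: reject
        return f(x) - rho * np.log(-vals).sum()  # Frisch's barrier term
    return minimize(obj, x0, method="Nelder-Mead").x
```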
Basic results

Lemma 6.8
We have $\mathcal{F}^\circ \subseteq \mathcal{F}$, and thus for all $\hat x \in \mathcal{F}^\circ$ we get an upper bound of $f(\hat x)$ on the optimal value of (C).

Theorem 6.9
Suppose we have $\{\rho_k : k \in \mathbb{N}\} \subseteq \mathbb{R}_{++}$ with $\lim_{k\to\infty} \rho_k = 0$ and $x_k \in \arg\min_x \{f(x) + \rho_k\, b(x)\}$ for all $k \in \mathbb{N}$, such that $\bar x = \lim_{k\to\infty} x_k$ for some $\bar x \in \mathbb{R}^n$. Then $\bar x \in \mathcal{F}$ and $\bar x$ is a global minimiser of (C).
(Dis)advantages
(+) This is an unconstrained problem, and thus we can use our methods from unconstrained optimisation.
(+) $\mathcal{F}^\circ \subseteq \mathcal{F}$, and thus all feasible points for this problem are feasible for (C).
(+) If $\bar x$ is a limiting point of a subsequence of optimal solutions $x_\rho$ as $\rho \to 0^+$, then $\bar x \in \mathcal{F}$ and $\bar x$ is a global minimiser of (C).
Frisch's barrier function

Frisch's barrier function: $b(x) = -\sum_{i=1}^m \ln(-g_i(x))$.

Ex. 6.4
Consider $g \in C^2$ and $b : \{x \in \mathbb{R}^n : g(x) < 0\} \to \mathbb{R}$, $b(x) = -\ln(-g(x))$. Find $\nabla^2 b(x)$ and, using this, show that if $g$ is a convex function then so is $b$.
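For reference, the derivatives involved (my computation, as a check for Ex. 6.4): for $g(x) < 0$,
$$\nabla b(x) = -\frac{1}{g(x)} \nabla g(x), \qquad \nabla^2 b(x) = -\frac{1}{g(x)} \nabla^2 g(x) + \frac{1}{g(x)^2} \nabla g(x) \nabla g(x)^T.$$
Since $-1/g(x) > 0$, both terms are positive semidefinite when $g$ is convex, hence so is $b$.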
Parameterised KKT conditions

Theorem 6.10
For Frisch's barrier function we have
$$\nabla\big(f(x) + \rho\, b(x)\big) = \nabla f(x) + \sum_{i=1}^m \frac{\rho}{-g_i(x)} \nabla g_i(x).$$
We have that $x$ is a stationary point of the barrier function if and only if its gradient is zero, or equivalently there exists $\lambda \in \mathbb{R}^m$ such that:
$$x \in \mathbb{R}^n, \quad \lambda \in \mathbb{R}^m_+, \quad 0 = \nabla f(x) + \sum_{i=1}^m \lambda_i \nabla g_i(x), \quad g_i(x) < 0, \quad -\lambda_i g_i(x) = \rho \ \text{for all } i = 1,\dots,m.$$
This system is known as the parameterised KKT conditions.
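A quick illustration (my own computation, for $\min_x \{x : x \ge 1\}$, i.e. $f(x) = x$ and $g(x) = 1 - x$): stationarity gives $0 = 1 - \lambda$, so $\lambda = 1$, and $-\lambda g(x) = \rho$ gives $x - 1 = \rho$. Hence $(x_\rho, \lambda_\rho) = (1 + \rho,\, 1) \to (1, 1)$ as $\rho \to 0^+$, recovering a KKT point of (C) together with its multiplier.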
Parameterised KKT conditions continued

Theorem 6.11
Suppose we have $\{\rho_k : k \in \mathbb{N}\} \subseteq \mathbb{R}_{++}$ with $\lim_{k\to\infty} \rho_k = 0$, and $(x_k, \lambda_k)$ are solutions to the parameterised KKT conditions (with $\rho = \rho_k$). Then $x_k \in \mathcal{F}^\circ$ and $\lambda_k \in \mathbb{R}^m_+$, implying that $\psi(\lambda_k) \le \mathrm{val}(C) \le f(x_k)$ for all $k \in \mathbb{N}$. If $(\bar x, \bar\lambda) = \lim_{k\to\infty} (x_k, \lambda_k)$ for some $(\bar x, \bar\lambda) \in \mathbb{R}^n \times \mathbb{R}^m$, then $\bar x$ is a KKT point for (C) with multipliers $\bar\lambda$.

Recall that if (C) is convex, this implies that $(\bar x, \bar\lambda)$ is a saddle point of the Lagrangian function, and thus $\bar x$ is an optimal solution to the primal problem, whilst $\bar\lambda$ is an optimal solution to the dual problem.
Example
Example: https://ggbm.at/szsqwcpu

Ex. 6.5
Consider the problems $\min_x \{x : x \ge 1\}$ and $\min_x \{x : (x - 1)\exp(x^2) \le 0\}$. For each of these problems:
1. What is the global minimiser, denoted $x^*$, and the optimal value of this problem?
2. For Frisch's barrier function, determine the optimal value of the barrier problem as a function of $\rho$.
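As a check for the first problem (my computation): $f_\rho(x) = x - \rho \ln(x - 1)$ on $x > 1$, so $f_\rho'(x) = 1 - \rho/(x - 1) = 0$ at $x_\rho = 1 + \rho$, giving optimal value $f_\rho(x_\rho) = 1 + \rho - \rho \ln \rho \to 1 = \mathrm{val}(C)$ as $\rho \to 0^+$.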
Implementation
One implementation would be to solve the barrier problem once for $\rho > 0$ very small. Alternatively, we could note that we are only interested in the limit as $\rho \to 0$, and not the solutions to the barrier problem for any fixed $\rho > 0$. We could thus use something like Newton's method to attempt to find a solution to the barrier problem, and in each iteration decrease $\rho$.
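The mirror image of the penalty continuation sketched earlier (`barrier_solve` is the helper from the barrier slide; the shrink factor 0.1 is an arbitrary choice):

```python
import numpy as np

def barrier_path(f, gs, x0, rho0=1.0, factor=0.1, rounds=6):
    """Solve a sequence of barrier problems, warm-starting each one
    from the previous (strictly feasible) iterate while shrinking rho."""
    x, rho = np.asarray(x0, dtype=float), rho0
    for _ in range(rounds):
        x = barrier_solve(f, gs, x, rho)  # iterate stays in F°
        rho *= factor                     # drive rho -> 0
    return x
```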