Constrained optimization

In general, a constrained optimization problem is formulated as follows:

$$\min J(w), \quad \text{subject to}\quad H_i(w) = 0, \quad i = 1, \dots, k,$$

where $J$ is the cost function and the $H_i$ are the constraints.

Lagrange multipliers: This is a widely used method for constrained optimization. In order to use it, one needs to form the Lagrange function

$$L(w, \lambda_1, \lambda_2, \dots, \lambda_k) = J(w) + \sum_{i=1}^{k} \lambda_i H_i(w).$$

In the above, the $\lambda_i$ are called Lagrange multipliers. Observe that a minimum of $L$ at which the gradient w.r.t. both $w$ and the $\lambda$'s vanishes solves the original constrained optimization problem (the gradient of $L$ w.r.t. $\lambda_i$ recovers the $i$th constraint function $H_i$). The gradient of $L$ w.r.t. $w$ gives
$$\frac{\partial J(w)}{\partial w} + \sum_{i=1}^{k} \lambda_i \frac{\partial H_i(w)}{\partial w} = 0.$$

The resulting set of equations can be solved using an iterative method. In some simple cases there may even be a closed-form solution.

Example: Let $w = [w_1, w_2]^T$ and $J(w) = 3w_1^2 + 7w_2^2 + 2w_1 w_2$. Use the Lagrange method to find the minimum of $J(w)$ subject to $h(w) = w_1 + 2w_2 - 2 = 0$.

$$L(w, \lambda) = 3w_1^2 + 7w_2^2 + 2w_1 w_2 + \lambda (w_1 + 2w_2 - 2)$$

$$\frac{\partial L}{\partial w_1} = 6w_1 + 2w_2 + \lambda = 0$$

$$\frac{\partial L}{\partial w_2} = 14w_2 + 2w_1 + 2\lambda = 0$$

$$\frac{\partial L}{\partial \lambda} = h(w) = w_1 + 2w_2 - 2 = 0$$

The solution is
$$\begin{bmatrix} w_1 \\ w_2 \\ \lambda \end{bmatrix} = \begin{bmatrix} 6 & 2 & 1 \\ 2 & 14 & 2 \\ 1 & 2 & 0 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 2/3 \\ 2/3 \\ -16/3 \end{bmatrix}$$

Projection method: If the constraints are simple (e.g. normalization of the parameter vector), so that they define a simple set of admissible parameter values, the projection method can be used:
- use a method of unconstrained optimization;
- after each step of the optimization method, project the intermediate solution orthogonally onto the constraint set.
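The closed-form solution above can be checked numerically. A minimal sketch, assuming NumPy is available (the variable names are illustrative):

    import numpy as np

    # Stationarity and constraint equations written as A [w1, w2, lambda]^T = b
    A = np.array([[6.0,  2.0, 1.0],
                  [2.0, 14.0, 2.0],
                  [1.0,  2.0, 0.0]])
    b = np.array([0.0, 0.0, 2.0])

    w1, w2, lam = np.linalg.solve(A, b)
    print(w1, w2, lam)  # 0.6667, 0.6667, -5.3333, i.e. 2/3, 2/3, -16/3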
Example: Consider the last example, in which we were trying to find the vector $w$ maximizing the 4th-order moment of $w^T x$. We stated that the obtained algorithm was not very good, as the norm of $w$ would grow without bound. Consider the constrained version: maximize $E[(w^T x)^4]$ subject to the constraint $\|w\|^2 = 1$. Formulate the Lagrangian

$$L(w, \lambda) = E[(w^T x)^4] + \lambda (\|w\|^2 - 1).$$

The gradient w.r.t. $\lambda$ gives simply the constraint on the norm of $w$; the gradient w.r.t. $w$ gives $4E[(w^T x)^3 x] + 2\lambda w$. Thus, a steepest-ascent step (absorbing the factor 4 into the step size $\alpha$) gives

$$w \leftarrow w + \alpha \left( E[(w^T x)^3 x] + \tfrac{1}{2} \lambda w \right).$$

Note the term $\tfrac{1}{2} \lambda w$, which should help control the norm of $w$.
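To see why this term controls the norm, one can solve for $\lambda$ at the constrained optimum (this short derivation is added here for clarity; it is not part of the original notes). Multiplying the stationarity condition $4E[(w^T x)^3 x] + 2\lambda w = 0$ from the left by $w^T$ and using $\|w\|^2 = 1$ gives

$$4E[(w^T x)^4] + 2\lambda = 0 \quad\Longrightarrow\quad \lambda = -2E[(w^T x)^4] \le 0,$$

so the correction $\tfrac{1}{2}\lambda w$ points opposite to $w$ and counteracts the unbounded growth of its norm.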
Example: Consider again the same problem, but this time apply the projection method. The constraint $\|w\|^2 = 1$ is equivalent to restricting $w$ to points on the unit sphere. The projection algorithm would give

$$w \leftarrow w + \alpha E[(w^T x)^3 x]$$

$$w \leftarrow \frac{w}{\|w\|}$$

The normalisation of $w$ at each step is equivalent to its orthogonal projection onto the unit sphere (a code sketch of this iteration is given below, after the homework).

Homework: Use the Lagrange method to find the extrema of the following objective function on the ellipse:

$$f(x) = x_1^2 + x_2^2, \qquad \{x : x_1^2 + 2x_2^2 - 1 = 0\}.$$
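A minimal sketch of the projection iteration above, with the expectation replaced by a sample average over the rows of a data matrix X (the function name, step size, and iteration count are illustrative assumptions):

    import numpy as np

    def fourth_moment_projection(X, alpha=0.1, n_iter=1000, seed=0):
        """Maximize the sample mean of (w^T x)^4 over unit-norm w by
        gradient ascent followed by projection onto the unit sphere."""
        rng = np.random.default_rng(seed)
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = X @ w                        # w^T x for every sample
            grad = (y ** 3) @ X / len(X)     # sample estimate of E[(w^T x)^3 x]
            w = w + alpha * grad             # unconstrained gradient step
            w /= np.linalg.norm(w)           # orthogonal projection onto the sphere
        return w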
Kuhn-Tucker conditions: The Lagrange method deals with constrained optimization in which the constraints have equality form. The Kuhn-Tucker (also Karush-Kuhn-Tucker) theorem extends this method to inequality constraints. Consider the following problem:

minimise $f(x)$ subject to $h(x) = 0$, $g(x) \le 0$,

where $f : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R}^n \to \mathbb{R}^m$ with $m \le n$, and $g : \mathbb{R}^n \to \mathbb{R}^p$.

Active and inactive constraints: An inequality constraint $g_j(x) \le 0$ is said to be active at $x$ if $g_j(x) = 0$, and inactive at $x$ if $g_j(x) < 0$.

Feasible point and feasible set: Any point satisfying the constraints is called a feasible point. The set of all feasible points is called the feasible set.
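For instance (an illustrative example, not from the original notes): take $g_1(x) = x_1^2 + x_2^2 - 1 \le 0$ and $g_2(x) = -x_1 \le 0$. At the feasible point $x = (1, 0)$ the constraint $g_1$ is active, since $g_1(x) = 0$, while $g_2$ is inactive, since $g_2(x) = -1 < 0$.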
Regular point: Let $x^*$ satisfy $h(x^*) = 0$, $g(x^*) \le 0$, and let $J(x^*)$ be the index set of active inequality constraints,

$$J(x^*) = \{ j : g_j(x^*) = 0 \}.$$

We say that $x^*$ is a regular point if the vectors $\nabla h_i(x^*)$, $\nabla g_j(x^*)$, $1 \le i \le m$, $j \in J(x^*)$, are linearly independent.

Theorem (Kuhn-Tucker): Let $f, h, g \in C^1$. Let $x^*$ be a regular point and a local minimiser for the problem of minimising $f$ subject to $h(x) = 0$, $g(x) \le 0$. Then there exist $\lambda^* \in \mathbb{R}^m$ and $\mu^* \in \mathbb{R}^p$ such that
1. $\mu^* \ge 0$,
2. $Df(x^*) + \lambda^{*T} Dh(x^*) + \mu^{*T} Dg(x^*) = 0^T$,
3. $\mu^{*T} g(x^*) = 0$.
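As a tiny worked instance of the theorem (added for illustration): minimise $f(x) = x$ subject to the single inequality constraint $g(x) = -x \le 0$. The minimiser is $x^* = 0$, where $g$ is active. Condition 2 reads $f'(x^*) + \mu^* g'(x^*) = 1 - \mu^* = 0$, giving $\mu^* = 1 \ge 0$, and condition 3 holds since $g(x^*) = 0$.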
Notes:
- $Dh(x^*) = (\nabla h_1(x^*), \dots, \nabla h_m(x^*))^T$
- $Dg(x^*) = (\nabla g_1(x^*), \dots, \nabla g_p(x^*))^T$
- $\lambda^*$ is the Lagrange multiplier vector and its components are the Lagrange multipliers
- $\mu^*$ is the Karush-Kuhn-Tucker (KKT) multiplier vector and its components are the Karush-Kuhn-Tucker (KKT) multipliers
Note that $\mu_j^* \ge 0$ and $g_j(x^*) \le 0$. Thus

$$\mu^{*T} g(x^*) = \mu_1^* g_1(x^*) + \mu_2^* g_2(x^*) + \dots + \mu_p^* g_p(x^*) = 0$$

implies that if $g_j(x^*) < 0$ then $\mu_j^* = 0$. Therefore the KKT multipliers corresponding to inactive constraints are zero; only the multipliers of active constraints can be positive.

Example: Assume that there are only 3 inequality constraints, of which $g_3$ is inactive; hence $\mu_3^* = 0$. From the KKT theorem,

$$\nabla f(x^*) + \mu_1^* \nabla g_1(x^*) + \mu_2^* \nabla g_2(x^*) = 0,$$

or

$$\nabla f(x^*) = -\mu_1^* \nabla g_1(x^*) - \mu_2^* \nabla g_2(x^*).$$

Thus, $-\nabla f(x^*)$ is a linear combination of the vectors $\nabla g_1(x^*)$ and $\nabla g_2(x^*)$ with nonnegative multipliers.
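A quick numerical illustration of this geometry (not from the original notes): minimise $f(x, y) = x + y$ subject to $g_1 = -x \le 0$ and $g_2 = -y \le 0$. The minimiser is the origin, where both constraints are active, and indeed

$$-\nabla f = (-1, -1) = \mu_1^* \nabla g_1 + \mu_2^* \nabla g_2 = \mu_1^* (-1, 0) + \mu_2^* (0, -1)$$

with $\mu_1^* = \mu_2^* = 1 \ge 0$.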
Example:

minimise $f(x, y) = x^2 + x + y^2$, subject to

$$2x + 2y \le 1, \quad x \ge 0, \quad y \ge 0.$$

This is restated in the standard form as: minimise $f(x, y) = x^2 + x + y^2$, subject to

$$2x + 2y - 1 \le 0, \quad -x \le 0, \quad -y \le 0.$$

The KKT conditions translate to:
1. $\mu_1 \ge 0$, $\mu_2 \ge 0$, $\mu_3 \ge 0$;
2. $2x + 1 + 2\mu_1 - \mu_2 = 0$ and $2y + 2\mu_1 - \mu_3 = 0$;
3. $\mu_1 (2x + 2y - 1) = 0$, $\mu_2 (-x) = 0$, $\mu_3 (-y) = 0$.
Case 1: $x > 0$, $y > 0$. From 3, $\mu_2 = 0$, $\mu_3 = 0$ and $\mu_1 (2x + 2y - 1) = 0$. From 2, $2x + 1 + 2\mu_1 = 0$ and $2y + 2\mu_1 = 0$, which together mean $2x - 2y + 1 = 0$. There is no valid solution: either $\mu_1 = 0$, which would lead to $x = -0.5$ and $y = 0$, which is invalid, or $2x + 2y - 1 = 0$ combined with $2x - 2y + 1 = 0$, which would lead to $x = 0$, also invalid.

Case 2: $x > 0$, $y = 0$. From 3, $\mu_2 = 0$ and $\mu_1 (2x - 1) = 0$. From 2, $2\mu_1 - \mu_3 = 0$ and $2x + 1 + 2\mu_1 = 0$. There is no valid solution: either $x = 1/2$, which would lead to $\mu_1 = -1$, which is invalid, or $\mu_1 = 0$, which would lead to $x = -1/2$, also invalid.
Case 3: $x = 0$, $y > 0$. From 3, $\mu_3 = 0$ and $\mu_1 (2y - 1) = 0$. From 2, $1 + 2\mu_1 - \mu_2 = 0$ and $2y + 2\mu_1 = 0$. There is no valid solution: either $y = 1/2$, which would lead to $\mu_1 = -0.5$, which is invalid, or $\mu_1 = 0$, which would lead to $y = 0$, also invalid.

Case 4: $x = 0$, $y = 0$. From 3, $\mu_1 = 0$. From 2, $\mu_2 = 1$ and $\mu_3 = 0$. So this is a valid solution. Hence the solution is $x = 0$ and $y = 0$.
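The result can be double-checked numerically. A minimal sketch, assuming SciPy is available (the solver choice and starting point are illustrative):

    import numpy as np
    from scipy.optimize import minimize

    # minimise f(x, y) = x^2 + x + y^2
    # subject to 2x + 2y <= 1, x >= 0, y >= 0
    res = minimize(
        lambda v: v[0] ** 2 + v[0] + v[1] ** 2,
        x0=np.array([0.25, 0.25]),
        method="SLSQP",
        bounds=[(0, None), (0, None)],
        constraints=[{"type": "ineq", "fun": lambda v: 1 - 2 * v[0] - 2 * v[1]}],
    )
    print(res.x)  # approximately [0, 0], matching the KKT analysis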