Math 5311 Constrained Optimization Notes
February 5, 2009

1 Equality-constrained optimization

Real-world optimization problems frequently have constraints on their variables. Constraints may be equality constraints, for example, "the total mass of the system must equal 3," or inequality constraints, for example, "the total mass of the system must be less than or equal to 3." In this course, we'll restrict ourselves to equality constraints. As always, we're restricting ourselves to variables living in a vector space. So in general, we'll be solving problems of the form

    min_{x ∈ V} f(x)   s.t.   g(x) = 0,

where the objective function (or cost function) is f : V → R and the constraints are g : V → U, where U is some other vector space. The abbreviation "s.t." is customary for the phrase "such that." Usually we will have dim(U) < dim(V) and g(x) a non-invertible function. Were g(x) invertible, we could solve g(x) = 0 for a unique x and be done: if only one point is consistent with the constraints, then the minimum must be at that point. In other words, to be an interesting optimization problem, the constraints should form an underdetermined system of equations.

1.1 Examples of equality-constrained optimization problems

1. Minimize x_1^2 + x_2^2 + x_1 x_2 − 2x_1 + 3x_2 over x ∈ R^2 subject to x_1 + 3x_2 = 2. Here the constraint g : R^2 → R is g(x) = Ax − b, where A = [1 3] and b = 2.

2. Minimize sin(x_1 x_2^2) over x ∈ R^2 subject to x_2 = cos x_1. This problem is easily transformed to the 1D problem min_{x ∈ R} sin(x cos^2 x).

3. Minimize Wu over u ∈ H^1(−1, 1) subject to u(−1) = 1, u(1) = 0, where W : H^1(−1, 1) → R is the functional

    Wu = ∫_{−1}^{1} ( (1/2) u_x^2 + u ) dx.

We can write the constraint as g(u) = ( u(−1) − 1, u(1) )^T. Note that g maps the infinite-dimensional H^1 to the finite-dimensional R^2.
4. If we discretize example 3 in the N-th order Vandermonde basis, we get the following finite-dimensional minimization problem: minimize

    W(u) = (1/2) Σ_{i,j=1}^{N} [ i j (1 + (−1)^{i+j}) / (i + j − 1) ] u_i u_j + Σ_{i=1}^{N} [ (1 + (−1)^i) / (i + 1) ] u_i

over u ∈ R^N subject to Σ_{i=1}^{N} u_i = 0 and Σ_{i=1}^{N} (−1)^i u_i = 1.

5. Minimize (1/2) x^T K x over x ∈ R^N subject to x^T x = 1.

2 Solving equality-constrained differentiable optimization problems

In calculus you learned the method of Lagrange multipliers for solving constrained optimization problems in R^2 and R^3. Here, we'll develop the Lagrange Multiplier Theorem (LMT) in a general vector space setting using Gâteaux differentials. To help understand the notation and content of the theorem, I'll first state and prove the LMT with a finite number of constraints, then extend it (without proof) to an arbitrary vector space V.

2.1 The Lagrange Multiplier Theorem with a finite number of constraints

Theorem 1. Let f, g_1, …, g_M be Fréchet-differentiable real-valued functions on V. To avoid trivial cases, assume M < dim(V), that is, that we have fewer constraints than dimensions, and also assume we have no redundant constraints. Let Ω ⊂ V be the subset of V satisfying the constraints, that is, Ω = {x ∈ V : g_1(x) = 0, g_2(x) = 0, …, g_M(x) = 0}. If x* is a local minimizer of f in Ω, then there exist multipliers {λ_i}_{i=1}^{M} such that

    D f(x*) + Σ_{i=1}^{M} λ_i D g_i(x*) = 0.

This theorem warrants some discussion.

1. The LMT establishes a necessary condition for x* to be a minimizer: if x* is a minimizer, then certain equations involving x* must hold. It does not follow that if the equations hold, x* is a minimizer. It may be a maximizer or a saddle point.

2. In R^N the Fréchet derivative is just the N-dimensional gradient, a vector having N components. The set of equations ∇f(x*) + Σ_{i=1}^{M} λ_i ∇g_i(x*) = 0 therefore consists of N equations in N + M unknowns (the N components of x* plus the M multipliers), so it's underdetermined. The remaining M equations are the constraints, g_i(x*) = 0 for i = 1 to M. You must simultaneously solve the multiplier equations and the constraints. In general, this can be quite difficult.
The full system of equations is often called the equality-constrained Karush-Kuhn-Tucker equations, or KKT equations.

3. I've deliberately left vague what is meant by "redundant constraints." As an exercise in your ability to formulate precise mathematical statements of obvious concepts, try to develop a clear and complete definition of redundant constraints.
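For a concrete instance of the discussion above, consider example 1 of section 1.1: the multiplier equations plus the constraint form a square system, and because the objective is quadratic and the constraint linear, the system is linear and can be solved directly. The following NumPy sketch (variable names are mine, not from the notes) solves it and checks the geometric content of the LMT, namely that ∇f(x*) is a multiple of ∇g(x*):

```python
import numpy as np

# Example 1 of section 1.1: minimize f(x) = x1^2 + x2^2 + x1*x2 - 2*x1 + 3*x2
# subject to g(x) = x1 + 3*x2 - 2 = 0.
# Stationarity grad f(x*) + lam * grad g(x*) = 0 plus the constraint gives a
# linear system in the unknowns (x1, x2, lam):
#   2*x1 +   x2 +   lam = 2
#     x1 + 2*x2 + 3*lam = -3
#     x1 + 3*x2         = 2
KKT = np.array([[2.0, 1.0, 1.0],
                [1.0, 2.0, 3.0],
                [1.0, 3.0, 0.0]])
rhs = np.array([2.0, -3.0, 2.0])
x1, x2, lam = np.linalg.solve(KKT, rhs)

# At the solution, grad f is a multiple of grad g: grad f = -lam * grad g.
grad_f = np.array([2 * x1 + x2 - 2, x1 + 2 * x2 + 3])
grad_g = np.array([1.0, 3.0])
assert np.allclose(grad_f, -lam * grad_g)
print(x1, x2, lam)  # x* = (25/14, 1/14), lam = -23/14
```

Since the Hessian of f, [[2, 1], [1, 2]], is positive definite, this stationary point really is the minimizer; as discussion point 1 notes, the LMT alone cannot guarantee that.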
2.2 Proof of the LMT with a finite number of constraints

Proof. Let f : V → R and g : V → R^M be Fréchet differentiable in some open set S containing x*. We'll refer to the i-th component of g as g_i. Because g_i is Fréchet differentiable at x*, it has a unique tangent plane T_i at x*. Let x* satisfy the constraint equations g(x*) = 0. At x*, the tangent hyperplane T_i to the constraint surface g_i = 0 is defined as the set of directions h such that d_h g_i(x*) = 0. For x* to be a stationary point of f subject to g(x*) = 0, it must be the case that d_h f(x*) = 0 for all vectors h lying in every tangent hyperplane, that is, for all h such that d_h g_i(x*) = 0 for each i. Now, because we've stipulated that f is Fréchet differentiable, we know that d_h f = D f h for all h ∈ V. For the same reason, we know that d_h g_i = D g_i h for all h ∈ V. In other words, both d_h f and d_h g are linear functions of h. Furthermore, by the requirement for a stationary point, we know that d_h f(x*) = 0 for every h at which all the d_h g_i(x*) vanish. The only linear functionals satisfying that condition are linear combinations of the linear functionals D g_i(x*), that is,

    D f(x*) h = − Σ_{i=1}^{M} λ_i D g_i(x*) h

for some scalars λ_1, …, λ_M. This must be true for all h, so the LMT follows.

2.3 Several useful corollaries

Corollary 2. If x* is a stationary point of f subject to the constraint g(x) = 0, then (x*, λ*) is a stationary point of the function L(x, λ) = f(x) + λ^T g(x).

The proof is straightforward. The function L is usually called the Lagrangian. Warning: in mechanics there is another function called the Lagrangian, also usually denoted by L; these are not the same function. Unless otherwise specified, "the Lagrangian" will always refer to L = f + λ^T g, not to the function defined in mechanics. Sometimes you'll see the Lagrangian defined as L = f − λ^T g. The choice of sign is irrelevant to the value of x*, and can be chosen as convenient for a given problem.
Changing the sign in the Lagrangian will, of course, change the sign of the Lagrange multipliers (unless you also change the sign of g).

Corollary 3. If x* is a minimizer of f : V → R subject to the constraints g : V → R^M, then

    d_h f(x*) + Σ_{i=1}^{M} λ_i d_h g_i(x*) = 0 for all h ∈ V.

The proof follows immediately from the LMT and the definition of the Fréchet derivative.

2.4 The LMT with infinitely many constraints

Theorem 4. Let f : V → R and g : V → U be Fréchet differentiable functions. Let ⟨·, ·⟩_U be an inner product on the vector space U. Assume dim(U) < dim(V). Let Ω ⊂ V be the subset of V satisfying the constraints. If x* is a local minimizer of f in Ω, then there exists λ* ∈ U such that

    D f(x*) + ⟨λ*, D g(x*)⟩_U = 0.

3 Examples

3.1 Quadratic objective functions, linear constraints

Consider optimization in R^N with linear constraints. Let p(x) = (1/2) x^T K x − x^T f and g(x) = Ax − b. Form the Lagrangian

    L = (1/2) x^T K x − x^T f + λ^T A x − λ^T b.

The KKT equations are

    d_{(h,µ)} L = h^T (K x − f + A^T λ) + µ^T (A x − b) = 0,

or in matrix form,
    [ K   A^T ] [ x ]   [ f ]
    [ A    0  ] [ λ ] = [ b ].

The question of when these KKT equations have solutions will be explored in one of your homework problems.

3.2 Quadratic objective functions, one quadratic constraint

Let K be an N × N matrix. Minimize the quadratic form p(x) = (1/2) x^T K x subject to g(x) = x^T x − 1 = 0. Note that g : R^N → R is a single constraint. The Lagrangian is

    L = (1/2) x^T K x − (λ/2)(x^T x − 1).

Note that I've used a negative sign in the definition of the Lagrangian, and a factor of 1/2 on the constraint term; this will result in a more conventional form of the equations. Taking differentials of the Lagrangian and setting them to zero gives

    d_{(h,µ)} L = h^T (K x − λ x) − (µ/2)(x^T x − 1) = 0 for all h ∈ R^N, µ ∈ R.

This results in two equations to be solved: K x = λ x and x^T x = 1. The first is an eigenvalue problem with eigenvectors x and eigenvalues λ; the second is a normalization condition on the eigenvectors.

3.3 A quadratic functional with boundary conditions

Minimize Wu = ∫_{−1}^{1} ( (1/2) u_x^2 + u ) dx over u ∈ H^1 subject to the constraints u(−1) = 1 and u(1) = 0. Form a Lagrangian

    L = W + λ_1 (u(−1) − 1) + λ_2 u(1).

The stationary point is given by the solution to

    d_{(v,µ)} L = ∫_{−1}^{1} ( u_x v_x + v ) dx + λ_1 v(−1) + λ_2 v(1) + µ_1 (u(−1) − 1) + µ_2 u(1) = 0 for all v ∈ H^1, µ ∈ R^2.

Integration by parts gives

    d_{(v,µ)} L = ∫_{−1}^{1} v (1 − u_xx) dx + (λ_1 + ∂_n u(−1)) v(−1) + (λ_2 + ∂_n u(1)) v(1) + µ_1 (u(−1) − 1) + µ_2 u(1) = 0

for all v ∈ H^1, µ ∈ R^2. The minimum will be given by the solution of u_xx = 1 with boundary conditions u(−1) = 1, u(1) = 0. The multipliers will be λ_1 = −∂_n u(−1) and λ_2 = −∂_n u(1).

3.3.1 Discretization of a quadratic functional with boundary conditions

Let's now discretize the same functional using the Vandermonde basis of order N. The i-th basis function is φ_i(x) = x^i. Plugging into d_{(v,µ)} L from example 3.3 gives

    Σ_{j=1}^{N} u_j ∫_{−1}^{1} (x^i)_x (x^j)_x dx + ∫_{−1}^{1} x^i dx + λ_1 (−1)^i + λ_2 = 0 for i = 1, …, N,

    Σ_{j=1}^{N} (−1)^j u_j = 1,   Σ_{j=1}^{N} u_j = 0.
Define A_{1j} = (−1)^j, A_{2j} = 1, K_{ij} = ∫_{−1}^{1} (x^i)_x (x^j)_x dx, f_i = −∫_{−1}^{1} x^i dx, and b = (1, 0)^T. The discrete equations are then

    [ K   A^T ] [ u ]   [ f ]
    [ A    0  ] [ λ ] = [ b ].

Some advantages of this approach are that we can work with a basis for H^1 (instead of H^1_0) and that we can solve a problem with inhomogeneous boundary conditions.
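As a sanity check, the discrete saddle-point system can be assembled and solved with NumPy. This is a sketch under the conventions above (monomial basis φ_i(x) = x^i on (−1, 1)); the closed-form matrix entries below come from evaluating the integrals ∫_{−1}^{1} (x^i)_x (x^j)_x dx and ∫_{−1}^{1} x^i dx. The exact minimizer of the continuous problem, u(x) = x^2/2 − x/2, is a quadratic, so it should be recovered for any N ≥ 2:

```python
import numpy as np

N = 4  # polynomial order; the exact solution is quadratic, so any N >= 2 recovers it

# K_ij = int_{-1}^{1} (x^i)' (x^j)' dx = i*j*(1 + (-1)^(i+j)) / (i + j - 1)
idx = np.arange(1, N + 1)
I, J = np.meshgrid(idx, idx, indexing="ij")
K = I * J * (1 + (-1.0) ** (I + J)) / (I + J - 1)

# f_i = -int_{-1}^{1} x^i dx = -(1 + (-1)^i) / (i + 1)
f = -(1 + (-1.0) ** idx) / (idx + 1)

# Constraint rows: u(-1) = sum_j (-1)^j u_j = 1 and u(1) = sum_j u_j = 0
A = np.vstack([(-1.0) ** idx, np.ones(N)])
b = np.array([1.0, 0.0])

# Saddle-point KKT system [[K, A^T], [A, 0]] [u; lam] = [f; b]
M = np.block([[K, A.T], [A, np.zeros((2, 2))]])
sol = np.linalg.solve(M, np.concatenate([f, b]))
u, lam = sol[:N], sol[N:]

# Exact minimizer u(x) = x^2/2 - x/2 -> coefficients (-1/2, 1/2, 0, ...);
# the multipliers are the negated normal derivatives at the endpoints.
print(u)    # u ~ [-0.5, 0.5, 0, 0]
print(lam)  # lam ~ [-1.5, -0.5]
```

The recovered multipliers agree with example 3.3: with u_x = x − 1/2, λ_1 = −∂_n u(−1) = u_x(−1) = −3/2 and λ_2 = −∂_n u(1) = −u_x(1) = −1/2. Increasing N enlarges the polynomial space, but the extra coefficients come out (numerically) zero.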