Math 5311 Constrained Optimization Notes


February 5, 2009

1 Equality-constrained optimization

Real-world optimization problems frequently have constraints on their variables. Constraints may be equality constraints, for example, "The total mass of the system must equal 3," or inequality constraints, for example, "The total mass of the system must be less than or equal to 3." In this course we'll restrict ourselves to equality constraints. As always, we're restricting ourselves to variables living in a vector space, so in general we'll be solving problems of the form

$$\min_{x \in V} f(x) \quad \text{s.t.} \quad g(x) = 0,$$

where the objective function (or cost function) is $f : V \to \mathbb{R}$ and the constraint function is $g : V \to U$, with $U$ some other vector space. The abbreviation "s.t." is customary for the phrase "such that." Usually we will have $\dim(U) < \dim(V)$ and $g$ a non-invertible function. Were $g$ invertible, we could solve $g(x) = 0$ for a unique $x$ and be done: if only one point is consistent with the constraints, then the minimum must be at that point. In other words, to be an interesting optimization problem, the constraints should form an underdetermined system of equations.

1.1 Examples of equality-constrained optimization problems

1. Minimize $x_1^2 + x_2^2 + x_1 x_2 - 2x_1 + 3x_2$ over $x \in \mathbb{R}^2$ subject to $x_1 + 3x_2 = 2$. Here the constraint $g : \mathbb{R}^2 \to \mathbb{R}$ is $g(x) = Ax - b$, where $A = \begin{pmatrix} 1 & 3 \end{pmatrix}$ and $b = 2$.

2. Minimize $\sin(x_1 - x_2^2)$ over $x \in \mathbb{R}^2$ subject to $x_2 = \cos x_1$. This problem is easily transformed to the 1D problem $\min_{x \in \mathbb{R}} \sin(x - \cos^2 x)$.

3. Minimize $Wu$ over $u \in H^1(-1, 1)$ subject to $u(-1) = 1$, $u(1) = 0$, where $W : H^1(-1, 1) \to \mathbb{R}$ is the functional

$$Wu = \int_{-1}^{1} \left( \tfrac{1}{2} u_x^2 + u \right) dx.$$

We can write the constraint as

$$g(u) = \begin{pmatrix} u(-1) - 1 \\ u(1) \end{pmatrix}.$$

Note that $g$ maps the infinite-dimensional $H^1$ to the finite-dimensional $\mathbb{R}^2$.

4. If we discretize example 3 in the $N$th-order Vandermonde basis $\phi_i(x) = x^i$, we get the following finite-dimensional minimization problem: minimize

$$W(u) = \frac{1}{2} \sum_{i,j=1}^{N} \frac{ij \left( 1 + (-1)^{i+j} \right)}{i+j-1} \, u_i u_j + \sum_{i=0}^{N} \frac{1 + (-1)^i}{i+1} \, u_i$$

over $u \in \mathbb{R}^{N+1}$ subject to

$$\sum_{i=0}^{N} u_i = 0, \qquad \sum_{i=0}^{N} (-1)^i u_i = 1.$$

5. Minimize $\frac{1}{2} x^T K x$ over $x \in \mathbb{R}^N$ subject to $x^T x = 1$.

2 Solving equality-constrained differentiable optimization problems

In calculus you learned the method of Lagrange multipliers for solving constrained optimization problems in $\mathbb{R}^2$ and $\mathbb{R}^3$. Here we'll develop the Lagrange Multiplier Theorem (LMT) in a general vector-space setting using Gateaux differentials. To help you understand the notation and content of the theorem, I'll first state and prove the LMT with a finite number of constraints, then extend it (without proof) to an arbitrary vector space $V$.

2.1 The Lagrange Multiplier Theorem with a finite number of constraints

Theorem 1. Let $f, g_1, \dots, g_m$ be Frechet differentiable real-valued functions on $V$. To avoid trivial cases, assume $m < \dim(V)$, that is, we have fewer constraints than dimensions, and also assume we have no redundant constraints. Let $\Omega \subset V$ be the subset of $V$ satisfying the constraints, that is,

$$\Omega = \{ x \in V : g_1(x) = 0, \; g_2(x) = 0, \; \dots, \; g_m(x) = 0 \}.$$

If $x^*$ is a local minimizer of $f$ in $\Omega$, then there exist multipliers $\{\lambda_i\}$ such that

$$Df(x^*) + \sum_{i=1}^{m} \lambda_i \, Dg_i(x^*) = 0.$$

This theorem warrants some discussion.

1. The LMT establishes a necessary condition for $x^*$ to be a minimizer: if $x^*$ is a minimizer, then certain equations involving $x^*$ must hold. It does not follow that if the equations hold, $x^*$ is a minimizer; it may be a maximizer or a saddle point.

2. In $\mathbb{R}^n$ the Frechet derivative is just the $n$-dimensional gradient, a vector having $n$ components. The set of equations $\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) = 0$ therefore consists of $n$ equations in $n + m$ unknowns (the $n$ components of $x^*$ plus the $m$ multipliers), so it is underdetermined. The remaining $m$ equations are the constraints $g_i(x^*) = 0$ for $i = 1, \dots, m$. You must simultaneously solve the multiplier equations and the constraints; in general, this can be quite difficult.
The full system of equations is often called the equality-constrained Karush-Kuhn-Tucker equations, or KKT equations.

3. I've deliberately left vague what is meant by "redundant constraints." As an exercise in your ability to formulate precise mathematical statements of obvious concepts, try to develop a clear and complete definition of redundant constraints.
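For a quadratic objective with a linear constraint, the KKT equations are themselves linear and can be solved directly. The following is a minimal numerical sketch for example 1 of section 1.1, assuming NumPy is available; the bordered system is assembled by hand from the gradients.

```python
import numpy as np

# Example 1 of section 1.1: minimize x1^2 + x2^2 + x1*x2 - 2*x1 + 3*x2
# subject to x1 + 3*x2 = 2.  The KKT conditions  grad f + lam * grad g = 0,
# g(x) = 0  are linear here, with
#   grad f = [2*x1 + x2 - 2,  x1 + 2*x2 + 3],   grad g = [1, 3].
KKT = np.array([[2.0, 1.0, 1.0],   # 2*x1 +   x2 +   lam =  2
                [1.0, 2.0, 3.0],   #   x1 + 2*x2 + 3*lam = -3
                [1.0, 3.0, 0.0]])  #   x1 + 3*x2          =  2
rhs = np.array([2.0, -3.0, 2.0])
x1, x2, lam = np.linalg.solve(KKT, rhs)

# The solution satisfies both stationarity and feasibility.
assert abs(x1 + 3 * x2 - 2) < 1e-12
grad_f = np.array([2 * x1 + x2 - 2, x1 + 2 * x2 + 3])
grad_g = np.array([1.0, 3.0])
assert np.allclose(grad_f + lam * grad_g, 0.0)
```

Solving gives $x^* = (25/14, 1/14)$ with multiplier $\lambda^* = -23/14$; substituting back confirms the stationarity equations and the constraint. Note that the theorem only guarantees this is a stationary point of the constrained problem; here it happens to be the minimizer because the objective is strictly convex.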

2.2 Proof of the LMT with a finite number of constraints

Proof. Let $f : V \to \mathbb{R}$ and $g : V \to \mathbb{R}^m$ be Frechet differentiable in some open set $S \ni x^*$. We'll refer to the $i$th component of $g$ as $g_i$. Because $g_i$ is Frechet differentiable at $x^*$, it has a unique tangent plane $T_i$ at $x^*$. Let $x^*$ satisfy the constraint equations $g(x^*) = 0$. At $x^*$ the tangent hyperplane $T_i$ to the constraint surface $g_i = 0$ is defined as the set of directions $h$ such that $d_h g_i(x^*) = 0$. For $x^*$ to be a stationary point of $f$ subject to $g(x^*) = 0$, it must be the case that $d_h f(x^*) = 0$ for all vectors $h$ in the tangent hyperplanes, that is, for all $h$ such that $d_h g_i(x^*) = 0$ for every $i$. Now, because we've stipulated that $f$ is Frechet differentiable, we know that $d_h f = Df \, h$ for all $h \in V$. For the same reason we know that $d_h g_i = Dg_i \, h$ for all $h \in V$. In other words, both $d_h f$ and $d_h g$ are linear functions of $h$. Furthermore, by the requirement for a stationary point, for every $h$ with $d_h g_i(x^*) = 0$ for all $i$ we have $d_h f(x^*) = 0$ as well. The only linear functions satisfying that condition are linear combinations of the linear functions involving $Dg_i$, that is,

$$Df(x^*) h = -\sum_{i=1}^{m} \lambda_i \, Dg_i(x^*) h.$$

This must be true for all $h$, so the LMT follows.

2.3 Several useful corollaries

Corollary 2. If $x^*$ is a stationary point of $f$ subject to the constraint $g(x) = 0$, then $(x^*, \lambda^*)$ is a stationary point of the function

$$L(x, \lambda) = f(x) + \lambda^T g(x).$$

The proof is straightforward. The function $L$ is usually called the Lagrangian. Warning: in mechanics there is another function called the Lagrangian, also usually denoted by $L$; these are not the same functions. Unless otherwise specified, "the Lagrangian" will always refer to $L = f + \lambda^T g$, not the function defined in mechanics. Sometimes you'll see the Lagrangian defined as $L = f - \lambda^T g$. The choice of sign is irrelevant to the value of $x^*$, and can be chosen as convenient for a given problem.
Changing the sign in the Lagrangian will, of course, change the sign of the Lagrange multipliers (unless you also change the sign of $g$).

Corollary 3. If $x^*$ is a minimizer of $f : V \to \mathbb{R}$ subject to the constraints $g : V \to \mathbb{R}^m$, then

$$d_h f(x^*) + \sum_{i=1}^{m} \lambda_i \, d_h g_i(x^*) = 0 \quad \forall h \in V.$$

The proof follows immediately from the LMT and the definition of the Frechet derivative.

2.4 The LMT with infinitely many constraints

Theorem 4. Let $f : V \to \mathbb{R}$ and $g : V \to U$ be Frechet differentiable functions, and let $\langle \cdot, \cdot \rangle_U$ be an inner product on the vector space $U$. Assume $\dim(U) < \dim(V)$. Let $\Omega \subset V$ be the subset of $V$ satisfying the constraints. If $x^*$ is a local minimizer of $f$ in $\Omega$, then there exists $\lambda^* \in U$ such that

$$Df(x^*) + \langle \lambda^*, Dg(x^*) \rangle_U = 0.$$

3 Examples

3.1 Quadratic objective functions, linear constraints

Consider optimization in $\mathbb{R}^n$ with linear constraints. Let $p(x) = \frac{1}{2} x^T K x - x^T f$ and $g(x) = Ax - b$. Form the Lagrangian

$$L = \tfrac{1}{2} x^T K x - x^T f + \lambda^T (Ax - b).$$

The KKT equations are

$$d_{(h,\mu)} L = h^T (Kx - f + A^T \lambda) + \mu^T (Ax - b) = 0,$$

or in matrix form,

$$\begin{pmatrix} K & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} f \\ b \end{pmatrix}.$$

The question of when these KKT equations have solutions will be explored in one of your homework problems.

3.2 Quadratic objective functions, one quadratic constraint

Let $K$ be an $N \times N$ matrix. Minimize the quadratic form $p(x) = \frac{1}{2} x^T K x$ subject to $g(x) = x^T x - 1 = 0$. Note that $g : \mathbb{R}^N \to \mathbb{R}$ is a single constraint. The Lagrangian is

$$L = \tfrac{1}{2} x^T K x - \tfrac{1}{2} \lambda (x^T x - 1).$$

Note that I've used a negative sign (and a factor of $\tfrac{1}{2}$ on the multiplier term) in the definition of the Lagrangian; this will result in a more conventional form of the equations. Taking differentials of the Lagrangian and setting them to zero gives

$$d_{(h,\mu)} L = h^T (Kx - \lambda x) - \tfrac{1}{2} \mu (x^T x - 1) = 0 \quad \forall h \in \mathbb{R}^N, \; \mu \in \mathbb{R}.$$

This results in two equations to be solved:

$$Kx = \lambda x, \qquad x^T x = 1.$$

The first is an eigenvalue problem with eigenvector $x$ and eigenvalue $\lambda$; the second is a normalization condition on the eigenvector.

3.3 A quadratic functional with boundary conditions

Minimize

$$Wu = \int_{-1}^{1} \left( \tfrac{1}{2} u_x^2 + u \right) dx$$

over $u \in H^1(-1, 1)$ subject to the constraints $u(-1) = 1$ and $u(1) = 0$. Form the Lagrangian

$$L = Wu + \lambda_1 (u(-1) - 1) + \lambda_2 u(1).$$

The stationary point is given by the solution to

$$d_{(v,\mu)} L = \int_{-1}^{1} (u_x v_x + v) \, dx + \lambda_1 v(-1) + \lambda_2 v(1) + \mu_1 (u(-1) - 1) + \mu_2 u(1) = 0 \quad \forall v \in H^1, \; \mu \in \mathbb{R}^2.$$

Integration by parts gives

$$d_{(v,\mu)} L = \int_{-1}^{1} v (1 - u_{xx}) \, dx + (\lambda_1 + \partial_n u(-1)) v(-1) + (\lambda_2 + \partial_n u(1)) v(1) + \mu_1 (u(-1) - 1) + \mu_2 u(1) = 0$$

for all $v \in H^1$, $\mu \in \mathbb{R}^2$, where $\partial_n$ denotes the outward normal derivative at an endpoint. The minimum is therefore given by the solution to $u_{xx} = 1$ with boundary conditions $u(-1) = 1$, $u(1) = 0$, and the multipliers are $\lambda_1 = -\partial_n u(-1)$ and $\lambda_2 = -\partial_n u(1)$.

3.3.1 Discretization of a quadratic functional with boundary conditions

Let's now discretize the same functional using the Vandermonde basis of order $N$, whose $i$th basis function is $\phi_i(x) = x^i$. Plugging $v = \phi_i$ into $d_{(v,\mu)} L$ from example 3.3 gives

$$\sum_{j} u_j \int_{-1}^{1} (x^i)_x (x^j)_x \, dx + \int_{-1}^{1} x^i \, dx + \lambda_1 (-1)^i + \lambda_2 = 0, \quad i = 0, \dots, N,$$

together with the constraints

$$\sum_{j} (-1)^j u_j = 1, \qquad \sum_{j} u_j = 0.$$

Define $A_{1j} = (-1)^j$, $A_{2j} = 1$, $K_{ij} = \int_{-1}^{1} (x^i)_x (x^j)_x \, dx$, $f_i = -\int_{-1}^{1} x^i \, dx$, and $b = (1, 0)^T$. The discrete equations are then

$$\begin{pmatrix} K & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} u \\ \lambda \end{pmatrix} = \begin{pmatrix} f \\ b \end{pmatrix}.$$

Some advantages of this approach are that we can work with a basis for $H^1$ (instead of $H_0^1$) and we can solve a problem with inhomogeneous boundary conditions.
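The discrete system of section 3.3.1 can be assembled and solved in a few lines. The following sketch assumes NumPy; the basis order $N = 4$ and the sample points are arbitrary choices. It builds $K$, $A$, $f$, $b$ from the exact moment integrals and checks the result against $u(x) = (x^2 - x)/2$, which one can verify solves $u_{xx} = 1$ with $u(-1) = 1$, $u(1) = 0$.

```python
import numpy as np

# Solve u_xx = 1 on (-1, 1) with u(-1) = 1, u(1) = 0 via the bordered
# KKT system of section 3.3.1, using the monomial basis phi_i(x) = x^i.
# The exact minimizer u(x) = (x^2 - x)/2 lies in the basis for N >= 2.

def moment(n):
    """Exact value of the integral of x^n over (-1, 1)."""
    return (1 + (-1) ** n) / (n + 1)

N = 4
K = np.zeros((N + 1, N + 1))
for i in range(1, N + 1):
    for j in range(1, N + 1):
        # K_ij = int (x^i)' (x^j)' dx = i*j * int x^(i+j-2) dx
        K[i, j] = i * j * moment(i + j - 2)
f = np.array([-moment(i) for i in range(N + 1)])      # f_i = -int x^i dx
A = np.vstack([[(-1.0) ** j for j in range(N + 1)],   # row for u(-1) = 1
               np.ones(N + 1)])                       # row for u(1)  = 0
b = np.array([1.0, 0.0])

# Bordered KKT matrix [[K, A^T], [A, 0]] and its solution (u, lambda).
M = np.block([[K, A.T], [A, np.zeros((2, 2))]])
sol = np.linalg.solve(M, np.concatenate([f, b]))
u, lam = sol[:N + 1], sol[N + 1:]

# Compare with the exact solution u(x) = (x^2 - x)/2 at a few points.
xs = np.linspace(-1, 1, 7)
exact = (xs ** 2 - xs) / 2
approx = sum(u[i] * xs ** i for i in range(N + 1))
assert np.allclose(approx, exact)
```

Consistent with section 3.3, the computed multipliers come out as $\lambda_1 = -\partial_n u(-1) = -3/2$ and $\lambda_2 = -\partial_n u(1) = -1/2$. Although $K$ alone is singular (the constant mode $\phi_0$ has zero derivative), the bordered matrix is nonsingular because the constraint rows pin down that mode.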