Constrained optimization


In general, a constrained optimization problem is formulated as

$\min_w J(w)$, subject to $H_i(w) = 0$, $i = 1, \ldots, k$,

where $J$ is the cost function and the $H_i$ are the constraints.

Lagrange multipliers: This is a widely used method for constrained optimization. To use it, one forms the Lagrange function

$L(w, \lambda_1, \lambda_2, \ldots, \lambda_k) = J(w) + \sum_{i=1}^{k} \lambda_i H_i(w)$

The $\lambda_i$ above are called Lagrange multipliers. Observe that a point where the gradient of $L$ with respect to both $w$ and the $\lambda$'s vanishes solves the original constrained optimization problem (the gradient of $L$ with respect to $\lambda_i$ recovers the $i$th constraint function $H_i$). The gradient of $L$ with respect to $w$ gives

$\frac{\partial J(w)}{\partial w} + \sum_{i=1}^{k} \lambda_i \frac{\partial H_i(w)}{\partial w} = 0$

The resulting set of equations can be solved with an iterative method; in some simple cases there is even a closed-form solution.

Example: Let $w = [w_1, w_2]^T$ and $J(w) = 3w_1^2 + 7w_2^2 + 2w_1 w_2$. Use the Lagrange method to find the minimum of $J(w)$ subject to $h(w) = w_1 + 2w_2 - 2 = 0$.

$L(w, \lambda) = 3w_1^2 + 7w_2^2 + 2w_1 w_2 + \lambda(w_1 + 2w_2 - 2)$

$\frac{\partial L}{\partial w_1} = 6w_1 + 2w_2 + \lambda = 0$
$\frac{\partial L}{\partial w_2} = 14w_2 + 2w_1 + 2\lambda = 0$
$\frac{\partial L}{\partial \lambda} = h(w) = w_1 + 2w_2 - 2 = 0$

The solution is

$\begin{bmatrix} w_1 \\ w_2 \\ \lambda \end{bmatrix} = \begin{bmatrix} 6 & 2 & 1 \\ 2 & 14 & 2 \\ 1 & 2 & 0 \end{bmatrix}^{-1} \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 2/3 \\ 2/3 \\ -16/3 \end{bmatrix}$
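
As a quick numerical check of this worked example (a sketch, not part of the original notes), the linear system can be solved directly with NumPy:

    import numpy as np

    # Stationarity conditions dL/dw1 = 0, dL/dw2 = 0 and the constraint h(w) = 0,
    # written as a linear system A [w1, w2, lambda]^T = b.
    A = np.array([[6.0,  2.0, 1.0],
                  [2.0, 14.0, 2.0],
                  [1.0,  2.0, 0.0]])
    b = np.array([0.0, 0.0, 2.0])

    w1, w2, lam = np.linalg.solve(A, b)
    print(w1, w2, lam)   # -> 0.6667, 0.6667, -5.3333, i.e. (2/3, 2/3, -16/3)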

Projection method: If the constraints are simple (e.g. a normalization of the parameter vector), so that they define a simple set of admissible parameter values, the projection method can be used: apply a method of unconstrained optimization and, after each step of that method, project the intermediate solution orthogonally onto the constraint set.

Example: Consider the last example, in which we were trying to find the vector $w$ maximizing the 4th-order moment of $w^T x$. We stated that the obtained algorithm was not very good, as the norm of $w$ would grow without bound. Consider the constrained version: maximize $E[(w^T x)^4]$ subject to the constraint $\|w\|^2 = 1$. Formulate the Lagrangian

$L(w, \lambda) = E[(w^T x)^4] + \lambda(\|w\|^2 - 1)$

The gradient w.r.t. $\lambda$ gives simply the constraint on the norm of $w$; the gradient w.r.t. $w$ gives

$4E[(w^T x)^3 x] + 2\lambda w$

Thus, a steepest-ascent step gives

$w \leftarrow w + \alpha \left( E[(w^T x)^3 x] + \tfrac{1}{2} \lambda w \right)$

Note the term $\tfrac{1}{2} \lambda w$, which should help control the norm of $w$.
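
As an illustration only, here is a minimal sketch of estimating this Lagrangian gradient from a sample; the synthetic data and the fixed value of $\lambda$ are assumptions of mine, not part of the notes.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10000, 2))   # synthetic data, one sample per row
    w = np.array([1.0, 0.5])
    lam = -1.0                            # arbitrary fixed multiplier, for illustration only

    # Empirical gradient of L(w, lambda) = E[(w^T x)^4] + lambda (||w||^2 - 1) w.r.t. w:
    #   4 E[(w^T x)^3 x] + 2 lambda w
    proj = X @ w
    grad_w = 4 * np.mean((proj ** 3)[:, None] * X, axis=0) + 2 * lam * w
    print(grad_w)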

Example: Consider again the same problem, but this time apply the projection method. The constraint $\|w\|^2 = 1$ is equivalent to restricting $w$ to points on the unit sphere. The projection algorithm would give

$w \leftarrow w + \alpha E[(w^T x)^3 x]$
$w \leftarrow w / \|w\|$

The normalisation of $w$ at each step is equivalent to its orthogonal projection onto the unit sphere. (A small numerical sketch of this update follows the homework below.)

Homework: Use the Lagrange method to find the extrema of the objective function $f(x) = x_1^2 + x_2^2$ on the ellipse $\{x : x_1^2 + 2x_2^2 - 1 = 0\}$.
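
Returning to the projection example above, here is a minimal sketch of the projected update on synthetic data; the data, step size and iteration count are illustrative assumptions, not from the notes.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10000, 2))       # synthetic data, one sample per row

    w = rng.standard_normal(2)
    w = w / np.linalg.norm(w)                 # start on the unit sphere
    alpha = 0.1                               # illustrative step size

    for _ in range(100):
        proj = X @ w
        grad = np.mean((proj ** 3)[:, None] * X, axis=0)   # estimate of E[(w^T x)^3 x]
        w = w + alpha * grad                               # unconstrained ascent step
        w = w / np.linalg.norm(w)                          # project back onto ||w|| = 1

    print(w, np.mean((X @ w) ** 4))           # unit-norm w and the attained 4th moment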

Kuhn-Tucker conditions: The Lagrange method deals with constrained optimization in which the constraints have equality form. The Kuhn-Tucker (also Karush-Kuhn-Tucker) theorem extends the method to inequality constraints. Consider the following problem:

minimise $f(x)$ subject to $h(x) = 0$, $g(x) \le 0$,

where $f : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R}^n \to \mathbb{R}^m$ with $m \le n$, and $g : \mathbb{R}^n \to \mathbb{R}^p$.

Active and inactive constraints: An inequality constraint $g_j \le 0$ is said to be active at $x$ if $g_j(x) = 0$, and it is called inactive at $x$ if $g_j(x) < 0$.

Feasible point and feasible set: Any point satisfying the constraints is called a feasible point. The set of all feasible points is called the feasible set.
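
A small helper, sketched here as an aside (the function name and tolerance are mine, not from the notes), that classifies a point with respect to such constraints:

    import numpy as np

    def classify_point(x, h, g, tol=1e-9):
        """Return feasibility of x and the index set of active inequality constraints.

        h(x) returns the equality-constraint values (= 0 when satisfied),
        g(x) the inequality-constraint values (<= 0 when satisfied).
        """
        hx, gx = np.atleast_1d(h(x)), np.atleast_1d(g(x))
        feasible = np.all(np.abs(hx) <= tol) and np.all(gx <= tol)
        active = [j for j, gj in enumerate(gx) if abs(gj) <= tol]
        return feasible, active

    # The inequality constraints of the example at the end of these notes:
    #   g1 = 2x + 2y - 1, g2 = -x, g3 = -y (no equality constraints)
    h = lambda z: np.array([])
    g = lambda z: np.array([2*z[0] + 2*z[1] - 1, -z[0], -z[1]])
    print(classify_point(np.array([0.0, 0.0]), h, g))   # (True, [1, 2]): x >= 0 and y >= 0 are active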

Regular point: Let $x^*$ satisfy $h(x^*) = 0$, $g(x^*) \le 0$, and let $J(x^*)$ be the index set of the active inequality constraints, $J(x^*) = \{ j : g_j(x^*) = 0 \}$. We say that $x^*$ is a regular point if the vectors $\nabla h_i(x^*)$, $\nabla g_j(x^*)$, $1 \le i \le m$, $j \in J(x^*)$, are linearly independent.

Theorem (Kuhn-Tucker): Let $f, h, g \in C^1$. Let $x^*$ be a regular point and a local minimiser for the problem of minimising $f$ subject to $h(x) = 0$, $g(x) \le 0$. Then there exist $\lambda^* \in \mathbb{R}^m$ and $\mu^* \in \mathbb{R}^p$ such that

1. $\mu^* \ge 0$
2. $\nabla f(x^*)^T + \lambda^{*T} Dh(x^*) + \mu^{*T} Dg(x^*) = 0^T$
3. $\mu^{*T} g(x^*) = 0$
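
As a numerical companion to the theorem, here is a hedged sketch of checking the three conditions at a candidate point; the helper name is mine, the gradients are supplied by the caller, and condition 2 is written in the equivalent column form. It is applied to the KKT point found in the example at the end of these notes.

    import numpy as np

    def kkt_residuals(grad_f, Dh, Dg, g, x, lam, mu):
        """Evaluate the KKT conditions at (x, lam, mu); all outputs are (near) zero when they hold.

        grad_f(x): gradient of f;  Dh(x), Dg(x): Jacobians of h and g (one row per constraint);
        g(x): inequality-constraint values.
        """
        stationarity = grad_f(x) + Dh(x).T @ lam + Dg(x).T @ mu   # condition 2 (column form)
        dual_feas = min(float(np.min(mu)), 0.0)                   # condition 1: mu >= 0
        compl_slack = float(mu @ g(x))                            # condition 3
        return stationarity, dual_feas, compl_slack

    # f(x, y) = x^2 + x + y^2,  g = (2x + 2y - 1, -x, -y),  no equality constraints.
    grad_f = lambda z: np.array([2*z[0] + 1, 2*z[1]])
    Dh = lambda z: np.zeros((0, 2))
    Dg = lambda z: np.array([[2.0, 2.0], [-1.0, 0.0], [0.0, -1.0]])
    g = lambda z: np.array([2*z[0] + 2*z[1] - 1, -z[0], -z[1]])
    x = np.array([0.0, 0.0]); lam = np.zeros(0); mu = np.array([0.0, 1.0, 0.0])
    print(kkt_residuals(grad_f, Dh, Dg, g, x, lam, mu))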

Notes:

$Dh(x^*) = (\nabla h_1(x^*), \ldots, \nabla h_m(x^*))^T$
$Dg(x^*) = (\nabla g_1(x^*), \ldots, \nabla g_p(x^*))^T$

$\lambda^*$ is the Lagrange multiplier vector and its components are the Lagrange multipliers; $\mu^*$ is the Karush-Kuhn-Tucker (KKT) multiplier vector and its components are the Karush-Kuhn-Tucker (KKT) multipliers.

Note that $\mu_j^* \ge 0$ and $g_j(x^*) \le 0$. Thus

$\mu^{*T} g(x^*) = \mu_1^* g_1(x^*) + \mu_2^* g_2(x^*) + \cdots + \mu_p^* g_p(x^*) = 0$

implies that if $g_j(x^*) < 0$ then $\mu_j^* = 0$. Therefore the KKT multipliers corresponding to inactive constraints are zero; the remaining KKT multipliers, those of the active constraints, are only required to be nonnegative.

Example: Assume that there are only 3 inequality constraints, of which $g_3$ is inactive, hence $\mu_3^* = 0$. From the KKT theorem

$\nabla f(x^*) + \mu_1^* \nabla g_1(x^*) + \mu_2^* \nabla g_2(x^*) = 0,$

or

$\nabla f(x^*) = -\mu_1^* \nabla g_1(x^*) - \mu_2^* \nabla g_2(x^*).$

Thus, $-\nabla f(x^*)$ is a linear combination of the vectors $\nabla g_1(x^*)$ and $\nabla g_2(x^*)$ with nonnegative multipliers.

Example:

minimise $f(x, y) = x^2 + x + y^2$, subject to
$2x + 2y \le 1$
$x \ge 0$
$y \ge 0$

This is restated as

minimise $f(x, y) = x^2 + x + y^2$, subject to
$2x + 2y - 1 \le 0$
$-x \le 0$
$-y \le 0$

The KKT conditions translate to

1. $\mu_1 \ge 0$, $\mu_2 \ge 0$, $\mu_3 \ge 0$
2. $2x + 1 + 2\mu_1 - \mu_2 = 0$ and $2y + 2\mu_1 - \mu_3 = 0$
3. $\mu_1(2x + 2y - 1) = 0$, $\mu_2(-x) = 0$, $\mu_3(-y) = 0$

Case 1: $x > 0$, $y > 0$. From 3, $\mu_2 = 0$, $\mu_3 = 0$ and $\mu_1(2x + 2y - 1) = 0$. From 2, $2x + 1 + 2\mu_1 = 0$ and $2y + 2\mu_1 = 0$, which means $2x - 2y + 1 = 0$. There is no valid solution: $\mu_1 = 0$ would lead to $x = -0.5$ and $y = 0$, which is invalid, and $2x + 2y - 1 = 0$ together with $2x - 2y + 1 = 0$ would lead to $x = 0$, which is also invalid.

Case 2: $x > 0$, $y = 0$. From 3, $\mu_2 = 0$ and $\mu_1(2x - 1) = 0$. From 2, $2\mu_1 - \mu_3 = 0$ and $2x + 1 + 2\mu_1 = 0$. There is no valid solution: $x = 1/2$ would lead to $\mu_1 = -1$, which is invalid, and $\mu_1 = 0$ would lead to $x = -1/2$, which is also invalid.

Case 3: $x = 0$, $y > 0$. From 3, $\mu_3 = 0$ and $\mu_1(2y - 1) = 0$. From 2, $1 + 2\mu_1 - \mu_2 = 0$ and $2y + 2\mu_1 = 0$. There is no valid solution: $y = 1/2$ would lead to $\mu_1 = -0.5$, which is invalid, and $\mu_1 = 0$ would lead to $y = 0$, which is also invalid.

Case 4: $x = 0$, $y = 0$. From 3, $\mu_1 = 0$. From 2, $\mu_3 = 0$ and $\mu_2 = 1$. All multipliers are nonnegative, so this is a valid solution. Hence the solution is $x = 0$ and $y = 0$.
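
As a cross-check of this case analysis (a sketch, not part of the original notes), the same problem can be handed to a general-purpose constrained solver:

    import numpy as np
    from scipy.optimize import minimize

    # minimise f(x, y) = x^2 + x + y^2 subject to 2x + 2y <= 1, x >= 0, y >= 0
    f = lambda z: z[0]**2 + z[0] + z[1]**2
    constraints = [{'type': 'ineq', 'fun': lambda z: 1 - 2*z[0] - 2*z[1]}]   # 'ineq' means fun(z) >= 0
    bounds = [(0, None), (0, None)]                                          # x >= 0, y >= 0

    res = minimize(f, x0=np.array([0.2, 0.2]), method='SLSQP',
                   bounds=bounds, constraints=constraints)
    print(res.x)   # approximately [0, 0], matching the KKT case analysis above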