Lagrange Multipliers (Com S 477/577 Notes)

Yan-Bin Jia

Nov 9, 2017

1 Introduction

We turn now to the study of minimization with constraints. More specifically, we will tackle the following problem:

    minimize    f(x)
    subject to  h_1(x) = 0
                ...
                h_m(x) = 0

where x ∈ Ω ⊆ R^n, and the functions f, h_1, ..., h_m are continuous and usually assumed to be in C^2 (i.e., to have continuous second partial derivatives). When f and the h_j's are linear, the problem is a linear program solvable by the simplex algorithm [3]. Hence we focus on the case where these functions are nonlinear.

To gain some intuition, let us look at the case where n = 2 and m = 1. The problem becomes

    minimize    f(x, y)
    subject to  h(x, y) = 0,        x, y ∈ R.

The constraint h(x, y) = 0 defines a curve. Differentiating the equation with respect to x gives

    ∂h/∂x + (∂h/∂y)(dy/dx) = 0.

The tangent of the curve is v(x, y) = (1, dy/dx), and the gradient of the constraint function is ∇h = (∂h/∂x, ∂h/∂y). So the equation above states that ∇h · v = 0; namely, the tangent of the curve is always normal to the gradient. Suppose we are at a point on the curve. To stay on the curve, any motion must be along the tangent v.

[Figure: the constraint curve h(x, y) = 0 with the gradient ∇h drawn normal to it.]
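The orthogonality relation ∇h · v = 0 is easy to verify symbolically. Below is a minimal sketch of mine (not part of the original notes), assuming SymPy is available; the elliptical constraint chosen here reappears in Example 1:

    import sympy as sp

    x, y = sp.symbols('x y')
    h = x**2/8 + y**2/2 - 1                  # a sample constraint curve h(x, y) = 0

    hx, hy = sp.diff(h, x), sp.diff(h, y)
    dydx = -hx / hy                          # implicit differentiation: hx + hy*(dy/dx) = 0
    v = sp.Matrix([1, dydx])                 # tangent v = (1, dy/dx)
    grad_h = sp.Matrix([hx, hy])             # gradient of the constraint

    print(sp.simplify(grad_h.dot(v)))        # prints 0: the tangent is normal to the gradient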

In order to increase or decrease f(x, y), motion along the constraint curve must have a component along the gradient of f, that is, ∇f · v ≠ 0.

[Figure: the constraint curve h(x, y) = 0 with the gradient ∇f drawn at several of its points.]

At an extremum of f, a differential motion should not yield a component of motion along ∇f. Thus v is orthogonal to ∇f; in other words, the condition ∇f · v = 0 must hold. Now v is orthogonal to both gradients ∇f and ∇h at an extremum. This means that ∇f and ∇h must be parallel. Phrased differently, there exists some λ ∈ R such that

    ∇f + λ∇h = 0.                                                            (1)

[Figure: the curve h(x, y) = 0 passing through the points (x_1, y_1), (x*, y*), and (x_2, y_2), superposed on the level curves f = c_1, c*, c_3, c_4, c_5; the gradients ∇f and ∇h are drawn at (x*, y*).]

The figure above explains condition (1) by superposing the curve h(x, y) = 0 onto the family of level curves of f(x, y), that is, the collection of curves f(x, y) = c, where c is any real number in the range of f. In the figure, c_5 > c_4 > c_3 > c* > c_1. The tangent of a level curve is always orthogonal to the gradient ∇f; otherwise, moving along the curve would result in an increase or decrease of the value of f. Imagine a point moving on the curve h(x, y) = 0 from (x_1, y_1) to (x_2, y_2). Initially, the motion has a component along the negative gradient direction -∇f, resulting in a decrease of the value of f. This component becomes smaller and smaller. When the moving point reaches (x*, y*), the motion is orthogonal to the gradient.
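A picture of this kind is easy to reproduce numerically. The sketch below (mine, assuming NumPy and matplotlib are installed) superposes a sample constraint curve on the level curves of a sample objective; both choices are hypothetical and happen to match Example 1 below:

    import numpy as np
    import matplotlib.pyplot as plt

    xs, ys = np.meshgrid(np.linspace(-4, 4, 400), np.linspace(-3, 3, 400))
    f = xs * ys                               # sample objective f(x, y) = xy
    h = xs**2/8 + ys**2/2 - 1                 # sample constraint h(x, y) = 0 (an ellipse)

    plt.contour(xs, ys, f, levels=[-2, -1, 1, 2], colors='gray')    # level curves f = c
    plt.contour(xs, ys, h, levels=[0], colors='black')              # constraint curve h = 0
    plt.gca().set_aspect('equal')
    plt.show()     # the constraint curve and a level curve are tangent where f is extremal on h = 0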

From that point on, the motion starts having a component along the gradient direction ∇f, so the value of f increases. Thus at (x*, y*), f achieves a local minimum, which is c*. The motion is in the tangential direction of the curve h(x, y) = 0, which is orthogonal to the gradient ∇h. Therefore, at the point (x*, y*) the two gradients ∇f and ∇h must be collinear. This is what equation (1) says. It is clear that the two curves f(x, y) = c* and h(x, y) = 0 are tangent at (x*, y*).

Suppose we find the set S of points satisfying the following equations:

    h(x, y) = 0,
    ∇f + λ∇h = 0    for some λ.

Then S contains the extremal points of f subject to the constraint h(x, y) = 0. The above two equations constitute a nonlinear system in the variables x, y, λ. It can be solved using numerical techniques, for example, Newton's method.

2 Lagrangian

It is convenient to introduce the Lagrangian associated with the constrained problem, defined as

    l(x, y, λ) = f(x, y) + λ h(x, y).

Note that

    ∇l = (∂f/∂x + λ ∂h/∂x, ∂f/∂y + λ ∂h/∂y, h) = (∇f + λ∇h, h).

Thus, setting ∇l = 0 yields the same system of nonlinear equations we derived earlier. The value λ is known as the Lagrange multiplier. The approach of constructing the Lagrangian and setting its gradient to zero is known as the method of Lagrange multipliers. Here we are not minimizing the Lagrangian, but merely finding its stationary point (x, y, λ).
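The identity ∇l = (∇f + λ∇h, h) can be checked symbolically. A small sketch of mine (assuming SymPy), with f and h left as unspecified functions:

    import sympy as sp

    x, y, lam = sp.symbols('x y lambda')
    f = sp.Function('f')(x, y)
    h = sp.Function('h')(x, y)

    l = f + lam * h                           # the Lagrangian l(x, y, lambda)
    grad_l = [sp.diff(l, var) for var in (x, y, lam)]

    # grad_l[0] = df/dx + lambda*dh/dx, grad_l[1] = df/dy + lambda*dh/dy, grad_l[2] = h,
    # so setting grad_l = 0 recovers both grad f + lambda*grad h = 0 and the constraint h = 0.
    for component in grad_l:
        print(component)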

Example 1. Find the extremal values of the function f(x, y) = xy subject to the constraint

    h(x, y) = x^2/8 + y^2/2 - 1 = 0.

We first construct the Lagrangian and find its gradient:

    l(x, y, λ) = xy + λ(x^2/8 + y^2/2 - 1),
    ∇l(x, y, λ) = (y + λx/4, x + λy, x^2/8 + y^2/2 - 1) = 0.

The above leads to three equations:

    y + λx/4 = 0,                                                            (2)
    x + λy = 0,                                                              (3)
    x^2 + 4y^2 = 8.                                                          (4)

Eliminating x from (2) and (3) gives y - λ^2 y/4 = 0, which, since y ≠ 0 (otherwise x = 0 would follow from (3), violating (4)), leads to λ^2 = 4 and λ = ±2. Now by (3), x = -λy, i.e., x = ∓2y for λ = ±2. Substituting into (4) gives 8y^2 = 8, hence y = ±1, and the four extremal points of f subject to the constraint h are (2, 1), (-2, -1), (2, -1), and (-2, 1). The maximum value 2 is achieved at the first two points, while the minimum value -2 is achieved at the last two points.

[Figure: the ellipse x^2 + 4y^2 = 8 superposed on the hyperbolic level curves xy = c for c = ±1 and c = ±2.]

Graphically, the constraint h defines an ellipse. The level contours of f are the hyperbolas xy = c, with |c| increasing as the curves move away from the origin.
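As a quick numerical cross-check of Example 1 (a sketch of mine, not part of the notes; it assumes SciPy is installed), the stationarity system (2)-(4) can be handed to a Newton-type solver such as scipy.optimize.fsolve:

    from scipy.optimize import fsolve

    def stationarity(v):
        x, y, lam = v
        return [y + lam * x / 4,              # (2)  dl/dx = 0
                x + lam * y,                  # (3)  dl/dy = 0
                x**2 / 8 + y**2 / 2 - 1]      # (4)  the constraint h(x, y) = 0

    # Different starting guesses converge to the four different extremal points.
    for guess in [(2, 1, -2), (-2, -1, -2), (2, -1, 2), (-2, 1, 2)]:
        x, y, lam = fsolve(stationarity, guess)
        print(f"x = {x:+.3f}, y = {y:+.3f}, lambda = {lam:+.3f}, f = xy = {x * y:+.3f}")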

3 General Formulation

Now we would like to generalize to the case with multiple constraints. Let h = (h_1, ..., h_m) be a function from R^n to R^m. Consider the constrained optimization problem below:

    minimize    f(x)
    subject to  h(x) = 0.                                                    (5)

Each function h_i defines a surface S_i = {x | h_i(x) = 0} of generally n - 1 dimensions in the space R^n. This surface is smooth provided h_i ∈ C^1. The m constraints h(x) = 0 together define a surface S, which is the intersection of the surfaces S_1, ..., S_m; namely,

    {x | h(x) = 0} = {x | h_1(x) = 0} ∩ ... ∩ {x | h_m(x) = 0}.

[Figure 1: Tangent space T at a point x* on a surface defined by (a) one constraint h(x) = 0, where T is a tangent plane, and (b) two constraints h_1(x) = 0 and h_2(x) = 0, where T is a tangent line.]

The surface S is of generally n - m dimensions. In Figure 1(a), S is defined by one constraint; in (b), it is defined by two constraints. The constrained optimization (5) is carried out over the surface S.

A curve on S is a family of points x(t) ∈ S with a ≤ t ≤ b. The curve is differentiable if x' = dx/dt exists, and twice differentiable if x'' = d^2x/dt^2 exists. The curve passes through a point x* if x* = x(t*) for some t*, a ≤ t* ≤ b.

The tangent space T at x* is the subspace of R^n spanned by the tangents x'(t*) of all curves x(t) on S such that x(t*) = x*. In other words, T is the set of the derivatives at x* of all surface curves through x*. Since a curve on S through x* is also a curve on each S_i, every tangent vector to S at x* is automatically a tangent vector to S_i. This implies that the tangent space T of S at x* is a subspace of the tangent space T_i of S_i at the same point, for 1 ≤ i ≤ m. Meanwhile, the gradient ∇h_i is orthogonal to T_i: for any curve x(t) in the surface S_i with x(t*) = x*, the function h_i(x(t)) is identically zero, so

    0 = (d/dt) h_i(x(t)) at t = t*, which equals ∇h_i(x*) · x'(t*).

Therefore ∇h_i, being orthogonal to T_i, is also orthogonal to its subspace T. In other words, ∇h_i lies in the normal space T^⊥, the orthogonal complement of T.

A point x satisfying h(x) = 0 is a regular point of the constraint if the gradient vectors ∇h_1(x), ..., ∇h_m(x) are linearly independent.

From our previous intuition, we expect that ∇f · v = 0 for all v ∈ T at an extremum. This implies that ∇f also lies in the orthogonal complement T^⊥. We would like to claim that ∇f can be written as a linear combination of the ∇h_i's. This is valid only if these gradients span T^⊥, which is true when the extremal point is regular.

Theorem 1. At a regular point x of the surface S defined by h(x) = 0, the tangent space is

    T = {y | ∇h(x) y = 0},

where ∇h(x) is the m × n matrix whose rows are the gradient vectors ∇h_j(x), j = 1, ..., m.
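Theorem 1 identifies the tangent space with the null space of ∇h(x), which is easy to compute numerically. A small sketch of mine (assuming NumPy and SciPy), for two hypothetical constraints in R^3 whose intersection is a circle, so the tangent space at a regular point is one-dimensional:

    import numpy as np
    from scipy.linalg import null_space

    # Hypothetical constraints: h1(x) = x1^2 + x2^2 + x3^2 - 2 = 0 (a sphere)
    #                           h2(x) = x3 - 1 = 0                 (a plane)
    x_star = np.array([1.0, 0.0, 1.0])        # a point on their intersection

    grad_h = np.array([[2 * x_star[0], 2 * x_star[1], 2 * x_star[2]],   # grad h1 at x_star
                       [0.0,           0.0,           1.0          ]])  # grad h2 at x_star

    T = null_space(grad_h)                    # columns span the tangent space (Theorem 1)
    print(T)                                  # a single column, proportional to (0, 1, 0)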

The theorem says that the tangent space at x is equal to the null space of ∇h(x). Thus its orthogonal complement T^⊥ must equal the row space of ∇h(x), and hence the vectors ∇h_j(x) span T^⊥.

Example 2. Suppose h(x_1, x_2) = x_1. Then h(x) = 0 defines the x_2-axis, and ∇h = (1, 0) at every point on the axis, so every feasible point is regular. The tangent space T is also the x_2-axis and has dimension 1. If instead h(x_1, x_2) = x_1^2, then h(x) = 0 still defines the x_2-axis, but on this axis ∇h = (2x_1, 0) = (0, 0). Thus no point is regular. The dimension of T, which is still the x_2-axis, remains one, but the dimension of the space {y | ∇h(x) y = 0} is two.

Lemma 2. Let x* be a local extremum of f subject to the constraints h(x) = 0. Then for all y in the tangent space of the constraint surface at x*,

    ∇f(x*) y = 0.

The next theorem states that the Lagrange multiplier condition is necessary at an extremum.

Theorem 3 (First-Order Necessary Conditions). Let x* be a local extremum point of f subject to the constraints h(x) = 0. Assume further that x* is a regular point of these constraints. Then there is a λ ∈ R^m such that

    ∇f(x*) + λ^T ∇h(x*) = 0.

The first-order necessary conditions together with the constraints h(x*) = 0 give a total of n + m equations in the n + m variables x* and λ. Thus a unique solution can be determined, at least locally.
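For instance, at the maximizer (2, 1) found in Example 1, where λ = -2, the condition of Theorem 3 can be checked directly (a sketch of mine, assuming NumPy):

    import numpy as np

    x, y, lam = 2.0, 1.0, -2.0                # a solution of Example 1

    grad_f = np.array([y, x])                 # gradient of f(x, y) = xy
    grad_h = np.array([x / 4, y])             # gradient of h(x, y) = x^2/8 + y^2/2 - 1

    print(grad_f + lam * grad_h)              # [0. 0.]: the first-order condition holds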

Example 3. We seek to construct a cardboard box of maximum volume, given a fixed area of cardboard. Denoting the dimensions of the box by x, y, z, the problem can be expressed as

    maximize    xyz
    subject to  xy + yz + xz = c/2,

where c > 0 is the given area of cardboard. Consider the Lagrangian xyz + λ(xy + yz + xz - c/2). The first-order necessary conditions are easily found to be

    yz + λ(y + z) = 0,                                                       (6)
    xz + λ(x + z) = 0,                                                       (7)
    xy + λ(x + y) = 0,                                                       (8)

together with the original constraint. Before solving the above equations, we note that their sum is

    (xy + yz + xz) + 2λ(x + y + z) = 0,

which, under the original constraint, becomes

    c/2 + 2λ(x + y + z) = 0.

Hence it is clear that λ ≠ 0. None of x, y, z can be zero, since if any one of them were zero, all of them would have to be zero by (6)-(8).

To solve equations (6)-(8), multiply (6) by x and (7) by y, and then subtract the two to obtain

    λ(x - y)z = 0.

Operate similarly on the second and the third to obtain

    λ(y - z)x = 0.

Since none of the variables can be zero, it follows that

    x = y = z = √(c/6)

is the unique solution to the necessary conditions. The box must be a cube.
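A quick numerical check of Example 3 (a sketch of mine, assuming SciPy; not part of the notes), taking the hypothetical value c = 6 so that the predicted optimum is x = y = z = 1:

    from scipy.optimize import minimize

    c = 6.0                                   # hypothetical cardboard area

    volume = lambda v: -v[0] * v[1] * v[2]    # maximize xyz by minimizing -xyz
    area = {'type': 'eq',
            'fun': lambda v: v[0] * v[1] + v[1] * v[2] + v[0] * v[2] - c / 2}

    res = minimize(volume, x0=[0.5, 1.0, 1.5], method='SLSQP',
                   bounds=[(0, None)] * 3,    # box dimensions are nonnegative
                   constraints=[area])
    print(res.x)                              # approximately [1. 1. 1.], i.e. a cube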

he first order necessary conditions become x 2 +x 3 +λ = 0, x 1 +x 3 +λ = 0, x 1 +x 2 +λ = 0. We can solve these three equations together with the one constraint equation and obtain x 1 = x 2 = x 3 = 1 and λ = 2. hus x = (1,1,1). Now we need to resort to the second-order sufficient conditions to determine if the problem achieves a local maximum or minimum at x 1 = x 2 = x 3 = 1. We find the matrix L(x ) = F(x )+λh(x ) = 0 1 1 1 0 1 +λ 0 0 0 0 0 0 1 1 0 0 0 0 = 0 1 1 1 0 1 1 1 0 is neither positive nor negative definite. On the tangent space M = {y y 1 + y 2 + y 3 = 0}, however, we note that y Ly = y 1 (y 2 +y 3 )+y 2 (y 1 +y 3 )+y 3 (y 1 +y 2 ) = (y 2 1 +y2 2 +y2 3 ) < 0, for all y 0.. hus L is negative definite on M and the solution 3 we found is at least a local maximum. 4 Inequality Constraints Finally, we address problems of the general form minimize f(x) subject to h(x) = 0 g(x) 0 where h = (h 1,...,h m ) and g = (g 1,...,g p ). A fundamental concept that provides a great deal of insight as well as simplifies the required theoretical development is that of an active constraint. An inequality constraint g i (x) 0 is said to be active at a feasible point x if g i (x) = 0 and inactive at x if g i (x) < 0. By convention we refer to any equality constraint h i (x) = 0 as active at any feasible point. he constraints active at a feasible point x restrict the domain of feasibility in neighborhoods of x. herefore, in studying the properties of a local minimum point, it is clear that attention can be restricted to the active constraints. his is illustrated in the figure below where local properties satisfied by the solution x obviously do not depend on the inactive constraints g 2 and g 3. 8

4 Inequality Constraints

Finally, we address problems of the general form

    minimize    f(x)
    subject to  h(x) = 0,
                g(x) ≤ 0,

where h = (h_1, ..., h_m) and g = (g_1, ..., g_p). A fundamental concept that provides a great deal of insight, and that also simplifies the required theoretical development, is that of an active constraint. An inequality constraint g_i(x) ≤ 0 is said to be active at a feasible point x if g_i(x) = 0, and inactive at x if g_i(x) < 0. By convention, we refer to any equality constraint h_i(x) = 0 as active at any feasible point. The constraints active at a feasible point x restrict the domain of feasibility in neighborhoods of x. Therefore, in studying the properties of a local minimum point, attention can be restricted to the active constraints. This is illustrated in the figure below, where the local properties satisfied by the solution x* obviously do not depend on the inactive constraints g_2 and g_3.

[Figure: a feasible region bounded by the curves g_1(x) = 0, g_2(x) = 0, and g_3(x) = 0; the solution x* lies on the boundary curve g_1(x) = 0.]

Assume that the functions f, h = (h_1, ..., h_m), and g = (g_1, ..., g_p) are twice continuously differentiable. Let x* be a point satisfying the constraints

    h(x*) = 0,    g(x*) ≤ 0,

and let J = {j | g_j(x*) = 0}. Then x* is said to be a regular point of the above constraints if the gradient vectors ∇h_i(x*), ∇g_j(x*), 1 ≤ i ≤ m, j ∈ J, are linearly independent.

Now suppose this regular point x* is also a relative minimum point of the problem above. Then it can be shown that there exist a vector λ ∈ R^m and a vector μ ∈ R^p with μ ≥ 0 such that

    ∇f(x*) + λ^T ∇h(x*) + μ^T ∇g(x*) = 0,
    μ^T g(x*) = 0.

Since μ ≥ 0 and g(x*) ≤ 0, the second condition above is equivalent to the statement that a component μ_j can be nonzero only if the corresponding constraint g_j is active. To find a solution, we enumerate the various combinations of active constraints, that is, constraints for which equality is attained at x*, and check the signs of the resulting Lagrange multipliers, as sketched below.

There are a number of distinct theories concerning this problem, based on various regularity conditions or constraint qualifications, which are directed toward obtaining definitive general statements of necessary and sufficient conditions. One can by no means pretend that all such results can be obtained as minor extensions of the theory for problems having equality constraints only. To date, however, their use has been limited to small-scale programming problems of two or three variables. We refer you to [3] for more on the subject.
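To make the enumeration concrete, here is a small sketch of mine (assuming SymPy; the toy problem is hypothetical and not from the notes). For every candidate set of active inequality constraints it solves the corresponding equality-constrained stationarity system, then keeps only the solutions that are feasible and have nonnegative multipliers:

    import itertools
    import sympy as sp

    # Toy problem: minimize (x1 - 1)^2 + (x2 - 2)^2
    # subject to  g1 = x1 + x2 - 2 <= 0,  g2 = -x1 <= 0,  g3 = -x2 <= 0.
    x1, x2 = sp.symbols('x1 x2')
    f = (x1 - 1)**2 + (x2 - 2)**2
    g = [x1 + x2 - 2, -x1, -x2]
    mu = sp.symbols('mu1 mu2 mu3')

    for active in itertools.chain.from_iterable(
            itertools.combinations(range(3), k) for k in range(4)):
        # Stationarity of f + sum of mu_j*g_j over the active j, plus g_j = 0 for those j.
        lagrangian = f + sum(mu[j] * g[j] for j in active)
        eqs = [sp.diff(lagrangian, var) for var in (x1, x2)] + [g[j] for j in active]
        unknowns = [x1, x2] + [mu[j] for j in active]
        for sol in sp.solve(eqs, unknowns, dict=True):
            feasible = all(gj.subs(sol) <= 0 for gj in g)
            signs_ok = all(sol[mu[j]] >= 0 for j in active)
            if feasible and signs_ok:
                print(active, sol)   # active set (0,): x1 = 1/2, x2 = 3/2, mu1 = 1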

References

[1] T. Abell. Lecture notes. The Robotics Institute, School of Computer Science, Carnegie Mellon University, 1992.

[2] M. Avriel. Nonlinear Programming: Analysis and Methods. Prentice-Hall, 1976; Dover Publications, 2003.

[3] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 2nd edition, 1984.