Lagrange Multipliers

Similar documents
Let f(x) = x, but the domain of f is the interval 0 x 1. Note

3.7 Constrained Optimization and Lagrange Multipliers

This exam will be over material covered in class from Monday 14 February through Tuesday 8 March, corresponding to sections in the text.

Practice problems for Exam 1. a b = (2) 2 + (4) 2 + ( 3) 2 = 29

Solutions to old Exam 3 problems

LB 220 Homework 4 Solutions

Partial Derivatives. w = f(x, y, z).

APPLICATIONS The eigenvalues are λ = 5, 5. An orthonormal basis of eigenvectors consists of

3 Applications of partial differentiation

MATH2070 Optimisation

25. Chain Rule. Now, f is a function of t only. Expand by multiplication:

0, otherwise. Find each of the following limits, or explain that the limit does not exist.

g(t) = f(x 1 (t),..., x n (t)).

1 Lagrange Multiplier Method

Transpose & Dot Product

Graphs of Antiderivatives, Substitution Integrals

The Relation between the Integral and the Derivative Graphs. Unit #10 : Graphs of Antiderivatives, Substitution Integrals

IE 5531: Engineering Optimization I

Course 2BA1: Hilary Term 2007 Section 8: Quaternions and Rotations

Math 118, Fall 2014 Final Exam

Transpose & Dot Product

MATH 19520/51 Class 5

REVIEW OF DIFFERENTIAL CALCULUS

COMP 175 COMPUTER GRAPHICS. Lecture 04: Transform 1. COMP 175: Computer Graphics February 9, Erik Anderson 04 Transform 1

100 CHAPTER 4. SYSTEMS AND ADAPTIVE STEP SIZE METHODS APPENDIX

Dr. Allen Back. Sep. 10, 2014

Lecture 13 - Wednesday April 29th

Chapter 7. Extremal Problems. 7.1 Extrema and Local Extrema

Classical Mechanics. Luis Anchordoqui

Computation. For QDA we need to calculate: Lets first consider the case that

Sec. 1.1: Basics of Vectors

Math 203A - Solution Set 1

MATH Midterm 1 Sample 4

Tensor Visualization. CSC 7443: Scientific Information Visualization

Lagrange Multipliers

MATH 353 LECTURE NOTES: WEEK 1 FIRST ORDER ODES

Exercises for Multivariable Differential Calculus XM521

Calculus for the Life Sciences II Assignment 6 solutions. f(x, y) = 3π 3 cos 2x + 2 sin 3y

APPLICATIONS OF DIFFERENTIATION

DEPARTMENT OF MATHEMATICS AND STATISTICS UNIVERSITY OF MASSACHUSETTS. MATH 233 SOME SOLUTIONS TO EXAM 2 Fall 2018

Physics 6010, Fall 2016 Constraints and Lagrange Multipliers. Relevant Sections in Text:

Ph.D. Katarína Bellová Page 1 Mathematics 2 (10-PHY-BIPMA2) EXAM - Solutions, 20 July 2017, 10:00 12:00 All answers to be justified.

Math 290-2: Linear Algebra & Multivariable Calculus Northwestern University, Lecture Notes

Linear Algebra. Chapter 8: Eigenvalues: Further Applications and Computations Section 8.2. Applications to Geometry Proofs of Theorems.

Higher Mathematics Course Notes

VANDERBILT UNIVERSITY. MATH 2300 MULTIVARIABLE CALCULUS Practice Test 1 Solutions

Section 14.1 Vector Functions and Space Curves

September Math Course: First Order Derivative

Section 3.5: Implicit Differentiation

The Derivative. Appendix B. B.1 The Derivative of f. Mappings from IR to IR

Tangent spaces, normals and extrema

7a3 2. (c) πa 3 (d) πa 3 (e) πa3

TEST CODE: MIII (Objective type) 2010 SYLLABUS

AB-267 DYNAMICS & CONTROL OF FLEXIBLE AIRCRAFT

Calculus 2502A - Advanced Calculus I Fall : Local minima and maxima

Math 265H: Calculus III Practice Midterm II: Fall 2014

16.1 L.P. Duality Applied to the Minimax Theorem

Rotational Motion. Chapter 4. P. J. Grandinetti. Sep. 1, Chem P. J. Grandinetti (Chem. 4300) Rotational Motion Sep.

MATH20132 Calculus of Several Variables. 2018

Math Advanced Calculus II

ENGI Partial Differentiation Page y f x

ALGEBRAIC GEOMETRY HOMEWORK 3

Higher order derivative

1 Vectors and the Scalar Product

MATH The Chain Rule Fall 2016 A vector function of a vector variable is a function F: R n R m. In practice, if x 1, x n is the input,

Group, Rings, and Fields Rahul Pandharipande. I. Sets Let S be a set. The Cartesian product S S is the set of ordered pairs of elements of S,

Course Notes Math 275 Boise State University. Shari Ultman

Lecture Notes for MATH6106. March 25, 2010

Reading group: Calculus of Variations and Optimal Control Theory by Daniel Liberzon

Multivariable Calculus and Matrix Algebra-Summer 2017

Math 164-1: Optimization Instructor: Alpár R. Mészáros

How to Use Calculus Like a Physicist

Rotational & Rigid-Body Mechanics. Lectures 3+4

Directional Derivatives in the Plane

HOMEWORK 7 SOLUTIONS

MTH4101 CALCULUS II REVISION NOTES. 1. COMPLEX NUMBERS (Thomas Appendix 7 + lecture notes) ax 2 + bx + c = 0. x = b ± b 2 4ac 2a. i = 1.

3. Minimization with constraints Problem III. Minimize f(x) in R n given that x satisfies the equality constraints. g j (x) = c j, j = 1,...

Unit #10 : Graphs of Antiderivatives; Substitution Integrals

2015 Math Camp Calculus Exam Solution

Math 212-Lecture 8. The chain rule with one independent variable

General Physics I. Lecture 10: Rolling Motion and Angular Momentum.

Chapter 3 Numerical Methods

Multiple Integrals and Vector Calculus (Oxford Physics) Synopsis and Problem Sets; Hilary 2015

ORDINARY DIFFERENTIAL EQUATIONS

Multiple Integrals and Vector Calculus: Synopsis

8-1: Backpropagation Prof. J.C. Kao, UCLA. Backpropagation. Chain rule for the derivatives Backpropagation graphs Examples

Course MA2C02, Hilary Term 2010 Section 4: Vectors and Quaternions

Unit #10 : Graphs of Antiderivatives, Substitution Integrals

Math Review for Exam 3

Mechanical Systems II. Method of Lagrange Multipliers

Implicit Functions, Curves and Surfaces

Multivariable Calculus Notes. Faraad Armwood. Fall: Chapter 1: Vectors, Dot Product, Cross Product, Planes, Cylindrical & Spherical Coordinates

MATH20411 PDEs and Vector Calculus B

Vector Calculus. Lecture Notes

12.1 Taylor Polynomials In Two-Dimensions: Nonlinear Approximations of F(x,y) Polynomials in Two-Variables. Taylor Polynomials and Approximations.

Math 302 Outcome Statements Winter 2013

Exam 2 Review Solutions

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.)

Print Your Name: Your Section:

Unit Speed Curves. Recall that a curve Α is said to be a unit speed curve if

Transcription:

Optimization with Constraints As long as algebra and geometry have been separated, their progress have been slow and their uses limited; but when these two sciences have been united, they have lent each mutual forces, and have marched together towards perfection. Motivation Many problems in bioinformatics require us to find the maximum or minimum of a differentiable function f. For example, in this course we want to find the optimal rotation that causes one protein to be superimposed over another in 3D space. We will also need to do a type of optimization when we look at classification problems. 1

Introduction (1) he points in the domain of f where the minimum or maximum occurs are called the critical points (also the extreme points). You have already seen examples of critical points in elementary calculus: Given f(x) with f differentiable, find values of x such that f(x) is a local minimum (or maximum). Usual approach: We assume these x values are such that f (x) = 0. hat is the derivative is zero at this critical point in the x domain. 3 Introduction () Our optimization problems will be more complicated for two reasons: 1. he function f depends on several variables, that is, f(x 1, x,, x n ).. In many applications, we will have a second equation g(x 1, x,, x n ) = 0 that also must be satisfied. n We still want a point in that is an extreme point for f but it must also lie on the line or surface defined by g. We will consider three optimization problems corresponding to n = 1,, 3 with no constraint and with a constraint (6 examples altogether). 4

hree Optimization Problems (1) Problem 1 (n = 1): Find x that will minimize Since f x x 4. it is clear that the minimum is at x = and the f value is 4. Note also that f ' x x 4 and setting this to zero also gives us x =. f x x 4x 8. f = x - 4x + 8 40 35 30 5 0 15 10 5 0-4 - 0 4 6 8 x 5 hree Optimization Problems () Problem (n = ): Minimize f x x x 4 8. 1 Because the squared terms cannot be negative, the minimum occurs when they are both zero, i.e. when x 1 = 0, x = 0 and at this critical point the f value is 8. Note also that f f x1 8x x and x 1 Setting both of these to zero also gives us x 1 = 0, x = 0. f x x 1 6 3

hree Optimization Problems (3) Problem 3 (n = 3): Minimize f x x 4x 16x 8x 4. 1 3 We can rewrite this as Because the squared terms cannot be negative, the minimum will occur when they are all zero, that is, when x 1 = 0, x = 1, x 3 = 0 and at this critical point the f value is 0. Note also that f x f x, 8x 8 1 x 1 f x x 4 x 1 16x 0 and 1 3 Setting all of these to zero also gives us x 1 = 0, x = 1, x 3 = 0. Unfortunately, a graph of f versus x 1, x, x 3 would be in a four dimensional space and impossible to visualize. f x 3 3x 3 7 Optimization Via Derivatives For these simple cases (no constraint function) the general strategy is to compute the derivatives with respect to all the independent variables: f x i1,,, n Setting these to zero: xi equations in n unknowns. i f 0 i 1,,, n gives n Solve the equations to get the critical points. 8 4

Level Sets (1) Level sets provide a visual aid for some of the geometric arguments that we will need to make. Given some particular real value, for example, r, a level set is a set of points in the domain of f such that f is equal to this value r. Formally: x, x,, x f x, x,, x r. 1 n 1 For Problem 1, the level set consists of two points when r > 4 and a single point when r = 4 (empty otherwise). For Problem, the level set is an ellipse when r > 8. For Problem 3, the level set is an ellipsoid when r > 0. n 9 Level Sets () he level sets for our three optimization Problems: f = x - 4x + 8 40 35 30 5 0 15? 4D space 10 5 0-4 - 0 4 6 8 x x f x x x r = 0 4 8 0 x x, x, f x, x x 4x 8 1 1 1 1 r = 1 x, x, x 1 3 f x, x, x 1 3 x 4x 16x 8x 0 36 1 3 r = 36 10 5

Gradient Given a scalar function f mapping a vector x x, x,, x to a real value, that is f 1 n x we define the gradient of f or grad(f) as: f f f f,,,. x x x 1 n Note that it is a column vector. But its entries are functions of x x, x,, x. 1 n We really have a vector field : one vector defined for each point in the space. SO: to get a critical point we find those x x, x,, x such that 1 n n f 0. Alternate notation for n = 3: f f f f i j k. x x x 1 3 11 Gradients and Level Sets (1) A gradient essentially tells us how level sets change as r increases. x x x x Consider a gradient vector at where x is a level set point. df f x 4 dx df x 8 dx x,,, n 1 Problem 1 Problem Problem 3 df x 6 8 dx x1 f 8x hey point in the direction of increasing r. x1 f 8x 8 3x 3 1 6

Gradients and Level Sets () In some cases a diagram will show a set of gradient vectors taken at regular intervals from the background field along with a set of level curves: Problem with an array of gradient vectors and 4 level curves: 13 Gradients and Level Sets (3) HEOREM: he gradient of f is normal to the level set of f at that point. Proof sketch: Consider a point p in the level set and a gradient vector that is defined at p. he level set going through p is x f x f p. Assume C is any differentiable curve that: 1. Is parameterized by t. Lies within the level set of f 3. Passes through p when t = t p C t c t, c t,, c t. 1 n 14 1 n f c t, c t,, c t f p. p 1 p p n p C t c t, c t,, c t p. 7

Gradients and Level Sets (4) HEOREM: he gradient of f is normal to the level set of f at that point. Proof sketch continued: he second assumption gives us d d f c t, c 1 t,, c t f p 0. n dt dt But using the chain rule on the LHS we can write: n f c t, c t,, c t 1 n dc t i 0. x dt i1 i dc At t = t p this can be written as: f dt Gradient vector at p. So, the gradient at p is normal to any tangent at p in the level set implying the gradient is normal to the level set. Since f(p) is a constant. 0. tt p angent to C at p. 15 Problems with Constraints (1) We now consider the same problems but with constraints: Recall: We still want to minimize or maximize f but now the point must also satisfy the equation g(x 1, x,, x n ) = 0. Let us go through the various problems and consider the effects of a constraint: Problem 1: 30 5 Minimize f x x 4x 8. 0 subject to g(x) = 0. 15 10 Find x values such that g(x) = 0 5 A discrete set; then find which of these x not very values produces the minimal f. x interesting. g(x) = 0 f = x - 4x + 8 40 35 0-4 - 0 4 6 8 16 8

Problems with Constraints () Problem with a constraint: f x x 4x 8 1 g x x x 4 0. 1 Minimize subject to: In this case we can solve for x 1 in g(x) to get x 1 = 4 x. f hen f becomes: 8x 16x 4 8x 1. So x = 1 and x 1 = giving a value for f that is 16. x x 1 (4,0) (,1) (0,) 1 g x x x 4 0 17 Problems with Constraints () In the last problem we were lucky because g(x 1,x ) = 0 could be solved for x 1 or x so that a substitution could be made into f(x 1,x ) reducing it to a minimization problem in one variable. In general, we cannot rely on this strategy. he constraint g(x 1,x ) = 0 could be so complicated that it is impossible to solve for one of its variables. Note that the previous visualization in f vs x space cannot be carried over to Problem 3. We need a new approach. his will require level curves, gradients, and eventually the Lagrange multiplier strategy. 18 9

A Different Approach (1) Consider viewing the f vs x scenario from above, along the f axis: We would see a series of level sets: he constraint g x x x 4 0 1 appears as the green line. here are 3 cases to analyze when considering the relationship between a level set and the constraint. Case 1: For low values of r the level set does not meet the constraint. So, even though the f values are very small we must reject these points because they do not satisfy the constraint. 19 A Different Approach () Case : As r increases the level set will eventually just meet the constraint in a tangential fashion. his is the red ellipse. Case 3: As r continues to increase the level set will cut the constraint in two places as illustrated by the blue ellipse. hese two points satisfy the constraint but we can do better! here are points between them on the constraint that correspond to intersections of the constraint with level sets of f having lower r values. So, Case is the situation that we want to discover. 0 10

A Different Approach (3) Case is characterized by having the level curve of f(x 1,x ) tangent to the constraint curve g(x 1,x ) = 0. In other words: he normal to both f and g at the point of contact are in the same direction. 1 A Different Approach (4) Notice that this visualization will also work for our Problem 3: In this case the level sets are ellipsoids that keep expanding until they meet the constraint surface. As before we want the normal to the constraint surface to be parallel to the normal of the level surface. 11

he Lagrange Formulation (1) Parallel normals at the point p can be expressed as: f p g p 0. g p he Lagrange formulation states: o find the extreme points of a function f(x) = f(x 1, x,, x n ) that are also subject to the constraint that they lie on g(x) = g(x 1, x,, x n ) = 0 we form the Lagrangian and find points x such that L( x, ) f x g x Lx (, ) 0. hat is: he gradient of f at p is a constant multiple of the gradient of g at p. Such a point p will be one of the required extreme points. 3 he Lagrange Formulation () he first set of equations is actually stating: f x g x Recall: L( x, ) f x g x. Note the significance of Lx (, ) 0. o compute the gradient of L we are taking partial derivatives with respect to all the x i and also. his gives us: L f g 0 i 1,,, n x x x i i i L 0 g x 0. Now solve n + 1 equations in n + 1 unknowns.. 4 1

Importance of the Lagrange Formulation Recall the requirement: f x g x g x 0 when x is an extreme point p. he previous slide shows that this is equivalent to with L( x, ) f x g x. Lx (, ) 0 Why is this important? Finding critical points of Lx (, ) 0 is a problem that does not mention the constraint as a separate issue. In that sense our problem is simpler. he tradeoff is that we are now dealing with a higher dimensional space (n + 1 instead of n dimensions). 5 Constrained Problem à la Lagrange Recall Problem with a constraint: Minimize f x x 4x 8 1 subject to: g x x x 4 0. 1 he Lagrangian is: Lx, f x g x x 4x 8 1 x x 4 1. Setting Lx (, ) 0 gives: L x1 0 x1 x1 x L 8x 0 x x x 1 1 L x x 4 1 0. the same solution we saw earlier. 6 13

Extra Constraints (1) For some applications we will have several constraints. o motivate this, we go back over our three problems this time imposing constraints, call them g 1 (x) = 0 and g (x) = 0. Problem 1: Both constraints g 1 (x) = 0 and g (x) = 0 would have to have at least one point on the real axis that simultaneously satisfied these conditions (very unlikely!). We would then consider the value of f at this point or these points, if they exist. his variation of problem 1 is essentially over-constrained and the analysis will usually come up with the result that there is no solution. 7 Extra Constraints () Problem : he enforcement of both constraints g 1 (x 1, x ) = 0 and g (x 1, x ) = 0 leads to a set of discrete points in the (x 1, x ) plane where the constraints are satisfied. We would then consider the value of f at this point or these points. While not over-constrained, this variation of problem is not very interesting because of the discrete nature of the investigation (as in Problem 1 with one constraint). Problem 3: he enforcement of the two constraints g 1 (x 1, x, x 3 ) = 0 and g (x 1, x, x 3 ) = 0 leads to a curve containing points in the (x 1, x, x 3 ) plane where the constraints are satisfied. Now that our feasible set contains an infinite number of points we are back to an interesting situation. he curve is a D continuum that is the intersection of g 1 = 0 and g = 0. 8 14

Extra Constraints (3) Problem 3 Recap: he enforcement of the two constraints g 1 (x 1, x, x 3 ) = 0 and g (x 1, x, x 3 ) = 0 leads to a curve containing points in the (x 1, x, x 3 ) plane where the constraints are satisfied. Intuitively, we would want to increase the level set parameter r until the level surface contacts the intersection curve in a tangential fashion. But how do we express this requirement in terms of the gradient of f? Asking for points p such that is parallel to both and is a show stopper because g 1 and could be different from each other for all p (consider two intersecting planes). f 1 g g g Going back to the single constraint case, here is another way to express the requirement that gradient of f is parallel to the gradient of g: We ask that is in the subspace defined by his immediately generalizes for our immediate needs: We ask that lies in the subspace defined by linear combinations of and f g.. g 1 g 9 f Extra Constraints (4) Problem 3 Recap: he enforcement of the two constraints g 1 (x 1, x, x 3 ) = 0 and g (x 1, x, x 3 ) = 0 leads to a curve containing points in the (x 1, x, x 3 ) plane where the constraints are satisfied. We ask that of and f. g 1 lies in the subspace defined by linear combinations g 1 g f g Note that the line of intersection is tangent to the ellipsoid but the planes are not necessarily tangent. In fact, at least one plane intersects the ellipsoid. Points satisfying both constraints. g (x 1, x, x 3 ) = 0 g 1 (x 1, x, x 3 ) = 0 Note that three constraints: g 1 =0, g =0, g 3 =0 for Problem 3 leads to the discrete case again. 30 15

Extra Constraints: Lagrange Formulation Our previous requirement is formalized as: k i 1 As before x is really (x 1, x,, x n ). o avoid the over-constrained or discrete points case we assume that the number of constraints k < n. his gives the Lagrangian: k Lx, f x g x i i i1 Setting Lx (, ) 0 leads to n + k equations in n + k unknowns. f x g x g x 0 i 1,, k. i i i 31 Extra Constraints: Lagrange Formulation Summary: o find extreme points of f(x) subject to the constraints g i (x) = 0 for i=1,,,k with k < n: (the notation x represents (x 1, x,, x n ) ) We form the Lagrangian: where,,,. 1 k k i1, L x f x g x i i hen the extreme points are calculated by solving the n + k equations in n + k unknowns derived from Lx (, ) 0. 3 16

Inertial Axes (1) An application of Lagrange multipliers: Very often in a structural analysis, we want to approximate a secondary structural element with a single straight line. Here we have a straight line that acts as the longitudinal axis of a helix. 33 Inertial Axes () he diagram gives us an indication of a reasonable strategy for the determination of the axis: We should have the straight line positioned among the atoms so that it is closest to all these atoms in a least squares sense. Our objective is to find this best helix axis. A reasonable strategy would be to have the axis pass through the centroid of all the atoms with a direction chosen to minimize the sum of the squares of the perpendicular distances from the atoms to the helix axis. 34 17

Inertial Axes (3) Mathematical derivation of the axes: If represents the perpendicular distance between atom a (i) and the helix axis, then we chose the axis direction so that it minimizes the sum: d i S N d i i1 35 Inertial Axes (4) It should be stated that such an axis could also be computed to go among the atoms of a beta strand, in fact, any arbitrary set of atoms. 36 18

Inertial Axes (5) We now provide an analysis that shows how the axis is dependent on the coordinates of the chosen atoms. o make the analysis simpler, we first consider one atom, call it a, with position vector a a, a, a. x y z As noted earlier, this vector is using a frame of reference that is an x, y, z coordinate system with origin positioned at the centroid of the atoms under consideration (see previous figure). 37 Inertial Axes (6) Analysis continued: he perpendicular distance from the atom to the axis is illustrated in the figure. We are trying to determine the values of w x, w y, and w z that define the axis direction. hese are the components of a unit vector w and so w w, w, w with. ww1 x y z he required axis is a scalar multiple of w and d represents the perpendicular vector going from atom a to this axis. 38 19

Inertial Axes (7) Analysis continued: Some trigonometry defines the square of the norm of d as: So: d a sin a 1 cos a 1 a his can be written as: aw w w his = 1. d a 1 a w a w a w his expression for d is quadratic in w. 39 Inertial Axes (8) Analysis continued: Now we replace a and w with their coordinate representations: where: x y z x y z x x y y z z d a a a w w w a w a w a w Aw Bw Cw x y z Fw w Gw w Hw w y z z x x y A a a F a a y z y z B a a G a a z x z x C a a H a a x y x y. 40 0

Inertial Axes (9) Analysis continued: Finally, an elegant representation for d: wx A H G wx d w H B F w. y y w G F C w z z Now recall that we want to minimize S which is the sum of all such squared norms going across all N atoms in the set. 41 Inertial Axes (10) Analysis continued: We let the coordinates of the i th atom a (i) be i i i i represented by a a, a, a, then x y z N wx D E E xx xy xz wx S d w E D E w i y yx yy yz y i1 w E E D w where: z zx zy zz z N N i i i i xx y z yz zy y z D a a E E a a i1 i1 N N i i i i yy z x xz zx z x D a a E E a a i1 i1 N N i i i i zz x y xy yx x y D a a E E a a i1 i1. 4 1

Inertial Axes (11) Analysis continued: Finally, we have the quantity that we want to minimize expressed as a function of w and the coordinates of the atoms: where: S w w D E E E D E E E D zx zy zz xx xy xz yx yy yz Physicists call this the inertial tensor. 43 Inertial Axes (1) Analysis continued: We can now formulate our minimization problem as a Lagrange multiplier problem in which we minimize S subject to the constraint that ww1. he Lagrangian will be:, 1 L w w w w w where is the Lagrange multiplier. Setting the usual derivatives to zero: L w w 0 w 44

Inertial Axes (13) Analysis finished: We now recognize this as an eigenvector problem: w w he 3 by 3 tensor matrix is symmetric and since S is always positive for any vector w we see that is also positive definite. Consequently, all three eigenvalues are positive. Note that once we have an eigenvalue, eigenvector pair we can write: S w w w w. 45 Inertial Axes (14) So, we have the solution to our problem. Once is calculated, we compute its eigenvalues and the required w (specifying the directions for our axis) will be the eigenvector corresponding to the smallest eigenvalue. Recall that we wanted to minimize S. 46 3

Inertial Axes (15) Significance of three eigenvalues: It should also be noted that there are three eigenvectors produced by this procedure and because of the symmetry of, they are mutually orthogonal and can be used as an orthogonal basis for the set of atoms under consideration. In some applications, this inertial frame of reference is a useful construct and the eigenvectors are called the principle axes of inertia. 47 4