Jim Lambers
MAT 419/519
Summer Session 2011-12
Lecture 2 Notes

These notes correspond to Section 1.2 in the text.

Functions of Several Variables

We now generalize the results from the previous section, pertaining to optimization of functions of one variable, to functions of several variables. However, we first need some notation and definitions.

Definition An n-vector in $\mathbb{R}^n$ is an ordered $n$-tuple $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ of real numbers $x_i$, called the components of $\mathbf{x}$.

Vectors belong to vector spaces, which support two essential operations. We define addition of two vectors $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$ by
$$\mathbf{x} + \mathbf{y} = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n),$$
and multiplication of $\mathbf{x}$ by a real number $\lambda$ by
$$\lambda\mathbf{x} = (\lambda x_1, \lambda x_2, \ldots, \lambda x_n).$$

Multiplication of numbers needs to be generalized to a sort of multiplication operation involving two vectors.

Definition If $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ are vectors in $\mathbb{R}^n$, their dot product or inner product $\mathbf{x} \cdot \mathbf{y}$ is defined by
$$\mathbf{x} \cdot \mathbf{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \sum_{k=1}^n x_k y_k.$$
Two vectors $\mathbf{x}$ and $\mathbf{y}$ are orthogonal if $\mathbf{x} \cdot \mathbf{y} = 0$.

We also need to generalize the notion of absolute value, or a number's magnitude, to the magnitude of a vector.

Definition The norm or length $\|\mathbf{x}\|$ of a vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$ is defined by
$$\|\mathbf{x}\| = \left(x_1^2 + x_2^2 + \cdots + x_n^2\right)^{1/2} = (\mathbf{x} \cdot \mathbf{x})^{1/2}.$$

The norm is a real-valued function on $\mathbb{R}^n$ with the following properties:
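The definitions of the dot product and norm translate directly into code. The following sketch (Python is not part of the text; it is used here only for illustration) computes both from their defining formulas:

```python
import math

# Dot product: x . y = x_1 y_1 + x_2 y_2 + ... + x_n y_n.
def dot(x, y):
    return sum(xk * yk for xk, yk in zip(x, y))

# Norm: ||x|| = (x . x)^(1/2).
def norm(x):
    return math.sqrt(dot(x, x))

x = (1.0, 2.0, 2.0)
y = (2.0, -1.0, 0.0)
print(dot(x, y))   # 0.0 -- x and y are orthogonal
print(norm(x))     # 3.0
```

The vectors $x$ and $y$ here are chosen so that the products $1\cdot 2$, $2\cdot(-1)$, and $2\cdot 0$ cancel, illustrating orthogonality.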
1. $\|\mathbf{x}\| \ge 0$ for all vectors $\mathbf{x} \in \mathbb{R}^n$.
2. $\|\mathbf{x}\| = 0$ if and only if $\mathbf{x} = \mathbf{0}$.
3. $\|\alpha\mathbf{x}\| = |\alpha|\,\|\mathbf{x}\|$ for all vectors $\mathbf{x} \in \mathbb{R}^n$ and all real numbers $\alpha$.
4. $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$ (the Triangle Inequality).
5. $|\mathbf{x} \cdot \mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|$ (the Cauchy-Schwarz Inequality).

Using the norm, the dot product can also be defined as
$$\mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\|\,\|\mathbf{y}\| \cos\theta,$$
where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$.

Just as the distance between two numbers $x$ and $y$ is given by $|x - y|$, the distance between two points in $n$-dimensional space can be defined similarly.

Definition If $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, the distance $d(\mathbf{x}, \mathbf{y})$ between $\mathbf{x}$ and $\mathbf{y}$ is defined by
$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \left( \sum_{i=1}^n (x_i - y_i)^2 \right)^{1/2}.$$
The ball $B(\mathbf{x}, r)$ centered at $\mathbf{x}$ of radius $r$ is the set of all vectors $\mathbf{y} \in \mathbb{R}^n$ such that $d(\mathbf{x}, \mathbf{y}) < r$. A point $\mathbf{x}$ in a set $D \subseteq \mathbb{R}^n$ is an interior point of $D$ if there exists an $r > 0$ such that $B(\mathbf{x}, r) \subseteq D$. The interior $D^0$ is the set of all interior points of $D$. A set $G \subseteq \mathbb{R}^n$ is open if $G^0 = G$. A set $F \subseteq \mathbb{R}^n$ is closed if its complement $F^c$ in $\mathbb{R}^n$ is open.

Now, we are prepared to define minimizers and maximizers of functions of $n$ variables.

Definition Suppose that $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$. A point $\mathbf{x}^* \in D$ is a

1. global minimizer of $f$ on $D$ if $f(\mathbf{x}^*) \le f(\mathbf{x})$ for all $\mathbf{x} \in D$;
2. strict global minimizer of $f$ on $D$ if $f(\mathbf{x}^*) < f(\mathbf{x})$ for all $\mathbf{x} \in D$, $\mathbf{x} \ne \mathbf{x}^*$;
3. local minimizer of $f$ if there exists a $\delta > 0$ such that $f(\mathbf{x}^*) \le f(\mathbf{x})$ whenever $\mathbf{x} \in B(\mathbf{x}^*, \delta)$;
4. strict local minimizer of $f$ if there exists a $\delta > 0$ such that $f(\mathbf{x}^*) < f(\mathbf{x})$ whenever $\mathbf{x} \in B(\mathbf{x}^*, \delta)$ and $\mathbf{x} \ne \mathbf{x}^*$;
5. critical point of $f$ if the first partial derivatives of $f$ exist at $\mathbf{x}^*$ and
$$\frac{\partial f}{\partial x_i}(\mathbf{x}^*) = 0, \quad i = 1, 2, \ldots, n.$$
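The distance function and the open ball $B(\mathbf{x}, r)$ are easy to sketch in code. The following illustration (the function names are mine, not the text's) makes the strict inequality in the definition of the ball concrete:

```python
import math

# Euclidean distance: d(x, y) = ||x - y||.
def dist(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Membership in the open ball B(center, r): d(y, center) < r.
def in_ball(y, center, r):
    return dist(y, center) < r

c = (0.0, 0.0)
print(dist((3.0, 4.0), c))          # 5.0
print(in_ball((0.5, 0.5), c, 1.0))  # True: strictly inside the unit ball
print(in_ball((1.0, 0.0), c, 1.0))  # False: boundary points are excluded
```

The last call returns False because the ball is defined by a strict inequality, which is exactly what makes it an open set.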
Using this definition of a critical point, we can now characterize the location of maximizers and minimizers, as in Fermat's Theorem in the single-variable case.

Theorem Suppose that $f$ is a real-valued function for which all first partial derivatives of $f$ exist on a subset $D$ of $\mathbb{R}^n$. If $\mathbf{x}^*$ is an interior point of $D$ that is a local minimizer of $f$, then $\mathbf{x}^*$ is a critical point of $f$.

This theorem can be proved by reduction to the single-variable case, in which all variables except one are fixed.

We also need to generalize Taylor's Formula to the multi-variable case. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a function whose first and second partial derivatives are continuous on an open set containing the line segment
$$[\mathbf{x}^*, \mathbf{x}] = \{\mathbf{w} \in \mathbb{R}^n \mid \mathbf{w} = \mathbf{x}^* + t(\mathbf{x} - \mathbf{x}^*),\ 0 \le t \le 1\}$$
joining $\mathbf{x}^*$ and $\mathbf{x}$. By defining the function
$$\varphi(t) = f(\mathbf{x}^* + t(\mathbf{x} - \mathbf{x}^*))$$
and applying Taylor's Formula in conjunction with the multi-variable Chain Rule, we obtain the following result.

Theorem Suppose that $\mathbf{x}^*, \mathbf{x} \in \mathbb{R}^n$ and that $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ has continuous first and second partial derivatives on some open set containing the line segment $[\mathbf{x}^*, \mathbf{x}]$. Then there exists a $\mathbf{z} \in [\mathbf{x}^*, \mathbf{x}]$ such that
$$f(\mathbf{x}) = f(\mathbf{x}^*) + \nabla f(\mathbf{x}^*) \cdot (\mathbf{x} - \mathbf{x}^*) + \frac{1}{2} (\mathbf{x} - \mathbf{x}^*) \cdot Hf(\mathbf{z})(\mathbf{x} - \mathbf{x}^*),$$
where
$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)$$
is the gradient of $f$, and
$$Hf = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
is the Hessian of $f$.

Now we can characterize local or global maximizers or minimizers based on the second partial derivatives, in the same way as in the single-variable case.
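The gradient and Hessian just defined can be approximated numerically by central differences. The sketch below is not from the text; the step sizes h and the difference formulas are conventional choices, and the test function $f(x, y) = x^2 + 3y^2$ is my own example, with gradient $(2x, 6y)$ and constant Hessian $\mathrm{diag}(2, 6)$:

```python
# Central-difference approximation to the gradient of f : R^n -> R.
def gradient(f, x, h=1e-5):
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# Central-difference approximation to the Hessian of f : R^n -> R.
def hessian(f, x, h=1e-4):
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# f(x, y) = x^2 + 3y^2: gradient (2x, 6y), Hessian diag(2, 6).
f = lambda v: v[0] ** 2 + 3 * v[1] ** 2
print(gradient(f, [1.0, 1.0]))  # approximately [2.0, 6.0]
print(hessian(f, [1.0, 1.0]))   # approximately [[2, 0], [0, 6]]
```

Note that the computed Hessian is (approximately) symmetric, consistent with the equality of mixed partial derivatives for functions with continuous second partials.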
1. $\mathbf{x}^*$ is a global minimizer of $f$ if $(\mathbf{x} - \mathbf{x}^*) \cdot Hf(\mathbf{z})(\mathbf{x} - \mathbf{x}^*) \ge 0$ for all $\mathbf{x} \in \mathbb{R}^n$ and all $\mathbf{z} \in [\mathbf{x}^*, \mathbf{x}]$;
2. $\mathbf{x}^*$ is a strict global minimizer of $f$ if $(\mathbf{x} - \mathbf{x}^*) \cdot Hf(\mathbf{z})(\mathbf{x} - \mathbf{x}^*) > 0$ for all $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x} \ne \mathbf{x}^*$, and all $\mathbf{z} \in [\mathbf{x}^*, \mathbf{x}]$;
3. $\mathbf{x}^*$ is a global maximizer of $f$ if $(\mathbf{x} - \mathbf{x}^*) \cdot Hf(\mathbf{z})(\mathbf{x} - \mathbf{x}^*) \le 0$ for all $\mathbf{x} \in \mathbb{R}^n$ and all $\mathbf{z} \in [\mathbf{x}^*, \mathbf{x}]$;
4. $\mathbf{x}^*$ is a strict global maximizer of $f$ if $(\mathbf{x} - \mathbf{x}^*) \cdot Hf(\mathbf{z})(\mathbf{x} - \mathbf{x}^*) < 0$ for all $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x} \ne \mathbf{x}^*$, and all $\mathbf{z} \in [\mathbf{x}^*, \mathbf{x}]$.

This theorem can be proved using the multi-variable generalization of Taylor's Theorem, in conjunction with the continuity of the second partial derivatives. Unfortunately, the sign of $(\mathbf{x} - \mathbf{x}^*) \cdot Hf(\mathbf{z})(\mathbf{x} - \mathbf{x}^*)$ is not so easily determined as that of its single-variable counterpart $f''(z)(x - x^*)^2$. To that end, we turn to concepts from linear algebra.

Definition Let $A$ be an $n \times n$ symmetric matrix. The quadratic form associated with $A$ is the function $Q_A : \mathbb{R}^n \to \mathbb{R}$ defined by
$$Q_A(\mathbf{y}) = \mathbf{y} \cdot A\mathbf{y} = \sum_{i,j=1}^n a_{ij} y_i y_j, \quad \mathbf{y} \in \mathbb{R}^n.$$

Example Let $f(x, y, z) = x^2 - y^2 + 4z^2 - 2xy + 4yz$. Then we have
$$\nabla f(x, y, z) = (2x - 2y,\ -2y - 2x + 4z,\ 8z + 4y)$$
and
$$Hf(x, y, z) = \begin{bmatrix} 2 & -2 & 0 \\ -2 & -2 & 4 \\ 0 & 4 & 8 \end{bmatrix}.$$
It follows that
$$Q_{Hf}(x, y, z) = (x, y, z) \cdot Hf(x, y, z)(x, y, z) = (x, y, z) \cdot (2x - 2y,\ -2x - 2y + 4z,\ 4y + 8z) = 2x^2 - 2y^2 + 8z^2 - 4xy + 8yz = 2f(x, y, z).$$

The following terms will enable us to more easily describe the conditions for local or global minimizers or maximizers.
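The computation in the example can be checked directly. The following sketch evaluates the quadratic form $Q_A(\mathbf{y}) = \sum_{i,j} a_{ij} y_i y_j$ for the Hessian of the example and confirms that it equals $2f$:

```python
# Hessian of f(x, y, z) = x^2 - y^2 + 4z^2 - 2xy + 4yz, from the example.
H = [[ 2, -2, 0],
     [-2, -2, 4],
     [ 0,  4, 8]]

# Quadratic form: Q_A(y) = y . (A y) = sum_{i,j} a_ij y_i y_j.
def quadratic_form(A, y):
    n = len(y)
    return sum(A[i][j] * y[i] * y[j] for i in range(n) for j in range(n))

def f(x, y, z):
    return x**2 - y**2 + 4*z**2 - 2*x*y + 4*y*z

# Q_Hf = 2f, as derived in the example.
v = (1.0, 2.0, 3.0)
print(quadratic_form(H, v), 2 * f(*v))  # both give the same value
```

Sampling also shows that $Q_{Hf}(1, 0, 0) = 2 > 0$ while $Q_{Hf}(0, 1, 0) = -2 < 0$, so this particular Hessian takes both signs (it is indefinite).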
Definition Suppose that $A$ is an $n \times n$ symmetric matrix and that $Q_A(\mathbf{y}) = \mathbf{y} \cdot A\mathbf{y}$ is the quadratic form associated with $A$. Then $A$ and $Q_A$ are called:

1. positive semidefinite if $Q_A(\mathbf{y}) \ge 0$ for all $\mathbf{y} \in \mathbb{R}^n$;
2. positive definite if $Q_A(\mathbf{y}) > 0$ for all $\mathbf{y} \in \mathbb{R}^n$, $\mathbf{y} \ne \mathbf{0}$;
3. negative semidefinite if $Q_A(\mathbf{y}) \le 0$ for all $\mathbf{y} \in \mathbb{R}^n$;
4. negative definite if $Q_A(\mathbf{y}) < 0$ for all $\mathbf{y} \in \mathbb{R}^n$, $\mathbf{y} \ne \mathbf{0}$;
5. indefinite if $Q_A(\mathbf{y}) > 0$ for some $\mathbf{y} \in \mathbb{R}^n$ and $Q_A(\mathbf{y}) < 0$ for other $\mathbf{y} \in \mathbb{R}^n$.

With these terms, the preceding theorem can be restated more concisely as follows:

Theorem Suppose that $\mathbf{x}^*$ is a critical point of a function $f$ with continuous first and second partial derivatives on $\mathbb{R}^n$, and that $Hf$ is the Hessian of $f$. Then $\mathbf{x}^*$ is a

1. global minimizer of $f$ if $Hf$ is positive semidefinite on $\mathbb{R}^n$;
2. strict global minimizer of $f$ if $Hf$ is positive definite on $\mathbb{R}^n$;
3. global maximizer of $f$ if $Hf$ is negative semidefinite on $\mathbb{R}^n$;
4. strict global maximizer of $f$ if $Hf$ is negative definite on $\mathbb{R}^n$.

It remains to determine when a given matrix is positive (or negative) definite (or semidefinite). This will be taken up in subsequent lectures.

Exercises

1. Chapter 1, Exercise 3
2. Chapter 1, Exercise 4
3. Chapter 1, Exercise 5