Boston College
Math Review Session (2nd part)
Lecture Notes
August 2007

Nadezhda Karamcheva
karamche@bc.edu
www2.bc.edu/~karamche

Contents

1 Quadratic Forms and Definite Matrices
  1.1 Quadratic Forms
  1.2 Definiteness of Quadratic Forms
  1.3 Test for Definiteness
2 Unconstrained Optimization
  2.1 First order conditions
  2.2 Second Order Sufficient Conditions
  2.3 Second Order Necessary Conditions
  2.4 Global Max and Min (Sufficient Conditions)
  2.5 Economic Applications
3 Constrained Optimization
  3.1 Equality Constraints
  3.2 Meaning of the Multiplier
  3.3 Second Order Sufficient Conditions
4 Calculus of Several Variables
  4.1 Weierstrass's and Mean Value Theorems
  4.2 Taylor Polynomials on R^1
  4.3 Taylor Polynomials in R^n

1 Quadratic Forms and Definite Matrices

1.1 Quadratic Forms

There are several reasons for studying quadratic forms. The natural starting point for the study of optimization problems is the simplest such problem: the optimization of a quadratic form. Furthermore, the second order conditions that distinguish a max from a min in economic optimization problems are stated in terms of quadratic forms. Finally, a number of economic optimization problems have a quadratic objective function, for example, risk minimization problems in finance, where riskiness is measured by the (quadratic) variance of the returns from investments, or least squares minimization problems in econometrics (OLS).

Definition 1 (Quadratic Form) A quadratic form on $\mathbb{R}^n$ is a real-valued function $Q: \mathbb{R}^n \to \mathbb{R}$, $x \mapsto Q(x)$, of the form
$$Q(x_1, \ldots, x_n) = \sum_{i \le j} a_{ij} x_i x_j,$$
in which each term is a monomial of degree two.

There is a close relationship between symmetric matrices and quadratic forms. For every quadratic form $Q(x)$ there is a symmetric matrix $A$ such that $Q(x) = x^T A x$, and every symmetric matrix $A$ yields a quadratic form $Q(x) = x^T A x$.

Example: The general two dimensional quadratic form $a_{11}x_1^2 + a_{12}x_1x_2 + a_{22}x_2^2$ can be written as
$$\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_{11} & \frac{1}{2}a_{12} \\ \frac{1}{2}a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

1.2 Definiteness of Quadratic Forms

A quadratic form always takes on the value zero at the point $x = 0$. Its distinguishing characteristic is the set of values it takes when $x \neq 0$. We focus here on the question of whether $x = 0$ is a max, a min, or neither of the quadratic form under consideration.
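To make the matrix representation concrete, here is a minimal numerical sketch (the coefficients are made up for illustration): it builds the symmetric matrix of a two dimensional quadratic form and checks that $x^T A x$ reproduces the polynomial.

```python
import numpy as np

# General 2-D quadratic form a11*x1^2 + a12*x1*x2 + a22*x2^2
a11, a12, a22 = 2.0, 3.0, 1.0            # illustrative coefficients
A = np.array([[a11, a12 / 2],
              [a12 / 2, a22]])           # the symmetric matrix representation

x = np.array([1.5, -0.7])                # an arbitrary test point
poly = a11 * x[0]**2 + a12 * x[0] * x[1] + a22 * x[1]**2
quad = x @ A @ x                         # x^T A x

print(poly, quad)                        # both print 1.84
```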

Example: Consider the one variable quadratic form $y = ax^2$. If $a > 0$, then $ax^2 \ge 0$ always, and it equals 0 only when $x = 0$. Such a quadratic form is called positive definite, and thus $x = 0$ is its global minimizer. If $a < 0$, then $ax^2 \le 0$ always, and it equals 0 only when $x = 0$. Such a quadratic form is called negative definite, and thus $x = 0$ is its global maximizer.

Now let's move to two or more dimensions. A symmetric matrix is called positive definite, positive semidefinite, negative definite, etc. according to the definiteness of the corresponding quadratic form $Q(x) = x^T A x$.

Definition 2 (Definiteness) Let $A$ be an $n \times n$ symmetric matrix. Then $A$ is
a) positive definite if $x^T A x > 0$ for all $x \neq 0$ in $\mathbb{R}^n$;
b) positive semidefinite if $x^T A x \ge 0$ for all $x \neq 0$ in $\mathbb{R}^n$;
c) negative definite if $x^T A x < 0$ for all $x \neq 0$ in $\mathbb{R}^n$;
d) negative semidefinite if $x^T A x \le 0$ for all $x \neq 0$ in $\mathbb{R}^n$;
e) indefinite if $x^T A x > 0$ for some $x \neq 0$ in $\mathbb{R}^n$ and $x^T A x < 0$ for some other $x \neq 0$ in $\mathbb{R}^n$.

Remark 1 (Definiteness of Quadratic Forms) A matrix that is positive (negative) definite is automatically positive (negative) semidefinite, but not the other way around. Otherwise, every symmetric matrix falls into one of the above five categories.

Example: Consider these quadratic forms:

$Q_1(x_1, x_2) = x_1^2 + x_2^2$: always $> 0$ at $(x_1, x_2) \neq (0, 0)$ — positive definite.
$Q_2(x_1, x_2) = -x_1^2 - x_2^2$: always $< 0$ at $(x_1, x_2) \neq (0, 0)$ — negative definite.
$Q_3(x_1, x_2) = x_1^2 - x_2^2$: takes both negative and positive values, e.g. $Q_3(1, 0) = 1$, $Q_3(0, 1) = -1$ — indefinite.
$Q_4(x_1, x_2) = (x_1 + x_2)^2$: always $\ge 0$, but $Q_4(x) = 0$ at some $x \neq 0$ — positive semidefinite.
$Q_5(x_1, x_2) = -(x_1 + x_2)^2$: always $\le 0$, but $Q_5(x) = 0$ at some $x \neq 0$ — negative semidefinite.
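A quick numerical way to classify such forms — a sketch, using the standard fact that a symmetric matrix is positive definite iff all its eigenvalues are positive, and similarly for the other categories — is to inspect the eigenvalues of the representing matrix:

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(A)            # eigenvalues of a symmetric matrix
    if np.all(w > tol):   return "positive definite"
    if np.all(w < -tol):  return "negative definite"
    if np.all(w > -tol):  return "positive semidefinite"
    if np.all(w < tol):   return "negative semidefinite"
    return "indefinite"

# Matrices of Q1 ... Q5 from the example above
Q = {"Q1": [[1, 0], [0, 1]],
     "Q2": [[-1, 0], [0, -1]],
     "Q3": [[1, 0], [0, -1]],
     "Q4": [[1, 1], [1, 1]],             # (x1 + x2)^2
     "Q5": [[-1, -1], [-1, -1]]}         # -(x1 + x2)^2

for name, A in Q.items():
    print(name, classify(np.array(A, dtype=float)))
```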

Application 1: The definiteness of a symmetric matrix plays an important role in economic theory. For example, for a function $y = f(x)$ of one variable, the sign of the second derivative $f''(x_0)$ at a critical point $x_0$ provides a second order condition for determining whether $x_0$ is a maximum of $f$, a minimum of $f$, or neither. The generalization of this second derivative test to higher dimensions involves checking whether the second derivative matrix (the Hessian) of $f$ is positive definite, negative definite, or indefinite at a critical point of $f$.

Application 2: In a similar vein, a function $y = f(x)$ of one variable is concave if its second derivative $f''(x)$ is $\le 0$ on some interval. The generalization of this result to higher dimensions states that a function is concave on some region if its second derivative matrix is negative semidefinite for all $x$ in the region.

Now we will go over a simple test for the definiteness of a quadratic form or of a symmetric matrix. First, consider the definitions of a principal minor and a leading principal minor.

1.3 Test for Definiteness

Definition 3 (Principal Minor) Let $A$ be an $n \times n$ matrix. A $k \times k$ submatrix of $A$ formed by deleting $n - k$ columns, say columns $i_1, i_2, \ldots, i_{n-k}$, and the corresponding $n - k$ rows from $A$ is called a $k$th order principal submatrix of $A$. The determinant of a $k \times k$ principal submatrix is called a $k$th order principal minor of $A$.

Example: Let $A$ be a $3 \times 3$ matrix. Then there are
1. one third order principal minor: $\det(A)$;
2. three second order principal minors:
$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$, formed by deleting column 3 and row 3;
$\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}$, formed by deleting column 2 and row 2;
$\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}$, formed by deleting column 1 and row 1;
3. three first order principal minors: $a_{11}$, formed by deleting the last 2 rows and columns; $a_{22}$, formed by deleting the first and third columns and the corresponding rows; $a_{33}$, formed by deleting the first 2 rows and columns.

Among the $k$th order principal minors of a given matrix, there is one special one that we want to highlight.

Definition 4 (Leading Principal Minor) Let $A$ be an $n \times n$ matrix. The $k$th order principal submatrix of $A$ obtained by deleting the last $n - k$ rows and the last $n - k$ columns from $A$ is called the $k$th order leading principal submatrix of $A$, denoted by $A_k$. Its determinant is called the $k$th order leading principal minor of $A$, denoted by $|A_k|$.

An $n \times n$ matrix has $n$ leading principal submatrices: the top-leftmost $1 \times 1$ submatrix, the top-leftmost $2 \times 2$ submatrix, etc.

Example: Let $A$ be a $3 \times 3$ matrix. Then the leading principal minors are $a_{11}$, $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$, and $\det(A)$.

The theorem below provides an algorithm which uses the leading principal minors to determine the definiteness of a given matrix.

Theorem 1 (Definiteness of Symmetric Matrices) Let $A$ be an $n \times n$ symmetric matrix. Then:
a) $A$ is positive definite if and only if all its $n$ leading principal minors are (strictly) positive.
b) $A$ is negative definite if and only if its $n$ leading principal minors alternate in sign as follows: $|A_1| < 0$, $|A_2| > 0$, $|A_3| < 0$, etc.; the $k$th order leading principal minor should have the same sign as $(-1)^k$.
c) If some $k$th order leading principal minor of $A$ (or some pair of them) is nonzero but does not fit either of the above two sign patterns, then $A$ is indefinite. This case occurs when $A$ has a negative $k$th order leading principal minor for an even integer $k$, or when $A$ has a negative $k$th order leading principal minor and a positive $l$th order leading principal minor for two distinct odd integers $k$ and $l$.

Theorem 2 (Semidefiniteness of Symmetric Matrices) Let $A$ be an $n \times n$ symmetric matrix. Then $A$ is positive semidefinite if and only if every principal minor of $A$ is $\ge 0$; $A$ is negative semidefinite if and only if every principal minor of odd order is $\le 0$ and every principal minor of even order is $\ge 0$.
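The following sketch implements the leading principal minor test of Theorem 1 directly (definite cases only; by Theorem 2, the semidefinite cases would require checking all principal minors, not just the leading ones):

```python
import numpy as np

def leading_minors(A):
    """Return the n leading principal minors |A_1|, ..., |A_n|."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

def definiteness(A):
    """Classify a symmetric matrix using Theorem 1 (definite cases only)."""
    m = leading_minors(A)
    if all(d > 0 for d in m):
        return "positive definite"
    if all((-1) ** (k + 1) * d > 0 for k, d in enumerate(m)):
        return "negative definite"     # signs -, +, -, ...: sign of (-1)^k
    return "not definite (possibly semidefinite or indefinite)"

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # illustrative matrix
print(leading_minors(A))                    # [2.0, 3.0]
print(definiteness(A))                      # positive definite
```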

2 Unconstrained Optimization

2.1 First order conditions

Definition 5 (r-ball) Let $X^*$ be a vector in $\mathbb{R}^n$ and let $r$ be a positive number. The r-ball about $X^*$ is the set $B_r(X^*)$ defined as $B_r(X^*) \equiv \{X \in \mathbb{R}^n : \|X - X^*\| < r\}$.

Definition 6 (Open Set) A set $U$ in $\mathbb{R}^n$ is open if for each $X \in U$ there exists an open ball $B_r(X)$ around $X$ completely contained in $U$.

Definition 7 (Max) Let $F: U \to \mathbb{R}^1$ be a real-valued function of $n$ variables whose domain $U$ is a subset of $\mathbb{R}^n$. Then:
a) A point $X^* \in U$ is a max of $F$ on $U$ if $F(X^*) \ge F(X)$ for all $X \in U$.
b) A point $X^* \in U$ is a strict max of $F$ on $U$ if $F(X^*) > F(X)$ for all $X \neq X^*$ in $U$.
c) A point $X^* \in U$ is a local max of $F$ on $U$ if there is an r-ball $B_r(X^*)$ about $X^*$ such that $F(X^*) \ge F(X)$ for all $X \in B_r(X^*) \cap U$.
d) A point $X^* \in U$ is a strict local max of $F$ on $U$ if there is an r-ball $B_r(X^*)$ about $X^*$ such that $F(X^*) > F(X)$ for all $X \neq X^*$ in $B_r(X^*) \cap U$.

Definition 8 (Interior Point) $X$ is an interior point of a set $U \subset \mathbb{R}^n$ if there is an r-ball $B_r(X)$ about $X$ such that $B_r(X) \subset U$.

Definition 9 (Critical Point) The $n$-vector $X^*$ is a critical point of a $C^1$ function $F: U \to \mathbb{R}^1$ with domain $U \subset \mathbb{R}^n$ if $X^*$ satisfies $\frac{\partial F}{\partial x_i}(X^*) = 0$ for all $i = 1, \ldots, n$.

Remark 2 (F.O.C. for Max and Min with 1 Variable) If $x_0$ is an interior max or min of $f: U \to \mathbb{R}^1$, where $U \subset \mathbb{R}^1$, then $x_0$ is a critical point of $f$.

Note that a function is neither increasing nor decreasing on an interval about an interior max or min; that is, $f'(x_0) = 0$ or undefined.

Theorem 3 (F.O.C. for Max or Min with n Variables) Let $F: U \to \mathbb{R}^1$ be a $C^1$ function defined on $U \subset \mathbb{R}^n$. If (i) $X^*$ is a local max or min of $F$ in $U$ and (ii) $X^*$ is an interior point of $U$, then $\frac{\partial F}{\partial x_i}(X^*) = 0$ for $i = 1, \ldots, n$.

Proof. Let $B = B_r(X^*)$ be a ball about $X^*$ in $U$ such that $F(X^*) \ge F(X)$ for all $X \in B$. Since $X^*$ maximizes $F$ on $B$, along each line segment through $X^*$ that lies in $B$ and is parallel to one of the axes, $F$ takes on its maximum value at $X^*$. In other words, $x_i^*$ maximizes the function of one variable $x_i \mapsto F(x_1^*, \ldots, x_{i-1}^*, x_i, x_{i+1}^*, \ldots, x_n^*)$ for $x_i \in (x_i^* - r, x_i^* + r)$. Apply the one-variable maximization criterion (Remark 2) to conclude that $\frac{\partial F}{\partial x_1}(X^*) = \cdots = \frac{\partial F}{\partial x_n}(X^*) = 0$. The argument for the min case is similar. Q.E.D.

Suggested Exercise 1 Find the local maxes and mins of $F(x, y) = x^3 - y^3 + 9xy$.

2.2 Second Order Sufficient Conditions

Definition 10 (Gradient, Jacobian and Hessian)
i) The gradient vector of a $C^1$ function $F: U \to \mathbb{R}^1$ with domain $U \subset \mathbb{R}^n$, evaluated at $X^*$, is the array $\nabla F(X^*)$ of first derivatives of $F$ of the form
$$\nabla F(X^*) = \begin{bmatrix} \frac{\partial F}{\partial x_1}(X^*) \\ \vdots \\ \frac{\partial F}{\partial x_n}(X^*) \end{bmatrix}.$$
Note that $\nabla F(X^*)$ is an $n \times 1$ vector.
ii) The Jacobian matrix of an array of functions $F = [F_1, F_2, \ldots, F_m]^T$, where each function is $C^1$, evaluated at $X^*$, is the array of the form

$$DF(X^*) = \begin{bmatrix} \frac{\partial F_1}{\partial x_1}(X^*) & \cdots & \frac{\partial F_1}{\partial x_n}(X^*) \\ \vdots & & \vdots \\ \frac{\partial F_m}{\partial x_1}(X^*) & \cdots & \frac{\partial F_m}{\partial x_n}(X^*) \end{bmatrix}.$$
Note that for one single function $F: U \to \mathbb{R}^1$ with $U \subset \mathbb{R}^n$, the Jacobian vector is $DF(X^*) = \left[ \frac{\partial F}{\partial x_1}(X^*) \; \cdots \; \frac{\partial F}{\partial x_n}(X^*) \right]$.
iii) The Hessian matrix of a function $F \in C^2$ evaluated at $X^*$ is the array $D^2F(X^*)$ of second derivatives of $F$ of the form
$$D^2F(X^*) = \begin{bmatrix} \frac{\partial^2 F}{\partial x_1 \partial x_1}(X^*) & \cdots & \frac{\partial^2 F}{\partial x_1 \partial x_n}(X^*) \\ \vdots & & \vdots \\ \frac{\partial^2 F}{\partial x_n \partial x_1}(X^*) & \cdots & \frac{\partial^2 F}{\partial x_n \partial x_n}(X^*) \end{bmatrix} = \begin{bmatrix} F_{11} & \cdots & F_{1n} \\ \vdots & & \vdots \\ F_{n1} & \cdots & F_{nn} \end{bmatrix}.$$
Note that the Hessian matrix has size $n \times n$.

Remark 3 (Young's Theorem) Suppose that $F: U \to \mathbb{R}^1$, where $U \subset \mathbb{R}^n$, is a $C^2$ function on an open set $J \subset U$. Then for all $X \in J$ and for each pair of indices $i, j$: $\frac{\partial^2 F}{\partial x_i \partial x_j}(X) = \frac{\partial^2 F}{\partial x_j \partial x_i}(X)$.

Proposition 1 (Symmetry of the Hessian) The Hessian matrix of a $C^2$ function $F: U \to \mathbb{R}^1$ with $U \subset \mathbb{R}^n$ is symmetric.

Proof. Follows directly from Remark 3. Q.E.D.
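As a small sketch of these objects, sympy can compute the gradient and the Hessian symbolically and confirm the symmetry asserted by Young's theorem (the function used is the one from Exercise 1):

```python
import sympy as sp

x, y = sp.symbols('x y')
F = x**3 - y**3 + 9*x*y           # the function from Exercise 1

grad = [sp.diff(F, v) for v in (x, y)]    # gradient (first derivatives)
H = sp.hessian(F, (x, y))                 # Hessian (second derivatives)

print(grad)                # [3*x**2 + 9*y, -3*y**2 + 9*x]
print(H)                   # Matrix([[6*x, 9], [9, -6*y]])
print(H == H.T)            # True: the Hessian is symmetric
```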

Theorem 4 (S.O.S.C. for Max and Min) Let $F: U \to \mathbb{R}^1$ be a $C^2$ function with open domain $U \subset \mathbb{R}^n$. Suppose $X^*$ is a critical point (see Def. 9) of $F$.
a) If the Hessian $D^2F(X^*)$ is a negative definite matrix, then $X^*$ is a strict local maximum of $F$.
b) If the Hessian $D^2F(X^*)$ is a positive definite matrix, then $X^*$ is a strict local minimum of $F$.
c) If the Hessian $D^2F(X^*)$ is indefinite, then $X^*$ is neither a local maximum nor a local minimum of $F$. In such a case $X^*$ is called a saddle point.

Later, when we develop the theory of Taylor polynomials, we will show how to approximate a $C^2$ function by its Taylor polynomial of order two about $X^*$.

Proof (Sketch; the details on the Taylor expansion are in the last section). For part (a), let $X^*$ be a critical point of $F$, let $h \in \mathbb{R}^n$, and use a Taylor expansion of $F$ around $X^*$ to get
$$F(X^* + h) = F(X^*) + DF(X^*)h + \tfrac{1}{2}h' D^2F(X^*)h + R(h),$$
where $DF(X^*)$ is the Jacobian of $F$, $D^2F(X^*)$ is the Hessian, and $R(h)$ is a remainder term such that $R(h) \to 0$ as $h \to 0$. Ignore the remainder term and recall that $DF(X^*) = 0$ (since $X^*$ is a critical point) to get
$$F(X^* + h) - F(X^*) \approx \tfrac{1}{2}h' D^2F(X^*)h.$$
For small $h \neq 0$, since $D^2F(X^*)$ is negative definite (by hypothesis), $\tfrac{1}{2}h' D^2F(X^*)h < 0$; then $F(X^* + h) - F(X^*) < 0$, or $F(X^* + h) < F(X^*)$, hence $X^*$ is a strict local maximum. A similar argument works for parts (b) and (c). Q.E.D.

Remark 4 (Theorem 4 Restated) Let $F: U \to \mathbb{R}^1$ be a $C^2$ function with open domain $U \subset \mathbb{R}^n$. Suppose $X^*$ is a critical point, so that $\frac{\partial F}{\partial x_i}(X^*) = 0$ for all $i = 1, \ldots, n$. Then:
a) If the $n$ leading principal minors of the Hessian alternate in sign at $X^*$, that is,
$$F_{x_1x_1} < 0, \quad \begin{vmatrix} F_{x_1x_1} & F_{x_1x_2} \\ F_{x_2x_1} & F_{x_2x_2} \end{vmatrix} > 0, \quad \begin{vmatrix} F_{x_1x_1} & F_{x_1x_2} & F_{x_1x_3} \\ F_{x_2x_1} & F_{x_2x_2} & F_{x_2x_3} \\ F_{x_3x_1} & F_{x_3x_2} & F_{x_3x_3} \end{vmatrix} < 0, \; \ldots,$$
then $X^*$ is a strict local max of $F$.
b) If the $n$ leading principal minors of the Hessian are all positive at $X^*$, i.e.
$$F_{x_1x_1} > 0, \quad \begin{vmatrix} F_{x_1x_1} & F_{x_1x_2} \\ F_{x_2x_1} & F_{x_2x_2} \end{vmatrix} > 0, \quad \begin{vmatrix} F_{x_1x_1} & F_{x_1x_2} & F_{x_1x_3} \\ F_{x_2x_1} & F_{x_2x_2} & F_{x_2x_3} \\ F_{x_3x_1} & F_{x_3x_2} & F_{x_3x_3} \end{vmatrix} > 0, \; \ldots,$$
then $X^*$ is a strict local min of $F$.
c) If some non-zero leading principal minors of the Hessian violate both sign patterns in (a) and (b), then $X^*$ is a saddle point of $F$.
Note that the remark restates Theorem 4 with equivalent definitions of positive definiteness and negative definiteness.
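Here is a sketch of this procedure for the function of Exercise 1 (revisited in Exercise 2 below), $F(x, y) = x^3 - y^3 + 9xy$: find the critical points symbolically, then classify each one by the leading principal minors of the Hessian.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
F = x**3 - y**3 + 9*x*y

grad = [sp.diff(F, v) for v in (x, y)]
H = sp.hessian(F, (x, y))

for pt in sp.solve(grad, [x, y], dict=True):     # critical points
    Hp = H.subs(pt)
    d1, d2 = Hp[0, 0], Hp.det()                  # leading principal minors
    if d1 > 0 and d2 > 0:
        kind = "strict local min"
    elif d1 < 0 and d2 > 0:
        kind = "strict local max"
    elif d2 < 0:
        kind = "saddle point"
    else:
        kind = "test inconclusive"
    print(pt, kind)
# Output (order may vary):
# {x: 0, y: 0} saddle point
# {x: 3, y: -3} strict local min
```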

Suggested Exercise 2 Find the critical points of $F(x, y) = x^3 - y^3 + 9xy$. Find the Hessian matrix and apply Theorem 4 or the remark following it to find the max or min.

2.3 Second Order Necessary Conditions

Note that while the statements in the section above (sufficient conditions) have the form $A \Rightarrow B$, now we state $B \Rightarrow A$. Second order necessary conditions for a max or min of a function are weaker than the sufficient conditions: a local max requires us to replace negative definite by negative semidefinite, and a local min requires us to replace positive definite by positive semidefinite. That is why we do not use the qualifier strict.

Theorem 5 (S.O.N.C. for Max and Min) Let $F: U \to \mathbb{R}^1$ be a $C^2$ function with open domain $U \subset \mathbb{R}^n$. Suppose $X^*$ is an interior point of $U$ and suppose that $X^*$ is a local min of $F$ (respectively local max). Then $DF(X^*) = 0$ and $D^2F(X^*)$ is positive semidefinite (respectively negative semidefinite).

The proof is given in the last section.

Remark 5 (Theorem 5 Restated) Let $F: U \to \mathbb{R}^1$ be a $C^2$ function with open domain $U \subset \mathbb{R}^n$, and suppose $X^*$ is an interior point of $U$.
a) If $X^*$ is a local max of $F$, then $DF(X^*) = 0$, all the principal minors of the Hessian $D^2F(X^*)$ of odd order are $\le 0$, and all the principal minors of even order are $\ge 0$.
b) If $X^*$ is a local min of $F$, then $DF(X^*) = 0$ and all the principal minors of the Hessian $D^2F(X^*)$ are $\ge 0$.
Note that this remark restates Theorem 5 with alternative definitions of positive and negative semidefiniteness for the Hessian matrix.

Suggested Exercise 3 Find the local max or min of the following two functions: $f_1 = x^4$ and $f_2 = -x^4$.

Suggested Exercise 4 Find the local max or min of the following function: $F(x, y) = x^4 + x^2 - 6xy + 3y^2$.

Suggested Exercise 5 For the function $F(x, y, z) = x^2 + 6xy + y^2 - 3yz + 4z^2 - 10x - 5y - 21z$ defined on $\mathbb{R}^3$, find the critical points and classify them as local max, local min, saddle point, or can't tell.

2.4 Global Max and Min (Sufficient Conditions)

Definition 11 (Convex Set) A set $U \subset \mathbb{R}^n$ is convex if whenever $X$ and $Y$ are points of $U$, the line segment $l(X, Y) \equiv \{tX + (1-t)Y : 0 \le t \le 1\}$ joining $X$ to $Y$ is also in $U$.

Definition 12 (Concave Function) A real-valued function $g: U \to \mathbb{R}^1$ with convex domain $U \subset \mathbb{R}^n$ is a concave function if for all $X, Y$ in $U$ and for all $t \in [0, 1]$, $g(tX + (1-t)Y) \ge tg(X) + (1-t)g(Y)$.

Definition 13 (Convex Function) A real-valued function $g: U \to \mathbb{R}^1$ with convex domain $U \subset \mathbb{R}^n$ is a convex function if $-g(\cdot)$ is concave.

Remark 6 Do not confuse the notion of a convex function with that of a convex set. A set $U$ is a convex set if whenever $x$ and $y$ are points in $U$, the line segment joining $x$ to $y$ is also in $U$. The definition of a concave or convex function requires that whenever $f$ is defined at $x$ and $y$, it is defined on the line segment joining them. Thus concave and convex functions are required to have convex domains.

Remark 7 Many introductory calculus texts call convex functions "concave up" and concave functions "concave down". We will stick with the more classical terms: convex and concave.

Definition 14 (Restriction) The restriction of $F: U \to \mathbb{R}^1$, with $U$ convex and $U \subset \mathbb{R}^n$, to the line segment connecting $X, Y$ in $U$ is $g_{xy}(t) = F(tX + (1-t)Y)$ for $t \in [0, 1]$.
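As a numerical sketch of Definition 12, the check below samples random pairs of points and weights and tests the defining inequality for a candidate function (here $g(x_1, x_2) = -(x_1^2 + x_2^2)$, chosen for illustration; a sampled check can refute concavity but can never prove it):

```python
import numpy as np

def g(v):
    return -(v[0]**2 + v[1]**2)       # a concave function on R^2

rng = np.random.default_rng(0)
ok = True
for _ in range(10_000):
    X, Y = rng.normal(size=2), rng.normal(size=2)
    t = rng.uniform()
    # Definition 12: g(tX + (1-t)Y) >= t g(X) + (1-t) g(Y)
    if g(t * X + (1 - t) * Y) < t * g(X) + (1 - t) * g(Y) - 1e-12:
        ok = False
        break
print("no violation found" if ok else "not concave")
```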

We usually develop a geometric intuition for concave and convex functions of one variable in Calculus I. We can recognize a concave function by its graph, because the inequality in the definition of a concave function has the following geometric interpretation: a function $f$ is concave if and only if any secant line connecting two points on the graph of $f$ lies below the graph. A function is convex if and only if any secant line connecting two points on its graph lies above the graph.

In developing an intuition for concave functions of several variables, it is useful to notice that a function of $n$ variables on a convex set $U$ is concave if and only if its restriction to any line segment in $U$ is a concave function of one variable. This should be intuitively clear, since the definition of a concave function is a statement about its behavior on line segments.

Lemma 1 Let $F: U \to \mathbb{R}^1$ be a function with convex domain $U \subset \mathbb{R}^n$. Then $F$ is concave if and only if its restriction to every line segment in $U$ is a concave function.

Proof. Let $X, Y$ be arbitrary points in $U$ and let $g_{xy}(t) \equiv F(tX + (1-t)Y)$ for $t \in [0, 1]$, i.e. $g_{xy}$ is the restriction of $F$ to the line segment joining $X$ and $Y$. (We proceed in two steps.)
i) ($\Leftarrow$) Suppose every $g_{xy}$ is concave; we need to prove that $F$ is concave. Thus:
$$F(tX + (1-t)Y) = g_{xy}(t) = g_{xy}(t \cdot 1 + (1-t) \cdot 0) \ge tg_{xy}(1) + (1-t)g_{xy}(0) = tF(X) + (1-t)F(Y),$$
where the inequality holds since $g_{xy}$ is concave and the first and last equalities hold by the definition of $g_{xy}$.
ii) ($\Rightarrow$) Suppose $F$ is concave; we want to prove that $g_{xy}(t)$ is concave. Fix $s_1$ and $s_2$ in $[0, 1]$:
$$g_{xy}(ts_1 + (1-t)s_2) = F\{(ts_1 + (1-t)s_2)X + (1 - ts_1 - (1-t)s_2)Y\}$$
$$= F\{t[s_1X + (1-s_1)Y] + (1-t)[s_2X + (1-s_2)Y]\}$$
$$\ge tF(s_1X + (1-s_1)Y) + (1-t)F(s_2X + (1-s_2)Y) = tg_{xy}(s_1) + (1-t)g_{xy}(s_2).$$
The first line follows by definition of $g_{xy}$, the second by rearranging terms, the third by concavity of $F$, and the last by definition of $g_{xy}$. Q.E.D.

Lemma 2 Let $g: I \to \mathbb{R}^1$ be a $C^1$ function on an interval $I \subset \mathbb{R}^1$. Then:
a) $g$ is concave iff $g(Y) - g(X) \le Dg(X)(Y - X)$ for all $X, Y \in I$.
b) $g$ is convex iff $g(Y) - g(X) \ge Dg(X)(Y - X)$ for all $X, Y \in I$.

Proof. For part (a):
i) ($\Rightarrow$) Suppose $g$ is concave on $I$, let $X, Y \in I$ and let $t \in (0, 1]$. Then $tg(Y) + (1-t)g(X) \le g(tY + (1-t)X)$ by concavity of $g$, and it follows that
$$g(Y) - g(X) \le \frac{g(tY + (1-t)X) - g(X)}{t(Y - X)}(Y - X).$$
Let $\Delta \equiv t(Y - X)$, and note that $\Delta \to 0$ as $t \to 0$. Rewrite the last expression as
$$g(Y) - g(X) \le \frac{g(X + \Delta) - g(X)}{\Delta}(Y - X);$$
let $t \to 0$, hence $\Delta \to 0$, and get $g(Y) - g(X) \le Dg(X)(Y - X)$.
ii) ($\Leftarrow$) Suppose $g(\tilde{X}) - g(\tilde{Y}) \le Dg(\tilde{Y})(\tilde{X} - \tilde{Y})$ for any $\tilde{X}, \tilde{Y} \in I$. Consider in particular the pair $\tilde{X} = X$ and $\tilde{Y} = (1-t)X + tY$; then
$$g(X) - g((1-t)X + tY) \le Dg((1-t)X + tY)(X - [(1-t)X + tY]) = -tDg((1-t)X + tY)(Y - X).$$
Now consider the pair $\tilde{X} = Y$ and $\tilde{Y} = (1-t)X + tY$; then
$$g(Y) - g((1-t)X + tY) \le Dg((1-t)X + tY)(Y - [(1-t)X + tY]) = (1-t)Dg((1-t)X + tY)(Y - X).$$
Next, multiply the first inequality by $(1-t)$, the second by $t$, and add them to get $(1-t)g(X) + tg(Y) \le g((1-t)X + tY)$, hence $g$ is concave. This proves part (a); part (b) follows along similar lines. Q.E.D.

Lemma 3 Let $F: U \to \mathbb{R}^1$ be a $C^1$ function with convex domain $U \subset \mathbb{R}^n$. Then:
a) $F$ is concave on $U$ iff for all $X, Y$ in $U$, $F(Y) - F(X) \le DF(X)(Y - X)$.
b) $F$ is convex on $U$ iff for all $X, Y$ in $U$, $F(Y) - F(X) \ge DF(X)(Y - X)$.

Proof. By Lemma 1, $F$ is concave iff for every pair $X, Y$ the restriction function $g_{xy}(t) \equiv F(tY + (1-t)X)$ is concave; by Lemma 2, $g_{xy}$ is concave iff for every $s, t \in [0, 1]$, $g_{xy}(t) - g_{xy}(s) \le Dg_{xy}(s)(t - s)$. In particular, for $t = 1$ and $s = 0$, by the definition of $g_{xy}$ this is equivalent to $F(Y) - F(X) \le DF(X)(Y - X)$. This proves part (a); part (b) follows from a similar argument. Q.E.D.

Theorem 6 (Global Max and Min) Let $F: U \to \mathbb{R}^1$ be a $C^2$ function with open and convex domain $U \subset \mathbb{R}^n$.
a) The following three conditions are equivalent:
i) $F$ is a concave function on $U$;
ii) $F(Y) - F(X) \le DF(X)(Y - X)$ for all $X, Y$ in $U$;
iii) $D^2F(X)$ is negative semidefinite for all $X$ in $U$.
b) The following three conditions are equivalent:
i) $F$ is a convex function on $U$;
ii) $F(Y) - F(X) \ge DF(X)(Y - X)$ for all $X, Y$ in $U$;
iii) $D^2F(X)$ is positive semidefinite for all $X$ in $U$.
c) If $F$ is a concave function on $U$ and $DF(X^*) = 0$ for some $X^*$ in $U$, then $X^*$ is a global max of $F$ on $U$.
d) If $F$ is a convex function on $U$ and $DF(X^*) = 0$ for some $X^*$ in $U$, then $X^*$ is a global min of $F$ on $U$.

Proof. I) To prove: (a.i) $\Leftrightarrow$ (a.ii). Follows from Lemma 3.
II) To prove: (a.i) $\Leftrightarrow$ (a.iii).
i) ($\Leftarrow$) As in the proof of Lemma 3, let $X, Y$ be arbitrary points in $U$ and let $g_{xy}(t) \equiv F(tY + (1-t)X)$. It follows that $F$ is concave iff $g_{xy}$ is concave, which is equivalent to $D^2g_{xy}(t) \le 0$ for all $t \in [0, 1]$. Let $Z \equiv tY + (1-t)X$ and recall that $Dg_{xy}(t) = DF(Z)(Y - X)$; then $D^2g_{xy}(t) = (Y - X)' D^2F(Z)(Y - X)$. By hypothesis $D^2F(Z)$ is negative semidefinite, so $D^2g_{xy}(t) = (Y - X)' D^2F(Z)(Y - X) \le 0$; hence $g_{xy}(t)$ is concave, and therefore $F$ is concave.
ii) ($\Rightarrow$) Suppose $F$ is concave on $U$, let $Z$ be an arbitrary point in $U$ and let $V$ be an arbitrary displacement vector in $\mathbb{R}^n$. We want to show that $V' D^2F(Z)V \le 0$. Since $U$ is an open set, there is $t_0 > 0$ such that $Y = Z + t_0V \in U$. Since $F$ is concave, the restriction $g$ of $F$ to the segment from $Z$ to $Y$ is concave and $D^2g(0) \le 0$; then
$$0 \ge D^2g(0) = (Y - Z)' D^2F(Z)(Y - Z) = (t_0V)' D^2F(Z)(t_0V) = t_0^2 V' D^2F(Z)V,$$
hence $D^2F(Z)$ is negative semidefinite for all $Z$ in $U$.

(I) and (II) prove part (a) of the theorem; part (b) follows an argument similar to the one presented in (I) and (II).
III) To prove part (c) of the theorem: if $F$ is concave, we have already shown that $F(Y) - F(X) \le DF(X)(Y - X)$ for any $X, Y$ in $U$. In particular, consider $X = X^*$ and notice that $DF(X^*) = 0$ by hypothesis; then $F(Y) - F(X^*) \le DF(X^*)(Y - X^*)$ implies $F(Y) - F(X^*) \le 0$, therefore $X^*$ is a global max. This proves (c); part (d) is shown following a similar argument. Q.E.D.

2.5 Economic Applications

Suggested Exercise 6 A firm uses two inputs to produce a single product. If its production function is $Q = x^{1/4}y^{1/4}$, and if it sells its output for a dollar a unit and buys each input for $4 a unit, find its profit-maximizing input bundle. (Check the second order conditions.)

Suggested Exercise 7 Suppose that a firm has a Cobb-Douglas production function $Q = x^ay^b$ and that it faces output price $p$ and input prices $w$ and $r$, respectively. Solve the first order conditions for a profit-maximizing input bundle. Use the second order conditions to determine the values of the parameters $a, b, p, w, r$ for which this solution is a global max.

Suggested Exercise 8 Least Squares Analysis.
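As a sketch of Exercise 6 (output price 1, input prices 4), one can let sympy solve the first order conditions of the profit function and confirm the second order conditions at the solution:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
profit = x**sp.Rational(1, 4) * y**sp.Rational(1, 4) - 4*x - 4*y

foc = [sp.diff(profit, v) for v in (x, y)]
sol = sp.solve(foc, [x, y], dict=True)[0]
print(sol)                                   # {x: 1/256, y: 1/256}

H = sp.hessian(profit, (x, y)).subs(sol)
d1, d2 = H[0, 0], sp.simplify(H.det())
print(d1 < 0, d2 > 0)                        # True True -> negative definite
```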

3 Constrained Optimization

3.1 Equality Constraints

The class of constrained maximization problems can be written as:
$$\max F(X) \tag{1}$$
where $X \in \mathbb{R}^n$ must satisfy
$$h_1(X) = c_1, \; \ldots, \; h_m(X) = c_m \tag{2}$$
and
$$g_1(X) \le b_1, \; \ldots, \; g_k(X) \le b_k, \tag{3}$$
where the $c_j$'s and $b_i$'s are constants. The function $F(\cdot)$ is called the objective function, while $h_1, \ldots, h_m$ and $g_1, \ldots, g_k$ are called constraint functions. Note that the $h_j$'s are equality constraints and the $g_i$'s are inequality constraints. The problem concerning us here is limited to equality constraints. It can be thought of as the problem of finding the $n$-tuple $X^* = (x_1^*, \ldots, x_n^*)$ that maximizes (1) subject to the constraints (2).

For illustrative purposes, consider the problem with one equality constraint: max $F(x_1, x_2)$ subject to $h_1(x_1, x_2) = c_1$. A geometric visualization of this problem is given in Figure 1. Note from Figure 1 that at the point $X^* = (x_1^*, x_2^*)$ the level curve of $F(\cdot)$ and the constraint curve are tangent to each other, i.e. both have a common slope. We will exploit this observation in order to find a characterization of the solution for this class of problems and its generalization to $m$ restrictions.

[Figure 1 omitted: level curves of F and the constraint curve h_1(x_1, x_2) = c_1, tangent at the optimum X*.]
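The tangency observation can be checked numerically. The sketch below uses a simple made-up instance, max $x_1x_2$ subject to $x_1 + 4x_2 = 16$ (it reappears as Exercise 9 below), whose solution is $(8, 2)$: at the optimum the gradient of the objective is proportional to the gradient of the constraint, and the common ratio is the multiplier $\lambda$ introduced just below.

```python
import numpy as np

def grad_F(v):               # F(x1, x2) = x1 * x2
    return np.array([v[1], v[0]])

def grad_h(v):               # h(x1, x2) = x1 + 4*x2
    return np.array([1.0, 4.0])

X_star = np.array([8.0, 2.0])    # optimum of max x1*x2 s.t. x1 + 4*x2 = 16
print(grad_F(X_star) / grad_h(X_star))   # [2. 2.] -- a common ratio (lambda)
```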

In order to find the derivative of the level curve at the optimum point $X^*$, recall from the implicit function theorem that for a function $H(y_1, y_2)$, $\frac{dy_2}{dy_1} = -\frac{\partial H}{\partial y_1}(Y^o) \big/ \frac{\partial H}{\partial y_2}(Y^o)$, where $Y^o$ is a fixed point. In particular, according to Figure 1, the slope $\frac{dx_2}{dx_1}$ for $F$ and $h_1$ must be the same at $X^*$, that is,
$$-\frac{\partial F/\partial x_1}{\partial F/\partial x_2} = -\frac{\partial h_1/\partial x_1}{\partial h_1/\partial x_2},$$
which can be written as
$$\frac{\partial F/\partial x_1}{\partial h_1/\partial x_1} = \frac{\partial F/\partial x_2}{\partial h_1/\partial x_2} = \lambda, \tag{4}$$
where $\lambda$ is the common value of these ratios at $X^*$. We are assuming that the ratios above do not have zero denominators. Now rewrite (4) as two equations:
$$\frac{\partial F}{\partial x_1} - \lambda\frac{\partial h_1}{\partial x_1} = 0 \tag{5}$$
$$\frac{\partial F}{\partial x_2} - \lambda\frac{\partial h_1}{\partial x_2} = 0 \tag{6}$$
and note that the system (5)-(6) has three unknowns: $(x_1, x_2, \lambda)$. We need a third equation in order to solve the system; that equation is the constraint itself:
$$h_1(x_1, x_2) - c_1 = 0. \tag{7}$$
Hence the pair $(X^*, \lambda^*)$ that maximizes $F(x_1, x_2)$ subject to $h_1(x_1, x_2) = c_1$ solves the system (5)-(7). The system (5)-(7) can be written in terms of a function $L$ with three arguments $x_1, x_2, \lambda$. This function is called the Lagrangian function,
$$L(x_1, x_2, \lambda) \equiv F(x_1, x_2) - \lambda[h_1(x_1, x_2) - c_1],$$
and $\lambda$ is called the Lagrange multiplier. The equations of the system (5)-(7) can be recovered by considering $\frac{\partial L}{\partial x_1} = 0$, $\frac{\partial L}{\partial x_2} = 0$, $\frac{\partial L}{\partial \lambda} = 0$, that is, by considering the f.o.c. of the unconstrained maximization of $L(\cdot)$.

We can think of the Lagrange method as transforming a constrained maximization into an unconstrained maximization by constructing a Lagrange function.

function.the transformation requires the introduction of one more variable in the Lagrange function:the multiplier.it is important to note that the transformation requires h 1 0 or h 1 x 2 0 or both, otherwise the ratios (4) were not well defined.this is called constraint qualification.if the constraint is linear, the constraint qualification will automatically be satisfied. Note that the constraint qualification in the problem presented requires the Jacobian of the constraint fuction to be different from zero, this is Dh 1 [0, 0].For the case of a constraint function with domain in R n we require Dh 1 O where O is a 1 n vector of zeros. The constraint qualification for the more general case of m equality constraints requires the matrix DH be of rank m,where H is a columnvector of functions h i ( ). This is H = [h 1 (X),..., h m (X)] and DH = h 1 (X h ) 1..... h m (X h ) m is of size m n. This implies that the constraint set has a well-defined n-m dimensional tangent plane everywhere. Definition 15 (N.D.C.Q.) The functions h 1,... h m satisfy the nondegenerate constraint qualification if at X the rank of the Jacobian matrix DH is m. Theorem 7 (Constrained Max and Min) Let F (X), h 1 (X),..., h m (X) be C 1 functions with domain in R n.consider the problem: maximize F (X) subject to h 1 (X) = c 1,..., h m (X) = c m. Suppose that: (i) X satisfies h 1 = c 1,..., h m = c m, (ii) X is a local max or min of F ( ) on C h {X : h 1 (X) = c 1,..., h m (X) = c m }, (iii) X satisfies the NDCQ condition. Then there exists λ (λ 1,..., λ m) sucha that (X, λ ) is a critical point of the Lagrangian: 19

$$L(X, \lambda) \equiv F(X) - \lambda_1[h_1(X) - c_1] - \ldots - \lambda_m[h_m(X) - c_m].$$
In other words, $\frac{\partial L}{\partial x_1}(X^*, \lambda^*) = 0, \ldots, \frac{\partial L}{\partial x_n}(X^*, \lambda^*) = 0$ and $\frac{\partial L}{\partial \lambda_1}(X^*, \lambda^*) = 0, \ldots, \frac{\partial L}{\partial \lambda_m}(X^*, \lambda^*) = 0$.

Proof. By assumption (iii), $X^*$ satisfies the NDCQ condition, hence the rank of the Jacobian matrix $DH(X^*)$ is $m$. Next, we claim that the Jacobian matrix of the column vector of functions $G = [F \; h_1 \; h_2 \; \ldots \; h_m]^T$ is not of maximum rank at $X^*$, i.e. the matrix
$$DG(X^*) = \begin{bmatrix} \frac{\partial F}{\partial x_1}(X^*) & \cdots & \frac{\partial F}{\partial x_n}(X^*) \\ \frac{\partial h_1}{\partial x_1}(X^*) & \cdots & \frac{\partial h_1}{\partial x_n}(X^*) \\ \vdots & & \vdots \\ \frac{\partial h_m}{\partial x_1}(X^*) & \cdots & \frac{\partial h_m}{\partial x_n}(X^*) \end{bmatrix} \tag{8}$$
has rank strictly less than $m + 1$. To prove this claim we are going to assume that the rank of $DG(X^*)$ is $m + 1$ and generate a contradiction. Let $c_0 = F(X^*)$ and consider the system of equations
$$F(X) = c_0, \quad h_1(X) = c_1, \quad \ldots, \quad h_m(X) = c_m. \tag{9}$$
We know by assumption (i) and the definition of $c_0$ that $X^*$ is a solution of (9). Notice that the matrix (8) is the Jacobian associated with the system (9), and assume that the rank of $DG(X^*)$ is $m + 1$. By the Implicit Function Theorem we can perturb the $c$'s and still find a solution of the system (9); in particular, we can find a solution $\tilde{X}$ such that the perturbed system
$$F(\tilde{X}) = c_0 + \varepsilon, \quad h_1(\tilde{X}) = c_1, \quad \ldots, \quad h_m(\tilde{X}) = c_m$$
is satisfied, where $\varepsilon$ is a small positive number. The last $m$ equations of the system satisfy the constraints of the original problem; however, $F(\tilde{X}) > F(X^*)$, a contradiction to assumption (ii) of the theorem. This proves our claim.

Next, the claim just proved implies that there exist scalars $\alpha_0, \alpha_1, \ldots, \alpha_m$, not all zero, such that
$$\alpha_0 \begin{bmatrix} \frac{\partial F}{\partial x_1}(X^*) \\ \vdots \\ \frac{\partial F}{\partial x_n}(X^*) \end{bmatrix} + \alpha_1 \begin{bmatrix} \frac{\partial h_1}{\partial x_1}(X^*) \\ \vdots \\ \frac{\partial h_1}{\partial x_n}(X^*) \end{bmatrix} + \cdots + \alpha_m \begin{bmatrix} \frac{\partial h_m}{\partial x_1}(X^*) \\ \vdots \\ \frac{\partial h_m}{\partial x_n}(X^*) \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{10}$$
Next we claim that since $X^*$ satisfies the NDCQ condition, $\alpha_0 \neq 0$. To prove this (second) claim, simply notice that if $\alpha_0 = 0$ then by (10) the gradient vectors of the constraint functions are linearly dependent, that is, $\alpha_1 \nabla h_1 + \alpha_2 \nabla h_2 + \ldots + \alpha_m \nabla h_m = 0$, contradicting the NDCQ. Letting $\lambda_1^* = -\frac{\alpha_1}{\alpha_0}, \ldots, \lambda_m^* = -\frac{\alpha_m}{\alpha_0}$, (10) can be written as
$$\nabla F(X^*) - \lambda_1^* \nabla h_1(X^*) - \ldots - \lambda_m^* \nabla h_m(X^*) = 0. \quad \text{Q.E.D.}$$

Suggested Exercise 9 Solve the following constrained maximization problem: maximize $f(x_1, x_2) = x_1x_2$ subject to $h(x_1, x_2) \equiv x_1 + 4x_2 = 16$.

Suggested Exercise 10 Solve the following constrained maximization problem: maximize $f(x_1, x_2) = x_1^2x_2$ subject to $(x_1, x_2)$ in the constraint set $C_h = \{(x_1, x_2) : 2x_1^2 + x_2^2 = 3\}$.

Suggested Exercise 11 Find the max and min of $f(x, y, z) = yz + xz$ subject to $y^2 + z^2 = 1$ and $xz = 3$.
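A sketch of the Lagrangian recipe applied to Exercise 10: form $L$, take the first order conditions in $(x_1, x_2, \lambda)$, and solve the resulting system.

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2 * x2
h = 2*x1**2 + x2**2 - 3

L = f - lam * h
foc = [sp.diff(L, v) for v in (x1, x2, lam)]     # dL/dx1, dL/dx2, dL/dlam

for sol in sp.solve(foc, [x1, x2, lam], dict=True):
    print(sol, "f =", f.subs(sol))
# Critical points include (+-1, 1) with lam = 1/2 (f = 1, the constrained max),
# (+-1, -1) with lam = -1/2 (f = -1), and (0, +-sqrt(3)) with lam = 0.
```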

F (X). A contradiction to assumption (ii) of the theorem. This proves our claim. Next, the claim just proved implies that there exist scalars α 0, α 1,..., α n such that: (X h ) 1 (X h ) m (X ) α 0. +α 1. h 1 + +α m. h m = 0. 0. (10) Next claim that since X satisfies the NDCQ condition then α 0 0. To prove this (second) claim simply notice that if α 0 = 0 then by (10) the gradient vectors of the constraint functions are linearly dependent, this is α 1 h 1 +α 2 h 2 +...+α m h m = 0. Letting λ 1 = α 1 α 0,..., λ m = αm α 0, (10) can be written as F λ 1 h 1... λ m h m = 0 Suggested Exercise 9 Solve the following constrained maximization problem: maximize f(x 1, x 2 ) = x 1 x 2 subject to h(x 1, x 2 ) x 1 + 4x 2 = 16 Suggested Exercise 10 Solve the following constrained maximization problem: maximize f(x 1, x 2 ) = x 2 1x 2 subject to (x 1, x 2 ) in the constraint set C h = {(x 1, x 2 ) : 2x 2 1 + x 2 2 = 3}. Suggested Exercise 11 Find the max and min of f(x, y, z) = yz + xz subject to y 2 + z 2 = 1 and xz = 3. 3.2 Meaning of the Multiplier Theorem 8 (Meaning of the Multiplier) Let F (X), h 1 (X),..., h m (X) be C 1 function with domain in R n, let C = (c 1,..., c m ) a m-tuple of exogenous parameters and consider the problem: max F (X) subject to h 1 (X) = c 1,..., h m (X) = c m. (11) Suppose that : (i) The pair X, λ is the solution to the aforementioned problem, where λ = (λ 1,..., λ m ) are the corresponding Lagrange multipliers. 21

(ii) assume that x 1(c 1,..., c m ),..., x n(c 1,..., c m ) and λ 1(c 1,..., c m ),..., λ n(c 1,..., c m ) are differentiable functions in C. Then for each j = 1,..., m, c j F (x 1(c 1,..., c m ),..., x n(c 1,..., c m )) = λ j(c 1,..., c m ). Proof. Let X (C) x 1(c 1,..., c m ),..., x n(c 1,..., c m ). Since X (C) is a solution to the problem (11), by Theorem 5 for every j = 1,..., m (X (C)) = m j=1 λ j (c 1,..., c m ) h j (X (C)) (X (C)) = m j=1 λ j (c 1,..., c m ) h j (X (C)) hence for a generic x i we have x i (X (C)) = m j=1 λ j (c 1,..., c m ) h j x i (X (C)) (12) Moreover, since h j (X (C)) = c j then for all j = 1,..., m differentiating with respect to c j we have: h j (X (C)) dx 1 dc j + + h j (X (C)) dxn dc j, this is n i=1 h j x i (X (C)) dx i dc j = 1 (13) note also that by Chain Rule, differentiating h j (X (C)) = c j w.r.t. c 1 for j 1 : h j (X (C)) dx 1 + + h j (X (C)) dxn n i=1 = dc j = 0 h j x i (X (C)) dx i = 0 (14) Next, using the Chain Rule in F ( ) to differentiate w.r.t. c 1 we get df (x 1,..., x n) = ( ) dx x 1 1 ( ) + + ( ) dx x n n ( ) this is df (X (C)) = n i=1 (X (C)) dx x i i (X (C)) (15) 22

substituting (X (C)) from (12) into (15) we get x i df (X (C)) = n or df (X (C)) = m m i=1 j=1 j=1 λ j (c 1,..., c m ) h j x i (X (C)) dx i (X (C)) λ j (c 1,..., c m ) n i=1 h j x i (X (C)) dx i (X (C)) now recognize that from (14) the second summation is zero for all j 1, then the last expression simplifies to df (X (C)) = λ 1 (c 1,..., c m ) n i=1 now from (13) we know that n h 1 i=1 λ 1 (c 1,..., c m ). Q.E.D. h 1 x i (X (C)) dx i (X (C)) x i ( ) dx i ( ) = 1 hence df (X (C)) = From Theorem 8 we conclude that the Lagrange multiplier can be interpreted as the change in the maximum achieved by the objective function if we relax (tighten) in a small amount the corresponding constraint. Consider for example the case in which F is a utility function with two arguments x 1 and x 2,suppose that h(x 1, x 2 ) = Y 0 is the budget constraint given the income of Y 0 dollars. The Lagrange multiplier associated to this problem tells us how much happiness increases( measured by F ) if we increase a little the amount of income of the individual, this is called the marginal utility of income. Suggested Exercise 12 Redo exercise 10 using the constraint 2x 2 1 + x 2 2 = 3.3 3.3 Second Order Sufficient Conditions Theorem 9 (S.O.S.C. for Constrained Max) Let F (X), h 1 (X),..., h m (X) be C 2 function with domain in R n and consider the problem maximize F (X) subject to h 1 (X) = c 1,..., h m (X) = c m. Suppose that: (i) X = (x 1,..., x n) satisfies h j ( ) = c j for j = 1,..., m. 23

(ii) There exists λ = (λ 1,..., λ m) such that (X, λ ) is a critical point in the Lagrangian. This is L = 0,..., L = 0, L λ 1 = 0,..., L λ m = 0 at (X, λ ). (iii) The Hessian of L with respect to X at (X, λ ), DXL(X 2, λ ) is negative definite on the linear constraint set {V : DHV = 0}, that is V 0,and DHV = 0 = V DXL(X 2, λ )V < 0 Then, X is a strict local constrained max of F. The proof is left as an exercise. Suggested Exercise 13 In exercise 10, use the second order conditions to determine which of the critical points are local maxima and local minima. 4 Calculus of Several Variables 4.1 Weierstrass s and Mean Value Theorems Definition 16 (Compact Set) A closed and bounded set on R n is called a compact set. Definition 17 (Connected Set) A subset U R 1 is connected if for all x, z U if x y z then y U. Remark 8 (Weirstrass s Theorem) Let F : U R 1 be a continuous function whose domain is a compact subset U R n. Then there exist points X m and X M in U such that F (X m ) F (X) F (X M ) for all X U. These X m and X M are the global min and max of F in U. Theorem 10 (Rolle s Theorem) Suppose that F : [a, b] R 1 is continuous on [a, b] and C 1 on (a, b). If F (a) = F (b) = 0 then there is a point c (a, b) such that F (c) = 0. Proof. If F is constant in its domain then F (c) = 0 follows directly. Assume without loss of generality that F is sometimes positive on (a, b). By Weierstrass s theorem, F achieves its max at some point c [a, b]. Since F (c) > 0, c must be in (a, b). By the usual first order condition for an interior max on R 1, f (c) = 0 Q.E.D. 24

Theorem 11 (MeanValueTheorem) Let F : U R 1 be a continuous function whose domain is a connected interval U R 1. For any points a, b in U there is a point c between a and b such that F (b) F (a) = F (c)(b a). F (b) F (a) Proof. Construct the function g 0 (x) F (b) F (x) + (x b). b a Note that g 0 (a) = 0 and g 0 (b) = 0.By Rolle s theorem, there is a point c (a, b) so that g 0(c) = 0. Differentiating g 0 (x) with respect to x we get g 0(x) = F F (b) F (a) (x) + now evaluating this expression at c and equating b a to zero F F (b) F (a) (c) = or F (b) F (a) = F (c)(b a). Q.E.D. b a 4.2 Taylor Polynomials on R 1 Theorem 12 (Taylor Polynomials in One Dimension) Let F : U R 1 be a C 2 function whose domain is a connected interval U R 1. For any points a and a + h in U there exists a point c 2 between a and a + h such that F (a + h) = F (a) + F (a)h + 1F (c 2 2 )h 2 Proof. Fix a and a + h in U. Define a function g 1 (x) F (x) [F (a) + F (a)(x a)] M 1 (x a) 2,where M 1 = 1 h 2 [F (a + h) F (a) F (a)h], so that g 1 (a + h) = 0.Note that g 1 (a) = 0 and g 1(x) = F (x) F (a) 2M 1 (x a), thus g 1 (a) = g 1(a) = 0. Since g 1 (a) = g 1 (a + h) = 0 we can apply Rolle s theorem and conclude that there exists c 1 between a and a + h so that g 1(c 1 ) = 0. Moreover, since g 1(c 1 ) = 0 and g 1(a) = 0 then by Rolle s theorem again we conclude that there is a c 2 between c 1 and a so that g 1(c 2 ) = 0. Next note that g 1(x) = F (x) 2M 1, then F (c 2 ) = 2M 1 therefore F (c 2 ) = 2 1 [F (a + h) F (a) F (a)h] or F (a + h) = F (a) + F (a)h + h 1 2 F (c 2 2 )h 2. The term 1F (c 2 2 )h 2 is called the residual of the Taylor expansion and is denoted by R 1 (h; a) 1F (c 2 2 )h 2. Note that R 1(h;a) = 1F (c h 2 2 )h and c 2 a whenever h 0 because we know that c 2 lies between a and a + h. Thus R lim 1 (h;a) 1 h 0 = lim h h 0 F (c 2 2 )h 1F (a) 0 = 0. Q.E.D. 2 25

Remark 9 (Taylor Polynomials of Order k) Let F : U R 1 be a C k+1 function whose domain is a connected interval U R 1. For any points a and a + h in U there exists a point c between a and a + h such that F (a+h) = F (a)+f (a)h+ 1F (a)h 2 +...+ 1 F [k] (a)h [k] + 1 F [k+1] (c )h [k+1]. 2 k! k+1! The proof follows the same lines as the proof of Theorem 12, for k = 2 redefine g 1 (x) F (x) [F (a) + F (a)(x a) + 1 2 F (a)(x a) 2 ] M 1 (x a) 3, where M 1 = 1 h 3 [F (a + h) F (a) F (a)h 1 2 F (a)h 2 ]. Similarly to the case of Theorem 12, we define the residual as R k (h; a) 1 F [k+1] (c )h [k+1], where c a as h 0 then the residual has the k+1! R property: lim k (h;a) 1 h 0 = lim h k h 0 F [k+1] (a) h[k+1] = 0 k+1! h k 4.3 Taylor Polynomials in R n Theorem 13 (Taylor Polynomial in n Dimensions) Let F : U R 1 be a C 2 function whose domain is an open set U R n. Let a be a point in U. Then there exists a C 2 function h R 2 (h; a) such that for any point a + h in U with the property that the line segment l(a, a + h) lies in U: F (a + h) = F (a) + DF (a)h + 1 2 h D 2 F (a)h + R 2 (h; a), where R 2(h;a) h 2 0 as h 0. Proof. Define the C 1 function φ : [0, 1] U by φ(t) a + t h, that is φ(t) parameterizes the line segment l(a, a + h). Define f : [0, 1] R 1 by f F (φ(t)). Applying the one dimensional Taylor expansion of Remark 9 to f around zero we know that for any t [0, 1] there exists c 1 such that f(t) = f(0) + f (0)t + 1 2 f (0)t 2 + R(c 1 ) now by definition of f and the Chain Rule note that f (t) = DF (a + th)h and f (t) = h D 2 F (a + th)h then the expression above implies F (a + th) = F (a) + DF (a)ht + 1 2 h td 2 F (a)ht + R(c 1 ), define h ht and get F (a + h) = F (a) + DF (a) h + 1 2 h D 2 F (a) h + R(c 1 ). Finally note that R(c 1) h 2 = o(1). Q.E.D. 26

Remark 10 (Taylor Polynomials of Order k in n Dimensions) Let F : U R 1 with U R n be a C k function in a ball B around a U. Then there exists a C k function R k ( ; a) such that for any point a + h in B F (a+h) = F (a)+df (a)(h)+ 1 2! D2 F (a)(h, h)+...+ 1 k! Dk F (a)(h,..., h)+ R k (h; a) where R k(h;a) h k 0 as h 0 and D k F (a)(h,..., h) [ k F k 1,...,k n x k 1 1... xkn n where the summation is over all n-tuples (k 1,..., k n ) of nonnegative integers such that k 1+... + k n = k. The proof is similar to the proof of Theorem 13. We need to parameterize F on a line segment φ and use Remark 10. (a)h k 1 1 h k 2 2... h kn n ], Suggested Exercise 14 Compute the first and second order Taylor polynomial of the exponential function f(x) = e x at x = 0. Suggested Exercise 15 Compute the Taylor polynomial of order 2 of F (x, y) = x 1/4 y 3/4 at (1, 1). At this point we are ready to present a proof of Theorem 5. For ease of the exposition, Theorem 5 is presented again. Theorem 5 (proof of) Let F : U R 1 element of C 2 with open domanin U R n.suppose X is an interior point of U and suppose that X is a local min of F (respectively local max) then DF = 0, and D 2 F is positive semidefinite (respectively negative semidefinite). Proof. Claim that DF = 0 and W D 2 F W < 0 imply that the function t F (X + tw ) is decreasing in t for a small positive t. To prove the claim fix X and W and consider a Taylor expansion of F (t) around X. This is F (X +tw ) = F +DF W + 1W D 2 F W +R 2 2. Next, t small imply R 2 0 and imposing DF = 0 we have F (X + tw ) = F + 1W D 2 F W then W D 2 F W < 0 imply F (X + tw ) < 2 F that is, the function F is decreasing around X. Next, since F is decreasing in a neighborhood of X, X can not be a local min. Therefore if a critical point X is a local min of F there is no 27

W such that W D 2 F W < 0; in other words V D 2 F V 0 for any V R n. Hence D 2 F is positive semidefinite. The case for max follows a similar argument. Q.E.D. 28