Functions of Several Variables: The Unconstrained Minimization Problem

In $n$ dimensions the unconstrained problem is stated as

\[ \min_{x} \; f(x), \]

where $f(x)$ is a scalar objective function and $x$ is a column vector of $n$ real variables, $x \in \mathbb{R}^n$.

We can take the gradient of $f(x)$ as

\[ \nabla f = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right]^T, \]

which is a column vector of $n$ functions. The $n \times n$ matrix of second derivatives is called the Hessian matrix and can be found by taking the gradient again:

\[ H(x) = \nabla (\nabla f)^T = \nabla^2 f = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j} \right]_{n \times n}, \]

where the subscript $i$ denotes the row and $j$ denotes the column. The Hessian is symmetric because of the properties of the second derivatives of continuous functions (equality of mixed partial derivatives).
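To make these definitions concrete, here is a minimal numerical sketch (my addition, not part of the notes) that approximates the gradient and Hessian by central finite differences; the test function and step sizes are illustrative choices only.

```python
import numpy as np

def grad(f, x, h=1e-5):
    """Central-difference approximation of the gradient column vector."""
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Finite-difference approximation of the (symmetric) Hessian matrix."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Illustrative test function of two variables.
f = lambda x: x[0]**2 + x[1]**2 - x[0] * x[1]
x0 = np.array([1.0, 2.0])
print(grad(f, x0))     # approx [2*x1 - x2, 2*x2 - x1] = [0, 3]
print(hessian(f, x0))  # approx [[2, -1], [-1, 2]], symmetric
```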
Some Definitions

Definition: a point $x^{**}$ is said to be the strong global minimum of a function $f(x)$ if $f(x) - f(x^{**}) > 0$ for all $x \in \mathbb{R}^n$, $x \neq x^{**}$.

Definition: a point $x^*$ is said to be a strong local minimum of a function $f(x)$ if there exists a $\delta > 0$ such that $f(x) - f(x^*) > 0$ for all $x \neq x^*$ such that $\|x - x^*\| < \delta$.

If the inequalities $f(x) - f(x^{**}) > 0$ and $f(x) - f(x^*) > 0$ are replaced by $<$, then we have strong global and local maxima. If the inequalities are replaced by $\geq$, then the minima are called weak global and weak local minima.

Taylor Expansion

We can expand $f(x)$ in a Taylor expansion about $x$ as

\[ f(x + \Delta x) = f(x) + \Delta x^T \nabla f(x) + \tfrac{1}{2} \Delta x^T \nabla^2 f(x) \, \Delta x + O(\|\Delta x\|^3), \]

where the notation $O(\|\Delta x\|^3)$ represents terms of order $\|\Delta x\|^3$; these can be neglected for $\|\Delta x\|$ sufficiently small.

From the definition of a local minimum, for $x^*$ to be a local minimum we must have $f(x^* + \Delta x) - f(x^*) > 0$ for all sufficiently small $\|\Delta x\| > 0$. From the Taylor expansion,

\[ f(x^* + \Delta x) - f(x^*) = \Delta x^T \nabla f(x^*) + \tfrac{1}{2} \Delta x^T \nabla^2 f(x^*) \, \Delta x \]

must be greater than 0, where we have neglected the $O(\|\Delta x\|^3)$ terms. For this to be true it is necessary that $\nabla f(x^*) = 0$; otherwise we could pick $\Delta x$ appropriately so that $\Delta x^T \nabla f(x^*) < 0$.
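As a quick numerical illustration of the expansion (my addition, using an arbitrary smooth non-quadratic test function with hand-derived gradient and Hessian), one can compare the second-order Taylor approximation against the true value for a small step and observe the $O(\|\Delta x\|^3)$ error:

```python
import numpy as np

# Arbitrary smooth (non-quadratic) test function with its exact
# gradient and Hessian written out by hand.
f = lambda x: np.exp(x[0]) + x[1]**4
grad_f = lambda x: np.array([np.exp(x[0]), 4.0 * x[1]**3])
hess_f = lambda x: np.array([[np.exp(x[0]), 0.0],
                             [0.0, 12.0 * x[1]**2]])

x = np.array([0.5, 1.0])
dx = np.array([1e-2, -1e-2])

taylor = f(x) + dx @ grad_f(x) + 0.5 * dx @ hess_f(x) @ dx
print(abs(f(x + dx) - taylor))  # small, of order ||dx||^3
```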
Necessary Conditions

A necessary condition for a local minimum is the analogue of a one-dimensional function having a stationary point. Thus we can state the following definition and theorem.

Definition: If $\nabla f(x^*) = 0$ then $x^*$ is called a stationary point of $f(x)$.

Theorem (necessary condition based on $\nabla f(x)$): $x^*$ is a local minimum of $f(x)$ only if $x^*$ is a stationary point of $f(x)$, that is, $\nabla f(x^*) = 0$.

Sufficient Conditions

In order to obtain sufficient conditions from the Taylor expansion we need to study the second term:

\[ Q(\Delta x) = \tfrac{1}{2} \Delta x^T \nabla^2 f(x^*) \, \Delta x. \]

This is called a quadratic form, which can be written more generally as

\[ Q(z) = z^T A z, \]

where $z \in \mathbb{R}^n$ is an arbitrary column vector and $A \in \mathbb{R}^{n \times n}$ is an arbitrary matrix; the quadratic form itself is a scalar. The properties of the quadratic form are governed by the properties of the matrix $A$.

Definition: The matrix $A$ is called:
1. positive definite if $Q(z) > 0$ for all $z$, $z \neq 0$,
2. positive semidefinite if $Q(z) \geq 0$ for all $z$, $z \neq 0$,
3. negative definite if $Q(z) < 0$ for all $z$, $z \neq 0$,
4. negative semidefinite if $Q(z) \leq 0$ for all $z$, $z \neq 0$, and
5. indefinite if the sign of $Q(z)$ depends on $z$.

Theorem (sufficient condition based on $(\nabla f, \nabla^2 f)$): if $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite, then $x^*$ is a strong local minimum of $f(x)$.
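The five cases can be checked numerically from the eigenvalues of a symmetric matrix; the following helper is a sketch of mine (not part of the notes) that anticipates the eigenvalue test tabulated in a later section:

```python
import numpy as np

def definiteness(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)  # real eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"

print(definiteness(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # positive definite
print(definiteness(np.array([[1.0, 3.0], [3.0, 2.0]])))    # indefinite
```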
Note this is not a necessary condition, since if $\nabla^2 f(x^*) = 0$ and the $O(\|\Delta x\|^3)$ terms are positive then $x^*$ is still a strong local minimum. Therefore a necessary condition involving $\nabla^2 f(x^*)$ can be stated as follows:

Theorem (necessary condition based on $(\nabla f, \nabla^2 f)$): $x^*$ is a local minimum only if $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite.

Definition: a stationary point which is not a local minimum or local maximum is called a saddle point.

Example: Consider the function

\[ f(x) = x_1^2 + x_2^2 - x_1 x_2. \]

The gradient is easily found as

\[ \nabla f(x) = [\, 2x_1 - x_2, \; 2x_2 - x_1 \,]^T, \]

and thus $\nabla f(x^*) = 0$ gives $x^* = [0, 0]^T$ as the only stationary point. The Hessian

\[ \nabla^2 f(x^*) = H = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \]

is independent of $x$ since $f(x)$ is a quadratic function. We now determine whether the Hessian is positive definite by looking at the quadratic form

\[ z^T H z = 2 z_1^2 - 2 z_1 z_2 + 2 z_2^2, \]

which can easily be written as

\[ z^T H z = 2 \left[ (z_1 - z_2)^2 + z_1 z_2 \right] = 2 \left[ (z_1 + z_2)^2 - 3 z_1 z_2 \right]. \]

Now we note that the first form is positive for $z_1$ and $z_2$ both negative or both positive, and that the second form is always positive for $z_1$ and $z_2$ of opposite sign. Thus $z^T H z > 0$ for all $z \neq 0$, so $H(x^*) = H$ is positive definite and $x^*$ is a local minimum.
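To corroborate the sign argument, one can also sample the quadratic form $z^T H z$ over many random nonzero directions; this is a numerical spot check I have added, not a proof:

```python
import numpy as np

H = np.array([[2.0, -1.0], [-1.0, 2.0]])  # Hessian of f(x) = x1^2 + x2^2 - x1*x2

rng = np.random.default_rng(0)
Z = rng.standard_normal((100000, 2))   # random nonzero directions z
q = np.einsum('ij,jk,ik->i', Z, H, Z)  # z^T H z for every sampled z
print(q.min() > 0)                     # True: the form is positive on all samples
```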
Examples of quadratic forms

A quadratic form can always be arranged so that its matrix is symmetric. Consider the following examples:

a)
\[ x_1^2 + x_1 x_2 + 3 x_2^2 = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 0.5 \\ 0.5 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

b)
\[ x_1 x_2 + 4 x_2 x_3 + x_3^2 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 0 & 0.5 & 0 \\ 0.5 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

c)
\[ x_1^2 + 2 x_2^2 + 3 x_3^2 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

d) consider the general two-dimensional case; the matrix $Q$ of $x_1^2 + x_1 x_2 + 3 x_2^2$ can always be chosen symmetric:

\[ Q = \begin{bmatrix} 1 & a \\ b & 3 \end{bmatrix} \quad \text{such that } a + b = 1, \]

or the symmetric form

\[ Q = \begin{bmatrix} 1 & c \\ c & 3 \end{bmatrix} \quad \text{such that } c = \frac{a + b}{2} = 0.5. \]
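Example d) generalizes: for any square $Q$, replacing $Q$ by the symmetric part $(Q + Q^T)/2$ leaves the quadratic form unchanged. A small sketch (my addition) verifying this numerically:

```python
import numpy as np

Q = np.array([[1.0, 1.0], [0.0, 3.0]])  # asymmetric: x1^2 + x1*x2 + 3*x2^2
Qs = 0.5 * (Q + Q.T)                    # symmetric part: [[1, .5], [.5, 3]]

z = np.array([2.0, -1.0])
print(z @ Q @ z, z @ Qs @ z)  # identical values of the quadratic form (5.0, 5.0)
```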
Tests for Definiteness

Definition: for a matrix

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}, \]

the leading principal minors are written as

\[ \Delta_1 = a_{11}, \quad \Delta_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \Delta_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \ldots, \quad \Delta_n = |A|. \]

Definition: the eigenvalues of a matrix $A$ are defined as all the scalars $\lambda_i$ which satisfy the characteristic equation $|A - \lambda_i I| = 0$. An $n \times n$ real symmetric matrix will have $n$ real eigenvalues.

Theorem (Sylvester's Theorem): A quadratic form $z^T Q z$ is positive definite if and only if all the leading principal determinants of the matrix $Q$ are positive.

Table: Tests for Definiteness

| definiteness              | eigenvalues                                   | principal minors                                                  |
|---------------------------|-----------------------------------------------|-------------------------------------------------------------------|
| 1. positive definite      | all $\lambda_i > 0$                           | $\Delta_i > 0$, $i = 1, \ldots, n$                                |
| 2. positive semidefinite  | all $\lambda_i \geq 0$                        | $\Delta_i \geq 0$, $i = 1, \ldots, n$                             |
| 3. negative definite      | all $\lambda_i < 0$                           | $\Delta_1 < 0$, $\Delta_2 > 0$, $\Delta_3 < 0$, $\ldots$          |
| 4. negative semidefinite  | all $\lambda_i \leq 0$                        | $\Delta_1 \leq 0$, $\Delta_2 \geq 0$, $\Delta_3 \leq 0$, $\ldots$ |
| 5. indefinite             | some $\lambda_i > 0$, some $\lambda_j < 0$    | none of the above                                                 |

(For the semidefinite cases the minor test must in fact be applied to all principal minors, not only the leading ones; the eigenvalue test is unambiguous.)

It is also obvious from the definition of positive definite that:

Theorem: the matrix $A$ is negative definite if $-A$ is positive definite.
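Sylvester's test is straightforward to mechanize by computing the determinants of the leading $k \times k$ blocks; a minimal sketch (mine, not from the notes):

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the leading k x k submatrices, k = 1..n."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def is_positive_definite(A):
    """Sylvester's theorem: all leading principal minors positive."""
    return all(d > 0 for d in leading_principal_minors(A))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
print(leading_principal_minors(A))  # [2.0, 3.0]
print(is_positive_definite(A))      # True
```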
Examples

Determine the sign definiteness of the following:

1. $A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 6 \end{bmatrix}$; we have $\Delta_1 = 2$, $\Delta_2 = 5$, $\Delta_3 = 30$, and therefore, since all leading principal determinants are positive, $A$ is positive definite.

2. $A = \begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix}$; we have $\Delta_1 = 1$, $\Delta_2 = -7$, and $A$ is indefinite.

Example: Static Linear Springs

[Figure: two frictionless blocks A (position $x_1$) and B (position $x_2$) connected by three linear springs $K_1$, $K_2$, $K_3$, with a force $P$ applied to block B. Caption: system of linear springs in equilibrium.]

The blocks A and B are two frictionless bodies connected to three linear elastic springs having spring constants $K_1$, $K_2$, $K_3$. When the force $P = 0$, $x_1 = 0$, $x_2 = 0$ defines the natural position. We want to find the new positions $x_1$, $x_2$ when a nonzero force $P$ is applied.

Solution: Recall that for a linear spring Hooke's Law says that $F = Kd$, where $K$ is the spring constant and $d$ is the displacement of the spring from equilibrium. The strain energy of a spring is the work done in stretching it, that is,

\[ E = \int_0^x K x \, dx = \tfrac{1}{2} K x^2. \]
The work put into the system by the constant force $P$ is given by $P x_2$, and thus the potential energy of the system is

\[ U = \tfrac{1}{2} K_1 x_1^2 + \tfrac{1}{2} K_2 (x_2 - x_1)^2 + \tfrac{1}{2} K_3 x_2^2 - P x_2. \]

We want to minimize this potential energy $U$ according to the principle of minimum potential energy. Thus we have

\[ \frac{\partial U}{\partial x_1} = K_1 x_1^* - K_2 (x_2^* - x_1^*) = 0, \]

\[ \frac{\partial U}{\partial x_2} = K_3 x_2^* + K_2 (x_2^* - x_1^*) - P = 0, \]

or

\[ \begin{bmatrix} K_1 + K_2 & -K_2 \\ -K_2 & K_2 + K_3 \end{bmatrix} \begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix} = \begin{bmatrix} 0 \\ P \end{bmatrix}, \]

which gives

\[ x_1^* = \frac{P K_2}{K_1 K_2 + K_2 K_3 + K_3 K_1}, \qquad x_2^* = \frac{P (K_1 + K_2)}{K_1 K_2 + K_2 K_3 + K_3 K_1} \]

as the only stationary point. We now determine the Hessian as

\[ H(x) = \begin{bmatrix} \dfrac{\partial^2 U}{\partial x_1^2} & \dfrac{\partial^2 U}{\partial x_1 \partial x_2} \\[2mm] \dfrac{\partial^2 U}{\partial x_2 \partial x_1} & \dfrac{\partial^2 U}{\partial x_2^2} \end{bmatrix} = \begin{bmatrix} K_1 + K_2 & -K_2 \\ -K_2 & K_2 + K_3 \end{bmatrix} \]

and need to know whether $H(x)$ is positive definite. The principal minors are given as

\[ \Delta_1 = K_1 + K_2 > 0, \qquad \Delta_2 = (K_1 + K_2)(K_2 + K_3) - K_2^2 = K_1 K_2 + K_2 K_3 + K_3 K_1 > 0, \]

and therefore $x^* = [x_1^*, x_2^*]^T$ is the minimum. The value of the potential at this minimum is given as

\[ U_{\min} = -0.5 \, \frac{P^2 (K_1 + K_2)}{K_1 K_2 + K_2 K_3 + K_3 K_1}. \]
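The stationarity conditions form the linear system $H x^* = [0, P]^T$, so the equilibrium can also be computed numerically; the values of $K_1$, $K_2$, $K_3$, $P$ below are assumptions chosen only for illustration:

```python
import numpy as np

K1, K2, K3, P = 1.0, 2.0, 3.0, 5.0    # illustrative spring constants and load

H = np.array([[K1 + K2, -K2],
              [-K2,      K2 + K3]])    # Hessian of the potential energy U
rhs = np.array([0.0, P])

x = np.linalg.solve(H, rhs)            # equilibrium displacements [x1*, x2*]
den = K1*K2 + K2*K3 + K3*K1
print(np.allclose(x, [P*K2/den, P*(K1+K2)/den]))  # True: matches the formulas

U_min = -0.5 * P**2 * (K1 + K2) / den
print(np.isclose(U_min, 0.5 * x @ H @ x - P * x[1]))  # True: U at the minimum
```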
Quadratic Functions

In general we write a scalar quadratic function of $n$ variables $x \in \mathbb{R}^n$ in matrix notation as

\[ q(x) = c + b^T x + \tfrac{1}{2} x^T A x, \]

where $c \in \mathbb{R}$, $b \in \mathbb{R}^n$, and the matrix $A \in \mathbb{R}^{n \times n}$.

We note that from Taylor's expansion

\[ f(x + \Delta x) = f(x) + \Delta x^T \nabla f(x) + \tfrac{1}{2} \Delta x^T \nabla^2 f(x) \, \Delta x + O(\|\Delta x\|^3), \]

any function can be approximated by a quadratic function in a small enough region. Therefore what we now say about quadratic functions $q(x)$ can be used later in approximations.

In algebraic form we write $q(x)$ as

\[ q(x) = c + \sum_{i=1}^n b_i x_i + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j, \]

where $b^T = [b_1, \ldots, b_n]$ and $A = [a_{ij}]_{n \times n}$. Since $x^T A x$ is a quadratic form we can always assume $A$ to be symmetric.

Definition: If $A$ is positive definite then $q(x)$ is called a positive definite quadratic function.

Question: Show that the gradient vector of the quadratic function $q(x) = c + b^T x + \tfrac{1}{2} x^T A x$ is given by $\nabla q(x) = Ax + b$.

Solution: To find the gradient vector we require $\partial q / \partial x_k$, $k = 1, \ldots, n$. Writing $q(x)$ algebraically and separating out the terms involving $x_k$, we have:

\[ q(x) = c + \sum_{i \neq k} b_i x_i + b_k x_k + \frac{1}{2} \left[ a_{kk} x_k^2 + x_k \sum_{j \neq k} a_{kj} x_j + x_k \sum_{i \neq k} a_{ik} x_i + \sum_{i \neq k} \sum_{j \neq k} a_{ij} x_i x_j \right]. \]

Now since $A$ is symmetric, $a_{ij} = a_{ji}$, and we have
\[ q(x) = c + \sum_{i \neq k} b_i x_i + b_k x_k + \frac{1}{2} a_{kk} x_k^2 + x_k \sum_{j \neq k} a_{kj} x_j + \frac{1}{2} \sum_{i \neq k} \sum_{j \neq k} a_{ij} x_i x_j, \]

and therefore

\[ \frac{\partial q}{\partial x_k} = b_k + a_{kk} x_k + \sum_{j \neq k} a_{kj} x_j = b_k + \sum_{j=1}^n a_{kj} x_j. \]

This result can be written in vector notation as

\[ \nabla q(x) = b + A x. \]

Thus we see that the gradient of a quadratic is easily found by a matrix multiplication and vector addition.

Question: Show that the Hessian matrix of the quadratic function $q(x) = c + b^T x + \tfrac{1}{2} x^T A x$ is equal to $A$.

Solution: We know that the Hessian matrix is given as

\[ H(x) = \nabla (\nabla q(x))^T = \left[ \frac{\partial^2 q}{\partial x_i \, \partial x_j} \right]_{n \times n}, \]

and from the above we see that

\[ \frac{\partial q}{\partial x_j} = b_j + \sum_{k=1}^n a_{jk} x_k, \qquad \frac{\partial^2 q}{\partial x_i \, \partial x_j} = a_{ji} = a_{ij}. \]

In matrix form, $H(x) = H = A$, and the Hessian of a quadratic function is a constant matrix. Thus we see that if $q(x)$ is a positive definite quadratic function then $H$ is positive definite at all points, which implies that any stationary point $x^*$ will be a minimum.
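The gradient identity is easy to sanity-check against central finite differences (the Hessian check is analogous, since the gradient is affine in $x$); the values of $c$, $b$, $A$ below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
c, b = 2.0, rng.standard_normal(n)
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)              # symmetric, as assumed in the derivation

q = lambda x: c + b @ x + 0.5 * x @ A @ x

x = rng.standard_normal(n)
h = 1e-6
I = np.eye(n)
g_fd = np.array([(q(x + h * I[k]) - q(x - h * I[k])) / (2 * h)
                 for k in range(n)])

print(np.allclose(g_fd, b + A @ x, atol=1e-6))  # True: gradient equals b + A x
```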
In fact, since

\[ \nabla q(x^*) = b + A x^* = 0 \quad \Rightarrow \quad x^* = -A^{-1} b, \]

this is the unique minimum of a positive definite quadratic function.

Example

Consider the following quadratic function in two variables:

\[ f(x_1, x_2) = K + d x_1 + e x_2 + a x_1^2 + b x_1 x_2 + c x_2^2, \]

where $a, b, c, d, e, K \in \mathbb{R}$. Determine the conditions under which $f(x)$ is positive definite.

Solution: Rewriting $f(x)$ in matrix form as

\[ f(x) = K + \begin{bmatrix} d & e \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2a & b \\ b & 2c \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \]

we see that the Hessian is given by

\[ A = \begin{bmatrix} 2a & b \\ b & 2c \end{bmatrix}. \]

Therefore by Sylvester's theorem $A$ is positive definite if and only if

\[ a > 0, \qquad 4ac - b^2 > 0 \]

(which together imply $c > 0$). From geometry we know that if $4ac - b^2 > 0$ then the contour plots $f(x) = \text{const.}$ define concentric ellipses.

[Figure: concentric elliptical contours of a two-dimensional quadratic function, centered at the unique minimum. Caption: concentric ellipses define the shape of a quadratic function in 2-D.]
As a specific example, if $a = d = K = 1$, $e = 2$, $b = 0$ and $c = 2$, we have

\[ f(x) = 1 + x_1 + 2 x_2 + x_1^2 + 2 x_2^2 = 1 + \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \]

and therefore

\[ A = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad A^{-1} = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{4} \end{bmatrix}, \]

so that

\[ x^* = -A^{-1} b = \begin{bmatrix} -\tfrac{1}{2} \\ -\tfrac{1}{2} \end{bmatrix} \]

is the unique minimum.
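The result for this specific example can be verified directly; this short check is my addition:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([1.0, 2.0])

x_star = -np.linalg.solve(A, b)        # x* = -A^{-1} b
print(x_star)                          # [-0.5 -0.5]
print(np.allclose(A @ x_star + b, 0))  # True: gradient vanishes at x*

f = lambda x: 1.0 + b @ x + 0.5 * x @ A @ x
print(f(x_star))                       # minimum value: 0.25
```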