Functions of Several Variables


The Unconstrained Minimization Problem

In $n$ dimensions the unconstrained problem is stated as

$$\min_{x}\; f(x), \qquad x \in \mathbb{R}^n,$$

where $f(x)$ is a scalar objective function of the vector argument $x = (x_1, x_2, \ldots, x_n)^T$, a column vector of $n$ real variables.

We can take the gradient of $f(x)$,

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)^T,$$

which is a column vector of $n$ functions. The $n \times n$ matrix of second derivatives is called the Hessian matrix and can be found by taking the gradient again:

$$H(x) = \nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix},$$

where the subscript $i$ denotes the row and $j$ denotes the column, so that $[\nabla^2 f(x)]_{ij} = \partial^2 f / \partial x_i \partial x_j$. The Hessian is symmetric because the mixed second partial derivatives of a twice continuously differentiable function are equal: $\partial^2 f/\partial x_i \partial x_j = \partial^2 f/\partial x_j \partial x_i$.
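As an illustration (not part of the original notes), the gradient and Hessian can be built symbolically; the sample function below is an assumed example:

```python
# Symbolic sketch (illustration only; the sample f below is an assumption,
# not a function from these notes): build the gradient vector and Hessian.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * x2 + sp.exp(x1 * x2)

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])  # column vector of partials
hess = sp.hessian(f, (x1, x2))                       # n x n second-derivative matrix

print(grad)
print(hess)
print(hess.is_symmetric())  # True: mixed partials of a smooth f agree
```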

Some Definitions

Definition. A point $x^{**}$ is said to be the strong global minimum of a function $f(x)$ if $f(x) - f(x^{**}) > 0$ for all $x \in \mathbb{R}^n$, $x \neq x^{**}$.

Definition. A point $x^{*}$ is said to be a strong local minimum of a function $f(x)$ if there exists a $\delta > 0$ such that $f(x) - f(x^{*}) > 0$ for all $x$ with $0 < \|x - x^{*}\| < \delta$.

If the inequalities above are replaced by $<$ then we have strong global and local maxima. If they are replaced by $\geq$ then the minima are called weak global and weak local minima.

Taylor Expansion

We can expand $f(x)$ in a Taylor expansion about $x$ as

$$f(x + \Delta x) = f(x) + \Delta x^T \nabla f(x) + \tfrac{1}{2}\, \Delta x^T \nabla^2 f(x)\, \Delta x + O(\|\Delta x\|^3),$$

where the notation $O(\|\Delta x\|^3)$ represents terms of order $\|\Delta x\|^3$; these can be neglected for $\|\Delta x\|$ sufficiently small.

From the definition of a local minimum, for $x^{*}$ to be a local minimum we need $f(x^{*} + \Delta x) - f(x^{*}) > 0$ for all sufficiently small $\|\Delta x\| > 0$. From the Taylor expansion,

$$f(x^{*} + \Delta x) - f(x^{*}) = \Delta x^T \nabla f(x^{*}) + \tfrac{1}{2}\, \Delta x^T \nabla^2 f(x^{*})\, \Delta x$$

must be greater than 0, where we have neglected the $O(\|\Delta x\|^3)$ terms. For this to be true we must have $\nabla f(x^{*}) = 0$; otherwise we could pick $\Delta x$ so that $\Delta x^T \nabla f(x^{*}) < 0$.
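A quick numeric sanity check of the expansion (the test function and evaluation points are assumptions for illustration): the error of the second-order expansion should shrink like $\|\Delta x\|^3$.

```python
# For a smooth f, the second-order Taylor error scales like ||dx||^3:
# err / t^3 approaches a constant as the step t shrinks.
import numpy as np

def f(x):
    return np.sin(x[0]) + x[0] * x[1]**2     # assumed test function

def grad(x):
    return np.array([np.cos(x[0]) + x[1]**2, 2 * x[0] * x[1]])

def hess(x):
    return np.array([[-np.sin(x[0]), 2 * x[1]],
                     [ 2 * x[1],     2 * x[0]]])

x = np.array([0.7, -0.3])
d = np.array([1.0, 2.0])
for t in [1e-1, 1e-2, 1e-3]:
    dx = t * d
    taylor = f(x) + dx @ grad(x) + 0.5 * dx @ hess(x) @ dx
    err = abs(f(x + dx) - taylor)
    print(t, err, err / t**3)                # last column levels off
```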

Necessary Conditions

A necessary condition for a local minimum is the analogue of a one-dimensional function having a stationary point. Thus we can state the following definition and theorem.

Definition. If $\nabla f(x^{*}) = 0$ then $x^{*}$ is called a stationary point of $f(x)$.

Theorem (necessary condition based on $\nabla f$). $x^{*}$ is a local minimum of $f(x)$ only if $x^{*}$ is a stationary point of $f(x)$, that is, $\nabla f(x^{*}) = 0$.

Sufficient Conditions

In order to obtain sufficient conditions from the Taylor expansion we need to study the second term:

$$Q(\Delta x) = \tfrac{1}{2}\, \Delta x^T \nabla^2 f(x^{*})\, \Delta x.$$

This is called a quadratic form, which can be written more generally as

$$Q(z) = z^T A z,$$

where $z \in \mathbb{R}^n$ is an arbitrary column vector and $A \in \mathbb{R}^{n \times n}$ is an arbitrary matrix; the quadratic form is a scalar. The properties of the quadratic form are governed by the properties of the matrix $A$.

Definition. The matrix $A$ is called:
1. positive definite if $Q(z) > 0$ for all $z \neq 0$,
2. positive semidefinite if $Q(z) \geq 0$ for all $z \neq 0$,
3. negative definite if $Q(z) < 0$ for all $z \neq 0$,
4. negative semidefinite if $Q(z) \leq 0$ for all $z \neq 0$, and
5. indefinite if the sign of $Q(z)$ depends on $z$.

Theorem (sufficient condition based on $\nabla f$, $\nabla^2 f$). If $\nabla f(x^{*}) = 0$ and $\nabla^2 f(x^{*})$ is positive definite, then $x^{*}$ is a strong local minimum of $f(x)$.
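A minimal sketch, assuming NumPy is available, that classifies a symmetric matrix according to this definition, anticipating the eigenvalue test tabulated later (Table 1); the sample matrices are assumptions:

```python
# Classify a symmetric matrix by the sign of its quadratic form, using the
# fact that the eigenvalues of A carry that sign information.
import numpy as np

def classify(A, tol=1e-12):
    lam = np.linalg.eigvalsh(A)              # real eigenvalues (A symmetric)
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # positive definite
print(classify(np.array([[1.0,  3.0], [ 3.0, 2.0]])))  # indefinite
```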

Note this is not a necessary condition, since if $\nabla^2 f(x^{*}) = 0$ and the $O(\|\Delta x\|^3)$ terms are positive then $x^{*}$ is still a strong local minimum. Therefore a necessary condition involving $\nabla^2 f(x^{*})$ can be stated as follows:

Theorem (necessary condition based on $\nabla f$, $\nabla^2 f$). $x^{*}$ is a local minimum only if $\nabla f(x^{*}) = 0$ and $\nabla^2 f(x^{*})$ is positive semidefinite.

Definition. A stationary point which is not a local minimum or local maximum is called a saddle point.

Example: Consider the function

$$f(x) = x_1^2 - x_1 x_2 + x_2^2.$$

The gradient is easily found as

$$\nabla f(x) = (2x_1 - x_2,\; -x_1 + 2x_2)^T,$$

and thus $\nabla f(x^{*}) = 0$ gives $x^{*} = (0, 0)^T$ as the only stationary point. The Hessian

$$\nabla^2 f(x^{*}) = H = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$$

is independent of $x$ since $f(x)$ is a quadratic function. We now determine whether the Hessian is positive definite by looking at the quadratic form, which can easily be written as

$$z^T H z = 2z_1^2 - 2z_1 z_2 + 2z_2^2 = 2\left[ (z_1 - z_2)^2 + z_1 z_2 \right] = 2\left[ (z_1 + z_2)^2 - 3 z_1 z_2 \right].$$

Now we note that the first form is positive for $z_1$ and $z_2$ both negative or both positive, and that the second form is always positive for $z_1$ and $z_2$ of opposite sign. Thus $z^T H z > 0$ for all $z \neq 0$, so $\nabla^2 f(x^{*}) = H$ is positive definite and $x^{*}$ is a strong local minimum.
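The same conclusion can be checked numerically; this short sketch (an illustration, not from the notes) confirms that both eigenvalues of $H$ are positive and that sampled values of the quadratic form are positive:

```python
# Numeric confirmation of the example: both eigenvalues of H are positive,
# and sampled values of the quadratic form z^T H z are positive.
import numpy as np

H = np.array([[2.0, -1.0], [-1.0, 2.0]])
print(np.linalg.eigvalsh(H))                 # [1. 3.] -> positive definite

rng = np.random.default_rng(0)
Z = rng.standard_normal((100000, 2))         # random test vectors z
q = np.einsum('ki,ij,kj->k', Z, H, Z)        # z^T H z for every sample
print(q.min() > 0)                           # True
```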

Examples of quadratic forms

A quadratic form can always be rewritten so that the matrix is symmetric. Consider the following examples:

a) $x_1^2 + x_1 x_2 + 3x_2^2 = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 0.5 \\ 0.5 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

b) $x_1^2 + x_1 x_2 + 4 x_2 x_3 + x_3^2 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 1 & 0.5 & 0 \\ 0.5 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

c) $x_1^2 + x_2^2 + 3x_3^2 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

d) Consider the general two-dimensional case; $Q$ can always be chosen symmetric. For $x_1^2 + x_1 x_2 + 3x_2^2$ we may take

$$Q = \begin{bmatrix} 1 & a \\ b & 3 \end{bmatrix} \quad \text{such that } a + b = 1,$$

or the symmetric form

$$Q = \begin{bmatrix} 1 & c \\ c & 3 \end{bmatrix} \quad \text{with } c = \frac{a + b}{2} = 0.5.$$
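A small sketch of the symmetrization step, applied to example (a); the test vector is an arbitrary assumption:

```python
# z^T A z is a scalar, so it equals z^T A^T z, and replacing A by
# (A + A^T)/2 leaves the quadratic form unchanged.
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 3.0]])  # q = x1^2 + x1*x2 + 3*x2^2, asymmetric
S = 0.5 * (A + A.T)                     # symmetric form [[1, 0.5], [0.5, 3]]

z = np.array([0.4, -1.7])
print(z @ A @ z, z @ S @ z)             # identical values
```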

Tests for Definiteness

Definition. The leading principal minors of a matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$$

are written as

$$\Delta_1 = a_{11}, \quad \Delta_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \Delta_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \ldots, \quad \Delta_n = |A|.$$

Definition. The eigenvalues of a matrix $A$ are defined as all the scalars $\lambda_i$ which satisfy the characteristic equation

$$|A - \lambda_i I| = 0.$$

An $n \times n$ real symmetric matrix has $n$ real eigenvalues.

Theorem (Sylvester's Theorem). A quadratic form $z^T Q z$ is positive definite if and only if all the leading principal determinants of the matrix $Q$ are positive.

Table 1: Tests for definiteness

| definiteness | eigenvalues | principal minors |
|---|---|---|
| 1. positive definite | all $\lambda_i > 0$ | $\Delta_i > 0$, $1 \leq i \leq n$ |
| 2. positive semidefinite | all $\lambda_i \geq 0$ | $\Delta_i \geq 0$, $1 \leq i \leq n$ |
| 3. negative definite | all $\lambda_i < 0$ | $\Delta_1 < 0$, $\Delta_2 > 0$, $\Delta_3 < 0$, alternating |
| 4. negative semidefinite | all $\lambda_i \leq 0$ | $\Delta_1 \leq 0$, $\Delta_2 \geq 0$, $\Delta_3 \leq 0$, alternating |
| 5. indefinite | some $\lambda_i > 0$, some $\lambda_j < 0$ | none of the above |

(For the semidefinite cases the minor conditions must hold for all principal minors, not only the leading ones; the leading minors alone are conclusive only in the strictly definite cases.)

It is also obvious from the definition of positive definite that:

Theorem. The matrix $A$ is negative definite if $-A$ is positive definite.
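A sketch of Sylvester's test in code; the tridiagonal test matrix is an assumed example:

```python
# Compute the leading principal minors and check they are all positive.
import numpy as np

def leading_minors(A):
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
d = leading_minors(A)
print(d)                            # approximately [2.0, 3.0, 4.0]
print(all(m > 0 for m in d))        # True -> positive definite
```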

Examples

Determine the sign definiteness of the following:

1. $A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 5 & 3 \\ 1 & 3 & 30 \end{bmatrix}$; we have $\Delta_1 = 2$, $\Delta_2 = 9$, $\Delta_3 = 2(150 - 9) - 1(30 - 3) + 1(3 - 5) = 253$, and therefore, since all leading principal determinants are positive, $A$ is positive definite.

2. $A = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix}$; we have $\Delta_1 = 1$, $\Delta_2 = 2 - 9 = -7$, which matches none of the sign patterns in Table 1, and therefore $A$ is indefinite.

Example: Static Linear Springs

[Figure: system of linear springs in equilibrium. Blocks A and B are connected by three springs with constants $K_1$, $K_2$, $K_3$; a force $P$ acts at the displacement $x_2$.]

The blocks A and B are two frictionless bodies connected to three linear elastic springs having spring constants $K_1$, $K_2$, $K_3$. When the force $P = 0$, $x_1 = 0$, $x_2 = 0$ defines the natural position. We want to find the new positions $x_1$, $x_2$ when a nonzero force $P$ is applied.

Solution: Recall that for a linear spring Hooke's Law says that $F = Kd$, where $K$ is the spring constant and $d$ is the displacement of the spring from equilibrium. The strain energy of a spring is the work done in stretching it, that is

$$E = \int_0^x K \xi \, d\xi = \tfrac{1}{2} K x^2.$$
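A quick numerical check of the strain-energy formula; the spring constant and stretch below are assumed values:

```python
# The work integral of Hooke's law F = K*xi is approximated with the
# trapezoid rule and compared with E = (1/2) K x^2.
import numpy as np

K, x_end = 3.0, 0.5
xs = np.linspace(0.0, x_end, 100001)
F = K * xs
E_numeric = np.sum(0.5 * (F[1:] + F[:-1]) * np.diff(xs))
print(E_numeric, 0.5 * K * x_end**2)   # both approximately 0.375
```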

The work put into the system by the constant force $P$ is given by $P x_2$, and thus the potential energy of the system is

$$U = \tfrac{1}{2} K_1 x_1^2 + \tfrac{1}{2} K_2 (x_2 - x_1)^2 + \tfrac{1}{2} K_3 x_2^2 - P x_2.$$

We want to minimize this potential energy $U$ according to the principle of minimum potential energy. Thus we have

$$\frac{\partial U}{\partial x_1} = K_1 x_1^{*} - K_2 (x_2^{*} - x_1^{*}) = 0, \qquad \frac{\partial U}{\partial x_2} = K_2 (x_2^{*} - x_1^{*}) + K_3 x_2^{*} - P = 0,$$

or

$$\begin{bmatrix} K_1 + K_2 & -K_2 \\ -K_2 & K_2 + K_3 \end{bmatrix} \begin{bmatrix} x_1^{*} \\ x_2^{*} \end{bmatrix} = \begin{bmatrix} 0 \\ P \end{bmatrix},$$

which gives

$$x_1^{*} = \frac{P K_2}{K_1 K_2 + K_1 K_3 + K_2 K_3}, \qquad x_2^{*} = \frac{P (K_1 + K_2)}{K_1 K_2 + K_1 K_3 + K_2 K_3}$$

as the only stationary point. We now determine the Hessian as

$$H(x) = \begin{bmatrix} \partial^2 U/\partial x_1^2 & \partial^2 U/\partial x_1 \partial x_2 \\ \partial^2 U/\partial x_2 \partial x_1 & \partial^2 U/\partial x_2^2 \end{bmatrix} = \begin{bmatrix} K_1 + K_2 & -K_2 \\ -K_2 & K_2 + K_3 \end{bmatrix}$$

and need to know whether $H(x)$ is positive definite. The leading principal minors are

$$\Delta_1 = K_1 + K_2 > 0, \qquad \Delta_2 = (K_1 + K_2)(K_2 + K_3) - K_2^2 = K_1 K_2 + K_1 K_3 + K_2 K_3 > 0,$$

and therefore $x^{*} = (x_1^{*}, x_2^{*})^T$ is the minimum. The value of the potential at this minimum is given as

$$U_{\min} = -0.5\, \frac{P^2 (K_1 + K_2)}{K_1 K_2 + K_1 K_3 + K_2 K_3}.$$
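The closed-form solution can be verified against a direct linear solve; the spring constants and load below are assumed values:

```python
# Solve H x = (0, P)^T directly and compare with the closed-form expressions.
import numpy as np

K1, K2, K3, P = 2.0, 3.0, 4.0, 5.0
H = np.array([[K1 + K2, -K2],
              [-K2,      K2 + K3]])     # Hessian of the potential energy U
x = np.linalg.solve(H, np.array([0.0, P]))

den = K1*K2 + K1*K3 + K2*K3
print(x)                                # [0.5769... 0.9615...]
print(P*K2/den, P*(K1 + K2)/den)        # same values from the closed form
```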

Quadratic Functions

In general we write a scalar quadratic function of $n$ variables, $x \in \mathbb{R}^n$, in matrix notation as

$$q(x) = c + b^T x + \tfrac{1}{2}\, x^T A x,$$

where $c \in \mathbb{R}$, $b \in \mathbb{R}^n$, and the matrix $A \in \mathbb{R}^{n \times n}$.

We note that from Taylor's expansion any function can be approximated by a quadratic function,

$$f(x + \Delta x) \approx f(x) + \Delta x^T \nabla f(x) + \tfrac{1}{2}\, \Delta x^T \nabla^2 f(x)\, \Delta x,$$

in a small enough region. Therefore what we now say about quadratic functions $q(x)$ can be used later in approximations.

In algebraic form we write $q(x)$ as

$$q(x) = c + \sum_{i=1}^{n} b_i x_i + \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j,$$

where $b^T = [b_1 \; \cdots \; b_n]$ and $A = [a_{ij}]_{n \times n}$. Since $x^T A x$ is a quadratic form we can always assume $A$ to be symmetric.

Definition. If $A$ is positive definite then $q(x)$ is called a positive definite quadratic function.
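A short check (with assumed sample data $c$, $b$, $A$) that the matrix form and the double-sum algebraic form agree:

```python
# Evaluate q(x) both in matrix notation and as the explicit double sum.
import numpy as np

c = 1.5
b = np.array([1.0, -2.0, 0.5])
A = np.array([[4.0,  1.0,  0.0],
              [1.0,  3.0, -1.0],
              [0.0, -1.0,  2.0]])       # symmetric
x = np.array([0.3, -0.7, 1.2])

q_matrix = c + b @ x + 0.5 * x @ A @ x
q_sum = c + sum(b[i] * x[i] for i in range(3)) \
          + 0.5 * sum(A[i, j] * x[i] * x[j] for i in range(3) for j in range(3))
print(q_matrix, q_sum)                  # identical
```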

Question. Show that the gradient vector of the quadratic function $q(x) = c + b^T x + \tfrac{1}{2} x^T A x$ is given by $\nabla q(x) = b + A x$.

Solution: To find the gradient vector we require $\partial q / \partial x_k$ for $k = 1, \ldots, n$. Writing $q(x)$ algebraically and separating out the terms involving $x_k$, we have

$$q(x) = c + \sum_{i \neq k} b_i x_i + b_k x_k + \tfrac{1}{2} \Big[ a_{kk} x_k^2 + x_k \sum_{j \neq k} a_{kj} x_j + x_k \sum_{i \neq k} a_{ik} x_i + \sum_{i \neq k} \sum_{j \neq k} a_{ij} x_i x_j \Big].$$

Now since $A$ is symmetric, $a_{ik} = a_{ki}$, the two cross sums are equal and

$$q(x) = c + \sum_{i \neq k} b_i x_i + b_k x_k + \tfrac{1}{2} a_{kk} x_k^2 + x_k \sum_{j \neq k} a_{kj} x_j + \tfrac{1}{2} \sum_{i \neq k} \sum_{j \neq k} a_{ij} x_i x_j,$$

and therefore

$$\frac{\partial q}{\partial x_k} = b_k + a_{kk} x_k + \sum_{j \neq k} a_{kj} x_j = b_k + \sum_{j=1}^{n} a_{kj} x_j.$$

This result can be written in vector notation as

$$\nabla q(x) = b + A x.$$

Thus we see that the gradient of a quadratic is easily found by a matrix multiplication and a vector addition.

Question. Show that the Hessian matrix of the quadratic function $q(x) = c + b^T x + \tfrac{1}{2} x^T A x$ is equal to $A$.

Solution: We know that the Hessian matrix is given as

$$H(x) = \nabla (\nabla q(x))^T = \left[ \frac{\partial^2 q}{\partial x_i \partial x_j} \right]_{n \times n},$$

and from the above we see that

$$\frac{\partial q}{\partial x_j} = b_j + \sum_{k=1}^{n} a_{jk} x_k.$$

Therefore

$$\frac{\partial^2 q}{\partial x_i \partial x_j} = a_{ji} = a_{ij},$$

and in matrix form $H(x) = H = A$: the Hessian of a quadratic function is a constant matrix.

Thus we see that if $q(x)$ is a positive definite quadratic function then $H$ is positive definite at all points, which implies that the stationary point $x^{*}$ will be a minimum. In fact, since $\nabla q(x^{*}) = b + A x^{*} = 0$ gives $x^{*} = -A^{-1} b$, this point is the unique minimum of a positive definite quadratic function.
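A finite-difference sketch (with assumed data) confirming $\nabla q(x) = b + A x$:

```python
# Central differences of q agree with the analytic gradient b + A x.
import numpy as np

b = np.array([1.0, -1.0])
A = np.array([[3.0, 1.0], [1.0, 2.0]])

def q(x):
    return b @ x + 0.5 * x @ A @ x

x0 = np.array([0.2, 0.9])
h = 1e-6
g_fd = np.array([(q(x0 + h * np.eye(2)[k]) - q(x0 - h * np.eye(2)[k])) / (2 * h)
                 for k in range(2)])
print(g_fd)          # finite-difference gradient
print(b + A @ x0)    # analytic gradient; the two agree to ~1e-9
```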

Example

Consider the following quadratic function in two variables:

$$f(x_1, x_2) = K + d x_1 + e x_2 + a x_1^2 + b x_1 x_2 + c x_2^2,$$

where $a, b, c, d, e, K \in \mathbb{R}$. Determine the conditions under which $f(x)$ is positive definite.

Solution: Rewriting $f(x)$ in matrix form as

$$f(x) = K + \begin{bmatrix} d & e \end{bmatrix} x + \tfrac{1}{2}\, x^T \begin{bmatrix} 2a & b \\ b & 2c \end{bmatrix} x,$$

we see that the Hessian is given by

$$A = \begin{bmatrix} 2a & b \\ b & 2c \end{bmatrix}.$$

Therefore by Sylvester's theorem $A$ is positive definite if and only if

$$a > 0 \quad \text{and} \quad 4ac - b^2 > 0$$

(which together imply $c > 0$). From geometry we know that if $4ac - b^2 > 0$ then the contour plots $f(x) = \text{const.}$ define concentric ellipses.

[Figure: concentric elliptical contours showing the shape of a positive definite quadratic function in 2-D, centered at the unique minimum.]
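A sketch (with assumed coefficients) checking that these minor conditions agree with the eigenvalue test applied to the Hessian:

```python
# For f = K + d*x1 + e*x2 + a*x1^2 + b*x1*x2 + c*x2^2 the Hessian is
# [[2a, b], [b, 2c]]; the conditions a > 0 and 4ac - b^2 > 0 should match
# the eigenvalue test.
import numpy as np

a, b, c = 2.0, 1.0, 3.0
H = np.array([[2*a, b], [b, 2*c]])
print(a > 0 and 4*a*c - b**2 > 0)               # True
print(bool(np.all(np.linalg.eigvalsh(H) > 0)))  # True: positive definite
```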

As a specific example, take $a = d = K = 1$, $e = -1$, $b = 0$, and $c = 2$, so that

$$f(x) = 1 + x_1 - x_2 + x_1^2 + 2x_2^2 = 1 + \begin{bmatrix} 1 & -1 \end{bmatrix} x + \tfrac{1}{2}\, x^T \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix} x,$$

and therefore

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad A^{-1} = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{4} \end{bmatrix},$$

so

$$x^{*} = -A^{-1} b = \begin{bmatrix} -\tfrac{1}{2} \\ \tfrac{1}{4} \end{bmatrix}$$

is the unique minimum.
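A one-line verification of this minimizer, matching the numbers above:

```python
# The minimizer of q(x) = c + b^T x + (1/2) x^T A x is x* = -A^{-1} b.
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([1.0, -1.0])
x_star = -np.linalg.solve(A, b)
print(x_star)                              # [-0.5  0.25]
```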