Differentiable Functions


Let $S \subseteq \mathbb{R}^n$ be open and let $f : \mathbb{R}^n \to \mathbb{R}$. We recall that, for $x^o = (x^o_1, x^o_2, \dots, x^o_n) \in S$, the partial derivative of $f$ at the point $x^o$ with respect to the component $x_j$ is defined as
$$\frac{\partial f(x^o)}{\partial x_j} := \lim_{h \to 0} \frac{f(x^o_1, \dots, x^o_{j-1},\, x^o_j + h,\, x^o_{j+1}, \dots, x^o_n) - f(x^o)}{h},$$
provided this limit exists. If this limit exists at the points $x \in S$, then we can differentiate the resulting function $x \mapsto \partial f(x)/\partial x_j$ with respect to any of the components of $x$ to obtain
$$\frac{\partial}{\partial x_k}\left(\frac{\partial f(x)}{\partial x_j}\right) = \frac{\partial^2 f(x)}{\partial x_k\, \partial x_j},$$
the second partial derivative of $f$. In particular, if $k \neq j$ we refer to this second partial derivative as a mixed partial derivative. An important property of the mixed partial derivatives is that
$$\frac{\partial}{\partial x_k}\left(\frac{\partial f(x)}{\partial x_j}\right) = \frac{\partial}{\partial x_j}\left(\frac{\partial f(x)}{\partial x_k}\right),$$
provided these second derivatives exist and are continuous. Higher order derivatives are defined in a similar manner.

A real-valued function $f : S \to \mathbb{R}$ will be said to be of class $C^{(k)}$ on the open set $S$ provided it is continuous and possesses continuous partial derivatives of all orders up to and including $k$. It will be said to be of class $C^{(\infty)}$ on $S$ if it is of class $C^{(k)}$ for all integers $k$. It will be said to be of class $C^{(k)}$ on an arbitrary set $S$ provided it is of class $C^{(k)}$ on a neighborhood of that set. We will also use the notation $C$ and $D$ for the classes $C^{(1)}$ and $C^{(2)}$ respectively. Alternate notations for the partial derivatives will also be used, for example,
$$f_{x_j} = \frac{\partial f}{\partial x_j}, \qquad f_{x_k, x_j} = \frac{\partial}{\partial x_k}\left(\frac{\partial f}{\partial x_j}\right), \quad \text{etc.}$$

The notion of differentiability of functions of several variables is related to the existence of partial derivatives but is not coincident with the existence of the partials. Indeed, we have the following definition: a function $f : S \to \mathbb{R}^m$, where $S \subseteq \mathbb{R}^n$ is an open set, is said to be differentiable at a point $x^o \in S$ provided there is a linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$ such that
$$\lim_{h \to 0} \frac{\|f(x^o + h) - f(x^o) - L\,h\|}{\|h\|} = 0.$$
In the case that such a linear transformation $L$ exists, it is called the derivative (sometimes the Fréchet derivative, or the differential) of the function $f$ at the point $x^o$.
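To see the equality of mixed partials in action, we can compare nested centered finite-difference approximations of the two second partial derivatives. The following Python sketch is purely illustrative; the test function and step size are our own choices, not part of the development above.

```python
import math

# A smooth test function of two variables (an arbitrary illustrative choice).
def f(x, y):
    return x**2 * math.sin(y) + x * y**3

# d/dx ( d/dy f ) via nested centered differences.
def d2f_dxdy(f, x, y, h=1e-4):
    dfdy = lambda a, b: (f(a, b + h) - f(a, b - h)) / (2 * h)
    return (dfdy(x + h, y) - dfdy(x - h, y)) / (2 * h)

# d/dy ( d/dx f ), differentiating in the opposite order.
def d2f_dydx(f, x, y, h=1e-4):
    dfdx = lambda a, b: (f(a + h, b) - f(a - h, b)) / (2 * h)
    return (dfdx(x, y + h) - dfdx(x, y - h)) / (2 * h)

x0, y0 = 1.3, 0.7
print(d2f_dxdy(f, x0, y0))  # exact value is 2*x0*cos(y0) + 3*y0**2 ~ 3.4586
print(d2f_dydx(f, x0, y0))  # agrees, as the equality of mixed partials predicts
```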

There are various notations for the differential. Since the linear transformation depends on the point $x^o$, we may denote it as $L = L(f; x^o)$ when we need to be specific. Another notation will be $f'(x^o; h) = L(f; x^o)(h)$.

Now, in the case that $f : S \to \mathbb{R}$, the linear transformation is a linear map from $\mathbb{R}^n$ to $\mathbb{R}$ and is therefore called a linear functional. This linear functional can be realized by the application of a dot product. Given any fixed vector $z \in \mathbb{R}^n$, it is clear that the map of $\mathbb{R}^n \to \mathbb{R}$ given by $\ell_z(h) := z \cdot h$ is a linear map, a fact which follows from the elementary properties of the dot product. On the other hand, it is well known that, given the standard basis of unit vectors, every such linear transformation is realized as a $1 \times n$ matrix, so that, if $y$ denotes this matrix, the linear functional is given by $h \mapsto y\, h$. In other words, there is a one-to-one correspondence between vectors in $\mathbb{R}^n$ and linear transformations from $\mathbb{R}^n$ to $\mathbb{R}$.

It is shown in advanced calculus texts that if a real-valued function is differentiable on an open set, then the partial derivatives exist and the linear functional that defines the derivative is given by the map
$$h \mapsto \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right) \cdot (h_1, h_2, \dots, h_n) = \sum_{i=1}^n f_{x_i} h_i.$$
Here we have suppressed the dependence on the point $x^o$. (Conversely, it can be shown that if the partial derivatives exist at a point $x^o$ and are all continuous in a neighborhood of that point, then the function $f$ is differentiable at $x^o$.) The vector $\left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right)$ is called the gradient of $f$ and is written variously as $\operatorname{grad} f$ or $\nabla f$. Hence we write the differential as
$$f'(x^o; h) = h \cdot \nabla f(x^o).$$

As a simple example in $\mathbb{R}^3$, suppose that $f(x, y, z) := x^2 + 3y^2 + 2z^2$. Then $\operatorname{grad} f(x, y, z) = (2x, 6y, 4z)$ so that, for example, at the point $x^o = (1, 2, 1)$, $\operatorname{grad} f(1, 2, 1) = (2, 12, 4)$. Note that the differential at this point is the map $h \mapsto \nabla f(1, 2, 1) \cdot h$, or $(h_1, h_2, h_3) \mapsto 2h_1 + 12h_2 + 4h_3$.
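We can confirm this example numerically: the differential $h \mapsto \nabla f(x^o) \cdot h$ should reproduce the increment $f(x^o + h) - f(x^o)$ up to first order. The short Python sketch below does this check; the particular displacement $h$ is an arbitrary illustrative choice.

```python
import numpy as np

# The example from the text: f(x, y, z) = x^2 + 3y^2 + 2z^2.
f = lambda v: v[0]**2 + 3*v[1]**2 + 2*v[2]**2
grad_f = lambda v: np.array([2*v[0], 6*v[1], 4*v[2]])

x0 = np.array([1.0, 2.0, 1.0])
print(grad_f(x0))            # [ 2. 12.  4.] = grad f(1, 2, 1)

# The differential is h |-> grad f(x0) . h; for small h it approximates
# the increment f(x0 + h) - f(x0).
h = 1e-6 * np.array([1.0, -2.0, 0.5])
print(grad_f(x0) @ h)        # 2*h1 + 12*h2 + 4*h3
print(f(x0 + h) - f(x0))     # nearly the same number
```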

If the function $f$ is of class $C^{(2)}$, then the second partial derivatives are defined and continuous. We can consider the map from $S \to \mathbb{R}^n$ given by $\nabla f$ and ask for its derivative. This derivative is again a linear transformation on $\mathbb{R}^n$. It can be shown that this second derivative is then a bilinear form on $\mathbb{R}^n \times \mathbb{R}^n$ which can be realized in terms of a matrix, represented relative to the standard basis as
$$Q(x^o) := \begin{pmatrix}
\dfrac{\partial^2 f(x^o)}{\partial x_1\, \partial x_1} & \dfrac{\partial^2 f(x^o)}{\partial x_1\, \partial x_2} & \cdots & \dfrac{\partial^2 f(x^o)}{\partial x_1\, \partial x_n} \\
\dfrac{\partial^2 f(x^o)}{\partial x_2\, \partial x_1} & \dfrac{\partial^2 f(x^o)}{\partial x_2\, \partial x_2} & \cdots & \dfrac{\partial^2 f(x^o)}{\partial x_2\, \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x^o)}{\partial x_n\, \partial x_1} & \dfrac{\partial^2 f(x^o)}{\partial x_n\, \partial x_2} & \cdots & \dfrac{\partial^2 f(x^o)}{\partial x_n\, \partial x_n}
\end{pmatrix},$$
which, since the second partial derivatives are continuous, is a symmetric matrix. The second differential, or second Fréchet derivative, of the function is then given by
$$f''(x^o; h, k) := k^\top Q\, h.$$
The matrix $Q$ is referred to as the Hessian matrix of $f$.

Clearly the mapping of $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by $(h, k) \mapsto f''(x^o; h, k) = k^\top Q\, h$ is a bilinear form, a form that is linear in $h$ for every fixed $k$ and linear in $k$ for each fixed $h$. We note that the values of this form are completely determined by the values of $f''(x^o; h, h)$ for $h \in \mathbb{R}^n$. This can be seen by the following computation, which is reminiscent of the binomial theorem:
$$f''(x^o; h + k, h + k) = f''(x^o; h, h + k) + f''(x^o; k, h + k) = f''(x^o; h, h) + 2 f''(x^o; h, k) + f''(x^o; k, k),$$
and hence
$$f''(x^o; h, k) = \tfrac{1}{2}\left(f''(x^o; h + k, h + k) - f''(x^o; h, h) - f''(x^o; k, k)\right).$$

Let us pause for a concrete example. Consider the case $n = 2$ and write the variables as $(x, y)$ rather than $(x_1, x_2)$. We continue to write $(h_1, h_2)$ for $h$ and $k = (k_1, k_2)$. Then, for $f(x, y) = x^3 y^2$, we have $\nabla f = (3x^2 y^2,\, 2x^3 y)$, so that
$$f'(x; h) = 3x^2 y^2\, h_1 + 2x^3 y\, h_2,$$
while
$$f''(x^o; h, k) = (k_1\ \ k_2)\begin{pmatrix} 6xy^2 & 6x^2 y \\ 6x^2 y & 2x^3 \end{pmatrix}\begin{pmatrix} h_1 \\ h_2 \end{pmatrix} = 6xy^2\, h_1 k_1 + 6x^2 y\,(h_2 k_1 + h_1 k_2) + 2x^3\, h_2 k_2.$$
In this example the matrix
$$Q = \begin{pmatrix} 6xy^2 & 6x^2 y \\ 6x^2 y & 2x^3 \end{pmatrix}$$
is the Hessian matrix. Note that it is symmetric.
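The polarization identity above can also be checked numerically for this Hessian. The Python sketch below evaluates the bilinear form $k^\top Q h$ directly and then recovers it from the quadratic form $h \mapsto f''(x^o; h, h)$ alone; the evaluation point and the vectors $h$, $k$ are arbitrary illustrative choices.

```python
import numpy as np

# Hessian of f(x, y) = x^3 y^2 at a point, from the computation above.
def hessian(x, y):
    return np.array([[6*x*y**2, 6*x**2*y],
                     [6*x**2*y, 2*x**3  ]])

# The bilinear form f''(x^o; h, k) = k^T Q h.
def second_diff(Q, h, k):
    return k @ Q @ h

Q = hessian(1.5, -2.0)
h = np.array([1.0,  2.0])
k = np.array([-3.0, 0.5])

direct = second_diff(Q, h, k)
# Recover f''(h, k) from values of the quadratic form alone (polarization).
recovered = 0.5 * (second_diff(Q, h + k, h + k)
                   - second_diff(Q, h, h)
                   - second_diff(Q, k, k))
print(direct, recovered)  # equal, because Q is symmetric
```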

A particularly instructive and useful example for our future work is given in the case that $f$ is the quadratic function
$$f(x) = \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n a_{ij}\, x_i x_j + \sum_{i=1}^n b_i x_i + c,$$
where the $a_{ij} = a_{ji}$, the $b_i$, and $c$ are given constants. In matrix form, we write
$$f(x) = \frac{1}{2}\, x^\top A\, x + b \cdot x + c,$$
where the $n \times n$ matrix $A = (a_{ij})$ is symmetric.

For a given index $k$, the variable $x_k$ is repeated in pairs in the first term defining $f$, namely when the index $j = k$ and when the index $i = k$. (This is the reason for the factor of $1/2$ in the definition.) So, for example, the derivative of the first term with respect to $x_1$ is
$$\frac{1}{2}\left(\sum_{j=1}^n a_{1j} x_j + \sum_{i=1}^n a_{i1} x_i\right).$$
Hence, differentiating the expression for $f$ with respect to $x_k$, we obtain
$$\frac{\partial f}{\partial x_k} = \frac{1}{2}\left(\sum_{i=1}^n a_{ik} x_i + \sum_{j=1}^n a_{kj} x_j\right) + b_k = \sum_{j=1}^n a_{kj} x_j + b_k,$$
since $a_{ik} = a_{ki}$ for all $i$ by hypothesis. Clearly, from this last form we also have that
$$\frac{\partial^2 f}{\partial x_i\, \partial x_j} = a_{ij}.$$
It follows that the gradient $\nabla f$ and the Hessian $Q$ are given by
$$\nabla f(x) = \left(\sum_{j=1}^n a_{1j} x_j + b_1, \dots, \sum_{j=1}^n a_{nj} x_j + b_n\right) = \left(\sum_{j=1}^n a_{1j} x_j, \dots, \sum_{j=1}^n a_{nj} x_j\right) + b \quad \text{and} \quad Q = (a_{ij}),$$
or
$$\nabla f(x) = A\,x + b, \qquad Q = A.$$
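Since this quadratic model recurs throughout optimization, it is worth verifying the formula $\nabla f(x) = Ax + b$ against a finite-difference gradient. In the Python sketch below, the random symmetric matrix, vector, and test point are all illustrative choices.

```python
import numpy as np

# Quadratic f(x) = (1/2) x^T A x + b . x + c with A symmetric.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2            # symmetrize, so a_ij = a_ji
b = rng.standard_normal(4)
c = 1.7

f = lambda x: 0.5 * x @ A @ x + b @ x + c
grad = lambda x: A @ x + b   # the closed form derived above

# Compare the closed form with a centered finite-difference gradient.
x0 = rng.standard_normal(4)
eps = 1e-6
fd = np.array([(f(x0 + eps*e) - f(x0 - eps*e)) / (2*eps) for e in np.eye(4)])
print(np.allclose(fd, grad(x0), atol=1e-5))  # True: grad f = A x + b
```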

Note, in particular, that if $\varphi(x) := x \cdot x$ then $\nabla \varphi(x) = 2x$, and if $\nu(x) := \|x\|$ then $\nabla \nu(x) = x/\|x\|$.

Now, the Hessian that appears in the above formula is a symmetric matrix, and for such matrices we have the following definition.

Definition 1.1 An $n \times n$ symmetric matrix $Q$ is said to be positive semi-definite provided, for all $x \in \mathbb{R}^n$, $\langle x, Qx \rangle \geq 0$. The matrix $Q$ is said to be positive definite provided, for all $x \in \mathbb{R}^n$ with $x \neq 0$, $\langle x, Qx \rangle > 0$.

We emphasize that the notions of positive definite and positive semi-definite are defined only for symmetric matrices.
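In practice one rarely checks $\langle x, Qx \rangle$ for every $x$; a standard equivalent criterion (which we do not prove here) is that a symmetric matrix is positive definite exactly when all of its eigenvalues are positive, and positive semi-definite when they are all nonnegative. The Python sketch below uses this criterion; the test matrices are illustrative choices.

```python
import numpy as np

# Eigenvalue test for positive definiteness of a symmetric matrix Q:
# all eigenvalues > 0  <=>  <x, Qx> > 0 for every nonzero x.
def is_positive_definite(Q, tol=1e-12):
    eigvals = np.linalg.eigvalsh(Q)   # eigvalsh: eigenvalues of a symmetric matrix
    return bool(np.all(eigvals > tol))

Q1 = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
Q2 = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues -1 and 3
print(is_positive_definite(Q1))  # True
print(is_positive_definite(Q2))  # False: <x, Q2 x> = -2 < 0 for x = (1, -1)
```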

It is important in the theory of optimization to interpret the differentials of $f$ as directional derivatives. We begin with a fixed unit vector (or direction) $\hat{u} \in \mathbb{R}^n$ and a real-valued function $f$ defined and continuous in a neighborhood of the point $x^o$. We assume that the neighborhood is convex, i.e., that for any two points in the neighborhood, the line segment joining the two points is completely contained in the neighborhood. (In $\mathbb{R}^n$ it is easy to see that a $\delta$-neighborhood is such a set.) Then the directional derivative of $f$ at $x^o$ in the direction $\hat{u}$ is defined to be
$$(D_{\hat{u}} f)(x^o) := \lim_{t \to 0} \frac{f(x^o + t\hat{u}) - f(x^o)}{t}.$$

As a simple example, consider $f(x, y) := x^2 + 3xy$ and let $x^o = (2, 0)$. Let $\hat{u} = (1/\sqrt{2}, -1/\sqrt{2})$. Then
$$x^o + t\hat{u} = \left(2 + \frac{t}{\sqrt{2}},\; -\frac{t}{\sqrt{2}}\right),$$
$$f(x^o + t\hat{u}) = \left(2 + \frac{t}{\sqrt{2}}\right)^2 + 3\left(2 + \frac{t}{\sqrt{2}}\right)\left(-\frac{t}{\sqrt{2}}\right) = 4 - \sqrt{2}\, t - t^2,$$
and so
$$(D_{\hat{u}} f)(x^o) = \lim_{t \to 0} \frac{\left(4 - \sqrt{2}\, t - t^2\right) - 4}{t} = -\sqrt{2}.$$

Now, if the function $f : \mathbb{R}^n \to \mathbb{R}$ is of class $C^{(k)}$ on $S$ and, for some $\delta > 0$, $y := x + t\hat{u} \in S$ for $-\delta < t < \delta$, then the function
$$\varphi(t) := f(x + t\hat{u}) = f(x_1 + tu_1,\, x_2 + tu_2, \dots, x_n + tu_n), \quad -\delta < t < \delta,$$
is of class $C^{(k)}$ in $t$. If, in particular, $f \in C^{(1)}$, then by the chain rule for differentiation we have
$$\varphi'(t) = f_{x_1}(x + t\hat{u})\, u_1 + \cdots + f_{x_n}(x + t\hat{u})\, u_n,$$
and so $\varphi'(0) = f'(x; \hat{u})$. We often write
$$\left.\frac{d}{dt} f(x + t\hat{u})\right|_{t=0} = f'(x; \hat{u}).$$
This shows how to compute the directional derivative, since $f'(x; \hat{u}) = \nabla f(x) \cdot \hat{u}$. In the simple example given above, since $f(x, y) = x^2 + 3xy$, we have $\nabla f = (2x + 3y,\, 3x)$, so that
$$(D_{\hat{u}} f)(x^o) = \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right) \cdot (4, 6) = \tfrac{4}{\sqrt{2}} - \tfrac{6}{\sqrt{2}} = -\sqrt{2},$$
as before. Clearly, if $\hat{u} = e_j$, the standard $j$-th unit vector, then we recover the usual $j$-th partial derivative.
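The agreement between the difference quotient and the gradient formula in this example is easy to observe numerically, as the following Python sketch shows; the shrinking sequence of step sizes is an illustrative choice.

```python
import numpy as np

# The worked example: f(x, y) = x^2 + 3xy at x0 = (2, 0),
# direction u = (1/sqrt(2), -1/sqrt(2)).
f = lambda v: v[0]**2 + 3*v[0]*v[1]
grad_f = lambda v: np.array([2*v[0] + 3*v[1], 3*v[0]])

x0 = np.array([2.0, 0.0])
u = np.array([1.0, -1.0]) / np.sqrt(2)

# Difference quotient (f(x0 + t u) - f(x0)) / t for shrinking t ...
for t in [1e-2, 1e-4, 1e-6]:
    print((f(x0 + t*u) - f(x0)) / t)   # tends to -sqrt(2) = -1.41421...

# ... agrees with the gradient formula D_u f(x0) = grad f(x0) . u.
print(grad_f(x0) @ u)                  # -1.4142135...
```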

We now turn to the multidimensional analog of Taylor's formula. Again, we assume that $f$ is a real-valued function defined on an open set $S$. Then, if $x \in S$, we can choose $h$ so that, for every $t$, $0 \leq t \leq 1$, the line segment parameterized by $y(t) = x + th$ lies in $S$. Then Taylor's formula for $f$ can be derived from the Taylor formula for the function $\varphi(t) := f(x + th)$. Indeed, $\varphi : [0, 1] \to \mathbb{R}$ and, if $f \in C^{(1)}$, we have $\varphi(1) = f(x + h)$, $\varphi(0) = f(x)$, and $\varphi'(\theta) = f'(x + \theta h; h)$. Using the Mean Value Theorem for functions of one variable, we have that there is a number $\theta_1$, $0 < \theta_1 < 1$, such that
$$\varphi(1) = \varphi(0) + \varphi'(\theta_1) = \varphi(0) + \int_0^1 \varphi'(\theta)\, d\theta,$$
the second relation holding by integration. In terms of the original function $f$ we then have what we call the first-order Taylor expansion
$$f(x + h) = f(x) + f'(x + \theta_1 h; h) = f(x) + \int_0^1 f'(x + \theta h; h)\, d\theta.$$

Now, if the function $f \in C^{(2)}$ then so is the function $\varphi$ and we have, using the second-order Taylor expansion for a function of one variable,
$$\varphi(1) = \varphi(0) + \varphi'(0) + \tfrac{1}{2}\varphi''(\theta), \quad \text{where } \theta \in (0, 1),$$
or, in terms of the integral remainder term,
$$\varphi(1) = \varphi(0) + \varphi'(0) + \int_0^1 (1 - \theta)\, \varphi''(\theta)\, d\theta.$$
(To derive this form, start with the first-order Taylor expansion for $\varphi$ with integral remainder and integrate by parts.) In terms of the original function $f$, we have
$$f(x + h) = f(x) + f'(x; h) + \tfrac{1}{2} f''(x + \theta h; h, h) = f(x) + f'(x; h) + \int_0^1 (1 - \theta)\, f''(x + \theta h; h, h)\, d\theta.$$

Now, if we write
$$r_2(x, h) := \int_0^1 (1 - \theta)\left[f''(x + \theta h; h, h) - f''(x; h, h)\right] d\theta,$$
we can write the second-order Taylor expansion in the form
$$f(x + h) = f(x) + f'(x; h) + \tfrac{1}{2} f''(x; h, h) + r_2(x, h).$$
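Because $f''$ is continuous, the bracketed integrand above tends to zero with $h$, so the remainder $r_2(x, h)$ vanishes faster than $\|h\|^2$. We can observe this numerically for the earlier example $f(x, y) = x^3 y^2$; in the Python sketch below, the base point, direction, and scale sequence are illustrative choices.

```python
import numpy as np

# Second-order Taylor expansion of f(x, y) = x^3 y^2:
# f(x + h) = f(x) + grad f(x) . h + (1/2) h^T Q(x) h + r2(x, h),
# where the remainder r2 should vanish faster than |h|^2.
f    = lambda v: v[0]**3 * v[1]**2
grad = lambda v: np.array([3*v[0]**2 * v[1]**2, 2*v[0]**3 * v[1]])
hess = lambda v: np.array([[6*v[0]*v[1]**2, 6*v[0]**2*v[1]],
                           [6*v[0]**2*v[1], 2*v[0]**3     ]])

x = np.array([1.0, 2.0])
d = np.array([0.6, -0.8])          # a fixed unit direction
for s in [1e-1, 1e-2, 1e-3]:
    h = s * d
    r2 = f(x + h) - (f(x) + grad(x) @ h + 0.5 * h @ hess(x) @ h)
    print(s, r2 / s**2)            # the ratio r2 / |h|^2 tends to 0 with s
```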