
© Robert C. Gunning


MATHEMATICS 218: NOTES

Robert C. Gunning

January 27, 2010


Introduction

These are notes for honors courses on the calculus of several variables given at Princeton University during the academic years 2007-2010. I would particularly like to thank Lillian Pierce, who taught the course with me during the fall of 2008 and who suggested, and was largely responsible for, supplementing the theoretical material with problem sets emphasizing the explicit calculations that are essential to understanding and using this material. The basis for these notes is the set of notes from the fall 2008 course taken by Robert Haraaway, Adam Hesterberg, Jay Holt and Alex Schiller, whom I should also like to thank for their careful and thorough notes and their many suggestions.

Robert C. Gunning
Fine Hall
Princeton University
January 2010


Contents

Introduction

1 Background
  1.1 Norms
  1.2 Topological Preliminaries
  1.3 Continuous Mappings

2 Differentiable Mappings
  2.1 The Derivative
  2.2 The Chain Rule
  2.3 Higher Derivatives
  2.4 Functions

3 The Rank Theorem
  3.1 The Inverse Mapping Theorem
  3.2 The Implicit Function Theorem
  3.3 The Rank Theorem

4 Integration
  4.1 Riemann Integral
  4.2 Fubini's Theorem
  4.3 Limits and Improper Integrals
  4.4 Change of Variables

5 Differential Forms
  5.1 Line Integrals
  5.2 Differential Forms
  5.3 Stokes's Theorem

A Exterior Algebra


Chapter 1: Background

1.1 Norms

Some fairly standard set-theoretic notation and terminology will be used consistently. In particular, a ∈ A indicates that a is an element of, or a point in, the set A, while A ⊂ B indicates that A is a subset of B, possibly equal to B. The union A ∪ B of sets A and B consists of all elements that belong to either A or B or both A and B, while the intersection A ∩ B consists of all elements that belong to both A and B. The complement A ∖ B consists of all elements of A that do not belong to B, whether or not B is a subset of A. If the set A is understood from context it may be omitted; so ∁B consists of all elements in a set A that are not in B, where the set A is understood, and if A and B are both understood to be subsets of a set E then A ∖ B = A ∩ (∁B).

A mapping f : A → B associates to each element a ∈ A an element f(a) ∈ B, the image of the point a. The mapping f is injective if distinct elements of A have distinct images in B, surjective if every element of B is the image of some element of A, and bijective if it is both injective and surjective; thus f is bijective if and only if it is a one-to-one mapping between the sets A and B, and consequently has an inverse mapping f⁻¹ : B → A that is also bijective.

The n-dimensional real vector space will be denoted by R^n, following the Bourbaki convention. It will be viewed as the space of real column vectors of length n, with elements

    x = (x_1, ..., x_n)^t    (a column vector);

but for notational convenience a vector x ∈ R^n sometimes will be indicated by listing its coordinates in the form x = {x_j} = {x_1, ..., x_n}. The origin in R^n is the vector 0 = {0} = {0, 0, ..., 0}. When considering a function f, the notation f(x_1, ..., x_n) also will be used when viewing the coordinates of the vector x as individual variables; but the vector x still will be viewed as a column vector. Addition is the usual addition of vectors by adding their coordinates; and scalar multiplication amounts to multiplying

all the entries of the vector by a scalar, so that ax = (a x_1, ..., a x_n)^t for any a ∈ R. Linear mappings between vector spaces are described by matrix multiplication; for example, a 2 × 3 matrix A = {a_ij} describes the mapping A : R^3 → R^2 that takes a vector x = {x_j} ∈ R^3 to the vector

    Ax = ( a_11 x_1 + a_12 x_2 + a_13 x_3 , a_21 x_1 + a_22 x_2 + a_23 x_3 )^t ∈ R^2.

There are various norms measuring the sizes or lengths of vectors in R^n, only two of which will be considered here: the Euclidean norm or Cartesian norm of a vector x = {x_j} is defined by ‖x‖_2 = √(Σ_{j=1}^n x_j^2) = √(x_1^2 + ··· + x_n^2) (with the non-negative square root); the supremum norm or sup norm of a vector x = {x_j} is defined by ‖x‖_∞ = max_{1≤j≤n} |x_j|. In general, a norm on the vector space R^n is a mapping R^n → R that associates to any x ∈ R^n a real number ‖x‖ ∈ R with the following properties:

1. positivity: ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0;
2. homogeneity: ‖cx‖ = |c| ‖x‖ for any c ∈ R;
3. the triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.

That the supremum norm satisfies these three properties is obvious, except perhaps for the triangle inequality; to verify that, if x, y ∈ R^n then since |x_j| ≤ ‖x‖_∞ and |y_j| ≤ ‖y‖_∞ for 1 ≤ j ≤ n it follows that |x_j + y_j| ≤ |x_j| + |y_j| ≤ ‖x‖_∞ + ‖y‖_∞ for 1 ≤ j ≤ n, and consequently that ‖x + y‖_∞ = max_{1≤j≤n} |x_j + y_j| ≤ ‖x‖_∞ + ‖y‖_∞. That the Euclidean norm satisfies these three properties also is obvious, except for the triangle inequality; it is convenient to demonstrate that together with an inequality involving the inner product of two vectors, which is defined by

(1.1)    (x, y) = Σ_{j=1}^n x_j y_j    for vectors x = {x_j} and y = {y_j} ∈ R^n.

The inner product is quite commonly written (x, y) = x · y and called the dot product of the two vectors x and y. It is characterized by the following properties:

1. linearity: (c_1 x_1 + c_2 x_2, y) = c_1 (x_1, y) + c_2 (x_2, y) for any c_1, c_2 ∈ R;
2. symmetry: (x, y) = (y, x);
3. positivity: (x, x) ≥ 0, and (x, x) = 0 if and only if x = 0.
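Since both norms and the inner product reappear constantly below, it may help to see them computed concretely. The following Python sketch (the function names are illustrative, not part of the notes) checks the triangle inequality for both norms and the relation ‖x‖_2^2 = (x, x) on a sample vector:

```python
import math

def sup_norm(x):
    # the sup norm: ‖x‖_∞ = max_j |x_j|
    return max(abs(t) for t in x)

def euclid_norm(x):
    # the Euclidean norm: ‖x‖_2 = sqrt(sum_j x_j^2), the non-negative root
    return math.sqrt(sum(t * t for t in x))

def inner(x, y):
    # the inner product (1.1): (x, y) = sum_j x_j y_j
    return sum(a * b for a, b in zip(x, y))

x, y = [3.0, -4.0, 12.0], [1.0, 2.0, -2.0]
s = [a + b for a, b in zip(x, y)]
# the triangle inequality holds for both norms
assert sup_norm(s) <= sup_norm(x) + sup_norm(y)
assert euclid_norm(s) <= euclid_norm(x) + euclid_norm(y)
# the Euclidean norm comes from the inner product: ‖x‖_2^2 = (x, x)
assert math.isclose(euclid_norm(x) ** 2, inner(x, x))
```

The checks pass for this sample, of course; the proofs below show they hold for all vectors.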
These three properties follow almost immediately from the defining equation (1.1). It is apparent from symmetry that the inner product (x, y) is also a linear function of the vector y. The Euclidean norm can be defined in terms of the inner product by

(1.2)    ‖x‖_2 = √(x, x)    (with the non-negative square root),

as is quite clear from the definitions. Conversely the inner product can be defined in terms of the norm by

(1.3)    (x, y) = (1/4)‖x + y‖_2^2 − (1/4)‖x − y‖_2^2,

since ‖x + y‖_2^2 − ‖x − y‖_2^2 = (x + y, x + y) − (x − y, x − y) = (x, x) + 2(x, y) + (y, y) − (x, x) + 2(x, y) − (y, y) = 4(x, y). Equation (1.3) is called the polarization identity.

Theorem 1.1 (i) |(x, y)| ≤ ‖x‖_2 ‖y‖_2 for any x, y ∈ R^n; and this is an equality if and only if the two vectors are linearly dependent. (ii) ‖x + y‖_2 ≤ ‖x‖_2 + ‖y‖_2 for any x, y ∈ R^n; and this is an equality if and only if one of the two vectors is a non-negative multiple of the other.

Proof: If x, y ∈ R^n where y ≠ 0, introduce for t ∈ R the continuous function

(1.4)    f(t) = ‖x + t y‖_2^2 = Σ_{j=1}^n (x_j + t y_j)^2 = Σ_j x_j^2 + 2t Σ_j x_j y_j + t^2 Σ_j y_j^2 = ‖x‖_2^2 + 2t (x, y) + t^2 ‖y‖_2^2.

It is clear that the function f(t) becomes large for |t| large, since ‖y‖_2 > 0 by assumption, so it must take a minimum value at some point. Since f′(t) = 2(x, y) + 2t ‖y‖_2^2 there is a single point at which f′(t) = 0, the point t_0 = −(x, y)/‖y‖_2^2, so this must be the point at which the function f(t) takes its minimum value; and the minimum value is

(1.5)    f(t_0) = ‖x‖_2^2 − 2 (x, y)^2/‖y‖_2^2 + (x, y)^2/‖y‖_2^2 = ( ‖x‖_2^2 ‖y‖_2^2 − (x, y)^2 ) / ‖y‖_2^2.

It is clear from (1.4) that f(t) ≥ 0 at all points t ∈ R, so in particular at the point t_0, and that yields the inequality (i). It is clear from (1.5) that this inequality is an equality if and only if f(t_0) = 0, hence if and only if x + t_0 y = 0, since f(t_0) = ‖x + t_0 y‖_2^2; and since y ≠ 0 that is just the condition that the vectors x and y are linearly dependent. Next from (1.4) with t = 1 and from the inequality (i) it follows that ‖x + y‖_2^2 = ‖x‖_2^2 + 2(x, y) + ‖y‖_2^2 ≤ ‖x‖_2^2 + 2‖x‖_2‖y‖_2 + ‖y‖_2^2 = (‖x‖_2 + ‖y‖_2)^2, which yields the inequality (ii). This inequality is an equality if and only if (x, y) = ‖x‖_2‖y‖_2, or equivalently if and only if (i) is an equality and (x, y) ≥ 0; and since

y ≠ 0 it follows from what has already been demonstrated that the inequality (i) is an equality if and only if x = c y for some real number c, and then (x, y) = c‖y‖_2^2 ≥ 0 if and only if c ≥ 0. That suffices for the proof.

The very useful inequality (i) in the preceding theorem, called the Cauchy-Schwarz inequality, can be written |(x, y)| / (‖x‖_2 ‖y‖_2) ≤ 1 if x ≠ 0, y ≠ 0; as a consequence there is an angle θ, called the angle between the nonzero vectors x and y, that is determined uniquely up to a multiple of 2π by

(1.6)    cos θ = (x, y) / (‖x‖_2 ‖y‖_2).

In particular if θ = 0 or π, so cos θ = ±1, then by Theorem 1.1 (i) the two vectors x and y are linearly dependent; they are parallel and in the same direction if θ = 0, so that (x, y) > 0, or parallel and in the opposite direction if θ = π, so that (x, y) < 0. The two vectors are orthogonal or perpendicular to one another when the angle is either π/2 or 3π/2, so that (x, y) = 0. The geometrical interpretation of the norm ‖x‖_2 in terms of the inner product makes that norm particularly useful in many applications.

In general, two norms ‖x‖_a and ‖x‖_b on a vector space R^n are equivalent if there are nonzero constants c_a and c_b such that ‖x‖_a ≤ c_a ‖x‖_b and ‖x‖_b ≤ c_b ‖x‖_a for all x ∈ R^n. The Euclidean and supremum norms are equivalent, since they are related by the very useful inequalities

(1.7)    ‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞    for any x ∈ R^n.

To verify these inequalities, if x = {x_j} ∈ R^n then |x_j| ≤ ‖x‖_∞ for 1 ≤ j ≤ n, so ‖x‖_2^2 = Σ_{j=1}^n x_j^2 ≤ Σ_{j=1}^n ‖x‖_∞^2 = n ‖x‖_∞^2, hence ‖x‖_2 ≤ √n ‖x‖_∞; on the other hand ‖x‖_2^2 = Σ_{j=1}^n x_j^2 ≥ x_j^2, so ‖x‖_2 ≥ |x_j| for 1 ≤ j ≤ n, hence ‖x‖_2 ≥ ‖x‖_∞. It follows from (1.7) that whenever ‖x‖_2 is small then ‖x‖_∞ is also small, and conversely; so for many purposes the two norms are interchangeable. If it is not really necessary to specify which norm is meant, the notation ‖x‖ will be used, meaning either ‖x‖_2 or ‖x‖_∞.
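Both the Cauchy-Schwarz inequality and the equivalence inequalities (1.7) are easy to spot-check numerically. The following Python sketch is an illustration only (the tolerances guard against floating-point rounding); it also exhibits the equality case of Theorem 1.1 (i) for linearly dependent vectors:

```python
import math, random

def inner(x, y):
    # the inner product (1.1)
    return sum(a * b for a, b in zip(x, y))

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    nx2 = math.sqrt(inner(x, x))          # ‖x‖_2
    nxs = max(abs(t) for t in x)          # ‖x‖_∞
    # Cauchy-Schwarz: |(x, y)| <= ‖x‖_2 ‖y‖_2
    assert abs(inner(x, y)) <= nx2 * math.sqrt(inner(y, y)) + 1e-9
    # equivalence (1.7): ‖x‖_∞ <= ‖x‖_2 <= sqrt(n) ‖x‖_∞
    assert nxs <= nx2 + 1e-12 and nx2 <= math.sqrt(n) * nxs + 1e-9

# equality in Cauchy-Schwarz for linearly dependent vectors
x = [1.0, 2.0, -3.0]
y = [-2.0, -4.0, 6.0]   # y = -2x, so the vectors are linearly dependent
assert math.isclose(abs(inner(x, y)),
                    math.sqrt(inner(x, x)) * math.sqrt(inner(y, y)))
```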
Of course some care must be taken; in particular ‖x‖ should have the same meaning in any single equation, or usually throughout any single proof, for quite obvious reasons. For some purposes it is convenient to view an m × n matrix A = { a_ij | 1 ≤ i ≤ m, 1 ≤ j ≤ n } ∈ R^{m×n} as a vector in R^{mn}, and to consider its norm when viewed as a vector; thus

(1.8)    ‖A‖_∞ = max_{1≤i≤m, 1≤j≤n} |a_ij|    and    ‖A‖_2 = √( Σ_{1≤i≤m, 1≤j≤n} a_ij^2 )    (the nonnegative square root).

If x ∈ R^n then Ax ∈ R^m, and since |Σ_{j=1}^n a_ij x_j| ≤ Σ_{j=1}^n |a_ij| |x_j| ≤ n ‖A‖_∞ ‖x‖_∞ for 1 ≤ i ≤ m, it follows that

(1.9)    ‖Ax‖_∞ = max_{1≤i≤m} |Σ_{j=1}^n a_ij x_j| ≤ n ‖A‖_∞ ‖x‖_∞.

An ε-neighborhood N_ε(a) of a point a ∈ R^n, an open ε-neighborhood to be more specific, is a subset of R^n defined by

(1.10)    N_ε(a) = { x ∈ R^n | d(x, a) = ‖x − a‖ < ε };

a closed ε-neighborhood is defined correspondingly by

(1.11)    N̄_ε(a) = { x ∈ R^n | d(x, a) = ‖x − a‖ ≤ ε }.

The boundary of an open or closed ε-neighborhood is the closed set

(1.12)    ∂N_ε(a) = ∂N̄_ε(a) = { x ∈ R^n | d(x, a) = ‖x − a‖ = ε }.

If it is necessary or convenient to be specific, an open ε-neighborhood in the Euclidean norm is denoted by N_{ε,2}(a) while an open ε-neighborhood in the supremum norm is denoted by N_{ε,∞}(a). It is clear from the inequalities (1.7) that

(1.13)    N_{ε,2}(a) ⊂ N_{ε,∞}(a) ⊂ N_{ε√n,2}(a)    in R^n.

The geometrical shapes of these neighborhoods depend on the norms used in their definitions; for example, in the plane R^2 neighborhoods have the shapes shown in Figure 1.1.

[Figure 1.1: Neighborhoods N_ε(a) ⊂ R^2.]

Other useful subsets of R^n are open cells, subsets of the form

(1.14)    Δ = { x = {x_j} ∈ R^n | a_j < x_j < b_j for 1 ≤ j ≤ n }

for arbitrary real numbers a_j < b_j, and closed cells, subsets of the form

(1.15)    Δ̄ = { x = {x_j} ∈ R^n | a_j ≤ x_j ≤ b_j for 1 ≤ j ≤ n }
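The matrix bound (1.9) can likewise be spot-checked numerically; the Python sketch below is an illustration (the helper `matmul` is ours), testing the bound on random 2 × 3 matrices:

```python
import random

def matmul(A, x):
    # Ax for an m×n matrix A (given as a list of rows) and x in R^n
    return [sum(a * t for a, t in zip(row, x)) for row in A]

random.seed(1)
m, n = 2, 3
for _ in range(500):
    A = [[random.uniform(-5, 5) for _ in range(n)] for _ in range(m)]
    x = [random.uniform(-5, 5) for _ in range(n)]
    sup_A = max(abs(a) for row in A for a in row)   # ‖A‖_∞ as in (1.8)
    sup_x = max(abs(t) for t in x)                  # ‖x‖_∞
    sup_Ax = max(abs(t) for t in matmul(A, x))      # ‖Ax‖_∞
    # the bound (1.9): ‖Ax‖_∞ <= n ‖A‖_∞ ‖x‖_∞
    assert sup_Ax <= n * sup_A * sup_x + 1e-9
```

The factor n in (1.9) comes from summing n terms each bounded by ‖A‖_∞ ‖x‖_∞; it is generally not sharp.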

for arbitrary real numbers a_j ≤ b_j. Closed cells may have a_j = b_j for some indices j, while of course an open cell with a_j = b_j for any index j must be the empty set. The boundary of an open or closed cell is the closed set

(1.16)    ∂Δ = ∂Δ̄ = { x ∈ R^n | a_j ≤ x_j ≤ b_j for 1 ≤ j ≤ n, and at least one inequality is an equality }.

A customary notation for cells in R^1, which are also called intervals, is

(1.17)    (a, b) = { x ∈ R^1 | a < x < b }    and    [a, b] = { x ∈ R^1 | a ≤ x ≤ b };

these can be mixed, as for instance (a, b] = { x ∈ R^1 | a < x ≤ b }.

1.2 Topological Preliminaries

Some familiarity with the basic topological notions for subsets of the real line is assumed; but a brief review is included here, focusing on the extension of these notions to subsets of R^n, and it may be a sufficient introduction even for those seeing these concepts for the first time. However, if these notions are not entirely familiar it is advisable, as an additional exercise, to prove explicitly and in detail those statements that are labeled clear or evident in the discussion here. The basic topological property of the real number system is its completeness: the property that any set S of real numbers that is bounded from above has a least upper bound or supremum, denoted by sup S, or equivalently that any set S of real numbers that is bounded from below has a greatest lower bound or infimum, denoted by inf S; an alternative statement of this property is that any Cauchy sequence of real numbers converges to a real limit.

There are a variety of topological notions and terms that are in common use. A subset U ⊂ R^n is said to be open if for each a ∈ U there is an ε > 0 such that N_ε(a) ⊂ U. It is clear from the inequalities (1.7) that the condition that a set be open is independent of which of the two norms ‖x‖_2 or ‖x‖_∞ is used to define the neighborhood N_ε(a). Intuitively a set U ⊂ R^n is open if for any point a ∈ U all points that are near enough to a are also in the set U.
For example, an open ε-neighborhood N_ε(a) of a point a ∈ R^n and an open cell Δ ⊂ R^n are open subsets. An open subset of R^n containing a point a is often called an open neighborhood of the point a; an open ε-neighborhood N_ε(a) is thus an open neighborhood of the point a in this sense as well. The collection T of open subsets of R^n has the following characteristic properties:

1. if U_α ∈ T then ⋃_α U_α ∈ T;
2. if U_i ∈ T for 1 ≤ i ≤ N for some N then ⋂_{i=1}^N U_i ∈ T;
3. ∅ ∈ T and R^n ∈ T.

These properties can be summarized in the statement that an arbitrary union of open sets is open, a finite intersection of open sets is open, and the empty set and the set R^n itself are open. That the open subsets of R^n satisfy these three conditions is quite clear. It is also clear that an infinite intersection of open sets is not necessarily open; for instance ⋂_{ν=1}^∞ N_{1/ν}(a) = {a}, and a single point is not open. A collection T of subsets

of an arbitrary abstract set S having these three characteristic properties is said to be a topology on the set S; and the sets U ∈ T are said to be the open sets in this topology. For example, if S ⊂ R^n is an arbitrary subset of R^n, the intersections U ∩ S of S with open sets U ⊂ R^n are a topology on S, called the relative topology on S induced by the topology on R^n; and the sets U ∩ S are called the relatively open sets in this topology. It is evident that relatively open sets can be defined in parallel with the definition of open sets in R^n, as those sets E ⊂ S such that for any point a ∈ E there is an ε > 0 for which N_ε(a) ∩ S ⊂ E. Clearly a set E ⊂ S can be open in the relative topology of S but not open in the usual topology of R^n.

A point a ∈ R^n is a limit point of a subset E ⊂ R^n if the intersection N_ε(a) ∩ E contains points of E other than a for all ε > 0; clearly that is equivalent to the condition that N_ε(a) ∩ E is an infinite set of points for all ε > 0. The set of limit points of E is denoted by E′ and is called the derived set of E. The set E is said to be closed if E′ ⊂ E. For example, a closed neighborhood N̄_ε(a) of a point a ∈ R^n and a closed cell Δ̄ in R^n are closed subsets. The closure of a set E, denoted by Ē, is defined to be the union Ē = E ∪ E′, and is readily seen to be a closed set. A closed neighborhood N̄_ε(a) of a point a ∈ R^n is the closure of the open neighborhood N_ε(a), and a closed cell Δ̄ in R^n is the closure of the open cell Δ provided that the open cell is not empty.

A basic topological result is that a subset E ⊂ R^n is closed if and only if its complement F = R^n ∖ E is open. To verify that, if E is closed and a ∉ E then a ∉ E′, so a is not a limit point of E; hence there must be some ε > 0 such that N_ε(a) ∩ E = ∅; consequently N_ε(a) ⊂ F, so F is open.
Conversely if E is open then no point of E can be a limit point of F, since for any point a ∈ E there is an ε > 0 such that N_ε(a) ⊂ E, hence N_ε(a) ∩ F = ∅; consequently all the limit points of F are contained in F, so F is closed. It is clear that the closed sets have the properties that a finite union of closed sets is closed, that any intersection of closed sets is closed, and that the empty set and the full set R^n are both closed; a topology can be defined alternatively by giving a collection of sets satisfying these three conditions and defining the open sets to be their complements, although it is customary to define topologies directly in terms of the open sets. The boundary of a subset E ⊂ R^n is the closed set ∂E = Ē ∩ F̄, where F = ∁E is the complement of E. The boundaries of ε-neighborhoods N_ε(a) and of cells Δ as defined earlier are also their boundaries in this sense. For more general sets some caution is necessary, since the boundary may not always correspond to what naively might be viewed as the boundary of the set; for example, if E is the set of all points in an open cell Δ that have rational coordinates then ∂E = Δ̄.

Two subsets E, F ⊂ R^n are separated if Ē ∩ F = E ∩ F̄ = ∅. This is somewhat stronger than the condition that the two sets are disjoint; for example E = [0, 1) and F = [1, 2] are disjoint subsets of R but are not separated, since Ē ∩ F = {1}, but E = [0, 1) and G = (1, 2] are separated subsets of R. A subset S ⊂ R^n is connected if it cannot be written as the union of two nonempty separated subsets. In terms of open sets alone, a topological space S, such as a subset of R^n with the relative topology, is connected if it cannot be written as a disjoint union of two nonempty relatively open subsets of S, or equivalently if there is no subset of S that is both relatively open and relatively closed, other than the empty set and all of S. The equivalence of these conditions is quite obvious. It is easy to see that the entire space R^n is connected.
Indeed, if U ⊂ R^n is a subset other than the empty set or all of R^n that is both open and closed, choose a point a ∈ U and a point b ∈ R^n ∖ U. The set of real numbers s such that a + t(b − a) ∈ U for 0 ≤ t < s is nonempty, since U is open, and it is bounded above since b ∉ U, so this set has a least upper bound s_0, by one of the characteristic properties of the real number system. Since U is closed it must be the case that a + s_0(b − a) ∈ U,

for this is the limit of the points a + s(b − a) ∈ U as s → s_0; but then since U is open it also must be the case that a + s(b − a) ∈ U for some real numbers s > s_0, a contradiction. The same argument shows that an open neighborhood N_r(a) and an open cell Δ in R^n are connected sets.

An open covering of a subset E ⊂ R^n is a collection of open sets U_α, not necessarily a countable collection, such that E ⊂ ⋃_α U_α. If some of the sets U_α are redundant they can be eliminated, and the remaining sets are also an open covering of E, called a subcovering of the set E. A set E is said to be compact if every open covering of E has a finite subcovering, that is, if for any open covering {U_α} of E finitely many of the sets U_α actually cover all of E. This is a rather subtle notion, but it is very important and frequently used. An example of a non-compact subset of R^n is a nonempty open neighborhood N_1(0) of the origin in R^n; this set is covered by the open subsets U_ν = N_{1−1/ν}(0) for ν = 1, 2, 3, ..., but the union of any finite collection of these subsets is just the set U_ν for the largest ν in the collection, and that is a proper subset of N_1(0). An open cell is also noncompact, for essentially the same reason. A closed cell is an example of a compact subset; but the proof is a bit subtler, and rests on the basic topological properties of the real number system, just as did the proof that R^n is connected. For the proof it is convenient to define the edgesize of an open or closed cell Δ = { x = {x_j} ∈ R^n | a_j ≤ x_j ≤ b_j } ⊂ R^n to be the nonnegative number d(Δ) = max_{1≤j≤n} (b_j − a_j).

Lemma 1.1 If Δ_ν ⊂ R^n are closed cells in R^n for ν = 1, 2, ... such that Δ_{ν+1} ⊂ Δ_ν and lim_{ν→∞} d(Δ_ν) = 0, then ⋂_ν Δ_ν is a single point of R^n.

Proof: If Δ_ν = { x = {x_j} | a_j^ν ≤ x_j ≤ b_j^ν } then for each j clearly a_j^{ν+1} ≥ a_j^ν and b_j^{ν+1} ≤ b_j^ν; and since Δ_ν ⊂ Δ_1 it is also the case that a_j^ν ≤ b_j^1 and b_j^ν ≥ a_j^1.
The basic completeness property of the real number system implies that any increasing sequence of real numbers bounded from above and any decreasing sequence of real numbers bounded from below have limiting values; therefore lim_{ν→∞} a_j^ν = a_j and lim_{ν→∞} b_j^ν = b_j for some uniquely determined real numbers a_j, b_j, and it is clear that the cell Δ = { x = {x_j} | a_j ≤ x_j ≤ b_j } is contained in the intersection ⋂_ν Δ_ν. On the other hand, since (b_j − a_j) ≤ (b_j^ν − a_j^ν) ≤ d(Δ_ν) and lim_{ν→∞} d(Δ_ν) = 0, it must be the case that b_j = a_j, so the limiting cell is just a single point of R^n, which concludes the proof.

Theorem 1.2 A closed cell in R^n is compact.

Proof: If a closed cell Δ = { x = {x_j} | a_j ≤ x_j ≤ b_j } is not compact, there is an open covering {U_α} of Δ that does not admit any finite subcovering. The cell Δ can be written as the union of the 2^n closed cells arising from bisecting each of its sides. If finitely many of the sets {U_α} covered each of these subcells then finitely many would cover the entire cell Δ, which is not the case; hence at least one of the subcells cannot be covered by finitely many of the sets {U_α}. Then bisect each of the sides of that subcell, and repeat the process. The result is a collection of closed cells Δ_ν which cannot be covered by finitely many of the open sets {U_α} and for which Δ_{ν+1} ⊂ Δ_ν and lim_{ν→∞} d(Δ_ν) = 0. It then follows from the preceding lemma that ⋂_ν Δ_ν = {a}, a single point in R^n. This point must be contained in one of the sets U_{α_0}, and if ν is sufficiently large then Δ_ν ⊂ U_{α_0} as well; but that is a contradiction, since the cells Δ_ν were chosen so that none of them could be covered
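The bisection argument in the proof of Theorem 1.2 has a simple one-dimensional shadow that can be run on a computer. The following sketch (an illustration only, not part of the notes) repeatedly bisects a closed interval, always keeping a half containing a fixed target point, and watches the intervals shrink to that single point, as Lemma 1.1 predicts:

```python
# Nested-cells lemma in R^1: bisect [0, 1] sixty times, always keeping a
# half-interval that contains the target point; the lengths d(Δ_ν) = b - a
# halve at each step, and the intersection is the single point `target`.
target = 1.0 / 3.0
a, b = 0.0, 1.0
for _ in range(60):
    mid = (a + b) / 2.0
    if target <= mid:      # keep the half-interval containing the target
        b = mid
    else:
        a = mid
assert b - a < 1e-15       # d(Δ_ν) -> 0
assert a <= target <= b    # the target lies in every Δ_ν
assert abs(a - target) < 1e-15
```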

by finitely many of the sets {U_α}. Therefore the cell Δ is compact, which concludes the proof.

Theorem 1.3 A closed subset of a compact set is compact.

Proof: Suppose that E ⊂ F where F is compact and E is closed, and that {U_α} is an open covering of the set E. The sets U_α together with the open set R^n ∖ E form an open covering of the compact set F, so finitely many of these sets cover F, hence also cover E. Clearly the set R^n ∖ E covers none of the points of E, so the remaining finitely many sets U_α necessarily cover E. Therefore E is compact, which suffices for the proof.

Theorem 1.4 (Heine-Borel Theorem) A subset E ⊂ R^n is compact if and only if it is closed and bounded.

Proof: A bounded subset E ⊂ R^n is contained in a sufficiently large closed cell Δ ⊂ R^n; the cell Δ is compact by Theorem 1.2, so if E is also closed then by Theorem 1.3 it is compact. Conversely, suppose that E is a compact set. If E is not bounded it can be covered by the collection of open neighborhoods N_ν(0) for ν = 1, 2, ..., but it is not covered by any finite set of these neighborhoods; that contradicts the compactness of E, so a compact set is bounded. If E is not closed then there is a limit point a ∈ E′ that is not contained in E. Each closed neighborhood N̄_{1/ν}(a) of the point a must contain a point of E, since a ∈ E′, but the intersection of all of these neighborhoods consists of the point a itself, which is not contained in E. The complements U_ν = R^n ∖ N̄_{1/ν}(a) then form an open covering of E; but no finite number of these sets suffices to cover E, since the union of finitely many such sets is just one of the sets U_ν and hence does not cover the points of E contained in N̄_{1/ν}(a); and that again is a contradiction, which suffices to conclude the proof.

Theorem 1.5 (Casorati-Weierstrass Theorem) A subset E ⊂ R^n is compact if and only if every sequence of distinct points in E has a limit point in E.

Proof: Suppose that E is compact.
If a_ν ∈ E is a collection of distinct points of E with no limit point in E, then the points a_ν can have no limit points in R^n at all, for since E is closed these limit points would necessarily lie in E; in particular the set ⋃_ν {a_ν} is a closed set. Each point a_ν has an open neighborhood U_ν that contains none of the other points, since otherwise a_ν would itself be a limit point of this collection of points. These open sets U_ν together with the open set R^n ∖ (⋃_ν {a_ν}) form an open covering of E; and since E is compact, finitely many of these sets already cover E. That is a contradiction, since the set R^n ∖ (⋃_ν {a_ν}) does not cover any of the points a_ν, and no finite collection of the sets U_ν covers all the points a_ν. On the other hand, suppose that E ⊂ R^n is not compact. Then by the Heine-Borel Theorem either E is not bounded or E is not closed. If E is not bounded it must contain a sequence of distinct points a_ν such that ‖a_ν‖ is a strictly increasing sequence of real numbers with no finite limit, and this sequence can have no limit point in E. On the other hand, if E is not closed it contains a sequence of distinct points a_ν with a limit point not contained in E. That suffices to conclude the proof.

An incidental observation of interest is that the property that a set is compact is more intrinsic than the property of being open or closed. It is clear that if S ⊂ R^n
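The second half of the proof can be made concrete. The following sketch (an illustration, not part of the notes) exhibits the failure of compactness for the bounded but non-closed set E = (0, 1]: the distinct points a_ν = 1/ν all lie in E, yet their only limit point, 0, does not:

```python
# The points a_ν = 1/ν are distinct points of E = (0, 1]; their only
# limit point is 0, which lies outside E, so E is not compact.
points = [1.0 / nu for nu in range(1, 10001)]
assert all(0.0 < p <= 1.0 for p in points)   # every a_ν lies in E
assert min(points) < 1e-3                    # the sequence approaches 0 ...
assert 0.0 not in points                     # ... but 0 is not a point of E
```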

has the relative topology, a subset E ⊂ S that is open in the relative topology of S need not be open in the topology of R^n, and a subset E ⊂ S that is closed in the relative topology of S need not be closed in the topology of R^n; indeed an open interval in the line S = R^1 is not open in R^2 when the line is imbedded in the plane, and an open cell S = Δ ⊂ R^n is a closed subset of itself in the topology inherited from R^n, since it is the whole set S, but is not closed in R^n. However, a subset E ⊂ S is compact as a subset of S in the topology of S if and only if it is compact as a subset of R^n in the topology of R^n, as a clear consequence of the definition of the relatively open subsets of S as the intersections with S of open subsets of R^n.

1.3 Continuous Mappings

A mapping f from a subset U ⊂ R^m to a subset V ⊂ R^n associates to each point x = {x_j} ∈ U a point f(x) = y = {y_j} ∈ V; the coordinates y_j depend on the point x, so can be viewed as given by functions y_j = f_j(x), which are the coordinate functions of the mapping f. The mapping f is continuous at a point a ∈ U if for every ε > 0 there is a δ > 0 such that ‖f(x) − f(a)‖ ≤ ε whenever ‖x − a‖ ≤ δ, or equivalently, for every ε > 0 there is a δ > 0 such that f(x) ∈ N_ε(f(a)) whenever x ∈ N_δ(a). It is clear from the inequalities (1.7) that in the definition of continuity the norm ‖x‖ can be either the Euclidean norm ‖x‖_2 or the supremum norm ‖x‖_∞. It is also clear that the mapping f is continuous at a point a = {a_j} if and only if each of its coordinate functions f_j is continuous at the point a. The mapping is said to be continuous on the subset U if it is continuous at each point of U.

Theorem 1.6 A mapping f : S → R^n from a subset S ⊂ R^m into R^n is continuous in S if and only if f⁻¹(U) is an open subset of S in the relative topology of S for any open subset U ⊂ R^n.
Proof: If $f : S \to \mathbb{R}^n$ is continuous, $U \subset \mathbb{R}^n$ is an open subset and $a \in f^{-1}(U)$, then $f(a) = b \in U$, and since $U$ is open there is an open neighborhood $N_\epsilon(b) \subset U$. Since $f$ is continuous there is a $\delta > 0$ such that $f(x) \in N_\epsilon(f(a))$ whenever $x \in N_\delta(a) \cap S$; consequently $N_\delta(a) \cap S \subset f^{-1}(U)$, so $f^{-1}(U)$ is open in the relative topology of $S$. On the other hand if $f^{-1}(U)$ is an open subset of $S$ for every open subset $U \subset \mathbb{R}^n$, then in particular for any point $a \in S$ with $b = f(a)$ and any $\epsilon > 0$ the set $f^{-1}(N_\epsilon(b))$ is an open subset of $S$, so $N_\delta(a) \cap S \subset f^{-1}(N_\epsilon(b))$ for some $\delta > 0$; consequently the mapping $f$ is continuous at the point $a$. That concludes the proof.

Corollary 1.1 A mapping $f : S \to \mathbb{R}^n$ from a subset $S \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is continuous in $S$ if and only if $f^{-1}(E)$ is a closed subset of $S$ in the relative topology of $S$ for any closed subset $E \subset \mathbb{R}^n$.

Proof: This is an immediate consequence of the preceding theorem, since a subset of $S$ is closed in the relative topology of $S$ if and only if its complement in $S$ is relatively open, and $f^{-1}(\mathbb{R}^n \smallsetminus E) = S \smallsetminus f^{-1}(E)$ for any subset $E \subset \mathbb{R}^n$. That suffices for the proof.

These results show that the continuity of a mapping in a set $S$ really is a property of the topology of $S$. A simple consequence of either result is that if $g : U \to V$ and $f : V \to W$ are continuous mappings between subsets $U, V, W$ of some Euclidean

spaces then the composition $f \circ g : U \to W$ that takes a point $x \in U$ to the point $(f \circ g)(x) = f(g(x))$ is also continuous; indeed if $E \subset W$ is open then $f^{-1}(E)$ is open since $f$ is continuous, and $(f \circ g)^{-1}(E) = g^{-1}(f^{-1}(E))$ is open since $g$ is continuous, so the composition $f \circ g$ is continuous. The results in Theorem 1.6 and Corollary 1.1 involve the inverse image of a set under a mapping $f$; the image of an open set under a continuous mapping is not necessarily open, and the image of a closed set under a continuous mapping is not necessarily closed. For instance if $f : \mathbb{R} \to \mathbb{R}$ is the continuous mapping $f(x) = e^{-x^2}$ then $f(\mathbb{R}) = \{\, x \mid 0 < x \le 1 \,\}$, which is neither open nor closed although $\mathbb{R}$ itself is both open and closed.

Theorem 1.7 If $f : S \to \mathbb{R}^n$ is a continuous mapping from a subset $S \subset \mathbb{R}^m$ into $\mathbb{R}^n$ then the image $f(E)$ of any compact subset $E \subset S$ is a compact subset of $\mathbb{R}^n$.

Proof: If $E \subset S$ is compact and $f(E)$ is contained in a union of open sets $U_\alpha$, then $E$ is contained in the union of the relatively open sets $f^{-1}(U_\alpha)$; and since $E$ is compact it is contained in the union of finitely many of the sets $f^{-1}(U_\alpha)$, so $f(E)$ is contained in the union of the corresponding finitely many open sets $U_\alpha$. That suffices for the proof.

Since a compact subset of $\mathbb{R}^n$ is closed, by the Heine-Borel Theorem, the inverse image under a continuous mapping of a compact set is necessarily closed; but it is not necessarily compact. For example the inverse image of the set $[-\frac{1}{2}, \frac{1}{2}] \subset \mathbb{R}$ under the mapping $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = \sin x$ is an unbounded set, hence is not compact.

Corollary 1.2 If $f$ is a continuous function on a compact set $U \subset \mathbb{R}^n$ then there are points $a, b \in U$ such that

(1.18) $\quad f(a) = \sup_{x \in U} f(x) \quad$ and $\quad f(b) = \inf_{x \in U} f(x)$.

Proof: The image $f(U) \subset \mathbb{R}$ is compact by the preceding theorem, hence is a closed and bounded set; so if $\alpha = \sup_{x \in U} f(x)$ then $\alpha$ is either a point or a limit point of the set $f(U)$, so it must be contained in the set $f(U)$, hence $\alpha = f(a)$ for some point $a \in U$; and correspondingly for $\beta = \inf_{x \in U} f(x)$. That suffices for the proof.
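Corollary 1.2 lends itself to a quick numerical illustration. The function, the interval, and the grid size below are choices made only for this sketch; a finite grid approximates the supremum and, on a compact interval, can land exactly on the maximizing point.

```python
# A continuous function on a compact interval attains its supremum.
# Illustration: f(x) = x(2 - x) on the compact interval [0, 2] has
# supremum 1, attained at the interior point x = 1.
def grid_max(f, lo, hi, n=2001):
    """Approximate the maximum of f on [lo, hi] over an evenly spaced grid."""
    best_x = lo
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        if f(x) > f(best_x):
            best_x = x
    return best_x, f(best_x)

x_star, m = grid_max(lambda x: x * (2 - x), 0.0, 2.0)
```

By contrast, on the non-compact ray $(0, \infty)$ the function $e^{-x^2}$ has infimum $0$ attained at no point, which is exactly the failure the corollary rules out on compact sets.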
Theorem 1.8 A one-to-one continuous mapping from a compact subset $U \subset \mathbb{R}^m$ onto a subset $V \subset \mathbb{R}^n$ has a continuous inverse.

Proof: If the mapping $f : U \to V$ is one-to-one and onto it has a well defined inverse mapping $g : V \to U$. To show that $g$ is continuous it suffices to show that $g^{-1}(E)$ is closed for any closed subset $E \subset U$, in view of Corollary 1.1. If $E$ is closed then by Theorem 1.3 it is compact, since $U$ is compact; and then $g^{-1}(E) = f(E)$ is compact by Theorem 1.7, hence is closed by the Heine-Borel Theorem. That suffices for the proof.

The assumption of compactness is essential in the preceding theorem. For example the mapping $f : [0, 2\pi) \to \mathbb{R}^2$ defined by $f(t) = (\cos t, \sin t)$ is clearly one-to-one and continuous; but the inverse mapping fails to be continuous at the point $f(0) = (1, 0)$. Some properties of continuity are not purely topological properties, in the sense that they cannot be stated purely in terms of open and closed sets, but are metric properties, involving the norms used to define continuity. By definition

a mapping $f : U \to W$ between two subsets of Euclidean spaces is continuous at a point $a \in U$ if and only if for every $\epsilon > 0$ there is a $\delta > 0$ such that $\|f(a) - f(x)\| < \epsilon$ whenever $\|x - a\| < \delta$. The mapping $f$ is continuous in the set $U$ if for each point $a \in U$ and any $\epsilon > 0$ there is a $\delta_a > 0$, which may depend on the point $a$, such that $\|f(a) - f(x)\| < \epsilon$ whenever $\|x - a\| < \delta_a$. The mapping $f$ is uniformly continuous in $U$ if it is possible to find values $\delta_a$ that are independent of the point $a \in U$; equivalently the mapping $f$ is uniformly continuous in $U$ if for any $\epsilon > 0$ there exists $\delta > 0$ such that $\|f(x) - f(y)\| < \epsilon$ whenever $x, y \in U$ and $\|x - y\| < \delta$. It should be kept in mind that the two norms are in different spaces; and it is evident from the inequalities (1.7) that they may be different norms as well. Not all continuous mappings are uniformly continuous, as for example the mapping $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$; but in some circumstances continuous mappings are automatically uniformly continuous.

Theorem 1.9 A continuous mapping $f : U \to \mathbb{R}^n$ from a compact subset $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is uniformly continuous.

Proof: If $f : U \to \mathbb{R}^n$ is continuous and $\epsilon > 0$, then for any point $a \in U$ there is a $\delta_a > 0$ such that $\|f(x) - f(a)\| < \frac{1}{2}\epsilon$ whenever $x \in N_{\delta_a}(a)$. The neighborhoods $N_{\frac{1}{2}\delta_a}(a)$ for all points $a \in U$ form an open covering of $U$, and since $U$ is compact finitely many of these neighborhoods $N_{\frac{1}{2}\delta_{a_i}}(a_i)$ cover all of $U$. Let $\delta > 0$ be the minimum of the finitely many positive numbers $\delta_{a_i}$ for this finite covering. If $x, y \in U$ are any two points such that $\|x - y\| < \frac{1}{2}\delta$, then $x \in N_{\frac{1}{2}\delta_{a_i}}(a_i)$ for one of these neighborhoods; and since $\|x - y\| < \frac{1}{2}\delta \le \frac{1}{2}\delta_{a_i}$ it follows that $\|y - a_i\| \le \|y - x\| + \|x - a_i\| < \delta_{a_i}$, so $y \in N_{\delta_{a_i}}(a_i)$ as well. It follows that $\|f(x) - f(y)\| \le \|f(x) - f(a_i)\| + \|f(a_i) - f(y)\| < \frac{1}{2}\epsilon + \frac{1}{2}\epsilon = \epsilon$, so $f$ is uniformly continuous, which concludes the proof.
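The failure of uniform continuity for $f(x) = x^2$ on all of $\mathbb{R}$ can be made concrete: for $a \ge 0$ the largest $\delta$ with $|(a+\delta)^2 - a^2| \le \epsilon$ can be computed exactly, and it shrinks as $a$ grows. The helper below is purely illustrative.

```python
# For f(x) = x^2, |(a+d)^2 - a^2| = d(2a + d) for d > 0, so the largest
# workable delta at a point a >= 0 is the positive root of d^2 + 2ad - eps = 0.
def delta_needed(a, eps):
    """Largest d > 0 with (a + d)^2 - a^2 <= eps, for a >= 0."""
    return -a + (a * a + eps) ** 0.5

eps = 0.01
d1 = delta_needed(1.0, eps)      # about eps / 2  near a = 1
d100 = delta_needed(100.0, eps)  # about eps / 200 near a = 100
```

Since `d100` is roughly a hundred times smaller than `d1`, no single $\delta$ serves at every point of $\mathbb{R}$; on a compact interval such as $[0, 1]$ the infimum of the workable $\delta_a$ is positive, in accordance with Theorem 1.9.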

Chapter 2

Differentiable Mappings

2.1 The Derivative

A mapping $f : U \to \mathbb{R}^n$ from an open subset $U \subset \mathbb{R}^m$ into $\mathbb{R}^n$ is differentiable at a point $a \in U$ if there is a linear mapping $A : \mathbb{R}^m \to \mathbb{R}^n$, described by an $n \times m$ matrix $A$, such that for all $h$ in an open neighborhood of the origin in $\mathbb{R}^m$

(2.1) $\quad f(a + h) = f(a) + Ah + \epsilon(h) \quad$ where $\quad \lim_{h \to 0} \dfrac{\|\epsilon(h)\|}{\|h\|} = 0$.

Here $h \in \mathbb{R}^m$ so $Ah \in \mathbb{R}^n$, while $f(a), f(a + h), \epsilon(h) \in \mathbb{R}^n$. This definition is independent of the norm chosen; for if $\lim_{h \to 0} \frac{\|\epsilon(h)\|_2}{\|h\|_2} = 0$ then from the inequalities (1.7) it follows that $\frac{\|\epsilon(h)\|_\infty}{\|h\|_\infty} \le \sqrt{m}\, \frac{\|\epsilon(h)\|_2}{\|h\|_2}$, so $\lim_{h \to 0} \frac{\|\epsilon(h)\|_\infty}{\|h\|_\infty} = 0$ as well; and the converse holds similarly. If $f$ is differentiable at $a$ it is continuous at $a$, since $\lim_{h \to 0} Ah = 0$ and $\lim_{h \to 0} \epsilon(h) = 0$. For example, if $m = n = 1$ it is possible to divide (2.1) by the real number $h$ and to rewrite that equation in the form

$\lim_{h \to 0} \left( \dfrac{f(a + h) - f(a)}{h} - A \right) = \lim_{h \to 0} \dfrac{\epsilon(h)}{h} = 0$;

this is a form of the familiar definition that the real-valued function $f(x)$ of the variable $x \in \mathbb{R}$ is differentiable at the point $a$ and that its derivative at that point is the real number $A$. For another example, if $m = 3$ and $n = 2$, so that $f : \mathbb{R}^3 \to \mathbb{R}^2$, then (2.1) takes the form

$\begin{pmatrix} f_1(a + h) \\ f_2(a + h) \end{pmatrix} = \begin{pmatrix} f_1(a) \\ f_2(a) \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ h_3 \end{pmatrix} + \begin{pmatrix} \epsilon_1(h) \\ \epsilon_2(h) \end{pmatrix}.$

Theorem 2.1 A mapping $f : \mathbb{R}^m \to \mathbb{R}^n$ is differentiable at a point $a$ if and only if each of the coordinate functions $f_i$ of the mapping $f$ is differentiable at that point.

Proof: If $f$ is a differentiable mapping it follows from (2.1) that each coordinate function $f_i$ satisfies

(2.2) $\quad f_i(a + h) = f_i(a) + \sum_{j=1}^{m} a_{ij} h_j + \epsilon_i(h) \quad$ where $\quad \lim_{h \to 0} \dfrac{\epsilon_i(h)}{\|h\|} = 0$,

since $|\epsilon_i(h)| \le \|\epsilon(h)\|_\infty$; and this is just the condition that each of the coordinate functions $f_i$ of the mapping $f$ is differentiable at the point $a$. Conversely if each of the coordinate functions $f_i$ is differentiable at the point $a$ then (2.2) holds for $1 \le i \le n$. The collection of these $n$ equations taken together forms the equation (2.1) in which $\epsilon(h) = \{\epsilon_i(h)\}$; and since $\|\epsilon(h)\|_\infty = \max_{1 \le i \le n} |\epsilon_i(h)|$ it follows that $\lim_{h \to 0} \frac{\|\epsilon(h)\|_\infty}{\|h\|} = 0$, so the mapping $f$ is differentiable, and that concludes the proof.

For the special case of a vector $h = \{h_j\}$ for which $h_j = 0$ for $j \ne k$ for some index $k$, equation (2.2) takes the form

$f_i(a_1, \ldots, a_k + h_k, \ldots, a_m) = f_i(a_1, \ldots, a_k, \ldots, a_m) + a_{ik} h_k + \epsilon_i(h_k) \quad$ where $\quad \lim_{h_k \to 0} \dfrac{\epsilon_i(h_k)}{h_k} = 0$;

that is just the condition that $f_i(x)$, viewed as a function of the variable $x_k$ alone for fixed values $x_j = a_j$ of the remaining variables for $j \ne k$, is a differentiable function of that variable $x_k$ and that its derivative at the point $x_k = a_k$ is the real number $a_{ik}$. The constant $a_{ik}$ is called the partial derivative of the function $f_i$ with respect to the variable $x_k$ at the point $a$, and is customarily denoted by $a_{ik} = \partial_k f_i(a)$. It follows that the entries in the matrix $A = \{a_{ik}\}$ are the uniquely determined partial derivatives of the coordinate functions of the mapping; this matrix is called the derivative of the mapping $f$ at the point $a$ and is denoted by $f'(a)$, so

(2.3) $\quad f'(a) = \bigl( f'(a)_{ik} \bigr) = \bigl( \partial_k f_i(a) \bigr) \quad$ for $1 \le i \le n$, $1 \le k \le m$.

Generally it is a fairly straightforward matter to calculate the partial derivatives of a function: merely consider all the variables except one of them as constants, and apply the familiar techniques for calculating derivatives of a function of one variable. That can be applied to each coordinate function of a mapping, to yield the derivative of that mapping.
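The definition (2.1) can be tested numerically for a concrete mapping. The mapping, the point, and the direction of approach below are chosen only for this sketch: the error $\epsilon(h) = f(a+h) - f(a) - Ah$ should vanish faster than $\|h\|$, so the ratio $\|\epsilon(h)\|/\|h\|$ should shrink with $h$.

```python
# Check (2.1) for f(x1, x2) = (x1*x2, x1 + x2^2) at a = (1, 2), where the
# matrix of partial derivatives (2.3) is A = [[2, 1], [1, 4]].
def f(x1, x2):
    return (x1 * x2, x1 + x2 * x2)

a = (1.0, 2.0)
A = ((2.0, 1.0), (1.0, 4.0))

def error_ratio(t):
    """||eps(h)|| / ||h|| in the sup norm, for h = (t, -t)."""
    h = (t, -t)
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    Ah = (A[0][0] * h[0] + A[0][1] * h[1], A[1][0] * h[0] + A[1][1] * h[1])
    eps = (fah[0] - fa[0] - Ah[0], fah[1] - fa[1] - Ah[1])
    return max(abs(eps[0]), abs(eps[1])) / max(abs(h[0]), abs(h[1]))

ratios = [error_ratio(10.0 ** (-k)) for k in range(1, 6)]
```

For this quadratic mapping the ratio works out to exactly $|t|$ along the chosen direction, so it decreases by a factor of ten at each step, as the definition demands.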
It is evident from (2.1) that a linear combination $c_1 f_1 + c_2 f_2$ of two mappings $f_1, f_2 : \mathbb{R}^m \to \mathbb{R}^n$ that are differentiable at a point $a$, for any constants $c_1, c_2 \in \mathbb{R}$, is again a mapping differentiable at that point, and it follows from (2.3) that differentiation is linear in the sense that $(c_1 f_1 + c_2 f_2)'(a) = c_1 f_1'(a) + c_2 f_2'(a)$. There are various alternative notations for derivatives and partial derivatives of functions of several variables that are in common use. For instance $Df(a)$ is often used for $f'(a)$, and $D_k f(a)$ or $\frac{\partial f}{\partial x_k}(a)$ are commonly used for $\partial_k f(a)$. If a mapping $f : \mathbb{R}^m \to \mathbb{R}^n$ is differentiable at a point $a \in \mathbb{R}^m$ then it has partial derivatives $\partial_k f_i(a)$ with respect to each variable $x_k$ at that point; but it is not true conversely that if the coordinate functions of a mapping $f$ have partial derivatives at a point $a$ with respect to each variable then $f$ is a differentiable mapping. For example the mapping $f : \mathbb{R}^2 \to \mathbb{R}$ defined by

$f(x) = \begin{cases} \dfrac{x_1 x_2}{x_1^2 + x_2^2} & \text{if } x \ne 0, \\[1ex] 0 & \text{if } x = 0 \end{cases}$

vanishes identically in the variable $x_2$ if $x_1 = 0$, so $\partial_2 f(0, 0) = 0$, and similarly $\partial_1 f(0, 0) = 0$. However the function is not continuous at the origin, since it takes the value $\frac{1}{2}$ whenever $x_1 = x_2$ except at the origin where it takes the value $0$; hence it is not differentiable at the origin.
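The example above can be probed directly: the difference quotients along the two axes vanish, yet the function sits at the constant value $\frac{1}{2}$ along the diagonal arbitrarily close to the origin. The step sizes here are illustrative choices.

```python
# Partial derivatives at the origin exist (and are 0), but the function
# is not continuous there, so it cannot be differentiable.
def f(x1, x2):
    if (x1, x2) == (0.0, 0.0):
        return 0.0
    return x1 * x2 / (x1 * x1 + x2 * x2)

h = 1e-6
d1 = (f(h, 0.0) - f(0.0, 0.0)) / h   # difference quotient along x1: exactly 0
d2 = (f(0.0, h) - f(0.0, 0.0)) / h   # difference quotient along x2: exactly 0

# Along the diagonal x1 = x2 = t the value is t^2 / (2 t^2) = 1/2 for all t != 0.
diag = [f(t, t) for t in (0.1, 1e-3, 1e-9)]
```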

Theorem 2.2 If the partial derivatives of a mapping $f : \mathbb{R}^m \to \mathbb{R}^n$ exist at all points near $a$ and are continuous at the point $a$ then the mapping $f$ is differentiable at the point $a$.

Proof: In view of Theorem 2.1 it is enough to prove this for the special case $n = 1$, in which case the mapping $f : \mathbb{R}^m \to \mathbb{R}$ is just a real valued function; and for convenience only the case $m = 2$ will be demonstrated in detail, since it is easier to follow the proof in the simpler case and all the essential ideas are present. Assume that the partial derivatives $\partial_k f(x)$ exist for all points $x$ near $a$ and are continuous at $a = \{a_j\}$, and consider a fixed vector $h = \{h_j\}$. When one of the variables is held fixed and $f$ is viewed as a function of the remaining variable it is a differentiable function of a single variable. The mean value theorem for functions of a single variable asserts that if $f(x)$ is continuous in a closed interval $[a, b]$ and is differentiable at each point of the open interval $(a, b)$ then $f(b) - f(a) = f'(c)(b - a)$ for some point $c \in (a, b)$. This can be applied to the function $f(x_1, a_2)$ of the single variable $x_1$ in the interval between $a_1$ and $a_1 + h_1$, and to the function $f(a_1 + h_1, x_2)$ of the single variable $x_2$ in the interval between $a_2$ and $a_2 + h_2$, if $h$ is sufficiently small; as a consequence there exist values $\alpha_1$ between $a_1$ and $a_1 + h_1$ and $\alpha_2$ between $a_2$ and $a_2 + h_2$ such that

$f(a_1 + h_1, a_2) - f(a_1, a_2) = h_1\, \partial_1 f(\alpha_1, a_2)$

and

$f(a_1 + h_1, a_2 + h_2) - f(a_1 + h_1, a_2) = h_2\, \partial_2 f(a_1 + h_1, \alpha_2)$.

Then

$f(a + h) - f(a) = f(a_1 + h_1, a_2 + h_2) - f(a_1, a_2)$
$= f(a_1 + h_1, a_2 + h_2) - f(a_1 + h_1, a_2) + f(a_1 + h_1, a_2) - f(a_1, a_2)$
$= h_2\, \partial_2 f(a_1 + h_1, \alpha_2) + h_1\, \partial_1 f(\alpha_1, a_2)$
$= h_2\, \partial_2 f(a_1, a_2) + h_1\, \partial_1 f(a_1, a_2) + \epsilon(h)$

where

$\epsilon(h) = h_2 \bigl( \partial_2 f(a_1 + h_1, \alpha_2) - \partial_2 f(a_1, a_2) \bigr) + h_1 \bigl( \partial_1 f(\alpha_1, a_2) - \partial_1 f(a_1, a_2) \bigr)$.
By the triangle inequality, and since $|h_1| \le \|h\|$ and $|h_2| \le \|h\|$,

$\dfrac{|\epsilon(h)|}{\|h\|} \le \bigl| \partial_2 f(a_1 + h_1, \alpha_2) - \partial_2 f(a_1, a_2) \bigr| + \bigl| \partial_1 f(\alpha_1, a_2) - \partial_1 f(a_1, a_2) \bigr|$;

and since the partial derivatives are assumed to be continuous at the point $a$ it follows that $\lim_{h \to 0} \frac{|\epsilon(h)|}{\|h\|} = 0$. That shows that the function $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at the point $(a_1, a_2)$, which suffices to conclude the proof.

The converse of the preceding theorem does not hold: if a mapping $f : \mathbb{R}^m \to \mathbb{R}^n$ is differentiable at all points of an open subset $U \subset \mathbb{R}^m$ then the partial derivatives of the coordinate functions of the mapping $f$ exist at each point of $U$, but they need not be continuous functions in $U$. The standard example for functions of a single variable is the function

$f(x) = \begin{cases} x^2 \sin \dfrac{1}{x} & \text{if } x \ne 0, \\[1ex] 0 & \text{if } x = 0, \end{cases}$

which is differentiable at all points $x \in \mathbb{R}$ but for which the derivative is not continuous at the point $0$; indeed if $x \ne 0$ then $f'(x) = 2x \sin \frac{1}{x} - \cos \frac{1}{x}$, so that $\lim_{n \to \infty} f'\bigl( \frac{1}{2\pi n} \bigr) = -1$, but

$f'(0) = \lim_{x \to 0} \dfrac{x^2 \sin \frac{1}{x} - 0}{x} = \lim_{x \to 0} x \sin \dfrac{1}{x} = 0$

since $\bigl| \sin \frac{1}{x} \bigr| \le 1$ for $x \ne 0$. If a mapping $f : U \to \mathbb{R}^n$ defined in an open subset $U \subset \mathbb{R}^m$ is differentiable at all points $x \in U$, then the mapping that associates to each point $x \in U$ the matrix $f'(x)$ is a well defined mapping $f' : U \to \mathbb{R}^{n \times m}$, where $\mathbb{R}^{n \times m}$ is the vector space consisting of all $n \times m$ matrices; the mapping $f$ is continuously differentiable, or of class $C^1$, in $U$ if the mapping $f'$ is continuous.

Corollary 2.1 A mapping $f : U \to \mathbb{R}^n$ defined in an open subset $U \subset \mathbb{R}^m$ is continuously differentiable if and only if all the partial derivatives of its coordinate functions exist and are continuous functions in $U$.

Proof: If the partial derivatives of the mapping $f$ exist and are continuous then it follows from the preceding theorem that $f$ is differentiable; and since $f'(x) = \{\partial_k f_i(x)\}$ the mapping $f'$ is continuous, hence $f$ is continuously differentiable. Conversely if $f$ is continuously differentiable then the partial derivatives of all of its coordinate functions exist and are continuous in $U$, which suffices for the proof.

2.2 The Chain Rule

If $g : U \to \mathbb{R}^m$ is a mapping defined in an open neighborhood $U$ of a point $a \in \mathbb{R}^l$ and $f : V \to \mathbb{R}^n$ is a mapping defined in an open neighborhood $V$ of the point $b = g(a) \in \mathbb{R}^m$, where $g(U) \subset V$, the composition $\phi = f \circ g : U \to \mathbb{R}^n$ is the mapping defined by $\phi(x) = f(g(x))$ for any $x \in U$; this situation is described in the following diagram:

(2.4) $\quad \mathbb{R}^l \supset U \xrightarrow{\ g\ } V \subset \mathbb{R}^m, \qquad V \xrightarrow{\ f\ } W \subset \mathbb{R}^n, \qquad \phi = f \circ g : U \to W,$
$\qquad a \longmapsto b = g(a) \longmapsto f(b) = \phi(a).$

Theorem 2.3 If the mapping $g$ is differentiable at the point $a$ and the mapping $f$ is differentiable at the point $b = g(a)$ then the composite mapping $\phi = f \circ g$ is differentiable at the point $a$ and $\phi'(a) = f'\bigl(g(a)\bigr)\, g'(a)$.
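Before turning to the proof, the formula of Theorem 2.3 can be checked numerically in a small case. The particular mappings here are illustrative choices: $g(t) = (t^2, \sin t)$ from $\mathbb{R}$ to $\mathbb{R}^2$ and $f(x_1, x_2) = x_1 x_2$ from $\mathbb{R}^2$ to $\mathbb{R}$, so that $\phi(t) = t^2 \sin t$ and the chain rule predicts $\phi'(t) = 2t \sin t + t^2 \cos t$.

```python
import math

def phi(t):
    x1, x2 = t * t, math.sin(t)   # g(t)
    return x1 * x2                # f(g(t)) = t^2 sin t

def phi_prime_chain(t):
    # f'(x1, x2) = (x2  x1) as a 1 x 2 matrix; g'(t) = (2t, cos t) as a column;
    # the matrix product f'(g(t)) g'(t) is the scalar below.
    x1, x2 = t * t, math.sin(t)
    return x2 * (2 * t) + x1 * math.cos(t)

t0, h = 0.7, 1e-6
numeric = (phi(t0 + h) - phi(t0 - h)) / (2 * h)   # central difference quotient
predicted = phi_prime_chain(t0)
```

The central difference quotient and the chain-rule value agree to many decimal places, as they must if the theorem holds.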
Proof: Since the mapping $f$ is differentiable at the point $b$,

(2.5) $\quad f(b + h) = f(b) + f'(b) h + \epsilon_f(h) \quad$ where $\quad \lim_{h \to 0} \dfrac{\|\epsilon_f(h)\|}{\|h\|} = 0$,

and since the mapping $g$ is differentiable at the point $a$,

(2.6) $\quad g(a + k) = \underbrace{g(a)}_{b} + \underbrace{g'(a) k + \epsilon_g(k)}_{h} \quad$ where $\quad \lim_{k \to 0} \dfrac{\|\epsilon_g(k)\|}{\|k\|} = 0$.

Substituting (2.6) into (2.5), where $b = g(a)$ and $h = g'(a) k + \epsilon_g(k)$, leads to the result that

$\phi(a + k) = f\bigl(g(a + k)\bigr) = f(b + h) = f(b) + f'(b) h + \epsilon_f(h) = \phi(a) + f'(b) \bigl( g'(a) k + \epsilon_g(k) \bigr) + \epsilon_f(h)$,

hence that

(2.7) $\quad \phi(a + k) = \phi(a) + f'(b) g'(a) k + \epsilon(k) \quad$ where $\quad \epsilon(k) = f'(b)\, \epsilon_g(k) + \epsilon_f(h)$.

From the inequality (1.9) and the triangle inequality it follows that

$\dfrac{\|\epsilon(k)\|}{\|k\|} \le \dfrac{\|f'(b)\, \epsilon_g(k)\|}{\|k\|} + \dfrac{\|\epsilon_f(h)\|}{\|h\|} \cdot \dfrac{\|g'(a) k + \epsilon_g(k)\|}{\|k\|} \le \sqrt{n}\, \|f'(b)\|\, \dfrac{\|\epsilon_g(k)\|}{\|k\|} + \dfrac{\|\epsilon_f(h)\|}{\|h\|} \left( \sqrt{m}\, \|g'(a)\| + \dfrac{\|\epsilon_g(k)\|}{\|k\|} \right).$

Since $\lim_{k \to 0} \frac{\|\epsilon_g(k)\|}{\|k\|} = 0$ and $\lim_{h \to 0} \frac{\|\epsilon_f(h)\|}{\|h\|} = 0$, while $\lim_{k \to 0} h = 0$, it follows from the preceding inequality that $\lim_{k \to 0} \frac{\|\epsilon(k)\|}{\|k\|} = 0$; and it then follows from (2.7) that $\phi = f \circ g$ is differentiable at the point $a$ and that $\phi'(a) = f'\bigl(g(a)\bigr) g'(a)$, which concludes the proof.

For a simple application of the chain rule, since the function $g(y_1, y_2) = y_1 y_2$ is differentiable at any point $y_1, y_2 \in \mathbb{R}$ and $g'(y_1, y_2) = ( y_2 \;\; y_1 )$, it follows that for any differentiable mapping $f = \{f_1, f_2\} : U \to \mathbb{R}^2$ in an open subset $U \subset \mathbb{R}^m$ the composition $\phi = g \circ f : U \to \mathbb{R}$ is a differentiable mapping and

$\phi'(x) = g'\bigl(f(x)\bigr) f'(x) = \bigl( f_2(x) \;\; f_1(x) \bigr) \begin{pmatrix} f_1'(x) \\ f_2'(x) \end{pmatrix} = f_2(x) f_1'(x) + f_1(x) f_2'(x)$;

this is just the extension of the familiar product rule for differentiating functions from the case of functions of a single variable to the case of functions of several variables, since $\phi(x) = f_1(x) f_2(x)$. The entries in the matrix $\phi'(x)$ thus have the form $\partial_k \phi(x) = f_2(x)\, \partial_k f_1(x) + f_1(x)\, \partial_k f_2(x)$. The corresponding argument shows that the quotient $f_1/f_2$ of two differentiable functions is differentiable at any point $x$ at which $f_2(x) \ne 0$, and that its derivative has the familiar form. For another direct application of the chain rule, if $f : U \to V$ is a one-to-one mapping between two open subsets $U, V \subset \mathbb{R}^m$ and if $g : V \to U$ is the inverse mapping, then $g \circ f : U \to U$ is the identity mapping, so that $(g \circ f)'(x) = I$ where $I$ is the $m \times m$ identity matrix.
If the mappings $f$ and $g$ are continuously differentiable then by the chain rule $g'\bigl(f(x)\bigr) f'(x) = I$; thus the matrix $g'\bigl(f(x)\bigr)$ is the inverse of the matrix $f'(x)$ at each point $x \in U$, so both matrices are nonsingular at each point $x \in U$. An alternative notation for the chain rule is suggestive and sometimes quite useful. When mappings $g : \mathbb{R}^l \to \mathbb{R}^m$ and $f : \mathbb{R}^m \to \mathbb{R}^n$ are described in terms of the coordinates $t = \{t_1, \ldots, t_l\} \in \mathbb{R}^l$, $x = \{x_1, \ldots, x_m\} \in \mathbb{R}^m$ and $y = \{y_1, \ldots, y_n\} \in \mathbb{R}^n$,

18 CHAPTER 2. DIFFERENTIABLE MAPPINGS the coordinate functions of the mappings f, g and φ have the form y i = f i(x) = φ i(t) and x j = g j(t). The partial derivatives are sometimes denoted by (φ ) ik = k φ i = yi t k, (f ) ij = jf i(x) = yi x j, (g ) jk = k g j(t) = xj t k. By the preceding theorem the derivative of the composite function φ = f g is the matrix product φ = f g, which in terms of the entries of these matrices is (φ ) ik = P n j=1 (f ) ij(g ) jk or equivalently k φ i = P n j=1 jfi kg j; and in the alternative notation this takes the form y i nx y i (2.8) = xj. t k x j t k j=1 This is the extension to mappings in several variables of the traditional formulation of the chain rule for functions of a single variable as the identity dy = dy dx ; this form dt dx dt of the chain rule is in some ways easier to remember, and with some caution, easier to use, than the version of the chain rule in the preceding theorem. It is customary however to omit any explicit mention of the points at which the derivatives are taken; so some care must be taken to remember the the derivative y i x j is evaluated at the point x while the derivatives y i t k and x j t k are evaluated at the point t. This lack of clarity means that some caution must be taken when this notation is used. Ssome care also must be taken with the chain rule in those cases where the compositions are not quite so straightforward. For instance if φ(x 1, x 2) = f`x 1, x 2, g(x 1, x 2) for a function f(x 1, x 2, x 3) of three variables and a function g(x 1, x 2) of two variables, the function φ is really the composition φ = f G of the mappings f : R 3 R 1 given by the function f and the mapping G : R 2 R 3 given by 0 1 x 1 G(x 1, x 2) = @ x 2 A, g(x 1, x 2) so by the chain rule φ (x) = f `G(x) g (x) = 1f`G(x) 2f`G(x) 0 1 1 0 3f`G(x)! B 0 1 C @ A 1g(x) 2g(x)! 
$= \Bigl( \partial_1 f\bigl(G(x)\bigr) + \partial_3 f\bigl(G(x)\bigr)\, \partial_1 g(x) \quad\; \partial_2 f\bigl(G(x)\bigr) + \partial_3 f\bigl(G(x)\bigr)\, \partial_2 g(x) \Bigr),$

and for the entries of the matrix $\phi'(x)$

$\partial_1 \phi(x) = \partial_1 f\bigl(G(x)\bigr) + \partial_3 f\bigl(G(x)\bigr)\, \partial_1 g(x), \qquad \partial_2 \phi(x) = \partial_2 f\bigl(G(x)\bigr) + \partial_3 f\bigl(G(x)\bigr)\, \partial_2 g(x).$

This amounts to calculating the partial derivative $\partial_1 \phi(x)$ as the sum of the partial derivatives of the function $f\bigl(x_1, x_2, g(x_1, x_2)\bigr)$ with respect to each of its three variables, each multiplied by the derivative of what is in the place of that variable with respect to the variable $x_1$. Some practice, checked by going

2.3. HIGHER DERIVATIVES 19 back to the form of the chain rule give in Theorem 2.3, may prove helpful; and if there are any doubts about an application of the chain rule they can be cleared up by identifying the function as an explicit composition of mappings. It should be noted that in this case the meaning of the expression f`x 1, x 2, x 3 x 1 where x 3 = g(x 1, x 2) is not clear; it may mean either the derivative of the function f with respect to its first variable or the derivative of the composite function of the two variables x 1, x 2 with respect to the variable x 1, while 1f(x 1, x 2, x 3) where x 3 = g(x 1, x 2) is less ambiguous. The chain rule also is useful in deriving information about the derivatives of functions that are defined only implicitly. For example if a function f(x 1, x 2) satisfies the equation f(x 1, x 2) 5 + x 1f(x 1, x 2) + f(x 1, x 2) = 2x 1 + 3x 2 and the initial condition that f(0, 0) = 0 then the values of that function are determined implicitly but not explicitly by the preceding equation. This equation is the condition that the composition of the mapping F : R 2 R 3 defined by F(x 1, x 2) = `x1, x 2, f(x 1, x 2) and the mapping G : R 3 R defined by G(x 1, x 2, y) = y 5 +x 1y + y 2x 1 3x 2 is the trivial mapping G F(x 1, x 2) = 0, so that (G F) (0, 0) = 0; and by the chain rule 1(G F) = 5f(x 1, x 2) 4 1f(x 1, x 2) + x 1 1f(x 1, x 2) + f(x 1, x 2) + 1f(x 1, x 2) 2, so since 1(G F) = 0 and f(0, 0) = 0 the preceding equation reduces to 1f(0, 0) = 2. A similar calculation yields the value of 2f(0, 0). 2.3 Higher Derivatives If f : U R is a function defined in an open set U R 2 and if the partial derivative j1 f(x) exists at all points x U the function j1 f(x) may itself have partial derivatives, such as j2 ( j1`f(x), which for convenience is shortened to j2 j1 f(x); and the process may continue, leading to j3 j2 j1 f(x) and so on. 
The order in which successive derivatives are taken may be significant; for example, a straightforward calculation for the function

$f(x_1, x_2) = \begin{cases} \dfrac{x_1 x_2 (x_1^2 - x_2^2)}{x_1^2 + x_2^2} & \text{if } (x_1, x_2) \ne (0, 0), \\[1ex] 0 & \text{if } (x_1, x_2) = (0, 0) \end{cases}$

shows that $\partial_1 \partial_2 f(0, 0) = 1$ but $\partial_2 \partial_1 f(0, 0) = -1$. However for sufficiently regular functions the order of differentiation is irrelevant.

Theorem 2.4 If $f : U \to \mathbb{R}$ is a function in an open subset $U \subset \mathbb{R}^2$, if the partial derivatives $\partial_1 f(x)$, $\partial_2 f(x)$, $\partial_1 \partial_2 f(x)$, $\partial_2 \partial_1 f(x)$ exist at all points $x \in U$, and if the mixed partial derivatives $\partial_1 \partial_2 f(x)$, $\partial_2 \partial_1 f(x)$ are continuous at a point $a \in U$, then $\partial_1 \partial_2 f(a) = \partial_2 \partial_1 f(a)$.
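The example above can be probed with nested difference quotients; the inner and outer step sizes below are illustrative choices (the inner step must be much smaller than the outer one for the approximation to be meaningful). The two mixed partials at the origin genuinely differ, which is why Theorem 2.4 needs the continuity hypothesis.

```python
# Finite-difference probe of the mixed partial derivatives at the origin
# for f(x1, x2) = x1 x2 (x1^2 - x2^2) / (x1^2 + x2^2).
def f(x1, x2):
    if (x1, x2) == (0.0, 0.0):
        return 0.0
    return x1 * x2 * (x1 * x1 - x2 * x2) / (x1 * x1 + x2 * x2)

k, h = 1e-7, 1e-3   # inner and outer difference steps

d1f = lambda x2: (f(k, x2) - f(-k, x2)) / (2 * k)   # ~ partial_1 f(0, x2) = -x2
d2f = lambda x1: (f(x1, k) - f(x1, -k)) / (2 * k)   # ~ partial_2 f(x1, 0) = x1

d2_d1 = (d1f(h) - d1f(-h)) / (2 * h)   # ~ partial_2 partial_1 f(0, 0)
d1_d2 = (d2f(h) - d2f(-h)) / (2 * h)   # ~ partial_1 partial_2 f(0, 0)
```

The computed values come out close to $-1$ and $+1$ respectively, matching the exact calculation $\partial_1 f(0, x_2) = -x_2$ and $\partial_2 f(x_1, 0) = x_1$.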