The Contraction Mapping Principle


Chapter 3. The Contraction Mapping Principle

The notion of a complete metric space is introduced in Section 1, where it is shown that every metric space can be enlarged to a complete one. The importance of complete metric spaces lies partly in the Contraction Mapping Principle, which is proved in Section 2. Two major applications of the Contraction Mapping Principle are then given: first, a proof of the Inverse Function Theorem in Section 3, and second, a proof of the fundamental existence and uniqueness theorem for the initial value problem of differential equations in Section 4.

3.1 Complete Metric Space

A basic property of R^n is that every Cauchy sequence converges. This property is called the completeness of Euclidean space. The notion of a Cauchy sequence makes sense in any metric space: a sequence {x_n} in (X, d) is a Cauchy sequence if for every ε > 0 there exists some n_0 such that

d(x_n, x_m) < ε, for all n, m ≥ n_0.

A metric space (X, d) is complete if every Cauchy sequence in it converges. A subset E is complete if (E, d restricted to E × E) is complete or, equivalently, if every Cauchy sequence in E converges to a limit in E.

Proposition 3.1. Let (X, d) be a metric space.
(a) Every complete subset of X is closed.
(b) Every closed subset of a complete metric space is complete.

In particular, this proposition shows that a subset of a complete metric space is complete if and only if it is closed.
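As a numerical aside (the sequence is an illustrative choice, not from the text), one can watch a Cauchy sequence escape a non-complete set:

```python
# Illustrative check: x_k = 1 - 1/k is a Cauchy sequence contained in
# [0, 1), but its limit 1 lies outside, so [0, 1) is not complete.
xs = [1 - 1/k for k in range(1, 2001)]

# Cauchy tail estimate: for n, m >= N, |x_n - x_m| <= 1/N.
N = 1000
tail = xs[N - 1:]
assert max(tail) - min(tail) <= 1 / N

assert all(0 <= x < 1 for x in xs)   # the whole sequence stays in [0, 1)
assert abs(xs[-1] - 1) < 1e-3        # yet the terms approach 1, outside the set
```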

Proof. (a) Let E ⊂ X be complete and let {x_n} be a sequence in E converging to some x in X. Since every convergent sequence is a Cauchy sequence, {x_n} is a Cauchy sequence in E and, by the completeness of E, converges to some z in E. By the uniqueness of limits, x = z ∈ E, so E is closed.

(b) Let (X, d) be complete and E a closed subset of X. Every Cauchy sequence {x_n} in E is also a Cauchy sequence in X. By the completeness of X, there is some x in X to which {x_n} converges. As E is closed, x belongs to E. So every Cauchy sequence in E has a limit in E.

Example 3.1. We have seen that R is complete. Consequently, being closed subsets of R, the intervals [a, b], (−∞, b] and [a, ∞) are all complete sets. In contrast, the interval [a, b), b ∈ R, is not complete: the sequence {b − 1/k}, k ≥ k_0 for some large k_0, is a Cauchy sequence in [a, b) which has no limit in [a, b) (its limit is b, which lies outside [a, b)). The set Q of all rational numbers is also not complete: every irrational number is the limit of some sequence in Q, and such sequences are Cauchy sequences whose limits lie outside Q.

Example 3.2. Every Cauchy sequence in C[a, b] with respect to the sup-norm converges uniformly, so its limit is again continuous. Therefore C[a, b] is a complete space. The subset E = {f ∈ C[a, b] : f(x) ≥ 0 for all x} is also complete. Indeed, a Cauchy sequence {f_n} in E is also a Cauchy sequence in C[a, b], hence there exists some f ∈ C[a, b] to which {f_n} converges uniformly. As uniform convergence implies pointwise convergence, f(x) = lim_{n→∞} f_n(x) ≥ 0, so f belongs to E, and E is complete. Next, let P[a, b] be the collection of all polynomials restricted to [a, b]. It is not complete. Indeed, taking

h_n(x) = Σ_{k=0}^{n} x^k / k!,

{h_n} is a Cauchy sequence in P[a, b] which converges uniformly to e^x. As e^x is not a polynomial, P[a, b] is not complete.

To obtain a typical non-complete set, consider the closed interval [0, 1] in R. Take away one point z from it to form E = [0, 1] \ {z}. Then E is not complete, since every sequence in E converging to z is a Cauchy sequence which does not converge in E. In general, you may think of sets with holes as non-complete ones. Now, given a non-complete metric space, can we make it complete by filling in all the holes? The answer turns out to be affirmative. We can always enlarge a non-complete metric space into a complete one by adjoining sufficiently many ideal points.

Theorem 3.2 (Completion Theorem). Every metric space has a completion.

This theorem will be further explained and proved in the appendix.

3.2 The Contraction Mapping Principle

Solving an equation f(x) = 0, where f is a function from R^n to itself, frequently comes up in applications. The problem can be turned into a fixed point problem. Literally, a fixed point of a mapping is a point left unchanged by the mapping. Introducing the function g(x) = f(x) + x, solving the equation f(x) = 0 is equivalent to finding a fixed point of g. This general observation underscores the importance of finding fixed points. In this section we prove the Contraction Mapping Principle, one of the oldest fixed point theorems and perhaps the best known. As we will see, it has a wide range of applications.

A map T : (X, d) → (X, d) is called a contraction if there is a constant γ ∈ (0, 1) such that

d(Tx, Ty) ≤ γ d(x, y), for all x, y ∈ X.

A point x is called a fixed point of T if Tx = x. (Usually we write Tx instead of T(x).)

Theorem 3.3 (Contraction Mapping Principle). Every contraction in a complete metric space admits a unique fixed point.

This theorem is also called Banach's Fixed Point Theorem.

Proof. Let T be a contraction in the complete metric space (X, d). Pick an arbitrary x_0 ∈ X and define a sequence {x_n} by setting x_n = T x_{n−1} = T^n x_0, n ≥ 1. We claim that {x_n} is a Cauchy sequence in X. First of all, by iteration we have

d(T^n x_0, T^{n−1} x_0) ≤ γ d(T^{n−1} x_0, T^{n−2} x_0) ≤ ⋯ ≤ γ^{n−1} d(T x_0, x_0).   (3.1)

Next, for n ≥ N, where N is to be specified in a moment,

d(x_n, x_N) = d(T^n x_0, T^N x_0) ≤ γ d(T^{n−1} x_0, T^{N−1} x_0) ≤ ⋯ ≤ γ^N d(T^{n−N} x_0, x_0).

By the triangle inequality and (3.1),

d(x_n, x_N) ≤ γ^N Σ_{j=1}^{n−N} d(T^j x_0, T^{j−1} x_0) ≤ γ^N Σ_{j=1}^{n−N} γ^{j−1} d(T x_0, x_0) < γ^N d(T x_0, x_0) / (1 − γ).   (3.2)

Given ε > 0, choose N so large that d(T x_0, x_0) γ^N / (1 − γ) < ε/2. Then for n, m ≥ N,

d(x_n, x_m) ≤ d(x_n, x_N) + d(x_N, x_m) < 2 d(T x_0, x_0) γ^N / (1 − γ) < ε,

so {x_n} is a Cauchy sequence. As X is complete, x = lim_{n→∞} x_n exists. By the continuity of T, lim_{n→∞} T x_n = T x. On the other hand, lim_{n→∞} T x_n = lim_{n→∞} x_{n+1} = x. We conclude that T x = x.

Suppose y ∈ X is another fixed point. From

d(x, y) = d(Tx, Ty) ≤ γ d(x, y)

and γ ∈ (0, 1), we conclude that d(x, y) = 0, i.e., x = y.

Incidentally, we point out that this proof is constructive: it tells you how to find the fixed point starting from an arbitrary point. In fact, letting n → ∞ in (3.2) and then replacing N by n, we obtain an error estimate between the fixed point and the approximating sequence {x_n}:

d(x, x_n) ≤ γ^n d(T x_0, x_0) / (1 − γ), n ≥ 1.

The following two examples demonstrate the sharpness of the Contraction Mapping Principle.

Example 3.3. Consider the map T x = x/2, which maps (0, 1] to itself. It is clearly a contraction. If T x = x, then x = x/2, which forces x = 0 ∉ (0, 1]. Thus T has no fixed point in (0, 1]. This example shows that the completeness of the underlying space cannot be removed from the assumptions of the theorem.
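The constructive nature of the proof, together with the error estimate, can be illustrated numerically. A minimal sketch with the illustrative contraction T x = x/2 + 1 on R (γ = 1/2, fixed point x = 2):

```python
# Iterate the contraction T(x) = x/2 + 1 on the complete space R
# (gamma = 1/2, fixed point x* = 2) and verify the a priori estimate
#   d(x*, x_n) <= gamma^n / (1 - gamma) * d(T x0, x0).
def T(x):
    return x / 2 + 1

gamma, x0, x_star = 0.5, 0.0, 2.0
d0 = abs(T(x0) - x0)        # d(T x0, x0)

x = x0
for n in range(1, 30):
    x = T(x)
    bound = gamma**n / (1 - gamma) * d0
    assert abs(x_star - x) <= bound + 1e-15   # error estimate holds at step n

assert abs(x_star - x) < 1e-8   # the iterates converge to the fixed point
```

Each step halves the distance to the fixed point, in line with the geometric bound above.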

Example 3.4. Consider the map S from R to itself defined by S x = x − log(1 + e^x). We have

dS/dx = 1/(1 + e^x) ∈ (0, 1), for all x ∈ R.

By the Mean Value Theorem,

|S x_1 − S x_2| = |S′(c)| |x_1 − x_2| < |x_1 − x_2|,

and yet S possesses no fixed point (a fixed point would require log(1 + e^x) = 0, which is impossible). This shows that the contraction condition, with a constant γ strictly less than 1, cannot be removed from the assumptions of the theorem.

Example 3.5. Let f : [0, 1] → [0, 1] be a continuously differentiable function satisfying |f′(x)| < 1 on [0, 1]. We claim that f admits a unique fixed point. Indeed, by the Mean Value Theorem, for x, y ∈ [0, 1] there exists some z between x and y such that f(y) − f(x) = f′(z)(y − x). Therefore

|f(y) − f(x)| = |f′(z)| |y − x| ≤ γ |y − x|, where γ = sup_{t ∈ [0,1]} |f′(t)| < 1

(γ < 1 because |f′| is continuous on the compact set [0, 1], so the supremum is attained). Thus f is a contraction and, by the Contraction Mapping Principle, it has a unique fixed point. In fact, using the Intermediate Value Theorem one can show that every continuous function from [0, 1] to itself admits at least one fixed point. This is a general fact: by Brouwer's Fixed Point Theorem, every continuous map from a compact convex set in R^n to itself admits at least one fixed point. However, when the set has non-trivial topology, fixed points may fail to exist. For instance, take X to be the annulus A = {(x, y) : 1 ≤ x² + y² ≤ 4} and T a rotation about the origin. Clearly T has no fixed point in A. This is due to the topology of A, namely, it has a hole.

We describe a common situation where the Contraction Mapping Principle can be applied. Let (X, ‖·‖) be a normed space and Φ : B_r(x_0) → X a map satisfying Φ(x_0) = y_0. We ask: is the equation Φ(x) = y locally solvable? That is, for every y sufficiently near y_0, is there some x close to x_0 such that Φ(x) = y? We have the following result.

Theorem 3.4 (Perturbation of Identity). Let (X, ‖·‖) be a Banach space and let Φ : B_r(x_0) → X satisfy Φ(x_0) = y_0. Suppose that Φ is of the form I + Ψ, where I is the identity map and Ψ satisfies

‖Ψ(x_2) − Ψ(x_1)‖ ≤ γ ‖x_2 − x_1‖, x_1, x_2 ∈ B_r(x_0), γ ∈ (0, 1).

Then for every y ∈ B_R(y_0), R = (1 − γ) r, there is a unique x ∈ B_r(x_0) satisfying Φ(x) = y.

The idea of the following proof can be explained in a few words. Taking x_0 = y_0 = 0 for simplicity, we would like to find x solving x + Ψ(x) = y. This is equivalent to finding a fixed point of the map T given by T x = y − Ψ(x). By our assumption Ψ is a contraction, and hence so is T.

Proof. We first shift the points x_0 and y_0 to 0 by redefining Φ. Indeed, for x ∈ B_r(0), let

Φ̃(x) = Φ(x + x_0) − Φ(x_0) = x + Ψ(x + x_0) − Ψ(x_0),

so that Φ̃(0) = 0. For y ∈ B_R(0), consider the map on the closed ball B̄_r(0) given by T x = x − (Φ̃(x) − y). We verify that T is a well-defined contraction on B̄_r(0).

First, T maps B̄_r(0) into itself. Indeed,

‖T x‖ = ‖x − Φ̃(x) + y‖ = ‖Ψ(x_0) − Ψ(x + x_0) + y‖ ≤ ‖Ψ(x + x_0) − Ψ(x_0)‖ + ‖y‖ ≤ γ ‖x‖ + R ≤ γ r + (1 − γ) r = r.

Next, T is a contraction:

‖T x_2 − T x_1‖ = ‖Ψ(x_1 + x_0) − Ψ(x_2 + x_0)‖ ≤ γ ‖x_2 − x_1‖.

As B̄_r(0) is a closed subset of the complete space X, it is itself complete. The Contraction Mapping Principle now applies: for each y ∈ B_R(0) there is a unique fixed point x of T in B̄_r(0), in other words, Φ̃(x) = y for a unique x. The desired conclusion follows after translating back to Φ.

Two remarks are in order. First, it suffices to assume that Ψ is a contraction on the open ball B_r(x_0) in the theorem: a contraction is uniformly continuous, so it extends to a contraction with the same contraction constant on the closed ball. Second, by examining the proof above, one sees that the fixed point satisfies x ∈ B_r(x_0) whenever y ∈ B_R(y_0). Obviously, the terminology "perturbation of identity" comes from the expression

Φ̃(x) = Φ(x + x_0) − Φ(x_0) = x + Ψ(x + x_0) − Ψ(x_0),

which has the form of the identity plus a term satisfying the smallness condition ‖Ψ(x + x_0) − Ψ(x_0)‖ ≤ γ ‖x‖, γ ∈ (0, 1).
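The scheme in the proof is easy to run numerically. A minimal sketch, with the illustrative perturbation Ψ(x) = 0.2 sin x on X = R (so γ = 0.2 and x_0 = y_0 = 0):

```python
# Solve x + 0.2*sin(x) = y by iterating T x = y - Psi(x), the
# contraction constructed in the proof (here gamma = 0.2).
import math

def psi(x):
    return 0.2 * math.sin(x)

def solve(y, steps=60):
    x = 0.0
    for _ in range(steps):
        x = y - psi(x)          # one step of the contraction T
    return x

for target in (-0.5, 0.0, 0.3, 1.0):
    x = solve(target)
    assert abs(x + psi(x) - target) < 1e-12   # Phi(x) = y to machine accuracy
```

Convergence is geometric with ratio γ, as the error estimate of Theorem 3.3 predicts.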

Example 3.6. Show that the equation 3x⁴ − x² + x = 0.05 has a real root. We look for a solution near 0. Take X = R and Φ(x) = x + Ψ(x), where Ψ(x) = 3x⁴ − x², so that Φ(0) = 0. According to the theorem, we need to find some r making Ψ a contraction. For x_1, x_2 ∈ B_r(0), that is, x_1, x_2 ∈ [−r, r], we have

|Ψ(x_1) − Ψ(x_2)| = |(3x_2⁴ − x_2²) − (3x_1⁴ − x_1²)| ≤ (3(|x_2|³ + |x_2|² |x_1| + |x_2| |x_1|² + |x_1|³) + |x_2| + |x_1|) |x_2 − x_1| ≤ (12r³ + 2r) |x_2 − x_1|,

so Ψ is a contraction as long as γ = 12r³ + 2r < 1. Taking r = 1/4, then γ = 11/16 < 1 does the job, and R = (1 − γ) r = 5/64 ≈ 0.078. We conclude that for every number b with |b| < 5/64, the equation 3x⁴ − x² + x = b has a unique root in (−1/4, 1/4). Since 0.05 falls within this range, the equation has a root.

For solving systems of equations in R^n there is a convenient criterion. Suppose we are given Φ(x) = x + Ψ(x) : G → R^n, where G is an open set containing 0 and Φ(0) = 0. We claim: if Ψ is differentiable near 0 and

lim_{x→0} ∂Ψ_i/∂x_j (x) = 0, i, j = 1, …, n,

then for sufficiently small r, Ψ is a contraction on B_r(0). Consequently, we may apply the theorem on perturbation of identity to conclude that Φ(x) = y is uniquely solvable near 0.

To prove the claim, fix x_1, x_2 ∈ B_r(0), where r is to be determined, and consider the function φ(t) = Ψ_i(x_1 + t(x_2 − x_1)). We have φ(0) = Ψ_i(x_1) and φ(1) = Ψ_i(x_2). By the Mean Value Theorem, there is some t* ∈ (0, 1) such that φ(1) − φ(0) = φ′(t*)(1 − 0) = φ′(t*). By the Chain Rule,

φ′(t) = d/dt Ψ_i(x_1 + t(x_2 − x_1)) = Σ_{j=1}^{n} ∂Ψ_i/∂x_j (x_1 + t(x_2 − x_1)) (x_{2j} − x_{1j}).

Setting z = x_1 + t*(x_2 − x_1), we have

Ψ_i(x_2) − Ψ_i(x_1) = Σ_{j=1}^{n} ∂Ψ_i/∂x_j (z) (x_{2j} − x_{1j}).

By the Cauchy-Schwarz inequality,

|Ψ_i(x_2) − Ψ_i(x_1)|² ≤ Σ_{j=1}^{n} (∂Ψ_i/∂x_j (z))² ‖x_2 − x_1‖².

Summing over i gives

‖Ψ(x_1) − Ψ(x_2)‖ ≤ M ‖x_1 − x_2‖, where M = sup_{‖z‖ ≤ r} ( Σ_{i,j} (∂Ψ_i/∂x_j (z))² )^{1/2}.

From our assumption, we can find some small r such that M < 1. This estimate will be useful in the exercises.

Example 3.7. Consider the integral equation

y(x) = g(x) + ∫₀¹ K(x, t) y²(t) dt,

where K ∈ C([0, 1]²) and g ∈ C[0, 1]. We would like to show that it admits a solution y as long as g is small in some sense. Our first job is to formulate this as a problem of perturbation of identity. We work in the Banach space C[0, 1] and let

Ψ(y)(x) = −∫₀¹ K(x, t) y²(t) dt, that is, Φ(y)(x) = y(x) − ∫₀¹ K(x, t) y²(t) dt.

We further choose x_0 to be 0, the zero function, so y_0 = Φ(0) = 0. Then, for y_1, y_2 ∈ B_r(0) (r to be specified later),

‖Ψ(y_2) − Ψ(y_1)‖ ≤ max_x ∫₀¹ |K(x, t)| |y_2² − y_1²|(t) dt ≤ 2Mr ‖y_2 − y_1‖, M = max{|K(x, t)| : (x, t) ∈ [0, 1]²},

using |y_2² − y_1²| ≤ (|y_2| + |y_1|) |y_2 − y_1| ≤ 2r |y_2 − y_1|. This shows that Ψ is a contraction as long as γ = 2Mr < 1. Under this condition, we may apply Theorem 3.4 to conclude that for every g ∈ B_R(0), R = (1 − γ) r, the integral equation

y(x) − ∫₀¹ K(x, t) y²(t) dt = g(x)

has a unique solution y ∈ B_r(0). For instance, fix r = 1/(4M), so that 2Mr = 1/2 and R = 1/(8M). The integral equation is then solvable for every g with ‖g‖ < 1/(8M); in fact, there is exactly one solution satisfying ‖y‖ ≤ 1/(4M).

You should note that in these two examples the underlying space is first a Euclidean space and then a function space. This shows the power of abstraction: the fixed point theorem applies to all complete metric spaces.
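The fixed-point iteration behind Theorem 3.4 can be carried out on a grid for this integral equation; the kernel K(x, t) = xt and the constant function g ≡ 0.1 below are illustrative choices, not from the text (so M = 1 and ‖g‖ = 0.1 < 1/(8M) = 0.125):

```python
# Picard-type iteration y <- g + \int_0^1 K(x,t) y(t)^2 dt on a grid,
# for the illustrative data K(x,t) = x*t (so M = 1) and g = 0.1.
import numpy as np

n = 201
t = np.linspace(0.0, 1.0, n)
w = np.full(n, t[1] - t[0])          # trapezoid quadrature weights
w[0] /= 2
w[-1] /= 2
K = np.outer(t, t)                   # K[i, j] = t_i * t_j
g = 0.1 * np.ones(n)

y = np.zeros(n)
for _ in range(60):
    y = g + K @ (y**2 * w)           # discretized integral operator

resid = y - (g + K @ (y**2 * w))
assert np.max(np.abs(resid)) < 1e-12
assert np.max(np.abs(y)) <= 0.25     # the solution stays in the ball r = 1/(4M)
```

The iteration contracts with factor roughly 2M‖y‖ per step, so a few dozen sweeps reach machine accuracy.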

3.3 The Inverse Function Theorem

We present some old and new results in preparation for the proof of the Inverse Function Theorem: the operator norm of a matrix, the chain rule, and a form of the mean value theorem.

First, recall that an n × n matrix defines a linear transformation from R^n to R^n (here we consider square matrices only). One might try to measure its size by ‖A‖ = sup{‖Ax‖₂ : x ∈ R^n}, but one quickly realizes that this supremum is infinite unless A is the zero matrix. Observing that a linear transformation is completely determined by its values on the unit ball, we define its operator norm as

‖A‖_op = sup{‖Ax‖₂ : x ∈ R^n, ‖x‖₂ ≤ 1}.

The subscript "op" indicates the operator norm. It is straightforward to verify that this really defines a norm on the vector space of all n × n matrices. In what follows we simplify notation by writing ‖x‖ instead of ‖x‖₂ and ‖A‖ instead of ‖A‖_op.

Proposition 3.5. For n × n matrices A, B:
(a) ‖A‖ = sup{‖Ax‖/‖x‖ : x ≠ 0, x ∈ R^n};
(b) ‖Ax‖ ≤ ‖A‖ ‖x‖ for all x ∈ R^n;
(c) ‖AB‖ ≤ ‖A‖ ‖B‖.

Proof. (a) Let α = sup{‖Ax‖/‖x‖ : x ≠ 0}. As

sup{‖Ax‖/‖x‖ : x ≠ 0} = sup{‖Ax̂‖ : ‖x̂‖ = 1} ≤ sup{‖Ax‖ : ‖x‖ ≤ 1},

we see that α ≤ ‖A‖. On the other hand, from the definition of α we have ‖Ax‖ ≤ α ‖x‖ for all x. Therefore ‖A‖ = sup{‖Ax‖ : ‖x‖ ≤ 1} ≤ α. We conclude that α = ‖A‖.

(b) Just observe that for every non-zero x, x̂ = x/‖x‖ is a unit vector, so ‖Ax̂‖ ≤ ‖A‖.

(c) We have ‖ABx‖ ≤ ‖A‖ ‖Bx‖ ≤ ‖A‖ ‖B‖ ‖x‖. Dividing both sides by ‖x‖ and taking the supremum, the desired result follows.
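These properties are easy to check numerically; here the operator norm is computed as the largest singular value (numpy's matrix 2-norm), and the matrices are random illustrative data:

```python
# Check Proposition 3.5 numerically: ||Ax|| <= ||A|| ||x|| and
# ||AB|| <= ||A|| ||B||, with ||.|| the operator norm, computed by
# np.linalg.norm(., 2) as the largest singular value.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

op = lambda M: np.linalg.norm(M, 2)

assert np.linalg.norm(A @ x) <= op(A) * np.linalg.norm(x) + 1e-12
assert op(A @ B) <= op(A) * op(B) + 1e-12
# ||A||^2 equals the largest eigenvalue of the symmetric matrix A^T A:
assert abs(op(A)**2 - np.linalg.eigvalsh(A.T @ A).max()) < 1e-10
```

The last assertion anticipates the exercise mentioned just below.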

In the exercises you are asked to show that ‖A‖² is equal to the largest eigenvalue of the symmetric matrix AᵗA. A rough estimate for ‖A‖ is also available:

‖A‖ ≤ ( Σ_{i,j} a_{ij}² )^{1/2}.

Second, let us recall the general chain rule. Let G : R^n → R^m and F : R^m → R^l be C¹, so that their composition H = F ∘ G : R^n → R^l is also C¹. We compute the first partial derivatives of H in terms of the partial derivatives of F and G. Writing G = (g_1, …, g_m), F = (f_1, …, f_l) and H = (h_1, …, h_l), from

h_k(x_1, …, x_n) = f_k(g_1(x), …, g_m(x)), k = 1, …, l,

we have

∂h_k/∂x_j = Σ_{i=1}^{m} ∂f_k/∂y_i · ∂g_i/∂x_j.

It is handy to write this in matrix form. Let DF denote the Jacobian matrix of F, the l × m matrix with entries ∂F_i/∂x_j, and similarly for DG and DH. Then the formula above becomes

DH(x) = DF(G(x)) DG(x).

Third, the mean value theorem in the one-dimensional case reads f(y) − f(x) = f′(c)(y − x) for some c lying between x and y. To remove the uncertainty about c, we note the alternative formula

f(y) = f(x) + ∫₀¹ f′(x + t(y − x)) dt · (y − x),

which is obtained from

f(y) = f(x) + ∫₀¹ d/dt f(x + t(y − x)) dt   (Fundamental Theorem of Calculus)
     = f(x) + ∫₀¹ f′(x + t(y − x)) dt · (y − x)   (Chain Rule).

We will need its n-dimensional version.

Proposition 3.6. Let F be C¹ on a ball B in R^n. For x_1, x_2 ∈ B,

F(x_2) − F(x_1) = ∫₀¹ DF(x_1 + t(x_2 − x_1)) dt · (x_2 − x_1).

Componentwise this means

F_i(x_2) − F_i(x_1) = Σ_{j=1}^{n} ∫₀¹ ∂F_i/∂x_j (x_1 + t(x_2 − x_1)) dt · (x_{2j} − x_{1j}), i = 1, …, n.

Proof. Applying the Chain Rule to each component F_i, we have

F_i(x_2) − F_i(x_1) = ∫₀¹ d/dt F_i(x_1 + t(x_2 − x_1)) dt = Σ_j ∫₀¹ ∂F_i/∂x_j (x_1 + t(x_2 − x_1)) (x_{2j} − x_{1j}) dt,

which is the i-th component of ∫₀¹ DF(x_1 + t(x_2 − x_1)) dt (x_2 − x_1).

The Inverse Function Theorem and the Implicit Function Theorem play a fundamental role in analysis and geometry. They illustrate the principle of linearization, which is ubiquitous in mathematics. We learned these theorems in advanced calculus, but the proofs were not emphasized; now we fill the gap. Everything rests on linearization. Recall that a real-valued function f on an open interval I is differentiable at some x_0 ∈ I if there exists some a ∈ R such that

lim_{x→x_0} ( f(x) − f(x_0) − a(x − x_0) ) / (x − x_0) = 0.

In fact, the value a is equal to f′(x_0), the derivative of f at x_0. We can rewrite this limit using the little-o notation:

f(x_0 + z) − f(x_0) = f′(x_0) z + o(z), as z → 0,

where o(z) denotes a quantity satisfying lim_{z→0} o(z)/|z| = 0. The same situation carries over to a real-valued function f on some open set in R^n: f is called differentiable at x_0 in this open set if there exists a vector a = (a_1, …, a_n) such that

f(x_0 + x) − f(x_0) = Σ_{j=1}^{n} a_j x_j + o(‖x‖), as x → 0.

Note that here x = (x_1, …, x_n) is a vector. Again one can show that the vector a is uniquely given by the gradient vector of f at x_0,

∇f(x_0) = ( ∂f/∂x_1 (x_0), …, ∂f/∂x_n (x_0) ).

More generally, a map F from an open set in R^n to R^m is called differentiable at a point x_0 of this open set if each component of F = (f_1, …, f_m) is differentiable there. We can write the differentiability condition collectively in the following form:

F(x_0 + x) − F(x_0) = DF(x_0) x + o(‖x‖),   (3.3)

where DF(x_0) is the linear map from R^n to R^m given by

(DF(x_0) x)_i = Σ_{j=1}^{n} a_{ij}(x_0) x_j, i = 1, …, m,

and (a_{ij}) = (∂f_i/∂x_j) is the Jacobian matrix of F. Formula (3.3) shows that near x_0, that is, when ‖x‖ is small, the function F is well approximated by the linear map DF(x_0) up to the constant F(x_0). It suggests that the local behavior of a map at a point of differentiability can be retrieved from its linearization, which is much easier to analyse. This principle, called linearization, is widely used in analysis. The Inverse Function Theorem is a typical result of linearization: it asserts that a map is locally invertible if its linearization is invertible, so local bijectivity of the map is ensured by the invertibility of its linearization. When DF(x_0) is not invertible, the first term on the right-hand side of (3.3) may degenerate in some or even all directions, so that DF(x_0) x cannot control the error term o(‖x‖); in this case the local behavior of F may differ from that of its linearization.

Theorem 3.7 (Inverse Function Theorem). Let F : U → R^n be a C¹-map, where U is open in R^n, and let x_0 ∈ U. Suppose that DF(x_0) is invertible.
(a) There exist open sets V and W containing x_0 and F(x_0) respectively such that the restriction of F to V is a bijection onto W with a C¹-inverse.
(b) The inverse is C^k whenever F is C^k, 1 ≤ k ≤ ∞, in V.

A map from an open set in R^n to R^m is C^k, 1 ≤ k ≤ ∞, if all its components belong to C^k. It is called a C∞-map, or a smooth map, if its components are C∞. Similarly, a matrix is C^k or smooth if all its entries are C^k or smooth accordingly. The condition that DF(x_0) be invertible, or equivalently that the determinant of the Jacobian matrix be non-vanishing, is called the nondegeneracy condition. Without this condition, the map may or may not be locally invertible; see the examples below.

When the inverse is differentiable, we may apply the chain rule to differentiate the relation F⁻¹(F(x)) = x to obtain

DF⁻¹(y_0) DF(x_0) = I, y_0 = F(x_0),

where I is the identity map. We conclude that DF⁻¹(y_0) = (DF(x_0))⁻¹. In other words, the Jacobian matrix of the inverse map is precisely the inverse of the Jacobian matrix of the map. So although an inverse may exist without the nondegeneracy condition, this condition is necessary in order to have a differentiable inverse. We single this out in the following proposition.

Proposition 3.8. Let F : U → R^n be a C¹-map and x_0 ∈ U. Suppose that for some open V ⊂ U containing x_0, F is invertible on V with a differentiable inverse. Then DF(x_0) is non-singular.

Now we prove Theorem 3.7. Let x_0 ∈ U and y_0 = F(x_0). Write F(x) = DF(x_0) x + (F(x) − DF(x_0) x) and apply A = (DF(x_0))⁻¹ to both sides:

A F(x) = x + A (F(x) − DF(x_0) x),

which has the form of a perturbation of the identity. We set Φ(x) = A F(x) and

Φ̃(x) = Φ(x + x_0) − Φ(x_0) = x + A (F(x + x_0) − F(x_0) − DF(x_0) x) ≡ x + Ψ̃(x).

By Proposition 3.6, for x_1, x_2 ∈ B_r(0), where r is to be specified later,

Ψ̃(x_2) − Ψ̃(x_1) = A (F(x_2 + x_0) − F(x_1 + x_0) − DF(x_0)(x_2 − x_1)) = A ∫₀¹ (DF(x_1 + x_0 + t(x_2 − x_1)) − DF(x_0)) dt (x_2 − x_1),

so that

‖Ψ̃(x_2) − Ψ̃(x_1)‖ ≤ ‖AB‖_op ‖x_2 − x_1‖ ≤ ‖A‖_op ‖B‖_op ‖x_2 − x_1‖,

where the matrix B is given by

B = ∫₀¹ (DF(x_1 + x_0 + t(x_2 − x_1)) − DF(x_0)) dt.

Note that the ij-th entry of B = (b_{ij}) is

b_{ij} = ∫₀¹ ( ∂F_i/∂x_j (x_1 + x_0 + t(x_2 − x_1)) − ∂F_i/∂x_j (x_0) ) dt.

By the continuity of the ∂F_i/∂x_j, given ε > 0 there is some r such that

|∂F_i/∂x_j (z) − ∂F_i/∂x_j (x_0)| < ε, for all z ∈ B_r(x_0).

As x_1 + x_0 + t(x_2 − x_1) ∈ B_r(x_0) whenever x_1, x_2 ∈ B_r(0), we get |b_{ij}| < ε for x_1, x_2 ∈ B_r(0). It follows that

‖B‖_op ≤ ( Σ_{i,j} b_{ij}² )^{1/2} ≤ (n² ε²)^{1/2} = n ε,

and hence ‖Ψ̃(x_2) − Ψ̃(x_1)‖ ≤ ‖A‖_op n ε ‖x_2 − x_1‖. Now, choosing ε so that ‖A‖_op n ε = 1/2, we obtain

‖Ψ̃(x_2) − Ψ̃(x_1)‖ ≤ (1/2) ‖x_2 − x_1‖, x_1, x_2 ∈ B_r(0),   (3.4)

that is, Ψ̃ is a contraction.

By Theorem 3.4, we conclude that Φ̃(x) = y is uniquely solvable for y ∈ B_R(0), R = (1 − 1/2) r = r/2, with x ∈ B_r(0); that is, Φ(x) = y is uniquely solvable for y ∈ B_R(Φ(x_0)) with x ∈ B_r(x_0). In terms of F, A F(x) = y is uniquely solvable for y ∈ B_R(A y_0). Since A is invertible, W = A⁻¹(B_R(A y_0)) is an open set containing y_0, and F(x) = y is solvable for each y ∈ W with a unique x ∈ B_r(x_0). The correspondence y ↦ x defines a map G from W to B_r(x_0) satisfying F(G(y)) = y; that is, G is the inverse of F restricted to B_r(x_0). It suffices to take V = G(W) ⊂ B_r(x_0), so that G maps W onto V; note that V = F⁻¹(W) ∩ B_r(x_0) is an open set.

Next, we claim that G is continuous. Let G̃ be the inverse of Φ̃, mapping B_R(0) back to B_r(0). The following relation holds:

G(y) = G̃(A(y − y_0)) + x_0.

Indeed, for y ∈ W, unwinding the definition of Φ̃ from Φ̃(G̃(A(y − y_0))) = A(y − y_0) gives A F(G̃(A(y − y_0)) + x_0) = A y, and hence F(G(y)) = y in W. Thus it suffices to show that G̃ is continuous. In fact, for G̃(y_i) = x_i ∈ B_r(0), i = 1, 2 (not to be confused with the x_1, x_2 above), we have

Φ̃(x_i) = y_i, or x_i = y_i − Ψ̃(x_i), and

‖G̃(y_2) − G̃(y_1)‖ = ‖x_2 − x_1‖ ≤ ‖Ψ̃(x_1) − Ψ̃(x_2)‖ + ‖y_2 − y_1‖ ≤ (1/2) ‖x_2 − x_1‖ + ‖y_2 − y_1‖,

after using (3.4). We deduce

‖G̃(y_2) − G̃(y_1)‖ ≤ 2 ‖y_2 − y_1‖,

that is, G̃ is Lipschitz continuous on B_R(0).

Finally, let us show that G̃ is a C¹-map in B_R(0). In fact, for y_1 and y_1 + y in B_R(0), using

y = (y_1 + y) − y_1 = Φ̃(G̃(y_1 + y)) − Φ̃(G̃(y_1)) = ∫₀¹ DΦ̃(G̃(y_1) + t(G̃(y_1 + y) − G̃(y_1))) dt · (G̃(y_1 + y) − G̃(y_1)),

we have

G̃(y_1 + y) − G̃(y_1) = (DΦ̃)⁻¹(G̃(y_1)) y + R,

where R is given by

R = (DΦ̃)⁻¹(G̃(y_1)) ∫₀¹ ( DΦ̃(G̃(y_1)) − DΦ̃(G̃(y_1) + t(G̃(y_1 + y) − G̃(y_1))) ) dt · (G̃(y_1 + y) − G̃(y_1)).

As G̃ is continuous and F is C¹, we have

G̃(y_1 + y) − G̃(y_1) − (DΦ̃)⁻¹(G̃(y_1)) y = o(1) (G̃(y_1 + y) − G̃(y_1)) for small y.

Using the Lipschitz bound ‖G̃(y_1 + y) − G̃(y_1)‖ ≤ 2‖y‖, we see that

G̃(y_1 + y) − G̃(y_1) − (DΦ̃)⁻¹(G̃(y_1)) y = o(‖y‖), as y → 0.

We conclude that G̃ is differentiable with Jacobian matrix (DΦ̃)⁻¹(G̃(y_1)); it follows that G is also differentiable.

Having proved the differentiability of G, from the formula DF(G(y)) DG(y) = I, where I is the identity matrix, we see that DG(y) = (DF(G(y)))⁻¹. From linear algebra we know that each entry of DG(y) can be expressed as a rational function of the entries of the matrix DF(G(y)). Consequently, DG(y) is C^k in y if DF(G(y)) is C^k, 1 ≤ k ≤ ∞. The proof of the Inverse Function Theorem is complete.
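As a sanity check of the concluding relation DG(y) = (DF(G(y)))⁻¹, the following sketch inverts an illustrative map near the origin by the fixed-point iteration from the proof, then compares a finite-difference Jacobian of the inverse with the inverse Jacobian:

```python
# Check DG(y) = (DF(G(y)))^{-1} for the illustrative map
# F(x) = (x_1 + x_2^2, x_2 + x_1^2) near 0, where DF(0) = I, inverting
# F by the perturbation-of-identity iteration x <- y - (F(x) - x).
import numpy as np

F = lambda x: np.array([x[0] + x[1]**2, x[1] + x[0]**2])
DF = lambda x: np.array([[1.0, 2 * x[1]], [2 * x[0], 1.0]])

def G(y, steps=200):
    x = np.zeros(2)
    for _ in range(steps):
        x = y - (F(x) - x)       # contraction for y near the origin
    return x

y = np.array([0.05, -0.03])
assert np.max(np.abs(F(G(y)) - y)) < 1e-12   # G really inverts F

h = 1e-6                                      # central-difference Jacobian of G
DG = np.column_stack([(G(y + h * e) - G(y - h * e)) / (2 * h)
                      for e in np.eye(2)])
assert np.max(np.abs(DG - np.linalg.inv(DF(G(y))))) < 1e-4
```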

(Added November 1. For those who would like to see the details at the end of the proof above, write R = M N (G̃(y_1 + y) − G̃(y_1)), where the matrices M and N are

M = (DΦ̃)⁻¹(G̃(y_1)), N = ∫₀¹ ( DΦ̃(G̃(y_1)) − DΦ̃(G̃(y_1) + t(G̃(y_1 + y) − G̃(y_1))) ) dt.

As ‖G̃(y_1 + y) − G̃(y_1)‖ ≤ 2 ‖(y_1 + y) − y_1‖ = 2 ‖y‖ from above, we have

‖R‖ / ‖y‖ ≤ 2 ‖MN‖_op ≤ 2 ‖M‖_op ‖N‖_op.

Thus R = o(‖y‖) follows from lim_{y→0} ‖N‖_op = 0. Using the estimate ‖N‖_op ≤ (Σ_{i,j} n_{ij}²)^{1/2}, it suffices to show lim_{y→0} n_{ij} = 0 for all i, j, where n_{ij} is the (i, j)-entry of N:

n_{ij} = ∫₀¹ ( (DΦ̃)_{ij}(G̃(y_1)) − (DΦ̃)_{ij}(G̃(y_1) + t(G̃(y_1 + y) − G̃(y_1))) ) dt.

Since Φ̃ is C¹, for ε > 0 there is some δ_1 such that

|(DΦ̃)_{ij}(G̃(y_1)) − (DΦ̃)_{ij}(z)| < ε, whenever ‖z − G̃(y_1)‖ < δ_1.

For this δ_1, there exists some δ such that ‖G̃(y_1) − G̃(y_1 + y)‖ < δ_1 whenever ‖y‖ < δ. It follows that for all y with ‖y‖ < δ and all i, j,

|(DΦ̃)_{ij}(G̃(y_1)) − (DΦ̃)_{ij}(G̃(y_1) + t(G̃(y_1 + y) − G̃(y_1)))| < ε

(note that here z = G̃(y_1) + t(G̃(y_1 + y) − G̃(y_1)) satisfies ‖z − G̃(y_1)‖ ≤ t ‖G̃(y_1 + y) − G̃(y_1)‖ < δ_1). Therefore |n_{ij}| < ε for all i, j and ‖y‖ < δ, and we are done.)

Example 3.8. The Inverse Function Theorem asserts local invertibility. Even when the linearization is non-singular everywhere, we cannot assert global invertibility. Let us consider the switch between Cartesian and polar coordinates in the plane: x = r cos θ, y = r sin θ. The map F : (0, ∞) × (−∞, ∞) → R² given by F(r, θ) = (x, y) is continuously differentiable with everywhere non-singular Jacobian matrix (its determinant equals r > 0). However, F is clearly not bijective: for instance, all the points (r, θ + 2nπ), n ∈ Z, have the same image under F.
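A small computation makes the point of Example 3.8 concrete:

```python
# Example 3.8 in code: F(r, theta) = (r cos theta, r sin theta) has
# Jacobian determinant r (non-singular for r > 0), yet it is not
# injective: theta and theta + 2*pi give the same point.
import math

def F(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def jac_det(r, theta):
    # det [[cos, -r sin], [sin, r cos]] = r (cos^2 + sin^2) = r
    return (math.cos(theta) * (r * math.cos(theta))
            - (-r * math.sin(theta)) * math.sin(theta))

r, th = 2.0, 0.7
assert abs(jac_det(r, th) - r) < 1e-12             # linearization invertible
p1, p2 = F(r, th), F(r, th + 2 * math.pi)
assert all(abs(a - b) < 1e-12 for a, b in zip(p1, p2))   # but F not injective
```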

Example 3.9. An exceptional case is dimension one, where a global result is available. Indeed, in 2060 we learned that if f is continuously differentiable on (a, b) with non-vanishing f′, it is either strictly increasing or strictly decreasing, so that its global inverse exists and is again continuously differentiable.

Example 3.10. Consider the map F : R² → R² given by F(x, y) = (x², y). Its Jacobian matrix is singular at (0, 0). In fact, for any point (a, b), a > 0, F(±√a, b) = (a, b). We cannot find any open set at (0, 0), no matter how small it is, on which F is injective. On the other hand, the map H(x, y) = (x³, y) is bijective with inverse given by J(x, y) = (x^{1/3}, y). However, as the non-degeneracy condition does not hold at (0, 0), the inverse is not differentiable there. In both cases the Jacobian matrix is singular, so the non-degeneracy condition does not hold.

The Inverse Function Theorem may be rephrased in the following form. A C^k-map F between open sets V and W is a C^k-diffeomorphism if F^{-1} exists and is also C^k. Let f_1, f_2, …, f_n be C^k-functions defined in some open set in R^n such that the Jacobian matrix of the map F = (f_1, …, f_n) is non-singular at some point x_0 in this open set. By Theorem 4.1, F is a C^k-diffeomorphism between some open sets V and W containing x_0 and F(x_0) respectively. To every function Φ defined in W, there corresponds a function defined in V given by Ψ(x) = Φ(F(x)), and conversely. Thus every C^k-diffeomorphism gives rise to a local change of coordinates.

Next we deduce the Implicit Function Theorem from the Inverse Function Theorem.

Theorem 3.9 (Implicit Function Theorem). Consider a C¹-map F : U → R^m, where U is an open set in R^n × R^m. Suppose that (x_0, y_0) ∈ U satisfies F(x_0, y_0) = 0 and D_yF(x_0, y_0) is invertible in R^m. Then there exist an open set V_1 × V_2 in U containing (x_0, y_0) and a C¹-map ϕ : V_1 → V_2, ϕ(x_0) = y_0, such that

F(x, ϕ(x)) = 0, ∀x ∈ V_1.

The map ϕ belongs to C^k when F is C^k, 1 ≤ k ≤ ∞, in U.
Moreover, assume further that D_yF is invertible in V_1 × V_2. If ψ is a C¹-map from V_1 to V_2 satisfying F(x, ψ(x)) = 0, then ψ coincides with ϕ.

The notation D_yF(x_0, y_0) stands for the linear map associated to the Jacobian matrix (∂F_i/∂y_j(x_0, y_0)), i, j = 1, …, m, where x is fixed. In general, a version of the Implicit Function Theorem holds when the rank of DF at a point is m. In this case, we can rearrange the independent variables to make D_yF non-singular at that point.

Proof. Consider Φ : U → R^n × R^m given by

Φ(x, y) = (x, F(x, y)).

It is evident that DΦ(x, y) is invertible in R^n × R^m when D_yF(x, y) is invertible in R^m. By the Inverse Function Theorem, there exists a C¹-inverse Ψ = (Ψ_1, Ψ_2) from some open W in R^n × R^m containing Φ(x_0, y_0) to an open subset of U. By restricting W further we may assume Ψ(W) is of the form V_1 × V_2. For every (x, z) ∈ W, we have

Φ(Ψ_1(x, z), Ψ_2(x, z)) = (x, z),

which, in view of the definition of Φ, yields

Ψ_1(x, z) = x, and F(Ψ_1(x, z), Ψ_2(x, z)) = z.

In other words, F(x, Ψ_2(x, z)) = z holds. In particular, taking z = 0 gives

F(x, Ψ_2(x, 0)) = 0, x ∈ V_1,

so the function ϕ(x) ≡ Ψ_2(x, 0) satisfies our requirement.

For uniqueness, we note that by assumption the matrix ∫_0^1 D_yF(x, y_1 + t(y_2 − y_1)) dt is invertible for (x, y_1), (x, y_2) ∈ V_1 × V_2. Now, suppose ψ is a C¹-map defined near x_0 satisfying F(x, ψ(x)) = 0. We have

0 = F(x, ψ(x)) − F(x, ϕ(x)) = ∫_0^1 D_yF(x, ϕ(x) + t(ψ(x) − ϕ(x))) dt (ψ(x) − ϕ(x)),

for all x in the common open set where both are defined. This identity forces ψ to coincide with ϕ in V_1. The proof of the Implicit Function Theorem is completed, once we observe that the regularity of ϕ follows from the Inverse Function Theorem.

Example 3.11. We illustrate the condition det D_yF(x_0, y_0) ≠ 0, or the equivalent condition rank DF(p_0) = m, p_0 ∈ R^{n+m}, in the following three cases.

First, consider the function F_1(x, y) = x − y² + 3. We have F_1(−3, 0) = 0 and F_{1x}(−3, 0) = 1. By the Implicit Function Theorem, the zero set of F_1 can be described near (−3, 0) by a function x = ϕ(y) near y = 0. Indeed, by solving the equation F_1(x, y) = 0, ϕ(y) = y² − 3. On the other hand, F_{1y}(−3, 0) = 0, and from the formula y = ±√(x + 3) we see that the zero set is not a graph over any open interval containing −3.

Next we consider the function F_2(x, y) = x² − y² at (0, 0). We have F_{2x}(0, 0) = F_{2y}(0, 0) = 0. Indeed, the zero set of F_2 consists of the two straight lines x = y and

x = −y intersecting at the origin. It is impossible to express it as the graph of a single function near the origin.

Finally, consider the function F_3(x, y) = x² + y² at (0, 0). We have F_{3x}(0, 0) = F_{3y}(0, 0) = 0. Indeed, the zero set of F_3 degenerates into a single point {(0, 0)}, which cannot be the graph of any function.

It is interesting to note that the Inverse Function Theorem can in turn be deduced from the Implicit Function Theorem; thus the two theorems are equivalent. To see this, keep the notations used in Theorem 4.5. Define a map F̃ : U × R^n → R^n by F̃(x, y) = F(x) − y. Then F̃(x_0, y_0) = 0, y_0 = F(x_0), and D_xF̃(x_0, y_0) is invertible. By Theorem 4.8, there exists a C¹-function ϕ defined near y_0 satisfying ϕ(y_0) = x_0 and F̃(ϕ(y), y) = F(ϕ(y)) − y = 0; hence ϕ is the local inverse of F.

3.4 Picard-Lindelöf Theorem for Differential Equations

In this section we discuss the fundamental existence and uniqueness theorem for differential equations. I assume that you have learned the skills of solving ordinary differential equations already, so we will focus on the theoretical aspects. Most differential equations cannot be solved explicitly; in other words, their solutions cannot be expressed as compositions of elementary functions. Nevertheless, there are two exceptional classes which come up very often. Let us review them before going into the theory.

Example 3.12. Consider the equation

dx/dt = a(t)x + b(t),

where a and b are continuous functions defined on some open interval I. This differential equation is called a linear differential equation because it is linear in x (with coefficients which are functions of t). The general solution of this linear equation is given by the formula

x(t) = e^{α(t)} ( x_0 + ∫_{t_0}^t e^{−α(s)} b(s) ds ), α(t) = ∫_{t_0}^t a(s) ds,

where t_0 ∈ I and x_0 ∈ R are arbitrary.
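As a sanity check of this solution formula, here is a small sketch of my own. With the assumed data a(t) = 2, b(t) = 3, t_0 = 0, x_0 = 1, we get α(t) = 2t and x(t) = e^{2t}(1 + (3/2)(1 − e^{−2t})); the code verifies x′ = ax + b by a centered finite difference.

```python
import math

def x(t):
    # Closed form from the formula with a = 2, b = 3, t0 = 0, x0 = 1.
    return math.exp(2 * t) * (1 + 1.5 * (1 - math.exp(-2 * t)))

# Verify x'(t) = 2 x(t) + 3 at several times via finite differences.
h = 1e-6
for t in [0.0, 0.3, 1.0]:
    dx = (x(t + h) - x(t - h)) / (2 * h)
    assert abs(dx - (2 * x(t) + 3)) < 1e-4

# The initial condition also holds.
assert abs(x(0.0) - 1.0) < 1e-12
```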

Example 3.13. The second class is the so-called separable equation

dx/dt = f(t)/g(x),

where f and g are continuous functions on intervals I and J respectively. The solution can be obtained by an integration:

∫_{x_0}^x g(z) dz = ∫_{t_0}^t f(s) ds, t_0 ∈ I, x_0 ∈ J.

The resulting relation, written as G(x) = F(t), can be converted formally into x = G^{-1} ∘ F(t), a solution to the equation, as is immediately verified by the chain rule. For instance, consider the equation

dx/dt = (t + 3)/x.

The solution is given by integrating

∫_{x_0}^x z dz = ∫_{t_0}^t (s + 3) ds

to get

x² = t² + 6t + c, c ∈ R.

We have

x(t) = ±√(t² + 6t + c).

When x(0) = 2 is specified, we find the constant c = 4, so the solution is given by

x(t) = √(t² + 6t + 4).

More interesting explicitly solvable equations can be found in texts on ODEs.

Well, let us consider the general situation. Numerous problems in natural sciences and engineering lead to the initial value problem for differential equations. Let f be a function defined in the rectangle R = [t_0 − a, t_0 + a] × [x_0 − b, x_0 + b], where (t_0, x_0) ∈ R² and a, b > 0. We consider the initial value problem, or Cauchy problem,

dx/dt = f(t, x), x(t_0) = x_0. (IVP)

(In some books the independent variable t is replaced by x and the dependent variable x is replaced by y. We prefer to use t as the independent variable because in many cases it is the time.) To solve the initial value problem means to find a function x(t) defined on a perhaps smaller rectangle, that is, x : [t_0 − a′, t_0 + a′] → [x_0 − b, x_0 + b], which

is differentiable and satisfies x(t_0) = x_0 and

x′(t) = f(t, x(t)), t ∈ [t_0 − a′, t_0 + a′],

for some 0 < a′ ≤ a. In general, no matter how nice f is, we cannot expect a solution to exist on the entire interval [t_0 − a, t_0 + a]. Let us look at the following example.

Example 3.14. Consider the initial value problem

dx/dt = 1 + x², x(0) = 0.

The function f(t, x) = 1 + x² is smooth on [−a, a] × [−b, b] for every a, b > 0. However, the solution, as one can verify immediately, is given by x(t) = tan t, which is only defined on (−π/2, π/2). This shows that even when f is very nice, a′ could be strictly less than a.

The Picard-Lindelöf Theorem, sometimes referred to as the fundamental theorem of existence and uniqueness of differential equations, gives a clean condition on f ensuring the unique solvability of the initial value problem (IVP). This condition imposes a further regularity requirement on f, reminiscent of what we did in the convergence of Fourier series. Specifically, a function f defined in R satisfies the Lipschitz condition (uniform in t) if there exists some L > 0 such that

|f(t, x_1) − f(t, x_2)| ≤ L|x_1 − x_2|, ∀(t, x_i) ∈ R = [t_0 − a, t_0 + a] × [x_0 − b, x_0 + b], i = 1, 2.

Note that this in particular means that for each fixed t, f is Lipschitz continuous in x. The constant L is called a Lipschitz constant. Obviously, if L is a Lipschitz constant for f, any number greater than L is also a Lipschitz constant. Not all continuous functions satisfy the Lipschitz condition. An example is given by the continuous function f(t, x) = tx^{1/2}. I let you verify that it does not satisfy the Lipschitz condition on any rectangle containing the origin. In applications, most functions satisfying the Lipschitz condition arise in the following manner: a C¹-function f(t, x) on a closed rectangle automatically satisfies the Lipschitz condition. For, by the mean value theorem, for some z lying on the segment between x_1 and x_2,

f(t, x_2) − f(t, x_1) = (∂f/∂x)(t, z)(x_2 − x_1).
Letting

L = max{ |(∂f/∂x)(t, x)| : (t, x) ∈ R }

(L is a finite number because ∂f/∂x is continuous on R and hence bounded), we have

|f(t, x_2) − f(t, x_1)| ≤ L|x_2 − x_1|, ∀(t, x_i) ∈ R, i = 1, 2.
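This estimate is easy to test on a grid. In the sketch below (my own example, not from the notes) we take f(t, x) = sin(t) x² on R = [−1, 1] × [−2, 2], where |∂f/∂x| = |2 sin(t) x| ≤ 4 sin 1, and check the Lipschitz inequality on sample points.

```python
import math

def f(t, x):
    return math.sin(t) * x * x

# Lipschitz constant from the maximum of |df/dx| = |2 sin(t) x| over R.
L = 2 * math.sin(1.0) * 2.0

ts = [-1 + 0.1 * i for i in range(21)]
xs = [-2 + 0.2 * i for i in range(21)]
for t in ts:
    for x1 in xs:
        for x2 in xs:
            # Small slack absorbs floating-point rounding on the grid.
            assert abs(f(t, x1) - f(t, x2)) <= L * abs(x1 - x2) + 1e-12
```

The inequality also follows by hand here: |f(t, x_1) − f(t, x_2)| = |sin t| |x_1 + x_2| |x_1 − x_2| ≤ 4 sin(1) |x_1 − x_2|.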

Theorem 3.10 (Picard-Lindelöf Theorem). Consider (IVP) where f ∈ C(R) satisfies the Lipschitz condition on R = [t_0 − a, t_0 + a] × [x_0 − b, x_0 + b]. There exist a′ ∈ (0, a) and x ∈ C¹[t_0 − a′, t_0 + a′], with x_0 − b ≤ x(t) ≤ x_0 + b for all t ∈ [t_0 − a′, t_0 + a′], solving (IVP). Furthermore, x is the unique solution on [t_0 − a′, t_0 + a′].

From the proof one will see that a′ can be taken to be any number satisfying

0 < a′ < min{ a, b/M, 1/L },

where M = sup{ |f(t, x)| : (t, x) ∈ R }.

To prove the Picard-Lindelöf Theorem, we first convert (IVP) into a single integral equation.

Proposition 3.11. In the setting of the Picard-Lindelöf Theorem, every solution x of (IVP) from [t_0 − a′, t_0 + a′] to [x_0 − b, x_0 + b] satisfies the equation

x(t) = x_0 + ∫_{t_0}^t f(s, x(s)) ds. (3.7)

Conversely, every continuous function x(t), t ∈ [t_0 − a′, t_0 + a′], satisfying (3.7) is continuously differentiable and solves (IVP).

Proof. When x satisfies x′(t) = f(t, x(t)) and x(t_0) = x_0, (3.7) is a direct consequence of the Fundamental Theorem of Calculus (first form). Conversely, when x(t) is continuous on [t_0 − a′, t_0 + a′], f(t, x(t)) is also continuous on the same interval. By the Fundamental Theorem of Calculus (second form), the right hand side of (3.7) is continuously differentiable in t, so x is continuously differentiable and solves (IVP).

Note that in this proposition we do not need the Lipschitz condition; only the continuity of f is needed.

Proof of the Picard-Lindelöf Theorem. Instead of solving (IVP) directly, we look for a solution of (3.7). We will work on the metric space

X = { ϕ ∈ C[t_0 − a′, t_0 + a′] : ϕ(t) ∈ [x_0 − b, x_0 + b], ϕ(t_0) = x_0 },

with the uniform metric. It is easily verified that X is a closed subset of the complete metric space C[t_0 − a′, t_0 + a′] and hence complete. (Recall that every closed subset of a complete metric space is complete.) The number a′ will be specified below. We are going to define a contraction on X. Indeed, for x ∈ X, define T by

(Tx)(t) = x_0 + ∫_{t_0}^t f(s, x(s)) ds.
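As an aside, this operator T is exactly the classical Picard iteration, and it can be carried out numerically. The sketch below is my own discretization (not part of the proof): for x′ = 1 + x², x(0) = 0 on [0, 0.5], each pass applies (Tx)(t) = x_0 + ∫_0^t f(s, x(s)) ds with the integral approximated by the trapezoidal rule, and the iterates converge to the exact solution tan t.

```python
import math

def picard(f, t0, x0, a_prime, n_grid=2000, n_iter=30):
    """Iterate the Picard operator T on a uniform grid over [t0, t0 + a_prime]."""
    h = a_prime / n_grid
    ts = [t0 + i * h for i in range(n_grid + 1)]
    xs = [x0] * (n_grid + 1)          # start from the constant function x0
    for _ in range(n_iter):
        new = [x0]
        acc = 0.0                     # running trapezoidal integral of f(s, x(s))
        for i in range(1, n_grid + 1):
            acc += 0.5 * h * (f(ts[i - 1], xs[i - 1]) + f(ts[i], xs[i]))
            new.append(x0 + acc)
        xs = new                      # xs <- T xs
    return ts, xs

ts, xs = picard(lambda t, x: 1 + x * x, 0.0, 0.0, 0.5)
assert max(abs(x - math.tan(t)) for t, x in zip(ts, xs)) < 1e-4
```

On [0, 0.5] the contraction factor γ = a′L is comfortably below 1, which is why a modest number of iterations suffices.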

First of all, for every x ∈ X, it is clear that f(t, x(t)) is well-defined and Tx ∈ C[t_0 − a′, t_0 + a′]. To show that Tx lies in X, we need to verify x_0 − b ≤ (Tx)(t) ≤ x_0 + b for all t ∈ [t_0 − a′, t_0 + a′]. We claim that this holds if we choose a′ satisfying a′ ≤ b/M, M = sup{ |f(t, x)| : (t, x) ∈ R }. For,

|(Tx)(t) − x_0| = | ∫_{t_0}^t f(s, x(s)) ds | ≤ M|t − t_0| ≤ Ma′ ≤ b.

Next, we claim that T is a contraction on X when a′ is further restricted to a′ < 1/L, where L is the Lipschitz constant for f. For,

|(Tx_2 − Tx_1)(t)| = | ∫_{t_0}^t ( f(s, x_2(s)) − f(s, x_1(s)) ) ds | ≤ | ∫_{t_0}^t |f(s, x_2(s)) − f(s, x_1(s))| ds | ≤ L | ∫_{t_0}^t |x_2(s) − x_1(s)| ds | ≤ L sup_{s ∈ I} |x_2(s) − x_1(s)| |t − t_0| ≤ La′ ‖x_2 − x_1‖_∞,

where I = [t_0 − a′, t_0 + a′]. It follows that

‖Tx_2 − Tx_1‖_∞ ≤ γ ‖x_2 − x_1‖_∞, γ = a′L < 1.

Now we can apply the Contraction Mapping Principle to conclude that Tx = x for some x ∈ X, and this x solves (IVP). We have shown that (IVP) admits a solution on [t_0 − a′, t_0 + a′], where a′ can be chosen to be any number less than min{a, b/M, 1/L}.

We point out that the existence part of the Picard-Lindelöf Theorem still holds without the Lipschitz condition. We will prove this in the next chapter. However, the solution may then fail to be unique. In the exercise you are asked to extend the local solution obtained in this theorem to a maximal one.

The Picard-Lindelöf Theorem remains valid for systems of differential equations. Consider the system

dx_j/dt = f_j(t, x_1, x_2, …, x_N), x_j(t_0) = x_{0j}, j = 1, 2, …, N.

By setting x = (x_1, x_2, …, x_N) and f = (f_1, f_2, …, f_N), we can express it in the form (IVP), but now both x and f are vectors. Essentially following the same arguments as in the case of a single equation, we have

Theorem 3.12 (Picard-Lindelöf Theorem for Systems). Consider (IVP) where f = (f_1, …, f_N), f_j ∈ C(R), satisfies the Lipschitz condition

|f(t, x) − f(t, y)| ≤ L|x − y|, for all (t, x), (t, y) ∈ R = [t_0 − a, t_0 + a] × [x_{01} − b, x_{01} + b] × ⋯ × [x_{0N} − b, x_{0N} + b].

There exists a unique solution x ∈ C¹[t_0 − a′, t_0 + a′], x(t) ∈ [x_{01} − b, x_{01} + b] × ⋯ × [x_{0N} − b, x_{0N} + b], to (IVP), where

0 < a′ < min{ a, b/M, 1/L }, M = sup{ |f_j(t, x)| : (t, x) ∈ R, j = 1, …, N }.

Finally, let me remind you that there is a standard way to convert the initial value problem for a higher order differential equation (m ≥ 2),

x^{(m)} = f(t, x, x′, …, x^{(m−1)}), x(t_0) = x_0, x′(t_0) = x_1, …, x^{(m−1)}(t_0) = x_{m−1},

into a system of first order differential equations. As a result, we also have a corresponding Picard-Lindelöf theorem for higher order differential equations. I will let you formulate this result.

3.5 Appendix: Completion of a Metric Space

A metric space (X, d) is called isometrically embedded in (Y, ρ) if there is a mapping Φ : X → Y such that d(x, y) = ρ(Φ(x), Φ(y)) for all x, y ∈ X. Note that Φ must be one-to-one and continuous.

We call the metric space (Y, ρ) a completion of (X, d) if (Y, ρ) is complete, (X, d) is isometrically embedded in (Y, ρ), and the closure of Φ(X) equals Y. The latter condition is a minimality condition: (X, d) is enlarged merely to accommodate those ideal points needed to make the space complete. When X is isometrically embedded in Y, we may identify X with its image Φ(X) and d with ρ. Or, we can imagine X being enlarged to a larger set Y, where d is also extended to some ρ on Y which makes Y complete.

Before the proof of the Completion Theorem we briefly describe the idea. When (X, d) is not complete, we need to invent ideal points and add them to X to make it complete. The idea goes back to Cantor's construction of the real numbers from the rational numbers. Suppose now we have only rational numbers and we want to add the irrationals. First we identify Q with a proper subset of a larger set as follows. Let C be the collection of all Cauchy sequences of rational numbers. Every point in C is of the form (x_1, x_2, …), where {x_n}, x_n ∈ Q, forms a Cauchy sequence. A rational number x is identified with the constant sequence (x, x, x, …), or with any Cauchy sequence of rationals which converges to x. For instance, 1 is identified with (1, 1, 1, …), (0.9, 0.99, 0.999, …), or (1.1, 1.01, 1.001, …). Clearly, there are Cauchy sequences which cannot be identified with rational numbers. For instance, there is no rational number corresponding to (3, 3.1, 3.14, 3.141, 3.1415, …); as we know, its correspondent should be the irrational number π. A similar situation holds for the sequence (1, 1.4, 1.41, 1.414, …), which should correspond to √2. Since the correspondence is not injective, we make it into one by introducing an equivalence relation ∼ on C. Indeed, {x_n} and {y_n} are said to be equivalent if x_n − y_n → 0 as n → ∞. The equivalence relation forms the quotient C/∼, which is denoted by C̃. Then x ↦ x̃ sends Q injectively into C̃. It can be shown that C̃ carries the structure of the real numbers.
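Cantor's idea can be illustrated with exact rational arithmetic. In the sketch below (my own illustration), Newton's iteration x_{n+1} = (x_n + 2/x_n)/2 stays entirely within Q, yet it produces a Cauchy sequence of rationals whose "ideal" limit is the irrational √2.

```python
from fractions import Fraction

# Newton's iteration for sqrt(2), carried out in exact rational arithmetic.
x = Fraction(2)
seq = [x]
for _ in range(6):
    x = (x + 2 / x) / 2
    seq.append(x)

# Consecutive terms become extremely close in the metric d(p, q) = |p - q| ...
assert abs(seq[-1] - seq[-2]) < Fraction(1, 10**10)
# ... while no term ever equals sqrt(2), since each term is rational.
assert all(t * t != 2 for t in seq)
```

The sequence begins 2, 3/2, 17/12, 577/408, …; its equivalence class in C̃ is precisely the ideal point √2.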
In particular, those points not in the image of Q are exactly the irrational numbers. Now, for a metric space the situation is similar. We let X̃ be the quotient space of all Cauchy sequences in X under the relation {x_n} ∼ {y_n} if and only if d(x_n, y_n) → 0. Define

d̃(x̃, ỹ) = lim_{n→∞} d(x_n, y_n), for x ∈ x̃, y ∈ ỹ.

We have the embedding (X, d) → (X̃, d̃), and we can further show that (X̃, d̃) is a completion of (X, d). The following proof is for optional reading. In the exercise we will present a simpler but less instructive proof.

Proof of Theorem 4.2. Let C be the collection of all Cauchy sequences in (X, d). We introduce a relation ∼ on C by x ∼ y if and only if d(x_n, y_n) → 0 as n → ∞. It is routine to verify that ∼ is an equivalence relation on C. Let X̃ = C/∼ and define a map d̃ : X̃ × X̃ → [0, ∞) by

d̃(x̃, ỹ) = lim_{n→∞} d(x_n, y_n),

where x = (x_1, x_2, x_3, …) and y = (y_1, y_2, y_3, …) are respective representatives of x̃ and ỹ. We note that the limit in the definition always exists: for

d(x_n, y_n) ≤ d(x_n, x_m) + d(x_m, y_m) + d(y_m, y_n)

and, after switching m and n,

|d(x_n, y_n) − d(x_m, y_m)| ≤ d(x_n, x_m) + d(y_m, y_n).

As x and y are Cauchy sequences, d(x_n, x_m) → 0 and d(y_m, y_n) → 0 as n, m → ∞, and so {d(x_n, y_n)} is a Cauchy sequence of real numbers.

Step 1 (well-definedness of d̃). To show that d̃(x̃, ỹ) is independent of the choice of representatives, let x ∼ x′ and y ∼ y′. We have

d(x_n, y_n) ≤ d(x_n, x′_n) + d(x′_n, y′_n) + d(y′_n, y_n).

After switching x and x′, and y and y′,

|d(x_n, y_n) − d(x′_n, y′_n)| ≤ d(x_n, x′_n) + d(y_n, y′_n).

As x ∼ x′ and y ∼ y′, the right hand side of this inequality tends to 0 as n → ∞. Hence lim_{n→∞} d(x_n, y_n) = lim_{n→∞} d(x′_n, y′_n).

Step 2 (d̃ is a metric). Let {x_n}, {y_n} and {z_n} represent x̃, ỹ and z̃ respectively. We have

d̃(x̃, z̃) = lim_{n→∞} d(x_n, z_n) ≤ lim_{n→∞} ( d(x_n, y_n) + d(y_n, z_n) ) = lim_{n→∞} d(x_n, y_n) + lim_{n→∞} d(y_n, z_n) = d̃(x̃, ỹ) + d̃(ỹ, z̃).

Step 3. We claim that there is a metric preserving map Φ : X → X̃ such that the closure of Φ(X) is X̃. Given any x in X, the constant sequence (x, x, x, …) is clearly a Cauchy sequence. Let x̃ be its equivalence class in X̃. Then Φx = x̃ defines a map from X to X̃. Clearly

d̃(Φ(x), Φ(y)) = lim_{n→∞} d(x_n, y_n) = d(x, y),

since x_n = x and y_n = y for all n, so Φ is metric preserving, and in particular injective. To show that the closure of Φ(X) is X̃, we observe that any x̃ in X̃ is represented by a Cauchy sequence x = (x_1, x_2, x_3, …). Consider the constant sequences x̃_n = (x_n, x_n, x_n, …) in Φ(X). We have d̃(x̃, x̃_n) = lim_{m→∞} d(x_m, x_n). Given ε > 0, there exists an n_0 such that d(x_m, x_n) < ε/2 for all m, n ≥ n_0. Hence d̃(x̃, x̃_n) = lim_{m→∞} d(x_m, x_n) ≤ ε/2 < ε for n ≥ n_0. That is, x̃_n → x̃ as n → ∞, so the closure of Φ(X) is precisely X̃.

Step 4. We claim that (X̃, d̃) is a complete metric space. Let {x̃_n} be a Cauchy sequence in X̃. As the closure of Φ(X) is X̃, for each n we can find a ỹ_n in Φ(X) such that

d̃(x̃_n, ỹ_n) < 1/n.

So {ỹ_n} is also a Cauchy sequence in d̃. Let y_n be the point in X such that ỹ_n is represented by the constant sequence (y_n, y_n, y_n, …). Since Φ is metric preserving and {ỹ_n} is a Cauchy sequence in d̃, {y_n} is a Cauchy sequence in X. Let ỹ be the equivalence class of (y_1, y_2, y_3, …) in X̃. We claim that ỹ = lim_{n→∞} x̃_n in X̃. For, we have

d̃(x̃_n, ỹ) ≤ d̃(x̃_n, ỹ_n) + d̃(ỹ_n, ỹ) ≤ 1/n + lim_{m→∞} d(y_n, y_m) → 0 as n → ∞.

We have shown that d̃ is a complete metric on X̃.

The completion of a metric space is unique once we have clarified the meaning of uniqueness. Indeed, call two metric spaces (X, d) and (X′, d′) isometric if there exists a bijective isometric embedding from (X, d) onto (X′, d′). Since a metric preserving map is always one-to-one, the inverse of this mapping exists and is a metric preserving mapping from (X′, d′) to (X, d). So two spaces are isometric provided there is a metric preserving map from one onto the other. Two metric spaces will be regarded as the same if they are isometric, since then they cannot be distinguished after identifying each point in X with its image in X′ under the metric preserving mapping. With this understanding, the completion of a metric space is unique in the following sense: if (Y, ρ) and (Y′, ρ′) are two completions of (X, d), then (Y, ρ) and (Y′, ρ′) are isometric. We will not go into the proof of this fact, but instead leave it to the interested reader. In any case, it now makes sense to speak of the completion of X rather than a completion of X.

Comments on Chapter 3. There are two popular constructions of the real number system: Dedekind cuts and Cantor's Cauchy sequences. Although the number system is fundamental in mathematics, we did not pay much attention to its rigorous construction. It is too dry and lengthy to be included in Mathematical Analysis I.
Indeed, there are two sophisticated steps in the construction of the real numbers from nothing, namely, the construction of the natural numbers by Peano's axioms and the construction of the real numbers from the rational numbers. The other steps are much easier. Cantor's construction of the irrationals from the rationals is adapted to construct the completion of a metric space in Theorem 3.2. You may google the key words "Peano's axioms", "Cantor's construction of the real numbers", and "Dedekind cuts" for more.

The Contraction Mapping Principle, or Banach Fixed Point Theorem, was found by the Polish mathematician S. Banach (1892-1945) in his 1922 doctoral thesis. He is the founder of functional analysis and operator theory. According to P. Lax, "During the Second World War, Banach was one of a group of people whose bodies were used by the Nazi occupiers of Poland to breed lice, in an attempt to extract an anti-typhoid serum. He died shortly after the conclusion of the war." The interested reader should look up his biography at Wiki.

An equally famous fixed point theorem is Brouwer's Fixed Point Theorem. It states that every continuous map from a closed ball in R^n to itself admits at least one fixed point. Here it is not the map but the geometry, or more precisely, the topology of the ball that matters. You will learn it in a course on topology.

The Inverse and Implicit Function Theorems, which reduce complicated structures to simpler ones via linearization, are among the most frequently used tools in the study of the local behavior of maps. We learned these theorems and some of their applications in Advanced Calculus I already. In view of this, we provide detailed proofs here but leave out many standard applications. You may look up Fitzpatrick, Advanced Calculus, to refresh your memory. By the way, the proof in this book does not use the Contraction Mapping Principle. I know a third proof besides these two.

The Picard-Lindelöf Theorem, or the fundamental existence and uniqueness theorem of differential equations, was mentioned in Ordinary Differential Equations, and now its proof is discussed in detail. Of course, the contributors also include Cauchy and Lipschitz. Further results without the Lipschitz condition can be found in Chapter 4. A classic text on ordinary differential equations is Theory of Ordinary Differential Equations by E.A. Coddington and N. Levinson. V.I. Arnold's Ordinary Differential Equations is also a popular text.