Chapter 3. Differentiable Mappings. 1. Differentiable Mappings

Let $V$ and $W$ be two linear spaces over $\mathbb{R}$. A mapping $L$ from $V$ to $W$ is called a linear mapping if $L(u+v) = Lu + Lv$ for all $u, v \in V$ and $L(\lambda v) = \lambda (Lv)$ for all $\lambda \in \mathbb{R}$ and $v \in V$.

If $L$ is a linear mapping from $V$ to $W$, then for any $c \in \mathbb{R}$, $cL$ is a linear mapping. Moreover, if $L$ and $M$ are two linear mappings from $V$ to $W$, then $L + M$ is a linear mapping. Let $U$, $V$ and $W$ be linear spaces. If $M$ is a linear mapping from $U$ to $V$ and $L$ is a linear mapping from $V$ to $W$, then the composite mapping $L \circ M$ is a linear mapping from $U$ to $W$.

Now assume that $V = \mathbb{R}^k$ and $W = \mathbb{R}^m$, where $k$ and $m$ are positive integers. If $L$ is a linear mapping from $\mathbb{R}^k$ to $\mathbb{R}^m$, then there exists a unique matrix $B = (b_{ij})_{1 \le i \le m,\, 1 \le j \le k}$ such that
\[
L(x) = Bx =
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1k} \\
b_{21} & b_{22} & \cdots & b_{2k} \\
\vdots & \vdots & & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mk}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix},
\qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} \in \mathbb{R}^k .
\]
The matrix $B$ is the matrix representation of the linear mapping $L$ with respect to the standard bases. We call $B$ the standard matrix of $L$. Often we will use $B$ to denote the linear mapping $x \mapsto Bx$, $x \in \mathbb{R}^k$.

Suppose that $B$ and $C$ are the standard matrices of linear mappings $L$ and $M$ from $\mathbb{R}^k$ to $\mathbb{R}^m$, respectively. Then $B + C$ is the standard matrix of $L + M$. Moreover, if $c$ is a real number, then $cB$ is the standard matrix of $cL$. Furthermore, if $C$ is the standard matrix of a linear mapping $M$ from $\mathbb{R}^d$ to $\mathbb{R}^k$, and if $B$ is the standard matrix of a linear mapping $L$ from $\mathbb{R}^k$ to $\mathbb{R}^m$, then $BC$ is the standard matrix of the composite linear mapping $L \circ M$ from $\mathbb{R}^d$ to $\mathbb{R}^m$.

The norm of a linear mapping $L$ from $\mathbb{R}^k$ to $\mathbb{R}^m$ is defined by
\[
\|L\| := \sup\{ \|L(x)\| : x \in \mathbb{R}^k,\ \|x\| \le 1 \}.
\]
Thus, $\|L\| \ge 0$. It is easily seen that $\|L\| = 0$ if and only if $L(x) = 0$ for all $x \in \mathbb{R}^k$. Moreover, for any real number $c$, $\|cL\| = |c|\,\|L\|$. If $M$ is also a linear mapping from $\mathbb{R}^k$ to $\mathbb{R}^m$, then $\|L + M\| \le \|L\| + \|M\|$.
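The statement that $BC$ is the standard matrix of $L \circ M$ can be checked on a small example. The following Python sketch is only an illustration: the helper names `matvec` and `matmul` and the sample matrices are hypothetical choices, not part of these notes.

```python
# Illustrative sketch: the standard matrix of the composite L∘M is the product BC.
# The helper names and the sample matrices are hypothetical examples.

def matvec(A, x):
    """Apply the matrix A (a list of rows) to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    """Return the matrix product AB for matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# M maps R^2 to R^3 (standard matrix C); L maps R^3 to R^2 (standard matrix B).
C = [[1, 2], [0, 1], [3, -1]]
B = [[2, 0, 1], [1, -1, 4]]
x = [5, -2]

composed = matvec(B, matvec(C, x))   # L(M(x)), computed in two steps
product = matvec(matmul(B, C), x)    # (BC)x, computed with the product matrix
print(composed, product)             # both are [19, 71]
```

Integer entries are used so that the two results can be compared exactly.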

Let $U$ be a nonempty open subset of $\mathbb{R}^k$. A mapping $f$ from $U$ to $\mathbb{R}^m$ is said to be differentiable at a point $a$ in $U$ if there exists a linear mapping $L$ from $\mathbb{R}^k$ to $\mathbb{R}^m$ such that
\[
\lim_{x \to a} \frac{\|f(x) - f(a) - L(x - a)\|}{\|x - a\|} = 0.
\]
The linear mapping $L$ satisfying the above condition is unique. This linear mapping is denoted by $df_a$ and is called the differential of $f$ at $a$. If $f$ is differentiable at every point in $U$, then we call $f$ a differentiable mapping from $U$ to $\mathbb{R}^m$.

Theorem 1.1. Let $f$ be a mapping from an open set $U$ in $\mathbb{R}^k$ to $\mathbb{R}^m$, and let $a$ be a point in $U$. Suppose that $f(x) = (f_1(x), f_2(x), \dots, f_m(x))$ for $x \in U$, where $f_1, f_2, \dots, f_m$ are real-valued functions on $U$. Then $f$ is differentiable at $a$ if and only if $f_1, f_2, \dots, f_m$ are differentiable at $a$. If this is the case, then the standard matrix of the differential $df_a$ is
\[
\begin{bmatrix}
D_1 f_1(a) & D_2 f_1(a) & \cdots & D_k f_1(a) \\
D_1 f_2(a) & D_2 f_2(a) & \cdots & D_k f_2(a) \\
\vdots & \vdots & & \vdots \\
D_1 f_m(a) & D_2 f_m(a) & \cdots & D_k f_m(a)
\end{bmatrix}.
\]

Proof. Suppose that $f$ is differentiable at $a$, write $L := df_a$, and let $B = (b_{ij})_{1 \le i \le m,\, 1 \le j \le k}$ be the standard matrix of $L$. For each $i \in \{1, \dots, m\}$, let $v_i$ be the $i$th row vector $(b_{i1}, \dots, b_{ik}) \in \mathbb{R}^k$. We have
\[
|f_i(x) - f_i(a) - \langle v_i, x - a \rangle| \le \|f(x) - f(a) - L(x - a)\|.
\]
It follows that
\[
\lim_{x \to a} \frac{|f_i(x) - f_i(a) - \langle v_i, x - a \rangle|}{\|x - a\|} = 0.
\]
Hence, for each $i \in \{1, \dots, m\}$, $f_i$ is differentiable at $a$ and $b_{ij} = D_j f_i(a)$ for $1 \le i \le m$ and $1 \le j \le k$.

Conversely, suppose that $f_i$ is differentiable at $a$ for each $i \in \{1, \dots, m\}$. For each $i$, there exists a vector $v_i \in \mathbb{R}^k$ such that
\[
\lim_{x \to a} \frac{|f_i(x) - f_i(a) - \langle v_i, x - a \rangle|}{\|x - a\|} = 0.
\]
Let $B$ be the $m \times k$ matrix with $v_1, v_2, \dots, v_m$ as its rows, and let $L$ be the corresponding linear mapping from $\mathbb{R}^k$ to $\mathbb{R}^m$. Then we have
\[
\|f(x) - f(a) - L(x - a)\| \le \sum_{i=1}^{m} |f_i(x) - f_i(a) - \langle v_i, x - a \rangle|.
\]

Consequently,
\[
\lim_{x \to a} \frac{\|f(x) - f(a) - L(x - a)\|}{\|x - a\|} = 0.
\]
This shows that the mapping $f$ is differentiable at $a$.

We define the Jacobian matrix of $f$ at $a$ to be
\[
Df(a) := \bigl( D_j f_i(a) \bigr)_{1 \le i \le m,\, 1 \le j \le k}.
\]

An application of the mean value theorem gives the following useful result for differentiable mappings.

Theorem 1.2. Let $f = (f_1, \dots, f_m)$ be a differentiable mapping from an open set $U$ in $\mathbb{R}^k$ to $\mathbb{R}^m$. Let $a$ and $b$ be two distinct points in $U$ such that the closed line segment $[a, b]$ is contained in $U$. If $K$ is a real number such that $\|Df(x)\| \le K$ for all $x$ in the open line segment $(a, b)$, then
\[
\|f(b) - f(a)\| \le K \|b - a\|.
\]

Proof. The theorem is obviously true if $f(b) = f(a)$. In what follows we assume that $f(b) \ne f(a)$. For a vector $v = (v_1, \dots, v_m) \in \mathbb{R}^m$ we define
\[
h(x) := \langle v, f(x) \rangle = v_1 f_1(x) + \cdots + v_m f_m(x), \quad x \in U.
\]
Then $h$ is a differentiable function on $U$. By the mean value theorem (Theorem 4.1 in Chapter 2), there exists some $x \in (a, b)$ such that
\[
h(b) - h(a) = \langle \nabla h(x), b - a \rangle = v\,[Df(x)]\,(b - a),
\]
where $v$ is regarded as a $1 \times m$ matrix, $Df(x)$ is an $m \times k$ matrix, and $b - a$ is regarded as a $k \times 1$ matrix. Since $\|Df(x)\| \le K$, it follows that
\[
\langle v, f(b) - f(a) \rangle \le \|v\|\,\|Df(x)\|\,\|b - a\| \le K \|v\|\,\|b - a\|.
\]
Choosing $v := [f(b) - f(a)] / \|f(b) - f(a)\|$ in the above inequalities, we obtain $\|v\| = 1$ and $\langle v, f(b) - f(a) \rangle = \|f(b) - f(a)\|$. Therefore, $\|f(b) - f(a)\| \le K \|b - a\|$.

The following theorem gives the chain rule for the composition of two differentiable mappings.

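Theorem 1.3. Let $f$ be a mapping from an open set $U$ in $\mathbb{R}^k$ to $\mathbb{R}^m$, and let $g$ be a mapping from an open set $V$ in $\mathbb{R}^m$ to $\mathbb{R}^n$. Suppose that $a \in U$ and $f(U) \subseteq V$. If $f$ is differentiable at $a$, and if $g$ is differentiable at $b := f(a)$, then the composite mapping $g \circ f$ is differentiable at $a$ and
\[
d(g \circ f)_a = dg_{f(a)} \circ df_a.
\]
Consequently, $D(g \circ f)(a) = Dg(f(a))\,Df(a)$.

Proof. We write $S$ for $dg_b$ and $T$ for $df_a$. Let $\varepsilon > 0$ be given. Since $g$ is differentiable at $b = f(a)$, there exists $r > 0$ such that $y \in B_r(b)$ implies $y \in V$ and
\[
\|g(y) - g(b) - S(y - b)\| \le \varepsilon \|y - b\|.
\]
Since $f$ is differentiable at $a$, there exists $\delta > 0$ such that $x \in B_\delta(a)$ implies $x \in U$, $f(x) \in B_r(b)$ and
\[
\|f(x) - f(a) - T(x - a)\| \le \varepsilon \|x - a\|.
\]
In what follows we assume that $x \in B_\delta(a)$. Then we have
\[
\|f(x) - f(a)\| \le \|f(x) - f(a) - T(x - a)\| + \|T(x - a)\| \le (\varepsilon + \|T\|)\,\|x - a\|.
\]
Moreover,
\[
\|S(f(x) - f(a)) - S \circ T(x - a)\| \le \|S\|\,\|f(x) - f(a) - T(x - a)\| \le \varepsilon \|S\|\,\|x - a\|.
\]
Since $f(x)$ lies in $B_r(b)$, we have
\[
\|g(f(x)) - g(f(a)) - S(f(x) - f(a))\| \le \varepsilon \|f(x) - f(a)\|.
\]
By using the triangle inequality, we derive from the above inequalities that
\[
\|g \circ f(x) - g \circ f(a) - S \circ T(x - a)\| \le \varepsilon \|S\|\,\|x - a\| + \varepsilon \|f(x) - f(a)\| \le \varepsilon \bigl( \|S\| + \varepsilon + \|T\| \bigr) \|x - a\|.
\]
This shows that $g \circ f$ is differentiable at $a$ and $d(g \circ f)_a = S \circ T$. In other words, $d(g \circ f)_a = dg_{f(a)} \circ df_a$.

In the above theorem, the mapping $f$ can be represented as
\[
y_s = f_s(x_1, \dots, x_k), \quad s = 1, \dots, m, \quad (x_1, \dots, x_k) \in U,
\]
and the mapping $g$ can be represented as
\[
z_i = g_i(y_1, \dots, y_m), \quad i = 1, \dots, n, \quad (y_1, \dots, y_m) \in V.
\]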

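If we use the traditional notation, then the chain rule can be expressed as
\[
\frac{\partial z_i}{\partial x_j} = \sum_{s=1}^{m} \frac{\partial z_i}{\partial y_s} \frac{\partial y_s}{\partial x_j}, \quad i = 1, \dots, n, \quad j = 1, \dots, k.
\]

Example. Let $f$ be the mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ given by
\[
u = \rho \cos\theta, \quad v = \rho \sin\theta, \quad (\rho, \theta) \in \mathbb{R}^2,
\]
and let $g$ be the mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ given by
\[
x = u^2 - v^2, \quad y = 2uv, \quad (u, v) \in \mathbb{R}^2.
\]
We have
\[
Df(\rho, \theta) = \begin{bmatrix} \cos\theta & -\rho \sin\theta \\ \sin\theta & \rho \cos\theta \end{bmatrix}
\quad \text{and} \quad
Dg(u, v) = \begin{bmatrix} 2u & -2v \\ 2v & 2u \end{bmatrix}.
\]
By the chain rule we obtain
\[
D(g \circ f)(\rho, \theta) = Dg(\rho \cos\theta, \rho \sin\theta)\, Df(\rho, \theta).
\]
Consequently,
\[
D(g \circ f)(\rho, \theta) =
\begin{bmatrix} 2\rho \cos\theta & -2\rho \sin\theta \\ 2\rho \sin\theta & 2\rho \cos\theta \end{bmatrix}
\begin{bmatrix} \cos\theta & -\rho \sin\theta \\ \sin\theta & \rho \cos\theta \end{bmatrix}
=
\begin{bmatrix} 2\rho \cos(2\theta) & -2\rho^2 \sin(2\theta) \\ 2\rho \sin(2\theta) & 2\rho^2 \cos(2\theta) \end{bmatrix}.
\]

2. The Jacobian Determinant

Let $f$ be a differentiable mapping from an open set $U$ in $\mathbb{R}^k$ to $\mathbb{R}^k$. For a point $a \in U$, the Jacobian determinant of $f$ at $a$ is defined to be
\[
J_f(a) := \det(Df(a)) = \det \bigl( D_j f_i(a) \bigr)_{1 \le i, j \le k}.
\]

Example 1. Let $f$ be the mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ given by
\[
u = \rho \cos\theta, \quad v = \rho \sin\theta, \quad (\rho, \theta) \in \mathbb{R}^2.
\]
Then
\[
J_f(\rho, \theta) = \begin{vmatrix} \cos\theta & -\rho \sin\theta \\ \sin\theta & \rho \cos\theta \end{vmatrix} = \rho.
\]

Example 2. Let $g$ be the mapping from $\mathbb{R}^3$ to $\mathbb{R}^3$ given by
\[
x = \rho \cos\theta \sin\varphi, \quad y = \rho \sin\theta \sin\varphi, \quad z = \rho \cos\varphi, \quad (\rho, \theta, \varphi) \in \mathbb{R}^3.
\]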

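Then
\[
J_g(\rho, \theta, \varphi) =
\begin{vmatrix}
\cos\theta \sin\varphi & -\rho \sin\theta \sin\varphi & \rho \cos\theta \cos\varphi \\
\sin\theta \sin\varphi & \rho \cos\theta \sin\varphi & \rho \sin\theta \cos\varphi \\
\cos\varphi & 0 & -\rho \sin\varphi
\end{vmatrix}
= -\rho^2 \sin\varphi.
\]

We are in a position to review basic properties of determinants. Let $A = (a_{ij})_{1 \le i, j \le n}$ be an $n \times n$ matrix of real numbers. If $n = 1$, we define $\det(a_{11}) := a_{11}$. Suppose that $n > 1$. For a fixed pair $(i, j)$ ($1 \le i, j \le n$) we use $A_{ij}$ to denote the $(n-1) \times (n-1)$ matrix obtained by deleting the $i$th row and the $j$th column from $A$. We define
\[
\det A := \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det A_{1j}.
\]
In particular, if $A = (a_{ij})_{1 \le i, j \le 2}$ is a $2 \times 2$ matrix, then
\[
\det A = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} := a_{11} a_{22} - a_{12} a_{21}.
\]
If $A = (a_{ij})_{1 \le i, j \le 3}$ is a $3 \times 3$ matrix, then
\[
\det A = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
:= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.
\]

For an $n \times n$ matrix $A = (a_{ij})_{1 \le i, j \le n}$ we use $A_i$ to denote its $i$th column ($i = 1, \dots, n$). Thus $A$ can be written as $[A_1, A_2, \dots, A_n]$. By using an induction argument we can easily verify the following properties of determinants.

(d1) If $A$ is the identity matrix, that is, $a_{ij} = 1$ for $i = j$ and $a_{ij} = 0$ for $i \ne j$, then $\det A = 1$.

(d2) The determinant of a matrix $A$ is a multilinear function of its columns. More precisely, if the $i$th column $A_i$ is equal to a sum of two column vectors $A_i'$ and $A_i''$, then
\[
\det[A_1, \dots, A_{i-1}, A_i' + A_i'', A_{i+1}, \dots, A_n]
= \det[A_1, \dots, A_{i-1}, A_i', A_{i+1}, \dots, A_n] + \det[A_1, \dots, A_{i-1}, A_i'', A_{i+1}, \dots, A_n].
\]
Furthermore, if $c \in \mathbb{R}$, then
\[
\det[A_1, \dots, A_{i-1}, cA_i, A_{i+1}, \dots, A_n] = c \det[A_1, \dots, A_{i-1}, A_i, A_{i+1}, \dots, A_n].
\]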

(d3) If two adjacent columns of a matrix $A$ are equal, i.e., if $A_i = A_{i+1}$ for some $i$ in $\{1, \dots, n-1\}$, then $\det A = 0$.

The above three conditions (d1), (d2), and (d3) characterize the properties of determinants. In other words, all the properties of determinants can be derived from (d1), (d2), and (d3). Let us derive the following property:

(d4) If two columns of a matrix are interchanged, then its determinant changes by a sign.

We establish this property first when two adjacent columns $A_i$ and $A_{i+1}$ are interchanged. By (d3) we have
\[
\det[\dots, A_i + A_{i+1}, A_i + A_{i+1}, \dots] = 0.
\]
Applying the multilinear property (d2) to the above determinant, we obtain
\[
\det[\dots, A_i, A_i, \dots] + \det[\dots, A_i, A_{i+1}, \dots] + \det[\dots, A_{i+1}, A_i, \dots] + \det[\dots, A_{i+1}, A_{i+1}, \dots] = 0.
\]
By (d3) we have $\det[\dots, A_i, A_i, \dots] = 0$ and $\det[\dots, A_{i+1}, A_{i+1}, \dots] = 0$. Hence,
\[
\det[\dots, A_{i+1}, A_i, \dots] = -\det[\dots, A_i, A_{i+1}, \dots].
\]

Now we strengthen property (d3) as follows:

(d5) If two columns of a matrix $A$ are equal, then $\det A = 0$.

Assume that two columns of the matrix $A$ are equal. We can change the matrix by a successive interchange of adjacent columns until we obtain a matrix $A'$ with equal adjacent columns. By what has been proved for (d4) we have $\det A' = \det A$ or $\det A' = -\det A$. But $\det A' = 0$ by (d3). Hence $\det A = 0$.

We can now finish the proof of (d4). Suppose that the $i$th column and the $j$th column are interchanged, where $1 \le i < j \le n$. By (d5) we have
\[
\det[\dots, A_i + A_j, \dots, A_i + A_j, \dots] = 0.
\]
Expanding the above determinant as before, we obtain
\[
\det[\dots, A_j, \dots, A_i, \dots] = -\det[\dots, A_i, \dots, A_j, \dots].
\]

The following property is also useful.

(d6) If one adds a scalar multiple of one column to another column, then the value of the determinant does not change.

Suppose that the $i$th column $A_i$ of a matrix $A$ is replaced by $A_i + cA_j$, where $j \ne i$ and $c \in \mathbb{R}$. By (d2) we have
\[
\det[\dots, A_{i-1}, A_i + cA_j, A_{i+1}, \dots] = \det[\dots, A_{i-1}, A_i, A_{i+1}, \dots] + c \det[\dots, A_{i-1}, A_j, A_{i+1}, \dots].
\]
There are two determinants on the right of the above equality. The first determinant is just $\det A$, and the second determinant is equal to 0 by (d5), since two of its columns are equal. This verifies (d6).

For $c \in \mathbb{R}$ and $i \in \{1, \dots, n\}$, let $Q_i(c)$ be the matrix obtained from the $n \times n$ identity matrix $I$ by multiplying its $i$th column by $c$. For $1 \le i \ne j \le n$, we use $P_{ij}$ to denote the matrix obtained from $I$ by interchanging the $i$th column and the $j$th column. For $\alpha \in \mathbb{R}$ and $1 \le i \ne j \le n$, we use $R_{ij}(\alpha)$ to denote the matrix obtained from $I$ by adding the $\alpha$ multiple of the $j$th column to the $i$th column. A square matrix is called an elementary matrix if it has one of the forms $P_{ij}$, $Q_i(c)$, or $R_{ij}(\alpha)$.

Theorem 2.1. If $A$ and $B$ are two square matrices of the same size, then
\[
\det(AB) = (\det A)(\det B) \quad \text{and} \quad \det(A^T) = \det A.
\]

Proof. Let $A$ and $B$ be two $n \times n$ matrices. Then $B$ can be written as a product of elementary matrices. Hence, in order to prove $\det(AB) = (\det A)(\det B)$, it suffices to show that $\det(AE) = (\det A)(\det E)$ for each elementary matrix $E$. If $E = P_{ij}$, then $AP_{ij}$ is the matrix obtained by interchanging the $i$th column and the $j$th column of $A$; hence
\[
\det(AP_{ij}) = -\det A = (\det A)(\det P_{ij}).
\]
If $E = Q_i(c)$, then $AQ_i(c)$ is the matrix obtained from $A$ by multiplying its $i$th column by $c$; hence
\[
\det(AQ_i(c)) = c \det A = (\det A)(\det Q_i(c)).
\]
If $E = R_{ij}(\alpha)$, then $AR_{ij}(\alpha)$ is the matrix obtained from $A$ by adding the $\alpha$ multiple of the $j$th column to the $i$th column; hence
\[
\det(AR_{ij}(\alpha)) = \det A = (\det A)(\det R_{ij}(\alpha)).
\]
This completes the proof of $\det(AB) = (\det A)(\det B)$.

Let us show that $\det E^T = \det E$ for any elementary matrix $E$. Indeed, $P_{ij}^T = P_{ij}$ and $Q_i(c)^T = Q_i(c)$. Moreover, $R_{ij}(\alpha)^T = R_{ji}(\alpha)$. Hence, $\det R_{ij}(\alpha)^T = 1 = \det R_{ij}(\alpha)$. The matrix $A$ can be written as $A = E_1 \cdots E_k$, where $E_1, \dots, E_k$ are elementary matrices. We have
\[
\det A^T = \det(E_k^T \cdots E_1^T) = (\det E_k^T) \cdots (\det E_1^T) = (\det E_k) \cdots (\det E_1) = \det A.
\]
This completes the proof of $\det A^T = \det A$.

An $n \times n$ matrix $A$ is said to be invertible if there exists an $n \times n$ matrix $B$ such that $AB = BA = I$. Such a matrix $B$ is uniquely determined by $A$. This matrix $B$ is called the inverse of $A$ and will be denoted by $A^{-1}$.

Theorem 2.2. A square matrix $A$ is invertible if and only if $\det A \ne 0$.

Proof. Let $A$ be an $n \times n$ matrix. If $A$ is invertible, then there exists an $n \times n$ matrix $B$ such that $AB = I$. It follows that
\[
1 = \det I = \det(AB) = (\det A)(\det B).
\]
This shows that $\det A \ne 0$.

If $E$ is an elementary matrix and $\det E \ne 0$, then $E$ is invertible. Indeed, $P_{ij}$ is invertible since $P_{ij} P_{ij} = I$. Moreover, $R_{ij}(\alpha)$ is invertible, because $R_{ij}(\alpha) R_{ij}(-\alpha) = I$. If $E = Q_i(c)$ and $\det E \ne 0$, then $c = \det E \ne 0$. In this case $Q_i(c)$ is invertible, since $Q_i(1/c) Q_i(c) = Q_i(c) Q_i(1/c) = I$.

Now suppose that $\det A \ne 0$. We write $A$ as $A = E_1 \cdots E_k$, where $E_1, \dots, E_k$ are elementary matrices. Since $\det A = \det(E_1) \cdots \det(E_k)$ and $\det A \ne 0$, we have $\det E_j \ne 0$ for each $j \in \{1, \dots, k\}$. By what has been proved, each $E_j$ is invertible. Consequently,
\[
(E_k^{-1} \cdots E_1^{-1}) A = A (E_k^{-1} \cdots E_1^{-1}) = I.
\]
This shows that $A$ is invertible.

3. The Inverse Function Theorem

The main theorem of this section establishes sufficient conditions for the existence of a local inverse of a continuously differentiable mapping.

Theorem 3.1. Let $U$ be an open set in $\mathbb{R}^k$ and let $f = (f_1, \dots, f_k)$ be a continuously differentiable mapping from $U$ to $\mathbb{R}^k$. Suppose that $a$ is a point in $U$ such that $J_f(a) \ne 0$. Then there exist an open set $U_1$ with $a \in U_1 \subseteq U$ and an open set $V_1$ with $f(a) \in V_1 \subseteq f(U)$ such that $f$ is a one-to-one mapping from $U_1$ onto $V_1$. Moreover, the inverse mapping $g$ of $f|_{U_1}$ is continuously differentiable on $V_1$.

Proof. Let $S$ denote the Jacobian matrix $Df(a) = (D_j f_i(a))_{1 \le i, j \le k}$. Since the Jacobian determinant $J_f(a) \ne 0$, the matrix $S$ is invertible. Let $T := S^{-1}$. For given $y \in \mathbb{R}^k$, consider the mapping $h$ from $U$ to $\mathbb{R}^k$ defined by
\[
h(x) := x - T(f(x) - y), \quad x \in U.
\]
If there exists $x^* \in U$ such that $h(x^*) = x^*$, then $T(f(x^*) - y) = 0$. Since $T$ is invertible, it follows that $f(x^*) = y$. Thus, the problem of solving the equation $f(x) = y$ is reduced to the problem of finding a fixed point of the mapping $h$.

We observe that
\[
Dh(x) = I - T(Df(x)), \quad x \in U.
\]
In particular, $Dh(a) = I - T(Df(a)) = I - TS = 0$, where 0 stands for the $k \times k$ matrix with all entries being 0. Since $f$ is continuously differentiable on $U$, so is $h$. Hence, there exists some $r > 0$ such that $B_r(a) \subseteq U$ and $\|Dh(x)\| < 1/2$ for all $x \in B_r(a)$. Consequently, the matrix $I - Dh(x)$ is invertible. Thus, the Jacobian matrix $Df(x) = T^{-1}(I - Dh(x))$ is invertible for all $x \in B_r(a)$. Moreover, by Theorem 1.2 we have
\[
\|h(x') - h(x'')\| \le \tfrac{1}{2} \|x' - x''\| \quad \text{for all } x', x'' \in B_r(a).
\]
It follows that
\[
\|T[f(x') - f(x'')]\| = \|[x' - x''] - [h(x') - h(x'')]\| \ge \|x' - x''\| - \|h(x') - h(x'')\| \ge \tfrac{1}{2} \|x' - x''\|.
\]
Therefore,
\[
\|f(x') - f(x'')\| \ge \frac{1}{2\|T\|} \|x' - x''\| \quad \text{for all } x', x'' \in B_r(a).
\]
In particular, $f|_{B_r(a)}$ is one-to-one.

Let $\delta := r/(2\|T\|)$ and $V_1 := B_\delta(b)$, where $b := f(a)$. Let $y \in V_1$. Our goal is to find $x \in U$ such that $f(x) = y$. For this purpose, we use the following iteration scheme: $x^{(0)} := a$ and $x^{(k+1)} := h(x^{(k)})$ for $k = 0, 1, 2, \dots$. We shall use mathematical induction to prove that the statement
\[
P_k: \quad x^{(k+1)} \in B_r(a) \quad \text{and} \quad \|x^{(k+1)} - x^{(k)}\| < \frac{r}{2^{k+1}}
\]
is true for all $k \in \mathbb{N}_0$. For $k = 0$ we have $x^{(1)} - x^{(0)} = -T(f(x^{(0)}) - y)$. It follows that
\[
\|x^{(1)} - a\| = \|x^{(1)} - x^{(0)}\| \le \|T\|\,\|y - b\| < \|T\| \delta = \frac{r}{2}.
\]
This verifies $P_0$. Suppose that $k > 0$ and $P_j$ is true for all $j < k$. By the induction hypothesis, $x^{(k)}, x^{(k-1)} \in B_r(a)$. Hence,
\[
\|x^{(k+1)} - x^{(k)}\| = \|h(x^{(k)}) - h(x^{(k-1)})\| \le \frac{1}{2} \|x^{(k)} - x^{(k-1)}\| < \frac{1}{2} \cdot \frac{r}{2^k} = \frac{r}{2^{k+1}}.
\]

Moreover,
\[
\|x^{(k+1)} - a\| = \|x^{(k+1)} - x^{(0)}\| = \Bigl\| \sum_{j=0}^{k} (x^{(j+1)} - x^{(j)}) \Bigr\| \le \sum_{j=0}^{k} \|x^{(j+1)} - x^{(j)}\| < \sum_{j=0}^{k} \frac{r}{2^{j+1}} < r.
\]
Thus, $x^{(k+1)} \in B_r(a)$. This verifies $P_k$ and thereby completes the induction procedure.

Since $\|x^{(k+1)} - x^{(k)}\| < r/2^{k+1}$ for all $k \in \mathbb{N}_0$, the sequence $(x^{(k)})_{k=0,1,\dots}$ converges to some $x^*$ in $\mathbb{R}^k$. Letting $k \to \infty$ on both sides of the equation $x^{(k+1)} = h(x^{(k)})$, we obtain $x^* = h(x^*)$. Therefore, $f(x^*) = y$. Furthermore,
\[
\|x^* - a\| = \lim_{k \to \infty} \|x^{(k+1)} - a\| \le \sum_{j=0}^{\infty} \|x^{(j+1)} - x^{(j)}\| = \|x^{(1)} - x^{(0)}\| + \sum_{j=1}^{\infty} \|x^{(j+1)} - x^{(j)}\| < r.
\]

Let $U_1 := B_r(a) \cap f^{-1}(V_1)$. Then $U_1$ is an open set and $a \in U_1 \subseteq U$. By what has been proved, for any $y \in V_1$, there exists a unique $x^* \in B_r(a)$ such that $f(x^*) = y$. Clearly, $x^* \in B_r(a) \cap f^{-1}(V_1) = U_1$. This shows that $f$ is a one-to-one mapping from $U_1$ onto $V_1$. Moreover, $f(a) = b \in V_1 = f(U_1) \subseteq f(U)$.

Let $g = (g_1, \dots, g_k)$ be the inverse mapping of $f|_{U_1}$. For $v \in V_1$, we wish to show that $g$ is differentiable at $v$. Let $y \in V_1$, $x := g(y) \in U_1$ and $u := g(v) \in U_1$. Then $y = f(x)$ and $v = f(u)$. Let $S_u$ denote the Jacobian matrix $Df(u)$. Then $S_u$ is invertible. We have
\[
g(y) - g(v) - S_u^{-1}(y - v) = -S_u^{-1} \bigl[ f(x) - f(u) - S_u(x - u) \bigr].
\]
It follows that
\[
\|g(y) - g(v) - S_u^{-1}(y - v)\| \le \|S_u^{-1}\|\,\|f(x) - f(u) - S_u(x - u)\|.
\]
Note that $\|y - v\| = \|f(x) - f(u)\| \ge \|x - u\| / (2\|T\|)$; in particular, $x \to u$ as $y \to v$. Since $f$ is differentiable at $u$, we have
\[
\lim_{x \to u} \frac{\|f(x) - f(u) - S_u(x - u)\|}{\|x - u\|} = 0.
\]
Consequently,
\[
\lim_{y \to v} \frac{\|g(y) - g(v) - S_u^{-1}(y - v)\|}{\|y - v\|} = 0.
\]
Therefore, $g$ is differentiable at $v$, and $Dg(v) = S_u^{-1} = (Df(u))^{-1}$. Since $Df$ is continuous on $U_1$, we conclude that $Dg$ is continuous on $V_1$.

Example. Let $f = (f_1, f_2)$ be the mapping from $\mathbb{R}^2$ to $\mathbb{R}^2$ given by
\[
f_1(x_1, x_2) := x_1^2 - x_2^2, \quad f_2(x_1, x_2) := 2 x_1 x_2, \quad (x_1, x_2) \in \mathbb{R}^2.
\]

Given $(y_1, y_2) \in \mathbb{R}^2$, we wish to solve the system of equations
\[
f_1(x_1, x_2) = y_1, \quad f_2(x_1, x_2) = y_2.
\]
We have $y_1^2 = (x_1^2 - x_2^2)^2$ and $y_2^2 = 4 x_1^2 x_2^2$. It follows that $y_1^2 + y_2^2 = (x_1^2 + x_2^2)^2$. Hence, $x_1^2 + x_2^2 = \sqrt{y_1^2 + y_2^2}$. Thus, for $(y_1, y_2) = (0, 0)$, the only solution is $(x_1, x_2) = (0, 0)$. For $(y_1, y_2) \ne (0, 0)$, we derive from $x_1^2 + x_2^2 = \sqrt{y_1^2 + y_2^2}$ and $x_1^2 - x_2^2 = y_1$ that
\[
(x_1, x_2) = \pm \left( \sqrt{\Bigl[ y_1 + \sqrt{y_1^2 + y_2^2} \Bigr] / 2},\ \sqrt{\Bigl[ -y_1 + \sqrt{y_1^2 + y_2^2} \Bigr] / 2} \right) \quad \text{for } y_2 \ge 0
\]
or
\[
(x_1, x_2) = \pm \left( \sqrt{\Bigl[ y_1 + \sqrt{y_1^2 + y_2^2} \Bigr] / 2},\ -\sqrt{\Bigl[ -y_1 + \sqrt{y_1^2 + y_2^2} \Bigr] / 2} \right) \quad \text{for } y_2 < 0.
\]
Consequently, $f$ maps $\mathbb{R}^2 \setminus \{(0, 0)\}$ two-to-one onto $\mathbb{R}^2 \setminus \{(0, 0)\}$.

Let us compute the Jacobian of $f$. We have
\[
J_f(x_1, x_2) = \begin{vmatrix} D_1 f_1(x_1, x_2) & D_2 f_1(x_1, x_2) \\ D_1 f_2(x_1, x_2) & D_2 f_2(x_1, x_2) \end{vmatrix}
= \begin{vmatrix} 2x_1 & -2x_2 \\ 2x_2 & 2x_1 \end{vmatrix} = 4(x_1^2 + x_2^2).
\]
By Theorem 3.1, for $(a_1, a_2) \ne (0, 0)$, there exist an open set $U_1$ in $\mathbb{R}^2$ containing $(a_1, a_2)$ and an open set $V_1$ in $\mathbb{R}^2$ containing $f(a_1, a_2)$ such that $f$ is a one-to-one mapping from $U_1$ onto $V_1$. Indeed, if we choose $r := \sqrt{a_1^2 + a_2^2} > 0$, then $f|_{B_r(a_1, a_2)}$ is one-to-one. Let $g = (g_1, g_2)$ be the inverse of $f|_{B_r(a_1, a_2)}$. By Theorem 3.1 and the chain rule we have
\[
Dg(y_1, y_2) = \frac{1}{2(x_1^2 + x_2^2)} \begin{bmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{bmatrix},
\]
where $(y_1, y_2) = f(x_1, x_2)$ for $(x_1, x_2) \in B_r(a_1, a_2)$.

In the above example, the inverse mapping could be found in an explicit form. This is not possible in general. But the Inverse Mapping Theorem is still applicable. It gives us a powerful tool to analyze mappings and curvilinear coordinates.

4. The Implicit Function Theorem

Theorem 4.1. Let $f = (f_1, \dots, f_m)$ be a continuously differentiable mapping from an open set $U$ in $\mathbb{R}^{k+m}$ to $\mathbb{R}^m$. Each $f_i$ ($i = 1, \dots, m$) is a function of $(x_1, \dots, x_k, y_1, \dots, y_m)$. Suppose that $(a, b) = (a_1, \dots, a_k, b_1, \dots, b_m)$ is a point in $U$ such that $f_i(a, b) = 0$ for $i = 1, \dots, m$. If
\[
\det \left( \frac{\partial f_i}{\partial y_j}(a, b) \right)_{1 \le i, j \le m} \ne 0,
\]

then there exist an open set $V$ in $\mathbb{R}^k$ containing $a = (a_1, \dots, a_k)$ and a continuously differentiable mapping $g = (g_1, \dots, g_m)$ from $V$ to $\mathbb{R}^m$ such that $g(a) = b = (b_1, \dots, b_m)$ and that for $i = 1, \dots, m$,
\[
f_i \bigl( x_1, \dots, x_k, g_1(x_1, \dots, x_k), \dots, g_m(x_1, \dots, x_k) \bigr) = 0 \quad \text{for all } (x_1, \dots, x_k) \in V.
\]

Proof. Let $F = (F_1, \dots, F_k, F_{k+1}, \dots, F_{k+m})$ be the mapping from $U$ to $\mathbb{R}^{k+m}$ given by $F_i(x, y) = x_i$ for $i = 1, \dots, k$ and $F_{k+j}(x, y) = f_j(x, y)$ for $j = 1, \dots, m$, where $x := (x_1, \dots, x_k)$ and $y := (y_1, \dots, y_m)$. Clearly, $F(a, b) = (a_1, \dots, a_k, 0, \dots, 0)$. The Jacobian matrix of $F$ at $(a, b)$ is
\[
\begin{bmatrix} I & 0 \\ S & T \end{bmatrix},
\]
where $I$ is the $k \times k$ identity matrix, 0 is the $k \times m$ zero matrix,
\[
S = \left( \frac{\partial f_i}{\partial x_j}(a, b) \right)_{1 \le i \le m,\, 1 \le j \le k}
\quad \text{and} \quad
T = \left( \frac{\partial f_i}{\partial y_j}(a, b) \right)_{1 \le i \le m,\, 1 \le j \le m}.
\]
By our assumption, $\det T \ne 0$. Hence, the Jacobian determinant $J_F(a, b) = \det T \ne 0$. By the Inverse Function Theorem, there exist an open set $U_1$ in $\mathbb{R}^{k+m}$ with $(a, b) \in U_1 \subseteq U$ and an open set $V_1$ in $\mathbb{R}^{k+m}$ with $F(a, b) \in V_1 \subseteq F(U)$ such that $F$ is a one-to-one mapping from $U_1$ onto $V_1$. Let $G := (G_1, G_2, \dots, G_{k+m})$ be the inverse mapping of $F|_{U_1}$. Then $G$ is continuously differentiable on $V_1$.

Set
\[
V := \{ (x_1, \dots, x_k) \in \mathbb{R}^k : (x_1, \dots, x_k, 0, \dots, 0) \in V_1 \}.
\]
Then $V$ is an open set in $\mathbb{R}^k$. Moreover, since $(a_1, \dots, a_k, 0, \dots, 0) = F(a, b) \in V_1$, we have $a = (a_1, \dots, a_k) \in V$. For $j = 1, \dots, m$, let
\[
g_j(x_1, \dots, x_k) := G_{k+j}(x_1, \dots, x_k, 0, \dots, 0), \quad (x_1, \dots, x_k) \in V.
\]
Then $g := (g_1, \dots, g_m)$ is a continuously differentiable mapping from $V$ to $\mathbb{R}^m$. Since $F(a, b) = (a_1, \dots, a_k, 0, \dots, 0) = (a, 0)$, we have $G(a, 0) = (a, b)$. In light of the definition of $g$ we see that $b = g(a)$. Moreover, since $F \circ G$ is the identity mapping on $V_1$, we have $F(G(x, 0)) = (x, 0)$ for all $x = (x_1, \dots, x_k) \in V$. Consequently, we obtain $G_i(x, 0) = x_i$ for $i = 1, \dots, k$ and $F_{k+j}(G(x, 0)) = 0$ for $j = 1, \dots, m$. Therefore, for $j = 1, \dots, m$ we have
\[
f_j \bigl( x_1, \dots, x_k, g_1(x_1, \dots, x_k), \dots, g_m(x_1, \dots, x_k) \bigr) = 0 \quad \text{for all } (x_1, \dots, x_k) \in V.
\]
This completes the proof of the theorem.
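The theorem asserts that $g$ exists but gives no formula for it. As a rough numerical illustration, the Python sketch below approximates $g$ for the hypothetical equation $f(x, y) = x^2 + y^2 - 1 = 0$ near $(a, b) = (0, 1)$, where $\partial f / \partial y = 2y \ne 0$. It uses Newton's method in the variable $y$, which is a swapped-in technique and not the fixed-point iteration underlying the proof; the function names and the example equation are assumptions made for illustration only.

```python
# Illustrative sketch (not from the notes): approximate the implicit function g
# with f(x, g(x)) = 0 by Newton's method in the y-variable.
import math

def f(x, y):
    return x * x + y * y - 1.0       # hypothetical example: the unit circle

def df_dy(x, y):
    return 2.0 * y                    # nonzero near (a, b) = (0, 1)

def implicit_g(x, y0=1.0, steps=50, tol=1e-14):
    """Solve f(x, y) = 0 for y, starting the iteration from y0 = b."""
    y = y0
    for _ in range(steps):
        step = f(x, y) / df_dy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

# Compare with the explicit solution g(x) = sqrt(1 - x^2), valid for |x| < 1.
for x in (0.0, 0.3, -0.5):
    print(x, implicit_g(x), math.sqrt(1.0 - x * x))
```

In this example the explicit solution is known, so the approximation can be compared against it directly. In general only the numerical values would be available.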

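Let
\[
Z := \{ (x_1, \dots, x_k, y_1, \dots, y_m) \in U_1 : f_j(x_1, \dots, x_k, y_1, \dots, y_m) = 0 \text{ for } j = 1, \dots, m \}.
\]
Let $\phi$ be the mapping given by
\[
\phi(x_1, \dots, x_k) := \bigl( x_1, \dots, x_k, g_1(x_1, \dots, x_k), \dots, g_m(x_1, \dots, x_k) \bigr), \quad (x_1, \dots, x_k) \in V.
\]
From the above proof we see that $\phi$ is a one-to-one mapping from $V$ onto $Z$.

Example 1. Let $S_1$ and $S_2$ be two surfaces in $\mathbb{R}^3$ represented by
\[
S_1 := \{ (x, y, z) \in \mathbb{R}^3 : x^2 (y^2 + z^2) = 5 \} \quad \text{and} \quad S_2 := \{ (x, y, z) \in \mathbb{R}^3 : (x - z)^2 + y^2 = 2 \}.
\]
Clearly, $(1, 1, 2) \in S_1 \cap S_2$. Let $F$ and $G$ be the functions given by
\[
F(x, y, z) := x^2 (y^2 + z^2) - 5 \quad \text{and} \quad G(x, y, z) := (x - z)^2 + y^2 - 2, \quad (x, y, z) \in \mathbb{R}^3.
\]
At the point $(1, 1, 2)$ we have
\[
\begin{vmatrix} \dfrac{\partial F}{\partial y} & \dfrac{\partial F}{\partial z} \\[2mm] \dfrac{\partial G}{\partial y} & \dfrac{\partial G}{\partial z} \end{vmatrix}
= \begin{vmatrix} 2x^2 y & 2x^2 z \\ 2y & -2(x - z) \end{vmatrix}
= \begin{vmatrix} 2 & 4 \\ 2 & 2 \end{vmatrix} = -4 \ne 0.
\]
By Theorem 4.1 we can find an interval $V$ in $\mathbb{R}$ containing 1, an open set $U$ in $\mathbb{R}^3$ containing $(1, 1, 2)$, and functions $f$ and $g$ from $V$ to $\mathbb{R}$ such that $f(1) = 1$, $g(1) = 2$, and the mapping $x \mapsto (x, f(x), g(x))$ maps $V$ one-to-one onto $U \cap (S_1 \cap S_2)$. In particular,
\[
F(x, f(x), g(x)) = 0 \quad \text{and} \quad G(x, f(x), g(x)) = 0 \quad \text{for all } x \in V.
\]
By using the chain rule it follows that
\[
\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y} f'(x) + \frac{\partial F}{\partial z} g'(x) = 0 \quad \text{and} \quad \frac{\partial G}{\partial x} + \frac{\partial G}{\partial y} f'(x) + \frac{\partial G}{\partial z} g'(x) = 0.
\]
Consequently, writing $y = f(x)$ and $z = g(x)$,
\[
2x(y^2 + z^2) + 2x^2 y f'(x) + 2x^2 z g'(x) = 0 \quad \text{and} \quad 2(x - z) + 2y f'(x) - 2(x - z) g'(x) = 0.
\]
Solving the above system of equations for $f'(x)$ and $g'(x)$ we obtain
\[
f'(x) = \frac{y^2 z + z^3 - x y^2 - x^2 z}{x^2 y} \quad \text{and} \quad g'(x) = \frac{x^2 - xz - y^2 - z^2}{x^2}, \quad x \in V.
\]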
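The formulas for $f'(x)$ and $g'(x)$ in Example 1 can be sanity-checked at the base point. The Python sketch below is an illustrative check, not part of the original example: it solves the $2 \times 2$ linear system at $(x, y, z) = (1, 1, 2)$ by Cramer's rule with exact rational arithmetic and compares the result with the closed-form expressions. The variable names are hypothetical.

```python
# Illustrative check of Example 1 at the point (x, y, z) = (1, 1, 2).
from fractions import Fraction

x, y, z = Fraction(1), Fraction(1), Fraction(2)

# Partial derivatives of F = x^2 (y^2 + z^2) - 5 and G = (x - z)^2 + y^2 - 2.
Fx, Fy, Fz = 2 * x * (y**2 + z**2), 2 * x**2 * y, 2 * x**2 * z
Gx, Gy, Gz = 2 * (x - z), 2 * y, -2 * (x - z)

# Solve  Fy f' + Fz g' = -Fx  and  Gy f' + Gz g' = -Gx  by Cramer's rule.
det = Fy * Gz - Fz * Gy
f_prime = (-Fx * Gz + Fz * Gx) / det
g_prime = (-Fy * Gx + Gy * Fx) / det

# Closed-form expressions from the example, evaluated at the same point.
f_formula = (y**2 * z + z**3 - x * y**2 - x**2 * z) / (x**2 * y)
g_formula = (x**2 - x * z - y**2 - z**2) / x**2

print(det, f_prime, g_prime)   # -4 7 -6
```

Both routes give $f'(1) = 7$ and $g'(1) = -6$, so the closed-form expressions agree with the linear system at this point.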

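Example 2. Consider the system of equations
\[
F(x, y, u, v) = u^2 + v^2 - x^2 - y = 0 \quad \text{and} \quad G(x, y, u, v) = u + v - xy + 1 = 0.
\]
Clearly, $F(2, 1, -1, 2) = 0$ and $G(2, 1, -1, 2) = 0$. At the point $(2, 1, -1, 2)$ we have
\[
\begin{vmatrix} \dfrac{\partial F}{\partial u} & \dfrac{\partial F}{\partial v} \\[2mm] \dfrac{\partial G}{\partial u} & \dfrac{\partial G}{\partial v} \end{vmatrix}
= \begin{vmatrix} 2u & 2v \\ 1 & 1 \end{vmatrix}
= \begin{vmatrix} -2 & 4 \\ 1 & 1 \end{vmatrix} = -6 \ne 0.
\]
By Theorem 4.1 we can find an open set $V$ in $\mathbb{R}^2$ containing $(2, 1)$, and functions $f$ and $g$ from $V$ to $\mathbb{R}$ such that $f(2, 1) = -1$, $g(2, 1) = 2$, and
\[
F(x, y, f(x, y), g(x, y)) = 0 \quad \text{and} \quad G(x, y, f(x, y), g(x, y)) = 0 \quad \text{for all } (x, y) \in V.
\]
Differentiating both sides of the above equations with respect to $x$, we obtain
\[
\frac{\partial F}{\partial x} + \frac{\partial F}{\partial u} \frac{\partial f}{\partial x} + \frac{\partial F}{\partial v} \frac{\partial g}{\partial x} = 0 \quad \text{and} \quad \frac{\partial G}{\partial x} + \frac{\partial G}{\partial u} \frac{\partial f}{\partial x} + \frac{\partial G}{\partial v} \frac{\partial g}{\partial x} = 0,
\]
that is, $-2x + 2u\,\partial f/\partial x + 2v\,\partial g/\partial x = 0$ and $-y + \partial f/\partial x + \partial g/\partial x = 0$, where $u = f(x, y)$ and $v = g(x, y)$. Consequently,
\[
\frac{\partial f}{\partial x} = \frac{x - yv}{u - v} \quad \text{and} \quad \frac{\partial g}{\partial x} = \frac{-x + yu}{u - v}, \quad (x, y) \in V.
\]
Similarly, differentiating with respect to $y$ yields
\[
\frac{\partial F}{\partial y} + \frac{\partial F}{\partial u} \frac{\partial f}{\partial y} + \frac{\partial F}{\partial v} \frac{\partial g}{\partial y} = 0 \quad \text{and} \quad \frac{\partial G}{\partial y} + \frac{\partial G}{\partial u} \frac{\partial f}{\partial y} + \frac{\partial G}{\partial v} \frac{\partial g}{\partial y} = 0.
\]
Hence we obtain
\[
\frac{\partial f}{\partial y} = \frac{1 - 2xv}{2(u - v)} \quad \text{and} \quad \frac{\partial g}{\partial y} = \frac{-1 + 2xu}{2(u - v)}, \quad (x, y) \in V.
\]

5. Constrained Optimization

In this section, as an application of the inverse function theorem, we study the Lagrange multiplier method for constrained optimization.

Theorem 5.1. Let $f, g_1, \dots, g_k$ be real-valued continuously differentiable functions of $(x_1, \dots, x_n)$ defined on an open set $U$ in $\mathbb{R}^n$ with $n > k$. Let
\[
Z := \{ z \in U : g_1(z) = \cdots = g_k(z) = 0 \}.
\]
Suppose that there exist a point $a$ in $Z$ and an open ball $B_r(a) \subseteq U$ such that $f(z) \ge f(a)$ (or $f(z) \le f(a)$) for all $z \in Z \cap B_r(a)$. Suppose also that the rows of the Jacobian matrix
\[
\left( \frac{\partial g_i}{\partial x_j}(a) \right)_{1 \le i \le k,\, 1 \le j \le n}
\]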

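are linearly independent. Then there exist real numbers $\lambda_1, \dots, \lambda_k$ such that
\[
\nabla f(a) + \lambda_1 \nabla g_1(a) + \cdots + \lambda_k \nabla g_k(a) = 0.
\]

Proof. By a permutation of the index set $\{1, \dots, n\}$ if necessary we may assume that
\[
\det \left( \frac{\partial g_i}{\partial x_j}(a) \right)_{1 \le i, j \le k} \ne 0.
\]
Consequently, we can find real numbers $\lambda_1, \dots, \lambda_k$ such that the equality
\[
\frac{\partial f}{\partial x_j}(a) + \sum_{i=1}^{k} \lambda_i \frac{\partial g_i}{\partial x_j}(a) = 0
\]
holds for $j = 1, \dots, k$. Our proof will be complete if we can show that the above equality also holds for $j = k+1, \dots, n$.

Suppose that $a = (a_1, \dots, a_n)$. We write $a = (a', a'')$, where $a' := (a_1, \dots, a_k)$ and $a'' := (a_{k+1}, \dots, a_n)$. By the Implicit Function Theorem, there exist an open set $V$ in $\mathbb{R}^{n-k}$ containing $a'' = (a_{k+1}, \dots, a_n)$ and a continuously differentiable mapping $\varphi = (\varphi_1, \dots, \varphi_k)$ from $V$ to $\mathbb{R}^k$ such that $\varphi(a_{k+1}, \dots, a_n) = (a_1, \dots, a_k)$ and that for $i = 1, \dots, k$,
\[
g_i \bigl( \varphi_1(x_{k+1}, \dots, x_n), \dots, \varphi_k(x_{k+1}, \dots, x_n), x_{k+1}, \dots, x_n \bigr) = 0 \quad \text{for all } (x_{k+1}, \dots, x_n) \in V.
\]
For $(x_{k+1}, \dots, x_n) \in V$, define
\[
h(x_{k+1}, \dots, x_n) := f \bigl( \varphi_1(x_{k+1}, \dots, x_n), \dots, \varphi_k(x_{k+1}, \dots, x_n), x_{k+1}, \dots, x_n \bigr).
\]
Then $h$ is a continuously differentiable function on $V$ and it attains a local minimum (or a local maximum) at the point $a'' = (a_{k+1}, \dots, a_n)$. Hence, for $m = k+1, \dots, n$ we have
\[
0 = \frac{\partial h}{\partial x_m}(a'') = \sum_{j=1}^{k} \frac{\partial f}{\partial x_j}(a) \frac{\partial \varphi_j}{\partial x_m}(a'') + \frac{\partial f}{\partial x_m}(a),
\]
where the chain rule has been used to derive the second equality. Furthermore, we have
\[
\sum_{j=1}^{k} \frac{\partial g_i}{\partial x_j}(a) \frac{\partial \varphi_j}{\partial x_m}(a'') + \frac{\partial g_i}{\partial x_m}(a) = 0, \quad i = 1, \dots, k, \quad m = k+1, \dots, n.
\]
Consequently,
\[
\sum_{j=1}^{k} \frac{\partial f}{\partial x_j}(a) \frac{\partial \varphi_j}{\partial x_m}(a'') + \frac{\partial f}{\partial x_m}(a) + \sum_{i=1}^{k} \lambda_i \Bigl( \sum_{j=1}^{k} \frac{\partial g_i}{\partial x_j}(a) \frac{\partial \varphi_j}{\partial x_m}(a'') + \frac{\partial g_i}{\partial x_m}(a) \Bigr) = 0.
\]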

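In this identity the coefficient of each $\frac{\partial \varphi_j}{\partial x_m}(a'')$ is $\frac{\partial f}{\partial x_j}(a) + \sum_{i=1}^{k} \lambda_i \frac{\partial g_i}{\partial x_j}(a)$, which vanishes for $j = 1, \dots, k$ by the choice of $\lambda_1, \dots, \lambda_k$. It follows that
\[
\frac{\partial f}{\partial x_m}(a) + \sum_{i=1}^{k} \lambda_i \frac{\partial g_i}{\partial x_m}(a) = 0, \quad m = k+1, \dots, n.
\]
This completes the proof.

The above theorem gives the following method to find local minima or maxima for a continuously differentiable function $f$ subject to the constraint $g_1 = \cdots = g_k = 0$. Set the Lagrange function
\[
L(x_1, \dots, x_n) := f(x_1, \dots, x_n) + \sum_{i=1}^{k} \lambda_i g_i(x_1, \dots, x_n), \quad (x_1, \dots, x_n) \in U,
\]
where $\lambda_1, \dots, \lambda_k$ are Lagrange multipliers. Solve the system of $n + k$ equations
\[
\begin{cases}
\dfrac{\partial L}{\partial x_j}(x_1, \dots, x_n) = 0 & \text{for } j = 1, \dots, n, \\[2mm]
g_i(x_1, \dots, x_n) = 0 & \text{for } i = 1, \dots, k
\end{cases}
\]
for $(x_1, \dots, x_n)$ and $(\lambda_1, \dots, \lambda_k)$.

Example. Let us find the extreme values (maximum and minimum) of the function
\[
f(x_1, x_2, x_3) = x_1^3 + x_2^3 + x_3^3
\]
subject to the constraints
\[
x_1^2 + x_2^2 + x_3^2 = 4 \quad \text{and} \quad x_1 + x_2 + x_3 = 1.
\]
Note that the set
\[
E := \{ (x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 = 4,\ x_1 + x_2 + x_3 = 1 \}
\]
is a compact set. So $f$ attains its maximum and minimum on $E$. We use the Lagrange multiplier method to find the maximum and minimum of $f$ on $E$. Let
\[
g_1(x_1, x_2, x_3) := x_1^2 + x_2^2 + x_3^2 - 4 \quad \text{and} \quad g_2(x_1, x_2, x_3) := x_1 + x_2 + x_3 - 1.
\]
The Lagrange function is
\[
L(x_1, x_2, x_3, \lambda_1, \lambda_2) = (x_1^3 + x_2^3 + x_3^3) + \lambda_1 (x_1^2 + x_2^2 + x_3^2 - 4) + \lambda_2 (x_1 + x_2 + x_3 - 1).
\]
Setting $\partial L / \partial x_j = 0$ for $j = 1, 2, 3$, we obtain
\[
3x_1^2 + \lambda_1 \cdot 2x_1 + \lambda_2 = 0, \quad 3x_2^2 + \lambda_1 \cdot 2x_2 + \lambda_2 = 0, \quad 3x_3^2 + \lambda_1 \cdot 2x_3 + \lambda_2 = 0.
\]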

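It follows that
\[
\begin{vmatrix} 3x_1^2 & 2x_1 & 1 \\ 3x_2^2 & 2x_2 & 1 \\ 3x_3^2 & 2x_3 & 1 \end{vmatrix} = 0.
\]
Consequently,
\[
(x_1 - x_2)(x_1 - x_3)(x_2 - x_3) = 0.
\]
Thus, one and only one of the cases $x_1 = x_2$, $x_1 = x_3$, and $x_2 = x_3$ must occur. Suppose that $x_1 = x_2$. This together with $x_1 + x_2 + x_3 = 1$ yields $x_3 = 1 - 2x_1$. Substituting $x_2 = x_1$ and $x_3 = 1 - 2x_1$ into the equation $x_1^2 + x_2^2 + x_3^2 = 4$, we obtain
\[
x_1^2 + x_1^2 + (1 - 2x_1)^2 = 4,
\]
that is, $6x_1^2 - 4x_1 - 3 = 0$. It has two solutions:
\[
x_1 = \frac{1}{3} + \frac{\sqrt{22}}{6} \quad \text{and} \quad x_1 = \frac{1}{3} - \frac{\sqrt{22}}{6}.
\]
Hence, the optimization problem has the following solutions:
\[
\left( \frac{1}{3} + \frac{\sqrt{22}}{6},\ \frac{1}{3} + \frac{\sqrt{22}}{6},\ \frac{1}{3} - \frac{\sqrt{22}}{3} \right)
\quad \text{and} \quad
\left( \frac{1}{3} - \frac{\sqrt{22}}{6},\ \frac{1}{3} - \frac{\sqrt{22}}{6},\ \frac{1}{3} + \frac{\sqrt{22}}{3} \right).
\]
Other solutions are obtained from the above solutions by permutations of $\{x_1, x_2, x_3\}$. Note that the rows of the Jacobian matrix
\[
\begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & \dfrac{\partial g_1}{\partial x_3} \\[2mm] \dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} & \dfrac{\partial g_2}{\partial x_3} \end{bmatrix}
= \begin{bmatrix} 2x_1 & 2x_2 & 2x_3 \\ 1 & 1 & 1 \end{bmatrix}
\]
are linearly independent at any of these points.

Evaluating $f$ at these points gives $(68 - 11\sqrt{22})/18 \approx 0.91$ for the first set of solutions and $(68 + 11\sqrt{22})/18 \approx 6.64$ for the second. Thus the first set of solutions corresponds to the minimum value, and the second set of solutions corresponds to the maximum value.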
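The two candidate points obtained above can be checked numerically. The Python sketch below is an illustration only, and its variable names are hypothetical: it confirms that both points satisfy the two constraints up to rounding error and compares the values of $f$ at them.

```python
# Illustrative numerical check of the two candidate points found above.
import math

def f(p):
    """Objective function f(x1, x2, x3) = x1^3 + x2^3 + x3^3."""
    return sum(t**3 for t in p)

s = math.sqrt(22.0)
first = (1/3 + s/6, 1/3 + s/6, 1/3 - s/3)
second = (1/3 - s/6, 1/3 - s/6, 1/3 + s/3)

for p in (first, second):
    sphere = sum(t * t for t in p) - 4.0   # g_1(p), should be close to 0
    plane = sum(p) - 1.0                   # g_2(p), should be close to 0
    print(p, sphere, plane, f(p))
```

The value of $f$ at the first point is smaller than at the second, consistent with the first set of solutions giving the minimum and the second giving the maximum.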