INVERSE AND IMPLICIT FUNCTION THEOREMS

I use $df_x$ for the linear transformation that is the differential of $f$ at $x$.

1. INVERSE FUNCTION THEOREM

Definition 1. Suppose $S \subseteq \mathbb{R}^n$ is open, $a \in S$, and $f : S \to \mathbb{R}^n$ is a function. We say $f$ is locally invertible around $a$ if there is an open set $A \subseteq S$ containing $a$ so that $f(A)$ is open and there is a function $g : f(A) \to A$ so that, for all $x \in A$ and $y \in f(A)$,
\[ g(f(x)) = x, \qquad f(g(y)) = y. \]
Clearly, it suffices to have $f(A)$ open and $f$ one-to-one on the open set $A$.

It is important to note how $f^{-1}$ depends on the choice of $A$. If $B$ is another open set and $h : f(B) \to B$ is an inverse for $f$ on $B$, then $h$ and $g$ agree on $f(A \cap B)$. So changing the set $A$ may change the domain of $f^{-1}$ but not the value of $f^{-1}(y)$ at any point $y \in f(A \cap B)$. So we may, with only minimal risk of confusion, call $g$ the local inverse of $f$ near $a$.

Definition 2. If $S \subseteq \mathbb{R}^n$ is open, then $g : S \to \mathbb{R}^m$ is Lipschitz if there is a constant $K$ so that, for all $w, y \in S$,
\[ \|g(w) - g(y)\| \leq K \|w - y\|. \]

We will need the following result:

Proposition 3. Linear transformations are Lipschitz. That is, for a linear transformation $L : \mathbb{R}^n \to \mathbb{R}^m$, there is $M > 0$ so that, for all $x, y \in \mathbb{R}^n$,
\[ \|Lx - Ly\| \leq M \|x - y\|. \]

We also need the following result:

Proposition 4. Let $S \subseteq \mathbb{R}^n$ be open. If a function $f : S \to \mathbb{R}$ is continuous and $T \subseteq S$ is a compact set, then $f$ attains its maximum and minimum on $T$. That is, there are $t_0, t_1 \in T$ so that, for all $t \in T$,
\[ f(t_0) \leq f(t) \leq f(t_1). \]

Note that $f$ does not need to have an inverse function for the preimage $f^{-1}(V)$ of a set $V$ to make sense.

Theorem 5 (Local Invertibility). Let $S \subseteq \mathbb{R}^n$ be open, $a \in S$, and $f : S \to \mathbb{R}^n$ be $C^1$. If $df_a$ is invertible, then $f$ is locally invertible around $a$ and $f^{-1}$ is Lipschitz.

Lemma 6. With the same hypotheses as the theorem, there are $\epsilon, c > 0$ so that, for all $x, z \in B_\epsilon(a)$,
\[ \|f(x) - f(z)\| \geq c \|x - z\|, \tag{7} \]
and, for all $x \in B_\epsilon(a)$, $df_x$ is invertible.

Proof of Local Invertibility Theorem. Using the lemma, observe that for $x, z \in B_\epsilon(a)$ with $x \neq z$,
\[ \|f(x) - f(z)\| \geq c \|x - z\| > 0 \]
and so $f(x) \neq f(z)$, i.e. $f$ is one-to-one on $B_\epsilon(a)$. Thus, there is a function $f^{-1} : f(B_\epsilon(a)) \to B_\epsilon(a)$. Moreover, for $w, y \in f(B_\epsilon(a))$, there are $x, z \in B_\epsilon(a)$ with $w = f(x)$ and $y = f(z)$. Using (7),
\[ \|w - y\| \geq c \|f^{-1}(w) - f^{-1}(y)\|. \]
This shows $f^{-1}$ is Lipschitz (with constant $1/c$) and so is continuous.

To see that $f(B_\epsilon(a))$ is open, fix $v$ in this set. There is $x \in B_\epsilon(a)$ with $f(x) = v$. Choose $s > 0$ so that the closed ball $\overline{B_s(x)}$ is contained in $B_\epsilon(a)$. Then $K = \{y : \|y - x\| = s\}$, the boundary of $B_s(x)$, is a compact set. Since $f$ is continuous, the image, $f(K)$, is also compact. By Proposition 4, there is $y_0 \in K$ at which the function $z \mapsto \|f(z) - v\|$ attains its minimum on $K$. That is, for all $y \in K$,
\[ \|f(y) - v\| \geq \|f(y_0) - v\|. \]
As $f$ is one-to-one and $x \notin K$, $v$ is not in $f(K)$; so $d = \|f(y_0) - v\| > 0$. We shall show that $B_{d/2}(v)$ is contained in $f(B_\epsilon(a))$.

Let $u \in B_{d/2}(v)$ and define a function on $\overline{B_s(x)}$ by
\[ g(y) = \|f(y) - u\|^2 = (f(y) - u) \cdot (f(y) - u). \]
Observe that $g$ is $C^1$ because $f$ is and, by previous work,
\[ dg_y(h) = 2 \bigl( df_y(h) \bigr) \cdot \bigl( f(y) - u \bigr). \]
Since $\overline{B_s(x)}$ is a closed and bounded set and $g$ is continuous, Proposition 4 guarantees that $g$ attains its minimum value. Observe that at every point $y$ of $K$,
\[ g(y) = \|f(y) - u\|^2 \geq \bigl( \|f(y) - v\| - \|v - u\| \bigr)^2 \geq \Bigl( d - \frac{d}{2} \Bigr)^2 = \frac{d^2}{4}, \]
while
\[ g(x) = \|v - u\|^2 < \frac{d^2}{4}. \]
Hence the minimum of $g$ occurs at some interior point $y_1 \in B_s(x)$. So, by previous work, $dg_{y_1} = 0$. But $df_{y_1}$ is invertible by the lemma, so we may choose $h$ with $df_{y_1}(h) = f(y_1) - u$, which gives $\|f(y_1) - u\|^2 = 0$; that is, $f(y_1) = u$.

So for every point $u \in B_{d/2}(v)$, we have found a point $y_1 \in B_s(x)$ with $f(y_1) = u$. Therefore $f(B_\epsilon(a)) \supseteq B_{d/2}(v)$, where $d > 0$ depends on $v$, for each $v \in f(B_\epsilon(a))$, showing that $f(B_\epsilon(a))$ is open.

Proof of Lemma. Let $T = (df_a)^{-1}$. By Proposition 3, there is $M > 0$ so that
\[ \|Tu - Tv\| \leq M \|u - v\|. \]
Letting $u = T^{-1}(x - a)$ and $v = T^{-1}(y - a)$ (so $u = df_a(x - a)$), we have
\[ \|df_a(x - a) - df_a(y - a)\| \geq \frac{1}{M} \|x - y\|. \]
Define $E : S \to \mathbb{R}^n$ by
\[ E(x) = f(x) - f(a) - df_a(x - a). \]
Since $f$ is $C^1$ and linear transformations are infinitely differentiable, $E$ is $C^1$. Notice that
\[ dE_a(h) = df_a(h) - df_a(h) = 0. \]
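In particular, if $E = (E_1, \dots, E_n)$, then, since each $d(E_i)_a = 0$ and each map $z \mapsto d(E_i)_z$ is continuous at $a$, there is some $\epsilon > 0$ so that
\[ \|d(E_i)_z\| \leq \frac{1}{2M\sqrt{n}} \]
for $i = 1, \dots, n$ and all $z \in B_\epsilon(a)$.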
Suppose that $x, z \in B_\epsilon(a)$. Then, for each $i$, by Taylor's Theorem with linear remainder term, there is $c_i$ in the line segment $L[x, z] \subseteq B_\epsilon(a)$ so that
\[ E_i(x) - E_i(z) = d(E_i)_{c_i}(x - z) \]
and so
\[ \|E(x) - E(z)\|^2 = \sum_{i=1}^{n} |E_i(x) - E_i(z)|^2 \leq \sum_{i=1}^{n} \Bigl( \frac{1}{2M\sqrt{n}} \Bigr)^2 \|x - z\|^2 = \Bigl( \frac{1}{2M} \Bigr)^2 \|x - z\|^2. \]
Thus, $\|E(x) - E(z)\| \leq \|x - z\| / (2M)$. As
\[ f(x) - f(z) = E(x) - E(z) + \bigl( df_a(x - a) - df_a(z - a) \bigr), \]
we have
\[ \|f(x) - f(z)\| \geq \|df_a(x - a) - df_a(z - a)\| - \|E(x) - E(z)\| \geq \frac{\|x - z\|}{M} - \frac{\|x - z\|}{2M} = \frac{\|x - z\|}{2M}. \]
This proves (7) with $c = 1/(2M)$.

Finally, to see that $df_x$ is invertible for each $x \in B_\epsilon(a)$, observe that
\[ dE_x(z - x) = df_x(z - x) - df_a(z - x). \]
If there were $z \neq x$ so that $df_x(z - x) = 0$, then $dE_x(z - x) = -df_a(z - x)$. On the other hand, we have that
\[ \|df_a(z - x)\| \geq \frac{1}{M} \|z - x\|, \qquad \|dE_x(z - x)\| \leq \frac{1}{2M} \|z - x\|. \]
This contradiction shows that $df_x$ is a one-to-one linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$ and so must be invertible.

Recall that we proved that a function $g$ is differentiable at $c$ if and only if there is a linear transformation $L$ and a function $\epsilon$ so that $\lim_{x \to c} \epsilon(x) = 0$ and
\[ g(x) = g(c) + L(x - c) + \epsilon(x) \|x - c\|. \]
In this case, $L$ is $dg_c$.

Theorem 8 (Inverse Function Theorem). Let $S \subseteq \mathbb{R}^n$ be open, $a \in S$, and $f : S \to \mathbb{R}^n$ be $C^1$. If $df_a$ is invertible, then $f^{-1}$ is differentiable at $b = f(a)$ and
\[ d(f^{-1})_b = \bigl( df_{f^{-1}(b)} \bigr)^{-1}. \]

Proof. Since $f$ is differentiable at $a$, there is a function $\epsilon : S \to \mathbb{R}^n$ with $\lim_{x \to a} \epsilon(x) = 0$ and
\[ f(x) = f(a) + df_a(x - a) + \epsilon(x) \|x - a\|. \]
Since $f$ is locally invertible around $a$, there is some open set $A$ containing $a$ on which $f$ is one-to-one and $f^{-1}$ is Lipschitz on the open set $f(A)$.
For $x \in A$, there is $y \in f(A)$ with $x = f^{-1}(y)$. Using this and $a = f^{-1}(b)$, we have
\[ f(f^{-1}(y)) = f(f^{-1}(b)) + df_a\bigl( f^{-1}(y) - f^{-1}(b) \bigr) + \epsilon(f^{-1}(y)) \|f^{-1}(y) - f^{-1}(b)\|. \]
Using the inverse function identities and moving $b$ over, we have
\[ y - b = df_a\bigl( f^{-1}(y) - f^{-1}(b) \bigr) + \epsilon(f^{-1}(y)) \|f^{-1}(y) - f^{-1}(b)\|. \]
Applying $(df_a)^{-1}$ to this equation and using the linearity of $(df_a)^{-1}$, we have
\[ (df_a)^{-1}(y - b) = f^{-1}(y) - f^{-1}(b) + (df_a)^{-1}\bigl( \epsilon(f^{-1}(y)) \bigr) \|f^{-1}(y) - f^{-1}(b)\|. \]
Then we can rearrange the previous equation to obtain
\[ f^{-1}(y) = f^{-1}(b) + (df_a)^{-1}(y - b) + \eta(y) \|y - b\| \]
if we define a new function $\eta$ on $f(A)$ by letting $\eta(b) = 0$ and otherwise
\[ \eta(y) = -(df_a)^{-1}\bigl( \epsilon(f^{-1}(y)) \bigr) \frac{\|f^{-1}(y) - f^{-1}(b)\|}{\|y - b\|}. \]
To show that $f^{-1}$ is differentiable at $b$ and $d(f^{-1})_b$ is $(df_a)^{-1}$, it suffices to show that
\[ \lim_{y \to b} \eta(y) = 0. \]
As $f^{-1}$ is Lipschitz, there is a constant $K > 0$ so that
\[ \frac{\|f^{-1}(y) - f^{-1}(b)\|}{\|y - b\|} \leq K \]
for all $y \in f(A)$ with $y \neq b$. So it suffices to prove that
\[ \lim_{y \to b} (df_a)^{-1}\bigl( \epsilon(f^{-1}(y)) \bigr) = 0. \]
Now, as $y \to b$, $f^{-1}(y) \to f^{-1}(b) = a$, because $f^{-1}$ is continuous. By our choice of the function $\epsilon$, as $f^{-1}(y) \to a$, $\epsilon(f^{-1}(y)) \to 0$. Since the linear transformation $(df_a)^{-1}$ is continuous, we have the claimed limit. This concludes the proof.

Corollary 9. $f^{-1}$ is $C^1$ on its domain.

This is very rough. Notice first that since $f^{-1}$ is uniquely defined on its domain $f(A)$, $f$ is locally invertible at each point of $A$. By the lemma, we may assume $df_a$ is invertible for each $a \in A$. By the Inverse Function Theorem, we have that $d(f^{-1})_b = (df_{f^{-1}(b)})^{-1}$ for each $b \in f(A)$. To see that this function is continuous, observe first that $f^{-1}$ is continuous; second, that the map $x \mapsto df_x$ is continuous; and third, that matrix inversion is continuous. As a composition of three continuous operations, $d(f^{-1})_b$ is a continuous function of $b$.

Definition 10. We define a $C^1$ diffeomorphism as a $C^1$ function $f : S \to \mathbb{R}^n$, where $S \subseteq \mathbb{R}^n$ is open, which is one-to-one on $S$ and has $df_s$ invertible for each $s \in S$. Then the Inverse Function Theorem can be reformulated as the statement that $f^{-1} : f(S) \to S$ is also a $C^1$ diffeomorphism.
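The formula in the Inverse Function Theorem, that the differential of the local inverse at b = f(a) is the inverse of the differential of f at a, can be sanity-checked numerically. The following is a minimal Python sketch under stated assumptions: the map f(x, y) = (e^x cos y, e^x sin y), the base point (0.3, 0.7), and the helper names `f_inv`, `jacobian_f`, `inverse_2x2` and `numerical_jacobian` are illustrative choices and do not come from these notes. It compares the hand-computed inverse of the matrix of the differential of f at a with a central-difference approximation to the matrix of the differential of the local inverse at b.

```python
import math

def f(x, y):
    """Illustrative map f(x, y) = (e^x cos y, e^x sin y)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def f_inv(u, v):
    """A local inverse of f, valid for points (x, y) with -pi < y < pi."""
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def jacobian_f(x, y):
    """Matrix of the differential of f at (x, y), computed by hand."""
    ex = math.exp(x)
    return [[ex * math.cos(y), -ex * math.sin(y)],
            [ex * math.sin(y), ex * math.cos(y)]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def numerical_jacobian(g, u, v, h=1e-6):
    """Central-difference approximation to the matrix of dg at (u, v)."""
    du_plus, du_minus = g(u + h, v), g(u - h, v)
    dv_plus, dv_minus = g(u, v + h), g(u, v - h)
    return [[(du_plus[i] - du_minus[i]) / (2 * h),
             (dv_plus[i] - dv_minus[i]) / (2 * h)] for i in range(2)]

a = (0.3, 0.7)                               # assumed base point
b = f(*a)                                    # b = f(a)
predicted = inverse_2x2(jacobian_f(*a))      # (df_a)^{-1}
observed = numerical_jacobian(f_inv, *b)     # approximates d(f^{-1})_b
max_gap = max(abs(predicted[i][j] - observed[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_gap)
```

This particular map is not one-to-one on all of the plane, since it is periodic in y, so `f_inv` is only a local inverse. That is the situation the theorem addresses: invertibility of the differential at a point gives an inverse near that point, not a global one.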
2. IMPLICIT FUNCTION THEOREM

Definition 11. Suppose $G : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}^n$ satisfies $G(a, b) = 0$. A local solution of $G(x, y) = 0$ for $y$ in terms of $x$ near $(a, b)$ consists of an open set $W \subseteq \mathbb{R}^m \times \mathbb{R}^n$ with $(a, b) \in W$, an open set $U \subseteq \mathbb{R}^m$, and a function $h : U \to \mathbb{R}^n$ so that
\[ G(x, y) = 0, \ (x, y) \in W \quad \text{if and only if} \quad y = h(x), \ x \in U. \]
That is, $G(x, h(x)) = 0$ for every $x \in U$ and if $(x, y) \in W$ satisfies $G(x, y) = 0$, then $x \in U$ and $y = h(x)$. In particular, $a \in U$.

To motivate the next result, suppose we have a function $H : U \to \mathbb{R}^n$ that is a local solution to $G(x, y) = 0$. Let $K : U \to \mathbb{R}^m \times \mathbb{R}^n$ be given by $K(x) = (x, H(x))$. Notice that, as a matrix,
\[ [dK_x] = \begin{bmatrix} I_m \\ [dH_x] \end{bmatrix}, \]
where $I_m$ is the $m \times m$ identity matrix. Then $x \mapsto G(x, H(x))$ is the composition of $G$ and $K$ and is identically $0$ on $U$, so we may use the chain rule to obtain that, as linear transformations,
\[ O = dG_{K(x)} \circ dK_x. \]
Writing $[dG_{K(x)}]$ as $[T_1 \ T_2]$, where $T_1$ has $m$ columns and $T_2$ has $n$, we have
\[ O = \begin{bmatrix} T_1 & T_2 \end{bmatrix} \begin{bmatrix} I_m \\ [dH_x] \end{bmatrix}. \]
Thus, $O = T_1 + T_2 [dH_x]$ and so, if $T_2$ is invertible, we have $[dH_x] = -T_2^{-1} T_1$.

Theorem 12 (Implicit Function Theorem). Suppose that $S \subseteq \mathbb{R}^m \times \mathbb{R}^n$ is open, $a \in \mathbb{R}^m$, $b \in \mathbb{R}^n$ with $(a, b) \in S$, and $G : S \to \mathbb{R}^n$ is $C^1$ with $G(a, b) = 0$. If $[dG_{(a,b)}] = [T_1 \ T_2]$ where $T_2$ is an invertible $n \times n$ matrix, then there is a $C^1$ local solution to $G(x, y) = 0$ for $y$ in terms of $x$ near $(a, b)$. Further, the differential of the local solution $H$ at $a$ is $dH_a = -T_2^{-1} T_1$.

Proof. Define $F : S \to \mathbb{R}^{m+n}$ by $F(x, y) = (x, G(x, y))$. Since $G$ is $C^1$, so is $F$; also $F(a, b) = (a, G(a, b)) = (a, 0)$. The matrix of $dF_{(a,b)}$ is the $2 \times 2$ block matrix
\[ \begin{bmatrix} I_m & O \\ T_1 & T_2 \end{bmatrix}. \]
It is a standard result from linear algebra that $[dF_{(a,b)}]$ is invertible if and only if $T_2$ is invertible. Thus, $dF_{(a,b)}$ is invertible and we can apply the Inverse Function Theorem to $F$ to obtain a $C^1$ local inverse around $(a, b)$. Thus, we have an open set $W \subseteq \mathbb{R}^{m+n}$ with $(a, b) \in W$ so that $V = F(W)$ is open. Note that $(a, 0) \in V$.

Since $F$ is the identity on its first $m$ components, $F^{-1}$ must also be the identity on its first $m$ components.
Thus, there is a $C^1$ function $K : V \to \mathbb{R}^n$ so that, for $w \in \mathbb{R}^m$ and $z \in \mathbb{R}^n$ with $(w, z) \in V$, we have
\[ F^{-1}(w, z) = (w, K(w, z)). \]
Moreover, since $F^{-1}$ is $C^1$, so is $K$. Observe that, for $w$ and $z$ as above,
\[ (w, z) = F(F^{-1}(w, z)) = F(w, K(w, z)) = (w, G(w, K(w, z))). \]
In particular, if $z = 0$, then $G(w, K(w, 0)) = 0$.

Letting $j : \mathbb{R}^m \to \mathbb{R}^{m+n}$ be the continuous map $j(x) = (x, 0)$, define $U = j^{-1}(V) \subseteq \mathbb{R}^m$, an open set, and define $H : U \to \mathbb{R}^n$ by $H(w) = K(w, 0)$. Notice that $w \in U$ if and only if $(w, 0) \in V$. By the definition of $H$, for $w \in U$,
\[ G(w, H(w)) = G(w, K(w, 0)) = 0, \]
while if, for some $(x, y) \in W$, $G(x, y) = 0$, then $F(x, y) = (x, 0) \in V$ and so $x \in U$. Thus, $(x, y) = F^{-1}(x, 0) = (x, K(x, 0))$ and so $y = H(x)$ with $x \in U$, as required. So the sets $W$ and $U$ and the function $H$ form a local solution to $G(x, y) = 0$ around $(a, b)$.

To see that $H$ is $C^1$, observe that it is the restriction of the $C^1$ function $K$. Finally, the differential of $H$ follows from the computation given before the theorem.

Notice that in solving $G(x, y) = 0$, we are explicitly looking for a function that expresses $y$ in terms of $x$. That is, we have specific variables in mind. What if we don't distinguish between the $n + m$ variables and just want to be able to solve for $n$ of them in terms of the other $m$?
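The derivative formula for a local solution, minus the inverse of T_2 times T_1, can be illustrated numerically in the simplest case m = n = 1. The following is a minimal Python sketch under stated assumptions: the equation G(x, y) = y^3 + y - x, the point (a, b) = (2, 1), and the names `dG_dx`, `dG_dy` and `H` are illustrative choices that do not appear in these notes. Newton's method is used here only as a convenient way to evaluate the local solution; it is not the construction used in the proof above.

```python
def G(x, y):
    """Illustrative equation G(x, y) = y^3 + y - x, with G(2, 1) = 0."""
    return y ** 3 + y - x

def dG_dx(x, y):
    """The 1x1 block T_1: partial derivative of G in x."""
    return -1.0

def dG_dy(x, y):
    """The 1x1 block T_2: partial derivative of G in y (never zero here)."""
    return 3 * y ** 2 + 1

def H(x, y_start=1.0, tol=1e-14, max_iter=50):
    """Approximate the local solution y = H(x) of G(x, y) = 0 near (2, 1)
    by Newton's method in y, started at y_start."""
    y = y_start
    for _ in range(max_iter):
        step = G(x, y) / dG_dy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

a, b = 2.0, 1.0
# Prediction from the theorem: dH_a = -T_2^{-1} T_1, which is 1/4 at (2, 1).
predicted = -dG_dx(a, b) / dG_dy(a, b)
# Central-difference estimate of the derivative of H at a.
h = 1e-5
observed = (H(a + h) - H(a - h)) / (2 * h)
print("predicted:", predicted, "observed:", observed)
```

Here y^3 + y = x has no convenient closed-form solution for y, which is the typical situation in which the theorem is useful: it gives the derivative of the solution from the partial derivatives of G alone.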