Exercise 8.1 We have
$$f(x_0+\Delta x, y_0+\Delta y)-f(x_0,y_0) = a(x_0+\Delta x)^2+2b(x_0+\Delta x)(y_0+\Delta y)+c(y_0+\Delta y)^2-ax_0^2-2bx_0y_0-cy_0^2 = (2ax_0+2by_0)\Delta x+(2bx_0+2cy_0)\Delta y+(a\Delta x^2+2b\Delta x\Delta y+c\Delta y^2).$$
By
$$|a\Delta x^2+2b\Delta x\Delta y+c\Delta y^2| \le (|a|+2|b|+|c|)\max\{|\Delta x|,|\Delta y|\}^2 \le (|a|+2|b|+|c|)\,\epsilon\max\{|\Delta x|,|\Delta y|\}$$
for $\max\{|\Delta x|,|\Delta y|\}\le\epsilon$, the function is differentiable, with
$$f'(x_0,y_0)(u,v) = (2ax_0+2by_0)u+(2bx_0+2cy_0)v.$$

Exercise 8.2 Suppose $F(\vec{x})$ is differentiable at $\vec{x}_0$, with linear approximation $\vec{a}+L(\Delta\vec{x})$. For any $\epsilon>0$, there is $\delta>0$, such that
$$\|\Delta\vec{x}\|<\delta \implies \|F(\vec{x})-\vec{a}-L(\Delta\vec{x})\| \le \epsilon\|\Delta\vec{x}\|.$$
Then $\vec{a}=F(\vec{x}_0)$, and
$$\|F(\vec{x})-F(\vec{x}_0)\| \le \|L(\Delta\vec{x})\|+\epsilon\|\Delta\vec{x}\| \le (\|L\|+\epsilon)\|\Delta\vec{x}\| < (\|L\|+\epsilon)\delta.$$
This shows that
$$\|\Delta\vec{x}\| < \min\left\{\delta,\frac{\epsilon}{\|L\|+\epsilon}\right\} \implies \|F(\vec{x})-F(\vec{x}_0)\| < \epsilon.$$
This proves that $F(\vec{x})$ is continuous at $\vec{x}_0$.

By Exercise 6.21, the norm is always continuous. Suppose $\|\vec{x}\|_2$ has linear approximation $a+l(\vec{x})$ at $\vec{0}$. Then $a=\|\vec{0}\|_2=0$, and for any $\epsilon>0$, there is $\delta>0$, such that
$$\|\vec{x}\|_2<\delta \implies \big|\|\vec{x}\|_2-l(\vec{x})\big| \le \epsilon\|\vec{x}\|_2.$$
Applying the property to $-\vec{x}$ and using the linearity of $l$, we find
$$\|\vec{x}\|_2<\delta \implies \big|\|\vec{x}\|_2+l(\vec{x})\big| \le \epsilon\|\vec{x}\|_2.$$
Therefore
$$\|\vec{x}\|_2<\delta \implies 2|l(\vec{x})| \le \big|\|\vec{x}\|_2-l(\vec{x})\big|+\big|\|\vec{x}\|_2+l(\vec{x})\big| \le 2\epsilon\|\vec{x}\|_2.$$
This further implies $|l(c\vec{x})| \le \epsilon\|c\vec{x}\|_2$, where $c\vec{x}$ can be any vector of arbitrary length. Since $\epsilon$ is arbitrary, we conclude that $l(\vec{x})=0$ for all $\vec{x}$. Thus for any $\epsilon>0$, there is $\delta>0$, such that $\|\vec{x}\|_2<\delta$ implies $\|\vec{x}\|_2 \le \epsilon\|\vec{x}\|_2$. This cannot hold for $\epsilon<1$. The contradiction shows that $\|\vec{x}\|_2$ is not differentiable at $\vec{0}$.

Exercise 8.3
Denote $L=F'(\vec{x}_0)$. Here $F(\vec{x}_0)=\vec{0}$ and $\lambda$ is continuous at $\vec{x}_0$. For any $\epsilon>0$, there is $\delta>0$, such that
$$\|\Delta\vec{x}\|<\delta \implies \|F(\vec{x})-F(\vec{x}_0)-L(\Delta\vec{x})\| = \|F(\vec{x})-L(\Delta\vec{x})\| \le \epsilon\|\Delta\vec{x}\|,\quad |\lambda(\vec{x})-\lambda(\vec{x}_0)|<\epsilon.$$
Then $\|F(\vec{x})\| \le \|L(\Delta\vec{x})\|+\epsilon\|\Delta\vec{x}\| \le (\|L\|+\epsilon)\|\Delta\vec{x}\|$, and
$$\|\lambda(\vec{x})F(\vec{x})-\lambda(\vec{x}_0)L(\Delta\vec{x})\| \le \|\lambda(\vec{x})F(\vec{x})-\lambda(\vec{x}_0)F(\vec{x})\|+\|\lambda(\vec{x}_0)F(\vec{x})-\lambda(\vec{x}_0)L(\Delta\vec{x})\| \le |\lambda(\vec{x})-\lambda(\vec{x}_0)|\|F(\vec{x})\|+|\lambda(\vec{x}_0)|\|F(\vec{x})-L(\Delta\vec{x})\| \le \epsilon(\|L\|+\epsilon)\|\Delta\vec{x}\|+\epsilon|\lambda(\vec{x}_0)|\|\Delta\vec{x}\| = \epsilon(\|L\|+\epsilon+|\lambda(\vec{x}_0)|)\|\Delta\vec{x}\|.$$
This proves that $\lambda(\vec{x})F(\vec{x})$ is differentiable, with $(\lambda F)'(\vec{x}_0)=\lambda(\vec{x}_0)L=\lambda(\vec{x}_0)F'(\vec{x}_0)$.

Exercise 8.4 If $f(\vec{x})=f(\vec{x}_0)+J(\vec{x})\Delta\vec{x}$ and $J$ is continuous, then
$$f(\vec{x})=f(\vec{x}_0)+J(\vec{x}_0)\Delta\vec{x}+R(\vec{x}),\quad R(\vec{x})=(J(\vec{x})-J(\vec{x}_0))\Delta\vec{x},\quad \|R(\vec{x})\|_2 \le \|J(\vec{x})-J(\vec{x}_0)\|_2\|\Delta\vec{x}\|_2.$$
The continuity of $J$ at $\vec{x}_0$ then tells us $R(\vec{x})=o(\|\Delta\vec{x}\|)$. Since $J(\vec{x}_0)\Delta\vec{x}$ is linear in $\Delta\vec{x}$, we conclude that $f$ is differentiable at $\vec{x}_0$ and $f'(\vec{x}_0)=J(\vec{x}_0)$.

Conversely, suppose $f$ is differentiable at $\vec{x}_0$. Let
$$R(\vec{x}) = f(\vec{x})-f(\vec{x}_0)-f'(\vec{x}_0)(\Delta\vec{x}),\quad \lim_{\Delta\vec{x}\to\vec{0}}\frac{R(\vec{x})}{\|\Delta\vec{x}\|_2}=0.$$
Thus
$$J(\vec{x}) = f'(\vec{x}_0)+\frac{R(\vec{x})\,\Delta\vec{x}^T}{\|\Delta\vec{x}\|_2^2}$$
is continuous at $\vec{x}_0$, because $\left\|\frac{R(\vec{x})\,\Delta\vec{x}^T}{\|\Delta\vec{x}\|_2^2}\right\|_2 \le \frac{\|R(\vec{x})\|_2}{\|\Delta\vec{x}\|_2}\to0$, and
$$f(\vec{x}_0)+J(\vec{x})\Delta\vec{x} = f(\vec{x}_0)+f'(\vec{x}_0)\Delta\vec{x}+\frac{R(\vec{x})\,\Delta\vec{x}^T\Delta\vec{x}}{\|\Delta\vec{x}\|_2^2} = f(\vec{x}_0)+f'(\vec{x}_0)\Delta\vec{x}+R(\vec{x}) = f(\vec{x}).$$

Exercise 8.5 Suppose $f(c\vec{x})=c^pf(\vec{x})$ for $c>0$. Suppose $f$ is differentiable at $\vec{0}$. Then by restricting the definition to the straight line $t\vec{x}$ for fixed $\vec{x}$ and changing $t$, $f(t\vec{x})$ is differentiable at $t=0$. This implies $f(t\vec{x})=t^pf(\vec{x})$ is differentiable at $t=0^+$. Thus we conclude that either $f(\vec{x})=0$ for all $\vec{x}$, or $p\ge1$. The continuity at $\vec{0}$ then implies that $f(\vec{0})=0$.

Suppose $p>1$. Then by $\vec{x}=r\vec{u}$, $r=\|\vec{x}\|$, $\|\vec{u}\|=1$, we have $f(\vec{x})=r^pf(\vec{u})$. If $|f(\vec{u})|<B$ is bounded on the unit sphere (i.e., for $\|\vec{u}\|=1$), then $\|\vec{x}\|<\delta$ implies that $|f(\vec{x})| \le \delta^{p-1}B\|\vec{x}\|$,
so that $f$ is differentiable at $\vec{0}$ with $f'(\vec{0})=0$. If $f(\vec{u})$ is unbounded on the unit sphere, then no matter how small $r$ is, we can always find $\vec{u}$ such that $f(\vec{x})=r^pf(\vec{u})$ is arbitrarily large. Thus the function is not continuous at $\vec{0}$. Consequently, the function is not differentiable at $\vec{0}$.

Suppose $p=1$ and $f'(\vec{0})=l$. Then $g(\vec{x})=f(\vec{x})-l(\vec{x})$ satisfies $g(c\vec{x})=cg(\vec{x})$ for $c>0$ and $g'(\vec{0})=0$. Thus for any $\epsilon>0$, there is $\delta>0$, such that $\|\vec{x}\|<\delta$ implies $|g(\vec{x})|\le\epsilon\|\vec{x}\|$. This further implies that $|g(c\vec{x})|=c|g(\vec{x})|\le\epsilon\|c\vec{x}\|$ for any $c>0$. Since $c\vec{x}$ can be any nonzero vector of arbitrary length, we conclude that $|g(\vec{x})|\le\epsilon\|\vec{x}\|$ for all $\vec{x}$. Since $\epsilon$ is arbitrary, we have $g(\vec{x})=0$ for all $\vec{x}$. Thus $f(\vec{x})=l(\vec{x})$ is a linear functional.

Exercise 8.6 We have
$$A\vec{x}\cdot\vec{x}-A\vec{x}_0\cdot\vec{x}_0 = A\Delta\vec{x}\cdot\vec{x}_0+A\vec{x}_0\cdot\Delta\vec{x}+A\Delta\vec{x}\cdot\Delta\vec{x}.$$
Since $A\Delta\vec{x}\cdot\vec{x}_0+A\vec{x}_0\cdot\Delta\vec{x} = (A+A^T)\vec{x}_0\cdot\Delta\vec{x}$ is a linear functional of $\Delta\vec{x}$, and
$$|A\Delta\vec{x}\cdot\Delta\vec{x}| \le \|A\Delta\vec{x}\|_2\|\Delta\vec{x}\|_2 \le \|A\|\|\Delta\vec{x}\|_2^2 \le \epsilon\|A\|\|\Delta\vec{x}\|_2 \quad\text{for } \|\Delta\vec{x}\|_2\le\epsilon,$$
we conclude that $A\vec{x}\cdot\vec{x}$ is differentiable, with the derivative at $\vec{x}_0$ being $(A+A^T)\vec{x}_0\cdot\vec{v}$.

Exercise 8.7 We have
$$B(\vec{x}_0+\Delta\vec{x},\vec{y}_0+\Delta\vec{y}) = B(\vec{x}_0,\vec{y}_0)+B(\Delta\vec{x},\vec{y}_0)+B(\vec{x}_0,\Delta\vec{y})+B(\Delta\vec{x},\Delta\vec{y}).$$
The term $B(\vec{x}_0,\vec{y}_0)$ is constant. The terms $B(\Delta\vec{x},\vec{y}_0)$ and $B(\vec{x}_0,\Delta\vec{y})$ are linear in $(\Delta\vec{x},\Delta\vec{y})$. The term $B(\Delta\vec{x},\Delta\vec{y})$ satisfies
$$\|B(\Delta\vec{x},\Delta\vec{y})\| \le \|B\|\|\Delta\vec{x}\|\|\Delta\vec{y}\| \le \|B\|\|(\Delta\vec{x},\Delta\vec{y})\|^2,$$
where $\|(\Delta\vec{x},\Delta\vec{y})\|=\max\{\|\Delta\vec{x}\|,\|\Delta\vec{y}\|\}$. Therefore the first order derivative is given by the two linear terms
$$B'(\vec{x}_0,\vec{y}_0)(\Delta\vec{x},\Delta\vec{y}) = B(\Delta\vec{x},\vec{y}_0)+B(\vec{x}_0,\Delta\vec{y}).$$

Exercise 8.8 Let
$$\varphi(t)=\varphi(t_0)+\varphi'(t_0)\Delta t+R_1,\quad \psi(t)=\psi(t_0)+\psi'(t_0)\Delta t+R_2,\quad R_1=o(\Delta t),\ R_2=o(\Delta t).$$
Then
$$B(\varphi(t),\psi(t)) = B(\varphi(t_0)+\varphi'(t_0)\Delta t+R_1,\ \psi(t_0)+\psi'(t_0)\Delta t+R_2) = B(\varphi(t_0),\psi(t_0))+[B(\varphi'(t_0),\psi(t_0))+B(\varphi(t_0),\psi'(t_0))]\Delta t+R,$$
$$R = B(\varphi(t_0),R_2)+B(R_1,\psi(t_0))+[B(\varphi'(t_0),R_2)+B(R_1,\psi'(t_0))]\Delta t+B(\varphi'(t_0),\psi'(t_0))\Delta t^2+B(R_1,R_2).$$
By $\|B(\vec{u},\vec{v})\|\le\|B\|\|\vec{u}\|\|\vec{v}\|$, we have
$$\|B(\varphi(t_0),R_2)\| \le \|B\|\|\varphi(t_0)\|\|R_2\|,\quad \|B(\varphi'(t_0),R_2)\| \le \|B\|\|\varphi'(t_0)\|\|R_2\|,$$
$$\|B(\varphi'(t_0),\psi'(t_0))\| \le \|B\|\|\varphi'(t_0)\|\|\psi'(t_0)\|,\quad \|B(R_1,R_2)\| \le \|B\|\|R_1\|\|R_2\|.$$
Then by $R_1=o(\Delta t)$ and $R_2=o(\Delta t)$, we get
$$B(\varphi(t_0),R_2)=o(\Delta t),\quad B(\varphi'(t_0),R_2)\Delta t=o(\Delta t),\quad B(\varphi'(t_0),\psi'(t_0))\Delta t^2=o(\Delta t),\quad B(R_1,R_2)=o(\Delta t),$$
and similar estimations for the other terms in $R$. We conclude that $R=o(\Delta t)$, so that $B(\varphi(t_0),\psi(t_0))+[B(\varphi'(t_0),\psi(t_0))+B(\varphi(t_0),\psi'(t_0))]\Delta t$ is the linear approximation of $B(\varphi(t),\psi(t))$ at $t_0$. In particular, we get
$$[B(\varphi(t),\psi(t))]'(t_0) = B(\varphi'(t_0),\psi(t_0))+B(\varphi(t_0),\psi'(t_0)).$$

Exercise 8.9 By $b$ linear in the first variable, we have
$$\frac{d}{dt}\Big|_{t=t_0}b(\varphi(t),\vec{w}) = \lim_{t\to t_0}\frac{b(\varphi(t),\vec{w})-b(\varphi(t_0),\vec{w})}{t-t_0} = \lim_{t\to t_0}b\left(\frac{\varphi(t)-\varphi(t_0)}{t-t_0},\vec{w}\right).$$
Since $b$ is a dual pairing, by Exercise 7.35, the right side equals $b(\vec{v},\vec{w})$ for all $\vec{w}$ if and only if
$$\varphi'(t_0) = \lim_{t\to t_0}\frac{\varphi(t)-\varphi(t_0)}{t-t_0} = \vec{v}.$$

Exercise 8.10 Let $H=X-A$ (this is $\Delta X$). Then
$$F(X)-F(A) = (A+H)^T(A+H)-A^TA = A^TH+H^TA+H^TH.$$
Here $A^TH+H^TA$ is linear in $H$, and $\|H^TH\| \le \|H^T\|\|H\| \le c\|H\|^2 \le c\epsilon\|H\|$ for $\|H\|\le\epsilon$ ($\|H^T\|$ is also a norm of $H$, so that $\|H^T\|\le c\|H\|$). Thus $F$ is differentiable, with $F'(A)(H)=A^TH+H^TA$.

Exercise 8.11 Suppose $f(X)$ is a function of matrices that is multilinear with respect to the columns of $X$. Then we have
$$f(A+H) = f(\vec{a}_1+\vec{h}_1,\vec{a}_2+\vec{h}_2,\ldots,\vec{a}_n+\vec{h}_n) = f(\vec{a}_1,\vec{a}_2,\ldots,\vec{a}_n)+\sum_{1\le i\le n}f(\vec{a}_1,\ldots,\vec{h}_i,\ldots,\vec{a}_n)+\sum_{1\le i<j\le n}f(\vec{a}_1,\ldots,\vec{h}_i,\ldots,\vec{h}_j,\ldots,\vec{a}_n)+\cdots+f(\vec{h}_1,\vec{h}_2,\ldots,\vec{h}_n).$$
The first order part gives us the derivative of $f$ at $A$:
$$f'(A)(H) = \sum_{1\le i\le n}f(\vec{a}_1,\ldots,\vec{h}_i,\ldots,\vec{a}_n).$$
In case $f=\det$ and the rank of $\vec{a}_1,\vec{a}_2,\ldots,\vec{a}_n$ is $\le n-2$, replacing one vector gives $\vec{a}_1,\ldots,\vec{h}_i,\ldots,\vec{a}_n$, with rank at most $n-1$. Therefore all the determinants vanish, and $\det{}'(A)(H)=0$.

Exercise 8.12 We have
$$(A+H)^k-A^k = A^{k-1}H+A^{k-2}HA+\cdots+AHA^{k-2}+HA^{k-1}+R,$$
where $A^{k-1}H+A^{k-2}HA+\cdots+AHA^{k-2}+HA^{k-1}$ is linear in $H$, and $R$ is a finite sum of products of $i$ copies of $H$ and $k-i$ copies of $A$ in all possible orders, with $i\ge2$. Because $i\ge2$, for $\|H\|\le\epsilon\le1$ the norm of each such product is
$$\le \|A\|^{k-i}\|H\|^i \le \|A\|^{k-i}\epsilon\|H\|.$$
Thus $A^{k-1}H+A^{k-2}HA+\cdots+AHA^{k-2}+HA^{k-1}$ is the derivative of $X^k$ at $A$.

Exercise 8.13 For $F(X)=X^{-1}$, we have
$$F(I+H) = I-H+H^2-H^3+\cdots = I-H+H^2(I-H+H^2-H^3+\cdots) = F(I)-H+H^2(I+H)^{-1}.$$
By Exercise 7.17, for $\|H\|<1$ we also have
$$\|(I+H)^{-1}\| \le \frac{1}{1-\|H\|},$$
so that
$$\|H^2(I+H)^{-1}\| \le \|H\|^2\|(I+H)^{-1}\| \le \frac{\|H\|^2}{1-\|H\|}.$$
Thus $\|H\|\le\dfrac{\epsilon}{1+\epsilon}$ implies $\|H^2(I+H)^{-1}\| \le \epsilon\|H\|$. This shows that the linear map $H\mapsto-H$ is the derivative of $F(X)$ at $I$.

Exercise 8.14 (1) We have $f(x,0)=(x^2)^p\sin\frac{1}{x^2}$. Thus $f_x(0,0)$ exists if and only if $2p>1$. Moreover, for $p>\frac12$, we have $f_x(0,0)=f_y(0,0)=0$. If the function is differentiable at $(0,0)$, then $p>\frac12$ and the linear approximation is $0$. This means that for any $\epsilon>0$, there is $\delta>0$, such that $\sqrt{x^2+y^2}<\delta$ implies
$$(x^2+y^2)^p\left|\sin\frac{1}{x^2+y^2}\right| \le \epsilon\sqrt{x^2+y^2}.$$
Letting $r=\sqrt{x^2+y^2}$, we find that $0<r<\delta$ implies $r^{2p}\left|\sin\frac{1}{r^2}\right|\le\epsilon r$. Since $p>\frac12$, this always happens. Thus $f$ is differentiable at $(0,0)$ if and only if $p>\frac12$.
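A quick numerical sanity check of the formula in Exercise 8.12, for the case $k=3$ and $2\times2$ matrices; the sample matrices $A$ and $H$ are arbitrary choices, not from the text:

```python
# Check: the derivative of X -> X^3 at A sends H to A^2 H + A H A + H A^2,
# so ((A + tH)^3 - A^3)/t should approach that expression as t -> 0.
def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(X, Y, s=1.0):
    return [[X[i][j] + s*Y[i][j] for j in range(2)] for i in range(2)]

def cube(X):
    return mul(mul(X, X), X)

A = [[1.0, 2.0], [0.5, -1.0]]   # arbitrary sample matrix
H = [[0.3, -0.2], [0.1, 0.4]]   # arbitrary direction
t = 1e-6
tH = [[t*H[i][j] for j in range(2)] for i in range(2)]
diff = add(cube(add(A, tH)), cube(A), -1.0)   # (A + tH)^3 - A^3
deriv = add(add(mul(mul(A, A), H), mul(mul(A, H), A)), mul(mul(H, A), A))
err = max(abs(diff[i][j]/t - deriv[i][j]) for i in range(2) for j in range(2))
print(err < 1e-4)
```

The quadratic and cubic terms in $H$ contribute only $O(t)$ to the difference quotient, which is why the error tolerance can be taken so small.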
Exercise 8.14 (2) The function is continuous at $(0,0)$ if and only if $p>0$ or $q>0$. When $p>0$, we have $f(0,y)=0$ for all $y$, and the partial derivative $f_y(0,0)=0$ exists. When $p=0$, we have
$$f(0,y)=\begin{cases} |y|^q\sin\dfrac{1}{y^2} & \text{if } y\ne0,\\ 0 & \text{if } y=0,\end{cases}$$
and $f_y(0,0)$ exists (and must be zero) if and only if $q>1$. Similarly, $f_x(0,0)$ exists if and only if $q>0$, or $p>1$, $q=0$.

Suppose $p>0$ and $q>0$. If the function is differentiable at $(0,0)$, then by the computation of the partial derivatives, the linear approximation must be $0$. Therefore we need
$$\lim_{(x,y)\to(0,0)}\frac{|x|^p|y|^q}{\max\{|x|,|y|\}}\sin\frac{1}{x^2+y^2}=0.$$
By restricting the limit to the line $x=y$, we get $p+q>1$. Now assume $p>0$, $q>0$ and $p+q>1$. Then
$$|f(x,y)| \le \|(x,y)\|^{p+q} \le \epsilon\|(x,y)\| \quad\text{when } \|(x,y)\|<\epsilon^{\frac{1}{p+q-1}}.$$
Therefore the function is differentiable at $(0,0)$ with $0$ as the linear approximation.

Exercise 8.14 (3) By $f(x,0)=f(0,y)=0$, the function has partial derivatives $f_x(0,0)=f_y(0,0)=0$. If the function is differentiable at $(0,0)$, then the linear approximation is $0$. This means that
$$\lim_{x,y\to0^+}\frac{x^py^q}{(x^m+y^n)^k(x+y)}=0.$$
When restricted to $x^m=y^n$, we have
$$\frac{x^{p+q\frac{m}{n}}}{2^{k+1}x^{mk}x^{\frac{1}{n}\min\{m,n\}}} \le \frac{x^py^q}{(x^m+y^n)^k(x+y)} = \frac{x^{p+q\frac{m}{n}}}{2^kx^{mk}(x+x^{\frac{m}{n}})} \le \frac{x^{p+q\frac{m}{n}}}{2^kx^{mk}x^{\frac{1}{n}\min\{m,n\}}}.$$
Therefore the restriction converges to $0$ if and only if $p+q\frac{m}{n} > mk+\frac{1}{n}\min\{m,n\}$. This is the same as
$$\frac{p}{m}+\frac{q}{n} > k+\frac{\min\{m,n\}}{mn}.$$
When restricted to $x=y$, we have
$$\frac{x^{p+q}}{2^{k+1}x^{\min\{m,n\}k}x} \le \frac{x^py^q}{(x^m+y^n)^k(x+y)} = \frac{x^{p+q}}{2(x^m+x^n)^kx} \le \frac{x^{p+q}}{2x^{\min\{m,n\}k}x}.$$
Therefore the restriction converges to $0$ if and only if $p+q>\min\{m,n\}k+1$.

Conversely, assume the two inequalities hold. Without loss of generality, assume $m\ge n$. Then the two inequalities mean that
$$\frac{p}{m}+\frac{q}{n} > k+\frac{1}{m},\quad p+q > nk+1.$$
Then we consider three regions. For $y\le x^{\frac{m}{n}}$, we have
$$\frac{x^py^q}{(x^m+y^n)^k(x+y)} \le \frac{x^py^q}{x^{mk}x} \le x^{p+q\frac{m}{n}-mk-1}.$$
By $p+q\frac{m}{n}-mk-1 = m\left(\frac{p}{m}+\frac{q}{n}-k-\frac{1}{m}\right) > 0$, the restriction of the limit to the region converges to $0$.

For $x^{\frac{m}{n}}\le y\le x$, we have
$$\frac{x^py^q}{2^ky^{nk}\cdot2x} \le \frac{x^py^q}{(x^m+y^n)^k(x+y)} \le \frac{x^py^q}{y^{nk}x}.$$
Therefore $\lim_{x^{\frac{m}{n}}\le y\le x;\,x,y\to0^+}\frac{x^py^q}{(x^m+y^n)^k(x+y)}=0$ if and only if $\lim_{x^{\frac{m}{n}}\le y\le x;\,x,y\to0^+}x^{p-1}y^{q-nk}=0$. For the given $x$, the maximum and the minimum of $x^{p-1}y^{q-nk}$ for $y$ in the range are attained at the endpoints, giving $x^{p-1}x^{q-nk}=x^{p+q-nk-1}$ and $x^{p-1}x^{\frac{m}{n}(q-nk)}=x^{m\left(\frac{p}{m}+\frac{q}{n}-k\right)-1}$. Therefore $\lim_{x^{\frac{m}{n}}\le y\le x;\,x,y\to0^+}x^{p-1}y^{q-nk}=0$ if and only if $\lim_{x\to0^+}x^{p+q-nk-1}=0$ and $\lim_{x\to0^+}x^{m\left(\frac{p}{m}+\frac{q}{n}-k\right)-1}=0$. This is true, given the two inequalities.

For $y\ge x$, we have
$$\frac{x^py^q}{(x^m+y^n)^k(x+y)} \le \frac{y^{p+q}}{y^{nk}y} = y^{p+q-nk-1}.$$
Since $p+q>nk+1$, the restriction of the limit to the region converges to $0$.

We conclude that the function is differentiable at $(0,0)$ if and only if
$$\frac{p}{m}+\frac{q}{n} > k+\frac{\min\{m,n\}}{mn},\quad p+q > \min\{m,n\}k+1.$$

Exercise 8.14 (4) By $f(x,0)=x^{pr-mk}$ and $f(0,y)=y^{qr-nk}$, the function has partial derivatives if and only if $pr-mk>1$ and $qr-nk>1$. Moreover, we have $f_x(0,0)=f_y(0,0)=0$ when the condition is satisfied. If the function is differentiable at $(0,0)$, then $pr-mk>1$ and $qr-nk>1$, the linear approximation is $0$, and we have
$$\lim_{x,y\to0^+}\frac{(x^p+y^q)^r}{(x^m+y^n)^k(x+y)}=0.$$
Denote
$$\lambda=\min\left\{\frac{p}{m},\frac{q}{n}\right\},\quad \mu=\min\{m,n\},\quad \nu=\min\{p,q\}.$$
As in Exercise 6.23(2), restricting the limit to $x^m=y^n$, we have
$$x^{\lambda mr} \le (x^p+y^q)^r \le 2^rx^{\lambda mr},\quad (x^m+y^n)^k = 2^kx^{mk},\quad x^{\frac{\mu}{n}} \le x+y \le 2x^{\frac{\mu}{n}}.$$
Therefore the restriction has limit $0$ if and only if $\lim_{x\to0^+}\dfrac{x^{\lambda mr}}{x^{mk}x^{\frac{\mu}{n}}}=0$. This means $\lambda mr>mk+\frac{\mu}{n}$, which is the same as
$$r\min\left\{\frac{p}{m},\frac{q}{n}\right\} > k+\frac{\min\{m,n\}}{mn}.$$
On the other hand, restricting the limit to $x=y$, we have
$$x^{\nu r} \le (x^p+y^q)^r \le 2^rx^{\nu r},\quad x^{\mu k} \le (x^m+y^n)^k \le 2^kx^{\mu k},\quad x+y=2x.$$
Therefore the restriction has limit $0$ if and only if $\lim_{x\to0^+}\dfrac{x^{\nu r}}{x^{\mu k}x}=0$. This means $\nu r>\mu k+1$, which is the same as
$$r\min\{p,q\} > k\min\{m,n\}+1.$$
Conversely, assume the two inequalities hold. Without loss of generality, assume $m\ge n$. Then the two inequalities mean that
$$r\lambda = r\min\left\{\frac{p}{m},\frac{q}{n}\right\} > k+\frac{1}{m},\quad r\nu = r\min\{p,q\} > kn+1.$$
Then we consider three regions. For $y\le x^{\frac{m}{n}}$, we have
$$\frac{(x^p+y^q)^r}{(x^m+y^n)^k(x+y)} \le \frac{(x^p+x^{q\frac{m}{n}})^r}{x^{mk}x} \le \frac{2^rx^{\lambda mr}}{x^{mk}x}.$$
Since $\lambda mr-mk-1 = m\left(r\lambda-k-\frac{1}{m}\right) > 0$, the restriction of the limit to the region converges to $0$.

For $y\ge x$, we have
$$\frac{(x^p+y^q)^r}{(x^m+y^n)^k(x+y)} \le \frac{(y^p+y^q)^r}{y^{nk}y} \le \frac{2^ry^{\nu r}}{y^{nk}y}.$$
Since $\nu r-nk-1>0$, the restriction of the limit to the region converges to $0$.

For $x^{\frac{m}{n}}\le y\le x$, we have
$$y^{nk}x \le (x^m+y^n)^k(x+y) \le 2^ky^{nk}\cdot2x.$$
Moreover, we have
$$\frac12(u^r+v^r) \le \max\{u^r,v^r\} \le (u+v)^r \le (2\max\{u,v\})^r = 2^r(\max\{u,v\})^r \le 2^r(u^r+v^r).$$
By substituting $u=x^p$ and $v=y^q$, we see that the restriction of the limit to the region converges to $0$ if and only if the limit of $\dfrac{x^{pr}+y^{qr}}{y^{nk}x}$ converges to $0$, which is the same as the limits of $\dfrac{x^{pr}}{y^{nk}x}$ and $\dfrac{y^{qr}}{y^{nk}x}$ converging to $0$. For fixed $x$, the maxima and minima of $\dfrac{x^{pr}}{y^{nk}x}$ and $\dfrac{y^{qr}}{y^{nk}x}$ for $y$ in the range are attained at the endpoints, giving $x^{pr-nk-1}$, $x^{pr-mk-1}$, $x^{qr-nk-1}$, $x^{qr\frac{m}{n}-mk-1}$. Therefore the limit converges to zero if and only if
$$pr-nk-1>0,\quad pr-mk-1>0,\quad qr-nk-1>0,\quad qr\frac{m}{n}-mk-1>0.$$
Since the two inequalities imply all four inequalities above, we conclude that the two inequalities are necessary and sufficient. In conclusion, the function is differentiable at $(0,0)$ if and only if
$$r\min\left\{\frac{p}{m},\frac{q}{n}\right\} > k+\frac{\min\{m,n\}}{mn},\quad r\min\{p,q\} > k\min\{m,n\}+1.$$

Exercise 8.15 In Example 8.1.3, we get
$$\|\vec{x}_0+\Delta\vec{x}\|_2^2 = \|\vec{x}_0\|_2^2+2\vec{x}_0\cdot\Delta\vec{x}+\|\Delta\vec{x}\|_2^2.$$
So the gradient should give the linear functional $2\vec{x}_0\cdot\Delta\vec{x}$. This means that $\nabla(\vec{x}\cdot\vec{x})=2\vec{x}$.

Exercise 8.16 By the computation in Exercise 8.11, for a $2\times2$ matrix, we have
$$\det{}'(A)(H) = \det(\vec{a}_1,\vec{h}_2)+\det(\vec{h}_1,\vec{a}_2) = a_{11}h_{22}-a_{21}h_{12}+a_{22}h_{11}-a_{12}h_{21} = \begin{pmatrix} a_{22} & -a_{21}\\ -a_{12} & a_{11}\end{pmatrix}\cdot\begin{pmatrix} h_{11} & h_{12}\\ h_{21} & h_{22}\end{pmatrix}.$$
Here $\cdot$ is the dot product in $\mathbb{R}^4$. Therefore the gradient is
$$\nabla\det = \begin{pmatrix} a_{22} & -a_{21}\\ -a_{12} & a_{11}\end{pmatrix}.$$

Exercise 8.17 For $X=\begin{pmatrix} x & y\\ z & w\end{pmatrix}$, we have $X^2=\begin{pmatrix} x^2+yz & xy+yw\\ xz+zw & yz+w^2\end{pmatrix}$. If we identify the $2\times2$ matrix $X$ with $(x,y,z,w)^T\in\mathbb{R}^4$, then the Jacobian matrix, whose rows are the partial derivatives of $x^2+yz$, $xy+yw$, $xz+zw$, $yz+w^2$ with respect to $x$, $y$, $z$, $w$, is
$$(X^2)' = \begin{pmatrix} 2x & z & y & 0\\ y & x+w & 0 & y\\ z & 0 & x+w & z\\ 0 & z & y & 2w\end{pmatrix}.$$
Applying the matrix to the matrix $H=\begin{pmatrix} a & b\\ c & d\end{pmatrix}$, which corresponds to the vector $(a,b,c,d)^T\in\mathbb{R}^4$, we get
$$(X^2)'(H) = \begin{pmatrix} 2x & z & y & 0\\ y & x+w & 0 & y\\ z & 0 & x+w & z\\ 0 & z & y & 2w\end{pmatrix}\begin{pmatrix} a\\ b\\ c\\ d\end{pmatrix} = \begin{pmatrix} 2xa+zb+yc\\ ya+(x+w)b+yd\\ za+(x+w)c+zd\\ zb+yc+2wd\end{pmatrix}.$$
Translated back into a matrix, this means
$$(X^2)'(H) = \begin{pmatrix} 2xa+zb+yc & ya+(x+w)b+yd\\ za+(x+w)c+zd & zb+yc+2wd\end{pmatrix} = \begin{pmatrix} x & y\\ z & w\end{pmatrix}\begin{pmatrix} a & b\\ c & d\end{pmatrix}+\begin{pmatrix} a & b\\ c & d\end{pmatrix}\begin{pmatrix} x & y\\ z & w\end{pmatrix} = XH+HX.$$
This recovers the computation in Example 8.1.5.

Exercise 8.18 For $X=\begin{pmatrix} x & y\\ z & w\end{pmatrix}$, we have $X^{-1}=\dfrac{1}{xw-yz}\begin{pmatrix} w & -y\\ -z & x\end{pmatrix}$. If we identify the $2\times2$ matrix $X$ with $(x,y,z,w)^T\in\mathbb{R}^4$, then the Jacobian matrix is
$$(X^{-1})' = \frac{1}{xw-yz}\begin{pmatrix} 0 & 0 & 0 & 1\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 1 & 0 & 0 & 0\end{pmatrix}-\frac{1}{(xw-yz)^2}\begin{pmatrix} w\\ -y\\ -z\\ x\end{pmatrix}\begin{pmatrix} w & -z & -y & x\end{pmatrix}$$
$$= \frac{1}{xw-yz}\begin{pmatrix} 0 & 0 & 0 & 1\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 1 & 0 & 0 & 0\end{pmatrix}-\frac{1}{(xw-yz)^2}\begin{pmatrix} w^2 & -zw & -yw & xw\\ -yw & yz & y^2 & -xy\\ -zw & z^2 & yz & -xz\\ xw & -xz & -xy & x^2\end{pmatrix}.$$
Then at the identity matrix ($x=w=1$, $y=z=0$), we have
$$(X^{-1})'\ \text{at}\ I = \begin{pmatrix} 0 & 0 & 0 & 1\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 1 & 0 & 0 & 0\end{pmatrix}-\begin{pmatrix} 1 & 0 & 0 & 1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 1\end{pmatrix} = \begin{pmatrix} -1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1\end{pmatrix} = -\,\text{identity}.$$
This recovers the computation in Exercise 8.13.

Exercise 8.19
Let $\nabla f=(a,b,c)$. Then the given directional derivatives tell us
$$\frac13 = \nabla f\cdot\frac{(1,2,2)}{\|(1,2,2)\|_2} = \frac13(a+2b+2c),\quad \sqrt2 = \nabla f\cdot\frac{(0,1,-1)}{\|(0,1,-1)\|_2} = \frac{1}{\sqrt2}(b-c),\quad 3 = \nabla f\cdot\frac{(0,0,1)}{\|(0,0,1)\|_2} = c.$$
Solving the system, we get $\nabla f=(-15,5,3)$.

Exercise 8.20 Let $\nabla f=a_1\vec{u}_1+a_2\vec{u}_2+\cdots+a_n\vec{u}_n$. Then by the orthonormal property, we have
$$D_{\vec{u}_1}f = \nabla f\cdot\vec{u}_1 = a_1\vec{u}_1\cdot\vec{u}_1+a_2\vec{u}_2\cdot\vec{u}_1+\cdots+a_n\vec{u}_n\cdot\vec{u}_1 = a_1\cdot1+a_2\cdot0+\cdots+a_n\cdot0 = a_1.$$
The other coefficients can be similarly derived.

Exercise 8.21 (1) We have
$$r_x = \frac{x}{\sqrt{x^2+y^2}},\quad r_y = \frac{y}{\sqrt{x^2+y^2}},\quad \theta_x = \frac{-\frac{y}{x^2}}{1+\frac{y^2}{x^2}} = \frac{-y}{x^2+y^2},\quad \theta_y = \frac{\frac1x}{1+\frac{y^2}{x^2}} = \frac{x}{x^2+y^2}.$$
The Jacobian matrix is
$$\frac{\partial(r,\theta)}{\partial(x,y)} = \begin{pmatrix} \dfrac{x}{\sqrt{x^2+y^2}} & \dfrac{y}{\sqrt{x^2+y^2}}\\ \dfrac{-y}{x^2+y^2} & \dfrac{x}{x^2+y^2}\end{pmatrix}.$$
The differential is
$$\begin{pmatrix} dr\\ d\theta\end{pmatrix} = \begin{pmatrix} \dfrac{x\,dx+y\,dy}{\sqrt{x^2+y^2}}\\ \dfrac{-y\,dx+x\,dy}{x^2+y^2}\end{pmatrix}.$$

Exercise 8.21 (2) The Jacobian matrix is
$$\frac{\partial(u_1,u_2,u_3)}{\partial(x_1,x_2,x_3)} = \begin{pmatrix} 1 & 1 & 1\\ x_2+x_3 & x_3+x_1 & x_1+x_2\\ x_2x_3 & x_3x_1 & x_1x_2\end{pmatrix}.$$
The differential is
$$\begin{pmatrix} du_1\\ du_2\\ du_3\end{pmatrix} = \begin{pmatrix} dx_1+dx_2+dx_3\\ (x_2+x_3)dx_1+(x_3+x_1)dx_2+(x_1+x_2)dx_3\\ x_2x_3\,dx_1+x_3x_1\,dx_2+x_1x_2\,dx_3\end{pmatrix}.$$

Exercise 8.21 (3) The Jacobian matrix is
$$\frac{\partial(x,y,z)}{\partial(r,\varphi,\theta)} = \begin{pmatrix} \sin\varphi\cos\theta & r\cos\varphi\cos\theta & -r\sin\varphi\sin\theta\\ \sin\varphi\sin\theta & r\cos\varphi\sin\theta & r\sin\varphi\cos\theta\\ \cos\varphi & -r\sin\varphi & 0\end{pmatrix}.$$
The differential is
$$\begin{pmatrix} dx\\ dy\\ dz\end{pmatrix} = \begin{pmatrix} \sin\varphi\cos\theta\,dr+r\cos\varphi\cos\theta\,d\varphi-r\sin\varphi\sin\theta\,d\theta\\ \sin\varphi\sin\theta\,dr+r\cos\varphi\sin\theta\,d\varphi+r\sin\varphi\cos\theta\,d\theta\\ \cos\varphi\,dr-r\sin\varphi\,d\varphi\end{pmatrix}.$$

Exercise 8.22 Suppose $|F(\vec{x})|<B$. Then $\|\vec{x}\|_2<\delta$ implies
$$\big|\|\vec{x}\|_2^2F(\vec{x})\big| = \|\vec{x}\|_2^2|F(\vec{x})| < \delta B\|\vec{x}\|_2.$$
This shows that $\|\vec{x}\|_2^2F(\vec{x})$ is differentiable at $\vec{0}$, with derivative $0$. On the other hand, if we choose $f(\vec{x})=k$ when exactly $k$ coordinates are rational and $n-k$ coordinates are irrational, then $\|\vec{x}\|_2^2f(\vec{x})$ is not continuous along any coordinate direction at any point other than $\vec{0}$. Thus the function has no partial derivatives away from $\vec{0}$.
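The partial derivatives of the polar coordinates in Exercise 8.21(1) can be confirmed numerically; the point $(x,y)$ below is an arbitrary sample:

```python
# Check: r_x = x / sqrt(x^2 + y^2) and theta_x = -y / (x^2 + y^2),
# using one-sided finite differences in x.
import math

x, y = 1.2, 0.7            # arbitrary sample point away from the origin
h = 1e-7
r = lambda x, y: math.hypot(x, y)
theta = lambda x, y: math.atan2(y, x)
rx = (r(x + h, y) - r(x, y)) / h
thetax = (theta(x + h, y) - theta(x, y)) / h
print(abs(rx - x / math.hypot(x, y)) < 1e-5)
print(abs(thetax + y / (x*x + y*y)) < 1e-5)
```

The same kind of check works for $r_y$ and $\theta_y$ by differencing in $y$ instead.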
Exercise 8.23 The function
$$f(x,y) = \begin{cases} (x^2+y^2)\sin\dfrac{1}{x^2+y^2} & \text{if } (x,y)\ne(0,0),\\ 0 & \text{if } (x,y)=(0,0),\end{cases}$$
in Exercise 8.14 is differentiable everywhere. But $f(x,0)=x^2\sin\frac{1}{x^2}$, and
$$f_x(x,0) = 2x\sin\frac{1}{x^2}-\frac{2}{x}\cos\frac{1}{x^2}$$
is not continuous at $x=0$. The other partial derivative is also not continuous.

Exercise 8.24 By the existence of $f_x(x_0,y_0)$, for any $\epsilon>0$, there is $\delta>0$, such that $|\Delta x|<\delta$ implies
$$|f(x,y_0)-f(x_0,y_0)-f_x(x_0,y_0)\Delta x| \le \epsilon|\Delta x|.$$
By the mean value theorem, we have $f(x,y)-f(x,y_0)=f_y(x,d)\Delta y$ for some $d$ between $y_0$ and $y$. Then by the continuity of $f_y$ at $(x_0,y_0)$, there is $\delta'>0$, such that $|\Delta x|<\delta'$, $|\Delta y|<\delta'$ implies $|f_y(x,d)-f_y(x_0,y_0)|<\epsilon$. Therefore
$$|f(x,y)-f(x,y_0)-f_y(x_0,y_0)\Delta y| = |(f_y(x,d)-f_y(x_0,y_0))\Delta y| \le \epsilon|\Delta y|.$$
Combining the two estimations, for $|\Delta x|<\min\{\delta,\delta'\}$, $|\Delta y|<\min\{\delta,\delta'\}$, we have
$$|f(x,y)-f(x_0,y_0)-f_x(x_0,y_0)\Delta x-f_y(x_0,y_0)\Delta y| \le |f(x,y_0)-f(x_0,y_0)-f_x(x_0,y_0)\Delta x|+|f(x,y)-f(x,y_0)-f_y(x_0,y_0)\Delta y| \le \epsilon|\Delta x|+\epsilon|\Delta y|.$$
This proves the differentiability of $f$ at $(x_0,y_0)$. In general, if one partial derivative exists and the other partial derivatives are continuous, then the function is differentiable.

Exercise 8.25 If $F(x,y)$ is differentiable, then the composition (or restriction) $F(x,x)=f'(x)$ is differentiable, which means the existence of the second order derivative. Conversely, suppose $f$ has second order derivative. By Proposition 3.4.2, we have
$$f(x) = f(x_0)+f'(x_0)\Delta x+\frac12f''(x_0)\Delta x^2+R(x),\quad \lim_{x\to x_0}\frac{R(x)}{\Delta x^2}=0.$$
We also note that $R(x)$ has second order derivative, and $R(x_0)=R'(x_0)=R''(x_0)=0$. The Taylor expansion implies
$$f(x)-f(y) = f'(x_0)(x-y)+\frac12f''(x_0)(\Delta x^2-\Delta y^2)+R(x)-R(y) = f'(x_0)(x-y)+\frac12f''(x_0)(x-y)(\Delta x+\Delta y)+R(x)-R(y).$$
Then for $x\ne y$ and $(x,y)$ near $(x_0,x_0)$, we get
$$\frac{f(x)-f(y)}{x-y} = f'(x_0)+\frac12f''(x_0)(\Delta x+\Delta y)+\frac{R(x)-R(y)}{x-y}.$$
If we can show that $\dfrac{R(x)-R(y)}{x-y}=o(\|(\Delta x,\Delta y)\|)$, then we find that $F(x,y)$ is differentiable at $(x_0,x_0)$, with
$$F'(x_0,x_0)(u,v) = \frac12f''(x_0)(u+v).$$
By the mean value theorem, we have $\dfrac{R(x)-R(y)}{x-y}=R'(c)$, where $c$ is between $x$ and $y$. Using $R'(x_0)=R''(x_0)=0$, we further have
$$R'(c) = R'(x_0)+R''(x_0)(c-x_0)+o(c-x_0) = o(c-x_0).$$
Since $c$ is between $x$ and $y$, we have $|c-x_0| \le \max\{|\Delta x|,|\Delta y|\} = \|(\Delta x,\Delta y)\|$. Therefore
$$\frac{R(x)-R(y)}{x-y} = R'(c) = o(\|(\Delta x,\Delta y)\|).$$
This completes the proof that $F$ is differentiable at $(x_0,x_0)$.

In case $x_0\ne y_0$, both $f(x)-f(y)$ and $x-y$ are differentiable at $(x_0,y_0)$, and $x-y\ne0$ near $(x_0,y_0)$. Therefore the quotient $F(x,y)$ is differentiable at $(x_0,y_0)$.

Additional: $F'$ is continuous if and only if $f''$ is continuous. We already know that, if $f$ has second order derivative, then
$$F_x(x,y) = \begin{cases} \dfrac{f'(x)(x-y)-f(x)+f(y)}{(x-y)^2} & \text{if } x\ne y,\\ \dfrac12f''(x) & \text{if } x=y,\end{cases}$$
and we have the similar formula for $F_y$. The continuity of $F'$ means the continuity of $F_x$ and $F_y$. The formula tells us that the continuity of $F_x(x,x)$ already means $f''$ is continuous.

Conversely, suppose $f''(x)$ is continuous. We want to show that $F_x$ is continuous. The continuity of $f''$ already implies the continuity of $F_x$ at $(x_0,y_0)$ whenever $x_0\ne y_0$. It remains to show the continuity at $(x_0,x_0)$. For $x\ne y$, let
$$r_1(y,x) = f(y)-f(x)-f'(x)(y-x)-\frac12f''(x_0)(y-x)^2,\quad r_2(y,x) = f'(y)-f'(x)-f''(x_0)(y-x).$$
Then, applying the Cauchy mean value theorem twice in the first variable (using $r_1(x,x)=r_2(x,x)=0$), we have
$$F_x(x,y)-\frac12f''(x_0) = \frac{f(y)-f(x)-f'(x)(y-x)-\frac12f''(x_0)(y-x)^2}{(y-x)^2} = \frac{r_1(y,x)-r_1(x,x)}{(y-x)^2-(x-x)^2} = \frac{D_1r_1(c_1,x)}{2(c_1-x)} = \frac{f'(c_1)-f'(x)-f''(x_0)(c_1-x)}{2(c_1-x)} = \frac{r_2(c_1,x)-r_2(x,x)}{2(c_1-x)} = \frac{D_1r_2(c_2,x)}{2} = \frac{f''(c_2)-f''(x_0)}{2},$$
where $c_1$ is between $x$ and $y$, and $c_2$ is between $x$ and $c_1$. Since $c_1$ and $c_2$ lie between $x$ and $y$, the continuity of $f''$ shows that the limit above, as $(x,y)\to(x_0,x_0)$, is $0$.
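The conclusion of Exercise 8.25 can be checked numerically for a sample function; here $f=\sin$ is my choice, not from the text, and the derivative formula being tested is $F'(x_0,x_0)(u,v)=\frac12f''(x_0)(u+v)$:

```python
# Check: for F(x, y) = (f(x) - f(y))/(x - y), extended by f'(x) on the
# diagonal, the derivative of F at (x0, x0) along (u, v) is f''(x0)(u + v)/2.
import math

def F(x, y):
    if x == y:
        return math.cos(x)                      # f'(x) for f = sin
    return (math.sin(x) - math.sin(y)) / (x - y)

x0 = 0.7                                        # arbitrary sample point
u, v = 0.4, -0.9                                # arbitrary direction
h = 1e-5
approx = (F(x0 + h*u, x0 + h*v) - F(x0, x0)) / h
claimed = -math.sin(x0) * (u + v) / 2           # f''(x0) = -sin(x0)
print(abs(approx - claimed) < 1e-3)
```

Since $u\ne v$, the two arguments of $F$ stay distinct, so the difference quotient branch is the one exercised.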
Exercise 8.31 We may copy the argument for Exercise 8.8 word by word. The argument using the small $o$ notation is equivalent to using Exercises 8.27 and 8.29. Let
$$F(\vec{x}) = F(\vec{x}_0)+F'(\vec{x}_0)(\Delta\vec{x})+R_1,\quad G(\vec{x}) = G(\vec{x}_0)+G'(\vec{x}_0)(\Delta\vec{x})+R_2,\quad R_1=o(\|\Delta\vec{x}\|),\ R_2=o(\|\Delta\vec{x}\|).$$
Then
$$B(F(\vec{x}),G(\vec{x})) = B(F(\vec{x}_0)+F'(\vec{x}_0)(\Delta\vec{x})+R_1,\ G(\vec{x}_0)+G'(\vec{x}_0)(\Delta\vec{x})+R_2) = B(F(\vec{x}_0),G(\vec{x}_0))+B(F'(\vec{x}_0)(\Delta\vec{x}),G(\vec{x}_0))+B(F(\vec{x}_0),G'(\vec{x}_0)(\Delta\vec{x}))+R,$$
$$R = B(F(\vec{x}_0),R_2)+B(R_1,G(\vec{x}_0))+B(F'(\vec{x}_0)(\Delta\vec{x}),R_2)+B(R_1,G'(\vec{x}_0)(\Delta\vec{x}))+B(F'(\vec{x}_0)(\Delta\vec{x}),G'(\vec{x}_0)(\Delta\vec{x}))+B(R_1,R_2).$$
By $\|B(\vec{u},\vec{v})\|\le\|B\|\|\vec{u}\|\|\vec{v}\|$, we have
$$\|B(F(\vec{x}_0),R_2)\| \le \|B\|\|F(\vec{x}_0)\|\|R_2\|,\quad \|B(R_1,G(\vec{x}_0))\| \le \|B\|\|G(\vec{x}_0)\|\|R_1\|,$$
$$\|B(F'(\vec{x}_0)(\Delta\vec{x}),R_2)\| \le \|B\|\|F'(\vec{x}_0)(\Delta\vec{x})\|\|R_2\| \le \|B\|\|F'(\vec{x}_0)\|\|\Delta\vec{x}\|\|R_2\|,$$
$$\|B(R_1,G'(\vec{x}_0)(\Delta\vec{x}))\| \le \|B\|\|G'(\vec{x}_0)(\Delta\vec{x})\|\|R_1\| \le \|B\|\|G'(\vec{x}_0)\|\|\Delta\vec{x}\|\|R_1\|,$$
$$\|B(F'(\vec{x}_0)(\Delta\vec{x}),G'(\vec{x}_0)(\Delta\vec{x}))\| \le \|B\|\|F'(\vec{x}_0)(\Delta\vec{x})\|\|G'(\vec{x}_0)(\Delta\vec{x})\| \le \|B\|\|F'(\vec{x}_0)\|\|G'(\vec{x}_0)\|\|\Delta\vec{x}\|^2,$$
$$\|B(R_1,R_2)\| \le \|B\|\|R_1\|\|R_2\|.$$
Then by $R_1=o(\|\Delta\vec{x}\|)$ and $R_2=o(\|\Delta\vec{x}\|)$, the estimations above imply that $B(F(\vec{x}_0),R_2)=o(\|\Delta\vec{x}\|)$, $B(R_1,G(\vec{x}_0))=o(\|\Delta\vec{x}\|)$, and the other four terms in $R$ are $O(\|\Delta\vec{x}\|^2)$. Therefore $R=o(\|\Delta\vec{x}\|)$, and
$$B(F(\vec{x}_0),G(\vec{x}_0))+B(F'(\vec{x}_0)(\Delta\vec{x}),G(\vec{x}_0))+B(F(\vec{x}_0),G'(\vec{x}_0)(\Delta\vec{x}))$$
is the linear approximation of $B(F(\vec{x}),G(\vec{x}))$ at $\vec{x}_0$. In particular, we get
$$B(F,G)'(\vec{x}_0)(\vec{v}) = B(F'(\vec{x}_0)(\vec{v}),G(\vec{x}_0))+B(F(\vec{x}_0),G'(\vec{x}_0)(\vec{v})).$$

[Alternative: argue as a special case of Exercise 8.28.] Let $\vec{a}=F(\vec{x}_0)$, $\vec{b}=G(\vec{x}_0)$, $L=F'(\vec{x}_0)$ and $K=G'(\vec{x}_0)$. Then
$$F(\vec{x}) \sim_{\|\Delta\vec{x}\|} \vec{a}+L(\Delta\vec{x}),\quad G(\vec{x}) \sim_{\|\Delta\vec{x}\|} \vec{b}+K(\Delta\vec{x}).$$
By Exercise 8.28, we have
$$B(F(\vec{x}),G(\vec{x})) \sim_{\|\Delta\vec{x}\|} B(\vec{a}+L(\Delta\vec{x}),\vec{b}+K(\Delta\vec{x})) = B(\vec{a},\vec{b})+B(L(\Delta\vec{x}),\vec{b})+B(\vec{a},K(\Delta\vec{x}))+B(L(\Delta\vec{x}),K(\Delta\vec{x})).$$
Since
$$\|B(L(\Delta\vec{x}),K(\Delta\vec{x}))\| \le \|B\|\|L(\Delta\vec{x})\|\|K(\Delta\vec{x})\| \le \|B\|\|L\|\|K\|\|\Delta\vec{x}\|^2,$$
we get $\dfrac{B(L(\Delta\vec{x}),K(\Delta\vec{x}))}{\|\Delta\vec{x}\|}\to0$. Then by Exercise 8.27, we get
$$B(\vec{a},\vec{b})+B(L(\Delta\vec{x}),\vec{b})+B(\vec{a},K(\Delta\vec{x}))+B(L(\Delta\vec{x}),K(\Delta\vec{x})) \sim_{\|\Delta\vec{x}\|} B(\vec{a},\vec{b})+B(L(\Delta\vec{x}),\vec{b})+B(\vec{a},K(\Delta\vec{x})).$$
Further by Exercise 8.29, we get
$$B(F(\vec{x}),G(\vec{x})) \sim_{\|\Delta\vec{x}\|} B(\vec{a},\vec{b})+B(L(\Delta\vec{x}),\vec{b})+B(\vec{a},K(\Delta\vec{x})).$$
The linear part
$$B(L(\Delta\vec{x}),\vec{b})+B(\vec{a},K(\Delta\vec{x})) = B(F'(\vec{x}_0)(\Delta\vec{x}),G(\vec{x}_0))+B(F(\vec{x}_0),G'(\vec{x}_0)(\Delta\vec{x}))$$
of the right side is then the derivative $B(F,G)'(\vec{x}_0)(\Delta\vec{x})$.

Exercise 8.32 If $G$ is a multilinear map, and $F_1,\ldots,F_k$ are differentiable maps, then
$$G(F_1,F_2,\ldots,F_k)' = G(F_1',F_2,\ldots,F_k)+G(F_1,F_2',\ldots,F_k)+\cdots+G(F_1,F_2,\ldots,F_k').$$

Exercise 8.33 (1) Here $F(\vec{x}_0)=\vec{0}$ and $G(\vec{x}_0)=\vec{0}$. We have
$$F(\vec{x}) = F(\vec{x}_0)+F'(\vec{x}_0)(\Delta\vec{x})+R_1(\Delta\vec{x}) = F'(\vec{x}_0)(\Delta\vec{x})+R_1(\Delta\vec{x}),\quad R_1(\Delta\vec{x})=o(\|\Delta\vec{x}\|),$$
and similarly
$$G(\vec{x}) = G'(\vec{x}_0)(\Delta\vec{x})+R_2(\Delta\vec{x}),\quad R_2(\Delta\vec{x})=o(\|\Delta\vec{x}\|).$$
Then
$$B(F(\vec{x}),G(\vec{x})) = B(F'(\vec{x}_0)(\Delta\vec{x}),G'(\vec{x}_0)(\Delta\vec{x}))+B(F'(\vec{x}_0)(\Delta\vec{x}),R_2(\Delta\vec{x}))+B(R_1(\Delta\vec{x}),G'(\vec{x}_0)(\Delta\vec{x}))+B(R_1(\Delta\vec{x}),R_2(\Delta\vec{x})).$$
For any $\epsilon>0$, there is $\delta>0$, such that
$$\|\Delta\vec{x}\|<\delta \implies \|R_1(\Delta\vec{x})\|<\epsilon\|\Delta\vec{x}\|,\ \|R_2(\Delta\vec{x})\|<\epsilon\|\Delta\vec{x}\|.$$
Then
$$\|B(F'(\vec{x}_0)(\Delta\vec{x}),R_2(\Delta\vec{x}))\| \le \|B\|\|F'(\vec{x}_0)(\Delta\vec{x})\|\|R_2(\Delta\vec{x})\| \le \|B\|\|F'(\vec{x}_0)\|\epsilon\|\Delta\vec{x}\|^2,$$
$$\|B(R_1(\Delta\vec{x}),G'(\vec{x}_0)(\Delta\vec{x}))\| \le \|B\|\|R_1(\Delta\vec{x})\|\|G'(\vec{x}_0)(\Delta\vec{x})\| \le \|B\|\epsilon\|G'(\vec{x}_0)\|\|\Delta\vec{x}\|^2,$$
$$\|B(R_1(\Delta\vec{x}),R_2(\Delta\vec{x}))\| \le \|B\|\|R_1(\Delta\vec{x})\|\|R_2(\Delta\vec{x})\| \le \|B\|\epsilon^2\|\Delta\vec{x}\|^2.$$
This implies
$$\|B(F(\vec{x}),G(\vec{x}))-B(F'(\vec{x}_0)(\Delta\vec{x}),G'(\vec{x}_0)(\Delta\vec{x}))\| \le \epsilon\|B\|(\|F'(\vec{x}_0)\|+\|G'(\vec{x}_0)\|+\epsilon)\|\Delta\vec{x}\|^2,$$
and proves $B(F,G) \sim_{\|\Delta\vec{x}\|^2} B(F'(\vec{x}_0)(\Delta\vec{x}),G'(\vec{x}_0)(\Delta\vec{x}))$. In general, if $G(\vec{y}_1,\ldots,\vec{y}_k)$ is multilinear, and $F_i(\vec{x}_0)=\vec{0}$, then
$$G(F_1,\ldots,F_k) \sim_{\|\Delta\vec{x}\|^k} G(F_1'(\vec{x}_0)(\Delta\vec{x}),\ldots,F_k'(\vec{x}_0)(\Delta\vec{x})).$$

Exercise 8.33 (2) We have $Q(\vec{y})=B(\vec{y},\vec{y})$ for a symmetric bilinear map $B$. Then by the first part, we have
$$Q(F) \sim_{\|\Delta\vec{x}\|^2} B(F'(\vec{x}_0)(\Delta\vec{x}),F'(\vec{x}_0)(\Delta\vec{x})) = Q(F'(\vec{x}_0)(\Delta\vec{x})).$$
In general, if $G(\vec{y})$ is of $k$-th order, and $F(\vec{x}_0)=\vec{0}$, then
$$G(F(\vec{x})) \sim_{\|\Delta\vec{x}\|^k} G(F'(\vec{x}_0)(\Delta\vec{x})).$$

Exercise 8.34 The gradient is defined by $f'(\vec{x}_0)(\vec{v})=\nabla f(\vec{x}_0)\cdot\vec{v}$ for all $\vec{v}$. The calculation of the gradient follows from the calculation of the derivative. By $(f+g)'(\vec{x}_0)=f'(\vec{x}_0)+g'(\vec{x}_0)$, we get
$$\nabla(f+g)(\vec{x}_0)\cdot\vec{v} = \nabla f(\vec{x}_0)\cdot\vec{v}+\nabla g(\vec{x}_0)\cdot\vec{v} = (\nabla f(\vec{x}_0)+\nabla g(\vec{x}_0))\cdot\vec{v}.$$
Since this holds for all $\vec{v}$, we get $\nabla(f+g)=\nabla f+\nabla g$. By $(fg)'(\vec{x}_0)(\vec{v})=g(\vec{x}_0)f'(\vec{x}_0)(\vec{v})+f(\vec{x}_0)g'(\vec{x}_0)(\vec{v})$, we get
$$\nabla(fg)(\vec{x}_0)\cdot\vec{v} = g(\vec{x}_0)(\nabla f(\vec{x}_0)\cdot\vec{v})+f(\vec{x}_0)(\nabla g(\vec{x}_0)\cdot\vec{v}) = (g(\vec{x}_0)\nabla f(\vec{x}_0)+f(\vec{x}_0)\nabla g(\vec{x}_0))\cdot\vec{v}.$$
Since this holds for all $\vec{v}$, we get $\nabla(fg)=g\nabla f+f\nabla g$. We have
$$(F\cdot G)'(\vec{x}_0)(\vec{v}) = F'(\vec{x}_0)(\vec{v})\cdot G(\vec{x}_0)+F(\vec{x}_0)\cdot G'(\vec{x}_0)(\vec{v}) = \vec{v}\cdot F'(\vec{x}_0)^T(G(\vec{x}_0))+G'(\vec{x}_0)^T(F(\vec{x}_0))\cdot\vec{v} = [F'(\vec{x}_0)^T(G(\vec{x}_0))+G'(\vec{x}_0)^T(F(\vec{x}_0))]\cdot\vec{v}.$$
Since this holds for all $\vec{v}$, we get
$$\nabla(F\cdot G)(\vec{x}_0) = F'(\vec{x}_0)^T(G(\vec{x}_0))+G'(\vec{x}_0)^T(F(\vec{x}_0)),\quad\text{or}\quad \nabla(F\cdot G) = F'^T(G)+G'^T(F).$$

Exercise 8.35 We may prove the chain rule similarly to the single variable case. This means applying Exercise 8.30 to the case where $P$ is the linear approximation of $F$, with $\Delta\vec{u}=\Delta\vec{x}$, and $Q$ is the linear approximation of $G$, with $\Delta\vec{v}=\Delta\vec{y}$.
The following is a more direct proof. Let $\vec{y}_0=F(\vec{x}_0)$, $\vec{z}_0=G(\vec{y}_0)$, $L=F'(\vec{x}_0)$, $K=G'(\vec{y}_0)$. For any $\epsilon>0$, there is $\mu>0$, such that $\|\Delta\vec{y}\|<\mu$ implies
$$\|G(\vec{y})-\vec{z}_0-K(\Delta\vec{y})\| \le \epsilon\|\Delta\vec{y}\|.$$
Then there is $\delta>0$, such that $\|\Delta\vec{x}\|<\delta$ implies
$$\|F(\vec{x})-\vec{y}_0-L(\Delta\vec{x})\| \le \epsilon\|\Delta\vec{x}\|.$$
Note that this implies
$$\|F(\vec{x})-\vec{y}_0\| \le (\|L\|+\epsilon)\|\Delta\vec{x}\| < (\|L\|+\epsilon)\delta.$$
By choosing $\delta<\dfrac{\mu}{\|L\|+\epsilon}$ to begin with, we also know $\|F(\vec{x})-\vec{y}_0\|<\mu$, so that
$$\|G(F(\vec{x}))-\vec{z}_0-K(F(\vec{x})-\vec{y}_0)\| \le \epsilon\|F(\vec{x})-\vec{y}_0\| \le \epsilon(\|L\|+\epsilon)\|\Delta\vec{x}\|.$$
Then
$$\|G(F(\vec{x}))-\vec{z}_0-K(L(\Delta\vec{x}))\| \le \|G(F(\vec{x}))-\vec{z}_0-K(F(\vec{x})-\vec{y}_0)\|+\|K(F(\vec{x})-\vec{y}_0-L(\Delta\vec{x}))\| \le \epsilon(\|L\|+\epsilon)\|\Delta\vec{x}\|+\|K\|\|F(\vec{x})-\vec{y}_0-L(\Delta\vec{x})\| \le \epsilon(\|L\|+\epsilon+\|K\|)\|\Delta\vec{x}\|.$$
This implies that $\vec{z}_0+K(L(\Delta\vec{x}))$ is a linear approximation of $G(F(\vec{x}))$ at $\vec{x}_0$ and $(G\circ F)'(\vec{x}_0)=K\circ L$.

Exercise 8.36 By restricting (the chain rule is used here) to $\vec{x}=t\vec{v}$, we see that $f(\|\vec{x}\|_2)$ is differentiable away from $\vec{0}$ if and only if $f(t)$ is differentiable for $t>0$.

Now consider the differentiability of $f(\|\vec{x}\|_2)$ at $\vec{0}$. First, since differentiable functions are continuous, we must have $f(0)=\lim_{t\to0^+}f(t)$. Suppose the linear functional $l(\vec{v})$ is the derivative of $f(\|\vec{x}\|_2)$ at $\vec{0}$. Then for any $\epsilon>0$, there is $\delta>0$, such that $\|\vec{x}\|_2<\delta$ implies
$$|f(\|\vec{x}\|_2)-f(0)-l(\vec{x})| \le \epsilon\|\vec{x}\|_2.$$
By applying the condition to $-\vec{x}$, we find that $\|\vec{x}\|_2<\delta$ implies $|f(\|\vec{x}\|_2)-f(0)+l(\vec{x})| \le \epsilon\|\vec{x}\|_2$. Thus $\|\vec{x}\|_2<\delta$ implies
$$2|l(\vec{x})| \le |f(\|\vec{x}\|_2)-f(0)-l(\vec{x})|+|f(\|\vec{x}\|_2)-f(0)+l(\vec{x})| \le 2\epsilon\|\vec{x}\|_2.$$
Thus $|l(\vec{x})|\le\epsilon\|\vec{x}\|_2$ for any $\vec{x}$ satisfying $\|\vec{x}\|_2<\delta$. This further implies $|l(c\vec{x})|\le\epsilon\|c\vec{x}\|_2$, where $c\vec{x}$ can be any vector of arbitrary length. Therefore we must have $l(\vec{x})=0$ for all $\vec{x}$. Thus the differentiability at $\vec{0}$ means that for any $\epsilon>0$, there is $\delta>0$, such that $\|\vec{x}\|_2<\delta$ implies $|f(\|\vec{x}\|_2)-f(0)|\le\epsilon\|\vec{x}\|_2$. In other words, $0<t<\delta$ implies $|f(t)-f(0)|\le\epsilon t$. This means $\lim_{t\to0^+}\dfrac{f(t)-f(0)}{t}=0$. Thus we conclude that $f(\|\vec{x}\|_2)$ is differentiable at $\vec{0}$ if and only if $f'_+(0)=0$.

Exercise 8.37 By the chain rule, $F\circ G=\mathrm{id}$ and $G\circ F=\mathrm{id}$ imply $F'\circ G'=I$ and $G'\circ F'=I$. Thus the linear transforms $F'$ and $G'$ are inverse to each other.

Exercise 8.38
By taking the derivative of $f(t,t^2)=1$, we get $f_x(t,t^2)+2tf_y(t,t^2)=0$. Then by $f_x(t,t^2)=t$, we get $f_y(t,t^2)=-\dfrac12$.

Exercise 8.39 Let $\nabla f=(a,b)$. Then $\nabla\varphi=(1,2)$ and $\nabla\psi=(2,1)$ at $(1,1)$, and the condition tells us
$$2 = \nabla f\cdot(1,2) = a+2b,\quad 3 = \nabla f\cdot(2,1) = 2a+b.$$
Solving the system, we get $\nabla f=\dfrac13(4,1)$.

Exercise 8.40 We have $f_r=f_x\cos\theta+f_y\sin\theta$ and $f_\theta=-f_xr\sin\theta+f_yr\cos\theta$. Then
$$D_{\vec{e}_r}f = \nabla f\cdot\vec{e}_r = f_x\cos\theta+f_y\sin\theta = f_r,\quad D_{\vec{e}_\theta}f = \nabla f\cdot\vec{e}_\theta = -f_x\sin\theta+f_y\cos\theta = r^{-1}f_\theta.$$
By Exercise 6.1.32, we have $\nabla f=f_r\vec{e}_r+r^{-1}f_\theta\vec{e}_\theta$.

Exercise 8.41 We have $f_r=f_x\cos\theta+f_y\sin\theta=r^{-1}(xf_x+yf_y)$. Therefore $f$ is independent of $r$ $\iff$ $f_r=0$ $\iff$ $xf_x+yf_y=0$. We also have $f_\theta=-f_xr\sin\theta+f_yr\cos\theta=-yf_x+xf_y$. Therefore $f$ is independent of $\theta$ $\iff$ $f_\theta=0$ $\iff$ $-yf_x+xf_y=0$.

Exercise 8.42 With $u=\dfrac{yz}{x}$, $v=\dfrac{zx}{y}$, $w=\dfrac{xy}{z}$, we have
$$xg_x+yg_y+zg_z = x\left(-\frac{yz}{x^2}f_u+\frac{z}{y}f_v+\frac{y}{z}f_w\right)+y\left(\frac{z}{x}f_u-\frac{zx}{y^2}f_v+\frac{x}{z}f_w\right)+z\left(\frac{y}{x}f_u+\frac{x}{y}f_v-\frac{xy}{z^2}f_w\right) = \frac{yz}{x}f_u+\frac{zx}{y}f_v+\frac{xy}{z}f_w = uf_u+vf_v+wf_w.$$

Exercise 8.43 If $f(x,y)=\varphi(xy)$, then $f_x=y\varphi'(xy)$, $f_y=x\varphi'(xy)$. Therefore $xf_x=yf_y$. If $f(x,y)=\varphi\left(\dfrac{y}{x}\right)$, then $f_x=-\dfrac{y}{x^2}\varphi'\left(\dfrac{y}{x}\right)$, $f_y=\dfrac1x\varphi'\left(\dfrac{y}{x}\right)$. Therefore $xf_x+yf_y=0$.

Exercise 8.44 We have
$$(g\circ f)'(\vec{x}_0)(\vec{v}) = g'(f(\vec{x}_0))(f'(\vec{x}_0)(\vec{v})) = g'(f(\vec{x}_0))(\nabla f(\vec{x}_0)\cdot\vec{v}) = [g'(f(\vec{x}_0))\nabla f(\vec{x}_0)]\cdot\vec{v}.$$
Here the last equality is due to the following reason: both $g'(f(\vec{x}_0))$ and $\nabla f(\vec{x}_0)\cdot\vec{v}$ are actually numbers. Therefore the application of the former (as a linear transform) to the latter (as a vector) is simply the multiplication of two numbers. Since the equality holds for all $\vec{v}$, we get $\nabla(g\circ f)(\vec{x}_0)=g'(f(\vec{x}_0))\nabla f(\vec{x}_0)$, or $\nabla(g\circ f)=(g'\circ f)\nabla f$.

For the more general $g(F(\vec{x}))$, we have
$$(g\circ F)'(\vec{x}_0)(\vec{v}) = g'(F(\vec{x}_0))(F'(\vec{x}_0)(\vec{v})) = \nabla g(F(\vec{x}_0))\cdot F'(\vec{x}_0)(\vec{v}) = F'(\vec{x}_0)^T(\nabla g(F(\vec{x}_0)))\cdot\vec{v}.$$
Since the equality holds for all $\vec{v}$, we get $\nabla(g\circ F)(\vec{x}_0)=F'(\vec{x}_0)^T(\nabla g(F(\vec{x}_0)))$, or $\nabla(g\circ F)=F'^T(\nabla g\circ F)$.

Exercise 8.45 The derivatives of $X\mapsto A^{-1}X$ and $X\mapsto XA^{-1}$ are $H\mapsto A^{-1}H$ and $H\mapsto HA^{-1}$, because the two maps are linear. By Exercise 8.13, the derivative of $X\mapsto X^{-1}$ at $I$ is $H\mapsto-H$. Thus the composition $X\mapsto(A^{-1}X)^{-1}A^{-1}=X^{-1}$ has the derivative $H\mapsto(-A^{-1}H)A^{-1}=-A^{-1}HA^{-1}$ at $A$.

Exercise 8.46 The function is $f(x,y)=\dfrac{x^2y}{x^2+y^2}$ for $(x,y)\ne(0,0)$ and $f(0,0)=0$. Since $\nabla f(0,0)=(0,0)$, we always have $\nabla f(0,0)\cdot\vec{v}=0$. On the other hand, the direct calculation shows that $D_{\vec{v}}f=\dfrac{a^2b}{a^2+b^2}$ is not zero for most $\vec{v}=(a,b)$. So the function fails the formula $D_{\vec{v}}f=\nabla f\cdot\vec{v}$. Since the formula $D_{\vec{v}}f=\nabla f\cdot\vec{v}$ is a special case of the chain rule, the failure of the formula $D_{\vec{v}}f=\nabla f\cdot\vec{v}$ implies the failure of the chain rule. Since the straight line is always differentiable, the failure must be due to the non-differentiability of $f$.

Exercise 8.47 As explained in Example 8.1.13, the function in Example 6.2.3 satisfies the equality $D_{\vec{v}}f(\vec{x}_0)=\nabla f(\vec{x}_0)\cdot\vec{v}$ for all $\vec{v}$. As explained in Exercise 8.46, this means that $f(\varphi(t))$ satisfies the chain rule for all straight lines $\varphi(t)$. Yet the function is not differentiable.

Exercise 8.48 The chain rule formula is a formula for the partial derivatives. The partial derivative is taken for a specific variable, while keeping all the other variables constant. Therefore in the chain rule formula for $F\circ G$, only the differentiability of $G$ along one particular variable is used. Since this differentiability is equivalent to the existence of the corresponding partial derivative, we do not expect a counterexample where $G$ only has partial derivatives but is not differentiable.

Exercise 8.49 By the equivalence of norms, we will conclude $\|F(\vec{b})-F(\vec{a})\| \le C\|F'(\vec{c})\|\|\vec{b}-\vec{a}\|$ for some $C>0$ in Proposition 8.2.2, where $C$ is independent of the functions and variables (it depends only on the norms of the Euclidean spaces).
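The formula in Exercise 8.45 can be checked numerically for $2\times2$ matrices; the invertible matrix $A$ and the direction $H$ below are arbitrary samples:

```python
# Check: the derivative of X -> X^{-1} at A sends H to -A^{-1} H A^{-1},
# so ((A + tH)^{-1} - A^{-1})/t should approach -A^{-1} H A^{-1} as t -> 0.
def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv(M):
    d = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [[M[1][1]/d, -M[0][1]/d], [-M[1][0]/d, M[0][0]/d]]

A = [[2.0, 1.0], [1.0, 3.0]]    # arbitrary invertible sample
H = [[0.3, -0.1], [0.2, 0.4]]   # arbitrary direction
t = 1e-6
At = [[A[i][j] + t*H[i][j] for j in range(2)] for i in range(2)]
numeric = [[(inv(At)[i][j] - inv(A)[i][j]) / t for j in range(2)] for i in range(2)]
Ai = inv(A)
claimed = mul(mul(Ai, H), Ai)   # A^{-1} H A^{-1}; numeric should be its negative
err = max(abs(numeric[i][j] + claimed[i][j]) for i in range(2) for j in range(2))
print(err < 1e-3)
```

The explicit cofactor formula for `inv` is the same one used in Exercise 8.18, so this also cross-checks that computation.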
Exercise 8.50 If $f_x(x,y)=0$ on an open subset $U$, such that the intersection of $U$ with any horizontal line $y=c$ is an interval, then $f(x,y)=g(y)$ for a function $g$ of $y$. More generally, suppose $f_{\vec{x}}(\vec{x},\vec{y})=0$ (all the partial derivatives in the $\vec{x}$ coordinates vanish) on an open subset $U$, such that the intersection of $U$ with any plane $\vec{y}=\vec{c}$ is path connected. Then $f(\vec{x},\vec{y})=g(\vec{y})$ for a function $g$ of $\vec{y}$.
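A short justification of the first claim, sketched under the stated interval hypothesis:

```latex
% For two points (x_1, c) and (x_2, c) in U, the segment between them lies
% in U, because the slice U \cap \{y = c\} is an interval. By the mean value
% theorem applied in the x variable,
\[
f(x_2, c) - f(x_1, c) = f_x(\xi, c)\,(x_2 - x_1) = 0
\]
% for some \xi between x_1 and x_2, since f_x = 0 on U. So f is constant on
% each slice, and g(c) can be defined as this common value.
```

The path connected case is the same argument run along a path in the slice, one coordinate direction at a time.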