20. Derivatives of compositions: the chain rule

At the end of the last lecture we discovered a need for the derivative of a composition. In this lecture we show how to calculate it. Accordingly, let $P$ have domain $[a, b]$, let $Q$ be such that its domain is the range of $P$ and let $S = QP$ be their composition, i.e.,

$$S(x) = Q(P(x)). \tag{20.1}$$

Then $S(x + h) = Q(P(x + h))$, implying

$$DQ(S, [x, x + h]) = \frac{S(x + h) - S(x)}{h} = \frac{Q(P(x + h)) - Q(P(x))}{h}. \tag{20.2}$$

To find the derivative of $S$, we must extract the leading term of this expression. Details are in the appendix, where it is shown that

$$\frac{Q(P(x + h)) - Q(P(x))}{h} = P'(x)\,Q'(P(x)) + O[h]. \tag{20.3}$$

Thus

$$S'(x) = P'(x)\,Q'(P(x)) \tag{20.4}$$

or, in mixed notation,

$$\frac{d}{dx}\{Q(P(x))\} = P'(x)\,Q'(P(x)). \tag{20.5}$$

This formula for the derivative of a composition is known as the chain rule.

The chain rule is often easiest to work with in differential notation. Define

$$y = P(x), \qquad z = Q(y), \tag{20.6}$$

so that $z = Q(P(x)) = S(x)$. Then $dy/dx = P'(x)$, $dz/dy = Q'(y)$ and $dz/dx = S'(x)$. With these substitutions, (4) reduces to

$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}. \tag{20.7}$$

It is important to note that (4), (5) and (7) say exactly the same thing, but in different notations (standard, mixed and differential, respectively). Each has advantages and disadvantages, which is why we use them all.

Our first task for the chain rule is to find the derivative of the exponential function. Recall from Lecture 7 that exp is defined on $[0, \infty)$ by

$$\exp(x) = \lim_{n \to \infty} \varphi_n(x) \tag{20.8}$$

where $\{\varphi_n \mid n \ge 1\}$ is defined on $[0, \infty)$ by

$$\varphi_n(x) = \left(1 + \frac{x}{n}\right)^{n}. \tag{20.9}$$

That is,

$$\varphi_n(x) = (P(x))^{n}, \tag{20.10}$$

where

$$P(x) = 1 + \frac{x}{n}. \tag{20.11}$$
M. Mesterton-Gibbons: Biocalculus, Lecture 20, Page 2

Because $x \ge 0$, the range of $P$ is $[1, \infty)$. So define $Q$ on $[1, \infty)$ by

$$Q(y) = y^{n}. \tag{20.12}$$

Then, from (6.20) and Table 8., we have both

$$P'(x) = \frac{d}{dx}\left\{1 + \frac{x}{n}\right\} = \frac{d}{dx}\{1\} + \frac{1}{n}\,\frac{d}{dx}\{x\} = 0 + \frac{1}{n} = \frac{1}{n} \tag{20.13}$$

and

$$Q'(y) = n\,y^{n-1}. \tag{20.14}$$

Moreover, from (10) and (12),

$$\varphi_n(x) = Q(P(x)). \tag{20.15}$$

So the chain rule implies that

$$\frac{d}{dx}\{\varphi_n(x)\} = P'(x)\,Q'(P(x)) = \frac{1}{n}\,n\,(P(x))^{n-1} = (P(x))^{n-1} = \frac{(P(x))^{n}}{P(x)} = \frac{\varphi_n(x)}{1 + x/n} \tag{20.16}$$

on using (10), (13) and (14) with $y = P(x)$. Now let $n$ become infinitely large. Then $x/n$ becomes arbitrarily close to zero, so that $1 + x/n$ becomes arbitrarily close to 1; and so the right-hand side of (16) approaches the limit of $\varphi_n(x)$ as $n \to \infty$, which is $\exp(x)$, by (8). At the same time, the left-hand side of (16) approaches the derivative of the limit of $\varphi_n(x)$ as $n \to \infty$, i.e., $d\{\exp(x)\}/dx$. Thus

$$\frac{d}{dx}\{\exp(x)\} = \exp(x). \tag{20.17}$$

Our second task for the chain rule is to calculate the derivative of the function $R$ defined in (9.7) by

$$R(x) = \exp(Ax^{m}). \tag{20.18}$$

With $P$, $Q$ defined on $[0, \infty)$ by

$$P(x) = Ax^{m} \tag{20.19}$$

and

$$Q(y) = \exp(y), \tag{20.20}$$

we have $R(x) = Q(P(x))$. Thus, from the chain rule,

$$\frac{d}{dx}\{\exp(Ax^{m})\} = \frac{d}{dx}\{Q(P(x))\} = P'(x)\,Q'(P(x)) = \frac{d}{dx}\{Ax^{m}\}\,Q'(P(x)) = mAx^{m-1}\,Q'(P(x)) \tag{20.27}$$

by (6.20) and Table 8. But (17) and (20) imply $Q'(y) = \exp(y)$, and hence $Q'(P(x)) = \exp(P(x))$. Thus (27) yields

$$\frac{d}{dx}\{\exp(Ax^{m})\} = mAx^{m-1}\exp(Ax^{m}). \tag{20.28}$$

A further application of the chain rule yields the probability density function of the distribution fitted in Lecture 9 to data on prairie dog survival, leaf thickness and minnow size. From (9.32), its c.d.f. is defined on $[0, \infty)$ by
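$$F(x) = 1 - \frac{1}{R(x)} = 1 - \frac{1}{\exp(Ax^{m})}. \tag{20.29}$$

Suppose that $y = R(x)$, and that $z = 1 - 1/y$. Then, from version (7) of the chain rule,

$$f(x) = \frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx} = \frac{1}{y^{2}}\,\frac{d}{dx}\{\exp(Ax^{m})\} = \frac{mAx^{m-1}\exp(Ax^{m})}{\{\exp(Ax^{m})\}^{2}} = \frac{mAx^{m-1}}{\exp(Ax^{m})}, \tag{20.30}$$

agreeing with (9.34) by virtue of (28).

The product rule now enables us to find where $f$ is increasing or decreasing. Note that $1/R(x) = 1 - F(x)$ by (29), so that $\frac{d}{dx}\{1/R(x)\} = -f(x)$. From (18) and (30), we have

$$\begin{aligned}
f'(x) &= \frac{d}{dx}\left\{\frac{mAx^{m-1}}{\exp(Ax^{m})}\right\} = \frac{d}{dx}\left\{\frac{1}{R(x)}\,mAx^{m-1}\right\} \\
&= \frac{d}{dx}\left\{\frac{1}{R(x)}\right\} mAx^{m-1} + \frac{1}{R(x)}\,\frac{d}{dx}\{mAx^{m-1}\} \\
&= -f(x)\,mAx^{m-1} + \frac{mA}{R(x)}\,(m-1)x^{m-2} \\
&= -f(x)\,mAx^{m-1} + \frac{(m-1)}{x}\,\frac{mAx^{m-1}}{R(x)} \\
&= -f(x)\,mAx^{m-1} + \frac{(m-1)\,f(x)}{x} \\
&= \frac{mA\,f(x)}{x}\left\{\{x^{*}\}^{m} - x^{m}\right\},
\end{aligned} \tag{20.31}$$

if we define

$$x^{*} = \left(\frac{m-1}{mA}\right)^{1/m}. \tag{20.32}$$

Note from (31) that $f'(x) > 0$ if $x < x^{*}$, $f'(x) < 0$ if $x > x^{*}$, and $f'(x^{*}) = 0$. So $f$ has a global maximum at $x = x^{*}$. In other words, $x^{*}$ is the mode of the distribution. Because $f = F'$, if $f'(x^{*}) = 0$ then $F''(x^{*}) = 0$. So $x = x^{*}$ is also where the c.d.f. has its inflexion point.

The distribution defined by (30) is a special case of a distribution known as the Weibull. (In the more general case, $m$ need not be an integer. See Lecture 22.) It is convenient to set $A = 1/s^{m}$ and rewrite c.d.f. and p.d.f. as

$$F(x) = 1 - \frac{1}{\exp(\{x/s\}^{m})} \tag{20.33}$$

and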
$$f(x) = \frac{m\,(x/s)^{m-1}}{s\,\exp(\{x/s\}^{m})}, \tag{20.34}$$

with mode

$$x^{*} = s\left(\frac{m-1}{m}\right)^{1/m}, \tag{20.35}$$

by (32). Then, if $s f(x)$ is plotted on a vertical axis against $x/s$ on a horizontal axis, the shape of the graph is completely determined by $m$; accordingly, we call $m$ the shape parameter and $s$ the scale parameter.

The Weibull distribution is very versatile. It was used in Lecture 9 to model variation in prairie dog survival ($m = 1$), leaf thickness ($m = 7$) and minnow size ($m = 5$), and it will be used in Lecture 21 to model variation in rat pupil size ($m = 2$). Figure 1 shows the p.d.f. for $m = 1, \ldots, 6$.

The chain rule yields an exceedingly simple and useful result when $P$ and $Q$ are inverse functions, i.e., when

$$y = P(x) \iff x = Q(y). \tag{20.36}$$

Then $Q(P(x)) = x$, implying

$$P'(x)\,Q'(P(x)) = \frac{d}{dx}\{Q(P(x))\} = \frac{d}{dx}\{x\} = 1. \tag{20.37}$$

In other words, (36) implies $P'(x)\,Q'(y) = 1$, or

$$Q'(y) = \frac{1}{P'(x)}, \tag{20.38}$$

which is often rewritten in differential notation as

$$\frac{dx}{dy} = \frac{1}{dy/dx}. \tag{20.39}$$

In particular, if $y = \exp(x)$ then $x = \ln(y)$ from Lecture 7, whereas $P'(x) = d\{\exp(x)\}/dx = \exp(x) = y$ from (17). So (38) yields

$$\frac{d}{dy}\{\ln(y)\} = \frac{1}{y}, \qquad y \ge 1 \tag{20.40}$$

or, which is exactly the same thing,

$$\frac{d}{dx}\{\ln(x)\} = \frac{1}{x}, \qquad x \ge 1. \tag{20.41}$$

The graphs of the logarithm and its derivative are sketched in Figure 2.
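As a numerical cross-check of the mode formula (20.35) and the logarithm derivative (20.41), here is a minimal Python sketch. The parameter values m = 5 and s = 2, the grid size, the step h and all helper names are illustrative assumptions and do not come from the lecture.

```python
import math

def weibull_pdf(x, m, s):
    """p.d.f. (20.34): m (x/s)^(m-1) / (s exp((x/s)^m))."""
    u = x / s
    return m * u ** (m - 1) / (s * math.exp(u ** m))

def weibull_mode(m, s):
    """Mode (20.35): s ((m-1)/m)^(1/m)."""
    return s * ((m - 1) / m) ** (1 / m)

def grid_argmax(f, lo, hi, steps=60_000):
    """Brute-force maximizer of f over an evenly spaced grid on [lo, hi]."""
    best_x, best_f = lo, f(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        fx = f(x)
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, used as a stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Assumed illustrative parameters (not from the lecture).
m, s = 5, 2.0
numeric_mode = grid_argmax(lambda x: weibull_pdf(x, m, s), 0.0, 3 * s)
print("grid mode:", numeric_mode, " formula (20.35):", weibull_mode(m, s))

# Compare the numerical derivative of ln with 1/x at a few points x >= 1.
for x in (1.0, 2.5, 10.0):
    print(x, central_difference(math.log, x), 1 / x)
```

The grid search knows nothing about calculus, so its agreement with (20.35) to within the grid spacing is an independent check on the product-rule calculation.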
Exercises 20

20.1 Find $S'(x)$ for $S$ defined on $[c + 1, \infty)$ by
(i) $S(x) = (x - c)^{10}$
(ii) $S(x) = (x - c)^{-4}$
(iii) $S(x) = \sqrt{x - c}$
(iv) $S(x) = \ln(x - c)$
In each case, state the domain and range of $S$.

20.2 If $P$ is an arbitrary smooth function with derivative $P'$ on any subset of $(-\infty, \infty)$, what is $\frac{d}{dx}\{\exp(P(x))\}$?

20.3 If $P$ is an arbitrary smooth function whose range is a subset of $(0, \infty)$, what is $\frac{d}{dx}\{\ln(P(x))\}$?

20.4 Find the mode, $m$, of the distribution on $[0, \infty)$ with p.d.f. defined by

$$f(x) = \begin{cases} \dfrac{20}{243}\,x^{3}(3 - x)^{2} & \text{if } 0 \le x < 3 \\ 0 & \text{if } 3 \le x < \infty \end{cases}$$

20.5 Find the mode, $m$, of the distribution on $[0, \infty)$ with p.d.f. defined by

$$f(x) = \begin{cases} \dfrac{315}{262144}\,x^{4}(4 - x)^{5} & \text{if } 0 \le x < 4 \\ 0 & \text{if } 4 \le x < \infty \end{cases}$$

20.6 The function $R$ is defined on $[0, \infty)$ by

$$R(t) = \begin{cases} At + Bt^{3} & \text{if } 0 \le t < 1 \\ \dfrac{1 - t}{1 + t} & \text{if } 1 \le t < \infty. \end{cases}$$

What must be the values of $A$ and $B$ if $R$ is smooth on $[0, \infty)$?

20.7 A function $F$ is defined on $[0, \infty)$ by

$$F(t) = 1 - (t^{4} + t^{3} + 1)^{-3}.$$

(i) Why must $F$ be the c.d.f. of a random variable, say $T$, on $[0, \infty)$?
(ii) What is the probability that $T$ exceeds 1?
(iii) Find the p.d.f. of the distribution, and sketch its graph.
(iv) Hence find the mode of the distribution, approximately.
Appendix 20: On the chain rule

The purpose of this appendix is to establish (3). Let $P$ have domain $[a, b]$, let $Q$ be defined on the range of $P$ and let $S = QP$ be their composition. Then

$$S(x) = Q(P(x)), \qquad a \le x \le b. \tag{20.A1}$$

Because $P$ is smooth on $[a, b]$, we have $DQ(P, [x, x + h]) = P'(x) + \varepsilon_P(h, x)$, where $\varepsilon_P(h, x) = O[h]$ and $h$ is a very small positive number. Thus

$$P(x + h) = P(x) + h\{P'(x) + \varepsilon_P(h, x)\} \tag{20.A2a}$$

$$\phantom{P(x + h)} = P(x) + h\{P'(x) + O[h]\}. \tag{20.A2b}$$

Similarly, because $S$ is smooth on $[a, b]$, $DQ(S, [x, x + h]) = S'(x) + O[h]$ or

$$S(x + h) = S(x) + h\{S'(x) + O[h]\}. \tag{20.A3}$$

Moreover, because $Q$ is smooth on the range of $P$, $DQ(Q, [y, y + \tilde{h}]) = Q'(y) + O[\tilde{h}]$ or

$$Q(y + \tilde{h}) = Q(y) + \tilde{h}\{Q'(y) + O[\tilde{h}]\} \tag{20.A4}$$

where $y$ belongs to the range of $P$ (or domain of $Q$) and $\tilde{h}$ is a very small positive number. Any very small positive number will do: if $h$ is one such number, then $h$ times a positive constant is another. So any $O[h]$ is a possible $\tilde{h}$. Accordingly, in (A4), we set $y = P(x)$ and take

$$\tilde{h} = h\{P'(x) + \varepsilon_P(h, x)\} \tag{20.A5a}$$

$$\phantom{\tilde{h}} = h\{P'(x) + O[h]\}, \tag{20.A5b}$$

which is $O[h]$ by (3.27), (3.29) and Exercise 3.4. Now (A4) implies

$$Q\big(P(x) + h\{P'(x) + \varepsilon_P(h, x)\}\big) = Q(P(x)) + h\{P'(x) + O[h]\}\{Q'(P(x)) + O[\tilde{h}]\}. \tag{20.A6}$$

Note that we have replaced $\tilde{h}$ by expression (A5a) in the left-hand side of (A4) but by expression (A5b) on the right-hand side, because different amounts of precision are required. The extra precision on the left-hand side enables us to exploit (A2a), which reduces (A6) to

$$Q(P(x + h)) = Q(P(x)) + h\{P'(x) + O[h]\}\{Q'(P(x)) + O[\tilde{h}]\} \tag{20.A7}$$

$$\phantom{Q(P(x + h))} = Q(P(x)) + h\,P'(x)\,Q'(P(x)) + h\{P'(x)\,O[\tilde{h}] + Q'(P(x))\,O[h] + O[h]\,O[\tilde{h}]\}, \tag{20.A8}$$

on simplification. But $\tilde{h}$ is $O[h]$, and so $P'(x)\,O[\tilde{h}] + Q'(P(x))\,O[h] + O[h]\,O[\tilde{h}]$ is also $O[h]$, by (3.27)-(3.29). Moreover, $Q(P(x)) = S(x)$, implying $Q(P(x + h)) = S(x + h)$. So (A8) yields

$$S(x + h) = S(x) + h\{P'(x)\,Q'(P(x)) + O[h]\}, \tag{20.A9}$$

implying

$$DQ(S, [x, x + h]) = \frac{S(x + h) - S(x)}{h} = P'(x)\,Q'(P(x)) + O[h], \tag{20.A10}$$

as required.
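The O[h] claim in (20.A10) can be illustrated numerically. The sketch below is only an illustration, not a proof: it takes the composition S(x) = exp(Ax^m) from the lecture with assumed constants A = 0.5 and m = 3, evaluates the difference quotient at the assumed point x = 1, and subtracts the chain-rule value P'(x)Q'(P(x)). All function names are hypothetical. If the error really is O[h], then error/h should stay roughly constant as h shrinks.

```python
import math

# Assumed illustrative constants (not taken from the lecture).
A, m = 0.5, 3

def P(x):
    return A * x ** m

def P_prime(x):
    return m * A * x ** (m - 1)

Q = math.exp        # Q(y) = exp(y)
Q_prime = math.exp  # Q'(y) = exp(y), by (20.17)

def S(x):
    """The composition S(x) = Q(P(x))."""
    return Q(P(x))

def difference_quotient(f, x, h):
    """DQ(f, [x, x + h]) in the lecture's notation."""
    return (f(x + h) - f(x)) / h

def chain_rule_error(x, h):
    """DQ(S, [x, x + h]) minus the chain-rule value P'(x) Q'(P(x))."""
    return difference_quotient(S, x, h) - P_prime(x) * Q_prime(P(x))

x = 1.0
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    err = chain_rule_error(x, h)
    print(f"h = {h:g}   error = {err:.3e}   error / h = {err / h:.4f}")
```

Shrinking h by a factor of ten should shrink the error by about the same factor, which is what "the error is O[h]" means in practice.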
Answers and Hints for Selected Exercises

20.1 (i) Define $P$ and $Q$ by $P(x) = x - c$ and $Q(y) = y^{10}$, so that $Q(P(x)) = \{P(x)\}^{10}$. Then $S(x) = (x - c)^{10} = \{P(x)\}^{10} = Q(P(x))$,

$$P'(x) = \frac{d}{dx}\{x - c\} = \frac{d}{dx}\{x\} - \frac{d}{dx}\{c\} = 1 - 0 = 1$$

and $Q'(y) = 10y^{9}$ (by Table 8.), implying $Q'(P(x)) = 10\{P(x)\}^{9} = 10(x - c)^{9}$. So, by the chain rule, $S'(x) = P'(x)\,Q'(P(x)) = 1 \cdot 10(x - c)^{9} = 10(x - c)^{9}$. That is,

$$\frac{d}{dx}\{(x - c)^{10}\} = 10(x - c)^{9}.$$

Note that $P$ has domain $[c + 1, \infty)$ and range $[1, \infty)$, so that $Q$ has domain $[1, \infty)$ and range $[1, \infty)$. Thus $S$, like $P$, has domain $[c + 1, \infty)$ and range $[1, \infty)$.

(ii) As before, $P(x) = x - c \implies P'(x) = 1$; but now $Q(y) = y^{-4}$, implying $Q'(y) = -4y^{-5}$ (by Appendix B). So $S'(x) = P'(x)\,Q'(P(x)) = 1 \cdot (-4\{P(x)\}^{-5})$, or

$$\frac{d}{dx}\{(x - c)^{-4}\} = -4(x - c)^{-5} = -\frac{4}{(x - c)^{5}}.$$

Note that, although $Q$ still has domain $[1, \infty)$, the range of $Q$ is now $(0, 1]$. Thus $S$ has domain $[c + 1, \infty)$ and range $(0, 1]$.

(iii) Again, $P(x) = x - c \implies P'(x) = 1$; but now $Q(y) = \sqrt{y}$, so that

$$Q'(y) = \frac{1}{2\sqrt{y}}$$

(by Exercise 7.4). So

$$S'(x) = P'(x)\,Q'(P(x)) = \frac{P'(x)}{2\sqrt{P(x)}} = \frac{1}{2\sqrt{x - c}}.$$

The domains and ranges of $P$, $Q$ and $S$ are the same as for (i).

(iv) Now $Q(y) = \ln(y)$, so that $Q'(y) = 1/y$ by (40). Hence

$$S'(x) = P'(x)\,Q'(P(x)) = \frac{P'(x)}{P(x)} = \frac{1}{x - c}.$$

$Q$ still has domain $[1, \infty)$, but its range is now $[0, \infty)$. So $S$ has domain $[c + 1, \infty)$ and range $[0, \infty)$.

20.2 Define $Q$ by $Q(y) = \exp(y)$, so that $Q'(y) = \exp(y) \implies Q'(P(x)) = \exp(P(x))$. Then, by the chain rule,

$$\frac{d}{dx}\{\exp(P(x))\} = \frac{d}{dx}\{Q(P(x))\} = P'(x)\,Q'(P(x)) = P'(x)\exp(P(x)).$$

Indeed this result is effectively contained in the lecture.

20.3 Define $Q$ by $Q(y) = \ln(y)$, so that $Q'(y) = 1/y \implies Q'(P(x)) = 1/P(x)$. Then, by the chain rule,

$$\frac{d}{dx}\{\ln(P(x))\} = \frac{d}{dx}\{Q(P(x))\} = P'(x)\,Q'(P(x)) = P'(x)\,\frac{1}{P(x)} = \frac{P'(x)}{P(x)}.$$

Indeed this result is effectively contained in the solution to Exercise 1(iv).
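The four derivatives found in Exercise 20.1 can be spot-checked against a symmetric difference quotient. This is a rough sketch under stated assumptions: the value c = 2, the test point x = c + 2.5, the step h and the helper names are all arbitrary illustrative choices, not part of the exercise.

```python
import math

c = 2.0  # assumed illustrative value of the constant c

# Each entry pairs S with the derivative S' claimed in the answers above.
cases = {
    "(i)": (lambda x: (x - c) ** 10, lambda x: 10 * (x - c) ** 9),
    "(ii)": (lambda x: (x - c) ** -4, lambda x: -4 * (x - c) ** -5),
    "(iii)": (lambda x: math.sqrt(x - c), lambda x: 1 / (2 * math.sqrt(x - c))),
    "(iv)": (lambda x: math.log(x - c), lambda x: 1 / (x - c)),
}

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, used as a stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def relative_error(label, x):
    """Relative gap between the numerical and the claimed derivative."""
    S, S_prime = cases[label]
    claimed = S_prime(x)
    return abs(central_difference(S, x) - claimed) / abs(claimed)

x = c + 2.5  # a point inside the domain [c + 1, infinity)
for label in cases:
    print(label, relative_error(label, x))
```

Each printed relative error should be tiny compared with 1; a sign slip or a wrong exponent in any of the answers would show up as an error of order 1.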
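20.4 Note that $f(x) = g(x)/L$, where

$$g(x) = \begin{cases} x^{3}(3 - x)^{2} & \text{if } 0 \le x < 3 \\ 0 & \text{if } 3 \le x < \infty \end{cases}$$

and $L = 243/20$ is a positive constant whose value is irrelevant (because $f$ and $g$ have the same global maximizer). By the chain rule with $P(x) = 3 - x \implies P'(x) = -1$ and $Q(y) = y^{2} \implies Q'(y) = 2y \implies Q'(P(x)) = 2P(x) = 2(3 - x)$, we have

$$\frac{d}{dx}\{(3 - x)^{2}\} = \frac{d}{dx}\{Q(P(x))\} = P'(x)\,Q'(P(x)) = (-1) \cdot 2(3 - x) = -2(3 - x).$$

Thus, on using the product rule, we have

$$\begin{aligned}
g'(x) &= \frac{d}{dx}\{x^{3}(3 - x)^{2}\} = \frac{d}{dx}\{x^{3}\}\,(3 - x)^{2} + x^{3}\,\frac{d}{dx}\{(3 - x)^{2}\} \\
&= 3x^{2}(3 - x)^{2} + x^{3}\{-2(3 - x)\} \\
&= x^{2}(3 - x)\{3(3 - x) - 2x\} = x^{2}(3 - x)\{9 - 5x\}.
\end{aligned}$$

Because $g'(x) > 0$ if $0 < x < 9/5$ and $g'(x) < 0$ if $9/5 < x < 3$, $g$ has a maximum where $x = 9/5$. Therefore $f$ also has a maximum where $x = 9/5$. So $m = 9/5$.

20.5 As in the previous exercise, $f(x) = g(x)/L$ on $[0, \infty)$, where now

$$g(x) = \begin{cases} x^{4}(4 - x)^{5} & \text{if } 0 \le x < 4 \\ 0 & \text{if } 4 \le x < \infty \end{cases}$$

and $L = 262144/315 > 0$. By the chain rule with $P(x) = 4 - x \implies P'(x) = -1$ and $Q(y) = y^{5} \implies Q'(y) = 5y^{4} \implies Q'(P(x)) = 5\{P(x)\}^{4} = 5(4 - x)^{4}$, we have

$$\frac{d}{dx}\{(4 - x)^{5}\} = \frac{d}{dx}\{Q(P(x))\} = P'(x)\,Q'(P(x)) = (-1) \cdot 5(4 - x)^{4}.$$

Thus, on using the product rule, we have

$$\begin{aligned}
g'(x) &= \frac{d}{dx}\{x^{4}(4 - x)^{5}\} = \frac{d}{dx}\{x^{4}\}\,(4 - x)^{5} + x^{4}\,\frac{d}{dx}\{(4 - x)^{5}\} \\
&= 4x^{3}(4 - x)^{5} + x^{4}\{-5(4 - x)^{4}\} \\
&= x^{3}(4 - x)^{4}\{4(4 - x) - 5x\} = x^{3}(4 - x)^{4}\{16 - 9x\}
\end{aligned}$$

on $[0, 4]$. Because $g'(x) > 0$ if $0 < x < 16/9$ and $g'(x) < 0$ if $16/9 < x < 4$, $g$ has a maximum where $x = 16/9$. Therefore $f$ also has a maximum where $x = 16/9$. So the mode of the distribution is $m = 16/9$.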
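The modes 9/5 and 16/9, and the constants L = 243/20 and L = 262144/315, can be confirmed without calculus. The sketch below uses a brute-force grid search and a midpoint-rule sum; the step counts and the helper names are assumptions chosen for illustration only.

```python
def g4(x):
    """g from Exercise 20.4 on [0, 3]."""
    return x ** 3 * (3 - x) ** 2

def g5(x):
    """g from Exercise 20.5 on [0, 4]."""
    return x ** 4 * (4 - x) ** 5

def grid_argmax(f, lo, hi, steps=100_000):
    """Brute-force maximizer of f over an evenly spaced grid on [lo, hi]."""
    best_x, best_f = lo, f(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        fx = f(x)
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x

def midpoint_integral(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation to Int(f, [lo, hi])."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))

mode4 = grid_argmax(g4, 0.0, 3.0)
mode5 = grid_argmax(g5, 0.0, 4.0)
L4 = midpoint_integral(g4, 0.0, 3.0)
L5 = midpoint_integral(g5, 0.0, 4.0)

print("20.4: mode", mode4, "vs 9/5 =", 9 / 5, "  L", L4, "vs", 243 / 20)
print("20.5: mode", mode5, "vs 16/9 =", 16 / 9, "  L", L5, "vs", 262144 / 315)
```

The integrals matter because f = g/L is a p.d.f. only if L equals the total area under g.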
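20.6 First note that, by the chain rule with $P(t) = 1 + t$ and $Q(y) = y^{-1}$, we have

$$\frac{d}{dt}\{(1 + t)^{-1}\} = \frac{d}{dt}\{Q(P(t))\} = P'(t)\,Q'(P(t)) = \{0 + 1\}\left(-\{P(t)\}^{-2}\right) = -(1 + t)^{-2}.$$

Therefore,

$$\frac{d}{dt}\left\{\frac{1 - t}{1 + t}\right\} = \frac{d}{dt}\left\{\frac{2 - (1 + t)}{1 + t}\right\} = \frac{d}{dt}\left\{\frac{2}{1 + t} - 1\right\} = 2\,\frac{d}{dt}\{(1 + t)^{-1}\} - 0 = -2(1 + t)^{-2}.$$

(Alternatively, but less simply,

$$\begin{aligned}
\frac{d}{dt}\{(1 - t)(1 + t)^{-1}\} &= \frac{d}{dt}\{(1 - t)\}\,(1 + t)^{-1} + (1 - t)\,\frac{d}{dt}\{(1 + t)^{-1}\} \\
&= (0 - 1)(1 + t)^{-1} + (1 - t)\{-(1 + t)^{-2}\} \\
&= -(1 + t)(1 + t)^{-2} - (1 - t)(1 + t)^{-2} \\
&= -\{1 + t + 1 - t\}(1 + t)^{-2} = -2(1 + t)^{-2},
\end{aligned}$$

on using the product rule.) So

$$R'(t) = \begin{cases} A + 3Bt^{2} & \text{if } 0 \le t < 1 \\ -\dfrac{2}{(1 + t)^{2}} & \text{if } 1 \le t < \infty. \end{cases}$$

The continuity condition is $R(1-) = R(1+)$ or $A + B = 0$. The smoothness condition is $R'(1-) = R'(1+)$ or $A + 3B = -1/2$. So $A + B + 2B = -1/2 \implies 0 + 2B = -1/2 \implies B = -1/4$. Then $A = -B = 1/4$. Compare your solution to Exercise 5.3.

20.7 (i) Because $F(0) = 0$ and $F(\infty) = 1$ and $F$ is a strictly increasing function.

(ii) $\mathrm{Prob}(T > 1) = \mathrm{Int}(f, [1, \infty)) = F(\infty) - F(1) = 1 - 26/27 = 1/27$.

(iii) Define $P$ and $Q$ by $P(t) = t^{4} + t^{3} + 1$ and $Q(y) = 1 - y^{-3}$. Then, by our rule for the derivative of a sum of multiples and Table 8., we have $P'(t) = 4t^{3} + 3t^{2}$ and $Q'(y) = 3y^{-4}$, implying $Q'(P(t)) = 3(t^{4} + t^{3} + 1)^{-4}$. So, from the chain rule,

$$f(t) = F'(t) = \frac{d}{dt}\{Q(P(t))\} = P'(t)\,Q'(P(t)) = (4t^{3} + 3t^{2})\{3(t^{4} + t^{3} + 1)^{-4}\} = \frac{3t^{2}(4t + 3)}{(t^{4} + t^{3} + 1)^{4}}.$$

Go to http://www.math.fsu.eu/~mmg/quizbank/mac23.f97/Answers/assC2.gif for the graph.

(iv) From the graph, $\mathrm{Max}(f, [0, \infty)) = f(0.52) = 1.9$, approximately. So $m = 0.52$.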
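For readers without access to the graph, the answers to Exercise 20.7 can be reproduced numerically. The following is a minimal sketch: the search interval [0, 3], the grid size and the function names are assumptions made for illustration, and the grid search locates the mode only to within the grid spacing.

```python
def F(t):
    """c.d.f. from Exercise 20.7."""
    return 1 - (t ** 4 + t ** 3 + 1) ** -3

def f(t):
    """p.d.f. obtained from the chain rule in part (iii)."""
    return 3 * t ** 2 * (4 * t + 3) / (t ** 4 + t ** 3 + 1) ** 4

def central_difference(g, t, h=1e-6):
    """Symmetric difference quotient, used as a stand-in for g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

def grid_argmax(g, lo, hi, steps=30_000):
    """Brute-force maximizer of g over an evenly spaced grid on [lo, hi]."""
    best_t, best_g = lo, g(lo)
    for i in range(1, steps + 1):
        t = lo + (hi - lo) * i / steps
        gt = g(t)
        if gt > best_g:
            best_t, best_g = t, gt
    return best_t

prob_exceeds_one = 1 - F(1.0)    # part (ii)
mode = grid_argmax(f, 0.0, 3.0)  # part (iv), assumed search interval

print("Prob(T > 1) =", prob_exceeds_one, "vs 1/27 =", 1 / 27)
print("F'(0.7) numerically:", central_difference(F, 0.7), " f(0.7):", f(0.7))
print("mode is near", mode, "with f(mode) =", f(mode))
```

The second printed line compares the chain-rule p.d.f. with a numerical derivative of F at one arbitrary point, which guards against algebra slips in part (iii).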