Chain rules via multiplication

Bro. David E. Brown, BYU-Idaho Dept. of Mathematics. All rights reserved.

Version 0.44, of June 16, 2014

Answer to Exercise 2.1 corrected, minor edits made, and numbering of exercises mostly corrected on 2014-06-13.

Contents

1 Introduction
2 A fancier Calc I example
3 Chain rules for multivariate functions
3.1 Additional chain rules?
3.2 Isn't there a better way to write down chain rules?
3.2.1 A more compact way of writing chain rules
3.2.2 A more flexible way of writing chain rules
4 The Chain Rule
5 Additional exercises
6 Answers, etc.

1 Introduction

You learned a chain rule for differentiating compositions of functions in Calculus I. It probably looked something like

\[ \bigl(g(u(x))\bigr)' = g'(u(x))\,u'(x), \]

which is sometimes shortened to

\[ \frac{d}{dx}\,g(u(x)) = \frac{dg}{du}\frac{du}{dx}. \]

The truth is slightly more complicated: The chain rule is really

\[ \frac{d}{dx}\,g(u(x)) = \left.\frac{d}{du}\,g(u)\right|_{u=u(x)} \frac{du}{dx}, \tag{1} \]

but we're usually too lazy to write all this, so we use one of the abbreviated versions given above.

You know how this goes: For example, if $f(x) = \sin^3 x$, you can think of $g$ as being the cubing function (that is, $g(u) = u^3$), and $u$ as being the sine function (i.e., $u(x) = \sin x$). Then $g(u(x)) = (\sin x)^3 = \sin^3 x = f(x)$. The chain rule (Equation (1)) says

\[ \frac{df}{dx} = \left.\frac{dg}{du}\right|_{u=\sin x}\frac{du}{dx} = \left.\frac{d}{du}\,u^3\right|_{u=\sin x}\frac{du}{dx} = \left(\left.3u^2\right|_{u=\sin x}\right)(\cos x) = 3\sin^2 x\cos x. \]

Back when my father learned calculus, this technique of differentiating by substitution was used quite commonly, much as we integrate by substitution. Differentiation by substitution has gone out of style, but
it's a valid technique. Feel free to use it, if it helps you with the chain rule. We will make good use of it in this document.

The question at hand is what the chain rule looks like when there are more variables than $x$ floating around. We'll sneak up on this question, by looking at a slightly more complicated Calc I example, to get a sense for how additional variables might be handled. Then we'll look at chain rules for partial derivatives for functions of more than one variable. I say "rules" instead of "rule" because there are many chain rules. Fortunately, we will combine them all into one mother-of-all-chain-rules.

Along the way, we'll examine spiffy uses of symbols, to help keep the tedium down. Spiffy uses of symbols will include introducing matrices and matrix multiplication at some point. And, as you have no doubt discerned, this little exposition is rather informal. We can tackle formalities some other time.

2 A fancier Calc I example

Careful examination of a suitable example can provide a stepping stone to chain rules for multivariate functions. Let's try differentiating $f(x) = (\ln x)^2 + \sin^3 x$. Rather than treating this mindlessly as a Calc I problem, let's examine it in some detail, deliberately introducing additional variables to increase its value as a bridge to the multivariate case.

So: $f$ has two terms: $(\ln x)^2$ and $\sin^3 x$. I want you to think of these two terms as $u^2$ and $v^3$, respectively, so that

\[ f(x) = f(u, v) = u^2 + v^3. \tag{2} \]

This requires us to set

\[ u = \ln x \quad\text{and}\quad v = \sin x. \]

We're using a substitution, one that happens to have two parts to it. I call $u$ and $v$ intermediate variables because $f$ depends on $u$ and $v$, which in turn depend on $x$ (and, in later sections, on $y$ as well); this means they are "between" $f$ on the one hand and the independent variables on the other.

To differentiate $f$ with respect to $x$, we can start with the $(\ln x)^2$ term, think of it as $u^2$, and use the Calc I chain rule, in a knee-jerk reaction sort of way:

\[ \frac{d}{dx}(\ln x)^2 = (2\ln x)\left(\frac{1}{x}\right) = \frac{2\ln x}{x}. \]

Notice that the $2\ln x$ bit is really $\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}$ in disguise, and $\frac{1}{x}$ is actually $\frac{du}{dx}$. So $(2\ln x)\left(\frac{1}{x}\right)$ is really $\left(\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}\right)\left(\frac{du}{dx}\right)$.
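Put this together and find that

\[ \frac{d}{dx}(\ln x)^2 = \left(\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}\right)\left(\frac{du}{dx}\right). \tag{3} \]

The differentiation of the other term is like unto it:

\[ \frac{d}{dx}\sin^3 x = \left(\left.\frac{\partial f}{\partial v}\right|_{v=\sin x}\right)\left(\frac{dv}{dx}\right) = \left(\left.3v^2\right|_{v=\sin x}\right)(\cos x) = (3\sin^2 x)(\cos x) = 3\sin^2 x\cos x. \]

It's the left-most equality that interests me here:

\[ \frac{d}{dx}\sin^3 x = \left(\left.\frac{\partial f}{\partial v}\right|_{v=\sin x}\right)\left(\frac{dv}{dx}\right). \tag{4} \]

(Compare with Equation (3).)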
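If you'd like to see Equations (3) and (4) hold up numerically before going on, here is a minimal Python sketch. It is only a sanity check, not part of the original argument: the helper name `central_difference`, the sample point `x = 1.7`, and the step size are assumptions made for illustration.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.7  # arbitrary sample point; x > 0 so that ln x is defined

# Equation (3): d/dx (ln x)^2 = (df/du at u = ln x) * du/dx, with f = u^2 + v^3
u = math.log(x)
df_du = 2 * u            # partial derivative of u^2 + v^3 with respect to u
du_dx = 1 / x
eq3_chain = df_du * du_dx
eq3_numeric = central_difference(lambda t: math.log(t) ** 2, x)

# Equation (4): d/dx sin^3 x = (df/dv at v = sin x) * dv/dx
v = math.sin(x)
df_dv = 3 * v ** 2       # partial derivative of u^2 + v^3 with respect to v
dv_dx = math.cos(x)
eq4_chain = df_dv * dv_dx
eq4_numeric = central_difference(lambda t: math.sin(t) ** 3, x)

# Both comparisons should agree to well within the finite-difference error.
print(abs(eq3_chain - eq3_numeric) < 1e-6)
print(abs(eq4_chain - eq4_numeric) < 1e-6)
```

The substitution is done first (compute `u` and `v`), and only then are the derivatives evaluated, which mirrors the "evaluate at $u = \ln x$" bars in the equations.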
Here's the punchline: We can get the entire derivative of $f = (\ln x)^2 + \sin^3 x$ by adding the results of (3) and (4) together:

\[ \frac{df}{dx} = \left(\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}\right)\left(\frac{du}{dx}\right) + \left(\left.\frac{\partial f}{\partial v}\right|_{v=\sin x}\right)\left(\frac{dv}{dx}\right) = \frac{2\ln x}{x} + 3\sin^2 x\cos x. \]

I want you to focus on the leftmost equality in the above, which is

\[ \frac{df}{dx} = \left(\left.\frac{\partial f}{\partial u}\right|_{u=\ln x}\right)\left(\frac{du}{dx}\right) + \left(\left.\frac{\partial f}{\partial v}\right|_{v=\sin x}\right)\left(\frac{dv}{dx}\right). \]

With your kind permission, I'll abbreviate it as

\[ \frac{df}{dx} = \frac{\partial f}{\partial u}\frac{du}{dx} + \frac{\partial f}{\partial v}\frac{dv}{dx}, \tag{5} \]

always remembering to substitute in the $u = \ln x$ and $v = \sin x$, as needed. (This abbreviation is customary.)

Hmm... Equation (5) looks suspiciously like two instances of the chain rule added together. That's because that's exactly what it is: Differentiating the given $f$ actually requires you to use the Calc I chain rule twice, once for the $u^2$ term and once for the $v^3$ term. (Go back and look at Equation (2) to remember why I'm talking about $u^2$ and $v^3$.)

Exercise 2.0.1. Reproduce the logic above, so as to differentiate $f = \sinh^2 x + \ln(x^2)$ with respect to $x$. In the process, re-invent Equation (5). Hint: Think about $u^2 + \ln v$.

3 Chain rules for multivariate functions

So Equation (5) is a chain rule; it uses two intermediate variables ($u$ and $v$) and one independent variable ($x$). What if there are two independent variables? Let's find out.

Let $f(x, y) = \ln(2x + y) + \cos(3x - y)$, and think about calculating $\frac{\partial f}{\partial x}$. Here's how it goes if we indulge our Calc I knee-jerk reaction: Differentiate the $\ln(2x + y)$ term using the Calc I chain rule, but remember to hold $y$ constant during the differentiation:

\[ \frac{\partial}{\partial x}\ln(2x + y) = \left(\frac{1}{2x + y}\right)\frac{\partial}{\partial x}(2x + y) = \frac{2}{2x + y}. \]

Then do the same for the cosine term:

\[ \frac{\partial}{\partial x}\cos(3x - y) = \bigl(-\sin(3x - y)\bigr)\frac{\partial}{\partial x}(3x - y) = -3\sin(3x - y), \]

and combine the results to get

\[ \frac{\partial f}{\partial x} = \frac{2}{2x + y} - 3\sin(3x - y). \tag{6} \]

Problem is, the knee-jerk reaction skips the intermediate steps, which (a) keeps you from seeing what's really happening and therefore also (b) puts you at risk for making mistakes later. So what are the missing steps? To see them, try setting $u = 2x + y$ and $v = 3x - y$. Then

\[ f = f(u, v) = \ln u + \cos v, \]

which makes the $\frac{1}{2x + y}$ bit equal to $\left.\frac{\partial f}{\partial u}\right|_{u=2x+y}$, and the $-\sin(3x - y)$ bit equal to $\left.\frac{\partial f}{\partial v}\right|_{v=3x-y}$.
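Likewise,

\[ \frac{\partial u}{\partial x} = \frac{\partial}{\partial x}(2x + y) = 2 \quad\text{and}\quad \frac{\partial v}{\partial x} = \frac{\partial}{\partial x}(3x - y) = 3. \]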
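Then we can write Equation (6) as

\[ \frac{\partial f}{\partial x} = \frac{2}{2x + y} - 3\sin(3x - y) = \left(\frac{1}{2x + y}\right)(2) + \bigl(-\sin(3x - y)\bigr)(3) = \left.\frac{\partial f}{\partial u}\right|_{u=2x+y}\frac{\partial u}{\partial x} + \left.\frac{\partial f}{\partial v}\right|_{v=3x-y}\frac{\partial v}{\partial x}. \]

Let's abbreviate this as

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}. \tag{7} \]

This is just Equation (5), except we have to say $\frac{\partial f}{\partial x}$ instead of $\frac{df}{dx}$ because $x$ isn't the only independent variable anymore.

If you're interested, here's how it would look to differentiate $f(x, y) = \ln(2x + y) + \cos(3x - y)$ using Equation (7) without all the lead-up I've written above: You start by saying, "Hmm... Let's say $u = 2x + y$ and $v = 3x - y$," and calculate

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x} = \left(\frac{1}{2x + y}\right)(2) + \bigl(-\sin(3x - y)\bigr)(3) = \frac{2}{2x + y} - 3\sin(3x - y). \]

Exercise 3.0.2. Use the ideas of this section to calculate $\frac{\partial f}{\partial y}$ for $f(x, y) = \ln(2x + y) + \cos(3x - y)$. In the process, you should come up with a chain rule similar to (but not the same as) Equation (7). Hint: Use the same $u$ and $v$ that I did.

3.1 Additional chain rules?

You might be wondering whether there is a different chain rule for every function. Fortunately, the answer is, "nope."

Exercise 3.1.1. Use Equation (7) to calculate $\frac{\partial f}{\partial x}$ for $f(x, y) = \sqrt{xy} + \sqrt{x/y}$. Hint: Choose $u$ and $v$ to be functions that are inside other functions.

Exercise 3.1.2. Use the chain rule you created for Exercise 3.0.2 to calculate $\frac{\partial f}{\partial y}$ for $f(x, y) = \sqrt{xy} + \sqrt{x/y}$.

So: A given chain rule may serve to differentiate more than one function; in fact, lots of functions. Nevertheless, there are lots of chain rules. For example, suppose you have to differentiate $f = \sinh(xz) + \cosh(yz) + \tanh(xyz)$ with respect to $z$. If we let $u = xz$, $v = yz$, and $w = xyz$, the chain rule for $\frac{\partial f}{\partial z}$ looks like this:

\[ \frac{\partial f}{\partial z} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial z} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial z} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial z}. \tag{8} \]

Exercise 3.1.3. Explain why the previous sentence is true.

Exercise 3.1.4. Write down chain rules for $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, for the current example.

Exercise 3.1.5. Go ahead and calculate $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$ for the current example, using the chain rules.

Exercise 3.1.6. Write down the chain rule for differentiating $f(x, y) = \sin(xy)\cos(xy)$ with respect to $x$ and use it to calculate $\frac{\partial f}{\partial x}$. Then do the same for differentiation with respect to $y$. (Hint: There's only one intermediate variable in this example.)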
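Equations (7) and (8) can also be checked numerically. The sketch below is an illustration only: the helper `partial`, the function names `f7`, `f8`, `df7_dx_chain`, `df8_dz_chain`, and the sample points are all assumptions introduced here, and the finite-difference estimate is merely an independent cross-check on the chain-rule sums.

```python
import math

def partial(func, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of func
    with respect to the coordinate at position `index`."""
    forward = list(point)
    backward = list(point)
    forward[index] += h
    backward[index] -= h
    return (func(*forward) - func(*backward)) / (2 * h)

# Equation (7) for f(x, y) = ln(2x + y) + cos(3x - y), with u = 2x + y, v = 3x - y
def f7(x, y):
    return math.log(2 * x + y) + math.cos(3 * x - y)

def df7_dx_chain(x, y):
    u, v = 2 * x + y, 3 * x - y
    df_du, df_dv = 1 / u, -math.sin(v)   # partials of ln u + cos v
    du_dx, dv_dx = 2, 3
    return df_du * du_dx + df_dv * dv_dx

# Equation (8) for f = sinh(xz) + cosh(yz) + tanh(xyz), with u = xz, v = yz, w = xyz
def f8(x, y, z):
    return math.sinh(x * z) + math.cosh(y * z) + math.tanh(x * y * z)

def df8_dz_chain(x, y, z):
    u, v, w = x * z, y * z, x * y * z
    # partials of sinh u + cosh v + tanh w; sech^2 w is written as 1 / cosh(w)^2
    df_du, df_dv, df_dw = math.cosh(u), math.sinh(v), 1 / math.cosh(w) ** 2
    du_dz, dv_dz, dw_dz = x, y, x * y
    return df_du * du_dz + df_dv * dv_dz + df_dw * dw_dz

p7 = (0.8, 0.5)          # arbitrary point with 2x + y > 0
p8 = (0.3, -0.4, 0.9)    # arbitrary point
print(abs(df7_dx_chain(*p7) - partial(f7, p7, 0)) < 1e-6)
print(abs(df8_dz_chain(*p8) - partial(f8, p8, 2)) < 1e-6)
```

Each `*_chain` function is a line-by-line transcription of its chain rule: one product per intermediate variable, then a sum.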
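3.2 Isn't there a better way to write down chain rules?

Fortunately, there is a more flexible and compact way of writing down chain rules than what we have so far. To find out what it is, let's put a chain rule under the microscope. How about the chain rule you should have discovered while doing Exercise 3.0.2? It was

\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}. \]

Hmm... This chain rule is a sum of products... the first product uses $u$ in both factors; the second uses $v$... hmm... $u$ is first and $v$ is second, in some sense?... uh, two derivatives; two components... Is this chain rule a dot product of some vectors or other?

Yes, actually. If we think of $\frac{\partial f}{\partial u}$ and $\frac{\partial f}{\partial v}$ as being the first and second components of a vector, and if we think of $\frac{\partial u}{\partial y}$ and $\frac{\partial v}{\partial y}$ as the first and second components of some other vector, we can write the chain rule as the following dot product:

\[ \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} = \begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix} \cdot \begin{bmatrix} \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial y} \end{bmatrix}. \tag{9} \]

Clever, eh? (Wish I could take credit for it!)

Exercise 3.2.1. Write Equation (7) as a dot product of suitable vectors.

There is another way to write dot products. Some people (myself included) write Equation (9) like this:

\[ \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} = \begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix} \begin{bmatrix} \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial y} \end{bmatrix}. \tag{10} \]

(Note the absence of the dot.) Means EXACTLY the same thing as Equation (9). Right now, it's just another way to write the dot product. Shortly, however, we will see that it's a more powerful and more flexible way of writing certain types of multiplication.

Exercise 3.2.2. Write Equation (7) in the same way as Equation (10).

Equation (10) has some advantages over Equation (9). One is that it has a nice, compact representation. Another is that it can be extended to more complicated situations than writing a chain rule for a partial derivative with respect to a single variable. Let's take these in turn.

3.2.1 A more compact way of writing chain rules

It is customary to use the symbol $\vec{\nabla} f$ to stand for the vector[1] $\begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix}$. (The symbol $\vec{\nabla} f$ is pronounced "del f," "the gradient of f," "grad f," or even "nabla f.") Also, some people write $\frac{\partial}{\partial y}\begin{bmatrix} u \\ v \end{bmatrix}$ for the column $\begin{bmatrix} \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial y} \end{bmatrix}$. (Convince yourself that this makes sense. If you can't, then ask somebody.) I'm too lazy to write $\frac{\partial}{\partial y}\begin{bmatrix} u \\ v \end{bmatrix}$, so I'm just as likely to write $\frac{\partial(u,v)}{\partial y}$ instead. (The symbol $\frac{\partial(u,v)}{\partial y}$ stands for the partial derivative of $\begin{bmatrix} u \\ v \end{bmatrix}$ with respect to $y$. This is sloppiness again, as it takes the column $\begin{bmatrix} u \\ v \end{bmatrix}$, turns it into the row $\begin{bmatrix} u & v \end{bmatrix}$, puts a comma between the $u$ and the $v$, and changes the square brackets into round parentheses, to get the $(u, v)$ in the numerator of the symbol $\frac{\partial(u,v)}{\partial y}$. Shameful, but customary.)

[1] The gradient, being a row instead of a column, is not a vector, but a covector. It is very common to call the gradient a vector, and at this point in your education, it's even safe. So I will strive to resist the temptation to be picky about this.

With sloppy abbreviations like the above in hand, we can write the multiplication $\begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix} \begin{bmatrix} \frac{\partial u}{\partial y} \\ \frac{\partial v}{\partial y} \end{bmatrix}$ as $\vec{\nabla} f\,\frac{\partial(u,v)}{\partial y}$, so that the chain rule of Equation (10) is now

\[ \frac{\partial f}{\partial y} = \vec{\nabla} f\,\frac{\partial(u,v)}{\partial y}. \tag{11} \]

Exercise 3.2.3. Write the chain rule of Equation (7) in the same format as Equation (11). Then write out what it means in terms of a matrix multiplication and a dot product.

Exercise 3.2.4. Repeat the previous exercise for the chain rule of Equation (8).

3.2.2 A more flexible way of writing chain rules

The more compact method of writing chain rules is also more flexible. For example, the chain rule of Equation (8) can be written as

\[ \frac{\partial f}{\partial z} = \vec{\nabla} f\,\frac{\partial(u,v,w)}{\partial z}. \]

Likewise, if we need to differentiate $f = x^2 + y^2 + z^2$ with respect to $y$, the chain rule is

\[ \frac{\partial f}{\partial y} = \vec{\nabla} f\,\frac{\partial(u,v,w)}{\partial y}. \]

(I'm thinking of $f$ as being $u + v + w$, with $u = x^2$, $v = y^2$, and $w = z^2$.)

Exercise 3.2.5. Write out what $\frac{\partial f}{\partial y} = \vec{\nabla} f\,\frac{\partial(u,v,w)}{\partial y}$ means and calculate it, assuming $f = x^2 + y^2 + z^2$, as above.

Exercise 3.2.6. Write down the chain rule for another first partial derivative of $f$, say $\frac{\partial f}{\partial z}$, again assuming $f = x^2 + y^2 + z^2$; write out what this chain rule means, and calculate it.

Example 3.2.7. The density $\rho$ of the water at a point under the surface of the ocean depends on the temperature $T$, the depth $d$, and the salinity $s$ at that point. The temperature and the salinity both depend on the depth. Write down the chain rule for finding out how the density changes with depth, and interpret it.

Fine: $\rho$ depends on $T$, $d$, and $s$. We can express this fact as $\rho = \rho(d, s, T)$ (keeping everything in alphabetical order, for the sake of good bookkeeping). Likewise, $T = T(d)$ and $s = s(d)$. On the other hand, $d$ is just $d$. (You can say $d = d(d)$ if you like, but it seems pretty silly.) The chain rule for how density changes with depth is (by analogy with Equation (7))

\[ \frac{d\rho}{dd} = \vec{\nabla}\rho\,\frac{\partial(d,s,T)}{\partial d}. \]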
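We can calculate this by realizing that

\[ \vec{\nabla}\rho = \begin{bmatrix} \frac{\partial\rho}{\partial d} & \frac{\partial\rho}{\partial s} & \frac{\partial\rho}{\partial T} \end{bmatrix} \quad\text{and}\quad \frac{\partial(d,s,T)}{\partial d} = \begin{bmatrix} \frac{\partial d}{\partial d} \\ \frac{\partial s}{\partial d} \\ \frac{\partial T}{\partial d} \end{bmatrix} = \begin{bmatrix} 1 \\ \frac{ds}{dd} \\ \frac{dT}{dd} \end{bmatrix}. \]

Putting all this together, we get that

\[ \frac{d\rho}{dd} = \vec{\nabla}\rho\,\frac{\partial(d,s,T)}{\partial d} = \begin{bmatrix} \frac{\partial\rho}{\partial d} & \frac{\partial\rho}{\partial s} & \frac{\partial\rho}{\partial T} \end{bmatrix} \begin{bmatrix} 1 \\ \frac{ds}{dd} \\ \frac{dT}{dd} \end{bmatrix} = \frac{\partial\rho}{\partial d} + \frac{\partial\rho}{\partial s}\frac{ds}{dd} + \frac{\partial\rho}{\partial T}\frac{dT}{dd}. \]

Notice that the rightmost expression clearly shows that density depends on depth (directly, without regard to salinity or temperature; that's the $\frac{\partial\rho}{\partial d}$ part), but that density also depends on salinity and temperature, which in turn depend on depth. Of course, we could have said all that with words (just did!), but the statement

\[ \frac{d\rho}{dd} = \frac{\partial\rho}{\partial d} + \frac{\partial\rho}{\partial s}\frac{ds}{dd} + \frac{\partial\rho}{\partial T}\frac{dT}{dd} \]

is cleaner, easier to read, and just plain more elegant. Moreover, this statement shows how the dependence of salinity and temperature on depth is incorporated into the dependence of density on depth. Heh! Try doing that in words!

The foregoing example points out a shortcoming in our notation. We have said elsewhere that $\frac{d\rho}{dd}$ and $\frac{\partial\rho}{\partial d}$ mean the same thing. Yet, in the example above, they don't! What's going on here? Well, in the $\frac{\partial\rho}{\partial s}\frac{ds}{dd} + \frac{\partial\rho}{\partial T}\frac{dT}{dd}$ part of our answer, we treated $s$ and $T$ like intermediate variables. But in the $\frac{\partial\rho}{\partial d}$ part, we did not! In the $\frac{\partial\rho}{\partial d}$ part, we held $s$ and $T$ fixed. It's as though we were saying $\frac{d\rho}{dd}$ means "The partial derivative with respect to $d$, treating $s$ and $T$ as intermediate variables when it suits us," while saying $\frac{\partial\rho}{\partial d}$ means "The partial derivative of $\rho$ with respect to $d$, holding $s$ and $T$ constant." The science and engineering community believe they have a cure for this problem, but in my excessively picky way, I don't believe their cure does the job.[2]

4 The Chain Rule

Our sloppy symbols are actually flexible enough to allow us to put all the partial derivatives of a function in one place. For example, Equation (11) and its cousin $\frac{\partial f}{\partial x} = \vec{\nabla} f\,\frac{\partial(u,v)}{\partial x}$ from Exercise 3.2.3 give us chain rules for the first partial derivatives of some function $f$ with respect to $y$ and $x$, respectively. We can combine these two chain rules into one happy equation, like so:

\[ \frac{\partial f}{\partial(x, y)} = \begin{bmatrix} f_u & f_v \end{bmatrix} \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix}, \tag{12} \]

that is, if we can agree on what the symbols mean.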
You'll recognize the row $\begin{bmatrix} f_u & f_v \end{bmatrix}$ as being the gradient of $f$, though it's laying down on the job. The boxy thing $\begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix}$ is called the Jacobian matrix of $u$ and $v$ (with respect to $x$ and $y$). The symbol for the Jacobian matrix is the hopefully unsurprising $\frac{\partial(u,v)}{\partial(x,y)}$. Using this symbol allows us to write our chain rules together as

\[ \frac{\partial f}{\partial(x, y)} = \vec{\nabla} f\,\frac{\partial(u,v)}{\partial(x, y)}. \tag{13} \]

Fine, but how does this symbol stand for the two chain rules combined? Think of the right-hand side of Equation (12) as a multiplication.[3] To produce one of the chain rules properly, this multiplication has to include multiplying the gradient by $\frac{\partial(u,v)}{\partial x}$. Since $\frac{\partial(u,v)}{\partial x}$ is the left column of the Jacobian matrix, the multiplication required includes taking the dot product of the gradient with $\frac{\partial(u,v)}{\partial x}$. Likewise, we need the dot product of the gradient with $\frac{\partial(u,v)}{\partial y}$, to get the other chain rule. So the multiplication in Equation (12) is a pair of dot products.

[2] If you want to know what the cure is, take a look at pages 844-846 of the text.

[3] There's no symbol between the gradient and the Jacobian matrix, and writing things next to each other with no symbol between has meant multiplication since 5th or 6th grade, yes? So, it's a multiplication.

Specifically, it's

\[ \begin{bmatrix} f_u & f_v \end{bmatrix} \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} = \begin{bmatrix} f_u u_x + f_v v_x & f_u u_y + f_v v_y \end{bmatrix}. \]

If you like, you can write this as

\[ \begin{bmatrix} f_u & f_v \end{bmatrix} \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} = \begin{bmatrix} \vec{\nabla} f\,\frac{\partial(u,v)}{\partial x} & \vec{\nabla} f\,\frac{\partial(u,v)}{\partial y} \end{bmatrix}. \]

I prefer to write it as

\[ \begin{bmatrix} f_u & f_v \end{bmatrix} \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} = \vec{\nabla} f\,\frac{\partial(u,v)}{\partial(x, y)}. \]

Cleaner, yes? So now we can write

\[ \frac{\partial f}{\partial(x, y)} = \vec{\nabla} f\,\frac{\partial(u,v)}{\partial(x, y)}, \]

as in Equation (13). As a bonus, the symbol $\frac{\partial(u,v)}{\partial(x,y)}$ is consistent with symbols like the $\frac{\partial(u,v)}{\partial y}$ we use in Equation (11).

So: Equation (13) is really Equation (12), in disguise. We will call $\frac{\partial f}{\partial(x,y)}$ the total derivative of $f$.

Example 4.0.8. Let's calculate the total derivative of $f = \sinh(x^2 - y^2)\cos(x^2 + y^2)$. To do so, I suggest letting $u = x^2 - y^2$ and $v = x^2 + y^2$. Then $f$ is

\[ f = \sinh u \cos v, \]

which shows how $f$ depends on $u$ and $v$; bear in mind that these two intermediate variables depend on $x$ and $y$. Hmm... Sounds like a job for Equation (13):

\[
\begin{aligned}
\frac{\partial f}{\partial(x, y)} &= \vec{\nabla} f\,\frac{\partial(u,v)}{\partial(x, y)} = \begin{bmatrix} f_u & f_v \end{bmatrix} \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} \\
&= \begin{bmatrix} \cosh(x^2 - y^2)\cos(x^2 + y^2) & -\sinh(x^2 - y^2)\sin(x^2 + y^2) \end{bmatrix} \begin{bmatrix} 2x & -2y \\ 2x & 2y \end{bmatrix} \\
&= \bigl[\, 2x\cosh(x^2 - y^2)\cos(x^2 + y^2) - 2x\sinh(x^2 - y^2)\sin(x^2 + y^2) \\
&\qquad -2y\cosh(x^2 - y^2)\cos(x^2 + y^2) - 2y\sinh(x^2 - y^2)\sin(x^2 + y^2) \,\bigr].
\end{aligned}
\]

(The last expression is supposed to be a row, but it didn't fit, so I put the first entry on one line and the second entry on the following line.)

Exercise 4.0.9. Calculate the total derivative of $f = \sin(x + y)\cos(x - y)$.

Exercise 4.0.10. What would Equation (13) look like if $f = \sin(x^2 + y^2 + z^2)$?

We now finally arrive at the chain rule. I want a nice way to write it. To create a nice way, note first that all our chain rules include the symbol $\vec{\nabla} f$. But the symbol for the Jacobian is different from one context to another, depending on how many intermediate variables there are, and how many independent variables. I will get around this problem by using the symbol $J$ to stand for the Jacobian.
Likewise, the symbol for the total derivative depends on how many independent variables there are. I will get around this problem by using the symbol $Df$ for the total derivative. Here is the long-awaited chain rule:

\[ Df = \vec{\nabla} f\,J. \tag{14} \]
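To make Equation (14) concrete, here is a minimal Python sketch that carries out the row-times-matrix multiplication for $f = \sinh(x^2 - y^2)\cos(x^2 + y^2)$, the function of Example 4.0.8. The function names (`row_times_matrix`, `total_derivative`, `numeric_total_derivative`) and the sample point are assumptions made for this illustration; the finite-difference estimate is there only as an independent cross-check.

```python
import math

def row_times_matrix(row, matrix):
    """Multiply a 1-by-m row by an m-by-n matrix: one dot product per column."""
    n_cols = len(matrix[0])
    return [sum(row[k] * matrix[k][j] for k in range(len(row)))
            for j in range(n_cols)]

def total_derivative(x, y):
    """Df = grad(f) J for f = sinh(u) cos(v), u = x^2 - y^2, v = x^2 + y^2."""
    u, v = x * x - y * y, x * x + y * y
    grad_f = [math.cosh(u) * math.cos(v),      # f_u
              -math.sinh(u) * math.sin(v)]     # f_v
    jacobian = [[2 * x, -2 * y],               # [u_x, u_y]
                [2 * x, 2 * y]]                # [v_x, v_y]
    return row_times_matrix(grad_f, jacobian)

def f(x, y):
    return math.sinh(x * x - y * y) * math.cos(x * x + y * y)

def numeric_total_derivative(x, y, h=1e-6):
    """Finite-difference estimate of [df/dx, df/dy], used only as a cross-check."""
    return [(f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h)]

Df = total_derivative(0.7, 0.4)
Df_numeric = numeric_total_derivative(0.7, 0.4)
print(all(abs(a - b) < 1e-6 for a, b in zip(Df, Df_numeric)))
```

Swapping in a different function only means changing `grad_f` and `jacobian`; the multiplication step stays the same, which is the point of writing the chain rule as $Df = \vec{\nabla} f\,J$.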
Equation (14) may not look very dramatic, but this one equation now includes all the chain rules there are in the universe, from Calc I on up. You may be interested to know that suitable use of matrix multiplication can extend the chain rule to situations in which there are variables between the intermediate variables and the independent variables. I also note in passing that if you want your total derivative to be a genuine vector (as opposed to a row matrix), you can use $(Df)^T = J^T\,(\vec{\nabla} f)^T$; this is what you get when you transpose the matrices in Equation (14) and reverse the order of multiplication.

5 Additional exercises

Exercise 5.0.11. Use Equation (7) and the chain rule you invented in Exercise 3.0.2 to calculate the first partial derivatives of $f(x, y) = xy + x/y$.

Until I can get some more exercises written, look in your Calculus text, in the section on chain rules. They'll talk about "branch diagrams" or "tree diagrams" for helping you with the bookkeeping. That's fine, but try working your textbook's examples using the methods of this document, and see if you get the same answers as the book does. You better!

6 Answers, etc.

Exercise 2.0.1. Let $u = \sinh x$ and $v = x^2$. Then $f = u^2 + \ln v$, so that:

\[ \left.\frac{\partial f}{\partial u}\right|_{u=\sinh x} = \left.2u\right|_{u=\sinh x} = 2\sinh x, \qquad \frac{du}{dx} = \cosh x, \]

\[ \left.\frac{\partial f}{\partial v}\right|_{v=x^2} = \left.\frac{1}{v}\right|_{v=x^2} = \frac{1}{x^2}, \qquad \frac{dv}{dx} = 2x. \]

The derivative of the $u^2$ term is therefore

\[ \left.\frac{\partial f}{\partial u}\right|_{u=\sinh x}\frac{du}{dx} = (2\sinh x)(\cosh x) = 2\sinh x\cosh x, \]

and the derivative of the $\ln v$ term is

\[ \left.\frac{\partial f}{\partial v}\right|_{v=x^2}\frac{dv}{dx} = \frac{1}{x^2}(2x) = \frac{2}{x}. \]

Put the pieces together to get

\[ \frac{df}{dx} = \left.\frac{\partial f}{\partial u}\right|_{u=\sinh x}\frac{du}{dx} + \left.\frac{\partial f}{\partial v}\right|_{v=x^2}\frac{dv}{dx} = 2\sinh x\cosh x + \frac{2}{x}. \]

Note: In practice, people usually think in terms of $\frac{df}{dx} = \frac{\partial f}{\partial u}\frac{du}{dx} + \frac{\partial f}{\partial v}\frac{dv}{dx}$ and substitute in the $\sinh x$ and the $x^2$, as needed. Their work typically looks like this, on paper:

\[ \frac{df}{dx} = \frac{\partial f}{\partial u}\frac{du}{dx} + \frac{\partial f}{\partial v}\frac{dv}{dx} = (2\sinh x)(\cosh x) + \left(\frac{1}{x^2}\right)(2x) = 2\sinh x\cosh x + \frac{2}{x}. \]

Exercise 3.0.2. Knee-jerk reaction: Differentiate the $\ln(2x + y)$ term using the Calc I chain rule, but remember to hold $x$ constant during the differentiation:

\[ \frac{\partial}{\partial y}\ln(2x + y) = \left(\frac{1}{2x + y}\right)\frac{\partial}{\partial y}(2x + y) = \frac{1}{2x + y}. \]
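Then do the same for the cosine term:

\[ \frac{\partial}{\partial y}\cos(3x - y) = \bigl(-\sin(3x - y)\bigr)\frac{\partial}{\partial y}(3x - y) = \sin(3x - y), \]

and combine the results to get

\[ \frac{\partial f}{\partial y} = \frac{1}{2x + y} + \sin(3x - y). \]

What we've done here is

\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}. \]

This is the chain rule similar to Equation (7) that I hoped you'd invent.

Exercise 3.1.1. Let $u = xy$ and $v = x/y$. Then $f = \sqrt{u} + \sqrt{v}$, so

\[ \frac{\partial f}{\partial u} = \frac{1}{2\sqrt{u}} = \frac{1}{2\sqrt{xy}}, \qquad \frac{\partial f}{\partial v} = \frac{1}{2\sqrt{v}} = \frac{1}{2\sqrt{x/y}} = \sqrt{\frac{y}{4x}}, \qquad \frac{\partial u}{\partial x} = y, \qquad \frac{\partial v}{\partial x} = \frac{1}{y}. \]

Equation (7) now says

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x} = \frac{y}{2\sqrt{xy}} + \frac{1}{y}\sqrt{\frac{y}{4x}} = \sqrt{\frac{y}{4x}} + \frac{1}{\sqrt{4xy}}. \]

Exercise 3.1.2. The chain rule you created for Exercise 3.0.2 should have been

\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}. \]

We can use $\frac{\partial f}{\partial u} = \frac{1}{2\sqrt{xy}}$ and $\frac{\partial f}{\partial v} = \sqrt{\frac{y}{4x}}$ from the previous exercise. But instead of $\frac{\partial u}{\partial x}$ and $\frac{\partial v}{\partial x}$, we need $\frac{\partial u}{\partial y} = x$ and $\frac{\partial v}{\partial y} = -\frac{x}{y^2}$. Then

\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} = \frac{1}{2\sqrt{xy}}(x) + \sqrt{\frac{y}{4x}}\left(-\frac{x}{y^2}\right) = \sqrt{\frac{x}{4y}} - \sqrt{\frac{x}{4y^3}}. \]

Exercise 3.1.3. Well, $f$ depends on $u$, $v$, and $w$, all of which depend on $z$, but in different ways. So the contributions to $\frac{\partial f}{\partial z}$ that $u$, $v$, and $w$ all make have to be accounted for separately. The $\frac{\partial f}{\partial u}\frac{\partial u}{\partial z}$ term describes the dependence of $f$ on $z$, via $u$, and likewise for the terms $\frac{\partial f}{\partial v}\frac{\partial v}{\partial z}$ and $\frac{\partial f}{\partial w}\frac{\partial w}{\partial z}$. Adding the three terms together gives the total dependence of $f$ on $z$.

Exercise 3.1.4.

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial x}. \]

The term $\frac{\partial f}{\partial v}\frac{\partial v}{\partial x}$ is missing, because $v = yz$ does not depend on $x$. (If you prefer, it's missing because $\frac{\partial v}{\partial x} = 0$.) Also,

\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial y}. \]

The term $\frac{\partial f}{\partial u}\frac{\partial u}{\partial y}$ is missing, because $u = xz$ does not depend on $y$.

Exercise 3.1.5.

\[
\begin{aligned}
\frac{\partial f}{\partial x} &= \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial x} = z\cosh(xz) + yz\,\mathrm{sech}^2(xyz), \\
\frac{\partial f}{\partial y} &= \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial y} = z\sinh(yz) + xz\,\mathrm{sech}^2(xyz), \\
\frac{\partial f}{\partial z} &= \frac{\partial f}{\partial u}\frac{\partial u}{\partial z} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial z} + \frac{\partial f}{\partial w}\frac{\partial w}{\partial z} = x\cosh(xz) + y\sinh(yz) + xy\,\mathrm{sech}^2(xyz).
\end{aligned}
\]

Exercise 3.1.6. Let $u = xy$. Then $f = \sin u\cos u$, and

\[ \frac{\partial f}{\partial x} = \frac{df}{du}\frac{\partial u}{\partial x} = -y\sin^2 u + y\cos^2 u = y(\cos^2 xy - \sin^2 xy) = y\cos(2xy). \]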
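Likewise,

\[ \frac{\partial f}{\partial y} = \frac{df}{du}\frac{\partial u}{\partial y} = -x\sin^2 u + x\cos^2 u = x(\cos^2 xy - \sin^2 xy) = x\cos(2xy). \]

Exercise 3.2.1.

\[ \frac{\partial f}{\partial x} = \begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix} \cdot \begin{bmatrix} \frac{\partial u}{\partial x} \\ \frac{\partial v}{\partial x} \end{bmatrix}. \]

Exercise 3.2.2.

\[ \frac{\partial f}{\partial x} = \begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix} \begin{bmatrix} \frac{\partial u}{\partial x} \\ \frac{\partial v}{\partial x} \end{bmatrix}. \]

Exercise 3.2.3.

\[ \frac{\partial f}{\partial x} = \vec{\nabla} f\,\frac{\partial(u,v)}{\partial x} = \begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix} \begin{bmatrix} \frac{\partial u}{\partial x} \\ \frac{\partial v}{\partial x} \end{bmatrix} = \begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} \end{bmatrix} \cdot \begin{bmatrix} \frac{\partial u}{\partial x} \\ \frac{\partial v}{\partial x} \end{bmatrix}. \]

Oh. This looks a lot like what we wrote for the previous two exercises! (Sorry about the repetition. I wanted to drive home the point that we're just writing the same thing in three different ways.)

Exercise 3.2.4.

\[ \frac{\partial f}{\partial z} = \vec{\nabla} f\,\frac{\partial(u,v,w)}{\partial z} = \begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} & \frac{\partial f}{\partial w} \end{bmatrix} \begin{bmatrix} \frac{\partial u}{\partial z} \\ \frac{\partial v}{\partial z} \\ \frac{\partial w}{\partial z} \end{bmatrix} = \begin{bmatrix} \frac{\partial f}{\partial u} & \frac{\partial f}{\partial v} & \frac{\partial f}{\partial w} \end{bmatrix} \cdot \begin{bmatrix} \frac{\partial u}{\partial z} \\ \frac{\partial v}{\partial z} \\ \frac{\partial w}{\partial z} \end{bmatrix}. \]

Exercise 3.2.5. $\frac{\partial f}{\partial y} = \vec{\nabla} f\,\frac{\partial(u,v,w)}{\partial y}$ means

\[ \frac{\partial f}{\partial y} = \vec{\nabla} f\,\frac{\partial(u,v,w)}{\partial y} = \begin{bmatrix} f_u & f_v & f_w \end{bmatrix} \begin{bmatrix} u_y \\ v_y \\ w_y \end{bmatrix}, \]

which we calculate as

\[ \frac{\partial f}{\partial y} = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 2y \\ 0 \end{bmatrix} = 1(0) + 1(2y) + 1(0) = 2y. \]

Exercise 4.0.9. Let $u = x + y$ and $v = x - y$, so that $f = \sin u\cos v$. Then

\[
\begin{aligned}
\frac{\partial f}{\partial(x, y)} &= \vec{\nabla} f\,\frac{\partial(u,v)}{\partial(x, y)} = \begin{bmatrix} f_u & f_v \end{bmatrix} \begin{bmatrix} u_x & u_y \\ v_x & v_y \end{bmatrix} \\
&= \begin{bmatrix} \cos u\cos v & -\sin u\sin v \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \\
&= \begin{bmatrix} \cos u\cos v - \sin u\sin v & \cos u\cos v + \sin u\sin v \end{bmatrix}.
\end{aligned}
\]

Exercise 4.0.10. Let $u = x^2$, $v = y^2$, and $w = z^2$, so that $f = \sin(u + v + w)$. Then the total derivative of $f$ is

\[
\begin{aligned}
\frac{\partial f}{\partial(x, y, z)} &= \vec{\nabla} f\,\frac{\partial(u,v,w)}{\partial(x, y, z)} = \begin{bmatrix} f_u & f_v & f_w \end{bmatrix} \begin{bmatrix} u_x & u_y & u_z \\ v_x & v_y & v_z \\ w_x & w_y & w_z \end{bmatrix} \\
&= \begin{bmatrix} \cos(u + v + w) & \cos(u + v + w) & \cos(u + v + w) \end{bmatrix} \begin{bmatrix} 2x & 0 & 0 \\ 0 & 2y & 0 \\ 0 & 0 & 2z \end{bmatrix} \\
&= \begin{bmatrix} 2x\cos(u + v + w) & 2y\cos(u + v + w) & 2z\cos(u + v + w) \end{bmatrix} \\
&= \begin{bmatrix} 2x\cos(x^2 + y^2 + z^2) & 2y\cos(x^2 + y^2 + z^2) & 2z\cos(x^2 + y^2 + z^2) \end{bmatrix}.
\end{aligned}
\]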
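Exercise 5.0.11. Equation (7) says $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x}$. If we let $u = xy$ and $v = x/y$, then $f = u + v$, and

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x} = (1)(y) + (1)\left(\frac{1}{y}\right) = y + \frac{1}{y}. \]

Similarly, in Exercise 3.0.2, you should have found that $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y}$. This implies that

\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} = (1)(x) + (1)\left(-\frac{x}{y^2}\right) = x - \frac{x}{y^2}. \]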
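The two answers to Exercise 5.0.11 can also be obtained in one go from the transposed form of the chain rule mentioned in Section 4, $(Df)^T = J^T(\vec{\nabla} f)^T$. The Python sketch below is a hedged illustration of that remark: the helper names `transpose`, `matrix_times_column`, and `gradient_column`, and the sample point $(2, 4)$, are assumptions introduced here.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*matrix)]

def matrix_times_column(matrix, column):
    """Multiply an n-by-m matrix by an m-entry column (given as a flat list)."""
    return [sum(entry * c for entry, c in zip(row, column)) for row in matrix]

def gradient_column(x, y):
    """(Df)^T = J^T (grad f)^T for f = u + v with u = xy, v = x/y."""
    grad_f = [1.0, 1.0]                    # [f_u, f_v]
    jacobian = [[y, x],                    # [u_x, u_y]
                [1 / y, -x / y ** 2]]      # [v_x, v_y]
    return matrix_times_column(transpose(jacobian), grad_f)

x, y = 2.0, 4.0
result = gradient_column(x, y)
expected = [y + 1 / y, x - x / y ** 2]     # the two answers derived above
print(all(abs(a - b) < 1e-12 for a, b in zip(result, expected)))
```

The output is a column (a flat list here) whose first entry is $\partial f/\partial x$ and whose second is $\partial f/\partial y$, the same numbers as the row $Df$, just stood upright.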