II. MULTIVARIATE CALCULUS

The first lecture covered functions where a single input goes in and a single output comes out. Most economic applications aren't so simple. In most cases, a number of variables influence a decision, or a number of factors are required to produce some good. These are functions of many variables. If f is a function of X and Z with range Y, this is written:

y = f(x, z)    f : X × Z → Y

Sometimes the domains X and Z will be the same, so this might be written:

f : X² → Y    or    f : X × X → Y

Of course, f could be a function of more than two inputs. A production function might have capital, labor, and material inputs:

Y = F(K, L, M)

Utility might be a function of goods (which are consumed in nonnegative amounts), taking on any real value:

U : ℝ₊ⁿ → ℝ

Demand for a particular good x is a function of the price of that good, p₁, the price of other goods (let's say just p₂, the price of good y), and the person's wealth:

x = x(p₁, p₂, w)

In undergrad micro, you probably talked about how quantity demanded changes when something else, like the price of y, changes, holding everything else constant, or ceteris paribus. This translates into the idea of a partial derivative, denoted by:

∂x/∂p₂

or sometimes:

x_p₂(p₁, p₂, w)    or    x₂(p₁, p₂, w)

The price of good x is held constant, and wealth is held constant; though the person's wealth may change if he is a large producer of good y, we're not concerned about those possible effects. A subscripted variable denotes the partial derivative with respect to that variable; a subscripted number means the partial derivative with respect to that argument of the function. The marginal product of labor might be denoted by F_L(K, L, M) or F₂(K, L, M). However, I dislike these notations.

Fall 2007 math class notes, page 7
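The ceteris-paribus idea can be checked numerically: nudge p₂ a little while holding p₁ and w fixed. A minimal sketch, with a made-up demand function and illustrative numbers of my own:

```python
# Partial derivative as a ceteris-paribus experiment.
# The demand function below is hypothetical, chosen only for illustration.
def x_demand(p1, p2, w):
    return w / (p1 + 2 * p2)

p1, p2, w = 1.0, 2.0, 10.0
h = 1e-6

# nudge p2 only; p1 and w stay fixed
dx_dp2 = (x_demand(p1, p2 + h, w) - x_demand(p1, p2 - h, w)) / (2 * h)
# analytically, dx/dp2 = -2w/(p1 + 2*p2)^2 = -20/25 = -0.8
```

The same recipe works for any argument of the function: hold the others fixed and difference in the one you care about.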

In practice, computing a partial derivative ∂f/∂x is like pretending that f(x, z) is just a function of x, with z constant. Because of this, the addition rule, the product rule, the quotient rule, and the power rule all work the same as in the univariate case.

As with univariate functions, you can take the second derivative of a multivariate function. We write this as ∂²f/∂x². You can also take the cross-partial derivative: ∂²f/∂x∂z is obtained by first taking the derivative of f with respect to x, then taking the derivative of that function with respect to z. (The order doesn't matter.)

Example: The Cobb-Douglas utility function is u(x, z) = Ax^α z^(1−α). The partial derivatives of this function are ∂u/∂x = Aαx^(α−1) z^(1−α) and ∂u/∂z = A(1−α)x^α z^(−α). The second derivatives are ∂²u/∂x² = Aα(α−1)x^(α−2) z^(1−α) and ∂²u/∂z² = A(1−α)(−α)x^α z^(−α−1). The cross-partial derivative is ∂²u/∂x∂z = Aα(1−α)x^(α−1) z^(−α) = ∂²u/∂z∂x.

The partial derivative has a ceteris paribus interpretation: it holds all other variables constant. Considering that other factors may change is the notion of a total derivative. Consider that demand for x is a composite function of prices and wealth, where wealth is itself a function of prices:

x = x(p₁, p₂, w(p₂))

Then the total derivative of x with respect to the price of y is:

dx/dp₂ = ∂x(p₁, p₂, w)/∂p₂ + [∂x(p₁, p₂, w)/∂w] · [dw/dp₂]

The first term on the right-hand side is the partial derivative of x with respect to the price of y. This tells you the effect holding everything else constant. For the total derivative, we also add in the other effects; in this case, that the person's wealth changes when the price of y changes, and that his demand for x changes when his wealth changes. The second term on the right-hand side comes from applying the chain rule. At this point, let's try some problems.

We can combine all partial derivatives together and give the differential of a function.
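The Cobb-Douglas partials above, and the claim that the order of cross-differentiation doesn't matter, can be verified with finite differences. The values of A, α, and the evaluation point below are illustrative choices of my own:

```python
# Finite-difference check of the Cobb-Douglas partial derivatives
# and of the symmetry of the cross-partial.
A, alpha = 2.0, 0.3
u = lambda x, z: A * x**alpha * z**(1 - alpha)

x0, z0 = 1.5, 0.8
h = 1e-6

# analytic first partials from the text
u_x = A * alpha * x0**(alpha - 1) * z0**(1 - alpha)
u_z = A * (1 - alpha) * x0**alpha * z0**(-alpha)

# central finite differences should agree with them
u_x_num = (u(x0 + h, z0) - u(x0 - h, z0)) / (2 * h)
u_z_num = (u(x0, z0 + h) - u(x0, z0 - h)) / (2 * h)

# cross-partial, numerically (a slightly wider step keeps rounding error down)
k = 1e-5
u_xz_num = (u(x0 + k, z0 + k) - u(x0 + k, z0 - k)
            - u(x0 - k, z0 + k) + u(x0 - k, z0 - k)) / (4 * k * k)
u_xz = A * alpha * (1 - alpha) * x0**(alpha - 1) * z0**(-alpha)
```

Because the four-point stencil for the cross-partial is symmetric in x and z, it delivers the same number whichever variable you "differentiate first".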
Remember that you can use the first derivative to approximate the change in a function:

Δy ≈ f′(x) Δx

(This has a graphical interpretation as well.) When the Δx gets really, really tiny, this is a really, really good approximation. When it gets infinitely tiny, or infinitesimal, this turns out to be exactly correct, not an approximation at all. The convention in calculus is, of course, to use dx to denote this small change:

dy = f′(x) dx
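A quick numeric aside (with a function and step sizes of my own choosing) shows how the quality of the approximation Δy ≈ f′(x) Δx improves as the step shrinks:

```python
# The differential approximation improves as dx shrinks:
# the error behaves roughly like dx**2 for smooth functions.
f = lambda x: x**3
fprime = lambda x: 3 * x**2

x0 = 2.0
errors = []
for dx in (0.1, 0.01, 0.001):
    exact = f(x0 + dx) - f(x0)   # true change in y
    approx = fprime(x0) * dx     # differential approximation
    errors.append(exact - approx)
# each error is roughly 100x smaller than the previous one
```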

This is known as a differential. If you divide both sides by dx, you get the derivative. In multivariate calculus, when you have a function like y = f(w, x, z), you can write out its complete differential as:

dy = (∂f/∂w) dw + (∂f/∂x) dx + (∂f/∂z) dz

For example, take the function:

y = f(w, x, z) = 7x² + 12x³w² + w/z − 6z

The total differential of this is:

dy = (24x³w + 1/z) dw + (14x + 36x²w²) dx + (−w/z² − 6) dz

We can use this to talk about how y changes when all of the other variables change. It means that it is approximately true that:

Δy ≈ (24x³w + 1/z) Δw + (14x + 36x²w²) Δx + (−w/z² − 6) Δz

You can plug Δw, Δx, and Δz into this equation and get the approximate Δy. The differential form is exactly (not just approximately) true when the changes are infinitesimal.

You can use the total differential in other calculations. Suppose that x and z don't change; we want to see how a small change in w alone affects y. Translating this into math: dx = dz = 0, dw > 0 (but really tiny) means that:

dy = (24x³w + 1/z) dw

For a one-unit change in w, y changes by 24x³w + 1/z. This is exactly the same as the partial derivative of y with respect to w, which shouldn't be much of a surprise. However, we can also use the differential to come up with more interesting calculations.

Suppose that the total product, Y, of a firm depends on the amount of capital and labor it uses, K and L:

Y = F(K, L)

Suppose that the firm were to decrease its labor by a few units. How much would it have to increase its capital in order to keep output the same? When we find the marginal rate of technical substitution, we are looking for dK/dL, holding dY = 0. The differential of this function is:

dy = F F dk + K L dl We want to set dy = 0 (so set the left-hand side equal to zero), and rearrange for dk dl. 0 = F F F F dk dk + dl dk = dl K L K L dl = F L F K You might have known that the marginal rate of technical substitution was equal to the ratio of marginal products, but this is the derivation. Another interesting question would is how much of one good a person would require to offset in utility changes from a change in another good. Let s say there are goods, labeled x 1 through x, in the person s utility function: U = U ( x 1,, x 3,, x ) The differential of this function is: du = U x 1 dx 1 + U d + + U x dx We want to know how much of good k the person would trade for good m, keeping his utility the same. He is not trading another other goods. Then: 0 = U x k dx k + U x m dx m Rearranging produces the marginal rate of substitution: dx k dx m = U x m U x k Again, you may recall that the MRS equals the ratio of marginal utilities. It should also be the slope of a line tangent to the indifference curve at that point. Think about this in a two-good world draw some pictures in (x k, x m ) space. If we start with a particular point and name all others points which do not change the utility that is, the set of all points such that du = 0 we ve identified an indifference curve. Its slope at this particular point would be described by dx k dx m, which is exactly what we found here. Indifference curves and technology frontiers are both examples of implicit functions. An explicit function is what we usually think of as a function: y = f ( x) Fall 2007 math class notes, page 10

An implicit function looks like:

c = f(x, y)

where c is some constant. For example, when we have two goods x₁ and x₂, an indifference curve is an implicit function: it contains all points such that U(x₁, x₂) = u₀ for some level of utility u₀. This really is a function in the sense that for each x₁, there is exactly one x₂ that corresponds with it. However, sometimes it's tricky to isolate x₂ as an explicit function of x₁. With a Cobb-Douglas utility function like

U(x₁, x₂) = x₁^α x₂^(1−α)

you would be able to solve for the equation of the indifference curve:

x₁^α x₂^(1−α) = u₀   ⟹   x₂^(1−α) = u₀ x₁^(−α)   ⟹   x₂ = (u₀ x₁^(−α))^(1/(1−α))

and from this, you could obtain the slope of the indifference curve at any point:

dx₂/dx₁ = −(α/(1−α)) u₀^(1/(1−α)) x₁^(−α/(1−α)−1)

On the other hand, a simple quadratic utility function looks like:

U(x₁, x₂) = a x₁ − b x₁² + c x₂ − d x₂² + e x₁x₂

The indifference curves are perfectly good implicit functions,

a x₁ − b x₁² + c x₂ − d x₂² + e x₁x₂ = u₀

but it's difficult to solve this for an explicit relationship between the variables. However, we can still find the slope of the indifference curve fairly easily by using the differential:

(a − 2b x₁ + e x₂) dx₁ + (c − 2d x₂ + e x₁) dx₂ = 0

This can be rearranged to give us:

dx₂/dx₁ = −(a − 2b x₁ + e x₂)/(c − 2d x₂ + e x₁)

The point is that even if you can't solve y explicitly in terms of x, you can still evaluate its derivative. When you're using a generic utility function or production function, you can't solve explicitly for one variable in terms of another. You can still look at marginal effects, as we did earlier. These are all applications of my favorite principle in mathematics, the implicit function theorem.
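In the Cobb-Douglas case we can compute the indifference-curve slope both ways, from the explicit solution and from the ratio of marginal utilities, and confirm they agree. The values of α, u₀, and x₁ below are illustrative choices of my own:

```python
# Explicit slope vs. implicit slope (-U_1/U_2) for Cobb-Douglas utility.
alpha, u0, x1 = 0.3, 2.0, 1.5

# point on the indifference curve: x2 = (u0 * x1^(-alpha))^(1/(1-alpha))
x2 = (u0 * x1**(-alpha)) ** (1 / (1 - alpha))

# slope from the explicit solution
slope_explicit = (-(alpha / (1 - alpha))
                  * u0**(1 / (1 - alpha))
                  * x1**(-alpha / (1 - alpha) - 1))

# slope from the differential: -U_1/U_2 at (x1, x2)
U1 = alpha * x1**(alpha - 1) * x2**(1 - alpha)
U2 = (1 - alpha) * x1**alpha * x2**(-alpha)
slope_implicit = -U1 / U2
```

The two numbers coincide, which is the implicit-differentiation claim in miniature: you never needed the explicit solution to get the slope.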

Theorem: Let f(x, y) be a function that is continuously differentiable in the neighborhood of a particular point (x*, y*). Suppose that f(x*, y*) = c, and ∂f/∂y ≠ 0 at (x*, y*). Then there exists a function φ such that y = φ(x) and:

dy/dx = −(∂f/∂x)/(∂f/∂y)

in this neighborhood of (x*, y*).

This is very useful in economics. For instance, consider a person who gets utility from consumption when young and when old. The first of these equals w₁ − s, wealth when young minus savings, and the second is w₂ + (1+r)s, wealth when old plus the return on savings. The utility function is:

U(c₁, c₂) = U(w₁ − s, w₂ + (1+r)s)

We usually find that a utility-maximizing individual sets the marginal rate of substitution of consumption when old for consumption when young equal to the gross interest rate:

[∂U(w₁ − s, w₂ + (1+r)s)/∂c₁] / [∂U(w₁ − s, w₂ + (1+r)s)/∂c₂] = 1 + r

How does a change in the interest rate affect savings? One side of the equation needs to be a constant, which can almost always be achieved by simply subtracting one side off. It also looks like we can make things simpler by multiplying through by the denominator. Let's give this implicit function the name Q:

Q(w₁, w₂, s, r) ≡ ∂U(w₁ − s, w₂ + (1+r)s)/∂c₁ − (1+r) · ∂U(w₁ − s, w₂ + (1+r)s)/∂c₂ = 0

The implicit function theorem tells how to find all of these effects; for instance,

ds/dr = −(∂Q/∂r)/(∂Q/∂s)

Finally, differentials can sometimes even help us take the derivatives of explicit functions, especially ones that are a bit messy. Suppose that we're given:

y = aˣ

and we're asked to find the derivative dy/dx. The power rule doesn't work here; that works with functions like xᵃ, not the other way around. We need to get x out of the exponent, and the easiest way to do that is by taking the natural logarithm of both sides of the equation:

ln y = ln(aˣ) = x ln a

Now we find the total differential of this function:

dy/y = ln a dx

and we rearrange to get dy/dx:

dy = y ln a dx   ⟹   dy/dx = y ln a = aˣ ln a

That's it for taking derivatives. It's now time to briefly introduce matrix notation, since matrices come in handy when working with derivatives of multivariate functions. I like to think of matrices as a very compact notation summarizing the variables. For example, we might have a function that depends on a bunch of xes:

y = f(x₁, x₂, …, x_k)

It becomes cumbersome to write all of those xes repeatedly. Instead, let's just let x = (x₁, x₂, …, x_k) denote that bunch of x variables. We call x a k-dimensional vector: a collection of k variables in a specific order. We can write this function as:

y = f(x)

This simplifies things greatly. Vectors are also useful when we have a bunch of similar variables, and another bunch of similar variables, and we multiply them pairwise. A budget constraint is one example:

p₁x₁ + p₂x₂ + … + p_L x_L ≤ w

On the left-hand side, we have L different prices and L different goods, and we multiply the price of one good by the quantity of that good. The covariance between two variables is another example:

Cov(X, Y) = [Σᵢ₌₁ᴺ XᵢYᵢ − N X̄Ȳ] / (N − 1) = [(X₁Y₁ + X₂Y₂ + … + X_N Y_N) − N X̄Ȳ] / (N − 1)

where X̄ and Ȳ are the sample means. In the first term on the right-hand side, we have a collection of N values of X and N values of Y, and we multiply each X by the corresponding value of Y. It should be clear that we could write the bundle of goods in the first example as a vector: x = (x₁, x₂, …, x_L) ∈ ℝᴸ. We can do the same with prices: p = (p₁, …, p_L) ∈ ℝᴸ. In the second example, we can stack all of the X random

variables into a vector, and we can stack all of the Y values into another vector: X = (X₁, X₂, …, X_N) and Y = (Y₁, Y₂, …, Y_N). We would like to have a simple notation to indicate this pairwise multiplication of the elements of the vectors. Given two vectors a, b ∈ ℝⁿ, the dot product or inner product of the vectors is:

a · b = Σᵢ₌₁ⁿ aᵢbᵢ

So we can represent the budget constraint as simply

p · x ≤ w

and the covariance is defined as:

Cov(X, Y) = (X · Y − N X̄Ȳ) / (N − 1)

That's it for multiplication. There's little else that can be said about vectors that doesn't generalize to matrices, so we'll move there next. Whereas you can think of a vector as a single list of similar variables, a matrix consists of multiple side-by-side lists. Technically, an n × k matrix X consists of nk elements, arranged in n rows and k columns:

X = [ x₁₁ x₁₂ … x₁ₖ ]
    [ x₂₁ x₂₂ … x₂ₖ ]
    [  …           ]
    [ xₙ₁ xₙ₂ … xₙₖ ]

As with vectors, the elements of a matrix are scalars. A scalar is a real number (or a function that takes on a specific value). Incidentally, a vector is just a special kind of matrix: it is a matrix with a single column. An n-dimensional vector is nothing more or less than an n × 1 matrix. Anyhow, a 4 × 2 matrix would look like:

A = [ 2 1 ]
    [ 3 0 ]
    [ 5 7 ]
    [ 8 7 ]

And a 1 × 3 matrix might be:

B = [ 3 1 2 ]
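Both uses of the dot product, the budget constraint and the covariance formula, can be sketched with a few lines of code. All of the prices, quantities, and data below are made-up illustrative values:

```python
# The dot product as pairwise multiply-and-sum.
def dot(a, b):
    assert len(a) == len(b)
    return sum(ai * bi for ai, bi in zip(a, b))

# budget constraint: p . x <= w
p = [2.0, 1.5, 4.0]   # prices of the L = 3 goods
x = [3.0, 2.0, 1.0]   # quantities of each good
w = 15.0              # wealth
expenditure = dot(p, x)           # 2*3 + 1.5*2 + 4*1 = 13.0
within_budget = expenditure <= w  # the bundle is affordable

# covariance as (X . Y - N*Xbar*Ybar)/(N - 1)
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 4.0, 5.0]
N = len(X)
Xbar, Ybar = sum(X) / N, sum(Y) / N
cov = (dot(X, Y) - N * Xbar * Ybar) / (N - 1)
```

Both formulas are the same operation on different vectors, which is exactly why the dot-product notation earns its keep.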

The dimensions of a matrix are always given in row × column order. It is important to keep these straight.

A familiar matrix-like thing is a spreadsheet. I might keep students' exam scores in an Excel file, like:

Student   Exam 1   Exam 2   Exam 3
Ann         90       85       86
Bob         78       62       73
Carl        83       86       91
Doris       92       91       90
Pat         97       98       93

Essentially, I have a 5 × 3 matrix of grades,

G = [ 90 85 86 ]
    [ 78 62 73 ]
    [ 83 86 91 ]
    [ 92 91 90 ]
    [ 97 98 93 ]

The i-th column of this matrix is the vector of scores on the i-th exam; the j-th row represents the grades of the j-th student.

There are three things that you need to know how to do with matrices now: transpose, add, and multiply. We'll learn the rest later. Given an n × k matrix A with the entries described as above, the transpose of A is the k × n matrix A′ (sometimes written as Aᵀ) that results from interchanging the columns and rows of A. That is, the i-th column of A becomes the i-th row of A′; the j-th row of A becomes the j-th column of A′:

A = [ a₁₁ a₁₂ … a₁ₖ ]
    [ a₂₁ a₂₂ … a₂ₖ ]
    [  …           ]
    [ aₙ₁ aₙ₂ … aₙₖ ]    (n × k)

A′ = [ a₁₁ a₂₁ … aₙ₁ ]
     [ a₁₂ a₂₂ … aₙ₂ ]
     [  …           ]
     [ a₁ₖ a₂ₖ … aₙₖ ]    (k × n)

Think of this like flipping the matrix on its diagonal.

Example: With the matrix of grades above,

G = [ 90 85 86 ]
    [ 78 62 73 ]
    [ 83 86 91 ]
    [ 92 91 90 ]
    [ 97 98 93 ]

G′ = [ 90 78 83 92 97 ]
     [ 85 62 86 91 98 ]
     [ 86 73 91 90 93 ]

Addition of matrices is fairly straightforward. Given two matrices A and B that have the same dimension n × k, their sum A + B is also an n × k matrix, which we obtain by adding elements in the corresponding positions:

A + B = [ a₁₁+b₁₁  a₁₂+b₁₂  …  a₁ₖ+b₁ₖ ]
        [ a₂₁+b₂₁  a₂₂+b₂₂  …  a₂ₖ+b₂ₖ ]
        [  …                          ]
        [ aₙ₁+bₙ₁  aₙ₂+bₙ₂  …  aₙₖ+bₙₖ ]

Not all matrices can be added; their dimensions must be exactly the same. As with addition of scalars (that is, addition as you know it), matrix addition is both commutative and associative; that is, if A and B and C are matrices of the same dimension, then (A + B) + C = A + (B + C) and A + B = B + A.

Example: Let D and E be the matrices below:

D = [ 1 2 ]     E = [ 1 0 ]
    [ 3 4 ]         [ 1 1 ]
    [ 6 7 ]         [ 0 1 ]

Then their sum is the matrix:

D + E = [ 1 2 ]   [ 1 0 ]   [ 1+1  2+0 ]   [ 2 2 ]
        [ 3 4 ] + [ 1 1 ] = [ 3+1  4+1 ] = [ 4 5 ]
        [ 6 7 ]   [ 0 1 ]   [ 6+0  7+1 ]   [ 6 8 ]

Again, matrix addition probably feels very natural. Matrix subtraction is the same. There are two types of multiplication used with matrices, and the first should also feel natural. This is called scalar multiplication: when we multiply an entire matrix by a constant value. If λ is some scalar (just a single number), and B is an n × k matrix, then λB is computed by multiplying each component of B by the constant λ:

B = [ b₁₁ b₁₂ … b₁ₖ ]         λB = [ λb₁₁ λb₁₂ … λb₁ₖ ]
    [ b₂₁ b₂₂ … b₂ₖ ]              [ λb₂₁ λb₂₂ … λb₂ₖ ]
    [  …           ]              [  …              ]
    [ bₙ₁ bₙ₂ … bₙₖ ]  (n × k)     [ λbₙ₁ λbₙ₂ … λbₙₖ ]  (n × k)

Scalar multiplication has all the familiar properties: it is distributive, commutative, and associative. That is, λ(A + B) = λA + λB, (λ + μ)A = λA + μA, and λ(μA) = (λμ)A = μ(λA).

Example: Use the matrix D from the previous example. Then 4D is the matrix:

4D = [ 4·1  4·2 ]   [  4  8 ]
     [ 4·3  4·4 ] = [ 12 16 ]
     [ 4·6  4·7 ]   [ 24 28 ]

Multiplying one matrix by another matrix is more complicated. Matrix multiplication is only defined between an n × k matrix A and a k × m matrix B, and the order matters. The number of columns in the first must equal the number of rows in the second. Their product is the n × m matrix C, whose ij-th element is defined as:

c_ij = a_i1 b_1j + a_i2 b_2j + … + a_ik b_kj

In other words, we get c_ij by taking the dot product of the i-th row of A and the j-th column of B:

c_ij = (row i of A) · (column j of B)

Notice that multiplying a row of A by a column of B is unlikely to give you the same answer as multiplying a column of A by a row of B. Matrix multiplication is not commutative: AB ≠ BA (except by coincidence, or when both are diagonal matrices of the same dimension). It is very, very important to keep the order right. Here are two other rules to know about matrix multiplication:

AB = 0 does not imply (A = 0 or B = 0)

and:

AB = AC does not imply B = C

except in special cases. Fortunately, matrix multiplication is still associative and distributive. That is, A(BC) = (AB)C and A(B + C) = AB + AC. This makes multiplication a bit easier.

Because I find it really hard to remember which column gets multiplied by which row and ends up where, I use this trick to keep everything straight when multiplying matrices. I align the two matrices A and B so that the second one is above and to the right of the first. For each row i of A, I trace a line out to the right, and for each column j of B, a line going down; where these intersect is where their product lies in the matrix C. This is like a coordinate system for the c_ij:

                          [ b₁₁ … b₁ⱼ … b₁ₘ ]
                          [  …             ]
                          [ bₖ₁ … bₖⱼ … bₖₘ ]

[ a₁₁ … a₁ₖ ]             [ c₁₁ … c₁ⱼ … c₁ₘ ]
[  …       ]             [  …             ]
[ aᵢ₁ … aᵢₖ ]             [ cᵢ₁ … cᵢⱼ … cᵢₘ ]
[  …       ]             [  …             ]
[ aₙ₁ … aₙₖ ]             [ cₙ₁ … cₙⱼ … cₙₘ ]

I also find this trick very useful for multiplying a bunch of matrices. If we have to find the product ABD of three matrices, then once I find C = AB as above, all I have to do is stick the matrix D immediately to the right of B, and I have my coordinate system for the product of C and D.

Example: Let F be a 2 × 2 matrix, and let G be a 2 × 2 matrix, defined below:

F = [ 1 2 ]     G = [ −1 0 ]
    [ 3 4 ]         [  1 2 ]

Then the product FG is the 2 × 2 matrix:

FG = [ 1 2 ] [ −1 0 ]   [ 1(−1)+2(1)  1(0)+2(2) ]   [ −1+2  0+4 ]   [ 1 4 ]
     [ 3 4 ] [  1 2 ] = [ 3(−1)+4(1)  3(0)+4(2) ] = [ −3+4  0+8 ] = [ 1 8 ]

Example: Let C be a 2 × 3 matrix, and let D be a 3 × 2 matrix, defined below:

C = [ 1 2  0 ]     D = [  1 2 ]
    [ 0 3 −1 ]         [  3 4 ]
                       [ −6 7 ]

Then the product CD is the 2 × 2 matrix:

CD = [ 1(1)+2(3)+0(−6)     1(2)+2(4)+0(7) ]   [ 1+6+0  2+8+0  ]   [  7 10 ]
     [ 0(1)+3(3)+(−1)(−6)  0(2)+3(4)+(−1)(7) ] = [ 0+9+6  0+12−7 ] = [ 15  5 ]

Let's return to calculus now. When we have a function like y = f(x), we have a lot of partial derivatives: ∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₖ. It is convenient to define a matrix with all of these partial derivatives: ∂f/∂x is the 1 × k matrix:

∂f/∂x = [ ∂f/∂x₁  ∂f/∂x₂  …  ∂f/∂xₖ ]

A matrix of first derivatives is sometimes called the Jacobian (matrix) of the function. By convention, it is always a 1 × k matrix, not a k × 1 vector. If you want to write it the other way, you should use the transpose notation:

∂f/∂x′ = (∂f/∂x)′

(It is important to remember that these are equivalent.)

Example: A person's utility function is u(x₁, x₂) = Ax₁^α x₂^(1−α). We can write this equivalently as u(x) = Ax₁^α x₂^(1−α). The matrix of first derivatives is:

∂u/∂x = [ ∂u/∂x₁  ∂u/∂x₂ ] = [ Aαx₁^(α−1) x₂^(1−α)    A(1−α)x₁^α x₂^(−α) ]

So far, the functions covered have given a single real number as their output. A vector-valued function returns a real number in each of several dimensions. For example, we have a bunch of demand functions, which all depend on the same prices:

x₁ = x₁(p₁, …, p_L, w) = x₁(p, w)
x₂ = x₂(p₁, …, p_L, w) = x₂(p, w)
…
x_L = x_L(p₁, …, p_L, w) = x_L(p, w)

It is much simpler to define a vector of demands, x = (x₁, x₂, …, x_L), and express this as a vector-valued function:

x = x(p, w)

This represents exactly the same thing as the system of equations above; it is simply a compact notation. In this case, x is a function that depends on L + 1 variables and returns L values, so we would write x : ℝ^(L+1) → ℝᴸ.

When f : ℝᵏ → ℝⁿ, its Jacobian is an n × k matrix with ∂fᵢ/∂x in the i-th row. (Equivalently, the ij-th element is ∂fᵢ/∂xⱼ.)

        [ ∂f₁/∂x ]   [ ∂f₁/∂x₁ … ∂f₁/∂xₖ ]
∂f/∂x = [ ∂f₂/∂x ] = [ ∂f₂/∂x₁ … ∂f₂/∂xₖ ]
        [  …     ]   [  …               ]
        [ ∂fₙ/∂x ]   [ ∂fₙ/∂x₁ … ∂fₙ/∂xₖ ]

Example: The Cobb-Douglas utility function gives demand functions of x₁(p, w) = αw/p₁ and x₂(p, w) = (1−α)w/p₂. We might write:

x(p, w) = [   αw/p₁   ]
          [ (1−α)w/p₂ ]

for the demand function, and

∂x(p, w)/∂p = [ ∂x₁/∂p₁  ∂x₁/∂p₂ ] = [ −αw/p₁²        0      ]
              [ ∂x₂/∂p₁  ∂x₂/∂p₂ ]   [    0      −(1−α)w/p₂² ]

for the Jacobian of the demand function with respect to prices.

We can also write a matrix of second derivatives (and cross-partial derivatives). If f : ℝᵏ → ℝ, then (∂f/∂x)′ = ∂f/∂x′ is a vector-valued function. We can take its derivative, and so ∂²f/∂x∂x′ is the k × k matrix:

∂²f/∂x∂x′ = [ ∂²f/∂x₁²     ∂²f/∂x₁∂x₂  …  ∂²f/∂x₁∂xₖ ]
            [ ∂²f/∂x₂∂x₁   ∂²f/∂x₂²    …  ∂²f/∂x₂∂xₖ ]
            [  …                                    ]
            [ ∂²f/∂xₖ∂x₁   ∂²f/∂xₖ∂x₂  …  ∂²f/∂xₖ²   ]

This matrix of second derivatives and cross-partial derivatives is often called the Hessian of the function.

Example: The Hessian of the Cobb-Douglas utility function is:

∂²u/∂x∂x′ = [ Aα(α−1)x₁^(α−2) x₂^(1−α)    Aα(1−α)x₁^(α−1) x₂^(−α)  ]
            [ Aα(1−α)x₁^(α−1) x₂^(−α)     A(1−α)(−α)x₁^α x₂^(−α−1) ]
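As a closing check, the Cobb-Douglas Jacobian and Hessian entries above can be compared against finite differences. The values of A, α, w, the prices, and the evaluation point below are illustrative choices of my own:

```python
# Finite-difference check of the Cobb-Douglas Jacobian and Hessian entries.
A, alpha = 2.0, 0.3

# Jacobian of the demand system: x1 = alpha*w/p1 has slope -alpha*w/p1^2 in p1
w, p1 = 10.0, 2.0
h = 1e-6
d_x1_d_p1 = (alpha * w / (p1 + h) - alpha * w / (p1 - h)) / (2 * h)
# analytic entry: -alpha*w/p1^2 = -0.3*10/4 = -0.75

# Hessian of u(x1, x2) = A x1^alpha x2^(1-alpha)
u = lambda x1, x2: A * x1**alpha * x2**(1 - alpha)
x1, x2, k = 1.5, 0.8, 1e-4

H11 = (u(x1 + k, x2) - 2 * u(x1, x2) + u(x1 - k, x2)) / k**2
H22 = (u(x1, x2 + k) - 2 * u(x1, x2) + u(x1, x2 - k)) / k**2
H12 = (u(x1 + k, x2 + k) - u(x1 + k, x2 - k)
       - u(x1 - k, x2 + k) + u(x1 - k, x2 - k)) / (4 * k**2)

# analytic entries from the text
H11_a = A * alpha * (alpha - 1) * x1**(alpha - 2) * x2**(1 - alpha)
H12_a = A * alpha * (1 - alpha) * x1**(alpha - 1) * x2**(-alpha)
H22_a = A * (1 - alpha) * (-alpha) * x1**alpha * x2**(-alpha - 1)
```

The off-diagonal entries agree by symmetry of cross-partials, so the Hessian is a symmetric matrix, a fact that matters later for second-order conditions.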

This represents exactly the same thing as the system of equations above; it is simply a compact notation. In this case, x is a function that depends on L + 1 variables and it returns L values, so we would write x : R L +1 R L. When f : R k R n, its Jacobian is a n k matrix with f i (Equivalently, the ij-th element is f i x j.) x in the i-th row. f 1 x f 1 x 1 f 1 f 1 x k f 2 x f f x = 2 x 1 f 2 f 2 x = k f n x f n x 1 f n f n x k Example: The Cobb-Douglas utility function gives demand functions of x 1 (p 1,w) = w p 1 and (p 1,w) = (1 )w p 2. We might write: w p 1 x(p,w) = (1 )w p ( 2 for the demand function, and x(p,w) p = x p 1 1 x 1 p 2 p 1 p 2 = ()w p 2 1 0 2 0 ((1( ))w p 2 for the Jacobian of the demand function. We can also write a matrix of second derivatives (and cross-partial derivatives). If f : R k R, then (f x) = f x is a vector-valued function. We can take its derivative, and so 2 f x x is the k k matrix: 2 f x x = 2 2 f x 1 2 f x 1 2 f x 1 x k 2 f x 1 2 2 f 2 ( f x k ( ( 2 f x 1 x k 2 f x k 2 2 ( f x k This matrix of second derivatives and cross-partial derivatives is often called the Hessian of the function. Example: The Hessian of the Cobb-Douglas utility function is: u x x = A( 1)x 1 A(1 )x 1 1 2 1 A(1 )x 1 1 ( A(1 )()x 1* 1 ) Fall 2007 math class notes, page 20