Change of Variable Theorem: Multiple Dimensions

Moulinath Banerjee
University of Michigan
August 30

Let $(X, Y)$ be a two-dimensional continuous random vector, so that $P(X = x, Y = y) = 0$ for all $(x, y)$. Also assume that $(X, Y)$ has a density function $f(x, y)$. What this means is the following: for any nice (measurable) subset $A$ of $\mathbb{R}^2$, the probability that $(X, Y)$ assumes values in $A$ can be represented as
$$P((X, Y) \in A) = \iint_A f(x, y) \, dx \, dy.$$
This is an extension of the requirement in the univariate case. Also, $f(x, y) \geq 0$ for all $(x, y)$. Thus, the volume enclosed by the surface $\{(x, y, f(x, y))\}$ in $x$-$y$-$z$ space over the region $A$ gives the chance that $(X, Y)$ takes values in $A$; the small numerical check below illustrates this representation.
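As a quick numerical illustration (my own sketch, not part of the original notes, assuming numpy and scipy are available), the following snippet integrates the joint density of two independent $N(0,1)$ variables over the unit square and compares the result with the exact probability obtained from the one-dimensional normal CDF:

```python
import numpy as np
from scipy import integrate, stats

# Joint density of two independent N(0,1) variables.
# Note: scipy's dblquad calls the integrand as func(y, x).
def f(y, x):
    return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

# P((X, Y) in [0,1] x [0,1]) as a double integral of f over the square
p_num, _ = integrate.dblquad(f, 0, 1, lambda x: 0, lambda x: 1)

# Exact value, using independence of X and Y: (Phi(1) - Phi(0))^2
p_exact = (stats.norm.cdf(1) - stats.norm.cdf(0)) ** 2
print(p_num, p_exact)  # both approximately 0.1165
```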
We will discuss the change of variable theorem, which enables us to find the density of a random vector $(U, V)$ that is a nice transformation of $(X, Y)$. Nice will be made precise in what follows.

Change of variable theorem

THE HAIRY TECHNICAL VERSION: Let $(X_1, X_2)$ be jointly distributed continuous random variables with density function $f_X(x_1, x_2)$. Let $S$ be an open subset of $\mathbb{R}^2$ such that $P((X_1, X_2) \in S) = 1$ (so the density $f_X$ can be assumed to be concentrated on $S$). Let $g$ be a transformation from $S$ to $\mathbb{R}^2$. Thus we can write
$$(Y_1, Y_2) \equiv g(X_1, X_2) = (g_1(X_1, X_2), g_2(X_1, X_2)),$$
where $g_1$ and $g_2$ are both real-valued. Now assume that:

(1) $g$ has continuous first partial derivatives on $S$.

(2) $g$ is a 1-1 function.

(3) Let $A(x_1, x_2)$ be the matrix whose first row is
$$\left( \frac{\partial g_1}{\partial x_1}(x_1, x_2), \ \frac{\partial g_2}{\partial x_1}(x_1, x_2) \right) \equiv \left( \frac{\partial y_1}{\partial x_1}(x_1, x_2), \ \frac{\partial y_2}{\partial x_1}(x_1, x_2) \right),$$
and whose second row is
$$\left( \frac{\partial g_1}{\partial x_2}(x_1, x_2), \ \frac{\partial g_2}{\partial x_2}(x_1, x_2) \right) \equiv \left( \frac{\partial y_1}{\partial x_2}(x_1, x_2), \ \frac{\partial y_2}{\partial x_2}(x_1, x_2) \right).$$
Let
$$J_g(x_1, x_2) = \mathrm{abs}\left( \det A(x_1, x_2) \right) = \left| \frac{\partial y_1}{\partial x_1}(x_1, x_2) \, \frac{\partial y_2}{\partial x_2}(x_1, x_2) - \frac{\partial y_2}{\partial x_1}(x_1, x_2) \, \frac{\partial y_1}{\partial x_2}(x_1, x_2) \right|$$
be the Jacobian of $g$. Then $J_g(x_1, x_2)$ does not vanish for any $(x_1, x_2) \in S$.

Let $h$ denote the inverse transformation of $g$. Thus $h$ is defined on $g(S)$, and $h(y_1, y_2) \equiv (h_1(y_1, y_2), h_2(y_1, y_2))$ for $(y_1, y_2)$ in $g(S)$ is the unique $(x_1, x_2)$ in $S$ such that $(g_1(x_1, x_2), g_2(x_1, x_2)) = (y_1, y_2)$. Then $h$ itself has continuous first partial derivatives on $g(S)$ and is clearly 1-1. Also, if $B(y_1, y_2)$ denotes the matrix of first partial derivatives of $h$, then the Jacobian of $h$,
$$J_h(y_1, y_2) = \left| \frac{\partial x_1}{\partial y_1}(y_1, y_2) \, \frac{\partial x_2}{\partial y_2}(y_1, y_2) - \frac{\partial x_2}{\partial y_1}(y_1, y_2) \, \frac{\partial x_1}{\partial y_2}(y_1, y_2) \right|,$$
where $x_1 = h_1(y_1, y_2)$ and $x_2 = h_2(y_1, y_2)$, does not vanish on $g(S)$, and in fact
$$J_h(y_1, y_2) = J_g(h_1(y_1, y_2), h_2(y_1, y_2))^{-1}.$$
Also, the density of the random vector $(Y_1, Y_2)$ is given by
$$f_Y(y_1, y_2) = f_X(h_1(y_1, y_2), h_2(y_1, y_2)) \, J_h(y_1, y_2), \quad (y_1, y_2) \in g(S),$$
and $f_Y(y_1, y_2) = 0$ otherwise. Thus, for any nice subset $I$ of $S$, we have
$$\iint_I f_X(x_1, x_2) \, dx_1 \, dx_2 = P((X_1, X_2) \in I) = P((Y_1, Y_2) \in g(I)) = \iint_{g(I)} f_X(h_1(y_1, y_2), h_2(y_1, y_2)) \, J_h(y_1, y_2) \, dy_1 \, dy_2.$$
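The relation $J_h = J_g^{-1}$ is easy to check symbolically. Here is a minimal sketch, my own illustration using Python's sympy; the linear map $g(x_1, x_2) = (x_1 + x_2, x_1 - x_2)$ is chosen purely for simplicity:

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

# A simple 1-1 map g and its inverse h, solved by hand:
# (y1, y2) = (x1 + x2, x1 - x2)  <=>  (x1, x2) = ((y1 + y2)/2, (y1 - y2)/2)
g = sp.Matrix([x1 + x2, x1 - x2])
h = sp.Matrix([(y1 + y2) / 2, (y1 - y2) / 2])

# Jacobians: absolute determinants of the matrices of first partials
Jg = sp.Abs(g.jacobian([x1, x2]).det())  # = 2
Jh = sp.Abs(h.jacobian([y1, y2]).det())  # = 1/2

print(Jg, Jh)  # 2 and 1/2: J_h = 1/J_g, as the theorem asserts
```

For this linear map the Jacobians happen to be constant; for a general $g$ one would substitute $x_1 = h_1(y_1, y_2)$ and $x_2 = h_2(y_1, y_2)$ into $J_g$ before comparing.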
What it boils down to in SIMPLE language, but with caveats: Given $(X_1, X_2)$ with joint density $f_X(x_1, x_2)$, consider $(Y_1, Y_2)$ which can be expressed as a nice (appropriately smooth and one-to-one) function of $(X_1, X_2)$. To find the density of $(Y_1, Y_2)$ we go through the following steps:

1. Express $(X_1, X_2)$ as a function of $(Y_1, Y_2)$; i.e., solve for $(X_1, X_2)$ in terms of $(Y_1, Y_2)$. Thus $X_1 = h_1(Y_1, Y_2)$ for some function $h_1$, and $X_2 = h_2(Y_1, Y_2)$ for some function $h_2$.

2. Calculate
$$J_h(y_1, y_2) = \left| \frac{\partial x_1}{\partial y_1}(y_1, y_2) \, \frac{\partial x_2}{\partial y_2}(y_1, y_2) - \frac{\partial x_2}{\partial y_1}(y_1, y_2) \, \frac{\partial x_1}{\partial y_2}(y_1, y_2) \right|.$$

3. The density of $(Y_1, Y_2)$ at any point $(y_1, y_2)$ in the domain $D_Y$ of $(Y_1, Y_2)$ (i.e., the region in which $(Y_1, Y_2)$ lives with probability 1) is
$$f_Y(y_1, y_2) = f_X(h_1(y_1, y_2), h_2(y_1, y_2)) \, J_h(y_1, y_2).$$

We now work through an application of the change of variable theorem that clearly illustrates what is going on; a computer-algebra version of these three steps appears in the sketch below. The theorem looks big and messy at first shot but really has a nice pattern, once you keep staring at it. Those of you who remember your advanced calculus well will probably spot resemblances to the change of variable theorem in calculus (for two variables). In fact, this is precisely what the above theorem, which we will subsequently refer to as the Jacobian theorem, is, but in a different garb. The theorem extends readily to the case of more than 2 variables, but we shall not discuss that extension.
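Before the worked example, here is the three-step recipe in computer-algebra form. The sketch below is my own illustration (the helper name `density_of_transform` is invented for this note): given the joint density and the inverse map $h$, it carries out steps 2 and 3 with sympy, here for the transformation $(Y_1, Y_2) = (X_1 + X_2, X_1 - X_2)$ of two i.i.d. Exponential($\lambda$) variables.

```python
import sympy as sp

y1, y2 = sp.symbols('y1 y2')
lam = sp.symbols('lam', positive=True)

def density_of_transform(f_X, h1, h2):
    """Step 2: compute J_h; step 3: f_Y = f_X(h1, h2) * J_h."""
    Jh = sp.Abs(sp.Matrix([h1, h2]).jacobian([y1, y2]).det())
    return sp.simplify(f_X(h1, h2) * Jh)

# Step 1 (done by hand): for (Y1, Y2) = (X1 + X2, X1 - X2),
# X1 = (Y1 + Y2)/2 and X2 = (Y1 - Y2)/2.
f_X = lambda u, v: lam**2 * sp.exp(-lam * (u + v))  # i.i.d. Exponential(lam)
f_Y = density_of_transform(f_X, (y1 + y2) / 2, (y1 - y2) / 2)
print(f_Y)  # lam**2*exp(-lam*y1)/2, valid on the image of {x1 > 0, x2 > 0}
```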
Suppose that $X_1, X_2$ are i.i.d. Exponential($\lambda$) random variables. Thus,
$$f_X(x_1, x_2) = \lambda e^{-\lambda x_1} \, \lambda e^{-\lambda x_2} = \lambda^2 e^{-\lambda (x_1 + x_2)}, \quad (x_1, x_2) \in S,$$
where $S$ is the open set $\{x_1 > 0, x_2 > 0\}$. Consider the following transformation $g$ of $(X_1, X_2)$:
$$(Y_1, Y_2) = g(X_1, X_2) = (g_1(X_1, X_2), g_2(X_1, X_2)) = \left( X_1 + X_2, \ \frac{X_1}{X_1 + X_2} \right).$$
Then $g(S)$, the open set in which the random vector $(Y_1, Y_2)$ assumes values, is
$$g(S) = \{(y_1, y_2) : 0 < y_1, \ 0 < y_2 < 1\}.$$
Computing the partial derivatives of $g$ we have
$$\frac{\partial g_1}{\partial x_1} = 1, \quad \frac{\partial g_1}{\partial x_2} = 1,$$
and
$$\frac{\partial g_2}{\partial x_1} = \frac{x_2}{(x_1 + x_2)^2}, \quad \frac{\partial g_2}{\partial x_2} = -\frac{x_1}{(x_1 + x_2)^2}.$$
Clearly, the partial derivatives are continuous functions of $(x_1, x_2)$; also, $g$ is clearly a 1-1 function on $S$, and furthermore,
$$J_g(x_1, x_2) = \left| 1 \cdot \left( -\frac{x_1}{(x_1 + x_2)^2} \right) - 1 \cdot \frac{x_2}{(x_1 + x_2)^2} \right| = \frac{x_1 + x_2}{(x_1 + x_2)^2} = \frac{1}{x_1 + x_2} > 0,$$
for every $(x_1, x_2)$ in $S$. Thus, all conditions of the Jacobian theorem are satisfied. To obtain the density function of $(Y_1, Y_2)$ we need to find the inverse transformation; this amounts to expressing $(X_1, X_2)$ in terms of $(Y_1, Y_2)$. Note that $Y_2 (X_1 + X_2) = X_1$; but $Y_1 = X_1 + X_2$, so $Y_1 Y_2 = X_1$. Consequently, $X_2 = Y_1 - X_1 = Y_1 - Y_1 Y_2 = Y_1 (1 - Y_2)$. Thus, we obtain the function $h$ from $g(S)$ to $S$ as
$$h_1(y_1, y_2) = y_1 y_2, \quad h_2(y_1, y_2) = y_1 - y_1 y_2.$$
The density of $(Y_1, Y_2)$ at the point $(y_1, y_2)$ in $g(S)$ is then computed, on noting that $J_g(x_1, x_2)^{-1} = x_1 + x_2$, as
$$f_Y(y_1, y_2) = f_X(h_1(y_1, y_2), h_2(y_1, y_2)) \, J_h(y_1, y_2) = \lambda^2 e^{-\lambda (h_1(y_1, y_2) + h_2(y_1, y_2))} \, J_g(h_1(y_1, y_2), h_2(y_1, y_2))^{-1} = \lambda^2 e^{-\lambda (y_1 y_2 + y_1 - y_1 y_2)} \, y_1 = \lambda^2 \, y_1 \, e^{-\lambda y_1}.$$
Thus, we can rewrite the density of $(Y_1, Y_2)$ as
$$f_Y(y_1, y_2) = \left( \lambda^2 e^{-\lambda y_1} y_1 \right) 1\{y_1 > 0\} \, 1\{0 < y_2 < 1\}.$$
The above shows immediately that $Y_1$ and $Y_2$ are independent, and that $Y_1$ follows $\Gamma(2, \lambda)$ while $Y_2$ follows $U(0, 1)$. Here I am tacitly using the proposition that factorization of a joint density as a product of marginal densities is a necessary and sufficient condition for independence of random variables, a fact you would have learnt in Stat/Math 425.
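These conclusions are easy to corroborate by simulation. The following sketch is my own check (the value $\lambda = 1.5$ and the sample size are arbitrary choices, and numpy/scipy are assumed available); it compares the simulated $Y_1$ and $Y_2$ against the claimed marginals with Kolmogorov-Smirnov tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam, n = 1.5, 100_000

# X1, X2 i.i.d. Exponential(lam); numpy's scale parameter is the mean 1/lam
x1 = rng.exponential(scale=1 / lam, size=n)
x2 = rng.exponential(scale=1 / lam, size=n)
y1, y2 = x1 + x2, x1 / (x1 + x2)

# Large p-values are consistent with Y1 ~ Gamma(2, lam) and Y2 ~ U(0, 1)
print(stats.kstest(y1, 'gamma', args=(2, 0, 1 / lam)))  # shape=2, loc=0, scale=1/lam
print(stats.kstest(y2, 'uniform'))                      # U(0, 1)

# A necessary (not sufficient) check of independence: correlation near 0
print(np.corrcoef(y1, y2)[0, 1])
```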
Here is another application of the Change of Variable Theorem, and one that gives a way of generating observations from a Normal distribution. Let $X, Y$ be i.i.d. $N(0, 1)$ random variables. Let $R$ be the length of the radius vector corresponding to the point $(X, Y)$ and let $\Theta$ be the angle that the radius vector subtends with the positive direction of the $x$-axis. Thus $(R, \Theta)$ represents the vector $(X, Y)$ in polar co-ordinates, and we have the following equations: $X = R \cos \Theta$ and $Y = R \sin \Theta$. (Recall the picture that I drew in class.) We want to find the joint density of $(R, \Theta)$. Note that $(R, \Theta)$ lives, with probability 1, in the open set $(0, \infty) \times (0, 2\pi)$. When we express $X$ and $Y$ in terms of $R$ and $\Theta$ we are looking at the inverse transformation $h$; the transformation $g$ that maps $(X, Y)$ to $(R, \Theta)$ is a nice transformation in the sense that it satisfies assumptions (1), (2) and (3) of the Change of Variable Theorem. We first write down the joint density of $(X, Y)$:
$$f_{X,Y}(x, y) = f_X(x) f_Y(y) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) = \frac{1}{2\pi} \exp\left( -\frac{x^2 + y^2}{2} \right).$$
Now,
$$(x, y) = (h_1(r, \theta), h_2(r, \theta)) \equiv (r \cos \theta, r \sin \theta).$$
We next compute the Jacobian of $h$ at the point $(r, \theta)$. This is
$$J_h(r, \theta) = \left| \frac{\partial x}{\partial r} \frac{\partial y}{\partial \theta} - \frac{\partial y}{\partial r} \frac{\partial x}{\partial \theta} \right| = \left| \cos \theta \cdot r \cos \theta - \sin \theta \cdot (-r \sin \theta) \right| = r \cos^2 \theta + r \sin^2 \theta = r.$$
Thus the joint density of $(R, \Theta)$ is
$$f_{R,\Theta}(r, \theta) = \frac{1}{2\pi} \exp\left( -\frac{h_1(r, \theta)^2 + h_2(r, \theta)^2}{2} \right) J_h(r, \theta) \, 1\{r > 0\} \, 1\{0 < \theta < 2\pi\}$$
$$= \frac{1}{2\pi} \exp\left( -\frac{r^2 \cos^2 \theta + r^2 \sin^2 \theta}{2} \right) r \, 1\{r > 0\} \, 1\{0 < \theta < 2\pi\}$$
$$= \frac{1}{2\pi} \, 1\{0 < \theta < 2\pi\} \cdot r \exp(-r^2/2) \, 1\{r > 0\}.$$
This immediately shows that $R$ and $\Theta$ are independent, and that $\Theta$ has the uniform distribution on $(0, 2\pi)$ with marginal density
$$f_\Theta(\theta) = \frac{1}{2\pi} \, 1\{0 < \theta < 2\pi\}.$$
The density of $R$ is
$$f_R(r) = r \exp(-r^2/2) \, 1\{r > 0\}.$$
Thus, if we generate $R$ and $\Theta$ independently, with marginal distributions given as above, then $X = R \cos \Theta$ and $Y = R \sin \Theta$ are i.i.d. $N(0, 1)$ random variables.

To generate $R$ and $\Theta$ we proceed as follows. Recall that if $F$ is the (continuous, strictly increasing) distribution function of a random variable $X$, then $F^{-1}(U)$ has the same distribution as $X$, where $U$ is a random variable distributed uniformly on $(0, 1)$. Now, it is easy to show (by using the change of variable theorem in 1 dimension discussed in the previous section) that $R^2$ follows exponential$(1/2)$; this is left as an exercise. If $F$ denotes the distribution function of exp$(1/2)$, we have
$$F(w) = 1 - \exp(-w/2),$$
so that
$$F^{-1}(p) = -2 \log(1 - p).$$
Thus, if $U_1$ and $U_2$ are i.i.d. $U(0, 1)$ random variables, then $-2 \log(1 - U_1)$ follows exp$(1/2)$ and $2\pi U_2$ has a uniform distribution on $(0, 2\pi)$. Consequently, we can take
$$R = \sqrt{-2 \log(1 - U_1)} \quad \text{and} \quad \Theta = 2\pi U_2.$$
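Assembling the pieces gives a complete recipe for turning two uniforms into two independent standard normals; this scheme is classically known as the Box-Muller method, though the notes do not name it. A minimal numpy sketch of the construction just derived (my own illustration):

```python
import numpy as np

def box_muller(n, rng):
    """Generate n pairs of i.i.d. N(0,1) variables from uniforms."""
    u1 = rng.uniform(size=n)
    u2 = rng.uniform(size=n)
    r = np.sqrt(-2 * np.log(1 - u1))  # R = sqrt(F^{-1}(U1)), F the exp(1/2) CDF
    theta = 2 * np.pi * u2            # Theta ~ U(0, 2*pi)
    return r * np.cos(theta), r * np.sin(theta)

rng = np.random.default_rng(0)
x, y = box_muller(100_000, rng)
print(x.mean(), x.std(), np.corrcoef(x, y)[0, 1])  # ~0, ~1, ~0
```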
Relevant reading from Rice's book: Chapter 3, with emphasis on Sections 3.1 through 3.6. Potential problems for discussion: Problems 19, 4, 48, 65.