Geometry of Transformations of Random Variables

Univariate Distributions

We are interested in the problem of finding the distribution of $Y = h(X)$ when the transformation $h$ is one-to-one, so that there is a unique $x = h^{-1}(y)$ for each $x$ and $y$ with positive probability or density. In the case of discrete random variables, the transformation is simple:

\[ P(Y = y) = P(h(X) = y) = P\bigl(X = h^{-1}(y)\bigr). \]

In contrast, for absolutely continuous random variables, the density $f_Y(y)$ is in general not equal to $f_X(h^{-1}(y))$. The reason is that the geometry of the transformation becomes more complex as the dimension increases. For discrete distributions, probability is located at zero-dimensional points, and transformations do not affect the size of points. For univariate absolutely continuous distributions, however, probability is associated with the integral of a density over a one-dimensional line segment. Transformations can change the lengths of intervals, as shown here where an interval of length $dx$ is transformed to a smaller interval of length $dy$.

Figure 1: Transformation $Y = h(X)$, showing the curve $h$ with the points $x$ and $x + dx$ marked on one axis and $y$ and $y + dy$ on the other.

The figure shows $Y = h(X)$ over a very small interval so that $h$ appears to be essentially linear. For small $dx$, the probability in the interval $(x, x + dx)$ is approximately $f_X(x)\,dx$. The density at $y = h(x)$ will be the limit of the ratio of this probability over the length of the interval between $h(x)$ and $h(x + dx)$, which is $|h(x + dx) - h(x)|$. (If $h'(x) < 0$, then $h(x + dx) < h(x)$, so the absolute value is needed.) As $h$ is differentiable, the approximation $h(x + dx) \approx h(x) + h'(x)\,dx$ is accurate for very small $dx$, and it follows that the transformed interval has approximate length $|h'(x)|\,dx$. The density at $y$ is then

\[ f_Y(y) = \frac{f_X(x)\,dx}{|h'(x)|\,dx} = \frac{f_X(h^{-1}(y))}{|h'(h^{-1}(y))|} \]

after applying $x = h^{-1}(y)$.

Multivariate Distributions

We would like to extend this idea to joint densities. If random variables $X = (X_1, \dots, X_n)$ have joint density $f_X$, we aim to find the joint density $f_Y$ of the random variables $Y = (Y_1, \dots, Y_n)$, where

we write $Y = h(X)$ to mean $Y_i = h_i(X_1, \dots, X_n)$ for $i = 1, \dots, n$. We will assume that $h$ is a differentiable bijection, which means that all partial derivatives $\partial h_i / \partial x_j$ exist and that the vector equation $(y_1, \dots, y_n) = h(x_1, \dots, x_n)$ has a unique solution (such that $f_X > 0$) with $(x_1, \dots, x_n) = h^{-1}(y_1, \dots, y_n)$.

Bivariate Distributions. We motivate the general answer by examining the bivariate case $(Y_1, Y_2) = h(X_1, X_2)$. The density of $X$ at $(x_1, x_2)$ is the limiting ratio of the probability in a rectangle with a corner at $(x_1, x_2)$ and sides of length $dx_1$ and $dx_2$ over the area of the rectangle, $dx_1\,dx_2$. The density at $(y_1, y_2) = h(x_1, x_2)$ will depend on the geometry of the transformation of the corners of this rectangle.

Figure 2: Rectangle before transformation, with corners $(x_1, x_2)$, $(x_1 + dx_1, x_2)$, $(x_1 + dx_1, x_2 + dx_2)$, and $(x_1, x_2 + dx_2)$.

By the partial differentiability of $h$ in each dimension, the following approximation is true.

\[ h_1(x_1 + dx_1, x_2 + dx_2) \approx h_1(x_1, x_2) + \frac{\partial h_1}{\partial x_1}\,dx_1 + \frac{\partial h_1}{\partial x_2}\,dx_2 \]
\[ h_2(x_1 + dx_1, x_2 + dx_2) \approx h_2(x_1, x_2) + \frac{\partial h_2}{\partial x_1}\,dx_1 + \frac{\partial h_2}{\partial x_2}\,dx_2 \]

To simplify the expressions, let $y_1 = h_1(x_1, x_2)$, $y_2 = h_2(x_1, x_2)$, $a = \frac{\partial h_1}{\partial x_1}\,dx_1$, $b = \frac{\partial h_2}{\partial x_1}\,dx_1$, $c = \frac{\partial h_1}{\partial x_2}\,dx_2$, and $d = \frac{\partial h_2}{\partial x_2}\,dx_2$, so that $(a, b)$ is the displacement caused by the step $dx_1$ and $(c, d)$ is the displacement caused by the step $dx_2$. With this notation, the four corners of the rectangle are mapped approximately as follows:

\[ (x_1, x_2) \mapsto (y_1, y_2) \]
\[ (x_1 + dx_1, x_2) \mapsto (y_1 + a, y_2 + b) \]
\[ (x_1 + dx_1, x_2 + dx_2) \mapsto (y_1 + a + c, y_2 + b + d) \]
\[ (x_1, x_2 + dx_2) \mapsto (y_1 + c, y_2 + d) \]

These points will not be arranged as a rectangle in general, but will be a parallelogram. The parallelogram can be understood geometrically as being formed by the two vectors $(a, b)$ and $(c, d)$

extending from $(y_1, y_2)$ to form two adjacent sides, with the other sides then being parallel and of equal length to these. The proper scaling of the density $f_Y(y)$ will then depend on the area of this parallelogram relative to the original rectangle.

The following figure shows a parallelogram where the lower left corner corresponds to the point $(y_1, y_2)$ and the two adjacent sides are described by the vectors $(a, b)$ and $(c, d)$. In this figure, $a, b, c, d > 0$, which corresponds to all of the partial derivatives $\partial h_i / \partial x_j$ being positive. In addition, $a > c$ and $d > b$ so that $ad > bc$. The following geometric argument relies on these choices, but the result will be true in general.

Figure 3: Parallelogram after transformation, with the lengths $a$, $b$, $c$, and $d$ marked.

There are two rectangles with dashed lines added to the figure. The larger of these rectangles has width $a$ and height $d$ and the smaller one has width $c$ and height $b$. In addition, there are two small dotted lines added to the figure which create six triangles and two larger polygons. Notice that the six triangles come in pairs which are the same size and orientation. Each pair includes one shaded triangle within the parallelogram and one outside. When the shading of the triangles is reversed, we get the following figure. The total shaded area is the same and is equal to $ad - bc$, as it is the difference in the areas of the rectangles. Thus, the area of the parallelogram depends only on the lengths and orientations of the vectors $(a, b)$ and $(c, d)$. When these vectors are combined to form a matrix, we see that the area is equal to the absolute value of the determinant of this matrix.

\[ |ad - bc| = \left| \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right| \]

In fact, the absolute value of this determinant measures the area of the corresponding parallelogram for any real $a, b, c, d$, which can be shown by working through all cases. If we substitute back in our original expressions, we see that the area of the parallelogram is

\[ |ad - bc| = \left| \left( \frac{\partial h_1}{\partial x_1}\,dx_1 \right) \left( \frac{\partial h_2}{\partial x_2}\,dx_2 \right) - \left( \frac{\partial h_1}{\partial x_2}\,dx_2 \right) \left( \frac{\partial h_2}{\partial x_1}\,dx_1 \right) \right| = |J|\,dx_1\,dx_2. \]
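The claim that the parallelogram has area $|ad - bc|$ for any real $a, b, c, d$ can be spot-checked numerically. The following is a minimal sketch, not part of the original derivation: it compares the determinant with the area given by the shoelace formula applied to the four corners. The function names and the sample vectors are illustrative assumptions.

```python
def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order around the boundary."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def parallelogram_vertices(origin, u, v):
    """Corners of the parallelogram with adjacent sides u = (a, b) and v = (c, d)."""
    y1, y2 = origin
    a, b = u
    c, d = v
    return [(y1, y2), (y1 + a, y2 + b), (y1 + a + c, y2 + b + d), (y1 + c, y2 + d)]


def det_area(u, v):
    """Absolute value of the determinant of the matrix with rows (a, b) and (c, d)."""
    a, b = u
    c, d = v
    return abs(a * d - b * c)


# Illustrative side vectors (assumed values): all positive, mixed signs, and ad < bc.
cases = [((3.0, 1.0), (1.0, 2.0)), ((3.0, -1.0), (-1.0, 2.0)), ((1.0, 2.0), (3.0, 1.0))]
for u, v in cases:
    corners = parallelogram_vertices((0.5, -0.25), u, v)
    print(u, v, shoelace_area(corners), det_area(u, v))
```

The second and third cases fall outside the assumptions used in the geometric argument (positive entries with $ad > bc$), which is where the absolute value matters.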

Figure 4: Equal area. The area of the parallelogram is equal to the difference in the areas of the rectangles.

Here

\[ J = \det \begin{pmatrix} \dfrac{\partial h_1}{\partial x_1} & \dfrac{\partial h_1}{\partial x_2} \\[1ex] \dfrac{\partial h_2}{\partial x_1} & \dfrac{\partial h_2}{\partial x_2} \end{pmatrix} \]

is called the Jacobian or Jacobian derivative of the transformation. The ratio of the area of the parallelogram to the area of the original rectangle is $|J|$, and it follows that the joint density of the random variables $Y_1$ and $Y_2$ is

\[ f_Y(y_1, y_2) = \frac{1}{|J(h^{-1}(y_1, y_2))|}\, f_X(h^{-1}(y_1, y_2)). \]

More than two dimensions. It is natural to then ask how this extends to joint distributions of $n$ random variables. The answer is that the density requires a rescaling which is found by calculating the reciprocal of the absolute value of the Jacobian derivative for this larger transformation, which is simply a determinant of a larger matrix of partial derivatives. The derivation above found the Jacobian derivative by computing $\partial y_j / \partial x_i$ for each $i, j$, but it is also possible to take derivatives of the inverse relationships, $\partial x_j / \partial y_i$, and find the corresponding Jacobian derivative. The value of this second derivative is the reciprocal of the first. In order to better distinguish these cases, it is useful to introduce a different notation that makes the direction of differentiation clear. We can define the Jacobian derivative as follows.

\[ \frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)} = \det \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_n}{\partial x_1} & \cdots & \dfrac{\partial y_n}{\partial x_n} \end{pmatrix} \]
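The reciprocal relationship between the two Jacobian derivatives can be checked numerically. The sketch below assumes a hypothetical bijection on the positive quadrant, $h(x_1, x_2) = (x_1 x_2,\, x_1 / x_2)$, chosen only for illustration, and approximates both determinants with central finite differences. The helper `jacobian_det` and the step size `eps` are assumptions rather than a prescribed method.

```python
import math


def h(x1, x2):
    """Hypothetical bijection on the positive quadrant: (x1, x2) -> (x1 * x2, x1 / x2)."""
    return (x1 * x2, x1 / x2)


def h_inv(y1, y2):
    """Inverse of h: x1 = sqrt(y1 * y2), x2 = sqrt(y1 / y2)."""
    return (math.sqrt(y1 * y2), math.sqrt(y1 / y2))


def jacobian_det(f, p, eps=1e-6):
    """Determinant of the 2x2 matrix of partial derivatives of f at p (central differences)."""
    p1, p2 = p
    # Perturb the first argument.
    hi, lo = f(p1 + eps, p2), f(p1 - eps, p2)
    d11 = (hi[0] - lo[0]) / (2 * eps)  # d f_1 / d p_1
    d21 = (hi[1] - lo[1]) / (2 * eps)  # d f_2 / d p_1
    # Perturb the second argument.
    hi, lo = f(p1, p2 + eps), f(p1, p2 - eps)
    d12 = (hi[0] - lo[0]) / (2 * eps)  # d f_1 / d p_2
    d22 = (hi[1] - lo[1]) / (2 * eps)  # d f_2 / d p_2
    return d11 * d22 - d12 * d21


x = (2.0, 0.5)                      # illustrative point with positive coordinates
y = h(*x)                           # corresponding point in the transformed space
forward = jacobian_det(h, x)        # approximates d(y1, y2) / d(x1, x2) at x
inverse = jacobian_det(h_inv, y)    # approximates d(x1, x2) / d(y1, y2) at y
print(forward, inverse, forward * inverse)
```

For this particular map the determinants work out analytically to $-2 x_1 / x_2$ and $-1 / (2 y_2)$, so the printed product should be close to 1 up to finite-difference error.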

The density of $Y = (Y_1, \dots, Y_n)$ can then be computed by finding one of two Jacobian derivatives.

\[ f_Y(y_1, \dots, y_n) = \left| \frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)} \right|^{-1} f_X(h^{-1}(y_1, \dots, y_n)) = \left| \frac{\partial(x_1, \dots, x_n)}{\partial(y_1, \dots, y_n)} \right| f_X(h^{-1}(y_1, \dots, y_n)) \]

If you simply memorize the expression

\[ f_Y(y_1, \dots, y_n)\, |\partial(y_1, \dots, y_n)| = f_X(x_1, \dots, x_n)\, |\partial(x_1, \dots, x_n)| \]

you can rearrange this algebraically to find either Jacobian and then properly use it or its reciprocal to find the desired density after the transformation.
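As a worked illustration of using either Jacobian, the sketch below assumes an example that is not in the text: $X_1$ and $X_2$ are independent Exponential(1) variables, and $Y_1 = X_1 + X_2$, $Y_2 = X_1 / (X_1 + X_2)$. The inverse is $x_1 = y_1 y_2$, $x_2 = y_1 (1 - y_2)$, and the two Jacobian derivatives are $\partial(y_1, y_2)/\partial(x_1, x_2) = -1/(x_1 + x_2)$ and $\partial(x_1, x_2)/\partial(y_1, y_2) = -y_1$. Both routes should give $f_Y(y_1, y_2) = y_1 e^{-y_1}$ for $y_1 > 0$ and $0 < y_2 < 1$. All function names are hypothetical.

```python
import math


def f_X(x1, x2):
    """Joint density of two independent Exponential(1) variables (illustrative assumption)."""
    if x1 <= 0 or x2 <= 0:
        return 0.0
    return math.exp(-(x1 + x2))


def h_inv(y1, y2):
    """Inverse of h(x1, x2) = (x1 + x2, x1 / (x1 + x2))."""
    return (y1 * y2, y1 * (1.0 - y2))


def det_dy_dx(x1, x2):
    """Analytic Jacobian d(y1, y2) / d(x1, x2) for this example."""
    return -1.0 / (x1 + x2)


def det_dx_dy(y1, y2):
    """Analytic Jacobian d(x1, x2) / d(y1, y2) for this example."""
    return -y1


def in_support(y1, y2):
    """Image of the positive quadrant under h."""
    return y1 > 0 and 0 < y2 < 1


def f_Y_forward(y1, y2):
    """Density of Y using the reciprocal of |d(y) / d(x)| evaluated at x = h_inv(y)."""
    if not in_support(y1, y2):
        return 0.0
    x1, x2 = h_inv(y1, y2)
    return f_X(x1, x2) / abs(det_dy_dx(x1, x2))


def f_Y_inverse(y1, y2):
    """Density of Y using |d(x) / d(y)| directly."""
    if not in_support(y1, y2):
        return 0.0
    x1, x2 = h_inv(y1, y2)
    return abs(det_dx_dy(y1, y2)) * f_X(x1, x2)


y = (1.5, 0.3)                      # illustrative point inside the support
closed_form = y[0] * math.exp(-y[0])
print(f_Y_forward(*y), f_Y_inverse(*y), closed_form)
```

The resulting density does not depend on $y_2$, which reflects that in this example $Y_2$ is uniform on $(0, 1)$ and independent of $Y_1$.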