STAT 450: Statistical Theory. Distribution Theory. Reading in Casella and Berger: Ch 2 Sec 1, Ch 4 Sec 1, Ch 4 Sec 6.

STAT 45: Statistical Theory Distribution Theory Reading in Casella and Berger: Ch 2 Sec 1, Ch 4 Sec 1, Ch 4 Sec 6. Basic Problem: Start with assumptions about f or CDF of random vector X (X 1,..., X p ). Define Y g(x 1,..., X p ) to be some function of X (usually some statistic of interest like a sample mean or correlation coefficient or a test statistic). How can we compute the distribution or CDF or density of Y? Univariate Techniques In a univariate problem we have X a real-valued random variable and Y, a real valued transformation of X. Often we start with X and are told either the density or CDF of X and then asked to find the distribution of something like X 2 or sin(x) a function of X which we call a transformation. I am going to show you the steps you should follow in an example problem. Method 1: compute the CDF by integration and differentiate to find f Y. Example: Suppose U Uniform[, 1]. What is the distribution of log(x)?. Step 1: Make sure you give the transformed variable a name. Let Y log U. Step 2: Write down the definition of the CDF of Y : F Y (y) P (Y y). Step 3: Substitute in the definition of Y in terms of the original variable in this case, U: F Y (y) P (Y y) P ( log(u) y). Step 4: Solve the inequality to get, hopefully, one or more intervals for U. In this case log(u) y is the same as log(u) y 1

which is the same as So U e y. P ( log(u) Y ) P ( U e y). Step 5: now use the assumptions to write the resulting probability in terms of F U. P ( U e y) 1 P ( U < e y) 1 F U ( e y ). Step 6: Differentiate the resulting formula with respect to y. In this case the formula for F U (u) needs to be written carefully: 1 u > 1 F U (u) u < u < 1 u The quantity e y is never negative but if y < e y > 1 so { 1 F U (e y 1 y < ) 1 e y y Now we can differentiate. For negative y the derivative is. For positive y the derivative is e y. At y the derivative does not exist and the density f Y (y) does not have a unique definition So our density is y < f Y (y) undefined y e y y > This is the standard exponential density so we would finish the problem by saying that log(u) has a standard exponential distribution. I have done the individual steps in a bit of detail but here are the crucial pieces again in a chain of equalities: F Y (y) P (Y y) P ( log U y) P (log U y) P (U e y ) { 1 e y y > y. 2

Differentiating we find f Y (y) d dy F Y (y) so Y has standard exponential distribution. Example: Z N(, 1), i.e. and Y Z 2. Then Now differentiate y < undefined y e y y > f Z (z) 1 2π e z2 /2 F Y (y) P (Z 2 y) { y < P ( y Z y) y. P ( y Z y) F Z ( y) F Z ( y) to get Then f Y (y) [ y < d FZ ( y) F dy Z ( y) ] y > undefined y. d dy F Z( y) f Z ( y) d dy 1 2π exp y ( ( y) 2 /2 ) 1 2 y 1/2 1 2 2πy e y/2. (Similar formula for other derivative.) Thus 1 2πy e y/2 y > f Y (y) y < undefined y. 3

This is the density of a χ 2 1 random variable because the chi-squared distributing with ν degrees of freedom is defined to be the distribution of the sum of ν squared independent standard normal variables. I am now going to rewrite this density using indicator notation which can be very useful: { 1 y > 1(y > ) y which we use to write f Y (y) 1 2πy e y/2 1(y > ) (changing definition unimportantly at y ). Notice: I never evaluated F Y before differentiating it. In fact F Y and F Z are integrals I can t do but I can differentiate them anyway. Remember the fundamental theorem of calculus: d dx x a f(y) dy f(x) at any x where f is continuous. Summary: for Y g(x) with X and Y each real valued P (Y y) P (g(x) y) Take d/dy to compute the density f Y (y) d dy P (X g 1 (, y]). {x:g(x) y} Often can differentiate without doing integral. f X (x) dx. Method 2: Change of variables. Assume g is one to one. I do: g is increasing and differentiable. Interpretation of density (based on density F ): f Y (y) P (y Y y + δy) lim δy δy F Y (y + δy) F Y (y) lim δy δy 4

and P (x X x + δx) f X (x) lim. δx δx Now assume y g(x). Define δy by y + δy g(x + δx). Then Get Take limit to get or P (y Y g(x + δx)) P (x X x + δx). P (y Y y + δy)) δy P (x X x + δx)/δx {g(x + δx) y}/δx. f Y (y) f X (x)/g (x) f Y (g(x))g (x) f X (x). Alternative view: Each probability is integral of a density. The first is the integral of the density of Y over the small interval from y g(x) to y g(x + δx). The interval is narrow so f Y is nearly constant and P (y Y g(x + δx)) f Y (y)(g(x + δx) g(x)). Since g has a derivative the difference and we get g(x + δx) g(x) δxg (x) P (y Y g(x + δx)) f Y (y)g (x)δx. Same idea applied to P (x X x + δx) gives so that or, cancelling the δx in the limit P (x X x + δx) f X (x)δx f Y (y)g (x)δx f X (x)δx f Y (y)g (x) f X (x). If you remember y g(x) then you get f X (x) f Y (g(x))g (x). 5

Or solve y g(x) to get x in terms of y, that is, x g 1 (y) and then f Y (y) f X (g 1 (y))/g (g 1 (y)). This is just the change of variables formula for doing integrals. Remark: For g decreasing g < but Then the interval (g(x), g(x + δx)) is really (g(x + δx), g(x)) so that g(x) g(x + δx) g (x)δx. In both cases this amounts to the formula f X (x) f Y (g(x)) g (x). Mnemonic: f Y (y)dy f X (x)dx. Example: : X Weibull(shape α, scale β) or f X (x) α β ( ) α 1 x exp { (x/β) α } 1(x > ). β Let Y log X or g(x) log(x). Solve y log x: x exp(y) or g 1 (y) e y. Then g (x) 1/x and 1/g (g 1 (y)) 1/(1/e y ) e y. Hence ( ) e y α 1 exp { (e y /β) α } 1(e y > )e y. f Y (y) α β β For any y, e y > so indicator 1. So Define φ log β and θ 1/α; then, f Y (y) α β α exp {αy eαy /β α }. f Y (y) 1 θ exp { y φ θ { }} y φ exp. θ Extreme Value density with location parameter φ and scale parameter θ. (Note: several distributions are called Extreme Value.) Marginalization 6

Simplest multivariate problem: (or in general Y is any X j ). X (X 1,..., X p ), Y X 1 Theorem 1 If X has density f(x 1,..., x p ) and q < p then Y (X 1,..., X q ) has density f Y (x 1,..., x q ) f(x 1,..., x p ) dx q+1... dx p. f X1,...,X q is the marginal density of X 1,..., X q and f X the joint density of X but they are both just densities. Marginal just to distinguish from the joint density of X. Example: : The function is a density provided The integral is f(x 1, x 2 ) Kx 1 x 2 1(x 1 >, x 2 >, x 1 + x 2 < 1) P (X R 2 ) K 1 1 x1 f(x 1, x 2 ) dx 1 dx 2 1. x 1 x 2 dx 1 dx 2 K so K 24. The marginal density of x 1 is f X1 (x 1 ) 24 24x 1 x 2 1 x 1 (1 x 1 ) 2 dx 1 /2 K(1/2 2/3 + 1/4)/2 K/24 1(x 1 >, x 2 >, x 1 + x 2 < 1) dx 2 1 x1 x 1 x 2 1( < x 1 < 1)dx 2 12x 1 (1 x 1 ) 2 1( < x 1 < 1). 7

This is a Beta(2, 3) density. General problem has Y (Y 1,..., Y q ) with Y i g i (X 1,..., X p ). Case 1: q > p. Y won t have density for smooth g. Y will have a singular or discrete distribution. Problem rarely of real interest. (But, e.g., residuals have singular distribution.) Case 2: q p. We use a change of variables formula which generalizes the one derived above for the case p q 1. (See below.) Case 3: q < p. Pad out Y add on p q more variables (carefully chosen) say Y q+1,..., Y p. Find functions g q+1,..., g p. Define for q < i p, Y i g i (X 1,..., X p ) and Z (Y 1,..., Y p ). Choose g i so that we can use change of variables on g (g 1,..., g p ) to compute f Z. Find f Y by integration: f Y (y 1,..., y q ) f Z (y 1,..., y q, z q+1,..., z p )dz q+1... dz p Change of Variables Suppose Y g(x) R p with X R p having density f X. Assume g is a one to one ( injective ) map, i.e., g(x 1 ) g(x 2 ) if and only if x 1 x 2. Find f Y : Step 1: Solve for x in terms of y: x g 1 (y). Step 2: Use basic equation: and rewrite it in the form Interpretation of derivative dx dy f Y (y)dy f X (x)dx f Y (y) f X (g 1 (y)) dx dy. when p > 1: dx dy which is the so called Jacobian. ( ) det xi y j 8

Equivalent formula inverts the matrix: This notation means dy dx det f Y (y) f X(g 1 (y)) dy dx y 1 x 1 y 1 x 2 y p x 1. y p x 2 but with x replaced by the corresponding value of y, that is, replace x by g 1 (y). y 1 x p y p x p Example: : The density f X (x 1, x 2 ) 1 { } 2π exp x2 1 + x 2 2 2 is the standard bivariate normal density. Let Y (Y 1, Y 2 ) where Y 1 X 2 1 + X 2 2 and Y 2 < 2π is angle from the positive x axis to the ray from the origin to the point (X 1, X 2 ). I.e., Y is X in polar co-ordinates. Solve for x in terms of y: so that X 1 Y 1 cos(y 2 ) X 2 Y 1 sin(y 2 ) g(x 1, x 2 ) (g 1 (x 1, x 2 ), g 2 (x 1, x 2 )) ( x 2 1 + x 2 2, argument(x 1, x 2 )) g 1 (y 1, y 2 ) (g1 1 (y 1, y 2 ), g2 1 (y 1, y 2 )) (y 1 cos(y 2 ), y 1 sin(y 2 )) ( dx dy det cos(y2 ) y 1 sin(y 2 ) sin(y 2 ) y 1 cos(y 2 ) y 1. It follows that f Y (y 1, y 2 ) 1 { } 2π exp y2 1 y 1 1( y 1 < )1( y 2 < 2π). 2 9 )

Next: marginal densities of Y 1, Y 2? Factor f Y as f Y (y 1, y 2 ) h 1 (y 1 )h 2 (y 2 ) where h 1 (y 1 ) y 1 e y2 1 /2 1( y 1 < ) and Then h 2 (y 2 ) 1( y 2 < 2π)/(2π). f Y1 (y 1 ) h 1 (y 1 ) h 1 (y 1 )h 2 (y 2 ) dy 2 h 2 (y 2 ) dy 2 so marginal density of Y 1 is a multiple of h 1. Multiplier makes f Y1 1 but in this case so that h 2 (y 2 ) dy 2 2π (2π) 1 dy 2 1 f Y1 (y 1 ) y 1 e y2 1 /2 1( y 1 < ). (Special case of Weibull or Rayleigh distribution.) Similarly f Y2 (y 2 ) 1( y 2 < 2π)/(2π) which is the Uniform((, 2π) density. Exercise: W Y1 2 /2 has standard exponential distribution. Recall: by definition U Y1 2 has a χ 2 distribution on 2 degrees of freedom. Exercise: find χ 2 2 density. Note: We show below factorization of density is equivalent to independence. 1