Conditional Distributions

The goal is to provide a general definition of the conditional distribution of $Y$ given $X$, when $(X, Y)$ are jointly distributed. Let $F$ be a distribution function on $\mathbb{R}$. Let $G(\cdot, \cdot)$ be a map from $\mathbb{R} \times \mathcal{B}_{\mathbb{R}}$ to $[0, 1]$ satisfying:

(a) $G(x, \cdot)$ is a probability measure on $\mathcal{B}_{\mathbb{R}}$ for every $x$ in $\mathbb{R}$, and
(b) $G(\cdot, A)$ is a measurable function for every Borel set $A$.

We can then form the generalized product $F \otimes G$ in the following sense: there exists a measure $H$ on $\mathcal{B}_{\mathbb{R}^2}$, which we call $F \otimes G$, such that
\[
(F \otimes G)\big[(-\infty, x] \times (-\infty, y]\big] = \int_{-\infty}^{x} G(u, (-\infty, y])\, dF(u).
\]
More generally, for Borel subsets $A, B$, we should have:
\[
(F \otimes G)(A \times B) = \int_A G(x, B)\, dF(x).
\]
If $(X, Y)$ has distribution $F \otimes G$, then the marginal of $X$ is $F$ and the conditional of $Y$ given $X = x$ is $G(x, \cdot)$. From $F \otimes G$, we can recover the marginal distribution of $Y$, say $\tilde{F}$, and the conditional of $X$ given $Y = y$, say $\tilde{G}(y, \cdot)$, where $\tilde{G}$ has the same properties as $G$ and $\tilde{F} \otimes \tilde{G}$ gives the distribution of $(Y, X)$. Note that:
\[
\tilde{F}(y) = P(X < \infty, Y \le y) = \int_{-\infty}^{\infty} G(x, (-\infty, y])\, dF(x).
\]
When $G$ does not depend on its first coordinate, i.e. $G(x, \cdot)$ is the same measure for all $x$, $X$ and $Y$ are independent, and $G(\cdot) \equiv G(x, \cdot)$ gives the marginal distribution of $Y$.

Example 1: Suppose that $F$ is $U(0, 1)$, and that $G$ is defined by $G(x, \{1\}) = x$ and $G(x, \{0\}) = 1 - x$.
We seek to find $F \otimes G$, which is defined on $\big([0,1] \times \{0,1\},\ \mathcal{B}_{[0,1]} \otimes 2^{\{0,1\}}\big)$. Let $(X, Y)$ follow $F \otimes G$. Now,
\[
P(X \le x, Y = 1) = \int_0^x G(u, \{1\})\, dF(u) = \int_0^x u\, du = \frac{x^2}{2}. \tag{0.1}
\]
Similarly,
\[
P(X \le x, Y = 0) = x - \frac{x^2}{2}. \tag{0.2}
\]
So $P(Y = 1) = 1/2$; therefore $Y \sim \mathrm{Ber}(1/2)$. It is also clear from the above discussion that given $X = x$, $Y \sim \mathrm{Ber}(x)$. This can also be verified through the limiting definition of conditional probabilities that was discussed before:
\[
P(Y = 1 \mid X = x) = \lim_{h \to 0} P(Y = 1 \mid X \in [x, x+h]) = \lim_{h \to 0} \frac{P(Y = 1, X \in [x, x+h])}{P(X \in [x, x+h])} = \lim_{h \to 0} \frac{\int_x^{x+h} u\, du}{h} = x.
\]
Next, we seek to find $H(y, \cdot)$, the conditional of $X$ given $Y = y$. We could do this via the definition of conditional probabilities when conditioning on a discrete random variable, but let's try the more formal recipe. We have:
\[
P(Y = 1, X \le x) = \int_{\{1\}} H(y, (-\infty, x])\, d\tilde{F}(y) = H(1, (-\infty, x]) \cdot \frac{1}{2},
\]
and
\[
P(Y = 0, X \le x) = \int_{\{0\}} H(y, (-\infty, x])\, d\tilde{F}(y) = H(0, (-\infty, x]) \cdot \frac{1}{2},
\]
and using (0.1) and (0.2), we get:
\[
H(1, (-\infty, x]) = x^2 \quad \text{and} \quad H(0, (-\infty, x]) = 2x - x^2.
\]

Example 2: Let $X$ be a continuous random variable with density
\[
f(x) = \frac{2}{3} e^{-x}\, 1(x > 0) + \frac{1}{3} e^{x}\, 1(x < 0).
\]
Note that the distribution function of $X$ is:
\[
F(x) = P(X \le x) = \frac{1}{3} e^{x}\, 1(x \le 0) + \left( \frac{1}{3} + \frac{2}{3}\big(1 - e^{-x}\big) \right) 1(x > 0).
\]
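Before working through Example 2, the computations in Example 1 can be spot-checked by simulation: draw $X \sim U(0,1)$, then $Y \mid X = x \sim \mathrm{Ber}(x)$, and compare empirical frequencies against (0.1) and against $P(Y = 1) = 1/2$. A minimal sketch (the sample size and the test point $x = 0.6$ are arbitrary choices):

```python
import random

random.seed(0)

N = 200_000
hits_joint = 0   # counts the event {X <= 0.6, Y = 1}
hits_y1 = 0      # counts the event {Y = 1}
for _ in range(N):
    x = random.random()                    # X ~ U(0, 1)
    y = 1 if random.random() < x else 0    # Y | X = x ~ Ber(x)
    hits_y1 += y
    if x <= 0.6 and y == 1:
        hits_joint += 1

p_y1 = hits_y1 / N         # should be close to 1/2
p_joint = hits_joint / N   # should be close to 0.6**2 / 2 = 0.18
```

The two-stage draw is exactly the sampling description of $F \otimes G$: first $X$ from $F$, then $Y$ from the kernel $G(X, \cdot)$.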
We first find the conditional distribution of $Y = \mathrm{sign}(X)$ given $|X|$, i.e. $P(Y = 1 \mid |X| = x)$ for $x > 0$. It suffices to get a transition function $G(t, a)$, $a \in \{-1, 1\}$, $t > 0$, such that:
\[
P(|X| \le x, Y = 1) = \int_0^x G(t, \{1\})\, dF_{|X|}(t), \tag{0.3}
\]
and
\[
P(|X| \le x, Y = -1) = \int_0^x G(t, \{-1\})\, dF_{|X|}(t).
\]
Now,
\[
P(|X| \le x, Y = 1) = P(0 < X \le x) = \frac{2}{3}\big(1 - e^{-x}\big),
\]
and
\[
P(|X| \le x, Y = -1) = P(-x \le X < 0) = \frac{1}{3}\big(1 - e^{-x}\big).
\]
So $P(|X| \le x) = 1 - e^{-x}$. Now, by (0.3),
\[
\frac{2}{3}\big(1 - e^{-x}\big) = \int_0^x G(t, \{1\})\, d\big(1 - e^{-t}\big) = \int_0^x G(t, \{1\})\, e^{-t}\, dt,
\]
showing that $G(x, \{1\}) = 2/3$. Similarly $G(x, \{-1\}) = 1/3$.

Note that the distribution corresponding to $f$ can be generated by the following stochastic mechanism: let $V$ follow $\mathrm{Exp}(1)$ and let $B$ be a $\{-1, 1\}$-valued random variable independent of $V$, with $p_B(1) = 2/3 = 1 - p_B(-1)$, and let $X = V\, 1\{B = 1\} - V\, 1\{B = -1\}$. Then $V$ is precisely $|X|$ and $X \sim f$. Note that the sign of $X$ is precisely $B$ and it is independent of $V = |X|$ by the mechanism itself. So the conditional of $\mathrm{sign}(X)$ given $|X|$ is simply the unconditional distribution of $B$, and we obtain the same result as with the formal derivation.

Next, consider the distribution of $X$ given $Y$. Note that $P(Y = -1) = 1/3$ and $P(Y = 1) = 2/3$. We have:
\[
P(Y = -1, X \le x) = H(-1, (-\infty, x]) \cdot \frac{1}{3},
\]
so
\[
H(-1, (-\infty, x]) = 3\, P(Y = -1, X \le x) = 3 \left[ \frac{1}{3}\, 1(x > 0) + P(X \le x)\, 1(x \le 0) \right] = e^{x}\, 1(x \le 0) + 1(x > 0).
\]
Similarly, we compute $H(1, (-\infty, x])$.
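The stochastic mechanism above is easy to simulate, and the simulation confirms both conclusions: $\mathrm{sign}(X)$ has the distribution of $B$, and its law does not move when $|X|$ is conditioned into a small range. A sketch (the sample size and the conditioning event $|X| < 0.5$ are arbitrary choices):

```python
import random

random.seed(0)

N = 200_000
xs = []
for _ in range(N):
    v = random.expovariate(1.0)                  # V ~ Exp(1)
    b = 1 if random.random() < 2 / 3 else -1     # B independent of V, p_B(1) = 2/3
    xs.append(v if b == 1 else -v)               # X = V 1{B=1} - V 1{B=-1}

p_pos = sum(x > 0 for x in xs) / N   # P(sign(X) = 1), should be ~ 2/3

# Conditioning |X| into a small range should not change this probability,
# since sign(X) = B is independent of |X| = V by construction.
small = [x for x in xs if abs(x) < 0.5]
p_pos_given_small = sum(x > 0 for x in small) / len(small)   # also ~ 2/3
```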
0.1 Order statistics and conditional distributions

Let $X_1, X_2, \ldots, X_n$ be i.i.d. from a distribution $F$ with Lebesgue density $f$. Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the corresponding order statistics. Note that the order statistics are all distinct with probability 1 and
\[
P\big( (X_{(1)}, X_{(2)}, \ldots, X_{(n)}) \in B \big) = 1, \quad \text{where } B = \{(x_1, x_2, \ldots, x_n) : x_1 < x_2 < \cdots < x_n\}.
\]
Let's first find the joint density of the order statistics. Let $\Pi$ be the set of all permutations of the numbers 1 through $n$. For a measurable subset of $B$, say $A$, we have (the events in the union below being disjoint with probability 1):
\begin{align*}
P\big( (X_{(1)}, X_{(2)}, \ldots, X_{(n)}) \in A \big) &= P\Big( \bigcup_{\pi \in \Pi} \big\{ (X_{\pi_1}, X_{\pi_2}, \ldots, X_{\pi_n}) \in A \big\} \Big) \\
&= \sum_{\pi \in \Pi} P\big( (X_{\pi_1}, X_{\pi_2}, \ldots, X_{\pi_n}) \in A \big) \\
&= n!\, P\big( (X_1, X_2, \ldots, X_n) \in A \big) = n! \int_A \prod_{i=1}^n f(x_i)\, dx_1\, dx_2 \cdots dx_n.
\end{align*}
This shows that:
\[
f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = n! \prod_{i=1}^n f(x_i), \quad (x_1, x_2, \ldots, x_n) \in B.
\]
Remark: If we assumed that the $X_i$'s were not independent but came from an exchangeable distribution with density $f(x_1, x_2, \ldots, x_n)$, i.e. the distribution of the $X_i$'s is invariant to permutations of the $X_i$'s, then $f$ is necessarily symmetric in its arguments and an argument similar to the one above would show that
\[
f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = n!\, f(x_1, x_2, \ldots, x_n), \quad (x_1, x_2, \ldots, x_n) \in B.
\]
Now, consider the situation where the distribution of $(X_1, X_2, \ldots, X_n)$ is exchangeable. We seek to find
\[
P\big( (X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n \big)
\]
for some permutation $\pi$. Let $\tau$ be an arbitrary permutation. Note that $(Y_1, Y_2, \ldots, Y_n) \equiv (X_{\tau_1}, X_{\tau_2}, \ldots, X_{\tau_n})$ has the same distribution as $(X_1, X_2, \ldots, X_n)$. Thus,
\begin{align*}
&P\big( (X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n \big) \\
&\quad = P\big( (Y_1, Y_2, \ldots, Y_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid Y_{(1)} = x_1, Y_{(2)} = x_2, \ldots, Y_{(n)} = x_n \big) \\
&\quad = P\big( (X_{\tau_1}, X_{\tau_2}, \ldots, X_{\tau_n}) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n \big) \\
&\quad = P\big( (X_1, X_2, \ldots, X_n) = (x_{(\pi \circ \tau^{-1})_1}, x_{(\pi \circ \tau^{-1})_2}, \ldots, x_{(\pi \circ \tau^{-1})_n}) \mid X_{(i)} = x_i,\ i = 1, \ldots, n \big).
\end{align*}
As $\tau$ runs over all permutations, so does $\pi \circ \tau^{-1}$, showing that the conditional probability under consideration does not depend upon the permutation $\pi$ initially fixed. As there are $n!$ permutations, we conclude that:
\[
P\big( (X_1, X_2, \ldots, X_n) = (x_{\pi_1}, x_{\pi_2}, \ldots, x_{\pi_n}) \mid X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n \big) = \frac{1}{n!}.
\]

An example with uniforms: Suppose that $X_1, X_2, \ldots, X_n$ are i.i.d. Uniform$(0, \theta)$. The joint density of $\{X_{(i)}\}$ is given by:
\[
f_{\mathrm{ord}}(x_1, x_2, \ldots, x_n) = \frac{n!}{\theta^n}\, 1\{0 < x_1 < x_2 < \cdots < x_n < \theta\}.
\]
The marginal density of the maximum, $X_{(n)}$, is:
\[
f_{X_{(n)}}(x_n) = \frac{n}{\theta^n}\, x_n^{n-1}\, 1\{0 < x_n < \theta\}.
\]
So, the conditional density of $(X_{(1)}, X_{(2)}, \ldots, X_{(n-1)})$ given $X_{(n)} = x_n$, by direct division, is seen to be:
\[
f_{\mathrm{cond}}(x_1, x_2, \ldots, x_{n-1}) = \frac{(n-1)!}{x_n^{n-1}}\, 1\{0 < x_1 < x_2 < \cdots < x_{n-1} < x_n\}.
\]
This shows that the first $n-1$ order statistics, given the maximum $x_n$, are distributed as the $n-1$ order statistics from a sample of size $n-1$ from Uniform$(0, x_n)$. But note that the vector $\{X_1, X_2, \ldots, X_n\} \setminus \{X_{(n)}\}$, given $(X_{(1)}, X_{(2)}, \ldots, X_{(n)})$, must be uniformly distributed over all the $(n-1)!$ permutations of the first $n-1$ order statistics. Thus, the random vector $\{X_1, X_2, \ldots, X_n\} \setminus \{X_{(n)}\}$, conditional on $X_{(n)}$, must behave like an i.i.d. random sample from Uniform$(0, X_{(n)})$. These arguments can be made more rigorous, but at the expense of much notation.

Order statistics and non-exchangeable distributions: Take $(X, Y)$ to be a pair of independent random variables, each supported on $(0, 1)$, with $X$ having Lebesgue density $f$ and $Y$ having Lebesgue density $g$. Now, $X$ and $Y$ are not exchangeable. We consider $P((X, Y) \in A, (U, V) \in B)$, where $U = X \wedge Y$, $V = X \vee Y$, $A$ is a Borel subset of $(0, 1)^2$ and $B$ a Borel subset of $\{(x, y) : x < y,\ x, y \in (0, 1)\}$. Let $\pi$ be the permutation on $\{1, 2\}$ that swaps indices.
Then:
\begin{align*}
P((X, Y) \in A, (U, V) \in B) &= P\big( (X, Y) \in A \cap (B \cup \pi B) \big) \\
&= P\big( (X, Y) \in A \cap B \big) + P\big( (X, Y) \in A \cap \pi B \big) \\
&= \int_{A \cap B} f(x) g(y)\, dx\, dy + \int_{A \cap \pi B} f(x) g(y)\, dx\, dy \\
&= \int_{A \cap B} f(u) g(v)\, du\, dv + \int_{\pi A \cap B} f(v) g(u)\, du\, dv \quad \text{(change of variable)} \\
&= \int_B \big\{ f(u) g(v)\, 1((u, v) \in A) + f(v) g(u)\, 1((u, v) \in \pi A) \big\}\, du\, dv.
\end{align*}
From the above derivation, taking $A$ to be the unit square, we find that:
\[
P((U, V) \in B) = \int_B \big( f(u) g(v) + f(v) g(u) \big)\, du\, dv,
\]
so that $dF_{U,V}(u, v) = (f(u) g(v) + f(v) g(u))\, du\, dv$. Conclude that:
\[
P((X, Y) \in A, (U, V) \in B) = \int_B \xi((u, v), A)\, dF_{U,V}(u, v),
\]
where, for $u < v$,
\[
\xi((u, v), A) = \frac{f(u)\, g(v)}{f(u) g(v) + f(v) g(u)}\, 1((u, v) \in A) + \frac{f(v)\, g(u)}{f(u) g(v) + f(v) g(u)}\, 1((u, v) \in \pi A).
\]
Remark: If $(X_1, X_2, \ldots, X_n)$ is a random vector with density $\prod_{i=1}^n f_i(x_i)$, you should be able to guess the form of the conditional distribution of the $X_i$'s given the order statistics.
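The formula for $dF_{U,V}$ can be spot-checked numerically. As a hypothetical concrete case (not from the notes), take $f(x) = 2x$ and $g \equiv 1$ on $(0,1)$, so $X$ can be drawn by inversion as $\sqrt{W}$ with $W \sim U(0,1)$, and take $B = \{0 < u \le 0.3,\ u < v \le 0.6\}$. Then $\int_B (f(u)g(v) + f(v)g(u))\, du\, dv = \int_0^{0.3} (1.2u - 3u^2 + 0.36)\, du = 0.135$, which the simulated frequency should match:

```python
import random

random.seed(0)

N = 200_000
hits = 0
for _ in range(N):
    x = random.random() ** 0.5   # X with density f(x) = 2x, via inverse CDF
    y = random.random()          # Y ~ U(0, 1), density g = 1
    u, v = min(x, y), max(x, y)  # (U, V) = (X ^ Y, X v Y)
    if u <= 0.3 and v <= 0.6:
        hits += 1

p_hat = hits / N   # should be close to the exact integral below

# Exact integral of f(u)g(v) + f(v)g(u) = 2u + 2v over B:
#   0.054 - 0.027 + 0.108 = 0.135
exact = 0.135
```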