7: /4/

TOPIC Distribution functions and their inverses

This section develops properties of probability distribution functions and their inverses. The two main topics are the so-called probability integral transformation and the inverse probability transformation.

Distribution Functions

Let $X$ be a real-valued random variable defined on a sample space $\Omega$. The distribution function (df) of $X$ is the function $F = F_X$ from $\mathbb{R} := (-\infty, \infty)$ to $[0, 1]$ defined by

$$F(x) := P[\,\omega \in \Omega : X(\omega) \le x\,] = P[X \le x].$$

Here are a couple of examples, which motivate Theorem 1 below. The symbol $\sim$ is to be read as "distributed as".

[Figure: graphs of the df $F$ of $X \sim$ Uniform on $\{0, 1/2, 1\}$ and of the df $F$ of $X \sim$ Uniform on $[0, 1]$.]

Theorem 1 (Properties of $F$). For any random variable $X$, the distribution function $F$ of $X$ has these properties:

D1 $F$ is nondecreasing;
D2 $F$ is right-continuous;
D3 $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

Moreover, for each $x \in \mathbb{R}$,

D4 $F(x-) := \lim_{w \uparrow x,\, w < x} F(w) = P[X < x]$,
D5 jump of $F$ at $x$ $:= F(x) - F(x-) = P[X = x]$.

Proof. I will prove D1 and D2, and leave the rest to you as Exercise 1.

D1: $x \le y \implies F(x) \le F(y)$. Indeed, suppose $x \le y$. Then the event $A := \{X \le x\}$ is contained in the event $B := \{X \le y\}$, so $F(x) = P[A] \le P[B] = F(y)$.

D2: $x_n \downarrow x \implies F(x_n) \downarrow F(x)$. Indeed, suppose $x_1, x_2, \ldots$ is an infinite sequence of real numbers that decrease down to $x$. Then the events $A_n := \{X \le x_n\}$ shrink down to the event $A := \{X \le x\}$. By a property of probability measures (see (15), below), $F(x_n) = P[A_n]$ decreases down to $P[A] = F(x)$.

A special case

This section discusses inverse dfs and the probability integral and inverse probability transformations under the simplifying assumption that the distribution function $F$ of $X$ is continuous and strictly increasing, as illustrated below:

[Figure: graph of a continuous, strictly increasing df $F$, with a level $u$ on the vertical axis and the point $F^{-1}(u)$ on the $x$-axis.]

For $u \in (0, 1)$, let $F^{-1}(u)$ be defined as in the picture, i.e., $F^{-1}(u)$ is the unique number $\xi$ such that $F(\xi) = u$. By a result in analysis, $F^{-1}(u)$ is continuous and strictly increasing in $u$.

Consider now the random variable $F(X)$, whose value at a sample point $\omega$ is $F\bigl(X(\omega)\bigr) = P[\,\omega' : X(\omega') \le X(\omega)\,]$, the probability of observing a new value for $X$ no greater than the value $X(\omega)$ at hand. For $0 < u < 1$, we have

$$F(X) \le u \iff X = F^{-1}\bigl(F(X)\bigr) \le F^{-1}(u),$$

so

$$P[F(X) \le u] = P[X \le F^{-1}(u)] = F\bigl(F^{-1}(u)\bigr) = u. \tag{1}$$

This implies that $F(X)$ is uniformly distributed over $(0, 1)$: in symbols, $F(X) \sim U$ for $U \sim \text{Uniform}(0, 1)$.
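The uniformity of $F(X)$ for a continuous df can be checked by simulation. The following is a minimal sketch, not part of the notes: it assumes $X$ is Exponential with rate 1 (a df that is continuous and strictly increasing on $x > 0$, which is where $X$ lives), and the names `exponential_df`, `pit_values`, and `empirical_df` are hypothetical helpers introduced here for illustration.

```python
import math
import random


def exponential_df(x, rate=1.0):
    """Df of an Exponential(rate) variable (assumed example distribution)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def pit_values(n, rate=1.0, seed=0):
    """Draw X_1, ..., X_n ~ Exponential(rate) and return F(X_1), ..., F(X_n)."""
    rng = random.Random(seed)
    return [exponential_df(rng.expovariate(rate), rate) for _ in range(n)]


def empirical_df(sample, u):
    """Fraction of sample points that are <= u."""
    return sum(v <= u for v in sample) / len(sample)


sample = pit_values(100_000)
# If F(X) is Uniform(0, 1), each empirical frequency should be close to u.
pit_check = {u: empirical_df(sample, u) for u in (0.05, 0.25, 0.5, 0.9)}
for u, freq in pit_check.items():
    print(f"P[F(X) <= {u}] is approximately {freq:.4f}")
```

With a sample of this size, each printed frequency should agree with the corresponding $u$ to about two decimal places.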
(1) has implications for statistics, in the context of hypothesis testing. Think of $X$ as a test statistic for the simple hypothesis $H$ that the data are distributed according to $P$, with the alternative $A$ such that you ought to reject $H$ when $X$ is too far to the left. For an observed value $x$ of $X$, $F(x) = P[X \le x]$ is the chance of getting a result as extreme as, or more extreme than, the one at hand. In statistics, this quantity is called the p-value. Small p-values argue against $H$ (and thus in favor of $A$); in decision theory you reject $H$ (and accept $A$) if the p-value is sufficiently small, say 0.05. (1) says that when $H$ is in fact true, repeated tests will produce p-values that are uniformly distributed over $(0, 1)$; by chance alone, you'll get a p-value less than 0.05 (and mistakenly reject $H$) about 1 time in 20.

Now let $U$ be a random variable uniformly distributed over $(0, 1)$ and consider the random variable $F^{-1}(U)$. For $x \in \mathbb{R}$,

$$F^{-1}(U) \le x \iff U = F\bigl(F^{-1}(U)\bigr) \le F(x),$$

so

$$P[F^{-1}(U) \le x] = P[U \le F(x)] = F(x) = P[X \le x]. \tag{2}$$

Since this is true for all $x$, $F^{-1}(U)$ and $X$ have the same distribution: $F^{-1}(U) \sim X$. This result has implications for random number generation. Namely, if you can somehow generate a uniform variable $U$, then $F^{-1}(U)$ will be distributed like $X$. In principle this method can always be used to simulate $X$, but it is efficient only when $F^{-1}$ is easy to compute. When that's not the case, one can often get an efficient algorithm by using some result from distribution theory.

The transformation from $X$ to $F(X)$ is called the probability integral transformation (PIT), whereas the transformation from $U$ to $F^{-1}(U)$ is called the inverse probability transformation (IPT). In what follows we are going to study the PIT and IPT in the general case, where $F$ may be discontinuous and not strictly increasing.

Inverse distribution functions

Let $F$ be an arbitrary probability distribution function. Since the graph of $F$ can have jumps and flat spots, there is no true inverse to $F$ in the usual sense. One can, however, define a kind of inverse $F^*$ to $F$, as follows. Refer to the figure below:

[Figure: graph of a df $F$ with a jump and a flat spot; levels $u_1$, $u_2$, $u_3$ are marked on the vertical axis and the points $F^*(u_1)$, $F^*(u_2)$, $F^*(u_3)$ on the $x$-axis.]

As a first attempt, try taking $F^*(u)$ to be that $x$ such that $u = F(x)$. This definition works for $u = u_1$, but it doesn't work for $u = u_2$, for which there is a whole range of $x$'s such that $u = F(x)$. As a second attempt, try taking $F^*(u)$ to be the smallest $x$ such that $u = F(x)$. This works for $u = u_1$ and $u_2$, but it doesn't work for $u = u_3$, since there are no $x$'s such that $u_3 = F(x)$. As a third attempt, try taking $F^*(u)$ to be the smallest $x$ such that $u \le F(x)$. This works for $u = u_1$, $u_2$, and $u_3$. We make this the general definition: more precisely, we take

$$F^*(u) := \inf\{\, x \in \mathbb{R} : u \le F(x) \,\} \tag{3}$$

for $u \in (0, 1)$; here inf stands for infimum, or greatest lower bound.

To better understand (3), fix $u \in (0, 1)$ and consider the set $I := \{\, x \in \mathbb{R} : u \le F(x) \,\}$. Note that:

$I$ is nonempty, because $u < 1$ and $F(x) \to 1$ as $x \to \infty$;
$I$ is an interval extending out to $+\infty$, because $F$ is nondecreasing;
$I$ has a finite left endpoint, say $\xi$, because $u > 0$ and $F(x) \to 0$ as $x \to -\infty$;
$\xi \in I$, because $F$ is right-continuous.

The last claim follows from

$$x_n := \xi + 1/n \in I \text{ for all integers } n \ge 1 \implies u \le F(x_n) \text{ for all } n \implies u \le \lim_n F(x_n) = F(\xi) \implies \xi \in I.$$
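For a df with finitely many jumps, the infimum in the definition of $F^*$ can be computed directly. The sketch below is an illustration under stated assumptions, not the notes' own construction: it takes a hypothetical three-point distribution (values 0, 1/2, 1 with probability 1/3 each), uses exact rational arithmetic, and introduces the helper names `make_df` and `make_left_continuous_inverse`.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def make_df(values, probs):
    """Return F(x) = P[X <= x] for X taking values[i] with probability probs[i]."""
    def F(x):
        return sum((p for v, p in zip(values, probs) if v <= x), Fraction(0))
    return F


def make_left_continuous_inverse(values, probs):
    """Return F*(u) = inf{x : u <= F(x)}, 0 < u < 1 (values sorted increasingly)."""
    cumulative = list(accumulate(probs))  # F evaluated at each support point

    def F_star(u):
        if not 0 < u < 1:
            raise ValueError("F* is defined only for 0 < u < 1")
        # bisect_left returns the first index k with cumulative[k] >= u,
        # i.e. the smallest support point x with u <= F(x).
        return values[bisect_left(cumulative, u)]
    return F_star


# Assumed example: X uniform on {0, 1/2, 1}.
values = [Fraction(0), Fraction(1, 2), Fraction(1)]
probs = [Fraction(1, 3)] * 3
F = make_df(values, probs)
F_star = make_left_continuous_inverse(values, probs)

for u in (Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)):
    print(f"u = {u}: F*(u) = {F_star(u)}, F(F*(u)) = {F(F_star(u))}")
```

Here $u = 1/3$ sits on a flat spot of $F$ (many $x$ satisfy $F(x) = u$, and the smallest is chosen), while $u = 1/2$ and $u = 3/4$ fall inside jumps (no $x$ satisfies $F(x) = u$, so the smallest $x$ with $u \le F(x)$ is chosen).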
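To summarize, $I = \{\, x \in \mathbb{R} : u \le F(x) \,\}$ is a left-closed, right-semi-infinite interval and $F^*(u) = \xi$ is its finite left endpoint. This gives

Theorem 2. Let $F$ be a probability distribution function and let $F^* : (0, 1) \to \mathbb{R}$ be defined by $F^*(u) = \inf\{\, x \in \mathbb{R} : u \le F(x) \,\}$. The infimum here is attained:

$$F^*(u) \text{ is in fact the smallest } x \in \mathbb{R} \text{ such that } u \le F(x). \tag{4}$$

Moreover, for any $u \in (0, 1)$ and $x \in \mathbb{R}$, one has

$$u \le F(x) \iff F^*(u) \le x. \tag{5}$$

Relation (5) is called the switching formula (S). (5) has an obvious counterpart:

$$u > F(x) \iff F^*(u) > x. \tag{6}$$

But watch out! If $\le$ is changed to $<$ throughout (5), or if $>$ is changed to $\ge$ throughout (6), the resulting assertions may not hold: see Exercises 2 and 8. In these notes, when invoking (5) and (6), I will always write the $u$-thing on the left and the $x$-thing on the right, and only use the valid inequalities $\le$ and $>$.

The theorem below gives the main properties of $F^*$. To motivate them, here is the graph of $F^*(u)$ versus $u$ for the $F$ in the preceding figure (the scales are different, though):

[Figure: graph of $F^*(u)$ versus $u$, with $u_1$, $u_2$, $u_3$ marked on the horizontal axis and $F^*(u_1)$, $F^*(u_2)$, $F^*(u_3)$ on the vertical axis.]

Note that this $F^*$ is nondecreasing and left-continuous. As is the case for any nondecreasing function,

$$F^*(u+) := \lim_{v \downarrow u,\, v > u} F^*(v) \tag{7}$$

exists for each $u$.

Theorem 3 (Properties of $F^*$). Let $F$ be a probability distribution function and let $F^*$ be defined by (3) for $0 < u < 1$. Then

ID1 $F^*$ is nondecreasing;
ID2 $F^*$ is left-continuous;
ID3 $\lim_{u \downarrow 0} F^*(u) = \inf\{\, x \in \mathbb{R} : F(x) > 0 \,\}$ and $\lim_{u \uparrow 1} F^*(u) = \sup\{\, x \in \mathbb{R} : F(x) < 1 \,\}$;
ID4 for each $u \in (0, 1)$ and each $x \in \mathbb{R}$ with $0 < F(x) < 1$,

$$F\bigl(F^*(u)-\bigr) \le u \le F\bigl(F^*(u)\bigr), \tag{8}$$

$$F^*\bigl(F(x)\bigr) \le x \le F^*\bigl(F(x)+\bigr). \tag{9}$$

Proof. ID1: $0 < u \le v < 1 \implies F^*(u) \le F^*(v)$. This follows easily (show how!) from the definition of $F^*$. It also follows from the switching formula:

$F^*(v) \le F^*(v)$
$\implies v \le F\bigl(F^*(v)\bigr)$ (by the S)
$\implies u \le F\bigl(F^*(v)\bigr)$ (since $u \le v$)
$\implies F^*(u) \le F^*(v)$ (S again).

ID2: $u_n \uparrow u \implies F^*(u_n) \uparrow F^*(u)$. The assumption is that $u_n \le u_{n+1}$ for all $n$ and $u = \lim_n u_n$. Since $F^*$ is nondecreasing, we have $F^*(u_n) \le F^*(u_{n+1}) \le F^*(u)$ for all $n$, so $L := \lim_n F^*(u_n)$ exists and satisfies $L \le F^*(u)$. To get the opposite inequality, consider an $x \in \mathbb{R}$ with $F^*(u) > x$. Then

$u > F(x)$ (by the S)
$\implies u_n > F(x)$ for all large $n$ (since $u_n \uparrow u$)
$\implies F^*(u_n) > x$ for all large $n$ (S again)
$\implies L > x$ (since $F^*(u_n) \uparrow L$).

Now let $x$ tend up to $F^*(u)$ to conclude $L \ge F^*(u)$, as desired.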
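As an aside, the switching formula used in the arguments above can be checked exhaustively on a small example. This is a hedged sketch rather than anything from the notes: it assumes a three-point distribution (values 0, 1/2, 1 with probability 1/3 each), evaluates both sides of (5) on grids of $u$ and $x$ chosen here for illustration, and also records where the strict-inequality variant fails.

```python
from fractions import Fraction

# Assumed example: X uniform on {0, 1/2, 1}.
VALUES = [Fraction(0), Fraction(1, 2), Fraction(1)]
PROBS = [Fraction(1, 3)] * 3


def F(x):
    """Df of X: F(x) = P[X <= x]."""
    return sum((p for v, p in zip(VALUES, PROBS) if v <= x), Fraction(0))


def F_star(u):
    """Smallest support point x with u <= F(x); valid for 0 < u < 1."""
    return min(v for v in VALUES if u <= F(v))


us = [Fraction(k, 12) for k in range(1, 12)]   # grid in (0, 1)
xs = [Fraction(j, 4) for j in range(-4, 9)]    # grid in [-1, 2]

# (5): u <= F(x) iff F*(u) <= x, for every grid pair.
switching_ok = all((u <= F(x)) == (F_star(u) <= x) for u in us for x in xs)

# Replacing <= by < in (5) is not valid in general: collect the mismatches.
strict_failures = [(u, x) for u in us for x in xs
                   if (u < F(x)) != (F_star(u) < x)]

# ID1: F* is nondecreasing along the grid.
monotone_ok = all(F_star(u) <= F_star(v) for u, v in zip(us, us[1:]))

print(switching_ok, monotone_ok, len(strict_failures) > 0)
```

For instance, the pair $u = 1/3$, $x = 1/4$ appears among the mismatches: $F^*(1/3) = 0 < 1/4$, yet $u < F(1/4) = 1/3$ is false.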
ID3: This result is not so important, so I'll leave it to you as Exercise 3.

(8) holds. The inequality on the right follows directly from the S, as in the proof of ID1. To get the inequality on the left, set $x = F^*(u)$; we need to show $u \ge F(x-)$. For this, let $\xi < x$. Then

$F^*(u) > \xi$ (since $x = F^*(u)$)
$\implies u > F(\xi)$ (by the S).

Letting $\xi$ tend up to $x$ shows that $u \ge F(x-)$, as desired.

(9) holds. The argument for this is similar to that for (8); I'll leave it to Exercise 4.

Relations (8) and (9) specify the extent to which $F$ and $F^*$ are inverses. Note that the inequalities in these relations can be strict, as in the following cases:

[Figure: two examples of strict inequality. In the panel for (8), $x := F^*(u)$ and $u$ lies strictly inside a jump of $F$ at $x$. In the panel for (9), $u := F(x)$ and $x$ lies strictly between $F^*(u)$ and $F^*(u+)$.]

In view of ID2 and ID4, $F^*$ is called the left-continuous inverse to $F$. ID4 and the preceding examples yield this corollary:

Theorem 4. Let $F^*$ be the left-continuous inverse to the df $F$. Then

$$F\bigl(F^*(u)\bigr) = u \text{ for all } u \in (0, 1) \iff F \text{ is continuous}, \tag{10}$$

$$F^*\bigl(F(x)\bigr) = x \text{ for all } x \in A := \{\, x \in \mathbb{R} : 0 < F(x) < 1 \,\} \iff F \text{ is strictly increasing over } A. \tag{11}$$

The inverse probability transformation

We saw earlier (see (1) and (2)) that if the df $F$ of a random variable $X$ is continuous and strictly increasing, then (i) $F(X)$ is uniformly distributed over $(0, 1)$, and, conversely, (ii) if $U$ is uniformly distributed over $(0, 1)$, then $F^{-1}(U) \sim X$. The following theorem says that (ii) holds without any conditions on $F$, provided $F^{-1}$ is replaced by $F^*$. (i) is not always true, but there are some things that can be said; we'll deal with that in the next subsection.

Theorem 5 (The IPT Theorem). Let $X$ be a random variable with df $F$ and left-continuous inverse df $F^*$. If $U \sim \text{Uniform}(0, 1)$, then $F^*(U) \sim X$.

Proof. For each $x \in \mathbb{R}$, we have $F^*(U) \le x \iff U \le F(x)$ by the switching formula. Thus

$$P[F^*(U) \le x] = P[U \le F(x)] = F(x) = P[X \le x].$$

Since this is true for all $x \in \mathbb{R}$, $F^*(U) \sim X$.

Example 1. Suppose $X$ takes the values $0$, $1/2$, and $1$ with probability $1/3$ each. The graphs of $F$ and $F^*$ are as follows:

[Figure: two panels. Left: df $F$ of $X$, plotted against $x$. Right: inverse df $F^*(u)$, plotted against $u$.]

It is clear that if $U \sim \text{Uniform}(0, 1)$, then $F^*(U)$ takes the values $0$, $1/2$, and $1$ with probability $1/3$ each, just as $X$ does.
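The IPT Theorem is the basis of sampling by inversion. The sketch below illustrates it under assumptions not taken from the notes: `sample_by_inversion`, `uniform_open`, `three_point_quantile`, and `exponential_quantile` are hypothetical helper names, the discrete case assumes $X$ uniform on $\{0, 1/2, 1\}$, and the continuous case assumes an Exponential distribution with rate 1, whose left-continuous inverse is the ordinary inverse $-\log(1-u)$.

```python
import math
import random
from collections import Counter


def uniform_open(rng):
    """Draw U from (0, 1); random() can return 0.0, where F* is undefined."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def sample_by_inversion(quantile, n, seed=0):
    """Return quantile(U_1), ..., quantile(U_n) for independent uniforms U_i."""
    rng = random.Random(seed)
    return [quantile(uniform_open(rng)) for _ in range(n)]


def three_point_quantile(u):
    """F* for X uniform on {0, 1/2, 1}: smallest x with u <= F(x)."""
    if u <= 1 / 3:
        return 0.0
    if u <= 2 / 3:
        return 0.5
    return 1.0


def exponential_quantile(u, rate=1.0):
    """Ordinary inverse of F(x) = 1 - exp(-rate * x), x > 0."""
    return -math.log1p(-u) / rate


n = 90_000
discrete = sample_by_inversion(three_point_quantile, n)
frequencies = {x: c / n for x, c in sorted(Counter(discrete).items())}
exponential_mean = sum(sample_by_inversion(exponential_quantile, n, seed=1)) / n

print(frequencies)        # each relative frequency should be near 1/3
print(exponential_mean)   # should be near 1/rate = 1
```

The same `sample_by_inversion` routine serves both cases; only the quantile function changes, which is the sense in which the method applies to any df whose inverse is easy to compute.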
The probability integral transformation

The second half of the following theorem gives a necessary and sufficient condition for $F(X)$ to be uniformly distributed over $(0, 1)$.

Theorem 6 (The PIT Theorem). Let $X$ be a random variable with df $F$. Then

$$P[F(X) \le u] \le u \text{ for all } u \in (0, 1). \tag{12}$$

Moreover,

$$P[F(X) \le u] = u \text{ for all } u \in (0, 1) \iff F \text{ is continuous}. \tag{13}$$

Proof. Let $U \sim \text{Uniform}(0, 1)$ and let $F^*$ be the left-continuous inverse to $F$. By the IPT, $X \sim F^*(U)$, so $F(X) \sim F\bigl(F^*(U)\bigr)$.

(12) holds. In general, $F\bigl(F^*(U)\bigr) \ge U$ by (8). Thus for all $u \in (0, 1)$,

$$P[F(X) \le u] = P\bigl[F\bigl(F^*(U)\bigr) \le u\bigr] \le P[U \le u] = u.$$

(13) holds. If $F$ is continuous, then $U = F\bigl(F^*(U)\bigr)$ by (8), so $F(X) \sim F\bigl(F^*(U)\bigr) = U \sim \text{Uniform}(0, 1)$. On the other hand, if $F$ is not continuous, then there exists an $x \in \mathbb{R}$ such that

$$0 < F(x) - F(x-) = P[X = x] \le P[F(X) = F(x)].$$

But $P[U = F(x)] = 0$, so $F(X) \not\sim U$.

Example 2. As in the preceding example, suppose $X$ takes the values $0$, $1/2$, and $1$ with probability $1/3$ each. Then $Y := F(X)$ takes the values $1/3$, $2/3$, and $1$ with probability $1/3$ each. The graphs of the df $F$ of $X$ and the df $G$ of $Y$ are as follows:

[Figure: two panels. Left: df $F$ of $X$, plotted against $x$. Right: df $G$ of $Y = F(X)$, plotted against $y$.]

The graph of $G$ shows that $G(y) \le y$ for all $y \in (0, 1)$, as (12) asserts.

The proof of the PIT Theorem illustrates a useful technique: if you want to prove something about the distribution of a random variable $X$ with df $F$, try representing $X$ as $F^*(U)$ for a uniform random variable $U$.
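Inequality (12) can be verified exactly for a discrete example by computing the df of $F(X)$ with rational arithmetic. This is a minimal sketch under the assumption that $X$ is uniform on $\{0, 1/2, 1\}$; the function names `F` and `G` and the grid of test points are choices made here for illustration.

```python
from fractions import Fraction

# Assumed example: X uniform on {0, 1/2, 1}.
VALUES = [Fraction(0), Fraction(1, 2), Fraction(1)]
PROBS = [Fraction(1, 3)] * 3


def F(x):
    """Df of X: F(x) = P[X <= x]."""
    return sum((p for v, p in zip(VALUES, PROBS) if v <= x), Fraction(0))


def G(y):
    """Df of Y = F(X): G(y) = P[F(X) <= y], computed exactly."""
    return sum((p for v, p in zip(VALUES, PROBS) if F(v) <= y), Fraction(0))


grid = [Fraction(k, 60) for k in range(1, 60)]      # points in (0, 1)
bound_holds = all(G(y) <= y for y in grid)          # inequality (12)
equality_points = [y for y in grid if G(y) == y]    # where (12) is tight

print(bound_holds)
print(equality_points)
```

On this grid the bound holds everywhere and is tight only at the values that $F$ actually attains inside $(0, 1)$; between those values $G$ is flat and lies strictly below the diagonal, which is how the discontinuity of $F$ shows up.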
Exercises

The following definitions and results are needed for Exercise 1. Suppose $(x_n)_{n=1}^{\infty}$ is an infinite sequence of real numbers and $x$ is a real number. One writes

$x_n \uparrow x$ to mean $x_n \le x_{n+1}$ for all $n$ and $\lim_n x_n = x$,
$x_n \downarrow x$ to mean $x_n \ge x_{n+1}$ for all $n$ and $\lim_n x_n = x$.

Similarly, if $(A_n)_{n=1}^{\infty}$ is an infinite sequence of events and $A$ is an event, one writes

$A_n \uparrow A$ to mean $A_n \subset A_{n+1}$ for all $n$ and $A = \bigcup_{n=1}^{\infty} A_n$,
$A_n \downarrow A$ to mean $A_n \supset A_{n+1}$ for all $n$ and $A = \bigcap_{n=1}^{\infty} A_n$.

One of the properties of a probability measure $P$ is that

$$A_n \uparrow A \implies P[A_n] \uparrow P[A], \tag{14}$$

$$A_n \downarrow A \implies P[A_n] \downarrow P[A]. \tag{15}$$

Properties (14) and (15) are called, respectively, continuity from below and continuity from above.

Exercise 1. Complete the proof of Theorem 1, using (14) and (15) to verify properties D3 and D4.

Exercise 2. Let $F^*$ be the left-continuous inverse to a df $F$. Show by examples that $F^*(u) < x$ does not imply $u < F(x)$, and that $u < F(x)$ does not imply $F^*(u) < x$.

Exercise 3. Prove ID3. [Hint: put $L = \lim_{u \downarrow 0} F^*(u)$ and $\xi = \inf(A)$ for $A = \{\, x \in \mathbb{R} : F(x) > 0 \,\}$. Note that $L$ and $\xi$ may be $-\infty$. Deduce $L \le \xi$ from the fact that $L \le x$ for each $x \in A$ (why?). Deduce $L \ge \xi$ from the fact that $F^*(u) \in A$ for each $u > 0$ (why?).]

Exercise 4. Prove (9).

Exercise 5. Let $F$ be a df such that $0 < F(x) < 1$ for all $x \in \mathbb{R}$ and let $F^*$ be the left-continuous inverse to $F$. Show that

$$F^*\bigl(F\bigl(F^*(u)\bigr)\bigr) = F^*(u) \text{ for all } u \in (0, 1), \tag{16}$$

$$F\bigl(F^*\bigl(F(x)\bigr)\bigr) = F(x) \text{ for all } x \in \mathbb{R}. \tag{17}$$

Exercise 6. Prove Theorem 4.

Exercise 7. Inequality (12) has an important implication for p-values. What is that?
Exercise 8. Let $X$ be a random variable with df $F$ and left-continuous inverse df $F^*$. (a) Show that for $x \in \mathbb{R}$ and $0 < u < 1$, one has

$$u \ge F(x-) \iff F^*(u+) \ge x, \tag{18}$$

with $F(x-)$ defined as in D4 and $F^*(u+)$ defined by (7). (b) Use part (a) to show that for $0 < u < 1$,

$$F^*(u+) = \sup\{\, x : u \ge F(x-) \,\}. \tag{19}$$

[Hint for (a): use the switching formula $v > F(w) \iff F^*(v) > w$, noting for example that $u \ge F(x-) \iff u \ge F(w)$ for all $w < x$.]

Unfortunately, the jumps of $F$ and $F^*$ complicate what would otherwise be a simple theory. Fortunately, though, $F$ and $F^*$ don't have too many jumps: according to the following exercise, there are at most countably many of them.

Exercise 9. Let $B$ be a subinterval of $\mathbb{R}$ ($B$ doesn't have to be a proper subinterval; the case $B = \mathbb{R}$ is allowed). Let $f$ be a nondecreasing mapping from $B$ into $\mathbb{R}$. (For example, $f$ might be a df, defined on $B = \mathbb{R}$, or an inverse df, defined on $B = (0, 1)$.) Put

$$D_f := \{\, x \in B : f \text{ is discontinuous at } x \,\}, \tag{20}$$

$$C_f := D_f^c = \{\, x \in B : f \text{ is continuous at } x \,\}. \tag{21}$$

(a) Show that $D_f$ is countable. (b) Use part (a) to show that $C_f$ is dense in $B$. [Hint for (a): First show that for any closed bounded subinterval $A$ of $B$ and any number $\epsilon > 0$, there are at most finitely many points $x \in A$ such that the jump $f(x+) - f(x-)$ of $f$ at $x$ exceeds $\epsilon$.]

The following exercise plays an important role in the theory of convergence of probability distributions. The main point is that (22) implies (24). Similar reasoning shows that, conversely, (24) implies (22); you don't have to give the argument for that.

Exercise 10. Let $F, F_1, F_2, \ldots, F_n, \ldots$ be dfs with corresponding left-continuous inverse dfs $F^*, F_1^*, F_2^*, \ldots, F_n^*, \ldots$. Suppose that

$$\lim_n F_n(x) = F(x) \text{ for all continuity points } x \text{ of } F. \tag{22}$$

(a) Suppose that $u \in (0, 1)$ and that $w$ is a continuity point of $F$ with $F^*(u) > w$. Use the switching formula to show that $F_n^*(u) > w$ for all large $n$.

(b) Suppose that $u \in (0, 1)$ and that $y$ is a continuity point of $F$ with $F^*(u+) < y$. Show that $F_n^*(u) \le y$ for all large $n$.

(c) Use parts (a) and (b) of this exercise and part (b) of the preceding exercise to show that for each $u \in (0, 1)$, one has

$$F^*(u) \le \liminf_n F_n^*(u) \le \limsup_n F_n^*(u) \le F^*(u+). \tag{23}$$

(d) Use part (c) to show that

$$\lim_n F_n^*(u) = F^*(u) \text{ for all continuity points } u \text{ of } F^*. \tag{24}$$

(e) Show by example that if $u$ is not a continuity point of $F^*$, then $F_n^*(u)$ may not converge as $n \to \infty$.