$F(x) = P[X \le x]$. DF1: $F$ is nondecreasing. DF2: $F$ is right-continuous.

TOPIC. Distribution functions and their inverses

This section develops properties of probability distribution functions and their inverses. Two main topics are the so-called probability integral transformation and inverse probability transformation.

Distribution functions

Let $X$ be a real-valued random variable defined on a sample space $\Omega$. The distribution function (df) of $X$ is the function $F$ from $\mathbb{R} := (-\infty, \infty)$ to $[0, 1]$ defined by

$$F(x) := P[\{\omega \in \Omega : X(\omega) \le x\}] = P[X \le x].$$

Here are a couple of examples which motivate Theorem 1 below. The symbol $\sim$ is to be read as "distributed as".

[Figure: graphs of $F(x)$ versus $x$ for the df of $X \sim$ Uniform on $\{0, 1/2, 1\}$ and for the df of $X \sim$ Uniform on $[0, 1]$.]

Theorem 1 (Properties of $F$). For any random variable $X$, the distribution function $F$ of $X$ has these properties:

DF1: $F$ is nondecreasing;
DF2: $F$ is right-continuous;
DF3: $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

Moreover, for each $x \in \mathbb{R}$,

DF4: $F(x-) := \lim_{w \to x,\, w < x} F(w) = P[X < x]$;
DF5: jump of $F$ at $x$ $:= F(x) - F(x-) = P[X = x]$.

Proof. I will prove DF1 and DF2, and leave the rest to you as Exercise 1.

DF1: $x \le y \implies F(x) \le F(y)$. Indeed, suppose $x \le y$. Then the event $A := \{X \le x\}$ is contained in the event $B := \{X \le y\}$, so $F(x) = P[A] \le P[B] = F(y)$.

DF2: $x_n \downarrow x \implies F(x_n) \downarrow F(x)$. Indeed, suppose $x_1, x_2, \ldots$ is an infinite sequence of real numbers that decrease down to $x$. Then the events $A_n := \{X \le x_n\}$ shrink down to the event $A := \{X \le x\}$. By a property of probability measures (see (15), below), $F(x_n) = P[A_n]$ decreases down to $P[A] = F(x)$.

A special case

This section discusses inverse dfs and the probability integral and inverse probability transformations under the simplifying assumption that the distribution function $F$ of $X$ is continuous and strictly increasing, as illustrated below:

[Figure: graph of a continuous, strictly increasing df $F$, with a level $u$ on the vertical axis and the point $F^{-1}(u)$ on the horizontal axis.]

For $u \in (0, 1)$, let $F^{-1}(u)$ be defined as in the picture, i.e., $F^{-1}(u)$ is the unique number $\xi$ such that $F(\xi) = u$. By a result in analysis, $F^{-1}(u)$ is continuous and strictly increasing in $u$.

Consider now the random variable $F(X)$, whose value at a sample point $\omega$ is

$$F(X(\omega)) = P[\{\omega' : X(\omega') \le X(\omega)\}],$$

the probability of observing a new value for $X$ no greater than the value $X(\omega)$ at hand. For $0 < u < 1$, we have

$$F(X) \le u \iff X = F^{-1}(F(X)) \le F^{-1}(u),$$

so

$$P[F(X) \le u] = P[X \le F^{-1}(u)] = F(F^{-1}(u)) = u. \tag{1}$$

This implies that $F(X)$ is uniformly distributed over $(0, 1)$: in symbols, $F(X) \sim U$ for $U \sim \text{Uniform}(0, 1)$.
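Relation (1) can be checked by simulation. The sketch below is an illustration only, under assumptions that are not part of the notes: it takes $X$ to be standard normal (so that $F$ is continuous and strictly increasing), uses Python's standard `random` and `statistics` modules, and the helper name `pit_fraction` is hypothetical. It estimates $P[F(X) \le u]$ by the fraction of simulated values with $F(X) \le u$, which should come out close to $u$.

```python
import random
from statistics import NormalDist


def pit_fraction(u, n=20000, seed=0):
    """Empirical estimate of P[F(X) <= u] for X ~ N(0, 1) (assumed example)."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    F = NormalDist(0.0, 1.0).cdf       # df of the standard normal distribution
    hits = sum(1 for _ in range(n) if F(rng.gauss(0.0, 1.0)) <= u)
    return hits / n


# Each estimate should be near the corresponding u.
for u in (0.05, 0.25, 0.5, 0.9):
    print(u, pit_fraction(u))
```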

Relation (1) has implications for statistics, in the context of hypothesis testing. Think of $X$ as a test statistic for the simple hypothesis $H$ that the data are distributed according to $P$, with the alternative $A$ such that you ought to reject $H$ when $X$ is too far to the left. For an observed value $x$ of $X$, $F(x) = P[X \le x]$ is the chance of getting a result as extreme as, or more extreme than, the one at hand. In statistics, this quantity is called the p-value. Small p-values argue against $H$ (and thus in favor of $A$); in decision theory you reject $H$ (and accept $A$) if the p-value is sufficiently small, say 0.05. Relation (1) says that when $H$ is in fact true, repeated tests will produce p-values that are uniformly distributed over $(0, 1)$; by chance alone, you'll get a p-value less than 0.05 (and mistakenly reject $H$) about 1 time in 20.

Now let $U$ be a random variable uniformly distributed over $(0, 1)$ and consider the random variable $F^{-1}(U)$. For $x \in \mathbb{R}$,

$$F^{-1}(U) \le x \iff U = F(F^{-1}(U)) \le F(x),$$

so

$$P[F^{-1}(U) \le x] = P[U \le F(x)] = F(x) = P[X \le x]. \tag{2}$$

Since this is true for all $x$, $F^{-1}(U)$ and $X$ have the same distribution: $F^{-1}(U) \sim X$. This result has implications for random number generation. Namely, if you can somehow generate a uniform variable $U$, then $F^{-1}(U)$ will be distributed like $X$. In principle this method can always be used to simulate $X$, but it is efficient only when $F^{-1}$ is easy to compute. When that's not the case, one can often get an efficient algorithm by using some result from distribution theory.

The transformation from $X$ to $F(X)$ is called the probability integral transformation (PIT), whereas the transformation from $U$ to $F^{-1}(U)$ is called the inverse probability transformation (IPT). In what follows we are going to study the PIT and IPT in the general case, where $F$ may be discontinuous and not strictly increasing.

Inverse distribution functions

Let $F$ be an arbitrary probability distribution function. Since the graph of $F$ can have jumps and flat spots, there is no true inverse to $F$ in the usual sense. One can define a kind of inverse to $F$, as follows. Refer to the figure below:

[Figure: graph of a df $F$ with a jump and a flat spot, showing levels $u_1, u_2, u_3$ on the vertical axis and the points $F^{-1}(u_1), F^{-1}(u_2), F^{-1}(u_3)$ on the horizontal axis.]

As a first attempt, try taking $F^{-1}(u)$ to be that $x$ such that $u = F(x)$. This definition works for $u = u_1$, but it doesn't work for $u = u_2$, for which there is a whole range of $x$'s such that $u = F(x)$.

As a second attempt, try taking $F^{-1}(u)$ to be the smallest $x$ such that $u = F(x)$. This works for $u = u_1$ and $u_2$, but it doesn't work for $u = u_3$, since there are no $x$'s such that $u = F(x)$.

As a third attempt, try taking $F^{-1}(u)$ to be the smallest $x$ such that $u \le F(x)$. This works for $u = u_1$, $u_2$, and $u_3$. We make this the general definition: more precisely we take

$$F^{-1}(u) := \inf\{x \in \mathbb{R} : u \le F(x)\} \tag{3}$$

for $u \in (0, 1)$; here inf stands for infimum, or greatest lower bound.

To better understand (3), fix $u \in (0, 1)$ and consider the set $I := \{x \in \mathbb{R} : u \le F(x)\}$. Note that: $I$ is nonempty, because $u < 1$ and $F(x) \to 1$ as $x \to \infty$; $I$ is an interval extending out to $+\infty$, because $F$ is nondecreasing; $I$ has a finite left endpoint, say $\xi$, because $u > 0$ and $F(x) \to 0$ as $x \to -\infty$; and $\xi \in I$, because $F$ is right-continuous. The last claim follows from

$$x_n := \xi + 1/n \in I \text{ for all integers } n \ge 1 \implies u \le F(x_n) \text{ for all } n \implies u \le \lim_n F(x_n) = F(\xi) \implies \xi \in I.$$
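Definition (3) can be made concrete for a step df. The following sketch is illustrative only: it assumes the three-point example from earlier ($X$ uniform on $\{0, 1/2, 1\}$), uses exact rational arithmetic, and the names `SUPPORT`, `PROBS`, `F`, and `F_inv` are hypothetical. For a df with finitely many jumps, the set $\{x : u \le F(x)\}$ has a support point as its left endpoint, so the infimum can be found by scanning the support in increasing order.

```python
from fractions import Fraction

# Assumed example: X takes the values 0, 1/2, 1 with probability 1/3 each.
SUPPORT = [Fraction(0), Fraction(1, 2), Fraction(1)]
PROBS = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]


def F(x):
    """Distribution function F(x) = P[X <= x]."""
    return sum((p for s, p in zip(SUPPORT, PROBS) if s <= x), Fraction(0))


def F_inv(u):
    """Smallest x with u <= F(x), for 0 < u < 1 (definition (3))."""
    if not 0 < u < 1:
        raise ValueError("u must lie strictly between 0 and 1")
    for s in SUPPORT:                  # scan support points in increasing order
        if u <= F(s):
            return s
    raise AssertionError("unreachable: F at the largest support point is 1")


for u in (Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(3, 4)):
    print(u, F_inv(u))
```

In this example $u = 1/3$ is mapped to $0$, because $F(0) = 1/3$ already satisfies $u \le F(x)$, while every $u$ in $(1/3, 2/3]$ is mapped to $1/2$.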

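To summarize, $I = \{x \in \mathbb{R} : u \le F(x)\}$ is a left-closed, right-semiinfinite interval and $F^{-1}(u) = \xi$ is its finite left endpoint. This gives

Theorem 2. Let $F$ be a probability distribution function and let $F^{-1} : (0, 1) \to \mathbb{R}$ be defined by $F^{-1}(u) = \inf\{x \in \mathbb{R} : u \le F(x)\}$. The infimum here is attained:

$$F^{-1}(u) \text{ is in fact the smallest } x \in \mathbb{R} \text{ such that } u \le F(x). \tag{4}$$

Moreover, for any $u \in (0, 1)$ and $x \in \mathbb{R}$, one has

$$u \le F(x) \iff F^{-1}(u) \le x. \tag{5}$$

Relation (5) is called the switching formula (SF). (5) has an obvious counterpart:

$$u > F(x) \iff F^{-1}(u) > x. \tag{6}$$

But watch out! If $\le$ is changed to $<$ throughout (5), or if $>$ is changed to $\ge$ throughout (6), the resulting assertions may not hold: see Exercises 2 and 8. In these notes, when invoking (5) and (6), I will always write the $u$-thing on the left and the $x$-thing on the right and only use the valid inequalities $\le$ and $>$.

The theorem below gives the main properties of $F^{-1}$. To motivate them, here is the graph of $F^{-1}(u)$ versus $u$ for the $F$ on the preceding page (the scales are different, though):

[Figure: graph of $F^{-1}(u)$ versus $u$, with $u_1, u_2, u_3$ on the horizontal axis and $F^{-1}(u_1), F^{-1}(u_2), F^{-1}(u_3)$ on the vertical axis.]

Note that this $F^{-1}$ is nondecreasing and left-continuous. As is the case for any nondecreasing function,

$$F^{-1}(u+) := \lim_{v \to u,\, v > u} F^{-1}(v) \tag{7}$$

exists for each $u$.

Theorem 3 (Properties of $F^{-1}$). Let $F$ be a probability distribution function and let $F^{-1}$ be defined by (3) for $0 < u < 1$. Then

IDF1: $F^{-1}$ is nondecreasing;
IDF2: $F^{-1}$ is left-continuous;
IDF3: $\lim_{u \downarrow 0} F^{-1}(u) = \inf\{x \in \mathbb{R} : F(x) > 0\}$ and $\lim_{u \uparrow 1} F^{-1}(u) = \sup\{x \in \mathbb{R} : F(x) < 1\}$;
IDF4: for each $u \in (0, 1)$ and $x \in \mathbb{R}$ with $0 < F(x) < 1$,

$$F(F^{-1}(u)-) \le u \le F(F^{-1}(u)), \tag{8}$$
$$F^{-1}(F(x)) \le x \le F^{-1}(F(x)+). \tag{9}$$

Proof. IDF1: $0 < u \le v < 1 \implies F^{-1}(u) \le F^{-1}(v)$. This follows easily (show how!) from the definition of $F^{-1}$. It also follows from the switching formula:

$F^{-1}(v) \le F^{-1}(v)$
$\implies v \le F(F^{-1}(v))$ (by the SF)
$\implies u \le F(F^{-1}(v))$ (since $u \le v$)
$\implies F^{-1}(u) \le F^{-1}(v)$ (SF again).

IDF2: $u_n \uparrow u \implies F^{-1}(u_n) \uparrow F^{-1}(u)$. The assumption is $u_n \le u_{n+1}$ for all $n$ and $u = \lim_n u_n$. Since $F^{-1}$ is nondecreasing, we have $F^{-1}(u_n) \le F^{-1}(u_{n+1}) \le F^{-1}(u)$ for all $n$, so $L := \lim_n F^{-1}(u_n)$ exists and satisfies $L \le F^{-1}(u)$. To get the opposite inequality, consider an $x \in \mathbb{R}$ with $F^{-1}(u) > x$. Then

$u > F(x)$ (by the SF)
$\implies u_n > F(x)$ for all large $n$ (since $u_n \uparrow u$)
$\implies F^{-1}(u_n) > x$ for all large $n$ (SF again)
$\implies L > x$ (since $L = \lim_n F^{-1}(u_n)$ and the $F^{-1}(u_n)$ are nondecreasing in $n$).

Now let $x$ tend up to $F^{-1}(u)$ to conclude $L \ge F^{-1}(u)$, as desired.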
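Before the proof continues, the switching formula (5), which carried both arguments above, can be checked numerically, along with the warning that its strict version may fail. The sketch below is illustrative only: it assumes the three-point example ($X$ uniform on $\{0, 1/2, 1\}$), checks finite grids of $u$ and $x$ values rather than all of them, and every name in it is hypothetical.

```python
from fractions import Fraction

# Assumed example: X takes the values 0, 1/2, 1 with probability 1/3 each.
SUPPORT = [Fraction(0), Fraction(1, 2), Fraction(1)]
PROBS = [Fraction(1, 3)] * 3


def F(x):
    """Distribution function F(x) = P[X <= x]."""
    return sum((p for s, p in zip(SUPPORT, PROBS) if s <= x), Fraction(0))


def F_inv(u):
    """Smallest x with u <= F(x), for 0 < u < 1."""
    return next(s for s in SUPPORT if u <= F(s))


U_GRID = [Fraction(k, 12) for k in range(1, 12)]    # grid inside (0, 1)
X_GRID = [Fraction(k, 8) for k in range(-4, 13)]    # grid from -1/2 to 3/2


def switching_formula_holds():
    """Check u <= F(x)  <=>  F_inv(u) <= x on the grids."""
    return all((u <= F(x)) == (F_inv(u) <= x) for u in U_GRID for x in X_GRID)


def strict_version_failures():
    """Grid pairs (u, x) where u < F(x) and F_inv(u) < x disagree."""
    return [(u, x) for u in U_GRID for x in X_GRID
            if (u < F(x)) != (F_inv(u) < x)]


print(switching_formula_holds())
print(strict_version_failures()[:3])
```

For instance, the pair $u = 1/6$, $x = 0$ has $u < F(x) = 1/3$ but $F^{-1}(u) = 0$ is not less than $x$, so the strict version of (5) fails there.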

IDF3: This result is not so important, so I'll leave it to you as Exercise 3.

(8) holds. The inequality on the right follows directly from the SF, as in the proof of IDF1. To get the inequality on the left, set $x = F^{-1}(u)$; we need to show $u \ge F(x-)$. For this let $\xi < x$. Then

$F^{-1}(u) > \xi$ (since $x = F^{-1}(u)$)
$\implies u > F(\xi)$ (by the SF).

Letting $\xi$ tend up to $x$ shows that $u \ge F(x-)$, as desired.

(9) holds. The argument for this is similar to that for (8); I'll leave it to Exercise 4.

Relations (8) and (9) specify the extent to which $F$ and $F^{-1}$ are inverses. Note that the inequalities in these relations can be strict, as in the following cases:

[Figure: two sketches. For (8): the point $x := F^{-1}(u)$, with $u$ lying between $F(x-)$ and $F(x)$. For (9): the level $u := F(x)$, with $x$ lying between $F^{-1}(u)$ and $F^{-1}(u+)$.]

In view of IDF2 and IDF4, $F^{-1}$ is called the left-continuous inverse to $F$. IDF4 and the preceding examples yield this corollary:

Theorem 4. Let $F^{-1}$ be the left-continuous inverse to the df $F$. Then

$$F(F^{-1}(u)) = u \text{ for all } u \in (0, 1) \iff F \text{ is continuous}, \tag{10}$$
$$F^{-1}(F(x)) = x \text{ for all } x \in A := \{x \in \mathbb{R} : 0 < F(x) < 1\} \iff F \text{ is strictly increasing over } A. \tag{11}$$

The inverse probability transformation

We saw earlier (see (1) and (2)) that if the df $F$ of a random variable $X$ is continuous and strictly increasing, then (i) $F(X)$ is uniformly distributed over $(0, 1)$, and, conversely, (ii) if $U$ is uniformly distributed over $(0, 1)$, then $F^{-1}(U) \sim X$. The following theorem says that (ii) holds without any conditions on $F$. Statement (i) is not always true, but there are some things that can be said; we'll deal with that in the next subsection.

Theorem 5 (The IPT Theorem). Let $X$ be a random variable with df $F$ and left-continuous inverse df $F^{-1}$. If $U \sim \text{Uniform}(0, 1)$, then $F^{-1}(U) \sim X$.

Proof. For each $x \in \mathbb{R}$, we have $F^{-1}(U) \le x \iff U \le F(x)$ by the switching formula. Thus

$$P[F^{-1}(U) \le x] = P[U \le F(x)] = F(x) = P[X \le x].$$

Since this is true for all $x \in \mathbb{R}$, $F^{-1}(U) \sim X$.

Example 1. Suppose $X$ takes the values $0$, $1/2$, $1$ with probability $1/3$ each. The graphs of $F$ and $F^{-1}$ are as follows:

[Figure: graph of the df $F(x)$ of $X$ versus $x$, and graph of the inverse df $F^{-1}(u)$ versus $u$.]

It is clear that if $U \sim \text{Uniform}(0, 1)$, then $F^{-1}(U)$ takes the values $0$, $1/2$, $1$ with probability $1/3$ each, just as $X$ does.
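The IPT Theorem is the basis of inverse-transform sampling, and Example 1 can be simulated directly. The sketch below is a minimal illustration under stated assumptions: it uses Python's standard `random` generator as a stand-in for a Uniform(0, 1) variable, looks up $F^{-1}(u)$ by binary search over the cumulative probabilities, and the names `VALUES`, `CUM`, `F_inv`, and `sample_frequencies` are hypothetical.

```python
import random
from bisect import bisect_left
from collections import Counter
from fractions import Fraction

# Example 1: X takes the values 0, 1/2, 1 with probability 1/3 each.
VALUES = [Fraction(0), Fraction(1, 2), Fraction(1)]
CUM = [Fraction(1, 3), Fraction(2, 3), Fraction(1)]   # F at each value


def F_inv(u):
    """Smallest value x with u <= F(x), for 0 < u < 1."""
    # bisect_left returns the first index i with CUM[i] >= u.
    return VALUES[bisect_left(CUM, u)]


def sample_frequencies(n=30000, seed=1):
    """Relative frequencies of F_inv(U) over n simulated uniforms U."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        u = rng.random()
        while u == 0.0:        # random() lies in [0, 1); keep u inside (0, 1)
            u = rng.random()
        counts[F_inv(u)] += 1
    return {v: counts[v] / n for v in VALUES}


# Each frequency should be near 1/3.
print(sample_frequencies())
```

The binary search is a design choice, not something the notes prescribe: it is one way to evaluate $F^{-1}$ quickly when the distribution has many support points.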

The probability integral transformation

The second half of the following theorem gives a necessary and sufficient condition for $F(X)$ to be uniformly distributed over $(0, 1)$.

Theorem 6 (The PIT Theorem). Let $X$ be a random variable with df $F$. Then

$$P[F(X) \le u] \le u \text{ for all } u \in (0, 1). \tag{12}$$

Moreover

$$P[F(X) \le u] = u \text{ for all } u \in (0, 1) \iff F \text{ is continuous}. \tag{13}$$

Proof. Let $U \sim \text{Uniform}(0, 1)$ and let $F^{-1}$ be the left-continuous inverse to $F$. By the IPT, $X \sim F^{-1}(U)$, so $F(X) \sim F(F^{-1}(U))$.

(12) holds. In general, $F(F^{-1}(U)) \ge U$ by (8). Thus for all $u \in (0, 1)$,

$$P[F(X) \le u] = P[F(F^{-1}(U)) \le u] \le P[U \le u] = u.$$

(13) holds. If $F$ is continuous, then $U = F(F^{-1}(U))$ by (8), so $F(X) \sim F(F^{-1}(U)) = U \sim \text{Uniform}(0, 1)$. On the other hand, if $F$ is not continuous, then there exists an $x \in \mathbb{R}$ such that

$$0 < F(x) - F(x-) = P[X = x] \le P[F(X) = F(x)].$$

But $P[U = F(x)] = 0$, so $F(X) \not\sim U$.

Example 2. As in the preceding example, suppose $X$ takes the values $0$, $1/2$, $1$ with probability $1/3$ each. Then $Y := F(X)$ takes the values $1/3$, $2/3$, $1$ with probability $1/3$ each. The graphs of the df $F$ of $X$ and the df $G$ of $Y$ are as follows:

[Figure: graph of the df $F(x)$ of $X$ versus $x$, and graph of the df $G(y)$ of $Y = F(X)$ versus $y$.]

The graph of $G$ shows that $G(y) \le y$ for all $y \in (0, 1)$, as (12) asserts.

The proof of the PIT Theorem illustrates a useful technique: if you want to prove something about the distribution of a random variable $X$ with df $F$, try representing $X$ as $F^{-1}(U)$ for a uniform random variable $U$.
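Inequality (12) can be verified exactly for Example 2. The sketch below is illustrative only: it computes the df $G$ of $Y = F(X)$ with exact rational arithmetic and compares $G(y)$ with $y$ on a finite grid of points in $(0, 1)$; the names `SUPPORT`, `PROBS`, `F`, `G`, and `GRID` are hypothetical.

```python
from fractions import Fraction

# Example 2: X takes the values 0, 1/2, 1 with probability 1/3 each.
SUPPORT = [Fraction(0), Fraction(1, 2), Fraction(1)]
PROBS = [Fraction(1, 3)] * 3


def F(x):
    """Distribution function of X: F(x) = P[X <= x]."""
    return sum((p for s, p in zip(SUPPORT, PROBS) if s <= x), Fraction(0))


def G(y):
    """Distribution function of Y = F(X): G(y) = P[F(X) <= y]."""
    return sum((p for s, p in zip(SUPPORT, PROBS) if F(s) <= y), Fraction(0))


GRID = [Fraction(k, 12) for k in range(1, 12)]    # grid inside (0, 1)

# Print y, G(y), and whether G(y) <= y holds at each grid point.
for y in GRID:
    print(y, G(y), G(y) <= y)
```

On this grid, $G(y) = y$ only at $y = 1/3$ and $y = 2/3$, the values taken by $F(X)$ inside $(0, 1)$; at every other grid point the inequality is strict, in line with (13) since this $F$ is not continuous.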

Exercises

The following definitions and results are needed for Exercise 1. Suppose $(x_n)_{n=1}^{\infty}$ is an infinite sequence of real numbers and $x$ is a real number. One writes

$x_n \uparrow x$ to mean $x_n \le x_{n+1}$ for all $n$ and $\lim_n x_n = x$;
$x_n \downarrow x$ to mean $x_n \ge x_{n+1}$ for all $n$ and $\lim_n x_n = x$.

Similarly, if $(A_n)_{n=1}^{\infty}$ is an infinite sequence of events and $A$ is an event, one writes

$A_n \uparrow A$ to mean $A_n \subset A_{n+1}$ for all $n$ and $A = \bigcup_{n=1}^{\infty} A_n$;
$A_n \downarrow A$ to mean $A_n \supset A_{n+1}$ for all $n$ and $A = \bigcap_{n=1}^{\infty} A_n$.

One of the properties of a probability measure $P$ is that

$$A_n \uparrow A \implies P[A_n] \uparrow P[A], \tag{14}$$
$$A_n \downarrow A \implies P[A_n] \downarrow P[A]. \tag{15}$$

Properties (14) and (15) are called respectively continuity from below and continuity from above.

Exercise 1. Complete the proof of Theorem 1, using (14) and (15) to verify properties DF3 and DF4.

Exercise 2. Let $F^{-1}$ be the left-continuous inverse to a df $F$. Show by examples that $F^{-1}(u) < x$ does not imply $u < F(x)$, and that $u < F(x)$ does not imply $F^{-1}(u) < x$.

Exercise 3. Prove IDF3. [Hint: put $L = \lim_{u \downarrow 0} F^{-1}(u)$ and $\xi = \inf(A)$ for $A = \{x \in \mathbb{R} : F(x) > 0\}$. Note that $L$ and $\xi$ may be $-\infty$. Deduce $L \le \xi$ from the fact that $L \le x$ for each $x \in A$ (why?). Deduce $L \ge \xi$ from the fact that $F^{-1}(u) \in A$ for each $u > 0$ (why?).]

Exercise 4. Prove (9).

Exercise 5. Let $F$ be a df such that $0 < F(x) < 1$ for all $x \in \mathbb{R}$ and let $F^{-1}$ be the left-continuous inverse to $F$. Show that

$$F^{-1}(F(F^{-1}(u))) = F^{-1}(u) \text{ for all } u \in (0, 1), \tag{16}$$
$$F(F^{-1}(F(x))) = F(x) \text{ for all } x \in \mathbb{R}. \tag{17}$$

Exercise 6. Prove Theorem 4.

Exercise 7. Inequality (12) has an important implication for p-values. What is that?
Exercise 8. Let $X$ be a random variable with df $F$ and left-continuous inverse df $F^{-1}$.
(a) Show that for $x \in \mathbb{R}$ and $0 < u < 1$, one has

$$u \ge F(x-) \iff F^{-1}(u+) \ge x, \tag{18}$$

with $F(x-)$ defined as in DF4 and $F^{-1}(u+)$ defined by (7).
(b) Use part (a) to show that for $0 < u < 1$,

$$F^{-1}(u+) = \sup\{x : u \ge F(x-)\}. \tag{19}$$

[Hint for (a): use the switching formula $v > F(w) \iff F^{-1}(v) > w$, noting for example that $u \ge F(x-) \iff u \ge F(w)$ for all $w < x$.]

Unfortunately the jumps of $F$ and $F^{-1}$ complicate what would otherwise be a simple theory. Fortunately, though, $F$ and $F^{-1}$ don't have too many jumps: according to the following exercise, there are at most countably many of them.

Exercise 9. Let $B$ be a subinterval of $\mathbb{R}$ ($B$ doesn't have to be a proper subinterval; the case $B = \mathbb{R}$ is allowed). Let $f$ be a nondecreasing mapping from $B$ into $\mathbb{R}$. (For example, $f$ might be a df, defined on $B = \mathbb{R}$, or an inverse df, defined on $B = (0, 1)$.) Put

$$D_f := \{x \in B : f \text{ is discontinuous at } x\}, \tag{20}$$
$$C_f := D_f^c = \{x \in B : f \text{ is continuous at } x\}. \tag{21}$$

(a) Show that $D_f$ is countable.
(b) Use part (a) to show that $C_f$ is dense in $B$.
[Hint for (a): First show that for any closed bounded subinterval $A$ of $B$ and any number $\epsilon > 0$, there are at most finitely many points $x \in A$ such that the jump $f(x+) - f(x-)$ of $f$ at $x$ exceeds $\epsilon$.]

The following exercise plays an important role in the theory of convergence of probability distributions. The main point is that (22) implies (24). Similar reasoning shows that, conversely, (24) implies (22); you don't have to give the argument for that.

Exercise 10. Let $F, F_1, F_2, \ldots, F_n, \ldots$ be dfs with corresponding left-continuous inverse dfs $F^{-1}, F_1^{-1}, F_2^{-1}, \ldots, F_n^{-1}, \ldots$. Suppose that

$$\lim_n F_n(x) = F(x) \text{ for all continuity points } x \text{ of } F. \tag{22}$$

(a) Suppose that $u \in (0, 1)$ and that $w$ is a continuity point of $F$ with $F^{-1}(u) > w$. Use the switching formula to show that $F_n^{-1}(u) > w$ for all large $n$.
(b) Suppose that $u \in (0, 1)$ and that $y$ is a continuity point of $F$ with $F^{-1}(u+) < y$. Show that $F_n^{-1}(u) \le y$ for all large $n$.
(c) Use parts (a) and (b) of this exercise and part (b) of the preceding exercise to show that for each $u \in (0, 1)$, one has

$$F^{-1}(u) \le \liminf_n F_n^{-1}(u) \le \limsup_n F_n^{-1}(u) \le F^{-1}(u+). \tag{23}$$

(d) Use part (c) to show that

$$\lim_n F_n^{-1}(u) = F^{-1}(u) \text{ for all continuity points } u \text{ of } F^{-1}. \tag{24}$$

(e) Show by example that if $u$ is not a continuity point of $F^{-1}$, then $F_n^{-1}(u)$ may not converge as $n \to \infty$.