Stat 643 Review of Probability Results (Cressie)

Probability Space: $(\Omega, \mathcal{A}, P)$
$\Omega$ is the set of outcomes.
$\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$.
$P$ is a probability measure mapping from $\mathcal{A}$ into $[0,1]$.

Measurable Space: $(\Omega, \mathcal{A})$

Random Variable: Suppose $(\Omega, \mathcal{A}, P)$ is a probability space and let $X: \Omega \to \mathbb{R}$ be measurable (i.e., $\{\omega \in \Omega : X(\omega) \le \alpha\} \in \mathcal{A}$ for all $\alpha \in \mathbb{R}$). Then $X$ is said to be a random variable (r.v.).

Integral of a Measurable Function: Suppose $(\Omega, \mathcal{A}, \mu)$ is a measure space (i.e., $\mu$ maps from $\mathcal{A}$ into $[0,\infty]$) and $f$ is a measurable mapping. Then $\int f\,d\mu$ is defined as a limit of integrals of simple (i.e., step) functions.
Write: $f = f^+ - f^-$, where $f^+, f^- \ge 0$.
If $\int f^+\,d\mu < \infty$ and $\int f^-\,d\mu < \infty$, then the integral is said to be finite.
If $\int f^+\,d\mu = \infty = \int f^-\,d\mu$, then the integral is said not to exist; otherwise it is said to exist.
The measurable function $f$ is said to be integrable if $\int f\,d\mu$ exists and is finite.
Notation: If $X$ is a r.v. on $(\Omega, \mathcal{A}, P)$, write $E(X)$ for $\int X\,dP$.

Important Convergence Theorems

Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $(\mathbb{R}, \mathcal{B})$ be the measurable space of real numbers with the Borel $\sigma$-algebra $\mathcal{B}$. In the following, $g$, $f$, $\{f_n\}_{n \ge 1}$, and $\{g_n\}_{n \ge 1}$ denote measurable functions from $(\Omega, \mathcal{A})$ into $(\mathbb{R}, \mathcal{B})$.

Fatou's Lemma: If $f_n \ge 0$ a.e. ($\mu$), for all $n$, then $\int \liminf f_n\,d\mu \le \liminf \int f_n\,d\mu$.

Monotone Convergence Theorem: Suppose that a.e. ($\mu$), $0 \le f_n \uparrow f$. Then $\int f_n\,d\mu \uparrow \int f\,d\mu$.

Dominated Convergence Theorem: Suppose that a.e. ($\mu$), $f_n \to f$ as $n \to \infty$ and $|f_n| \le g$ for all $n$. If $\int g\,d\mu < \infty$, then $f$ is integrable and $\lim \int f_n\,d\mu = \int f\,d\mu$.

Extended Dominated Convergence Theorem: Suppose that
(i) $f_n \to f$ a.e. ($\mu$), $g_n \to g$ a.e. ($\mu$)
(ii) $|f_n| \le g_n$ a.e. ($\mu$) and $\int g_n\,d\mu < \infty$, for all $n$
(iii) $\lim \int g_n\,d\mu = \int g\,d\mu < \infty$

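Then, $\lim \int f_n\,d\mu = \int f\,d\mu$, and this limit is finite.
Note: Dominated convergence is a special case with $g_n = g$ for all $n$.

Scheffé's Theorem: Suppose $f_n \ge 0$ and $f \ge 0$ a.e. ($\mu$). Let $\nu_n(A) \equiv \int_A f_n\,d\mu$ and $\nu(A) \equiv \int_A f\,d\mu$ be measures on $(\Omega, \mathcal{A})$ with $\nu_n(\Omega) = \nu(\Omega) < \infty$, for all $n$. If $f_n \to f$ a.e. ($\mu$), then
(i) $\sup\{|\nu_n(A) - \nu(A)| : A \in \mathcal{A}\} \to 0$, as $n \to \infty$, and
(ii) $\int |f_n - f|\,d\mu \to 0$, as $n \to \infty$.

Uniform Integrability

The sequence of measurable functions $\{f_n\}_{n \ge 1}$ is called uniformly integrable (w.r.t. $\mu$) if
$$\lim_{\lambda \to \infty} \sup_{n \ge 1} \int_{\{|f_n| > \lambda\}} |f_n|\,d\mu = 0.$$

Theorem: Suppose that $\mu(\Omega) < \infty$, $f_n \to f$ a.e. ($\mu$), and the sequence $\{f_n\}_{n \ge 1}$ is uniformly integrable. Then $\{f_n : n \ge 1\}$, $f$ are all integrable and $\int f_n\,d\mu \to \int f\,d\mu$.

Various Forms of Convergence for r.v.'s

Almost sure convergence: $X_n \xrightarrow{\text{a.s.}} X$ if $P(\lim X_n = X) = 1$.
Convergence in probability: $X_n \xrightarrow{P} X$ if $\lim P(|X_n - X| > \varepsilon) = 0$, $\forall \varepsilon > 0$.
Convergence in $L_p$: $X_n \xrightarrow{L_p} X$ if, for $\int |X_n|^p\,dP < \infty$ and $\int |X|^p\,dP < \infty$, $\lim \int |X_n - X|^p\,dP = 0$;
e.g., $p = 1$ corresponds to convergence in the mean, and $p = 2$ corresponds to convergence in mean square.

Jensen's Inequality: Let $X \equiv (X_1, \ldots, X_k)'$ be a random vector (i.e., a measurable mapping from $(\Omega, \mathcal{A}, P)$ to $(\mathbb{R}^k, \mathcal{B}^k)$, where $\mathcal{B}^k$ is the $\sigma$-algebra of Borel sets in $\mathbb{R}^k$). Suppose $X \in D$ a.s., where $D$ is a convex set in $\mathbb{R}^k$, and $E(\|X\|) < \infty$. (Recall that $\|X\| \equiv (X_1^2 + \cdots + X_k^2)^{1/2}$.) Define $E(X) \equiv (E(X_1), \ldots, E(X_k))'$. Let $\varphi: D \to \mathbb{R}$, where $\varphi$ is convex (i.e., $\varphi(\alpha x + (1-\alpha) y) \le \alpha \varphi(x) + (1-\alpha)\varphi(y)$ for $0 \le \alpha \le 1$). Then $E(\varphi(X)) \ge \varphi(E(X))$.
Corollary: If $\psi$ is concave then $E(\psi(X)) \le \psi(E(X))$. ($\psi$ is concave iff $-\psi$ is convex.)

Radon-Nikodym Theorem

Definition: A signed measure $\mu$ on a measurable space $(\Omega, \mathcal{A})$ is a mapping $\mu: \mathcal{A} \to (-\infty, \infty]$ such that (s.t.) $\mu(\emptyset) = 0$ and
$$\mu\Big(\bigcup_{i \in I} A_i\Big) = \sum_{i \in I} \mu(A_i), \qquad (*)$$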

where $I$ is countable and $\{A_i\}$ are disjoint. The equality in (*) is taken to mean that the summation converges absolutely if $\mu(\bigcup_{i \in I} A_i)$ is finite, and diverges otherwise.

Jordan Decomposition: A signed measure $\mu$ can be written as $\mu = \mu^+ - \mu^-$, where $\mu^+$ and $\mu^-$ are measures.
Definition: $|\mu| \equiv \mu^+ + \mu^-$.
Example: Let $X$ be a r.v. s.t. $\int |X|\,dP < \infty$. Then $\mu(A) \equiv \int_A X\,dP$; $A \in \mathcal{A}$, is a signed measure.

Definition: A measure $\mu$ is a $\sigma$-finite measure if $\exists$ disjoint $\{A_i\}$ s.t. $\Omega = \bigcup_{i=1}^{\infty} A_i$ and $\mu(A_i) < \infty$; $i = 1, 2, \ldots$

Definition: A [signed] measure $\mu$ is said to be absolutely continuous (a.c.) with respect to a [signed] measure $\nu$ if $\nu(A) = 0 \Rightarrow \mu(A) = 0$ [$|\nu|(A) = 0 \Rightarrow \mu(A) = 0$]. Write $\mu \ll \nu$.

Definition: Two measurable mappings $f$ and $g$ on $(\Omega, \mathcal{A}, \mu)$ are said to be equivalent if $\mu(f \ne g) = 0$.

Radon-Nikodym (R-N) Theorem
(1) Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\mu$ a signed measure s.t. $\mu \ll P$. Then $\exists$ r.v. $X$, unique up to equivalence ($P$), s.t. $\mu(A) = \int_A X\,dP$, $\forall A \in \mathcal{A}$, where $\int X^-\,dP < \infty$.
(2) Let $(\Omega, \mathcal{A}, \nu)$ be a measure space, where $\nu$ is $\sigma$-finite, and let $\mu$ be a signed measure s.t. $\mu \ll \nu$. Then $\exists$ measurable mapping $f$, unique up to equivalence ($\nu$), s.t. $\mu(A) = \int_A f\,d\nu$, $\forall A \in \mathcal{A}$, where $\int f^-\,d\nu < \infty$.
Notes:
(i) When $(\Omega, \mathcal{A}, P)$ is a probability space, $P$ is trivially $\sigma$-finite and so (1) is just a special case of (2).
(ii) When $\mu$ is a measure, $f$ in (2) (and thus $X$ in (1)) is nonnegative.
Notation: $X$ in (1) or $f$ in (2) is called the Radon-Nikodym derivative and is denoted as $d\mu/dP$ in (1) or $d\mu/d\nu$ in (2).

Conditional Expectations

Let $(\Omega, \mathcal{A}, P)$ be a probability space and $X$ a r.v. s.t. $X$ is integrable. Let $\mathcal{C} \subset \mathcal{A}$ be a sub $\sigma$-algebra (i.e., $\mathcal{C}$ is a $\sigma$-algebra contained in $\mathcal{A}$). Define a signed measure $\mu$ on $(\Omega, \mathcal{C})$ by:

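$$\mu(C) \equiv \int_C X\,dP; \quad C \in \mathcal{C}.$$

Definition: Let $P_{\mathcal{C}}$ be the probability measure on $\mathcal{C}$ given by $P_{\mathcal{C}}(C) = P(C)$; $\forall C \in \mathcal{C}$.
Then, $\mu \ll P_{\mathcal{C}}$ and for any $\mathcal{C}$-measurable r.v. $Y$ (i.e., $Y^{-1}((-\infty, \alpha]) \in \mathcal{C}$, $\forall \alpha \in \mathbb{R}$), $\int Y\,dP = \int Y\,dP_{\mathcal{C}}$.

Definition: A function $g: \Omega \to \mathbb{R}$ is called (a version of) the conditional expectation of $X$ given $\mathcal{C}$ if
(i) $g$ is $\mathcal{C}$-measurable
(ii) $\int_C g\,dP = \int_C X\,dP$, $\forall C \in \mathcal{C}$
Note: The R-N Theorem guarantees the existence of the conditional expectation because $\mu(C) = \int_C g\,dP_{\mathcal{C}}$, where $g$ is the R-N derivative $d\mu/dP_{\mathcal{C}}$; i.e., $\mu(C) = \int_C g\,dP$. The r.v. $g$ on $\mathcal{C}$ is unique up to equivalence.
Notation: $g = E(X \mid \mathcal{C})$
Notes:
(i) If $X$ is $\mathcal{C}$-measurable (i.e., $X^{-1}((-\infty, \alpha]) \in \mathcal{C}$, $\forall \alpha \in \mathbb{R}$) then $E(X \mid \mathcal{C}) = X$ a.s. ($P$).
(ii) If $\mathcal{C} = \{\emptyset, \Omega\}$, then $E(X \mid \mathcal{C}) = E(X)$.
(iii) Suppose $Y$ is another r.v. on $(\Omega, \mathcal{A}, P)$ and define $\mathcal{B}(Y) \equiv \{Y^{-1}(B) : B \in \mathcal{B}\}$ to be the $\sigma$-algebra generated by $Y$. Then write $E(X \mid \mathcal{B}(Y))$ as $E(X \mid Y)$.
(iv) The conditional expectation of $X$ "smooths" the r.v. $X$. Suppose $X$ is $\mathcal{A}$-measurable on $(\Omega, \mathcal{A}, P)$. Then:
Sub $\sigma$-algebra: $\{\emptyset, \Omega\} \subset \mathcal{C} \subset \mathcal{A}$
$E(X \mid \cdot)$: $E(X)$, $E(X \mid \mathcal{C})$, $X$ a.s.
Smoothness: "smoothest", "smooth", "roughest"

Conditional Monotone Convergence Theorem: If $X_n \ge 0$, $X_n \uparrow X$ a.s. ($P$), then $E(X_n \mid \mathcal{C}) \uparrow E(X \mid \mathcal{C})$ a.s. ($P$).

Conditional Dominated Convergence Theorem: If $X_n \to X$ a.s. ($P$), $|X_n| \le Y$ a.s. ($P$) and $E(Y) < \infty$, then $\lim E(X_n \mid \mathcal{C}) = E(X \mid \mathcal{C})$ a.s. ($P$).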
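To make the definition of conditional expectation concrete, here is a minimal sketch on a finite probability space. The six-point sample space, the uniform probabilities, the choice $X(\omega) = \omega^2$, and the partition generating the sub $\sigma$-algebra are all illustrative assumptions, not part of the notes. On a finite space, a version of the conditional expectation is the block-wise average of the r.v., and the code checks defining condition (ii) on every set of the sub $\sigma$-algebra.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite probability space: Omega = {0,...,5} with uniform P.
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}

# Illustrative random variable X(w) = w^2.
X = {w: w * w for w in omega}

# Sub sigma-algebra generated by this (assumed) partition of Omega.
partition = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]


def integral(f, event):
    """Integral of f over `event` with respect to P (a finite sum)."""
    return sum(f[w] * P[w] for w in event)


def conditional_expectation(f, blocks):
    """Return E(f | sigma(blocks)): constant on each block, equal to the block average."""
    g = {}
    for block in blocks:
        average = integral(f, block) / sum(P[w] for w in block)
        for w in block:
            g[w] = average
    return g


g = conditional_expectation(X, partition)

# Every set in the generated sigma-algebra is a union of blocks (2^3 sets here).
events = [
    frozenset().union(*combo)
    for r in range(len(partition) + 1)
    for combo in combinations(partition, r)
]

# Defining property (ii): the integrals of g and X agree on every such set.
property_ii_holds = all(integral(g, C) == integral(X, C) for C in events)
print(property_ii_holds)
```

Exact rational arithmetic (`Fraction`) is used so that the equality of integrals can be tested without floating-point tolerance.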

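Conditional Jensen's Inequality: Let $\varphi: D \to \mathbb{R}$, where $\varphi$ is convex and $D$ is a convex subset of $\mathbb{R}$. Let $X$ be a r.v. on $(\Omega, \mathcal{A}, P)$ s.t. $X \in D$ a.s. ($P$). Suppose $E(|\varphi(X)|) < \infty$. Then $E(\varphi(X) \mid \mathcal{C}) \ge \varphi(E(X \mid \mathcal{C}))$ a.s. ($P$).

Regular Conditional Probability

Let $A \in \mathcal{A}$ and $I_A$ be the indicator function of $A$. Then $E(I_A \mid \mathcal{C})$ is a $\mathcal{C}$-measurable r.v. that has some properties of a probability on sets $A \in \mathcal{A}$. Notice that $P(\omega, A) \equiv E(I_A \mid \mathcal{C})(\omega)$ is a mapping from $\Omega \times \mathcal{A}$ into $[0,1]$.

Definition: Given a probability space $(\Omega, \mathcal{A}, P)$, $\psi: \Omega \times \mathcal{A} \to [0,1]$ is called a regular conditional probability on $\mathcal{A}$ given $\mathcal{C}$ if
(i) for each fixed $\omega \in \Omega$, $\psi(\omega, \cdot)$ is a probability measure on $(\Omega, \mathcal{A})$,
(ii) for each fixed $A \in \mathcal{A}$, $\psi(\cdot, A)$ is $\mathcal{C}$-measurable, and
(iii) for every $A \in \mathcal{A}$, $\int_C \psi(\omega, A)\,dP(\omega) = P(A \cap C)$, $\forall C \in \mathcal{C}$.
Note: Although $E(I_A \mid \mathcal{C})$ satisfies (ii) and (iii), it does not necessarily satisfy (i).

Theorem: If $(\Omega, \mathcal{A}, P) = (\mathbb{R}, \mathcal{B}, P)$, then a regular conditional probability on $\mathcal{B}$ given $\mathcal{C}$ exists and is given by $\psi(\omega, A) = E(I_A \mid \mathcal{C})(\omega)$.

Change of Variable Theorem: Let $f: (\Omega, \mathcal{A}, \mu) \to (\Omega^*, \mathcal{A}^*, \nu)$ and $g: (\Omega^*, \mathcal{A}^*, \nu) \to (\mathbb{R}, \mathcal{B})$ be two measurable functions, where $\nu(A^*) \equiv \mu(f^{-1}(A^*))$ is the measure induced by $f$ on $(\Omega^*, \mathcal{A}^*)$. Then
$$\int_{A^*} g\,d\nu = \int_{f^{-1}(A^*)} g \circ f\,d\mu = \int_{f^{-1}(A^*)} g(f(\omega))\,d\mu(\omega), \quad \forall A^* \in \mathcal{A}^*,$$
in the sense that if one of the integrals exists then so does the other and they are equal.

Probability versus Statistics

Probability is concerned with r.v.'s $X: (\Omega, \mathcal{A}, P) \to (\mathbb{R}, \mathcal{B})$. Now $X$ induces a probability measure $P^X$ on $(\mathbb{R}, \mathcal{B})$ through $P^X(B) \equiv P(X^{-1}(B))$, $\forall B \in \mathcal{B}$. Statistics focuses on the triple $(\mathbb{R}, \mathcal{B}, P^X)$ and essentially forgets about $(\Omega, \mathcal{A}, P)$.

More generally, $X$ does not have to be a r.v. on $(\mathbb{R}, \mathcal{B})$ but could be some more general random quantity. Then data $X$ could be thought of simply as a measurable mapping into $(\mathcal{X}, \mathcal{B}, P^X)$.
Example: $\mathcal{X} = \mathbb{R}^p$ and $\mathcal{B} = \mathcal{B}^p$. Then $X$ is a random vector. More complicated $\mathcal{X}$ and $\mathcal{B}$ are needed when $X$ is, say, a random set.

Definition: A statistic $T$ is a measurable mapping $T: (\mathcal{X}, \mathcal{B}, P^X) \to (\mathcal{T}, \mathcal{F})$.
Note: $P^X$ induces a $P^T$: $P^T(F) \equiv P^X(T^{-1}(F))$, $\forall F \in \mathcal{F}$.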
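As a concrete check of the Change of Variable Theorem and of the notion of an induced measure, the following sketch works on a finite measure space, where every integral is a finite sum. The five-point space, the uniform masses, and the particular maps `f` and `g` are illustrative assumptions only.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite measure space: Omega = {-2,...,2} with uniform mass mu.
mu = {w: Fraction(1, 5) for w in (-2, -1, 0, 1, 2)}


def f(w):
    """Illustrative measurable map from Omega into Omega* = {0, 1, 4}."""
    return w * w


def g(t):
    """Illustrative real-valued function on Omega*."""
    return 3 * t + 1


def pushforward(measure, mapping):
    """Induced measure: nu({t}) = measure(mapping^{-1}({t})) for each point t."""
    nu = defaultdict(Fraction)
    for w, mass in measure.items():
        nu[mapping(w)] += mass
    return dict(nu)


nu = pushforward(mu, f)


def both_sides(a_star):
    """Return (integral of g over a_star w.r.t. nu, integral of g(f) over f^{-1}(a_star) w.r.t. mu)."""
    lhs = sum(g(t) * nu[t] for t in a_star)
    preimage = [w for w in mu if f(w) in a_star]
    rhs = sum(g(f(w)) * mu[w] for w in preimage)
    return lhs, rhs


lhs, rhs = both_sides({1, 4})
print(lhs == rhs)
```

The same `pushforward` helper also illustrates how a r.v. or a statistic induces a distribution on its range space: replace `mu` by a probability measure and `f` by the r.v. or statistic.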

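Notation: $\mathcal{B}_0 \equiv \mathcal{B}(T) \equiv \{T^{-1}(F) : F \in \mathcal{F}\}$, the $\sigma$-algebra generated by $T$. Then $\mathcal{B}_0 \subset \mathcal{B}$ is a sub $\sigma$-algebra.

Lehmann's Theorem (TSH, p. 42): A real-valued $\mathcal{B}$-measurable function $\varphi$ is $\mathcal{B}_0$-measurable iff $\exists$ a real-valued $\mathcal{F}$-measurable function $\psi: (\mathcal{T}, \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$ s.t. $\varphi(x) = \psi(T(x))$, $\forall x \in \mathcal{X}$, where $T$ is a statistic mapping into $(\mathcal{T}, \mathcal{F})$ and $\mathcal{B}_0 = \mathcal{B}(T)$.

Partitioning the Sample Space

Suppose $\mathcal{X}$ is the sample space and $T$ is a statistic. Then $T$ defines a partition of $\mathcal{X}$ as follows: For $x, y \in \mathcal{X}$, $x$ and $y$ are in the same member of the partition (write $x \sim y$) iff $T(x) = T(y)$.
Notice that two "different" statistics can generate the same partition, e.g., $T_1(x) = \sum_{i=1}^{n} x_i$ and $T_2(x) = \sum_{i=1}^{n} x_i / n$.
It is tempting to characterize a statistic's behavior via its partition of $\mathcal{X}$ but, for technical reasons, it does not necessarily generate the $\sigma$-algebra of interest, namely $\mathcal{B}(T)$. In general, if $\mathcal{B}(T_1) = \mathcal{B}(T_2)$ then we say the two statistics $T_1$, $T_2$ are the same.

Change of Variable Formula Involving a Statistic

Suppose $(\mathcal{X}, \mathcal{B}, P^X) \xrightarrow{T} (\mathcal{T}, \mathcal{F}, P^T) \xrightarrow{g} (\mathbb{R}, \mathcal{B})$. Then,
$$\int_{T^{-1}(F)} g(T(x))\,dP^X(x) = \int_F g(t)\,dP^T(t); \quad F \in \mathcal{F},$$
in the sense that if one integral exists then so does the other and they are equal.

Proof: Assume $F = \mathcal{T}$, the whole space. First let $g = I_{F_0}$ for some $F_0 \in \mathcal{F}$. Then
$$g(T(x)) = I_{F_0}(T(x)) = I_{T^{-1}(F_0)}(x).$$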

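Thus, since $\mathcal{X} = T^{-1}(\mathcal{T})$,
$$\int_{\mathcal{X}} g(T(x))\,dP^X(x) = \int_{\mathcal{X}} I_{T^{-1}(F_0)}(x)\,dP^X(x) = P^X(T^{-1}(F_0)) = P^T(F_0) = \int_{\mathcal{T}} I_{F_0}(t)\,dP^T(t) = \int_{\mathcal{T}} g(t)\,dP^T(t).$$
The same result is obtained when $F$ is not $\mathcal{T}$. Therefore, the result is true for indicator functions, so true for simple functions, and hence true for limits of simple functions.

Probability Version of Change of Variables Formula

Suppose $(\Omega, \mathcal{A}, P) \xrightarrow{X} (\mathcal{X}, \mathcal{B}, P^X) \xrightarrow{g} (\mathbb{R}, \mathcal{B})$. Then,
$$\int_{\Omega} g(X(\omega))\,dP(\omega) = \int_{\mathcal{X}} g(x)\,dP^X(x),$$
which is known as the law of the unconscious statistician.

Change of Variables and the R-N Derivative

Suppose $P^X \ll \mu$, where $\mu$ is some $\sigma$-finite measure on $(\mathcal{X}, \mathcal{B})$. Define $f(x) \equiv (dP^X/d\mu)(x)$; $x \in \mathcal{X}$, which recall is a $\mathcal{B}$-measurable mapping; i.e., $P^X(B) = \int_B f\,d\mu$. Then, for $g$ s.t. $\int_{\mathcal{X}} |g(x)|\,dP^X(x) < \infty$, the change of variable formula gives
$$E(g(X)) \equiv \int_{\Omega} g(X(\omega))\,dP(\omega) = \int_{\mathcal{X}} g(x)\,dP^X(x).$$
Further,
$$\int_{\mathcal{X}} g(x)\,dP^X(x) = \int_{\mathcal{X}} g(x) f(x)\,d\mu(x).$$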

This last equality is true for $g(\cdot) = I_B(\cdot)$, because
$$\int g\,dP^X = \int I_B\,dP^X = P^X(B) = \int_B f\,d\mu = \int I_B f\,d\mu = \int g f\,d\mu.$$
Hence it is true for simple functions, and so it is true for limits of simple functions.

Example: $\mu$ is Lebesgue measure. Write "$dx$" as shorthand for $d\mu(x)$. For example, if $\int x^2\,dP^X(x) < \infty$, then $E(X^2) = \int x^2 f(x)\,dx$, where $f(x)$ is the R-N derivative of $P^X$ w.r.t. $\mu$; $f$ is commonly called the probability density function.
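The density formula for $E(X^2)$ can be checked numerically. The sketch below assumes, purely for illustration, that $X$ has an Exponential(1) distribution, so the density with respect to Lebesgue measure is $e^{-x}$ on $[0, \infty)$ and the exact second moment is 2. The midpoint rule and the truncation of the range at 50 are numerical conveniences, not part of the notes.

```python
import math


def density(x):
    """Assumed density of P^X w.r.t. Lebesgue measure: Exponential(1)."""
    return math.exp(-x) if x >= 0 else 0.0


def integrate(h, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of h over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(h(lower + (i + 0.5) * width) for i in range(steps))


# The density should integrate to (approximately) 1 over the truncated range.
total_mass = integrate(density, 0.0, 50.0)

# E(X^2) computed as the integral of x^2 f(x) dx.
second_moment = integrate(lambda x: x * x * density(x), 0.0, 50.0)

print(total_mass, second_moment)
```

Replacing `density` and the integration limits gives the same check for any other absolutely continuous distribution whose second moment is finite.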