Measure-theoretic probability
Koltay L.
VEGTMAM144B
November 28, 2012
The probability space

Definition
The measure space $(\Omega, \mathcal{A}, P)$ is a probability space where
- $\Omega$ is the set of elementary events;
- $\mathcal{A}$ is the $\sigma$-field of events (or simply field of events);
- $P$ is the probability measure, i.e. $P(\Omega) = 1$.

Examples
Combinatorial probability: $\Omega = \{\omega_1, \omega_2, \dots, \omega_n\}$, $\mathcal{A} = 2^\Omega$, $P(\{\omega_k\}) = \frac{1}{n}$, $k = 1, 2, \dots, n$, hence
$$P(A) = \sum_{\omega_k \in A} \frac{1}{n} = \frac{\text{number of elements of } A}{\text{number of elements of } \Omega}, \quad A \subseteq \Omega.$$
Geometric probability: $\Omega \in \mathcal{B}^n$, $0 < m_n(\Omega) < \infty$, $\mathcal{A} = \mathcal{B}^n_\Omega$, and $P(A)$ is proportional to $m_n(A)$, hence
$$P(A) = \frac{m_n(A)}{m_n(\Omega)}, \quad A \in \mathcal{B}^n_\Omega.$$
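Both examples can be computed directly. Below is a minimal sketch (the die and the intervals are illustrative choices, not from the slides): the combinatorial case counts elements, the geometric case takes ratios of Lebesgue measures.

```python
from fractions import Fraction

# Combinatorial probability: P(A) = |A| / |Omega| on a finite classical space.
omega = {1, 2, 3, 4, 5, 6}           # a fair die (illustrative)
A = {2, 4, 6}                        # the event "even outcome"
P_A = Fraction(len(A), len(omega))   # = 1/2

# Geometric probability: P(A) = m_n(A) / m_n(Omega) with 0 < m_n(Omega) < inf.
# Here Omega = [0, 2] and A = [0.5, 1.5] on the real line (illustrative).
m_omega = 2.0
m_A = 1.5 - 0.5
P_geom = m_A / m_omega               # = 0.5
```

In both cases the normalization $P(\Omega) = 1$ holds by construction.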
Random variables

Definition
Let $(\Omega, \mathcal{A}, P)$ be a probability space. An $\mathcal{A}$-measurable function $X : \Omega \to \mathbb{R}^n$ is called a random variable (RV).

Remark
If $n = 1$ it is a scalar RV, otherwise a random vector variable (RVV).
The function $X : \Omega \to \mathbb{R}$ is a scalar RV iff for all $x \in \mathbb{R}$
$$X^{-1}(]-\infty; x[) = \{\omega \mid X(\omega) < x\} \in \mathcal{A}.$$
The function $X = (X_1, X_2, \dots, X_n) : \Omega \to \mathbb{R}^n$ is a RVV iff $X_k : \Omega \to \mathbb{R}$ is a scalar RV for all $k = 1, 2, \dots, n$.
Distribution of random variables

Definition
The cumulative distribution function (cdf, or simply distribution function) of the RV(V) $X = (X_1, X_2, \dots, X_n) : \Omega \to \mathbb{R}^n$ is the function
$$F(x_1, x_2, \dots, x_n) = P\left(X^{-1}(]-\infty; x_1[ \times ]-\infty; x_2[ \times \dots \times ]-\infty; x_n[)\right) = P(X_1 < x_1, X_2 < x_2, \dots, X_n < x_n),$$
$(x_1, x_2, \dots, x_n) \in \mathbb{R}^n$.

Remark
The cdf of $X$ is the distribution function induced by the generated measure $P_X$ on $\mathcal{B}^n$; $F$ and $P_X$ uniquely determine each other, so that
$$P(X \in B) = P_X(B) = \int_B dP_X = \int_B dF, \quad B \in \mathcal{B}^n.$$
Properties of the cdf

If $I \subseteq \mathbb{R}^n$ is a bounded interval, then $P(X \in I) = P_X(I) = [F]_I \ge 0$; e.g. if $n = 1$,
$$P_X([a; b[) = F(b) - F(a), \qquad P_X([a; b]) = F(b+0) - F(a),$$
$$P_X(]a; b[) = F(b) - F(a+0), \qquad P_X(]a; b]) = F(b+0) - F(a+0).$$
$F$ is left-continuous and monotone increasing in each of its variables.
$F$ has limit values: for $(x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n) \in \mathbb{R}^{n-1}$, $n > 1$,
$$\lim_{x_k \to -\infty} F(x_1, \dots, x_k, \dots, x_n) = 0,$$
$$\lim_{x_k \to +\infty} F(x_1, \dots, x_k, \dots, x_n) = P(X_1 < x_1, \dots, X_{k-1} < x_{k-1}, X_{k+1} < x_{k+1}, \dots, X_n < x_n),$$
which is the cdf of the RV(V) $(X_1, \dots, X_{k-1}, X_{k+1}, \dots, X_n)$; and if $n = 1$ then
$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1.$$
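The interval formulas above depend on the left-continuous convention $F(x) = P(X < x)$ used here. A minimal sketch (the uniform distribution on $\{1, 2, 3\}$ is an illustrative choice) shows how closed and half-open intervals pick up the jumps differently:

```python
from fractions import Fraction

# Left-continuous cdf F(x) = P(X < x) of X uniform on {1, 2, 3}.
support = [1, 2, 3]
p = Fraction(1, 3)

def F(x):
    """F(x) = P(X < x): strict inequality, so F is left-continuous."""
    return sum(p for s in support if s < x)

def F_right(x):
    """Right-hand limit F(x + 0) = P(X <= x)."""
    return sum(p for s in support if s <= x)

# Interval probabilities from the slide's formulas (a = 1, b = 2):
P_closed_open = F(2) - F(1)          # P(X in [1; 2[) = 1/3
P_closed      = F_right(2) - F(1)    # P(X in [1; 2]) = 2/3
```

The jump of $F$ at a support point $z$ equals $F(z+0) - F(z) = P(X = z)$.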
Discrete distribution

If $P_X$ is dominated by the counting measure
$$\nu(A) = \sum_{z \in Z \cap A} 1,$$
where $Z$ is a finite or countable subset of $\mathbb{R}^n$, then denoting $p_z = \frac{dP_X}{d\nu}(z)$ at the points $z \in Z$, where $Z$ is a subset of the range of $X$ with $P_X$-measure 1, and $p_z = 0$ otherwise, we have
$$F(x) = \sum_{z < x} p_z, \qquad p_z = \lim_{x \to z+0} F(x) - F(z),$$
where $z \mapsto p_z$, $z \in Z$, is a discrete probability density function (discrete pdf) satisfying
$$p_z \ge 0 \quad \text{and} \quad \sum_{z \in Z} p_z = 1;$$
moreover
$$P(X \in B) = P_X(B) = \int_B dP_X = \int_B p_z \, d\nu(z) = \sum_{z \in B} p_z, \quad B \in \mathcal{B}^n.$$
Continuous distribution

If $P_X$ is dominated by the Lebesgue measure $m_n$, then denoting $f(z) = \frac{dP_X}{dm_n}(z)$ at the points $z \in Z$, where $Z$ is a subset of the range of $X$ with $P_X$-measure 1, and $f(z) = 0$ otherwise, we have
$$F(x) = \int_{z < x} f \, dm_n, \quad x \in \mathbb{R}^n, \qquad f(z) = \frac{\partial^n}{\partial z_1 \cdots \partial z_n} F(z) \ \text{at the continuity points } z \in \mathbb{R}^n \text{ of } f,$$
where $z \mapsto f(z)$ is a continuous probability density function (continuous pdf) satisfying
$$f(z) \ge 0 \quad \text{and} \quad \int f(z) \, dm_n = 1;$$
moreover
$$P(X \in B) = P_X(B) = \int_B dP_X = \int_B f(z) \, dm_n, \quad B \in \mathcal{B}^n,$$
where the relations $<$, $\in$ are meant componentwise.
The support of a pdf

Remark
In both the discrete and the continuous case the pdf $\frac{dP_X}{d\nu}$ or $\frac{dP_X}{dm_n}$ can be set to 0 except on a set $Z \subseteq \operatorname{im}(X)$ with $P_X$-measure 1, since $P_X(B) = P_X(B \cap Z)$, $B \in \mathcal{B}^n$:
in the discrete case
$$P_X(B) = \sum_{z \in B \cap Z} p_z + \sum_{z \in B \cap Z^c} 0 = \sum_{z \in B \cap Z} p_z,$$
in the continuous case
$$P_X(B) = \int_{B \cap Z} f(z) \, dm_n + \int_{B \cap Z^c} 0 \, dm_n = \int_{B \cap Z} f(z) \, dm_n.$$
Probability vs. measure theory

If some property is valid at the points of $\Omega$ except on a set $N$ of $P$-measure 0, i.e. it is valid a.e., then $P(N^c) = 1$ and we say that it is valid with probability 1, i.e. almost surely (a.s.).
If the sequence $X_1, X_2, \dots$ of RVs converges to the RV $X$ in
1. the a.e. sense, then we say that the sequence converges strongly, or a.s., denoted $(\text{a.s.}) \lim_{n \to \infty} X_n = X$;
2. $P$-measure, then we say that the sequence converges in probability, denoted $(P) \lim_{n \to \infty} X_n = X$;
3. $L^p$, then we say that the sequence converges in $p$-th mean, denoted $(L^p) \lim_{n \to \infty} X_n = X$; we say convergence in mean or in mean square in case of $p = 1$ and $p = 2$ respectively, where mean square convergence implies convergence in mean.
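The convergence modes above are genuinely different. A classical illustrative example (not from the slides): on $\Omega = [0, 1]$ with Lebesgue measure, $X_n = n \cdot 1_{[0, 1/n]}$ converges to 0 a.s. and in probability, but $E|X_n| = n \cdot \frac{1}{n} = 1$ for every $n$, so it does not converge in mean. The sketch below evaluates both quantities exactly:

```python
from fractions import Fraction

# On Omega = [0, 1] with Lebesgue measure P, let X_n = n * 1_{[0, 1/n]}.
def prob_Xn_exceeds(n, eps=Fraction(1, 2)):
    """P(|X_n| > eps) = P([0, 1/n]) = 1/n for 0 < eps < n."""
    return Fraction(1, n)

def l1_distance(n):
    """E|X_n - 0| = n * P([0, 1/n]) = n * (1/n)."""
    return n * Fraction(1, n)

probs = [prob_Xn_exceeds(n) for n in (1, 10, 1000)]   # tends to 0
l1 = [l1_distance(n) for n in (1, 10, 1000)]          # stays at 1
```

`probs` shrinking to 0 witnesses convergence in probability; `l1` being constantly 1 witnesses the failure of $L^1$ convergence.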
Weak convergence of distributions

Definition
The sequence of cdfs $F_n : \mathbb{R}^r \to [0; 1]$, $n = 1, 2, \dots$, converges weakly to the distribution function $F : \mathbb{R}^r \to [0; 1]$ if
$$\lim_{n \to \infty} F_n(x) = F(x) \ \text{at the continuity points } x \in \mathbb{R}^r \text{ of } F,$$
denoted $F_n \xrightarrow{w} F$.

Remark
As a cdf can have at most countably many discontinuity points, the weak limit is unique.
$F_n \xrightarrow{w} F$ is equivalent with
$$\lim_{n \to \infty} [F_n]_I = [F]_I, \quad I = [a_1; b_1] \times \dots \times [a_r; b_r] \subseteq \mathbb{R}^r,$$
for $a = (a_1, \dots, a_r)$ and $b = (b_1, \dots, b_r)$ continuity points of $F$.
Independent events

We call two or more events $A_i$, $i \in I$, independent if for any finite collection of indices $\{i_1, i_2, \dots, i_k\} \subseteq I$
$$P(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_k}) = P(A_{i_1}) \, P(A_{i_2}) \cdots P(A_{i_k}),$$
which means, according to the concept of the conditional probability measure $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, $P(B) > 0$, that for any two disjoint collections of indices $\{i_1, i_2, \dots, i_k\}, \{j_1, j_2, \dots, j_l\} \subseteq I$
$$P(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_k} \mid A_{j_1} \cap A_{j_2} \cap \dots \cap A_{j_l}) = P(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_k}),$$
i.e. some of them occur jointly with the same probability regardless of the occurrence of some others.

Remark
Any event can be substituted by its complement, and the independence remains true.
Independent collections of events and RVs

We call two or more sets of events $\mathcal{A}_i$, $i \in I$, independent if the events $A_i \in \mathcal{A}_i$, $i \in I$, are independent. It also means that a set can be extended by the complements of its events.
We call two or more RV(V)s $X_i$, $i \in I$, independent if the generated $\sigma$-fields $\mathcal{A}_{X_i}$, $i \in I$, are independent. This is equivalent with the following conditions: for any finite collection of indices $\{i_1, i_2, \dots, i_k\} \subseteq I$
$$P_{(X_{i_1}, X_{i_2}, \dots, X_{i_k})} = P_{X_{i_1}} \otimes P_{X_{i_2}} \otimes \dots \otimes P_{X_{i_k}},$$
$$F_{(X_{i_1}, X_{i_2}, \dots, X_{i_k})}(x_1, x_2, \dots, x_k) = F_{X_{i_1}}(x_1) \, F_{X_{i_2}}(x_2) \cdots F_{X_{i_k}}(x_k),$$
and if $P_{(X_{i_1}, X_{i_2}, \dots, X_{i_k})}$ is dominated by the Lebesgue measure $m_k$,
$$f_{(X_{i_1}, X_{i_2}, \dots, X_{i_k})}(x_1, x_2, \dots, x_k) = f_{X_{i_1}}(x_1) \, f_{X_{i_2}}(x_2) \cdots f_{X_{i_k}}(x_k) \quad m_k\text{-a.e.},$$
and if $P_{(X_{i_1}, X_{i_2}, \dots, X_{i_k})}$ is dominated by a counting measure $\nu_k$,
$$p_{(x_1, x_2, \dots, x_k)} = p_{x_1} \, p_{x_2} \cdots p_{x_k} \quad \nu_k\text{-a.e.}$$
Expectation and standard deviation

Definition
The expected value of the scalar RV $X : \Omega \to \mathbb{R}$ is the integral
$$E(X) = \int_\Omega X \, dP$$
if its value is finite, i.e. $X \in L^1(\Omega, \mathcal{A}, P)$.
The standard deviation of the scalar RV $X : \Omega \to \mathbb{R}$ is
$$D(X) = \sqrt{E\left[(X - E(X))^2\right]}$$
if $E(X)$ and $E\left[(X - E(X))^2\right]$ exist, i.e. $X \in L^2(\Omega, \mathcal{A}, P)$.
$$V(X) = D^2(X) = E\left[(X - E(X))^2\right] = E(X^2) - E^2(X)$$
is called the variance of the RV $X$.
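For a finite discrete RV all three quantities are simple finite sums. A minimal sketch for a fair die (an illustrative choice), using the identity $V(X) = E(X^2) - E^2(X)$:

```python
from fractions import Fraction
from math import sqrt

# Expectation, variance and standard deviation of X uniform on {1,...,6}.
p = Fraction(1, 6)
EX  = sum(k * p for k in range(1, 7))        # E(X)   = 7/2
EX2 = sum(k * k * p for k in range(1, 7))    # E(X^2) = 91/6
VX  = EX2 - EX**2                            # V(X)   = 35/12
DX  = sqrt(VX)                               # D(X)   = sqrt(V(X))
```

Working in exact rational arithmetic until the final square root keeps the identity $V(X) = E(X^2) - E^2(X)$ verifiable without rounding error.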
Properties of the expectation

Assuming the existence of the expectations of the RVs,
1. if $0 \le X$ then $0 \le E(X)$; moreover "=" holds if and only if $X = 0$ a.s.;
2. if $\alpha, \beta \in \mathbb{R}$ then $E(\alpha X + \beta Y) = \alpha E(X) + \beta E(Y)$;
3. if $A \in \mathcal{A}$ then $E(1_A) = P(A)$; especially $E(1_\Omega) = P(\Omega) = 1$, so a constant RV $c = c \cdot 1_\Omega$ has expectation $E(c) = c$, the same constant;
4. if $h : \mathbb{R} \to \mathbb{R}$ is a convex function, then $h(E(X)) \le E(h(X))$;
5. if $X, Y$ are independent RVs then $E(XY) = E(X) \, E(Y)$.
Properties of the standard deviation

Assuming the existence of the expectations of the RVs,
1. $D(X) \ge 0$, and equality holds if and only if $X = E(X)$ a.s., i.e. $X$ is an a.s. constant function;
2. $D(X) = \sqrt{E(X^2) - E^2(X)}$, i.e. $D^2(X) = V(X) = E(X^2) - E^2(X)$;
3. if $X, Y$ are independent, then $D(X + Y) = \sqrt{D^2(X) + D^2(Y)}$, i.e. $V(X + Y) = V(X) + V(Y)$;
4. if $a, b \in \mathbb{R}$ then $D(aX + b) = |a| \, D(X)$, i.e. $V(aX + b) = a^2 \, V(X)$.
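Properties 3 and 4 can be checked exactly on a small example. The sketch below (two independent fair dice and the affine map $3X - 2$ are illustrative choices) computes each variance from its full distribution:

```python
from fractions import Fraction
from itertools import product

# Checking V(X + Y) = V(X) + V(Y) for independent X, Y,
# and V(aX + b) = a^2 V(X), on two independent fair dice.
xs = range(1, 7)
p36 = Fraction(1, 36)                 # joint pmf of the independent pair

def var(values_probs):
    """V = E(X^2) - E^2(X) from a list of (value, probability) pairs."""
    ex  = sum(v * q for v, q in values_probs)
    ex2 = sum(v * v * q for v, q in values_probs)
    return ex2 - ex**2

VX   = var([(x, Fraction(1, 6)) for x in xs])             # 35/12
VSum = var([(x + y, p36) for x, y in product(xs, xs)])    # = 2 * VX
Vaff = var([(3 * x - 2, Fraction(1, 6)) for x in xs])     # = 9 * VX
```

Note that `VSum == 2 * VX` relies on independence; for correlated summands the cross term $2\,\mathrm{Cov}(X, Y)$ would appear.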
Calculating the expected value of a RV

Let $F$ denote the cdf of the RV $X$; then in the image space
$$E(X) = \int_{\mathbb{R}} x \, dP_X = \int_{-\infty}^{+\infty} x \, dF(x)$$
if the integral is finite. Especially, if $P_X$ is dominated by
- the counting measure with pdf $x_k \mapsto p_k$, $k = 1, 2, \dots$, then
$$E(X) = \sum_k x_k \, p_k,$$
assuming the series converges absolutely, i.e. $\sum_k |x_k| \, p_k$ is finite;
- the Lebesgue measure with pdf $x \mapsto f(x)$, $x \in \mathbb{R}$, then
$$E(X) = \int_{-\infty}^{+\infty} x \, f(x) \, dx,$$
assuming the (improper) integral converges absolutely, i.e. $\int_{-\infty}^{+\infty} |x| \, f(x) \, dx$ is finite.
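In the continuous case $E(X) = \int x f(x)\,dx$ can be approximated by a Riemann sum when no closed form is at hand. A minimal sketch for the uniform density $f \equiv 1$ on $[0, 1]$ (an illustrative choice, where the exact answer is $1/2$):

```python
# E(X) = integral of x * f(x) dx, approximated by a midpoint Riemann sum
# for the uniform density f(x) = 1 on [0, 1].
N = 100_000
h = 1.0 / N
EX = sum((i + 0.5) * h * 1.0 * h for i in range(N))   # x * f(x) * dx
```

For smooth densities the midpoint rule error is $O(h^2)$; here the sum is essentially exact up to floating-point rounding.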
Expectation of a Borel function of a RV(V)

Let $F$ denote the cdf of the RV(V) $X$, and let $h : \mathbb{R}^n \to \mathbb{R}$ be Borel measurable; then
$$E(h(X)) = \int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} h(x_1, x_2, \dots, x_n) \, dF(x_1, x_2, \dots, x_n)$$
if the integral is finite. Especially, if $P_X$ is dominated by
- the counting measure with pdf $x_k \mapsto p_k$, $k = 1, 2, \dots$, then
$$E(h(X)) = \sum_k h(x_k) \, p_k,$$
assuming the series converges absolutely;
- the Lebesgue measure with pdf $x \mapsto f(x)$, $x = (x_1, x_2, \dots, x_n) \in \mathbb{R}^n$, then
$$E(h(X)) = \int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} h(x_1, \dots, x_n) \, f(x_1, \dots, x_n) \, dx_1 \cdots dx_n,$$
assuming the (improper) integral converges absolutely.
Moment-generating function

Definitions
The moment-generating function of the RV $X : \Omega \to \mathbb{R}$ is
$$M_X(t) = E(e^{tX}), \quad t \in \mathbb{R}, \ e^{tX} \in L^1.$$
The generating function of the RV $X : \Omega \to \mathbb{N}$ is
$$G_X(z) = E(z^X), \quad z \in \mathbb{R}, \ z^X \in L^1.$$

Remark
$M_X(0) = G_X(1) = 1$, and $M_X(t) = G_X(e^t)$.
$$M_X(t) = 1 + t \, E(X) + \frac{t^2}{2!} E(X^2) + \dots + \frac{t^n}{n!} E(X^n) + \dots$$
$$G_X(z) = P(X = 0) + z \, P(X = 1) + \dots + z^n \, P(X = n) + \dots$$
If the moment $E(X^k)$ of order $k$ exists, then
$$M_X^{(k)}(0) = E(X^k), \qquad G_X^{(k)}(1) = E\left(X (X - 1) \cdots (X - k + 1)\right).$$
If the RVs $X, Y$ are independent, then
$$M_{X+Y}(t) = M_X(t) \, M_Y(t), \qquad G_{X+Y}(z) = G_X(z) \, G_Y(z).$$
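For an integer-valued RV the generating function is a power series in its pmf, $G_X'(1) = E(X)$, and the independence property $G_{X+Y} = G_X G_Y$ is exactly polynomial multiplication, i.e. convolution of the pmfs. A minimal sketch with $X \sim \mathrm{Binomial}(2, 1/2)$ (an illustrative choice):

```python
from fractions import Fraction

# pmf of X ~ Binomial(2, 1/2), stored as coefficients p_n = P(X = n),
# so that G_X(z) = sum_n p_n z^n.
pX = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

def G(p, z):
    """Evaluate the generating function G(z) = sum_n p_n z^n."""
    return sum(pn * z**n for n, pn in enumerate(p))

def G_prime_at_1(p):
    """G'(1) = sum_n n * p_n = E(X)."""
    return sum(n * pn for n, pn in enumerate(p))

def convolve(p, q):
    """Polynomial product of pmfs: the pmf of X + Y for independent X, Y."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

pXY = convolve(pX, pX)   # pmf of X + Y with X, Y i.i.d. Binomial(2, 1/2)
```

Since a sum of two independent $\mathrm{Binomial}(2, 1/2)$ variables is $\mathrm{Binomial}(4, 1/2)$, the convolution reproduces that pmf, and expectations add: $G'_{X+Y}(1) = 2 \cdot G'_X(1)$.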
Convergent sequences of RVs

1. Weak law of large numbers: if $X_1, X_2, \dots, X_n, \dots$ are independent, identically distributed (i.i.d.) RVs with common expectation and standard deviation $m = E(X_k)$, $\sigma = D(X_k)$, $k = 1, 2, \dots$, then
$$(P) \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n X_k = m.$$
2. Bernoulli law of large numbers: if the RVs $Y_n$ are distributed according to the binomial law ($k \mapsto p_k = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$), then
$$(P) \lim_{n \to \infty} \frac{Y_n}{n} = p.$$
3. Strong law of large numbers: if $X_1, X_2, \dots, X_n, \dots$ are i.i.d. RVs, then the limit $(\text{a.s.}) \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n X_k$ exists if and only if $X_k \in L^1$, $k = 1, 2, \dots$, and in this case
$$(\text{a.s.}) \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n X_k = m = E(X_k).$$
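The law of large numbers can be illustrated by simulation (the uniform distribution and the fixed seed below are illustrative choices): sample means of i.i.d. draws concentrate around $m = E(X_k)$ as $n$ grows.

```python
import random

# Law of large numbers by simulation: sample means of i.i.d.
# Uniform(0, 1) draws approach m = E(X_k) = 1/2 as n grows.
random.seed(0)   # fixed seed for reproducibility

def sample_mean(n):
    """Mean of n i.i.d. Uniform(0, 1) draws."""
    return sum(random.random() for _ in range(n)) / n

means = [sample_mean(n) for n in (10, 1000, 100_000)]
```

For $n = 100{,}000$ the standard deviation of the sample mean is $\sigma/\sqrt{n} \approx 0.0009$, so deviations from $1/2$ larger than $0.01$ are extremely unlikely.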
Weak convergence of distributions

1. The hypergeometric distribution $k \mapsto p_k = \dfrac{\binom{M}{k} \binom{N-M}{n-k}}{\binom{N}{n}}$, $k = 0, 1, \dots, n$, converges weakly to the binomial distribution:
$$\lim_{\substack{N, M \to \infty \\ M/N \to p}} p_k = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n.$$
2. The binomial distribution $k \mapsto p_k = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$, converges weakly to the Poisson distribution:
$$\lim_{\substack{n \to \infty \\ np \to \lambda}} p_k = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, \dots$$
3. Central limit theorem: if $X_1, X_2, \dots, X_n, \dots$ are square integrable i.i.d. RVs with $m = E(X_k)$, $\sigma = D(X_k)$, $k = 1, 2, \dots$, then denoting the cdf of the RV $\dfrac{\frac{1}{n} \sum_{k=1}^n X_k - m}{\sigma / \sqrt{n}}$ by $F_n$,
$$F_n \xrightarrow{w} \Phi, \quad \text{i.e.} \quad \lim_{n \to \infty} F_n(x) = \Phi(x), \quad x \in \mathbb{R}.$$
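The binomial-to-Poisson limit in point 2 is easy to check numerically: with $\lambda = 2$ fixed (an illustrative value) and $p = \lambda/n$, the pointwise gap between the two pmfs shrinks as $n$ grows.

```python
from math import comb, exp, factorial

# Binomial(n, lam/n) -> Poisson(lam) as n -> infinity, illustrated at lam = 2.
lam = 2.0

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    return lam**k / factorial(k) * exp(-lam)

n = 10_000
max_gap = max(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k))
              for k in range(10))
```

The total variation distance between $\mathrm{Binomial}(n, \lambda/n)$ and $\mathrm{Poisson}(\lambda)$ is of order $\lambda^2/n$, so for $n = 10{,}000$ the pointwise gap is well below $10^{-3}$.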
Conditional expectation

Definition
Let $X : \Omega \to \mathbb{R}$ be a RV with existing expected value ($X \in L^1$), and let $\mathcal{A}_0 \subseteq \mathcal{A}$ be a sub-$\sigma$-field. The conditional expectation of $X$ with respect to the $\sigma$-field $\mathcal{A}_0$ is the $\mathcal{A}_0$-measurable function $E(X \mid \mathcal{A}_0) : \Omega \to \mathbb{R}$ for which
$$\nu(A) = \int_A X \, dP = \int_A E(X \mid \mathcal{A}_0) \, dP, \quad A \in \mathcal{A}_0.$$

Remark
The conditional expectation is the a.s. unique Radon-Nikodym derivative $E(X \mid \mathcal{A}_0) = \frac{d\nu}{dP}$.
If $Y : \Omega \to \mathbb{R}^n$ is a RV(V) and $\mathcal{A}_0 = \mathcal{A}_Y$, then there is a $P_Y$-a.s. unique function $E(X \mid Y = \cdot) \in L^1(\mathbb{R}^n, \mathcal{B}^n, P_Y)$, and substituting $Y$ into it we get the RV $E(X \mid Y) = E(X \mid \mathcal{A}_Y) \in L^1(\Omega, \mathcal{A}_Y, P)$.
If $X$ is square integrable, then $E(X \mid Y = \cdot) \in L^2(\mathbb{R}^n, \mathcal{B}^n, P_Y)$ is the function for which
$$\iint_{\mathbb{R} \times \mathbb{R}^n} |x - E(X \mid Y = y)|^2 \, dP_{XY} = \int_\Omega |X - E(X \mid Y)|^2 \, dP$$
is minimal.
Properties of the conditional expectation

1. If $X \in L^1$ then $E(E(X \mid \mathcal{A}_0)) = E(X)$.
2. If $0 \le X \in L^1$ then $E(X \mid \mathcal{A}_0) \ge 0$ a.s.
3. If $X, Y \in L^1$ and $\alpha, \beta \in \mathbb{R}$ then $E(\alpha X + \beta Y \mid \mathcal{A}_0) = \alpha E(X \mid \mathcal{A}_0) + \beta E(Y \mid \mathcal{A}_0)$ a.s.
4. If $Y : \Omega \to \mathbb{R}^n$ is a RV(V), $h : \mathbb{R}^n \to \mathbb{R}$ is a measurable function and $X, h(Y) X \in L^1$, then $E(h(Y) X \mid Y) = h(Y) \, E(X \mid Y)$ a.s., which is true for any constant function $h(y) = c$ too.
5. If the RVs $X, Y$ are independent and $h(X, Y) \in L^1$, then $E(h(X, Y) \mid Y = y) = E(h(X, y))$ $P_Y$-a.s.; especially, if $X \in L^1$ then $E(X \mid Y) = E(X)$ a.s.
6. If $X \in L^1$ and $\mathcal{A}_0 \subseteq \mathcal{A}_1 \subseteq \mathcal{A}$ then $E(E(X \mid \mathcal{A}_1) \mid \mathcal{A}_0) = E(X \mid \mathcal{A}_0)$ a.s.; in particular $E(E(X \mid Y_0, Y_1) \mid Y_0) = E(X \mid Y_0)$ a.s. if $\mathcal{A}_0 = \mathcal{A}_{Y_0}$, $\mathcal{A}_1 = \mathcal{A}_{Y_0, Y_1}$.
Conditional expectation as the best regression

Let $X \in L^2$ and let $Y : \Omega \to \mathbb{R}^n$ be a RV(V); then
$$E\left(|X - E(X \mid Y)|^2\right) \le \int_\Omega |X - H(Y)|^2 \, dP, \quad H \in L^2(\mathbb{R}^n, \mathcal{B}^n, P_Y),$$
and equality holds if and only if $E(X \mid Y) = H(Y)$ a.s., i.e. the best (in $L^2$ norm) regressor function is the conditional expectation $E(X \mid Y)$.
The residual variance of $X$ by the best regressor function of $Y$ is
$$\sigma_R^2 = E\left(|X - E(X \mid Y)|^2\right) = E(X^2) - E\left(E^2(X \mid Y)\right) = D^2(X) - D^2(E(X \mid Y)).$$
Conditional probability

If $A \in \mathcal{A}$ then we denote by $P(A \mid \mathcal{A}_0) = E(1_A \mid \mathcal{A}_0)$ or $P(A \mid Y = y) = E(1_A \mid Y = y)$ the conditional probability of the event $A$ for given $\mathcal{A}_0$ or $Y = y$. It satisfies the following:
- $0 \le P(A \mid \mathcal{A}_0) \le 1$ and $P(\Omega \mid \mathcal{A}_0) = 1$ a.s.;
- if $A_1, A_2, \dots \in \mathcal{A}$ are pairwise disjoint events then
$$P\left(\bigcup_n A_n \,\Big|\, \mathcal{A}_0\right) = \sum_n P(A_n \mid \mathcal{A}_0) \quad \text{a.s.}$$
Discrete case

Let the RVV $(X, Y) : \Omega \to \mathbb{R} \times \mathbb{R}^n$ have the discrete pdf
$$(x_k, y_l) \mapsto p_{kl}, \quad k = 1, 2, \dots, \ l = 1, 2, \dots;$$
then
$$E(X \mid Y = y_l) = \sum_k x_k \, p_{x_k \mid y_l}, \quad l = 1, 2, \dots,$$
where for each $l = 1, 2, \dots$, if $0 < p_l = \sum_k p_{kl} = P(Y = y_l)$,
$$p_{x_k \mid y_l} = P(X = x_k \mid Y = y_l) = \frac{p_{kl}}{p_l}, \quad k = 1, 2, \dots,$$
is the conditional distribution of the RV $X$ for given $Y = y_l$.
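The discrete formula is a direct computation on the joint table. A minimal sketch with an illustrative $2 \times 2$ joint pmf (values chosen only for the example):

```python
from fractions import Fraction

# Joint discrete pdf p_{kl} of (X, Y), with X, Y both in {0, 1}.
F = Fraction
p = {(0, 0): F(1, 4), (0, 1): F(1, 4),
     (1, 0): F(1, 8), (1, 1): F(3, 8)}

def E_X_given_Y(y):
    """E(X | Y = y) = sum_k x_k * p_{kl} / p_l, with p_l = P(Y = y)."""
    p_y = sum(pkl for (x, yl), pkl in p.items() if yl == y)   # marginal P(Y = y)
    return sum(x * pkl / p_y for (x, yl), pkl in p.items() if yl == y)

e0 = E_X_given_Y(0)   # = (1/8) / (3/8) = 1/3
e1 = E_X_given_Y(1)   # = (3/8) / (5/8) = 3/5
```

As a consistency check, the tower property $E(E(X \mid Y)) = E(X)$ holds: $\frac{3}{8} \cdot \frac{1}{3} + \frac{5}{8} \cdot \frac{3}{5} = \frac{1}{2} = P(X = 1)$.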
Continuous case

Let the RVV $(X, Y) : \Omega \to \mathbb{R} \times \mathbb{R}^n$ have the continuous pdf
$$(x, y) \mapsto f(x, y), \quad x \in \mathbb{R}, \ y \in \mathbb{R}^n;$$
then
$$E(X \mid Y = y) = \int x \, f_{X \mid Y}(x \mid y) \, dx,$$
where for each $y \in \mathbb{R}^n$, if $0 < f_Y(y) = \int f(x, y) \, dx$,
$$f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}, \quad x \in \mathbb{R},$$
is the conditional distribution of the RV $X$ for given $Y = y$.
Mixed case

If $X = \sum_k x_k \, 1_{A_k}$, where $\{X = x_k\} = A_k \in \mathcal{A}$, $P(A_k) > 0$, $k = 1, 2, \dots$, is a partition of $\Omega$, and the RV $Y : \Omega \to \mathbb{R}^n$ is distributed depending on the occurrence of the event $A_k$ as
$$P(Y \in B \mid A_k) = \int_B f_k \, dm_n, \quad B \in \mathcal{B}^n, \ k = 1, 2, \dots,$$
then $Y$ is distributed with the continuous pdf
$$f(y) = \sum_k P(A_k) \, f_k(y), \quad y \in \mathbb{R}^n \quad \text{(theorem of total probability)},$$
and
$$E(Y \mid X = x_k) = \int y \, f_k(y) \, dy, \quad k = 1, 2, \dots \quad (\text{if } n = 1),$$
$$P(A_k \mid Y = y) = \frac{P(A_k) \, f_k(y)}{f(y)}, \quad y \in \mathbb{R}^n, \ f(y) > 0 \quad \text{(Bayes' theorem)},$$
$$E(X \mid Y = y) = \frac{\sum_k P(A_k) \, f_k(y) \, x_k}{f(y)}, \quad y \in \mathbb{R}^n, \ f(y) > 0.$$
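The mixed-case formulas can be sketched with concrete densities (all choices below, priors and $f_0(y) = 1$, $f_1(y) = 2y$ on $[0, 1]$, are illustrative, not from the slides):

```python
# Mixed case: X takes values 0 and 1 with P(A_0) = P(A_1) = 1/2,
# and Y | A_k has density f_0(y) = 1, f_1(y) = 2y on [0, 1].
priors = [0.5, 0.5]
f = [lambda y: 1.0, lambda y: 2.0 * y]

def mixture_density(y):
    """Theorem of total probability: f(y) = sum_k P(A_k) f_k(y)."""
    return sum(pk * fk(y) for pk, fk in zip(priors, f))

def posterior(k, y):
    """Bayes' theorem: P(A_k | Y = y) = P(A_k) f_k(y) / f(y)."""
    return priors[k] * f[k](y) / mixture_density(y)

def E_X_given_Y(y):
    """E(X | Y = y) = sum_k x_k P(A_k | Y = y), with x_k = k here."""
    return sum(k * posterior(k, y) for k in (0, 1))
```

At $y = 0.5$ both densities take the value 1, so the posterior equals the prior $1/2$; at $y = 1$ the second component is twice as likely and the posterior becomes $2/3$.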