
Probability Theory

Klaus Ritter
TU Kaiserslautern, WS 2017/18

Literature. In particular,

H. Bauer, Probability Theory, de Gruyter, Berlin, 1996.
P. Billingsley, Probability and Measure, Wiley, New York.
P. Gänssler, W. Stute, Wahrscheinlichkeitstheorie, Springer, Berlin, 1977.
A. Klenke, Probability Theory, Springer, Berlin, 2013.

Prerequisites: Stochastic Methods and Measure and Integration Theory.

Contents

I Introduction
II Basic Concepts of Probability Theory
  1 Random Variables and Distributions
  2 Convergence in Probability
  3 Convergence in Distribution
  4 Uniform Integrability
  5 Kernels and Product Measures
  6 Independence
III Limit Theorems
  1 Zero-One Laws
  2 The Strong Law of Large Numbers
  3 The Weak Law of Large Numbers
  4 Characteristic Functions
  5 The One-Dimensional Central Limit Theorem
  6 The Law of the Iterated Logarithm
  7 The Multi-Dimensional Central Limit Theorem
IV Brownian Motion
  1 Stochastic Processes
  2 Donsker's Invariance Principle
  3 The Brownian Motion
V Conditional Expectations and Martingales
  1 The Radon-Nikodym Theorem
  2 Conditional Expectations
  3 Discrete-Time Martingales
  4 Stopping Times and Optional Sampling
  5 Martingale Inequalities and Convergence Theorems

A Measure and Integration
  1 Measure Spaces and Measurable Mappings
  2 Borel Sets
  3 The Lebesgue Measure
  4 Real-Valued Measurable Mappings
  5 Integration
  6 $L^p$-Spaces
  7 Dynkin Classes
  8 Product Spaces
  9 Carathéodory's Theorem
  10 The Factorization Lemma

Literature

Definitions

Chapter I: Introduction

A stochastic model: a probability space $(\Omega, \mathcal{A}, P)$ together with a collection of random variables (measurable mappings) $\Omega \to \mathbb{R}$, say.

Main topics in this course: (i) limit theorems, (ii) conditional probabilities and expectations, (iii) discrete-time martingales, (iv) Brownian motion.

Example 1. Limit theorems like the law of large numbers or the central limit theorem deal with sequences $X_1, X_2, \ldots$ of random variables and their partial sums $S_n = \sum_{i=1}^n X_i$ (physics: position of a particle after $n$ collisions; gambling: cumulative gain after $n$ trials). Under which conditions and in which sense does $n^{-\alpha} S_n$ converge as $n$ tends to infinity?

Example 2. Consider two random variables $X_1$ and $X_2$. If $P(\{X_2 = v\}) > 0$ then the conditional probability of $\{X_1 \in A\}$ given $\{X_2 = v\}$ is defined by
\[ P(\{X_1 \in A\} \mid \{X_2 = v\}) = \frac{P(\{X_1 \in A\} \cap \{X_2 = v\})}{P(\{X_2 = v\})}. \]
How can we reasonably extend this definition to the case $P(\{X_2 = v\}) = 0$, e.g., for $X_2$ being normally distributed? How does the observation $X_2 = v$ change our stochastic model?

Example 3. Martingales may be used to model fair games, and a particular case of a martingale $S_0, S_1, \ldots$ arises in Example 1, if $X_1, X_2, \ldots$ is an independent sequence with zero mean. A gambling strategy is defined by a sequence $H_1, H_2, \ldots$ of random

variables, where $H_n$ may depend in any way on the outcomes of the first $n - 1$ trials. The cumulative gain after $n$ trials is given by the discrete integral
\[ \sum_{k=1}^n H_k \cdot X_k = \sum_{k=1}^n H_k \cdot (S_k - S_{k-1}). \]
Can we tilt the martingale in favor of the gambler by a suitable strategy?

Example 4. The fluctuation of a stock price defines a function on the time interval $[0, \infty[$ with values in $\mathbb{R}$ (for simplicity, we admit negative stock prices at this point). What is a reasonable $\sigma$-algebra on the space $\Omega$ of all mappings $[0, \infty[ \to \mathbb{R}$, or on the subspace of all continuous mappings? How can we define (non-discrete) probability measures on these spaces in order to model the random dynamics of stock prices? Analogous questions arise for random perturbations in physics, biology, etc. More generally, the same questions arise for mappings $I \to S$ with an arbitrary non-empty set $I$ and $S \subseteq \mathbb{R}^d$ (physics: phase transition in ferromagnetic materials, the orientation of magnetic dipoles on a set $I$ of sites; medicine: spread of diseases, certain biometric parameters for a set $I$ of individuals; environmental science: the concentration of certain pollutants in a region $I$).

Example 5. Suppose that $X_1, X_2, \ldots$ is an independent sequence with zero mean and variance one. Rescaling the partial sums $S_n$ in time and space according to
\[ S^{(m)}_{n/m} = \sum_{k=1}^n X_k / \sqrt{m} \]
and using piecewise linear interpolation of $S^{(m)}_0, S^{(m)}_{1/m}, \ldots$ at the knots $0, 1/m, \ldots$, we get a random variable $S^{(m)}$ taking values in the space $C([0, \infty[)$. By the central limit theorem $S^{(m)}_1$ converges to a standard normally distributed random variable as $m$ tends to infinity. The infinite-dimensional counterpart of this result, which deals with probability measures on the space $C([0, \infty[)$, guarantees convergence of $S^{(m)}$ to a Brownian motion. On the latter we quote from Schilling, Partzsch (2014, p. vi): "Brownian motion is arguably the single most important stochastic process."
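A minimal numerical illustration of the rescaling in Example 5 (an added sketch, not part of the original notes; it assumes NumPy and uses fair $\pm 1$ coin flips as the i.i.d. sequence):

```python
import numpy as np

rng = np.random.default_rng(0)

def rescaled_walk(m, T=1.0):
    # Values of S^(m) at the knots 0, 1/m, ..., T for i.i.d. steps
    # with zero mean and variance one (here: fair +/-1 coin flips).
    steps = rng.choice([-1.0, 1.0], size=int(T * m))
    return np.concatenate(([0.0], np.cumsum(steps))) / np.sqrt(m)

# By the central limit theorem, S^(m)_1 is approximately N(0, 1) for large m:
samples = np.array([rescaled_walk(10_000)[-1] for _ in range(2_000)])
print(samples.mean(), samples.var())  # close to 0 and 1, respectively
```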

Chapter II: Basic Concepts of Probability Theory

Context for probability theoretical concepts: a probability space $(\Omega, \mathcal{A}, P)$. Terminology: $A \in \mathcal{A}$ is an event, $P(A)$ is the probability of the event $A \in \mathcal{A}$.

1 Random Variables and Distributions

Given: a probability space $(\Omega, \mathcal{A}, P)$ and a measurable space $(\Omega', \mathcal{A}')$.

Definition 1. $X : \Omega \to \Omega'$ is a random element if $X$ is $\mathcal{A}$-$\mathcal{A}'$-measurable. Particular cases:
(i) $X$ is a (real) random variable if $(\Omega', \mathcal{A}') = (\mathbb{R}, \mathfrak{B})$,
(ii) $X$ is a numerical random variable if $(\Omega', \mathcal{A}') = (\overline{\mathbb{R}}, \overline{\mathfrak{B}})$,
(iii) $X$ is a $k$-dimensional (real) random vector if $(\Omega', \mathcal{A}') = (\mathbb{R}^k, \mathfrak{B}_k)$,
(iv) $X$ is a $k$-dimensional numerical random vector if $(\Omega', \mathcal{A}') = (\overline{\mathbb{R}}^k, \overline{\mathfrak{B}}_k)$.

As is customary, we use the abbreviation
\[ \{X \in A'\} = \{\omega \in \Omega : X(\omega) \in A'\} \]
for any $X : \Omega \to \Omega'$ and $A' \subseteq \Omega'$.

Definition 2.
(i) The distribution (probability law) of a random element $X : \Omega \to \Omega'$ (with respect to $P$) is the image measure $P_X = X(P)$. Notation: $X \sim Q$ if $P_X = Q$.

(ii) Given probability spaces $(\Omega_1, \mathcal{A}_1, P_1)$, $(\Omega_2, \mathcal{A}_2, P_2)$ and random elements $X_1 : \Omega_1 \to \Omega'$, $X_2 : \Omega_2 \to \Omega'$, the random elements $X_1$ and $X_2$ are identically distributed if $(P_1)_{X_1} = (P_2)_{X_2}$.

Definition 3. A property $\Pi$ holds $P$-almost surely ($P$-a.s., a.s., with probability one) if
\[ \exists\, A \in \mathcal{A}: \quad A \subseteq \{\omega \in \Omega : \Pi \text{ holds for } \omega\} \;\wedge\; P(A) = 1. \]

Remark 4.
(i) For random elements $X, Y : \Omega \to \Omega'$,
\[ X = Y \;\; P\text{-a.s.} \;\Rightarrow\; P_X = P_Y, \]
but the converse is not true in general. For instance, let $P$ be the uniform distribution on $\Omega = \{0, 1\}$ and define $X(\omega) = \omega$ and $Y(\omega) = 1 - \omega$.
(ii) For every probability measure $Q$ on $(\Omega', \mathcal{A}')$ there exists a probability space $(\Omega, \mathcal{A}, P)$ and a random element $X : \Omega \to \Omega'$ such that $X \sim Q$. Take $(\Omega, \mathcal{A}, P) = (\Omega', \mathcal{A}', Q)$ and $X = \mathrm{id}_{\Omega'}$. Carathéodory's Theorem is a general tool for the construction of probability measures, see Section A.9.
(iii) A major part of probability theory deals with properties of random elements that can be formulated in terms of their distributions only.

Example 5. A discrete distribution $P_X$ is specified by a countable set $D' \subseteq \Omega'$ and a mapping $p : D' \to \mathbb{R}$ such that
\[ \forall\, \omega' \in D': \; p(\omega') \geq 0 \qquad \wedge \qquad \sum_{\omega' \in D'} p(\omega') = 1, \]
namely,
\[ P_X = \sum_{\omega' \in D'} p(\omega') \cdot \varepsilon_{\omega'}. \]
Here $\varepsilon_{\omega'}$ denotes the Dirac measure at the point $\omega' \in \Omega'$, see Example A.1.3. Hence
\[ P_X(A') = P(\{X \in A'\}) = \sum_{\omega' \in A' \cap D'} p(\omega'), \qquad A' \in \mathcal{A}'. \]
Assume that $\{\omega'\} \in \mathcal{A}'$ for every $\omega' \in D'$. Then $P(\{X = \omega'\}) = p(\omega')$, and $p$, extended by zero to a mapping $\Omega' \to \mathbb{R}$, is the density of $P_X$ w.r.t. the counting measure on $\mathcal{A}'$.

If $|D'| < \infty$ then $p(\omega') = 1/|D'|$ yields the uniform distribution on $D'$. For $(\Omega', \mathcal{A}') = (\mathbb{R}, \mathfrak{B})$,
\[ B(n, p) = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \cdot \varepsilon_k \]
is the binomial distribution with parameters $n \in \mathbb{N}$ and $p \in [0, 1]$. In particular, for $n = 1$ we get the Bernoulli distribution
\[ B(1, p) = (1-p) \cdot \varepsilon_0 + p \cdot \varepsilon_1. \]
Further examples include the geometric distribution $G(p)$ with parameter $p \in \,]0, 1]$,
\[ G(p) = p \cdot \sum_{k=1}^{\infty} (1-p)^{k-1} \cdot \varepsilon_k, \]
and the Poisson distribution $\pi(\lambda)$ with parameter $\lambda > 0$,
\[ \pi(\lambda) = \sum_{k=0}^{\infty} \exp(-\lambda) \, \frac{\lambda^k}{k!} \cdot \varepsilon_k. \]

Example 6. Consider a distribution on $(\mathbb{R}^k, \mathfrak{B}_k)$ that is defined in terms of a probability density $f : \mathbb{R}^k \to [0, \infty[$ w.r.t. the Lebesgue measure $\lambda_k$. In this case
\[ P_X(A') = P(\{X \in A'\}) = \int_{A'} f \, d\lambda_k, \qquad A' \in \mathfrak{B}_k. \]
We present some examples in the case $k = 1$. The normal distribution $N(\mu, \sigma^2) = f \cdot \lambda_1$ with parameters $\mu \in \mathbb{R}$ and $\sigma^2$, where $\sigma > 0$, is obtained by
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{1}{2} \, \frac{(x-\mu)^2}{\sigma^2} \Big), \qquad x \in \mathbb{R}. \]
The exponential distribution with parameter $\lambda > 0$ is obtained by
\[ f(x) = \begin{cases} 0 & \text{if } x < 0, \\ \lambda \exp(-\lambda x) & \text{if } x \geq 0. \end{cases} \]
The uniform distribution on $D \in \mathfrak{B}$ with $\lambda_1(D) \in \,]0, \infty[$ is obtained by $f = \frac{1}{\lambda_1(D)} \cdot 1_D$.
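A quick sanity check of the binomial weights above (an added illustration, assuming NumPy; the helper name binomial_pmf is ours):

```python
import numpy as np
from math import comb

def binomial_pmf(n, p):
    # Weights of B(n, p) on {0, ..., n}.
    return np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

w = binomial_pmf(10, 0.3)
print(w.sum())                    # 1.0: the weights form a probability measure
print((np.arange(11) * w).sum())  # 3.0 = n*p, cf. Example 14 below
```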

Distribution Functions

Definition 7. Let $X = (X_1, \ldots, X_k) : \Omega \to \mathbb{R}^k$ be a random vector. Then
\[ F_X : \mathbb{R}^k \to [0, 1], \qquad (x_1, \ldots, x_k) \mapsto P_X\Big( \prod_{i=1}^k \,]-\infty, x_i] \Big) = P\Big( \bigcap_{i=1}^k \{X_i \leq x_i\} \Big) \]
is called the distribution function of $X$.

Theorem 8. Given probability spaces $(\Omega_1, \mathcal{A}_1, P_1)$, $(\Omega_2, \mathcal{A}_2, P_2)$ and random vectors $X_1 : \Omega_1 \to \mathbb{R}^k$, $X_2 : \Omega_2 \to \mathbb{R}^k$, we have
\[ (P_1)_{X_1} = (P_2)_{X_2} \;\Leftrightarrow\; F_{X_1} = F_{X_2}. \]

Proof. "$\Rightarrow$" holds trivially. "$\Leftarrow$": By Theorem A.2.4, $\mathfrak{B}_k = \sigma(\mathcal{E})$ for
\[ \mathcal{E} = \Big\{ \prod_{i=1}^k \,]-\infty, x_i] : x_1, \ldots, x_k \in \mathbb{R} \Big\}, \]
and $\mathcal{E}$ is closed w.r.t. intersections. Use Theorem A.1.5.

For notational convenience, we consider the case $k = 1$ in the sequel. We refer to Elstrodt (2011, Satz II.5.10) or Übung 3.4 in Maß- und Integrationstheorie (2016) for the following two facts.

Theorem 9. (i) $F_X$ is non-decreasing, (ii) $F_X$ is right-continuous, (iii) $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$, (iv) $F_X$ is continuous at $x$ iff $P(\{X = x\}) = 0$.

Theorem 10. For every function $F$ that satisfies (i)-(iii) from Theorem 9,
\[ \exists_1\, Q \text{ probability measure on } \mathfrak{B} \;\; \forall\, x \in \mathbb{R}: \; Q(]-\infty, x]) = F(x). \]

Expectation and Variance

Remark 11. Define $\infty^r = \infty$ for $r > 0$. For $1 \leq p < q < \infty$ and $X \in \overline{Z}(\Omega, \mathcal{A})$,
\[ \int |X|^p \, dP \leq \Big( \int |X|^q \, dP \Big)^{p/q}, \]
due to Hölder's inequality, see Theorem A.6.2.

Notation:
\[ L = L(\Omega, \mathcal{A}, P) = \Big\{ X \in Z(\Omega, \mathcal{A}) : \int |X| \, dP < \infty \Big\} \]
is the class of $P$-integrable random variables, and analogously
\[ \overline{L} = \overline{L}(\Omega, \mathcal{A}, P) = \Big\{ X \in \overline{Z}(\Omega, \mathcal{A}) : \int |X| \, dP < \infty \Big\} \]
is the class of $P$-integrable numerical random variables. We consider $P_X$ as a distribution on $(\mathbb{R}, \mathfrak{B})$ if $P(\{X \in \mathbb{R}\}) = 1$ for a numerical random variable $X$, and we consider $L$ as a subspace of $\overline{L}$.

Definition 12. For $X \in \overline{L}$ or $X \in \overline{Z}_+$,
\[ \mathrm{E}(X) = \int X \, dP \]
is the expectation of $X$. For $X \in \overline{Z}(\Omega, \mathcal{A})$ such that $X^2 \in \overline{L}$,
\[ \mathrm{Var}(X) = \int (X - \mathrm{E}(X))^2 \, dP \]
and $\sqrt{\mathrm{Var}(X)}$ are the variance and the standard deviation of $X$, respectively.

Remark 13. The Transformation Theorem A.5.12 implies
\[ \int_\Omega |X|^p \, dP < \infty \;\Leftrightarrow\; \int_{\mathbb{R}} |x|^p \, P_X(dx) < \infty \]
for $X \in Z(\Omega, \mathcal{A})$, in which case, for $p = 1$,
\[ \mathrm{E}(X) = \int_{\mathbb{R}} x \, P_X(dx), \]
and for $p = 2$,
\[ \mathrm{Var}(X) = \int_{\mathbb{R}} (x - \mathrm{E}(X))^2 \, P_X(dx). \]
Thus $\mathrm{E}(X)$ and $\mathrm{Var}(X)$ depend only on $P_X$.

Example 14.
\[ X \sim B(n, p) \;\Rightarrow\; \mathrm{E}(X) = np, \quad \mathrm{Var}(X) = np(1-p); \]
\[ X \sim G(p) \;\Rightarrow\; \mathrm{E}(X) = \frac{1}{p}, \quad \mathrm{Var}(X) = \frac{1-p}{p^2}; \]
\[ X \sim \pi(\lambda) \;\Rightarrow\; \mathrm{E}(X) = \lambda, \quad \mathrm{Var}(X) = \lambda; \]
see the lecture on Stochastische Methoden. $X$ is Cauchy distributed with parameter $\alpha > 0$ if $X \sim f \cdot \lambda_1$ where
\[ f(x) = \frac{\alpha}{\pi (\alpha^2 + x^2)}, \qquad x \in \mathbb{R}. \]

Since
\[ \int_0^t \frac{x}{\alpha^2 + x^2} \, dx = \frac{1}{2} \log(\alpha^2 + t^2) - \frac{1}{2} \log(\alpha^2) \to \infty \qquad (t \to \infty), \]
neither $\mathrm{E}(X^+) < \infty$ nor $\mathrm{E}(X^-) < \infty$, and therefore $X \notin \overline{L}$. If $X \sim N(\mu, \sigma^2)$ then
\[ \mathrm{E}(X) = \mu, \qquad \mathrm{Var}(X) = \sigma^2. \]
If $X$ is exponentially distributed with parameter $\lambda > 0$ then
\[ \mathrm{E}(X) = \frac{1}{\lambda}, \qquad \mathrm{Var}(X) = \frac{1}{\lambda^2}. \]
See the lecture on Stochastische Methoden.

2 Convergence in Probability

Motivated by the Examples A.5.11 and A.6.8 we introduce a notion of convergence that is weaker than convergence in mean and convergence almost surely. In the sequel, $X$, $X_n$, etc. are random variables on a common probability space $(\Omega, \mathcal{A}, P)$.

Definition 1. $(X_n)_n$ converges to $X$ in probability if
\[ \forall\, \varepsilon > 0: \; \lim_{n \to \infty} P(\{|X_n - X| > \varepsilon\}) = 0. \]
Notation: $X_n \xrightarrow{P} X$.

Theorem 2 (Chebyshev-Markov Inequality). Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $f \in \overline{Z}(\Omega, \mathcal{A})$. For every $u > 0$ and every $1 \leq p < \infty$,
\[ \mu(\{|f| \geq u\}) \leq \frac{1}{u^p} \int |f|^p \, d\mu. \]

Proof. We have
\[ u^p \cdot \mu(\{|f| \geq u\}) = \int_{\{|f| \geq u\}} u^p \, d\mu \leq \int_\Omega |f|^p \, d\mu. \]

Corollary 3. If $\mathrm{E}(X^2) < \infty$ and $\varepsilon > 0$, then
\[ P(\{|X - \mathrm{E}(X)| \geq \varepsilon\}) \leq \frac{1}{\varepsilon^2} \, \mathrm{Var}(X). \]

Theorem 4.
\[ d(X, Y) = \int \min(1, |X - Y|) \, dP \]
defines a semi-metric on $Z(\Omega, \mathcal{A})$, and
\[ X_n \xrightarrow{P} X \;\Leftrightarrow\; \lim_{n \to \infty} d(X_n, X) = 0. \]

Proof. "$\Leftarrow$": For $\varepsilon > 0$,
\[ \int \min(1, |X_n - X|) \, dP = \int_{\{|X_n - X| > \varepsilon\}} \min(1, |X_n - X|) \, dP + \int_{\{|X_n - X| \leq \varepsilon\}} \min(1, |X_n - X|) \, dP \leq P(\{|X_n - X| > \varepsilon\}) + \min(1, \varepsilon). \]
"$\Rightarrow$": Let $0 < \varepsilon < 1$. Use Theorem 2 to obtain
\[ P(\{|X_n - X| > \varepsilon\}) = P(\{\min(1, |X_n - X|) > \varepsilon\}) \leq \frac{1}{\varepsilon} \int \min(1, |X_n - X|) \, dP = \frac{1}{\varepsilon} \, d(X_n, X). \]

Remark 5. By Theorems 4 and A.6.11,
\[ X_n \xrightarrow{L^p} X \;\Rightarrow\; X_n \xrightarrow{P} X. \]
Example A.5.11 shows that the converse does not hold in general.

Remark 6. By Theorems 4 and A.5.10,
\[ X_n \to X \;\; P\text{-a.s.} \;\Rightarrow\; X_n \xrightarrow{P} X. \]
Example A.6.8 shows that the converse does not hold in general.

The Law of Large Numbers deals with convergence almost surely or convergence in probability, see the introductory Example I.1 and Sections III.2 and III.3.

Subsequence Criteria

Corollary 7.
\[ X_n \xrightarrow{P} X \;\Rightarrow\; \exists \text{ subsequence } (X_{n_k})_{k \in \mathbb{N}}: \; X_{n_k} \to X \;\; P\text{-a.s.} \]

Proof. Due to Theorems 4 and A.6.7 there exists a subsequence $(X_{n_k})_{k \in \mathbb{N}}$ such that $\min(1, |X_{n_k} - X|) \to 0$ $P$-a.s.

Remark 8. In any semi-metric space $(M, d)$ a sequence $(a_n)_{n \in \mathbb{N}}$ converges to $a$ iff
\[ \forall \text{ subsequence } (a_{n_k})_{k \in \mathbb{N}} \;\exists \text{ subsequence } (a_{n_{k_l}})_{l \in \mathbb{N}}: \; \lim_{l \to \infty} d(a_{n_{k_l}}, a) = 0. \]

Corollary 9. $X_n \xrightarrow{P} X$ iff
\[ \forall \text{ subsequence } (X_{n_k})_{k \in \mathbb{N}} \;\exists \text{ subsequence } (X_{n_{k_l}})_{l \in \mathbb{N}}: \; X_{n_{k_l}} \to X \;\; P\text{-a.s.} \]

Proof. "$\Rightarrow$": Corollary 7. "$\Leftarrow$": Remarks 6 and 8 together with Theorem 4.
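A numerical illustration of the subsequence criterion (an added sketch, assuming NumPy): the classical "typewriter" sequence of indicators of sliding dyadic intervals converges to 0 in probability but not almost surely, while the subsequence along powers of two converges a.s.

```python
import numpy as np

rng = np.random.default_rng(1)
omega = rng.uniform(size=100_000)  # sample points from ([0,1], Lebesgue)

def X(n, w):
    # n = 2**k + j with 0 <= j < 2**k; X_n = indicator of [j/2**k, (j+1)/2**k]
    k = int(np.floor(np.log2(n)))
    j = n - 2**k
    return ((j / 2**k <= w) & (w <= (j + 1) / 2**k)).astype(float)

# P(|X_n| > eps) = 2**(-k) -> 0, so X_n -> 0 in probability ...
for n in [10, 100, 1000, 10_000]:
    print(n, X(n, omega).mean())
# ... but every fixed w is covered once per level k, so X_n(w) = 1 infinitely
# often; along the subsequence n_k = 2**k, however, X(2**k, w) -> 0 for every
# fixed w > 0, i.e. almost surely.
```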

Remark 10. We conclude that, in general, there is no semi-metric on $Z(\Omega, \mathcal{A})$ that induces a.s.-convergence. However, if $\Omega$ is countable, then
\[ X_n \to X \;\; P\text{-a.s.} \;\Leftrightarrow\; X_n \xrightarrow{P} X. \]
Proof: Übung 2.3.

Lemma 11. Let $\to$ denote convergence almost everywhere or convergence in probability. If $X_n^{(i)} \to X^{(i)}$ for $i = 1, \ldots, m$ and $f : \mathbb{R}^m \to \mathbb{R}$ is continuous, then
\[ f(X_n^{(1)}, \ldots, X_n^{(m)}) \to f(X^{(1)}, \ldots, X^{(m)}). \]

Proof. Trivial for convergence almost everywhere, and by Corollary 9 the conclusion holds for convergence in probability, too.

Corollary 12. Let $X_n \xrightarrow{P} X$. Then
\[ X_n \xrightarrow{P} Y \;\Rightarrow\; X = Y \;\; P\text{-a.s.} \]

Proof. Corollary 9 and Lemma A.6.6.

3 Convergence in Distribution

Given: a metric space $(M, \rho)$. Let $\mathcal{M}(M)$ denote the set of all probability measures on the Borel $\sigma$-algebra $\mathfrak{B}(M)$ in $M$. Moreover, let
\[ C_b(M) = \{f : M \to \mathbb{R} : f \text{ bounded, continuous}\}. \]

Definition 1.
(i) A sequence $(Q_n)_{n \in \mathbb{N}}$ in $\mathcal{M}(M)$ converges weakly to $Q \in \mathcal{M}(M)$ if
\[ \forall\, f \in C_b(M): \; \lim_{n \to \infty} \int f \, dQ_n = \int f \, dQ. \]
Notation: $Q_n \xrightarrow{w} Q$.
(ii) A sequence $(X_n)_{n \in \mathbb{N}}$ of random elements with values in $M$ converges in distribution to a random element $X$ with values in $M$ if $Q_n \xrightarrow{w} Q$ for the distributions $Q_n$ of $X_n$ and $Q$ of $X$, respectively. Notation: $X_n \xrightarrow{d} X$.

Remark 2. For convergence in distribution the random elements need not be defined on a common probability space. In the sequel: $Q_n, Q \in \mathcal{M}(M)$ for $n \in \mathbb{N}$.

Example 3.
(i) For $x_n, x \in M$,
\[ \varepsilon_{x_n} \xrightarrow{w} \varepsilon_x \;\Leftrightarrow\; \lim_{n \to \infty} \rho(x_n, x) = 0. \]
For the proof of "$\Leftarrow$", note that $\int f \, d\varepsilon_{x_n} = f(x_n)$ and $\int f \, d\varepsilon_x = f(x)$. For the proof of "$\Rightarrow$", suppose that $\limsup_{n} \rho(x_n, x) > 0$. Take $f(y) = \min(\rho(y, x), 1)$, $y \in M$, and observe that $f \in C_b(M)$ and
\[ \limsup_{n \to \infty} \int f \, d\varepsilon_{x_n} = \limsup_{n \to \infty} \min(\rho(x_n, x), 1) > 0, \]
while $\int f \, d\varepsilon_x = 0$.
(ii) For the Euclidean distance $\rho$ on $M = \mathbb{R}^k$ we have $(M, \mathfrak{B}(M)) = (\mathbb{R}^k, \mathfrak{B}_k)$. Now, in particular, let $k = 1$ and $Q_n = N(\mu_n, \sigma_n^2)$ where $\sigma_n > 0$. For $f \in C_b(\mathbb{R})$,
\[ \int f \, dQ_n = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(\sigma_n x + \mu_n) \exp(-x^2/2) \, \lambda_1(dx). \]
Put $N(\mu, 0) = \varepsilon_\mu$. Then
\[ \lim_{n \to \infty} \mu_n = \mu \;\wedge\; \lim_{n \to \infty} \sigma_n = \sigma \;\Rightarrow\; Q_n \xrightarrow{w} N(\mu, \sigma^2). \]
Otherwise $(Q_n)_{n \in \mathbb{N}}$ does not converge weakly. Übung 4.1.
(iii) For $M = C([0, T])$ let $\rho(x, y) = \sup_{t \in [0,T]} |x(t) - y(t)|$. Cf. the introductory Examples I.4 and I.5.

Remark 4. Note that $Q_n \xrightarrow{w} Q$ does not imply
\[ \forall\, A \in \mathfrak{B}(M): \; \lim_{n \to \infty} Q_n(A) = Q(A). \]
For instance, assume $\lim_{n} \rho(x_n, x) = 0$ with $x_n \neq x$ for every $n \in \mathbb{N}$. Then $\varepsilon_{x_n}(\{x\}) = 0$ but $\varepsilon_x(\{x\}) = 1$.

Theorem 5 (Portemanteau Theorem). The following properties are equivalent:
(i) $Q_n \xrightarrow{w} Q$,
(ii) $\forall\, f \in C_b(M)$ uniformly continuous: $\lim_{n \to \infty} \int f \, dQ_n = \int f \, dQ$,

(iii) $\forall\, A \subseteq M$ closed: $\limsup_{n \to \infty} Q_n(A) \leq Q(A)$,
(iv) $\forall\, A \subseteq M$ open: $\liminf_{n \to \infty} Q_n(A) \geq Q(A)$,
(v) $\forall\, A \in \mathfrak{B}(M)$: $Q(\partial A) = 0 \Rightarrow \lim_{n \to \infty} Q_n(A) = Q(A)$.

Proof. At first we show that (ii) $\Rightarrow$ (iii) $\Rightarrow$ (i); since (i) $\Rightarrow$ (ii) holds trivially and (iii) $\Leftrightarrow$ (iv) by taking complements, this yields the equivalence of (i)-(iv).

"(ii) $\Rightarrow$ (iii)": Let $A \subseteq M$ be closed, and put $A_m = \{x \in M : \mathrm{dist}(x, A) < 1/m\}$. Note that $A_m \downarrow A$. Let $\varepsilon > 0$, and take $m \in \mathbb{N}$ such that $Q(A_m) < Q(A) + \varepsilon$. Define $\varphi : \mathbb{R} \to \mathbb{R}$ by
\[ \varphi(z) = \begin{cases} 1 & \text{if } z \leq 0, \\ 1 - z & \text{if } 0 < z < 1, \\ 0 & \text{otherwise}, \end{cases} \]
and $f : M \to \mathbb{R}$ by $f(x) = \varphi(m \cdot \mathrm{dist}(x, A))$. Then $f \in C_b(M)$ is uniformly continuous, and $0 \leq f \leq 1$ with $f|_A = 1$ and $f|_{A_m^c} = 0$. Hence
\[ Q_n(A) \leq \int f \, dQ_n \]
and
\[ \limsup_{n \to \infty} Q_n(A) \leq \int f \, dQ \leq Q(A_m) \leq Q(A) + \varepsilon. \]

"(iii) $\Rightarrow$ (i)": Let $f \in C_b(M)$. Assume that $0 < f < 1$ without loss of generality. For every $m \in \mathbb{N}$ and every $P \in \mathcal{M}(M)$,
\[ \int f \, dP \leq \sum_{k=1}^m \frac{k}{m} \, P(\{(k-1)/m \leq f < k/m\}) = \frac{1}{m} \sum_{k=1}^m P(\{f \geq (k-1)/m\}) \]
and
\[ \sum_{k=1}^m \frac{k}{m} \, P(\{(k-1)/m \leq f < k/m\}) = \sum_{k=1}^m \frac{k-1}{m} \, P(\{(k-1)/m \leq f < k/m\}) + \frac{1}{m} \leq \int f \, dP + \frac{1}{m}. \]
Therefore, since the sets $\{f \geq (k-1)/m\}$ are closed,
\[ \limsup_{n \to \infty} \int f \, dQ_n \leq \frac{1}{m} \sum_{k=1}^m Q(\{f \geq (k-1)/m\}) \leq \int f \, dQ + \frac{1}{m}, \]
which implies $\limsup_{n} \int f \, dQ_n \leq \int f \, dQ$. In fact, the latter holds true for any bounded and upper semi-continuous mapping $f : M \to \mathbb{R}$. For $f$ being continuous consider $1 - f$, too, to obtain $\lim_{n} \int f \, dQ_n = \int f \, dQ$.

"(i) $\Rightarrow$ (v)": Let $A \in \mathfrak{B}(M)$ with $Q(\partial A) = 0$, and put $f = 1_A$ as well as
\[ f_* = \sup\{g : g \text{ lower semi-continuous}, \; g \leq f\}, \qquad f^* = \inf\{h : h \text{ upper semi-continuous}, \; h \geq f\}. \]

Then $f_*$ is lower semi-continuous, $f^*$ is upper semi-continuous, and $f_* \leq f \leq f^*$ with equality $Q$-a.s., since $Q(\partial A) = 0$. From the proof of "(iii) $\Rightarrow$ (i)" we get
\[ \int f \, dQ = \int f_* \, dQ \leq \liminf_{n \to \infty} \int f \, dQ_n \leq \limsup_{n \to \infty} \int f \, dQ_n \leq \int f^* \, dQ = \int f \, dQ. \]
Hence $\lim_{n} \int f \, dQ_n = \int f \, dQ$.

"(v) $\Rightarrow$ (i)": Übung 3.4.

Lemma 6. Consider metric spaces $M$, $M'$ and a continuous mapping $\pi : M \to M'$. Then
\[ Q_n \xrightarrow{w} Q \;\Rightarrow\; \pi(Q_n) \xrightarrow{w} \pi(Q). \]

Proof. Note that $f \in C_b(M')$ implies $f \circ \pi \in C_b(M)$. Employ Theorem A.5.12 and the definition of weak convergence.

Weak Convergence of Measures on $(\mathbb{R}, \mathfrak{B})$

In the sequel, we study the particular case $(M, \mathfrak{B}(M)) = (\mathbb{R}, \mathfrak{B})$, i.e., convergence in distribution for random variables. The Central Limit Theorem deals with this notion of convergence, see the introductory Example I.1 and Section III.5.

Notation: for any $Q \in \mathcal{M}(\mathbb{R})$ and for any function $F : \mathbb{R} \to \mathbb{R}$,
\[ F_Q(x) = Q(]-\infty, x]), \quad x \in \mathbb{R}, \qquad \mathrm{Cont}(F) = \{x \in \mathbb{R} : F \text{ continuous at } x\}. \]

Theorem 7.
\[ Q_n \xrightarrow{w} Q \;\Leftrightarrow\; \forall\, x \in \mathrm{Cont}(F_Q): \; \lim_{n \to \infty} F_{Q_n}(x) = F_Q(x). \]
Moreover, if $Q_n \xrightarrow{w} Q$ and $\mathrm{Cont}(F_Q) = \mathbb{R}$ then
\[ \lim_{n \to \infty} \sup_{x \in \mathbb{R}} |F_{Q_n}(x) - F_Q(x)| = 0. \]

Proof. "$\Rightarrow$": If $x \in \mathrm{Cont}(F_Q)$ and $A = ]-\infty, x]$ then $Q(\partial A) = Q(\{x\}) = 0$, see Theorem 1.9. Hence Theorem 5 implies
\[ \lim_{n \to \infty} F_{Q_n}(x) = \lim_{n \to \infty} Q_n(A) = Q(A) = F_Q(x). \]
"$\Leftarrow$": Consider a non-empty open set $A \subseteq \mathbb{R}$. Take pairwise disjoint open intervals $A_1, A_2, \ldots$ such that $A = \bigcup_{i} A_i$. Fatou's Lemma implies
\[ \liminf_{n \to \infty} Q_n(A) = \liminf_{n \to \infty} \sum_{i} Q_n(A_i) \geq \sum_{i} \liminf_{n \to \infty} Q_n(A_i). \]

Note that $\mathbb{R} \setminus \mathrm{Cont}(F_Q)$ is countable. Fix $\varepsilon > 0$, and take
\[ A_i' = ]a_i, b_i] \subseteq A_i \]
for $i \in \mathbb{N}$ such that
\[ a_i, b_i \in \mathrm{Cont}(F_Q) \qquad \wedge \qquad Q(A_i) \leq Q(A_i') + \varepsilon \cdot 2^{-i}. \]
Then
\[ \liminf_{n \to \infty} Q_n(A_i) \geq \liminf_{n \to \infty} Q_n(A_i') = Q(A_i') \geq Q(A_i) - \varepsilon \cdot 2^{-i}. \]
We conclude that
\[ \liminf_{n \to \infty} Q_n(A) \geq Q(A) - \varepsilon, \]
and therefore $Q_n \xrightarrow{w} Q$ by Theorem 5. Uniform convergence: see Übung 1.3.

Corollary 8.
\[ Q_n \xrightarrow{w} Q \;\wedge\; Q_n \xrightarrow{w} Q' \;\Rightarrow\; Q = Q'. \]

Proof. By Theorem 7, $F_Q(x) = F_{Q'}(x)$ if $x \in D = \mathrm{Cont}(F_Q) \cap \mathrm{Cont}(F_{Q'})$. Since $D$ is dense in $\mathbb{R}$ and $F_Q$ as well as $F_{Q'}$ are right-continuous, we get $F_Q = F_{Q'}$. Apply Theorem 1.8.

Given: random variables $X_n$, $X$ on $(\Omega, \mathcal{A}, P)$ for $n \in \mathbb{N}$.

Theorem 9.
\[ X_n \xrightarrow{P} X \;\Rightarrow\; X_n \xrightarrow{d} X \]
and
\[ X_n \xrightarrow{d} X \;\wedge\; X \text{ constant a.s.} \;\Rightarrow\; X_n \xrightarrow{P} X. \]

Proof. Assume $X_n \xrightarrow{P} X$. For $\varepsilon > 0$ and $x \in \mathbb{R}$,
\[ P(\{X \leq x - \varepsilon\}) - P(\{|X - X_n| > \varepsilon\}) \leq P(\{X \leq x - \varepsilon\} \cap \{|X - X_n| \leq \varepsilon\}) \leq P(\{X_n \leq x\}) \]
and
\[ P(\{X_n \leq x\}) = P(\{X_n \leq x\} \cap \{X \leq x + \varepsilon\}) + P(\{X_n \leq x\} \cap \{X > x + \varepsilon\}) \leq P(\{X \leq x + \varepsilon\}) + P(\{|X - X_n| > \varepsilon\}). \]
Thus
\[ F_X(x - \varepsilon) \leq \liminf_{n \to \infty} F_{X_n}(x) \leq \limsup_{n \to \infty} F_{X_n}(x) \leq F_X(x + \varepsilon). \]
For $x \in \mathrm{Cont}(F_X)$ we get $\lim_{n} F_{X_n}(x) = F_X(x)$. Apply Theorem 7.

Now, assume that $X_n \xrightarrow{d} X$ and $P_X = \varepsilon_x$. Let $\varepsilon > 0$ and take $f \in C_b(\mathbb{R})$ such that $f \geq 0$, $f(x) = 0$, and $f(y) = 1$ if $|x - y| \geq \varepsilon$. Then
\[ P(\{|X - X_n| > \varepsilon\}) = P(\{|x - X_n| > \varepsilon\}) = \int 1_{\mathbb{R} \setminus [x-\varepsilon, x+\varepsilon]} \, dP_{X_n} \leq \int f \, dP_{X_n} \]
and $\lim_{n} \int f \, dP_{X_n} = \int f \, dP_X = 0$.

Example 10. Consider the uniform distribution $P$ on $\Omega = \{0, 1\}$. Put
\[ X_n(\omega) = \omega, \qquad X(\omega) = 1 - \omega. \]
Then $P_{X_n} = P_X$ and therefore $X_n \xrightarrow{d} X$. However, $\{|X_n - X| < 1\} = \emptyset$, and therefore $X_n \xrightarrow{P} X$ does not hold.

Theorem 11 (Skorohod). There exists a probability space $(\Omega, \mathcal{A}, P)$ with the following property. If $Q_n \xrightarrow{w} Q$, then there exist $X_n, X \in Z(\Omega, \mathcal{A})$ for $n \in \mathbb{N}$ such that
\[ \forall\, n \in \mathbb{N}: \; Q_n = P_{X_n}, \qquad Q = P_X, \qquad X_n \to X \;\; P\text{-a.s.} \]

Proof. Take $\Omega = ]0, 1[$, $\mathcal{A} = \mathfrak{B}(\Omega)$, and consider the uniform distribution $P$ on $\Omega$. Define
\[ X_Q(\omega) = \inf\{z \in \mathbb{R} : \omega \leq F_Q(z)\}, \qquad \omega \in \,]0, 1[, \]
for any $Q \in \mathcal{M}(\mathbb{R})$. Since $X_Q$ is non-decreasing, we have $X_Q \in Z(\Omega, \mathcal{A})$. Furthermore,
\[ P_{X_Q} = Q, \tag{1} \]
see Übung 2.4.

Assuming $Q_n \xrightarrow{w} Q$, we define $X_n = X_{Q_n}$ and $X = X_Q$. Since $X$ is non-decreasing, we conclude that $\Omega \setminus \mathrm{Cont}(X)$ is countable. Thus it suffices to show
\[ \forall\, \omega \in \mathrm{Cont}(X): \; \lim_{n \to \infty} X_n(\omega) = X(\omega). \]
Let $\omega \in \mathrm{Cont}(X)$ and $\varepsilon > 0$. Put $x = X(\omega)$ and take $x_i \in \mathrm{Cont}(F_Q)$ such that
\[ x - \varepsilon < x_1 < x < x_2 < x + \varepsilon. \]
Hence
\[ F_Q(x_1) < \omega < F_Q(x_2). \]
By assumption there exists $n_0 \in \mathbb{N}$ such that $F_{Q_n}(x_1) < \omega < F_{Q_n}(x_2)$ for $n \geq n_0$. Hence $X_n(\omega) \in \,]x_1, x_2]$, i.e. $|X_n(\omega) - X(\omega)| < \varepsilon$.

Remark 12. By (1) we have a general method to transform uniformly distributed random numbers from $]0, 1[$ into random numbers with distribution $Q$.

Remark 13.
(i) Put
\[ C^{(r)} = \{f : \mathbb{R} \to \mathbb{R} : f, f^{(1)}, \ldots, f^{(r)} \text{ bounded}, \; f^{(r)} \text{ uniformly continuous}\}. \]
Then, for every $r \in \mathbb{N}_0$,
\[ Q_n \xrightarrow{w} Q \;\Leftrightarrow\; \forall\, f \in C^{(r)}: \; \lim_{n \to \infty} \int f \, dQ_n = \int f \, dQ, \]
see Gänssler, Stute (1977, p. 66).
(ii) The Lévy distance
\[ d(Q, R) = \inf\{h \in \,]0, \infty[ \,:\, \forall\, x \in \mathbb{R}: \; F_Q(x - h) - h \leq F_R(x) \leq F_Q(x + h) + h\} \]
defines a metric on $\mathcal{M}(\mathbb{R})$, and
\[ Q_n \xrightarrow{w} Q \;\Leftrightarrow\; \lim_{n \to \infty} d(Q_n, Q) = 0, \]
see Chow, Teicher (1978).
(iii) Suppose that $(M, \rho)$ is a complete separable metric space. Then there exists a metric $d$ on $\mathcal{M}(M)$ such that $(\mathcal{M}(M), d)$ is complete and separable as well, and
\[ Q_n \xrightarrow{w} Q \;\Leftrightarrow\; \lim_{n \to \infty} d(Q_n, Q) = 0, \]
see Parthasarathy (1967, Sec. II.6) or Klenke (2013, Abschn. 13.2) and cf. Übung 3.3.

Compactness

Finally, we present a compactness criterion, which is very useful for the construction of probability measures on $\mathfrak{B}(M)$.

Lemma 14. Let $x_{n,l} \in \mathbb{R}$ for $n, l \in \mathbb{N}$ with
\[ \forall\, l \in \mathbb{N}: \; \sup_{n \in \mathbb{N}} |x_{n,l}| < \infty. \]
Then there exists an increasing sequence $(n_i)_{i \in \mathbb{N}}$ in $\mathbb{N}$ such that
\[ \forall\, l \in \mathbb{N}: \; (x_{n_i,l})_{i \in \mathbb{N}} \text{ converges}. \]

Proof. See Billingsley (1979).
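The quantile transform from the proof of Skorohod's Theorem, used as in Remark 12 (a minimal added sketch, assuming NumPy; the exponential distribution serves as an example where $X_Q$ has a closed form):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_via_quantile_transform(F_inverse, size):
    # Remark 12: if U is uniform on ]0,1[, then X_Q(U) ~ Q.
    u = rng.uniform(size=size)
    return F_inverse(u)

# Exponential distribution with parameter lam: F(x) = 1 - exp(-lam*x) for
# x >= 0, hence X_Q(u) = -log(1 - u) / lam.
lam = 2.0
x = sample_via_quantile_transform(lambda u: -np.log(1.0 - u) / lam, 100_000)
print(x.mean(), x.var())  # approximately 1/lam = 0.5 and 1/lam**2 = 0.25
```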

Definition 15.
(i) $\mathcal{P} \subseteq \mathcal{M}(M)$ is tight if
\[ \forall\, \varepsilon > 0 \;\exists\, K \subseteq M \text{ compact} \;\forall\, P \in \mathcal{P}: \; P(K) \geq 1 - \varepsilon. \]
(ii) $\mathcal{P} \subseteq \mathcal{M}(M)$ is relatively compact if every sequence in $\mathcal{P}$ contains a subsequence that converges weakly.

Theorem 16 (Prohorov). Assume that $M$ is a complete separable metric space and $\mathcal{P} \subseteq \mathcal{M}(M)$. Then
\[ \mathcal{P} \text{ relatively compact} \;\Leftrightarrow\; \mathcal{P} \text{ tight}. \]

Proof. For the general case see Parthasarathy (1967, Thm. II.6.7). Here: $M = \mathbb{R}$.

"$\Rightarrow$": Suppose that $\mathcal{P}$ is not tight. Then, for some $\varepsilon > 0$, there exists a sequence $(P_n)_{n \in \mathbb{N}}$ in $\mathcal{P}$ such that $P_n([-n, n]) < 1 - \varepsilon$. For a suitable subsequence, $P_{n_k} \xrightarrow{w} P \in \mathcal{M}(\mathbb{R})$. Take $m > 0$ such that $P(]-m, m[) > 1 - \varepsilon$. Theorem 5 implies
\[ P(]-m, m[) \leq \liminf_{k \to \infty} P_{n_k}(]-m, m[) \leq \liminf_{k \to \infty} P_{n_k}([-n_k, n_k]) \leq 1 - \varepsilon, \]
which is a contradiction.

"$\Leftarrow$": Consider any sequence $(P_n)_{n \in \mathbb{N}}$ in $\mathcal{P}$ and the corresponding sequence $(F_n)_{n \in \mathbb{N}}$ of distribution functions. Use Lemma 14 to obtain a subsequence $(F_{n_i})_{i \in \mathbb{N}}$ and a non-decreasing function $G : \mathbb{Q} \to [0, 1]$ with
\[ \forall\, q \in \mathbb{Q}: \; \lim_{i \to \infty} F_{n_i}(q) = G(q). \]
Put
\[ F(x) = \inf\{G(q) : q \in \mathbb{Q} \wedge x < q\}, \qquad x \in \mathbb{R}. \]
Claim (Helly's Theorem):
(i) $F$ is non-decreasing and right-continuous,
(ii) $\forall\, x \in \mathrm{Cont}(F)$: $\lim_{i \to \infty} F_{n_i}(x) = F(x)$.

Proof: Ad (i): Obviously $F$ is non-decreasing. For $x \in \mathbb{R}$ and $\varepsilon > 0$ take $\delta_2 > 0$ such that
\[ \forall\, q \in \mathbb{Q} \,\cap\, ]x, x + \delta_2[ \,: \; G(q) \leq F(x) + \varepsilon. \]
Thus, for $z \in \,]x, x + \delta_2[$,
\[ F(x) \leq F(z) \leq F(x) + \varepsilon. \]

Ad (ii): If $x \in \mathrm{Cont}(F)$ and $\varepsilon > 0$, take $\delta_1 > 0$ such that
\[ F(x) - \varepsilon \leq F(x - \delta_1). \]
Thus, for $q_1, q_2 \in \mathbb{Q}$ with
\[ x - \delta_1 < q_1 < x < q_2 < x + \delta_2, \]
we get
\[ F(x) - \varepsilon \leq F(x - \delta_1) \leq G(q_1) \leq \liminf_{i \to \infty} F_{n_i}(x) \leq \limsup_{i \to \infty} F_{n_i}(x) \leq G(q_2) \leq F(x) + \varepsilon. \]

Claim:
\[ \lim_{x \to -\infty} F(x) = 0 \qquad \wedge \qquad \lim_{x \to \infty} F(x) = 1. \]

Proof: For $\varepsilon > 0$ take $m \in \mathbb{Q}$ such that
\[ \forall\, n \in \mathbb{N}: \; P_n(]-m, m]) \geq 1 - \varepsilon. \]
Thus
\[ G(m) - G(-m) = \lim_{i \to \infty} \big( F_{n_i}(m) - F_{n_i}(-m) \big) = \lim_{i \to \infty} P_{n_i}(]-m, m]) \geq 1 - \varepsilon. \]
Since $F(m) \geq G(m)$ and $F(-m - 1) \leq G(-m)$, we obtain
\[ F(m) - F(-m - 1) \geq 1 - \varepsilon. \]
It remains to apply Theorems 1.10 and 7.

4 Uniform Integrability

In the sequel: $X_n$, $X$ are random variables on a common probability space $(\Omega, \mathcal{A}, P)$.

Definition 1. $(X_n)_{n \in \mathbb{N}}$ is uniformly integrable (u.i.) if
\[ \lim_{\alpha \to \infty} \sup_{n \in \mathbb{N}} \int_{\{|X_n| \geq \alpha\}} |X_n| \, dP = 0. \]

Remark 2.
(i) $(X_n)_{n \in \mathbb{N}}$ u.i. $\Rightarrow$ $(\forall\, n \in \mathbb{N}: X_n \in L^1)$ $\wedge$ $\sup_{n \in \mathbb{N}} \|X_n\|_1 < \infty$.
(ii) $Y \in L^1 \;\wedge\; \forall\, n \in \mathbb{N}: |X_n| \leq |Y|$ $\Rightarrow$ $(X_n)_{n \in \mathbb{N}}$ u.i.
(iii) For $p > 1$: $(\forall\, n \in \mathbb{N}: X_n \in L^p)$ $\wedge$ $\sup_{n \in \mathbb{N}} \|X_n\|_p < \infty$ $\Rightarrow$ $(X_n)_{n \in \mathbb{N}}$ u.i.
Proof:
\[ \int_{\{|X_n| \geq \alpha\}} |X_n| \, dP = \frac{1}{\alpha^{p-1}} \int_{\{|X_n| \geq \alpha\}} \alpha^{p-1} |X_n| \, dP \leq \frac{1}{\alpha^{p-1}} \, \|X_n\|_p^p. \]
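A numeric look at Remark 2.(iii) for p = 2 (an added sketch, assuming NumPy; the Exp(1) distribution is chosen only for illustration): an $L^2$-bound forces the truncated first moments to vanish as $\alpha$ grows.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(size=1_000_000)  # E(X^2) = 2 for Exp(1)

for alpha in [1.0, 5.0, 10.0, 20.0]:
    tail = np.where(x >= alpha, x, 0.0).mean()  # int_{|X| >= alpha} |X| dP
    print(alpha, tail, (x**2).mean() / alpha)   # bound from the proof (p = 2)
```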

Example 3. For the uniform distribution $P$ on $[0, 1]$ and $X_n = n \cdot 1_{[0,1/n]}$ we have $X_n \in L^1$ and $\|X_n\|_1 = 1$, but for any $\alpha > 0$ and $n \geq \alpha$,
\[ \int_{\{|X_n| \geq \alpha\}} |X_n| \, dP = n \cdot P([0, 1/n]) = 1, \]
so that $(X_n)_{n \in \mathbb{N}}$ is not u.i.

Lemma 4. $(X_n)_{n \in \mathbb{N}}$ is u.i. iff
\[ \sup_{n \in \mathbb{N}} \mathrm{E}(|X_n|) < \infty \tag{1} \]
and
\[ \forall\, \varepsilon > 0 \;\exists\, \delta > 0 \;\forall\, A \in \mathcal{A}: \; \Big( P(A) < \delta \;\Rightarrow\; \sup_{n \in \mathbb{N}} \int_A |X_n| \, dP < \varepsilon \Big). \tag{2} \]

Proof. "$\Rightarrow$": For (1), see Remark 2.(i). Moreover,
\[ \int_A |X_n| \, dP = \int_{A \cap \{|X_n| \geq \alpha\}} |X_n| \, dP + \int_{A \cap \{|X_n| < \alpha\}} |X_n| \, dP \leq \int_{\{|X_n| \geq \alpha\}} |X_n| \, dP + \alpha \cdot P(A). \]
For $\varepsilon > 0$ take $\alpha > 0$ with
\[ \sup_{n \in \mathbb{N}} \int_{\{|X_n| \geq \alpha\}} |X_n| \, dP < \varepsilon/2 \]
and $\delta = \varepsilon/(2\alpha)$ to obtain (2).

"$\Leftarrow$": Put $M = \sup_{n \in \mathbb{N}} \mathrm{E}(|X_n|)$. Theorem 2.2 yields $P(\{|X_n| \geq \alpha\}) \leq M/\alpha$. Let $\varepsilon > 0$ and take $\delta > 0$ according to (2) to obtain, for $\alpha > M/\delta$,
\[ \sup_{n \in \mathbb{N}} \int_{\{|X_n| \geq \alpha\}} |X_n| \, dP < \varepsilon. \]

Theorem 5. Let $1 \leq p < \infty$, and assume $X_n \in L^p$ for every $n \in \mathbb{N}$. Then $(X_n)_{n \in \mathbb{N}}$ converges in $L^p$ iff
\[ (X_n)_{n \in \mathbb{N}} \text{ converges in probability} \;\wedge\; (|X_n|^p)_{n \in \mathbb{N}} \text{ is u.i.} \]

Proof. "$\Rightarrow$": Assume $X_n \xrightarrow{L^p} X$. It follows that $(X_n)_{n \in \mathbb{N}}$ is bounded in $L^p$, and from Remark 2.5 we get $X_n \xrightarrow{P} X$. Observe that
\[ \|1_A X_n\|_p \leq \|1_A (X_n - X)\|_p + \|1_A X\|_p \]

for every $A \in \mathcal{A}$. Let $\varepsilon > 0$, take $k \in \mathbb{N}$ such that
\[ \sup_{n > k} \|X_n - X\|_p < \varepsilon. \tag{3} \]
By Remark 2.(ii),
\[ \big( |X_1 - X|^p, \ldots, |X_k - X|^p, |X|^p, |X|^p, \ldots \big) \text{ is u.i.} \]
By Lemma 4,
\[ P(A) < \delta \;\Rightarrow\; \Big( \sup_{1 \leq n \leq k} \|1_A (X_n - X)\|_p < \varepsilon \;\wedge\; \|1_A X\|_p < \varepsilon \Big) \]
for a suitable $\delta > 0$. Together with (3) this implies
\[ P(A) < \delta \;\Rightarrow\; \sup_{n \in \mathbb{N}} \|1_A X_n\|_p < 2\varepsilon. \]

"$\Leftarrow$": Let $\varepsilon > 0$, put $A = A_{m,n} = \{|X_m - X_n| > \varepsilon\}$. Then
\[ \|X_m - X_n\|_p \leq \|1_A (X_m - X_n)\|_p + \|1_{A^c} (X_m - X_n)\|_p \leq \|1_A X_m\|_p + \|1_A X_n\|_p + \varepsilon. \]
By assumption $X_n \xrightarrow{P} X$ for some $X \in Z(\Omega, \mathcal{A})$. Take $\delta > 0$ according to (2) for $(|X_n|^p)_{n \in \mathbb{N}}$, and note that
\[ A_{m,n} \subseteq \{|X_m - X| > \varepsilon/2\} \cup \{|X_n - X| > \varepsilon/2\}. \]
Hence, for $m, n$ sufficiently large, $P(A_{m,n}) < \delta$, which implies
\[ \|X_m - X_n\|_p \leq 2\varepsilon^{1/p} + \varepsilon. \]
Apply Theorem A.6.7.

Remark 6.
(i) Theorem 5 yields a generalization of Lebesgue's convergence theorem: If $X_n \in L^1$ for every $n \in \mathbb{N}$ and $X_n \xrightarrow{P} X$, then
\[ (X_n)_{n \in \mathbb{N}} \text{ u.i.} \;\Rightarrow\; X \in L^1 \;\wedge\; X_n \xrightarrow{L^1} X. \]
(ii) Uniform integrability is a property of the distributions only.
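A numerical illustration of Example 3 and Remark 6 (an added sketch, assuming NumPy): without uniform integrability, convergence in probability does not carry expectations along.

```python
import numpy as np

rng = np.random.default_rng(3)
omega = rng.uniform(size=1_000_000)  # P = uniform distribution on [0, 1]

for n in [10, 100, 1000]:
    X_n = n * (omega <= 1.0 / n)     # X_n = n * 1_{[0,1/n]}
    # X_n -> 0 in probability, yet E(X_n) = 1 for every n (no u.i.):
    print(n, (X_n > 0).mean(), X_n.mean())
```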

Convergence of Expectations

Theorem 7.
\[ X_n \xrightarrow{d} X \;\Rightarrow\; \mathrm{E}(|X|) \leq \liminf_{n \to \infty} \mathrm{E}(|X_n|). \]

Proof. From Skorohod's Theorem 3.11 we get a probability space $(\tilde{\Omega}, \tilde{\mathcal{A}}, \tilde{P})$ with random variables $\tilde{X}_n$, $\tilde{X}$ such that
\[ \tilde{X}_n \to \tilde{X} \;\; \tilde{P}\text{-a.s.}, \qquad \tilde{P}_{\tilde{X}_n} = P_{X_n}, \qquad \tilde{P}_{\tilde{X}} = P_X. \]
Thus $\mathrm{E}(|X|) = \mathrm{E}(|\tilde{X}|)$ and $\mathrm{E}(|X_n|) = \mathrm{E}(|\tilde{X}_n|)$. Apply Fatou's Lemma A.5.5.

Theorem 8. If
\[ X_n \xrightarrow{d} X \;\wedge\; (X_n)_{n \in \mathbb{N}} \text{ u.i.}, \]
then $X \in L^1$ and $\lim_{n \to \infty} \mathrm{E}(X_n) = \mathrm{E}(X)$.

Proof. Notation as previously. Now $(\tilde{X}_n)_{n \in \mathbb{N}}$ is u.i., see Remark 6.(ii). Hence, by Remark 6.(i), $\tilde{X} \in L^1$ and $\tilde{X}_n \xrightarrow{L^1} \tilde{X}$. Thus $\mathrm{E}(|X|) < \infty$ and
\[ \lim_{n \to \infty} \mathrm{E}(X_n) = \lim_{n \to \infty} \mathrm{E}(\tilde{X}_n) = \mathrm{E}(\tilde{X}) = \mathrm{E}(X). \]

Example 9. Example 3 continued. With $X = 0$ we have $X_n \to X$ $P$-a.s., and therefore $X_n \xrightarrow{d} X$. But $\mathrm{E}(X_n) = 1 > 0 = \mathrm{E}(X)$.

5 Kernels and Product Measures

Recall the construction and properties of product (probability) measures from Elstrodt (2011, Kap. V) or Section V.2 in the Lecture Notes on Maß- und Integrationstheorie (2016).

Two-Stage Experiments

Given: measurable spaces $(\Omega_1, \mathcal{A}_1)$ and $(\Omega_2, \mathcal{A}_2)$. Motivation: a two-stage experiment, where the output $\omega_1 \in \Omega_1$ of the first stage determines the probabilistic model for the second stage.

Example 1. Choose one out of $n$ coins and throw it once. Parameters: $a_1, \ldots, a_n \geq 0$ such that $\sum_{i=1}^n a_i = 1$, and $b_1, \ldots, b_n \in [0, 1]$. Let $\Omega_1 = \{1, \ldots, n\}$, $\mathcal{A}_1 = \mathfrak{P}(\Omega_1)$, and define
\[ \mu = \sum_{i=1}^n a_i \cdot \varepsilon_i, \]

i.e., $a_i = \mu(\{i\})$ is the probability of choosing the $i$-th coin. Moreover, let
\[ \Omega_2 = \{\mathrm{H}, \mathrm{T}\}, \qquad \mathcal{A}_2 = \mathfrak{P}(\Omega_2), \]
and define
\[ K(i, \cdot) = b_i \cdot \varepsilon_{\mathrm{H}} + (1 - b_i) \cdot \varepsilon_{\mathrm{T}}, \]
i.e., $b_i = K(i, \{\mathrm{H}\})$ is the probability of obtaining H when throwing the $i$-th coin.

Definition 2. $K : \Omega_1 \times \mathcal{A}_2 \to \mathbb{R}$ is a Markov (transition) kernel (from $(\Omega_1, \mathcal{A}_1)$ to $(\Omega_2, \mathcal{A}_2)$) if
(i) $K(\omega_1, \cdot)$ is a probability measure on $\mathcal{A}_2$ for every $\omega_1 \in \Omega_1$,
(ii) $K(\cdot, A_2)$ is $\mathcal{A}_1$-$\mathfrak{B}$-measurable for every $A_2 \in \mathcal{A}_2$.

Example 3. Extremal cases, non-disjoint.
(i) The model for the second stage is not influenced by the output of the first stage, i.e., for a probability measure $\nu$ on $\mathcal{A}_2$,
\[ \forall\, \omega_1 \in \Omega_1: \; K(\omega_1, \cdot) = \nu. \]
In Example 1 this means $b_1 = \cdots = b_n$.
(ii) The output of the first stage determines the output of the second stage, i.e., for an $\mathcal{A}_1$-$\mathcal{A}_2$-measurable mapping $f : \Omega_1 \to \Omega_2$,
\[ \forall\, \omega_1 \in \Omega_1: \; K(\omega_1, \cdot) = \varepsilon_{f(\omega_1)}. \]
In Example 1 this means $b_1, \ldots, b_n \in \{0, 1\}$.

Given: a probability measure $\mu$ on $\mathcal{A}_1$ and a Markov kernel $K$ from $(\Omega_1, \mathcal{A}_1)$ to $(\Omega_2, \mathcal{A}_2)$. Question: what is a stochastic model $(\Omega, \mathcal{A}, P)$ for the compound experiment? Reasonable, and assumed in the sequel:
\[ \Omega = \Omega_1 \times \Omega_2, \qquad \mathcal{A} = \mathcal{A}_1 \otimes \mathcal{A}_2. \]
How to define $P$?

Example 4. In Example 1, a reasonable requirement for $P$ is
\[ P(A_1 \times \Omega_2) = \mu(A_1), \qquad A_1 \subseteq \Omega_1, \]
and
\[ P(\{i\} \times A_2) = K(i, A_2) \cdot \mu(\{i\}), \qquad A_2 \subseteq \Omega_2. \]

Consequently, for $A \subseteq \Omega$,
\[ P(A) = \sum_{i=1}^n P(\{(\omega_1, \omega_2) \in A : \omega_1 = i\}) = \sum_{i=1}^n P(\{i\} \times \{\omega_2 \in \Omega_2 : (i, \omega_2) \in A\}) = \sum_{i=1}^n K(i, \{\omega_2 \in \Omega_2 : (i, \omega_2) \in A\}) \cdot \mu(\{i\}) = \int_{\Omega_1} K(\omega_1, \{\omega_2 \in \Omega_2 : (\omega_1, \omega_2) \in A\}) \, \mu(d\omega_1). \]
May we generally use the right-hand side integral for the definition of $P$?

Lemma 5. Let $f \in \overline{Z}(\Omega, \mathcal{A})$. Then, for $\omega_1 \in \Omega_1$, the $\omega_1$-section
\[ f(\omega_1, \cdot) : \Omega_2 \to \overline{\mathbb{R}} \]
of $f$ is $\mathcal{A}_2$-$\overline{\mathfrak{B}}$-measurable, and for $\omega_2 \in \Omega_2$ the $\omega_2$-section
\[ f(\cdot, \omega_2) : \Omega_1 \to \overline{\mathbb{R}} \]
of $f$ is $\mathcal{A}_1$-$\overline{\mathfrak{B}}$-measurable.

Proof. In the case of an $\omega_1$-section: fix $\omega_1 \in \Omega_1$. Then
\[ \Omega_2 \to \Omega_1 \times \Omega_2 : \omega_2 \mapsto (\omega_1, \omega_2) \]
is $\mathcal{A}_2$-$\mathcal{A}$-measurable due to Theorem A.8.5.(i). Apply Theorem A.1.7.

Remark 6. In particular, for $A \in \mathcal{A}$ and $f = 1_A$,
\[ f(\omega_1, \cdot) = 1_A(\omega_1, \cdot) = 1_{A(\omega_1)}, \]
where
\[ A(\omega_1) = \{\omega_2 \in \Omega_2 : (\omega_1, \omega_2) \in A\} \]
is the $\omega_1$-section of $A$.¹ By Lemma 5,
\[ \forall\, \omega_1 \in \Omega_1: \; A(\omega_1) \in \mathcal{A}_2. \]
Analogously for the $\omega_2$-section $A(\omega_2) = \{\omega_1 \in \Omega_1 : (\omega_1, \omega_2) \in A\}$ of $A$.

Lemma 7. Let $f \in \overline{Z}_+(\Omega, \mathcal{A})$. Then
\[ g : \Omega_1 \to [0, \infty], \qquad \omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \omega_2) \, K(\omega_1, d\omega_2) \]
is $\mathcal{A}_1$-$\mathfrak{B}([0, \infty])$-measurable.

¹ Poor notation.

Proof. Let $\mathcal{F}$ denote the set of all functions $f \in \overline{Z}_+(\Omega, \mathcal{A})$ with the measurability property as claimed. We show that
\[ \forall\, A_1 \in \mathcal{A}_1, A_2 \in \mathcal{A}_2: \; 1_{A_1 \times A_2} \in \mathcal{F}. \tag{1} \]
Indeed,
\[ \int_{\Omega_2} 1_{A_1 \times A_2}(\omega_1, \omega_2) \, K(\omega_1, d\omega_2) = 1_{A_1}(\omega_1) \cdot K(\omega_1, A_2). \]
Furthermore, we show that
\[ \forall\, A \in \mathcal{A}: \; 1_A \in \mathcal{F}. \tag{2} \]
To this end let
\[ \mathcal{D} = \{A \in \mathcal{A} : 1_A \in \mathcal{F}\} \]
and
\[ \mathcal{E} = \{A_1 \times A_2 : A_1 \in \mathcal{A}_1 \wedge A_2 \in \mathcal{A}_2\}. \]
Then $\mathcal{E} \subseteq \mathcal{D}$ by (1), $\mathcal{E}$ is closed w.r.t. intersections, and $\sigma(\mathcal{E}) = \mathcal{A}$. It easily follows that $\mathcal{D}$ is a Dynkin class. Hence Theorem A.7.4 yields
\[ \mathcal{A} = \sigma(\mathcal{E}) = \delta(\mathcal{E}) \subseteq \mathcal{D} \subseteq \mathcal{A}, \]
which implies (2). From Theorems A.4.6 and A.5.9.(iv) we get
\[ f_1, f_2 \in \mathcal{F} \;\wedge\; \alpha \in [0, \infty[ \;\Rightarrow\; \alpha f_1 + f_2 \in \mathcal{F}. \tag{3} \]
Finally, Theorem A.5.3 and Theorem A.4.4.(iii) imply that
\[ f_n \in \mathcal{F} \;\wedge\; f_n \uparrow f \;\Rightarrow\; f \in \mathcal{F}. \tag{4} \]
Use Theorem A.4.8 together with (2)-(4) to conclude that $\mathcal{F} = \overline{Z}_+$.

Theorem 8.
\[ \exists_1 \text{ probability measure } \mu \otimes K \text{ on } \mathcal{A} \;\; \forall\, A_1 \in \mathcal{A}_1 \;\forall\, A_2 \in \mathcal{A}_2: \; \mu \otimes K(A_1 \times A_2) = \int_{A_1} K(\omega_1, A_2) \, \mu(d\omega_1). \tag{5} \]
Moreover,
\[ \forall\, A \in \mathcal{A}: \; \mu \otimes K(A) = \int_{\Omega_1} K(\omega_1, A(\omega_1)) \, \mu(d\omega_1). \tag{6} \]

Proof. Existence: For $A \in \mathcal{A}$ and $\omega_1 \in \Omega_1$,
\[ K(\omega_1, A(\omega_1)) = \int_{\Omega_2} 1_{A(\omega_1)}(\omega_2) \, K(\omega_1, d\omega_2) = \int_{\Omega_2} 1_A(\omega_1, \omega_2) \, K(\omega_1, d\omega_2). \]
According to Lemma 7, $\mu \otimes K$ is well-defined via (6). Using Theorem A.5.3, it is easy to verify that $\mu \otimes K$ is a probability measure on $\mathcal{A}$. For $A_1 \in \mathcal{A}_1$ and $A_2 \in \mathcal{A}_2$,
\[ K(\omega_1, (A_1 \times A_2)(\omega_1)) = \begin{cases} K(\omega_1, A_2) & \text{if } \omega_1 \in A_1, \\ 0 & \text{otherwise}. \end{cases} \]
Hence $\mu \otimes K$ satisfies (5).

Uniqueness: Apply Theorem A.1.5 with $\tilde{\mathcal{A}}_0 = \{A_1 \times A_2 : A_i \in \mathcal{A}_i\}$.
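A simulation sketch of the two-stage experiment of Examples 1 and 4 (an added illustration, assuming NumPy): draw a coin according to $\mu$, flip it according to $K$, and compare the empirical frequencies with $\mu \otimes K(\{i\} \times \{\mathrm{H}\}) = a_i \cdot b_i$ from (5).

```python
import numpy as np

rng = np.random.default_rng(4)

a = np.array([0.5, 0.3, 0.2])  # mu({i}): probability of choosing coin i
b = np.array([0.1, 0.5, 0.9])  # K(i, {H}): head probability of coin i

N = 1_000_000
coin = rng.choice(3, size=N, p=a)      # first stage: draw from mu
heads = rng.uniform(size=N) < b[coin]  # second stage: draw from K(coin, .)

for i in range(3):
    empirical = ((coin == i) & heads).mean()
    print(i, empirical, a[i] * b[i])   # empirical ~ (mu x K)({i} x {H})
```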

Example 9. In Example 4 we have $P = \mu \otimes K$.

Theorem 10 (Fubini's Theorem).
(i) For $f \in \overline{Z}_+(\Omega, \mathcal{A})$,
\[ \int_\Omega f \, d(\mu \otimes K) = \int_{\Omega_1} \int_{\Omega_2} f(\omega_1, \omega_2) \, K(\omega_1, d\omega_2) \, \mu(d\omega_1). \]
(ii) For $f$ $(\mu \otimes K)$-integrable and
\[ A_1 = \{\omega_1 \in \Omega_1 : f(\omega_1, \cdot) \text{ is } K(\omega_1, \cdot)\text{-integrable}\}, \]
we have
(a) $A_1 \in \mathcal{A}_1$ and $\mu(A_1) = 1$,
(b) $A_1 \to \mathbb{R} : \omega_1 \mapsto \int_{\Omega_2} f(\omega_1, \cdot) \, dK(\omega_1, \cdot)$ is integrable w.r.t. $\mu|_{A_1 \cap \mathcal{A}_1}$,
(c)
\[ \int_\Omega f \, d(\mu \otimes K) = \int_{A_1} \int_{\Omega_2} f(\omega_1, \omega_2) \, K(\omega_1, d\omega_2) \, \mu|_{A_1 \cap \mathcal{A}_1}(d\omega_1). \]

Proof. Ad (i): algebraic induction. Ad (ii): consider $f^+$ and $f^-$ and use (i).

Remark 11. For brevity, we write
\[ \int_{\Omega_1} \int_{\Omega_2} f(\omega_1, \omega_2) \, K(\omega_1, d\omega_2) \, \mu(d\omega_1) = \int_{A_1} \int_{\Omega_2} f(\omega_1, \omega_2) \, K(\omega_1, d\omega_2) \, \mu|_{A_1 \cap \mathcal{A}_1}(d\omega_1) \]
if $f$ is $(\mu \otimes K)$-integrable. For $f \in \overline{Z}(\Omega, \mathcal{A})$,
\[ f \text{ is } (\mu \otimes K)\text{-integrable} \;\Leftrightarrow\; \int_{\Omega_1} \int_{\Omega_2} |f|(\omega_1, \omega_2) \, K(\omega_1, d\omega_2) \, \mu(d\omega_1) < \infty. \]

Multi-Stage Experiments

Now we construct a stochastic model for a series of experiments, where the outputs of the first $i - 1$ stages determine the model for the $i$-th stage. Given: measurable spaces $(\Omega_i, \mathcal{A}_i)$ for $i \in I$, where $I = \{1, \ldots, n\}$ or $I = \mathbb{N}$. Put
\[ \big( \Omega^i, \mathcal{A}^i \big) = \Big( \prod_{j=1}^i \Omega_j, \; \bigotimes_{j=1}^i \mathcal{A}_j \Big), \]
and note that
\[ \prod_{j=1}^i \Omega_j = \Omega^{i-1} \times \Omega_i, \qquad \bigotimes_{j=1}^i \mathcal{A}_j = \mathcal{A}^{i-1} \otimes \mathcal{A}_i \]

for $i \in I \setminus \{1\}$. Furthermore, let
\[ \Omega = \prod_{i \in I} \Omega_i, \qquad \mathcal{A} = \bigotimes_{i \in I} \mathcal{A}_i. \tag{7} \]
Given: a probability measure $\mu$ on $\mathcal{A}_1$ and Markov kernels $K_i$ from $(\Omega^{i-1}, \mathcal{A}^{i-1})$ to $(\Omega_i, \mathcal{A}_i)$ for $i \in I \setminus \{1\}$.

Theorem 12. For $I = \{1, \ldots, n\}$,
\[ \exists_1 \text{ probability measure } \nu \text{ on } \mathcal{A} \;\; \forall\, A_1 \in \mathcal{A}_1 \ldots \forall\, A_n \in \mathcal{A}_n: \]
\[ \nu(A_1 \times \cdots \times A_n) = \int_{A_1} \cdots \int_{A_{n-1}} K_n((\omega_1, \ldots, \omega_{n-1}), A_n) \, K_{n-1}((\omega_1, \ldots, \omega_{n-2}), d\omega_{n-1}) \cdots \mu(d\omega_1). \]
Moreover, for $f$ $\nu$-integrable (in the short notation),
\[ \int_\Omega f \, d\nu = \int_{\Omega_1} \cdots \int_{\Omega_n} f(\omega_1, \ldots, \omega_n) \, K_n((\omega_1, \ldots, \omega_{n-1}), d\omega_n) \cdots \mu(d\omega_1). \tag{8} \]
Notation: $\nu = \mu \otimes K_2 \otimes \cdots \otimes K_n$.

Proof. Induction, using Theorems 8 and 10.

Remark 13. Particular case of Theorem 12 with
\[ \mu = P_1, \qquad \forall\, i \in I \setminus \{1\} \;\forall\, \omega^{i-1} \in \Omega^{i-1}: \; K_i(\omega^{i-1}, \cdot) = P_i \tag{9} \]
for probability measures $P_i$ on $\mathcal{A}_i$:
\[ \exists_1 \text{ probability measure } P_1 \otimes \cdots \otimes P_n \text{ on } \mathcal{A} \;\; \forall\, A_1 \in \mathcal{A}_1 \ldots \forall\, A_n \in \mathcal{A}_n: \]
\[ P_1 \otimes \cdots \otimes P_n(A_1 \times \cdots \times A_n) = P_1(A_1) \cdots P_n(A_n). \]
Moreover, for every $P_1 \otimes \cdots \otimes P_n$-integrable function $f$, Fubini's Theorem reads
\[ \int_\Omega f \, d(P_1 \otimes \cdots \otimes P_n) = \int_{\Omega_1} \cdots \int_{\Omega_n} f(\omega_1, \ldots, \omega_n) \, P_n(d\omega_n) \cdots P_1(d\omega_1). \]
Analogously for any other order of integration.

Definition 14. $P_1 \otimes \cdots \otimes P_n$ is called the product probability measure corresponding to $P_i$ for $i = 1, \ldots, n$, and $(\Omega, \mathcal{A}, P_1 \otimes \cdots \otimes P_n)$ is called the product probability space corresponding to $(\Omega_i, \mathcal{A}_i, P_i)$ for $i = 1, \ldots, n$.

Example 15.
(i) In Example 4 with $b = b_1 = \cdots = b_n$ and $\nu = b \cdot \varepsilon_{\mathrm{H}} + (1 - b) \cdot \varepsilon_{\mathrm{T}}$ we have $P = \mu \otimes \nu$.

(ii) For countable spaces $\Omega_i$ and $\sigma$-algebras $\mathcal{A}_i = \mathfrak{P}(\Omega_i)$ we get
\[ P_1 \otimes P_2(A) = \sum_{\omega_1 \in \Omega_1} P_2(A(\omega_1)) \cdot P_1(\{\omega_1\}), \qquad A \subseteq \Omega. \]
(iii) For uniform distributions $P_i$ on finite spaces $\Omega_i$, $P_1 \otimes \cdots \otimes P_n$ is the uniform distribution on $\Omega$.

Remark 16. Theorems 8, 10, and 12 and Remark 13 extend to $\sigma$-finite measures $\mu$ instead of probability measures and $\sigma$-finite kernels $K$ or $K_i$, resp., instead of Markov kernels, with the resulting measure on $\mathcal{A}$ being $\sigma$-finite, too. Recall that $\mu$ is $\sigma$-finite if
\[ \exists\, A_{1,1}, A_{1,2}, \ldots \in \mathcal{A}_1 \text{ pairwise disjoint}: \; \Omega_1 = \bigcup_i A_{1,i} \;\wedge\; \forall\, i \in \mathbb{N}: \; \mu(A_{1,i}) < \infty. \]
By definition, a mapping $K : \Omega_1 \times \mathcal{A}_2 \to \overline{\mathbb{R}}$ is a kernel if $K(\omega_1, \cdot)$ is a measure on $\mathcal{A}_2$ for every $\omega_1 \in \Omega_1$ and if $K(\cdot, A_2)$ is $\mathcal{A}_1$-$\overline{\mathfrak{B}}$-measurable for every $A_2 \in \mathcal{A}_2$. Moreover, $K$ is a $\sigma$-finite kernel if, additionally,
\[ \exists\, A_{2,1}, A_{2,2}, \ldots \in \mathcal{A}_2 \text{ pairwise disjoint}: \; \Omega_2 = \bigcup_i A_{2,i} \;\wedge\; \forall\, i \in \mathbb{N}: \; \sup_{\omega_1 \in \Omega_1} K(\omega_1, A_{2,i}) < \infty. \]

Example 17.
(i) For every integrable random variable $X$ on any probability space $(\Omega, \mathcal{A}, P)$ with $X \geq 0$,
\[ \mathrm{E}(X) = \int_{]0,\infty[} (1 - F_X(u)) \, \lambda_1(du), \]
see Billingsley (1995, p. 275) or Übung 7.4 in Maß- und Integrationstheorie (2016).
(ii) For the Lebesgue measure,
\[ \lambda_n = \lambda_1 \otimes \cdots \otimes \lambda_1. \]

Theorem 18 (Ionescu-Tulcea). For $I = \mathbb{N}$,
\[ \exists_1 \text{ probability measure } P \text{ on } \mathcal{A} \;\; \forall\, n \in \mathbb{N} \;\forall\, A_1 \in \mathcal{A}_1 \ldots \forall\, A_n \in \mathcal{A}_n: \]
\[ P\Big( A_1 \times \cdots \times A_n \times \prod_{i=n+1}^\infty \Omega_i \Big) = (\mu \otimes K_2 \otimes \cdots \otimes K_n)(A_1 \times \cdots \times A_n). \tag{10} \]

Proof. Existence: Consider the $\sigma$-algebras
\[ \mathcal{A}^n = \bigotimes_{i=1}^n \mathcal{A}_i, \qquad \tilde{\mathcal{A}}_n = \sigma\big( \pi_{\{1,\ldots,n\}} \big) \]

in $\prod_{i=1}^n \Omega_i$ and $\prod_{i=1}^\infty \Omega_i$, respectively. Define a probability measure $\tilde{P}_n$ on $\tilde{\mathcal{A}}_n$ by
\[ \tilde{P}_n\Big( A \times \prod_{i=n+1}^\infty \Omega_i \Big) = (\mu \otimes K_2 \otimes \cdots \otimes K_n)(A), \qquad A \in \mathcal{A}^n. \]
Then (8) yields the following consistency property:
\[ \tilde{P}_{n+1}\Big( A \times \Omega_{n+1} \times \prod_{i=n+2}^\infty \Omega_i \Big) = \tilde{P}_n\Big( A \times \prod_{i=n+1}^\infty \Omega_i \Big), \qquad A \in \mathcal{A}^n. \]
Thus
\[ \hat{P}(\tilde{A}) = \tilde{P}_n(\tilde{A}), \qquad \tilde{A} \in \tilde{\mathcal{A}}_n, \]
yields a well-defined mapping on the algebra $\tilde{\mathcal{A}} = \bigcup_{n \in \mathbb{N}} \tilde{\mathcal{A}}_n$ of cylinder sets. Obviously, $\hat{P}$ is a content with $\hat{P}(\prod_{i=1}^\infty \Omega_i) = 1$, and (10) holds for $P = \hat{P}$.

Claim: $\hat{P}$ is $\sigma$-continuous at $\emptyset$. It suffices to show that for every sequence of sets
\[ A^{(n)} = B^{(n)} \times \prod_{i=n+1}^\infty \Omega_i, \qquad B^{(n)} \in \mathcal{A}^n, \]
with $A^{(n)} \downarrow \emptyset$ we have
\[ \lim_{n \to \infty} (\mu \otimes K_2 \otimes \cdots \otimes K_n)(B^{(n)}) = 0. \]
Assume the contrary, i.e., $\inf_{n \in \mathbb{N}} (\mu \otimes K_2 \otimes \cdots \otimes K_n)(B^{(n)}) > 0$. Observe that
\[ B^{(n)} \times \Omega_{n+1} \supseteq B^{(n+1)}. \tag{11} \]
Recursively, we define $\omega^* \in \Omega$ such that
\[ \forall\, n \in \mathbb{N}: \; B^{(n+1)}(\omega_1^*, \ldots, \omega_n^*) \neq \emptyset, \tag{12} \]
which implies $(\omega_1^*, \ldots, \omega_n^*) \in B^{(n)}$ for every $n \in \mathbb{N}$, see (11). Consequently,
\[ \omega^* \in \bigcap_{n=1}^\infty A^{(n)}, \]
contradicting $A^{(n)} \downarrow \emptyset$.

Put $\omega^n = (\omega_1, \ldots, \omega_n)$ for $n \geq 1$ and $\omega_i \in \Omega_i$. Consider the probability measure $K_{n+1}(\omega^n, \cdot)$ on $\mathcal{A}_{n+1}$ as well as the Markov kernels
\[ K_{n+m}((\omega^n, \cdot), \cdot) : \prod_{i=n+1}^{n+m-1} \Omega_i \times \mathcal{A}_{n+m} \to \mathbb{R} \]

for $m \geq 2$. By
\[ Q_{n,m}(\omega^n, \cdot) = K_{n+1}(\omega^n, \cdot) \otimes \cdots \otimes K_{n+m}((\omega^n, \cdot), \cdot) \]
we obtain a probability measure on $\bigotimes_{i=n+1}^{n+m} \mathcal{A}_i$ for $m \geq 1$; actually $Q_{n,m}$ is a Markov kernel. Finally, we put
\[ f_{n,m}(\omega^n) = Q_{n,m}\big( \omega^n, B^{(n+m)}(\omega^n) \big) \]
for $n, m \geq 1$. Recursively, we define $\omega^* \in \Omega$ such that
\[ \forall\, n \in \mathbb{N}: \; \inf_{m \geq 1} f_{n,m}(\omega_1^*, \ldots, \omega_n^*) > 0. \tag{13} \]
Take $m = 1$ in (13) to obtain (12). Due to (11) we have
\[ B^{(n+m)}(\omega^n) \times \Omega_{n+m+1} \supseteq B^{(n+m+1)}(\omega^n), \]
and therefore
\[ f_{n,m}(\omega^n) = Q_{n,m+1}\big( \omega^n, B^{(n+m)}(\omega^n) \times \Omega_{n+m+1} \big) \geq f_{n,m+1}(\omega^n). \]
Furthermore, $0 \leq f_{n,m} \leq 1$ and
\[ f_{n,m}(\omega^n) = \int_{\prod_{i=n+1}^{n+m} \Omega_i} 1_{B^{(n+m)}}(\omega^n, \omega_{n+1}, \ldots, \omega_{n+m}) \, Q_{n,m}(\omega^n, d(\omega_{n+1}, \ldots, \omega_{n+m})). \]
In particular, $f_{n,m}$ is $\bigotimes_{i=1}^n \mathcal{A}_i$-$\mathfrak{B}$-measurable. For $n = 1$,
\[ \int_{\Omega_1} f_{1,m}(\omega_1) \, \mu(d\omega_1) = (\mu \otimes K_2 \otimes \cdots \otimes K_{1+m})(B^{(1+m)}). \]
Therefore
\[ \int_{\Omega_1} \inf_{m \geq 1} f_{1,m}(\omega_1) \, \mu(d\omega_1) = \inf_{m \geq 1} (\mu \otimes K_2 \otimes \cdots \otimes K_{1+m})(B^{(1+m)}) > 0, \]
which yields
\[ \exists\, \omega_1^* \in \Omega_1: \; \inf_{m \geq 1} f_{1,m}(\omega_1^*) > 0. \]
For $n \geq 1$,
\[ \int_{\Omega_{n+1}} f_{n+1,m}(\omega^n, \omega_{n+1}) \, K_{n+1}(\omega^n, d\omega_{n+1}) = f_{n,m+1}(\omega^n). \]
Therefore
\[ \int_{\Omega_{n+1}} \inf_{m \geq 1} f_{n+1,m}(\omega^n, \omega_{n+1}) \, K_{n+1}(\omega^n, d\omega_{n+1}) = \inf_{m \geq 1} f_{n,m+1}(\omega^n). \]
Proceed inductively to establish (13). By Theorem A.9.3, $\hat{P}$ is $\sigma$-additive, and it remains to apply Theorem A.9.4.

Uniqueness: By (10), $P$ is uniquely determined on the class of measurable rectangles. Apply Theorem A.1.5.

Outlook: the probabilistic method.

Example 19. The queuing model, see Übung 5.3. Here $K_i((\omega_1, \ldots, \omega_{i-1}), \cdot)$ only depends on $\omega_{i-1}$. See also Übung 5.1. Outlook: Markov processes.
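As a computational aside (an added sketch, assuming NumPy; kernel and initial distribution are chosen only for illustration): the recursive structure behind Theorem 18 is exactly how one simulates a process whose $i$-th stage depends on the earlier outputs. Here, as in Example 19, each kernel depends only on the previous coordinate.

```python
import numpy as np

rng = np.random.default_rng(5)

def K(prev):
    # Markov kernel K_i(prev, .): here a random walk step, N(prev, 1).
    return rng.normal(loc=prev, scale=1.0)

def simulate(n, mu_sample):
    # Draw omega_1 ~ mu, then omega_i ~ K_i((omega_1,...,omega_{i-1}), .),
    # which in this example depends only on omega_{i-1}.
    path = [mu_sample()]
    for _ in range(n - 1):
        path.append(K(path[-1]))
    return np.array(path)

print(simulate(10, lambda: rng.normal()))
```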

Product Measures: The General Case

Given: a non-empty set $I$ and probability spaces $(\Omega_i, \mathcal{A}_i, P_i)$ for $i \in I$. Recall the definition (7). Put
\[ \mathfrak{P}_0(I) = \{J \subseteq I : J \text{ non-empty, finite}\}. \]

Theorem 20.
\[ \exists_1 \text{ probability measure } P \text{ on } \mathcal{A} \;\; \forall\, S \in \mathfrak{P}_0(I) \;\forall\, A_i \in \mathcal{A}_i, \, i \in S: \]
\[ P\Big( \prod_{i \in S} A_i \times \prod_{i \in I \setminus S} \Omega_i \Big) = \prod_{i \in S} P_i(A_i). \tag{14} \]
Notation: $P = \bigotimes_{i \in I} P_i$.

Proof. See Remark 13 in the case of a finite set $I$. If $I$ is countably infinite, assume $I = \mathbb{N}$ without loss of generality. The particular case of Theorem 18 with (9) for probability measures $P_i$ on $\mathcal{A}_i$ shows
\[ \exists_1 \text{ probability measure } P \text{ on } \mathcal{A} \;\; \forall\, n \in \mathbb{N} \;\forall\, A_1 \in \mathcal{A}_1 \ldots \forall\, A_n \in \mathcal{A}_n: \]
\[ P\Big( A_1 \times \cdots \times A_n \times \prod_{i=n+1}^\infty \Omega_i \Big) = P_1(A_1) \cdots P_n(A_n). \]
If $I$ is uncountable, we use Theorem A.8.7. For $S \subseteq I$ non-empty and countable and for $B \in \bigotimes_{i \in S} \mathcal{A}_i$ we put
\[ P\big( (\pi_S^I)^{-1}(B) \big) = \Big( \bigotimes_{i \in S} P_i \Big)(B). \]
Hereby we get a well-defined mapping $P : \mathcal{A} \to \mathbb{R}$, which clearly is a probability measure and satisfies (14). Use Theorem A.1.5 to obtain the uniqueness result.

Definition 21. $P = \bigotimes_{i \in I} P_i$ is called the product measure corresponding to $P_i$ for $i \in I$, and $(\Omega, \mathcal{A}, P)$ is called the product measure space corresponding to $(\Omega_i, \mathcal{A}_i, P_i)$ for $i \in I$.

6 Independence

"... the concept of independence ... plays a central role in probability theory; it is precisely this concept that distinguishes probability theory from the general theory of measure spaces", see Shiryayev (1984, p. 27).

In the sequel, $(\Omega, \mathcal{A}, P)$ denotes a probability space and $I$ is a non-empty set.

Independence of Events

Definition 1. Let $A_i \in \mathcal{A}$ for $i \in I$. Then $(A_i)_{i \in I}$ is independent if
\[ P\Big( \bigcap_{i \in S} A_i \Big) = \prod_{i \in S} P(A_i) \tag{1} \]
for every $S \in \mathfrak{P}_0(I)$. Elementary case: $|I| = 2$.

In the sequel, $\mathcal{E}_i \subseteq \mathcal{A}$ for $i \in I$.

Definition 2. $(\mathcal{E}_i)_{i \in I}$ is independent if (1) holds for every $S \in \mathfrak{P}_0(I)$ and all $A_i \in \mathcal{E}_i$ for $i \in S$.

Remark 3.
(i) $(\mathcal{E}_i)_{i \in I}$ independent $\wedge$ $\forall\, i \in I: \tilde{\mathcal{E}}_i \subseteq \mathcal{E}_i$ $\Rightarrow$ $(\tilde{\mathcal{E}}_i)_{i \in I}$ independent.
(ii) $(\mathcal{E}_i)_{i \in I}$ independent $\Leftrightarrow$ $\forall\, S \in \mathfrak{P}_0(I)$: $(\mathcal{E}_i)_{i \in S}$ independent.

Lemma 4. $(\mathcal{E}_i)_{i \in I}$ independent $\Rightarrow$ $(\delta(\mathcal{E}_i))_{i \in I}$ independent.

Proof. Without loss of generality, $I = \{1, \ldots, n\}$ and $n \geq 2$, see Remark 3.(ii). Put
\[ \mathcal{D}_1 = \{A \in \delta(\mathcal{E}_1) : (\{A\}, \mathcal{E}_2, \ldots, \mathcal{E}_n) \text{ independent}\}. \]
Then $\mathcal{D}_1$ is a Dynkin class and $\mathcal{E}_1 \subseteq \mathcal{D}_1$, hence $\delta(\mathcal{E}_1) = \mathcal{D}_1$. Thus $(\delta(\mathcal{E}_1), \mathcal{E}_2, \ldots, \mathcal{E}_n)$ is independent. Repeat this step for $2, \ldots, n$.

Theorem 5. If
\[ (\mathcal{E}_i)_{i \in I} \text{ independent} \;\wedge\; \forall\, i \in I: \; \mathcal{E}_i \text{ closed w.r.t. intersections}, \tag{2} \]
then $(\sigma(\mathcal{E}_i))_{i \in I}$ is independent.

Proof. Use Theorem A.7.4 and Lemma 4.

Corollary 6. Assume that $I = \bigcup_{j \in J} I_j$ for pairwise disjoint sets $I_j$. If (2) holds, then
\[ \Big( \sigma\Big( \bigcup_{i \in I_j} \mathcal{E}_i \Big) \Big)_{j \in J} \text{ is independent}. \]

Proof. Let
\[ \tilde{\mathcal{E}}_j = \Big\{ \bigcap_{i \in S} A_i : S \in \mathfrak{P}_0(I_j) \wedge A_i \in \mathcal{E}_i \text{ for } i \in S \Big\}. \]
Then $\tilde{\mathcal{E}}_j$ is closed w.r.t. intersections and $(\tilde{\mathcal{E}}_j)_{j \in J}$ is independent. Finally,
\[ \sigma\Big( \bigcup_{i \in I_j} \mathcal{E}_i \Big) = \sigma(\tilde{\mathcal{E}}_j). \]

Independence of Random Elements

In the sequel, $(\Omega_i', \mathcal{A}_i')$ denotes a measurable space for $i \in I$, and $X_i : \Omega \to \Omega_i'$ is $\mathcal{A}$-$\mathcal{A}_i'$-measurable for $i \in I$.

Definition 7. $(X_i)_{i \in I}$ is independent if $(\sigma(X_i))_{i \in I}$ is independent.

Remark 8. See Übung 6.2 for a characterization of independence in terms of prediction problems.

Example 9. Actually, the essence of independence. Assume that
\[ (\Omega, \mathcal{A}, P) = \Big( \prod_{i \in I} \Omega_i', \; \bigotimes_{i \in I} \mathcal{A}_i', \; \bigotimes_{i \in I} P_i \Big) \]
for probability measures $P_i$ on $\mathcal{A}_i'$. Let $X_i = \pi_i$. Then, for $S \in \mathfrak{P}_0(I)$ and $A_i \in \mathcal{A}_i'$ for $i \in S$,
\[ P\Big( \bigcap_{i \in S} \{X_i \in A_i\} \Big) = P\Big( \prod_{i \in S} A_i \times \prod_{i \in I \setminus S} \Omega_i' \Big) = \prod_{i \in S} P_i(A_i) = \prod_{i \in S} P(\{X_i \in A_i\}). \]
Hence $(X_i)_{i \in I}$ is independent. Furthermore, $P_{X_i} = P_i$.

Theorem 10. Given probability spaces $(\Omega_i', \mathcal{A}_i', P_i)$ for $i \in I$, there exist
(i) a probability space $(\Omega, \mathcal{A}, P)$ and
(ii) $\mathcal{A}$-$\mathcal{A}_i'$-measurable mappings $X_i : \Omega \to \Omega_i'$ for $i \in I$
such that $(X_i)_{i \in I}$ is independent and $\forall\, i \in I: \; P_{X_i} = P_i$.

Proof. See Example 9.

Theorem 11. Let $\mathcal{F}_i \subseteq \mathcal{A}_i'$ for $i \in I$. If
\[ \forall\, i \in I: \; \sigma(\mathcal{F}_i) = \mathcal{A}_i' \;\wedge\; \mathcal{F}_i \text{ closed w.r.t. intersections}, \]
then
\[ (X_i)_{i \in I} \text{ independent} \;\Leftrightarrow\; (X_i^{-1}(\mathcal{F}_i))_{i \in I} \text{ independent}. \]

Proof. By Elstrodt (2011, Satz I.4.4) or Lemma II.3.6 in Maß- und Integrationstheorie (2016) we have
\[ \sigma(X_i) = X_i^{-1}(\mathcal{A}_i') = \sigma(X_i^{-1}(\mathcal{F}_i)). \]
"$\Rightarrow$": See Remark 3.(i). "$\Leftarrow$": Note that $X_i^{-1}(\mathcal{F}_i)$ is closed w.r.t. intersections. Use Theorem 5.

Example 12. Independence of a family of random variables $X_i$, i.e., $(\Omega_i', \mathcal{A}_i') = (\mathbb{R}, \mathfrak{B})$ for $i \in I$. In this case $(X_i)_{i \in I}$ is independent iff
\[ \forall\, S \in \mathfrak{P}_0(I) \;\forall\, c_i \in \mathbb{R}, \, i \in S: \; P\Big( \bigcap_{i \in S} \{X_i \leq c_i\} \Big) = \prod_{i \in S} P(\{X_i \leq c_i\}). \]
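Example 12 suggests an empirical check of independence via joint versus factorized probabilities (an added sketch, assuming NumPy; the standard normal coordinates are our choice, and with finitely many samples the two sides agree only approximately):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 1_000_000

# X_1, X_2 independent standard normal by construction:
x1, x2 = rng.normal(size=(2, N))

for c1, c2 in [(0.0, 0.0), (1.0, -0.5)]:
    joint = ((x1 <= c1) & (x2 <= c2)).mean()
    product = (x1 <= c1).mean() * (x2 <= c2).mean()
    print(joint, product)  # approximately equal for every choice of c1, c2
```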

Theorem 13. Let
(i) $I = \bigcup_{j \in J} I_j$ for pairwise disjoint sets $I_j$,
(ii) $(\tilde{\Omega}_j, \tilde{\mathcal{A}}_j)$ be measurable spaces for $j \in J$,
(iii) $f_j : \prod_{i \in I_j} \Omega_i' \to \tilde{\Omega}_j$ be $\big( \bigotimes_{i \in I_j} \mathcal{A}_i' \big)$-$\tilde{\mathcal{A}}_j$-measurable mappings for $j \in J$.
Put
\[ Y_j = (X_i)_{i \in I_j} : \Omega \to \prod_{i \in I_j} \Omega_i'. \]
Then
\[ (X_i)_{i \in I} \text{ independent} \;\Rightarrow\; (f_j \circ Y_j)_{j \in J} \text{ independent}. \]

Proof.
\[ \sigma(f_j \circ Y_j) = Y_j^{-1}\big( f_j^{-1}(\tilde{\mathcal{A}}_j) \big) \subseteq Y_j^{-1}\Big( \bigotimes_{i \in I_j} \mathcal{A}_i' \Big) = \sigma(\{X_i : i \in I_j\}) = \sigma\Big( \bigcup_{i \in I_j} X_i^{-1}(\mathcal{A}_i') \Big). \]
Use Corollary 6 and Remark 3.(i).

Example 14. For an independent sequence $(X_i)_{i \in \mathbb{N}}$ of random variables,
\[ \max(X_1, X_3), \qquad 1_{[0,\infty[}(X_2), \qquad \limsup_{n \to \infty} \frac{1}{n} \sum_{i=4}^{n} X_i \]
are independent.

Independence and Product Measures

Remark 15. Consider the mapping
\[ X : \Omega \to \prod_{i \in I} \Omega_i' : \omega \mapsto (X_i(\omega))_{i \in I}. \]
Clearly $X$ is $\mathcal{A}$-$\bigotimes_{i \in I} \mathcal{A}_i'$-measurable. By definition, $P_X(A) = P(\{X \in A\})$ for $A \in \bigotimes_{i \in I} \mathcal{A}_i'$. In particular, for measurable rectangles $A \in \bigotimes_{i \in I} \mathcal{A}_i'$, i.e.,
\[ A = \prod_{i \in S} A_i \times \prod_{i \in I \setminus S} \Omega_i' \tag{3} \]
with $S \in \mathfrak{P}_0(I)$ and $A_i \in \mathcal{A}_i'$,
\[ P_X(A) = P\Big( \bigcap_{i \in S} \{X_i \in A_i\} \Big). \tag{4} \]

Definition 16.
(i) $P_X$ is called the joint distribution of the random elements $X_i$, $i \in I$.
(ii) Let $P$ denote a probability measure on $\big( \prod_{i \in I} \Omega_i', \bigotimes_{i \in I} \mathcal{A}_i' \big)$, and let $i \in I$. Then $P_{\pi_i}$ is called a (one-dimensional) marginal distribution of $P$.

Example 17. Let $\Omega = \{1, \ldots, 6\}^2$ and consider the uniform distribution $P$ on $\mathcal{A} = \mathfrak{P}(\Omega)$, which is a model for rolling a die twice. Moreover, let $\Omega_i' = \mathbb{N}$ and $\mathcal{A}_i' = \mathfrak{P}(\Omega_i')$, so that $\bigotimes_{i=1}^2 \mathcal{A}_i' = \mathfrak{P}(\mathbb{N}^2)$. Consider the random variables
\[ X_1(\omega_1, \omega_2) = \omega_1, \qquad X_2(\omega_1, \omega_2) = \omega_1 + \omega_2. \]
Then
\[ P_X(A) = \frac{|A \cap M|}{36}, \qquad A \subseteq \mathbb{N}^2, \]
where
\[ M = \{(k, l) \in \mathbb{N}^2 : 1 \leq k \leq 6 \;\wedge\; k + 1 \leq l \leq k + 6\}. \]
Claim: $(X_1, X_2)$ is not independent. Proof:
\[ P(\{X_1 = 1\} \cap \{X_2 = 3\}) = P_X(\{(1, 3)\}) = P(\{(1, 2)\}) = 1/36, \]
but
\[ P(\{X_1 = 1\}) \cdot P(\{X_2 = 3\}) = \tfrac{1}{6} \cdot P(\{(1, 2), (2, 1)\}) = \tfrac{1}{3} \cdot \tfrac{1}{36} \neq \tfrac{1}{36}. \]
We add that
\[ P_{X_1} = \sum_{k=1}^{6} \tfrac{1}{6} \, \varepsilon_k, \qquad P_{X_2} = \sum_{l=2}^{12} \frac{6 - |l - 7|}{36} \, \varepsilon_l. \]

Theorem 18.
\[ (X_i)_{i \in I} \text{ independent} \;\Leftrightarrow\; P_X = \bigotimes_{i \in I} P_{X_i}. \]

Proof. For $A$ given by (3),
\[ \Big( \bigotimes_{i \in I} P_{X_i} \Big)(A) = \prod_{i \in S} P_{X_i}(A_i) = \prod_{i \in S} P(\{X_i \in A_i\}). \]
On the other hand, we have (4). Thus "$\Leftarrow$" holds trivially. Use Theorem A.1.5 to obtain "$\Rightarrow$".

In the sequel, we consider random variables $X_i$, i.e., $(\Omega_i', \mathcal{A}_i') = (\mathbb{R}, \mathfrak{B})$ for $i \in I$.

Theorem 19. Let $I = \{1, \ldots, n\}$. If
\[ (X_1, \ldots, X_n) \text{ independent} \;\wedge\; \forall\, i \in I: \; X_i \geq 0 \; (X_i \text{ integrable}), \]
then ($\prod_{i=1}^n X_i$ is integrable and)
\[ \mathrm{E}\Big( \prod_{i=1}^n X_i \Big) = \prod_{i=1}^n \mathrm{E}(X_i). \]

Proof. Use Fubini's Theorem and Theorem 18 to obtain
\[ \mathrm{E}\Big( \prod_{i=1}^n |X_i| \Big) = \int_{\mathbb{R}^n} |x_1 \cdots x_n| \, P_{(X_1, \ldots, X_n)}(d(x_1, \ldots, x_n)) = \int_{\mathbb{R}^n} |x_1 \cdots x_n| \, (P_{X_1} \otimes \cdots \otimes P_{X_n})(d(x_1, \ldots, x_n)) = \prod_{i=1}^n \int_{\mathbb{R}} |x_i| \, P_{X_i}(dx_i) = \prod_{i=1}^n \mathrm{E}(|X_i|). \]
Drop the absolute values if the random variables are integrable.

Uncorrelated Random Variables

Definition 20. $X_1, X_2 \in L^2$ are uncorrelated if $\mathrm{E}(X_1 X_2) = \mathrm{E}(X_1) \cdot \mathrm{E}(X_2)$.

Theorem 21 (Bienaymé). Let $X_1, \ldots, X_n \in L^2$ be pairwise uncorrelated. Then
\[ \mathrm{Var}\Big( \sum_{i=1}^n X_i \Big) = \sum_{i=1}^n \mathrm{Var}(X_i). \]

Proof. We have
\[ \mathrm{Var}\Big( \sum_{i=1}^n X_i \Big) = \mathrm{E}\Big( \sum_{i=1}^n (X_i - \mathrm{E}(X_i)) \Big)^2 = \sum_{i=1}^n \mathrm{E}(X_i - \mathrm{E}(X_i))^2 + \sum_{\substack{i,j=1 \\ i \neq j}}^n \mathrm{E}\big( (X_i - \mathrm{E}(X_i))(X_j - \mathrm{E}(X_j)) \big). \]
Moreover,
\[ \mathrm{E}\big( (X_i - \mathrm{E}(X_i))(X_j - \mathrm{E}(X_j)) \big) = \mathrm{E}(X_i X_j) - \mathrm{E}(X_i) \cdot \mathrm{E}(X_j). \]
(The latter quantity is called the covariance of $X_i$ and $X_j$.)

Convolutions

Definition 22. The convolution product of probability measures $P_1, \ldots, P_n$ on $\mathfrak{B}$ is defined by
\[ P_1 * \cdots * P_n = s(P_1 \otimes \cdots \otimes P_n), \]
where $s(x_1, \ldots, x_n) = x_1 + \cdots + x_n$.

Theorem 23. Let $(X_1, \ldots, X_n)$ be independent and $S = \sum_{i=1}^n X_i$. Then
\[ P_S = P_{X_1} * \cdots * P_{X_n}. \]

Proof. Put $X = (X_1, \ldots, X_n)$. Since $S = s \circ (X_1, \ldots, X_n)$, we get
\[ P_S = s(P_X) = s(P_{X_1} \otimes \cdots \otimes P_{X_n}). \]

Remark 24. The class of probability measures on $\mathfrak{B}$ forms an Abelian semi-group w.r.t. $*$, and $P * \varepsilon_0 = P$.

Theorem 25. For all probability measures $P_1, P_2$ on $\mathfrak{B}$ and every $P_1 * P_2$-integrable function $f$,
\[ \int_{\mathbb{R}} f \, d(P_1 * P_2) = \int_{\mathbb{R}} \int_{\mathbb{R}} f(x + y) \, P_1(dx) \, P_2(dy). \]
If $P_1 = h_1 \cdot \lambda_1$ then $P_1 * P_2 = h \cdot \lambda_1$ with
\[ h(x) = \int_{\mathbb{R}} h_1(x - y) \, P_2(dy). \]
If, additionally, $P_2 = h_2 \cdot \lambda_1$, then
\[ h(x) = \int_{\mathbb{R}} h_1(x - y) \, h_2(y) \, \lambda_1(dy). \]

Proof. Use Fubini's Theorem and the transformation theorem. See Billingsley (1979, p. 230).

Example 26.
(i) Put $N(\mu, 0) = \varepsilon_\mu$. By Theorems 21 and 25,
\[ N(\mu_1, \sigma_1^2) * N(\mu_2, \sigma_2^2) = N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2) \]
for $\mu_i \in \mathbb{R}$ and $\sigma_i \geq 0$.
(ii) Consider $n$ independent Bernoulli trials, i.e., $(X_1, \ldots, X_n)$ independent with
\[ P_{X_i} = p \cdot \varepsilon_1 + (1 - p) \cdot \varepsilon_0 \]
for every $i \in \{1, \ldots, n\}$, where $p \in [0, 1]$. Inductively, we get
\[ \sum_{i=1}^k X_i \sim B(k, p) \]
for $k \in \{1, \ldots, n\}$, see also Übung 4.4. Thus, for any $n, m \in \mathbb{N}$,
\[ B(n, p) * B(m, p) = B(n + m, p). \]
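For distributions on the integers the convolution product reduces to a discrete convolution of the weight sequences, so Example 26.(ii) can be checked numerically (an added sketch, assuming NumPy; binomial_pmf is again our helper):

```python
import numpy as np
from math import comb

def binomial_pmf(n, p):
    # Weights of B(n, p) on {0, ..., n}.
    return np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

p = 0.3
lhs = np.convolve(binomial_pmf(4, p), binomial_pmf(6, p))  # B(4,p) * B(6,p)
rhs = binomial_pmf(10, p)                                  # B(10,p)
print(np.max(np.abs(lhs - rhs)))  # ~1e-17: the two measures coincide
```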

Chapter III: Limit Theorems

Given: a sequence $(X_n)_{n \in \mathbb{N}}$ of random variables on a probability space $(\Omega, \mathcal{A}, P)$ and weights $0 < a_n$. Put
\[ S_n = \sum_{i=1}^n X_i, \qquad n \in \mathbb{N}_0. \]
For instance, $S_n$ might be the cumulative gain after $n$ trials or (one of the coordinates of) the position of a particle after $n$ collisions.

Question: convergence of $S_n / a_n$ for suitable weights $a_n$, in a suitable sense? Particular case: $a_n = n$.

Definition 1. $(X_n)_{n \in \mathbb{N}}$ is independent and identically distributed (i.i.d.) if $(X_n)_{n \in \mathbb{N}}$ is independent and
\[ \forall\, n \in \mathbb{N}: \; P_{X_n} = P_{X_1}. \]
In this case $(S_n)_{n \in \mathbb{N}_0}$ is called the associated random walk.

1 Zero-One Laws

Kolmogorov's Zero-One Law

Definition 1. For $\sigma$-algebras $\mathcal{A}_n \subseteq \mathcal{A}$, $n \in \mathbb{N}$, the corresponding tail $\sigma$-algebra is
\[ \mathcal{A}_\infty = \bigcap_{n \in \mathbb{N}} \sigma\Big( \bigcup_{m \geq n} \mathcal{A}_m \Big), \]
and $A \in \mathcal{A}_\infty$ is called a tail (terminal) event.

Example 2. Let $\mathcal{A}_n = \sigma(X_n)$. Put $\mathfrak{C} = \bigotimes_{i \in \mathbb{N}} \mathfrak{B}$. Then
\[ \mathcal{A}_\infty = \bigcap_{n \in \mathbb{N}} \sigma(\{X_m : m \geq n\}) \]
and
\[ A \in \mathcal{A}_\infty \;\Leftrightarrow\; \forall\, n \in \mathbb{N} \;\exists\, C \in \mathfrak{C}: \; A = \{(X_n, X_{n+1}, \ldots) \in C\}. \]

For instance,
\[ \{(S_n)_{n \in \mathbb{N}} \text{ converges}\}, \; \{(S_n/a_n)_{n \in \mathbb{N}} \text{ converges}\} \in \mathcal{A}_\infty, \]
and the function $\liminf_{n \to \infty} S_n/a_n$ is $\mathcal{A}_\infty$-$\overline{\mathfrak{B}}$-measurable. However, $S_n$ as well as $\liminf_{n \to \infty} S_n$ are not $\mathcal{A}_\infty$-$\overline{\mathfrak{B}}$-measurable, in general. Analogously for the $\limsup$'s.

Theorem 3 (Kolmogorov's Zero-One Law). Let $(\mathcal{A}_n)_{n \in \mathbb{N}}$ be an independent sequence of $\sigma$-algebras $\mathcal{A}_n \subseteq \mathcal{A}$. Then
\[ \forall\, A \in \mathcal{A}_\infty: \; P(A) \in \{0, 1\}. \]

Proof. We show that $\mathcal{A}_\infty$ and $\mathcal{A}_\infty$ are independent (terminology), which implies $P(A) = P(A) \cdot P(A)$ for every $A \in \mathcal{A}_\infty$. Put $\tilde{\mathcal{A}}_n = \sigma(\mathcal{A}_1 \cup \cdots \cup \mathcal{A}_n)$. Note that $\mathcal{A}_\infty \subseteq \sigma(\mathcal{A}_{n+1} \cup \cdots)$. By Corollary II.6.6 and Remark II.6.3.(i),
\[ \tilde{\mathcal{A}}_n \text{ and } \mathcal{A}_\infty \text{ are independent}, \]
and therefore $\bigcup_{n \in \mathbb{N}} \tilde{\mathcal{A}}_n$ and $\mathcal{A}_\infty$ are independent, too. Thus, by Theorem II.6.5,
\[ \sigma\Big( \bigcup_{n \in \mathbb{N}} \tilde{\mathcal{A}}_n \Big) \text{ and } \mathcal{A}_\infty \text{ are independent}. \]
Finally,
\[ \mathcal{A}_\infty \subseteq \sigma\Big( \bigcup_{n \in \mathbb{N}} \mathcal{A}_n \Big) = \sigma\Big( \bigcup_{n \in \mathbb{N}} \tilde{\mathcal{A}}_n \Big). \]

Corollary 4. Let $X \in \overline{Z}(\Omega, \mathcal{A}_\infty)$. Under the assumptions of Theorem 3, $X$ is constant $P$-a.s.

Remark 5. Assume that $(X_n)_{n \in \mathbb{N}}$ is independent. Then
\[ P(\{(S_n)_{n \in \mathbb{N}} \text{ converges}\}), \; P(\{(S_n/a_n)_{n \in \mathbb{N}} \text{ converges}\}) \in \{0, 1\}. \]
In case of convergence $P$-a.s., $\lim_{n \to \infty} S_n/a_n$ is constant $P$-a.s.

The Borel-Cantelli Lemma

Definition 6. Let $A_n \in \mathcal{A}$ for $n \in \mathbb{N}$. Then
\[ \liminf_{n \to \infty} A_n = \bigcup_{n \in \mathbb{N}} \bigcap_{m \geq n} A_m, \qquad \limsup_{n \to \infty} A_n = \bigcap_{n \in \mathbb{N}} \bigcup_{m \geq n} A_m. \]

Remark 7.
(i) $\big( \liminf_{n \to \infty} A_n \big)^c = \limsup_{n \to \infty} A_n^c$.

(ii) $P\big( \liminf_{n \to \infty} A_n \big) \leq \liminf_{n \to \infty} P(A_n) \leq \limsup_{n \to \infty} P(A_n) \leq P\big( \limsup_{n \to \infty} A_n \big)$.
(iii) If $(A_n)_{n \in \mathbb{N}}$ is independent, then $P\big( \limsup_{n \to \infty} A_n \big) \in \{0, 1\}$ (Borel's Zero-One Law).
Proof: Übung 6.3.

Theorem 8 (Borel-Cantelli Lemma). Let $A = \limsup_{n \to \infty} A_n$ with $A_n \in \mathcal{A}$.
(i) If $\sum_{n=1}^\infty P(A_n) < \infty$ then $P(A) = 0$.
(ii) If $\sum_{n=1}^\infty P(A_n) = \infty$ and $(A_n)_{n \in \mathbb{N}}$ is independent, then $P(A) = 1$.

Proof. Ad (i):
\[ P(A) \leq P\Big( \bigcup_{m \geq n} A_m \Big) \leq \sum_{m=n}^\infty P(A_m). \]
By assumption, the right-hand side tends to zero as $n$ tends to $\infty$.

Ad (ii): We have
\[ P(A^c) = P\big( \liminf_{n \to \infty} A_n^c \big) = \lim_{n \to \infty} P\Big( \bigcap_{m \geq n} A_m^c \Big). \]
Use $1 - x \leq \exp(-x)$ for $x \geq 0$ to obtain
\[ P\Big( \bigcap_{m=n}^{l} A_m^c \Big) = \prod_{m=n}^{l} (1 - P(A_m)) \leq \prod_{m=n}^{l} \exp(-P(A_m)) = \exp\Big( -\sum_{m=n}^{l} P(A_m) \Big). \]
By assumption, the right-hand side tends to zero as $l$ tends to $\infty$. Thus $P(A^c) = 0$.

Example 9. A fair coin is tossed an infinite number of times. Determine the probability that 0 occurs twice in a row infinitely often. Model: $(X_n)_{n \in \mathbb{N}}$ i.i.d. with
\[ P(\{X_1 = 0\}) = P(\{X_1 = 1\}) = 1/2. \]
Put
\[ A_n = \{X_n = X_{n+1} = 0\}. \]
Then $(A_{2n})_{n \in \mathbb{N}}$ is independent and $P(A_{2n}) = 1/4$. Thus $P(\limsup_{n \to \infty} A_n) = 1$.

Remark 10. A stronger version of Theorem 8.(ii) requires only pairwise independence, see Bauer (1996, p. 70).

Example 11. Let $(X_n)_{n \in \mathbb{N}}$ be i.i.d. with
\[ P(\{X_1 = 1\}) = p = 1 - P(\{X_1 = -1\}) \]
for some constant $p \in [0, 1]$. Put $\mathcal{A}_n = \sigma(X_n)$ and
\[ A = \limsup_{n \to \infty} \{S_n = 0\}, \]

and note that
\[ A \notin \mathcal{A}_\infty = \bigcap_{n \in \mathbb{N}} \sigma(\{X_m : m \geq n\}), \]
at least in the standard setting from Example II.6.9. Clearly
\[ \tfrac{1}{2}(S_n + n) \sim B(n, p). \]
Use Stirling's Formula
\[ n! \sim \sqrt{2\pi n} \, \Big( \frac{n}{e} \Big)^n \]
to obtain
\[ P(\{S_{2n} = 0\}) = \binom{2n}{n} p^n (1-p)^n \sim \frac{r^n}{\sqrt{\pi n}}, \]
where $r = 4p(1-p) \in [0, 1]$. Suppose that $p \neq 1/2$. Then $r < 1$, and therefore
\[ \sum_{n=0}^\infty P(\{S_n = 0\}) = \sum_{n=0}^\infty P(\{S_{2n} = 0\}) < \infty. \]
The Borel-Cantelli Lemma implies $P(A) = 0$. Suppose that $p = 1/2$. Then
\[ \sum_{n=0}^\infty P(\{S_n = 0\}) = \sum_{n=0}^\infty P(\{S_{2n} = 0\}) = \infty, \]
but $(\{S_n = 0\})_{n \in \mathbb{N}}$ is not independent. Using the Central Limit Theorem one can show that $P(A) = 1$.

2 The Strong Law of Large Numbers

Throughout this section: $(X_n)_{n \in \mathbb{N}}$ independent.

The $L^2$ Case

Put
\[ C = \{(S_n)_{n \in \mathbb{N}} \text{ converges in } \mathbb{R}\}. \]
By Remark 1.5, $P(C) \in \{0, 1\}$. First we provide sufficient conditions for $P(C) = 1$ to hold.


STAT 7032 Probability. Wlodek Bryc

STAT 7032 Probability. Wlodek Bryc STAT 7032 Probability Wlodek Bryc Revised for Spring 2019 Printed: January 14, 2019 File: Grad-Prob-2019.TEX Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221 E-mail address:

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

Measure and integration

Measure and integration Chapter 5 Measure and integration In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid.

More information

2 n k In particular, using Stirling formula, we can calculate the asymptotic of obtaining heads exactly half of the time:

2 n k In particular, using Stirling formula, we can calculate the asymptotic of obtaining heads exactly half of the time: Chapter 1 Random Variables 1.1 Elementary Examples We will start with elementary and intuitive examples of probability. The most well-known example is that of a fair coin: if flipped, the probability of

More information

1 Probability space and random variables

1 Probability space and random variables 1 Probability space and random variables As graduate level, we inevitably need to study probability based on measure theory. It obscures some intuitions in probability, but it also supplements our intuition,

More information

Lecture 2: Random Variables and Expectation

Lecture 2: Random Variables and Expectation Econ 514: Probability and Statistics Lecture 2: Random Variables and Expectation Definition of function: Given sets X and Y, a function f with domain X and image Y is a rule that assigns to every x X one

More information

1 Measurable Functions

1 Measurable Functions 36-752 Advanced Probability Overview Spring 2018 2. Measurable Functions, Random Variables, and Integration Instructor: Alessandro Rinaldo Associated reading: Sec 1.5 of Ash and Doléans-Dade; Sec 1.3 and

More information

02. Measure and integral. 1. Borel-measurable functions and pointwise limits

02. Measure and integral. 1. Borel-measurable functions and pointwise limits (October 3, 2017) 02. Measure and integral Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2017-18/02 measure and integral.pdf]

More information

9 Radon-Nikodym theorem and conditioning

9 Radon-Nikodym theorem and conditioning Tel Aviv University, 2015 Functions of real variables 93 9 Radon-Nikodym theorem and conditioning 9a Borel-Kolmogorov paradox............. 93 9b Radon-Nikodym theorem.............. 94 9c Conditioning.....................

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3

Brownian Motion. 1 Definition Brownian Motion Wiener measure... 3 Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................

More information

The Essential Equivalence of Pairwise and Mutual Conditional Independence

The Essential Equivalence of Pairwise and Mutual Conditional Independence The Essential Equivalence of Pairwise and Mutual Conditional Independence Peter J. Hammond and Yeneng Sun Probability Theory and Related Fields, forthcoming Abstract For a large collection of random variables,

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

MATH 6605: SUMMARY LECTURE NOTES

MATH 6605: SUMMARY LECTURE NOTES MATH 6605: SUMMARY LECTURE NOTES These notes summarize the lectures on weak convergence of stochastic processes. If you see any typos, please let me know. 1. Construction of Stochastic rocesses A stochastic

More information

Stochastic Convergence, Delta Method & Moment Estimators

Stochastic Convergence, Delta Method & Moment Estimators Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)

More information

Problem set 1, Real Analysis I, Spring, 2015.

Problem set 1, Real Analysis I, Spring, 2015. Problem set 1, Real Analysis I, Spring, 015. (1) Let f n : D R be a sequence of functions with domain D R n. Recall that f n f uniformly if and only if for all ɛ > 0, there is an N = N(ɛ) so that if n

More information

Basics of Stochastic Analysis

Basics of Stochastic Analysis Basics of Stochastic Analysis c Timo Seppäläinen This version November 16, 214 Department of Mathematics, University of Wisconsin Madison, Madison, Wisconsin 5376 E-mail address: seppalai@math.wisc.edu

More information

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure

More information

1 Presessional Probability

1 Presessional Probability 1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional

More information

Filtrations, Markov Processes and Martingales. Lectures on Lévy Processes and Stochastic Calculus, Braunschweig, Lecture 3: The Lévy-Itô Decomposition

Filtrations, Markov Processes and Martingales. Lectures on Lévy Processes and Stochastic Calculus, Braunschweig, Lecture 3: The Lévy-Itô Decomposition Filtrations, Markov Processes and Martingales Lectures on Lévy Processes and Stochastic Calculus, Braunschweig, Lecture 3: The Lévy-Itô Decomposition David pplebaum Probability and Statistics Department,

More information

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M.

Lecture 10. Theorem 1.1 [Ergodicity and extremality] A probability measure µ on (Ω, F) is ergodic for T if and only if it is an extremal point in M. Lecture 10 1 Ergodic decomposition of invariant measures Let T : (Ω, F) (Ω, F) be measurable, and let M denote the space of T -invariant probability measures on (Ω, F). Then M is a convex set, although

More information

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define 1 Measures 1.1 Jordan content in R N II - REAL ANALYSIS Let I be an interval in R. Then its 1-content is defined as c 1 (I) := b a if I is bounded with endpoints a, b. If I is unbounded, we define c 1

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Part III Advanced Probability

Part III Advanced Probability Part III Advanced Probability Based on lectures by M. Lis Notes taken by Dexter Chua Michaelmas 2017 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after

More information

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS

PROBABILITY: LIMIT THEOREMS II, SPRING HOMEWORK PROBLEMS PROBABILITY: LIMIT THEOREMS II, SPRING 218. HOMEWORK PROBLEMS PROF. YURI BAKHTIN Instructions. You are allowed to work on solutions in groups, but you are required to write up solutions on your own. Please

More information

n [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1)

n [ F (b j ) F (a j ) ], n j=1(a j, b j ] E (4.1) 1.4. CONSTRUCTION OF LEBESGUE-STIELTJES MEASURES In this section we shall put to use the Carathéodory-Hahn theory, in order to construct measures with certain desirable properties first on the real line

More information

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries Chapter 1 Measure Spaces 1.1 Algebras and σ algebras of sets 1.1.1 Notation and preliminaries We shall denote by X a nonempty set, by P(X) the set of all parts (i.e., subsets) of X, and by the empty set.

More information

Admin and Lecture 1: Recap of Measure Theory

Admin and Lecture 1: Recap of Measure Theory Admin and Lecture 1: Recap of Measure Theory David Aldous January 16, 2018 I don t use bcourses: Read web page (search Aldous 205B) Web page rather unorganized some topics done by Nike in 205A will post

More information

Lectures on Integration. William G. Faris

Lectures on Integration. William G. Faris Lectures on Integration William G. Faris March 4, 2001 2 Contents 1 The integral: properties 5 1.1 Measurable functions......................... 5 1.2 Integration.............................. 7 1.3 Convergence

More information

. Find E(V ) and var(v ).

. Find E(V ) and var(v ). Math 6382/6383: Probability Models and Mathematical Statistics Sample Preliminary Exam Questions 1. A person tosses a fair coin until she obtains 2 heads in a row. She then tosses a fair die the same number

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Stat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, Metric Spaces

Stat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, Metric Spaces Stat 8112 Lecture Notes Weak Convergence in Metric Spaces Charles J. Geyer January 23, 2013 1 Metric Spaces Let X be an arbitrary set. A function d : X X R is called a metric if it satisfies the folloing

More information

1 Independent increments

1 Independent increments Tel Aviv University, 2008 Brownian motion 1 1 Independent increments 1a Three convolution semigroups........... 1 1b Independent increments.............. 2 1c Continuous time................... 3 1d Bad

More information

On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables

On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables On the Set of Limit Points of Normed Sums of Geometrically Weighted I.I.D. Bounded Random Variables Deli Li 1, Yongcheng Qi, and Andrew Rosalsky 3 1 Department of Mathematical Sciences, Lakehead University,

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017.

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017. Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 017 Nadia S. Larsen 17 November 017. 1. Construction of the product measure The purpose of these notes is to prove the main

More information

MAT 571 REAL ANALYSIS II LECTURE NOTES. Contents. 2. Product measures Iterated integrals Complete products Differentiation 17

MAT 571 REAL ANALYSIS II LECTURE NOTES. Contents. 2. Product measures Iterated integrals Complete products Differentiation 17 MAT 57 REAL ANALSIS II LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: SPRING 205 Contents. Convergence in measure 2. Product measures 3 3. Iterated integrals 4 4. Complete products 9 5. Signed measures

More information

STOR 635 Notes (S13)

STOR 635 Notes (S13) STOR 635 Notes (S13) Jimmy Jin UNC-Chapel Hill Last updated: 1/14/14 Contents 1 Measure theory and probability basics 2 1.1 Algebras and measure.......................... 2 1.2 Integration................................

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Stochastic Processes (Stochastik II)

Stochastic Processes (Stochastik II) Stochastic Processes (Stochastik II) Lecture Notes Zakhar Kabluchko University of Ulm Institute of Stochastics L A TEX-version: Judith Schmidt Vorwort Dies ist ein unvollständiges Skript zur Vorlesung

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

An almost sure invariance principle for additive functionals of Markov chains

An almost sure invariance principle for additive functionals of Markov chains Statistics and Probability Letters 78 2008 854 860 www.elsevier.com/locate/stapro An almost sure invariance principle for additive functionals of Markov chains F. Rassoul-Agha a, T. Seppäläinen b, a Department

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1

Lecture 9. d N(0, 1). Now we fix n and think of a SRW on [0,1]. We take the k th step at time k n. and our increments are ± 1 Random Walks and Brownian Motion Tel Aviv University Spring 011 Lecture date: May 0, 011 Lecture 9 Instructor: Ron Peled Scribe: Jonathan Hermon In today s lecture we present the Brownian motion (BM).

More information

1.1. MEASURES AND INTEGRALS

1.1. MEASURES AND INTEGRALS CHAPTER 1: MEASURE THEORY In this chapter we define the notion of measure µ on a space, construct integrals on this space, and establish their basic properties under limits. The measure µ(e) will be defined

More information

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales

Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Fundamental Inequalities, Convergence and the Optional Stopping Theorem for Continuous-Time Martingales Prakash Balachandran Department of Mathematics Duke University April 2, 2008 1 Review of Discrete-Time

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

Weak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria

Weak convergence. Amsterdam, 13 November Leiden University. Limit theorems. Shota Gugushvili. Generalities. Criteria Weak Leiden University Amsterdam, 13 November 2013 Outline 1 2 3 4 5 6 7 Definition Definition Let µ, µ 1, µ 2,... be probability measures on (R, B). It is said that µ n converges weakly to µ, and we then

More information

Combinatorics in Banach space theory Lecture 12

Combinatorics in Banach space theory Lecture 12 Combinatorics in Banach space theory Lecture The next lemma considerably strengthens the assertion of Lemma.6(b). Lemma.9. For every Banach space X and any n N, either all the numbers n b n (X), c n (X)

More information

1.1 Review of Probability Theory

1.1 Review of Probability Theory 1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor)

Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Dynkin (λ-) and π-systems; monotone classes of sets, and of functions with some examples of application (mainly of a probabilistic flavor) Matija Vidmar February 7, 2018 1 Dynkin and π-systems Some basic

More information

MATH 418: Lectures on Conditional Expectation

MATH 418: Lectures on Conditional Expectation MATH 418: Lectures on Conditional Expectation Instructor: r. Ed Perkins, Notes taken by Adrian She Conditional expectation is one of the most useful tools of probability. The Radon-Nikodym theorem enables

More information

4 Countability axioms

4 Countability axioms 4 COUNTABILITY AXIOMS 4 Countability axioms Definition 4.1. Let X be a topological space X is said to be first countable if for any x X, there is a countable basis for the neighborhoods of x. X is said

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

1 Introduction. 2 Measure theoretic definitions

1 Introduction. 2 Measure theoretic definitions 1 Introduction These notes aim to recall some basic definitions needed for dealing with random variables. Sections to 5 follow mostly the presentation given in chapter two of [1]. Measure theoretic definitions

More information

Homework Assignment #2 for Prob-Stats, Fall 2018 Due date: Monday, October 22, 2018

Homework Assignment #2 for Prob-Stats, Fall 2018 Due date: Monday, October 22, 2018 Homework Assignment #2 for Prob-Stats, Fall 2018 Due date: Monday, October 22, 2018 Topics: consistent estimators; sub-σ-fields and partial observations; Doob s theorem about sub-σ-field measurability;

More information

REAL ANALYSIS LECTURE NOTES: 1.4 OUTER MEASURE

REAL ANALYSIS LECTURE NOTES: 1.4 OUTER MEASURE REAL ANALYSIS LECTURE NOTES: 1.4 OUTER MEASURE CHRISTOPHER HEIL 1.4.1 Introduction We will expand on Section 1.4 of Folland s text, which covers abstract outer measures also called exterior measures).

More information

36-752: Lecture 1. We will use measures to say how large sets are. First, we have to decide which sets we will measure.

36-752: Lecture 1. We will use measures to say how large sets are. First, we have to decide which sets we will measure. 0 0 0 -: Lecture How is this course different from your earlier probability courses? There are some problems that simply can t be handled with finite-dimensional sample spaces and random variables that

More information

Probability and Measure

Probability and Measure Chapter 4 Probability and Measure 4.1 Introduction In this chapter we will examine probability theory from the measure theoretic perspective. The realisation that measure theory is the foundation of probability

More information

18.175: Lecture 2 Extension theorems, random variables, distributions

18.175: Lecture 2 Extension theorems, random variables, distributions 18.175: Lecture 2 Extension theorems, random variables, distributions Scott Sheffield MIT Outline Extension theorems Characterizing measures on R d Random variables Outline Extension theorems Characterizing

More information

Lecture 5: Expectation

Lecture 5: Expectation Lecture 5: Expectation 1. Expectations for random variables 1.1 Expectations for simple random variables 1.2 Expectations for bounded random variables 1.3 Expectations for general random variables 1.4

More information

1 Sequences of events and their limits

1 Sequences of events and their limits O.H. Probability II (MATH 2647 M15 1 Sequences of events and their limits 1.1 Monotone sequences of events Sequences of events arise naturally when a probabilistic experiment is repeated many times. For

More information

Invariance Principle for Variable Speed Random Walks on Trees

Invariance Principle for Variable Speed Random Walks on Trees Invariance Principle for Variable Speed Random Walks on Trees Wolfgang Löhr, University of Duisburg-Essen joint work with Siva Athreya and Anita Winter Stochastic Analysis and Applications Thoku University,

More information

Gaussian Random Fields

Gaussian Random Fields Gaussian Random Fields Mini-Course by Prof. Voijkan Jaksic Vincent Larochelle, Alexandre Tomberg May 9, 009 Review Defnition.. Let, F, P ) be a probability space. Random variables {X,..., X n } are called

More information

Introduction to Ergodic Theory

Introduction to Ergodic Theory Introduction to Ergodic Theory Marius Lemm May 20, 2010 Contents 1 Ergodic stochastic processes 2 1.1 Canonical probability space.......................... 2 1.2 Shift operators.................................

More information