Probability Theory. Richard F. Bass


© Copyright 2014 Richard F. Bass

Contents

1 Basic notions
  1.1 A few definitions from measure theory
  1.2 Definitions
  1.3 Some basic facts
  1.4 Independence
2 Convergence of random variables
  2.1 Types of convergence
  2.2 Weak law of large numbers
  2.3 The strong law of large numbers
  2.4 Techniques related to a.s. convergence
  2.5 Uniform integrability
3 Conditional expectations
4 Martingales
  4.1 Definitions
  4.2 Stopping times
  4.3 Optional stopping
  4.4 Doob's inequalities
  4.5 Martingale convergence theorems
  4.6 Applications of martingales
5 Weak convergence
6 Characteristic functions
  6.1 Inversion formula
  6.2 Continuity theorem
7 Central limit theorem
8 Gaussian sequences
9 Kolmogorov extension theorem
10 Brownian motion
  10.1 Definition and construction
  10.2 Nowhere differentiability
11 Markov chains
  11.1 Framework for Markov chains
  11.2 Examples
  11.3 Markov properties
  11.4 Recurrence and transience
  11.5 Stationary measures
  11.6 Convergence

Chapter 1. Basic notions

1.1 A few definitions from measure theory

Given a set $X$, a $\sigma$-algebra on $X$ is a collection $\mathcal{A}$ of subsets of $X$ such that
(1) $\emptyset \in \mathcal{A}$;
(2) if $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$, where $A^c = X \setminus A$;
(3) if $A_1, A_2, \ldots \in \mathcal{A}$, then $\cup_{i=1}^\infty A_i$ and $\cap_{i=1}^\infty A_i$ are both in $\mathcal{A}$.

A measure on a pair $(X, \mathcal{A})$ is a function $\mu : \mathcal{A} \to [0, \infty]$ such that
(1) $\mu(\emptyset) = 0$;
(2) $\mu(\cup_{i=1}^\infty A_i) = \sum_{i=1}^\infty \mu(A_i)$ whenever the $A_i$ are in $\mathcal{A}$ and are pairwise disjoint.

A function $f : X \to \mathbb{R}$ is measurable if the set $\{x : f(x) > a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$. A property holds almost everywhere, written a.e., if the set where it fails has measure 0. For example, $f = g$ a.e. if $\mu(\{x : f(x) \neq g(x)\}) = 0$.

The characteristic function $\chi_A$ of a set $A \in \mathcal{A}$ is the function that is 1 when $x \in A$ and is zero otherwise. A function of the form $\sum_{i=1}^n a_i \chi_{A_i}$ is called a simple function. If $f$ is simple and of the above form, we define
$$\int f\, d\mu = \sum_{i=1}^n a_i\, \mu(A_i).$$

If $f$ is non-negative and measurable, we define
$$\int f\, d\mu = \sup\Big\{ \int s\, d\mu : 0 \le s \le f,\ s \text{ simple} \Big\}.$$
Provided $\int |f|\, d\mu < \infty$, we define
$$\int f\, d\mu = \int f^+\, d\mu - \int f^-\, d\mu,$$
where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$.

1.2 Definitions

A probability or probability measure is a measure whose total mass is one. Because the origins of probability are in statistics rather than analysis, some of the terminology is different. For example, instead of denoting a measure space by $(X, \mathcal{A}, \mu)$, probabilists use $(\Omega, \mathcal{F}, \mathbb{P})$. So here $\Omega$ is a set, $\mathcal{F}$ is called a $\sigma$-field (which is the same thing as a $\sigma$-algebra), and $\mathbb{P}$ is a measure with $\mathbb{P}(\Omega) = 1$. Elements of $\mathcal{F}$ are called events. Elements of $\Omega$ are denoted $\omega$.

Instead of saying a property occurs almost everywhere, we talk about properties occurring almost surely, written a.s. Real-valued measurable functions from $\Omega$ to $\mathbb{R}$ are called random variables and are usually denoted by $X$ or $Y$ or other capital letters. We often abbreviate "random variable" by r.v.

We let $A^c = \{\omega \in \Omega : \omega \notin A\}$ (called the complement of $A$) and $B - A = B \cap A^c$.

Integration (in the sense of Lebesgue) is called expectation or expected value, and we write $\mathbb{E}X$ for $\int X\, d\mathbb{P}$. The notation $\mathbb{E}[X; A]$ is often used for $\int_A X\, d\mathbb{P}$.

The random variable $1_A$ is the function that is one if $\omega \in A$ and zero otherwise. It is called the indicator of $A$ (the name "characteristic function" in probability refers to the Fourier transform). Events such as $\{\omega : X(\omega) > a\}$ are almost always abbreviated by $(X > a)$.

Given a random variable $X$, we can define a probability on $\mathbb{R}$ by
$$\mathbb{P}_X(A) = \mathbb{P}(X \in A), \qquad A \subset \mathbb{R}. \tag{1.1}$$

The probability $\mathbb{P}_X$ is called the law of $X$ or the distribution of $X$. We define $F_X : \mathbb{R} \to [0,1]$ by
$$F_X(x) = \mathbb{P}_X((-\infty, x]) = \mathbb{P}(X \le x). \tag{1.2}$$
The function $F_X$ is called the distribution function of $X$.

As an example, let $\Omega = \{H, T\}$, $\mathcal{F}$ all subsets of $\Omega$ (there are 4 of them), and $\mathbb{P}(H) = \mathbb{P}(T) = \frac12$. Let $X(H) = 1$ and $X(T) = 0$. Then $\mathbb{P}_X = \frac12 \delta_0 + \frac12 \delta_1$, where $\delta_x$ is point mass at $x$, that is, $\delta_x(A) = 1$ if $x \in A$ and 0 otherwise. Here $F_X(a) = 0$ if $a < 0$, $\frac12$ if $0 \le a < 1$, and 1 if $a \ge 1$.

Proposition 1.1 The distribution function $F_X$ of a random variable $X$ satisfies:
(a) $F_X$ is nondecreasing;
(b) $F_X$ is right continuous with left limits;
(c) $\lim_{x \to \infty} F_X(x) = 1$ and $\lim_{x \to -\infty} F_X(x) = 0$.

Proof. We prove the first part of (b) and leave the others to the reader. If $x_n \downarrow x$, then $(X \le x_n) \downarrow (X \le x)$, and so $\mathbb{P}(X \le x_n) \downarrow \mathbb{P}(X \le x)$ since $\mathbb{P}$ is a measure. Note that if $x_n \uparrow x$, then $(X \le x_n) \uparrow (X < x)$, and so $F_X(x_n) \uparrow \mathbb{P}(X < x)$.

Any function $F : \mathbb{R} \to [0,1]$ satisfying (a)-(c) of Proposition 1.1 is called a distribution function, whether or not it comes from a random variable.

Proposition 1.2 Suppose $F$ is a distribution function. There exists a random variable $X$ such that $F = F_X$.

Proof. Let $\Omega = [0,1]$, $\mathcal{F}$ the Borel $\sigma$-field, and $\mathbb{P}$ Lebesgue measure. Define $X(\omega) = \sup\{x : F(x) < \omega\}$. Here the Borel $\sigma$-field is the smallest $\sigma$-field containing all the open sets. We check that $F_X = F$.

Suppose $X(\omega) \le y$. Then $F(z) \ge \omega$ if $z > y$. By the right continuity of $F$ we have $F(y) \ge \omega$. Hence $(X(\omega) \le y) \subset (\omega \le F(y))$.
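The construction in the proof of Proposition 1.2 is what is usually called inverse transform sampling. A minimal Python sketch, assuming we compute $X(\omega) = \sup\{x : F(x) < \omega\}$ numerically by bisection (the helper name, the search interval, and the choice of the Exp(1) distribution function are illustrative assumptions, not from the text):

```python
import math
import random

def quantile(F, omega, lo=0.0, hi=100.0, tol=1e-9):
    # X(omega) = sup{x : F(x) < omega}, located by bisection; assumes the
    # supremum lies in [lo, hi] and F is a nondecreasing distribution function
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < omega:
            lo = mid
        else:
            hi = mid
    return lo

# Exp(1) distribution function; here sup{x : F(x) < omega} = -log(1 - omega)
F = lambda x: 1.0 - math.exp(-x) if x > 0 else 0.0

random.seed(0)
# With omega uniform on [0, 1] (P = Lebesgue measure on Omega = [0, 1]),
# X(omega) has law F, exactly as in the proof
samples = [quantile(F, random.random()) for _ in range(5000)]
```

For this continuous strictly increasing $F$ the supremum has the closed form $-\log(1-\omega)$, so the bisection can be checked against it; the care mentioned after the proof matters only when $F$ has jumps or flat pieces.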

Suppose $\omega \le F(y)$. If $X(\omega) > y$, then by the definition of $X$ there exists $z > y$ such that $F(z) < \omega$. But then $\omega \le F(y) \le F(z)$, a contradiction. Therefore $(\omega \le F(y)) \subset (X(\omega) \le y)$. We then have
$$\mathbb{P}(X(\omega) \le y) = \mathbb{P}(\omega \le F(y)) = F(y).$$

In the above proof, essentially $X = F^{-1}$. However $F$ may have jumps or be constant over some intervals, so some care is needed in defining $X$.

Certain distributions or laws are very common. We list some of them.

(a) Bernoulli. A random variable is Bernoulli if $\mathbb{P}(X = 1) = p$ and $\mathbb{P}(X = 0) = 1 - p$ for some $p \in [0,1]$.

(b) Binomial. This is defined by $\mathbb{P}(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, where $n$ is a positive integer, $0 \le k \le n$, and $p \in [0,1]$.

(c) Geometric. For $p \in (0,1)$ we set $\mathbb{P}(X = k) = (1-p)p^k$. Here $k$ is a nonnegative integer.

(d) Poisson. For $\lambda > 0$ we set $\mathbb{P}(X = k) = e^{-\lambda} \lambda^k / k!$. Again $k$ is a nonnegative integer.

(e) Uniform. For some positive integer $n$, set $\mathbb{P}(X = k) = 1/n$ for $1 \le k \le n$.

Suppose $F$ is absolutely continuous. This is not the definition, but being absolutely continuous is equivalent to the existence of a function $f$ such that
$$F(x) = \int_{-\infty}^x f(y)\, dy$$
for all $x$. We call $f = F'$ the density of $F$. Some examples of distributions characterized by densities are the following.

(f) Uniform on $[a,b]$. Define $f(x) = (b-a)^{-1} 1_{[a,b]}(x)$. This means that if $X$ has a uniform distribution, then
$$\mathbb{P}(X \in A) = \int_A \frac{1}{b-a}\, 1_{[a,b]}(x)\, dx.$$
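The discrete laws (a)-(e) can be written down directly. A short Python sketch (the function names are mine) using the text's conventions, in particular the geometric pmf $(1-p)p^k$ on $k = 0, 1, 2, \ldots$:

```python
import math

def bernoulli_pmf(k, p):
    # P(X = 1) = p, P(X = 0) = 1 - p
    return p if k == 1 else 1.0 - p

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) p^k (1 - p)^(n - k), 0 <= k <= n
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def geometric_pmf(k, p):
    # P(X = k) = (1 - p) p^k for k = 0, 1, 2, ...  (the text's convention)
    return (1.0 - p) * p**k

def poisson_pmf(k, lam):
    # P(X = k) = e^(-lam) lam^k / k!
    return math.exp(-lam) * lam**k / math.factorial(k)
```

Each of these sums to 1 over its support, which is a quick sanity check on the formulas.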

(g) Exponential. For $x > 0$ let $f(x) = \lambda e^{-\lambda x}$.

(h) Standard normal. Define $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. So
$$\mathbb{P}(X \in A) = \frac{1}{\sqrt{2\pi}} \int_A e^{-x^2/2}\, dx.$$

(i) $\mathcal{N}(\mu, \sigma^2)$. We shall see later that a standard normal has mean zero and variance one. If $Z$ is a standard normal, then a $\mathcal{N}(\mu, \sigma^2)$ random variable has the same distribution as $\mu + \sigma Z$. It is an exercise in calculus to check that such a random variable has density
$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}. \tag{1.3}$$

(j) Cauchy. Here
$$f(x) = \frac{1}{\pi}\cdot\frac{1}{1 + x^2}.$$

1.3 Some basic facts

We can use the law of a random variable to calculate expectations.

Proposition 1.3 If $g$ is nonnegative or if $\mathbb{E}|g(X)| < \infty$, then
$$\mathbb{E}\, g(X) = \int g(x)\, \mathbb{P}_X(dx).$$

Proof. If $g$ is the indicator of an event $A$, this is just the definition of $\mathbb{P}_X$. By linearity, the result holds for simple functions. By the monotone convergence theorem, the result holds for nonnegative functions, and by writing $g = g^+ - g^-$, it holds for $g$ such that $\mathbb{E}|g(X)| < \infty$.

If $F_X$ has a density $f$, then $\mathbb{P}_X(dx) = f(x)\, dx$. So, for example, $\mathbb{E}X = \int x f(x)\, dx$ and $\mathbb{E}X^2 = \int x^2 f(x)\, dx$. (We need $\mathbb{E}|X|$ finite to justify this if $X$ is not necessarily nonnegative.)

We define the mean of a random variable to be its expectation, and the variance of a random variable is defined by
$$\operatorname{Var} X = \mathbb{E}(X - \mathbb{E}X)^2.$$
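Proposition 1.3 reduces expectations to integrals against the law. For a law with density $f$, a numerical sketch of $\mathbb{E}\,g(X) = \int g(x) f(x)\,dx$ (the midpoint rule and the truncation of the real line to $[-12, 12]$ are my assumptions for the standard normal, whose tails beyond that range are negligible):

```python
import math

def expect(g, f, lo=-12.0, hi=12.0, steps=200000):
    # E g(X) = integral of g(x) f(x) dx when P_X has density f
    # (midpoint rule on [lo, hi]; assumes f is negligible outside)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += g(x) * f(x)
    return total * h

# standard normal density, item (h) in the list above
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
```

With this one can check numerically the claim, proved later in the notes, that a standard normal has mean zero and variance one.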

For example, it is routine to see that the mean of a standard normal is zero and its variance is one. Note
$$\operatorname{Var} X = \mathbb{E}\big(X^2 - 2X\,\mathbb{E}X + (\mathbb{E}X)^2\big) = \mathbb{E}X^2 - (\mathbb{E}X)^2.$$

Another equality that is useful is the following.

Proposition 1.4 If $X \ge 0$ a.s. and $p > 0$, then
$$\mathbb{E}X^p = \int_0^\infty p\lambda^{p-1}\, \mathbb{P}(X > \lambda)\, d\lambda.$$
The proof will show that this equality is also valid if we replace $\mathbb{P}(X > \lambda)$ by $\mathbb{P}(X \ge \lambda)$.

Proof. Use Fubini's theorem and write
$$\int_0^\infty p\lambda^{p-1}\, \mathbb{P}(X > \lambda)\, d\lambda = \mathbb{E} \int_0^\infty p\lambda^{p-1} 1_{(\lambda,\infty)}(X)\, d\lambda = \mathbb{E} \int_0^X p\lambda^{p-1}\, d\lambda = \mathbb{E}X^p.$$

We need two elementary inequalities.

Proposition 1.5 (Chebyshev's inequality) If $X \ge 0$, then
$$\mathbb{P}(X \ge a) \le \frac{\mathbb{E}X}{a}.$$

Proof. We write
$$\mathbb{P}(X \ge a) = \mathbb{E}\big[1_{[a,\infty)}(X)\big] \le \mathbb{E}\Big[\frac{X}{a}\, 1_{[a,\infty)}(X)\Big] \le \mathbb{E}[X/a],$$
since $X/a$ is bigger than 1 when $X \in [a, \infty)$.
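The tail formula of Proposition 1.4 can be checked numerically for a concrete law. A sketch for $X \sim$ Exp(1), where $\mathbb{P}(X > \lambda) = e^{-\lambda}$, $\mathbb{E}X = 1$, and $\mathbb{E}X^2 = 2$ (the quadrature scheme and the cutoff at $\lambda = 60$ are my assumptions; the tail beyond the cutoff is negligible):

```python
import math

def tail_moment(p, survival, upper=60.0, steps=120000):
    # E X^p = integral over (0, inf) of p * lam^(p-1) * P(X > lam) d lam
    # (Proposition 1.4), approximated by the midpoint rule on [0, upper]
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        total += p * lam ** (p - 1) * survival(lam) * h
    return total

surv_exp = lambda lam: math.exp(-lam)  # P(X > lam) for X ~ Exp(1)
```

The case $p = 1$ recovers the familiar identity $\mathbb{E}X = \int_0^\infty \mathbb{P}(X > \lambda)\, d\lambda$, which is used again in Lemma 2.3 below.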

If we apply this to $X = (Y - \mathbb{E}Y)^2$, we obtain
$$\mathbb{P}(|Y - \mathbb{E}Y| \ge a) = \mathbb{P}\big((Y - \mathbb{E}Y)^2 \ge a^2\big) \le \operatorname{Var} Y / a^2. \tag{1.4}$$
This special case of Chebyshev's inequality is sometimes itself referred to as Chebyshev's inequality, while Proposition 1.5 is sometimes called the Markov inequality.

The second inequality we need is Jensen's inequality, not to be confused with the Jensen's formula of complex analysis.

Proposition 1.6 Suppose $g$ is convex and $X$ and $g(X)$ are both integrable. Then
$$g(\mathbb{E}X) \le \mathbb{E}\, g(X).$$

Proof. One property of convex functions is that they lie above their tangent lines, and more generally their support lines. So if $x_0 \in \mathbb{R}$, we have
$$g(x) \ge g(x_0) + c(x - x_0)$$
for some constant $c$. Take $x = X(\omega)$ and take expectations to obtain
$$\mathbb{E}\, g(X) \ge g(x_0) + c(\mathbb{E}X - x_0).$$
Now set $x_0$ equal to $\mathbb{E}X$.

If $A_n$ is a sequence of sets, define $(A_n \text{ i.o.})$, read "$A_n$ infinitely often," by
$$(A_n \text{ i.o.}) = \cap_{n=1}^\infty \cup_{i=n}^\infty A_i.$$
This set consists of those $\omega$ that are in infinitely many of the $A_n$.

A simple but very important proposition is the Borel-Cantelli lemma. It has two parts, and we prove the first part here, leaving the second part to the next section.

Proposition 1.7 (Borel-Cantelli lemma) If $\sum_n \mathbb{P}(A_n) < \infty$, then $\mathbb{P}(A_n \text{ i.o.}) = 0$.
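The first Borel-Cantelli lemma can be watched in action in a simulation. A sketch, assuming independent events $A_n$ with $\mathbb{P}(A_n) = 1/n^2$ (my choice of a summable sequence; the seed and trial counts are arbitrary):

```python
import random

def last_event_index(trials=4000, N=200, seed=7):
    # Independent events A_n with P(A_n) = 1/n^2.  The probabilities are
    # summable, so by Borel-Cantelli only finitely many A_n occur a.s.;
    # record, per sample point, the largest index n <= N with omega in A_n.
    rng = random.Random(seed)
    last = []
    for _ in range(trials):
        hit = 0
        for n in range(1, N + 1):
            if rng.random() < 1.0 / n**2:
                hit = n
        last.append(hit)
    return last
```

The tail bound in the proof is visible directly: the chance that any $A_n$ with $n > 100$ occurs is at most $\sum_{n > 100} 1/n^2 \approx 0.01$, so the recorded last index is almost always small.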

Proof. We have
$$\mathbb{P}(A_n \text{ i.o.}) = \lim_{n \to \infty} \mathbb{P}(\cup_{i=n}^\infty A_i).$$
However,
$$\mathbb{P}(\cup_{i=n}^\infty A_i) \le \sum_{i=n}^\infty \mathbb{P}(A_i),$$
which tends to zero as $n \to \infty$.

1.4 Independence

Let us say two events $A$ and $B$ are independent if $\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B)$. The events $A_1, \ldots, A_n$ are independent if
$$\mathbb{P}(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}) = \mathbb{P}(A_{i_1})\, \mathbb{P}(A_{i_2}) \cdots \mathbb{P}(A_{i_j})$$
for every subset $\{i_1, \ldots, i_j\}$ of $\{1, 2, \ldots, n\}$.

Proposition 1.8 If $A$ and $B$ are independent, then $A^c$ and $B$ are independent.

Proof. We write
$$\mathbb{P}(A^c \cap B) = \mathbb{P}(B) - \mathbb{P}(A \cap B) = \mathbb{P}(B) - \mathbb{P}(A)\mathbb{P}(B) = \mathbb{P}(B)(1 - \mathbb{P}(A)) = \mathbb{P}(B)\mathbb{P}(A^c).$$

We say two $\sigma$-fields $\mathcal{F}$ and $\mathcal{G}$ are independent if $A$ and $B$ are independent whenever $A \in \mathcal{F}$ and $B \in \mathcal{G}$. The $\sigma$-field generated by a random variable $X$, written $\sigma(X)$, is given by $\{(X \in A) : A \text{ a Borel subset of } \mathbb{R}\}$. Two random variables $X$ and $Y$ are independent if the $\sigma$-field generated by $X$ and the $\sigma$-field generated by $Y$ are independent. We define the independence of $n$ $\sigma$-fields or $n$ random variables in the obvious way.

Proposition 1.8 tells us that $A$ and $B$ are independent if the random variables $1_A$ and $1_B$ are independent, so the definitions above are consistent.

If $f$ and $g$ are Borel functions and $X$ and $Y$ are independent, then $f(X)$ and $g(Y)$ are independent. This follows because the $\sigma$-field generated by $f(X)$ is a sub-$\sigma$-field of the one generated by $X$, and similarly for $g(Y)$.

Let $F_{X,Y}(x, y) = \mathbb{P}(X \le x, Y \le y)$ denote the joint distribution function of $X$ and $Y$. (The comma inside the set means "and.")

Proposition 1.9 $F_{X,Y}(x, y) = F_X(x) F_Y(y)$ if and only if $X$ and $Y$ are independent.

Proof. If $X$ and $Y$ are independent, then $1_{(-\infty,x]}(X)$ and $1_{(-\infty,y]}(Y)$ are independent by the above comments. Using the above comments and the definition of independence, this shows $F_{X,Y}(x,y) = F_X(x) F_Y(y)$.

Conversely, if the equality holds, fix $y$ and let $\mathcal{M}_y$ denote the collection of sets $A$ for which $\mathbb{P}(X \in A, Y \le y) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \le y)$. $\mathcal{M}_y$ contains all sets of the form $(-\infty, x]$. It follows by linearity that $\mathcal{M}_y$ contains all sets of the form $(x, z]$, and then by linearity again, all sets that are finite unions of such half-open, half-closed intervals. Note that the collection of finite unions of such intervals, $\mathcal{A}$, is an algebra generating the Borel $\sigma$-field. It is clear that $\mathcal{M}_y$ is a monotone class, so by the monotone class lemma, $\mathcal{M}_y$ contains the Borel $\sigma$-field.

For a fixed set $A$, let $\mathcal{M}_A$ denote the collection of sets $B$ for which $\mathbb{P}(X \in A, Y \in B) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B)$. Again, $\mathcal{M}_A$ is a monotone class and by the preceding paragraph contains the $\sigma$-field generated by the collection of finite unions of intervals of the form $(x, z]$, hence contains the Borel sets. Therefore $X$ and $Y$ are independent.

The following is known as the multiplication theorem.

Proposition 1.10 If $X$, $Y$, and $XY$ are integrable and $X$ and $Y$ are independent, then
$$\mathbb{E}[XY] = (\mathbb{E}X)(\mathbb{E}Y).$$

Proof. Consider the random variables in $\sigma(X)$ (the $\sigma$-field generated by $X$) and $\sigma(Y)$ for which the multiplication theorem is true. It holds for indicators by the definition of $X$ and $Y$ being independent. It holds for simple random variables, that is, linear combinations of indicators, by linearity of both sides.

It holds for nonnegative random variables by monotone convergence. And it holds for integrable random variables by linearity again.

If $X_1, \ldots, X_n$ are independent, then so are $X_1 - \mathbb{E}X_1, \ldots, X_n - \mathbb{E}X_n$. Assuming everything is integrable,
$$\mathbb{E}\big[(X_1 - \mathbb{E}X_1) + \cdots + (X_n - \mathbb{E}X_n)\big]^2 = \mathbb{E}(X_1 - \mathbb{E}X_1)^2 + \cdots + \mathbb{E}(X_n - \mathbb{E}X_n)^2,$$
using the multiplication theorem to show that the expectations of the cross product terms are zero. We have thus shown
$$\operatorname{Var}(X_1 + \cdots + X_n) = \operatorname{Var} X_1 + \cdots + \operatorname{Var} X_n. \tag{1.5}$$

We finish up this section by proving the second half of the Borel-Cantelli lemma.

Proposition 1.11 Suppose $A_n$ is a sequence of independent events. If we have $\sum_n \mathbb{P}(A_n) = \infty$, then $\mathbb{P}(A_n \text{ i.o.}) = 1$.

Note that here the $A_n$ are independent, while in the first half of the Borel-Cantelli lemma no such assumption was necessary.

Proof. Note
$$\mathbb{P}(\cup_{i=n}^N A_i) = 1 - \mathbb{P}(\cap_{i=n}^N A_i^c) = 1 - \prod_{i=n}^N \mathbb{P}(A_i^c) = 1 - \prod_{i=n}^N \big(1 - \mathbb{P}(A_i)\big).$$
By the mean value theorem, $1 - x \le e^{-x}$, so we have that the right hand side is greater than or equal to $1 - \exp\big(-\sum_{i=n}^N \mathbb{P}(A_i)\big)$. As $N \to \infty$, this tends to 1, so $\mathbb{P}(\cup_{i=n}^\infty A_i) = 1$. This holds for all $n$, which proves the result.
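Equation (1.5), the additivity of variance over independent summands, is easy to check empirically. A sketch with two independent samples (the particular laws, Uniform[0,1] with variance $1/12$ and Exp(2) with variance $1/4$, and the seed are my choices):

```python
import random

def sample_var(xs):
    # biased sample variance, E(X - E X)^2 with empirical expectations
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(11)
n = 100000
U = [random.random() for _ in range(n)]          # Var U_i = 1/12
E = [random.expovariate(2.0) for _ in range(n)]  # Var E_i = 1/4
S = [u + e for u, e in zip(U, E)]                # independent summands
```

By (1.5) the variance of the sum should be close to $1/12 + 1/4 = 1/3$, up to sampling error.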

Chapter 2. Convergence of random variables

2.1 Types of convergence

In this section we consider three ways a sequence of random variables $X_n$ can converge.

We say $X_n$ converges to $X$ almost surely if $\mathbb{P}(X_n \not\to X) = 0$.

$X_n$ converges to $X$ in probability if for each $\varepsilon > 0$,
$$\mathbb{P}(|X_n - X| > \varepsilon) \to 0$$
as $n \to \infty$.

$X_n$ converges to $X$ in $L^p$ if
$$\mathbb{E}\,|X_n - X|^p \to 0$$
as $n \to \infty$.

The following proposition shows some relationships among the types of convergence.

Proposition 2.1 (1) If $X_n \to X$ a.s., then $X_n \to X$ in probability.
(2) If $X_n \to X$ in $L^p$, then $X_n \to X$ in probability.
(3) If $X_n \to X$ in probability, there exists a subsequence $n_j$ such that $X_{n_j}$ converges to $X$ almost surely.

Proof. To prove (1), note $X_n - X$ tends to 0 almost surely, so $1_{(-\varepsilon,\varepsilon)^c}(X_n - X)$ also converges to 0 almost surely. Now apply the dominated convergence theorem.

(2) comes from Chebyshev's inequality:
$$\mathbb{P}(|X_n - X| > \varepsilon) = \mathbb{P}(|X_n - X|^p > \varepsilon^p) \le \mathbb{E}\,|X_n - X|^p / \varepsilon^p \to 0$$
as $n \to \infty$.

To prove (3), choose $n_j$ larger than $n_{j-1}$ such that $\mathbb{P}(|X_n - X| > 2^{-j}) < 2^{-j}$ whenever $n \ge n_j$. So if we let $A_j = (|X_{n_j} - X| > 2^{-j})$, then $\mathbb{P}(A_j) \le 2^{-j}$. By the Borel-Cantelli lemma, $\mathbb{P}(A_j \text{ i.o.}) = 0$. This implies there exists $J$ (depending on $\omega$) such that if $j \ge J$, then $\omega \notin A_j$. Then for $j \ge J$ we have $|X_{n_j}(\omega) - X(\omega)| \le 2^{-j}$. Therefore $X_{n_j}(\omega)$ converges to $X(\omega)$ if $\omega \notin (A_j \text{ i.o.})$.

Let us give some examples to show there need not be any other implications among the three types of convergence.

Let $\Omega = [0,1]$, $\mathcal{F}$ the Borel $\sigma$-field, and $\mathbb{P}$ Lebesgue measure. Let $X_n = e^n 1_{(0,1/n)}$. Then clearly $X_n$ converges to 0 almost surely and in probability, but $\mathbb{E}X_n^p = e^{np}/n \to \infty$ for any $p$.

Let $\Omega$ be the unit circle, and let $\mathbb{P}$ be Lebesgue measure on the circle normalized to have total mass 1. Let $t_n = \sum_{i=1}^n i^{-1}$, and let $A_n = \{e^{i\theta} : t_{n-1} \le \theta < t_n\}$. Let $X_n = 1_{A_n}$. Any point on the unit circle will be in infinitely many $A_n$, so $X_n$ does not converge almost surely to 0. But $\mathbb{P}(A_n) = 1/(2\pi n) \to 0$, so $X_n \to 0$ in probability and in $L^p$.
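The rotating-arc counterexample can be traced concretely. A sketch, with the simplifying assumption (mine, not the text's) that the circle has circumference 1 rather than $2\pi$; the qualitative behavior is identical:

```python
def rotating_indicator(omega, N):
    # X_n(omega) = 1_{A_n}(omega), where A_n is the arc [t_{n-1}, t_n) taken
    # mod 1 on a circle of circumference 1, and t_n = sum_{i<=n} 1/i.
    # Since sum 1/i diverges, the arcs wrap around the circle forever.
    vals, t_prev = [], 0.0
    for n in range(1, N + 1):
        t = t_prev + 1.0 / n
        lo, hi = t_prev % 1.0, t % 1.0
        # the arc may straddle the point 0 on the circle
        inside = (lo <= omega < hi) if lo <= hi else (omega >= lo or omega < hi)
        vals.append(1 if inside else 0)
        t_prev = t
    return vals
```

A fixed point $\omega$ keeps being hit (no a.s. convergence), yet the hits become extremely sparse because the arc lengths $1/n$ shrink (convergence in probability and in $L^p$).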

2.2 Weak law of large numbers

Suppose $X_n$ is a sequence of independent random variables. Suppose also that they all have the same distribution, that is, $F_{X_n} = F_{X_1}$ for all $n$. This situation comes up so often it has a name: independent, identically distributed, which is abbreviated i.i.d.

Define $S_n = \sum_{i=1}^n X_i$. $S_n$ is called a partial sum process. $S_n/n$ is the average value of the first $n$ of the $X_i$'s.

Theorem 2.2 (Weak law of large numbers = WLLN) Suppose the $X_i$ are i.i.d. and $\mathbb{E}X_1^2 < \infty$. Then $S_n/n \to \mathbb{E}X_1$ in probability.

Proof. Since the $X_i$ are i.i.d., they all have the same expectation, and so $\mathbb{E}S_n = n\mathbb{E}X_1$. Hence $\mathbb{E}(S_n/n - \mathbb{E}X_1)^2$ is the variance of $S_n/n$. If $\varepsilon > 0$, by Chebyshev's inequality,
$$\mathbb{P}\big(|S_n/n - \mathbb{E}X_1| > \varepsilon\big) \le \frac{\operatorname{Var}(S_n/n)}{\varepsilon^2} = \frac{\sum_{i=1}^n \operatorname{Var} X_i}{n^2 \varepsilon^2} = \frac{n \operatorname{Var} X_1}{n^2 \varepsilon^2}. \tag{2.1}$$
Since $\mathbb{E}X_1^2 < \infty$, then $\operatorname{Var} X_1 < \infty$, and the result follows by letting $n \to \infty$.

The weak law of large numbers can be improved greatly; it is enough that $x\,\mathbb{P}(|X_1| > x) \to 0$ as $x \to \infty$.

2.3 The strong law of large numbers

Our aim is the strong law of large numbers (SLLN), which says that $S_n/n$ converges to $\mathbb{E}X_1$ almost surely if $\mathbb{E}|X_1| < \infty$.

The strong law of large numbers is the mathematical formulation of the law of averages. If one tosses a fair coin over and over, the proportion of heads should converge to 1/2. Mathematically, if $X_i$ is 1 if the $i$th toss turns up heads and 0 otherwise, then we want $S_n/n$ to converge with probability one to 1/2, where $S_n = X_1 + \cdots + X_n$.
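The Chebyshev bound (2.1) in the proof of the WLLN can be compared with an empirical deviation probability. A sketch for fair-coin (Bernoulli with $p = 1/2$) summands, where $\operatorname{Var} X_1 = 1/4$; the trial counts and seed are arbitrary choices of mine:

```python
import random

def deviation_prob(n, eps=0.05, trials=1000, seed=3):
    # Empirical P(|S_n/n - E X_1| > eps) for Bernoulli(1/2) summands;
    # (2.1) bounds the true probability by Var X_1 / (n eps^2) = 1/(4 n eps^2)
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        s = sum(1 for _ in range(n) if rng.random() < 0.5)
        if abs(s / n - 0.5) > eps:
            bad += 1
    return bad / trials
```

The bound is very loose (the true decay is exponential in $n$), but the qualitative picture, deviations becoming rare as $n$ grows, is already visible at modest $n$.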

Before stating and proving the strong law of large numbers for i.i.d. random variables with finite first moment, we need three facts from calculus.

First, recall that if $b_n \to b$ are real numbers, then
$$\frac{b_1 + \cdots + b_n}{n} \to b. \tag{2.2}$$

Second, there exists a constant $c_1$ such that
$$\sum_{k=n}^\infty \frac{1}{k^2} \le \frac{c_1}{n}. \tag{2.3}$$
(To prove this, recall the proof of the integral test and compare the sum to $\int_{n-1}^\infty x^{-2}\, dx$ when $n \ge 2$.)

Third, suppose $a > 1$ and $k_n$ is the largest integer less than or equal to $a^n$. Note $k_n \ge a^n/2$. Then
$$\sum_{\{n : k_n \ge j\}} \frac{1}{k_n^2} \le \sum_{\{n : a^n \ge j\}} \frac{4}{a^{2n}} \le \frac{4(1 - a^{-2})^{-1}}{j^2} \tag{2.4}$$
by the formula for the sum of a geometric series.

We also need two probability estimates.

Lemma 2.3 If $X \ge 0$ a.s., then $\mathbb{E}X < \infty$ if and only if
$$\sum_{n=1}^\infty \mathbb{P}(X \ge n) < \infty.$$

Proof. Suppose $\mathbb{E}X$ is finite. Since $\mathbb{P}(X \ge x)$ increases as $x$ decreases,
$$\sum_{n=1}^\infty \mathbb{P}(X \ge n) \le \sum_{n=1}^\infty \int_{n-1}^n \mathbb{P}(X \ge x)\, dx = \int_0^\infty \mathbb{P}(X \ge x)\, dx = \mathbb{E}X,$$
which is finite.

If we now suppose $\sum_{n=1}^\infty \mathbb{P}(X \ge n) < \infty$, write
$$\mathbb{E}X = \int_0^\infty \mathbb{P}(X \ge x)\, dx \le \sum_{n=0}^\infty \int_n^{n+1} \mathbb{P}(X \ge x)\, dx \le 1 + \sum_{n=1}^\infty \mathbb{P}(X \ge n) < \infty.$$

Lemma 2.4 Let $\{X_n\}$ be an i.i.d. sequence with each $X_n \ge 0$ a.s. and $\mathbb{E}X_1 < \infty$. Define $Y_n = X_n 1_{(X_n \le n)}$. Then
$$\sum_{k=1}^\infty \frac{\operatorname{Var} Y_k}{k^2} < \infty.$$

Proof. Since $\operatorname{Var} Y_k \le \mathbb{E}Y_k^2$,
$$\sum_{k=1}^\infty \frac{\operatorname{Var} Y_k}{k^2} \le \sum_{k=1}^\infty \frac{\mathbb{E}Y_k^2}{k^2} = \sum_{k=1}^\infty \frac{1}{k^2} \int_0^k 2x\, \mathbb{P}(Y_k > x)\, dx \le \sum_{k=1}^\infty \frac{1}{k^2} \int_0^\infty 1_{(x \le k)}\, 2x\, \mathbb{P}(X_1 > x)\, dx = \int_0^\infty \Big( \sum_{\{k : k \ge x\}} \frac{1}{k^2} \Big)\, 2x\, \mathbb{P}(X_1 > x)\, dx \le \int_0^\infty \frac{c_1}{x}\, 2x\, \mathbb{P}(X_1 > x)\, dx
$$

$$= 2c_1 \int_0^\infty \mathbb{P}(X_1 > x)\, dx = 2c_1\, \mathbb{E}X_1 < \infty.$$
We used the fact that the $X_k$ are i.i.d., the Fubini theorem, and (2.3).

Before proving the strong law, let us first show that we cannot weaken the hypotheses.

Theorem 2.5 Suppose the $X_i$ are i.i.d. and $S_n/n$ converges a.s. Then $\mathbb{E}|X_1| < \infty$.

Proof. Since $S_n/n$ converges, then
$$\frac{X_n}{n} = \frac{S_n}{n} - \frac{S_{n-1}}{n-1} \cdot \frac{n-1}{n} \to 0$$
almost surely. Let $A_n = (|X_n| > n)$. If $\sum_{n=1}^\infty \mathbb{P}(A_n) = \infty$, then by using that the $A_n$'s are independent, the Borel-Cantelli lemma (second part) tells us that $\mathbb{P}(A_n \text{ i.o.}) = 1$, which contradicts $X_n/n \to 0$ a.s. Therefore $\sum_{n=1}^\infty \mathbb{P}(A_n) < \infty$. Since the $X_i$ are identically distributed, we then have
$$\sum_{n=1}^\infty \mathbb{P}(|X_1| > n) < \infty,$$
and $\mathbb{E}|X_1| < \infty$ follows by Lemma 2.3.

We now state and prove the strong law of large numbers.

Theorem 2.6 Suppose $\{X_i\}$ is an i.i.d. sequence with $\mathbb{E}|X_1| < \infty$. Let $S_n = \sum_{i=1}^n X_i$. Then
$$\frac{S_n}{n} \to \mathbb{E}X_1, \quad \text{a.s.}$$

Proof. By writing each $X_n$ as $X_n^+ - X_n^-$ and considering the positive and negative parts separately, it suffices to suppose each $X_n \ge 0$. Define $Y_k = X_k 1_{(X_k \le k)}$ and let $T_n = \sum_{i=1}^n Y_i$. The main part of the argument is to prove that $T_n/n \to \mathbb{E}X_1$ a.s.

Step 1. Let $a > 1$ and let $k_n$ be the largest integer less than or equal to $a^n$. Let $\varepsilon > 0$ and let
$$A_n = \Big( \Big| \frac{T_{k_n} - \mathbb{E}T_{k_n}}{k_n} \Big| > \varepsilon \Big).$$
Then
$$\mathbb{P}(A_n) \le \frac{\operatorname{Var}(T_{k_n}/k_n)}{\varepsilon^2} = \frac{\operatorname{Var} T_{k_n}}{k_n^2 \varepsilon^2} = \frac{\sum_{j=1}^{k_n} \operatorname{Var} Y_j}{k_n^2 \varepsilon^2}.$$
Then
$$\sum_{n=1}^\infty \mathbb{P}(A_n) \le \sum_{n=1}^\infty \sum_{j=1}^{k_n} \frac{\operatorname{Var} Y_j}{k_n^2 \varepsilon^2} = \frac{1}{\varepsilon^2} \sum_{j=1}^\infty \operatorname{Var} Y_j \sum_{\{n : k_n \ge j\}} \frac{1}{k_n^2} \le \frac{4(1 - a^{-2})^{-1}}{\varepsilon^2} \sum_{j=1}^\infty \frac{\operatorname{Var} Y_j}{j^2}$$
by (2.4). By Lemma 2.4, $\sum_{n=1}^\infty \mathbb{P}(A_n) < \infty$, and by the Borel-Cantelli lemma, $\mathbb{P}(A_n \text{ i.o.}) = 0$. This means that for each $\omega$ except for those in a null set, there exists $N(\omega)$ such that if $n \ge N(\omega)$, then $|T_{k_n}(\omega) - \mathbb{E}T_{k_n}|/k_n < \varepsilon$. Applying this with $\varepsilon = 1/m$, $m = 1, 2, \ldots$, we conclude
$$\frac{T_{k_n} - \mathbb{E}T_{k_n}}{k_n} \to 0, \quad \text{a.s.}$$

Step 2. Since
$$\mathbb{E}Y_j = \mathbb{E}[X_j; X_j \le j] = \mathbb{E}[X_1; X_1 \le j] \to \mathbb{E}X_1$$
by the dominated convergence theorem as $j \to \infty$, then by (2.2)
$$\frac{\mathbb{E}T_{k_n}}{k_n} = \frac{\sum_{j=1}^{k_n} \mathbb{E}Y_j}{k_n} \to \mathbb{E}X_1.$$
Therefore $T_{k_n}/k_n \to \mathbb{E}X_1$ a.s.

Step 3. If $k_n \le k \le k_{n+1}$, then
$$\frac{T_k}{k} \le \frac{T_{k_{n+1}}}{k_{n+1}} \cdot \frac{k_{n+1}}{k_n},$$
since we are assuming that the $X_k$ are non-negative. Therefore
$$\limsup_{k \to \infty} \frac{T_k}{k} \le a\, \mathbb{E}X_1, \quad \text{a.s.}$$
Similarly, $\liminf_{k \to \infty} T_k/k \ge (1/a)\,\mathbb{E}X_1$ a.s. Since $a > 1$ is arbitrary,
$$\frac{T_k}{k} \to \mathbb{E}X_1, \quad \text{a.s.}$$

Step 4. Finally,
$$\sum_{n=1}^\infty \mathbb{P}(Y_n \neq X_n) = \sum_{n=1}^\infty \mathbb{P}(X_n > n) = \sum_{n=1}^\infty \mathbb{P}(X_1 > n) < \infty$$
by Lemma 2.3. By the Borel-Cantelli lemma, $\mathbb{P}(Y_n \neq X_n \text{ i.o.}) = 0$. In particular, $Y_n - X_n \to 0$ a.s. By (2.2) we have
$$\frac{T_n - S_n}{n} = \frac{\sum_{i=1}^n (Y_i - X_i)}{n} \to 0, \quad \text{a.s.},$$
hence $S_n/n \to \mathbb{E}X_1$ a.s.
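The conclusion of Theorem 2.6 is easy to observe numerically: along a single long sample path, $S_n/n$ settles near $\mathbb{E}X_1$. A sketch with Exp(1) summands, so $\mathbb{E}X_1 = 1$ (the sample size and the seeds are arbitrary choices of mine; this illustrates the statement, not the truncation argument of the proof):

```python
import random

def running_average(seed, n=100000):
    # S_n/n for i.i.d. Exp(1) random variables; SLLN: S_n/n -> E X_1 = 1 a.s.
    rng = random.Random(seed)
    s = 0.0
    for _ in range(n):
        s += rng.expovariate(1.0)
    return s / n

# several independent sample paths, i.e. several omegas
averages = [running_average(seed) for seed in range(5)]
```

Each path individually stays close to 1 for large $n$, which is the pointwise (almost sure) statement, as opposed to the in-probability statement of the WLLN.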

2.4 Techniques related to a.s. convergence

If $X_1, \ldots, X_n$ are random variables, $\sigma(X_1, \ldots, X_n)$ is defined to be the smallest $\sigma$-field with respect to which $X_1, \ldots, X_n$ are measurable. This definition extends to countably many random variables.

If $X_i$ is a sequence of random variables, the tail $\sigma$-field is defined by $\mathcal{T} = \cap_{n=1}^\infty \sigma(X_n, X_{n+1}, \ldots)$.

An example of an event in the tail $\sigma$-field is $(\limsup_n X_n > a)$. Another example is $(\limsup_n S_n/n > a)$. The reason for this is that if $k < n$ is fixed,
$$\frac{S_n}{n} = \frac{S_k}{n} + \frac{\sum_{i=k+1}^n X_i}{n}.$$
The first term on the right tends to 0 as $n \to \infty$. So $\limsup S_n/n = \limsup (\sum_{i=k+1}^n X_i)/n$, which is in $\sigma(X_{k+1}, X_{k+2}, \ldots)$. This holds for each $k$. The set $(\limsup S_n > a)$ is easily seen not to be in the tail $\sigma$-field.

Theorem 2.7 (Kolmogorov 0-1 law) If the $X_i$ are independent, then the events in the tail $\sigma$-field have probability 0 or 1.

This implies that in the case of i.i.d. random variables, if $S_n/n$ has a limit with positive probability, then it has a limit with probability one, and the limit must be a constant.

Proof. Let $\mathcal{M}$ be the collection of sets in $\sigma(X_{n+1}, \ldots)$ that are independent of every set in $\sigma(X_1, \ldots, X_n)$. $\mathcal{M}$ is easily seen to be a monotone class and it contains $\sigma(X_{n+1}, \ldots, X_N)$ for every $N > n$. Therefore $\mathcal{M}$ must be equal to $\sigma(X_{n+1}, \ldots)$.

If $A$ is in the tail $\sigma$-field, then $A$ is independent of $\sigma(X_1, \ldots, X_n)$ for each $n$. The class $\mathcal{M}_A$ of sets independent of $A$ is a monotone class, hence is a $\sigma$-field containing $\sigma(X_1, \ldots, X_n)$ for each $n$. Therefore $\mathcal{M}_A$ contains $\sigma(X_1, \ldots)$.

We thus have that the event $A$ is independent of itself, or
$$\mathbb{P}(A) = \mathbb{P}(A \cap A) = \mathbb{P}(A)\mathbb{P}(A) = \mathbb{P}(A)^2.$$
This implies $\mathbb{P}(A)$ is zero or one.

Next is Kolmogorov's inequality, a special case of Doob's inequality.

Proposition 2.8 Suppose the $X_i$ are independent and $\mathbb{E}X_i = 0$ for each $i$. Then
$$\mathbb{P}\big( \max_{1 \le i \le n} |S_i| \ge \lambda \big) \le \frac{\mathbb{E}S_n^2}{\lambda^2}.$$

Proof. Let $A_k = (|S_k| \ge \lambda, |S_1| < \lambda, \ldots, |S_{k-1}| < \lambda)$. Note the $A_k$ are disjoint and that $A_k \in \sigma(X_1, \ldots, X_k)$. Therefore $A_k$ is independent of $S_n - S_k$. Then
$$\mathbb{E}S_n^2 \ge \sum_{k=1}^n \mathbb{E}[S_n^2; A_k] = \sum_{k=1}^n \mathbb{E}\big[(S_k^2 + 2S_k(S_n - S_k) + (S_n - S_k)^2); A_k\big] \ge \sum_{k=1}^n \mathbb{E}[S_k^2; A_k] + 2\sum_{k=1}^n \mathbb{E}[S_k(S_n - S_k); A_k].$$
Using the independence,
$$\mathbb{E}[S_k(S_n - S_k) 1_{A_k}] = \mathbb{E}[S_k 1_{A_k}]\, \mathbb{E}[S_n - S_k] = 0.$$
Therefore
$$\mathbb{E}S_n^2 \ge \sum_{k=1}^n \mathbb{E}[S_k^2; A_k] \ge \lambda^2 \sum_{k=1}^n \mathbb{P}(A_k) = \lambda^2\, \mathbb{P}\big( \max_{1 \le k \le n} |S_k| \ge \lambda \big).$$
Our result is immediate from this.

We look at a special case of what is known as Kronecker's lemma.

Proposition 2.9 Suppose $x_i$ are real numbers and $s_n = \sum_{i=1}^n x_i$. If the sum $\sum_{j=1}^\infty (x_j/j)$ converges, then $s_n/n \to 0$.

Proof. Let $b_n = \sum_{j=1}^n (x_j/j)$, $b_0 = 0$, and suppose $b_n \to b$. As we have seen, this implies $(\sum_{i=1}^n b_i)/n \to b$. We have $n(b_n - b_{n-1}) = x_n$, so
$$\frac{s_n}{n} = \frac{\sum_{i=1}^n (i b_i - i b_{i-1})}{n} = \frac{n b_n - \sum_{i=1}^{n-1} b_i}{n} = b_n - \frac{n-1}{n} \cdot \frac{\sum_{i=1}^{n-1} b_i}{n-1} \to b - b = 0.$$

Lemma 2.10 Suppose $V_i$ is a sequence of independent random variables, each with mean 0. Let $W_n = \sum_{i=1}^n V_i$. If $\sum_{i=1}^\infty \operatorname{Var} V_i < \infty$, then $W_n$ converges almost surely.

Proof. Choose $n_j > n_{j-1}$ such that $\sum_{i=n_j}^\infty \operatorname{Var} V_i < 2^{-3j}$. If $n > n_j$, then applying Kolmogorov's inequality shows that
$$\mathbb{P}\big( \max_{n_j \le i \le n} |W_i - W_{n_j}| > 2^{-j} \big) \le 2^{-3j}/2^{-2j} = 2^{-j}.$$
Letting $n \to \infty$, we have $\mathbb{P}(A_j) \le 2^{-j}$, where $A_j = \big(\max_{n_j \le i} |W_i - W_{n_j}| > 2^{-j}\big)$. By the Borel-Cantelli lemma, $\mathbb{P}(A_j \text{ i.o.}) = 0$.

Suppose $\omega \notin (A_j \text{ i.o.})$. Let $\varepsilon > 0$. Choose $j$ large enough so that $2^{-j+1} < \varepsilon$ and $\omega \notin A_j$. If $n, m > n_j$, then
$$|W_n - W_m| \le |W_n - W_{n_j}| + |W_m - W_{n_j}| \le 2^{-j+1} < \varepsilon.$$
Since $\varepsilon$ is arbitrary, $W_n(\omega)$ is a Cauchy sequence, and hence converges.

We now consider the three series criterion. We prove the "if" portion here and defer the "only if" to Section 20.

Theorem 2.11 Let $X_i$ be a sequence of independent random variables, $A > 0$, and $Y_i = X_i 1_{(|X_i| \le A)}$. Then $\sum X_i$ converges if and only if all of the following three series converge:
(a) $\sum_n \mathbb{P}(|X_n| > A)$;
(b) $\sum_i \mathbb{E}Y_i$;
(c) $\sum_i \operatorname{Var} Y_i$.

Proof of "if" part. Since (c) holds, then $\sum (Y_i - \mathbb{E}Y_i)$ converges by Lemma 2.10. Since (b) holds, taking the difference shows $\sum Y_i$ converges. Since (a) holds, $\sum \mathbb{P}(X_i \neq Y_i) = \sum \mathbb{P}(|X_i| > A) < \infty$, so by Borel-Cantelli, $\mathbb{P}(X_i \neq Y_i \text{ i.o.}) = 0$. It follows that $\sum X_i$ converges.
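A classical example covered by Lemma 2.10 (and by the three series criterion with $A = 1$) is the random signs series $\sum_i \varepsilon_i / i$ with $\varepsilon_i = \pm 1$: the variances $1/i^2$ are summable, so the partial sums converge a.s., even though $\sum 1/i$ diverges. A simulation sketch (seed and sizes are my choices):

```python
import random

def random_sign_partial_sums(seed, N=100000):
    # W_n = sum_{i<=n} eps_i / i with independent eps_i = +-1 fair signs.
    # Var(eps_i / i) = 1/i^2 is summable, so Lemma 2.10 gives a.s. convergence.
    rng = random.Random(seed)
    w, path = 0.0, []
    for i in range(1, N + 1):
        w += (1.0 if rng.random() < 0.5 else -1.0) / i
        path.append(w)
    return path

path = random_sign_partial_sums(42)
```

Along one sample path, the late partial sums barely move: exactly the Cauchy-sequence behavior extracted in the proof via Kolmogorov's inequality.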

2.5 Uniform integrability

Before proceeding to an extension of the SLLN, we discuss uniform integrability. A sequence of random variables is uniformly integrable if
$$\sup_i \int_{(|X_i| > M)} |X_i|\, d\mathbb{P} \to 0$$
as $M \to \infty$.

Proposition 2.12 Suppose there exists $\varphi : [0, \infty) \to [0, \infty)$ such that $\varphi$ is nondecreasing, $\varphi(x)/x \to \infty$ as $x \to \infty$, and $\sup_i \mathbb{E}\varphi(|X_i|) < \infty$. Then the $X_i$ are uniformly integrable.

Proof. Let $\varepsilon > 0$ and choose $x_0$ such that $x/\varphi(x) < \varepsilon$ if $x \ge x_0$. If $M \ge x_0$,
$$\int_{(|X_i| > M)} |X_i| = \int \frac{|X_i|}{\varphi(|X_i|)}\, \varphi(|X_i|)\, 1_{(|X_i| > M)} \le \varepsilon \int \varphi(|X_i|) \le \varepsilon\, \sup_i \mathbb{E}\varphi(|X_i|).$$

Proposition 2.13 If $X_n$ and $Y_n$ are two uniformly integrable sequences, then $X_n + Y_n$ is also a uniformly integrable sequence.

Proof. Since there exists $M$ such that $\sup_n \mathbb{E}[|X_n|; |X_n| > M] < 1$ and $\sup_n \mathbb{E}[|Y_n|; |Y_n| > M] < 1$, then $\sup_n \mathbb{E}|X_n| \le M + 1$, and similarly for the $Y_n$. Note
$$\mathbb{P}(|X_n + Y_n| > K) \le \frac{\mathbb{E}|X_n| + \mathbb{E}|Y_n|}{K} \le \frac{2(1 + M)}{K}$$
by Chebyshev's inequality. Now let $\varepsilon > 0$ and choose $L$ such that
$$\mathbb{E}[|X_n|;\, |X_n| > L] < \varepsilon$$

and the same when $X_n$ is replaced by $Y_n$. Then
$$\mathbb{E}[|X_n + Y_n|;\, |X_n + Y_n| > K] \le \mathbb{E}[|X_n|;\, |X_n + Y_n| > K] + \mathbb{E}[|Y_n|;\, |X_n + Y_n| > K]$$
$$\le \mathbb{E}[|X_n|;\, |X_n| > L] + \mathbb{E}[|X_n|;\, |X_n| \le L,\, |X_n + Y_n| > K] + \mathbb{E}[|Y_n|;\, |Y_n| > L] + \mathbb{E}[|Y_n|;\, |Y_n| \le L,\, |X_n + Y_n| > K]$$
$$\le \varepsilon + L\,\mathbb{P}(|X_n + Y_n| > K) + \varepsilon + L\,\mathbb{P}(|X_n + Y_n| > K) \le 2\varepsilon + \frac{4L(1 + M)}{K}.$$
Given $\varepsilon$ we have already chosen $L$. Choose $K$ large enough that $4L(1 + M)/K < \varepsilon$. We then get that $\mathbb{E}[|X_n + Y_n|;\, |X_n + Y_n| > K]$ is bounded by $3\varepsilon$.

The main result we need in this section is Vitali's convergence theorem.

Theorem 2.14 If $X_n \to X$ almost surely and the $X_n$ are uniformly integrable, then $\mathbb{E}|X_n - X| \to 0$.

Proof. By the above proposition, $X_n - X$ is uniformly integrable and tends to 0 a.s., so without loss of generality, we may assume $X = 0$. Let $\varepsilon > 0$ and choose $M$ such that $\sup_n \mathbb{E}[|X_n|;\, |X_n| > M] < \varepsilon$. Then
$$\mathbb{E}|X_n| \le \mathbb{E}[|X_n|;\, |X_n| > M] + \mathbb{E}[|X_n|;\, |X_n| \le M] \le \varepsilon + \mathbb{E}\big[|X_n| 1_{(|X_n| \le M)}\big].$$
The second term on the right goes to 0 by dominated convergence.

Proposition 2.15 Suppose $X_i$ is an i.i.d. sequence and $\mathbb{E}|X_1| < \infty$. Then
$$\mathbb{E}\Big| \frac{S_n}{n} - \mathbb{E}X_1 \Big| \to 0.$$

Proof. Without loss of generality we may assume $\mathbb{E}X_1 = 0$. By the SLLN, $S_n/n \to 0$ a.s. So we need to show that the sequence $S_n/n$ is uniformly integrable.

Pick $M$ such that $\mathbb{E}[|X_1|;\, |X_1| > M] < \varepsilon$. Pick $N = M\,\mathbb{E}|X_1|/\varepsilon$. So
$$\mathbb{P}(|S_n|/n > N) \le \frac{\mathbb{E}|S_n|/n}{N} \le \frac{\mathbb{E}|X_1|}{N} = \varepsilon/M.$$
We used here
$$\mathbb{E}|S_n| \le \sum_{i=1}^n \mathbb{E}|X_i| = n\,\mathbb{E}|X_1|.$$
We then have
$$\mathbb{E}[|X_i|;\, |S_n|/n > N] \le \mathbb{E}[|X_i|;\, |X_i| > M] + \mathbb{E}[|X_i|;\, |X_i| \le M,\, |S_n|/n > N] \le \varepsilon + M\,\mathbb{P}(|S_n|/n > N) \le 2\varepsilon.$$
Finally,
$$\mathbb{E}[|S_n|/n;\, |S_n|/n > N] \le \frac{1}{n} \sum_{i=1}^n \mathbb{E}[|X_i|;\, |S_n|/n > N] \le 2\varepsilon.$$

Chapter 3. Conditional expectations

If $\mathcal{F} \subset \mathcal{G}$ are two $\sigma$-fields and $X$ is an integrable $\mathcal{G}$ measurable random variable, the conditional expectation of $X$ given $\mathcal{F}$, written $\mathbb{E}[X \mid \mathcal{F}]$ and read as "the expectation (or expected value) of $X$ given $\mathcal{F}$," is any $\mathcal{F}$ measurable random variable $Y$ such that $\mathbb{E}[Y; A] = \mathbb{E}[X; A]$ for every $A \in \mathcal{F}$. The conditional probability of $A \in \mathcal{G}$ given $\mathcal{F}$ is defined by $\mathbb{P}(A \mid \mathcal{F}) = \mathbb{E}[1_A \mid \mathcal{F}]$.

If $Y_1, Y_2$ are two $\mathcal{F}$ measurable random variables with $\mathbb{E}[Y_1; A] = \mathbb{E}[Y_2; A]$ for all $A \in \mathcal{F}$, then $Y_1 = Y_2$ a.s.; that is, conditional expectation is unique up to a.s. equivalence.

In the case $X$ is already $\mathcal{F}$ measurable, $\mathbb{E}[X \mid \mathcal{F}] = X$. If $X$ is independent of $\mathcal{F}$, $\mathbb{E}[X \mid \mathcal{F}] = \mathbb{E}X$. Both of these facts follow immediately from the definition.

For another example, which ties this definition with the one used in elementary probability courses, if $\{A_i\}$ is a finite collection of disjoint sets whose union is $\Omega$, $\mathbb{P}(A_i) > 0$ for all $i$, and $\mathcal{F}$ is the $\sigma$-field generated by the $A_i$'s, then
$$\mathbb{P}(A \mid \mathcal{F}) = \sum_i \frac{\mathbb{P}(A \cap A_i)}{\mathbb{P}(A_i)}\, 1_{A_i}.$$
This follows since the right-hand side is $\mathcal{F}$ measurable and its expectation over any set $A_i$ is $\mathbb{P}(A \cap A_i)$.

As an example, suppose we toss a fair coin independently 5 times and let $X_i$ be 1 or 0 depending on whether the $i$th toss was a heads or tails. Let $A$ be the event that there were 5 heads and let $\mathcal{F}_i = \sigma(X_1, \ldots, X_i)$. Then

$\mathbb{P}(A) = 1/32$, while $\mathbb{P}(A \mid \mathcal{F}_1)$ is equal to $1/16$ on the event $(X_1 = 1)$ and 0 on the event $(X_1 = 0)$. $\mathbb{P}(A \mid \mathcal{F}_2)$ is equal to $1/8$ on the event $(X_1 = 1, X_2 = 1)$ and 0 otherwise.

We have
$$\mathbb{E}\big[\mathbb{E}[X \mid \mathcal{F}]\big] = \mathbb{E}X \tag{3.1}$$
because $\mathbb{E}[\mathbb{E}[X \mid \mathcal{F}]] = \mathbb{E}[\mathbb{E}[X \mid \mathcal{F}]; \Omega] = \mathbb{E}[X; \Omega] = \mathbb{E}X$.

The following is easy to establish.

Proposition 3.1 (a) If $X \le Y$ are both integrable, then $\mathbb{E}[X \mid \mathcal{F}] \le \mathbb{E}[Y \mid \mathcal{F}]$ a.s.
(b) If $X$ and $Y$ are integrable and $a \in \mathbb{R}$, then $\mathbb{E}[aX + Y \mid \mathcal{F}] = a\,\mathbb{E}[X \mid \mathcal{F}] + \mathbb{E}[Y \mid \mathcal{F}]$.

It is easy to check that limit theorems such as monotone convergence and dominated convergence have conditional expectation versions, as do inequalities like Jensen's and Chebyshev's inequalities. Thus, for example, we have the following.

Proposition 3.2 (Jensen's inequality for conditional expectations) If $g$ is convex and $X$ and $g(X)$ are integrable, then
$$\mathbb{E}[g(X) \mid \mathcal{F}] \ge g\big(\mathbb{E}[X \mid \mathcal{F}]\big), \quad \text{a.s.}$$

A key fact is the following.

Proposition 3.3 If $X$ and $XY$ are integrable and $Y$ is measurable with respect to $\mathcal{F}$, then
$$\mathbb{E}[XY \mid \mathcal{F}] = Y\, \mathbb{E}[X \mid \mathcal{F}]. \tag{3.2}$$

Proof. If $A \in \mathcal{F}$, then for any $B \in \mathcal{F}$,
$$\mathbb{E}\big[1_A\, \mathbb{E}[X \mid \mathcal{F}]; B\big] = \mathbb{E}\big[\mathbb{E}[X \mid \mathcal{F}]; A \cap B\big] = \mathbb{E}[X; A \cap B] = \mathbb{E}[1_A X; B].$$
Since $1_A\, \mathbb{E}[X \mid \mathcal{F}]$ is $\mathcal{F}$ measurable, this shows that (3.2) holds when $Y = 1_A$ and $A \in \mathcal{F}$. Using linearity and taking limits shows that (3.2) holds whenever $Y$ is $\mathcal{F}$ measurable and $X$ and $XY$ are integrable.

Two other equalities follow.

Proposition 3.4 If $\mathcal{E} \subset \mathcal{F} \subset \mathcal{G}$, then
$$\mathbb{E}\big[\mathbb{E}[X \mid \mathcal{F}] \mid \mathcal{E}\big] = \mathbb{E}[X \mid \mathcal{E}] = \mathbb{E}\big[\mathbb{E}[X \mid \mathcal{E}] \mid \mathcal{F}\big].$$

Proof. The right equality holds because $\mathbb{E}[X \mid \mathcal{E}]$ is $\mathcal{E}$ measurable, hence $\mathcal{F}$ measurable. To show the left equality, let $A \in \mathcal{E}$. Then since $A$ is also in $\mathcal{F}$,
$$\mathbb{E}\big[\mathbb{E}[\mathbb{E}[X \mid \mathcal{F}] \mid \mathcal{E}]; A\big] = \mathbb{E}\big[\mathbb{E}[X \mid \mathcal{F}]; A\big] = \mathbb{E}[X; A] = \mathbb{E}\big[\mathbb{E}[X \mid \mathcal{E}]; A\big].$$
Since both sides are $\mathcal{E}$ measurable, the equality follows.

To show the existence of $\mathbb{E}[X \mid \mathcal{F}]$, we proceed as follows.

Proposition 3.5 If $X$ is integrable, then $\mathbb{E}[X \mid \mathcal{F}]$ exists.

Proof. Using linearity, we need only consider $X \ge 0$. Define a measure $\mathbb{Q}$ on $\mathcal{F}$ by $\mathbb{Q}(A) = \mathbb{E}[X; A]$ for $A \in \mathcal{F}$. This is trivially absolutely continuous with respect to $\mathbb{P}|_{\mathcal{F}}$, the restriction of $\mathbb{P}$ to $\mathcal{F}$. Let $\mathbb{E}[X \mid \mathcal{F}]$ be the Radon-Nikodym derivative of $\mathbb{Q}$ with respect to $\mathbb{P}|_{\mathcal{F}}$. The Radon-Nikodym derivative is $\mathcal{F}$ measurable by construction and so provides the desired random variable.

When $\mathcal{F} = \sigma(Y)$, one usually writes $\mathbb{E}[X \mid Y]$ for $\mathbb{E}[X \mid \mathcal{F}]$. Notation that is commonly used (however, we will use it only very occasionally and only for heuristic purposes) is $\mathbb{E}[X \mid Y = y]$. The definition is as follows. If $A \in \sigma(Y)$, then $A = (Y \in B)$ for some Borel set $B$ by the definition of $\sigma(Y)$, or $1_A = 1_B(Y)$. By linearity and taking limits, if $Z$ is $\sigma(Y)$ measurable, $Z = f(Y)$ for some Borel measurable function $f$. Set $Z = \mathbb{E}[X \mid Y]$ and choose $f$ Borel measurable so that $Z = f(Y)$. Then $\mathbb{E}[X \mid Y = y]$ is defined to be $f(y)$.

If $X \in L^2$ and $\mathcal{M} = \{Y \in L^2 : Y \text{ is } \mathcal{F}\text{-measurable}\}$, one can show that $\mathbb{E}[X \mid \mathcal{F}]$ is equal to the projection of $X$ onto the subspace $\mathcal{M}$. We will not use this in these notes.
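The coin-tossing example earlier in this chapter can be verified by brute force: on the finite space $\Omega = \{0,1\}^5$, $\mathcal{F}_k = \sigma(X_1, \ldots, X_k)$ is generated by the partition of $\Omega$ according to the first $k$ outcomes, so $\mathbb{P}(A \mid \mathcal{F}_k)$ is the average of $1_A$ over each partition cell. A sketch (function name is mine):

```python
from itertools import product

def cond_prob_all_heads(k):
    # P(A | F_k) for A = {5 heads in 5 fair tosses}, F_k = sigma(X_1,...,X_k).
    # Each cell of the partition is "first k outcomes equal w[:k]"; the
    # conditional probability is constant on each cell, namely the average
    # of 1_A over the (equally likely) outcomes in that cell.
    cells = {}
    for w in product([0, 1], repeat=5):
        cells.setdefault(w[:k], []).append(1 if sum(w) == 5 else 0)
    return {key: sum(v) / len(v) for key, v in cells.items()}
```

This matches the elementary-course formula $\mathbb{P}(A \mid \mathcal{F}) = \sum_i \mathbb{P}(A \cap A_i)/\mathbb{P}(A_i)\, 1_{A_i}$: the value $1/16$ on $(X_1 = 1)$, 0 on $(X_1 = 0)$, and $1/8$ on $(X_1 = 1, X_2 = 1)$.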


Chapter 4. Martingales

4.1 Definitions

In this section we consider martingales. Let $\mathcal{F}_n$ be an increasing sequence of $\sigma$-fields. A sequence of random variables $M_n$ is adapted to $\mathcal{F}_n$ if for each $n$, $M_n$ is $\mathcal{F}_n$ measurable. $M_n$ is a martingale if $M_n$ is adapted to $\mathcal{F}_n$, $M_n$ is integrable for all $n$, and
$$\mathbb{E}[M_n \mid \mathcal{F}_{n-1}] = M_{n-1}, \quad \text{a.s.}, \quad n = 2, 3, \ldots. \tag{4.1}$$
If we have $\mathbb{E}[M_n \mid \mathcal{F}_{n-1}] \ge M_{n-1}$ a.s. for every $n$, then $M_n$ is a submartingale. If we have $\mathbb{E}[M_n \mid \mathcal{F}_{n-1}] \le M_{n-1}$, we have a supermartingale. Submartingales have a tendency to increase.

Let us take a moment to look at some examples. If $X_i$ is a sequence of mean zero i.i.d. random variables and $S_n$ is the partial sum process, then $M_n = S_n$ is a martingale, since
$$\mathbb{E}[M_n \mid \mathcal{F}_{n-1}] = M_{n-1} + \mathbb{E}[M_n - M_{n-1} \mid \mathcal{F}_{n-1}] = M_{n-1} + \mathbb{E}[M_n - M_{n-1}] = M_{n-1},$$
using independence.

If the $X_i$'s have variance one and $M_n = S_n^2 - n$, then
$$\mathbb{E}[S_n^2 \mid \mathcal{F}_{n-1}] = \mathbb{E}\big[(S_n - S_{n-1})^2 \mid \mathcal{F}_{n-1}\big] + 2S_{n-1}\,\mathbb{E}[S_n \mid \mathcal{F}_{n-1}] - S_{n-1}^2 = 1 + S_{n-1}^2,$$
using independence. It follows that $M_n$ is a martingale.

Another example is the following: if $X \in L^1$ and $M_n = \mathbb{E}[X \mid \mathcal{F}_n]$, then $M_n$ is a martingale.
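The first two examples can be checked exactly on the finite probability space of $\pm 1$ coin flips: the conditional expectation given $\mathcal{F}_{n-1}$ is just the average over the two equally likely continuations of each path. A sketch (the verification routine is mine, not from the text):

```python
from itertools import product

def verify_martingales(n=6):
    # For i.i.d. steps X_i = +-1 with probability 1/2 each (mean 0,
    # variance 1), check E[M_n | F_{n-1}] = M_{n-1} exactly for both
    # M = S  and  M = S^2 - n, path by path.
    for path in product([-1, 1], repeat=n - 1):
        s = sum(path)                     # S_{n-1} on this path
        cond_S = ((s + 1) + (s - 1)) / 2  # E[S_n | F_{n-1}]
        cond_Q = (((s + 1) ** 2 - n) + ((s - 1) ** 2 - n)) / 2
        assert cond_S == s                # S_n is a martingale
        assert cond_Q == s * s - (n - 1)  # S_n^2 - n is a martingale
    return True
```

The computation for `cond_Q` is the same algebra as in the display above: $\mathbb{E}[(s \pm 1)^2] = s^2 + 1$, so subtracting $n$ leaves $s^2 - (n-1)$.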

If M_n is a martingale and H_n is F_{n-1} measurable for each n, it is easy to check that N_n = Σ_{i=1}^n H_i(M_i − M_{i-1}) is also a martingale.

If M_n is a martingale, g is convex, and g(M_n) is integrable for each n, then by Jensen's inequality

E[g(M_{n+1}) | F_n] ≥ g(E[M_{n+1} | F_n]) = g(M_n),

or g(M_n) is a submartingale. Similarly if g is convex and nondecreasing on [0, ∞) and M_n is a positive submartingale, then g(M_n) is a submartingale because

E[g(M_{n+1}) | F_n] ≥ g(E[M_{n+1} | F_n]) ≥ g(M_n).

4.2 Stopping times

We next want to talk about stopping times. Suppose we have a sequence of σ-fields F_i such that F_i ⊂ F_{i+1} for each i. An example would be F_i = σ(X_1, ..., X_i). A random mapping N from Ω to {0, 1, 2, ...} is called a stopping time if for each n, (N ≤ n) ∈ F_n. A stopping time is also called an optional time in the Markov theory literature. The intuition is that one can tell by time n, from the information in F_n, whether N has happened.

Suppose some motorists are told to drive north on Highway 99 in Seattle and stop at the first motorcycle shop past the second realtor after the city limits. So they drive north, pass the city limits, pass two realtors, come to the next motorcycle shop, and stop. That is a stopping time. If they are instead told to stop at the third stop light before the city limits (and they had not been there before), they would need to drive to the city limits, then turn around and return past three stop lights. That is not a stopping time, because they have to go ahead of where they wanted to stop to know to stop there.

We use the notation a ∧ b = min(a, b) and a ∨ b = max(a, b). The proof of the following is immediate from the definitions.

Proposition 4.1 (a) Fixed times n are stopping times.
(b) If N_1 and N_2 are stopping times, then so are N_1 ∧ N_2 and N_1 ∨ N_2.
(c) If N_n is a nondecreasing sequence of stopping times, then so is N = sup_n N_n.
(d) If N_n is a nonincreasing sequence of stopping times, then so is N =
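The measurability condition (N ≤ n) ∈ F_n can be tested mechanically on the ±1 path space: the indicator of (N ≤ n) must depend only on the first n steps. A Python sketch (the checker and both hitting-time helpers are our own illustrations), contrasting a first hitting time with a last visit time, which plays the role of the "third stop light before the city limits":

```python
from itertools import product

def is_stopping_time(N, horizon):
    """Check (N <= n) is F_n-measurable: the indicator of (N <= n) must agree
    on any two paths that share their first n steps."""
    for n in range(horizon + 1):
        seen = {}
        for p in product((-1, 1), repeat=horizon):
            key, val = p[:n], N(p) <= n
            if key in seen and seen[key] != val:
                return False
            seen[key] = val
    return True

def first_hit(p, level=2):
    """First time the walk reaches `level` (or len(p) if it never does)."""
    s = 0
    for i, step in enumerate(p, 1):
        s += step
        if s == level:
            return i
    return len(p)

def last_hit(p, level=0):
    """Last visit to `level`: deciding this requires knowing the whole future."""
    s, last = 0, 0
    for i, step in enumerate(p, 1):
        s += step
        if s == level:
            last = i
    return last
```

Running the checker shows `first_hit` passes while `last_hit` fails, matching the intuition above.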

inf_n N_n.
(e) If N is a stopping time, then so is N + n.

We define F_N = {A : A ∩ (N ≤ n) ∈ F_n for all n}.

4.3 Optional stopping

Note that if one takes expectations in (4.1), one has E M_n = E M_{n-1}, and by induction E M_n = E M_0. The theorem about martingales that lies at the basis of all other results is Doob's optional stopping theorem, which says that the same is true if we replace n by a stopping time N. There are various versions, depending on what conditions one puts on the stopping times.

Theorem 4.2 If N is a bounded stopping time with respect to F_n and M_n a martingale, then E M_N = E M_0.

Proof. Since N is bounded, let K be the largest value N takes. We write

E M_N = Σ_{k=0}^K E[M_N; N = k] = Σ_{k=0}^K E[M_k; N = k].

Note (N = k) is F_j measurable if j ≥ k, so

E[M_k; N = k] = E[M_{k+1}; N = k] = E[M_{k+2}; N = k] = ... = E[M_K; N = k].

Hence

E M_N = Σ_{k=0}^K E[M_K; N = k] = E M_K = E M_0.

This completes the proof.

The assumption that N be bounded cannot be entirely dispensed with. For example, let M_n be the partial sums of a sequence of i.i.d. random variables that take the values ±1, each with probability 1/2. If N = min{i : M_i = 1}, we will see later on that N < ∞ a.s., but E M_N = 1 ≠ 0 = E M_0.

The same proof as that in Theorem 4.2 gives the following corollary.
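The counterexample can be probed numerically: for the truncated (hence bounded) stopping time N ∧ K the conclusion of Theorem 4.2 holds exactly, even though E M_N = 1. A quick sketch of ours, enumerating all 2^K paths:

```python
from itertools import product

def expected_M_at_stopped_time(K):
    """Exact E[M_{N ∧ K}] for N = min{i : M_i = 1}, M the +-1 random walk,
    computed by enumerating all 2^K equally likely paths."""
    total = 0.0
    for p in product((-1, 1), repeat=K):
        s, stopped = 0, None
        for step in p:
            s += step
            if s == 1:          # walk has hit level 1: stop here
                stopped = 1
                break
        total += (stopped if stopped is not None else s) / 2 ** K
    return total

# E[M_{N ∧ K}] = E[M_0] = 0 for every K, while E[M_N] = 1.
```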

Corollary 4.3 If N is bounded by K and M_n is a submartingale, then E M_N ≤ E M_K.

Also the same proof gives

Corollary 4.4 If N is bounded by K, A ∈ F_N, and M_n is a submartingale, then E[M_N; A] ≤ E[M_K; A].

Proposition 4.5 If N_1 ≤ N_2 are stopping times bounded by K and M is a martingale, then E[M_{N_2} | F_{N_1}] = M_{N_1}, a.s.

Proof. Suppose A ∈ F_{N_1}. We need to show E[M_{N_1}; A] = E[M_{N_2}; A]. Define a new stopping time N_3 by

N_3(ω) = N_1(ω) if ω ∈ A, and N_3(ω) = N_2(ω) if ω ∉ A.

It is easy to check that N_3 is a stopping time, so E M_{N_3} = E M_K = E M_{N_2} implies

E[M_{N_1}; A] + E[M_{N_2}; A^c] = E[M_{N_2}].

Subtracting E[M_{N_2}; A^c] from each side completes the proof.

The following is known as the Doob decomposition.

Proposition 4.6 Suppose X_k is a submartingale with respect to an increasing sequence of σ-fields F_k. Then we can write X_k = M_k + A_k such that M_k is a martingale adapted to the F_k and A_k is a sequence of random variables with A_k being F_{k-1}-measurable and A_0 ≤ A_1 ≤ ....

Proof. Let a_k = E[X_k | F_{k-1}] − X_{k-1} for k = 1, 2, .... Since X_k is a submartingale, each a_k ≥ 0. Then let A_k = Σ_{i=1}^k a_i. The fact that the A_k are increasing and measurable with respect to F_{k-1} is clear. Set M_k = X_k − A_k. Then

E[M_{k+1} − M_k | F_k] = E[X_{k+1} − X_k | F_k] − a_{k+1} = 0,

or M_k is a martingale.

Combining Propositions 4.5 and 4.6 we have
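For the submartingale S_k^2 the compensator of the Doob decomposition can be computed exactly on the ±1 path space, recovering A_k = k (so M_k = S_k^2 − k, the martingale from Section 4.1). A sketch; `doob_compensator` is our own helper name:

```python
from itertools import product

def doob_compensator(X, n):
    """Compensator A_n = sum of a_k = E[X_k | F_{k-1}] - X_{k-1}, computed
    exactly on the binary path space, returned as a function of the path."""
    def A(p):
        total = 0.0
        for k in range(1, n + 1):
            prefix = p[:k - 1]
            # condition on the first k-1 steps, average over the k-th step
            cond = 0.5 * (X(prefix + (-1,)) + X(prefix + (1,)))
            total += cond - X(prefix)
        return total
    return A

X = lambda p: sum(p) ** 2          # the submartingale S_k^2
A = doob_compensator(X, 4)
# Here a_k = 1 for every k, so A_4 = 4 on every path.
assert all(abs(A(p) - 4.0) < 1e-12 for p in product((-1, 1), repeat=4))
```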

Corollary 4.7 Suppose X_k is a submartingale, and N_1 ≤ N_2 are bounded stopping times. Then E[X_{N_2} | F_{N_1}] ≥ X_{N_1}.

4.4 Doob's inequalities

The first interesting consequences of the optional stopping theorems are Doob's inequalities. If M_n is a martingale, denote M_n* = max_{i ≤ n} |M_i|.

Theorem 4.8 If M_n is a martingale or a positive submartingale,

P(M_n* ≥ a) ≤ E[|M_n|; M_n* ≥ a]/a ≤ E|M_n|/a.

Proof. Set M_{n+1} = M_n. Let N = min{j : |M_j| ≥ a} ∧ (n + 1). Since |·| is convex, |M_n| is a submartingale. If A = (M_n* ≥ a), then A ∈ F_N because

A ∩ (N ≤ j) = (N ≤ n) ∩ (N ≤ j) = (N ≤ j) ∈ F_j.

By Corollary 4.4,

P(M_n* ≥ a) ≤ E[|M_N|/a; M_n* ≥ a] = (1/a) E[|M_N|; A] ≤ (1/a) E[|M_{n+1}|; A] = (1/a) E[|M_n|; A] ≤ (1/a) E|M_n|.

For p > 1, we have the following inequality.

Theorem 4.9 If p > 1 and E|M_i|^p < ∞ for i ≤ n, then

E(M_n*)^p ≤ (p/(p−1))^p E|M_n|^p.

Proof. Note M_n* ≤ Σ_{i=1}^n |M_i|, hence M_n* ∈ L^p. We write, using Theorem 4.8 for the first inequality,

E(M_n*)^p = ∫_0^∞ p a^{p−1} P(M_n* > a) da ≤ ∫_0^∞ p a^{p−1} E[|M_n| 1_{(M_n* ≥ a)}/a] da
= E ∫_0^{M_n*} p a^{p−2} |M_n| da = (p/(p−1)) E[(M_n*)^{p−1} |M_n|]
≤ (p/(p−1)) (E(M_n*)^p)^{(p−1)/p} (E|M_n|^p)^{1/p}.
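The maximal inequality of Theorem 4.8 can be checked exactly for the random walk martingale by enumerating all paths; a sketch of ours:

```python
from itertools import product

def doob_check(n, a):
    """Return (P(M_n* >= a), E|M_n|/a) exactly for the +-1 random walk,
    by enumerating all 2^n equally likely paths."""
    p_max, e_abs = 0.0, 0.0
    for path in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for step in path:
            s += step
            running_max = max(running_max, abs(s))
        w = 1 / 2 ** n
        if running_max >= a:
            p_max += w
        e_abs += abs(s) * w
    return p_max, e_abs / a

# The first component is always at most the second, as Theorem 4.8 predicts.
```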

The last inequality follows by Hölder's inequality. Now divide both sides by the quantity (E(M_n*)^p)^{(p−1)/p}.

4.5 Martingale convergence theorems

The martingale convergence theorems are another set of important consequences of optional stopping. The main step is the upcrossing lemma. The number of upcrossings of an interval [a, b] is the number of times a process crosses from below a to above b.

To be more exact, let

S_1 = min{k : X_k ≤ a}, T_1 = min{k > S_1 : X_k ≥ b},

and

S_{i+1} = min{k > T_i : X_k ≤ a}, T_{i+1} = min{k > S_{i+1} : X_k ≥ b}.

The number of upcrossings U_n before time n is U_n = max{j : T_j ≤ n}.

Theorem 4.10 (Upcrossing lemma) If X_k is a submartingale,

E U_n ≤ (b − a)^{−1} E[(X_n − a)^+].

Proof. The number of upcrossings of [a, b] by X_k is the same as the number of upcrossings of [0, b − a] by Y_k = (X_k − a)^+. Moreover Y_k is still a submartingale. If we obtain the inequality for the number of upcrossings of the interval [0, b − a] by the process Y_k, we will have the desired inequality for upcrossings of X. So we may assume a = 0.

Fix n and define Y_{n+1} = Y_n. This will still be a submartingale. Define the S_i, T_i as above, and let S_i' = S_i ∧ (n + 1), T_i' = T_i ∧ (n + 1). Since T_{i+1} > S_{i+1} > T_i, then T_{n+1}' = n + 1. We write

E Y_{n+1} = E Y_{S_1'} + Σ_{i=1}^{n+1} E[Y_{T_i'} − Y_{S_i'}] + Σ_{i=1}^{n} E[Y_{S_{i+1}'} − Y_{T_i'}].

All the summands in the third term on the right are nonnegative since Y_k is a submartingale. Y_{T_j'} − Y_{S_j'} is always greater than or equal to 0, and for each of the U_n upcrossings completed by time n we have Y_{T_j'} − Y_{S_j'} ≥ b − a. So

Σ_i (Y_{T_i'} − Y_{S_i'}) ≥ (b − a) U_n.

So

E U_n ≤ E Y_{n+1}/(b − a). (4.2)

This leads to the martingale convergence theorem.

Theorem 4.11 If X_n is a submartingale such that sup_n E X_n^+ < ∞, then X_n converges a.s. as n → ∞.

Proof. Let U(a, b) = lim_n U_n. For each a, b rational, by monotone convergence,

E U(a, b) ≤ (b − a)^{−1} sup_n E(X_n − a)^+ < ∞.

So U(a, b) < ∞, a.s. Taking the union over all pairs of rationals a, b, we see that a.s. the sequence X_n(ω) cannot have lim sup X_n > lim inf X_n. Therefore X_n converges a.s., although we still have to rule out the possibility of the limit being infinite. Since X_n is a submartingale, E X_n ≥ E X_0, and thus

E|X_n| = E X_n^+ + E X_n^− = 2E X_n^+ − E X_n ≤ 2E X_n^+ − E X_0.

By Fatou's lemma, E lim_n |X_n| ≤ sup_n E|X_n| < ∞, or X_n converges a.s. to a finite limit.

Corollary 4.12 If X_n is a positive supermartingale or a martingale bounded above or below, X_n converges a.s.

Proof. If X_n is a positive supermartingale, −X_n is a submartingale bounded above by 0. Now apply Theorem 4.11.
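Counting upcrossings is mechanical, and the bound (4.2) can be checked exactly for the submartingale |S_k| by enumeration. A sketch of ours; `upcrossings` follows the definition of S_i, T_i above:

```python
from itertools import product

def upcrossings(xs, a, b):
    """Number of completed upcrossings of [a, b] by the sequence xs: go at or
    below a, then at or above b, repeatedly."""
    count, below = 0, False
    for x in xs:
        if not below and x <= a:
            below = True
        elif below and x >= b:
            count += 1
            below = False
    return count

def upcrossing_check(n, a, b):
    """Return (E U_n, E[(X_n - a)^+]/(b - a)) exactly for X_k = |S_k|,
    the absolute value of the +-1 random walk."""
    eu, erhs = 0.0, 0.0
    for path in product((-1, 1), repeat=n):
        xs, s = [], 0
        for step in path:
            s += step
            xs.append(abs(s))
        w = 1 / 2 ** n
        eu += upcrossings(xs, a, b) * w
        erhs += max(xs[-1] - a, 0) * w
    return eu, erhs / (b - a)

# The first component never exceeds the second, as the upcrossing lemma predicts.
```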

If X_n is a martingale bounded above, by considering −X_n, we may assume X_n is bounded below. Looking at X_n + M for fixed M will not affect the convergence, so we may assume X_n is bounded below by 0. Now apply the first assertion of the corollary.

X_n is a uniformly integrable martingale if the collection of random variables X_n is uniformly integrable.

Proposition 4.13 If X_n is a martingale with sup_n E|X_n|^p < ∞ for some p > 1, then the convergence is in L^p as well as a.s. This is also true when X_n is a submartingale. If X_n is a uniformly integrable martingale, then the convergence is in L^1. If X_n → X in L^1, then X_n = E[X | F_n].

Proof. The L^p convergence assertion follows by using Doob's inequality (Theorem 4.9) and dominated convergence. The L^1 convergence assertion follows since a.s. convergence together with uniform integrability implies L^1 convergence. Finally, if j < n, we have X_j = E[X_n | F_j]. If A ∈ F_j,

E[X_j; A] = E[X_n; A] → E[X; A]

by the L^1 convergence of X_n to X. Since this is true for all A ∈ F_j, X_j = E[X | F_j].

4.6 Applications of martingales

One application of martingale techniques is Wald's identities.

Proposition 4.14 Suppose the Y_i are i.i.d. with E|Y_1| < ∞, N is a stopping time with E N < ∞, and N is independent of the Y_i. Then E S_N = (E N)(E Y_1), where the S_n are the partial sums of the Y_i.

Proof. S_n − n(E Y_1) is a martingale, so E S_{n∧N} = E(n ∧ N) E Y_1 by optional stopping. The right hand side tends to (E N)(E Y_1) by monotone convergence. S_{n∧N} converges almost surely to S_N, and we need to show the expected values converge.
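Wald's first identity can be verified exactly on a small finite model; the distributions below are our own choices for illustration (Y_i ∈ {−1, 2} with equal probability, N uniform on {1, 2, 3} and independent of the Y_i):

```python
from itertools import product

y_vals = (-1, 2)          # values of Y_1, each with probability 1/2
n_vals = (1, 2, 3)        # values of N, each with probability 1/3

EY = sum(y_vals) / len(y_vals)   # E Y_1 = 0.5
EN = sum(n_vals) / len(n_vals)   # E N = 2

# Compute E S_N exactly by summing over the product measure of (N, Y_1, ..., Y_N).
ESN = 0.0
for n in n_vals:
    pn = 1 / len(n_vals)
    for ys in product(y_vals, repeat=n):
        ESN += pn * (1 / len(y_vals)) ** n * sum(ys)
# ESN agrees with (E N)(E Y_1) = 1.0 up to rounding.
```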

Note

S_{n∧N} = Σ_{k=0}^n Σ_{j=0}^k Y_j 1_{(N=k)} + Σ_{j=0}^n Σ_{k>n} Y_j 1_{(N=k)} = Σ_{j=0}^n Y_j 1_{(N≥j)},

so

|S_{n∧N}| ≤ Σ_{j=0}^∞ |Y_j| 1_{(N≥j)}.

The last expression, using the independence, has expected value

Σ_{j=0}^∞ (E|Y_j|) P(N ≥ j) = (E|Y_1|)(1 + E N) < ∞.

So by dominated convergence, we have E S_{n∧N} → E S_N.

Wald's second identity is a similar expression for the variance of S_N.

Let S_n be your fortune at time n. In a fair casino, E[S_{n+1} | F_n] = S_n. If N is a stopping time, the optional stopping theorem says that E S_N = E S_0; in other words, no matter what stopping time you use and what method of betting, you will do no better on average than ending up with what you started with.

We can use martingales to find certain hitting probabilities.

Proposition 4.15 Suppose the Y_i are i.i.d. with P(Y_1 = 1) = 1/2, P(Y_1 = −1) = 1/2, and S_n the partial sum process. Suppose a and b are positive integers. Then

P(S_n hits −a before b) = b/(a + b).

If N = min{n : S_n ∈ {−a, b}}, then E N = ab.

Proof. S_n^2 − n is a martingale, so E S_{n∧N}^2 = E(n ∧ N). Let n → ∞. The right hand side converges to E N by monotone convergence. Since S_{n∧N} is bounded in absolute value by a + b, the left hand side converges by dominated convergence to E S_N^2, which is finite. So E N is finite, hence N is finite almost surely.

S_n is a martingale, so E S_{n∧N} = E S_0 = 0. By dominated convergence, and the fact that N < ∞ a.s., hence S_{n∧N} → S_N, we have E S_N = 0, or

−a P(S_N = −a) + b P(S_N = b) = 0.

We also have

P(S_N = −a) + P(S_N = b) = 1.

Solving these two equations for P(S_N = −a) and P(S_N = b) yields our first result. Since

E N = E S_N^2 = a^2 P(S_N = −a) + b^2 P(S_N = b),

substituting gives the second result.

Based on this proposition, if we let a → ∞, we see that P(N_b < ∞) = 1 and E N_b = ∞, where N_b = min{n : S_n = b}.

An elegant application of martingales is a proof of the SLLN. Fix N large. Let Y_i be i.i.d. with E|Y_1| < ∞. Let Z_n = E[Y_1 | S_n, S_{n+1}, ..., S_N]. We claim Z_n = S_n/n. Certainly S_n/n is σ(S_n, ..., S_N) measurable. If A ∈ σ(S_n, ..., S_N) for some n, then A = ((S_n, ..., S_N) ∈ B) for some Borel subset B of R^{N−n+1}. Since the Y_i are i.i.d., for each k ≤ n,

E[Y_1; (S_n, ..., S_N) ∈ B] = E[Y_k; (S_n, ..., S_N) ∈ B].

Summing over k and dividing by n,

E[Y_1; (S_n, ..., S_N) ∈ B] = E[S_n/n; (S_n, ..., S_N) ∈ B].

Therefore E[Y_1; A] = E[S_n/n; A] for every A ∈ σ(S_n, ..., S_N). Thus Z_n = S_n/n.

Let X_k = Z_{N−k}, and let F_k = σ(S_{N−k}, S_{N−k+1}, ..., S_N). Note F_k gets larger as k gets larger, and by the above X_k = E[Y_1 | F_k]. This shows that X_k is a martingale (cf. the last example in Section 4.1). By Doob's upcrossing inequality, if U_n^X is the number of upcrossings of [a, b] by X, then

E U_{N−1}^X ≤ E X_{N−1}^+ /(b − a) ≤ E|Z_1|/(b − a) = E|Y_1|/(b − a).

This differs by at most one from the number of upcrossings of [a, b] by Z_1, ..., Z_N. So the expected number of upcrossings of [a, b] by Z_k for k ≤ N is bounded by 1 + E|Y_1|/(b − a). Now let N → ∞. By Fatou's lemma, the expected number of upcrossings of [a, b] by Z_1, ... is finite. Arguing as in the proof of the martingale convergence theorem, this says that Z_n = S_n/n does not oscillate.
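Proposition 4.15 can be cross-checked numerically via the first-step equations for the hitting probability and the expected exit time (a boundary-value formulation, not the martingale argument of the proof); the sketch below is ours, with a = 3, b = 4:

```python
def ruin(a, b, sweeps=20000):
    """On {-a, ..., b}, h(x) = P_x(hit -a before b) satisfies
    h(x) = (h(x-1) + h(x+1))/2 with h(-a) = 1, h(b) = 0, and g(x) = E_x[N]
    satisfies g(x) = 1 + (g(x-1) + g(x+1))/2 with g(-a) = g(b) = 0.
    Solve both by repeated sweeps (Gauss-Seidel); return (h(0), g(0))."""
    h = {x: 0.0 for x in range(-a, b + 1)}
    g = {x: 0.0 for x in range(-a, b + 1)}
    h[-a] = 1.0
    for _ in range(sweeps):
        for x in range(-a + 1, b):
            h[x] = 0.5 * (h[x - 1] + h[x + 1])
            g[x] = 1.0 + 0.5 * (g[x - 1] + g[x + 1])
    return h[0], g[0]

h0, g0 = ruin(3, 4)
# h0 converges to b/(a+b) = 4/7 and g0 to a*b = 12, matching the proposition.
```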

It is conceivable that S_n/n → ±∞. But by Fatou's lemma,

E[lim |S_n/n|] ≤ lim inf E|S_n|/n ≤ lim inf (n E|Y_1|)/n = E|Y_1| < ∞.

Therefore S_n/n converges a.s. to a finite limit.


Chapter 5

Weak convergence

We will see later that if the X_i are i.i.d. with mean zero and variance one, then S_n/√n converges in the sense

P(S_n/√n ∈ [a, b]) → P(Z ∈ [a, b]),

where Z is a standard normal. If S_n/√n converged in probability or almost surely, then by the Kolmogorov zero-one law it would converge to a constant, contradicting the above. We want to generalize the above type of convergence.

We say F_n converges weakly to F if F_n(x) → F(x) for all x at which F is continuous. Here F_n and F are distribution functions. We say X_n converges weakly to X if F_{X_n} converges weakly to F_X. We sometimes say X_n converges in distribution or converges in law to X. Probabilities μ_n converge weakly if their corresponding distribution functions converge, that is, if F_{μ_n}(x) = μ_n(−∞, x] converges weakly.

An example that illustrates why we restrict the convergence to continuity points of F is the following. Let X_n = 1/n with probability one, and X = 0 with probability one. F_{X_n}(x) is 0 if x < 1/n and 1 otherwise. F_{X_n}(x) converges to F_X(x) for all x except x = 0.

Proposition 5.1 X_n converges weakly to X if and only if E g(X_n) → E g(X) for all g bounded and continuous.

The idea that E g(X_n) converges to E g(X) for all g bounded and continuous makes sense for any metric space and is used as a definition of weak
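The X_n = 1/n example is easy to make concrete; a small sketch of ours comparing the two distribution functions:

```python
def F_n(n, x):
    """Distribution function of X_n = 1/n (point mass at 1/n)."""
    return 0.0 if x < 1.0 / n else 1.0

def F_limit(x):
    """Distribution function of X = 0 (point mass at 0)."""
    return 0.0 if x < 0 else 1.0

# At the discontinuity point x = 0: F_n(n, 0) = 0 for every n, yet
# F_limit(0) = 1, so there is no convergence there.
# At any fixed x != 0, F_n(n, x) = F_limit(x) for all large n.
```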

convergence for X_n taking values in general metric spaces.

Proof. First suppose E g(X_n) converges to E g(X). Let x be a continuity point of F, let ε > 0, and choose δ such that |F(y) − F(x)| < ε if |y − x| < δ. Choose g continuous such that g is one on (−∞, x], takes values between 0 and 1, and is 0 on [x + δ, ∞). Then

F_{X_n}(x) ≤ E g(X_n) → E g(X) ≤ F_X(x + δ) ≤ F(x) + ε.

Similarly, if h is a continuous function taking values between 0 and 1 that is 1 on (−∞, x − δ] and 0 on [x, ∞),

F_{X_n}(x) ≥ E h(X_n) → E h(X) ≥ F_X(x − δ) ≥ F(x) − ε.

Since ε is arbitrary, F_{X_n}(x) → F_X(x).

Now suppose X_n converges weakly to X. We start by making some observations. First, if we have a distribution function, it is increasing and so the number of points at which it has a discontinuity is at most countable. Second, if g is a continuous function on a closed bounded interval, it can be approximated uniformly on the interval by step functions. Using the uniform continuity of g on the interval, we may even choose the step function so that the places where it jumps are not in some pre-specified countable set. Our third observation is that

P(X < x) = lim_k P(X ≤ x − 1/k) = lim_{y↑x} F_X(y);

thus if F_X is continuous at x, then P(X = x) = F_X(x) − P(X < x) = 0.

Suppose g is bounded and continuous, and we want to prove that E g(X_n) → E g(X). By multiplying by a constant, we may suppose that g is bounded by 1. Let ε > 0 and choose M such that F_X(M) > 1 − ε and F_X(−M) < ε and so that M and −M are continuity points of F_X. We see that

P(X_n ≤ −M) = F_{X_n}(−M) → F_X(−M) < ε,

and so for large enough n, we have P(X_n ≤ −M) ≤ 2ε. Similarly, for large enough n, P(X_n > M) ≤ 2ε. Therefore for large enough n, E g(X_n) differs from E(g 1_{[−M,M]})(X_n) by at most 4ε. Also, E g(X) differs from E(g 1_{[−M,M]})(X) by at most 2ε.

Let h be a step function such that sup_{|x| ≤ M} |h(x) − g(x)| < ε and h is 0 outside of [−M, M]. We choose h so that the places where h jumps are continuity points of F and of all the F_n. Then E(g 1_{[−M,M]})(X_n) differs from E h(X_n) by at most ε, and the same when X_n is replaced by X.


More information

Measure-theoretic probability

Measure-theoretic probability Measure-theoretic probability Koltay L. VEGTMAM144B November 28, 2012 (VEGTMAM144B) Measure-theoretic probability November 28, 2012 1 / 27 The probability space De nition The (Ω, A, P) measure space is

More information

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents

MATH MEASURE THEORY AND FOURIER ANALYSIS. Contents MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure

More information

Independent random variables

Independent random variables CHAPTER 2 Independent random variables 2.1. Product measures Definition 2.1. Let µ i be measures on (Ω i,f i ), 1 i n. Let F F 1... F n be the sigma algebra of subsets of Ω : Ω 1... Ω n generated by all

More information

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries

Chapter 1. Measure Spaces. 1.1 Algebras and σ algebras of sets Notation and preliminaries Chapter 1 Measure Spaces 1.1 Algebras and σ algebras of sets 1.1.1 Notation and preliminaries We shall denote by X a nonempty set, by P(X) the set of all parts (i.e., subsets) of X, and by the empty set.

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017.

Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 2017 Nadia S. Larsen. 17 November 2017. Product measures, Tonelli s and Fubini s theorems For use in MAT4410, autumn 017 Nadia S. Larsen 17 November 017. 1. Construction of the product measure The purpose of these notes is to prove the main

More information

THEOREMS, ETC., FOR MATH 515

THEOREMS, ETC., FOR MATH 515 THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every

More information

Measurable functions are approximately nice, even if look terrible.

Measurable functions are approximately nice, even if look terrible. Tel Aviv University, 2015 Functions of real variables 74 7 Approximation 7a A terrible integrable function........... 74 7b Approximation of sets................ 76 7c Approximation of functions............

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

Martingale Theory for Finance

Martingale Theory for Finance Martingale Theory for Finance Tusheng Zhang October 27, 2015 1 Introduction 2 Probability spaces and σ-fields 3 Integration with respect to a probability measure. 4 Conditional expectation. 5 Martingales.

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

Brownian Motion and Conditional Probability

Brownian Motion and Conditional Probability Math 561: Theory of Probability (Spring 2018) Week 10 Brownian Motion and Conditional Probability 10.1 Standard Brownian Motion (SBM) Brownian motion is a stochastic process with both practical and theoretical

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

MATH 117 LECTURE NOTES

MATH 117 LECTURE NOTES MATH 117 LECTURE NOTES XIN ZHOU Abstract. This is the set of lecture notes for Math 117 during Fall quarter of 2017 at UC Santa Barbara. The lectures follow closely the textbook [1]. Contents 1. The set

More information

CHAPTER 1. Martingales

CHAPTER 1. Martingales CHAPTER 1 Martingales The basic limit theorems of probability, such as the elementary laws of large numbers and central limit theorems, establish that certain averages of independent variables converge

More information

1. Probability Measure and Integration Theory in a Nutshell

1. Probability Measure and Integration Theory in a Nutshell 1. Probability Measure and Integration Theory in a Nutshell 1.1. Measurable Space and Measurable Functions Definition 1.1. A measurable space is a tuple (Ω, F) where Ω is a set and F a σ-algebra on Ω,

More information

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define

HILBERT SPACES AND THE RADON-NIKODYM THEOREM. where the bar in the first equation denotes complex conjugation. In either case, for any x V define HILBERT SPACES AND THE RADON-NIKODYM THEOREM STEVEN P. LALLEY 1. DEFINITIONS Definition 1. A real inner product space is a real vector space V together with a symmetric, bilinear, positive-definite mapping,

More information

Lecture 21: Expectation of CRVs, Fatou s Lemma and DCT Integration of Continuous Random Variables

Lecture 21: Expectation of CRVs, Fatou s Lemma and DCT Integration of Continuous Random Variables EE50: Probability Foundations for Electrical Engineers July-November 205 Lecture 2: Expectation of CRVs, Fatou s Lemma and DCT Lecturer: Krishna Jagannathan Scribe: Jainam Doshi In the present lecture,

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

µ X (A) = P ( X 1 (A) )

µ X (A) = P ( X 1 (A) ) 1 STOCHASTIC PROCESSES This appendix provides a very basic introduction to the language of probability theory and stochastic processes. We assume the reader is familiar with the general measure and integration

More information

Lecture 4 Lebesgue spaces and inequalities

Lecture 4 Lebesgue spaces and inequalities Lecture 4: Lebesgue spaces and inequalities 1 of 10 Course: Theory of Probability I Term: Fall 2013 Instructor: Gordan Zitkovic Lecture 4 Lebesgue spaces and inequalities Lebesgue spaces We have seen how

More information

1. Stochastic Processes and filtrations

1. Stochastic Processes and filtrations 1. Stochastic Processes and 1. Stoch. pr., A stochastic process (X t ) t T is a collection of random variables on (Ω, F) with values in a measurable space (S, S), i.e., for all t, In our case X t : Ω S

More information

Part III Advanced Probability

Part III Advanced Probability Part III Advanced Probability Based on lectures by M. Lis Notes taken by Dexter Chua Michaelmas 2017 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after

More information

Chapter 4. Measure Theory. 1. Measure Spaces

Chapter 4. Measure Theory. 1. Measure Spaces Chapter 4. Measure Theory 1. Measure Spaces Let X be a nonempty set. A collection S of subsets of X is said to be an algebra on X if S has the following properties: 1. X S; 2. if A S, then A c S; 3. if

More information

Basics of Stochastic Analysis

Basics of Stochastic Analysis Basics of Stochastic Analysis c Timo Seppäläinen This version November 16, 214 Department of Mathematics, University of Wisconsin Madison, Madison, Wisconsin 5376 E-mail address: seppalai@math.wisc.edu

More information

4th Preparation Sheet - Solutions

4th Preparation Sheet - Solutions Prof. Dr. Rainer Dahlhaus Probability Theory Summer term 017 4th Preparation Sheet - Solutions Remark: Throughout the exercise sheet we use the two equivalent definitions of separability of a metric space

More information

MATH/STAT 235A Probability Theory Lecture Notes, Fall 2013

MATH/STAT 235A Probability Theory Lecture Notes, Fall 2013 MATH/STAT 235A Probability Theory Lecture Notes, Fall 2013 Dan Romik Department of Mathematics, UC Davis December 30, 2013 Contents Chapter 1: Introduction 6 1.1 What is probability theory?...........................

More information

ECON 2530b: International Finance

ECON 2530b: International Finance ECON 2530b: International Finance Compiled by Vu T. Chau 1 Non-neutrality of Nominal Exchange Rate 1.1 Mussa s Puzzle(1986) RER can be defined as the relative price of a country s consumption basket in

More information

Probability Theory II. Spring 2016 Peter Orbanz

Probability Theory II. Spring 2016 Peter Orbanz Probability Theory II Spring 2016 Peter Orbanz Contents Chapter 1. Martingales 1 1.1. Martingales indexed by partially ordered sets 1 1.2. Martingales from adapted processes 4 1.3. Stopping times and

More information

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539

Brownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539 Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory

More information

9 Brownian Motion: Construction

9 Brownian Motion: Construction 9 Brownian Motion: Construction 9.1 Definition and Heuristics The central limit theorem states that the standard Gaussian distribution arises as the weak limit of the rescaled partial sums S n / p n of

More information

On the convergence of sequences of random variables: A primer

On the convergence of sequences of random variables: A primer BCAM May 2012 1 On the convergence of sequences of random variables: A primer Armand M. Makowski ECE & ISR/HyNet University of Maryland at College Park armand@isr.umd.edu BCAM May 2012 2 A sequence a :

More information

Chapter 4. The dominated convergence theorem and applications

Chapter 4. The dominated convergence theorem and applications Chapter 4. The dominated convergence theorem and applications The Monotone Covergence theorem is one of a number of key theorems alllowing one to exchange limits and [Lebesgue] integrals (or derivatives

More information

Combinatorics in Banach space theory Lecture 12

Combinatorics in Banach space theory Lecture 12 Combinatorics in Banach space theory Lecture The next lemma considerably strengthens the assertion of Lemma.6(b). Lemma.9. For every Banach space X and any n N, either all the numbers n b n (X), c n (X)

More information

LEBESGUE INTEGRATION. Introduction

LEBESGUE INTEGRATION. Introduction LEBESGUE INTEGATION EYE SJAMAA Supplementary notes Math 414, Spring 25 Introduction The following heuristic argument is at the basis of the denition of the Lebesgue integral. This argument will be imprecise,

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define 1 Measures 1.1 Jordan content in R N II - REAL ANALYSIS Let I be an interval in R. Then its 1-content is defined as c 1 (I) := b a if I is bounded with endpoints a, b. If I is unbounded, we define c 1

More information

Measures and Measure Spaces

Measures and Measure Spaces Chapter 2 Measures and Measure Spaces In summarizing the flaws of the Riemann integral we can focus on two main points: 1) Many nice functions are not Riemann integrable. 2) The Riemann integral does not

More information

Probability Theory I: Syllabus and Exercise

Probability Theory I: Syllabus and Exercise Probability Theory I: Syllabus and Exercise Narn-Rueih Shieh **Copyright Reserved** This course is suitable for those who have taken Basic Probability; some knowledge of Real Analysis is recommended( will

More information

Advanced Probability

Advanced Probability Advanced Probability Perla Sousi October 10, 2011 Contents 1 Conditional expectation 1 1.1 Discrete case.................................. 3 1.2 Existence and uniqueness............................ 3 1

More information

Lebesgue measure and integration

Lebesgue measure and integration Chapter 4 Lebesgue measure and integration If you look back at what you have learned in your earlier mathematics courses, you will definitely recall a lot about area and volume from the simple formulas

More information

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that

More information

3. DISCRETE RANDOM VARIABLES

3. DISCRETE RANDOM VARIABLES IA Probability Lent Term 3 DISCRETE RANDOM VARIABLES 31 Introduction When an experiment is conducted there may be a number of quantities associated with the outcome ω Ω that may be of interest Suppose

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Compendium and Solutions to exercises TMA4225 Foundation of analysis

Compendium and Solutions to exercises TMA4225 Foundation of analysis Compendium and Solutions to exercises TMA4225 Foundation of analysis Ruben Spaans December 6, 2010 1 Introduction This compendium contains a lexicon over definitions and exercises with solutions. Throughout

More information

FE 5204 Stochastic Differential Equations

FE 5204 Stochastic Differential Equations Instructor: Jim Zhu e-mail:zhu@wmich.edu http://homepages.wmich.edu/ zhu/ January 20, 2009 Preliminaries for dealing with continuous random processes. Brownian motions. Our main reference for this lecture

More information

Probability and Measure

Probability and Measure Probability and Measure Martin Orr 19th January 2009 0.1 Introduction This is a set of notes for the Probability and Measure course in Part II of the Cambridge Mathematics Tripos. They were written as

More information