Preliminaries. Probability space


1 Preliminaries

This section revises some parts of Core A Probability which are essential for this course, and lists some other mathematical facts to be used (without proof) in what follows.

Probability space

We recall that a sample space Ω is the collection of all possible outcomes of a probabilistic experiment; an event is a collection of possible outcomes, i.e., a subset of the sample space. We introduce the impossible event ∅ and the certain event Ω; also, if A ⊆ Ω and B ⊆ Ω are events, it is natural to consider other events such as A ∪ B (A or B), A ∩ B (A and B), A^c = Ω \ A (not A), and A \ B (A but not B).

Definition 0.1. Let A be a collection of subsets of Ω. We shall call A a field if it has the following properties:
1. ∅ ∈ A;
2. if A_1, A_2 ∈ A, then A_1 ∪ A_2 ∈ A;
3. if A ∈ A, then A^c ∈ A.

Remark. Obviously, every field is closed with respect to taking finite unions or intersections.

Definition 0.2. Let F be a collection of subsets of Ω. We shall call F a σ-field if it has the following properties:
1. ∅ ∈ F;
2. if A_1, A_2, ... ∈ F, then ∪_{k=1}^∞ A_k ∈ F;
3. if A ∈ F, then A^c ∈ F.

Remark. Obviously, property 2 above can be replaced by the equivalent condition ∩_{k=1}^∞ A_k ∈ F. Clearly, if Ω is fixed, the smallest σ-field in Ω is just {∅, Ω} and the biggest σ-field consists of all subsets of Ω.

We observe the following simple fact:

Exercise 0.3. Show that if F_1 and F_2 are σ-fields, then F_1 ∩ F_2 is a σ-field (and, in fact, so is the intersection of an arbitrary, even uncountable, collection of σ-fields), but, in general, F_1 ∪ F_2 is not a σ-field.

If A and B are events, we say that A and B are incompatible (or disjoint) if A ∩ B = ∅.

Definition 0.4. Let Ω be a sample space, and let F be a σ-field of events in Ω. A probability distribution P on (Ω, F) is a collection of numbers P(A), A ∈ F, possessing the following properties:

A1 for every event A ∈ F, P(A) ≥ 0;
A2 P(Ω) = 1;
A3 for any pair of incompatible events A and B, P(A ∪ B) = P(A) + P(B);
A4 for any countable collection A_1, A_2, ... of mutually incompatible events (i.e., A_k ∩ A_j = ∅ for all k ≠ j),
P(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ P(A_k).

Remark. Notice that the additivity axiom A4 above does not extend to uncountable collections of incompatible events.

Remark. Obviously, property A4 above and Definition 0.2 are nontrivial only in examples with infinitely many different events, i.e., when the collection F of all events (and, therefore, the sample space Ω) is infinite.

The following properties are immediate from the above axioms:

P1 for any pair of events A, B in Ω we have P(B \ A) = P(B) − P(A ∩ B) and P(A ∪ B) = P(A) + P(B \ A); in particular, P(A^c) = 1 − P(A);
P2 if events A, B in Ω are such that A ⊆ B ⊆ Ω, then 0 = P(∅) ≤ P(A) ≤ P(B) ≤ P(Ω) = 1;
P3 if A_1, A_2, ..., A_n are events in Ω, then P(∪_{k=1}^n A_k) ≤ Σ_{k=1}^n P(A_k), with the inequality becoming an equality if these events are mutually incompatible.

Definition 0.5. A probability space is a triple (Ω, F, P), where Ω is a sample space, F is a σ-field of events in Ω, and P(·) is a probability measure on (Ω, F).

In what follows we shall always assume that some probability space (Ω, F, P) is fixed.
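The axioms A1–A4 and the properties P1–P3 can be checked directly on a small finite example. The following Python sketch (not part of the original notes; the die-roll sample space and the two events are chosen purely for illustration) verifies P1 and P3 for the uniform measure on a six-point sample space.

```python
from fractions import Fraction

# Uniform probability space for one roll of a fair die (illustrative choice).
omega = {1, 2, 3, 4, 5, 6}
P = lambda A: Fraction(len(A & omega), len(omega))   # P(A) = |A| / |Omega|

A = {2, 4, 6}          # "even"
B = {4, 5, 6}          # "at least four"

# P1: P(B \ A) = P(B) - P(A n B),  P(A u B) = P(A) + P(B \ A),  P(A^c) = 1 - P(A)
assert P(B - A) == P(B) - P(A & B)
assert P(A | B) == P(A) + P(B - A)
assert P(omega - A) == 1 - P(A)

# P3 (two events): P(A u B) <= P(A) + P(B), equality only when A and B are disjoint
assert P(A | B) <= P(A) + P(B)
print(P(A | B), P(A) + P(B))     # 2/3 versus 1: strict inequality here, since A n B is non-empty
```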

Conditional probability, independence

Definition 0.6. The conditional probability of event A given event B with P(B) > 0 is
P(A | B) := P(A ∩ B) / P(B).

It is easy to see that if E ∈ F is any event with P(E) > 0, then P(· | E) is a probability measure on (Ω, F), i.e., axioms A1–A4 and properties P1–P3 hold (with P(·) replaced by P(· | E)). We list some additional useful properties of conditional probabilities:

P4 multiplication rule for probabilities: if A and B are events, then
P(A ∩ B) = P(A) P(B | A) = P(B) P(A | B);
more generally, if A_1, ..., A_n are arbitrary events in F, then
P(∩_{k=1}^n A_k) = P(A_1) ∏_{k=2}^n P(A_k | ∩_{j=1}^{k−1} A_j);   (0.1)
for example, P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B).

P5 partition theorem, or formula of total probability: we say that events B_1, ..., B_n form a partition of Ω if they are mutually incompatible (disjoint) and their union ∪_{k=1}^n B_k is the entire space Ω. The partition theorem says that if B_1, ..., B_n form a partition of Ω, then for any event A we have
P(A) = Σ_{k=1}^n P(B_k) P(A | B_k).   (0.2)

P6 Bayes' theorem: for any events A, B with P(B) > 0, we have
P(A | B) = P(A) P(B | A) / P(B);
in particular, if D is an event and C_1, ..., C_n form a partition of Ω, then
P(C_k | D) = P(C_k) P(D | C_k) / Σ_{j=1}^n P(C_j) P(D | C_j).   (0.3)

Exercise 0.7. Check carefully (i.e., by induction) property P4 above.

The next definition is one of the most important in probability theory.

Definition 0.8. We say that events A and B are independent if
P(A ∩ B) = P(A) P(B);   (0.4)
under (0.4), we have P(A | B) = P(A), i.e., event A is independent of B; similarly, P(B | A) = P(B), i.e., event B is independent of A.

More generally:

Definition 0.9. A collection of events A_1, ..., A_n is called (mutually) independent if for every non-empty subset S ⊆ {1, ..., n}
P(∩_{k∈S} A_k) = ∏_{k∈S} P(A_k).   (0.5)

It is immediate from (0.5) that every sub-collection of {A_1, ..., A_n} is also mutually independent.
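Returning to P5 and P6 above, the following Python sketch (the two-machine setting and all the numbers are invented purely for illustration) applies the partition theorem (0.2) and Bayes' theorem (0.3) to a two-element partition C_1, C_2 of Ω.

```python
# Hypothetical example: an item is produced by machine C_1 or C_2, and D is
# the event "the item is defective".  All probabilities below are made up.

P_C = [0.6, 0.4]            # P(C_1), P(C_2): which machine produced the item
P_D_given_C = [0.02, 0.05]  # P(D | C_k)

# (0.2): P(D) = sum_k P(C_k) P(D | C_k)
P_D = sum(pc * pd for pc, pd in zip(P_C, P_D_given_C))

# (0.3): P(C_k | D) = P(C_k) P(D | C_k) / P(D)
posterior = [pc * pd / P_D for pc, pd in zip(P_C, P_D_given_C)]

print(P_D)        # 0.032
print(posterior)  # [0.375, 0.625]; the posterior probabilities sum to one
```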

Random variables

It is very common for the sample space Ω of possible outcomes to be a set of real numbers. Then the outcome of the probabilistic experiment is often called a random variable and denoted by a capital letter such as X. In this case the events are subsets A ⊆ R and it is usual to write P(X ∈ A) instead of P(A), and similarly P(X = 1) for P({1}), P(1 < X < 5) for P(A) with A = (1, 5), and so on. The probability distribution of a r.v. X is the collection of probabilities P(X ∈ A) for all intervals A ⊆ R (and other events that can be obtained from intervals via axioms A1–A4).

Let X be a random variable (so the sample space Ω is a subset of R). We say that X is a discrete r.v. if, in addition, Ω is countable, i.e., if the possible values of X can be enumerated in a (possibly infinite) list. In this case the function p(x) := P(X = x) (defined for all real x) is called the probability mass function of X, and the corresponding probability distribution of X is given by
P(X ∈ A) = Σ_{x∈A} P(X = x) = Σ_{x∈A} p(x).
If X takes possible values x_1, x_2, ..., then, by axioms A2 and A4, Σ_{k≥1} p(x_k) = 1, and if x is NOT one of the possible values of X then p(x) = 0.

Similarly, a random variable X has a continuous probability distribution if there exists a non-negative function f(x) on R such that for any interval (a, b) ⊆ R
P(a < X < b) = ∫_a^b f(x) dx;
in particular, we must have ∫_{−∞}^{∞} f(x) dx = 1. The function f(·) is then called the probability density function (or pdf) of X.

In Core A Probability you saw a number of random variables with discrete (Bernoulli, binomial, geometric, Poisson) or continuous (uniform, exponential, normal) distributions.

Definition. For any random variable X, the cumulative distribution function (or cdf) of X is the function F : R → [0, 1] given at all x ∈ R by
F(x) := P(X ≤ x) = ∫_{−∞}^x f(y) dy for a continuous r.v. X, and F(x) = Σ_{x_k ≤ x} p(x_k) for a discrete r.v. X.   (0.6)

If, in addition, f(x) is a continuous function on some interval (a, b), then by the fundamental theorem of calculus, for all x ∈ (a, b), F'(x) = f(x); i.e., the cdf determines the pdf and vice versa. In fact, the cdf of a r.v. X always determines its probability distribution.

Remark. Suppose X is a random variable and h is some real-valued function defined for all real numbers. Then h(X) is also a random variable, namely, the outcome of a new experiment obtained by running the old experiment to produce the r.v. X and then evaluating h(X).
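As a small numerical illustration (not part of the notes; the rate λ = 2 is an arbitrary choice), the following Python sketch checks the pdf/cdf relation (0.6) for an exponential random variable.

```python
import math

# For X ~ Exp(lam): f(x) = lam * exp(-lam * x),  F(x) = 1 - exp(-lam * x),  x >= 0.
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)
F = lambda x: 1.0 - math.exp(-lam * x)

# P(a < X < b) = integral of f over (a, b) = F(b) - F(a): check by a crude midpoint sum
a, b, n = 0.5, 1.5, 100_000
h = (b - a) / n
integral = sum(f(a + (i + 0.5) * h) for i in range(n)) * h
print(integral, F(b) - F(a))        # both ~ 0.318, i.e. e^{-1} - e^{-3}

# F'(x) ~ f(x): a finite-difference check of the fundamental theorem of calculus
x, eps = 0.7, 1e-6
print((F(x + eps) - F(x - eps)) / (2 * eps), f(x))
```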

Joint distributions

It is essential for most useful applications of probability to have a theory which can handle many random variables simultaneously.

Definition. Let (X_1, ..., X_n) be a multivariate random variable (or random vector). Its cumulative distribution function is
F_{X_1,...,X_n}(x_1, ..., x_n) := P(X_1 ≤ x_1, ..., X_n ≤ x_n);   (0.7)
here and below we write {X_1 ≤ x_1, ..., X_n ≤ x_n} = {X_1 ≤ x_1} ∩ ... ∩ {X_n ≤ x_n}.

Bivariate variables: discrete case

Suppose (X, Y) is a bivariate r.v. and that X and Y are discrete r.v.'s taking possible values x_1, x_2, ... and y_1, y_2, ... respectively. Then the collection of probabilities p(x_j, y_k) = P(X = x_j, Y = y_k), j ≥ 1, k ≥ 1, determines the joint probability distribution of (X, Y). It is important to remember that, given the joint distribution of X and Y, we can recover the probability mass function p_X of X (in this context called the marginal probability distribution of X) via
p_X(x_j) = P(X = x_j) = Σ_k P(X = x_j, Y = y_k) = Σ_k p(x_j, y_k)   (0.8)
for any possible value x_j of X. Similarly, the marginal probability distribution of Y is given by p_Y(y_k) = Σ_j P(X = x_j, Y = y_k) = Σ_j p(x_j, y_k).

Conditional distribution and independence

For any discrete bivariate r.v. (X, Y), the conditional distribution of X given Y has probability mass function
p(x | y) = P(X = x | Y = y) = p(x, y) / p_Y(y)
for all y with p_Y(y) > 0. There is also a r.v. version of the partition theorem (0.2); it is often called the law of total probability: for any X-event A,
P(X ∈ A) = Σ_y P(X ∈ A | Y = y) p_Y(y).   (0.9)

We say that X and Y are independent if for all x, y
p(x, y) = p_X(x) p_Y(y).   (0.10)
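A quick Python sketch (the 2 × 2 joint pmf below is invented for illustration) of how the marginals (0.8) are obtained from a joint table and how the factorisation test (0.10) can fail:

```python
import numpy as np

# Toy joint pmf for a discrete pair (X, Y): rows are the x-values, columns the y-values.
p = np.array([[0.10, 0.20],
              [0.30, 0.40]])

p_X = p.sum(axis=1)             # marginal of X: sum over the y-values, as in (0.8)
p_Y = p.sum(axis=0)             # marginal of Y: sum over the x-values
print(p_X, p_Y)                 # [0.3 0.7], [0.4 0.6]

# X and Y are independent iff p(x_j, y_k) = p_X(x_j) p_Y(y_k) for all j, k
print(np.allclose(p, np.outer(p_X, p_Y)))   # False for this particular table
```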

Alternatively, we have:

Definition. Random variables X, Y are independent if for every X-event A and every Y-event B we have
P(X ∈ A, Y ∈ B) ≡ P((X, Y) ∈ A × B) = P(X ∈ A) P(Y ∈ B).   (0.11)

The definitions (0.10), (0.11) can easily be extended to the case of a general multivariate distribution.

Let (X_1, ..., X_n) be a random vector and g : R^n → R be a function. Then g(X_1, ..., X_n) is a random variable (obtained by the new experiment consisting of first carrying out the original experiment to determine the value of (X_1, ..., X_n) and then applying the function g to this ordered n-tuple to obtain the real number g(X_1, ..., X_n)).

Exercise. 1) Let (X, Y, Z) be a random vector with independent components; show that for any function h : R^2 → R the variables h(X, Y) and Z are independent. 2) Let X_1, ..., X_k and Y_1, ..., Y_m be a collection of independent random variables. If the functions f and g are such that f : R^k → R and g : R^m → R, show that the random variables f(X_1, ..., X_k) and g(Y_1, ..., Y_m) are independent.

Bivariate variables: continuous case

We will only consider the case where (X, Y) has a continuous joint pdf f(x, y) defined for (x, y) ∈ R^2. By analogy with the definition for discrete random variables,
P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy
for any integrable set A ⊆ R^2. In this case X and Y have the marginal pdfs
f_X(x) = ∫_{−∞}^{∞} f(x, y) dy,   f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx,
and for any interval (a, b) we have
P(a < X < b) = ∫_a^b ∫_{−∞}^{∞} f(x, y) dy dx = ∫_a^b f_X(x) dx.

We define the continuous conditional density of X given Y by
f(x | y) = f(x, y)/f_Y(y) if f_Y(y) > 0, and f(x | y) = 0 if f_Y(y) = 0.

Also, X and Y are independent if and only if f(x, y) = f_X(x) f_Y(y) for every pair (x, y) ∈ R^2. Transformations g(X, Y) in the continuous case are treated similarly to the discrete case.

Expectation

Definition. For any random variable X, the expected value (or mean) of X is the number
E(X) = Σ_k x_k p(x_k) for X discrete with pmf p, and E(X) = ∫_{−∞}^{∞} x f(x) dx for X continuous with pdf f.   (0.12)

The following generalisation of this definition is of great importance to the whole theory. If X is a discrete r.v. taking values in Ω = {x_1, x_2, ...} with probabilities p(x_k), and the transformed r.v. g(X) takes values y_1, y_2, ... with probabilities
q(y_m) := P(X ∈ G_m) = Σ_{x∈G_m} p(x),  where  G_m := {x ∈ Ω : g(x) = y_m},
then the sets G_m form a partition of Ω and it follows that
E(g(X)) = Σ_m y_m q(y_m) = Σ_m Σ_{x∈G_m} g(x) p(x) = Σ_k g(x_k) p(x_k).
Similarly, if X is a continuous r.v. with pdf f, then E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx.

The most important properties of the expectation are:

E1 linearity: let f, g be real functions and let a, b be real numbers; then
E(a f(X) + b g(X)) = a E(f(X)) + b E(g(X)),   (0.13)
provided the corresponding expectations exist.

E2 monotonicity: if h(x) ≥ 0 for all real x, then E(h(X)) ≥ 0; in other words, if the real functions f, g are such that f(x) ≤ g(x) for all real x, then
E(f(X)) ≤ E(g(X)),   (0.14)
provided the corresponding expectations exist.

Recall three important special cases: the variance Var(X) of a r.v. X, its r-th moment E(X^r), and its moment generating function M_X(t):
Var(X) := E(X − E(X))^2,   M_X(t) := E(e^{tX}).

Exercise. Let X be a r.v., and let g : R → [0, ∞] be an increasing function such that E(g(X)) < ∞. Show that for any real a with g(a) > 0 one has
P(X > a) ≤ E(g(X)) / g(a).   (0.15)
In particular, P(X > a) ≤ E(exp{λ(X − a)}) for any real a and any λ > 0. Notice that the Markov inequality and the Chebyshev inequality are special cases of (0.15).
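For the record, here is how the two classical inequalities follow from (0.15); this short derivation is not spelled out in the notes. For Markov's inequality, let X ≥ 0 and a > 0, and take g(x) = max(x, 0), which is increasing with g(a) = a > 0 and E(g(X)) = E(X); then (0.15) gives P(X > a) ≤ E(X)/a. For Chebyshev's inequality, apply (0.15) to the non-negative variable |X − E(X)| with g(x) = x², increasing on [0, ∞): for any a > 0,
P(|X − E(X)| > a) ≤ E|X − E(X)|² / a² = Var(X) / a².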

Multivariate case

In the multivariate case, the expectation is defined similarly and has properties analogous to those considered above. Additionally, we mention two other properties:

E3 multivariate linearity: let (X_1, ..., X_n) be a random vector, g_1, ..., g_n be real functions, and a_1, ..., a_n be real numbers. Then
E(Σ_{k=1}^n a_k g_k(X_k)) = Σ_{k=1}^n a_k E(g_k(X_k)).   (0.16)

E4 independence: if X_1, ..., X_n are independent r.v.'s, so that their joint pmf/pdf factorises,
p_{X_1,...,X_n}(x_1, ..., x_n) = ∏_{k=1}^n p_{X_k}(x_k),
then for all real functions g_1, ..., g_n one has
E(∏_{k=1}^n g_k(X_k)) = ∏_{k=1}^n E(g_k(X_k)).   (0.17)

We say that the variables X and Y are uncorrelated if their covariance,
Cov(X, Y) := E((X − E(X))(Y − E(Y))) ≡ E(XY) − E(X) E(Y),   (0.18)
vanishes, Cov(X, Y) = 0. In particular, any pair of independent variables is uncorrelated.

By the linearity property E3, the variance Var(Σ_{k=1}^n X_k) of the sum of r.v.'s X_1, ..., X_n equals
Var(Σ_{k=1}^n X_k) = Σ_{k=1}^n Var(X_k) + 2 Σ_{k<l} Cov(X_k, X_l).
Thus, if the variables X_1, ..., X_n are pairwise uncorrelated (in particular, independent), then
Var(Σ_{k=1}^n X_k) = Σ_{k=1}^n Var(X_k).   (0.19)
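The variance-of-a-sum identity and (0.19) are easy to check by simulation; in the following Python sketch (the distributions are arbitrary illustrative choices), Y is built to be correlated with X, while Z is independent of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)       # correlated with X
Z = rng.exponential(size=n)            # independent of X

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
lhs = np.var(X + Y)
rhs = np.var(X) + np.var(Y) + 2 * np.cov(X, Y)[0, 1]
print(lhs, rhs)                        # both close to 3.25

# (0.19) for the uncorrelated (here independent) pair (X, Z)
print(np.var(X + Z), np.var(X) + np.var(Z))   # both close to 2
```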

Conditional expectation

Let X be a discrete r.v. on a sample space Ω, and let A ⊆ Ω be an event. The conditional expectation of X given A is the number E(X | A) defined by
E(X | A) = Σ_x x P(X = x | A),   (0.20)
where the sum runs over all possible values of X. In particular, we have the partition theorem for expectation: if events B_1, ..., B_n form a partition of the sample space Ω, then
E(X) = Σ_{k=1}^n E(X | B_k) P(B_k).

Using the definition (0.20), it is immediate to compute E(X | Y = y); we recall that E(X | Y) is then a random variable such that E(E(X | Y)) = E(X).

Limiting results

Theorem 0.16 (Law of Large Numbers). Let X_1, ..., X_n be i.i.d. (independent, identically distributed) r.v.'s with E(X_k) = µ and Var(X_k) = σ². Denote S_n := Σ_{k=1}^n X_k. Then for any fixed a > 0,
P(|n^{−1} S_n − µ| > a) → 0   (0.21)
as n → ∞.

Theorem 0.17 (Central Limit Theorem). Under the conditions of the previous theorem, denote
S_n* := (S_n − nµ)/√Var(S_n) ≡ (S_n − nµ)/(σ√n).
Then, as n → ∞, the distribution of S_n* converges to that of the standard Gaussian random variable (i.e., N(0, 1)): for every fixed a ∈ R,
P(S_n* ≤ a) → ∫_{−∞}^a (1/√(2π)) e^{−y²/2} dy.   (0.22)
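The two limit theorems are easy to visualise numerically. The following Python sketch (with U(0, 1) summands, an arbitrary choice, so that µ = 1/2 and σ² = 1/12) estimates the probabilities appearing in (0.21) and (0.22) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 10_000
mu, sigma = 0.5, (1 / 12) ** 0.5

S = rng.random((reps, n)).sum(axis=1)        # 'reps' independent copies of S_n

# LLN (0.21): P(|S_n/n - mu| > 0.03) is already tiny for n = 1000
print(np.mean(np.abs(S / n - mu) > 0.03))    # ~ 0.001

# CLT (0.22): P(S_n* <= 1) should be close to Phi(1) ~ 0.8413
S_star = (S - n * mu) / (sigma * n ** 0.5)
print(np.mean(S_star <= 1.0))
```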

Moment generating functions

As mentioned before, the moment generating function (or mgf) of a r.v. X is defined via
M_X(t) := E(e^{tX}).   (0.23)
We finish by listing several useful properties of mgf's.

M1 For each positive integer r,
E(X^r) = (d^r M_X / dt^r)(0).

M2 [uniqueness] The mgf M_X(t) of X uniquely determines the probability distribution of X, provided that M_X(t) is finite in some neighbourhood of the origin.

M3 [linear transformation] If X has mgf M_X(t) and Y = aX + b, then M_Y(t) = e^{bt} M_X(at).

M4 [independence] Suppose that X_1, ..., X_n are independent r.v.'s and let Y = Σ_{k=1}^n X_k. Then
M_Y(t) = ∏_{k=1}^n M_{X_k}(t).

M5 [convergence] Suppose that Y_1, Y_2, ... is an infinite sequence of r.v.'s and that Y is a further random variable. Suppose that M_Y(t) is finite for |t| < a for some positive a, and that for all t ∈ (−a, a)
M_{Y_n}(t) → M_Y(t) as n → ∞.
Then, as n → ∞,
P(Y_n ≤ c) → P(Y ≤ c)
for all real c such that P(Y = c) = 0.
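As a quick illustration of M2–M4 (not in the original notes), recall that for X ~ N(µ, σ²) one has M_X(t) = exp{µt + σ²t²/2}. If X_1 ~ N(µ_1, σ_1²) and X_2 ~ N(µ_2, σ_2²) are independent, then by M4,
M_{X_1+X_2}(t) = exp{(µ_1 + µ_2)t + (σ_1² + σ_2²)t²/2},
which, by the uniqueness property M2, is the mgf of N(µ_1 + µ_2, σ_1² + σ_2²); so the sum of independent normal variables is again normal. Similarly, M3 shows that aX_1 + b has mgf exp{(aµ_1 + b)t + a²σ_1²t²/2}, i.e., aX_1 + b ~ N(aµ_1 + b, a²σ_1²).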

1 Sequences of events and their limits

1.1 Monotone sequences of events

Sequences of events arise naturally when a probabilistic experiment is repeated many times. For example, if a coin is flipped consecutively, the event A = {heads never seen} is just the intersection, A = ∩_{n≥1} A_n, of the events A_n = {heads not seen in the first n tosses}. (A priori we do not even know that A is an event, i.e., that it can be assigned a probability!) This simple remark leads to the following important observations: a) taking countable operations is not that exotic in probabilistic models, and thus any reasonable theory should deal with σ-fields; b) the event A is in some sense the limit of the sequence (A_n)_{n≥1}, so understanding limits of sequences of sets (events) might be useful.

In general, finding the limit of a sequence of sets is not easy and we will not do this here (the corresponding theory is the subject of pure courses such as set theory or (real) analysis/measure theory; if interested, have a look at problems E26–E28 and/or get in touch). Instead, we will mostly consider monotone sequences of events.

Definition 1.1. A sequence (A_n)_{n≥1} of events is increasing if A_n ⊆ A_{n+1} for all n ≥ 1. It is decreasing if A_n ⊇ A_{n+1} for all n ≥ 1.

Example 1.2. If (A_n)_{n≥1} is a sequence of arbitrary events, then the sequence (B_n)_{n≥1} with B_n = ∪_{k=1}^n A_k is increasing, whereas the sequence (C_n)_{n≥1} with C_n = ∩_{k=1}^n A_k is decreasing.

The following result shows that the probability measure is continuous along monotone sequences of events.

Lemma 1.3. If (A_n)_{n≥1} is increasing with A = lim_n A_n = ∪_{n≥1} A_n, then
P(A) = P(lim_n A_n) = lim_n P(A_n).
If (A_n)_{n≥1} is decreasing with A = lim_n A_n = ∩_{n≥1} A_n, then
P(A) = P(lim_n A_n) = lim_n P(A_n).

Remark. If (A_n)_{n≥1} is not a monotone sequence of events, the claim of the lemma is not necessarily true (find a counterexample!).

Proof. Let (A_n)_{n≥1} be increasing with A = ∪_{n≥1} A_n. Denote C_1 = A_1 and, for n ≥ 2, put C_n = A_n \ A_{n−1} = A_n ∩ A^c_{n−1}. We then have (why?)
A_n = ∪_{k=1}^n A_k = ∪_{k=1}^n C_k,   A = ∪_{k≥1} A_k = ∪_{k≥1} C_k.
(Decompositions of the form A_n = ∪_{k=1}^n (A_k \ ∪_{m=1}^{k−1} A_m) are often called telescopic; they are analogous to those in sequential Bayes formulae.)

Since the events (C_k)_{k≥1} are mutually incompatible, the σ-additivity of the probability measure gives
P(A) = P(∪_{k≥1} A_k) = P(∪_{k≥1} C_k) = Σ_{k≥1} P(C_k),
and therefore
0 ≤ P(A) − P(A_n) = P(A \ A_n) = P(∪_{k>n} C_k) = Σ_{k>n} P(C_k) → 0
as n → ∞, as a tail sum of the convergent series Σ_{k≥1} P(C_k). A similar argument works for decreasing sequences (do this!).

Example 1.4. A standard six-sided die is tossed repeatedly. Let N_1 denote the total number of ones observed. Assuming that the individual outcomes are independent, show that P(N_1 = ∞) = 1.

Solution. We show that P(N_1 < ∞) = 0. First, notice that {N_1 < ∞} = ∪_{n≥1} B_n with B_n = {no ones after the nth toss}, so it is enough to show that P(B_n) = 0 for all n. However, B_n = ∩_{m>0} C_{n,m} with C_{n,m} = {no one on tosses n+1, ..., n+m} being a decreasing sequence, C_{n,m} ⊇ C_{n,m+1} for all m ≥ 1. Since P(C_{n,m}) = (5/6)^m → 0 as m → ∞, Lemma 1.3 implies P(B_n) = lim_m P(C_{n,m}) = 0, as required.

Example 1.5. Let X be a positive random variable with P(X < ∞) = 1. For k ≥ 1, denote X_k = X/k. Show that the event A(ε) = {X_k > ε finitely often} satisfies P(A(ε)) = 1 for every ε > 0.

Solution. Let Ω_0 = {ω ∈ Ω : X(ω) < ∞} be the event "X is finite"; by assumption, P(Ω_0) = 1. Consider the events B_k = {X_k > ε} = {ω : X(ω) > kε}. Since the random variables X_k form a pointwise decreasing sequence, i.e., X_k(ω) ≥ X_{k+1}(ω) for all ω ∈ Ω and all k ≥ 1, the events B_k decrease (i.e., B_k ⊇ B_{k+1} for all k ≥ 1) towards {X = ∞}; we deduce that A(ε) = {B_k finitely often} ⊇ Ω_0, and hence P(A(ε)) = 1.

Remark. The previous argument shows that the event {ω : X_k(ω) → 0} coincides with ∩_{ε>0} A(ε) ⊇ Ω_0; in other words, the sequence of random variables X_k converges (to zero) with probability one (or almost surely), P(X_k → 0) = 1.

1.2 Borel-Cantelli lemma

Let (A_k)_{k≥1} be an infinite sequence of events from some probability space (Ω, F, P). One is often interested in finding out how many of the events A_n occur. (E.g., some results in Number Theory about rational approximations of irrational numbers are formulated in a form similar to Lemma 1.6!) The event that infinitely many of the events A_n occur, written {A_n i.o.} or {A_n infinitely often}, is
{A_n i.o.} = ∩_{n≥1} ∪_{k≥n} A_k.   (1.1)

The next result is very important for applications. Its proof uses the intrinsic monotonicity structure of the definition (1.1).

Lemma 1.6 (Borel-Cantelli lemma). Let A = ∩_{n≥1} ∪_{k≥n} A_k be the event that infinitely many of the A_n occur. Then:
a) If Σ_k P(A_k) < ∞, then P(A) = 0, i.e., with probability one only finitely many of the A_k occur.
b) If Σ_k P(A_k) = ∞ and A_1, A_2, ... are independent events, then P(A) = 1.

Remark. The independence condition in part b) above cannot be relaxed. Otherwise, let A_n ≡ E for all n ≥ 1, where E ∈ F satisfies 0 < P(E) < 1 (and thus the events A_k are not independent). Then A = E and P(A) = P(E) < 1, even though Σ_n P(A_n) = ∞.

Remark. An even more interesting counterexample to part b) without the independence property can be constructed as follows (do this!): let X be a uniform random variable on (0, 1), written X ~ U(0, 1). For n ≥ 1, consider the event A_n = {X < 1/n}. It is easy to see that A = {A_n i.o.} = ∅, so that one can have Σ_n P(A_n) = ∞ together with P(A) = P(A_n i.o.) = 0.

Example 1.7 (Infinite monkey theorem). By the second Borel-Cantelli lemma, Lemma 1.6 b), a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely (i.e., with probability one) type any particular chosen text, such as the complete works of William Shakespeare (and, in fact, infinitely many copies of the chosen text).

Idea of the argument. Suppose that the typewriter has 50 keys, and the word to be typed is "banana". The chance that the first letter typed is "b" is 1/50, as is the chance that the second letter is "a", and so on. These events are independent, so the chance of the first six letters matching "banana" is 1/50^6. For the same reason the chance that the next six letters match "banana" is also 1/50^6, and so on. Now, the chance of not typing "banana" in a given block of six letters is 1 − 1/50^6. Because each block is typed independently, the chance p of not typing "banana" in any of the first n blocks of six letters is
p = (1 − 1/50^6)^n.
As n grows, p gets smaller: for n = 10^6, p is more than 99.99%, but for n = 10^10 it is about 52.73%, and for n = 10^11 it is about 0.17%; as n goes to infinity, p can be made as small as one likes. If we were to count occurrences of "banana" that crossed blocks, p would approach zero even more quickly. (Using the theory of Markov chains, discussed later in the course, you should be able to show that the expected hitting time of the word "banana" is exactly 50^6.) Finally, once the first copy of the word "banana" appears, the process starts afresh independently of the past, so that the probability of obtaining the second copy of the word "banana" within the same number of blocks is still p, etc.; the result now follows from Lemma 1.6. Of course, the same argument applies if the monkey were typing any other string of characters of finite length, e.g., your favourite novel. (You can use the R script available from the course webpage to explore sequences of different lengths and/or different typewriters.)
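The block argument is easy to simulate. The following Python sketch is an analogue of (not the same as) the R script mentioned above; it is scaled down to a 26-key typewriter and a two-letter target word, so that the probability of at least one hit is visible already for a modest number of blocks.

```python
import random, string

keys = string.ascii_lowercase                    # a 26-key "typewriter"
target = "ab"                                    # scaled-down target word
L = len(target)
p_block = (1 / len(keys)) ** L                   # chance that one block spells the target

def target_in_blocks(n_blocks):
    # True iff at least one of n_blocks independent blocks spells the target
    return any("".join(random.choices(keys, k=L)) == target for _ in range(n_blocks))

n_blocks, reps = 1_000, 2_000
estimate = sum(target_in_blocks(n_blocks) for _ in range(reps)) / reps
print(estimate, 1 - (1 - p_block) ** n_blocks)   # both ~ 0.77; they tend to 1 as n_blocks grows
```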

Remark. By using an appropriate monotone approximation, one can deduce the result as in Example 1.4, without explicitly using the Borel-Cantelli lemma. Moreover, the same argument can be extended to situations when the probability p_n of typing "banana" in the nth block of six letters varies with n but remains uniformly positive, i.e., p_n ≥ δ > 0 for all n ≥ 1. The true power of the lemma is seen in situations when p_n → 0 slowly enough to have Σ_n p_n = ∞ (provided the events in different blocks are independent).

Proof of Lemma 1.6. a) For every n ≥ 1, let B_n := ∪_{k≥n} A_k be the event that at least one of the A_k with k ≥ n occurs. Since A ⊆ B_n for all n ≥ 1, we have
P(A) ≤ P(B_n) ≤ Σ_{k≥n} P(A_k) → 0
as n → ∞, whenever Σ_k P(A_k) < ∞.

b) The event A^c = {A_n occur finitely often} is related to the sequence B_n^c = ∩_{k≥n} A_k^c = {none of the A_k, k ≥ n, occurs} via
A^c = ∪_n ∩_{k≥n} A_k^c = ∪_n B_n^c,
so it is sufficient to show that P(B_n^c) = 0 for all n ≥ 1. By independence and the elementary inequality 1 − x ≤ e^{−x}, x ≥ 0, we get
P(∩_{k=n}^m A_k^c) = ∏_{k=n}^m P(A_k^c) = ∏_{k=n}^m (1 − P(A_k)) ≤ exp{−Σ_{k=n}^m P(A_k)},
so that
P(B_n^c) = lim_{m→∞} P(∩_{k=n}^m A_k^c) ≤ lim_{m→∞} exp{−Σ_{k=n}^m P(A_k)} = 0,
as the sum diverges.

Example 1.8. A standard six-sided die is tossed repeatedly. Let N_k denote the total number of tosses on which face k was observed. Assuming that the individual outcomes are independent, show that P(N_1 = ∞) = P(N_2 = ∞) = P(N_1 = ∞, N_2 = ∞) = 1.

Solution. The equalities P(N_1 = ∞) = P(N_2 = ∞) = 1 can be derived as in Example 1.4, so that the intersection event {N_1 = ∞, N_2 = ∞} also has probability one. Alternatively, we derive the first equality from the Borel-Cantelli lemma. To this end, fix k ∈ {1, 2, ..., 6} and denote A_n^k = {nth toss shows k}. For different n, the events A_n^k are independent and have the same probability 1/6. Since Σ_n P(A_n^k) = ∞, the Borel-Cantelli lemma implies that the event {N_k = ∞} ≡ {A_n^k infinitely often} has probability one. The remaining claims now follow as indicated above.

Example 1.9. A coin showing heads with probability p is tossed repeatedly. With X_n denoting the result of the nth toss, let C_n = {X_n = T, X_{n−1} = H}. Show that P(C_n i.o.) = 1.

Solution. We have {C_{2n} i.o.} ⊆ {C_n i.o.}, where P(C_{2n}) = pq (with q = 1 − p) and the events C_{2n} are independent. The result follows from Lemma 1.6 b) (or via a monotone approximation).

The Borel-Cantelli lemma is often used when one needs to describe the long-term behaviour of sequences of random variables.

Example 1.10. Let (X_k)_{k≥1} be i.i.d. random variables with common exponential distribution of mean 1/λ, i.e., P(X_1 > x) = e^{−λx} for all x ≥ 0. One can show that X_n grows like (1/λ) log n; more precisely, that
P(lim sup_n X_n / log n = 1/λ) = 1.
(Recall that for a real sequence (a_n)_{n≥1} one defines lim sup a_n as the largest limiting point of the sequence (a_n)_{n≥1}; equivalently, lim sup_n a_n = lim_n sup_{k≥n} a_k, see App. A below.)

Solution. For ε > 0, denote
A_n^ε := {ω : X_n(ω) > (1+ε)/λ · log n},   B_n^ε := {ω : X_n(ω) > (1−ε)/λ · log n}.
We clearly have P(A_n^ε) = n^{−(1+ε)} and P(B_n^ε) = n^{−(1−ε)}. Since Σ_n P(A_n^ε) < ∞, by Lemma 1.6 a) the event {A_n^ε infinitely often} has probability zero. Similarly, the events B_n^ε are independent and Σ_n P(B_n^ε) = ∞; thus, by Lemma 1.6 b), the event {B_n^ε infinitely often} has probability one.

Remark (Records). A slightly more general version of the argument from Example 1.10 helps to control the limiting behaviour of records (similar results hold for other distributions, see page E6 in the Problems Sheets): let (X_k)_{k≥1} be i.i.d. exponential r.v.'s with distribution P(X_k > x) = e^{−x}, and let M_n := max_{1≤k≤n} X_k. Then P(M_n/(log n) → 1) = 1, i.e., the normalized maximum M_n/(log n) converges to one almost surely (as n → ∞).

Example. Let random variables (X_n)_{n≥1} be i.i.d. with X_1 ~ U[0, 1]. For α > 0, we have P(X_n > 1 − n^{−α}) = n^{−α}, so that P(X_n > 1 − n^{−α} i.o.) = 1 iff α ≤ 1. A similar analysis shows that
P(X_n > 1 − 1/(n (log n)^β) i.o.) = 1 if β ≤ 1, and = 0 if β > 1.

Lemma 1.6 is one of the main methods of proving almost sure convergence.

Example. If (X_k)_{k≥1} is a sequence of random variables such that for every ε > 0 the event A(ε) = {|X_k| > ε finitely often} has probability one, then X_k is said to converge to zero with probability one; recall the Remark after Example 1.5. A simple example is X_k = X/k for a variable X ≥ 0 of finite mean, EX < ∞. One can then show that Σ_{k≥1} P(|X_k| > ε) = Σ_{k≥1} P(X > kε) < ∞, and thus the result follows from Lemma 1.6; see also Lemma 2.15 below.
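The Records remark is easy to check by simulation; in the following Python sketch (sample sizes chosen for illustration) the ratio M_n / log n approaches 1 as n grows.

```python
import numpy as np

# M_n = max of n i.i.d. Exp(1) variables; M_n / log(n) should be close to 1 for large n.
rng = np.random.default_rng(2)
for n in (10**3, 10**5, 10**7):
    X = rng.exponential(size=n)
    print(n, X.max() / np.log(n))
```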

2 Convergence of random variables

In probability theory one uses various modes of convergence of random variables, many of which are crucial for applications. In this section we shall consider some of the most important of them: convergence in L^r, convergence in probability, and convergence with probability one (a.k.a. almost sure convergence).

2.1 Weak laws of large numbers

Definition 2.1. Let r > 0 be fixed. We say that a sequence X_n, n ≥ 1, of random variables converges to a random variable X in L^r (write X_n →^{L^r} X) as n → ∞, if E|X_n − X|^r → 0 as n → ∞.

Example 2.2. Let (X_n)_{n≥1} be a sequence of random variables such that for some real numbers (a_n)_{n≥1} we have
P(X_n = a_n) = p_n,   P(X_n = 0) = 1 − p_n.   (2.1)
Then X_n →^{L^r} 0 iff E|X_n|^r = |a_n|^r p_n → 0 as n → ∞.

The following result is the L^2 weak law of large numbers (L^2-WLLN).

Theorem 2.3. Let X_j, j ≥ 1, be a sequence of uncorrelated random variables with EX_j = µ and Var(X_j) ≤ C < ∞. Denote S_n = X_1 + ... + X_n. Then (1/n) S_n →^{L^2} µ as n → ∞.

Proof. Immediate from
E((1/n) S_n − µ)^2 = E(S_n − nµ)^2 / n^2 = Var(S_n)/n^2 ≤ Cn/n^2 → 0 as n → ∞.

Definition 2.4. We say that a sequence X_n, n ≥ 1, of random variables converges to a random variable X in probability (write X_n →^P X) as n → ∞, if for every fixed ε > 0
P(|X_n − X| ≥ ε) → 0 as n → ∞.

Example 2.5. Let the sequence (X_n)_{n≥1} be as in (2.1). Then for every ε > 0 we have P(|X_n| ≥ ε) ≤ P(X_n ≠ 0) = p_n, so that X_n →^P 0 if p_n → 0 as n → ∞.

The usual weak law of large numbers (WLLN) is just a convergence in probability result:

Theorem 2.6. Under the conditions of Theorem 2.3, (1/n) S_n →^P µ as n → ∞.

Exercise 2.7. Derive Theorem 2.6 from the Chebyshev inequality.

We prove Theorem 2.6 using the following simple fact:

Lemma 2.8. Let X_n, n ≥ 1, be a sequence of random variables. If X_n →^{L^r} X for some fixed r > 0, then X_n →^P X as n → ∞.

Proof. By the generalized Markov inequality (0.15) with g(x) = x^r applied to Z_n = |X_n − X| ≥ 0, we get, for every fixed ε > 0,
P(Z_n ≥ ε) = P(|X_n − X|^r ≥ ε^r) ≤ E|X_n − X|^r / ε^r → 0
as n → ∞.

Proof of Theorem 2.6. Follows immediately from Theorem 2.3 and Lemma 2.8.

As the following example shows, a high-dimensional cube is almost a sphere.

Example 2.9. Let X_j, j ≥ 1, be i.i.d. with X_j ~ U(−1, 1). Then the variables Y_j = (X_j)^2 satisfy EY_j = 1/3 and Var(Y_j) ≤ E[(Y_j)^2] = E[(X_j)^4] ≤ 1. Fix ε > 0 and consider the set
A_{n,ε} := {z ∈ R^n : (1 − ε)√(n/3) < |z| < (1 + ε)√(n/3)},
where |z| is the usual Euclidean length in R^n, |z|^2 = Σ_{j=1}^n (z_j)^2. By the WLLN,
(1/n) Σ_{j=1}^n Y_j ≡ (1/n) Σ_{j=1}^n (X_j)^2 →^P 1/3;
in other words, for every fixed ε > 0, a point X = (X_1, ..., X_n) chosen uniformly at random in (−1, 1)^n satisfies
P(X ∉ A_{n,ε}) ≤ P(|(1/n) Σ_{j=1}^n (X_j)^2 − 1/3| ≥ (2ε − ε²)/3) → 0 as n → ∞,
i.e., for large n, with probability approaching one, a random point X ∈ (−1, 1)^n is near the n-dimensional sphere of radius √(n/3) centred at the origin.

Theorem 2.10. Let random variables S_n, n ≥ 1, have two finite moments, µ_n = ES_n and σ_n² = Var(S_n) < ∞. If, for some sequence b_n, we have σ_n/b_n → 0 as n → ∞, then (S_n − µ_n)/b_n → 0 as n → ∞, both in L^2 and in probability.

Proof. The result follows immediately from the observation
E((S_n − µ_n)²/b_n²) = Var(S_n)/b_n² → 0 as n → ∞.

Example 2.11. In the coupon collector's problem (Problem R4), let T_n be the time to collect all n coupons. It is easy to show that
ET_n = n Σ_{m=1}^n 1/m ≈ n log n  and  Var(T_n) ≤ n² Σ_{m=1}^n 1/m² ≤ π²n²/6,
so that
(T_n − ET_n)/(n log n) → 0, i.e., T_n/(n log n) → 1 as n → ∞, both in L^2 and in probability.
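The coupon collector's asymptotics of Example 2.11 can be checked by a short Monte Carlo experiment; in the Python sketch below, n and the number of repetitions are illustrative choices.

```python
import random, math

def collect(n):
    # Draw coupons uniformly at random until all n distinct types have been seen.
    seen, t = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        t += 1
    return t

n, reps = 2_000, 20
ratios = [collect(n) / (n * math.log(n)) for _ in range(reps)]
print(sum(ratios) / reps)   # typically ~1.05-1.1 here; the ratio tends to 1 as n grows
```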

2.2 Almost sure convergence

Let (X_k)_{k≥1} be a sequence of i.i.d. random variables having mean EX_1 = µ and finite second moment. Denote S_n := Σ_{k=1}^n X_k. Then the usual (weak) law of large numbers (WLLN) tells us that for every δ > 0
P(|n^{−1} S_n − µ| > δ) → 0 as n → ∞.   (2.2)
In other words, according to the WLLN, n^{−1} S_n converges in probability to the constant random variable X ≡ µ = E(X_1) as n → ∞ (recall Definition 2.4 and Theorem 2.6).

It is important to remember that convergence in probability is not related to pointwise convergence, i.e., convergence X_n(ω) → X(ω) for a fixed ω ∈ Ω. The following useful definition can be realised in terms of a U[0, 1] random variable.

Definition 2.12. The canonical probability space is (Ω, F, P), where Ω = [0, 1], F is the smallest σ-field containing all intervals in [0, 1], and P is the length measure on Ω (i.e., for A = [a, b] ⊆ [0, 1], P(A) = b − a).

Example 2.13. Let (Ω, F, P) be the canonical probability space. For every event A ∈ F consider the indicator random variable
1_A(ω) = 1 if ω ∈ A, and 1_A(ω) = 0 if ω ∉ A.   (2.3)
For n ≥ 1 put m = [log_2 n], i.e., m ≥ 0 is such that 2^m ≤ n < 2^{m+1}, define
A_n = [(n − 2^m)/2^m, (n + 1 − 2^m)/2^m] ⊆ [0, 1],
and let X_n := 1_{A_n}. Since
P(|1_{A_n}| > 0) = P(A_n) = 2^{−[log_2 n]} ≤ 2/n → 0 as n → ∞,
the sequence X_n converges in probability to X ≡ 0. However,
{ω ∈ Ω : X_n(ω) → X(ω) ≡ 0 as n → ∞} = ∅,
i.e., there is no point ω ∈ Ω for which the sequence X_n(ω) ∈ {0, 1} converges to X(ω) = 0. [Try the R script simulating this sequence from the course webpage!]

The following is the key definition of this section.

Definition 2.14. A sequence (X_n)_{n≥1} of random variables in (Ω, F, P) converges, as n → ∞, to a random variable X with probability one (or almost surely) if
P({ω ∈ Ω : X_n(ω) → X(ω) as n → ∞}) = 1.   (2.4)

Remark. For ε > 0, let A_n(ε) = {ω : |X_n(ω) − X(ω)| > ε}. Then property (2.4) is equivalent to saying that for every ε > 0
P({A_n(ε) finitely often}) = 1.   (2.5)
This is why the Borel-Cantelli lemma is so useful in studying almost sure limits.
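The following Python sketch (an analogue of the R script mentioned in Example 2.13, not the script itself) evaluates the sliding-interval indicators at a fixed ω: the value 1 keeps recurring, once in every dyadic block of indices, even though P(X_n = 1) → 0.

```python
import math

def X(n, omega):
    m = int(math.log2(n))                       # m >= 0 with 2^m <= n < 2^{m+1}
    left = (n - 2**m) / 2**m
    right = (n + 1 - 2**m) / 2**m
    return 1 if left <= omega <= right else 0   # indicator of A_n

omega = 0.3
hits = [n for n in range(1, 2_000) if X(n, omega) == 1]
print(hits)   # [1, 2, 5, 10, 20, 41, ...]: one hit in every block [2^m, 2^{m+1}), so X_n(omega) never settles at 0

# ... yet P(X_n = 1) = P(A_n) is tiny for large n: only one index in [1024, 2048) gives a hit
print(len([n for n in range(1024, 2048) if X(n, omega) == 1]))   # 1
```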

Example 1.5 (continued). Consider a finite random variable X, i.e., one satisfying P(|X| < ∞) = 1. Then the sequence (X_k)_{k≥1} defined via X_k := X/k converges to zero with probability one.

Solution. The discussion in Example 1.5 established exactly (2.5).

In general, verifying convergence with probability one is not immediate. The following lemma gives a sufficient condition for almost sure convergence.

Lemma 2.15. Let X_1, X_2, ... and X be random variables. If, for every ε > 0,
Σ_{n=1}^∞ P(|X_n − X| > ε) < ∞,   (2.6)
then X_n converges to X almost surely.

Proof. Fix ε > 0 and let A_n(ε) = {ω ∈ Ω : |X_n(ω) − X(ω)| > ε}. By (2.6), Σ_n P(A_n(ε)) < ∞, and, by Lemma 1.6 a), with probability one only a finite number of the A_n(ε) occur. This means that for every fixed ε > 0 the event
A(ε) := {ω ∈ Ω : |X_n(ω) − X(ω)| ≤ ε for all n large enough}
has probability one. By monotonicity (A(ε_1) ⊆ A(ε_2) if ε_1 < ε_2), the event
{ω ∈ Ω : X_n(ω) → X(ω) as n → ∞} = ∩_{ε>0} A(ε) = ∩_{m≥1} A(1/m)
has probability one. The claim follows.

A straightforward application of Lemma 2.15 improves the WLLN (2.2) and gives the following famous (Borel) Strong Law of Large Numbers (SLLN):

Theorem 2.16 (L^4-SLLN). Let the variables X_1, X_2, ... be i.i.d. with E(X_k) = µ and E((X_k)^4) < ∞. If S_n := X_1 + X_2 + ... + X_n, then S_n/n → µ almost surely as n → ∞.

Proof. We may and shall suppose that µ = E(X_k) = 0 (otherwise, consider the centred variables X̄_k = X_k − µ and deduce the result from the relation (1/n) S̄_n = (1/n) S_n − µ and the linearity of almost sure convergence). Now,
E((S_n)^4) = E((Σ_{k=1}^n X_k)^4) = Σ_k E((X_k)^4) + 6 Σ_{1≤k<m≤n} E((X_k)^2 (X_m)^2),
the remaining cross terms vanishing since E(X_k) = 0, so that E((S_n)^4) ≤ Cn^2 for some C ∈ (0, ∞). By Chebyshev's inequality,
P(|S_n| > nε) ≤ E((S_n)^4)/(nε)^4 ≤ C/(n^2 ε^4),
and the result follows from (2.6).

With some additional work (which we will not do here), one can obtain the following SLLN, which is due to Kolmogorov:

Theorem 2.17 (L^1-SLLN). Let X_1, X_2, ... be i.i.d. r.v.'s with E|X_k| < ∞. If E(X_k) = µ and S_n := X_1 + ... + X_n, then (1/n) S_n → µ almost surely as n → ∞.

Notice that verifying almost sure convergence through the Borel-Cantelli lemma (or the sufficient condition (2.6)) is easier than using an explicit construction in the spirit of Example 1.5. We shall see more examples below.

2.3 Relations between different types of convergence

It is important to remember the relations between the different types of convergence. We know (Lemma 2.8) that
X_n →^{L^r} X  ⟹  X_n →^P X;
one can also show (although we shall not do it here) that
X_n →^{a.s.} X  ⟹  X_n →^P X.
In addition, according to Example 2.13,
X_n →^P X  does not imply  X_n →^{a.s.} X,
and the same construction shows that
X_n →^{L^r} X  does not imply  X_n →^{a.s.} X.
The following examples fill in the remaining gaps.

Example 2.18 (L^r convergence does not imply a.s. convergence). Let X_n be a sequence of independent random variables such that P(X_n = 1) = p_n, P(X_n = 0) = 1 − p_n. Then
X_n →^P 0  ⟺  p_n → 0  ⟺  X_n →^{L^r} 0  as n → ∞,
whereas
X_n →^{a.s.} 0  ⟺  Σ_n p_n < ∞.
In particular, taking p_n = 1/n we deduce the claim. Notice that this example also shows that X_n →^P X does not imply X_n →^{a.s.} X.

Example 2.19 (convergence in probability does not imply L^r convergence). Let (Ω, F, P) be the canonical probability space (recall Definition 2.12). For every n ≥ 1, define
X_n(ω) := e^n 1_{[0,1/n]}(ω), i.e., X_n(ω) = e^n for 0 ≤ ω ≤ 1/n and X_n(ω) = 0 for ω > 1/n.
We obviously have X_n →^{a.s.} 0 and X_n →^P 0 as n → ∞; however, for every r > 0,
E|X_n|^r = e^{nr}/n → ∞ as n → ∞,
i.e., X_n does not converge to 0 in L^r. Notice that this example also shows that X_n →^{a.s.} X does not imply X_n →^{L^r} X.

3 Lebesgue integral

In the simplest case, the (Riemann) integral of a non-negative function can be regarded as the area between the graph of that function and the x-axis. Lebesgue integration is a mathematical construction that extends the notion of the integral to a larger class of functions; it also extends the domains on which these functions can be defined. As such, the Lebesgue integral plays an important role in real analysis, probability, and many other areas of mathematics.

3.1 Integration: Riemann vs. Lebesgue

As part of the general movement towards rigour in mathematics in the nineteenth century, attempts were made to put the integral calculus on a firm foundation. The Riemann integral (proposed by Bernhard Riemann, 1826-1866) is one of the most widely known examples; its definition starts with the construction of a sequence of easily-calculated integrals which converge to the integral of a given function. This definition is successful in the sense that it gives the expected answer for many already-solved problems, and gives useful results for many other problems. However, although the Riemann integral is naturally linear and monotone (see the slides!), it does not interact well with taking limits of sequences of functions, making such limiting functions difficult to analyse (and integrate). This is of prime importance, for instance, in the study of Fourier series, Fourier transforms and other topics.

The Lebesgue integral is easier to deal with when taking limits under the integral sign; it also allows one to calculate integrals for a broader class of functions. For example, the Dirichlet function, which is 0 where its argument is irrational and 1 otherwise, is Lebesgue-integrable but not Riemann-integrable.

Riemann integral

Recall that a partition of an interval [a, b] is a finite sequence a = x_0 < x_1 < x_2 < ... < x_n = b. Each [x_i, x_{i+1}] is called a sub-interval of the partition. The mesh of a partition is defined to be the length of the longest sub-interval, that is, max(x_{i+1} − x_i) over 0 ≤ i ≤ n − 1. Let f be a real-valued function defined on the interval [a, b]. The Riemann sum of f with respect to the partition x_0, ..., x_n is
Σ_{i=0}^{n−1} f(t_i)(x_{i+1} − x_i),
where each t_i is a fixed point in the sub-interval [x_i, x_{i+1}]. Notice that the last expression is the sum of areas of rectangles with heights f(t_i) and widths x_{i+1} − x_i.

Loosely speaking, the Riemann integral of f is the limit of the Riemann sums of f as the partitions get finer and finer (i.e., the mesh goes to zero), and every function f for which this limit does not depend on the approximating sequence is called (Riemann) integrable.

Lebesgue integral: sketch of the construction

The modern approach to the theory of Lebesgue integration has two distinct parts: a) a theory of measurable sets and of measures on these sets; b) a theory of measurable functions and of integrals of these functions.

Measure theory was initially created to provide a detailed analysis of the notion of length of subsets of the real line and, more generally, of area and volume of subsets of Euclidean spaces. In particular, it provided a systematic answer to the question of which subsets of R have a length. As was shown by later developments in set theory, it is actually impossible to assign a length to all subsets of R in a way which preserves certain natural additivity and translation-invariance properties. This suggests that picking out a suitable class of measurable subsets is an essential prerequisite.

The modern approach to measure and integration is axiomatic. One defines a measure as a mapping µ from a σ-field A of subsets of a set E to [0, ∞] which satisfies a certain list of properties (see the slides!). These properties can be shown to hold in many different cases.

Integration. In the Lebesgue theory, integrals are defined for a class of functions called measurable functions. Let E be a set and let A be a σ-field of subsets of E (one often calls (E, A) a measurable space, and (E, A, µ) a measure space). A function f : E → R is measurable if the pre-image of any closed interval [a, b] ⊆ R is in A, i.e., f^{−1}([a, b]) ∈ A. The set of measurable functions is naturally closed under algebraic operations; in addition (and more importantly) this class is closed under various kinds of pointwise sequential limits: e.g., if the sequence (f_k)_{k∈N} consists of measurable functions, then both lim inf_k f_k and lim sup_k f_k are measurable functions.

Let a measure space (E, A, µ) be fixed. The Lebesgue integral ∫_E f dµ for measurable functions f : E → R is constructed in stages.

Indicator functions: If S ∈ A, i.e., the set S is measurable, we define the integral of its indicator function 1_S (recall that 1_S(x) = 1 if x ∈ S and 1_S(x) = 0 otherwise) via ∫ 1_S dµ = µ(S).

Simple functions: For non-negative simple functions, i.e., finite linear combinations of indicator functions f = Σ_k a_k 1_{S_k} (where the sum is finite and all a_k ≥ 0), we use linearity to define
µ(f) := ∫ (Σ_k a_k 1_{S_k}) dµ = Σ_k a_k ∫ 1_{S_k} dµ = Σ_k a_k µ(S_k)
(here we always use the convention 0 · ∞ = 0). This construction is obviously linear and monotone (see the slides!). Moreover, even if a simple function can be written as Σ_k a_k 1_{S_k} in many ways, the integral is always the same. Also, if two functions f_1 and f_2 coincide almost everywhere, i.e., they differ only on a set of measure zero, µ(x : f_1(x) ≠ f_2(x)) = 0, then their integrals are equal, µ(f_1) = µ(f_2).

Non-negative functions: Let f : E → [0, +∞] be measurable. We put
∫_E f dµ := sup { ∫_E h dµ : h ≤ f, 0 ≤ h simple }.
We need to check that this construction is consistent, i.e., if f ≥ 0 is simple we need to verify that this definition coincides with the preceding one. Another question is: if f as above is Riemann-integrable, does this definition give the same value of the integral? It is not hard to prove that the answer to both questions is yes. Clearly, if f : E → [0, +∞] is any measurable function, its integral ∫ f dµ may be infinite.

Signed functions: If f : E → [−∞, +∞] is measurable (complex-valued functions can be integrated similarly, by considering the real part and the imaginary part separately), we decompose it into its positive and negative parts, f = f^+ − f^−, where
f^+(x) = f(x) if f(x) > 0 and f^+(x) = 0 otherwise,   f^−(x) = −f(x) if f(x) < 0 and f^−(x) = 0 otherwise.
Note that the functions f^+ ≥ 0 and f^− ≥ 0 satisfy |f| = f^+ + f^−. If ∫ |f| dµ is finite, then f is called Lebesgue integrable. In this case both integrals ∫ f^+ dµ and ∫ f^− dµ converge, and it makes sense to define
∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ.
It turns out that this definition gives the desirable properties of the integral, namely linearity, monotonicity and regularity when taking limits.

The functions which can be obtained from the above construction are called Borel functions (by definition, f : E → [−∞, +∞] is Borel if for every a ∈ R the set {x ∈ E : f(x) ≤ a} ∈ A, i.e., is measurable). The class of Borel functions is very big and sufficient for most practical considerations (it is not easy to construct a non-Borel real-valued function; get in touch if interested!).
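The following Python sketch (not from the notes; the function f(x) = x² on [0, 1] and the discretisation level are illustrative choices) computes the same integral twice: once as a Riemann sum over a partition of the domain, and once as the integral of a non-negative simple function obtained by partitioning the range, in the spirit of the construction above.

```python
import numpy as np

f = lambda x: x ** 2
n = 10_000                       # number of sub-intervals / range levels
xs = np.linspace(0.0, 1.0, n + 1)

# Riemann: partition the DOMAIN and sum f(t_i) * (x_{i+1} - x_i) with left endpoints.
riemann = np.sum(f(xs[:-1]) * np.diff(xs))

# Lebesgue-style: approximate f from below by the simple function equal to k/n on the
# level set {k/n <= f < (k+1)/n}; for f(x) = x^2 on [0, 1] that set is
# [sqrt(k/n), sqrt((k+1)/n)), with Lebesgue measure sqrt((k+1)/n) - sqrt(k/n).
levels = np.arange(n) / n
meas = np.sqrt(np.minimum((np.arange(n) + 1) / n, 1.0)) - np.sqrt(levels)
lebesgue = np.sum(levels * meas)

print(riemann, lebesgue, 1 / 3)  # both approximations approach the exact value 1/3
```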

3.2 Lebesgue integral: limiting results

The construction described above implies the following limiting property, which is one of the most central in the area.

Theorem 3.1 (Monotone Convergence Theorem; (MON)). Let f and (f_n)_{n≥1} be Borel functions on (E, A, µ) such that 0 ≤ f_n ↑ f. Then, as n → ∞, µ(f_n) ↑ µ(f).

The random variables version of the result is:

Theorem 3.2 (Monotone Convergence Theorem; (MON)). If random variables X_n ≥ 0 are such that X_n ↑ X as n → ∞, then E(X_n) ↑ E(X) as n → ∞.

In view of the almost-everywhere remark in the previous subsection, the following result is rather natural:

Corollary 3.3. Let f and (f_n)_{n≥1} be non-negative Borel functions on (E, A, µ) such that, except on a µ-null set N, 0 ≤ f_n ↑ f, i.e., for all x ∈ E \ N, f_n(x) ↑ f(x), and µ(N) = 0. Then µ(f_n) ↑ µ(f) as n → ∞.

Exercise 3.4. State an analogue of the previous corollary for random variables (using almost sure convergence).

Another important result is:

Theorem 3.5 (Dominated-Convergence Theorem; (DOM)). Let (f_n)_{n≥1} and f be Borel functions on (E, A, µ) such that f_n(x) converges to f(x) for all x ∈ E as n → ∞, and such that the sequence f_n is dominated by a non-negative integrable function g, i.e., for all x ∈ E and n ∈ N,
f_n(x) → f(x) and |f_n(x)| ≤ g(x), with µ(g) < ∞.   (3.1)
Then µ(f_n) → µ(f) as n → ∞.

Theorem 3.6 (Dominated-Convergence Theorem; (DOM)). Let (X_n)_{n≥1} and X be random variables such that for all ω ∈ Ω we have X_n(ω) → X(ω) as n → ∞. If there is a random variable Y ≥ 0 such that E(Y) < ∞ and, for all ω ∈ Ω, |X_n(ω)| ≤ Y(ω), then E(X_n) → E(X) as n → ∞.

Of course, similarly to the corollary above, one can allow the conditions (3.1) to be violated on a set N of measure zero.

Exercise 3.7. State the versions of the last two theorems in the case where convergence is violated on a set of measure zero (i.e., convergence takes place almost surely).

Various examples of applications of these results were discussed in the lectures and tutorials.

4 Generating functions

Even quite straightforward counting problems can lead to laborious and lengthy calculations. These are often greatly simplified by using generating functions (introduced by de Moivre and Euler in the early eighteenth century).

Definition 4.1. Given a collection of real numbers (a_k)_{k≥0}, the function
G(s) = G_a(s) := Σ_{k=0}^∞ a_k s^k   (4.1)
is called the generating function of (a_k)_{k≥0}.

Why do we care? If the generating function G_a(s) of (a_n)_{n≥0} is analytic near the origin, then there is a one-to-one correspondence between G_a(s) and (a_n)_{n≥0}; namely, a_k can be recovered via
a_k = (1/k!) (d^k/ds^k) G_a(s) |_{s=0}.   (4.2)
(This and several other useful properties of power series can be found in Sect. A.4 below.) This result is often referred to as the uniqueness property of generating functions.

Definition 4.2. If X is a discrete random variable with values in Z_+ := {0, 1, ...}, its (probability) generating function,
G(s) ≡ G_X(s) := E(s^X) = Σ_{k=0}^∞ s^k P(X = k),   (4.3)
is just the generating function of the pmf {p_k} ≡ {P(X = k)} of X.

Recall that the moment generating function M_X(t) := E(e^{tX}) of a random variable X (which might be infinite for t ≠ 0!) is just Σ_{k≥0} (E(X^k)/k!) t^k, i.e., the generating function of the sequence E(X^k)/k!. Why do we introduce both G_X(s) and M_X(t)?

The following result illustrates one of the most useful applications of generating functions in probability theory.

Theorem 4.3. If X and Y are independent random variables with values in {0, 1, 2, ...} and Z := X + Y, then their generating functions satisfy
G_Z(s) = G_{X+Y}(s) = G_X(s) G_Y(s).
(Recall: if X and Y are discrete random variables, and f, g : Z_+ → R are arbitrary functions, then f(X) and g(Y) are independent random variables and E[f(X)g(Y)] = E f(X) · E g(Y).)

Example 4.4. If X_1, X_2, ..., X_n are independent identically distributed random variables (from now on often abbreviated to i.i.d.r.v.'s) with values in {0, 1, 2, ...} and if S_n = X_1 + ... + X_n, then
G_{S_n}(s) = G_{X_1}(s) ... G_{X_n}(s) ≡ [G_X(s)]^n.

Example 4.5. Let X_1, X_2, ..., X_n be i.i.d.r.v.'s with values in {0, 1, 2, ...} and let N ≥ 0 be an integer-valued random variable independent of {X_k}_{k≥1}. Then S_N := X_1 + ... + X_N (a two-stage probabilistic experiment!) has generating function
G_{S_N}(s) = G_N(G_X(s)).   (4.4)

Solution. This is a straightforward application of the partition theorem for expectations. Alternatively, the result follows from the standard properties of conditional expectations:
E(z^{S_N}) = E[E(z^{S_N} | N)] = E([G_X(z)]^N) = G_N(G_X(z)).

Example 4.6 (Renewals). Imagine a diligent janitor who replaces a light bulb the same day as it burns out. Suppose the first bulb is put in on day 0 and let X_i be the lifetime of the ith light bulb. Let the individual lifetimes X_i be i.i.d.r.v.'s with values in {1, 2, ...} and a common distribution with generating function G_f(s). Define r_n := P(a light bulb was replaced on day n) and f_k := P(the first light bulb was replaced on day k). Then r_0 = 1, f_0 = 0, and
r_n = Σ_{k=1}^n f_k r_{n−k},  n ≥ 1.
A standard computation implies that G_r(s) = 1 + G_f(s) G_r(s) for all |s| < 1, so that G_r(s) = 1/(1 − G_f(s)).

In general, we say that a sequence (c_n)_{n≥0} is the convolution of (a_k)_{k≥0} and (b_m)_{m≥0} (write c = a ∗ b) if
c_n = Σ_{k=0}^n a_k b_{n−k},  n ≥ 0.   (4.5)
The key property of convolutions is given by the following result.

Theorem 4.7 (Convolution theorem). If c = a ∗ b, then the generating functions G_c(s), G_a(s), and G_b(s) satisfy G_c(s) = G_a(s) G_b(s).

Example 4.8. Let X ~ Poi(λ) and Y ~ Poi(µ) be independent. Then Z = X + Y is Poi(λ + µ).

Solution. A straightforward computation gives G_X(s) = e^{λ(s−1)}; Theorem 4.3 then implies
G_Z(s) = G_X(s) G_Y(s) = e^{λ(s−1)} e^{µ(s−1)} ≡ e^{(λ+µ)(s−1)},
so the result follows by uniqueness.

A similar argument implies the following result.

Example 4.9. If X ~ Bin(n, p) and Y ~ Bin(m, p) are independent, then X + Y ~ Bin(n + m, p).
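The convolution theorem and Example 4.8 are easy to verify numerically; in the Python sketch below (the truncation level is an illustrative choice), the convolution (4.5) of the Poi(2) and Poi(3) pmfs is compared with the Poi(5) pmf.

```python
import numpy as np
from math import exp, factorial

def poi_pmf(lam, kmax):
    # pmf of Poi(lam) on {0, 1, ..., kmax}
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)])

K = 40                                    # truncation level; the tail mass beyond K is negligible
px, py = poi_pmf(2.0, K), poi_pmf(3.0, K)
pz = np.convolve(px, py)[: K + 1]         # (4.5): c_n = sum_k a_k b_{n-k}

print(np.max(np.abs(pz - poi_pmf(5.0, K))))   # ~1e-16: the convolution is the Poi(5) pmf
```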

Another useful property of the probability generating function G_X(s) is that it can be used to compute moments of X:

Theorem 4.10. If X has generating function G(s), then
E[X(X − 1)...(X − k + 1)] = G^{(k)}(1)
(here, if G^{(k)}(1) does not exist, we understand the right-hand side as G^{(k)}(1−) := lim_{s↑1} G^{(k)}(s), the limiting value of the kth left derivative of G(s) at s = 1).

Remark. The quantity E[X(X − 1)...(X − k + 1)] is called the kth factorial moment of X. Notice also that
Var(X) = G_X''(1) + G_X'(1) − (G_X'(1))^2.   (4.6)

Proof. Fix s ∈ (0, 1) and differentiate G(s) k times (as |G_X(s)| ≤ E|s|^X ≤ 1 for all |s| ≤ 1, the generating function G_X(s) can be differentiated as many times as we please for all s inside the disk {s : |s| < 1}) to get
G^{(k)}(s) = E[s^{X−k} X(X − 1)...(X − k + 1)].
Taking the limit s ↑ 1 and using Abel's theorem (Theorem A.12 below, which by the previous observation applies to all probability generating functions), we obtain the result.

Remark. Notice also that
lim_{s↑1} G_X(s) ≡ lim_{s↑1} E[s^X] = P(X < ∞).
This allows us to check whether a variable is finite, if we do not know this a priori.

Exercise. Let S_N be defined as in Example 4.5. Use (4.4) to compute E[S_N] and Var[S_N] in terms of E[N], E[X], Var[X] and Var[N]. Now check your result for E[S_N] and Var[S_N] by directly applying the partition theorem for expectations.

Generating functions are also very useful in solving recurrences, especially when combined with the following algebraic fact (an alternative way would be to use products of matrices; get in touch if interested!).

Lemma 4.12 (Partial fraction expansion). Let f(x) = g(x)/h(x) be a ratio of two polynomials without common roots. Let deg(g) < deg(h) = m and suppose that the roots a_1, ..., a_m of h(x) are all distinct. Then f(x) can be decomposed into a sum of partial fractions, i.e., for some constants b_1, b_2, ..., b_m,
f(x) = b_1/(a_1 − x) + b_2/(a_2 − x) + ... + b_m/(a_m − x).   (4.7)

Remark. Since
b/(a − x) = (b/a) Σ_{k≥0} (x/a)^k = Σ_{k≥0} (b/a^{k+1}) x^k,
a generating function of the form (4.7) can easily be written as a power series.
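As a quick sanity check of Theorem 4.10 and (4.6), not worked in the notes: for X ~ Poi(λ) one has G_X(s) = e^{λ(s−1)}, so G_X'(1) = λ and G_X''(1) = λ². Hence E(X) = λ and, by (4.6), Var(X) = λ² + λ − λ² = λ. More generally, the kth factorial moment of a Poisson variable is G_X^{(k)}(1) = λ^k.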


More information

17. Convergence of Random Variables

17. Convergence of Random Variables 7. Convergence of Random Variables In elementary mathematics courses (such as Calculus) one speaks of the convergence of functions: f n : R R, then lim f n = f if lim f n (x) = f(x) for all x in R. This

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

Lecture 2: Review of Basic Probability Theory

Lecture 2: Review of Basic Probability Theory ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent

More information

2 n k In particular, using Stirling formula, we can calculate the asymptotic of obtaining heads exactly half of the time:

2 n k In particular, using Stirling formula, we can calculate the asymptotic of obtaining heads exactly half of the time: Chapter 1 Random Variables 1.1 Elementary Examples We will start with elementary and intuitive examples of probability. The most well-known example is that of a fair coin: if flipped, the probability of

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Integration on Measure Spaces

Integration on Measure Spaces Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Week 12-13: Discrete Probability

Week 12-13: Discrete Probability Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible

More information

Probability and Measure

Probability and Measure Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real

More information

8 Laws of large numbers

8 Laws of large numbers 8 Laws of large numbers 8.1 Introduction We first start with the idea of standardizing a random variable. Let X be a random variable with mean µ and variance σ 2. Then Z = (X µ)/σ will be a random variable

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

Lecture 1: Review on Probability and Statistics

Lecture 1: Review on Probability and Statistics STAT 516: Stochastic Modeling of Scientific Data Autumn 2018 Instructor: Yen-Chi Chen Lecture 1: Review on Probability and Statistics These notes are partially based on those of Mathias Drton. 1.1 Motivating

More information

EE514A Information Theory I Fall 2013

EE514A Information Theory I Fall 2013 EE514A Information Theory I Fall 2013 K. Mohan, Prof. J. Bilmes University of Washington, Seattle Department of Electrical Engineering Fall Quarter, 2013 http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section

More information

The main results about probability measures are the following two facts:

The main results about probability measures are the following two facts: Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a

More information

1 Random Variable: Topics

1 Random Variable: Topics Note: Handouts DO NOT replace the book. In most cases, they only provide a guideline on topics and an intuitive feel. 1 Random Variable: Topics Chap 2, 2.1-2.4 and Chap 3, 3.1-3.3 What is a random variable?

More information

On the convergence of sequences of random variables: A primer

On the convergence of sequences of random variables: A primer BCAM May 2012 1 On the convergence of sequences of random variables: A primer Armand M. Makowski ECE & ISR/HyNet University of Maryland at College Park armand@isr.umd.edu BCAM May 2012 2 A sequence a :

More information

I. ANALYSIS; PROBABILITY

I. ANALYSIS; PROBABILITY ma414l1.tex Lecture 1. 12.1.2012 I. NLYSIS; PROBBILITY 1. Lebesgue Measure and Integral We recall Lebesgue measure (M411 Probability and Measure) λ: defined on intervals (a, b] by λ((a, b]) := b a (so

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN Lecture Notes 5 Convergence and Limit Theorems Motivation Convergence with Probability Convergence in Mean Square Convergence in Probability, WLLN Convergence in Distribution, CLT EE 278: Convergence and

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

Lectures for APM 541: Stochastic Modeling in Biology. Jay Taylor

Lectures for APM 541: Stochastic Modeling in Biology. Jay Taylor Lectures for APM 541: Stochastic Modeling in Biology Jay Taylor November 3, 2011 Contents 1 Distributions, Expectations, and Random Variables 4 1.1 Probability Spaces...................................

More information

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems

More information

Northwestern University Department of Electrical Engineering and Computer Science

Northwestern University Department of Electrical Engineering and Computer Science Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

STA205 Probability: Week 8 R. Wolpert

STA205 Probability: Week 8 R. Wolpert INFINITE COIN-TOSS AND THE LAWS OF LARGE NUMBERS The traditional interpretation of the probability of an event E is its asymptotic frequency: the limit as n of the fraction of n repeated, similar, and

More information

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory Part V 7 Introduction: What are measures and why measurable sets Lebesgue Integration Theory Definition 7. (Preliminary). A measure on a set is a function :2 [ ] such that. () = 2. If { } = is a finite

More information

Lecture 11. Probability Theory: an Overveiw

Lecture 11. Probability Theory: an Overveiw Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the

More information

1 Review of Probability

1 Review of Probability 1 Review of Probability Random variables are denoted by X, Y, Z, etc. The cumulative distribution function (c.d.f.) of a random variable X is denoted by F (x) = P (X x), < x

More information

1.1 Review of Probability Theory

1.1 Review of Probability Theory 1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,

More information

THE QUEEN S UNIVERSITY OF BELFAST

THE QUEEN S UNIVERSITY OF BELFAST THE QUEEN S UNIVERSITY OF BELFAST 0SOR20 Level 2 Examination Statistics and Operational Research 20 Probability and Distribution Theory Wednesday 4 August 2002 2.30 pm 5.30 pm Examiners { Professor R M

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables Joint Probability Density Let X and Y be two random variables. Their joint distribution function is F ( XY x, y) P X x Y y. F XY ( ) 1, < x

More information

Measure and integration

Measure and integration Chapter 5 Measure and integration In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid.

More information

3. Review of Probability and Statistics

3. Review of Probability and Statistics 3. Review of Probability and Statistics ECE 830, Spring 2014 Probabilistic models will be used throughout the course to represent noise, errors, and uncertainty in signal processing problems. This lecture

More information

18.175: Lecture 2 Extension theorems, random variables, distributions

18.175: Lecture 2 Extension theorems, random variables, distributions 18.175: Lecture 2 Extension theorems, random variables, distributions Scott Sheffield MIT Outline Extension theorems Characterizing measures on R d Random variables Outline Extension theorems Characterizing

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University Chapter 3, 4 Random Variables ENCS6161 - Probability and Stochastic Processes Concordia University ENCS6161 p.1/47 The Notion of a Random Variable A random variable X is a function that assigns a real

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Week 2. Review of Probability, Random Variables and Univariate Distributions

Week 2. Review of Probability, Random Variables and Univariate Distributions Week 2 Review of Probability, Random Variables and Univariate Distributions Probability Probability Probability Motivation What use is Probability Theory? Probability models Basis for statistical inference

More information

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define 1 Measures 1.1 Jordan content in R N II - REAL ANALYSIS Let I be an interval in R. Then its 1-content is defined as c 1 (I) := b a if I is bounded with endpoints a, b. If I is unbounded, we define c 1

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Limiting Distributions

Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the two fundamental results

More information

Stochastic Models (Lecture #4)

Stochastic Models (Lecture #4) Stochastic Models (Lecture #4) Thomas Verdebout Université libre de Bruxelles (ULB) Today Today, our goal will be to discuss limits of sequences of rv, and to study famous limiting results. Convergence

More information

The Lebesgue Integral

The Lebesgue Integral The Lebesgue Integral Brent Nelson In these notes we give an introduction to the Lebesgue integral, assuming only a knowledge of metric spaces and the iemann integral. For more details see [1, Chapters

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

RS Chapter 1 Random Variables 6/5/2017. Chapter 1. Probability Theory: Introduction

RS Chapter 1 Random Variables 6/5/2017. Chapter 1. Probability Theory: Introduction Chapter 1 Probability Theory: Introduction Basic Probability General In a probability space (Ω, Σ, P), the set Ω is the set of all possible outcomes of a probability experiment. Mathematically, Ω is just

More information

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1 Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be

More information

1 Stat 605. Homework I. Due Feb. 1, 2011

1 Stat 605. Homework I. Due Feb. 1, 2011 The first part is homework which you need to turn in. The second part is exercises that will not be graded, but you need to turn it in together with the take-home final exam. 1 Stat 605. Homework I. Due

More information

JUSTIN HARTMANN. F n Σ.

JUSTIN HARTMANN. F n Σ. BROWNIAN MOTION JUSTIN HARTMANN Abstract. This paper begins to explore a rigorous introduction to probability theory using ideas from algebra, measure theory, and other areas. We start with a basic explanation

More information

We introduce methods that are useful in:

We introduce methods that are useful in: Instructor: Shengyu Zhang Content Derived Distributions Covariance and Correlation Conditional Expectation and Variance Revisited Transforms Sum of a Random Number of Independent Random Variables more

More information

Probability Background

Probability Background Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second

More information

µ X (A) = P ( X 1 (A) )

µ X (A) = P ( X 1 (A) ) 1 STOCHASTIC PROCESSES This appendix provides a very basic introduction to the language of probability theory and stochastic processes. We assume the reader is familiar with the general measure and integration

More information

Convergence of Random Variables

Convergence of Random Variables 1 / 15 Convergence of Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay March 19, 2014 2 / 15 Motivation Theorem (Weak

More information

Chapter 6 Expectation and Conditional Expectation. Lectures Definition 6.1. Two random variables defined on a probability space are said to be

Chapter 6 Expectation and Conditional Expectation. Lectures Definition 6.1. Two random variables defined on a probability space are said to be Chapter 6 Expectation and Conditional Expectation Lectures 24-30 In this chapter, we introduce expected value or the mean of a random variable. First we define expectation for discrete random variables

More information

Review: mostly probability and some statistics

Review: mostly probability and some statistics Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random

More information

Mathematical Methods for Physics and Engineering

Mathematical Methods for Physics and Engineering Mathematical Methods for Physics and Engineering Lecture notes for PDEs Sergei V. Shabanov Department of Mathematics, University of Florida, Gainesville, FL 32611 USA CHAPTER 1 The integration theory

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

Review of Probability. CS1538: Introduction to Simulations

Review of Probability. CS1538: Introduction to Simulations Review of Probability CS1538: Introduction to Simulations Probability and Statistics in Simulation Why do we need probability and statistics in simulation? Needed to validate the simulation model Needed

More information

Chapter 4. Chapter 4 sections

Chapter 4. Chapter 4 sections Chapter 4 sections 4.1 Expectation 4.2 Properties of Expectations 4.3 Variance 4.4 Moments 4.5 The Mean and the Median 4.6 Covariance and Correlation 4.7 Conditional Expectation SKIP: 4.8 Utility Expectation

More information

Lecture 4: Probability, Proof Techniques, Method of Induction Lecturer: Lale Özkahya

Lecture 4: Probability, Proof Techniques, Method of Induction Lecturer: Lale Özkahya BBM 205 Discrete Mathematics Hacettepe University http://web.cs.hacettepe.edu.tr/ bbm205 Lecture 4: Probability, Proof Techniques, Method of Induction Lecturer: Lale Özkahya Resources: Kenneth Rosen, Discrete

More information

A D VA N C E D P R O B A B I L - I T Y

A D VA N C E D P R O B A B I L - I T Y A N D R E W T U L L O C H A D VA N C E D P R O B A B I L - I T Y T R I N I T Y C O L L E G E T H E U N I V E R S I T Y O F C A M B R I D G E Contents 1 Conditional Expectation 5 1.1 Discrete Case 6 1.2

More information

Exercises with solutions (Set D)

Exercises with solutions (Set D) Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where

More information

Product measure and Fubini s theorem

Product measure and Fubini s theorem Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω

More information

7 Convergence in R d and in Metric Spaces

7 Convergence in R d and in Metric Spaces STA 711: Probability & Measure Theory Robert L. Wolpert 7 Convergence in R d and in Metric Spaces A sequence of elements a n of R d converges to a limit a if and only if, for each ǫ > 0, the sequence a

More information

Random variables. DS GA 1002 Probability and Statistics for Data Science.

Random variables. DS GA 1002 Probability and Statistics for Data Science. Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities

More information

Measures and Measure Spaces

Measures and Measure Spaces Chapter 2 Measures and Measure Spaces In summarizing the flaws of the Riemann integral we can focus on two main points: 1) Many nice functions are not Riemann integrable. 2) The Riemann integral does not

More information

Fundamental Tools - Probability Theory II

Fundamental Tools - Probability Theory II Fundamental Tools - Probability Theory II MSc Financial Mathematics The University of Warwick September 29, 2015 MSc Financial Mathematics Fundamental Tools - Probability Theory II 1 / 22 Measurable random

More information

Elementary Probability. Exam Number 38119

Elementary Probability. Exam Number 38119 Elementary Probability Exam Number 38119 2 1. Introduction Consider any experiment whose result is unknown, for example throwing a coin, the daily number of customers in a supermarket or the duration of

More information