Probability. Prof. F.P. Kelly. Lent 1996. These notes are maintained by Paul Metcalfe. Comments and corrections to Paul Metcalfe.
Revision date: 2002/09/20 14:45:43. The following people have maintained these notes.
– June 2000: Kate Metcalfe
June 2000 – July 2004: Andrew Rogers
July 2004 – date: Paul Metcalfe
Contents
Introduction
1 Basic Concepts
1.1 Sample Space
1.2 Classical Probability
1.3 Combinatorial Analysis
1.4 Stirling's Formula
2 The Axiomatic Approach
2.1 The Axioms
2.2 Independence
2.3 Distributions
2.4 Conditional Probability
3 Random Variables
3.1 Expectation
3.2 Variance
3.3 Indicator Function
3.4 Inclusion-Exclusion Formula
3.5 Independence
4 Inequalities
4.1 Jensen's Inequality
4.2 Cauchy-Schwarz Inequality
4.3 Markov's Inequality
4.4 Chebyshev's Inequality
4.5 Law of Large Numbers
5 Generating Functions
5.1 Combinatorial Applications
5.2 Conditional Expectation
5.3 Properties of Conditional Expectation
5.4 Branching Processes
5.5 Random Walks
6 Continuous Random Variables
6.1 Jointly Distributed Random Variables
6.2 Transformation of Random Variables
6.3 Moment Generating Functions
6.4 Central Limit Theorem
6.5 Multivariate Normal Distribution
Introduction
These notes are based on the course Probability given by Prof. F.P. Kelly in Cambridge in the Lent Term 1996. This typed version of the notes is totally unconnected with Prof. Kelly.
Other sets of notes are available for different courses. At the time of typing these courses were: Probability, Discrete Mathematics, Analysis, Further Analysis, Methods, Quantum Mechanics, Fluid Dynamics 1, Quadratic Mathematics, Geometry, Dynamics of D.E.'s, Foundations of QM, Electrodynamics, Methods of Math. Phys., Fluid Dynamics 2, Waves (etc.), Statistical Physics, General Relativity, Dynamical Systems, Combinatorics, Bifurcations in Nonlinear Convection.
They may be downloaded from
Chapter 1 Basic Concepts
1.1 Sample Space
Suppose we have an experiment with a set Ω of outcomes. Then Ω is called the sample space. A potential outcome ω ∈ Ω is called a sample point. For instance, if the experiment is tossing coins, then Ω = {H, T}, or if the experiment was tossing two dice, then Ω = {(i, j) : i, j ∈ {1, ..., 6}}.
A subset A of Ω is called an event. An event A occurs if, when the experiment is performed, the outcome ω ∈ Ω satisfies ω ∈ A. For the coin-tossing experiment the event of a head appearing is A = {H}, and for the two dice the event "rolling a four" would be A = {(1, 3), (2, 2), (3, 1)}.
1.2 Classical Probability
If Ω is finite, Ω = {ω₁, ..., ωₙ}, and each of the n sample points is equally likely, then the probability of event A occurring is
P(A) = |A| / |Ω|.
Example. Choose r digits from a table of random numbers. Find the probability that, for 0 ≤ k ≤ 9,
1. no digit exceeds k,
2. k is the greatest digit drawn.
Solution. The event that no digit exceeds k is A_k = {(a₁, ..., a_r) : 0 ≤ aᵢ ≤ k, i = 1, ..., r}. Now |A_k| = (k + 1)^r, so that P(A_k) = ((k + 1)/10)^r.
Let B_k be the event that k is the greatest digit drawn. Then B_k = A_k \ A_{k−1}. Also A_{k−1} ⊂ A_k, so that |B_k| = (k + 1)^r − k^r. Thus
P(B_k) = ((k + 1)^r − k^r) / 10^r.
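The counts in this example can be confirmed by exhaustive enumeration. This is a sketch: the choice r = 3 is an arbitrary illustration, and the check verifies |A_k| = (k+1)^r and |B_k| = (k+1)^r − k^r directly.

```python
# Exhaustive check of the random-digits example:
# P(A_k) = ((k+1)/10)^r and P(B_k) = ((k+1)^r - k^r)/10^r.
from itertools import product

r = 3
tuples = list(product(range(10), repeat=r))  # all 10^r equally likely digit strings

for k in range(10):
    n_A = sum(1 for t in tuples if max(t) <= k)  # no digit exceeds k
    n_B = sum(1 for t in tuples if max(t) == k)  # k is the greatest digit drawn
    assert n_A == (k + 1) ** r
    assert n_B == (k + 1) ** r - k ** r
```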
The problem of the points
Players A and B play a series of games. The winner of a game wins a point. The two players are equally skilful and the stake will be won by the first player to reach a target. They are forced to stop when A is within 2 points and B within 3 points of the target. How should the stake be divided? Pascal suggested that the following 16 continuations were equally likely:
AAAA AAAB AABB ABBB BBBB AABA ABBA BABB ABAA ABAB BBAB BAAA BABA BBBA BAAB BBAA
A wins in 11 of these, so this makes the ratio 11 : 5. It was previously thought that the ratio should be 6 : 4 on considering termination, but these results are not equally likely.
1.3 Combinatorial Analysis
The fundamental rule is: suppose r experiments are such that the first may result in any of n₁ possible outcomes, and such that for each of the possible outcomes of the first i − 1 experiments there are nᵢ possible outcomes to experiment i. Let aᵢ be the outcome of experiment i. Then there are a total of ∏ᵢ₌₁ʳ nᵢ distinct r-tuples (a₁, ..., a_r) describing the possible outcomes of the r experiments.
Proof. Induction.
1.4 Stirling's Formula
For functions g(n) and h(n), we say that g is asymptotically equivalent to h, and write g(n) ∼ h(n), if g(n)/h(n) → 1 as n → ∞.
Theorem 1.1 (Stirling's Formula). As n → ∞,
log( n! / (√(2π) n^{n+1/2} e^{−n}) ) → 0,
and thus n! ∼ √(2πn) nⁿ e^{−n}.
We first prove the weak form of Stirling's formula, that log(n!) ∼ n log n.
Proof. log n! = Σ_{k=1}^{n} log k. Now ∫₁ⁿ log x dx ≤ Σ_{k=1}^{n} log k ≤ ∫₁^{n+1} log x dx, and ∫₁^z log x dx = z log z − z + 1, so
n log n − n + 1 ≤ log n! ≤ (n + 1) log(n + 1) − n.
Divide by n log n and let n → ∞ to sandwich log n! / (n log n) between terms that tend to 1. Therefore log n! ∼ n log n.
Now we prove the strong form.
Proof. For x > 0, we have
1 − x + x² − x³ < 1/(1 + x) < 1 − x + x².
Now integrate from 0 to y to obtain
y − y²/2 + y³/3 − y⁴/4 < log(1 + y) < y − y²/2 + y³/3.
Let h_n = log( n! eⁿ / n^{n+1/2} ). Then h_n − h_{n+1} = (n + 1/2) log(1 + 1/n) − 1, and applying the bounds above with y = 1/n we obtain
1/(12n²) − 1/(12n³) ≤ h_n − h_{n+1} ≤ 1/(12n²) + 1/(6n³).
In particular, for n ≥ 2, 0 ≤ h_n − h_{n+1} ≤ 1/n². Thus h_n is a decreasing sequence, and
0 ≤ h₂ − h_{n+1} = Σ_{r=2}^{n} (h_r − h_{r+1}) ≤ Σ_{r=2}^{∞} 1/r² < ∞.
Therefore h_n is bounded below and decreasing, so is convergent. Let the limit be A. We have obtained n! ∼ e^A n^{n+1/2} e^{−n}.
We need a trick to find A. Let I_r = ∫₀^{π/2} sinʳ θ dθ. We obtain the recurrence I_r = ((r − 1)/r) I_{r−2} by integrating by parts. Therefore
I_{2n} = ((2n)! / (2ⁿ n!)²) (π/2)  and  I_{2n+1} = (2ⁿ n!)² / (2n + 1)!.
Now I_n is decreasing, so 1 ≤ I_{2n}/I_{2n+1} ≤ I_{2n−1}/I_{2n+1} = 1 + 1/(2n). But by substituting our formula in (playing silly buggers with log(1 + 1/n)), we get that
I_{2n}/I_{2n+1} = (π/2) · 2(2n + 1)/(e^{2A} n) → 2π/e^{2A}.
Therefore e^{2A} = 2π as required, and n! ∼ √(2πn) nⁿ e^{−n}.
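Stirling's formula can be sanity-checked numerically. The following sketch computes the difference log n! − log(√(2πn) nⁿ e⁻ⁿ), which should shrink towards 0 (classically at rate roughly 1/(12n)); the sample values of n are arbitrary.

```python
# Numerical check of Stirling's formula: the difference
# log n! - log(sqrt(2*pi*n) * n^n * e^-n) tends to 0 as n grows.
import math

def log_stirling(n):
    return 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n

# math.log accepts arbitrarily large integers, so factorial(1000) is fine
diffs = [math.log(math.factorial(n)) - log_stirling(n) for n in (10, 100, 1000)]
```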
Chapter 2 The Axiomatic Approach
2.1 The Axioms
Let Ω be a sample space. Then probability P is a real-valued function defined on subsets of Ω satisfying:
1. 0 ≤ P(A) ≤ 1 for A ⊂ Ω,
2. P(Ω) = 1,
3. for a finite or infinite sequence A₁, A₂, ... ⊂ Ω of disjoint events, P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ).
The number P(A) is called the probability of event A.
We can look at some distributions here. Consider an arbitrary finite or countable Ω = {ω₁, ω₂, ...} and an arbitrary collection {p₁, p₂, ...} of non-negative numbers with sum 1. If we define
P(A) = Σ_{i : ωᵢ ∈ A} pᵢ,
it is easy to see that this function satisfies the axioms. The numbers p₁, p₂, ... are called a probability distribution. If Ω is finite with n elements, and if p₁ = p₂ = ... = pₙ = 1/n, we recover the classical definition of probability.
Another example would be to let Ω = {0, 1, ...} and attach to outcome r the probability p_r = e^{−λ} λʳ/r! for some λ > 0. This is a distribution (as may be easily verified), and is called the Poisson distribution with parameter λ.
Theorem 2.1 (Properties of P). A probability P satisfies
1. P(Aᶜ) = 1 − P(A),
2. P(∅) = 0,
3. if A ⊂ B then P(A) ≤ P(B),
4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof. Note that Ω = A ∪ Aᶜ, and A ∩ Aᶜ = ∅. Thus P(Ω) = P(A) + P(Aᶜ) = 1. Now we can use this to obtain P(∅) = 1 − P(∅ᶜ) = 0. If A ⊂ B, write B = A ∪ (B ∩ Aᶜ), so that P(B) = P(A) + P(B ∩ Aᶜ) ≥ P(A). Finally, write A ∪ B = A ∪ (B ∩ Aᶜ) and B = (B ∩ A) ∪ (B ∩ Aᶜ). Then P(A ∪ B) = P(A) + P(B ∩ Aᶜ) and P(B) = P(B ∩ A) + P(B ∩ Aᶜ), which gives the result.
Theorem 2.2 (Boole's Inequality). For any A₁, A₂, ... ⊂ Ω,
P(⋃ᵢ₌₁ⁿ Aᵢ) ≤ Σᵢ₌₁ⁿ P(Aᵢ)  and  P(⋃ᵢ₌₁^∞ Aᵢ) ≤ Σᵢ₌₁^∞ P(Aᵢ).
Proof. Let B₁ = A₁ and then inductively let Bᵢ = Aᵢ \ ⋃_{k<i} B_k. Thus the Bᵢ's are disjoint and ⋃ᵢ Bᵢ = ⋃ᵢ Aᵢ. Therefore
P(⋃ᵢ Aᵢ) = P(⋃ᵢ Bᵢ) = Σᵢ P(Bᵢ) ≤ Σᵢ P(Aᵢ),
as Bᵢ ⊂ Aᵢ.
Theorem 2.3 (Inclusion-Exclusion Formula).
P(⋃ᵢ₌₁ⁿ Aᵢ) = Σ_{S ⊂ {1,...,n}, S ≠ ∅} (−1)^{|S|−1} P(⋂_{j∈S} Aⱼ).
Proof. We know that P(A₁ ∪ A₂) = P(A₁) + P(A₂) − P(A₁ ∩ A₂). Thus the result is true for n = 2. We also have that
P(A₁ ∪ ... ∪ Aₙ) = P(A₁ ∪ ... ∪ A_{n−1}) + P(Aₙ) − P((A₁ ∪ ... ∪ A_{n−1}) ∩ Aₙ).
But by distributivity, we have
P(⋃ᵢ₌₁ⁿ Aᵢ) = P(⋃ᵢ₌₁^{n−1} Aᵢ) + P(Aₙ) − P(⋃ᵢ₌₁^{n−1} (Aᵢ ∩ Aₙ)).
Application of the inductive hypothesis yields the result.
Corollary (Bonferroni Inequalities).
P(⋃ᵢ₌₁ⁿ Aᵢ) ≥ or ≤ Σ_{S ⊂ {1,...,n}, 1 ≤ |S| ≤ r} (−1)^{|S|−1} P(⋂_{j∈S} Aⱼ)
according as r is even or odd. Or in other words, if the inclusion-exclusion formula is truncated, the error has the sign of the omitted term and is smaller in absolute value. Note that the case r = 1 is Boole's inequality.
Proof. The result is true for n = 2. If true for n − 1, then it is true for n and r < n by the inductive step above, which expresses an n-fold union in terms of two (n−1)-fold unions. It is true for r = n by the inclusion-exclusion formula.
Example (Derangements). After a dinner, the n guests take coats at random from a pile. Find the probability that at least one guest has the right coat.
Solution. Let A_k be the event that guest k has his¹ own coat. We want P(⋃ᵢ₌₁ⁿ Aᵢ). Now
P(A_{i₁} ∩ ... ∩ A_{i_r}) = (n − r)!/n!,
by counting the number of ways of matching guests and coats after i₁, ..., i_r have taken theirs. Thus
Σ_{i₁ < ... < i_r} P(A_{i₁} ∩ ... ∩ A_{i_r}) = C(n, r)(n − r)!/n! = 1/r!,
and the required probability is
P(⋃ᵢ₌₁ⁿ Aᵢ) = 1 − 1/2! + 1/3! − ... + (−1)^{n−1}/n!,
which tends to 1 − e^{−1} as n → ∞.
Furthermore, let P_m(n) be the probability that exactly m guests take the right coat. Then P₀(n) → e^{−1}, and n! P₀(n) is the number of derangements of n objects. Therefore
P_m(n) = C(n, m) P₀(n − m)(n − m)!/n! = P₀(n − m)/m! → e^{−1}/m!  as n → ∞.
2.2 Independence
Definition 2.1. Two events A and B are said to be independent if
P(A ∩ B) = P(A) P(B).
More generally, a collection of events Aᵢ, i ∈ I, are independent if
P(⋂_{i∈J} Aᵢ) = ∏_{i∈J} P(Aᵢ)
for all finite subsets J ⊂ I.
Example. Two fair dice are thrown. Let A₁ be the event that the first die shows an odd number. Let A₂ be the event that the second die shows an odd number and finally let A₃ be the event that the sum of the two numbers is odd. Are A₁ and A₂ independent? Are A₁ and A₃ independent? Are A₁, A₂ and A₃ independent?
¹ I'm not being sexist, merely a lazy typist. Sex will be assigned at random...
Solution. We first calculate the probabilities of the events A₁, A₂, A₃, A₁ ∩ A₂, A₁ ∩ A₃ and A₁ ∩ A₂ ∩ A₃.
Event / Probability:
A₁ : 1/2
A₂ : 1/2
A₃ : 1/2
A₁ ∩ A₂ : 1/4
A₁ ∩ A₃ : 1/4
A₁ ∩ A₂ ∩ A₃ : 0
Thus by a series of multiplications, we can see that A₁ and A₂ are independent, A₁ and A₃ are independent (also A₂ and A₃), but that A₁, A₂ and A₃ are not independent.
Now we wish to state what we mean by two independent experiments². Consider Ω₁ = {α₁, ...} and Ω₂ = {β₁, ...} with associated probability distributions {p₁, ...} and {q₁, ...}. Then, by two independent experiments, we mean the sample space Ω₁ × Ω₂ with probability distribution P((αᵢ, βⱼ)) = pᵢ qⱼ.
Now suppose A ⊂ Ω₁ and B ⊂ Ω₂. The event A can be interpreted as an event in Ω₁ × Ω₂, namely A × Ω₂, and similarly for B. Then
P(A ∩ B) = Σ_{αᵢ∈A} Σ_{βⱼ∈B} pᵢ qⱼ = Σ_{αᵢ∈A} pᵢ · Σ_{βⱼ∈B} qⱼ = P(A) P(B),
which is why they are called independent experiments. The obvious generalisation to n experiments can be made, but for an infinite sequence of experiments we mean a sample space Ω₁ × Ω₂ × ... satisfying the appropriate formula for all n ∈ N.
You might like to find the probability that n independent tosses of a biased coin with probability of heads p results in a total of r heads.
2.3 Distributions
The binomial distribution with parameters n and p, 0 ≤ p ≤ 1, has Ω = {0, ..., n} and probabilities pᵢ = C(n, i) pⁱ (1 − p)^{n−i}.
Theorem 2.4 (Poisson approximation to binomial). If n → ∞ and p → 0 with np = λ held fixed, then
C(n, r) pʳ (1 − p)^{n−r} → e^{−λ} λʳ/r!.
² or more generally, n.
Proof.
C(n, r) pʳ (1 − p)^{n−r} = (n(n−1)···(n−r+1)/r!) pʳ (1 − p)^{n−r}
= (n/n)((n−1)/n)···((n−r+1)/n) · (λʳ/r!) · (1 − λ/n)^{n−r}    [since p = λ/n]
→ 1 · (λʳ/r!) · e^{−λ} = e^{−λ} λʳ/r!.
Suppose an infinite sequence of independent trials is to be performed. Each trial results in a success with probability p ∈ (0, 1) or a failure with probability 1 − p. Such a sequence is called a sequence of Bernoulli trials. The probability that the first success occurs after exactly r failures is p_r = p(1 − p)ʳ. This is the geometric distribution with parameter p. Since Σ_{r=0}^∞ p_r = 1, the probability that all trials result in failure is zero.
2.4 Conditional Probability
Definition 2.2. Provided P(B) > 0, we define the conditional probability of A given B³ to be
P(A | B) = P(A ∩ B)/P(B).
Whenever we write P(A | B), we assume that P(B) > 0. Note that if A and B are independent then P(A | B) = P(A).
Theorem 2.5.
1. P(A ∩ B) = P(A | B) P(B),
2. P(A ∩ B ∩ C) = P(A | B ∩ C) P(B | C) P(C),
3. P(A | B ∩ C) = P(A ∩ B | C)/P(B | C),
4. the function P(· | B) restricted to subsets of B is a probability function on B.
Proof. Results 1 to 3 are immediate from the definition of conditional probability. For result 4, note that A ∩ B ⊂ B, so P(A ∩ B) ≤ P(B) and thus P(A | B) ≤ 1. P(B | B) = 1 (obviously), so it just remains to show the last axiom. For disjoint Aᵢ's,
P(⋃ᵢ Aᵢ | B) = P(⋃ᵢ (Aᵢ ∩ B))/P(B) = Σᵢ P(Aᵢ ∩ B)/P(B) = Σᵢ P(Aᵢ | B),
as required.
³ read "A given B".
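The Poisson approximation to the binomial proved above is easy to check numerically. This sketch fixes λ = 2 (an arbitrary choice) and watches the worst pointwise error shrink as n grows with np = λ held fixed.

```python
# Binomial(n, lam/n) pmf vs Poisson(lam) pmf as n grows with np = lam fixed.
from math import comb, exp, factorial

def binom_pmf(n, p, r):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(lam, r):
    return exp(-lam) * lam ** r / factorial(r)

lam = 2.0
errs = []
for n in (10, 100, 1000):
    p = lam / n
    errs.append(max(abs(binom_pmf(n, p, r) - poisson_pmf(lam, r)) for r in range(10)))
# errs decreases roughly like 1/n
```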
Theorem 2.6 (Law of total probability). Let B₁, B₂, ... be a partition of Ω. Then
P(A) = Σᵢ P(A | Bᵢ) P(Bᵢ).
Proof. Σᵢ P(A | Bᵢ) P(Bᵢ) = Σᵢ P(A ∩ Bᵢ) = P(⋃ᵢ (A ∩ Bᵢ)) = P(A), as required.
Example (Gambler's Ruin). A fair coin is tossed repeatedly. At each toss a gambler wins 1 if a head shows and loses 1 if tails. He continues playing until his capital reaches m or he goes broke. Find p_x, the probability that he goes broke if his initial capital is x.
Solution. Let A be the event that he goes broke before reaching m, and let H or T be the outcome of the first toss. We condition on the first toss to get P(A) = P(A | H) P(H) + P(A | T) P(T). But P(A | H) = p_{x+1} and P(A | T) = p_{x−1}. Thus we obtain the recurrence
p_x = (p_{x+1} + p_{x−1})/2, i.e. p_{x+1} − p_x = p_x − p_{x−1}.
Thus p_x is linear in x, with p₀ = 1 and p_m = 0, giving p_x = 1 − x/m.
Theorem 2.7 (Bayes' Formula). Let B₁, B₂, ... be a partition of Ω. Then
P(Bᵢ | A) = P(A | Bᵢ) P(Bᵢ) / Σⱼ P(A | Bⱼ) P(Bⱼ).
Proof. P(Bᵢ | A) = P(A | Bᵢ) P(Bᵢ)/P(A) = P(A | Bᵢ) P(Bᵢ) / Σⱼ P(A | Bⱼ) P(Bⱼ), by the law of total probability.
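The Gambler's Ruin recurrence can be solved numerically and compared with the closed form p_x = 1 − x/m. This is a sketch; the target m = 10 is an arbitrary illustration, and the fixed-point iteration simply repeats the averaging equation until it converges.

```python
# Solve p_x = (p_{x+1} + p_{x-1})/2 with p_0 = 1, p_m = 0 by iteration,
# then compare with the linear solution p_x = 1 - x/m.
m = 10
p = [0.0] * (m + 1)
p[0] = 1.0

for _ in range(10000):          # iterate the averaging equations to convergence
    for x in range(1, m):
        p[x] = 0.5 * (p[x + 1] + p[x - 1])

for x in range(m + 1):
    assert abs(p[x] - (1 - x / m)) < 1e-9
```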
17 Chapter 3 Random Variables Let Ω be finite or countable, and let p ω P({ω}) for ω Ω. Definition 3.. A random variable X is a function X : Ω R. Note that random variable is a somewhat inaccurate term, a random variable is neither random nor a variable. Example. If Ω {(i, j), i, j t}, then we can define random variables X and Y by X(i, j) i + j and Y (i, j) max{i, j} Let R X be the image of Ω under X. When the range is finite or countable then the random variable is said to be discrete. We write P(X x i ) for ω:x(ω)x i p ω, and for B R P(X B) x B P(X x). Then (P(X x), x R X ) is the distribution of the random variable X. Note that it is a probability distribution over R X. 3. Expectation Definition 3.2. The expectation of a random variable X is the number E[X] ω Ω p w X(ω) provided that this sum converges absolutely.
18 2 CHAPTER 3. RANDOM VARIABLES Note that E[X] p w X(ω) ω Ω p ω X(ω) x R X ω:x(ω)x x R X x ω:x(ω)x p ω x R X xp(x x). Absolute convergence allows the sum to be taken in any order. If X is a positive random variable and if ω Ω p ωx(ω) we write E[X] +. If x R X x 0 xp(x x) and xp(x x) x R X x<0 then E[X] is undefined. Example. If P(X r) e λ λr r!, then E[X] λ. Solution. E[X] r0 λ λr re r! λe λ r λ r (r )! λe λ e λ λ Example. If P(X r) ( n r) p r ( p) n r then E[X] np.
19 3.. EXPECTATION 3 Solution. E[X] n ( ) n rp r ( p) n r r r0 n n! r r!(n r)! pr ( p) n r r0 n (n )! n (r )!(n r)! pr ( p) n r np r n r (n )! (r )!(n r)! pr ( p) n r n (n )! np (r)!(n r)! pr ( p) n r r n ( ) n np p r ( p) n r r np r For any function f : R R the composition of f and X defines a new random variable f and X defines the new random variable f(x) given by f(x)(w) f(x(w)). Example. If a, b and c are constants, then a + bx and (X c) 2 are random variables defined by Note that E[X] is a constant. Theorem 3... If X 0 then E[X] 0. (a + bx)(w) a + bx(w) and (X c) 2 (w) (X(w) c) If X 0 and E[X] 0 then P(X 0). 3. If a and b are constants then E[a + bx] a + be[x]. 4. For any random variables X, Y then E[X + Y ] E[X] + E[Y ]. [ 5. E[X] is the constant which minimises E (X c) 2]. Proof.. X 0 means X w 0 w Ω So E[X] ω Ω p ω X(ω) 0 2. If ω Ω with p ω > 0 and X(ω) > 0 then E[X] > 0, therefore P(X 0).
20 4 CHAPTER 3. RANDOM VARIABLES 3. E[a + bx] (a + bx(ω)) p ω ω Ω a p ω + b p ω X(ω) ω Ω ω Ω a + E[X]. 4. Trivial. 5. Now E [ (X c) 2] E [ (X E[X] + E[X] c) 2] E [ [(X E[X]) 2] + 2(X E[X])(E[X] c) + [(E[X] c)] 2 ] E [ (X E[X]) 2] + 2(E[X] c)e[(x E[X])] + (E[X] c) 2 E [ (X E[X]) 2] + (E[X] c) 2. This is clearly minimised when c E[X]. Theorem 3.2. For any random variables X, X 2,..., X n [ n ] n E X i E[X i ] i i Proof. [ n ] [ n ] E X i E X i + X n i E i [ n ] X i + E[X] i Result follows by induction. 3.2 Variance Var X E [ X 2] E[X] 2 Standard Deviation Var X E[X E[X]] 2 σ 2 for Random Variable X Theorem 3.3. Properties of Variance (i) Var X 0 if Var X 0, then P(X E[X]) Proof - from property of expectation (ii) If a, b constants, Var (a + bx) b 2 Var X
Proof. Var(a + bX) = E[(a + bX − a − bE[X])²] = b² E[(X − E[X])²] = b² Var X.
(iii) Var X = E[X²] − E[X]².
Proof. E[(X − E[X])²] = E[X² − 2X E[X] + E[X]²] = E[X²] − 2E[X] E[X] + E[X]² = E[X²] − E[X]².
Example. Let X have the geometric distribution P(X = r) = pqʳ with r = 0, 1, 2, ... and p + q = 1. Then E[X] = q/p and Var X = q/p².
Solution.
E[X] = Σ_{r=0}^∞ r p qʳ = pq Σ_{r=0}^∞ d/dq (qʳ) = pq d/dq (1/(1 − q)) = pq(1 − q)^{−2} = q/p.
E[X²] = Σ_{r=0}^∞ r² p qʳ = pq Σ_{r=0}^∞ (r(r + 1) − r) q^{r−1} = pq (2/(1 − q)³ − 1/(1 − q)²) = 2q/p² − q/p.
Var X = E[X²] − E[X]² = 2q/p² − q/p − q²/p² = q/p².
Definition 3.3. The covariance of random variables X and Y is
Cov(X, Y) = E[(X − E[X])(Y − E[Y])].
The correlation of X and Y is
Corr(X, Y) = Cov(X, Y) / √(Var X · Var Y).
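The geometric-distribution example above (E[X] = q/p, Var X = q/p²) can be checked by summing the series numerically. This is a sketch: the parameter 0.3 is an arbitrary choice, and the series is truncated where the tail is negligible.

```python
# Check E[X] = q/p and Var X = q/p^2 for P(X = r) = p q^r, r = 0, 1, 2, ...
p_succ = 0.3                      # arbitrary illustrative parameter
q = 1 - p_succ
pmf = [p_succ * q ** r for r in range(2000)]  # tail beyond 2000 is negligible

mean = sum(r * pr for r, pr in enumerate(pmf))
var = sum(r * r * pr for r, pr in enumerate(pmf)) - mean ** 2
assert abs(mean - q / p_succ) < 1e-9
assert abs(var - q / p_succ ** 2) < 1e-9
```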
22 6 CHAPTER 3. RANDOM VARIABLES Linear Regression Theorem 3.4. Var (X + Y ) Var X + Var Y + 2Cov(X, Y ) Proof. Var (X + Y ) E [ (X + Y ) 2 E[X] E[Y ] ] 2 E [ (X E[X]) 2 + (Y E[Y ]) 2 + 2(X E[X])(Y E[Y ]) ] Var X + Var Y + 2Cov(X, Y ) 3.3 Indicator Function Definition 3.4. The Indicator Function I[A] of an event A Ω is the function I[A](w) {, if ω A; 0, if ω / A. (3.) NB that I[A] is a random variable. E[I[A]] P(A) E[I[A]] p ω I[A](w) ω Ω P(A) 2. I[A c ] I[A] 3. I[A B] I[A]I[B] 4. I[A B] I[A] + I[B] I[A]I[B] I[A B](ω) if ω A or ω B I[A B](ω) I[A](ω) + I[B](ω) I[A]I[B](ω) WORKS! Example. n couples are arranged randomly around a table such that males and females alternate. Let N The number of husbands sitting next to their wives. Calculate
E[N] and Var N.
Solution. Let Aᵢ be the event that couple i are seated together. Then N = Σᵢ₌₁ⁿ I[Aᵢ], and P(Aᵢ) = 2/n (the wife occupies one of the n female seats, 2 of which are adjacent to her husband). Thus
E[N] = Σᵢ₌₁ⁿ E[I[Aᵢ]] = Σᵢ₌₁ⁿ 2/n = 2.
E[N²] = E[(Σᵢ₌₁ⁿ I[Aᵢ])²] = E[Σᵢ Σⱼ I[Aᵢ] I[Aⱼ]] = n E[I[A₁]²] + n(n − 1) E[I[A₁] I[A₂]].
Now E[I[A₁]²] = E[I[A₁]] = 2/n, and
E[I[A₁] I[A₂]] = P(A₁ ∩ A₂) = P(A₁) P(A₂ | A₁) = (2/n) · (1 · 1 + (n − 2) · 2)/(n − 1)² = (2/n)(2n − 3)/(n − 1)²
(given A₁, wife 2 sits in the one free seat adjacent to husband 1 with probability 1/(n − 1), leaving 1 free place adjacent to her, and elsewhere with probability (n − 2)/(n − 1), leaving 2). Therefore
E[N²] = 2 + n(n − 1)(2/n)(2n − 3)/(n − 1)² = 2 + 2(2n − 3)/(n − 1),
Var N = E[N²] − E[N]² = 2(2n − 3)/(n − 1) − 2 = 2(n − 2)/(n − 1).
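The values E[N] = 2 and Var N = 2(n − 2)/(n − 1) (as reconstructed above) can be confirmed exactly by exhaustive enumeration for small n. This sketch fixes the husbands by symmetry and permutes only the wives: woman-seat j sits between the seats of man j and man (j + 1) mod n.

```python
# Exact moments of N (couples sitting together) by enumerating wife seatings.
from fractions import Fraction
from itertools import permutations

def moments(n):
    total = ev = ev2 = 0
    for sigma in permutations(range(n)):      # sigma[j] = wife at woman-seat j
        # woman-seat j is adjacent to man j and man (j + 1) mod n
        N = sum(1 for j in range(n) if sigma[j] in (j, (j + 1) % n))
        total += 1
        ev += N
        ev2 += N * N
    mean = Fraction(ev, total)
    var = Fraction(ev2, total) - mean ** 2
    return mean, var

for n in (3, 4, 5, 6):
    mean, var = moments(n)
    assert mean == 2
    assert var == Fraction(2 * (n - 2), n - 1)
```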
24 8 CHAPTER 3. RANDOM VARIABLES 3.4 Inclusion - Exclusion Formula ( N N A i A c i [ N ] [( N I A i I I ) c A c i [ N ) c ] A c i ] N I[A c i] N ( I[A i ]) N I[A i ] i i 2 I[A ]I[A 2 ] ( ) j+ I[A ]I[A 2 ]...I[A j ] +... i i 2... i j Take Expectation [ N ] ( N ) E A i P A i N P(A i ) i i 2 P(A A 2 ) ( ) j+ i i 2... i j P ( Ai A i2... A ij ) Independence Definition 3.5. Discrete random variables X,..., X n are independent if and only if for any x...x n : P(X x, X 2 x 2...X n x n ) N P(X i x i ) Theorem 3.5 (Preservation of Independence). If X,..., X n are independent random variables and f, f 2...f n are functions R R then f (X )...f n (X n ) are independent random variables
25 3.5. INDEPENDENCE 9 Proof. P(f (X ) y,..., f n (X n ) y n ) x :f (X )y,... x n:f n(x n)y n N P(X i x i ) x i:f i(x i)y i N P(f i (X i ) y i ) P(X x,..., X n x n ) Theorem 3.6. If X...X n are independent random variables then: [ N ] E X i N E[X i ] NOTE that E[ X i ] E[X i ] without requiring independence. Proof. Write R i for R Xi the range of X i [ N ] E X i... x..x n P(X x, X 2 x 2..., X n x n ) x R x n R n ( ) N P(X i x i ) x i R i N E[X i ] Theorem 3.7. If X,..., X n are independent random variables and f...f n are function R R then: [ N ] N E f i (X i ) E[f i (X i )] Proof. Obvious from last two theorems! Theorem 3.8. If X,..., X n are independent random variables then: ( n ) n Var X i Var X i i i
26 20 CHAPTER 3. RANDOM VARIABLES Proof. ( n ) ( n ) 2 [ n ] 2 Var X i E X i E X i i i i E Xi 2 + [ n ] 2 X i X j E X i i i j i E [ X 2 ] i + E[X i X j ] E[X i ] 2 E[X i ] E[X j ] i i j i i j ( E [ X 2 ] i E[Xi ] 2) i i Var X i Theorem 3.9. If X,..., X n are independent identically distributed random variables then Proof. Var Var ( n ( n ) n X i n Var X i i ) n X i n 2 Var X i i n n 2 Var X i i n Var X i Example. Experimental Design. Two rods of unknown lengths a, b. A rule can measure the length but with but with error having 0 mean (unbiased) and variance σ 2. Errors independent from measurement to measurement. To estimate a, b we could take separate measurements A, B of each rod. E[A] a Var A σ 2 E[B] b Var B σ 2
Can we do better? Yes. Measure a + b as X and a − b as Y. Then
E[X] = a + b, Var X = σ², E[Y] = a − b, Var Y = σ²,
and since X and Y are independent measurements,
E[(X + Y)/2] = a, Var((X + Y)/2) = σ²/2,
E[(X − Y)/2] = b, Var((X − Y)/2) = σ²/2.
So this is better: two measurements estimate both rods with half the variance.
Example (Non-standard dice). Dice can be numbered non-standardly so that, arranged around a cycle, each die beats the next with probability P(A beats B) = 2/3. You choose a die first, then I choose one: whichever you pick, I can pick a die that beats it. So the relation "A better than B" is not transitive.
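The non-transitive dice can be made concrete. The transcription does not preserve the notes' own numbering, so the faces below are an assumption: they are Efron's classic four dice, in which each die beats the next around the cycle with probability exactly 2/3.

```python
# Efron's non-transitive dice: A > B > C > D > A, each with probability 2/3.
from fractions import Fraction
from itertools import product

dice = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

def p_beats(x, y):
    wins = sum(1 for a, b in product(dice[x], dice[y]) if a > b)
    return Fraction(wins, 36)

cycle = ["A", "B", "C", "D", "A"]
probs = [p_beats(cycle[i], cycle[i + 1]) for i in range(4)]
assert all(pr == Fraction(2, 3) for pr in probs)
```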
Chapter 4 Inequalities
4.1 Jensen's Inequality
A function f : (a, b) → R is convex if
f(px + (1 − p)y) ≤ p f(x) + (1 − p) f(y)  for all x, y ∈ (a, b) and p ∈ (0, 1).
It is strictly convex if strict inequality holds whenever x ≠ y. f is concave if −f is convex, and strictly concave if −f is strictly convex.
[Figures in the original sketch convex, concave, and neither-convex-nor-concave functions.]
We know that if f is twice differentiable and f″(x) ≥ 0 for x ∈ (a, b) then f is convex, and strictly convex if f″(x) > 0 for x ∈ (a, b).
Example. f(x) = −log x is strictly convex on (0, ∞): f′(x) = −1/x, f″(x) = 1/x² > 0.
Example. f(x) = −x log x is strictly concave: f′(x) = −(1 + log x), f″(x) = −1/x < 0.
Example. f(x) = x³ is strictly convex on (0, ∞) but not on (−∞, ∞).
Theorem 4.1. Let f : (a, b) → R be a convex function. Then
Σᵢ₌₁ⁿ pᵢ f(xᵢ) ≥ f(Σᵢ₌₁ⁿ pᵢ xᵢ)
for all x₁, ..., xₙ ∈ (a, b) and p₁, ..., pₙ ∈ (0, 1) with Σpᵢ = 1; that is, E[f(X)] ≥ f(E[X]). Furthermore, if f is strictly convex then equality holds if and only if all the xᵢ are equal.
Proof. By induction on n. For n = 1 there is nothing to prove, and n = 2 is the definition of convexity. Assume the result holds up to n − 1. Consider x₁, ..., xₙ ∈ (a, b) and p₁, ..., pₙ ∈ (0, 1) with Σpᵢ = 1. For i = 2, ..., n set pᵢ′ = pᵢ/(1 − p₁), so that Σᵢ₌₂ⁿ pᵢ′ = 1. Then by the inductive hypothesis twice, first for n − 1, then for 2,
Σᵢ₌₁ⁿ pᵢ f(xᵢ) = p₁ f(x₁) + (1 − p₁) Σᵢ₌₂ⁿ pᵢ′ f(xᵢ)
≥ p₁ f(x₁) + (1 − p₁) f(Σᵢ₌₂ⁿ pᵢ′ xᵢ)
≥ f(p₁ x₁ + (1 − p₁) Σᵢ₌₂ⁿ pᵢ′ xᵢ) = f(Σᵢ₌₁ⁿ pᵢ xᵢ).
If f is strictly convex, n ≥ 3 and not all the xᵢ are equal, then we may assume not all of x₂, ..., xₙ are equal. But then
Σᵢ₌₂ⁿ pᵢ′ f(xᵢ) > f(Σᵢ₌₂ⁿ pᵢ′ xᵢ),
so the inequality is strict.
Corollary (AM/GM Inequality). For positive real numbers x₁, ..., xₙ,
(∏ᵢ₌₁ⁿ xᵢ)^{1/n} ≤ (1/n) Σᵢ₌₁ⁿ xᵢ.
Equality holds if and only if x₁ = x₂ = ... = xₙ.
Proof. Let P(X = xᵢ) = 1/n. Then f(x) = −log x is a convex function on (0, ∞), so by Jensen's inequality E[f(X)] ≥ f(E[X]), that is
−E[log X] ≥ −log E[X].  [1]
Therefore (1/n) Σ log xᵢ ≤ log((1/n) Σ xᵢ), and exponentiating,
(∏ᵢ₌₁ⁿ xᵢ)^{1/n} ≤ (1/n) Σᵢ₌₁ⁿ xᵢ.  [2]
For strictness: since f is strictly convex, equality holds in [1], and hence in [2], if and only if x₁ = x₂ = ... = xₙ.
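The AM/GM inequality can be checked numerically on random positive tuples (a sketch; the tuple length and sampling range are arbitrary, and the seed just makes the run reproducible).

```python
# Numerical check of AM/GM: geometric mean <= arithmetic mean.
import math
import random

random.seed(1)
for _ in range(100):
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    gm = math.prod(xs) ** (1 / len(xs))
    am = sum(xs) / len(xs)
    assert gm <= am + 1e-12      # small slack for floating-point error

# equality when all the x_i are equal
assert math.isclose(math.prod([3.0] * 5) ** (1 / 5), 3.0)
```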
32 26 CHAPTER 4. INEQUALITIES If f : (a, b) R is a convex function then it can be shown that at each point y (a, b) a linear function α y + β y x such that f(x) α y + β y x x (a, b) f(y) α y + β y y If f is differentiable at y then the linear function is the tangent f(y) + (x y)f (y) Let y E[X], α α y and β β y f (E[X]) α + βe[x] So for any random variable X taking values in (a, b) E[f(X)] E[α + βx] α + βe[x] f (E[X]) 4.2 Cauchy-Schwarz Inequality Theorem 4.2. For any random variables X, Y, Proof. For a, b R Let E[XY ] 2 E [ X 2] E [ Y 2] LetZ ax by Then0 E [ Z 2] E [ (ax by ) 2] a 2 E [ X 2] 2abE[XY ] + b 2 E [ Y 2] quadratic in a with at most one real root and therefore has discriminant 0.
33 4.3. MARKOV S INEQUALITY 27 Take b 0 E[XY ] 2 E [ X 2] E [ Y 2] Corollary. Corr(X, Y ) Proof. Apply Cauchy-Schwarz to the random variables X E[X] and Y E[Y ] 4.3 Markov s Inequality Theorem 4.3. If X is any random variable with finite mean then, Proof. Let P( X a) E[ X ] a for any a 0 A X a T hen X ai[a] Take expectation E[ X ] ap(a) E[ X ] ap( X a) 4.4 Chebyshev s Inequality Theorem 4.4. Let X be a random variable with E [ X 2]. Then ɛ 0 Proof. P( X ɛ) E[ X 2] ɛ 2 I[ X ɛ] x2 ɛ 2 x
Then I[|X| ≥ ɛ] ≤ X²/ɛ². Taking expectations,
P(|X| ≥ ɛ) = E[I[|X| ≥ ɛ]] ≤ E[X²]/ɛ².
Note
1. The result is distribution-free: no assumption is made about the distribution of X (other than E[X²] < ∞).
2. It is the best possible inequality, in the following sense. Let X = +ɛ with probability c/(2ɛ²), X = −ɛ with probability c/(2ɛ²), and X = 0 with probability 1 − c/ɛ². Then P(|X| ≥ ɛ) = c/ɛ² and E[X²] = c, so P(|X| ≥ ɛ) = c/ɛ² = E[X²]/ɛ².
3. If µ = E[X], then applying the inequality to X − µ gives
P(|X − µ| ≥ ɛ) ≤ Var X/ɛ².
Often the most useful form.
4.5 Law of Large Numbers
Theorem 4.5 (Weak law of large numbers). Let X₁, X₂, ... be a sequence of independent identically distributed random variables with mean µ and variance σ². Let Sₙ = Σᵢ₌₁ⁿ Xᵢ. Then for all ɛ > 0,
P(|Sₙ/n − µ| ≥ ɛ) → 0  as n → ∞.
Proof. By Chebyshev's inequality,
P(|Sₙ/n − µ| ≥ ɛ) ≤ E[(Sₙ/n − µ)²]/ɛ² = E[(Sₙ − nµ)²]/(n²ɛ²) = Var Sₙ/(n²ɛ²),
since E[Sₙ] = nµ. But Var Sₙ = nσ², so
P(|Sₙ/n − µ| ≥ ɛ) ≤ nσ²/(n²ɛ²) = σ²/(nɛ²) → 0.
Example. A₁, A₂, ... are independent events, each with probability p. Let Xᵢ = I[Aᵢ]. Then
Sₙ/n = (number of times A occurs)/(number of trials),
and µ = E[I[Aᵢ]] = P(Aᵢ) = p. The theorem states that
P(|Sₙ/n − p| ≥ ɛ) → 0  as n → ∞,
which recovers the intuitive definition of probability.
Example. A random sample of size n is a sequence X₁, X₂, ..., Xₙ of independent identically distributed random variables ("n observations"). X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ is called the sample mean. The theorem states that, provided the variance of Xᵢ is finite, the probability that the sample mean differs from the mean of the distribution by more than ɛ approaches 0 as n → ∞.
We have shown the weak law of large numbers. Why "weak"? There is a strong form of the law of large numbers:
P(Sₙ/n → µ as n → ∞) = 1.
This is NOT the same as the weak form. What does this mean? Each ω ∈ Ω determines Sₙ/n, n = 1, 2, ..., as a sequence of real numbers; hence it either tends to µ or it doesn't. The strong law states that
P({ω : Sₙ(ω)/n → µ as n → ∞}) = 1.
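The weak law is easy to see in simulation. This sketch uses fair-coin indicators (µ = 1/2); the tolerance ɛ = 0.05, the trial counts and the seed are all arbitrary choices, and the empirical deviation frequency should fall sharply as n grows.

```python
# Simulation of the weak law: P(|S_n/n - mu| >= eps) shrinks with n.
import random

random.seed(42)
mu, eps = 0.5, 0.05

def deviation_freq(n, trials=200):
    """Empirical frequency of |S_n/n - mu| >= eps over independent repetitions."""
    bad = 0
    for _ in range(trials):
        s = sum(1 for _ in range(n) if random.random() < mu)
        if abs(s / n - mu) >= eps:
            bad += 1
    return bad / trials

f_small, f_large = deviation_freq(20), deviation_freq(2000)
```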
Chapter 5 Generating Functions
In this chapter, assume that X is a random variable taking values in the range 0, 1, 2, .... Let p_r = P(X = r), r = 0, 1, 2, ....
Definition 5.1. The probability generating function (p.g.f.) of the random variable X, or of the distribution (p_r, r = 0, 1, 2, ...), is
p(z) = E[z^X] = Σ_{r=0}^∞ z^r P(X = r) = Σ_{r=0}^∞ p_r z^r.
This p(z) is a polynomial or a power series. If a power series, then it is convergent for |z| ≤ 1 by comparison with a geometric series:
|p(z)| ≤ Σ_r p_r |z|^r ≤ Σ_r p_r = 1.
Example. For a fair die, p_r = 1/6 for r = 1, ..., 6, and
p(z) = E[z^X] = (1/6)(z + z² + ... + z⁶) = (z/6)(1 − z⁶)/(1 − z).
Theorem 5.1. The distribution of X is uniquely determined by the p.g.f. p(z).
Proof. We know that we can differentiate p(z) term by term for |z| < 1. Repeated differentiation gives p′(z) = p₁ + 2p₂z + ..., and so p′(0) = p₁ (and p(0) = p₀). In general
p^{(i)}(z) = Σ_{r≥i} (r!/(r − i)!) p_r z^{r−i},
and hence p^{(i)}(0) = i! pᵢ. Thus we can recover p₀, p₁, ... from p(z).
38 32 CHAPTER 5. GENERATING FUNCTIONS Theorem 5.2 (Abel s Lemma). Proof. p (z) E[X] lim z p (z) rp r z r z ri For z (0, ), p (z) is a non decreasing function of z and is bounded above by Choose ɛ 0, N large enough that Then True ɛ 0 and so z ri E[X] rp r ri N rp r E[X] ɛ ri N N lim rp r z r lim rp r z r rp r z ri E[X] lim z p (z) ri Usually p (z) is continuous at z, then E[X] p (). ( Recall p(z) z z 6 ) 6 z Theorem 5.3. Proof. E[X(X )] lim z p (z) p (z) Proof now the same as Abel s Lemma r(r )pz r 2 r2 Theorem 5.4. Suppose that X, X 2,..., X n are independent random variables with p.g.f s p (z), p 2 (z),..., p n (z). Then the p.g.f of X + X X n is Proof. p (z)p 2 (z)... p n (z) E [ z X+X2+...Xn] E [ z X.z X2....z Xn] E [ z X] E [ z X2]... E [ z Xn] p (z)p 2 (z)... p n (z)
39 33 Example. Suppose X has Poisson Distribution Then Let s calculate the variance of X Then λ λr P(X r) e r! E [ z X] r0 r 0,,... z r λ λr e r! e λ e λz e λ( z) p λe λ( z) p λ 2 e λ( z) E[X] lim z p (z) p ()( Since p (z) continuous at z )E[X] λ E[X(X )] p () λ 2 Var X E [ X 2] E[X] 2 E[X(X )] + E[X] E[X] 2 λ 2 + λ λ 2 λ Example. Suppose that Y has a Poisson Distribution with parameter µ. If X and Y are independent then: E [ z X+Y ] E [ z X] E [ z Y ] e λ( z) e µ( z) e (λ+µ)( z) But this is the p.g.f of a Poisson random variable with parameter λ + µ. By uniqueness (first theorem of the p.g.f) this must be the distribution for X + Y Example. X has a binomial Distribution, ( n P(X r) r E [ z X] n r0 ) p r ( p) n r r 0,,... ( ) n p r ( p) n r z r r (pz + p) n This shows that X Y + Y Y n. Where Y + Y Y n are independent random variables each with P(Y i ) p P(Y i 0) p Note if the p.g.f factorizes look to see if the random variable can be written as a sum.
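The p.g.f. argument above shows that the sum of independent Poisson(λ) and Poisson(µ) variables is Poisson(λ + µ). That can be cross-checked by direct convolution of the distributions (a sketch; the truncation length 60 and the parameters 2 and 3 are arbitrary).

```python
# Convolution check: Poisson(2) + Poisson(3) has the Poisson(5) distribution.
from math import exp, factorial, isclose

def poisson(lam, n=60):
    return [exp(-lam) * lam ** r / factorial(r) for r in range(n)]

def convolve(p, q):
    """Distribution of X + Y for independent X ~ p, Y ~ q (same truncation)."""
    n = len(p)
    return [sum(p[i] * q[r - i] for i in range(r + 1)) for r in range(n)]

s = convolve(poisson(2.0), poisson(3.0))
target = poisson(5.0)
assert all(isclose(a, b, rel_tol=1e-9) for a, b in zip(s[:40], target[:40]))
```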
5.1 Combinatorial Applications
Tile a (2 × n) bathroom with (2 × 1) tiles. How many ways fₙ can this be done? Considering the left-most edge, either one vertical tile or two horizontal tiles can be placed there, so
fₙ = f_{n−1} + f_{n−2},  f₀ = f₁ = 1.
Let F(z) = Σ_{n=0}^∞ fₙ zⁿ. Then Σ_{n≥2} fₙ zⁿ = Σ_{n≥2} f_{n−1} zⁿ + Σ_{n≥2} f_{n−2} zⁿ, that is
F(z) − f₀ − z f₁ = z(F(z) − f₀) + z² F(z),
F(z)(1 − z − z²) = f₀(1 − z) + z f₁ = 1.
Since f₀ = f₁ = 1, F(z) = 1/(1 − z − z²). Let α₁ = (1 + √5)/2 and α₂ = (1 − √5)/2, so that 1 − z − z² = (1 − α₁z)(1 − α₂z). Then
F(z) = 1/((1 − α₁z)(1 − α₂z)) = (1/(α₁ − α₂)) (α₁/(1 − α₁z) − α₂/(1 − α₂z)).
The coefficient of zⁿ, that is fₙ, is
fₙ = (α₁^{n+1} − α₂^{n+1})/(α₁ − α₂).
5.2 Conditional Expectation
Let X and Y be random variables with joint distribution P(X = x, Y = y). Then the distribution of X is
P(X = x) = Σ_{y ∈ R_Y} P(X = x, Y = y).
This is often called the marginal distribution for X. The conditional distribution of X given Y = y is
P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y).
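The closed form for the tiling numbers extracted from the generating function above can be checked against the recurrence directly (a sketch; the range of n is arbitrary, and rounding absorbs floating-point error in the powers of the golden ratio).

```python
# Tiling counts: recurrence f_n = f_{n-1} + f_{n-2} vs the closed form
# f_n = (a1^{n+1} - a2^{n+1}) / (a1 - a2), a1,2 = (1 +- sqrt 5)/2.
from math import sqrt

def f_rec(n):
    a, b = 1, 1                   # f_0 = f_1 = 1
    for _ in range(n):
        a, b = b, a + b
    return a

def f_closed(n):
    a1 = (1 + sqrt(5)) / 2
    a2 = (1 - sqrt(5)) / 2
    return (a1 ** (n + 1) - a2 ** (n + 1)) / (a1 - a2)

for n in range(20):
    assert round(f_closed(n)) == f_rec(n)
```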
41 5.3. PROPERTIES OF CONDITIONAL EXPECTATION 35 Definition 5.2. The conditional expectation of X given Y y is, E[X x Y y] x R x xp(x x Y y) The conditional Expectation of X given Y is the random variable E[X Y ] defined by Thus E[X Y ] : Ω R E[X Y ] (ω) E[X Y Y (ω)] Example. Let X, X 2,..., X n be independent identically distributed random variables with P(X ) p and P(X 0) p. Let Then Y X + X X n P(X Y r) P(X, Y r) P(Y r) P(X, X X n r ) P(Y r) P(X ) P(X X n r ) P(Y r) p( ) n r p r ( p) n r ) pr ( p) n r ( n r ( n r ( n r) ) r n Then E[X Y r] 0 P(X 0 Y r) + P(X Y r) r n E[X Y Y (ω)] n Y (ω) Therefore E[X Y ] n Y Note a random variable - a function of Y. 5.3 Properties of Conditional Expectation Theorem 5.5. E[E[X Y ]] E[X]
42 36 CHAPTER 5. GENERATING FUNCTIONS Proof. E[E[X Y ]] P(Y y) E[X Y y] y R y y y E[X] P(Y y) P(X x Y y) x R x xp(x x Y y) x Theorem 5.6. If X and Y are independent then E[X Y ] E[X] Proof. If X and Y are independent then for any y R y E[X Y y] xp(x x Y y) xp(x x) E[X] x R x x Example. Let X, X 2,... be i.i.d.r.v s with p.g.f p(z). Let N be a random variable independent of X, X 2,... with p.g.f h(z). What is the p.g.f of: X + X X N E [ z X+,...,Xn] E [ E [ z X+,...,Xn N ]] P(N n) E [ z X+,...,Xn N n ] n0 P(N n) (p(z)) n n0 h(p(z)) Then for example E[X +,..., X n ] d dz h(p(z)) z h ()p () E[N] E[X ] Exercise Calculate d2 dz 2 h(p(z)) and hence Var X +,..., X n In terms of Var N and Var X
43 5.4. BRANCHING PROCESSES Branching Processes X 0, X... sequence of random variables. X n number of individuals in the n th generation of population. Assume.. X 0 2. Each individual lives for unit time then on death produces k offspring, probability f k. f k 3. All offspring behave independently. Assume Where Y n i. f f 0 + f X n+ Y n + Y n Y n n are i.i.d.r.v s. Y n i number of offspring of individual i in generation n. Let F(z) be the probability generating function ofy n i. Let F (z) f k z k E [ z Xi] ] E [z Y n i n0 F n (z) E [ z Xn] Then F (z) F (z) the probability generating function of the offspring distribution. Theorem 5.7. F n (z) is an n-fold iterative formula. F n+ (z) F n (F (z)) F (F (... (F (z))... ))
44 38 CHAPTER 5. GENERATING FUNCTIONS Proof. F n+ (z) E [ z Xn+] E [ E [ ]] z Xn+ X n P(X n k) E [ z Xn+ X n k ] n0 n0 n0 ] P(X n k) E [z Y n +Y n 2 + +Y n n ] [ ] P(X n k) E [z Y n... E z Y n n P(X n k) (F (z)) k n0 F n (F (z)) Theorem 5.8. Mean and Variance of population size If m and σ 2 kf k k0 (k m) 2 f k k0 Mean and Variance of offspring distribution. Then E[X n ] m n Var X n { σ 2 m n (m n ) m, m nσ 2, m (5.) Proof. Prove by calculating F (z), F (z) Alternatively E[X n ] E[E[X n X n ]] E[m X n ] me[x n ] m n by induction E [ (X n mx n ) 2] E [ E [ (X n mx n ) 2 X n ]] E[Var (X n X n )] E [ σ 2 X n ] σ 2 m n Thus E [ Xn] 2 2mE[Xn X n ] + m 2 E [ Xn 2 ] 2 σ 2 m n
Now calculate
E[X_n X_{n-1}] = E[E[X_n X_{n-1} | X_{n-1}]] = E[X_{n-1} E[X_n | X_{n-1}]] = E[m X_{n-1}²] = m E[X_{n-1}²].
Then E[X_n²] = σ² m^(n-1) + m² E[X_{n-1}²], and so
Var X_n = E[X_n²] - E[X_n]²
        = m² E[X_{n-1}²] + σ² m^(n-1) - m² E[X_{n-1}]²
        = m² Var X_{n-1} + σ² m^(n-1)
        = m⁴ Var X_{n-2} + σ² (m^(n-1) + m^n)
        = ... = m^(2(n-1)) Var X_1 + σ² (m^(n-1) + m^n + ... + m^(2n-3))
        = σ² m^(n-1) (1 + m + ... + m^(n-1)).

To deal with extinction we need to be careful with limits as n → ∞. Let A_n = {X_n = 0}, the event that extinction has occurred by generation n, and let A = ∪_{n=1}^∞ A_n, the event that extinction ever occurs. Can we calculate P(A) from P(A_n)?

More generally, let (A_n) be an increasing sequence of events, A_1 ⊆ A_2 ⊆ ..., and define
A = lim_{n→∞} A_n = ∪_{n=1}^∞ A_n.
Define B_1 = A_1 and, for n ≥ 2,
B_n = A_n ∩ (∪_{i=1}^{n-1} A_i)^c = A_n ∩ A_{n-1}^c.
The events B_n, n ≥ 1, are disjoint, and
∪_{i=1}^∞ B_i = ∪_{i=1}^∞ A_i,   ∪_{i=1}^n B_i = ∪_{i=1}^n A_i = A_n.
Thus
P(∪_{i=1}^∞ A_i) = P(∪_{i=1}^∞ B_i) = Σ_{i=1}^∞ P(B_i)
                 = lim_{n→∞} Σ_{i=1}^n P(B_i) = lim_{n→∞} P(∪_{i=1}^n B_i) = lim_{n→∞} P(A_n),
that is,
P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n):
probability is a continuous set function. Thus
P(extinction ever occurs) = lim_{n→∞} P(A_n) = lim_{n→∞} P(X_n = 0) = q, say.
Note that P(X_n = 0), n = 1, 2, 3, ..., is an increasing sequence, so the limit exists. But P(X_n = 0) = F_n(0), where F_n is the p.g.f. of X_n, so
q = lim_{n→∞} F_n(0).
Also
F(q) = F(lim_{n→∞} F_n(0)) = lim_{n→∞} F(F_n(0)) = lim_{n→∞} F_{n+1}(0) = q,
since F is continuous. Thus F(q) = q; q is called the extinction probability.
Alternative derivation:
q = Σ_k P(X_1 = k) P(extinction | X_1 = k) = Σ_k P(X_1 = k) q^k = F(q).

Theorem 5.9. The probability of extinction, q, is the smallest positive root of the equation F(q) = q. Let m be the mean of the offspring distribution: if m ≤ 1 then q = 1, while if m > 1 then q < 1.

Proof. We have F'(1) = m = Σ_{k=0}^∞ k f_k, and for 0 ≤ z < 1,
F''(z) = Σ_j j(j - 1) f_j z^(j-2) ≥ 0,
so F is increasing and convex on [0, 1]; assuming f_0 + f_1 < 1, it is strictly convex. Also F(0) = f_0 ≥ 0 and F(1) = 1. Thus if m ≤ 1 there does not exist a q ∈ (0, 1) with F(q) = q. If m > 1, let α be the smallest positive root of F(z) = z; then α < 1. Further, since F is increasing,
F(0) ≤ F(α) = α,  F(F(0)) ≤ F(α) = α,  ...,  F_n(0) ≤ α for all n,
so q = lim_{n→∞} F_n(0) ≤ α. Hence q = α, since q is itself a root of F(z) = z and α is the smallest positive one.
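Theorem 5.9, together with q = lim F_n(0), gives a practical recipe for computing the extinction probability: iterate q ← F(q) starting from q = 0. A sketch (the offspring distributions below are illustrative choices, not from the notes; f_0 = 1/4, f_1 = 1/4, f_2 = 1/2 gives m = 5/4 > 1, and F(q) = q then has roots 1/2 and 1):

```python
def extinction_probability(f, iterations=200):
    """Extinction probability of a branching process whose offspring
    distribution is f[k] = f_k.  Iterates q <- F(q) from q = 0, since
    F_n(0) = P(X_n = 0) increases to the smallest positive root of F(q) = q."""
    def F(z):
        return sum(fk * z**k for k, fk in enumerate(f))
    q = 0.0
    for _ in range(iterations):
        q = F(q)
    return q

# Illustrative offspring distributions:
q_super = extinction_probability([0.25, 0.25, 0.5])   # m = 1.25 > 1
q_sub = extinction_probability([0.5, 0.25, 0.25])     # m = 0.75 <= 1
print(q_super)  # near 0.5, the smallest positive root of 1/4 + q/4 + q^2/2 = q
print(q_sub)    # near 1.0, as Theorem 5.9 predicts when m <= 1
```

The iteration converges geometrically (at rate F'(q) < 1 in the supercritical case), so a couple of hundred steps is far more than enough here.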
5.5 Random Walks

Let X_1, X_2, ... be i.i.d.r.v.s, and let
S_n = S_0 + X_1 + X_2 + ... + X_n,
where usually S_0 = 0. Then (S_n, n = 0, 1, 2, ...) is a 1-dimensional random walk. We shall assume
X_n = { +1, with probability p; -1, with probability q = 1 - p. }   (5.2)
This is a simple random walk. If p = q = 1/2 then the random walk is called symmetric.

Example (Gambler's Ruin). You have an initial fortune of A and I have an initial fortune of B. We toss coins repeatedly; I win with probability p and you win with probability q. What is the probability that I bankrupt you before you bankrupt me?
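The question can be probed empirically before it is solved analytically. A simulation sketch with illustrative stakes A = 4, B = 6 and a fair coin: tracking my fortune, the walk starts at z = B = 6 and is absorbed at 0 or a = A + B = 10, and the analysis that follows gives P(hit a before 0) = z/a = 0.6 in the fair case.

```python
import random

random.seed(1)

# Gambler's ruin with illustrative stakes: my fortune starts at z = B = 6,
# absorbing barriers at 0 and a = A + B = 10, fair coin p = 1/2.
def i_bankrupt_you(z, a, p):
    # One play of the game: returns True if the walk hits a before 0.
    while 0 < z < a:
        z += 1 if random.random() < p else -1
    return z == a

trials = 100_000
wins = sum(i_bankrupt_you(6, 10, 0.5) for _ in range(trials))
print(wins / trials)  # for a fair coin the exact answer is z/a = 0.6
```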
Set a = A + B and z = B, and stop the random walk starting at z when it hits 0 or a. Let p_z be the probability that the random walk hits a before it hits 0, starting from z, and let q_z be the probability that it hits 0 before a. After the first step the gambler's fortune is either z - 1 or z + 1, with probabilities q and p respectively, so by the law of total probability
p_z = q p_{z-1} + p p_{z+1},  0 < z < a,
with p_0 = 0 and p_a = 1. Trying p_z = t^z, we must solve p t² - t + q = 0:
t = (1 ± √(1 - 4pq)) / 2p = 1 or q/p.
For p ≠ q the general solution is p_z = A + B (q/p)^z, and the boundary conditions give
p_z = (1 - (q/p)^z) / (1 - (q/p)^a),  p ≠ q.
If p = q, the general solution is A + Bz, giving
p_z = z/a,  p = q.
To calculate q_z, observe that this is the same problem with p, q, z replaced by q, p, a - z respectively. Thus
q_z = ((q/p)^a - (q/p)^z) / ((q/p)^a - 1)  if p ≠ q,
or q_z = (a - z)/a if p = q. Thus q_z + p_z = 1, and so, as we expected, the game ends with probability one.

What happens as a → ∞? A path that hits 0 before it hits a certainly hits 0, so
P(hits 0 ever) = lim_{a→∞} P(hits 0 before a) = lim_{a→∞} q_z = { (q/p)^z, p > q; 1, p ≤ q. }

Let G be the ultimate gain or loss:
G = { a - z, with probability p_z; -z, with probability q_z. }   (5.3)
Then
E[G] = { a p_z - z, if p ≠ q; 0, if p = q. }   (5.4)
A fair game remains fair: if the coin is fair then games based on it have expected reward 0.

Duration of a Game. Let D_z be the expected time until the random walk hits 0 or a, starting from z. Is D_z finite? D_z is bounded above by a times the mean of a geometric random variable (the number of windows of size a before a window containing all +1's or all -1's), hence D_z is finite. Conditioning on the first step,
E[duration] = E[E[duration | first step]]
            = p (E[duration | first step up]) + q (E[duration | first step down])
            = p (1 + D_{z+1}) + q (1 + D_{z-1}),
so
D_z = 1 + p D_{z+1} + q D_{z-1}.
This equation holds for 0 < z < a, with D_0 = D_a = 0. Let's try for a particular solution D_z = Cz:
Cz = C p (z + 1) + C q (z - 1) + 1,  giving C = 1/(q - p) for p ≠ q.
Consider the homogeneous relation p t² - t + q = 0, with roots t_1 = 1 and t_2 = q/p. The general solution for p ≠ q is therefore
D_z = A + B (q/p)^z + z/(q - p).
Substituting z = 0 and z = a to find A and B gives
D_z = z/(q - p) - (a/(q - p)) (1 - (q/p)^z) / (1 - (q/p)^a),  p ≠ q.
If p = q then a particular solution is -z², so the general solution is D_z = -z² + A + Bz, and substituting the boundary conditions gives
D_z = z(a - z),  p = q.

Example. [A table of numerical values of P(ruin), E[gain] and E[duration], for various choices of the initial capital z, the barrier a and the probabilities p and q, appeared here.]

Now stop the random walk when it hits 0 or a: we have absorption at 0 and a. Let
U_{z,n} = P(random walk hits 0 at time n | starts at z).
Then
U_{z,n+1} = p U_{z+1,n} + q U_{z-1,n},  0 < z < a, n ≥ 0,
with
U_{0,n} = U_{a,n} = 0 for n > 0,  U_{0,0} = 1,  U_{z,0} = 0 for 0 < z ≤ a.
Let U_z(s) = Σ_{n=0}^∞ U_{z,n} s^n. Multiplying the recurrence by s^(n+1) and adding for n = 0, 1, 2, ... gives
U_z(s) = p s U_{z+1}(s) + q s U_{z-1}(s),
where U_0(s) = 1 and U_a(s) = 0. Look for a solution of the form U_z(s) = (λ(s))^z.
Then λ(s) must satisfy
λ(s) = p s (λ(s))² + q s.
This quadratic has two roots,
λ_1(s), λ_2(s) = (1 ± √(1 - 4pqs²)) / 2ps,
and every solution is of the form
U_z(s) = A(s) (λ_1(s))^z + B(s) (λ_2(s))^z.
Substituting U_0(s) = 1 and U_a(s) = 0 gives A(s) + B(s) = 1 and A(s)(λ_1(s))^a + B(s)(λ_2(s))^a = 0, so
U_z(s) = ((λ_1(s))^a (λ_2(s))^z - (λ_1(s))^z (λ_2(s))^a) / ((λ_1(s))^a - (λ_2(s))^a).
But λ_1(s) λ_2(s) = q/p (recall the product of the roots of the quadratic), so
U_z(s) = (q/p)^z ((λ_1(s))^(a-z) - (λ_2(s))^(a-z)) / ((λ_1(s))^a - (λ_2(s))^a).
The same method gives the generating function for the absorption probabilities at the other barrier. The generating function for the duration of the game is the sum of these two generating functions.
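The closed forms for p_z and D_z, and the absorption-time recurrence for U_{z,n}, can all be cross-checked numerically. A sketch (the parameter values are illustrative):

```python
def p_hit_a(z, a, p):
    # P(walk started at z hits a before 0): (1 - (q/p)^z)/(1 - (q/p)^a), or z/a if p = q.
    q = 1 - p
    return z / a if p == q else (1 - (q/p)**z) / (1 - (q/p)**a)

def duration(z, a, p):
    # Expected time to absorption D_z; D_z = z(a - z) when p = q.
    q = 1 - p
    if p == q:
        return z * (a - z)
    return z/(q - p) - (a/(q - p)) * (1 - (q/p)**z) / (1 - (q/p)**a)

a, p = 10, 0.3          # illustrative barrier and win probability
q = 1 - p

# The closed forms satisfy the recurrences derived in the text.
for z in range(1, a):
    assert abs(p_hit_a(z, a, p) - (q*p_hit_a(z - 1, a, p) + p*p_hit_a(z + 1, a, p))) < 1e-9
    assert abs(duration(z, a, p) - (1 + p*duration(z + 1, a, p) + q*duration(z - 1, a, p))) < 1e-9

# Absorption-time probabilities U_{z,n} = P(first hit 0 at time n | start at z):
# U_{z,n+1} = p U_{z+1,n} + q U_{z-1,n}, with U_{0,0} = 1 the only mass at the barriers.
nmax = 400
U = [[0.0] * (nmax + 1) for _ in range(a + 1)]
U[0][0] = 1.0
for n in range(nmax):
    for z in range(1, a):
        U[z][n + 1] = p * U[z + 1][n] + q * U[z - 1][n]

total = sum(U[4][n] for n in range(nmax + 1))
print(total)  # should be near q_4 = 1 - p_hit_a(4, a, p)
```

Summing U_{z,n} over n recovers the ruin probability q_z, which is the generating-function identity U_z(1) = q_z in numerical form.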
Chapter 6: Continuous Random Variables

In this chapter we drop the assumption that Ω is finite or countable. Assume we are given a probability P on some subset of Ω. For example, spin a pointer and let ω ∈ Ω give the position at which it stops, with Ω = {ω : 0 ≤ ω ≤ 2π}. Let
P(ω ∈ [0, θ]) = θ/2π  (0 ≤ θ ≤ 2π).

Definition 6.1. A continuous random variable X is a function X : Ω → R for which
P(a ≤ X(ω) ≤ b) = ∫_a^b f(x) dx,
where f(x) is a function satisfying
1. f(x) ≥ 0,
2. ∫_{-∞}^{+∞} f(x) dx = 1.
The function f is called the probability density function (p.d.f.).

For example, if X(ω) = ω, the position of the pointer, then X is a continuous random variable with p.d.f.
f(x) = { 1/2π, 0 ≤ x ≤ 2π; 0, otherwise. }   (6.1)
This is an example of a uniformly distributed random variable, on the interval [0, 2π] in this case.

Intuition about probability density functions is based on the approximate relation
P(X ∈ [x, x + δx]) ≈ f(x) δx.
Proofs, however, more often use the distribution function
F(x) = P(X ≤ x),
which is increasing in x. If X is a continuous random variable then
F(x) = ∫_{-∞}^x f(z) dz,
and so F is continuous and differentiable, with
F'(x) = f(x)
(at any point x where the fundamental theorem of calculus applies). The distribution function is also defined for a discrete random variable,
F(x) = Σ_{ω : X(ω) ≤ x} p_ω,
in which case F is a step function. In either case
P(a < X ≤ b) = P(X ≤ b) - P(X ≤ a) = F(b) - F(a).
Example (the exponential distribution). Let
F(x) = { 1 - e^(-λx), 0 ≤ x < ∞; 0, x < 0. }   (6.2)
The corresponding p.d.f. is
f(x) = λ e^(-λx),  0 ≤ x < ∞;
this is known as the exponential distribution with parameter λ. If X has this distribution then
P(X > x + z | X > z) = P(X > x + z)/P(X > z) = e^(-λ(x+z))/e^(-λz) = e^(-λx) = P(X > x).
This is known as the memoryless property of the exponential distribution.

Theorem 6.1. If X is a continuous random variable with p.d.f. f(x), and h(x) is a continuous, strictly increasing function with h^(-1)(x) differentiable, then h(X) is a continuous random variable with p.d.f.
f_h(x) = f(h^(-1)(x)) d/dx h^(-1)(x).

Proof. The distribution function of h(X) is
P(h(X) ≤ x) = P(X ≤ h^(-1)(x)) = F(h^(-1)(x)),
since h is strictly increasing and F is the distribution function of X. Then
d/dx P(h(X) ≤ x) = f(h^(-1)(x)) d/dx h^(-1)(x),
so h(X) is a continuous random variable with p.d.f. f_h as claimed. Note: it is usually easier to repeat this proof than to remember the result.
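Both the exponential distribution and its memoryless property are easy to probe by simulation. The sketch below samples X = -log(1 - U)/λ with U ~ U[0, 1]: since P(X ≤ x) = P(U ≤ 1 - e^(-λx)) = 1 - e^(-λx), this produces exactly the exponential distribution (this is the inverse-transform idea of Theorem 6.2 below). The parameter values are illustrative.

```python
import math
import random

random.seed(2)

# Sample the exponential distribution with parameter lam via X = -log(1 - U)/lam,
# U uniform on [0, 1) (inverse-transform sampling; lam = 2.0 is illustrative).
lam = 2.0
n = 200_000
xs = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

mean = sum(xs) / n
print(mean)  # should be near E[X] = 1/lam = 0.5

# Memoryless property: P(X > x + z | X > z) should match P(X > x) = e^(-lam*x).
x, z = 0.3, 0.5
tail_z = [v for v in xs if v > z]
cond = sum(1 for v in tail_z if v > x + z) / len(tail_z)
print(cond, math.exp(-lam * x))  # the two values should be close
```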
Example. Suppose X ~ U[0, 1], that is, X is uniformly distributed on [0, 1], and consider Y = -log X. Then
P(Y ≤ y) = P(-log X ≤ y) = P(X ≥ e^(-y)) = ∫_{e^(-y)}^1 dx = 1 - e^(-y).
Thus Y is exponentially distributed (with parameter 1). More generally:

Theorem 6.2. Let U ~ U[0, 1]. For any continuous distribution function F, the random variable X defined by X = F^(-1)(U) has distribution function F.

Proof. P(X ≤ x) = P(F^(-1)(U) ≤ x) = P(U ≤ F(x)) = F(x), since U ~ U[0, 1].

Remark. 1. This is a bit more messy for discrete random variables: if P(X = x_i) = p_i, i = 0, 1, ..., take U ~ U[0, 1] and let X = x_j if
Σ_{i=0}^{j-1} p_i ≤ U < Σ_{i=0}^{j} p_i.
2. This is useful for simulations.

6.1 Jointly Distributed Random Variables

For two random variables X and Y, the joint distribution function is
F(x, y) = P(X ≤ x, Y ≤ y),  F : R² → [0, 1].
Let
F_X(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F(x, ∞) = lim_{y→∞} F(x, y).
This is called the marginal distribution of X. Similarly F_Y(y) = F(∞, y).

X_1, X_2, ..., X_n are jointly distributed continuous random variables if, for any set C ⊆ R^n,
P((X_1, X_2, ..., X_n) ∈ C) = ∫...∫_{(x_1,...,x_n) ∈ C} f(x_1, ..., x_n) dx_1 ... dx_n
for some function f, called the joint probability density function, satisfying the obvious conditions:
1. f(x_1, ..., x_n) ≥ 0,
2. ∫...∫_{R^n} f(x_1, ..., x_n) dx_1 ... dx_n = 1.

Example (n = 2).
F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{-∞}^x ∫_{-∞}^y f(u, v) dv du,
and so
f(x, y) = ∂²F(x, y)/∂x∂y,
provided this is defined at (x, y).

Theorem 6.3. If X and Y are jointly continuous random variables then they are individually continuous.

Proof. Since X and Y are jointly continuous random variables,
P(X ∈ A) = P(X ∈ A, Y ∈ (-∞, +∞)) = ∫_A [∫_{-∞}^∞ f(x, y) dy] dx = ∫_A f_X(x) dx,
where f_X(x) = ∫_{-∞}^∞ f(x, y) dy is the p.d.f. of X.

Jointly continuous random variables X and Y are independent if
f(x, y) = f_X(x) f_Y(y);
then
P(X ∈ A, Y ∈ B) = ∫_A ∫_B f(x, y) dy dx = P(X ∈ A) P(Y ∈ B).
Similarly, jointly continuous random variables X_1, ..., X_n are independent if
f(x_1, ..., x_n) = Π_{i=1}^n f_{X_i}(x_i),
where the f_{X_i}(x_i) are the p.d.f.s of the individual random variables.

Example. Two points X and Y are tossed at random and independently onto a line segment of length L. What is the probability that |X - Y| ≤ l? Suppose that "at random" means uniformly, so that
f(x, y) = 1/L²,  (x, y) ∈ [0, L]².
The desired probability is
∫∫_A f(x, y) dx dy = (area of A)/L² = (L² - (L - l)²)/L² = (2Ll - l²)/L²,
where A = {(x, y) : |x - y| ≤ l}.

Example (Buffon's Needle Problem). A needle of length l is tossed at random onto a floor marked with parallel lines a distance L apart, where l ≤ L. What is the probability that the needle intersects one of the parallel lines? Let Θ ∈ [0, π) be the angle between the needle and the parallel lines, and let X be the distance from the bottom of the needle to the line closest to it. It is reasonable to suppose that
X ~ U[0, L],  Θ ~ U[0, π),
and that X and Θ are independent. Thus
f(x, θ) = 1/Lπ,  0 ≤ x ≤ L and 0 ≤ θ < π.
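A Monte Carlo sketch of Buffon's experiment under this joint density (with illustrative l = 1, L = 2): the needle meets a line exactly when X ≤ l sin Θ, and integrating the density over that region gives probability 2l/(πL) = 1/π ≈ 0.3183.

```python
import math
import random

random.seed(3)

# Buffon's needle with illustrative l = 1, L = 2: X ~ U[0, L], Theta ~ U[0, pi),
# and the needle crosses a line iff X <= l*sin(Theta). Exact answer: 2l/(pi*L) = 1/pi.
l, L, trials = 1.0, 2.0, 300_000
hits = sum(
    1 for _ in range(trials)
    if random.uniform(0.0, L) <= l * math.sin(random.uniform(0.0, math.pi))
)
estimate = hits / trials
print(estimate, 1 / math.pi)  # the two values should be close
```

Run in reverse, this experiment is a (slow) way of estimating π, which is how Buffon's problem is usually popularized.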
The needle intersects a line if and only if X ≤ l sin Θ, so the desired probability is
∫∫_A f(x, θ) dx dθ = ∫_0^π (l sin θ)/(Lπ) dθ = 2l/(πL),
where A = {(x, θ) : x ≤ l sin θ}.

Definition 6.2. The expectation or mean of a continuous random variable X is
E[X] = ∫_{-∞}^∞ x f(x) dx,
provided not both of ∫_{-∞}^0 x f(x) dx and ∫_0^∞ x f(x) dx are infinite.

Example (Normal distribution). Let
f(x) = (1/(√(2π) σ)) e^(-(x - µ)²/2σ²),  -∞ < x < ∞.
This is non-negative; for it to be a p.d.f. we also need to check that ∫_{-∞}^∞ f(x) dx = 1. Making the substitution z = (x - µ)/σ,
I = (1/(√(2π) σ)) ∫_{-∞}^∞ e^(-(x-µ)²/2σ²) dx = (1/√(2π)) ∫_{-∞}^∞ e^(-z²/2) dz.
Thus
2π I² = [∫_{-∞}^∞ e^(-x²/2) dx] [∫_{-∞}^∞ e^(-y²/2) dy]
      = ∫_{-∞}^∞ ∫_{-∞}^∞ e^(-(x² + y²)/2) dx dy
      = ∫_0^{2π} ∫_0^∞ r e^(-r²/2) dr dθ = 2π.
Therefore I = 1. A random variable with the p.d.f. f(x) given above has a normal distribution with parameters µ and σ²; we write this as X ~ N[µ, σ²]. The expectation is
E[X] = (1/(√(2π) σ)) ∫_{-∞}^∞ x e^(-(x-µ)²/2σ²) dx
     = (1/(√(2π) σ)) ∫_{-∞}^∞ (x - µ) e^(-(x-µ)²/2σ²) dx + (1/(√(2π) σ)) ∫_{-∞}^∞ µ e^(-(x-µ)²/2σ²) dx.
The first term is convergent and equals zero by symmetry, so that E[X] = 0 + µ = µ.

Theorem 6.4. If X is a continuous random variable then
E[X] = ∫_0^∞ P(X ≥ x) dx - ∫_0^∞ P(X ≤ -x) dx.

Proof.
∫_0^∞ P(X ≥ x) dx = ∫_0^∞ [∫_x^∞ f(y) dy] dx
                  = ∫_0^∞ ∫_0^∞ I[y ≥ x] f(y) dy dx
                  = ∫_0^∞ [∫_0^y dx] f(y) dy
                  = ∫_0^∞ y f(y) dy.
Similarly ∫_0^∞ P(X ≤ -x) dx = -∫_{-∞}^0 y f(y) dy, and the result follows.

Note: this holds for discrete random variables as well, and is useful as a general way of finding the expectation whether the random variable is discrete or continuous. If X takes values in the set {0, 1, 2, ...}, the theorem states that E[X] = Σ_{n=1}^∞ P(X ≥ n), and a direct proof follows:
Σ_{n=1}^∞ P(X ≥ n) = Σ_{n=1}^∞ Σ_{m=0}^∞ I[m ≥ n] P(X = m)
                   = Σ_{m=0}^∞ (Σ_{n=1}^∞ I[m ≥ n]) P(X = m)
                   = Σ_{m=0}^∞ m P(X = m) = E[X].

Theorem 6.5. Let X be a continuous random variable with p.d.f. f(x), and let h(x) be a continuous real-valued function. Then, provided ∫_{-∞}^∞ |h(x)| f(x) dx < ∞,
E[h(X)] = ∫_{-∞}^∞ h(x) f(x) dx.
Proof.
∫_0^∞ P(h(X) ≥ y) dy = ∫_0^∞ [∫_{x : h(x) ≥ y} f(x) dx] dy
                     = ∫_0^∞ ∫_{x : h(x) ≥ 0} I[h(x) ≥ y] f(x) dx dy
                     = ∫_{x : h(x) ≥ 0} [∫_0^{h(x)} dy] f(x) dx
                     = ∫_{x : h(x) ≥ 0} h(x) f(x) dx.
Similarly ∫_0^∞ P(h(X) ≤ -y) dy = -∫_{x : h(x) < 0} h(x) f(x) dx, so the result follows from the last theorem.

Definition 6.3. The variance of a continuous random variable X is
Var X = E[(X - E[X])²].
Note: the properties of expectation and variance are the same for discrete and continuous random variables; just replace Σ by ∫ in the proofs.

Example.
Var X = E[X²] - E[X]² = ∫_{-∞}^∞ x² f(x) dx - (∫_{-∞}^∞ x f(x) dx)².

Example. Suppose X ~ N[µ, σ²], and let Z = (X - µ)/σ. Then
P(Z ≤ z) = P((X - µ)/σ ≤ z) = P(X ≤ µ + σz)
         = (1/(√(2π) σ)) ∫_{-∞}^{µ+σz} e^(-(x-µ)²/2σ²) dx
         = (1/√(2π)) ∫_{-∞}^z e^(-u²/2) du   (substituting u = (x - µ)/σ)
         = Φ(z),
the distribution function of an N(0, 1) random variable. So Z ~ N(0, 1).
What is the variance of Z?
Var Z = E[Z²] - E[Z]²
      = (1/√(2π)) ∫_{-∞}^∞ z² e^(-z²/2) dz   (the last term, E[Z]², is zero)
      = (1/√(2π)) [-z e^(-z²/2)]_{-∞}^∞ + (1/√(2π)) ∫_{-∞}^∞ e^(-z²/2) dz
      = 0 + 1 = 1.
What about the variance of X? Since X = µ + σZ, we have E[X] = µ (which we knew already) and
Var X = σ² Var Z = σ².
Thus X ~ N(µ, σ²) has mean µ and variance σ².

6.2 Transformation of Random Variables

Suppose X_1, X_2, ..., X_n have joint p.d.f. f(x_1, ..., x_n). Let
Y_1 = r_1(X_1, X_2, ..., X_n),
Y_2 = r_2(X_1, X_2, ..., X_n),
...
Y_n = r_n(X_1, X_2, ..., X_n).
Let R ⊆ R^n be such that P((X_1, X_2, ..., X_n) ∈ R) = 1, and let S be the image of R under the above transformation. Suppose the transformation from R to S is 1-1 (bijective).
Then there are inverse functions
x_1 = s_1(y_1, y_2, ..., y_n),  x_2 = s_2(y_1, y_2, ..., y_n),  ...,  x_n = s_n(y_1, y_2, ..., y_n).
Assume that ∂s_i/∂y_j exists and is continuous at every point (y_1, y_2, ..., y_n) in S, and let J be the Jacobian determinant
J = det( ∂s_1/∂y_1 ... ∂s_1/∂y_n ; ... ; ∂s_n/∂y_1 ... ∂s_n/∂y_n ).   (6.3)
If A ⊆ R, then
P((X_1, ..., X_n) ∈ A) = ∫...∫_A f(x_1, ..., x_n) dx_1 ... dx_n   [1]
                       = ∫...∫_B f(s_1, ..., s_n) |J| dy_1 ... dy_n,   [2]
where B is the image of A. Since the transformation is 1-1, [2] is P((Y_1, ..., Y_n) ∈ B). Thus the density for Y_1, ..., Y_n is
g(y_1, y_2, ..., y_n) = { f(s_1(y_1, ..., y_n), ..., s_n(y_1, ..., y_n)) |J|,  (y_1, ..., y_n) ∈ S;  0, otherwise. }

Example (density of products and quotients). Suppose that (X, Y) has density
f(x, y) = { 4xy, for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0, otherwise, }   (6.4)
and let U = X/Y and V = XY.
Then X = √(UV) and Y = √(V/U), with
∂x/∂u = (1/2)√(v/u),  ∂x/∂v = (1/2)√(u/v),  ∂y/∂u = -(1/2)√(v/u³),  ∂y/∂v = (1/2)(1/√(uv)).
Therefore |J| = 1/2u, and so
g(u, v) = (1/2u)(4xy) = (1/2u) · 4 √(uv) √(v/u) = (2v/u) I[(u, v) ∈ D],
where D is the image of the unit square under the transformation. Note that U and V are NOT independent:
g(u, v) = (2v/u) I[(u, v) ∈ D]
is not a product of a function of u alone and a function of v alone, because of the indicator of D.

When the transformations are linear, things are simpler still. Let A be an n × n invertible matrix, and let
(Y_1, ..., Y_n)^T = A (X_1, ..., X_n)^T.
Then |J| = |det A^(-1)| = 1/|det A|, and the p.d.f. of (Y_1, ..., Y_n) is
g(y_1, ..., y_n) = f(A^(-1) y) / |det A|.

Example. Suppose X_1, X_2 have the p.d.f. f(x_1, x_2); calculate the p.d.f. of X_1 + X_2. Let Y = X_1 + X_2 and Z = X_2, so that X_1 = Y - Z and X_2 = Z. Then
A = ( 1 1 ; 0 1 ),   (6.5)
det A = 1, and
g(y, z) = f(x_1, x_2) = f(y - z, z),
the joint density of Y and Z. The marginal density of Y is
g(y) = ∫_{-∞}^∞ f(y - z, z) dz = ∫_{-∞}^∞ f(z, y - z) dz,
by a change of variable. If X_1 and X_2 are independent, with p.d.f.s f_1 and f_2, then f(x_1, x_2) = f_1(x_1) f_2(x_2) and
g(y) = ∫ f_1(y - z) f_2(z) dz
— the convolution of f_1 and f_2.

For a p.d.f. f(x), x̂ is a mode if f(x̂) ≥ f(x) for all x, and x̂ is a median if
∫_{-∞}^{x̂} f(x) dx = 1/2 = ∫_{x̂}^∞ f(x) dx.
For a discrete random variable, x̂ is a median if
P(X ≤ x̂) ≥ 1/2 and P(X ≥ x̂) ≥ 1/2.

If X_1, ..., X_n is a sample from the distribution then recall that the sample mean is
(1/n) Σ_{i=1}^n X_i.
Let Y_1, ..., Y_n (the order statistics) be the values of X_1, ..., X_n arranged in increasing order. Then the sample median is Y_{(n+1)/2} if n is odd, or any value in [Y_{n/2}, Y_{n/2+1}] if n is even.

If Y_n = max{X_1, ..., X_n} and X_1, ..., X_n are i.i.d.r.v.s with distribution function F and density f, then
P(Y_n ≤ y) = P(X_1 ≤ y, ..., X_n ≤ y) = (F(y))^n.
Thus the density of Y_n is
g(y) = d/dy (F(y))^n = n (F(y))^(n-1) f(y).
Similarly, if Y_1 = min{X_1, ..., X_n}, then
P(Y_1 > y) = P(X_1 > y, ..., X_n > y) = (1 - F(y))^n,
and the density of Y_1 is
g(y) = n (1 - F(y))^(n-1) f(y).
What about the joint density of Y_1 and Y_n?
G(y_1, y_n) = P(Y_1 ≤ y_1, Y_n ≤ y_n) = P(Y_n ≤ y_n) - P(Y_n ≤ y_n, Y_1 > y_1)
            = P(Y_n ≤ y_n) - P(y_1 < X_1 ≤ y_n, y_1 < X_2 ≤ y_n, ..., y_1 < X_n ≤ y_n)
            = (F(y_n))^n - (F(y_n) - F(y_1))^n.
Thus the joint p.d.f. of Y_1 and Y_n is
g(y_1, y_n) = ∂²G(y_1, y_n)/∂y_1∂y_n = { n(n - 1)(F(y_n) - F(y_1))^(n-2) f(y_1) f(y_n),  y_1 ≤ y_n;  0, otherwise. }

What happens if the mapping is not 1-1? Say X has density f and we want the density g of |X|. Then
P(|X| ∈ (a, b)) = ∫_a^b (f(x) + f(-x)) dx,  0 ≤ a < b,
so g(x) = f(x) + f(-x).

Suppose X_1, ..., X_n are i.i.d.r.v.s. What is the p.d.f. of the order statistics Y_1, ..., Y_n?
g(y_1, ..., y_n) = { n! f(y_1) ... f(y_n),  y_1 ≤ y_2 ≤ ... ≤ y_n;  0, otherwise. }   (6.6)

Example. Suppose X_1, ..., X_n are i.i.d.r.v.s exponentially distributed with parameter λ. Let
Z_1 = Y_1,  Z_2 = Y_2 - Y_1,  ...,  Z_n = Y_n - Y_{n-1},
where Y_1, ..., Y_n are the order statistics of X_1, ..., X_n. What is the distribution of the Z's?
Here Z = AY, where A is the matrix with 1's on the diagonal, -1's just below it and 0's elsewhere, so det A = 1. Hence
h(z_1, ..., z_n) = g(y_1, ..., y_n) = n! f(y_1) ... f(y_n)
                 = n! λ^n e^(-λy_1) ... e^(-λy_n)
                 = n! λ^n e^(-λ(y_1 + ... + y_n))
                 = n! λ^n e^(-λ(n z_1 + (n-1) z_2 + ... + z_n))
                 = Π_{i=1}^n (λi) e^(-λi z_{n+1-i}).   (6.7)
Thus h(z_1, ..., z_n) is expressed as the product of n density functions, with Z_{n+1-i} ~ exp(λi), i.e. Z_{n+1-i} exponentially distributed with parameter λi, and Z_1, ..., Z_n independent.

Example. Let X and Y be independent N(0, 1) random variables. Let D = R² = X² + Y² and tan Θ = Y/X, so that d = x² + y² and θ = arctan(y/x). Then
J = det( 2x  2y ; (-y/x²)/(1 + (y/x)²)  (1/x)/(1 + (y/x)²) ) = 2.   (6.8)
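The factorization (6.7) can be checked by simulation: the smallest of n exponentials is Z_1 ~ exp(λn), with mean 1/(nλ), and the gap to the second smallest is Z_2 ~ exp(λ(n-1)), with mean 1/((n-1)λ). A sketch with illustrative n = 5, λ = 1:

```python
import math
import random

random.seed(4)

# Order statistics of n i.i.d. exponentials (illustrative n = 5, lam = 1):
# the spacings Z_1 = Y_1 and Z_2 = Y_2 - Y_1 should be exponential with
# parameters n*lam and (n-1)*lam, i.e. means 1/(n*lam) = 0.2 and 1/((n-1)*lam) = 0.25.
n, lam, trials = 5, 1.0, 100_000
z1_total = z2_total = 0.0
for _ in range(trials):
    ys = sorted(-math.log(1.0 - random.random()) / lam for _ in range(n))
    z1_total += ys[0]            # Z_1 = Y_1
    z2_total += ys[1] - ys[0]    # Z_2 = Y_2 - Y_1

print(z1_total / trials)  # near 0.2
print(z2_total / trials)  # near 0.25
```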
Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015
Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More informationPart IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015
Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More information3. DISCRETE RANDOM VARIABLES
IA Probability Lent Term 3 DISCRETE RANDOM VARIABLES 31 Introduction When an experiment is conducted there may be a number of quantities associated with the outcome ω Ω that may be of interest Suppose
More information1 Presessional Probability
1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional
More informationProbability Models. 4. What is the definition of the expectation of a discrete random variable?
1 Probability Models The list of questions below is provided in order to help you to prepare for the test and exam. It reflects only the theoretical part of the course. You should expect the questions
More informationProbability. J.R. Norris. December 13, 2017
Probability J.R. Norris December 13, 2017 1 Contents 1 Mathematical models for randomness 6 1.1 A general definition............................... 6 1.2 Equally likely outcomes.............................
More informationEE514A Information Theory I Fall 2013
EE514A Information Theory I Fall 2013 K. Mohan, Prof. J. Bilmes University of Washington, Seattle Department of Electrical Engineering Fall Quarter, 2013 http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/
More informationCourse: ESO-209 Home Work: 1 Instructor: Debasis Kundu
Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear
More informationSUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)
SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416) D. ARAPURA This is a summary of the essential material covered so far. The final will be cumulative. I ve also included some review problems
More information2 (Statistics) Random variables
2 (Statistics) Random variables References: DeGroot and Schervish, chapters 3, 4 and 5; Stirzaker, chapters 4, 5 and 6 We will now study the main tools use for modeling experiments with unknown outcomes
More informationLecture 11. Probability Theory: an Overveiw
Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the
More information4. CONTINUOUS RANDOM VARIABLES
IA Probability Lent Term 4 CONTINUOUS RANDOM VARIABLES 4 Introduction Up to now we have restricted consideration to sample spaces Ω which are finite, or countable; we will now relax that assumption We
More informationStatistics 100A Homework 5 Solutions
Chapter 5 Statistics 1A Homework 5 Solutions Ryan Rosario 1. Let X be a random variable with probability density function a What is the value of c? fx { c1 x 1 < x < 1 otherwise We know that for fx to
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationChp 4. Expectation and Variance
Chp 4. Expectation and Variance 1 Expectation In this chapter, we will introduce two objectives to directly reflect the properties of a random variable or vector, which are the Expectation and Variance.
More informationRandom Models. Tusheng Zhang. February 14, 2013
Random Models Tusheng Zhang February 14, 013 1 Introduction In this module, we will introduce some random models which have many real life applications. The course consists of four parts. 1. A brief review
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More information4 Branching Processes
4 Branching Processes Organise by generations: Discrete time. If P(no offspring) 0 there is a probability that the process will die out. Let X = number of offspring of an individual p(x) = P(X = x) = offspring
More informationPROBABILITY VITTORIA SILVESTRI
PROBABILITY VITTORIA SILVESTRI Contents Preface 2 1. Introduction 3 2. Combinatorial analysis 6 3. Stirling s formula 9 4. Properties of Probability measures 12 5. Independence 17 6. Conditional probability
More informationMATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation)
MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation) Last modified: March 7, 2009 Reference: PRP, Sections 3.6 and 3.7. 1. Tail-Sum Theorem
More informationWeek 12-13: Discrete Probability
Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible
More informationTopic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.
Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick
More informationPCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities
PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets
More informationReview of Probability Theory
Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving
More informationMultivariate distributions
CHAPTER Multivariate distributions.. Introduction We want to discuss collections of random variables (X, X,..., X n ), which are known as random vectors. In the discrete case, we can define the density
More informationTHE QUEEN S UNIVERSITY OF BELFAST
THE QUEEN S UNIVERSITY OF BELFAST 0SOR20 Level 2 Examination Statistics and Operational Research 20 Probability and Distribution Theory Wednesday 4 August 2002 2.30 pm 5.30 pm Examiners { Professor R M
More informationProbability: Handout
Probability: Handout Klaus Pötzelberger Vienna University of Economics and Business Institute for Statistics and Mathematics E-mail: Klaus.Poetzelberger@wu.ac.at Contents 1 Axioms of Probability 3 1.1
More informationMAS113 Introduction to Probability and Statistics. Proofs of theorems
MAS113 Introduction to Probability and Statistics Proofs of theorems Theorem 1 De Morgan s Laws) See MAS110 Theorem 2 M1 By definition, B and A \ B are disjoint, and their union is A So, because m is a
More informationADVANCED PROBABILITY: SOLUTIONS TO SHEET 1
ADVANCED PROBABILITY: SOLUTIONS TO SHEET 1 Last compiled: November 6, 213 1. Conditional expectation Exercise 1.1. To start with, note that P(X Y = P( c R : X > c, Y c or X c, Y > c = P( c Q : X > c, Y
More informationRecitation 2: Probability
Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions
More information2.1 Elementary probability; random sampling
Chapter 2 Probability Theory Chapter 2 outlines the probability theory necessary to understand this text. It is meant as a refresher for students who need review and as a reference for concepts and theorems
More information2. AXIOMATIC PROBABILITY
IA Probability Lent Term 2. AXIOMATIC PROBABILITY 2. The axioms The formulation for classical probability in which all outcomes or points in the sample space are equally likely is too restrictive to develop
More informationActuarial Science Exam 1/P
Actuarial Science Exam /P Ville A. Satopää December 5, 2009 Contents Review of Algebra and Calculus 2 2 Basic Probability Concepts 3 3 Conditional Probability and Independence 4 4 Combinatorial Principles,
More informationBASICS OF PROBABILITY
October 10, 2018 BASICS OF PROBABILITY Randomness, sample space and probability Probability is concerned with random experiments. That is, an experiment, the outcome of which cannot be predicted with certainty,
More informationMA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems
MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions
More informationSummary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016
8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying
More informationChapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University
Chapter 3, 4 Random Variables ENCS6161 - Probability and Stochastic Processes Concordia University ENCS6161 p.1/47 The Notion of a Random Variable A random variable X is a function that assigns a real
More informationMAS113 Introduction to Probability and Statistics. Proofs of theorems
MAS113 Introduction to Probability and Statistics Proofs of theorems Theorem 1 De Morgan s Laws) See MAS110 Theorem 2 M1 By definition, B and A \ B are disjoint, and their union is A So, because m is a
More informationLecture 4: Probability and Discrete Random Variables
Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 4: Probability and Discrete Random Variables Wednesday, January 21, 2009 Lecturer: Atri Rudra Scribe: Anonymous 1
More information1: PROBABILITY REVIEW
1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following
More informationSample Spaces, Random Variables
Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted
More informationProbability and Distributions
Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated
More informationRandom Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R
In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample
More informationCME 106: Review Probability theory
: Probability theory Sven Schmit April 3, 2015 1 Overview In the first half of the course, we covered topics from probability theory. The difference between statistics and probability theory is the following:
More informationProbability Theory and Statistics. Peter Jochumzen