Part IA Probability
Theorems
Based on lectures by R. Weber
Notes taken by Dexter Chua
Lent 2015

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Basic concepts
Classical probability, equally likely outcomes. Combinatorial analysis, permutations and combinations. Stirling's formula (asymptotics for $\log n!$ proved). [3]

Axiomatic approach
Axioms (countable case). Probability spaces. Inclusion-exclusion formula. Continuity and subadditivity of probability measures. Independence. Binomial, Poisson and geometric distributions. Relation between Poisson and binomial distributions. Conditional probability, Bayes's formula. Examples, including Simpson's paradox. [5]

Discrete random variables
Expectation. Functions of a random variable, indicator function, variance, standard deviation. Covariance, independence of random variables. Generating functions: sums of independent random variables, random sum formula, moments. Conditional expectation. Random walks: gambler's ruin, recurrence relations. Difference equations and their solution. Mean time to absorption. Branching processes: generating functions and extinction probability. Combinatorial applications of generating functions. [7]

Continuous random variables
Distributions and density functions. Expectations; expectation of a function of a random variable. Uniform, normal and exponential random variables. Memoryless property of exponential distribution. Joint distributions: transformation of random variables (including Jacobians), examples. Simulation: generating continuous random variables, independent normal random variables. Geometrical probability: Bertrand's paradox, Buffon's needle. Correlation coefficient, bivariate normal random variables. [6]

Inequalities and limits
Markov's inequality, Chebyshev's inequality. Weak law of large numbers. Convexity: Jensen's inequality for general random variables, AM/GM inequality. Moment generating functions and statement (no proof) of continuity theorem. Statement of central limit theorem and sketch of proof. Examples, including sampling. [3]
Contents

0 Introduction
1 Classical probability
  1.1 Classical probability
  1.2 Counting
  1.3 Stirling's formula
2 Axioms of probability
  2.1 Axioms and definitions
  2.2 Inequalities and formulae
  2.3 Independence
  2.4 Important discrete distributions
  2.5 Conditional probability
3 Discrete random variables
  3.1 Discrete random variables
  3.2 Inequalities
  3.3 Weak law of large numbers
  3.4 Multiple random variables
  3.5 Probability generating functions
4 Interesting problems
  4.1 Branching processes
  4.2 Random walk and gambler's ruin
5 Continuous random variables
  5.1 Continuous random variables
  5.2 Stochastic ordering and inspection paradox
  5.3 Jointly distributed random variables
  5.4 Geometric probability
  5.5 The normal distribution
  5.6 Transformation of random variables
  5.7 Moment generating functions
6 More distributions
  6.1 Cauchy distribution
  6.2 Gamma distribution
  6.3 Beta distribution*
  6.4 More on the normal distribution
  6.5 Multivariate normal
7 Central limit theorem
8 Summary of distributions
  8.1 Discrete distributions
  8.2 Continuous distributions
0 Introduction
1 Classical probability

1.1 Classical probability

1.2 Counting

Theorem (Fundamental rule of counting). Suppose we have to make $r$ multiple choices in sequence. There are $m_1$ possibilities for the first choice, $m_2$ possibilities for the second, etc. Then the total number of choices is $m_1 m_2 \cdots m_r$.

1.3 Stirling's formula

Proposition. $\log n! \sim n \log n$.

Theorem (Stirling's formula). As $n \to \infty$,
\[ \log \frac{n!\, e^n}{n^{n + 1/2}} = \log \sqrt{2\pi} + O\left(\frac{1}{n}\right). \]

Corollary.
\[ n! \sim \sqrt{2\pi}\, n^{n + 1/2} e^{-n}. \]

Proposition (non-examinable). We use the $\frac{1}{12n}$ term from the proof above to get a better approximation:
\[ \sqrt{2\pi}\, n^{n + 1/2} e^{-n + \frac{1}{12n + 1}} \le n! \le \sqrt{2\pi}\, n^{n + 1/2} e^{-n + \frac{1}{12n}}. \]
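The asymptotic is easy to check numerically. Below is a minimal Python sketch (not part of the lectures; the test values of $n$ are arbitrary) comparing $\log n!$ with the Stirling approximation; it works in logs to avoid overflow, and the ratio tends to 1.

```python
import math

# Compare log n! with the Stirling approximation; work in logs to avoid overflow.
for n in [1, 5, 10, 50, 100]:
    log_fact = math.lgamma(n + 1)  # lgamma(n + 1) = log(n!)
    log_stirling = 0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n
    ratio = math.exp(log_fact - log_stirling)  # n! / approximation, tends to 1
    print(f"n = {n:3d}: log n! = {log_fact:10.4f}, Stirling = {log_stirling:10.4f}, ratio = {ratio:.6f}")
```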
2 Axioms of probability

2.1 Axioms and definitions

Theorem.
(i) $P(\emptyset) = 0$.
(ii) $P(A^C) = 1 - P(A)$.
(iii) $A \subseteq B \Rightarrow P(A) \le P(B)$.
(iv) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Theorem. If $A_1, A_2, \cdots$ is increasing or decreasing, then
\[ \lim_{n \to \infty} P(A_n) = P\left(\lim_{n \to \infty} A_n\right). \]

2.2 Inequalities and formulae

Theorem (Boole's inequality). For any $A_1, A_2, \cdots$,
\[ P\left(\bigcup_{i=1}^\infty A_i\right) \le \sum_{i=1}^\infty P(A_i). \]

Theorem (Inclusion-exclusion formula).
\[ P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i_1} P(A_{i_1}) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \sum_{i_1 < i_2 < i_3} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) - \cdots + (-1)^{n-1} P(A_1 \cap \cdots \cap A_n). \]

Theorem (Bonferroni's inequalities). For any events $A_1, A_2, \cdots, A_n$ and $1 \le r \le n$, if $r$ is odd, then
\[ P\left(\bigcup_{i=1}^n A_i\right) \le \sum_{i_1} P(A_{i_1}) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \sum_{i_1 < i_2 < i_3} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) - \cdots + \sum_{i_1 < i_2 < \cdots < i_r} P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_r}). \]
If $r$ is even, then
\[ P\left(\bigcup_{i=1}^n A_i\right) \ge \sum_{i_1} P(A_{i_1}) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \sum_{i_1 < i_2 < i_3} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) - \cdots - \sum_{i_1 < i_2 < \cdots < i_r} P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_r}). \]

2.3 Independence

Proposition. If $A$ and $B$ are independent, then $A$ and $B^C$ are independent.
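The inclusion-exclusion formula from 2.2 can be checked by brute force on a small finite sample space with equally likely outcomes. A minimal Python sketch, with the three events chosen arbitrarily for illustration:

```python
import itertools
from fractions import Fraction

# Finite sample space with equally likely outcomes.
omega = set(range(12))
events = [{x for x in omega if x % 2 == 0},   # multiples of 2
          {x for x in omega if x % 3 == 0},   # multiples of 3
          {x for x in omega if x < 5}]        # an arbitrary third event

def prob(event):
    return Fraction(len(event), len(omega))

# Left-hand side: P(A1 u A2 u A3) computed directly.
lhs = prob(set().union(*events))

# Right-hand side: sum over non-empty index subsets with alternating signs.
rhs = Fraction(0)
for r in range(1, len(events) + 1):
    for idx in itertools.combinations(range(len(events)), r):
        inter = set.intersection(*(events[i] for i in idx))
        rhs += (-1) ** (r - 1) * prob(inter)

print(lhs, rhs, lhs == rhs)  # the two sides agree exactly
```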
2.4 Important discrete distributions

Theorem (Poisson approximation to binomial). Suppose $n \to \infty$ and $p \to 0$ such that $np = \lambda$. Then
\[ q_k = \binom{n}{k} p^k (1 - p)^{n - k} \to \frac{\lambda^k}{k!} e^{-\lambda}. \]

2.5 Conditional probability

Theorem.
(i) $P(A \cap B) = P(A \mid B) P(B)$.
(ii) $P(A \cap B \cap C) = P(A \mid B \cap C) P(B \mid C) P(C)$.
(iii) $P(A \cap B \mid C) = P(A \mid B \cap C) P(B \mid C)$.
(iv) The function $P(\,\cdot \mid B)$ restricted to subsets of $B$ is a probability function (or measure).

Proposition. If $B_i$ is a partition of the sample space, and $A$ is any event, then
\[ P(A) = \sum_{i=1}^\infty P(A \cap B_i) = \sum_{i=1}^\infty P(A \mid B_i) P(B_i). \]

Theorem (Bayes' formula). Suppose $B_i$ is a partition of the sample space, and $A$ and the $B_i$ all have non-zero probability. Then for any $B_i$,
\[ P(B_i \mid A) = \frac{P(A \mid B_i) P(B_i)}{\sum_j P(A \mid B_j) P(B_j)}. \]
Note that the denominator is simply $P(A)$ written in a fancy way.
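A minimal worked sketch of Bayes' formula in Python, using a hypothetical two-event partition (the prior and likelihood values are invented for illustration: a rare condition and an imperfect test). Exact fractions make the computation transparent:

```python
from fractions import Fraction

# Hypothetical partition {B1, B2} = {has condition, does not have condition}.
prior = {"B1": Fraction(1, 100), "B2": Fraction(99, 100)}
# P(A | B_i), where A = "test is positive".
likelihood = {"B1": Fraction(9, 10), "B2": Fraction(5, 100)}

denominator = sum(likelihood[b] * prior[b] for b in prior)  # this is just P(A)
posterior = {b: likelihood[b] * prior[b] / denominator for b in prior}
print(posterior["B1"])  # 2/13 ~ 0.154: a positive test is still probably a false alarm
```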
3 Discrete random variables

3.1 Discrete random variables

Theorem.
(i) If $X \ge 0$, then $E[X] \ge 0$.
(ii) If $X \ge 0$ and $E[X] = 0$, then $P(X = 0) = 1$.
(iii) If $a$ and $b$ are constants, then $E[a + bX] = a + bE[X]$.
(iv) If $X$ and $Y$ are random variables, then $E[X + Y] = E[X] + E[Y]$. This is true even if $X$ and $Y$ are not independent.
(v) $E[X]$ is the constant that minimizes $E[(X - c)^2]$ over $c$.

Theorem. For any random variables $X_1, X_2, \cdots, X_n$ for which the following expectations exist,
\[ E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]. \]

Theorem.
(i) $\operatorname{var} X \ge 0$. If $\operatorname{var} X = 0$, then $P(X = E[X]) = 1$.
(ii) $\operatorname{var}(a + bX) = b^2 \operatorname{var}(X)$. This can be proved by expanding the definition and using the linearity of the expected value.
(iii) $\operatorname{var}(X) = E[X^2] - E[X]^2$, also proven by expanding the definition.

Proposition.
\[ E[I[A]] = \sum_\omega p(\omega) I[A](\omega) = P(A). \]
\[ I[A^C] = 1 - I[A]. \]
\[ I[A \cap B] = I[A] I[B]. \]
\[ I[A \cup B] = I[A] + I[B] - I[A] I[B]. \]
\[ I[A]^2 = I[A]. \]

Theorem (Inclusion-exclusion formula).
\[ P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i_1} P(A_{i_1}) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \sum_{i_1 < i_2 < i_3} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) - \cdots + (-1)^{n-1} P(A_1 \cap \cdots \cap A_n). \]

Theorem. If $X_1, \cdots, X_n$ are independent random variables, and $f_1, \cdots, f_n$ are functions $\mathbb{R} \to \mathbb{R}$, then $f_1(X_1), \cdots, f_n(X_n)$ are independent random variables.

Theorem. If $X_1, \cdots, X_n$ are independent random variables and all the following expectations exist, then
\[ E\left[\prod X_i\right] = \prod E[X_i]. \]
Corollary. Let $X_1, \cdots, X_n$ be independent random variables, and $f_1, f_2, \cdots, f_n$ functions $\mathbb{R} \to \mathbb{R}$. Then
\[ E\left[\prod f_i(X_i)\right] = \prod E[f_i(X_i)]. \]

Theorem. If $X_1, X_2, \cdots, X_n$ are independent random variables, then
\[ \operatorname{var}\left(\sum X_i\right) = \sum \operatorname{var}(X_i). \]

Corollary. Let $X_1, X_2, \cdots, X_n$ be independent identically distributed random variables (iid rvs). Then
\[ \operatorname{var}\left(\frac{1}{n} \sum X_i\right) = \frac{1}{n} \operatorname{var}(X_1). \]

3.2 Inequalities

Proposition. If $f$ is twice differentiable and $f''(x) \ge 0$ for all $x \in (a, b)$, then it is convex. It is strictly convex if $f''(x) > 0$.

Theorem (Jensen's inequality). If $f : (a, b) \to \mathbb{R}$ is convex, then
\[ \sum_{i=1}^n p_i f(x_i) \ge f\left(\sum_{i=1}^n p_i x_i\right) \]
for all $p_1, p_2, \cdots, p_n$ such that $p_i \ge 0$ and $\sum p_i = 1$, and $x_i \in (a, b)$. This says that $E[f(X)] \ge f(E[X])$ (where $P(X = x_i) = p_i$).
If $f$ is strictly convex, then equality holds only if all $x_i$ are equal, i.e. $X$ takes only one possible value.

Corollary (AM-GM inequality). Given $x_1, \cdots, x_n$ positive reals, then
\[ \left(\prod x_i\right)^{1/n} \le \frac{1}{n} \sum x_i. \]

Theorem (Cauchy-Schwarz inequality). For any two random variables $X$, $Y$,
\[ (E[XY])^2 \le E[X^2] E[Y^2]. \]

Theorem (Markov inequality). If $X$ is a random variable with $E|X| < \infty$ and $\varepsilon > 0$, then
\[ P(|X| \ge \varepsilon) \le \frac{E|X|}{\varepsilon}. \]

Theorem (Chebyshev inequality). If $X$ is a random variable with $E[X^2] < \infty$ and $\varepsilon > 0$, then
\[ P(|X| \ge \varepsilon) \le \frac{E[X^2]}{\varepsilon^2}. \]
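A numerical sketch in Python comparing the exact tail probability with both bounds, assuming $X \sim \mathrm{Exp}(1)$ (so $E|X| = 1$ and $E[X^2] = 2$); the choice of distribution and thresholds is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000)  # E|X| = 1, E[X^2] = 2

for eps in [1.0, 2.0, 4.0]:
    empirical = (np.abs(x) >= eps).mean()
    markov = np.mean(np.abs(x)) / eps          # Markov bound E|X| / eps
    chebyshev = np.mean(x**2) / eps**2         # Chebyshev bound E[X^2] / eps^2
    print(f"eps = {eps}: P(|X| >= eps) = {empirical:.4f}, "
          f"Markov <= {markov:.4f}, Chebyshev <= {chebyshev:.4f}")
```

Both bounds hold, and (as usual) they are far from tight; they trade precision for complete generality.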
3.3 Weak law of large numbers

Theorem (Weak law of large numbers). Let $X_1, X_2, \cdots$ be iid random variables, with mean $\mu$ and variance $\sigma^2$. Let $S_n = \sum_{i=1}^n X_i$. Then for all $\varepsilon > 0$,
\[ P\left(\left|\frac{S_n}{n} - \mu\right| \ge \varepsilon\right) \to 0 \]
as $n \to \infty$. We say $\frac{S_n}{n}$ tends to $\mu$ (in probability), or
\[ \frac{S_n}{n} \xrightarrow{p} \mu. \]

Theorem (Strong law of large numbers).
\[ P\left(\frac{S_n}{n} \to \mu \text{ as } n \to \infty\right) = 1. \]
We say
\[ \frac{S_n}{n} \xrightarrow{\text{as}} \mu, \]
where "as" means "almost surely".

3.4 Multiple random variables

Proposition.
(i) $\operatorname{cov}(X, c) = 0$ for constant $c$.
(ii) $\operatorname{cov}(X + c, Y) = \operatorname{cov}(X, Y)$.
(iii) $\operatorname{cov}(X, Y) = \operatorname{cov}(Y, X)$.
(iv) $\operatorname{cov}(X, Y) = E[XY] - E[X]E[Y]$.
(v) $\operatorname{cov}(X, X) = \operatorname{var}(X)$.
(vi) $\operatorname{var}(X + Y) = \operatorname{var}(X) + \operatorname{var}(Y) + 2\operatorname{cov}(X, Y)$.
(vii) If $X$, $Y$ are independent, $\operatorname{cov}(X, Y) = 0$.

Proposition. $|\operatorname{corr}(X, Y)| \le 1$.

Theorem. If $X$ and $Y$ are independent, then
\[ E[X \mid Y] = E[X]. \]

Theorem (Tower property of conditional expectation).
\[ E_Y[E_X[X \mid Y]] = E_X[X], \]
where the subscripts indicate what variable the expectation is taken over.
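A minimal simulation sketch of the tower property in Python. The hierarchical model is an arbitrary illustrative choice: $Y \sim \mathrm{Poisson}(4)$ and, given $Y = y$, $X \sim \mathrm{Binomial}(y, 0.3)$, so that $E[X \mid Y] = 0.3 Y$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Y ~ Poisson(4); given Y = y, X ~ Binomial(y, 0.3).
y = rng.poisson(4, size=1_000_000)
x = rng.binomial(y, 0.3)

# Here E[X | Y] = 0.3 * Y, so the tower property predicts E[0.3 * Y] = E[X].
print("E[E[X | Y]] ~", (0.3 * y).mean())
print("E[X]        ~", x.mean())  # both are close to 0.3 * 4 = 1.2
```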
3.5 Probability generating functions

Theorem. The distribution of $X$ is uniquely determined by its probability generating function.

Theorem (Abel's lemma).
\[ E[X] = \lim_{z \to 1} p'(z). \]
If $p'(z)$ is continuous, then simply $E[X] = p'(1)$.

Theorem.
\[ E[X(X - 1)] = \lim_{z \to 1} p''(z). \]

Theorem. Suppose $X_1, X_2, \cdots, X_n$ are independent random variables with pgfs $p_1, p_2, \cdots, p_n$. Then the pgf of $X_1 + X_2 + \cdots + X_n$ is $p_1(z) p_2(z) \cdots p_n(z)$.
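For a random variable taking finitely many values, the pgf is a polynomial in $z$, so these theorems can be checked with polynomial arithmetic. A minimal Python sketch using a fair die, where $p(z) = \frac{1}{6}(z + z^2 + \cdots + z^6)$ (the die is an arbitrary illustrative choice):

```python
import numpy as np

# pgf of a fair die as a coefficient array: P(X = k) is the coefficient of z^k.
die = np.array([0, 1, 1, 1, 1, 1, 1]) / 6

# pgf of the sum of two independent dice = product of the two pgfs.
two_dice = np.polynomial.polynomial.polymul(die, die)
print("P(sum = 7) =", two_dice[7])  # 6/36 ~ 0.1667

# Abel's lemma: E[X] = p'(1). Differentiate the coefficients, evaluate at z = 1.
deriv = np.polynomial.polynomial.polyder(die)
print("E[X] =", np.polynomial.polynomial.polyval(1.0, deriv))  # 3.5
```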
4 Interesting problems

4.1 Branching processes

Theorem.
\[ F_{n+1}(z) = F_n(F(z)) = F(F(F(\cdots F(z) \cdots))) = F(F_n(z)). \]

Theorem. Suppose
\[ E[X_1] = \sum k p_k = \mu \quad \text{and} \quad \operatorname{var}(X_1) = E[(X_1 - \mu)^2] = \sum (k - \mu)^2 p_k = \sigma^2 < \infty. \]
Then
\[ E[X_n] = \mu^n, \quad \operatorname{var} X_n = \sigma^2 \mu^{n-1}(1 + \mu + \mu^2 + \cdots + \mu^{n-1}). \]

Theorem. The probability of extinction $q$ is the smallest root to the equation $q = F(q)$. Write $\mu = E[X_1]$. Then if $\mu \le 1$, then $q = 1$; if $\mu > 1$, then $q < 1$.

4.2 Random walk and gambler's ruin
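For the gambler's ruin problem, the ruin probability can be estimated by simulation and compared with the classical closed form $P(\text{ruin}) = \frac{r^z - r^a}{1 - r^a}$ with $r = q/p$, for a gambler starting with $z$ who stops at $0$ or $a$ and wins each bet with probability $p \ne \frac{1}{2}$ (a standard fact, not stated in this summary). A minimal Python sketch, with $z$, $a$, $p$ chosen arbitrarily:

```python
import random

def ruin_probability(z, a, p, trials=20_000, seed=6):
    """Estimate P(hit 0 before a) starting from z, with win probability p."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        x = z
        while 0 < x < a:  # walk until absorbed at 0 or a
            x += 1 if rng.random() < p else -1
        ruined += (x == 0)
    return ruined / trials

z, a, p = 3, 10, 0.45
r = (1 - p) / p
exact = (r**z - r**a) / (1 - r**a)  # classical ruin probability for p != 1/2
print("simulated:", ruin_probability(z, a, p), " exact:", round(exact, 4))
```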
5 Continuous random variables

5.1 Continuous random variables

Proposition. The exponential random variable is memoryless, i.e.
\[ P(X \ge x + z \mid X \ge x) = P(X \ge z). \]
This means that, say if $X$ measures the lifetime of a light bulb, knowing it has already lasted for 3 hours does not give any information about how much longer it will last.

Theorem. If $X$ is a continuous random variable, then
\[ E[X] = \int_0^\infty P(X \ge x)\, dx - \int_0^\infty P(X \le -x)\, dx. \]

5.2 Stochastic ordering and inspection paradox

5.3 Jointly distributed random variables

Theorem. If $X$ and $Y$ are jointly continuous random variables, then they are individually continuous random variables.

Proposition. For independent continuous random variables $X_i$,
(i) $E\left[\prod X_i\right] = \prod E[X_i]$
(ii) $\operatorname{var}\left(\sum X_i\right) = \sum \operatorname{var}(X_i)$

5.4 Geometric probability

5.5 The normal distribution

Proposition.
\[ \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(x - \mu)^2}\, dx = 1. \]

Proposition. $E[X] = \mu$.

Proposition. $\operatorname{var}(X) = \sigma^2$.

5.6 Transformation of random variables

Theorem. If $X$ is a continuous random variable with a pdf $f(x)$, and $h(x)$ is a continuous, strictly increasing function with $h^{-1}(x)$ differentiable, then $Y = h(X)$ is a random variable with pdf
\[ f_Y(y) = f_X(h^{-1}(y)) \frac{d}{dy} h^{-1}(y). \]

Theorem. Let $U \sim U[0, 1]$. For any strictly increasing distribution function $F$, the random variable $X = F^{-1}(U)$ has distribution function $F$.

Proposition. $(Y_1, \cdots, Y_n)$ has density
\[ g(y_1, \cdots, y_n) = f(s_1(y_1, \cdots, y_n), \cdots, s_n(y_1, \cdots, y_n)) |J| \]
if $(y_1, \cdots, y_n) \in S$, and 0 otherwise.
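The $U[0, 1]$ theorem above is the basis of simulation by inverse transform sampling. A minimal Python sketch for the exponential distribution, which also checks the memoryless property from 5.1 empirically (the values of $\lambda$, $s$, $t$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0

# Exponential(lam) has F(x) = 1 - exp(-lam x), so F^{-1}(u) = -log(1 - u) / lam.
u = rng.random(1_000_000)
x = -np.log(1 - u) / lam  # X = F^{-1}(U) has the exponential distribution

print("sample mean:", x.mean(), "(theory: 1/lam =", 1 / lam, ")")

# Memoryless property: P(X >= s + t | X >= s) should match P(X >= t).
s, t = 0.5, 1.0
lhs = (x >= s + t).sum() / (x >= s).sum()
rhs = (x >= t).mean()
print("P(X >= s+t | X >= s) ~", lhs, " vs  P(X >= t) ~", rhs)
```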
5.7 Moment generating functions

Theorem. The mgf determines the distribution of $X$ provided $m(\theta)$ is finite for all $\theta$ in some interval containing the origin.

Theorem. The $r$th moment of $X$ is the coefficient of $\frac{\theta^r}{r!}$ in the power series expansion of $m(\theta)$, and is
\[ E[X^r] = \left.\frac{d^r}{d\theta^r} m(\theta)\right|_{\theta = 0} = m^{(r)}(0). \]

Theorem. If $X$ and $Y$ are independent random variables with moment generating functions $m_X(\theta)$, $m_Y(\theta)$, then $X + Y$ has mgf
\[ m_{X+Y}(\theta) = m_X(\theta) m_Y(\theta). \]
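A minimal symbolic sketch using sympy, assuming the standard mgf of the exponential distribution $m(\theta) = \lambda/(\lambda - \theta)$ for $\theta < \lambda$ (a known fact, not derived in these notes). Differentiating $r$ times at $\theta = 0$ recovers $E[X^r] = r!/\lambda^r$:

```python
import sympy as sp

theta, lam = sp.symbols('theta lambda', positive=True)

# mgf of Exponential(lam): m(theta) = lam / (lam - theta) for theta < lam.
m = lam / (lam - theta)

# E[X^r] = m^{(r)}(0): differentiate r times and evaluate at theta = 0.
for r in range(1, 4):
    moment = sp.diff(m, theta, r).subs(theta, 0)
    print(f"E[X^{r}] =", sp.simplify(moment))  # prints r! / lam^r
```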
6 More distributions

6.1 Cauchy distribution

Proposition. The mean of the Cauchy distribution is undefined, while $E[X^2] = \infty$.

6.2 Gamma distribution

6.3 Beta distribution*

6.4 More on the normal distribution

Proposition. The moment generating function of $N(\mu, \sigma^2)$ is
\[ E[e^{\theta X}] = \exp\left(\theta\mu + \frac{1}{2}\theta^2\sigma^2\right). \]

Theorem. Suppose $X$, $Y$ are independent random variables with $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$. Then
(i) $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
(ii) $aX \sim N(a\mu_1, a^2\sigma_1^2)$.

6.5 Multivariate normal
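A multivariate normal with a given covariance matrix $\Sigma$ can be simulated from iid standard normals via the Cholesky factorisation $\Sigma = LL^T$, a standard construction not spelled out above. A minimal Python sketch with an arbitrary bivariate example:

```python
import numpy as np

rng = np.random.default_rng(7)

mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])  # desired covariance matrix (must be positive definite)

# Sample Z ~ N(0, I) and set X = mu + L Z with L the Cholesky factor,
# so that cov(X) = L L^T = sigma.
L = np.linalg.cholesky(sigma)
z = rng.standard_normal((100_000, 2))
x = mu + z @ L.T

print("sample mean:", x.mean(axis=0))
print("sample covariance:\n", np.cov(x, rowvar=False))
```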
7 Central limit theorem

Theorem (Central limit theorem). Let $X_1, X_2, \cdots$ be iid random variables with $E[X_i] = \mu$, $\operatorname{var}(X_i) = \sigma^2 < \infty$. Define
\[ S_n = X_1 + \cdots + X_n. \]
Then for all finite intervals $(a, b)$,
\[ \lim_{n \to \infty} P\left(a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}\, dt. \]
Note that the integrand is the pdf of a standard normal. We say
\[ \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0, 1). \]

Theorem (Continuity theorem). If the random variables $X_1, X_2, \cdots$ have mgfs $m_1(\theta), m_2(\theta), \cdots$ and $m_n(\theta) \to m(\theta)$ as $n \to \infty$ for all $\theta$, then $X_n \xrightarrow{D}$ the random variable with mgf $m(\theta)$.
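A minimal Python simulation of the theorem, assuming $X_i \sim U[0, 1]$ (so $\mu = 1/2$, $\sigma^2 = 1/12$) and the arbitrary interval $(a, b) = (-1, 1)$, for which the normal integral is $\Phi(1) - \Phi(-1) \approx 0.6827$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 1_000, 10_000
mu, sigma = 0.5, np.sqrt(1 / 12)  # mean and standard deviation of U[0, 1]

s_n = rng.random((reps, n)).sum(axis=1)    # reps independent copies of S_n
z = (s_n - n * mu) / (sigma * np.sqrt(n))  # standardised S_n

a, b = -1.0, 1.0
print("P(a <= Z <= b) ~", ((z >= a) & (z <= b)).mean())  # Phi(1) - Phi(-1) ~ 0.6827
```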
8 Summary of distributions

8.1 Discrete distributions

8.2 Continuous distributions