Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability
Theorems
Based on lectures by R. Weber
Notes taken by Dexter Chua
Lent 2015

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Basic concepts
Classical probability, equally likely outcomes. Combinatorial analysis, permutations and combinations. Stirling's formula (asymptotics for log n! proved). [3]

Axiomatic approach
Axioms (countable case). Probability spaces. Inclusion-exclusion formula. Continuity and subadditivity of probability measures. Independence. Binomial, Poisson and geometric distributions. Relation between Poisson and binomial distributions. Conditional probability, Bayes's formula. Examples, including Simpson's paradox. [5]

Discrete random variables
Expectation. Functions of a random variable, indicator function, variance, standard deviation. Covariance, independence of random variables. Generating functions: sums of independent random variables, random sum formula, moments. Conditional expectation. Random walks: gambler's ruin, recurrence relations. Difference equations and their solution. Mean time to absorption. Branching processes: generating functions and extinction probability. Combinatorial applications of generating functions. [7]

Continuous random variables
Distributions and density functions. Expectations; expectation of a function of a random variable. Uniform, normal and exponential random variables. Memoryless property of exponential distribution. Joint distributions: transformation of random variables (including Jacobians), examples. Simulation: generating continuous random variables, independent normal random variables. Geometrical probability: Bertrand's paradox, Buffon's needle. Correlation coefficient, bivariate normal random variables. [6]

Inequalities and limits
Markov's inequality, Chebyshev's inequality. Weak law of large numbers. Convexity: Jensen's inequality for general random variables, AM/GM inequality. Moment generating functions and statement (no proof) of continuity theorem. Statement of central limit theorem and sketch of proof. Examples, including sampling. [3]

Contents

0 Introduction
1 Classical probability
  1.1 Classical probability
  1.2 Counting
  1.3 Stirling's formula
2 Axioms of probability
  2.1 Axioms and definitions
  2.2 Inequalities and formulae
  2.3 Independence
  2.4 Important discrete distributions
  2.5 Conditional probability
3 Discrete random variables
  3.1 Discrete random variables
  3.2 Inequalities
  3.3 Weak law of large numbers
  3.4 Multiple random variables
  3.5 Probability generating functions
4 Interesting problems
  4.1 Branching processes
  4.2 Random walk and gambler's ruin
5 Continuous random variables
  5.1 Continuous random variables
  5.2 Stochastic ordering and inspection paradox
  5.3 Jointly distributed random variables
  5.4 Geometric probability
  5.5 The normal distribution
  5.6 Transformation of random variables
  5.7 Moment generating functions
6 More distributions
  6.1 Cauchy distribution
  6.2 Gamma distribution
  6.3 Beta distribution*
  6.4 More on the normal distribution
  6.5 Multivariate normal
7 Central limit theorem
8 Summary of distributions
  8.1 Discrete distributions
  8.2 Continuous distributions

0 Introduction

1 Classical probability

1.1 Classical probability

1.2 Counting

Theorem (Fundamental rule of counting). Suppose we have to make r multiple choices in sequence. There are m_1 possibilities for the first choice, m_2 possibilities for the second, etc. Then the total number of choices is m_1 × m_2 × ⋯ × m_r.

1.3 Stirling's formula

Proposition. log n! ∼ n log n.

Theorem (Stirling's formula). As n → ∞,
\[ \log \frac{n!\, e^n}{n^{n + 1/2}} = \log \sqrt{2\pi} + O\!\left(\frac{1}{n}\right). \]

Corollary. n! ∼ √(2π) n^{n+1/2} e^{−n}.

Proposition (non-examinable). We use the 1/(12n) term from the proof above to get a better approximation:
\[ \sqrt{2\pi}\, n^{n+1/2} e^{-n + \frac{1}{12n+1}} \le n! \le \sqrt{2\pi}\, n^{n+1/2} e^{-n + \frac{1}{12n}}. \]
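Stirling's formula is easy to check numerically. The following sketch (my own, not from the lectures) compares n! with the leading-order approximation; consistent with the refined bounds, the relative error behaves like 1/(12n).

```python
import math

def stirling(n):
    # Leading-order Stirling approximation: n! ~ sqrt(2*pi) * n**(n + 1/2) * e**(-n)
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

# The relative error shrinks like 1/(12n).
for n in [5, 10, 50]:
    exact = math.factorial(n)
    rel_err = (exact - stirling(n)) / exact
    print(n, rel_err)
```

For n = 5 the relative error is already about 1/60, matching the 1/(12n) correction term.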

2 Axioms of probability

2.1 Axioms and definitions

Theorem.
(i) P(∅) = 0.
(ii) P(A^C) = 1 − P(A).
(iii) A ⊆ B ⟹ P(A) ≤ P(B).
(iv) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Theorem. If the sequence A_1, A_2, … is increasing or decreasing, then
\[ \lim_{n \to \infty} P(A_n) = P\!\left(\lim_{n \to \infty} A_n\right). \]

2.2 Inequalities and formulae

Theorem (Boole's inequality). For any A_1, A_2, …,
\[ P\!\left(\bigcup_{i=1}^\infty A_i\right) \le \sum_{i=1}^\infty P(A_i). \]

Theorem (Inclusion-exclusion formula).
\[ P\!\left(\bigcup_{i=1}^n A_i\right) = \sum_{i_1} P(A_{i_1}) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \sum_{i_1 < i_2 < i_3} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) - \cdots + (-1)^{n-1} P(A_1 \cap \cdots \cap A_n). \]

Theorem (Bonferroni's inequalities). For any events A_1, A_2, …, A_n and 1 ≤ r ≤ n, if r is odd, then
\[ P\!\left(\bigcup_{i=1}^n A_i\right) \le \sum_{i_1} P(A_{i_1}) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \cdots + \sum_{i_1 < \cdots < i_r} P(A_{i_1} \cap \cdots \cap A_{i_r}). \]
If r is even, then
\[ P\!\left(\bigcup_{i=1}^n A_i\right) \ge \sum_{i_1} P(A_{i_1}) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \cdots - \sum_{i_1 < \cdots < i_r} P(A_{i_1} \cap \cdots \cap A_{i_r}). \]

2.3 Independence

Proposition. If A and B are independent, then A and B^C are independent.
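The inclusion-exclusion formula can be verified by brute force on a finite sample space. A sketch (the three events are illustrative choices of mine, not from the notes):

```python
from itertools import combinations
from fractions import Fraction

# Uniform probability on outcomes {0, ..., 11}; three illustrative events.
omega = range(12)
A = [set(w for w in omega if w % 2 == 0),   # even outcomes
     set(w for w in omega if w % 3 == 0),   # multiples of 3
     set(w for w in omega if w < 5)]        # small outcomes

def P(event):
    return Fraction(len(event), len(omega))

# Right-hand side: alternating sum over all non-empty subsets of the events.
rhs = sum((-1) ** (r + 1) * sum(P(set.intersection(*c)) for c in combinations(A, r))
          for r in range(1, len(A) + 1))

lhs = P(A[0] | A[1] | A[2])
print(lhs, rhs)  # both equal P(A1 ∪ A2 ∪ A3) = 3/4
```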

2.4 Important discrete distributions

Theorem (Poisson approximation to binomial). Suppose n → ∞ and p → 0 such that np = λ. Then
\[ q_k = \binom{n}{k} p^k (1-p)^{n-k} \to \frac{\lambda^k}{k!} e^{-\lambda}. \]

2.5 Conditional probability

Theorem.
(i) P(A ∩ B) = P(A | B) P(B).
(ii) P(A ∩ B ∩ C) = P(A | B ∩ C) P(B | C) P(C).
(iii) P(A | B ∩ C) = P(A ∩ B | C) / P(B | C).
(iv) The function P(· | B) restricted to subsets of B is a probability function (or measure).

Proposition. If the B_i form a partition of the sample space, and A is any event, then
\[ P(A) = \sum_{i=1}^\infty P(A \cap B_i) = \sum_{i=1}^\infty P(A \mid B_i) P(B_i). \]

Theorem (Bayes' formula). Suppose the B_i form a partition of the sample space, and A and the B_i all have non-zero probability. Then for any B_i,
\[ P(B_i \mid A) = \frac{P(A \mid B_i) P(B_i)}{\sum_j P(A \mid B_j) P(B_j)}. \]
Note that the denominator is simply P(A) written in a fancy way.
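Bayes' formula in action, with hypothetical numbers of my choosing (a diagnostic test with 99% sensitivity, 95% specificity, 1% base rate):

```python
from fractions import Fraction

# Partition {ill, healthy} with priors P(B_i); A is the event "test positive".
priors = {"ill": Fraction(1, 100), "healthy": Fraction(99, 100)}
likelihood = {"ill": Fraction(99, 100),      # P(positive | ill)
              "healthy": Fraction(5, 100)}   # P(positive | healthy)

# The denominator is P(A) = sum_j P(A | B_j) P(B_j), as noted above.
p_positive = sum(likelihood[b] * priors[b] for b in priors)
posterior_ill = likelihood["ill"] * priors["ill"] / p_positive
print(posterior_ill)  # P(ill | positive) = 1/6
```

Despite the accurate test, the low base rate keeps the posterior at only 1/6.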

3 Discrete random variables

3.1 Discrete random variables

Theorem.
(i) If X ≥ 0, then E[X] ≥ 0.
(ii) If X ≥ 0 and E[X] = 0, then P(X = 0) = 1.
(iii) If a and b are constants, then E[a + bX] = a + bE[X].
(iv) If X and Y are random variables, then E[X + Y] = E[X] + E[Y]. This is true even if X and Y are not independent.
(v) E[X] is the constant that minimizes E[(X − c)^2] over c.

Theorem. For any random variables X_1, X_2, …, X_n for which the following expectations exist,
\[ E\!\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]. \]

Theorem.
(i) var X ≥ 0. If var X = 0, then P(X = E[X]) = 1.
(ii) var(a + bX) = b^2 var(X). This can be proved by expanding the definition and using the linearity of the expected value.
(iii) var(X) = E[X^2] − E[X]^2, also proven by expanding the definition.

Proposition.
E[I[A]] = Σ_ω p(ω) I[A](ω) = P(A).
I[A^C] = 1 − I[A].
I[A ∩ B] = I[A] I[B].
I[A ∪ B] = I[A] + I[B] − I[A] I[B].
I[A]^2 = I[A].

Theorem (Inclusion-exclusion formula).
\[ P\!\left(\bigcup_{i=1}^n A_i\right) = \sum_{i_1} P(A_{i_1}) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \sum_{i_1 < i_2 < i_3} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) - \cdots + (-1)^{n-1} P(A_1 \cap \cdots \cap A_n). \]

Theorem. If X_1, …, X_n are independent random variables, and f_1, …, f_n are functions R → R, then f_1(X_1), …, f_n(X_n) are independent random variables.

Theorem. If X_1, …, X_n are independent random variables and all the following expectations exist, then
\[ E\!\left[\prod X_i\right] = \prod E[X_i]. \]
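Point (iv), linearity of expectation without independence, can be checked exactly on a small joint pmf (the pmf below is my own example; X and Y are deliberately dependent):

```python
from fractions import Fraction

# Joint pmf of (X, Y): note P(X=1, Y=1) = 1/4 != P(X=1)P(Y=1) = 1/8, so dependent.
joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 4), (1, 0): Fraction(1, 4)}

EX = sum(p * x for (x, y), p in joint.items())
EY = sum(p * y for (x, y), p in joint.items())
E_sum = sum(p * (x + y) for (x, y), p in joint.items())
print(EX, EY, E_sum)  # E[X + Y] = E[X] + E[Y] holds regardless of dependence
```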

Corollary. Let X_1, …, X_n be independent random variables, and f_1, f_2, …, f_n functions R → R. Then
\[ E\!\left[\prod f_i(X_i)\right] = \prod E[f_i(X_i)]. \]

Theorem. If X_1, X_2, …, X_n are independent random variables, then
\[ \operatorname{var}\!\left(\sum X_i\right) = \sum \operatorname{var}(X_i). \]

Corollary. Let X_1, X_2, …, X_n be independent identically distributed random variables (iid rvs). Then
\[ \operatorname{var}\!\left(\frac{1}{n} \sum X_i\right) = \frac{1}{n} \operatorname{var}(X_1). \]

3.2 Inequalities

Proposition. If f is differentiable and f''(x) ≥ 0 for all x ∈ (a, b), then it is convex. It is strictly convex if f''(x) > 0.

Theorem (Jensen's inequality). If f : (a, b) → R is convex, then
\[ \sum_{i=1}^n p_i f(x_i) \ge f\!\left(\sum_{i=1}^n p_i x_i\right) \]
for all p_1, p_2, …, p_n such that p_i ≥ 0 and Σ p_i = 1, and x_i ∈ (a, b). This says that E[f(X)] ≥ f(E[X]) (where P(X = x_i) = p_i).
If f is strictly convex, then equality holds only if all x_i are equal, i.e. X takes only one possible value.

Corollary (AM-GM inequality). Given x_1, …, x_n positive reals,
\[ \left(\prod x_i\right)^{1/n} \le \frac{1}{n} \sum x_i. \]

Theorem (Markov inequality). If X is a random variable with E|X| < ∞ and ε > 0, then
\[ P(|X| \ge \varepsilon) \le \frac{E|X|}{\varepsilon}. \]

Theorem (Chebyshev inequality). If X is a random variable with E[X^2] < ∞ and ε > 0, then
\[ P(|X| \ge \varepsilon) \le \frac{E[X^2]}{\varepsilon^2}. \]

Theorem (Cauchy-Schwarz inequality). For any two random variables X, Y,
\[ (E[XY])^2 \le E[X^2] E[Y^2]. \]
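The Markov and Chebyshev bounds can be checked exactly on a small pmf (the distribution below is an example of my own):

```python
from fractions import Fraction

# A small pmf: P(X=0) = 1/2, P(X=1) = 1/4, P(X=4) = 1/4.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}

E_abs = sum(p * abs(x) for x, p in pmf.items())   # E|X| = 5/4
E_sq = sum(p * x * x for x, p in pmf.items())     # E[X^2] = 17/4

eps = 2
tail = sum(p for x, p in pmf.items() if abs(x) >= eps)  # P(|X| >= 2) = 1/4
print(tail, E_abs / eps, E_sq / eps ** 2)  # tail below both bounds
```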

3.3 Weak law of large numbers

Theorem (Weak law of large numbers). Let X_1, X_2, … be iid random variables with mean µ and variance σ^2. Let S_n = Σ_{i=1}^n X_i. Then for all ε > 0,
\[ P\!\left(\left|\frac{S_n}{n} - \mu\right| \ge \varepsilon\right) \to 0 \]
as n → ∞. We say S_n/n tends to µ (in probability), or
\[ \frac{S_n}{n} \to_p \mu. \]

Theorem (Strong law of large numbers).
\[ P\!\left(\frac{S_n}{n} \to \mu \text{ as } n \to \infty\right) = 1. \]
We say S_n/n → µ a.s., where "a.s." means "almost surely".

3.4 Multiple random variables

Proposition.
(i) cov(X, c) = 0 for constant c.
(ii) cov(X + c, Y) = cov(X, Y).
(iii) cov(X, Y) = cov(Y, X).
(iv) cov(X, Y) = E[XY] − E[X]E[Y].
(v) cov(X, X) = var(X).
(vi) var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).
(vii) If X, Y are independent, cov(X, Y) = 0.

Proposition. |corr(X, Y)| ≤ 1.

Theorem. If X and Y are independent, then
E[X | Y] = E[X].

Theorem (Tower property of conditional expectation).
\[ E_Y[E_X[X \mid Y]] = E_X[X], \]
where the subscripts indicate which variable the expectation is taken over.
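The weak law is easy to see empirically. A quick simulation sketch (my own; fair die rolls, µ = 3.5):

```python
import random

random.seed(0)

# Sample means of n fair-die rolls; by the WLLN these concentrate at mu = 3.5.
def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in [10, 100, 100000]:
    print(n, sample_mean(n))  # increasingly close to 3.5
```

The spread of S_n/n around µ shrinks like σ/√n, which is exactly the variance computation in the corollary of Section 3.1.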

3.5 Probability generating functions

Theorem. The distribution of X is uniquely determined by its probability generating function.

Theorem (Abel's lemma).
\[ E[X] = \lim_{z \to 1} p'(z). \]
If p'(z) is continuous, then simply E[X] = p'(1).

Theorem.
\[ E[X(X-1)] = \lim_{z \to 1} p''(z). \]

Theorem. Suppose X_1, X_2, …, X_n are independent random variables with pgfs p_1, p_2, …, p_n. Then the pgf of X_1 + X_2 + ⋯ + X_n is p_1(z) p_2(z) ⋯ p_n(z).
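Multiplying pgfs is the same as convolving pmfs, since the coefficient of z^k in p(z)q(z) collects all ways of splitting k between the two summands. A sketch (representation by coefficient lists is my own convention):

```python
# Represent a pgf by its coefficient list [P(X=0), P(X=1), ...].
def pgf_product(p, q):
    # Coefficient of z^k in p(z)q(z): the convolution of the two pmfs.
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Sum of two independent Bernoulli(1/2) rvs is Binomial(2, 1/2).
bern = [0.5, 0.5]
print(pgf_product(bern, bern))  # [0.25, 0.5, 0.25]
```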

4 Interesting problems

4.1 Branching processes

Theorem.
\[ F_{n+1}(z) = F_n(F(z)) = F(F(F(\cdots F(z) \cdots))) = F(F_n(z)). \]

Theorem. Suppose
\[ E[X_1] = \sum k p_k = \mu \quad\text{and}\quad \operatorname{var}(X_1) = E[(X - \mu)^2] = \sum (k - \mu)^2 p_k < \infty. \]
Then
\[ E[X_n] = \mu^n, \qquad \operatorname{var} X_n = \sigma^2 \mu^{n-1}(1 + \mu + \mu^2 + \cdots + \mu^{n-1}). \]

Theorem. The probability of extinction q is the smallest root of the equation q = F(q). Write µ = E[X_1]. Then if µ ≤ 1, then q = 1; if µ > 1, then q < 1.

4.2 Random walk and gambler's ruin
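Since q_n = F_n(0) = F(q_{n-1}) increases to the extinction probability, iterating F from 0 finds the smallest root of q = F(q) numerically. A sketch for a hypothetical supercritical offspring distribution of my choosing (p_0 = 1/4, p_1 = 1/4, p_2 = 1/2, so µ = 5/4 > 1):

```python
# Offspring pgf F(z) = 1/4 + z/4 + z^2/2 for the example distribution above.
def F(z):
    return 0.25 + 0.25 * z + 0.5 * z ** 2

# q_{n+1} = F(q_n) from q_0 = 0 converges to the smallest root of q = F(q).
q = 0.0
for _ in range(200):
    q = F(q)
print(q)  # smallest root of (2q - 1)(q - 1) = 0, i.e. q = 1/2
```

Here q = F(q) reads 2q^2 − 3q + 1 = 0 with roots 1/2 and 1; the iteration picks out the smaller one, consistent with µ > 1 forcing q < 1.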

5 Continuous random variables

5.1 Continuous random variables

Proposition. The exponential random variable is memoryless, i.e.
\[ P(X \ge x + z \mid X \ge x) = P(X \ge z). \]
This means that, say, if X measures the lifetime of a light bulb, knowing it has already lasted for 3 hours does not give any information about how much longer it will last.

Theorem. If X is a continuous random variable, then
\[ E[X] = \int_0^\infty P(X \ge x)\,\mathrm{d}x - \int_{-\infty}^0 P(X \le x)\,\mathrm{d}x. \]

5.2 Stochastic ordering and inspection paradox

5.3 Jointly distributed random variables

Theorem. If X and Y are jointly continuous random variables, then they are individually continuous random variables.

Proposition. For independent continuous random variables X_i,
(i) E[Π X_i] = Π E[X_i]
(ii) var(Σ X_i) = Σ var(X_i)

5.4 Geometric probability

5.5 The normal distribution

Proposition.
\[ \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(x - \mu)^2}\,\mathrm{d}x = 1. \]

Proposition. E[X] = µ.

Proposition. var(X) = σ^2.

5.6 Transformation of random variables

Theorem. If X is a continuous random variable with pdf f(x), and h(x) is a continuous, strictly increasing function with h^{-1}(x) differentiable, then Y = h(X) is a random variable with pdf
\[ f_Y(y) = f_X(h^{-1}(y)) \frac{\mathrm{d}}{\mathrm{d}y} h^{-1}(y). \]

Theorem. Let U ~ U[0, 1]. For any strictly increasing distribution function F, the random variable X = F^{-1}(U) has distribution function F.

Proposition. (Y_1, …, Y_n) has density
\[ g(y_1, \ldots, y_n) = f(s_1(y_1, \ldots, y_n), \ldots, s_n(y_1, \ldots, y_n))\, |J| \]
if (y_1, …, y_n) ∈ S, and 0 otherwise.
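The theorem on X = F^{-1}(U) is the basis of inverse transform sampling. A sketch for the exponential distribution (parameters and names my own): F(x) = 1 − e^{−λx}, so F^{-1}(u) = −ln(1 − u)/λ.

```python
import math
import random

random.seed(0)

# If U ~ U[0,1], then X = F^{-1}(U) has distribution function F.
def sample_exponential(lam):
    u = random.random()
    return -math.log(1 - u) / lam

lam = 2.0
samples = [sample_exponential(lam) for _ in range(100000)]
print(sum(samples) / len(samples))  # sample mean should be near 1/lam = 0.5
```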

5.7 Moment generating functions

Theorem. The mgf determines the distribution of X provided m(θ) is finite for all θ in some interval containing the origin.

Theorem. The rth moment of X is the coefficient of θ^r/r! in the power series expansion of m(θ), and is
\[ E[X^r] = \left.\frac{\mathrm{d}^r}{\mathrm{d}\theta^r} m(\theta)\right|_{\theta = 0} = m^{(r)}(0). \]

Theorem. If X and Y are independent random variables with moment generating functions m_X(θ), m_Y(θ), then X + Y has mgf
\[ m_{X+Y}(\theta) = m_X(\theta) m_Y(\theta). \]

6 More distributions

6.1 Cauchy distribution

Proposition. The mean of the Cauchy distribution is undefined, while E[X^2] = ∞.

6.2 Gamma distribution

6.3 Beta distribution*

6.4 More on the normal distribution

Proposition. The moment generating function of N(µ, σ^2) is
\[ E[e^{\theta X}] = \exp\!\left(\theta\mu + \frac{1}{2}\theta^2\sigma^2\right). \]

Theorem. Suppose X, Y are independent random variables with X ~ N(µ_1, σ_1^2) and Y ~ N(µ_2, σ_2^2). Then
(i) X + Y ~ N(µ_1 + µ_2, σ_1^2 + σ_2^2).
(ii) aX ~ N(aµ_1, a^2σ_1^2).

6.5 Multivariate normal
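The sum rule for independent normals can be sanity-checked by simulation (parameters are my own choice): with X ~ N(1, 2^2) and Y ~ N(−1, 3^2), the sum should behave like N(0, 13).

```python
import random

random.seed(0)

# Draw X + Y for independent X ~ N(1, 4) and Y ~ N(-1, 9).
n = 100000
sums = [random.gauss(1, 2) + random.gauss(-1, 3) for _ in range(n)]

mean = sum(sums) / n
var = sum((s - mean) ** 2 for s in sums) / n
print(mean, var)  # roughly 0 and 13, matching N(0, 13)
```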

7 Central limit theorem

Theorem (Central limit theorem). Let X_1, X_2, … be iid random variables with E[X_i] = µ, var(X_i) = σ^2 < ∞. Define
\[ S_n = X_1 + \cdots + X_n. \]
Then for all finite intervals (a, b),
\[ \lim_{n \to \infty} P\!\left(a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}\,\mathrm{d}t. \]
Note that the integrand is the pdf of a standard normal. We say
\[ \frac{S_n - n\mu}{\sigma\sqrt{n}} \to_D N(0, 1). \]

Theorem (Continuity theorem). If the random variables X_1, X_2, … have mgfs m_1(θ), m_2(θ), … and m_n(θ) → m(θ) as n → ∞ for all θ, then X_n →_D the random variable with mgf m(θ).
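The CLT can be seen empirically even for moderate n. A sketch of my own using sums of Uniform[0, 1] variables (µ = 1/2, σ^2 = 1/12):

```python
import math
import random

random.seed(0)

# Standardize S_n = X_1 + ... + X_n with mu = 1/2, sigma^2 = 1/12 per summand.
def standardized_sum(n):
    s = sum(random.random() for _ in range(n))
    return (s - n * 0.5) / math.sqrt(n / 12)

zs = [standardized_sum(30) for _ in range(50000)]
frac = sum(1 for z in zs if -1 <= z <= 1) / len(zs)
print(frac)  # close to Phi(1) - Phi(-1) ≈ 0.6827 for a standard normal
```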

8 Summary of distributions

8.1 Discrete distributions

8.2 Continuous distributions