Part IA Probability Definitions
Based on lectures by R. Weber
Notes taken by Dexter Chua
Lent 2015

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Basic concepts
Classical probability, equally likely outcomes. Combinatorial analysis, permutations and combinations. Stirling's formula (asymptotics for log n! proved). [3]

Axiomatic approach
Axioms (countable case). Probability spaces. Inclusion-exclusion formula. Continuity and subadditivity of probability measures. Independence. Binomial, Poisson and geometric distributions. Relation between Poisson and binomial distributions. Conditional probability, Bayes's formula. Examples, including Simpson's paradox. [5]

Discrete random variables
Expectation. Functions of a random variable, indicator function, variance, standard deviation. Covariance, independence of random variables. Generating functions: sums of independent random variables, random sum formula, moments. Conditional expectation. Random walks: gambler's ruin, recurrence relations. Difference equations and their solution. Mean time to absorption. Branching processes: generating functions and extinction probability. Combinatorial applications of generating functions. [7]

Continuous random variables
Distributions and density functions. Expectations; expectation of a function of a random variable. Uniform, normal and exponential random variables. Memoryless property of exponential distribution. Joint distributions: transformation of random variables (including Jacobians), examples. Simulation: generating continuous random variables, independent normal random variables. Geometrical probability: Bertrand's paradox, Buffon's needle. Correlation coefficient, bivariate normal random variables. [6]

Inequalities and limits
Markov's inequality, Chebyshev's inequality.
Weak law of large numbers. Convexity: Jensen's inequality for general random variables, AM/GM inequality. Moment generating functions and statement (no proof) of continuity theorem. Statement of central limit theorem and sketch of proof. Examples, including sampling. [3]
Contents

0 Introduction
1 Classical probability
  1.1 Classical probability
  1.2 Counting
  1.3 Stirling's formula
2 Axioms of probability
  2.1 Axioms and definitions
  2.2 Inequalities and formulae
  2.3 Independence
  2.4 Important discrete distributions
  2.5 Conditional probability
3 Discrete random variables
  3.1 Discrete random variables
  3.2 Inequalities
  3.3 Weak law of large numbers
  3.4 Multiple random variables
  3.5 Probability generating functions
4 Interesting problems
  4.1 Branching processes
  4.2 Random walk and gambler's ruin
5 Continuous random variables
  5.1 Continuous random variables
  5.2 Stochastic ordering and inspection paradox
  5.3 Jointly distributed random variables
  5.4 Geometric probability
  5.5 The normal distribution
  5.6 Transformation of random variables
  5.7 Moment generating functions
6 More distributions
  6.1 Cauchy distribution
  6.2 Gamma distribution
  6.3 Beta distribution*
  6.4 More on the normal distribution
  6.5 Multivariate normal
7 Central limit theorem
8 Summary of distributions
  8.1 Discrete distributions
  8.2 Continuous distributions
0 Introduction
1 Classical probability

1.1 Classical probability

Definition (Classical probability). Classical probability applies in a situation where there are a finite number of equally likely outcomes.

Definition (Sample space). The set of all possible outcomes is the sample space, Ω. We can list the outcomes as ω₁, ω₂, … ∈ Ω. Each ω ∈ Ω is an outcome.

Definition (Event). A subset of Ω is called an event.

Definition (Set notation). Given any two events A, B ⊆ Ω,
- The complement of A is Aᶜ = A′ = Ā = Ω \ A.
- "A or B" is the set A ∪ B.
- "A and B" is the set A ∩ B.
- A and B are mutually exclusive or disjoint if A ∩ B = ∅.
- If A ⊆ B, then A occurring implies B occurring.

Definition (Probability). Suppose Ω = {ω₁, ω₂, …, ω_N}. Let A ⊆ Ω be an event. Then the probability of A is

  P(A) = (number of outcomes in A)/(number of outcomes in Ω) = |A|/N.

1.2 Counting

Definition (Sampling with replacement). When we sample with replacement, after choosing an item, it is put back and can be chosen again. Then any sampling function f satisfies sampling with replacement.

Definition (Sampling without replacement). When we sample without replacement, after choosing an item, we kill it with fire and cannot choose it again. Then f must be an injective function, and clearly the number of items sampled cannot exceed n.

Definition (Multinomial coefficient). A multinomial coefficient is

  (n; n₁, n₂, …, n_k) = (n choose n₁)((n − n₁) choose n₂) ⋯ ((n − n₁ − ⋯ − n_{k−1}) choose n_k) = n!/(n₁! n₂! ⋯ n_k!).

It is the number of ways to distribute n items into k positions, in which the i-th position has nᵢ items.

1.3 Stirling's formula
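As a quick sanity check (my own sketch, not part of the lectured notes), the two facts above can be verified directly in Python: the product-of-binomials form of the multinomial coefficient agrees with n!/(n₁!⋯n_k!), and log n! behaves like n log n − n (Stirling). The helper names are mine.

```python
import math

def multinomial(ns):
    """n! / (n1! * n2! * ... * nk!) where n = n1 + ... + nk."""
    out = math.factorial(sum(ns))
    for ni in ns:
        out //= math.factorial(ni)  # each intermediate quotient is an integer
    return out

def multinomial_via_binomials(ns):
    """(n choose n1)(n - n1 choose n2)... from the product form of the definition."""
    out, remaining = 1, sum(ns)
    for ni in ns:
        out *= math.comb(remaining, ni)
        remaining -= ni
    return out

# Stirling asymptotic: log n! / (n log n - n) -> 1 as n grows.
n = 10**6
ratio = math.lgamma(n + 1) / (n * math.log(n) - n)  # lgamma(n+1) = log n!
```

For example, multinomial([2, 3, 5]) counts the ways to split 10 items into groups of sizes 2, 3 and 5, namely 2520.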
2 Axioms of probability

2.1 Axioms and definitions

Definition (Probability space). A probability space is a triple (Ω, F, P). Ω is a set called the sample space, F is a collection of subsets of Ω, and P : F → [0, 1] is the probability measure.

F has to satisfy the following axioms:
(i) ∅, Ω ∈ F.
(ii) A ∈ F ⇒ Aᶜ ∈ F.
(iii) A₁, A₂, … ∈ F ⇒ ⋃_{i=1}^{∞} Aᵢ ∈ F.

And P has to satisfy the following Kolmogorov axioms:
(i) 0 ≤ P(A) ≤ 1 for all A ∈ F.
(ii) P(Ω) = 1.
(iii) For any countable collection of events A₁, A₂, … which are disjoint, i.e. Aᵢ ∩ Aⱼ = ∅ for all i ≠ j, we have

  P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ).

Items in Ω are known as the outcomes, items in F are known as the events, and P(A) is the probability of the event A.

Definition (Probability distribution). Let Ω = {ω₁, ω₂, …}. Choose numbers p₁, p₂, … such that Σ_{i=1}^{∞} pᵢ = 1. Let p(ωᵢ) = pᵢ. Then define

  P(A) = Σ_{ωᵢ ∈ A} p(ωᵢ).

This P(A) satisfies the above axioms, and p₁, p₂, … is the probability distribution.

Definition (Limit of events). A sequence of events A₁, A₂, … is increasing if A₁ ⊆ A₂ ⊆ ⋯. Then we define the limit as

  lim_{n→∞} Aₙ = ⋃_{n=1}^{∞} Aₙ.

Similarly, if they are decreasing, i.e. A₁ ⊇ A₂ ⊇ ⋯, then

  lim_{n→∞} Aₙ = ⋂_{n=1}^{∞} Aₙ.
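A minimal sketch (the fair-die example is my own, not from the notes): on a finite sample space the measure P(A) = Σ_{ω∈A} p(ω) built from a probability distribution can be checked against the Kolmogorov axioms directly.

```python
omega = {1, 2, 3, 4, 5, 6}       # sample space: a fair die (hypothetical example)
p = {w: 1 / 6 for w in omega}    # probability distribution with sum 1

def P(A):
    """P(A) = sum of p(w) over w in A, as in the definition above."""
    return sum(p[w] for w in A)

A, B = {1, 2}, {5, 6}            # two disjoint events

axiom_i = all(0 <= P({w}) <= 1 for w in omega)       # 0 <= P <= 1
axiom_ii = abs(P(omega) - 1) < 1e-12                 # P(Omega) = 1
axiom_iii = abs(P(A | B) - (P(A) + P(B))) < 1e-12    # additivity on disjoint events
```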
2.2 Inequalities and formulae

2.3 Independence

Definition (Independent events). Two events A and B are independent if

  P(A ∩ B) = P(A)P(B).

Otherwise, they are said to be dependent.

Definition (Independence of multiple events). Events A₁, A₂, … are said to be mutually independent if

  P(A_{i₁} ∩ A_{i₂} ∩ ⋯ ∩ A_{i_r}) = P(A_{i₁})P(A_{i₂}) ⋯ P(A_{i_r})

for any distinct i₁, i₂, …, i_r and r ≥ 2.

2.4 Important discrete distributions

Definition (Bernoulli distribution). Suppose we toss a coin. Ω = {H, T} and p ∈ [0, 1]. The Bernoulli distribution, denoted B(1, p), has

  P(H) = p;  P(T) = 1 − p.

Definition (Binomial distribution). Suppose we toss a coin n times, each toss with probability p of getting heads. Then

  P(HHTT⋯T) = pp(1 − p) ⋯ (1 − p).

So

  P(two heads) = (n choose 2) p²(1 − p)^{n−2}.

In general,

  P(k heads) = (n choose k) pᵏ(1 − p)^{n−k}.

We call this the binomial distribution and write it as B(n, p).

Definition (Geometric distribution). Suppose we toss a coin with probability p of getting heads. The probability of having a head after k consecutive tails is

  p_k = (1 − p)ᵏ p.

This is the geometric distribution. We say it is memoryless because how many tails we have got in the past does not give us any information about how long we will have to wait until we get a head.

Definition (Hypergeometric distribution). Suppose we have an urn with n₁ red balls and n₂ black balls. We choose n balls. The probability that there are k red balls is

  P(k red) = (n₁ choose k)(n₂ choose n − k) / ((n₁ + n₂) choose n).

Definition (Poisson distribution). The Poisson distribution, denoted P(λ), has

  p_k = (λᵏ/k!) e^{−λ}

for k ∈ ℕ.
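The syllabus's "relation between Poisson and binomial distributions" is that B(n, λ/n) probabilities approach P(λ) probabilities as n grows. A small numerical sketch (parameter values are my own):

```python
import math

def binomial_pmf(n, p, k):
    """P(k heads) = (n choose k) p^k (1-p)^(n-k), as defined above."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    """p_k = (lam^k / k!) e^(-lam), as defined above."""
    return lam**k / math.factorial(k) * math.exp(-lam)

lam, k = 3.0, 2
# With p = lam/n, the binomial pmf should approach the Poisson pmf as n grows.
errors = [abs(binomial_pmf(n, lam / n, k) - poisson_pmf(lam, k))
          for n in (10, 100, 10000)]
```

The error shrinks roughly like 1/n, so the last entry of `errors` is far smaller than the first.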
2.5 Conditional probability

Definition (Conditional probability). Suppose B is an event with P(B) > 0. For any event A ⊆ Ω, the conditional probability of A given B is

  P(A | B) = P(A ∩ B)/P(B).

We interpret it as the probability of A happening given that B has happened.

Definition (Partition). A partition of the sample space is a collection of disjoint events {Bᵢ}_{i=0}^{∞} such that ⋃ᵢ Bᵢ = Ω.
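The two definitions above combine into the law of total probability and Bayes's formula (both named in the syllabus). A minimal sketch with hypothetical numbers of my own:

```python
# A partition B1, B2 of the sample space, with P(A | Bi) given.
P_B = [0.6, 0.4]           # P(B1), P(B2); disjoint and summing to 1
P_A_given_B = [0.1, 0.5]   # P(A | B1), P(A | B2)  (hypothetical values)

# Law of total probability: P(A) = sum_i P(A | Bi) P(Bi).
P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))

# Bayes's formula, straight from the definition of conditional probability:
# P(B1 | A) = P(A | B1) P(B1) / P(A).
P_B1_given_A = P_A_given_B[0] * P_B[0] / P_A
```

Here P(A) = 0.06 + 0.20 = 0.26, so P(B₁ | A) = 0.06/0.26 = 3/13.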
3 Discrete random variables

3.1 Discrete random variables

Definition (Random variable). A random variable X taking values in a set Ω_X is a function X : Ω → Ω_X. Ω_X is usually a set of numbers, e.g. ℝ or ℕ.

Definition (Discrete random variables). A random variable is discrete if Ω_X is finite or countably infinite.

Notation. Let T ⊆ Ω_X. Define

  P(X ∈ T) = P({ω ∈ Ω : X(ω) ∈ T}),

i.e. the probability that the outcome is in T.

Definition (Discrete uniform distribution). A discrete uniform distribution is a discrete distribution with finitely many possible outcomes, in which each outcome is equally likely.

Notation. We write P_X(x) = P(X = x). We can also write X ~ B(n, p) to mean

  P(X = r) = (n choose r) pʳ(1 − p)^{n−r},

and similarly for the other distributions we have come up with before.

Definition (Expectation). The expectation (or mean) of a real-valued X is equal to

  E[X] = Σ_{ω∈Ω} p_ω X(ω),

provided this is absolutely convergent. Otherwise, we say the expectation doesn't exist. Alternatively,

  E[X] = Σ_{x∈Ω_X} Σ_{ω:X(ω)=x} p_ω X(ω) = Σ_{x∈Ω_X} x Σ_{ω:X(ω)=x} p_ω = Σ_{x∈Ω_X} x P(X = x).

We are sometimes lazy and just write EX.

Definition (Variance and standard deviation). The variance of a random variable X is defined as

  var(X) = E[(X − E[X])²].

The standard deviation is the square root of the variance, √var(X).
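A minimal worked example (the fair-die setting is my own choice): expectation and variance computed exactly from the definitions above, using rationals so the textbook values 7/2 and 35/12 come out exactly.

```python
from fractions import Fraction

outcomes = range(1, 7)      # a fair die: discrete uniform on {1, ..., 6}
p = Fraction(1, 6)          # p_w for every outcome w

# E[X] = sum over outcomes of p_w X(w).
EX = sum(p * x for x in outcomes)

# var(X) = E[(X - E[X])^2].
var = sum(p * (x - EX) ** 2 for x in outcomes)
```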
Definition (Indicator function). The indicator function or indicator variable I[A] (or I_A) of an event A ⊆ Ω is

  I[A](ω) = 1 if ω ∈ A, and 0 if ω ∉ A.

Definition (Independent random variables). Let X₁, X₂, …, Xₙ be discrete random variables. They are independent iff for any x₁, x₂, …, xₙ,

  P(X₁ = x₁, …, Xₙ = xₙ) = P(X₁ = x₁) ⋯ P(Xₙ = xₙ).

3.2 Inequalities

Definition (Convex function). A function f : (a, b) → ℝ is convex if for all x₁, x₂ ∈ (a, b) and λ₁, λ₂ ≥ 0 such that λ₁ + λ₂ = 1,

  λ₁f(x₁) + λ₂f(x₂) ≥ f(λ₁x₁ + λ₂x₂).

It is strictly convex if the inequality above is strict (except when x₁ = x₂ or λ₁ or λ₂ = 0). (There is a diagram here: the chord joining (x₁, f(x₁)) and (x₂, f(x₂)) lies above the graph of f.)

A function is concave if −f is convex.

3.3 Weak law of large numbers

3.4 Multiple random variables

Definition (Covariance). Given two random variables X, Y, the covariance is

  cov(X, Y) = E[(X − E[X])(Y − E[Y])].

Definition (Correlation coefficient). The correlation coefficient of X and Y is

  corr(X, Y) = cov(X, Y)/√(var(X) var(Y)).

Definition (Conditional distribution). Let X and Y be random variables (in general not independent) with joint distribution P(X = x, Y = y). Then the marginal distribution (or simply distribution) of X is

  P(X = x) = Σ_{y∈Ω_Y} P(X = x, Y = y).
The conditional distribution of X given Y is

  P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y).

The conditional expectation of X given Y is

  E[X | Y = y] = Σ_{x∈Ω_X} x P(X = x | Y = y).

We can view E[X | Y] as a random variable in Y: given a value of Y, we return the expectation of X.

3.5 Probability generating functions

Definition (Probability generating function (pgf)). The probability generating function (pgf) of X is

  p(z) = E[z^X] = Σ_{r=0}^{∞} P(X = r) zʳ = p₀ + p₁z + p₂z² + ⋯.

This is a power series (or polynomial), and converges if |z| ≤ 1, since

  |p(z)| ≤ Σ_r p_r |z|ʳ ≤ Σ_r p_r = 1.

We sometimes write it as p_X(z) to indicate which random variable it belongs to.
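A minimal sketch for the pgf (the closed form (1 − p + pz)ⁿ for B(n, p) is a standard fact, not proved in these definitions-only notes): the defining sum matches the closed form, p(1) = 1, and the derivative at 1 recovers the mean np.

```python
import math

n, p = 10, 0.3   # hypothetical parameters for X ~ B(n, p)

def pmf(r):
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def pgf(z):
    """p(z) = sum_r P(X = r) z^r, straight from the definition."""
    return sum(pmf(r) * z**r for r in range(n + 1))

closed_form_at_half = (1 - p + p * 0.5) ** n   # standard pgf of B(n, p) at z = 0.5

# p'(1) = E[X]; estimate the derivative numerically by a central difference.
h = 1e-6
mean_estimate = (pgf(1 + h) - pgf(1 - h)) / (2 * h)
```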
4 Interesting problems

4.1 Branching processes

4.2 Random walk and gambler's ruin

Definition (Random walk). Let X₁, …, Xₙ be iid random variables such that Xᵢ = +1 with probability p, and −1 with probability 1 − p. Let Sₙ = S₀ + X₁ + ⋯ + Xₙ. Then (S₀, S₁, …, Sₙ) is a 1-dimensional random walk. If p = q = 1/2 (where q = 1 − p), we say it is a symmetric random walk.
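A minimal sketch of the random walk just defined (my own example): evolve the exact distribution of Sₙ by dynamic programming and check E[Sₙ] = n(2p − 1), which follows from linearity of expectation since E[Xᵢ] = p − (1 − p).

```python
from collections import defaultdict

p, steps = 0.3, 20       # hypothetical parameters; S_0 = 0
dist = {0: 1.0}          # P(S_0 = 0) = 1

for _ in range(steps):
    nxt = defaultdict(float)
    for s, prob in dist.items():
        nxt[s + 1] += prob * p          # step +1 with probability p
        nxt[s - 1] += prob * (1 - p)    # step -1 with probability 1 - p
    dist = dict(nxt)

total = sum(dist.values())                        # should be 1
mean = sum(s * prob for s, prob in dist.items())  # should be steps * (2p - 1)
```

The same state-by-state recursion is the computational backbone of the gambler's ruin recurrence relations mentioned in the syllabus.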
5 Continuous random variables

5.1 Continuous random variables

Definition (Continuous random variable). A random variable X : Ω → ℝ is continuous if there is a function f : ℝ → ℝ_{≥0} such that

  P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx.

We call f the probability density function, which satisfies f ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1.

Definition (Cumulative distribution function). The cumulative distribution function (or simply distribution function) of a random variable X (discrete, continuous, or neither) is

  F(x) = P(X ≤ x).

Definition (Uniform distribution). The uniform distribution on [a, b] has pdf

  f(x) = 1/(b − a).

Then

  F(x) = ∫ₐˣ f(z) dz = (x − a)/(b − a)

for a ≤ x ≤ b. If X follows a uniform distribution on [a, b], we write X ~ U[a, b].

Definition (Exponential random variable). The exponential random variable with parameter λ has pdf

  f(x) = λe^{−λx}

and

  F(x) = 1 − e^{−λx}

for x ≥ 0. We write X ~ E(λ).

Definition (Expectation). The expectation (or mean) of a continuous random variable is

  E[X] = ∫_{−∞}^{∞} x f(x) dx,

provided not both ∫₀^{∞} x f(x) dx and ∫_{−∞}^{0} x f(x) dx are infinite.

Definition (Variance). The variance of a continuous random variable is

  var(X) = E[(X − E[X])²] = E[X²] − (E[X])².
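A minimal numerical sketch (my own, using only the cdf and pdf stated above): the exponential distribution's memoryless property P(X > s + t | X > s) = P(X > t), and E[X] = 1/λ by a Riemann sum for ∫ x f(x) dx.

```python
import math

lam = 2.0                          # hypothetical rate parameter

def survival(x):
    """P(X > x) = 1 - F(x) = e^(-lam x)."""
    return math.exp(-lam * x)

s, t = 1.3, 0.7
memoryless_lhs = survival(s + t) / survival(s)   # P(X > s+t | X > s), by definition
memoryless_rhs = survival(t)                     # P(X > t)

# E[X] = integral over [0, inf) of x * lam * e^(-lam x) dx; truncate the tiny tail.
dx = 1e-4
EX = sum(x * lam * math.exp(-lam * x) * dx
         for x in (i * dx for i in range(200000)))   # integrates up to x = 20
```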
Definition (Mode and median). Given a pdf f(x), we call x̂ a mode if

  f(x̂) ≥ f(x)

for all x. Note that a distribution can have many modes. For example, in the uniform distribution, all x are modes.

We say x̂ is a median if

  ∫_{−∞}^{x̂} f(x) dx = 1/2 = ∫_{x̂}^{∞} f(x) dx.

For a discrete random variable, the median is x̂ such that

  P(X ≤ x̂) ≥ 1/2 and P(X ≥ x̂) ≥ 1/2.

Here we have a non-strict inequality since if the random variable, say, always takes value 0, then both probabilities will be 1.

Definition (Sample mean). If X₁, …, Xₙ is a random sample from some distribution, then the sample mean is

  X̄ = (1/n) Σ_{i=1}^{n} Xᵢ.

5.2 Stochastic ordering and inspection paradox

Definition (Stochastic order). The stochastic order is defined as: X ≤_st Y if P(X > t) ≤ P(Y > t) for all t.

5.3 Jointly distributed random variables

Definition (Joint distribution). Two random variables X, Y have joint distribution F : ℝ² → [0, 1] defined by

  F(x, y) = P(X ≤ x, Y ≤ y).

The marginal distribution of X is

  F_X(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F(x, ∞) = lim_{y→∞} F(x, y).

Definition (Jointly distributed random variables). We say X₁, …, Xₙ are jointly distributed continuous random variables and have joint pdf f if for any set A ⊆ ℝⁿ,

  P((X₁, …, Xₙ) ∈ A) = ∫_{(x₁,…,xₙ)∈A} f(x₁, …, xₙ) dx₁ ⋯ dxₙ,

where

  f(x₁, …, xₙ) ≥ 0

and

  ∫_{ℝⁿ} f(x₁, …, xₙ) dx₁ ⋯ dxₙ = 1.
Definition (Independent continuous random variables). Continuous random variables X₁, …, Xₙ are independent if

  P(X₁ ∈ A₁, X₂ ∈ A₂, …, Xₙ ∈ Aₙ) = P(X₁ ∈ A₁)P(X₂ ∈ A₂) ⋯ P(Xₙ ∈ Aₙ)

for all Aᵢ ⊆ Ω_{Xᵢ}. If we let F_{Xᵢ} and f_{Xᵢ} be the cdf and pdf of Xᵢ, then

  F(x₁, …, xₙ) = F_{X₁}(x₁) ⋯ F_{Xₙ}(xₙ)

and

  f(x₁, …, xₙ) = f_{X₁}(x₁) ⋯ f_{Xₙ}(xₙ)

are each individually equivalent to the definition above.

5.4 Geometric probability

5.5 The normal distribution

Definition (Normal distribution). The normal distribution with parameters µ, σ², written N(µ, σ²), has pdf

  f(x) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²)),

for −∞ < x < ∞. (There is a diagram here: a bell-shaped curve centred at µ.)

The standard normal is when µ = 0, σ² = 1, i.e. X ~ N(0, 1). We usually write φ(x) for the pdf and Φ(x) for the cdf of the standard normal.

5.6 Transformation of random variables

Definition (Jacobian determinant). Suppose ∂sᵢ/∂yⱼ exists and is continuous at every point (y₁, …, yₙ) ∈ S. Then the Jacobian determinant is

  J = ∂(s₁, …, sₙ)/∂(y₁, …, yₙ) = det of the n × n matrix whose (i, j) entry is ∂sᵢ/∂yⱼ.

Definition (Order statistics). Suppose that X₁, …, Xₙ are some random variables, and Y₁, …, Yₙ is X₁, …, Xₙ arranged in increasing order, i.e. Y₁ ≤ Y₂ ≤ ⋯ ≤ Yₙ. This is the order statistics. We sometimes write Yᵢ = X₍ᵢ₎.
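The syllabus item "simulation: generating continuous random variables" is usually done by inverse-transform sampling: if U ~ U[0, 1], then X = F⁻¹(U) has cdf F. A minimal sketch for the exponential case, where F⁻¹(u) = −ln(1 − u)/λ (standard fact; the seed and sample size are my own choices):

```python
import math
import random

random.seed(0)                 # fixed seed so the check below is deterministic
lam = 1.5                      # hypothetical rate parameter

# 1 - random.random() lies in (0, 1], so the log is always defined.
samples = [-math.log(1.0 - random.random()) / lam for _ in range(100000)]

# Compare the empirical P(X <= 1) with the exact cdf F(1) = 1 - e^(-lam).
empirical = sum(1 for x in samples if x <= 1.0) / len(samples)
exact = 1 - math.exp(-lam * 1.0)
```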
5.7 Moment generating functions

Definition (Moment generating function). The moment generating function of a random variable X is

  m(θ) = E[e^{θX}].

For those θ for which m(θ) is finite, we have

  m(θ) = ∫_{−∞}^{∞} e^{θx} f(x) dx.

Definition (Moment). The rth moment of X is E[Xʳ].
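The point of the mgf is that its rth derivative at θ = 0 is the rth moment E[Xʳ]. A minimal sketch using the exponential distribution, whose mgf m(θ) = λ/(λ − θ) for θ < λ is a standard fact not derived in these notes:

```python
lam = 2.0                      # hypothetical rate parameter

def m(theta):
    """mgf of E(lam), valid for theta < lam."""
    return lam / (lam - theta)

# Numerical derivatives of m at 0 should recover the moments:
# m'(0) = E[X] = 1/lam, and m''(0) = E[X^2] = 2/lam^2.
h = 1e-4
first_moment = (m(h) - m(-h)) / (2 * h)
second_moment = (m(h) - 2 * m(0) + m(-h)) / h**2
```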
6 More distributions

6.1 Cauchy distribution

Definition (Cauchy distribution). The Cauchy distribution has pdf

  f(x) = 1/(π(1 + x²))

for −∞ < x < ∞.

6.2 Gamma distribution

Definition (Gamma distribution). The gamma distribution Γ(n, λ) has pdf

  f(x) = λⁿ xⁿ⁻¹ e^{−λx}/(n − 1)!

for x ≥ 0. We can show that this is a distribution by showing that it integrates to 1.

6.3 Beta distribution*

Definition (Beta distribution). The beta distribution β(a, b) has pdf

  f(x; a, b) = (Γ(a + b)/(Γ(a)Γ(b))) x^{a−1}(1 − x)^{b−1}

for 0 ≤ x ≤ 1. This has mean a/(a + b).

6.4 More on the normal distribution

6.5 Multivariate normal
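A minimal numerical sketch (parameter values are my own) checking two claims above for the beta distribution: the pdf integrates to 1, and the mean is a/(a + b). The midpoint rule on [0, 1] suffices since the integrand is smooth for these parameters.

```python
import math

a, b = 2.0, 5.0   # hypothetical shape parameters
const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

def f(x):
    """beta(a, b) pdf on [0, 1], as defined above."""
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

n = 20000
dx = 1.0 / n
xs = [(i + 0.5) * dx for i in range(n)]       # midpoints of each subinterval
total = sum(f(x) for x in xs) * dx            # should be ~1
mean = sum(x * f(x) for x in xs) * dx         # should be ~a/(a+b) = 2/7
```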
7 Central limit theorem
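A minimal simulation sketch of the theorem this chapter states (sample sizes and seed are my own): standardised sums of iid U[0, 1] variables (mean 1/2, variance 1/12) should be approximately N(0, 1), so the fraction of them below 1 should be close to Φ(1) ≈ 0.8413.

```python
import math
import random

random.seed(1)                  # fixed seed so the check below is deterministic
n, trials = 30, 10000
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and sd of U[0, 1]

def standardised_sum():
    """(S_n - n*mu) / (sigma * sqrt(n)), which the CLT says is ~ N(0, 1)."""
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

zs = [standardised_sum() for _ in range(trials)]
frac_below_1 = sum(1 for z in zs if z <= 1.0) / trials
Phi_1 = 0.5 * (1 + math.erf(1 / math.sqrt(2)))   # standard normal cdf at 1
```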
8 Summary of distributions

8.1 Discrete distributions

8.2 Continuous distributions