PROBABILITY VITTORIA SILVESTRI


Contents

Preface
1. Introduction
2. Combinatorial analysis
3. Stirling's formula
4. Properties of Probability measures

Preface

These lecture notes are for the course Probability IA, given in Lent 2019 at the University of Cambridge. The contents are closely based on the following lecture notes, all available online:
- James Norris: http://www.statslab.cam.ac.uk/~james/lectures/p.pdf
- Douglas Kennedy: http://trin-hosts.trin.cam.ac.uk/fellows/dpk0/ia/iaprob.html
- Richard Weber: http://www.statslab.cam.ac.uk/~rrw/prob/prob-weber.pdf

Appropriate books for this course are:
- W. Feller, An Introduction to Probability Theory and its Applications, Vol. I. Wiley, 1968.
- G. Grimmett and D. Welsh, Probability: An Introduction. Oxford University Press, 2nd Edition, 2014.
- S. Ross, A First Course in Probability. Prentice Hall, 2009.
- D.R. Stirzaker, Elementary Probability. Cambridge University Press, 1994/2003.

Please notify vs358@cam.ac.uk for comments and corrections.

1. Introduction

This course concerns the study of experiments with random outcomes, such as rolling a die, tossing a coin or drawing a card from a standard deck. Say that the set of possible outcomes is $\Omega = \{\omega_1, \omega_2, \omega_3, \ldots\}$. We call $\Omega$ the sample space, while its elements are called outcomes. A subset $A \subseteq \Omega$ is called an event.

Example 1.1 (Rolling a die). Toss a normal six-faced die: the sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$. Examples of events are:
- $\{5\}$ (the outcome is 5),
- $\{1, 3, 5\}$ (the outcome is odd),
- $\{3, 6\}$ (the outcome is divisible by 3).

Example 1.2 (Drawing a card). Draw a card from a standard deck: $\Omega$ is the set of all possible cards, so that $|\Omega| = 52$. Examples of events are:
- $A_1 = \{\text{the card is a Jack}\}$, $|A_1| = 4$,
- $A_2 = \{\text{the card is Diamonds}\}$, $|A_2| = 13$,
- $A_3 = \{\text{the card is not the Queen of Spades}\}$, $|A_3| = 51$.

Example 1.3 (Picking a natural number). Pick any natural number: the sample space is $\Omega = \mathbb{N}$. Examples of events are:
- $\{\text{the number is at most 5}\} = \{0, 1, 2, 3, 4, 5\}$,
- $\{\text{the number is even}\} = \{0, 2, 4, 6, 8, \ldots\}$,
- $\{\text{the number is not 7}\} = \mathbb{N} \setminus \{7\}$.

Example 1.4 (Picking a real number). Pick any number in the closed interval $[0, 1]$: the sample space is $\Omega = [0, 1]$. Examples of events are:
- $\{x : x < 1/3\} = [0, 1/3)$,
- $\{x : x \neq 0.7\} = [0, 1] \setminus \{0.7\} = [0, 0.7) \cup (0.7, 1]$,
- $\{x : x = 2^{-n} \text{ for some } n \in \mathbb{N}\} = \{1, 1/2, 1/4, 1/8, \ldots\}$.

Note that the sample space $\Omega$ is finite in the first two examples, infinite but countable in the third example, and uncountable in the last example.

Remark 1.5. For the first part of the course we will restrict to countable sample spaces, thus excluding Example 1.4 above.

We can now give a general definition.

Definition 1.6 (Probability space). Let $\Omega$ be any set, and $\mathcal{F}$ be a set of subsets of $\Omega$. We say that $\mathcal{F}$ is a σ-algebra if
- $\Omega \in \mathcal{F}$,
- if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$,
- for every sequence $(A_n)_{n \ge 1}$ in $\mathcal{F}$, it holds $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$.

Assume that $\mathcal{F}$ is a σ-algebra. A function $P : \mathcal{F} \to [0, 1]$ is called a probability measure if
- $P(\Omega) = 1$,
- for any sequence of disjoint events $(A_n)_{n \ge 1}$ it holds
$$ P\Big( \bigcup_{n=1}^{\infty} A_n \Big) = \sum_{n=1}^{\infty} P(A_n). $$

The triple $(\Omega, \mathcal{F}, P)$ is called a probability space.

Remark 1.7. In the case of countable state space we take $\mathcal{F}$ to be the set of all subsets of $\Omega$, unless otherwise stated.

We think of $\mathcal{F}$ as the collection of observable events. If $A \in \mathcal{F}$, then $P(A)$ is the probability of the event $A$. In some probability models, such as the one in Example 1.4, the probability of each individual outcome is 0. This is one reason why we need to specify probabilities of events rather than outcomes.

1.1. Equally likely outcomes. The simplest case is that of a finite (non-empty) sample space
$$ \Omega = \{\omega_1, \omega_2, \ldots, \omega_{|\Omega|}\}, \qquad |\Omega| < \infty, $$
and equally likely outcomes:
$$ P(A) = \frac{|A|}{|\Omega|} \qquad \text{for all } A \in \mathcal{F}. $$
Note that taking $A = \{\omega_i\}$ we find
$$ P(\{\omega_i\}) = \frac{1}{|\Omega|} \qquad \text{for all } 1 \le i \le |\Omega|, $$
thus all outcomes are equally likely. To check that $P$ is a probability measure, note that $P(\Omega) = |\Omega| / |\Omega| = 1$, and for disjoint events $(A_k)_{k=1}^{n}$ it holds
$$ P\Big( \bigcup_{k=1}^{n} A_k \Big) = \frac{|A_1 \cup A_2 \cup \cdots \cup A_n|}{|\Omega|} = \sum_{k=1}^{n} \frac{|A_k|}{|\Omega|} = \sum_{k=1}^{n} P(A_k), $$
as wanted. Moreover, we have the following properties:

- $P(\emptyset) = 0$,
- if $A \subseteq B$ then $P(A) \le P(B)$,
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$,
- $P(A^c) = 1 - P(A)$.

Example 1.8. When rolling a fair die there are 6 possible outcomes, all equally likely. Then $\Omega = \{1, 2, 3, 4, 5, 6\}$ and $P(\{i\}) = 1/6$ for $i = 1, \ldots, 6$. So $P(\text{even outcome}) = P(\{2, 4, 6\}) = 1/2$, while $P(\text{outcome} \le 5) = P(\{1, 2, 3, 4, 5\}) = 5/6$.

Example 1.9 (Largest digit). Consider a string of random digits $0, 1, \ldots, 9$ of length $n$. For $0 \le k \le 9$, what is the probability that the largest digit is at most $k$? And exactly $k$? We model the set of all possible strings by $\Omega = \{0, 1, \ldots, 9\}^n$, so that $|\Omega| = 10^n$, and all elements of $\Omega$ are equally likely. Let $A_k$ denote the event that none of the digits exceeds $k$. Then $|A_k| = (k+1)^n$, so
$$ P(A_k) = \frac{|A_k|}{|\Omega|} = \frac{(k+1)^n}{10^n}. $$
To answer the second question, let $B_k$ denote the event that the largest digit equals $k$, and note that $B_k = A_k \setminus A_{k-1}$ (with the convention $A_{-1} = \emptyset$). Since $A_{k-1} \subseteq A_k$, it follows that
$$ P(B_k) = P(A_k) - P(A_{k-1}) = \frac{(k+1)^n - k^n}{10^n}. $$

Example 1.10 (The birthday problem). Suppose there are $n \ge 2$ people in the room. What is the probability that at least two of them share the same birthday? To answer the question, we assume that no-one was born on the 29th of February, the other dates all being equally likely. Then $\Omega = \{1, 2, \ldots, 365\}^n$ and $|\Omega| = 365^n$. Let $A_n$ denote the event that at least two people share the same birthday. If $n > 365$ then $P(A_n) = 1$, so we restrict to $n \le 365$. Then
$$ P(A_n) = 1 - P(A_n^c) = 1 - \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n}. $$
You can check that $P(A_n) \ge 1/2$ as soon as $n \ge 23$.
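The threshold stated at the end of Example 1.10 can be checked numerically. The following is a minimal Python sketch under the same assumption as the example (365 equally likely birthdays); the function name `birthday_probability` and the variable `smallest_n` are illustrative and not part of the notes.

```python
from fractions import Fraction


def birthday_probability(n: int, days: int = 365) -> Fraction:
    """Exact P(A_n): at least two of n people share a birthday."""
    if n > days:
        return Fraction(1)
    # P(A_n^c) = (days / days) * ((days - 1) / days) * ... * ((days - n + 1) / days)
    no_shared_birthday = Fraction(1)
    for j in range(n):
        no_shared_birthday *= Fraction(days - j, days)
    return 1 - no_shared_birthday


# Smallest n for which the probability reaches 1/2.
smallest_n = next(
    n for n in range(2, 366) if birthday_probability(n) >= Fraction(1, 2)
)
print(smallest_n)  # prints 23
```

Exact rational arithmetic is used here so that the comparison with 1/2 does not depend on floating-point rounding.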

2. Combinatorial analysis

We have seen that it is often necessary to be able to count the number of subsets of $\Omega$ with a given property. We now take a systematic look at some counting methods.

2.1. Multiplication rule. Take $N$ finite sets $\Omega_1, \Omega_2, \ldots, \Omega_N$ (some of which might coincide), with cardinalities $|\Omega_k| = n_k$. We imagine picking one element from each set: how many possible ways do we have to do so? Clearly, we have $n_1$ choices for the first element. Now, for each choice of the first element, we have $n_2$ choices for the second, so that
$$ |\Omega_1 \times \Omega_2| = n_1 n_2. $$
Once the first two elements are chosen, we have $n_3$ choices for the third, and so on, giving
$$ |\Omega_1 \times \Omega_2 \times \cdots \times \Omega_N| = n_1 n_2 \cdots n_N. $$
We refer to this as the multiplication rule.

Example 2.1 (The number of subsets). Suppose a set $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ has $n$ elements. How many subsets does $\Omega$ have? We proceed as follows. To each subset $A$ of $\Omega$ we can associate a sequence of 0's and 1's of length $n$ so that the $i$-th number is 1 if $\omega_i$ is in $A$, and 0 otherwise. Thus if, say, $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$ then
$$ A_1 = \{\omega_1\} \mapsto 1, 0, 0, 0 $$
$$ A_2 = \{\omega_1, \omega_3, \omega_4\} \mapsto 1, 0, 1, 1 $$
$$ A_3 = \emptyset \mapsto 0, 0, 0, 0. $$
This defines a bijection between the subsets of $\Omega$ and the strings of 0's and 1's of length $n$. Thus we have to count the number of such strings. Since for each element we have 2 choices (either 0 or 1), there are $2^n$ strings. This shows that a set of $n$ elements has $2^n$ subsets. Note that this also counts the number of functions from a set of $n$ elements to $\{0, 1\}$.

2.2. Permutations. How many possible orderings of $n$ elements are there? Label the elements $\{1, 2, \ldots, n\}$. A permutation is a bijection from $\{1, 2, \ldots, n\}$ to itself, i.e. an ordering of the elements. We may obtain all permutations by successively choosing the image of element 1, then the image of element 2 and so on. We have $n$ choices for the image of 1, then $n-1$ choices for the image of 2, $n-2$ choices for the image of 3, until we have only one choice for the image of $n$. Thus the total number of choices is, by the multiplication rule,
$$ n! = n(n-1)(n-2) \cdots 1. $$
Thus there are $n!$ different orderings, or permutations, of $n$ elements. Equivalently, there are $n!$ different bijections between any two sets of $n$ elements.

Example 2.2. There are $52!$ possible orderings of a standard deck of cards.
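The counting rules of Sections 2.1 and 2.2 can be checked by brute-force enumeration on small sets. The sketch below is illustrative only: the example sets in `omega_sets`, the four-element set `omega` and the helper `subsets_via_bitstrings` are assumptions made for the demonstration, not objects from the notes.

```python
from itertools import permutations, product
from math import factorial, prod

# Multiplication rule: |Omega_1 x Omega_2 x Omega_3| = n_1 * n_2 * n_3.
omega_sets = [{"a", "b"}, {1, 2, 3}, {"x", "y"}]  # hypothetical example sets
all_choices = list(product(*omega_sets))
expected_choices = prod(len(s) for s in omega_sets)


def subsets_via_bitstrings(elements):
    """List all subsets using the 0/1-string bijection of Example 2.1."""
    result = []
    for bits in product((0, 1), repeat=len(elements)):
        # The i-th bit is 1 exactly when the i-th element belongs to the subset.
        result.append({e for e, bit in zip(elements, bits) if bit == 1})
    return result


omega = ["w1", "w2", "w3", "w4"]
all_subsets = subsets_via_bitstrings(omega)

# Permutations of {1, ..., 5}: there should be 5! of them.
all_orderings = list(permutations(range(1, 6)))

print(len(all_choices), expected_choices)
print(len(all_subsets), 2 ** len(omega))
print(len(all_orderings), factorial(5))
```

Enumeration is only feasible for small sets, which is exactly why the closed-form counts are useful.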

2.3. Subsets. How many ways are there to choose $k$ elements from a set of $n$ elements?

2.3.1. With ordering. We have $n$ choices for the first element, $n-1$ choices for the second element and so on, ending with $n-k+1$ choices for the $k$-th element. Thus there are
$$ (2.1) \qquad n(n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!} $$
ways to choose $k$ ordered elements from $n$. An alternative way to obtain the above formula is the following: to pick $k$ ordered elements from $n$, first pick a permutation of the $n$ elements ($n!$ choices), then forget all elements but the first $k$. Since for each choice of the first $k$ elements there are $(n-k)!$ permutations starting with those $k$ elements, we again obtain (2.1).

2.3.2. Without ordering. To choose $k$ unordered elements from $n$, we could first choose $k$ ordered elements, and then forget about the order. Recall that there are $n!/(n-k)!$ possible ways to choose $k$ ordered elements from $n$. Moreover, any given $k$ elements can be ordered in $k!$ possible ways. Thus there are
$$ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} $$
possible ways to choose $k$ unordered elements from $n$. More generally, suppose we have non-negative integers $n_1, n_2, \ldots, n_k$ with $n_1 + n_2 + \cdots + n_k = n$. Then there are
$$ \binom{n}{n_1 \ldots n_k} = \frac{n!}{n_1! \cdots n_k!} $$
possible ways to partition $n$ elements into $k$ subsets of cardinalities $n_1, \ldots, n_k$.

2.4. Subsets with repetitions. How many ways are there to choose $k$ elements from a set of $n$ elements, allowing repetitions?

2.4.1. With ordering. We have $n$ choices for the first element, $n$ choices for the second element and so on. Thus there are
$$ n^k = n \times n \times \cdots \times n $$
possible ways to choose $k$ ordered elements from $n$, allowing repetitions.

2.4.2. Without ordering. Suppose we want to choose $k$ elements from $n$, allowing repetitions but discarding the order. How many ways do we have to do so? Note that naïvely dividing $n^k$ by $k!$ doesn't give the right answer, since there may be repetitions. Instead, we count as follows. Label the $n$ elements $\{1, 2, \ldots, n\}$, and for each element draw a * each time it is picked.

    1  | 2 | 3 | ... | n
    ** | * |   | ... | ***

Note that there are $k$ *'s and $n-1$ vertical lines. Now delete the numbers:

    (2.2)    ** | * |   | ... | ***

The above diagram uniquely identifies an unordered set of (possibly repeated) $k$ elements. Thus we simply have to count how many such diagrams there are. The only restriction is that there must be $n-1$ vertical lines and $k$ *'s. Since there are $n+k-1$ locations, we can fix such a diagram by assigning the positions of the *'s, which can be done in
$$ \binom{n+k-1}{k} $$
ways. This therefore counts the number of ways to choose $k$ elements from $n$, allowing repetitions and without ordering.

Example 2.3 (Increasing and non-decreasing functions). An increasing function from $\{1, 2, \ldots, k\}$ to $\{1, 2, \ldots, n\}$ is uniquely determined by its range, which is a subset of $\{1, 2, \ldots, n\}$ of size $k$. Vice versa, each such subset determines a unique increasing function. This bijection tells us that there are $\binom{n}{k}$ increasing functions from $\{1, 2, \ldots, k\}$ to $\{1, 2, \ldots, n\}$, since this is the number of subsets of size $k$ of $\{1, 2, \ldots, n\}$.

How about non-decreasing functions? There is a bijection from the set of non-decreasing functions $f : \{1, 2, \ldots, k\} \to \{1, 2, \ldots, n\}$ to the set of increasing functions $g : \{1, 2, \ldots, k\} \to \{1, 2, \ldots, n+k-1\}$, given by
$$ g(i) = f(i) + i - 1 \qquad \text{for } 1 \le i \le k. $$
Hence the number of such non-decreasing functions is $\binom{n+k-1}{k}$.

Example 2.4 (Ordered partitions). An ordered partition of $k$ of size $n$ is a sequence $(k_1, k_2, \ldots, k_n)$ of non-negative integers such that $k_1 + \cdots + k_n = k$. How many ordered partitions of $k$ of size $n$ are there? We give a graphic representation of each such partition as follows: draw $k_1$ *'s followed by a vertical line, then $k_2$ *'s followed by another vertical line and so on, closing with $k_n$ *'s, e.g.

    (1, 0, 3, 2)  ->  * |  | *** | **

Note the analogy with (2.2). Now, since this determines a bijection, it again suffices to count the number of diagrams made of $k$ *'s and $n-1$ vertical lines, which is $\binom{n+k-1}{k}$.
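The count $\binom{n+k-1}{k}$ and the bijection of Example 2.3 can be verified by enumeration for small $n$ and $k$. In the minimal sketch below, the helper names `count_multisets` and `shift`, and the test values `n = 4`, `k = 3`, are illustrative assumptions.

```python
from itertools import combinations, combinations_with_replacement
from math import comb


def count_multisets(n: int, k: int) -> int:
    """Count unordered choices of k elements from {1, ..., n} with repetitions."""
    return sum(1 for _ in combinations_with_replacement(range(1, n + 1), k))


def shift(f):
    """Map a non-decreasing tuple (f(1), ..., f(k)) to g(i) = f(i) + i - 1.

    enumerate is 0-based, so the 0-based index equals i - 1.
    """
    return tuple(value + index for index, value in enumerate(f))


n, k = 4, 3
non_decreasing = list(combinations_with_replacement(range(1, n + 1), k))
increasing_targets = set(combinations(range(1, n + k), k))  # subsets of {1, ..., n+k-1}
images = {shift(f) for f in non_decreasing}

print(count_multisets(n, k), comb(n + k - 1, k))
print(images == increasing_targets)
```

Here each sorted tuple produced by `combinations_with_replacement` plays the role of a non-decreasing function, and each tuple produced by `combinations` plays the role of an increasing one.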

3. Stirling's formula

We have seen how factorials are ubiquitous in combinatorial analysis. It is therefore important to be able to provide asymptotics for $n!$ as $n$ becomes large, which is often the case of interest. Recall that we write $a_n \sim b_n$ to mean that $a_n / b_n \to 1$ as $n \to \infty$.

Theorem 3.1 (Stirling's formula). As $n \to \infty$ we have
$$ n! \sim \sqrt{2\pi}\, n^{n+1/2} e^{-n}. $$

Note that this implies that
$$ \log(n!) \sim \log\big( \sqrt{2\pi}\, n^{n+1/2} e^{-n} \big) $$
as $n \to \infty$. We now prove this weaker statement.

Set
$$ l_n = \log(n!) = \log 1 + \log 2 + \cdots + \log n. $$
Write $\lfloor x \rfloor$ for the integer part of $x$. Then
$$ \log \lfloor x \rfloor \le \log x \le \log \lfloor x + 1 \rfloor. $$
Integrate over the interval $[1, n]$ to get
$$ l_{n-1} \le \int_1^n \log x \, dx \le l_n, $$
from which
$$ \int_1^n \log x \, dx \le l_n \le \int_1^{n+1} \log x \, dx. $$
Integrating by parts we find
$$ \int_1^n \log x \, dx = n \log n - n + 1, $$
from which we deduce that
$$ n \log n - n + 1 \le l_n \le (n+1) \log(n+1) - n. $$
Since
$$ n \log n - n + 1 \sim n \log n, \qquad (n+1) \log(n+1) - n \sim n \log n, $$
dividing through by $n \log n$ and taking the limit as $n \to \infty$ we find that $l_n \sim n \log n$. Since
$$ \log\big( \sqrt{2\pi}\, n^{n+1/2} e^{-n} \big) \sim n \log n, $$
this concludes the proof.
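Both the bounds on $l_n$ derived above and the statement of Theorem 3.1 can be illustrated numerically. The sketch below works on the log scale, using the standard-library function `math.lgamma` (with `lgamma(n + 1)` equal to $\log(n!)$) to avoid overflow; the helper names `log_factorial`, `stirling_ratio` and `log_bounds` are illustrative and not from the notes.

```python
import math


def log_factorial(n: int) -> float:
    """l_n = log(n!), computed via the log-gamma function."""
    return math.lgamma(n + 1)


def stirling_ratio(n: int) -> float:
    """n! divided by sqrt(2*pi) * n^(n + 1/2) * e^(-n), computed on the log scale."""
    log_stirling = 0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n
    return math.exp(log_factorial(n) - log_stirling)


def log_bounds(n: int) -> tuple[float, float]:
    """Lower and upper bounds on l_n from the integral comparison."""
    lower = n * math.log(n) - n + 1
    upper = (n + 1) * math.log(n + 1) - n
    return lower, upper


for n in (10, 100, 1000):
    lower, upper = log_bounds(n)
    # Columns: n, ratio from Theorem 3.1, lower bound, l_n, upper bound.
    print(n, stirling_ratio(n), lower, log_factorial(n), upper)
```

The printed ratio approaches 1 as $n$ grows, while $l_n$ stays between the two bounds for every $n$ tried.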

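Proof of Stirling's formula (non-examinable). Note the identity
$$ \int_a^b f(x) \, dx = \frac{f(a) + f(b)}{2} (b - a) - \frac{1}{2} \int_a^b (x - a)(b - x) f''(x) \, dx, $$
which may be checked by integrating by parts twice the right hand side. Take $f(x) = \log x$ to find
$$ \int_k^{k+1} \log x \, dx = \frac{\log k + \log(k+1)}{2} + \frac{1}{2} \int_k^{k+1} (x - k)(k + 1 - x) \frac{1}{x^2} \, dx = \frac{\log k + \log(k+1)}{2} + \frac{1}{2} \int_0^1 \frac{x(1-x)}{(x+k)^2} \, dx $$
for all $k \ge 1$. Sum over $1 \le k \le n-1$ to get
$$ \int_1^n \log x \, dx = \frac{1}{2} \log((n-1)!) + \frac{1}{2} \log(n!) + \frac{1}{2} \sum_{k=1}^{n-1} \int_0^1 \frac{x(1-x)}{(x+k)^2} \, dx, $$
from which
$$ n \log n - n + 1 = \log(n!) - \frac{1}{2} \log n + \sum_{k=1}^{n-1} a_k, \qquad \text{where } a_k = \frac{1}{2} \int_0^1 \frac{x(1-x)}{(x+k)^2} \, dx. $$
Rearranging for $\log(n!)$ we find
$$ \log(n!) = \Big( n + \frac{1}{2} \Big) \log n - n + 1 - \sum_{k=1}^{n-1} a_k. $$
Note that since
$$ a_k \le \frac{1}{2k^2} \int_0^1 x(1-x) \, dx = \frac{1}{12 k^2}, $$
we have $\sum_{k \ge 1} a_k < \infty$. We therefore define
$$ A = \exp\Big( 1 - \sum_{k=1}^{\infty} a_k \Big) $$
and exponentiate both sides to find
$$ n! = A\, n^{n+1/2} e^{-n} \exp\Big( \sum_{k=n}^{\infty} a_k \Big). $$
Since
$$ \sum_{k=n}^{\infty} a_k \to 0 $$
as $n \to \infty$, this gives
$$ n! \sim A\, n^{n+1/2} e^{-n}. $$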

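It only remains to show that $A = \sqrt{2\pi}$. Using the above asymptotic, we find
$$ 2^{-2n} \binom{2n}{n} \sim \frac{\sqrt{2}}{A \sqrt{n}}, $$
so it suffices to show that
$$ 2^{-2n} \binom{2n}{n} \sim \frac{1}{\sqrt{\pi n}}. $$
To see this, set
$$ I_n = \int_0^{\pi/2} \cos^n \theta \, d\theta. $$
Then $I_0 = \pi/2$, $I_1 = 1$ and integrating by parts we obtain
$$ I_n = \frac{n-1}{n} I_{n-2} $$
for all $n \ge 2$. Thus
$$ I_{2n} = \frac{2n-1}{2n} \cdots \frac{3}{4} \cdot \frac{1}{2} I_0 = 2^{-2n} \binom{2n}{n} \frac{\pi}{2}, $$
$$ I_{2n+1} = \frac{2n}{2n+1} \cdots \frac{4}{5} \cdot \frac{2}{3} I_1 = \Big( 2^{-2n} \binom{2n}{n} \Big)^{-1} \frac{1}{2n+1}. $$
But $(I_n)_{n \ge 1}$ is decreasing in $n$, and
$$ \frac{I_n}{I_{n-2}} = \frac{n-1}{n} \to 1, \qquad \text{so also} \qquad \frac{I_{2n}}{I_{2n+1}} \to 1, $$
and we conclude that
$$ \Big( 2^{-2n} \binom{2n}{n} \Big)^2 \sim \frac{2}{(2n+1)\pi} \sim \frac{1}{n\pi}, $$
as wanted.

Example 3.2. Suppose we have $4n$ balls, of which $2n$ are red and $2n$ are black. We put them randomly into two urns, so that each urn contains $2n$ balls. What is the probability that each urn contains exactly $n$ red balls and $n$ black balls? Call this probability $p_n$. Note that there are $\binom{4n}{2n}$ ways to distribute the balls into the two urns, $\binom{2n}{n} \binom{2n}{n}$ of which will result in each urn containing exactly $n$ balls of each color. Thus
$$ p_n = \frac{\binom{2n}{n} \binom{2n}{n}}{\binom{4n}{2n}} = \Big( \frac{(2n)!}{n!} \Big)^4 \frac{1}{(4n)!} \sim \Big( \frac{\sqrt{2\pi}\, e^{-2n} (2n)^{2n+1/2}}{\sqrt{2\pi}\, e^{-n} n^{n+1/2}} \Big)^4 \frac{1}{\sqrt{2\pi}\, e^{-4n} (4n)^{4n+1/2}} = \sqrt{\frac{2}{\pi n}}. $$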
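The asymptotic in Example 3.2 can be compared against the exact value of $p_n$ for moderate $n$. The following minimal sketch uses exact binomial coefficients from `math.comb`; the function names `p_exact` and `p_asymptotic` are illustrative assumptions.

```python
from math import comb, pi, sqrt


def p_exact(n: int) -> float:
    """Exact p_n = C(2n, n)^2 / C(4n, 2n), from integer binomial coefficients."""
    return comb(2 * n, n) ** 2 / comb(4 * n, 2 * n)


def p_asymptotic(n: int) -> float:
    """Stirling approximation sqrt(2 / (pi * n)) of p_n."""
    return sqrt(2 / (pi * n))


for n in (1, 10, 100, 1000):
    # Columns: n, exact probability, approximation, ratio of the two.
    print(n, p_exact(n), p_asymptotic(n), p_exact(n) / p_asymptotic(n))
```

The ratio in the last column tends to 1, which is precisely the meaning of $p_n \sim \sqrt{2/(\pi n)}$; for small $n$ the approximation is visibly rough.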

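4. Properties of Probability measures

Let $(\Omega, \mathcal{F}, P)$ be a probability space, and recall that $P : \mathcal{F} \to [0, 1]$ has the property that $P(\Omega) = 1$ and for any sequence $(A_n)_{n \ge 1}$ of disjoint events in $\mathcal{F}$ it holds
$$ P\Big( \bigcup_{n=1}^{\infty} A_n \Big) = \sum_{n=1}^{\infty} P(A_n). $$
This holds in particular for any finite collection $(A_n)_{n=1}^{N}$ of disjoint events (simply make into an infinite sequence by setting $A_n = \emptyset$ for all $n \ge N+1$). Then we have the following:
- $0 \le P(A) \le 1$ for all $A \in \mathcal{F}$,
- if $A \cap B = \emptyset$ then $P(A \cup B) = P(A) + P(B)$,
- $P(A^c) = 1 - P(A)$, since $P(A) + P(A^c) = P(\Omega) = 1$,
- $P(\emptyset) = 0$,
- if $A \subseteq B$ then $P(A) \le P(B)$, since $P(B) = P(A \cup (B \setminus A)) = P(A) + P(B \setminus A) \ge P(A)$,
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$,
- $P(A \cup B) \le P(A) + P(B)$.

4.1. Countable subadditivity. For any sequence of events $(A_n)_{n \ge 1}$ we have
$$ (4.1) \qquad P\Big( \bigcup_{n=1}^{\infty} A_n \Big) \le \sum_{n=1}^{\infty} P(A_n). $$
To see this, define the events
$$ B_1 = A_1, \quad B_2 = A_2 \setminus A_1, \quad B_3 = A_3 \setminus (A_1 \cup A_2), \quad \ldots, \quad B_n = A_n \setminus (A_1 \cup \cdots \cup A_{n-1}), \quad \ldots $$
Then the $B_k$'s are disjoint, and $P(B_k) \le P(A_k)$ for all $k = 1, \ldots, n$, from which
$$ P\Big( \bigcup_{k=1}^{n} A_k \Big) = P\Big( \bigcup_{k=1}^{n} B_k \Big) = \sum_{k=1}^{n} P(B_k) \le \sum_{k=1}^{n} P(A_k). $$
The same proof shows that this also holds for infinite sequences of events $(A_n)_{n \ge 1}$:
$$ P\Big( \bigcup_{n=1}^{\infty} A_n \Big) \le \sum_{n=1}^{\infty} P(A_n). $$
The above inequalities show that probability measures are subadditive, and are often referred to as Boole's inequality.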
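The construction of the disjoint events $B_k$ used in the proof of Boole's inequality can be traced on a small finite example with equally likely outcomes. In the sketch below, the sample space `omega`, the list `events` and the helper names `prob` and `disjointify` are hypothetical choices made for illustration.

```python
from fractions import Fraction

omega = set(range(1, 13))  # hypothetical sample space {1, ..., 12}


def prob(event: set) -> Fraction:
    """P(A) = |A| / |Omega| for equally likely outcomes."""
    return Fraction(len(event), len(omega))


def disjointify(events: list) -> list:
    """Return B_k = A_k minus (A_1 u ... u A_{k-1}) for each k."""
    seen = set()
    blocks = []
    for a in events:
        blocks.append(a - seen)  # remove everything already covered
        seen |= a
    return blocks


events = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6, 7, 8}, {1, 8, 9}]
blocks = disjointify(events)
union = set().union(*events)

p_union = prob(union)
sum_blocks = sum(prob(b) for b in blocks)
sum_events = sum(prob(a) for a in events)
print(p_union, sum_blocks, sum_events)
```

The first two printed values coincide because the $B_k$ are disjoint with the same union, and the third is at least as large because the original events overlap.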

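4.2. Continuity of probability measures. For a non-decreasing sequence of events $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$ we have
$$ P\Big( \bigcup_{n=1}^{\infty} A_n \Big) = \lim_{n \to \infty} P(A_n). $$
To see this, define a new sequence $(B_n)_{n \ge 1}$ by
$$ B_1 = A_1, \quad B_2 = A_2 \setminus A_1, \quad B_3 = A_3 \setminus (A_1 \cup A_2), \quad \ldots, \quad B_n = A_n \setminus (A_1 \cup \cdots \cup A_{n-1}), \quad \ldots $$
Then the $B_n$'s are disjoint, and so
$$ P(A_n) = P(B_1 \cup \cdots \cup B_n) = \sum_{k=1}^{n} P(B_k). $$
Taking the limit as $n \to \infty$ on both sides, we get
$$ \lim_{n \to \infty} P(A_n) = \sum_{n=1}^{\infty} P(B_n) = P\Big( \bigcup_{n=1}^{\infty} B_n \Big) = P\Big( \bigcup_{n=1}^{\infty} A_n \Big), $$
as wanted. Similarly, for a non-increasing sequence of events $B_1 \supseteq B_2 \supseteq B_3 \supseteq \cdots$ we have
$$ P\Big( \bigcap_{n=1}^{\infty} B_n \Big) = \lim_{n \to \infty} P(B_n). $$

4.3. Inclusion-exclusion formula. We have already seen that for any two events $A, B$
$$ P(A \cup B) = P(A) + P(B) - P(A \cap B). $$
How does this generalise to an arbitrary number of events? For 3 events we have
$$ P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C). $$

[Figure: Venn diagram of the three events A, B, C.]

In general, for $n$ events $A_1, A_2, \ldots, A_n$ we have
$$ P\Big( \bigcup_{i=1}^{n} A_i \Big) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \cdots < i_k \le n} P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}). $$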
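The general formula can be evaluated directly for events in a finite sample space with equally likely outcomes and compared with the probability of the union computed by hand. The sketch below is illustrative: the sample space $\{1, \ldots, 30\}$, the three events (multiples of 2, 3 and 5) and the function name `inclusion_exclusion` are assumptions chosen for the demonstration.

```python
from fractions import Fraction
from itertools import combinations


def inclusion_exclusion(events: list, omega: set) -> Fraction:
    """P(A_1 u ... u A_n) via the inclusion-exclusion formula, uniform P."""
    total = Fraction(0)
    n = len(events)
    for k in range(1, n + 1):
        sign = (-1) ** (k + 1)
        # Sum over all index sets i_1 < ... < i_k.
        for chosen in combinations(events, k):
            intersection = set.intersection(*chosen)
            total += sign * Fraction(len(intersection), len(omega))
    return total


omega = set(range(1, 31))
events = [
    {x for x in omega if x % 2 == 0},  # multiples of 2
    {x for x in omega if x % 3 == 0},  # multiples of 3
    {x for x in omega if x % 5 == 0},  # multiples of 5
]

via_formula = inclusion_exclusion(events, omega)
direct = Fraction(len(set().union(*events)), len(omega))
print(via_formula, direct)
```

The number of terms grows like $2^n$, so this direct evaluation is only practical for a handful of events.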

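This can be proved by induction. The case $n = 2$ is clear. Now assume the formula holds true for $n - 1$. Then
$$ P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1 \cup \cdots \cup A_{n-1}) + P(A_n) - P(I_1 \cup I_2 \cup \cdots \cup I_{n-1}), $$
where $I_k = A_k \cap A_n$ for $1 \le k \le n-1$. Now use the inductive hypothesis on the first and third terms of the right hand side, and rearrange to conclude. We omit the details.

Note that the formula can be written more explicitly as
$$ P\Big( \bigcup_{i=1}^{n} A_i \Big) = \sum_{k=1}^{n} P(A_k) - \sum_{1 \le i_1 < i_2 \le n} P(A_{i_1} \cap A_{i_2}) + \sum_{1 \le i_1 < i_2 < i_3 \le n} P(A_{i_1} \cap A_{i_2} \cap A_{i_3}) - \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n). $$
For the case of equally likely outcomes, we can deduce the following inclusion-exclusion formula for the cardinality of unions of sets:
$$ |A_1 \cup A_2 \cup \cdots \cup A_n| = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \cdots < i_k \le n} |A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}|. $$

4.4. Applications. We present below an application of the inclusion-exclusion formula to illustrate its power.

4.4.1. Derangements. A permutation of $\{1, 2, \ldots, n\}$ is called a derangement if it has no fixed points. What is the probability that a uniformly chosen permutation is a derangement? Let $\Omega$ denote the set of all permutations of $\{1, 2, \ldots, n\}$, so that $|\Omega| = n!$. For $1 \le i \le n$ denote by
$$ A_i = \{\omega \in \Omega : \omega(i) = i\} $$
the set of all permutations having $i$ as a fixed point. Then
$$ P(\{\omega \in \Omega \text{ is a derangement}\}) = P\big( (A_1 \cup A_2 \cup \cdots \cup A_n)^c \big) = 1 - \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \cdots < i_k \le n} \frac{|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}|}{|\Omega|}. $$
Now, since $|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}|$ counts the number of permutations fixing $i_1, i_2, \ldots, i_k$, we have
$$ |A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}| = (n-k)! $$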

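for all $k \le n$. Thus
$$ P(\{\omega \in \Omega \text{ is a derangement}\}) = 1 - \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \cdots < i_k \le n} \frac{(n-k)!}{n!} = 1 - \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} \frac{(n-k)!}{n!} = 1 - \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k!} = \sum_{k=0}^{n} \frac{(-1)^k}{k!}. $$
Note that
$$ P(\{\omega \in \Omega \text{ is a derangement}\}) \to e^{-1} = 0.3678\ldots $$
as $n \to \infty$.

Statslab, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK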
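The derangement probability obtained in Section 4.4.1 can be cross-checked by brute-force enumeration for small $n$, and its convergence to $e^{-1}$ observed numerically. The following is a minimal sketch; the function names `derangement_probability` and `derangement_probability_brute_force` are illustrative and not from the notes.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def derangement_probability(n: int) -> Fraction:
    """Sum over k = 0..n of (-1)^k / k!, from inclusion-exclusion."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))


def derangement_probability_brute_force(n: int) -> Fraction:
    """Fraction of permutations of {0, ..., n-1} with no fixed point."""
    perms = list(permutations(range(n)))
    derangements = sum(
        1 for p in perms if all(p[i] != i for i in range(n))
    )
    return Fraction(derangements, len(perms))


for n in range(1, 8):
    # The two columns after n should agree exactly.
    print(n, derangement_probability(n), derangement_probability_brute_force(n))

print(float(derangement_probability(10)), exp(-1))
```

The brute-force count enumerates all $n!$ permutations, so it is only meant as a check for small $n$; the formula itself is cheap for any $n$.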