Sections from Text and MIT Video Lecture: Sections 3.3, 3.4, 3.5
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-2-conditioning-and-bayes-rule/
Topics from Syllabus: Independence, Total Probability Theorem (Bayes Rule)

Independence

Two events are said to be independent when conditioning provides no information about probabilities. That is, A is said to be independent of B if

    P(A | B) = P(A)

Show that if A and B are independent, written A ⊥ B, then P(A ∩ B) = P(A) P(B).

OK, trivial. We can say

    P(A | B) = P(A ∩ B) / P(B)   ⟹   P(A ∩ B) = P(A | B) P(B) = P(A) P(B)

This is the famous multiplication rule for independent events. Note that the above is a little sloppy and that some care needs to be taken when P(B) = 0. Also, saying P(B) = 0 is not the same thing as saying that B = ∅! Why not?!? (Think about the case of picking a random number from the interval (0,1).)

SUNY POLY Page 1
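The multiplication rule is easy to sanity-check by enumeration on a small equiprobable sample space. Here is a minimal Python sketch (mine, not part of the notes; the events A and B are illustrative choices):

```python
from fractions import Fraction

# Equiprobable sample space: two flips of a fair coin
S = ["HH", "HT", "TH", "TT"]

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == "H"}   # first flip is heads
B = {s for s in S if s[1] == "H"}   # second flip is heads

# Multiplication rule for independent events: 1/4 = (1/2)(1/2)
assert prob(A & B) == prob(A) * prob(B)
```

Working with `Fraction` keeps the arithmetic exact, so the equality check is not subject to floating-point rounding.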
Show that if A is independent of B, then B is independent of A, so we may simply speak of the independence of two events.

This is again trivial. Assuming P(A | B) = P(A) and A ⊥ B, we can say

    P(B | A) = P(A ∩ B) / P(A) = P(A) P(B) / P(A) = P(B)

When there are more than two events under consideration we need to make a distinction between pairwise independence (weaker) and mutual independence (stronger). Define a collection of events {A_1, A_2, …, A_n} to be mutually independent if, for any sub-collection of these events A_i1, A_i2, …, A_ik, we have

    P(A_i1 ∩ A_i2 ∩ ⋯ ∩ A_ik) = P(A_i1) P(A_i2) ⋯ P(A_ik)

That is, grab them several at a time and the multiplication rule always still holds. More succinctly, a family {A_i : i ∈ I} is said to be independent if, for any finite index set J ⊆ I,

    P( ⋂_{i ∈ J} A_i ) = ∏_{i ∈ J} P(A_i)
Note that pairwise independence does not guarantee mutual independence! Here's a quick counterexample from (Grimmett & Stirzaker, 1992) (paraphrasing):

Let S = {aaa, bbb, ccc, abc, acb, cab, cba, bca, bac} be an equiprobable sample space. We will select an outcome at random, and define the event E_k to be the event that the k-th letter is a. Show that the family {E_1, E_2, E_3} is pairwise independent but not mutually independent.

Let's write out the three events.

    E_1 (first letter is a)  = {aaa, abc, acb},   P(E_1) = 3/9 = 1/3
    E_2 (second letter is a) = {aaa, cab, bac},   P(E_2) = 3/9 = 1/3
    E_3 (third letter is a)  = {aaa, cba, bca},   P(E_3) = 3/9 = 1/3

Now show pairwise independence:

    E_1 ⊥ E_2 since P(E_1 ∩ E_2) = 1/9 = (1/3)(1/3) = P(E_1) P(E_2)
    E_1 ⊥ E_3 since P(E_1 ∩ E_3) = 1/9 = (1/3)(1/3) = P(E_1) P(E_3)
    E_2 ⊥ E_3 since P(E_2 ∩ E_3) = 1/9 = (1/3)(1/3) = P(E_2) P(E_3)

But we do not have mutual independence, since

    P(E_1 ∩ E_2 ∩ E_3) = 1/9 ≠ (1/3)(1/3)(1/3) = 1/27

Toss a fair coin 6 times. What is the probability of obtaining exactly 3 heads?

Work with the classical notion of probability if you like. Write down all 64 possible sequences of H and T of length 6 and count how many have exactly 3 heads. You should find 20 of them. Or, if you have done combinations and permutations, just take C(6,3) = 20 to get 20/64. Independence is applied when we argue that the probability of any given sequence is

    (1/2)(1/2)(1/2)(1/2)(1/2)(1/2) = 1/64
There are 32 people in a room. What is the probability that at least two of them have the same birthday?

Tricky! We need to guard against exactly two having the same birthday, a few pairs having the same birthday but also some triples, etc. It is a very messy sample space. When a sample space is messy, see if the complement is easier to work with. The opposite of "some" is "none", so find the probability that no two individuals have the same birthday.

We work sequentially and use the multiplication rule for mutually independent events. The probability that the first person you call on does not share a birthday with any of the previous individuals is 365/365 (no leap years! This is just a simple example!). Now, when you ask the second person, there are 364 days left open, so the likelihood they have a unique birthday is 364/365. We are assuming independence: no twins, etc. Thought of slightly differently, we continue in this fashion to see that the number of ways 32 people can have unique birthdays, divided by the number of ways we can select a day of the year for each of 32 people, is

    (365/365)(364/365)(363/365) ⋯ ((365 − 31)/365)

If you like your calculator you can get busy! If you have R available you could calculate this as

    numerator = seq(365, 365 - 31)       # arithmetic sequence from 365 down to 334
    denominator = rep(365, times = 32)   # a vector of 32 entries, all equal to 365
    1 - prod(numerator / denominator)    # product of our fractions, then the complement

You should get 0.7533475, a number most people find surprisingly high.
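If R is not handy, the same complement calculation can be sketched in Python (an equivalent of the R snippet above, not from the notes):

```python
from math import prod

n = 32  # number of people in the room

# Probability that all n birthdays are distinct (ignoring leap years)
p_all_distinct = prod((365 - k) / 365 for k in range(n))

# Complement: probability that at least two people share a birthday
p_shared = 1 - p_all_distinct
print(round(p_shared, 7))  # about 0.7533475, matching the R result
```

The generator expression walks the factors 365/365, 364/365, …, 334/365, exactly mirroring the sequential argument in the text.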
Total Probability Theorem and Bayes' Rule

We often search for the cause of an event. As a naively simple first example, consider a group of people described in terms of gender (Male or Female) and beverage choice (Coffee, Tea, or Soda). Suppose you survey and obtain a breakdown as follows.

                      Drink Choice
                Coffee, A_1   Tea, A_2   Soda, A_3   Total
    Male             34          26         15         75
    Female           30          10          8         48
    Total            64          36         23        123

We see that Drink Choice forms a partition of the sample space. Formally, a partition of a sample space S is a collection of mutually disjoint (non-overlapping) sets which, taken together, comprise the entire sample space. That is,

    A_i ∩ A_j = ∅ for i ≠ j,   and   ⋃_i A_i = S

We will randomly select an individual from this group and denote the event that they are female as Event B. Drink choice will be denoted as A_1 for Coffee, A_2 for Tea, and A_3 for Soda. It is easy for us to see that

    P(Female) = P(B) = 48/123

It is also easy to see that (according to Kolmogorov's third axiom)

    P(Female) = P(B ∩ A_1) + P(B ∩ A_2) + P(B ∩ A_3) = (30 + 10 + 8)/123 = 48/123

This is really just arithmetic. If we write this out symbolically, we can say that, if
A_1, A_2, …, A_n form a partition of a sample space, and if B is any event, then

    P(B) = P(B ∩ A_1) + P(B ∩ A_2) + ⋯ + P(B ∩ A_n)

We may as well go a little further and note that, since P(B ∩ A_i) = P(B | A_i) P(A_i), we can write

    P(B) = P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + ⋯ + P(B | A_n) P(A_n)

This is simple, but maybe a little unmotivated. Let's check our arithmetic, then look at a real problem. We have

    P(B | A_1) P(A_1) = P(Female | Coffee) P(Coffee) = (30/64)(64/123) = 30/123
    P(B | A_2) P(A_2) = P(Female | Tea) P(Tea)       = (10/36)(36/123) = 10/123
    P(B | A_3) P(A_3) = P(Female | Soda) P(Soda)     = (8/23)(23/123)  =  8/123

This checks out, since

    P(Female) = 30/123 + 10/123 + 8/123 = 48/123

Why would anyone care?

Bayes' Rule

We can turn things around a little and ask, "If we know someone is female, what is the likelihood they drink coffee?" More particularly, can we do this just knowing how gender plays out in all of the various drinks? Think about it like this. We would like

    P(Coffee | Female) = P(A_1 | B)

We can work this out as
    P(A_1 | B) = P(A_1 ∩ B) / P(B) = P(B | A_1) P(A_1) / P(B)

So, expanding P(B) with the total probability theorem,

    P(Coffee | Female) = P(B | A_1) P(A_1) / [ P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + P(B | A_3) P(A_3) ]
                       = (30/64)(64/123) / [ (30/64)(64/123) + (10/36)(36/123) + (8/23)(23/123) ]
                       = (30/123) / (48/123) = 30/48

More generally, we have that

    P(A_i | B) = P(B | A_i) P(A_i) / Σ_{j=1}^{n} P(B | A_j) P(A_j)

Key Example: Suppose you are administering a test with specificity 98% and sensitivity 90%. You already know from other sources that the prevalence of steroid use among student athletes is 1%. You test an athlete and find that the test indicates steroid use. What is the probability that the athlete has, in fact, used steroids?

We know

    P(Test Positive | Has Condition) = 0.90          (sensitivity)
    P(Test Negative | Does Not Have Condition) = 0.98  (specificity)
    P(Condition) = 0.01                              (prevalence)

We want

    P(Has Condition | Tests Positive)

Can we set this up to use Bayes' Rule?
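Before working the steroid example through by hand, note that all of this machinery, the total probability check on the survey table, the Bayes computation of P(Coffee | Female), and the posterior we are after, can be verified with exact arithmetic. A Python sketch (my own; the variable names are illustrative):

```python
from fractions import Fraction

# Survey table: counts by (gender, drink)
counts = {
    ("Male", "Coffee"): 34, ("Male", "Tea"): 26, ("Male", "Soda"): 15,
    ("Female", "Coffee"): 30, ("Female", "Tea"): 10, ("Female", "Soda"): 8,
}
total = sum(counts.values())   # 123 people surveyed
drinks = ["Coffee", "Tea", "Soda"]

# P(A_i): marginal probability of each drink; P(B | A_i): P(Female | drink)
P_A = {d: Fraction(counts[("Male", d)] + counts[("Female", d)], total)
       for d in drinks}
P_B_given_A = {d: Fraction(counts[("Female", d)],
                           counts[("Male", d)] + counts[("Female", d)])
               for d in drinks}

# Total probability theorem: P(Female) = sum of P(Female | drink) P(drink)
P_B = sum(P_B_given_A[d] * P_A[d] for d in drinks)
assert P_B == Fraction(48, 123)

# Bayes' rule: P(Coffee | Female) = 30/48
assert P_B_given_A["Coffee"] * P_A["Coffee"] / P_B == Fraction(30, 48)

# Steroid-test example: sensitivity 0.90, specificity 0.98, prevalence 0.01
sens, spec, prev = Fraction(90, 100), Fraction(98, 100), Fraction(1, 100)
p_positive = sens * prev + (1 - spec) * (1 - prev)   # total probability
posterior = sens * prev / p_positive                 # Bayes' rule
assert posterior == Fraction(5, 16)                  # = 0.3125
```

Using `Fraction` throughout means the 64s, 36s, and 23s cancel exactly, just as they do in the hand calculation above.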
How can we apply the rule? Evidently we should work with

    P(A_i | B) = P(B | A_i) P(A_i) / Σ_{j=1}^{n} P(B | A_j) P(A_j)

We should therefore take

    A_1 = Has Condition = Cond
    A_2 = Does Not Have Condition = Cond^c
    B = Test Positive

Then A_1 and A_2 form a partition, and

    P(Cond | B) = P(B | Cond) P(Cond) / [ P(B | Cond) P(Cond) + P(B | Cond^c) P(Cond^c) ]

We know all of these numbers! Just substitute:

    P(Cond | B) = (0.9 × 0.01) / (0.9 × 0.01 + 0.02 × 0.99) = 0.3125

We have used that

    P(Cond^c) = 1 − P(Cond) = 0.99
    P(B | Cond^c) + P(B^c | Cond^c) = 1,  so  P(B | Cond^c) = 1 − 0.98 = 0.02

Same as last time!

Bibliography

Grimmett, G., & Stirzaker, D. (1992). Probability and Random Processes. Oxford University Press.