Law of Total Probability and Bayes' Rule

MATH 382, Dr. Neal, WKU

Law of Total Probability: Suppose events $A_1, A_2, \ldots, A_n$ form a partition of $\Omega$. That is, the events are mutually disjoint and their union is all of $\Omega$. Then for any other event $B$, we have
$$P(B) = P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_n)\,P(B \mid A_n).$$

[Figure: $\Omega$ partitioned into mutually disjoint events $A_1, A_2, \ldots, A_n$.]

Proof. Because the $A_i$ are all disjoint, the sets $\{B \cap A_i\}_{i=1}^{n}$ are all disjoint. Moreover, these sets union to $B$ because
$$B = B \cap \Omega = B \cap \Big(\bigcup_{i=1}^{n} A_i\Big) = \bigcup_{i=1}^{n} (B \cap A_i).$$
So by the additivity of $P$ and the Multiplication Principle, we have
$$P(B) = P\Big(\bigcup_{i=1}^{n} (B \cap A_i)\Big) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(A_i)\,P(B \mid A_i) = P(A_1)\,P(B \mid A_1) + \cdots + P(A_n)\,P(B \mid A_n).$$

Bayes' Rule (Reverse Conditionals): Given that event $B$ has occurred, the probability of a partition event $A_j$ is given by
$$P(A_j \mid B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(A_j)\,P(B \mid A_j)}{P(A_1)\,P(B \mid A_1) + \cdots + P(A_n)\,P(B \mid A_n)}.$$

Example 1. Among Kentucky registered voters, 30% are Republican, 50% are Democrat, and 20% are Other. The proportions that support the President among these groups are 0.10, 0.80, and 0.60, respectively.
(a) What percentage of KY registered voters support the President?
(b) If one supports the President, then what is the probability that one is a Democrat?
(c) If one does not support the President, then what is the probability that one is a Republican?

Solution. (a) Here the partition is $A_1 = R$, $A_2 = D$, and $A_3 = O$. Let $S$ denote those who support the President. Then, by the Law of Total Probability, we have
$$P(S) = P(R)\,P(S \mid R) + P(D)\,P(S \mid D) + P(O)\,P(S \mid O) = 0.30(0.10) + 0.50(0.80) + 0.20(0.60) = 0.55.$$
So 55% currently support the President.

(b) Applying Bayes' Rule, we obtain
$$P(D \mid S) = \frac{P(D)\,P(S \mid D)}{P(S)} = \frac{0.50(0.80)}{0.55} = \frac{8}{11} \approx 0.727.$$

(c) Using the facts that $P(S^c \mid R) = 1 - P(S \mid R) = 0.90$ and $P(S^c) = 1 - 0.55 = 0.45$, along with Bayes' Rule, we obtain
$$P(R \mid S^c) = \frac{P(R)\,P(S^c \mid R)}{P(S^c)} = \frac{0.30(0.90)}{0.45} = 0.60.$$

Example 2. Suppose 10% of major league ball players use enhancers such as steroids or Andro. If one uses, then a test will say YES 99% of the time. If one doesn't use, then a test will say NO 98% of the time.
(a) Find the probability that a randomly chosen major leaguer will test positive.
(b) If one tests positive, what are the probabilities that one does/doesn't use enhancers?
(c) If one tests negative, what are the probabilities that one does/doesn't use enhancers?

Solution. Let $\Omega$ be the sample space of major league ball players, and let $E$ be those who use an enhancer. Then $\Omega = E \cup E^c$ is a partition. Now let $Y$ denote a positive test and $N = Y^c$ a negative test. Then we are given
$$P(E) = 0.10, \quad P(E^c) = 0.90, \quad P(Y \mid E) = 0.99, \quad P(N \mid E^c) = 0.98.$$
From the given information, we also have $P(N \mid E) = 0.01$ and $P(Y \mid E^c) = 0.02$ (which is the probability of a false positive).

(a) We then have
$$P(Y) = P(E)\,P(Y \mid E) + P(E^c)\,P(Y \mid E^c) = 0.10(0.99) + 0.90(0.02) = 0.117.$$

(b) By Bayes' Rule, we have
$$P(E \mid Y) = \frac{P(E)\,P(Y \mid E)}{P(Y)} = \frac{0.10(0.99)}{0.117} \approx 0.84615$$
and $P(E^c \mid Y) = 1 - P(E \mid Y) \approx 0.15385$ (or $P(E^c \mid Y) = 0.90(0.02)/0.117$).
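Example 1 and parts (a) and (b) of Example 2 all follow the same two steps: compute $P(B)$ by the Law of Total Probability, then divide one of its terms by $P(B)$ for Bayes' Rule. The following is a minimal Python sketch that uses exact fractions so the results can be compared with the hand computations; the helper names `total_probability` and `bayes` are hypothetical, introduced only for illustration.

```python
from fractions import Fraction as F

def total_probability(priors, likelihoods):
    """P(B) = sum of P(A_i) * P(B | A_i) over a partition A_1, ..., A_n."""
    assert sum(priors) == 1, "priors must describe a partition"
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes(priors, likelihoods, j):
    """P(A_j | B) = P(A_j) * P(B | A_j) / P(B)."""
    return priors[j] * likelihoods[j] / total_probability(priors, likelihoods)

# Example 1: partition R, D, O; the event is S (supports the President).
voters = [F(3, 10), F(5, 10), F(2, 10)]
support = [F(1, 10), F(8, 10), F(6, 10)]
p_S = total_probability(voters, support)      # 11/20 = 0.55
p_D_given_S = bayes(voters, support, 1)       # 8/11
oppose = [1 - s for s in support]             # P(S^c | A_i)
p_R_given_Sc = bayes(voters, oppose, 0)       # 3/5

# Example 2: partition E, E^c; the event is Y (positive test).
users = [F(1, 10), F(9, 10)]
positive = [F(99, 100), F(2, 100)]
p_Y = total_probability(users, positive)      # 117/1000
p_E_given_Y = bayes(users, positive, 0)       # 11/13, about 0.84615

print(p_S, p_D_given_S, p_R_given_Sc)
print(p_Y, p_E_given_Y, float(p_E_given_Y))
```

Passing the complementary likelihoods $P(S^c \mid A_i) = 1 - P(S \mid A_i)$, as in `oppose`, handles conditioning on $S^c$ without a separate formula.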

(c) Since $P(N) = P(Y^c) = 1 - 0.117 = 0.883$, we have
$$P(E \mid N) = \frac{P(E \cap N)}{P(N)} = \frac{P(E)\,P(N \mid E)}{P(Y^c)} = \frac{0.1(0.01)}{0.883} \approx 0.0011325$$
and
$$P(E^c \mid N) = \frac{P(E^c)\,P(N \mid E^c)}{P(N)} = \frac{0.9(0.98)}{0.883} \approx 0.9988675 \quad (\text{or use } 1 - P(E \mid N)).$$

So what happens if somebody tests positive? The sample is independently tested again by other labs. If you are clean, the chance of testing positive three times in a row is $0.02^3 = 0.000008$ (almost impossible, but not quite). Also note that about 11 in 10,000 negative tests are wrong, since $P(E \mid N) \approx 0.0011$. That is why athletes are tested over and over throughout the season: a user will eventually be caught.

The next result is a variation of the Multiplication Principle:

Theorem. $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid A \cap C)$.

Proof. Applying the definition of conditional probability, we obtain
$$P(A \mid C)\,P(B \mid A \cap C) = \frac{P(A \cap C)}{P(C)} \cdot \frac{P(B \cap (A \cap C))}{P(A \cap C)} = \frac{P((A \cap B) \cap C)}{P(C)} = P(A \cap B \mid C).$$

Example 3. Draw two cards from a deck without replacement.
(a) What is the probability that the second is a King?
(b) Given that there was a King on the 2nd, what is the probability that there was a King on the 1st?
(c) What is the probability that the first is a face card and the second is a King?

Solution. (a) The result depends on the 1st draw. A partition of the 1st draw is $K_1$ = King on 1st, $K_1^c$ = no King on 1st. Let $K_2$ = King on 2nd. Then
$$P(K_2) = P(K_1)\,P(K_2 \mid K_1) + P(K_1^c)\,P(K_2 \mid K_1^c) = \frac{4}{52}\cdot\frac{3}{51} + \frac{48}{52}\cdot\frac{4}{51} = \frac{4(3 + 48)}{52 \cdot 51} = \frac{4}{52} = \frac{1}{13}.$$

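(b) By Bayes' Rule, we have
$$P(K_1 \mid K_2) = \frac{P(K_1)\,P(K_2 \mid K_1)}{P(K_2)} = \frac{(1/13)(3/51)}{1/13} = \frac{3}{51} = \frac{1}{17}.$$

(c) Let $F_1$ = face card on 1st. Then we want $P(F_1 \cap K_2)$. But this value still depends on $K_1$. So, by the Law of Total Probability and the theorem above, we have
$$\begin{aligned}
P(F_1 \cap K_2) &= P(K_1)\,P(F_1 \cap K_2 \mid K_1) + P(K_1^c)\,P(F_1 \cap K_2 \mid K_1^c) \\
&= P(K_1)\,P(F_1 \mid K_1)\,P(K_2 \mid F_1 \cap K_1) + P(K_1^c)\,P(F_1 \mid K_1^c)\,P(K_2 \mid F_1 \cap K_1^c) \\
&= \frac{4}{52}\cdot 1 \cdot \frac{3}{51} + \frac{48}{52}\cdot\frac{8}{48}\cdot\frac{4}{51} = \frac{4}{52}\cdot\frac{3}{51} + \frac{8}{52}\cdot\frac{4}{51} = \frac{11}{663}.
\end{aligned}$$
This result also can be computed by
$$P(K_1)\,P(K_2 \mid K_1) + P(F_1 \cap K_1^c)\,P(K_2 \mid F_1 \cap K_1^c) = \frac{4}{52}\cdot\frac{3}{51} + \frac{8}{52}\cdot\frac{4}{51}.$$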
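Because every ordered pair of distinct cards is equally likely, all three answers in Example 3 can be double-checked by brute-force counting instead of conditioning. The following Python sketch enumerates the $52 \cdot 51$ ordered draws; the deck representation and the helper names `prob` and `king` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction
from itertools import permutations

# A 52-card deck as (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = [(r, s) for r in RANKS for s in SUITS]
FACE = {"J", "Q", "K"}

# All 52 * 51 equally likely ordered draws without replacement.
draws = list(permutations(DECK, 2))

def prob(event):
    """Exact probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for d in draws if event(*d)), len(draws))

def king(card):
    return card[0] == "K"

p_K2 = prob(lambda c1, c2: king(c2))                           # part (a)
p_K1_and_K2 = prob(lambda c1, c2: king(c1) and king(c2))
p_K1_given_K2 = p_K1_and_K2 / p_K2                              # part (b)
p_F1_and_K2 = prob(lambda c1, c2: c1[0] in FACE and king(c2))   # part (c)

print(p_K2, p_K1_given_K2, p_F1_and_K2)  # 1/13 1/17 11/663
```

Counting agrees with the conditioning arguments: 204 of the 2652 ordered draws have a King second, 12 have Kings in both positions, and 44 have a face card followed by a King.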

Exercises

1. Bowl A has 3 red chips and 5 blue chips. Bowl B has 5 red chips and 4 blue chips. A chip is chosen at random from Bowl A and placed in Bowl B. Then a chip is chosen at random from Bowl B.
(a) Compute the probability that the second chip chosen is red.
(b) Given that the second chip chosen was red, what is the probability that the first chosen chip was red?
(c) Given that the second chip chosen was blue, what is the probability that the first chosen chip was blue?

2. Bowl A has five $5 bills, four $10 bills, and three $20 bills. Bowl B has five $10 bills, four $20 bills, and three $50 bills. Bowl C has five $20 bills, four $50 bills, and three $100 bills. Roll two dice. If you roll a sum of 2, 3, 4, or 5, then you pick a bill at random from A. If you roll a sum of 6, 7, or 8, then you pick a bill at random from B. If you roll a sum of 9, 10, 11, or 12, then you pick a bill at random from C.
(a) What is the probability that you pick a $20 bill?
(b) If you pick a $20 bill, then what is the probability that you rolled a 6, 7, or 8?
(c) If you don't pick a $20 bill, then what is the probability that you did not roll a 2, 3, 4, or 5?
(d) If you pick a $5 bill, then what is the probability that you rolled a 2, 3, 4, or 5?
(e) If you don't pick a $100 bill, then what is the probability that you rolled a 9, 10, 11, or 12?

3. Draw two cards from a deck without replacement.
(a) What is the probability that the second is a Face Card (J, Q, K)?
(b) Given that there was a Face Card on the 2nd, what is the probability that there was a Face Card on the 1st?
(c) What is the probability that the first is a High Card (10, J, Q, K, A) and the second is a Face Card?
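A hand computation for Exercise 1 can be sanity-checked by simulation: repeat the two-stage experiment many times and read off relative frequencies, using conditional relative frequencies for parts (b) and (c). The following Python sketch prints estimates only; the function name `simulate_bowls`, the seed, and the number of trials are arbitrary illustrative choices, not part of the exercise.

```python
import random

def simulate_bowls(trials=200_000, seed=382):
    """Monte Carlo sketch of Exercise 1: move one random chip from Bowl A to
    Bowl B, then draw one random chip from Bowl B. Returns estimates of
    P(2nd red), P(1st red | 2nd red), and P(1st blue | 2nd blue)."""
    rng = random.Random(seed)
    bowl_a = ["R"] * 3 + ["B"] * 5
    bowl_b = ["R"] * 5 + ["B"] * 4
    counts = {("R", "R"): 0, ("R", "B"): 0, ("B", "R"): 0, ("B", "B"): 0}
    for _ in range(trials):
        first = rng.choice(bowl_a)              # chip moved from A to B
        second = rng.choice(bowl_b + [first])   # draw from the enlarged Bowl B
        counts[(first, second)] += 1
    second_red = counts[("R", "R")] + counts[("B", "R")]
    second_blue = trials - second_red
    return (second_red / trials,
            counts[("R", "R")] / second_red,
            counts[("B", "B")] / second_blue)

p_red2, p_red1_given_red2, p_blue1_given_blue2 = simulate_bowls()
print(round(p_red2, 3), round(p_red1_given_red2, 3), round(p_blue1_given_blue2, 3))
```

With 200,000 trials the estimates typically land within a few thousandths of the exact answers, so they can confirm a computation but cannot replace it.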

Addendum: Application to Confidential Random Sampling

Let $D$ be those students who use drugs. How do we approximate $P(D)$? We can take a random survey and simply look at the sample proportion. But suppose we are doing one-on-one interviews and we ask that question directly; we might not get truthful answers. Is there some way around this problem? Is there some way to ask the question so that the subject will feel more comfortable responding truthfully?

Create 100 flash cards, where 60 have the statement "I use drugs" ($C_1$) and 40 have the statement "I do not use drugs" ($C_2$). Show the cards to the subject so that he or she can see that they are assorted and the backs are not marked. Then have the subject pick a card at random without revealing the statement to the surveyor. The subject simply says True or False, and the surveyor does not even know which statement was read.

Now compute the sample proportion $p$ of those who say True. Then
$$\begin{aligned}
P(T) &= P(C_1)\,P(T \mid C_1) + P(C_2)\,P(T \mid C_2) = 0.6\,P(D \mid C_1) + 0.4\,P(D^c \mid C_2) \\
&= 0.6\,P(D) + 0.4\,P(D^c) \qquad \text{(independent, since the card is drawn at random)} \\
&= 0.6\,P(D) + 0.4\,(1 - P(D)) = 0.2\,P(D) + 0.4.
\end{aligned}$$
Hence $p \approx P(T) = 0.2\,P(D) + 0.4$, so
$$P(D) \approx \frac{p - 0.4}{0.2}.$$

For example, suppose 10% actually use drugs, and we survey 1000 people in this manner. The results should be close to the following:

            C_1    C_2    Total
  D          60     40      100
  D^c       540    360      900
  Total     600    400     1000

The number of True responses should be around 60 + 360 = 420, so $p$ should be close to 420/1000 = 0.42. Then we approximate $P(D)$ by
$$\frac{0.42 - 0.4}{0.2} = 0.10.$$
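The estimator $P(D) \approx (p - 0.4)/0.2$ can be tried out on simulated interviews. The following Python sketch assumes a true value $P(D) = 0.10$ (as in the example above) purely so the simulation has something to recover; `simulate_survey` and `estimate_drug_use` are hypothetical helper names, and the subject is modeled as always answering truthfully about the card drawn.

```python
import random

def estimate_drug_use(p_true):
    """Invert P(T) = 0.2 P(D) + 0.4 to estimate P(D) from the proportion of True answers."""
    return (p_true - 0.4) / 0.2

def simulate_survey(n, p_drug, seed=0):
    """Simulate n interviews with 60 'I use drugs' cards (C1) and 40
    'I do not use drugs' cards (C2). p_drug is an assumed true P(D).
    Returns the sample proportion of True responses."""
    rng = random.Random(seed)
    deck = ["C1"] * 60 + ["C2"] * 40
    true_count = 0
    for _ in range(n):
        uses = rng.random() < p_drug    # subject's actual status, never revealed
        card = rng.choice(deck)         # drawn at random, independent of status
        says_true = uses if card == "C1" else not uses
        true_count += int(says_true)
    return true_count / n

# Expected-count example from the text: p = 0.42 gives an estimate of about 0.10.
print(estimate_drug_use(0.42))

p_hat = simulate_survey(100_000, p_drug=0.10, seed=1)
print(round(p_hat, 3), round(estimate_drug_use(p_hat), 3))
```

Dividing by 0.2 magnifies the sampling error in $p$ five-fold, which is the price paid for confidentiality: this design needs a larger sample than a direct question to reach the same precision.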

Simpson's Paradox

The following example demonstrates how a lurking variable can seemingly cause a paradox: the conditional percentages that involve this variable point one way, while the percentages that ignore it point the other way.

Example 10. Below are data on all recent patients undergoing surgery at two hospitals. The data include the condition of the patient before the surgery.

                  Good Condition             Poor Condition
              Hospital A  Hospital B     Hospital A  Hospital B
  Died              6           8             57           8
  Survived        594         592           1443         192
  Total           600         600           1500         200

(a) Compare the percents to show that Hospital A has a higher survival rate for both groups of patients.
(b) But which hospital has the higher survival rate overall?

In Hospital A: $P(S \mid G) = 594/600 = 0.99$ and $P(S \mid P) = 1443/1500 = 0.962$.
In Hospital B: $P(S \mid G) = 592/600 \approx 0.987$ and $P(S \mid P) = 192/200 = 0.96$.

So it seems that Hospital A has a higher survival rate for patients who were admitted in good condition and for patients who were admitted in poor condition. But altogether, the survival rates are

In Hospital A: $P(S) = 2037/2100 = 0.97$.
In Hospital B: $P(S) = 784/800 = 0.98$.

So Hospital B has the higher overall survival rate!

Explain: $\dfrac{a}{b} > \dfrac{e}{f}$ and $\dfrac{c}{d} > \dfrac{g}{h}$, but $\dfrac{a + c}{b + d} < \dfrac{e + g}{f + h}$. (The huge number of patients admitted in poor condition to Hospital A heavily weights its overall average and brings it down.) By the Law of Total Probability, each overall rate is a weighted average of the two conditional rates: for Hospital A, $P(S) = \frac{600}{2100}(0.99) + \frac{1500}{2100}(0.962) = 0.97$, with about 71% of the weight on poor-condition patients, while for Hospital B only $200/800 = 25\%$ of the weight falls there.

Example 11. Simpson's Paradox can be understood more easily from the following baseball statistics:
Player 1: 2 hits out of 4 (batting .500), then 1 out of 3 (.333); overall 3/7 ≈ .429.
Player 2: 199 hits out of 400 (batting .4975 < .500), then 0 for 2 (.000); overall 199/402 ≈ .495.
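The reversal in Examples 10 and 11 can be checked directly from the counts. The following Python sketch computes the per-group rates and the pooled rate with exact fractions, then tests whether one side wins every group yet loses after pooling. The helpers `rates` and `is_simpson_reversal` are hypothetical names for illustration, and "wins" means a strictly higher rate.

```python
from fractions import Fraction

def rates(groups):
    """Per-group success rates and the pooled rate for a list of (successes, trials)."""
    per_group = [Fraction(s, n) for s, n in groups]
    pooled = Fraction(sum(s for s, _ in groups), sum(n for _, n in groups))
    return per_group, pooled

def is_simpson_reversal(x, y):
    """True if x beats y in every group but loses once the groups are pooled."""
    x_groups, x_pooled = rates(x)
    y_groups, y_pooled = rates(y)
    return all(a > b for a, b in zip(x_groups, y_groups)) and x_pooled < y_pooled

# Example 10: (survived, patients) for good and poor condition.
hospital_a = [(594, 600), (1443, 1500)]
hospital_b = [(592, 600), (192, 200)]

# Example 11: (hits, at-bats) for the two stretches.
player_1 = [(2, 4), (1, 3)]
player_2 = [(199, 400), (0, 2)]

for name, x, y in [("hospitals", hospital_a, hospital_b),
                   ("players", player_1, player_2)]:
    (x_groups, x_pooled), (y_groups, y_pooled) = rates(x), rates(y)
    print(name,
          [round(float(r), 4) for r in x_groups], round(float(x_pooled), 4),
          [round(float(r), 4) for r in y_groups], round(float(y_pooled), 4),
          is_simpson_reversal(x, y))
```

Pooling adds numerators and denominators separately, which is exactly the $\frac{a+c}{b+d}$ form above; the group sizes act as hidden weights, so the pooled comparison can reverse the group-by-group one.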