Lectures 7-8
jacques@ucsd.edu

4.1 Conditional Probability

Let (Ω, F, P) be a probability space. Suppose that we have prior information which leads us to conclude that an event A ∈ F occurs. Based on this information, how does the probability space (Ω, F, P) change to reflect this information? This is the notion of conditional probability. Intuitively, for any event B ∈ F, given that A occurs, the probability that B occurs should be the probability of that part of B which lies in A, divided by the probability of A. This leads to the definition:

Definition of conditional probability. Let (Ω, F, P) be a probability space and let A ∈ F with P(A) > 0. Then the conditional probability of an event B ∈ F given A is defined by

    P(B | A) = P(A ∩ B) / P(A).

Note that (A, G, Q) defines a new probability space, with G = {A ∩ B : B ∈ F} and Q(B) = P(B | A). To see this, it is not hard to check that G is again a σ-field, and to see that Q satisfies the axioms of probability, we observe that Q(A) = P(A ∩ A)/P(A) = 1 (Axiom 1 is satisfied) and Q(B) ≥ 0 for every B ∈ G (Axiom 2 is satisfied). To check Axiom 3, suppose we have disjoint events B, C ∈ G. Then

    Q(B ∪ C) = P(B ∪ C | A)
             = P((B ∪ C) ∩ A) / P(A)
             = [P(B ∩ A) + P(C ∩ A)] / P(A)
             = P(B | A) + P(C | A)
             = Q(B) + Q(C).

The same argument extends to countably many disjoint events B_1, B_2, … ∈ G. Having verified that Q(B) = P(B | A) is actually a probability measure, we do some computational examples.
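On a finite sample space with equally likely outcomes, the definition and the fact that Q(B) = P(B | A) is itself a probability measure can be checked directly. A minimal sketch (the helper names `prob` and `cond_prob` are ours, not standard library functions):

```python
from fractions import Fraction

def prob(event, omega):
    """P(E) under the uniform measure on the finite sample space omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(b, a, omega):
    """P(B | A) = P(A ∩ B) / P(A), assuming P(A) > 0."""
    return prob(a & b, omega) / prob(a, omega)

omega = frozenset(range(1, 7))   # one roll of a fair die
A = frozenset({1, 3, 5})         # conditioning event: the outcome is odd

# Q(B) = P(B | A) satisfies the axioms: Q(A) = 1, and Q is additive
# on disjoint events such as {1} and {3}.
assert cond_prob(A, A, omega) == 1
assert cond_prob(frozenset({1, 3}), A, omega) == \
    cond_prob(frozenset({1}), A, omega) + cond_prob(frozenset({3}), A, omega)

print(cond_prob(frozenset({5}), A, omega))  # → 1/3
```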
First Example. A fair six-sided die is tossed once and the outcome is recorded. Given that the outcome is an odd number, what is the probability that the outcome is 5?

Let A be the event that the outcome is odd, and let B be the event that the outcome is 5. Then we are asking for P(B | A). According to the definition,

    P(B | A) = P(A ∩ B) / P(A).

Now P(A) = 1/2, and P(A ∩ B) = P(B) = P({5}) = 1/6. Therefore, as expected,

    P(B | A) = (1/6) / (1/2) = 1/3.

Second Example. A fair six-sided die is tossed twice and both outcomes are recorded. Given that at least one of the numbers is a 6, what is the probability that the total on the dice is at least 10?

Let A be the event that at least one of the numbers is a 6, and let B be the event that the total is at least 10. Then we are asking for P(B | A). Now P(A) = 1 − P(A^c), where A^c is the event that neither of the dice shows a 6. The probability of that event, by the multiplication principle, is

    P(A^c) = |A^c| / 36 = (5 · 5) / 36 = 25/36.

We conclude P(A) = 11/36. Next, we compute P(A ∩ B). This is the event that at least one of the dice shows a 6 and the total on the dice is at least 10. We can list the outcomes in A ∩ B:

    A ∩ B = {(6, 4), (6, 5), (6, 6), (5, 6), (4, 6)}

and therefore P(A ∩ B) = |A ∩ B| / 36 = 5/36. So finally,

    P(B | A) = P(A ∩ B) / P(A) = (5/36) / (11/36) = 5/11.

4.2 Law of total probability

The third axiom of probability gives a law called the law of total probability, which is useful in computing conditional probabilities. It essentially says that if a given event B can be written as

    B = (B ∩ A_1) ∪ (B ∩ A_2) ∪ ⋯
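Both dice answers can be confirmed by brute-force enumeration of the 36 equally likely outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))        # 36 equally likely pairs

A = [(i, j) for i, j in outcomes if 6 in (i, j)]       # at least one die shows a 6
B = [(i, j) for i, j in outcomes if i + j >= 10]       # total at least 10

p_A = Fraction(len(A), len(outcomes))
p_AB = Fraction(len([o for o in A if o in B]), len(outcomes))

print(p_A)         # → 11/36
print(p_AB / p_A)  # → 5/11
```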
where A_1, A_2, … are pairwise disjoint events, then we can compute P(B) in terms of the probabilities P(B | A_i) and P(A_i). This allows us to break down probability computations into simple pieces.

Law of total probability. Let B, A_1, A_2, … be events in a probability space with the A_i pairwise disjoint and P(A_i) > 0, and suppose

    B = (B ∩ A_1) ∪ (B ∩ A_2) ∪ ⋯

Then

    P(B) = Σ_{i≥1} P(B | A_i) P(A_i).

Proof. Since the A_i are pairwise disjoint, so are the events B ∩ A_1, B ∩ A_2, …, and by Axiom 3 of probability,

    P(B) = Σ_{i≥1} P(B ∩ A_i).

By definition of conditional probability,

    P(B | A_i) = P(B ∩ A_i) / P(A_i),

and so P(B ∩ A_i) = P(B | A_i) P(A_i). Putting this into the sum,

    P(B) = Σ_{i≥1} P(B | A_i) P(A_i).

This verifies the law of total probability.

First Example. A red urn and a blue urn each contain black and white balls. The red urn contains as many white balls as black balls, whereas the blue urn contains twice as many black balls as white balls. A player randomly and uniformly picks one of the urns, and then randomly and uniformly picks a ball from that urn. Determine the probability that the ball picked is black.

Let A_r be the event that we pick the red urn, and let A_b be the event that we pick the blue urn. Then P(A_r) = P(A_b) = 1/2. Let B be the event that the ball picked is black. To determine P(B), note that

    B = (A_r ∩ B) ∪ (A_b ∩ B)

and therefore

    P(B) = P(A_r ∩ B) + P(A_b ∩ B)
by Axiom 3 of probability. We would expect

    P(A_r ∩ B) = (1/2)(1/2) = 1/4,

since there is half a chance of picking the red urn, and then half a chance that we pick a black ball from the red urn. Similarly,

    P(A_b ∩ B) = (1/2)(2/3) = 1/3.

To make these statements precise, we use the law of total probability:

    P(B) = P(B | A_r) P(A_r) + P(B | A_b) P(A_b) = (1/2)(1/2) + (2/3)(1/2) = 7/12.

The reader should find the problem with the following argument: "P(A_r ∩ B) = 1/2, since A_r ∩ B is the event that a black ball is picked from the red urn, and half the balls in the red urn are black." This ignores that A_r is itself an event with P(A_r) < 1. For this reason we apply P(A_r ∩ B) = P(B | A_r) P(A_r).

Second Example. Continuing from the last example, suppose that we know that the ball picked is black. What is the probability that the ball was picked from the blue urn?

Using the notation of the last example, this is P(A_b | B). By definition of conditional probability,

    P(A_b | B) = P(A_b ∩ B) / P(B) = (1/3) / (7/12) = 4/7.

Third Example. Two decks of cards lie on a table. One of the decks is a standard deck, but the other is not, since every card in it is the ace of spades. A dealer randomly and uniformly picks one of the two decks, and asks you to pick a card from it. The card you pick is the ace of spades. Determine the probability that the deck offered to you is the non-standard deck.

This problem is basically the same as the last one: let B be the event that the ace of spades is picked, and let A_s and A_n be the events that the chosen deck is standard and non-standard, respectively. Then we want P(A_n | B). We know that

    P(A_n | B) = P(A_n ∩ B) / P(B).

Let us compute P(B) using the law of total probability:

    P(B) = P(B | A_n) P(A_n) + P(B | A_s) P(A_s).
Now P(A_s) = P(A_n) = 1/2, P(B | A_n) = 1, and P(B | A_s) = 1/52. Therefore

    P(B) = (1)(1/2) + (1/52)(1/2) = 1/2 + 1/104 = 53/104.

We also have P(A_n ∩ B) = P(B | A_n) P(A_n) = 1/2, so

    P(A_n | B) = P(A_n ∩ B) / P(B) = (1/2) / (53/104) = 52/53.

So there is a 52/53 chance that the deck chosen is non-standard.

4.3 Bayes' Rule

Bayes' Rule is an exceedingly important but simple-to-state tool in probability, and appears in numerous applications.

Bayes' Rule. Let A and B be events in a probability space such that P(A) > 0 and P(B) > 0. Then

    P(B | A) = P(A | B) P(B) / P(A).

Proof. By definition of conditional probability,

    P(B | A) = P(A ∩ B) / P(A).

Also

    P(A | B) = P(A ∩ B) / P(B).

Solving the second equation for P(A ∩ B), we get P(A ∩ B) = P(A | B) P(B), and substituting this into the first equation gives Bayes' Rule.

The probability P(B) is called the prior probability, and P(B | A) is called the posterior probability. We imagine that an experiment is performed, from which we want to obtain some information about B. The prior probability P(B) is the probability that B occurs, without any further information. The posterior probability is the probability that B occurs, given the information obtained from the experiment, which is the event A. In practice, to compute P(A), we may use the law of total probability:

    P(A) = P(A | B) P(B) + P(A | B^c) P(B^c),

where B^c denotes the complement of B.
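Bayes' Rule with the denominator expanded by the law of total probability can be packaged into one small function; applied to the urn and deck examples above, it reproduces 4/7 and 52/53. A sketch (the function name `posterior` is ours):

```python
from fractions import Fraction

def posterior(p_b, p_a_given_b, p_a_given_not_b):
    """P(B | A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|B^c)P(B^c))."""
    num = p_a_given_b * p_b
    return num / (num + p_a_given_not_b * (1 - p_b))

F = Fraction
# Urn example: B = blue urn chosen, A = black ball drawn.
print(posterior(F(1, 2), F(2, 3), F(1, 2)))  # → 4/7
# Deck example: B = non-standard deck chosen, A = ace of spades drawn.
print(posterior(F(1, 2), 1, F(1, 52)))       # → 52/53
```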
First Example. A patient X at a medical exam reveals a symptom x of Type II diabetes. The percentage of all people in the United States with Type II diabetes is 10 percent (the prior probability is 10/100). The percentage of people without Type II diabetes who show symptom x is 0.1 percent, and the percentage of Type II diabetics who show symptom x is 80 percent. Determine the probability that patient X has Type II diabetes (the posterior probability).

We set up the problem in the framework of conditional probability. Let B be the event that a patient has Type II diabetes, and let A be the event that a patient has symptom x at a medical exam. Then we have from Bayes' Rule that the posterior probability is

    P(B | A) = P(A | B) P(B) / P(A).

Now P(B | A) is the probability we want: it is the probability that, given that symptom x is detected, the patient has Type II diabetes. The prior probability is P(B) = 10/100, and we have P(A | B) = 80/100 and P(A | B^c) = 1/1000, since these quantities are given. Then, by the law of total probability,

    P(A) = P(A | B) P(B) + P(A | B^c) P(B^c)
         = (80/100)(10/100) + (1/1000)(90/100)
         = 809/10000.

Therefore

    P(B | A) = (80/100)(10/100) / (809/10000) = 800/809 ≈ 0.98887.

So the patient has roughly a 98.9 percent chance of having Type II diabetes.

Second Example. A student S is suspected of cheating on an exam, due to evidence E of cheating being present. Suppose that in the case of a cheating student, evidence E is present with 60 percent probability, and that in the case of a student that does not cheat, evidence E is present with a 0.01 percent probability. Suppose also that the proportion of students that cheat is 1 percent. Determine the probability that S cheated.

Let B be the event that S cheated, and let A be the event that evidence E is present. We seek

    P(B | A) = P(A | B) P(B) / P(A).

This time we do not know P(A), the probability that evidence E is present. However, by the law of total probability,

    P(A) = P(A | B) P(B) + P(A | B^c) P(B^c).

We know P(A | B) = 0.6, P(A | B^c) = 0.0001, and P(B) = 0.01, since these are given. Therefore

    P(B | A) = (0.6)(0.01) / ((0.6)(0.01) + (0.0001)(0.99)) ≈
0.983. So there is roughly a 98 percent chance that the student S cheated.
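Done with exact fractions, the same Bayes computation recovers both posteriors; a sketch (the function name `posterior` is ours):

```python
from fractions import Fraction

def posterior(p_b, p_a_given_b, p_a_given_not_b):
    """Bayes' Rule with the denominator expanded by total probability."""
    num = p_a_given_b * p_b
    return num / (num + p_a_given_not_b * (1 - p_b))

F = Fraction
# Diabetes example: prior 10%, P(symptom | diabetic) = 80%,
# P(symptom | non-diabetic) = 0.1%.
diabetes = posterior(F(10, 100), F(80, 100), F(1, 1000))
# Cheating example: prior 1%, P(evidence | cheater) = 60%,
# P(evidence | non-cheater) = 0.01%.
cheating = posterior(F(1, 100), F(60, 100), F(1, 10000))

print(diabetes, float(diabetes))   # → 800/809, about 0.98887
print(cheating, float(cheating))   # about 0.9838
```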
4.4 Monty Hall Problem

The Monty Hall problem is a good illustration of conditional probability. A game show consists of three closed doors, behind one of which there is a prize. A game show host, who knows which door hides the prize, asks a contestant to pick one of the three doors, and then randomly opens a different door with no prize behind it. Then the host asks: would you like to open the door you originally picked (call this the "stay" strategy), or would you like to pick the other closed door (the "switch" strategy)?

The first remark is that it does not matter which door you first selected, so we might as well assume you picked door number 1. The question is: do you stick with door number 1, or do you switch to the other closed door? It seems intuitive that the prize is equally likely to be behind door 1 and behind the door that the game show host did not open. However, this is not the case: by revealing that there is nothing behind the door he opens, say door 2, the game show host increases the likelihood that the prize is behind door 3. Thus it seems plausible that to maximize the probability of revealing the prize, you should switch doors. The key is to make this rigorous in terms of probability.

Let A_i be the event that the prize lies behind door i, and let B_i be the event that the game show host opens door i. Let B be the event that switching doors leads to a win. Then we have P(A_i) = 1/3 for i = 1, 2, 3, and also

    B = (B_3 ∩ A_2) ∪ (B_2 ∩ A_3).

By the law of total probability,

    P(B) = P(B_3 | A_2) P(A_2) + P(B_2 | A_3) P(A_3) = 2/3,

since P(B_3 | A_2) = P(B_2 | A_3) = 1. Let C be the event that keeping the door (staying) wins. Then

    C = (B_2 ∩ A_1) ∪ (B_3 ∩ A_1)

and by the law of total probability,

    P(C) = P(B_2 | A_1) P(A_1) + P(B_3 | A_1) P(A_1) = 1/3,

since P(B_2 | A_1) = 1/2 = P(B_3 | A_1). Therefore a win is twice as likely if you switch doors than if you stay.
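A quick Monte Carlo simulation, assuming the host always opens an unchosen, prize-free door uniformly at random, agrees with the 2/3 versus 1/3 analysis:

```python
import random

def play(switch, rng):
    """Simulate one round; return True if the contestant wins the prize."""
    doors = [1, 2, 3]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

rng = random.Random(0)   # fixed seed for reproducibility
n = 20_000
wins_switch = sum(play(True, rng) for _ in range(n)) / n
wins_stay = sum(play(False, rng) for _ in range(n)) / n
print(wins_switch, wins_stay)  # close to 2/3 and 1/3 respectively
```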
4.5 Independence

Two events in a probability space are independent if the occurrence of either of them does not affect the probability of the occurrence of the other. In mathematical terms, this means events A and B are independent if P(A | B) = P(A). By definition of conditional probability, this is the same as

    P(A ∩ B) = P(A) P(B).

More generally, we can define independence of a set of events.

Definition of independence. Let (Ω, F, P) be a probability space and let A_1, A_2, …, A_n be events in F. Then A_1, A_2, …, A_n are independent if for any non-empty set I ⊆ {1, 2, …, n},

    P(∩_{i ∈ I} A_i) = Π_{i ∈ I} P(A_i).

The definition says that if events are independent, then the probability of their intersection is just the product of all their probabilities. So, for instance, P(A ∩ B) = P(A) P(B), and P(A ∩ B ∩ C) = P(A) P(B) P(C).

We have been implicitly using the notion of independence. When we say two fair dice are tossed independently, we now have the formal definition of what that means: if A is any set of outcomes on the first die and B is any set of outcomes on the second, then P(A ∩ B) = P(A) P(B). In words, the outcome of the first die does not affect the outcome of the second.

First Example. An event A cannot be independent of itself, unless P(A) = 0 or P(A) = 1. To see this, if A is independent of A, then P(A ∩ A) = P(A) P(A) = P(A)^2 by definition. But P(A ∩ A) = P(A), so we get P(A) = P(A)^2, which is only possible if P(A) = 0 or P(A) = 1. Similarly, one can show that if A and B are disjoint and independent, then P(A) or P(B) is 0, and one can also show that
if A and B are independent, then so are A and B^c, using inclusion-exclusion and the complement rule.

Second Example. A coin is tossed independently n times. Suppose the probability of heads on each toss is p. What is the probability that no heads occur?

Let A_i be the event of tails on the ith toss. Then we want P(A_1 ∩ A_2 ∩ ⋯ ∩ A_n). By independence, this is P(A_1) P(A_2) ⋯ P(A_n). Now by the complement rule, P(A_i) = 1 − p, so

    P(no heads) = (1 − p)^n.

We will come back to this example when we compute the probability of exactly k heads in n coin tosses; this is the famous binomial distribution.

Third Example. A dart is thrown uniformly and randomly at a simplified circular dartboard consisting of four sectors with equal area, labelled clockwise 1, 2, 3, 4. Let A be the event that the dart lands in sectors 1, 2, let B be the event that the dart lands in sectors 2, 3, and let C be the event that the dart lands in sectors 1, 3. Are the events A, B, C independent?

To check independence, we have to check P(A) P(B) = P(A ∩ B), P(B) P(C) = P(B ∩ C), P(A) P(C) = P(A ∩ C), and P(A) P(B) P(C) = P(A ∩ B ∩ C). Now P(A) = P(B) = P(C) = 1/2, since each event covers two of the four sectors, and since each pairwise intersection A ∩ B, B ∩ C, A ∩ C covers one sector,

    P(A ∩ B) = P(B ∩ C) = P(A ∩ C) = 1/4.

So any two of the events A, B, C are independent. However, they are not independent together, since A ∩ B ∩ C = ∅ and so

    P(A ∩ B ∩ C) = 0 ≠ P(A) P(B) P(C) = 1/8.

The last example illustrates that to check independence of a set of events, the product rule must be verified for every combination of at least two of the events: pairwise independence does not imply independence.
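The dartboard example can be checked exhaustively over the four equally likely sectors: every pair of events multiplies correctly, but the triple does not. A sketch (the helper name `p` is ours):

```python
from fractions import Fraction
from itertools import combinations

sectors = frozenset({1, 2, 3, 4})   # four equally likely sectors
A = frozenset({1, 2})
B = frozenset({2, 3})
C = frozenset({1, 3})

def p(event):
    """Probability of an event under the uniform measure on the sectors."""
    return Fraction(len(event), len(sectors))

# Pairwise independence holds for every pair...
for x, y in combinations([A, B, C], 2):
    assert p(x & y) == p(x) * p(y)

# ...but mutual independence fails: the triple intersection is empty.
print(p(A & B & C), p(A) * p(B) * p(C))  # → 0 and 1/8
```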