Lecture 3 (January 7, 2013)
Outline
This week's lecture:
1 Fast review of last week's lecture: conditional probability, partitions, the Partition Theorem.
2 Bayes' Theorem and its applications.
3 Independence.
Contents
1 Fast review of last week's lecture
2 Bayes' Theorem
3 Independence
Conditional Probability
Definition (Conditional Probability). Let A and B be events from a given event space F, with B satisfying P(B) > 0. The conditional probability of A, given that B occurs, is a probability measure denoted by P(A|B) and is defined by
P(A|B) = P(A ∩ B) / P(B).
If P(B) = 0 then P(A|B) is not defined.
Example
Roll a die. What is the probability that the score is at most 3, given that the score is prime?
Solution: Here B = {2, 3, 5} (the score is prime) and A = {1, 2, 3} (the score is at most 3). The required probability is the conditional probability P(A|B), and
P(A|B) = P(A ∩ B) / P(B) = (2/6) / (3/6) = 2/3.
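As a quick sanity check, the computation above can be reproduced by enumerating the sample space in Python (the helper function `prob` is ours, not part of the lecture):

```python
from fractions import Fraction

# Sample space for one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}   # score is at most 3
B = {2, 3, 5}   # score is prime

def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

# P(A|B) = P(A ∩ B) / P(B)
p_A_given_B = prob(A & B) / prob(B)
print(p_A_given_B)  # 2/3
```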
Partition
Definition (Partition, Definition 2.2.1). A collection of events B_1, B_2, ..., B_k is called a partition of the sample space Ω if
1 B_i ∩ B_j = ∅ for all i and j such that i ≠ j;
2 B_1 ∪ B_2 ∪ ... ∪ B_k = Ω.
Example: For a single roll of a die, Ω = {1, 2, 3, 4, 5, 6}. It is clear that B_1 = {1}, B_2 = {2, 3}, B_3 = {4, 5, 6} is a partition of Ω.
Example (Example 2.2.2): A and A^c form a partition of Ω for any event A.
Partition Theorem (or Total Probability Theorem): general case
Theorem (Partition Theorem (general case)). More generally, for any partition B_1, B_2, ..., B_k of the sample space Ω with P(B_j) > 0 for all 1 ≤ j ≤ k, we have
P(A) = Σ_{n=1}^{k} P(A|B_n) P(B_n).
Some important comments about conditional probability
1 One can often compute a conditional probability without working out the sample space first. Of course, we could do it that way, but it would make the problem horribly complicated.
2 The idea of the Partition Theorem is to split the computation of a probability into several smaller problems. For each small problem, one can use further information to compute its probability, and then add these probabilities together.
An example
Example (Example 2.2.6): We are given two black boxes, each containing a collection of coloured balls. The first box contains 2 red and 3 blue balls, and the second contains 4 red and 1 blue ball.
1 First, a ball is chosen at random from the first box and placed in the second.
2 Then a ball is chosen at random from the second box.
Question: What is the probability that this second ball is blue?
Solution
Obviously, the choice of the second ball is influenced by the result of the first step.
Solution: Take A = {Ball 2 is blue} and B = {Ball 1 is blue}. Sifting the information we are given, we deduce that P(A|B) = 1/3, P(B) = 3/5, P(A|B^c) = 1/6 and P(B^c) = 2/5. Using the Partition Theorem, the required probability is
P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) = (1/3)(3/5) + (1/6)(2/5) = 4/15.
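The Partition Theorem calculation can be checked with exact rational arithmetic; this sketch simply encodes the four probabilities read off from the two-box setup:

```python
from fractions import Fraction

p_B    = Fraction(3, 5)  # P(B): ball 1 (from box 1: 2 red, 3 blue) is blue
p_Bc   = Fraction(2, 5)  # P(B^c): ball 1 is red
p_A_B  = Fraction(1, 3)  # P(A|B): box 2 now holds 4 red, 2 blue -> blue with prob 2/6
p_A_Bc = Fraction(1, 6)  # P(A|B^c): box 2 now holds 5 red, 1 blue -> blue with prob 1/6

# Partition Theorem: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_B * p_B + p_A_Bc * p_Bc
print(p_A)  # 4/15
```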
A question
We are given two boxes, each containing a collection of coloured balls. The first box contains 2 red and 3 blue balls, and the second contains 4 red and 1 blue ball. A ball is chosen at random from the first box and placed in the second. Then a ball is chosen at random from the second box.
Question: Suppose this second ball was blue. What is the probability that the first ball was also blue?
Keeping this question in mind, let us consider Bayes' Theorem.
Bayes' Theorem (a special case)
The next theorem is called Bayes' Theorem. It is very important and has many applications in statistics, especially in testing.
Theorem (Bayes' Theorem (a special case)). For any events A and B with P(A) > 0 and P(B) > 0,
P(B|A) = P(A|B)P(B) / P(A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)].
Bayes' Theorem (general case)
Theorem (Bayes' Theorem (general case)). Let A be an event with P(A) > 0. Let B_1, B_2, ..., B_n form a partition of Ω such that P(B_i) > 0 for all 1 ≤ i ≤ n. Then, for each j = 1, 2, ..., n,
P(B_j|A) = P(A|B_j)P(B_j) / P(A) = P(A|B_j)P(B_j) / Σ_{i=1}^{n} P(A|B_i)P(B_i).
Proof of Bayes' Theorem (general case)
Proof. It follows from the definition of conditional probability that
P(B_j|A) = P(B_j ∩ A) / P(A) = P(A|B_j)P(B_j) / P(A).   (1)
Recalling the Partition Theorem (Total Probability Theorem), we get
P(A) = Σ_{i=1}^{n} P(A|B_i)P(B_i).
Hence,
P(B_j|A) = P(A|B_j)P(B_j) / P(A) = P(A|B_j)P(B_j) / Σ_{i=1}^{n} P(A|B_i)P(B_i).   (2)
The question
Recall the two-box setup: a ball is chosen at random from the first box (2 red, 3 blue balls) and placed in the second (4 red, 1 blue); then a ball is chosen at random from the second box.
Question: Suppose this second ball was blue. What is the probability that the first ball was also blue?
The question
Solution: Let A = {Ball 2 is blue} and B = {Ball 1 is blue}. Sifting the information we are given, we deduce that P(A|B) = 1/3, P(B) = 3/5, P(A|B^c) = 1/6 and P(B^c) = 2/5. Using Bayes' Theorem, the required probability is therefore
P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)] = (1/3)(3/5) / [(1/3)(3/5) + (1/6)(2/5)] = 3/4.
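The Bayes' Theorem computation above can be verified with the same exact arithmetic as before, reusing the four probabilities from the two-box example:

```python
from fractions import Fraction

p_B    = Fraction(3, 5)  # P(B): ball 1 is blue
p_Bc   = Fraction(2, 5)  # P(B^c)
p_A_B  = Fraction(1, 3)  # P(A|B)
p_A_Bc = Fraction(1, 6)  # P(A|B^c)

# Bayes' Theorem: P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]
p_B_A = (p_A_B * p_B) / (p_A_B * p_B + p_A_Bc * p_Bc)
print(p_B_A)  # 3/4
```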
A famous question (The Monty Hall Problem)
Monty Hall is the host of a game show. There are three doors. Behind one is a car, and behind the other two are goats. The contestant's goal is to win the car by choosing the door with the car behind it. Monty knows which door has the car and invites the contestant to choose a door. Once a door is chosen, Monty opens one of the other doors that has a goat behind it. The car is still behind a closed door. The contestant is now given the opportunity to switch his choice to the other closed door. To maximise his chances of winning the car, should the contestant:
1 switch?
2 stay with his original choice?
3 or does it not matter?
Solution to the Monty Hall Problem
Solution: Define the events A = {contestant picks the correct door first time} and X = {Monty opens a door with a goat}. Then P(A) = 1/3 and P(X) = P(X|A) = P(X|A^c) = 1, so by Bayes' Theorem
P(A|X) = P(X|A)P(A) / [P(X|A)P(A) + P(X|A^c)P(A^c)] = 1/3.
So P(A|X) = P(A): Monty opening a door gives no new information about the contestant's original choice, and the other closed door carries the remaining probability 1 - 1/3 = 2/3. The contestant should therefore switch doors to increase his probability of winning to 2/3.
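If the Bayes argument feels slippery, the game can also be settled by exhaustive enumeration rather than formulas; the sketch below sums the probability of each (car position, first pick) pair, each with probability 1/9:

```python
from fractions import Fraction
from itertools import product

doors = [1, 2, 3]
wins_stay = Fraction(0)
wins_switch = Fraction(0)

# Enumerate car position and contestant's first pick, each uniform.
for car, pick in product(doors, doors):
    p = Fraction(1, 9)
    if pick == car:
        # Staying wins; switching always moves to a goat door.
        wins_stay += p
    else:
        # Monty must open the one goat door that is neither the pick
        # nor the car, so switching lands on the car.
        wins_switch += p

print(wins_stay)    # 1/3
print(wins_switch)  # 2/3
```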
An example from www.yudkowsky.net
In a certain age group, 1% of women have breast cancer. A mammogram correctly confirms the disease in 80% of women tested, but incorrectly gives a positive result in 9.6% of cases. What is the probability that a woman who receives a positive result actually has breast cancer? (Source: www.yudkowsky.net/bayes/bayes.html.)
Solution
Let X be the event that a woman gets a positive test result and A be the event that she has breast cancer. We want P(A|X), and we use Bayes' Theorem. The given probabilities are P(A) = 1/100, P(X|A) = 4/5 and P(X|A^c) = 12/125. Hence,
P(A|X) = P(X|A)P(A) / [P(X|A)P(A) + P(X|A^c)P(A^c)] = 25/322 ≈ 7.76%.
So fewer than eight in every hundred women with a positive test actually have breast cancer.
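The exact fraction 25/322 is easy to mistype; a short check with rational arithmetic confirms it:

```python
from fractions import Fraction

p_A    = Fraction(1, 100)   # P(A): has breast cancer
p_X_A  = Fraction(4, 5)     # P(X|A): true positive rate (80%)
p_X_Ac = Fraction(12, 125)  # P(X|A^c): false positive rate (9.6%)

# Bayes' Theorem with the partition {A, A^c}:
p_A_X = (p_X_A * p_A) / (p_X_A * p_A + p_X_Ac * (1 - p_A))
print(p_A_X)  # 25/322
```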
Some discussion
We saw in the famous Monty Hall problem that Monty's choice helps the contestant get a better chance of winning the car. Today we shall study the opposite situation, where the probability of one event is unaffected by the occurrence of another, i.e., the two events are independent. Independence is one of the most important concepts in probability theory.
Example: Choose a day at random. Does knowing B alter P(A) in the case below?
A = {it is Saturday}, B = {it is morning}.
Solution: Obviously not: P(A) = P(A|B) = 1/7.
Independence
For two events A and B with P(B) > 0 and P(A) > 0, when P(A) = P(A|B), we have
P(A ∩ B) = P(A|B)P(B) = P(A)P(B).
Independence
Definition (Independence). Two events A and B with P(B) > 0 and P(A) > 0 are said to be independent when
P(A ∩ B) = P(A)P(B).
Here we use a definition with a form different from that in the long lecture notes by Jacques Furter, but the two are equivalent (see the next slide).
Independence (equivalent definition)
The equivalent definition of independence is:
Definition (Independence). Two events A and B with P(B) > 0 and P(A) > 0 are said to be independent when
P(A|B) = P(A).
Proof.
1 If P(A|B) = P(A), then P(A ∩ B) = P(A|B)P(B) = P(A)P(B).
2 If P(A ∩ B) = P(A)P(B), then because P(A ∩ B) = P(A|B)P(B), we have P(A|B)P(B) = P(A)P(B). Hence, P(A|B) = P(A).
Independence
Definition (independence of A, B, C). Three events A, B and C are said to be (collectively) independent when
P(A ∩ B ∩ C) = P(A)P(B)P(C)
and each pair (A, B), (B, C), (C, A) is also (pairwise) independent. Similar definitions exist for four or more events.
Also, A and B are said to be conditionally independent, given C, when
P(A ∩ B | C) = P(A|C) P(B|C).
Independence and Dependence
It is sometimes obvious that events are independent. For example, if we toss a coin twice, the events A = {first toss is heads} and B = {second toss is tails} are clearly independent if the coin is fair.
But what about these: A = {a burglary occurs in January}, B = {the government loses the May election}? The probability of B is most probably influenced by the event A.
Example (rolling a die)
We roll a die twice. Let A be the event that the first score is between 1 and 3, B the event that the second score is between 4 and 6, and C the event that both scores are between 1 and 3 or both are between 4 and 6. Show that A, B and C are pairwise independent, but not (collectively) independent.
Solution
Solution: The set of outcomes for the throw of a die is S = {1, ..., 6}. The sample space is Ω = S × S. Let x_i ∈ N, i = 1, 2, be the score on the i-th throw. The event A is given by {1, 2, 3} × S, B by S × {4, 5, 6}, and C by {1, 2, 3}^2 ∪ {4, 5, 6}^2, where X^2 denotes the cartesian product X × X. We see that |Ω| = 36 and |A| = |B| = |C| = 18, hence
P(A) = P(B) = P(C) = 1/2.
We can now show that the three events A, B and C are pairwise independent:
Solution (continued)
1 To check independence of A and B, we calculate P(A ∩ B) = P({1, 2, 3} × {4, 5, 6}) = 1/4. Hence P(A ∩ B) = P(A)P(B).
2 To check independence of A and C, we calculate P(A ∩ C) = P({1, 2, 3}^2) = 1/4. Hence P(A ∩ C) = P(A)P(C).
3 To check independence of B and C, we calculate P(B ∩ C) = P({4, 5, 6}^2) = 1/4. Hence P(B ∩ C) = P(B)P(C).
Solution (continued)
However, A ∩ B ∩ C = ∅: if the first score is in {1, 2, 3} and the second is in {4, 5, 6}, the two scores cannot both lie in the same block. Therefore
P(A ∩ B ∩ C) = 0, but P(A)P(B)P(C) = 1/8.
Hence P(A ∩ B ∩ C) ≠ P(A)P(B)P(C), and the three events are not (collectively) independent.
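The whole pairwise-but-not-collective example can be verified by brute-force enumeration of the 36 equally likely outcomes (event names mirror the slides; the `prob` helper is ours):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two die rolls.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(omega))

A = {(x, y) for (x, y) in omega if x <= 3}                # first score in 1..3
B = {(x, y) for (x, y) in omega if y >= 4}                # second score in 4..6
C = {(x, y) for (x, y) in omega if (x <= 3) == (y <= 3)}  # both low or both high

# Pairwise independence holds...
assert prob(A & B) == prob(A) * prob(B)
assert prob(A & C) == prob(A) * prob(C)
assert prob(B & C) == prob(B) * prob(C)

# ...but collective independence fails: A ∩ B ∩ C is empty.
print(prob(A & B & C))              # 0
print(prob(A) * prob(B) * prob(C))  # 1/8
```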
Another example
When we throw a die, the sample space is Ω = {1, ..., 6}. The events A = {a} and B = {b} are not independent when a ≠ b. Indeed, they are mutually exclusive in that case, and so P(A ∩ B) = 0, which is clearly different from P(A)P(B) = 1/36.
Another example (continued)
On the other hand, if we throw a die twice in succession, getting a score a the first time and b the second are indeed independent events. The sample space is the set of pairs Ω_2 = Ω^2. Now the event corresponding to a first score of a is A_2 = {a} × Ω, and for a second score of b the event is B_2 = Ω × {b}. Those events are independent because |A_2 ∩ B_2| = |{(a, b)}| = 1, |A_2| = |B_2| = 6 and |Ω_2| = 36. Hence,
P(A_2|B_2) = P(A_2 ∩ B_2) / P(B_2) = (1/36) / (1/6) = 1/6 = P(A_2).
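This contrast between the two examples can be checked the same way; the concrete scores a = 2, b = 5 below are arbitrary choices for illustration, since the argument works for any fixed a and b:

```python
from fractions import Fraction
from itertools import product

omega2 = set(product(range(1, 7), repeat=2))
a, b = 2, 5  # any two fixed scores (arbitrary illustrative values)

A2 = {(x, y) for (x, y) in omega2 if x == a}  # first throw scores a
B2 = {(x, y) for (x, y) in omega2 if y == b}  # second throw scores b

def prob(event):
    return Fraction(len(event), len(omega2))

# P(A2 ∩ B2) = 1/36 = P(A2)P(B2), so the events are independent;
# equivalently, conditioning on B2 does not change P(A2).
assert prob(A2 & B2) == prob(A2) * prob(B2)
print(prob(A2 & B2) / prob(B2))  # 1/6, i.e. P(A2|B2)
print(prob(A2))                  # 1/6
```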
Thanks for your attention!