Probability Theory and Applications

Videos of the topics covered in this manual are available at the following links:
Lesson 4 Probability I: http://faculty.citadel.edu/silver/ba205/online course/lesson 04.wmv
Lesson 5 Probability II (Conditional Probability, Independence, Bayes): http://faculty.citadel.edu/silver/ba205/online course/lesson 05.wmv

Introduction to the theory of probability. The probability of any outcome of an experiment is defined to be the relative likelihood that it will occur and is given as a number between 0 and 1. Think of a trial of any experiment, for example, tossing a coin. Either a heads or a tails will occur (we assume the chance that the coin lands on its edge is 0, meaning it cannot occur). If we assign the value 0 for a tail and 1 for a head, then we can add up the number of heads, i.e. the 1s, and divide by the total number of tosses n. What we get is the proportion of heads in the n tosses of our experiment. This will be the proportion of heads in our sample, or our sample proportion. [We talk more about the sample proportion in the statistics text in the discussion of the binomial distribution.] Now we can conduct this experiment over and over again, add the number of 1s in the next set of trials to the number of 1s already accumulated, and divide by the accumulated number of trials. As we do this, the proportion of heads in our experiment will approach the true proportion of heads for the coin, which, if it's a fair coin, will be ½. Thus, we may think of the probability of an outcome or event (to be defined shortly) as the proportion of times it will occur in an experiment conducted an infinite number of times. Such an experiment, in which there are only two possible outcomes such as a coin toss, is called a binomial, or Bernoulli, experiment. In general, however, an experiment may have more than two possible outcomes, in which case we call it a multinomial experiment.
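This running-proportion idea is easy to see in simulation. A minimal sketch (the function name and parameters are illustrative, not from the text):

```python
import random

def running_proportion_of_heads(n_tosses, p_heads=0.5, seed=1):
    """Toss a coin n_tosses times (1 = head, 0 = tail) and return the
    accumulated proportion of heads over all n_tosses trials."""
    rng = random.Random(seed)
    heads = 0
    for _ in range(n_tosses):
        heads += 1 if rng.random() < p_heads else 0
    return heads / n_tosses

# As n grows, the sample proportion settles near the true probability 1/2.
for n in (100, 10_000, 1_000_000):
    print(n, round(running_proportion_of_heads(n), 4))
```

For small n the proportion can wander well away from ½; only as n grows large does it stabilize near the true value, which is the sense in which probability is a long-run proportion.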
[In the statistics notes to this course, we will not cover probability theory for multinomial experiments; the treatment of such cases is found in more advanced statistics texts.] We can list these outcomes as O1, O2, …, On. Then the sample space S is defined to be the set that lists all these possible outcomes, S = {O1, O2, …, On}. An event in S is defined as any subset of S. For example, if S = {H, T} for the experiment of a single coin toss, then Head = {H} is a (proper) subset of S. Therefore, Head is both an outcome and an event in S. Another possible event in S is the null, or empty, set ∅. For example, rolling both a head and a tail on a single toss is not possible. This example, in which two outcomes would have to occur at the same time, introduces the concept of intersection. The event A ∩ B, read "A intersect B," is the event containing all outcomes that are in both A and
B. Because any outcome outside of either of the two events A and B lies outside A ∩ B, A ∩ B is rather exclusive. In fact, often this event is empty. In this case we say A and B are mutually exclusive, or disjoint; that is, there is no overlap of the two events. Another event is A ∪ B, read "A union B," which is the event that includes all outcomes in S that are in either A or B. This event is relatively large in that any outcome that is not outside both A and B is in A ∪ B. In particular, A ∩ B lies in A ∪ B. In fact, if A ∩ B = A ∪ B, then A = B. We also need an event to represent all outcomes in S not in A; we call this event A complement, or Aᶜ. [Some texts use Ā or A′.] From the figure below, known as a Venn diagram, we see that all of S can be partitioned into four mutually exclusive events in S, namely: A ∩ B, A ∩ Bᶜ, Aᶜ ∩ B, and Aᶜ ∩ Bᶜ. Now, consider Aᶜ ∩ Bᶜ, which contains all outcomes not in A and not in B. Then any outcome in either A or B is outside this event; otherwise the outcome is inside Aᶜ ∩ Bᶜ. Thus Aᶜ ∩ Bᶜ is the complement of A ∪ B, that is, (A ∪ B)ᶜ. So Aᶜ ∩ Bᶜ = (A ∪ B)ᶜ. Also, consider the event Aᶜ ∪ Bᶜ, which contains all outcomes in S that are not in A or not in B. Thus, it contains all outcomes outside (A and B), that is, outside A ∩ B; otherwise the outcome is in A ∩ B. Therefore, Aᶜ ∪ Bᶜ = (A ∩ B)ᶜ. The two rules Aᶜ ∩ Bᶜ = (A ∪ B)ᶜ and Aᶜ ∪ Bᶜ = (A ∩ B)ᶜ are called De Morgan's rules and are very useful in probability theory. We will often use these rules in calculating probabilities of events. Looking at these two formulas, we see that we take the complements of the two events, reverse the union/intersection sign, and then take the complement of the resulting event. Let us now do a simple example problem applying these concepts. Take the experiment of rolling a single die. Then S = {1, 2, 3, 4, 5, 6} is a listing of the possible outcomes. One event in S is rolling an even number, Even = {2, 4, 6}; another is rolling an odd number, Odd = {1, 3, 5}. Obviously, Even ∩ Odd = ∅, and Even ∪ Odd = S.
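De Morgan's rules can be checked directly with Python's set operations (& for intersection, | for union, - for difference). The die events below are an illustrative choice anticipating the example that follows:

```python
# Verify De Morgan's rules on the single-die sample space.
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}            # rolling an even number
B = {4, 5, 6}            # rolling a number greater than 3

def complement(E):
    """Complement of event E within the sample space S."""
    return S - E

# Ac ∩ Bc = (A ∪ B)c
assert complement(A) & complement(B) == complement(A | B)
# Ac ∪ Bc = (A ∩ B)c
assert complement(A) | complement(B) == complement(A & B)
print("De Morgan's rules hold for A =", A, "and B =", B)
```

Because the rules are set identities, the assertions hold for any choice of A and B inside S, not just these two events.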
What about rolling an even number, R(E), and rolling a number greater than 3, R(>3)? The only outcomes in this event are 4 and 6, so R(E) ∩ R(>3) = {4, 6}. What about rolling an even number or rolling a number greater than three? This includes 2, 4, 5, and 6; then R(E) ∪ R(>3) = {2, 4, 5, 6}.

Now suppose we roll two dice. The outcomes are ordered pairs of numbers; for example, (1, 2) is the outcome of rolling a 1 on the first die and a 2 on the second. Then S = {(1, 1), (1, 2), …, (1, 6), (2, 1), …, (2, 6), …, (6, 1), …, (6, 6)}. One game that is played by rolling ("shooting") two dice is craps. In this game we add the values showing on the two dice; for example, (1, 6) yields a 7. We now divide S up into mutually exclusive events based on the sum of the two values. Thus, R(2) = the set of outcomes adding to 2 = {(1, 1)}, and R(7) = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}. We can now list these events, and their probabilities, as follows:

Events and Their Probabilities for the Game of Craps

Event (X)   Outcomes in Event X                        P(X)    Cumulative P(X)
R(2)        (1,1)                                      1/36    1/36
R(3)        (1,2), (2,1)                               2/36    3/36
R(4)        (1,3), (2,2), (3,1)                        3/36    6/36
R(5)        (1,4), (2,3), (3,2), (4,1)                 4/36    10/36
R(6)        (1,5), (2,4), (3,3), (4,2), (5,1)          5/36    15/36
R(7)        (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)   6/36    21/36
R(8)        (2,6), (3,5), (4,4), (5,3), (6,2)          5/36    26/36
R(9)        (3,6), (4,5), (5,4), (6,3)                 4/36    30/36
R(10)       (4,6), (5,5), (6,4)                        3/36    33/36
R(11)       (5,6), (6,5)                               2/36    35/36
R(12)       (6,6)                                      1/36    36/36
Total       36 outcomes, each with p = 1/36            1

As a side note, the game is played as follows: the shooter rolls the two dice; if they sum to 7 or 11, the shooter wins the pot. If not, the shooter continues to roll the dice until one of two events occurs: he rolls a 7, and loses, or he rolls the same value as on the first roll, in which case he wins. In Las Vegas there are some additional rules to make the game fair, so that the house and the shooter have about an equal chance of winning: 2, 3 and 12 on the first roll are automatic losers.
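The table above can be reproduced by enumerating the 36 equally likely ordered pairs (a small sketch using exact fractions):

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of rolling two dice,
# then count how many ordered pairs produce each sum.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
counts = Counter(i + j for i, j in outcomes)

cumulative = Fraction(0)
for total in range(2, 13):
    p = Fraction(counts[total], 36)
    cumulative += p
    print(f"R({total}): {counts[total]} outcome(s), P = {p}, cumulative = {cumulative}")
```

Note that the counts rise from 1 way to roll a 2 up to 6 ways to roll a 7 and fall back to 1 way to roll a 12, and the cumulative probabilities match the last column of the table.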
Now let's see how to calculate the probability that the shooter wins the game. There is a 6/36 (rolls a 7) + 2/36 (rolls an 11) = 8/36 = 2/9 chance the shooter wins on the first roll. There is a 3/36 chance of rolling a 4 on the first roll and a 3/9 probability of then rolling another 4 before rolling a 7. Why? Because after the first roll there are exactly 9 outcomes that can end the game: six of these roll a 7 and three roll a 4. Since each outcome is equally likely and three of the nine are favorable, the probability of a win for the shooter after rolling an initial 4 is 3/9. Thus, the shooter has a (3/36)·(3/9) = 1/36 chance of rolling an initial 4 and winning the game. More will be
discussed about these calculations when we talk about independence and the chain rule later in this chapter. As an exercise, show that the probability the shooter will win is 0.49293.

Now, returning to the Venn diagram, suppose we are given the following probabilities: P(A) = .6, P(B) = .4, P(A ∩ B) = .3. How would we calculate the following probabilities: P(A ∪ B), P(Aᶜ ∩ B), P(Aᶜ ∪ B)? We need some additional theory to help us here. Looking at the Venn diagram, suppose we drew events A and B such that the area of each event relative to the area of S just equals the probability of the event. Thus, A occupies 60% of the area of S, B occupies 40%, and A ∩ B 30%. If we add the areas of A and B we get 100%, but clearly the area of A ∪ B is not all of S. The reason is that part of A overlaps with B, and we have double counted that area when adding up the two areas. Thus we need to subtract out the overlap once to avoid the double count. So the area of A ∪ B is .6 + .4 − .3 = .7 = P(A ∪ B). Thus, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). This is often referred to as the addition rule of probability. Now P(Aᶜ ∩ B) = P(B) − P(A ∩ B) = .4 − .3 = .1, and P(Aᶜ ∪ B) = P(Aᶜ) + P(B) − P(Aᶜ ∩ B) = (1 − .6) + .4 − .1 = .7. Using De Morgan we get Aᶜ ∪ B = (A ∩ Bᶜ)ᶜ and P(Aᶜ ∪ B) = 1 − P(A ∩ Bᶜ) = 1 − [P(A) − P(A ∩ B)] = 1 − (.6 − .3) = .7. So we have checked the answer using the addition rule with De Morgan's rules. And what about P(Aᶜ ∪ Bᶜ)? Using De Morgan, we get 1 − P(A ∩ B) = 1 − .3 = .7. Using the addition rule, we get P(Aᶜ) + P(Bᶜ) − P(Aᶜ ∩ Bᶜ) = (1 − .6) + (1 − .4) − [1 − P(A ∪ B)] = .4 + .6 − .3 = .7. Again, the same both ways.

Conditional Probability, Independence and the Chain Rule. Suppose two events A and B are mutually exclusive. Thus, there is no overlap of the circles representing the two events A and B. Now what is the answer to the following question: what is the probability of A given that B has occurred? Obviously the answer is 0, since A cannot occur if B has occurred. What if A and B do overlap?
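The arithmetic in this worked example can be checked in a few lines:

```python
# The three probabilities given in the text.
p_A, p_B, p_AB = 0.6, 0.4, 0.3

p_A_or_B = p_A + p_B - p_AB                  # addition rule: P(A ∪ B) = 0.7
p_Ac_and_B = p_B - p_AB                      # P(Ac ∩ B) = 0.1
p_Ac_or_B = (1 - p_A) + p_B - p_Ac_and_B     # addition rule: P(Ac ∪ B) = 0.7
p_Ac_or_B_dm = 1 - (p_A - p_AB)              # De Morgan via (A ∩ Bc)c: 0.7
p_Ac_or_Bc = 1 - p_AB                        # De Morgan: P(Ac ∪ Bc) = 0.7

print(p_A_or_B, p_Ac_and_B, p_Ac_or_B, p_Ac_or_B_dm, p_Ac_or_Bc)
```

That the addition-rule route and the De Morgan route produce the same numbers is exactly the cross-check made in the text.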
In this case we need to ask how likely it is for A to occur knowing that we are inside circle B. But the only way that A can occur inside B is if A overlaps with B, since the part of A outside B is not relevant. In this case the answer is the area of A ∩ B divided by the area of B. And since the areas of the events are proportional to their probabilities, the answer is P(A given B) = P(A ∩ B)/P(B). For simplicity, we write P(A given B) as P(A|B). The chain rule follows directly by multiplying both sides of this relationship by P(B): P(A ∩ B) = P(B)·P(A|B). In other words, A and B can occur if B occurs and then A occurs given that B has occurred. Of course, we could equally well say A and B can occur if first A occurs and then B occurs given that A has occurred; that is, P(A ∩ B) = P(A)·P(B|A). Returning to our craps example, the probability of rolling a 4 on the first roll and winning
the game is P[R(4)] · P[R(4) on a later roll before rolling a 7, given R(4) on the first roll]. But the rolls are independent of each other, meaning that the outcome of a later roll does not depend on the outcome of the first roll. So the probability of rolling a 4 before rolling a 7 = the number of outcomes resulting in a roll of 4 divided by the number of outcomes resulting in a roll of 4 or a 7 = 3/9. Thus, P[R(4)] · P[R(4) before R(7), given that the first roll is a 4] = (3/36)·(3/9) = 1/36. [You now have all you need to calculate the probability of winning at the craps table! Give it a try.]

Formalizing the concept of independence, we say two events A and B are independent if and only if P(A|B) = P(A). But since P(A|B) = P(A ∩ B)/P(B), if A and B are independent we see that P(A|B) = P(A ∩ B)/P(B) = P(A). Multiplying both sides of this last equation by P(B), we get P(A ∩ B) = P(A)·P(B). It is also the case that if P(A ∩ B) = P(A)·P(B), then A and B are independent. In fact, some texts define independence in just this way.

Bayesian inference. The idea is that as new information arises, our prior probability estimates for the state of the world must be revised. As a simple illustration, suppose I believe that a coin is fair; that is, P(H) = P(T) = .5, where H is a head and T is a tail. Now suppose I toss the coin three times and every time a head appears. Am I likely to continue to believe that the coin is fair? Suppose I am allowed to choose between two coins to toss, one of which is fair, the other having two heads. I choose a coin at random and then am allowed to toss it three times. Now if any of the three tosses is tails, I am certain that it is the fair coin. But what is the probability that the coin is fair given that all three tosses are heads? To answer this question we turn to Bayes. Let 3H be the event of tossing three straight heads, F = fair coin and B = biased coin.
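The full calculation suggested as an exercise applies the chain rule to every possible first roll. A sketch, assuming the simplified rules stated earlier (7 or 11 wins and 2, 3, or 12 loses on the first roll; any other sum becomes the "point," which must be rolled again before a 7):

```python
from fractions import Fraction

# ways[k] = number of the 36 two-dice outcomes summing to k
ways = {k: 6 - abs(k - 7) for k in range(2, 13)}

# Win immediately by rolling 7 or 11 on the first roll.
p_win = Fraction(ways[7] + ways[11], 36)

# Otherwise, establish a point and win by repeating it before a 7.
for point in (4, 5, 6, 8, 9, 10):
    p_point = Fraction(ways[point], 36)                          # chain rule, step 1
    p_repeat_before_7 = Fraction(ways[point], ways[point] + ways[7])  # step 2
    p_win += p_point * p_repeat_before_7

print(p_win, float(p_win))  # 244/495 ≈ 0.49293
```

Each term in the loop is the same (3/36)·(3/9)-style product worked out in the text for the point of 4, summed over all six possible points.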
We know the following probabilities: P(F) = P(B) = .5, since I chose the coin at random. And P(F|3H) = P(F ∩ 3H)/P(3H) = P(F ∩ 3H)/[P(F ∩ 3H) + P(B ∩ 3H)]. Now P(F ∩ 3H) = P(F)·P(3H|F) = .5·.125 = .0625 and P(B ∩ 3H) = P(B)·P(3H|B) = .5·1 = .5. Thus, P(F|3H) = .0625/(.0625 + .5) = .0625/.5625 = 1/9 = .111. So there is an 8/9 probability the coin is biased.

Now let us generalize Bayes' Theorem. Let there be n possible states of nature, S1, S2, …, Sn, and k possible outcomes/events that can occur under each state, say O1, O2, …, Ok. Let us suppose we conduct our experiment and Oi occurs. The question then is: what is the probability that nature is in state Sj, given that Oi occurred? Using the logic we just used to solve the problem above, we can generalize the problem to finding

P(Sj|Oi) = P(Sj ∩ Oi)/P(Oi) = P(Sj ∩ Oi)/[P(S1 ∩ Oi) + P(S2 ∩ Oi) + … + P(Sn ∩ Oi)]
         = P(Sj)·P(Oi|Sj)/[P(S1)·P(Oi|S1) + P(S2)·P(Oi|S2) + … + P(Sn)·P(Oi|Sn)].

An easy way to work this type of problem is to set up a contingency table as follows.
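The two-coin calculation fits this formula directly:

```python
# Posterior probability the coin is fair given three straight heads.
p_fair, p_biased = 0.5, 0.5          # priors: the coin was chosen at random
p_3h_given_fair = 0.5 ** 3           # P(3H | F) = 0.125
p_3h_given_biased = 1.0              # the two-headed coin always shows heads

joint_fair = p_fair * p_3h_given_fair          # P(F ∩ 3H) = 0.0625
joint_biased = p_biased * p_3h_given_biased    # P(B ∩ 3H) = 0.5

p_fair_given_3h = joint_fair / (joint_fair + joint_biased)
print(p_fair_given_3h)  # 1/9 ≈ 0.111
```

Observing three heads has revised the belief that the coin is fair from the prior of 1/2 down to a posterior of 1/9.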
Contingency Table for Solving a Bayesian Problem

          S1               S2               …   Sj               …   Sn               Row sum
O1        P(S1)·P(O1|S1)   P(S2)·P(O1|S2)   …   P(Sj)·P(O1|Sj)   …   P(Sn)·P(O1|Sn)   P(O1)
O2        P(S1)·P(O2|S1)   P(S2)·P(O2|S2)   …   P(Sj)·P(O2|Sj)   …   P(Sn)·P(O2|Sn)   P(O2)
…
Oi        P(S1)·P(Oi|S1)   P(S2)·P(Oi|S2)   …   P(Sj)·P(Oi|Sj)   …   P(Sn)·P(Oi|Sn)   P(Oi)
…
Ok        P(S1)·P(Ok|S1)   P(S2)·P(Ok|S2)   …   P(Sj)·P(Ok|Sj)   …   P(Sn)·P(Ok|Sn)   P(Ok)
Col sum   P(S1)            P(S2)            …   P(Sj)            …   P(Sn)            1

Then if we want to know P(Sj|Oi), we divide the entry in the ith row and jth column by the ith row sum.

Below we present historical weather data for three cities: New York City, Miami, and Atlanta. Each cell is the probability of the weather condition in the row for the city in the column. Thus, the probability of rain in NY is 0.1 and the probability of clear skies for Atlanta is 0.4. Now let us suppose our prior probabilities for being in each city are P(NY) = 0.3, P(Miami) = 0.5, and P(Atlanta) = 0.2. In the second chart we present the joint probabilities of each weather condition for each city. For example, P(Miami and Rain) = P(Miami)·P(Rain|Miami) = 0.5·0.4 = 0.2.

Conditional probabilities of each condition given each city

                     NY    Miami   Atlanta
Rain                 0.1   0.4     0.2
Cloudy but no rain   0.3   0.4     0.4
Clear skies          0.6   0.2     0.4

Joint probabilities of each condition and each city

                     NY     Miami   Atlanta   Row sum
Rain                 0.03   0.2     0.04      0.27
Cloudy but no rain   0.09   0.2     0.08      0.37
Clear skies          0.18   0.1     0.08      0.36
Col sum              0.3    0.5     0.2       1

Now the probability that I am in Miami given that the weather is cloudy is P(Miami ∩ Cloudy)/P(Cloudy) = .2/.37 = 0.54, which is slightly greater than the a priori probability of my being in Miami, 0.5.
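The weather example can be computed from the two tables above; the dictionary layout and function name are illustrative choices:

```python
# Priors for each city and conditional weather probabilities from the text.
prior = {"NY": 0.3, "Miami": 0.5, "Atlanta": 0.2}
p_weather_given_city = {
    "Rain":   {"NY": 0.1, "Miami": 0.4, "Atlanta": 0.2},
    "Cloudy": {"NY": 0.3, "Miami": 0.4, "Atlanta": 0.4},
    "Clear":  {"NY": 0.6, "Miami": 0.2, "Atlanta": 0.4},
}

def posterior(city, weather):
    """P(city | weather) by Bayes' rule: a joint-table cell over its row sum."""
    joint = {c: prior[c] * p_weather_given_city[weather][c] for c in prior}
    return joint[city] / sum(joint.values())

print(round(posterior("Miami", "Cloudy"), 2))  # 0.2/0.37 ≈ 0.54
```

The `joint` dictionary inside `posterior` is one row of the joint-probability table, and dividing a cell by the row sum is exactly the contingency-table recipe given above.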