Computing Posterior Probabilities
Vassilis Athitsos
CSE 4308/5360: Artificial Intelligence I
University of Texas at Arlington
2 Overview of Candy Bag Example
As described in Russell and Norvig (Chapter 20 of the 2nd edition), there are five kinds of bags of candies:
- 10% are h_1: 100% cherry candies.
- 20% are h_2: 75% cherry candies + 25% lime candies.
- 40% are h_3: 50% cherry candies + 50% lime candies.
- 20% are h_4: 25% cherry candies + 75% lime candies.
- 10% are h_5: 100% lime candies.
Each bag has an infinite number of candies, so the ratio of candy types inside a bag does not change as we pick candies out of it. We have a bag, and we are picking candies out of it. Based on the types of candies we pick, we want to figure out what type of bag we have.
3 Hypotheses and Prior Probabilities
Five kinds of bags of candies:
- 10% are h_1: 100% cherry candies.
- 20% are h_2: 75% cherry candies + 25% lime candies.
- 40% are h_3: 50% cherry candies + 50% lime candies.
- 20% are h_4: 25% cherry candies + 75% lime candies.
- 10% are h_5: 100% lime candies.
Each h_i is called a hypothesis. The initial probability given for each hypothesis is called the prior probability of that hypothesis. It is called "prior" because it is the probability we have before we have made any observations.
4 Observations and Posteriors
Out of our bag, we pick T candies, whose types are Q_1, Q_2, ..., Q_T. Each Q_j is equal to either C (cherry) or L (lime). These Q_j's are called the observations. Based on our observations, we want to answer two types of questions:
- What is P(h_i | Q_1, ..., Q_t)? This is the probability of hypothesis h_i after t observations, called the posterior probability of h_i.
- What is P(Q_{t+1} = C | Q_1, ..., Q_t)? Similarly, what is P(Q_{t+1} = L | Q_1, ..., Q_t)? This is the probability of observation t+1 after t observations.
5 Simplifying Notation
Define:
- P_t(h_i) = P(h_i | Q_1, ..., Q_t)
- P_t(Q_{t+1} = C) = P(Q_{t+1} = C | Q_1, ..., Q_t)
Special case: t = 0 (no observations):
- P_0(h_i) = P(h_i); this is the prior probability of h_i.
- P_0(Q_1 = C) = P(Q_1 = C); this is the probability that the first observation is equal to C.
6 Questions We Want to Answer, Revisited
Using the simplified notation of the previous slide:
- What is P_t(h_i)? The posterior probability of hypothesis h_i after t observations.
- What is P_t(Q_{t+1} = C)? Similarly, what is P_t(Q_{t+1} = L)? The probability of observation t+1 after t observations.
7 Computing P_0(Q_t)
As an example, consider P_0(Q_1 = C). What does P_0(Q_1 = C) mean?
8 Computing P_0(Q_t)
As an example, consider P_0(Q_1 = C). What does P_0(Q_1 = C) mean? It is the probability that the first candy we pick out of our bag is a cherry candy. By the law of total probability:
P_0(Q_1 = C) = P(Q_1 = C | h_1) P(h_1) + P(Q_1 = C | h_2) P(h_2) + P(Q_1 = C | h_3) P(h_3) + P(Q_1 = C | h_4) P(h_4) + P(Q_1 = C | h_5) P(h_5)
9 Computing P_0(Q_t)
As an example, consider P_0(Q_1 = C). What does P_0(Q_1 = C) mean? It is the probability that the first candy we pick out of our bag is a cherry candy.
P_0(Q_1 = C) = 1.0 * 0.1 + 0.75 * 0.2 + 0.5 * 0.4 + 0.25 * 0.2 + 0.0 * 0.1 = 0.5
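These slide computations are easy to verify programmatically. Below is a minimal Python sketch (our own addition, not from the slides; the names `priors` and `cherry_prob` are ours) that encodes the five hypotheses and computes P_0(Q_1 = C) by total probability:

```python
# Prior probability of each hypothesis h_1, ..., h_5.
priors = [0.1, 0.2, 0.4, 0.2, 0.1]

# P(Q = C | h_i): the fraction of cherry candies in each bag type.
cherry_prob = [1.0, 0.75, 0.5, 0.25, 0.0]

# Law of total probability: P_0(Q_1 = C) = sum_i P(Q_1 = C | h_i) * P(h_i).
p0_cherry = sum(c * p for c, p in zip(cherry_prob, priors))
print(p0_cherry)  # 0.5
```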
10 Computing P_1(h_i)
As an example, consider P_1(h_1). What does P_1(h_1) mean?
P_1(h_1) = P(h_1 | Q_1 = C)
P_1(h_1) is the probability that our bag is of type h_1, if the first candy we pick out of our bag is a cherry candy.
11 Computing P_1(h_i)
As an example, consider P_1(h_1).
P_1(h_1) = P(h_1 | Q_1 = C)
P(h_1 | Q_1 = C) = ?
12 Computing P_1(h_i)
As an example, consider P_1(h_1).
P_1(h_1) = P(h_1 | Q_1 = C)
By Bayes rule:
P(h_1 | Q_1 = C) = P(Q_1 = C | h_1) P(h_1) / P(Q_1 = C)
13 Computing P_1(h_i)
As an example, consider P_1(h_1).
P_1(h_1) = P(h_1 | Q_1 = C)
P(h_1 | Q_1 = C) = P(Q_1 = C | h_1) P(h_1) / P(Q_1 = C) = (1.0 * 0.1) / 0.5 = 0.2
14 Computing P_1(h_i)
Consider P_1(h_2).
P_1(h_2) = P(h_2 | Q_1 = C) = P(Q_1 = C | h_2) P(h_2) / P(Q_1 = C) = (0.75 * 0.2) / 0.5 = 0.3
15 Computing P_1(h_i)
Consider P_1(h_3).
P_1(h_3) = P(h_3 | Q_1 = C) = P(Q_1 = C | h_3) P(h_3) / P(Q_1 = C) = (0.5 * 0.4) / 0.5 = 0.4
16 Computing P_1(h_i)
Consider P_1(h_4).
P_1(h_4) = P(h_4 | Q_1 = C) = P(Q_1 = C | h_4) P(h_4) / P(Q_1 = C) = (0.25 * 0.2) / 0.5 = 0.1
17 Computing P_1(h_i)
Consider P_1(h_5).
P_1(h_5) = P(h_5 | Q_1 = C) = P(Q_1 = C | h_5) P(h_5) / P(Q_1 = C) = (0.0 * 0.1) / 0.5 = 0.0
18 Updated Probabilities, after Q_1 = C
- P_0(h_1) = 0.1 → P_1(h_1) = 0.2 (h_1: 100% cherry candies)
- P_0(h_2) = 0.2 → P_1(h_2) = 0.3 (h_2: 75% cherry + 25% lime)
- P_0(h_3) = 0.4 → P_1(h_3) = 0.4 (h_3: 50% cherry + 50% lime)
- P_0(h_4) = 0.2 → P_1(h_4) = 0.1 (h_4: 25% cherry + 75% lime)
- P_0(h_5) = 0.1 → P_1(h_5) = 0.0 (h_5: 100% lime candies)
The probabilities have changed for each bag type, now that we have picked one candy and seen that it is a cherry candy.
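A single application of Bayes rule in code reproduces this table. This is again our own illustrative sketch, reusing the `priors` and `cherry_prob` names from the earlier snippet:

```python
# Priors P_0(h_i) and likelihoods P(Q = C | h_i) for bag types h_1, ..., h_5.
priors = [0.1, 0.2, 0.4, 0.2, 0.1]
cherry_prob = [1.0, 0.75, 0.5, 0.25, 0.0]

# Evidence: P_0(Q_1 = C), by total probability.
p0_cherry = sum(c * p for c, p in zip(cherry_prob, priors))

# Bayes rule for each hypothesis: P_1(h_i) = P(Q_1 = C | h_i) * P_0(h_i) / P_0(Q_1 = C).
posteriors = [c * p / p0_cherry for c, p in zip(cherry_prob, priors)]
print([round(p, 4) for p in posteriors])  # [0.2, 0.3, 0.4, 0.1, 0.0]
```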
19 Computing P_1(Q_t)
Now, consider P_1(Q_2 = C). What does P_1(Q_2 = C) mean? It is the probability that the second candy we pick out of our bag is a cherry candy, given our knowledge of the type of the first candy. To continue with our previous example, we assume that the first candy was cherry, so Q_1 = C.
P_1(Q_2 = C) = P(Q_2 = C | Q_1 = C)
20 Computing P_1(Q_t)
P_1(Q_2 = C) = P(Q_2 = C | Q_1 = C)
= P(Q_2 = C | h_1, Q_1 = C) P(h_1 | Q_1 = C)
+ P(Q_2 = C | h_2, Q_1 = C) P(h_2 | Q_1 = C)
+ P(Q_2 = C | h_3, Q_1 = C) P(h_3 | Q_1 = C)
+ P(Q_2 = C | h_4, Q_1 = C) P(h_4 | Q_1 = C)
+ P(Q_2 = C | h_5, Q_1 = C) P(h_5 | Q_1 = C)
21 Computing P_1(Q_t)
P_1(Q_2 = C) = P(Q_2 = C | Q_1 = C)
= P(Q_2 = C | h_1, Q_1 = C) P(h_1 | Q_1 = C) + ... + P(Q_2 = C | h_5, Q_1 = C) P(h_5 | Q_1 = C)
NOTE: Q_2 is conditionally independent of Q_1 given h_i. If we know the type of bag, what we have already picked does not change our expectation of what we will pick next. Therefore, we can simplify P(Q_2 = C | h_1, Q_1 = C) to P(Q_2 = C | h_1).
22 Computing P_1(Q_t)
We simplify P(Q_2 = C | h_i, Q_1 = C) to P(Q_2 = C | h_i):
P_1(Q_2 = C) = P(Q_2 = C | Q_1 = C)
= P(Q_2 = C | h_1) P(h_1 | Q_1 = C)
+ P(Q_2 = C | h_2) P(h_2 | Q_1 = C)
+ P(Q_2 = C | h_3) P(h_3 | Q_1 = C)
+ P(Q_2 = C | h_4) P(h_4 | Q_1 = C)
+ P(Q_2 = C | h_5) P(h_5 | Q_1 = C)
23 Computing P_1(Q_t)
P_1(Q_2 = C) = P(Q_2 = C | h_1) P(h_1 | Q_1 = C) + ... + P(Q_2 = C | h_5) P(h_5 | Q_1 = C)
We can now plug in numbers everywhere: we know P(Q_2 = C | h_i), and we computed P(h_i | Q_1 = C) in the previous slides, where we called it P_1(h_i).
24 Computing P_1(Q_t)
P_1(Q_2 = C) = P(Q_2 = C | Q_1 = C) = 1.0 * 0.2 + 0.75 * 0.3 + 0.5 * 0.4 + 0.25 * 0.1 + 0.0 * 0.0 = 0.65
Notice the difference caused by the knowledge that Q_1 = C:
P_0(Q_1 = C) = 0.5
P_1(Q_2 = C) = 0.65
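The corresponding check in code, a sketch under the same assumptions as the earlier snippets: the predictive probability is just the likelihoods weighted by the current posterior P_1(h_i):

```python
# P(Q = C | h_i), and the posteriors P_1(h_i) computed on the previous slides.
cherry_prob = [1.0, 0.75, 0.5, 0.25, 0.0]
p1 = [0.2, 0.3, 0.4, 0.1, 0.0]

# Posterior predictive: P_1(Q_2 = C) = sum_i P(Q_2 = C | h_i) * P_1(h_i).
p1_next_cherry = sum(c * p for c, p in zip(cherry_prob, p1))
print(round(p1_next_cherry, 4))  # 0.65
```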
25 Computing P_2(h_i)
Now, consider P_2(h_1). What does P_2(h_1) mean? We have defined P_2(h_1) as the probability of h_1 after the first two observations. In our example, the first two observations were both of type cherry. Therefore:
P_2(h_1) = P(h_1 | Q_1 = C, Q_2 = C) = ???
26 A Special Case of Bayes Rule
The normal version of Bayes rule states that:
P(A | B) = P(B | A) P(A) / P(B)
From the basic formula, we can derive a special case of Bayes rule that we can apply if we also know some other fact F:
P(A | B, F) = P(B | A, F) P(A | F) / P(B | F)
Here, we want to compute P(h_1 | Q_1 = C, Q_2 = C). We will apply the special case of Bayes rule, with:
- h_1 as A.
- "Q_2 = C" as B.
- "Q_1 = C" as F.
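The slide uses this special case without deriving it. For completeness, here is a short derivation (our addition, not from the slides), using only the definition of conditional probability and the chain rule:

```latex
\begin{align*}
P(A \mid B, F)
  &= \frac{P(A, B, F)}{P(B, F)}
   = \frac{P(B \mid A, F)\, P(A, F)}{P(B \mid F)\, P(F)} \\
  &= \frac{P(B \mid A, F)\, P(A \mid F)\, P(F)}{P(B \mid F)\, P(F)}
   = \frac{P(B \mid A, F)\, P(A \mid F)}{P(B \mid F)}
\end{align*}
```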
27 Computing P_2(h_i)
P_2(h_1) = P(h_1 | Q_2 = C, Q_1 = C) = P(Q_2 = C | h_1, Q_1 = C) P(h_1 | Q_1 = C) / P(Q_2 = C | Q_1 = C)
These are all quantities we have computed before.
28 Computing P_2(h_i)
P_2(h_1) = P(h_1 | Q_2 = C, Q_1 = C) = P(Q_2 = C | h_1, Q_1 = C) P(h_1 | Q_1 = C) / P(Q_2 = C | Q_1 = C)
These are all quantities we have computed before:
- P(Q_2 = C | h_1, Q_1 = C) = P(Q_2 = C | h_1) = 1.
- P(h_1 | Q_1 = C) = P_1(h_1) = 0.2.
- P(Q_2 = C | Q_1 = C) = P_1(Q_2 = C) = 0.65.
29 Computing P_2(h_i)
P_2(h_1) = P(h_1 | Q_2 = C, Q_1 = C) = P(Q_2 = C | h_1, Q_1 = C) P(h_1 | Q_1 = C) / P(Q_2 = C | Q_1 = C)
= P(Q_2 = C | h_1) P_1(h_1) / P_1(Q_2 = C) = (1.0 * 0.2) / 0.65 ≈ 0.3077
30 Computing P_2(h_i)
P_2(h_2) = P(h_2 | Q_2 = C, Q_1 = C) = P(Q_2 = C | h_2, Q_1 = C) P(h_2 | Q_1 = C) / P(Q_2 = C | Q_1 = C)
= P(Q_2 = C | h_2) P_1(h_2) / P_1(Q_2 = C) = (0.75 * 0.3) / 0.65 ≈ 0.3462
31 Computing P_2(h_i)
P_2(h_3) = P(h_3 | Q_2 = C, Q_1 = C) = P(Q_2 = C | h_3, Q_1 = C) P(h_3 | Q_1 = C) / P(Q_2 = C | Q_1 = C)
= P(Q_2 = C | h_3) P_1(h_3) / P_1(Q_2 = C) = (0.5 * 0.4) / 0.65 ≈ 0.3077
32 Computing P_2(h_i)
P_2(h_4) = P(h_4 | Q_2 = C, Q_1 = C) = P(Q_2 = C | h_4, Q_1 = C) P(h_4 | Q_1 = C) / P(Q_2 = C | Q_1 = C)
= P(Q_2 = C | h_4) P_1(h_4) / P_1(Q_2 = C) = (0.25 * 0.1) / 0.65 ≈ 0.0385
33 Computing P_2(h_i)
P_2(h_5) = P(h_5 | Q_2 = C, Q_1 = C) = P(Q_2 = C | h_5, Q_1 = C) P(h_5 | Q_1 = C) / P(Q_2 = C | Q_1 = C)
= P(Q_2 = C | h_5) P_1(h_5) / P_1(Q_2 = C) = (0.0 * 0.0) / 0.65 = 0.0
34 Updated Probabilities
Probabilities of bags: before any observations, after one observation (assuming the first candy is of type cherry), and after two observations (assuming both candies are of type cherry).
- P_0(h_1) = 0.1, P_1(h_1) = 0.2, P_2(h_1) ≈ 0.3077 (h_1: 100% cherry candies)
- P_0(h_2) = 0.2, P_1(h_2) = 0.3, P_2(h_2) ≈ 0.3462 (h_2: 75% cherry + 25% lime)
- P_0(h_3) = 0.4, P_1(h_3) = 0.4, P_2(h_3) ≈ 0.3077 (h_3: 50% cherry + 50% lime)
- P_0(h_4) = 0.2, P_1(h_4) = 0.1, P_2(h_4) ≈ 0.0385 (h_4: 25% cherry + 75% lime)
- P_0(h_5) = 0.1, P_1(h_5) = 0.0, P_2(h_5) = 0.0 (h_5: 100% lime candies)
35 Computing P_t(h_i)
Let t be an integer between 1 and T. We have defined P_t(h_i) = P(h_i | Q_1, ..., Q_t). To compute P_t(h_i), we again use the special case of Bayes rule that we used before:
P(A | B, F) = P(B | A, F) P(A | F) / P(B | F)
We will apply this formula, using:
- h_i as A.
- Q_t as B.
- "Q_1, Q_2, ..., Q_{t-1}" as F.
36 Computing P_t(h_i)
Let t be an integer between 1 and T:
P_t(h_i) = P(h_i | Q_1, ..., Q_t) = P(Q_t | h_i, Q_1, ..., Q_{t-1}) * P(h_i | Q_1, ..., Q_{t-1}) / P(Q_t | Q_1, ..., Q_{t-1})
Since Q_t is conditionally independent of Q_1, ..., Q_{t-1} given h_i, this simplifies to:
P_t(h_i) = P(Q_t | h_i) * P_{t-1}(h_i) / P_{t-1}(Q_t)
37 Computing P_t(Q_{t+1})
P_t(Q_{t+1}) = P(Q_{t+1} | Q_1, ..., Q_t) = Σ_{i=1}^{5} P(Q_{t+1} | h_i) * P(h_i | Q_1, ..., Q_t)
Therefore:
P_t(Q_{t+1}) = Σ_{i=1}^{5} P(Q_{t+1} | h_i) * P_t(h_i)
38 Computing P_t(h_i) (continued)
The formula
P_t(h_i) = P(Q_t | h_i) * P_{t-1}(h_i) / P_{t-1}(Q_t)
is recursive, as it requires knowing P_{t-1}(h_i). The base case is P_0(h_i) = P(h_i). To compute P_t(h_i) we also need P_{t-1}(Q_t); we show how to compute that next.
39 Computing P_t(h_i) and P_t(Q_{t+1})
Base case: t = 0.
- P_0(h_i) = P(h_i), where P(h_i) is known.
- P_0(Q_1) = Σ_{i=1}^{5} P(Q_1 | h_i) * P(h_i), where P(Q_1 | h_i) is known.
To compute P_t(h_i) and P_t(Q_{t+1}), for j = 1, ..., t:
- Compute P_j(h_i) = P(Q_j | h_i) * P_{j-1}(h_i) / P_{j-1}(Q_j)
- Compute P_j(Q_{j+1}) = Σ_{i=1}^{5} P(Q_{j+1} | h_i) * P_j(h_i)
40 Computing P_t(h_i) and P_t(Q_{t+1})
Base case: t = 0.
- P_0(h_i) = P(h_i), where P(h_i) is known.
- P_0(Q_1) = Σ_{i=1}^{5} P(Q_1 | h_i) * P(h_i), where P(Q_1 | h_i) is known.
To compute P_t(h_i) and P_t(Q_{t+1}), for j = 1, ..., t:
- Compute P_j(h_i) = P(Q_j | h_i) * P_{j-1}(h_i) / P_{j-1}(Q_j). Here P(Q_j | h_i) is known, while P_{j-1}(h_i) and P_{j-1}(Q_j) were computed at the previous round.
- Compute P_j(Q_{j+1}) = Σ_{i=1}^{5} P(Q_{j+1} | h_i) * P_j(h_i). Here P(Q_{j+1} | h_i) is known, while P_j(h_i) was computed at the previous line.
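Putting the whole recursion together, here is a compact Python sketch of the algorithm above (our own code; helper names such as `likelihood`, `predictive`, and `posteriors` are hypothetical, not from the slides). Running it on the observation sequence 'CC' reproduces the table of slide 34:

```python
# Prior probability of each hypothesis h_1, ..., h_5, and P(Q = C | h_i)
# for each bag type; P(Q = L | h_i) = 1 - P(Q = C | h_i).
PRIORS = [0.1, 0.2, 0.4, 0.2, 0.1]
CHERRY = [1.0, 0.75, 0.5, 0.25, 0.0]

def likelihood(q, i):
    """P(Q = q | h_i), with q equal to 'C' or 'L'."""
    return CHERRY[i] if q == 'C' else 1.0 - CHERRY[i]

def predictive(q, posterior):
    """P_t(Q_{t+1} = q) = sum_i P(Q = q | h_i) * P_t(h_i)."""
    return sum(likelihood(q, i) * posterior[i] for i in range(len(posterior)))

def posteriors(observations):
    """Run the recursive update for an observation sequence like 'CC'."""
    p = PRIORS[:]  # base case: P_0(h_i) = P(h_i)
    for q in observations:
        # Denominator P_{j-1}(Q_j), computed from the previous round's posterior.
        evidence = predictive(q, p)
        # Bayes rule: P_j(h_i) = P(Q_j | h_i) * P_{j-1}(h_i) / P_{j-1}(Q_j).
        p = [likelihood(q, i) * p[i] / evidence for i in range(len(p))]
    return p

p2 = posteriors('CC')
print([round(x, 4) for x in p2])      # [0.3077, 0.3462, 0.3077, 0.0385, 0.0]
print(round(predictive('C', p2), 4))  # P_2(Q_3 = C)
```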