Probability, Entropy, and Inference / More About Inference


1 Probability, Entropy, and Inference / More About Inference. Mário S. Alvim (msalvim@dcc.ufmg.br), Information Theory, DCC-UFMG (2018/02)

2 Probability, Entropy, and Inference

3 Definitions and notation

An ensemble $X$ is a triple $(x, \mathcal{A}_X, \mathcal{P}_X)$, where:
- $x$ is the outcome of a random variable;
- $\mathcal{A}_X = \{a_1, a_2, \ldots, a_i, \ldots, a_I\}$ is the set of possible values for the random variable, called an alphabet;
- $\mathcal{P}_X = \{p_1, p_2, \ldots, p_I\}$ are the probabilities of each value, with $P(x = a_i) = p_i$, $p_i \geq 0$, and $\sum_{a_i \in \mathcal{A}_X} P(x = a_i) = 1$.

We may write $P(x = a_i)$ as $P(a_i)$ or $P(x)$ when no confusion is possible.

4 Definitions and notation

Example 1: Frequency of letters in The Frequently Asked Questions Manual for Linux. The outcome is a letter that is randomly selected from an English document.

5 Definitions and notation

Probability of a subset. If $T$ is a subset of $\mathcal{A}_X$ then
$$P(T) = P(x \in T) = \sum_{a_i \in T} P(x = a_i).$$

Example 1 (Continued): If we define $V$ to be the set of vowels, $V = \{a, e, i, o, u\}$, then
$$P(V) = p(a) + p(e) + p(i) + p(o) + p(u),$$
whose value follows from the letter frequencies of Example 1.

6 Definitions and notation

Joint ensemble. $XY$ is an ensemble in which each outcome is an ordered pair $(x, y)$ with $x \in \mathcal{A}_X = \{a_1, \ldots, a_I\}$ and $y \in \mathcal{A}_Y = \{b_1, \ldots, b_J\}$.

We call $P(x, y)$ the joint probability of $x$ and $y$. Commas are optional when writing ordered pairs, so $xy \equiv x, y$.

In a joint ensemble $XY$ the two random variables may or may not be independent.

7 Definitions and notation

Example 1 (Continued): Frequency of bigrams (pairs $xy$ of letters) in The FAQ Manual for Linux. The outcome is a pair of consecutive letters randomly selected from an English document.

8 Definitions and notation

We can obtain the marginal probability $P(x)$ from the joint probability $P(x, y)$ by summation:
$$P(x = a_i) \equiv \sum_{y \in \mathcal{A}_Y} P(x = a_i, y).$$

Similarly, using briefer notation, the marginal probability of $y$ is
$$P(y) \equiv \sum_{x \in \mathcal{A}_X} P(x, y).$$

Conditional probability. If $P(y = b_j) \neq 0$,
$$P(x = a_i \mid y = b_j) \equiv \frac{P(x = a_i, y = b_j)}{P(y = b_j)}.$$
If $P(y = b_j) = 0$, then $P(x = a_i \mid y = b_j)$ is undefined.

9 Definitions and notation

Example 1 (Continued): Conditional probabilities of bigrams in The FAQ Manual for Linux.

10 Definitions and notation

We will use $\mathcal{H}$ to denote the assumptions on which the probabilities are based.

Product rule / chain rule:
$$P(x, y \mid \mathcal{H}) = P(x \mid \mathcal{H}) \, P(y \mid x, \mathcal{H}) = P(y \mid \mathcal{H}) \, P(x \mid y, \mathcal{H}).$$

Sum rule:
$$P(x \mid \mathcal{H}) = \sum_y P(x, y \mid \mathcal{H}) = \sum_y P(y \mid \mathcal{H}) \, P(x \mid y, \mathcal{H}).$$

Independence. Two random variables $X$ and $Y$ are independent (sometimes written $X \perp Y$) if, and only if, for all $x, y$:
$$P(x, y) = P(x) \, P(y).$$
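A minimal sketch in Python of how the sum rule, the conditional-probability definition, and the independence test look when the joint distribution is stored as a table. The joint probabilities and the function names are made up for illustration.

```python
# A toy joint distribution P(x, y) stored as a dictionary; the numbers sum to 1.
joint = {
    ('a', 'b1'): 0.10, ('a', 'b2'): 0.30,
    ('c', 'b1'): 0.15, ('c', 'b2'): 0.45,
}

def marginal_x(joint, x):
    """Sum rule: P(x) = sum over y of P(x, y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(joint, y):
    """Sum rule: P(y) = sum over x of P(x, y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_x_given_y(joint, x, y):
    """P(x | y) = P(x, y) / P(y); undefined if P(y) = 0."""
    py = marginal_y(joint, y)
    if py == 0:
        raise ValueError("P(y) = 0: conditional probability is undefined")
    return joint.get((x, y), 0.0) / py

def independent(joint, tol=1e-12):
    """Check P(x, y) = P(x) P(y) for every pair in the table."""
    return all(abs(p - marginal_x(joint, x) * marginal_y(joint, y)) < tol
               for (x, y), p in joint.items())

print(marginal_x(joint, 'a'))                  # 0.4
print(conditional_x_given_y(joint, 'a', 'b1'))  # 0.4
print(independent(joint))                       # True for this particular table
```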

11 What probability means

Probabilities can be interpreted in two main ways:

1. The frequentist interpretation sees probabilities as frequencies of outcomes in random experiments. This is the usual approach we have taken so far.

2. The Bayesian interpretation (also known as the degree-of-belief, knowledge, or subjective interpretation) sees probabilities as the degree to which one believes in the truth of propositions depending on random variables. For instance, juries in trials usually try to assess the probability that Mr. Jones killed his wife, given the evidence.

The degree-of-belief interpretation of probabilities is essential for probabilistic reasoning, in particular Bayesian reasoning.

12 Cox axioms for reasoning about belief

Notation. Let the degree of belief in proposition $x$ be denoted by $B(x)$, the negation of $x$ be denoted by $\bar{x}$, and the degree of belief in a conditional proposition, $x$ assuming proposition $y$ to be true, be denoted by $B(x \mid y)$.

13 Cox axioms for reasoning about belief

The Cox axioms establish what is essential to reason about belief:

A1. Degrees of belief can be ordered: if $B(x)$ is greater than $B(y)$, and $B(y)$ is greater than $B(z)$, then $B(x)$ is greater than $B(z)$. (As a consequence, beliefs can be mapped onto real numbers.)

A2. The degree of belief in a proposition $x$ and in its negation $\bar{x}$ are related: there is a function $f$ such that $B(\bar{x}) = f(B(x))$.

A3. The degree of belief in a conjunction of propositions $x, y$ (i.e., $x \wedge y$) is related to the degree of belief in the conditional proposition $x \mid y$ and the degree of belief in the proposition $y$: there is a function $g$ such that $B(x, y) = g(B(x \mid y), B(y))$.

14 Probabilities as belief

If a set of beliefs satisfies the Cox axioms, we can map them onto probabilities satisfying
$$P(\text{false}) = 0, \quad P(\text{true}) = 1, \quad 0 \leq P(x) \leq 1,$$
and also satisfying the rules of probability
$$P(x) = 1 - P(\bar{x}), \qquad P(x, y) = P(x \mid y) \, P(y).$$

This is actually what we have been using all along when applying Bayesian reasoning.

Probabilities as belief or knowledge are a (very beautiful!) generalization of logic. I strongly recommend E. T. Jaynes's book Probability Theory: The Logic of Science.

15 Forward and inverse probabilities

Using probabilities as degrees of belief, we can make inferences both about future outcomes of an experiment and about its past parameters. We can talk, then, about forward and inverse probabilities.

16 Forward and inverse probabilities

Forward probability problems involve a generative model that describes a process that is assumed to give rise to some data. The task is to compute the probability distribution or expectation of some quantity that depends on the data.

17 Forward and inverse probabilities

Example 2: An urn contains $K$ balls, of which $B$ are black and $W = K - B$ are white. Fred draws a ball at random from the urn and replaces it, $N$ times.

a) What is the probability distribution of the number of times a black ball is drawn, $n_B$?

b) What is the expectation of $n_B$? What is the variance of $n_B$? What is the standard deviation of $n_B$? Give numerical answers for the cases $N = 5$ and $N = 400$, when $B = 2$ and $K = 10$.

Solution. a) Let us define $f_B = B/K$ as the fraction of black balls in the urn. The number of black balls drawn follows a binomial distribution:
$$P(n_B \mid f_B, N) = \binom{N}{n_B} f_B^{n_B} (1 - f_B)^{N - n_B}.$$

18 Forward and inverse probabilities

Example 2 (Continued)

b) Since we have a binomial distribution, its expectation is
$$E(n_B) = N f_B,$$
its variance is
$$\mathrm{Var}(n_B) = N f_B (1 - f_B),$$
and its standard deviation is
$$\sigma(n_B) = \sqrt{N f_B (1 - f_B)}.$$

Hence, when $B = 2$ and $K = 10$ we have $f_B = 1/5$, and:
- for $N = 5$ we have $E(n_B) = 1$, $\mathrm{Var}(n_B) = 4/5$, and $\sigma(n_B) = \sqrt{4/5} \approx 0.89$;
- for $N = 400$ we have $E(n_B) = 80$, $\mathrm{Var}(n_B) = 64$, and $\sigma(n_B) = 8$.
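A quick numerical check of these formulas, as a small Python sketch (the helper name binomial_summary is illustrative):

```python
import math

def binomial_summary(N, f_B):
    """Mean, variance and standard deviation of a Binomial(N, f_B) count."""
    mean = N * f_B
    var = N * f_B * (1 - f_B)
    return mean, var, math.sqrt(var)

for N in (5, 400):
    mean, var, sd = binomial_summary(N, f_B=2 / 10)
    print(N, mean, var, round(sd, 3))
# N = 5:   E = 1.0,  Var = 0.8,  sigma ~ 0.894
# N = 400: E = 80.0, Var = 64.0, sigma = 8.0
```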

19 Forward and inverse probabilities

Like forward probability problems, inverse probability problems involve a generative model of a process. But instead of computing the probability distribution of some quantity produced by the process, we compute the conditional probability of one or more of the unobserved variables in the process, given the observed variables. This invariably requires the use of Bayes' theorem.

20 Forward and inverse probabilities

Example 3: Urn A contains three balls: one black, and two white; urn B contains three balls: two black, and one white. One of the urns is selected at random and one ball is drawn. The ball is black. What is the probability that the selected urn is A?

Solution. We have two alternative hypotheses: the selected urn is A or it is B. There are two possible observations: the ball drawn is black (b) or it is white (w). We can find $P(\text{urn} = A \mid \text{ball} = b)$ using Bayes' theorem:
$$P(\text{urn} = A \mid \text{ball} = b) = \frac{P(\text{urn} = A, \text{ball} = b)}{P(\text{ball} = b)} = \frac{P(A) \, P(b \mid A)}{P(A) \, P(b \mid A) + P(B) \, P(b \mid B)} = \frac{1/2 \cdot 1/3}{1/2 \cdot 1/3 + 1/2 \cdot 2/3} = \frac{1}{3}.$$
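The same computation in a few lines of Python, as a sketch of Bayes' theorem applied to Example 3 (variable names are illustrative):

```python
# Posterior probability that the urn is A given that a black ball was drawn.
p_urn = {'A': 1 / 2, 'B': 1 / 2}          # prior: urn chosen at random
p_black_given = {'A': 1 / 3, 'B': 2 / 3}  # likelihood of drawing black from each urn

evidence = sum(p_urn[u] * p_black_given[u] for u in p_urn)   # P(ball = black)
posterior_A = p_urn['A'] * p_black_given['A'] / evidence
print(posterior_A)  # 0.333... = 1/3
```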

21 Forward and inverse probabilities

As we said, using probabilities as degrees of belief, we can make inferences about future outcomes and about past parameters of an experiment.

Notation. Let $\mathcal{H}$ be the overall hypothesis space; $\theta$ be the parameters of an experiment; and $D$ be the data obtained as the outcome of the experiment.

$p(\theta \mid \mathcal{H})$ is the prior probability of the parameter $\theta$, that is, the probability of the parameter before the experiment is run.

22 Forward and inverse probabilities

The conditional distribution $p(D \mid \theta, \mathcal{H})$ represents how the experiment evolves, relating the parameter $\theta$ and the data $D$ given some hypothesis $\mathcal{H}$. Depending on what we are reasoning about, this quantity has different names:

1. If we fix the value of the parameter $\theta$, $p(D \mid \theta, \mathcal{H})$ is the conditional probability of observing data $D$ if the parameter takes the value $\theta$ under hypothesis $\mathcal{H}$.

2. If we fix the value of the data $D$, $p(D \mid \theta, \mathcal{H})$ is the likelihood of the parameter $\theta$ given that we observed data $D$, under hypothesis $\mathcal{H}$.

23 Forward and inverse probabilities

$p(D \mid \mathcal{H})$ is the evidence, that is, the probability of producing data $D$ under hypothesis $\mathcal{H}$, after the experiment is run. Naturally,
$$p(D \mid \mathcal{H}) = \sum_{\theta} p(\theta, D \mid \mathcal{H}) = \sum_{\theta} p(\theta \mid \mathcal{H}) \, p(D \mid \theta, \mathcal{H}).$$

The general equation is written as
$$p(\theta \mid D, \mathcal{H}) = \frac{p(\theta \mid \mathcal{H}) \, p(D \mid \theta, \mathcal{H})}{p(D \mid \mathcal{H})}, \qquad \text{i.e.,} \qquad \text{posterior probability} = \frac{\text{prior probability} \times \text{likelihood}}{\text{evidence}}.$$
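For a finite parameter space the whole equation fits in a few lines. A sketch, assuming the parameter takes one of a small set of values; the function name and the example numbers are made up:

```python
def posterior(prior, likelihood):
    """Bayes' rule on a finite parameter space.

    prior:      dict theta -> p(theta | H)
    likelihood: dict theta -> p(D | theta, H) for the observed data D
    Returns     dict theta -> p(theta | D, H).
    """
    evidence = sum(prior[t] * likelihood[t] for t in prior)   # p(D | H)
    return {t: prior[t] * likelihood[t] / evidence for t in prior}

# Two equally likely parameter values with different likelihoods.
print(posterior({'theta1': 0.5, 'theta2': 0.5},
                {'theta1': 0.2, 'theta2': 0.6}))
# {'theta1': 0.25, 'theta2': 0.75}
```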

24 Probability vs. Likelihood

We can say:
- the probability of the data, meaning the probability of obtaining data $D$ given parameter $\theta$, under hypothesis $\mathcal{H}$;
- the likelihood of the parameter, meaning how likely it is that a given parameter $\theta$ generated the data $D$ we observed, under hypothesis $\mathcal{H}$.

We cannot say "the likelihood of the data".

25 The Likelihood Principle

Let us consider now a more complicated example involving urns and balls.

Example 4: Urn A contains five balls: one black, two white, one green, and one pink; urn B contains five hundred balls: two hundred black, one hundred white, 50 yellow, 40 cyan, 30 sienna, 25 olive, 25 silver, 20 gold, and 10 purple. (One fifth of A's balls are black; two fifths of B's balls are black.) One of the urns is selected at random and one ball is drawn. The ball is black. What is the probability that the selected urn is A?

Solution. You can do it, guys! I know you can!
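A sketch of the calculation in Python. Only the probability of drawing a black ball from each urn enters the computation, which is exactly the point made on the next slide:

```python
p_urn = {'A': 1 / 2, 'B': 1 / 2}         # prior: urn chosen at random
p_black = {'A': 1 / 5, 'B': 200 / 500}   # the other colours never enter the calculation

evidence = sum(p_urn[u] * p_black[u] for u in p_urn)
print(p_urn['A'] * p_black['A'] / evidence)  # 1/3, the same answer as in Example 3
```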

26 The Likelihood Principle

In the previous two examples, does each answer depend on the detailed contents of each urn? No! All that matters is the probability of the outcome that actually happened (here, that the ball drawn was black) given the different hypotheses. We only need to know the likelihood, i.e., how the probability of the data that happened varies with the hypothesis. This simple rule about inference is known as the likelihood principle.

The likelihood principle: given a generative model for data $D$ given parameters $\theta$, $P(D \mid \theta)$, and having observed a particular outcome $d$, all inferences and predictions should depend only on the function $P(d \mid \theta)$.

In other words, the likelihood principle says that once an outcome $d$ is observed, the probabilities of the other outcomes that could have occurred, but did not, have no influence on the inferences we make about the parameter $\theta$.

27 More About Inference

28 Bayes' Theorem at the heart of inference

We have seen many examples showing that Bayes' Theorem is very useful when doing inference. If we are presented with data $D$ and have a possible explanation $E$ for the data, Bayes' theorem tells us that
$$p(E \mid D, \mathcal{H}) = \frac{p(E \mid \mathcal{H}) \, p(D \mid E, \mathcal{H})}{p(D \mid \mathcal{H})}.$$
(Recall that $\mathcal{H}$ denotes the assumptions on which the probabilities are based.)

29 Bayes' Theorem at the heart of inference

The probability $p(E \mid D, \mathcal{H})$ we obtain will depend on the assumptions we made, such as:
- the hypotheses $\mathcal{H}$, and
- the a priori probability $p(E \mid \mathcal{H})$ of the explanation.

However: you can't do inference or data compression without making assumptions, and Bayesians won't apologize for it.

30 Why Bayesian inference is good inference

1. Once assumptions are made, the inferences are objective, unique, and reproducible, with complete agreement by anyone who has the same information and makes the same assumptions.

2. Bayesian inference makes assumptions explicit, so you are forced to think clearly about what hypotheses your model is based on. Moreover, once assumptions are explicit, they are easier to criticize and easier to modify.

31 Why Bayesian inference is good inference

3. When we are not sure which of various alternative assumptions is the most appropriate for a problem, we can treat this question as another inference task. Given data $D$, we can compare alternative assumptions $\mathcal{H}$ using
$$P(\mathcal{H} \mid D, I) = \frac{P(\mathcal{H} \mid I) \, P(D \mid \mathcal{H}, I)}{P(D \mid I)},$$
where $I$ denotes the highest assumptions, which we are not questioning.

4. We can take into account our uncertainty about the assumptions themselves when making subsequent predictions. Rather than choosing one particular assumption $\mathcal{H}$ and predicting a quantity $t$ using $P(t \mid D, \mathcal{H}, I)$, we can average over the assumptions:
$$P(t \mid D, I) = \sum_{\mathcal{H}} P(\mathcal{H} \mid D, I) \, P(t \mid D, \mathcal{H}, I).$$
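A sketch of points 3 and 4 in Python. All numbers are hypothetical: post_H stands for $P(\mathcal{H} \mid D, I)$ and predictive[H] for $P(t \mid D, \mathcal{H}, I)$ over two possible values of $t$.

```python
# Posterior over two alternative sets of assumptions, P(H | D, I).
post_H = {'H1': 0.7, 'H2': 0.3}

# Predictive distributions P(t | D, H, I) under each assumption (made up).
predictive = {
    'H1': {'t=0': 0.9, 't=1': 0.1},
    'H2': {'t=0': 0.4, 't=1': 0.6},
}

# Model averaging: P(t | D, I) = sum over H of P(H | D, I) * P(t | D, H, I).
p_t = {t: sum(post_H[H] * predictive[H][t] for H in post_H)
       for t in predictive['H1']}
print(p_t)  # {'t=0': 0.75, 't=1': 0.25}
```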

32 Model comparison

Assume we have two competing models for a phenomenon, represented by hypothesis $H_1$ and hypothesis $H_2$, and that we run an experiment and observe data $D$. Which model ($H_1$ or $H_2$) better explains the data $D$ is indicated by the ratio
$$\frac{p(H_1 \mid D)}{p(H_2 \mid D)},$$
so that:
- if $p(H_1 \mid D) / p(H_2 \mid D) > 1$, hypothesis $H_1$ is a more probable explanation for $D$ than $H_2$;
- if $p(H_1 \mid D) / p(H_2 \mid D) < 1$, hypothesis $H_2$ is a more probable explanation for $D$ than $H_1$;
- if $p(H_1 \mid D) / p(H_2 \mid D) = 1$, hypotheses $H_1$ and $H_2$ are equally good at explaining $D$.

33 Model comparison

Note that this ratio can be decomposed into
$$\frac{p(H_1 \mid D)}{p(H_2 \mid D)} = \frac{p(H_1) \, p(D \mid H_1) / p(D)}{p(H_2) \, p(D \mid H_2) / p(D)} = \frac{p(H_1)}{p(H_2)} \cdot \frac{p(D \mid H_1)}{p(D \mid H_2)},$$
where:
- $p(H_1)/p(H_2)$ is the ratio of the prior probabilities of the competing models. This ratio indicates the relative probability of the hypotheses before data $D$ is collected.
- $p(H_1)/p(H_2) = k$ means that $H_1$ is a priori $k$ times more probable than $H_2$.

34 Model comparison

Note that this ratio can be decomposed into
$$\frac{p(H_1 \mid D)}{p(H_2 \mid D)} = \frac{p(H_1) \, p(D \mid H_1) / p(D)}{p(H_2) \, p(D \mid H_2) / p(D)} = \frac{p(H_1)}{p(H_2)} \cdot \frac{p(D \mid H_1)}{p(D \mid H_2)},$$
where:
- The likelihood ratio $p(D \mid H_1)/p(D \mid H_2)$ is the ratio of the evidence for each model. This ratio serves as a correction factor for the prior ratio, taking into consideration the weight the collected evidence $D$ has on each hypothesis.
- $p(D \mid H_1)/p(D \mid H_2) = c$ means that hypothesis $H_1$ is $c$ times more likely an explanation for data $D$ than hypothesis $H_2$:
  - $c > 1$ means that evidence $D$ favors hypothesis $H_1$ over hypothesis $H_2$;
  - $c < 1$ means that evidence $D$ favors hypothesis $H_2$ over hypothesis $H_1$;
  - $c = 1$ means that evidence $D$ favors neither hypothesis.

35 Inference - An example of legal evidence

Example 5: Two people have left traces of their own blood at the scene of a crime. A suspect, Oliver, is tested and found to have type O blood. The blood groups of the two traces are found to be of type O (a common type in the local population, having frequency 60%) and of type AB (a rare type, with frequency 1%). Do these data (type O and AB blood were found at the scene) give evidence in favor of the proposition that Oliver was one of the two people present at the crime?

36 Inference - An example of legal evidence

Example 5 (Continued)

Solution. A careless lawyer might claim that the fact that the suspect's blood type was found at the crime scene is evidence for the theory that he was present. This is wrong.

Let $S$ be the proposition "the suspect and one unknown person were present", and let the alternative proposition $\bar{S}$ be "two unknown people from the population were present". Let $D$ be the data collected (i.e., there is one sample of blood type O and one of blood type AB at the scene), and let $\mathcal{H}$ be the set of underlying hypotheses. We want to evaluate the contribution the data $D$ gives as evidence for each proposition ($S$ or $\bar{S}$), i.e., the ratio
$$\frac{p(D \mid S, \mathcal{H})}{p(D \mid \bar{S}, \mathcal{H})}.$$

37 Inference - An example of legal evidence

Example 5 (Continued)

Let us call $p_{AB}$ and $p_O$ the probabilities of a person in the general population having blood type AB and O, respectively.

We know that $p(D \mid S, \mathcal{H}) = p_{AB}$, since given $S$ we already know that one trace will be of type O. We also know that $p(D \mid \bar{S}, \mathcal{H}) = 2 \, p_O \, p_{AB}$.

Hence, the ratio of the evidences is
$$\frac{p(D \mid S, \mathcal{H})}{p(D \mid \bar{S}, \mathcal{H})} = \frac{p_{AB}}{2 \, p_O \, p_{AB}} = \frac{1}{2 p_O} = \frac{1}{1.2} \approx 0.83,$$
so the data provides evidence against the supposition that the suspect, Oliver, was present.

38 Inference - An example of legal evidence

Example 6: Assume the same scenario as in the example above, but now there is another suspect, Alberto, whose blood type is AB. Does the evidence found (one sample of blood type O and one sample of blood type AB) favor the proposition that Alberto was one of the two people present at the crime?

39 Inference - An example of legal evidence

Example 6 (Continued)

Solution. Intuitively, it may seem that the data does provide evidence that Alberto was at the crime scene. But let's be careful and do the right calculations.

Let $S$ be the proposition "the suspect (Alberto) and one unknown person were present", and let the alternative proposition $\bar{S}$ be "two unknown people from the population were present". We know that $p(D \mid S, \mathcal{H}) = p_O$, since given $S$ we already know that one trace will be of type AB. The ratio of the evidences is
$$\frac{p(D \mid S, \mathcal{H})}{p(D \mid \bar{S}, \mathcal{H})} = \frac{p_O}{2 \, p_O \, p_{AB}} = \frac{1}{2 p_{AB}} = \frac{1}{0.02} = 50,$$
so the data provides evidence for the supposition that Alberto was present.
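A small Python sketch computing both likelihood ratios at once (Oliver's from Example 5 and Alberto's from Example 6); the variable names are illustrative:

```python
p_O, p_AB = 0.60, 0.01  # population frequencies of blood types O and AB

# Data D: one type-O stain and one type-AB stain at the scene.
p_D_given_not_S = 2 * p_O * p_AB    # two unknown people, in either order

ratio_oliver = p_AB / p_D_given_not_S   # suspect of type O present (Example 5)
ratio_alberto = p_O / p_D_given_not_S   # suspect of type AB present (Example 6)
print(round(ratio_oliver, 2), ratio_alberto)  # 0.83 and 50.0
```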

40 Inference - An example of legal evidence

Example 7: Assume the same scenario as in the example above, but now consider that 99% of the population is of blood type O, and 1% is of blood type AB (and that no other blood type exists). Intuitively, the evidence found (one sample of blood type O and one sample of blood type AB) favors the proposition that Alberto was one of the two people present at the crime. On the other hand, is it reasonable to assume that the evidence found also favors the proposition that Oliver was one of the two people present at the crime?

41 Inference - An example of legal evidence

Example 7 (Continued)

Solution. Intuitively (since there exist only blood types AB and O), the evidence cannot increase the probability that both a person of blood type AB (such as Alberto) and a person of blood type O (such as Oliver) were present. That would mean that the evidence increases the probability of every suspect, no matter whom, being at the crime scene, which is nonsense.

In other words: data may be compatible with several theories, but if some data provides evidence for some theories, it necessarily provides evidence against some other theories.

42 Inference - An example of legal evidence

Example 8: Assume once again the same scenario as in the example above. Consider that $n_O$ blood stains of individuals of type O and $n_{AB}$ blood stains of individuals of type AB were found at the crime scene, out of $N$ blood stains found in total. Consider that the frequency of blood type O in the general population is $p_O$, and the frequency of blood type AB in the general population is $p_{AB}$ (there may be other blood types too). Find a closed formula for the likelihood ratio of the following two competing hypotheses:
- $S$: the type O suspect (Oliver) and $N - 1$ unknown others left the $N$ stains; and
- $\bar{S}$: $N$ unknowns left the $N$ stains.

43 Inference - An example of legal evidence

Example 8 (Continued)

Solution. For hypothesis $\bar{S}$ we have
$$P(n_O, n_{AB} \mid \bar{S}) = \frac{N!}{n_O! \, n_{AB}!} \, p_O^{n_O} \, p_{AB}^{n_{AB}}.$$

For hypothesis $S$ we have
$$P(n_O, n_{AB} \mid S) = \frac{(N-1)!}{(n_O - 1)! \, n_{AB}!} \, p_O^{n_O - 1} \, p_{AB}^{n_{AB}},$$
since, because Oliver is fixed, we only need the distribution of the other $N - 1$ individuals.

44 Inference - An example of legal evidence

Example 8 (Continued)

The likelihood ratio is
$$\frac{P(n_O, n_{AB} \mid S)}{P(n_O, n_{AB} \mid \bar{S})} = \frac{\dfrac{(N-1)!}{(n_O - 1)! \, n_{AB}!} \, p_O^{n_O - 1} \, p_{AB}^{n_{AB}}}{\dfrac{N!}{n_O! \, n_{AB}!} \, p_O^{n_O} \, p_{AB}^{n_{AB}}} = \frac{n_O / N}{p_O}.$$

This likelihood ratio means that the evidence depends on a comparison between the frequency of Oliver's blood type in the data ($n_O / N$) and the frequency of Oliver's blood type in the general population ($p_O$):

1. If Oliver's blood type is more frequent in the data than in the population, the evidence counts for his presence at the crime scene.
2. If Oliver's blood type is less frequent in the data than in the population, the evidence counts for his absence from the crime scene.
3. If Oliver's blood type is as frequent in the data as in the population, the evidence is neutral with respect to his presence at the crime scene.
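A sketch in Python that evaluates both forms of the ratio, the direct one and the closed formula, so they can be checked against each other on hypothetical counts:

```python
from math import factorial

def likelihood_ratio(n_O, n_AB, N, p_O, p_AB):
    """P(n_O, n_AB | S) / P(n_O, n_AB | not S), direct vs. closed form."""
    p_S = (factorial(N - 1) / (factorial(n_O - 1) * factorial(n_AB))
           * p_O ** (n_O - 1) * p_AB ** n_AB)
    p_not_S = (factorial(N) / (factorial(n_O) * factorial(n_AB))
               * p_O ** n_O * p_AB ** n_AB)
    return p_S / p_not_S, (n_O / N) / p_O   # the two should agree

print(likelihood_ratio(n_O=1, n_AB=1, N=2, p_O=0.60, p_AB=0.01))
# (0.833..., 0.833...)  -- recovers Example 5 as a special case
```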

45 Model comparison - Examples

Example 9: A die is selected at random from two twenty-faced dice on which the symbols 1-10 are written with nonuniform frequency as follows.

Symbol:                    1  2  3  4  5  6  7  8  9  10
Number of faces of die A:  6  4  3  2  1  1  1  1  1  0
Number of faces of die B:  3  3  2  2  2  2  2  2  1  1

The randomly chosen die is rolled 7 times, with the following outcomes: 5, 3, 9, 3, 8, 4, 7. Is this sequence evidence in favor of the die being die A or die B?

46 Model comparison - Examples

Example 9 (Continued)

The observed data $D$ is the sequence of outcomes 5, 3, 9, 3, 8, 4, 7. The competing hypotheses are whether the die is A or B. We can, then, compute the ratio of posterior probabilities:
$$\frac{p(A \mid D)}{p(B \mid D)} = \frac{p(A)}{p(B)} \cdot \frac{p(D \mid A)}{p(D \mid B)} = \frac{1/2}{1/2} \cdot \frac{\frac{1}{20} \cdot \frac{3}{20} \cdot \frac{1}{20} \cdot \frac{3}{20} \cdot \frac{1}{20} \cdot \frac{2}{20} \cdot \frac{1}{20}}{\frac{2}{20} \cdot \frac{2}{20} \cdot \frac{1}{20} \cdot \frac{2}{20} \cdot \frac{2}{20} \cdot \frac{2}{20} \cdot \frac{2}{20}} = \frac{18}{64} = \frac{9}{32}.$$

Since the ratio is smaller than 1, the evidence favors the die being die B. Note that, knowing that $p(A \mid D) + p(B \mid D) = 1$, we can go further and infer that $p(A \mid D) = 9/41$ and $p(B \mid D) = 32/41$.
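A sketch in Python that reproduces the ratio 9/32 and the posteriors 9/41 and 32/41, using the face counts from the table in Example 9 (exact arithmetic with fractions):

```python
from fractions import Fraction

faces_A = {1: 6, 2: 4, 3: 3, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 0}
faces_B = {1: 3, 2: 3, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2, 9: 1, 10: 1}
rolls = [5, 3, 9, 3, 8, 4, 7]

def likelihood(faces, rolls):
    """P(D | die) for independent rolls of a 20-faced die with the given face counts."""
    p = Fraction(1)
    for r in rolls:
        p *= Fraction(faces[r], 20)
    return p

lA, lB = likelihood(faces_A, rolls), likelihood(faces_B, rolls)
print(lA / lB)                         # 9/32
print(lA / (lA + lB), lB / (lA + lB))  # 9/41 32/41 (the equal priors cancel)
```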

47 Model comparison - Examples

Example 10: Consider again the same dice A and B from the previous example, but assume now that there is a third twenty-faced die, die C, on which the symbols 1-20 are written once each. As above, one of the three dice is selected at random and rolled 7 times, giving the outcomes 3, 5, 4, 8, 3, 9, 7.

What is the probability that the die is a) die A? b) die B? c) die C?

48 Model comparison - Examples

Example 10 (Continued)

Solution. We can compute the posterior probabilities directly using the formulas
$$p(A \mid D) = \frac{p(A) \, p(D \mid A)}{p(D)}, \qquad p(B \mid D) = \frac{p(B) \, p(D \mid B)}{p(D)}, \qquad p(C \mid D) = \frac{p(C) \, p(D \mid C)}{p(D)}.$$

First, let us compute the likelihoods:
$$p(D \mid A) = \frac{3 \cdot 1 \cdot 2 \cdot 1 \cdot 3 \cdot 1 \cdot 1}{20^7} = \frac{18}{20^7}, \qquad p(D \mid B) = \frac{2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 1 \cdot 2}{20^7} = \frac{64}{20^7}, \qquad p(D \mid C) = \frac{1}{20^7}.$$

49 Model comparison - Examples

Example 10 (Continued)

Let us assume that all dice are equally likely, i.e., $p(A) = p(B) = p(C) = 1/3$. Then we can compute the evidence
$$p(D) = p(A) \, p(D \mid A) + p(B) \, p(D \mid B) + p(C) \, p(D \mid C) = \frac{1}{3} \cdot \frac{18}{20^7} + \frac{1}{3} \cdot \frac{64}{20^7} + \frac{1}{3} \cdot \frac{1}{20^7} = \frac{83}{3 \cdot 20^7}.$$

50 Model comparison - Examples

Example 10 (Continued)

And, finally, we can compute:
$$p(A \mid D) = \frac{p(A) \, p(D \mid A)}{p(D)} = \frac{\frac{1}{3} \cdot \frac{18}{20^7}}{\frac{83}{3 \cdot 20^7}} = \frac{18}{83}, \qquad p(B \mid D) = \frac{p(B) \, p(D \mid B)}{p(D)} = \frac{\frac{1}{3} \cdot \frac{64}{20^7}}{\frac{83}{3 \cdot 20^7}} = \frac{64}{83}, \qquad p(C \mid D) = \frac{p(C) \, p(D \mid C)}{p(D)} = \frac{\frac{1}{3} \cdot \frac{1}{20^7}}{\frac{83}{3 \cdot 20^7}} = \frac{1}{83}.$$
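A sketch extending the same exact-fraction computation to the three dice (die C has each of the symbols 1-20 once), using the face counts from the table in Example 9:

```python
from fractions import Fraction

faces_A = {1: 6, 2: 4, 3: 3, 4: 2, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 0}
faces_B = {1: 3, 2: 3, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2, 9: 1, 10: 1}
faces_C = {s: 1 for s in range(1, 21)}
rolls = [3, 5, 4, 8, 3, 9, 7]

def likelihood(faces, rolls):
    """P(D | die) for independent rolls of a 20-faced die with the given face counts."""
    p = Fraction(1)
    for r in rolls:
        p *= Fraction(faces.get(r, 0), 20)
    return p

prior = Fraction(1, 3)
lik = {d: likelihood(f, rolls)
       for d, f in {'A': faces_A, 'B': faces_B, 'C': faces_C}.items()}
evidence = sum(prior * lik[d] for d in lik)               # p(D)
posterior = {d: prior * lik[d] / evidence for d in lik}    # p(die | D)
print(posterior['A'], posterior['B'], posterior['C'])      # 18/83 64/83 1/83
```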
