Class #10: Introduction to Probability Theory
Artificial Intelligence (CS 452/552): M. Allen, Wednesday, 27 Sep. 2017

A Problem Involving Games
- Two players put money in on a game of chance
- First one to a certain number of points, P, wins the money
- The game is interrupted before either wins, however:
  - Player one has n < P points
  - Player two has m < P points
- Who wins?

Pacioli's Solution
- Summa de Arithmetica, Geometria, Proportioni et Proportionalita (1487)
- Each player gets a proportion given by points won so far, relative to the total points won by all players:
  - Player with n points gets: n / (n + m)
  - Player with m points gets: m / (n + m)
- According to Pacioli, any other solution is simply preposterous

Problems for Pacioli: Small Samples
- What if we've only played one round?
- What if we've played a few rounds, but there are many more left still to go?
- Is it fair that I win all or most of the money if I've only won a few games by luck at the start?
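Pacioli's rule is simple enough to state directly in code. A minimal sketch (the function name and the `stake` argument are illustrative, not from the slides):

```python
def pacioli_split(n, m, stake=1.0):
    """Divide the stake in proportion to points won so far (Pacioli's rule)."""
    total = n + m
    return (stake * n / total, stake * m / total)

# A player leading 7-3 takes 70% of the pot, regardless of how many
# points P are actually needed to win:
print(pacioli_split(7, 3))  # (0.7, 0.3)
```

Note how the small-sample objection shows up immediately: after a single round, `pacioli_split(1, 0)` hands the entire stake to whoever happened to win it.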
Tartaglia's Solution
- Trattato generale di numeri e misure (1556)
- The winner so far gets an extra share in proportion to the current lead, relative to the total needed to win
- So if the player with n points leads, and P points complete the game, that player gets their own half of the money, plus an extra bonus, taken out of the losing player's share: (n - m) / P

The Trouble with Tartaglia: Ignoring Likely Outcomes
- Suppose we play to 100 points and the situation is:
  - Player 1: 80, Player 2: 70?
  - Player 1: 99, Player 2: 89?
  - Player 1: 15, Player 2: 5?
- What's the difference between these situations?
- Depending upon the situation, the current lead may or may not tell us much about how likely we are to win in the future

The Birth of Probability Theory
- This Problem of Points (among other things) led to the creation of probability theory
- A realization that it didn't really matter how many games you had already won
- Instead, we want to think about the chance that you are going to win in the future
- What was wanted:
  1. An exact measure of how likely something is
  2. A system for calculating with such likelihood measurements

Creators of Probability Theory
- Blaise Pascal (1623-62)
- Pierre de Fermat (1601-65)
- Christiaan Huygens (1629-95)
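The three scenarios above can be checked directly. The sketch below (function names are illustrative) computes Tartaglia's split for each, and compares it with the forward-looking answer in the Pascal-Fermat spirit: the probability that the current leader reaches P first, assuming each remaining round is a fair 50/50 toss:

```python
from functools import lru_cache

def tartaglia_share(n, m, P):
    """Leader's fraction of the pot: half the stake plus a bonus of (n - m) / P."""
    return 0.5 + (n - m) / P

@lru_cache(maxsize=None)
def p_leader_wins(a, b):
    """P(leader scores a more points before the trailer scores b), fair rounds."""
    if a == 0:
        return 1.0
    if b == 0:
        return 0.0
    return 0.5 * (p_leader_wins(a - 1, b) + p_leader_wins(a, b - 1))

P = 100
for n, m in [(80, 70), (99, 89), (15, 5)]:
    share = tartaglia_share(n, m, P)
    prob = p_leader_wins(P - n, P - m)
    print(f"{n} vs {m}: Tartaglia gives {share:.2f}, win probability {prob:.3f}")
```

Tartaglia awards 0.60 of the pot in all three cases (the lead is 10 points each time), but the leader's actual chance of finishing first differs sharply: near-certain at 99 vs. 89, much closer to even at 15 vs. 5. That is exactly the slide's objection.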
Sources of Uncertainty
- Stochastic environments: players in games who play randomly, random action-effects
- Imprecise models: complex systems (weather, real-life games), unknowns and errors in descriptions (Mars Polar Lander)
- Noisy data: limited range, obscuring weather, defective sensors
- Limitless exceptions to our knowledge: "Mammals have live young," "Republicans are military hawks"
- Many more

Reasoning Under Uncertainty
- Even simple exceptions to rules cause problems for apparently logical reasoning patterns:

  Usually, mammals bear live young
  Usually, warm-blooded animals are mammals
  Usually, warm-blooded animals bear live young

  Usually, people who join the Army are male
  Usually, female ROTC members join the Army
  Usually, female ROTC members are male

Probability and Uncertainty
- A precise framework for exact reasoning under uncertainty
- Allows us to:
  - Combine multiple pieces of evidence
  - Update our beliefs as new evidence comes in
  - Predict what is likely going to happen in the future
  - Diagnose what is likely to have happened in the past, given the present circumstances

Basic Elements of Probability Theory
- Random variable: a thing with uncertain outcomes, and its own domain of values, which could be:
  - Boolean (True/False)
  - Discrete (countable domains)
  - Continuous (real-numbered domains)
- Outcome: particular setting of a value for some variable
  - Example: die1 = 3
- Event: a combination of outcomes
  - Example: die1 = 3 ∧ die2 = 2 ∧ die3 = 6
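The outcome/event distinction can be made concrete by enumerating the 216 equally likely rolls of three dice (a sketch; the variable names are illustrative):

```python
from itertools import product

# Every combination of outcomes for three six-sided dice: 6^3 = 216 triples.
rolls = list(product(range(1, 7), repeat=3))

# The event (die1 = 3) AND (die2 = 2) AND (die3 = 6) picks out exactly one triple.
event = [(d1, d2, d3) for d1, d2, d3 in rolls if (d1, d2, d3) == (3, 2, 6)]

print(len(rolls))               # 216
print(len(event) / len(rolls))  # 1/216, about 0.00463
```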
Random Variables & Outcomes
- Notation:
  - X, Y, Z: individual variables
  - x, y, z: outcome values from the domain of each variable
- We may use more informative names:
  - StudentXGrade: grade for Student X
  - A, B: possible grade values
- Atomic event: particular outcome for a set of variables
  - Perhaps a single event: StudentXGrade = A
  - Perhaps a combination: StudentXGrade = A ∧ StudentYGrade = B

Basic Logical Notation in Play
- Logical symbols:
  - NOT: ¬A (Event A did not occur)
  - AND: A ∧ B (Both A and B occur)
  - OR: A ∨ B (Either A or B, or both, occur)
  - SOME: ∃x (Exists at least one x such that ...)
  - ALL: ∀x (All x are such that ...)
- Basic logical facts:
  - Non-contradiction: A ∧ ¬A == False
  - Excluded Middle: A ∨ ¬A == True

Properties of Basic Events
- Mutually exclusive: at most one event can be true
- Exhaustive: at least one event must be true
- Thus, one (and only one) of the following has to be true, assuming two variables with two values each:
  - StudentXGrade = A ∧ StudentYGrade = A
  - StudentXGrade = A ∧ StudentYGrade = B
  - StudentXGrade = B ∧ StudentYGrade = A
  - StudentXGrade = B ∧ StudentYGrade = B

Probability Distributions
- For any variable and outcome, we have the unconditional probability that it is true:
  P(StudentXGrade = A) = 0.78
- Distribution: collection of all probabilities for a variable:
  P(StudentXGrade) = {0.78, 0.22} over values [A, B]
- We can then calculate the probability of more complex events based on the basic distributions:
  P(StudentXGrade = A ∧ StudentYGrade = B) = ?
- Where do these basic numbers actually come from?
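The mutually-exclusive-and-exhaustive atomic events above can be enumerated mechanically, and a distribution like P(StudentXGrade) checked for consistency (a sketch; the dictionary encoding is illustrative):

```python
from itertools import product

grades = ["A", "B"]

# The four atomic events for two variables with two values each.
atomic_events = list(product(grades, repeat=2))
print(atomic_events)  # [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]

# The distribution P(StudentXGrade) from the slide, one probability per value.
p_student_x = {"A": 0.78, "B": 0.22}

# Because the outcomes are mutually exclusive and exhaustive, their
# probabilities must sum to exactly 1.
assert abs(sum(p_student_x.values()) - 1.0) < 1e-9
```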
Subjective Probability
- One view: probabilities describe what we believe to be true about the world
- How confident we are that something will/won't happen
- Can be based on a number of things (observation of human behavior, expert opinion, polls, etc.)
- Question: what do we do if our beliefs are different than those of others?

Objective Probability
- Another view: probability distributions reflect reality
- Events really do happen with some set probability
- If we have an accurate distribution, then our numbers match with how the world is in and of itself
- Question: what is the mechanism that makes all of this work?

Frequentist Probability
- Another idea: probabilities are based on actual, empirical observation of events over time
- We track how many times different outcomes occur, relative to the total number of events seen so far
- Potential issues:
  - Requires many observations
  - Unrepeatable/very rare events are hard to put meaningful numbers on
  - The objectivist view thinks this is just an attempt to learn what is really there in the first place

Basic Axioms of Probability
1. For any event a, 0 <= P(a) <= 1.
2. P(True) = 1 and P(False) = 0.
3. P(a ∨ b) = P(a) + P(b) - P(a ∧ b).

(Figure: Venn diagram of P(a) and P(b), with their overlap P(a ∧ b) and union P(a ∨ b))
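Axiom 3 (inclusion-exclusion) can be verified by direct counting over a single fair die roll; the two events chosen below are illustrative:

```python
from fractions import Fraction

outcomes = range(1, 7)  # one fair six-sided die

def prob(event):
    """Probability by counting equally likely outcomes, as an exact fraction."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, 6)

def a(o):
    return o % 2 == 0   # a: the roll is even  -> {2, 4, 6},    P(a) = 1/2

def b(o):
    return o > 3        # b: the roll is > 3   -> {4, 5, 6},    P(b) = 1/2

p_both = prob(lambda o: a(o) and b(o))   # P(a ∧ b): {4, 6}       = 1/3
p_either = prob(lambda o: a(o) or b(o))  # P(a ∨ b): {2, 4, 5, 6} = 2/3

# Axiom 3: P(a ∨ b) = P(a) + P(b) - P(a ∧ b)
assert p_either == prob(a) + prob(b) - p_both
print(p_either)  # 2/3
```

Naively adding P(a) + P(b) would give 1, double-counting the rolls {4, 6} that belong to both events; subtracting P(a ∧ b) corrects for the overlap.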
Using the Axioms

  P(a ∨ ¬a) = P(a) + P(¬a) - P(a ∧ ¬a)   [Axiom 3]
  P(True) = P(a) + P(¬a) - P(False)      [Logic]
  1 = P(a) + P(¬a)                       [Axiom 2]
  P(¬a) = 1 - P(a)

(Figure: Venn diagram of P(a) and its complement P(¬a))

Joint Probability
- For random variables {X1, X2, ..., Xn}, the joint distribution gives the probability of each possible combination of outcomes
- Can be written as a table of values:

                       StudentXGrade = A   StudentXGrade = B
  StudentYGrade = A          0.72                0.08
  StudentYGrade = B          0.18                0.02

- Note: if the table is supposed to represent a proper joint distribution, then the numbers must all sum to 1.0

Marginal Probability
- Given the joint distribution, we can get the probability of any outcome by marginalizing: summing all values where the outcome we want is true
- For example, the probability that student X gets an A (left column):
  P(StudentXGrade = A) = 0.72 + 0.18 = 0.9
- The probability that either X or Y gets an A:
  P(StudentXGrade = A ∨ StudentYGrade = A) = 0.72 + 0.18 + 0.08 = 0.98

This Week
- Topics: constraint satisfaction, intro to probability
- Read: sections 6.1-6.5, 13.1-13.3
- Homework 01: due today, by start of class
- Homework 02: will be posted by Friday afternoon
  - Due: Wednesday, 11 October, by start of class
- Office Hours: Wing 210
  - Monday/Tuesday, 4:30 PM-6:00 PM
  - Thursday/Friday, 9:00 AM-10:30 AM
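The marginalizations on the Marginal Probability slide can be reproduced by summing entries of the joint table (a sketch; the dictionary encoding of the table is illustrative):

```python
# Joint distribution P(StudentXGrade, StudentYGrade) from the table above,
# keyed by (x_grade, y_grade).
joint = {
    ("A", "A"): 0.72, ("B", "A"): 0.08,
    ("A", "B"): 0.18, ("B", "B"): 0.02,
}

# A proper joint distribution sums to 1.0.
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Marginalize: P(StudentXGrade = A) sums every entry where x_grade is "A".
p_x_a = sum(p for (x, y), p in joint.items() if x == "A")

# P(StudentXGrade = A or StudentYGrade = A): sum entries where either holds.
# Each entry appears once, so no double-counting is possible.
p_either_a = sum(p for (x, y), p in joint.items() if x == "A" or y == "A")

print(round(p_x_a, 2))       # 0.9
print(round(p_either_a, 2))  # 0.98
```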