An AI-ish view of Probability, Conditional Probability & Bayes Theorem


Review: Uncertainty and Truth Values: a mismatch

Let action A_t = leave for airport t minutes before flight.
Will A_15 get me there on time? true/false
Will A_20 get me there on time? true/false
Will A_30 get me there on time? true/false
Will A_200 get me there on time? true/false
Problems: the world is not
- fully observable (road state, other drivers' plans, etc.)
- deterministic (flat tire, etc.)
- single agent (immense complexity of modeling and predicting traffic)
And often incoming information is wrong: noisy sensors (traffic reports, etc.)

Making decisions under uncertainty

Suppose I believe the following:
P(A_25 gets me there on time) = 0.04
P(A_90 gets me there on time) = 0.70
P(A_120 gets me there on time) = 0.95
P(A_1440 gets me there on time) = 0.9999
Which action to choose? It still depends on my preferences for missing the flight vs. time spent waiting, etc.

Decision Theory

Decision theory develops methods for making optimal decisions in the presence of uncertainty.
Decision theory = utility theory + probability theory.
Utility theory is used to represent and infer preferences: every state has a degree of usefulness.
An agent is rational if and only if it chooses an action A that yields the highest expected utility (expected usefulness).
Let O be the possible outcomes of A, U(o) the utility of outcome o, and P_A(o) the probability of o as an outcome of action A. Then the expected utility of A is

    EU(A) = Σ_{o ∈ O} P_A(o) · U(o)
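
To make the expected-utility rule concrete, here is a minimal Python sketch (not from the slides); the on-time/miss outcome model and the utilities are invented purely for illustration.

```python
# A minimal sketch: choosing among the airport-departure actions by expected
# utility. The utilities below are hypothetical, chosen only to illustrate EU.

def expected_utility(outcome_probs, utility):
    """EU(A) = sum over outcomes o of P_A(o) * U(o)."""
    return sum(p * utility[o] for o, p in outcome_probs.items())

actions = {
    "A_25":   {"on_time": 0.04,   "miss": 0.96},
    "A_90":   {"on_time": 0.70,   "miss": 0.30},
    "A_120":  {"on_time": 0.95,   "miss": 0.05},
    "A_1440": {"on_time": 0.9999, "miss": 0.0001},
}
wait_minutes = {"A_25": 25, "A_90": 90, "A_120": 120, "A_1440": 1440}

# Hypothetical utilities: catching the flight is worth 1000, minus the minutes spent waiting.
best = max(
    actions,
    key=lambda a: expected_utility(
        actions[a],
        {"on_time": 1000 - wait_minutes[a], "miss": -wait_minutes[a]},
    ),
)
print(best)  # with these invented utilities, A_120 comes out best
```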

Probability as Degree of Belief: An AI view of probability

Probability as Degree of Belief

"Given the available evidence, A_25 will get me there on time with probability 0.04."
Probabilistic assertions summarize the ignorance in perception and in models:
- Theoretical ignorance: often we simply have no complete theory of the domain, e.g. medicine.
- Uncertainty (partial observability): even if we know all the rules, we might be uncertain about a particular patient.
- Laziness: too much work to list the complete set of antecedents or consequents to ensure no exceptions.

Probabilities as Degrees of Belief

Subjectivist (Bayesian) (us): probability is a model of the agent's own degree of belief.
Frequentist (not us; many statisticians): probability is inherent in the process; probability is estimated from measurements.

Frequentists: Probability as expected frequency

Frequentist (not us; many statisticians): probability is inherent in the process and is estimated from measurements.
P(A) = 1: A will always occur.
P(A) = 0: A will never occur.
0.5 < P(A) < 1: A will occur more often than not.

Bayesians: Probabilities as Degrees of Belief

Subjectivist (Bayesian) (us): probability is a model of the agent's own degree of belief.
P(A) = 1: the agent completely believes A is true.
P(A) = 0: the agent completely believes A is false.
0.5 < P(A) < 1: the agent believes A is more likely to be true than false.
Increasing evidence strengthens belief and therefore changes the probability estimate.

Frequentists vs. Bayesians I

Actual dialogue one month before the 2008 election:
Mitch (Bayesian): "I think Obama now has an 80% chance of winning."
Sue (Frequentist): "What does that mean? It's either 100% or 0%, and we'll find out on election day."
Why be Bayesian?
- We often need to make inferences about singular events (see above).
- De Finetti: it's good business. If you don't use probabilities (and get them right), you'll lose bets (see discussion in AIMA).

More AI-ish description of probability

Basic assumption 1: Ω = the set of possible complete specifications of all possible states of the world (possible worlds); an agent is typically uncertain which ω ∈ Ω describes the actual world.
E.g., if the world consists of only two Boolean variables Cavity and Toothache, then the sample space consists of 4 distinct elementary events:
ω1: Cavity = false ∧ Toothache = false
ω2: Cavity = false ∧ Toothache = true
ω3: Cavity = true ∧ Toothache = false
ω4: Cavity = true ∧ Toothache = true
Elementary events are mutually exclusive and exhaustive.
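
As a small illustration (not from the slides), the sample space for these two Boolean variables can be enumerated directly:

```python
# A minimal sketch: enumerating the elementary events (possible worlds)
# for the two Boolean variables Cavity and Toothache.
from itertools import product

domains = {"Cavity": [False, True], "Toothache": [False, True]}

# The elementary events are all combinations of variable values.
worlds = [dict(zip(domains, values)) for values in product(*domains.values())]
for i, world in enumerate(worlds, start=1):
    print(f"omega_{i}: {world}")
# Four worlds, mutually exclusive and exhaustive.
```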

AI-ish view: Discrete random variables

A random variable can take on one of a set of different values, each with an associated probability. Its value at a particular time is subject to random variation.
Discrete random variables take on one of a discrete (often finite) range of values; the domain values must be exhaustive and mutually exclusive.
For us, random variables will have a discrete, countable (usually finite) domain of arbitrary values of types well beyond ℝ. (Mathematical statistics calls these random elements.)
Example: Weather is a discrete random variable with domain {sunny, rain, cloudy, snow}.
Example: a Boolean random variable has the domain {true, false}.

A word on notation

Assume Weather is a discrete random variable with domain {sunny, rain, cloudy, snow}.
Weather = sunny is abbreviated sunny.
P(Weather = sunny) = 0.72 is abbreviated P(sunny) = 0.72.
Cavity = true is abbreviated cavity.
Cavity = false is abbreviated ¬cavity.

Factored Representations: Propositions

An elementary proposition is constructed by assignment of a value to a random variable:
e.g. Weather = sunny (abbreviated as sunny)
e.g. Cavity = false (abbreviated as ¬cavity)
A complex proposition is formed from elementary propositions and standard logical connectives:
e.g. Weather = sunny ∨ Cavity = false
AI systems often work with event spaces over such propositions.

Joint probability distribution

A joint probability distribution is a probability assignment to all combinations of values of the random variables (i.e. to all elementary events):

              toothache   ¬toothache
  cavity        0.04         0.06
  ¬cavity       0.01         0.89

The sum of the entries in this table has to be 1.
Every question about a domain can be answered by the joint distribution!
The probability of a proposition is the sum of the probabilities of the elementary events in which it holds:
P(cavity) = 0.1 [marginal of the cavity row]
P(toothache) = 0.05 [marginal of the toothache column]
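
A minimal Python sketch (not from the slides) of the same table, with the two marginals computed by summing the matching entries:

```python
# The 2x2 joint distribution above as a dict, plus its marginals.
joint = {  # (Cavity, Toothache) -> probability
    ("cavity", "toothache"): 0.04,    ("cavity", "no_toothache"): 0.06,
    ("no_cavity", "toothache"): 0.01, ("no_cavity", "no_toothache"): 0.89,
}

assert abs(sum(joint.values()) - 1.0) < 1e-9   # the entries must sum to 1

p_cavity = sum(p for (c, t), p in joint.items() if c == "cavity")
p_toothache = sum(p for (c, t), p in joint.items() if t == "toothache")
print(p_cavity)     # 0.1, floating-point rounding aside
print(p_toothache)  # 0.05, floating-point rounding aside
```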

More on Conditional Probabilities

Review: Conditional Probability

Given a probability space (Ω, P), for any two events A and B with P(B) ≠ 0, we define the conditional probability that A occurs given that B occurs, written P(A | B), as

    P(A | B) = P(A ∧ B) / P(B)

From this follows the chain rule:
    P(A ∧ B) = P(A | B) P(B)
    P(A ∧ B ∧ C) = P(A | B ∧ C) P(B ∧ C) = P(A | B ∧ C) P(B | C) P(C)
    etc.

Conditional Probability

Using the joint table from the previous slide:
P(cavity) = 0.1 and P(cavity ∧ toothache) = 0.04 are both prior (unconditional) probabilities.
Once the agent has new evidence concerning a previously unknown random variable, e.g. Toothache, we can specify a posterior (conditional) probability, e.g. P(cavity | Toothache = true).
P(a | b) = P(a ∧ b) / P(b)   [the probability of a with the universe restricted to b]
The new information restricts the set of possible worlds ω_i consistent with that new information, and so changes the probability.
So P(cavity | toothache) = 0.04 / 0.05 = 0.8
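
The same restriction-of-the-universe computation, as a short Python sketch (not from the slides):

```python
# Conditioning as "restricting the universe" to the worlds where the evidence holds.
joint = {  # (Cavity, Toothache) -> probability
    ("cavity", "toothache"): 0.04,    ("cavity", "no_toothache"): 0.06,
    ("no_cavity", "toothache"): 0.01, ("no_cavity", "no_toothache"): 0.89,
}

p_toothache = sum(p for (c, t), p in joint.items() if t == "toothache")  # 0.05
p_cavity_and_toothache = joint[("cavity", "toothache")]                  # 0.04
print(p_cavity_and_toothache / p_toothache)  # P(cavity | toothache) = 0.8, rounding aside
```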

Probabilistic Inference

Probabilistic inference: the computation, from observed evidence, of posterior probabilities (probabilities given that evidence) for query propositions.
We use the full joint distribution as the knowledge base from which answers to questions may be derived.
Example: three Boolean variables Toothache (T), Cavity (C), ShowsOnXRay (X):

                 t                ¬t
             x       ¬x       x       ¬x
  c        0.108   0.012    0.072   0.008
  ¬c       0.016   0.064    0.144   0.576

Review: the probabilities in a joint distribution sum to 1.

Probabilistic Inference II

Review: the probability of any proposition is computed by finding the atomic events where the proposition is true and adding their probabilities (using the joint table on the previous slide):
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
P(cavity) is called a marginal probability, and the process of computing it is called marginalization.
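
A minimal Python sketch (not from the slides) of marginalization over this joint distribution:

```python
# Probability of a proposition: add up the atomic events where it holds.
joint = {  # (toothache, cavity, xray) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    """Sum the probabilities of the worlds (t, c, x) where `event` holds."""
    return sum(p for world, p in joint.items() if event(*world))

print(prob(lambda t, c, x: c or t))  # P(cavity or toothache) = 0.28, rounding aside
print(prob(lambda t, c, x: c))       # P(cavity) = 0.2, a marginal probability
```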

Probabilistic Inference III

We can also compute conditional probabilities (again using the joint table above):
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
The denominator is viewed as a normalization constant: it stays constant no matter what the value of Cavity is. (The book uses α to denote the normalization constant 1/P(X), for random variable X.)
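
A short Python sketch (not from the slides) of the same normalization step, computing the whole distribution P(Cavity | toothache):

```python
# Normalization: sum the joint entries consistent with the evidence, then scale
# so that the resulting distribution sums to 1.
joint = {  # (toothache, cavity, xray) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

unnormalized = {
    cavity: sum(p for (t, c, x), p in joint.items() if t and c == cavity)
    for cavity in (True, False)
}                                          # {True: 0.12, False: 0.08}
alpha = 1.0 / sum(unnormalized.values())   # normalization constant, 1 / P(toothache)
posterior = {c: alpha * p for c, p in unnormalized.items()}
print(posterior)                           # {True: 0.6, False: 0.4}, rounding aside
```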

Bayes Theorem & Naïve Bayes (some slides adapted from slides by Massimo Poesio, adapted from slides by Chris Manning)

Bayes Theorem & Diagnosis

    P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
      posterior          likelihood         prior     normalization

Useful for assessing a diagnostic probability from a causal probability.
Simple proof (for a hypothesis H and a finding F):
    P(F | H) = P(F ∧ H) / P(H)   (by the definition of conditional probability)
    P(H | F) = P(H ∧ F) / P(F)   (by the definition of conditional probability)
so P(F ∧ H) = P(H | F) P(F), and therefore
    P(F | H) = P(H | F) P(F) / P(H)
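
A minimal Python sketch (not from the slides); the TB/coughing numbers are invented purely for illustration:

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """P(Cause | Effect) = P(Effect | Cause) * P(Cause) / P(Effect)."""
    return p_effect_given_cause * p_cause / p_effect

# Hypothetical numbers: P(coughing | TB) = 0.80, P(TB) = 0.0001, P(coughing) = 0.05.
print(bayes(0.80, 0.0001, 0.05))  # P(TB | coughing) = 0.0016
```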

Bayes Theorem for Diagnosis II

    P(Disease | Symptom) = P(Symptom | Disease) P(Disease) / P(Symptom)

Imagine: disease = TB, symptom = coughing.
P(Disease | Symptom) is different in a TB-indicated country vs. the USA; P(Symptom | Disease) should be the same.
It is more widely useful to learn P(Symptom | Disease).
What about P(Symptom)? Use conditioning (next slide).
For determining, e.g., the most likely disease given the symptom, we can just ignore P(Symptom)! (See next lecture.)

Conditioning: Computing the Normalization

Idea: use conditional probabilities instead of joint probabilities.
In general: A = (A ∧ B) ∨ (A ∧ ¬B), so
    P(A) = P(A | B) P(B) + P(A | ¬B) P(¬B)
Here:
    P(Symptom) = P(Symptom | Disease) P(Disease) + P(Symptom | ¬Disease) P(¬Disease)
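
A short Python sketch (not from the slides) of this conditioning step; the numbers are invented for illustration:

```python
# Law of total probability: P(Symptom) computed by conditioning on Disease.
p_disease = 0.0001                 # hypothetical prior P(Disease)
p_symptom_given_disease = 0.80     # hypothetical P(Symptom | Disease)
p_symptom_given_no_disease = 0.05  # hypothetical P(Symptom | not Disease)

p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_no_disease * (1 - p_disease))
print(p_symptom)  # 0.050075
```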

Exponentials rear their ugly head again

Estimating the necessary joint probability distribution for many symptoms is infeasible.
For D diseases and S symptoms, where a person can have n of the diseases and m of the symptoms:
    P(s | d_1, d_2, ..., d_n) requires |S| · |D|^n values
    P(s_1, s_2, ..., s_m) requires |S|^m values
These numbers get big fast. If |S| = 1,000, |D| = 100, n = 4, m = 7:
    P(s | d_1, ..., d_n) requires 1000 · 100^4 = 10^11 values (−1)
    P(s_1, ..., s_m) requires 1000^7 = 10^21 values (−1)

The Solution: Independence

Random variables A and B are independent iff
    P(A ∧ B) = P(A) P(B)
equivalently: P(A | B) = P(A) and P(B | A) = P(B).
A and B are independent if knowing whether A occurred gives no information about B (and vice versa).
Independence assumptions are essential for efficient probabilistic reasoning.
Example: the full joint over Cavity, Toothache, Weather, Xray decomposes into the dental variables and Weather:
    P(T, X, C, W) = P(T, X, C) P(W)
15 entries (2^4 − 1) reduced to 8 (2^3 − 1 + 2 − 1).
For n independent biased coins, O(2^n) entries are reduced to O(n).
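
A minimal Python sketch (not from the slides): the dental joint from the earlier slides combined with a hypothetical Weather marginal; independence lets us store far fewer numbers and still recover every entry of the full joint.

```python
# With Weather independent of the dental variables, P(T, X, C, W) = P(T, X, C) * P(W).
from itertools import product

p_dental = {  # P(Toothache, Cavity, Xray), keyed (toothache, cavity, xray)
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
# Hypothetical marginal P(Weather), invented for illustration.
p_weather = {"sunny": 0.6, "rain": 0.1, "cloudy": 0.29, "snow": 0.01}

# 8 + 4 stored numbers instead of the 32 entries of the full joint,
# yet every entry is recoverable from the product.
full_joint = {(t, c, x, w): p_dental[(t, c, x)] * p_weather[w]
              for (t, c, x), w in product(p_dental, p_weather)}
assert abs(sum(full_joint.values()) - 1.0) < 1e-9
```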

Conditional Independence

BUT absolute independence is rare. Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
A and B are conditionally independent given C iff
    P(A | B, C) = P(A | C)
    P(B | A, C) = P(B | C)
    P(A ∧ B | C) = P(A | C) P(B | C)
Toothache (T), Spot in Xray (X), Cavity (C): none of these is independent of the other two, but T and X are conditionally independent given C.

Conditional Independence II

WHY? If I have a cavity, the probability that the X-ray shows a spot doesn't depend on whether I have a toothache (and vice versa):
    P(X | T, C) = P(X | C)
from which follows:
    P(T | X, C) = P(T | C)  and  P(T, X | C) = P(T | C) P(X | C)
By the chain rule, given conditional independence:
    P(T, X, C) = P(T | X, C) P(X, C) = P(T | X, C) P(X | C) P(C) = P(T | C) P(X | C) P(C)
P(Toothache, Cavity, Xray) has 2^3 − 1 = 7 independent entries.
Given conditional independence, the chain rule yields 2 + 2 + 1 = 5 independent numbers.
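
A short Python sketch (not from the slides) checking numerically that the dental joint from the earlier inference slides does factor this way:

```python
# Verify P(T, X, C) = P(T | C) P(X | C) P(C) entry by entry, i.e. that
# T and X are conditionally independent given C in this table.
from itertools import product

joint = {  # (toothache, cavity, xray) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def p(event):
    """Sum the probabilities of the worlds (t, c, x) where `event` holds."""
    return sum(pr for world, pr in joint.items() if event(*world))

for t_val, c_val, x_val in product([True, False], repeat=3):
    p_c = p(lambda t, c, x: c == c_val)
    p_t_given_c = p(lambda t, c, x: t == t_val and c == c_val) / p_c
    p_x_given_c = p(lambda t, c, x: x == x_val and c == c_val) / p_c
    factored = p_t_given_c * p_x_given_c * p_c
    # 2 + 2 + 1 numbers reproduce all 8 joint entries.
    assert abs(factored - joint[(t_val, c_val, x_val)]) < 1e-9
```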

Conditional Independence III

In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
Conditional independence is our most basic and robust form of knowledge about uncertain environments.

Another Example

Battery is dead (B), Radio plays (R), Starter turns over (S).
None of these propositions is independent of the others, BUT R and S are conditionally independent given B.

Next Time: Naïve Bayes

Spam or not spam: that is the question.

From: "" <takworlld@hotmail.com>
Subject: real estate is the only way... gem oalvgkay
Anyone can buy real estate with no money down
Stop paying rent TODAY!
There is no need to spend hundreds or even thousands for similar courses
I am 22 years old and I have already purchased 6 properties using the methods outlined in this truly INCREDIBLE ebook.
Change your life NOW!
=================================================
Click Below to order:
http://www.wholesaledaily.com/sales/nmd.htm
=================================================