Artificial Intelligence Programming Probability

Artificial Intelligence Programming: Probability
Chris Brooks
Department of Computer Science, University of San Francisco

13-0: Uncertainty
In many interesting agent environments, uncertainty plays a central role. Actions may have nondeterministic effects: shooting an arrow at a target, retrieving a web page, moving. Agents may not know the true state of the world: incomplete sensors, dynamic environment. Relations between facts may not be deterministic: sometimes it rains when it's cloudy; sometimes I play tennis when it's humid. Rational agents will need to deal with uncertainty.

13-1: Logic and Uncertainty
We've already seen how to use logic to deal with uncertainty:
¬Studies(Bart) ⇒ WatchesTV(Bart)
Hungry(Homer) ⇒ Eats(Homer, HotDog) ∨ Eats(Homer, Pie)
∃x Hungry(x)
Unfortunately, the logical approach has some drawbacks.

13-2: Weaknesses with logic
Qualifying all possible outcomes: if I leave now, I'll be on time, unless there's an earthquake, or I run out of gas, or there's an accident...
We may not know all possible outcomes: if a patient has a toothache, she may have a cavity, or gum disease, or maybe something else we don't know about.
We have no way to talk about the likelihood of events: it's possible that I'll get hit by lightning today.

13-3: Qualitative vs. Quantitative
Logic gives us a qualitative approach to uncertainty: we can say that one event is more common than another, or that something is a possibility. This is useful in cases where we don't have statistics, or where we want to reason more abstractly.
Probability allows us to reason quantitatively: we assign concrete values to the chance of an event occurring and derive new concrete values based on observations.

13-4: Uncertainty and Rationality
Recall our definition of rationality: a rational agent is one that acts to maximize its performance measure. How do we define this in an uncertain world?
We will say that an agent has a utility for different outcomes, and that those outcomes have a probability of occurring. An agent can then consider each of the possible outcomes, their utility, and the probability of that outcome occurring, and choose the action that produces the highest expected (or average) utility.
The theory of combining preferences over outcomes with the probability of an outcome's occurrence is called decision theory.
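As a minimal sketch of the expected-utility calculation just described, the Python snippet below chooses the action with the highest expected utility. The action names, outcomes, probabilities, and utilities are invented for illustration; they do not come from the slides.

```python
# Hypothetical sketch: pick the action with the highest expected utility.
# All actions, probabilities, and utilities below are invented examples.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

actions = {
    "leave_now":   [(0.7, 10), (0.3, -5)],   # on time vs. stuck in traffic
    "leave_later": [(0.4, 10), (0.6, -5)],
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))   # leave_now 5.5
```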

13-5: Basic Probability
A probability signifies a belief that a proposition is true: P(BartStudied) = 0.01, P(Hungry(Homer)) = 0.99.
The proposition itself is true or false - we just don't know which. This is different from saying the sentence is partially true: "Bart is short" is sort of true, since "short" is a vague term.
An agent's belief state is a representation of the probability of the value of each proposition of interest.

13-6: Random Variables
A random variable is a variable or proposition whose value is unknown. It has a domain of values that it can take on. These variables can be:
Boolean (true, false) - Hungry(Homer), IsRaining
Discrete - values taken from a countable domain. Temperature: <hot, cool, mild>, Outlook: <sunny, overcast, rain>
Continuous - values drawn from an interval such as [0,1]. Velocity, time, position
Most of our focus will be on the discrete case.
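One simple way to represent a discrete random variable in code is as a mapping from each value in its domain to a probability. This sketch uses the Outlook variable above, with the probabilities given on the prior-probability slide that follows:

```python
# A discrete random variable as a dict from domain value to probability.
outlook = {"sunny": 0.5, "overcast": 0.4, "rain": 0.1}

assert abs(sum(outlook.values()) - 1.0) < 1e-9   # probabilities sum to 1
print(outlook["rain"])                           # prior probability of rain
```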

13-7: Atomic Events
We can combine propositions using standard logical connectives and talk about conjunction and disjunction: P(Hungry(Homer) ∧ Study(Bart)), P(Brother(Lisa, Bart) ∨ Sister(Lisa, Bart)).
A sentence that specifies a possible value for every uncertain variable is called an atomic event. Atomic events are mutually exclusive; the set of all atomic events is exhaustive; an atomic event predicts the truth or falsity of every proposition.
Atomic events will be useful in determining truth in cases with multiple uncertain variables.

13-8: Axioms of Probability
All probabilities are between 0 and 1: 0 ≤ P(a) ≤ 1.
Propositions that are necessarily true have probability 1. Propositions that are unsatisfiable have probability 0.
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)

13-9: Prior Probability
The prior probability of a proposition is its probability of taking on a value in the absence of any other information: P(Rain) = 0.1, P(Overcast) = 0.4, P(Sunny) = 0.5.
We can also list the probabilities of combinations of variables: P(Rain ∧ Humid) = 0.1, P(Rain ∧ ¬Humid) = 0.1, P(Overcast ∧ Humid) = 0.2, P(Overcast ∧ ¬Humid) = 0.2, P(Sunny ∧ Humid) = 0.15, P(Sunny ∧ ¬Humid) = 0.25. This is called a joint probability distribution.
For continuous variables, we can't enumerate values; instead, we use a parameterized function, e.g. the standard normal density P(x) = (1/√(2π)) e^(-x²/2).

13-10: Inference with Joint Probability Distributions
The simplest way of doing probabilistic inference is to keep a table representing the joint probability distribution. Observe each of the independent variables, then look up the probability of the dependent variable.

         Hum = High       Hum = High      Hum = Normal     Hum = Normal
         Sky = Overcast   Sky = Sunny     Sky = Overcast   Sky = Sunny
Rain     0.1              0.05            0.15             0.05
¬Rain    0.2              0.15            0.1              0.2
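The joint distribution in the table above can be stored directly as a dictionary keyed by full assignments to the variables; this is a minimal sketch of table-based inference (the variable names and probabilities follow the table, the code itself is illustrative):

```python
# Joint distribution over (Rain, Humidity, Sky), copied from the table above.
# Keys are atomic events (rain?, humidity, sky); values are probabilities.
joint = {
    (True,  "high",   "overcast"): 0.10,
    (True,  "high",   "sunny"):    0.05,
    (True,  "normal", "overcast"): 0.15,
    (True,  "normal", "sunny"):    0.05,
    (False, "high",   "overcast"): 0.20,
    (False, "high",   "sunny"):    0.15,
    (False, "normal", "overcast"): 0.10,
    (False, "normal", "sunny"):    0.20,
}

# Look up the probability of one fully specified atomic event.
print(joint[(True, "high", "overcast")])   # 0.1
```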

13-11: Inference with Joint Probability Distributions
We can also use the joint probability distribution to determine the marginal probability of the dependent variable by summing all the ways the dependent variable can be true: P(Rain) = 0.1 + 0.05 + 0.15 + 0.05 = 0.35.
What is the problem with using the joint probability distribution to do inference?

13-12: Inference with Joint Probability Distributions
We can also use the joint probability distribution to determine the marginal probability of the dependent variable by summing all the ways the dependent variable can be true: P(Rain) = 0.1 + 0.05 + 0.15 + 0.05 = 0.35.
What is the problem with using the joint probability distribution to do inference? A problem with n independent Boolean variables requires a table of size 2^n.
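Continuing the `joint` dictionary from the earlier sketch, marginalization is a one-line sum over the atomic events consistent with the query:

```python
# Marginal probability of Rain: sum every atomic event in which Rain is true.
p_rain = sum(p for (rain, hum, sky), p in joint.items() if rain)
print(p_rain)   # 0.1 + 0.05 + 0.15 + 0.05 = 0.35
```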

13-13: Conditional Probability
Once we begin to make observations about the values of certain variables, our belief in other variables changes. Once we notice that it's cloudy, P(Rain) goes up. This is called conditional probability, written as P(Rain | Cloudy).
P(a | b) = P(a ∧ b) / P(b), or equivalently P(a ∧ b) = P(a | b) P(b). This is called the product rule.

13-14: Conditional Probability
Example:
P(Cloudy) = 0.3, P(Rain) = 0.2
P(Cloudy ∧ Rain) = 0.15, P(Cloudy ∧ ¬Rain) = 0.1, P(¬Cloudy ∧ Rain) = 0.1, P(¬Cloudy ∧ ¬Rain) = 0.65
Initially, P(Rain) = 0.2. Once we see that it's cloudy, P(Rain | Cloudy) = P(Rain ∧ Cloudy) / P(Cloudy) = 0.15 / 0.3 = 0.5
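The calculation above is a direct application of the definition; a minimal sketch, using only the numbers from the slide:

```python
def conditional(p_joint, p_evidence):
    """P(a | b) = P(a ∧ b) / P(b) (the product rule, rearranged)."""
    return p_joint / p_evidence

# From the slide: P(Rain ∧ Cloudy) = 0.15, P(Cloudy) = 0.3.
print(conditional(0.15, 0.3))   # 0.5
```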

13-15: Independence
In some cases, we can simplify matters by noticing that one variable has no effect on another. For example, what if we add a fourth variable DayOfWeek to our Rain calculation? Since the day of the week will not affect the probability of rain, we can assert P(Rain | Cloudy, Monday) = P(Rain | Cloudy, Tuesday) = ... = P(Rain | Cloudy).
We say that DayOfWeek and Rain are independent. We can then split the larger joint probability distribution into separate subtables. Independence will help us divide the domain into separate pieces.

13-16: Bayes Theorem
Often, we want to know how a probability changes as a result of an observation.
Recall the product rule: P(a ∧ b) = P(a | b) P(b) and P(a ∧ b) = P(b | a) P(a).
We can set these equal to each other: P(a | b) P(b) = P(b | a) P(a).
Dividing by P(a): P(b | a) = P(a | b) P(b) / P(a).
This equality is known as Bayes theorem (or rule, or law).

13-17: Bayes Theorem
We can generalize this to the case with more than two variables: P(Y | X, e) = P(X | Y, e) P(Y | e) / P(X | e).
We can then recursively solve for the conditional probabilities on the right-hand side.
In practice, Bayes rule is useful for transforming the question we want to ask into one for which we have data.

13-18: Bayes theorem example
Say we know: meningitis causes a stiff neck in 50% of patients, P(stiffNeck | meningitis) = 0.5. The prior probability of meningitis is 1/50000, P(meningitis) = 0.00002. The prior probability of a stiff neck is 1/20, P(stiffNeck) = 0.05.
A patient comes to us with a stiff neck. What is the probability she has meningitis?
P(meningitis | stiffNeck) = P(stiffNeck | meningitis) P(meningitis) / P(stiffNeck) = (0.5 × 0.00002) / 0.05 = 0.0002
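The same computation as a small helper function, using only the numbers from the slide:

```python
def bayes(p_effect_given_cause, p_cause, p_effect):
    """P(cause | effect) = P(effect | cause) * P(cause) / P(effect)."""
    return p_effect_given_cause * p_cause / p_effect

# P(stiffNeck | meningitis) = 0.5, P(meningitis) = 0.00002, P(stiffNeck) = 0.05
print(bayes(0.5, 0.00002, 0.05))   # 0.0002
```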

13-19: Another example
Suppose a lab test comes back saying that a patient has cancer. Should we believe it?
P(cancer) = 0.008, P(positiveTest | cancer) = 0.98, P(positiveTest | ¬cancer) = 0.03
We want to know whether cancer or ¬cancer is more likely: P(cancer | positiveTest)

13-20: Another example
We can ignore the denominator for the moment.
P(positiveTest | cancer) P(cancer) = 0.98 × 0.008 = 0.0078
P(positiveTest | ¬cancer) P(¬cancer) = 0.03 × 0.992 = 0.0298
We see that it's more likely that the patient does not have cancer. We can get the exact probabilities by normalizing these values: P(cancer | positiveTest) = 0.0078 / (0.0078 + 0.0298) ≈ 0.21
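The compare-then-normalize step can be written out directly; all numbers below come from the slide:

```python
# Unnormalized posteriors for cancer vs. ¬cancer given a positive test.
p_cancer_unnorm    = 0.98 * 0.008   # P(positive | cancer) P(cancer)  ≈ 0.0078
p_no_cancer_unnorm = 0.03 * 0.992   # P(positive | ¬cancer) P(¬cancer) ≈ 0.0298

# Normalizing recovers the exact posterior probability.
total = p_cancer_unnorm + p_no_cancer_unnorm
print(p_cancer_unnorm / total)      # ≈ 0.21: probably not cancer
```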

13-21: Why is this useful?
Often, a domain expert will want diagnostic information, such as P(meningitis | stiffNeck). We could derive this directly from statistical information. However, if there's a meningitis outbreak, P(meningitis) will change, and it's unclear how to update a direct estimate of P(meningitis | stiffNeck). Since P(stiffNeck | meningitis) hasn't changed, we can use Bayes rule to indirectly update the diagnostic probability instead. This makes our inference system more robust to changes in priors.

13-22: Combining Evidence with Bayes Theorem
We can extend this to work with multiple observed variables: P(c | a ∧ b) = α P(a ∧ b | c) P(c), where α is a normalization constant representing 1 / P(a ∧ b).
This is still hard to work with in the general case. However, if a and b are independent of each other, then we can write P(a ∧ b) = P(a) P(b).
More common is the case where a and b influence each other, but are independent once the value of a third variable is known. This is called conditional independence.

13-23: Conditional Independence
Suppose we want to know whether the patient has a cavity. Our observed variables are toothache and catch. These aren't initially independent - if the patient has a toothache, it's likely she has a cavity, which increases the probability of catch.
Since each is caused by having a cavity, once we know that the patient does (or does not) have a cavity, these variables become independent. We write this as: P(toothache ∧ catch | cavity) = P(toothache | cavity) P(catch | cavity).
We then use Bayes theorem to rewrite this as: P(cavity | toothache ∧ catch) = α P(toothache | cavity) P(catch | cavity) P(cavity)
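Under the conditional-independence assumption, combining the two observations reduces to multiplying per-symptom conditionals and normalizing, exactly as in the last equation. A minimal sketch follows; the slide gives no numbers, so all probabilities below are invented for illustration:

```python
# Combining evidence under conditional independence:
# P(cavity | toothache ∧ catch) ∝ P(toothache | cavity) P(catch | cavity) P(cavity)
# All numbers below are invented for illustration; the slide gives none.

p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity=value)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | Cavity=value)

unnorm = {
    c: p_toothache_given[c] * p_catch_given[c] * (p_cavity if c else 1 - p_cavity)
    for c in (True, False)
}
alpha = 1 / sum(unnorm.values())              # normalization constant
posterior = {c: alpha * p for c, p in unnorm.items()}
print(posterior[True])                        # P(cavity | toothache ∧ catch) ≈ 0.87
```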

13-24: Applications of probabilistic inference
Bayesian Learning - classifying unseen examples based on distributions from a training set.
Bayesian Networks - probabilistic rule-based systems that exploit conditional independence for tractability and can perform diagnostic or causal reasoning.
Decision networks - predicting the effects of uncertain actions.