Lecture 10: Introduction to reasoning under uncertainty
COMP-424, February 6, 2013

Outline:
- Introduction to reasoning under uncertainty
- Review of probability: axioms and inference
- Conditional probability
- Probability distributions

Uncertainty
Back to planning: let action A(t) denote leaving for the airport t minutes before the flight. For a given value of t, will A(t) get me there on time?
Problems:
- Partial observability (roads, other drivers' plans, etc.)
- Noisy sensors (traffic reports)
- Uncertainty in action outcomes (flat tire, etc.)
- Immense complexity of modeling and predicting traffic

How To Deal With Uncertainty
Implicit methods:
- Ignore uncertainty as much as possible
- Build procedures that are robust to uncertainty
- This is the approach in the planning methods studied so far (e.g. monitoring and replanning)
Explicit methods:
- Build a model of the world that describes the uncertainty (about the system's state, dynamics, sensors, model)
- Reason about the effect of actions given the model

How To Represent Uncertainty
- What language should we use? What are the semantics of our representations?
- What queries can we answer with our representations? How do we answer them?
- How do we construct a representation? Do we need to ask an expert, or can we learn from data?

Why Not Use First-Order Logic?
A purely logical approach has two main problems:
- It risks falsehood: "A(25) will get me there on time."
- It leads to conclusions that are too weak: "A(25) will get me there on time if there is no accident on the bridge and it does not rain and my tires remain intact, etc., etc."
- A(1440) might reasonably be said to get me there on time (but I would have to stay overnight at the airport!)

Methods for Handling Uncertainty
- Default (non-monotonic) logic: make assumptions unless contradicted by evidence, e.g. assume my car doesn't have a flat tire. But: what assumptions are reasonable? What about contradictions?
- Rules with fudge factors: e.g. Sprinkler →(0.99) WetGrass, WetGrass →(0.7) Rain. But: problems with combination (e.g. does Sprinkler cause Rain?)
- Probability: e.g. given what I know, A(25) succeeds with probability 0.2.
- Fuzzy logic: e.g. WetGrass is true to degree 0.2. But: this handles degree of truth, NOT uncertainty.

Probability
A well-known and well-understood framework for dealing with uncertainty:
- Has a clear semantics
- Provides principled answers for: combining evidence, predictive and diagnostic reasoning, incorporation of new evidence
- Can be learned from data
- Intuitive to human experts (arguably?)

Beliefs (Bayesian Probabilities)
We use probability to describe uncertainty due to:
- Laziness: failure to enumerate exceptions, qualifications, etc.
- Ignorance: lack of relevant facts, initial conditions, etc.
- True randomness? Quantum effects? ...
Beliefs (Bayesian or subjective probabilities) relate propositions to one's current state of knowledge, e.g. P(A(25) | no reported accident) = 0.1.
These are not assertions about the world or absolute truth. Beliefs change with new evidence, e.g. P(A(25) | no reported accident, 5am) = 0.2.
This is analogous to logical entailment: KB ⊨ α means that α is true given the KB, but may not be true in general.

Making Decisions Under Uncertainty
Suppose I believe the following:
P(A(25) gets me there on time | ...) = 0.04
P(A(90) gets me there on time | ...) = 0.70
P(A(120) gets me there on time | ...) = 0.95
P(A(1440) gets me there on time | ...) = 0.9999
Which action should I choose? That depends on my preferences for missing the flight vs. airport cuisine, etc.
Utility theory is used to represent and infer preferences.
Decision theory = utility theory + probability theory.
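
The slide leaves the choice open; as a rough illustration of how decision theory would combine these probabilities with preferences, here is a small sketch with purely hypothetical utility numbers (not from the lecture):

    # Probabilities from the slide; utilities below are hypothetical, for illustration only.
    p_on_time = {25: 0.04, 90: 0.70, 120: 0.95, 1440: 0.9999}
    u_catch_flight = 100       # utility of making the flight
    u_miss_flight = -500       # utility of missing it
    u_wait_per_minute = -0.1   # disutility of time spent waiting at the airport

    def expected_utility(t):
        p = p_on_time[t]
        return p * u_catch_flight + (1 - p) * u_miss_flight + t * u_wait_per_minute

    for t in sorted(p_on_time):
        print(t, round(expected_utility(t), 2))
    print("best choice:", max(p_on_time, key=expected_utility))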

Random Variables
A random variable X describes an outcome that cannot be determined in advance.
- E.g. the roll of a die
- E.g. the number of e-mails received in a day
The sample space (domain) S of a random variable X is the set of all possible values of the variable.
- E.g. for a die, S = {1, 2, 3, 4, 5, 6}
- E.g. for the number of e-mails received in a day, S is the natural numbers
An event is a subset of S.
- E.g. e = {1} corresponds to a die roll of 1
- E.g. the number of e-mails in a day being more than 100

Probability for Discrete Random Variables
Usually, random variables are governed by some law of nature, described as a probability function P defined on S. P(x) defines the chance that variable X takes value x ∈ S.
E.g. for a roll of a fair die, P(1) = P(2) = ... = P(6) = 1/6.
Note that we still cannot determine the value of X, just the chance of encountering a given value.
If X is a discrete variable, then its probability function P(x) has the following properties:
0 ≤ P(x) ≤ 1 for all x ∈ S, and Σ_{x∈S} P(x) = 1
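
A minimal sketch (not from the lecture) that encodes the fair-die example and checks the two properties above:

    # Probability function for a fair six-sided die.
    S = [1, 2, 3, 4, 5, 6]
    P = {x: 1/6 for x in S}

    # Every value is in [0, 1] and the probabilities sum to 1.
    assert all(0 <= P[x] <= 1 for x in S)
    assert abs(sum(P.values()) - 1.0) < 1e-9

    # Probability of an event (a subset of S), e.g. "the roll is even".
    def prob(event):
        return sum(P[x] for x in event)

    print(prob({1}))        # 1/6
    print(prob({2, 4, 6}))  # 1/2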

Beliefs
We use probability to describe the world and existing uncertainty.
Agents will have beliefs based on their current state of knowledge.
E.g. P(Some day AI agents will rule the world) = 0.2 reflects a personal belief, based on one's state of knowledge about current AI, technology trends, etc.
Different agents may hold different beliefs, as these are subjective.
Beliefs may change over time as agents get new evidence.
Prior (unconditional) beliefs denote belief prior to the arrival of any new evidence.

Axioms of Probability
Beliefs satisfy the axioms of probability. For any propositions A, B:
1. 0 ≤ P(A) ≤ 1
2. P(True) = 1 (hence P(False) = 0)
3. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
4. Alternatively, if A and B are mutually exclusive (A ∧ B = False), then P(A ∨ B) = P(A) + P(B)
The axioms of probability limit the class of functions that can be considered probability functions.
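
A quick numeric check of axiom 3 on the die, using two example events chosen for illustration (a sketch):

    # Events over a fair die: A = "roll is at most 2", B = "roll is even".
    S = [1, 2, 3, 4, 5, 6]
    P = {x: 1/6 for x in S}
    prob = lambda event: sum(P[x] for x in event)

    A = {1, 2}
    B = {2, 4, 6}

    lhs = prob(A | B)                       # P(A or B)
    rhs = prob(A) + prob(B) - prob(A & B)   # P(A) + P(B) - P(A and B)
    assert abs(lhs - rhs) < 1e-9
    print(lhs, rhs)                         # both 4/6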

Defining Probabilistic Models
We define the world as a set of random variables Ω = {X1, ..., Xn}.
A probabilistic model is an encoding of probabilistic information that allows us to compute the probability of any event in the world.
The world is divided into a set of elementary, mutually exclusive events, called states. E.g. if the world is described by two Boolean variables A, B, a state will be a complete assignment of truth values for A and B.
A joint probability distribution function assigns non-negative weights to each event, such that these weights sum to 1.

Inference Using Joint Distributions
E.g. suppose Happy and Rested are the random variables:

                    Happy = true   Happy = false
    Rested = true       0.05           0.1
    Rested = false      0.60           0.25

The unconditional probability of any proposition is computable as the sum of entries from the full joint distribution.
E.g. P(Happy) = P(Happy, Rested) + P(Happy, ¬Rested) = 0.05 + 0.6 = 0.65
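
A sketch of this computation, using the joint table from the slide:

    # Full joint distribution over (Happy, Rested), values from the slide.
    joint = {
        (True,  True):  0.05,
        (False, True):  0.10,
        (True,  False): 0.60,
        (False, False): 0.25,
    }

    # Marginal probability of a proposition = sum of the matching joint entries.
    p_happy = sum(p for (happy, rested), p in joint.items() if happy)
    print(p_happy)  # 0.05 + 0.60 = 0.65, as on the slide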

Conditional Probability
The basic statements in the Bayesian framework talk about conditional probabilities. P(A|B) is the belief in event A given that event B is known with certainty.
The product rule gives an alternative formulation:
P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
Note: we often write P(A, B) as a shorthand for P(A ∧ B).

Chain Rule
The chain rule is derived by successive application of the product rule:
P(X1, ..., Xn) = P(X1, ..., Xn−1) P(Xn | X1, ..., Xn−1)
             = P(X1, ..., Xn−2) P(Xn−1 | X1, ..., Xn−2) P(Xn | X1, ..., Xn−1)
             = ...
             = Π_{i=1}^{n} P(Xi | X1, ..., Xi−1)
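
A small sketch that verifies the product/chain rule on the Happy/Rested joint from the previous slide:

    joint = {
        (True,  True):  0.05,
        (False, True):  0.10,
        (True,  False): 0.60,
        (False, False): 0.25,
    }

    def p_happy(h):
        return sum(p for (hh, r), p in joint.items() if hh == h)

    def p_rested_given_happy(r, h):
        return joint[(h, r)] / p_happy(h)

    # Chain rule: P(Happy, Rested) = P(Happy) * P(Rested | Happy)
    for (h, r), p in joint.items():
        assert abs(p - p_happy(h) * p_rested_given_happy(r, h)) < 1e-9
    print("chain rule holds on every entry")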

Bayes Rule
Bayes rule is another alternative formulation of the product rule:
P(A|B) = P(B|A) P(A) / P(B)
The complete (total) probability formula states that:
P(A) = P(A|B) P(B) + P(A|¬B) P(¬B)
or more generally, P(A) = Σ_i P(A|b_i) P(b_i), where the b_i form a set of exhaustive and mutually exclusive events.

Conditional vs. Unconditional Probability
Bertrand's coin box problem: what are we supposed to compute?
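
Assuming the standard formulation of Bertrand's box problem (three boxes holding gold-gold, gold-silver and silver-silver coins; a box and then a coin are drawn uniformly at random, and the coin turns out to be gold), here is a sketch of the Bayes-rule computation it calls for:

    # Query: P(box = GG | observed coin is gold), via Bayes rule.
    prior = {"GG": 1/3, "GS": 1/3, "SS": 1/3}
    likelihood_gold = {"GG": 1.0, "GS": 0.5, "SS": 0.0}   # P(gold | box)

    # Total probability: P(gold) = sum_b P(gold | b) P(b)
    p_gold = sum(likelihood_gold[b] * prior[b] for b in prior)

    posterior_GG = likelihood_gold["GG"] * prior["GG"] / p_gold
    print(p_gold)        # 0.5
    print(posterior_GG)  # 2/3: the conditional and unconditional answers differ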

Using Bayes Rule for Inference
Suppose we want to form a hypothesis about the world based on observable variables. Bayes rule tells us how to calculate the belief in a hypothesis H given evidence e:
P(H|e) = P(e|H) P(H) / P(e)
- P(H|e) is the posterior probability
- P(H) is the prior probability
- P(e|H) is the likelihood
- P(e) is a normalizing constant, which can be computed as P(e) = P(e|H) P(H) + P(e|¬H) P(¬H)
Sometimes we write P(H|e) ∝ P(e|H) P(H).

Example: Medical Diagnosis
You go to the doctor complaining about the symptom of having a fever (evidence). The doctor knows that bird flu causes a fever 95% of the time. The doctor knows that if a person is selected randomly from the population, there is a 10⁻⁷ chance of the person having bird flu. In general, 1 in 100 people in the population suffer from fever. What is the probability that you have bird flu (hypothesis)?
P(H|e) = P(e|H) P(H) / P(e) = (0.95 × 10⁻⁷) / 0.01 = 0.95 × 10⁻⁵
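
The same computation written out as a short sketch:

    # Bird flu example: H = "has bird flu", e = "has a fever".
    p_h = 1e-7          # prior P(H)
    p_e_given_h = 0.95  # likelihood P(e | H)
    p_e = 0.01          # P(e): 1 in 100 people have a fever

    posterior = p_e_given_h * p_h / p_e
    print(posterior)    # 0.95e-5: still very unlikely, despite the strong likelihood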

Computing Conditional Probabilities
Typically, we are interested in the posterior joint distribution of some query variables Y, given specific values e for some evidence variables E.
Let the hidden variables be Z = X − Y − E (all the remaining variables).
If we have a joint probability distribution, we can compute the answer by summing out the hidden variables:
P(Y|e) ∝ P(Y, e) = Σ_z P(Y, e, z)
Problem: the joint distribution is too big to handle!

Example
Consider medical diagnosis, where there are 100 different symptoms and test results that the doctor could consider. A patient comes in complaining of fever, cough and chest pains. The doctor wants to compute the probability of pneumonia.
The probability table has ≥ 2^100 entries! For computing the probability of a disease, we have to sum out over 97 hidden variables!
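
A sketch of summing out a hidden variable over a small hand-made joint (the numbers are hypothetical and only for illustration):

    # Hypothetical joint over (Disease, Fever, Cough); the eight entries sum to 1.
    joint = {
        (1, 1, 1): 0.056, (1, 1, 0): 0.024, (1, 0, 1): 0.014, (1, 0, 0): 0.006,
        (0, 1, 1): 0.018, (0, 1, 0): 0.072, (0, 0, 1): 0.162, (0, 0, 0): 0.648,
    }

    # Query P(Disease | Fever=1): sum out the hidden variable Cough, then normalize.
    unnormalized = {}
    for d in (0, 1):
        unnormalized[d] = sum(joint[(d, 1, c)] for c in (0, 1))
    z = sum(unnormalized.values())
    posterior = {d: unnormalized[d] / z for d in unnormalized}
    print(posterior)  # P(Disease=1 | Fever=1) is about 0.47 with these numbers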

Independence of Random Variables
Two random variables X and Y are independent if knowledge about X does not change the uncertainty about Y (and vice versa):
P(x|y) = P(x) for all x ∈ S_X, y ∈ S_Y
P(y|x) = P(y) for all x ∈ S_X, y ∈ S_Y
or equivalently, P(x, y) = P(x) P(y).
If n Boolean variables are independent, the whole joint distribution can be computed as:
P(x1, ..., xn) = Π_i P(xi)
Only n numbers are needed to specify the joint, instead of 2^n.
Problem: absolute independence is a very strong requirement.

Example: Dice
Let A be the event that one die comes up 1.
Let B be the event that a different die comes up 4.
Let C be the event that the sum of the two dice is 5.
Are A and B independent? Are A and C independent? Are A, B and C all independent?
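
The slide leaves these as questions; a brute-force check over the 36 outcomes (a sketch) is one way to answer them:

    from itertools import product

    # Sample space: all ordered outcomes of two fair dice.
    outcomes = list(product(range(1, 7), repeat=2))
    p = 1 / len(outcomes)                      # each outcome has probability 1/36

    def prob(event):
        return sum(p for o in outcomes if event(o))

    A = lambda o: o[0] == 1                    # first die shows 1
    B = lambda o: o[1] == 4                    # second die shows 4
    C = lambda o: o[0] + o[1] == 5             # the two dice sum to 5

    def indep(X, Y):
        return abs(prob(lambda o: X(o) and Y(o)) - prob(X) * prob(Y)) < 1e-12

    print(indep(A, B))   # True: the two dice do not influence each other
    print(indep(A, C))   # False: if the first die shows 1, a sum of 5 becomes more likely
    print(indep(B, C))   # False, for the same reason
    # A, B, C cannot be mutually independent: A and B together force the sum to be 5.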

Conditional Independence
Two variables X and Y are conditionally independent given Z if:
P(x | y, z) = P(x | z) for all x, y, z
This means that knowing the value of Y does not change the prediction about X if the value of Z is known.

Example
Consider a patient with three random variables: B (patient has bronchitis), F (patient has fever), C (patient has a cough).
The full joint distribution has 2³ − 1 = 7 independent entries.
If someone has bronchitis, we can assume that the probability of a cough does not depend on whether they have a fever:
P(C | B, F) = P(C | B)   (1)
I.e., C is conditionally independent of F given B.
The same independence holds if the patient does not have bronchitis:
P(C | ¬B, F) = P(C | ¬B)   (2)
Therefore C and F are conditionally independent given B.
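
A sketch that checks the definition numerically, on a small hypothetical joint over (B, F, C) constructed so that the independence holds:

    # Hypothetical joint over (Bronchitis, Fever, Cough), built so that
    # Cough is conditionally independent of Fever given Bronchitis.
    joint = {
        (1, 1, 1): 0.056, (1, 1, 0): 0.024, (1, 0, 1): 0.014, (1, 0, 0): 0.006,
        (0, 1, 1): 0.018, (0, 1, 0): 0.072, (0, 0, 1): 0.162, (0, 0, 0): 0.648,
    }

    def p(pred):
        return sum(q for outcome, q in joint.items() if pred(*outcome))

    # Check P(C=1 | B=b, F=f) == P(C=1 | B=b) for every b, f.
    for b in (0, 1):
        p_c_given_b = p(lambda B, F, C: B == b and C == 1) / p(lambda B, F, C: B == b)
        for f in (0, 1):
            p_c_given_bf = (p(lambda B, F, C: B == b and F == f and C == 1)
                            / p(lambda B, F, C: B == b and F == f))
            assert abs(p_c_given_bf - p_c_given_b) < 1e-9
    print("C is conditionally independent of F given B in this joint")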

Example (continued)
The full joint distribution can now be written as:
P(C, F, B) = P(C, F | B) P(B) = P(C | B) P(F | B) P(B)
I.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove two numbers).
Much more important savings happen if the system has lots of variables!

Naive Bayesian Model
A common assumption in early diagnosis is that the symptoms are independent of each other given the disease.
Let s1, ..., sn be the symptoms exhibited by a patient (e.g. fever, headache, etc.), and let D be the patient's disease.
Using the naive Bayes assumption:
P(D, s1, ..., sn) = P(D) P(s1 | D) ··· P(sn | D)
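
A sketch of the naive Bayes factorization with hypothetical parameters (the symptom names and probabilities below are made up for illustration):

    # Hypothetical naive Bayes model: one binary disease, three binary symptoms.
    p_disease = 0.01
    p_symptom_given_d = {"fever": 0.8, "cough": 0.7, "chest_pain": 0.4}        # P(s=1 | D=1)
    p_symptom_given_not_d = {"fever": 0.02, "cough": 0.1, "chest_pain": 0.05}  # P(s=1 | D=0)

    def joint(d, symptoms):
        """P(D=d, s_1, ..., s_n) under the naive Bayes assumption."""
        prior = p_disease if d else 1 - p_disease
        table = p_symptom_given_d if d else p_symptom_given_not_d
        result = prior
        for name, value in symptoms.items():
            p1 = table[name]
            result *= p1 if value else 1 - p1
        return result

    obs = {"fever": 1, "cough": 1, "chest_pain": 0}
    print(joint(1, obs), joint(0, obs))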

Recursive Bayesian Updating
The naive Bayes assumption allows incremental updating of beliefs as more evidence is gathered.
Suppose that after knowing symptoms s1, ..., sn the probability of D is:
P(D | s1, ..., sn) ∝ P(D) Π_{i=1}^{n} P(si | D)
What happens if a new symptom s_{n+1} appears?
P(D | s1, ..., sn, s_{n+1}) ∝ P(D) Π_{i=1}^{n+1} P(si | D) ∝ P(D | s1, ..., sn) P(s_{n+1} | D)
An even nicer formula can be obtained by taking logs (up to an additive normalization constant):
log P(D | s1, ..., sn, s_{n+1}) = log P(D | s1, ..., sn) + log P(s_{n+1} | D)

A Graphical Representation of the Naive Bayesian Model
[Diagram: a single Diagnosis node with an arc to each of Sympt1, Sympt2, ..., SymptN]
- The nodes represent random variables
- The arcs represent influences
- The lack of arcs represents conditional independence
How about models represented as more general graphs?
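
A sketch of the recursive update, reusing the hypothetical numbers from the previous sketch: keep the (log of the) unnormalized posterior and fold in each new symptom's likelihood as it is observed.

    import math

    # Hypothetical likelihoods P(s_i = 1 | D) for each new symptom as it arrives.
    prior = 0.01
    likelihoods_d1 = [0.8, 0.7, 0.4]      # P(s_i = 1 | D = 1)
    likelihoods_d0 = [0.02, 0.1, 0.05]    # P(s_i = 1 | D = 0)

    log_post = {1: math.log(prior), 0: math.log(1 - prior)}  # unnormalized log-posteriors

    for l1, l0 in zip(likelihoods_d1, likelihoods_d0):
        # Recursive update: add the log-likelihood of the newly observed symptom.
        log_post[1] += math.log(l1)
        log_post[0] += math.log(l0)
        # Normalize only when an actual probability is needed.
        z = math.exp(log_post[1]) + math.exp(log_post[0])
        print("P(D=1 | evidence so far) =", math.exp(log_post[1]) / z)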

Example (Adapted from Pearl)
I'm at work, and my neighbor calls to say the burglar alarm is ringing in my house. Sometimes the alarm is set off by minor earthquakes. Earthquakes are sometimes reported on the radio. Should I rush home?
What random variables are there? Are there any independence relationships between these variables?

Example: A Belief (Bayesian) Network
[Diagram: nodes B (burglary), E (earthquake), A (alarm), R (radio report), C (neighbor calls), with arcs B → A, E → A, E → R, A → C, and the following conditional probability tables]
P(B=1) = 0.01, P(B=0) = 0.99
P(E=1) = 0.005, P(E=0) = 0.995
P(R=1 | E=0) = 0.0001, P(R=1 | E=1) = 0.65
P(C=1 | A=0) = 0.05, P(C=1 | A=1) = 0.7
P(A=1 | B=0, E=0) = 0.001, P(A=1 | B=0, E=1) = 0.3, P(A=1 | B=1, E=0) = 0.8, P(A=1 | B=1, E=1) = 0.95
- The nodes represent random variables
- The arcs represent direct influences
- At each node, we have a conditional probability distribution (CPD) for the corresponding variable given its parents
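
One possible encoding of this network (a sketch): store P(variable = 1 | parents) for each node and compute any joint entry from the factorization the graph defines.

    from itertools import product

    # CPTs from the slide. Each entry gives P(variable = 1 | parents).
    p_B = 0.01
    p_E = 0.005
    p_A = {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.8, (1, 1): 0.95}  # P(A=1 | B, E)
    p_C = {0: 0.05, 1: 0.7}        # P(C=1 | A)
    p_R = {0: 0.0001, 1: 0.65}     # P(R=1 | E)

    def bern(p, value):
        return p if value else 1 - p

    def joint(b, e, a, c, r):
        """P(B=b, E=e, A=a, C=c, R=r) = P(b) P(e) P(a|b,e) P(c|a) P(r|e)."""
        return (bern(p_B, b) * bern(p_E, e) * bern(p_A[(b, e)], a)
                * bern(p_C[a], c) * bern(p_R[e], r))

    # Sanity check: the joint sums to 1 over all 2^5 states.
    assert abs(sum(joint(*s) for s in product((0, 1), repeat=5)) - 1.0) < 1e-9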

Using a Bayes Net for Reasoning (1)
Computing any entry in the joint probability table is easy:
P(B, ¬E, A, C, ¬R) = P(B) P(¬E) P(A | B, ¬E) P(C | A) P(¬R | ¬E) = 0.01 × 0.995 × 0.8 × 0.7 × 0.9999 ≈ 0.0056
What is the probability that a neighbor calls?
P(C=1) = Σ_{e,b,r,a} P(C=1, e, b, r, a) = ...
What is the probability of a call in case of a burglary?
P(C=1 | B=1) = P(C=1, B=1) / P(B=1) = Σ_{e,r,a} P(C=1, B=1, e, r, a) / Σ_{c,e,r,a} P(c, B=1, e, r, a)
This is causal reasoning or prediction.

Using a Bayes Net for Reasoning (2)
Suppose we got a call. What is the probability of a burglary?
P(B | C) = P(C | B) P(B) / P(C) = ...
What is the probability of an earthquake?
P(E | C) = P(C | E) P(E) / P(C) = ...
This is evidential reasoning or explanation.
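
These sums can be computed directly by enumerating all 2^5 states of the joint defined by the network; a self-contained sketch (repeating the CPT encoding from the previous sketch):

    from itertools import product

    p_B, p_E = 0.01, 0.005
    p_A = {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.8, (1, 1): 0.95}
    p_C = {0: 0.05, 1: 0.7}
    p_R = {0: 0.0001, 1: 0.65}
    bern = lambda p, v: p if v else 1 - p

    def joint(b, e, a, c, r):
        return (bern(p_B, b) * bern(p_E, e) * bern(p_A[(b, e)], a)
                * bern(p_C[a], c) * bern(p_R[e], r))

    # The joint entry computed on the slide: P(B=1, E=0, A=1, C=1, R=0).
    print(joint(1, 0, 1, 1, 0))            # about 0.0056

    def prob(**fixed):
        """Sum the joint over all states consistent with the fixed values."""
        total = 0.0
        for b, e, a, c, r in product((0, 1), repeat=5):
            state = dict(b=b, e=e, a=a, c=c, r=r)
            if all(state[k] == v for k, v in fixed.items()):
                total += joint(b, e, a, c, r)
        return total

    # Causal reasoning / prediction:
    print(prob(c=1))                       # P(C=1)
    print(prob(c=1, b=1) / prob(b=1))      # P(C=1 | B=1)
    # Evidential reasoning / explanation:
    print(prob(b=1, c=1) / prob(c=1))      # P(B=1 | C=1)
    print(prob(e=1, c=1) / prob(c=1))      # P(E=1 | C=1)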

Using a Bayes Net for Reasoning (3)
What happens to the probabilities if the radio announces an earthquake?
P(E | C, R) > P(E | C) and P(B | C, R) < P(B | C)
This is called explaining away.

Example: Pathfinder (Heckerman, 1991)
Medical diagnostic system for lymph node diseases.
A large net! 60 diseases, 100 symptoms and test results, 14,000 probabilities.
The network was built by medical experts: 8 hours to determine the variables, 35 hours for the network topology, 40 hours for the probability table values.
Experts found it easy to invent causal links and probabilities.
Pathfinder is now outperforming world experts in diagnosis, and is being extended to other medical domains.

Using Graphs for Knowledge Representation
Graphs have been proposed as models of human memory and reasoning on many occasions (e.g. semantic nets, inference networks, conceptual dependencies).
There are many efficient algorithms and data structures for working with graphs.
Recently there has been an explosion of graphical models for representing probabilistic knowledge:
- The lack of edges is assumed to represent conditional independence assumptions
- Graph properties are used to compute conditional probabilities (i.e. to do inference) efficiently with such representations