Uncertainty. Logic and Uncertainty. Russell & Norvig. Readings: Chapter 13. One problem with logical-agent approaches: C:145 Artificial

Similar documents
Ch.6 Uncertain Knowledge. Logic and Uncertainty. Representation. One problem with logical approaches: Department of Computer Science

Uncertainty. 22c:145 Artificial Intelligence. Problem of Logic Agents. Foundations of Probability. Axioms of Probability

Uncertainty and Bayesian Networks

Uncertainty. Chapter 13

Quantifying uncertainty & Bayesian networks

COMP9414: Artificial Intelligence Reasoning Under Uncertainty

Uncertainty. Introduction to Artificial Intelligence CS 151 Lecture 2 April 1, CS151, Spring 2004

Artificial Intelligence

Resolution or modus ponens are exact there is no possibility of mistake if the rules are followed exactly.

COMP5211 Lecture Note on Reasoning under Uncertainty

Artificial Intelligence Uncertainty

COMP9414/ 9814/ 3411: Artificial Intelligence. 14. Uncertainty. Russell & Norvig, Chapter 13. UNSW c AIMA, 2004, Alan Blair, 2012

Uncertainty. Outline

Uncertainty. Chapter 13, Sections 1 6

Pengju XJTU 2016

Probabilistic Reasoning. Kee-Eung Kim KAIST Computer Science

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Probabilistic Reasoning

Uncertainty (Chapter 13, Russell & Norvig) Introduction to Artificial Intelligence CS 150 Lecture 14

Probabilistic Robotics

Uncertainty. Chapter 13. Chapter 13 1

An AI-ish view of Probability, Conditional Probability & Bayes Theorem

10/18/2017. An AI-ish view of Probability, Conditional Probability & Bayes Theorem. Making decisions under uncertainty.

Probability. CS 3793/5233 Artificial Intelligence Probability 1

Uncertainty. Chapter 13

Quantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018

Chapter 13 Quantifying Uncertainty

Uncertainty. Outline. Probability Syntax and Semantics Inference Independence and Bayes Rule. AIMA2e Chapter 13

Probabilistic Reasoning. (Mostly using Bayesian Networks)

Informatics 2D Reasoning and Agents Semester 2,

CS 561: Artificial Intelligence

Bayesian networks. Chapter 14, Sections 1 4

Artificial Intelligence Programming Probability

Outline. Uncertainty. Methods for handling uncertainty. Uncertainty. Making decisions under uncertainty. Probability. Uncertainty

Probabilistic representation and reasoning

Uncertain Knowledge and Reasoning

Probabilistic representation and reasoning

UNCERTAINTY. In which we see what an agent should do when not all is crystal-clear.

Reasoning Under Uncertainty

Uncertainty. Russell & Norvig Chapter 13.

Quantifying Uncertainty

Bayesian Reasoning. Adapted from slides by Tim Finin and Marie desjardins.

Where are we in CS 440?

Basic Probability and Decisions

Foundations of Artificial Intelligence

Y. Xiang, Inference with Uncertain Knowledge 1

This lecture. Reading. Conditional Independence Bayesian (Belief) Networks: Syntax and semantics. Chapter CS151, Spring 2004

PROBABILISTIC REASONING SYSTEMS

Reasoning Under Uncertainty

Where are we in CS 440?

Defining Things in Terms of Joint Probability Distribution. Today s Lecture. Lecture 17: Uncertainty 2. Victor R. Lesser

Introduction to Artificial Intelligence Belief networks

Uncertainty and Belief Networks. Introduction to Artificial Intelligence CS 151 Lecture 1 continued Ok, Lecture 2!

Uncertainty (Chapter 13, Russell & Norvig)

Bayesian Networks BY: MOHAMAD ALSABBAGH

Probabilistic Reasoning

PROBABILISTIC REASONING Outline

Artificial Intelligence CS 6364

ARTIFICIAL INTELLIGENCE. Uncertainty: probabilistic reasoning

Probabilistic Representation and Reasoning

14 PROBABILISTIC REASONING

Introduction to Probabilistic Reasoning. Image credit: NASA. Assignment

Consider an experiment that may have different outcomes. We are interested to know what is the probability of a particular set of outcomes.

Web-Mining Agents Data Mining

Pengju

Probabilistic Reasoning Systems

Bayesian networks. Chapter Chapter

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II.

Bayesian Network. Outline. Bayesian Network. Syntax Semantics Exact inference by enumeration Exact inference by variable elimination

Graphical Models - Part I

CS 380: ARTIFICIAL INTELLIGENCE UNCERTAINTY. Santiago Ontañón

STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas. Propositional Logic and Probability Theory: Review

Directed Graphical Models

CS 188: Artificial Intelligence Fall 2009

Reasoning with Uncertainty. Chapter 13

CS 188: Artificial Intelligence Spring Announcements

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012

Bayesian Networks. Motivation

Artificial Intelligence Bayesian Networks

13.4 INDEPENDENCE. 494 Chapter 13. Quantifying Uncertainty

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016

CS 5100: Founda.ons of Ar.ficial Intelligence

Review: Bayesian learning and inference

Where are we? Knowledge Engineering Semester 2, Reasoning under Uncertainty. Probabilistic Reasoning

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine

CS 5522: Artificial Intelligence II

Introduction to Artificial Intelligence. Unit # 11

Computer Science CPSC 322. Lecture 18 Marginalization, Conditioning

Reasoning Under Uncertainty: Conditioning, Bayes Rule & the Chain Rule

CS 188: Artificial Intelligence. Our Status in CS188

Uncertainty and knowledge. Uncertainty and knowledge. Reasoning with uncertainty. Notes

CS 343: Artificial Intelligence

Basic Probability. Robert Platt Northeastern University. Some images and slides are used from: 1. AIMA 2. Chris Amato

CS 5522: Artificial Intelligence II

Our Status in CSE 5522

Lecture Overview. Introduction to Artificial Intelligence COMP 3501 / COMP Lecture 11: Uncertainty. Uncertainty.

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic

Probabilistic Models

n How to represent uncertainty in knowledge? n Which action to choose under uncertainty? q Assume the car does not have a flat tire

Transcription:

C:145 Artificial Intelligence@ Uncertainty Readings: Chapter 13 Russell & Norvig. Artificial Intelligence p.1/43 Logic and Uncertainty One problem with logical-agent approaches: Agents almost never have access to the whole truth about their environments. Very often, even in simple worlds, there are important questions for which there is no boolean answer. In that case, an agent must reason under uncertainty. Uncertainty also arises because of an agent s incomplete or incorrect understanding of its environment.

Uncertainty Let action L t = leave for airport t minutes before flight. Will L t get me there on time? Problems partial observability (road state, other drivers plans, etc.) noisy sensors (unreliable traffic reports) uncertainty in action outcomes (flat tire, etc.) immense complexity of modelling and predicting traffic Hence a purely logical approach either 1. risks falsehood ( A 25 will get me there on time ), or 2. leads to conclusions that are too weak for decision making ( A 25 will get me there on time if there s no Artificial Intelligence p.3/43 accident on the way, it doesn t rain, my tires remain intact, Reasoning under Uncertainty A rational agent is one that makes rational decisions (in order to maximize its performance measure). A rational decision depends on the relative importance of various goals, the likelihood they will be achieved the degree to which they will be achieved.

Handling Uncertain Knowledge Reasons FOL-based approaches fail to cope with domains like medical diagnosis: Laziness: too much work to write complete axioms, or too hard to work with the enormous sentences that result. Theoretical Ignorance: The available knowledge of the domain is incomplete. Practical Ignorance: The theoretical knowledge of the domain is complete but some evidential facts are missing. Artificial Intelligence p.5/43 Degrees of Belief In several real-world domains the agent s knowledge can only provide a degree of belief in the relevant sentences. The agent cannot say whether a sentence is true, but only that is true x% of the times. The main tool for handling degrees of belief is Probability Theory. The use of probability summarizes the uncertainty that stems from our laziness or ignorance about the domain.

Probability Theory Probability Theory makes the same ontological commitments as FOL: Every sentence ϕ is either true or false. The degree of belief that ϕ is true is a number P between 0 and 1. P(ϕ) = 1 ϕ is certainly true. P(ϕ) = 0 ϕ is certainly not true. P(ϕ) = 0.65 ϕ is true with a 65% chance. Artificial Intelligence p.7/43 Probability of Facts Let A be a propositional variable, a symbol denoting a proposition that is either true or false. P(A) denotes the probability that A is true in the absence of any other information. Similarly, P( A) = probability that A is false P(A B) = P(A B) = probability that both A and B are true probability that either A or B (or both) are true Examples: P( Blonde) P(Blonde BlueEyed) P(Blonde BlueEyed)

Conditional/Unconditional Probability P(A) is the unconditional (or prior) probability of fact A. An agent can use unconditional probability of A to reason about A only in the absence of no further information. If some further evidence B becomes available, the agent must use the conditional (or posterior) probability: P(A B) the probability of A given that all the agent knows is B. Note: P(A) can be thought as the conditional probability of A with respect to the empty evidence: P(A) = P(A ). Artificial Intelligence p.9/43 Conditional Probabilities The probability of a fact may change as the agent acquires more, or different, information: 1. P(Blonde) 2. P(Blonde Swedish) 3. P(Blonde Kenian) 4. P(Blonde Kenian EuroDescent) 1. If we know nothing about a person, the probability that he/she is blonde equals a certain value, say 0.2. 2. If we know that a person is Swedish the probability that s/he is blonde is much higher, say 0.9. 3. If we know that the person is Kenyan, the probability s/he is blonde much lower, say 0.000003. 4. If we know that the person is Kenyan and not of European descent, the probability s/he is blonde is

The Axioms of Probability Probability Theory is governed by the following axioms: 1. All probabilities are between 0 and 1. for all ϕ, 0 P(ϕ) 1 2. Valid propositions have probability 1. Unsatisfiable propositions have probability 0. P(α α) = 1 P(α α) = 0 3. The probability of disjunction is defined as follows. P(α β) = P(α)+P(β) P(α β) Artificial Intelligence p.11/43 Understanding Axiom 3 P(A B) = P(A)+P(B) P(A B)

Conditional Probabilities Conditional probabilities are defined in terms of unconditional ones, whenever P(B) > 0, P(A B) = P(A B) P(B) The same thing can be written as the product rule: P(A B) = P(A B)P(B) = P(B A)P(A) A and B are independent iff P(A B) = P(A) (or P(B A) = P(B), or P(A B) = P(A)P(B)). Artificial Intelligence p.13/43 Random Variable A random variable is a variable ranging over a certain domain of values. It is discrete if it ranges over a discrete (that is, countable) domain. continuous if it ranges over the real numbers. We will only consider discrete random variables with finite domains. Propositional variables can be seen as random variables over the Boolean domain.

Variable Random Variables Domain Age {1,2,...,120} W eather {sunny, dry, cloudy, rain, snow} Size {small, medium, large} Blonde {true, f alse} The probability that a random variable X has value val a is written as P(X = val) Note: P(X = true) is written simply as P(X). P(X = false) is written as P( X). a In Probability Theory, contrary to First-Order Logic, variables are capitalized and constant values are not. Artificial Intelligence p.15/43 Probability Distribution If X is a random variable, we use the bold case P(X) to denote a vector of values for the probabilites of each individual element that X can take. Example: P(Weather = sunny) = 0.6 P(Weather = rain) = 0.2 P(Weather = cloudy) = 0.18 P(Weather = snow) = 0.02 Then P(W eather) = 0.6, 0.2, 0.18, 0.02 (the value order of sunny, rain, cloudy, snow is assumed). P(Weather) is called a probability distribution for the random

Joint Probability Distribution If X 1,...,X n are random variables, P(X 1,...,X n ) denotes their joint probability distribution (JPD), an n-dimensional matrix specifying the probability of every possible combination of values for X 1,...,X n. Example Sky : {sunny, cloudy, rain, snow} Wind : {true,false} P(W ind, Sky) = sunny cloudy rain snow true 0.30 0.15 0.17 0.01 f alse 0.30 0.05 0.01 0.01 Artificial Intelligence p.17/43 Joint Probability Distribution All relevant probabilities about a vector X 1,...,X n of random variables can be computed from P(X 1,...,X n ). S = sunny S = cloudy S = rain S = snow P(W) W 0.30 0.15 0.17 0.01 0.63 W 0.30 0.05 0.01 0.01 0.37 P(S) 0.60 0.20 0.18 0.02 1.00 P(S = rain W) = 0.17 P(W) = 0.30+0.15+0.17+0.01 = 0.63 P(S = rain) = 0.17+0.01 = 0.18 P(S = rain W) = P(S = rain W)/P(W) = 0.17/0.63 = 0.27

Joint Probability Distribution A joint probability distribution P(X 1,...,X n ) provides complete information about the probabilities of its random variables. However, JPD s are often hard to create (again because of incomplete knowledge of the domain). Even when available, JPD tables are very expensive, or impossible, to store because of their size. A JPD table for n random variables, each ranging over k distinct values, has k n entries! A better approach is to come up with conditional probabilities as needed and compute the others from them. Artificial Intelligence p.19/43 An alternative to JPD: The Bayes Rule Recall that for any fact A and B, P(A B) = P(A B)P(B) = P(B A)P(A) From this we obtain the Bayes Rule: P(B A) = P(A B)P(B) P(A) The rule is useful in practice because it is often easier to figure out P(A B), P(B), P(A) than to figure out P(B A) directly.

Applying the Bayes Rule What is the probability that a patient has meningitis (M) given that he has a stiff neck (S)? P(M S) = P(S M)P(M) P(S) P(S M) is easier to estimate than P(M S) because it refers to causal knowledge: meningitis typically causes stiff neck. P(S M) can be estimated from past medical cases and the knowledge about how meningitis works. Similarly, P(M),P(S) can be estimated from statistical information. Artificial Intelligence p.21/43 Applying the Bayes Rule The Bayes rule is helpful even in absence of (immediate) causal relationships. What is the probability that a blonde (B) is Swedish (S)? P(S B) = P(B S)P(S) P(B) All P(B S),P(S),P(B) are easily estimated from statistical information. P(B S) P(S) P(B) # of blonde Swedish Swedish population = 9 10 Swedish population world population =... # of blondes world population =...

Conditional Independence In terms of exponential explosion, conditional probabilities do not seem any better than JPD s for computing the probability of a fact, given n > 1 pieces of evidence. P(Meningitis StiffNeck Nausea DoubleVision) However, certain facts do not always depend on all the evidence. P(Meningitis StiffNeck Astigmatic) = P(Meningitis StiffNeck) Meningitis and Astigmatic are conditionally independent, given StiffNeck. Artificial Intelligence p.23/43 Bayesian Networks Conditional independence information is crucial to making (automated) probabilistic reasoning feasible. Bayesian Networks are a successful example of probabilistic systems that exploit conditional independence to reason effectively under uncertainty.

Making Probabilistic Reasoning Feasible A joint probability distribution (JPD) contains all the relevant information to reason about the various kinds of probabilities of a set {X 1,...,X n } of random variables. Unfortunately, JPD tables are difficult to create and also very expensive to store. One possibility is to work with conditional probabilities and exploit the fact that many random variables are conditionally independent. Belief Networks are a successful example of probabilistic systems that exploit conditional independence to reason effectively under uncertainty. Artificial Intelligence p.25/43 Review of Some Concepts JPD is a collection of probabilties: P(X 1,...,X n ) = {P(X 1 = x 1 X n = x n ) x i Domain(X i )} Conditional Probability: P(X 1 = x 1 X 2 = x 2 ) = P(X 1=x 1 X 2 =x 2 ) P(X 2 =x 2 ) P(X 1 = x 1 X 2 = x 2 ) = P(X 1 = x 1 X 2 = x 2 )P(X 2 = x 2 ) or P(X 1 = x 1 X 2 = x 2,...,X n = x n ) = P(X 1=x 1 X 2 =x 2 X n =x n ) P(X 2 =x 2 X n =x n ) P(X 1 = x 1 X 2 = x 2 X n = x n ) or = P(X 1 = x 1 X 2 = x 2,...,X n = x n )P(X 2 = x 2 X n = x n ) = n i=1 P(X i = x i X i+1 = x i+1,...,x n = x n )

Review of Some Concepts (2) Conditional Independence: If P(X 1 = x 1 X 2 = x 2,...,X n = x n ) = P(X 1 = x 1 X 2 = x 2,...,X n 1 = x n 1 ) then we say X 1 = x 1 is conditionally indepedent of X n = x n given the evidence of X 2 = x 2,...,X n 1 = x n 1. Artificial Intelligence p.27/43 Belief Networks Let X 1,...,X n be random variables. A belief network (or Bayesian network) for X 1,...,X n is a graph with m nodes such that there is a node for each X i, all the edges between two nodes are directed, there are no cycles, each node has a conditional probability table (CPT), given in terms of the node s parents. The intuitive meaning of an edge from a node X i to a node X j is that X i has a direct influence on X j.

Conditional Probability Tables Each node X i in a belief network has a CPT expressing the probability of X i, given its parents as evidence. CPT for Alarm: Alarm Burglary Earthquake T F T T 0.950 0.050 T F 0.940 0.060 F T 0.290 0.710 F F 0.001 0.999 For instance, the value of P(Alarm Burglary Earthquake) is given in position (1,1) of the table; the value of P( Alarm Burglary Earthquake) is given in position (3,2) of the table. Artificial Intelligence p.29/43 A Belief Network with CPTs Note: The tables only show P(X) here because P( X) = 1 P(X).

The Semantics of Belief Networks There are two, equivalent, ways to interpret a belief network for the variables X 1,...,X n : 1. The network is a representation of the JPD P(X 1,...,X n ). 2. The network is a collection of conditional dependence statements about X 1,...,X n. Interpretation 1 is helpful in understanding how to construct belief networks. Interpretation 2 is helpful in designing inference procedures on top of them. Artificial Intelligence p.31/43 Belief Network as Joints The whole joint P(X 1,...,X n ) can be computed from a belief network for X 1,...,X n and its CPTs. For each tuple x 1,...,x n of possible values for X 1,...,X n, n P(X 1 = x 1 X n = x n ) = P(X i = x i Parents(X i )) i=1 Notation: Parents(X i ) is a subset of {X i = x i 1 i n}, that is, each parent of X i takes the value specified in x 1,...,x n. n P(X 1 = x 1 X 2 = x 2 X n = x n ) = P(X i = x i X i+1 = x i+1,...,x n = x n ) tells us that Parents(X i ) may be {X i+1 = x i+1,...,x n = x n }. i=1

Belief Network as Joints P(X 1 = x 1 X n = x n ) = n P(X i = x i Parents(X i )) i=1 P(J M A B E) = P(J A)P(M A)P(A B E)P( B)P( E) = 0.9 0.7 0.001 0.999 0.998 = 0.00062 Artificial Intelligence p.33/43 BN and Conditional Independence Let {X 1,X 2,...,X n } be any set of nodes in the network so that all the parents of X 1 are in {X 2,...,X n }, no node in {X 2,...,X n } is a descendant of X 1. Let x 1,...,x n be a value assignment for X 1,...,X n. From equation (1) we can show that, P(X 1 = x 1 X 2 = x 2 X n = x n ) = P(X 1 = x 1 Parents(X 1 )) Given its parents as evidence, each node is conditionally independent from any node other than itself and its parents in the network.

BN and Conditional Independence P(X 1 = x 1 X 2 = x 2 X n = x n ) = P(X 1 = x 1 Parents(X 1 )) Examples P(B E) = P(B) P(J M A) = P(J A) P(J A E) = P(J A) P(J A B E) = P(J A) P(J M A B E) = P(J A) Exercise: Find all the conditional independences holding in this network. Artificial Intelligence p.35/43 Constructing BN: General Procedure 1. Choose a set of random variables X i that describe the domain. 2. Order the variables into a list L. 3. Start with an empty network. 4. While there are variables left in L: (a) Remove the first variable X i of L and add a node for it to the network. (b) Choose a minimal set of nodes already in the network which satisfy the conditional dependence property with X i. (c) Make these nodes the parents of X i. (d) Fill in the CPT for X i.

Ordering the Variables The order in which add the variables to the network is important. Wrong orderings produces more complex networks. Artificial Intelligence p.37/43 Ordering the Variables Right A general, effective heuristic for constructing simpler belief networks is to exploit causal links between random variables whenever possible. That is, add the variables to the network so that causes get added before effects.

Locally Structured Systems In addition to being a complete and non-redundant representation of the domain, a belief network is often more compact than a full joint. The reason is that probabilistic domains are often representable as a locally structured system. In a locally structured system, each subcomponent interacts with only a bounded number of other components, regardless of the size of the system. The complexity of local structures generally grows linearly, instead of exponentially. Artificial Intelligence p.39/43 BN and Locally Structured Systems In many real-world domains, each random variable is influenced by at most k others, for some fixed constant k. Assuming that we have n variables and, for simplicity, that they are all Boolean, a joint table will have 2 n entries. In a well constructed belief network, each node will have at most k parents. This means that each node will have a CPT with at most 2 k entries, for a total of 2 k n entries. Example: n=20, k=5. entries in network CPTs 2 5 20 = 640

Inference in Belief Networks Main task of a belief network: Compute the conditional probability of a set of query variables, given exact values for some evidence variables. P(Query Evidence) Belief networks are flexible enough so that any node can serve as either a query or an evidence variable. In general, to decide what actions to take, an agent 1. first gets values for some variables from its percepts, or from its own reasoning, 2. then asks the network about the possible values of the other variables. Artificial Intelligence p.41/43 Probabilistic Inference with BN Belief networks are a very flexible tool for probabilistic inference because they allow several kinds of inference: Diagnostic inference (from effects to causes) Example: Infer P(Burglary JohnCalls) Causal inference (from causes to effects) Infer P(JohnCalls Burglary) Example: Intercausal inference (between causes of a common effect) Example: Infer P(Burglary Alarm Earthquake) Mixed inference (combination of the above) Example: Infer P(Alarm JohnCalls Earthquake)

Types of Inference in Belief Networks Q = query variable E = evidence variable Artificial Intelligence p.43/43