Uncertainty. 22c:145 Artificial Intelligence. Problem of Logic Agents. Foundations of Probability. Axioms of Probability

Similar documents
Artificial Intelligence

Uncertainty. Logic and Uncertainty. Russell & Norvig. Readings: Chapter 13. One problem with logical-agent approaches: C:145 Artificial

Uncertainty. Chapter 13

Uncertainty. Outline

Artificial Intelligence Uncertainty

Uncertainty. Chapter 13

Quantifying uncertainty & Bayesian networks

Uncertainty. Introduction to Artificial Intelligence CS 151 Lecture 2 April 1, CS151, Spring 2004

Pengju XJTU 2016

CS 561: Artificial Intelligence

COMP9414/ 9814/ 3411: Artificial Intelligence. 14. Uncertainty. Russell & Norvig, Chapter 13. UNSW c AIMA, 2004, Alan Blair, 2012

Uncertainty. Chapter 13, Sections 1 6

Uncertainty. Outline. Probability Syntax and Semantics Inference Independence and Bayes Rule. AIMA2e Chapter 13

Probabilistic Reasoning. (Mostly using Bayesian Networks)

Probabilistic Reasoning

Uncertain Knowledge and Reasoning

Outline. Uncertainty. Methods for handling uncertainty. Uncertainty. Making decisions under uncertainty. Probability. Uncertainty

Uncertainty and Bayesian Networks

Uncertainty. Chapter 13. Chapter 13 1

Resolution or modus ponens are exact there is no possibility of mistake if the rules are followed exactly.

Web-Mining Agents Data Mining

Probabilistic Robotics

Uncertainty. Russell & Norvig Chapter 13.

Probabilistic Reasoning Systems

Uncertainty (Chapter 13, Russell & Norvig) Introduction to Artificial Intelligence CS 150 Lecture 14

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II.

Probabilistic representation and reasoning

Ch.6 Uncertain Knowledge. Logic and Uncertainty. Representation. One problem with logical approaches: Department of Computer Science

Chapter 13 Quantifying Uncertainty

Probabilistic representation and reasoning

ARTIFICIAL INTELLIGENCE. Uncertainty: probabilistic reasoning

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Probabilistic Reasoning. Kee-Eung Kim KAIST Computer Science

Uncertainty. CmpE 540 Principles of Artificial Intelligence Pınar Yolum Uncertainty. Sources of Uncertainty

This lecture. Reading. Conditional Independence Bayesian (Belief) Networks: Syntax and semantics. Chapter CS151, Spring 2004

An AI-ish view of Probability, Conditional Probability & Bayes Theorem

10/18/2017. An AI-ish view of Probability, Conditional Probability & Bayes Theorem. Making decisions under uncertainty.

Uncertainty and Belief Networks. Introduction to Artificial Intelligence CS 151 Lecture 1 continued Ok, Lecture 2!

CS 380: ARTIFICIAL INTELLIGENCE UNCERTAINTY. Santiago Ontañón

Bayesian networks. Chapter 14, Sections 1 4

Brief Intro. to Bayesian Networks. Extracted and modified from four lectures in Intro to AI, Spring 2008 Paul S. Rosenbloom

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018

UNCERTAINTY. In which we see what an agent should do when not all is crystal-clear.

Graphical Models - Part I

Basic Probability and Decisions

CS 188: Artificial Intelligence Fall 2009

Probability theory: elements

Quantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari

PROBABILISTIC REASONING Outline

Probabilistic Models

n How to represent uncertainty in knowledge? n Which action to choose under uncertainty? q Assume the car does not have a flat tire

Pengju

PROBABILISTIC REASONING SYSTEMS

Bayesian Network. Outline. Bayesian Network. Syntax Semantics Exact inference by enumeration Exact inference by variable elimination

CS 188: Artificial Intelligence Spring Announcements

CS 5522: Artificial Intelligence II

Bayesian networks. Chapter Chapter

Uncertainty (Chapter 13, Russell & Norvig)

Review: Bayesian learning and inference

Fusion in simple models

CS 343: Artificial Intelligence

COMP5211 Lecture Note on Reasoning under Uncertainty

Introduction to Artificial Intelligence Belief networks

CS 5100: Founda.ons of Ar.ficial Intelligence

14 PROBABILISTIC REASONING

Reasoning with Uncertainty. Chapter 13

CS 188: Artificial Intelligence Fall 2009

CS 188: Artificial Intelligence Fall 2008

CSE 473: Artificial Intelligence Autumn 2011

Where are we in CS 440?

Probabilistic Reasoning

Bayesian networks. Chapter AIMA2e Slides, Stuart Russell and Peter Norvig, Completed by Kazim Fouladi, Fall 2008 Chapter 14.

CS 188: Artificial Intelligence. Our Status in CS188

Reasoning Under Uncertainty

Bayesian Belief Network

Bayesian networks. Chapter Chapter

COMP9414: Artificial Intelligence Reasoning Under Uncertainty

Bayesian networks. Chapter Chapter Outline. Syntax Semantics Parameterized distributions. Chapter

Bayesian Networks. Vibhav Gogate The University of Texas at Dallas

Course Introduction. Probabilistic Modelling and Reasoning. Relationships between courses. Dealing with Uncertainty. Chris Williams.

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine

Bayesian Networks. Vibhav Gogate The University of Texas at Dallas

Our Status. We re done with Part I Search and Planning!

Y. Xiang, Inference with Uncertain Knowledge 1

Informatics 2D Reasoning and Agents Semester 2,

Reasoning Under Uncertainty

Probabilistic Models. Models describe how (a portion of) the world works

Probability Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 27 Mar 2012

Bayesian Reasoning. Adapted from slides by Tim Finin and Marie desjardins.

Where are we in CS 440?

Course Overview. Summary. Outline

Artificial Intelligence Programming Probability

Bayesian networks (1) Lirong Xia

Lecture Overview. Introduction to Artificial Intelligence COMP 3501 / COMP Lecture 11: Uncertainty. Uncertainty.

CSE 473: Artificial Intelligence

Quantifying Uncertainty

Bayesian networks: Modeling

CS Belief networks. Chapter

13.4 INDEPENDENCE. 494 Chapter 13. Quantifying Uncertainty

Bayesian Networks. Semantics of Bayes Nets. Example (Binary valued Variables) CSC384: Intro to Artificial Intelligence Reasoning under Uncertainty-III

Transcription:

Problem of Logic Agents 22c:145 Artificial Intelligence Uncertainty Reading: Ch 13. Russell & Norvig Logic-agents almost never have access to the whole truth about their environments. A rational agent is one that makes rational decisions in order to maximize its performance measure. Logic-agents g may have to either risk falsehood or make weak decisions in uncertain situation A rational agent s decision depends on relative importance of goals, likelihood of achieving them. Probability theory provides a quantitative way of encoding likelihood Lecture 14 1 Lecture 14 2 Foundations of Probability Probability Theory makes the same ontological commitments as FOL Every sentence S is either true or false. The degree of belief, or probability, that S is true is a number P between 0 and 1. P(S) = 1 iff S is certainly true P(S) = 0 iff S is certainly false P(S) = 0.4 iff S is true with a 40% chance P(not A) = probability that A is false P(A and B) = probability that both A and B are true P(A or B) = probability that either A or B (or both) are true Axioms of Probability All probabilities are between 0 and 1 Valid propositions have probability 1. Unsatisfiable propositions have probability 0. That is, P(A v : A) = P(true) = 1 P(A Æ:A) = P(false) = 0 P(: A) = 1 P(A) The probability of disjunction is defined as follows. P(A v B) = P(A) + P(B) P(A Æ B) P(A Æ B) = P(A) + P(B) P(A v B) A B U Lecture 14 3 Lecture 14 4 Exercise Problem I Prove that P(A v B v C) = P(A) + P(B) + P(C) P(A Æ B) P(A Æ C) P(B Æ C) + P(A Æ B Æ C) How to Decide Values of Probability P(the sun comes up tomorrow) = 0.999 Frequentist Probability is inherent in the process Probs can be Probability is estimated from wrong! measurements Lecture 14 5 Lecture 14 6 1

A Question Jane is from Berkeley. She was active in anti-war protests in the 60 s. She lives in a commune. Which is more probable? 1. Jane is a bank teller 2. Jane is a feminist bank teller A Question Jane is from Berkeley. She was active in anti-war protests in the 60 s. She lives in a commune. Which is more probable? 1. Jane is a bank teller 2. Jane is a feminist bank teller 1. A 2. A Æ B A A Æ B B U Lecture 14 7 Lecture 14 8 Conditional Probability P(A) is the unconditional (or prior) probability An agent can use unconditional probability of A to reason about A only in the absence of no further information. If some further evidence B becomes available, the agent must use the conditional (or posterior) probability: pobab P(A B) the probability of A given that the agent already knew that B is true. P(A) can be thought as the conditional probability of A with respect to the empty evidence: P(A) = P(A ). Conditional Probability 1. P(Blonde) = 2. P(Blonde Swedish) = 3. P(Blonde Kenian) = 4. P(Blonde Kenian Æ :EuroDescent) = If we know nothing about a person, the probability that he/she is blonde equals a certain value, say 0.1. If we know that a person is Swedish the probability that s/he is blonde is much higher, say 0.9. If we know that the person is Kenyan, the probability s/he is blonde much lower, say 0.000003. If we know that the person is Kenyan and not of European descent, the probability s/he is blonde is basically 0. Computation: P(A B) = P(A Æ B)/P(B) Lecture 14 9 Lecture 14 10 Random Variables Variable Domain Age { 1, 2,, 120 } Weather { sunny, dry, cloudy, raining } Size { small, medium, large } Raining { true, false } The probability that a random variable X has value val is written as P(X=val) P: domain! [0, 1] Sums to 1 over the domain: P(Raining = true) = P(Raining) = 0.2 P(Raining = false) = P(: Raining) = 0.8 Probability Distribution If X is a random variable, we use the bold case P(X) to denote a vector of values for the probabilites of each individual element that X can take. : P(Weather = sunny) = 0.6 P(Weather = rain) = 0.2 P(Weather = cloudy) = 0.18 P(Weather = snow) = 0.0202 Then P(Weather) = <0.6, 0.2, 0.18, 0.02> (the value order of sunny'', rain'', cloudy'', snow'' is assumed). P(Weather) is called a probability distribution for the random variable Weather. Joint distribution: P(X1, X2,, Xn) Probability assignment to all combinations of values of random variables Lecture 14 11 Lecture 14 12 2

Joint Distribution Joint Distribution Toothache :Toothache Cavity 0.04 0.06 : Cavity 0.01 0.89 Toothache :Toothache Cavity 0.04 0.06 : Cavity 0.01 0.89 The sum of the entries in this table has to be 1 The sum of the entries in this table has to be 1 Given this table, one can answer all the probability questions about this domain P(cavity) = 0.1 [add elements of cavity row] P(toothache) = 0.05 [add elements of toothache column] Lecture 14 13 Lecture 14 14 Joint Distribution Joint Distribution Toothache :Toothache Cavity 0.04 0.06 : Cavity 0.01 0.89 Toothache :Toothache Cavity 0.04 0.06 : Cavity 0.01 0.89 The sum of the entries in this table has to be 1 Given this table, one can answer all the probability questions about this domain P(cavity) = 0.1 [add elements of cavity row] P(toothache) = 0.05 [add elements of toothache column] P(A B) = P(A Æ B)/P(B) [prob of A when U is limited to B] The sum of the entries in this table has to be 1 Given this table, one can answer all the probability questions about this domain P(cavity) = 0.1 [add elements of cavity row] P(toothache) = 0.05 [add elements of toothache column] P(A B) = P(A Æ B)/P(B) [prob of A when U is limited to B] P(cavity toothache) = 0.04/0.05 = 0.8 A B U Lecture 14 15 A Æ B Lecture 14 16 Joint Probability Distribution (JPD) A joint probability distribution P(X1, X2, Xn) provides complete information about the probabilities of its random variables. However, JPD's are often hard to create (again because of incomplete knowledge of the domain). Even when available, JPD tables are very expensive, or impossible, to store because of their size. A JPD table for n random variables, each ranging over k distinct values, has k n entries! A better approach is to come up with conditional probabilities as needed and compute the others from them. Bayes Rule Bayes Rule P(A B) = P(B A) P(A) / P(B) What is the probability that a patient has meningitis (M) given that he has a stiff neck (S)? P(M S) = P(S M) P(M)/P(S) P(S M) is easier to estimate than P(M S) because it refers to causal knowledge: meningitis typically causes stiff neck. P(S M) can be estimated from past medical cases and the knowledge about how meningitis works. Similarly, P(M), P(S) can be estimated from statistical information. Lecture 14 17 Lecture 14 18 3

Bayes Rule Bayes Rule: P(A B) = P(B A) P(A) / P(B) The Bayes rule is helpful even in absence of (immediate) causal relationships. What is the probability that a blonde (B) is Swedish (S)? P(S B) = P(B S) P(S)/P(B) All P(B S), P(S), P(B) are easily estimated from statistical information. P(B S) = (# of blonde Swedish)/(Swedish population) = 9/10 P(S) = Swedish population/world population = P(B) = # of blondes/world population = Conditional Conditioning P(A) = P(A B) P(B) + P(A :B) P(:B) = P(A Æ B) + P(A Æ :B) In terms of exponential explosion, conditional probabilities do not seem any better than JPD's for computing the probability of a fact, given n>1 pieces of evidence. P(Meningitis StiffNeck Æ Nausea Æ Æ DoubleVision) However, certain facts do not always depend on all the evidence. P(Meningitis StiffNeck Æ Astigmatic) = P(Meningitis StiffNeck) Meningitis and Astigmatic are conditionally independent, given StiffNeck. Lecture 14 19 Lecture 14 20 A and B are independent iff P(A Æ B) = P(A) P(B) P(A B) = P(A) P(B A) = P(B) A and B are independent iff P(A Æ B) = P(A) P(B) P(A B) = P(A) P(B A) = P(B) is essential for efficient probabilistic reasoning A and B are conditionally independent given C iff P(A B, C) = P(A C) P(B A, C) = P(B C) P(A Æ B C) = P(A C) P(B C) Lecture 14 21 Lecture 14 22 s of Conditional Toothache (T) Spot in Xray (X) Cavity (C) s of Conditional Toothache (T) Spot in Xray (X) Cavity (C) None of these propositions are independent of one other T and X are conditionally independent given C Lecture 14 23 Lecture 14 24 4

s of Conditional Toothache (T) Spot in Xray (X) Cavity (C) None of these propositions are independent of one other T and X are conditionally independent given C Battery is dead (B) Radio plays (R) Starter turns over (S) s of Conditional Toothache (T) Spot in Xray (X) Cavity (C) None of these propositions are independent of one other T and X are conditionally independent given C Battery is dead (B) Radio plays (R) Starter turns over (S) None of these propositions are independent of one another R and S are conditionally independent given B Lecture 14 25 Lecture 14 26 Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? Problems: 1. partial observability (road state, other drivers' plans, etc. 2. noisy sensors (traffic reports) 3. uncertainty in action outcomes (flat tire, etc.) 4. immense complexity of modeling and predicting traffic Hence a purely logical approach either 1. risks falsehood: A 25 will get me there on time, or 2. leads to conclusions that are too weak for decision making: A 25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact etc etc. (A 1440 might reasonably be said to get me there on time but I'd have to stay overnight in the airport ) Lecture 14 27 Methods for handling uncertainty Default or nonmonotonic logic: Assume my car does not have a flat tire Assume A 25 works unless contradicted by evidence Issues: What assumptions are reasonable? How to handle contradiction? Rules with fudge factors: A 25 0.3 get there on time Sprinkler 0.99 WetGrass WetGrass 0.7 Rain Issues: Problems with combination, e.g., Sprinkler causes Rain?? Probability Model agent's degree of belief Given the available evidence, A 25 will get me there on time with probability 0.04 Lecture 14 28 Inference by enumeration Start with the joint probability distribution: Inference by enumeration Start with the joint probability distribution: For any proposition φ, sum the atomic events where it is true: P(φ) = Σ ω:ω φ P(ω) For any proposition φ, sum the atomic events where it is true: P(φ) = Σ ω:ω φ P(ω) P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2 Lecture 14 29 Lecture 14 30 5

Inference by enumeration Start with the joint probability distribution: Inference by enumeration Start with the joint probability distribution: For any proposition φ, sum the atomic events where it is true: P(φ) = Σ ω:ω φ P(ω) P(toothache \/ cavity) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28 Can also compute conditional probabilities: P( cavity toothache) = P( cavity toothache) P(toothache) = 0.016+0.064 0.108 + 0.012 + 0.016 + 0.064 = 0.4 Lecture 14 31 Lecture 14 32 Normalization Inference by enumeration Typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E Let the hidden variables be H = X - Y - E Denominator can be viewed as a normalization constant α P(Cavity toothache) = α P(Cavity,toothache) = α [P(Cavity,toothache,catch) + P(Cavity,toothache, catch)] = α [<0.108,0.016> + <0.012,0.064>] = α <0.12,0.08> = <0.6,0.4> where α = 1/ P(toothache) General idea: compute distribution on query variable by fixing evidence variables and summing over hidden variables Then the required summation of joint entries is done by summing out the hidden variables: P(Y E = e) = αp(y,e = e) = ασ h P(Y,E= e, H = h) where α = 1/ P(E = e) ) The terms in the summation are joint entries because Y, E and H together exhaust the set of random variables Obvious problems: 1. Worst-case time complexity O(d n ) where d is the largest arity 2. Space complexity O(d n ) to store the joint distribution 3. How to find the numbers for O(d n ) entries? Lecture 14 33 Lecture 14 34 Conditional independence P(Toothache, Cavity, Catch) has 2 3 1 = 7 independent entries If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache: (1) P(catch toothache, cavity) = P(catch cavity) The same independence holds if I haven't got a cavity: (2) P(catch toothache, cavity) = P(catch cavity) Catch is conditionally independent of Toothache given Cavity: P(Catch Toothache,Cavity) = P(Catch Cavity) Equivalent statements: P(Toothache Catch, Cavity) = P(Toothache Cavity) P(Toothache, Catch Cavity) = P(Toothache Cavity) P(Catch Cavity) Conditional independence Write out full joint distribution using chain rule: P(Toothache, Catch, Cavity) = P(Toothache Catch, Cavity) P(Catch, Cavity) = P(Toothache Catch, Cavity) P(Catch Cavity) P(Cavity) = P(Toothache Cavity) P(Catch Cavity) P(Cavity) I.e., 2 + 2 + 1 = 5 independent numbers In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n. Conditional independence is our most basic and robust form of knowledge about uncertain environments. Lecture 14 35 Lecture 14 36 6

Bayes' Rule Product rule P(a b) = P(a b) P(b) = P(b a) P(a) Bayes' rule: P(a b) = P(b a) P(a) / P(b) or in distribution form P(Y X) = P(X Y) P(Y) / P(X) = αp(x Y) P(Y) Useful for assessing diagnostic probability from causal probability: P(Cause Effect) = P(Effect Cause) P(Cause) / P(Effect) Bayes' Rule and conditional independence P(Cavity toothache catch) = αp(toothache catch Cavity) P(Cavity) = αp(toothache Cavity) P(catch Cavity) P(Cavity) This is an example of a naïve Bayes model: P(Cause,Effect 1,,Effect n ) = P(Cause) π i P(Effect i Cause) E.g., let M be meningitis, S be stiff neck: P(m s) = P(s m) P(m) / P(s) = 0.8 0.0001 / 0.1 = 0.0008 Note: posterior probability of meningitis still very small! Total number of parameters is linear in n Lecture 14 37 Lecture 14 38 Summary Probability is a rigorous formalism for uncertain knowledge Joint probability distribution specifies probability of every atomic event Queries can be answered by summing over atomic events For nontrivial domains, we must find a way to reduce the joint size and conditional independence provide the tools Bayesian Networks To do probabilistic reasoning, you need to know the joint probability distribution But, in a domain with N propositional variables, one needs 2 N numbers to specify the joint probability distribution We want to exploit independences in the domain Two components: structure and numerical parameters Lecture 14 39 Lecture 14 40 Bayesian networks A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions Topology of network encodes conditional independence assertions: Syntax: a set of nodes, one per variable a directed, acyclic graph (link "directly influences") a conditional distribution for each node given its parents: P (X i Parents (X i )) In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over X i for each combination of parent values Weather is independent of the other variables Toothache and Catch are conditionally independent given Cavity Lecture 14 41 Lecture 14 42 7

contd. I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects "causal" knowledge: A burglar can set the alarm off An earthquake can set the alarm off The alarm can cause Mary to call The alarm can cause John to call Lecture 14 43 Lecture 14 44 Compactness A CPT for Boolean X i with k Boolean parents has 2 k rows for the combinations of parent values Each row requires one number p for X i = true (the number for X i = false is just 1-p) If each variable has no more than k parents, the complete network requires O(n 2 k ) numbers I.e., grows linearly with n, vs. O(2 n ) for the full joint distribution Semantics The full joint distribution is defined as the product of the local conditional distributions: P (X 1,,X n ) = π i = 1 P (X i Parents(X i )) n e.g., P(j m a b e) = P (j a) P (m a) P (a b, e) P ( b) P ( e) For burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2 5-1 = 31) Lecture 14 45 Lecture 14 46 Constructing Bayesian networks 1. Choose an ordering of variables X 1,,X n 2. For i = 1 to n add X i to the network select parents from X 1,,X i-1 such that P (X i Parents(X i )) = P (X i X 1 1,... X i-1 1) n This choice of parents guarantees: n P (X 1,,X n ) = π i =1 P (X i X 1,, X i-1 ) (chain rule) = π i =1 P (X i Parents(X i )) (by construction) Lecture 14 47 Lecture 14 48 8

No P(A J, M) = P(A J)? P(A J, M) = P(A)? No P(A J, M) = P(A J)? P(A J, M) = P(A)? No P(B A, J, M) = P(B A)? P(B A, J, M) = P(B)? Lecture 14 49 Lecture 14 50 No P(A J, M) = P(A J)? P(A J, M) = P(A)? No P(B A, J, M) = P(B A)? Yes P(B A, J, M) = P(B)? No P(E B, A,J, M) = P(E A)? P(E B, A, J, M) = P(E A, B)? No P(A J, M) = P(A J)? P(A J, M) = P(A)? No P(B A, J, M) = P(B A)? Yes P(B A, J, M) = P(B)? No P(E B, A,J, M) = P(E A)? No P(E B, A, J, M) = P(E A, B)? Yes Lecture 14 51 Lecture 14 52 contd. Summary Bayesian networks provide a natural representation for (causally induced) conditional independence Topology + CPTs = compact representation of joint distribution Generally easy for domain experts to construct Deciding conditional independence is hard in noncausal directions (Causal models and conditional independence seem hardwired for humans!) Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed Lecture 14 53 Lecture 14 54 9