CSC384: Intro to Artificial Intelligence Reasoning under Uncertainty-II



Bayes Rule Example
Disease ∈ {malaria, cold, flu}; Symptom = fever.
We must compute Pr(D | fever) to prescribe treatment. Why not assess this quantity directly?
Pr(mal | fever) is not natural to assess; Pr(fever | mal) reflects the underlying causal mechanism.
Pr(mal | fever) is not stable: a malaria epidemic, for example, changes this quantity.
So we use Bayes rule: Pr(mal | fever) = Pr(fever | mal) Pr(mal) / Pr(fever)

Bayes Rule
Pr(mal | fever) = Pr(fever | mal) Pr(mal) / Pr(fever)
What about Pr(mal)? This is the prior probability of malaria, i.e., before you exhibited a fever. With it we can account for other factors, e.g., a malaria epidemic or recent travel to a malaria risk zone.

Bayes Rule
Pr(mal | fever) = Pr(fever | mal) Pr(mal) / Pr(fever)
What about Pr(fever)? We compute Pr(fever | mal) Pr(mal), Pr(fever | cold) Pr(cold), and Pr(fever | flu) Pr(flu), i.e., solve the same problem for each possible cause of fever.
Pr(fever | mal) Pr(mal) = Pr(fever ∧ mal) / Pr(mal) * Pr(mal) = Pr(fever ∧ mal)
Similarly we obtain Pr(fever ∧ cold) and Pr(fever ∧ flu).

Bayes Rule
Say that in our example m, c and fl are the only possible causes of fever (Pr(fev | ¬m ∧ ¬c ∧ ¬fl) = 0) and they are mutually exclusive. Then
Pr(fev) = Pr(m ∧ fev) + Pr(c ∧ fev) + Pr(fl ∧ fev)
So we can sum the numbers we obtained for each disease to obtain the last factor in Bayes rule, Pr(fever). Then we divide each answer by Pr(fever) to get the final probabilities Pr(mal | fever), Pr(cold | fever), Pr(flu | fever).
This summing and dividing by the sum ensures that Pr(mal | fever) + Pr(cold | fever) + Pr(flu | fever) = 1, and is called normalizing.
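As a concrete illustration of this normalization step, here is a minimal Python sketch. The priors and likelihoods below are hypothetical numbers chosen for the example, not values from the course.

```python
# Minimal sketch of Bayes rule with normalization (hypothetical numbers).
priors = {"mal": 0.001, "cold": 0.30, "flu": 0.10}      # Pr(D)
likelihoods = {"mal": 0.95, "cold": 0.40, "flu": 0.80}  # Pr(fever | D)

# Unnormalized posteriors: Pr(fever | D) * Pr(D) = Pr(fever ∧ D)
joint = {d: likelihoods[d] * priors[d] for d in priors}

# If the three diseases are the only (mutually exclusive) causes of fever,
# Pr(fever) is just the sum of these joint terms.
p_fever = sum(joint.values())

# Normalizing gives Pr(D | fever); the three posteriors sum to 1.
posterior = {d: joint[d] / p_fever for d in joint}
print(posterior)
```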

Chain Rule
Pr(A1 ∧ A2 ∧ … ∧ An) = Pr(A1 | A2 ∧ … ∧ An) * Pr(A2 | A3 ∧ … ∧ An) * … * Pr(An-1 | An) * Pr(An)
Proof:
Pr(A1 | A2 ∧ … ∧ An) * Pr(A2 | A3 ∧ … ∧ An) * … * Pr(An-1 | An) * Pr(An)
= Pr(A1 ∧ A2 ∧ … ∧ An) / Pr(A2 ∧ … ∧ An) * Pr(A2 ∧ … ∧ An) / Pr(A3 ∧ … ∧ An) * … * Pr(An-1 ∧ An) / Pr(An) * Pr(An)
= Pr(A1 ∧ A2 ∧ … ∧ An), since the product telescopes.
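A quick numeric sanity check of the chain rule, sketched in Python: build an arbitrary joint over three binary variables and verify that the product of conditionals equals the joint (the product telescopes, so the match is exact up to floating point). The construction is mine, not from the slides.

```python
# Check the chain rule Pr(A1,A2,A3) = Pr(A1|A2,A3) Pr(A2|A3) Pr(A3)
# on a randomly generated joint distribution over three binary variables.
import itertools
import random

random.seed(0)
joint = {v: random.random() for v in itertools.product([True, False], repeat=3)}
total = sum(joint.values())
joint = {v: p / total for v, p in joint.items()}  # normalize to a proper joint

def pr(event):
    """Marginal probability of an event, e.g. pr({0: True, 2: False}) = Pr(A1 ∧ ¬A3)."""
    return sum(p for v, p in joint.items()
               if all(v[i] == val for i, val in event.items()))

lhs = pr({0: True, 1: True, 2: True})
rhs = (pr({0: True, 1: True, 2: True}) / pr({1: True, 2: True})  # Pr(A1 | A2 ∧ A3)
       * pr({1: True, 2: True}) / pr({2: True})                  # Pr(A2 | A3)
       * pr({2: True}))                                          # Pr(A3)
assert abs(lhs - rhs) < 1e-12
```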

Variable Independence
Recall that we will mainly be dealing with probabilities over feature vectors. We have a set of variables, each with a domain of values.
It could be that the events {V1=a} and {V2=b} are independent: Pr(V1=a ∧ V2=b) = Pr(V1=a) * Pr(V2=b)
It could also be that {V1=b} and {V2=b} are not independent: Pr(V1=b ∧ V2=b) ≠ Pr(V1=b) * Pr(V2=b)

Variable Independence
However, we will generally want to deal with the situation where we have variable independence.
Two variables X and Y are conditionally independent given variable Z iff for all x ∈ Dom(X), y ∈ Dom(Y), z ∈ Dom(Z), the event X=x is conditionally independent of Y=y given Z=z:
Pr(X=x ∧ Y=y | Z=z) = Pr(X=x | Z=z) * Pr(Y=y | Z=z)
The definition also applies to sets of more than two variables, and to the unconditional case (X, Y independent).
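The definition quantifies over every combination of values, so a direct check walks the whole joint table. Here is a small Python sketch of that check for binary X, Y, Z; the function and the example joint are illustrative, not part of the course material.

```python
# Check variable conditional independence X ⟂ Y | Z on an explicit joint table.
import itertools

def conditionally_independent(joint, tol=1e-9):
    """joint maps (x, y, z) -> Pr(X=x, Y=y, Z=z); returns True iff
    Pr(X=x ∧ Y=y | Z=z) = Pr(X=x | Z=z) * Pr(Y=y | Z=z) for all x, y, z."""
    vals = (True, False)
    for z in vals:
        p_z = sum(joint[(x, y, z)] for x in vals for y in vals)
        if p_z == 0:
            continue  # conditioning on a zero-probability value is vacuous
        for x, y in itertools.product(vals, vals):
            p_xy_z = joint[(x, y, z)] / p_z
            p_x_z = sum(joint[(x, yy, z)] for yy in vals) / p_z
            p_y_z = sum(joint[(xx, y, z)] for xx in vals) / p_z
            if abs(p_xy_z - p_x_z * p_y_z) > tol:
                return False
    return True

# A joint built as Pr(Z) Pr(X|Z) Pr(Y|Z) is conditionally independent
# by construction (hypothetical numbers).
pz = {True: 0.4, False: 0.6}
px_z = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}
py_z = {True: {True: 0.3, False: 0.7}, False: {True: 0.6, False: 0.4}}
joint = {(x, y, z): pz[z] * px_z[z][x] * py_z[z][y]
         for x in (True, False) for y in (True, False) for z in (True, False)}
print(conditionally_independent(joint))  # True
```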

Variable Independence
Intuitively: if you know the value of Z (whatever it is), learning Y's value (whatever it is) does not influence your beliefs about any of X's values.
Note that these definitions differ from the earlier ones, which talk about particular sets of events being independent. Variable independence is a concise way of stating a number of individual independencies.

What does independence buy us?
Suppose (say, boolean) variables X1, X2, …, Xn are mutually independent (i.e., every subset is variable independent of every other subset).
Then we can specify the full joint distribution (the probability function over all vectors of values) using only n parameters (linear) instead of 2^n − 1 (exponential).
How? Simply specify Pr(X1), …, Pr(Xn) (i.e., Pr(Xi=true) for all i).
From this we can easily recover the probability of any primitive event (or any conjunctive query), e.g.
Pr(X1 ∧ ¬X2 ∧ X3 ∧ X4) = Pr(X1) (1 − Pr(X2)) Pr(X3) Pr(X4)
And we can condition on an observed value Xk (or ¬Xk) trivially:
Pr(X1 ∧ ¬X2 | X3) = Pr(X1) (1 − Pr(X2))
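A short sketch of this in Python: with full mutual independence, n marginals answer any conjunctive query by multiplication. The marginal values below are hypothetical, chosen only to illustrate the computation.

```python
# With mutually independent boolean variables, n marginals suffice.
marginals = {"X1": 0.9, "X2": 0.2, "X3": 0.6, "X4": 0.5}  # Pr(Xi = true)

def pr_conjunction(literals):
    """literals maps variable name -> desired truth value,
    e.g. {"X1": True, "X2": False} stands for X1 ∧ ¬X2."""
    p = 1.0
    for var, value in literals.items():
        p *= marginals[var] if value else (1.0 - marginals[var])
    return p

# Pr(X1 ∧ ¬X2 ∧ X3 ∧ X4) = Pr(X1)(1 − Pr(X2))Pr(X3)Pr(X4)
print(pr_conjunction({"X1": True, "X2": False, "X3": True, "X4": True}))

# Conditioning on an independent observation X3 changes nothing:
# Pr(X1 ∧ ¬X2 | X3) = Pr(X1)(1 − Pr(X2))
print(pr_conjunction({"X1": True, "X2": False}))
```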

The Value of Independence
Complete independence reduces both representation of the joint and inference from O(2^n) to O(n)!
Unfortunately, such complete mutual independence is very rare. Most realistic domains do not exhibit this property.
Fortunately, most domains do exhibit a fair amount of conditional independence, and we can exploit conditional independence for representation and inference as well. Bayesian networks do just this.

An Aside on Notation
Pr(X) for a variable X (or set of variables) refers to the (marginal) distribution over X. It specifies Pr(X=d) for all d ∈ Dom[X].
Note that Σ_{d ∈ Dom[X]} Pr(X=d) = 1 (every vector of values must be in one of the sets {X=d}, d ∈ Dom[X]).
Also Pr(X=d1 ∧ X=d2) = 0 for all d1, d2 ∈ Dom[X] with d1 ≠ d2 (no vector of values contains two different values for X).

An Aside on Notation
Pr(X | Y) refers to a family of conditional distributions over X, one for each y ∈ Dom(Y). For each d ∈ Dom[Y], Pr(X | Y) specifies a distribution over the values of X: Pr(X=d1 | Y=d), Pr(X=d2 | Y=d), …, Pr(X=dn | Y=d) for Dom[X] = {d1, d2, …, dn}.
Distinguish between Pr(X), which is a distribution, and Pr(xi) (xi ∈ Dom[X]), which is a number. Think of Pr(X) as a function that accepts any xi ∈ Dom(X) as an argument and returns Pr(xi).
Similarly, think of Pr(X | Y) as a function that accepts any xi ∈ Dom[X] and yk ∈ Dom[Y] and returns Pr(xi | yk). Note that Pr(X | Y) is not a single distribution; rather it denotes the family of distributions (over X) induced by the different yk ∈ Dom(Y).
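One illustrative way to realize this "distribution as a function" view in code (the dict encoding and names are mine, not the course's): a distribution is a map from value to number, and Pr(X | Y) is a map of such distributions keyed by the value of Y.

```python
# Pr(X): a single distribution, i.e. a map value -> number, summing to 1.
pr_X = {"d1": 0.2, "d2": 0.5, "d3": 0.3}

# Pr(X | Y): a family of distributions over X, one per y in Dom(Y).
pr_X_given_Y = {
    "y1": {"d1": 0.7, "d2": 0.2, "d3": 0.1},
    "y2": {"d1": 0.1, "d2": 0.1, "d3": 0.8},
}

print(pr_X["d2"])                # Pr(X=d2): a number
print(pr_X_given_Y["y1"]["d3"])  # Pr(X=d3 | Y=y1): also a number
# Each member of the family is itself a proper distribution over X.
assert all(abs(sum(dist.values()) - 1.0) < 1e-9
           for dist in pr_X_given_Y.values())
```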

Exploiting Conditional Independence
Let's see what conditional independence buys us. Consider a story: if Craig woke up too early (E), Craig probably needs coffee (C); if Craig needs coffee, he's likely angry (A); if A, there is an increased chance of an aneurysm (burst blood vessel) B; if B, Craig is quite likely to be hospitalized (H).
E → C → A → B → H
E: Craig woke too early; C: Craig needs coffee; A: Craig is angry; B: Craig burst a blood vessel; H: Craig hospitalized.

Cond'l Independence in our Story
E → C → A → B → H
If you learned any of E, C, A, or B, your assessment of Pr(H) would change. E.g., if any of these were seen to be true, you would increase Pr(h) and decrease Pr(~h). So H is not independent of E, or C, or A, or B.
But if you knew the value of B (true or false), learning the value of E, C, or A would not influence Pr(H). The influence these factors have on H is mediated by their influence on B: Craig doesn't get sent to the hospital because he's angry, he gets sent because he's had an aneurysm.
So H is independent of E, C, and A, given B.

Cond'l Independence in our Story
E → C → A → B → H
Similarly: B is independent of E and C given A; A is independent of E given C.
This means that (where S is any subset of the listed variables):
Pr(H | B, S) = Pr(H | B) for any S ⊆ {A, C, E}
Pr(B | A, S) = Pr(B | A) for any S ⊆ {C, E}
Pr(A | C, S) = Pr(A | C) for any S ⊆ {E}
Pr(C | E) and Pr(E) don't simplify.

Cond'l Independence in our Story
E → C → A → B → H
By the chain rule (for any instantiation of H, B, A, C, E):
Pr(H,B,A,C,E) = Pr(H | B,A,C,E) Pr(B | A,C,E) Pr(A | C,E) Pr(C | E) Pr(E)
By our independence assumptions:
Pr(H,B,A,C,E) = Pr(H | B) Pr(B | A) Pr(A | C) Pr(C | E) Pr(E)
So we can specify the full joint by specifying five local conditional distributions: Pr(H | B), Pr(B | A), Pr(A | C), Pr(C | E), and Pr(E).

Example Quantification
E → C → A → B → H
Pr(e) = 0.7, Pr(~e) = 0.3
Pr(c | e) = 0.9, Pr(~c | e) = 0.1, Pr(c | ~e) = 0.5, Pr(~c | ~e) = 0.5
Pr(a | c) = 0.3, Pr(~a | c) = 0.7, Pr(a | ~c) = 1.0, Pr(~a | ~c) = 0.0
Pr(b | a) = 0.2, Pr(~b | a) = 0.8, Pr(b | ~a) = 0.1, Pr(~b | ~a) = 0.9
Pr(h | b) = 0.9, Pr(~h | b) = 0.1, Pr(h | ~b) = 0.1, Pr(~h | ~b) = 0.9
Specifying the joint requires only 9 parameters (if we note that half of these are 1 minus the others), instead of 31 for an explicit representation: linear in the number of variables instead of exponential! This is linear in general whenever the dependence has a chain structure.
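To make the factored representation concrete, here is a minimal Python sketch of these five local distributions (the dict encoding and names are mine, not the course's), with the joint computed as the product of the factors and a sanity check that it sums to 1.

```python
import itertools

# Chain-structured CPTs from the slide, keyed by the parent's value.
pr_e = {True: 0.7, False: 0.3}
pr_c_given_e = {True: {True: 0.9, False: 0.1}, False: {True: 0.5, False: 0.5}}
pr_a_given_c = {True: {True: 0.3, False: 0.7}, False: {True: 1.0, False: 0.0}}
pr_b_given_a = {True: {True: 0.2, False: 0.8}, False: {True: 0.1, False: 0.9}}
pr_h_given_b = {True: {True: 0.9, False: 0.1}, False: {True: 0.1, False: 0.9}}

def joint(h, b, a, c, e):
    """Pr(H,B,A,C,E) = Pr(H|B) Pr(B|A) Pr(A|C) Pr(C|E) Pr(E)."""
    return (pr_h_given_b[b][h] * pr_b_given_a[a][b] *
            pr_a_given_c[c][a] * pr_c_given_e[e][c] * pr_e[e])

# Sanity check: the product of the five CPTs defines a proper joint (sums to 1).
assert abs(sum(joint(*v) for v in itertools.product((True, False), repeat=5)) - 1.0) < 1e-9
```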

Inference is Easy
E → C → A → B → H
Want to know P(a)? Use the summing out rule:
P(a) = Σ_{ci ∈ Dom(C)} Pr(a | ci) Pr(ci)
     = Σ_{ci ∈ Dom(C)} Pr(a | ci) Σ_{ei ∈ Dom(E)} Pr(ci | ei) Pr(ei)
These are all terms specified in our local distributions!

Inference is Easy
E → C → A → B → H
Computing P(a) in more concrete terms:
P(c) = P(c | e)P(e) + P(c | ~e)P(~e) = 0.9 * 0.7 + 0.5 * 0.3 = 0.78
P(~c) = P(~c | e)P(e) + P(~c | ~e)P(~e) = 0.22 (= 1 − P(c), as well)
P(a) = P(a | c)P(c) + P(a | ~c)P(~c) = 0.3 * 0.78 + 1.0 * 0.22 = 0.454
P(~a) = 1 − P(a) = 0.546
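Using the same hypothetical dict encoding as the sketch above (redefining just the pieces it needs so it runs on its own), summing out E and then C reproduces these numbers.

```python
# Summing out E and then C to obtain P(c) and P(a) in the chain E->C->A->B->H.
pr_e = {True: 0.7, False: 0.3}
pr_c_given_e = {True: {True: 0.9, False: 0.1}, False: {True: 0.5, False: 0.5}}
pr_a_given_c = {True: {True: 0.3, False: 0.7}, False: {True: 1.0, False: 0.0}}

# P(c) = sum over e of P(c|e) P(e)
pr_c = {c: sum(pr_c_given_e[e][c] * pr_e[e] for e in (True, False))
        for c in (True, False)}
# P(a) = sum over c of P(a|c) P(c)
pr_a = {a: sum(pr_a_given_c[c][a] * pr_c[c] for c in (True, False))
        for a in (True, False)}

print(pr_c[True])   # 0.78
print(pr_a[True])   # 0.454
```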

Bayesian Networks
The structure above is a Bayesian network. A BN is a graphical representation of the direct dependencies over a set of variables, together with a set of conditional probability tables quantifying the strength of those influences.
Bayes nets generalize the above ideas in very interesting ways, leading to effective means of representation and inference under uncertainty.