Lecture 8. Probabilistic Reasoning CS 486/686 May 25, 2006


Outline
- Review of probabilistic inference, independence, and conditional independence
- Bayesian networks: what they are, what they mean, and how we create them

Probabilistic Inference
By probabilistic inference we mean: given a prior distribution Pr over variables of interest, representing degrees of belief, and given new evidence E=e for some variable E, revise your degrees of belief to obtain the posterior Pr_e. How do your degrees of belief change as a result of learning E=e (or, more generally, E=e for a set of variables E)?

Conditioning
We define Pr_e(α) = Pr(α | e). That is, we produce Pr_e by conditioning the prior distribution on the observed evidence e.

Semantics of Conditioning
[Figure: possible worlds with prior probabilities p1, p2 (worlds where E=e) and p3, p4 (worlds where E≠e). After conditioning on E=e, the inconsistent worlds get probability 0 and the consistent worlds are rescaled to αp1, αp2, where α = 1/(p1+p2) is the normalizing constant.]
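As a rough illustration of this semantics (not from the slides), the sketch below conditions a small joint distribution on evidence by zeroing out inconsistent worlds and renormalizing; the variable values are made up for the example.

```python
# Minimal sketch of conditioning: keep worlds consistent with the evidence,
# then rescale so the remaining probabilities sum to 1.
prior = {                      # hypothetical joint over (E, X)
    ("e", "x1"): 0.3,          # p1
    ("e", "x2"): 0.2,          # p2
    ("~e", "x1"): 0.4,         # p3
    ("~e", "x2"): 0.1,         # p4
}

def condition(joint, var_index, value):
    kept = {w: p for w, p in joint.items() if w[var_index] == value}
    alpha = 1.0 / sum(kept.values())            # normalizing constant
    return {w: alpha * p for w, p in kept.items()}

posterior = condition(prior, 0, "e")            # Pr_e
print(posterior)   # {('e', 'x1'): 0.6, ('e', 'x2'): 0.4}, i.e. alpha = 1/(p1+p2) = 2
```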

Inference: Computational Bottleneck
Semantically and conceptually the picture is clear, but several issues must be addressed.

Issue 1
How do we specify the full joint distribution over a set of random variables X1, X2, ..., Xn?
- There is an exponential number of possible worlds: e.g., if the Xi are boolean, we need 2^n numbers (or 2^n - 1 parameters/degrees of freedom, since they sum to 1).
- These numbers are not robust/stable.
- These numbers are not natural to assess (what is the probability that Pascal wants a cup of tea, it's not raining or snowing in Montreal, and the robot's charge level is low?).

Issue 2
Inference in this representation is frightfully slow: we must sum over an exponential number of worlds to answer a query Pr(α) or to condition on evidence e to determine Pr_e(α).

Small Example: 3 Variables
                 sunny              ~sunny
            cold     ~cold      cold     ~cold
headache    0.108    0.012      0.072    0.008
~headache   0.016    0.064      0.144    0.576

P(headache) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
P(headache ^ cold | sunny) = P(headache ^ cold ^ sunny) / P(sunny)
                           = 0.108 / (0.108 + 0.012 + 0.016 + 0.064) = 0.54
P(headache ^ cold | ~sunny) = P(headache ^ cold ^ ~sunny) / P(~sunny)
                            = 0.072 / (0.072 + 0.008 + 0.144 + 0.576) = 0.09
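To make the bookkeeping concrete, here is a small sketch (not part of the original slides) that stores this joint as a lookup table and recovers the same numbers by summing out and conditioning.

```python
# Joint distribution over (sunny, cold, headache) from the slide above.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def prob(event):
    """Sum the joint over all worlds satisfying the predicate `event`."""
    return sum(p for world, p in joint.items() if event(*world))

p_headache = prob(lambda s, c, h: h)                                                   # 0.2
p_hc_given_sunny = prob(lambda s, c, h: h and c and s) / prob(lambda s, c, h: s)       # 0.54
p_hc_given_not_sunny = prob(lambda s, c, h: h and c and not s) / prob(lambda s, c, h: not s)  # 0.09
print(p_headache, p_hc_given_sunny, p_hc_given_not_sunny)
```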

Is there anything we can do?
How do we avoid these two problems? There is no solution in general, but in practice there is structure we can exploit. We'll use conditional independence.

Independence
Recall that x and y are independent iff:
  Pr(x) = Pr(x | y)  iff  Pr(y) = Pr(y | x)  iff  Pr(x,y) = Pr(x)Pr(y)
Intuitively, learning y doesn't influence your beliefs about x.
x and y are conditionally independent given z iff:
  Pr(x | z) = Pr(x | y,z)  iff  Pr(y | z) = Pr(y | x,z)  iff  Pr(x,y | z) = Pr(x | z)Pr(y | z)
Intuitively, learning y doesn't influence your beliefs about x if you already know z. E.g., learning someone's mark on the 486 exam can influence the probability you assign to a specific GPA; but if you already knew the final 486 grade, learning the exam mark would not influence your GPA assessment.

Variable Independence
Two variables X and Y are conditionally independent given variable Z iff x and y are conditionally independent given z for all x ∈ Dom(X), y ∈ Dom(Y), z ∈ Dom(Z). This also applies to sets of variables X, Y, Z, and to the unconditional case (X, Y independent).
If you know the value of Z (whatever it is), nothing you learn about Y will influence your beliefs about X. Note that these definitions differ from the earlier ones, which talk about events, not variables.

What good is independence?
Suppose (say, boolean) variables X1, X2, ..., Xn are mutually independent. We can specify the full joint distribution using only n parameters (linear) instead of 2^n - 1 (exponential).
How? Simply specify Pr(x1), ..., Pr(xn). From these we can recover the probability of any world or any (conjunctive) query easily. Recall that P(x,y) = P(x)P(y), P(x | y) = P(x), and P(y | x) = P(y).

Example
Four independent boolean random variables X1, X2, X3, X4 with P(x1) = 0.4, P(x2) = 0.2, P(x3) = 0.5, P(x4) = 0.8.
P(x1,~x2,x3,x4) = P(x1)(1-P(x2))P(x3)P(x4) = (0.4)(0.8)(0.5)(0.8) = 0.128
P(x1,x2,x3 | x4) = P(x1)P(x2)P(x3) = (0.4)(0.2)(0.5) = 0.04
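A quick sketch (not from the slides) of how any such query reduces to multiplying marginals under full independence; the numbers match the example above.

```python
# Marginals of four mutually independent boolean variables.
p = {"x1": 0.4, "x2": 0.2, "x3": 0.5, "x4": 0.8}

def prob_of(assignment):
    """Probability of a conjunctive query, e.g. {'x1': True, 'x2': False}.
    Variables not mentioned are summed out, which under independence just
    means leaving their factor out of the product."""
    result = 1.0
    for var, value in assignment.items():
        result *= p[var] if value else 1.0 - p[var]
    return result

print(prob_of({"x1": True, "x2": False, "x3": True, "x4": True}))  # 0.128
# Conditioning on x4 changes nothing, since x1, x2, x3 are independent of x4:
print(prob_of({"x1": True, "x2": True, "x3": True}))               # 0.04
```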

The Value of Independence
Complete independence reduces both representation of the joint and inference from O(2^n) to O(n). Unfortunately, such complete mutual independence is very rare; most realistic domains do not exhibit this property. Fortunately, most domains do exhibit a fair amount of conditional independence, and we can exploit conditional independence for representation and inference as well. Bayesian networks do just this.

An Aside on Notation
Pr(X) for a variable X (or set of variables) refers to the (marginal) distribution over X. Pr(X | Y) refers to the family of conditional distributions over X, one for each y ∈ Dom(Y).
Distinguish between Pr(X) -- which is a distribution -- and Pr(x) or Pr(~x) (or Pr(xi) for non-boolean variables) -- which are numbers. Think of Pr(X) as a function that accepts any xi ∈ Dom(X) as an argument and returns Pr(xi).
Think of Pr(X | Y) as a function that accepts any xi and yk and returns Pr(xi | yk). Note that Pr(X | Y) is not a single distribution; rather, it denotes the family of distributions (over X) induced by the different yk ∈ Dom(Y).
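One way to make this distinction concrete in code (my own illustration, not from the lecture): a distribution Pr(X) is a map from values to numbers, and Pr(X | Y) is a map from values of Y to such distributions. The variable names here are invented for the example.

```python
# Pr(Weather): a single distribution, i.e. one number per value of Weather.
pr_weather = {"sun": 0.7, "rain": 0.3}

# Pr(Traffic | Weather): a *family* of distributions over Traffic,
# one for each value of Weather.
pr_traffic_given_weather = {
    "sun":  {"light": 0.8, "heavy": 0.2},
    "rain": {"light": 0.3, "heavy": 0.7},
}

print(pr_weather["rain"])                          # Pr(rain), a number
print(pr_traffic_given_weather["rain"]["heavy"])   # Pr(heavy | rain), a number
print(pr_traffic_given_weather["rain"])            # one member of the family Pr(Traffic | Weather)
```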

Exploiting Conditional Independence
Consider a story: if Pascal woke up too early (E), he probably needs coffee (C); if Pascal needs coffee, he's likely grumpy (G); if he is grumpy, then it's possible that the lecture won't go smoothly (L); if the lecture does not go smoothly, then the students will likely be sad (S).
E -> C -> G -> L -> S
E: Pascal woke too early; C: Pascal needs coffee; G: Pascal is grumpy; L: the lecture did not go smoothly; S: students are sad.

Conditional Independence
If you learned any of E, C, G, or L, your assessment of Pr(S) would change: if any of these were seen to be true, you would increase Pr(s) and decrease Pr(~s). So S is not independent of E, or of C, or of G, or of L.
But if you knew the value of L (true or false), learning the value of E, C, or G would not influence Pr(S). The influence these factors have on S is mediated by their influence on L: students aren't sad because Pascal was grumpy, they are sad because of the lecture. So S is independent of E, C, and G given L.

Conditional Independence
So S is independent of E, C, and G given L. Similarly, L is independent of E and C given G, and G is independent of E given C. This means that:
Pr(S | L, {G,C,E}) = Pr(S | L)
Pr(L | G, {C,E}) = Pr(L | G)
Pr(G | C, {E}) = Pr(G | C)
Pr(C | E) and Pr(E) don't simplify.

Conditional Independence
By the chain rule (for any instantiation of S, L, G, C, E):
Pr(S,L,G,C,E) = Pr(S | L,G,C,E) Pr(L | G,C,E) Pr(G | C,E) Pr(C | E) Pr(E)
By our independence assumptions:
Pr(S,L,G,C,E) = Pr(S | L) Pr(L | G) Pr(G | C) Pr(C | E) Pr(E)
So we can specify the full joint by specifying five local conditional distributions: Pr(S | L), Pr(L | G), Pr(G | C), Pr(C | E), and Pr(E).

Example Quantification
Pr(e) = 0.7, Pr(~e) = 0.3
Pr(c | e) = 0.9, Pr(~c | e) = 0.1;  Pr(c | ~e) = 0.5, Pr(~c | ~e) = 0.5
Pr(g | c) = 0.7, Pr(~g | c) = 0.3;  Pr(g | ~c) = 0.0, Pr(~g | ~c) = 1.0
Pr(l | g) = 0.2, Pr(~l | g) = 0.8;  Pr(l | ~g) = 0.1, Pr(~l | ~g) = 0.9
Pr(s | l) = 0.9, Pr(~s | l) = 0.1;  Pr(s | ~l) = 0.1, Pr(~s | ~l) = 0.9
Specifying the joint requires only 9 parameters (noting that half of these entries are 1 minus the others), instead of 31 for the explicit representation: linear in the number of variables instead of exponential. This is linear in general whenever the dependence structure forms a chain.
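The sketch below (my own, not from the slides) writes these CPTs as Python dictionaries and computes any full joint entry as the product Pr(s|l) Pr(l|g) Pr(g|c) Pr(c|e) Pr(e); the numbers are taken from the table above.

```python
from itertools import product

# CPTs for the chain E -> C -> G -> L -> S, stored as P(child=True | parent value).
p_e = 0.7
p_c = {True: 0.9, False: 0.5}   # Pr(c | e), Pr(c | ~e)
p_g = {True: 0.7, False: 0.0}   # Pr(g | c), Pr(g | ~c)
p_l = {True: 0.2, False: 0.1}   # Pr(l | g), Pr(l | ~g)
p_s = {True: 0.9, False: 0.1}   # Pr(s | l), Pr(s | ~l)

def bernoulli(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(e, c, g, l, s):
    """Pr(S,L,G,C,E) = Pr(S|L) Pr(L|G) Pr(G|C) Pr(C|E) Pr(E)."""
    return (bernoulli(p_s[l], s) * bernoulli(p_l[g], l) *
            bernoulli(p_g[c], g) * bernoulli(p_c[e], c) * bernoulli(p_e, e))

# Sanity check: the 32 joint entries sum to 1.
print(sum(joint(*world) for world in product([True, False], repeat=5)))  # 1.0 (up to rounding)
```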

Inference is Easy
Want to know P(g)? Use the summing-out rule:
P(g) = Σ_{ci ∈ Dom(C)} Pr(g | ci) Pr(ci)
     = Σ_{ci ∈ Dom(C)} Σ_{ej ∈ Dom(E)} Pr(g | ci) Pr(ci | ej) Pr(ej)
These are all terms specified in our local distributions!

Inference is Easy
Computing P(g) in more concrete terms:
P(c) = P(c | e)P(e) + P(c | ~e)P(~e) = 0.9 * 0.7 + 0.5 * 0.3 = 0.78
P(~c) = P(~c | e)P(e) + P(~c | ~e)P(~e) = 0.22 (equivalently, P(~c) = 1 - P(c))
P(g) = P(g | c)P(c) + P(g | ~c)P(~c) = 0.7 * 0.78 + 0.0 * 0.22 = 0.546
P(~g) = 1 - P(g) = 0.454
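A self-contained sketch of the same calculation (again my own illustration); only the CPT entries actually needed for P(g) appear here.

```python
# Summing out E to get P(c), then summing out C to get P(g).
p_e = 0.7                                 # Pr(e)
p_c_given_e = {True: 0.9, False: 0.5}     # Pr(c | e), Pr(c | ~e)
p_g_given_c = {True: 0.7, False: 0.0}     # Pr(g | c), Pr(g | ~c)

p_c = p_c_given_e[True] * p_e + p_c_given_e[False] * (1 - p_e)
p_g = p_g_given_c[True] * p_c + p_g_given_c[False] * (1 - p_c)

print(p_c)   # 0.78
print(p_g)   # 0.546
```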

Bayesian Networks
The structure above is a Bayesian network: a graphical representation of the direct dependencies over a set of variables, plus a set of conditional probability tables (CPTs) quantifying the strength of those influences. Bayes nets generalize the above ideas in very interesting ways, leading to effective means of representation and inference under uncertainty.

Bayesian Networks (aka belief networks, probabilistic networks)
A BN over variables {X1, X2, ..., Xn} consists of:
- a DAG whose nodes are the variables
- a set of CPTs, Pr(Xi | Parents(Xi)), one for each Xi
Example: A and B are parents of C. The CPTs are P(a), P(~a); P(b), P(~b); and P(c | a,b), P(~c | a,b), P(c | ~a,b), P(~c | ~a,b), P(c | a,~b), P(~c | a,~b), P(c | ~a,~b), P(~c | ~a,~b).

Key Notions
- Parents of a node: Par(Xi)
- Children of a node
- Descendants of a node
- Ancestors of a node
- Family: the set of nodes consisting of Xi and its parents; CPTs are defined over families in the BN
Example (nodes A, B, C, D): Parents(C) = {A,B}; Children(A) = {C}; Descendants(B) = {C,D}; Ancestors(D) = {A,B,C}; Family(C) = {C,A,B}.

An Example Bayes Net
(A network over 11 variables; a few CPTs are shown in the figure.) The explicit joint requires 2^11 - 1 = 2047 parameters; the BN requires only 27 parameters (the number of entries for each CPT is listed in the figure).

Semantics of a Bayes Net
The structure of the BN means that every Xi is conditionally independent of all of its nondescendants given its parents:
Pr(Xi | S, Par(Xi)) = Pr(Xi | Par(Xi)) for any subset S ⊆ NonDescendants(Xi)

Semantics of Bayes Nets
If we ask for P(x1, x2, ..., xn), then, assuming an ordering consistent with the network, the chain rule gives:
P(x1, x2, ..., xn) = P(xn | xn-1, ..., x1) P(xn-1 | xn-2, ..., x1) ... P(x1)
                   = P(xn | Par(xn)) P(xn-1 | Par(xn-1)) ... P(x1)
Thus the joint is recoverable using the parameters (CPTs) specified in an arbitrary BN.
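A generic version of this factorization (a sketch under my own naming, not code from the course): a BN is a set of nodes, each with a parent list and a CPT, and the joint probability of any full assignment is the product of the local conditional probabilities. The example network reuses the E -> C -> G -> L -> S chain.

```python
# A tiny generic Bayes net: each variable has a list of parents and a CPT
# mapping (tuple of parent values) -> P(variable = True | parents).
network = {
    "E": {"parents": [],    "cpt": {(): 0.7}},
    "C": {"parents": ["E"], "cpt": {(True,): 0.9, (False,): 0.5}},
    "G": {"parents": ["C"], "cpt": {(True,): 0.7, (False,): 0.0}},
    "L": {"parents": ["G"], "cpt": {(True,): 0.2, (False,): 0.1}},
    "S": {"parents": ["L"], "cpt": {(True,): 0.9, (False,): 0.1}},
}

def joint_probability(net, assignment):
    """P(x1,...,xn) = product over all variables of P(xi | Par(xi))."""
    prob = 1.0
    for var, spec in net.items():
        parent_values = tuple(assignment[p] for p in spec["parents"])
        p_true = spec["cpt"][parent_values]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob

# P(e, c, g, ~l, ~s) = 0.7 * 0.9 * 0.7 * 0.8 * 0.9 = 0.31752
print(joint_probability(network, {"E": True, "C": True, "G": True, "L": False, "S": False}))
```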

Constructing a Bayes Net
Given any distribution over variables X1, X2, ..., Xn, we can construct a Bayes net that faithfully represents that distribution. Take any ordering of the variables (say, the order given), and go through the following procedure from Xn down to X1. Let Par(Xn) be any subset S ⊆ {X1, ..., Xn-1} such that Xn is independent of {X1, ..., Xn-1} - S given S; such a subset must exist (convince yourself). Then determine the parents of Xn-1 in the same way, finding a similar S ⊆ {X1, ..., Xn-2}, and so on. In the end a DAG is produced, and the BN semantics must hold by construction.

Causal Intuitions
The construction of a BN is simple and works with arbitrary orderings of the variable set, but some orderings are much better than others! Generally, if the ordering/dependence structure reflects causal intuitions, a more natural and compact BN results. In this BN, we've used the ordering Mal, Cold, Flu, Aches to build the BN for distribution P; a variable can only have parents that come earlier in the ordering.

Causal Intuitions
Suppose we build the BN for distribution P using the opposite ordering, i.e., Aches, Cold, Flu, Malaria. The resulting network is more complicated!
- Mal depends on Aches, but it also depends on Cold and Flu given Aches (Cold and Flu explain away Mal given Aches).
- Flu depends on Aches, but also on Cold given Aches.
- Cold depends on Aches.

Compactness
The first (causal) ordering needs 1+1+1+8 = 11 numbers; the second needs 1+2+4+8 = 15 numbers. In general, if each random variable is directly influenced by at most k others, then each CPT has size at most 2^k, so the entire network of n variables is specified by at most n·2^k numbers.
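The counts above fall out of the parent sets; here is a quick sketch, with the two network structures assumed from the orderings described on the previous slides rather than copied from a figure.

```python
# Number of CPT entries needed for a boolean variable with p boolean parents is 2^p
# (2^(p+1) raw entries, half of which are 1 minus the other half).
def cpt_sizes(parent_sets):
    return {var: 2 ** len(parents) for var, parents in parent_sets.items()}

causal_order  = {"Mal": [], "Cold": [], "Flu": [], "Aches": ["Mal", "Cold", "Flu"]}
reverse_order = {"Aches": [], "Cold": ["Aches"], "Flu": ["Aches", "Cold"],
                 "Mal": ["Aches", "Cold", "Flu"]}

print(sum(cpt_sizes(causal_order).values()))   # 1 + 1 + 1 + 8 = 11
print(sum(cpt_sizes(reverse_order).values()))  # 1 + 2 + 4 + 8 = 15
```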

Testing Independence
Given a BN, how do we determine whether two variables X and Y are independent (given evidence E)? We use a (simple) graphical property.
D-separation: a set of variables E d-separates X and Y if it blocks every undirected path in the BN between X and Y. X and Y are conditionally independent given evidence E if E d-separates X and Y.
Thus the BN gives us an easy way to tell whether two variables are independent (set E = ∅) or conditionally independent.

Blocking in D-Separation
Let P be an undirected path from X to Y in a BN, and let E be an evidence set. We say E blocks path P iff there is some node Z on the path such that:
Case 1: one arc on P goes into Z and one goes out of Z, and Z ∈ E; or
Case 2: both arcs on P leave Z, and Z ∈ E; or
Case 3: both arcs on P enter Z, and neither Z nor any of its descendants is in E.
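These three cases translate almost directly into code. The following sketch (my own, brute-force over all simple undirected paths, fine for small networks) checks whether an evidence set d-separates two variables; a network is given as a dict mapping each node to its list of parents.

```python
def children_map(bn):
    kids = {n: [] for n in bn}
    for node, parents in bn.items():
        for p in parents:
            kids[p].append(node)
    return kids

def descendants(bn, node):
    """All descendants of `node` in the DAG."""
    kids, seen, stack = children_map(bn), set(), [node]
    while stack:
        for c in kids[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def undirected_paths(bn, x, y):
    """All simple paths from x to y, ignoring edge direction."""
    nbrs = {n: set(bn[n]) for n in bn}
    for node, kids in children_map(bn).items():
        nbrs[node].update(kids)
    paths = []
    def walk(path):
        if path[-1] == y:
            paths.append(path)
            return
        for nxt in nbrs[path[-1]]:
            if nxt not in path:
                walk(path + [nxt])
    walk([x])
    return paths

def blocked(bn, path, evidence):
    """Does some interior node Z on the path satisfy one of the three blocking cases?"""
    for prev, z, nxt in zip(path, path[1:], path[2:]):
        head_to_head = prev in bn[z] and nxt in bn[z]          # both arcs enter Z
        if head_to_head:
            if z not in evidence and not (descendants(bn, z) & evidence):
                return True                                    # Case 3
        elif z in evidence:
            return True                                        # Cases 1 and 2 (chain or fork)
    return False

def d_separated(bn, x, y, evidence=()):
    evidence = set(evidence)
    return all(blocked(bn, p, evidence) for p in undirected_paths(bn, x, y))
```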

Blocking: Graphical View
[Figure: the three blocking cases shown graphically.]

D-Separation: Intuitions
1. Subway and Thermometer?
2. Aches and Fever?
3. Aches and Thermometer?
4. Flu and Malaria?
5. Subway and ExoticTrip?

D-Separation: Intuitions
1. Subway and Thermometer are dependent, but are independent given Flu (since Flu blocks the only path).
2. Aches and Fever are dependent, but are independent given Flu (since Flu blocks the only path).
3. Similarly, Aches and Thermometer are dependent, but independent given Flu.
4. Flu and Malaria are independent (given no evidence): Fever blocks the path, since it is not in evidence, nor is its descendant Thermometer. Flu and Malaria are dependent given Fever (or given Thermometer): nothing blocks the path now.
5. Subway and ExoticTrip are independent; they are dependent given Thermometer; and they are independent given Thermometer and Malaria, for exactly the same reasons as Flu/Malaria above.
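The transcript does not include the network diagram, but the intuitions above suggest a plausible structure (Subway -> Flu, ExoticTrip -> Malaria, Flu -> Aches, Flu -> Fever, Malaria -> Fever, Fever -> Thermometer). Treating that reconstruction as an assumption, the d-separation checker sketched earlier reproduces these answers:

```python
# Assumed structure, written as node -> list of parents (reconstructed from the
# intuitions above, not copied from the original figure).
bn = {
    "Subway": [], "ExoticTrip": [],
    "Flu": ["Subway"], "Malaria": ["ExoticTrip"],
    "Aches": ["Flu"], "Fever": ["Flu", "Malaria"],
    "Therm": ["Fever"],
}

print(d_separated(bn, "Subway", "Therm"))                             # False: dependent
print(d_separated(bn, "Subway", "Therm", {"Flu"}))                    # True: independent given Flu
print(d_separated(bn, "Flu", "Malaria"))                              # True: Fever blocks the path
print(d_separated(bn, "Flu", "Malaria", {"Fever"}))                   # False: explaining away
print(d_separated(bn, "Subway", "ExoticTrip", {"Therm"}))             # False: dependent given Therm
print(d_separated(bn, "Subway", "ExoticTrip", {"Therm", "Malaria"}))  # True: independent again
```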

Next class
Inference with Bayesian networks (Section 14.4).