Introduction to Bayes Nets. CS 486/686: Introduction to Artificial Intelligence Fall 2013

Introduction: Review of probabilistic inference, independence and conditional independence. Bayesian Networks - what they are and what they mean. 2

Example: Joint Distribution

               sunny              ~sunny
            cold    ~cold      cold    ~cold
headache    0.108   0.012      0.072   0.008
~headache   0.016   0.064      0.144   0.576

P(headache^sunny^cold) = 0.108
P(~headache^sunny^~cold) = 0.064
P(headache V sunny) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(headache) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2 (marginalization) 3

Example: Joint Distribution (same table as above)
P(headache ^ cold | sunny) = P(headache ^ cold ^ sunny)/P(sunny) = 0.108/(0.108+0.012+0.016+0.064) = 0.54
P(headache ^ cold | ~sunny) = P(headache ^ cold ^ ~sunny)/P(~sunny) = 0.072/(0.072+0.008+0.144+0.576) = 0.09 4
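
To make the arithmetic on the two slides above concrete, here is a minimal Python sketch (not from the course) that stores the joint as a dictionary keyed by (headache, sunny, cold) and recovers the marginal and conditional values by summing entries:

    # Joint distribution over (headache, sunny, cold):
    # key = (headache?, sunny?, cold?), value = probability of that world.
    joint = {
        (True,  True,  True):  0.108, (True,  True,  False): 0.012,
        (False, True,  True):  0.016, (False, True,  False): 0.064,
        (True,  False, True):  0.072, (True,  False, False): 0.008,
        (False, False, True):  0.144, (False, False, False): 0.576,
    }

    # Marginalization: P(headache) = sum over all worlds where headache holds.
    p_headache = sum(p for (h, s, c), p in joint.items() if h)          # 0.2

    # P(headache V sunny): sum over worlds where either holds.
    p_h_or_s = sum(p for (h, s, c), p in joint.items() if h or s)       # 0.28

    # Conditioning: P(headache ^ cold | sunny) = P(headache ^ cold ^ sunny) / P(sunny).
    p_sunny = sum(p for (h, s, c), p in joint.items() if s)             # 0.2
    p_h_c_given_s = joint[(True, True, True)] / p_sunny                 # 0.54

    print(p_headache, p_h_or_s, p_h_c_given_s)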

Bayes Rule Note: - P(A|B)P(B) = P(A^B) = P(B^A) = P(B|A)P(A) Bayes Rule: - P(B|A) = [P(A|B)P(B)]/P(A) Memorize this! 5

General Forms of Bayes Rule - P(b|a) = P(a|b)P(b) / Σ_b' P(a|b')P(b') - With background evidence e: P(b|a,e) = P(a|b,e)P(b|e) / P(a|e) 6

Using Bayes Rule for Inference Often we want to form a hypothesis about the world based on what we have observed. Bayes rule is vitally important when viewed as stating the belief given to hypothesis H given evidence e: P(H|e) = P(e|H)P(H)/P(e), where P(e|H) is the likelihood, P(H) the prior probability, P(H|e) the posterior probability, and P(e) the normalizing constant 7
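
As a small worked illustration of the rule (the numbers below are hypothetical, not from the slides): with a prior P(flu) = 0.1 and likelihoods P(fever|flu) = 0.9 and P(fever|~flu) = 0.2, the posterior works out to 1/3:

    # Hypothetical numbers, for illustration only.
    p_flu = 0.1
    p_fever_given_flu = 0.9
    p_fever_given_not_flu = 0.2

    # Normalizing constant P(fever), then the posterior P(flu | fever).
    p_fever = p_fever_given_flu * p_flu + p_fever_given_not_flu * (1 - p_flu)
    p_flu_given_fever = p_fever_given_flu * p_flu / p_fever
    print(p_flu_given_fever)    # 0.333... = 1/3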

Conditioning We define P_e(x) = P(x|e) - Produce P_e by conditioning the prior distribution on the observed evidence Semantically, we take the original measure μ and - set μ(w) = 0 for any world w where e is false - set μ(w) = μ(w)/P(e) for any e-world (normalization) 8

Semantics of Conditioning [Figure: worlds with probabilities p1, p2 (where E=e holds) and p3, p4 (where it does not); after observing E=e, the e-worlds are rescaled to αp1 and αp2, where α = 1/(p1+p2) is the normalizing constant, and the other worlds get probability 0] 9

Inference Semantically/conceptually, the picture is clear But several issues must be addressed 10

Issue 1 How do we specify the full joint distribution over a set of random variables X1, X2,..., Xn? - What are the difficulties? 11

Issue 2 Inference in this representation is very slow 12

Independence Two variables A and B are independent if knowledge of A does not change uncertainty of B (and vice versa) - P(A|B) = P(A) - P(B|A) = P(B) - P(A^B) = P(A)P(B) - In general: P(X1,X2,...,Xn) = Π_i P(Xi) 13

Variable Independence Two variables X and Y are conditionally independent given variable Z iff x, y are conditionally independent given z for all x in Dom(X), y in Dom(Y) and z in Dom(Z) - Also applies to sets of variables X, Y, Z If you know the value of Z (whatever it is) nothing you learn about Y will influence your beliefs about X 14

What good is independence? Suppose (boolean) random variables X1,X2,...,Xn are mutually independent - Specify the full joint using only n parameters instead of 2^n - 1 How? Specify P(x1), P(x2),..., P(xn) - Can now recover the probability for any query - P(x,y) = P(x)P(y) and P(x|y) = P(x) and P(y|x) = P(y) 15
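
A quick sketch of this recovery for n = 3 boolean variables (the three parameter values are made up for illustration):

    # With mutual independence, n numbers determine the whole 2^n-entry joint.
    from itertools import product

    p = [0.7, 0.4, 0.9]                       # hypothetical P(x1), P(x2), P(x3)

    def joint_indep(values):
        # P(v1, v2, v3) = product over i of P(xi) or (1 - P(xi))
        result = 1.0
        for pi, v in zip(p, values):
            result *= pi if v else 1.0 - pi
        return result

    # Every entry of the joint is recoverable, and the entries sum to 1.
    print(joint_indep((True, False, True)))                                   # 0.7 * 0.6 * 0.9
    print(sum(joint_indep(vs) for vs in product([True, False], repeat=3)))    # 1.0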

Value of Independence Complete independence reduces both representation of the joint and inference from O(2^n) to O(n)! Unfortunately, we rarely have complete mutual independence 16

Conditional Independence Full independence is often too strong a requirement Two variables A and B are conditionally independent given C if - P(a|b,c) = P(a|c) for all a, b, c - i.e., knowing the value of B does not change the prediction of A if the value of C is known 17

Conditional Independence Diagnosis problem - Fl=Flu, Fv=Fever, C=Cough The full joint distribution has 2^3 - 1 = 7 independent entries If someone has the flu, we can assume that the probability of a cough does not depend on having a fever (P(C|Fl,Fv) = P(C|Fl)) If the same condition holds when the patient does not have the flu, then C and Fv are conditionally independent given Fl (P(C|~Fl,Fv) = P(C|~Fl)) 18

Conditional Independence The full distribution can then be written as P(C,Fl,Fv) = P(C,Fv|Fl)P(Fl) = P(C|Fl)P(Fv|Fl)P(Fl) We only need 5 numbers! Huge savings if there are lots of variables 19
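
A short sketch of this factorization with five hypothetical numbers (the values are not from the slides); the reconstructed joint sums to 1 as it should:

    # Naive Bayes over (C, Fl, Fv) from five numbers.  The values are made up.
    p_fl = 0.1                                   # P(Fl)
    p_c_given_fl  = {True: 0.8, False: 0.3}      # P(C | Fl),  P(C | ~Fl)
    p_fv_given_fl = {True: 0.9, False: 0.05}     # P(Fv | Fl), P(Fv | ~Fl)

    def joint_nb(c, fl, fv):
        # P(C, Fl, Fv) = P(C | Fl) * P(Fv | Fl) * P(Fl)
        pc  = p_c_given_fl[fl]  if c  else 1.0 - p_c_given_fl[fl]
        pfv = p_fv_given_fl[fl] if fv else 1.0 - p_fv_given_fl[fl]
        pfl = p_fl if fl else 1.0 - p_fl
        return pc * pfv * pfl

    from itertools import product
    print(sum(joint_nb(*vals) for vals in product([True, False], repeat=3)))   # 1.0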

Conditional Independence Such a probability distribution is sometimes called a Naive Bayes model In practice they work well - even when the independence assumption is not true 20

Value of Independence Fortunately, most domains do exhibit a fair amount of conditional independence - Exploit conditional independence for both representation and inference Bayesian networks do just this 21

Notation P(X) for variable X (or set of variables) refers to the (marginal) distribution over X P(X|Y) is the family of conditional distributions over X (one for each y in Dom(Y)) Distinguish between P(X) (a distribution) and P(x) (a number) - Think of P(X) as a function that accepts any xi in Dom(X) and returns a number 22

Notation Think of P(X|Y) as a function that accepts any xi and yk and returns P(xi|yk) Note (again) that P(X|Y) is not a single distribution 23
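
One possible way to mirror this notation in code (an illustrative sketch, not a prescribed representation): P(X) becomes a dictionary over Dom(X), and P(X|Y) a dictionary of such dictionaries, one per value of Y. The domain values and numbers here are made up:

    # P(X): a dictionary over Dom(X).
    P_X = {"x1": 0.2, "x2": 0.5, "x3": 0.3}

    # P(X | Y): one distribution over Dom(X) for each y in Dom(Y).
    P_X_given_Y = {
        "y1": {"x1": 0.7, "x2": 0.2, "x3": 0.1},
        "y2": {"x1": 0.1, "x2": 0.4, "x3": 0.5},
    }

    print(P_X["x2"])                 # the number P(x2)
    print(P_X_given_Y["y1"]["x3"])   # the number P(x3 | y1)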

Exploiting Conditional Independence Consider the following story - If Kate woke up too early (E), she probably needs coffee (C); if Kate needs coffee (C), she is likely to be grumpy (G). If she is grumpy, it's possible that the lecture won't go smoothly (L). If the lecture does not go smoothly, then the students will likely be sad (S). [Chain: E -> C -> G -> L -> S] E: Kate woke too early, C: Kate needs coffee, G: Kate is grumpy, L: the lecture did not go smoothly, S: the students are sad 24

Conditional Independence [E -> C -> G -> L -> S] If you learned any of E, C, G, or L, then your assessment of P(S) would change - if any of these were seen to be true, you would increase P(s) and decrease P(~s) So S is not independent of E, C, G, or L 25

Conditional Independence [E -> C -> G -> L -> S] But if you knew the value of L (true or false), then learning the values of E, C, or G would not influence P(S) - the students are not sad because Kate did not have a coffee, they are sad because of the lecture So S is independent of E, C, and G, given L 26

Conditional Independence [E -> C -> G -> L -> S] S is independent of E, C, and G given L Similarly - L is independent of E and C, given G - G is independent of E given C This means that - P(S | L, {G,C,E}) = P(S | L) - P(L | G, {C,E}) = P(L | G) - P(G | C, {E}) = P(G | C) - P(C | E) = P(C | E) - P(E) = P(E) 27

Conditional Independence [E -> C -> G -> L -> S] By the chain rule - P(S,L,G,C,E) = P(S|L,G,C,E) P(L|G,C,E) P(G|C,E) P(C|E) P(E) By our independence assumptions - P(S,L,G,C,E) = P(S|L) P(L|G) P(G|C) P(C|E) P(E) We can specify the full joint by specifying five conditional distributions: P(S|L), P(L|G), P(G|C), P(C|E) and P(E) 28

Example Quantification [E -> C -> G -> L -> S]
Pr(e) = 0.7, Pr(~e) = 0.3
Pr(c|e) = 0.9, Pr(~c|e) = 0.1, Pr(c|~e) = 0.5, Pr(~c|~e) = 0.5
Pr(g|c) = 0.3, Pr(~g|c) = 0.7, Pr(g|~c) = 1.0, Pr(~g|~c) = 0.0
Pr(l|g) = 0.2, Pr(~l|g) = 0.8, Pr(l|~g) = 0.1, Pr(~l|~g) = 0.9
Pr(s|l) = 0.9, Pr(~s|l) = 0.1, Pr(s|~l) = 0.1, Pr(~s|~l) = 0.9
Specifying the joint requires only 9 parameters instead of 31 for the explicit representation - linear in the number of variables instead of exponential - linear in general if the dependence structure is a chain 29

Inference is easy [E -> C -> G -> L -> S] Want to know P(g)? Use marginalization: P(g) = Σ_c Σ_e P(g|c) P(c|e) P(e) These are all terms specified in our local distributions! 30

Inference is Easy [E -> C -> G -> L -> S] Computing P(g) in more concrete terms: P(c) = P(c|e)P(e) + P(c|~e)P(~e) = 0.9 * 0.7 + 0.5 * 0.3 = 0.78, so P(~c) = 0.22; P(g) = P(g|c)P(c) + P(g|~c)P(~c) = 0.3 * 0.78 + 1.0 * 0.22 = 0.454 31
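
The same computation as a small Python sketch (not course code), using the CPT numbers from the quantification slide above:

    p_e = 0.7
    p_c_given_e = {True: 0.9, False: 0.5}       # P(c | e), P(c | ~e)
    p_g_given_c = {True: 0.3, False: 1.0}       # P(g | c), P(g | ~c)

    # P(g) = sum over e and c of P(g|c) P(c|e) P(e)
    p_g = 0.0
    for e in (True, False):
        for c in (True, False):
            pe = p_e if e else 1.0 - p_e
            pc = p_c_given_e[e] if c else 1.0 - p_c_given_e[e]
            p_g += p_g_given_c[c] * pc * pe

    print(p_g)                                   # 0.454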

Bayesian Networks The structure just introduced is a Bayesian Network - a graphical representation of the direct dependencies over a set of variables, plus a set of conditional probability tables (CPTs) quantifying the strength of those influences 32

Bayesian Networks (aka belief networks, causal networks, probabilistic networks...) A BN over a set of variables {X1,...,Xn} consists of - a directed acyclic graph whose nodes are the variables - a set of CPTs, P(Xi | Parents(Xi)), one for each Xi [Example figure: nodes A and B with priors P(a), P(~a) and P(b), P(~b); node C with parents A and B and CPT entries P(c|a,b), P(~c|a,b), P(c|~a,b), P(~c|~a,b), P(c|a,~b), P(~c|a,~b), P(c|~a,~b), P(~c|~a,~b)] 33

Bayesian Networks Key notions - parents of a node: Par(Xi) - children of a node - descendants of a node - ancestors of a node - family: the set of nodes consisting of Xi and its parents; CPTs are defined over families [Example graph with nodes A, B, C, D] Parents(C)={A,B}, Children(A)={C}, Descendants(B)={C,D}, Ancestors(D)={A,B,C}, Family(C)={C,A,B} 34

Bayes Net Example A couple of CPTs are shown [in the figure] The explicit joint requires 2^11 - 1 = 2047 parameters; the BN requires only 27 parameters (the total number of entries across its CPTs) 35

Semantics The structure of the BN means: every Xi is conditionally independent of all of its non-descendants given its parents Pr(Xi | S, Par(Xi)) = Pr(Xi | Par(Xi)) for any subset S ⊆ NonDescendants(Xi) 36

Semantics Imagine we make the query P(x1,x2,...,xn) - = P(xn|xn-1,...,x1) P(xn-1|xn-2,...,x1) ... P(x1) (chain rule) - = P(xn|Par(xn)) P(xn-1|Par(xn-1)) ... P(x1) (by the BN independencies, with the variables numbered so that parents come before children) The joint is recoverable using the parameters (CPTs) specified in an arbitrary BN 37
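
A minimal sketch (one possible encoding, not the course's code) of how the CPTs recover any joint entry, using the E-C-G-L-S network and the numbers from the quantification slide:

    parents = {"E": [], "C": ["E"], "G": ["C"], "L": ["G"], "S": ["L"]}

    # cpt[X][(x,) + parent values] = P(X = x | parent values)
    cpt = {
        "E": {(True,): 0.7, (False,): 0.3},
        "C": {(True, True): 0.9, (False, True): 0.1,
              (True, False): 0.5, (False, False): 0.5},
        "G": {(True, True): 0.3, (False, True): 0.7,
              (True, False): 1.0, (False, False): 0.0},
        "L": {(True, True): 0.2, (False, True): 0.8,
              (True, False): 0.1, (False, False): 0.9},
        "S": {(True, True): 0.9, (False, True): 0.1,
              (True, False): 0.1, (False, False): 0.9},
    }

    def joint_prob(assignment):
        # P(x1, ..., xn) = product over i of P(xi | Par(xi))
        prob = 1.0
        for var, value in assignment.items():
            key = (value,) + tuple(assignment[par] for par in parents[var])
            prob *= cpt[var][key]
        return prob

    # e.g. P(e, c, g, ~l, ~s) = 0.7 * 0.9 * 0.3 * 0.8 * 0.9
    print(joint_prob({"E": True, "C": True, "G": True, "L": False, "S": False}))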

Constructing a BN Given any distribution over variables X1,X2,...,Xn, we can construct a BN that faithfully represents that distribution Take any ordering of the variables (say, the order given), and go through the following procedure for Xn down to X1. Let Par(Xn) be any subset S ⊆ {X1,...,Xn-1} such that Xn is independent of {X1,...,Xn-1} - S given S. Such a subset must exist. Then determine the parents of Xn-1 in the same way, finding a similar S ⊆ {X1,...,Xn-2}, and so on. In the end, a DAG is produced and the BN semantics must hold by construction. 38

Causal Intuitions The construction of a BN is simple - it works with arbitrary orderings of the variable set - but some orderings are much better than others Generally, if the ordering/dependence structure reflects causal intuitions, we get a more compact BN In this BN, we've used the ordering Malaria, Cold, Flu, Aches to build the BN for the distribution P A variable can only have parents that come earlier in the ordering 39

Causal Intuitions We could have used a different ordering - Aches, Cold, Flu, Malaria Mal depends on Aches, but it also depends on Cold and Flu given Aches (Cold and Flu explain away Mal given Aches) Flu depends on Aches, but also on Cold given Aches Cold depends on Aches 40

Compactness In general, if each random variable is directly influenced by at most k others, then each CPT will have at most 2^k entries. Thus the entire network over n (boolean) variables can be specified by about n * 2^k numbers [For the two example networks in the figure: 1+1+1+8 = 11 numbers versus 1+2+4+8 = 15 numbers] 41
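
A tiny sketch of that parameter count, applied to the E-C-G-L-S chain from the quantification slide (which indeed needs 9 numbers rather than 31):

    # For boolean variables, a node with k parents needs 2^k numbers
    # (the complementary entries are implied).
    def num_parameters(parents):
        return sum(2 ** len(ps) for ps in parents.values())

    chain = {"E": [], "C": ["E"], "G": ["C"], "L": ["G"], "S": ["L"]}
    print(num_parameters(chain), 2 ** 5 - 1)   # 9 parameters versus 31 for the explicit joint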

Testing Independence Given a BN, how do we determine if two variables X and Y are independent given evidence E? - We use a simple graphical property D-separation: a set of variables E d-separates X and Y if it blocks every undirected path between X and Y X and Y are conditionally independent given E if E d-separates X and Y 42

Blocking Let P be an undirected path from X to Y in the BN and let E be the evidence set. E blocks path P iff there is some node Z on the path such that - Case 1: one arc on P goes into Z and one goes out of Z, and Z is in E, or - Case 2: both arcs on P leave Z, and Z is in E, or - Case 3: both arcs on P enter Z, and neither Z nor any of its descendants is in E 43
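
A brute-force sketch of this definition (my own illustration, not the linear-time algorithm mentioned later): enumerate every undirected simple path between X and Y and test each interior node against the three cases. It is only practical for small graphs:

    # edges: directed arcs of the DAG as (parent, child) pairs; evidence: a set of node names.
    def d_separated(x, y, evidence, edges):
        edge_set = set(edges)
        nodes = {n for arc in edges for n in arc}
        children = {n: {b for a, b in edges if a == n} for n in nodes}
        neighbours = {n: {b for a, b in edges if a == n} |
                         {a for a, b in edges if b == n} for n in nodes}

        def descendants(z):
            seen, stack = set(), [z]
            while stack:
                for c in children[stack.pop()]:
                    if c not in seen:
                        seen.add(c)
                        stack.append(c)
            return seen

        def blocked(path):
            # Check the three cases at every interior node Z of the path.
            for prev, z, nxt in zip(path, path[1:], path[2:]):
                collider = (prev, z) in edge_set and (nxt, z) in edge_set
                if collider:
                    # Case 3: both arcs enter Z; blocked if neither Z nor any
                    # descendant of Z is in the evidence.
                    if z not in evidence and not (descendants(z) & evidence):
                        return True
                elif z in evidence:
                    # Cases 1 and 2: a chain or fork through an observed Z.
                    return True
            return False

        def paths(cur, visited):
            # Enumerate all undirected simple paths from x to y.
            if cur == y:
                yield visited
                return
            for nb in neighbours[cur]:
                if nb not in visited:
                    yield from paths(nb, visited + [nb])

        return all(blocked(p) for p in paths(x, [x]))

    # On the chain E -> C -> G -> L -> S: observing L d-separates E and S.
    chain_edges = [("E", "C"), ("C", "G"), ("G", "L"), ("L", "S")]
    print(d_separated("E", "S", {"L"}, chain_edges))   # True
    print(d_separated("E", "S", set(), chain_edges))   # False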

Blocking 44

Examples 1. Subway and Thermometer? 2. Aches and Fever? 3. Aches and Thermometer? 4. Flu and Malaria? 5. Subway and ExoticTrip? 45

D-Separation Can be computed in linear time with a depth-first-search-like algorithm Useful, since we now have a linear-time algorithm for automatically inferring whether learning the value of one variable might give us any additional information about some other variable, given what we already know - "Might", since variables can be conditionally independent without being d-separated 46

Other ways of determining conditional independence Non-descendants - A node is conditionally independent of its non-descendants, given its parents [Figure: X is conditionally independent of the Zij's (its non-descendants) given the Ui's (its parents)] 47

Example Fever is conditionally independent of Jaundice given Malaria and Flu 48

Markov Blanket A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents (its Markov blanket). 49
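
A small sketch of reading the Markov blanket off the graph (the arcs in the example are my guess at a fragment of the course's network, shown only for illustration):

    # Markov blanket of x: its parents, its children, and its children's other parents.
    # edges is a list of directed arcs (parent, child).
    def markov_blanket(x, edges):
        parents  = {a for a, b in edges if b == x}
        children = {b for a, b in edges if a == x}
        spouses  = {a for a, b in edges if b in children and a != x}
        return parents | children | spouses

    # Hypothetical fragment of the example network (assumed arcs):
    arcs = [("ExoticTrip", "Malaria"), ("Malaria", "Fever"),
            ("Malaria", "Jaundice"), ("Flu", "Fever")]
    print(markov_blanket("Malaria", arcs))   # {'ExoticTrip', 'Fever', 'Jaundice', 'Flu'}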

Markov Blanket [Figure: Markov blanket of Malaria] Malaria is conditionally independent of Aches given ExoticTrip, Jaundice, Fever and Flu 50

Next Class Inference in Bayes Nets! 51