CSEP 573: Artificial Intelligence


CSEP 573: Artificial Intelligence. Bayesian Networks: Inference. Ali Farhadi. Many slides over the course adapted from Luke Zettlemoyer, Pieter Abbeel, Dan Klein, Stuart Russell, or Andrew Moore.

Outline: Bayesian network inference; exact inference (variable elimination); approximate inference (sampling).

Bayes Net Representation


Reachability (D-Separation). Question: are X and Y conditionally independent given evidence variables {Z}? Yes, if X and Y are separated by Z: look for active paths from X to Y; no active paths means independence. A path is active if each triple along it is active: a causal chain A → B → C where B is unobserved (either direction); a common cause A ← B → C where B is unobserved; a common effect (aka v-structure) A → B ← C where B or one of its descendants is observed. All it takes to block a path is a single inactive segment. (The slide shows the catalogue of active triples, which create dependence, versus inactive triples, which give independence.)
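To make the active-triple test concrete, here is a small illustrative sketch (not from the slides); the triple-kind names and the helper function are made-up conveniences, but the logic follows the rules listed above.

```python
def triple_active(kind, middle_observed, descendant_observed=False):
    """Does a triple along a path pass influence, given what is observed?"""
    if kind in ("chain", "common_cause"):        # A -> B -> C  or  A <- B -> C
        return not middle_observed               # blocked once B is observed
    if kind == "common_effect":                  # A -> B <- C (v-structure)
        return middle_observed or descendant_observed
    raise ValueError(f"unknown triple kind: {kind}")

# A path is active only if every triple on it is active; X and Y are
# conditionally independent given Z if no path between them is active.
path = [("chain", True, False), ("common_effect", False, False)]
path_is_active = all(triple_active(k, m, d) for k, m, d in path)
print(path_is_active)  # False: the observed chain node already blocks this path
```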

Bayes Net Joint Distribution (alarm network: B → A ← E, with A → J and A → M).
P(B): +b 0.001; -b 0.999.
P(E): +e 0.002; -e 0.998.
P(A | B, E): +b +e +a 0.95; +b +e -a 0.05; +b -e +a 0.94; +b -e -a 0.06; -b +e +a 0.29; -b +e -a 0.71; -b -e +a 0.001; -b -e -a 0.999.
P(J | A): +a +j 0.9; +a -j 0.1; -a +j 0.05; -a -j 0.95.
P(M | A): +a +m 0.7; +a -m 0.3; -a +m 0.01; -a -m 0.99.


Probabilistic Inference. Probabilistic inference: compute a desired probability from other known probabilities (e.g. a conditional from the joint). We generally compute conditional probabilities, e.g. P(on time | no reported accidents) = 0.90. These represent the agent's beliefs given the evidence. Probabilities change with new evidence: P(on time | no accidents, 5 a.m.) = 0.95; P(on time | no accidents, 5 a.m., raining) = 0.80. Observing new evidence causes beliefs to be updated.

Inference. Inference: calculating some useful quantity from a joint probability distribution. Examples: posterior probability; most likely explanation.

Inference by Enumeration. General case: evidence variables E_1 = e_1, …, E_k = e_k; query variable Q; hidden variables H_1, …, H_r (together these make up all the variables). We want P(Q | e_1, …, e_k). First, select the entries consistent with the evidence. Second, sum out H to get the joint of the query and the evidence. Finally, normalize the remaining entries to conditionalize. Obvious problems: worst-case time complexity O(d^n), and space complexity O(d^n) to store the joint distribution.
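As a minimal sketch of these three steps (select, sum out, normalize), the snippet below runs enumeration over the small temperature/weather joint that appears in the Review slides further on; the dict-based table format and the helper name are assumptions for illustration.

```python
# Joint P(T, W) from the review tables (T = temperature, W = weather).
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}
varnames = ("T", "W")

def enumerate_query(joint, varnames, query_var, evidence):
    """Select evidence-consistent rows, sum out hidden variables, normalize."""
    unnorm = {}
    for row, p in joint.items():
        assignment = dict(zip(varnames, row))
        if all(assignment[v] == val for v, val in evidence.items()):  # select
            q = assignment[query_var]
            unnorm[q] = unnorm.get(q, 0.0) + p                        # sum out
    z = sum(unnorm.values())
    return {q: p / z for q, p in unnorm.items()}                      # normalize

print(enumerate_query(joint, varnames, "T", {"W": "sun"}))
# roughly {'hot': 0.67, 'cold': 0.33}
```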

Inference in BN by Enumeration. Given unlimited time, inference in BNs is easy. Reminder of inference by enumeration, by example (alarm network with nodes B, E, A, J, M):

P(B | +j, +m) ∝ P(B, +j, +m)
= Σ_{e,a} P(B, e, a, +j, +m)
= Σ_{e,a} P(B) P(e) P(a | B, e) P(+j | a) P(+m | a)
= P(B) P(+e) P(+a | B, +e) P(+j | +a) P(+m | +a)
+ P(B) P(+e) P(-a | B, +e) P(+j | -a) P(+m | -a)
+ P(B) P(-e) P(+a | B, -e) P(+j | +a) P(+m | +a)
+ P(B) P(-e) P(-a | B, -e) P(+j | -a) P(+m | -a)
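As a worked illustration of this sum (a minimal sketch, not the lecture's code), the snippet below plugs in the alarm-network CPTs from the tables above and enumerates over e and a:

```python
# Alarm-network CPTs from the slides (a leading '-' marks the negated value).
P_B = {"+b": 0.001, "-b": 0.999}
P_E = {"+e": 0.002, "-e": 0.998}
P_A = {("+b", "+e", "+a"): 0.95,  ("+b", "+e", "-a"): 0.05,
       ("+b", "-e", "+a"): 0.94,  ("+b", "-e", "-a"): 0.06,
       ("-b", "+e", "+a"): 0.29,  ("-b", "+e", "-a"): 0.71,
       ("-b", "-e", "+a"): 0.001, ("-b", "-e", "-a"): 0.999}
P_J = {("+a", "+j"): 0.9, ("+a", "-j"): 0.1, ("-a", "+j"): 0.05, ("-a", "-j"): 0.95}
P_M = {("+a", "+m"): 0.7, ("+a", "-m"): 0.3, ("-a", "+m"): 0.01, ("-a", "-m"): 0.99}

# Unnormalized P(B, +j, +m): sum the full joint over the hidden variables e, a.
unnorm = {b: sum(P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, "+j")] * P_M[(a, "+m")]
                 for e in P_E for a in ("+a", "-a"))
          for b in P_B}
z = sum(unnorm.values())
posterior = {b: p / z for b, p in unnorm.items()}
print(posterior)  # roughly {'+b': 0.284, '-b': 0.716}
```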

Inference by Enumeration. P(Antilock | observed variables) = ?

Variable Elimination. Why is inference by enumeration so slow? You join up the whole joint distribution before you sum out the hidden variables, so you end up repeating a lot of work. Idea: interleave joining and marginalizing. This is called Variable Elimination. It is still NP-hard, but usually much faster than inference by enumeration. We'll need some new notation to define VE.

Review. Joint distribution: P(X, Y). Entries P(x, y) for all x, y; sums to 1. Example P(T, W): hot sun 0.4; hot rain 0.1; cold sun 0.2; cold rain 0.3. Selected joint: P(x, Y), a slice of the joint distribution. Entries P(x, y) for fixed x, all y; sums to P(x). Example P(cold, W): cold sun 0.2; cold rain 0.3.

Review. Family of conditionals: P(X | Y). Multiple conditionals; entries P(x | y) for all x, y; sums to |Y|. Example P(W | T): hot sun 0.8; hot rain 0.2; cold sun 0.4; cold rain 0.6. Single conditional: P(Y | x). Entries P(y | x) for fixed x, all y; sums to 1. Example P(W | cold): cold sun 0.4; cold rain 0.6.

Review. Specified family: P(y | X). Entries P(y | x) for fixed y, but for all x; sums to... who knows! Example P(rain | T): hot rain 0.2; cold rain 0.6. In general, when we write P(Y_1 … Y_N | X_1 … X_M), it is a factor: a multi-dimensional array whose values are all P(y_1 … y_N | x_1 … x_M). Any assigned (selected) X or Y is a dimension missing from the array.
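A tiny sketch of this "factor as an array with selectable dimensions" view, using the review numbers above (the dict representation is an assumption for illustration):

```python
# P(W | T) as a factor indexed by both variables.
P_W_given_T = {
    ("hot", "sun"): 0.8, ("hot", "rain"): 0.2,
    ("cold", "sun"): 0.4, ("cold", "rain"): 0.6,
}

# Single conditional P(W | cold): select T = cold; the result sums to 1.
P_W_given_cold = {w: p for (t, w), p in P_W_given_T.items() if t == "cold"}

# Specified family P(rain | T): select W = rain; the result sums to "who knows"
# (here 0.2 + 0.6 = 0.8), since each entry comes from a different conditional.
P_rain_given_T = {t: p for (t, w), p in P_W_given_T.items() if w == "rain"}
print(P_W_given_cold, P_rain_given_T)
```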

Inference Inference is expensive with enumeration Variable elimination: Interleave joining and marginalization: Store initial results and then join with the rest

Example: Traffic Domain. Random variables: R (raining), T (traffic), L (late for class). First query: P(L). CPTs: P(R): +r 0.1; -r 0.9. P(T | R): +r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9. P(L | T): +t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9.

Variable Elimination Outline. Maintain a set of tables called factors. The initial factors are the local CPTs (one per node): P(R): +r 0.1; -r 0.9. P(T | R): +r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9. P(L | T): +t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9. Any known values are selected: e.g. if we know L = +l, the initial factors become P(R) (+r 0.1; -r 0.9), P(T | R) (+r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9), and the selected P(+l | T) (+t +l 0.3; -t +l 0.1). VE: alternately join factors and eliminate variables.

Operation 1: Join Factors. First basic operation: joining factors. Combining factors is just like a database join: get all factors over the joining variable and build a new factor over the union of the variables involved. Example, join on R: P(R) (+r 0.1; -r 0.9) joined with P(T | R) (+r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9) gives P(R, T): +r +t 0.08; +r -t 0.02; -r +t 0.09; -r -t 0.81. The computation for each entry is a pointwise product.
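A minimal sketch of this join on R, using the traffic-domain numbers above (the tuple-keyed dict format is just an illustrative convention, not the lecture's code):

```python
P_R = {("+r",): 0.1, ("-r",): 0.9}
P_T_given_R = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
               ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}

# Join on R: pointwise product of rows that agree on R, like a database join.
P_RT = {(r, t): P_R[(r,)] * p for (r, t), p in P_T_given_R.items()}
print(P_RT)
# +r+t 0.08, +r-t 0.02, -r+t 0.09, -r-t 0.81 (up to floating-point rounding)
```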

Example: Multiple Joins. Starting factors: P(R), P(T | R), P(L | T) as above. Join R: P(R) and P(T | R) are replaced by P(R, T): +r +t 0.08; +r -t 0.02; -r +t 0.09; -r -t 0.81. The factor P(L | T) (+t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9) is carried along unchanged.

Example: Multiple Joins (continued). Join T: P(R, T) and P(L | T) are replaced by P(R, T, L): +r +t +l 0.024; +r +t -l 0.056; +r -t +l 0.002; +r -t -l 0.018; -r +t +l 0.027; -r +t -l 0.063; -r -t +l 0.081; -r -t -l 0.729.

Operation 2: Eliminate. Second basic operation: marginalization. Take a factor and sum out a variable: this shrinks the factor to a smaller one; it is a projection operation. Example: summing R out of P(R, T) (+r +t 0.08; +r -t 0.02; -r +t 0.09; -r -t 0.81) gives P(T): +t 0.17; -t 0.83.
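The same projection, sketched on the joined P(R, T) factor from the join example (same illustrative dict convention as before):

```python
P_RT = {("+r", "+t"): 0.08, ("+r", "-t"): 0.02,
        ("-r", "+t"): 0.09, ("-r", "-t"): 0.81}

# Sum out R: add together all rows that agree on the remaining variable T.
P_T = {}
for (r, t), p in P_RT.items():
    P_T[t] = P_T.get(t, 0.0) + p
print(P_T)  # {'+t': 0.17, '-t': 0.83}
```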

Multiple Elimination. Sum out R from P(R, T, L) (entries above) to get P(T, L): +t +l 0.051; +t -l 0.119; -t +l 0.083; -t -l 0.747. Then sum out T to get P(L): +l 0.134; -l 0.866.

P(L): Marginalizing Early. Start from P(R) (+r 0.1; -r 0.9), P(T | R) (+r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9), and P(L | T) (+t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9). Join R to get P(R, T) (+r +t 0.08; +r -t 0.02; -r +t 0.09; -r -t 0.81), then immediately sum out R to get P(T) (+t 0.17; -t 0.83); the factor P(L | T) is carried along unchanged.

Marginalizing Early (aka VE*). Join T: P(T) (+t 0.17; -t 0.83) and P(L | T) give P(T, L): +t +l 0.051; +t -l 0.119; -t +l 0.083; -t -l 0.747. Sum out T to get P(L): +l 0.134; -l 0.866. (* VE is variable elimination.)
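Putting the two operations together, here is a short sketch of the marginalize-early computation of P(L) with the traffic-domain CPTs (dict format as in the earlier sketches, not the lecture's code):

```python
P_R = {"+r": 0.1, "-r": 0.9}
P_T_given_R = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
               ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}
P_L_given_T = {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
               ("-t", "+l"): 0.1, ("-t", "-l"): 0.9}

# Join on R, then immediately sum out R: P(t) = sum_r P(r) P(t | r).
P_T = {t: sum(P_R[r] * P_T_given_R[(r, t)] for r in P_R) for t in ("+t", "-t")}
# Join on T, then sum out T: P(l) = sum_t P(t) P(l | t).
P_L = {l: sum(P_T[t] * P_L_given_T[(t, l)] for t in P_T) for l in ("+l", "-l")}
print(P_T)  # {'+t': 0.17, '-t': 0.83}
print(P_L)  # {'+l': 0.134, '-l': 0.866}
```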

Traffic Domain (R → T → L): P(L) = ?
Inference by enumeration: P(L) = Σ_t Σ_r P(L | t) P(r) P(t | r), i.e. join on r, join on t, eliminate r, eliminate t.
Variable elimination: P(L) = Σ_t P(L | t) Σ_r P(r) P(t | r), i.e. join on r, eliminate r, join on t, eliminate t.

Marginalizing Early (full pipeline). Join R: P(R) (+r 0.1; -r 0.9) and P(T | R) (+r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9) give P(R, T) (+r +t 0.08; +r -t 0.02; -r +t 0.09; -r -t 0.81). Sum out R: P(T) (+t 0.17; -t 0.83). Join T: with P(L | T) (+t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9) this gives P(T, L) (+t +l 0.051; +t -l 0.119; -t +l 0.083; -t -l 0.747). Sum out T: P(L) (+l 0.134; -l 0.866).

Evidence. If there is evidence, start with factors that select that evidence. With no evidence, the initial factors are P(R) (+r 0.1; -r 0.9), P(T | R) (+r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9), and P(L | T) (+t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9). Computing P(L | +r), the initial factors become the selected P(+r) (+r 0.1), the selected P(T | +r) (+r +t 0.8; +r -t 0.2), and P(L | T) unchanged. We then eliminate all variables other than the query and the evidence.

Evidence II. The result will be a selected joint of the query and the evidence. E.g. for P(L | +r), we'd end up with the factor over (+r, L): +r +l 0.026; +r -l 0.074. To get our answer, just normalize this: +l 0.26; -l 0.74. That's it!
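A minimal sketch of the whole P(L | +r) computation: select the +r rows, eliminate T, and only normalize at the very end (same illustrative dict format; the numbers match the tables above).

```python
P_R = {"+r": 0.1, "-r": 0.9}
P_T_given_R = {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
               ("-r", "+t"): 0.1, ("-r", "-t"): 0.9}
P_L_given_T = {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
               ("-t", "+l"): 0.1, ("-t", "-l"): 0.9}

# Select the evidence R = +r in every factor that mentions R.
sel_R = P_R["+r"]                                                 # 0.1
sel_T = {t: p for (r, t), p in P_T_given_R.items() if r == "+r"}  # P(T | +r)

# Join on T and sum T out, leaving an unnormalized factor over L and the evidence.
unnorm = {l: sel_R * sum(sel_T[t] * P_L_given_T[(t, l)] for t in sel_T)
          for l in ("+l", "-l")}                                  # 0.026 / 0.074
z = sum(unnorm.values())
print({l: p / z for l, p in unnorm.items()})                      # {'+l': 0.26, '-l': 0.74}
```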

General Variable Elimination. Query: P(Q | E_1 = e_1, …, E_k = e_k). Start with the initial factors: the local CPTs, instantiated by the evidence. While there are still hidden variables (not the query or evidence): pick a hidden variable H, join all factors mentioning H, and eliminate (sum out) H. Finally, join all remaining factors and normalize.
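Below is a compact, illustrative sketch of this loop; the (variable-names, table) factor representation and all helper names are assumptions, not the lecture's reference code. The usage at the bottom reproduces the P(L | +r) answer from the evidence slides.

```python
def restrict(varnames, table, evidence):
    """Keep rows consistent with the evidence and drop the evidence columns."""
    keep = [i for i, v in enumerate(varnames) if v not in evidence]
    out = {}
    for row, p in table.items():
        if all(row[i] == evidence[v] for i, v in enumerate(varnames) if v in evidence):
            out[tuple(row[i] for i in keep)] = p
    return tuple(varnames[i] for i in keep), out

def join(f1, f2):
    """Pointwise product of two factors, over the union of their variables."""
    (v1, t1), (v2, t2) = f1, f2
    out_vars = tuple(v1) + tuple(v for v in v2 if v not in v1)
    out = {}
    for r1, p1 in t1.items():
        a1 = dict(zip(v1, r1))
        for r2, p2 in t2.items():
            a2 = dict(zip(v2, r2))
            if all(a1[v] == a2[v] for v in v1 if v in v2):      # rows must agree
                merged = {**a1, **a2}
                out[tuple(merged[v] for v in out_vars)] = p1 * p2
    return out_vars, out

def sum_out(factor, var):
    """Marginalize (sum out) one variable from a factor."""
    varnames, table = factor
    i = varnames.index(var)
    out_vars = varnames[:i] + varnames[i + 1:]
    out = {}
    for row, p in table.items():
        key = row[:i] + row[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out_vars, out

def variable_elimination(factors, evidence, hidden_order):
    """Restrict by evidence, eliminate hidden vars in the given order, normalize."""
    factors = [restrict(v, t, evidence) for v, t in factors]
    for h in hidden_order:
        mentioning = [f for f in factors if h in f[0]]
        factors = [f for f in factors if h not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)
        factors.append(sum_out(joined, h))
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result[1].values())
    return result[0], {row: p / z for row, p in result[1].items()}

# Traffic domain, P(L | +r): the only hidden variable is T.
fR = (("R",), {("+r",): 0.1, ("-r",): 0.9})
fTR = (("R", "T"), {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                    ("-r", "+t"): 0.1, ("-r", "-t"): 0.9})
fLT = (("T", "L"), {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
                    ("-t", "+l"): 0.1, ("-t", "-l"): 0.9})
print(variable_elimination([fR, fTR, fLT], {"R": "+r"}, hidden_order=["T"]))
# (('L',), {('+l',): 0.26, ('-l',): 0.74}) up to floating-point rounding
```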

Variable Elimination = Bayes' Rule. Start / select: P(B) (+b 0.1; -b 0.9) and the rows of P(A | B) selected for the evidence +a (from +b +a 0.8; +b -a 0.2; -b +a 0.1; -b -a 0.9, keep +b +a 0.8 and -b +a 0.1). Join on B: P(+a, B): +a +b 0.08; +a -b 0.09. Normalize: P(B | +a): +a +b 8/17; +a -b 9/17.

Example. Query: P(B | +j, +m). Choose A: join all factors mentioning A, then eliminate (sum out) A.

Example (continued). Choose E: join all factors mentioning E, then eliminate E. Finish with B: join the remaining factors and normalize.

Variable Elimination (alarm network: B → A ← E, with A → J and A → M).

P(B, j, m) = Σ_{A,E} P(B, j, m, A, E)
= Σ_{A,E} P(B) P(E) P(A | B, E) P(m | A) P(j | A)
= Σ_E P(B) P(E) Σ_A P(A | B, E) P(m | A) P(j | A)
= Σ_E P(B) P(E) Σ_A P(m, j, A | B, E)
= Σ_E P(B) P(E) P(m, j | B, E)
= Σ_E P(B) P(m, j, E | B)
= P(B) P(m, j | B)

Another Example. Computational complexity critically depends on the largest factor generated in this process. The size of a factor is the number of entries in its table. In the example above (assuming binary variables), all factors generated are of size 2, as each has only one variable (Z, Z, and X_3 respectively).

Variable Elimination Ordering. For the query P(X_n | y_1, …, y_n), work through the following two different orderings, as done in the previous slide: Z, X_1, …, X_{n-1} and X_1, …, X_{n-1}, Z. What is the size of the maximum factor generated for each of the orderings? Answer: 2^{n+1} versus 2^2 (assuming binary variables). In general, the ordering can greatly affect efficiency.

VE: Computational and Space Complexity. The computational and space complexity of variable elimination is determined by the largest factor. The elimination ordering can greatly affect the size of the largest factor, e.g. the previous slide's example: 2^n vs. 2. Does there always exist an ordering that only results in small factors? No!

Exact Inference: Variable Elimination. Remaining issues: complexity is exponential in the tree width (the size of the largest factor created), and finding the best elimination ordering is an NP-hard problem. What you need to know: you should be able to run VE on small examples and understand the factor creation / reduction flow; it is better than enumeration because it saves time by marginalizing variables as soon as possible rather than at the end. We have already seen a special case of VE: HMM forward inference.

Variable Elimination. Interleave joining and marginalizing. d^k entries are computed for a factor over k variables with domain sizes d. The ordering of elimination of the hidden variables can affect the size of the factors generated. Worst case: running time exponential in the size of the Bayes' net.