
CS 188: Artificial Intelligence, Spring 2011
Lecture 16: Bayes Nets IV (Inference)
3/28/2011
Pieter Abbeel, UC Berkeley
Many slides over this course adapted from Dan Klein, Stuart Russell, and Andrew Moore.

Announcements
Assignments: W4 out today --- this is your last written assignment!
Any assignments you have not picked up yet are in the bin in 283 Soda (same room as for submission drop-off).

Bayes Net Semantics
A set of nodes, one per variable X.
A directed, acyclic graph.
A conditional distribution for each node: a collection of distributions over X, one for each combination of its parents' values, P(X | A_1, ..., A_n).
CPT: conditional probability table.
Description of a noisy "causal" process.
A Bayes net = Topology (graph) + Local Conditional Probabilities.

Probabilities in BNs
For all joint distributions, we have (chain rule):
    P(x_1, x_2, ..., x_n) = Π_i P(x_i | x_1, ..., x_{i-1})
Bayes nets implicitly encode joint distributions as a product of local conditional distributions. To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
    P(x_1, x_2, ..., x_n) = Π_i P(x_i | parents(X_i))
This lets us reconstruct any entry of the full joint.
Not every BN can represent every joint distribution: the topology enforces certain conditional independencies.
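To make the product-of-conditionals semantics concrete, here is a minimal Python sketch (not part of the original slides) that stores each CPT as a dictionary and multiplies the relevant local conditionals to score a full assignment. The network structure and table values are taken from the traffic example later in the lecture; the encoding is my own.

```python
# Minimal sketch: a Bayes net as {variable: (parents, CPT)} where CPT keys are
# (value, parent-values tuple) and map to P(value | parent values).
bn = {
    "R": ((), {("+r", ()): 0.25, ("-r", ()): 0.75}),
    "T": (("R",), {("+t", ("+r",)): 0.75, ("-t", ("+r",)): 0.25,
                   ("+t", ("-r",)): 0.50, ("-t", ("-r",)): 0.50}),
}

def joint_probability(bn, assignment):
    """P(x_1, ..., x_n) = product over i of P(x_i | parents(X_i))."""
    p = 1.0
    for var, (parents, cpt) in bn.items():
        parent_vals = tuple(assignment[par] for par in parents)
        p *= cpt[(assignment[var], parent_vals)]
    return p

print(joint_probability(bn, {"R": "+r", "T": "+t"}))  # 3/16 = 0.1875
```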

All Conditional Independences
Given a Bayes net structure, we can run d-separation to build a complete list of conditional independences that are necessarily true, of the form
    X_i ⊥ X_j | {X_{k_1}, ..., X_{k_n}}
This list determines the set of probability distributions that can be represented.

Is it possible to have the same full list of conditional independence assumptions for different BN graphs? Yes! (For example, the chains X -> Y -> Z and X <- Y <- Z both guarantee exactly X ⊥ Z | Y.)

Topology Limits Distributions
Given some graph topology G, only certain joint distributions can be encoded.
The graph structure guarantees certain (conditional) independences. (There might be more independence in the particular distribution.)
Adding arcs increases the set of distributions that can be represented, but has several costs.
A fully connected graph can encode any distribution.
For three variables X, Y, Z:
No edges: guarantees {X ⊥ Y, X ⊥ Z, Y ⊥ Z, X ⊥ Y | Z, X ⊥ Z | Y, Y ⊥ Z | X}
Chain X -> Y -> Z: guarantees {X ⊥ Z | Y}
Fully connected: guarantees {} (no independences)

Causality?
When Bayes nets reflect the true causal patterns:
Often simpler (nodes have fewer parents).
Often easier to think about.
Often easier to elicit from experts.
BNs need not actually be causal:
Sometimes no causal net exists over the domain (e.g., consider the variables Traffic and Drips).
We end up with arrows that reflect correlation, not causation.
What do the arrows really mean?
Topology may happen to encode causal structure.
Topology is only guaranteed to encode conditional independence.
[If you want to learn more about causality, beyond the scope of 188: "Causality", Judea Pearl.]

Example: Traffic
Basic traffic net (R -> T). Let's multiply out the joint.

P(R):        +r 1/4    -r 3/4
P(T | +r):   +t 3/4    -t 1/4
P(T | -r):   +t 1/2    -t 1/2

Joint P(R, T):
+r +t 3/16
+r -t 1/16
-r +t 6/16
-r -t 6/16

Example: Reverse Traffic
Reverse causality? (T -> R)

P(T):        +t 9/16   -t 7/16
P(R | +t):   +r 1/3    -r 2/3
P(R | -t):   +r 1/7    -r 6/7

Joint P(R, T):
+r +t 3/16
+r -t 1/16
-r +t 6/16
-r -t 6/16
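As a quick check that the forward and reverse nets really encode the same joint, here is a small Python sketch (my own illustration, not from the slides; the variable names are arbitrary) that multiplies out both factorizations P(r)P(t|r) and P(t)P(r|t) and compares them entry by entry.

```python
from fractions import Fraction as F

# Forward net R -> T
P_R = {"+r": F(1, 4), "-r": F(3, 4)}
P_T_given_R = {("+t", "+r"): F(3, 4), ("-t", "+r"): F(1, 4),
               ("+t", "-r"): F(1, 2), ("-t", "-r"): F(1, 2)}

# Reverse net T -> R
P_T = {"+t": F(9, 16), "-t": F(7, 16)}
P_R_given_T = {("+r", "+t"): F(1, 3), ("-r", "+t"): F(2, 3),
               ("+r", "-t"): F(1, 7), ("-r", "-t"): F(6, 7)}

for r in ("+r", "-r"):
    for t in ("+t", "-t"):
        forward = P_R[r] * P_T_given_R[(t, r)]   # P(r) * P(t | r)
        reverse = P_T[t] * P_R_given_T[(r, t)]   # P(t) * P(r | t)
        print(r, t, forward, reverse, forward == reverse)  # always True
```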

Example: Coins
Extra arcs don't prevent representing independence, they just allow non-independence.

Two independent coins X_1 and X_2 (no arc):
P(X_1): h 0.5, t 0.5
P(X_2): h 0.5, t 0.5

Same coins with an added arc X_1 -> X_2:
P(X_1): h 0.5, t 0.5
P(X_2 | X_1): h|h 0.5, t|h 0.5, h|t 0.5, t|t 0.5

Adding unneeded arcs isn't wrong, it's just inefficient.

Changing Bayes Net Structure
The same joint distribution can be encoded in many different Bayes nets.
Causal structure tends to be the simplest.
Analysis question: given some edges, what other edges do you need to add?
One answer: fully connect the graph.
Better answer: don't make any false conditional independence assumptions.
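A short numeric check (my own illustration, not from the slides) makes the coins point concrete: because the child's CPT does not actually depend on the parent's value, the joint encoded by the arc-augmented net still factors into the product of the two marginals, so the coins remain independent.

```python
# Extra arc X1 -> X2, but P(X2 | X1) is the same for both parent values.
P_X1 = {"h": 0.5, "t": 0.5}
P_X2_given_X1 = {("h", "h"): 0.5, ("t", "h"): 0.5,   # keys: (x2, x1)
                 ("h", "t"): 0.5, ("t", "t"): 0.5}

for x1 in "ht":
    for x2 in "ht":
        joint = P_X1[x1] * P_X2_given_X1[(x2, x1)]
        # The encoded joint equals P(x1) * P(x2) for every entry.
        print(x1, x2, joint, joint == P_X1[x1] * 0.5)
```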

An Algorithm for Adding Necessary Edges
Choose an ordering consistent with the partial ordering induced by the existing edges; refer to the ordered variables as X_1, X_2, ..., X_n.
For i = 1, 2, ..., n:
    Find the minimal set parents(X_i) such that P(x_i | x_1, ..., x_{i-1}) = P(x_i | parents(X_i)).
Why does this ensure no spurious conditional independencies remain?

Example: Alternate Alarm
Original network: Burglary and Earthquake are parents of Alarm; Alarm is the parent of John calls and Mary calls.
If we reverse the edges (John calls and Mary calls as parents of Alarm; Alarm as parent of Burglary and Earthquake), we make different conditional independence assumptions.
To capture the same joint distribution, we have to add more edges to the graph.
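The following Python sketch (my own illustration, not from the slides) applies the algorithm above by brute force: given a full joint table and a variable ordering, it searches for a minimal parent set among the earlier variables such that conditioning on it gives the same distribution as conditioning on all earlier variables. It assumes every conditioning event has nonzero probability.

```python
from itertools import combinations

def conditional(joint, var, given):
    """P(var | given) from a full joint.
    joint: list of (assignment_dict, probability) pairs covering all assignments.
    Returns {(value, given_values_tuple): probability}."""
    totals, matches = {}, {}
    for a, p in joint:
        key = tuple(a[g] for g in given)
        totals[key] = totals.get(key, 0.0) + p
        matches[(a[var], key)] = matches.get((a[var], key), 0.0) + p
    return {(val, key): p / totals[key] for (val, key), p in matches.items()}

def minimal_parents(joint, ordering):
    """For each X_i in `ordering`, find the smallest subset S of the earlier
    variables with P(x_i | x_1, ..., x_{i-1}) = P(x_i | S)."""
    result = {}
    for i, var in enumerate(ordering):
        earlier = ordering[:i]
        full = conditional(joint, var, earlier)
        for size in range(len(earlier) + 1):
            done = False
            for cand in combinations(earlier, size):
                small = conditional(joint, var, list(cand))
                restrict = lambda key: tuple(v for g, v in zip(earlier, key) if g in cand)
                if all(abs(p - small[(val, restrict(key))]) < 1e-9
                       for (val, key), p in full.items()):
                    result[var] = cand
                    done = True
                    break
            if done:
                break
    return result

# Traffic joint over R, T from the example above. With the reversed ordering
# [T, R] we get parents(T) = () and parents(R) = (T,): reversing the edge
# still requires keeping an arc between R and T.
joint = [({"R": "+r", "T": "+t"}, 3/16), ({"R": "+r", "T": "-t"}, 1/16),
         ({"R": "-r", "T": "+t"}, 6/16), ({"R": "-r", "T": "-t"}, 6/16)]
print(minimal_parents(joint, ["T", "R"]))  # {'T': (), 'R': ('T',)}
```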

Bayes Nets Representation Summary
Bayes nets compactly encode joint distributions.
Guaranteed independencies of distributions can be deduced from the BN graph structure.
D-separation gives precise conditional independence guarantees from the graph alone.
A Bayes net's joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution.

Bayes Nets Status
Representation
Inference (this lecture)
Learning Bayes Nets from Data

Inference
Inference: calculating some useful quantity from a joint probability distribution.
Examples (using the alarm network over B, E, A, J, M):
Posterior probability: P(Q | E_1 = e_1, ..., E_k = e_k)
Most likely explanation: argmax_q P(Q = q | E_1 = e_1, ..., E_k = e_k)

Inference by Enumeration
Given unlimited time, inference in BNs is easy.
Recipe:
State the marginal probabilities you need.
Figure out ALL the atomic probabilities you need.
Calculate and combine them.
Example: the alarm network over B, E, A, J, M (e.g., the query P(B | +j, +m)).
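Below is a small Python sketch of inference by enumeration (my own illustration, using the two-variable traffic net from the earlier example rather than the alarm network): build every entry of the full joint from the CPTs, keep the entries consistent with the evidence, sum out everything but the query, and normalize.

```python
from itertools import product

# CPTs for the traffic net R -> T (values from the earlier traffic example)
domains = {"R": ["+r", "-r"], "T": ["+t", "-t"]}
P_R = {"+r": 0.25, "-r": 0.75}
P_T_given_R = {("+t", "+r"): 0.75, ("-t", "+r"): 0.25,
               ("+t", "-r"): 0.50, ("-t", "-r"): 0.50}

def joint_entry(a):
    """P(r, t) = P(r) * P(t | r): multiply the local conditionals."""
    return P_R[a["R"]] * P_T_given_R[(a["T"], a["R"])]

def enumeration_query(query_var, evidence):
    """P(query_var | evidence): sum consistent joint entries, then normalize."""
    dist = {}
    for values in product(*domains.values()):
        a = dict(zip(domains, values))
        if all(a[var] == val for var, val in evidence.items()):
            dist[a[query_var]] = dist.get(a[query_var], 0.0) + joint_entry(a)
    total = sum(dist.values())
    return {val: p / total for val, p in dist.items()}

# P(R | +t) = {+r: 1/3, -r: 2/3}, matching P(R | +t) in the reverse traffic net.
print(enumeration_query("R", {"T": "+t"}))
```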

Example: Enumeration
In this simple method, we only need the BN to synthesize the joint entries.

Inference by Enumeration?

Variable Elimination
Why is inference by enumeration so slow?
You join up the whole joint distribution before you sum out the hidden variables.
You end up repeating a lot of work!
Idea: interleave joining and marginalizing! This is called Variable Elimination.
Still NP-hard, but usually much faster than inference by enumeration.
We'll need some new notation to define VE.

Factor Zoo I
Joint distribution: P(X, Y)
Entries P(x, y) for all x, y; sums to 1.

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Selected joint: P(x, Y)
A slice of the joint distribution. Entries P(x, y) for fixed x, all y; sums to P(x).

T     W     P
cold  sun   0.2
cold  rain  0.3

Number of capitals = dimensionality of the table.
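To make the "factor zoo" concrete, the following sketch (my own encoding, not the course's) represents a factor as a tuple of variable names plus a table keyed by value tuples; selecting a value is then just a row filter. The later examples in this section build on this representation.

```python
# A factor: (variables, table), where table maps value-tuples (aligned with
# `variables`) to numbers. P(T, W) from the Factor Zoo example:
P_TW = (("T", "W"), {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
                     ("cold", "sun"): 0.2, ("cold", "rain"): 0.3})

def select(factor, var, value):
    """Slice a factor: keep only rows where `var` takes `value`
    (e.g. the selected joint P(cold, W), which sums to P(T=cold))."""
    variables, table = factor
    i = variables.index(var)
    return (variables, {k: p for k, p in table.items() if k[i] == value})

print(select(P_TW, "T", "cold"))
# (('T', 'W'), {('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3})
```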

Factor Zoo II
Family of conditionals: P(X | Y)
Multiple conditionals. Entries P(x | y) for all x, y; sums to |Y|.

T     W     P
hot   sun   0.8
hot   rain  0.2
cold  sun   0.4
cold  rain  0.6

Single conditional: P(Y | x)
Entries P(y | x) for fixed x, all y; sums to 1.

T     W     P
cold  sun   0.4
cold  rain  0.6

Factor Zoo III
Specified family: P(y | X)
Entries P(y | x) for fixed y, but for all x; sums to... who knows!

T     W     P
hot   rain  0.2
cold  rain  0.6

In general, when we write P(Y_1 ... Y_N | X_1 ... X_M):
It is a factor, a multi-dimensional array.
Its values are all P(y_1 ... y_N | x_1 ... x_M).
Any assigned X or Y is a dimension missing (selected) from the array.

Example: Traffic Domain
Random variables: R (raining), T (traffic), L (late for class!), with structure R -> T -> L.

P(R):
+r 0.1
-r 0.9

P(T | R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L | T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

Variable Elimination Outline
Track objects called factors. The initial factors are the local CPTs (one per node): P(R), P(T | R), P(L | T).
Any known values are selected. E.g., if we know L = +l, the factors mentioning L are reduced accordingly (to P(+l | T)), while P(R) and P(T | R) are unchanged.
VE: alternately join factors and eliminate variables.

Operation 1: Join Factors
First basic operation: joining factors.
Combining factors is just like a database join: get all factors over the joining variable and build a new factor over the union of the variables involved.
Example: join on R, combining P(R) (+r 0.1, -r 0.9) and P(T | R) (+r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9) into a factor over R, T:
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81
The computation for each entry is a pointwise product.

Example: Multiple Joins
Starting factors: P(R), P(T | R), P(L | T).
Joining on R replaces P(R) and P(T | R) with the factor P(R, T) above, leaving P(L | T) untouched.
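Here is a sketch of the join operation on the factor representation introduced above: build a factor over the union of the variables and fill each entry with the pointwise product of the matching entries. This is illustrative code, not the course's reference implementation.

```python
from itertools import product

def join(f1, f2, domains):
    """Pointwise-product join of two factors over the union of their variables."""
    vars1, t1 = f1
    vars2, t2 = f2
    joined_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for values in product(*(domains[v] for v in joined_vars)):
        a = dict(zip(joined_vars, values))
        k1 = tuple(a[v] for v in vars1)
        k2 = tuple(a[v] for v in vars2)
        if k1 in t1 and k2 in t2:          # skip rows removed by evidence selection
            table[values] = t1[k1] * t2[k2]
    return (joined_vars, table)

# Join on R: P(R) * P(T | R) -> P(R, T), matching the table on the slide.
domains = {"R": ["+r", "-r"], "T": ["+t", "-t"], "L": ["+l", "-l"]}
P_R = (("R",), {("+r",): 0.1, ("-r",): 0.9})
P_T_given_R = (("R", "T"), {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                            ("-r", "+t"): 0.1, ("-r", "-t"): 0.9})
P_RT = join(P_R, P_T_given_R, domains)
print(P_RT[1])
# {('+r', '+t'): 0.08, ('+r', '-t'): 0.02, ('-r', '+t'): 0.09, ('-r', '-t'): 0.81}
```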

Example: Multiple Joins (continued)
Join on T: combine P(R, T) and P(L | T) into a single factor over R, T, L:
+r +t +l 0.024
+r +t -l 0.056
+r -t +l 0.002
+r -t -l 0.018
-r +t +l 0.027
-r +t -l 0.063
-r -t +l 0.081
-r -t -l 0.729

Operation 2: Eliminate
Second basic operation: marginalization.
Take a factor and sum out a variable: this shrinks the factor to a smaller one (a projection operation).
Example: summing R out of P(R, T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81
becomes
+t 0.17
-t 0.83
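And a matching sketch of the elimination (sum-out) operation on the same factor representation: drop one variable by summing the rows that agree on everything else. Applied to the joined P(R, T) from the previous sketch, it reproduces the numbers on the slide (again, illustrative code only).

```python
def sum_out(factor, var):
    """Marginalize `var` out of a factor (a projection onto the remaining variables)."""
    variables, table = factor
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return (new_vars, new_table)

P_T = sum_out(P_RT, "R")
print(P_T[1])   # {('+t',): 0.17, ('-t',): 0.83}
```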

Multiple Elimination
From the joint factor over R, T, L:
Sum out R:
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747
Then sum out T:
+l 0.134
-l 0.866

P(L): Marginalizing Early!
Instead of building the full joint first, we can marginalize as we go.
Join on R: P(R) and P(T | R) become P(R, T) (+r +t 0.08, +r -t 0.02, -r +t 0.09, -r -t 0.81).
Sum out R: this leaves P(T) (+t 0.17, -t 0.83) together with P(L | T).

Marginalizing Early (aka VE*)
Continuing: join on T, combining P(T) and P(L | T) into a factor over T, L:
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747
Sum out T:
+l 0.134
-l 0.866
* VE is variable elimination.

Evidence
If there is evidence, start with factors that select that evidence.
With no evidence, the initial factors are P(R) (+r 0.1, -r 0.9) and P(T | R) (+r +t 0.8, +r -t 0.2, -r +t 0.1, -r -t 0.9), along with P(L | T).
Computing P(L | +r), the initial factors become the selected rows:
+r 0.1
+r +t 0.8
+r -t 0.2
(with P(L | T) unchanged).
We then eliminate all variables other than the query and the evidence.
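Using the `select` helper from the factor sketch above, evidence handling is just a row filter on the initial CPT factors; for evidence R = +r this reproduces the reduced factors shown on the slide (illustrative code, reusing `P_R` and `P_T_given_R` from the join sketch).

```python
# Evidence R = +r: select that value in every initial factor that mentions R.
# P(L | T) would pass through unchanged, since it does not mention R.
evidence = {"R": "+r"}
factors = [P_R, P_T_given_R]
for var, val in evidence.items():
    factors = [select(f, var, val) if var in f[0] else f for f in factors]
for f in factors:
    print(f[1])
# {('+r',): 0.1}
# {('+r', '+t'): 0.8, ('+r', '-t'): 0.2}
```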

Evidence II
The result will be a selected joint of query and evidence. E.g., for P(L | +r), we'd end up with:
+r +l 0.026
+r -l 0.074
Normalize:
+l 0.26
-l 0.74
To get our answer, just normalize this! That's it!

General Variable Elimination
Query: P(Q | E_1 = e_1, ..., E_k = e_k)
Start with the initial factors: the local CPTs (but instantiated by evidence).
While there are still hidden variables (not Q or evidence):
Pick a hidden variable H.
Join all factors mentioning H.
Eliminate (sum out) H.
Join all remaining factors and normalize.
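Putting the pieces together, here is a compact sketch of the general VE loop from the slide, reusing the `select`, `join`, and `sum_out` helpers defined above: instantiate evidence in the local CPT factors, repeatedly join-and-eliminate each hidden variable, join what remains, and normalize. This is my own illustrative code under that factor encoding, not the project's reference implementation.

```python
def variable_elimination(factors, query_var, evidence, domains):
    """P(query_var | evidence) by interleaving joins and eliminations."""
    # 1. Instantiate the evidence in every factor that mentions an evidence variable.
    for var, val in evidence.items():
        factors = [select(f, var, val) if var in f[0] else f for f in factors]
    # 2. Eliminate each hidden variable (everything except query and evidence).
    all_vars = {v for f in factors for v in f[0]}
    for h in all_vars - {query_var} - set(evidence):
        mentioning = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f, domains)
        factors = rest + [sum_out(joined, h)]
    # 3. Join the remaining factors, drop the instantiated evidence dimensions,
    #    and normalize over the query variable.
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f, domains)
    for var in [v for v in result[0] if v != query_var]:
        result = sum_out(result, var)
    total = sum(result[1].values())
    return {key[0]: p / total for key, p in result[1].items()}
```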

Example
Query P(B | +j, +m) in the alarm network. Choose A: join all factors mentioning A, then sum A out.

Example (continued)
Choose E: join all factors mentioning E, then sum E out. Finish with B: join the remaining factors and normalize.

Example 2: P(B | +a)
Start / select evidence, join on B, then normalize.

P(B):
+b 0.1
-b 0.9

P(+a | B), selected from P(A | B) (+a|+b 0.8, -a|+b 0.2, +a|-b 0.1, -a|-b 0.9):
+b +a 0.8
-b +a 0.1

Join on B, giving the selected joint over A, B:
+a +b 0.08
+a -b 0.09

Normalize:
+a +b 8/17
+a -b 9/17

Variable Elimination: what you need to know
You should be able to run it on small examples and understand the factor creation / reduction flow.
It is better than enumeration: it saves time by marginalizing variables as soon as possible rather than at the end.
We will see special cases of VE later.
On tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs.
You'll have to implement a tree-structured special case to track invisible ghosts (Project 4).
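As a usage example of the `variable_elimination` sketch above (again, illustrative code with my own factor encoding), here is the P(B | +a) query from this slide; it reproduces the 8/17, 9/17 result.

```python
domains = {"B": ["+b", "-b"], "A": ["+a", "-a"]}
P_B = (("B",), {("+b",): 0.1, ("-b",): 0.9})
P_A_given_B = (("B", "A"), {("+b", "+a"): 0.8, ("+b", "-a"): 0.2,
                            ("-b", "+a"): 0.1, ("-b", "-a"): 0.9})

print(variable_elimination([P_B, P_A_given_B], "B", {"A": "+a"}, domains))
# {'+b': 0.4705..., '-b': 0.5294...}  i.e. 8/17 and 9/17
```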