Bayes Nets III: Inference

1 Bayes Nets III: Inference. Hal Daumé III, Computer Science, University of Maryland (me@hal3.name). CS 421: Introduction to Artificial Intelligence, 10 Apr 2012. Many slides courtesy of Dan Klein, Stuart Russell, or Andrew Moore.

2 Announcements. Midterms graded: grades posted, pick up after class, complain soon :) Grade distribution. Projects: P3 solution is posted; P4 (a combined P4/P5) is posted. If you do well on P4, it's worth 14%; otherwise, it's worth 8.75%.

3 Contest. Capture-the-flag style pacman, with a tight connection to P4. Completely optional, team based (<=3 students). Deadline: 8 May. Prizes: worth a few points on the final exam; see the web page for prize details.

4 Reachability (D-Separation). Question: are X and Y conditionally independent given evidence variables {Z}? Look for active paths from X to Y; no active paths = independence! A path is active if each triple along it is active: a causal chain A → B → C (in either direction) where B is unobserved, a common cause A ← B → C where B is unobserved, or a common effect (aka v-structure) A → B ← C where B or one of its descendants is observed.
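The three activation rules can be captured in a few lines. The sketch below is illustrative only: it assumes each triple arrives with its kind already classified and that a descendants map is available; none of these names come from the course code.

```python
# Minimal sketch of the triple-activation rules; names are hypothetical.
def triple_active(kind, middle, observed, descendants):
    """kind: 'chain' (A -> B -> C in either direction), 'common_cause' (A <- B -> C),
    or 'common_effect' (A -> B <- C, the v-structure); middle is B.
    observed: set of observed variables; descendants: variable -> set of descendants."""
    if kind in ('chain', 'common_cause'):
        return middle not in observed              # active iff B is unobserved
    if kind == 'common_effect':
        return (middle in observed or
                any(d in observed for d in descendants.get(middle, set())))
    raise ValueError(f"unknown triple kind: {kind}")

# A path is active iff every consecutive triple on it is active; X and Y are
# conditionally independent given {Z} iff no active path connects them.
```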

5 Causality? When Bayes nets reflect the true causal patterns: often simpler (nodes have fewer parents), often easier to think about, often easier to elicit from experts. But BNs need not actually be causal: sometimes no causal net exists over the domain (e.g. consider the variables Traffic and Drips), and you end up with arrows that reflect correlation, not causation. What do the arrows really mean? Topology may happen to encode causal structure, but it is only guaranteed to encode conditional independencies.

6 Inference by Enumeration. Given unlimited time, inference in BNs is easy. Recipe: state the marginal probabilities you need, figure out ALL the atomic probabilities you need, then calculate and combine them. Example: see the sketch below.
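As a concrete illustration of the recipe, here is a minimal sketch of inference by enumeration on a tiny hypothetical burglary/earthquake/alarm net; the CPT numbers are illustrative stand-ins, not values from this lecture.

```python
# Enumeration sketch for a hypothetical net B -> A <- E; numbers are illustrative.
P_B = {True: 0.001, False: 0.999}                  # P(B)
P_E = {True: 0.002, False: 0.998}                  # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(+a | B, E)
       (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    """Atomic probability P(b, e, a): the BN synthesizes each joint entry."""
    pa = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    return P_B[b] * P_E[e] * pa

def enumerate_B_given_a(a_obs=True):
    """P(B | A=a_obs): sum atomic events over the hidden variable E, then normalize."""
    unnorm = {b: sum(joint(b, e, a_obs) for e in (True, False)) for b in (True, False)}
    z = sum(unnorm.values())
    return {b: p / z for b, p in unnorm.items()}

print(enumerate_B_given_a(True))                   # posterior over B given the alarm
```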

7 Example. Where did we use the BN structure? We didn't!

8 Example. In this simple method, we only need the BN to synthesize the joint entries.

9 Normalization Trick. Normalize.

10 Inference by Enumeration?

11 Variable Elimination. Why is inference by enumeration so slow? You join up the whole joint distribution before you sum out the hidden variables, so you end up repeating a lot of work! Idea: interleave joining and marginalizing! This is called Variable Elimination. Still NP-hard, but usually much faster than inference by enumeration. We'll need some new notation to define VE.

12 Factor Zoo I. Joint distribution: P(X,Y). Entries P(x,y) for all x, y; sums to 1.
   T     W     P
   hot   sun   0.4
   hot   rain  0.1
   cold  sun   0.2
   cold  rain  0.3
Selected joint: P(x,Y). A slice of the joint distribution. Entries P(x,y) for fixed x, all y; sums to P(x).
   T     W     P
   cold  sun   0.2
   cold  rain  0.3
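A factor can be stored as a table keyed by assignments. The sketch below just encodes the two tables on this slide as Python dicts; the representation is a convenience for illustration, not the course projects' data structure.

```python
# Joint distribution P(T, W) from the table above; the values sum to 1.
P_TW = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
        ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

# Selected joint P(cold, W): a slice of the joint at T = cold; sums to P(cold) = 0.5.
P_coldW = {tw: p for tw, p in P_TW.items() if tw[0] == 'cold'}
```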

13 Factor Zoo II. Family of conditionals: P(X | Y). Multiple conditionals; entries P(x | y) for all x, y; sums to |Y|.
   T     W     P
   hot   sun   0.8
   hot   rain  0.2
   cold  sun   0.4
   cold  rain  0.6
Single conditional: P(Y | x). Entries P(y | x) for fixed x, all y; sums to 1.
   T     W     P
   cold  sun   0.4
   cold  rain  0.6

14 Factor Zoo III. Specified family: P(y | X). Entries P(y | x) for fixed y, all x; sums to who knows!
   T     W     P
   hot   rain  0.2
   cold  rain  0.6
In general, when we write P(Y_1 … Y_N | X_1 … X_M), it is a factor, a multi-dimensional array. Its values are all P(y_1 … y_N | x_1 … x_M). Any unassigned X or Y is a dimension missing (selected) from the array.

15 Basic Objects. Track objects called factors. Initial factors are local CPTs, one per node in the BN. Any known values are specified: e.g. if we know J = j and E = e, the initial factors are the CPTs with those values plugged in. VE: alternately join and marginalize factors.

16 Basic Operation: Join. First basic operation: join factors. Combining two factors is just like a database join: build a factor over the union of the variables involved. Computation for each entry: pointwise products. Example: see the sketch below.
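A sketch of the pointwise-product join, under the assumption that a factor is a pair (variables, table) with the table keyed by value tuples; this representation and the helper names are illustrative, not from the course codebase.

```python
from itertools import product

def join(f1, f2):
    """Join two factors into one over the union of their variables,
    multiplying matching entries pointwise (like a database join)."""
    (vars1, t1), (vars2, t2) = f1, f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    # Collect each variable's domain from the entries we have seen.
    domains = {}
    for vs, table in (f1, f2):
        for assignment in table:
            for var, val in zip(vs, assignment):
                domains.setdefault(var, set()).add(val)
    out = {}
    for assignment in product(*(sorted(domains[v]) for v in out_vars)):
        row = dict(zip(out_vars, assignment))
        key1 = tuple(row[v] for v in vars1)
        key2 = tuple(row[v] for v in vars2)
        if key1 in t1 and key2 in t2:
            out[assignment] = t1[key1] * t2[key2]  # pointwise product
    return (out_vars, out)

# Usage: joining P(R) with P(T | R) gives a factor over (R, T) with entries P(r) * P(t | r).
f_R  = (('R',), {(True,): 0.1, (False,): 0.9})
f_TR = (('T', 'R'), {(True, True): 0.8, (False, True): 0.2,
                     (True, False): 0.1, (False, False): 0.9})
f_RT = join(f_R, f_TR)
```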

17 Basic Operation: Join. In general, we join on a variable: take all factors mentioning that variable and join them all together. Example: join on A, i.e. pick up every factor that mentions A and join them to form a single factor.

18 Basic Operation: Eliminate. Second basic operation: marginalization. Take a factor and sum out a variable; this shrinks the factor to a smaller one. It is a projection operation. Example and definition: see the sketch below.
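In the same illustrative factor representation as the join sketch, summing a variable out is a short projection:

```python
def eliminate(factor, var):
    """Sum the given variable out of a factor, shrinking it by one dimension."""
    vars_, table = factor
    keep = tuple(v for v in vars_ if v != var)
    out = {}
    for assignment, value in table.items():
        row = dict(zip(vars_, assignment))
        key = tuple(row[v] for v in keep)
        out[key] = out.get(key, 0.0) + value       # sum over the eliminated variable
    return (keep, out)

# Usage: eliminate(f_RT, 'R') leaves a factor over T alone, i.e. P(T).
```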

19 General Variable Elimination. Query: P(Q | e_1, …, e_k). Start with initial factors: the local CPTs (but instantiated by evidence). While there are still hidden variables (not Q or evidence): pick a hidden variable H, join all factors mentioning H, and project out H. Finally, join all remaining factors and normalize.
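Putting the pieces together, here is a minimal sketch of the loop above, assuming the join and eliminate helpers from the earlier sketches, factors already instantiated with the evidence, and an elimination ordering taken as given.

```python
def variable_elimination(factors, hidden_vars):
    """Interleave joining and marginalizing: for each hidden variable H, join all
    factors that mention H and project H out; finally join the rest and normalize."""
    factors = list(factors)
    for H in hidden_vars:
        mentioning = [f for f in factors if H in f[0]]
        if not mentioning:
            continue
        rest = [f for f in factors if H not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)               # join all factors mentioning H
        factors = rest + [eliminate(joined, H)]    # project out H
    result = factors[0]
    for f in factors[1:]:                          # join all remaining factors
        result = join(result, f)
    vars_, table = result
    z = sum(table.values())
    return (vars_, {k: v / z for k, v in table.items()})   # normalize
```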

20 Example. Choose A.

21 Example. Choose E. Finish with B. Normalize.

22 Variable Elimination. What you need to know: you should be able to run it on small examples and understand the factor creation / reduction flow. Better than enumeration: VE caches intermediate computations, saving time by marginalizing variables as soon as possible rather than at the end. Polynomial time for tree-structured graphs (sound familiar?). We will see special cases of VE later; you'll have to implement the special cases. Approximations: exact inference is slow, especially with a lot of hidden nodes; approximate methods give you a (close, wrong?) answer, faster.

23 Sampling. Basic idea: draw N samples from a sampling distribution S, compute an approximate posterior probability, and show this converges to the true probability P. Outline: sampling from an empty network; rejection sampling (reject samples disagreeing with evidence); likelihood weighting (use evidence to weight samples).

24 Prior Sampling. [Network: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass.]
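A sketch of prior sampling on this network; the CPT numbers below are illustrative placeholders rather than values read off the lecture slides.

```python
import random

P_C = 0.5                                          # P(+c)
P_S = {True: 0.1, False: 0.5}                      # P(+s | C)
P_R = {True: 0.8, False: 0.2}                      # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,    # P(+w | S, R)
       (False, True): 0.90, (False, False): 0.01}

def prior_sample():
    """Sample each variable in topological order from P(X | parents(X))."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w
```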

25 Prior Sampling. This process generates samples with probability S_PS(x_1, …, x_n) = ∏_i P(x_i | Parents(X_i)), i.e. the BN's joint probability. Let the number of samples of an event be N_PS(x_1, …, x_n). Then lim_{N→∞} N_PS(x_1, …, x_n) / N = S_PS(x_1, …, x_n) = P(x_1, …, x_n); i.e., the sampling procedure is consistent.

26 Example. We'll get a bunch of samples from the BN: five assignments to (C, S, R, W). If we want to know P(W), we have counts <w: 4, ¬w: 1>; normalize to get P(W) = <w: 0.8, ¬w: 0.2>. This will get closer to the true distribution with more samples. We can estimate anything else, too: what about P(C | r)? P(C | r, w)?

27 Rejection Sampling. Let's say we want P(C): no point keeping all samples around, just tally counts of C outcomes. Let's say we want P(C | s): same thing, tally C outcomes, but ignore (reject) samples which don't have S = s. This is rejection sampling. It is also consistent (correct in the limit).
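A sketch of rejection sampling for P(C | s), reusing the prior_sample helper from the sketch above; the query and sample count are just for illustration.

```python
def rejection_sampling_C_given_s(n=10000):
    """Estimate P(C | S=s): tally C outcomes, rejecting samples where S is false."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        c, s, r, w = prior_sample()
        if not s:
            continue                               # reject: disagrees with evidence S=s
        counts[c] += 1
    total = sum(counts.values())
    return {c: k / total for c, k in counts.items()} if total else None
```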

28 Likelihood Weighting. Problem with rejection sampling: if evidence is unlikely, you reject a lot of samples, and you don't exploit your evidence as you sample. Consider P(B | a) in a Burglary → Alarm network. Idea: fix the evidence variables and sample the rest. Problem: the sample distribution is not consistent! Solution: weight each sample by the probability of the evidence given its parents.
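A sketch of likelihood weighting for P(C | s, w) on the same network, reusing the illustrative CPTs from the prior-sampling sketch: evidence variables are clamped and each sample is weighted by the probability of that evidence given its sampled parents.

```python
def likelihood_weighting_C_given_s_w(n=10000):
    """Estimate P(C | S=s, W=w) with weighted samples."""
    weights = {True: 0.0, False: 0.0}
    for _ in range(n):
        weight = 1.0
        c = random.random() < P_C                  # C is sampled normally
        weight *= P_S[c]                           # S is evidence: multiply in P(s | c)
        r = random.random() < P_R[c]               # R is sampled normally
        weight *= P_W[(True, r)]                   # W is evidence: multiply in P(w | s, r)
        weights[c] += weight
    z = sum(weights.values())
    return {c: v / z for c, v in weights.items()}
```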

29 Likelihood Sampling. [Worked example on the Cloudy, Sprinkler, Rain, WetGrass network; sample weight w = 1.0 * 0.1 * 0.9.]

30 Likelihood Weighting. Sampling distribution if z is sampled and e is fixed evidence: S_WS(z, e) = ∏_i P(z_i | Parents(Z_i)). Now, samples have weights: w(z, e) = ∏_i P(e_i | Parents(E_i)). Together, the weighted sampling distribution is consistent: S_WS(z, e) · w(z, e) = P(z, e).

31 Likelihood Weighting. Note that likelihood weighting doesn't solve all our problems: rare evidence is taken into account for downstream variables, but not upstream ones. A better solution is Markov chain Monte Carlo (MCMC), which is more advanced. We'll return to sampling for robot localization and tracking in dynamic BNs.