Introduction to Artificial Intelligence, V22.0472-001. Lecture 15: Bayes Nets 3 (Inference).


Introduction to Artificial Intelligence, V22.0472-001, Fall 2009
Lecture 15: Bayes Nets 3
Rob Fergus, Dept of Computer Science, Courant Institute, NYU
Slides from John DeNero, Dan Klein, Stuart Russell and Andrew Moore

Announcements. Midterms graded. Assignment 2 graded.

Mid-term. [Figure: mid-term results.]

Inference. Bayes' rule: P(x|y) = P(y|x) P(x) / P(y), where P(x|y) is the posterior, P(y|x) is the likelihood, P(x) is the prior, and P(y) is the evidence. Inference means calculating some statistic from a joint probability distribution. Examples: a posterior probability such as P(Q | E1=e1, ..., Ek=ek), or the most likely explanation, argmax_q P(Q=q | e1, ..., ek). [Figure: example networks with nodes D, L, R, T, B.]

Reminder: Alarm Network. [Figure: the burglary/earthquake alarm network with its CPTs.]

Inference by Enumeration. Given unlimited time, inference in BNs is easy. Recipe: state the marginal probabilities you need, figure out ALL the atomic probabilities you need, then calculate and combine them.
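
To make the enumeration recipe concrete, here is a minimal Python sketch on a tiny hypothetical two-node network, Rain -> Traffic. The network and its CPT numbers are invented for illustration and are not the example used in the lecture.

```python
# Hypothetical two-node network: Rain -> Traffic, with made-up CPTs.
P_R = {True: 0.1, False: 0.9}                   # P(R)
P_T_given_R = {True: {True: 0.8, False: 0.2},   # P(T | R=true)
               False: {True: 0.1, False: 0.9}}  # P(T | R=false)

def joint(r, t):
    """Atomic probability P(R=r, T=t) via the chain rule on the BN."""
    return P_R[r] * P_T_given_R[r][t]

def posterior_rain(t_obs):
    """P(R | T=t_obs): enumerate every atomic event consistent with the
    evidence, then normalize."""
    unnormalized = {r: joint(r, t_obs) for r in (True, False)}
    z = sum(unnormalized.values())
    return {r: p / z for r, p in unnormalized.items()}

print(posterior_rain(True))   # roughly {True: 0.47, False: 0.53}
```

Note that the BN only enters through the joint() call, a point picked up immediately below.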

In this simple method, we only need the BN to synthesize the joint entries. Where did we use the BN structure? We didn't!

Normalization Trick. [Worked example: select the joint entries consistent with the evidence, then normalize them so they sum to 1.]

Inference by Enumeration? [Figure.]

Variable Elimination. Why is inference by enumeration so slow? You join up the whole joint distribution before you sum out the hidden variables, so you end up repeating a lot of work! Idea: interleave joining and marginalizing. This is called Variable Elimination. It is still NP-hard in general, but usually much faster than inference by enumeration. We'll need some new notation to define VE.

Factor Zoo I.
Joint distribution: P(X,Y). Entries P(x,y) for all x, y; sums to 1. Example, P(T,W):
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3
Selected joint: P(x,Y). A slice of the joint distribution. Entries P(x,y) for fixed x, all y; sums to P(x). Example, P(cold,W):
  T     W     P
  cold  sun   0.2
  cold  rain  0.3
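
The factor zoo is easy to mirror in code. The sketch below uses the P(T,W) numbers from the table above, takes the T = cold slice (the selected joint), and applies the normalization trick to turn it into a conditional. The dictionary representation is just one convenient choice made for this example.

```python
# Joint distribution P(T, W) from the table above (sums to 1).
P_TW = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
        ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

# Selected joint P(cold, W): a slice of the table, sums to P(cold) = 0.5.
selected = {tw: p for tw, p in P_TW.items() if tw[0] == 'cold'}

# Normalization trick: divide by the sum to turn the slice into P(W | cold).
z = sum(selected.values())
P_W_given_cold = {w: p / z for (t, w), p in selected.items()}
print(P_W_given_cold)   # {'sun': 0.4, 'rain': 0.6}
```

The result is exactly the single conditional P(W|cold) that appears in Factor Zoo II below.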

Factor Zoo II.
Family of conditionals: P(X|Y). Multiple conditionals. Entries P(x|y) for all x, y; sums to |Y|. Example, P(W|T):
  T     W     P
  hot   sun   0.8
  hot   rain  0.2
  cold  sun   0.4
  cold  rain  0.6
Single conditional: P(Y|x). Entries P(y|x) for fixed x, all y; sums to 1. Example, P(W|cold):
  T     W     P
  cold  sun   0.4
  cold  rain  0.6

Factor Zoo III.
Specified family: P(y|X). Entries P(y|x) for fixed y, all x; sums to... who knows! Example, P(rain|T):
  T     W     P
  hot   rain  0.2
  cold  rain  0.6
In general, when we write P(Y1 ... YN | X1 ... XM), it is a factor: a multi-dimensional array whose values are all P(y1 ... yN | x1 ... xM). Any unassigned X or Y is a dimension of the array; any assigned (selected) value is a dimension missing from the array.

Basic Objects. Track objects called factors. The initial factors are the local CPTs, one per node in the BN. Any known values are specified: e.g., if we know J = j and E = e, the initial factors for the alarm network are P(B), P(e), P(A|B,e), P(j|A), and P(M|A).

Basic Operation: Join. The first basic operation is joining factors. Combining two factors is just like a database join: build a factor over the union of the variables involved, computing each entry as a pointwise product. VE alternately joins and marginalizes factors.

Basic Operation: Join (on a variable). In general, we join on a variable: take all factors mentioning that variable and join them all together. Example: to join on A, pick up every factor that mentions A and join them to form a single factor over A and its neighbors.

Basic Operation: Eliminate. The second basic operation is marginalization: take a factor and sum out a variable, shrinking the factor to a smaller one. It is a projection operation.
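
Here is a minimal sketch of the two basic operations, join and eliminate, with each factor represented as a dictionary mapping assignments (frozensets of variable/value pairs) to numbers. The representation and the Rain/Traffic numbers are assumptions carried over from the earlier illustration, not the course's data structures.

```python
def consistent(a1, a2):
    """Two partial assignments agree on every variable they share."""
    d1, d2 = dict(a1), dict(a2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())

def join(f1, f2):
    """Pointwise product of two factors, like a database join:
    the result ranges over the union of the variables involved."""
    out = {}
    for a1, p1 in f1.items():
        for a2, p2 in f2.items():
            if consistent(a1, a2):
                out[a1 | a2] = p1 * p2
    return out

def eliminate(factor, var):
    """Marginalization: sum out var, shrinking the factor (a projection)."""
    out = {}
    for assignment, p in factor.items():
        smaller = frozenset((v, x) for v, x in assignment if v != var)
        out[smaller] = out.get(smaller, 0.0) + p
    return out

# Example with the made-up Rain/Traffic numbers from earlier:
f_R = {frozenset({('R', True)}): 0.1,
       frozenset({('R', False)}): 0.9}
f_T_given_R = {frozenset({('R', True),  ('T', True)}):  0.8,
               frozenset({('R', True),  ('T', False)}): 0.2,
               frozenset({('R', False), ('T', True)}):  0.1,
               frozenset({('R', False), ('T', False)}): 0.9}
f_RT = join(f_R, f_T_given_R)   # joint factor over {R, T}
f_T = eliminate(f_RT, 'R')      # P(T): {T=True: 0.17, T=False: 0.83}
```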

General Variable Elimination. Query: P(Q | E1=e1, ..., Ek=ek). Start with the initial factors: the local CPTs, instantiated by the evidence. While there are still hidden variables (variables that are neither Q nor evidence): pick a hidden variable H, join all factors mentioning H, and project H out. Finally, join all remaining factors and normalize. [Worked example on the alarm network: choose A and eliminate it, then choose E and eliminate it, then finish with B and normalize.] A small code sketch of this loop appears at the end of this section.

Variable Elimination: what you need to know. You should be able to run it on small examples and understand the factor creation / reduction flow. It is better than enumeration: VE caches intermediate computations, saving time by marginalizing variables as soon as possible rather than at the end. It takes polynomial time for tree-structured graphs (sound familiar?). We will see special cases of VE later.

Approximations. Exact inference is slow, especially with a lot of hidden nodes. Approximate methods give you a (close, but possibly wrong) answer, faster.

Sampling. Basic idea: draw N samples from a sampling distribution S, compute an approximate posterior probability, and show that this converges to the true probability P. Outline: sampling from an empty network (prior sampling); rejection sampling, which rejects samples disagreeing with the evidence; and likelihood weighting, which uses the evidence to weight samples.
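
Before turning to sampling, here is the general variable elimination loop written against the join and eliminate helpers sketched above, as promised in the recipe. It is a simplified sketch: the elimination ordering is taken as given, and the evidence is assumed to have already been instantiated into the initial factors.

```python
def variables_of(factor):
    """The set of variables a factor mentions (read off any assignment)."""
    return {var for var, _ in next(iter(factor))}

def variable_elimination(factors, hidden_vars):
    """VE loop: for each hidden variable, join every factor that mentions it
    and project it out; finally join what is left and normalize."""
    factors = list(factors)
    for h in hidden_vars:                 # elimination order not optimized here
        mentioning = [f for f in factors if h in variables_of(f)]
        factors = [f for f in factors if h not in variables_of(f)]
        if not mentioning:
            continue
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)
        factors.append(eliminate(joined, h))
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result.values())
    return {a: p / z for a, p in result.items()}

# e.g. eliminating R from the Rain/Traffic factors recovers P(T):
# variable_elimination([f_R, f_T_given_R], ['R'])
#   -> {frozenset({('T', True)}): 0.17, frozenset({('T', False)}): 0.83}
```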

Prior Sampling. [Figure: the Cloudy / Sprinkler / Rain / WetGrass network with its CPTs.] This process generates samples with probability S_PS(x1, ..., xn) = ∏_i P(xi | Parents(Xi)), i.e. the BN's joint probability. We'll get a bunch of samples from the BN, each a full assignment to (C, S, R, W). If we want to know P(W), we just count: say we have counts ⟨w: 4, ¬w: 1⟩; normalizing gives P(W) ≈ ⟨w: 0.8, ¬w: 0.2⟩. This will get closer to the true distribution with more samples, and we can estimate anything else the same way, e.g. P(C|r) or P(C|r, w); i.e., the sampling procedure is consistent.

Rejection Sampling. Let's say we want P(C). There is no point keeping all the samples around: just tally the counts of the C outcomes. Now let's say we want P(C|s). Same thing: tally the C outcomes, but ignore (reject) samples which don't have S = s. This is rejection sampling. It is also consistent (correct in the limit).

Likelihood Weighting. The problem with rejection sampling: if the evidence is unlikely, you reject a lot of samples, and you don't exploit your evidence as you sample. Consider estimating P(B|a) in the burglary/alarm network: the alarm rarely goes off, so almost every sample is rejected. Idea: fix the evidence variables and sample the rest. Problem: the resulting sample distribution is not consistent! Solution: weight each sample by the probability of the evidence given its parents.

Likelihood Weighting (continued). The sampling distribution, if z is sampled and e is fixed evidence, is S_WS(z, e) = ∏_i P(zi | Parents(Zi)). Now the samples have weights w(z, e) = ∏_j P(ej | Parents(Ej)). Together, the weighted sampling distribution is consistent: S_WS(z, e) · w(z, e) = P(z, e).
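
To make the sampling algorithms concrete, here is a small sketch of prior sampling and likelihood weighting for a Bayes net listed in topological order. The (variable, parents, cpt) representation and the Rain/Traffic numbers are assumptions made for this example, not the course's code.

```python
import random

# A net is a list of (variable, parents, cpt) triples in topological order,
# where cpt(parent_values) returns P(variable = True | parent values).
def prior_sample(net):
    """Sample every variable from its CPT, parents before children."""
    sample = {}
    for var, parents, cpt in net:
        p_true = cpt(tuple(sample[p] for p in parents))
        sample[var] = random.random() < p_true
    return sample

def likelihood_weighting(net, evidence, query, n=10000):
    """Fix the evidence variables, sample the rest, and weight each sample
    by the probability of the evidence values given their sampled parents."""
    counts = {True: 0.0, False: 0.0}
    for _ in range(n):
        sample, weight = {}, 1.0
        for var, parents, cpt in net:
            p_true = cpt(tuple(sample[p] for p in parents))
            if var in evidence:
                sample[var] = evidence[var]
                weight *= p_true if evidence[var] else 1.0 - p_true
            else:
                sample[var] = random.random() < p_true
        counts[sample[query]] += weight
    z = counts[True] + counts[False]
    return {value: w / z for value, w in counts.items()}

# Hypothetical Rain -> Traffic net (same made-up numbers as before):
net = [('R', (), lambda _: 0.1),
       ('T', ('R',), lambda pv: 0.8 if pv[0] else 0.1)]
print(likelihood_weighting(net, {'T': True}, 'R'))  # roughly {True: 0.47, False: 0.53}
```

Rejection sampling would instead draw prior_sample(net) repeatedly and discard any sample whose T value disagrees with the evidence; likelihood weighting avoids that waste by never sampling the evidence variables.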

Likelihood Weighting (wrap-up). Note that likelihood weighting doesn't solve all our problems: rare evidence is taken into account for downstream variables, but not for upstream ones. A better solution is Markov chain Monte Carlo (MCMC), which is more advanced. We'll return to sampling for robot localization and tracking in dynamic BNs.

History of Sampling Methods. Monte Carlo methods were developed at Los Alamos during the Manhattan Project and were used to design the H-bomb. Markov chain Monte Carlo (MCMC): Metropolis 1953, Hastings 1970. It totally revolutionized statistics from the 1990s on. It requires fast computers... and what else?