CS 188: Artificial Intelligence. Bayes Nets: Sampling (lecture slide transcription)


CS 188: Artificial Intelligence. Bayes Nets: Sampling

Bayes Net Representation
- A directed, acyclic graph, one node per random variable
- A conditional probability table (CPT) for each node: a collection of distributions over X, one for each combination of the parents' values
- Bayes nets implicitly encode joint distributions as a product of local conditional distributions
- To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: P(x_1, ..., x_n) = prod_i P(x_i | parents(X_i))

Instructors: Dan Klein and Pieter Abbeel, University of California, Berkeley. [These slides were created by Dan Klein and Pieter Abbeel for CS 188 Intro to AI at UC Berkeley. All CS 188 materials are available at http://ai.berkeley.edu.]

Variable Elimination
- Interleave joining and marginalizing
- d^k entries are computed for a factor over k variables with domain sizes d
- The ordering of elimination of the hidden variables can affect the size of the factors generated
- Worst case: running time exponential in the size of the Bayes net

Approximate Inference: Sampling
- Sampling is a lot like repeated simulation: predicting the weather, basketball games, ...
- Basic idea: draw N samples from a sampling distribution S, compute an approximate posterior probability, and show this converges to the true probability P
- Why sample?
  - Learning: get samples from a distribution you don't know
  - Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)

Sampling from a Given Distribution
- Step 1: Get a sample u from the uniform distribution over [0, 1), e.g. random() in Python
- Step 2: Convert this sample u into an outcome for the given distribution by associating each target outcome with a sub-interval of [0, 1), with sub-interval size equal to the probability of the outcome
- Example: P(C) = {red: 0.6, green: 0.1, blue: 0.3}. If random() returns u = 0.83, then our sample is C = blue, since 0.83 falls in blue's sub-interval [0.7, 1.0).
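The two-step interval trick above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the function name `sample_from` and the dictionary layout are my own.

```python
import random

def sample_from(dist, u=None):
    """Map a uniform draw u in [0, 1) to an outcome: each outcome owns a
    sub-interval of [0, 1) whose width equals its probability."""
    if u is None:
        u = random.random()          # Step 1: uniform sample from [0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():  # Step 2: find the sub-interval u lands in
        cumulative += p
        if u < cumulative:
            return outcome
    return outcome                   # guard against floating-point round-off

P_C = {"red": 0.6, "green": 0.1, "blue": 0.3}
print(sample_from(P_C, u=0.83))      # red owns [0, 0.6), green [0.6, 0.7),
                                     # blue [0.7, 1.0), so this prints "blue"
```

Walking the cumulative sum is O(number of outcomes); for repeated sampling from a large fixed distribution, one would precompute the cumulative boundaries and binary-search instead.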

Sampling in Bayes Nets
- Prior Sampling
- Rejection Sampling
- Likelihood Weighting
- Gibbs Sampling

Prior Sampling
The running example is the network Cloudy -> Sprinkler, Cloudy -> Rain, and Sprinkler, Rain -> WetGrass, with CPTs:
- P(C): +c 0.5, -c 0.5
- P(S | C): +c: +s 0.1, -s 0.9; -c: +s 0.5, -s 0.5
- P(R | C): +c: +r 0.8, -r 0.2; -c: +r 0.2, -r 0.8
- P(W | S, R): +s,+r: +w 0.99, -w 0.01; +s,-r: +w 0.90, -w 0.10; -s,+r: +w 0.90, -w 0.10; -s,-r: +w 0.01, -w 0.99

Algorithm: for i = 1, 2, ..., n, sample x_i from P(X_i | Parents(X_i)); return (x_1, x_2, ..., x_n). Example sample: -c, +s, -r, +w.

Prior Sampling (analysis)
- This process generates samples with probability S_PS(x_1, ..., x_n) = prod_i P(x_i | Parents(X_i)) = P(x_1, ..., x_n), i.e. the BN's joint probability
- Let the number of samples of an event be N_PS(x_1, ..., x_n). Then lim_{N -> infinity} N_PS(x_1, ..., x_n) / N = P(x_1, ..., x_n), i.e., the sampling procedure is consistent

Example
- We'll get a bunch of samples from the BN, e.g. -c, +s, +r, -w and -c, -s, -r, +w
- If we want to know P(W): we have counts <+w: 4, -w: 1>; normalize to get P(W) = <+w: 0.8, -w: 0.2>
- This will get closer to the true distribution with more samples
- We can estimate anything else, too. What about P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
- Fast: can use fewer samples if there is less time (what's the drawback?)
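Prior sampling on this network can be sketched as follows. This is an illustrative hand-rolled version with the CPTs from the slide inlined; variables are drawn in topological order, each from the CPT row selected by its already-sampled parents.

```python
import random

def prior_sample(rng=random):
    """One prior sample from the Cloudy/Sprinkler/Rain/WetGrass network."""
    c = rng.random() < 0.5                               # P(+c) = 0.5
    s = rng.random() < (0.1 if c else 0.5)               # P(+s | c)
    r = rng.random() < (0.8 if c else 0.2)               # P(+r | c)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    w = rng.random() < p_w                               # P(+w | s, r)
    return c, s, r, w

# Estimate P(+w) by normalizing counts, as on the slide.
N = 50_000
estimate = sum(prior_sample()[3] for _ in range(N)) / N
print(estimate)   # converges to the true P(+w) = 0.65 as N grows
```

Summing out C, S, and R by hand gives P(+w) = 0.65 for these CPTs, so the printed estimate should hover near that value.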

Rejection Sampling
- Let's say we want P(C): there is no point keeping all samples around; just tally counts of C as we go
- Let's say we want P(C | +s): same thing, tally C outcomes, but ignore (reject) samples which don't have S = +s
- This is called rejection sampling; it is also consistent for conditional probabilities (i.e., correct in the limit)

Rejection Sampling (algorithm)
- Input: evidence instantiation
- For i = 1, 2, ..., n: sample x_i from P(X_i | Parents(X_i)); if x_i is not consistent with the evidence, reject: no sample is generated in this cycle
- Return (x_1, x_2, ..., x_n)

Likelihood Weighting
- Problem with rejection sampling: if the evidence is unlikely, it rejects lots of samples, and the evidence is not exploited as you sample. Consider estimating P(Shape | Color = blue) from samples like (pyramid, green), (pyramid, red), (sphere, blue), (cube, red), (sphere, green): most are rejected.
- Idea: fix the evidence variables and sample the rest. Problem: the sample distribution is then not consistent!
- Solution: weight each sample by the probability of the evidence given its parents. Now every sample matches the evidence: (pyramid, blue), (pyramid, blue), (sphere, blue), (cube, blue), (sphere, blue).

(The slide repeats the sprinkler network figure and its CPTs from the prior-sampling slide.)
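Rejection sampling for a query like P(+c | +s) on the sprinkler network can be sketched as below (illustrative code, with the prior sampler inlined for self-containment): draw prior samples, discard any whose S disagrees with the evidence, and tally C over the survivors.

```python
import random

def prior_sample(rng):
    """One prior sample (C, S, R, W) from the sprinkler network."""
    c = rng.random() < 0.5
    s = rng.random() < (0.1 if c else 0.5)
    r = rng.random() < (0.8 if c else 0.2)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return c, s, r, rng.random() < p_w

def estimate_c_given_s(n, rng=random):
    """Estimate P(+c | +s) by rejecting samples inconsistent with S = +s."""
    kept = pos_c = 0
    for _ in range(n):
        c, s, r, w = prior_sample(rng)
        if not s:               # inconsistent with evidence S = +s: reject
            continue
        kept += 1
        pos_c += c
    return pos_c / kept if kept else float("nan")

# By Bayes' rule, P(+c | +s) = 0.5*0.1 / (0.5*0.1 + 0.5*0.5) = 1/6.
print(estimate_c_given_s(100_000))
```

Note the inefficiency the slide warns about: only P(+s) = 0.30 of the draws survive here, and the survival rate shrinks further as the evidence becomes less likely.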

Likelihood Weighting (algorithm)
- Input: evidence instantiation
- w = 1.0
- For i = 1, 2, ..., n:
  - If X_i is an evidence variable: set x_i to the observation for X_i, and set w = w * P(x_i | Parents(X_i))
  - Else: sample x_i from P(X_i | Parents(X_i))
- Return (x_1, x_2, ..., x_n), w

Likelihood Weighting (analysis)
- Sampling distribution if z is sampled and e is fixed evidence: S_WS(z, e) = prod_i P(z_i | Parents(Z_i))
- Now, samples have weights: w(z, e) = prod_i P(e_i | Parents(E_i))
- Together, the weighted sampling distribution is consistent: S_WS(z, e) * w(z, e) = P(z, e)

Likelihood Weighting (discussion)
- Likelihood weighting is good: we have taken the evidence into account as we generate the sample. E.g. here, W's value will get picked based on the evidence values of S and R, so more of our samples will reflect the state of the world suggested by the evidence.
- Likelihood weighting doesn't solve all our problems: evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence).
- We would like to consider evidence when we sample every variable; this leads to Gibbs sampling.

Gibbs Sampling
- Procedure: keep track of a full instantiation x_1, x_2, ..., x_n. Start with an arbitrary instantiation consistent with the evidence. Sample one variable at a time, conditioned on all the rest, but keep the evidence fixed. Keep repeating this for a long time.
- Property: in the limit of repeating this infinitely many times, the resulting samples come from the correct distribution (i.e. conditioned on the evidence).
- Rationale: both upstream and downstream variables condition on evidence. In contrast, likelihood weighting only conditions on upstream evidence, and hence the weights obtained in likelihood weighting can sometimes be very small. The sum of weights over all samples indicates how many effective samples were obtained, so we want high weight.

Gibbs Sampling Example: P(S | +r)
- Step 1: Fix the evidence, R = +r
- Step 2: Initialize the other variables randomly
- Step 3: Repeat: choose a non-evidence variable X, and resample X from P(X | all other variables)
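The likelihood weighting algorithm above can be sketched on the sprinkler network, here specialized (for brevity) to the query P(+c | +s, +w): S and W are clamped as evidence and multiply the weight, while C and R are sampled from their CPTs. The function names are illustrative.

```python
import random

P_W_GIVEN_SR = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}

def weighted_sample(s_ev, w_ev, rng=random):
    """One weighted sample with evidence S = s_ev, W = w_ev."""
    weight = 1.0
    c = rng.random() < 0.5                     # non-evidence: sample
    p_s = 0.1 if c else 0.5
    weight *= p_s if s_ev else 1.0 - p_s       # evidence S: weight, don't sample
    r = rng.random() < (0.8 if c else 0.2)     # non-evidence: sample
    p_w = P_W_GIVEN_SR[(s_ev, r)]
    weight *= p_w if w_ev else 1.0 - p_w       # evidence W: weight, don't sample
    return c, weight

def lw_estimate_c(n, s_ev=True, w_ev=True, rng=random):
    """Weighted-count estimate of P(+c | evidence)."""
    num = den = 0.0
    for _ in range(n):
        c, weight = weighted_sample(s_ev, w_ev, rng)
        den += weight
        num += weight * c
    return num / den

print(lw_estimate_c(100_000))   # converges to P(+c | +s, +w), approx. 0.175
```

Unlike rejection sampling, every draw contributes; the cost is that a few large weights can dominate when the evidence is unlikely, which is exactly the small-effective-sample-size problem the Gibbs slides address.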

Efficient Resampling of One Variable
- Sample S from P(S | +c, +r, -w):
  P(S | +c, +r, -w) = P(S, +c, +r, -w) / P(+c, +r, -w)
                    = P(+c) P(S | +c) P(+r | +c) P(-w | S, +r) / sum_s P(+c) P(s | +c) P(+r | +c) P(-w | s, +r)
- Many things cancel out: only the CPTs that mention S remain!
- More generally: only the CPTs that contain the resampled variable need to be considered, and joined together

Bayes Net Sampling Summary
- Prior Sampling: P(Q)
- Rejection Sampling: P(Q | e)
- Likelihood Weighting: P(Q | e)
- Gibbs Sampling: P(Q | e)

Further Reading on Gibbs Sampling*
- Gibbs sampling produces samples from the query distribution P(Q | e) in the limit of re-sampling infinitely often
- Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods
- Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis-Hastings)
- You may read about Monte Carlo methods; they're just sampling
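The Gibbs example P(S | +r) and the efficient-resampling observation can be sketched together. In this illustrative code, each full conditional is built from only the CPTs that mention the resampled variable, exactly as the cancellation argument says; the helper names are my own.

```python
import random

# CPTs of the sprinkler network, keyed by parent values.
P_S_GIVEN_C = {True: 0.1, False: 0.5}
P_R_GIVEN_C = {True: 0.8, False: 0.2}
P_W_GIVEN_SR = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}

def bern(p, rng):
    return rng.random() < p

def resample_c(s, r, rng):
    # P(C | s, r, w) is proportional to P(C) P(s | C) P(r | C); W's CPT has no C.
    def score(c):
        p_s = P_S_GIVEN_C[c] if s else 1 - P_S_GIVEN_C[c]
        p_r = P_R_GIVEN_C[c] if r else 1 - P_R_GIVEN_C[c]
        return 0.5 * p_s * p_r
    t, f = score(True), score(False)
    return bern(t / (t + f), rng)

def resample_s(c, r, w, rng):
    # P(S | c, r, w) is proportional to P(S | c) P(w | S, r): only CPTs with S.
    def score(s):
        p_s = P_S_GIVEN_C[c] if s else 1 - P_S_GIVEN_C[c]
        p_w = P_W_GIVEN_SR[(s, r)] if w else 1 - P_W_GIVEN_SR[(s, r)]
        return p_s * p_w
    t, f = score(True), score(False)
    return bern(t / (t + f), rng)

def gibbs_s_given_r(n_steps, burn_in=1_000, rng=random):
    r = True                                          # Step 1: fix evidence R = +r
    c, s, w = (bern(0.5, rng) for _ in range(3))      # Step 2: random init
    count = kept = 0
    for step in range(n_steps):                       # Step 3: repeat
        c = resample_c(s, r, rng)
        s = resample_s(c, r, w, rng)
        w = bern(P_W_GIVEN_SR[(s, r)], rng)           # W's Markov blanket is S, R
        if step >= burn_in:
            kept += 1
            count += s
    return count / kept

print(gibbs_s_given_r(200_000))   # converges to P(+s | +r) = 0.18
```

Summing out C gives P(+s, +r) = 0.09 and P(+r) = 0.50, so the chain's estimate should settle near 0.18. Discarding a burn-in prefix, as done here, is standard practice since early samples still reflect the arbitrary initialization.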