Stochastic Simulation

© D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6.

Idea: probabilities ↔ samples. Get probabilities from samples:

  X      count            X      probability
  x_1    n_1              x_1    n_1 / m
  ...                     ...
  x_k    n_k              x_k    n_k / m
  total  m

If we could sample from a variable's (posterior) probability, we could estimate its (posterior) probability.
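A minimal sketch of this counting idea in Python (the variable values and samples are made up for illustration):

```python
from collections import Counter

def estimate_distribution(samples):
    """Estimate P(X = x_i) as n_i / m from a list of m sampled values."""
    m = len(samples)                      # total number of samples
    return {x: n / m for x, n in Counter(samples).items()}

# Ten hypothetical samples of a variable X with domain {a, b, c}
samples = ["a", "b", "a", "a", "c", "b", "a", "a", "b", "a"]
print(estimate_distribution(samples))     # {'a': 0.6, 'b': 0.3, 'c': 0.1}
```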

Generating samples from a distribution

For a variable X with a discrete domain or a (one-dimensional) real domain:
- Totally order the values of the domain of X.
- Generate the cumulative probability distribution: f(x) = P(X ≤ x).
- Select a value y uniformly in the range [0, 1].
- Select the x such that f(x) = y (for a discrete domain, the smallest x with f(x) ≥ y).
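A sketch of this inverse-CDF method in Python for a discrete distribution (the values and probabilities are illustrative):

```python
import random
from itertools import accumulate

def sample_discrete(values, probs):
    """Sample X by inverting the cumulative distribution:
    draw y ~ U[0, 1), return the first x with f(x) >= y."""
    cdf = list(accumulate(probs))     # f(x) = P(X <= x), in domain order
    y = random.random()
    for x, f in zip(values, cdf):
        if f >= y:
            return x
    return values[-1]                 # guard against floating-point rounding

# Four ordered values, as in the figure below
print(sample_discrete(["v1", "v2", "v3", "v4"], [0.4, 0.3, 0.2, 0.1]))
```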

Cumulative Distribution

[Figure: the probability distribution P(X) over the ordered values v_1 ... v_4, and its cumulative distribution f(x); both plots have a vertical axis from 0 to 1.]

Hoeffding's inequality

Theorem (Hoeffding): Suppose p is the true probability and s is the sample average from n independent samples; then

  P(|s − p| > ε) ≤ 2e^(−2nε²).

This guarantees a probably approximately correct estimate of the probability.

If you are willing to have an error greater than ε in at most δ of the cases, solve 2e^(−2nε²) < δ for n, which gives

  n > ln(2/δ) / (2ε²).

  ε      δ      n
  0.1    0.05   185
  0.01   0.05   18,445
  0.1    0.01   265
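The sample-size bound is easy to compute; a small sketch that reproduces the table above:

```python
from math import ceil, log

def samples_needed(eps, delta):
    """Smallest integer n with 2 * exp(-2 * n * eps**2) < delta,
    i.e. n > ln(2 / delta) / (2 * eps**2)."""
    return ceil(log(2 / delta) / (2 * eps ** 2))

for eps, delta in [(0.1, 0.05), (0.01, 0.05), (0.1, 0.01)]:
    print(eps, delta, samples_needed(eps, delta))   # 185, 18445, 265
```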

Forward sampling in a belief network

- Sample the variables one at a time; sample the parents of X before sampling X.
- Given values for the parents of X, sample from the probability of X given its parents.
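A sketch of forward sampling for Boolean variables, assuming a network encoded as (name, parents, cpt) triples in topological order; this encoding is my own, not from the slides:

```python
import random

def forward_sample(network):
    """Visit variables parents-first; sample each X from P(X | parents)
    using the values already sampled for its parents.

    network: list of (name, parents, cpt) triples in topological order;
    cpt maps a tuple of parent values to P(X = true)."""
    sample = {}
    for name, parents, cpt in network:
        p_true = cpt[tuple(sample[p] for p in parents)]
        sample[name] = random.random() < p_true
    return sample

# Tiny fragment of the fire-alarm example used later in these slides:
# Fi -> Sm, with P(fi) = 0.01, P(sm | fi) = 0.9, P(sm | ¬fi) = 0.01.
net = [
    ("Fi", (), {(): 0.01}),
    ("Sm", ("Fi",), {(True,): 0.9, (False,): 0.01}),
]
print(forward_sample(net))   # e.g. {'Fi': False, 'Sm': False}
```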

Rejection Sampling

To estimate a posterior probability given evidence Y_1 = v_1 ∧ ... ∧ Y_j = v_j:
- Reject any sample that assigns some Y_i a value other than v_i.
- The non-rejected samples are distributed according to the posterior probability:

  P(α | evidence) ≈ (Σ_{sample ⊨ α} 1) / (Σ_{sample} 1)

where we consider only samples consistent with the evidence.
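A sketch of rejection sampling on top of the same forward-sampling loop and network encoding as above:

```python
import random

def rejection_sample(network, evidence, query, n=10000):
    """Estimate P(query = true | evidence) by forward sampling and
    discarding samples that disagree with the evidence."""
    used = query_true = 0
    for _ in range(n):
        sample = {}
        for name, parents, cpt in network:          # forward sampling
            p_true = cpt[tuple(sample[p] for p in parents)]
            sample[name] = random.random() < p_true
        if all(sample[y] == v for y, v in evidence.items()):
            used += 1
            query_true += sample[query]
    return query_true / used if used else None      # None if all rejected
```

(An obvious refinement, visible in sample s_3 of the example below, is to reject as soon as an observed variable is sampled to the wrong value rather than finishing the sample.)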

Rejection Sampling Example: P(ta | sm, re). Observe Sm = true, Re = true.

[Belief network: Ta and Fi are parents of Al; Fi is the parent of Sm; Al is the parent of Le; Le is the parent of Re.]

          Ta     Fi     Al     Sm     Le     Re
  s_1     false  true   false  true   false  false   (rejected: Re ≠ true)
  s_2     false  true   true   true   true   true    (used)
  s_3     true   false  true   false  —      —       (rejected as soon as Sm ≠ true)
  s_4     true   true   true   true   true   true    (used)
  ...
  s_1000  false  false  false  false  —      —       (rejected)

P(sm) = 0.02, P(re | sm) = 0.32. How many samples are rejected? How many samples are used? (On average only 0.02 × 0.32 = 0.64% of the samples, about 6 of the 1000, are consistent with the evidence and used.)

Importance Sampling

- Samples have weights: a real number associated with each sample that takes the evidence into account.
- The probability of a proposition is the weighted average of the samples:

  P(α | evidence) ≈ (Σ_{sample ⊨ α} weight(sample)) / (Σ_{sample} weight(sample))

- Mix exact inference with sampling: don't sample all of the variables, but weight each sample according to P(evidence | sample).

Importance Sampling (Likelihood Weighting)

procedure likelihood_weighting(Bn, e, Q, n):
    ans[1 : k] := 0, where k is the size of dom(Q)
    repeat n times:
        weight := 1
        for each variable X_i, in order:
            if X_i = o_i is observed:
                weight := weight × P(X_i = o_i | parents(X_i))
            else:
                assign X_i a random sample from P(X_i | parents(X_i))
        if Q has value v:
            ans[v] := ans[v] + weight
    return ans / Σ_v ans[v]
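A runnable sketch of this procedure in Python, using the fire-alarm network with the conditional probabilities listed in the example slides below; the (name, parents, cpt) encoding is the same one used in the earlier sketches:

```python
import random
from collections import defaultdict

# Fire-alarm network; each cpt maps a tuple of parent values to P(X = true).
NETWORK = [
    ("Ta", (), {(): 0.02}),
    ("Fi", (), {(): 0.01}),
    ("Al", ("Fi", "Ta"), {(True, True): 0.5, (True, False): 0.99,
                          (False, True): 0.85, (False, False): 0.0001}),
    ("Sm", ("Fi",), {(True,): 0.9, (False,): 0.01}),
    ("Le", ("Al",), {(True,): 0.88, (False,): 0.001}),
    ("Re", ("Le",), {(True,): 0.75, (False,): 0.01}),
]

def likelihood_weighting(network, evidence, query, n=100_000):
    """Observed variables are fixed and multiplied into the weight;
    unobserved variables are sampled given their parents."""
    ans = defaultdict(float)
    for _ in range(n):
        weight, sample = 1.0, {}
        for name, parents, cpt in network:
            p_true = cpt[tuple(sample[p] for p in parents)]
            if name in evidence:
                sample[name] = evidence[name]
                weight *= p_true if evidence[name] else 1.0 - p_true
            else:
                sample[name] = random.random() < p_true
        ans[sample[query]] += weight
    total = sum(ans.values())
    return {v: w / total for v, w in ans.items()}

# Estimate P(ta | sm, re)
print(likelihood_weighting(NETWORK, {"Sm": True, "Re": True}, "Ta"))
```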

Importance Sampling Example: P(ta | sm, re)

[Belief network as above.]

          Ta     Fi     Al     Le     Weight
  s_1     true   false  true   false  0.01 × 0.01
  s_2     false  true   false  false  0.9 × 0.01
  s_3     false  true   true   true   0.9 × 0.75
  s_4     true   true   true   true   0.9 × 0.75
  ...
  s_1000  false  false  true   true   0.01 × 0.75

P(sm | fi) = 0.9, P(sm | ¬fi) = 0.01, P(re | le) = 0.75, P(re | ¬le) = 0.01. Each weight is P(sm | Fi) × P(re | Le) for the sampled values of Fi and Le.

Importance Sampling Example: P(le | sm, ta, re)

[Belief network as above, with conditional probabilities:]

  P(ta) = 0.02
  P(fi) = 0.01
  P(al | fi ∧ ta) = 0.5
  P(al | fi ∧ ¬ta) = 0.99
  P(al | ¬fi ∧ ta) = 0.85
  P(al | ¬fi ∧ ¬ta) = 0.0001
  P(sm | fi) = 0.9
  P(sm | ¬fi) = 0.01
  P(le | al) = 0.88
  P(le | ¬al) = 0.001
  P(re | le) = 0.75
  P(re | ¬le) = 0.01

Computing Expectations & Proposal Distributions

The expected value of f with respect to distribution P:

  E_P(f) = Σ_w f(w) P(w) ≈ (1/n) Σ_s f(s)

where each s is sampled with probability P and there are n samples.

Rewriting in terms of another distribution Q:

  E_P(f) = Σ_w (f(w) P(w) / Q(w)) Q(w) ≈ (1/n) Σ_s f(s) P(s) / Q(s)

where each s is selected according to the distribution Q. Q is called a proposal distribution. It must satisfy: if P(c) > 0 then Q(c) > 0.
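A sketch of this proposal-distribution identity as an estimator; the densities and the toy check are my own illustration (the standard normal's mass outside [−5, 5] is negligible, so a uniform proposal on that interval works):

```python
import random
from math import exp, pi, sqrt

def importance_expectation(f, p_density, q_sample, q_density, n=100_000):
    """Estimate E_P[f] ~= (1/n) * sum_s f(s) * P(s) / Q(s),
    with each s drawn from the proposal Q (Q > 0 wherever P > 0)."""
    total = 0.0
    for _ in range(n):
        s = q_sample()
        total += f(s) * p_density(s) / q_density(s)
    return total / n

# Toy check: E[X^2] = 1 under a standard normal P, proposed from U(-5, 5).
p = lambda x: exp(-x * x / 2) / sqrt(2 * pi)   # N(0, 1) density
q_sample = lambda: random.uniform(-5.0, 5.0)
q_density = lambda x: 1 / 10                   # uniform density on [-5, 5]
print(importance_expectation(lambda x: x * x, p, q_sample, q_density))  # ~1.0
```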

Particle Filtering

Importance sampling can be seen as:

  for each particle:
      for each variable:
          sample / absorb evidence / update query

where a particle is one of the samples. Instead we could do:

  for each variable:
      for each particle:
          sample / absorb evidence / update query

Why?
- We can add a new operation of resampling.
- It works with infinitely many variables (e.g., HMMs).

Particle Filtering for HMMs

Start with randomly chosen particles (say 1000). Each particle represents a history; initially, sample states in proportion to their probability. Repeat:
- Absorb evidence: weight each particle by the probability of the evidence given the state of the particle.
- Resample: select each new particle at random, in proportion to the weight of the particle. Some particles may be duplicated, some may be removed. All new particles have the same weight.
- Transition: sample the next state for each particle according to the transition probabilities.

To answer a query about the current state, use the set of particles as data.
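A sketch of this loop (a bootstrap particle filter); the model interface and the weather example are assumptions for illustration, not from the slides:

```python
import random

def particle_filter(init_sample, transition_sample, obs_prob, observations, n=1000):
    """init_sample() draws an initial state; transition_sample(s) draws the
    next state from state s; obs_prob(obs, s) = P(observation | state).
    Returns the final particles, usable as data for queries."""
    particles = [init_sample() for _ in range(n)]
    for obs in observations:
        # Absorb evidence: weight particles by the observation likelihood.
        weights = [obs_prob(obs, s) for s in particles]
        # Resample in proportion to weight (duplicates and removals happen here).
        particles = random.choices(particles, weights=weights, k=n)
        # Transition: advance each particle using the transition model.
        particles = [transition_sample(s) for s in particles]
    return particles

# Hypothetical 2-state HMM: hidden weather, noisy umbrella observations.
init = lambda: random.choice(["rain", "sun"])
trans = lambda s: s if random.random() < 0.7 else ("sun" if s == "rain" else "rain")
obs_p = lambda o, s: 0.9 if (o == "umbrella") == (s == "rain") else 0.1
final = particle_filter(init, trans, obs_p, ["umbrella", "umbrella"])
print(sum(s == "rain" for s in final) / len(final))   # estimated P(rain)
```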

Markov Chain Monte Carlo

To sample from a distribution P:
- Create an (ergodic and aperiodic) Markov chain with P as its equilibrium distribution.
- Let T(S_{i+1} | S_i) be the transition probability. Given state s, sample state s' from T(S' | s).
- After a while, the states sampled will be distributed according to P.
- Ignore the first samples (burn-in); use the remaining samples.
- The samples are not independent of each other (autocorrelation); sometimes only a subset of them (e.g., 1 in 1000) is used (thinning).
- Gibbs sampler: sample each non-observed variable from the distribution of the variable given the current (or observed) values of the variables in its Markov blanket.
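A sketch of a Gibbs sampler for Boolean belief networks, in the same (name, parents, cpt) encoding as the earlier sketches; burn-in is handled, thinning is omitted for brevity:

```python
import random

def blanket_prob(network, sample, name):
    """P(name = true | Markov blanket): proportional to P(name | parents)
    times P(c | parents(c)) for each child c of name."""
    score = {}
    for value in (True, False):
        sample[name] = value
        p = 1.0
        for var, parents, cpt in network:
            if var == name or name in parents:   # the variable and its children
                p_true = cpt[tuple(sample[q] for q in parents)]
                p *= p_true if sample[var] else 1.0 - p_true
        score[value] = p
    return score[True] / (score[True] + score[False])

def gibbs(network, evidence, query, n=20_000, burn_in=2_000):
    """Estimate P(query = true | evidence) by Gibbs sampling."""
    sample = {name: evidence.get(name, random.random() < 0.5)
              for name, _, _ in network}
    hidden = [name for name, _, _ in network if name not in evidence]
    hits = total = 0
    for step in range(n):
        for name in hidden:   # resample each non-observed variable
            sample[name] = random.random() < blanket_prob(network, sample, name)
        if step >= burn_in:   # ignore the first samples (burn-in)
            total += 1
            hits += sample[query]
    return hits / total

# e.g., with NETWORK from the likelihood-weighting sketch above:
# print(gibbs(NETWORK, {"Sm": True, "Re": True}, "Ta"))
```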