
Informatics 2D: Reasoning and Agents
Semester 2, 2018-2019
Alex Lascarides (alex@inf.ed.ac.uk)

Lecture 25: Approximate Inference in Bayesian Networks
19th March 2019

Informatics UoE, Informatics 2D

Where are we?

Last time: inference in Bayesian networks. Exact methods (enumeration, the variable elimination algorithm) are computationally intractable in the worst case.

Today: approximate inference in Bayesian networks.

Approximate inference in BNs

Exact inference is computationally very hard, so approximate methods are important; here we consider randomised sampling algorithms (Monte Carlo algorithms). We will talk about two types of MC algorithm:
1. Direct sampling methods
2. Markov chain sampling

Direct sampling methods

Basic idea: generate samples from a known probability distribution. Consider an unbiased coin as a random variable: sampling from its distribution is like flipping the coin. Any distribution over a single variable can be sampled given a source of random numbers uniform in [0,1]. The simplest method generates events from a network without evidence: sample each variable in topological order, where the probability distribution for the sampled value is conditioned on the values already assigned to the variable's parents.
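The single-variable sampling step can be sketched as follows; this is a minimal illustration in Python, and the function name is mine, not from the lecture:

```python
import random

def sample_bernoulli(p_true, rng):
    """Return True with probability p_true, using one uniform draw from [0, 1)."""
    return rng.random() < p_true

# An unbiased coin: over many flips, roughly half should come up heads.
rng = random.Random(0)
heads = sum(sample_bernoulli(0.5, rng) for _ in range(10_000))
```

The same trick generalises to any discrete distribution by partitioning [0,1) into intervals whose lengths are the outcome probabilities.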

Example

Consider the following BN (the sprinkler network) and the topological ordering [Cloudy, Sprinkler, Rain, WetGrass]:

    P(C) = 0.5

    Sprinkler:            Rain:
    C      P(S|C)         C      P(R|C)
    true   0.10           true   0.80
    false  0.50           false  0.20

    WetGrass:
    S      R      P(W|S,R)
    true   true   0.99
    true   false  0.90
    false  true   0.90
    false  false  0.00

Example

Direct sampling process:
1. Sample from P(Cloudy) = ⟨0.5, 0.5⟩; suppose this returns true.
2. Sample from P(Sprinkler | Cloudy = true) = ⟨0.1, 0.9⟩; suppose this returns false.
3. Sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; suppose this returns true.
4. Sample from P(WetGrass | Sprinkler = false, Rain = true) = ⟨0.9, 0.1⟩; suppose this returns true.

Event returned: [true, false, true, true]
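The whole direct-sampling process for this network can be sketched in a few lines; a minimal Python sketch using the CPT numbers from the slide (variable and function names are mine):

```python
import random

# CPTs from the slide: each maps parent value(s) to P(variable = true | parents).
P_C = 0.5
P_S = {True: 0.10, False: 0.50}   # P(Sprinkler = true | Cloudy)
P_R = {True: 0.80, False: 0.20}   # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}   # P(WetGrass = true | S, R)

def prior_sample(rng):
    """Sample all four variables in topological order [C, S, R, W]."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w

# Estimate P(Rain = true) by counting; the exact value is 0.5*0.8 + 0.5*0.2 = 0.5.
rng = random.Random(0)
n = 20_000
rain_frac = sum(prior_sample(rng)[2] for _ in range(n)) / n
```

Counting the fraction of samples with a given assignment estimates its probability, which is exactly the counting argument on the next slide.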

Why direct sampling works

Direct sampling generates each event with probability

    S(x_1, ..., x_n) = ∏_{i=1}^{n} P(x_i | parents(X_i)) = P(x_1, ..., x_n),

i.e. in accordance with the joint distribution represented by the network. Answers are computed by counting the number N(x_1, ..., x_n) of times the event x_1, ..., x_n was generated and dividing by the total number N of samples. In the limit, we should get

    lim_{N→∞} N(x_1, ..., x_n) / N = S(x_1, ..., x_n) = P(x_1, ..., x_n).

If the estimated probability P̂ becomes exact in the limit, we call the estimate consistent, and in this sense we write

    P(x_1, ..., x_n) ≈ N(x_1, ..., x_n) / N.

Rejection sampling

Purpose: produce samples for a hard-to-sample distribution using an easy-to-sample one. To determine P(X | e), first generate samples from the prior distribution specified by the BN, then reject those that do not match the evidence e. The estimate P̂(X = x | e) is obtained by counting how often X = x occurs in the remaining samples. This is consistent because, by definition,

    P̂(X | e) = N(X, e) / N(e) ≈ P(X, e) / P(e) = P(X | e).

Back to our example

Assume we want to estimate P(Rain | Sprinkler = true) using 100 samples:
1. 73 samples have Sprinkler = false (rejected); 27 have Sprinkler = true.
2. Of these 27, 8 have Rain = true and 19 have Rain = false.
3. P̂(Rain | Sprinkler = true) = α⟨8, 19⟩ = ⟨0.296, 0.704⟩

The true answer is ⟨0.3, 0.7⟩. But the procedure rejects too many samples that are inconsistent with e (the fraction rejected grows exponentially in the number of evidence variables), so it is not really usable in practice: it amounts to naively estimating conditional probabilities from observation.
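The rejection loop for this query can be sketched as follows, reusing the CPT numbers from the slide (the helper names are mine; WetGrass is omitted because the query does not involve it):

```python
import random

P_C = 0.5
P_S = {True: 0.10, False: 0.50}   # P(Sprinkler = true | Cloudy)
P_R = {True: 0.80, False: 0.20}   # P(Rain = true | Cloudy)

def prior_sample_csr(rng):
    """Sample Cloudy, Sprinkler, Rain in topological order."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    return c, s, r

def rejection_estimate(n, rng):
    """Estimate P(Rain = true | Sprinkler = true) by discarding mismatches."""
    kept = rain_true = 0
    for _ in range(n):
        c, s, r = prior_sample_csr(rng)
        if not s:
            continue          # reject: inconsistent with evidence Sprinkler = true
        kept += 1
        rain_true += r
    return rain_true / kept, kept

est, kept = rejection_estimate(50_000, random.Random(0))
```

Since P(Sprinkler = true) = 0.3 here, roughly 70% of the work is thrown away, which illustrates the inefficiency the slide describes.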

Likelihood weighting

Likelihood weighting avoids the inefficiency of rejection sampling by generating only samples consistent with the evidence. It fixes the values of the evidence variables E and samples only the remaining variables: the query variable X and the hidden variables Y. Since not all events are equally probable, each event has to be weighted by the likelihood it accords to the evidence, measured by the product of the conditional probabilities of each evidence variable given its parents.

Consider the query P(Rain | Sprinkler = true, WetGrass = true) in our example. Initially set the weight w = 1; then an event is generated:
1. Sample from P(Cloudy) = ⟨0.5, 0.5⟩; suppose this returns true.
2. Sprinkler is an evidence variable with value true, so set w ← w · P(Sprinkler = true | Cloudy = true) = 0.1.
3. Sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; suppose this returns true.
4. WetGrass is an evidence variable with value true, so set w ← w · P(WetGrass = true | Sprinkler = true, Rain = true) = 0.1 × 0.99 = 0.099.

The sample returned is [true, true, true, true] with weight 0.099, tallied under Rain = true.
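The four steps above can be sketched as a small Python routine; the function names are mine, and the CPT numbers come from the slide:

```python
import random

P_S = {True: 0.10, False: 0.50}   # P(Sprinkler = true | Cloudy)
P_R = {True: 0.80, False: 0.20}   # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90}  # P(WetGrass = true | S = true, R)

def weighted_sample(rng):
    """One sample with evidence Sprinkler = true, WetGrass = true held fixed."""
    w = 1.0
    c = rng.random() < 0.5          # sample Cloudy from its prior
    w *= P_S[c]                     # evidence: Sprinkler = true
    r = rng.random() < P_R[c]       # sample Rain given Cloudy
    w *= P_W[(True, r)]             # evidence: WetGrass = true
    return r, w

def lw_estimate(n, rng):
    """Weighted fraction of samples with Rain = true."""
    w_rain = w_total = 0.0
    for _ in range(n):
        r, w = weighted_sample(rng)
        w_total += w
        if r:
            w_rain += w
    return w_rain / w_total

est = lw_estimate(50_000, random.Random(0))
```

Note that no sample is ever rejected: every sample contributes, but with a weight reflecting how well it explains the evidence. (The exact posterior for this query works out to roughly 0.32 for Rain = true.)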

Likelihood weighting: why it works

The sampling distribution is

    S(z, e) = ∏_{i=1}^{l} P(z_i | parents(Z_i)).

S's sample value for each Z_i is influenced by the evidence among Z_i's ancestors. But S pays no attention, when sampling Z_i's value, to evidence from Z_i's non-ancestors; so it is not sampling from the true posterior probability distribution! The likelihood weight w makes up for the difference between the actual and desired sampling distributions:

    w(z, e) = ∏_{i=1}^{m} P(e_i | parents(E_i)).

Likelihood weighting: why it works

Since the two products together cover all the variables in the network, we can write

    P(z, e) = ∏_{i=1}^{l} P(z_i | parents(Z_i)) · ∏_{i=1}^{m} P(e_i | parents(E_i)) = S(z, e) · w(z, e).

With this, it is easy to derive that likelihood weighting is consistent (tutorial exercise). Problem: as the number of evidence variables increases, most samples will have very small weights, and the estimate will be dominated by the tiny fraction of samples that accord more than an infinitesimal likelihood to the evidence.

The Markov chain Monte Carlo (MCMC) algorithm

The MCMC algorithm creates each event from the previous event, rather than generating every event from scratch. It is helpful to think of the BN as having a current state that specifies a value for each variable. The next state is generated by sampling a value for one of the non-evidence variables X_i, conditioned on the current values of the variables in the Markov blanket of X_i. Recall that the Markov blanket consists of a variable's parents, children, and children's parents. The algorithm randomly wanders around the state space, flipping one variable at a time while keeping the evidence variables fixed.

The MCMC algorithm

Consider the query P(Rain | Sprinkler = true, WetGrass = true) once more. The evidence variables Sprinkler and WetGrass are fixed to their observed values; the hidden variables Cloudy and Rain are initialised randomly (e.g. true and false). The initial state is [true, true, false, true]. Execute repeatedly:
1. Sample Cloudy given the current values of its Markov blanket, i.e. sample from P(Cloudy | Sprinkler = true, Rain = false). Suppose the result is false; the new state is [false, true, false, true].
2. Sample Rain given the current values of its Markov blanket, i.e. sample from P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true). Suppose we obtain Rain = true; the new state is [false, true, true, true].
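This Gibbs-style loop can be sketched as follows; each Markov-blanket conditional is obtained by normalising the two relevant products of CPT entries (a sketch under the slide's CPT numbers; function names are mine):

```python
import random

P_S = {True: 0.10, False: 0.50}   # P(Sprinkler = true | Cloudy)
P_R = {True: 0.80, False: 0.20}   # P(Rain = true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90}  # P(WetGrass = true | S = true, R)

def gibbs_estimate(n, rng):
    """Estimate P(Rain = true | Sprinkler = true, WetGrass = true) by MCMC."""
    c = rng.random() < 0.5    # random initialisation of the hidden variables
    r = rng.random() < 0.5
    rain_true = 0
    for _ in range(n):
        # Resample Cloudy from its Markov blanket {Sprinkler, Rain}:
        #   P(C | s, r) ∝ P(C) * P(s | C) * P(r | C)
        pt = 0.5 * P_S[True]  * (P_R[True]  if r else 1 - P_R[True])
        pf = 0.5 * P_S[False] * (P_R[False] if r else 1 - P_R[False])
        c = rng.random() < pt / (pt + pf)
        # Resample Rain from its Markov blanket {Cloudy, Sprinkler, WetGrass}:
        #   P(R | c, s, w) ∝ P(R | c) * P(w | s, R)
        pt = P_R[c]       * P_W[(True, True)]
        pf = (1 - P_R[c]) * P_W[(True, False)]
        r = rng.random() < pt / (pt + pf)
        rain_true += r
    return rain_true / n

est = gibbs_estimate(100_000, random.Random(0))
```

Evidence never changes, so only Cloudy and Rain are resampled; counting how often Rain = true across visited states gives the estimate, as the next slide explains. (The exact posterior here is roughly 0.32.)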

The MCMC algorithm: why it works

Each state visited is a sample and contributes to the estimate for the query variable Rain (count the samples to compute the estimate, as before). The basic idea of the proof that MCMC is consistent: the sampling process settles into a dynamic equilibrium in which the long-run fraction of time spent in each state is exactly proportional to its posterior probability. MCMC is a very powerful method, used for all kinds of computations involving probabilities.

Summary

Approximate inference in BNs: likelihood weighting and why it works; the MCMC algorithm and why it works.

Next time: Time and Uncertainty I