An Introduction to Bayesian Networks: Representation and Approximate Inference

An Introduction to Bayesian Networks: Representation and Approximate Inference
Marek Grześ, Department of Computer Science, University of York
Graphical Models Reading Group, May 7, 2009

Outline: Data and Probabilities; Representation of Bayesian Networks; Approximate Reasoning with Gibbs Sampling; References

Joint Probability Distribution [Cuss 09]

Bronchitis  Cancer   Smoker     Count  Prob.
----------  -------  ---------  -----  -----
absent      absent   nonsmoker      3   0.03
absent      absent   smoker         0   0.00
absent      present  nonsmoker     27   0.27
absent      present  smoker        15   0.15
present     absent   nonsmoker      2   0.02
present     absent   smoker         0   0.00
present     present  nonsmoker     18   0.18
present     present  smoker        35   0.35

Joint and Marginal Probability Distributions, Queries
P is a probability distribution over X = {X_1, ..., X_n}; the joint probability distribution is P(X_1, ..., X_n).
Inference - query types:
conditional probability query: P(Y | E = e),
marginal probabilities: P(X_i), P(X_i, X_j), P(Y),
likelihoods: P(E),
maximum a posteriori query: find the most likely assignment to the variables Y given the evidence E = e: argmax_y P(y | e).

Queries: examples
P(Bronchitis | Cancer = absent, Smoker = nonsmoker)

Bronchitis  Cancer   Smoker     Count  Prob.
----------  -------  ---------  -----  -----
absent      absent   nonsmoker      3    0.6
present     absent   nonsmoker      2    0.4

P(Cancer, Smoker)

Cancer   Smoker     Count  Prob.
-------  ---------  -----  -----
absent   nonsmoker      5   0.05
absent   smoker         0   0.00
present  nonsmoker     45   0.45
present  smoker        50   0.50

Straightforward and natural, but complexity is a problem!
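
To make the two queries concrete, here is a minimal sketch (not part of the original slides) that answers them directly from the joint counts of [Cuss 09]:

```python
# Answer the two example queries directly from the joint table (counts over 100 patients).
from collections import Counter

# (Bronchitis, Cancer, Smoker) -> count, copied from the joint table above.
counts = Counter({
    ("absent",  "absent",  "nonsmoker"): 3,  ("absent",  "absent",  "smoker"): 0,
    ("absent",  "present", "nonsmoker"): 27, ("absent",  "present", "smoker"): 15,
    ("present", "absent",  "nonsmoker"): 2,  ("present", "absent",  "smoker"): 0,
    ("present", "present", "nonsmoker"): 18, ("present", "present", "smoker"): 35,
})
total = sum(counts.values())

# Conditional query: P(Bronchitis | Cancer=absent, Smoker=nonsmoker).
match = {b: c for (b, ca, s), c in counts.items() if ca == "absent" and s == "nonsmoker"}
norm = sum(match.values())
print({b: c / norm for b, c in match.items()})          # {'absent': 0.6, 'present': 0.4}

# Marginal query: P(Cancer, Smoker), summing Bronchitis out.
marginal = Counter()
for (b, ca, s), c in counts.items():
    marginal[(ca, s)] += c
print({k: v / total for k, v in marginal.items()})
```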

Graphical Models
Graphical models are a marriage between probability theory and graph theory [Jord 99]. Probability theory deals with uncertainty. Graph theory deals with complexity (compact representation and efficient reasoning).

A simple Bayesian network P(Pneumonia, Tuberculosis, Lung Infiltrates, XRay, Sputum Smear). [Koll 07]

A simple Bayesian network (ctd) [Koll 07]
P(P, T, I, X, S) = P(P) P(T) P(I | P, T) P(X | I) P(S | T)
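
As an illustration only (the CPT numbers below are made up, not taken from [Koll 07]), the factorisation can be evaluated by multiplying one factor per node:

```python
# Evaluate P(P, T, I, X, S) = P(P) P(T) P(I | P, T) P(X | I) P(S | T)
# with hypothetical CPTs. Variables: P=Pneumonia, T=Tuberculosis,
# I=Lung Infiltrates, X=XRay, S=Sputum Smear, each True/False.
from itertools import product

pr_P = 0.05                                      # P(P=True), hypothetical
pr_T = 0.02                                      # P(T=True), hypothetical
p_I = {(True, True): 0.95, (True, False): 0.8,   # P(I=True | P, T), hypothetical
       (False, True): 0.7, (False, False): 0.05}
p_X = {True: 0.9, False: 0.1}                    # P(X=True | I), hypothetical
p_S = {True: 0.8, False: 0.05}                   # P(S=True | T), hypothetical

def bernoulli(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(P, T, I, X, S):
    return (bernoulli(pr_P, P) * bernoulli(pr_T, T)
            * bernoulli(p_I[(P, T)], I)
            * bernoulli(p_X[I], X)
            * bernoulli(p_S[T], S))

# E.g. pneumonia with infiltrates and an abnormal X-ray, no TB, negative smear:
print(joint(P=True, T=False, I=True, X=True, S=False))

# Sanity check: the 32 joint probabilities sum to 1 (up to rounding).
print(sum(joint(*v) for v in product([True, False], repeat=5)))
```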

Product Rule of Probability and Independence
Why can we write P(P, T, I, X, S) = P(P) P(T) P(I | P, T) P(X | I) P(S | T)?
Repeated application of the product rule of probability:
P(X_1, ..., X_n) = P(X_n | X_1, ..., X_{n-1}) ... P(X_2 | X_1) P(X_1).
This can be seen as a fully connected graph (Bayesian network) with each node having incoming links from all lower-numbered nodes.
Independent variables: if P(X_1, X_2) = P(X_1) P(X_2), then X_1 and X_2 are independent and P(X_2 | X_1) = P(X_2).
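
As a worked step (an addition, assuming the graph structure of the [Koll 07] example), the chain rule over the ordering P, T, I, X, S plus the conditional independences encoded by the graph give exactly the factorisation above:

```latex
\begin{align*}
P(P,T,I,X,S) &= P(P)\,P(T \mid P)\,P(I \mid P,T)\,P(X \mid P,T,I)\,P(S \mid P,T,I,X)\\
             &= P(P)\,P(T)\,P(I \mid P,T)\,P(X \mid I)\,P(S \mid T),
\end{align*}
```

using T ⊥ P, X ⊥ {P, T} | I, and S ⊥ {P, I, X} | T.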

Diagnostic Bayesian Networks Figure: This is a common structure of diagnostic networks: predisposition nodes at the top (Visit to Asia, Smoking), diseases in the middle (Tuberculosis, Lung Cancer, Bronchitis), and symptoms at the bottom (XRay Result, Dyspnea).

Independence/Dependence in Bayesian Networks
X is conditionally independent of Y given Z if P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z); written (X ⊥ Y | Z).
A and B are marginally dependent, and A and B are conditionally independent (e.g., when A and B share a common cause C that is observed).
A and B are marginally independent, and A and B are conditionally dependent (e.g., in a common-effect structure A → C ← B once C is observed).
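
A small numerical check of the second case (a sketch with made-up CPTs for a collider A → C ← B): A and B are independent a priori, but become dependent once C is observed.

```python
# Collider A -> C <- B with hypothetical CPTs: check marginal independence
# and conditional dependence by brute-force enumeration of the joint.
from itertools import product

p_A = 0.3                                        # P(A=True), hypothetical
p_B = 0.4                                        # P(B=True), hypothetical
p_C = {(True, True): 0.99, (True, False): 0.7,   # P(C=True | A, B), hypothetical
       (False, True): 0.6, (False, False): 0.01}

def joint(a, b, c):
    pa = p_A if a else 1 - p_A
    pb = p_B if b else 1 - p_B
    pc = p_C[(a, b)] if c else 1 - p_C[(a, b)]
    return pa * pb * pc

def prob(condition):
    return sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3)
               if condition(a, b, c))

# Marginally independent: P(A, B) equals P(A) P(B).
print(prob(lambda a, b, c: a and b), p_A * p_B)

# Conditionally dependent given C=True: P(A, B | C) differs from P(A | C) P(B | C).
pC = prob(lambda a, b, c: c)
print(prob(lambda a, b, c: a and b and c) / pC,
      (prob(lambda a, b, c: a and c) / pC) * (prob(lambda a, b, c: b and c) / pC))
```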

Reasoning
Computing conditional/marginal distributions. Exact inference (e.g., the junction tree algorithm) is efficient for chain-like and tree-like graphs. Approximate inference (variational methods, sampling, etc.) is needed for dense, layered, or coupled graphs [Jord 97].

Sampling
We would like to compute a marginal, P(X_i), or conditional, P(Y | E), probability. Let's assume that it is impossible to do this exactly. If we can create a generative model of the distribution we are interested in, we can sample from that distribution and compute the desired probability empirically.
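
A minimal sketch of this idea with a hypothetical two-node generative model (Smoker → Cancer, made-up parameters): draw samples and estimate P(Cancer = present) by its empirical frequency.

```python
# Forward-sample a tiny generative model and estimate a marginal empirically.
import random

random.seed(0)
p_smoker = 0.5                                   # P(Smoker=smoker), hypothetical
p_cancer = {"smoker": 0.5, "nonsmoker": 0.45}    # P(Cancer=present | Smoker), hypothetical

def sample_once():
    smoker = "smoker" if random.random() < p_smoker else "nonsmoker"
    cancer = "present" if random.random() < p_cancer[smoker] else "absent"
    return smoker, cancer

n = 100_000
hits = sum(1 for _ in range(n) if sample_once()[1] == "present")
print(hits / n)   # close to 0.5*0.5 + 0.5*0.45 = 0.475
```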

Markov Chains [Cuss 09]

Transition matrix:
       [,1] [,2] [,3] [,4]
[1,]   0.4  0.2  0.3  0.1
[2,]   0.4  0.4  0.2  0.0
[3,]   0.6  0.2  0.1  0.1
[4,]   0.7  0.1  0.0  0.2

Start state (1, 0, 0, 0); after 3 time steps: (0.465, 0.237, 0.212, 0.086).
Start state (0, 0, 1, 0); after 3 time steps: (0.473, 0.237, 0.204, 0.086).
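
A quick sketch reproducing these numbers: represent the transition matrix with numpy and propagate each start distribution for three steps.

```python
# Propagate a start distribution through the chain: p_{t+1} = p_t @ T.
import numpy as np

T = np.array([[0.4, 0.2, 0.3, 0.1],
              [0.4, 0.4, 0.2, 0.0],
              [0.6, 0.2, 0.1, 0.1],
              [0.7, 0.1, 0.0, 0.2]])   # rows sum to 1

for start in (np.array([1.0, 0, 0, 0]), np.array([0, 0, 1.0, 0])):
    p = start
    for _ in range(3):
        p = p @ T                      # one step of the chain
    print(np.round(p, 3))              # [0.465 0.237 0.212 0.086] and [0.473 0.237 0.204 0.086]
```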

Markov Chain Monte Carlo - MCMC
It is a simulation-based method, hence Monte Carlo.
Sample from a sequence of distributions which gets progressively closer to the target distribution (the query).
Sampling follows a Markov chain, P_MC(S^{i+1} | S^i), where S^i = {X^i_1, ..., X^i_N} is the i-th state.
A stationary distribution, more formally: let π be the transition matrix of a Markov chain. If P is a distribution such that P = Pπ, then P is said to be a stationary distribution.
The aim of Markov chain Monte Carlo methods is to design a Markov chain whose stationary distribution is the target distribution (the query).
The initial distribution does not matter; the initial, burn-in samples can be removed.
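
To make P = Pπ concrete (a sketch using the 4-state chain from the previous slide, which is ergodic): iterating p ← pπ from any start distribution converges to a fixed point that is unchanged by a further step.

```python
# Numerically find the stationary distribution of the 4-state chain by iteration.
import numpy as np

T = np.array([[0.4, 0.2, 0.3, 0.1],
              [0.4, 0.4, 0.2, 0.0],
              [0.6, 0.2, 0.1, 0.1],
              [0.7, 0.1, 0.0, 0.2]])

p = np.array([1.0, 0, 0, 0])           # any start distribution works
for _ in range(1000):
    p = p @ T
print(np.round(p, 4))                  # the stationary distribution
print(np.allclose(p, p @ T))           # True: p is unchanged by one more step, i.e. P = P*pi
```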

Gibbs Sampling - One Variable at a Time
X = {X_1, ..., X_n} is a set of variables and the query is P(Y | E).
Set up the Markov chain as follows (a minimal sketch follows this slide):
initialise all X_i in Y arbitrarily (variables in E are clamped to their observed values),
choose i (randomly, or by cycling over all variables in Y and taking each one in turn, but never choose variables from E),
sample from P(X_i | X \ {X_i}), i.e., X_i conditioned on the evidence and the current values of all other variables,
iterate (no samples are rejected).
It has been proved that this process has P(Y | E) as its equilibrium distribution.
How to do this in graphical models?
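
A minimal sketch of the procedure (an addition, reusing the joint counts of [Cuss 09]): estimate P(Bronchitis | Smoker = nonsmoker) by repeatedly resampling each non-evidence variable from its full conditional given all the others.

```python
# Gibbs sampling on the joint table: resample Bronchitis and Cancer in turn,
# keeping the evidence Smoker = nonsmoker fixed, and average after burn-in.
import random
from collections import Counter

random.seed(0)

counts = {("absent",  "absent",  "nonsmoker"): 3,  ("absent",  "absent",  "smoker"): 0,
          ("absent",  "present", "nonsmoker"): 27, ("absent",  "present", "smoker"): 15,
          ("present", "absent",  "nonsmoker"): 2,  ("present", "absent",  "smoker"): 0,
          ("present", "present", "nonsmoker"): 18, ("present", "present", "smoker"): 35}
VARS = ["Bronchitis", "Cancer", "Smoker"]
DOMAIN = {"Bronchitis": ["absent", "present"], "Cancer": ["absent", "present"],
          "Smoker": ["nonsmoker", "smoker"]}

def full_conditional(var, state):
    """P(var | all other variables), read off the joint table and normalised."""
    weights = []
    for value in DOMAIN[var]:
        s = dict(state, **{var: value})
        weights.append(counts[(s["Bronchitis"], s["Cancer"], s["Smoker"])])
    total = sum(weights)
    return [w / total for w in weights]

evidence = {"Smoker": "nonsmoker"}
state = dict(evidence, Bronchitis="absent", Cancer="absent")   # arbitrary initialisation
samples = Counter()
for step in range(20000):
    for var in VARS:
        if var in evidence:
            continue                                           # never resample evidence
        probs = full_conditional(var, state)
        state[var] = random.choices(DOMAIN[var], weights=probs)[0]
    if step >= 1000:                                           # discard burn-in samples
        samples[state["Bronchitis"]] += 1

total = sum(samples.values())
print({v: round(c / total, 3) for v, c in samples.items()})    # approx {'absent': 0.6, 'present': 0.4}
```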

Gibbs Sampling in Graphical Models
In graphical models, when sampling from P(X_i | X \ {X_i}), the conditional independence structure can be exploited.
Markov blanket (a node's parents, children, and the children's other parents): it shields a variable from the influence of the remaining variables.
Thus, in BNs we can sample from P(X_i | MarkovBlanket(X_i)).
We can always do this in Gibbs sampling because we sample only one variable at a time, so all other variables are assumed known (they take their values in the current state of the Markov chain).
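
A minimal sketch of the Markov-blanket full conditional for one node of the Pneumonia/Tuberculosis example, reusing the hypothetical CPTs from the earlier factorisation sketch: P(I | everything else) is proportional to P(I | P, T) P(X | I), since every other factor cancels in the normalisation.

```python
# Full conditional of I (Lung Infiltrates) given its Markov blanket {P, T, X}.
p_I = {(True, True): 0.95, (True, False): 0.8,   # P(I=True | P, T), hypothetical
       (False, True): 0.7, (False, False): 0.05}
p_X = {True: 0.9, False: 0.1}                    # P(X=True | I), hypothetical

def bernoulli(p_true, value):
    return p_true if value else 1.0 - p_true

def conditional_I(state):
    """P(I | Markov blanket of I) given the current state of all other variables.

    Only I's own CPT and its child's CPT are needed; the remaining factors
    (e.g. the CPT of S) do not involve I and cancel out of the normalisation.
    """
    weights = {}
    for i_value in (True, False):
        weights[i_value] = (bernoulli(p_I[(state["P"], state["T"])], i_value)
                            * bernoulli(p_X[i_value], state["X"]))
    z = sum(weights.values())
    return {v: w / z for v, w in weights.items()}

# E.g. within one Gibbs step, with the chain currently at P=True, T=False, X=True:
print(conditional_I({"P": True, "T": False, "X": True, "S": False}))
```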

Gibbs Sampling Does Not Always Work. For example, if the distribution assigns zero probability to the configurations separating its high-probability regions, the single-variable moves cannot cross between those regions, the chain is not ergodic, and it never reaches the desired stationary distribution.

Working Example
Time for a demo: visualisation of Gibbs sampling in graphical models.

References
[Cuss 09] J. Cussens. Algorithms for Graphical Models: Lecture Notes. 2009.
[Jord 97] M. I. Jordan. An Introduction to Graphical Models. Tech. Rep., Center for Biological and Computational Learning, MIT, 1997.
[Jord 99] M. I. Jordan, Ed. Learning in Graphical Models. The MIT Press, 1999.
[Koll 07] D. Koller, N. Friedman, L. Getoor, and B. Taskar. Graphical Models in a Nutshell. In Introduction to Statistical Relational Learning. MIT Press, 2007.