Bayesian networks: independence, Markov conditions, and inference by enumeration, rejection sampling, and the Gibbs sampler

Independence

If P(A=a, B=b) = P(A=a)P(B=b) for all a and b, then we call A and B (marginally) independent.

If P(A=a, B=b | C=c) = P(A=a | C=c)P(B=b | C=c) for all a and b, then we call A and B conditionally independent given C=c.

If P(A=a, B=b | C=c) = P(A=a | C=c)P(B=b | C=c) for all a, b and c, then we call A and B conditionally independent given C.

P(A,B) = P(A)P(B) implies P(A | B) = P(A,B) / P(B) = P(A)P(B) / P(B) = P(A).
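As a quick illustration (not from the original slides), here is a minimal Python sketch that checks marginal independence numerically from an explicitly given joint table; the joint distribution is invented for the example.

    from itertools import product

    # Hypothetical joint distribution P(A, B) over two binary variables;
    # independent by construction: P(A=a, B=b) = P(A=a) * P(B=b).
    joint = {(a, b): (0.3 if a else 0.7) * (0.6 if b else 0.4)
             for a, b in product([False, True], repeat=2)}

    def marginal(joint, index, value):
        # Sum the joint over all entries whose index-th variable equals value.
        return sum(p for values, p in joint.items() if values[index] == value)

    # A and B are marginally independent iff P(A=a, B=b) = P(A=a)P(B=b) for all a, b.
    independent = all(
        abs(joint[(a, b)] - marginal(joint, 0, a) * marginal(joint, 1, b)) < 1e-12
        for a, b in product([False, True], repeat=2)
    )
    print(independent)  # True for this joint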

Independence saves space

If A and B are independent given C: P(A,B,C) = P(C,A,B) = P(C)P(A|C)P(B|A,C) = P(C)P(A|C)P(B|C).

Instead of having a full joint probability table for P(A,B,C), we can have a table for P(C) and tables P(A|C=c) and P(B|C=c) for each c. Even for binary variables this saves space: 2^3 = 8 entries vs. 2 + 2 + 2 = 6. With many variables and many independences you save a lot.

Chain rule, independence, Bayesian network

Chain rule: P(A,B,C,D) = P(A)P(B|A)P(C|A,B)P(D|A,B,C), which corresponds to a fully connected DAG over A, B, C, D (every conditioning variable is a parent).

With independence assumptions: P(A,B,C,D) = P(A)P(B)P(C|A,B)P(D|A,C), which corresponds to a sparser DAG in which A and B have no parents, C has parents A and B, and D has parents A and C: a Bayesian network.

But order matters

P(A,B,C) = P(C,A,B), so P(A)P(B|A)P(C|A,B) = P(C)P(A|C)P(B|A,C).

If A and B are conditionally independent given C:
1. P(A,B,C) = P(A)P(B|A)P(C|A,B), i.e. the network A → B, A → C, B → C.
2. P(C,A,B) = P(C)P(A|C)P(B|C), i.e. the simpler network C → A, C → B.

With the same independence assumptions, some variable orders yield simpler networks.

Bayes net as a factorization

A Bayesian network structure forms a directed acyclic graph (DAG). Given a DAG G, we denote the parents of the node (variable) X_i by Pa_G(X_i) and a value configuration of Pa_G(X_i) by pa_G(x_i):

P(x_1, x_2, ..., x_n | G) = Π_{i=1}^{n} P(x_i | pa_G(x_i)),

where the P(x_i | pa_G(x_i)) are called local probabilities. Local probabilities are stored in conditional probability tables (CPTs).

A Bayesian network

Structure: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass.

P(Cloudy): Cloudy=no 0.5, Cloudy=yes 0.5

P(Sprinkler | Cloudy):
  Cloudy=no:  Sprinkler=on 0.5, Sprinkler=off 0.5
  Cloudy=yes: Sprinkler=on 0.9, Sprinkler=off 0.1

P(Rain | Cloudy):
  Cloudy=no:  Rain=yes 0.2, Rain=no 0.8
  Cloudy=yes: Rain=yes 0.8, Rain=no 0.2

P(WetGrass | Sprinkler, Rain):
  Sprinkler=on,  Rain=no:  WetGrass=yes 0.90, WetGrass=no 0.10
  Sprinkler=on,  Rain=yes: WetGrass=yes 0.99, WetGrass=no 0.01
  Sprinkler=off, Rain=no:  WetGrass=yes 0.01, WetGrass=no 0.99
  Sprinkler=off, Rain=yes: WetGrass=yes 0.90, WetGrass=no 0.10
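As an illustration (not part of the original slides), the network above can be written down directly as Python dictionaries; the names CPT, PARENTS, TOPO, local_prob and joint_prob are ad hoc and are reused in the later sketches. Here True stands for Cloudy=yes, Sprinkler=on, Rain=yes, WetGrass=yes.

    PARENTS = {
        "Cloudy": (),
        "Sprinkler": ("Cloudy",),
        "Rain": ("Cloudy",),
        "WetGrass": ("Sprinkler", "Rain"),
    }
    TOPO = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # parents before children

    # CPT[X][parent_values] = P(X = True | parents = parent_values)
    CPT = {
        "Cloudy":    {(): 0.5},
        "Sprinkler": {(False,): 0.5, (True,): 0.9},
        "Rain":      {(False,): 0.2, (True,): 0.8},
        "WetGrass":  {(False, False): 0.01, (False, True): 0.90,
                      (True, False): 0.90,  (True, True): 0.99},
    }

    def local_prob(var, value, assignment):
        # P(var = value | parents of var, as given in assignment)
        p_true = CPT[var][tuple(assignment[p] for p in PARENTS[var])]
        return p_true if value else 1.0 - p_true

    def joint_prob(assignment):
        # Product of the local probabilities, i.e. the BN factorization.
        prob = 1.0
        for var in TOPO:
            prob *= local_prob(var, assignment[var], assignment)
        return prob

    # P(Cloudy=yes, Sprinkler=on, Rain=no, WetGrass=yes) = 0.5 * 0.9 * 0.2 * 0.9 = 0.081
    print(joint_prob({"Cloudy": True, "Sprinkler": True,
                      "Rain": False, "WetGrass": True}))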

Causal order recommended

Causes first, then effects. Causes render their direct consequences conditionally independent, which yields smaller CPTs. Causal CPTs are easier for human experts to assess, and smaller CPTs are easier to estimate reliably from a finite set of observations (data). Causal networks can also be used to make causal inferences.

Markov conditions

Local (parental) Markov condition: X is independent of its non-descendants (in particular, of its ancestors) given its parents.

Global Markov condition: X is independent of any set of other variables given its parents, its children, and the parents of its children (its Markov blanket).

D-separation: X and Y are dependent (d-connected) given Z if there is an unblocked path between X and Y, i.e. a path on which no non-collider node is in Z and every collider either is in Z or has a descendant in Z.
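Reading a Markov blanket off the graph structure is straightforward in code; a small sketch, using the ad hoc PARENTS dictionary from the sprinkler example above:

    def markov_blanket(var, parents):
        # parents ∪ children ∪ parents of children (minus the variable itself)
        children = [c for c, ps in parents.items() if var in ps]
        co_parents = {p for c in children for p in parents[c] if p != var}
        return set(parents[var]) | set(children) | co_parents

    print(markov_blanket("Sprinkler", PARENTS))  # Cloudy, Rain, WetGrass (in some order)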

Inference in Bayesian networks

Given a Bayesian network B (i.e., a DAG and its CPTs), calculate P(X | e), where X is a set of query variables and e is an instantiation of the observed variables E (X and E disjoint). There is always the way through marginals: normalize P(x, e) = Σ_{y ∈ dom(Y)} P(x, y, e), where dom(Y) is the set of all possible instantiations of the unobserved non-query variables Y. There are much smarter algorithms too, but in general the problem is NP-hard.
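A brute-force sketch of this "way through marginals" for the sprinkler network, reusing joint_prob and TOPO from the earlier sketch (the function name is ad hoc):

    from itertools import product

    def enumerate_query(query_var, evidence):
        # Return P(query_var | evidence) by summing the joint over all
        # unobserved non-query variables and then normalizing.
        unnormalized = {}
        hidden = [v for v in TOPO if v != query_var and v not in evidence]
        for q_value in (False, True):
            total = 0.0
            for values in product((False, True), repeat=len(hidden)):
                assignment = dict(evidence)
                assignment[query_var] = q_value
                assignment.update(zip(hidden, values))
                total += joint_prob(assignment)
            unnormalized[q_value] = total
        norm = sum(unnormalized.values())
        return {value: p / norm for value, p in unnormalized.items()}

    print(enumerate_query("Rain", {"WetGrass": True}))  # P(Rain | WetGrass=yes)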

Approximate inference in Bayesian networks

How to estimate how probable it is that it rains the next day, given that the previous night's temperature is above the monthly average? Count rainy and non-rainy days after warm nights (i.e., count relative frequencies).

Rejection sampling for P(X | e):
1. Generate random vectors (x_r, e_r, y_r).
2. Discard those that do not match e.
3. Count the frequencies of the different x_r and normalize.
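The three steps map directly to code; a sketch, assuming the forward_sample helper sketched after the next slide (and that at least some generated samples match e):

    from collections import Counter

    def rejection_sample(query_var, evidence, n_samples=100_000):
        counts = Counter()
        for _ in range(n_samples):
            sample = forward_sample()                      # 1. generate a random vector
            if any(sample[v] != val for v, val in evidence.items()):
                continue                                   # 2. discard mismatches with e
            counts[sample[query_var]] += 1                 # 3. count the query values
        total = sum(counts.values())
        return {value: count / total for value, count in counts.items()}

    print(rejection_sample("Rain", {"WetGrass": True}))  # estimate of P(Rain | WetGrass=yes)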

How to generate random vectors from a Bayesian network

Sample parents first, using the CPTs of the network above. For example:

Sample Cloudy from P(C) = (0.5, 0.5): get Cloudy=yes.
Sample Sprinkler from P(S | C=yes) = (0.9, 0.1): get Sprinkler=on.
Sample Rain from P(R | C=yes) = (0.8, 0.2): get Rain=no.
Sample WetGrass from P(W | S=on, R=no) = (0.9, 0.1): get WetGrass=yes.

The sampled vector has probability P(C,S,R,W) = P(yes, on, no, yes) = 0.5 x 0.9 x 0.2 x 0.9 = 0.081.
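The procedure above as a short sketch; it reuses CPT, PARENTS and TOPO from the earlier sketch, and forward_sample is the helper name assumed in the rejection-sampling sketch:

    import random

    def forward_sample():
        # Visit the variables in topological order (parents first) and sample
        # each one from its local probability given the sampled parent values.
        sample = {}
        for var in TOPO:
            parent_values = tuple(sample[p] for p in PARENTS[var])
            sample[var] = random.random() < CPT[var][parent_values]
        return sample

    print(forward_sample())  # e.g. {'Cloudy': True, 'Sprinkler': True, 'Rain': False, 'WetGrass': True}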

Rejection sampling, bad news

Good news first: it is super easy to implement. Bad news: if the evidence e is improbable, the generated random vectors seldom conform with e, so it takes a long time before we get a good estimate of P(X | e). With many evidence variables E, every instantiation e is improbable. So-called likelihood weighting can alleviate the problem a little, but not enough.

Gibbs sampling

Given a Bayesian network for the n variables X ∪ E ∪ Y, calculate P(X | e) as follows:

    N = (associative) array of zeros
    generate a random vector (x, y); e stays fixed at its observed values
    while True:
        for V in X ∪ Y:
            generate v from P(V | MarkovBlanket(V))
            replace the value of V in (x, y) with v
        N[x] += 1
        print normalize(N)   (the current estimate of P(X | e))

P(X | mb(X))?

By the global Markov condition, P(X | mb(X)) = P(X | mb(X), Rest) = P(X, mb(X), Rest) / P(mb(X), Rest), where Rest denotes all the remaining variables. Writing the joint as the product of local probabilities, P(All) = Π_i P(X_i | Pa(X_i)), every factor that does not mention X appears in both the numerator and the denominator and cancels (including the factors P(R | Pa(R)) for R ∈ Rest), so

P(X | mb(X)) ∝ P(X | Pa(X)) · Π_{C ∈ ch(X)} P(C | Pa(C)),

which only needs the local probability of X and the local probabilities of X's children.
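A concrete sketch of the Gibbs sampler for the sprinkler network, reusing CPT, PARENTS, TOPO and local_prob from the earlier sketches; P(V | mb(V)) is computed, up to normalization, from the local probability of V and of its children, exactly as in the formula above:

    import random
    from collections import Counter

    def markov_blanket_weights(var, assignment):
        # Unnormalized P(var=v | mb(var)) for v in {False, True}.
        children = [c for c, ps in PARENTS.items() if var in ps]
        weights = {}
        for value in (False, True):
            assignment[var] = value
            w = local_prob(var, value, assignment)
            for child in children:
                w *= local_prob(child, assignment[child], assignment)
            weights[value] = w
        return weights

    def gibbs_sample(query_var, evidence, n_sweeps=50_000):
        # Start from an arbitrary state that agrees with the evidence.
        state = {v: evidence.get(v, random.random() < 0.5) for v in TOPO}
        counts = Counter()
        for _ in range(n_sweeps):
            for var in TOPO:
                if var in evidence:
                    continue                   # evidence variables stay fixed
                weights = markov_blanket_weights(var, state)
                p_true = weights[True] / (weights[False] + weights[True])
                state[var] = random.random() < p_true
            counts[state[query_var]] += 1
        total = sum(counts.values())
        return {value: count / total for value, count in counts.items()}

    print(gibbs_sample("Rain", {"WetGrass": True}))  # compare with the rejection-sampling estimate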

Why does it work

Every ergodic ("decent") Markov chain q has a unique stationary distribution P* that can be estimated by simulation. Detailed balance between the transition function q and a state distribution P* implies stationarity of P*. The proposed transition function q, which resamples each V from P(V | mb(V)), and the distribution P(X | e) satisfy detailed balance; thus P(X | e) is the stationary distribution, so it can be estimated by simulation.

A Markov chain and its stationary distribution

A Markov chain is defined by the transition probabilities between states, q(x → x'), where x and x' belong to a set of states X. A distribution P* over X is called a stationary distribution of the Markov chain q if P*(x') = Σ_x P*(x) q(x → x'). P*(X) can be found by simulating the Markov chain q starting from a random state x_r.
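A toy sketch (transition probabilities invented for illustration) that estimates the stationary distribution of a two-state chain by simulation:

    import random
    from collections import Counter

    # q[x][x'] = probability of moving from state x to state x'
    q = {0: {0: 0.9, 1: 0.1},
         1: {0: 0.5, 1: 0.5}}

    state, counts = random.choice([0, 1]), Counter()
    for _ in range(200_000):
        state = 0 if random.random() < q[state][0] else 1
        counts[state] += 1

    total = sum(counts.values())
    print({s: c / total for s, c in counts.items()})  # close to P* = (5/6, 1/6)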

Markov chain detailed balance

A distribution P over X and a state transition distribution q are said to satisfy detailed balance if, for any states x and x', P(x) q(x → x') = P(x') q(x' → x), i.e. it is equally probable to witness a transition from x to x' as it is to witness a transition from x' to x. If P and q satisfy detailed balance, then Σ_x P(x) q(x → x') = Σ_x P(x') q(x' → x) = P(x') Σ_x q(x' → x) = P(x'), and thus P is stationary.
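For the toy chain above, detailed balance can be checked directly against P* = (5/6, 1/6), along with the fact that detailed balance implies stationarity:

    p_star = {0: 5 / 6, 1: 1 / 6}

    # P(x) q(x -> x') == P(x') q(x' -> x) for the only pair of distinct states
    balanced = abs(p_star[0] * q[0][1] - p_star[1] * q[1][0]) < 1e-12

    # Stationarity: sum_x P*(x) q(x -> x') == P*(x') for every x'
    stationary = all(
        abs(sum(p_star[x] * q[x][y] for x in q) - p_star[y]) < 1e-12
        for y in q
    )
    print(balanced, stationary)  # True True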

Gibbs sampler as a Markov chain

Consider Z = (X, Y) to be the states of a Markov chain, and let q((v, z_-V) → (v', z_-V)) = P(v' | z_-V, e), where Z_-V = Z \ {V}. Now P*(Z) = P(Z | e) and q satisfy detailed balance, so P* is a stationary distribution of q and it can be found with the sampling algorithm:

P*(z) q(z → z') = P(z | e) P(v' | z_-V, e)
 = P(v, z_-V | e) P(v' | z_-V, e)
 = P(v | z_-V, e) P(z_-V | e) P(v' | z_-V, e)
 = P(v | z_-V, e) P(v', z_-V | e)
 = q(z' → z) P*(z'),

which is detailed balance.