Inference in Bayesian networks


AIMA2e Chapter 14.4-5

Outline
- Exact inference by enumeration
- Exact inference by variable elimination
- Approximate inference by stochastic simulation
- Approximate inference by Markov chain Monte Carlo

Inference tasks

Simple queries: compute the posterior marginal P(X_i | E = e)
  e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)
Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)
Optimal decisions: decision networks include utility information;
  probabilistic inference required for P(outcome | action, evidence)
Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical?
Explanation: why do I need a new starter motor?

Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation.

Simple query on the burglary network (Burglary and Earthquake are parents of
Alarm; Alarm is the parent of JohnCalls and MaryCalls):

  P(B | j, m) = P(B, j, m) / P(j, m)
              = α P(B, j, m)
              = α Σ_e Σ_a P(B, e, a, j, m)

Rewrite full joint entries using products of CPT entries:

  P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)
              = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)

Recursive depth-first enumeration: O(n) space, O(d^n) time
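As a concreteness check that is not on the slides (a hand calculation using the standard AIMA alarm-network CPTs, only fragments of which survive in this transcription), the nested sums evaluate to:

\begin{align*}
P(b \mid j, m) &\propto P(b) \sum_e P(e) \sum_a P(a \mid b, e)\, P(j \mid a)\, P(m \mid a) \approx 0.000592, \\
P(\lnot b \mid j, m) &\propto 0.001493, \\
\mathbf{P}(B \mid j, m) &\approx \langle 0.284,\ 0.716 \rangle .
\end{align*}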

Enumeration algorithm

function ENUMERATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network with variables {X} ∪ E ∪ Y
  Q(X) ← a distribution over X, initially empty
  for each value x_i of X do
      extend e with value x_i for X
      Q(x_i) ← ENUMERATE-ALL(VARS[bn], e)
  return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
  if EMPTY?(vars) then return 1.0
  Y ← FIRST(vars)
  if Y has value y in e
      then return P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e)
      else return Σ_y P(y | Pa(Y)) × ENUMERATE-ALL(REST(vars), e_y)
           where e_y is e extended with Y = y

Evaluation tree

Enumeration is inefficient because of repeated computation:
e.g., it computes P(j | a) P(m | a) for each value of e.

(Figure: the evaluation tree for the query, branching on the values of E and A;
the branch labels .001, .002, .998, .95, .05, .94, .06, .70, .01 are CPT
entries, and the P(j | a) P(m | a) subtrees appear once per value of E.)
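A minimal runnable sketch of the two functions above (my own rendering, not from the slides), assuming a network represented as a topologically ordered list of (variable, parents, CPT) triples over Boolean variables, with each CPT mapping a tuple of parent values to P(var = true):

# Sketch of ENUMERATION-ASK / ENUMERATE-ALL for Boolean variables.
def prob(value, parents, cpt, e):
    # P(X = value | parent values taken from the partial event e)
    p_true = cpt[tuple(e[p] for p in parents)]
    return p_true if value else 1.0 - p_true

def enumerate_all(vars_, e):
    if not vars_:
        return 1.0
    (Y, parents, cpt), rest = vars_[0], vars_[1:]
    if Y in e:
        return prob(e[Y], parents, cpt, e) * enumerate_all(rest, e)
    return sum(prob(y, parents, cpt, e) * enumerate_all(rest, {**e, Y: y})
               for y in (True, False))

def enumeration_ask(X, e, bn):
    q = {x: enumerate_all(bn, {**e, X: x}) for x in (True, False)}
    z = sum(q.values())
    return {x: v / z for x, v in q.items()}

# Burglary network with the standard AIMA CPTs (only partly visible above).
burglary_bn = [
    ("B", (), {(): 0.001}),
    ("E", (), {(): 0.002}),
    ("A", ("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
    ("M", ("A",), {(True,): 0.70, (False,): 0.01}),
]

print(enumeration_ask("B", {"J": True, "M": True}, burglary_bn))
# approximately {True: 0.284, False: 0.716}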

Inference by variable elimination

Variable elimination: carry out summations right-to-left, storing intermediate
results (factors) to avoid recomputation.

  P(B | j, m) = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
                  (one factor per variable: B, E, A, J, M)
              = α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) f_M(a)
              = α P(B) Σ_e P(e) Σ_a P(a | B, e) f_J(a) f_M(a)
              = α P(B) Σ_e P(e) Σ_a f_A(a, b, e) f_J(a) f_M(a)
              = α P(B) Σ_e P(e) f_ĀJM(b, e)      (sum out A)
              = α P(B) f_ĒĀJM(b)                 (sum out E)
              = α f_B(b) × f_ĒĀJM(b)

Variable elimination: Basic operations

Summing out a variable from a product of factors:
- move any constant factors outside the summation
- add up submatrices in the pointwise product of the remaining factors

  Σ_x f_1 × ··· × f_k = f_1 × ··· × f_i × (Σ_x f_{i+1} × ··· × f_k)
                      = f_1 × ··· × f_i × f_X̄
  assuming f_1, ..., f_i do not depend on X

Pointwise product of factors f_1 and f_2:

  f_1(x_1, ..., x_j, y_1, ..., y_k) × f_2(y_1, ..., y_k, z_1, ..., z_l)
      = f(x_1, ..., x_j, y_1, ..., y_k, z_1, ..., z_l)

  E.g., f_1(a, b) × f_2(b, c) = f(a, b, c)
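A sketch of the two basic operations, under my own factor representation (a factor is a pair (vars, table), where table maps each full assignment, as a tuple of Booleans aligned with vars, to a number):

from itertools import product

def pointwise_product(f1, f2):
    # f1(X..., Y...) x f2(Y..., Z...) = f(X..., Y..., Z...)
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for vals in product((True, False), repeat=len(out_vars)):
        a = dict(zip(out_vars, vals))
        table[vals] = (t1[tuple(a[v] for v in vars1)] *
                       t2[tuple(a[v] for v in vars2)])
    return out_vars, table

def sum_out(var, f):
    # Add up the entries of f that differ only in the value of var.
    vars_, t = f
    out_vars = tuple(v for v in vars_ if v != var)
    table = {}
    for vals, p in t.items():
        key = tuple(val for name, val in zip(vars_, vals) if name != var)
        table[key] = table.get(key, 0.0) + p
    return out_vars, table

# E.g., f1(a, b) x f2(b, c) = f(a, b, c); summing out b leaves a factor over (a, c).
f1 = (("A", "B"), {v: 0.1 for v in product((True, False), repeat=2)})
f2 = (("B", "C"), {v: 0.2 for v in product((True, False), repeat=2)})
print(sum_out("B", pointwise_product(f1, f2))[0])   # ('A', 'C')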

Variable elimination algorithm

function ELIMINATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, evidence specified as an event
          bn, a belief network specifying joint distribution P(X_1, ..., X_n)
  factors ← []; vars ← REVERSE(VARS[bn])
  for each var in vars do
      factors ← [MAKE-FACTOR(var, e) | factors]
      if var is a hidden variable then factors ← SUM-OUT(var, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))

Irrelevant variables

Consider the query P(JohnCalls | Burglary = true):

  P(J | b) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(J | a) Σ_m P(m | a)

The sum over m is identically 1, so M is irrelevant to the query.

Thm 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)

Here X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake},
so MaryCalls is irrelevant.
(Compare this to backward chaining from the query in Horn clause KBs.)
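Theorem 1 suggests a simple preprocessing step: keep only the query, the evidence variables, and their ancestors, and drop everything else before running variable elimination. A small sketch of that pruning (my own helper, assuming the structure is given as a dict mapping each variable to its parents):

def relevant_variables(query, evidence_vars, parents):
    # Keep {query} ∪ evidence and all of their ancestors.
    keep, frontier = set(), [query, *evidence_vars]
    while frontier:
        v = frontier.pop()
        if v not in keep:
            keep.add(v)
            frontier.extend(parents[v])
    return keep

parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
# Query JohnCalls with evidence {Burglary}: MaryCalls is pruned.
print(relevant_variables("J", {"B"}, parents))   # {'J', 'A', 'B', 'E'} (in some order)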

Irrelevant variables contd.

Defn: the moral graph of a Bayes net: marry all parents and drop the arrows.
Defn: A is m-separated from B by C iff A and B are separated by C in the moral graph.

Thm 2: Y is irrelevant if it is m-separated from X by E.

For P(JohnCalls | Alarm = true), both Burglary and Earthquake are irrelevant.

Complexity of exact inference

Singly connected networks (or polytrees):
- any two nodes are connected by at most one (undirected) path
- time and space cost of variable elimination are O(d^k n)

Multiply connected networks:
- can reduce 3SAT to exact inference, hence NP-hard
- equivalent to counting 3SAT models, hence #P-complete

(Figure: a 3-CNF formula over A, B, C, D, with 0.5 priors on the proposition
variables, three clause nodes 1-3, and an AND node, encoded as a Bayes net.)
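A sketch of the m-separation test behind Thm 2: build the moral graph (connect the co-parents of each child, drop arrow directions), then check whether X and Y are still connected once the evidence nodes are blocked. The dict-of-parents representation is an assumption of this sketch:

from collections import deque

def moral_graph(parents):
    # Undirected adjacency: child-parent edges plus edges between co-parents.
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for a in ps:
            for b in ps:
                if a != b:
                    adj[a].add(b)     # "marry" the parents
    return adj

def m_separated(x, y, evidence, parents):
    # True iff every moral-graph path from x to y passes through an evidence node.
    adj = moral_graph(parents)
    seen, queue = {x}, deque([x])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w == y:
                return False
            if w not in seen and w not in evidence:
                seen.add(w)
                queue.append(w)
    return True

parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
# For P(JohnCalls | Alarm = true): Burglary is m-separated from JohnCalls by {A}.
print(m_separated("B", "J", {"A"}, parents))   # True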

Inference by stochastic simulation

Basic idea:
1) Draw N samples from a sampling distribution S
2) Compute an approximate posterior probability P̂
3) Show this converges to the true probability P
(Figure: a coin labeled 0.5.)

Outline:
- Sampling from an empty network
- Rejection sampling: reject samples disagreeing with evidence
- Likelihood weighting: use evidence to weight samples
- Markov chain Monte Carlo (MCMC): sample from a stochastic process whose
  stationary distribution is the true posterior

Sampling from an empty network

function PRIOR-SAMPLE(bn) returns an event sampled from bn
  inputs: bn, a belief network specifying joint distribution P(X_1, ..., X_n)
  x ← an event with n elements
  for i = 1 to n do
      x_i ← a random sample from P(X_i | Parents(X_i))
  return x
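A runnable sketch of PRIOR-SAMPLE on the sprinkler network used in the example that follows; the CPT values are the standard AIMA ones, of which only fragments survive in this transcription:

import random

# Topological order: (variable, parents, CPT mapping parent values to P(var = true)).
SPRINKLER_BN = [
    ("Cloudy",    (),          {(): 0.5}),
    ("Sprinkler", ("Cloudy",), {(True,): 0.1, (False,): 0.5}),
    ("Rain",      ("Cloudy",), {(True,): 0.8, (False,): 0.2}),
    ("WetGrass",  ("Sprinkler", "Rain"),
                  {(True, True): 0.99, (True, False): 0.90,
                   (False, True): 0.90, (False, False): 0.00}),
]

def prior_sample(bn):
    # Sample each variable in topological order from P(X_i | Parents(X_i)).
    event = {}
    for var, parents, cpt in bn:
        p_true = cpt[tuple(event[p] for p in parents)]
        event[var] = random.random() < p_true
    return event

print(prior_sample(SPRINKLER_BN))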

Example (shown over several slides): a step-by-step run of PRIOR-SAMPLE on the
sprinkler network (Cloudy, Sprinkler, Rain, WetGrass), sampling each variable
in topological order from its CPT; the repeated network figures and CPT
fragments are omitted here.

Sampling from an empty network contd.

Probability that PRIOR-SAMPLE generates a particular event:

  S_PS(x_1, ..., x_n) = Π_{i=1..n} P(x_i | Parents(X_i)) = P(x_1, ..., x_n)

i.e., the true prior probability.

E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)

Let N_PS(x_1, ..., x_n) be the number of samples generated for the event
x_1, ..., x_n. Then we have

  lim_{N→∞} P̂(x_1, ..., x_n) = lim_{N→∞} N_PS(x_1, ..., x_n) / N
                              = S_PS(x_1, ..., x_n)
                              = P(x_1, ..., x_n)

That is, estimates derived from PRIOR-SAMPLE are consistent.

Shorthand: P̂(x_1, ..., x_n) ≈ P(x_1, ..., x_n)
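Reusing SPRINKLER_BN and prior_sample from the sketch above, a quick empirical check that the frequency of the event (Cloudy = t, Sprinkler = f, Rain = t, WetGrass = t) approaches 0.324:

target = {"Cloudy": True, "Sprinkler": False, "Rain": True, "WetGrass": True}
N = 100_000
hits = sum(prior_sample(SPRINKLER_BN) == target for _ in range(N))
print(hits / N)   # converges to 0.5 * 0.9 * 0.8 * 0.9 = 0.324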

Rejection sampling

P̂(X | e) is estimated from the samples that agree with e.

function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X | e)
  local variables: N, a vector of counts over X, initially zero
  for j = 1 to N do
      x ← PRIOR-SAMPLE(bn)
      if x is consistent with e then
          N[x] ← N[x] + 1 where x is the value of X in x
  return NORMALIZE(N[X])

E.g., estimate P(Rain | Sprinkler = true) using 100 samples:
  27 samples have Sprinkler = true;
  of these, 8 have Rain = true and 19 have Rain = false.

  P̂(Rain | Sprinkler = true) = NORMALIZE(<8, 19>) = <0.296, 0.704>

Similar to a basic real-world empirical estimation procedure.

Analysis of rejection sampling

  P̂(X | e) = α N_PS(X, e)          (algorithm defn.)
            = N_PS(X, e) / N_PS(e)  (normalized by N_PS(e))
            ≈ P(X, e) / P(e)        (property of PRIOR-SAMPLE)
            = P(X | e)              (defn. of conditional probability)

Hence rejection sampling returns consistent posterior estimates.

Problem: hopelessly expensive if P(e) is small;
P(e) drops off exponentially with the number of evidence variables!
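A sketch of REJECTION-SAMPLING in the same style, again reusing SPRINKLER_BN and prior_sample from the prior-sampling sketch above:

def rejection_sampling(X, e, bn, n_samples):
    # Estimate P(X | e) from the prior samples that agree with the evidence e.
    counts = {True: 0, False: 0}
    for _ in range(n_samples):
        sample = prior_sample(bn)
        if all(sample[var] == val for var, val in e.items()):
            counts[sample[X]] += 1
    total = sum(counts.values()) or 1   # guard against every sample being rejected
    return {value: c / total for value, c in counts.items()}

# E.g., estimate P(Rain | Sprinkler = true); most samples are rejected.
print(rejection_sampling("Rain", {"Sprinkler": True}, SPRINKLER_BN, 10_000))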

Likelihood weighting

Idea: fix the evidence variables, sample only the nonevidence variables, and
weight each sample by the likelihood it accords the evidence.

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X | e)
  local variables: W, a vector of weighted counts over X, initially zero
  for j = 1 to N do
      x, w ← WEIGHTED-SAMPLE(bn)
      W[x] ← W[x] + w where x is the value of X in x
  return NORMALIZE(W[X])

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
  x ← an event with n elements; w ← 1
  for i = 1 to n do
      if X_i has a value x_i in e
          then w ← w × P(X_i = x_i | Parents(X_i))
          else x_i ← a random sample from P(X_i | Parents(X_i))
  return x, w
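A sketch of WEIGHTED-SAMPLE and LIKELIHOOD-WEIGHTING for the same (variable, parents, CPT) representation, assuming SPRINKLER_BN from the prior-sampling sketch above:

import random

def weighted_sample(bn, e):
    # Fix evidence variables, sample the rest, and accumulate the evidence likelihood.
    event, w = {}, 1.0
    for var, parents, cpt in bn:
        p_true = cpt[tuple(event[p] for p in parents)]
        if var in e:
            event[var] = e[var]
            w *= p_true if e[var] else 1.0 - p_true
        else:
            event[var] = random.random() < p_true
    return event, w

def likelihood_weighting(X, e, bn, n_samples):
    W = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        event, w = weighted_sample(bn, e)
        W[event[X]] += w
    total = sum(W.values())
    return {value: weight / total for value, weight in W.items()}

print(likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True},
                           SPRINKLER_BN, 10_000))

With the evidence Sprinkler = true and WetGrass = true, only Cloudy and Rain are actually sampled; the two evidence CPT lookups supply the weight, matching the 1.0 × 0.1 × 0.99 = 0.099 walkthrough summarized below.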

Likelihood weighting example (shown over several slides): a run on the
sprinkler network with evidence Sprinkler = true and WetGrass = true. The
weight starts at w = 1.0, is multiplied by P(s | c) = 0.1 when the evidence
variable Sprinkler is reached, and by P(w | s, r) = 0.99 when WetGrass is
reached, giving w = 1.0 × 0.1 × 0.99 = 0.099.

Likelihood weighting analysis

Sampling probability for WEIGHTED-SAMPLE is

  S_WS(z, e) = Π_{i=1..l} P(z_i | Parents(Z_i))

Note: it pays attention to evidence in ancestors only, so it lies somewhere in
between the prior and the posterior distribution.

Weight for a given sample z, e is

  w(z, e) = Π_{i=1..m} P(e_i | Parents(E_i))

Weighted sampling probability is

  S_WS(z, e) w(z, e) = Π_{i=1..l} P(z_i | Parents(Z_i)) × Π_{i=1..m} P(e_i | Parents(E_i))
                     = P(z, e)    (by the standard global semantics of the network)

Hence likelihood weighting returns consistent estimates, but performance still
degrades with many evidence variables, because a few samples have nearly all
the total weight.

Approximate inference using MCMC

State of the network = current assignment to all variables.
Generate the next state by sampling one variable given its Markov blanket.
Sample each variable in turn, keeping the evidence fixed.

function MCMC-ASK(X, e, bn, N) returns an estimate of P(X | e)
  local variables: N[X], a vector of counts over X, initially zero
                   Z, the nonevidence variables in bn
                   x, the current state of the network, initially copied from e
  initialize x with random values for the variables in Z
  for j = 1 to N do
      N[x] ← N[x] + 1 where x is the value of X in x
      for each Z_i in Z do
          sample the value of Z_i in x from P(Z_i | MB(Z_i)),
          given the values of MB(Z_i) in x
  return NORMALIZE(N[X])

Can also choose a variable to sample at random each time.
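A sketch of MCMC-ASK (Gibbs sampling) for the same representation, assuming SPRINKLER_BN from the prior-sampling sketch above; the Markov-blanket conditional is computed by scoring each candidate value against the variable's own CPT and its children's CPTs, as described on the "Markov blanket sampling" slide below:

import random

def p_given_markov_blanket(var, state, bn):
    # Return P(var = true | MB(var)): own CPT times the CPTs of var's children.
    scores = {}
    for value in (True, False):
        s = dict(state, **{var: value})
        p = 1.0
        for v, parents, cpt in bn:
            if v == var or var in parents:          # var itself and its children
                p_true = cpt[tuple(s[q] for q in parents)]
                p *= p_true if s[v] else 1.0 - p_true
        scores[value] = p
    return scores[True] / (scores[True] + scores[False])

def mcmc_ask(X, e, bn, n_samples):
    nonevidence = [v for v, _, _ in bn if v not in e]
    state = dict(e)
    for v in nonevidence:                           # random initial state
        state[v] = random.random() < 0.5
    counts = {True: 0, False: 0}
    for _ in range(n_samples):
        counts[state[X]] += 1
        for v in nonevidence:                       # resample each variable in turn
            state[v] = random.random() < p_given_markov_blanket(v, state, bn)
    total = sum(counts.values())
    return {value: c / total for value, c in counts.items()}

print(mcmc_ask("Rain", {"Sprinkler": True, "WetGrass": True}, SPRINKLER_BN, 10_000))

Unlike likelihood weighting, every CPT in the resampled variable's Markov blanket is consulted, so downstream evidence such as WetGrass influences the samples directly.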

The Markov chain

With Sprinkler = true and WetGrass = true fixed, there are four states, one for
each assignment to Cloudy and Rain.
(Figure: the four states of the sprinkler network, with transition arrows
between them.)

Wander about for a while, average what you see.

MCMC example contd.

Estimate P(Rain | Sprinkler = true, WetGrass = true):
sample Cloudy or Rain given its Markov blanket, repeat;
count the number of times Rain is true and false in the samples.

E.g., visit 100 states; 31 have Rain = true, 69 have Rain = false:

  P̂(Rain | Sprinkler = true, WetGrass = true) = NORMALIZE(<31, 69>) = <0.31, 0.69>

Theorem: the chain approaches its stationary distribution, i.e. the long-run
fraction of time spent in each state is exactly proportional to its posterior
probability.
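For instance (a hand computation that is not on the slide, using the standard sprinkler CPTs and anticipating the Markov-blanket formula on the next slide), resampling Rain in the state where Cloudy, Sprinkler, and WetGrass are all true requires only its Markov blanket:

\begin{align*}
P(r \mid c, s, w) &\propto P(r \mid c)\, P(w \mid s, r) = 0.8 \times 0.99 = 0.792, \\
P(\lnot r \mid c, s, w) &\propto P(\lnot r \mid c)\, P(w \mid s, \lnot r) = 0.2 \times 0.90 = 0.18, \\
P(r \mid c, s, w) &= 0.792 / (0.792 + 0.18) \approx 0.815 .
\end{align*}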

Markov blanket sampling

The Markov blanket of Cloudy is Sprinkler and Rain;
the Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass.

The probability given the Markov blanket is calculated as follows:

  P(x_i | MB(X_i)) = P(x_i | Parents(X_i)) × Π_{Z_j ∈ Children(X_i)} P(z_j | Parents(Z_j))

Easily implemented in message-passing parallel systems, brains.

Main computational problems:
1) Difficult to tell if convergence has been achieved
2) Can be wasteful if the Markov blanket is large:
   P(X_i | MB(X_i)) won't change much (law of large numbers)

Summary

Exact inference by variable elimination:
- polytime on polytrees, NP-hard on general graphs
- space = time, very sensitive to topology

Approximate inference by LW, MCMC:
- LW does poorly when there is lots of (downstream) evidence
- LW, MCMC generally insensitive to topology
- convergence can be very slow with probabilities close to 1 or 0
- can handle arbitrary combinations of discrete and continuous variables