Artificial Intelligence Bayesian Networks


Lecture 11: Bayesian Networks
Stephan Dreiseitl, FH Hagenberg, Software Engineering & Interactive Media
Artificial Intelligence, SS2010

Overview
Representation of uncertain knowledge
Constructing Bayesian networks
Using Bayesian networks for inference
Algorithmic aspects of inference

A simple Bayesian network example
Network: Rain → Worms, Rain → Umbrellas
P(Rain, Worms, Umbrellas) = P(Worms | Rain) P(Umbrellas | Rain) P(Rain)
With conditional independence, only the right-hand side is needed to represent the joint distribution.

A simple Bayesian network example (cont.)
Intuitively: a graphical representation of influence
Mathematically: a graphical representation of conditional independence assertions

A more complicated example
Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls

Burglary:   P(b) = 0.001
Earthquake: P(e) = 0.002

Alarm:
  B  E  P(a | B, E)
  T  T  0.95
  T  F  0.94
  F  T  0.24
  F  F  0.001

MaryCalls:
  A  P(m | A)
  T  0.7
  F  0.01

JohnCalls:
  A  P(j | A)
  T  0.9
  F  0.05

Definition of Bayesian networks
A Bayesian network is a directed acyclic graph with
random variables as nodes,
links that specify "directly influences" relationships,
probability distributions P(X_i | parents(X_i)) for each node X_i.
The graph structure asserts conditional independencies, e.g.
P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm)

Bayesian networks as joint probabilities
P(X_1, ..., X_n) = ∏_{i=1}^n P(X_i | X_1, ..., X_{i-1}) = ∏_{i=1}^n P(X_i | parents(X_i))
for parents(X_i) ⊆ {X_1, ..., X_{i-1}}

Burglary example:
P(b, ¬e, a, ¬m, j) = P(b) P(¬e) P(a | b, ¬e) P(¬m | a) P(j | a)
                   = 0.001 · 0.998 · 0.94 · 0.3 · 0.9 ≈ 0.00025
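To make the factorization concrete, here is a minimal Python sketch (the function names and encoding are mine, not from the slides) that multiplies out exactly this chain of CPT entries for the burglary network:

```python
# Joint probability of one complete assignment in the burglary network,
# using P(X1, ..., Xn) = prod_i P(Xi | parents(Xi)).
# CPT values as in the "more complicated example" slide.

def p_alarm(a, b, e):
    """P(Alarm = a | Burglary = b, Earthquake = e)."""
    p_true = {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.24, (False, False): 0.001}[(b, e)]
    return p_true if a else 1 - p_true

def joint(b, e, a, m, j):
    p_b = 0.001 if b else 0.999
    p_e = 0.002 if e else 0.998
    p_m = (0.7 if a else 0.01) if m else (0.3 if a else 0.99)
    p_j = (0.9 if a else 0.05) if j else (0.1 if a else 0.95)
    return p_b * p_e * p_alarm(a, b, e) * p_m * p_j

# P(b, ~e, a, ~m, j) = 0.001 * 0.998 * 0.94 * 0.3 * 0.9
print(joint(b=True, e=False, a=True, m=False, j=True))   # ~0.00025
```

Summing this function over all 2^5 assignments gives 1, which is a quick sanity check of the CPT entries.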

Conditional independencies in networks
Use the graphical structure to visualize conditional dependencies and independencies:
Nodes are dependent if there is information flow between them (along at least one possible path).
Nodes are independent if information flow is blocked (along all possible paths).
Distinguish situations with and without evidence (instantiated variables).

Conditional independencies in networks
No evidence: information flow along a path is blocked iff there is a head-to-head node (a blocker) on the path.
(Diagrams: paths between A and B with no blocker, and a path A → C ← B with blocker C.)

Conditional independencies in networks
Evidence blocks information flow, except at blockers (or their descendants), where it opens information flow.
(Diagrams: two paths between A and B that are blocked by evidence.)

Conditional independencies in networks
(Diagrams: two paths between A and B that are unblocked by evidence.)

Conditional independencies in networks
A node is conditionally independent of its non-descendants, given its parents.
(Figure: node X with parents P1, P2 and children C1, C2; A, B, and D are non-descendants of X.)
P(X | P1, P2, A, B, D) = P(X | P1, P2)

Conditional independencies in networks (cont.)
A node is conditionally independent of all other nodes in the network, given its Markov blanket: its parents, children, and children's parents.
(Figure: the same graph as before; the Markov blanket of X is {P1, P2, C1, C2, A, B}.)
P(X | P1, P2, C1, C2, A, B, D) = P(X | P1, P2, C1, C2, A, B)
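As a small illustration (the dictionary encoding is mine, and placing D as an unrelated root node is an assumption for the sketch), the Markov blanket can be read directly off a parents dictionary:

```python
# Markov blanket = parents + children + children's other parents.
# Graph as on the slide: X has parents P1, P2 and children C1, C2;
# A and B are the children's other parents; D is assumed here to be
# an unrelated root node outside the blanket.
parents = {
    "P1": [], "P2": [], "A": [], "B": [], "D": [],
    "X": ["P1", "P2"],
    "C1": ["X", "A"],
    "C2": ["X", "B"],
}

def markov_blanket(node):
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket

print(markov_blanket("X"))   # {'P1', 'P2', 'C1', 'C2', 'A', 'B'}
```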

Noisy OR
For a Boolean node X with n Boolean parents, the conditional probability table has 2^n entries.
The noisy-OR assumption reduces this number to n: assume each parent may be inhibited independently.
Network: Flu → Fever, Malaria → Fever, Cold → Fever

Noisy OR (cont.)
Only the first three entries of the table need to be specified:

  Flu  Malaria  Cold  P(¬fever)
  T    F        F     0.2
  F    T        F     0.1
  F    F        T     0.6
  F    F        F     1.0
  F    T        T     0.1 · 0.6 = 0.06
  T    F        T     0.2 · 0.6 = 0.12
  T    T        F     0.2 · 0.1 = 0.02
  T    T        T     0.2 · 0.1 · 0.6 = 0.012
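A minimal sketch of the noisy-OR computation (function and variable names are mine): one inhibition probability per parent suffices to reproduce the whole table.

```python
# Noisy OR: each active parent independently fails to cause Fever,
# with the inhibition probabilities from the first three table rows.
inhibit = {"Flu": 0.2, "Malaria": 0.1, "Cold": 0.6}

def p_no_fever(active_parents):
    """P(~fever | parent settings), under the noisy-OR assumption."""
    q = 1.0
    for parent in active_parents:
        q *= inhibit[parent]
    return q

for active in ([], ["Flu"], ["Malaria", "Cold"], ["Flu", "Malaria", "Cold"]):
    print(active, p_no_fever(active))   # 1.0, 0.2, 0.06, 0.012
```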

Building an example network
"When I go home at night, I want to know if my family is home before I try the doors (perhaps the most convenient door to enter is double locked when nobody is home). Now, often when my wife leaves the house she turns on an outdoor light. However, she sometimes turns on this light if she is expecting a guest. Also, we have a dog. When nobody is home, the dog is put in the back yard. The same is true if the dog has bowel trouble. Finally, if the dog is in the back yard, I will probably hear her barking, but sometimes I can be confused by other dogs barking."
F. Jensen, An Introduction to Bayesian Networks, UCL Press, 1996.

Building an example network (cont.)
Relevant entities: Boolean random variables FamilyOut, LightsOn, HearDogBark.
Causal structure: FamilyOut has a direct influence on both LightsOn and HearDogBark, so LightsOn and HearDogBark are conditionally independent given FamilyOut.
Network: LightsOn ← FamilyOut → HearDogBark

Building an example network (cont.)
The numbers in the conditional probability tables are derived from previous experience or from subjective belief:
P(familyout) = 0.2
P(lightson | familyout) = 0.99
P(lightson | ¬familyout) = 0.1
We run into a problem with P(heardogbark | familyout): the dog may be out because of bowel problems, and the barking may come from other dogs. The network structure needs to be updated to reflect this.

Building an example network (cont.)
Introduce a mediating variable DogOut to model the uncertainty due to bowel problems and to hearing other dogs bark.
Network: FamilyOut → LightsOn, FamilyOut → DogOut ← BowelProblems, DogOut → HearDogBark
Need: P(DogOut | FamilyOut, BowelProblems) and P(HearDogBark | DogOut)

Building an example network (cont.)
This gives the following additional probability tables:

FamilyOut:     P(f) = 0.2
BowelProblems: P(b) = 0.05

LightsOn:
  F  P(l | F)
  T  0.99
  F  0.1

DogOut:
  F  B  P(d | F, B)
  T  T  0.99
  T  F  0.88
  F  T  0.96
  F  F  0.2

HearDogBark:
  D  P(h | D)
  T  0.6
  F  0.25

Inference in Bayesian networks
Given events (instantiated variables) e and no information on the hidden variables H, calculate the distribution of the query variable Q.
Algorithmically, calculate P(Q | e) by marginalizing over H:
P(Q | e) = α P(Q, e) = α Σ_h P(Q, e, h)
where h ranges over all possible value combinations of H.
Distinguish between causal, diagnostic, and intercausal reasoning.

Types of inference
Causal reasoning: the query variable is downstream of the events
P(heardogbark | familyout) = 0.56
Diagnostic reasoning: the query variable is upstream of the events
P(familyout | heardogbark) = 0.296
Explaining away (intercausal reasoning): knowing the effect and one possible cause reduces the probability of the other possible causes
P(familyout | heardogbark) = 0.296 vs. P(familyout | bowelproblems, heardogbark) = 0.203
P(bowelproblems | heardogbark) = 0.078 vs. P(bowelproblems | familyout, heardogbark) = 0.053

Algorithmic aspects of inference
Calculating the joint distribution is computationally expensive.
Several alternatives for inference in Bayesian networks:
Exact inference: by enumeration, or by variable elimination.
Stochastic inference (Monte Carlo methods): by sampling from the joint distribution, by rejection sampling, by likelihood weighting, or by Markov chain Monte Carlo methods.

Inference by enumeration
FamilyOut example (d' ∈ {d, ¬d}, b' ∈ {b, ¬b}):
P(F | l, h) = α P(F, l, h) = α Σ_{d'} Σ_{b'} P(F, l, h, d', b')
P(f | l, h) = α Σ_{d'} Σ_{b'} P(f) P(b') P(l | f) P(d' | f, b') P(h | d')
            = α P(f) P(l | f) Σ_{d'} P(h | d') Σ_{b'} P(b') P(d' | f, b')
            = α · 0.2 · 0.99 · (0.6 · 0.8857 + 0.25 · 0.1143)
            = α · 0.111
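The same calculation as a brute-force Python sketch (the dictionary encoding and names are mine): it enumerates the hidden variables D and B and normalizes, reproducing P(F | l, h) ≈ (0.806, 0.194).

```python
from itertools import product

# FamilyOut network CPTs, as given on the earlier slides.
P_F = 0.2                                      # P(FamilyOut)
P_B = 0.05                                     # P(BowelProblems)
P_L = {True: 0.99, False: 0.1}                 # P(LightsOn = T | F)
P_D = {(True, True): 0.99, (True, False): 0.88,
       (False, True): 0.96, (False, False): 0.2}   # P(DogOut = T | F, B)
P_H = {True: 0.6, False: 0.25}                 # P(HearDogBark = T | D)

def bern(p_true, value):
    return p_true if value else 1 - p_true

def joint(f, b, l, d, h):
    return (bern(P_F, f) * bern(P_B, b) * bern(P_L[f], l)
            * bern(P_D[(f, b)], d) * bern(P_H[d], h))

def query_family_out(l, h):
    """P(F | LightsOn = l, HearDogBark = h), enumerating D and B."""
    unnorm = {f: sum(joint(f, b, l, d, h)
                     for d, b in product((True, False), repeat=2))
              for f in (True, False)}
    total = sum(unnorm.values())
    return {f: p / total for f, p in unnorm.items()}

print(query_family_out(l=True, h=True))   # ~{True: 0.806, False: 0.194}
```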

Inference by enumeration (cont.)
Similarly, P(¬f | l, h) = α · 0.0267.
Since P(f | l, h) + P(¬f | l, h) = 1, normalization yields
P(F | l, h) = α · (0.111, 0.0267) = (0.806, 0.194)

Burglary example:
P(B | j, m) = α P(B) Σ_{e'} P(e') Σ_{a'} P(a' | B, e') P(j | a') P(m | a')
The last two factors P(j | a') P(m | a') do not depend on e', but are evaluated twice (once for e and once for ¬e).

Variable elimination
Eliminate repetitive calculations by summing from the inside out and storing intermediate results (cf. dynamic programming).
Burglary example, different query:
P(J | b) = α P(b) Σ_{e'} P(e') Σ_{a'} P(a' | b, e') P(J | a') Σ_{m'} P(m' | a')
where the last sum Σ_{m'} P(m' | a') equals 1.
Fact: any variable that is not an ancestor of the query or evidence variables is irrelevant and can be dropped.

Sampling from the joint distribution
Straightforward if there is no evidence in the network:
Sample each variable in topological order.
For nodes without parents, sample from their distribution; for nodes with parents, sample from the conditional distribution given the sampled parent values.
With N_S(x_1, ..., x_n) denoting the number of times the specific realization (x_1, ..., x_n) is generated in N sampling experiments,
lim_{N→∞} N_S(x_1, ..., x_n) / N = P(x_1, ..., x_n)

Example: joint distribution sampling
(CPTs of the FamilyOut network as given earlier.)
What is the probability that the family is at home, the dog has no bowel problems and isn't out, the light is off, and a dog's barking can be heard, i.e. P(¬f, ¬b, ¬l, ¬d, h)?

Example: joint distribution sampling
FamilyOut example: generate 100000 samples from the network by first sampling the FamilyOut and BowelProblems variables, then sampling all other variables in turn, given the sampled parent values.
Obtain N_S(¬f, ¬b, ¬l, ¬d, h) = 13740.
Compare with P(¬f, ¬b, ¬l, ¬d, h) = 0.8 · 0.95 · 0.9 · 0.8 · 0.25 = 0.1368.

Example: joint distribution sampling
Advantage of sampling: it is easy to generate estimates for other probabilities.
The standard error of the estimates drops as 1/√N; for N = 100000 this is 0.00316.
N_S(¬d) / 100000 = 0.63393   (exact: P(¬d) = 0.63246)
N_S(f, ¬h) / N_S(¬h) = 0.1408   (exact: P(f | ¬h) = 0.1416)
The last example is a form of rejection sampling.
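A sketch of this sampling procedure (names are mine; the CPT dictionaries repeat those of the enumeration sketch so the block runs on its own). The printed estimates fluctuate around the exact values quoted above.

```python
import random

# FamilyOut network CPTs (as in the enumeration sketch above).
P_F, P_B = 0.2, 0.05
P_L = {True: 0.99, False: 0.1}
P_D = {(True, True): 0.99, (True, False): 0.88,
       (False, True): 0.96, (False, False): 0.2}
P_H = {True: 0.6, False: 0.25}

def prior_sample():
    """Sample all variables in topological order (no evidence)."""
    f = random.random() < P_F
    b = random.random() < P_B
    l = random.random() < P_L[f]
    d = random.random() < P_D[(f, b)]
    h = random.random() < P_H[d]
    return f, b, l, d, h

N = 100_000
samples = [prior_sample() for _ in range(N)]

# Estimate P(~f, ~b, ~l, ~d, h); exact value 0.1368.
print(sum(s == (False, False, False, False, True) for s in samples) / N)

# Estimate P(f | ~h) by keeping only samples with h = False (rejection).
no_bark = [s for s in samples if not s[4]]
print(sum(s[0] for s in no_bark) / len(no_bark))   # exact value ~0.1416
```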

Rejection sampling in Bayesian networks
A method to approximate conditional probabilities P(X | e) of variables X, given evidence e:
P(X | e) ≈ N_S(X, e) / N_S(e)
Rejection sampling: take only those samples into account that are consistent with the evidence.
Problem with rejection sampling: the number of samples consistent with the evidence drops exponentially with the number of evidence variables, so the method is unusable for real-life networks.

Likelihood weighting
Fix the evidence and sample all other variables.
This overcomes the shortcoming of rejection sampling by generating only samples that are consistent with the evidence.
Problem: consider a situation with P(E = e | X = x) = 0.001 and P(X = x) = 0.9. Then 90% of the samples will have X = x (with E = e fixed), but this combination is very unlikely, since P(E = e | X = x) = 0.001.
Solution: weight each sample by the product of the conditional probabilities of the evidence variables, given their parents.

Example: likelihood weighting
FamilyOut example: calculate P(F | l, d).
Iterate the following:
sample all non-evidence variables, given the evidence variables, obtaining, e.g., (¬f, ¬b, h);
calculate the weighting factor, e.g. P(l | ¬f) · P(d | ¬f, ¬b) = 0.1 · 0.2 = 0.02.
Finally, sum and normalize the weighting factors of the samples with f and with ¬f, i.e. of (f, l, d) and (¬f, l, d).

Example: likelihood weighting (cont.)
For N = 100000, obtain
N_S(f, l, d) = 20164,  N_S(¬f, l, d) = 79836
w(f, l, d) = 17676.4,  w(¬f, l, d) = 1907.18
P(f | l, d) ≈ 17676.4 / (17676.4 + 1907.18) = 0.90261
(correct value: P(f | l, d) = 0.90206)
Disadvantage of likelihood weighting: with many evidence variables, most samples will have very small weights, and the few samples with larger weights dominate the estimate.
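A sketch of likelihood weighting for this query (names are mine; CPTs repeated for self-containment). Each sample fixes the evidence LightsOn = T, DogOut = T, samples the remaining variables, and accumulates the weight P(l | f) · P(d | f, b).

```python
import random

# FamilyOut network CPTs (as in the earlier sketches).
P_F, P_B = 0.2, 0.05
P_L = {True: 0.99, False: 0.1}
P_D = {(True, True): 0.99, (True, False): 0.88,
       (False, True): 0.96, (False, False): 0.2}
P_H = {True: 0.6, False: 0.25}

def weighted_sample():
    """Evidence l, d fixed to True; sample F, B, H; return (f, weight)."""
    f = random.random() < P_F
    b = random.random() < P_B
    weight = P_L[f] * P_D[(f, b)]          # P(l | f) * P(d | f, b)
    _h = random.random() < P_H[True]       # sampled, but unused for this query
    return f, weight

N = 100_000
w_f = w_not_f = 0.0
for _ in range(N):
    f, w = weighted_sample()
    if f:
        w_f += w
    else:
        w_not_f += w

print(w_f / (w_f + w_not_f))   # P(f | l, d), should be close to 0.902
```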

Markov chains
A sequence of discrete random variables X_0, X_1, ... is called a Markov chain with state space S iff
P(X_n = x_n | X_0 = x_0, ..., X_{n-1} = x_{n-1}) = P(X_n = x_n | X_{n-1} = x_{n-1})
for all x_0, ..., x_n ∈ S.
Thus, X_n is conditionally independent of all variables before it, given X_{n-1}.
Specify a state transition matrix P with P_ij = P(X_n = x_j | X_{n-1} = x_i).

Markov chain Monte Carlo methods
We want to obtain samples from a given distribution P_d(X) that is hard to sample from with other methods.
Idea: construct a Markov chain that, for an arbitrary initial state x_0, converges towards the stationary (equilibrium) distribution P_d(X).
Then successive realizations x_n, x_{n+1}, ... are sampled according to P_d (but they are not independent!).
It is often not clear when convergence of the chain has taken place; therefore, discard the initial portion of the chain (burn-in phase).

Markov chain example
Let S = {1, 2} with state transition matrix
P = ( 1/2  1/2
      1/4  3/4 )
Simulate the chain for 1000 steps; show N_S(1)/N and N_S(2)/N for N = 1, ..., 1000 with starting state 1 (left) and 2 (right).
(Plots: the running state frequencies over the 1000 steps for the two starting states.)
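A small sketch that simulates this chain (names are mine). The running state frequencies for both starting states approach the same values, illustrating convergence to the stationary distribution.

```python
import random

# Two-state Markov chain; states 0 and 1 stand for the slide's states 1 and 2.
# Row i of P holds P(next state = j | current state = i).
P = [[0.5, 0.5],
     [0.25, 0.75]]

def state_frequencies(start, steps=1000):
    counts = [0, 0]
    state = start
    for _ in range(steps):
        state = 0 if random.random() < P[state][0] else 1
        counts[state] += 1
    return [c / steps for c in counts]

print(state_frequencies(start=0))   # both runs give similar frequencies
print(state_frequencies(start=1))
```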

MCMC for Bayesian networks
Given evidence e and non-evidence variables X, use a Markov chain to sample from the distribution P(X | e).
Obtain a sequence of states x_0, x_1, ..., and discard the initial portion.
After convergence, the samples x_k, x_{k+1}, ... have the desired distribution P(X | e).
There are many variants of Markov chain Monte Carlo algorithms; here we consider only Gibbs sampling.

Gibbs sampling
Fix the evidence variables to e, and assign arbitrary values to the non-evidence variables X.
Recall: the Markov blanket of a variable consists of its parents, children, and children's parents.
Iterate the following:
pick an arbitrary variable X_i from X;
sample from P(X_i | MarkovBlanket(X_i));
the new state is the old state, with the new value of X_i.

Gibbs sampling (cont.)
Calculating P(X_i | MarkovBlanket(X_i)):
P(x_i | MarkovBlanket(X_i)) = α P(x_i | parents(X_i)) ∏_{Y ∈ children(X_i)} P(y | parents(Y))
With this, calculate P(x_i | MarkovBlanket(X_i)) and P(¬x_i | MarkovBlanket(X_i)), and normalize to obtain P(X_i | MarkovBlanket(X_i)).
Sample from this distribution for the next value of X_i, and thus the next state of the Markov chain.

Bayesian network MCMC example
FamilyOut example: calculate P(F | l, d).
Start with arbitrary settings (f, b, h) of the non-evidence variables.
Pick F, sample from P(F | l, d, b), obtaining a new value for F.
Pick B, sample from P(B | f, d), obtaining a new value for B.
Pick H, sample from P(H | d), obtaining a new value for H.
Iterate the last three steps 50000 times and keep the last 10000 states.
Obtain P(f | l, d) ≈ 0.9016 (correct: 0.90206).
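A sketch of this Gibbs sampler (names are mine; CPTs repeated for self-containment). Each iteration resamples F, B, and H from their Markov-blanket conditionals with the evidence LightsOn = T, DogOut = T held fixed, and the last 10000 states are used for the estimate.

```python
import random

# FamilyOut network CPTs (as in the earlier sketches).
P_F, P_B = 0.2, 0.05
P_L = {True: 0.99, False: 0.1}
P_D = {(True, True): 0.99, (True, False): 0.88,
       (False, True): 0.96, (False, False): 0.2}
P_H = {True: 0.6, False: 0.25}

def gibbs_p_family_out(steps=50_000, keep=10_000):
    f, b, h = True, True, True          # arbitrary initial non-evidence values
    f_count = 0
    for step in range(steps):
        # F from P(F | l, d, b) ~ P(F) * P(l | F) * P(d | F, b)
        w_t = P_F * P_L[True] * P_D[(True, b)]
        w_f = (1 - P_F) * P_L[False] * P_D[(False, b)]
        f = random.random() < w_t / (w_t + w_f)
        # B from P(B | f, d) ~ P(B) * P(d | f, B)
        w_t = P_B * P_D[(f, True)]
        w_f = (1 - P_B) * P_D[(f, False)]
        b = random.random() < w_t / (w_t + w_f)
        # H from P(H | d), with d = True fixed by the evidence
        h = random.random() < P_H[True]
        if step >= steps - keep:        # keep only the last `keep` states
            f_count += f
    return f_count / keep

print(gibbs_p_family_out())   # should be close to 0.902
```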

Comparison of inference algorithms
Inference by enumeration is computationally prohibitive.
Variable elimination removes all irrelevant variables.
Direct sampling from the joint distribution is easy when no evidence is present.
Use rejection sampling and likelihood weighting for more efficient calculations with evidence.
Markov chain Monte Carlo methods are the most efficient for large networks, since new states are computed from old states, but the samples are no longer independent.

Summary
Bayesian networks are graphical representations of causal influence among random variables.
The network structure graphically specifies conditional independence assumptions.
We need the conditional distribution of each node, given its parents.
Noisy OR can be used to reduce the number of parameters in the tables.
Reasoning types in Bayesian networks: causal, diagnostic, and explaining away.
There are exact and approximate inference algorithms.