Bayesian networks. Chapter 14.1-3

Outline

- Syntax
- Semantics
- Parameterized distributions

Bayesian networks

A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions

Syntax:
- a set of nodes, one per variable
- a directed, acyclic graph (link ≈ "directly influences")
- a conditional distribution for each node given its parents: P(X_i | Parents(X_i))

In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values
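
As a concrete illustration, a CPT for a Boolean node can be stored as a table mapping parent-value tuples to the probability that the node is true. A minimal Python sketch, using a hypothetical node Rain with one Boolean parent Cloudy (names and numbers are illustrative, not from the slides):

# CPT for a hypothetical Boolean node Rain with Boolean parent Cloudy.
# One number per row: P(Rain = false | ...) is just 1 - P(Rain = true | ...).
rain_cpt = {
    (True,):  0.8,   # P(Rain = true | Cloudy = true)   (illustrative)
    (False,): 0.1,   # P(Rain = true | Cloudy = false)  (illustrative)
}

def p_rain(rain, cloudy):
    """Look up P(Rain = rain | Cloudy = cloudy) in the CPT."""
    p_true = rain_cpt[(cloudy,)]
    return p_true if rain else 1.0 - p_true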

Example

Topology of network encodes conditional independence assertions:

[Diagram: Weather stands alone; Cavity has children Toothache and Catch]

Weather is independent of the other variables

Toothache and Catch are conditionally independent given Cavity

Example

I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Network topology reflects causal knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

Example contd.

Burglary: P(B) = .001
Earthquake: P(E) = .002

Alarm CPT:
  B E | P(A|B,E)
  T T | .95
  T F | .94
  F T | .29
  F F | .001

JohnCalls CPT:
  A | P(J|A)
  T | .90
  F | .05

MaryCalls CPT:
  A | P(M|A)
  T | .70
  F | .01
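
The whole network fits in a pair of dictionaries. A sketch in Python (the encoding is my own; the numbers are exactly those in the CPTs above); the later inference examples reuse these definitions:

parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

cpts = {  # maps parent-value tuples to P(variable = true | parents)
    "Burglary":   {(): 0.001},
    "Earthquake": {(): 0.002},
    "Alarm":      {(True, True): 0.95, (True, False): 0.94,
                   (False, True): 0.29, (False, False): 0.001},
    "JohnCalls":  {(True,): 0.90, (False,): 0.05},
    "MaryCalls":  {(True,): 0.70, (False,): 0.01},
}

def prob(var, value, event):
    """P(var = value | parents(var)), reading parent values from event."""
    p_true = cpts[var][tuple(event[p] for p in parents[var])]
    return p_true if value else 1.0 - p_true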

Compactness

A CPT for Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values

Each row requires one number p for X_i = true (the number for X_i = false is just 1 - p)

If each variable has no more than k parents, the complete network requires O(n 2^k) numbers

I.e., grows linearly with n, vs. O(2^n) for the full joint distribution

For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31)
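
The slide's count can be checked directly from the parents dictionary defined earlier (a sketch; one free parameter per CPT row):

# One number per CPT row: 2^k rows for a node with k Boolean parents.
n_network = sum(2 ** len(ps) for ps in parents.values())  # 1+1+4+2+2 = 10
n_joint = 2 ** len(parents) - 1                           # 2^5 - 1 = 31
print(n_network, n_joint)                                 # 10 31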

Global semantics

Global semantics defines the full joint distribution as the product of the local conditional distributions:

  P(x_1, ..., x_n) = Π_{i=1}^{n} P(x_i | parents(X_i))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
    = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
    = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
    ≈ 0.00063
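
Using the prob helper from the network encoding above, the chain-rule product can be evaluated directly (a sketch; the event and the topological ordering follow the slide):

# P(j, m, a, ¬b, ¬e) as a product of local conditional probabilities.
event = {"Burglary": False, "Earthquake": False, "Alarm": True,
         "JohnCalls": True, "MaryCalls": True}
order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
p = 1.0
for var in order:            # topological order: parents before children
    p *= prob(var, event[var], event)
print(p)                     # ≈ 0.000628, the slide's 0.00063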

Local semantics

Local semantics: each node is conditionally independent of its nondescendants given its parents

[Diagram: node X with parents U_1, ..., U_m, children Y_1, ..., Y_n, and the children's other parents Z_1j, ..., Z_nj]

Theorem: Local semantics ⇔ global semantics

Markov blanket

Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents

[Diagram: the Markov blanket of X shaded: parents U_1, ..., U_m; children Y_1, ..., Y_n; the children's other parents Z_1j, ..., Z_nj]
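
Given the parents dictionary from earlier, the Markov blanket can be read off the graph structure. A minimal sketch:

def markov_blanket(x, parents):
    """Parents of x, children of x, and the children's other parents."""
    children = [v for v, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:
        blanket |= set(parents[c])   # co-parents
    blanket.discard(x)
    return blanket

print(markov_blanket("Alarm", parents))
# {'Burglary', 'Earthquake', 'JohnCalls', 'MaryCalls'}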

Constructing Bayesian networks

Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics

1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n:
     add X_i to the network
     select parents from X_1, ..., X_{i-1} such that
       P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i-1})

This choice of parents guarantees the global semantics:

  P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})   (chain rule)
                   = Π_{i=1}^{n} P(X_i | Parents(X_i))        (by construction)

Inference tasks

Simple queries: compute posterior marginal P(X_i | E = e),
  e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)

Conjunctive queries: P(X_i, X_j | E = e) = P(X_i | E = e) P(X_j | X_i, E = e)

Optimal decisions: decision networks include utility information; probabilistic inference required for P(outcome | action, evidence)

Value of information: which evidence to seek next?

Sensitivity analysis: which probability values are most critical?

Explanation: why do I need a new starter motor?

Inference by enumeration

Slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation

Simple query on the burglary network:

  P(B | j, m) = P(B, j, m) / P(j, m)
              = α P(B, j, m)
              = α Σ_e Σ_a P(B, e, a, j, m)

Rewrite full joint entries using product of CPT entries:

  P(B | j, m) = α Σ_e Σ_a P(B) P(e) P(a|B,e) P(j|a) P(m|a)
              = α P(B) Σ_e P(e) Σ_a P(a|B,e) P(j|a) P(m|a)

Recursive depth-first enumeration: O(n) space, O(d^n) time

Enumeration algorithm

function Enumeration-Ask(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network with variables {X} ∪ E ∪ Y
  Q(X) ← a distribution over X, initially empty
  for each value x_i of X do
    extend e with value x_i for X
    Q(x_i) ← Enumerate-All(Vars[bn], e)
  return Normalize(Q(X))

function Enumerate-All(vars, e) returns a real number
  if Empty?(vars) then return 1.0
  Y ← First(vars)
  if Y has value y in e
    then return P(y | Pa(Y)) × Enumerate-All(Rest(vars), e)
    else return Σ_y P(y | Pa(Y)) × Enumerate-All(Rest(vars), e_y)
         where e_y is e extended with Y = y
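
A direct Python transcription of this pseudocode for the Boolean burglary network, reusing parents, cpts, and prob from the earlier sketch (the variables must be supplied in topological order):

def enumeration_ask(X, e, var_order):
    """Posterior distribution P(X | e) by full enumeration."""
    q = {}
    for x in (True, False):
        q[x] = enumerate_all(var_order, {**e, X: x})
    total = sum(q.values())                    # Normalize
    return {x: p / total for x, p in q.items()}

def enumerate_all(var_order, e):
    if not var_order:
        return 1.0
    y, rest = var_order[0], var_order[1:]
    if y in e:                                 # evidence: value is fixed
        return prob(y, e[y], e) * enumerate_all(rest, e)
    return sum(prob(y, v, e) * enumerate_all(rest, {**e, y: v})
               for v in (True, False))         # hidden: sum it out

order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
print(enumeration_ask("Burglary",
                      {"JohnCalls": True, "MaryCalls": True}, order))
# ≈ {True: 0.284, False: 0.716}, the textbook's result for P(B | j, m)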

Evaluation tree

[Figure: depth-first evaluation tree for P(b | j, m): root factor P(b) = .001, branching on e (P(e) = .002, P(¬e) = .998), then on a (P(a|b,e) = .95, P(¬a|b,e) = .05, P(a|b,¬e) = .94, P(¬a|b,¬e) = .06), with leaf factors P(j|a) = .90, P(j|¬a) = .05, P(m|a) = .70, P(m|¬a) = .01]

Enumeration is inefficient: repeated computation, e.g., it computes P(j|a) P(m|a) for each value of e

Inference by variable elimination

Variable elimination: carry out summations right-to-left, storing intermediate results (factors) to avoid recomputation

  P(B | j, m)
    = α P(B) Σ_e P(e) Σ_a P(a|B,e) P(j|a) P(m|a)
         (one factor per variable: B, E, A, J, M)
    = α P(B) Σ_e P(e) Σ_a P(a|B,e) P(j|a) f_M(a)
    = α P(B) Σ_e P(e) Σ_a P(a|B,e) f_J(a) f_M(a)
    = α P(B) Σ_e P(e) Σ_a f_A(a,b,e) f_J(a) f_M(a)
    = α P(B) Σ_e P(e) f_ĀJM(b,e)    (sum out A)
    = α P(B) f_ĒĀJM(b)              (sum out E)
    = α f_B(b) × f_ĒĀJM(b)

Variable elimination: Basic operations

Summing out a variable from a product of factors:
- move any constant factors outside the summation
- add up submatrices in pointwise product of remaining factors

  Σ_x f_1 × ... × f_k = f_1 × ... × f_i × Σ_x f_{i+1} × ... × f_k = f_1 × ... × f_i × f_X̄

assuming f_1, ..., f_i do not depend on X

Pointwise product of factors f_1 and f_2:

  f_1(x_1,...,x_j, y_1,...,y_k) × f_2(y_1,...,y_k, z_1,...,z_l) = f(x_1,...,x_j, y_1,...,y_k, z_1,...,z_l)

E.g., f_1(a,b) × f_2(b,c) = f(a,b,c)

Pointwise Product

Pointwise multiplication of factors, used when a variable is summed out or at the last step

Pointwise product of factors f_1 and f_2:

  f_1(x_1,...,x_j, y_1,...,y_k) × f_2(y_1,...,y_k, z_1,...,z_l) = f(x_1,...,x_j, y_1,...,y_k, z_1,...,z_l)

E.g., f_1(a,b) × f_2(b,c) = f(a,b,c):

  a b | f_1(a,b)     b c | f_2(b,c)     a b c | f(a,b,c)
  T T | .3           T T | .2           T T T | .3 × .2
  T F | .7           T F | .8           T T F | .3 × .8
  F T | .9           F T | .6           T F T | .7 × .6
  F F | .1           F F | .4           T F F | .7 × .4
                                        F T T | .9 × .2
                                        F T F | .9 × .8
                                        F F T | .1 × .6
                                        F F F | .1 × .4

(F. C. Langbein, Artificial Intelligence IV: Uncertain Knowledge and Reasoning, 4. Inference in Bayesian Networks)

Summing Out

Summing out a variable from a product of factors:
- move any constant factors outside the summation
- add up sub-matrices in pointwise product of remaining factors

  Σ_x f_1 × ... × f_k = f_1 × ... × f_l × Σ_x f_{l+1} × ... × f_k = f_1 × ... × f_l × f_X̄

E.g., Σ_a f(a,b,c) = f_ā(b,c):

  a b c | f(a,b,c)        b c | f_ā(b,c)
  T T T | .3 × .2         T T | .3 × .2 + .9 × .2
  T T F | .3 × .8         T F | .3 × .8 + .9 × .8
  T F T | .7 × .6         F T | .7 × .6 + .1 × .6
  T F F | .7 × .4         F F | .7 × .4 + .1 × .4
  F T T | .9 × .2
  F T F | .9 × .8
  F F T | .1 × .6
  F F F | .1 × .4
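
Both operations are a few lines of Python if a factor is represented as a (variable list, table) pair, tables mapping value tuples to numbers (the representation is my own). The sketch below reproduces the f_1(a,b) × f_2(b,c) example and then sums out a:

from itertools import product as assignments

def pointwise_product(f1, f2):
    """Join two factors on their shared variables."""
    (vars1, t1), (vars2, t2) = f1, f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for values in assignments((True, False), repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        table[values] = (t1[tuple(row[v] for v in vars1)] *
                         t2[tuple(row[v] for v in vars2)])
    return (out_vars, table)

def sum_out(var, f):
    """Add up the rows of f that agree on everything except var."""
    vars_, t = f
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    table = {}
    for values, p in t.items():
        key = values[:i] + values[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return (out_vars, table)

f1 = (["A", "B"], {(True, True): .3, (True, False): .7,
                   (False, True): .9, (False, False): .1})
f2 = (["B", "C"], {(True, True): .2, (True, False): .8,
                   (False, True): .6, (False, False): .4})
print(sum_out("A", pointwise_product(f1, f2)))
# (['B', 'C'], {(T,T): .24, (T,F): .96, (F,T): .48, (F,F): .32})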

Variable elimination algorithm

function Elimination-Ask(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, evidence specified as an event
          bn, a belief network specifying joint distribution P(X_1, ..., X_n)
  factors ← [ ]; vars ← Reverse(Vars[bn])
  for each var in vars do
    factors ← [Make-Factor(var, e) | factors]
    if var is a hidden variable then factors ← Sum-Out(var, factors)
  return Normalize(Pointwise-Product(factors))
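
A compact (and deliberately unoptimized) Python version of this algorithm, reusing pointwise_product, sum_out, and assignments from the sketch above plus the parents/cpts/prob encoding of the burglary network. Unlike the pseudocode's Sum-Out, this sketch multiplies all accumulated factors before summing out a hidden variable, which is correct but less efficient:

def make_factor(var, e):
    """Factor for var's CPT with evidence variables fixed."""
    free = [v for v in [var] + parents[var] if v not in e]
    table = {}
    for values in assignments((True, False), repeat=len(free)):
        full = {**e, **dict(zip(free, values))}
        table[values] = prob(var, full[var], full)
    return (free, table)

def elimination_ask(X, e, var_order):
    factors = []
    for var in reversed(var_order):
        factors.append(make_factor(var, e))
        if var != X and var not in e:          # hidden variable: sum it out
            f = factors[0]
            for g in factors[1:]:
                f = pointwise_product(f, g)
            factors = [sum_out(var, f)]
    f = factors[0]
    for g in factors[1:]:
        f = pointwise_product(f, g)
    _, table = f                               # only X remains
    total = sum(table.values())
    return {vals[0]: p / total for vals, p in table.items()}

print(elimination_ask("Burglary",
                      {"JohnCalls": True, "MaryCalls": True},
                      ["Burglary", "Earthquake", "Alarm",
                       "JohnCalls", "MaryCalls"]))
# ≈ {True: 0.284, False: 0.716}, matching enumeration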

Irrelevant variables

Consider the query P(JohnCalls | Burglary = true):

  P(J | b) = α P(b) Σ_e P(e) Σ_a P(a|b,e) P(J|a) Σ_m P(m|a)

The sum over m is identically 1; M is irrelevant to the query

Theorem 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)

Here, X = JohnCalls, E = {Burglary}, and Ancestors({X} ∪ E) = {Alarm, Earthquake}, so MaryCalls is irrelevant

(Compare this to backward chaining from the query in Horn clause KBs)
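
The theorem suggests a simple pruning step before inference: keep only the query variable, the evidence variables, and their ancestors. A sketch using the parents dictionary from earlier:

def ancestors(nodes, parents):
    """All ancestors of a set of nodes in the DAG."""
    seen, frontier = set(), list(nodes)
    while frontier:
        for p in parents[frontier.pop()]:
            if p not in seen:
                seen.add(p)
                frontier.append(p)
    return seen

query_and_evidence = {"JohnCalls", "Burglary"}
relevant = query_and_evidence | ancestors(query_and_evidence, parents)
print(relevant)   # MaryCalls does not appear: it is irrelevant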

Complexity of exact inference

Singly connected networks (or polytrees):
- any two nodes are connected by at most one (undirected) path
- time and space cost of variable elimination are O(d^k n)

Multiply connected networks:
- can reduce 3SAT to exact inference → NP-hard
- equivalent to counting 3SAT models → #P-complete

[Figure: a 3-CNF formula over variables A, B, C, D, each a root node with prior 0.5; three clause nodes (the first is A ∨ B ∨ C) feed an AND node]