
Bayesian Networks

Material used:
Halpern: Reasoning about Uncertainty, Chapter 4
Stuart Russell and Peter Norvig: Artificial Intelligence: A Modern Approach

Outline:
1 Random variables
2 Probabilistic independence
3 Belief networks
4 Global and local semantics
5 Constructing belief networks
6 Inference in belief networks

1 Random variables

Suppose that a coin is tossed five times. What is the total number of heads? Intuitively, it is a variable because its value varies, and it is random because its value is unpredictable in a certain sense. Formally, however, a random variable is neither random nor a variable.

Definition 1: A random variable X on a sample space (set of possible worlds) W is a function from W to some range (e.g. the natural numbers).

Example: A coin is tossed five times, so W = {h, t}^5. Let NH(w) = |{i : w[i] = h}| (the number of heads in the sequence w); for instance, NH(hthht) = 3.

Question: what is the probability of getting three heads in a sequence of five tosses?
µ(NH = 3) =def µ({w : NH(w) = 3})
µ(NH = 3) = 10 · 2^(-5) = 10/32
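A minimal sketch (not from the slides) that makes the definition concrete: the random variable NH is just a function on the worlds in W = {h, t}^5, and µ(NH = 3) is computed by enumerating the uniform measure.

```python
# Sketch: enumerate the sample space W = {h,t}^5, evaluate the random variable NH
# on each world, and compute mu(NH = 3) under the uniform measure.
from itertools import product
from fractions import Fraction

W = list(product("ht", repeat=5))          # all 32 possible worlds

def NH(w):
    """Random variable: number of heads in the sequence w."""
    return sum(1 for outcome in w if outcome == "h")

# Each world has probability 1/32 under the uniform measure.
mu_NH_eq_3 = Fraction(sum(1 for w in W if NH(w) == 3), len(W))
print(mu_NH_eq_3)                          # 5/16, i.e. 10/32
```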

Why are random variables important? They provide a tool for structuring possible worlds: a world can often be completely characterized by the values taken on by a number of random variables.

Example: With W = {h, t}^5, each world can be characterized by the 5 random variables X_1, ..., X_5, where X_i designates the outcome of the ith toss: X_i(w) = w[i]. An alternative is to use Boolean random variables, e.g. H_i with H_i(w) = 1 if w[i] = h and H_i(w) = 0 if w[i] = t. Use the random variables H_i to construct a new random variable that expresses the number of tails in 5 tosses (a sketch follows below).
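A small continuation of the previous sketch (again, not from the slides): the Boolean variables H_i are indicator functions on worlds, and the number-of-tails variable is assembled from them.

```python
# Illustrative sketch: Boolean random variables H_i and a number-of-tails
# variable NT constructed from them.
def H(i):
    """Boolean random variable for the ith toss: 1 if heads, else 0."""
    return lambda w: 1 if w[i] == "h" else 0

def NT(w):
    """Number of tails, constructed from the H_i: NT = 5 - sum_i H_i."""
    return 5 - sum(H(i)(w) for i in range(5))

print(NT(("h", "t", "h", "h", "t")))   # 2 tails in hthht
```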

2 Probabilistic Independence

If two events U and V are independent (or unrelated), then learning U should not affect the probability of V, and learning V should not affect the probability of U.

Definition 2: U and V are absolutely independent (with respect to a probability measure µ) if µ(V) ≠ 0 implies µ(U | V) = µ(U), and µ(U) ≠ 0 implies µ(V | U) = µ(V).

Fact 1: the following are equivalent:
a. µ(V) ≠ 0 implies µ(U | V) = µ(U)
b. µ(U) ≠ 0 implies µ(V | U) = µ(V)
c. µ(U ∩ V) = µ(U) · µ(V)
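As an illustration of condition (c), a tiny check (not part of the slides) on the five-toss space: under the uniform measure, the events "first toss is heads" and "second toss is heads" are absolutely independent, whereas "first toss is heads" and "NH = 3" are not.

```python
# Illustrative check of Fact 1(c) on W = {h,t}^5 with the uniform measure.
from itertools import product
from fractions import Fraction

W = list(product("ht", repeat=5))

def mu(event):
    """Uniform probability of an event, given as a predicate on worlds."""
    return Fraction(sum(1 for w in W if event(w)), len(W))

U = lambda w: w[0] == "h"                      # first toss is heads
V = lambda w: w[1] == "h"                      # second toss is heads
E = lambda w: sum(c == "h" for c in w) == 3    # NH = 3

both = lambda a, b: (lambda w: a(w) and b(w))
print(mu(both(U, V)) == mu(U) * mu(V))   # True: U and V are independent
print(mu(both(U, E)) == mu(U) * mu(E))   # False: U and E are not independent
```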

Absolute independence for random variables

Definition 3: Two random variables X and Y are absolutely independent (with respect to a probability measure µ) iff for all x ∈ Value(X) and all y ∈ Value(Y), the event X = x is absolutely independent of the event Y = y. Notation: I_µ(X, Y).

Definition 4: n random variables X_1, ..., X_n are absolutely independent iff for all i and all x_1, ..., x_n, the events X_i = x_i and ∧_{j≠i} (X_j = x_j) are absolutely independent.

Fact 2: If n random variables X_1, ..., X_n are absolutely independent, then µ(X_1 = x_1, ..., X_n = x_n) = Π_i µ(X_i = x_i).

Absolute independence is a very strong requirement, seldom met.

Conditional independence: example

Example: the dentist problem with three events:
Toothache (I have a toothache)
Cavity (I have a cavity)
Catch (steel probe catches in my tooth)

If I have a cavity, the probability that the probe catches in it does not depend on whether I have a toothache, i.e. Catch is conditionally independent of Toothache given Cavity: I_µ(Catch, Toothache | Cavity)
µ(Catch | Toothache ∧ Cavity) = µ(Catch | Cavity)

Conditional independence for events

Definition 5: A and B are conditionally independent given C if µ(B ∧ C) ≠ 0 implies µ(A | B ∧ C) = µ(A | C), and µ(A ∧ C) ≠ 0 implies µ(B | A ∧ C) = µ(B | C).

Fact 3: the following are equivalent if µ(C) ≠ 0:
a. µ(B ∧ C) ≠ 0 implies µ(A | B ∧ C) = µ(A | C)
b. µ(A ∧ C) ≠ 0 implies µ(B | A ∧ C) = µ(B | C)
c. µ(A ∧ B | C) = µ(A | C) · µ(B | C)

Conditional independence for random variables

Definition 6: Two random variables X and Y are conditionally independent given a random variable Z iff for all x ∈ Value(X), y ∈ Value(Y) and z ∈ Value(Z), the events X = x and Y = y are conditionally independent given the event Z = z. Notation: I_µ(X, Y | Z).

Important notation: instead of
(*) µ(X = x ∧ Y = y | Z = z) = µ(X = x | Z = z) · µ(Y = y | Z = z)
we simply write
(**) µ(X, Y | Z) = µ(X | Z) · µ(Y | Z)

Question: how many equations are represented by (**)?
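To make the abbreviation concrete, here is a small sketch (not from the slides) that expands (**): it checks one equation of the form (*) for every combination of values of X, Y and Z, so there are |Value(X)| · |Value(Y)| · |Value(Z)| equations in total.

```python
# Sketch: check I_mu(X, Y | Z) by verifying one equation of the form (*) for
# every value combination (x, y, z); (**) abbreviates all of these at once.
from itertools import product

def conditionally_independent(joint, xs, ys, zs, tol=1e-12):
    """joint[(x, y, z)] is mu(X=x, Y=y, Z=z); xs, ys, zs are the value ranges."""
    for z in zs:
        mu_z = sum(joint[(x, y, z)] for x in xs for y in ys)
        if mu_z == 0:
            continue                                  # conditioning event has probability 0
        for x, y in product(xs, ys):
            mu_xz = sum(joint[(x, y2, z)] for y2 in ys)
            mu_yz = sum(joint[(x2, y, z)] for x2 in xs)
            lhs = joint[(x, y, z)] / mu_z             # mu(X=x, Y=y | Z=z)
            rhs = (mu_xz / mu_z) * (mu_yz / mu_z)     # mu(X=x | Z=z) * mu(Y=y | Z=z)
            if abs(lhs - rhs) > tol:
                return False
    return True
```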

Dentist problem with random variables

Assume three binary (Boolean) random variables Toothache, Cavity, and Catch, and assume that Catch is conditionally independent of Toothache given Cavity. The full joint distribution can now be written as
µ(Toothache, Catch, Cavity) = µ(Toothache, Catch | Cavity) · µ(Cavity) = µ(Toothache | Cavity) · µ(Catch | Cavity) · µ(Cavity)

In order to express the full joint distribution we need 2 + 2 + 1 = 5 independent numbers instead of 7! Two are removed by the statement of conditional independence:
µ(Toothache, Catch | Cavity) = µ(Toothache | Cavity) · µ(Catch | Cavity)
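A sketch of this factorization in code (the five probability values below are invented placeholders, not from the slides): the full joint over the three Boolean variables is reconstructed from 2 + 2 + 1 = 5 numbers.

```python
# Illustrative sketch: reconstruct the full joint mu(Toothache, Catch, Cavity) from
# 5 numbers, using the conditional independence of Catch and Toothache given Cavity.
# The numeric values are made-up placeholders.
from itertools import product

p_cavity = 0.2                                   # mu(Cavity = true)
p_toothache_given = {True: 0.6, False: 0.1}      # mu(Toothache = true | Cavity)
p_catch_given = {True: 0.9, False: 0.2}          # mu(Catch = true | Cavity)

def bern(p, value):
    """Probability of a Boolean value under a Bernoulli parameter p."""
    return p if value else 1.0 - p

joint = {}
for toothache, catch, cavity in product([True, False], repeat=3):
    joint[(toothache, catch, cavity)] = (
        bern(p_toothache_given[cavity], toothache)
        * bern(p_catch_given[cavity], catch)
        * bern(p_cavity, cavity)
    )

print(sum(joint.values()))   # 1.0 -- a proper distribution from just 5 numbers
```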

3 Belief networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of the full joint distribution.

Syntax:
a set of nodes, one per random variable
a directed, acyclic graph (a link means "directly influences")
a conditional distribution for each node given its parents, µ(X_i | Parents(X_i))

Conditional distributions are represented by conditional probability tables (CPTs).

The importance of independence statements

n binary nodes, fully connected: 2^n − 1 independent numbers
n binary nodes, each node with at most 3 parents: fewer than 2^3 · n independent numbers
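For a concrete feel (n = 20 is chosen here for illustration and does not appear on the slides), the two counts compare as follows.

```python
# Parameter counts for n = 20 binary nodes (illustrative numbers, not from the slides).
n = 20
full_joint = 2**n - 1        # fully connected: one number per world, minus one
bounded_parents = 2**3 * n   # at most 3 parents: at most 2^3 numbers per node's CPT
print(full_joint, bounded_parents)   # 1048575 160
```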

The earthquake example

You have a new burglar alarm installed. It is reliable about detecting burglary, but also responds to minor earthquakes. Two neighbors (John, Mary) promise to call you at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the alarm with the phone ringing (and calls then as well). Mary likes loud music and sometimes misses the alarm! Given evidence about who has and hasn't called, estimate the probability of a burglary.

The network

I'm at work, John calls to say my alarm is ringing, Mary doesn't call. Is there a burglary? Five variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls. The network topology reflects causal knowledge.

4 Global and local semantics

Global semantics (corresponding to Halpern's quantitative Bayesian network) defines the full joint distribution as the product of the local conditional distributions. For defining this product, a linear ordering of the nodes of the network has to be given: X_1, ..., X_n.

µ(X_1, ..., X_n) = Π_{i=1..n} µ(X_i | Parents(X_i))

Ordering in the example: B, E, A, J, M.
µ(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = µ(¬b) · µ(¬e) · µ(a | ¬b ∧ ¬e) · µ(j | a) · µ(m | a)
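A sketch of the global semantics in code. The structure is the burglary network above; the numeric CPT entries do not appear on the slide and are the values commonly used in Russell and Norvig's version of this example, inserted here purely for illustration.

```python
# Sketch: the full joint of the burglary network as a product of local CPTs
# (global semantics). CPT numbers follow the Russell & Norvig textbook example.

def bern(p, value):
    return p if value else 1.0 - p

def mu(b, e, a, j, m):
    """Full joint mu(B=b, E=e, A=a, J=j, M=m) = product of mu(X_i | Parents(X_i))."""
    p_b = 0.001                                   # mu(B)
    p_e = 0.002                                   # mu(E)
    p_a = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}[(b, e)]   # mu(A | B, E)
    p_j = 0.90 if a else 0.05                     # mu(J | A)
    p_m = 0.70 if a else 0.01                     # mu(M | A)
    return bern(p_b, b) * bern(p_e, e) * bern(p_a, a) * bern(p_j, j) * bern(p_m, m)

# The slide's example term: mu(j, m, a, not b, not e)
print(mu(b=False, e=False, a=True, j=True, m=True))   # approx 0.000628
```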

Local semantics

Local semantics (corresponding to Halpern's qualitative Bayesian network) defines a series of statements of conditional independence. Each node is conditionally independent of its nondescendants given its parents:
I_µ(X, Nondescendants(X) | Parents(X))

Examples (small three-node networks over X, Y, Z; the graphs are not reproduced in this transcription): I_µ(X, Y)? I_µ(X, Z)? I_µ(X, Z | Y)?

The chain rule

µ(X, Y, Z) = µ(X) · µ(Y, Z | X) = µ(X) · µ(Y | X) · µ(Z | X, Y)

In general: µ(X_1, ..., X_n) = Π_{i=1..n} µ(X_i | X_1, ..., X_{i-1})

A linear ordering of the nodes of the network has to be given: X_1, ..., X_n. The chain rule is used to prove the equivalence of local and global semantics.

Local and global semantics are equivalent

If a local semantics in the form of the independence statements I_µ(X, Nondescendants(X) | Parents(X)) is given for each node X of the network, then the global semantics
µ(X_1, ..., X_n) = Π_{i=1..n} µ(X_i | Parents(X_i))
follows, and vice versa.

For proving local semantics ⇒ global semantics, we assume an ordering of the variables that makes sure that parents appear earlier in the ordering: if X_i is a parent of X_j, then X_i < X_j.

Local semantics ⇒ global semantics

µ(X_1, ..., X_n) = Π_{i=1..n} µ(X_i | X_1, ..., X_{i-1})   (chain rule)

Since Parents(X_i) ⊆ {X_1, ..., X_{i-1}}, we can write
µ(X_i | X_1, ..., X_{i-1}) = µ(X_i | Parents(X_i), Rest)

Local semantics: I_µ(X_i, Nondescendants(X_i) | Parents(X_i)). The elements of Rest are nondescendants of X_i, hence we can drop Rest.

Hence, µ(X_1, ..., X_n) = Π_{i=1..n} µ(X_i | Parents(X_i)).

5 Constructing belief networks

We need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics (a rough sketch follows below):
1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n:
   add X_i to the network;
   select parents from X_1, ..., X_{i-1} such that µ(X_i | Parents(X_i)) = µ(X_i | X_1, ..., X_{i-1})

This choice guarantees the global semantics:
µ(X_1, ..., X_n) = Π_{i=1..n} µ(X_i | X_1, ..., X_{i-1})   (chain rule)
= Π_{i=1..n} µ(X_i | Parents(X_i))   (by construction)
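A rough sketch of this construction procedure (not from the slides), assuming the full joint is available as a dictionary from assignment tuples to probabilities; `order`, `domains`, and the helper names are hypothetical. For each variable in the chosen ordering it picks a smallest subset of the earlier variables that already screens off the rest.

```python
# Illustrative sketch: build a belief-network structure from a full joint by following
# the construction procedure. `joint` maps full assignments (tuples of values, one per
# variable in `order`) to probabilities; `domains` maps each variable to its values.
from itertools import combinations, product

def marginal(joint, order, vars_, values):
    """mu(vars_ = values), summing the full joint over all other variables."""
    idx = [order.index(v) for v in vars_]
    return sum(p for assign, p in joint.items()
               if all(assign[i] == val for i, val in zip(idx, values)))

def screens_off(joint, order, domains, X, parents, predecessors):
    """Check mu(X | parents) == mu(X | predecessors) for all value combinations."""
    for vals in product(*(domains[v] for v in [X] + predecessors)):
        x_val, pred_vals = vals[0], list(vals[1:])
        par_vals = [pred_vals[predecessors.index(p)] for p in parents]
        den_pred = marginal(joint, order, predecessors, pred_vals)
        den_par = marginal(joint, order, parents, par_vals)
        if den_pred == 0 or den_par == 0:
            continue
        lhs = marginal(joint, order, [X] + predecessors, [x_val] + pred_vals) / den_pred
        rhs = marginal(joint, order, [X] + parents, [x_val] + par_vals) / den_par
        if abs(lhs - rhs) > 1e-12:
            return False
    return True

def construct_network(joint, order, domains):
    """Return {variable: parent list}, choosing a smallest screening-off parent set."""
    parents_of = {}
    for i, X in enumerate(order):
        predecessors = order[:i]
        for size in range(len(predecessors) + 1):        # try small parent sets first
            found = next((list(c) for c in combinations(predecessors, size)
                          if screens_off(joint, order, domains, X, list(c), predecessors)),
                         None)
            if found is not None:
                parents_of[X] = found
                break
    return parents_of
```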

Earthquake example with canonical ordering

What is an appropriate ordering? In principle, every ordering is allowed! Heuristic rule: start with causes, then go to direct effects: (B, E), A, (J, M) [4 possible orderings].

Earthquake example with non-canonical ordering

Suppose we choose the ordering M, J, A, B, E (adding the nodes MaryCalls, JohnCalls, Alarm, Burglary, Earthquake in that order):
µ(J | M) = µ(J)? No
µ(A | J, M) = µ(A | J)? µ(A | J, M) = µ(A)? No
µ(B | A, J, M) = µ(B | A)? Yes
µ(B | A, J, M) = µ(B)? No
µ(E | B, A, J, M) = µ(E | A)? No
µ(E | B, A, J, M) = µ(E | A, B)? Yes

6 Inference in belief networks

Types of inference (Q: query variable, E: evidence variable):

Kinds of inference
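All of these kinds of inference come down to computing the posterior µ(Q | E = e) of a query variable given the evidence. As a closing sketch (not from the slides), here is inference by enumeration over the burglary network, reusing the full-joint function mu(b, e, a, j, m) from the global-semantics sketch above.

```python
# Closing sketch: posterior of the query variable B given evidence about J and M,
# by summing the full joint over the hidden variables E and A and normalizing.
# mu(b, e, a, j, m) is the full-joint function defined in the earlier sketch.
from itertools import product

def posterior_burglary(j, m):
    """mu(B | J=j, M=m): sum out the hidden variables E and A, then normalize."""
    dist = {b: sum(mu(b, e, a, j, m) for e, a in product((True, False), repeat=2))
            for b in (True, False)}
    total = sum(dist.values())
    return {b: p / total for b, p in dist.items()}

# The slide's scenario: John calls, Mary doesn't call -- is there a burglary?
print(posterior_burglary(j=True, m=False))
```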