Bayesian networks (1) Lirong Xia

Similar documents
Probabilistic Models. Models describe how (a portion of) the world works

CSE 473: Artificial Intelligence Autumn 2011

Probabilistic Models

Bayesian Networks. Vibhav Gogate The University of Texas at Dallas

CS 5522: Artificial Intelligence II

CS 188: Artificial Intelligence Spring Announcements

Bayesian Networks. Vibhav Gogate The University of Texas at Dallas

CS 188: Artificial Intelligence Fall 2009

CS 343: Artificial Intelligence

Announcements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation

CS 188: Artificial Intelligence Fall 2008

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018

Bayes Nets: Independence

CS 5522: Artificial Intelligence II

Outline. CSE 473: Artificial Intelligence Spring Bayes Nets: Big Picture. Bayes Net Semantics. Hidden Markov Models. Example Bayes Net: Car

Artificial Intelligence Bayes Nets: Independence

Product rule. Chain rule

CSE 473: Ar+ficial Intelligence. Hidden Markov Models. Bayes Nets. Two random variable at each +me step Hidden state, X i Observa+on, E i

Outline. CSE 573: Artificial Intelligence Autumn Bayes Nets: Big Picture. Bayes Net Semantics. Hidden Markov Models. Example Bayes Net: Car

Bayes Nets. CS 188: Artificial Intelligence Fall Example: Alarm Network. Bayes Net Semantics. Building the (Entire) Joint. Size of a Bayes Net

Review: Bayesian learning and inference

CS 188: Artificial Intelligence Spring Announcements

Probabilistic Reasoning. Kee-Eung Kim KAIST Computer Science

Our Status. We re done with Part I Search and Planning!

Announcements. CS 188: Artificial Intelligence Fall Example Bayes Net. Bayes Nets. Example: Traffic. Bayes Net Semantics

CS 188: Artificial Intelligence. Our Status in CS188

CS 188: Artificial Intelligence Fall 2009

Quantifying uncertainty & Bayesian networks

Objectives. Probabilistic Reasoning Systems. Outline. Independence. Conditional independence. Conditional independence II.

Probabilistic Reasoning Systems

Probability Hal Daumé III. Computer Science University of Maryland CS 421: Introduction to Artificial Intelligence 27 Mar 2012

CSE 473: Artificial Intelligence

Bayes Nets III: Inference

CS 380: ARTIFICIAL INTELLIGENCE UNCERTAINTY. Santiago Ontañón

Informatics 2D Reasoning and Agents Semester 2,

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic

CS 5522: Artificial Intelligence II

Another look at Bayesian. inference

Announcements. CS 188: Artificial Intelligence Spring Bayes Net Semantics. Probabilities in BNs. All Conditional Independences

Recap: Bayes Nets. CS 473: Artificial Intelligence Bayes Nets: Independence. Conditional Independence. Bayes Nets. Independence in a BN

CSE 473: Artificial Intelligence Probability Review à Markov Models. Outline

Probabilistic Reasoning. (Mostly using Bayesian Networks)

Bayesian networks. Chapter 14, Sections 1 4

CMPSCI 240: Reasoning about Uncertainty

Outline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012

Bayesian Network. Outline. Bayesian Network. Syntax Semantics Exact inference by enumeration Exact inference by variable elimination

Bayesian networks. Chapter Chapter

Graphical Models - Part I

Uncertainty. Chapter 13

Artificial Intelligence Uncertainty

Probability, Markov models and HMMs. Vibhav Gogate The University of Texas at Dallas

Probabilistic representation and reasoning

Uncertainty. Outline

CSEP 573: Artificial Intelligence

Artificial Intelligence

PROBABILISTIC REASONING SYSTEMS

Quantifying Uncertainty & Probabilistic Reasoning. Abdulla AlKhenji Khaled AlEmadi Mohammed AlAnsari

Bayesian networks. Chapter Chapter

Directed Graphical Models

Uncertainty. 22c:145 Artificial Intelligence. Problem of Logic Agents. Foundations of Probability. Axioms of Probability

Bayesian Networks. Semantics of Bayes Nets. Example (Binary valued Variables) CSC384: Intro to Artificial Intelligence Reasoning under Uncertainty-III

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian networks. Chapter Chapter Outline. Syntax Semantics Parameterized distributions. Chapter

CS 2750: Machine Learning. Bayesian Networks. Prof. Adriana Kovashka University of Pittsburgh March 14, 2016

Bayesian networks. Chapter AIMA2e Slides, Stuart Russell and Peter Norvig, Completed by Kazim Fouladi, Fall 2008 Chapter 14.

Uncertainty and Bayesian Networks

Introduction to Artificial Intelligence Belief networks

Brief Intro. to Bayesian Networks. Extracted and modified from four lectures in Intro to AI, Spring 2008 Paul S. Rosenbloom

Axioms of Probability? Notation. Bayesian Networks. Bayesian Networks. Today we ll introduce Bayesian Networks.

Introduction to Artificial Intelligence. Unit # 11

CS Belief networks. Chapter

Bayesian networks: Modeling

Bayesian Networks. Motivation

Random Variables. A random variable is some aspect of the world about which we (may) have uncertainty

CS 188: Artificial Intelligence. Bayes Nets

CS 188: Artificial Intelligence Fall 2011

14 PROBABILISTIC REASONING

Uncertainty. Chapter 13

Recall from last time. Lecture 3: Conditional independence and graph structure. Example: A Bayesian (belief) network.

PROBABILISTIC REASONING Outline

Events A and B are independent P(A) = P(A B) = P(A B) / P(B)

Y. Xiang, Inference with Uncertain Knowledge 1

Uncertainty. Introduction to Artificial Intelligence CS 151 Lecture 2 April 1, CS151, Spring 2004

Probabilistic representation and reasoning

CS 484 Data Mining. Classification 7. Some slides are from Professor Padhraic Smyth at UC Irvine

EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS

13.4 INDEPENDENCE. 494 Chapter 13. Quantifying Uncertainty

School of EECS Washington State University. Artificial Intelligence

Recall from last time: Conditional probabilities. Lecture 2: Belief (Bayesian) networks. Bayes ball. Example (continued) Example: Inference problem

ARTIFICIAL INTELLIGENCE. Uncertainty: probabilistic reasoning

Bayes Networks. CS540 Bryan R Gibson University of Wisconsin-Madison. Slides adapted from those used by Prof. Jerry Zhu, CS540-1

Bayesian Networks. Material used

Probabilistic Reasoning

Uncertainty. Russell & Norvig Chapter 13.

Uncertainty and Belief Networks. Introduction to Artificial Intelligence CS 151 Lecture 1 continued Ok, Lecture 2!

Probability and Decision Theory

Bayesian Networks Representation

COS402- Artificial Intelligence Fall Lecture 10: Bayesian Networks & Exact Inference

This lecture. Reading. Conditional Independence Bayesian (Belief) Networks: Syntax and semantics. Chapter CS151, Spring 2004

Reasoning Under Uncertainty

Transcription:

Bayesian networks (1) Lirong Xia

Random variables and joint distributions Ø A random variable is a variable with a domain Random variables: capital letters, e.g. W, D, L values: small letters, e.g. w, d, l Ø A joint distribution over a set of random variables: X 1, X 2,, X n specifies a real number for each assignment (or outcome) p(x 1 = x 1, X 2 = x 2,, X n = x n ) p(x 1, x 2,, x n ) Ø This is a special (structured) probability space Sample space Ω: all combinations of values probability mass function is p Ø A probabilistic model is a joint distribution over a set of random variables will be our focus of this course ptw, ( ) T W p hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 2

Marginal Distributions Ø Marginal distributions are sub-tables which eliminate variables Ø Marginalization (summing out): combine collapsed rows by adding p X = x = p X = x, X = x ( ) ( ) 1 1 1 1 2 2 x 2 ptw, ( ) T W p hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 T: hot cold W: sun rain 0.6 0.4 0.5 0.4 0.1 0.5 0.2 0.3 3

Conditional Distributions Ø Conditional distributions are probability distributions over some variables given fixed values of others pwt ( ) Conditional Distributions ( = hot) pwt W p sun 0.8 rain 0.2 ( = cold) pwt W p sun 0.4 rain 0.6 Joint Distributions ptw, ( ) T W p hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 4

Independence Ø Two variables are independent in a joint distribution if for all x,y, the events X=x and Y=y are independent: p X, Y = p X p Y ( ) ( ) ( ) xypxy,, = px p y ( ) ( ) ( ) The joint distribution factors into a product of two simple ones Usually variables aren t independent! 5

The Chain Rule Ø Write any joint distribution as an incremental product of conditional distributions ( ) = p( x ) 1 p x 2 x 1 p x 1, x 2, x 3 Ø Why is this always true? Key: p(a B)=p(A,B)/p(B) ( ) p( x 3 x 1, x ) 2 ( ) = p x i x 1 x i 1 p x 1, x 2, x n i ( ) 6

Today s schedule ØConditional independence ØBayesian networks definitions independence 7

Conditional Independence among random variables Ø p(toothache, Cavity, Catch) Ø If I don t have a cavity, the probability that the probe catches in it doesn t depend on whether I have a toothache: p(+catch +Toothache,-Cavity) = p(+catch -Cavity) Ø The same independence holds if I have a cavity: p(+catch +Toothache,+Cavity) = p(+catch +Cavity) Ø Catch is conditionally independent of toothache given cavity: p(catch Toothache,Cavity) = p(catch Cavity) Ø Equivalent statements: p(toothache Catch,Cavity) = p(toothache Cavity) p(toothache,catch Cavity) = p(toothache Cavity) p(catch Cavity) One can be derived from the other easily (part of Homework 1) 8

Conditional Independence Ø Unconditional (absolute) independence very rare Ø Conditional independence is our most basic and robust form of knowledge about uncertain environments: Ø Definition: X and Y are conditionally independent given Z, if x, y, z: p(x, y z)=p(x z) p(y z) or equivalently, x, y, z: p(x z, y)=p(x z) X, Y, Z are random variables written as X Y Z Ø Brain teaser: in a probabilistic model with three random variables XYZ If X and Y are independent, can we say X and Y are conditionally independent given Z If X and Y are conditionally independent given Z, can we say X and Y are independent? Bonus questions in Homework 1 9

The Chain Rule Ø p(x 1,, X n )=p(x 1 ) p(x 2 X 1 ) p(x 3 X 1, X 2 ) Ø Trivial decomposition: p(catch, Cavity, Toothache) = p(cavity) p(catch Cavity) p(toothache Catch,Cavity) Ø With assumption of conditional independence: Toothache Catch Cavity p(toothache, Catch, Cavity) = p(cavity) p(catch Cavity) p(toothache Cavity) Ø Bayesian networks/ graphical models help us express conditional independence assumptions 10

Bayesian networks: Big Picture Ø Using full joint distribution tables Representation: n random variables, at least 2 n entries Computation: hard to learn (estimate) anything empirically about more than a few variables at a time S T W p summer hot sun 0.30 summer hot rain 0.05 summer cold sun 0.10 summer cold rain 0.05 winter hot sun 0.10 winter hot rain 0.05 winter cold sun 0.15 winter cold rain 0.20 Bayesian networks: a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities) More properly called graphical models We describe how variables locally interact Local interactions chain together to give global, indirect interactions 11

Example Bayesian networks: Car Ø Initial observation: car won t start Ø Orange: broken nodes Ø Green: testable evidence Ø Gray: hidden variables to ensure sparse structure, reduce parameters 12

Graphical Model Notation Ø Nodes: variables (with domains) Ø Arcs: interactions Indicate direct influence between variables Formally: encode conditional independence (more later) Ø For now: imagine that arrows mean direct causation (in general, they don t!) 13

Example: Coin Flips Ø n independent coin flips (different coins) Ø No interactions between variables: independence Really? How about independent flips of the same coin? How about a skillful coin flipper? Bottom line: build an application-oriented model 14

Example: Traffic ØVariables: R: It rains T: There is traffic ØModel 1: independence ØModel 2: rain causes traffic ØWhich model is better? 15

Example: Burglar Alarm Network ØVariables: B: Burglary A: Alarm goes off M: Mary calls J: John calls E: Earthquake! 16

Bayesian network Ø Definition of Bayesian network (Bayes net or BN) Ø A set of nodes, one per variable X Ø A directed, acyclic graph Ø A conditional distribution for each node A collection of distributions over X, one for each combination of parents values p(x a 1,, a n ) CPT: conditional probability table Description of a noisy causal process p( X A 1 A ) n A Bayesian network = Topology (graph) + Local Conditional Probabilities 17

Probabilities in BNs Ø Bayesian networks implicitly encode joint distributions As a product of local conditional distributions Example: p x 1, x 2, x n n i=1 ( ) ( ) = p x i parents( X ) i ( ) p +Cavity, +Catch, -Toothache Ø This lets us reconstruct any entry of the full joint Ø Not every BN can represent every joint distribution The topology enforces certain conditional independencies 18

Example: Coin Flips p( X ) ( ) 1 p X ( ) 2 p X n h 0.5 t 0.5 h 0.5 t 0.5 h 0.5 t 0.5 phhth= (,,, ) Only distributions whose variables are absolutely independent can be represented by a Bayesian network with no arcs. 19

Example: Traffic p R ( ) +r 0.25 -r 0.75 p + r, t = ( ) ( ) ptr +r +t 0.75 +r -t 0.25 -r +t 0.50 -r -t 0.50 20

Example: Alarm Network B p(b) +b 0.001 -b 0.999 E p(e) +e 0.002 -e 0.998 B E A p(a B,E) A J p(j A) +a +j 0.9 +a -j 0.1 -a +j 0.05 -a -j 0.95 A M p(m A) +a +m 0.7 +a -m 0.3 -a +m 0.01 -a -m 0.99 +b +e +a 0.95 +b +e -a 0.05 +b -e +a 0.94 +b -e -a 0.06 -b +e +a 0.29 -b +e -a 0.71 -b -e +a 0.001 -b -e -a 0.999 21

Size of a Bayesian network Ø How big is a joint distribution over N Boolean variables? 2 N Ø How big is an N-node net if nodes have up to k parents? O(N 2 k+1 ) Ø Both give you the power to calculate p(x 1,, X n ) Ø BNs: Huge space savings! Ø Also easier to elicit local CPTs Ø Also turns out to be faster to answer queries 22

Bayesian networks ØSo far: how a Bayesian network encodes a joint distribution ØNext: how to answer queries about that distribution Key idea: conditional independence Main goal: answer queries about conditional independence and influence from the graph ØAfter that: how to answer numerical queries (inference) 23

Conditional Independence in a BN ØImportant question about a BN: Are two nodes independent given certain evidence? If yes, can prove using algebra (tedious in general) If no, can prove with a counter example Example: X: pressure, Y: rain, Z: traffic Question: are X and Z necessarily independent? Answer: no. Example: low pressure causes rain, which causes traffic X can influence Z, Z can influence X (via Y) 24

Causal Chains ØThis configuration is a causal chain p x y z (,, ) = p( x) p( y x) p( z y) X: Low pressure Y: Rain Z: Traffic Is X independent of Z given Y? (, ) p z x y p( x, y, z) = = p( x, y) = p x p y x p z y ( ) ( ) ( ) p( x) p( y x) ( ) Yes! p z y Evidence along the chain blocks the influence 25

Common Cause ØAnother basic configuration: two effects of the same cause Are X and Z independent? Are X and Z independent given Y? (, ) p z x y p( x, y, z) = = p( x, y) = p y p x y p z y ( ) ( ) ( ) p( y) p( x y) ( ) Yes! p z y Y: Project due X: Many Piazza posts Z: No one play games Observing the cause blocks influence between effects. 26

Common Effect ØLast configuration: two causes of one effect (v-structure) Are X and Z independent? Yes: the ballgame and the rain cause traffic, but they are not correlated Still need to prove they must be (try it!) Are X and Z independent given Y? No: seeing traffic puts the rain and the ballgame in competition as explanation? This is backwards from the other cases Observing an effect activates influence between possible causes. X: Raining Z: Ballgame Y: Traffic 27

The General Case ØAny complex example can be analyzed using these three canonical cases ØGeneral question: in a given BN, are two variables independent (given evidence)? ØSolution: analyze the graph 28

Reachability (D-Separation) Ø Question: are X and Y conditionally independent given evidence vars {Z}? Yes, if X and Y separated by Z Look for active paths from X to Y No active paths = independence! Ø A path is active if each triple is active: Causal chain A B C where B is unobserved (either direction) Common cause A B C where B is unobserved Common effect A B C where B or one of its descendents is observed Ø All it takes to block a path is a single inactive segment 29

Example R B Yes! R BT R BT ' 30

Example L T' T Yes! L B Yes! L BT L BT ' L BT, R Yes! 31

Example ØVariables: R: Raining T: Traffic D: Roof drips S: I am sad Ø Questions: T D T D R Yes! T D R, S 32