Bayesian networks. Chapter 14. AIMA2e Slides, Stuart Russell and Peter Norvig, completed by Kazim Fouladi, Fall 2008.

Bayesian networks, Chapter 14.1–3. AIMA2e Slides, Stuart Russell and Peter Norvig, completed by Kazim Fouladi, Fall 2008.

Outline
- Syntax
- Semantics
- Parameterized distributions

Bayesian networks
A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
Syntax:
- a set of nodes, one per variable
- a directed, acyclic graph (link ≈ "directly influences")
- a conditional distribution for each node given its parents: $P(X_i \mid \mathrm{Parents}(X_i))$
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.

Example
Topology of network encodes conditional independence assertions:
[Figure: network over Weather, Cavity, Toothache, Catch]
- Weather is independent of the other variables
- Toothache and Catch are conditionally independent given Cavity

Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

Example contd.
[Figure: burglary network; Burglary and Earthquake are parents of Alarm, which is the parent of JohnCalls and MaryCalls]

P(B) = .001        P(E) = .002

B  E  | P(A | B,E)
T  T  | .95
T  F  | .94
F  T  | .29
F  F  | .001

A  | P(J | A)        A  | P(M | A)
T  | .90             T  | .70
F  | .05             F  | .01
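The syntax described earlier (nodes, a directed acyclic graph, one CPT per node) can be written down directly. The following is only a sketch under assumed conventions: the dictionary layout and the helper name `topological_order` are illustrative choices, not part of the slides; the numbers are the ones in the tables above.

```python
# Burglary network from the slide, stored as plain dictionaries (an assumed layout).
# parents[X] lists X's parents; cpt[X] maps a tuple of parent values
# (in the order given by parents[X]) to P(X = true | parent values).
parents = {
    "Burglary": (),
    "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",),
    "MaryCalls": ("Alarm",),
}
cpt = {
    "Burglary": {(): 0.001},
    "Earthquake": {(): 0.002},
    "Alarm": {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001},
    "JohnCalls": {(True,): 0.90, (False,): 0.05},
    "MaryCalls": {(True,): 0.70, (False,): 0.01},
}


def topological_order(parents):
    """Return the nodes parents-first; raise ValueError if the graph has a cycle."""
    order, placed = [], set()
    while len(order) < len(parents):
        ready = [v for v in parents
                 if v not in placed and all(p in placed for p in parents[v])]
        if not ready:
            raise ValueError("graph has a directed cycle")
        order.extend(ready)
        placed.update(ready)
    return order


order = topological_order(parents)
# A Boolean node with k Boolean parents needs exactly 2^k CPT rows.
rows_ok = all(len(cpt[v]) == 2 ** len(parents[v]) for v in parents)
print(order, rows_ok)
```

Storing only P(X = true | parents) per row matches the tables on the slide; the probability of false is recovered as one minus that value.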

Compactness
A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values.
Each row requires one number $p$ for $X_i = \mathit{true}$ (the number for $X_i = \mathit{false}$ is just $1 - p$).
If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers.
I.e., grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution.
For burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers (vs. $2^5 - 1 = 31$).
[Figure: burglary network with nodes B, E, A, J, M]
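The count above can be reproduced in a few lines. This is a small sketch, assuming all variables are Boolean; the function names are made up for illustration, and the last comparison uses hypothetical values n = 30 and k = 5 that do not come from the slides.

```python
# Number of parents of each node in the burglary network (from the slide).
num_parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
               "JohnCalls": 1, "MaryCalls": 1}


def network_size(num_parents):
    """Numbers needed by the network: one per CPT row, 2^k rows per node."""
    return sum(2 ** k for k in num_parents.values())


def full_joint_size(n):
    """Independent numbers in a full joint distribution over n Boolean variables."""
    return 2 ** n - 1


print(network_size(num_parents))           # 1 + 1 + 4 + 2 + 2
print(full_joint_size(len(num_parents)))   # 2^5 - 1

# Hypothetical larger case: n = 30 variables, at most k = 5 parents each.
n, k = 30, 5
print(n * 2 ** k, "vs", full_joint_size(n))
```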

Global semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{parents}(X_i))$
e.g., $P(j \land m \land a \land \neg b \land \neg e) = P(j \mid a)\,P(m \mid a)\,P(a \mid \neg b, \neg e)\,P(\neg b)\,P(\neg e)$
$= 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00063$
[Figure: burglary network with nodes B, E, A, J, M]
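The product formula can be checked numerically against the burglary CPTs. The sketch below assumes the same dictionary layout as a convenience (single-letter variable names, CPT rows keyed by parent values); `joint` is an illustrative helper, not something defined in the slides.

```python
from itertools import product
from math import prod

# Burglary network: parents and P(var = true | parent values), from the CPT slide.
parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
cpt = {
    "B": {(): 0.001},
    "E": {(): 0.002},
    "A": {(True, True): 0.95, (True, False): 0.94,
          (False, True): 0.29, (False, False): 0.001},
    "J": {(True,): 0.90, (False,): 0.05},
    "M": {(True,): 0.70, (False,): 0.01},
}


def joint(assignment):
    """P(x_1, ..., x_n) as the product of P(x_i | parents(X_i))."""
    factors = []
    for var, value in assignment.items():
        p_true = cpt[var][tuple(assignment[p] for p in parents[var])]
        factors.append(p_true if value else 1.0 - p_true)
    return prod(factors)


# P(j, m, a, not b, not e), the example on the slide.
p = joint({"B": False, "E": False, "A": True, "J": True, "M": True})
print(p)

# Sanity check: the 32 joint entries should sum to 1.
names = list(parents)
total = sum(joint(dict(zip(names, vals)))
            for vals in product([True, False], repeat=len(names)))
print(total)
```

The printed value of `p` agrees with the slide's 0.00063 after rounding.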

Local semantics
Local semantics: each node is conditionally independent of its nondescendants given its parents.
[Figure: node X with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children's other parents $Z_{1j}, \ldots, Z_{nj}$]
Theorem: Local semantics $\Leftrightarrow$ global semantics

Markov blanket
Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.
[Figure: node X with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children's other parents $Z_{1j}, \ldots, Z_{nj}$]
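The definition translates into a short graph computation. The sketch below applies it to the burglary network from the earlier example; the function name `markov_blanket` and the parent-set dictionary are illustrative assumptions.

```python
# Parent sets of the burglary network (structure taken from the earlier slide).
parents = {
    "Burglary": set(),
    "Earthquake": set(),
    "Alarm": {"Burglary", "Earthquake"},
    "JohnCalls": {"Alarm"},
    "MaryCalls": {"Alarm"},
}


def markov_blanket(node, parents):
    """Parents + children + children's other parents of `node`."""
    children = {v for v, ps in parents.items() if node in ps}
    co_parents = set().union(*(parents[c] for c in children))
    return (parents[node] | children | co_parents) - {node}


for name in parents:
    print(name, sorted(markov_blanket(name, parents)))
```

For Burglary, the blanket contains Earthquake even though the two are not linked, because they share the child Alarm.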

Constructing Bayesian networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.
1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$:
   add $X_i$ to the network
   select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid \mathrm{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
This choice of parents guarantees the global semantics:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$ (by construction)
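As a worked instance of the two steps, take the burglary variables in the causal ordering B, E, A, J, M (an ordering chosen here for illustration). Assuming the independence assertions encoded by the burglary network, the chain rule collapses to the product of the network's five conditional distributions:

```latex
\begin{align*}
P(B, E, A, J, M)
  &= P(B)\, P(E \mid B)\, P(A \mid B, E)\, P(J \mid B, E, A)\, P(M \mid B, E, A, J)
     && \text{(chain rule)} \\
  &= P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)
     && \text{(by construction)}
\end{align*}
```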

Example
Suppose we choose the ordering M, J, A, B, E.
Add MaryCalls, then JohnCalls:
  $P(J \mid M) = P(J)$? No
Add Alarm:
  $P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No
Add Burglary:
  $P(B \mid A, J, M) = P(B \mid A)$? Yes
  $P(B \mid A, J, M) = P(B)$? No
Add Earthquake:
  $P(E \mid B, A, J, M) = P(E \mid A)$? No
  $P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes

Example contd.
[Figure: network built from the ordering MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
Deciding conditional independence is hard in noncausal directions.
(Causal models and conditional independence seem hardwired for humans!)
Assessing conditional probabilities is hard in noncausal directions.
Network is less compact: $1 + 2 + 4 + 2 + 4 = 13$ numbers needed.
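The construction procedure can be run mechanically when the full joint is available. The sketch below is a brute-force illustration, not the slides' method: it builds the joint from the burglary CPTs, then, for each variable in a given ordering, searches for the smallest subset of predecessors that leaves its conditional distribution unchanged. The helper names (`cond`, `minimal_parents`, `build`) and the numerical tolerance are assumptions.

```python
from itertools import combinations, product

# Burglary network: parents and P(var = true | parent values), from the CPT slide.
parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
cpt = {"B": {(): 0.001}, "E": {(): 0.002},
       "A": {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       "J": {(True,): 0.90, (False,): 0.05},
       "M": {(True,): 0.70, (False,): 0.01}}
names = list(parents)


def joint(w):
    """Probability of one complete assignment w, via the product of CPT entries."""
    p = 1.0
    for v in names:
        p_true = cpt[v][tuple(w[u] for u in parents[v])]
        p *= p_true if w[v] else 1.0 - p_true
    return p


worlds = [dict(zip(names, vals)) for vals in product([True, False], repeat=5)]
weights = [joint(w) for w in worlds]


def cond(var, given):
    """P(var = true | given), computed by summing entries of the full joint."""
    match = [(w, p) for w, p in zip(worlds, weights)
             if all(w[k] == v for k, v in given.items())]
    return sum(p for w, p in match if w[var]) / sum(p for _, p in match)


def minimal_parents(var, preds, tol=1e-12):
    """Smallest subset S of preds with P(var | S) = P(var | preds) everywhere."""
    for size in range(len(preds) + 1):
        for subset in combinations(preds, size):
            if all(abs(cond(var, dict(zip(preds, vals)))
                       - cond(var, {k: v for k, v in zip(preds, vals) if k in subset}))
                   <= tol
                   for vals in product([True, False], repeat=len(preds))):
                return subset


def build(ordering):
    """Apply the construction procedure to the given variable ordering."""
    return {v: minimal_parents(v, ordering[:i]) for i, v in enumerate(ordering)}


def size(net):
    """Numbers needed: 2^k CPT rows for a node with k parents."""
    return sum(2 ** len(ps) for ps in net.values())


noncausal = build(["M", "J", "A", "B", "E"])
causal = build(["B", "E", "A", "J", "M"])
print(noncausal, size(noncausal))
print(causal, size(causal))
```

With these CPTs the noncausal ordering selects the parent sets answered "Yes" in the example and needs 13 numbers, while the causal ordering recovers the original 10-number network.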

Example: Car diagnosis
Initial evidence: car won't start
Testable variables (green), "broken, so fix it" variables (orange)
Hidden variables (gray) ensure sparse structure, reduce parameters
[Figure: network with nodes battery age, alternator broken, fanbelt broken, battery dead, no charging, battery meter, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, car won't start, dipstick]

Example: Car insurance
[Figure: network with nodes Age, GoodStudent, RiskAversion, SeniorTrain, SocioEcon, Mileage, VehicleYear, ExtraCar, DrivingSkill, MakeModel, DrivingHist, Antilock, DrivQuality, Airbag, CarValue, HomeBase, AntiTheft, Ruggedness, Accident, OwnDamage, Theft, Cushioning, OtherCost, OwnCost, MedicalCost, LiabilityCost, PropertyCost]

Compact conditional distributions
CPT grows exponentially with number of parents.
CPT becomes infinite with continuous-valued parent or child.
Solution: canonical distributions that are defined compactly.
Deterministic nodes are the simplest case: $X = f(\mathrm{Parents}(X))$ for some function $f$
- E.g., Boolean functions: $\mathit{NorthAmerican} \Leftrightarrow \mathit{Canadian} \lor \mathit{US} \lor \mathit{Mexican}$
- E.g., numerical relationships among continuous variables: $\frac{\partial \mathit{Level}}{\partial t} = \text{inflow} + \text{precipitation} - \text{outflow} - \text{evaporation}$

Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes:
1) Parents $U_1 \ldots U_k$ include all causes (can add leak node)
2) Independent failure probability $q_i$ for each cause alone
$P(X \mid U_1 \ldots U_j, \neg U_{j+1} \ldots \neg U_k) = 1 - \prod_{i=1}^{j} q_i$

Cold  Flu  Malaria | P(Fever) | P(¬Fever)
F     F    F       | 0.0      | 1.0
F     F    T       | 0.9      | 0.1
F     T    F       | 0.8      | 0.2
F     T    T       | 0.98     | 0.02 = 0.2 × 0.1
T     F    F       | 0.4      | 0.6
T     F    T       | 0.94     | 0.06 = 0.6 × 0.1
T     T    F       | 0.88     | 0.12 = 0.6 × 0.2
T     T    T       | 0.988    | 0.012 = 0.6 × 0.2 × 0.1

Number of parameters linear in number of parents.
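The whole eight-row table follows from the three failure probabilities. The sketch below regenerates it; the values of `q` are read off the single-cause rows of the table, and the function name `noisy_or` is an illustrative choice.

```python
from itertools import product
from math import prod

# Failure probabilities q_i, read off the single-cause rows of the table.
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}


def noisy_or(present, q):
    """P(X = true | exactly the causes in `present` are true) = 1 - product of their q_i."""
    return 1.0 - prod(q[c] for c in present)


# Enumerate all parent combinations and print P(Fever) and P(not Fever).
for values in product([False, True], repeat=len(q)):
    present = [cause for cause, on in zip(q, values) if on]
    p_fever = noisy_or(present, q)
    print(values, round(p_fever, 3), round(1.0 - p_fever, 3))
```

Three numbers determine all eight rows, which is the linear growth in parameters stated on the slide.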

Hybrid (discrete+continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)
[Figure: network with nodes Subsidy?, Harvest, Cost, Buys?]
Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)

Continuous child variables
Need one conditional density function for child variable given continuous parents, for each possible assignment to discrete parents.
Most common is the linear Gaussian model, e.g.:
$P(\mathit{Cost} = c \mid \mathit{Harvest} = h, \mathit{Subsidy?} = \mathit{true}) = N(a_t h + b_t, \sigma_t)(c) = \frac{1}{\sigma_t \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{c - (a_t h + b_t)}{\sigma_t}\right)^2\right)$
Mean Cost varies linearly with Harvest, variance is fixed.
Linear variation is unreasonable over the full range but works OK if the likely range of Harvest is narrow.
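The linear Gaussian density is easy to evaluate directly. In the sketch below the parameter values are hypothetical (the slides give no numbers for a, b, or σ), with one parameter set per value of the discrete parent Subsidy?; the function name is also an assumption.

```python
import math


def linear_gaussian_density(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h under N(a*h + b, sigma)."""
    mean = a * h + b
    z = (c - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameters: one (a, b, sigma) triple per value of Subsidy?.
params = {
    True: {"a": -1.0, "b": 10.0, "sigma": 1.0},
    False: {"a": -1.0, "b": 12.0, "sigma": 1.0},
}

h = 5.0
for subsidy, p in params.items():
    mean = p["a"] * h + p["b"]
    print(subsidy, mean, linear_gaussian_density(mean, h, **p))
```

The mean shifts linearly with `h` while `sigma` stays fixed, which is the assumption the slide flags as reasonable only over a narrow range of Harvest.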

Continuous child variables
[Figure: surface plot of P(Cost | Harvest, Subsidy?=true) over Cost (0 to 10) and Harvest (0 to 10)]
All-continuous network with LG distributions $\Rightarrow$ full joint distribution is a multivariate Gaussian.
Discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.

Discrete variable w/ continuous parents
Probability of Buys? given Cost should be a "soft" threshold:
[Figure: plot of P(Buys?=false | Cost=c) against Cost c, for c from 0 to 12]
Probit distribution uses integral of Gaussian:
$\Phi(x) = \int_{-\infty}^{x} N(0, 1)(x)\,dx$
$P(\mathit{Buys?} = \mathit{true} \mid \mathit{Cost} = c) = \Phi((-c + \mu)/\sigma)$

Why the probit?
1. It's sort of the right shape
2. Can view as hard threshold whose location is subject to noise
[Figure: Cost and Noise feed a noisy Cost node, which determines Buys?]
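The second point can be checked by simulation: adding Gaussian noise to the cost and applying a hard threshold at µ gives the probit probability. This is a sketch under assumptions, not part of the slides: the values of `mu` and `sigma`, the sample size, and the function names are all hypothetical, and Φ is computed from the standard identity Φ(x) = (1 + erf(x/√2))/2.

```python
import math
import random


def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the probit model."""
    return phi((-c + mu) / sigma)


def simulate_buys(c, mu, sigma, n=100_000, seed=0):
    """Fraction of trials where cost plus Gaussian noise falls below the hard threshold mu."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if c + rng.gauss(0.0, sigma) < mu)
    return hits / n


mu, sigma = 6.0, 1.0  # hypothetical threshold location and noise scale
for c in (5.0, 6.0, 6.5):
    print(c, probit_buys(c, mu, sigma), simulate_buys(c, mu, sigma))
```

The simulated frequencies should track the closed-form probit values up to sampling error.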

Discrete variable contd.
Sigmoid (or logit) distribution also used in neural networks:
$P(\mathit{Buys?} = \mathit{true} \mid \mathit{Cost} = c) = \frac{1}{1 + \exp\left(-2\,\frac{-c + \mu}{\sigma}\right)}$
Sigmoid has similar shape to probit but much longer tails:
[Figure: plot of P(Buys?=false | Cost=c) against Cost c, for c from 0 to 12]
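The tail behaviour can be compared numerically. The sketch below evaluates both models at the threshold and far from it; `mu`, `sigma`, and the chosen cost values are hypothetical, and Φ is again computed with the error function.

```python
import math


def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the probit model."""
    return phi((-c + mu) / sigma)


def sigmoid_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) under the sigmoid (logit) model."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))


mu, sigma = 6.0, 1.0  # hypothetical threshold location and scale
for c in (6.0, 10.0, 12.0):
    print(c, probit_buys(c, mu, sigma), sigmoid_buys(c, mu, sigma))
```

Both models give 0.5 at the threshold; several σ above it, the sigmoid probability decays exponentially while the probit probability falls off much faster, so the sigmoid value is the larger of the two.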

Summary
- Bayes nets provide a natural representation for (causally induced) conditional independence
- Topology + CPTs = compact representation of joint distribution
- Generally easy for (non)experts to construct
- Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
- Continuous variables $\Rightarrow$ parameterized distributions (e.g., linear Gaussian)