CS 580: Belief networks, Chapter 15.1-2


CS 580 1 Belief networks, Chapter 15.1-2

CS 580 2 Outline
- Conditional independence
- Bayesian networks: syntax and semantics
- Exact inference
- Approximate inference

CS 580 3 Independence
Two random variables A and B are (absolutely) independent iff
P(A | B) = P(A), or equivalently P(A, B) = P(A | B) P(B) = P(A) P(B)
e.g., A and B are two coin tosses.
If n Boolean variables are independent, the full joint is
P(X_1, ..., X_n) = Π_i P(X_i)
hence it can be specified by just n numbers.
Absolute independence is a very strong requirement, seldom met.
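A minimal sketch of the "n numbers suffice" point, using hypothetical marginals (the values below are illustrative, not from the slides):

```python
from itertools import product

# Hypothetical marginals P(X_i = true) for three independent Boolean variables.
p_true = [0.5, 0.3, 0.9]  # just n = 3 numbers specify the whole joint

def joint(assignment):
    """P(X_1 = x_1, ..., X_n = x_n) = product of the individual marginals."""
    result = 1.0
    for p, value in zip(p_true, assignment):
        result *= p if value else 1.0 - p
    return result

# All 2^n entries of the full joint, reconstructed from only n numbers.
table = {bits: joint(bits) for bits in product([True, False], repeat=len(p_true))}
assert abs(sum(table.values()) - 1.0) < 1e-12  # the entries sum to 1
```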

CS 580 4 Conditional independence
Consider the dentist problem with three random variables:
Toothache, Cavity, Catch (steel probe catches in my tooth).
The full joint distribution has 2^3 - 1 = 7 independent entries.
If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
(1) P(Catch | Toothache, Cavity) = P(Catch | Cavity)
i.e., Catch is conditionally independent of Toothache given Cavity.
The same independence holds if I haven't got a cavity:
(2) P(Catch | Toothache, ¬Cavity) = P(Catch | ¬Cavity)

CS 580 5 Conditional independence contd.
Equivalent statements to (1):
(1a) P(Toothache | Catch, Cavity) = P(Toothache | Cavity)   Why??
(1b) P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)   Why??
Full joint distribution can now be written as
P(Toothache, Catch, Cavity)
  = P(Toothache, Catch | Cavity) P(Cavity)
  = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
i.e., 2 + 2 + 1 = 5 independent numbers (equations 1 and 2 remove 2)

CS 580 6 Conditional independence contd.
Equivalent statements to (1):
(1a) P(Toothache | Catch, Cavity) = P(Toothache | Cavity)   Why??
P(Toothache | Catch, Cavity)
  = P(Catch | Toothache, Cavity) P(Toothache | Cavity) / P(Catch | Cavity)   (Bayes' rule)
  = P(Catch | Cavity) P(Toothache | Cavity) / P(Catch | Cavity)   (from 1)
  = P(Toothache | Cavity)
(1b) P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)   Why??
P(Toothache, Catch | Cavity)
  = P(Toothache | Catch, Cavity) P(Catch | Cavity)   (product rule)
  = P(Toothache | Cavity) P(Catch | Cavity)   (from 1a)
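A small numerical sketch of the factorization: the five numbers P(Cavity), P(Toothache | Cavity), and P(Catch | Cavity) determine the whole joint, and statement (1a) falls out of it. The probability values below are hypothetical, chosen only for illustration:

```python
# Hypothetical values for the 2 + 2 + 1 = 5 independent numbers.
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(Toothache | Cavity)
p_catch_given     = {True: 0.9, False: 0.2}   # P(Catch | Cavity)

def joint(toothache, catch, cavity):
    """Factored form: P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)."""
    pc = p_cavity if cavity else 1.0 - p_cavity
    pt = p_toothache_given[cavity] if toothache else 1.0 - p_toothache_given[cavity]
    pk = p_catch_given[cavity] if catch else 1.0 - p_catch_given[cavity]
    return pt * pk * pc

# Check statement (1a): P(Toothache | Catch, Cavity) = P(Toothache | Cavity).
num = joint(True, True, True)
den = sum(joint(t, True, True) for t in (True, False))
assert abs(num / den - p_toothache_given[True]) < 1e-12
```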

CS 580 7 Belief networks
A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
Syntax:
- a set of nodes, one per variable
- a directed, acyclic graph (link means "directly influences")
- a conditional distribution for each node given its parents: P(X_i | Parents(X_i))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT).

CS 580 8 Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects causal knowledge:
Burglary: P(B) = 0.001
Earthquake: P(E) = 0.002
Alarm, P(A | B, E):
  B=T, E=T: 0.95
  B=T, E=F: 0.94
  B=F, E=T: 0.29
  B=F, E=F: 0.001
JohnCalls, P(J | A): A=T: 0.90, A=F: 0.05
MaryCalls, P(M | A): A=T: 0.70, A=F: 0.01
Note: with at most k parents per node, O(d^k n) numbers vs. O(d^n) for the full joint.
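The numbers above can be written down directly as code; a sketch of one plain-Python encoding (the data structure itself is illustrative, not something prescribed by the slides):

```python
# Burglary network: parents and CPTs as given on the slide.
# Each CPT maps a tuple of parent values to P(node = true | parents).
network = {
    "Burglary":   {"parents": [], "cpt": {(): 0.001}},
    "Earthquake": {"parents": [], "cpt": {(): 0.002}},
    "Alarm":      {"parents": ["Burglary", "Earthquake"],
                   "cpt": {(True, True): 0.95, (True, False): 0.94,
                           (False, True): 0.29, (False, False): 0.001}},
    "JohnCalls":  {"parents": ["Alarm"], "cpt": {(True,): 0.90, (False,): 0.05}},
    "MaryCalls":  {"parents": ["Alarm"], "cpt": {(True,): 0.70, (False,): 0.01}},
}
```

With Boolean variables and at most two parents per node, this takes 1 + 1 + 4 + 2 + 2 = 10 numbers, versus 2^5 - 1 = 31 for an unconstrained full joint over five variables.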

CS 580 9 Semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions:
P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | Parents(X_i))
e.g., P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) is given by?? =

CS 580 10 Semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions:
P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | Parents(X_i))
e.g., P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
  = P(¬B) P(¬E) P(A | ¬B, ¬E) P(J | A) P(M | A)
Local semantics: each node is conditionally independent of its nondescendants given its parents.
Theorem: Local semantics ⇔ global semantics
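The product above can be evaluated directly from the CPTs on the burglary slide. A sketch, reusing the `network` dictionary defined in the earlier encoding:

```python
def joint_prob(network, assignment):
    """Full-joint entry: product over nodes of P(x_i | parents(x_i))."""
    p = 1.0
    for node, spec in network.items():
        parent_vals = tuple(assignment[parent] for parent in spec["parents"])
        p_true = spec["cpt"][parent_vals]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

# P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) with the slide's CPT values:
event = {"Burglary": False, "Earthquake": False, "Alarm": True,
         "JohnCalls": True, "MaryCalls": True}
print(joint_prob(network, event))
# = 0.999 * 0.998 * 0.001 * 0.90 * 0.70 ≈ 0.000628
```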

CS 580 11 Markov blanket
Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.
[Figure: node X with parents U_1 ... U_m, children Y_1 ... Y_n, and the children's other parents Z_1j ... Z_nj]
E.g., in the burglary network, the Markov blanket of Alarm is {Burglary, Earthquake, JohnCalls, MaryCalls}.

CS 580 12 Constructing belief networks
Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics.
1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n:
   add X_i to the network
   select parents from X_1, ..., X_{i-1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i-1})
This choice of parents guarantees the global semantics:
P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})   (chain rule)
                 = Π_{i=1}^{n} P(X_i | Parents(X_i))   (by construction)
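A sketch of the construction loop. The oracle `cond_indep(x, given, predecessors)` is hypothetical: it should answer whether P(x | given) = P(x | predecessors), which in practice comes from domain (ideally causal) knowledge rather than from code. The greedy parent-pruning shown here is one simple way to pick a small parent set, not the only one:

```python
def build_network(ordering, cond_indep):
    """Add variables in the given order; each node's parents are a subset of its
    predecessors that renders it independent of the remaining predecessors."""
    parents = {}
    for i, x in enumerate(ordering):
        predecessors = ordering[:i]
        chosen = list(predecessors)
        # Greedily drop predecessors that the oracle says are not needed.
        for p in predecessors:
            trial = [q for q in chosen if q != p]
            if cond_indep(x, trial, predecessors):
                chosen = trial
        parents[x] = chosen
    return parents
```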

CS 580 13 Example
Suppose we choose the ordering M, J, A, B, E.
[Nodes so far: MaryCalls, JohnCalls]
P(J | M) = P(J)?

CS 580 14 Example
Suppose we choose the ordering M, J, A, B, E.
[Nodes so far: MaryCalls, JohnCalls, Alarm]
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?

CS 580 15 Example
Suppose we choose the ordering M, J, A, B, E.
[Nodes so far: MaryCalls, JohnCalls, Alarm, Burglary]
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? P(B | A, J, M) = P(B)?

CS 580 16 Example
Suppose we choose the ordering M, J, A, B, E.
[Nodes so far: MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? P(E | B, A, J, M) = P(E | A, B)?

CS 580 17 Example
Suppose we choose the ordering M, J, A, B, E.
[Nodes: MaryCalls, JohnCalls, Alarm, Burglary, Earthquake]
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes

CS 580 18 Example: Car diagnosis
Initial evidence: engine won't start.
Testable variables (thin ovals), diagnosis variables (thick ovals); hidden variables (shaded) ensure sparse structure and reduce parameters.
[Network figure with nodes: battery age, alternator broken, fanbelt broken, battery dead, no charging, battery flat, no oil, no gas, fuel line blocked, starter broken, lights, oil light, gas gauge, engine won't start]

CS 580 19 Example: Car insurance
Predict claim costs (medical, liability, property) given data on the application form (other unshaded nodes).
[Network figure with nodes: Age, SeniorTrain, GoodStudent, RiskAversion, SocioEcon, Mileage, VehicleYear, ExtraCar, DrivingSkill, MakeModel, DrivingHist, Antilock, DrivQuality, Airbag, CarValue, HomeBase, AntiTheft, Ruggedness, Accident, OwnDamage, Theft, Cushioning, OtherCost, OwnCost, MedicalCost, LiabilityCost, PropertyCost]

CS 580 20 Compact conditional distributions
A CPT grows exponentially with the number of parents, and becomes infinite with a continuous-valued parent or child.
Solution: canonical distributions that are defined compactly.
Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f.
E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican
E.g., numerical relationships among continuous variables: ∂Level/∂t = inflow + precipitation − outflow − evaporation
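A deterministic node adds no new parameters: the child's value is just a function of its parents. A trivial sketch of the two examples above (the discretized level update is an illustrative reading of the rate equation, not a formula from the slides):

```python
# Deterministic Boolean node: NorthAmerican <=> Canadian v US v Mexican.
def north_american(canadian: bool, us: bool, mexican: bool) -> bool:
    return canadian or us or mexican

# Deterministic numerical node: net change in lake level per unit time.
def level_rate(inflow: float, precipitation: float,
               outflow: float, evaporation: float) -> float:
    return inflow + precipitation - outflow - evaporation
```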

CS 580 21 Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes:
1) Parents U_1 ... U_k include all causes (can add a leak node)
2) Independent failure probability q_i for each cause alone:
P(X | U_1 ... U_j, ¬U_{j+1} ... ¬U_k) = 1 − Π_{i=1}^{j} q_i

Cold  Flu  Malaria  P(Fever)  P(¬Fever)
F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1

Number of parameters is linear in the number of parents.
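The fever table follows from just the three per-cause failure probabilities (q_cold = 0.6, q_flu = 0.2, q_malaria = 0.1, read off the ¬Fever column). A sketch recomputing it:

```python
from itertools import product

q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}  # per-cause failure probabilities

def p_fever(active_causes):
    """Noisy-OR: P(Fever | causes) = 1 - product of q_i over the active causes."""
    p_not_fever = 1.0
    for cause in active_causes:
        p_not_fever *= q[cause]
    return 1.0 - p_not_fever

for cold, flu, malaria in product([False, True], repeat=3):
    active = [name for name, on in zip(("Cold", "Flu", "Malaria"),
                                       (cold, flu, malaria)) if on]
    print(cold, flu, malaria, round(p_fever(active), 3))
# Reproduces the table, e.g. P(Fever | cold, flu, ¬malaria) = 1 - 0.6 * 0.2 = 0.88.
```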

CS 580 22 Hybrid (discrete + continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost).
[Network figure: Subsidy? and Harvest are parents of Cost; Cost is the parent of Buys?]
Option 1: discretization: possibly large errors, large CPTs
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete + continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)

CS 580 23 Continuous child variables
Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents.
Most common is the linear Gaussian model, e.g.:
P(Cost = c | Harvest = h, Subsidy? = true)
  = N(a_t h + b_t, σ_t)(c)
  = (1 / (σ_t √(2π))) exp( −(1/2) ((c − (a_t h + b_t)) / σ_t)^2 )
Mean Cost varies linearly with Harvest; the variance is fixed.
Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow.
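A sketch of the linear Gaussian density above. The parameter values a_t, b_t, σ_t used in the call are illustrative placeholders, not numbers from the slides:

```python
import math

def linear_gaussian(c, h, a_t, b_t, sigma_t):
    """P(Cost = c | Harvest = h, Subsidy? = true): Gaussian in c whose mean
    varies linearly with h and whose standard deviation is fixed."""
    mean = a_t * h + b_t
    z = (c - mean) / sigma_t
    return math.exp(-0.5 * z * z) / (sigma_t * math.sqrt(2.0 * math.pi))

# Illustrative parameters only: density of Cost = 5 when Harvest = 4.
print(linear_gaussian(5.0, 4.0, a_t=-0.5, b_t=7.0, sigma_t=1.0))
```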

CS 580 24 Continuous child variables
[Figure: three density surfaces over Cost and Harvest: P(Cost | Harvest, Subsidy? = true), P(Cost | Harvest, Subsidy? = false), and P(Cost | Harvest)]
All-continuous network with LG distributions ⇒ the full joint is a multivariate Gaussian.
Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values.

CS 580 25 Discrete variable w/ continuous parents
Probability of Buys? given Cost should be a soft threshold:
[Figure: P(Buys? = false | Cost = c) rising smoothly from 0 to 1 as Cost c goes from 0 to 12]
Probit distribution uses the integral of the Gaussian:
Φ(x) = integral from −∞ to x of N(0, 1)(u) du
P(Buys? = true | Cost = c) = Φ((−c + µ) / σ)
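The probit is just the standard-normal CDF applied to a rescaled cost; a sketch using `math.erf` (the values of µ and σ are illustrative):

```python
import math

def std_normal_cdf(x):
    """Phi(x): integral of N(0, 1) from -infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_buys_probit(c, mu=6.0, sigma=1.0):
    """P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma); mu, sigma illustrative."""
    return std_normal_cdf((-c + mu) / sigma)

print(p_buys_probit(4.0), p_buys_probit(6.0), p_buys_probit(8.0))
# Soft threshold around Cost = mu: high, 0.5, low.
```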

CS 580 26 Can view as hard threshold whose location is subject to noise

CS 580 27 Discrete variable contd.
Sigmoid (or logit) distribution, also used in neural networks:
P(Buys? = true | Cost = c) = 1 / (1 + exp(−2 (−c + µ) / σ))
The sigmoid has a similar shape to the probit but much longer tails:
[Figure: P(Buys? = false | Cost = c) as a function of Cost c from 0 to 12]
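The logit version only changes the squashing function; a sketch (same illustrative µ and σ as in the probit sketch) that makes the longer tails visible far from the threshold:

```python
import math

def p_buys_logit(c, mu=6.0, sigma=1.0):
    """P(Buys? = true | Cost = c) = 1 / (1 + exp(-2 (-c + mu) / sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

# Compared with the probit, the sigmoid decays much more slowly away from mu:
for c in (6.0, 9.0, 12.0):
    print(c, p_buys_logit(c))
```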