Bayesian belief networks. Inference.

Lecture 13: Bayesian belief networks. Inference.
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Midterm exam
- Monday, March 17, 2003, in class
- Closed book
- Covers material through Wednesday, March 12
- Last year's midterm will be posted on the web

Project proposals
Due: Monday, March 24, 2003
Written proposal, 1-2 pages long:
1. Outline of a learning problem and the type of data you have available. Why is the problem important?
2. Learning methods you plan to try and implement for the problem. References to previous work.
3. How you plan to test and compare the learning approaches.
4. Schedule of work (approximate timeline).
Plus a short 5-minute PPT presentation summarizing points 1-4.

Modeling uncertainty with probabilities
Full joint distribution: the joint distribution over all random variables defining the domain.
- It is sufficient to represent the complete domain and to perform any type of probabilistic inference.
Problems:
- Space complexity. Storing the full joint distribution requires remembering O(d^n) numbers (n = number of random variables, d = number of values per variable).
- Inference complexity. Computing some queries requires O(d^n) steps.
- Acquisition problem. Who is going to define all of the probability entries?

Pneumonia example. Complexities.
Space complexity:
- Pneumonia (2 values: T, F), Fever (2: T, F), Cough (2: T, F), WBCcount (3: high, normal, low), Paleness (2: T, F)
- Number of assignments: 2*2*2*3*2 = 48
- We need to define at least 47 probabilities.
Time complexity:
- Assume we need to compute the probability of Pneumonia = T from the full joint:
  P(Pneumonia = T) = Σ_{i∈{T,F}} Σ_{j∈{T,F}} Σ_{k∈{h,n,l}} Σ_{u∈{T,F}} P(Pneumonia = T, Fever = i, Cough = j, WBCcount = k, Pale = u)
- Sum over 2*2*3*2 = 24 combinations.

Bayesian belief networks (BBNs)
Bayesian belief networks:
- Represent the full joint distribution over the variables more compactly, with a smaller number of parameters.
- Take advantage of conditional and marginal independences among random variables:
  - A and B are independent:  P(A, B) = P(A) P(B)
  - A and B are conditionally independent given C:  P(A, B | C) = P(A | C) P(B | C),  equivalently  P(A | C, B) = P(A | C)
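The blow-up above is easy to see in code. A minimal sketch of the Pneumonia full joint, assuming hypothetical probability entries (the lecture does not give the actual numbers, so the table is filled with random values and normalized):

```python
import itertools, random

# Variables and their value domains from the Pneumonia example.
domains = {
    "Pneumonia": ["T", "F"],
    "Fever":     ["T", "F"],
    "Cough":     ["T", "F"],
    "WBCcount":  ["high", "normal", "low"],
    "Pale":      ["T", "F"],
}
names = list(domains)

# Full joint distribution: one entry per assignment -> 2*2*2*3*2 = 48 numbers
# (47 of them free). Entries here are random placeholders, purely hypothetical.
assignments = list(itertools.product(*domains.values()))
weights = [random.random() for _ in assignments]
total = sum(weights)
joint = {a: w / total for a, w in zip(assignments, weights)}
print(len(joint))                       # 48 stored numbers

# Computing P(Pneumonia = T) sums over the 24 combinations of the other variables.
p_pneumonia_T = sum(p for a, p in joint.items() if a[names.index("Pneumonia")] == "T")
print(p_pneumonia_T)
```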

Bayesian belief network.
1. Directed acyclic graph:
   - Nodes = random variables
   - Links express direct dependencies; missing links encode independences.
2. Local conditional distributions relate variables and their parents.
Example (Alarm network): Burglary [P(B)] and Earthquake [P(E)] are parents of Alarm [P(A | B, E)]; Alarm is the parent of JohnCalls [P(J | A)] and MaryCalls [P(M | A)].

Bayesian belief network. CPTs for the Alarm network:

P(B):  B=T 0.001,  B=F 0.999
P(E):  E=T 0.002,  E=F 0.998

P(A | B, E):
  B=T, E=T:  A=T 0.95,   A=F 0.05
  B=T, E=F:  A=T 0.94,   A=F 0.06
  B=F, E=T:  A=T 0.29,   A=F 0.71
  B=F, E=F:  A=T 0.001,  A=F 0.999

P(J | A):
  A=T:  J=T 0.90,  J=F 0.10
  A=F:  J=T 0.05,  J=F 0.95

P(M | A):
  A=T:  M=T 0.70,  M=F 0.30
  A=F:  M=T 0.01,  M=F 0.99

Bayesian belief networks (general)
Two components: B = (S, Θ_S)
1. Directed acyclic graph S:
   - Nodes correspond to random variables.
   - (Missing) links encode independences.
2. Parameters Θ_S:
   - Local conditional probability distributions, one for every variable-parent configuration: P(X_i | pa(X_i)), where pa(X_i) stands for the parents of X_i.
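The same CPTs can be written down directly as data. A minimal sketch in Python (the numbers are exactly the table entries above; the dictionary layout is just one possible choice):

```python
# Alarm-network CPTs as plain Python dictionaries. For conditional tables, the
# outer key is the tuple of parent values (or the single parent value) and the
# inner dictionary maps the child's value to its probability.
P_B = {True: 0.001, False: 0.999}                       # P(B)
P_E = {True: 0.002, False: 0.998}                       # P(E)
P_A = {                                                 # P(A | B, E)
    (True, True):   {True: 0.95,  False: 0.05},
    (True, False):  {True: 0.94,  False: 0.06},
    (False, True):  {True: 0.29,  False: 0.71},
    (False, False): {True: 0.001, False: 0.999},
}
P_J = {True:  {True: 0.90, False: 0.10},                # P(J | A)
       False: {True: 0.05, False: 0.95}}
P_M = {True:  {True: 0.70, False: 0.30},                # P(M | A)
       False: {True: 0.01, False: 0.99}}

# Sanity check: every conditional row sums to 1.
for row in list(P_A.values()) + list(P_J.values()) + list(P_M.values()):
    assert abs(sum(row.values()) - 1.0) < 1e-9
```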

Full joint distribution in BBNs
The full joint distribution is defined in terms of the local conditional distributions (obtained via the chain rule):

  P(X_1, X_2, ..., X_n) = Π_{i=1..n} P(X_i | pa(X_i))

Example: assume the following assignment of values to the random variables: B=T, E=T, A=T, J=T, M=F. Then its probability is:

  P(B=T, E=T, A=T, J=T, M=F) = P(B=T) P(E=T) P(A=T | B=T, E=T) P(J=T | A=T) P(M=F | A=T)

Bayesian belief networks (BBNs)
Bayesian belief networks represent the full joint distribution over the variables more compactly, using the product of local conditionals. But how did we get to the local parameterizations?
Answer: the graphical structure encodes conditional and marginal independences among the random variables:
- A and B are independent:  P(A, B) = P(A) P(B)
- A and B are conditionally independent given C:  P(A | C, B) = P(A | C),  equivalently  P(A, B | C) = P(A | C) P(B | C)
The graph structure implies the decomposition!
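Plugging in the CPT entries from the tables above, the example probability is a product of five numbers (a quick check):

```python
# P(B=T, E=T, A=T, J=T, M=F) as a product of local conditionals (CPT entries above).
p = 0.001 * 0.002 * 0.95 * 0.90 * 0.30
#   P(B=T)  P(E=T)  P(A=T|B=T,E=T)  P(J=T|A=T)  P(M=F|A=T)
print(p)   # approx 5.13e-07
```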

Independences in BBNs
3 basic independence structures (in the Alarm network):
1. Serial connection (chain):  Burglary -> Alarm -> JohnCalls
2. Common effect (converging):  Burglary -> Alarm <- Earthquake
3. Common cause (diverging):  JohnCalls <- Alarm -> MaryCalls
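As a quick numerical illustration of structure 1, the following sketch checks that JohnCalls becomes independent of Burglary once Alarm is known, using the Alarm-network CPT numbers above (Earthquake is summed out):

```python
# CPT numbers: probability that each variable is True, given its parents.
pB, pE = 0.001, 0.002
pA = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
pJ = {True: 0.90, False: 0.05}
TF = [True, False]

def bern(p_true, value):
    """P(X = value) for a binary variable with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, a, j):      # P(B=b, A=a, J=j), with E summed out
    return (bern(pB, b)
            * sum(bern(pA[(b, e)], a) * bern(pE, e) for e in TF)
            * bern(pJ[a], j))

for a in TF:
    p_j_given_a = (sum(joint(b, a, True) for b in TF)
                   / sum(joint(b, a, j) for b in TF for j in TF))
    for b in TF:
        p_j_given_ab = joint(b, a, True) / sum(joint(b, a, j) for j in TF)
        assert abs(p_j_given_ab - p_j_given_a) < 1e-9    # P(J=T | A, B) = P(J=T | A)
print("J is conditionally independent of B given A")
```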

Full joint distribution in BBNs
Rewrite the full joint probability using the product rule and the independences encoded by the graph:

  P(B=T, E=T, A=T, J=T, M=F)
    = P(J=T | B=T, E=T, A=T, M=F) P(B=T, E=T, A=T, M=F)
    = P(J=T | A=T) P(M=F | B=T, E=T, A=T) P(B=T, E=T, A=T)
    = P(J=T | A=T) P(M=F | A=T) P(A=T | B=T, E=T) P(B=T, E=T)
    = P(J=T | A=T) P(M=F | A=T) P(A=T | B=T, E=T) P(B=T) P(E=T)

Parameter complexity problem
In the BBN the full joint distribution is expressed as a product of conditionals of (much smaller) complexity:

  P(X_1, X_2, ..., X_n) = Π_{i=1..n} P(X_i | pa(X_i))

Parameters (Alarm network):
- full joint: 2^5 = 32
- BBN: 2^3 + 2*(2^2) + 2*(2) = 20
Parameters to be defined (free parameters):
- full joint: 2^5 - 1 = 31
- BBN: 2^2 + 2*(2) + 2*(1) = 10
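The free-parameter counts can be checked mechanically. A small sketch (the `parents` dictionary simply restates the Alarm network's structure; every variable is binary):

```python
import math

# Parent sets of the Alarm network.
parents = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
arity = {v: 2 for v in parents}            # number of values per variable

# Each node contributes (arity - 1) free parameters per parent configuration.
bbn_free = sum((arity[v] - 1) * math.prod(arity[p] for p in parents[v])
               for v in parents)
full_joint_free = math.prod(arity.values()) - 1
print(bbn_free, full_joint_free)           # 10 31
```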

Model acquisition problem
- The structure of the BBN typically reflects causal relations; BBNs are also sometimes referred to as causal networks.
  - Causal structure is very intuitive in many application domains and is relatively easy to obtain from a domain expert.
- Probability parameters of a BBN correspond to conditional distributions relating a random variable and its parents only.
  - Their complexity is much smaller than that of the full joint.
  - It is easier to come up with (estimate) the probabilities, either from an expert or automatically by learning from data.

BBNs built in practice
In various areas:
- Intelligent user interfaces (Microsoft)
- Troubleshooting, diagnosis of a technical device
- Medical diagnosis: Pathfinder (Intellipath), CPCS, Munin, QMR-DT
- Collaborative filtering
- Military applications
- Insurance, credit applications

Diagnosis of car engine
- Diagnose the engine start problem.

Car insurance example
- Predict claim costs (medical, liability) based on application data.

(ICU) Alarm network

CPCS
- Computer-based Patient Case Simulation system (CPCS-PM), developed by Parker and Miller at the University of Pittsburgh.
- 422 nodes and 867 arcs.

QMR-DT
- Medical diagnosis in internal medicine.
- Bipartite network of disease/finding relations.
- Derived from the Internist-1/QMR knowledge base.

Inference in Bayesian networks
A BBN models the full joint distribution compactly by taking advantage of the existing independences between variables (a smaller number of parameters). But we are interested in solving various inference tasks:
- Diagnostic task (from effect to cause):  P(Burglary = T | JohnCalls = T)
- Prediction task (from cause to effect):  P(JohnCalls = T | Burglary = T)
- Other probabilistic queries (queries on joint distributions):  P(Alarm)
Question: Can we take advantage of the independences to construct special algorithms and speed up the inference?

Inference in Bayesian networks
Bad news:
- The exact inference problem in BBNs is NP-hard (Cooper).
- Approximate inference is also NP-hard (Dagum, Luby).
But very often we can achieve significant improvements. Assume our Alarm network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls) and assume we want to compute P(J = T).

Inference in Bayesian networks
Computing P(J = T). Approach 1: blind approach.
- Sum out all un-instantiated variables from the full joint, expressing the joint distribution as a product of conditionals:

  P(J = T) = Σ_{b∈{T,F}} Σ_{e∈{T,F}} Σ_{a∈{T,F}} Σ_{m∈{T,F}} P(B = b, E = e, A = a, J = T, M = m)
           = Σ_b Σ_e Σ_a Σ_m P(J = T | A = a) P(M = m | A = a) P(A = a | B = b, E = e) P(B = b) P(E = e)

Computational cost:
- Number of additions: 15
- Number of products: 16*4 = 64
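A minimal sketch of the blind approach in Python, using the Alarm-network CPT numbers from the tables above (the helper `bern` just turns "probability of True" into a probability for either value):

```python
import itertools

# CPT numbers: probability that each variable is True, given its parents.
pB, pE = 0.001, 0.002                                   # P(B=T), P(E=T)
pA = {(True, True): 0.95, (True, False): 0.94,
      (False, True): 0.29, (False, False): 0.001}       # P(A=T | B, E)
pJ = {True: 0.90, False: 0.05}                          # P(J=T | A)
pM = {True: 0.70, False: 0.01}                          # P(M=T | A)

def bern(p_true, value):
    """P(X = value) for a binary variable with P(X = True) = p_true."""
    return p_true if value else 1.0 - p_true

# Sum the product of local conditionals over all 2^4 = 16 assignments of B, E, A, M.
p_j_true = sum(
    bern(pJ[a], True) * bern(pM[a], m) * bern(pA[(b, e)], a) * bern(pB, b) * bern(pE, e)
    for b, e, a, m in itertools.product([True, False], repeat=4)
)
print(p_j_true)    # approx 0.0521
```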

Inference in Bayesian networks
Approach 2: interleave sums and products.
- Combine sums and products in a smart way (multiplications by constants can be taken out of a sum):

  P(J = T) = Σ_a P(J = T | A = a) [Σ_m P(M = m | A = a)] [Σ_b P(B = b) [Σ_e P(A = a | B = b, E = e) P(E = e)]]

Computational cost:
- Number of additions: 1 + 2*(1) + 2*(1 + 2*(1)) = 9
- Number of products: 2*(2 + 2*(1) + 2*(2*(1))) = 16

Inference in Bayesian networks
The smart interleaving of sums and products can help us speed up the computation of joint probability queries. What if we also want to compute P(B = T, J = T)?

  P(B = T, J = T) = Σ_a P(J = T | A = a) [Σ_m P(M = m | A = a)] [P(B = T) [Σ_e P(A = a | B = T, E = e) P(E = e)]]
  P(J = T)        = Σ_a P(J = T | A = a) [Σ_m P(M = m | A = a)] [Σ_b P(B = b) [Σ_e P(A = a | B = b, E = e) P(E = e)]]

There is a lot of shared computation; smart caching of intermediate results can save time across multiple queries.
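A sketch of the interleaved computation, with the same CPT numbers as in the previous sketch (terms that do not depend on the innermost summation variable are pulled out of the inner sums):

```python
# CPT numbers (probability of True given parents), as before.
pB, pE = 0.001, 0.002
pA = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
pJ = {True: 0.90, False: 0.05}
pM = {True: 0.70, False: 0.01}
TF = [True, False]

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

# P(J=T) = sum_a P(J=T|a) [sum_m P(m|a)] [sum_b P(b) [sum_e P(A=a|b,e) P(e)]]
p_j_true = sum(
    bern(pJ[a], True)
    * sum(bern(pM[a], m) for m in TF)                                     # sums to 1
    * sum(bern(pB, b) * sum(bern(pA[(b, e)], a) * bern(pE, e) for e in TF)
          for b in TF)
    for a in TF
)
print(p_j_true)    # approx 0.0521, same value as the blind approach
```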

Inference in Bayesian networks
When does caching of results become handy? When we want to compute a diagnostic query:

  P(B = T | J = T) = P(B = T, J = T) / P(J = T)

These are exactly the probabilities we have just computed! There are other queries for which the caching and ordering of sums and products can be shared to save computation, e.g. the full posterior

  P(B | J = T) = α P(B, J = T),  with normalizing constant α = 1 / P(J = T).

General technique: variable elimination.
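A sketch of the diagnostic query, caching the inner sums over E and M (one value per setting of A) and reusing them for both the numerator and the denominator:

```python
# CPT numbers (probability of True given parents), as before.
pB, pE = 0.001, 0.002
pA = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
pJ = {True: 0.90, False: 0.05}
pM = {True: 0.70, False: 0.01}
TF = [True, False]

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

# Cached inner sums, indexed by the value of A (and B for f_E).
f_M = {a: sum(bern(pM[a], m) for m in TF) for a in TF}                    # = 1 for each a
f_E = {(a, b): sum(bern(pA[(b, e)], a) * bern(pE, e) for e in TF)
       for a in TF for b in TF}

p_BT_JT = sum(bern(pJ[a], True) * f_M[a] * bern(pB, True) * f_E[(a, True)] for a in TF)
p_JT = sum(bern(pJ[a], True) * f_M[a] * sum(bern(pB, b) * f_E[(a, b)] for b in TF)
           for a in TF)
print(p_BT_JT / p_JT)    # P(B=T | J=T), approx 0.016
```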

Inference in Bayesian networks
General idea of variable elimination:

  P(True) = 1 = Σ_a [Σ_j P(J = j | A = a)] [Σ_m P(M = m | A = a)] [Σ_b P(B = b) [Σ_e P(A = a | B = b, E = e) P(E = e)]]

  with factors  f_J(a) = Σ_j P(J = j | A = a),  f_M(a) = Σ_m P(M = m | A = a),
                f_E(a, b) = Σ_e P(A = a | B = b, E = e) P(E = e),  f_B(a) = Σ_b P(B = b) f_E(a, b)

Variable order: A, J, M, B, E. Results are cached in the tree structure.

Inference in Bayesian networks
Exact inference algorithms:
- Symbolic inference (D'Ambrosio)
- Message passing algorithm (Pearl)
- Clustering and join-tree approach (Lauritzen, Spiegelhalter)
- Arc reversal (Olmsted, Shachter)
Approximate inference algorithms:
- Monte Carlo methods: forward sampling, likelihood sampling
- Variational methods
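To make the idea concrete, here is a compact sketch of variable elimination over explicit factors, in the spirit of the f_J, f_M, f_E, f_B factors above. It is a generic illustration with a naive factor representation, not the exact implementation from the lecture; the CPT numbers are those of the Alarm network:

```python
import itertools

# A factor is a pair (variables, table), where table maps tuples of truth values
# (ordered as in `variables`) to numbers.
def multiply(f, g):
    fv, ft = f
    gv, gt = g
    variables = list(dict.fromkeys(fv + gv))
    table = {}
    for vals in itertools.product([True, False], repeat=len(variables)):
        assign = dict(zip(variables, vals))
        table[vals] = (ft[tuple(assign[v] for v in fv)]
                       * gt[tuple(assign[v] for v in gv)])
    return variables, table

def sum_out(var, f):
    fv, ft = f
    keep = [v for v in fv if v != var]
    table = {}
    for vals, p in ft.items():
        key = tuple(v for v, name in zip(vals, fv) if name != var)
        table[key] = table.get(key, 0.0) + p
    return keep, table

def eliminate(factors, var):
    """Multiply all factors that mention `var`, then sum `var` out."""
    related = [f for f in factors if var in f[0]]
    prod = related[0]
    for g in related[1:]:
        prod = multiply(prod, g)
    return [f for f in factors if var not in f[0]] + [sum_out(var, prod)]

# Alarm-network CPTs as factors (numbers from the tables earlier in the lecture).
fB = (["B"], {(True,): 0.001, (False,): 0.999})
fE = (["E"], {(True,): 0.002, (False,): 0.998})
fA = (["A", "B", "E"],
      {(a, b, e): (p if a else 1 - p)
       for (b, e), p in {(True, True): 0.95, (True, False): 0.94,
                         (False, True): 0.29, (False, False): 0.001}.items()
       for a in [True, False]})
fJ = (["J", "A"], {(j, a): (p if j else 1 - p)
                   for a, p in {True: 0.90, False: 0.05}.items()
                   for j in [True, False]})
fM = (["M", "A"], {(m, a): (p if m else 1 - p)
                   for a, p in {True: 0.70, False: 0.01}.items()
                   for m in [True, False]})

factors = [fB, fE, fA, fJ, fM]
for var in ["M", "E", "B", "A"]:          # elimination order
    factors = eliminate(factors, var)
print(factors[0])                          # factor over J: P(J=T) approx 0.0521
```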