12735: Urban Systems Modeling, Lec. 09: Bayesian Networks. Instructor: Matteo Pozzi. [Title slide: the nine-node example network x1, ..., x9.]

outline:
- example of applications
- how to shape a problem as a BN
- complexity of the inference problem
- inference via variable elimination
- inference via junction tree
- MCMC approximate inference

intro on Bayesian Networks: random variables are nodes; links define conditional dependence/independence. Example: magnitude M, seismic intensity I, damage D, all discrete variables with N possible values each. P(M) is an N x 1 table, P(I|M) an N x N table, P(D|M,I) an N x N x N table. JOINT PROBABILITY, by the chain rule (product rule): P(M, I, D) = P(M) P(I|M) P(D|M, I), an N x N x N table. Each variable is defined by a table whose number of dimensions equals the number of parents plus one.
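
As a minimal sketch of this chain-rule factorization (hypothetical numbers, three states per variable, NumPy assumed), each node carries one table whose dimension is the number of parents plus one:

```python
import numpy as np

# Hypothetical CPTs for magnitude M, intensity I, damage D (3 states each).
p_M = np.array([0.7, 0.2, 0.1])                  # P(M): 1-D table (root)
p_I_given_M = np.array([[0.8, 0.15, 0.05],       # P(I | M): rows indexed by M
                        [0.3, 0.50, 0.20],
                        [0.1, 0.30, 0.60]])
p_D_given_MI = np.ones((3, 3, 3)) / 3.0          # P(D | M, I): parents + 1 = 3 dims

# Chain rule: P(M, I, D) = P(M) P(I|M) P(D|M,I)
joint = p_M[:, None, None] * p_I_given_M[:, :, None] * p_D_given_MI
print(joint.shape, joint.sum())                  # (3, 3, 3), sums to 1
```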

example of Bayesian network: a structural reliability problem with nine variables x1, ..., x9: scenario, material, load, stiffness, strength, demand, stress, damage and loss. A Bayesian network is a set of random variables whose joint distribution is defined through the conditional independence relations encoded in the graph. Roots (nodes with no parents, here scenario x1 and material x2) are defined by their marginal distributions, e.g. P(x1) and P(x2). Children are defined by a conditional distribution given their parents; e.g. the stress is a child of load, stiffness and strength, and the damage x8 is a child of the stress. Joint probability: P(x1, ..., x9) = Π_i P(xi | pa(xi)), where pa(xi) denotes the parents of xi. Typical tasks: prediction, e.g. P(x9) for the loss x9, and conditional prediction, i.e. the distribution of a variable given observations of some others.

applications:
- integrated risk analysis
- predicting global warming
- predicting effects of natural hazards
- road construction
- time models: degrading systems, e.g. due to fatigue (HMM)
- time models: vibration of structures (Kalman filter)

example of a 2-variable BN: magnitude M and seismic intensity I, discrete variables with N possible values each. Joint probability P(M, I): an N x N table with N^2 - 1 degrees of freedom (dofs). Fully connected, or complete, graph: by the chain rule (product rule), P(M, I) = P(M) P(I|M), where P(M) is an N x 1 table with N - 1 dofs and P(I|M) an N x N table with N(N - 1) dofs, for a total of (N - 1) + N(N - 1) = N^2 - 1 dofs, as for the joint table. Reduced graph (no link, M and I independent): P(M, I) = P(M) P(I), with 2(N - 1) dofs. This reduced graph is less powerful than the complete one: it can represent only joint probabilities satisfying P(I|M) = P(I). However, inference is much easier for this graph.

independence [from lec. 2]. If X and Y are independent, P(X,Y) = P(X) P(Y): the joint probability is no richer than the set of marginal probabilities.

P(Y): Y1 = 20%, Y2 = 50%, Y3 = 30%
P(X): X1 = 10%, X2 = 60%, X3 = 30%

P(X,Y)    Y1    Y2    Y3
  X1      2%    5%    3%   | 10%
  X2     12%   30%   18%   | 60%
  X3      6%   15%    9%   | 30%
         20%   50%   30%   | 100%

If X and Y are dependent, the joint probability is richer than the set of marginal probabilities; e.g., with the same marginals:

P(X,Y)    Y1    Y2    Y3
  X1      2%    5%    3%   | 10%
  X2      3%   30%   27%   | 60%
  X3     15%   15%    0%   | 30%
         20%   50%   30%   | 100%

[The slide also shows 3-D bar charts of the two joint distributions P(X,Y).]

example of a 3-variable BN: magnitude M, seismic intensity I, damage D; complete graph. Discrete variables, N possible values each. Joint probability P(M, I, D): an N x N x N table with N^3 - 1 dofs. Chain rule (product rule): P(M, I, D) = P(M) P(I|M) P(D|M, I). If P(D|M, I) = P(D|I), D and M are conditionally independent given I: after observing the intensity, any additional information on the magnitude is irrelevant for inferring the damage. With this structure, P(M, I, D) = P(M) P(I|M) P(D|I), requiring only (N - 1) + 2N(N - 1) dofs.

chain graph for n variables. Complete graph: the joint P(x1, ..., xn) is an N^n table with N^n - 1 dofs. Chain graph: if P(xi | x1, ..., x(i-1)) = P(xi | x(i-1)) for every i, then P(x1, ..., xn) = P(x1) Π_{i=2..n} P(xi | x(i-1)), with (N - 1) + (n - 1) N (N - 1) dofs. The chain graph is less powerful, but much easier to handle. [Figure: number of dofs vs. n on a log scale, for N = 10, comparing the complete graph (exponential growth) with the chain graph (linear growth).]
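
A short sketch (assuming N = 10 states per variable, as in the figure) that reproduces the two dof counts as functions of n:

```python
# Degrees of freedom for n discrete variables with N states each.
N = 10
for n in range(1, 10):
    dof_complete = N**n - 1                       # full joint table
    dof_chain = (N - 1) + (n - 1) * N * (N - 1)   # root marginal + (n-1) transition CPTs
    print(n, dof_complete, dof_chain)
```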

prediction by variable elimination: magnitude M, seismic intensity I, damage D, with graph M → I → D. One option: build the joint probability P(M, I, D) = P(M) P(I|M) P(D|I), an N x N x N table, and derive the marginal of interest by marginalization, e.g. P(D) = Σ_M Σ_I P(M, I, D): you can derive everything from the joint probability. Alternatively, we can derive P(D) without handling any 3-D table, only handling 1-D and 2-D tables: eliminate M to get P(I) = Σ_M P(M) P(I|M), then eliminate I to get P(D) = Σ_I P(I) P(D|I). Each elimination is a vector-matrix product.

prediction by variable elimination [cont.]: the same scheme gives the other marginals from the same 1-D and 2-D tables: P(I) = Σ_M P(M) P(I|M) (with Σ_I P(I) = 1 as a check), while P(M) is directly the root marginal; marginalizing the joint over I and D gives it back, since Σ_I P(I|M) = 1 and Σ_D P(D|I) = 1.
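
A numerical sketch of this elimination (hypothetical CPTs, three states per variable), checking that the chained vector-matrix products reproduce the marginal obtained from the full 3-D joint:

```python
import numpy as np

p_M   = np.array([0.7, 0.2, 0.1])               # P(M)
p_I_M = np.array([[0.8, 0.15, 0.05],            # P(I | M), rows sum to 1
                  [0.3, 0.50, 0.20],
                  [0.1, 0.30, 0.60]])
p_D_I = np.array([[0.9, 0.08, 0.02],            # P(D | I), rows sum to 1
                  [0.5, 0.30, 0.20],
                  [0.2, 0.30, 0.50]])

# Brute force: build the 3-D joint P(M, I, D), then sum over M and I.
joint = p_M[:, None, None] * p_I_M[:, :, None] * p_D_I[None, :, :]
p_D_joint = joint.sum(axis=(0, 1))

# Variable elimination: only 1-D and 2-D tables (vector-matrix products).
p_I = p_M @ p_I_M            # eliminate M:  P(I) = sum_M P(M) P(I|M)
p_D = p_I @ p_D_I            # eliminate I:  P(D) = sum_I P(I) P(D|I)

print(np.allclose(p_D, p_D_joint))   # True
```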

inference by variable elimination: same chain M → I → D, but now the damage is observed, D = d. One option: build the joint probability P(M, I, D) = P(M) P(I|M) P(D|I) and derive the conditional by marginalization and normalization, e.g. P(M | D = d) = Σ_I P(M, I, D = d) / P(D = d): again, you can derive everything from the joint probability.

inference by variable elimination [cont.]: alternatively, by elimination, compute the 2-D factor φ(M, I) = P(M) P(I|M) P(D = d | I); then P(M | D = d) ∝ Σ_I φ(M, I) and P(I | D = d) ∝ Σ_M φ(M, I), where normalization (division by P(D = d) = Σ_M Σ_I φ(M, I)) fixes the constant.
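
The same hypothetical CPTs can illustrate the conditioning step: slice the evidence column, marginalize, and normalize (a sketch, not the lecture's own code):

```python
import numpy as np

p_M   = np.array([0.7, 0.2, 0.1])                                        # P(M)
p_I_M = np.array([[0.8, 0.15, 0.05], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6]])  # P(I|M)
p_D_I = np.array([[0.9, 0.08, 0.02], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])  # P(D|I)

d = 2                                           # observed damage state (index)

# Unnormalized 2-D factor over (M, I):  P(M) P(I|M) P(D=d|I)
phi_MI = p_M[:, None] * p_I_M * p_D_I[None, :, d]

post_M = phi_MI.sum(axis=1); post_M /= post_M.sum()   # P(M | D=d)
post_I = phi_MI.sum(axis=0); post_I /= post_I.sum()   # P(I | D=d)
print(post_M, post_I)
```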

best order of elimination: consider the sub-graph with load x3, stiffness x4, strength x5, stress x6 (child of x3, x4, x5) and damage x8 (child of x6). The efficiency of the algorithm depends on the order in which variables are eliminated: by selecting an inappropriate order, you may increase the dimension of the intermediate Conditional Probability Tables (CPTs). E.g., for predicting the damage x8, it is not efficient to eliminate the stress x6 first: Σ_{x6} P(x6 | x3, x4, x5) P(x8 | x6) is a 4-D table relating the damage to {load, stiffness, strength}, i.e. a factor over {x3, x4, x5, x8}.
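
A rough illustration of the ordering effect, using random hypothetical CPTs with N = 4 states: eliminating the stress first creates a 4-D factor over {load, stiffness, strength, damage}, while a better order keeps every intermediate factor small:

```python
import numpy as np
rng = np.random.default_rng(0)
N = 4  # hypothetical number of states per variable

def cpt(*dims):
    """Random CPT normalized over the last axis (child given parents)."""
    t = rng.random(dims)
    return t / t.sum(axis=-1, keepdims=True)

p3, p4, p5 = cpt(N), cpt(N), cpt(N)        # roots: load, stiffness, strength
p6 = cpt(N, N, N, N)                       # stress | load, stiffness, strength
p8 = cpt(N, N)                             # damage | stress

# Bad order: eliminate stress (x6) first -> 4-D factor over (x3, x4, x5, x8)
f_bad = np.einsum('abcs,sd->abcd', p6, p8)
print(f_bad.shape)          # (4, 4, 4, 4): N^4 entries

# Better: sum out the roots first, keeping factors small
p_stress = np.einsum('a,b,c,abcs->s', p3, p4, p5, p6)   # P(stress)
p_damage = p_stress @ p8                                # P(damage)
print(p_damage.sum())       # ~1.0
```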

branching graph: seismic intensity I, damage on building 1 D1, damage on building 2 D2, both children of I. Joint probability: P(I, D1, D2) = P(I) P(D1|I) P(D2|I), a 3-D table; however, in modeling no 3-D table is used, only the 1-D and 2-D CPTs. Tasks: prediction, P(D1) = Σ_I P(I) P(D1|I); after observing D1 and D2, P(I | D1, D2) ∝ P(I) P(D1|I) P(D2|I); after observing D1, P(D2 | D1) = Σ_I P(I | D1) P(D2|I). D1 and D2 are NOT independent while I is not fixed; however, D1 becomes irrelevant for D2 after observing I (conditional independence given I).
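
A quick numerical check with hypothetical tables: D1 and D2 are marginally dependent, but become independent once the intensity I is observed:

```python
import numpy as np

p_I   = np.array([0.6, 0.3, 0.1])                                        # P(I)
p_D_I = np.array([[0.9, 0.08, 0.02], [0.5, 0.3, 0.2], [0.1, 0.3, 0.6]])  # P(D1|I) = P(D2|I)

# Joint P(I, D1, D2) = P(I) P(D1|I) P(D2|I), built here only to check independence.
joint = p_I[:, None, None] * p_D_I[:, :, None] * p_D_I[:, None, :]

p_D1D2 = joint.sum(axis=0)                        # marginal over the two damages
p_D1, p_D2 = p_D1D2.sum(axis=1), p_D1D2.sum(axis=0)
print(np.allclose(p_D1D2, np.outer(p_D1, p_D2)))          # False: marginally dependent

i = 1                                             # condition on an observed intensity
p_cond = joint[i] / joint[i].sum()                # P(D1, D2 | I = i)
print(np.allclose(p_cond, np.outer(p_D_I[i], p_D_I[i])))  # True: independent given I
```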

V graph: load 1 L1, load 2 L2, damage D (child of both). Joint probability: P(L1, L2, D) = P(L1) P(L2) P(D | L1, L2). Tasks: prediction, P(D) = Σ_{L1} Σ_{L2} P(L1) P(L2) P(D | L1, L2); after observing L1, P(D | L1) = Σ_{L2} P(L2) P(D | L1, L2). L1 and L2 are independent as long as D is not observed: information on L1 is then irrelevant for L2.

V graph [cont.]: after observing D, P(L2 | D) ∝ Σ_{L1} P(L1) P(L2) P(D | L1, L2): knowledge of L1 is used for building the likelihood. After observing D and L1, P(L2 | D, L1) ∝ P(L2) P(D | L1, L2). Conditionally on D (having observed the damage), the variables L1 and L2 are NOT independent: this is an example of INDUCED DEPENDENCE (induced correlation).
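
A small sketch of induced dependence (explaining away), with hypothetical two-state loads and a hypothetical failure table: L1 and L2 are independent a priori, but not after observing the damage:

```python
import numpy as np

p_L = np.array([0.7, 0.3])                       # P(L1) = P(L2): low / high load
# P(D = 1 | L1, L2): damage more likely when either load is high (hypothetical numbers)
p_fail = np.array([[0.05, 0.4],
                   [0.4,  0.9]])

# Joint P(L1, L2, D) with D in {0, 1}
joint = np.empty((2, 2, 2))
joint[:, :, 1] = p_L[:, None] * p_L[None, :] * p_fail
joint[:, :, 0] = p_L[:, None] * p_L[None, :] * (1 - p_fail)

# Before observing D: L1 and L2 are independent.
p_L1L2 = joint.sum(axis=2)
print(np.allclose(p_L1L2, np.outer(p_L, p_L)))          # True

# After observing damage D = 1: induced dependence (explaining away).
post = joint[:, :, 1] / joint[:, :, 1].sum()            # P(L1, L2 | D = 1)
m1, m2 = post.sum(axis=1), post.sum(axis=0)
print(np.allclose(post, np.outer(m1, m2)))              # False
```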

inference via variable elimination and junction tree: chain M → I → D (magnitude, seismic intensity, damage). Target: P(D). Method: eliminate M to get P(I), then eliminate I to get P(D). The variables to be eliminated depend on the specific query: if we are interested in more than one query, we may repeat some operations across queries. The Junction Tree is an algorithm that answers all possible queries without repeating operations; for this chain, it is made of clique {M, I}, separator {I} and clique {I, D}.

HMM revisited: hidden states S0, S1, ..., Sk, Sk+1, ..., Sn, with observations y1, ..., yk, yk+1, ..., yn. Task: compute P(Sn | y1, ..., yn). Method: eliminate S0, process y1; eliminate S1, process y2; ...; eliminate S(n-1), process yn. Each elimination is a prediction step and each processing of an observation a correction step: the prediction-correction algorithm is an application of a best elimination order.
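
A minimal sketch of this prediction-correction (forward) pass for a discrete HMM, with hypothetical transition and emission tables:

```python
import numpy as np

def forward_filter(p_S0, A, E, obs):
    """Prediction-correction (forward) pass for a discrete HMM.

    p_S0 : initial distribution over hidden states, shape (K,)
    A    : transition matrix A[i, j] = P(S_k = j | S_{k-1} = i), shape (K, K)
    E    : emission matrix  E[j, y] = P(y_k = y | S_k = j),      shape (K, M)
    obs  : sequence of observed symbols y_1, ..., y_n
    Returns P(S_n | y_1, ..., y_n).
    """
    belief = p_S0
    for y in obs:
        belief = belief @ A           # prediction: eliminate S_{k-1}
        belief = belief * E[:, y]     # correction: process y_k
        belief /= belief.sum()        # normalize by P(y_k | y_{1:k-1})
    return belief

# Hypothetical 2-state chain with 2 observation symbols.
p_S0 = np.array([0.9, 0.1])
A = np.array([[0.95, 0.05], [0.10, 0.90]])
E = np.array([[0.8, 0.2], [0.3, 0.7]])
print(forward_filter(p_S0, A, E, obs=[0, 1, 1]))
```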

conditions for exact inference. Discrete variables: exact inference is possible, up to the curse of dimensionality. Continuous variables: integrals take the place of sums, and generally integrals cannot be solved in closed form; however, they can be solved for Gaussian Linear Models (GLMs). Condition for a GLM: if vector x_pa(i) lists all parents of x_i, then x_i is a linear function of its parents plus Gaussian noise, i.e. p(x_i | x_pa(i)) = N(x_i ; a_i^T x_pa(i) + b_i, σ_i^2). GLMs are used for dynamic systems (Kalman filters). A GLM can be seen as a special case of a Gaussian process, with special independence relations (while Gaussian processes are complete graphs). Other problems can also be mapped into a GLM: for example, lognormal models can be mapped into GLMs by taking the log. Hybrid graphs have also been proposed, mixing discrete and continuous variables by imposing some rules.

approximate inference. MC: sequential (ancestral) sampling. We start by sampling the roots from their marginals, then each other variable conditional on its (sampled) parents. After observing any variable, we can reject the samples that are not compatible with the observations, or use importance sampling. MCMC: Gibbs sampling. We repeatedly sample one variable at a time, conditional on the other variables in its Markov blanket (kept fixed). It is an application of the Metropolis algorithm with a special proposal distribution. [Figures: Markov blanket; Gibbs sampling.] See Russell and Norvig (2010) and Barber (2012).
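
A sketch of sequential sampling with rejection for the M → I → D chain used earlier (hypothetical CPTs; samples not matching the observed damage are discarded), approximating the posterior that variable elimination computes exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

p_M   = np.array([0.7, 0.2, 0.1])
p_I_M = np.array([[0.8, 0.15, 0.05], [0.3, 0.5, 0.2], [0.1, 0.3, 0.6]])
p_D_I = np.array([[0.9, 0.08, 0.02], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])

d_obs, kept = 2, []
for _ in range(100_000):
    # sequential sampling: roots first, then each child given its sampled parents
    m = rng.choice(3, p=p_M)
    i = rng.choice(3, p=p_I_M[m])
    d = rng.choice(3, p=p_D_I[i])
    if d == d_obs:                 # rejection step: keep only samples matching the evidence
        kept.append(m)

post_M = np.bincount(kept, minlength=3) / len(kept)
print(post_M)                      # approximates P(M | D = d_obs)
```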

summary. Inference and prediction in a Bayesian Network can, in principle, be done in three steps: i) compute the joint probability, P(x1, ..., xn) = Π_i P(xi | pa(xi)); ii) compute the conditional distribution given the observed variables x_obs, P(x_unobs | x_obs) = P(x1, ..., xn) / P(x_obs); iii) marginalize to keep only the variables of interest x_q, P(x_q | x_obs) = Σ P(x_unobs | x_obs), where the sum runs over all variables except x_q and x_obs. All exact and approximate methods serve to overcome the computational difficulties of this brute-force approach.

HMM with the dummy (brute-force) algorithm: hidden states S0, ..., Sn, observations y1, ..., yn. Task: compute P(Sn | y(1:n)). i) compute the joint probability P(S(0:n), y(1:n)): a huge table/function, so this is not an effective path. ii) compute the conditional distribution P(S(0:n) | y(1:n)) = P(S(0:n), y(1:n)) / P(y(1:n)). iii) marginalize over the variables that are not of interest: P(Sn | y(1:n)) = Σ_{S(0:n-1)} P(S(0:n) | y(1:n)).

references
Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge University Press. Downloadable from http://web4.cs.ucl.ac.uk/staff/d.barber/pmwiki/pmwiki.php?n=brml.homepage
Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer.
Russell, S. and P. Norvig (2010). Artificial Intelligence: A Modern Approach. Pearson Education.