Bayesian Networks BY: MOHAMAD ALSABBAGH


Outline
- Introduction
- Bayes Rule
- Bayesian Networks (BN) Representation
- Size of a Bayesian Network
- Inference via BN
- BN Learning
- Dynamic BN

Introduction
- Conditional probability: P(A | B) = P(A, B) / P(B)
- Product rule: P(A, B) = P(A | B) P(B)
- Chain rule: P(X1, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) ... = Π_i P(Xi | X1, ..., Xi-1)
- Independence: A and B are independent iff P(A, B) = P(A) P(B), equivalently P(A | B) = P(A)
- Conditional independence: A and B are conditionally independent given C iff P(A, B | C) = P(A | C) P(B | C)

Bayes Rule
P(A | B) = P(B | A) P(A) / P(B)
P(Cause | Evidence) = P(Evidence | Cause) P(Cause) / P(Evidence)
Thomas Bayes (1701-1761)

Bayesian Networks (BN) Representation
- A directed acyclic graph (DAG) with one node per random variable.
- Each node carries a conditional distribution given its parents, represented by a conditional probability table (CPT).
- A BN encodes the joint distribution efficiently, as a product of local conditional distributions:
  P(X1, ..., Xn) = Π_i P(Xi | Parents(Xi))

Bayesian Networks (BN) Representation
Examples (burglary alarm network):
P(b, ¬e, a, m, ¬j) = P(b) P(¬e) P(a | b, ¬e) P(m | a) P(¬j | a) = 0.001 × 0.998 × 0.94 × 0.7 × 0.1 = 0.0000656684
P(b, ¬e, a, m, j) = 0.001 × 0.998 × 0.94 × 0.7 × 0.9 = 0.0005910156
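
A minimal sketch of this computation in Python: the joint probability is just a product of local CPT entries. The CPT numbers below are the standard textbook burglary-alarm values (Russell & Norvig), which the example above appears to use; they are assumed here for illustration.

```python
def p(value, p_true):
    """Probability that a Boolean variable takes the given value."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    p_alarm = {(True, True): 0.95, (True, False): 0.94,
               (False, True): 0.29, (False, False): 0.001}[(b, e)]
    p_john = 0.90 if a else 0.05
    p_mary = 0.70 if a else 0.01
    return p(b, 0.001) * p(e, 0.002) * p(a, p_alarm) * p(j, p_john) * p(m, p_mary)

print(joint(True, False, True, False, True))  # ~0.0000657
print(joint(True, False, True, True, True))   # ~0.000591
```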

Size of a Bayesian Network
- A full joint distribution over N Boolean variables requires a table of 2^N numbers.
- A BN with N nodes, each with at most k parents, has representation size O(N · 2^k).
- Benefits: a huge saving in space, local CPTs that are easier to calculate, and faster query answering.
- Example: for N = 11 and k = 3 the BN needs 88 numbers vs. 2048 in the full joint table; for N = 30 and k = 5 the BN needs 960 numbers while the full joint distribution needs over a billion.

Inference via BN
- What is inference?
- Exact inference in BN: enumeration, variable elimination
- Approximate inference in BN: sampling

What is Inference?
Inference: compute the posterior probability distribution for a set of query variables given some observed event.
Q = query variable, E = evidence variables.
Examples (alarm BN):
- P(b | j, m) (diagnostic)
- P(e | m) (diagnostic)
- P(m | e) (causal)
- P(a | m, b) (mixed)
- P(b | a, e) (inter-causal)

Exact Inference in BN
Enumeration: sum terms from the full joint distribution.
Example (alarm BN): P(b | j, m) ≈ 0.284
P(b | j, m) = P(b, j, m) / P(j, m) = α P(b, j, m) = α Σ_e Σ_a P(b, e, a, j, m) = α × 0.00059224
P(¬b | j, m) = α × 0.0014919
Since P(b | j, m) + P(¬b | j, m) = 1, α = 1 / (0.00059224 + 0.0014919) ≈ 479.8, giving P(b | j, m) ≈ 0.284.
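
A minimal sketch of inference by enumeration, reusing the joint() function sketched earlier: sum the joint over the hidden variables E and A, then normalize over the query variable B.

```python
from itertools import product

def enumerate_query_burglary(j, m):
    """Return P(B = true | J = j, M = m) by summing out E and A."""
    unnormalized = {}
    for b in (True, False):
        unnormalized[b] = sum(joint(b, e, a, j, m)
                              for e, a in product((True, False), repeat=2))
    alpha = 1.0 / sum(unnormalized.values())   # normalization constant
    return alpha * unnormalized[True]

print(enumerate_query_burglary(True, True))  # ~0.284
```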

Exact Inference in BN
Variable Elimination: improves on the enumeration algorithm by eliminating repeated calculations.
- Store intermediate results (factors).
- The elimination order of hidden variables matters.
- Every variable that is not an ancestor of a query variable or an evidence variable is irrelevant to the query.
Algorithm:
- Start with the local CPTs (instantiated by the evidence).
- While there are still hidden variables (not Q or E_i): pick a hidden variable H, join all factors mentioning H, then eliminate (sum out) H.
- Join all remaining factors and normalize.
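
A minimal sketch of the two core operations, join and sum-out, assuming Boolean variables. The factor representation (a dict mapping a frozenset of (variable, value) pairs to a number) and the two-node example are illustrative, not taken from the slides.

```python
from itertools import product

def join(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    vars1 = {v for key in f1 for v, _ in key}
    vars2 = {v for key in f2 for v, _ in key}
    all_vars = sorted(vars1 | vars2)
    out = {}
    for values in product((True, False), repeat=len(all_vars)):
        assign = dict(zip(all_vars, values))
        k1 = frozenset((v, assign[v]) for v in vars1)
        k2 = frozenset((v, assign[v]) for v in vars2)
        out[frozenset(assign.items())] = f1[k1] * f2[k2]
    return out

def sum_out(var, f):
    """Eliminate var from factor f by summing over its values."""
    out = {}
    for key, p in f.items():
        reduced = frozenset((v, val) for v, val in key if v != var)
        out[reduced] = out.get(reduced, 0.0) + p
    return out

# Example: compute the marginal P(A) from P(B) and P(A | B) in a tiny net B -> A.
f_B = {frozenset({('B', True)}): 0.001, frozenset({('B', False)}): 0.999}
f_A_given_B = {frozenset({('B', True), ('A', True)}): 0.94,
               frozenset({('B', True), ('A', False)}): 0.06,
               frozenset({('B', False), ('A', True)}): 0.05,
               frozenset({('B', False), ('A', False)}): 0.95}
print(sum_out('B', join(f_B, f_A_given_B)))  # marginal distribution over A
```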

Approximate Inference in BN
Sampling:
- Why: sampling is a lot like repeated simulation. Generate N random samples and use them to compute an approximate posterior probability.
- Getting samples is faster than computing the exact answer.
- Learning: we can get samples from a distribution we don't know how to compute.

Approximate Inference in BN
Sampling in BN:
- Prior Sampling: each variable is sampled from its conditional distribution given its parents, in topological order. The estimate is P(x_1, ..., x_m) ≈ N_PS(x_1, ..., x_m) / N.
- Rejection Sampling: no point keeping all samples around. P(C | s) is estimated as before, but samples that don't have S = s (the sprinkler evidence) are rejected.
- Likelihood Weighting: avoids the inefficiency of rejecting samples by generating only samples that are consistent with the evidence e.
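
A minimal sketch of prior sampling and rejection sampling on a small sprinkler-style network (Cloudy -> Sprinkler, Cloudy -> Rain, Sprinkler & Rain -> WetGrass). The CPT values are the usual textbook numbers and are assumed here purely for illustration.

```python
import random

def prior_sample():
    """Sample every variable in topological order, given its parents."""
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)
    r = random.random() < (0.8 if c else 0.2)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.0}[(s, r)]
    return {'C': c, 'S': s, 'R': r, 'W': w}

def rejection_sample_C_given_s(n=100000):
    """Estimate P(Cloudy = true | Sprinkler = true): keep only samples with S = true."""
    kept = [sample for sample in (prior_sample() for _ in range(n)) if sample['S']]
    return sum(sample['C'] for sample in kept) / len(kept)

print(rejection_sample_C_given_s())  # should approach ~0.167 for these CPTs
```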

Approximate Inference in BN
Sampling in BN: Markov chain Monte Carlo (Gibbs Sampling):
- Generates each sample from the previous sample by making a random modification, conditioned on the current values of the variables in the Markov blanket.
- The algorithm wanders randomly around the state space, flipping one variable at a time while keeping the evidence variables fixed.
Gibbs algorithm:
- Fix the evidence, e.g. R = r.
- Initialize the other variables randomly.
- Repeat: resample one non-evidence variable at a time, conditioned on its Markov blanket.
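
A minimal Gibbs-sampling sketch for a Boolean Bayes net. The net representation (parents / cpt dictionaries) is assumed here for illustration: parents[X] is a list of parent names and cpt[X] maps a tuple of parent values to P(X = true | parents).

```python
import random

def p_value(var, value, assign, parents, cpt):
    """P(var = value | its parents) under the current assignment."""
    pt = cpt[var][tuple(assign[p] for p in parents[var])]
    return pt if value else 1.0 - pt

def blanket_score(var, value, assign, parents, cpt):
    """P(var = value | Markov blanket), up to normalization:
    P(var | parents) times the product over children of P(child | its parents)."""
    local = dict(assign, **{var: value})
    score = p_value(var, value, local, parents, cpt)
    for child in parents:
        if var in parents[child]:
            score *= p_value(child, local[child], local, parents, cpt)
    return score

def gibbs(query, evidence, parents, cpt, n_samples=20000):
    hidden = [v for v in parents if v not in evidence]
    assign = dict(evidence, **{v: random.choice([True, False]) for v in hidden})
    hits = 0
    for _ in range(n_samples):
        for var in hidden:   # resample one non-evidence variable at a time
            pt = blanket_score(var, True, assign, parents, cpt)
            pf = blanket_score(var, False, assign, parents, cpt)
            assign[var] = random.random() < pt / (pt + pf)
        hits += assign[query]
    return hits / n_samples
```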

BN Learning
In practical settings the BN is unknown and we need to learn it from data. Given training data (and prior knowledge), we need to estimate the graph topology (network structure) and the parameters of the joint distribution. Learning the structure is harder than learning the BN parameters.
Possible cases of the problem:

Case  BN Structure  Observability  Proposed Learning Method
1     Known         Full           Maximum likelihood estimation
2     Known         Partial        EM (Expectation Maximization), MCMC
3     Unknown       Full           Search through model space
4     Unknown       Partial        EM + search through model space
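
A minimal sketch of case 1 (known structure, fully observed data): the maximum likelihood estimate of each CPT entry is just a ratio of counts. The data format (a list of dicts) and the variable names are assumed here for illustration.

```python
from collections import Counter

def mle_cpt(data, child, parent_names):
    """Estimate P(child = true | parents) for every parent configuration by counting."""
    joint_counts = Counter()   # counts of (parent values) where child is also true
    parent_counts = Counter()  # counts of (parent values)
    for row in data:
        key = tuple(row[p] for p in parent_names)
        parent_counts[key] += 1
        if row[child]:
            joint_counts[key] += 1
    return {key: joint_counts[key] / n for key, n in parent_counts.items()}

# Hypothetical fully observed samples of (Burglary, Earthquake, Alarm):
data = [{'B': True, 'E': False, 'A': True},
        {'B': False, 'E': False, 'A': False},
        {'B': False, 'E': True, 'A': True},
        {'B': False, 'E': False, 'A': False}]
print(mle_cpt(data, 'A', ['B', 'E']))  # P(A = true | B, E) per observed configuration
```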

Dynamic BN
A DBN is a BN that represents a temporal probability model. In general each time slice of a DBN can have any number of state variables X_t and evidence variables E_t. The model structure and parameters don't change over time.
Inference tasks:
- Filtering: P(X_t | e_1:t)
- Prediction: P(X_t+k | e_1:t), k > 0
- Smoothing: P(X_k | e_1:t), 0 <= k <= t
- Most likely explanation: given a sequence of observations, find the most likely sequence of states.
Exact inference: variable elimination. Approximate inference: particle filtering (an improvement on likelihood weighting), MCMC.

Dynamic BN
Special cases of DBN: every HMM is a DBN.
- Discrete state variables; used to model sequences of events.
- A single state variable and a single evidence variable per time slice.
- Any DBN can be converted to an HMM by combining all state variables into one mega-variable over all possible joint values.
- Example: a DBN with 20 Boolean state variables, each with at most 3 parents, needs a transition model with only 160 probabilities, while the corresponding HMM needs 2^40, roughly a trillion, entries in its transition model.

Dynamic BN
Special cases of DBN: every Kalman filter is a DBN.
- Continuous state variables with Gaussian distributions:
  p(y) = (1 / (σ √(2π))) exp(-(y - μ)² / (2σ²))
- A Gaussian distribution is fully defined by its mean and variance.
- Used to model noisy continuous observations, e.g. predicting the motion of a bird in a jungle.
- Not every DBN can be converted to a Kalman filter: a DBN allows nonlinear distributions and can mix discrete and continuous variables, which the Kalman filter does not allow.
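
A minimal one-dimensional Kalman filter sketch (random-walk motion model) to illustrate the Gaussian special case; the noise parameters and the measurement sequence below are assumed purely for illustration.

```python
def kalman_step(mu, var, z, process_var=1.0, obs_var=2.0):
    """One predict + update step; the belief over the state stays Gaussian."""
    # Predict: the state drifts, so uncertainty grows by the process noise.
    mu_pred, var_pred = mu, var + process_var
    # Update: blend the prediction with the observation z via the Kalman gain.
    gain = var_pred / (var_pred + obs_var)
    mu_new = mu_pred + gain * (z - mu_pred)
    var_new = (1.0 - gain) * var_pred
    return mu_new, var_new

mu, var = 0.0, 10.0                # broad initial belief
for z in [1.2, 0.9, 1.4, 1.1]:     # hypothetical noisy observations
    mu, var = kalman_step(mu, var, z)
    print(round(mu, 3), round(var, 3))
```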

Dynamic BN
Constructing a DBN requires:
- The prior distribution over the state variables: P(X_0)
- The transition model: P(X_t+1 | X_t)
- The sensor (observation) model: P(E_t | X_t)
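
A minimal sketch of filtering P(X_t | e_1:t) using exactly these three pieces: a prior, a transition model, and a sensor model. The two-state rain/umbrella numbers are the usual textbook ones, assumed here for illustration.

```python
prior = {True: 0.5, False: 0.5}        # P(X_0): rain or not on day 0
transition = {True: 0.7, False: 0.3}   # P(X_t+1 = rain | X_t)
sensor = {True: 0.9, False: 0.2}       # P(umbrella seen | X_t)

def filter_step(belief, umbrella_seen):
    """One forward step: predict through the transition model, then weight by the evidence."""
    predicted = {x: sum(belief[x_prev] * (transition[x_prev] if x else 1 - transition[x_prev])
                        for x_prev in (True, False))
                 for x in (True, False)}
    weighted = {x: (sensor[x] if umbrella_seen else 1 - sensor[x]) * predicted[x]
                for x in (True, False)}
    z = sum(weighted.values())
    return {x: weighted[x] / z for x in (True, False)}

belief = prior
for e in [True, True, False]:          # hypothetical observation sequence
    belief = filter_step(belief, e)
    print(belief)                      # P(rain | observations so far)
```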

Further resources
- Tools (Belief and Decision Networks): http://www.aispace.org/downloads.shtml
- Books: Artificial Intelligence: A Modern Approach by Russell & Norvig

Summary
- BNs have become extremely popular models.
- BNs are used in many applications, such as machine learning, speech recognition, bioinformatics, medical diagnosis, and weather forecasting.
- BNs are intuitively appealing and convenient for representing both causal and probabilistic semantics.

Q&A Thank you