Extensions of Bayesian Networks


Ethan Howe, James Lenfestey, Tom Temple

Outline
- Intro to Dynamic Bayesian Nets (Tom)
- Exact inference in DBNs with demo (Ethan)
- Approximate inference and learning (Tom)
- Probabilistic Relational Models (James)

Reasoning under Uncertainty
How do we use prior knowledge and new observations to judge what is likely and what is not?
P(s | observations, prior knowledge)
But that is a very large joint distribution.

Bayesian Network
Example: admission decisions, with GPA and Recs as parents of Accepted.

GPA    Recs    P(Accepted)
Low    Great   .05
Low    Good    .05
Ave    Great   .25
Ave    Good    ...
High   Great   ...
High   Good    ...

Features of Bayesian Networks
A Bayesian Network is a Directed Acyclic Graph (DAG) with variables as nodes. Edges go from parent to child, and each child x has a conditional probability table P(x | parents(x)) that defines the effects that x feels from its parents. Intuitively, these edges codify direct relationships between nodes in a cause-effect manner. The lack of an edge between nodes implies a conditional independence.
- Arbitrarily descriptive; allows encapsulation of all the available prior knowledge.
- The model makes no distinction between observed variables and inferred variables.
- The DAG restriction is somewhat limiting.
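To make the CPT idea concrete, here is a minimal Python sketch of the admissions example above, treating GPA and Recs as the parents of Accepted. The prior distributions and the last three acceptance probabilities are placeholders, since only the first three values survive in the transcription:

```python
# Minimal sketch of a Bayesian network as a product of CPTs.
# Structure: GPA -> Accepted <- Recs.
# NOTE: P_gpa, P_recs and the last three acceptance values are illustrative
# placeholders, not numbers from the original slide.

P_gpa = {"Low": 0.3, "Ave": 0.5, "High": 0.2}
P_recs = {"Good": 0.6, "Great": 0.4}
P_accept = {                     # P(Accepted = true | GPA, Recs)
    ("Low", "Great"): 0.05,
    ("Low", "Good"): 0.05,
    ("Ave", "Great"): 0.25,
    ("Ave", "Good"): 0.15,       # placeholder
    ("High", "Great"): 0.90,     # placeholder
    ("High", "Good"): 0.70,      # placeholder
}

def joint(gpa, recs, accepted):
    """P(gpa, recs, accepted) = P(gpa) P(recs) P(accepted | gpa, recs)."""
    p = P_accept[(gpa, recs)]
    return P_gpa[gpa] * P_recs[recs] * (p if accepted else 1.0 - p)

print(joint("Ave", "Great", True))   # 0.5 * 0.4 * 0.25 = 0.05
```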

HMM as a Bayes Net / Dynamic Bayesian Networks
[Slide shows an HMM transition probability table over states S1-S4; the entries are garbled in the transcription.]
DBNs are BNs with a temporal regularity analogous to the HMM. The full representation of a DBN grows with time.

Hybrid State in DBNs
For basis functions, parent variables are used as function parameters. Usually the function is a mixture of Gaussians with known means and variances, and the parent variables decide the mixture weights.

Exact Inference in Bayesian Networks
Queries of the network are fielded by summing out all of the non-query variables. A variable is summed out by collecting all of the factors that contain it into a new factor, then summing over all of the possible states of the variable to be eliminated (a sketch of this operation appears below).

Temporal Inference
- Filtering: probabilities of current states
- Prediction: probabilities of future states
- Smoothing: probabilities of past states

Notation
- X_{t+1}: set of hidden variables in the t+1 time slice
- x_{t+1}: set of values for those hidden variables at t+1
- e_{t+1}: set of evidence at time t+1
- e_{1:t}: set of evidence from all times from 1 to t
- α: normalization constant
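A rough sketch of the sum-out operation from the Exact Inference slide above. The factor representation (a tuple of variable names plus a table keyed by value tuples) is an illustrative choice, not something specified in the slides:

```python
from itertools import product

# A factor is (variables, table): a tuple of variable names and a dict mapping
# value tuples (in the same order) to numbers, e.g.
#   (("A", "B"), {("t", "t"): 0.9, ("t", "f"): 0.1, ("f", "t"): 0.4, ("f", "f"): 0.6})

def sum_out(var, factors, domains):
    """Eliminate `var`: collect every factor that mentions it, multiply them,
    and sum over the possible states of `var`. Returns the untouched factors
    plus the one new factor."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]

    new_vars = sorted({v for vars_, _ in touching for v in vars_} - {var})
    new_table = {}
    for assignment in product(*(domains[v] for v in new_vars)):
        ctx = dict(zip(new_vars, assignment))
        total = 0.0
        for val in domains[var]:
            ctx[var] = val
            prod_val = 1.0
            for vars_, table in touching:
                prod_val *= table[tuple(ctx[v] for v in vars_)]
            total += prod_val
        new_table[assignment] = total
    return rest + [(tuple(new_vars), new_table)]
```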

Toy Example
The umbrella network: Rain_{t-1} → Rain_t → Rain_{t+1}, with Umbrella_t observed at each time slice.
Transition model: P(R_t = true | R_{t-1} = true) = 0.7, P(R_t = true | R_{t-1} = false) = 0.3
Sensor model: P(U_t = true | R_t = true) = 0.9, P(U_t = true | R_t = false) = 0.2

Markov Assumptions
First-order Markov process: P(X_t | X_{0:t-1}) = P(X_t | X_{t-1})
Markov assumption of evidence: P(E_t | X_{0:t}, E_{0:t-1}) = P(E_t | X_t)

Prediction: Separating Time Slices
P(X_{t+1} | e_{1:t+1}) = P(X_{t+1} | e_{1:t}, e_{t+1})   (divide evidence)
  = α P(e_{t+1} | X_{t+1}, e_{1:t}) P(X_{t+1} | e_{1:t})   (Bayes' rule)
  = α P(e_{t+1} | X_{t+1}) P(X_{t+1} | e_{1:t})   (Markov property of evidence)

Predict Next Time Slice from Current
P(X_{t+1} | e_{1:t+1}) = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t, e_{1:t}) P(x_t | e_{1:t})
  = α P(e_{t+1} | X_{t+1}) Σ_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})

Example: Umbrella Seen on Day 1
P(R_0) = {0.5, 0.5}
P(R_1 | u_1) = α P(u_1 | R_1) Σ_{r_0} P(R_1 | r_0) P(r_0)
  = α {0.9, 0.2} (0.5 {0.7, 0.3} + 0.5 {0.3, 0.7})
  = α {0.45, 0.1} = {0.818, 0.182}

Example: Umbrella Seen Again on Day 2
P(R_2 | u_1) = Σ_{r_1} P(R_2 | r_1) P(r_1 | u_1)
P(R_2 | u_1, u_2) = α P(u_2 | R_2) P(R_2 | u_1)
  = α {0.9, 0.2} (0.818 {0.7, 0.3} + 0.182 {0.3, 0.7})
  = α {0.565, 0.075} = {0.883, 0.117}
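The filtering recursion above fits in a few lines of Python. This is a minimal sketch for the umbrella model; the printed vectors match the day-1 and day-2 calculations:

```python
# Filtering (FORWARD) update for the umbrella model.
# State vectors are [P(rain), P(not rain)].

T = [[0.7, 0.3],    # P(R_t | R_{t-1} = true)
     [0.3, 0.7]]    # P(R_t | R_{t-1} = false)
sensor = {True:  [0.9, 0.2],    # P(u_t | R_t = true/false) when the umbrella is seen
          False: [0.1, 0.8]}    # when it is not seen

def forward(f_prev, umbrella_seen):
    # Predict: sum out the previous state.
    predicted = [sum(f_prev[i] * T[i][j] for i in range(2)) for j in range(2)]
    # Update with the new evidence and normalize.
    unnorm = [sensor[umbrella_seen][j] * predicted[j] for j in range(2)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

f0 = [0.5, 0.5]
f1 = forward(f0, True)    # ~[0.818, 0.182]
f2 = forward(f1, True)    # ~[0.883, 0.117]
print(f1, f2)
```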

Encapsulate Prediction
Repeating actions at each step of time; formalize as a procedure:
P(X_{t+1} | e_{1:t+1}) = FORWARD(previous state set, new evidence set)
Only need to pass a message from the previous time step: f_{1:t+1} = α FORWARD(f_{1:t}, e_{t+1}). Can forget about previous times.

Smoothing: Break into Data Before and After
P(X_k | e_{1:t}) = P(X_k | e_{1:k}, e_{k+1:t})   (divide evidence)
  = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k, e_{1:k})   (Bayes' rule)
  = α P(X_k | e_{1:k}) P(e_{k+1:t} | X_k)   (Markov independence)
  = α f_{1:k} b_{k+1:t}

Backwards Message
P(e_{k+1:t} | X_k) = Σ_{x_{k+1}} P(e_{k+1:t} | X_k, x_{k+1}) P(x_{k+1} | X_k)   (condition on X_{k+1})
  = Σ_{x_{k+1}} P(e_{k+1:t} | x_{k+1}) P(x_{k+1} | X_k)   (Markov independence)
  = Σ_{x_{k+1}} P(e_{k+1}, e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)
  = Σ_{x_{k+1}} P(e_{k+1} | x_{k+1}) P(e_{k+2:t} | x_{k+1}) P(x_{k+1} | X_k)

Backwards Procedure
Repeating actions at each step of time; formalize as a procedure:
P(e_{k+1:t} | X_k) = BACKWARD(next state set, future evidence set)
Message passing goes backwards in time: b_{k+1:t} = BACKWARD(b_{k+2:t}, e_{k+1:t})

Example: Was It Really Raining on Day 1?
P(u_2 | R_1) = Σ_{r_2} P(u_2 | r_2) P(u_{3:2} | r_2) P(r_2 | R_1)
  = (0.9 × 1 × {0.7, 0.3}) + (0.2 × 1 × {0.3, 0.7})
  = {0.69, 0.41}
P(R_1 | u_{1:2}) = α f_{1:1} b_{2:2} = α {0.818, 0.182} {0.69, 0.41} = {0.883, 0.117}

Forward-Backward Algorithm
Save FORWARD values at each time step. Each time new data arrives, recompute BACKWARD values.
Time complexity O(t); space complexity O(t). Both can be bounded if we only care about the last k slices.
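A matching sketch of the BACKWARD message and the smoothing step for the same umbrella model; the transition and sensor tables are repeated so the snippet runs on its own:

```python
# BACKWARD message and smoothing for the umbrella model.

T = [[0.7, 0.3], [0.3, 0.7]]                     # P(R_t | R_{t-1}); order [true, false]
sensor = {True: [0.9, 0.2], False: [0.1, 0.8]}   # P(u_t | R_t)

def backward(b_next, umbrella_seen):
    # b_{k+1:t}(r_k) = sum over r_{k+1} of P(e_{k+1} | r_{k+1}) b_{k+2:t}(r_{k+1}) P(r_{k+1} | r_k)
    return [sum(sensor[umbrella_seen][j] * b_next[j] * T[i][j] for j in range(2))
            for i in range(2)]

def smooth(f_k, b_k):
    # P(X_k | e_{1:t}) = alpha * f_{1:k} * b_{k+1:t}
    unnorm = [f_k[i] * b_k[i] for i in range(2)]
    z = sum(unnorm)
    return [p / z for p in unnorm]

f1 = [0.818, 0.182]                  # day-1 filtering result from above
b22 = backward([1.0, 1.0], True)     # ~[0.69, 0.41]
print(smooth(f1, b22))               # ~[0.883, 0.117]: was it really raining on day 1?
```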

Demo: Battery Meter Model
[Slide shows the battery-meter DBN: BMBroken_0 → BMBroken_1, with BMBroken_1 → BMeter_1, and Battery_0 and BatteryDrain feeding Battery_1; the CPT entries are garbled in the transcription.]

Exact Inference in DBNs
There are some additional difficulties with exact inference in DBNs:
- The factors grow to include all the states, which sharply decreases the advantage over HMMs.
- Constant memory requirement.

Approximate Inference
While a number of inference algorithms exist for the standard Bayes net, few of them adapt well to the DBN. One that does adapt is particle filtering, due to its ingenious use of resampling (a sketch appears below).
- Particle filtering deals well with hybrid state.
- Sampling has trouble with unlikely events.

Learning
The three components of a BN (the probabilities, the structure, the variables) can all be learned automatically, though with varying degrees of complexity.

Probability Learning
How do we generate the network? The network contains variables, edges and probabilities, all of which need to be known ahead of time. Given all the variables and their relationships, it is relatively easy to estimate the probabilities from sample data, even if that sample data does not include every variable.
[The GPA/Recs acceptance table from the earlier Bayesian Network slide is shown again here.]
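To illustrate the sample-weight-resample cycle that particle filtering relies on, here is a minimal sketch on the umbrella model rather than the battery-meter demo; with enough particles the estimate comes close to the exact filtering result of 0.883:

```python
import random

# Particle filter sketch for the umbrella model. Each particle is a boolean
# rain state; weights come from the sensor model.

T = {True: 0.7, False: 0.3}        # P(R_t = true | R_{t-1})
SENSOR = {True: 0.9, False: 0.2}   # P(umbrella seen | R_t)

def particle_filter_step(particles, umbrella_seen, rng=random):
    # 1. Propagate each particle through the transition model.
    moved = [rng.random() < T[p] for p in particles]
    # 2. Weight each particle by the likelihood of the evidence.
    weights = [SENSOR[p] if umbrella_seen else 1.0 - SENSOR[p] for p in moved]
    # 3. Resample in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))

particles = [random.random() < 0.5 for _ in range(10_000)]
for _ in range(2):                               # umbrella seen on day 1 and day 2
    particles = particle_filter_step(particles, True)
print(sum(particles) / len(particles))           # roughly 0.88
```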

Structure Learning
Given the variables only, we can guess a structure, learn parameters for that structure, and then rate it based on how well it predicts the data. Care must be taken not to overfit.
- Search the structure space with greedy ascent on this rating (a sketch appears below).
- Randomization helps avoid local maxima.

Hidden Variable Learning
Given a subset of the variables, generate new variables, structure and parameters.
- Greedy ascent on the output of structure learning.
- Some risk of overfitting.

Applications
- Windows printer troubleshooter and the much-maligned paper-clip assistant.
More exciting applications:
- Speech and vision recognition: fewer parameters than traditional HMMs mean better learning.
- Allows for easier experimentation with intuitive (i.e. hand-coded) hierarchical models.
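A schematic sketch of the greedy-ascent structure search described on the Structure Learning slide. The scoring function is left abstract (it would typically be a penalized likelihood of the data), and a real implementation would also reject candidate graphs that contain cycles and use random restarts to escape local maxima:

```python
from itertools import permutations

def neighbors(edges, variables):
    """All structures that differ from `edges` by adding or removing one directed edge.
    (Cycle checking is omitted in this sketch.)"""
    for a, b in permutations(variables, 2):
        yield edges - {(a, b)} if (a, b) in edges else edges | {(a, b)}

def greedy_structure_search(variables, data, score):
    """Hill-climb on score(edges, data), starting from the empty graph.
    (Random restarts, which help avoid local maxima, are omitted.)"""
    edges = frozenset()
    best = score(edges, data)
    while True:
        scored = [(score(cand, data), cand) for cand in neighbors(edges, variables)]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score <= best:          # no single-edge change improves the rating
            return edges
        edges, best = top, top_score
```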

In Depth: Training Isolated Word Recognition
Since the DBN model doesn't make a distinction between observed variables and hidden variables, it can learn a model with access to data that we don't have during prediction. For example, in speech recognition, we know that sounds are made by the pose of the lips, mouth and tongue, and while training we can measure that pose.

Recall a Few Important Points
- Bayesian networks are an arbitrarily expressive model for joint probability distributions.
- Dynamic Bayesian networks allow for reasoning in a temporal model.
- DBNs can be more compact than HMMs and therefore easier to learn and faster to use.
- Tolerant of missing data, and likewise able to exploit bonus data.
- Computationally, the compactness advantage is often lost in exact reasoning.
- Particle filters are an alternative for exploiting this compactness, as long as the important probabilities are large enough to be sampled from.