Directed Graphical Models


CS 2750: Machine Learning. Directed Graphical Models. Prof. Adriana Kovashka, University of Pittsburgh. March 28, 2017

Graphical Models
If no assumption of independence is made, an exponential number of parameters must be estimated, and no realistic amount of training data is sufficient.
If we assume all variables are independent, efficient training and inference are possible, but the assumption is too strong.
Graphical models use graphs over random variables to specify variable dependencies. They allow less restrictive independence assumptions while limiting the number of parameters that must be estimated.
Bayesian networks: directed acyclic graphs indicate causal structure.
Markov networks: undirected graphs capture general dependencies.
(Adapted from Ray Mooney)

Bayesian Networks
Directed acyclic graph (DAG). Nodes are random variables. Edges indicate causal influences.
[Figure: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls]
(Ray Mooney)

Conditional Probability Tables
Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents (the conditioning case). Roots (sources) of the DAG, which have no parents, are given prior probabilities.

Burglary:   P(B) = .001
Earthquake: P(E) = .002

Alarm:
B  E  P(A)
T  T  .95
T  F  .94
F  T  .29
F  F  .001

JohnCalls:
A  P(J)
T  .90
F  .05

MaryCalls:
A  P(M)
T  .70
F  .01

(Ray Mooney)

CPT Comments
The probability of false is not given, since each row must sum to 1.
The example requires 10 parameters rather than the 2^5 − 1 = 31 needed to specify the full joint distribution.
The number of parameters in the CPT for a node is exponential in the number of parents.
(Ray Mooney)
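The 10-versus-31 count can be checked directly from the graph structure. The sketch below assumes binary variables, as in the burglary example; the dictionary name `PARENTS` and both function names are illustrative, not part of the lecture.

```python
# Parents of each node in the burglary network (all variables binary).
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def cpt_parameter_count(parents):
    """One free parameter per parent configuration of each binary node."""
    return sum(2 ** len(ps) for ps in parents.values())

def full_joint_parameter_count(num_vars):
    """A full joint over binary variables has 2^n entries that sum to 1."""
    return 2 ** num_vars - 1

print(cpt_parameter_count(PARENTS))             # CPT parameters
print(full_joint_parameter_count(len(PARENTS)))  # full-joint parameters
```

The Alarm node alone contributes 4 of the 10 parameters, which illustrates the exponential growth in the number of parents.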

Bayes Net Inference
Given known values for some evidence variables, determine the posterior probability of some query variables.
Example: Given that John calls, what is the probability that there is a Burglary?
John calls 90% of the time there is an Alarm, and the Alarm detects 94% of Burglaries, so people generally think it should be fairly high. However, this ignores the prior probability of John calling.
(Ray Mooney)

Bayes Net Inference
Example: Given that John calls, what is the probability that there is a Burglary?
Relevant values: P(B) = .001; P(J | A = T) = .90, P(J | A = F) = .05.
John also calls 5% of the time when there is no Alarm. So over 1,000 days we expect 1 Burglary, and John will probably call. However, he will also call with a false report 50 times on average. So the call is about 50 times more likely to be a false report: P(Burglary | JohnCalls) ≈ 0.02.
(Ray Mooney)
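The back-of-the-envelope estimate can be compared with exact inference by enumeration over the hidden variables. This is a minimal sketch using the CPT values from the slides; the helper names (`bern`, `joint`, `posterior_burglary_given_john`) are made up for illustration.

```python
from itertools import product

# CPT values from the burglary network slides.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # keyed by (B, E)
P_J = {True: 0.90, False: 0.05}                      # keyed by A
P_M = {True: 0.70, False: 0.01}                      # keyed by A

def bern(p_true, value):
    """Probability of a binary variable taking `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Joint probability as the product of each node's CPT entry."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def posterior_burglary_given_john():
    """P(Burglary = T | JohnCalls = T), summing out E, A and M."""
    numerator = denominator = 0.0
    for b, e, a, m in product((True, False), repeat=4):
        p = joint(b, e, a, True, m)
        denominator += p
        if b:
            numerator += p
    return numerator / denominator

print(posterior_burglary_given_john())
```

The exact value comes out near 0.016, in line with the rough 0.02 estimate on the slide.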

Bayesian Networks No independence encoded

Bayesian Networks More interesting: Some independences encoded General Factorization
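As a reference point, the general factorization for a directed graphical model is usually written as below. The symbol pa_k for the set of parents of x_k is an assumed notation here; the second line instantiates the same form for the burglary network from the earlier slides.

```latex
p(x_1, \dots, x_K) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)

P(B, E, A, J, M) = P(B)\, P(E)\, P(A \mid B, E)\, P(J \mid A)\, P(M \mid A)
```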

Conditional Independence
a is independent of b given c: p(a | b, c) = p(a | c)
Equivalently: p(a, b | c) = p(a | c) p(b | c)
Notation: a ⊥⊥ b | c

Conditional Independence: Example 1
Node c is tail-to-tail for the path from a to b. No independence of a and b follows from this path.

Conditional Independence: Example 1
Node c is tail-to-tail for the path from a to b. Observing c blocks the path, thus making a and b conditionally independent.

Conditional Independence: Example 2
Node c is head-to-tail for the path from a to b. No independence of a and b follows from this path:
Σ_c p(c | a) p(b | c) = Σ_c p(c | a) p(b | c, a) = Σ_c p(b, c | a) = p(b | a)
(The first equality uses p(b | c) = p(b | c, a); see the next slide.)

Conditional Independence: Example 2
Node c is head-to-tail for the path from a to b. Observing c blocks the path, thus making a and b conditionally independent.

Conditional Independence: Example 3
Node c is head-to-head for the path from a to b. Unobserved c blocks the path, thus making a and b independent.
Note: this is the opposite of Example 1, with c unobserved.

Conditional Independence: Example 3
Node c is head-to-head for the path from a to b. Observing c unblocks the path, thus making a and b conditionally dependent.
Note: this is the opposite of Example 1, with c observed.

Bayesian Networks More interesting: Some independences encoded General Factorization

Am I out of fuel?
B = Battery (0 = flat, 1 = fully charged)
F = Fuel Tank (0 = empty, 1 = full)
G = Fuel Gauge Reading (0 = empty, 1 = full)

Am I out of fuel?
The probability of an empty tank is increased by observing G = 0.

Am I out of fuel?
The probability of an empty tank is reduced by additionally observing B = 0. This is referred to as explaining away.
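Explaining away can be seen numerically. The transcript does not preserve the probability tables for this example, so the values below are assumptions: they are the ones used in the fuel-gauge example in Bishop's PRML, Section 8.2.1. The function names are illustrative.

```python
from itertools import product

# ASSUMED values (Bishop, PRML, Section 8.2.1); not given in the transcript.
P_B1 = 0.9                       # P(B = 1): battery charged
P_F1 = 0.9                       # P(F = 1): tank full
P_G1 = {(1, 1): 0.8, (1, 0): 0.2,
        (0, 1): 0.2, (0, 0): 0.1}  # P(G = 1 | B, F), keyed by (B, F)

def joint(b, f, g):
    """p(B, F, G) = p(B) p(F) p(G | B, F)."""
    pb = P_B1 if b else 1.0 - P_B1
    pf = P_F1 if f else 1.0 - P_F1
    pg = P_G1[(b, f)] if g else 1.0 - P_G1[(b, f)]
    return pb * pf * pg

def prob_tank_empty(**evidence):
    """P(F = 0 | evidence), where evidence may fix 'b' and/or 'g'."""
    numerator = denominator = 0.0
    for b, f, g in product((0, 1), repeat=3):
        values = {"b": b, "g": g}
        if any(values[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(b, f, g)
        denominator += p
        if f == 0:
            numerator += p
    return numerator / denominator

print(prob_tank_empty())            # prior
print(prob_tank_empty(g=0))         # after seeing an empty gauge reading
print(prob_tank_empty(g=0, b=0))    # after also seeing a flat battery
```

Under these assumed numbers the probability of an empty tank rises from 0.1 to about 0.26 once G = 0 is observed, then falls to about 0.11 when B = 0 is also observed: the flat battery explains away the empty reading.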

D-separation
A, B, and C are non-intersecting subsets of nodes in a directed graph.
A path from A to B is blocked if it contains a node such that either
a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.
If all paths from A to B are blocked, A is said to be d-separated from B by C.
If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥⊥ B | C.
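One way to test d-separation in code is sketched below. It does not enumerate paths as in the definition above; it uses the equivalent moralized-ancestral-graph criterion instead: keep only A, B, C and their ancestors, connect each node to its parents and connect co-parents to each other, drop directions, remove C, and check whether A can still reach B. The function name `d_separated` and the graph encoding are assumptions made for this example.

```python
from collections import deque

def d_separated(parents, A, B, C):
    """True if A is d-separated from B by C in the DAG given by `parents`.

    `parents` maps every node to a list of its parents; A, B, C are
    disjoint collections of nodes.
    """
    A, B, C = set(A), set(B), set(C)

    # Step 1: restrict to A, B, C and all of their ancestors.
    keep, stack = set(), list(A | B | C)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, []))

    # Step 2: moralize (undirected parent-child links plus links
    # between every pair of parents that share a child).
    neighbours = {node: set() for node in keep}
    for node in keep:
        ps = list(parents.get(node, []))
        for p in ps:
            neighbours[node].add(p)
            neighbours[p].add(node)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)

    # Step 3: breadth-first search from A that never enters C.
    seen, queue = set(A), deque(A)
    while queue:
        node = queue.popleft()
        if node in B:
            return False
        for nxt in neighbours[node]:
            if nxt not in C and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

BURGLARY_NET = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}
print(d_separated(BURGLARY_NET, {"JohnCalls"}, {"MaryCalls"}, {"Alarm"}))
print(d_separated(BURGLARY_NET, {"Burglary"}, {"Earthquake"}, {"JohnCalls"}))
```

The second query illustrates rule (b): JohnCalls is a descendant of the head-to-head node Alarm, so observing it unblocks the path between Burglary and Earthquake.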

D-separation: Example

D-separation: I.I.D. Data
The x_i are conditionally independent. Are the x_i marginally independent?

Naïve Bayes
Conditioned on the class z, the distributions of the input variables x_1, …, x_D are independent. Are x_1, …, x_D marginally independent?
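A small numeric check suggests the answer to the question above is no in general. The sketch uses a hypothetical two-feature naïve Bayes model; every probability value and name in it is invented for illustration.

```python
# Hypothetical naive Bayes model with a binary class z and two binary features.
P_Z1 = 0.5                     # P(z = 1)
P_X1_GIVEN_Z = {1: 0.9, 0: 0.1}  # P(x_d = 1 | z), same table for both features

def joint(z, x1, x2):
    """p(z, x1, x2) = p(z) p(x1 | z) p(x2 | z)."""
    pz = P_Z1 if z else 1.0 - P_Z1
    px1 = P_X1_GIVEN_Z[z] if x1 else 1.0 - P_X1_GIVEN_Z[z]
    px2 = P_X1_GIVEN_Z[z] if x2 else 1.0 - P_X1_GIVEN_Z[z]
    return pz * px1 * px2

def marginal_check():
    """Return (p(x1=1, x2=1), p(x1=1) * p(x2=1)) with z summed out."""
    p_both = sum(joint(z, 1, 1) for z in (0, 1))
    p_x1 = sum(joint(z, 1, x2) for z in (0, 1) for x2 in (0, 1))
    p_x2 = sum(joint(z, x1, 1) for z in (0, 1) for x1 in (0, 1))
    return p_both, p_x1 * p_x2

def conditional_check(z):
    """Return (p(x1=1, x2=1 | z), p(x1=1 | z) * p(x2=1 | z))."""
    pz = sum(joint(z, x1, x2) for x1 in (0, 1) for x2 in (0, 1))
    p_both = joint(z, 1, 1) / pz
    p_x1 = sum(joint(z, 1, x2) for x2 in (0, 1)) / pz
    p_x2 = sum(joint(z, x1, 1) for x1 in (0, 1)) / pz
    return p_both, p_x1 * p_x2

print(marginal_check())      # the two numbers differ: marginally dependent
print(conditional_check(1))  # the two numbers agree: independent given z
```

With z unobserved, the path between x_1 and x_2 through the tail-to-tail node z is open, so seeing one feature changes the belief about the class and therefore about the other feature.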

Bayesian Curve Fitting Polynomial

Bayesian Curve Fitting Plate

Bayesian Curve Fitting Input variables and explicit hyperparameters

Bayesian Curve Fitting Learning Condition on data

Bayesian Curve Fitting Prediction Predictive distribution: where

Discrete Variables
General joint distribution: K^2 − 1 parameters
Independent joint distribution: 2(K − 1) parameters

Discrete Variables
General joint distribution over M variables: K^M − 1 parameters
M-node Markov chain: K − 1 + (M − 1)K(K − 1) parameters
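The parameter counts above are easy to compare for concrete K and M. In this sketch the function names are illustrative, and the fully independent count M(K − 1) is the M-variable generalization of the 2(K − 1) case on the slide.

```python
def general_joint_params(K, M):
    """Unrestricted joint over M variables with K states each."""
    return K ** M - 1

def independent_params(K, M):
    """All M variables independent: K - 1 parameters per variable."""
    return M * (K - 1)

def markov_chain_params(K, M):
    """First node needs K - 1; each later node needs K(K - 1) for its CPT."""
    return (K - 1) + (M - 1) * K * (K - 1)

for K, M in [(2, 5), (3, 10)]:
    print(K, M,
          general_joint_params(K, M),
          markov_chain_params(K, M),
          independent_params(K, M))
```

The chain count grows linearly in M while the general joint grows exponentially, which is the trade-off the slides describe.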

Discrete Variables: Bayesian Parameters

Discrete Variables: Bayesian Parameters Shared prior

The Markov Blanket
Factors independent of x_i cancel between numerator and denominator.
The parents, children and co-parents of x_i form its Markov blanket, the minimal set of nodes that isolates x_i from the rest of the graph.
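The definition translates directly into a few lines of code. This is a minimal sketch on the burglary network from the earlier slides; the function name `markov_blanket` and the parent-list encoding are assumptions made for the example.

```python
def markov_blanket(parents, node):
    """Parents, children and co-parents of `node` in the DAG `parents`."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents.get(node, [])) | set(children)
    for child in children:
        blanket |= set(parents[child])  # co-parents share a child with node
    blanket.discard(node)
    return blanket

BURGLARY_NET = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}
print(sorted(markov_blanket(BURGLARY_NET, "Burglary")))
print(sorted(markov_blanket(BURGLARY_NET, "Alarm")))
```

For Burglary the blanket contains its child Alarm and its co-parent Earthquake; the co-parent is needed because observing Alarm couples the two causes.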