CS839: Probabilistic Graphical Models. Lecture 2: Directed Graphical Models. Theo Rekatsinas

Questions
Questions? Waiting list. Questions on other logistics.

1. Intro to Bayes Nets

Representing Multivariate Distributions
If the X_i's are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms, e.g.,
P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1) P(X_2) P(X_3 | X_1) P(X_4 | X_2) P(X_5 | X_2) P(X_6 | X_3, X_4) P(X_7 | X_6) P(X_8 | X_5, X_6)
If the X_i's are fully independent, each conditional collapses to P(X_i | ·) = P(X_i), and
P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1) P(X_2) P(X_3) P(X_4) P(X_5) P(X_6) P(X_7) P(X_8)
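To make the compactness concrete, here is a minimal parameter-counting sketch (my own illustration, assuming all eight variables are binary and using the parent sets of the example above):

# Each CPD P(Xi | Pa(Xi)) over binary variables needs 2^|Pa(Xi)| free
# parameters (one Bernoulli parameter per configuration of the parents).
parents = {
    "X1": [], "X2": [], "X3": ["X1"], "X4": ["X2"], "X5": ["X2"],
    "X6": ["X3", "X4"], "X7": ["X6"], "X8": ["X5", "X6"],
}

full_joint = 2 ** len(parents) - 1                        # 255 parameters
factored = sum(2 ** len(pa) for pa in parents.values())   # 18 parameters
print(full_joint, factored)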

Notation
We will use the notation from Koller and Friedman (summarized at the end of that book):
- Random variables: X, Y, Z
- Sets of random variables (sometimes matrices): X, Y, Z
- Parameters: α, β, γ, κ, θ

Representation -- example by Eric Xing: The Dishonest Casino
A casino has two dice:
- Fair die: P(1) = P(2) = ... = P(6) = 1/6
- Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2
The dealer switches back and forth between the fair and the loaded die about once every 20 turns.
Game:
1. You bet $X
2. You roll (always with a fair die)
3. Dealer rolls (maybe with the fair die, maybe with the loaded die)
4. Highest number wins $2X
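A minimal generative sketch of the casino (my own code; I model "once every 20 turns" as a switch probability of 1/20 per roll, which is an assumption):

import random

FAIR = [1/6] * 6
LOADED = [1/10] * 5 + [1/2]   # P(1..5) = 1/10 each, P(6) = 1/2

def casino_rolls(n, switch_prob=1/20):
    # The hidden state is which die the dealer is currently using.
    die, states, rolls = "fair", [], []
    for _ in range(n):
        if random.random() < switch_prob:
            die = "loaded" if die == "fair" else "fair"
        weights = FAIR if die == "fair" else LOADED
        rolls.append(random.choices(range(1, 7), weights=weights)[0])
        states.append(die)
    return states, rolls

states, rolls = casino_rolls(30)
print("".join(str(r) for r in rolls))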

Representation -- example by Eric Xing
You observe a sequence of rolls from the dealer:
1245526462146146136136661664661
Questions:
- How likely is this sequence, given our model of how the casino works? (Evaluation)
- What portion of the sequence was generated with a fair die, and what portion with a loaded one? (Decoding)
- How loaded is the loaded die? How fair is the fair die? How often does the casino player change from fair to loaded, and back? (Learning)

Knowledge Engineering -- example by Eric Xing
- Modeling random variables: latent (hidden) vs. observed
- Modeling the structure: causal, generative, coupling
- Modeling probabilities: conditional probabilities, orders of magnitude

Hidden Markov Model

Bayesian Network
- A BN is a directed acyclic graph whose nodes represent the random variables and whose edges represent direct influence of one variable on another.
- It is a data structure that provides the skeleton for representing a joint distribution compactly, in a factorized way.
- It offers a compact representation of a set of conditional independence assumptions about a distribution.
- We can view the graph as encoding a generative sampling process executed by nature, where the value for each variable is selected by nature using a distribution that depends only on its parents (see the sketch below).
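As an illustration of the generative-sampling view, here is a minimal ancestral-sampling sketch of my own; the three-node graph and its CPD values are hypothetical:

import random

# Nodes listed in topological order; each maps to (parents, cpd), where
# cpd returns P(node = 1 | parent values).
model = {
    "A": ([], lambda: 0.3),
    "B": (["A"], lambda a: 0.9 if a else 0.2),
    "C": (["A", "B"], lambda a, b: 0.5 if a == b else 0.1),
}

def ancestral_sample(model):
    # Nature picks each value from a distribution that depends only on
    # the already-sampled parents.
    values = {}
    for node, (parents, cpd) in model.items():
        p1 = cpd(*(values[p] for p in parents))
        values[node] = int(random.random() < p1)
    return values

print(ancestral_sample(model))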

Bayesian Network Factorization
Theorem: Given a DAG, the most general form of the probability distribution that is consistent with the graph factors according to "node given its parents":
P(X) = ∏_{i=1..d} P(X_i | X_{π_i})
where X_{π_i} is the set of parents of X_i and d is the number of nodes (variables) in the graph.
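In code, the factorization says the probability of a full assignment is one CPD factor per node. A minimal sketch, reusing the hypothetical model from the sampling example above:

def joint_prob(model, values):
    # P(x) = prod_i P(x_i | pa(x_i)) for a full assignment `values`.
    prob = 1.0
    for node, (parents, cpd) in model.items():
        p1 = cpd(*(values[p] for p in parents))
        prob *= p1 if values[node] == 1 else 1.0 - p1
    return prob

print(joint_prob(model, {"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.9 * 0.5 = 0.135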

Bayesian Network Factorization Theorem: Example
P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1) P(X_2) P(X_3 | X_1) P(X_4 | X_2) P(X_5 | X_2) P(X_6 | X_3, X_4) P(X_7 | X_6) P(X_8 | X_5, X_6)

Specification of a Bayesian Network
Qualitative specification: structure and variables (roughly, knowledge engineering)
- Prior knowledge of causal relationships
- Prior knowledge of modular relationships
- Assessment from experts
- Learning from data
Quantitative specification: instantiation of conditional probability distributions
- Conditional probability tables
- Continuous distributions

2. Local Structure and Independence

Local structure and conditional independence
- Common parent (A ← B → C): fixing B decouples A and C: (A ⊥ C | B)
- Cascade (A → B → C): knowing B decouples A and C: (C ⊥ A | B)
- V-structure (A → C ← B): knowing C couples A and B (knowing one variable explains the contribution of the other to a common child event)
Three foundational building blocks (a compact language) for creating complex BNs. A numeric check of the v-structure case follows.
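A small numeric check of the v-structure behavior (my own sketch; the CPT values are arbitrary): A and B are exactly independent marginally, but become dependent once C is observed.

from itertools import product

# Hypothetical v-structure A -> C <- B over binary variables.
P_A = {0: 0.6, 1: 0.4}
P_B = {0: 0.7, 1: 0.3}
P_C1 = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.7, (1, 1): 0.95}  # P(C=1 | a, b)

def joint(a, b, c):
    pc1 = P_C1[(a, b)]
    return P_A[a] * P_B[b] * (pc1 if c == 1 else 1 - pc1)

# Marginally, P(A=1, B=1) equals P(A=1) P(B=1): A and B are independent.
p_ab = sum(joint(1, 1, c) for c in (0, 1))
print(abs(p_ab - P_A[1] * P_B[1]) < 1e-12)   # True

# Given C=1, the product rule fails: A and B are now dependent.
pc = sum(joint(a, b, 1) for a, b in product((0, 1), repeat=2))
p_ab_c = joint(1, 1, 1) / pc
p_a_c = sum(joint(1, b, 1) for b in (0, 1)) / pc
p_b_c = sum(joint(a, 1, 1) for a in (0, 1)) / pc
print(abs(p_ab_c - p_a_c * p_b_c) > 1e-6)    # True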

Why a graphical specification?
Consider a factorization of P(A, B, C), e.g. P(A, B, C) = P(A) P(B | A) P(C | B).
Does the conditional independence (A ⊥ C | B) belong to I(P(A, B, C))? Can we show that?
(Graph: A → B → C)

I-maps
Def: Let P be a distribution over X. We define I(P) to be the set of independence assertions of the form (X ⊥ Y | Z) that hold in P (regardless of how we set the parameter values).
Def: Let K be any graph object associated with a set of independencies I(K). We say that K is an I-map for a set of independencies I if I(K) ⊆ I.
We say that a graph G is an I-map for P if G is an I-map for I(P), where we use I(G) as the associated set of independencies.

I-maps
For G to be an I-map of P, it is necessary that G does not mislead us regarding independencies in P: any independence that G asserts must also hold in P. Conversely, P may have additional independencies that are not reflected in G.
Example: a fully connected DAG asserts no independencies (I(G) = ∅), so it is trivially an I-map for any distribution over the same variables.

From I(G) to the local Markov assumptions of BNs
A BN with structure G is a directed acyclic graph whose nodes represent random variables X_1, ..., X_n.
Local Markov assumptions (one way to specify independencies):
Def: Let Pa_{X_i} denote the parents of X_i in G and NonDescendants_{X_i} denote the variables in the graph that are not descendants of X_i. Then G encodes the following set of local conditional independence assumptions I_ℓ(G):
I_ℓ(G) = { (X_i ⊥ NonDescendants_{X_i} | Pa_{X_i}) : ∀ i }
In other words, each node X_i is independent of its non-descendants given its parents.

Graph separation criterion
D-separation criterion for Bayesian networks (D for directed edges):
Def: variables x and y are d-separated (conditionally independent) given z if they are separated in the moralized ancestral graph.

Active trail conditions
- Causal trail X → Z → Y: active if and only if Z is not observed.
- Evidential trail X ← Z ← Y: active if and only if Z is not observed.
- Common cause X ← Z → Y: active if and only if Z is not observed.
- Common effect (v-structure) X → Z ← Y: active if and only if either Z or one of Z's descendants is observed.
Let X, Y, Z be three sets of nodes in G. We say that X and Y are d-separated given Z, denoted d-sep_G(X; Y | Z), if there is no active trail between any node X ∈ X and any node Y ∈ Y given Z.

Global Markov properties of BNs
X is d-separated (directed-separated) from Z given Y if we cannot send a ball from any node in X to any node in Z using the Bayes-ball algorithm (see the sketch below).
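Bayes ball can be written as a short reachability procedure. A minimal sketch of my own, following the standard reachable-nodes formulation (as in Koller and Friedman); encoding the graph as a parents map is my convention:

def d_separated(parents, X, Y, Z):
    # True if every node in X is d-separated from every node in Y given Z.
    children = {n: [] for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].append(n)

    # Ancestors of Z: needed for the v-structure (common effect) rule.
    ancestors, stack = set(), list(Z)
    while stack:
        for p in parents[stack.pop()]:
            if p not in ancestors:
                ancestors.add(p)
                stack.append(p)

    # Ball states: (node, direction); "up" = arrived from a child,
    # "down" = arrived from a parent.
    visited, frontier = set(), [(x, "up") for x in X]
    while frontier:
        node, direction = frontier.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in Z and node in Y:
            return False                        # active trail found
        if direction == "up" and node not in Z:
            frontier += [(p, "up") for p in parents[node]]
            frontier += [(c, "down") for c in children[node]]
        elif direction == "down":
            if node not in Z:                   # chain: pass down to children
                frontier += [(c, "down") for c in children[node]]
            if node in Z or node in ancestors:  # v-structure: bounce back up
                frontier += [(p, "up") for p in parents[node]]
    return True

# Cascade A -> B -> C: A and C are d-separated given B.
g = {"A": [], "B": ["A"], "C": ["B"]}
print(d_separated(g, {"A"}, {"C"}, {"B"}))  # True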

Quantitative specification of P
Separation properties in the graph imply independence properties about the associated variables.
Equivalence Theorem: For a graph G, let D_1 denote the family of all distributions that satisfy I(G), and let D_2 denote the family of all distributions that factor according to G:
P(X) = ∏_{i=1..d} P(X_i | X_{π_i})
Then D_1 = D_2.
For the graph to be useful, any conditional independence property we can derive from the graph should hold for the probability distribution that the graph represents.

Specification of BNs
Conditional probability tables (CPTs)
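The CPT figure on this slide is not preserved, so here is a minimal sketch of what a CPT looks like in code (hypothetical values for a binary node C with binary parents A and B):

# One row per configuration of the parents; each row is a distribution
# over the node's values and must sum to 1.
cpt_C = {          # (a, b) -> (P(C=0 | a, b), P(C=1 | a, b))
    (0, 0): (0.9, 0.1),
    (0, 1): (0.2, 0.8),
    (1, 0): (0.3, 0.7),
    (1, 1): (0.05, 0.95),
}
print(cpt_C[(1, 0)][1])   # P(C=1 | A=1, B=0) = 0.7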

Specification of BNs
Conditional probability density functions (CPDs)
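For continuous variables, one common parametric choice is the linear-Gaussian CPD. This particular form is an assumption of mine (the slide's figure is not preserved); the sketch shows a node whose mean is a linear function of its parents:

import random

def sample_linear_gaussian(parent_values, weights, bias, sigma):
    # X | u ~ N(bias + sum_i weights[i] * u[i], sigma^2)
    mean = bias + sum(w * u for w, u in zip(weights, parent_values))
    return random.gauss(mean, sigma)

print(sample_linear_gaussian([1.0, 2.0], weights=[0.5, -0.3], bias=0.1, sigma=0.2))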

Summary of BN semantics
Def: A Bayesian network is a pair (G, P) where P factorizes over G, and P is specified as a set of CPDs associated with G's nodes.
- Conditional independencies imply factorization.
- Factorization according to G implies the associated conditional independencies.
Are there other independencies that hold for every distribution P that factorizes over G?

Soundness and completeness
Soundness. Theorem: if a distribution P factorizes according to G, then I(G) ⊆ I(P).
Completeness. Claim: for any distribution P that factorizes over G, if (X ⊥ Y | Z) ∈ I(P) then d-sep_G(X; Y | Z)?
Think of this: the claim says that if X and Y are not d-separated given Z in G, then X and Y are dependent in all distributions P that factorize over G. Is this true?
It is not: a particular choice of CPDs may introduce extra independencies that the graph does not reflect (for example, a CPD whose entries happen to ignore one of the parents). The correct statement is weaker:
Theorem: Let G be a BN graph. If X and Y are not d-separated given Z in G, then X and Y are dependent in some distribution P that factorizes over G.

Uniqueness of BNs
Very different BN graphs can actually be equivalent, in that they encode precisely the same set of conditional independence assertions.

I-equivalence
Def: Two BN graphs G_1 and G_2 over X are I-equivalent if I(G_1) = I(G_2).
Any distribution P that can be factorized over one of these graphs can be factorized over the other. For example, X → Y and X ← Y encode the same (empty) set of independencies, while the v-structure X → Z ← Y is not I-equivalent to the chain X → Z → Y.
This has implications when trying to determine the directionality of influence from data.

Simple BNs
Conditionally independent observations

Simple BNs
Plate model
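The figures for these two slides are not preserved; the standard model they typically depict (an assumption on my part) is a parameter θ with conditionally independent observations x_1, ..., x_N, which plate notation draws as a single x_i node inside a box replicated N times. The corresponding factorization is:
P(θ, x_1, ..., x_N) = P(θ) ∏_{i=1..N} P(x_i | θ)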

Simple BNs
Hidden Markov model
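Likewise, with hidden states y_t and observed emissions x_t (my notation; the slide's figure is not preserved), the local Markov assumptions give the familiar HMM factorization:
P(x_{1:T}, y_{1:T}) = P(y_1) P(x_1 | y_1) ∏_{t=2..T} P(y_t | y_{t-1}) P(x_t | y_t)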

Summary
- A Bayesian network is a pair (G, P) where P factorizes over G, and where P is specified as a set of local conditional probability distributions.
- A BN captures causality, generative schemes, asymmetric influences, etc.
- Local and global independence properties are identifiable via d-separation criteria.