STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas. Markov networks: Representation


Markov networks: Undirected graphical models

Model the following distribution using a Bayesian network:
  I(A, {B, D}, C)
  I(B, {A, C}, D)

Here, we are trying to model how both good and bad information spreads among four mutual friends: Anna (A), Bob (B), Charles (C), and Debbie (D). Anna and Charles used to date each other, and the relationship ended badly. Bob and Debbie were married. Unfortunately, they got divorced recently and are not on speaking terms.

Attempted Bayesian networks

[Figure: three candidate Bayesian network structures over A, B, C, D, labeled (a), (b), (c)]

(b) says that A is independent of C given {B, D}. However, it also implies that B and D are independent given only A but dependent given both A and C.

Markov networks: Undirected graphical models

Definition:
  An undirected graph G.
  A set of factors (functions), denoted by φ, associated with the maximal cliques of G.

Example of a factor φ(A, B):
  a0 b0 → 30
  a0 b1 → 5
  a1 b0 → 1
  a1 b1 → 10

Distribution represented by a Markov network

Normalized product of all factors (called the Gibbs distribution):
  Pr(a, b, c, d) = (1/Z) φ(a, b) φ(a, c) φ(b, d) φ(c, d)

Z is a normalizing constant, often called the partition function:
  Z = Σ_{a,b,c,d} φ(a, b) φ(a, c) φ(b, d) φ(c, d)

What does a factor represent?

Factors are not conditional probability distributions. Why? The interactions are not directed. The factors represent affinities between related variables: assignments with higher factor values are more probable, and the factors trade off against one another through the partition function Z.

Example: Alice and Bob tend to agree with each other.
  φ(A, B):
  a0 b0 → 30
  a0 b1 → 5
  a1 b0 → 1
  a1 b1 → 10

Distribution represented by a Markov network

φ(A, B): a0 b0 → 30,  a0 b1 → 5,   a1 b0 → 1,   a1 b1 → 10
φ(B, C): b0 c0 → 100, b0 c1 → 1,   b1 c0 → 1,   b1 c1 → 100
φ(C, D): c0 d0 → 1,   c0 d1 → 100, c1 d0 → 100, c1 d1 → 1
φ(D, A): d0 a0 → 100, d0 a1 → 1,   d1 a0 → 1,   d1 a1 → 100

Assignment     Unnormalized  Normalized
a0 b0 c0 d0    300K          0.04
a0 b0 c0 d1    300K          0.04
a0 b0 c1 d0    300K          0.04
a0 b0 c1 d1    30            4.1E-06
a0 b1 c0 d0    500           6.9E-05
a0 b1 c0 d1    500           6.9E-05
a0 b1 c1 d0    5M            0.69
a0 b1 c1 d1    500           6.9E-05
a1 b0 c0 d0    100           1.4E-05
a1 b0 c0 d1    1M            0.14
a1 b0 c1 d0    100           1.4E-05
a1 b0 c1 d1    100           1.4E-05
a1 b1 c0 d0    10            1.4E-06
a1 b1 c0 d1    100K          0.014
a1 b1 c1 d0    100K          0.014
a1 b1 c1 d1    100K          0.014
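
To make the table reproducible, here is a minimal Python sketch (illustrative, not from the slides) that enumerates all 16 assignments under the four factors above and prints their unnormalized and normalized weights; it confirms, for example, that a0 b1 c1 d0 receives probability ≈ 0.69.

    from itertools import product

    # The four pairwise factors from the slide, keyed by (value, value).
    phi_AB = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
    phi_BC = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
    phi_CD = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
    phi_DA = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

    def weight(a, b, c, d):
        # Unnormalized measure: product of all factor entries for one assignment.
        return phi_AB[a, b] * phi_BC[b, c] * phi_CD[c, d] * phi_DA[d, a]

    assignments = list(product([0, 1], repeat=4))
    Z = sum(weight(*x) for x in assignments)  # partition function (7,201,840 here)

    for a, b, c, d in assignments:
        w = weight(a, b, c, d)
        print(f"a{a} b{b} c{c} d{d}  {w:>9,}  {w / Z:.1e}")

Brute-force enumeration is exponential in the number of variables, so this only serves to illustrate the definition.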

The Partition Function vs Local Factors

Compare the marginal over A, B with the local factor φ(A, B):

Marginal Pr(A, B): a0 b0 → 0.13, a0 b1 → 0.69, a1 b0 → 0.14, a1 b1 → 0.04
Factor φ(A, B):    a0 b0 → 30,   a0 b1 → 5,    a1 b0 → 1,    a1 b1 → 10

The most likely tuple in φ(A, B) is the first one, but the most likely tuple in the marginal is the second one: the other factors are much stronger than φ(A, B), so its influence is overwhelmed.
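
Continuing the sketch above (it reuses product, weight, and Z from the previous block), the marginal over A, B is obtained by summing the joint over C and D, which reproduces the numbers on this slide:

    # Marginal Pr(A, B): sum the normalized joint over C and D.
    marginal = {
        (a, b): sum(weight(a, b, c, d) for c, d in product([0, 1], repeat=2)) / Z
        for a, b in product([0, 1], repeat=2)
    }
    print(marginal)  # approx {(0,0): 0.125, (0,1): 0.694, (1,0): 0.139, (1,1): 0.042}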

Independencies represented by the Markov network

Graph separation: I(X, Z, Y) holds if X and Y are disconnected after removing Z.

What are the independencies represented by this network?
  I(A, {B, C}, D)
  I(B, {A, D}, C)

Compare with the Bayesian network for the example problem.
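
Graph separation can be checked mechanically: delete the conditioning set Z, then test whether any node of X can still reach a node of Y. A minimal sketch, assuming the network on this slide is the four-cycle with edges A-B, A-C, B-D, C-D (the structure implied by the two listed independencies):

    def separated(edges, X, Y, Z):
        """True iff X and Y are disconnected after removing the nodes in Z."""
        adj = {}
        for u, v in edges:
            if u not in Z and v not in Z:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        # Flood fill from X through the surviving graph.
        frontier, seen = list(X), set(X)
        while frontier:
            u = frontier.pop()
            for v in adj.get(u, ()):
                if v not in seen:
                    seen.add(v)
                    frontier.append(v)
        return not (seen & set(Y))

    edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    print(separated(edges, {"A"}, {"D"}, {"B", "C"}))  # True:  I(A, {B, C}, D)
    print(separated(edges, {"B"}, {"C"}, {"A", "D"}))  # True:  I(B, {A, D}, C)
    print(separated(edges, {"A"}, {"D"}, {"B"}))       # False: the path A-C-D survives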

Independencies represented by the Markov network

If Pr is a Gibbs distribution over an undirected graph G, then G is an I-map of Pr. Conversely, if G is an I-map of a positive distribution Pr, then Pr is a Gibbs distribution over G.

Theorem (Hammersley-Clifford). Let Pr be a positive distribution. Then Pr is a Gibbs distribution over a Markov network G if and only if G is an I-map of Pr.

Independencies represented by the Markov network

Recall Bayesian networks: we had local independencies (the local Markov property) and global independencies (d-separation). Analogously, we can define local independencies via the Markov blanket of a node (its neighbors):
  I(X, Neighbors(X), NonNeighbors(X))
The global independencies given by graph separation follow from these local independencies.

Building minimal I-maps (see the sketch below):
  1. For each node X, find a minimal set of nodes MB(X) such that the local independence property is satisfied.
  2. Connect X to every node in MB(X).
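
The two-step procedure can be written down directly. The sketch below assumes a hypothetical oracle indep(x, Z, Y) that answers conditional-independence queries about the (positive) distribution; with such an oracle, searching blankets smallest-first yields a minimal MB(X).

    from itertools import combinations

    def minimal_imap(variables, indep):
        """Markov-network minimal I-map: connect each X to a smallest
        Markov blanket MB(X) satisfying I(X, MB(X), rest) under the oracle."""
        edges = set()
        for x in variables:
            others = [v for v in variables if v != x]
            # Smallest candidate blanket first; the full set always qualifies,
            # since I(X, others, {}) is trivially true, so the search terminates.
            mb = next(
                set(c)
                for k in range(len(others) + 1)
                for c in combinations(others, k)
                if indep(x, set(c), set(others) - set(c))
            )
            edges |= {frozenset((x, v)) for v in mb}
        return edges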

Bayesian vs Markov Networks

Property            | Bayesian networks | Markov networks
Functions           | CPTs              | Factors/potentials
Distribution        | Product of CPTs   | Normalized product of factors
Cycles              | Not allowed       | Allowed
Partition function  | Z = 1             | Z must be computed
Independence check  | d-separation      | Graph separation
Compact models      | Class-BN          | Class-MN

Converting Bayesian to Markov networks

Moralization: connect all parents of each node to each other, then remove directionality. The moralized graph M[G] of a Bayesian network G is a minimal I-map of the distribution represented by G.

Moralization results in the loss of some independence information. Example: X → Z ← Y. Markov network? A clique over X, Y, and Z. Loss: I(X, ∅, Y) no longer holds in the Markov network.

Converting Bayesian to Markov networks

Exercise: draw an equivalent Markov network for the following Bayesian network.

[Figure: Bayesian network over Burglary, Earthquake, TV, Alarm, Nap, JohnCall, MaryCall]
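
As a sketch of the procedure, the snippet below moralizes a DAG given as a parent map. The parent sets here are an assumption about the figure (Alarm with parents Burglary and Earthquake, JohnCall with parents Alarm and TV, MaryCall with parents Alarm and Nap), since the diagram itself is not reproduced above.

    from itertools import combinations

    def moralize(parents):
        """Moral graph of a DAG given as {node: set of parents}: keep every
        child-parent edge undirected and 'marry' each pair of co-parents."""
        edges = set()
        for child, pa in parents.items():
            for p in pa:
                edges.add(frozenset((child, p)))   # drop directionality
            for p, q in combinations(sorted(pa), 2):
                edges.add(frozenset((p, q)))       # connect co-parents
        return edges

    # Assumed structure of the slide's Bayesian network.
    dag = {
        "Alarm": {"Burglary", "Earthquake"},
        "JohnCall": {"Alarm", "TV"},
        "MaryCall": {"Alarm", "Nap"},
        "Burglary": set(), "Earthquake": set(), "TV": set(), "Nap": set(),
    }
    for e in sorted(map(sorted, moralize(dag))):
        print("-".join(e))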

Converting Markov to Bayesian networks

Non-trivial and hard! It requires triangulating the Markov network. Example: convert the following Markov network to a Bayesian network.

Converting Markov to Bayesian networks

The result is an equivalent Bayesian network that is a minimal I-map of the Markov network. Loss of independence assumptions: I(C, {A, F}, D) does not hold in the Bayesian network.

Classes of Compact models

Factor Graphs

A Markov network does not reveal its Gibbs distribution; we also need to know the parameters. Example: are there pairwise factors φ(A, B), φ(A, C), φ(B, C), or is there a single factor φ(A, B, C)? A factor graph is a finer-grained representation of the factorization: a bipartite graph with one node per variable and one node per factor.

Factor Graphs

Quiz: how do we convert a Markov network to a factor graph?
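
One way to answer the quiz, as a minimal sketch: create a node for every variable and a node for every factor, and connect each factor node to the variables in its scope. Applied to the example from the previous slide, the two parameterizations of the same triangle A-B-C yield different factor graphs:

    def markov_net_to_factor_graph(factors):
        """factors: iterable of variable scopes, e.g. [("A", "B"), ("A", "C")].
        Returns the bipartite edge list linking each factor node f_i
        to every variable in its scope."""
        edges = []
        for i, scope in enumerate(factors):
            for var in scope:
                edges.append((f"f{i}", var))
        return edges

    # Pairwise factors vs. one joint factor over the same triangle A-B-C:
    print(markov_net_to_factor_graph([("A", "B"), ("A", "C"), ("B", "C")]))
    print(markov_net_to_factor_graph([("A", "B", "C")]))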

Log-linear models

Convert each tuple in a factor to a formula: if the formula is true, the weight equals the log of the factor value, and 0 otherwise.

  Pr(x) = (1/Z) exp( Σ_i w_i I_x(f_i) )

where I_x(f_i) = 1 if x satisfies f_i and 0 otherwise.

Var1  Var2  φ   | Formula   | Weight
A     B     30  | A ∧ B     | log(30)
A     ¬B    5   | A ∧ ¬B    | log(5)
¬A    B     1   | ¬A ∧ B    | log(1)
¬A    ¬B    30  | ¬A ∧ ¬B   | log(30)

Log-linear models: Mapping logic to probabilities

If two formulas have the same weight, we can merge them:

Var1  Var2  φ   | Formula   | Weight
A     B     30  | A ∧ B     | log(30)
A     ¬B    5   | A ∧ ¬B    | log(5)
¬A    B     1   | ¬A ∧ B    | log(1)
¬A    ¬B    30  | ¬A ∧ ¬B   | log(30)

Merging the first and fourth formulas gives a new [formula, weight] pair: [(A ∧ B) ∨ (¬A ∧ ¬B), log(30)]. Significant compaction can be achieved in some cases.
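
A minimal sketch of evaluating this log-linear model in Python, using the three formulas that remain after the merge (the lambdas encode the formulas by hand):

    import math
    from itertools import product

    # (predicate over the assignment, weight); the first entry is the merged
    # formula (A AND B) OR (NOT A AND NOT B) with weight log(30).
    formulas = [
        (lambda a, b: (a and b) or (not a and not b), math.log(30)),
        (lambda a, b: a and not b, math.log(5)),
        (lambda a, b: (not a) and b, math.log(1)),
    ]

    def unnormalized(a, b):
        # exp of the sum of the weights of the satisfied formulas.
        return math.exp(sum(w for f, w in formulas if f(a, b)))

    Z = sum(unnormalized(a, b) for a, b in product([True, False], repeat=2))
    # Recovers the factor values 30, 5, 1, 30 up to normalization (Z = 66).
    print({(a, b): round(unnormalized(a, b) / Z, 3)
           for a, b in product([False, True], repeat=2)})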

Building Graphical models by hand: Skills

  Identify the relevant variables and their possible values.
  Build the network structure, either a Bayesian network or a factor graph. [Issue: CPTs/factors can be large.]
  Define the specific numbers in the CPTs or factors. [Issue: how significant are the numbers? Will a 0.1 instead of a 0.01 matter?]
  Identify the specific queries that need to be answered (application dependent).

Types of Queries

  Probability of evidence
  Prior vs. posterior marginals
  Most probable explanation (MPE)
  Maximum a posteriori hypothesis (MAP)

[Figure: Bayesian network over Burglary, Earthquake, TV, Alarm, Nap, JohnCall, MaryCall]