Capturing Independence Graphically; Undirected Graphs

Capturing Independence Graphically; Undirected Graphs
COMPSCI 276, Spring 2014, Set 2: Rina Dechter
(Reading: Pearl chapter 3, Darwiche chapter 4)

Constraint Networks
Example: map coloring.
Variables: countries (A, B, C, etc.)
Values: colors (red, green, blue)
Constraints: A ≠ B, A ≠ D, D ≠ E, etc.
The constraint between A and B, listed as allowed pairs:
A B
red green
red yellow
green red
green yellow
yellow green
yellow red
(The slide also shows the constraint graph over the nodes A, B, C, D, E, F, G.)
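A brute-force check of such a map-coloring network can be sketched as follows. The edge set is illustrative: only A ≠ B, A ≠ D, D ≠ E are given explicitly on the slide, and the rest of the constraint graph did not survive transcription.

```python
from itertools import product

# Hypothetical constraint network for map coloring. Only A!=B, A!=D, D!=E
# come from the slide; the remaining edges are invented for illustration.
variables = ["A", "B", "C", "D", "E", "F", "G"]
colors = ["red", "green", "blue"]
constraints = [("A", "B"), ("A", "D"), ("D", "E"),
               ("B", "C"), ("D", "F"), ("F", "G")]

def consistent(assignment):
    # A full assignment is a solution iff every constrained pair differs.
    return all(assignment[u] != assignment[v] for u, v in constraints)

solutions = [dict(zip(variables, vals))
             for vals in product(colors, repeat=len(variables))
             if consistent(dict(zip(variables, vals)))]
print(len(solutions))
```

Real constraint solvers use backtracking and propagation rather than enumeration, but 3^7 = 2187 assignments is small enough to enumerate here.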

Bayesian Networks (Pearl 1988)
BN = (G, Θ). Nodes: Smoking (S), lung Cancer (C), Bronchitis (B), X-ray (X), Dyspnoea (D); CPDs: P(S), P(C|S), P(B|S), P(X|C,S), P(D|C,B).
CPD for Dyspnoea:
C B | P(D=1|C,B) P(D=0|C,B)
0 0 | 0.1 0.9
0 1 | 0.7 0.3
1 0 | 0.8 0.2
1 1 | 0.9 0.1
Combination (product): P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B), i.e., P(x1,...,xn) = Π_i P(x_i | pa(x_i)).
Marginalization (sum/max) yields posterior marginals, probability of evidence, MPE and MAP, e.g.
P(D=0) = Σ_{S,C,B,X} P(S) P(C|S) P(B|S) P(X|C,S) P(D=0|C,B)
mpe = max_x Π_i P(x_i | pa(x_i))
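The product and marginalization operations can be sketched directly on this network. Everything below except the P(D|C,B) table (read here as giving P(D=1|C,B)) is invented for illustration.

```python
from itertools import product

# Chain-rule factorization from the slide:
#   P(S,C,B,X,D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B).
# Only the CPD for D appears on the slide; all other numbers are made up.
p_s = {1: 0.2, 0: 0.8}                                          # P(S=s)
p_c_s = {(1, 1): 0.3, (0, 1): 0.7, (1, 0): 0.05, (0, 0): 0.95}  # P(C=c|S=s)
p_b_s = {(1, 1): 0.4, (0, 1): 0.6, (1, 0): 0.2, (0, 0): 0.8}    # P(B=b|S=s)
p_x_cs = {(x, c, s): (0.9 if (c or s) else 0.1) if x == 1
          else (0.1 if (c or s) else 0.9)
          for x, c, s in product([0, 1], repeat=3)}             # P(X=x|C,S)
p_d1_cb = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.9}  # slide's P(D=1|C,B)

def joint(s, c, b, x, d):
    pd = p_d1_cb[(c, b)] if d == 1 else 1.0 - p_d1_cb[(c, b)]
    return p_s[s] * p_c_s[(c, s)] * p_b_s[(b, s)] * p_x_cs[(x, c, s)] * pd

# Marginalization: P(D=0) sums the joint over all other variables.
p_d0 = sum(joint(s, c, b, x, 0) for s, c, b, x in product([0, 1], repeat=4))

# MPE: the single most probable full assignment.
mpe = max(product([0, 1], repeat=5), key=lambda v: joint(*v))
print(p_d0, mpe)
```

Exact inference algorithms such as bucket elimination compute the same sums without ever materializing the full joint; brute force is used here only because the network has five binary variables.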

Sample Applications for Graphical Models

The Qualitative Notion of Dependence: Motivations and Issues
Motivating examples:
- What I eat for breakfast vs. what I eat for dinner
- What I eat for breakfast vs. what I wear
- What I eat for breakfast today vs. my grade in 276
- The time I work on homework 1 vs. my grade in 276
- Shoe size vs. reading ability
- Shoe size vs. reading ability, once we know the age

The Qualitative Notion of Dependence: Motivations and Issues
The traditional definition of independence uses equality of numerical quantities, as in P(x,y) = P(x)P(y). People can easily and confidently detect dependencies, but cannot provide such numbers. The notions of relevance and dependence are far more basic to human reasoning than numerical quantification, so assertions about dependency relationships should be expressed first.
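The traditional numerical definition can be checked mechanically from a joint table; a minimal sketch (the two example tables are illustrative):

```python
from itertools import product

# Numerical test of the traditional definition: X and Y are independent
# iff P(x, y) = P(x) P(y) for every cell of the joint table.
def independent(joint, tol=1e-9):
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint.get((x, y), 0.0) for y in ys) for x in xs}  # marginal P(x)
    py = {y: sum(joint.get((x, y), 0.0) for x in xs) for y in ys}  # marginal P(y)
    return all(abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
               for x, y in product(xs, ys))

fair = {(x, y): 0.25 for x, y in product([0, 1], repeat=2)}   # two fair coins
corr = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}   # perfectly correlated
print(independent(fair), independent(corr))  # True False
```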

Dependency Graphs
- The nodes represent propositional variables and the arcs represent local dependencies among conceptually related propositions.
- Graph concepts are entrenched in our language (e.g., "thread of thoughts", "lines of reasoning", "connected ideas"). One wonders whether people can reason any other way than by tracing links, arrows, and paths in some mental representation of concepts and relations.
- What types of (in)dependencies are deducible from graphs? For a given probability distribution P and any three variables X, Y, Z, it is straightforward to verify whether knowing Z renders X independent of Y, but P does not dictate which variables should be regarded as neighbors.
- Some useful properties of dependencies and relevancies cannot be represented graphically.

Properties of Probabilistic Independence
If probabilistic independence is a good formalism (i.e., intuitive to human reasoning), then the axioms it obeys should be consistent with our intuition.

Properties of Probabilistic Independence
Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)
Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)
Weak union: I(X,Z,YW) ⇒ I(X,ZW,Y)
Contraction: I(X,Z,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)
Intersection: I(X,ZY,W) and I(X,ZW,Y) ⇒ I(X,Z,YW)

In Pearl's language (decomposition): if two combined pieces of information are irrelevant to X, then each piece separately is irrelevant to X.

Example: Two Coins and a Bell
Two fair coins are tossed independently, and a bell rings exactly when they land on the same side. Each coin is marginally independent of the other (and of the bell alone), yet once the bell is observed the two coins become completely dependent.
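The two-coins-and-a-bell story can be verified numerically; a minimal sketch of the joint distribution:

```python
from itertools import product

# Two fair coins tossed independently; the bell rings (r=1) exactly when
# they land on the same side.
joint = {}
for c1, c2 in product([0, 1], repeat=2):
    joint[(c1, c2, int(c1 == c2))] = 0.25

def p(pred):
    # Probability of the event described by pred(c1, c2, bell).
    return sum(v for k, v in joint.items() if pred(*k))

# Marginally: P(C1=1, C2=1) == P(C1=1) P(C2=1), so the coins are independent.
marg_ind = abs(p(lambda a, b, r: a == 1 and b == 1)
               - p(lambda a, b, r: a == 1) * p(lambda a, b, r: b == 1)) < 1e-12

# Given the bell: P(C1=1 | bell=1) vs. P(C1=1 | C2=1, bell=1).
p_c1_given_bell = p(lambda a, b, r: a == 1 and r == 1) / p(lambda a, b, r: r == 1)
p_c1_given_c2_bell = (p(lambda a, b, r: a == 1 and b == 1 and r == 1)
                      / p(lambda a, b, r: b == 1 and r == 1))
print(marg_ind, p_c1_given_bell, p_c1_given_c2_bell)  # True 0.5 1.0
```

Observing the bell induces a dependency between the coins: once the bell rings, the second coin determines the first.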

Graphs vs. Graphoids
Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)
Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)
Weak union: I(X,Z,YW) ⇒ I(X,ZW,Y)
Contraction: I(X,Z,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)
Intersection: I(X,ZY,W) and I(X,ZW,Y) ⇒ I(X,Z,YW)
A graphoid satisfies all five axioms; a semi-graphoid satisfies the first four.
Decomposition is only one-way here, while in graphs it holds as an iff. Weak union states that W should be chosen from a set that, like Y, is already separated from X by Z.

Why an Axiomatic Characterization?
- It allows deriving new conjectures about independencies more clearly.
- The axioms serve as inference rules.
- It can capture the principal differences between various notions of relevance or independence.

Dependency Models and Dependency Maps

Independency Maps (I-maps) and Dependency Maps (D-maps)
A model with induced dependencies cannot have a graph which is a perfect map.
Example: two coins and a bell (try it).
How can we then represent two causes leading to a common consequence?

Axiomatic Characterization of Graphs
Definition: a model M is graph-isomorph if there exists a graph which is a perfect map of M.
Theorem (Pearl and Paz, 1985): a necessary and sufficient condition for a dependency model to be graph-isomorph is that it satisfies:
Symmetry: I(X,Z,Y) ⇔ I(Y,Z,X)
Decomposition: I(X,Z,YW) ⇒ I(X,Z,Y) and I(X,Z,W)
Intersection: I(X,ZW,Y) and I(X,ZY,W) ⇒ I(X,Z,YW)
Strong union: I(X,Z,Y) ⇒ I(X,ZW,Y)
Transitivity: I(X,Z,Y) ⇒ I(X,Z,t) or I(t,Z,Y) for every single element t outside X, Y, Z
These properties are satisfied by graph separation.
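Graph separation here is ordinary vertex separation: I(X,Z,Y) holds iff every path from X to Y passes through Z. A minimal sketch (the toy graph is illustrative) that also exhibits strong union, since enlarging the separator preserves separation:

```python
from collections import deque

def separated(adj, X, Z, Y):
    """I(X, Z, Y) in an undirected graph: no path from X to Y avoids Z."""
    seen = set(X) - set(Z)
    frontier = deque(seen)
    while frontier:
        u = frontier.popleft()
        if u in Y:
            return False        # reached Y without crossing Z
        for v in adj.get(u, ()):
            if v not in seen and v not in Z:
                seen.add(v)
                frontier.append(v)
    return True

adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}   # path graph 1 - 2 - 3 - 4
print(separated(adj, {1}, {2}, {4}))            # True: 2 blocks the only path
print(separated(adj, {1}, {2, 3}, {4}))         # strong union: still True
print(separated(adj, {1}, set(), {4}))          # False without a separator
```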

Markov Networks
Graphs and probabilities:
- Given P, can we construct a graph that is an I-map with minimal edges?
- Given (G, P), can we test whether G is an I-map? A perfect map?
Definition (Markov network): a graph G which is a minimal I-map of a probability distribution P, namely one where deleting any edge destroys its I-mapness, is called a Markov network of P.

Markov Networks
Theorem (Pearl and Paz, 1985): a dependency model satisfying symmetry, decomposition, and intersection has a unique minimal graph as an I-map, produced by deleting every edge (a,b) for which I(a, U-{a,b}, b) is true. The theorem thus gives an edge-deletion method for constructing G0.
A Markov blanket of a is a set S for which I(a, S, U-S-{a}). A Markov boundary is a minimal Markov blanket.
Theorem (Pearl and Paz, 1985): if symmetry, decomposition, weak union, and intersection are satisfied by P, then the Markov boundary of each node is unique and coincides with that node's neighborhood in the Markov network of P.
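The edge-deletion construction can be sketched directly for a small positive distribution over binary variables. The chain distribution below (the pairwise potentials f are made up) should recover exactly the chain's edges:

```python
from itertools import product

# Edge-deletion construction of the Markov network: keep edge (a, b)
# unless I(a, U - {a, b}, b) holds in P.
def cond_independent(joint, n, a, b, tol=1e-9):
    rest = [i for i in range(n) if i not in (a, b)]
    for z in product([0, 1], repeat=len(rest)):
        def pz(fix):
            # P(fixed values, context z) summed from the joint table.
            return sum(p for x, p in joint.items()
                       if all(x[i] == v for i, v in zip(rest, z))
                       and all(x[i] == v for i, v in fix.items()))
        for xa, xb in product([0, 1], repeat=2):
            # Factorization test: P(xa, xb, z) P(z) == P(xa, z) P(xb, z).
            if abs(pz({a: xa, b: xb}) * pz({}) - pz({a: xa}) * pz({b: xb})) > tol:
                return False
    return True

def markov_network(joint, n):
    return {(a, b) for a in range(n) for b in range(a + 1, n)
            if not cond_independent(joint, n, a, b)}

# Chain X0 - X1 - X2: P(x) proportional to f(x0, x1) f(x1, x2).
f = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
unnorm = {x: f[(x[0], x[1])] * f[(x[1], x[2])] for x in product([0, 1], repeat=3)}
Z = sum(unnorm.values())
chain = {x: v / Z for x, v in unnorm.items()}
print(sorted(markov_network(chain, 3)))  # [(0, 1), (1, 2)]
```

The edge (0, 2) is deleted because I(X0, {X1}, X2) holds, which is exactly the chain's conditional independence.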

Markov Networks
Corollary: the Markov network G of any strictly positive distribution P can be obtained by connecting every node to its Markov boundary.
The following two interpretations of direct neighbors coincide:
- Neighbors as a blanket that shields a variable from the influence of all others.
- Neighborhood as a tight influence between variables that cannot be weakened by other elements in the system.
So, given a (positive) P, how can we construct G? Given (G, P), how do we test whether G is an I-map of P? Given G, can we construct a P for which G is a perfect map? (Geiger and Pearl, 1988)

Testing I-mapness
Theorem 5 (Pearl): given a positive P and a graph G, the following are equivalent:
- G is an I-map of P;
- G is a supergraph of the Markov network of P;
- G is locally Markov w.r.t. P (the neighborhood of each node a in G is a Markov blanket of a).
There appears to be no test for I-mapness of an undirected graph that works for extreme (non-positive) distributions without testing every cutset in G (e.g., x = y = z = t).
Representations of probabilistic independence using undirected graphs rest heavily on the intersection and weak union axioms. In contrast, we will see that directed graph representations rely on the contraction and weak union axioms, with intersection playing a minor role.
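For positive P, Theorem 5 reduces I-map testing to a simple edge-containment check once the Markov network is known; a minimal sketch (the edge sets are illustrative):

```python
# Theorem 5 for positive P: G is an I-map of P iff G's edge set contains
# every edge of the Markov network of P.
def is_imap(g_edges, markov_edges):
    return set(markov_edges) <= set(g_edges)

mn = {(0, 1), (1, 2)}                   # Markov network of some positive P
print(is_imap(mn | {(0, 2)}, mn))       # True: any supergraph stays an I-map
print(is_imap({(0, 1)}, mn))            # False: a missing edge breaks I-mapness
```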

Markov Networks: Summary

The unusual edge (3,4) reflects the reasoning that if we fix the arrival time (5), the travel time (4) must depend on the current time (3).

Summary

How can we construct a probability distribution that has all of these independencies?

So, how do we learn Markov networks from data?

G is locally Markov if each variable's neighbors render it independent of all the rest.