STATISTICAL METHODS IN AI/ML Vibhav Gogate The University of Texas at Dallas. Bayesian networks: Representation


Motivation
Explicit representation of the joint distribution is unmanageable:
- Computationally: memory intensive to store and manipulate
- Cognitively: impossible to acquire so many numbers from human experts
- Statistically: we would need a ridiculously large amount of data to learn it
Solution: exploit independence properties and represent the distribution using a graph.
Trouble: mapping the logic of probability theory into graph theory!

Properties of Independence
The statement I(X, Z, Y) means that X is independent of Y given Z. Namely, Pr(X | Y, Z) = Pr(X | Z) and Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z).
Symmetry: I(X, Z, Y) implies I(Y, Z, X)
Decomposition: I(X, Z, Y ∪ W) implies I(X, Z, Y)
Weak Union: I(X, Z, Y ∪ W) implies I(X, Z ∪ W, Y)
Contraction: I(X, Z ∪ Y, W) and I(X, Z, Y) imply I(X, Z, Y ∪ W)
Intersection (for any positive distribution): I(X, Z ∪ W, Y) and I(X, Z ∪ Y, W) imply I(X, Z, Y ∪ W)
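As a quick numerical sanity check (my own sketch, not part of the slides), the code below builds a small distribution over binary X, Y, W, Z in which I(X, Z, Y ∪ W) holds by construction, and then verifies Decomposition and Weak Union on it; the function names and the use of numpy are assumptions of this illustration.

```python
# A sanity check of Decomposition and Weak Union (illustrative sketch only), assuming
# binary variables and a joint built so that I(X, Z, Y ∪ W) holds by construction:
# P(x, y, w, z) = P(z) P(x | z) P(y, w | z).
import numpy as np

rng = np.random.default_rng(0)

p_z = rng.random(2); p_z /= p_z.sum()                                              # P(z)
p_x_z = rng.random((2, 2)); p_x_z /= p_x_z.sum(axis=0, keepdims=True)              # P(x | z)
p_yw_z = rng.random((2, 2, 2)); p_yw_z /= p_yw_z.sum(axis=(0, 1), keepdims=True)   # P(y, w | z)

# Joint array with axes ordered (x, y, w, z).
P = p_z[None, None, None, :] * p_x_z[:, None, None, :] * p_yw_z[None, :, :, :]

def independent(P, a_axes, b_axes, c_axes):
    """Check I(A, C, B), i.e. P(a, b | c) = P(a | c) P(b | c) for all values."""
    all_axes = set(range(P.ndim))
    marg = lambda keep: P.sum(axis=tuple(all_axes - set(keep)), keepdims=True)
    return np.allclose(marg(a_axes + b_axes + c_axes) * marg(c_axes),
                       marg(a_axes + c_axes) * marg(b_axes + c_axes))

# Axis indices: X = 0, Y = 1, W = 2, Z = 3.
print(independent(P, [0], [1, 2], [3]))   # I(X, Z, Y ∪ W): True by construction
print(independent(P, [0], [1], [3]))      # Decomposition gives I(X, Z, Y): True
print(independent(P, [0], [1], [3, 2]))   # Weak Union gives I(X, Z ∪ W, Y): True
```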

Proof of Symmetry
Assume that I(X, Z, Y) holds. This implies that:
Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z)   (1)
The right-hand side is symmetric in X and Y, so Pr(Y, X | Z) = Pr(Y | Z) Pr(X | Z) as well, i.e. I(Y, Z, X) holds too (exchanging the positions of X and Y).

Proof of Decomposition
Assume that I(X, Z, Y ∪ W) holds. Then Pr(X, Y, W | Z) = Pr(X | Z) Pr(Y, W | Z), and
Pr(X, Y | Z) = Σ_w Pr(X, Y, w | Z)            (2)
            = Σ_w Pr(X | Z) Pr(Y, w | Z)      (3)
            = Pr(X | Z) Σ_w Pr(Y, w | Z)      (4)
            = Pr(X | Z) Pr(Y | Z)             (5)
i.e. I(X, Z, Y) holds too.

Proof of Weak Union
Assume that I(X, Z, Y ∪ W) holds. Then Pr(X, Y, W | Z) = Pr(X | Z) Pr(Y, W | Z), and
Pr(X, Y | W, Z) = Pr(X, Y, W | Z) / Pr(W | Z)            (6)
              = Pr(X | Z) Pr(Y, W | Z) / Pr(W | Z)       (7)
              = Pr(X | Z) Pr(Y | W, Z)                   (8)
I stopped here: Homework problem. Which of the previous properties can you use to prove that Pr(X, Y | W, Z) = Pr(X | W, Z) Pr(Y | W, Z)?

Bayesian networks: Directed Graphs
Mapping a distribution to a graph! The graph can be viewed in two different ways:
- As a data structure that represents the joint distribution compactly
- As a compact representation of a set of conditional independence assumptions about a distribution
The two views are equivalent.

Bayesian networks: Data Structure view
Bayesian network = chain rule + conditional independence properties.
Chain rule: P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})

Bayesian networks: Data Structure view
Chain rule: P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})
Use the chain rule to represent the joint distribution rather than a giant table!
Example: Intelligence I and SAT score S, with P(I, S) = P(I) P(S | I):

P(I, S):      s0           s1
  i0       0.95 * 0.7   0.05 * 0.7
  i1       0.2 * 0.3    0.8 * 0.3

P(I):  i0 = 0.7, i1 = 0.3
P(S | I):     s0      s1
  i0        0.95    0.05
  i1        0.2     0.8
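A tiny sketch of this factored representation in code (my own illustration, not from the lecture), multiplying the two small tables instead of storing the joint directly; the dictionary layout is an assumption of this example:

```python
# Factored representation P(I, S) = P(I) P(S | I), using the numbers from the example.
P_I = {"i0": 0.7, "i1": 0.3}
P_S_given_I = {("s0", "i0"): 0.95, ("s1", "i0"): 0.05,
               ("s0", "i1"): 0.20, ("s1", "i1"): 0.80}

# Recover any joint entry on demand from the two small tables.
P_IS = {(i, s): P_I[i] * P_S_given_I[(s, i)]
        for i in P_I for s in ("s0", "s1")}

for (i, s), p in sorted(P_IS.items()):
    print(i, s, round(p, 3))   # e.g. P(i0, s0) = 0.95 * 0.7 = 0.665
```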

Bayesian networks: Data Structure view
However, we don't gain anything by using the chain rule alone; the space complexity is the same. We must exploit conditional independence properties. What if I tell you that you are representing a joint distribution over 2 coin tosses?
Chain rule: P(X_1, X_2) = P(X_1) P(X_2 | X_1)
Conditional independence: P(X_1, X_2) = P(X_1) P(X_2)

Bayesian networks: Data Structure view
View the chain rule P(X_1, ..., X_n) = Π_{i=1}^{n} P(X_i | X_1, ..., X_{i-1}) as a directed graph in which X_1, ..., X_{i-1} are the parents of X_i: a complete graph.
If we know that P(X_i | X_1, ..., X_{i-1}) = P(X_i | Y_i), where Y_i is a subset of {X_1, ..., X_{i-1}}, then we get a sparse graph (a Bayesian network).

Bayesian networks: Data structure view
The student network. Graph: Difficulty → Grade ← Intelligence, Grade → Letter, Intelligence → SAT. Random variables are nodes and edges represent direct influence of one variable on another. Each node is associated with a conditional probability table (CPT). The joint distribution is the product of all CPTs:
P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(L | G) P(S | I)

P(D):  d0 = 0.6, d1 = 0.4
P(I):  i0 = 0.7, i1 = 0.3
P(G | I, D):     g1     g2     g3
  i0, d0        0.3    0.4    0.3
  i0, d1        0.05   0.25   0.7
  i1, d0        0.9    0.08   0.02
  i1, d1        0.5    0.3    0.2
P(S | I):        s0     s1
  i0            0.95   0.05
  i1            0.2    0.8
P(L | G):        l0     l1
  g1            0.1    0.9
  g2            0.4    0.6
  g3            0.99   0.01

What is P(i1, d0, g2, s1, l0)?
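To answer the question on the slide, one simply multiplies the relevant CPT entries. The sketch below (my own code and naming, assuming the CPT values shown above) does exactly that:

```python
# Evaluate the joint of the student network as the product of its CPT entries:
# P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(L | G) P(S | I).
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {("i0", "d0"): {"g1": 0.3,  "g2": 0.4,  "g3": 0.3},
       ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
       ("i1", "d0"): {"g1": 0.9,  "g2": 0.08, "g3": 0.02},
       ("i1", "d1"): {"g1": 0.5,  "g2": 0.3,  "g3": 0.2}}
P_S = {"i0": {"s0": 0.95, "s1": 0.05},
       "i1": {"s0": 0.2,  "s1": 0.8}}
P_L = {"g1": {"l0": 0.1,  "l1": 0.9},
       "g2": {"l0": 0.4,  "l1": 0.6},
       "g3": {"l0": 0.99, "l1": 0.01}}

def joint(d, i, g, s, l):
    """Multiply one entry from each CPT."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_L[g][l] * P_S[i][s]

# The query posed on the slide:
print(joint("d0", "i1", "g2", "s1", "l0"))   # 0.6 * 0.3 * 0.08 * 0.8 * 0.4 = 0.004608
```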

Bayesian networks: Data structure view
Space complexity? Assume each variable has d values in its domain. The CPT of a variable with k parents then has O(d^(k+1)) entries. For the student network above, this means 2 + 2 + 12 + 4 + 6 = 26 CPT entries instead of the 2 * 2 * 3 * 2 * 2 = 48 entries of an explicit joint table.

Graph terms

Bayesian networks: Compact Representation of Conditional Independence statements view
A directed acyclic graph G represents the following independence statements:
Markov(G) = { I(V, Parents(V), Non-Descendants(V)) for every variable V }
Parents(V) denotes the direct causes of V and Descendants(V) denotes the effects of V. Given the direct causes of a variable, our beliefs in that variable become independent of its non-effects.

Bayesian networks: Compact Representation of Conditional Independence statements view
Markovian assumptions?


Expanding Markov(G) using properties of probabilistic independence
Markov(G) is not comprehensive. We can expand it using properties such as symmetry, decomposition, weak union, contraction and intersection. Given W ⊆ Non-Descendants(X), how can we strengthen Markov(G) = I(X, Parents(X), Non-Descendants(X))?
Decomposition: I(X, Z, Y ∪ W) implies I(X, Z, Y), which gives I(X, Parents(X), W).
Weak Union: I(X, Z, Y ∪ W) implies I(X, Z ∪ W, Y), which gives I(X, Parents(X) ∪ W, Non-Descendants(X) \ W). And so on.
For example, in the student network Markov(G) contains I(S, {I}, {D, G, L}); taking W = {D}, decomposition yields I(S, {I}, {D}) and weak union yields I(S, {I, D}, {G, L}).

Expanding Markov(G) using Symmetry

Expanding Markov(G) using Decomposition

Expanding Markov(G) using Weak-union

Graphoid Axioms

Capturing independence graphically Question: Is there a purely graphical test that can find all of these independence statements, namely Markov(G) plus the ones inferred using the properties of conditional independence? YES, it is called d-separation.

D-separation

D-separation
The d-separation test is sound: if the distribution Pr is induced by a Bayesian network with graph G, then dsep_G(X, Z, Y) implies I_Pr(X, Z, Y). The proof of soundness is constructive, showing that every independence claimed by d-separation can indeed be derived using the graphoid axioms.
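For readers who want to experiment, here is a minimal sketch of a d-separation test (my own code and naming, not the lecture's) using the equivalent ancestral moral graph criterion: restrict the DAG to the ancestors of X ∪ Y ∪ Z, moralize it (marry parents of a common child and drop edge directions), delete Z, and declare X and Y d-separated exactly when they are disconnected in the resulting undirected graph.

```python
# A d-separation test via the "moralized ancestral graph" criterion (illustrative sketch).
from collections import deque

def ancestors(dag, nodes):
    """dag: dict child -> set of parents. Return the given nodes plus all their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in dag.get(stack.pop(), set()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(dag, X, Y, Z):
    """True iff X and Y are d-separated by Z in the DAG (given as child -> parents)."""
    X, Y, Z = set(X), set(Y), set(Z)
    keep = ancestors(dag, X | Y | Z)                   # ancestral subgraph
    # Moral graph: connect each node to its parents and marry parents of a common child.
    adj = {v: set() for v in keep}
    for child in keep:
        parents = dag.get(child, set()) & keep
        for p in parents:
            adj[child].add(p); adj[p].add(child)
        for p in parents:
            for q in parents:
                if p != q:
                    adj[p].add(q)
    # Remove Z and test connectivity between X and Y by breadth-first search.
    frontier, seen = deque(X - Z), set(X - Z)
    while frontier:
        v = frontier.popleft()
        if v in Y:
            return False
        for u in adj.get(v, set()) - Z - seen:
            seen.add(u); frontier.append(u)
    return True

# Example on the student network (child -> parents):
student = {"D": set(), "I": set(), "G": {"D", "I"}, "S": {"I"}, "L": {"G"}}
print(d_separated(student, {"D"}, {"S"}, set()))    # True:  D and S are d-separated given {}
print(d_separated(student, {"D"}, {"S"}, {"G"}))    # False: observing G activates D -> G <- I
```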

I-map, D-map and P-map
A directed acyclic graph G (a Bayesian network) describes a set of conditional independence assumptions I_G.
It is an I-map of a distribution Pr if I_G ⊆ I_Pr.
It is a D-map if I_Pr ⊆ I_G.
It is a P-map if it is both an I-map and a D-map, namely I_G = I_Pr.
I-maps and D-maps can be constructed trivially (a complete graph is an I-map and an empty graph is a D-map). Therefore, we enforce minimality.

Minimal I-maps

Minimal I-maps
Minimal I-maps are not unique: different variable orderings give rise to different minimal I-maps.
[Figure: three minimal I-maps (a), (b), (c) over the variables D, I, G, S, L. Ordering for (b): (L, S, G, I, D). Ordering for (c): (L, D, S, I, G).]
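The dependence on the variable ordering can be made concrete with a small sketch of the standard ordering-based minimal I-map construction (my own code, reusing the d_separated function and student graph from the sketch above as an independence oracle): each variable receives a smallest set of its predecessors that renders it independent of the remaining predecessors.

```python
# Ordering-based minimal I-map construction (illustrative sketch).
from itertools import combinations

def minimal_i_map(order, indep):
    """order: list of variables; indep(X, Z, Y): independence oracle, True iff I(X, Z, Y).
    Returns a dict variable -> chosen parent set."""
    parents = {}
    for k, x in enumerate(order):
        preds = set(order[:k])
        chosen = preds                         # the full predecessor set always works
        for size in range(len(preds) + 1):     # search smallest subsets first
            for U in combinations(sorted(preds), size):
                if indep({x}, set(U), preds - set(U)):
                    chosen = set(U)
                    break
            else:
                continue
            break
        parents[x] = chosen
    return parents

# Use d-separation in the original student network as the independence oracle.
oracle = lambda X, Z, Y: d_separated(student, X, Y, Z)
print(minimal_i_map(["D", "I", "G", "S", "L"], oracle))   # reproduces the original graph
print(minimal_i_map(["L", "S", "G", "I", "D"], oracle))   # a different minimal I-map
```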

Blankets and Boundaries