STAT 598L Probabilistic Graphical Models. Instructor: Sergey Kirshner. Bayesian Networks

Representing Joint Probability Distributions: a full joint distribution over n binary variables has 2^n - 1 free parameters.

Reducing the Number of Parameters: Conditional Independence. Conditional independence can reduce the number of parameters: one factor with 1 parameter and two factors with 2 parameters each, for 5 parameters in total instead of 7.

Reducing the Number of Parameters: Conditional Independence. With X and Y conditionally independent given Z, P(X, Y, Z) = P(Z) P(X|Z) P(Y|Z): 1 parameter for P(Z), 2 for P(X|Z), 2 for P(Y|Z), i.e., 5 parameters instead of 2^3 - 1 = 7.

Example: Naïve Bayes. Variables: L (like/dislike), A (ambience), P (price), F (food), E (ethnic). The features A, P, F, E are conditionally independent given the class L, so P(L, A, P, F, E) = P(L) P(A|L) P(P|L) P(F|L) P(E|L).

Structure (Graph): (1) a skeleton for the factorization of a joint distribution; (2) a representation of a set of conditional independence relations. The two views are the same.

Causal Interpretation

With Probability Tables

Marginals

Causal Reasoning

Evidential Reasoning

Explaining Away

Bayesian Network Model: a graphical way to describe a particular chain-rule decomposition of the joint distribution. Parents in the graph are the conditioning variables; the factors are conditional probability distributions.

Conditional Independence Assumptions What conditional independence assumptions are made?

Representation Theorem. Local Markov assumption (each variable is independent of its non-descendants given its parents) is equivalent to: P factorizes according to G.

Representation Theorem. G is an I-map for P: P could potentially exhibit more conditional independence relations than those implied by G.

Representation Theorem (example graph on X1, X2, X3, X4, X5). G is an I-map for P; P could potentially exhibit more CI relations. Is there an I-map graph for every P?

Representation Theorem (one direction): if each variable is independent of its non-descendants given its parents, then P factorizes according to G.

Representation Theorem: if each variable is independent of its non-descendants given its parents, then P factorizes according to G. Proof sketch: 1. Assume a topological ordering of the variables with respect to G (reorder if necessary). 2. Apply the chain rule along this ordering: P(X1, ..., Xn) = P(X1) P(X2 | X1) ... P(Xn | X1, ..., Xn-1). 3. Under a topological ordering, each Xi's predecessors X1, ..., Xi-1 are non-descendants of Xi, so the local Markov assumption gives P(Xi | X1, ..., Xi-1) = P(Xi | Pa(Xi)). 4. Therefore P(X1, ..., Xn) = P(X1 | Pa(X1)) ... P(Xn | Pa(Xn)), i.e., P factorizes according to G.

Representation Theorem (converse direction): if P factorizes according to G, then each variable is independent of its non-descendants given its parents. Homework!

Bayesian Network = DAG (conditioning variables = parents; the Bayesian network structure) + conditional probability tables (the Bayesian network parameters). The joint distribution is written as a chain rule of conditional probabilities.

Dimensionality Reduction. Full probability table: O(2^n) free parameters. Bayesian network (BN): O(n * 2^k) free parameters (assuming at most k parents per node).

Representation Theorem (summary). Structure: a DAG in which the conditioning variables are the parents (the Bayesian network structure), with the joint distribution written as a chain rule of conditional probabilities. Independencies: the local Markov assumption (there can be more independencies in P than the graph requires).

Finding Conditional Independences (example DAG on nodes A, B, C, D, E, F, G, H, I, J): Is A ⊥ E?

Finding Conditional Independences (same DAG): Is A ⊥ E | B?

Finding Conditional Independences (same DAG): Is A ⊥ E | B, G? How do we convert local Markov properties into conditional independencies?

Simple Case: X and Y with a direct connection (an edge between them). Is it possible to find Z so that X ⊥ Y | Z? Not always, e.g., when Y is a deterministic function of X. Verdict: dependent; an edge is always active in the flow of influence (active = dependence).

Independencies for Three Variables (Z not observed): indirect causal effect X → Z → Y: active; indirect evidential effect X ← Z ← Y: active; common cause X ← Z → Y: active; common effect (v-structure) X → Z ← Y: blocked.

Independencies for Three Variables (Z observed): indirect causal effect X → Z → Y: blocked; indirect evidential effect X ← Z ← Y: blocked; common cause X ← Z → Y: blocked; common effect (v-structure) X → Z ← Y: active.

Trails (same example DAG on nodes A–J). Trail = undirected path, i.e., a path in the graph that ignores edge directions.

Active Trails (same DAG). Active trail = every consecutive triple along the trail is active: for a v-structure X → Z ← Y, the middle node Z or one of its descendants must be among the conditioning nodes; for any other triple, the middle node must not be among the conditioning nodes.

Finding Active Trails. Algorithm 3.1 in the textbook: find the ancestors of the evidence nodes (needed to test v-structures), then run a breadth-first search (a bit tricky, as both the up and down directions of travel have to be considered). Alternatively, play Bayes-Ball ("The Rational Pastime").

http://ai.stanford.edu/~paskin/gm-short-course/lec2.pdf

Direction-dependent Separation. X and Y are d-separated given Z = there are no active trails between nodes in X and nodes in Y given the nodes in Z. Denoted d-sep_G(X; Y | Z). We will examine the set of independencies induced by d-separation. How can we formally tie d-separation and conditional independence?

Soundness of d-separation. For all P that factorize according to G (i.e., G is a BN structure for P): d-separation in G implies conditional independence in P. In other words, G is an I-map for P. (Will prove later in the course.)

Completeness of d-separation. What would be a good converse? Faithfulness: every conditional independence in P corresponds to a d-separation in G. This does not hold in general! We need to relax the statement.

Completeness of d-separation. Relaxing the statement: if X and Y are not d-separated given Z in G, then X and Y are dependent given Z in some P that factorizes according to G. Equivalently, by contraposition: if X ⊥ Y | Z holds in every P that factorizes according to G, then X and Y are d-separated given Z in G.

Completeness of d-separation. Interpreting the statement (example DAG on nodes A–J): an active trail between X and Y given Z means that X and Y are dependent given Z in some P that factorizes according to G. Sketch of proof (by construction): make all CPDs not on the trail uniform (effectively removing the nodes and arrows not on the trail), and make the remaining dependencies along the trail deterministic (e.g., XOR).

More General Result: soundness, and completeness (almost). Intuition: for two binary variables X and Y, the possible conditional distributions form a 2-d space parameterized by P(Y=1 | X=0) and P(Y=1 | X=1); the distributions in which X and Y are independent form only a 1-d curve within it, the diagonal P(Y=1 | X=0) = P(Y=1 | X=1). [Figure: plot of P(Y=1 | X=1) against P(Y=1 | X=0) with the independence curve.]

Dimensionality Reduction. [Figure: the same plot of P(Y=1 | X=1) against P(Y=1 | X=0) with the 1-d independence curve.] Key for dimensionality reduction: finding a parametrization on such a low-dimensional manifold.

Equivalent Structures. Is the DAG structure unique for a given set of conditional independencies? No: X → Z → Y, X ← Z ← Y, and X ← Z → Y all encode the same independence, X ⊥ Y | Z. What makes DAGs equivalent for CI modeling?

More Formally: I-Equivalence (e.g., X → Z → Y and X ← Z ← Y). I-equivalence: I(G1) = I(G2). What makes graphs I-equivalent?

Skeleton: the undirected graph obtained by dropping the directions of the edges. [Figure: the example DAG on nodes A–J and its skeleton.]

Structure Equivalence. Is having the same skeleton enough? No, because of v-structures (X → Z ← Y). What if we also require the same v-structures? skeleton(G1) = skeleton(G2) and v-structures(G1) = v-structures(G2) implies I(G1) = I(G2). Does the converse hold? No, e.g., two different complete DAGs have I(G1) = I(G2) but different v-structures.

Structure Equivalence. Still not a full characterization; refining the last piece: an immorality is a v-structure X → Z ← Y with no edge between X and Y. Theorem: skeleton(G1) = skeleton(G2) and immoralities(G1) = immoralities(G2) if and only if I(G1) = I(G2).

How To Construct G? [Figure: a small example set of variables; which graph G should we build?]

What Can We Reconstruct? Ideally, we want G such that I(G) = I. Suppose such a graph exists: it may not be unique, and there are n! orderings to search over. We may not be able to find such a G at all. We will settle for a G that is an I-map for I, i.e., I(G) ⊆ I, but with no extra edges (dependencies): removing edges can only add independencies. G is a minimal I-map = removing any edge in G adds independencies not in I.

How To Construct a Minimal I-Map? Assume an ordering (X1, ..., Xn) is given. For each node Xi, find the smallest subset U of its predecessors {X1, ..., Xi-1} such that Xi is independent of the remaining predecessors given U, and add edges from the nodes in U to Xi.

Example of Finding I-Maps (variables A, B, C, D, E). Order = (A, B, E, C, D). What if a different order is chosen?

Summary. Bayesian Network = DAG + CPDs. The distribution factorizes according to the graph (chain-rule decomposition) ⟺ the distribution satisfies the local independencies of the graph ⟺ the distribution satisfies the global independencies of the graph (d-separation). D-separation precisely characterizes the independencies in the distribution (almost). We convert a high-dimensional real-valued object (the distribution) into a discrete object (the graph). However, the graph may not be unique, and it may not be able to capture the exact set of independencies.