Lecture 3: Conditional independence and graph structure (COMP-526)


Recall from last time

- Bayesian networks are a graphical model representing conditional independence relations.
- The nodes of the graph represent random variables.
- Each node has associated with it a conditional probability distribution (CPD) for the corresponding random variable, given its parents.
- The lack of edges represents conditional independence assumptions; the presence of edges does not represent dependence.

Lecture 3: Conditional independence and graph structure

- Conditional independencies implied by a belief network
- Independence maps (I-maps)
- The factorization theorem
- The Bayes ball algorithm and d-separation

Example: Bayesian (belief) network

The alarm network has five binary variables: Burglary (B), Earthquake (E), Alarm (A), Call (C) and Radio report (R), with edges B → A, E → A, A → C and E → R. The nodes represent random variables, the arcs represent influences, and at each node we have a conditional probability distribution (CPD) for the corresponding variable given its parents:

p(B):          B=1: 0.01      B=0: 0.99
p(E):          E=1: 0.005     E=0: 0.995
p(A=1 | B,E):  B=0,E=0: 0.001   B=0,E=1: 0.3   B=1,E=0: 0.8   B=1,E=1: 0.95
p(C=1 | A):    A=0: 0.05      A=1: 0.7
p(R=1 | E):    E=0: 0.0001    E=1: 0.65

Factorization

Let G be a DAG over variables X_1, ..., X_n. We say that a joint probability distribution p factorizes according to G if p can be expressed as a product:

    p(x_1, ..., x_n) = ∏_{i=1}^{n} p(x_i | x_{π_i})

where π_i denotes the parents of X_i. The individual factors p(x_i | x_{π_i}) are called local probabilistic models or conditional probability distributions (CPDs).
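To make the factorization concrete, here is a minimal Python sketch (not part of the original notes) that encodes the alarm-network CPDs as read off the tables above and checks that the factored joint sums to 1. The encoding 1 = true and the helper names are my own assumptions.

```python
from itertools import product

# CPDs of the alarm network, as reconstructed from the tables above.
p_B = {1: 0.01, 0: 0.99}                      # p(B)
p_E = {1: 0.005, 0: 0.995}                    # p(E)
p_A = {(0, 0): 0.001, (0, 1): 0.3,            # p(A=1 | B, E), keyed by (b, e)
       (1, 0): 0.8,   (1, 1): 0.95}
p_C = {0: 0.05, 1: 0.7}                       # p(C=1 | A), keyed by a
p_R = {0: 0.0001, 1: 0.65}                    # p(R=1 | E), keyed by e

def bern(p_one, value):
    """Probability of a binary value, given p(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(c, a, r, e, b):
    """p(c, a, r, e, b) = p(b) p(e) p(a|b,e) p(c|a) p(r|e)."""
    return (p_B[b] * p_E[e] * bern(p_A[(b, e)], a)
            * bern(p_C[a], c) * bern(p_R[e], r))

# Sanity check: the factored distribution sums to 1 over all 2^5 assignments.
total = sum(joint(c, a, r, e, b)
            for c, a, r, e, b in product([0, 1], repeat=5))
print(total)  # ~1.0
```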

I-maps

A directed acyclic graph (DAG) G whose nodes represent random variables X_1, ..., X_n is an I-map (independence map) of a distribution p if p satisfies the independence assumptions:

    X_i ⊥ Nondescendants(X_i) | X_{π_i},  for i = 1, ..., n,

where π_i are the parents of X_i.

Example

Consider all possible DAG structures over 2 variables. Which graph is an I-map for the following distribution?

  x  y  p(x, y)
  0  0  0.08
  0  1  0.32
  1  0  0.32
  1  1  0.28

What about the following distribution?

  x  y  p(x, y)
  0  0  0.08
  0  1  0.12
  1  0  0.32
  1  1  0.48

Factorization theorem

G is an I-map of p if and only if p factorizes according to G:

    p(x_1, ..., x_n) = ∏_{i=1}^{n} p(x_i | x_{π_i}),  for all x_i ∈ Ω_i.

Proof (⇒ direction; the other direction comes a bit later): Assume that G is an I-map for p. By the chain rule,

    p(x_1, ..., x_n) = ∏_{i=1}^{n} p(x_i | x_1, ..., x_{i-1}).

Without loss of generality, we can order the variables X_i according to G, so that π_i ⊆ {X_1, ..., X_{i-1}}. This means that {X_1, ..., X_{i-1}} = π_i ∪ Z, where Z ⊆ Nondescendants(X_i). Since G is an I-map, X_i ⊥ Nondescendants(X_i) | X_{π_i}, so:

    p(x_i | x_1, ..., x_{i-1}) = p(x_i | z, x_{π_i}) = p(x_i | x_{π_i}),

and the conclusion follows.

Factorization example

For the alarm network, the factorization theorem allows us to represent p(c, a, r, e, b) as

    p(c, a, r, e, b) = p(b) p(e) p(a | b, e) p(c | a) p(r | e)

instead of

    p(c, a, r, e, b) = p(b) p(e | b) p(a | e, b) p(c | a, e, b) p(r | a, e, c, b).
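As a quick illustration of the I-map example, the following Python sketch (my own, illustrative only) tests whether the empty graph over two variables, which asserts X ⊥ Y, is an I-map of each example distribution by checking whether p(x, y) = p(x) p(y).

```python
def is_independent(p_xy, tol=1e-9):
    """p_xy maps (x, y) -> probability; True iff p(x, y) = p(x) p(y) everywhere."""
    xs = {x for x, _ in p_xy}
    ys = {y for _, y in p_xy}
    p_x = {x: sum(p_xy[(x, y)] for y in ys) for x in xs}   # marginal of X
    p_y = {y: sum(p_xy[(x, y)] for x in xs) for y in ys}   # marginal of Y
    return all(abs(p_xy[(x, y)] - p_x[x] * p_y[y]) < tol
               for x in xs for y in ys)

dist1 = {(0, 0): 0.08, (0, 1): 0.32, (1, 0): 0.32, (1, 1): 0.28}
dist2 = {(0, 0): 0.08, (0, 1): 0.12, (1, 0): 0.32, (1, 1): 0.48}

print(is_independent(dist1))  # False: only the complete graphs X->Y or Y->X are I-maps
print(is_independent(dist2))  # True: X and Y are independent, so every DAG (incl. the empty one) is an I-map
```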

Complexity of factorized representations

If k is the maximum number of parents of any node in the graph, and we have binary variables, then every conditional probability distribution will require at most 2^k numbers to specify. The whole joint distribution can then be specified with n·2^k numbers, instead of 2^n. The savings are big if the graph is sparse (k ≪ n).

Minimal I-maps

The fact that a DAG G is an I-map for a joint distribution p might not be very useful. E.g., complete DAGs (where all arcs that do not create a cycle are present) are I-maps for any distribution, because they do not imply any independencies.

A DAG G is a minimal I-map of p if:
1. G is an I-map of p, and
2. if G' ⊂ G, then G' is not an I-map of p.

Constructing minimal I-maps

The factorization theorem suggests an algorithm:
1. Fix an ordering of the variables: X_1, ..., X_n.
2. For each X_i, select its parents π_i to be the minimal subset of {X_1, ..., X_{i-1}} such that X_i ⊥ ({X_1, ..., X_{i-1}} \ π_i) | π_i.

This will yield a minimal I-map.

Non-uniqueness of the minimal I-map

Unfortunately, a distribution can have many minimal I-maps, depending on the variable ordering we choose! The initial choice of variable ordering can have a big impact on the complexity of the minimal I-map. In the alarm network, for example, a causal ordering such as E, B, A, C, R recovers the original sparse graph, whereas an ordering that places E and B last yields a much denser graph. A good heuristic is to use causality in order to generate an ordering.
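To make the counting argument concrete, here is a tiny illustrative sketch comparing the n·2^k cost of the factored representation against the 2^n cost of the full joint table for a few (n, k) pairs; the numbers are only meant to show the scale of the savings.

```python
# An illustrative comparison of the two representation sizes discussed above:
# a Bayes net with n binary variables and at most k parents per node needs
# about n * 2^k numbers, while the full joint table needs about 2^n.
def factored_size(n, k):
    return n * 2 ** k

def full_joint_size(n):
    return 2 ** n

for n, k in [(5, 2), (20, 3), (100, 4)]:
    print(f"n={n}, k={k}: factored ~{factored_size(n, k)}, full joint ~{full_joint_size(n)}")
```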

Implied independencies

The fact that a Bayes net is an I-map for a distribution implies a set of conditional independencies that always hold, and allows us to compute joint probabilities (and hence do inference) a lot faster in practice. In practice, we also have evidence about the values of certain variables. Is there a way to say what are all the independence relations implied by a Bayes net?

A simple case: indirect connection

Consider a chain X → Z → Y. Think of X as the past, Z as the present and Y as the future; this is a simple Markov chain. We interpret the lack of an edge between X and Y as a conditional independence, X ⊥ Y | Z. Is this justified?

Based on the graph structure, we have:

    p(x, z, y) = p(x) p(z | x) p(y | z)

Hence, we have:

    p(y | x, z) = p(x, z, y) / p(x, z) = [p(x) p(z | x) p(y | z)] / [p(x) p(z | x)] = p(y | z).

Note that the edges that are present do not imply dependence, but the edges that are missing do imply independence.

A more interesting case: common cause

Now consider X ← Z → Y. Again, we interpret the lack of an edge between X and Y as X ⊥ Y | Z. Why is this true?

    p(y | x, z) = p(x, z, y) / p(x, z) = [p(z) p(x | z) p(y | z)] / [p(z) p(x | z)] = p(y | z).

This is a hidden-variable scenario: if Z is unknown, then X and Y could appear to be dependent on each other.
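The derivation for the indirect-connection case can also be checked numerically. The sketch below (illustrative; the CPD numbers are made up) builds a distribution that factorizes along the chain X → Z → Y and confirms that p(y | x, z) = p(y | z) for every combination of x and z.

```python
from itertools import product

p_X = {1: 0.3, 0: 0.7}     # p(X)
p_Z = {0: 0.2, 1: 0.9}     # p(Z=1 | X), keyed by x (arbitrary numbers)
p_Y = {0: 0.4, 1: 0.6}     # p(Y=1 | Z), keyed by z (arbitrary numbers)

def bern(p_one, v):
    return p_one if v == 1 else 1.0 - p_one

def joint(x, z, y):
    """p(x, z, y) = p(x) p(z|x) p(y|z), as given by the chain X -> Z -> Y."""
    return p_X[x] * bern(p_Z[x], z) * bern(p_Y[z], y)

for x, z in product([0, 1], repeat=2):
    p_xz = sum(joint(x, z, y) for y in [0, 1])
    p_y_given_xz = joint(x, z, 1) / p_xz
    p_z = sum(joint(xp, z, yp) for xp in [0, 1] for yp in [0, 1])
    p_y_given_z = sum(joint(xp, z, 1) for xp in [0, 1]) / p_z
    print(x, z, round(p_y_given_xz, 6), round(p_y_given_z, 6))  # equal in every row
```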

The most interesting case: V-structure

Now consider X → Z ← Y. In this case, the missing edge between X and Y is a statement of marginal independence: X ⊥ Y. However, once we know the value of Z, X and Y might depend on each other. E.g., suppose X and Y are independent coin flips, and Z is true if and only if both X and Y come up heads. Note that in this case, X is not independent of Y given Z! This is the case of explaining away.

Bayes ball algorithm

Suppose we want to decide whether X ⊥ Y | Z for a general Bayes net with corresponding graph G:

- We shade all nodes in the evidence set Z.
- We put balls in all the nodes in X, and we let them bounce around the graph according to rules inspired by the three base cases. Note that the balls can go in any direction along an edge!
- If any ball reaches any node in Y, then the conditional independence assertion is not true.

Base rules

- Head-to-tail (chain): middle node unknown, path unblocked; middle node known (shaded), path blocked.
- Tail-to-tail (common cause): middle node unknown, path unblocked; middle node known, path blocked.
- Head-to-head (v-structure): middle node unknown, path BLOCKED; middle node known, path UNBLOCKED.

d-separation

Suppose we want to show that a conditional independence relation X ⊥ Y | Z is implied by a DAG G, in which X, Y, Z are non-intersecting sets of nodes. A path is said to be blocked if it includes a node such that either:
1. the arrows in the path do not meet head-to-head at the node, and the node is in the conditioning set Z (this covers the head-to-tail and tail-to-tail cases), or
2. the arrows do meet head-to-head at the node, and neither the node nor its descendants are in Z.

If, given the conditioning set Z, all paths from any node in X to any node in Y are blocked, then X is d-separated (directed-separated) from Y given Z.
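Before moving on, here is a small numeric check of the coin-flip explaining-away example from the V-structure discussion above (an illustrative sketch; 1 encodes heads/true): X and Y are marginally independent, but become dependent once Z is observed.

```python
from itertools import product

def joint(x, y, z):
    """p(x, y, z) = p(x) p(y) p(z | x, y), with Z deterministically X AND Y."""
    return 0.25 if z == (x & y) else 0.0

def prob(event):
    """Probability of an event, given as a predicate over (x, y, z)."""
    return sum(joint(x, y, z) for x, y, z in product([0, 1], repeat=3)
               if event(x, y, z))

# Marginally, X and Y are independent:
print(prob(lambda x, y, z: x == 1 and y == 1))   # 0.25 = 0.5 * 0.5

# But given Z = 0 they become dependent: once we also learn Y = 1, X = 1 is ruled out.
p_x1_given_z0 = prob(lambda x, y, z: x == 1 and z == 0) / prob(lambda x, y, z: z == 0)
p_x1_given_z0_y1 = (prob(lambda x, y, z: x == 1 and y == 1 and z == 0)
                    / prob(lambda x, y, z: y == 1 and z == 0))
print(p_x1_given_z0)      # 1/3
print(p_x1_given_z0_y1)   # 0.0
```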

Example: The alarm network

Using the Bayes ball algorithm (or d-separation directly), we can check whether a particular conditional independence statement holds in the alarm network, given evidence on some of the variables.

Important results

Soundness: If a joint distribution p factorizes according to a DAG G, and if X, Y and Z are subsets of nodes such that Z d-separates X and Y in G, then p satisfies X ⊥ Y | Z. This property allows us to prove the reverse direction of the factorization theorem.

Completeness: If Z does not d-separate X and Y in a DAG G, then there exists at least one distribution p which factorizes over G and in which X is not independent of Y given Z.
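To tie the d-separation rules to the alarm network example, here is a sketch of a Bayes-ball-style reachability test (an illustrative implementation, not code from the course; the graph encoding and function names are my own). It reproduces the familiar facts that B and R are marginally independent but become dependent once C is observed.

```python
from collections import deque

def ancestors_of(graph, nodes):
    """Return the given nodes together with all of their ancestors."""
    parents = {v: set() for v in graph}
    for u, children in graph.items():
        for c in children:
            parents[c].add(u)
    anc, frontier = set(nodes), deque(nodes)
    while frontier:
        v = frontier.popleft()
        for p in parents[v]:
            if p not in anc:
                anc.add(p)
                frontier.append(p)
    return anc

def d_separated(graph, X, Y, Z):
    """True iff Z d-separates X from Y in the DAG `graph` (node -> list of children).
    Assumes X, Y, Z are non-intersecting sets of nodes."""
    parents = {v: set() for v in graph}
    for u, children in graph.items():
        for c in children:
            parents[c].add(u)
    Z = set(Z)
    anc_z = ancestors_of(graph, Z)           # Z together with its ancestors
    frontier = deque((x, 'up') for x in X)   # 'up': arrived from a child (or a start node)
    visited = set(frontier)
    while frontier:
        v, direction = frontier.popleft()
        if v in Y:
            return False                     # a ball reached Y: not d-separated
        moves = []
        if direction == 'up' and v not in Z:
            # head-to-tail or tail-to-tail through v: open only if v is unobserved
            moves += [(p, 'up') for p in parents[v]]
            moves += [(c, 'down') for c in graph[v]]
        elif direction == 'down':
            if v not in Z:                   # head-to-tail through v
                moves += [(c, 'down') for c in graph[v]]
            if v in anc_z:                   # head-to-head at v: open if v or a descendant is observed
                moves += [(p, 'up') for p in parents[v]]
        for state in moves:
            if state not in visited:
                visited.add(state)
                frontier.append(state)
    return True

# The alarm network: B -> A, E -> A, E -> R, A -> C.
alarm = {'B': ['A'], 'E': ['A', 'R'], 'A': ['C'], 'R': [], 'C': []}
print(d_separated(alarm, {'B'}, {'R'}, set()))   # True:  B and R are marginally independent
print(d_separated(alarm, {'B'}, {'R'}, {'C'}))   # False: observing C activates B -> A <- E -> R
```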