Bayesian Networks Representation

Bayesian Networks Representation. Machine Learning 10-701/15-781, Carlos Guestrin, Carnegie Mellon University, March 19th, 2007.

Handwriting recognition. Character recognition, e.g., kernel SVMs. [figure: handwritten character samples]

Webpage classification. Company home page vs. personal home page vs. university home page vs. ...

Handwriting recognition 2

Webpage classification 2

Today: Bayesian networks. One of the most exciting advancements in statistical AI in the last 10-15 years. Generalizes the naïve Bayes and logistic regression classifiers. A compact representation for exponentially-large probability distributions. Exploits conditional independencies.

Causal structure. Suppose we know the following: the flu causes sinus inflammation; allergies cause sinus inflammation; sinus inflammation causes a runny nose; sinus inflammation causes headaches. How are these connected?

Possible queries. [BN: Flu, Allergy, Sinus, Headache, Nose] Inference; most probable explanation; active data collection.

Car starts BN: 18 binary attributes. Inference: P(BatteryAge | Starts = f) sums over 2^16 terms. Why so fast? Not impressed? The HailFinder BN has more than 3^54 = 58,149,737,003,040,059,690,390,169 terms.

Announcements. Welcome back! One-page project proposal due Wednesday; individual or groups of two; must be something related to ML! It will be great if it's related to your research, but it must be something you started this semester. Midway progress report: 5 pages, NIPS format, due April 16th, worth 20%. Poster presentation: May 4, 2-5pm in the NSH Atrium, worth 20%. Final report: 8 pages, NIPS format, due May 10th, worth 60%. It will be fun!!!

Factored joint distribution - Preview. [BN: Flu, Allergy, Sinus, Headache, Nose] P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S).

Number of parameters. [BN: Flu, Allergy, Sinus, Headache, Nose] With all variables binary: 1 + 1 + 4 + 2 + 2 = 10 parameters, instead of 2^5 - 1 = 31 for the full joint table.

Key: independence assumptions. [BN: Flu, Allergy, Sinus, Headache, Nose] Knowing Sinus separates the variables from each other.

(Marginal) Independence. Flu and Allergy are (marginally) independent. [tables: P(Flu) over Flu = t/f and P(Allergy) over Allergy = t/f; more generally, the joint table over Flu × Allergy is the product of the two marginal tables]

Marginally independent random variables. Sets of variables X, Y. X is independent of Y if P ⊨ (X = x ⊥ Y = y) for all x ∈ Val(X), y ∈ Val(Y). Shorthand: marginal independence, P ⊨ (X ⊥ Y). Proposition: P satisfies (X ⊥ Y) if and only if P(X, Y) = P(X) P(Y).

Conditional independence. Flu and Headache are not (marginally) independent. Flu and Headache are independent given a Sinus infection. More generally:

Conditionally independent random variables. Sets of variables X, Y, Z. X is independent of Y given Z if P ⊨ (X = x ⊥ Y = y | Z = z) for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z). Shorthand: conditional independence, P ⊨ (X ⊥ Y | Z). For P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y). Proposition: P satisfies (X ⊥ Y | Z) if and only if P(X, Y | Z) = P(X | Z) P(Y | Z).
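
To make the proposition concrete, here is a minimal Python sketch (not from the slides; all CPT numbers are invented) that checks (X ⊥ Y | Z) numerically for three binary variables by comparing P(X, Y | Z) against P(X | Z) P(Y | Z). Marginal independence is the special case with an empty conditioning set.

```python
from itertools import product

VALS = ("t", "f")

def is_cond_indep(p, tol=1e-9):
    """Check (X ⊥ Y | Z) for a joint p[(x, y, z)] over three binary
    variables by testing P(x, y | z) == P(x | z) * P(y | z) everywhere."""
    for z in VALS:
        pz = sum(p[(x, y, z)] for x, y in product(VALS, VALS))
        if pz == 0:
            continue  # conditioning event has probability zero
        for x, y in product(VALS, VALS):
            pxz = sum(p[(x, y2, z)] for y2 in VALS)  # P(x, z)
            pyz = sum(p[(x2, y, z)] for x2 in VALS)  # P(y, z)
            if abs(p[(x, y, z)] / pz - (pxz / pz) * (pyz / pz)) > tol:
                return False
    return True

def bern(p_true, v):
    return p_true if v == "t" else 1 - p_true

# Joint over (Flu, Headache, Sinus) built as P(F) P(S|F) P(H|S):
# conditionally independent by construction.
p = {(f, h, s): bern(0.1, f) * bern({"t": 0.7, "f": 0.15}[f], s)
                * bern({"t": 0.7, "f": 0.1}[s], h)
     for f, h, s in product(VALS, repeat=3)}
print(is_cond_indep(p))  # True: Flu ⊥ Headache | Sinus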

Properties of independence.
Symmetry: (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z).
Decomposition: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z).
Weak union: (X ⊥ Y, W | Z) ⇒ (X ⊥ Y | Z, W).
Contraction: (X ⊥ W | Y, Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y, W | Z).
Intersection: (X ⊥ Y | W, Z) & (X ⊥ W | Y, Z) ⇒ (X ⊥ Y, W | Z); holds only for positive distributions, i.e., P(α) > 0 for all α ≠ ∅.

The independence assumption. [BN: Flu, Allergy, Sinus, Headache, Nose] Local Markov Assumption: a variable X is independent of its non-descendants given its parents.

Explaining away. Local Markov Assumption: a variable X is independent of its non-descendants given its parents. [BN: Flu, Allergy, Sinus, Headache, Nose] Note that once Sinus is observed, Flu and Allergy become dependent: learning that the patient has the flu "explains away" the sinus inflammation, lowering the probability of an allergy.

Naïve Bayes revisited. Local Markov Assumption: a variable X is independent of its non-descendants given its parents.

What about probabilities? Conditional probability tables (CPTs). [BN: Flu, Allergy, Sinus, Headache, Nose, with one CPT per variable]

Joint distribution. [BN: Flu, Allergy, Sinus, Headache, Nose] P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S). Why can we decompose? The Markov assumption!

The chain rule of probabilities: P(A, B) = P(A) P(B | A), e.g., P(Flu, Sinus) = P(Flu) P(Sinus | Flu). More generally: P(X_1, ..., X_n) = P(X_1) P(X_2 | X_1) ··· P(X_n | X_1, ..., X_{n-1}).

Chain rule & joint distribution. [BN: Flu, Allergy, Sinus, Headache, Nose] Local Markov Assumption: a variable X is independent of its non-descendants given its parents. Applying the chain rule and then the local Markov assumption: P(F, A, S, H, N) = P(F) P(A | F) P(S | F, A) P(H | F, A, S) P(N | F, A, S, H) = P(F) P(A) P(S | F, A) P(H | S) P(N | S), as shown in the sketch below.
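
As a concrete sketch (all CPT numbers are invented for illustration, not from the lecture), the factored joint for the running example can be computed directly from the five CPTs:

```python
def bern(p_true, v):
    """P(V = v) for a binary variable with P(V = t) = p_true."""
    return p_true if v == "t" else 1 - p_true

# Invented CPTs: P(F), P(A), P(S | F, A), P(H | S), P(N | S).
P_SINUS = {("t", "t"): 0.9, ("t", "f"): 0.6, ("f", "t"): 0.5, ("f", "f"): 0.05}

def joint(f, a, s, h, n):
    """P(F, A, S, H, N) = P(F) P(A) P(S | F, A) P(H | S) P(N | S)."""
    return (bern(0.1, f) * bern(0.2, a) * bern(P_SINUS[(f, a)], s)
            * bern({"t": 0.7, "f": 0.1}[s], h)
            * bern({"t": 0.8, "f": 0.05}[s], n))

# 1 + 1 + 4 + 2 + 2 = 10 numbers specify the whole 2^5-entry joint table.
print(joint("t", "f", "t", "t", "f"))  # 0.1 * 0.8 * 0.6 * 0.7 * 0.2 = 0.00672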

Two (trivial) special cases: the edgeless graph (all variables independent, joint = product of marginals) and the fully-connected graph (no independence assumptions, plain chain rule).

The Representation Theorem, joint distribution to BN. A BN encodes independence assumptions. If the conditional independencies in the BN are a subset of the conditional independencies in P, we obtain the joint probability distribution: P(X_1, ..., X_n) = ∏_i P(X_i | Pa_{X_i}).

Real Bayesian network applications: diagnosis of lymph node disease; speech recognition; Microsoft Office and Windows (http://www.research.microsoft.com/research/dtg/); studying the human genome; robot mapping; robots that identify meteorites to study; modeling fMRI data; anomaly detection; fault diagnosis; modeling sensor network data.

A general Bayes net: a set of random variables; a directed acyclic graph (encodes independence assumptions); CPTs. Joint distribution: P(X_1, ..., X_n) = ∏_{i=1}^n P(X_i | Pa_{X_i}).

How many parameters in a BN? Discrete variables X_1, ..., X_n. The graph defines the parents of X_i, Pa_{X_i}. CPTs: P(X_i | Pa_{X_i}). If X_i takes k_i values and its parents jointly take m_i configurations, its CPT has (k_i - 1) · m_i free parameters.
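
A quick sketch of this count (my own illustration, applied to the running example's structure):

```python
def num_bn_params(arity, parents):
    """Free parameters: each variable contributes (k - 1) probabilities
    per joint configuration of its parents."""
    total = 0
    for var, k in arity.items():
        m = 1
        for p in parents.get(var, []):
            m *= arity[p]
        total += (k - 1) * m
    return total

arity = {"Flu": 2, "Allergy": 2, "Sinus": 2, "Headache": 2, "Nose": 2}
parents = {"Sinus": ["Flu", "Allergy"], "Headache": ["Sinus"], "Nose": ["Sinus"]}
print(num_bn_params(arity, parents))  # 1 + 1 + 4 + 2 + 2 = 10
print(2 ** 5 - 1)                     # 31 for the unfactored joint table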

Another example. Variables: B (burglar), E (earthquake), A (burglar alarm), N (neighbor calls), R (radio report). Both burglars and earthquakes can set off the alarm. If the alarm sounds, a neighbor may call. An earthquake may be announced on the radio.

Another example: building the BN. B (burglar), E (earthquake), A (burglar alarm), N (neighbor calls), R (radio report). Edges: B → A, E → A, A → N, E → R.

Independencies encoded in a BN. We said all you need is the local Markov assumption: (X_i ⊥ NonDescendants_{X_i} | Pa_{X_i}). But then we talked about other (in)dependencies, e.g., explaining away. What are the independencies encoded by a BN? The only assumption is local Markov, but many others can be derived using the algebra of conditional independencies!

Understanding independencies in BNs: BNs with 3 nodes. Local Markov Assumption: a variable X is independent of its non-descendants given its parents. Indirect causal effect: X → Z → Y. Indirect evidential effect: X ← Z ← Y. Common cause: X ← Z → Y. Common effect (v-structure): X → Z ← Y.

Understanding independencies in BNs: some examples. [example DAG over nodes A-K]

An active trail: example. [DAG over A-H with some nodes observed] When are A and H independent?

Active trails formalized. A path X_1 - X_2 - ... - X_k is an active trail when variables O ⊆ {X_1, ..., X_n} are observed if for each consecutive triplet in the trail:
X_{i-1} → X_i → X_{i+1}, and X_i is not observed (X_i ∉ O); or
X_{i-1} ← X_i ← X_{i+1}, and X_i is not observed (X_i ∉ O); or
X_{i-1} ← X_i → X_{i+1}, and X_i is not observed (X_i ∉ O); or
X_{i-1} → X_i ← X_{i+1}, and X_i is observed (X_i ∈ O), or one of its descendants is.

Active trails and independence? Theorem: variables X_i and X_j are independent given Z ⊆ {X_1, ..., X_n} if there is no active trail between X_i and X_j when the variables in Z are observed. [example DAG over nodes A-K]
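
This check can be run mechanically. Below is a sketch of the standard reachability procedure (the "Bayes ball" algorithm, essentially Algorithm 3.1 in Koller & Friedman; the implementation details here are my own), applied to the running example:

```python
from collections import defaultdict

def d_separated(edges, x, y, observed):
    """True iff x and y are d-separated given `observed`, i.e. no active
    trail connects them. edges: (parent, child) pairs of a DAG."""
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    z = set(observed)

    # Phase 1: z together with all its ancestors (needed for the collider
    # rule: X -> Z <- Y is active iff Z or a descendant of Z is observed).
    anc, frontier = set(), set(z)
    while frontier:
        node = frontier.pop()
        if node not in anc:
            anc.add(node)
            frontier |= parents[node]

    # Phase 2: search over (node, direction) pairs along active trails.
    # 'up' = trail may continue toward parents, 'down' = toward children.
    visited, reachable = set(), set()
    agenda = [(x, "up")]
    while agenda:
        node, d = agenda.pop()
        if (node, d) in visited:
            continue
        visited.add((node, d))
        if node not in z:
            reachable.add(node)
        if d == "up" and node not in z:
            agenda += [(p, "up") for p in parents[node]]
            agenda += [(c, "down") for c in children[node]]
        elif d == "down":
            if node not in z:                  # chain through unobserved node
                agenda += [(c, "down") for c in children[node]]
            if node in anc:                    # active collider
                agenda += [(p, "up") for p in parents[node]]
    return y not in reachable

# Running example: Flu -> Sinus <- Allergy, Sinus -> Headache, Sinus -> Nose.
edges = [("Flu", "Sinus"), ("Allergy", "Sinus"),
         ("Sinus", "Headache"), ("Sinus", "Nose")]
print(d_separated(edges, "Flu", "Headache", {"Sinus"}))  # True
print(d_separated(edges, "Flu", "Allergy", set()))       # True (marginal indep.)
print(d_separated(edges, "Flu", "Allergy", {"Sinus"}))   # False (explaining away)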

The BN Representation Theorem. (1) If the conditional independencies in the BN are a subset of the conditional independencies in P, then we obtain the joint probability distribution P(X_1, ..., X_n) = ∏_i P(X_i | Pa_{X_i}). Important because: every P has at least one BN structure G. (2) If the joint probability distribution factorizes as P(X_1, ..., X_n) = ∏_i P(X_i | Pa_{X_i}), then the conditional independencies in the BN are a subset of the conditional independencies in P. Important because: we can read independencies of P from the BN structure G.

Simpler BNs. A distribution can be represented by many BNs; a simpler BN requires fewer parameters.

Learning Bayes nets. Two dimensions: known vs. unknown structure, and fully observable vs. missing data. From data x^(1), ..., x^(m), learn the structure and the CPT parameters P(X_i | Pa_{X_i}).

Learning the CPTs. Given fully observed data x^(1), ..., x^(m), for each discrete variable X_i use the count-based estimate P(x_i | pa_{X_i}) = Count(x_i, pa_{X_i}) / Count(pa_{X_i}).
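
A minimal sketch of this maximum-likelihood estimate, with a tiny invented dataset (not from the lecture):

```python
from collections import Counter

def mle_cpt(samples, child, parents):
    """Maximum-likelihood CPT P(child | parents) from fully observed data.
    samples: list of dicts mapping variable name -> value."""
    joint, marg = Counter(), Counter()
    for s in samples:
        pa = tuple(s[p] for p in parents)
        joint[(pa, s[child])] += 1
        marg[pa] += 1
    return {key: n / marg[key[0]] for key, n in joint.items()}

# Tiny invented dataset over the running example's variables.
data = [{"Flu": "t", "Allergy": "f", "Sinus": "t"},
        {"Flu": "f", "Allergy": "f", "Sinus": "f"},
        {"Flu": "t", "Allergy": "f", "Sinus": "f"},
        {"Flu": "f", "Allergy": "f", "Sinus": "f"}]
print(mle_cpt(data, "Sinus", ["Flu", "Allergy"]))
# {(('t','f'),'t'): 0.5, (('f','f'),'f'): 1.0, (('t','f'),'f'): 0.5}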

Queries in Bayes nets. Given a BN, find: the probability of X given some evidence, P(X | e); the most probable explanation, max_{x_1, ..., x_n} P(x_1, ..., x_n | e); the most informative query. Learn more about these next class.
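
Inference itself is next class, but as a preview, a brute-force sketch: P(X | e) can be computed by summing the factored joint over every assignment consistent with the evidence (reusing the invented CPTs from the earlier sketch; fine for 5 binary variables, hopeless for 18):

```python
from itertools import product

def bern(p_true, v):
    return p_true if v == "t" else 1 - p_true

P_SINUS = {("t", "t"): 0.9, ("t", "f"): 0.6, ("f", "t"): 0.5, ("f", "f"): 0.05}

def joint(f, a, s, h, n):
    return (bern(0.1, f) * bern(0.2, a) * bern(P_SINUS[(f, a)], s)
            * bern({"t": 0.7, "f": 0.1}[s], h)
            * bern({"t": 0.8, "f": 0.05}[s], n))

def query(target, evidence):
    """P(target | evidence) by enumeration: sum the joint over all 2^5
    assignments consistent with the evidence, then normalize."""
    names = ("Flu", "Allergy", "Sinus", "Headache", "Nose")
    totals = {"t": 0.0, "f": 0.0}
    for assign in product("tf", repeat=5):
        a = dict(zip(names, assign))
        if all(a[k] == v for k, v in evidence.items()):
            totals[a[target]] += joint(*assign)
    z = totals["t"] + totals["f"]
    return {v: p / z for v, p in totals.items()}

print(query("Flu", {"Nose": "t"}))  # P(Flu | Nose = t)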

What you need to know. Bayesian networks: a compact representation for large probability distributions, not an algorithm. Semantics of a BN: conditional independence assumptions. Representation: variables, graph, CPTs. Why BNs are useful. Learning CPTs from fully observable data. Play with the applet!!!

Acknowledgements. JavaBayes applet: http://www.pmr.poli.usp.br/ltd/software/javabayes/home/index.html