Probabilistic Graphical Models
Lecture 4: Learning Bayesian Networks
CS/CNS/EE 155, Andreas Krause

Announcements
Another TA: Hongchao Zhou.
Please fill out the questionnaire about recitations.
Homework 1 is out; due in class Wed Oct 21.
Project proposals due Monday Oct 19.

Representing the world using BNs
[Figure: true distribution P* with conditional independencies I(P*), represented by a Bayes net (G, P) with independencies I(P).]
Want to make sure that I(P) ⊆ I(P*).
Need to understand the conditional independence properties of the BN (G, P).

Factorization Theorem
[Figure: example DAG over nodes s1, ..., s12.]
If I_loc(G) ⊆ I(P), then G is an I-map of P (independence map), and the true distribution P can be represented exactly as a Bayesian network (G, P):
P(X_1, ..., X_n) = ∏_i P(X_i | Pa_{X_i}).

Additional conditional independencies
A BN specifies the joint distribution through a conditional parameterization that satisfies the local Markov property:
I_loc(G) = {(X_i ⊥ NonDescendants_{X_i} | Pa_{X_i})}
But we also talked about additional properties of conditional independence: weak union, intersection, contraction, ...
Which additional CI statements does a particular BN specify? All CI statements that can be derived through algebraic operations, but proving CI this way is very cumbersome!
Is there an easy way to find all independences of a BN just by looking at its graph?

Examples
[Figure: example DAG over nodes A through J for practicing independence reasoning.]

Active trails
An undirected path in BN structure G is called an active trail for observed variables O ⊆ {X_1, ..., X_n} if for every consecutive triple of variables X, Y, Z on the path:
X → Y → Z and Y is unobserved (Y ∉ O), or
X ← Y ← Z and Y is unobserved (Y ∉ O), or
X ← Y → Z and Y is unobserved (Y ∉ O), or
X → Y ← Z and Y or any of Y's descendants is observed.
Any variables X_i and X_j for which there is no active trail given observations O are called d-separated by O; we write d-sep(X_i; X_j | O).
Sets A and B are d-separated given O if d-sep(X; Y | O) for all X ∈ A, Y ∈ B; write d-sep(A; B | O).

Soundness of d-separation
Have seen: P factorizes according to G ⟺ I_loc(G) ⊆ I(P).
Define I(G) = {(X ⊥ Y | Z) : d-sep_G(X; Y | Z)}.
Theorem (soundness of d-separation): if P factorizes over G, then I(G) ⊆ I(P).
Hence, d-separation captures only true independences. How about I(G) = I(P)?

Completeness of d-separation
Theorem: For almost all distributions P that factorize over G, it holds that I(G) = I(P).
"Almost all": except for a set of distributions with measure 0 (assuming only that no finite set of distributions has measure > 0).

Algorithm for d-separation
How can we check if X ⊥ Y | Z?
Idea: check every possible path connecting X and Y and verify the conditions. But there are exponentially many paths!
Linear-time algorithm to find all nodes reachable from X:
1. Mark Z and its ancestors.
2. Do breadth-first search starting from X; stop if the path is blocked.
Have to be careful with implementation details (see reading).
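As a concrete illustration, here is a minimal Python sketch of this two-phase procedure (my own rendering of the reachability algorithm from the reading, not code from the lecture; the graph encoding and names are assumptions):

```python
from collections import deque

def d_separated(parents, x, y, observed):
    """Return True if x and y are d-separated given `observed`.

    parents: dict mapping every node to the set of its parents.
    """
    children = {n: set() for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].add(n)

    # Phase 1: mark observed nodes and their ancestors
    # (needed to decide when a v-structure is activated).
    ancestors_of_obs = set()
    stack = list(observed)
    while stack:
        n = stack.pop()
        if n not in ancestors_of_obs:
            ancestors_of_obs.add(n)
            stack.extend(parents[n])

    # Phase 2: BFS over (node, direction) pairs; direction records
    # whether we entered the node from a child ("up") or a parent ("down").
    visited = set()
    queue = deque([(x, "up")])
    while queue:
        node, direction = queue.popleft()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed and node == y:
            return False  # found an active trail from x to y
        if direction == "up" and node not in observed:
            # trail continues upward to parents and downward to children
            for p in parents[node]:
                queue.append((p, "up"))
            for c in children[node]:
                queue.append((c, "down"))
        elif direction == "down":
            if node not in observed:
                # chain / fork: continue downward to children
                for c in children[node]:
                    queue.append((c, "down"))
            if node in ancestors_of_obs:
                # v-structure activated: node has an observed descendant
                for p in parents[node]:
                    queue.append((p, "up"))
    return True

# Classic v-structure: Rain -> WetGrass <- Sprinkler
pa = {"Rain": set(), "Sprinkler": set(), "WetGrass": {"Rain", "Sprinkler"}}
print(d_separated(pa, "Rain", "Sprinkler", set()))         # True
print(d_separated(pa, "Rain", "Sprinkler", {"WetGrass"}))  # False
```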

Representing the world using BNs
[Figure: true distribution P* with conditional independencies I(P*), represented by a Bayes net (G, P) with independencies I(P).]
Want to make sure that I(P) ⊆ I(P*).
Ideally: I(P) = I(P*). Want a BN that exactly captures the independencies in P*!

Minimal I-map
Graph G is called a minimal I-map if it is an I-map, and if any edge is deleted it is no longer an I-map.

Uniqueness of minimal I-maps
Is the minimal I-map unique?
[Figure: three different graphs over the variables E, B, A, J, M.]

Perfect maps
Minimal I-maps are easy to find, but can contain many unnecessary dependencies.
A BN structure G is called a P-map (perfect map) for distribution P if I(G) = I(P).
Does every distribution P have a P-map?

I-equivalence
Two graphs G, G′ are called I-equivalent if I(G) = I(G′).
I-equivalence partitions graphs into equivalence classes.

Skeletons of BNs
[Figure: two I-equivalent DAGs over nodes A through J with the same underlying undirected graph.]
The skeleton of a BN is the undirected graph obtained by dropping edge directions. I-equivalent BNs must have the same skeleton.

Immoralities and I-equivalence
A v-structure X → Y ← Z is called immoral if there is no edge between X and Z ("unmarried parents").
Theorem: I(G) = I(G′) ⟺ G and G′ have the same skeleton and the same immoralities.

Today: Learning BNs from data
Want the P-map, if one exists. Need to find:
the skeleton,
the immoralities.

Identifying the skeleton
When is there an edge between X and Y? When is there no edge between X and Y?
(If P has a P-map G, then X and Y are adjacent in G exactly when there is no subset U of the remaining variables with X ⊥ Y | U.)

Algorithm for identifying the skeleton
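A hedged sketch of this step (my own illustration, assuming a hypothetical independence oracle `indep(x, y, u)`, in practice a statistical test on data, and a bound d on the conditioning-set size): keep the edge X–Y unless some small witness set separates X and Y, and record the witness for later use.

```python
from itertools import combinations

def identify_skeleton(variables, indep, d):
    """Return (edges, sepset): the skeleton and a separating set
    for each non-adjacent pair, using an independence oracle."""
    edges = set()
    sepset = {}
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        witness = None
        for k in range(d + 1):
            for u in combinations(rest, k):
                if indep(x, y, set(u)):
                    witness = set(u)
                    break
            if witness is not None:
                break
        if witness is None:
            edges.add(frozenset((x, y)))   # no separating set: keep edge
        else:
            sepset[frozenset((x, y))] = witness
    return edges, sepset
```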

Identifying immoralities
When is X – Z – Y in the skeleton an immorality X → Z ← Y?
It is immoral ⟺ for all U with Z ∈ U: ¬(X ⊥ Y | U).
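Continuing the sketch above (again my own rendering, not lecture code): given the skeleton and the separating sets recorded during skeleton identification, a triple X – Z – Y with X and Y non-adjacent is oriented as an immorality exactly when Z does not appear in the separating set for X and Y.

```python
from itertools import combinations

def find_immoralities(edges, sepset):
    """Orient X -> Z <- Y whenever X-Z-Y is in the skeleton,
    X and Y are non-adjacent, and Z is not in their separating set."""
    nodes = {v for e in edges for v in e}
    immoralities = []
    for z in sorted(nodes):
        nbrs = sorted(v for v in nodes if frozenset((v, z)) in edges)
        for x, y in combinations(nbrs, 2):
            pair = frozenset((x, y))
            if pair not in edges and z not in sepset[pair]:
                immoralities.append((x, z, y))   # X -> Z <- Y
    return immoralities
```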

From skeleton & immoralities to BN structures
Represent the I-equivalence class as a partially directed acyclic graph (PDAG).
How do I convert a PDAG into a BN?

Testing independence
So far, we assumed that we know I(P*), i.e., all independencies associated with the true distribution P*.
Often, we have access to P* only through sampled data (e.g., sensor measurements).
Given variables X, Y, Z, we want to test whether X ⊥ Y | Z.
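One simple way to implement such a test (my assumption of a reasonable choice, not the lecture's prescription) is to threshold the empirical conditional mutual information Î(X; Y | Z) computed from counts:

```python
from collections import Counter
from math import log

def conditional_mi(samples):
    """Empirical I(X; Y | Z) from samples given as (x, y, z) tuples."""
    m = len(samples)
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for x, y, z in samples:
        n_xyz[(x, y, z)] += 1
        n_xz[(x, z)] += 1
        n_yz[(y, z)] += 1
        n_z[z] += 1
    mi = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x,y,z) * log[ p(x,y|z) / (p(x|z) p(y|z)) ], in count form
        mi += (c / m) * log(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
    return mi

def looks_independent(samples, threshold=0.01):
    # declare X ⊥ Y | Z when the empirical CMI is near zero;
    # the threshold is a tuning choice (a proper statistical test is better)
    return conditional_mi(samples) < threshold
```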

Next topic: Learning BNs from data
Two main parts:
learning structure (conditional independencies),
learning parameters (CPDs).

Parameter learning
Suppose X is Bernoulli (a coin flip) with unknown parameter P(X = H) = θ.
Given training data D = {x^(1), ..., x^(m)} (e.g., H H T H H H T T H T H H H ...), how do we estimate θ?

Maximum likelihood estimation
Given: data set D.
Hypothesis: data generated i.i.d. from a Bernoulli (coin-flip) distribution with P(X = H) = θ.
Optimize for the θ which makes D most likely: θ̂ = argmax_θ P(D | θ).

Solving the optimization problem
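This slide is worked out on the board; the standard derivation (reconstructed, not verbatim from the slides) goes as follows. With m_H heads and m_T tails in D:

L(θ) = P(D | θ) = θ^{m_H} (1 − θ)^{m_T}
log L(θ) = m_H log θ + m_T log(1 − θ)
d/dθ log L(θ) = m_H/θ − m_T/(1 − θ) = 0 ⟹ θ̂ = m_H / (m_H + m_T)

So the MLE is simply the empirical fraction of heads in the data.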

Learning general BNs
[Table: known vs. unknown structure × fully observable vs. missing data; filled in on a later slide.]

Estimating CPDs
Given data D = {(x_1, y_1), ..., (x_n, y_n)} of samples from X, Y, we want to estimate P(X | Y).

MLE for Bayes nets

Algorithm for BN MLE
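A minimal sketch of that algorithm (my own illustration; the data encoding is an assumption): for a fully observed discrete BN, the MLE of each CPT entry is just a normalized count.

```python
from collections import Counter

def mle_cpts(parents, data):
    """parents: node -> tuple of parent nodes;
    data: list of dicts mapping node -> observed value.
    Returns node -> {(value, parent_values): MLE probability}."""
    cpts = {}
    for x, pa in parents.items():
        joint, marg = Counter(), Counter()
        for sample in data:
            u = tuple(sample[p] for p in pa)
            joint[(sample[x], u)] += 1   # count of (x-value, parent values)
            marg[u] += 1                 # count of parent values alone
        # theta_{x|u} = count(x, u) / count(u)
        cpts[x] = {(v, u): c / marg[u] for (v, u), c in joint.items()}
    return cpts
```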

Learning general BNs
Known structure, fully observable: easy!
Unknown structure, fully observable: ???
Known structure, missing data: hard (EM).
Unknown structure, missing data: very hard (later).

Structure learning
Two main classes of approaches:
Constraint-based: search for a P-map (if one exists): identify the PDAG, then turn the PDAG into a BN (using the algorithm in the reading). Key problem: performing the independence tests.
Optimization-based: define a scoring function (e.g., likelihood of the data) and think of the structure as parameters. More common; can solve simple cases exactly.

MLE for structure learning
For a fixed structure, we can compute the likelihood of the data.

Decomposable score
The log-likelihood of the data under the MLE parameters decomposes over the families of the BN (a node and its parents):
Score(G; D) = ∑_i FamScore(X_i | Pa_i; D)
We can exploit this for computational efficiency!
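A worked form of the family score (standard MLE-score algebra, reconstructed rather than quoted from the slides): with M[x_i, u] counting how often X_i = x_i co-occurs with the parent assignment Pa_i = u in D,

FamScore(X_i | Pa_i; D) = ∑_{x_i, u} M[x_i, u] log ( M[x_i, u] / M[u] )

Equivalently, in terms of empirical mutual information and entropy, Score(G; D) = m ∑_i ( Î(X_i; Pa_i) − Ĥ(X_i) ); only the mutual-information term depends on the structure.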

Finding the optimal MLE structure
Log-likelihood score: want G* = argmax_G Score(G; D).
Lemma: if G ⊆ G′ (G′ contains all edges of G), then Score(G; D) ≤ Score(G′; D).

Finding the optimal MLE structure
The optimal solution for MLE is therefore always the fully connected graph! Non-compact representation; overfitting!
Solutions:
priors over parameters / structures (later),
constrained optimization (e.g., bound the number of parents).

Constrained optimization of BN structures
Theorem: for any fixed bound d ≥ 2 on the number of parents, finding the optimal BN (w.r.t. the MLE score) is NP-hard.
What about d = 1? Then we want to find the optimal tree!

Finding the optimal tree BN
Scoring function; scoring a tree.
Using the decomposable score above: in a tree each non-root node has exactly one parent, so Score(T; D) = m ∑_{(i,j) ∈ T} Î(X_i; X_j) − m ∑_i Ĥ(X_i), and the entropy term does not depend on the structure. Maximizing the score therefore reduces to maximizing the sum of edge mutual informations.

Finding the optimal tree skeleton
Can reduce to the following problem: given a graph G = (V, E) and nonnegative weights w_e for each edge e = (X_i, X_j), in our case w_e = Î(X_i; X_j), find a tree T ⊆ E that maximizes ∑_{e ∈ T} w_e.
This is the maximum spanning tree problem! Can solve in time O(|E| log |E|)!

Chow-Liu algorithm
For each pair X_i, X_j of variables, compute the empirical mutual information Î(X_i; X_j).
Define the complete graph with the weight of edge (X_i, X_j) given by the mutual information.
Find the maximum spanning tree; this gives the skeleton.
Orient the skeleton using breadth-first search.
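A compact end-to-end sketch of these steps (my own illustration, assuming discrete data in a NumPy array and using networkx for the spanning tree; none of these names come from the lecture):

```python
import numpy as np
import networkx as nx

def mutual_information(data, i, j):
    """Empirical mutual information I(X_i; X_j) from counts."""
    m = data.shape[0]
    joint, pi, pj = {}, {}, {}
    for row in data:
        key = (row[i], row[j])
        joint[key] = joint.get(key, 0) + 1
    for (a, b), c in joint.items():
        pi[a] = pi.get(a, 0) + c
        pj[b] = pj.get(b, 0) + c
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / m
        mi += p_ab * np.log(p_ab / ((pi[a] / m) * (pj[b] / m)))
    return mi

def chow_liu_tree(data):
    """Return the Chow-Liu tree as directed (parent, child) edges."""
    n = data.shape[1]
    g = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            g.add_edge(i, j, weight=mutual_information(data, i, j))
    skeleton = nx.maximum_spanning_tree(g)
    # Orient edges away from an arbitrary root via BFS.
    return list(nx.bfs_edges(skeleton, source=0))
```

Each directed edge (i, j) then becomes X_i → X_j in the tree BN, with CPTs estimated by the counting sketch given earlier.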

Generalizing Chow-Liu: tree-augmented Naïve Bayes model [Friedman 97]
If evidence variables are correlated, Naïve Bayes models can be overconfident.
Key idea: learn the optimal tree for the conditional distribution P(X_1, ..., X_n | Y).
Can be done optimally using Chow-Liu (homework!).

Tasks
Subscribe to the mailing list: https://utils.its.caltech.edu/mailman/listinfo/cs155
Select recitation times.
Read Koller & Friedman, Chapters 17.1-17.3, 18.1-18.2, 18.4.1.
Form groups and think about class projects. If you have difficulty finding a group, email Pete Trautman.