Learning P-maps / Parameter Learning


Readings: K&F 3.3, 3.4, 16.1, 16.2, 16.3, 16.4

Learning P-maps / Parameter Learning
Graphical Models 10-708
Carlos Guestrin, Carnegie Mellon University
September 24th, 2008
(c) 2006-2008 Carlos Guestrin

Perfect maps (P-maps)
- I-maps are not unique, and often not simple enough.
- Goal: define the simplest G that is an I-map for P.
- A BN structure G is a perfect map for a distribution P if I(P) = I(G).
- Our goal: find a perfect map! We must deal with equivalent BNs.

Inexistence of P-maps
- Example: XOR (this is a hint for the homework).

Obtaining a P-map
- Given the independence assertions that are true for P, assume that there exists a perfect map G*.
- We want to find G*, but many structures may encode the same independencies as G*. When are we done?
- Find all equivalent structures simultaneously!

I-Equivalence
- Two graphs G1 and G2 are I-equivalent if I(G1) = I(G2).
- This gives equivalence classes of BN structures: a mutually exclusive and exhaustive partition of graphs.
- How do we characterize these equivalence classes?

Skeleton of a BN
- The skeleton of a BN structure G is the undirected graph over the same variables that has an edge X-Y for every X->Y or Y->X in G.
- (Little) Lemma: two I-equivalent BN structures must have the same skeleton.
(Figure: an example DAG over variables A-K and its skeleton.)

What about V-structures?
- V-structures (X -> Z <- Y) are a key property of BN structure.
- Theorem: if G1 and G2 have the same skeleton and the same V-structures, then G1 and G2 are I-equivalent.

Same V-structures not necessary
- The condition above is sufficient, but having the same V-structures is not necessary for I-equivalence.

Immoralities & I-Equivalence
- The key concept is not V-structures but immoralities ("unmarried parents"): X -> Z <- Y with no edge between X and Y.
- Important pattern: X and Y are independent given their parents, but not given Z.
- (If an edge exists between X and Y, the V-structure is "covered".)
- Theorem: G1 and G2 have the same skeleton and the same immoralities if and only if G1 and G2 are I-equivalent.

Obtaining a P-map
Given the independence assertions that are true for P:
1. Obtain the skeleton.
2. Obtain the immoralities.
3. From the skeleton and immoralities, obtain every (and any) BN structure in the equivalence class.
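The I-equivalence theorem above suggests a direct check: compare skeletons and immoralities. A minimal sketch (not from the lecture; the DAG encoding as a child-list dict is an assumption for illustration):

```python
# Sketch: test I-equivalence of two DAGs by comparing skeletons and
# immoralities, per the theorem above. A DAG is a dict var -> list of children.
from itertools import combinations

def skeleton(dag):
    """Undirected edge set: one frozenset {X, Y} per directed edge X -> Y."""
    return {frozenset((x, y)) for x, children in dag.items() for y in children}

def immoralities(dag):
    """All X -> Z <- Y with no edge between X and Y."""
    skel = skeleton(dag)
    parents = {}
    for x, children in dag.items():
        for y in children:
            parents.setdefault(y, set()).add(x)
    result = set()
    for z, pa in parents.items():
        for x, y in combinations(sorted(pa), 2):
            if frozenset((x, y)) not in skel:   # "unmarried parents"
                result.add((frozenset((x, y)), z))
    return result

def i_equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and immoralities(g1) == immoralities(g2)

# X -> Z <- Y (an immorality) vs. the chain X -> Z -> Y:
v_struct = {"X": ["Z"], "Y": ["Z"], "Z": []}
chain = {"X": ["Z"], "Y": [], "Z": ["Y"]}
print(i_equivalent(v_struct, chain))  # False: same skeleton, different immoralities
```

Note that the chain X -> Z -> Y and the fork X <- Z -> Y do come out I-equivalent: same skeleton, and neither has an immorality.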

Identifying the skeleton (1)
- When is there an edge between X and Y?
- When is there no edge between X and Y?

Identifying the skeleton (2)
- Assume d is the maximum number of parents (d could be as large as n).
- For each pair Xi, Xj: initialize Eij = true.
  - For each U subset of X - {Xi, Xj} with |U| <= d: if (Xi _|_ Xj | U), set Eij = false.
  - If Eij is still true, add edge Xi-Xj to the skeleton.

Identifying immoralities
- Consider X-Z-Y in the skeleton; when should it be an immorality?
- It must be X -> Z <- Y (an immorality) when X and Y are never independent given any U with Z in U.
- It must not be X -> Z <- Y (not an immorality) when there exists a U with Z in U such that X and Y are independent given U.
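The skeleton-identification loop above can be sketched directly. This assumes a perfect conditional-independence oracle `indep(Xi, Xj, U)` (hypothetical; in practice it would be a statistical test on data):

```python
# Sketch of the skeleton-identification procedure above, assuming a
# perfect CI oracle indep(Xi, Xj, U) -> bool (hypothetical).
from itertools import chain, combinations

def find_skeleton(variables, indep, d):
    """Add edge Xi-Xj unless some conditioning set U with |U| <= d separates them."""
    edges = set()
    for xi, xj in combinations(variables, 2):
        others = [v for v in variables if v not in (xi, xj)]
        # All subsets U of the remaining variables with |U| <= d.
        subsets = chain.from_iterable(combinations(others, k) for k in range(d + 1))
        if not any(indep(xi, xj, set(u)) for u in subsets):
            edges.add(frozenset((xi, xj)))
    return edges

# Toy oracle for the chain A -> B -> C: A and C are independent given {B}.
def chain_indep(x, y, u):
    return {x, y} == {"A", "C"} and "B" in u

skel = find_skeleton(["A", "B", "C"], chain_indep, d=1)
# skel contains edges A-B and B-C, but no A-C edge.
```

The double loop makes the cost visible: the number of CI queries grows with the number of subsets of size up to d, which is why bounding the number of parents matters.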

From immoralities and skeleton to BN structures
- Represent the BN equivalence class as a partially-directed acyclic graph (PDAG).
- Immoralities force the direction of some of the other BN edges.
- The full (polynomial-time) procedure is described in the reading.

What you need to know
- Minimal I-map: every P has one, but usually many.
- Perfect map: a better choice for the BN structure; not every P has one; we can find one (if it exists) by considering I-equivalence.
- Two structures are I-equivalent if and only if they have the same skeleton and the same immoralities.

Announcements
- Recitation tomorrow. Don't miss it!
- No class on Monday.

Review
- Bayesian networks are a compact representation for probability distributions: an exponential reduction in the number of parameters by exploiting independencies.
- Example network: Flu and Allergy are parents of Sinus; Sinus is a parent of Headache and Nose.
- Next: learn BNs -- parameters, then structure.

Thumbtack -- Binomial Distribution
- P(Heads) = theta, P(Tails) = 1 - theta.
- Flips are i.i.d.: independent events, identically distributed according to a binomial distribution.
- We observe a sequence D with alpha_H heads and alpha_T tails.

Maximum Likelihood Estimation
- Data: observed set D of alpha_H heads and alpha_T tails.
- Hypothesis: binomial distribution.
- Learning theta is an optimization problem. What's the objective function?
- MLE: choose the theta that maximizes the probability of the observed data:
    theta_hat = argmax_theta P(D | theta) = argmax_theta theta^alpha_H * (1 - theta)^alpha_T

Your first learning algorithm
- Take the log-likelihood and set its derivative to zero:
    d/dtheta [alpha_H log theta + alpha_T log(1 - theta)] = 0
    => theta_hat_MLE = alpha_H / (alpha_H + alpha_T)

Learning Bayes nets
- The problem splits along two axes: known vs. unknown structure, and fully observable vs. missing data.
- Data: samples x^(1), ..., x^(m); structure: the graph; parameters: the CPTs P(Xi | Pa_Xi).
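A minimal sketch of the closed form just derived, with a brute-force numerical check that it really is the maximizer:

```python
# The result derived above: theta_hat = alpha_H / (alpha_H + alpha_T).
def mle_binomial(alpha_H, alpha_T):
    """MLE for the thumbtack: fraction of flips that came up heads."""
    return alpha_H / (alpha_H + alpha_T)

# E.g. alpha_H = 3 heads, alpha_T = 2 tails:
theta_hat = mle_binomial(3, 2)
print(theta_hat)  # 0.6

# Numerical check: 0.6 maximizes the likelihood theta^3 * (1 - theta)^2
# over a fine grid of theta values.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: t**3 * (1 - t)**2)
print(abs(best - theta_hat) < 1e-9)  # True
```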

Learning the CPTs
- Data: samples x^(1), ..., x^(m).
- For each discrete variable Xi, estimate the CPT by relative counts:
    theta_hat(xi | pa_i) = Count(xi, pa_i) / Count(pa_i)
- WHY? Because this is the MLE: with fully observed data, the log-likelihood decomposes into a separate term per CPT, and each term is maximized by the empirical conditional frequencies.
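The counting estimator above can be sketched in a few lines. The helper name and the row-of-dicts data format are assumptions for illustration, not from the slides:

```python
# Sketch: MLE of a CPT P(Xi | Pa_Xi) by relative counts from fully
# observed data (data format and helper are illustrative).
from collections import Counter

def mle_cpt(data, child, parents):
    """data: list of dicts var -> value. Returns {(x, pa_tuple): P(x | pa)}."""
    joint = Counter()   # Count(xi, pa_i)
    margin = Counter()  # Count(pa_i)
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(row[child], pa)] += 1
        margin[pa] += 1
    return {(x, pa): n / margin[pa] for (x, pa), n in joint.items()}

data = [
    {"Flu": 1, "Sinus": 1},
    {"Flu": 1, "Sinus": 1},
    {"Flu": 1, "Sinus": 0},
    {"Flu": 0, "Sinus": 0},
]
cpt = mle_cpt(data, child="Sinus", parents=["Flu"])
print(cpt[(1, (1,))])  # P(Sinus=1 | Flu=1) = 2/3
```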

Maximum likelihood estimation (MLE) of BN parameters -- example
- Given the structure (Flu, Allergy -> Sinus -> Nose), the log-likelihood of the data decomposes into one term per variable:
    log P(D | theta) = sum_j [ log P(f^(j)) + log P(a^(j)) + log P(s^(j) | f^(j), a^(j)) + log P(n^(j) | s^(j)) ]
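The per-variable decomposition above can be computed directly by summing one log-CPT term per variable per sample. The CPT values below are made-up numbers purely for illustration:

```python
# Sketch of the decomposition above: the log-likelihood of fully observed
# data is a sum of one term log P(xi | pa_i) per variable per sample.
import math

structure = {"Flu": [], "Allergy": [], "Sinus": ["Flu", "Allergy"], "Nose": ["Sinus"]}

def log_likelihood(data, structure, cpts):
    ll = 0.0
    for row in data:
        for var, parents in structure.items():
            pa = tuple(row[p] for p in parents)
            ll += math.log(cpts[var][(row[var], pa)])
    return ll

# Illustrative (made-up) CPT values, keyed by (value, parent-value tuple):
cpts = {
    "Flu":     {(1, ()): 0.1, (0, ()): 0.9},
    "Allergy": {(1, ()): 0.2, (0, ()): 0.8},
    "Sinus":   {(1, (1, 0)): 0.7, (0, (1, 0)): 0.3,
                (1, (0, 0)): 0.05, (0, (0, 0)): 0.95},
    "Nose":    {(1, (1,)): 0.8, (0, (1,)): 0.2,
                (1, (0,)): 0.1, (0, (0,)): 0.9},
}
data = [{"Flu": 1, "Allergy": 0, "Sinus": 1, "Nose": 1},
        {"Flu": 0, "Allergy": 0, "Sinus": 0, "Nose": 0}]
ll = log_likelihood(data, structure, cpts)
print(ll)
```

Because the sum splits cleanly by variable, each CPT can be maximized independently, which is exactly why the counting estimator from the previous slide is the MLE.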