Learning With Bayesian Networks
Markus Kalisch, ETH Zürich

Inference in BNs - Review
Query: P(Burglary | JohnCalls = TRUE, MaryCalls = TRUE)
Exact inference:
    P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)
- Deal with the sums in a clever way: variable elimination, message passing
- Singly connected: linear in space/time
- Multiply connected: exponential in space/time (worst case)
Approximate inference:
- Direct sampling
- Likelihood weighting
- MCMC methods
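A minimal Python sketch of this query by plain enumeration (the slowest exact method); the CPT values are the standard ones from the Russell/Norvig book, not something given in the talk:

    # Exact inference by enumeration in the burglary network.
    P_B, P_E = 0.001, 0.002                      # P(Burglary), P(Earthquake)
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
    P_J = {True: 0.90, False: 0.05}              # P(JohnCalls | Alarm)
    P_M = {True: 0.70, False: 0.01}              # P(MaryCalls | Alarm)

    def joint(b, e, a, j=True, m=True):
        """P(b, e, a, j, m) via the BN factorization."""
        p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
        p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p *= P_J[a] if j else 1 - P_J[a]
        return p * (P_M[a] if m else 1 - P_M[a])

    # P(b | j, m) = α Σ_e Σ_a P(b, e, a, j, m); α normalizes over b.
    unnorm = {b: sum(joint(b, e, a) for e in (True, False) for a in (True, False))
              for b in (True, False)}
    print(unnorm[True] / (unnorm[True] + unnorm[False]))   # ≈ 0.284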

Learning BNs - Overview
- Brief summary of the Heckerman tutorial
- Recent provably correct search methods:
  - Greedy Equivalence Search (GES)
  - PC-algorithm
- Discussion

Abstract and Introduction
Graphical modeling offers:
- Easy handling of missing data
- Easy modeling of causal relationships
- Easy combination of prior information and data
- Easy avoidance of overfitting

Bayesian Approach
- Probability as degree of belief
- The rules of probability are a good tool for dealing with beliefs
- Probability assessment: precision & accuracy
- Running example: multinomial sampling with a Dirichlet prior
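The running example fits in a few lines of Python; the prior and counts below are made up for illustration:

    # Multinomial sampling with a Dirichlet prior: by conjugacy, the
    # posterior is again Dirichlet, with the observed counts added.
    import numpy as np

    alpha = np.array([1.0, 1.0, 1.0])    # Dirichlet prior, 3-outcome variable
    counts = np.array([10, 3, 7])        # observed outcome counts

    posterior = alpha + counts           # posterior Dirichlet parameters
    print(posterior / posterior.sum())   # posterior mean = predictive probs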

Bayesian Networks (BN)
A BN is defined by:
- a network structure
- local probability distributions
To learn a BN, we have to:
- choose the variables of the model
- choose the structure of the model
- assess the local probability distributions
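Together, structure and local distributions determine the joint distribution via the standard factorization (implicit on the slide):

    P(X_1, ..., X_n) = \prod_{i=1}^{n} P(X_i | Pa(X_i))

where Pa(X_i) denotes the parents of X_i in the DAG.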

We have seen up to now:
- Book by Russell/Norvig: exact inference, variable elimination, approximate methods
- Talk by Prof. Loeliger: inference with factor graphs / belief propagation / message passing
Probabilistic inference in BNs is NP-hard: approximations or special-case solutions are needed

Learning Parameters (structure given)
- Prof. Loeliger: trainable parameters can be added to the factor graph and thereby be inferred
- Complete data: the problem reduces to the one-variable case
- Incomplete data (missing at random):
  - the formula for the posterior grows exponentially in the number of incomplete cases
  - Gibbs sampling
  - Gaussian approximation; get the MAP by gradient-based optimization or the EM-algorithm
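A sketch of the complete-data reduction: each CPT is estimated from counts for its own parent configurations, independently of the rest of the network. Function and variable names here are mine, not the talk's:

    from collections import Counter, defaultdict

    def fit_cpt(data, child, parents, child_vals, pseudo=1.0):
        """Estimate P(child | parents) from complete data (a list of dicts
        mapping variable name -> value), with Dirichlet pseudo-counts."""
        counts = defaultdict(Counter)
        for row in data:
            counts[tuple(row[p] for p in parents)][row[child]] += 1
        cpt = {}
        for pa, c in counts.items():
            total = sum(c.values()) + pseudo * len(child_vals)
            cpt[pa] = {v: (c[v] + pseudo) / total for v in child_vals}
        return cpt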

Learning Parameters AND Structure
- Structure can be learned only up to likelihood equivalence
- Averaging over all structures is infeasible: the space of DAGs and of equivalence classes grows super-exponentially in the number of nodes

Model Selection
- Don't average over all structures, but select a good one (model selection)
- A good scoring criterion is the log posterior probability:
    log P(D, S) = log P(S) + log P(D | S)
- Priors: Dirichlet for the parameters, uniform for the structure
- Complete cases: compute this exactly
- Incomplete cases: a Gaussian approximation and further simplification lead to the BIC:
    log P(D | S) ≈ log P(D | θ_ML, S) - (d/2) log(N)
  where d is the number of free parameters and N the sample size. This is what is usually used in practice.
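Because the BIC of a discrete BN decomposes over nodes, each node can be scored separately. A minimal sketch (my own helper; d is counted over the parent configurations actually seen in the data):

    import math
    from collections import Counter, defaultdict

    def node_bic(data, child, parents, child_vals):
        """BIC contribution of one node; the network score is the sum over nodes."""
        counts = defaultdict(Counter)
        for row in data:
            counts[tuple(row[p] for p in parents)][row[child]] += 1
        loglik = 0.0
        for c in counts.values():                 # one term per parent config
            total = sum(c.values())
            loglik += sum(k * math.log(k / total) for k in c.values())
        d = len(counts) * (len(child_vals) - 1)   # free parameters of this CPT
        return loglik - 0.5 * d * math.log(len(data))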

Search Methods
- Learning BNs on discrete nodes (3 or more parents) is NP-hard (Heckerman 2004)
- There are provably (asymptotically) correct search methods:
  - Search-and-score methods: Greedy Equivalence Search (GES; Chickering 2002)
  - Constraint-based methods: PC-algorithm (Spirtes et al. 2000)

GES - The Idea
- Restrict the search space to equivalence classes
- Score: BIC, a separable search criterion => fast
- Greedy search for the best equivalence class
- In theory (asymptotically): the correct equivalence class is found

GES - The Algorithm
GES is a two-stage greedy algorithm (see the sketch below):
- Initialize with the equivalence class E containing the empty DAG
- Stage 1: repeatedly replace E with the member of E+(E) (classes reachable by a single edge addition) that has the highest score, until no such replacement increases the score
- Stage 2: repeatedly replace E with the member of E-(E) (classes reachable by a single edge deletion) that has the highest score, until no such replacement increases the score
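A minimal sketch of the two-stage loop; forward_neighbors, backward_neighbors (enumerating E+(E) and E-(E)) and score are assumed helpers, not spelled out here:

    def ges(empty_class, forward_neighbors, backward_neighbors, score):
        """Two-stage greedy search over equivalence classes."""
        E = empty_class
        for neighbors in (forward_neighbors, backward_neighbors):  # stages 1, 2
            improved = True
            while improved:
                improved = False
                best = max(neighbors(E), key=score, default=None)
                if best is not None and score(best) > score(E):
                    E, improved = best, True
        return E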

PC - The Idea
- Start: complete, undirected graph
- Recursive conditional independence tests for deleting edges
- Afterwards: add arrowheads
- In theory (asymptotically): the correct equivalence class is found

PC - The Algorithm

    Form the complete, undirected graph G
    l = -1
    repeat
        l = l + 1
        repeat
            select an ordered pair of adjacent nodes A, B in G
            select a neighborhood N of A with |N| = l (if possible)
            delete the edge A-B in G if A, B are cond. indep. given N
        until all ordered pairs have been tested
    until all neighborhoods are of size smaller than l
    Add arrowheads by applying a couple of simple rules
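The edge-deletion phase in Python; ci_test is an assumed callable returning True iff A and B are judged conditionally independent given cond, and the orientation rules are omitted:

    from itertools import combinations

    def pc_skeleton(nodes, ci_test, data):
        """PC skeleton phase: delete edges via CI tests of growing order l."""
        adj = {v: set(nodes) - {v} for v in nodes}            # complete graph
        l = 0
        while any(len(adj[a] - {b}) >= l for a in nodes for b in adj[a]):
            for a in nodes:
                for b in list(adj[a]):
                    if b not in adj[a]:                       # already deleted
                        continue
                    for cond in combinations(sorted(adj[a] - {b}), l):
                        if ci_test(a, b, cond, data):
                            adj[a].discard(b)
                            adj[b].discard(a)
                            break
            l += 1
        return adj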

Example
[Figure: a small example DAG on the nodes A, B, C, D together with the skeleton found by the PC-algorithm]
- Conditional independencies: l = 0: none; l = 1: B ⊥ C | A and B ⊥ D | C
- PC-algorithm: correct skeleton

Sample Version of the PC-algorithm
- Real world: the conditional independence relations are not known
- Instead: use a statistical test for conditional independence
- Theory: using a statistical test instead of the true conditional independence relations is often ok
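One common concrete choice of test, assuming multivariate Gaussian data: test for zero partial correlation via Fisher's z-transform. A sketch, where data is an n × p NumPy array and nodes are column indices:

    import numpy as np
    from scipy.stats import norm

    def gauss_ci_test(a, b, cond, data, alpha=0.01):
        """True iff the partial correlation of a, b given cond is
        insignificant at level alpha (i.e. judged independent)."""
        idx = [a, b] + list(cond)
        prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
        r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
        z = 0.5 * np.log((1 + r) / (1 - r))                 # Fisher's z-transform
        stat = np.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
        return stat <= norm.ppf(1 - alpha / 2)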

Comparing PC and GES
For p = 10, n = 50, E(N) = 0.9, 50 replicates:

    Method   ave[TPR]       ave[FPR]       ave[TDR]
    PC       0.57 (0.06)    0.02 (0.01)    0.91 (0.05)
    GES      0.85 (0.05)    0.13 (0.04)    0.71 (0.07)

The PC-algorithm:
- finds fewer edges
- finds true edges with higher reliability
- is fast for sparse graphs (e.g. p = 100, n = 1000, E(N) = 3: T = 13 sec)

Learning Causal Relationships
- Causal Markov Condition: if C is a causal graph for X, then C is also a Bayesian-network structure for the pdf of X
- Use this to infer causal relationships

Conclusion
Using a BN: inference (NP-hard)
- exact inference, variable elimination, message passing (factor graphs)
- approximate methods
Learning a BN:
- Parameters: exact, factor graphs, Monte Carlo, Gaussian approximation
- Structure: GES, PC-algorithm; NP-hard