Learning in Bayesian Networks


Learning in Bayesian Networks. Florian Markowetz, Max-Planck-Institute for Molecular Genetics, Computational Molecular Biology, Berlin. 20.06.2002

Overview. 1. Bayesian Networks: stochastic networks, Bayesian statistics, use of Bayesian networks (inference given a network, learning the network from data, causal inference). 2. Analysis of Gene Expression Data. Application: cell cycle expression patterns.

References. 1. Bayesian Networks: HECKERMAN, David: A Tutorial on Learning with Bayesian Networks. Technical Report MSR-TR-95-06, Microsoft Research. 2. Analysis of Gene Expression Data: FRIEDMAN et al.: Using Bayesian Networks to Analyze Expression Data. RECOMB 1999. SPELLMAN et al.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9:3273-3297, 1998.

Graphical Models: 1. Markov Chains, 2. Hidden Markov Models, 3. Markov Random Fields, 4. Bayesian Networks, ...

Markov Chains. Example: evolution of a protein sequence modelled by a first-order Markov chain (figure: amino acids A, A, R, N at sequence positions i-2, i-1, i, i+1). Markov condition: P(X_i | X_{i-1}, X_{i-2}, ...) = P(X_i | X_{i-1}). The local probability at X_i depends only on the direct predecessor X_{i-1}. Conditional independence: given its predecessor, X_i is independent of the other nodes in the graph.

A more complex example (figure: a DAG over nodes X_1, ..., X_6). Markov condition: each variable X_i is independent of its non-descendants given its parents. The local probability at X_i depends only on its parents. Conditional independence: given its parents, X_i is independent of the other nodes in the graph.

Stochastic Networks I. A Bayesian network for X = {X_1, ..., X_n} consists of: a network structure S, a directed acyclic graph (DAG) whose nodes are the variables and in which a missing arc encodes a conditional independence; and a set of probability distributions P, given locally as the conditional distribution of each variable given its parents in S: P = { P(X_i | pa_i) }.

Stochastic Networks II. (Figure: the example DAG over X_1, ..., X_6.) (S, P) encode the joint distribution P(X) = prod_{i=1}^{n} P(X_i | pa_i).
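The factorization can be made concrete in a few lines. A minimal sketch, assuming a hypothetical binary chain X1 -> X2 -> X3 with invented CPT values (not from the talk):

```python
from itertools import product

# Hypothetical network X1 -> X2 -> X3 over binary variables (illustrative values).
parents = {"X1": [], "X2": ["X1"], "X3": ["X2"]}
# CPTs: P(node = 1 | parent values), keyed by the tuple of parent values.
cpt = {
    "X1": {(): 0.6},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0,): 0.1, (1,): 0.9},
}

def joint(assignment):
    """P(X) = prod_i P(X_i | pa_i): multiply the local conditional probabilities."""
    p = 1.0
    for node, pars in parents.items():
        p1 = cpt[node][tuple(assignment[q] for q in pars)]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p

# Because of the factorization, this defines a proper distribution: it sums to 1.
total = sum(joint(dict(zip(parents, vals))) for vals in product([0, 1], repeat=3))
```

Note how the joint over 2^3 states is specified by only 5 local numbers; this compactness is the point of the factorization.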

Equivalence of Networks Markov equivalence S 1 M S 2 if both structures represent the same set of independence assertions. Florian Markowetz, Learning in Bayesian Networks, 2002-06-20 9

Equivalence of Networks. Markov equivalence: S_1 ~_M S_2 if both structures represent the same set of independence assertions. (Figure: the chains X -> Y -> Z and X <- Y <- Z and the fork X <- Y -> Z all encode the same assertion, that X and Z are independent given Y.)

Representation of an equivalence class. Theorem 1 [Verma and Pearl, 1990]: Two DAGs are equivalent iff they have the same skeletons and the same v-structures. Skeleton: the undirected graph resulting from ignoring the directionality of every edge. V-structure: an ordered triple of nodes (X, Y, Z) in a graph such that (1) X -> Y and Z -> Y, and (2) X and Z are not adjacent. Using Theorem 1 we can uniquely represent an equivalence class of DAGs by a partially directed graph (PDAG).
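Theorem 1 translates directly into a small equivalence test. A sketch, representing a DAG as a set of directed edge tuples (the representation and function names are my own, not from the talk):

```python
def skeleton(dag):
    """Undirected version of the DAG: each edge as an unordered pair."""
    return {frozenset(edge) for edge in dag}

def v_structures(dag):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(dag)
    return {(a, c, b)
            for (a, c) in dag for (b, c2) in dag
            if c == c2 and a < b and frozenset((a, b)) not in skel}

def markov_equivalent(d1, d2):
    """Theorem 1 (Verma and Pearl, 1990): same skeleton and same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

chain = {("X", "Y"), ("Y", "Z")}            # X -> Y -> Z
reversed_chain = {("Y", "X"), ("Z", "Y")}   # X <- Y <- Z
collider = {("X", "Y"), ("Z", "Y")}         # X -> Y <- Z, a v-structure
```

The two chains share a skeleton and have no v-structure, so they are Markov equivalent; the collider has the v-structure (X, Y, Z) and is equivalent to neither.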

DAG-2-PDAG Construction: the PDAG identifying the equivalence class of a given DAG contains a directed edge for every edge participating in a v-structure, and an undirected edge for all other edges. (Figure: two DAG/PDAG pairs over X, Y, Z, one with a v-structure, whose edges stay directed in the PDAG, and one without, whose edges become undirected.)

Again: a more complex example. (Figure: the example DAG over X_1, ..., X_6 and its PDAG.) A directed arc X -> Y in the PDAG means all members of the class agree on this arc; an undirected edge X - Y means some members have X -> Y and some have X <- Y.

What is Probability? Frequentist answer: probability is a physical property of the world, measured by repeated trials. Bayesian answer: probability is a personal degree of belief, in principle arbitrary (but there are techniques for a sensible choice).

Bayes Formula: P(ϑ | D) = P(ϑ) · P(D | ϑ) / P(D), where P(ϑ | D) is the posterior, P(ϑ) the prior and P(D | ϑ) the likelihood. Joint distribution: P(ϑ, D) = P(ϑ) · P(D | ϑ).
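As a toy numerical illustration of the formula (all numbers invented for the example): two hypotheses for a coin's heads probability ϑ, updated on four observed tosses.

```python
# Two hypotheses for the heads probability, with a uniform prior.
prior = {0.3: 0.5, 0.7: 0.5}
data = [1, 1, 0, 1]  # three heads, one tail

def likelihood(theta, data):
    """P(D | theta) for independent Bernoulli tosses."""
    p = 1.0
    for x in data:
        p *= theta if x == 1 else 1.0 - theta
    return p

# Bayes formula: posterior = prior * likelihood / evidence.
unnorm = {t: prior[t] * likelihood(t, data) for t in prior}
evidence = sum(unnorm.values())  # P(D), the normalizing constant
posterior = {t: u / evidence for t, u in unnorm.items()}
```

After mostly heads, the posterior shifts the belief toward ϑ = 0.7, exactly as the prior-times-likelihood mechanics prescribe.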

How to use a Bayesian Network? 1. Probabilistic inference: determine a probability of interest from the model. 2. Learn the network: infer structure and probabilities from data. 3. Causal inference: detect causal patterns.

1. Inference in a Bayesian Network. (Flow: prior knowledge -> construct Bayesian network -> inference; refine by data.)

1.1 Construction. Determine the variables to model; build a DAG that encodes conditional independence (edge: cause -> effect); assess the local probability distributions P(X_i | pa_i).

1.2 Probabilistic Inference. Probabilistic inference: compute a probability of interest given a model. Approach: use Bayes' theorem and simplify by conditional independence. Source of difficulty: undirected cycles, i.e. cycles that appear when directionality is ignored.

1.3 Refinement. Using data to update the probabilities of a given Bayesian network structure: P(X | ϑ, S) = prod_{i=1}^{n} P(X_i | pa_i, ϑ_i, S). Uncertainty about the local distributions is encoded in the prior P(ϑ | S). Refinement: compute the posterior from the prior, P(ϑ | S) => P(ϑ | D, S).

2. Learning a network structure. Learning: find the network structure which fits the prior knowledge and the data. Measure this by a score function, e.g. Score_D(S) = log p(S) + log p(D | S), i.e. log prior plus log likelihood. Search for a high-scoring structure by greedy search, simulated annealing, ...
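A minimal sketch of such a score-based greedy search, assuming binary data and substituting a BIC-style complexity penalty for an explicit structure prior log p(S) (the function names, the penalty, and the additions-only search move are my own simplifications, not from the talk):

```python
from math import log

def family_loglik(data, node, pars):
    """Maximum-likelihood log-likelihood contribution of one node given its parents."""
    counts = {}
    for row in data:
        c = counts.setdefault(tuple(row[p] for p in pars), [0, 0])
        c[row[node]] += 1
    return sum(k * log(k / sum(c)) for c in counts.values() for k in c if k)

def score(data, parents):
    """log p(D | S) at the ML parameters minus a BIC-style complexity penalty."""
    n = len(data)
    return sum(family_loglik(data, v, ps) - 0.5 * log(n) * 2 ** len(ps)
               for v, ps in parents.items())

def is_acyclic(parents):
    """Depth-first check that the parent sets define a DAG."""
    done, stack = set(), set()
    def visit(v):
        if v in stack:
            return False
        if v in done:
            return True
        stack.add(v)
        ok = all(visit(p) for p in parents[v])
        stack.discard(v)
        done.add(v)
        return ok
    return all(visit(v) for v in parents)

def greedy_search(data, variables):
    """Hill-climb over structures by single-edge additions, keeping improvements."""
    parents = {v: [] for v in variables}
    best, improved = score(data, parents), True
    while improved:
        improved = False
        for u in variables:
            for v in variables:
                if u == v or u in parents[v]:
                    continue
                parents[v].append(u)  # tentatively add the edge u -> v
                s = score(data, parents)
                if is_acyclic(parents) and s > best:
                    best, improved = s, True
                else:
                    parents[v].remove(u)
    return parents, best
```

On data where two variables track each other, the search links them; a fuller implementation would also try edge deletions and reversals, as the slide's mention of greedy search and simulated annealing suggests.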

3. Learning Causal Relationships. Objective: determine the flow of causality in a system. Causal graph: a DAG C is a causal graph for X = {X_1, ..., X_n} iff its nodes n_i correspond to the variables X_i, and an arc n_i -> n_j means that X_i is a direct cause of X_j.

Causal Networks vs. Bayesian Networks. 1. Connection. Causal Markov condition: given the values of a variable's immediate causes, it is independent of its earlier causes. Thus: C is a causal graph for X => C is a Bayesian network structure for the joint probability distribution of X.

Causal Networks vs. Bayesian Networks. 2. Differences. A Bayesian network models the distribution of observations. A causal network models the distribution of observations and the effects of interventions. (Figure: the chains X -> Y -> Z and X <- Y <- Z and the fork X <- Y -> Z are equivalent as Bayesian networks, but not equivalent as causal networks.)

Causal Inference from Data. From observations alone, we can only learn a PDAG: a whole equivalence class of structures. (Figure: the example PDAG over X_1, ..., X_6.) A directed arc X -> Y in the PDAG means all networks in the class agree on it => infer the causal direction: X causes Y.

Application: Cell Cycle in yeast. Data from SPELLMAN et al.: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization, Molecular Biology of the Cell, 9:3273-3297, 1998. The dataset contains 76 samples of the whole yeast genome (time series at intervals of a few minutes). Spellman et al. identified 800 cell cycle-regulated genes and clustered them: 250 genes in 8 clusters. Friedman et al. analysed these 250 genes with a Bayesian network.

Data Representation. Random variables: the expression level of each gene. In addition: experimental conditions, temporal indication (cell cycle phase), background variables, exogenous cellular conditions.

Discretization. Focus on qualitative aspects of the data. Discretize gene expression data into three categories: -1 (lower than control), 0 (equal), 1 (greater).
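A one-function sketch of this discretization; the threshold of 0.5 on the log-ratio is an invented illustration, not the value used in the paper:

```python
def discretize(log_ratio, threshold=0.5):
    """Map an expression log-ratio (sample vs. control) to -1, 0 or 1."""
    if log_ratio < -threshold:
        return -1  # lower than control
    if log_ratio > threshold:
        return 1   # greater than control
    return 0       # roughly equal to control

levels = [discretize(r) for r in [-1.2, -0.1, 0.0, 0.3, 2.0]]
```

Working with three discrete levels keeps the local conditional distributions small tables, which is what makes the structure-learning score above tractable.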

Pairwise Features. Small datasets with many variables: many different networks are reasonable explanations of the data => focus on features that are common to most of these networks. 1. Markov relations: is X a direct relative of Y? (local property) 2. Order relations: is X an ancestor of Y in all networks of a given equivalence class? (global property)

The Sparse Candidate algorithm. Learning a network structure = solving an optimization problem in the space of DAGs => efficient search algorithms are needed. Sparse Candidate algorithm (Friedman et al.): identify a small number of candidate parents for each variable (using simple local statistics such as correlation); restrict the search to networks in which only the candidate parents of a variable can be its parents; iteratively adapt the candidate sets during the search.
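The first step, choosing candidate parents by a simple pairwise statistic, might look like the sketch below. Pearson correlation is used here because the slide names correlation as an example statistic; the data layout and function names are my own, and a full implementation would also re-estimate the candidate sets each iteration as the slide describes:

```python
def corr(xs, ys):
    """Pearson correlation of two equal-length value sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def candidate_parents(data, target, k):
    """Keep only the k variables most correlated with the target as allowed parents."""
    others = [v for v in data if v != target]
    others.sort(key=lambda v: abs(corr(data[v], data[target])), reverse=True)
    return others[:k]

# Toy expression matrix: variable name -> values across samples.
expr = {"X": [0, 1, 0, 1, 0, 1],
        "Y": [0, 1, 0, 1, 0, 1],   # perfectly tracks X
        "Z": [1, 0, 0, 1, 1, 0]}   # only weakly related to X
```

Restricting each variable to k candidate parents shrinks the search space from all DAGs to a far smaller family, which is what makes search feasible for 250 genes.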

Results. Order relations: only a few genes dominate the order (i.e. appear before many other genes). These were found to be key genes in the cell-cycle process. Markov relations: the top-scoring Markov relations between genes were found to indicate a relation in biological function. In general, Friedman et al. emphasize that Bayesian networks provide a tool that allows biologically plausible conclusions from the data.

Don't get confused!

Ideas. Incorporating biological knowledge. Interventional instead of observational data. How to learn causality from knockout experiments? Design of knockout experiments?