The Monte Carlo Method: Bayesian Networks


The Monte Carlo Method: Bayesian Networks
Dieter W. Heermann, Monte Carlo Methods, 2009

Outline
1 Bayesian Networks
2 Gene Expression Data
3 Bayesian Networks for Gene Expression Data
  Dynamic Bayesian Networks

Why Bayesian networks for gene expression data?
Graphical display of the dependence structure between multiple interacting quantities (expression levels of different genes).
Probabilistic semantics: fits the stochastic nature of both the biological processes and the noisy experiments.
Capable of handling noise and of estimating the confidence in the different features of the network.
Due to lack of data: extract features that are pronounced in the data rather than a single model that explains the data.
Nodes represent random variables X_i = measured expression level of gene i.
Edges represent regulatory interactions between genes.

Learning tasks:
Define the functional form of the conditional distributions (e.g. multinomial for discrete variables, linear Gaussian for continuous variables).
Find the best network structure S.
Given a network structure, find the best set of parameters for the conditional distributions (the most probable structure/parameter vector given the data).

Bayesian Networks
Graphical representation of a joint distribution over a set of random variables A, B, C, D, E:
P(A, B, C, D, E) = P(A) P(B) P(C | A) P(D | A, B) P(E | D)
(Figure: a DAG over the nodes A, B, C, D, E.)
Example: nodes represent gene expression levels while edges encode the interactions (e.g. inhibition, activation).

Bayesian Networks
Given a set of random variables X = (X_1, ..., X_n), a Bayesian network is defined as a pair BN = (S, θ), where
S is a directed acyclic graph (DAG), a graphical representation of the conditional independencies between the variables in X, and
θ is the set of parameters of the conditional probability distributions of these variables.
In a Bayesian network, the probability of a state x = (x_1, x_2, ..., x_n) factors as
P(x) = P(x_1 | pa(x_1)) P(x_2 | pa(x_2)) ... P(x_n | pa(x_n)),
where pa(x_i) denotes the parents of node x_i in the graph S.
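As an illustration of this factorization, here is a minimal Python sketch that evaluates the joint probability of the five-variable example from the previous slide. The CPT numbers are made-up illustration values, not taken from the lecture.

```python
# A minimal sketch: evaluating the factored joint
# P(A,B,C,D,E) = P(A) P(B) P(C|A) P(D|A,B) P(E|D) for binary variables.
# All probability values below are made-up illustration numbers.

# Each CPT maps a tuple of parent values to P(variable = 1 | parents).
cpts = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0,): 0.2, (1,): 0.7},            # parents: (A,)
    "D": {(0, 0): 0.1, (0, 1): 0.4,
          (1, 0): 0.5, (1, 1): 0.9},        # parents: (A, B)
    "E": {(0,): 0.3, (1,): 0.8},            # parents: (D,)
}
parents = {"A": (), "B": (), "C": ("A",), "D": ("A", "B"), "E": ("D",)}

def joint_probability(assignment):
    """P(x) = prod_i P(x_i | pa(x_i)) for a full assignment {name: 0/1}."""
    p = 1.0
    for var, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p1 = cpts[var][pa_values]            # P(var = 1 | parents)
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint_probability({"A": 1, "B": 0, "C": 1, "D": 1, "E": 0}))
```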

Gene Expression Data
Consider a microarray data matrix (D_ij) whose rows (D_i.) correspond to genes and whose columns (D_.j) correspond to probes (tissue samples, experiments, etc.).
Each real value comes from one spot and indicates whether the concentration of a specific mRNA is higher (+) or lower (-) than its normal value.

Dynamic Bayesian Networks
A Bayesian network must be a DAG (directed acyclic graph).
Random variable X_i = measured expression level of gene i. Arcs = regulatory interactions between genes.
However, many regulatory networks contain directed cycles.
(Figure: a cyclic network over A and B unrolled into the time slices A1, A2 and B1, B2.)
We solve this by expanding in the time direction: use DBNs (dynamic Bayesian networks, i.e. Bayesian networks with constraints on parent and child nodes) for sequential gene expression data.

Dynamic Bayesian Networks
We are looking for the Bayesian network that is most probable given the data D (gene expression):
BN* = argmax_BN { P(BN | D) },  where  P(BN | D) = P(D | BN) P(BN) / P(D)
There are many candidate networks. An exhaustive search-and-score approach over all models will not work in practice (the number of networks grows super-exponentially, O(2^(n^2)) for dynamic Bayesian networks).
Idea: sample the networks such that we eventually have sampled the most probable ones.

Bayesian Networks for Gene Expression Data
Recall the detailed balance condition for the chain over networks:
P(BN_old | D) P(BN_old -> BN_new | D) = P(BN_new | D) P(BN_new -> BN_old | D)
Let us look at
P(BN | D) = P(D | BN) P(BN) / P(D)
Assume P(BN) is uniformly distributed (prior knowledge could be incorporated here).

Choose the next BN with proposal probability P(BN_old -> BN_new | D).
Accept the new BN with the following Metropolis-Hastings accept/reject criterion:
P = min{ 1, [P(BN_new | D) P(BN_new -> BN_old | D)] / [P(BN_old | D) P(BN_old -> BN_new | D)] }
  = min{ 1, [P(D | BN_new) P(BN_new) / P(D)] / [P(D | BN_old) P(BN_old) / P(D)] }    (symmetric proposal)
  = min{ 1, [P(D | BN_new) P(BN_new)] / [P(D | BN_old) P(BN_old)] }
  = min{ 1, P(D | BN_new) / P(D | BN_old) }    (uniform prior P(BN))
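A hedged Python sketch of this sampling scheme. The scoring function log_marginal_likelihood, standing for log P(D | BN), is an assumed placeholder (a closed form is given on the last slide); the prior is taken as uniform and the edge moves are treated as symmetric, so the acceptance probability reduces to min(1, P(D | BN_new) / P(D | BN_old)).

```python
import math, random

def is_acyclic(edges, nodes):
    """Simple DFS cycle check so that proposed graphs stay DAGs."""
    children = {v: [b for (a, b) in edges if a == v] for v in nodes}
    state = {v: 0 for v in nodes}            # 0 = unvisited, 1 = on stack, 2 = done
    def visit(v):
        if state[v] == 1:
            return False
        if state[v] == 2:
            return True
        state[v] = 1
        ok = all(visit(c) for c in children[v])
        state[v] = 2
        return ok
    return all(visit(v) for v in nodes)

def propose(edges, nodes):
    """ADD, REMOVE or REVERSE one edge (the moves listed on the last slide).
    nodes is a list of node names; edges is a set of (parent, child) tuples."""
    edges = set(edges)
    move = random.choice(["ADD", "REMOVE", "REVERSE"])
    if move == "ADD":
        a, b = random.sample(nodes, 2)
        edges.add((a, b))
    elif edges:
        a, b = random.choice(sorted(edges))
        edges.discard((a, b))
        if move == "REVERSE":
            edges.add((b, a))
    return edges

def sample_structures(nodes, data, n_steps, log_marginal_likelihood):
    """Metropolis-Hastings over network structures (uniform prior, symmetric moves)."""
    edges = set()
    log_score = log_marginal_likelihood(edges, data)
    samples = []
    for _ in range(n_steps):
        candidate = propose(edges, nodes)
        if is_acyclic(candidate, nodes):
            cand_score = log_marginal_likelihood(candidate, data)
            # Accept with probability min(1, P(D | BN_new) / P(D | BN_old)).
            if math.log(random.random()) < cand_score - log_score:
                edges, log_score = candidate, cand_score
        samples.append(frozenset(edges))
    return samples
```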

Discrete model
Even though the amounts of mRNA or protein, for example, vary on a scale that is most conveniently modelled as continuous, we can still model the system by assuming that it operates with functionally discrete states:
activated / not activated (2 states)
under-expressed / normal / over-expressed (3 states)
Discretizing the data values is a compromise between averaging out the noise, the accuracy of the model, and the complexity of model/parameter learning.
Qualitative models can be learned even when the quality of the data is not sufficient for more accurate model classes.
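For instance, a three-state discretization could be a simple thresholding of log-ratio expression values; the threshold below is an assumed illustrative choice, not one given in the lecture.

```python
import numpy as np

# Sketch (assumed thresholds): mapping continuous log-ratio expression values to
# the three functional states under-expressed / normal / over-expressed.
def discretize(log_ratios, threshold=0.5):
    """Return -1 (under-expressed), 0 (normal) or +1 (over-expressed) per value."""
    x = np.asarray(log_ratios)
    states = np.zeros(x.shape, dtype=int)
    states[x >= threshold] = 1
    states[x <= -threshold] = -1
    return states

print(discretize([0.8, -0.1, -1.2, 0.3]))   # -> [ 1  0 -1  0]
```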

Let N_ijk be the number of times we observe variable/node i in state k given parent configuration j.
Summarize the total number of observations for variable i with parent configuration j as
N_ij = Σ_{k=1}^{r_i} N_ijk
Since our states are discrete we use a multinomial distribution; the ML estimate of the multinomial probabilities is given by the normalized counts
θ̂_ijk = N_ijk / N_ij
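In code, the ML estimate is just a row-wise normalization of the count table for one variable i (the counts below are made up):

```python
import numpy as np

# Sketch: maximum-likelihood estimate theta_hat_ijk = N_ijk / N_ij.
# counts[j, k] = N_ijk, rows j = parent configurations, columns k = states.
counts = np.array([[12,  3,  5],
                   [ 1, 20,  4]])
N_ij = counts.sum(axis=1, keepdims=True)        # N_ij = sum_k N_ijk
theta_hat = counts / N_ij                       # normalized counts
print(theta_hat)
```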

A convenient prior distribution for the parameters θ is the Dirichlet distribution:
(θ_ij1, ..., θ_ijr_i) ~ Dirichlet(α_ij1, ..., α_ijr_i)
The Dirichlet distribution has the density
f(θ_ij1, ..., θ_ijr_i; α_ij1, ..., α_ijr_i) = (1 / B(α_ij)) Π_{k=1}^{r_i} θ_ijk^(α_ijk - 1)
with θ_ijk ≥ 0, Σ_k θ_ijk = 1, and hyperparameters α_ijk > 0, α_ij = Σ_k α_ijk.
The normalization constant, the (multivariate) Beta function, can be expressed using the Gamma function:
B(α_ij) = Π_{k=1}^{r_i} Γ(α_ijk) / Γ(α_ij)

The convenience arises from the fact that the Dirichlet distribution is conjugate to the multinomial distribution: if P(θ) is Dirichlet and P(X | θ) is multinomial, then P(θ | X) is Dirichlet as well.
The multinomial distribution is given (for N_ij = Σ_k N_ijk) by
f(N_ij1, ..., N_ijr_i | N_ij, θ_ij1, ..., θ_ijr_i) = N_ij! / (N_ij1! ··· N_ijr_i!) · θ_ij1^N_ij1 ··· θ_ijr_i^N_ijr_i
and is the distribution of observations in r_i classes if N_ij observations are drawn independently from the classes with probabilities θ_ijk, k = 1, ..., r_i.
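Because of this conjugacy, the posterior update is a simple addition of observed counts to the hyperparameters; a small sketch with assumed illustrative numbers:

```python
import numpy as np

# Sketch of the conjugacy stated above: with a Dirichlet(alpha_ij1,...,alpha_ijri)
# prior and multinomial counts N_ijk, the posterior is
# Dirichlet(alpha_ij1 + N_ij1, ..., alpha_ijri + N_ijri).
alpha  = np.array([1.0, 1.0, 1.0])              # prior hyperparameters (assumed)
counts = np.array([12, 3, 5])                   # observed N_ijk (assumed)
posterior_alpha = alpha + counts                # Dirichlet posterior parameters
posterior_mean = posterior_alpha / posterior_alpha.sum()
print(posterior_alpha, posterior_mean)
```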

Structural Properties
In order to get reliable results we can focus on features that can be inferred robustly. For example, we can define a feature as an indicator variable f with value 1 if and only if the structure of the model contains a path between nodes A and B.
Looking at a set of models S with a good fit, we can approximate the posterior probability of the feature f by
P(f | D) = Σ_S f(S) P(S | D)
With gene regulatory networks, one can look for only the most significant edges based on the scoring.

A Markov chain is defined over Bayesian networks such that, as it is run, it approaches a steady-state distribution in which the probabilities of the states (networks) correspond to their posterior probabilities.
Individual networks are created as states of the chain and, after (assumed) convergence, samples S_i are taken.
The posterior probability of an edge (or other feature) can then be approximated by
P(f(S) | D) ≈ (1/n) Σ_{i=1}^{n} f(S_i)
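A sketch of this estimator, reusing the structures produced by the sampler sketched earlier; the path test implements the example feature from the previous slide:

```python
# Sketch: approximate P(f(S) | D) by the fraction of sampled networks S_i in
# which the feature holds. `samples` would be the edge sets returned by the
# Metropolis-Hastings sampler sketched above.

def has_path(edges, source, target):
    """DFS check for a directed path from source to target."""
    stack, seen = [source], set()
    while stack:
        v = stack.pop()
        if v == target:
            return True
        if v in seen:
            continue
        seen.add(v)
        stack.extend(b for (a, b) in edges if a == v)
    return False

def feature_posterior(samples, source, target):
    """P(f | D) ~= (1/n) sum_i f(S_i)."""
    return sum(has_path(s, source, target) for s in samples) / len(samples)
```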

To work out the Monte Carlo method to generate networks we first have to compute P(D | S):
P(D | S) = ∫ P(D | θ, S) P(θ | S) dθ = ... = Π_{i=1}^{n} Π_{j=1}^{q_i} [ Γ(α_ij) / Γ(α_ij + N_ij) ] Π_{k=1}^{r_i} [ Γ(α_ijk + N_ijk) / Γ(α_ijk) ]
Moves: ADD, REMOVE, REVERSE an edge in the network.
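A sketch of this score for a single variable (family) i using scipy.special.gammaln; summing the returned value over all variables gives log P(D | S). The counts and hyperparameters below are illustrative assumptions.

```python
import numpy as np
from scipy.special import gammaln

# Sketch of the closed-form score above for one variable i:
# log P(D_i | S) = sum_j [ log Gamma(a_ij) - log Gamma(a_ij + N_ij)
#                          + sum_k ( log Gamma(a_ijk + N_ijk) - log Gamma(a_ijk) ) ]
# counts[j, k] = N_ijk and alpha[j, k] = a_ijk.
def family_log_marginal_likelihood(counts, alpha):
    N_ij = counts.sum(axis=1)
    a_ij = alpha.sum(axis=1)
    return np.sum(gammaln(a_ij) - gammaln(a_ij + N_ij)
                  + np.sum(gammaln(alpha + counts) - gammaln(alpha), axis=1))

counts = np.array([[12.0, 3.0, 5.0], [1.0, 20.0, 4.0]])   # assumed counts
alpha  = np.ones_like(counts)                               # assumed uniform prior
print(family_log_marginal_likelihood(counts, alpha))
```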