Lecture Network analysis for biological systems

Lecture 11 2014 Network analysis for biological systems Anja Bråthen Kristoffersen

Biological networks
- Gene regulatory network: two genes are connected if the expression of one gene modulates the expression of the other, by either activation or inhibition.
- Protein interaction network: proteins are connected if they interact physically or take part in the same metabolic or signaling pathway of the cell.
- Metabolic network: metabolic products and substrates are connected if they participate in the same reaction.
Statistical bioinformatics 3

What is a gene regulatory network?
Gene regulatory networks (GRNs) are the on-off switches of a cell operating at the gene level. Two genes are connected if the expression of one gene modulates the expression of the other, by either activation or inhibition.

Simplified representation of a gene regulatory network
A gene regulatory network can be represented by a directed graph:
- A node represents a gene.
- A directed edge stands for the modulation (regulation) of one node by another; e.g. an arrow from gene X to gene Y means that gene X affects the expression of gene Y.

Why study gene regulatory networks?
- Genes are not independent: they regulate each other and act collectively.
- This collective behavior can be observed using microarrays.
- Some genes control the response of the cell to changes in the environment by regulating other genes.
- Potential discovery of triggering mechanisms and treatments for disease.

Network modeling techniques
- Boolean network (BN)
- Bayesian belief network
- Metabolic network modeling methods

Boolean network modeling
- Boolean: either true or false (1 or 0).
- Binarization
  - reduces the noise in biological data
  - captures the dynamic behavior in complex systems
  - needs a threshold value
  - leads to loss of information
- Genes are modeled as switch-like dynamic elements that are either on or off.

A Boolean network consists of
- a set of genes
- a set of Boolean functions F = {f_i(x_1, x_2, ..., x_n)}
Each function is described with three Boolean operators:
- AND: && or &
- OR: || or |
- NOT: ! or ~

Example: graph G with 3 genes
Given the network G(V, F) with V = {x_1, x_2, x_3} and F = {f_1 = x_2 & x_3, f_2 = x_1, f_3 = x_2}.

Wiring diagram (edges as given by F): x_2 -> x_1, x_3 -> x_1, x_1 -> x_2, x_2 -> x_3

Truth table:

Input (t-1)    Output (t)
x_1 x_2 x_3    x_1 x_2 x_3
 0   0   0      0   0   0
 0   0   1      0   0   0
 0   1   0      0   0   1
 0   1   1      1   0   1
 1   0   0      0   1   0
 1   0   1      0   1   0
 1   1   0      0   1   1
 1   1   1      1   1   1

State transitions (read off the truth table):
110 -> 011 -> 101 -> 010 -> 001 -> 000 -> 000
100 -> 010
111 -> 111

R code: truth table
G(V, F), V = {x_1, x_2, x_3}, F = {f_1 = x_2 & x_3, f_2 = x_1, f_3 = x_2}

    # The state x(t-1) is a vector x with three elements.
    # We want to find the state x(t), here called y, also with three elements.
    # Test it by letting x <- c(0,1,0); according to the truth table,
    # y should then be 0,0,1.
    x <- c(0,1,0)
    y <- rep(NA, 3) # allocate space for three elements in y
    y[1] <- x[2] && x[3]
    y[2] <- x[1]
    y[3] <- x[2]

Make a function of the truth table
R code:

    output <- function(x){
      y <- rep(NA, 3)
      y[1] <- x[2] && x[3]
      y[2] <- x[1]
      y[3] <- x[2]
      return(y)
    }
    output(c(0,0,1))
    [1] 0 0 0

R code: make the output table based on an input table

    input <- rbind(c(0,0,0), c(0,0,1), c(0,1,0), c(0,1,1),
                   c(1,0,0), c(1,0,1), c(1,1,0), c(1,1,1))
    y <- matrix(NA, ncol = ncol(input), nrow = nrow(input))
    for(i in 1:nrow(y)){
      y[i,] <- output(input[i,])
    }

Search for Boolean functions
Chi-square testing-based search:
Kim H, Lee JK, Park T. (2007). Boolean networks using the chi-square test for inferring large-scale gene regulatory networks. BMC Bioinformatics, 8:37.

Binary data set from a simple network: four nodes (G1, G2, G3, G4) and 10 time points

      G1  G2  G3  G4
T1     0   1   0   1
T2     1   0   0   1
T3     1   1   1   0
T4     0   1   0   0
T5     0   0   0   1
T6     1   0   1   0
T7     0   1   1   0
T8     0   0   0   0
T9     0   0   1   0
T10    0   0   1   0

11 January 2014

Observe which genes are on/off at the previous time point
If we want to find a Boolean function f_4 for node G4, we have to test the independence between G4 at time t and each of the nodes (G1, G2, G3, G4) at time t-1. We use the chi-square distribution and compare expected with observed counts.

2 x 2 contingency table
For each of G1, G2, G3 and G4 at time t-1 (columns: 0, 1), make a 2 x 2 table against G4 at time t (rows: 0, 1). Find both the expected and the observed contingency tables.

2 x 2 contingency table, expected
Assume independence between Gi and Gj; the expected table then depends only on the background distribution.

             G1(t-1) = 0          G1(t-1) = 1
G4(t) = 0    P(G4=0)*P(G1=0)      P(G4=0)*P(G1=1)
G4(t) = 1    P(G4=1)*P(G1=0)      P(G4=1)*P(G1=1)

Similarly for the other pairs of Gi and Gj.

2 x 2 contingency table, observed
Counted from the binary data set with four nodes and 10 time points (rows: G4 at time t; columns: the other gene at time t-1):

             G1(t-1) = 0   G1(t-1) = 1
G4(t) = 0         4             3
G4(t) = 1         2             0

             G2(t-1) = 0   G2(t-1) = 1
G4(t) = 0         5             2
G4(t) = 1         0             2

             G3(t-1) = 0   G3(t-1) = 1
G4(t) = 0         3             4
G4(t) = 1         2             0

             G4(t-1) = 0   G4(t-1) = 1
G4(t) = 0         5             2
G4(t) = 1         1             1
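
The expected table under independence can be sketched in a few lines of R. This is only an illustration: it takes the observed counts for G4(t) against G1(t-1) from the lecture (4, 3, 2, 0), and the names observed, p_row, p_col, expected_prop and expected_count are made up for this example.

```r
# Observed counts: rows = G4 at time t, columns = G1 at time t-1
observed <- matrix(c(4, 3, 2, 0), ncol = 2, byrow = TRUE,
                   dimnames = list(G4_t = c("0", "1"), G1_tm1 = c("0", "1")))

N <- sum(observed)              # number of (t-1, t) pairs, here 9
p_row <- rowSums(observed) / N  # background distribution of G4(t)
p_col <- colSums(observed) / N  # background distribution of G1(t-1)

# Under independence each cell probability is the product of the margins
expected_prop <- outer(p_row, p_col)
expected_count <- N * expected_prop
expected_count
```

The expected counts have the same row and column totals as the observed table; only the way the counts are spread over the four cells differs.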

Chi-square distribution
Compare expected with observed.
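
The comparison is usually summarised by Pearson's chi-square statistic. As a worked example, assume the observed counts O_ij for G4(t) against G1(t-1) are 4, 3, 2, 0, and that the expected counts E_ij are computed from the row and column totals (7, 2 and 6, 3, with N = 9); the symbols O_ij and E_ij are introduced here for illustration.

```latex
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row total})_i \, (\text{column total})_j}{N}

% Worked example: G4(t) against G1(t-1)
E = \begin{pmatrix} 14/3 & 7/3 \\ 4/3 & 2/3 \end{pmatrix},
\qquad
\chi^2 = \frac{(4 - 14/3)^2}{14/3} + \frac{(3 - 7/3)^2}{7/3}
       + \frac{(2 - 4/3)^2}{4/3} + \frac{(0 - 2/3)^2}{2/3}
       = \frac{2}{21} + \frac{4}{21} + \frac{1}{3} + \frac{2}{3}
       = \frac{9}{7} \approx 1.286
```

This agrees with the value 1.286 reported for G1(t-1) in the lecture's result table.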

Chi-square test result of independence for G4(t)

node          G1(t-1)   G2(t-1)   G3(t-1)   G4(t-1)
chi-square     1.286     3.214     2.057     0.321
p-value        0.511     0.170     0.430     1

R code, 2 x 2 contingency table

    d1 <- read.table("c:/users/anjab/desktop/infstk/simpledataset4genes10timepoints.txt", header = T, sep = "\t")
    # The dataset was read in with the row names in the first column;
    # move them to the row names instead.
    d2 <- d1[,2:5]
    rownames(d2) <- d1[,1]
    # Always check that the data were read in correctly.
    head(d2)
       G1 G2 G3 G4
    T1  0  1  0  1
    T2  1  0  0  1
    T3  1  1  1  0
    T4  0  1  0  0
    T5  0  0  0  1
    T6  1  0  1  0

R code, 2 x 2 contingency table

             G1(t-1) = 0   G1(t-1) = 1
G4(t) = 0         a             b
G4(t) = 1         c             d

    # Start with gene 4 and make the 2 x 2 contingency table for gene 1.
    # Find the number a.
    n <- nrow(d2)
    # Index k in posible means that G4 is 0 at time point k+1,
    # so row k of d2 holds the state at time t-1.
    posible <- which(d2[2:n,4] == 0)
    posible
    [1] 2 3 5 6 7 8 9
    a <- sum(length(which(d2[posible, 1] == 0)))
    a
    [1] 4

R code, 2 x 2 contingency table

    # Find the numbers a, b, c and d.
    n <- nrow(d2)
    posible0 <- which(d2[2:n,4] == 0)
    posible1 <- which(d2[2:n,4] == 1)
    a <- sum(length(which(d2[posible0, 1] == 0)))
    b <- sum(length(which(d2[posible0, 1] == 1)))
    c1 <- sum(length(which(d2[posible1, 1] == 0)))
    d <- sum(length(which(d2[posible1, 1] == 1)))
    conttable <- matrix(c(a,b,c1,d), ncol = 2, byrow = T)
    conttable
         [,1] [,2]
    [1,]    4    3
    [2,]    2    0

chisq.test(conttable)

    > chisq.test(conttable)
            Pearson's Chi-squared test with Yates' continuity correction
    data: conttable
    X-squared = 0.0804, df = 1, p-value = 0.7768
    Warning message:
    In chisq.test(conttable) : Chi-squared approximation may be incorrect

NB: we have a very small dataset. How can we get rid of the warning message? Look at help(chisq.test).

help(chisq.test)
The help page shows that the p-value can be computed by Monte Carlo simulation. For a small dataset like ours, this is a good option.

chisq.test(conttable, simulate.p.value = T)

    > chisq.test(conttable, simulate.p.value = T)
            Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)
    data: conttable
    X-squared = 1.2857, df = NA, p-value = 0.5097

    > chisq.test(conttable, simulate.p.value = T)
            Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)
    data: conttable
    X-squared = 1.2857, df = NA, p-value = 0.5002

Because the p-value is simulated, it differs slightly between runs.

Chi-square test result of independence for all genes (p-values)

         G1(t-1)   G2(t-1)   G3(t-1)   G4(t-1)
G1(t)
G2(t)
G3(t)
G4(t)     0.511     0.170     0.430     1

So far only the row for G4(t) has been computed.

Make a function that takes two vectors, x (gene at time t) and y (gene at time t-1)

    chisqres <- function(x, y){
      n <- length(x)
      # Index k means that x is 0 (or 1) at time point k+1,
      # so y[k] is the value of the other gene at time t-1.
      posible0 <- which(x[2:n] == 0)
      posible1 <- which(x[2:n] == 1)
      a <- sum(length(which(y[posible0] == 0)))
      b <- sum(length(which(y[posible0] == 1)))
      c1 <- sum(length(which(y[posible1] == 0)))
      d <- sum(length(which(y[posible1] == 1)))
      counttable <- matrix(c(a,b,c1,d), ncol = 2, byrow = T)
      chisq.test(counttable, simulate.p.value = T)$p.value
    }

Use chisqres()

    res <- matrix(NA, ncol(d2), ncol(d2))
    for(i in 1:ncol(d2)){
      for(j in 1:ncol(d2)){
        res[i,j] <- chisqres(d2[,i], d2[,j])
      }
    }
    colnames(res) <- paste(colnames(d2), "t-1", sep = "")
    rownames(res) <- paste(colnames(d2), "t", sep = "")
    round(res, 3)
        G1t-1 G2t-1 G3t-1 G4t-1
    G1t 1.000 1.000 0.168 0.012
    G2t 0.009 1.000 0.512 1.000
    G3t 1.000 0.007 1.000 1.000
    G4t 0.491 0.161 0.428 1.000

This only tells us that G1 at time t depends on the state G4 had at time t-1, not which type of dependence it is.

Going further: calculate 2 x 2 x 2 tables with three genes, i.e. a gene at time t and two genes at time t-1, e.g. G1(t), G3(t-1) and G4(t-1).
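
A 2 x 2 x 2 table of this kind can be sketched with base R's table(). The data frame below retypes the lecture's binary data set by hand so that the example is self-contained; the names lev and tab3 and the dimension labels G1_t, G3_tm1 and G4_tm1 are chosen here for illustration only.

```r
# The binary data set with four genes and 10 time points, typed in by hand
d2 <- data.frame(
  G1 = c(0, 1, 1, 0, 0, 1, 0, 0, 0, 0),
  G2 = c(1, 0, 1, 1, 0, 0, 1, 0, 0, 0),
  G3 = c(0, 0, 1, 0, 0, 1, 1, 0, 1, 1),
  G4 = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0),
  row.names = paste0("T", 1:10)
)

n <- nrow(d2)
lev <- c(0, 1)  # fix the levels so that empty cells are kept in the table

# G1 at time t (rows 2..n) against G3 and G4 at time t-1 (rows 1..n-1)
tab3 <- table(
  G1_t   = factor(d2$G1[2:n], levels = lev),
  G3_tm1 = factor(d2$G3[1:(n - 1)], levels = lev),
  G4_tm1 = factor(d2$G4[1:(n - 1)], levels = lev)
)
tab3
```

Each of the 9 transitions falls into exactly one of the eight cells, so the cell counts sum to n - 1.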

Probabilistic Boolean network
Allow multiple Boolean functions at each node, with different probabilities:

F = {{(f_{11}, c_{11}), ..., (f_{1k_1}, c_{1k_1})}, ..., {(f_{n1}, c_{n1}), ..., (f_{nk_n}, c_{nk_n})}}

where
- k_1 is the number of different transition functions for gene 1
- the sum of all transition probabilities c_{11} to c_{1k_1} is always 1

Example: probabilistic Boolean network
Given three genes and the transition functions

F = {F_1 = {(f_{11}, c_{11}), (f_{12}, c_{12})}, F_2 = {(f_{21}, 1)}, F_3 = {(f_{31}, c_{31}), (f_{32}, c_{32})}}

and given the truth table and the probabilities for each transition function, there are 2*1*2 = 4 different sets of transition functions that can be used. They are all listed in table K.


Transition probability matrix
[Figure: 8 x 8 transition probability matrix with the states 000, 001, 010, 011, 100, 101, 110, 111 as rows and columns. The four possible networks have probabilities P_1, P_2, P_3 and P_4, where P_1 = c_{11} * 1 * c_{31}.]

R code
Truth table:

x1x2x3    f11 f12 f21 f31 f32
State000   0   0   0   0   0
State001   1   1   1   0   0
State010   1   1   1   0   0
State011   1   0   0   1   0
State100   0   0   1   0   0
State101   1   1   1   1   0
State110   1   1   0   1   0
State111   1   1   1   1   1

    truthtable1 <- read.table("m:/undervisning/statistical bioinformatics/datasets used/truthtableex.txt", header = T)
    truthtable <- truthtable1[,2:6]
    rownames(truthtable) <- substr(truthtable1[,1], 6,8)
    possiblecomb <- nrow(truthtable)
    c11 <- 0.6 # probability of function f11 being used
    c12 <- 1 - c11
    c21 <- 1
    c31 <- 0.5
    c32 <- 1 - c31
    A <- matrix(0, nrow = possiblecomb, ncol = possiblecomb)
    rownames(A) <- rownames(truthtable)
    colnames(A) <- rownames(truthtable)

    possiblemodelstruthtablecolumn <- rbind(c(1,3,4), c(1,3,5), c(2,3,4), c(2,3,5))
    # probability that each of the possible models is used
    probmodels <- c(c11*c21*c31, c11*c21*c32, c12*c21*c31, c12*c21*c32)
    for(i in 1:nrow(A)){
      from <- rownames(A)[i]
      for (j in 1:nrow(possiblemodelstruthtablecolumn)){
        modelj <- possiblemodelstruthtablecolumn[j,]
        to <- paste(truthtable[i,modelj[1]], truthtable[i,modelj[2]], truthtable[i,modelj[3]], sep = "")
        probmodelsj <- probmodels[j]
        A[from, to] <- A[from, to] + probmodelsj
      }
    }
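
A quick sanity check on such a matrix is that every row sums to 1. The sketch below is a self-contained variant of the code above: it types the truth table in by hand instead of reading the file, uses the same probabilities (c11 = 0.6, c31 = 0.5), and the names states, models, probs, p0 and p1 are invented for this illustration.

```r
# Truth table of the example, typed in by hand (rows = states x1x2x3)
states <- c("000", "001", "010", "011", "100", "101", "110", "111")
truthtable <- data.frame(
  f11 = c(0, 1, 1, 1, 0, 1, 1, 1),
  f12 = c(0, 1, 1, 0, 0, 1, 1, 1),
  f21 = c(0, 1, 1, 0, 1, 1, 0, 1),
  f31 = c(0, 0, 0, 1, 0, 1, 1, 1),
  f32 = c(0, 0, 0, 0, 0, 0, 0, 1),
  row.names = states
)

# The 2*1*2 possible networks and their probabilities
models <- rbind(c("f11", "f21", "f31"), c("f11", "f21", "f32"),
                c("f12", "f21", "f31"), c("f12", "f21", "f32"))
probs <- c(0.6 * 1 * 0.5, 0.6 * 1 * 0.5, 0.4 * 1 * 0.5, 0.4 * 1 * 0.5)

A <- matrix(0, 8, 8, dimnames = list(from = states, to = states))
for (s in states) {
  for (j in seq_len(nrow(models))) {
    to <- paste(unlist(truthtable[s, models[j, ]]), collapse = "")
    A[s, to] <- A[s, to] + probs[j]
  }
}

rowSums(A)                    # every row should sum to 1

# Distribution over states after one step, starting from a uniform distribution
p0 <- rep(1 / 8, 8)
p1 <- as.vector(p0 %*% A)
names(p1) <- states
p1
```

Multiplying a row vector of state probabilities by A repeatedly gives the distribution after further steps.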

Synchronous Boolean networks
- Assume that all genes are updated at the same time.
- This simplification facilitates the analysis of the networks.
- Until now we have looked at such simplified networks.

Asynchronous Boolean networks
At each point of time t, only one of the transition functions f_i ∈ F is chosen at random, and the corresponding Boolean variable is updated.
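
As a minimal sketch of this update scheme, the code below applies it to the three-gene example used earlier (f_1 = x_2 & x_3, f_2 = x_1, f_3 = x_2). The helper names f, async_step and async_random_step are hypothetical, and choosing the gene uniformly at random is an assumption of this sketch.

```r
# Transition functions of the three-gene example
f <- list(
  function(x) as.numeric(x[2] && x[3]),  # f1 = x2 & x3
  function(x) x[1],                      # f2 = x1
  function(x) x[2]                       # f3 = x2
)

# Update only gene i; all other genes keep their current value
async_step <- function(x, i) {
  x[i] <- f[[i]](x)
  x
}

# One asynchronous step: choose the gene to update uniformly at random
async_random_step <- function(x) {
  async_step(x, sample(length(x), 1))
}

set.seed(1)
x <- c(0, 1, 1)
for (step in 1:5) {
  x <- async_random_step(x)
  print(x)
}
```

In contrast to the synchronous case, the same state can have several possible successors, one for each gene that may be chosen.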

BoolNet
The BoolNet R package provides tools for assembling, analyzing and visualizing synchronous, asynchronous and probabilistic Boolean networks.

    install.packages("BoolNet")
    library(BoolNet)

BoolNet, syntax

    targets, factors

or

    targets, factors, probabilities

- The target is the gene that is affected.
- The factors are the genes affecting it.
- Probabilities are given when there is more than one transition function.

BoolNet, syntax: example
CycD is an input, considered as constant. Translated into a transition rule:

    CycD, CycD

BoolNet, syntax: example
Rb is expressed if all of the genes CycA, CycB, CycD and CycE are absent; it can also be expressed in the presence of CycE or CycA if their inhibitory activity is blocked by p27. Translated into a transition rule:

First part: ! CycA & ! CycB & ! CycD & ! CycE
Second part: p27 & ! CycB & ! CycD
Together:

    Rb, (! CycA & ! CycB & ! CycD & ! CycE) | (p27 & ! CycB & ! CycD)

Read a network into R
Assume that we have the file cellcycle.txt:

    targets, factors
    CycD, CycD
    Rb, (! CycA & ! CycB & ! CycD & ! CycE) | (p27 & ! CycB & ! CycD)
    E2F, (! Rb & ! CycA & ! CycB) | (p27 & ! Rb & ! CycB)
    CycE, (E2F & ! Rb)
    CycA, (E2F & ! Rb & ! Cdc20 & ! (Cdh1 & UbcH10)) | (CycA & ! Rb & ! Cdc20 & ! (Cdh1 & UbcH10))
    p27, (! CycD & ! CycE & ! CycA & ! CycB) | (p27 & ! (CycE & CycA) & ! CycB & ! CycD)
    Cdc20, CycB
    Cdh1, (! CycA & ! CycB) | (Cdc20) | (p27 & ! CycB)
    UbcH10, ! Cdh1 | (Cdh1 & UbcH10 & (Cdc20 | CycA | CycB))
    CycB, ! Cdc20 & ! Cdh1

Read it into R by:

    cellcycle <- loadNetwork("cellcycle.txt")
    cellcycle

Reconstruct a network from time series
A dataset that is already included in BoolNet is yeastTimeSeries. To use these data, they have to be binarized.

Binarization can be done in many ways. BoolNet supports three methods:
- k-means clustering: for each gene, k-means clustering is performed to determine a good separation into groups.
- Edge detector: this approach first sorts the measurements for each gene. In the sorted measurements, the algorithm searches for differences of two successive values that satisfy a predefined condition.
- Scan statistic: the scan statistic assumes that the measurements for each gene are uniformly and independently distributed. It shifts a scanning window across the data and decides for each window position whether there is an unusual accumulation of data points, based on an approximated test statistic (see Glaz et al.).
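
To make the k-means idea concrete, here is a small base-R sketch for a single gene. It only illustrates the principle with stats::kmeans and is not BoolNet's own implementation; the function name binarize_kmeans and the example values in expr are invented.

```r
# Binarize the measurements of one gene with 2-means clustering:
# the cluster with the larger center is called "on" (1), the other "off" (0).
binarize_kmeans <- function(values) {
  km <- kmeans(values, centers = 2, nstart = 10)
  high <- which.max(km$centers)
  as.integer(km$cluster == high)
}

set.seed(42)
expr <- c(0.10, 0.20, 0.15, 0.90, 1.00, 0.95)  # made-up expression values
binarize_kmeans(expr)
```

The threshold is implicit here: it lies between the two cluster centers, so no cut-off value has to be chosen by hand.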

Reconstruct network.
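
A possible way to run the reconstruction with BoolNet is sketched below. It assumes that the BoolNet package is installed and uses its binarizeTimeSeries() and reconstructNetwork() functions on the yeastTimeSeries data; the choices method = "kmeans", method = "bestfit" and maxK = 4 are assumptions made for this illustration, and the variable names have_boolnet, bin and net are arbitrary.

```r
# Run only if BoolNet is available, so the sketch does not fail without it
have_boolnet <- requireNamespace("BoolNet", quietly = TRUE)

if (have_boolnet) {
  # Load the example time series that ships with BoolNet
  data(yeastTimeSeries, package = "BoolNet")

  # Step 1: binarize the measurements (here with k-means clustering)
  bin <- BoolNet::binarizeTimeSeries(yeastTimeSeries, method = "kmeans")

  # Step 2: reconstruct a network from the binarized series,
  # allowing at most maxK input genes per transition function
  net <- BoolNet::reconstructNetwork(bin$binarizedMeasurements,
                                     method = "bestfit", maxK = 4)
  print(net)
} else {
  message("BoolNet is not installed; run install.packages(\"BoolNet\") first.")
}
```

Several functions may fit a gene equally well, in which case the printed network lists all of them for that gene.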

How to read the output

Fkh2 = <f(Clb1){01}> means

Clb1(t)   Fkh2(t+1)
   0          0
   1          1

Sic1 = <f(Sic1,Clb1){0001}> means

Sic1(t)   Clb1(t)   Sic1(t+1)
   0         0          0
   0         1          0
   1         0          0
   1         1          1
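
The same expansion can be done mechanically. The sketch below is a hypothetical helper (decode_truth_table is not a BoolNet function) that turns the input genes and the bit string into a truth table, assuming the row order shown above: all inputs 0 first, with the last input gene changing fastest.

```r
# Expand a result such as Sic1 = <f(Sic1,Clb1){0001}> into a truth table
decode_truth_table <- function(inputs, bits, target) {
  k <- length(inputs)
  # expand.grid varies its first column fastest, so reverse the column order
  grid <- expand.grid(rep(list(0:1), k))[, k:1, drop = FALSE]
  names(grid) <- paste0(inputs, "(t)")
  # One output bit per row of the table
  grid[[paste0(target, "(t+1)")]] <- as.integer(strsplit(bits, "")[[1]])
  grid
}

decode_truth_table("Clb1", "01", "Fkh2")
decode_truth_table(c("Sic1", "Clb1"), "0001", "Sic1")
```

A function with k inputs always has a bit string of length 2^k, one bit for each combination of input values.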

plotNetworkWiring(net)

Creating random networks
It is desirable to generate artificial networks
- to study structural properties of Boolean networks
- to determine the specific properties of biological networks in comparison to arbitrary networks

    net <- generateRandomNKNetwork(n=10, k=3)

Attractors
- Attractors are stable cycles of states in a Boolean network.
- Attractors in models of gene-regulatory networks are expected to be linked to phenotypes.
- All states that lead to a certain attractor form its basin of attraction.

- Simple attractors occur in synchronous Boolean networks and consist of a set of states whose synchronous transitions form a cycle.
- Complex or loose attractors occur in asynchronous networks. There is usually more than one possible transition for each state in an asynchronous network, so a complex attractor is formed by two or more overlapping loops.
- Steady-state attractors are attractors that consist of only one state. All transitions from this state result in the state itself.
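
For a small synchronous network, attractors and basins can be found by simply following every state until a state repeats. The sketch below does this in base R for the three-gene example used earlier (f_1 = x_2 & x_3, f_2 = x_1, f_3 = x_2); the helper names next_state, key and find_attractor are made up, and the brute-force search is only practical for a handful of genes.

```r
# Synchronous update of the three-gene example
next_state <- function(x) c(as.numeric(x[2] && x[3]), x[1], x[2])
key <- function(x) paste(x, collapse = "")

# Follow the trajectory from a start state until a state repeats;
# the states from the first repeated state onwards form the attractor.
find_attractor <- function(x) {
  seen <- character(0)
  while (!(key(x) %in% seen)) {
    seen <- c(seen, key(x))
    x <- next_state(x)
  }
  cycle <- seen[match(key(x), seen):length(seen)]
  paste(sort(cycle), collapse = ",")
}

# All 2^3 states, ordered 000, 001, ..., 111
states <- as.matrix(expand.grid(x3 = 0:1, x2 = 0:1, x1 = 0:1)[, 3:1])
attractor_of <- apply(states, 1, find_attractor)
names(attractor_of) <- apply(states, 1, key)

attractor_of          # attractor reached from each start state
table(attractor_of)   # basin sizes
```

In this example both attractors are steady states, 000 and 111, and all states except 111 belong to the basin of 000.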


Perturbation experiments
The generation of perturbed copies of a network is a way to test the robustness of structural properties of the network to noise and mismeasurements. For example, you could assess the relevance of an attractor by checking whether the same attractor is still found when small random changes are applied to the network. If this is the case, it is less likely that the attractor is an artifact of mismeasurements.

    perturbednet <- perturbNetwork(cellcycle, perturb="functions", method="bitflip")

Generate random networks
- Generate a random network with as many nodes and edges as your original network.
- Find the attractors in this network.
- Perturb the random network 1000 times and count how many times the original attractors of the random network are found in the perturbed networks.
- Repeat 1000 times.

    data(cellcycle)
    perturbednet <- perturbNetwork(cellcycle, perturb="functions", method="bitflip")

Boolean network modeling
Positive:
- Explains the dynamic behavior of living systems efficiently (with possible loops!).
- Boolean algebra provides a rich set of algorithms already available for supervised learning in the binary domain, such as logical analysis of data and Boolean-based classification algorithms.
- Dichotomization to binary values improves the accuracy of classification and simplifies the obtained models by reducing the noise level in experimental data.

Boolean network modeling
Negative:
- Requires heavy computing times to construct a network structure.
- Needs specific time-course data that capture the pathway interactions well, but it is often uncertain whether such time points exist and, even so, whether they were captured well by a time-course experiment.
- Needs a relatively large number of time points.
- Loses quantitative information by dichotomization to binary values.