Graph structure learning for network inference


1 Graph structure learning for network inference. Sushmita Roy. Computational Network Biology, Biostatistics & Medical Informatics 826 / Computer Sciences 838. https://compnetbiocourse.discovery.wisc.edu. Sep 13th 2016. Some of the material covered in this lecture is adapted from BMI 576.

2 Goals for today: Representing transcriptional networks. Learning regulatory networks from gene expression data. Per-gene regulatory network inference algorithms. Different types of probabilistic graphical models. Bayesian networks.

3 Transcriptional regulatory networks. Transcription factors (TFs, e.g., A and B) bind DNA to control the expression of their target genes (e.g., gene C). A regulatory network is a directed, signed, weighted graph. Nodes: TFs and target genes. Edges: "A regulates C's expression level". Example: the regulatory network of E. coli, with 153 TFs (green & light red) and 1319 targets. Vargas and Santillan, 2008.

4 Why do we need to computationally infer transcriptional networks? Why infer transcriptional networks? They control which genes are expressed when and where; they are needed for accurate information processing in cells; and many diseases are associated with changes in transcriptional networks. Why do so computationally? Experimental detection of networks is hard and expensive. Computational inference is a first step towards having an in silico model of a cell, and a model can be used to make predictions that can be tested to refine the model.

5 Methods to infer regulatory networks. Experimental: ChIP-chip and ChIP-seq; factor/regulator knockout followed by genome-wide transcriptome measurements; DNase I/ATAC-seq + motif finding. Computational: supervised network inference; unsupervised network inference; expression-based network inference.

6 Types of data for reconstructing transcriptional networks. Node-specific datasets: transcriptomes (genome-wide mRNA levels, i.e., the expression level of each gene across samples); these can potentially recover genome-wide regulatory networks. Edge-specific datasets: ChIP-chip and ChIP-seq; sequence-specific motifs; factor knockout followed by whole-transcriptome profiling.

7 Experimental techniques to measure expression. Microarrays: cDNA/spotted arrays; oligonucleotide arrays. Sequencing: RNA-seq.

8 Microarrays. A microarray is a solid support on which pieces of DNA are arranged in a grid-like array; each piece is called a probe. Microarrays measure RNA abundances by exploiting complementary hybridization. DNA from the labeled sample is called the target.

9 Spotted versus oligonucleotide arrays. Three key steps: reverse transcription, labeling, and hybridization. From Vermeeren and Luc Michiels 2011, DOI: /19432.

10 A video about two-color DNA microarrays. From: https://

11 A video about DNA microarrays. From: https:// Also see: http://

12 A typical RNA-seq pipeline. (Figure 1 of Wang et al. 2009: long RNAs are converted into a library of cDNA fragments through either RNA fragmentation or DNA fragmentation; sequencing adaptors are added to each cDNA fragment; short sequence reads, including exonic reads, junction reads, and poly(A)-end reads, are obtained and mapped to the genome, yielding a base-resolution RNA expression profile by nucleotide position.)

13 What do we want a model for a regulatory network to capture? Example: HSP12's expression depends upon Hot1 and Sko1 binding to HSP12's promoter (Hot1 and Sko1 bound: HSP12 ON; unbound: HSP12 OFF). We say Hot1 regulates HSP12, and HSP12 is a target of Hot1. As random variables: Hot1 = X1, Sko1 = X2, HSP12 = X3, with X3 = ψ(X1, X2), where ψ may be Boolean, linear, differential equations, probabilistic, etc. Structure: who are the regulators? Function: how do they determine expression levels?

14 Mathematical representations of the "how" question. The function ψ maps the input (expression of a target's neighbors, X1 and X2) to the output (expression of the target, X3); models differ in this function. Boolean networks: a truth table maps the input states (X1, X2) to the output state X3. Differential equations: dX3(t)/dt = g(X1(t), X2(t)) - r·X3(t). Probabilistic graphical models use probability distributions, e.g., P(X3 | X1, X2) = N(a·X1 + b·X2, σ).
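To make the contrast concrete, here is a minimal sketch (the parameter values a, b, r and the production function g are hypothetical, chosen purely for illustration) of how each of the three families maps two regulators to one target:

```python
import numpy as np

# Boolean network: the target is a logic function of binarized regulator states
def boolean_psi(x1, x2):
    return x1 and x2  # e.g., X3 is ON only if both regulators are ON

# Differential equation: dX3/dt = g(X1, X2) - r*X3, advanced by one Euler step
def ode_step(x1, x2, x3, r=0.5, dt=0.1):
    g = 2.0 * x1 + 1.0 * x2        # hypothetical production function g(X1, X2)
    return x3 + dt * (g - r * x3)  # production minus decay at rate r

# Probabilistic graphical model: X3 | X1, X2 ~ Normal(a*X1 + b*X2, sigma)
def sample_x3(x1, x2, a=2.0, b=1.0, sigma=0.3):
    return np.random.default_rng(0).normal(a * x1 + b * x2, sigma)

print(boolean_psi(True, False))  # False
print(ode_step(1.0, 0.5, 0.2))   # 0.44
print(sample_x3(1.0, 0.5))       # one draw centered at 2.5
```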

15 Expression-based network inference. Input: biological samples profiled into a gene expression matrix, where entry (i, j) is the expression level of gene i in experiment j. (Figure shown: clustered expression variation across S. cerevisiae isolates from a study of phenotypic variation in yeast; rows are genes, columns are strains, and red/green indicate higher/lower expression than the reference.) Output: the structure (regulators X1, X2 of target X3) and the function X3 = f(X1, X2).

16 Expression-based regulatory network inference. Given: a set of measured mRNA levels across multiple biological samples. Do: infer the regulators of genes; infer how the regulators specify the expression of a gene. Algorithms for network reconstruction vary based on their meaning of "interaction".

17 Two classes of expression-based network inference methods. Per-gene/direct methods: find regulators (e.g., X1, X2) for each individual target gene (e.g., X3, X5). Module-based methods (subsequent lectures): find regulators for a module (cluster) of genes.

18 Per-gene methods. Key idea: find the regulators that best explain the expression level of a gene. Probabilistic graphical methods: Bayesian networks (Sparse Candidates); dependency networks (GENIE3, TIGRESS). Information-theoretic methods: Context Likelihood of Relatedness (CLR); ARACNE.

19 Probabilistic graphical models (PGMs). A marriage between probability and graph theory: nodes on the graph represent random variables; the graph structure specifies the statistical dependency structure; the graph parameters specify the nature of the dependency. PGMs can be directed or undirected. Examples of PGMs: Bayesian networks, dependency networks, Markov networks, factor graphs.

20 Different types of probabilistic graphs. In each graph type we can assert different conditional independencies: correlation networks, Gaussian graphical models, dependency networks, Bayesian networks.

21 An undirected graph: correlational networks. The random variables represent gene expression levels (e.g., Msb2 = X1, Sho1 = X2, Ste20 = X3). Edges represent high correlation; we need to determine what "high" is. Edge weights (e.g., w13, w23) denote values of a measure of statistical correlation (e.g., Pearson's correlation), giving an undirected weighted graph. Correlational networks cannot discriminate between direct and indirect correlations.
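A minimal sketch of building such a network from an expression matrix; the 0.7 cutoff is an arbitrary illustration of the "what counts as high" choice:

```python
import numpy as np

def correlation_network(X, threshold=0.7):
    """Build an undirected correlational network from expression data.

    X: genes-by-samples matrix (rows are genes).
    Returns weighted edges (i, j, r) with |r| at or above the threshold.
    """
    corr = np.corrcoef(X)  # Pearson correlation between all gene pairs
    edges = []
    for i in range(X.shape[0]):
        for j in range(i + 1, X.shape[0]):
            if abs(corr[i, j]) >= threshold:
                edges.append((i, j, corr[i, j]))
    return edges

# Toy data: genes 0 and 1 co-vary; gene 2 is anti-correlated with them
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [1.1, 2.2, 2.9, 4.1, 5.2],
              [5.0, 4.0, 3.2, 2.1, 1.0]])
print(correlation_network(X))
```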

22 Popular examples of correlational networks. Weighted Gene Co-expression Network Analysis (WGCNA): Zhang and Horvath 2005. Relevance Networks: Butte & Kohane 2000, Pacific Symposium on Biocomputing.

23 Limitations of correlational networks. Correlational networks cannot distinguish between direct and indirect dependencies, which makes them less interpretable than other PGMs. For any co-expression network there are several possible regulatory networks that can explain the observed correlations: for example, correlation among X, Y, and Z could arise from a chain X → Y → Z, from Y regulating both X and Z, or from a hidden common regulator H of all three. What we would like is to discriminate between direct and indirect dependencies; for this we need to review conditional independencies.

24 Conditional independencies in PGMs. The different classes of models we will see are based on a general notion of specifying statistical independence. Suppose we have two genes X and Y: we add an edge between X and Y if X and Y are not independent given a third set Z. Depending upon Z we will have a family of different PGMs.

25 Conditional independence and PGMs. Correlational networks: Z is the empty set. Markov networks: X and Y are not independent given all other variables; Gaussian graphical models are a special case (later lectures). Dependency networks: approximate Markov networks; may not be associated with a valid joint distribution (later lectures). First-order conditional independence models: explain the correlation between two variables by a third variable. Bayesian networks: generalize first-order conditional independence models.

26 Bayesian networks (BNs). A special type of probabilistic graphical model with two parts: a graph which is directed and acyclic (a Directed Acyclic Graph, DAG), and a set of conditional distributions. The nodes denote random variables X1 ... XN. The edges encode statistical dependencies between the random variables and establish parent-child relationships. Each node Xi has a conditional probability distribution (CPD) representing P(Xi | Pa(Xi)), where Pa denotes parents. BNs provide a tractable way to represent large joint distributions.

27 Key questions in Bayesian networks. What do the CPDs look like? What independence assertions can be made in Bayesian networks? How can we learn them from expression data? What are the inherent limitations of Bayesian networks?

28 An example Bayesian network: Cloudy (C) is a parent of Sprinkler (S) and of Rain (R); S and R are the parents of WetGrass (W). The CPDs (values from the cited tutorial): P(C=F) = 0.5, P(C=T) = 0.5. Sprinkler: P(S=T | C=F) = 0.5, P(S=T | C=T) = 0.1. Rain: P(R=T | C=F) = 0.2, P(R=T | C=T) = 0.8. WetGrass: P(W=T | S=F, R=F) = 0.0; P(W=T | S=T, R=F) = 0.9; P(W=T | S=F, R=T) = 0.9; P(W=T | S=T, R=T) = 0.99. Adapted from Kevin Murphy: Intro to Graphical Models and Bayes Networks: http:// bnintro.html

29 Notation. Xi: the i-th random variable. If there are few random variables, we will just use uppercase letters, e.g., A, B, C. X = {X1, ..., Xp}: a set of p random variables. x_ik: an assignment of Xi in the k-th sample. x_-ik: the set of assignments to all variables other than Xi in the k-th sample. I(Xi; Xj | Xk, Xl): conditional independence notation: Xi is independent of Xj given Xk and Xl.

30 Bayesian networks compactly represent joint distributions: P(X1, ..., Xp) = ∏_{i=1}^{p} P(Xi | Pa(Xi))
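As a sketch of why this factorization is useful, the probability of any full assignment can be computed from the CPDs alone; the following uses the CPD values from the sprinkler example above:

```python
# P(C, S, R, W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)
p_c_true = 0.5
p_s_true_given_c = {False: 0.5, True: 0.1}   # P(S=T | C)
p_r_true_given_c = {False: 0.2, True: 0.8}   # P(R=T | C)
p_w_true_given_sr = {(False, False): 0.0, (True, False): 0.9,
                     (False, True): 0.9, (True, True): 0.99}  # P(W=T | S, R)

def bernoulli(p_true, value):
    """Probability of a binary value, given P(value = True)."""
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    return (bernoulli(p_c_true, c)
            * bernoulli(p_s_true_given_c[c], s)
            * bernoulli(p_r_true_given_c[c], r)
            * bernoulli(p_w_true_given_sr[(s, r)], w))

# P(C=T, S=F, R=T, W=T) = 0.5 * 0.9 * 0.8 * 0.9 = 0.324
print(joint(True, False, True, True))
```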

31 Example Bayesian network of 5 variables: X1 and X2 are the parents of child X3; X3 and X4 are the parents of X5. Assume each Xi is binary. With no independence assertions, representing P(X) = P(X1, X2, X3, X4, X5) needs 2^5 entries, one per joint assignment. With the independence assertions of the graph, P(X) = P(X1) P(X2) P(X4) P(X3 | X1, X2) P(X5 | X3, X4), and the largest factor involves only 3 variables (2^3 entries), so far fewer numbers are needed overall.

32 CPDs in Bayesian networks. The CPD P(Xi | Pa(Xi)) specifies a distribution over the values of Xi for each combination of values of Pa(Xi). The CPD can be parameterized in different ways. If the Xi are discrete random variables: a conditional probability table or tree. If the Xi are continuous random variables: the CPD can be a conditional Gaussian or a regression tree.

33 Representing CPDs as tables. Consider four binary variables X1, X2, X3, X4, with Pa(X4) = {X1, X2, X3}. P(X4 | X1, X2, X3) as a table has one row for each of the 2^3 = 8 joint assignments of the parents (t t t, t t f, t f t, t f f, f t t, f t f, f f t, f f f), and columns giving P(X4 = t) and P(X4 = f) for that parent configuration.

34 Estimating a CPD table from data. Assume we observe the following N = 7 assignments of (X1, X2, X3, X4):
X1 X2 X3 X4
T  F  T  T
T  T  F  T
T  T  F  T
T  F  T  T
T  F  T  F
T  F  T  F
F  F  T  F
For each joint assignment to X1, X2, X3, estimate the probabilities for each value of X4. For example, consider X1=T, X2=F, X3=T (4 samples): P(X4=T | X1=T, X2=F, X3=T) = 2/4; P(X4=F | X1=T, X2=F, X3=T) = 2/4.
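A small sketch of this counting procedure, reproducing the slide's estimate:

```python
from collections import Counter

# Observed joint assignments (X1, X2, X3, X4), as on the slide
data = [
    (True, False, True, True),
    (True, True, False, True),
    (True, True, False, True),
    (True, False, True, True),
    (True, False, True, False),
    (True, False, True, False),
    (False, False, True, False),
]

# Count X4 values for each parent configuration (X1, X2, X3)
counts = Counter((x1, x2, x3, x4) for x1, x2, x3, x4 in data)
parent_totals = Counter((x1, x2, x3) for x1, x2, x3, _ in data)

def cpd(x4, x1, x2, x3):
    """Maximum-likelihood estimate of P(X4 = x4 | X1, X2, X3)."""
    return counts[(x1, x2, x3, x4)] / parent_totals[(x1, x2, x3)]

# P(X4=T | X1=T, X2=F, X3=T) = 2/4, as on the slide
print(cpd(True, True, False, True))  # 0.5
```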

35 Conditional independencies in BNs. A variable Xi is independent of its non-descendants given its parents. Notation I(Xi; Xj | Xk, Xl): Xi is independent of Xj given Xk and Xl. Consider the example Bayesian network from Friedman et al., 2000 (A and E are the parents of B, A is the parent of D, and B is the parent of C). The set of conditional independencies in this graph: I(A; E), I(B; D | A, E), I(D; E, B, C | A), I(C; E, A, D | B), I(E; A, D).

36 Bayesian network representation of a regulatory network. Inside the cell: Hot1 and Sko1 regulate HSP12. As a Bayesian network: the random variables Hot1 = X1 and Sko1 = X2 are the regulators (parents), with distributions P(X1) and P(X2); Hsp12 = X3 is the target (child), with CPD P(X3 | X1, X2).

37 Learning problems in Bayesian networks. Parameter learning on a known graph structure: given a set of joint assignments of the random variables, estimate the parameters of the model. Structure learning: given a set of joint assignments of the random variables, estimate the structure and the parameters of the model. Structure learning subsumes parameter learning.

38 Network inference/graph structure learning is a computationally difficult problem. Given 2 TFs and 3 target genes, how many possible networks can there be? Each of the 2 TFs may or may not regulate each of the 3 genes, so there are 2 × 3 = 6 possible edges and 2^6 = 64 possible networks. (The figure shows a non-exhaustive set of these possible networks.)

39 Structure learning using score-based search. Score(B) describes how well B describes the data; candidate networks B1, B2, ..., Bm are each scored: Score(B1), Score(B2), ..., Score(Bm). Exhaustive search is not computationally tractable.

40 Scoring a Bayesian network. Maximum likelihood score: Score_ML(G) = max_θ P(D | G, θ). Bayesian score: Score_Bayes(G) = P(G | D) = P(D | G) P(G) / P(D). We typically ignore the denominator P(D) as it is the same for all models.
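As an illustration, here is a minimal sketch of the maximum-likelihood score for a discrete network: each graph is scored by the log-likelihood of the data under CPTs estimated by counting, as on the earlier slide. No complexity penalty is included, whereas scores used in practice (e.g., BIC or the Bayesian score) effectively penalize overly dense graphs:

```python
import math
from collections import Counter

def log_likelihood_score(data, parents):
    """ML score of a discrete BN structure.

    data: list of dicts mapping variable name -> value.
    parents: dict mapping variable name -> tuple of parent names.
    """
    score = 0.0
    for var, pa in parents.items():
        joint = Counter((tuple(d[p] for p in pa), d[var]) for d in data)
        pa_totals = Counter(tuple(d[p] for p in pa) for d in data)
        for (pa_vals, _), n in joint.items():
            # each of the n samples contributes log of the ML CPT entry
            score += n * math.log(n / pa_totals[pa_vals])
    return score

data = [{"X1": t[0], "X2": t[1], "X3": t[2]} for t in
        [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 1), (1, 0, 1)]]
print(log_likelihood_score(data, {"X1": (), "X2": (), "X3": ("X1", "X2")}))
```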

41 Greedy hill climbing to search Bayesian network space. Input: data D, an initial Bayesian network B0 = {G0, Θ0}. Output: B_best. Loop for r = 1, 2, ... until convergence: {B_r1, ..., B_rm} = Neighbors(B_r), obtained by making local changes to B_r; B_{r+1} = argmax_j Score(B_rj). Termination: B_best = B_r.

42 Local changes to B_r. Starting from the current network B_r (over nodes A, B, C, D), neighboring networks are generated by adding an edge (B_r1; must check for cycles) or deleting an edge (B_r2).
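A compact sketch of greedy hill climbing with these local moves, representing a network as a dict mapping each variable to its parent set and assuming a `score(data, parents)` function such as the one sketched above (edge reversal, another common move, is omitted):

```python
import itertools

def creates_cycle(parents):
    """Depth-first search for a directed cycle in the parent graph."""
    visiting, done = set(), set()
    def dfs(v):
        visiting.add(v)
        for p in parents[v]:
            if p in visiting or (p not in done and dfs(p)):
                return True
        visiting.discard(v)
        done.add(v)
        return False
    return any(v not in done and dfs(v) for v in parents)

def neighbors(parents):
    """All networks differing from `parents` by one edge add or delete."""
    for child, parent in itertools.permutations(parents, 2):
        new = {v: set(ps) for v, ps in parents.items()}
        if parent in new[child]:
            new[child].discard(parent)  # delete an edge
        else:
            new[child].add(parent)      # add an edge ...
            if creates_cycle(new):      # ... and check for cycles
                continue
        yield new

def hill_climb(data, parents, score):
    best = score(data, parents)
    while True:
        cand = max(neighbors(parents), key=lambda g: score(data, g))
        if score(data, cand) <= best:   # converged: no improving move
            return parents
        parents, best = cand, score(data, cand)
```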

43 Challenges with applying Bayesian networks to genome-scale data: the number of variables, p, is in the thousands, while the number of samples, N, is in the hundreds.

44 Bayesian network-based methods to handle genome-scale networks. Sparse candidate algorithm: Friedman, Nachman & Pe'er 1999; Friedman, Linial, Nachman & Pe'er 2000. Module networks: Segal, Pe'er, Regev, Koller & Friedman 2005.

45 The Sparse Candidate algorithm: structure learning in Bayesian networks. A fast Bayesian network learning algorithm. Key idea: identify k promising candidate parents for each Xi, where k << p (p: number of random variables). The candidates define a skeleton graph H; restrict the graph structure to select parents from H. Early choices in H might exclude other good parents; this is resolved using an iterative algorithm.

46 Sparse candidate algorithm. Input: a dataset D; an initial Bayes net B0; a parameter k, the max number of candidate parents per variable. Output: final B_r. Loop for r = 1, 2, ... until convergence: Restrict: based on D and B_{r-1}, select candidate parents C_i^r for each Xi; this defines a skeleton directed network H_r. Maximize: find the network B_r that maximizes Score(B_r) among networks satisfying Pa_r(Xi) ⊆ C_i^r. Termination: return B_r.

47 Information theory for measuring dependence. I(X;Y) is the mutual information between two variables: knowing X, how much information do we have about Y? For discrete variables, I(X;Y) = Σ_x Σ_y P(x, y) log [ P(x, y) / (P(x) P(y)) ], where P(Z) denotes the probability distribution of Z. Mutual information measures the difference between two distributions: the joint, and the product of the marginals.
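A small sketch of estimating I(X;Y) from discretized data by plugging empirical frequencies into the formula above (log base 2, so the result is in bits):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) for two equal-length discrete sequences."""
    n = len(xs)
    p_xy = Counter(zip(xs, ys))
    p_x, p_y = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in p_xy.items():
        # c*n / (count_x * count_y) equals P(x,y) / (P(x) * P(y))
        mi += (c / n) * math.log2(c * n / (p_x[x] * p_y[y]))
    return mi

# Perfectly dependent binary variables: I(X;Y) = 1 bit
print(mutual_information([0, 1, 0, 1], [1, 0, 1, 0]))  # 1.0
```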

48 Selecting candidate parents in the Restrict step. A good parent for Xi is one with strong statistical dependence on Xi, and mutual information I(Xi; Xj) provides a good measure of statistical dependence. Mutual information should be used only as a first approximation, however; candidate parents need to be iteratively refined to avoid missing important dependences. On later iterations, a good parent for Xi is one with the highest score improvement when added to Pa(Xi). A sketch of the simplest Restrict step follows.
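The sketch below implements only the first-approximation version: pick the k regulators with the highest mutual information for each gene (the score-improvement refinement on later iterations is omitted). The Maximize step would then run a search such as the hill climber above, restricted to these candidates:

```python
def restrict_step(mi, genes, regulators, k=3):
    """Pick the k candidate parents with the highest mutual information.

    mi: dict mapping (regulator, gene) pairs to mutual information.
    Returns the skeleton H as a dict: gene -> list of candidate parents.
    """
    H = {}
    for g in genes:
        ranked = sorted((r for r in regulators if r != g),
                        key=lambda r: mi[(r, g)], reverse=True)
        H[g] = ranked[:k]  # candidate parents C_i for this gene
    return H
```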

49 Sparse candidate learns good networks faster than hill-climbing. Greedy hill climbing takes much longer to reach a high-scoring Bayesian network. (Figure: score, higher is better, versus run time for greedy hill climbing and several Sparse Candidate variants: Disc, Score, and Shld, each with k = 5 or 15, on two datasets of 100 and 200 variables.)

50 Some comments about choosing candidates. How should k be selected in the sparse candidate algorithm? Should k be the same for all Xi? Regularized regression approaches can be used to estimate the structure of an undirected graph (Schmidt, Niculescu-Mizil, Murphy 2007): first estimate an undirected dependency network, then learn a Bayesian network constrained on the dependency network structure.

51 Assessing confidence in the learned network. Typically the number of training samples is not sufficient to reliably determine the "right" network. One can, however, estimate the confidence of specific features of the network, i.e., graph features f(G). Examples of f(G): an edge between two random variables; order relations (is X an ancestor of Y?).

52 How to assess confidence in graph features? What we want is P(f(G) | D), which is Σ_G f(G) P(G | D). But it is not feasible to compute this sum; instead we will use a bootstrap procedure.

53 Bootstrap to assess graph feature confidence. For i = 1 to m: construct dataset D_i by sampling with replacement N samples from dataset D, where N is the size of the original D; learn a network B_i = {G_i, Θ_i}. For each feature of interest f, calculate the confidence Conf(f) = (1/m) Σ_{i=1}^{m} f(G_i).
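A sketch of this procedure for the edge feature, assuming some `learn_network` routine (for instance the hill climber sketched earlier) that maps a dataset to a set of directed edges:

```python
import numpy as np

def bootstrap_edge_confidence(data, learn_network, m=100, seed=0):
    """Conf(edge) = fraction of bootstrap networks containing the edge.

    data: samples-by-genes matrix; learn_network: dataset -> set of edges.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    counts = {}
    for _ in range(m):
        idx = rng.integers(0, n, size=n)  # N samples, with replacement
        for edge in learn_network(data[idx]):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / m for edge, c in counts.items()}
```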

54 Does the bootstrap confidence represent real relationships? Compare the confidence distribution to that obtained from randomized data: shuffle the columns of each row (gene) separately, i.e., permute each gene's values across the experimental conditions independently, then repeat the bootstrap procedure. (Figure: a genes × experimental-conditions matrix with entries g_{i,j}; each row is randomized independently.)

55 Bootstrap-based confidence differs between real and randomized data. (Figure: the number of features f with confidence above a threshold, for real versus randomized data.) Friedman et al. 2000.

56 Example of a high-confidence sub-network. Comparing one learned Bayesian network with the bootstrapped-confidence Bayesian network highlights a subnetwork associated with the yeast mating pathway; colors indicate genes with known functions. Nir Friedman, Science 2004.

57 Limitations of Bayesian networks. They cannot model cyclic dependencies. In practice they have not been shown to be better than dependency networks (although most of the evaluation has been done on structure, not function). Directionality is often not associated with causality. There are too many hidden variables in biological systems.

58 A non-exhaustive list of expression-based network inference methods (per-gene vs. per-module as covered in these lectures):
Method | Scope | Model type
Sparse candidate | per-gene | Bayesian network
CLR | per-gene | information theoretic
ARACNE | per-gene | information theoretic
TIGRESS | per-gene | dependency network
Inferelator | per-gene | dependency network
GENIE3 | per-gene | dependency network
ModuleNetworks | per-module | Bayesian network
LemonTree | per-module | dependency network
WGCNA | per-module | correlation

59 ARACNE. Estimate the mutual information between all pairs of nodes, then get rid of potential indirect links: these typically correspond to the lowest mutual information within a triple. For example, if in the true network X1 regulates X2 and X2 regulates the target X3, the indirect edge between X1 and X3 would have low mutual information: I(X1, X3) < min(I(X1, X2), I(X2, X3)). ARACNE therefore excludes the edge with the lowest information in each triplet. Margolin et al. 2006.
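A sketch of the pruning rule on a precomputed table of pairwise mutual information (real ARACNE also uses a tolerance for near-ties, shown here as `eps`):

```python
from itertools import combinations

def aracne_prune(mi, genes, eps=0.0):
    """Drop the weakest edge in every fully connected triple.

    mi: dict mapping frozenset({gene_a, gene_b}) -> mutual information.
    """
    edges = set(mi)
    for a, b, c in combinations(genes, 3):
        triple = [frozenset(p) for p in ((a, b), (a, c), (b, c))]
        if all(t in edges for t in triple):
            weakest = min(triple, key=lambda t: mi[t])
            others = min(mi[t] for t in triple if t != weakest)
            if mi[weakest] < others - eps:
                edges.discard(weakest)  # likely an indirect dependency
    return edges

mi = {frozenset(("X1", "X2")): 0.9,
      frozenset(("X2", "X3")): 0.8,
      frozenset(("X1", "X3")): 0.3}
print(aracne_prune(mi, ["X1", "X2", "X3"]))  # the X1-X3 edge is removed
```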

60 Context Likelihood of Relatedness (CLR). M_ij: the mutual information between regulator i and gene j. Context for gene j: the mutual information of j with all other regulators. Context for regulator i: the mutual information of i with all other target genes. Use the contexts to compute two background distributions of mutual information, and get a z-score for M_ij with respect to each of these distributions. Combine the regulator and target-gene z-scores into a single z-score, and call an edge if this z-score is greater than a cutoff. Faith et al., 2007.

61 Using context in CLR to prune out edges. z_ij quantifies how unlikely it is to observe a mutual information as large as M_ij from either background distribution by chance; z_ij is then used to decide whether regulator i regulates gene j, as in the sketch below.
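A sketch of the scoring, assuming a regulators-by-genes matrix of mutual information values; combining the two z-scores as sqrt(z_i^2 + z_j^2) is one common CLR variant, and the 1.5 cutoff is arbitrary:

```python
import numpy as np

def clr_scores(M):
    """Combine row- and column-context z-scores of an MI matrix."""
    z_reg = (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, keepdims=True)
    z_gene = (M - M.mean(axis=0, keepdims=True)) / M.std(axis=0, keepdims=True)
    z_reg = np.clip(z_reg, 0, None)   # only above-background MI counts
    z_gene = np.clip(z_gene, 0, None)
    return np.sqrt(z_reg**2 + z_gene**2)

# Toy MI matrix: 3 regulators (rows) x 3 genes (columns)
M = np.array([[0.9, 0.2, 0.1],
              [0.2, 0.8, 0.2],
              [0.1, 0.3, 0.7]])
Z = clr_scores(M)
print(np.argwhere(Z > 1.5))  # edges whose combined z-score passes the cutoff
```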

62 Plan for next lectures. Dependency networks: GENIE3. Evaluation of different network inference algorithms. Per-module networks: module networks. Strengths and weaknesses of different methods. Combining per-module and per-gene methods. Case studies of expression-based network inference methods.

63 References.
Kim, Harold D., Tal Shay, Erin K. O'Shea, and Aviv Regev. "Transcriptional Regulatory Circuits: Predicting Numbers from Alphabets." Science 325 (July 2009).
De Smet, Riet, and Kathleen Marchal. "Advantages and limitations of current network inference methods." Nature Reviews Microbiology 8 (October 2010).
Markowetz, Florian, and Rainer Spang. "Inferring cellular networks - a review." BMC Bioinformatics 8 Suppl 6 (2007): S5.
Friedman, N., M. Linial, I. Nachman, and D. Pe'er. "Using Bayesian networks to analyze expression data." Journal of Computational Biology 7, no. 3-4 (2000).
